- See Also
- Gwern
- Links
- Miscellaneous
- Link Bibliography
See Also
Gwern
“GPT-3 Nonfiction”, Gwern 2020
Links
“Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience”, Han et al 2024
“Few-Shot Recalibration of Language Models”, Li et al 2024
“Do LLMs Know about Hallucination? An Empirical Investigation of LLM’s Hidden States”, Duan et al 2024
“The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, Renze & Guven 2024
“Learning to Trust Your Feelings: Leveraging Self-Awareness in LLMs for Hallucination Mitigation”, Liang et al 2024
“Challenges With Unsupervised LLM Knowledge Discovery”, Farquhar et al 2023
“Calibrated Language Models Must Hallucinate”, Kalai & Vempala 2023
“R-Tuning: Teaching Large Language Models to Refuse Unknown Questions”, Zhang et al 2023
“Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation”, Shrivastava et al 2023
“Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Schoenegger & Park 2023
“The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets”, Marks & Tegmark 2023
“Representation Engineering: A Top-Down Approach to AI Transparency”, Zou et al 2023
“How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions”, Pacchiardi et al 2023
“Inference-Time Intervention: Eliciting Truthful Answers from a Language Model”, Li et al 2023
“Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned With Human Feedback”, Tian et al 2023
“How Language Model Hallucinations Can Snowball”, Zhang et al 2023
“Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
“GPT-4 Technical Report § Limitations: Calibration”, OpenAI 2023 (page 12)
“Toolformer: Language Models Can Teach Themselves to Use Tools”, Schick et al 2023
“Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
“Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
“Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
“Language Models (Mostly) Know What They Know”, Kadavath et al 2022
“Forecasting Future World Events With Neural Networks”, Zou et al 2022
“Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models”, Srivastava et al 2022
“Teaching Models to Express Their Uncertainty in Words”, Lin et al 2022
“Co-Training Improves Prompt-Based Learning for Large Language Models”, Lang et al 2022
“AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
“Calibrate Before Use: Improving Few-Shot Performance of Language Models”, Zhao et al 2021
“Reducing Conversational Agents’ Overconfidence through Linguistic Calibration”, Mielke et al 2020
Miscellaneous
Link Bibliography
- https://arxiv.org/abs/2310.13014: “Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Philipp Schoenegger, Peter S. Park
- https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”, Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah Smith
- https://arxiv.org/pdf/2303.08774.pdf#page=12&org=openai: “GPT-4 Technical Report § Limitations: Calibration”, OpenAI
- 2022-kolt.pdf: “Predicting Consumer Contracts [With GPT-3]”, Noam Kolt
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945: “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, John Nay
- https://arxiv.org/abs/2207.08143: “Can Large Language Models Reason about Medical Questions?”, Valentin Liévin, Christoffer Egeberg Hother, Ole Winther
- https://arxiv.org/abs/2207.05221#anthropic: “Language Models (Mostly) Know What They Know”, Kadavath et al
- https://arxiv.org/abs/2206.15474: “Forecasting Future World Events With Neural Networks”, Zou et al