See Also

Links
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
- “Vulnerability Detection With Code Language Models: How Far Are We?”, Ding et al 2024
- “The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge: Gilbert Herrera, Who Leads Research at the National Security Agency, Says Large Language Models Are Incredibly Useful—And a Bit of a Headache—For America’s Intelligence Machine”, Knight 2024
- “Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews”, Liang et al 2024
- “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”, Singh & Strouse 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- “The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, Renze & Guven 2024
- “Does Using ChatGPT Result in Human Cognitive Augmentation?”, Fulbright & Morrison 2024
- “TinyGSM: Achieving >80% on GSM8k With Small Language Models”, Liu et al 2023
- “Universal Self-Consistency for Large Language Model Generation”, Chen et al 2023
- “PEARL: Personalizing Large Language Model Writing Assistants With Generation-Calibrated Retrievers”, Mysore et al 2023
- “Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations”, Hong et al 2023
- “Data Contamination Through the Lens of Time”, Roberts et al 2023
- “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”, Callanan et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “GeoLLM: Extracting Geospatial Knowledge from Large Language Models”, Manvi et al 2023
- “Can a Computer Outfake a Human [personality]?”, Phillips & Robie 2023
- “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”, Zhou et al 2023
- “Using Large Language Models for Qualitative Analysis Can Introduce Serious Bias”, Ashwin et al 2023
- “Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve”, McCoy et al 2023
- “The Cambridge Law Corpus: A Corpus for Legal AI Research”, Östling et al 2023
- “Assessing the Nature of Large Language Models: A Caution against Anthropocentrism”, Speed 2023
- “A Boy Saw 17 Doctors over 3 Years for Chronic Pain. ChatGPT Found the Diagnosis”, Holohan 2023
- “Investigating the Existence of ‘Secret Language’ in Language Models”, Wang et al 2023
- “Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow”, Rio-Chanona et al 2023
- “Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”, Gu et al 2023
- “Explaining Competitive-Level Programming Solutions Using LLMs”, Li et al 2023
- “Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
- “Evaluating Superhuman Models With Consistency Checks”, Fluri et al 2023
- “Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks”, Veselovsky et al 2023
- “Can Large Language Models Democratize Access to Dual-Use Biotechnology?”, Soice et al 2023
- “Iterative Translation Refinement With Large Language Models”, Chen et al 2023
- “Don’t Want Students to Rely on ChatGPT? Have Them Use It: It’s Easy to Forget How Little Students and Educators Understand Generative AI’s Flaws. Once They Actually Try It Out, They’ll See That It Can’t Replace Them”, Howell 2023
- “The Exciting Potential for ChatGPT in Obstetrics and Gynecology”, Grünebaum et al 2023
- “Do GPTs Produce Less Literal Translations?”, Raunak et al 2023
- “The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
- “Learning to Generate Novel Scientific Directions With Contextualized Literature-Based Discovery”, Wang et al 2023
- “How Language Model Hallucinations Can Snowball”, Zhang et al 2023
- “LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions”, Wu et al 2023
- “Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition”, Muffo et al 2023
- “Generative AI at Work”, Brynjolfsson et al 2023
- “Humans in Humans Out: On GPT Converging Toward Common Sense in Both Success and Failure”, Koralus & Wang-Maścianica 2023
- “Language Models Can Solve Computer Tasks”, Kim et al 2023
- “Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”, Strong et al 2023
- “How Well Do Large Language Models Perform in Arithmetic Tasks?”, Yuan et al 2023
- “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”, Qin et al 2023
- “Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
- “Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, Pullen 2023
- “A Judge Just Used ChatGPT to Make a Court Decision: The Case Is the First Time a Court Has Admitted to Using the AI Text Generator’s Answers in a Legal Ruling”, Rose 2023
- “Co-Writing With Opinionated Language Models Affects Users’ Views”, Jakesch et al 2023
- “The inside Story of ChatGPT: How OpenAI Founder Sam Altman Built the World’s Hottest Technology With Billions from Microsoft”, Kahn 2023
- “How Close Is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection”, Guo et al 2023
- “Can GPT-3 Produce New Ideas? Partially Automating Robin Hanson and Others § If You Never Miss a Plane…”, Sempere 2023
- “How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment”, Gilson et al 2023
- “GPT-3 Takes the Bar Exam”, Bommarito II & Katz 2022
- “Precise Zero-Shot Dense Retrieval without Relevance Labels”, Gao et al 2022
- “Self-Instruct: Aligning Language Models With Self-Generated Instructions”, Wang et al 2022
- “Emergent Analogical Reasoning in Large Language Models”, Webb et al 2022
- “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”, Wiggers 2022
- “LMentry: A Language Model Benchmark of Elementary Language Tasks”, Efrat et al 2022
- “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”, Press et al 2022
- “How Persuasive Is AI-Generated Argumentation? An Analysis of the Quality of an Argumentative Text Produced by the GPT-3 AI Text Generator”, Hinton & Wagemans 2022
- “Out of One, Many: Using Language Models to Simulate Human Samples”, Argyle et al 2022
- “What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification (CuPL)”, Pratt et al 2022
- “Using Large Language Models to Simulate Multiple Humans”, Aher et al 2022
- “Limitations of Language Models in Arithmetic and Symbolic Induction”, Qian et al 2022
- “RealTime QA: What’s the Answer Right Now?”, Kasai et al 2022
- “GODEL: Large-Scale Pre-Training for Goal-Directed Dialog”, Peng et al 2022
- “Can GPT-3 Write an Academic Paper on Itself, With Minimal Human Input?”, GPT-3 et al 2022 (page 2)
- “NaturalProver: Grounded Mathematical Proof Generation With Language Models”, Welleck et al 2022
- “OPT: Open Pre-Trained Transformer Language Models”, Zhang et al 2022
- “InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022
- “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”, Min et al 2022
- “Impact of Pretraining Term Frequencies on Few-Shot Reasoning”, Razeghi et al 2022
- “Contracts in the Age of Smart Readers”, Arbel & Becher 2022
- “Memory-Assisted Prompt Editing to Improve GPT-3 After Deployment”, Madaan et al 2022
- “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”, Talmor et al 2022
- “Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”, Tu et al 2022
- “What Can a Generative Language Model Answer About a Passage?”, Summers-Stay et al 2021
- “Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, Solaiman & Dennison 2021
- “Scaling Laws for Autoregressive Generative Modeling”, Henighan et al 2020
- “GPT-3: Its Nature, Scope, Limits, and Consequences”, Floridi & Chiriatti 2020
- “MMLU: Measuring Massive Multitask Language Understanding”, Hendrycks et al 2020
- “GPT-3: Language Models Are Few-Shot Learners”, Brown et al 2020
- spolu
- “A Robot Wrote This Entire Article. Are You Scared Yet, Human? We Asked GPT-3, OpenAI’s Powerful New Language Generator, to Write an Essay for Us from Scratch. The Assignment? To Convince Us Robots Come in Peace | For More about GPT-3 and How This Essay Was Written and Edited, Please Read Our Editor’s Note Below”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
- lm-applications
- llm-bias
- medical-ai
- legal-tech
Miscellaneous
- https://automated.beehiiv.com/p/aiimmunity-challenge-lessons-clinical-research-exam
- https://chat.openai.com/share/25124525-0bad-4c13-ae5a-ae4beac60360
- https://davidabell.substack.com/p/playing-around-with-machine-translation
- https://dropbox.tech/machine-learning/prompt-injection-with-control-characters-openai-chatgpt-llm
- https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2812620
- https://jxnl.github.io/instructor/blog/2023/11/05/chain-of-density/
- https://openai.com/blog/function-calling-and-other-api-updates#function-calling
- https://restofworld.org/2023/ai-revolution-outsourced-workers/
- https://twitter.com/grantslatton/status/1703913578036904431
- https://twitter.com/kenshinsamurai9/status/1662510532585291779
- https://twitter.com/michaeltefula/status/1285505897108832257
- https://twitter.com/paulnovosad/status/1655925767333658626
- https://www.ft.com/content/9aeb482d-f781-45c0-896f-38fdcc912139
- https://www.getlibretto.com/blog/does-it-matter-which-examples-you-choose-for-few-shot-prompting
- https://www.integrity-research.com/ai-fails-insider-trading-test/
- https://www.nytimes.com/2023/06/08/business/khan-ai-gpt-tutoring-bot.html
- https://www.nytimes.com/2023/12/13/technology/chatbot-cheating-schools-students.html
- https://www.reddit.com/r/ChatGPT/comments/15et6f2/well_i_got_what_i_asked_for/
- https://www.reddit.com/r/OpenAI/comments/xlvygv/artifical_intelligence_allows_me_to_get_straight/
- https://www.supersimple.io/blog/gpt-4-fine-tuning-early-access
- https://www.vice.com/en/article/5d93p3/what-happens-when-you-ask-ai-to-control-your-life
- https://www.wired.com/story/china-chatgpt-opportunists-grifters-hard-at-work/
Link Bibliography
- https://arxiv.org/abs/2403.18624: “Vulnerability Detection With Code Language Models: How Far Are We?”
- https://www.wired.com/story/fast-forward-nsa-warns-us-adversaries-private-data-ai-edge/: “The NSA Warns That US Adversaries Free to Mine Private Data May Have an AI Edge: Gilbert Herrera, Who Leads Research at the National Security Agency, Says Large Language Models Are Incredibly Useful—And a Bit of a Headache—For America’s Intelligence Machine”
- https://arxiv.org/abs/2402.14903: “Tokenization Counts: the Impact of Tokenization on Arithmetic in Frontier LLMs”
- https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”
- https://arxiv.org/abs/2310.08678: “Can GPT Models Be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams”
- https://arxiv.org/abs/2310.06213: “GeoLLM: Extracting Geospatial Knowledge from Large Language Models”
- 2023-phillips.pdf: “Can a Computer Outfake a Human [personality]?”
- https://arxiv.org/abs/2310.04406: “Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models”
- https://arxiv.org/abs/2309.12269: “The Cambridge Law Corpus: A Corpus for Legal AI Research”
- https://arxiv.org/abs/2307.06439#microsoft: “Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events”
- https://arxiv.org/abs/2305.15717: “The False Promise of Imitating Proprietary LLMs”
- https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”
- https://www.medrxiv.org/content/10.1101/2023.03.24.23287731.full: “Performance of ChatGPT on Free-Response, Clinical Reasoning Exams”
- https://arxiv.org/abs/2304.02015#alibaba: “How Well Do Large Language Models Perform in Arithmetic Tasks?”
- https://arxiv.org/abs/2302.06476: “Is ChatGPT a General-Purpose Natural Language Processing Task Solver?”
- 2022-kolt.pdf: “Predicting Consumer Contracts [With GPT-3]”
- https://www.vice.com/en/article/k7bdmv/judge-used-chatgpt-to-make-court-decision: “A Judge Just Used ChatGPT to Make a Court Decision: The Case Is the First Time a Court Has Admitted to Using the AI Text Generator’s Answers in a Legal Ruling”
- https://arxiv.org/abs/2302.00560: “Co-Writing With Opinionated Language Models Affects Users’ Views”
- https://mededu.jmir.org/2023/1/e45312/: “How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment”
- https://arxiv.org/abs/2212.14402: “GPT-3 Takes the Bar Exam”
- https://arxiv.org/abs/2212.10496: “Precise Zero-Shot Dense Retrieval without Relevance Labels”
- https://arxiv.org/abs/2212.10560: “Self-Instruct: Aligning Language Models With Self-Generated Instructions”
- https://techcrunch.com/2022/11/23/harvey-which-uses-ai-to-answer-legal-questions-lands-cash-from-openai/: “Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI”
- https://arxiv.org/abs/2210.03350#allen: “Self-Ask: Measuring and Narrowing the Compositionality Gap in Language Models (Bamboogle)”
- https://content.iospress.com/articles/argument-and-computation/aac210026: “How Persuasive Is AI-Generated Argumentation? An Analysis of the Quality of an Argumentative Text Produced by the GPT-3 AI Text Generator”
- https://arxiv.org/abs/2209.03320: “What Does a Platypus Look Like? Generating Customized Prompts for Zero-Shot Image Classification (CuPL)”
- 2022-gpt3.pdf#page=2: “Can GPT-3 Write an Academic Paper on Itself, With Minimal Human Input?”
- https://arxiv.org/abs/2205.12910#allen: “NaturalProver: Grounded Mathematical Proof Generation With Language Models”
- https://arxiv.org/abs/2202.12837#facebook: “Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?”
- https://arxiv.org/abs/2201.05320#allen: “CommonsenseQA 2.0: Exposing the Limits of AI through Gamification”
- 2022-tu.pdf: “Limits of Using Artificial Intelligence and GPT-3 in Patent Prosecution”
- https://aclanthology.org/2021.mrqa-1.7.pdf: “What Can a Generative Language Model Answer About a Passage?”
- https://arxiv.org/abs/2010.14701#openai: “Scaling Laws for Autoregressive Generative Modeling”
- https://arxiv.org/abs/2009.03300: “MMLU: Measuring Massive Multitask Language Understanding”