See Also
Links
- “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Vacareanu et al 2024
- “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Liu et al 2024
- “FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization”, Kim et al 2024
- “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Jiang et al 2024
- “Using Hallucinations to Bypass GPT-4’s Filter”, Lemkin 2024
- “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
- “Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, Inie et al 2023
- “Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation”, Shah et al 2023
- “Specific versus General Principles for Constitutional AI”, Kundu et al 2023
- “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Chao et al 2023
- “Beyond Memorization: Violating Privacy Via Inference With Large Language Models”, Staab et al 2023
- “SWE-Bench: Can Language Models Resolve Real-World GitHub Issues?”, Jimenez et al 2023
- “Lost in the Middle: How Language Models Use Long Contexts”, Liu et al 2023
- “Opportunities and Risks of LLMs for Scalable Deliberation With Polis”, Small et al 2023
- “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Turpin et al 2023
- “A Radical Plan to Make AI Good, Not Evil”, Knight 2023
- “Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- “The Perception of Rhythm in Language”, Cutler 1994
Miscellaneous
- https://docs.parea.ai/blog/benchmarking-anthropic-beta-tool-use
- https://marginalrevolution.com/marginalrevolution/2023/01/ai-passes-law-and-economics-exam.html
- https://nostalgebraist.tumblr.com/post/728556535745232896/claude-is-insufferable
- https://simonwillison.net/2024/Apr/17/ai-for-data-journalism/
- https://thezvi.wordpress.com/2023/07/25/anthropic-observations/
- https://twitter.com/AnthonyLeeZhang/status/1768639726557209082
- https://twitter.com/IntuitMachine/status/1678870325600108545
- https://twitter.com/IntuitMachine/status/1766205754304827407
- https://twitter.com/LouisKnightWebb/status/1724510794514157668
- https://twitter.com/OwainEvans_UK/status/1636580251676585986
- https://twitter.com/OwainEvans_UK/status/1636581594642403328
- https://twitter.com/OwainEvans_UK/status/1636605571637055488
- https://twitter.com/OwainEvans_UK/status/1636762386085605376
- https://twitter.com/alexalbert__/status/1780707227130863674
- https://twitter.com/amandaaskell/status/1765207842993434880
- https://twitter.com/anton_bakhtin/status/1764701559844147359
- https://twitter.com/daniel_271828/status/1769853886163296455
- https://twitter.com/elder_plinius/status/1774220858711490909
- https://twitter.com/futuristfrog/status/1777063159553040700
- https://twitter.com/fxturevescent/status/1776456827741323323
- https://twitter.com/jeremyphoward/status/1765529891343339804
- https://twitter.com/jeremyphoward/status/1779311134656671872
- https://twitter.com/kindgracekind/status/1770671231190127090
- https://twitter.com/metachirality/status/1769818226718888426
- https://twitter.com/metachirality/status/1769905644725830090
- https://twitter.com/peligrietzer/status/1678912319743459328
- https://verse.systems/blog/post/2024-03-09-using-llms-to-generate-fuzz-generators/
- https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks?commentId=yqCkCQLkkaCnZCukJ
- https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq
- https://www.reddit.com/r/OpenAI/comments/1bm305k/what_the_hell_claud_3_opus_is_a_straight/
- https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2
- https://xmarquez.github.io/GPTDemocracyIndex/GPTDemocracyIndex.html
Link Bibliography
- https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran
- https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Hubinger et al 2024
- https://arxiv.org/abs/2310.08419: “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
- https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman
- https://www.wired.com/story/anthropic-ai-chatbots-ethics/: “A Radical Plan to Make AI Good, Not Evil”, Will Knight
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021