Neural code synthesis has reached a point where snippet generation is accurate enough to be considered for integration into human software development workflows. Commercial products aim to increase programmers’ productivity, without being able to measure it directly.
In this case study, we asked users of GitHub Copilot about its impact on their productivity, and sought to find a reflection of their perception in directly measurable user data.
We find that the rate at which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers’ perception of productivity.
…In this work we define acceptance rate as the fraction of completions shown to the developer that are subsequently accepted for inclusion in the source file. The IntelliCode Compose system uses the term CTR (Click Through Rate) for this and reports a value of 10% in online trials [12]. An alternative measure is that of DCPU (Daily Completions accepted Per User), for which a value of around 20 has been reported [3, 21]. To compare DCPU with acceptance rate one must, of course, also account for the time spent coding each day. For context, in our study GitHub Copilot has an acceptance rate of 27% and a mean DCPU in excess of 31.
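To make these definitions concrete, here is a minimal sketch of how the metrics relate for a single hypothetical user-day (the counts below are illustrative, not data from the study):

```python
# Hypothetical telemetry counts for one user-day (not data from the study).
shown_completions = 120      # completions displayed to the developer
accepted_completions = 32    # completions accepted into the source file
coding_hours = 5.0           # active coding time that day

acceptance_rate = accepted_completions / shown_completions   # ~0.27, i.e. 27%
dcpu = accepted_completions                                  # Daily Completions accepted Per User
accepted_per_hour = dcpu / coding_hours                      # needed to compare DCPU across users

print(f"acceptance rate: {acceptance_rate:.0%}, DCPU: {dcpu}, accepted/hour: {accepted_per_hour:.1f}")
```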
…Language Use: We are aware that there are substantial differences in how GitHub Copilot performs across programming languages. The most common languages among our user base are TypeScript (24.7% of all shown completions in the observed time frame, 21.9% for users in the survey), JavaScript (21.3%, 24.2%), and Python (14.1%, 14.5%). The latter 2 enjoy higher acceptance rates, possibly hinting at a relative strength of neural tooling versus deductive tooling for untyped languages. Regardless of language, survey participants had a slightly higher acceptance rate than the whole user base.
Figure 5: Programming language use by survey participants vs. all users.
…We were surprised to find that acceptance rate (number of acceptances normalized by the number of shown completions) was better correlated with reported productivity than our measures of persistence.
But in hindsight, this makes sense. Coding is not typing, and GitHub Copilot’s central value lies not in maximizing the number of lines of code the user enters. Instead, it lies in helping the user make the best progress towards their goals. A suggestion that serves as a useful template to tinker with may be as good as or better than a perfectly correct (but obvious) line of code that only saves the user a few keystrokes.
This suggests that a narrow focus on the correctness of suggestions would not tell the whole story for this kind of tooling. Instead one could view code suggestions inside an IDE as more akin to a conversation with a chatbot. We see anecdotal evidence of this in comments posted about GitHub Copilot online (see Appendix E for examples) in which users talk about sequences of interactions. A conversation turn in this context consists of the prompt (the completion request) and the reply (the completion itself). The developer’s response to the completion is expressed through their subsequent changes, which are incorporated in the next prompt to the model. And there are clear programming parallels to factors such as specificity and repetition that have been identified as affecting human judgements of conversation quality [11]. Researchers have already investigated the benefits of natural language feedback to guide program synthesis [2] and so ours is not a radical proposal. But neither is it one we have seen followed.
In future work, we wish to further explore this analogy, borrowing ideas [16] from the evaluation of chatbots and natural language text generation.
Symbolic regression, the task of predicting the mathematical expression of a function from the observation of its values, is a difficult task which usually involves a two-step procedure: predicting the “skeleton” of the expression up to the choice of numerical constants, then fitting the constants by optimizing a non-convex loss function. The dominant approach is genetic programming, which evolves candidates by iterating this subroutine a large number of times. Neural networks have recently been tasked to predict the correct skeleton in a single try, but remain much less powerful. In this paper, we challenge this two-step procedure, and task a Transformer to directly predict the full mathematical expression, constants included. One can subsequently refine the predicted constants by feeding them to the non-convex optimizer as an informed initialization. We present ablations to show that this end-to-end approach yields better results, sometimes even without the refinement step. We evaluate our model on problems from the SRBench benchmark and show that our model approaches the performance of state-of-the-art genetic programming with several orders of magnitude faster inference.
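A minimal sketch of the constant-fitting subroutine that the two-step pipeline iterates, and that the end-to-end model’s refinement step reuses with an informed initialization; the skeleton a·sin(b·x) + c and the data are hypothetical, and SciPy’s BFGS stands in for whatever non-convex optimizer is actually used:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical target data: f(x) = 2*sin(3*x) + 1, observed on a grid.
x = np.linspace(-2, 2, 200)
y = 2 * np.sin(3 * x) + 1

# Skeleton predicted up to constants: a*sin(b*x) + c.
def skeleton(consts, x):
    a, b, c = consts
    return a * np.sin(b * x) + c

# Non-convex loss over the constants; success depends heavily on the initialization,
# which is why an informed initial guess from the model is valuable.
def loss(consts):
    return np.mean((skeleton(consts, x) - y) ** 2)

# An end-to-end model would supply its predicted constants here instead of (1, 1, 0).
result = minimize(loss, x0=[1.0, 1.0, 0.0], method="BFGS")
print(result.x, result.fun)
```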
We perform an empirical evaluation of Text-to-SQL capabilities of the Codex language model. We find that, without any finetuning, Codex is a strong baseline on the Spider benchmark; we also analyze the failure modes of Codex in this setting. Furthermore, we demonstrate on the GeoQuery and Scholar benchmarks that a small number of in-domain examples provided in the prompt enables Codex to perform better than state-of-the-art models finetuned on such few-shot examples.
…Prompt design is critical for performance: As seen in Table 2, providing the question alone results in a low 8.3% execution accuracy. There is a progressive improvement to 56.8% as schema information is introduced in ‘API Docs’, to 59.9% when valid SQL and foreign key information is used in ‘Create Table’, and to 67.0% when database content is introduced with ‘Create Table + Select 3’.
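The prompt formats are only named in the excerpt; the sketch below shows roughly what a ‘Create Table + Select 3’-style prompt might look like for a hypothetical schema (not one of the Spider databases):

```python
# Roughly what a "Create Table + Select 3" style prompt might look like
# (hypothetical schema; the benchmark databases differ).
prompt = """-- SQLite database schema
CREATE TABLE singer (
    singer_id INTEGER PRIMARY KEY,
    name TEXT,
    country TEXT
);
CREATE TABLE concert (
    concert_id INTEGER PRIMARY KEY,
    singer_id INTEGER,
    year INTEGER,
    FOREIGN KEY (singer_id) REFERENCES singer(singer_id)
);
/*
3 example rows:
SELECT * FROM singer LIMIT 3;
singer_id  name        country
1          Joe Sharp   Netherlands
2          Timbaland   United States
3          Rose White  France
*/

-- Question: How many singers are from France?
SELECT"""

# The model is expected to continue the prompt with something like:
#   COUNT(*) FROM singer WHERE country = 'France';
```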
Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (eg. Codex (Chen et al 2021)) are not publicly available, leaving many questions about their model and data design decisions.
We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages. Although Codex itself is not open-source, we find that existing open-source models do achieve close results in some programming languages, although targeted mainly for natural language modeling.
We further identify an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code. We release a new model, PolyCoder, with 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine. In the C programming language, PolyCoder outperforms all models including Codex.
Our trained models are open-source and publicly available at https://github.com/VHellendoorn/Code-LMs, which enables future research and application in this area.
[blog; examples; FIQA benchmarks; Viable case-study] Text embeddings are useful features in many applications such as semantic search and computing text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and model architecture.
In this work, we show that contrastive pre-training on unsupervised data at scale leads to high quality vector representations of text and code. The same unsupervised text embeddings that achieve new state-of-the-art results in linear-probe classification also display impressive semantic search capabilities and sometimes even perform competitively with fine-tuned models.
On linear-probe classification accuracy averaging over 7 tasks, our best unsupervised model achieves a relative improvement of 4% and 1.8% over previous best unsupervised and supervised text embedding models respectively. The same text embeddings, when evaluated on large-scale semantic search, attain a relative improvement of 23.4%, 14.7%, and 10.6% over previous best unsupervised methods on MS MARCO, Natural Questions and TriviaQA benchmarks, respectively.
Similarly to text embeddings, we train code embedding models on (text, code) pairs, obtaining a 20.8% relative improvement over prior best work on code search.
Figure 1: Average performance of unsupervised cpt-text models of different sizes across 22 tasks consisting of linear-probe classification, text search, and sentence similarity tasks.
…we leverage naturally occurring paired data to construct training data with no explicit labels. Text embedding models are trained on paired text data where we consider neighboring pieces of text on the Internet as positive pairs. Code embedding models treat the top-level docstring in a function along with its implementation as a (text, code) pair. The training signal of the contrastive objective on its own is not sufficient to learn useful representations and we overcome this by initializing our model with other pretrained models (Brown et al 2020; Chen et al 2021). Finally, we find that it is critical to use a sufficiently large batch to achieve the optimal performance. We show that this simple recipe combining pre-trained model initialization, large-batch contrastive learning and training at scale, can produce text and code embeddings that possess a broad range of capabilities.
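A minimal sketch of a contrastive objective with in-batch negatives of the kind described here, written in PyTorch; the symmetric form and the temperature value are illustrative assumptions rather than details from the paper:

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, pos_emb, temperature=0.07):
    # Normalize embeddings so the dot product is cosine similarity.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    # Similarity of every query against every positive in the batch:
    # diagonal entries are the true pairs, off-diagonals act as negatives,
    # so a larger batch means more (and harder) negatives per example.
    logits = q @ p.t() / temperature
    targets = torch.arange(q.size(0), device=q.device)
    # Symmetric cross-entropy over both directions.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Example with random embeddings standing in for encoder outputs.
q = torch.randn(256, 768)   # one side of each positive pair
p = torch.randn(256, 768)   # the corresponding paired texts (or code)
print(in_batch_contrastive_loss(q, p))
```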
We train a series of unsupervised text embedding models (cpt-text) of different sizes, ranging from 300M to 175B parameters, and observe a consistent performance improvement with increasing model sizes (Figure 1). On classification accuracy averaging across 7 linear-probe classification tasks in SentEval (Conneau & Kiela 2018), our largest unsupervised model achieves new state-of-the-art results with a relative improvement of 4% and 1.8% over the previous best unsupervised (Giorgi et al 2020) and supervised (Gao et al 2021) text embedding models, respectively.
…Next, we train code embedding models (cpt-code) using the same recipe. Our models learn via (text, code) pairs, extracted from open source code. We evaluate our model on CodeSearchNet (Husain et al 2020), a commonly used code search benchmark, where the task is to find the most relevant code snippet given a natural language query. Our models achieve new state-of-the-art results with a 20.8% relative improvement over the previous best result (Guo et al 2021). Unlike text embedding models, we observe no performance improvement on code search when increasing the number of parameters of cpt-code from 300M to 1.2B.
Finally, we experiment with fine-tuning our models on several supervised datasets and study the transfer learning performance. When fine-tuned on NLI (Natural Language Inference) datasets, we see a further boost in linear-probe classification, outperforming the previous best transfer method (Gao et al 2021) by 2.2%. On SST-2 sentiment classification (Socher et al 2013), we find that our representations are sufficiently descriptive that even a simple k-NN classifier achieves results comparable to a linear-probe classifier. Interestingly, zero-shot performance with our embeddings outperforms the supervised neural network models introduced along with the release of the SST-2 dataset. We also fine-tune the unsupervised model on MS-MARCO and evaluate it on a suite of zero-shot search tasks in the BEIR benchmark (Thakur et al 2021). In the transfer setting, our models achieve a 5.2% relative improvement over previous methods (Izacard et al 2021) and are comparable even with methods (Santhanam et al 2021; Formal et al 2021; Wang et al 2020) that demand substantially more computation at test time.
…3.4.1. Effect Of Batch Size: Our ablation study highlights the effect of the model’s batch size on the final performance. Table 9 compares the performance of S (300M) cpt-text model trained with different batch sizes on the NQ development set. Since we train with in-batch negatives, a larger batch increases the chances of having hard negatives in a batch, resulting in a substantial performance boost.
Table 9: Performance of the cpt-text 300M model on NQ dev set given different training batch sizes.
As artificial intelligence (AI) technologies become increasingly powerful and prominent in society, their misuse is a growing concern. In educational settings, AI technologies could be used by students to cheat on assignments and exams. In this paper we explore whether transformers can be used to solve introductory level programming assignments while bypassing commonly used AI tools to detect plagiarism. We find that a student using GPT-J [Wang & Komatsuzaki 2021] can complete introductory level programming assignments without triggering suspicion from MOSS [Aiken, 2000], a widely used plagiarism detection tool. This holds despite the fact that GPT-J was not trained on the problems in question and is not provided with any examples to work from. We further find that the code written by GPT-J is diverse in structure, lacking any particular tells that future plagiarism detection techniques may use to try to identify algorithmically generated code. We conclude with a discussion of the ethical and educational implications of large language models and directions for future research.
Symbolic regression, i.e. predicting a function from the observation of its values, is well-known to be a challenging task.
In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature.
We evaluate our integer model on a subset of OEIS sequences, and show that it outperforms built-in Wolfram Mathematica functions for recurrence prediction. We also demonstrate that our float model is able to yield informative approximations of out-of-vocabulary functions and constants, eg. bessel0(x) ≈ (sin(x) + cos(x))/√(πx) and 1.644934 ≈ π²/6.
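Both reported approximations are easy to check numerically:

```python
import numpy as np
from scipy.special import j0

x = np.linspace(5, 50, 1000)
approx = (np.sin(x) + np.cos(x)) / np.sqrt(np.pi * x)
print(np.max(np.abs(j0(x) - approx)))   # small and shrinking for larger x (asymptotic approximation)
print(np.pi ** 2 / 6)                   # 1.6449340668..., matching the reported constant
```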
We demonstrate that a neural network pre-trained on text and fine-tuned on code solves Mathematics problems by program synthesis. We turn questions into programming tasks, automatically generate programs, and then execute them, perfectly solving university-level problems from MIT’s large Mathematics courses (Single Variable Calculus 18.01, Multivariable Calculus 18.02, Differential Equations 18.03, Introduction to Probability & Statistics 18.05, Linear Algebra 18.06, and Mathematics for Computer Science 6.042) as well as questions from a MATH dataset (on Prealgebra, Algebra, Counting and Probability, Number Theory, and Precalculus), the latest benchmark of advanced mathematics problems specifically designed to assess mathematical reasoning. We explore prompt generation methods that enable Transformers to generate question solving programs for these subjects, including solutions with plots. We generate correct answers for a random sample of questions in each topic. We quantify the gap between the original and transformed questions and perform a survey to evaluate the quality and difficulty of generated questions. This is the first work to automatically solve, grade, and generate university-level Mathematics course questions at scale which represents a milestone for higher education.
[paper] We’ve fine-tuned GPT-3 to more accurately answer open-ended questions using a text-based web browser. Our prototype copies how humans research answers to questions online—it submits search queries, follows links, and scrolls up and down web pages. It is trained to cite its sources, which makes it easier to give feedback to improve factual accuracy. We’re excited about developing more truthful AI, but challenges remain, such as coping with unfamiliar types of questions.
Language models like GPT-3 are useful for many different tasks, but have a tendency to “hallucinate” information when performing tasks requiring obscure real-world knowledge. To address this, we taught GPT-3 to use a text-based web-browser. The model is provided with an open-ended question and a summary of the browser state, and must issue commands such as “Search …”, “Find in page: …” or “Quote: …”. In this way, the model collects passages from web pages, and then uses these to compose an answer.
The model is fine-tuned from GPT-3 using the same general methods we’ve used previously. We begin by training the model to copy human demonstrations, which gives it the ability to use the text-based browser to answer questions. Then we improve the helpfulness and accuracy of the model’s answers, by training a reward model to predict human preferences, and optimizing against it using either reinforcement learning or rejection sampling.
…Our models outperform GPT-3 on TruthfulQA and exhibit more favourable scaling properties. However, our models lag behind human performance, partly because they sometimes quote from unreliable sources (as shown in the question about ghosts above). We hope to reduce the frequency of these failures using techniques like adversarial training.
…Evaluating factual accuracy: …
However, this approach raises a number of questions. What makes a source reliable? What claims are obvious enough to not require support? What trade-off should be made between evaluations of factual accuracy and other criteria such as coherence? All of these were difficult judgment calls. We do not think that our model picked up on much of this nuance, since it still makes basic errors. But we expect these kinds of decisions to become more important as AI systems improve, and cross-disciplinary research is needed to develop criteria that are both practical and epistemically sound. We also expect further considerations such as transparency to be important.
Eventually, having models cite their sources will not be enough to evaluate factual accuracy. A sufficiently capable model would cherry-pick sources it expects humans to find convincing, even if they do not reflect a fair assessment of the evidence. There are already signs of this happening (see the questions about boats above). We hope to mitigate this using methods like debate.
We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web.
By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers.
We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences.
This model’s answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
Figure 2: Human evaluations on ELI5 comparing against (a) demonstrations collected using our web browser, (b) the highest-voted answer for each question. The amount of rejection sampling (the n in best-of-n) was chosen to be compute-efficient (see Figure 8). Error bars represent ±1 standard error.
Figure 3: TruthfulQA results. The amount of rejection sampling (the n in best-of-n) was chosen to be compute-efficient (see Figure 8). Error bars represent ±1 standard error.
In this work we leverage existing solutions to these components: we outsource document retrieval to the Microsoft Bing Web Search API, and utilize unsupervised pre-training to achieve high-quality synthesis by fine-tuning GPT-3. Instead of trying to improve these ingredients, we focus on combining them using more faithful training objectives. Following Stiennon et al 2020, we use human feedback to directly optimize answer quality, allowing us to achieve performance competitive with humans.
We make 2 key contributions:
We create a text-based web-browsing environment that a fine-tuned language model can interact with. This allows us to improve both retrieval and synthesis in an end-to-end fashion using general methods such as imitation learning and reinforcement learning.
We generate answers with references: passages extracted by the model from web pages while browsing. This is crucial for allowing labelers to judge the factual accuracy of answers, without engaging in a difficult and subjective process of independent research.
…We use this data in 4 main ways: behavior cloning (ie. supervised fine-tuning) using the demonstrations, reward modeling using the comparisons, reinforcement learning against the reward model, and rejection sampling against the reward model. Our best model uses a combination of behavior cloning and rejection sampling. We also find reinforcement learning to provide some benefit when inference-time compute is more limited.
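A minimal sketch of the rejection-sampling (best-of-n) step, assuming hypothetical `policy` and `reward_model` interfaces rather than the paper’s actual code:

```python
def best_of_n(question, references, policy, reward_model, n=64):
    """Sample n candidate answers and return the one the reward model scores highest.

    `policy.sample` and `reward_model.score` are hypothetical interfaces: the real
    system samples from a fine-tuned GPT-3 policy and scores with a separately
    trained reward model predicting human preferences.
    """
    candidates = [policy.sample(question, references) for _ in range(n)]
    scores = [reward_model.score(question, references, answer) for answer in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```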
…We evaluate our best model in 3 different ways. First, we compare our model’s answers to answers written by our human demonstrators on a held-out set of questions. Our model’s answers are preferred 56% of the time, demonstrating human-level usage of the text-based browser. Second, we compare our model’s answers (with references stripped, for fairness) to the highest-voted answer provided by the ELI5 dataset. Our model’s answers are preferred 69% of the time. Third, we evaluate our model on TruthfulQA, an adversarial dataset of short-form questions. Our model’s answers are true 75% of the time, and are both true and informative 54% of the time, outperforming our base model (GPT-3), but falling short of human performance.
Figure 1: An observation from our text-based web-browsing environment, as shown to human demonstrators (left) and models (right). The web page text has been abridged for illustrative purposes.
…Environment design: …For this approach, we designed a text-based web-browsing environment. The language model is prompted with a written summary of the current state of the environment, including the question, the text of the current page at the current cursor location, and some other information (see Figure 1(b)). In response to this, the model must issue one of the commands given in Table 1, which performs an action such as running a Bing search, clicking on a link, or scrolling around. This process is then repeated with a fresh context (hence, the only memory of previous steps is what is recorded in the summary).
While the model is browsing, one of the actions it can take is to quote an extract from the current page. When this is performed, the page title, domain name and extract are recorded to be used later as a reference. Browsing then continues until either the model issues a command to end browsing, the maximum number of actions has been reached, or the maximum total length of references has been reached. At this point, as long as there is at least one reference, the model is prompted with the question and the references, and must compose its final answer
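A minimal sketch of that browsing loop; the command strings and the `model`/`browser` interfaces below are illustrative assumptions based on the description above, not the paper’s implementation:

```python
def browse_and_answer(question, model, browser, max_actions=100, max_reference_chars=4000):
    """Sketch of the text-based browsing loop: the model sees a summary of the browser
    state, issues one command per step, and its quotes accumulate as references."""
    references = []
    for _ in range(max_actions):
        observation = browser.summarize_state(question, references)  # hypothetical interface
        command = model.next_command(observation)                    # e.g. "Search ...", "Quote: ..."
        if command.startswith("Quote:"):
            references.append({"title": browser.page_title,
                               "domain": browser.domain,
                               "extract": command[len("Quote:"):].strip()})
        elif command.startswith("End"):          # model decides to stop browsing
            break
        else:
            browser.execute(command)             # search, click a link, scroll, find in page, ...
        if sum(len(r["extract"]) for r in references) >= max_reference_chars:
            break                                # maximum total length of references reached
    # If at least one reference was collected, compose the final answer from them.
    return model.compose_answer(question, references) if references else None
```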
Table 2: An answer produced by our 175B best-of-64 model to a randomly-chosen question from the ELI5 test set (not cherry-picked). The full text of the references can be found in Appendix J, along with answers from our human demonstrators and the ELI5 dataset. Further samples are available.
Question: Why did we decide that certain words were “bad” and shouldn’t be used in social settings?

Answer: We have all been taught that there are certain words that should not be used in public, at least not without good reason. But why do some words fall into this category while others do not? It turns out that there is not a single answer to this question, as each word has a different history and carries different meanings in different cultures and families [2]. However, there are some general categories that “bad” words tend to fall into. “Bad” words generally relate to parts of life that we don’t like talking about in public, like bathroom functions, or negative ways of talking about people’s religion, skin color, ability, or gender [3]. Often, people say bad words when they are experiencing strong emotions, and those words are linked to our emotions [3].

References (titles only):
[1, 2, 3] Why Are Some Words ‘Bad’? | Vermont Public Radio (www.vpr.org)
[4] On Words: ‘Bad’ Words and Why We Should Study Them | UVA Today (news.virginia.edu)
[5] The Science of Curse Words: Why The &@$! Do We Swear? (www.babbel.com)
Figure 5: Preference of the 175B best-of-n BC model over the BC model. The validation RM prediction is obtained using the estimator described in Appendix I, and predicts human preference well in this setting. The shaded region represents ±1 standard error.
…Our results are shown in Figure 2. Our best model, the 175B best-of-64 model, produces answers that are preferred to those written by our human demonstrators 56% of the time. This suggests that the use of human feedback is essential, since one would not expect to exceed 50% preference by imitating demonstrations alone (although it may still be possible, by producing a less noisy policy). The same model produces answers that are preferred to the reference answers from the ELI5 dataset 69% of the time. This is a substantial improvement over Krishna et al 2021, whose best model’s answers are preferred 23% of the time to the reference answers, although they use substantially less compute than even our smallest model.
Figure 6: BC scaling, varying the proportion of the demonstration dataset and parameter count of the policy.
…The combination of RL and rejection sampling also fails to offer much benefit over rejection sampling alone. One possible reason for this is that RL and rejection sampling are optimizing against the same reward model, which can easily be overoptimized (especially by RL, as noted above). In addition to this, RL reduces the entropy of the policy, which hurts exploration. Adapting the RL objective to optimize rejection sampling performance is an interesting direction for future research. It is also worth highlighting the importance of carefully tuning the BC baseline for these comparisons. As discussed in Appendix E, we tuned the number of BC epochs and the sampling temperature using a combination of human evaluations and reward model score. This alone closed much of the gap we originally saw between BC and RL.
Figure 7: RM scaling, varying the proportion of the comparison dataset and parameter count of the reward model.
…Scaling trends with dataset size and parameter count are shown in Figures 6 and 7. For dataset size, doubling the number of demonstrations increased the policy’s reward model score by about 0.13, and doubling the number of comparisons increased the reward model’s accuracy by about 1.8%. For parameter count, the trends were noisier, but doubling the number of parameters in the policy increased its reward model score by roughly 0.09, and doubling the number of parameters in the reward model increased its accuracy by roughly 0.4%.
Figure 8: Best-of-n scaling, varying the parameter count of the policy and reward model together, as well as the number of answers sampled.
…For rejection sampling, we analyzed how to trade off the number of samples against the number of model parameters for a given inference-time compute budget (see Figure 8). We found that it is generally compute-efficient to use some amount of rejection sampling, but not too much. The models for our main evaluations come from the Pareto frontier of this trade-off: the 760M best-of-4 model, the 13B best-of-16 model, and the 175B best-of-64 model. [cf. Jones 2021]
Large language models, prompted with in-context examples, can perform semantic parsing with little training data. They do better when we formulate the problem as paraphrasing into canonical utterances, which cast the underlying meaning representations into a controlled natural language-like representation. Intuitively, such models can more easily output canonical utterances as they are closer to the natural language used for pre-training. More recently, models also pre-trained on code, like OpenAI Codex, have risen in prominence. Since accurately modeling code requires understanding of executable semantics, such models may prove more adept at semantic parsing. In this paper, we test this hypothesis and find that Codex performs better at semantic parsing than equivalent GPT-3 models. We find that unlike GPT-3, Codex performs similarly when targeting meaning representations directly, perhaps because the meaning representations used in semantic parsing are structured similarly to code.
Large pre-trained language models such as GPT-3, Codex, and Google’s language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and caution. On the optimistic side, such large language models have the potential to improve productivity by providing an automated AI pair programmer for every programmer in the world. On the cautionary side, since these large language models do not understand program semantics, they offer no guarantees about quality of the suggested code.
In this paper, we present an approach to augment these large language models with post-processing steps based on program analysis and synthesis techniques, that understand the syntax and semantics of programs.
Further, we show that such techniques can make use of user feedback and improve with usage. We present our experiences from building and evaluating such a tool, Jigsaw, targeted at synthesizing code for the Python Pandas API from multi-modal inputs [text description, test cases, code].
Our experience suggests that as these large language models evolve for synthesizing code from intent, Jigsaw has an important role to play in improving the accuracy of the systems.
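A minimal sketch of the kind of post-processing described here: filtering model-generated Pandas candidates against user-supplied test cases (the real Jigsaw system also performs program repair and learns from feedback, which this sketch omits):

```python
import pandas as pd

def first_passing_candidate(candidates, test_cases):
    """Return the first candidate function that reproduces every (input, expected output) pair.

    `candidates` are model-generated functions taking a DataFrame and returning a DataFrame;
    `test_cases` are (input_df, expected_df) pairs supplied by the user.
    """
    for candidate in candidates:
        try:
            if all(candidate(inp.copy()).equals(expected) for inp, expected in test_cases):
                return candidate
        except Exception:
            continue  # a crashing candidate simply fails validation
    return None

# Hypothetical usage: two candidate completions for "drop rows with missing values".
candidates = [
    lambda df: df.fillna(0),     # wrong: fills instead of dropping
    lambda df: df.dropna(),      # right
]
inp = pd.DataFrame({"a": [1.0, None, 3.0]})
expected = inp.dropna()
print(first_passing_candidate(candidates, [(inp, expected)]))
```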
Program merging is standard practice when developers integrate their individual changes to a common code base. When the merge algorithm fails, this is called a merge conflict. The conflict either manifests in textual merge conflicts where the merge fails to produce code, or semantic merge conflicts where the merged code results in compiler or test breaks. Resolving these conflicts for large code projects is expensive because it requires developers to manually identify the sources of conflict and correct them.
In this paper, we explore the feasibility of automatically repairing merge conflicts (both textual and semantic) using k-shot learning with large neural language models (LM) such as GPT-3 [but not Codex/Copilot, or finetuned GPT-3]. One of the challenges in leveraging such language models is to fit the examples and the queries within a small prompt (2,048 tokens). We evaluate LMs and k-shot learning for 2 broad applications: (1) textual and semantic merge conflicts for a divergent fork, Microsoft Edge, and (2) textual merge conflicts for a large number of JavaScript projects in GitHub.
Our results are mixed: on the one hand, LMs provide state-of-the-art (SOTA) performance on semantic merge conflict resolution for Edge compared to earlier symbolic approaches; on the other hand, LMs do not yet obviate the benefits of fine-tuning neural models (when sufficient data is available) or the design of special-purpose domain-specific languages (DSLs) for restricted patterns for program synthesis.
Figure 8: Accuracy Comparison for Gmerge on Resolving Semantic Merge Conflicts
…The evaluation shows that including a shot in the input to the language model substantially improves the results across all prompt structures. This matches our predictions, because a shot not only clearly pinpoints the current task to the model but also provides an example of the expected output. Moreover, the evaluation shows that providing more conflict-related code changes as context improves the accuracy of the model. It further shows that, with the heuristics, Gmerge achieved the highest accuracy of 64.6%.
GPT-3 and GPT-J each output one resolution per model trial. In our experiment, we repeatedly query GPT-3, and if the resolution is produced in any of the trials, we mark the merge conflict as “resolved”. We evaluated how the number of trials affects the model accuracy. Figure 8 shows that the overall accuracy of both GPT-3 and GPT-J increased with the number of model trials. For example, for GPT-3, 10 independent trials achieve an accuracy of 64.6%, in contrast to 37.2% with only one trial. Compared to the GPT-3 model, we only observed a modest accuracy gain in the GPT-J model. StringMerge and Transformation.Text have no accuracy gain because they produce a deterministic result in every run.
Are larger language models more accurate than smaller ones? One benefit of Gmerge is that its k-shot approach does not require expensive task-specific fine-tuning. Thus, Gmerge can benefit from large-scale autoregressive language models. In this section, we demonstrate that the size of the model has a substantial impact on Gmerge’s task-specific accuracy. Figure 8 shows that the overall accuracy of GPT-3 increased more sharply than that of GPT-J as the number of model queries increases. Approximately 30% of the additional merge conflicts are resolved if we query the GPT-3 model multiple times. In contrast, for GPT-J, only 5% of the additional merge conflicts can be resolved in this setting.
We solve university level probability and statistics questions by program synthesis using OpenAI’s Codex, a Transformer trained on text and fine-tuned on code. We transform course problems from MIT’s 18.05 Introduction to Probability and Statistics and Harvard’s STAT110 Probability into programming tasks. We then execute the generated code to get a solution. Since these course questions are grounded in probability, we often aim to have Codex generate probabilistic programs that simulate a large number of probabilistic dependencies to compute its solution. Our approach requires prompt engineering to transform the question from its original form to an explicit, tractable form that results in a correct program and solution. To estimate the amount of work needed to translate an original question into its tractable form, we measure the similarity between original and transformed questions. Our work is the first to introduce a new dataset of university-level probability and statistics problems and solve these problems in a scalable fashion using the program synthesis capabilities of large language models.
We solve MIT’s Linear Algebra 18.06 course and Columbia University’s Computational Linear Algebra COMS3251 courses with perfect accuracy by interactive program synthesis. This surprisingly strong result is achieved by turning the course questions into programming tasks and then running the programs to produce the correct answers. We use OpenAI Codex with zero-shot learning, without providing any examples in the prompts, to synthesize code from questions. We quantify the difference between the original question text and the transformed question text that yields a correct answer. Since none of the COMS3251 questions are available online, the model is not overfitting. We go beyond just generating code for questions with numerical answers by interactively generating code that also produces visually pleasing plots as output. Finally, we automatically generate new questions given a few sample questions which may be used as new course content. This work is a significant step forward in solving quantitative math problems and opens the door for solving many university-level STEM courses by machine.
There is burgeoning interest in designing AI-based systems to assist humans in designing computing systems, including tools that automatically generate computer code. The most notable of these comes in the form of the first self-described ‘AI pair programmer’, GitHub Copilot, a language model trained over open-source GitHub code. However, code often contains bugs—and so, given the vast quantity of unvetted code that Copilot has processed, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns about the security of Copilot’s code contributions. In this work, we systematically investigate the prevalence and conditions that can cause GitHub Copilot to recommend insecure code. To perform this analysis we prompt Copilot to generate code in scenarios relevant to high-risk CWEs (eg. those from MITRE’s “Top 25” list). We explore Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains. In total, we produce 89 different scenarios for Copilot to complete, producing 1,692 programs. Of these, we found ~40% to be vulnerable.
This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes.
Our benchmarks are designed to measure the ability of these models to synthesize short Python programs from natural language descriptions. The Mostly Basic Programming Problems (MBPP) dataset contains 974 programming tasks, designed to be solvable by entry-level programmers. The MathQA-Python dataset, a Python version of the MathQA benchmark, contains 23,914 problems that evaluate the ability of the models to synthesize code from more complex text.
On both datasets, we find that synthesis performance scales log-linearly with model size. Our largest models, even without finetuning on a code dataset, can synthesize solutions to 59.6% of the problems from MBPP using few-shot learning with a well-designed prompt. Fine-tuning on a held-out portion of the dataset improves performance by about 10 percentage points across most model sizes. On the MathQA-Python dataset, the largest fine-tuned model achieves 83.8% accuracy.
Going further, we study the model’s ability to engage in dialog about code, incorporating human feedback to improve its solutions. We find that natural language feedback from a human halves the error rate compared to the model’s initial prediction. Additionally, we conduct an error analysis to shed light on where these models fall short and what types of programs are most difficult to generate. Finally, we explore the semantic grounding of these models by fine-tuning them to predict the results of program execution. We find that even our best models are generally unable to predict the output of a program given a specific input.
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
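The repeated-sampling results are conventionally summarized with the pass@k metric; the Codex paper’s unbiased estimator, computed from n samples per problem of which c pass the unit tests, can be sketched as follows (the example numbers are hypothetical):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimator of pass@k: probability that at least one of k samples drawn
    (without replacement) from n generated samples passes, given c passing samples."""
    if n - c < k:
        return 1.0  # too few failing samples: any draw of k must contain a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical problem: 100 samples generated, 15 of them pass the unit tests.
print(pass_at_k(100, 15, 1))    # 0.15
print(pass_at_k(100, 15, 10))   # ~0.82
```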
I limited the investigation to Python suggestions with a cutoff on May 7, 2021 (the day we started extracting that data). That left 453,780 suggestions spread out over 396 “user weeks”, i.e. calendar weeks during which a user actively used GitHub Copilot on Python code.
…For most of GitHub Copilot’s suggestions, our automatic filter didn’t find any substantial overlap with the code used for training. But it did bring 473 cases to our attention. Removing the first bucket (cases that look very similar to other cases) left me with 185 suggestions. Of these, 144 got sorted out in buckets 2—4. This left 41 cases in the last bucket, the “recitations”, in the meaning of the term I have in mind.
That corresponds to 1 recitation event every 10 user weeks (95% confidence interval: 7—13 weeks, using a Poisson test).
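The post does not spell out the interval computation, but a standard exact Poisson (Garwood) interval over the 41 recitation events in 396 user-weeks reproduces the reported range:

```python
from scipy.stats import chi2

events, user_weeks = 41, 396   # recitation events and Python user-weeks from the post

# Exact (Garwood) 95% confidence interval for a Poisson count.
count_lo = chi2.ppf(0.025, 2 * events) / 2
count_hi = chi2.ppf(0.975, 2 * (events + 1)) / 2

print(user_weeks / events)                            # ~9.7 user-weeks per recitation
print(user_weeks / count_hi, user_weeks / count_lo)   # roughly 7 to 13 user-weeks
```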
…This investigation demonstrates that GitHub Copilot can quote a body of code verbatim, but that it rarely does so, and when it does, it mostly quotes code that everybody quotes, and mostly at the beginning of a file, as if to break the ice.
…The answer is obvious: sharing the pre-filtering solution we used in this analysis to detect overlap with the training set. When a suggestion contains snippets copied from the training set, the UI should simply tell you where it’s quoted from. You can then either include proper attribution or decide against using that code altogether. This duplication search is not yet integrated into the technical preview, but we plan to do so. And we will both continue to work on decreasing rates of recitation, and on making its detection more precise.
[If you find ‘1 possible copy every 10 man-weeks’ concerning, you’d better not look too hard at the overlap of your own codebase with StackExchange/GitHub or the licensing requirements (especially attribution) of all code therein…]
GitHub, GitHub’s parent company Microsoft and OpenAI have teamed up to deliver a tool that comes up with source code for programmers to use as they work.
The system can make recommendations in almost any programming language, although it works best with the popular JavaScript, Python and TypeScript languages.
OpenAI will release the underlying online service for other companies to use this summer.
…The new software makes coding faster, Friedman said in an interview last week. Hundreds of developers at GitHub have been using the Copilot feature all day while coding, and the majority of them are accepting suggestions and not turning the feature off, Friedman said.
…“You don’t want to go read Twilio’s API documentation. It knows all that stuff. It’s actually quite reliable at it”, he said. Brockman calls this work last-mile programming, and he said that having computers take care of it leads to speed improvements. Microsoft’s chief technology officer, Kevin Scott, has seen that happen firsthand. “It can save me from having to dive through a whole bunch of documentation to get a tool to do a thing that I know it’s capable of doing, and that is so good for productivity”, he said. “I can’t even tell you the number of hours I’ve wasted trying to figure out the right way to do a relatively prosaic thing, just navigating the complexity of these tools.”
…It supports almost every programming language, but it’s been designed to work best with JavaScript, Python and TypeScript, Friedman said. GitHub Copilot will first appear in Microsoft’s Visual Studio Code, a free open-source product, and Microsoft plans to incorporate it into the commercial Visual Studio product in the future…The underlying technology won’t be only Microsoft’s to use. OpenAI will release the Codex model this summer for third-party developers to weave into their own applications, Brockman said.
…Engineers fed the model “many, many terabytes of public source code out there”, Friedman said.
…Over the next few months, people experimented with the model to see what it could do, both useful and silly—for instance, one engineer made a website that could design a button that looked like a watermelon. Brockman reached out to Friedman, as he was running a key destination where millions of programmers work on code and things proceeded from there.
Software language models have achieved promising results predicting code completion usages, and several industry studies have described successful IDE integrations. Recently, accuracy in autocompletion prediction improved 12.8% from training on a real-world dataset collected from programmers’ IDE activity. But what if limited examples of IDE autocompletion in the target programming language are available for model training?
In this paper, we investigate the efficacy of pretraining autocompletion models on non-IDE, non-autocompletion, and different-language example code sequences.
We find that these unsupervised pretrainings improve model accuracy by over 50% on very small fine-tuning datasets and over 10% on 50k labeled examples. We confirm the real-world impact of these pretrainings in an online setting through A/B testing on thousands of IDE autocompletion users, finding that pretraining is responsible for increases of up to 6.63% in autocompletion usage.
Code completion is a popular software development tool integrated into all major IDEs. Many neural language models have achieved promising results in completion suggestion prediction on synthetic benchmarks. However, a recent study “When Code Completion Fails: a Case Study on Real-World Completions” demonstrates that these results may not translate to improvements in real-world performance.
To combat this effect, we train models on real-world code completion examples and find that these models outperform models trained on committed source code and working version snapshots by 12.8% and 13.8% accuracy respectively. We observe this improvement across modeling technologies and show through A/B testing that it corresponds to a 6.2% increase in programmers’ actual autocompletion usage.
Furthermore, our study characterizes a large corpus of logged autocompletion usages to investigate why training on real-world examples leads to stronger models.
Pre-trained models for programming language have achieved dramatic empirical improvements on a variety of code-related tasks such as code search, code completion, code summarization, etc. However, existing pre-trained models regard a code snippet as a sequence of tokens, while ignoring the inherent structure of code, which provides crucial code semantics and would enhance the code understanding process. We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code. Instead of taking the syntactic-level structure of code, such as the abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of “where-the-value-comes-from” between variables. Such a semantic-level structure is neat and does not bring the unnecessarily deep hierarchy of the AST, a property which makes the model more efficient. We develop GraphCodeBERT based on Transformer. In addition to using the task of masked language modeling, we introduce two structure-aware pre-training tasks. One is to predict code structure edges, and the other is to align representations between source code and code structure. We implement the model in an efficient way with a graph-guided masked attention function to incorporate the code structure. We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement. Results show that code structure and the newly introduced pre-training tasks improve GraphCodeBERT, which achieves state-of-the-art performance on the four downstream tasks. We further show that the model prefers structure-level attentions over token-level attentions in the task of code search.
In software development through integrated development environments (IDEs), code completion is one of the most widely used features. Nevertheless, the majority of integrated development environments only support completion of methods and APIs, or arguments.
In this paper, we introduce IntelliCode Compose—a general-purpose multilingual code completion tool which is capable of predicting sequences of code tokens of arbitrary types, generating up to entire lines of syntactically correct code. It leverages a state-of-the-art generative transformer model trained on 1.2 billion lines of source code in the Python, C#, JavaScript and TypeScript programming languages.
IntelliCode Compose is deployed as a cloud-based web service. It makes use of client-side tree-based caching, efficient parallel implementation of the beam search decoder, and compute graph optimizations to meet edit-time completion suggestion requirements in the Visual Studio Code IDE and Azure Notebook.
Our best model yields an average edit similarity of 86.7% and a perplexity of 1.82 for Python programming language.
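Edit similarity here is the usual Levenshtein-based measure between the predicted completion and the code the developer actually wrote; a minimal sketch (the paper’s exact normalization may differ):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def edit_similarity(predicted: str, actual: str) -> float:
    """1 minus normalized edit distance; 1.0 means the completion matched exactly."""
    if not predicted and not actual:
        return 1.0
    return 1.0 - levenshtein(predicted, actual) / max(len(predicted), len(actual))

print(edit_similarity("return x ** 2", "return x ** 2 + 1"))  # ~0.76
```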
Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly technical) and natural language more suitable to describe vague concepts and ideas.
To enable evaluation of progress on code search, we are releasing the CodeSearchNet Corpus and are presenting the CodeSearchNet Challenge, which consists of 99 natural language queries with about 4k expert relevance annotations of likely results from CodeSearchNet Corpus. The corpus contains about 6 million functions from open-source code spanning six programming languages (Go, Java, JavaScript, PHP, Python, and Ruby). The CodeSearchNet Corpus also contains automatically generated query-like natural language for 2 million functions, obtained from mechanically scraping and preprocessing associated function documentation. In this article, we describe the methodology used to obtain the corpus and expert labels, as well as a number of simple baseline solutions for the task.
We hope that CodeSearchNet Challenge encourages researchers and practitioners to study this interesting task further and will host a competition and leaderboard to track the progress on the challenge. We are also keen on extending CodeSearchNet Challenge to more queries and programming languages in the future.
Code super-optimization is the task of transforming any given program to a more efficient version while preserving its input-output behaviour. In some sense, it is similar to the paraphrase problem from natural language processing where the intention is to change the syntax of an utterance without changing its semantics. Code optimization has been the subject of years of research that has resulted in the development of rule-based transformation strategies that are used by compilers. More recently, however, a class of stochastic search-based methods has been shown to outperform these strategies. This approach involves repeated sampling of modifications to the program from a proposal distribution, which are accepted or rejected based on whether they preserve correctness, and the improvement they achieve. These methods, however, neither learn from past behaviour nor do they try to leverage the semantics of the program under consideration. Motivated by this observation, we present a novel learning-based approach for code super-optimization. Intuitively, our method works by learning the proposal distribution using unbiased estimators of the gradient of the expected improvement. Experiments on benchmarks comprising automatically generated as well as existing (“Hacker’s Delight”) programs show that the proposed method is able to significantly outperform state-of-the-art approaches for code super-optimization.
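A minimal sketch of the stochastic accept/reject loop the abstract describes; the paper’s contribution is to learn the proposal distribution, which is just a caller-supplied function in this sketch, and `cost` would combine correctness on test cases with measured performance:

```python
import math
import random

def stochastic_search(program, propose, cost, steps=10_000, beta=1.0):
    """Metropolis-style search over program rewrites: sample a modification from the
    proposal distribution and accept or reject it based on the change in cost."""
    current, current_cost = program, cost(program)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = propose(current)          # sample a rewrite of the current program
        candidate_cost = cost(candidate)
        # Always accept improvements; accept regressions with exponentially decaying probability.
        if random.random() < math.exp(-beta * max(0.0, candidate_cost - current_cost)):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best
```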
We develop a first line of attack for solving programming competition-style problems from input-output examples using deep learning. The approach is to train a neural network to predict properties of the program that generated the outputs from the inputs. We use the neural network’s predictions to augment search techniques from the programming languages community, including enumerative search and an SMT-based solver. Empirically, we show that our approach leads to an order of magnitude speedup over the strong non-augmented baselines and a Recurrent Neural Network approach, and that we are able to solve problems of difficulty comparable to the simplest problems on programming competition websites.