- See Also
- Gwern
    - “Research Ideas”, Gwern 2017
    - “Machine Learning Scaling”, Gwern 2021
- Links
- “Reverse Training to Nurse the Reversal Curse”, Golovneva et al 2024
- “Evolutionary Optimization of Model Merging Recipes”, Akiba et al 2024
- “Yi: Open Foundation Models by 01.AI”, AI et al 2024
- “Fast Adversarial Attacks on Language Models In One GPU Minute”, Sadasivan et al 2024
- “Grandmaster-Level Chess Without Search”, Ruoss et al 2024
- “SliceGPT: Compress Large Language Models by Deleting Rows and Columns”, Ashkboos et al 2024
- “Excuse Me, Sir? Your Language Model Is Leaking (information)”, Zamir 2024
- “TinyLlama: An Open-Source Small Language Model”, Zhang et al 2024
- “LLaMA Pro: Progressive LLaMA With Block Expansion”, Wu et al 2024
- “Generative AI Is Already Widespread in the Public Sector”, Bright et al 2024
- “Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws”, Sardana & Frankle 2023
- “TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”, Yuan et al 2023
- “Reasons to Reject? Aligning Language Models With Judgments”, Xu et al 2023
- “Generative Multimodal Models Are In-Context Learners”, Sun et al 2023
- “Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning”, Dutta et al 2023
- “MEDITRON-70B: Scaling Medical Pretraining for Large Language Models”, Chen et al 2023
- “Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching”, Campbell et al 2023
- “OpenAI Researchers Warned Board of AI Breakthrough ahead of CEO Ouster, Sources Say”, Tong et al 2023
- “Positional Description Matters for Transformers Arithmetic”, Shen et al 2023
- “Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models”, Zhang et al 2023
- “Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game”, Toyer et al 2023
- “In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries”, Shi et al 2023
- “OSD: Online Speculative Decoding”, Liu et al 2023
- “Let Models Speak Ciphers: Multiagent Debate through Embeddings”, Pham et al 2023
- “OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Paster et al 2023
- “XVal: A Continuous Number Encoding for Large Language Models”, Golkar et al 2023
- “Language Modeling Is Compression”, Delétang et al 2023
- “Sparse Autoencoders Find Highly Interpretable Features in Language Models”, Cunningham et al 2023
- “Anchor Points: Benchmarking Models With Much Fewer Examples”, Vivek et al 2023
- “When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023
- “ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
- “Studying Large Language Model Generalization With Influence Functions”, Grosse et al 2023
- “Multimodal Neurons in Pretrained Text-Only Transformers”, Schwettmann et al 2023
- “Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models”, Chen et al 2023
- “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
- “SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
- “Undetectable Watermarks for Language Models”, Christ et al 2023
- “Improving Language Models With Advantage-based Offline Policy Gradients”, Baheti et al 2023
- “DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, Xie et al 2023
- “Memorization for Good: Encryption With Autoregressive Language Models”, Stevens & Su 2023
- “MEGABYTE: Predicting Million-byte Sequences With Multiscale Transformers”, Yu et al 2023
- “Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
- “Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Konrad 2023
- “Emergent and Predictable Memorization in Large Language Models”, Biderman et al 2023
- “A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model”, Sun et al 2023
- “Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, Wang et al 2023
- “How Large-Language Models Can Revolutionize Military Planning”, Jensen & Tadross 2023
- “Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling”, Biderman et al 2023
- “8 Things to Know about Large Language Models”, Bowman 2023
- “BloombergGPT: A Large Language Model for Finance”, Wu et al 2023
- “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
- “Consistency Analysis of ChatGPT”, Jang & Lukasiewicz 2023
- “Rewarding Chatbots for Real-World Engagement With Millions of Users”, Irvine et al 2023
- “Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Kataoka 2023
- “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
- “A Prompt Pattern Catalog to Enhance Prompt Engineering With ChatGPT”, White et al 2023
- “BiLD: Big Little Transformer Decoder”, Kim et al 2023
- “In-Context Retrieval-Augmented Language Models”, Ram et al 2023
- “Crawling the Internal Knowledge-Base of Language Models”, Cohen et al 2023
- “Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React”, Tiku et al 2023
- “Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023
- “InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
- “A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Grant & Metz 2022
- “Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022
- “Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale”, Bansal et al 2022
- “Interpreting Neural Networks through the Polytope Lens”, Black et al 2022
- “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
- “InstructPix2Pix: Learning to Follow Image Editing Instructions”, Brooks et al 2022
- “Galactica: A Large Language Model for Science”, Taylor et al 2022
- “Large Language Models Struggle to Learn Long-Tail Knowledge”, Kandpal et al 2022
- “The CRINGE Loss: Learning What Language Not to Model”, Adolphs et al 2022
- “Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus 2022
- “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
- “What Is My Math Transformer Doing? – 3 Results on Interpretability and Generalization”, Charton 2022
- “When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, Shi et al 2022
- “Can Language Models Handle Recursively Nested Grammatical Structures? A Case Study on Comparing Models and Humans”, Lampinen 2022
- “Evaluating Parameter Efficient Learning for Generation”, Xu et al 2022
- “BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining”, Luo et al 2022
- “Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Vilnis et al 2022
- “MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022
- “Foundation Transformers”, Wang et al 2022
- “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Arora et al 2022
- “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022
- “Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, Glaese et al 2022
- “Generate rather than Retrieve (GenRead): Large Language Models Are Strong Context Generators”, Yu et al 2022
- “FP8 Formats for Deep Learning”, Micikevicius et al 2022
- “Petals: Collaborative Inference and Fine-tuning of Large Models”, Borzunov et al 2022
- “LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
- “Meaning without Reference in Large Language Models”, Piantadosi & Hill 2022
- “Effidit: Your AI Writing Assistant”, Shi et al 2022
- “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Garg et al 2022
- “Language Models Show Human-like Content Effects on Reasoning”, Dasgupta et al 2022
- “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Shah et al 2022
- “Can Foundation Models Talk Causality?”, Willig et al 2022
- “NOAH: Neural Prompt Search”, Zhang et al 2022
- “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
- “Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022
- “Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, Tirumala et al 2022
- “RankGen: Improving Text Generation With Large Ranking Models”, Krishna et al 2022
- “Opal: Multimodal Image Generation for News Illustration”, Liu et al 2022
- “What Language Model to Train If You Have One Million GPU Hours?”, Scao et al 2022
- “WAVPROMPT: Towards Few-Shot Spoken Language Understanding With Frozen Language Models”, Gao et al 2022
- “Shared Computational Principles for Language Processing in Humans and Deep Language Models”, Goldstein et al 2022
- “Vector-quantized Image Modeling With Improved VQGAN”, Yu et al 2022
- “Brains and Algorithms Partially Converge in Natural Language Processing”, Caucheteux & King 2022
- “A Contrastive Framework for Neural Text Generation”, Su et al 2022
- “AdaPrompt: Adaptive Model Training for Prompt-based NLP”, Chen et al 2022
- “InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
- “ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022
- “Cedille: A Large Autoregressive French Language Model”, Müller & Laurent 2022
- “Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022
- “PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022
- “Typical Decoding for Natural Language Generation”, Meister et al 2022
- “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Smith et al 2022
- “Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, Huang et al 2022
- “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
- “A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models”, Zhang et al 2022
- “The Defeat of the Winograd Schema Challenge”, Kocijan et al 2022
- “Learning To Retrieve Prompts for In-Context Learning”, Rubin et al 2021
- “Learning to Prompt for Continual Learning”, Wang et al 2021
- “Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
- “Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Prabhumoye et al 2021
- “PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
- “LMTurk: Few-Shot Learners As Crowdsourcing Workers”, Zhao et al 2021
- “Improving Language Models by Retrieving from Trillions of Tokens”, Borgeaud et al 2021
- “Linear Algebra With Transformers”, Charton 2021
- “Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic”, Tewel et al 2021
- “Long-range and Hierarchical Language Predictions in Brains and Algorithms”, Caucheteux et al 2021
- “True Few-Shot Learning With Prompts—A Real-World Perspective”, Schick & Schütze 2021
- “Few-shot Named Entity Recognition With Cloze Questions”, Gatta et al 2021
- “Evaluating Distributional Distortion in Neural Language Modeling”, Anonymous 2021
- “On Transferability of Prompt Tuning for Natural Language Understanding”, Su et al 2021
- “CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”, Mukherjee et al 2021
- “An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
- “Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey”, Min et al 2021
- “Fast Model Editing at Scale”, Mitchell et al 2021
- “Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning”, Wu et al 2021
- “Towards a Unified View of Parameter-Efficient Transfer Learning”, He et al 2021
- “A Few More Examples May Be Worth Billions of Parameters”, Kirstain et al 2021
- “Scaling Laws for Neural Machine Translation”, Ghorbani et al 2021
- “Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color”, Abdou et al 2021
- “What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Kim et al 2021
- “Medically Aware GPT-3 As a Data Generator for Medical Dialogue Summarization”, Chintagunta et al 2021
- “General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
- “An Empirical Exploration in Quality Filtering of Text Data”, Gao 2021
- “Want To Reduce Labeling Cost? GPT-3 Can Help”, Wang et al 2021
- “Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021
- “Cutting Down on Prompts and Parameters: Simple Few-Shot Learning With Language Models”, Logan et al 2021
- “RASP: Thinking Like Transformers”, Weiss et al 2021
- “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
- “Anthropic Raises $124 Million to Build More Reliable, General AI Systems”, Anthropic 2021
- “Naver Unveils First ‘hyperscale’ AI Platform”, Jae-eun 2021
- “Scaling Laws for Language Transfer Learning”, Kim 2021
- “GPT Understands, Too”, Liu et al 2021
- “How Many Data Points Is a Prompt Worth?”, Scao & Rush 2021
- “Pretrained Transformers As Universal Computation Engines”, Lu et al 2021
- “Language Models Have a Moral Dimension”, Schramowski et al 2021
- “Learning Chess Blindfolded: Evaluating Language Models on State Tracking”, Toshniwal et al 2021
- “Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
- “Proof Artifact Co-training for Theorem Proving With Language Models”, Han et al 2021
- “Clinical Outcome Prediction from Admission Notes Using Self-Supervised Knowledge Integration”, Aken et al 2021
- “Scaling Laws for Transfer”, Hernandez et al 2021
- “MAUVE: Measuring the Gap Between Neural Text and Human Text Using Divergence Frontiers”, Pillutla et al 2021
- “Apparently ‘what Ho’ Is a Corruption Of…”, Marguerite 2021
- “Making Pre-trained Language Models Better Few-shot Learners”, Gao et al 2020
- “Thinking Ahead: Prediction in Context As a Keystone of Language in Humans and Machines”, Goldstein et al 2020
- “CPM: A Large-scale Generative Chinese Pre-trained Language Model”, Zhang et al 2020
- “L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
- “Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, Sun et al 2020
- “The Neural Architecture of Language: Integrative Reverse-engineering Converges on a Model for Predictive Processing”, Schrimpf et al 2020
- “RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text”, Dugan et al 2020
- “A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation”, Nadeem et al 2020
- “Generative Language Modeling for Automated Theorem Proving”, Polu & Sutskever 2020
- “Learning to Summarize from Human Feedback”, Stiennon et al 2020
- “Aligning AI With Shared Human Values”, Hendrycks et al 2020
- “Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity”, Basu et al 2020
- “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Bender & Koller 2020
- “Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
- “OpenAI API Beta Homepage”, OpenAI 2020
- “Trading Off Diversity and Quality in Natural Language Generation”, Zhang et al 2020
- “Scaling Laws from the Data Manifold Dimension”, Sharma & Kaplan 2020
- “Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
- “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020
- “Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020
- “Scaling Laws for Neural Language Models”, Kaplan et al 2020
- “Reformer: The Efficient Transformer”, Kitaev et al 2020
- “What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, Bäuerle & Wexler 2020
- “Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11 org openai)
- “Plug and Play Language Models: A Simple Approach to Controlled Text Generation”, Dathathri et al 2019
- “How Can We Know What Language Models Know?”, Jiang et al 2019
- “CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
- “Generalization through Memorization: Nearest Neighbor Language Models”, Khandelwal et al 2019
- “DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation”, Zhang et al 2019
- “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Keskar et al 2019
- “Smaller, Faster, Cheaper, Lighter: Introducing DistilGPT, a Distilled Version of GPT”, Sanh 2019
- “Language Modelling State-of-the-art Leaderboards”, paperswithcode.com 2019
- “Neural Text Generation With Unlikelihood Training”, Welleck et al 2019
- “GROVER: Defending Against Neural Fake News”, Zellers et al 2019
- “Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019
- “The Curious Case of Neural Text Degeneration”, Holtzman et al 2019
- “Smart Vet: Autocompleting Sentences in Veterinary Medical Records”, Ginn 2019
- “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
- “Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018
- “Universal Transformers”, Dehghani et al 2018
- “Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018
- “GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training”, Radford et al 2018
- “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
- “Deep Reinforcement Learning from Human Preferences § Appendix A.2: Atari”, Christiano et al 2017 (page 15 org openai)
- “Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017
- “Design a Role-playing Game Using 200 Words or Less.”
- “AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.”
- “Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [ 🤔 💯 🤷♂️ 🤔 🤔 🤔 💯]”
- “OpenAI API Alchemy: Emoji Storytelling 🤖”
- “AlphaStar: Mastering the Real-Time Strategy Game StarCraft II”
- “Transformers As Variational Autoencoders”
- “Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, including Different Priming Prompts Can Result in Decent Results.”
- “Deep Learning for Assisting the Process of Music Composition (part 3)”
- “Homepage of Paul F. Christiano”, Christiano 2024
- “Meditations on Moloch”
- “Humans Who Are Not Concentrating Are Not General Intelligences”
- nickwalton00
- sama
- “This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet”
- “Interpreting GPT: the Logit Lens”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Link Bibliography
See Also
Gwern
“Research Ideas”, Gwern 2017
“Machine Learning Scaling”, Gwern 2021
Links
“Reverse Training to Nurse the Reversal Curse”, Golovneva et al 2024
“Evolutionary Optimization of Model Merging Recipes”, Akiba et al 2024
“Yi: Open Foundation Models by 01.AI”, AI et al 2024
“Fast Adversarial Attacks on Language Models In One GPU Minute”, Sadasivan et al 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
“Grandmaster-Level Chess Without Search”, Ruoss et al 2024
“SliceGPT: Compress Large Language Models by Deleting Rows and Columns”, Ashkboos et al 2024
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
“Excuse Me, Sir? Your Language Model Is Leaking (information)”, Zamir 2024
Excuse me, sir? Your language model is leaking (information)
“TinyLlama: An Open-Source Small Language Model”, Zhang et al 2024
“LLaMA Pro: Progressive LLaMA With Block Expansion”, Wu et al 2024
“Generative AI Is Already Widespread in the Public Sector”, Bright et al 2024
“Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws”, Sardana & Frankle 2023
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
“TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”, Yuan et al 2023
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
“Reasons to Reject? Aligning Language Models With Judgments”, Xu et al 2023
“Generative Multimodal Models Are In-Context Learners”, Sun et al 2023
“Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning”, Dutta et al 2023
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning
“MEDITRON-70B: Scaling Medical Pretraining for Large Language Models”, Chen et al 2023
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
“Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching”, Campbell et al 2023
“OpenAI Researchers Warned Board of AI Breakthrough ahead of CEO Ouster, Sources Say”, Tong et al 2023
OpenAI researchers warned board of AI breakthrough ahead of CEO ouster, sources say
“Positional Description Matters for Transformers Arithmetic”, Shen et al 2023
“Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models”, Zhang et al 2023
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
“Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game”, Toyer et al 2023
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
“In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries”, Shi et al 2023
In-Context Pretraining (ICP): Language Modeling Beyond Document Boundaries
“OSD: Online Speculative Decoding”, Liu et al 2023
“Let Models Speak Ciphers: Multiagent Debate through Embeddings”, Pham et al 2023
Let Models Speak Ciphers: Multiagent Debate through Embeddings
“OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Paster et al 2023
OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text
“XVal: A Continuous Number Encoding for Large Language Models”, Golkar et al 2023
xVal: A Continuous Number Encoding for Large Language Models
“Language Modeling Is Compression”, Delétang et al 2023
“Sparse Autoencoders Find Highly Interpretable Features in Language Models”, Cunningham et al 2023
Sparse Autoencoders Find Highly Interpretable Features in Language Models
“Anchor Points: Benchmarking Models With Much Fewer Examples”, Vivek et al 2023
“When Less Is More: Investigating Data Pruning for Pretraining LLMs at Scale”, Marion et al 2023
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
“ReST: Reinforced Self-Training (ReST) for Language Modeling”, Gulcehre et al 2023
“Studying Large Language Model Generalization With Influence Functions”, Grosse et al 2023
Studying Large Language Model Generalization with Influence Functions
“Multimodal Neurons in Pretrained Text-Only Transformers”, Schwettmann et al 2023
“Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models”, Chen et al 2023
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
“Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
Large Language Models Sometimes Generate Purely Negatively-Reinforced Text
“SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression”, Dettmers et al 2023
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
“Undetectable Watermarks for Language Models”, Christ et al 2023
“Improving Language Models With Advantage-based Offline Policy Gradients”, Baheti et al 2023
Improving Language Models with Advantage-based Offline Policy Gradients
“DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, Xie et al 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
“Memorization for Good: Encryption With Autoregressive Language Models”, Stevens & Su 2023
Memorization for Good: Encryption with Autoregressive Language Models
“MEGABYTE: Predicting Million-byte Sequences With Multiscale Transformers”, Yu et al 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
“Finding Neurons in a Haystack: Case Studies With Sparse Probing”, Gurnee et al 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
“Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Konrad 2023
Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot
“Emergent and Predictable Memorization in Large Language Models”, Biderman et al 2023
Emergent and Predictable Memorization in Large Language Models
“A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model”, Sun et al 2023
“Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, Wang et al 2023
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
“How Large-Language Models Can Revolutionize Military Planning”, Jensen & Tadross 2023
How Large-Language Models Can Revolutionize Military Planning
“Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling”, Biderman et al 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
“8 Things to Know about Large Language Models”, Bowman 2023
“BloombergGPT: A Large Language Model for Finance”, Wu et al 2023
“Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org 2023
“Consistency Analysis of ChatGPT”, Jang & Lukasiewicz 2023
“Rewarding Chatbots for Real-World Engagement With Millions of Users”, Irvine et al 2023
Rewarding Chatbots for Real-World Engagement with Millions of Users
“Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Kataoka 2023
“SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Zhu et al 2023
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
“A Prompt Pattern Catalog to Enhance Prompt Engineering With ChatGPT”, White et al 2023
A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT
“BiLD: Big Little Transformer Decoder”, Kim et al 2023
“In-Context Retrieval-Augmented Language Models”, Ram et al 2023
“Crawling the Internal Knowledge-Base of Language Models”, Cohen et al 2023
“Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React”, Tiku et al 2023
“Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023
Rock Guitar Tablature Generation via Natural Language Processing
“InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers”, Boytsov et al 2023
InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
“A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Grant & Metz 2022
“Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
“Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale”, Bansal et al 2022
“Interpreting Neural Networks through the Polytope Lens”, Black et al 2022
“SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Xiao et al 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
“InstructPix2Pix: Learning to Follow Image Editing Instructions”, Brooks et al 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
“Galactica: A Large Language Model for Science”, Taylor et al 2022
“Large Language Models Struggle to Learn Long-Tail Knowledge”, Kandpal et al 2022
“The CRINGE Loss: Learning What Language Not to Model”, Adolphs et al 2022
“Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus 2022
“GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Frantar et al 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
“What Is My Math Transformer Doing? – 3 Results on Interpretability and Generalization”, Charton 2022
What is my math transformer doing? – 3 results on interpretability and generalization
“When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, Shi et al 2022
When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels
“Can Language Models Handle Recursively Nested Grammatical Structures? A Case Study on Comparing Models and Humans”, Lampinen 2022
“Evaluating Parameter Efficient Learning for Generation”, Xu et al 2022
“BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining”, Luo et al 2022
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
“Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Vilnis et al 2022
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
“MTEB: Massive Text Embedding Benchmark”, Muennighoff et al 2022
“Foundation Transformers”, Wang et al 2022
“Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, Arora et al 2022
Ask Me Anything (AMA): A simple strategy for prompting language models
“Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, Ramamurthy et al 2022
“Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, Glaese et al 2022
Sparrow: Improving alignment of dialogue agents via targeted human judgements
“Generate rather than Retrieve (GenRead): Large Language Models Are Strong Context Generators”, Yu et al 2022
Generate rather than Retrieve (GenRead): Large Language Models are Strong Context Generators
“FP8 Formats for Deep Learning”, Micikevicius et al 2022
“Petals: Collaborative Inference and Fine-tuning of Large Models”, Borzunov et al 2022
Petals: Collaborative Inference and Fine-tuning of Large Models
“LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale”, Dettmers et al 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
“Meaning without Reference in Large Language Models”, Piantadosi & Hill 2022
“Effidit: Your AI Writing Assistant”, Shi et al 2022
“What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Garg et al 2022
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
“Language Models Show Human-like Content Effects on Reasoning”, Dasgupta et al 2022
Language models show human-like content effects on reasoning
“LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Shah et al 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
“Can Foundation Models Talk Causality?”, Willig et al 2022
“NOAH: Neural Prompt Search”, Zhang et al 2022
“ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Yao et al 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
“Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022
Quark: Controllable Text Generation with Reinforced Unlearning
“Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models”, Tirumala et al 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
“RankGen: Improving Text Generation With Large Ranking Models”, Krishna et al 2022
RankGen: Improving Text Generation with Large Ranking Models
“Opal: Multimodal Image Generation for News Illustration”, Liu et al 2022
“What Language Model to Train If You Have One Million GPU Hours?”, Scao et al 2022
What Language Model to Train if You Have One Million GPU Hours?
“WAVPROMPT: Towards Few-Shot Spoken Language Understanding With Frozen Language Models”, Gao et al 2022
WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
“Shared Computational Principles for Language Processing in Humans and Deep Language Models”, Goldstein et al 2022
Shared computational principles for language processing in humans and deep language models
“Vector-quantized Image Modeling With Improved VQGAN”, Yu et al 2022
“Brains and Algorithms Partially Converge in Natural Language Processing”, Caucheteux & King 2022
Brains and algorithms partially converge in natural language processing
“A Contrastive Framework for Neural Text Generation”, Su et al 2022
“AdaPrompt: Adaptive Model Training for Prompt-based NLP”, Chen et al 2022
“InPars: Data Augmentation for Information Retrieval Using Large Language Models”, Bonifacio et al 2022
InPars: Data Augmentation for Information Retrieval using Large Language Models
“ROME: Locating and Editing Factual Associations in GPT”, Meng et al 2022
“Cedille: A Large Autoregressive French Language Model”, Müller & Laurent 2022
“Data Scaling Laws in NMT: The Effect of Noise and Architecture”, Bansal et al 2022
Data Scaling Laws in NMT: The Effect of Noise and Architecture
“PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts”, Bach et al 2022
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
“Typical Decoding for Natural Language Generation”, Meister et al 2022
“Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”, Smith et al 2022
“Language Models As Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents”, Huang et al 2022
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
“WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Liu et al 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
“A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models”, Zhang et al 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
“The Defeat of the Winograd Schema Challenge”, Kocijan et al 2022
“Learning To Retrieve Prompts for In-Context Learning”, Rubin et al 2021
“Learning to Prompt for Continual Learning”, Wang et al 2021
“Amortized Noisy Channel Neural Machine Translation”, Pang et al 2021
“Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases”, Prabhumoye et al 2021
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
“PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts”, Khashabi et al 2021
PROMPT WAYWARDNESS: The Curious Case of Discretized Interpretation of Continuous Prompts
“LMTurk: Few-Shot Learners As Crowdsourcing Workers”, Zhao et al 2021
“Improving Language Models by Retrieving from Trillions of Tokens”, Borgeaud et al 2021
Improving language models by retrieving from trillions of tokens
“Linear Algebra With Transformers”, Charton 2021
“Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic”, Tewel et al 2021
Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
“Long-range and Hierarchical Language Predictions in Brains and Algorithms”, Caucheteux et al 2021
Long-range and hierarchical language predictions in brains and algorithms
“True Few-Shot Learning With Prompts—A Real-World Perspective”, Schick & Schütze 2021
True Few-Shot Learning with Prompts—A Real-World Perspective
“Few-shot Named Entity Recognition With Cloze Questions”, Gatta et al 2021
“Evaluating Distributional Distortion in Neural Language Modeling”, Anonymous 2021
Evaluating Distributional Distortion in Neural Language Modeling
“On Transferability of Prompt Tuning for Natural Language Understanding”, Su et al 2021
On Transferability of Prompt Tuning for Natural Language Understanding
“CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”, Mukherjee et al 2021
CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
“An Explanation of In-context Learning As Implicit Bayesian Inference”, Xie et al 2021
An Explanation of In-context Learning as Implicit Bayesian Inference
“Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey”, Min et al 2021
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
“Fast Model Editing at Scale”, Mitchell et al 2021
“Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning”, Wu et al 2021
Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning
“Towards a Unified View of Parameter-Efficient Transfer Learning”, He et al 2021
Towards a Unified View of Parameter-Efficient Transfer Learning
“A Few More Examples May Be Worth Billions of Parameters”, Kirstain et al 2021
“Scaling Laws for Neural Machine Translation”, Ghorbani et al 2021
“Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color”, Abdou et al 2021
Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color
“What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers”, Kim et al 2021
“Medically Aware GPT-3 As a Data Generator for Medical Dialogue Summarization”, Chintagunta et al 2021
Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization
“General-Purpose Question-Answering With Macaw”, Tafjord & Clark 2021
“An Empirical Exploration in Quality Filtering of Text Data”, Gao 2021
“Want To Reduce Labeling Cost? GPT-3 Can Help”, Wang et al 2021
“Multimodal Few-Shot Learning With Frozen Language Models”, Tsimpoukelli et al 2021
“Cutting Down on Prompts and Parameters: Simple Few-Shot Learning With Language Models”, Logan IV et al 2021
Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models
“RASP: Thinking Like Transformers”, Weiss et al 2021
“ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Xue et al 2021
ByT5: Towards a token-free future with pre-trained byte-to-byte models
“Anthropic Raises $124 Million to Build More Reliable, General AI Systems”, Anthropic 2021
Anthropic raises $124 million to build more reliable, general AI systems
“Naver Unveils First ‘hyperscale’ AI Platform”, Jae-eun 2021
“Scaling Laws for Language Transfer Learning”, Kim 2021
“GPT Understands, Too”, Liu et al 2021
“How Many Data Points Is a Prompt Worth?”, Scao & Rush 2021
“Pretrained Transformers As Universal Computation Engines”, Lu et al 2021
“Language Models Have a Moral Dimension”, Schramowski et al 2021
“Learning Chess Blindfolded: Evaluating Language Models on State Tracking”, Toshniwal et al 2021
Learning Chess Blindfolded: Evaluating Language Models on State Tracking
“Investigating the Limitations of the Transformers With Simple Arithmetic Tasks”, Nogueira et al 2021
Investigating the Limitations of the Transformers with Simple Arithmetic Tasks
“Proof Artifact Co-training for Theorem Proving With Language Models”, Han et al 2021
Proof Artifact Co-training for Theorem Proving with Language Models
“Clinical Outcome Prediction from Admission Notes Using Self-Supervised Knowledge Integration”, Aken et al 2021
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
“Scaling Laws for Transfer”, Hernandez et al 2021
“MAUVE: Measuring the Gap Between Neural Text and Human Text Using Divergence Frontiers”, Pillutla et al 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
“Apparently ‘what Ho’ Is a Corruption Of…”, Marguerite 2021
“Making Pre-trained Language Models Better Few-shot Learners”, Gao et al 2020
“Thinking Ahead: Prediction in Context As a Keystone of Language in Humans and Machines”, Goldstein et al 2020
Thinking ahead: prediction in context as a keystone of language in humans and machines
“CPM: A Large-scale Generative Chinese Pre-trained Language Model”, Zhang et al 2020
CPM: A Large-scale Generative Chinese Pre-trained Language Model
“L2L: Training Large Neural Networks With Constant Memory Using a New Execution Algorithm”, Pudipeddi et al 2020
L2L: Training Large Neural Networks with Constant Memory using a New Execution Algorithm
“Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries”, Sun et al 2020
“The Neural Architecture of Language: Integrative Reverse-engineering Converges on a Model for Predictive Processing”, Schrimpf et al 2020
“RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text”, Dugan et al 2020
RoFT: A Tool for Evaluating Human Detection of Machine-Generated Text
“A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation”, Nadeem et al 2020
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation
“Generative Language Modeling for Automated Theorem Proving”, Polu & Sutskever 2020
“Learning to Summarize from Human Feedback”, Stiennon et al 2020
“Aligning AI With Shared Human Values”, Hendrycks et al 2020
“Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity”, Basu et al 2020
Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity
“Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”, Bender & Koller 2020
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data
“Transformers Are RNNs: Fast Autoregressive Transformers With Linear Attention”, Katharopoulos et al 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
“OpenAI API Beta Homepage”, OpenAI 2020
“Trading Off Diversity and Quality in Natural Language Generation”, Zhang et al 2020
Trading Off Diversity and Quality in Natural Language Generation
“Unigram LM: Byte Pair Encoding Is Suboptimal for Language Model Pretraining”, Bostrom & Durrett 2020
Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining
“Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks”, Hasson et al 2020
Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks
“Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020
Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
“Scaling Laws for Neural Language Models”, Kaplan et al 2020
“Reformer: The Efficient Transformer”, Kitaev et al 2020
“What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, Bäuerle & Wexler 2020
What does BERT dream of? A visual investigation of nightmares in Sesame Street
“Generative Language Modeling for Automated Theorem Proving § Experiments”, Polu & Sutskever 2020 (page 11; OpenAI)
Generative Language Modeling for Automated Theorem Proving § Experiments
“Plug and Play Language Models: A Simple Approach to Controlled Text Generation”, Dathathri et al 2019
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
“How Can We Know What Language Models Know?”, Jiang et al 2019
“CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning”, Lin et al 2019
CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning
“Generalization through Memorization: Nearest Neighbor Language Models”, Khandelwal et al 2019
Generalization through Memorization: Nearest Neighbor Language Models
“DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation”, Zhang et al 2019
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
“CTRL: A Conditional Transformer Language Model For Controllable Generation”, Keskar et al 2019
CTRL: A Conditional Transformer Language Model For Controllable Generation
“Smaller, Faster, Cheaper, Lighter: Introducing DistilBERT, a Distilled Version of BERT”, Sanh 2019
Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT
“Language Modelling State-of-the-art Leaderboards”, paperswithcode.com 2019
“Neural Text Generation With Unlikelihood Training”, Welleck et al 2019
“GROVER: Defending Against Neural Fake News”, Zellers et al 2019
“Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes Next in a Sequence—whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the Attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019
“The Curious Case of Neural Text Degeneration”, Holtzman et al 2019
“Smart Vet: Autocompleting Sentences in Veterinary Medical Records”, Ginn 2019
Smart Vet: Autocompleting Sentences in Veterinary Medical Records
“Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context”, Dai et al 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
“Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018
Music Transformer: Generating Music with Long-Term Structure
“Universal Transformers”, Dehghani et al 2018
“Adversarial Reprogramming of Neural Networks”, Elsayed et al 2018
“GPT-1: Improving Language Understanding With Unsupervised Learning”, OpenAI 2018
GPT-1: Improving Language Understanding with Unsupervised Learning
“GPT-1: Improving Language Understanding by Generative Pre-Training”, Radford et al 2018
GPT-1: Improving Language Understanding by Generative Pre-Training
“GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Radford et al 2018 (page 5)
GPT-1: Improving Language Understanding by Generative Pre-Training § Model specifications
“Deep Reinforcement Learning from Human Preferences § Appendix A.2: Atari”, Christiano et al 2017 (page 15; OpenAI)
Deep reinforcement learning from human preferences § Appendix A.2: Atari
“Learning to Generate Reviews and Discovering Sentiment”, Radford et al 2017
“Design a Role-playing Game Using 200 Words or Less.”
“AI Dungeon: Dragon Model Upgrade—You Can Now Play AI Dungeon With One of the Most Powerful AI Models in the World.”
“Introducing AI Dungeon Translate: AI Dungeon Players Can Now Translate Their Stories into Emojis by Just Clicking a Button. [ 🤔 💯 🤷‍♂️ 🤔 🤔 🤔 💯]”
“OpenAI API Alchemy: Emoji Storytelling 🤖”
“AlphaStar: Mastering the Real-Time Strategy Game StarCraft II”
AlphaStar: Mastering the Real-Time Strategy Game StarCraft II
“Transformers As Variational Autoencoders”
“Math: OpenAI API Can Do Some Math out of the Gate, but Most Math It Seems It Has to Learn. Many Times, the Numbers That It Spits out Are Just Random. However, including Different Priming Prompts Can Result in Decent Results.”
“Deep Learning for Assisting the Process of Music Composition (part 3)”
Deep learning for assisting the process of music composition (part 3)
“Homepage of Paul F. Christiano”, Christiano 2024
“Meditations on Moloch”
“Humans Who Are Not Concentrating Are Not General Intelligences”
Humans Who Are Not Concentrating Are Not General Intelligences
nickwalton00
sama
“This Is the OpenAI API. It Makes Spookily Good Twitter Bots. 13⁄10 Would Retweet”
This is the OpenAI API. It makes spookily good twitter bots. 13⁄10 would retweet
“Interpreting GPT: the Logit Lens”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
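The nearest-neighbor progression described above can be sketched as a greedy walk over annotation embeddings: start from the newest annotation, then repeatedly jump to the most similar annotation not yet visited. This is a minimal illustrative sketch, not the site's actual implementation; `nearest_neighbor_order`, the toy 2-D vectors, and the cosine-similarity choice are all assumptions (real annotation embeddings would come from a sentence-embedding model).

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_neighbor_order(embeddings):
    """Greedily order items so each is followed by its most similar
    not-yet-visited neighbor, producing a progression of topics."""
    unvisited = list(range(len(embeddings)))
    order = [unvisited.pop(0)]  # begin with the newest annotation (index 0)
    while unvisited:
        current = embeddings[order[-1]]
        nxt = max(unvisited, key=lambda i: cosine(current, embeddings[i]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

# Toy example: item 2 is nearly parallel to item 0, so it follows it.
print(nearest_neighbor_order([[1, 0], [0, 1], [0.9, 0.1]]))  # → [0, 2, 1]
```

Greedy nearest-neighbor ordering is a cheap heuristic for the traveling-salesman-like problem of arranging items so adjacent ones are similar; clustering the resulting sequence then yields the auto-labeled sections.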
ai-impact
generative
transformer-applications
language-models
Wikipedia
Miscellaneous
- /doc/ai/nn/transformer/gpt/2023-bommarito-figure1-gpt3cpaaccountingexamperformancebyexamsection.png
- /doc/ai/nn/transformer/gpt/2023-bommarito-figure2-progressofgpt3overtimeoncpaaccountingexam.png
- /doc/ai/nn/transformer/gpt/2023-qin-figure1-chatgptvsgpt35on20nlpdatasets.png
- /doc/ai/nn/transformer/gpt/2022-08-06-gwern-meme-netflixliegirl-studyingdeeplearningscaling.png
- /doc/ai/nn/transformer/gpt/2022-05-22-gwern-meme-tintinwhataweekhuh-2ndanniversaryofgpt3paper.png
- /doc/ai/nn/transformer/gpt/2022-bommarito-figure1-gpt3performanceonbarexambycategory.png
- /doc/ai/nn/transformer/gpt/2022-bommarito-figure2-increaseofgpt3modelaccuracyonbarexambysize.png
- /doc/ai/nn/transformer/gpt/2021-05-25-naver-hyperclova-computescaling0137bto82b.png
- /doc/ai/nn/transformer/gpt/2021-01-11-gwern-meme-dogbarkcanthurtyou-aiscaling.jpg
- /doc/ai/nn/transformer/gpt/2021-almeida-figure2-lhoptgpt3hyperparametertuningscalinglaw.png
- /doc/ai/nn/transformer/gpt/2021-dou-figure2-errorsbymodel.png
- /doc/ai/nn/transformer/gpt/2021-dou-figure3-errorsbytype.png
- /doc/ai/nn/transformer/gpt/2021-dou-figure4-errorsbydecodingsamplingstrategyhyperparameters.png
- /doc/ai/nn/transformer/gpt/2021-hernandez-transferlearning-figure2-transferscaling.png
- /doc/ai/nn/transformer/gpt/2021-kim-figure4-datatransferfromenglishtochinese.png
- /doc/ai/nn/transformer/gpt/2021-kim-figure5-transferfromenglishtochinesespanishgerman.png
- /doc/ai/nn/transformer/gpt/2021-nogueira-figure1-additionperformanceofnumberorthographies.png
- /doc/ai/nn/transformer/gpt/2020-06-21-openai-beta-gpt3-playgroundui.png
- /doc/ai/nn/transformer/gpt/2020-06-18-karpathy-expandingbrainmeme-gpt3metalearning.jpg
- /doc/ai/nn/transformer/gpt/2020-04-01-gwern-gpt2-5k-midi-training.png
- /doc/ai/nn/transformer/gpt/2020-02-03-gpt21.5b-archiveofourownao3-model-510427-samples-topp090.txt
- /doc/ai/nn/transformer/gpt/2020-02-03-gpt21.5b-videogamewalkthrough-model-174925-samples-topp090.txt
- /doc/ai/nn/transformer/gpt/2020-01-20-gwern-gpt2-25k-midi-training.png
- /doc/ai/nn/transformer/gpt/2020-bostrom-unigramlm-figure1-unigramlmvsbpe.png
- /doc/ai/nn/transformer/gpt/2020-brown-figure31-gpt3scaling.png
- /doc/ai/nn/transformer/gpt/2020-brown-figure313-humanabilitytodetectmodelgeneratednewsstories.png
- /doc/ai/nn/transformer/gpt/2020-brown-gpt3-figure13-meanperformancescalingcurve.png
- /doc/ai/nn/transformer/gpt/2020-hendrycks-figure1b-gpt3-qascaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure1-scalingacrossdomains.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure11-pretrainingimageclassificationscaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure2-universalmodelsizescaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure3-domainmodelsizescaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-figure31-qandamodelscaling.png
- /doc/ai/nn/transformer/gpt/2020-henighan-table1-autoregressivemodelsscalingpowerlaws.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-appendix1-summaryofneurallanguagemodelscalingpowerlaws.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure1-dlscaling.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure15-projectingscaling.png
- /doc/ai/nn/transformer/gpt/2020-kaplan-figure7-rnnsvstransformers.png
- /doc/ai/nn/transformer/gpt/2020-zhang-figure1-thelikelihoodtrap.png
- /doc/ai/nn/transformer/gpt/2019-12-21-gwern-gpt2-preferencelearning-abc-combinedmodel-divergence.png
- /doc/ai/nn/transformer/gpt/2019-12-17-gwern-gpt2-preferencelearning-abc-terminal.png
- /doc/ai/nn/transformer/gpt/2019-12-16-gwern-gpt2-15b-poetry-tensorboard-100tputraining.png
- /doc/ai/nn/transformer/gpt/2019-12-13-gwern-gpt2-15b-poetry-tensorboard-97tputraining.png
- /doc/ai/nn/transformer/gpt/2019-12-13-gwern-gpt2-preferencelearning-abc-combinedmodel-halfbounce.png
- /doc/ai/nn/transformer/gpt/2019-12-12-gwern-gpt2-abc-score-polkaebbbab.png
- /doc/ai/nn/transformer/gpt/2019-11-19-gwern-gpt2-15b-poetry-tensorboard-1tputraining.jpg
- /doc/ai/nn/transformer/gpt/2019-keskar-table2-ctrltextsamplesusingonlymetadatawithoutaprompt.png
- /doc/ai/nn/transformer/gpt/2019-keskar-table7-datasetsandcontrolcodesmetadata.png
- /doc/ai/nn/transformer/gpt/2019-openai-gpt2-demo-recyclingtextsample.png
- /doc/ai/nn/transformer/gpt/2019-radford-figure4-gpt2validationloss.png
- /doc/ai/nn/transformer/gpt/2019-ziegler-preferencelearning-figure1-architecture.png
- /doc/ai/nn/transformer/gpt/2018-huang-magenta-musictransformer-attentionvisualization.png
- https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
- https://adversa.ai/blog/universal-llm-jailbreak-chatgpt-gpt-4-bard-bing-anthropic-and-beyond/
- https://analyticsindiamag.com/when-chatgpt-attempted-upsc-exam/
- https://blog.research.google/2017/08/transformer-novel-neural-network.html
- https://colab.research.google.com/drive/1c6VccMPsOMAUQCKU4BVDRd5Y32qkozmK
- https://davidrozado.substack.com/p/the-political-preferences-of-llms
- https://github.com/jujumilk3/leaked-system-prompts/tree/main
- https://hedgehogreview.com/issues/markets-and-the-good/articles/language-machinery
- https://homepages.inf.ed.ac.uk/abmayne/publications/sennrich2016NAACL.pdf
- https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion
- https://mi.eng.cam.ac.uk/projects/cued-rnnlm/papers/Interspeech15.pdf
- https://nautil.us/your-next-new-best-friend-might-be-a-robot-235779/
- https://platform.openai.com/docs/guides/gpt-best-practices
- https://promptarmor.substack.com/p/data-exfiltration-from-writercom
- https://simonwillison.net/2023/Apr/14/worst-that-can-happen/
- https://soundcloud.com/seaandsailor/sets/char-rnn-composes-irish-folk-music
- https://techtualist.substack.com/p/i-wrote-a-script-for-gpt-3-to-take
- https://thezvi.substack.com/p/jailbreaking-the-chatgpt-on-release
- https://twitter.com/BlancheMinerva/status/1662521904727756801
- https://twitter.com/OfficialLoganK/status/1664476604658069511
- https://twitter.com/RiversHaveWings/status/1459646450275553285
- https://twitter.com/fluffykittnmeow/status/1737639861350269213
- https://twitter.com/francoisfleuret/status/1714531085512544760
- https://twitter.com/mathemagic1an/status/1595410144522813440
- https://web.archive.org/web/20240102075620/https://www.jailbreakchat.com/
- https://www.alignmentforum.org/posts/rtEtTybuCcDWLk7N9/ama-conjecture-a-new-alignment-startup
- https://www.forbes.com/sites/thomasbrewster/2023/11/16/chatgpt-becomes-a-social-media-spy-assistant/
- https://www.forefront.ai/blog-posts/how-to-fine-tune-gpt-neox
- https://www.freaktakes.com/p/the-past-and-present-of-computer
- https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
- https://www.lesswrong.com/posts/EzuBSASuui5qekhLA/assessing-alephalphas-multimodal-model
- https://www.lesswrong.com/posts/PDLfpRwSynu73mxGw/basic-facts-about-language-model-internals-1
- https://www.lesswrong.com/posts/YKfNZAmiLdepDngwi/gpt-175bee
- https://www.lesswrong.com/posts/a3FuA7fGgpTQ7mX3W/is-gpt3-a-good-rationalist-instructgpt3-2-2
- https://www.lesswrong.com/posts/etoMr4vcnP7joQHWa/notes-from-a-prompt-factory
- https://www.lesswrong.com/posts/jfq2BH5kfQqu2vYv3/we-are-conjecture-a-new-alignment-research-startup
- https://www.lesswrong.com/posts/yZb5eFvDoaqB337X5/investigating-causal-understanding-in-llms
- https://www.lesswrong.com/posts/ydeaHqDPJ5REJWvat/a-one-question-turing-test-for-gpt-3
- https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/
- https://www.nytimes.com/interactive/2023/04/26/upshot/gpt-from-scratch.html
- https://www.oneusefulthing.org/p/working-with-ai-two-paths-to-prompting
- https://www.politico.eu/article/italian-privacy-regulator-bans-chatgpt/
- https://www.reddit.com/r/ChatGPT/comments/12xai7j/spamming_the_word_stop_2300_times_or_probably_any/
- https://www.reddit.com/r/ChatGPT/comments/15y4mqx/i_asked_chatgpt_to_maximize_its_censorship/
- https://www.reddit.com/r/GPT3/comments/ra6nk4/had_gpt3_generate_the_onion_headlines/
- https://www.reddit.com/r/GPT3/comments/tgud2t/my_new_favorite_thing_is_making_gpt3_create/
- https://www.reddit.com/r/MachineLearning/comments/v42pej/p_this_is_the_worst_ai_ever_gpt4chan_model/
- https://www.sfchronicle.com/projects/2021/jessica-simulation-artificial-intelligence/
Link Bibliography
-
https://arxiv.org/abs/2402.15570
: “Fast Adversarial Attacks on Language Models In One GPU Minute”, Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan, Priyatham Kattakinda, Atoosa Chegini, Soheil Feizi -
https://arxiv.org/abs/2401.15024#microsoft
: “SliceGPT: Compress Large Language Models by Deleting Rows and Columns”, Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman -
https://arxiv.org/abs/2312.16862
: “TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones”, Zhengqing Yuan, Zhaoxu Li, Lichao Sun -
https://arxiv.org/abs/2311.16079
: “MEDITRON-70B: Scaling Medical Pretraining for Large Language Models”, -
https://www.reuters.com/technology/sam-altmans-ouster-openai-was-precipitated-by-letter-board-about-ai-breakthrough-2023-11-22/
: “OpenAI Researchers Warned Board of AI Breakthrough ahead of CEO Ouster, Sources Say”, Anna Tong, Jeffrey Dastin, Krystal Hu -
https://arxiv.org/abs/2310.06786
: “OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text”, Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba -
https://arxiv.org/abs/2309.10668#deepmind
: “Language Modeling Is Compression”, -
https://arxiv.org/abs/2306.07567
: “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Fabien Roger -
https://arxiv.org/abs/2305.10429#google
: “DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining”, -
https://www.forbes.com/sites/alexkonrad/2023/05/02/inflection-ai-ex-deepmind-launches-pi-chatbot/
: “Inflection AI, Startup From Ex-DeepMind Leaders, Launches Pi—A Chattier Chatbot”, Alex Konrad -
https://arxiv.org/abs/2304.06762#nvidia
: “Shall We Pretrain Autoregressive Language Models With Retrieval? A Comprehensive Study”, -
https://warontherocks.com/2023/04/how-large-language-models-can-revolutionize-military-planning/
: “How Large-Language Models Can Revolutionize Military Planning”, Benjamin Jensen, Dan Tadross -
https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and
: “Int-4 LLaMa Is Not Enough—Int-3 and Beyond: More Compression, Easier to Build Apps on LLMs That Run Locally”, nolano.org -
https://osf.io/5uxra/
: “Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, Yuki Kataoka -
https://arxiv.org/abs/2302.13939
: “SpikeGPT: Generative Pre-trained Language Model With Spiking Neural Networks”, Rui-Jie Zhu, Qihang Zhao, Jason K. Eshraghian -
https://www.nytimes.com/2022/12/21/technology/ai-chatgpt-google-search.html
: “A New Chat Bot Is a ‘Code Red’ for Google’s Search Business: A New Wave of Chat Bots like ChatGPT Use Artificial Intelligence That Could Reinvent or Even Replace the Traditional Internet Search Engine”, Nico Grant, Cade Metz -
https://arxiv.org/abs/2211.10438
: “SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models”, Guangxuan Xiao, Ji Lin, Mickael Seznec, Julien Demouth, Song Han -
https://arxiv.org/abs/2211.09800
: “InstructPix2Pix: Learning to Follow Image Editing Instructions”, Tim Brooks, Aleksander Holynski, Alexei A. Efros -
https://arxiv.org/abs/2211.09085#facebook
: “Galactica: A Large Language Model for Science”, -
https://arxiv.org/abs/2211.08411
: “Large Language Models Struggle to Learn Long-Tail Knowledge”, Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel -
https://arxiv.org/abs/2210.17323
: “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers”, Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh -
https://arxiv.org/abs/2210.13673#nvidia
: “Evaluating Parameter Efficient Learning for Generation”, -
https://arxiv.org/abs/2210.10341#microsoft
: “BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining”, Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, Tie-Yan Liu -
https://arxiv.org/abs/2210.15458#google
: “Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models”, Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Passos, Sumit Sanghai -
https://arxiv.org/abs/2210.06423#microsoft
: “Foundation Transformers”, -
https://arxiv.org/abs/2210.02441
: “Ask Me Anything (AMA): A Simple Strategy for Prompting Language Models”, -
https://arxiv.org/abs/2210.01241
: “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, -
https://arxiv.org/abs/2208.01066
: “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant -
https://arxiv.org/abs/2207.04429
: “LM-Nav: Robotic Navigation With Large Pre-Trained Models of Language, Vision, and Action”, Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine -
https://arxiv.org/abs/2206.01861#microsoft
: “ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers”, Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He -
https://www.nature.com/articles/s41593-022-01026-4
: “Shared Computational Principles for Language Processing in Humans and Deep Language Models”
https://arxiv.org/abs/2110.04627#google
: “Vector-quantized Image Modeling With Improved VQGAN”
https://www.nature.com/articles/s42003-022-03036-1
: “Brains and Algorithms Partially Converge in Natural Language Processing”, Charlotte Caucheteux, Jean-Rémi King
https://arxiv.org/abs/2201.11990#microsoftnvidia
: “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”
https://swabhs.com/assets/pdf/wanli.pdf#allen
: “WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation”, Alisa Liu, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
https://arxiv.org/abs/2112.04426#deepmind
: “Improving Language Models by Retrieving from Trillions of Tokens”
https://arxiv.org/abs/2111.02570#microsoft
: “CLUES: Few-Shot Learning Evaluation in Natural Language Understanding”
https://arxiv.org/abs/2110.11309
: “Fast Model Editing at Scale”, Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
https://arxiv.org/abs/2109.02593#allen
: “General-Purpose Question-Answering With Macaw”, Oyvind Tafjord, Peter Clark
https://arxiv.org/abs/2106.06981
: “RASP: Thinking Like Transformers”, Gail Weiss, Yoav Goldberg, Eran Yahav
https://arxiv.org/abs/2105.13626#google
: “ByT5: Towards a Token-free Future With Pre-trained Byte-to-byte Models”, Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
https://m.koreaherald.com/view.php?ud=20210525000824#naver
: “Naver Unveils First ‘hyperscale’ AI Platform”, Kang Jae-eun
https://arxiv.org/abs/2009.03393#openai
: “Generative Language Modeling for Automated Theorem Proving”, Stanislas Polu, Ilya Sutskever
https://arxiv.org/abs/2004.10802
: “Scaling Laws from the Data Manifold Dimension”, Utkarsh Sharma, Jared Kaplan
https://arxiv.org/abs/2001.08361#openai
: “Scaling Laws for Neural Language Models”
https://arxiv.org/abs/2001.04451#google
: “Reformer: The Efficient Transformer”, Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
https://arxiv.org/abs/1909.05858#salesforce
: “CTRL: A Conditional Transformer Language Model For Controllable Generation”, Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher (Salesforce)
https://magenta.tensorflow.org/music-transformer
: “Music Transformer: Generating Music With Long-Term Structure”, Cheng-Zhi Anna Huang, Ian Simon, Monica Dinculescu
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf#page=5
: “GPT-1: Improving Language Understanding by Generative Pre-Training § Model Specifications”, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
https://paulfchristiano.com/
: “Homepage of Paul F. Christiano”, Paul F. Christiano