- “SynthBio: A Case Study in Faster Curation of Text Datasets”, Yuan et al 2022
- “GLaM: Efficient Scaling of Language Models With Mixture-of-Experts”, Du et al 2021
- “Show Your Work: Scratchpads for Intermediate Computation With Language Models”, Nye et al 2021
- “AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
- “A Recipe For Arbitrary Text Style Transfer With Large Language Models”, Reif et al 2021
- “Finetuned Language Models Are Zero-Shot Learners”, Wei et al 2021
- “Program Synthesis With Large Language Models”, Austin et al 2021
- “Towards a Human-like Open-Domain Chatbot”, Adiwardana et al 2020
Google LaMDA is a large 137b-parameter dense Transformer neural network model, announced by Google in May 2021 as a follow-up to Meena; it is most similar to OpenAI’s May 2020 GPT-3 (175b) in both design and capabilities. That LaMDA roughly matches GPT-3 despite having fewer parameters may be due to its higher-quality training data, particularly the large dialogue training dataset inherited from Meena.
LaMDA is one of the standard testbeds for Google scaling research and for examining the many surprising capabilities that scaled-up models turn out to have, and many papers have been published using it. Mysteriously, Googlers are not allowed to name LaMDA in those papers, or even to confirm or deny whether the model is LaMDA when asked; instead, the papers vaguely allude to a series of large models (eg. “we used pre-trained dense decoder-only language models, ranging in size from 2 million to 137 billion parameters. These models were pre-trained on web documents and dialog data”).
This index collates papers I infer make use of LaMDA: typically, if a Google paper uses a model <20b parameters, it is probably a bidirectional T5 model; if >200b parameters, it is actually a mixture-of-experts model (eg. Switch); and if a >150b-parameter model is specified to be dense, it may be a different model such as DeepMind’s 280b-parameter Gopher. A dense model of ~137b parameters, however, is most likely LaMDA itself.
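For concreteness, the rule of thumb can be written as a short decision rule; the following Python sketch is purely illustrative (the function `guess_model` and its exact thresholds are my own paraphrase of the heuristics above, not anything taken from the papers):

```python
def guess_model(params_b: float, dense: bool) -> str:
    """Guess which model family an unnamed 'large language model' in a
    Google paper is, from its stated parameter count (in billions) and
    whether it is described as dense. (Hypothetical helper encoding the
    rules of thumb above, not anything from the papers themselves.)"""
    if params_b < 20:
        return "probably a bidirectional T5 model"
    if dense and params_b > 150:
        return "probably a different dense model (eg. DeepMind's 280b Gopher)"
    if params_b > 200:
        return "probably a mixture-of-experts model (eg. Switch)"
    if dense and 100 <= params_b <= 150:
        return "probably LaMDA (137b dense)"
    return "unclear from size alone"

print(guess_model(137, dense=True))    # probably LaMDA (137b dense)
print(guess_model(8, dense=True))      # probably a bidirectional T5 model
print(guess_model(1600, dense=False))  # probably a mixture-of-experts model (eg. Switch)
```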