Google LaMDA LLM

Google LaMDA is a large 137b-parameter dense Transformer neural network model, announced by Google in May 2021 as a follow-up to Meena; it is most similar to OpenAI’s May 2020 GPT-3 (175b) in both design and capabilities. This rough parity, despite LaMDA’s smaller parameter count, may be due to its higher-quality training data, particularly the large dialogue training dataset inherited from Meena.

LaMDA is one of the standard testbeds for Google scaling research and for examining the many surprising capabilities that scaled-up models turn out to have, and many papers have been published about it. Mysteriously, Googlers were not allowed to name LaMDA in those papers, or even to confirm or deny whether a given model was LaMDA when asked; instead, the early papers vaguely alluded to a series of large Transformers (eg. “we used pre-trained dense decoder-only Transformer language models, ranging in size from 2 million to 137 billion parameters. These models were pre-trained on web documents and dialog data”), leading to confusion over which model a given result actually used.

So, this index collates LaMDA papers. As a rule of thumb for Google papers during 2021–2022: if a paper uses a model of <20b parameters, it is probably a T5 bidirectional Transformer; if >200b parameters, it is actually a mixture-of-experts model (eg. Switch); if a >150b-parameter model is specified to be dense, it may be a different model like DeepMind’s 280b-parameter Gopher; but a ~137b-parameter dense decoder-only model is almost certainly LaMDA.
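For concreteness, that rule of thumb can be written down as a tiny classifier. The sketch below is only an illustration of the heuristic: the function name, thresholds, and labels are mine, not anything Google publishes.

```python
def guess_google_model(params_b: float, dense: bool = True, decoder_only: bool = True) -> str:
    """Heuristic guess at which model an anonymized 2021-2022 Google paper used.

    params_b: reported parameter count, in billions.
    dense / decoder_only: architectural details, when the paper states them.
    (Illustrative sketch only; thresholds follow the rule of thumb above.)
    """
    if params_b < 20:
        return "probably T5 (bidirectional encoder-decoder)"
    if dense and decoder_only and 130 <= params_b <= 140:
        return "probably LaMDA (the 137b dense decoder-only model)"
    if dense and params_b > 150:
        return "possibly a different dense model (eg. DeepMind's 280b-parameter Gopher)"
    if params_b > 200:
        return "probably a mixture-of-experts model (eg. Switch)"
    return "unclear -- check the paper's acknowledgments and author list"


# Examples of the heuristic in use:
print(guess_google_model(137))                 # probably LaMDA
print(guess_google_model(11))                  # probably T5
print(guess_google_model(1600, dense=False))   # probably a mixture-of-experts model
print(guess_google_model(280))                 # possibly Gopher
```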