May 2020 Gwern.net newsletter: GPT-3 scaling, implications, deep theory; anime GAN updates, and 1 book review.
May 2020’s Gwern.net newsletter is now out; previous, April 2020 (archives). This is a collation of links and summary of major changes, overlapping with my Changelog; brought to you by my donors on Patreon.
Writings
- Ganbooru prototype: released 256px BigGAN trained on Danbooru2019; Danbooru2019 Figures dataset
- Gwern.net:
  - experimental <srcset> mobile image optimization
  - popups.js: +support for reverse-footnote popups

Mailing List Switch

The newsletter moved this month to Substack due to reaching the TinyLetter 5000-subscriber limit. Please let me know of any issues beyond the known issue of length truncation. (Note that reading the website version on desktop is the recommended way for annotations etc.)
On GPT-3: Meta-Learning, Scaling, Implications, And Deep Theory
On “GPT-3: Language Models are Few-Shot Learners”, Brown et al 2020 (poems & my followup GPT-3 Creative Writing, compare my old finetuned GPT-2 poetry; random samples; “OpenAI API” with real-world demos)
GPT-3, announced by OpenAI in May 2020, is the largest neural network ever trained, by over an order of magnitude. Trained on Internet text data, it is the successor to GPT-2, which had surprised everyone by its natural language understanding & generation ability. To the surprise of most (including myself), this vast increase in size did not run into diminishing or negative returns, as many expected, but the benefits of scale continued to happen as forecasted by OpenAI. These benefits were not merely learning more facts & text than GPT-2, but qualitatively distinct & even more surprising in showing meta-learning: while GPT-2 learned how to do common natural language tasks like text summarization, GPT-3 instead learned how to follow directions and learn new tasks from a few examples. (As a result, GPT-3 outputs & interaction are more fascinating & human-like than GPT-2.)
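To make “learning a new task from a few examples” concrete, here is a minimal sketch of the few-shot prompting format: the English→French task and the ‘cheese => fromage’ example appear in Brown et al 2020’s figures, but the prompt-assembly code, extra example pairs, and variable names below are my own illustration, not OpenAI’s API.

```python
# Minimal sketch of few-shot prompting (illustrative only; the actual request
# to a hosted GPT-3 model is omitted). The task is never fine-tuned on: it is
# specified entirely inside the prompt, and the model infers the pattern from
# the examples and continues it.
examples = [
    ("cheese", "fromage"),
    ("apple", "pomme"),
    ("book", "livre"),
]
query = "house"

prompt = "Translate English to French:\n"
for english, french in examples:
    prompt += f"{english} => {french}\n"
prompt += f"{query} =>"   # the model is expected to continue with " maison"

print(prompt)
# Sending `prompt` as a completion request and reading back a few tokens is
# the entire "training" procedure; dropping the examples ("zero-shot") leaves
# only the instruction line, which works but less reliably.
```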
While the immediate applications of GPT-3, like my poetry or humor writings, are nice, the short-term implications of GPT-3 are much more important.
First, while GPT-3 is expensive by conventional DL standards, it is cheap by scientific/commercial/military/government budget standards, and the results indicate that models could be made much larger. Second, models can also be made much more powerful, as GPT is an old approach known to be flawed in both minor & major ways, and far from an ‘ideal’ Transformer. Third, GPT-3’s capabilities come from learning on raw (unsupervised) data; that has long been one of the weakest areas of DL, holding back progress in other areas like reinforcement learning or robotics. Models like GPT-3 suggest that large unsupervised models will be vital components of future DL systems, as they can be ‘plugged into’ systems to immediately provide understanding of the world, humans, natural language, and reasoning.
The meta-learning has a longer-term implication: it is a demonstration of the blessings of scale, where problems with simple neural networks vanish, and they become more powerful, more generalizable, more human-like when simply made very large & trained on very large datasets with very large compute—even though those properties are believed to require complicated architectures & fancy algorithms (and this perceived need drives much research). Unsupervised models benefit from this, as training on large corpuses like Internet-scale text presents a myriad of difficult problems to solve; this is enough to drive meta-learning despite GPT not being designed for meta-learning in any way. (This family of phenomena is perhaps driven by neural networks functioning as ensembles of many sub-networks which all average out to an Occam’s razor: with small data & models, they learn only superficial or memorized parts of the data, but they can be forced into true learning by making the problems hard & rich enough; as meta-learners learn amortized Bayesian inference, they build in informative priors when trained over many tasks, and become dramatically more sample-efficient and better at generalization.)
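As a toy numeric illustration of that last claim (my own example, not from any cited paper): a learner that has distilled many previous tasks into an informative prior reaches a confident answer on a new task from a handful of observations, while a flat-prior learner is still uncertain. A conjugate Beta-Bernoulli model keeps the updating exact:

```python
# Toy illustration (my own, not from any cited paper) of why informative
# priors amortized across many tasks buy sample-efficiency: conjugate
# Beta-Bernoulli updating on a new task with only 5 observations.
def posterior(successes, failures, prior_a, prior_b):
    a, b = prior_a + successes, prior_b + failures
    mean = a / (a + b)
    sd = ((a * b) / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd

data = (4, 1)  # 4 successes, 1 failure observed on the new task

flat = posterior(*data, prior_a=1, prior_b=1)        # uninformative Beta(1,1) prior
informed = posterior(*data, prior_a=16, prior_b=4)   # prior distilled from many similar tasks

print(f"flat prior:     mean={flat[0]:.2f}, sd={flat[1]:.2f}")          # ~0.71 ± 0.16
print(f"informed prior: mean={informed[0]:.2f}, sd={informed[1]:.2f}")  # ~0.80 ± 0.08
# The informed learner is already confident after 5 observations; the
# flat-prior learner's posterior remains wide. Meta-learning amortizes the
# cost of acquiring such priors across many tasks.
```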
The blessings of scale in turn support a radical theory: an old AI paradigm held by a few pioneers in connectionism (early artificial neural network research) and by more recent deep learning researchers, the scaling hypothesis. The scaling hypothesis regards the blessings of scale as the secret of AGI: intelligence is ‘just’ simple neural units & learning algorithms applied to diverse experiences at a (currently) unreachable scale. (In particular, much of intelligence is ‘just’ prediction because accurate prediction requires understanding.) As increasing computational resources permit running such algorithms at the necessary scale, the neural networks will get ever more intelligent.
When? Estimates of Moore’s law-like progress curves made decades ago by pioneers like Hans Moravec indicated that sufficiently-cheap compute for tiny insect-level prototype systems would become available in the 2010s, and that the first sub-human systems would become feasible in the 2020s; these forecasts are holding up. (Despite this vindication, the scaling hypothesis is so unpopular an idea, and so difficult to prove in advance rather than as a fait accompli, that while the GPT-3 results finally drew some public notice after OpenAI enabled limited public access & people could experiment with it live, it is unlikely that many entities will modify their research philosophies, much less kick off an ‘arms race’.)
More concerningly, GPT-3’s scaling curves, unpredicted meta-learning, and success on various anti-AI challenges suggest that in terms of futurology, AI researchers’ forecasts are an emperor sans garments: they have no coherent model of how AI progress happens, why GPT-3 was possible, what specific achievements should cause alarm, or where intelligence comes from, and they do not learn from any falsified predictions. Their primary concerns appear to be supporting the status quo, placating public concern, and remaining respectable. As such, their comments on AI risk are meaningless: they would make the same public statements whether the scaling hypothesis were true or not.
Depending on what investments are made into scaling DL, and how fast compute grows, the 2020s should be quite interesting—sigmoid or singularity?
Moved to “The Scaling Hypothesis”.
Media
Links
AI:
- Matters Of Scale:
- GPT-3: see above; for GPT-3 compared to humans on the absolute scale of character prediction, see Scaling Hypothesis, footnote 18
- “Measuring the Algorithmic Efficiency of Neural Networks”, Hernandez & Brown 2020 (blog/interview; the first prototype is never the best one, but given enough compute & time, you can refine it and figure out how it should have been done all along, and this paper quantifies the neural net hardware overhang just since 2012: “it now takes 44× less compute to train…to the level of AlexNet”; a back-of-the-envelope doubling-time calculation follows this list. Unsurprising—eg the experience curve in linear programming: Bixby 2002; see Grace 2013/Yudkowsky 2013. We don’t know how to train the right kind of neural nets and make huge mistakes with the simplest things, as capability jumps like resnets or EfficientNet or R2D2 occasionally remind us.)
- “IntelliCode Compose: Code Generation Using [GPT-2] Transformer”, Svyatkovskiy et al 2020 (unclear if application of ZeRO-2; see also the GPT-3 few-shot code completion abilities)
- “GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce” (blog; one model, 7 datasets, 89m images, 83 losses/tasks, and +8% search quality boost worldwide)
- “Deep neuroethology of a virtual rodent”, Merel et al 2019 (media)
- “Go-Explore 2: First return then explore”, Ecoffet et al 2021
- “Learning to Simulate Dynamic Environments with GameGAN”, Kim et al 2020 (project page, code; an unexpected appearance of a Neural Turing Machine)
- “Exploring Bayesian Optimization: Breaking Bayesian Optimization into small, sizeable chunks”, Agnihotri & Batra 2020
- “This Word Does Not Exist” (GPT-2); “This Fursona Does Not Exist (TFDNE)” editor (a simple but high-quality StyleGAN 2 face model of furries, also available on Artbreeder; interesting for how the fur flew due to legal fuzziness & some artists acting like animals, howling about ‘theft’ & free fursonas being a wolf in sheep’s clothing upsetting their pecking order[1]—though the creator has outfoxed the paper tiger threats, these kittlesome questions will dog ML as DL models multiply like rabbits)
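As a back-of-the-envelope check on the Hernandez & Brown 2020 headline figure above (the 2012→2019 measurement window, AlexNet to EfficientNet, is my reading of the paper’s setup, and the code is my own sketch):

```python
import math

# Implied doubling time of algorithmic efficiency from the headline
# "44x less compute to train to the level of AlexNet"; the 2012-2019 window
# is an assumption based on my reading of the paper.
efficiency_gain = 44
years = 2019 - 2012

doublings = math.log2(efficiency_gain)            # ~5.5 halvings of training compute
doubling_time_months = years * 12 / doublings     # ~15-16 months per halving

print(f"{doublings:.1f} doublings over {years} years "
      f"=> compute requirement halves every ~{doubling_time_months:.0f} months")
```

This is consistent with the roughly 16-month efficiency doubling time the paper reports, ie. faster than hardware gains from Moore’s law alone.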
Genetics:
- Everything Is Heritable:
- “Local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits”, Zhang et al 2020 (“autism/IQ rg…could be explained by 2 etiologically-distinct genetic signatures w/bidirectional local genetic correlations”)
- “Identification of 370 loci for age at onset of sexual and reproductive behaviour, highlighting common aetiology with reproductive biology, externalizing behaviour and longevity”, Mills et al 2020
- “Genome-wide association study of school grades identifies a genetic overlap between language ability, psychopathology and creativity”, Rajagopal et al 2020 (“math performance was severely affected whereas language performance (Danish and English) was relatively unaffected or enhanced in those with psychiatric disorders”)
- “GWAS of Depression Phenotypes in the Million Veteran Program and Meta-analysis in More than 1.2 Million Participants Yields 178 Independent Risk Loci”, Levey et al 2020
- “A Single Gene Causes Thelytokous Parthenogenesis, the Defining Feature of the Cape Honeybee Apis mellifera capensis”, Yagound et al 2020
- “Insights into the genetic architecture of the human face”, White et al 2020
- Recent Evolution:
- “Sex-biased reduction in reproductive success drives selective constraint on human genes”, Gardner et al 2020; “Genome-wide analysis identifies genetic effects on reproductive success and ongoing natural selection at the FADS locus”, Mathieson et al 2020 (previously: Barban et al 2016/Tropf et al 2016/Verweij et al 2017)
- “Disentangling selection on genetically correlated polygenic traits using whole-genome genealogies”, Stern et al 2020
- Engineering:
Statistics/Meta-Science:
- “Variability in the analysis of a single neuroimaging dataset by many teams”, Botvinik-Nezer et al 2020
- “Remembering John Conway’s FRACTRAN, a ridiculous, yet surprisingly deep language”, Reginald Braithwaite (how does the recently-deceased John Conway’s 1980 esolang lead to the Collatz conjecture?)
- “Tumbling toast, Murphy’s Law and the fundamental constants”, Matthews 1995 (overview; anthropics size argument from Press 1980; see also Bacon et al 2001/Borghi 2012)
Politics/religion:
- Review of The Cultural Revolution
- “The Voluntariness of Voluntary Consent: Consent Searches and the Psychology of Compliance”, Sommers & Bohns 2019 (people are bad at predicting resistance to police requests; see also Christin et al 2012)
- Operation INFEKTION (see also Gordon 1997)
- “Progress Studies for Aspiring Young Scholars” (experimental online summer class for high school students by Jason Crawford on development)
Psychology/biology:
- “Understanding immunity through the lens of disease ecology”, Hedrick 2017[2] (“…for the past few thousand years, we human beings have been the most diseased species on earth”; followup to Hedrick 2004)
- “How sanitation conquered disease long before vaccines or antibiotics”, Jason Crawford
- “Everyday Life as an Intelligence Test: Effects of Intelligence and Intelligence Context”, Gordon 1997
- “Objective and subjective experiences of child maltreatment and their relationships with psychopathology”, Danese & Widom 2020 (nothing in psychology makes sense except in the light of individual differences)
- “Brainless but Multi-Headed: Decision Making by the Acellular Slime Mould Physarum polycephalum”, Beekman & Latty 2015
- “I’m paid biweekly, just not by leprechauns: Evaluating valid-but-incorrect response rates to attention check items”, Curran & Hauser 2019 (how do “lizardman constant” responders justify it? Or, ‘free response is the devil’)
Technology:
- “Reflections on How Designers Design with Data”, Bigelow et al 2014 (why are data visualizations so bad—superficially pretty but misleading or useless? Because many designers don’t look at the data, avoid automation & create manually so they can focus on pretty shapes/colors & enjoy fiddling with them, and ignore readers)
- “Do Ads Harm News Consumption?”, Yan et al 2020 (“Users who adopt ad blockers subsequently consume 20% more news articles corresponding to 10% more categories. The effect persists over time…”; see my ad page)
- “The 1-Bit Instrument: The Fundamentals of 1-Bit Synthesis, Their Implementational Implications, and Instrumental Possibilities”, Troise 2020
Economics:
- “In Ohio, the Amish Take On the Coronavirus” (supply and demand: masks can be easily made anywhere if prices are allowed to rise & they are not illegal to sell)
- “The Story of America’s Most Prolific Counterfeiter” (how Frank Bourassa tricked a Swiss mill into selling him the unique U.S. dollar linen-paper to create $250m in 2012 ($334.3m inflation-adjusted) in perfect counterfeit money & mostly got away with it)
Fiction:
Misc:
Books
Fiction:
- The Battle Between the Frogs and the Mice: A Tiny Homeric Epic, translated Stallings 2009 (review)
Music
- “Sept Jours sans Elle (Vocal)” (Raven’s Jig; Une Semaine chez les Écarlates {2018}) [classical]
- “Un Jour Joueur” (Raven’s Jig; Une Semaine chez les Écarlates {2018}) [classical]
- “Bons et mauvais Jours” (Raven’s Jig; Une Semaine chez les Écarlates {2018}) [classical]
MLP:
- “Morning in Baltimare” (Mane in Green; II. The Journey [The Quest of the Lost Sapphire—Ep. 2] {2017}) [instrumental rock]
- “Love and Reflection” (Dionte George; Ignite {2020}) [jazz]
- “Second Prances (Vocal VIP)” (Etherium Apex ft. Nicole Carino {2020}) [electronic]
- “Spun” (The Wasteland Wailers feat. Brittany Church & Haymaker; Ignite {2020}) [country]
- “Equiterian Empire” (Carbon Maestro; Celestial Divide OST) [orchestral]
- “The Storm Is Coming VIP [Single Purpose Remix]” (UndreamedPanic feat. Metajoker; Ignite {2020}) [rock]
- “Mare Cognitum” (Idyllia feat. Velvet R. Wings; Ignite {2020}) [orchestral rock]
- “Fire City (Day & Night)” (Wandering Artist; Ignite {2020}) [orchestral]
- “What Remains” (Totalspark; Ignite {2020}) [Liquid Drum & Brass]
Doujin:
- “Come, Sweet Death [Komm, süsser Tod]” (Platina Jazz feat. Niklas Gabrielsson; Anime Standards Vol. 6 {2019}) [jazz]
- “Hope” (Simpsonill {2017}) [electronic]
1. Don’t worry: we already have short-shorts & ear-TIPS to hedge against fursona inflation. That said, we advise taking a large position in equineties image macro funds to benefit from a flight to quality and herding: it’ll be a bear market for kinky bonds—and that’s no bull.
2. Some interesting references on viral evolution:
   - Coevolution Of Virulence:
     - Experimental Epidemiology, Greenwood et al 1936 (editorial)
     - “Population biology of infectious diseases: Part I”/“Part II”, Anderson & May 1979
     - “Coevolution of hosts and parasites”, Anderson & May 1982
   - “Experimental Evolution of Parasites”, Ebert 1998
   - “History of Sabin attenuated poliovirus oral live vaccine strains”, Sabin & Boulger 1973 (making Sabin’s polio vaccine by dozens of passages through monkeys & monkey tissues)