
transhumanism directory


“It Looks Like You’re Trying To Take Over The World”, Branwen 2022

Clippy: “It Looks Like You’re Trying To Take Over The World”, Gwern Branwen (2022-03-06):

Fictional short story about Clippy & AI hard takeoff scenarios grounded in contemporary ML scaling, self-supervised learning⁠, reinforcement learning, and meta-learning research literature.

It might help to imagine a hard takeoff scenario using solely known sorts of NN & scaling effects… Below is a story which may help stretch your imagination and defamiliarize the 2022 state of machine learning.

To read the annotated alternate version of this story, scroll to the end or disable reader-mode via the theme toggle in the upper-right corner.

“GPT-3 Creative Fiction”, Branwen 2020

GPT-3: “GPT-3 Creative Fiction”, Gwern Branwen (2020-06-19):

Creative writing by OpenAI’s GPT-3 model, demonstrating poetry, dialogue, puns, literary parodies, and storytelling. Plus advice on effective GPT-3 prompt programming & avoiding common errors.

I continue my AI poetry generation experiments with OpenAI’s 2020 GPT-3, which is 116× larger, and much more powerful, than the 2019 GPT-2⁠. GPT-3, however, is not merely a quantitative tweak yielding “GPT-2 but better”—it is qualitatively different, exhibiting eerie runtime learning capabilities allowing even the raw model, with zero finetuning, to “meta-learn” many textual tasks purely by example or instruction. One does not train or program GPT-3 in a normal way, but one engages in dialogue and writes prompts to teach GPT-3 what one wants.

Experimenting through the OpenAI Beta API in June 2020, I find that GPT-3 does not just match my finetuned GPT-2-1.5b-poetry for poem-writing quality, but exceeds it, while being versatile in handling poetry⁠, Tom Swifty puns⁠, science fiction, dialogue like Turing’s Turing-test dialogue⁠, literary style parodies… As the pièce de résistance, I recreate Stanislaw Lem’s Cyberiad’s “Trurl’s Electronic Bard” poetry using GPT-3. (Along the way, I document instances of how the BPE text encoding unnecessarily damages GPT-3’s performance on a variety of tasks, how to best elicit the highest-quality responses, common errors people make in using GPT-3, and test out GPT-3’s improvements in NN weak points like logic or commonsense knowledge.)

GPT-3’s samples are not just close to human level: they are creative, witty, deep, meta, and often beautiful. They demonstrate an ability to handle abstractions, like style parodies, I have not seen in GPT-2 at all. Chatting with GPT-3 feels uncannily like chatting with a human. I was impressed by the results reported in the GPT-3 paper, and after spending a week trying it out, I remain impressed.

This page records GPT-3 samples I generated in my explorations, and thoughts on how to use GPT-3 and its remaining weaknesses⁠. I hope you enjoy them even a tenth as much as I enjoyed testing GPT-3 and watching the completions scroll across my screen.

“The Scaling Hypothesis”, Branwen 2020

Scaling-hypothesis: “The Scaling Hypothesis”, Gwern Branwen (2020-05-28):

On GPT-3: meta-learning, scaling, implications, and deep theory. The scaling hypothesis: neural nets absorb data & compute, generalizing and becoming more Bayesian as problems get harder, manifesting new abilities even at trivial-by-global-standards-scale. The deep learning revolution has begun as foretold.

GPT-3, announced by OpenAI in May 2020, is the largest neural network ever trained, by over an order of magnitude. Trained on Internet text data, it is the successor to GPT-2, which had surprised everyone with its natural language understanding & generation ability. To the surprise of most (including myself), this vast increase in size did not run into diminishing or negative returns, as many expected; instead, the benefits of scale continued to accrue as forecast by OpenAI. These benefits were not merely learning more facts & text than GPT-2, but something qualitatively distinct & even more surprising: meta-learning. While GPT-2 learned how to do common natural language tasks like text summarization, GPT-3 instead learned how to follow directions and learn new tasks from a few examples. (As a result, GPT-3 outputs & interaction are more fascinating & human-like than GPT-2’s.)

While the immediate applications of GPT-3, like my poetry or humor writings, are nice, the short-term implications of GPT-3 are much more important.

First, while GPT-3 is expensive by conventional DL standards, it is cheap by scientific/​commercial/​military/​government budget standards, and the results indicate that models could be made much larger. Second, models can also be made much more powerful, as GPT is an old approach known to be flawed in both minor & major ways, and far from an ‘ideal’ Transformer⁠. Third, GPT-3’s capabilities come from learning on raw (unsupervised) data; that has long been one of the weakest areas of DL, holding back progress in other areas like reinforcement learning or robotics. Models like GPT-3 suggest that large unsupervised models will be vital components of future DL systems, as they can be ‘plugged into’ systems to immediately provide understanding of the world, humans, natural language, and reasoning.

The meta-learning has a longer-term implication: it is a demonstration of the blessings of scale, where problems with simple neural networks vanish, and they become more powerful, more generalizable, and more human-like when simply made very large & trained on very large datasets with very large compute—even though those properties are believed to require complicated architectures & fancy algorithms (and this perceived need drives much research). Unsupervised models benefit from this, as training on large corpora like Internet-scale text presents a myriad of difficult problems to solve; this is enough to drive meta-learning despite GPT not being designed for meta-learning in any way. (This family of phenomena is perhaps driven by neural networks functioning as ensembles of many sub-networks, all averaging out to an Occam’s razor: with small data & models, they learn superficial or memorized parts of the data, but can be forced into true learning by making the problems hard & rich enough; as meta-learners learn amortized Bayesian inference, they build in informative priors when trained over many tasks, and become dramatically more sample-efficient and better at generalization.)

The blessings of scale in turn support a radical theory: an old AI paradigm held by a few pioneers in connectionism (early artificial neural network research) and by more recent deep learning researchers, the scaling hypothesis⁠. The scaling hypothesis regards the blessings of scale as the secret of AGI: intelligence is ‘just’ simple neural units & learning algorithms applied to diverse experiences at a (currently) unreachable scale. As increasing computational resources permit running such algorithms at the necessary scale, the neural networks will get ever more intelligent.

When? Estimates of Moore’s law-like progress curves decades ago by pioneers like Hans Moravec indicated that it would take until the 2010s for sufficiently cheap compute to be available for tiny insect-level prototype systems, and the 2020s for the first sub-human systems to become feasible, and these forecasts are holding up. (Despite this vindication, the scaling hypothesis is so unpopular an idea, and difficult to prove in advance rather than as a fait accompli, that while the GPT-3 results finally drew some public notice after OpenAI enabled limited public access & people could experiment with it live, it is unlikely that many entities will modify their research philosophies, much less kick off an ‘arms race’.)

More concerningly, GPT-3’s scaling curves, unpredicted meta-learning, and success on various anti-AI challenges suggest that in terms of futurology, AI researchers’ forecasts are an emperor sans garments: they have no coherent model of how AI progress happens, why GPT-3 was possible, what specific achievements should cause alarm, or where intelligence comes from, and they do not learn from any falsified predictions. Their primary concerns appear to be supporting the status quo, placating public concern, and remaining respectable. As such, their comments on AI risk are meaningless: they would make the same public statements whether or not the scaling hypothesis were true.

Depending on what investments are made into scaling DL, and how fast compute grows, the 2020s should be quite interesting—sigmoid or singularity?

For more ML scaling research, follow the /r/MLScaling subreddit. For a fictional treatment as an SF short story, see “It Looks Like You’re Trying To Take Over The World”.

“Frequently Overlooked Realistic Moral Bioenhancement Interventions”, Conan 2019

2019-conan.pdf: “Frequently overlooked realistic moral bioenhancement interventions”, Gregory Mark Conan (2019):

Many supporters of ‘moral bioenhancement’ (MBE), the use of biomedical interventions for moral improvement, have been criticised for having unrealistic proposals. The interventions they suggest have often been called infeasible and their implementation plans vague or unethical. I dispute these criticisms by showing that various interventions to implement MBE are practically and ethically feasible enough to warrant serious consideration. Such interventions include transcranial direct current stimulation over the medial and dorsolateral prefrontal cortex, as well as supplementation with lithium and omega-3.

Considering their efficacy and feasibility, it is strange that these interventions have rarely been proposed or discussed as MBE. I review evidence that each of those interventions can reduce antisocial behaviour, reduce racial bias, increase executive function or increase prosocial traits like fairness and altruism. I then specify and defend realistic, ethically permissible ways to implement these interventions, especially for violent offenders and public servants—the former as rehabilitation and the latter to meet the high standards of their occupations. These interventions could be given to violent offenders in exchange for a reduced sentence or compulsorily in some cases. Potential intervention methods for non-prisoners include increasing the USDA-recommended dose of omega-3, encouraging food companies to supplement their products with omega-3 or trace lithium, requiring MBE for employment as a police officer or political leader, and insurance companies providing discounts for undergoing MBE.

In some reasonably limited form, using these interventions may be a good first step to implement the project of MBE.

“Race in My Little Pony”, Branwen 2018

MLP-genetics: “Race in My Little Pony”, Gwern Branwen (2018-06-04):

In MLP:FiM, the 3 pony races sometimes bear offspring of other pony races; I review 4 complicated Mendelian models attempting to explain this, and note that a standard polygenic liability-threshold model can fit it parsimoniously.

(For background on My Little Pony: Friendship is Magic, see my review of My Little Pony⁠.)

Another fictional universe with genetic mechanisms is My Little Pony: Friendship Is Magic, where there are 3 pony races which are heritable. One outlier family which has all 3 races represented challenges simple Mendelian interpretations of MLP races. I review 4 attempts to reconcile the outlier with Mendelian mechanisms, and propose another interpretation, drawing on polygenic mechanisms, treating race as a polytomous liability threshold trait, which is flexible enough to explain all observations in-universe (at least for the first few seasons of MLP).
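The polytomous liability-threshold model is easy to make concrete. Below is a minimal sketch, with entirely made-up thresholds and heritability (nothing here is fitted to the show): each pony has a continuous polygenic liability, two cutoffs map liability to one of the 3 races, and a foal's liability is the midparent value plus segregation noise.

```python
import random

def pony_race(liability, t1=-0.5, t2=0.5):
    """Map a continuous polygenic liability onto 3 discrete races.
    The thresholds t1/t2 are purely illustrative."""
    if liability < t1:
        return "earth"
    elif liability < t2:
        return "pegasus"
    return "unicorn"

def child_liability(mom, dad, h2=0.8):
    """Midparent liability plus segregation noise; within-family variance
    is 1 - h2/2 in standardized units (h2 here is an assumed heritability)."""
    mid = (mom + dad) / 2
    return random.gauss(mid, (1 - h2 / 2) ** 0.5)

# Two unicorn parents usually, but not always, bear unicorns:
random.seed(0)
parents = (1.0, 1.2)  # liabilities above the unicorn threshold
foals = [pony_race(child_liability(*parents)) for _ in range(1000)]
```

Unlike a one- or two-locus Mendelian model, this setup naturally produces occasional foals of any race from any pairing, which is exactly the property needed to accommodate the outlier family with all 3 races represented.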

“Amusing Ourselves to Death?”, Branwen 2018

Amuse: “Amusing Ourselves to Death?”, Gwern Branwen (2018-05-12):

A suggested x-risk⁠/​Great Filter is the possibility of advanced entertainment technology leading to wireheading/​mass sterility/​population collapse and extinction. As media consumption patterns are highly heritable, any such effect would trigger rapid human adaptation, implying extinction is almost impossible unless the collapse is immediate or the addictiveness accelerates exponentially.

To demonstrate the point that there are pervasive genetic influences on all aspects of media consumption or leisure time activities/​preferences/​attitudes, I compile >580 heritability estimates from the behavioral genetics literature (drawing particularly on Loehlin & Nichols 1976’s A Study of 850 Sets of Twins), roughly divided into ~13 categories.

“Genetics and Eugenics in Frank Herbert’s Dune-verse”, Branwen 2018

Dune-genetics: “Genetics and Eugenics in Frank Herbert’s Dune-verse”, Gwern Branwen (2018-05-05):

Discussion of the fictional eugenics program in the SF Dune-verse and how it contradicts contemporary known human genetics but suggests heavy agricultural-science and Mendelian inspiration in Frank Herbert’s worldview.

Frank Herbert’s SF Dune series features as a central mechanic a multi-millennium human eugenics breeding program by the Bene Gesserit, which produces the main character, Paul Atreides⁠, with precognitive powers. The breeding program is described as oddly slow and ineffective and requiring roles for incest and inbreeding at some points, which contradict most proposed human eugenics methods. I describe the two main historical paradigms of complex trait genetics, the Fisherian infinitesimal model and the Mendelian monogenic model, the former of which is heavily used in human behavioral genetics and the latter of which is heavily used in agricultural breeding for novel traits, and argue that Herbert (incorrectly but understandably) believed the latter applied to most human traits, perhaps related to his lifelong autodidactic interest in plants & insects & farming, and this unstated but implicit intellectual background shaped Dune and resolves the anomalies.

“On the Existence of Powerful Natural Languages”, Branwen 2016

Language: “On the Existence of Powerful Natural Languages”, Gwern Branwen (2016-12-18):

A common dream in philosophy and politics and religion is the idea of languages superior to evolved demotics, whether Latin or Lojban, which grant speakers greater insight into reality and rationality, analogous to the well-known efficacy of mathematical sub-languages in solving problems. This dream fails because such languages gain their power inherently from specialization.

Designed formal notations & distinct vocabularies are often employed in STEM fields, and these specialized languages are credited with greatly enhancing research & communication. Many philosophers and other thinkers have attempted to create more generally-applicable designed languages for use outside of specific technical fields to enhance human thinking, but the empirical track record is poor and no such designed language has demonstrated substantial improvements to human cognition such as resisting cognitive biases or logical fallacies. I suggest that the success of specialized languages in fields is inherently due to encoding large amounts of previously-discovered information specific to those fields, and this explains their inability to boost human cognition across a wide variety of domains.

“Why Tool AIs Want to Be Agent AIs”, Branwen 2016

Tool-AI: “Why Tool AIs Want to Be Agent AIs”, Gwern Branwen (2016-09-07):

AIs limited to pure computation (Tool AIs) supporting humans will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) which act on their own and meta-learn, because all problems are reinforcement-learning problems.

Autonomous AI systems (Agent AIs) trained using reinforcement learning can do harm when they take wrong actions, especially superintelligent Agent AIs. One solution would be to eliminate their agency by not giving AIs the ability to take actions, confining them to purely informational or inferential tasks such as classification or prediction (Tool AIs), and have all actions be approved & executed by humans, giving equivalently superintelligent results without the risk.

I argue that this is not an effective solution for two major reasons. First, because Agent AIs will by definition be better at actions than Tool AIs, giving them an economic advantage. Second, because Agent AIs will be better at inference & learning than Tool AIs, and this is inherently due to their greater agency: the same algorithms which learn how to perform actions can be used to select important datapoints to learn inference over, decide how long to learn, execute inference more efficiently, design themselves, optimize hyperparameters, make use of external resources such as long-term memories or external software or large databases or the Internet, and best acquire new data.

All of these actions will result in Agent AIs more intelligent than Tool AIs, in addition to their greater economic competitiveness. Thus, Tool AIs will be inferior to Agent AIs in both actions and intelligence, implying use of Tool AIs is an even more highly unstable equilibrium than previously argued, as users of Agent AIs will be able to outcompete them on two dimensions (and not just one).

“Embryo Selection For Intelligence”, Branwen 2016

Embryo-selection: “Embryo Selection For Intelligence”, Gwern Branwen (2016-01-22):

A cost-benefit analysis of the marginal cost of IVF-based embryo selection for intelligence and other traits, using the 2016–2017 state of the art.

With genetic predictors of a phenotypic trait, it is possible to select embryos during an in vitro fertilization process to increase or decrease that trait. Extending the work of Shulman & Bostrom 2014⁠/​Hsu 2014⁠, I consider the case of human intelligence using SNP-based genetic prediction, finding:

  • a meta-analysis of GCTA results indicates that SNPs can explain >33% of variance in current intelligence scores, and >44% with better-quality phenotype testing
  • this sets an upper bound on the effectiveness of SNP-based selection: a gain of 9 IQ points when selecting the top embryo out of 10
  • the best 2016 polygenic score could achieve a gain of ~3 IQ points when selecting out of 10
  • the marginal cost of embryo selection (assuming IVF is already being done) is modest, at $1,822 ($1,500 in 2016 dollars) + $243 ($200 in 2016 dollars) per embryo, with the sequencing cost projected to drop rapidly
  • a model of the IVF process, incorporating number of extracted eggs, losses to abnormalities & vitrification & failed implantation & miscarriages from 2 real IVF patient populations, estimates feasible gains of 0.39 & 0.68 IQ points
  • embryo selection is currently unprofitable (mean: −$435, or −$358 in 2016 dollars) in the USA under the lowest estimate of the value of an IQ point, but profitable under the highest (mean: $7,570, or $6,230 in 2016 dollars). The main constraint on selection profitability is the polygenic score; under the highest value, the NPV EVPI of a perfect SNP predictor is $29.2b ($24.0b in 2016 dollars) and the EVSI per education/​SNP sample is $86.3k ($71.0k in 2016 dollars)
  • under the worst-case estimate, selection can be made profitable with a better polygenic score, which would require n > 237,300 using education phenotype data (and much less using fluid intelligence measures)
  • selection can be made more effective by selecting on multiple phenotype traits: considering an example using 7 traits (IQ/​height/​BMI/​diabetes/​ADHD/​bipolar/​schizophrenia), there is a substantial gain over IQ alone; the outperformance of multiple selection remains after adjusting for genetic correlations & polygenic scores and using a broader set of 16 traits.
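The headline upper bound (~9 IQ points selecting the top embryo of 10) can be roughly reproduced from order statistics: the expected gain is the expected maximum of n draws from the polygenic-score distribution among the embryos, scaled to IQ points. A sketch under stated assumptions: the 33% variance-explained figure from the text, an IQ SD of 15, and the standard assumption that full siblings share half the population's polygenic-score variance; the order-statistic constant is estimated by Monte Carlo rather than looked up.

```python
import random

def expected_selection_gain(n_embryos=10, var_explained=0.33,
                            sd_iq=15, sims=200_000, seed=1):
    """Monte Carlo estimate of the expected IQ gain from picking the
    top-scoring embryo out of n, given a predictor explaining
    `var_explained` of variance. Embryos are full siblings, so the
    polygenic-score variance among them is assumed to be ~half the
    population value."""
    rng = random.Random(seed)
    # SD of the polygenic score among siblings, in IQ points:
    sd_pgs = (var_explained / 2) ** 0.5 * sd_iq
    total = 0.0
    for _ in range(sims):
        total += max(rng.gauss(0, sd_pgs) for _ in range(n_embryos))
    return total / sims

gain = expected_selection_gain()
# E[max of 10 standard normals] ~ 1.54, so the analytic value is
# roughly 1.54 * sqrt(0.33/2) * 15, i.e. in the ballpark of 9 IQ points.
```

The ~3-point figure for the best 2016 polygenic score then follows from the same calculation with its much smaller variance explained plugged in for `var_explained`.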

“Complexity No Bar to AI”, Branwen 2014

Complexity-vs-AI: “Complexity no Bar to AI”, Gwern Branwen (2014-06-01):

Critics of AI risk suggest that diminishing returns to computing (formalized asymptotically) mean AI will be weak; this argument relies on a large number of questionable premises, ignores additional resources, constant factors, and nonlinear returns to small intelligence advantages, and is highly unlikely to be correct.

Computational complexity theory describes the steep increase in computing power required for many algorithms to solve larger problems; frequently, the increase is large enough to render problems a few times larger totally intractable. Many of these algorithms are used in AI-relevant contexts. It has been argued that this implies that AIs will fundamentally be limited in accomplishing real-world tasks better than humans because they will run into the same computational complexity limit as humans, and so the consequences of developing AI will be small, as it is impossible for there to be any large fast global changes due to human or superhuman-level AIs. I examine the assumptions of this argument and find it neglects the many conditions under which computational complexity theorems are valid and so the argument doesn’t work: problems can be solved more efficiently than complexity classes would imply, large differences in problem solubility between humans and AIs are possible, greater resource consumption is possible, small differences on individual tasks can have large real-world consequences for agents, such consequences can compound, and many agents can be created; any of these independent objections being true destroys the argument.
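The role of constant factors and resources can be made concrete: a fixed multiplier in available compute translates into very different problem-size multipliers depending on the complexity class. This is the quantitative core of the critics' argument, and it also shows why greater resource consumption still buys real gains. A toy sketch with illustrative cost functions, not any specific AI task:

```python
def max_problem_size(budget, cost):
    """Largest n such that cost(n) <= budget, via doubling then binary search."""
    if cost(1) > budget:
        return 0
    n = 1
    while cost(n * 2) <= budget:
        n *= 2
    lo, hi = n, n * 2  # invariant: cost(lo) <= budget < cost(hi)
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if cost(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# How far does a 1000x compute increase go, per complexity class?
classes = {"O(n)": lambda n: n,
           "O(n^2)": lambda n: n * n,
           "O(2^n)": lambda n: 2 ** n}
for name, cost in classes.items():
    before = max_problem_size(10 ** 9, cost)
    after = max_problem_size(10 ** 12, cost)
    print(f"{name}: n={before} -> n={after} ({after / before:.1f}x larger)")
```

Under these toy numbers, 1000x more compute multiplies tractable linear problems by 1000 but grows tractable O(2^n) instances only by about 10 units of n. The essay's point is that such bounds constrain only worst-case exact solutions, not approximations, special cases, constant-factor tricks, or armies of parallel agents.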

“Radiance: A Novel”, Scholz et al 2013

2002-scholz-radiance: “Radiance: A Novel”, Carter Scholz, Gregory Benford, Hugh Gusterson, Sam Cohen, Curtis LeMay (2013-07-06):

E-book edition of the 2002 Carter Scholz novel of post-Cold War science/​technology, extensively annotated with references and related texts.

Radiance: A Novel is SF author Carter Scholz’s second literary novel. It is a roman à clef of the 1990s set at the Lawrence Livermore National Laboratory⁠, centering on two nuclear physicists entangled in corruption, mid-life crises, institutional incentives, technological inevitability, the end of the Cold War & start of the Dotcom Bubble, nuclear bombs & Star Wars missile defense program, existential risks⁠, accelerationism, and the great scientific project of mankind. (For relevant historical background, see the excerpts in the appendices⁠.)

I provide an HTML transcript prepared from the novel, with extensive annotations of all references and allusions, along with extracts from related works, and a comparison with the novella version.

Note: to hide apparatus like the links, you can use reader-mode.

“The Hyperbolic Time Chamber & Brain Emulation”, Branwen 2012

Hyperbolic-Time-Chamber: “The Hyperbolic Time Chamber & Brain Emulation”, Gwern Branwen (2012-08-29):

A time dilation chamber as thought experiment on the power of pure thought, with comparison to computer AGI advantages/​disadvantages.

A time dilation tool from an anime is discussed for its practical use on Earth; there seem to be surprisingly few uses, and none that will change the world, due to the severe penalties humans would incur while using it, and basic constraints like Amdahl’s law limit the scientific uses. A comparison with the position of an Artificial Intelligence such as an emulated human brain seems fair, except that most of the time dilation disadvantages do not apply or can be ameliorated, and hence any speedups could be quite effectively exploited. I suggest that skeptics of the idea that speedups give advantages are implicitly working off the crippled time dilation tool and not making allowance for the disanalogies.

“Slowing Moore’s Law: How It Could Happen”, Branwen 2012

Slowing-Moores-Law: “Slowing Moore’s Law: How It Could Happen”, Gwern Branwen (2012-03-16):

Weak points in the networks powering technological progress: chip factories

Brain emulation requires enormous computing power; enormous computing power requires further progression of Moore’s law⁠; further Moore’s law relies on large-scale production of cheap processors in ever more-advanced chip fabs⁠; cutting-edge chip fabs are both expensive and vulnerable to state actors (but not non-state actors such as terrorists). Therefore: the advent of brain emulation can be delayed by global regulation of chip fabs.

“Dune Notes”, Branwen 2010

dune: “Dune notes”, Gwern Branwen (2010-10-18):

Observations on Frank Herbert’s Dune series—the notes are probably overstated, the Butlerian Jihad was not a robot uprising seeking to exterminate humanity, and precognition/​prescience in the Dune universe appears to work backwards allowing for retro-causality and stable time-loops as exemplified by Leto II’s Golden Path creating Paul Atreides and the events of Dune.

“Miscellaneous”, Branwen 2009

Notes: “Miscellaneous”, Gwern Branwen (2009-08-05):

Misc thoughts, memories, proto-essays, musings, etc.

We usually clean up after ourselves, but sometimes, we are expected to clean before (ie. after others) instead. Why?

Because in those cases, pre-cleanup is the same amount of work, but game-theoretically better whenever a failure of post-cleanup would cause the next person problems.

“Great Mambo Chicken and the Transhuman Condition: Science Slightly over the Edge”, Regis 1990

1990-regis-greatmambochickenandthetranshumancondition.pdf: “Great Mambo Chicken and the Transhuman Condition: Science Slightly over the Edge”⁠, Ed Regis (1990-01-01)