2019 News

Annual summary of 2019 gwern.net newsletters, selecting my best writings, the best 2019 links by topic, and the best books/movies/anime I saw in 2019, with some general discussion of the year and the 2010s, and an intellectual autobiography of the past decade.
newsletter, personal, meta, NN
2019-11-21–2020-11-25 in progress certainty: log importance: 0


This is the annual summary edition of the Gwern.net newsletter, summarizing the best of the monthly 2019 newsletters:

  1. end of year summary (here)

Previous annual newsletters: 2018, 2017, 2016, 2015.

Writings

2019 went well, with much interesting news and several stimulating trips. My 2019 writings included:

  1. /

I'm particularly proud of the technical improvements to the Gwern.net site design this year: along with a host of minor typographic improvements & performance optimizations, automatic inflation adjustment of currencies (a feature I've long felt would make documents far less misleading) is now enabled, the link annotations/popups are a major usability enhancement few sites have, sidenotes.js eliminates the frustration of footnotes by providing sidenotes in the margins, collapsible sections help tame long writings by avoiding the need for hiding code or relegating material to appendices, and link icons & drop caps & epigraphs are just pretty. While changes are never unanimously received, we have received many compliments on the overall design, and are pleased with it.

Site traffic (more detailed breakdown) was again up as compared with the year before: 2019 saw 1,361,195 pageviews by 671,774 unique visitors (lifetime totals: 7,988,362 pageviews by 3,808,776 users). I benefited primarily from TWDNE, although the numbers are somewhat inflated by hosting a number of popular archived pages from /OKCupid/, which I put Google Analytics on to keep track of referrals.

Overview

2019 was a fun year.

AI: 2019 was a great year for hobbyists and fun generative projects like mine, thanks to spinoffs and especially pretrained models. How much more boring it would have been without the GPT-2 or StyleGAN models! (There was irritatingly little meaningful news about self-driving cars.) More seriously, the theme of 2019 was scaling. Whether GPT-2 or StyleGAN 1/2, or the scaling papers, or AlphaStar, 2019 demonstrated the blessings of scale in scaling up models, compute, data, and tasks; it is no accident that the most extensively discussed editorial on DL/DRL was Rich Sutton's "The Bitter Lesson". For all the critics' carping and goalpost-moving, scaling is working, especially as we go far past the regimes where they assured us years ago that mere size and compute would break down and we would have to use more elegant and intelligent methods like Bayesian program synthesis. Instead, every year it looks increasingly like the strong connectionist thesis is correct: much like humans & evolution, AGI can be reached by training an extremely large number of relatively simple units end-to-end for a long time on a wide variety of multimodal tasks, and it will recursively self-improve, meta-learning efficient internal structures & algorithms optimal for the real world, which learn how to generalize, reason, self-modify with internal learned reward proxies & optimization algorithms, and do zero/few-shot learning, bootstrapped purely from the ultimate reward signals—without requiring extensive hand-engineering, hardwired specialized modules designed to support symbolic reasoning, completely new paradigms of computing hardware, etc. Self-driving cars remain a bitter mystery (although it was nice to see in 2019 Elon Musk & Tesla snatch victory from the jaws of snatching-defeat-from-the-jaws-of-victory).

2019 for genetics saw more progress on genetic-engineering topics than GWASes; the GWASes that did come out were largely confirmatory—no one really needed more SES GWASes from Hill et al, or confirmation that the IQ GWASes work and that brain size is in fact causal for intelligence, and while the recovery of full height/BMI trait heritability from WGS is a strong endorsement of the long-term value of WGS, the transition from limited SNP data to WGS is foreordained (especially since WGS costs appear to finally be dropping again after their long stagnation). Even embryo selection saw greater mainstream acceptance, with a paper in Cell concluding (for the crudest possible simple embryo selection methods) that, fortunately, the glass was half-empty and need not be feared overmuch. (I am pleased to see that as of 2019, human geneticists have managed to reinvent Lush 1943's breeder's equation; with any luck, in a few years they may progress as far as reinventing Hazel & Lush 1943.) More interesting were the notable events along all axes of post-simple-embryo-selection strategies: Genomic Prediction claimed to have done the first embryo selection on multiple PGSes, genome synthesis saw E. coli achieved, multiple promising post-CRISPR or mass CRISPR editing methods were announced, gene drives progressed to mammals, gametogenesis saw progress (including at least two human fertility startups I know of), serious proposals for human germline CRISPR editing are being made by a Russian (among others), and while He Jiankui was imprisoned by a secret court, there otherwise do not appear to have been serious repercussions, such as reports of the 3 CRISPR babies being harmed or an 'indefinite moratorium' (ie. ban). Thus, we saw good progress towards the enabling technologies for massive embryo selection (breaking the egg bottleneck by allowing generation of hundreds or thousands of embryos, and thus multiple-SD gains from selection), IES (Iterated Embryo Selection), massive embryo editing (CRISPR or derivatives), and genome synthesis.
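For reference, the breeder's equation being reinvented here is the one-line core of quantitative genetics (standard single-trait form; the multi-trait selection-index comparison was Hazel & Lush 1943's contribution):

$$R = h^2 S$$

where $R$ is the response to selection (the change in offspring mean), $h^2$ the narrow-sense heritability, and $S$ the selection differential (how far the selected parents deviate from the population mean).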

VR's 2019 launch of the Oculus Quest proved quite successful, selling out occasionally well after launch, and appears to appeal to normal people, with even hardcore VR fans acknowledging how much they appreciate the convenience of a single integrated unit. Unfortunately… it is not successful enough. There is no VR wave. Selling out may have as much to do with Facebook not investing too much into manufacturing. Worse, there is still no killer app beyond Beat Saber. The hardware is adequate to the job, the price is nugatory, the experience unparalleled, but there is no stampede into VR. So it seems VR is doomed to the long slow multi-decade adoption slog like that of PCs: it's too new, too different, and we're still not sure what to do with it. One day, it would not be surprising if most people have a VR headset, but that day is a long way away.

Bitcoin: little of note. Darknet markets proved unusually interesting: Dream Market, the longest-lived DNM ever, finally expired; Reddit betrayed its users by wholesale purging of subreddits, including /r/DarkNetMarkets, causing me a great deal of grief; and most shockingly, DeepDotWeb was raided by the FBI over affiliate commissions it received from DNMs (apparently running into the tens of millions of dollars—you'd've thought they'd have taken down those hideous ads all over DDW if the affiliate links were so profitable…).

What Progress?

As the end of a decade is a traditional time to look back, I thought I'd try my own version of Scott Alexander's essay "What Intellectual Progress Did I Make In The 2010s?", in which he considered how his ideas/beliefs evolved over the past decade of blogging, as I started my own website a decade ago as well.

I'm not given to introspection, so I was surprised to think back to 2010 and realize how far I've come in every way—even listing them objectively would sound insufferably conceited, so I won't. To thank some of the people who helped me, directly or indirectly, risks (to paraphrase Borges) repudiating my debts to the others; nevertheless, I should at least thank the following: Satoshi Nakamoto, kiba, Ross Ulbricht, Seth Roberts, Luke Muehlhauser, Nava Whiteford, Patrick McKenzie, SDr, ModafinilCat, Steve Hsu, Jack Conte, Said Achmiz, Patrick & John Collison, and Shawn Presser.

2010 was perhaps the worst of times, but also the best of times, because it was the year the future rebooted.

My personal circumstances were less than ideal. Wikipedia's deletionist involution had intensified to the point where everyone could see it both on the ground and in the global statistics, and it was becoming clear the cultural shift was irreversible. Genetic engineering continued its grindingly-slow progress towards some day doing something, while in complex trait/behavioral genetics research, the early GWASes provided no useful polygenic scores but did reveal the full measure of the candidate-gene debacle: it wasn't a minor methodological issue, but in some fields the false-positive rate approached 100%, and tens of thousands of papers were, or were based on, absolutely worthless research. AI/machine learning was exhausted, with the state of the art typically some sort of hand-engineered solution or complicated incremental tweak to something crude like an SVM or random forest, with no even slightly viable path to interestingly powerful systems (much less AGI). At best, one could say that the stagnant backwater of AI, neural network research, showed a few interesting results, and the Schmidhuber lab had won a few obscure contests, and might finally prove useful for something over the coming years beyond playing backgammon or reading ZIP codes, assuming anyone could avoid eye-rolling at Schmidhuber's website(s) long enough to read as far as his projections that computing power and NN performance trends would both continue ("what NN trends‽"); I noted to my surprise that Shane Legg had left an apparently successful academic career to launch a new startup called 'DeepMind', to do something with NNs. (Good for him, I thought, maybe they'll get acquihired for a few million bucks by some corporation needing a small improvement from exotic ML variants, and then, with a financial cushion, he can get back to his real work. After all, it's not like connectionism ever worked before…) And reinforcement learning, as far as anyone need be concerned, didn't work. Cryptography, as far as anyone need be concerned, consisted of the art of (not) protecting your Internet connection. Mainstream technology was obsessed with the mobile shift, of little interest or value to me, and a shift that came with severe drawbacks, such as a massive migration away from FLOSS back to proprietary technologies & walled gardens & 'services'. All in all, there was not that much to draw interest in existential risk; if you had been interested, at the time, probably the best thing you could've done was focus on the standbys of nuclear war & pandemic (neither has been solved), or leave your money invested to wait for more compelling charitable opportunities. The future had arrived, and had brought little with it besides 140 characters.

But also in 2010, disillusioned with writing on Wikipedia, I registered Gwern.net. Some geneticists had begun buzzing over GCTA and the early GWASes' polygenic scores, which indicated that, candidate-genes notwithstanding, the genes were there to be found, and simple power analysis implied there were simply so many of them that one would need samples of tens of thousands—no, hundreds of thousands—of people to start finding them, which sounded daunting, but fortunately the super-exponential decline in sequencing costs ensured that those samples would become available in mere years, through things like something called 'The UK BioBank' (UKBB). (Told of this early on, I was skeptical: that seemed like an awful lot of faith in extrapolating a trend, and when did mega-projects like that ever work out?) Even more obscurely, some microbial geneticists noted that an odd protein associated with 'CRISPR' regions seemed to be part of a sort of bacterial immune system and could cut DNA effectively. Connectionism suddenly started working, and the nascent compute and NN trends did continue for the next decade, with AlexNet overnight changing computer vision, after which NNs began rapidly expanding and colonizing adjacent fields (despite perennial predictions that they would reach their limits Real Soon Now), with Transformers+pretraining recently claiming the scalp of the great holdout, natural language processing—nous sommes tous connexionnistes. At the same time, WikiLeaks reached its high-water mark, helping inspire Edward Snowden, Bitcoin was gaining traction (I would hear of it and start looking into it in late 2010), and Ross Ulbricht was making Silk Road 1 (which he would launch in January 2011). In VR, Valve had returned to tinkering with it, and a young Palmer Luckey had begun to play with using smartphone screens as cheap high-speed high-res small displays for headsets. As dominant as mobile was, 2010 was also close to the peak (eg the launch of Instagram): a mobile strategy could now be taken for granted, and infrastructure & practices had begun to catch up, so the gold rush was over and attention could refocus elsewhere.

Progression of interests: DNB → QS + IQ → statistics + meta-analysis + Replication Crisis → Bitcoin + darknet markets → behavioral genetics → decision theory → DL+DRL.

To the extent there is any consistent thread through my writing, it is an interest in optimization, which in humans is limited primarily by intelligence. New discoveries, improvements, and changes must come from somewhere; no matter how you organize any number of chimpanzees, they will not invent the atomic bomb. Thus my interest in Dual N-Back (DNB), Quantified Self (QS), and Spaced Repetition Systems (SRS) in 2010: these seemed promising routes to improvement. Of these, the first was a complete and unmitigated failure, the second was somewhat useful, and the third is highly useful in relatively narrow domains (although experiments in extending & integrating it are worthwhile).

I'd come across in Wired the first mention of the study that put dual n-back on the map, Jaeggi et al 2008, and was intrigued. While I knew the history of IQ-boosting interventions was dismal, to say the least, Jaeggi seemed to be quite rigorous, had a good story about WM being the bottleneck consistent with what I knew from my cognitive psychology courses and reading, and it quickly attracted some followup studies which seemed to replicate it. I began n-backing, and discussing the latest variants and research. The repetition in discussions prompted me to start putting together a DNB FAQ, which was unsuitable for Wikipedia, so I hosted it on my shell account on code.haskell.org for a while, until I wanted to write a few more pages and decided it was time to stop abusing their generosity and set up my own website, Gwern.net, using a new static website generator called Hakyll. I didn't know what I wanted to do, but I could at least write about anything I found interesting on my own website, without worrying about deletionists or whether it was appropriate for LessWrong, and it would be good for hosting the occasional PDF too.

While n-backing, I began reading about the Replication Crisis and became increasingly concerned, particularly when Moody criticized DNB on several methodological grounds, arguing that the IQ gains might be hollow and driven by the test being sped up (and thus like playing DNB) or by motivational effects (because the control group did nothing). I began paying closer attention to studies, null results began to come out from Jaeggi-unaffiliated labs, and I began teaching myself R to analyze my self-experiments & meta-analyze the DNB studies (because no one else was doing it).

What I found was ugly: a thorough canvassing of the literature turned up plenty of null results; researchers would tell me about the difficulties in getting their nulls published, such as how peer reviewers would tell them they must have messed up their experiment because "we know n-back works"; and I could see even before running my first meta-analysis that studies with passive control groups got much larger effects than the better studies run with active control groups. Worse, some methodologists finally caught up and ran the SEMs on full-scale IQ tests instead of single matrix subtests (which is what you are always supposed to do to confirm measurement invariance, but in psychology, measurement & psychometrics are honored primarily in the breach); the gain, such as it was, wasn't even on the latent factor of intelligence. Meanwhile, Jaeggi et al began publishing about complex moderation effects from personality, and ultimately suggesting in their rival meta-analysis that the passive vs active difference was simply because of country-level differences—because I guess brains don't work in the USA the way they do everywhere else. They (Jaeggi had since earned tenure) conceded the debate largely by simply no longer running DNB studies and shifting focus to specialized populations like children with ADHD and other flavors of working-memory training, although by this point I had long since moved on and can't say what happened to all the dramatis personae.

It was, in short, a prototypical case of the Replication Crisis: a convenient environmental "One Weird Trick" intervention, with extremely low prior probability, which made no quantitative predictions, supported by a weak methodology which could manufacture positive results and propped up by selective citation, systemic pressure towards publication bias, researcher allegiance & commercial conflicts of interest, which theory was never definitively disproven, but lost popularity and sort of faded away. When I later reread Meehl, I ruefully recognized it all.

Inspired by Seth Roberts, I began running personal self-experiments, mostly on nootropics. I think this is an underappreciated way of learning statistics, as it renders concrete all sorts of abstruse issues: blocking vs randomization, blinding, power analysis, normality, missingness, time-series & auto-correlation, carryover, informative priors & meta-analysis, subjective Bayesian decision theory—all of these arise naturally in considering, planning, carrying out, and analyzing a self-experiment. By the end, I had a nice workflow: obtain a hypothesis from the QS community or personal observation; find or make a meta-analysis about it; consider costs and benefits to do a power analysis for Value of Information; run a blinded randomized self-experiment with blocks to estimate the true personal causal effect purged of honeymoon or expectancy effects; run the analysis at checkpoints for futility/low VoI; and write it up.
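To make the power-analysis step concrete, here is a minimal sketch in Python (the formula is the standard normal approximation; the effect sizes are invented for illustration, not taken from my actual experiments):

```python
from statistics import NormalDist

def days_per_arm(effect_size, alpha=0.05, power=0.8):
    """Days needed in each arm of a two-sided two-sample comparison,
    via the normal approximation: n ≈ 2 * (z_{1-α/2} + z_{power})² / d²."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2

# A plausible nootropic effect of d = 0.3 SD needs ~175 days per arm; if the
# substance costs more over that horizon than the benefit is worth, the Value
# of Information is negative and the experiment isn't worth running at all.
for d in (0.3, 0.5, 1.0):
    print(f"d = {d}: {days_per_arm(d):.0f} days per arm")
```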

The first breakthrough was blinding: having been burned by DNB's motivational expectancy effects, I became concerned about them rendering all QS results invalid. Blinding is simple enough if you are working with other people, but involving anyone else is a great burden for a self-experimenter, which is one reason no one in QS ever ran blinded self-experiments. After some thought, I realized self-blinding is quite easy; all that is necessary is to invisibly mark containers or pills in some way, randomize, and then infer after data collection which was used. For example, one could color two pills by experimental/placebo, shake them in a bag, take one with one's eyes closed, and look at the remaining pill after recording data; or one could use identical-looking pills but in two separate containers, marking the inside or bottom of one, and again looking afterwards. (Self-blinding is so simple I refuse to believe I invented it, but I have yet to come across any prior art.) This gave me the confidence to go forward with my self-experiments, and I think I had some interesting results: some are expected (red-tinting software affecting sleep schedule) or have been supported by subsequent research (the null effects of LSD microdosing), some are plausible but not yet confirmed (vitamin D damaging sleep when taken at night), and some are simply mysterious (the strikingly harmful effects of potassium but also apparently of magnesium?). Ironically, I've gradually become convinced that self-blinding is not that important, because the majority of bias in self-experiments comes from selective data reporting & non-randomization—people fool themselves by attributing the good days to the new nootropic (omitting all the average or blah days), and good days cause one to do optional things (like take nootropics).
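For the two-container scheme, the only bookkeeping needed is a pre-generated blocked randomization; a minimal sketch (a hypothetical helper, not my actual code):

```python
import random

def blinded_schedule(days, seed=None):
    """Blocked randomization for a two-container self-experiment: within each
    2-day block, one day from container A (secretly marked inside the lid) and
    one from container B, in random order. Which container held the active
    pills is only checked after all data are recorded, preserving the blind."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(days // 2):
        block = ["A", "B"]
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

print(blinded_schedule(14, seed=2011))
# e.g. ['B', 'A', 'A', 'B', ...] -- balanced 7/7 by construction
```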

I wound down my QS & nootropics activities for a few reasons. First, I became more concerned about the measurement issues: I am interested in daily functioning and 'productivity', but how on earth does one meaningfully quantify that? What are the "energies of men", as William James put it? Almost all models look for an average treatment effect, but given the sheer number of nulls, and introspecting about what 'productive' days felt like, it seems that expecting a large increase in the average (treating productivity as a latent variable spread across many measured variables) entirely misses the value of productivity. It's not that one gets a lot done across every variable, but that one gets done the important things. A day in which one tidies up, sends a lot of emails, goes to the gym, and walks the dog may be worse than useless, while an extraordinary day might entail 12 hours programming without so much as changing out of one's pyjamas. Productivity, it seems, might function more like 'mana' in an RPG than more familiar constructs like IQ; unfortunately, statistical models which infer something like a total sum or index across many variables are rare, and I've made little progress on modeling QS experiments which might increase total 'mana'. But if you can't measure something correctly, there is no point in trying to experiment on it; my Zeo did not measure sleep anywhere close to perfectly, and I lost faith that measuring a few things like emails or my daily 1–5 self-ratings could give me meaningful results rather than constant false negatives due to measurement error. Secondly, nootropics seem to run into diminishing returns quickly. As I see it, all nootropics work in one of 4 ways:

  1. stimulant
  2. anxiolytic
  3. nutritional deficiency fix
  4. it doesn't

You don't need to run through too many stimulants or anxiolytics to find one that works for you; deficiencies are deeply idiosyncratic and hard to predict (who would think that iron supplementation could cure pica, or that vegetarians seem to benefit from creatine?); and past that, you are just wasting money (or worse). The low-hanging fruit has generally been plucked already—yeah, iodine is great… if you lived a century ago in a deficient region, but unfortunately, as a Western adult it probably does zilch for you. (Likewise multivitamins.) I lost interest as it looked increasingly likely that I wasn't going to find a new nootropic as useful to me as melatonin, nicotine, or modafinil.

Why were there so few silver bullets? Why did DNB fail (as well as every other intelligence intervention), why was the Flynn effect hollow, why were good nootropics so hard to find, why was the base rate of correct psychological hypotheses (as Ioannidis would put it) so low? The best answer seemed to be "Algernon's law": as a highly polygenic fitness-relevant trait, human intelligence has already been reasonably well optimized, is not particularly maladapted to the current environment (unlike personality traits or motivation or caloric expenditure), and thus has no useful 'hacks' or 'overrides'. Intelligence is highly valuable to increase, and certainly could be increased—just marry a smart person—but there are no simple easy ways to do so, particularly for adults. Gains would have to come from fixing many small problems, but that is the sort of thing that drugs and environmental changes are worst at. (It is telling that after decades of searches, there is still not a single known mutation, rare or common, which increases intelligence by even a few IQ points, aside from possibly one ultra-rare one. In contrast, plenty of large-effect mutations are known for more neutral traits like height.)

Going further, I wondered whether the disasters in nutrition & exercise research, so parallel to sociology & psychology, were merely cherry-picked. Yes, everyone knows 'correlation ≠ causation', but is this a serious objection, or is it a nitpicking one of the sort which only occasionally matters and is abused as a universal excuse? I found that, first, the presence of a correlation is a foregone conclusion and hence completely uninformative: because "everything is correlated", finding a correlation between two variables is useless, since of course you'll find one if you look, so it'd be surprising only if you didn't detect one (and that might say more about your statistical power than about the real world). Predicting its direction is not impressive either, since that's just a coin-flip, and there is a positive manifold (from intelligence, SES, and health, if nothing else), making it even easier. Second, the simplest way to answer this question is to look at randomized experiments and compare—presumably only the best and most promising correlations, supported by many datasets and surviving all standard 'controls', will progress to expensive randomized experiments, and the conditional probability of the randomized experiment agreeing with a good correlation is what we care about most, since in practice we already ignore the more dubious ones. There are a few such papers cited in the Replication Crisis literature, but I found that (Cowen's Law!) there were actually quite a few such reviews/meta-analyses, particularly by the UK NICE (which does underappreciated work). The results are not good: it is not rare, but common, for the randomized result to be seriously different from the prior correlational results, and this holds true whether we look at economics, sociology, medicine, advertising… When we look at the track record of social interventions, paying particular attention to interventions which have gone down the memory hole and lessons from the Replication Crisis (sorry, no one believes your 20-year followup where only females under 13 at the time of intervention show statistically-significant benefits), it's not much better. Interventions also tend to be self-limited by their temporary and focused nature: "everything is correlated" and outcomes are driven by powerful latent variables which are maintained by unchangeable individual genetics ("everything is heritable") which are responsible for longitudinal stability in traits (perhaps through mechanisms like "the environment is genetic") and long-lasting individual differences which are set early in life. This produces the "gloomy prospect" for QS, and for social engineering in general: there may be useful interventions, but they will be of little value on average—if the benefit is universal, then it will be small; if it is large and predictable, then it will be limited to the few with a particular disease; otherwise, it will be unpredictably idiosyncratic, so those who need it will not know it. Thus, the metallic laws: the larger the change you expect, the less likely it is; the low-hanging fruit, having already been plucked, will not be tested; and the more rigorously you test the leftovers, the smaller the final net effects will be.
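The "everything is correlated" point is easy to check by simulation; a toy sketch, assuming nothing but a single weak shared latent factor (all numbers invented):

```python
import math
import random

random.seed(75)

def corr(x, y):
    """Pearson correlation, stdlib-only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two 'traits', each only weakly loaded (0.15) on one latent factor, with a
# biobank-ish sample size:
n, loading = 100_000, 0.15
latent = [random.gauss(0, 1) for _ in range(n)]
trait1 = [loading * l + random.gauss(0, 1) for l in latent]
trait2 = [loading * l + random.gauss(0, 1) for l in latent]

# The correlation is tiny (~0.02) yet decisively 'statistically significant':
r = corr(trait1, trait2)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.3f}, t = {t:.1f}")  # r ≈ 0.02, t ≈ 7 >> 1.96
```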

Spaced repetition was another find in Wired, and made me feel rather dim. We had covered Ebbinghaus and the spacing effect in class, but it had never occurred to me that it could be useful, especially with a computer. ("How extremely stupid of me not to have thought of that.") I immediately began using the Mnemosyne SRS for English vocabulary (sourcing new words from A Word A Day) and classwork. I stopped the English vocab when I realized that my vocabulary was already so large that if I needed SRS to memorize a word, I was better off never using that word because it was a bar to communication, and that my prose already used too many vocab words; but I was so impressed that I kept using it. (It is magical to use an SRS for a few weeks, doing boring flashcard reviews with some oddly fussy frills, and then a flashcard pops up and one realizes that one just remembers it, despite having seen the card only 4 or 5 times and nothing otherwise.) It was particularly handy for learning French, but I also value it for storing quotes and poetry, and for helping correct persistent errors. For a LessWrong contest, I read entirely too many papers on spaced repetition and wrote an overview of the topic, which has been useful over the years as interest in SRS has steadily grown.
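The scheduling core of such SRS software is tiny; a rough sketch of an SM-2-style update rule (simplified constants, not Mnemosyne's exact algorithm):

```python
def next_interval(interval_days, ease, grade):
    """SM-2-style update. grade: 0-5 self-rated recall quality.
    Returns (new_interval_days, new_ease). Failures reset the interval;
    successes grow it geometrically, which is why a card seen only 4 or 5
    times can end up scheduled months or years out."""
    if grade < 3:                         # forgotten: relearn from scratch
        return 1.0, max(1.3, ease - 0.2)
    ease = max(1.3, ease + 0.1 - (5 - grade) * 0.08)
    return interval_days * ease, ease

iv, ease = 1.0, 2.5
for g in (4, 5, 4, 3, 5):                 # five successful reviews
    iv, ease = next_interval(iv, ease, g)
    print(f"next review in {iv:6.1f} days (ease {ease:.2f})")
# intervals: ~2.5, 6.6, 17, 45, 120 days -- exponential spacing
```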

What drew my interest next? A regular on IRC, one kiba, kept talking about something called Bitcoin, touting it to us from November 2010 on. I was curious to see this atavistic resurgence of cypherpunk ideas—a decentralized uncensorable e-cash, really? It reminded me of being a kid reading the Jargon File and the Cyphernomicon and Phrack and TOTSE. Bitcoin, surprisingly enough, worked (which was more than one could say of other cypherpunk infrastructure like remailers), and the core idea, while definitely a bit perverse, was a surprising method of squaring the circle; I couldn't think of how it ought to fail in theory, and it was working in practice. In retrospect, being introduced to Bitcoin so early, before it had become controversial or politically-charged, was enormously lucky: you could waste a lifetime looking into longshots without hitting a single Bitcoin, and here Bitcoin walked right up to me unbidden, and all I had to do was simply evaluate it on its merits & not reflexively write it off. Still, some cute ideas and a working network hardly justified wild claims about it eating world currencies, until I read in Gawker about a Tor drug market which was not a scam (like most Tor drug shops were) and used only Bitcoin, called "Silk Road", straight out of the pages of the Cyphernomicon: it used escrow and feedback, and had many buyers & sellers already. That got my attention, but subsequent coverage of SR1 drove me nuts with how incompetent and ignorant it was. Kiba gave me my first bitcoins to write an SR1 discussion and tutorial for his Bitcoin Weekly online magazine, which I duly did (ordering, of course, some Adderall for my QS self-experiments). The SR1 tutorial was my first page to ever go viral. I liked that. People kept using it as a guide to getting started on SR1, because there is a great gap between a tool existing and everyone using it, and good documentation is as underestimated as open datasets.

Many criticisms of Bitcoin, including those from cryptographic or economic experts, were not even wrong, but showed the critics hadn't understood the 'Proof of Work' mechanism behind Bitcoin. This made me eager to get Bitcoin, because when someone like P— K— or C— S— made really dumb criticisms—revealing that they claimed Bitcoin would fail not because the actual mechanisms would break down, but because they wanted Bitcoin to fail so governments could more easily manipulate the money supply—such wishful thinking from otherwise decent thinkers implied that Bitcoin was undervalued. (Bitcoin's price was either far too low or far too high, and reversed stupidity is intelligence when dealing with a binary like that.) The persistence of such lousy critics was interesting, as it meant that PoW fell into some sort of intellectual blind spot; as I noted, there was no reason Bitcoin could not have been invented decades prior, and unlike most successful technologies or startups, where multiple discovery is the rule, Bitcoin came out of the blue—in terms of intellectual impact, "having had no predecessor to imitate, he had no successor capable of imitating him". What was the key idea of Bitcoin, its biggest advantage, and why did people find it so hard to understand? I explained it to everyone in my May 2011 essay, "Bitcoin Is Worse Is Better", which also went viral; I am sometimes asked, all these crazy years later, if I have changed my views on it. I have not. I nailed the core of Bitcoin in 2011, and there is naught to add nor take away.
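For those critics, the mechanism they skipped is small enough to show whole; a toy Proof-of-Work sketch (illustrative difficulty and data format, not Bitcoin's actual block-header layout):

```python
import hashlib

def mine(block_data, difficulty_bits=20):
    """Find a nonce such that SHA-256(data || nonce) has `difficulty_bits`
    leading zero bits. Verification takes one hash; forging history requires
    redoing this search for every subsequent block -- that asymmetry is the
    whole trick."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        h = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") < target:
            return nonce
        nonce += 1

nonce = mine(b"block 1: Alice pays Bob 1 BTC")
print(f"found nonce {nonce} after ~2^20 hashes on average")
```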

Meanwhile, Bitcoin kept growing from a tiny seed to, years later, something even my grandmother had heard of. (She heard on the news that Bitcoin was bankrupt and asked if I was OK; I assured her I had had nothing on MtGox and would be fine.) I did not benefit as much financially as I should have, because of, shall we say, 'liquidity constraints', but I can hardly complain—involvement would have been worthwhile even without any remuneration, just to watch it develop and see how wrong everyone can be. One regret was the case of Craig Wright, investigating whom turned out to be a large waste of my time and to backfire. I regret neither working with Andy Greenberg to write an article, nor any of the possible findings we missed, but rather that, when we learned Gawker was working on a Wright piece, we assumed the worst—that it was a trivial blog post which would scoop our months of work—and so we jumped the gun by publishing an old draft article instead of contacting them to find out what they had. We had not remotely finished looking into Craig Wright, and it turned out that Gawker had gotten even further than we had, if anything; if both groups had pooled their research, we would've had a lot more time before publishing, and more time to look into the many red flags and unchecked details. As it was, all we got for our trouble was to be dragged through the mud by Wright believers, who were furious at us monsters trying to unmask a great man, and by Wright critics, who were furious we were morons assisting a conman making millions and too dumb to see all the red flags in our own reporting. More positively, the revival of cypherpunk ideas in Bitcoin and Ethereum was really something to see, and gave rise to all sorts of crazy new ideas, as well as creating a boom in researching exotic new cryptography, particularly zero-knowledge proofs. There are two important lessons there. The first is the extent to which progress can be held back by a single missing piece: the basic problem with almost everything in the Cyphernomicon is that it requires a viable e-cash to work, and with the failure of Chaumian e-cash schemes, there was none; the ideas for things like black markets for drugs were correct, but useless without that one missing piece. The second is the power of incentives: the amount of money in cryptocurrency is, all things considered, really not that large compared to the rest of the economy or things like pet food sales, and yet, it is approximately ∞% more money than previously went into writing cypherpunk software or into cryptography, and while a shocking amount went into blatant scams, Ponzis, hacks, bezzles, and every other kind of fraud, an awful lot of software got written, and an awful lot of new cryptography suddenly showed up or was matured into products possibly decades faster than one would've expected.

Media coverage of SR1 did not improve, honorable exceptions like Andy Greenberg or Eileen Ormsby aside, and I kept tracking the topic, particularly recording arrests, because I was intensely curious how safe using DNMs was. SR1 worked so well that we became complacent, and the fall of SR1 was traumatic—all that information and all those forum posts, lost! Almost all DNM researchers selfishly guarded their crawls (sometimes to cover up bullshit), and datasets were clearly a bottleneck in DNM research & history, particularly given the extraordinary turnover in DNMs afterwards (not helped by the FBI announcing just how much Ross Ulbricht had made). This led me to my most extensive archiving efforts ever, crawling every extant English DNM & forum on a weekly or more frequent basis. This was technically tricky and exhausting, especially as I moderated /r/DarkNetMarkets, and tried to track the fallout from SR1 & all DNM-related arrests (due not just to the SR1 raid or lax security at successors, but simply growing volume and time lags in learning of cases, as many had been kept secret or were tied up in legal proceedings).

As amusing as it was to be interviewed regularly by journalists, or to see DNMs pop up & be hacked almost immediately (I did my part when I pointed out to Black Goblin Market that they had sent my user registration email over the clearnet, deanonymizing their server in a German datacenter, and should probably cut their losses), DNM dynamics disappointed me. A few DNMs like The Marketplace tried to improve, but buyers proved actively hostile to genuine multisig, and despite all the documented SR1 arrests, PGP use in providing personal information seemed to, if anything, decline. In the end, SR1 proved not to be a prototype, soon obsoleted in favor of innovations like multisig and truly decentralized markets, but in some respects the peak of the centralized DNM: relatively efficient & well-run, and with features rarely seen on successors, like hedging the Bitcoin exchange rate. The carding fraud community moved in on DNMs, and the worst of the synthetic opiates like carfentanil soon appeared. In 2020, DNMs are little different than when they sprang fully-formed out of Ross Ulbricht's forehead in January 2011. Users, it seems, are lazy; they are really lazy; even if they are doing crimes, they would rather risk going to jail than spend 20 minutes figuring out how to use PGP, and they would rather accept a large risk every month of losing hundreds or thousands of dollars than spend time figuring out multisig. By 2015, I had grown weary; the final straw was when ICE, absurdly, subpoenaed Reddit for my account information. (I could have told them that the information they were interested in had never been sent to me, and if it had, it would almost certainly have been lies or a frame—not that they ever bothered to ask me.) So, I shut down my spiders, neatly organized and compressed the crawls, and released them as a single public archive. I had hoped that by releasing so many datasets, I would set an example, but while >46 publications use my data as of January 2020, few have seen fit to release their own. The eventual success of my archives reinforced my view that public permission-less datasets are often a bottleneck to research: you cannot guarantee that people will use a dataset you release, but you can guarantee that they won't use one you don't. Hardly any of the people who used my data ever so much as contacted me, and the number of uses stands in stark contrast to Nicolas Christin & Kyle Soska's DNM datasets, which were released either censored to the point of uselessness or through a highly onerous data-archive service called IMPACT. And that was that. The FBI would later pay me some visits, but I was long done with the DNMs and had moved on.

Genetics, meanwhile, was experiencing its own revival. Much more dramatically than my DNM archives, human genetics was demonstrating the power of open data as an obscure project called the UK BioBank (UKBB) came online, with n = 500k SNP genotypes & rich phenotype data; UKBB was only a small fraction of global genome data (sharded across a million silos and little emperors), and much smaller than 23andMe (disinclined to do anything controversial or which might impede drug commercialization), but it differed in one all-important respect: they made it easy for researchers to get all the data. The result is that—unlike 23andMe, All Of Us, Million Veteran Program, or the entire nation of China—papers using UKBB show up literally every day on BioRxiv alone, and it would be hard to find human genetics research which hasn't benefited, one way or another, from UKBB. (How did UKBB happen? I don't know, but there must be many unsung heroes behind it.) Emphasizing this even more was the explosion of genetic correlation results. I read much of the human genetic correlation literature, and it went from a few genetic correlation papers a year to hundreds. Why? Simple: previously, genetic correlations required individual personal data, as you either needed the data from twin pairs to calculate cross-twin correlations or run the SEM, or you needed the raw SNP genotypes to use GCTA; twin registries keep their data close to their chest, and everyone with SNP data guards those even more jealously (aside from UKBB). Like a dog in the manger, if the owners of the necessary datasets couldn't get around to publishing them, no one would be allowed to. But then a methodological breakthrough happened: LD Score Regression was released with a convenient software implementation, and LDSC worked around the broken system by only requiring the GWAS summary statistics, not the raw data. Now a genetic correlation could be computed by anyone, for any pair of traits, and many summary statistics had already been released as a concession to openness. An explosion of reported genetic correlations followed, to the point where I had to stop compiling them for the Wikipedia entry, because it was futile when every other paper might run LDSC on 50 traits.
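A stylized toy version of the LDSC idea (not the real estimator—no regression weights, no intercept correction, and all parameters invented) shows why summary statistics alone suffice:

```python
import random

random.seed(89)
M, N = 20_000, 50_000              # SNPs, GWAS sample size (invented)
h2_1, h2_2, rg = 0.5, 0.4, 0.6     # heritabilities & genetic correlation (invented)
rho = rg * (h2_1 * h2_2) ** 0.5    # genetic covariance

ld = [1 + random.expovariate(1 / 40) for _ in range(M)]   # fake LD scores

prods = []
for l in ld:
    v1 = 1 + N * h2_1 * l / M      # polygenic inflation of study 1's z-scores
    v2 = 1 + N * h2_2 * l / M
    cov = N * rho * l / M          # shared signal between the two studies
    a, b = random.gauss(0, 1), random.gauss(0, 1)
    z1 = v1 ** 0.5 * a
    r = cov / (v1 * v2) ** 0.5
    z2 = v2 ** 0.5 * (r * a + (1 - r * r) ** 0.5 * b)
    prods.append(z1 * z2)

# Regressing z1*z2 on LD score recovers the slope N*rho/M -- no genotypes,
# no twins, just the public per-SNP z-scores of two GWASes:
ml, mp = sum(ld) / M, sum(prods) / M
slope = (sum((l - ml) * (p - mp) for l, p in zip(ld, prods))
         / sum((l - ml) ** 2 for l in ld))
est_rg = slope * M / N / (h2_1 * h2_2) ** 0.5
print(f"estimated rg = {est_rg:.2f} (true {rg})")
```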

The predictions of some behavioral geneticists & human geneticists (particularly those with animal genetics backgrounds) came true: increasing sample sizes did deliver successful GWASes, and the 'missing heritability' problem was a non-problem. I had remained agnostic on the question of IQ and genetics, because while IQ is too empirically successful to be doubted (not that that stops many), the genetics have always been fiercely opposed; on looking into the question, I had decided that GWASes would be the critical test, and in particular, sibling comparisons would be the gold standard—as R.A. Fisher pointed out, siblings inherit randomized genes from their parents and also grow up in shared environments, excluding all possibilities of confounding or reverse causation or population structure: if a PGS works between-siblings, it must be tapping into causal genes. The critics had always concentrated their fire on IQ, as, supposedly, the most biased, racist, and rotten keystone in the overall architecture of behavioral genetics, and evinced utter certitude; as went IQ, so went their arguments. I decided years ago that a successful sibling test of an IQ GWAS with genome-wide statistically-significant hits (not candidate-genes) is what it would take to change my mind.

The key result was Rietveld et al 2013, the first truly successful IQ GWAS. Rietveld et al 2013 found GWAS hits; further, it found between-sibling differences. (This sibling test would be replicated easily a dozen times by 2020.) Reading it was a revelation. The debate was over: behavioral genetics was right, and the critics were wrong. Kamin, Gould, Lewontin, Shalizi, the whole sorry pack—annihilated. IQ was indeed highly heritable, polygenic, and GWASes would only get better for it, and for all the easier traits as well. ("To see the gods dispelled in mid-air and dissolve like clouds is one of the great human experiences. It is not as if they had gone over the horizon to disappear for a time; nor as if they had been overcome by other gods of greater power and profounder knowledge. It is simply that they came to nothing.") Among other implications, embryo selection was now proven feasible (embryos are, after all, just siblings), and suddenly far-distant future speculations like iterated embryo selection (IES) no longer seemed to rest on such a rickety tower of assumptions. This was concerning. Also concerning was the willful blindness of many, including respectable geneticists and scientists, who happily made up arguments about polygenicity, invented genetic correlations, conflated hits with PGS with SNP heritability with heritability, claimed GWASes wouldn't replicate, ignored all inconvenient animal examples, and simply dismissed out of hand all possibilities as minor without so much as a single number mentioned; in short, if there was any way to be confused or one could invent any possible obstacle, then that immediately became a fatal objection. Bostrom & Shulman 2014 finally provided an adult perspective on embryo-selection possibilities, and was good as far as it went in a few pages, but I felt it neglected a lot of practical questions and didn't go beyond the simplest possible kind of embryo selection.

Since no one was willing to answer my questions, I began answering them myself. I first began by replicating Bostrom & Shulman 2014's results with the simplest model of embryo selection, and began working in the various costs and attrition in the IVF pipeline, to create a realistic answer. I then began looking at the scaling: which is more important, PGS or number of embryos? How much can either be boosted in the foreseeable future? Surprisingly, n is more important than PGS, despite PGS being what everyone always debated, and I detoured a bit into order statistics, since the importance of 'massive embryo selection' was underrated. Diminishing returns do set in, but there are two major improvements: multi-stage selection, where one selects at multiple stages in the process, which turns out to be absurdly more effective; and selecting on an index of multiple traits using the countless PGSes now available, which is substantially more efficient and also addresses the bugaboo of negative genetic correlations—selecting on a trait like intelligence will not 'backfire', because when you look at human phenotypic and genotypic correlations as a whole, almost every good trait is correlated with (not merely independent of) other good traits, and likewise for bad traits, and this supercharges selection. There were a number of other interesting avenues, but I largely answered my question: embryo selection is certainly possible, will soon be (and has since been) done, is profitable already, albeit modestly so; genetic editing like CRISPR is probably drastically overrated barring breakthroughs in doing hundreds or thousands of edits safely; but there are multiple pathways to far more effective and thus disruptive changes in the 2020s–2030s through massive embryo selection or IES or genome synthesis (with a few wild cards like gamete or chromosome selection), particularly as gains accumulate over generations. Modeling IES or genome synthesis is almost unnecessary, because the potential gains are so large. (There are still some interesting questions in constrained optimization and modeling breeding programs with the existing haplotypes/LD, but I'm not sure they're important to know at this point.)
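The core of that order-statistics analysis is simple enough to check by simulation; a minimal Monte Carlo sketch (round illustrative numbers, not the realistic cost/attrition model discussed above):

```python
import random

random.seed(95)

def selection_gain(n_embryos, pgs_r2, trials=20_000):
    """Average true genetic gain (in within-family SDs) from picking the
    embryo with the highest polygenic score out of n_embryos. The PGS sees
    the true value only with accuracy r = sqrt(pgs_r2), so the realized gain
    is E[max of n standard normals] shrunk by r."""
    r = pgs_r2 ** 0.5
    total = 0.0
    for _ in range(trials):
        best_score, best_g = -float("inf"), 0.0
        for _ in range(n_embryos):
            g = random.gauss(0, 1)                        # true genetic value
            score = r * g + (1 - pgs_r2) ** 0.5 * random.gauss(0, 1)
            if score > best_score:
                best_score, best_g = score, g
        total += best_g
    return total / trials

# Diminishing returns in n, but n still dominates: going from 2 to 100
# embryos multiplies the gain ~4.5x, while doubling the PGS r^2 only
# multiplies it by sqrt(2) ≈ 1.41.
for n in (2, 5, 10, 100):
    print(n, round(selection_gain(n, pgs_r2=0.1), 3))
# roughly 0.18, 0.37, 0.49, 0.79 SDs
```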

I kept an eye on deep learning the entire time post-AlexNet, and was perturbed by how DL just kept on growing in capabilities and marching through fields, and in particular, how its strengths were in the areas that had always historically bedeviled AI the most, and how they kept scaling as model sizes grew—impressive as models with millions of parameters were, people were already talking about training NNs with as many as a billion parameters. Crazy talk? One couldn't write it off so easily. Back in 2009 or so, I had spent a lot of time reading about Lisp machines and AI in the 1980s, going through old journals and news articles to improve the Wikipedia article on Lisp machines, and I was amazed by the Lisp machine OSes & software, so superior to Linux et al, but also did a lot of eye-rolling at the expert systems and robots which passed for AI back then; in following deep learning, I was struck by how it was the reverse: GPUs were a nightmare to program for and the software ecosystem was almost actively malicious in sabotaging productivity, but the resulting AIs were uncannily good and excelled at perceptual tasks. Gradually, I became convinced that DL was here to stay, and offered a potential path to AGI: not that anyone was going to throw a 2016-style char-RNN at a million GPUs and get an AGI, of course, but there was now a nontrivial possibility that further tweaks to DL-style architectures of simple differentiable units combined with DRL would keep on scaling to human-level capabilities across the board. (Have you noticed how no one uses the word "transhumanist" anymore? Because we're all transhumanists now.) There was no equivalent of Rietveld et al 2013 for me, just years of reading Arxiv and following the trends, reinforced by occasional case-studies like AlphaGo (let's take a moment to remember how amazing it was that between October and May, the state of computer Go went from 'perhaps a MCTS variant will defeat a pro in a few years, and then maybe the world champ in a decade or two' to 'untouchably superhuman'; technology and computers do not follow human timelines or scaling, and 9 GPUs can train a NN in a month).

AlphaGo Zero: 'just stack moar layers lol!'

The "bitter lesson" encapsulates the long-term trends observed in DL, from use in a few limited areas, often on pre/post-processed data or as part of more conventional machine-learning workflows heavy on symbolic techniques, to ever-larger (and often simpler) neural net models. MuZero is perhaps the most striking 2019 example of the bitter lesson: AlphaZero had previously leapt past all previous symbolic Go (and chess, and shogi) players using a hybrid of neural nets (for near-superhuman intuition) and symbolic tree search to do planning over thousands or millions of scenarios in an exact simulation of the environment, but then MuZero threw all that away in favor of just doing a lot of training of an RNN, which does… whatever it is an RNN does internally, with no need for explicit planning or even an environment simulator—and quite aside from beating AlphaZero, it also applies directly to playing regular ALE video games. (To update the quip: "every time I fire a machine learning researcher & replace him with a TPU pod, the performance of our Go-playing system goes up.")

An open question: why were I and everyone else wrong to ignore connectionism, when things have played out much as Schmidhuber and a few others predicted? Were we wrong, or just unlucky? What was, ex ante, the right way to think about this, even back in the 1990s or 1960s? I am usually pretty good at bullet-biting on graphs of trends, but I can't remember any performance graphs for connectionism; what graph should I have believed, or if it didn't exist, why not?

If there had always been a loud noisy contingent, perhaps a minority like a quarter of ML researchers, who watched GPU progress with avid interest, repeatedly sketched out the possibilities of scaling in 2010–2020, eagerly leapt on it as soon as resources permitted, and advocated for ever larger investments, one could write this off as a natural example of a breakthrough: surprises happen; that's why we do research, to find out what we don't know. But instead, there were perhaps a handful who truly expected it, and even they seemed surprised by how it happened; and no matter how much progress was made, the naysayers never changed their tune (only their goalposts). (The most striking example was offered midway through 2020 with GPT-3.) A systematic, comprehensive, field-wide failure of prediction & updating like that demands explanation.

The best explanation I've come up with so far, working backwards from the excuses, is that this may be yet another manifestation of the human bias against reductionism & prediction: "it's just memorization", "it's just interpolation", "it's just pattern-matching", etc, perhaps accompanied by an example problem that DL can't (yet) solve which supposedly demonstrates the profound gulf between 'just X' and real intelligence. It is, in other words, fundamentally an anti-reductionist argument from incredulity: "I simply cannot believe that intelligence like a human brain's could possibly be made up of a lot of small parts". (That the human brain is also made of small parts is irrelevant to them, because one can always appeal to the mysterious and ineffable complexity of biological neurons, with all their neurotransmitters and synapses and whatnot, so the brain feels adequately complex and part-less.) If so, deep learning merely joins the long pantheon of deeply unpopular reductionist theories throughout intellectual history: atomism, materialism, atheism, gradualism, capitalism, evolution, germ theory, élan vital & 'organic' matter, polygenicity, Boolean logic, Monte Carlo/simulation methods… All of these inspired enormous resistance and deep visceral hatred, despite proving to be true or more useful. Humans seem cognitively hardwired to believe that everything is made up of a few fundamental units which are ontologically basic and not reducible to (extremely) large numbers of small uniform discrete objects. If the outputs are beautifully complex and intricate, the inputs must also be that way: things can't be made of a few kinds of atoms, they must instead be made of a different type for every object, like a plenum of bone-particles for bones or water-particles for water; you can't be a physical brain, your mind must instead be a single indivisible ontologically-basic 'soul' existing on an immaterial plane; economies can't be best run as many independent agents independently transacting, it would be much better for the philosopher-king & his cadres to plan out everything; etc. Similarly, intelligence can't just be a relatively few basic units replicated trillions of times & trained with brute force, it must be thousands of intricate units performing arithmetic, logic, memory, pattern-matching, program synthesis, analogizing, reinforcement learning, all in different ways and carefully organized with enormous expertise… But for all the biases, reductionism still ultimately wins out. So why didn't it win sooner?

What went wrong? There is a Catch-22 here: with the right techniques, impressive proof-of-concepts could have been done quite a few years ago on existing supercomputers, and successful prototypes would have justified the investment, without waiting for commodity gaming GPUs; but the techniques could not be found without running many failed prototypes on those supercomputers in the first place! Only once the prerequisites fell to such low costs that near-zero funding sufficed to go through those countless iterations of failure could the right techniques be found, justify the creation of the necessary datasets, and further justify scaling up. Hence, the sudden deep learning renaissance—had we known what we were doing from the start, we would have simply seen a gradual increase in capabilities from the 1980s.

The flip side of the bitter lesson is the sweet shortcut: as long as you have weak compute and small data, it’s always easy for the researcher to smuggle in prior knowledge/bias to gain greater performance. That this will be disproportionately true of the architectures which scale the worst will be invisible, because it is impossible to scale any superior approach at that time. To a contemporary skeptic, appealing to future compute and speculating about how “brain-equivalent computing power” arriving by 2010 or 2030 will enable AI sounds more like wishful thinking than good science. A connectionist might scoff at this skepticism, but the connectionist has no compelling arguments: the human brain may be an existence proof, but most connectionist work is a caricature of the baroque complexity of neurobiology, and besides, planes do not flap their wings nor do submarines swim. How could the connectionists prove any of this? They can’t, until it’s too late and everyone has retired.

Behavioral genetics offers a cautionary parallel in the candidate-gene debacle: for decades, underpowered studies of a handful of plausible-sounding genes (5-HTTLPR, COMT, and the rest) produced an enormous literature of ‘hits’ which felt like progress, and which crowded out the unglamorous alternative of simply collecting samples orders of magnitude larger; when well-powered GWASes finally arrived, the candidate-gene literature turned out to be almost entirely false positives. A method that only works at scale looks worthless right up until the moment it works.

Thus, there is an epistemic trap. The very fact that connectionism is so general and scales to the best possible solutions means that it performs the worst early on in R&D and compute trends, and is outcompeted by its smaller (but more limited) competitors; because of this competition, it is starved of research, further ensuring that it looks useless; and with a track record of being useless, the steadily decreasing required investments don’t make any difference, because no one takes the projections seriously; until finally, a hardware overhang accumulates to the point that it is doomed to success, when 1 GPU is enough to iterate and set SOTAs, breaking the equilibrium by providing undeniable hard results.

This trap is intrinsic to the approach. There is no alternate history where connectionism somehow wins the day in the 1970s and all this DL progress happens decades ahead of schedule. If Minsky hadn’t pointed out the problems with perceptrons, someone else would have; if someone had imported convolutions in the 1970s rather than LeCun in 1990, it would have sped things up only a little; if backpropagation had been introduced decades earlier, as early as imaginable, perhaps in the 1950s with the development of dynamic programming, that too would have made little difference, because there would be little one could backprop over (and residual networks were introduced in the 1980s, decades before they were reinvented in 2015, to no effect); and so on. The history of connectionism is not one of being limited by ideas—everyone has tons of ideas, great ideas, just ask Schmidhuber for a basket as a party favor!—but one of results; somewhat like behavioral & population genetics, all of these great ideas fell through a portal from the future, dropping in on savages lacking the prerequisites to sort rubbish from revolution. The compute was not available, and humans just aren’t smart enough to either invent everything required without painful trial-and-error, or prove their efficacy beyond a doubt without needing to run them.

GANs in 2014 caught my attention because I knew the ultra-crude 64px grayscale faces would improve steadily, and in a few years GANs would be generating high-resolution color images of ImageNet. I wasn’t too interested in ImageNet per se, but if char-RNNs could do Shakespeare and GANs could do ImageNet, they could do other things… like anime and poetry. (Why anime and poetry? To épater la bourgeoisie, of course!) However, there was no anime dataset equivalent to ImageNet, and as I knew from my DNM archives, datasets are often a bottleneck, so after looking around for a while, I began laying the groundwork for what would become Danbooru2017. Karpathy also famously put char-RNNs on the map, and I began experimenting with poetry generation. Anime didn’t work well with any GAN I tried, and I had to put it aside. I knew a useful GAN would come along, and when it did, Danbooru2017 would be ready—the pattern with deep learning is that it doesn’t work at all, and one layers on complicated hand-engineered architectures to eke out some performance, until someone finds a relatively simple approach which scales, and then one can simply throw GPUs & data at the problem. What I’ve learned about NNs is that they scale almost indefinitely; that we don’t have any idea how to train NNs well, and they are grossly overparameterized, with almost all of a NN’s parameters being unnecessary; that breakthroughs are made by trial-and-error to a degree scrubbed from research papers, and ‘algorithmic’ progress is primarily due to compute enabling enormous numbers of experiments; that theory is almost useless in guiding NN design, with papers actively misleading the reader about this (eg the ResNet paper); that even subtle details of initialization or training can have shockingly large implications for performance—a NN which seems to be working fine may in fact be badly broken but still work OK because “NNs want to work”; and that “NNs are lazy” and will solve any given task in the laziest possible way unless the task is hard (eg they can do disentanglement or generalization or reasoning just fine, but only if we actually force them to solve those tasks and not something easier).

Finally, in 2017, ProGAN showed that anime faces were almost doable, and then with StyleGAN’s release in 2019, I gave it a second shot (expecting a modest improvement over ProGAN, which is what was reported on photographic faces etc) and was startled when, almost overnight, StyleGAN created better anime faces than ProGAN ever had, and was soon generating shockingly good faces. As a joke, I put up samples as a standalone website, TWDNE, and then a million Chinese decided to pay a visit. 2019 also saw GPT-2, and if the char-RNN poetry was merely OK, the GPT-2 poetry samples, especially once I collaborated with Shawn Presser to use hundreds of TPUs to finetune GPT-2-1.5b (Google TFRC paid for that and other projects), were fantastic. Overnight, the SOTA for both anime and poetry generation took huge leaps. I wrote extensive tutorials on both StyleGAN and GPT-2 to help other people use them, and I’m pleased that, like my SR1 tutorial, a great many people found them useful, filling the gap between ordinary people and a repo dumped on GitHub. (A followup project to use an experimental DRL approach to improve poetry/music quality further didn’t work out.)
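For the curious, the core finetuning loop is conceptually tiny. Here is a minimal sketch in Python using the present-day HuggingFace `transformers` library, not the TensorFlow/TPU setup we actually used in 2019; the corpus filename, model size, and hyperparameters are illustrative assumptions rather than our settings:

```python
# Minimal GPT-2 finetuning sketch (HuggingFace `transformers`), not the
# 2019 TensorFlow/TPU pipeline; "poetry-corpus.txt", the block size, and
# the hyperparameters are all illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # "gpt2-xl" ~ GPT-2-1.5b
model = GPT2LMHeadModel.from_pretrained("gpt2").train()

# Treat the corpus as one long token stream, chopped into 1024-token blocks.
text = open("poetry-corpus.txt", encoding="utf-8").read()
ids = tokenizer(text, return_tensors="pt").input_ids[0]
blocks = ids[: len(ids) // 1024 * 1024].view(-1, 1024)

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
for batch in DataLoader(blocks, batch_size=2, shuffle=True):
    loss = model(batch, labels=batch).loss  # standard next-token LM loss
    loss.backward()
    opt.step(); opt.zero_grad()

model.save_pretrained("gpt2-poetry")  # then sample from it as usual
```

These dozen lines are not where the time goes; corpus cleaning, checkpointing, and sampling hygiene are, which is what the tutorials spend most of their length on.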

I’d always spent time tweaking gwern.net’s appearance and functionality over the years, but my disinclination to learn JS/CSS/HTML (a tech stack that manages to be even more miserable than the deep learning Python ecosystem) limited how far I could go. Said Achmiz volunteered, and redesigned it almost from the ground up 2018–2020, and is responsible for everything from the drop caps to the collapsible sections to sidenotes to link annotations. (I can take credit for relatively few things, like the Haskell backends and writing annotations; I mostly just request features while trying to educate myself about typography & design.) What was a utilitarian static site is now almost a design showcase, and I can’t help but feel from watching traffic and mentions that the improvements have made gwern.net more respectable and readers more receptive, despite most of the contents remaining the same. In parallel, I began working on my appearance by getting LASIK, picking up weightlifting as a hobby, upgrading my wardrobe etc, and there too I can’t help but feel that people were judging by my appearance to an even greater extent than I (thought I had cynically) supposed. This reminds me of how AI risk developed in the 2010s; it went from a wingnut fringe concern to broad acceptance, as far as I can tell, starting in 2014 when Nick Bostrom’s Oxford University Press-published Superintelligence (which largely repackages the thought of Eliezer Yudkowsky and other Singularitarians for a general audience) became a NYT bestseller and then was endorsed by Elon Musk. What does Elon Musk know about AI risk? What does the NYT bestseller list know about anything? And yet, I noticed at the time the sea change in attitude from such high-status endorsements. Not, of course, that it helped Yudkowsky’s reputation, as newcomers to the topic promptly did their best to erase him and plant their flags by coining ever-new neologisms to avoid using terms like “Friendly AI” (I lost track of them all, like “Beneficent AI” or “human-aligned AI” or “AI alignment” or “Provably Beneficial AI” or “AI For Good” or “AI safety”). A lesson there for pioneers expecting to at least get credit for being right: if you succeed in becoming a bridge to a better future, it may be because the ‘respectable’ ‘serious’ people knifed you in the back and walked across. (AI risk was also promptly hijacked by “what if the real AI risk is capitalism & not enough identity-politics” types, but it hasn’t been a total loss.)

I think it would have been a mistake to focus too much on design and appearance early on, however. There is no point in investing all that effort in tarting up a website which has nothing on it; it makes sense only when you have a substantial corpus to upgrade. One early website design flaw I do regret is not putting an even greater emphasis on detailed citation and link archiving; in retrospect, I would have saved myself a lot of grief by mirroring all PDFs and external links as soon as I linked them (a sketch of such a policy below). I thought my archiver daemon would be enough, but the constant cascade of linkrot, combined with the constant expansion of gwern.net, made it impossible to keep up with linkrot manually, and by omitting basic citation metadata like titles & authors, I had a hard time dealing with links that escaped archiving before dying. Likewise, it is important for any writer to remember to treat social media as a platform only for drafting and publicizing, never as the permanent home of one’s writing. Platforms are not your friend: a platform like Google+ or Twitter or Reddit can and will demote or delete you and your content, or disappear entirely, for trivial reasons. (Who can forget the spectacle of watching young college-grad Reddit employees flippantly decide which subreddits to erase, crowing as they settled scores? Or how many subreddits were deleted in their entirety in various purges because of a coincidence in names, showing no employees had so much as looked at their front page? Google+, of course, was erased in its entirety; I had given its death a substantial chance, but was still dismayed.) The probability may be small each year, but it adds up. In the next decade, I don’t know which website I use will go under or go crazy: HN, Twitter, Reddit, WP, LW? But I will try to be ready and ensure that anything of value is archived, exported, or moved to gwern.net.
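To illustrate mechanically what “mirror it the moment you link it” could look like, a hedged sketch, which is not my actual archiver daemon (the file glob, URL regex, and rate limit are all illustrative assumptions): it scans local Markdown pages for external links and submits each one to the Internet Archive’s Save Page Now endpoint.

```python
# Hypothetical "archive links as soon as you cite them" sketch; not the
# real gwern.net archiver daemon. Glob pattern, regex, and sleep interval
# are illustrative assumptions.
import glob, re, time, urllib.request

URL_RE = re.compile(r'https?://[^\s)>\]"]+')

seen = set()
for page in glob.glob("**/*.md", recursive=True):
    for url in URL_RE.findall(open(page, encoding="utf-8").read()):
        if url in seen or "gwern.net" in url:
            continue  # skip duplicates & internal links
        seen.add(url)
        try:
            # Save Page Now: a GET to /save/<url> asks the Wayback Machine
            # to snapshot the page.
            urllib.request.urlopen("https://web.archive.org/save/" + url,
                                   timeout=60)
        except Exception as e:
            print("failed:", url, e)  # possibly dead already: mirror manually
        time.sleep(10)  # be polite; anonymous SPN requests are rate-limited
```

Submitting to the Wayback Machine is only the easy half; the grief-saving half is keeping a local mirror and recording title/author metadata at the same time, so that a link which dies before any archiver reaches it can still be identified and hunted down.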

Another underrated activity, along with creating datasets and methods, is providing fulltexts. “Users are lazy”, and if it isn’t in Google/Google Scholar, then it doesn’t exist for researchers either. I noticed on Wikipedia that while there were few overt signs that my articles were having any influence—who cites a Wikipedia article?—there were continuous suspicious coincidences. I would dig up a particularly recherché article or interview, scan or host it, quote it on WP, and years later, notice that some article would cite it, without any reference to me or WP. (“Oh ho, you just happened to dig it up independently in your doubtless thorough literature review, did you? A likely story!”) This is equally true for other topics; how many times have I dug up some paper or thesis from the 1950s and—mirabile dictu!—other researchers suddenly began to cite it in papers? It would be interesting to pull citation metadata from Semantic Scholar and cross-reference it with my revision history, to look at the time-series and see how much my hosting increases a paper’s total citation count once Google Scholar indexes it.
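That analysis would be easy to prototype against the public Semantic Scholar Graph API. A hedged sketch, where the paper ID and hosting year are placeholder assumptions, and a real version would loop over every hosted fulltext, paginate, and control for the background growth of citations:

```python
# Sketch: compare citations before vs after hosting a fulltext, using the
# Semantic Scholar Graph API. paper_id & hosted_year are placeholders.
import json, urllib.request
from collections import Counter

paper_id = "arXiv:1512.03385"  # hypothetical example identifier
hosted_year = 2016             # hypothetical year the fulltext went up

url = (f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations"
       "?fields=year&limit=1000")
with urllib.request.urlopen(url) as r:
    cites = json.load(r)["data"]

by_year = Counter(c["citingPaper"]["year"] for c in cites
                  if c["citingPaper"]["year"])
before = sum(n for y, n in by_year.items() if y < hosted_year)
after = sum(n for y, n in by_year.items() if y >= hosted_year)
print(f"citations before hosting: {before}; after: {after}")
```

A raw before/after count is only suggestive, since citation counts grow over time anyway, so the real version would want a comparison set of similar unhosted papers.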

Overall, I think I have a good track record. My predictions (whether on IEM, Intrade, PredictionBook, or Good Judgment Project) are far above average, and I have taken many then-controversial positions where contemporary research or opinion has since moved far closer to me than to the critics, often when I was in a minute minority. Examples include Wikipedia’s editing crisis, anti-DNB, the Replication Crisis, Bitcoin, darknet markets, modafinil & nicotine, LSD microdosing, AI risk, behavioral genetics, embryo selection, and advertising harms. If I have not always been right from the start, I have at least been less wrong, updating faster than most (DNB, behavioral genetics, DL/DRL).

Media

Books

Nonfiction:

  1. The Cultural Revolution: A People’s History, 1962–1976, 2016 ()
  2. , 1997 (review)

TV/movies

Nonfiction movies:

  1. (review)
  2. (review)

Fiction:

  1. ()
  2. (review)
  3. (review)
  4. , 1932 (review)
  5. ()
  6. ()
  7. ()
  8. , 1978 (review)
  9. 2012/ 2014/ 2014 (review)

Anime:

  1. (review)
  2. (review)
  3. (review)
  4. Neon Genesis Evangelion Concurrency Project, 2013 (review)
  5. (review)
  6. : (review: , s9)

  1. Psychedelics enthusiasts have never forgiven me for this, no matter how much I have been vindicated by subsequent studies.↩︎

  2. Barring, of course, additional factors like publication bias or fraud or software errors, the latter of which have happened.↩︎

  3. This should not have been a surprise to me, after seeing how, after the disastrous imperial presidency of George W. Bush, liberals fell over themselves to defend all administration policies and abuses once a Democrat was in office, and so on with Trump. Politics is the mindkiller.↩︎

  4. The psychology of religion in small children is revealing in this way: small children do not believe people are made of carefully-arranged parts. They just are. So if someone dies, they must go somewhere else—that’s just basic object-persistence and conservation!↩︎