2019 News

Annual summary of 2019 Gwern.net newsletters, selecting my best writings, the best 2019 links by topic, and the best books/movies/anime I saw in 2019, with some general discussion of the year and the 2010s, and an intellectual autobiography of the past decade.
newsletter, personal, meta, NN
2019-11-21–2021-02-25; in progress; certainty: log; importance: 0

This is the 2019 annual summary edition of the Gwern.net newsletter, summarizing the best of the monthly 2019 newsletters:

  1. end of year sum­mary (here)

Previous annual newsletters: 2018, 2017, 2016, 2015.


2019 went well, with much in­ter­est­ing news and sev­eral stim­u­lat­ing trips. My 2019 writ­ings in­clud­ed:

  1. Dan­booru2018 re­leased: a dataset of 3.33m anime im­ages (2.5tb) with 92.7m de­scrip­tive tags

I’m particularly proud of the technical improvements to the Gwern.net site design this year: along with a host of minor typographic improvements & performance optimizations, the site now has automatic updates of currencies (a feature I’ve long felt would make documents far less misleading), the link annotations/popups are a major usability enhancement few sites have, sidenotes.js eliminates the frustration of footnotes by providing sidenotes, collapsible sections help tame long writings by avoiding the need for hiding code or relegating material to appendices, and link icons & drop caps & epigraphs are just pretty. While changes are never unanimously received, we have received many compliments on the overall design, and are pleased with it.

Site traffic (more de­tailed break­down) was again up as com­pared with the year be­fore: 2019 saw 1,361,195 pageviews by 671,774 unique vis­i­tors (life­time to­tals: 7,988,362 pageviews by 3,808,776 user­s). I ben­e­fited pri­mar­ily from TWDNE, al­though the num­bers are some­what in­flated by host­ing a num­ber of pop­u­lar archived pages from /OKCupid/, which I put Google An­a­lyt­ics on to keep track of re­fer­rals.


2019 was a fun year.

AI: 2019 was a great year for hobbyists and fun generative projects like mine, thanks to spinoffs and especially pretrained models. How much more boring it would have been without the GPT-2 or StyleGAN models! (There was irritatingly little meaningful news about self-driving cars.) More seriously, the theme of 2019 was scaling. Whether GPT-2 or StyleGAN 1/2, or the scaling papers, or AlphaStar, 2019 demonstrated the blessings of scale in scaling up models, compute, data, and tasks; it is no accident that the most extensively discussed editorial on DL/DRL was Rich Sutton’s “The Bitter Lesson”. For all the critics’ carping and goalpost-moving, scaling is working, especially as we go far past the regimes where they assured us years ago that mere size and compute would break down and we would have to use more elegant and intelligent methods like Bayesian program synthesis. Instead, every year it looks increasingly like the strong connectionist thesis is correct: much like humans & evolution, AGI can be reached by training an extremely large number of relatively simple units end-to-end for a long time on a wide variety of multimodal tasks, and it will recursively self-improve, meta-learning efficient internal structures & algorithms optimal for the real world which learn how to generalize, reason, self-modify with internal learned reward proxies & optimization algorithms, and do zero/few-shot learning bootstrapped purely from the ultimate reward signals—without requiring extensive hand-engineering, hardwired specialized modules designed to support symbolic reasoning, completely new paradigms of computing hardware, etc. Self-driving cars remain a bitter mystery (although it was nice to see in 2019 Elon Musk & Tesla snatch victory from the jaws of snatching-defeat-from-the-jaws-of-victory).

2019 for genetics saw more progress on genetic-engineering topics than GWASes; the GWASes that did come out were largely confirmatory—no one really needed more SES GWASes from Hill et al, or confirmation that the IQ GWASes work and that brain size is in fact causal for intelligence, and while the recovery of full height/BMI trait heritability from WGS is a strong endorsement of the long-term value of WGS, the transition from limited SNP data to WGS is foreordained (especially since WGS costs appear to finally be dropping again after their long stagnation). Even embryo selection saw greater mainstream acceptance, with a paper in Cell concluding (for the crudest possible simple embryo selection methods) that, fortunately, the glass was half-empty and need not be feared overmuch. (I am pleased to see that as of 2019, human geneticists have managed to reinvent Lush 1943’s breeder’s equation; with any luck, in a few years they may progress as far as reinventing Hazel & Lush 1943.) More interesting were the notable events along all axes of post-simple-embryo-selection strategies: Genomic Prediction claimed to have done the first embryo selection on multiple PGSes, genome synthesis was achieved for E. coli, multiple promising post-CRISPR or mass CRISPR editing methods were announced, gene drive progressed to mammals, gametogenesis saw progress (including at least two human fertility startups I know of), serious proposals for human germline CRISPR editing are being made by a Russian (among others), and while He Jiankui was imprisoned by a secret court, there otherwise do not appear to have been serious repercussions such as reports of the 3 CRISPR babies being harmed or an ‘indefinite moratorium’ (ie. ban).
Thus, we saw good progress to­wards the en­abling tech­nolo­gies for mas­sive em­bryo se­lec­tion (break­ing the egg bot­tle­neck by al­low­ing gen­er­a­tion of hun­dreds or thou­sands of em­bryos and thus mul­ti­ple-SD gains from se­lec­tion), IES (It­er­ated Em­bryo Se­lec­tion), mas­sive em­bryo edit­ing (CRISPR or de­riv­a­tives), and genome syn­the­sis.

VR’s 2019 launch of the Oculus Quest proved quite successful, selling out occasionally well after launch, and it appears to appeal to normal people, with even hardcore VR fans acknowledging how much they appreciate the convenience of a single integrated unit. Unfortunately… it is not successful enough. There is no VR wave. Selling out may have as much to do with Facebook not investing too much into manufacturing as with demand. Worse, there is still no killer app beyond Beat Saber. The hardware is adequate to the job, the price is nugatory, the experience unparalleled, but there is no stampede into VR. So it seems VR is doomed to the long slow multi-decade adoption slog like that of PCs: it’s too new, too different, and we’re still not sure what to do with it. One day, it would not be surprising if most people have a VR headset, but that day is a long way away.

Bit­coin: lit­tle of note. Dark­net mar­kets proved un­usu­ally in­ter­est­ing: Dream Mar­ket, the longest-lived DNM ever, fi­nally ex­pired; Red­dit be­trayed its users by whole­sale purg­ing of sub­red­dits, in­clud­ing /r/DarkNetMarkets, caus­ing me a great deal of grief; and most shock­ing­ly, Deep­DotWeb was raided by the FBI over affil­i­ate com­mis­sions it re­ceived from DNMs (ap­par­ently into the tens of mil­lions of dol­lars—y­ou’d’ve thought they’d taken down those hideous ads all over DDW if the affil­i­ate links were so profitable…)

What Progress?

As the end of a decade is a traditional time to look back, I thought I’d try my own version of Scott Alexander’s essay, where he considered how his ideas/beliefs evolved over the past decade of blogging, as I started my own website a decade ago as well.

I’m not given to in­tro­spec­tion, so I was sur­prised to think back to 2010 and re­al­ize how far I’ve come in every way—even list­ing them ob­jec­tively would sound in­suffer­ably con­ceit­ed, so I won’t. To thank some of the peo­ple who helped me, di­rectly or in­di­rect­ly, risks (to para­phrase Borges) re­pu­di­at­ing my debts to the oth­ers; nev­er­the­less, I should at least thank the fol­low­ing: Satoshi Nakamo­to, kiba, Ross Ul­bricht, Seth Roberts, Luke Muehlhauser, Nava White­ford, Patrick McKen­zie, SDr, Modafinil­Cat, Steve Hsu, Jack Con­te, Said Achmiz, Patrick & John Col­lison, and Shawn Press­er.

2010 was per­haps the worst of times but also best of times, be­cause it was the year the fu­ture re­boot­ed.

My per­sonal cir­cum­stances were less than ide­al. Wikipedi­a’s dele­tion­ist in­vo­lu­tion had in­ten­si­fied to the point where every­one could see it both on the ground and from the global sta­tis­tics, and it was be­com­ing clear the cul­tural shift was ir­re­versible. Ge­netic en­gi­neer­ing con­tin­ued its grind­ing­ly-s­low progress to­wards some day do­ing some­thing, while in com­plex trait/be­hav­ioral ge­net­ics re­search, the early GWASes pro­vided no use­ful poly­genic scores but did re­veal the full mea­sure of the can­di­date-gene de­ba­cle: it was­n’t a mi­nor method­olog­i­cal is­sue, but in some fields the false-pos­i­tive rate ap­proached 100%, and tens of thou­sands of pa­pers were, or were based on, ab­solutely worth­less re­search. AI/­ma­chine learn­ing were ex­haust­ed, with state-of-the-art typ­i­cally some sort of hand-engi­neered so­lu­tion or com­pli­cated in­cre­men­tal tweaks to some­thing crude like an SVM or ran­dom forest, with no even slightly vi­able path to in­ter­est­ingly pow­er­ful sys­tems (much less AGI). At best, one could say that the stag­nant back­wa­ter of AI, neural net­work re­search, showed a few in­ter­est­ing re­sults, and the Schmid­hu­ber lab had won a few ob­scure con­tests, and might fi­nally prove use­ful for some­thing over the com­ing years be­yond play­ing backgam­mon or read­ing ZIP codes, as­sum­ing any­one could avoid eye­-rolling at Schmid­hu­ber’s web­site(s) long enough to read as far as his pro­jec­tions that com­put­ing power and NN per­for­mance trends would both con­tinue (“what NN trends‽”); I noted to my sur­prise that Shane Legg had left an ap­par­ently suc­cess­ful aca­d­e­mic ca­reer to launch a new startup called ‘Deep­Mind’, to do some­thing with NNs. (Good for him, I thought, maybe they’ll get ac­qui­hired for a few mil­lion bucks by some cor­po­ra­tion need­ing a small im­prove­ment from ex­otic ML vari­ants, and then with a fi­nan­cial cush­ion, he can get back to his real work. 
After all, it’s not like connectionism ever worked before…) And reinforcement learning, as far as anyone need be concerned, didn’t work. Cryptography, as far as anyone need be concerned, consisted of the art of (not) protecting your Internet connection. Mainstream technology was obsessed with the mobile shift, of little interest or value to me, and a shift that came with severe drawbacks, such as a massive migration away from FLOSS back to proprietary technologies & walled gardens & ‘services’. All in all, there was not that much to draw interest in existential risk; if you had been interested, at the time, probably the best thing you could’ve done is focus on the standbys of nuclear war & pandemic (neither has been solved), or leave your money invested to wait for more compelling charitable opportunities. The future had arrived, and had brought little with it besides 140 characters.

But also in 2010, disillusioned with writing on Wikipedia, I registered Gwern.net. Some geneticists had begun buzzing over GCTA and the early GWASes’ polygenic scores, which indicated that, candidate-genes notwithstanding, the genes were there to be found, and simple power analysis implied there were simply so many of them that one would need samples of tens of thousands—no, hundreds of thousands—of people to start finding them, which sounded daunting, but fortunately the super-exponential curve in sequencing costs ensured that those samples would become available in mere years, through things like something called ‘The UK BioBank’ (UKBB). (Told of this early on, I was skeptical: that seemed like an awful lot of faith in extrapolating a trend, and when did mega-projects like that ever work out?) Even more obscurely, some microbial geneticists noted that an odd protein associated with ‘CRISPR’ regions seemed to be part of a sort of bacterial immune system and could cut DNA effectively. Connectionism suddenly started working, and the nascent compute and NN trends did continue for the next decade, with AlexNet overnight changing computer vision, after which NNs began rapidly expanding and colonizing adjacent fields (despite perennial predictions that they would reach their limits Real Soon Now), with Transformers+pretraining recently claiming the scalp of the great holdout, natural language processing—nous sommes tous connexionnistes. At the same time, Wikileaks reached its high-water mark, helping inspire Edward Snowden, Bitcoin was gaining traction (I would hear of it and start looking into it in late 2010), and Ross Ulbricht was making Silk Road 1 (which he would launch in January 2011). 
In VR, Valve had re­turned to tin­ker­ing with it, and a young Palmer Luckey had be­gun to play with us­ing smart­phone screens as cheap high­-speed high­-res small dis­plays for head­sets. As dom­i­nant as mo­bile was, 2010 was also close to the peak (eg the launch of In­sta­gram): a mo­bile strat­egy could now be taken for grant­ed, and in­fra­struc­ture & prac­tices had be­gun to catch up, so the gold rush was over and at­ten­tion could re­fo­cus else­where.
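The power-analysis remark above (that finding the genes would take samples in the hundreds of thousands) can be illustrated with a rough back-of-the-envelope calculation. This is only a sketch under stated assumptions: it uses the standard normal approximation for detecting a single variant that explains a given fraction of trait variance, the conventional genome-wide significance threshold of 5×10⁻⁸, and purely hypothetical per-variant effect sizes; `required_n` is an illustrative name, not anything from the studies mentioned.

```python
from statistics import NormalDist


def required_n(variance_explained, alpha=5e-8, power=0.80):
    """Approximate sample size needed to detect one variant that explains
    `variance_explained` of trait variance, using a two-sided test.

    Normal approximation: n ~ (z_alpha + z_beta)^2 / r^2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = std_normal.inv_cdf(power)           # desired statistical power
    return (z_alpha + z_beta) ** 2 / variance_explained


# Hypothetical per-variant effect sizes, chosen only for illustration.
for r2 in (0.01, 0.001, 0.0001):
    print(f"{r2:.2%} of variance -> n of roughly {required_n(r2):,.0f}")
```

Under these assumptions, a variant explaining a hundredth of a percent of variance needs a sample in the hundreds of thousands, which is the order of magnitude the paragraph describes.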

Pro­gres­sion of in­ter­ests: DNB → QS + IQ → sta­tis­tics + meta-analy­sis + Repli­ca­tion Cri­sis → Bit­coin + dark­net mar­kets → be­hav­ioral ge­net­ics → de­ci­sion the­ory → DL+DRL.

To the ex­tent there is any con­sis­tent thread through my writ­ing, it is an in­ter­est in op­ti­miza­tion, which in hu­mans is lim­ited pri­mar­ily by in­tel­li­gence. New dis­cov­er­ies, im­prove­ments, and changes must come from some­where; no mat­ter how you or­ga­nize any num­ber of chim­panzees, they will not in­vent the atomic bomb. Thus my in­ter­est in Dual N-Back (DNB), Quan­ti­fied Self (QS), and Spaced Rep­e­ti­tion Sys­tems (SRS) in 2010: these seemed promis­ing routes to im­prove­ment. Of the­se, the first was a com­plete and un­mit­i­gated fail­ure, the sec­ond was some­what use­ful, and the third is highly use­ful in rel­a­tively nar­row do­mains (although ex­per­i­ments in ex­tend­ing & in­te­grat­ing it are worth­while).

I’d come across in Wired the first men­tion of the study that put dual n-back on the map, Jaeggi et al 2008 and was in­trigued. While I knew the his­tory of IQ-boost­ing in­ter­ven­tions was dis­mal, to say the least, Jaeggi seemed to be quite rig­or­ous, had a good story about WM be­ing the bot­tle­neck con­sis­tent with what I knew from my cog­ni­tive psy­chol­ogy courses and read­ing, and it quickly at­tracted some fol­lowup stud­ies which seemed to repli­cate it. I be­gan n-back­ing, and dis­cussing the lat­est vari­ants and re­search. The rep­e­ti­tion in dis­cus­sions prompted me to start putting to­gether a DNB FAQ, which was un­suit­able for Wikipedia, so I hosted it on my shell ac­count on code.haskell.org for a while, un­til I wanted to write a few more pages and de­cided it was time to stop abus­ing their gen­eros­ity and set up my own web­site, Gw­ern.net, us­ing this new sta­tic web­site gen­er­a­tor called Hakyll. I did­n’t know what I wanted to do, but I could at least write about any­thing I found in­ter­est­ing on my own web­site with­out wor­ry­ing about dele­tion­ists or whether it was ap­pro­pri­ate for Less­Wrong, and it would be good for host­ing the oc­ca­sional PDF too.

While n-back­ing, I be­gan read­ing about the Repli­ca­tion Cri­sis and be­came in­creas­ingly con­cerned, par­tic­u­larly when Moody crit­i­cized DNB on sev­eral method­olog­i­cal grounds, ar­gu­ing that the IQ gains might be hol­low and dri­ven by the test be­ing sped up (and thus like play­ing DNB) or by mo­ti­va­tional effects (be­cause the con­trol group did noth­ing). I be­gan pay­ing closer at­ten­tion to stud­ies, null re­sults be­gan to come out from Jaeg­gi-u­naffil­i­ated labs, and I be­gan teach­ing my­self R to an­a­lyze my self­-ex­per­i­ments & meta-an­a­lyze the DNB stud­ies (be­cause no one else was do­ing it).

What I found was ug­ly: a thor­ough can­vass­ing of the lit­er­a­ture turned up plenty of null re­sults, re­searchers would tell me about the diffi­cul­ties in get­ting their nulls pub­lished, such as how peer re­view­ers would tell them they must have messed up their ex­per­i­ment be­cause “we know n-back works”, and I could see even be­fore run­ning my first meta-analy­sis that stud­ies with pas­sive con­trol groups got much larger effects than the bet­ter stud­ies run with ac­tive con­trol groups. Worse, some method­ol­o­gists fi­nally caught up and ran the SEMs on ful­l-s­cale IQ tests in­stead of sin­gle ma­trix sub­tests (which is what you are al­ways sup­posed to do to con­firm mea­sure­ment in­vari­ance, but in psy­chol­o­gy, mea­sure­ment & psy­cho­met­rics are hon­ored pri­mar­ily in the breach); the gain, such as it was, was­n’t even on the la­tent fac­tor of in­tel­li­gence. Mean­while, Jaeggi et al be­gan pub­lish­ing about com­plex mod­er­a­tion effects from per­son­al­i­ty, and ul­ti­mately sug­gest­ing in their ri­val meta-analy­sis that pas­sive vs ac­tive was sim­ply be­cause of coun­try-level differ­ences—be­cause I guess brains don’t work in the USA the way they do every­where else. They (Jaeggi had since earned tenure) con­ceded the de­bate largely by sim­ply no longer run­ning DNB stud­ies and shift­ing fo­cus to spe­cial­ized pop­u­la­tions like chil­dren with ADHD and other fla­vors of work­ing mem­ory train­ing, al­though by this point I had long since moved on and I can’t say what hap­pened to all the drama­tis per­sonae.

It was, in short, a pro­to­typ­i­cal case of the Repli­ca­tion Cri­sis: a con­ve­nient en­vi­ron­men­tal “One Weird Trick” in­ter­ven­tion, with ex­tremely low prior prob­a­bil­i­ty, which made no quan­ti­ta­tive pre­dic­tions, sup­ported by a weak method­ol­ogy which could man­u­fac­ture pos­i­tive re­sults and propped up by se­lec­tive ci­ta­tion, sys­temic pres­sure to­wards pub­li­ca­tion bi­as, re­searcher al­le­giance & com­mer­cial con­flicts of in­ter­est, which the­ory was never de­fin­i­tively dis­proven, but lost pop­u­lar­ity and sort of faded away. When I later reread Meehl, I rue­fully rec­og­nized it all.

In­spired by Seth Roberts, I be­gan run­ning per­sonal self­-ex­per­i­ments, mostly on nootrop­ics. I think this is an un­der­ap­pre­ci­ated way of learn­ing sta­tis­tics, as it ren­ders con­crete all sorts of ab­struse is­sues: block­ing vs ran­dom­iza­tion, blind­ing, power analy­sis, nor­mal­i­ty, miss­ing­ness, time-series & au­to-cor­re­la­tion, car­ry­over, in­for­ma­tive pri­ors & meta-analy­sis, sub­jec­tive Bayesian de­ci­sion the­o­ry—all of these arise nat­u­rally in con­sid­er­ing, plan­ning, car­ry­ing out, and an­a­lyz­ing a self­-ex­per­i­ment. By the end, I had a nice work­flow: ob­tain a hy­poth­e­sis from the QS com­mu­nity or per­sonal ob­ser­va­tion; find or make a meta-analy­sis about it; con­sider costs and ben­e­fits to do a power analy­sis for Value of In­for­ma­tion; run a blinded ran­dom­ized self­-ex­per­i­ment with blocks to es­ti­mate the true per­sonal causal effect purged of hon­ey­moon or ex­pectancy effects; run the analy­sis at check­points for fu­til­i­ty/low VoI; and write it up.

The first breakthrough was blinding: having been burned by DNB’s motivational expectancy effects, I became concerned about that rendering all QS results invalid. Blinding is simple enough if you are working with other people, but involving anyone else is a great burden for a self-experimenter, which is one reason no one in QS ever ran blinded self-experiments. After some thought, I realized self-blinding is quite easy; all that is necessary is to invisibly mark containers or pills in some way, randomize, and then one can infer after data collection which was used. For example, one could color two pills by experimental/placebo, shake them in a bag, take one with one’s eyes closed, and look at the remaining pill after recording data, or one could use identical-looking pills but in two separate containers, marking the inside or bottom of one, and again looking afterwards. (Self-blinding is so simple I refuse to believe I invented it, but I have yet to come across any prior art.) This gave me the confidence to go forward with my self-experiments, and I think I had some interesting results, some of which were expected (red-tinting software affecting sleep schedule), some of which have been supported by subsequent research (the null effects of LSD microdosing), some of which are plausible but not yet confirmed (vitamin D damaging sleep when taken at night), and some of which are simply mysterious (the strikingly harmful effects of potassium but also apparently of magnesium?). Ironically, I’ve gradually become convinced that self-blinding is not that important because the majority of bias in self-experiments comes from selective data reporting & non-randomization—people fool themselves by attributing the good days to the new nootropic (omitting all the average or blah days), and good days cause one to do optional things (like take nootropics).
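The randomize-in-blocks-then-unblind procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the workflow actually used: in the physical version the marked pills or containers hold the assignment, whereas here a seeded schedule stands in for them and is assumed to stay unseen until the ratings are recorded. The function names, the `"active"`/`"placebo"` labels, and the block size of 2 are all hypothetical.

```python
import random


def blocked_assignments(n_blocks, block_size=2, seed=None):
    """Return a list of blocks; each block holds equal numbers of
    'active' and 'placebo' days in random order."""
    if block_size % 2:
        raise ValueError("block_size must be even")
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = ["active", "placebo"] * (block_size // 2)
        rng.shuffle(block)  # randomize order within the block only
        blocks.append(block)
    return blocks


def unblind(blocks, ratings):
    """After data collection, pair each day's rating with its condition
    and return the mean rating per condition."""
    days = [condition for block in blocks for condition in block]
    if len(days) != len(ratings):
        raise ValueError("need exactly one rating per scheduled day")
    by_condition = {"active": [], "placebo": []}
    for condition, rating in zip(days, ratings):
        by_condition[condition].append(rating)
    return {c: sum(v) / len(v) for c, v in by_condition.items()}
```

Blocking keeps the two conditions balanced over time, so a slow drift in mood or season cannot pile up in one arm; the comparison of means here is only a placeholder for a proper analysis.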

I wound down my QS & nootrop­ics ac­tiv­i­ties for a few rea­sons. First, I be­came more con­cerned about the mea­sure­ment is­sues: I am in­ter­ested in daily func­tion­ing and ‘pro­duc­tiv­ity’, but how on earth does one mean­ing­fully quan­tify that? What are the “en­er­gies of men”, as William James put it? Al­most all mod­els look for an av­er­age treat­ment effect, but given the sheer num­ber of nulls and in­tro­spect­ing about what ‘pro­duc­tive’ days felt like, it seems like ex­pect­ing a large in­crease in the av­er­age (treat­ing it as a la­tent vari­able in be­ing spread across many mea­sured vari­ables) is en­tirely miss­ing the value of pro­duc­tiv­i­ty. It’s not that one gets a lot done across every vari­able, but one gets done the im­por­tant things. A day in which one ti­dies up, sends a lot of emails, goes to the gym, walks the dog, may be worse than use­less, while an ex­tra­or­di­nary day might en­tail 12 hours pro­gram­ming with­out so much as chang­ing out of one’s py­ja­mas. Pro­duc­tiv­i­ty, it seems, might func­tion more like ‘mana’ in an RPG than more fa­mil­iar con­structs like IQ, but un­for­tu­nate­ly, sta­tis­ti­cal mod­els which in­fer some­thing like a to­tal sum or in­dex across many vari­ables are rare; I’ve made lit­tle progress on mod­el­ing QS ex­per­i­ments which might in­crease to­tal ‘mana’. But if you can’t mea­sure some­thing cor­rect­ly, there is no point in try­ing to ex­per­i­ment on it; my Zeo did not mea­sure sleep any­where close to per­fect­ly, but I lost faith that mea­sur­ing a few things like emails or my daily 1–5 self­-rat­ings could give me mean­ing­ful re­sults rather than con­stant false neg­a­tives due to mea­sure­ment er­ror. Sec­ond­ly, nootrop­ics seem to run into di­min­ish­ing re­turns quick­ly. As I see it, all nootrop­ics work in one of 4 ways:

  1. stim­u­lant
  2. anx­i­olytic
  3. nu­tri­tional de­fi­ciency fix
  4. it does­n’t

You don’t need to run through too many stimulants or anxiolytics to find one that works for you, deficiencies are deeply idiosyncratic and hard to predict (who would think that iron supplementation could cure pica or that vegetarians seem to benefit from creatine?), and past that, you are just wasting money (or worse). The low-hanging fruit has generally been plucked already—yeah, iodine is great… if you lived a century ago, in a deficient region, but unfortunately, as a Western adult it probably does zilch for you. (Likewise multivitamins.) I lost interest as it looked increasingly likely that I wasn’t going to find a new nootropic as useful to me as melatonin, nicotine, or modafinil.

Why were there so few silver bullets? Why did DNB fail (as well as every other intelligence intervention), why was the Flynn effect hollow, why were good nootropics so hard to find, why was the base rate of correct psychological hypotheses (as Ioannidis would put it) so low? The best answer seemed to be “Algernon’s law”: as a highly polygenic fitness-relevant trait, human intelligence has already been reasonably well optimized, is not particularly maladapted to the current environment (unlike personality traits or motivation or caloric expenditure), and thus has no useful ‘hacks’ or ‘overrides’. Intelligence is highly valuable to increase, and certainly could be increased—just marry a smart person—but there are no simple easy ways to do so, particularly for adults. Gains would have to come from fixing many small problems, but that is the sort of thing that drugs and environmental changes are worst at. (It is telling that after decades of searches, there is still not a single known mutation, rare or common, which increases intelligence by even a few IQ points, aside from possibly that ultra-rare one. In contrast, plenty of large-effect mutations are known for more neutral traits like height.)

Going further, I wondered whether the disasters in nutrition & exercise research, so parallel to sociology & psychology, were merely cherry-picked. Yes, everyone knows ‘correlation ≠ causation’, but is this a serious objection, or is it a nitpicking one of the sort which only occasionally matters and is abused as a universal excuse? I found that first, the presence of a correlation is a foregone conclusion and hence completely uninformative: because “everything is correlated”, finding a correlation between two variables is useless, since of course you’ll find one if you look, so it’d be surprising only if you didn’t detect one (and that might say more about your statistical power than the real world). Predicting their direction is not impressive either, since that’s just a coin-flip, and there is a positive manifold (from intelligence, SES, and health, if nothing else), making it even easier. Second, the simplest way to answer this question is to look at randomized experiments and compare—presumably only the best and most promising correlations, supported by many datasets and surviving all standard ‘controls’, will progress to expensive randomized experiments, and the conditional probability of the randomized experiment agreeing with a good correlation is what we care about most, since in practice we already ignore the more dubious ones. There are a few such papers cited in the Replication Crisis literature, but I found that (Cowen’s Law!) there were actually quite a few such reviews/meta-analyses, particularly by the UK NICE (which does underappreciated work). 
The re­sults are not good: it is not rare, but com­mon, for the ran­dom­ized re­sult to be se­ri­ously differ­ent from the prior cor­re­la­tion re­sults, and this holds true whether we look at eco­nom­ics, so­ci­ol­o­gy, med­i­cine, ad­ver­tis­ing… When we look at the track record of so­cial in­ter­ven­tions, pay­ing par­tic­u­lar at­ten­tion to in­ter­ven­tions which have gone down the mem­ory hole and lessons from the Repli­ca­tion Cri­sis (sor­ry, no one be­lieves your 20-year-fol­lowup where only fe­males un­der 13 at the time of in­ter­ven­tion show sta­tis­ti­cal­ly-sig­nifi­cant ben­e­fit­s), it’s not much bet­ter. In­ter­ven­tions also tend to be self­-lim­ited by their tem­po­rary and fo­cused na­ture: “every­thing is cor­re­lated” and out­comes are dri­ven by pow­er­ful la­tent vari­ables which are main­tained by un­change­able in­di­vid­ual ge­net­ics (“every­thing is her­i­ta­ble”) which are re­spon­si­ble for lon­gi­tu­di­nal sta­bil­ity in traits (per­haps through mech­a­nisms like “the en­vi­ron­ment is ge­netic”) and long-last­ing in­di­vid­ual differ­ences which are set early in life. This pro­duces the “gloomy prospect” for QS, and for so­cial en­gi­neer­ing in gen­er­al: there may be use­ful in­ter­ven­tions, but they will be of lit­tle value on av­er­age—if the ben­e­fit is uni­ver­sal, then it will be small; if it is large and pre­dictable, then it will be lim­ited to the few with a par­tic­u­lar dis­ease; oth­er­wise, it will be un­pre­dictably idio­syn­cratic so those who need it will not know it. Thus, the metal­lic laws: the larger the change you ex­pect, the less likely it is; the low-hang­ing fruit, hav­ing al­ready been plucked, will not be test­ed; and the more rig­or­ously you test the left­overs, the smaller the fi­nal net effects will be.

Spaced rep­e­ti­tion was an­other find in Wired, and made me feel rather dim. We had cov­ered Ebbing­haus and the spac­ing effect in class, but it had never oc­curred to me that it could be use­ful, es­pe­cially with a com­put­er. (“How ex­tremely stu­pid of me not to have thought of that.”) I im­me­di­ately be­gan us­ing the Mnemosyne SRS for Eng­lish vo­cab­u­lary (sourc­ing new words from A Word A Day) and class­work. I stopped the Eng­lish vo­cab when I re­al­ized that my vo­cab­u­lary was al­ready so large that if I needed SRS to mem­o­rize a word, I was bet­ter off never us­ing that word be­cause it was a bar to com­mu­ni­ca­tion, and that my prose al­ready used too many vo­cab words, but I was so im­pressed that I kept us­ing it. (It is mag­i­cal to use an SRS for a few weeks, do­ing bor­ing flash­card re­views with some oddly fussy frills, and then a flash­card pops up and one re­al­izes that one just re­mem­bers it de­spite hav­ing only seen the card 4 or 5 times and noth­ing oth­er­wise.) It was par­tic­u­larly handy for learn­ing French, but I also value it for stor­ing quotes and po­et­ry, and for help­ing cor­rect per­sis­tent er­rors. For a Less­Wrong con­test, I read en­tirely too many pa­pers on spaced rep­e­ti­tion and wrote an overview of the top­ic, which has been use­ful over the years as in­ter­est in SRS has steadily grown.

What drew in my interest next? A regular on IRC, one kiba, kept talking about something called Bitcoin, touting it to us in November 2010 and on. I was curious to see this atavistic resurgence of cypherpunk ideas—a decentralized uncensorable e-cash, really? It reminded me of being a kid reading the Jargon File and the Cyphernomicon and Phrack and &TOTSE. Bitcoin, surprisingly enough, worked (which was more than one could say of other cypherpunk infrastructure like remailers), and the core idea, while definitely a bit perverse, was a surprising method of squaring the circle; I couldn’t think of how it ought to fail in theory, and it was working in practice. In retrospect, being introduced to Bitcoin so early, before it had become controversial or politically-charged, was enormously lucky: you could waste a lifetime looking into longshots without hitting a single Bitcoin, and here Bitcoin walked right up to me unbidden, and all I had to do was simply evaluate it on its merits & not reflexively write it off. Still, some cute ideas and a working network hardly justified wild claims about it eating world currencies, until I read in Gawker about a Tor drug market which was not a scam (like most Tor drug shops were) and used only Bitcoin, called “Silk Road”, straight out of the pages of the Cyphernomicon: it used escrow, feedback, and had many buyers & sellers already. That got my attention, but subsequent coverage of SR1 drove me nuts with how incompetent and ignorant it was. Kiba gave me my first bitcoins to write an SR1 discussion and tutorial for his Bitcoin Weekly online magazine, which I duly did (ordering, of course, some Adderall for my QS self-experiments). The SR1 tutorial was my first page to ever go viral. I liked that. 
Peo­ple kept us­ing it as a guide to get­ting started on SR1, be­cause there is a great gap be­tween a tool ex­ist­ing and every­one us­ing it, and good doc­u­men­ta­tion is as un­der­es­ti­mated as open datasets.

Many crit­i­cisms of Bit­coin, in­clud­ing those from cryp­to­graphic or eco­nomic ex­perts, were not even wrong, but showed they had­n’t un­der­stood the ‘Proof of Work’ mech­a­nism be­hind Bit­coin. This made me ea­ger to get Bit­coin be­cause when some­one like P— K— or C— S— made re­ally dumb crit­i­cisms and re­vealed that they claimed Bit­coin would fail not be­cause the ac­tual mech­a­nisms would break down but be­cause they wanted Bit­coin to fail so gov­ern­ments can more eas­ily ma­nip­u­late the money sup­ply, such wish­ful think­ing, from oth­er­wise de­cent thinkers im­plied that Bit­coin was un­der­val­ued. (Bit­coin’s price was ei­ther far too low or far too high, and re­versed stu­pid­ity is in­tel­li­gence when deal­ing with a bi­nary like that.) The per­sis­tence of such lousy crit­ics was in­ter­est­ing, as it meant that PoW fell into some sort of in­tel­lec­tual blind spot; as I not­ed, there was no rea­son Bit­coin could not have been in­vented decades pri­or, and un­like most suc­cess­ful tech­nolo­gies or star­tups where , Bit­coin came out of the blue—in terms of in­tel­lec­tual im­pact, “hav­ing had no pre­de­ces­sor to im­i­tate, he had no suc­ces­sor ca­pa­ble of im­i­tat­ing him”. What was the key idea of Bit­coin, its biggest ad­van­tage, and why did peo­ple find it so hard to un­der­stand? I ex­plained it to every­one in my May 2011 es­say, , which also went vi­ral; I am some­times asked, all these crazy years lat­er, if I have changed my views on it. I have not. I nailed the core of Bit­coin in 2011, and there is naught to add nor take away.
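For readers who have not seen the mechanism those critics missed, here is a minimal hashcash-style sketch of Proof of Work: finding a nonce is expensive, while checking it takes one hash. It is a simplification under stated assumptions, not Bitcoin's actual code (Bitcoin hashes a block header with double SHA-256 and compares it against a numeric target); the function names and the 8-byte nonce encoding are hypothetical choices for illustration.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def pow_hash(data: bytes, nonce: int) -> bytes:
    # Assumption: the nonce is appended as 8 big-endian bytes.
    return hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()


def mine(data: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce; expected work doubles with each extra difficulty bit."""
    nonce = 0
    while leading_zero_bits(pow_hash(data, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    return leading_zero_bits(pow_hash(data, nonce)) >= difficulty_bits
```

Calling `mine(b"example block", 12)` takes a few thousand hashes on average, and anyone can confirm the result with one call to `verify`; that asymmetry is what lets strangers agree on which history had the most work behind it.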

Meanwhile, Bitcoin kept growing from a tiny seed to, years later, something even my grandmother had heard of. (She heard on the news that Bitcoin was bankrupt and asked if I was OK; I assured her I had had nothing on MtGox and would be fine.) I did not benefit as much financially as I should have, because of, shall we say, ‘liquidity constraints’, but I can hardly complain—involvement would have been worthwhile even without any remuneration, just to watch it develop and see how wrong everyone can be. One regret was the case of Craig Wright, the investigation of which turned out to be a large waste of my time and to backfire; what I regret is not working with Andy Greenberg to write an article, nor any of the possible findings we missed, but that when we learned Gawker was working on a Wright piece, we assumed the worst (that it was a trivial blog post which would scoop our months of work) and so jumped the gun by publishing an old draft article instead of contacting them to find out what they had. We had not remotely finished looking into Craig Wright, and it turned out that Gawker had gotten even further than we had, if anything, and if both groups had pooled their research, we would’ve had a lot more time before publishing, and more time to look into the many red flags and unchecked details. As it was, all we got for our trouble was to be dragged through the mud by Wright believers, who were furious at us monsters trying to unmask a great man, and by Wright critics, who were furious we were morons assisting a conman making millions and too dumb to see all the red flags in our own reporting. More positively, the revival of cypherpunk ideas in Bitcoin and Ethereum was really something to see, and gave rise to all sorts of crazy new ideas, as well as creating a boom in researching exotic new cryptography, particularly zero-knowledge proofs.
There are two im­por­tant lessons there. The first is the ex­tent to which progress can be held back by a sin­gle miss­ing piece: the ba­sic prob­lem with al­most every­thing in Cypher­nomi­con is that they re­quire a vi­able e-cash to work, and with the fail­ure of Chau­mian e-cash schemes, there was none; the ideas for things like black mar­kets for drugs were cor­rect, but use­less with­out that one miss­ing piece. The sec­ond is the power of in­cen­tives: the amount of money in cryp­tocur­rency is, all things con­sid­ered, re­ally not that large com­pared to the rest of the econ­omy or things like pet food sales, and yet, it is ap­prox­i­mately ∞% more money than pre­vi­ously went into writ­ing cypher­punk soft­ware or into cryp­tog­ra­phy, and while a shock­ing amount went into bla­tant scams, Ponzis, hacks, bez­zles, and every other kind of fraud, an aw­ful lot of soft­ware got writ­ten and an aw­ful lot of new cryp­tog­ra­phy sud­denly showed up or was ma­tured into prod­ucts pos­si­bly decades faster than one would’ve ex­pect­ed.

Media coverage of SR1 did not improve, honorable exceptions like Andy Greenberg or Eileen Ormsby aside, and I kept tracking the topic, particularly recording arrests because I was intensely curious how safe using DNMs was. SR1 worked so well that we became complacent, and the fall of SR1 was traumatic—all that information and forum posts lost. Almost all DNM researchers selfishly guarded their crawls (sometimes to cover up bullshit), and datasets were clearly a bottleneck in DNM research & history, particularly given the extraordinary turnover in DNMs afterwards (not helped by the FBI announcing just how much Ross Ulbricht had made). This led me to my most extensive archiving efforts ever, crawling every extant English DNM & forum on a weekly or more frequent basis. This was technically tricky and exhausting, especially as I moderated /r/DarkNetMarkets, and tried to track the fallout from SR1 & all DNM-related arrests (due not just to the SR1 raid or lax security at successors, but simply growing volume and time lags in learning of cases, as many had been kept secret or were tied up in legal proceedings).

As amus­ing as it was to be in­ter­viewed reg­u­larly by jour­nal­ists, or to see DNMs pop up & be hacked al­most im­me­di­ately (I did my part when I pointed out to Black Gob­lin Mar­ket that they had sent my user reg­is­tra­tion email over the clear­net, deanonymiz­ing their server in a Ger­man dat­a­cen­ter, and should prob­a­bly cut their loss­es), DNM dy­nam­ics dis­ap­pointed me. A few DNMs like The Mar­ket­place tried to im­prove, but buy­ers proved ac­tively hos­tile to gen­uine mul­ti­sig, and de­spite all the doc­u­mented SR1 ar­rests, PGP use in pro­vid­ing per­sonal in­for­ma­tion seemed to, if any­thing, de­cline. In the end, SR1 proved not to be a pro­to­type, soon ob­so­leted in fa­vor of in­no­va­tions like mul­ti­sig and truly de­cen­tral­ized mar­kets, but in some re­spects the peak of the cen­tral­ized DNM, rel­a­tively effi­cient & well-run and with fea­tures rarely seen on suc­ces­sors like hedg­ing the Bit­coin ex­change rate. The card­ing fraud com­mu­nity moved in on DNMs, and the worst of the syn­thetic opi­ates like car­fen­tanil soon ap­peared. In 2020, DNMs are lit­tle differ­ent than when they sprang ful­ly-formed out of Ross Ul­bricht’s fore­head in Jan­u­ary 2011. Users, it seems, are lazy; they are re­ally lazy; even if they are do­ing crimes, they would rather risk go­ing to jail than spend 20 min­utes fig­ur­ing out how to use PGP, and they would rather ac­cept a large risk every month of los­ing hun­dreds or thou­sands of dol­lars rather than spend time fig­ur­ing out mul­ti­sig. By 2015, I had grown weary; the fi­nal straw was when ICE, ab­surd­ly, sub­poe­naed Red­dit for my ac­count in­for­ma­tion. (I could have told them that the in­for­ma­tion they were in­ter­ested in had never been sent to me, and if it had, it would al­most cer­tainly have been lies or a frame, not that they ever both­ered to ask me.) So, I shut down my spi­ders, neatly or­ga­nized and com­pressed them, and re­leased them as a sin­gle pub­lic archive. 
I had hoped that by re­leas­ing so many datasets, it would set an ex­am­ple, but while >46 pub­li­ca­tions use my data as of Jan­u­ary 2020, few have seen fit to re­lease their own. The even­tual suc­cess of my archives re­in­forced my view that pub­lic per­mis­sion-less datasets are often a bot­tle­neck to re­search: you can­not guar­an­tee that peo­ple will use your dataset, but you can guar­an­tee that they won’t use it. Hardly any of the peo­ple who used my data ever so much as con­tacted me, and the num­ber of uses stands in stark con­trast to Nico­las Christin & Kyle Soska’s DNM datasets, which were re­leased ei­ther cen­sored to the point of use­less­ness or us­ing a highly oner­ous data archive ser­vice called IMPACT. And that was that. The FBI would later pay me some vis­its, but I was long done with the DNMs and had moved on.

For ge­net­ics was ex­pe­ri­enc­ing its own re­vival. Much more dra­mat­i­cally than my DNM archives, hu­man ge­net­ics was demon­strat­ing the power of open data as an ob­scure project called the UK BioBank (UKBB) came on­line, with n = 500k SNP geno­types & rich phe­no­type data; UKBB was only a small frac­tion of global genome data (sharded across a mil­lion si­los and lit­tle em­per­ors), and much smaller than 23andMe (dis­in­clined to do any­thing con­tro­ver­sial or which might im­pede drug com­mer­cial­iza­tion), but it differed in one al­l-im­por­tant re­spect: they made it easy for re­searchers to get all the da­ta. The re­sult is that—un­like 23and­Me, All Of Us, Mil­lion Vet­eran Pro­gram, or the en­tire na­tion of Chi­na—­pa­pers us­ing UKBB show up lit­er­ally every day on BioRxiv alone, and it would be hard to find hu­man ge­net­ics re­search which has­n’t ben­e­fit­ed, one way or an­oth­er, from UKBB. (How did UKBB hap­pen? I don’t know, but there must be many un­sung he­roes be­hind it.) Em­pha­siz­ing this even more was the ex­plo­sion of ge­netic cor­re­la­tion re­sults. I read much of the hu­man ge­netic cor­re­la­tion lit­er­a­ture, and it went from a few ge­netic cor­re­la­tion pa­pers a year, to hun­dreds. Why? Sim­ple: pre­vi­ous­ly, ge­netic cor­re­la­tions re­quired in­di­vid­ual per­sonal data, as you ei­ther needed the data from twin pairs to cal­cu­late cross-twin cor­re­la­tions or run the SEM, or you needed the raw SNP geno­types to use GCTA; twin reg­istries keep their data close to their chest, and every­one with SNP data guards those even more jeal­ously (a­side from UKBB). Like a dog in the manger, if the own­ers of the nec­es­sary datasets could­n’t get around to pub­lish­ing them, no one would be al­lowed to. 
But then a method­olog­i­cal break­through hap­pened: LD Score Re­gres­sion was re­leased with a con­ve­nient soft­ware im­ple­men­ta­tion, and LDSC worked around the bro­ken sys­tem by only re­quir­ing the PGSes, not the raw da­ta. Now a ge­netic cor­re­la­tion could be com­puted by any­one, for any pair of PGSes, and many PGSes had al­ready been re­leased as a con­ces­sion to open­ness. An ex­plo­sion of re­ported ge­netic cor­re­la­tions fol­lowed, to the point where I had to stop com­pil­ing them for the Wikipedia en­try be­cause it was fu­tile when every other pa­per might run LDSC on 50 traits.
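To make concrete why no individual-level data is needed, here is a toy sketch of the regression idea: per-SNP association z-scores for two traits are regressed on each SNP's LD score, and the genetic correlation falls out of the three slopes. This is a simplification under stated assumptions (plain unweighted least squares, simulated inputs, no sample overlap); the real LDSC software uses weighted regression, reference-panel LD scores, and jackknife standard errors, and every function name below is hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """Slope of an ordinary least-squares fit of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def genetic_correlation(z1, z2, ld_scores):
    """Estimate r_g from summary statistics alone.

    E[z1^2] and E[z2^2] rise with LD score in proportion to each trait's
    heritability, and E[z1*z2] rises in proportion to the genetic covariance,
    so the sample-size and SNP-count factors cancel in the ratio of slopes.
    """
    slope_11 = ols_slope(ld_scores, [a * a for a in z1])
    slope_22 = ols_slope(ld_scores, [b * b for b in z2])
    slope_12 = ols_slope(ld_scores, [a * b for a, b in zip(z1, z2)])
    return slope_12 / math.sqrt(slope_11 * slope_22)


def simulate_sumstats(n_snps, rg, signal_1, signal_2, seed):
    """Toy generator: z = genetic signal (variance grows with LD score) + unit noise."""
    rng = random.Random(seed)
    z1, z2, ld_scores = [], [], []
    for _ in range(n_snps):
        ld = rng.uniform(1.0, 200.0)
        shared, own = rng.gauss(0, 1), rng.gauss(0, 1)
        g1 = math.sqrt(signal_1 * ld) * shared
        g2 = math.sqrt(signal_2 * ld) * (rg * shared + math.sqrt(1 - rg * rg) * own)
        z1.append(g1 + rng.gauss(0, 1))
        z2.append(g2 + rng.gauss(0, 1))
        ld_scores.append(ld)
    return z1, z2, ld_scores
```

Feeding `simulate_sumstats(40000, 0.5, 0.05, 0.03, seed=1)` into `genetic_correlation` should recover a value close to the simulated 0.5, using nothing but three per-SNP columns of the kind that published summary files already contain.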

The pre­dic­tions of some be­hav­ioral ge­neti­cists & hu­man ge­neti­cists (par­tic­u­larly those with an­i­mal ge­net­ics back­grounds) came true: in­creas­ing sam­ple sizes did de­liver suc­cess­ful GWASes, and the ‘miss­ing her­i­tabil­ity’ prob­lem was a non-prob­lem. I had re­mained ag­nos­tic on the ques­tion of IQ and ge­net­ics, be­cause while IQ is too em­pir­i­cally suc­cess­ful to be doubted (not that that stops many), the ge­net­ics have al­ways been fiercely op­posed; on look­ing into the ques­tion, I had de­cided that GWASes would be the crit­i­cal test, and in par­tic­u­lar, sib­ling com­par­isons would be the gold stan­dard­—as R.A. Fisher pointed out, sib­lings in­herit ran­dom­ized genes from their par­ents and also grow up in shared en­vi­ron­ments, ex­clud­ing all pos­si­bil­i­ties of con­found­ing or re­verse cau­sa­tion or pop­u­la­tion struc­ture: if a PGS works be­tween-si­b­lings, it must be tap­ping into causal genes.2 The crit­ics had al­ways con­cen­trated their fire on IQ, as, sup­pos­ed­ly, the most bi­ased, racist, and rot­ten key­stone in the over­all ar­chi­tec­ture of be­hav­ioral ge­net­ics, and evinced ut­ter cer­ti­tude; as went IQ, so went their ar­gu­ments. I de­cided years ago that a suc­cess­ful sib­ling test of an IQ GWAS with genome-wide sta­tis­ti­cal­ly-sig­nifi­cant hits (not can­di­date-ge­nes) is what it would take to change my mind.

The key re­sult was Ri­etveld et al 2013, the first truly suc­cess­ful IQ GWAS. Ri­etveld et al 2013 found GWAS hits; fur­ther, it found be­tween-si­b­ling differ­ences. (This sib­ling test would be repli­cated eas­ily a dozen times by 2020.) Read­ing it was a rev­e­la­tion. The de­bate was over: be­hav­ioral ge­net­ics was right, and the crit­ics were wrong. Kam­in, Gould, Lewon­tin, Shal­izi, the whole sorry pack­—an­ni­hi­lat­ed. IQ was in­deed highly her­i­ta­ble, poly­genic, and GWASes would only get bet­ter for it, and for all the eas­ier traits as well. (“To see the gods dis­pelled in mid-air and dis­solve like clouds is one of the great hu­man ex­pe­ri­ences. It is not as if they had gone over the hori­zon to dis­ap­pear for a time; nor as if they had been over­come by other gods of greater power and pro­founder knowl­edge. It is sim­ply that they came to noth­ing.”) Among other im­pli­ca­tions, em­bryo se­lec­tion was now proven fea­si­ble (em­bryos are, after all, just sib­lings), and sud­denly far-dis­tant fu­ture spec­u­la­tions like it­er­ated em­bryo se­lec­tion (IES) no longer seemed to rest on such a rick­ety tower of as­sump­tions. This was con­cern­ing. 
Also concerning was the willful blindness of many, including respectable geneticists and scientists, who happily made up arguments about polygenicity, invented genetic correlations, conflated hits with PGS with SNP heritability with heritability, claimed GWASes wouldn’t replicate, ignored all inconvenient animal examples, and simply dismissed out of hand all possibilities as minor without so much as a single number mentioned; in short, if there was any way to be confused or one could invent any possible obstacle, then that immediately became a fatal objection.3 Bostrom & Shulman 2014 finally provided an adult perspective on embryo selection possibilities and was good as far as it went in a few pages, but I felt it neglected a lot of practical questions and didn’t go beyond the simplest possible kind of embryo selection.

Since no one was willing to answer my questions, I began answering them myself. I first began by replicating Bostrom & Shulman 2014’s results with the simplest model of embryo selection, and began working in the various costs and attrition in the IVF pipeline, to create a realistic answer. I then began looking at the scaling: which is more important, PGS or number of embryos? How much can either be boosted in the foreseeable future? Surprisingly, n is more important than PGS, despite PGS being what everyone always debated, and I detoured a bit into order statistics, since the importance of ‘massive embryo selection’ was underrated. Diminishing returns do set in, but there are two major improvements: multi-stage selection, where one selects at multiple stages in the process, which turns out to be absurdly more effective, and selecting on an index of multiple traits using the countless PGSes now available, which is substantially more efficient and also addresses the bugaboo of negative genetic correlations—selecting on a trait like intelligence will not ‘backfire’ because when you look at human phenotypic and genotypic correlations as a whole, almost every good trait is correlated with (not merely independent of) other good traits, and likewise for bad traits, and this supercharges selection.
There were a number of other interesting avenues, but I largely answered my question: embryo selection is certainly possible, will soon be (and has since been) done, and is profitable already, albeit modestly so; genetic editing like CRISPR is probably drastically overrated, barring breakthroughs in doing hundreds or thousands of edits safely; but there are multiple pathways to far more effective and thus disruptive changes in the 2020s–2030s through massive embryo selection or IES or genome synthesis (with a few wild cards like gamete or chromosome selection), particularly as gains accumulate over generations. Modeling IES or genome synthesis is almost unnecessary because the potential gains are so large. (There are still some interesting questions in constrained optimization and modeling breeding programs with the existing haplotypes/LD, but I’m not sure they’re important to know at this point.)
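The order-statistics point about embryo count versus PGS can be sketched numerically. The simplest model, as assumed here, treats each embryo's polygenic score as an independent normal draw whose variance is half the population-level variance explained (siblings share half the additive variance on average), so the expected gain from picking the top-scoring of n embryos is the expected maximum of n standard normals times sqrt(r²/2). The function names are illustrative, and the sketch ignores the IVF costs and attrition discussed above.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def expected_max_normal(n: int, step: float = 0.005, limit: float = 8.0) -> float:
    """E[max of n iid N(0,1)], by trapezoidal integration of x * n * pdf(x) * cdf(x)^(n-1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    steps = int(round(2 * limit / step))
    total = 0.0
    for i in range(steps + 1):
        x = -limit + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        density_of_max = n * _STD_NORMAL.pdf(x) * _STD_NORMAL.cdf(x) ** (n - 1)
        total += weight * x * density_of_max
    return total * step


def selection_gain_sd(n_embryos: int, pgs_r2: float) -> float:
    """Expected trait gain, in population SDs, from implanting the top-scoring embryo.

    Assumption: the between-sibling PGS variance is half of pgs_r2.
    """
    return expected_max_normal(n_embryos) * math.sqrt(pgs_r2 / 2)
```

Under these assumptions, `selection_gain_sd(10, 0.1)` versus `selection_gain_sd(100, 0.1)` shows the diminishing but real returns to more embryos, and going from 10 to 100 embryos at r² = 0.1 (roughly 0.56 SD) beats doubling r² to 0.2 at 10 embryos (roughly 0.49 SD).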

I kept an eye on deep learning the entire time post-AlexNet, and was perturbed by how DL just kept on growing in capabilities and marching through fields, and in particular, how its strengths were in the areas that had always historically bedeviled AI the most and how they kept scaling as model sizes increased—impressive as models with millions of parameters were, people were already talking about training NNs with as many as a billion parameters. Crazy talk? One couldn’t write it off so easily. Back in 2009 or so, I had spent a lot of time reading about Lisp machines and AI in the 1980s, going through old journals and news articles to improve the Wikipedia article on Lisp machines, and I was amazed by the Lisp machine OSes & software, so superior to Linux et al, but I also did a lot of eye-rolling at the expert systems and robots which passed for AI back then; in following deep learning, I was struck by how it was the reverse: GPUs were a nightmare to program for and the software ecosystem was almost actively malicious in sabotaging productivity, but the resulting AIs were uncannily good and excelled at perceptual tasks. Gradually, I became convinced that DL was here to stay, and offered a potential path to AGI: not that anyone was going to throw a 2016-style char-RNN at a million GPUs and get an AGI, of course, but that there was now a nontrivial possibility that further tweaks to DL-style architectures of simple differentiable units combined with DRL would keep on scaling to human-level capabilities across the board. (Have you noticed how no one uses the word “transhumanist” anymore? Because we’re all transhumanists now.)
There was no equiv­a­lent of Ri­etveld et al 2013 for me, just years of read­ing Arxiv and fol­low­ing the trends, re­in­forced by oc­ca­sional case-s­tud­ies like Al­phaGo (let’s take a mo­ment to re­mem­ber how amaz­ing it was that be­tween Oc­to­ber and May, the state of com­puter Go went from ‘per­haps a MCTS vari­ant will de­feat a pro in a few years, and then maybe the world champ in a decade or two’ to ‘un­touch­ably su­per­hu­man’; tech­nol­ogy and com­put­ers do not fol­low hu­man time­lines or scal­ing, and 9 GPUs can train a NN in a mon­th).

Al­phaGo Ze­ro: ‘just stack moar lay­ers lol!’

The “bitter lesson” encapsulates the long-term trends observed in DL, from use in a few limited areas, often on pre/post-processed data or as part of more conventional machine learning workflows, heavy on symbolic techniques, to ever-larger (and often simpler) neural net models. MuZero is perhaps the most striking 2019 example of the bitter lesson, as AlphaZero had previously leapt past all earlier symbolic Go (and chess, and shogi) players using a hybrid of neural nets (for near-superhuman intuition) and then symbolic tree search to do planning over thousands or millions of scenarios in an exact simulation of the environment, but then MuZero threw all that away in favor of just doing a lot of training of an RNN, which does… whatever it is an RNN does internally, with no need for explicit planning or even an environment simulator—and quite aside from beating AlphaZero, also applies directly to playing regular ALE video games. (To update the quip: “every time I fire a machine learning researcher & replace him with a TPU pod, the performance of our Go-playing system goes up.”)

An open question: why were I and everyone else wrong to ignore connectionism when things have played out much as Schmidhuber and a few others predicted? Were we wrong, or just unlucky? What was, ex ante, the right way to think about this, even back in the 1990s or 1960s? I am usually pretty good at bullet-biting on graphs of trends, but I can’t remember any performance graphs for connectionism; what graph should I have believed, or if it didn’t exist, why not?

If there had al­ways been a loud noisy con­tin­gent, per­haps a mi­nor­ity like a quar­ter of ML re­searchers, who watched GPU progress with avid in­ter­est and re­peat­edly sketched out the pos­si­bil­i­ties of scal­ing in 2010–2020, and ea­gerly leapt on it as soon as re­sources per­mit­ted and ad­vo­cated for ever larger in­vest­ments, one could write this off as a nat­ural ex­am­ple of a break­through: sur­prises hap­pen, that’s why we do re­search, to find out what we don’t know. But in­stead, there were per­haps a hand­ful who truly ex­pected it, even they seemed sur­prised by how it hap­pened; and no mat­ter how much progress was made, the naysay­ers never changed their tune (only their goal­post­s). (The most strik­ing ex­am­ple was offered mid­way in 2020 with .) A sys­tem­at­ic, com­pre­hen­sive, field­-wide fail­ure of pre­dic­tion & up­dat­ing like that de­mands ex­pla­na­tion.

The best ex­pla­na­tion I’ve come up with so far is work­ing back­wards from the ex­cus­es, that this may be yet an­other man­i­fes­ta­tion of the hu­man bias against re­duc­tion­ism & pre­dic­tion: “it’s just mem­o­riza­tion”, “it’s just in­ter­po­la­tion”, “it’s just pat­tern-match­ing”, etc, per­haps ac­com­pa­nied by an ex­am­ple prob­lem that DL can’t (yet) solve which sup­pos­edly demon­strates the pro­found gulf be­tween ‘just X’ and real in­tel­li­gence. It is, in other words, fun­da­men­tally an an­ti-re­duc­tion­ist ar­gu­ment from in­creduli­ty: “I sim­ply can­not be­lieve that in­tel­li­gence like a hu­man brain has could pos­si­bly be made up of a lot of small parts”. (That the hu­man brain is also made of small parts is ir­rel­e­vant to them be­cause one can al­ways ap­peal to the mys­te­ri­ous and in­effa­ble com­plex­ity of bi­o­log­i­cal neu­rons, with all their neu­ro­trans­mit­ters and synapses and what­not, so the brain feels ad­e­quately com­plex and part-less.) If so, deep learn­ing merely joins the long pan­theon of deeply un­pop­u­lar re­duc­tion­ist the­o­ries through­out in­tel­lec­tual his­to­ry: Atom­ism, ma­te­ri­al­ism, athe­ism, grad­u­al­ism, cap­i­tal­ism, evo­lu­tion, germ the­o­ry, elan vi­tal & ‘or­ganic’ mat­ter, poly­genic­i­ty, Boolean log­ic, Monte Car­lo/sim­u­la­tion meth­ods… All of these in­spired enor­mous re­sis­tance and deep vis­ceral ha­tred, de­spite prov­ing to be true or more use­ful. 
Hu­mans seem cog­ni­tively hard­wired to be­lieve that every­thing is made up of a few fun­da­men­tal units which are on­to­log­i­cally ba­sic and not re­ducible to (ex­treme­ly) large num­bers of small uni­form dis­crete ob­jects.4 If the out­puts are beau­ti­fully com­plex and in­tri­cate, the in­puts must also be that way: things can’t be made of a few kinds of atoms, they must in­stead be made of a differ­ent type for every ob­ject like a plenum of bone-par­ti­cles for bones or wa­ter-par­ti­cles for wa­ter; you can’t be a phys­i­cal brain, your mind must in­stead be a sin­gle in­di­vis­i­ble on­to­log­i­cal­ly-ba­sic ‘soul’ ex­ist­ing on an im­ma­te­r­ial plane; economies can’t be best run as many in­de­pen­dent agents in­de­pen­dently trans­act­ing, it would be much bet­ter for the philoso­pher-k­ing & his cadres to plan out every­thing; etc. Sim­i­lar­ly, in­tel­li­gence can’t just be a rel­a­tively few ba­sic units repli­cated tril­lions of times & trained with brute force, it must be thou­sands of in­tri­cate units per­form­ing arith­metic, log­ic, mem­o­ry, pat­tern-match­ing, pro­gram syn­the­sis, analo­giz­ing, re­in­force­ment learn­ing, all in differ­ent ways and care­fully or­ga­nized with enor­mous ex­per­tise… But for all the bi­as­es, re­duc­tion­ism still ul­ti­mately wins out. So why did­n’t it win soon­er?

What went wrong? There is a Catch-22 here: with the right tech­niques, im­pres­sive proof-of-con­cepts could have been done quite a few years ago on ex­ist­ing su­per­com­put­ers and suc­cess­ful pro­to­types would have jus­ti­fied the in­vest­ment, with­out wait­ing for com­mod­ity gam­ing GPUs; but the tech­niques could not be found with­out run­ning many failed pro­to­types on those su­per­com­put­ers in the first place! Only once the pre­req­ui­sites fell to such low costs that near-zero fund­ing sufficed to go through those count­less it­er­a­tions of fail­ure, could the right tech­niques be found, and jus­tify the cre­ation of the nec­es­sary datasets, and fur­ther jus­tify scal­ing up. Hence, the sud­den deep learn­ing re­nais­sance—had we known what we were do­ing from the start, we would have sim­ply seen a grad­ual in­crease in ca­pa­bil­i­ties from the 1980s.

The flip side of the bit­ter les­son is the sweet short­cut: as long as you have weak com­pute and small data, it’s al­ways easy for the re­searcher to smug­gle in prior knowl­edge/bias to gain greater per­for­mance. That this will be dis­pro­por­tion­ately true of the ar­chi­tec­tures which scale the worst will be in­vis­i­ble, be­cause it is im­pos­si­ble to scale any su­pe­rior ap­proach at that time. Ap­peal­ing to fu­ture com­pute and spec­u­lat­ing about how “brain-e­quiv­a­lent com­put­ing power” ar­riv­ing by 2010 or 2030 will en­able AI sounds more like wish­ful think­ing than good sci­ence. A con­nec­tion­ist might scoff at this skep­ti­cism, but they have no com­pelling ar­gu­ments: the hu­man brain may be an ex­is­tence proof, but most con­nec­tion­ist work is a car­i­ca­ture of the baroque com­plex­ity of neu­ro­bi­ol­o­gy, and be­sides, planes do not flap their wings nor do sub­marines swim. How would they prove any of this? They can’t, un­til it’s too late and every­one has re­tired.

TODO: can­di­date-gene de­ba­cle

Thus, there is an epis­temic trap. The very fact that con­nec­tion­ism is so gen­eral and scales to the best pos­si­ble so­lu­tions means that it per­forms the worst early on in R&D and com­pute trends, and is out­com­peted by its smaller (but more lim­it­ed) com­peti­tors; be­cause of this com­pe­ti­tion, it is starved of re­search, fur­ther en­sur­ing that it looks use­less; with a track record of be­ing use­less, the steadily de­creas­ing re­quired in­vest­ments don’t make any differ­ence be­cause no one is tak­ing se­ri­ously any pro­jec­tions; un­til fi­nal­ly, a hard­ware over­hang ac­cu­mu­lates to the point that it is doomed to suc­cess, when 1 GPU is enough to it­er­ate and set SOTAs, break­ing the equi­lib­rium by pro­vid­ing un­de­ni­able hard re­sults.

This trap is in­trin­sic to the ap­proach. There is no al­ter­nate his­tory where con­nec­tion­ism some­how wins the day in the 1970s and all this DL progress hap­pens decades ahead of sched­ule. If Min­sky had­n’t pointed out the prob­lems with per­cep­trons, some­one else would have; if some­one had im­ported con­vo­lu­tions in the 1970s rather than Le­Cun in 1990, it would have sped things up only a lit­tle; if back­prop­a­ga­tion had been in­tro­duced decades ear­lier, as early as imag­in­able, per­haps in the 1950s with the de­vel­op­ment of dy­namic pro­gram­ming, that too would have made lit­tle differ­ence be­cause there would be lit­tle one could back­prop over (and resid­ual net­works were in­tro­duced in the 1980s decades be­fore they were rein­vented in 2015, to no effec­t); and so on. The his­tory of con­nec­tion­ism is not one of be­ing lim­ited by ideas—ev­ery­one has tons of ideas, great ideas, just ask Schmid­hu­ber for a bas­ket as a party fa­vor!—but one of re­sults; some­what like be­hav­ioral & pop­u­la­tion ge­net­ics, all of these great ideas fell through a por­tal from the fu­ture, drop­ping in on sav­ages lack­ing the pre­req­ui­sites to sort rub­bish from rev­o­lu­tion. The com­pute was not avail­able, and hu­mans just aren’t smart enough to ei­ther in­vent every­thing re­quired with­out painful tri­al-and-er­ror or prove be­yond a doubt their effi­cacy with­out need­ing to run them.

GANs in 2014 caught my attention because I knew the ultra-crude 64px grayscale faces would improve constantly, and in a few years GANs would be generating high-resolution color images of ImageNet. I wasn’t too interested in ImageNet per se, but if char-RNNs could do Shakespeare and GANs could do ImageNet, they could do other things… like anime and poetry. (Why anime and poetry? To épater la bourgeoisie, of course!) However, there was no anime dataset equivalent to ImageNet, and as I knew from my DNM archives, datasets are often a bottleneck, so after looking around for a while, I began laying the groundwork for what would become Danbooru2017. Karpathy also famously put char-RNNs on the map, and I began experimenting with poetry generation. Anime didn’t work well with any GAN I tried, and I had to put it aside. I knew a useful GAN would come along, and when it did, Danbooru2017 would be ready—the pattern with deep learning is that it doesn’t work at all, and one layers on complicated hand-engineered architectures to eke out some performance, until someone finds a relatively simple approach which scales and then one can simply throw GPUs & data at the problem.
What I’ve learned about NNs is that they scale al­most in­defi­nite­ly; we don’t have any idea how to train NNs well, and they are grossly over­pa­ra­me­ter­ized, with al­most all of a NN’s pa­ra­me­ters be­ing un­nec­es­sary; break­throughs are made by tri­al-and-er­ror to a de­gree scrubbed from re­search pa­pers, and ‘al­go­rith­mic’ progress is pri­mar­ily due to com­pute en­abling enor­mous amounts of ex­per­i­ments; the­ory is al­most use­less in guid­ing NN de­sign, with pa­pers ac­tively mis­lead­ing the reader about this (eg the ResNet pa­per); even sub­tle de­tails of ini­tial­iza­tion or train­ing can have shock­ingly large im­pli­ca­tions on per­for­mance—a NN which seems to be work­ing fine may in fact be badly bro­ken but still work OK be­cause “NNs want to work”; “NNs are lazy” and will solve any given task in the lazi­est pos­si­ble way un­less the task is hard (eg they can do dis­en­tan­gle­ment or gen­er­al­iza­tion or rea­son­ing just fine, but only if we ac­tu­ally force them to solve those tasks and not some­thing eas­ier).

Fi­nal­ly, in 2017, ProGAN showed that anime faces were al­most doable, and then with StyleGAN’s re­lease in 2019, I gave it a sec­ond shot (ex­pect­ing a mod­est im­prove­ment over ProGAN, which is what was re­ported on pho­to­graphic faces etc) and was shocked when al­most overnight StyleGAN cre­ated bet­ter anime faces than ProGAN, and soon was gen­er­at­ing shock­ing­ly-good faces. As a joke, I put up sam­ples as a stand­alone web­site TWDNE, and then a mil­lion Chi­nese de­cided to pay a vis­it. 2019 also saw GPT-2, and if the char-RNN po­etry was OK, the GPT-2 po­etry sam­ples, es­pe­cially once I col­lab­o­rated with Shawn Presser to use hun­dreds of TPUs to fine­tune GPT-2-1.5b (Google TFRC paid for that and other pro­ject­s), were fan­tas­tic. Overnight, the SOTA for both anime and po­etry gen­er­a­tion took huge leaps. I wrote ex­ten­sive tu­to­ri­als on both StyleGAN and GPT-2 to help other peo­ple use them, and I’m pleased that, like my SR1 tu­to­ri­al, a great many peo­ple found them use­ful, fill­ing the gap be­tween or­di­nary peo­ple and a repo dumped on Github. (A fol­lowup project to use an ex­per­i­men­tal DRL ap­proach to im­prove po­et­ry/­mu­sic qual­ity fur­ther did­n’t work out.)

I’d always spent time tweaking Gwern.net’s appearance and functionality over the years, but my disinclination to learn JS/CSS/HTML (a tech stack that manages to be even more miserable than the deep learning Python ecosystem) limited how far I could go. Said Achmiz volunteered, and redesigned it almost from the ground up over 2018–2020, and is responsible for everything from the drop caps to the collapsible sections to sidenotes to link annotations. (I can take credit for relatively few things, like the Haskell backends and writing annotations; I mostly just request features while trying to educate myself about typography & design.) What was a utilitarian static site is now almost a design showcase, and I can’t help but feel from watching traffic and mentions that the improvements have made Gwern.net more respectable and readers more receptive, despite most of the contents remaining the same; synchronously, I began working on my appearance by getting LASIK, picking up weightlifting as a hobby, upgrading my wardrobe etc, and there too I can’t help but feel that people were judging by my appearance to an even greater extent than I (thought I had cynically) supposed. This reminds me of how AI risk developed in the 2010s; it went from a wingnut fringe concern to broad acceptance, as far as I can tell, starting in 2014 when Nick Bostrom’s Oxford University Press-published Superintelligence (which largely repackages the thought of Eliezer Yudkowsky and other Singularitarians for a general audience) became a NYT bestseller and then was endorsed by Elon Musk. What does Elon Musk know about AI risk? What does the NYT bestseller list know about anything? And yet, I noticed at the time the sea change in attitude from such high-status endorsements.
Not, of course, that it helped Yud­kowsky’s rep­u­ta­tion, as new­com­ers to the topic promptly did their best to erase him and plant their flags by coin­ing ever new ne­ol­o­gisms to avoid us­ing terms like “Friendly AI” (I lost track of them all, like “Benefi­cent AI” or “hu­man-aligned AI” or “AI align­ment” or “Prov­ably Ben­e­fi­cial AI” or “AI For Good” or “AI safety”). A les­son there for pi­o­neers ex­pect­ing to at least get credit for be­ing right: if you suc­ceed in be­com­ing a bridge to a bet­ter fu­ture, it may be be­cause the ‘re­spectable’ ‘se­ri­ous’ peo­ple knifed you in the back and walked across. (AI risk was also promptly hi­jacked by “what if the real AI risk is cap­i­tal­ism & not enough iden­ti­ty-pol­i­tics” types, but it has­n’t been a to­tal loss.)

I think it would have been a mistake to focus too much on design and appearance early on, however. There is no point in investing all that effort in tarting up a website which has nothing on it, and it makes sense only when you have a substantial corpus to upgrade. One early website design flaw I do regret is not putting an even greater emphasis on detailed citation and link archiving; in retrospect, I would have saved myself a lot of grief by mirroring all PDFs and external links as soon as I linked them. I thought my archiver daemon would be enough, but the constant cascade of linkrot, combined with the constant expansion of Gwern.net, made it impossible to keep up with linkrot manually, and by omitting basic citation metadata like titles & authors, I had a hard time dealing with links that escaped archiving before dying. Likewise, treating social media as merely a platform for drafting and publicizing, not a permanent home, is important for any writer to remember. Platforms are not your friend: a platform like Google+ or Twitter or Reddit can and will demote or delete you and your content, or disappear entirely, for trivial reasons. (Who can forget watching young college-grad Reddit employees flippantly decide which subreddits to erase, crowing as they settled scores? Or how many subreddits were deleted in their entirety in various purges because of a coincidence in names, showing no employees had so much as looked at their front page? Google+, of course, was erased in its entirety; I had expected a substantial chance of its death, but was still dismayed.) The probability may be small each year, but it adds up. In the next decade, I don’t know which website I use will go under or crazy: HN, Twitter, Reddit, WP, LW? But I will try to be ready and ensure that anything of value is archived, exported, or moved to Gwern.net.
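
To make the "small each year, but it adds up" arithmetic concrete, here is a minimal sketch. It assumes, purely for illustration, a constant and independent yearly chance that a given platform deletes your content or shuts down; the 5% figure and the function name are hypothetical, not estimates of any real platform's risk.

```python
def cumulative_loss_probability(annual_p: float, years: int) -> float:
    """Chance of at least one loss event within `years`.

    Assumes a constant, independent per-year probability `annual_p`
    (an illustrative simplification, not a measured rate).
    """
    if not 0.0 <= annual_p <= 1.0:
        raise ValueError("annual_p must be a probability in [0, 1]")
    if years < 0:
        raise ValueError("years must be non-negative")
    # P(at least one loss) = 1 - P(no loss in any single year)
    return 1.0 - (1.0 - annual_p) ** years


if __name__ == "__main__":
    # Hypothetical 5% yearly risk, viewed over several horizons.
    for horizon in (1, 5, 10, 20):
        risk = cumulative_loss_probability(0.05, horizon)
        print(f"{horizon:>2} years: {risk:.1%}")
```

Under those assumptions, a 5% yearly risk compounds to roughly a 40% chance of loss over a decade, which is the sense in which a small annual probability justifies archiving everything of value.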

Another underrated activity, along with creating datasets and methods, is providing fulltexts. “Users are lazy”, and if it isn’t in Google/Google Scholar, then it doesn’t exist for researchers either. I noticed on Wikipedia that while there were few overt signs that my articles were having any influence (who cites a Wikipedia article?), there were continuous suspicious coincidences. I would dig up a particularly recherché article or interview, scan or host it, quote it on WP, and years later, notice that some article would cite it, without any reference to me or WP. (“Oh ho, you just happened to dig it up independently in your doubtless thorough literature review, did you? A likely story!”) This is equally true for other topics; how many times have I dug up some paper or thesis from the 1950s and (mirabile dictu!) other researchers suddenly began to cite it in papers? It would be interesting to pull citation metadata from Semantic Scholar and cross-reference it with my revision history to look at the time-series and see how much my hosting increases total citation count once Google Scholar indexes it.
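
The before/after comparison proposed above could start from something like the following sketch. It does not call Semantic Scholar or read any revision history: the yearly citation counts and the hosting year are made-up inputs, and `citation_rate_change` is a hypothetical helper name, so this only illustrates the shape of the analysis.

```python
from statistics import mean


def citation_rate_change(citations_by_year: dict[int, int],
                         hosted_year: int) -> tuple[float, float]:
    """Return (mean citations/year before hosting, mean from hosting onward).

    `citations_by_year` maps a calendar year to the number of new citations
    a paper received that year; `hosted_year` is when the fulltext went up.
    """
    before = [n for year, n in citations_by_year.items() if year < hosted_year]
    after = [n for year, n in citations_by_year.items() if year >= hosted_year]
    if not before or not after:
        raise ValueError("need at least one year on each side of hosted_year")
    return mean(before), mean(after)


# Made-up counts for a hypothetical old paper whose fulltext was hosted in 2012.
example_counts = {2008: 0, 2009: 1, 2010: 0, 2011: 1, 2012: 2, 2013: 3, 2014: 4}
rate_before, rate_after = citation_rate_change(example_counts, hosted_year=2012)
print(f"before: {rate_before:.2f}/year, after: {rate_after:.2f}/year")
```

A naive comparison like this would overstate any hosting effect if citations are rising in general, so a real analysis would presumably need a comparison set of similar papers that were not hosted, and would date the event by when the fulltext was indexed, not when it was uploaded.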

Over­all, I think I have a good track record. My pre­dic­tions (whether on IEM, In­trade, Pre­dic­tion­Book, or Good Judg­ment Pro­ject) are far above av­er­age, and I have taken many then-con­tro­ver­sial po­si­tions where con­tem­po­rary re­search or opin­ion has moved far closer to me than the crit­ics, often when I was in a minute mi­nor­i­ty. Ex­am­ples in­clude Wikipedi­a’s edit­ing cri­sis, anti-DNB, the Repli­ca­tion Cri­sis, Bit­coin, dark­net mar­kets, modafinil & nicotine, LSD mi­cro­dos­ing, AI risk, be­hav­ioral ge­net­ics, em­bryo se­lec­tion, and ad­ver­tis­ing harms. If I have not al­ways been right from the start, I have at least been less wrong than most in up­dat­ing faster than most (DNB, be­hav­ioral ge­net­ics, DL/DRL).




  1. The Cultural Revolution: A People’s History, 1962–1976, 2016
  2. , 1997 (re­view)


Non­fic­tion movies:

  1. (re­view)
  2. (re­view)


  1. ()
  2. (re­view)
  3. (re­view)
  4. , 1932 (re­view)
  5. ()
  6. ()
  7. ()
  8. , 1978 (re­view)
  9. 2012/ 2014/ 2014 (re­view)


  1. (re­view)
  2. (re­view)
  3. (re­view)
  4. Neon Gen­e­sis Evan­ge­lion Con­cur­rency Project, 2013 (re­view)
  5. (re­view)
  6. : (re­view: , s9)

  1. Psy­che­delics en­thu­si­asts have never for­given me for this, no mat­ter how much I have been vin­di­cated by sub­se­quent stud­ies.↩︎

  2. Bar­ring, of course, ad­di­tional fac­tors like pub­li­ca­tion bias or fraud or soft­ware er­rors, the lat­ter of which have hap­pened.↩︎

  3. This should not have been a sur­prise to me, after see­ing how, after the dis­as­trous im­pe­r­ial pres­i­dency of George W. Bush, lib­er­als fell over them­selves to de­fend all ad­min­is­tra­tion poli­cies and abuses once a De­mo­c­rat was in office, and so on with Trump. Pol­i­tics is the mind­killer.↩︎

  4. The psy­chol­ogy of re­li­gion in small chil­dren is re­veal­ing in this way: small chil­dren do not be­lieve peo­ple are made of care­ful­ly-arranged parts. They just are. So if some­one dies, they must go some­where else—that’s just ba­sic ob­jec­t-per­sis­tence and con­ser­va­tion!↩︎