Timing Technology: Lessons From The Media Lab

Technological developments can be foreseen but the knowledge is largely useless because startups are inherently risky and require optimal timing. A more practical approach is to embrace uncertainty, taking a reinforcement learning perspective.
history, technology, reviews, sociology, decision-theory, insight-porn
2012-07-12–2019-06-20 · finished · certainty: likely · importance: 6


How do you time your startup? Technological forecasts are often surprisingly prescient in terms of predicting that something was possible & desirable and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.

Why is their knowledge so useless? Why are success and failure so intertwined in the tech industry? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts correct about the (eventual) outcome.

Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad, because regardless of forecasting, a good idea will draw overly-optimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. All major success stories overshadow their long list of predecessors who did the same thing, but got unlucky. The lesson of history is that for every lesson, there is an equal and opposite lesson. So, ideas can be divided into the overly-optimistic & likely doomed, or the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception. Progress, then, depends on the ‘unreasonable man’.

This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically over-exploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on the societal level, by serving as random exploration.

A major benefit of R&D, then, is in its results lying fallow until the ‘ripe time’ when they can be immediately exploited in previously-unpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals.

In the 1980s, famed technologist Stewart Brand visited the equally-famed MIT Media Lab & its founder Nicholas Negroponte, publishing a 1988 book, The Media Lab: Inventing the Future at M.I.T. (TML). Brand summarized the projects he saw there and Lab members’ extrapolations into the future which guided their projects, and added his own forecasting thoughts.

Visiting the Media Lab

Three decades later, the book is highly dated, and the descriptions are of mostly historical interest for the development of various technologies (particularly in the 1990s). But enough time has passed since 1988 to enable us to judge the basic truthfulness of the predictions and expectations held by dreamers such as Nicholas Negroponte: they were remarkably accurate! And the Media Lab wasn’t the only one; others in 1989 had an almost identical vision of a networked future powered by small touchscreen devices. (And what about Douglas Engelbart, or Alan Kay/Xerox PARC, who explicitly aimed to ‘skate towards where the puck would be’?) If you aren’t struck by a sense of déjà vu or pity when you read this book, compare the claims by people at the Media Lab with contemporary—or later—works, and you’ll see how right they were.

Déjà vu, because what was described in TML on every other page is recognizably ordinary life in the 1990s and 2000s, never mind the 2010s, from the spread of broadband to the eventual impact of smartphones.

And pity, because the sad thing is noting how few future millionaires or billionaires grace the pages of TML—one quickly realizes that yes, person X was 100% right about Y happening even when everyone thought it insane, except that X was just a bit off, by a few years, and either jumped the gun or was too late, and so some other Z who doesn’t even appear in TML was the person who wound up taking all the spoils. I read it constantly thinking ‘yes, yes, you were right—for all the good it did you!’, or ‘not quite, it’d actually take another decade for that to really work out’.

To Everything A Season

“I basically think all the ideas of the ’90s that everybody had about how this stuff was going to work, I think they were all right, they were all correct. I think they were just early.”

Marc Andreessen, 2014

The question constantly asked of anyone who claims to know a better way (as futurologists implicitly do): “If you’re so smart, why aren’t you rich?” The lesson I draw is: it is not enough to predict the future; one has to get the timing right so as not to be ruined, and then execute, and then get lucky in a myriad of ways.

Many ‘bubbles’ can be interpreted as people being 100% correct about the future—but missing the timing (The Economist on obscure property booms, Garber’s Famous First Bubbles). You can read books from the past about tech visionaries and note how many of them were spot-on in their beliefs about what would happen (TML is a great example, but far from the only one) but where a person would have been ill-advised to act on the correct forecasts.

Not To the Swift

“Whoever does not know how to hit the nail on the head should be entreated not to hit the nail at all.”

Friedrich Nietzsche

Many startups have a long list of failed predecessors who tried to do much the same thing, often simultaneously with several other competitors (startups are just as susceptible to ‘multiple invention’ as science/technology in general). What made them a success was that they happened to give the piñata a whack at the exact moment when some S-curves or events hit the right point. Consider the ill-fated Pets.com: was the investor right to believe that Americans would spend a ton of money online, such as for buying dogfood? Absolutely: Amazon (which has rarely turned a profit and has sucked up far more investment than Pets.com ever did, a mere ~$300m in 2002 dollars) is a successful online retail business that stocks thousands of dog food varieties, to say nothing of all the other pet-related goods it sells, and Chewy, which primarily sells pet food, filed for a multi-billion-dollar IPO in 2019 on the strength of its billions in revenue. But the value of Pets.com stock still went to ~$0. Facebook is the biggest archive of photographs there has ever been, with truly colossal storage requirements; could it have succeeded in the 1990s? Clearly not, and not even much later, as demonstrated by the failures of its predecessors and the lingering death of MySpace. One of the most notorious tech business failures of the 1990s was the Iridium satellite constellation, but that was brought down by bizarrely self-sabotaging decisions on the part of Motorola, and when Motorola was finally removed from the equation, Iridium found its market, and 2017 saw the launch of the second Iridium satellite constellation, Iridium NEXT, with competition from other since-launched satellite constellations, including SpaceX’s own nascent Starlink (aiming at global broadband Internet), which launched no fewer than 60 satellites in May 2019. Or look at computers: imagine an early adopter of an Apple computer saying ‘everyone will use computers eventually!’ Yes, but not for another few decades, and ‘in the long run, we are all dead’. Early PC history is rife with examples of the prescient failing.

Smartphones are an even bigger example of this. How often did I read in the ’90s and early ’00s about how amazing Japanese cellphones were and how amazing a good smartphone would be, even though year after year the phones were jokes and used pretty much solely for voice? You can see smartphones come up again and again in TML, as the visionaries realize how transformative a mobile pocket-sized computer would be. Yet, it took until the mid-’00s for the promise of smartphones to materialize overnight, as it were, a success which went primarily to latecomers Apple and Google, cutting out the previously highly-successful Nokia, never mind visionaries like General Magic. (You too can achieve overnight success in just a few decades of hard work…) A 2013 interview with Eric Jackson looks back on smartphone adoption rates:

Q: “What’s your take on how they’re [Apple] handling their expansion into China, India, and other emerging markets?”

A: “It’s depressing how slow things are moving on that front. We can draw lines on a graph but we don’t know the constraints. Again, the issue with adoption is that the timing is so damn hard. I was expecting smartphones to take off in mid-2004 and was disappointed over and over again. And then suddenly a catalyst took hold and the adoption skyrocketed. Cook calls this ‘cracking the nut’. I don’t know what they can do to move faster but I suspect it has to do with placement (distribution) and with networks which both depend on (corrupt) entities.”

In 2012, I watched impressed as my aunt used the iPhone application FaceTime to video chat with her daughter half a continent away. In other words, her smartphone is a videophone; the videophone used to be one of the canonical examples of how technology failed, stemming from its famous demos & science-fiction appearances but subsequent failure to usurp telephones. This was oft-cited as an example of how technoweenies failed to understand that people didn’t really want videophones at all—‘who wants to put on makeup before making a call?’, people offered as an explanation, in all seriousness—but really, it looks like the videophones back then simply weren’t good enough.

Or look at VR: I’ve noticed geeks express wonderment at the Oculus Rift (and its many competitors…) bringing Virtual Reality to the masses, and won’t that be a kick in the teeth for the Cliff Stolls & Jaron Laniers (who gave up VR for dead decades ago)? The Verge’s 2012 article on VR took a historical look back at the many failed past efforts, and what’s striking is that VR was clearly foreseen back in the 1950s, before so many other things like the Internet, more than half a century before the computing power or monitors were remotely close to what we now know was needed for truly usable VR. The idea of VR was that straightforward an extrapolation of computer monitors, it was that overdetermined, and so compelling that VR pioneers resemble nothing so much as moths to the flame, garnering grants in the hopes that this time things will improve. And at some point, it does improve, and the first person to try at the right time may win the lottery; Palmer Luckey (founder of Oculus, sold to Facebook for $2.3 billion in March 2014):

Here’s a secret: the thing stopping people from making good VR and solving these problems was not technical. Someone could have built the Rift in mid-to-late 2007 for a few thousand dollars, and they could have built it in mid-2008 for about $500. It’s just nobody was paying attention to that.

Go To The Ant, Thou Sluggard

“Yes, but when I discovered it, it stayed discovered.”

Lawrence Shepp (attributed; “Pity the Scientist Who Discovers the Discovered”)

“It’s important to be the last person to discover something.” (attributed)

Any good idea can be made to sound like a bad idea & probably did sound like a bad idea then, and Bessemer VC’s anti-portfolio is a list of good ideas which Bessemer declined to invest in. Michael Wolfe offers some examples of this:

  • Facebook: the world needs yet another MySpace or Friendster [or …] except several years late. We’ll only open it up to a few thousand overworked, anti-social Ivy Leaguers. Everyone else will then join since Harvard students are so cool.
  • Dropbox: we are going to build a file sharing and syncing solution when the market has a dozen of them that no one uses, supported by big companies like Microsoft. It will only do one thing well, and you’ll have to move all of your content to use it.
  • Virgin Atlantic: airlines are cool. Let’s start one. How hard could it be? We’ll differentiate with a funny safety video and by not being a—holes.
  • …iOS: a brand new operating system that doesn’t run a single one of the millions of applications that have been developed for Mac OS, Windows, or Linux. Only Apple can build apps for it. It won’t have cut and paste.
  • Google: we are building the world’s 20th search engine at a time when most of the others have been abandoned as being commoditized money losers. We’ll strip out all of the ad-supported news and portal features so you won’t be distracted from using the free search stuff.
  • Tesla: instead of just building batteries and selling them to Detroit, we are going to build our own cars from scratch plus own the distribution network. During a recession and a cleantech backlash.
  • …Firefox: we are going to build a better web browser, even though 90% of the world’s computers already have a free one built in. One guy will do most of the work.

We can play this game all day:

  • How about Netflix? “We’ll start off renting people a doomed format in a way inferior to our established competitor (which will choose to commit suicide by ignoring both mail order & Internet all the way until bankruptcy in 2010); this will (somehow) let us pivot to streaming, where we will license all our content from our worst enemies, who will destroy us the instant we are too successful & already intend to run streaming services of their own—but that’s OK because we’ll just convince Wall Street to wait decades while giving us hundreds of billions of dollars to replace Hollywood by making thousands of film & TV series ourselves (despite the fact that we’ve never done anything like that before and there is no reason to think we would be any better at it than they are).”

  • Or GitHub: “We’ll offer code hosting services like those of SourceForge or Google Code, which require developers to use one of the most user-hostile version-control systems, only to FLOSS developers who are notorious cheapskates, and charge them a few bucks for a private version.”

  • SpaceX: “The incumbent aerospace giants have a multi-decade headstart but are fat and lazy; we’ll catch up by buying some spare Russian rockets while we invent our own futuristic reusable ones. It’s only rocket science.”

  • Uber/Lyft: “Taxis & buses. You’ve invented taxis & buses. And rental bikes.”

  • Or the delivery-startup revivals: “We’ll do the dot-com flameouts again, minus the bankruptcy.”

  • PayPal: “Everyone else’s online payments have failed, so we’ll do it again, with anonymous cryptography! On phones! In 1998! End-users love cryptography, right? If the software doesn’t work out, I guess we’ll… do something else. We’re not sure what.” Later: “oh, apparently eBay sellers like us so much they’re making their own promotional materials? What if instead of threatening to sue them, we tried working with them?”

  • Or mobile payments, again: “TextPayMe worked out well, right?”

  • Patreon: “Online micropayments & patronage schemes have failed hundreds of times and became a ’90s punchline; might as well try again.”

  • Bitcoin: “Every online-only currency [too many to list] has either failed or been shut down by governments; so, we’ll use ‘proof of work’—it’s a hilariously expensive cryptographic thing we just made up which has zero theoretical support for actually ensuring decentralization & censorproofing, and was roundly mocked by almost every e-currency enthusiast who bothered to read the whitepaper.”

  • Uber Eats & its many rivals (!): “CyberSlice blew through $100m+ (in 2000 dollars) trying to sell pizza online, but this time will be different.”

  • FedEx: “The experienced & well-capitalized incumbents are already trying and failing to make the hub-and-spoke air delivery method work; I’ll blow my inheritance on trying to compete with them while being so undercapitalized I’ll have to commit multiple crimes to keep FedEx afloat, like literally gambling the company’s money at Las Vegas.”

  • Lotus: “VisiCalc literally invented the spreadsheet, has owned the market for 4 years despite clones like Microsoft’s, and singlehandedly made the Apple II PC a mega-success; we’ll write our own spreadsheet from scratch, fixing some of VisiCalc’s problems, and beat them to the IBM PC. Everyone will buy it simply because it’ll be slightly better.”

  • Airbnb: “We’ll max out our credit cards to let people illegally rent out their air mattresses en route to eating the hotel industry.”

  • Stripe: “Banks & online payment processors like PayPal are heavily-regulated inefficient monopolies which really suck; we’ll make friends with some banks and run a payment processor which doesn’t suck. Our signature selling point will be that it takes fewer lines of code to set up, so programmers will like us.”

  • Or social networking, yet again: “We’ll do social networking as has been done—and patented—before us, forcing us to buy the patent.”

  • Slack: “IRC+email but infinitely slower & more locked in. Businesses won’t be able to get enough of it; employees will love to hate it.”

  • Wikipedia: “We’ll compete with Britannica, Encarta, Nupedia, The Distributed Encyclopedia Project (TDEP), TheInfo, & GNU’s GNUpedia by letting literally anyone edit some draft articles for free.”

You don’t have to be bipolar to be an entrepreneur, but it might help. (“The most successful people I know believe in themselves almost to the point of delusion…”)

But Time And Chance

“After solving a problem, humanity imagines that it finds in analogous solutions the key to all problems. Every authentic solution brings in its wake a train of grotesque solutions.”

Nicolás Gómez Dávila, Nicolás Gómez Davila: An Anthology (original: Escolios a un Texto Implícito: Selección, p. 430)

“You can’t possibly get a good technology going without an enormous number of failures. It’s a universal rule. If you look at bicycles, there were thousands of weird models built and tried before they found the one that really worked. You could never design a bicycle theoretically. Even now, after we’ve been building them for 100 years, it’s very difficult to understand just why a bicycle works—it’s even difficult to formulate it as a mathematical problem. But just by trial and error, we found out how to do it, and the error was essential.”

Freeman Dyson, “Freeman Dyson’s Brain” 1998

Why so many failed predecessors?

Part of the explanation is survivorship bias causing hindsight bias. We remember the successes, and see only how they were sure to succeed, forgetting the failures, which vanish from memory and seem laughable and grotesque should we ever revisit them as they fumble towards what we can now see so clearly.

The origins of many startups are highly idiosyncratic & chancy; eg. why should a podcasting company, Odeo, have led to Twitter? Survival alone is highly chancy, and founders can often see times where it came down to a dice roll. Like historical events in general, the importance of an event or change is often known only in retrospect. Overall, the odds of success are low, and the rewards are not great for most—despite the skewed distribution producing occasional eye-popping returns in a few cases, the risk-adjusted return of the technology sector or VC funds is not that much greater than the broader economy.

“Of course Google was always going to be a huge success because of PageRank and also (post hoc theorizing) X, Y, & Z”, except for the minor problem that Google was merely one of many search engines, great perhaps but not profitable, and didn’t hit upon a profitable business model—much less a unicorn-worthy model—until 4 years later when it copied Overture’s advertising auction, which was its salvation (In The Plex); in the meantime, Google had to sign potentially fatal deals or risk burning through the last of its capital when minor technical glitches derailed vital deals. (All of which was doubtless why Page & Brin tried & failed to sell Google to AltaVista & Excite & Yahoo early on, and negotiated a possible sale with Yahoo as late as 2002, which they ultimately rejected.) In a counterfactual world, Google went down in flames quite easily because it never hit upon the advertising innovations that saved it, no matter how much you liked PageRank, and anything else is hindsight bias. FedEx, early on, couldn’t make payroll, and the founder famously kept the planes flying only by gambling the last of their money in Las Vegas, among other near-death experiences & crimes—just one of many startups doing highly questionable things. Both SpaceX & Tesla have come within days (or hours) of bankruptcy, in 2008 and 2013; in the former case, Musk borrowed money from friends to pay his rent after 3 rocket failures in a row, and in the latter, Musk reportedly went as far as securing a pledge from Google to buy Tesla outright rather than let it go bankrupt (Vance 2015). Tesla’s struggles in general are too well known to mention (such as Musk asking Apple to acquire them in 2017 in the depths of the manufacturing crisis when weeks from collapse). Mark Zuckerberg, in 2004, wanted nothing more than to sell Facebook for a few million dollars so he could work on his P2P filesharing program, Wirehog, commenting that the sale price just needed to be large enough “to propel Wirehog.” YouTube began as a dating site. Stewart Butterfield wanted to make an MMORPG game, which failed, and all he could salvage out of it was the photo-sharing part, which became Flickr; he still really wanted to make an MMORPG, so after Flickr, he founded a company to make the MMORPG, which… also failed, so after trying to shut down his company and being told not to by his investors, he salvaged the chat part from it, which became Slack. And, consistent with the idea that there is a large ineradicable element of chance to it, surveys of startups suggest that while there are individual differences in odds of success (‘skill’), any founder learning curve (‘learning-by-doing’) is small & success probability remains low regardless of experience (Gompers et al 2006/Gompers 2010, Parker 2011, Gottschalk 2014), and experienced entrepreneurs still have low odds of forecasting which startups will achieve commercialization at all, approaching random predictions in “non-R&D-intensive sectors” (eg Scott et al 2019).

Thiel (Zero to One; original): “Every moment in business happens only once. The next Bill Gates will not build an operating system. The next Larry Page or Sergey Brin won’t make a search engine. And the next Mark Zuckerberg won’t create a social network. If you are copying these guys, you aren’t learning from them.” This is true, but I would say it reverses the order (‘N to N+1’?): you will not be the next Bill Gates, because Bill Gates was not the first and only Bill Gates, he was, pace Thiel, the last Bill Gates; many people made huge fortunes off OSes, both before and after Gates—you may have forgotten Gary Kildall, but hopefully you remember Steve Jobs (before, Mac) and Steve Jobs (after, NeXT). Similarly, Mark Zuckerberg was not the first and only Zuckerberg, he was the last Zuckerberg; many people made social networking fortunes before him—maybe Orkut didn’t make its Google inventor a fortune, but you can bet that MySpace’s DeWolfe and Anderson did well. And there were plenty of lucrative search engine founders (is Yahoo’s David Filo still a billionaire? Yes).

Gates, however, proved the market, and refined the Gates strategy to perfection, using up the trick; no one can get historically rich off shipping an OS plus some business productivity software, because there are too many competitors and too many players interested in commoditizing their complements, and so opportunity has moved on to the next area.

A successful company rewrites history and its precursors; history must be lived forward, progressing to an obscure destination, but we always recall it backwards as progressing towards the clarity of the present.

The Wise in their Craftiness

“It is universally admitted that the unicorn is a supernatural being and one of good omen; thus it is declared in the Odes, in the Annals, in the biographies of illustrious men, and in other texts of unquestioned authority. Even the women and children of the common people know that the unicorn is a favorable portent. But this animal does not figure among the domestic animals, it is not easy to find, it does not lend itself to any classification. It is not like the horse or the bull, the wolf or the deer. Under such conditions, we could be in the presence of a unicorn and not know with certainty that it is one. We know that a given animal with a mane is a horse, and that one with horns is a bull. We do not know what a unicorn is like.”

Jorge Luis Borges, “Kafka and His Precursors” (1951)

Can you ask researchers if the time is ripe? Well: researchers have a slight conflict of interest in the matter, and are happy to spend arbitrary amounts of money on topics without anything to show for it. After all, why would they say no?

Scott Fisher:

I ended up doing more work in Japan than anything else because Japan in general is so tech-smitten and obsessed that they just love it [VR]. The Japanese government in general was funding research, building huge research complexes just to focus on this. There were huge initiatives while there was nothing happening in the US. I ended up moving to Japan and working there for many years.

Indeed, this would have been around the time of the great Japanese boondoggle, the Fifth Generation Computer Systems project (note that despite Japan’s reputed prowess at robotics, it is not Japan’s robots who went into Fukushima, fly around the Middle East, or are revolutionizing agriculture and construction). All those ‘huge initiatives’ and…? Don’t ask Fisher; he’s hardly going to say, “oh yeah, all the money was completely wasted, we were trying to do it too soon; our bad”. And Lanier implies that Japan alone spent a lot of money:

Jaron Lanier: “The components have finally gotten cheap enough that we can start to talk about them as being accessible in the way that everybody’s always wanted…Moore’s law is so interesting because it’s not just the same components getting cheaper, but it really changes the way you do things. For instance, in the old days, in order to tell where your head was so that you could position virtual content to be standing still relative to you, we used to have to use some kind of external reference point, which might be magnetic, ultrasonic, or optical. These days you put some kind of camera on the head and look around in the room and it just calculates where you are—the headsets are self-sufficient instead of relying on an external reference infrastructure. That was inconceivable before because it would have been just so expensive to do that calculation. Moore’s law really just changes again and again, it re-factors your options in really subtle and interesting ways.”

Kevin Kelly: “Our sense of history in this world is very dim and very short. We were talking about the past: VR wasn’t talked about for a long time, right? 35 years. Most people have no idea that this is 35 years old. 30 years later, it’s the same headlines. Was the technological power just not sufficient 30 years ago?”

…[On the Power Glove, based on a VPL dataglove design:]

JL: “Both I and a lot of other people really, really wanted to get a consumerable version of this stuff out. We managed to get a taste of the experience with something called the Power Glove…Sony actually brought out a little near-eye display called Virtual Boy; not very good, but they gave it their best shot. And there were huge projects that have never been shown to the public to try to make a consumable [VR product], very expensive ones. Counting for inflation, probably more money was spent [then] than Facebook just spent on Oculus. We just could never, never, never get it quite there.”

KK: “Because?”

JL: “The component cost. It’s Moore’s law. Sensors, displays… batteries! Batteries is a big one.”

Issues like component cost were not something that could be solved by a VR research project, no matter how ambitious. Those were hard binding limits, and solving them by creating tiny high-resolution LED/LCD screens for smartphones required the benefit of decades of Moore’s law and the effects of manufacturing billions of smartphones.

Researchers in general have no incentive to say, “this is not the right time, wait another 20 years for Moore’s law to make it doable”, even if everyone in the field is perfectly aware of this—Palmer Luckey:

I spent a huge amount of time reading…I think that there were a lot of people that were giving VR too much credit, because they were working as VR researchers. You don’t want to publish a paper that says, ‘After the study, we came to the conclusion that VR is useless right now and that we should just not have a job for 20 years.’ There were a few people that basically came to that conclusion. They said, ‘Current VR gear is low field of view, high lag, too expensive, too heavy, can’t be driven properly from consumer-grade computers, or even professional-grade computers.’ It turned out that I wasn’t the first person to realize these problems. They’d been known for decades.

AI researcher Donald Michie claimed in 1970, based on a 1969 poll, that a majority of AI researchers estimated 10–100 years for AGI (or 1979–2069) and that “There is also fair agreement that the chief obstacles are not hardware limitations.” While AI researcher surveys still suggest that wasn’t a bad range, the success of deep learning makes clear that hardware was a huge limitation, and resources 50 years ago fell short by at least 6 orders of magnitude. Michie went on to point out that in a previous case, Charles Babbage’s, the work was foredoomed by an “unripe time” due to hardware limitations and represented a complete waste of time & money. This, arguably, was the case for Michie’s own research.
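That ‘6 orders of magnitude’ figure is easy to sanity-check against Moore’s law (a back-of-the-envelope sketch; the doubling periods are assumptions, since Moore’s law varied between roughly 1.5 and 2.5 years per doubling over this era):

```python
import math

# How long would Moore's law take to close a 10^6x hardware shortfall from 1970?
shortfall = 1e6                      # "at least 6 orders of magnitude"
doublings = math.log2(shortfall)     # ~19.9 doublings needed

for years_per_doubling in (1.5, 2.0, 2.5):
    years = doublings * years_per_doubling
    print(f"{years_per_doubling} yr/doubling -> ~{years:.0f} years, ie. ~{1970 + years:.0f}")
# -> roughly 2000-2020: the deep-learning era arrives on this schedule,
#    comfortably inside the 1979-2069 window of Michie's 1969 poll.
```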

Nor Riches to Men of Understanding

“But to come very near to a true theory, and to grasp its precise application, are two very different things, as the history of science teaches us. Everything of importance has been said before by somebody who did not discover it.”

Alfred North Whitehead, The Organization of Thought (1917)

So you don’t know the timing well enough to reliably launch. You can’t imitate a successful entrepreneur, the time is past. You can’t foresee what will be successful based on what has been successful; you can’t even foresee what won’t be successful based on what was already unsuccessful; and you can’t ask researchers, because they are incentivized to not know the timing any better than anyone else.

Can you at least profit from your knowledge of the outcome? Here again we must be pessimistic.

Certainty is irrelevant; you still have problems making use of this knowledge. Example: in retrospect, we know everyone wanted computers, OSes, social networks—but the history of them is strewn with flaming rubble. Suppose you somehow knew in 2000 that “in 2010, the founder of the most successful social network will be worth at least $10b”; this is a falsifiable belief at odds with all conventional wisdom, about a tech that blindsided everyone. Yet, how useful would this knowledge be, really? What would you do with it? Do you have the capital to start a VC fund of your own, and throw multi-million-dollar investments at every social media startup until finally in 2010 you knew for sure that Facebook was the winning ticket and could cash out in the IPO? I doubt it.

It’s difficult to invest in ‘computers’ or ‘AI’ or ‘social networking’ or ‘VR’; there is no index for these things, and it is hard to see how there even could be such a thing. (How do you force all relevant companies to sell tradable stakes? “If people don’t want to go to the ball game, how are you going to stop them?” as Yogi Berra asked.) There is no convenient CMPTR you can buy 100 shares of and hold indefinitely to capture gains from your optimism about computers. IBM and Apple both went nearly bankrupt at points, and Microsoft’s stock has been flat since 1999 or whenever (translating to huge real losses and opportunity costs to long-term holders of it). If you knew for certain that Facebook would be as huge as it was, what stocks, exactly, could you have invested in, pre-IPO, to capture gains from its growth? Remember, you don’t know anything else about the tech landscape in the 2000s, like that Google will go way up from its IPO, you don’t know about Apple’s revival under Jobs—all you know is that a social network will exist and will grow hugely. Why would anyone think that the future of smartphones would be won by “a has-been 1980s PC maker and an obscure search engine”? (The best I can think of would be to sell any Murdoch stock you owned when you heard they were buying MySpace, but offhand I’m not sure that Murdoch didn’t just stagnate rather than drop as MySpace increasingly turned out to be a writeoff.) In the hypothetical that you didn’t know the name of the company, you might’ve bought up a bunch of Google stock hoping that Orkut would be the winner, but while that would’ve been a decent investment (yay!) it would have had nothing to do with Orkut (oops)…

And even when there are stocks available to buy, you only benefit based on the specifics—like one of the existing stocks being a winner, rather than all the stocks being eaten by some new startup. Let’s imagine a different scenario, where instead you were confident that home robotics were about to experience a huge growth spurt. Is this even nonpublic knowledge at all? The world economy grows at something like 2% a year, labor costs generally go up, prices of computers and robotics usually fall… Do industry projections expect sales to grow by <25% a year?

But say that the market is wrongly pessimistic. If so, you might spend some of your hypothetical money on whatever the best approximation to a robotics index fund you can find, as the best of a bunch of bad choices. (Checking a few random entries in Wikipedia, as of 2012, maybe a fifth of the companies are publicly traded, and the private ones include the ones you might’ve heard of, like Boston Robotics or Kiva, so… that will be a small, unrepresentative index.) Suppose the home robotic growth were concentrated in a single private company which exploded into the billions of annual revenue and took away the market share of all the others, forcing them to go bankrupt or merge or shrink. Home robotics will have increased just as you believed—keikaku doori!—yet your ‘index fund’ has gone bankrupt (reindex when one of the robotics companies collapses? Reindex into what, another doomed firm?). Then, after your special knowledge has become public knowledge, the robotics company goes public, and by the EMH, its shares become a normal investment.

Morgan Housel:

There were 272 automobile companies in 1909. Through consolidation and failure, 3 emerged on top, 2 of which went bankrupt. Spotting a promising trend and a winning investment are two different things.

Is this impossibly rare? It sounds like Facebook! They grew fast, roflstomped other social networks, stayed private, and post-IPO, public investors have not profited all that much compared to even late investors.

Because of the winner-take-all dynamics, there’s no way to solve the coordination problem of holding off on an approach until the prerequisites are in place: entrepreneurs and founders will be hurling themselves at a common goal like social networks or VR constantly, just on the off chance that maybe the prerequisites have just become adequate and they’ll be able to eat everyone’s lunch. A predictable waste of money, perhaps, but that’s how the incentives work out. It’s a weird perspective to take, but we can think of other technologies which may be like this.

Bitcoin is a topical example: it’s still in the early stages where it looks either like a genius stroke to invest in, or a fool’s paradise/Ponzi scheme. In my first draft of this essay in 2012, I noted that we saw what looked like a Bitcoin bubble as the price inflated from ~$0 to ~$130—yet, if Bitcoin were the Real Deal, we would expect large price increases as people learn of it and it directly gains value from increased use, an ecosystem slowly unlocking the fancy cryptographic features, etc. And in 2019, with 2012 a distant memory, well, one could say something similar, just with larger numbers…

Or take niche visionary technologies: if cryonics were correct in principle, yet turned out to be worthless for everyone doing it before 2030 (because the wrong perfusion techniques or cryopreservatives were used and some critical bit of biology was not vitrified) while practical post-2030, say, it would simply be yet another technology where visionaries were ultimately right despite all nay-saying and skepticism from normals, but nevertheless wrong in a practical sense because they jumped on it too early, and so they wasted their money.

Indeed, many things do come to pass.

Surfing Uncertainty

“Whatsoever thy hand findeth to do, do it with thy might; for there is no work, nor device, nor knowledge, nor wisdom, in the grave, whither thou goest.”

Qoheleth, Ecclesiastes 9:10

“Yunmen addressed the assembly and said: ‘I am not asking you about the days before the fifteenth of the month. But what about after the fifteenth? Come and give me a word about those days.’ And he himself gave the answer for them: ‘Every day is a good day.’”

Blue Cliff Record, Case 6

Where does this leave us? In what I would call, in a nod to Thiel’s ‘definite’ vs ‘indefinite optimism’, definitely-maybe optimism. Progress will happen and can be foreseen long before, but the details and exact timing are too difficult to get right, and the benefit of R&D is in its results lying fallow until the ripe time and their exploitation in unpredictable ways.

Returning to Donald Michie: one could make fun of his extremely overly-optimistic AI projections, and write him off as the stock figure of the biased AI researcher blinded by the ‘Maes-Garreau law’, where AI is always scheduled for right around when a researcher will retire; but while he was wrong, it is unclear this was a mistake, because in other cases, an apparently doomed research project—Marconi’s attempt to radio across the Atlantic ocean—succeeded because of an ‘unknown unknown’—the ionosphere. We couldn’t know for sure that such projections were wrong, and the amount of money being spent back then on AI was truly trivial (and the commercial spinoffs likely paid for it all anyway).

Further, on the gripping hand, Michie suggests that research efforts like Babbage’s should be thought of not as commercial R&D, expected to usually pay off right now, but as prototypes buying optionality: demonstrating that a particular technology was approaching its ‘ripe time’ & indicating what the bottlenecks are, so society can go after the bottlenecks and then has the option to scale up the prototype as soon as the bottlenecks are fixed. Ripe time is what finally enables attacks on consequential problems, and the development of many technologies can be described as “failure rebooting”: revisiting (failed) past ideas which may now be workable in the light of progress in other areas. As time passes, the number of options may open up, and any of them may bypass what was formerly fatal. Enough progress in one domain (particularly computing power) can sometimes make up for stasis in another domain.

So, what Babbage should have aimed for is not making a practical thinking machine which could churn out naval tables, but demonstrating that a programmable thinking machine is possible & useful, and currently limited by the slowness & size of its mechanical logic—so that transistors could be pursued with higher priority by governments, and programmable computers could be created with transistors as soon as possible, instead of the historical course of a meandering piecemeal development where Babbage’s work was forgotten & then repeatedly reinvented with delays (eg by Zuse & von Neumann). Similarly, the benefit of taking Moore’s law seriously is that one can plan ahead to take advantage of it even if one doesn’t know exactly when, if ever, it will happen.

Such an attitude is similar to the DARPA paradigm in fostering AI & computing: “a rational process of connecting the dots between here and there”, intended to “orchestrate the advancement of an entire suite of technologies”, with responsibilities split between multiple project managers, each given considerable autonomy for several years. These project managers tend to pick polarizing projects rather than consistent ones (Goldstein & Kearney 2017), ones which generate disagreement among reviewers or critics. Each one plans, invests & commits to pushing results as hard as possible through to commercial viability, and then pivots as necessary when the plan inevitably fails. (DARPA indeed saw itself as much like a VC firm.)

The benefit for someone like DARPA of a forecast like Moore’s law is that it provides one fixed trend to gauge overall timing to within a decade or so, and to look for those dots which have lagged behind and become reverse salients. For an entrepreneur, the advantage of exponential thinking is more fatalistic: being able to launch in the window of time just after technical feasibility but before someone else randomly gives it a try; if wrong because it was always impossible, it doesn’t matter when one launches, and if wrong because the timing is wrong, one’s choice is effectively random and little is lost by delay.

Try & Try Again (But Less & Less)

“The road to wisdom?—Well, it’s plain
and simple to express:
Err
and err
and err again
but less
and less
and less.”

Piet Hein, Grooks

This presents a conflict between personal and social incentives. Socially, one wants people regularly tossing their bodies into the marketplace to be trampled by uncaring forces, just on the off chance that this time it’ll finally work, and since the critical factors are unknown and constantly changing, one needs a sacrificial startup every once in a while to check (for a good idea, no amount of failures is enough to prove that it should never be tried—many failures just imply that there should be a backoff). Privately, given the skewed returns, diminishing utility, the oversized negative impacts (a bad startup can ruin one’s life and drive one to suicide), the limited number of startups any individual can engage in, and the fact that startups & VC will capture only a minute percentage of the total gains from any success (most of which will turn into consumer surplus/positive externalities), the only startups that make any rational sense, which you wouldn’t have to be crazy to try, are the overdetermined ones which anyone can see are a great idea. However, those are precisely the startups that crazy people will have done years before, when they looked like bad ideas, avoiding the waste of delay. Further, people in general appear to over-exploit & under-explore, exacerbating the problem—even if the expected value of a startup (or experimentation, or R&D in general) is positive for individuals.

So, it seems that rapid progress depends on crazy people.

There is a more than superficial analogy here, I think, to Thompson sampling/posterior sampling (PSRL). In RL’s multi-armed bandit setting, each turn one has a set of ‘arms’ or options with unknown payoffs, and one wants to maximize the total long-term reward. The difficulty is in coping with failure: even good options may fail many times in a row, and bad options may succeed, so options cannot simply be ruled out after a failure or two, and if one is too hasty to write an option off, one may take a long time to realize that, losing out for many turns.

One of the simplest & most efficient MAB solutions, which maximizes the total long-term reward and minimizes ‘regret’ (opportunity cost), is Thompson sampling & its generalization PSRL: randomly select each option with a probability equal to the current estimated probability that it is the most profitable option. This explores all options initially but gradually homes in on the most profitable option to exploit most of the time, while still occasionally exploring all the other options once in a while, just in case; strictly speaking, Thompson sampling will never ban an option permanently, the probability of selecting it merely becoming vanishingly rare. Bandit settings can further assume that options are ‘restless’ and that the optimal option may ‘drift’ over time or ‘run out’ or ‘switch’, in which case one also estimates the probability that an option has switched, and when it does, one changes over to the new best option; instead of regular Thompson sampling, where bad options become ever more unlikely to be tried, a restless bandit results in constant low-level exploration, because one must constantly check lest one fail to notice a switch.
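To make the mechanism concrete, here is a minimal sketch of Beta-Bernoulli Thompson sampling (Python; the per-arm payoff probabilities are invented for illustration): each turn, draw one plausible payoff per arm from its posterior and play the arm whose draw is highest, so each arm gets played exactly as often as it currently seems likely to be the best one.

```python
import numpy as np

rng = np.random.default_rng(0)
true_payoffs = np.array([0.02, 0.05, 0.10])  # hypothetical 'idea' success rates
successes = np.zeros(3)                      # Beta posterior: alpha = 1 + successes
failures = np.zeros(3)                       #                 beta  = 1 + failures
picks = np.zeros(3, dtype=int)

for t in range(10_000):
    # Sample one plausible payoff per arm from its Beta posterior...
    samples = rng.beta(1 + successes, 1 + failures)
    arm = int(np.argmax(samples))            # ...and play the arm that looks best.
    picks[arm] += 1
    if rng.random() < true_payoffs[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

print(picks)  # most pulls concentrate on arm 2, but arms 0 & 1 are never banned
```

Note the failure-tolerance: even the best arm here fails 90% of the time, so its posterior merely sags after early bad luck rather than being zeroed out, just as a good startup idea survives a string of failed attempts.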

This bears a resemblance to startup rates over time: an initial burst of enthusiasm for a new ‘option’, when it still has a high prior probability of being the most profitable option at the moment, triggers a bunch of startups selecting that option, but then, when they fail, the posterior probability drops substantially; however, even if something now looks like a bad idea, there will still be people every once in a while who insist on trying again anyway, and, because the probability is not 0, once in a while they succeed wildly and everyone is astonished that ‘so, X is a thing now!’

In DARPA’s research funding and VC, they often aren’t looking for a plan which looks good on average to everyone, or which no one can find any particular problem with, but something closer to a plan which at least one person thinks could be awesome for some reason. An additional analogy from reinforcement learning is PSRL, which handles more complex problems by committing to a strategy and following it until the end and either success or failure. Naive Thompson sampling would do badly in a long-term problem, because at every step it would ‘change its mind’ and be unable to follow any plan consistently for long enough to see what happens; what is necessary is ‘deep exploration’: following a single plan long enough to see how it works, even if one thinks that plan is almost certainly wrong—one must “disagree and commit”. The average of multiple plans is often worse than any single plan. The most informative plan is the most polarizing one.
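A minimal sketch of that commit-per-episode idea (Python again; the toy ‘trek’ environment and every number in it are invented for illustration): at the start of each episode, sample one hypothesis about the unknown payoff and follow the plan that hypothesis implies all the way to the end, instead of resampling at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

H, TREK_COST = 10, 4     # episode length; steps needed to reach the far option
STAY_REWARD = 0.2        # known per-step reward for the safe default
TRUE_P_FAR = 0.8         # unknown Bernoulli reward rate of the far option
alpha, beta = 1.0, 1.0   # Beta posterior over TRUE_P_FAR

for episode in range(200):
    # PSRL: sample ONE hypothesis per episode and commit to its optimal plan.
    p_hat = rng.beta(alpha, beta)
    if (H - TREK_COST) * p_hat > H * STAY_REWARD:   # trek only if it looks better
        pulls = rng.random(H - TREK_COST) < TRUE_P_FAR
        alpha += pulls.sum()                        # update on what the trek taught
        beta += (H - TREK_COST) - pulls.sum()
    # (Staying home is safe but teaches us nothing about the far option.)

print(f"posterior mean for the far option: {alpha / (alpha + beta):.2f}")
# Resampling p_hat at every step would dither mid-trek and rarely arrive;
# committing for the whole episode is what buys the information.
```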

The system as a whole can be seen in RL terms. One theme I notice in many systems is that they follow this pattern of mostly-exploitation punctuated by persistent low-level exploration; ensemble methods like dropout or multi-agent optimization can follow this pattern as well.

A particularly germane example here is a study of a large dataset of trades made by online traders, who are able to clone the financial trading strategies of more successful traders; as traders find successful strategies, others gradually imitate them, and so the system as a whole converges on better strategies in what the authors identify as a sort of particle-filter-like implementation of “distributed Thompson sampling”, which they dub “social sampling”. So for the most part, traders clone popular strategies, but with certain probabilities, they’ll randomly explore rarer apparently-unsuccessful strategies.
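A toy simulation of that dynamic (a sketch, not the paper’s actual model; the population size, payoff rates, and exploration rate are all invented): most agents copy an incumbent strategy in proportion to its popularity, while a small minority explore uniformly at random.

```python
import numpy as np

rng = np.random.default_rng(0)
true_payoff = np.array([0.2, 0.3, 0.8])  # strategy 2 is best but starts obscure
N, EPSILON = 1000, 0.05                  # population size; exploration rate
strategy = rng.integers(0, 2, size=N)    # nobody starts on strategy 2

for step in range(100):
    wins = rng.random(N) < true_payoff[strategy]
    for i in range(N):
        if rng.random() < EPSILON:                      # rare random exploration
            strategy[i] = rng.integers(0, 3)
        elif not wins[i]:                               # losers copy a random peer,
            strategy[i] = strategy[rng.integers(0, N)]  # ie. popularity-weighted

print(np.bincount(strategy, minlength=3) / N)
# The population converges on strategy 2, found only via the epsilon-explorers.
```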

This sounds a good deal like individuals pursuing standard careers & occasionally exploring unusual strategies like a startup; they will occasionally explore strategies which have performed badly (ie. previous similar startups failed). Entrepreneurs, with their speculations and optimistic biases, serve as randomization devices to sample a strategy regardless of the ‘conventional wisdom’, which at that point may be no more than an information cascade; information cascades, however, can be broken by the existence of outliers who are either informed or act at random (“misfits”). While each time a failed option is tried, it may seem irrational (“how many times must VR fail before people finally give up on it‽”), it was still rational in the big picture to give it a try, as this collective strategy minimizes regret & maximizes collective total long-term returns—as long as failed options aren’t tried too often.

Reducing Regret

What does this analogy suggest? The two failure modes of a MAB algorithm are investing too much in one option early on, and then investing too little later on; in the former, you inefficiently buy too much information about an option which happened to have good luck but is not guaranteed to be the best, at the expense of others (which may in fact be the best), while in the latter, you buy too little & risk permanently making a mistake by prematurely rejecting an apparently-bad option (which simply had bad luck early on). To the extent that VCs/startups stampede into particular sectors, this leads to inefficiency of the first kind—were so many ‘green energy’ startups necessary? When they began failing in a cluster, information-wise, that was highly redundant. And then, on the other hand, if a startup idea becomes ‘debunked’, and no one is willing to invest in it ever, that idea may be starved of investment long past its ripe time, and this means big regret.

I think most people are aware of fads/stampedes in investing, but the latter error is not so commonly discussed. One idea is that a VC firm could explicitly track ideas that seem great but have had several failed startups, and try to schedule additional investments at ever greater intervals (similar to DS-PRL), which bounds losses (if the idea turns out to be truly a bad idea after all) but ensures eventual success (if a good one). For example, even if online pizza delivery has failed every time it’s been tried, it still seems like a good idea that people will want to order pizza online via their smartphones, so one could try a pizza startup 2.5 years later, then 5 years later, then 10 years, then 20 years; or perhaps every time computer costs drop an order of magnitude, or perhaps every time the relevant market doubles in size? Since someone wanting to try the business again might not pop up at the exact time desired, a VC might need to create one themselves by trying to inspire someone to do it.
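The doubling schedule has the usual exponential-backoff property: the number of (costly) retries grows only logarithmically with elapsed time, while any idea whose ripe time arrives at year T still gets retried within roughly a factor of 2 of T. A quick sketch (the 2.5-year base interval is just the example above):

```python
def retry_years(base=2.5, horizon=100):
    """Years at which to fund another attempt, doubling the wait each time."""
    t, wait = 0.0, base
    while t + wait <= horizon:
        t += wait
        yield t
        wait *= 2

print(list(retry_years()))  # [2.5, 7.5, 17.5, 37.5, 77.5]
# Only 5 attempts in a century: bounded losses if the idea is truly bad, but
# bounded delay (within ~2x of its ripe time) if it eventually becomes good.
```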

What other lessons could we draw if we thought about technology this way? The use of lottery grants is one idea which has been proposed, to help break the over-exploitation fostered by peer review; the randomization gives disfavored low-probability proposals (and people) a chance. If we think about multi-level optimization systems & population-based training, and the way ‘strong amplifier’ population structures optimize evolution (structures which resemble small but networked communities), that would suggest we should have a bias against both large and small groups/institutes/granters, because small ones are buffeted by random noise/drift and can’t afford well-powered experiments, but large ones are too narrow-minded. But a network of medium ones can both explore well and then efficiently replicate the best findings across the network to exploit them.

See Also

Appendix

ARPA and SCI: Surfing AI (Review of Roland & Shiman 2002)

Review of a DARPA history book, Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993 (Roland & Shiman 2002), which covers a large-scale DARPA effort to jumpstart real-world uses of AI in the 1980s by a multi-pronged research effort into more efficient computer chip R&D, supercomputing, robotics/self-driving cars, & expert system software. Roland & Shiman 2002 particularly focus on the various ‘philosophies’ of technological forecasting & development which guided DARPA’s strategy in different periods, ultimately endorsing a weak technological determinism where the bottlenecks are too large for a (small, in comparison to the global economy & global R&D) organization to overcome; the best a DARPA can hope for is a largely agnostic & reactive strategy in which granters ‘surf’ technological changes, rapidly exploiting new technology while investing their limited funds into targeted research patching up any gaps or lags that accidentally open up and block broader applications.

While reading “Funding Breakthrough Research: Promises and Challenges of the ‘ARPA Model’”, Azoulay et al 2018, on DARPA, I noticed an interesting comment:

In this paper, we propose that the key elements of the ARPA model for research funding are: organizational flexibility on an administrative level, and significant authority given to program directors to design programs, select projects and actively manage projects. We identify the ARPA model’s domain as mission-oriented research on nascent S-curves within an inefficient innovation system.

…Despite a great deal of commentary on the ARPA model, lack of access to internal archival data has hampered efforts to study it empirically. One notable exception is the work of Roland and Shiman (2002), who offer an industrial history of DARPA’s effort to develop machine intelligence under the “Strategic Computing Initiative” [SCI]. They emphasize both the agency’s positioning in the research ecosystem—carrying military ideas to proof of concept that would be otherwise neglected—as well as the program managers’ role as “connectors” in that ecosystem. Roland and Shiman are to our knowledge the only academic researchers ever to receive internal access to DARPA’s archives. Recent work by Goldstein and Kearney (2018a) on ARPA-E is to-date the only quantitative analysis using internal program data from an ARPA agency. [For insights into this painful process, see the preface of Roland and Shiman (2002).]

The two Goldstein & Kearney 2018 papers sounded interesting but alas, are listed as "manuscript under review"/"manuscript in preparation"; only one is available as a preprint. I was surprised that an agency as well known and intimately involved in computing history could be described as having one internal history, ever, and looked up a PDF copy of the book.

The preface makes clear the odd footnote: while they may have had some access to internal archival data, they had a lot less access than they requested, DARPA was not enthusiastic about it, and eventually canceled their book contract (they published anyway). This leads to an… interesting preface. You don't often hear historians of solicited official histories describe the access as a "mixed blessing" and say things like "they never lied to us, as best as we can tell", they just "simply could not understand why we wanted to see the materials we requested", or recount that their "requests for access to these [emails] were most often met with laughter", noting that "We were never explicitly denied access to records controlled by DARPA; we just never gained complete access." Frustrated, they

…then asked if they could identify any document in the SC program that spoke the truth, that could be accepted at face value. They [ARPA interviewees] found this an intriguing question. They could not think of a single such document. All documents, in their view, distorted reality one way or another—always in pursuit of some greater good.

In one anecdote from the interviews, Lynn Conway shows up with a stack of internal DARPA documents, states that an NDA prevents her from talking about them (as if anyone cared about NDAs from decades before), and refuses to show any of the documents to the interviewer, leaving me rather bemused—why bother? (Although in this case, it may just be that Conway is a jerk—one might remember her from helping try to frame Michael Bailey for sexual abuse.) I was reminded a little of Carter Scholz's also-2002 novel, Radiance, which touches on SDI and indirectly on SCI.

The book itself doesn't seem to have suffered too badly for the birth pangs. It's an overview of the birth and death of the SCI, organized in chunks by manager. The division by manager is not an accident—R&S comment deprecatingly about DARPA personnel being focused on the technology and how they didn't want them to "talk about people and politics", and invoke the strawman of "technological determinists"; they seem to adopt the common historian pose that a sophisticated historian focuses on people and that it is naive & unsophisticated to invoke objective constraints of science & technology & physics. This is wrong in the context of SCI, as their in-depth recounting will eventually make clear. The people did not have much to do with the failures: stuff like gallium arsenide or autonomous robots didn't work out because they don't work or are hard or require computing power unavailable at the time, not because some bureaucrat made a bad naming choice or ran afoul of the wrong Senator. People don't matter to something like Moore's law. Man proposes but Nature disposes—you can fake medicine or psychology easily, but it's harder to fake a robot not running into trees. Fortunately, for all the time R&S spend on project managers shuffling around acronyms, they still devote adequate space to the actual science & technology and do a good job of it.

So what was SCI? It was a 1980s–1990 add-on to ARPA's existing funding programs, where the spectre of Japan's Fifth Generation project was used to lobby Congress for additional R&D funding which would be devoted to a cluster of interconnected technological opportunities ARPA spied on the US horizon, to push them forward simultaneously and break the logjams. (As always, "funding comes from the threat", though many were highly skeptical that Fifth Generation would go anywhere or that its intended goals—much of which was to simply work around flaws in Japanese language handling—were much of a threat, and most Western evaluations of it generally describe it as a failure or at least not a notably productive R&D investment.) The systems included gallium arsenide chips to replace silicon's poor thermal/radiation tolerance and operate at faster frequencies as well, VLSI chips which would combine previously disparate chips onto a single small chip as part of a silicon design ecosystem which would design & manufacture chips much faster than previously31, parallel processing computers going far beyond just 1 or 2 processors, autonomous car robots, AI expert systems, and advanced user-friendly software tools in general. The name "Strategic Computing Initiative" was chosen to try to benefit from Reagan's SDI, but while the military connections remained throughout, the connection was ultimately quite tenuous and the gallium arsenide chips were deliberately split out to SDI to avoid contamination, although the US military would still be the best customer for many of the products & the connections continued to alienate people. Surprisingly—shockingly, even—computer networking was not a major SCI focus: the ARPA networking PM Barry Leiner kept clear of SCI (not needing the money & fearing a repeat of know-nothing Republican Congressmen searching for something to axe). The funding ultimately amounted to ~$1 billion in 1993 dollars (~$2.2 billion in current dollars), trivial compared to total military funding, but still real money.

The project implementation followed ARPA's existing loose oversight paradigm, where traveling project managers were empowered to dispense grants to applicants on their own authority, depending primarily on their own good taste to match talented researchers with ripe opportunities, with bureaucracy limited to meeting with the grantees semi-annually or annually for progress reports & evaluation, often in groups so as to let researchers test each other's mettle & form social ties. ("ARPA program managers like to repeat the quip that they are 75 entrepreneurs held together by a common travel agent.") An ARPA PM would humbly 'surf' the cutting-edge, going with the waves rather than swimming upstream, so to speak, to follow growing trends while cutting their losses on dead ends, to bring things through the 'valley of death' between lab prototype and the real world:

Steven Squires, who rose from program manager to be Chief Scientist of SC and then director of its parent office, sought orders-of-magnitude increases in computing power through parallel connection of processors. He envisioned research as a continuum. Instead of point solutions, single technologies to serve a given objective, he sought multiple implementations of related technologies, an array of capabilities from which users could connect different possibilities to create the best solution for their particular problem. He called it "gray coding". Research moved not from the white of ignorance to the black of revelation, but rather it inched along a trajectory stepping incrementally from one shade of gray to another. His research map was not a quantum leap into the unknown but a rational process of connecting the dots between here and there. These and other DARPA managers attempted to orchestrate the advancement of an entire suite of technologies. The desideratum of their symphony was connection. They perceived that research had to mirror technology. If the system components were to be connected, then the researchers had to be connected. If the system was to connect to its environment, then the researchers had to be connected to the users. Not everyone in SC shared these insights, but the founders did, and they attempted to instill this ethos in the program.

Done wrong, of course, this results in a corrupt slush fund doling out R&D funds to an incestuous network of grantees for technologies always just on the horizon and whose failure is always excused by the claim that high-risk research often won't work out, or results in elaborate systems trying to do too many things and collapsing under the weight of many advanced half-debugged systems chaotically interacting. Having been conceived in scientific sin and born of blue-uniform bureaucracy while midwifed by conniving committees, SCI's prospects might not look too great.

So, did SCI work out? The answer is a definite, unqualified—maybe:

At the end of their decade, 1983–1993, the connection failed. SC never achieved the machine intelligence it had promised. It did, however, achieve some remarkable technological successes. And the program leaders and researchers learned as much from their failures as from their triumphs. They abandoned the weak components in their system and reconfigured the strong ones. They called the new system "high performance computing". Under this new rubric they continued the campaign to improve computing systems. "Grand challenges" replaced the former goal, machine intelligence; but the strategy and even the tactics remained the same.

The end of SCI coincided with (and partially caused) the "AI winter", but SCI went beyond just the Lisp machine & expert system software companies we associate with the AI winter. Of the systems, some worked out, others were good ideas but the time wasn't ripe in an unforeseeable way and have been maturing ever since, some have poked along in a kind of permanent stasis (not dead but not alive either), others were dead ends but dead ends in important ways, and some are plain dead. In order, one might list: parallel commodity processors and rapid development of large silicon chips via a subsidized foundry, the autonomous cars/vehicles and generalized machine intelligence systems and expert systems, gallium arsenide, and Josephson junctions.

Pining for the fjords: super-fast superconducting Josephson junctions were rapidly abandoned before becoming officially part of SCI research, while gallium arsenide suffered a similar fate—at the time, it was exciting, and Seymour Cray infamously bet big on the Cray-3 achieving its OOM improvement in part with gallium arsenide chips, but somehow it never quite worked out or replaced silicon and remains in a small niche. (I doubt it was SDI's fault, since gallium arsenide has had 2 decades since, and there's been a ton of commercial incentive to find a replacement for silicon as it gets ever harder to shrink silicon nodes.)

Important failures: autonomous vehicles and generalized AI systems represent an interesting intermediate case: the funded vehicles, like the work at CMU, were useless—expensive, slow, trivially confused by slight differences in roads or scenery, unable to cope in realtime with more than monochrome images with pitiful resolutions like 640x640px or smaller because the computer vision algorithms were too computationally demanding, and the development bogged down by endless tweaks and hacking with regular regressions in capability. But these research programs and demos were direct ancestors of the DARPA Grand Challenge, which itself kickstarted the current self-driving car boom a decade ago. ARPA and the military didn't get the exciting vehicles promised by the early '90s, but they do now have autonomous cars and especially drones, and it's amazing to think that Google Waymo cars are wandering around Arizona now, regularly picking up and dropping off riders without a single fatality or major injury after millions of miles. As far as I can tell, Waymo wouldn't exist now without the DARPA Grand Challenge, and it seems possible that DARPA was encouraged by the mixed success of the SCI vehicles, so that's an interesting case of potential success albeit delayed. (But then, we do expect that with technology—Amara's law.)

Parallel computers: Thinking Machines benefited a lot from SCI, as did other parallel computing projects, and while TM did fail and the computers we use now don't resemble the Connection Machine at all32, the field of parallel processing was proven out (ie. systems with thousands of weak CPUs could be successfully built, programmed, made to realize OOM performance gains, and commercially sold); I'd noticed once that a lot of the parallel computing architectures we use now seemed to stem from an efflorescence in the 1980s, but it was only while reading R&S and noting all the familiar names that I realized that that was not a coincidence, because many of them were ARPA-funded at this time. Even without R&S noting that the parallel computing work was successfully rolled over into "HPC", SCI's investment into parallel computing was a big success.

A successful adjunct to the parallel computing was an interesting program I'd never heard of before: MOSIS. MOSIS was essentially a government-subsidized chip foundry, competitive with commercial chip foundries, which would accept student & researcher submissions of VLSI chip designs like CPUs or ASICs and make physical chips in combined batches to save costs. Anyone with interesting new ideas could email in a design and get back within 2 months a real live chip for a few hundred dollars. The chips would be made cheaply, quickly, quality-checked, with assurance of privacy, and MOSIS ran thousands of projects a year (peaking at 1880 in 1989). This is quite a cool program to run and must have been a godsend, especially for anyone trying to make custom chips for parallel projects. ("SC also supported BBN's Butterfly parallel processor, Charles Seitz's Hypercube and Cosmic Cube at CalTech, Columbia's Non-Von, and the CalTech Tree Machine. It supported an entire newcomer as well, Danny Hillis's Connection Machine, coming out of MIT.47 All of these projects used MOSIS services to move their design ideas into experimental chips.") It was involved in early GPU work (Clark's Geometry Engine) and RISC designs like MIPS, and even oddities like systolic array chips/computers like CMU's Warp. Sadly, MOSIS was a bit of a victim of its own success and drew political fire.

Expert systems and planners are generally listed as a 'failure' and the cause of the AI Winter, and it's true they didn't give us HAL as some GOFAI people hoped, but they did find a useful niche and have been important—R&S give a throwaway paragraph noting that one system from SCI, DART, was used in planning logistics for the first Gulf War and saved the DoD more money than the whole SCI program combined cost. (The listed reference, "DART: Revolutionizing Logistics Planning", Hedberg 2002, actually makes the bolder claim that DART "paid back all of DARPA's 30 years of investment in AI in a matter of a few months, according to Victor Reis, Director of DARPA at the time." Which could be equally well taken as a comment on how expensive a war is, how inefficient DoD logistics planning was, or how little has been invested in AI.) It's also worth noting that speech recognition based on hidden Markov models & n-grams, the first speech recognition systems which were any use, was a success here, even if now obsolesced by deep learning.

Perhaps the most relevant area to contemporary AI discussions of deep learning is the expert systems. Why was there such optimism? Expert systems had accomplished a few successes: DENDRAL/MYCIN (although MYCIN was never used in production), some mining/oil case studies like PROSPECTOR, a customer configuration assistant for DEC (XCON)… And SCI was a synergistic program, remember, providing the chips and then powerful parallel computers whose expert systems would scale up to the tens of thousands of rules per second estimated necessary for things like the autonomous vehicles:

Small wonder, then, that Robert Kahn and the architects of SC believed in 1983 that AI was ripe for exploitation. It was finally moving out of the laboratory and into the real world, out of the realm of toy problems and into the realm of real problems, out of the sterile world of theory and into the practical world of applications.

…That such a goal appeared within reach in the early 1980s is a measure of how far the field had already come. In the early 1970s, the MYCIN expert system had taken twenty person-years to produce just 475 rules.38 The full potential of expert systems lay in programs with thousands, even tens and hundreds of thousands, of rules. To achieve such levels, production of the systems had to be dramatically streamlined. The commercial firms springing up in the early 1980s were building custom systems one client at a time. DARPA would try to raise the field above that level, up to the generic or universal application.

Thus was shaped the SC agenda for AI. While the basic program within IPTO continued funding for all areas of AI, SC would seek "generic applications" in four areas critical to the program's applications: (1) speech recognition would support Pilot's Associate and Battle Management; (2) natural language would be developed primarily for Battle Management; (3) vision would serve primarily the Autonomous Land Vehicle; and (4) expert systems would be developed for all of the applications. If AI was the penultimate tier of the SC pyramid, then expert systems were the pinnacle of that tier. Upon them all applications depended. Development of a generic expert system that might service all three applications could be the crowning achievement of the program. Optimism on this point was fueled by the whole philosophy behind SC. AI in general, and expert systems in particular, had been hampered previously by lack of computing power. Feigenbaum, for example, had begun DENDRAL on an IBM 7090 computer, with about 130K bytes of core memory and an operating speed between 50 and 100,000 floating point operations per second.39 Computer power was already well beyond that stage, but SC promised to take it to unprecedented levels—a gigaflop by 1992. Speed and power would no longer constrain expert systems. If AI could deliver the generic expert system, SC would deliver the hardware to run it. Compared to existing expert systems running 2,000 rules at 50–100 rules per second, SC promised "multiple cooperating expert systems with planning capability" running 30,000 rules firing at 12,000 rules per second and six times real time.40
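To put the quoted targets in perspective (my arithmetic, not the book's), the promise was roughly a 15× larger rule base firing two orders of magnitude faster:

```python
# Back-of-the-envelope on the SC targets quoted above (my arithmetic).
existing_rules, existing_rate = 2_000, 75      # "2,000 rules at 50-100 rules per second" (midpoint)
target_rules, target_rate = 30_000, 12_000     # "30,000 rules firing at 12,000 rules per second"
print(f"rule-base growth:   {target_rules / existing_rules:.0f}x")   # 15x
print(f"firing-rate growth: {target_rate / existing_rate:.0f}x")     # 160x
```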

What happened was that the hardware came into existence, but the expert systems didn't scale. They instantly hit a combinatorial wall, couldn't solve the grounding problem, and knowledge engineering never became feasible at the level where you might encode a human's knowledge. Expert systems also struggled to be extended beyond symbolic systems to real data like vision or sound. AI didn't have remotely enough computing power to do anything useful, and it didn't have methods which could use the computing power if it had it. We got the VLSI chips, we got the gigahertz processors even without gallium arsenide, we got the gigaflops and then the teraflops and now the petaflops—but what do you do with an expert system on those? Nothing. The grand goals of SCI relied on all the parts doing their part, and one part fell through:

Only four years into the SC program, when Schwartz was about to terminate the IntelliCorp and Teknowledge contracts, expectations for expert systems were already being scaled back. By the time that Hayes-Roth revised his article for the 1992 edition of the Encyclopedia, the picture was still more bleak. There he made no predictions at all about program speeds. Instead he noted that rule-based systems still lacked "a precise analytical foundation for the problems solvable by RBSs . . . and a theory of knowledge organization that would enable RBSs to be scaled up without loss of intelligibility of performance."108 SC contractors in other fields, especially applications, had to rely on custom-developed software of considerably less power and versatility than those envisioned when contracts were made with IntelliCorp and Teknowledge. Instead of a generic expert system, SC applications relied increasingly on "domain-specific software", a change in terminology that reflected the direction in which the entire field was moving.109 This is strikingly similar to the pessimistic evaluation Schwartz had made in 1987. It was not just that IntelliCorp and Teknowledge had failed; it was that the enterprise was impossible at current levels of experience and understanding…Does this mean that AI has finally migrated out of the laboratory and into the marketplace? That depends on one's perspective. In 1994 the U.S. Department of Commerce estimated the global market for AI systems to be about $900 million (1994; ~$1,900 million in current dollars), with North America accounting for two-thirds of that total.119 Michael Schrage, of the Sloan School's Center for Coordination Science at MIT, concluded in the same year that "AI is—dollar for dollar—probably the best software development investment that smart companies have made."120 Frederick Hayes-Roth, in a wide-ranging and candid assessment, insisted that "KBS have attained a permanent and secure role in industry", even while admitting the many shortcomings of this technology.121 Those shortcomings weighed heavily on AI authority Daniel Crevier, who concluded that "the expert systems flaunted in the early and mid-1980s could not operate as well as the experts who supplied them with knowledge. To true human experts, they amounted to little more than sophisticated reminding lists."122 Even Edward Feigenbaum, the father of expert systems, has conceded that the products of the first generation have proven narrow, brittle, and isolated.123 As far as the SC agenda is concerned, Hayes-Roth's 1993 opinion is devastating: "The current generation of expert and KBS technologies had no hope of producing a robust and general human-like intelligence."124

…Each new [ALV] fea­ture and ca­pa­bil­ity brought with it a host of unan­tic­i­pated prob­lems. A new pan­ning sys­tem, in­stalled in early 1986 to per­mit the cam­era to turn as the road curved, un­ex­pect­edly caused the ve­hi­cle to veer back and forth un­til it ran off the road al­to­geth­er.45 The soft­ware glitch was soon fixed, but the pan­ning sys­tem had to be scrapped any­way; the heavy, 40-pound cam­era stripped the de­vice’s gears when­ever the ve­hi­cle made a sud­den stop.46 Given such unan­tic­i­pated diffi­cul­ties and de­lays, Mar­tin in­creas­ingly di­rected its efforts to­ward achiev­ing just the spe­cific ca­pa­bil­i­ties re­quired by the mile­stones, at the ex­pense of de­vel­op­ing more gen­eral ca­pa­bil­i­ties. One of the lessons of the first demon­stra­tion, ac­cord­ing to the ALV en­gi­neers, was the im­por­tance of defin­ing “ex­pected ex­per­i­men­tal re­sults”, be­cause “too much time was wasted do­ing things not ap­pro­pri­ate to proof of con­cept.”47 Mar­t­in’s se­lec­tion of tech­nol­ogy was con­ser­v­a­tive. It had to be, as the ALV pro­gram could afford nei­ther the lost time nor the bad pub­lic­ity that a ma­jor fail­ure would bring. One BDM ob­server ex­pressed con­cern that the pres­sure of the demon­stra­tions was en­cour­ag­ing Mar­tin to cut cor­ners, for in­stance by us­ing the “flat earth” al­go­rithm with its two-di­men­sional rep­re­sen­ta­tion. ADS’s ob­sta­cle-avoid­ance al­go­rithm was so nar­rowly fo­cused that the com­pany was un­able to test it in a park­ing lot; it worked only on roads.84…The vi­sion sys­tem proved highly sen­si­tive to en­vi­ron­men­tal con­di­tion­s—the qual­ity of light, the lo­ca­tion of the sun, shad­ows, and so on. The sys­tem worked differ­ently from month to mon­th, day to day, and even test to test. Some­times it could ac­cu­rately lo­cate the edge of the road, some­times not. The sys­tem re­li­ably dis­tin­guished the pave­ment of the road from the dirt on the shoul­ders, but it was fooled by dirt that was tracked onto the road­way by heavy ve­hi­cles ma­neu­ver­ing around the ALV. In the fall, the sun, now lower in the sky, re­flected bril­liantly off the myr­i­ads of pol­ished peb­bles in the tar­mac it­self, pro­duc­ing glit­ter­ing re­flec­tions that con­fused the ve­hi­cle. Shad­ows from trees pre­sented prob­lems, as did as­phalt patches from the fre­quent road re­pairs made nec­es­sary by the harsh Col­orado weather and the con­stant pound­ing of the eight-ton ve­hi­cle.42

…Knowl­edge-based sys­tems in par­tic­u­lar were diffi­cult to ap­ply out­side the en­vi­ron­ment for which they had been de­vel­oped. A vi­sion sys­tem de­vel­oped for au­tonomous nav­i­ga­tion, for ex­am­ple, prob­a­bly would not prove effec­tive for an au­to­mated man­u­fac­tur­ing as­sem­bly line. “There’s no sin­gle uni­ver­sal mech­a­nism for prob­lem solv­ing”, Amarel would later say, “but de­pend­ing on what you know about a prob­lem, and how you rep­re­sent what you know about the prob­lem, you may use one of a num­ber of ap­pro­pri­ate mech­a­nisms.”…In an­other ma­jor shift in em­pha­sis, SC2 re­moved “ma­chine in­tel­li­gence” from its own plateau on the pyra­mid, sub­sum­ing it un­der the gen­eral head­ing “soft­ware”. This seem­ingly mi­nor shift in nomen­cla­ture sig­naled a pro­found recon­cep­tu­al­iza­tion of AI, both within DARPA and through­out much of the com­puter com­mu­ni­ty. The effer­ves­cent op­ti­mism of the early 1980s gave way to more sober ap­praisal. AI did not scale. In spite of im­pres­sive achieve­ments in some fields, de­sign­ers could not make sys­tems work at a level of com­plex­ity ap­proach­ing hu­man in­tel­li­gence. Ma­chines ex­celled at data stor­age and re­trieval; they lagged in judg­ment, learn­ing, and com­plex pat­tern recog­ni­tion.

…During SC, AI had proved unable to exploit the powerful machines developed in SC's architectures program to achieve Kahn's generic capability in machine intelligence. On the fine-grained level, AI, including many developments from the SC program, is ubiquitous in modern life. It inhabits everything from automobiles and consumer electronics to medical devices and instruments of the fine arts. Ironically, AI now performs miracles unimagined when SC began, though it can't do what SC promised.
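The "AI did not scale" verdict is partly sheer combinatorics. A toy sketch (mine, not the book's) of the wall that naive production systems hit: a rule with k variables over a working memory of n objects has on the order of n^k candidate instantiations to test on every match cycle, so growth in rules and facts swamps even the order-of-magnitude hardware gains SC actually delivered:

```python
# Naive rule matching: try every variable binding against the fact base.
from itertools import product

def naive_match(premises, facts, objects):
    """Enumerate all bindings of the rule's variables; keep those satisfying every premise."""
    variables = sorted({t for p in premises for t in p if t.startswith("?")})
    hits, tried = [], 0
    for binding in product(objects, repeat=len(variables)):
        tried += 1
        env = dict(zip(variables, binding))
        grounded = [tuple(env.get(t, t) for t in p) for p in premises]
        if all(g in facts for g in grounded):
            hits.append(env)
    return hits, tried

# A 3-variable stacking rule ("?x is on ?y, which is on ?z") over a toy blocks world:
rule = [("on", "?x", "?y"), ("on", "?y", "?z")]
for n in (10, 20, 40):
    objects = [f"b{i}" for i in range(n)]
    facts = {("on", f"b{i}", f"b{i+1}") for i in range(n - 1)}
    hits, tried = naive_match(rule, facts, objects)
    print(f"{n:3d} objects: {tried:7,d} bindings tried for {len(hits):2d} matches")
```

Real engines (RETE and its descendants) amortize some of this matching cost, but the exponent in the number of variables, and the brittleness of the hand-coded rules, remain.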

Given how people keep reaching back to the AI Winter in discussions of connectionism—I mean, deep learning—it's interesting to contrast the two paradigms.

While working on Wikipedia articles covering this era (including ones on high-profile successes like MYCIN/DENDRAL) back in 2009, I read many journals & magazines from the 1980s, the Lisp machine heyday, and even played with a Genera OS image in a VM; the more I read about AI, the MIT AI Lab, Lisp machines, the 'AI winter', and so on, the more impressed I was by the operating systems & tools (such as the sophisticated hypertext documentation & text editors and capabilities of Common Lisp & its predecessors, which still put contemporary OS ecosystems on Windows/Mac/Linux to shame in many ways33) and the less I was impressed by the actual AI algorithms of the era. In contrast, with deep learning, I am increasingly unimpressed by the surrounding ecosystem of software tools (with its endless layers of buggy Python & rigid C++) the more I use it, but more and more impressed by what is possible with deep learning.

Deep learning has long ago escaped into the commercial market; indeed, it is primarily driven by industry researchers at this point. The case studies are innumerable (and many are secret due to their considerable commercial value). DL handles grounding problems & raw sensory data well, and indeed struggles most on problems with richly formalized structures like hierarchies/categories/directed graphs (ML practitioners currently tend to use decision tree methods like XGBoost for those), or which require using rules & logical reasoning (somewhat like humans). Perhaps most importantly from the perspective of SCI and HPC, deep learning scales: it parallelizes in a number of ways, and it can soak up indefinite amounts of computing power & data. You can train a CNN on a few hundred or thousand images usefully34, but Facebook & Google have run experiments going from millions of images up to hundreds of millions and billions of images, and the CNNs steadily improve their performance on both their assigned task and in accomplishing transfer learning35. Similarly in reinforcement learning: the richer the resources available, the richer a NN can be trained (see "blessings of scale"). Even the self-driving car programs which are a byword for incompetence deal just fine with all the issues that bedeviled ALV by using, well, 'a single universal mechanism for problem solving' (which we call CNNs, which can do anything from image segmentation to human language translation). These points are all the more striking as there is no sign that hardware improvements are over or that any inherent limits have been hit; even the large-scale experiments criticized as 'boil the oceans' projects nevertheless spend what are trivial amounts of money by both global economic and R&D criteria, like a few million dollars of GPU time. But none of this could have been done in the 1980s, or early 1990s. (As Hinton says, why didn't connectionism work back then? Because the computers were thousands of times too slow, the datasets were thousands of times too small, and some of the neural network details like initializations & activations were broken.)
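That scaling behavior is usually summarized as a rough power law: test error falls as error ≈ a·n^(−b) in dataset size n, a straight line on log-log axes. A minimal sketch of fitting and extrapolating such a curve (the error rates below are made up purely for illustration):

```python
import math

# Hypothetical (made-up) error rates at increasing dataset sizes:
data = [(1e4, 0.30), (1e5, 0.19), (1e6, 0.12), (1e7, 0.075), (1e8, 0.047)]

# Least-squares fit of log(error) = log(a) - b * log(n).
xs = [math.log(n) for n, _ in data]
ys = [math.log(e) for _, e in data]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = -slope                      # power-law exponent
a = math.exp(my + b * mx)       # power-law prefactor
print(f"fitted error ~ {a:.2f} * n^-{b:.3f}")
print(f"extrapolated error at n=1e9: {a * 1e9 ** -b:.3f}")
```

The fitted exponent b is small, but it is what makes "throw more data and compute at it" a strategy that keeps paying off over many orders of magnitude.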

Considering all this, it's not a surprise that the AI part of SC didn't pan out and eventually got axed, as it should have. Sometimes the time is not ripe. Hero can invent the steam engine, but you don't get steam engine trains until it's steam engine train time, and the best intentions of all the bureaucrats in the world can't affect that much. The turnover in managers and political interference may well have been enough to "disrupt the careful orchestration that its ambitious agenda required", but this was more in the nature of shooting a dead horse. R&S seem, somewhat reluctantly, to ultimately assent to the view they critiqued at the beginning, held by the ARPA staff, that the failure of SC is primarily a demonstration of technological determinism rather than of social & political contingency, and more about the technology than the people:

…Thus, for all of their agency, their story appears to be one driven by the technology. If they were unable to socially construct this technology, to maintain agency over technological choice, does it then follow that some technological imperative shaped the SC trajectory, diverting it in the end from machine intelligence to high performance computing? Institutionally, SC is best understood as an analog of the development programs for the atomic bomb and ballistic missiles. An elaborate structure was created to sell the program, but in practice the plan bore little resemblance to day-to-day operations. Conceptually, SC is best understood by mixing Thomas Hughes's framework of large-scale technological systems with Giovanni Dosi's notions of research trajectories. Its experience does not quite map on Hughes's model because the managers could not or would not bring their reverse salients on line. It does not quite map on Dosi because the managers regularly dealt with more trajectories and more variables than Dosi anticipates in his analyses. In essence, the managers of SC were trying to research and develop a complex technological system. They succeeded in developing some components; they failed to connect them in a system. The overall program history suggests that at this level of basic or fundamental research it is best to aim for a broad range of capabilities within the technology base and leave integration to others…While the Fifth Generation program contributed significantly to Japan's national infrastructure in computer technology, it did not vault that country past the United States…SC played an important role, but even some SC supporters have noted that the Japanese were in any event headed on the wrong trajectory even before the United States mobilized itself to meet their challenge.

…In some ways the varying records of the SC applications shed light on the program models advanced by Kahn and Cooper at the outset. Cooper believed that the applications would pull technology development; Kahn believed that the evolving technology base would reveal what applications were possible. Kahn's appraisal looks more realistic in retrospect. It is clear that expert systems enjoyed significant success in planning applications. This made possible applications ranging from Naval Battle Management to DART. Vision did not make comparable progress, thus precluding achievement of the ambitious goals set for the ALV. Once again, the program went where the technology allowed. Some reverse salients resisted efforts to orchestrate advance of the entire field in concert. If one component in a system did not connect, the system did not connect.

In the final analysis, SC failed for want of connection.

Reading about SC furnishes an unexpected lesson about the importance of believing in Moore's Law and having techniques which can scale. What are we doing now which won't scale, and what waves are we paddling up instead of surfing?

Reverse Salients

Excerpts from The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine, Lesch 2006, describing Heinrich Hörlein's drug development programs & Thomas Edison's electrical programs as strategically aimed at "reverse salients": necessary steps which hold back the practical application of progress in an area, and where research efforts have disproportionate payoffs by removing a bottleneck.

From pg48, "A System of Invention", The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine, Lesch 2006:

Hörlein's attitude was based not simply, or even primarily, on the situation of any particular area of research considered in isolation, but on his comprehensive overview of advance in areas in which chemistry and biomedicine intersected. These areas shared a number of generic problems and solutions, for example, the need to isolate a substance (natural product, synthetic product, body substance) in chemically pure form, the need to synthesize the substance and to do so economically if it was to go on the market, and the need for pharmacological, chemotherapeutic, toxicological, and clinical testing of the substance. Hörlein's efforts to translate success in certain areas (vitamin deficiency disease, chemotherapy of protozoal infections) into optimism about possibilities in other areas (cancer, antibacterial chemotherapy) was characteristic. He regarded the chemical attack on disease as a many-fronted battle in which there was a generally advancing line but also many points at which advance was slow or arrested.

In this sense, Hörlein might be said to have thought—as Thomas Hughes has shown that Edison did—in terms of reverse salients and critical problems. Reverse salients are areas of research and development that are lagging in some obvious way behind the general line of advance. Critical problems are the research questions, cast in terms of the concrete particulars of currently available knowledge and technique and of specific exemplars or models (e.g., insulin, chemotherapy of sleeping sickness and malaria) that are solvable and whose solutions would eliminate the reverse salients.18

  1. On Edis­on, see Thomas P. Hugh­es, Net­works of Pow­er: Elec­tri­fi­ca­tion in West­ern So­ci­ety 1880–1930 (Bal­ti­more, MD: Johns Hop­kins Uni­ver­sity Press, 1983), 18–46.
  2. Ibid; and Thomas P. Hugh­es, “The evo­lu­tion of large tech­no­log­i­cal sys­tems”, in Wiebe E. Bijk­er, Thomas P. Hugh­es, and Trevor Pinch, ed­i­tors, The So­cial Con­struc­tion of Tech­no­log­i­cal Sys­tems (Cam­bridge, MA: The MIT Press, 1987)

…What was sys­temic in Hör­lein’s way of think­ing was his con­cept of the or­ga­ni­za­tional pat­tern or pat­terns that will best fa­cil­i­tate the pro­duc­tion of valu­able re­sults in the ar­eas in which med­i­cine and chem­istry in­ter­act. A valu­able out­come is a re­sult that has prac­ti­cal im­por­tance for clin­i­cal or pre­ven­tive med­i­cine and, im­plic­it­ly, com­mer­cial value for in­dus­try. Hör­lein per­ceived a need for a set of mu­tu­ally com­ple­men­tary in­sti­tu­tions and trained per­son­nel whose in­ter­ac­tion pro­duces the de­sired re­sults. The or­ga­ni­za­tional pat­tern that emerges more or less clearly from Hör­lein’s lec­tures is closely as­so­ci­ated with his view of the typ­i­cal phases or cy­cles of de­vel­op­ment of re­search in chemother­apy or phys­i­o­log­i­cal chem­istry. He saw a need for friendly and mu­tu­ally sup­port­ive re­la­tions be­tween in­dus­trial re­search and de­vel­op­ment or­ga­ni­za­tions, aca­d­e­mic in­sti­tu­tions, and clin­i­cians. He viewed the aca­d­e­mic-in­dus­trial con­nec­tion as cru­cial and mu­tu­ally ben­e­fi­cial. Un­der­ly­ing this view was his de­fi­n­i­tion and differ­en­ti­a­tion of the rel­e­vant dis­ci­plines and his be­lief in their gen­er­ally ex­cel­lent con­di­tion in Ger­many. He saw a need for gov­ern­ment sup­port of ap­pro­pri­ate in­sti­tu­tions, es­pe­cially re­search in­sti­tutes in uni­ver­si­ties. Within in­dus­trial re­search or­ga­ni­za­tion­s—and, im­plic­it­ly, within aca­d­e­mic ones—Hör­lein called for spe­cial in­sti­tu­tional arrange­ments to en­cour­age ap­pro­pri­ate in­ter­ac­tions be­tween chem­istry and bio­med­i­cine.

An el­e­ment of cru­cial—and to Hör­lein, per­son­al—im­por­tance in these in­ter­ac­tions was the role of the re­search man­ager or “team leader.” When Hör­lein spoke of the re­search done un­der his di­rec­tion as “our work,” he used the pos­ses­sive ad­vis­edly to con­vey a strong sense of his own par­tic­i­pa­tion. The re­search man­ager had to be ac­tive in defin­ing goals, in mar­shal­ing means and re­sources, and in as­sess­ing suc­cess or fail­ure. He had to in­ter­vene where nec­es­sary to min­i­mize fric­tion be­tween chemists and med­ical re­searchers, an es­pe­cially im­por­tant task for chemother­apy as a com­pos­ite en­ti­ty. He had to pub­li­cize the com­pa­ny’s suc­cess­es—a ne­ces­sity for what was ul­ti­mately a com­mer­cial en­ter­prise—and act as li­ai­son be­tween com­pany lab­o­ra­to­ries and the aca­d­e­mic and med­ical com­mu­ni­ties. Through it all, he had to take a long view of the value of re­search, not in­sist­ing on im­me­di­ate re­sults of med­ical or com­mer­cial val­ue.

As a re­search man­ager with train­ing and ex­pe­ri­ence in phar­ma­ceu­ti­cal chem­istry, a lively in­ter­est in med­i­cine, and rap­port with the med­ical com­mu­ni­ty, Hör­lein was well po­si­tioned to sur­vey the field where chem­istry and med­i­cine joined bat­tle against dis­ease. He could spot the points where the en­e­my’s line was bro­ken, and the re­verse salients in his own. What he could not do—or could not do alone—was to di­rect the day-to-day op­er­a­tions of his troops, that is, to de­fine the crit­i­cal prob­lems to be solved, to iden­tify the terms of their so­lu­tion, and to do the work that would carry the day. In the case of chemother­a­py, these things could be effected only by the med­ical re­searcher and the chemist, each work­ing on his own do­main, and co­op­er­a­tive­ly. For his at­tack on one of the most im­por­tant re­verse salients—the chemother­apy of bac­te­r­ial in­fec­tion­s—Hör­lein called upon the med­ical re­searcher Do­magk and the chemists Mi­et­zsch and Klar­er.

“Investing in Good Ideas That Look Like Bad Ideas”

Summary by one VC of a16z's investment strategy.

Secrets of Sand Hill Road: Venture Capital and How to Get It, by Scott Kupor 2019, excerpts

In a strange way, sometimes familiarity can breed contempt—and conversely, the distance from the problem that comes from having a completely different professional background might actually make one a better founder. Though not venture backed, Southwest Airlines was cofounded in 1967 by Herb Kelleher and of course has gone on to become a very successful business. When interviewed many years later about why, despite being a lawyer by training, he was the natural founder for an airline business, Kelleher quipped: "I knew nothing about airlines, which I think made me eminently qualified to start one, because what we tried to do at Southwest was get away from the traditional way that airlines had done business."

This has his­tor­i­cally been less typ­i­cal in the ven­ture world, but, in­creas­ing­ly, as en­tre­pre­neurs take on more es­tab­lished in­dus­tries—­par­tic­u­larly those that are reg­u­lat­ed—bring­ing a view of the mar­ket that is un­con­strained by pre­vi­ous pro­fes­sional ex­pe­ri­ences may in fact be a plus. We often joke at a16z that there is a ten­dency to “fight the last bat­tle” in an area in which one has long-s­tand­ing pro­fes­sional ex­po­sure; the scars from pre­vi­ous mis­takes run too deep and can make it harder for one to de­velop cre­ative ways to ad­dress the busi­ness prob­lem at hand. Per­haps had Kelle­her known in­ti­mately of all the chal­lenges of en­ter­ing the air­line busi­ness, he would have run scream­ing from the chal­lenge ver­sus de­cid­ing to take on the full set of risks.

What­ever the ev­i­dence, the fun­da­men­tal ques­tion VCs are try­ing to an­swer is: Why back this founder against this prob­lem set ver­sus wait­ing to see who else may come along with a bet­ter or­ganic un­der­stand­ing of the prob­lem? Can I con­ceive of a team bet­ter equipped to ad­dress the mar­ket needs that might walk through our doors to­mor­row? If the an­swer is no, then this is the team to back.

The third big area of team in­ves­ti­ga­tion for VCs fo­cuses on the founder’s lead­er­ship abil­i­ties. In par­tic­u­lar, VCs are try­ing to de­ter­mine whether this founder will be able to cre­ate a com­pelling story around the com­pany mis­sion in or­der to at­tract great en­gi­neers, ex­ec­u­tives, sales and mar­ket­ing peo­ple, etc. In the same vein, the founder has to be able to at­tract cus­tomers to buy the pro­duct, part­ners to help dis­trib­ute the pro­duct, and, even­tu­al­ly, other VCs to fund the busi­ness be­yond the ini­tial round of fi­nanc­ing. Will the founder be able to ex­plain her vi­sion in a way that causes oth­ers to want to join her on this mis­sion? And will she walk through walls when the go­ing gets tough—which it in­evitably will in nearly all star­tup­s—and sim­ply refuse to even con­sider quit­ting?

When Marc Andreessen and Ben Horowitz first started Andreessen Horowitz, they described this founder leadership capability as "egomaniacal." Their theory—notwithstanding the choice of words—was that to make the decision to be a founder (a job fraught with likely failure), an individual needed to be so confident in her abilities to succeed that she would border on being so self-absorbed as to be truly egomaniacal. As you might imagine, the use of that term in our fund-raising deck for our first fund struck a chord with a number of our potential investors, who worried that we would back insufferable founders. We ultimately chose to abandon our word choice, but the principle remains today: You have to be partly delusional to start a company given the prospects of success and the need to keep pushing forward in the wake of the constant stream of doubters.

After all, nonobvious ideas that could in fact become big businesses are by definition nonobvious. My partner Chris Dixon describes our job as VCs as investing in good ideas that look like bad ideas. If you think about the spectrum of things in which you could invest, there are good ideas that look like good ideas. These are tempting, but likely can't generate outsize returns because they are simply too obvious and invite too much competition that squeezes out the economic rents. Bad ideas that look like bad ideas are also easily dismissed; as the description implies, they are simply bad and thus likely to be trapdoors through which your investment dollars will vanish. The tempting deals are the bad ideas that look like good ideas, yet they ultimately contain some hidden flaw that reveals their true "badness". This leaves good VCs to invest in good ideas that look like bad ideas—hidden gems that probably take a slightly delusional or unconventional founder to pursue. For if they were obviously good ideas, they would never produce venture returns.
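The quadrant argument is at bottom a claim about competition dividing returns. A toy numerical version (my sketch, not Kupor's): let an idea's realized per-investor return be its true quality divided among everyone who pursues it, with the number of pursuers tracking how good the idea looks. The quality figures and crowd size below are arbitrary illustration.

```python
# Toy model: per-investor return = true quality / number of pursuers,
# where pursuers flock to ideas that *look* good regardless of truth.
def per_investor_return(true_quality, looks_good, crowd_if_obvious=20):
    pursuers = crowd_if_obvious if looks_good else 1
    return true_quality / pursuers

quadrants = {
    "good idea, looks good": (100, True),   # real value, but heavily contested
    "good idea, looks bad":  (100, False),  # real value, almost no competition
    "bad idea, looks good":  (0,   True),   # the tempting trapdoor
    "bad idea, looks bad":   (0,   False),  # easily dismissed
}
for name, (quality, looks_good) in quadrants.items():
    print(f"{name:24s} -> per-investor return {per_investor_return(quality, looks_good):6.1f}")
```

Only the good-idea-that-looks-bad quadrant combines high quality with low competition, which is the whole of the argument in one division.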


1. Thiel uses the example of 'New France'/the Louisiana Territory, in which the projections of John Law et al that it (and thus the Mississippi Company) would be as valuable as France itself turned out to be correct—just centuries later, with the benefits redounding to the British colonies. Even the Mississippi Company worked out: "The ships that went abroad on behalf of his great company began to turn a profit. The auditor who went through the company's books concluded that it was entirely solvent—which isn't surprising, when you consider that the lands it owned in America now produce trillions of dollars in economic value." One could also say the same thing of China: countless European observers forecast that China was a 'sleeping giant' which, once it industrialized & modernized, would again be a global power. They were correct, but many of them would be surprised & disappointed how long it took.↩︎

2. #326, "Part II. The Wanderer And His Shadow", Nietzsche.↩︎

3. The Intellectual Ventures patent troll company is also featured in Malcolm Gladwell's essay on multiple invention, "In the Air: Who says big ideas are rare?"; IV's business model is to spew out patents for speculations that other people will then actually invent, who can then be extorted for license fees when they make the inventions work in the real world & produce value. (This is assisted by the fact that patents no longer require even the pretense of a working model.) As Bill Gates says, "I can give you fifty examples of ideas they've had where, if you take just one of them, you'd have a startup company right there."—that this model works demonstrates the commonness of 'multiples', the worthlessness of ideas, and the moral bankruptcy of the current patent system.↩︎

4. At the margin, compared to other competitors in the VR space, like Valve's concurrent efforts, and everything that the Rift built on, did Luckey and co really create ~$2.3 billion (2014; ~$2.8 billion in current dollars) of new value? Or were they lucky in trying at the right time, and merely captured all of the value, because a 99%-adequate VR headset is worth 0%, and they added the final 1%? If the latter, how could IP or economics be fixed to link intermediate contributions to the final result more closely and so approach a fairer distribution like the Shapley value, instead of last-mover winner-take-all dynamics?↩︎

  5. Bene­dict Evans (“In Praise of Fail­ure”) sum­ma­rizes the prob­lem:

    It’s pos­si­ble for a few peo­ple to take an idea and cre­ate a real com­pany worth bil­lions of dol­lars in less than a decade—to go from an idea and a few notes to Google or Face­book, or for that mat­ter or Ner­vana [?]. It’s pos­si­ble for en­tre­pre­neurs to cre­ate some­thing with huge im­pact.

    But equal­ly, any­thing with that much po­ten­tial has a high like­li­hood of fail­ure—if it was ob­vi­ously a good idea with no risks, every­one would be do­ing it. In­deed, it’s in­her­ent in re­ally trans­for­ma­tive ideas that they look like bad ideas—­Google, Ap­ple, Face­book and Ama­zon all did, some­times sev­eral times over. In hind­sight the things that worked look like good ideas and the ones that failed look stu­pid, but sadly it’s not that ob­vi­ous at the time. Rather, this is how the process of in­ven­tion and cre­ation works. We try things—we try to cre­ate com­pa­nies, prod­ucts and ideas, and some­times they work, and some­times they change the world. And so, we see, in our world around half such at­tempts fail com­plete­ly, and 5% or so go to the moon.

    It’s worth not­ing that ‘looks like a bad idea’ is flex­i­ble here: I em­pha­size that many good ideas look like bad ideas be­cause they’ve been tried be­fore & failed, but many oth­ers look bad be­cause a nec­es­sary change has­n’t yet hap­pened or peo­ple un­der­es­ti­mate ex­ist­ing tech­nol­o­gy.↩︎

6. How about Snapchat? Clearly doomed to failure when they refused Facebook's $3 billion (2013; ~$4 billion in current dollars) buyout offer and Facebook launched a direct competitor; by February 2020, they had failed their way to a marketcap of >$100b.↩︎

7. Where there is, as Musk describes it, a "graveyard of companies". It may be relevant to note that Musk did not found Tesla; the two co-founders ultimately quit the company.↩︎

8. As late as 2007–2008, Blockbuster could have still beaten Netflix, as its "Total Access" program demonstrated, but CEO changes scuppered its last chance. And, incidentally, offering an example of why stock markets are fine with paying executives so much: a good executive can create—or destroy—the entire company. If Blockbuster's CEO had paid a pittance in ~2000 to acquihire Netflix & put Reed Hastings in charge, or if it had simply stuck with its CEO in 2007 to strangle Netflix with Total Access, its shareholders would be far better off now. But it didn't.↩︎

  9. “Al­most Wikipedia: Eight Early En­cy­clo­pe­dia Projects and the Mech­a­nisms of Col­lec­tive Ac­tion”, Hill 2013; “Al­most-Wikipedias and in­no­va­tion in free col­lab­o­ra­tion pro­jects: why did 7 pre­de­ces­sors fail?”.↩︎

  10. Find­ing out these tid­bits is one rea­son I en­joyed read­ing Founders at Work: Sto­ries of Star­tups’ Early Days (ed Liv­ingston 2009; “In­tro­duc­tion”), be­cause the chal­lenges are not al­ways what you think they are. Pay­Pal’s ma­jor chal­lenge, for ex­am­ple, was not find­ing a mar­ket like eBay power sell­ers, but cop­ing with fraud as they scaled, which ap­par­ently was the un­do­ing of any num­ber of ri­vals.↩︎

  11. Per­son­al­ly, I was still us­ing Dog­pile un­til at least 2000.↩︎

  12. From Frock 2006, Chang­ing How the World Does Busi­ness: Fedex’s In­cred­i­ble Jour­ney to Suc­cess, in 1973:

    On sev­eral oc­ca­sions, we came within an inch of fail­ure, be­cause of dwin­dling fi­nan­cial re­sources, reg­u­la­tory road­blocks, or un­fore­seen events like the Arab oil em­bar­go. On­ce, Fred’s luck at the gam­ing ta­bles of Las Ve­gas helped to save the com­pany from fi­nan­cial dis­as­ter. An­other time, we had to ask our em­ploy­ees to hold their pay­checks while we waited for the next wave of fi­nanc­ing…Fred dumped his en­tire in­her­i­tance into the com­pany and was full speed ahead with­out con­cern for his per­sonal fi­nances.

…The loan guarantee from General Dynamics raised our hopes and increased our spirits, but also increased the pressure to finalize the private placement. We continued to be in desperate financial trouble, particularly with our suppliers. The most demanding suppliers when it came to payments were the oil companies. Every Monday, they required Federal Express to prepay for the anticipated weekly usage of jet fuel. By mid-July our funds were so meager that on Friday we were down to about $5,000 (1973; ~$20,000 in current dollars) in the checking account, while we needed $24,000 (~$97,000) for the jet fuel payment. I was still commuting to Connecticut on the weekends and really did not know what was going to transpire on my return.

However, when I arrived back in Memphis on Monday morning, much to my surprise, the bank balance stood at nearly $32,000 (~$130,000 in current dollars). I asked Fred where the funds had come from, and he responded, "The meeting with the General Dynamics board was a bust and I knew we needed money for Monday, so I took a plane to Las Vegas and won $27,000." I said, "You mean you took our last $5,000—how could you do that?" He shrugged his shoulders and said, "What difference did it make? Without the funds for the fuel companies, we couldn't have flown anyway." Fred's luck held again. It was not much but it came at a critical time and kept us in business for another week.

This also illustrates how fine, and how visible only ex post, the line is between 'visionary founder' & 'criminal con artist'; had Frederick W. Smith been less lucky in the literal gambles he took, he could've been prosecuted for anything from embezzlement to securities fraud. As a matter of fact, Smith was prosecuted—for something else entirely:

    Fred now re­vealed that a year ear­lier [also in 1973] he had forged doc­u­ments in­di­cat­ing ap­proval of a loan guar­an­tee by the En­ter­prise Com­pany with­out con­sent of the other board mem­bers, specifi­cally his two sis­ters and Bobby Cox, the En­ter­prise sec­re­tary. Our re­spected leader ad­mit­ted his cul­pa­bil­ity to the Fed­eral Ex­press board of di­rec­tors and to the in­vestors and lenders we were count­ing on to sup­port the sec­ond round of the pri­vate place­ment fi­nanc­ing. While it is pos­si­ble to un­der­stand that, un­der ex­treme pres­sure, Fred was act­ing to save Fed­eral Ex­press from al­most cer­tain bank­rupt­cy, and even to em­pathize with what he did, it nev­er­the­less ap­peared to be a se­ri­ous breach of con­duc­t…De­cem­ber 1975 was also the month that set­tled the mat­ter of the forged loan guar­an­tee doc­u­ments for the Union Bank. At his tri­al, Fred tes­ti­fied that as pres­i­dent of the En­ter­prise board and with sup­port­ing let­ters from his sis­ters, he had au­thor­ity to com­mit the board. After 10 hours of de­lib­er­a­tion, he was ac­quit­ted. If con­vict­ed, he would have faced a prison term of up to five years.

    Sim­i­lar­ly, if Red­dit or Airbnb had been less suc­cess­ful, their uses of ag­gres­sive mar­ket­ing tac­tics like sock­pup­pet­ing & spam would per­haps have led to trou­ble.↩︎

  13. To bor­row a phrase from Kel­ly:

    The elec­tric in­can­des­cent light­bulb was in­vent­ed, rein­vent­ed, coin­vent­ed, or “first in­vented” dozens of times. In their book Edis­on’s Elec­tric Light: Bi­og­ra­phy of an In­ven­tion, Robert Friedel, Paul Is­rael, and Bernard Finn list 23 in­ven­tors of in­can­des­cent bulbs prior to Edi­son. It might be fairer to say that Edi­son was the very last “first” in­ven­tor of the elec­tric light. These 23 bulbs (each an orig­i­nal in its in­ven­tor’s eyes) var­ied tremen­dously in how they fleshed out the ab­strac­tion of “elec­tric light­bulb.” Differ­ent in­ven­tors em­ployed var­i­ous shapes for the fil­a­ment, differ­ent ma­te­ri­als for the wires, differ­ent strengths of elec­tric­i­ty, differ­ent plans for the bases. Yet they all seemed to be in­de­pen­dently aim­ing for the same ar­che­typal de­sign. We can think of the pro­to­types as 23 differ­ent at­tempts to de­scribe the in­evitable generic light­bulb.

    This happens even in literature: Doyle’s Sherlock Holmes stories weren’t the first to invent “clues”, but the last (Batuman 2005), with other detective fiction writers doing things that can only be called ‘grotesque’; Moretti, baffled, recounts that “one detective, having deduced that ‘the drug is in the third cup of coffee’, proceeds to drink the coffee.”

    To give a per­sonal ex­am­ple: while re­search­ing , sup­pos­edly in­vented in 2013, I dis­cov­ered that they had been in­vented at least 10 times dat­ing back to 1966.↩︎

  14. “Kafka And His Pre­cur­sors”, Borges 1951:

    At one time I considered writing a study of Kafka’s precursors. I had thought, at first, that he was as unique as the phoenix of rhetorical praise; after spending a little time with him, I felt I could recognize his voice, or his habits, in the texts of various literatures and various ages…If I am not mistaken, the heterogeneous pieces I have listed resemble Kafka; if I am not mistaken, not all of them resemble each other. This last fact is what is most significant. Kafka’s idiosyncrasy is present in each of these writings, to a greater or lesser degree, but if Kafka had not written, we would not perceive it; that is to say, it would not exist. The poem “Fears and Scruples” by Browning prophesies the work of Kafka, but our reading of Kafka noticeably refines and diverts our reading of the poem. Browning did not read it as we read it now. The word “precursor” is indispensable to the vocabulary of criticism, but one must try to purify it from any connotation of polemic or rivalry. The fact is that each writer creates his precursors. His work modifies our conception of the past, as it will modify the future. In this correlation, the identity or plurality of men doesn’t matter. The first Kafka of Betrachtung is less a precursor of the Kafka of the gloomy myths and terrifying institutions than is Browning or Lord Dunsany.

    ↩︎
  15. See also “An Oral His­tory of Nin­ten­do’s Power Glove”, and Poly­gon’s oral his­tory of the , .↩︎

  16. “In­te­grated Cog­ni­tive Sys­tems”, Michie 1970 (pg93–96 of Michie, On Ma­chine In­tel­li­gence):

    How long is it likely to be before a machine can be developed approximating to adult human standards of intellectual performance? In a recent poll [8], thirty-five out of forty-two people engaged in this sort of research gave estimates between ten and one hundred years. [8: European AISB Newsletter, no. 9, 4 (1969)] There is also fair agreement that the chief obstacles are not hardware limitations. The speed of light imposes theoretical bounds on rates of information transfer, so that it was once reasonable to wonder whether these limits, in conjunction with physical limits to microminiaturization of switching and conducting elements, might give the biological system an irreducible advantage. But recent estimates [9, 10], which are summarized in Tables 7.1 and 7.2, indicate that this is not so, and that the balance of advantage in terms of sheer information-handling power may eventually lie with the computer rather than the brain. It seems a reasonable guess that the bottleneck will never again lie in hardware speeds and storage capacities, as opposed to purely logical and programming problems. Granted that an ICS can be developed, is now the right time to mount the effort?

    ↩︎
  17. Michie 1970:

    Yet the prin­ci­ple of ‘un­ripe time’, dis­tilled by F. M. Corn­ford [15] more than half a cen­tury ago from the change­less stream of Cam­bridge aca­d­e­mic life, has pro­vided the epi­taph of more than one pre­ma­ture tech­nol­o­gy. The aero­plane in­dus­try can­not now re­deem Daedalus nor can the com­puter in­dus­try re­cover the money spent by the British Ad­mi­ralty more than a hun­dred years ago in sup­port of Charles Bab­bage and his cal­cu­lat­ing ma­chine. Al­though Bab­bage was one of Britain’s great in­no­v­a­tive ge­nius­es, sup­port of his work was wasted money in terms of tan­gi­ble re­turn on in­vest­ment. It is now ap­pre­ci­ated that of the fac­tors needed to make the stored-pro­gram dig­i­tal com­puter a tech­no­log­i­cal re­al­ity only one was miss­ing: the means to con­struct fast switch­ing el­e­ments. The greater part of a cen­tury had to elapse be­fore the vac­uum tube ar­rived on the scene.

    ↩︎
  18. Trans­la­tion, Kat­suki Seki­da, Two Zen Clas­sics: The Gate­less Gate and The Blue Cliff Records, 2005.↩︎

  19. Which as a side note is wrong; compiled predictions actually indicate that AI researcher forecasts, while varying anywhere from a decade to centuries, typically cluster around 20 years in the future regardless of researcher age. For a recent timeline survey, see Gruetzemacher et al 2019, and for more, AI Impacts. (One wonders if a 20-year forecast might be driven by an observation selection effect: in an exponentially-growing field, most researchers will be present in the final ‘generation’, and so a priori one could predict accurately that it will be 20 years to AI. In this regard, it is amusing to note the exponential growth of conferences like NIPS or ICML 2010–2019.)↩︎

  20. Michie 1970:

    …A fur­ther ap­pli­ca­tion of cri­te­rion 4 arises if the­o­ret­i­cal in­fea­si­bil­ity is demon­strat­ed…But it is well to look on such neg­a­tive proofs with cau­tion. The pos­si­bil­ity of broad­cast­ing ra­dio waves across the At­lantic was con­vinc­ingly ex­cluded by the­o­ret­i­cal analy­sis. This did not de­ter Mar­coni from the at­tempt, even though he was as un­aware of the ex­is­tence of the Heav­i­side layer as every­one else.

    ↩︎
  21. Michie 1970:

    It can rea­son­ably be said that time was un­ripe for dig­i­tal com­put­ing as an in­dus­trial tech­nol­o­gy. But it is by no means ob­vi­ous that it was un­ripe for Bab­bage’s re­search and de­vel­op­ment effort, if only it had been con­ceived in terms of a more se­verely de­lim­ited ob­jec­tive: the con­struc­tion of a work­ing mod­el. Such a de­vice would not have been aimed at the then un­at­tain­able goal of eco­nomic vi­a­bil­i­ty; but its suc­cess­ful demon­stra­tion might, just con­ceiv­ably, have greatly ac­cel­er­ated mat­ters when the time was fi­nally ripe. Vac­uum tube tech­nol­ogy was first ex­ploited for high­-speed dig­i­tal com­put­ing in Britain dur­ing the Sec­ond World War [16]. But it was left to Eck­ert and Mauchly [16] sev­eral years later to re­dis­cover and im­ple­ment the con­cep­tions of stored pro­grams and con­di­tional jumps, which had al­ready been present in Bab­bage’s an­a­lyt­i­cal en­gine [17]. Only then could the new tech­nol­ogy claim to have drawn level with Bab­bage’s de­sign ideas of a hun­dred years ear­li­er.

    ↩︎
  22. A kind of definition of Value of Information, from Richard Hamming’s “You and Your Research”:

    If you do not work on an im­por­tant prob­lem, it’s un­likely you’ll do im­por­tant work. It’s per­fectly ob­vi­ous. Great sci­en­tists have thought through, in a care­ful way, a num­ber of im­por­tant prob­lems in their field, and they keep an eye on won­der­ing how to at­tack them. Let me warn you, ‘im­por­tant prob­lem’ must be phrased care­ful­ly. The three out­stand­ing prob­lems in physics, in a cer­tain sense, were never worked on while I was at Bell Labs. By im­por­tant I mean guar­an­teed a No­bel Prize and any sum of money you want to men­tion. We did­n’t work on (1) time trav­el, (2) tele­por­ta­tion, and (3) anti­grav­i­ty. They are not im­por­tant prob­lems be­cause we do not have an at­tack. It’s not the con­se­quence that makes a prob­lem im­por­tant, it is that you have a rea­son­able at­tack. That is what makes a prob­lem im­por­tant.

    ↩︎
  23. “Ed Boy­den on Mind­ing your Brain (Ep. 64)”:

    BOYDEN: …One idea is, how do we find the di­a­monds in the rough, the big ideas but they’re kind of hid­den in plain sight? I think we see this a lot. Ma­chine learn­ing, deep learn­ing, is one of the hot top­ics of our time, but a lot of the math was worked out decades ago—back­prop­a­ga­tion, for ex­am­ple, in the 1980s and 1990s. What has changed since then is, no doubt, some im­prove­ments in the math­e­mat­ics, but large­ly, I think we’d all agree, bet­ter com­pute power and a lot more da­ta.

    So how could we find the trea­sure that’s hid­ing in plain sight? One of the ideas is to have sort of a SWAT team of peo­ple who go around look­ing for how to con­nect the dots all day long in these serendip­i­tous ways.

    COWEN: Two last ques­tions. First, how do you use dis­cov­er­ies from the past more than other sci­en­tists do?

    BOYDEN: One way to think of it is that, if a sci­en­tific topic is re­ally pop­u­lar and every­body’s do­ing it, then I don’t need to be part of that. What’s the ben­e­fit of be­ing the 100,000th per­son work­ing on some­thing?

    So I read a lot of old papers. I read a lot of things that might be forgotten because I think that there’s a lot of treasure hiding in plain sight. As we discussed earlier, optogenetics and expansion microscopy both begin from papers from other fields, some of which are quite old and which mostly had been ignored by other people.

    I some­times prac­tice what I call ‘fail­ure re­boot­ing’. We tried some­thing, or some­body else tried some­thing, and it did­n’t work. But you know what? Some­thing hap­pened that made the world differ­ent. Maybe some­body found a new gene. Maybe com­put­ers are faster. Maybe some other dis­cov­ery from left field has changed how we think about things. And you know what? That old failed idea might be ready for prime time.

    With op­to­ge­net­ics, peo­ple were try­ing to con­trol brain cells with light go­ing back to 1971. I was ac­tu­ally read­ing some ear­lier pa­pers. There were peo­ple play­ing around with con­trol­ling brain cells with light go­ing back to the 1940s. What is differ­ent? Well, this class of mol­e­cules that we put into neu­rons had­n’t been dis­cov­ered yet.

    ↩︎
  24. “Was Moore’s Law In­evitable?”, Kevin Kelly again:

    Lis­ten to the tech­nol­o­gy, Carver Mead says. What do the curves say? Imag­ine it is 1965. You’ve seen the curves Gor­don Moore dis­cov­ered. What if you be­lieved the story they were try­ing to tell us: that each year, as sure as win­ter fol­lows sum­mer, and day fol­lows night, com­put­ers would get half again bet­ter, and half again small­er, and half again cheap­er, year after year, and that in 5 decades they would be 30 mil­lion times more pow­er­ful than they were then, and cheap. If you were sure of that back then, or even mostly per­suad­ed, and if a lot of oth­ers were as well, what good for­tune you could have har­vest­ed. You would have needed no other prophe­cies, no other pre­dic­tions, no other de­tails. Just know­ing that sin­gle tra­jec­tory of Moore’s, and none oth­er, we would have ed­u­cated differ­ent­ly, in­vested differ­ent­ly, pre­pared more wisely to grasp the amaz­ing pow­ers it would sprout.
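
    (For scale: Kelly’s “30 million times more powerful” over 5 decades corresponds to the canonical doubling every 2 years, since 2^25 ≈ 33.5 million.)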

    ↩︎
  25. It’s not enough to theorize about the possibility or prototype something in the lab if there is then no followup. The motivation to take something into the ‘real world’, which necessarily requires attacking the reverse salients, may be part of why corporate & academic research are both necessary; too little of either creates a bottleneck. A place like Bell Labs benefits from remaining in contact with the needs of commerce, as it provides a check on l’art pour l’art pathologies, a fertile source of problems, and can feed back the benefits of mass production/experience curves. (Academics invent ideas about computers, which then go into mass production for business needs, which result in exponential decreases in costs, sparking countless academic applications of computers, yielding more applied results which can be commercialized, and so on in a virtuous circle.) In recent times, corporate research has diminished, and that may be a bad thing: Arora et al 2020.↩︎

  26. One might appeal to the Kelly criterion as a guide to how much individuals should wager on experiments, since the Kelly criterion gives optimal growth of wealth over the long-term while avoiding gambler’s ruin, but given the extremely small number of ‘wagers’ an individual engages in, with a highly finite horizon, the Kelly criterion’s assumptions are far from satisfied, and the true optimal strategy can be radically different from a naive Kelly criterion; I explore this difference more in , which is motivated by stock-market investing.
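
    As a toy illustration (a sketch of mine, with invented numbers, not from the essay), the Kelly fraction for a binary bet is simple to compute, and a small Monte Carlo shows how unreliable even ‘optimal’ betting is over a handful of wagers:

    ```python
    # Toy sketch (all numbers invented): Kelly fraction for a binary bet,
    # plus a Monte Carlo of *short-horizon* outcomes, where the Kelly
    # optimality guarantees have not yet kicked in.
    import random

    def kelly_fraction(p, b):
        """Optimal asymptotic fraction for win-probability p at net odds b."""
        return p - (1 - p) / b

    def median_wealth(fraction, p=0.6, b=1.0, n_bets=20, n_runs=20_000, seed=0):
        rng = random.Random(seed)
        finals = []
        for _ in range(n_runs):
            w = 1.0
            for _ in range(n_bets):
                stake = fraction * w
                w += stake * b if rng.random() < p else -stake
            finals.append(w)
        finals.sort()
        return finals[n_runs // 2]

    f = kelly_fraction(0.6, 1.0)  # = 0.20 for a 60/40 even-odds bet
    for frac in (0.5 * f, f, 2 * f):
        print(f"fraction {frac:.2f}: median wealth after 20 bets = {median_wealth(frac):.2f}")
    ```

    Over 20 bets, half-Kelly’s median outcome is nearly as good as full Kelly’s with less variance, and twice-Kelly’s median actually shrinks; the long-run guarantees say nothing about horizons this short.

    ↩︎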

  27. Thompson sampling, incidentally, has itself been rediscovered repeatedly in the decades since Thompson 1933.↩︎

  28. PSRL (posterior sampling for reinforcement learning; see also Ghavamzadeh et al 2016) generalizes Thompson sampling to more complex problems, MDPs or POMDPs in general, by, for each iteration, assuming an entire collection or distribution of possible environments (which are more complex than a single-step bandit), picking an environment at random based on its probability of being the real environment, finding the optimal actions for that one, and then acting on that solution; this achieves the same smooth balancing of exploration with exploitation. Normal PSRL requires ‘episodes’, which don’t really have a real-world equivalent, but PSRL can be extended to handle continuous action—a nice example is , which does ‘back off’ by periodically stopping and re-evaluating the optimal strategy based on accumulated evidence, but less & less often, so it does PSRL over increasingly large time windows.
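
    To make the underlying mechanism concrete, here is a minimal hypothetical sketch (mine, not from the cited papers) of Thompson sampling on a Bernoulli bandit, the single-step case that PSRL generalizes: draw one plausible environment from the posterior, act optimally for that draw, then update on the outcome:

    ```python
    # Hypothetical sketch (not from the cited papers): Thompson sampling
    # on a Bernoulli bandit. Each step: draw one plausible environment
    # (per-arm success rates) from the posterior, act optimally for that
    # draw, then update the posterior with the observed outcome.
    import random

    def thompson_bandit(true_rates, n_steps=10_000, seed=0):
        rng = random.Random(seed)
        k = len(true_rates)
        wins = [1] * k    # Beta(1,1) uniform prior per arm
        losses = [1] * k
        total = 0
        for _ in range(n_steps):
            sampled = [rng.betavariate(wins[i], losses[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: sampled[i])  # optimal for the draw
            reward = rng.random() < true_rates[arm]
            total += reward
            (wins if reward else losses)[arm] += 1
        return total

    # Mostly converges on the 0.55 arm while still occasionally probing others:
    print(thompson_bandit([0.10, 0.50, 0.55]))
    ```

    Early on, the wide Beta posteriors make arm choice close to random exploration; as evidence accumulates, the samples concentrate and the best arm dominates, with occasional long shots persisting.

    ↩︎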

  29. Polarizing here could reflect a wide posterior value distribution, or a posterior being approximated by something like a mixture of experts or an ensemble of multiple models (like running multiple passes over a dropout-trained neural network, or a bootstrapped neural network ensemble). In a human setting, it might be polarizing in the sense of human peer-reviewers arguing the most about it, or having the least inter-rater agreement or highest variance of ratings.

    As Goldstein & Kearney 2017 describe in their analysis of the numerical peer-reviewer ratings of ARPA-E proposals:

    In other words, ARPA-E PDs tend to fund pro­pos­als on which re­view­ers dis­agree, given the same mean over­all score. When min­i­mum and max­i­mum score are in­cluded in the same mod­el, the co­effi­cient on min­i­mum score dis­ap­pears. This sug­gests that ARPA-E PDs are more likely to se­lect pro­pos­als that were high­ly-rated by at least one re­view­er, but they are not de­terred by the pres­ence of a low rat­ing. This trend per­sists when me­dian score is in­cluded (Model 7 in Ta­ble 3). ARPA-E PDs tend to agree with the bulk of re­view­ers, and they also tend to agree with scores in the up­per tail of the dis­tri­b­u­tion. They use their dis­cre­tion to sur­face pro­pos­als that have at least one cham­pi­on, re­gard­less of whether there are any de­trac­tors…The re­sults show that there is greater ex ante un­cer­tainty in the ARPA-E re­search port­fo­lio com­pared to pro­pos­als with the high­est mean scores (Model 1).
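
    A toy simulation (my illustration, with invented score distributions, not the paper’s data) shows why ranking by maximum reviewer score (“at least one champion”) surfaces more contentious proposals than ranking by mean score:

    ```python
    # Toy illustration (invented score distributions, not the paper's data):
    # ranking proposals by their *maximum* reviewer score ("at least one
    # champion") surfaces more contentious picks than ranking by the mean.
    import random

    rng = random.Random(0)
    # 200 hypothetical proposals, 4 reviewer scores each (roughly a 1-10 scale):
    proposals = [[rng.gauss(5, 2) for _ in range(4)] for _ in range(200)]

    top_by_mean = sorted(proposals, key=lambda s: sum(s) / len(s), reverse=True)[:20]
    top_by_max = sorted(proposals, key=max, reverse=True)[:20]

    def disagreement(batch):
        """Average within-proposal score range (max minus min)."""
        return sum(max(s) - min(s) for s in batch) / len(batch)

    print(f"disagreement among mean-ranked picks: {disagreement(top_by_mean):.2f}")
    print(f"disagreement among max-ranked picks:  {disagreement(top_by_max):.2f}")
    ```

    Under these assumptions, the max-ranked batch shows markedly higher within-proposal disagreement: champions tolerated alongside detractors, ie. greater ex ante uncertainty.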

    ↩︎
  30. The different pathologies might be: small ones will collectively try lots of strange or novel ideas but will fail by running underpowered, poorly-done experiments (for lack of funding & expertise) which convince no one, suffer from small-study biases, and merely pollute the literature, giving meta-analysts migraines. Large ones can run large long-term projects investigating something thoroughly, but then err by being full of inefficient bureaucracy and overcentralization, killing promising lines of research because a well-placed insider dislikes them or simply can’t be bothered, and can use their heft to withhold data or suppress results via peer review. A collection of medium-sized institutes might avoid these pathologies by being small enough to still be open to new ideas, while there are enough of them that any attempt to squash promising research can be evaded by relocating to another institute, and any research requiring large-scale resources can be done by a consortium of medium institutes.

    Mod­ern ge­nomics strikes me as a bit like this. Can­di­date-gene stud­ies were done by every Tom, Dick, and Har­ry, but the method­ol­ogy failed com­pletely be­cause sam­ple sizes many or­ders of mag­ni­tude larger were nec­es­sary. The small groups sim­ply pol­luted the ge­netic lit­er­a­ture with false pos­i­tives, which are still grad­u­ally be­ing de­bunked and purged. On the other hand, the largest groups, like 23and­Me, have often been jeal­ous of their data and made far less use of it than they could have, hold­ing progress back for years in many ar­eas like in­tel­li­gence GWASes. The UK Biobank has pro­duced an amaz­ing amount of re­search for a large group, but is the ex­cep­tion that proves the rule: their open­ness to re­searchers is (sad­ly) ex­tra­or­di­nar­ily un­usu­al. Much progress has come from groups like SSGAC or PGC, which are con­sor­tiums of groups of all sizes (with some highly con­di­tional par­tic­i­pa­tion from 23and­Me).↩︎

  31. Iron­i­cal­ly, as I write this in 2018, DARPA has re­cently an­nounced an­other at­tempt at “sil­i­con com­pil­ers”, pre­sum­ably sparked by com­mod­ity chips top­ping out and ASICs be­ing re­quired, which I can only sum­ma­rize as “ but let’s do it sanely this time and with FLOSS rather than a crazy pro­pri­etary ecosys­tem of crap”.↩︎

  32. Specifically, contemporary computers don’t use the dense grid of 1-bit processors with local memory which characterized the CM. They do increasingly feature thousands of ‘processor’ equivalents in the form of CPU cores and GPU cores, but those are all far more powerful than a CM CPU node. But we might yet see some convergence with the CM thanks to neural networks: neural networks are typically trained with wastefully precise floating-point operations, slowing them down, thus the rise of ‘tensor cores’ and ‘TPUs’ using lower precision, like 8-bit integers, and it is possible to discretize neural nets all the way down to binary weights. This offers a lot of potential electricity savings, and if you have binary weights, why not binary computing elements as well…?
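
    A minimal sketch of the binary-weight idea (mine, in the style of BinaryConnect/XNOR-Net, with all numbers illustrative): binarize weights to ±1 for the forward pass while keeping full-precision ‘shadow’ weights for training; with activations binarized as well, each multiply-accumulate could become an XNOR-plus-popcount:

    ```python
    # Illustrative sketch (mine) of binary-weight networks, simplified from
    # the BinaryConnect/XNOR-Net recipes: weights are binarized to +/-1 in
    # the forward pass, keeping full-precision "shadow" weights for training,
    # with a per-layer scale alpha = mean(|w|) recovering the magnitude.
    import numpy as np

    def binarize(w):
        return np.where(w >= 0, 1.0, -1.0)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(256)   # activations
    w = rng.standard_normal(256)   # full-precision "shadow" weights

    alpha = np.mean(np.abs(w))     # XNOR-Net-style scale factor
    y_full = x @ w
    y_bin = alpha * (x @ binarize(w))  # tracks y_full (corr ~0.8 for Gaussian w)

    print(f"full precision: {y_full:+.2f}  binarized+scaled: {y_bin:+.2f}")
    ```

    ↩︎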

  33. Which could also be said of Engelbart’s original vision, as only partially demonstrated in the 1968 “Mother of All Demos”.↩︎

  34. People tend to ignore this, but CNNs can work with a few hundred or even just one or two images, using transfer learning, few-shot learning, and aggressive regularization like data augmentation.
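
    A hedged sketch of the standard transfer-learning recipe (mine, not the essay’s; assumes the torchvision ≥ 0.13 weights API): freeze a backbone pretrained on a large dataset and train only a small new head on the few available (augmented) images:

    ```python
    # Hedged sketch (mine; assumes torchvision >= 0.13's weights API):
    # freeze an ImageNet-pretrained backbone and train only a small new
    # head, so a few hundred (augmented) images can suffice.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                     # freeze pretrained features
    model.fc = nn.Linear(model.fc.in_features, 2)   # new trainable 2-class head

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Stand-in for a real (augmented) mini-batch of labeled images:
    x = torch.randn(8, 3, 224, 224)
    y = torch.randint(0, 2, (8,))

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                                 # gradients reach only the head
    optimizer.step()
    print(f"one training step done, loss = {loss.item():.3f}")
    ```

    ↩︎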

  35. While the accuracy rates may increase by what looks like a tiny amount, and one might ask how important a change from 99% to 99.9% accuracy is (a 10× reduction in error rate), the large-scale training papers demonstrate that neural nets continue to learn hidden knowledge from the additional data, which provides ever better semantic features that can be reused elsewhere.↩︎