Timing Technology: Lessons From The Media Lab

Technological developments can be foreseen but the knowledge is largely useless because startups are inherently risky and require optimal timing. A more practical approach is to embrace uncertainty, taking a reinforcement learning perspective.
history, technology, reviews, sociology, decision-theory, insight-porn
2012-07-12–2019-06-20 finished certainty: likely importance: 6


How do you time your startup? Technological forecasts are often surprisingly prescient in terms of predicting that something was possible & desirable and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.

Why is their knowledge so useless? Why are success and failure so intertwined in the tech industry? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts correct about the (eventual) outcome.

Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad, because regardless of forecasting, a good idea will draw overly-optimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. All major success stories overshadow their long list of predecessors who did the same thing, but got unlucky. The lesson of history is that for every lesson, there is an equal and opposite lesson. So, ideas can be divided into the overly-optimistic & likely doomed, or the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception. Progress, then, depends on the ‘unreasonable man’.

This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically over-exploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on the societal level by serving as random exploration.

A major benefit of R&D, then, is in ideas laying fallow until the ‘ripe time’ when they can be immediately exploited in previously-unpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals.

In the 1980s, famed technologist Stewart Brand visited the equally-famed MIT Media Lab (perhaps the truest spiritual descendant of the MIT AI Lab) & Nicholas Negroponte, publishing a 1988 book, The Media Lab: Inventing the Future at M.I.T. (TML). Brand summarized the projects he saw there and Lab members’ extrapolations into the future which guided their projects, and added his own forecasting thoughts.

Visiting the Media Lab

Three decades later, the book is highly dated, and the descriptions are of mostly historical interest for the development of various technologies (particularly in the 1990s). But enough time has passed since 1988 to enable us to judge the basic truthfulness of the predictions and expectations held by the dreamers such as Nicholas Negroponte: they were remarkably accurate! And the Media Lab wasn’t the only one; another 1989 vision was almost identical: a networked future powered by small touchscreen devices. (And what about Douglas Engelbart, or Alan Kay/Xerox PARC, who explicitly aimed to ‘skate towards where the puck would be’?) If you aren’t struck by a sense of déjà vu or pity when you read this book, compare the claims by people at the Media Lab with contemporary—or later—works, and you’ll see how right they were.

Déjà vu, because what was described in TML on every other page is recognizably ordinary life in the 1990s and 2000s, never mind the 2010s, from the spread of broadband to the eventual impact of smartphones.

And pity, because the sad thing is noting how few future millionaires or billionaires grace the pages of TML—one quickly realizes that yes, person X was 100% right about Y happening even when everyone thought it insane, except that X was just a bit off, by a few years, and either jumped the gun or was too late, and so some other Z who doesn’t even appear in TML was the person who wound up taking all the spoils. I read it constantly thinking ‘yes, yes, you were right—for all the good it did you!’, or ‘not quite, it’d actually take another decade for that to really work out’.

To Everything A Season

“I basically think all the ideas of the ’90s that everybody had about how this stuff was going to work, I think they were all right, they were all correct. I think they were just early.”

Marc Andreessen, 2014

The question constantly asked of anyone who claims to know a better way (as futurologists implicitly do): “If you’re so smart, why aren’t you rich?” The lesson I draw is: it is not enough to predict the future, one has to get the timing right to not be ruined, and then execute, and then get lucky in a myriad ways.

Many ‘bubbles’ can be interpreted as people being 100% correct about the future—but missing the timing (The Economist on obscure property booms, Garber’s Famous First Bubbles). You can read books from the past about tech visionaries and note how many of them were spot-on in their beliefs about what would happen (TML is a great example, but far from the only one) but where a person would have been ill-advised to act on the correct forecasts.

Not To the Swift

“Whoever does not know how to hit the nail on the head should be entreated not to hit the nail at all.”

Friedrich Nietzsche

Many startups have a long list of failed predecessors who tried to do much the same thing, often simultaneous with several other competitors (startups are just as susceptible to ‘multiple invention’ as science/technology in general). What made them a success was that they happened to give the piñata a whack at the exact moment where some S-curves or events hit the right point. Consider the ill-fated Pets.com: was the investor right to believe that Americans would spend a ton of money online such as for buying dogfood? Absolutely: Amazon (which has rarely turned a profit and has sucked up far more investment than Pets.com ever did, a mere ~$466m) is a successful online retail business that stocks thousands of dog food varieties, to say nothing of all the other pet-related goods it sells, and Chewy, which primarily does pet food, filed for a multi-billion-dollar IPO in 2019 on the strength of its billions in revenue. But the value of Pets.com stock still went to ~$0. Facebook is the biggest archive of photographs there has ever been, with truly colossal storage requirements; could it have succeeded in the 1990s? No, and not even later, as demonstrated by Friendster and Orkut, and the lingering death of MySpace. One of the most notorious tech business failures of the 1990s was the Iridium satellite constellation, but that was brought down by bizarrely self-sabotaging decisions on the part of Motorola, and when Motorola was finally removed from the equation, Iridium found its market, and 2017 saw the launch of the second Iridium satellite constellation, Iridium NEXT, with competition from other since-launched satellite constellations, including SpaceX’s own nascent Starlink (aiming at global broadband Internet) which launched no less than 60 satellites in May 2019. Or look at computers: imagine an early adopter of an Apple computer saying ‘everyone will use computers eventually!’ Yes, but not for another few decades, and ‘in the long run, we are all dead’. Early PC history is rife with examples of the prescient failing.

Smartphones are an even bigger example of this. How often did I read in the ’90s and early ’00s about how amazing Japanese cellphones were and how amazing a good smartphone would be, even though year after year the phones were jokes and used pretty much solely for voice? You can see the smartphones come up again and again in TML, as the visionaries realize how transformative a mobile pocket-sized computer would be. Yet, it took until the mid-’00s for the promise of smartphones to materialize overnight, as it were, a success which went primarily to latecomers Apple and Google, cutting out the previously highly-successful Nokia, never mind visionaries like General Magic. (You too can achieve overnight success in just a few decades of hard work…) A 2013 interview with Eric Jackson looks back on smartphone adoption rates:

Q: “What’s your take on how they’re [Apple] handling their expansion into China, India, and other emerging markets?”

A: “It’s depressing how slow things are moving on that front. We can draw lines on a graph but we don’t know the constraints. Again, the issue with adoption is that the timing is so damn hard. I was expecting smartphones to take off in mid-2004 and was disappointed over and over again. And then suddenly a catalyst took hold and the adoption skyrocketed. Cook calls this ‘cracking the nut’. I don’t know what they can do to move faster but I suspect it has to do with placement (distribution) and with networks which both depend on (corrupt) entities.”

In 2012, I watched impressed as my aunt used the iPhone application FaceTime to video chat with her daughter half a continent away. In other words, her smartphone is a videophone; videophones used to be one of the canonical examples of how technology failed, stemming from their celebrated early appearances & subsequent failure to usurp telephones. This was oft-cited as an example of how technoweenies failed to understand that people didn’t really want videophones at all—‘who wants to put on makeup before making a call?’, people offered as an explanation, in all seriousness—but really, it looks like the videophones back then simply weren’t good enough.

Or to look at VR; I’ve noticed geeks express wonderment at the Oculus Rift (and its many competitors & successors…) bringing Virtual Reality to the masses, and won’t that be a kick in the teeth for the Cliff Stolls & Jaron Laniers (who gave up VR for dead decades ago)? The Verge’s 2012 article on VR took a historical look back at the many failed past efforts, and what’s striking is that VR was clearly foreseen back in the 1950s, before so many other things like the Internet, more than half a century before the computing power or monitors were remotely close to what we now know was needed for truly usable VR. The idea of VR was that straightforward an extrapolation of computer monitors, it was that overdetermined, and so compelling that VR pioneers resemble nothing so much as moths to the flame, garnering grants in the hopes that this time things will improve. And at some point, it does improve, and the first person to try at the right time may win the lottery; Palmer Luckey (founder of Oculus, sold to Facebook for $2.83 billion in March 2014):

Here’s a secret: the thing stopping people from making good VR and solving these problems was not technical. Someone could have built the Rift in mid-to-late 2007 for a few thousand dollars, and they could have built it in mid-2008 for about $647. It’s just nobody was paying attention to that.

Go To The Ant, Thou Sluggard

“Yes, but when I discovered it, it stayed discovered.”

(attributed; “Pity the Scientist Who Discovers the Discovered”)

“It’s important to be the last person to discover something.” (attributed)

Any good idea can be made to sound like a bad idea & probably did sound like a bad idea then, and Bessemer VC’s ‘anti-portfolio’ is a list of good ideas which Bessemer declined to invest in. Michael Wolfe offers some examples of this:

  • Facebook: the world needs yet another MySpace or Friendster [or…] except several years late. We’ll only open it up to a few thousand overworked, anti-social, Ivy Leaguers. Everyone else will then join since Harvard students are so cool.
  • Dropbox: we are going to build a file sharing and syncing solution when the market has a dozen of them that no one uses, supported by big companies like Microsoft. It will only do one thing well, and you’ll have to move all of your content to use it.
  • Virgin Atlantic: airlines are cool. Let’s start one. How hard could it be? We’ll differentiate with a funny safety video and by not being a**holes.
  • …iOS: a brand new operating system that doesn’t run a single one of the millions of applications that have been developed for Mac OS, Windows, or Linux. Only Apple can build apps for it. It won’t have cut and paste.
  • Google: we are building the world’s 20th search engine at a time when most of the others have been abandoned as being commoditized money losers. We’ll strip out all of the ad-supported news and portal features so you won’t be distracted from using the free search stuff.
  • Tesla: instead of just building batteries and selling them to Detroit, we are going to build our own cars from scratch plus own the distribution network. During a recession and a cleantech backlash.
  • …Firefox: we are going to build a better web browser, even though 90% of the world’s computers already have a free one built in. One guy will do most of the work.

We can play this game all day:

  • How about Netflix? “We’ll start off renting people a doomed format in a way inferior to our established competitor (which will choose to commit suicide by ignoring both mail order & Internet all the way until bankruptcy in 2010); this will (somehow) let us pivot to streaming, where we will license all our content from our worst enemies, who will destroy us the instant we are too successful & already intend to run streaming services of their own—but that’s OK because we’ll just convince Wall Street to wait decades while giving us hundreds of billions of dollars to replace Hollywood by making thousands of film & TV series ourselves (despite the fact that we’ve never done anything like that before and there is no reason to think we would be any better at it than they are).”

  • Or GitHub: “We’ll offer code hosting services like those of SourceForge or Google Code, which require developers to use one of the most user-hostile version control systems ever made, only to FLOSS developers who are notorious cheapskates, and charge them a few bucks for a private version.”

  • SpaceX: “The incumbents have a multi-decade headstart but are fat and lazy; we’ll catch up by buying some spare Russian rockets while we invent our own futuristic reusable ones. It’s only rocket science.”

  • Uber/Lyft/Lime: “Taxis & buses. You’ve invented taxis & buses. And rental bikes.”

  • Instacart/DoorDash: “We’ll do Webvan/Kozmo again, minus the bankruptcy.”

  • PayPal: “Everyone else’s online payments has failed, so we’ll do it again, with anonymous cryptography! On phones! In 1998! End-users love cryptography, right? If the software doesn’t work out, I guess we’ll… do something else. We’re not sure what.” Later: “oh, apparently eBay sellers like us so much they’re making their own promotional materials? What if instead of threatening to sue them, we tried working with them?”

  • Venmo: “TextPayMe worked out well, right?”

  • Patreon: “Online micropayments & patronage schemes have failed hundreds of times and became a ’90s punchline; might as well try again.”

  • Bitcoin: “Every online-only currency from DigiCash to e-gold to Beenz to Flooz [too many to list] has either failed or been shut down by governments; so, we’ll use ‘proof of work’—it’s a hilariously expensive cryptographic thing we just made up which has zero theoretical support for actually ensuring decentralization & censorproofing, and was roundly mocked by almost every e-currency enthusiast who bothered to read the whitepaper.”

  • Grubhub/Seamless/DoorDash/Uber Eats (!): “CyberSlice blew through $167m+ trying to sell pizza online, but this time will be different.”

  • FedEx: “The experienced & well-capitalized Emery Air Freight is already trying and failing to make the hub-and-spoke air delivery method work; I’ll blow my inheritance on trying to compete with them while being so undercapitalized I’ll have to commit multiple crimes to keep FedEx afloat, like literally gambling the company’s money at Las Vegas.”

  • Lotus 1-2-3: “VisiCalc literally invented the spreadsheet, has owned the market for 4 years despite clones like Microsoft’s, and singlehandedly made the Apple II PC a mega-success; we’ll write our own spreadsheet from scratch, fixing some of VisiCalc’s problems, and beat them to the IBM PC. Everyone will buy it simply because it’ll be slightly better.”

  • Airbnb: “We’ll max out our credit cards to let people illegally rent out their air mattresses en route to eating the hotel industry.”

  • Stripe: “Banks & online payment processors like PayPal are heavily-regulated inefficient monopolies which really suck; we’ll make friends with some banks and run a payment processor which doesn’t suck. Our signature selling point will be that it takes fewer lines of code to set up, so programmers will like us.”

  • LinkedIn: “We’ll do social networking like it has already been done & patented by SixDegrees, forcing us to buy their patent.”

  • Slack: “IRC+email but infinitely slower & more locked in. Businesses won’t be able to get enough of it; employees will love to hate it.”

  • Wikipedia: “We’ll compete with Britannica, the Interpedia, Encarta, Everything2, h2g2, The Distributed Encyclopedia Project (TDEP), TheInfo, & GNU’s GNUpedia by letting literally anyone edit some draft articles for Nupedia.”

You don’t have to be bipolar to be an entrepreneur, but it might help. (“The most successful people I know believe in themselves almost to the point of delusion…”)

But Time And Chance

“After solving a problem, humanity imagines that it finds in analogous solutions the key to all problems. Every authentic solution brings in its wake a train of grotesque solutions.”

Nicolás Gómez Dávila, Nicolás Gómez Davila: An Anthology (original: Escolios a un Texto Implícito: Selección, p. 430)

“You can’t possibly get a good technology going without an enormous number of failures. It’s a universal rule. If you look at bicycles, there were thousands of weird models built and tried before they found the one that really worked. You could never design a bicycle theoretically. Even now, after we’ve been building them for 100 years, it’s very difficult to understand just why a bicycle works—it’s even difficult to formulate it as a mathematical problem. But just by trial and error, we found out how to do it, and the error was essential.”

Freeman Dyson, “Freeman Dyson’s Brain” (Wired, 1998)

Why so many failed predecessors?

Part of the explanation is survivorship bias causing hindsight bias. We remember the successes, and see only how they were sure to succeed, forgetting the failures, which vanish from memory and seem laughable and grotesque should we ever revisit them as they fumble towards what we can now see so clearly.

The origins of many startups are highly idiosyncratic & chancy; eg. why should a podcasting company, Odeo, have led to Twitter? Survival alone is highly chancy, and founders can often see times where it came down to a dice roll. Like historical events in general, the importance of an event or change is often known only in retrospect. Overall, the odds of success are low, and the rewards are not great for most—despite the skewed distribution producing occasional eye-popping returns in a few cases, the risk-adjusted return of the technology sector or VC funds is not that much greater than the broader economy.

“Of course Google was always going to be a huge success because of PageRank and also (post hoc theorizing) X, Y, & Z”—except for the minor problem that Google was merely one of many search engines, great perhaps but not profitable, and didn’t hit upon a profitable business model—much less a unicorn-worthy model—until 4 years later when it copied Overture’s advertising auction, which was its salvation (In The Plex); in the meantime, Google had to sign potentially fatal deals or risk burning through the last of its capital when minor technical glitches derailed vital deals. (All of which was doubtless why Page & Brin tried & failed to sell Google to AltaVista & Excite & Yahoo early on, and negotiated a possible sale with Yahoo as late as 2002 which they ultimately rejected.) In a counterfactual world, Google went down in flames quite easily because it never hit upon the advertising innovations that saved it, no matter how much you liked PageRank, and anything else is hindsight bias. FedEx, early on, couldn’t make payroll and the founder famously kept the planes flying only by gambling the last of their money in Las Vegas, among other near-death experiences & crimes—just one of many startups doing highly questionable things. Both SpaceX & Tesla have come within days (or hours) of bankruptcy, in 2008 and 2013 respectively; in the former case, Musk borrowed money from friends to pay his rent after 3 rocket failures in a row, and in the latter, Musk reportedly went as far as securing a pledge from Google to buy Tesla outright rather than let it go bankrupt (Vance 2015). Tesla’s struggles in general are too well known to mention. Mark Zuckerberg, in 2004, wanted nothing more than to sell Facebook for a few million dollars so he could work on his P2P filesharing program, Wirehog, commenting that the sale price just needed to be large enough “to propel Wirehog.” YouTube was a dating site. Stewart Butterfield wanted to make a MMORPG game which failed, and all he could salvage out of it was the photo-sharing part, which became Flickr; he still really wanted to make a MMORPG, so after Flickr, he founded a company to make the MMORPG which… also failed, so after trying to shut down his company and being told not to by his investors, he salvaged the chat part from it, which became Slack. And, consistent with the idea that there is a large ineradicable element of chance to it, surveys of startups suggest that while there are individual differences in odds of success (‘skill’), any founder learning curve (‘learning-by-doing’) is small & success probability remains low regardless of experience (Gompers et al 2006/Gompers 2010, Parker 2011, Gottschalk 2014), and experienced entrepreneurs still have low odds of forecasting startups achieving commercialization at all, approaching random predictions in “non-R&D-intensive sectors” (eg. Scott et al 2019).

Thiel (Zero to One): “Every moment in business happens only once. The next Bill Gates will not build an operating system. The next Larry Page or Sergey Brin won’t make a search engine. And the next Mark Zuckerberg won’t create a social network. If you are copying these guys, you aren’t learning from them.” This is true, but I would say it reverses the order (‘N to N+1’?): you will not be the next Bill Gates, because Bill Gates was not the first and only Bill Gates; he was, pace Thiel, the last Bill Gates; many people made huge fortunes off OSes, both before and after Gates—you may have forgotten Gary Kildall, but hopefully you remember Steve Jobs (before, Mac) and Steve Jobs (after, NeXT). Similarly, Mark Zuckerberg was not the first and only Zuckerberg, he was the last Zuckerberg; many people made social networking fortunes before him—maybe Orkut didn’t make its Google inventor a fortune, but you can bet that MySpace’s DeWolfe and Anderson did well. And there were plenty of lucrative search engine founders (is Yahoo’s Jerry Yang still a billionaire? Yes).

Gates, however, proved the market, and refined the Gates strategy to perfection, using up the trick; no one can get historically rich off shipping an OS plus some business productivity software because there are too many competitors and too many players interested in commoditizing their complements, and so opportunity has moved on to the next area.

A successful company rewrites history and its precursors; history must be lived forward, progressing to an obscure destination, but we always recall it backwards as progressing towards the clarity of the present.

The Wise in their Craftiness

“It is universally admitted that the unicorn is a supernatural being and one of good omen; thus it is declared in the Odes, in the Annals, in the biographies of illustrious men, and in other texts of unquestioned authority. Even the women and children of the common people know that the unicorn is a favorable portent. But this animal does not figure among the domestic animals, it is not easy to find, it does not lend itself to any classification. It is not like the horse or the bull, the wolf or the deer. Under such conditions, we could be in the presence of a unicorn and not know with certainty that it is one. We know that a given animal with a mane is a horse, and that one with horns is a bull. We do not know what a unicorn is like.”

Jorge Luis Borges, “Kafka And His Precursors” (1951)

Can you ask researchers if the time is ripe? Well: researchers have a slight conflict of interest in the matter, and are happy to spend arbitrary amounts of money on topics without anything to show for it. After all, why would they say no?

Scott Fisher:

I ended up doing more work in Japan than anything else because Japan in general is so tech-smitten and obsessed that they just love it [VR]. The Japanese government in general was funding research, building huge research complexes just to focus on this. There were huge initiatives while there was nothing happening in the US. I ended up moving to Japan and working there for many years.

Indeed, this would have been around the time of the Japanese boondoggle, the Fifth Generation Computer Systems project (note that despite Japan’s reputed prowess at robotics, it is not Japan’s robots who went into Fukushima / are flying around the Middle East / are revolutionizing agriculture and construction). All those ‘huge initiatives’ and…? Don’t ask Fisher, he’s hardly going to say, “oh yeah, all the money was completely wasted, we were trying to do it too soon; our bad”. And Lanier implies that Japan alone spent a lot of money:

Jaron Lanier: “The components have finally gotten cheap enough that we can start to talk about them as being accessible in the way that everybody’s always wanted…Moore’s law is so interesting because it’s not just the same components getting cheaper, but it really changes the way you do things. For instance, in the old days, in order to tell where your head was so that you could position virtual content to be standing still relative to you, we used to have to use some kind of external reference point, which might be magnetic, ultrasonic, or optical. These days you put some kind of camera on the head and look around in the room and it just calculates where you are—the headsets are self-sufficient instead of relying on an external reference infrastructure. That was inconceivable before because it would have been just so expensive to do that calculation. Moore’s law really just changes again and again, it re-factors your options in really subtle and interesting ways.”

Kevin Kelly: “Our sense of history in this world is very dim and very short. We were talking about the past: VR wasn’t talked about for a long time, right? 35 years. Most people have no idea that this is 35 years old. 30 years later, it’s the same headlines. Was the technological power just not sufficient 30 years ago?”

…[On the Nintendo Power Glove, based on a VPL dataglove design:]

JL: “Both I and a lot of other people really, really wanted to get a consumerable version of this stuff out. We managed to get a taste of the experience with something called the Power Glove…Sony actually brought out a little near-eye display called Virtual Boy; not very good, but they gave it their best shot. And there were huge projects that have never been shown to the public to try to make a consumable [VR product], very expensive ones. Counting for inflation, probably more money was spent [then] than Facebook just spent on Oculus. We just could never, never, never get it quite there.”

KK: “Because?”

JL: “The component cost. It’s Moore’s law. Sensors, displays… batteries! Batteries is a big one.”

Issues like component cost were not something that could be solved by a VR research project, no matter how ambitious. Those were hard binding limits, and to solve them by creating tiny high-resolution LED/LCD screens for smartphones required the benefit of decades of Moore’s law and the effects of manufacturing billions of smartphones.

Researchers in general have no incentive to say, “this is not the right time, wait another 20 years for Moore’s law to make it doable”, even if everyone in the field is perfectly aware of this—as Palmer Luckey put it:

I spent a huge amount of time reading…I think that there were a lot of people that were giving VR too much credit, because they were working as VR researchers. You don’t want to publish a paper that says, ‘After the study, we came to the conclusion that VR is useless right now and that we should just not have a job for 20 years.’ There were a few people that basically came to that conclusion. They said, ‘Current VR gear is low field of view, high lag, too expensive, too heavy, can’t be driven properly from consumer-grade computers, or even professional-grade computers.’ It turned out that I wasn’t the first person to realize these problems. They’d been known for decades.

AI researcher Donald Michie claimed in 1970, based on a 1969 poll, that a majority of AI researchers estimated 10–100 years for AGI (or 1979–2069) and that “There is also fair agreement that the chief obstacles are not hardware limitations.” While AI researcher surveys still suggest that wasn’t a bad range, the success of deep learning makes clear that hardware was a huge limitation, and resources 50 years ago fell short by at least 6 orders of magnitude. Michie went on to point out that in a previous case, Charles Babbage’s, the work was foredoomed by it being an “unripe time” due to hardware limitations and represented a complete waste of time & money. This, arguably, was the case for Michie’s own research.

Nor Riches to Men of Understanding

“But to come very near to a true theory, and to grasp its precise application, are two very different things, as the history of science teaches us. Everything of importance has been said before by somebody who did not discover it.”

Alfred North Whitehead, The Organization of Thought (1917)

So you don’t know the timing well enough to reliably launch. You can’t imitate a successful entrepreneur, the time is past. You can’t foresee what will be successful based on what has been successful; you can’t even foresee what won’t be successful based on what was already unsuccessful; and you can’t ask researchers, because they are incentivized to not know the timing any better than anyone else.

Can you at least profit from your knowledge of the outcome? Here again we must be pessimistic.

Certainty is irrelevant; you still have problems making use of this knowledge. Example: in retrospect, we know everyone wanted computers, OSes, social networks—but the history of them is strewn with flaming rubble. Suppose you somehow knew in 2000 that “in 2010, the founder of the most successful social network will be worth at least $10b”; this is a falsifiable belief at odds with all conventional wisdom and about a tech that blindsided everyone. Yet, how useful would this knowledge be, really? What would you do with it? Do you have the capital to start a VC fund of your own, and throw multi-million-dollar investments at every social media startup until finally in 2010 you knew for sure that Facebook was the winning ticket and could cash out in the IPO? I doubt it.

It’s difficult to invest in ‘computers’ or ‘AI’ or ‘social networking’ or ‘VR’; there is no index for these things, and it is hard to see how there even could be such a thing. (How do you force all relevant companies to sell tradable stakes? “If people don’t want to go to the ball game, how are you going to stop them?” as Yogi Berra asked.) There is no convenient CMPTR you can buy 100 shares of and hold indefinitely to capture gains from your optimism about computers. IBM and Apple both went nearly bankrupt at points, and Microsoft’s stock has been flat since 1999 or whenever (translating to huge real losses and opportunity costs to long-term holders of it). If you knew for certain that Facebook would be as huge as it was, what stocks, exactly, could you have invested in, pre-IPO, to capture gains from its growth? Remember, you don’t know anything else about the tech landscape in the 2000s, like that Google will go way up from its IPO; you don’t know about Apple’s revival under Jobs—all you know is that a social network will exist and will grow hugely. Why would anyone think that the future of smartphones would be won by “a has-been 1980s PC maker and an obscure search engine”? (The best I can think of would be to sell any Murdoch stock you owned when you heard they were buying MySpace, but offhand I’m not sure that Murdoch didn’t just stagnate rather than drop as MySpace increasingly turned out to be a writeoff.) In the hypothetical that you didn’t know the name of the company, you might’ve bought up a bunch of Google stock hoping that Orkut would be the winner, but while that would’ve been a decent investment (yay!), it would have had nothing to do with Orkut (oops)…

And even when there are stocks available to buy, you only benefit based on the specifics—like one of the existing stocks being a winner, rather than all the stocks being eaten by some new startup. Let’s imagine a different scenario, where instead you were confident that home robotics were about to experience a huge growth spurt. Is this even nonpublic knowledge at all? The world economy grows at something like 2% a year, labor costs generally seem to go up, prices of computers and robotics usually fall… Do industry projections expect to grow their sales by <25% a year?

But say that the market is wrongly pessimistic. If so, you might spend some of your hypothetical money on whatever the best approximation to a robotics index fund you can find, as the best of a bunch of bad choices. (Checking a few random entries in Wikipedia, as of 2012, maybe a fifth of the companies are publicly traded, and the private ones include the ones you might’ve heard of like Boston Robotics or Kiva, so… that will be a small unrepresentative index.) Suppose the home robotics growth were concentrated in a single private company which exploded into the billions of annual revenue and took away the market share of all the others, forcing them to go bankrupt or merge or shrink. Home robotics will have increased just as you believed—keikaku doori!—yet your ‘index fund’ has gone bankrupt (reindex when one of the robotics companies collapses? Reindex into what, another doomed firm?). Then after your special knowledge has become public knowledge, the robotics company goes public, and by the EMH, their shares become a normal investment.

Morgan Housel:

There were 272 automobile companies in 1909. Through consolidation and failure, 3 emerged on top, 2 of which went bankrupt. Spotting a promising trend and a winning investment are two different things.

Is this impossibly rare? It sounds like Facebook! They grew fast, roflstomped other social networks, stayed private, and post-IPO, public investors have not profited all that much compared to even late investors.

Because of the winner-take-all dynamics, there’s no way to solve the coordination problem of holding off on an approach until the prerequisites are in place: entrepreneurs and founders will be hurling themselves at a common goal like social networks or VR constantly, just on the off chance that maybe the prerequisites just became adequate and they’ll be able to eat everyone’s lunch. A predictable waste of money, perhaps, but that’s how the incentives work out. It’s a weird perspective to take, but we can think of other technologies which may be like this.

Bitcoin is a topical example: it’s still in the early stages where it looks either like a genius stroke to invest in, or a fool’s paradise/Ponzi scheme. In my first draft of this essay in 2012, I noted that we see what looks like a Bitcoin bubble as the price inflates from ~$0 to $165—yet, if Bitcoin were the Real Deal, we would expect large price increases as people learn of it and it directly gains value from increased use, an ecosystem slowly unlocking the fancy cryptographic features, etc. And in 2019, with 2012 a distant memory, well, one could say something similar, just with larger numbers…

Or take niche visionary technologies: if cryonics was correct in principle, yet turned out to be worthless for everyone doing it before 2030 (because the wrong perfusion techniques or cryopreservatives were used and some critical bit of biology was not vitrified) while practical post-2030, say, it would simply be yet another technology where visionaries were ultimately right despite all nay-saying and skepticism from normals, but nevertheless wrong in a practical sense, because they jumped on it too early, and so they wasted their money.

Indeed, do many things come to pass.

Surfing Uncertainty

“Whatsoever thy hand findeth to do, do it with thy might; for there is no work, nor device, nor knowledge, nor wisdom, in the grave, whither thou goest.”

Qoheleth, Ecclesiastes 9:10

“Yunmen addressed the assembly and said: ‘I am not asking you about the days before the fifteenth of the month. But what about after the fifteenth? Come and give me a word about those days.’ And he himself gave the answer for them: ‘Every day is a good day.’”

Yunmen, The Blue Cliff Record

Where does this leave us? In what I would call, in a nod to Thiel’s ‘definite’ vs ‘indefinite optimism’, definitely-maybe optimism. Progress will happen and can be foreseen long before, but the details and exact timing are too difficult to get right, and the benefit of R&D is in ideas laying fallow until the ripe time and their exploitation in unpredictable ways.

Returning to Donald Michie: one could make fun of his extremely overly-optimistic AI projections, and write him off as the stock figure of the biased AI researcher blinded by the ‘Maes-Garreau law’ where AI is always scheduled for right when a researcher will retire; but while he was wrong, it is unclear this was a mistake, because in other cases, an apparently doomed research project—Marconi’s attempt to radio across the Atlantic ocean—succeeded because of an ‘unknown unknown’—the ionosphere. We couldn’t know for sure that such projections were wrong, and the amount of money being spent back then on AI was truly trivial (and the commercial spinoffs likely paid for it all anyway).

Further, on the gripping hand, Michie suggests that research efforts like Babbage’s should be thought of not as commercial R&D, expected to usually pay off right now, but as prototypes buying optionality: demonstrating that a particular technology is approaching its ‘ripe time’ & indicating what the bottlenecks are, so society can go after the bottlenecks and then has the option to scale up the prototype as soon as the bottlenecks are fixed. Ripe time can be described as finally enabling attacks on consequential problems, and the development of some recent successes has been described as “failure rebooting”: revisiting (failed) past ideas which may now be workable in the light of progress in other areas. As time passes, the number of options may open up, and any of them may bypass what was formerly a fatal bottleneck. Enough progress in one domain (particularly computing power) can sometimes make up for stasis in another domain.

So, what Babbage should have aimed for is not making a practical thinking machine which could churn out naval tables, but demonstrating that a programmable thinking machine is possible & useful, and currently limited by the slowness & size of its mechanical logic—so that transistors could be pursued with higher priority by governments, and programmable computers could be created with transistors as soon as possible, instead of the historical course of a meandering piecemeal development where Babbage’s work was forgotten & then repeatedly reinvented with delays (eg. Zuse vs von Neumann). Similarly, the benefit of taking Moore’s law seriously is that one can plan ahead to take advantage of it even if one doesn’t know exactly when, if ever, it will happen.

Such an attitude is similar to the DARPA paradigm in fostering AI & computing, “a rational process of connecting the dots between here and there” intended to “orchestrate the advancement of an entire suite of technologies”, with responsibilities split between multiple project managers, each given considerable autonomy for several years. These project managers tend to pick polarizing projects rather than consistent projects (Goldstein & Kearney 2017), ones which generate disagreement among reviews or critics. Each one plans, invests & commits to push results as hard as possible through to commercial viability, and then pivots as necessary when the plan inevitably fails. (DARPA indeed saw itself as much like a VC firm.)

The benefit for someone like DARPA of a forecast like Moore’s law is that it provides one fixed trend to gauge overall timing to within a decade or so, and look for those dots which have lagged behind and become reverse salients. For an entrepreneur, the advantage of exponential thinking is more fatalistic: being able to launch in the window of time between just after technical feasibility but before someone else randomly gives it a try; if wrong because it was always impossible, it doesn’t matter when one launches, and if wrong because the timing is wrong, one’s choice is effectively random and little is lost by delay.

Try & Try Again (But Less & Less)

“The road to wisdom?—Well, it’s plain
and simple to express:
Err
and err
and err again
but less
and less
and less.”

Piet Hein, Grooks

This presents a conflict between personal and social incentives. Socially, one wants people regularly tossing their bodies into the marketplace to be trampled by uncaring forces just on the off chance that this time it’ll finally work, and since the critical factors are unknown and constantly changing, one needs a sacrificial startup every once in a while to check (for a good idea, no amount of failures is enough to prove that it should never be tried—many failures just implies that there should be a backoff). Privately, given the skewed returns, diminishing utility, the oversized negative impacts (a bad startup can ruin one’s life and drive one to suicide), the limited number of startups any individual can engage in (yielding gambler’s ruin), and the fact that startups & VC will capture only a minute percentage of the total gains from any success (most of which will turn into consumer surplus/positive externalities), the only startups that make any rational sense, which you wouldn’t have to be crazy to try, are the overdetermined ones which anyone can see are a great idea. However, those are precisely the startups that crazy people will have done years before, when they looked like bad ideas, avoiding the waste of delay. Further, people in general appear to overexploit & underexplore, exacerbating the problem, even when the expected value of a startup (or experimentation, or R&D in general) is positive for individuals.

So, it seems that rapid progress depends on crazy people.

There is a more than superficial analogy here, I think, to Thompson sampling/posterior sampling (PSRL). In RL’s multi-armed bandit setting, each turn one has a set of ‘arms’ or options with unknown payoffs and one wants to maximize the total long-term reward. The difficulty is in coping with failure: even good options may fail many times in a row, and bad options may succeed, so options cannot simply be ruled out after a failure or two, and if one is too hasty to write an option off, one may take a long time to realize that, losing out for many turns.

One of the simplest & most efficient MAB solutions, which maximizes the total long-term reward and minimizes ‘regret’ (opportunity cost), is Thompson sampling & its generalization PSRL: randomly select each option with a probability equal to the current estimated probability that it is the most profitable option. This explores all options initially but gradually homes in on the most profitable option to exploit most of the time, while still occasionally exploring all the other options once in a while, just in case; strictly speaking, Thompson sampling will never ban an option permanently, the probability of selecting an option merely becomes vanishingly rare. Bandit settings can further assume that options are ‘restless’ and the optimal option may ‘drift’ over time or ‘run out’ or ‘switch’, in which case one also estimates the probability that an option has switched, and when it does, one changes over to the new best option; instead of the regular Thompson sampling where bad options become ever more unlikely to be tried, a restless bandit results in constant low-level exploration because one must constantly check lest one fails to notice a switch.
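
A minimal sketch of Beta-Bernoulli Thompson sampling may make the mechanism concrete (illustrative code only, not from any source cited here; the arms’ true success rates are invented):

```python
import random

def thompson_sampling(true_probs, n_rounds=10_000):
    """Play a Bernoulli bandit with Thompson sampling; return total reward."""
    k = len(true_probs)
    alpha = [1] * k  # Beta(1,1) uniform prior: 1 + observed successes
    beta = [1] * k   # 1 + observed failures
    total = 0
    for _ in range(n_rounds):
        # Sample a plausible payoff rate for each arm from its posterior;
        # taking the argmax then picks each arm with exactly the posterior
        # probability that it is the best one.
        samples = [random.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = random.random() < true_probs[arm]
        total += reward
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return total

# Three 'startup ideas' with unknown success rates: no arm is ever banned
# outright; a losing arm merely gets sampled vanishingly rarely.
print(thompson_sampling([0.02, 0.05, 0.10]))
```

A ‘restless’ variant would additionally discount old evidence (eg. by decaying the Beta counts back toward the prior), which produces the constant low-level re-checking of apparently-dead options described above.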

This bears a resemblance to startup rates over time: an initial burst of enthusiasm for a new ‘option’, when it still has a high prior probability of being the most profitable option at the moment, triggers a bunch of startups selecting that option, but then when they fail, the posterior probability drops substantially; however, even if something now looks like a bad idea, there will still be people every once in a while who insist on trying again anyway, and, because the probability is not 0, once in a while they succeed wildly and everyone is astonished that ‘so, X is a thing now!’

In DARPA’s research funding and VC, they often aren’t looking for a plan which looks good on average to everyone, or which no one can find any particular problem with, but something closer to a plan which at least one person thinks could be awesome for some reason. An additional analogy from reinforcement learning is PSRL, which handles more complex problems by committing to a strategy and following it until the end, to either success or failure. A naive Thompson sampling would do badly in a long-term problem because at every step, it would ‘change its mind’ and be unable to follow any plan consistently for long enough to see what happens; what is necessary is to do ‘deep exploration’, following a single plan long enough to see how it works; even if one thinks that plan is almost certainly wrong, one must “disagree and commit”. The average of multiple plans is often worse than any single plan. The most informative plan is the most polarizing one.
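
A toy sketch of that ‘deep exploration’, in the same style as the bandit sketch above (again purely illustrative; the horizon & episode counts are arbitrary): sample one complete hypothesis from the posterior, commit to its optimal plan for a whole episode, and only update afterwards:

```python
import random

def psrl_style(true_probs, horizon=50, episodes=200):
    """Commit to one posterior-sampled 'world model' per episode."""
    k = len(true_probs)
    alpha, beta = [1] * k, [1] * k
    for _ in range(episodes):
        # Sample a single hypothesis about every arm's payoff rate...
        sampled = [random.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...then follow its optimal plan for the whole horizon, even though
        # the hypothesis is almost certainly wrong in detail ("disagree and commit").
        plan = max(range(k), key=lambda i: sampled[i])
        for _ in range(horizon):
            r = random.random() < true_probs[plan]
            alpha[plan] += r
            beta[plan] += 1 - r
    # Report the arm currently believed best.
    return max(range(k), key=lambda i: alpha[i] / (alpha[i] + beta[i]))

print(psrl_style([0.02, 0.05, 0.10]))
```

Re-sampling at every step instead would dither between plans and never see any of them through; committing for the whole horizon is what makes the outcome informative.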

The system as a whole can be seen in RL terms. One theme I notice in many systems is that they follow a similar pattern of distributed exploration followed by imitation of the successes; ensemble methods like dropout or multi-agent optimization can follow this pattern as well.

A particularly germane example here is a study of traders on the eToro social-trading platform (discussion), which examines a large dataset of trades made by online traders, who are able to clone the financial trading strategies of more successful traders; as traders find successful strategies, others gradually imitate them, and so the system as a whole converges on better strategies in what the authors identify as a sort of distributed implementation of “distributed Thompson sampling” which they dub “social sampling”. So for the most part, traders clone popular strategies, but with certain probabilities, they’ll randomly explore rarer apparently-unsuccessful strategies.

This sounds a good deal like individuals pursuing standard careers & occasionally exploring unusual strategies like a startup; they will occasionally explore strategies which have performed badly (ie. previous similar startups failed). Entrepreneurs, with their speculations and optimistic biases, serve as randomization devices to sample a strategy regardless of the ‘conventional wisdom’, which at that point may be no more than an information cascade; information cascades, however, can be broken by the existence of outliers who are either informed or act at random (“misfits”). While each time a failed option is tried, it may seem irrational (“how many times must VR fail before people finally give up on it‽”), it was still rational in the big picture to give it a try, as this collective strategy collectively minimizes regret & maximizes collective total long-term returns—as long as failed options aren’t tried too often.
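
A toy rendering of such ‘social sampling’ dynamics, under assumptions of my own (the strategy names, payoff rates, & explore probability are all invented for illustration):

```python
import random

def social_sample(popularity, payoff_probs, explore=0.05):
    """One agent picks a strategy: usually copy by popularity, rarely explore."""
    names = list(popularity)
    if random.random() < explore:
        # Rare 'misfit' move: try a strategy regardless of conventional wisdom,
        # which is what breaks information cascades.
        s = random.choice(names)
    else:
        # Common move: clone a strategy in proportion to its current popularity.
        s = random.choices(names, weights=[popularity[n] for n in names])[0]
    if random.random() < payoff_probs[s]:
        popularity[s] += 1  # visible success recruits more imitators
    return s

popularity = {"standard career": 1000, "startup": 10, "VR startup": 1}
payoff_probs = {"standard career": 0.50, "startup": 0.05, "VR startup": 0.02}
for _ in range(100_000):
    social_sample(popularity, payoff_probs)
print(popularity)  # imitation concentrates on whatever has been paying off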

Reducing Regret

What does this analogy suggest? The two failure modes of a MAB algorithm are investing too much in one option early on, and then investing too little later on; in the former, you inefficiently buy too much information on an option which happened to have good luck but is not guaranteed to be the best, at the expense of others (which may in fact be the best), while in the latter, you buy too little & risk permanently making a mistake by prematurely rejecting an apparently-bad option (which simply had bad luck early on). To the extent that VC/startups stampede into particular sectors, this leads to inefficiency of the first kind—were so many ‘green energy’ startups necessary? When they began failing in a cluster, information-wise, that was highly redundant. And then on the other hand, if a startup idea becomes ‘debunked’, and no one is willing to invest in it ever, that idea may be starved of investment long past its ripe time, and this means big regret.

I think most people are aware of fads/stampedes in investing, but the latter error is not so commonly discussed. One idea is that a VC firm could explicitly track ideas that seem great but have had several failed startups, and try to schedule additional investments at ever greater intervals (similar to DS-PSRL), which bounds losses (if the idea turns out to be truly a bad idea after all) but ensures eventual success (if a good one). For example, even if online pizza delivery has failed every time it’s tried, it still seems like a good idea that people will want to order pizza online via their smartphones, so one could try to do a pizza startup 2.5 years later, then 5 years later, then 10 years, then 20 years, or perhaps every time computer costs drop an order of magnitude, or perhaps every time the relevant market doubles in size? Since someone wanting to try the business again might not pop up at the exact time desired, a VC might need to create one themselves by trying to inspire someone to do it.
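
The arithmetic of such a backoff schedule is what bounds the losses: doubling intervals mean only a handful of failed attempts over any span of time, while any unpredictable ‘ripe time’ gets overshot by at most the length of the current interval. A trivial sketch, using the essay’s hypothetical 2.5-year starting gap:

```python
def revisit_schedule(first_gap=2.5, n_retries=6):
    """Years at which to fund yet another attempt at a 'debunked' idea."""
    year, gap, schedule = 0.0, first_gap, []
    for _ in range(n_retries):
        year += gap
        schedule.append(year)
        gap *= 2  # exponential backoff: bounded total spend, eventual coverage
    return schedule

print(revisit_schedule())  # [2.5, 7.5, 17.5, 37.5, 77.5, 157.5]
```

After n retries only n attempts have been burned, yet the gap between retries never exceeds the time already elapsed, so a good idea is funded again within roughly a factor of 2 of its ripe time.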

What other lessons could we draw if we thought about technology this way? The use of lottery grants is one idea which has been proposed, to help break the over-exploitation fostered by peer review; the randomization gives disfavored low-probability proposals (and people) a chance. If we think about multi-level optimization systems & population-based training, and the optimization of evolution by structures like strong amplifiers (which resemble small but networked communities), that would suggest we should have a bias against both large and small groups/institutes/granters, because small ones are buffeted by random noise/drift and can’t afford well-powered experiments, but large ones are too narrow-minded. But a network of medium ones can both explore well and then efficiently replicate the best findings across the network to exploit them.

See Also

Appendix

ARPA and SCI: Surfing AI (Review of Roland & Shiman 2002)

Review of a DARPA history book, Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993, Roland & Shiman 2002, which reviews a large-scale DARPA effort to jumpstart real-world uses of AI in the 1980s by a multi-pronged research effort into more efficient computer chip R&D, supercomputing, robotics/self-driving cars, & expert system software. Roland & Shiman 2002 particularly focus on the various ‘philosophies’ of technological forecasting & development which guided DARPA’s strategy in different periods, ultimately endorsing a weak technological determinism where the bottlenecks are too large for a small (in comparison to the global economy & global R&D) organization to overcome, so the best a DARPA can hope for is a largely agnostic & reactive strategy in which granters ‘surf’ technological changes, rapidly exploiting new technology while investing their limited funds into targeted research patching up any gaps or lags that accidentally open up and block broader applications.

While reading “Funding Breakthrough Research: Promises and Challenges of the ‘ARPA Model’”, Azoulay et al 2018, on DARPA, I noticed an interesting comment:

In this paper, we propose that the key elements of the ARPA model for research funding are: organizational flexibility on an administrative level, and significant authority given to program directors to design programs, select projects and actively manage projects. We identify the ARPA model’s domain as mission-oriented research on nascent S-curves within an inefficient innovation system.

…Despite a great deal of commentary on the ARPA model, lack of access to internal archival data has hampered efforts to study it empirically. One notable exception is the work of Roland and Shiman (2002), who offer an industrial history of DARPA’s effort to develop machine intelligence under the “Strategic Computing Initiative” [SCI]. They emphasize both the agency’s positioning in the research ecosystem—carrying military ideas to proof of concept that would be otherwise neglected—as well as the program managers’ role as “connectors” in that ecosystem. Roland and Shiman are to our knowledge the only academic researchers ever to receive internal access to DARPA’s archives. Recent work by Goldstein and Kearney (2018a) on ARPA-E is to-date the only quantitative analysis using internal program data from an ARPA agency. [For insights into this painful process, see the preface of Roland and Shiman (2002).]

The two Goldstein & Kearney 2018 papers sounded interesting but alas, are listed as “manuscript under review”/“manuscript in preparation”; only one is available as a preprint. I was surprised that an agency as well known and intimately involved in computing history could be described as having one internal history, ever, and looked up a PDF copy of Roland & Shiman 2002.

The preface makes clear the odd footnote: while they may have had some access to internal archival data, they had a lot less access than they requested, DARPA was not enthusiastic about it, and eventually canceled their book contract (they published anyway). This leads to an… interesting preface. You don’t often hear historians of solicited official histories describe the access as a “mixed blessing” and say things like “they never lied to us, as best as we can tell”, they just “simply could not understand why we wanted to see the materials we requested”, or recount that their “requests for access to these [emails] were most often met with laughter”, noting that “We were never explicitly denied access to records controlled by DARPA; we just never gained complete access.” Frustrated, they

…then asked if they could iden­tify any doc­u­ment in the SC pro­gram that spoke the truth, that could be accepted at face val­ue. They [ARPA inter­vie­wees] found this an intrigu­ing ques­tion. They could not think of a sin­gle such doc­u­ment. All doc­u­ments, in their view, dis­torted real­ity one way or anoth­er—al­ways in pur­suit of some greater good.

In one anecdote from the interviews, Lynn Conway shows up with a stack of internal DARPA documents, states that an NDA prevents her from talking about them (as if anyone cared about NDAs from decades before), and refuses to show any of the documents to the interviewer, leaving me rather bemused—why bother? (Although in this case, it may just be that Conway is a jerk—one might remember her from helping try to frame Michael Bailey for sexual abuse.) I was reminded a little of Carter Scholz’s also-2002 novel, Radiance, which touches on SDI and indirectly on SCI.

The book itself doesn’t seem to have suffered too badly for the birth pangs. It’s an overview of the birth and death of the SCI, organized in chunks by manager. The division by manager is not an accident—R&S comment deprecatingly about DARPA personnel being focused on the technology and not wanting them to “talk about people and politics”, and invoke the strawman of “technological determinists”; they seem to adopt the common historian pose that a sophisticated historian focuses on people and it is naive & unsophisticated to invoke objective constraints of science & technology & physics. This is wrong in the context of SCI, as their in-depth recounting will eventually make clear. The people did not have much to do with the failures: stuff like gallium arsenide or autonomous robots didn’t work out because they don’t work or are hard or require computing power unavailable at the time, not because some bureaucrat made a bad naming choice or ran afoul of the wrong Senator. People don’t matter to something like Moore’s law. Man proposes but Nature disposes—you can fake medicine or psychology easily, but it’s harder to fake a robot not running into trees. Fortunately, for all the time R&S spend on project managers shuffling around acronyms, they still devote adequate space to the actual science & technology and do a good job of it.

So what was SCI? It was a 1983–1993 add-on to ARPA’s existing funding programs, where the spectre of Japan’s Fifth Generation project was used to lobby Congress for additional R&D funding which would be devoted to a cluster of interconnected technological opportunities ARPA spied on the US horizon, to push them forward simultaneously and break the logjams. (As always, “funding comes from the threat”, though many were highly skeptical that Fifth Generation would go anywhere or that its intended goals—much of which was to simply work around flaws in Japanese language handling—were much of a threat, and most Western evaluations of it generally describe it as a failure or at least not a notably productive R&D investment.) The systems included gallium arsenide chips to replace silicon’s poor thermal/radiation tolerance and operate at faster frequencies as well, VLSI chips which would combine previously disparate chips onto a single small chip as part of a silicon design ecosystem which would design & manufacture chips much faster than previously30, parallel processing computers going far beyond just 1 or 2 processors, autonomous car robots, AI expert systems, and advanced user-friendly software tools in general. The name “Strategic Computing Initiative” was chosen to try to benefit from Reagan’s SDI, but while the military connections remained throughout, the connection was ultimately quite tenuous and the gallium arsenide chips were deliberately split out to SDI to avoid contamination, although the US military would still be the best customer for many of the products & the connections continued to alienate people. Surprisingly—shockingly, even—computer networking was not a major SCI focus: the ARPA networking PM Barry Leiner kept clear of SCI (not needing the money & fearing a repeat of know-nothing Republican Congressmen searching for something to axe). The funding ultimately amounted to ~$2.2 billion (inflation-adjusted), trivial compared to total military funding, but still real money.

The project imple­men­ta­tion fol­lowed ARPA’s exist­ing loose over­sight par­a­digm, where trav­el­ing project man­agers were empow­ered to dis­pense grants to appli­cants on their own author­i­ty, depend­ing pri­mar­ily on their own good taste to match tal­ented researchers with ripe oppor­tu­ni­ties, with bureau­cracy lim­ited to meet­ing with the grantees semi­-an­nu­ally or annu­ally for progress reports & eval­u­a­tion, often in groups so as to let researchers test each oth­er’s met­tle & form social ties. (“ARPA pro­gram man­agers like to repeat the quip that they are 75 entre­pre­neurs held together by a com­mon travel agent.”) An ARPA PM would humbly ‘surf’ the cut­ting-edge, going with the waves rather than swim­ming upstream, so to speak, to fol­low grow­ing trends while cut­ting their losses on dead ends, to bring things through the ‘val­ley of death’ between lab pro­to­type and the real world:

Steven Squires, who rose from pro­gram man­ager to be Chief Sci­en­tist of SC and then direc­tor of its par­ent office, sought order­s-of-mag­ni­tude increases in com­put­ing power through par­al­lel con­nec­tion of proces­sors. He envi­sioned research as a con­tin­u­um. Instead of point solu­tions, sin­gle tech­nolo­gies to serve a given objec­tive, he sought mul­ti­ple imple­men­ta­tions of related tech­nolo­gies, an array of capa­bil­i­ties from which users could con­nect differ­ent pos­si­bil­i­ties to cre­ate the best solu­tion for their par­tic­u­lar prob­lem. He called it “gray cod­ing”. Research moved not from the white of igno­rance to the black of rev­e­la­tion, but rather it inched along a tra­jec­tory step­ping incre­men­tally from one shade of gray to anoth­er. His research map was not a quan­tum leap into the unknown but a ratio­nal process of con­nect­ing the dots between here and there. These and other DARPA man­agers attempted to orches­trate the advance­ment of an entire suite of tech­nolo­gies. The desider­a­tum of their sym­phony was con­nec­tion. They per­ceived that research had to mir­ror tech­nol­o­gy. If the sys­tem com­po­nents were to be con­nect­ed, then the researchers had to be con­nect­ed. If the sys­tem was to con­nect to its envi­ron­ment, then the researchers had to be con­nected to the users. Not every­one in SC shared these insights, but the founders did, and they attempted to instill this ethos in the pro­gram.
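
Squires’ “gray coding” borrows from actual Gray codes, in which consecutive values differ by exactly one bit, so that any path through the code space proceeds by minimal increments rather than leaps; a minimal Python illustration of the property the metaphor rests on:

```python
# Binary-reflected Gray code: successive values differ by exactly one bit,
# so a trajectory through the code space moves one small step at a time.
def gray(n: int) -> int:
    return n ^ (n >> 1)

codes = [gray(i) for i in range(8)]
print([format(c, "03b") for c in codes])
# ['000', '001', '011', '010', '110', '111', '101', '100']

# Verify the one-bit-per-step property:
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))
```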

Done wrong, of course, this results in a corrupt slush fund doling out R&D funds to an incestuous network of grantees for technologies always just on the horizon, whose failure is always excused by the claim that high-risk research often won’t work out; or it results in elaborate systems trying to do too many things and collapsing under the weight of many advanced half-debugged systems chaotically interacting. Having been conceived in scientific sin and born of blue-uniform bureaucracy while midwifed by conniving committees, SCI’s prospects might not look too great.

So, did SCI work out? The answer is a defi­nite, unqual­i­fied—­may­be:

At the end of their decade, 1983–1993, the con­nec­tion failed. SC never achieved the machine intel­li­gence it had promised. It did, how­ev­er, achieve some remark­able tech­no­log­i­cal suc­cess­es. And the pro­gram lead­ers and researchers learned as much from their fail­ures as from their tri­umphs. They aban­doned the weak com­po­nents in their sys­tem and recon­fig­ured the strong ones. They called the new sys­tem “high per­for­mance com­put­ing”. Under this new rubric they con­tin­ued the cam­paign to improve com­put­ing sys­tems. “Grand chal­lenges” replaced the for­mer goal, machine intel­li­gence; but the strat­egy and even the tac­tics remained the same.

The end of SCI coincided with (and partially caused) the “AI winter”, but SCI went beyond just the Lisp machine & expert system software companies we associate with the AI winter. Of the systems, some worked out, others were good ideas but the time wasn’t ripe in an unforeseeable way and have been maturing ever since, some have poked along in a kind of permanent stasis (not dead but not alive either), others were dead ends but dead ends in important ways, and some are plain dead. In order, one might list: parallel commodity processors and rapid development of large silicon chips via a subsidized foundry; the autonomous cars/vehicles and generalized machine intelligence systems and expert systems; gallium arsenide; and Josephson junctions.

Pining for the fjords: super-fast superconducting Josephson junctions were rapidly abandoned before becoming officially part of SCI research, while gallium arsenide suffered a similar fate—at the time, gallium arsenide chips were exciting and the Cray-3 infamously bet big on achieving its OOM improvement in part with gallium arsenide chips, but somehow it never quite worked out or replaced silicon and remains in a small niche. (I doubt it was SDI’s fault, since gallium arsenide has had 2 decades since, and there’s been a ton of commercial incentive to find a replacement for silicon as it gets ever harder to shrink silicon nodes.)

Important failures: autonomous vehicles and generalized AI systems represent an interesting intermediate case: the funded vehicles, like the work at CMU, were useless—expensive, slow, trivially confused by slight differences in roads or scenery, unable to cope in realtime with more than monochrome images at pitiful resolutions like 640×640px or smaller because the computer vision algorithms were too computationally demanding, and the development bogged down by endless tweaks and hacking with regular regressions in capability. But these research programs and demos were direct ancestors of the DARPA Grand Challenge, which itself kickstarted the current self-driving car boom a decade ago. ARPA and the military didn’t get the exciting vehicles promised by the early ’90s, but they do now have autonomous cars and especially drones, and it’s amazing to think that Google Waymo cars are wandering around Arizona now regularly picking up and dropping off riders without a single fatality or major injury after millions of miles. As far as I can tell, Waymo wouldn’t exist now without the DARPA Grand Challenge, and it seems possible that DARPA was encouraged by the mixed success of the SCI vehicles, so that’s an interesting case of potential success, albeit delayed. (But then, we do expect that with technology—Amara’s law.)

Par­al­lel com­put­ers: Think­ing Machines ben­e­fited a lot from SCI as did other par­al­lel com­put­ing pro­jects, and while TM did fail and the com­put­ers we use now don’t resem­ble the Con­nec­tion Machine at all31, the field of par­al­lel pro­cess­ing was proven out (ie. sys­tems with thou­sands of weak CPUs could be suc­cess­fully built, pro­grammed, real­ize OOM per­for­mance gains, and com­mer­cially sol­d); I’d noticed once that a lot of par­al­lel com­put­ing archi­tec­tures we use now seemed to stem from an efflo­res­cence in the 1980s, but it was only while read­ing R&S and not­ing all the famil­iar names that I real­ized that that was not a coin­ci­dence because many of them were ARPA-funded at this time. Even with­out R&S not­ing that the par­al­lel com­put­ing was suc­cess­fully rolled over into “HPC”, SCI’s invest­ment into par­al­lel com­put­ing was a big suc­cess.

A successful adjunct to the parallel computing was an interesting program I’d never heard of before: MOSIS. MOSIS was essentially a government-subsidized chip foundry, competitive with commercial chip foundries, which would accept student & researcher submissions of VLSI chip designs like CPUs or ASICs and make physical chips in combined batches to save costs. Anyone with interesting new ideas could email in a design and get back within 2 months a real live chip for a few hundred dollars. The chips would be made cheaply, quickly, quality-checked, with assurance of privacy, and MOSIS ran thousands of projects a year (peaking at 1880 in 1989). This is quite a cool program to run and must have been a godsend, especially for anyone trying to make custom chips for parallel projects. (“SC also supported BBN’s Butterfly parallel processor, Charles Seitz’s Hypercube and Cosmic Cube at CalTech, Columbia’s Non-Von, and the CalTech Tree Machine. It supported an entire newcomer as well, Danny Hillis’s Connection Machine, coming out of MIT.47 All of these projects used MOSIS services to move their design ideas into experimental chips.”) It was involved in early GPU work (Clark’s Geometry Engine) and RISC designs like MIPS, and even oddities like systolic array chips/computers such as CMU’s Warp. Sadly, MOSIS was a bit of a victim of its own success and drew political fire.

Expert systems and planners are generally listed as a ‘failure’ and the cause of the AI Winter, and it’s true they didn’t give us HAL as some GOFAI people hoped, but they did find a useful niche and have been important—R&S give a throwaway paragraph noting that one system from SCI, DART, was used in planning logistics for the first Gulf War and saved the DoD more money than the whole SCI program combined cost. (The listed reference, “DART: Revolutionizing Logistics Planning”, Hedberg 2002, actually makes the bolder claim that DART “paid back all of DARPA’s 30 years of investment in AI in a matter of a few months, according to Victor Reis, Director of DARPA at the time.” Which could be equally well taken as a comment on how expensive a war is, how inefficient DoD logistics planning was, or how little has been invested in AI.) It’s also worth noting that speech recognition based on hidden Markov models, the first speech recognition systems which were any use, was a success here, even if now obsolesced by deep learning.

Perhaps the most relevant area to contemporary AI discussions of deep learning is the expert systems. Why was there such optimism? Expert systems had accomplished a few successes: MYCIN/DENDRAL (although MYCIN was never used in production), some mining/oil case studies like PROSPECTOR, a customer configuration assistant for DEC… And SCI was a synergistic program, remember, providing the chips and then powerful parallel computers whose expert systems would scale up to the tens of thousands of rules per second estimated necessary for things like the autonomous vehicles:

Small won­der, then, that Robert Kahn and the archi­tects of SC believed in 1983 that AI was ripe for exploita­tion. It was finally mov­ing out of the lab­o­ra­tory and into the real world, out of the realm of toy prob­lems and into the realm of real prob­lems, out of the ster­ile world of the­ory and into the prac­ti­cal world of appli­ca­tions.

…That such a goal appeared within reach in the early 1980s is a mea­sure of how far the field had already come. In the early 1970s, the MYCIN expert sys­tem had taken twenty per­son­-years to pro­duce just 475 rules.38 The full poten­tial of expert sys­tems lay in pro­grams with thou­sands, even tens and hun­dreds of thou­sands, of rules. To achieve such lev­els, pro­duc­tion of the sys­tems had to be dra­mat­i­cally stream­lined. The com­mer­cial firms spring­ing up in the early 1980s were build­ing cus­tom sys­tems one client at a time. DARPA would try to raise the field above that lev­el, up to the generic or uni­ver­sal appli­ca­tion.

Thus was shaped the SC agenda for AI. While the basic pro­gram within IPTO con­tin­ued fund­ing for all areas of AI, SC would seek “generic appli­ca­tions” in four areas crit­i­cal to the pro­gram’s appli­ca­tions: (1) speech recog­ni­tion would sup­port Pilot’s Asso­ciate and Bat­tle Man­age­ment; (2) nat­ural lan­guage would be devel­oped pri­mar­ily for Bat­tle Man­age­ment; (3) vision would serve pri­mar­ily the Autonomous Land Vehi­cle; and (4) expert sys­tems would be devel­oped for all of the appli­ca­tions. If AI was the penul­ti­mate tier of the SC pyra­mid, then expert sys­tems were the pin­na­cle of that tier. Upon them all appli­ca­tions depend­ed. Devel­op­ment of a generic expert sys­tem that might ser­vice all three appli­ca­tions could be the crown­ing achieve­ment of the pro­gram. Opti­mism on this point was fueled by the whole phi­los­o­phy behind SC. AI in gen­er­al, and expert sys­tems in par­tic­u­lar, had been ham­pered pre­vi­ously by lack of com­put­ing pow­er. Feigen­baum, for exam­ple, had begun DENDRAL on an IBM 7090 com­put­er, with about 130K bytes of core mem­ory and an oper­at­ing speed between 50 and 100,000 float­ing point oper­a­tions per sec­ond.39 Com­puter power was already well beyond that stage, but SC promised to take it to unprece­dented lev­el­s—a gigaflop by 1992. Speed and power would no longer con­strain expert sys­tems. If AI could deliver the generic expert sys­tem, SC would deliver the hard­ware to run it. Com­pared to exist­ing expert sys­tems run­ning 2,000 rules at 50–100 rules per sec­ond, SC promised “mul­ti­ple coop­er­at­ing expert sys­tems with plan­ning capa­bil­ity” run­ning 30,000 rules fir­ing at 12,000 rules per sec­ond and six times real time.40
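
To make those throughput targets concrete: a production system repeatedly matches every rule against the known facts, and naive matching of a rule with k joined conditions costs on the order of (number of facts)^k per cycle, so “rules fired per second” collapses as the knowledge base grows. A toy Python sketch of the effect (illustrative only, not any real SC system):

```python
import itertools, time

# Each rule is a tuple of k predicates which must all match some combination
# of known facts; naive matching costs O(len(facts)**k) per rule, so the
# work per match cycle explodes as the knowledge base grows.
def count_matches(rule, facts):
    return sum(
        all(pred(fact) for pred, fact in zip(rule, combo))
        for combo in itertools.product(facts, repeat=len(rule))
    )

# 10 identical 3-condition rules over integer 'facts' (divisibility tests).
rules = [tuple(lambda f, i=i: f % i == 0 for i in (2, 3, 4))] * 10

for n in (10, 20, 40):
    facts = list(range(1, n + 1))
    start = time.time()
    total = sum(count_matches(rule, facts) for rule in rules)
    # each doubling of the fact base costs ~8x the matching time
    print(f"{n:3d} facts -> {total:6d} matches in {time.time() - start:.3f}s")
```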

What hap­pened was that the hard­ware came into exis­tence, but the expert sys­tems did­n’t scale. They instantly hit a com­bi­na­to­r­ial wall, could­n’t solve the ground­ing prob­lem, and knowl­edge engi­neer­ing never became fea­si­ble at the level where you might encode a human’s knowl­edge. Expert sys­tems also strug­gled to be extended beyond sym­bolic sys­tems to real data like vision or sound. AI did­n’t have remotely enough com­put­ing power to do any­thing use­ful, and it did­n’t have meth­ods which could use the com­put­ing power if it had it. We got the VLSI chips, we got the giga­hertz proces­sors even with­out gal­lium arsenide, we got the gigaflops and then the ter­aflops and now the petaflop­s—but what do you do with an expert sys­tem on those? Noth­ing. The grand goals of SCI relied on all the parts doing their part, and one part fell through:

Only four years into the SC pro­gram, when Schwartz was about to ter­mi­nate the Intel­liCorp and Teknowl­edge con­tracts, expec­ta­tions for expert sys­tems were already being scaled back. By the time that Hayes-Roth revised his arti­cle for the 1992 edi­tion of the Ency­clo­pe­dia, the pic­ture was still more bleak. There he made no pre­dic­tions at all about pro­gram speeds. Instead he noted that rule-based sys­tems still lacked “a pre­cise ana­lyt­i­cal foun­da­tion for the prob­lems solv­able by RBSs . . . and a the­ory of knowl­edge orga­ni­za­tion that would enable RBSs to be scaled up with­out loss of intel­li­gi­bil­ity of per­for­mance.”108 SC con­trac­tors in other fields, espe­cially appli­ca­tions, had to rely on cus­tom-de­vel­oped soft­ware of con­sid­er­ably less power and ver­sa­til­ity than those envi­sioned when con­tracts were made with Intel­liCorp and Teknowl­edge. Instead of a generic expert sys­tem, SC appli­ca­tions relied increas­ingly on “domain-spe­cific soft­ware”, a change in ter­mi­nol­ogy that reflected the direc­tion in which the entire field was mov­ing.109 This is strik­ingly sim­i­lar to the pes­simistic eval­u­a­tion Schwartz had made in 1987. It was not just that Intel­liCorp and Teknowl­edge had failed; it was that the enter­prise was impos­si­ble at cur­rent lev­els of expe­ri­ence and under­stand­ing…­Does this mean that AI has finally migrated out of the lab­o­ra­tory and into the mar­ket­place? That depends on one’s per­spec­tive. In 1994 the U.S. Depart­ment of Com­merce esti­mated the global mar­ket for AI sys­tems to be about $1,918 mil­lion, with North Amer­ica account­ing for two-thirds of that total.119 Michael Schrage, of the Sloan School’s Cen­ter for Coor­di­na­tion Sci­ence at MIT, con­cluded in the same year that “AI is—­dol­lar for dol­lar—prob­a­bly the best soft­ware devel­op­ment invest­ment that smart com­pa­nies have made.”120 Fred­er­ick Hayes-Roth, in a wide-rang­ing and can­did assess­ment, insisted that “KBS have attained a per­ma­nent and secure role in indus­try”, even while admit­ting the many short­com­ings of this tech­nol­o­gy.121 Those short­com­ings weighed heav­ily on AI author­ity Daniel Crevier, who con­cluded that “the expert sys­tems flaunted in the early and mid-1980s could not oper­ate as well as the experts who sup­plied them with knowl­edge. To true human experts, they amounted to lit­tle more than sophis­ti­cated remind­ing lists.”122 Even Edward Feigen­baum, the father of expert sys­tems, has con­ceded that the prod­ucts of the first gen­er­a­tion have proven nar­row, brit­tle, and iso­lat­ed.123 As far as the SC agenda is con­cerned, Hayes-Roth’s 1993 opin­ion is dev­as­tat­ing: “The cur­rent gen­er­a­tion of expert and KBS tech­nolo­gies had no hope of pro­duc­ing a robust and gen­eral human-like intel­li­gence.”124

…Each new [ALV] fea­ture and capa­bil­ity brought with it a host of unan­tic­i­pated prob­lems. A new pan­ning sys­tem, installed in early 1986 to per­mit the cam­era to turn as the road curved, unex­pect­edly caused the vehi­cle to veer back and forth until it ran off the road alto­geth­er.45 The soft­ware glitch was soon fixed, but the pan­ning sys­tem had to be scrapped any­way; the heavy, 40-pound cam­era stripped the device’s gears when­ever the vehi­cle made a sud­den stop.46 Given such unan­tic­i­pated diffi­cul­ties and delays, Mar­tin increas­ingly directed its efforts toward achiev­ing just the spe­cific capa­bil­i­ties required by the mile­stones, at the expense of devel­op­ing more gen­eral capa­bil­i­ties. One of the lessons of the first demon­stra­tion, accord­ing to the ALV engi­neers, was the impor­tance of defin­ing “expected exper­i­men­tal results”, because “too much time was wasted doing things not appro­pri­ate to proof of con­cept.”47 Mar­t­in’s selec­tion of tech­nol­ogy was con­ser­v­a­tive. It had to be, as the ALV pro­gram could afford nei­ther the lost time nor the bad pub­lic­ity that a major fail­ure would bring. One BDM observer expressed con­cern that the pres­sure of the demon­stra­tions was encour­ag­ing Mar­tin to cut cor­ners, for instance by using the “flat earth” algo­rithm with its two-di­men­sional rep­re­sen­ta­tion. ADS’s obsta­cle-avoid­ance algo­rithm was so nar­rowly focused that the com­pany was unable to test it in a park­ing lot; it worked only on roads.84…The vision sys­tem proved highly sen­si­tive to envi­ron­men­tal con­di­tion­s—the qual­ity of light, the loca­tion of the sun, shad­ows, and so on. The sys­tem worked differ­ently from month to mon­th, day to day, and even test to test. Some­times it could accu­rately locate the edge of the road, some­times not. The sys­tem reli­ably dis­tin­guished the pave­ment of the road from the dirt on the shoul­ders, but it was fooled by dirt that was tracked onto the road­way by heavy vehi­cles maneu­ver­ing around the ALV. In the fall, the sun, now lower in the sky, reflected bril­liantly off the myr­i­ads of pol­ished peb­bles in the tar­mac itself, pro­duc­ing glit­ter­ing reflec­tions that con­fused the vehi­cle. Shad­ows from trees pre­sented prob­lems, as did asphalt patches from the fre­quent road repairs made nec­es­sary by the harsh Col­orado weather and the con­stant pound­ing of the eight-ton vehi­cle.42

…Knowl­edge-based sys­tems in par­tic­u­lar were diffi­cult to apply out­side the envi­ron­ment for which they had been devel­oped. A vision sys­tem devel­oped for autonomous nav­i­ga­tion, for exam­ple, prob­a­bly would not prove effec­tive for an auto­mated man­u­fac­tur­ing assem­bly line. “There’s no sin­gle uni­ver­sal mech­a­nism for prob­lem solv­ing”, Amarel would later say, “but depend­ing on what you know about a prob­lem, and how you rep­re­sent what you know about the prob­lem, you may use one of a num­ber of appro­pri­ate mech­a­nisms.”…In another major shift in empha­sis, SC2 removed “machine intel­li­gence” from its own plateau on the pyra­mid, sub­sum­ing it under the gen­eral head­ing “soft­ware”. This seem­ingly minor shift in nomen­cla­ture sig­naled a pro­found recon­cep­tu­al­iza­tion of AI, both within DARPA and through­out much of the com­puter com­mu­ni­ty. The effer­ves­cent opti­mism of the early 1980s gave way to more sober appraisal. AI did not scale. In spite of impres­sive achieve­ments in some fields, design­ers could not make sys­tems work at a level of com­plex­ity approach­ing human intel­li­gence. Machines excelled at data stor­age and retrieval; they lagged in judg­ment, learn­ing, and com­plex pat­tern recog­ni­tion.

…Dur­ing SC, AI had proved unable to exploit the pow­er­ful machines devel­oped in SC’s archi­tec­tures pro­gram to achieve Kah­n’s generic capa­bil­ity in machine intel­li­gence. On the fine-grained lev­el, AI, includ­ing many devel­op­ments from the SC pro­gram, is ubiq­ui­tous in mod­ern life. It inhab­its every­thing from auto­mo­biles and con­sumer elec­tron­ics to med­ical devices and instru­ments of the fine arts. Iron­i­cal­ly, AI now per­forms mir­a­cles unimag­ined when SC began, though it can’t do what SC promised.

Given how peo­ple keep reach­ing back to the AI Win­ter in dis­cus­sions of con­nec­tion­is­m—I mean, deep learn­ing—it’s inter­est­ing to con­trast the two par­a­digms.

While working on Wikipedia articles on AI back in 2009 (including articles on high-profile successes like MYCIN/DENDRAL), I read many journals & magazines from the 1980s, the Lisp machine heyday, and even played with a Genera OS image in a VM; the more I read about AI, the MIT AI Lab, Lisp machines, the ‘AI winter’, and so on, the more impressed I was by the operating systems & tools (such as the sophisticated hypertext documentation & text editors and capabilities of Common Lisp & its predecessors, which still put contemporary OS ecosystems on Windows/Mac/Linux to shame in many ways32) and the less I was impressed by the actual AI algorithms of the era. In contrast, with deep learning, I am increasingly unimpressed by the surrounding ecosystem of software tools (with its endless layers of buggy Python & rigid C++) the more I use it, but more and more impressed by what is possible with deep learning.

Deep learning has long ago escaped into the commercial market; indeed, it is primarily driven by industry researchers at this point. The case studies are innumerable (and many are secret due to their considerable commercial value). DL handles grounding problems & raw sensory data well and indeed struggles most on problems with richly formalized structures like hierarchies/categories/directed graphs (ML practitioners currently tend to use decision tree methods like XGBoost for those), or which require using rules & logical reasoning (somewhat like humans). Perhaps most importantly from the perspective of SCI and HPC, deep learning scales: it parallelizes in a number of ways, and it can soak up indefinite amounts of computing power & data. You can train a CNN on a few hundred or thousand images usefully33, but Facebook & Google have run experiments going from millions to large datasets such as hundreds of millions and billions of images (eg Mahajan et al 2018), and the CNNs steadily improve their performance on both their assigned task and in accomplishing transfer learning34. Similarly in reinforcement learning, the richer the resources available, the richer a NN can be trained (consider agents like IMPALA learning many ALE games simultaneously, or OpenAI’s 5v5 DotA 2 progress via essentially brute force). Even self-driving car programs which are a byword for incompetence deal just fine with all the issues that bedeviled ALV by using, well, ‘a single universal mechanism for problem solving’ (which we call CNNs, which can do anything from image segmentation to human language translation). These points are all the more striking as there is no sign that hardware improvements are over or that any inherent limits have been hit; even the large-scale experiments criticized as ‘boil the oceans’ projects nevertheless spend what are trivial amounts of money by both global economic and R&D criteria, like a few million dollars of GPU time. But none of this could have been done in the 1980s, or early 1990s. (As Hinton says, why didn’t connectionism work back then? Because the computers were thousands of times too slow, the datasets were thousands of times too small, and some of the neural network details like initializations & activations were broken.)
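
The small-data regime mentioned above works because of transfer learning; here is a minimal sketch of fine-tuning a pretrained CNN on a small local dataset, using PyTorch/torchvision (the dataset path and hyperparameters are hypothetical placeholders):

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Minimal transfer-learning sketch: reuse a CNN pretrained on ~1M ImageNet
# images, freeze its features, and retrain only the final classification
# layer on a few hundred local images ("my_images/" is a placeholder path).
tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])
data = datasets.ImageFolder("my_images/", tfm)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

net = models.resnet50(weights="IMAGENET1K_V2")       # pretrained feature extractor
for p in net.parameters():
    p.requires_grad = False                          # freeze pretrained weights
net.fc = nn.Linear(net.fc.in_features, len(data.classes))  # new trainable head

opt = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):                               # a few epochs suffice
    for x, y in loader:
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()
```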

Considering all this, it’s not a surprise that the AI part of SC didn’t pan out and eventually got axed, as it should have. Sometimes the time is not ripe. Hero can invent the steam engine, but you don’t get steam engine trains until it’s steam engine train time, and the best intentions of all the bureaucrats in the world can’t affect that much. The turnover in managers and political interference may well have been enough to “disrupt the careful orchestration that its ambitious agenda required”, but this was more in the nature of shooting a dead horse. R&S seem, somewhat reluctantly, to ultimately assent to the view they critiqued at the beginning, held by the ARPA staff: that the failure of SC is more a demonstration of technological determinism than of social & political contingency, and more about the technology than the people:

…Thus, for all of their agency, their story appears to be one driven by the technology. If they were unable to socially construct this technology, to maintain agency over technological choice, does it then follow that some technological imperative shaped the SC trajectory, diverting it in the end from machine intelligence to high performance computing? Institutionally, SC is best understood as an analog of the development programs for the atomic bomb and ballistic missiles. An elaborate structure was created to sell the program, but in practice the plan bore little resemblance to day-to-day operations. Conceptually, SC is best understood by mixing Thomas Hughes’s framework of large-scale technological systems with Giovanni Dosi’s notions of research trajectories. Its experience does not quite map on Hughes’s model because the managers could not or would not bring their reverse salients on line. It does not quite map on Dosi because the managers regularly dealt with more trajectories and more variables than Dosi anticipates in his analyses. In essence, the managers of SC were trying to research and develop a complex technological system. They succeeded in developing some components; they failed to connect them in a system. The overall program history suggests that at this level of basic or fundamental research it is best to aim for a broad range of capabilities within the technology base and leave integration to others…While the Fifth Generation program contributed significantly to Japan’s national infrastructure in computer technology, it did not vault that country past the United States…SC played an important role, but even some SC supporters have noted that the Japanese were in any event headed on the wrong trajectory even before the United States mobilized itself to meet their challenge.

…In some ways the vary­ing records of the SC appli­ca­tions shed light on the pro­gram mod­els advanced by Kahn and Cooper at the out­set. Cooper believed that the appli­ca­tions would pull tech­nol­ogy devel­op­ment; Kahn believed that the evolv­ing tech­nol­ogy base would reveal what appli­ca­tions were pos­si­ble. Kah­n’s appraisal looks more real­is­tic in ret­ro­spect. It is clear that expert sys­tems enjoyed sig­nifi­cant suc­cess in plan­ning appli­ca­tions. This made pos­si­ble appli­ca­tions rang­ing from Naval Bat­tle Man­age­ment to DART. Vision did not make com­pa­ra­ble pro­gress, thus pre­clud­ing achieve­ment of the ambi­tious goals set for the ALV. Once again, the pro­gram went where the tech­nol­ogy allowed. Some reverse salients resisted efforts to orches­trate advance of the entire field in con­cert. If one com­po­nent in a sys­tem did not con­nect, the sys­tem did not con­nect.

In the final analy­sis, SC failed for want of con­nec­tion.

Read­ing about SC fur­nishes an unex­pected les­son about the impor­tance of believ­ing in Moore’s Law and hav­ing tech­niques which can scale. What are we doing now which won’t scale, and what waves are we pad­dling up instead of surfing?

Reverse Salients

Excerpts from The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine, Lesch 2006, describing Heinrich Hörlein’s drug development programs & Thomas Edison’s electrical programs as strategically aimed at “reverse salients”: necessary steps which hold back the practical application of progress elsewhere, and where research efforts therefore have disproportionate payoffs by removing a bottleneck.
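
One way to see why a reverse salient has disproportionate payoff: if end-to-end capability behaves like the minimum over serial components, effort spent anywhere except the laggard is wasted. A toy sketch, with component names and numbers invented purely for illustration:

```python
# Toy model of a 'reverse salient': if a system's end-to-end capability is
# limited by its weakest component (capability = min over components), a
# fixed unit of R&D effort pays off disproportionately when spent on the
# laggard and not at all when spent anywhere else.
components = {"chips": 0.9, "parallelism": 0.8, "vision": 0.2, "planning": 0.7}

def capability(comps):
    return min(comps.values())   # the bottleneck sets the system's level

for name in components:
    improved = dict(components, **{name: components[name] + 0.1})
    gain = capability(improved) - capability(components)
    print(f"+0.1 to {name:11s} -> system gain {gain:+.1f}")
# Only improving 'vision' (the reverse salient) moves the system at all.
```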

From pg48, “A Sys­tem of Inven­tion”, The First Mir­a­cle Drugs: How the Sulfa Drugs Trans­formed Med­i­cine, Lesch 2006:

[Hörlein’s] attitude was based not simply, or even primarily, on the situation of any particular area of research considered in isolation, but on his comprehensive overview of advance in areas in which chemistry and biomedicine intersected. These areas shared a number of generic problems and solutions, for example, the need to isolate a substance (natural product, synthetic product, body substance) in chemically pure form, the need to synthesize the substance and to do so economically if it was to go on the market, and the need for pharmacological, chemotherapeutic, toxicological, and clinical testing of the substance. Hörlein’s efforts to translate success in certain areas (vitamin deficiency disease, chemotherapy of protozoal infections) into optimism about possibilities in other areas (cancer, antibacterial chemotherapy) was characteristic. He regarded the chemical attack on disease as a many-fronted battle in which there was a generally advancing line but also many points at which advance was slow or arrested.

In this sense, Hörlein might be said to have thought—as Thomas Hughes has shown that Edison did—in terms of reverse salients and critical problems. Reverse salients are areas of research and development that are lagging in some obvious way behind the general line of advance. Critical problems are the research questions, cast in terms of the concrete particulars of currently available knowledge and technique and of specific exemplars or models (e.g., insulin, chemotherapy of [sleeping sickness] and malaria) that are solvable and whose solutions would eliminate the reverse salients.18

  1. On Edis­on, see Thomas P. Hugh­es, Net­works of Pow­er: Elec­tri­fi­ca­tion in West­ern Soci­ety 1880–1930 (Bal­ti­more, MD: Johns Hop­kins Uni­ver­sity Press, 1983), 18–46.
  2. Ibid; and Thomas P. Hugh­es, “The evo­lu­tion of large tech­no­log­i­cal sys­tems”, in Wiebe E. Bijk­er, Thomas P. Hugh­es, and Trevor Pinch, edi­tors, The Social Con­struc­tion of Tech­no­log­i­cal Sys­tems (Cam­bridge, MA: The MIT Press, 1987)

…What was sys­temic in Hör­lein’s way of think­ing was his con­cept of the orga­ni­za­tional pat­tern or pat­terns that will best facil­i­tate the pro­duc­tion of valu­able results in the areas in which med­i­cine and chem­istry inter­act. A valu­able out­come is a result that has prac­ti­cal impor­tance for clin­i­cal or pre­ven­tive med­i­cine and, implic­it­ly, com­mer­cial value for indus­try. Hör­lein per­ceived a need for a set of mutu­ally com­ple­men­tary insti­tu­tions and trained per­son­nel whose inter­ac­tion pro­duces the desired results. The orga­ni­za­tional pat­tern that emerges more or less clearly from Hör­lein’s lec­tures is closely asso­ci­ated with his view of the typ­i­cal phases or cycles of devel­op­ment of research in chemother­apy or phys­i­o­log­i­cal chem­istry. He saw a need for friendly and mutu­ally sup­port­ive rela­tions between indus­trial research and devel­op­ment orga­ni­za­tions, aca­d­e­mic insti­tu­tions, and clin­i­cians. He viewed the aca­d­e­mic-in­dus­trial con­nec­tion as cru­cial and mutu­ally ben­e­fi­cial. Under­ly­ing this view was his defi­n­i­tion and differ­en­ti­a­tion of the rel­e­vant dis­ci­plines and his belief in their gen­er­ally excel­lent con­di­tion in Ger­many. He saw a need for gov­ern­ment sup­port of appro­pri­ate insti­tu­tions, espe­cially research insti­tutes in uni­ver­si­ties. Within indus­trial research orga­ni­za­tion­s—and, implic­it­ly, within aca­d­e­mic ones—Hör­lein called for spe­cial insti­tu­tional arrange­ments to encour­age appro­pri­ate inter­ac­tions between chem­istry and bio­med­i­cine.

An ele­ment of cru­cial—and to Hör­lein, per­son­al—im­por­tance in these inter­ac­tions was the role of the research man­ager or “team leader.” When Hör­lein spoke of the research done under his direc­tion as “our work,” he used the pos­ses­sive advis­edly to con­vey a strong sense of his own par­tic­i­pa­tion. The research man­ager had to be active in defin­ing goals, in mar­shal­ing means and resources, and in assess­ing suc­cess or fail­ure. He had to inter­vene where nec­es­sary to min­i­mize fric­tion between chemists and med­ical researchers, an espe­cially impor­tant task for chemother­apy as a com­pos­ite enti­ty. He had to pub­li­cize the com­pa­ny’s suc­cess­es—a neces­sity for what was ulti­mately a com­mer­cial enter­prise—and act as liai­son between com­pany lab­o­ra­to­ries and the aca­d­e­mic and med­ical com­mu­ni­ties. Through it all, he had to take a long view of the value of research, not insist­ing on imme­di­ate results of med­ical or com­mer­cial val­ue.

As a research man­ager with train­ing and expe­ri­ence in phar­ma­ceu­ti­cal chem­istry, a lively inter­est in med­i­cine, and rap­port with the med­ical com­mu­ni­ty, Hör­lein was well posi­tioned to sur­vey the field where chem­istry and med­i­cine joined bat­tle against dis­ease. He could spot the points where the ene­my’s line was bro­ken, and the reverse salients in his own. What he could not do—or could not do alone—was to direct the day-to-day oper­a­tions of his troops, that is, to define the crit­i­cal prob­lems to be solved, to iden­tify the terms of their solu­tion, and to do the work that would carry the day. In the case of chemother­a­py, these things could be effected only by the med­ical researcher and the chemist, each work­ing on his own domain, and coop­er­a­tive­ly. For his attack on one of the most impor­tant reverse salients—the chemother­apy of bac­te­r­ial infec­tion­s—Hör­lein called upon the med­ical researcher Domagk and the chemists Miet­zsch and Klar­er.

“Investing in Good Ideas That Look Like Bad Ideas”

Sum­mary by one VC of a16z invest­ment strat­e­gy.

Secrets of Sand Hill Road: Venture Capital and How to Get It, by Scott Kupor 2019; excerpts:

In a strange way, sometimes familiarity can breed contempt—and conversely, the distance from the problem that comes from having a completely different professional background might actually make one a better founder. Though not venture backed, Southwest Airlines was cofounded in 1967 by Herb Kelleher and of course has gone on to become a very successful business. When interviewed many years later about why, despite being a lawyer by training, he was the natural founder for an airline business, Kelleher quipped: “I knew nothing about airlines, which I think made me eminently qualified to start one, because what we tried to do at Southwest was get away from the traditional way that airlines had done business.”

This has his­tor­i­cally been less typ­i­cal in the ven­ture world, but, increas­ing­ly, as entre­pre­neurs take on more estab­lished indus­tries—­par­tic­u­larly those that are reg­u­lat­ed—bring­ing a view of the mar­ket that is uncon­strained by pre­vi­ous pro­fes­sional expe­ri­ences may in fact be a plus. We often joke at a16z that there is a ten­dency to “fight the last bat­tle” in an area in which one has long-s­tand­ing pro­fes­sional expo­sure; the scars from pre­vi­ous mis­takes run too deep and can make it harder for one to develop cre­ative ways to address the busi­ness prob­lem at hand. Per­haps had Kelle­her known inti­mately of all the chal­lenges of enter­ing the air­line busi­ness, he would have run scream­ing from the chal­lenge ver­sus decid­ing to take on the full set of risks.

What­ever the evi­dence, the fun­da­men­tal ques­tion VCs are try­ing to answer is: Why back this founder against this prob­lem set ver­sus wait­ing to see who else may come along with a bet­ter organic under­stand­ing of the prob­lem? Can I con­ceive of a team bet­ter equipped to address the mar­ket needs that might walk through our doors tomor­row? If the answer is no, then this is the team to back.

The third big area of team inves­ti­ga­tion for VCs focuses on the founder’s lead­er­ship abil­i­ties. In par­tic­u­lar, VCs are try­ing to deter­mine whether this founder will be able to cre­ate a com­pelling story around the com­pany mis­sion in order to attract great engi­neers, exec­u­tives, sales and mar­ket­ing peo­ple, etc. In the same vein, the founder has to be able to attract cus­tomers to buy the pro­duct, part­ners to help dis­trib­ute the pro­duct, and, even­tu­al­ly, other VCs to fund the busi­ness beyond the ini­tial round of financ­ing. Will the founder be able to explain her vision in a way that causes oth­ers to want to join her on this mis­sion? And will she walk through walls when the going gets tough—which it inevitably will in nearly all star­tup­s—and sim­ply refuse to even con­sider quit­ting?

When Marc Andreessen and Ben Horowitz first started Andreessen Horowitz, they described this founder leadership capability as “egomaniacal.” Their theory—notwithstanding the choice of words—was that to make the decision to be a founder (a job fraught with likely failure), an individual needed to be so confident in her abilities to succeed that she would border on being so self-absorbed as to be truly egomaniacal. As you might imagine, the use of that term in our fund-raising deck for our first fund struck a chord with a number of our potential investors, who worried that we would back insufferable founders. We ultimately chose to abandon our word choice, but the principle remains today: You have to be partly delusional to start a company given the prospects of success and the need to keep pushing forward in the wake of the constant stream of doubters.

After all, nonobvious ideas that could in fact become big businesses are by definition nonobvious. My partner Chris Dixon describes our job as VCs as investing in good ideas that look like bad ideas. If you think about the spectrum of things in which you could invest, there are good ideas that look like good ideas. These are tempting, but likely can’t generate outsize returns because they are simply too obvious and invite too much competition that squeezes out the economic rents. Bad ideas that look like bad ideas are also easily dismissed; as the description implies, they are simply bad and thus likely to be trapdoors through which your investment dollars will vanish. The tempting deals are the bad ideas that look like good ideas, yet they ultimately contain some hidden flaw that reveals their true “badness”. This leaves good VCs to invest in good ideas that look like bad ideas—hidden gems that probably take a slightly delusional or unconventional founder to pursue. For if they were obviously good ideas, they would never produce venture returns.


  1. Thiel uses the example of ‘New France’/the Louisiana Territory, in which the projections of John Law et al that it (and thus the Mississippi Company) would be as valuable as France itself turned out to be correct—just centuries later, with the benefits redounding to the British colonies. Even the Mississippi Company worked out: “The ships that went abroad on behalf of his great company began to turn a profit. The auditor who went through the company’s books concluded that it was entirely solvent—which isn’t surprising, when you consider that the lands it owned in America now produce trillions of dollars in economic value.” One could also say the same thing of China: countless European observers forecast that China was a ‘sleeping giant’ which, once it industrialized & modernized, would again be a global power. They were correct, but many of them would be surprised & disappointed how long it took.↩︎

  2. #326, “Part II. The Wanderer And His Shadow”, Human, All Too Human, Nietzsche.↩︎

  3. Patent troll company Intellectual Ventures is also featured in Malcolm Gladwell’s essay on multiple invention, “In the Air: Who says big ideas are rare?”; IV’s business model is to spew out patents for speculations that other people will then actually invent, who can then be extorted for license fees when they make the inventions work in the real world & produce value. (This is assisted by the fact that patents no longer require even the pretense of a working model.) As Bill Gates says, “I can give you fifty examples of ideas they’ve had where, if you take just one of them, you’d have a startup company right there.”—that this model works demonstrates the commonness of ‘multiples’, the worthlessness of ideas, and the moral bankruptcy of the current patent system.↩︎

  4. At the margin, compared to other competitors in the VR space, like Valve’s concurrent efforts, and everything that the Rift built on, did Luckey and co really create ~$2.83b of new value? Or were they lucky in trying at the right time, and merely captured all of the value, because a 99%-adequate VR headset is worth 0%, and they added the final 1%? If the latter, how could IP or economics be fixed to link intermediate contributions to the final result more closely, approaching a fairer distribution like the Shapley value rather than yielding last-mover winner-take-all dynamics?↩︎

  5. Bene­dict Evans (“In Praise of Fail­ure”) sum­ma­rizes the prob­lem:

    It’s possible for a few people to take an idea and create a real company worth billions of dollars in less than a decade—to go from an idea and a few notes to Google or Facebook, or for that matter Nervana [?]. It’s possible for entrepreneurs to create something with huge impact.

    But equal­ly, any­thing with that much poten­tial has a high like­li­hood of fail­ure—if it was obvi­ously a good idea with no risks, every­one would be doing it. Indeed, it’s inher­ent in really trans­for­ma­tive ideas that they look like bad ideas—­Google, Apple, Face­book and Ama­zon all did, some­times sev­eral times over. In hind­sight the things that worked look like good ideas and the ones that failed look stu­pid, but sadly it’s not that obvi­ous at the time. Rather, this is how the process of inven­tion and cre­ation works. We try things—we try to cre­ate com­pa­nies, prod­ucts and ideas, and some­times they work, and some­times they change the world. And so, we see, in our world around half such attempts fail com­plete­ly, and 5% or so go to the moon.

    It’s worth not­ing that ‘looks like a bad idea’ is flex­i­ble here: I empha­size that many good ideas look like bad ideas because they’ve been tried before & failed, but many oth­ers look bad because a nec­es­sary change has­n’t yet hap­pened or peo­ple under­es­ti­mate exist­ing tech­nol­o­gy.↩︎

  6. Where there is, as Musk describes it, a “graveyard of companies”. It may be relevant to note that Musk did not found Tesla; the two co-founders ultimately quit the company.↩︎

  7. As late as 2007–2008, Blockbuster could have still beaten Netflix, as its “Total Access” program demonstrated, but CEO changes scuppered its last chance. And, incidentally, this offers an example of why stock markets are fine with paying executives so much: a good executive can create—or destroy—the entire company. If Blockbuster’s CEO had paid a pittance around 2000 to acquihire Netflix & put Reed Hastings in charge, or if it had simply stuck with its CEO in 2007 to strangle Netflix with Total Access, its shareholders would be far better off now. But it didn’t.↩︎

  8. “Almost Wikipedia: Eight Early Ency­clo­pe­dia Projects and the Mech­a­nisms of Col­lec­tive Action”, Hill 2013; “Almost-Wikipedias and inno­va­tion in free col­lab­o­ra­tion pro­jects: why did 7 pre­de­ces­sors fail?”.↩︎

  9. Find­ing out these tid­bits is one rea­son I enjoyed read­ing Founders at Work: Sto­ries of Star­tups’ Early Days (ed Liv­ingston 2009; “Intro­duc­tion”), because the chal­lenges are not always what you think they are. Pay­Pal’s major chal­lenge, for exam­ple, was not find­ing a mar­ket like eBay power sell­ers, but cop­ing with fraud as they scaled, which appar­ently was the undo­ing of any num­ber of rivals.↩︎

  10. Per­son­al­ly, I was still using Dog­pile until at least 2000.↩︎

  11. From Frock 2006, Chang­ing How the World Does Busi­ness: Fedex’s Incred­i­ble Jour­ney to Suc­cess, in 1973:

    On sev­eral occa­sions, we came within an inch of fail­ure, because of dwin­dling finan­cial resources, reg­u­la­tory road­blocks, or unfore­seen events like the Arab oil embar­go. Once, luck at the gam­ing tables of Las Vegas helped to save the com­pany from finan­cial dis­as­ter. Another time, we had to ask our employ­ees to hold their pay­checks while we waited for the next wave of financ­ing…Fred dumped his entire inher­i­tance into the com­pany and was full speed ahead with­out con­cern for his per­sonal finances.

    …The loan guar­an­tee from Gen­eral Dynam­ics raised our hopes and increased our spir­its, but also increased the pres­sure to final­ize the pri­vate place­ment. We con­tin­ued to be in des­per­ate finan­cial trou­ble, par­tic­u­larly with our sup­pli­ers. The most demand­ing sup­pli­ers when it came to pay­ments were the oil com­pa­nies. Every Mon­day, they required Fed­eral Express to pre­pay for the antic­i­pated weekly usage of jet fuel. By mid-July our funds were so mea­ger that on Fri­day we were down to about $20,296 in the check­ing account, while we needed $97,421 for the jet fuel pay­ment. I was still com­mut­ing to Con­necti­cut on the week­ends and really did not know what was going to tran­spire on my return.

    How­ev­er, when I arrived back in Mem­phis on Mon­day morn­ing, much to my sur­prise, the bank bal­ance stood at nearly $129,895. I asked Fred where the funds had come from, and he respond­ed, “The meet­ing with the Gen­eral Dynam­ics board was a bust and I knew we needed money for Mon­day, so I took a plane to Las Vegas and won $109,599.” I said, “You mean you took our last $20,296—how could you do that?” He shrugged his shoul­ders and said, “What differ­ence did it make? With­out the funds for the fuel com­pa­nies, we could­n’t have flown any­way.” Fred’s luck held again. It was not much but it came at a crit­i­cal time and kept us in busi­ness for another week.

    This also illustrates the fine (and only ex post discernible) line between ‘visionary founder’ & ‘criminal con artist’; had Frederick W. Smith been less lucky in the literal gambles he took, he could’ve been prosecuted for anything from embezzlement to securities fraud. As a matter of fact, Smith was prosecuted—for something else entirely:

    Fred now revealed that a year ear­lier [also in 1973] he had forged doc­u­ments indi­cat­ing approval of a loan guar­an­tee by the Enter­prise Com­pany with­out con­sent of the other board mem­bers, specifi­cally his two sis­ters and Bobby Cox, the Enter­prise sec­re­tary. Our respected leader admit­ted his cul­pa­bil­ity to the Fed­eral Express board of direc­tors and to the investors and lenders we were count­ing on to sup­port the sec­ond round of the pri­vate place­ment financ­ing. While it is pos­si­ble to under­stand that, under extreme pres­sure, Fred was act­ing to save Fed­eral Express from almost cer­tain bank­rupt­cy, and even to empathize with what he did, it nev­er­the­less appeared to be a seri­ous breach of con­duc­t…De­cem­ber 1975 was also the month that set­tled the mat­ter of the forged loan guar­an­tee doc­u­ments for the Union Bank. At his tri­al, Fred tes­ti­fied that as pres­i­dent of the Enter­prise board and with sup­port­ing let­ters from his sis­ters, he had author­ity to com­mit the board. After 10 hours of delib­er­a­tion, he was acquit­ted. If con­vict­ed, he would have faced a prison term of up to five years.

    Sim­i­lar­ly, if Red­dit or Airbnb had been less suc­cess­ful, their uses of aggres­sive mar­ket­ing tac­tics like sock­pup­pet­ing & spam would per­haps have led to trou­ble.↩︎

  12. To borrow a phrase from Kevin Kelly:

    The elec­tric incan­des­cent light­bulb was invent­ed, rein­vent­ed, coin­vent­ed, or “first invented” dozens of times. In their book Edis­on’s Elec­tric Light: Biog­ra­phy of an Inven­tion, Robert Friedel, Paul Israel, and Bernard Finn list 23 inven­tors of incan­des­cent bulbs prior to Edi­son. It might be fairer to say that Edi­son was the very last “first” inven­tor of the elec­tric light. These 23 bulbs (each an orig­i­nal in its inven­tor’s eyes) var­ied tremen­dously in how they fleshed out the abstrac­tion of “elec­tric light­bulb.” Differ­ent inven­tors employed var­i­ous shapes for the fil­a­ment, differ­ent mate­ri­als for the wires, differ­ent strengths of elec­tric­i­ty, differ­ent plans for the bases. Yet they all seemed to be inde­pen­dently aim­ing for the same arche­typal design. We can think of the pro­to­types as 23 differ­ent attempts to describe the inevitable generic light­bulb.

    This happens even in literature: Doyle’s Sherlock Holmes stories weren’t the first to invent “clues”, but the last (Batuman 2005), with other detective fiction writers doing things that can only be called ‘grotesque’; Moretti, baffled, recounts that “one detective, having deduced that ‘the drug is in the third cup of coffee’, proceeds to drink the coffee.”

    To give a personal example: while researching one technique supposedly invented in 2013, I discovered that it had been invented at least 10 times dating back to 1966.↩︎

  13. “Kafka And His Pre­cur­sors”, Borges 1951:

    At one time I considered writing a study of Kafka’s precursors. I had thought, at first, that he was as unique as the phoenix of rhetorical praise; after spending a little time with him, I felt I could recognize his voice, or his habits, in the texts of various literatures and various ages…If I am not mistaken, the heterogeneous pieces I have listed resemble Kafka; if I am not mistaken, not all of them resemble each other. This last fact is what is most significant. Kafka’s idiosyncrasy is present in each of these writings, to a greater or lesser degree, but if Kafka had not written, we would not perceive it; that is to say, it would not exist. The poem “Fears and Scruples” by Robert Browning prophesies the work of Kafka, but our reading of Kafka noticeably refines and diverts our reading of the poem. Browning did not read it as we read it now. The word “precursor” is indispensable to the vocabulary of criticism, but one must try to purify it from any connotation of polemic or rivalry. The fact is that each writer creates his precursors. His work modifies our conception of the past, as it will modify the future. In this correlation, the identity or plurality of men doesn’t matter. The first Kafka of Betrachtung is less a precursor of the Kafka of the gloomy myths and terrifying institutions than is Browning or Lord Dunsany.

    ↩︎
  14. See also “An Oral History of Nintendo’s Power Glove”, and Polygon’s oral history of the Virtual Boy.↩︎

  15. “Inte­grated Cog­ni­tive Sys­tems”, Michie 1970 (pg93–96 of Michie, On Machine Intel­li­gence):

    How long is it likely to be before a machine can be developed approximating to adult human standards of intellectual performance? In a recent poll [8], thirty-five out of forty-two people engaged in this sort of research gave estimates between ten and one hundred years. [8: European AISB Newsletter, no. 9, 4 (1969)] There is also fair agreement that the chief obstacles are not hardware limitations. The speed of light imposes theoretical bounds on rates of information transfer, so that it was once reasonable to wonder whether these limits, in conjunction with physical limits to microminiaturization of switching and conducting elements, might give the biological system an irreducible advantage. But recent estimates [9, 10], which are summarized in Tables 7.1 and 7.2, indicate that this is not so, and that the balance of advantage in terms of sheer information-handling power may eventually lie with the computer rather than the brain. It seems a reasonable guess that the bottleneck will never again lie in hardware speeds and storage capacities, as opposed to purely logical and programming problems. Granted that an ICS can be developed, is now the right time to mount the effort?

    ↩︎
  16. Michie 1970:

    Yet the prin­ci­ple of ‘unripe time’, dis­tilled by F. M. Corn­ford [15] more than half a cen­tury ago from the change­less stream of Cam­bridge aca­d­e­mic life, has pro­vided the epi­taph of more than one pre­ma­ture tech­nol­o­gy. The aero­plane indus­try can­not now redeem Daedalus nor can the com­puter indus­try recover the money spent by the British Admi­ralty more than a hun­dred years ago in sup­port of Charles Bab­bage and his cal­cu­lat­ing machine. Although Bab­bage was one of Britain’s great inno­v­a­tive genius­es, sup­port of his work was wasted money in terms of tan­gi­ble return on invest­ment. It is now appre­ci­ated that of the fac­tors needed to make the stored-pro­gram dig­i­tal com­puter a tech­no­log­i­cal real­ity only one was miss­ing: the means to con­struct fast switch­ing ele­ments. The greater part of a cen­tury had to elapse before the vac­uum tube arrived on the scene.

    ↩︎
  17. Trans­la­tion, Kat­suki Seki­da, Two Zen Clas­sics: The Gate­less Gate and The Blue Cliff Records, 2005.↩︎

  18. Which as a side note is wrong; compiled predictions actually indicate that AI researcher forecasts, while varying anywhere from a decade to centuries, typically cluster around 20 years in the future regardless of researcher age. For a recent timeline survey, see Gruetzemacher et al 2019, and for more, AI Impacts. (One wonders if a 20-year forecast might be driven by the growth of the field itself: in an exponentially-growing field, most researchers will be present in the final 'generation', and so a priori one could predict accurately that it will be 20 years to AI. In this regard, it is amusing to note the exponential growth of conferences like NIPS or ICML over 2010–2019.)
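
    As a toy illustration of that parenthetical (the 14-year doubling time and 100-year horizon below are arbitrary assumptions, not estimates): if a field grows exponentially right up until success, most researcher-years occur near the end, so a constant "~20 years away" forecast ends up roughly correct for most of the people who ever make it.

    ```python
    # Toy model: a field grows exponentially until "arrival" at year T; sample
    # researcher-years in proportion to the field's size and see how far from
    # arrival a randomly-chosen researcher sits. Parameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 100                       # field founded at year 0; success at year 100
    doubling = 14                 # assumed doubling time of the field, in years
    lam = np.log(2) / doubling

    # Inverse-CDF sampling of researcher-years weighted by e^(lam*t) on [0, T]:
    u = rng.random(1_000_000)
    t = np.log(1 + u * (np.exp(lam * T) - 1)) / lam
    wait = T - t                  # years remaining until arrival

    print(f"mean wait:   {wait.mean():.1f} years")   # ~20 (= doubling / ln 2)
    print(f"median wait: {np.median(wait):.1f} years")
    print(f"within 40 years of arrival: {(wait < 40).mean():.0%}")
    ```

    ↩︎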

  19. Michie 1970:

    …A fur­ther appli­ca­tion of cri­te­rion 4 arises if the­o­ret­i­cal infea­si­bil­ity is demon­strat­ed…But it is well to look on such neg­a­tive proofs with cau­tion. The pos­si­bil­ity of broad­cast­ing radio waves across the Atlantic was con­vinc­ingly excluded by the­o­ret­i­cal analy­sis. This did not deter Mar­coni from the attempt, even though he was as unaware of the exis­tence of the Heav­i­side layer as every­one else.

    ↩︎
  20. Michie 1970:

    It can rea­son­ably be said that time was unripe for dig­i­tal com­put­ing as an indus­trial tech­nol­o­gy. But it is by no means obvi­ous that it was unripe for Bab­bage’s research and devel­op­ment effort, if only it had been con­ceived in terms of a more severely delim­ited objec­tive: the con­struc­tion of a work­ing mod­el. Such a device would not have been aimed at the then unat­tain­able goal of eco­nomic via­bil­i­ty; but its suc­cess­ful demon­stra­tion might, just con­ceiv­ably, have greatly accel­er­ated mat­ters when the time was finally ripe. Vac­uum tube tech­nol­ogy was first exploited for high­-speed dig­i­tal com­put­ing in Britain dur­ing the Sec­ond World War [16]. But it was left to Eck­ert and Mauchly [16] sev­eral years later to redis­cover and imple­ment the con­cep­tions of stored pro­grams and con­di­tional jumps, which had already been present in Bab­bage’s ana­lyt­i­cal engine [17]. Only then could the new tech­nol­ogy claim to have drawn level with Bab­bage’s design ideas of a hun­dred years ear­li­er.

    ↩︎
  21. A kind of defi­n­i­tion of :

    If you do not work on an impor­tant prob­lem, it’s unlikely you’ll do impor­tant work. It’s per­fectly obvi­ous. Great sci­en­tists have thought through, in a care­ful way, a num­ber of impor­tant prob­lems in their field, and they keep an eye on won­der­ing how to attack them. Let me warn you, ‘impor­tant prob­lem’ must be phrased care­ful­ly. The three out­stand­ing prob­lems in physics, in a cer­tain sense, were never worked on while I was at Bell Labs. By impor­tant I mean guar­an­teed a Nobel Prize and any sum of money you want to men­tion. We did­n’t work on (1) time trav­el, (2) tele­por­ta­tion, and (3) anti­grav­i­ty. They are not impor­tant prob­lems because we do not have an attack. It’s not the con­se­quence that makes a prob­lem impor­tant, it is that you have a rea­son­able attack. That is what makes a prob­lem impor­tant.

    ↩︎
  22. “Ed Boy­den on Mind­ing your Brain (Ep. 64)”:

    BOYDEN: …One idea is, how do we find the dia­monds in the rough, the big ideas but they’re kind of hid­den in plain sight? I think we see this a lot. Machine learn­ing, deep learn­ing, is one of the hot top­ics of our time, but a lot of the math was worked out decades ago—back­prop­a­ga­tion, for exam­ple, in the 1980s and 1990s. What has changed since then is, no doubt, some improve­ments in the math­e­mat­ics, but large­ly, I think we’d all agree, bet­ter com­pute power and a lot more data.

    So how could we find the trea­sure that’s hid­ing in plain sight? One of the ideas is to have sort of a SWAT team of peo­ple who go around look­ing for how to con­nect the dots all day long in these serendip­i­tous ways.

    COWEN: Two last ques­tions. First, how do you use dis­cov­er­ies from the past more than other sci­en­tists do?

    BOYDEN: One way to think of it is that, if a sci­en­tific topic is really pop­u­lar and every­body’s doing it, then I don’t need to be part of that. What’s the ben­e­fit of being the 100,000th per­son work­ing on some­thing?

    So I read a lot of old papers. I read a lot of things that might be for­got­ten because I think that there’s a lot of trea­sure hid­ing in plain sight. As we dis­cussed ear­lier, and both begin from papers from other fields, some of which are quite old and which mostly had been ignored by other peo­ple.

    I some­times prac­tice what I call ‘fail­ure reboot­ing’. We tried some­thing, or some­body else tried some­thing, and it did­n’t work. But you know what? Some­thing hap­pened that made the world differ­ent. Maybe some­body found a new gene. Maybe com­put­ers are faster. Maybe some other dis­cov­ery from left field has changed how we think about things. And you know what? That old failed idea might be ready for prime time.

    With opto­ge­net­ics, peo­ple were try­ing to con­trol brain cells with light going back to 1971. I was actu­ally read­ing some ear­lier papers. There were peo­ple play­ing around with con­trol­ling brain cells with light going back to the 1940s. What is differ­ent? Well, this class of mol­e­cules that we put into neu­rons had­n’t been dis­cov­ered yet.

    ↩︎
  23. “Was Moore’s Law Inevitable?”, Kevin Kelly again:

    Lis­ten to the tech­nol­o­gy, Carver Mead says. What do the curves say? Imag­ine it is 1965. You’ve seen the curves Gor­don Moore dis­cov­ered. What if you believed the story they were try­ing to tell us: that each year, as sure as win­ter fol­lows sum­mer, and day fol­lows night, com­put­ers would get half again bet­ter, and half again small­er, and half again cheap­er, year after year, and that in 5 decades they would be 30 mil­lion times more pow­er­ful than they were then, and cheap. If you were sure of that back then, or even mostly per­suad­ed, and if a lot of oth­ers were as well, what good for­tune you could have har­vest­ed. You would have needed no other prophe­cies, no other pre­dic­tions, no other details. Just know­ing that sin­gle tra­jec­tory of Moore’s, and none oth­er, we would have edu­cated differ­ent­ly, invested differ­ent­ly, pre­pared more wisely to grasp the amaz­ing pow­ers it would sprout.

    ↩︎
  24. It’s not enough to the­o­rize about the pos­si­bil­ity or pro­to­type some­thing in the lab if there is then no fol­lowup. The moti­va­tion to take some­thing into the ‘real world’, which nec­es­sar­ily requires attack­ing the reverse salients, may be part of why cor­po­rate & aca­d­e­mic research are both nec­es­sary; too lit­tle of either cre­ates a bot­tle­neck. A place like Bell Labs ben­e­fits from remain­ing in con­tact with the needs of com­merce, as it pro­vides a check on l’art pour l’art patholo­gies, a fer­tile source of prob­lems, and can feed back the ben­e­fits of mass production/experience curves. (Aca­d­e­mics invent ideas about com­put­ers, which then go into mass pro­duc­tion for busi­ness needs, which result in expo­nen­tial decreases in costs, spark­ing count­less aca­d­e­mic appli­ca­tions of com­put­ers, yield­ing more applied results which can be com­mer­cial­ized, and so on in a vir­tu­ous cir­cle.) In recent times, cor­po­rate research has dimin­ished, and that may be a bad thing: , Arora et al 2020.↩︎

  25. One might appeal to the Kelly criterion as a guide to how much individuals should wager on experiments, since the Kelly criterion gives optimal growth of wealth over the long term while avoiding gambler's ruin; but given the extremely small number of 'wagers' an individual engages in, with a highly finite horizon, the Kelly criterion's assumptions are far from satisfied, and the true optimal strategy can be radically different from a naive Kelly criterion; I explore this difference more in , which is motivated by stock-market investing.
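
    As a minimal sketch of the gap (assuming a stylized wager—a 60% chance of winning at even odds—and only 10 bets in a lifetime, numbers chosen purely for illustration): full Kelly maximizes long-run growth, but over so few wagers the terminal-wealth distribution is wide enough that other sizings can easily be preferable.

    ```python
    # Kelly sizing vs. a short horizon: simulate terminal wealth after only
    # 10 bets at half-, full-, and double-Kelly. All parameters illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    p, b = 0.6, 1.0                     # win probability; net payoff odds
    kelly = (b * p - (1 - p)) / b       # f* = (bp - q)/b = 0.2 here

    def simulate(fraction, n_bets=10, n_runs=100_000):
        wins = rng.random((n_runs, n_bets)) < p
        growth = np.where(wins, 1 + fraction * b, 1 - fraction)
        return growth.prod(axis=1)      # terminal wealth, starting from 1

    for f in (kelly / 2, kelly, 2 * kelly):
        w = simulate(f)
        print(f"f={f:.2f}: median wealth {np.median(w):.2f}, "
              f"mean {w.mean():.2f}, P(ending with a loss) {(w < 1).mean():.0%}")
    ```

    ↩︎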

  26. Thompson sampling, incidentally, has itself been rediscovered several times.↩︎

  27. PSRL (posterior sampling for reinforcement learning; see also Ghavamzadeh et al 2016) generalizes Thompson sampling to more complex problems, MDPs or POMDPs in general: on each iteration, it assumes an entire collection or distribution of possible environments more complex than a single-step bandit, picks an environment at random based on its probability of being the real environment, finds the optimal actions for that one, and then acts on that solution; this does the same thing, smoothly balancing exploration with exploitation. Normal PSRL requires 'episodes', which don't really have a real-world equivalent, but PSRL can be extended to handle continuous action—a nice example is , which does 'back off', periodically stopping and re-evaluating the optimal strategy based on accumulated evidence, but less & less often, so it does PSRL over increasingly large time windows.
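
    In the single-step bandit special case, this sample-an-environment-then-act-optimally loop is just Thompson sampling, which fits in a few lines; a minimal sketch for Bernoulli-reward arms (the arm payoff rates are made up for illustration):

    ```python
    # Thompson sampling: keep a Beta posterior per arm, sample one plausible
    # "environment" (a payoff rate per arm) each round, and act optimally for
    # that sample. PSRL applies the same loop to entire sampled MDPs.
    import numpy as np

    rng = np.random.default_rng(0)
    true_p = [0.3, 0.5, 0.6]               # unknown arm payoff rates (illustrative)
    wins   = np.ones(len(true_p))          # Beta(1, 1) uniform priors
    losses = np.ones(len(true_p))

    for t in range(10_000):
        sampled_p = rng.beta(wins, losses) # one environment drawn from the posterior
        arm = int(np.argmax(sampled_p))    # optimal action for the sampled environment
        reward = rng.random() < true_p[arm]
        wins[arm]   += reward              # posterior update
        losses[arm] += 1 - reward

    print("posterior means:", (wins / (wins + losses)).round(2))
    print("pulls per arm:  ", (wins + losses - 2).astype(int))
    ```

    Each arm gets pulled in proportion to the posterior probability that it is the best one, which is what makes the balance between exploration and exploitation 'smooth'.

    ↩︎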

  28. Polarizing here could reflect a wide posterior value distribution, or, if the posterior is being approximated by something like a mixture of experts or an ensemble of multiple models (like running multiple passes over a dropout-trained neural network, or a bootstrapped neural-network ensemble), strong disagreement among the component models. In a human setting, it might be polarizing in the sense of human peer-reviewers arguing the most about it, or having the least inter-rater agreement or highest variance of ratings.

    As Goldstein & Kearney 2017 describe their analysis of the numerical peer-reviewer ratings of ARPA-E proposals:

    In other words, ARPA-E PDs tend to fund pro­pos­als on which review­ers dis­agree, given the same mean over­all score. When min­i­mum and max­i­mum score are included in the same mod­el, the coeffi­cient on min­i­mum score dis­ap­pears. This sug­gests that ARPA-E PDs are more likely to select pro­pos­als that were high­ly-rated by at least one review­er, but they are not deterred by the pres­ence of a low rat­ing. This trend per­sists when median score is included (Model 7 in Table 3). ARPA-E PDs tend to agree with the bulk of review­ers, and they also tend to agree with scores in the upper tail of the dis­tri­b­u­tion. They use their dis­cre­tion to sur­face pro­pos­als that have at least one cham­pi­on, regard­less of whether there are any detrac­tors…The results show that there is greater ex ante uncer­tainty in the ARPA-E research port­fo­lio com­pared to pro­pos­als with the high­est mean scores (Model 1).
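
    A simple way to see the difference between this 'champion' rule and funding by consensus is to rank the same proposals by mean versus maximum reviewer score (the scores below are invented for illustration):

    ```python
    # "Champion" rule (rank by max reviewer score) vs. consensus rule (rank by
    # mean): averaging buries polarizing proposals the champion rule surfaces.
    proposals = {
        "A (consensus-good)": [7, 7, 7, 7],
        "B (polarizing)":     [10, 9, 3, 2],
        "C (consensus-weak)": [5, 5, 6, 5],
    }

    def mean(xs):
        return sum(xs) / len(xs)

    by_mean     = sorted(proposals, key=lambda k: mean(proposals[k]), reverse=True)
    by_champion = sorted(proposals, key=lambda k: max(proposals[k]),  reverse=True)

    print("consensus ranking:", by_mean)      # A first; B's detractors drag it down
    print("champion ranking: ", by_champion)  # B first: one reviewer loved it
    ```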

    ↩︎
  29. The differ­ent patholo­gies might be: small ones will col­lec­tively try lots of strange or novel ideas but will fail by run­ning under­pow­ered poor­ly-done exper­i­ments (for lack of fund­ing & exper­tise) which con­vince no one, suffer from smal­l­-s­tudy bias­es, and merely pol­lute the lit­er­a­ture, giv­ing meta-an­a­lysts migraines. Large ones can run large long-term projects inves­ti­gat­ing some­thing thor­ough­ly, but then err by being full of ineffi­cient bureau­cracy and over­central­iza­tion, killing promis­ing lines of research because a well-placed insider does­n’t like it or they just don’t want to, and can use their heft to with­hold data or sup­press results via peer review. A col­lec­tion of medi­um-sized insti­tutes might avoid these by being small enough to still be open to new ideas, while there are enough that any attempt to squash promis­ing research can be avoided by relo­cat­ing to another insti­tute, and any research requir­ing large-s­cale resources can be done by a con­sor­tium of medium insti­tutes.

    Mod­ern genomics strikes me as a bit like this. Can­di­date-gene stud­ies were done by every Tom, Dick, and Har­ry, but the method­ol­ogy failed com­pletely because sam­ple sizes many orders of mag­ni­tude larger were nec­es­sary. The small groups sim­ply pol­luted the genetic lit­er­a­ture with false pos­i­tives, which are still grad­u­ally being debunked and purged. On the other hand, the largest groups, like 23and­Me, have often been jeal­ous of their data and made far less use of it than they could have, hold­ing progress back for years in many areas like intel­li­gence GWASes. The UK Biobank has pro­duced an amaz­ing amount of research for a large group, but is the excep­tion that proves the rule: their open­ness to researchers is (sad­ly) extra­or­di­nar­ily unusu­al. Much progress has come from groups like SSGAC or PGC, which are con­sor­tiums of groups of all sizes (with some highly con­di­tional par­tic­i­pa­tion from 23and­Me).↩︎

  30. Iron­i­cal­ly, as I write this in 2018, DARPA has recently announced another attempt at “sil­i­con com­pil­ers”, pre­sum­ably sparked by com­mod­ity chips top­ping out and ASICs being required, which I can only sum­ma­rize as “ but let’s do it sanely this time and with FLOSS rather than a crazy pro­pri­etary ecosys­tem of crap”.↩︎

  31. Specifically, contemporary computers don't use the dense grid of 1-bit processors with local memory which characterized the CM. They do increasingly feature thousands of 'processor' equivalents in the form of CPU cores and GPU cores, but those are all far more powerful than a CM CPU node. But we might yet see some convergence with the CM thanks to neural networks: neural networks are typically trained with wastefully precise floating-point operations, slowing them down, thus the rise of 'tensor cores' and 'TPUs' using lower precision, like 8-bit integers, and it is possible to discretize neural nets all the way down to binary weights. This offers a lot of potential electricity savings, and if you have binary weights, why not binary computing elements as well…?
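
    A minimal sketch of one standard binarization scheme (in the style of BinaryConnect/XNOR-Net—weights quantized to ±1 plus a single real-valued scale—offered as an illustration of the idea, not any production kernel):

    ```python
    # Binarize a weight matrix to {-1, +1} with a per-matrix scale alpha, so
    # W ≈ alpha * sign(W); alpha = mean(|W|) minimizes the L2 approximation
    # error. The matrix multiply then needs only sign flips and additions
    # (or, with bit-packing, XNOR + popcount) instead of float multiplies.
    import numpy as np

    def binarize(W):
        alpha = np.abs(W).mean()
        return alpha, np.sign(W)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))
    x = rng.normal(size=4)

    alpha, B = binarize(W)
    print("full precision:", W @ x)
    print("binarized:     ", alpha * (B @ x))   # rough but very cheap
    ```

    ↩︎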

  32. Which could also be said of Engelbart's original vision, as only partially demonstrated in the 1968 'Mother of All Demos'.↩︎

  33. People tend to ignore this, but CNNs can work with a few hundred or even just one or two images, using transfer learning, few-shot learning, and aggressive regularization like data augmentation.
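
    For instance, a minimal transfer-learning sketch in PyTorch (the class count and the data loader are placeholder assumptions, not anything from the text): freeze an ImageNet-pretrained network and train only a small new head on the handful of available images, leaning on data augmentation for regularization.

    ```python
    # Transfer learning with very little data: reuse a pretrained CNN as a
    # frozen feature extractor and fit only a new final layer.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False       # freeze the pretrained features

    num_classes = 2                       # placeholder: a tiny two-class problem
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Training-loop sketch: `loader` would yield augmented batches (random
    # crops/flips/color jitter) built from the few real images.
    # for images, labels in loader:
    #     optimizer.zero_grad()
    #     loss = loss_fn(model(images), labels)
    #     loss.backward()
    #     optimizer.step()
    ```

    ↩︎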

  34. While the accuracy rates may increase by what looks like a tiny amount, and one might ask how important a change from 99% to 99.9% accuracy is (though that is a 10× reduction in error rate), the large-scale training papers demonstrate that neural nets continue to learn hidden knowledge from the additional data, providing ever-better semantic features which can be reused elsewhere.↩︎