Technological developments can be foreseen but the knowledge is largely useless because startups are inherently risky and require optimal timing. A more practical approach is to embrace uncertainty, taking a reinforcement learning perspective.
12 July 2012–20 June 2019 finished certainty: likely importance: 6
- Visiting the Media Lab
- To Everything A Season
- Surfing Uncertainty
- See Also
- External Links
How do you time your startup? Technological forecasts are often surprisingly prescient in terms of predicting that something was possible & desirable and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.
Why is their knowledge so useless? Why are success and failure so intertwined in the tech industry? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts correct about the (eventual) outcome.
Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad because regardless of forecasting, a good idea will draw overly-optimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. All major success stories overshadow their long list of predecessors who did the same thing, but got unlucky. The lesson of history is that for every lesson, there is an equal and opposite lesson. So, ideas can be divided into the overly-optimistic & likely doomed, or the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception. Progress, then, depends on the ‘unreasonable man’.
This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically over-exploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on the societal level by serving as random exploration.
A major benefit of R&D, then, is in ideas lying fallow until the ‘ripe time’ when they can be immediately exploited in previously-unpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals.
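To make the Thompson sampling analogy concrete, here is a minimal sketch for a Bernoulli bandit. It is an illustration only: the three "research areas" and their success rates are invented, and real investment decisions are nothing like this tidy. Each round, the sampler draws a plausible success rate for every area from its Beta posterior and backs whichever draw is highest, so effort drifts towards what has worked while unlikely areas still get an occasional long shot.

```python
import random

def thompson_sampling(true_success_rates, rounds, seed=0):
    """Bernoulli-bandit Thompson sampling with uniform Beta(1, 1) priors.

    true_success_rates are hypothetical and hidden from the sampler;
    it only ever sees the success/failure outcome of the arm it tried.
    """
    rng = random.Random(seed)
    n = len(true_success_rates)
    successes = [0] * n
    failures = [0] * n
    pulls = [0] * n
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its current posterior.
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(n)]
        # Back the arm whose draw is highest (exploration comes from the randomness).
        arm = max(range(n), key=lambda i: draws[i])
        pulls[arm] += 1
        if rng.random() < true_success_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls, successes

# Three made-up research areas with 2%, 5%, and 20% odds of paying off per attempt.
pulls, successes = thompson_sampling([0.02, 0.05, 0.20], rounds=2000)
print("attempts per area:", pulls)
print("successes per area:", successes)
```

No single attempt here is "optimal" in hindsight; the allocation as a whole is what gradually concentrates on the area that pays off.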
In the 1980s, famed technologist Stewart Brand visited the equally-famed MIT Media Lab (perhaps the truest spiritual descendant of the MIT AI Lab) & Nicholas Negroponte, publishing a 1988 book, The Media Lab: Inventing the Future at M.I.T. (TML). Brand summarized the projects he saw there and Lab members’ extrapolations into the future which guided their projects, and added his own forecasting thoughts.
Three decades later, the book is highly dated, and the descriptions are of mostly historical interest for the development of various technologies (particularly in the 1990s). But enough time has passed since 1988 to enable us to judge the basic truthfulness of the predictions and expectations held by the dreamers such as Nicholas Negroponte: they were remarkably accurate! And the Media Lab wasn’t the only one: General Magic (1989) had an almost identical vision of a networked future powered by small touchscreen devices. (And what about Douglas Engelbart, or Alan Kay/Xerox PARC, who explicitly aimed to ‘skate towards where the puck would be’?) If you aren’t struck by a sense of déjà vu or pity when you read this book, compare the claims by people at the Media Lab with contemporary—or later—works like Clifford Stoll’s Silicon Snake Oil, and you’ll see how right they were.
Déjà vu, because what was described in TML on every other page is recognizably ordinary life in the 1990s and 2000s, never mind the 2010s, from the spread of broadband to the eventual impact of smartphones.
And pity, because the sad thing is noting how few future millionaires or billionaires grace the pages of TML—one quickly realizes that yes, person X was 100% right about Y happening even when everyone thought it insane, except that X was just a bit off, by a few years, and either jumped the gun or was too late, and so some other Z who doesn’t even appear in TML was the person who wound up taking all the spoils. I read it constantly thinking ‘yes, yes, you were right—for all the good it did you!’, or ‘not quite, it’d actually take another decade for that to really work out’.
“I basically think all the ideas of the ’90s that everybody had about how this stuff was going to work, I think they were all right, they were all correct. I think they were just early.”
Marc Andreessen, 2014
The question constantly asked of anyone who claims to know a better way (as futurologists implicitly do): “If you’re so smart, why aren’t you rich?” The lesson I draw is: it is not enough to predict the future, one has to get the timing right to not be ruined, and then execute, and then get lucky in a myriad ways.
Many ‘bubbles’ can be interpreted as people being 100% correct about the future—but missing the timing (Thiel’s article on China and bubbles¹, The Economist on obscure property booms, Garber’s Famous First Bubbles). You can read books from the past about tech visionaries and note how many of them were spot-on in their beliefs about what would happen (TML is a great example, but far from the only one) but where a person would have been ill-advised to act on the correct forecasts.
“Whoever does not know how to hit the nail on the head should be entreated not to hit the nail at all.”
Many startups have a long list of failed predecessors who tried to do much the same thing, often simultaneous with several other competitors (startups are just as susceptible to multiple discovery as science/technology in general³). What made them a success was that they happened to give the piñata a whack at the exact moment where some S-curves or events hit the right point. Consider the ill-fated Pets.com: was the investor right to believe that Americans would spend a ton of money online such as for buying dogfood? Absolutely, Amazon (which has rarely turned a profit and has sucked up far more investment than Pets.com ever did, a mere ~$466m) is a successful online retail business that stocks thousands of dog food varieties, to say nothing of all the other pet-related goods it sells, and Chewy, which primarily does pet food, filed for a multi-billion-dollar IPO in 2019 on the strength of its billions in revenue. But the value of Pets.com stock still went to ~$0. Facebook is the biggest archive of photographs there has ever been, with truly colossal storage requirements; could it have succeeded in the 1990s? No, and not even later, as demonstrated by Orkut and Friendster, and the lingering death of MySpace. One of the most notorious tech business failures of the 1990s was the Iridium satellite constellation, but that was brought down by bizarrely self-sabotaging decisions on the part of Motorola, and when Motorola was finally removed from the equation, Iridium found its market, and 2017 saw the launch of the second Iridium satellite constellation, Iridium NEXT, with competition from other since-launched satellite constellations, including SpaceX’s own nascent Starlink (aiming at global broadband Internet) which launched no less than 60 satellites in May 2019. Or look at computers: imagine an early adopter of an Apple computer saying ‘everyone will use computers eventually!’ Yes, but not for another few decades, and ‘in the long run, we are all dead’.
Early PC history is rife with examples of the prescient failing.
Smartphones are an even bigger example of this. How often did I read in the ’90s and early ’00s about how amazing Japanese cellphones were and how amazing a good smartphone would be, even though year after year the phones were jokes and used pretty much solely for voice? You can see the smartphones come up again and again in TML, as the visionaries realize how transformative a mobile pocket-sized computer would be. Yet, it took until the mid-00s for the promise of smartphones to materialize overnight, as it were, a success which went primarily to latecomers Apple and Google, cutting out the previously highly-successful Nokia, never mind visionaries like General Magic. (You too can achieve overnight success in just a few decades of hard work…) A 2013 interview with Eric Jackson looks back on smartphone adoption rates:
Q:“What’s your take on how they’re [Apple] handling their expansion into China, India, and other emerging markets?”
A:“It’s depressing how slow things are moving on that front. We can draw lines on a graph but we don’t know the constraints. Again, the issue with adoption is that the timing is so damn hard. I was expecting smartphones to take off in mid 2004 and was disappointed over and over again. And then suddenly a catalyst took hold and the adoption skyrocketed. Cook calls this ‘cracking the nut’. I don’t know what they can do to move faster but I suspect it has to do with placement (distribution) and with networks which both depend on (corrupt) entities.”
In 2012, I watched impressed as my aunt used the iPhone application FaceTime to video chat with her daughter half a continent away. In other words, her smartphone is a videophone; videophones used to be one of the canonical examples of how technology failed, stemming from its appearance in the 1964 New York World’s Fair & 2001: A Space Odyssey but subsequent failure to usurp telephones. This was oft-cited as an example of how technoweenies failed to understand that people didn’t really want videophones at all—‘who wants to put on makeup before making a call?’, people offered as an explanation, in all seriousness—but really, it looks like the videophones back then simply weren’t good enough.
Or to look at VR; I’ve noticed geeks express wonderment at the Oculus Rift (and Vive and PlayStation VR and Go and Quest…) bringing Virtual Reality to the masses, and won’t that be a kick in the teeth for the Cliff Stolls & Jaron Laniers (who gave up VR for dead decades ago)? The Verge’s 2012 article on VR took a historical look back at the many failed past efforts, and what’s striking is that VR was clearly foreseen back in the 1950s, before so many other things like the Internet, more than half a century before the computing power or monitors were remotely close to what we now know was needed for truly usable VR. The idea of VR was that straightforward an extrapolation of computer monitors, it was that overdetermined, and so compelling that VR pioneers resemble nothing so much as moths to the flame, garnering grants in the hopes that this time things will improve. And at some point, it does improve, and the first person to try at the right time may win the lottery; Palmer Luckey (founder of Oculus, sold to Facebook for $2.3 billion in March 2014⁴):
Here’s a secret: the thing stopping people from making good VR and solving these problems was not technical. Someone could have built the Rift in mid-to-late 2007 for a few thousand dollars, and they could have built it in mid-2008 for about $647. It’s just nobody was paying attention to that.
“Yes, but when I discovered it, it stayed discovered.”
Any good idea can be made to sound like a bad idea & probably did sound like a bad idea then⁵, and Bessemer VC’s anti-profile is a list of good ideas which Bessemer declined to invest in. Michael Wolfe offers some examples of this:
- Facebook: the world needs yet another MySpace or Friendster except several years late. We’ll only open it up to a few thousand overworked, anti-social, Ivy Leaguers. Everyone else will then join since Harvard students are so cool.
- Dropbox: we are going to build a file sharing and syncing solution when the market has a dozen of them that no one uses, supported by big companies like Microsoft. It will only do one thing well, and you’ll have to move all of your content to use it.
- Virgin Atlantic: airlines are cool. Let’s start one. How hard could it be? We’ll differentiate with a funny safety video and by not being a**holes.
- …iOS: a brand new operating system that doesn’t run a single one of the millions of applications that have been developed for Mac OS, Windows, or Linux. Only Apple can build apps for it. It won’t have cut and paste.
- Google: we are building the world’s 20th search engine at a time when most of the others have been abandoned as being commoditized money losers. We’ll strip out all of the ad-supported news and portal features so you won’t be distracted from using the free search stuff.
- Tesla: instead of just building batteries and selling them to Detroit, we are going to build our own cars from scratch plus own the distribution network. During a recession and a cleantech backlash.⁶
- …Firefox: we are going to build a better web browser, even though 90% of the world’s computers already have a free one built in. One guy will do most of the work.
We can play this game all day:
How about Netflix? “We’ll start off renting people a doomed format in a way inferior to our established competitor Blockbuster (which will choose to commit suicide by ignoring both mail order & Internet all the way until bankruptcy in 2010)⁷; this will (somehow) let us pivot to streaming, where we will license all our content from our worst enemies, who will destroy us the instant we are too successful & already intend to run streaming services of their own—but that’s OK because we’ll just convince Wall Street to wait decades while giving us hundreds of billions of dollars to replace Hollywood by making thousands of film & TV series ourselves (despite the fact that we’ve never done anything like that before and there is no reason to think we would be any better at it than they are).”
Or GitHub: “We’ll offer code hosting services like that of SourceForge or Google Code which requires developers to use one of the most user-hostile DVCSes, only to FLOSS developers who are notorious cheapskates, and charge them a few bucks for a private version.”
SpaceX: “Orbital Sciences Corporation has a multi-decade headstart but are fat and lazy; we’ll catch up by buying some spare Russian rockets while we invent our own futuristic reusable ones. It’s only rocket science.”
PayPal: “Everyone else’s online payments has failed, so we’ll do it again, with anonymous cryptography! On phones! In 1998! End-users love cryptography, right? If the software doesn’t work out, I guess we’ll… do something else. We’re not sure what.” Later: “oh, apparently eBay sellers like us so much they’re making their own promotional materials? What if instead of threatening to sue them, we tried working with them?”
Venmo: “TextPayMe worked out well, right?”
Patreon: “Online micropayments & patronage schemes have failed hundreds of times and became a ’90s punchline; might as well try again.”
Bitcoin: “Every online-only currency from DigiCash to Flooz.com to e-gold to [too many to list] has either failed or been shut down by governments; so, we’ll use ‘proof of work’—it’s a hilariously expensive cryptographic thing we just made up which has zero theoretical support for actually ensuring decentralization & censorproofing, and was roundly mocked by almost every e-currency enthusiast who bothered to read the whitepaper.”
FedEx: “The experienced & well-capitalized Emery Air Freight is already trying and failing to make the hub-and-spoke air delivery method work; I’ll blow my inheritance on trying to compete with them while being so undercapitalized I’ll have to commit multiple crimes to keep FedEx afloat like literally gambling the company’s money at Las Vegas.”
Lotus 1-2-3: “VisiCalc literally invented the spreadsheet, has owned the market for 4 years despite clones like Microsoft’s, and singlehandedly made the Apple II PC a mega-success; we’ll write our own spreadsheet from scratch, fixing some of VisiCalc’s problems, and beat them to the IBM PC. Everyone will buy it simply because it’ll be slightly better.”
Airbnb: “We’ll max out our credit cards to let people illegally rent out their air mattresses en route to eating the hotel industry.”
Stripe: “Banks & online payment processors like PayPal are heavily-regulated inefficient monopolies which really suck; we’ll make friends with some banks and run a payment processor which doesn’t suck. Our signature selling point will be that it takes fewer lines of code to set up, so programmers will like us.”
Slack: “IRC+email but infinitely slower & more locked in. Businesses won’t be able to get enough of it; employees will love to hate it.”
You don’t have to be bipolar to be an entrepreneur, but it might help. (“The most successful people I know believe in themselves almost to the point of delusion…”)
“After solving a problem, humanity imagines that it finds in analogous solutions the key to all problems. Every authentic solution brings in its wake a train of grotesque solutions.”
“You can’t possibly get a good technology going without an enormous number of failures. It’s a universal rule. If you look at bicycles, there were thousands of weird models built and tried before they found the one that really worked. You could never design a bicycle theoretically. Even now, after we’ve been building them for 100 years, it’s very difficult to understand just why a bicycle works–it’s even difficult to formulate it as a mathematical problem. But just by trial and error, we found out how to do it, and the error was essential.”
Why so many failed predecessors?
Part of the explanation is survivorship bias causing hindsight bias. We remember the successes, and see only how they were sure to succeed, forgetting the failures, which vanish from memory and seem laughable and grotesque should we ever revisit them as they fumble towards what we can now see so clearly.
The origins of many startups are highly idiosyncratic & chancy; eg. why should a podcasting company, Odeo, have led to Twitter? Survival alone is highly chancy, and founders can often see times where it came down to a dice roll.⁸ Like historical events in general (Risi et al 2019), the importance of an event or change is often known only in retrospect. Overall, the odds of success are low, and the rewards are not great for most—despite the skewed distribution producing occasional eye-popping returns in a few cases, the risk-adjusted return of the technology sector or VC funds is not that much greater than the broader economy.
“Of course Google was always going to be a huge success because of PageRank and also (post hoc theorizing) X, Y, & Z”, except for the minor problem that Google was merely one of many search engines, great perhaps⁹ but not profitable, and didn’t hit upon a profitable business model—much less a unicorn-worthy model—until 4 years later when it copied Overture’s advertising auction, which was its salvation (In The Plex); in the meantime, Google had to sign potentially fatal deals or risk burning through the last of its capital when minor technical glitches derailed vital deals. (All of which was doubtless why Page & Brin tried & failed to sell Google to AltaVista & Excite & Yahoo early on, and negotiated a possible sale with Yahoo as late as 2002 which they ultimately rejected.) In a counterfactual world, Google went down in flames quite easily because it never hit upon the advertising innovations that saved it, no matter how much you liked PageRank, and anything else is hindsight bias. FedEx, early on, couldn’t make payroll and the founder famously kept the planes flying only by gambling the last of their money in Las Vegas, among other near-death experiences & crimes—just one of many startups doing highly questionable things.¹⁰ Both SpaceX & Tesla have come within days (or hours) of bankruptcy, in 2008 and 2013; in the former case, Musk borrowed money from friends to pay his rent after 3 rocket failures in a row, and in the latter, Musk reportedly went as far as securing a pledge from Google to buy Tesla outright rather than let it go bankrupt (Vance 2015). Tesla’s struggles in general are too well known to mention. Mark Zuckerberg, in 2004, wanted nothing more than to sell Facebook for a few million dollars so he could work on his P2P filesharing program, Wirehog, commenting that the sale price just needed to be large enough “to propel Wirehog.” YouTube was a dating site.
Stewart Butterfield wanted to make an MMORPG which failed, and all he could salvage out of it was the photo-sharing part, which became Flickr; he still really wanted to make an MMORPG, so after Flickr, he founded a company to make the MMORPG Glitch which… also failed, so after trying to shut down his company and being told not to by his investors, he salvaged the chat part from it, which became Slack. And, consistent with the idea that there is a large ineradicable element of chance to it, surveys of startups suggest that while there are individual differences in odds of success (‘skill’), any founder learning curve (‘learning-by-doing’) is small & success probability remains low regardless of experience (Gompers et al 2006/Gompers 2010, Parker 2011, Gottschalk 2014), and experienced entrepreneurs still have low odds of forecasting startups achieving commercialization at all, approaching random predictions in “non-R&D-intensive sectors” (eg Scott et al 2019, McKenzie & Sansone 2019).
Thiel (Zero to One; original): “Every moment in business happens only once. The next Bill Gates will not build an operating system. The next Larry Page or Sergey Brin won’t make a search engine. And the next Mark Zuckerberg won’t create a social network. If you are copying these guys, you aren’t learning from them.” This is true but I would say it reverses the order (‘N to N+1’?): you will not be the next Bill Gates, because Bill Gates was not the first and only Bill Gates, he was, pace Stigler’s Law, the last Bill Gates¹¹; many people made huge fortunes off OSes, both before and after Gates—you may have forgotten Wang, but hopefully you remember Steve Jobs (before, Mac) and Steve Jobs (after, NeXT). Similarly, Mark Zuckerberg was not the first and only Zuckerberg, he was the last Zuckerberg; many people made social networking fortunes before him—maybe Orkut didn’t make its Google inventor a fortune, but you can bet that MySpace’s DeWolfe and Anderson did well. And there were plenty of lucrative search engine founders (is Jerry Yang still a billionaire? Yes).
Gates, however, proved the market, and refined the Gates strategy to perfection, using up the trick; no one can get historically rich off shipping an OS plus some business productivity software because there are too many competitors and too many players interested in ensuring that no one becomes the next Gates, and so opportunity has moved on to the next area.
A successful company rewrites history and its precursors¹²; history must be lived forward, progressing to an obscure destination, but we always recall it backwards as progressing towards the clarity of the present.
“It is universally admitted that the unicorn is a supernatural being and one of good omen; thus it is declared in the Odes, in the Annals, in the biographies of illustrious men, and in other texts of unquestioned authority. Even the women and children of the common people know that the unicorn is a favorable portent. But this animal does not figure among the domestic animals, it is not easy to find, it does not lend itself to any classification. It is not like the horse or the bull, the wolf or the deer. Under such conditions, we could be in the presence of a unicorn and not know with certainty that it is one. We know that a given animal with a mane is a horse, and that one with horns is a bull. We do not know what a unicorn is like.”
Can you ask researchers if the time is ripe? Well: researchers have a slight conflict of interest in the matter, and are happy to spend arbitrary amounts of money on topics without anything to show for it. After all, why would they say no?
I ended up doing more work in Japan than anything else because Japan in general is so tech-smitten and obsessed that they just love it [VR]. The Japanese government in general was funding research, building huge research complexes just to focus on this. There were huge initiatives while there was nothing happening in the US. I ended up moving to Japan and working there for many years.
Indeed, this would have been around the time of the Japanese boondoggle the Fifth Generation Project (note that despite Japan’s reputed prowess at robotics, it is not Japan’s robots who went into Fukushima / flying around the Middle East / revolutionizing agriculture and construction). All those ‘huge initiatives’ and…? Don’t ask Fisher, he’s hardly going to say, “oh yeah, all the money was completely wasted, we were trying to do it too soon; our bad”. And Lanier implies that Japan alone spent a lot of money:
Jaron Lanier: “The components have finally gotten cheap enough that we can start to talk about them as being accessible in the way that everybody’s always wanted…Moore’s law is so interesting because it’s not just the same components getting cheaper, but it really changes the way you do things. For instance, in the old days, in order to tell where your head was so that you could position virtual content to be standing still relative to you, we used to have to use some kind of external reference point, which might be magnetic, ultrasonic, or optical. These days you put some kind of camera on the head and look around in the room and it just calculates where you are—the headsets are self-sufficient instead of relying on an external reference infrastructure. That was inconceivable before because it would have been just so expensive to do that calculation. Moore’s law really just changes again and again, it re-factors your options in really subtle and interesting ways.”
Kevin Kelly: “Our sense of history in this world is very dim and very short. We were talking about the past: VR wasn’t talked about for a long time, right? 35 years. Most people have no idea that this is 35 years old. 30 years later, it’s the same headlines. Was the technological power just not sufficient 30 years ago?”
JL: “Both I and a lot of other people really, really wanted to get a consumerable version of this stuff out. We managed to get a taste of the experience with something called the Power Glove…Sony actually brought out a little near-eye display called Virtual Boy; not very good, but they gave it their best shot. And there were huge projects that have never been shown to the public to try to make a consumable [VR product], very expensive ones. Counting for inflation, probably more money was spent [than] Facebook just spent on Oculus. We just could never, never, never get it quite there.”
JL: “The component cost. It’s Moore’s law. Sensors, displays… batteries! Batteries is a big one.”
Issues like component cost were not something that could be solved by a VR research project, no matter how ambitious. Those were hard binding limits, and solving them by creating tiny high-resolution LED/LCD screens for smartphones required the benefit of decades of Moore’s law and the experience curve effects of manufacturing billions of smartphones.
Researchers in general have no incentive to say, “this is not the right time, wait another 20 years for Moore’s law to make it doable”, even if everyone in the field is perfectly aware of this—Palmer Luckey:
I spent a huge amount of time reading…I think that there were a lot of people that were giving VR too much credit, because they were working as VR researchers. You don’t want to publish a paper that says, ‘After the study, we came to the conclusion that VR is useless right now and that we should just not have a job for 20 years.’ There were a few people that basically came to that conclusion. They said, ‘Current VR gear is low field of view, high lag, too expensive, too heavy, can’t be driven properly from consumer-grade computers, or even professional-grade computers.’ It turned out that I wasn’t the first person to realize these problems. They’d been known for decades.
AI researcher Donald Michie claimed in 1970, based on a 1969 poll, that a majority of AI researchers estimated 10–100 years for AGI (or 1979–2069) and that “There is also fair agreement that the chief obstacles are not hardware limitations.”¹⁴ While AI researcher surveys still suggest that wasn’t a bad range (Gruetzemacher et al 2019), the success of deep learning makes clear that hardware was a huge limitation, and resources 50 years ago fell short by at least 6 orders of magnitude. Michie went on to point out that in a previous case, that of Charles Babbage, the work was foredoomed by it being an “unripe time” due to hardware limitations and represented a complete waste of time & money¹⁵. This, arguably, was the case for Michie’s own research.
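For a rough sense of what a shortfall of 6 orders of magnitude means in waiting time, here is a back-of-the-envelope sketch. The 2-year doubling period is an assumption borrowed from the popular statement of Moore’s law, not a figure from Michie or the surveys, and usable compute for AI need not track it at all.

```python
import math

def doublings_for(orders_of_magnitude):
    # Each order of magnitude (10x) takes log2(10), about 3.32, doublings.
    return orders_of_magnitude * math.log2(10)

def years_to_close_gap(orders_of_magnitude, doubling_period_years):
    # Waiting time if capability doubles once per doubling period (an assumption).
    return doublings_for(orders_of_magnitude) * doubling_period_years

def implied_doubling_period(orders_of_magnitude, elapsed_years):
    # Doubling period needed to close the gap in exactly elapsed_years.
    return elapsed_years / doublings_for(orders_of_magnitude)

gap_years = years_to_close_gap(6, doubling_period_years=2.0)
period = implied_doubling_period(6, elapsed_years=50)
print(f"6 orders of magnitude at one doubling per 2 years: {gap_years:.1f} years")
print(f"doubling period implied by closing that gap in 50 years: {period:.2f} years")
```

Under these assumptions the gap is about 20 doublings, which is decades of waiting that no amount of research effort in 1970 could have shortcut.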
“But to come very near to a true theory, and to grasp its precise application, are two very different things, as the history of science teaches us. Everything of importance has been said before by somebody who did not discover it.”
So you don’t know the timing well enough to reliably launch. You can’t imitate a successful entrepreneur, the time is past. You can’t foresee what will be successful based on what has been successful; you can’t even foresee what won’t be successful based on what was already unsuccessful; and you can’t ask researchers because they are incentivized to not know the timing any better than anyone else.
Can you at least profit from your knowledge of the outcome? Here again we must be pessimistic.
Certainty is irrelevant: you still have problems making use of this knowledge. Example: in retrospect, we know everyone wanted computers, OSes, social networks—but the history of them is strewn with flaming rubble. Suppose you somehow knew in 2000 that “in 2010, the founder of the most successful social network will be worth at least $10b”; this is a falsifiable belief at odds with all conventional wisdom and about a tech that blindsided everyone. Yet, how useful would this knowledge be, really? What would you do with it? Do you have the capital to start a VC fund of your own, and throw multi-million-dollar investments at every social media startup until finally in 2010 you knew for sure that Facebook was the winning ticket and could cash out in the IPO? I doubt it.
It’s difficult to invest in ‘computers’ or ‘AI’ or ‘social networking’ or ‘VR’; there is no index for these things, and it is hard to see how there even could be such a thing. (How do you force all relevant companies to sell tradable stakes? “If people don’t want to go to the ball game, how are you going to stop them?” as Yogi Berra asked.) There is no convenient CMPTR you can buy 100 shares of and hold indefinitely to capture gains from your optimism about computers. IBM and Apple both went nearly bankrupt at points, and Microsoft’s stock has been flat since 1999 or whenever (translating to huge real losses and opportunity costs to long-term holders of it). If you knew for certain that Facebook would be as huge as it was, what stocks, exactly, could you have invested in, pre-IPO, to capture gains from its growth? Remember, you don’t know anything else about the tech landscape in the 2000s, like that Google will go way up from its IPO, you don’t know about Apple’s revival under Jobs—all you know is that a social network will exist and will grow hugely. Why would anyone think that the future of smartphones would be won by “a has-been 1980s PC maker and an obscure search engine”? (The best I can think of would be to sell any Murdoch stock you owned when you heard they were buying MySpace, but offhand I’m not sure that Murdoch didn’t just stagnate rather than drop as MySpace increasingly turned out to be a writeoff.) In the hypothetical that you didn’t know the name of the company, you might’ve bought up a bunch of Google stock hoping that Orkut would be the winner, but while that would’ve been a decent investment (yay!) it would have had nothing to do with Orkut (oops)…
And even when there are stocks available to buy, you only benefit based on the specifics—like one of the existing stocks being a winner, rather than all the stocks being eaten by some new startup. Let’s imagine a different scenario, where instead you were confident that home robotics were about to experience a huge growth spurt. Is this even nonpublic knowledge at all? The world economy grows at something like 2% a year, labor costs generally seem to go up, prices of computers and robotics usually fall… Do industry projections expect to grow their sales by <25% a year?
But say that the market is wrongly pessimistic. If so, you might spend some of your hypothetical money on whatever the best approximation to a robotics index fund you can find, as the best of a bunch of bad choices. (Checking a few random entries in Wikipedia, as of 2012, maybe a fifth of the companies are publicly traded, and the private ones include the ones you might’ve heard of like Boston Dynamics or Kiva so… that will be a small unrepresentative index.) Suppose the home robotic growth were concentrated in a single private company which exploded into the billions of annual revenue and took away the market share of all the others, forcing them to go bankrupt or merge or shrink. Home robotics will have increased just as you believed—keikaku doori!—yet your ‘index fund’ has gone bankrupt (reindex when one of the robotics companies collapses? Reindex into what, another doomed firm?). Then after your special knowledge has become public knowledge, the robotics company goes public, and by EMH, their shares become a normal investment.
There were 272 automobile companies in 1909. Through consolidation and failure, 3 emerged on top, 2 of which went bankrupt. Spotting a promising trend and a winning investment are two different things.
Is this impossibly rare? It sounds like Facebook! They grew fast, roflstomped other social networks, stayed private, and post-IPO, public investors have not profited all that much compared to even late investors.
Because of the winner-take-all dynamics, there’s no way to solve the coordination problem of holding off on an approach until the prerequisites are in place: entrepreneurs and founders will be hurling themselves at a common goal like social networks or VR constantly, just on the off chance that maybe the prerequisites just became adequate and they’ll be able to eat everyone’s lunch. A predictable waste of money, perhaps, but that’s how the incentives work out. It’s a weird perspective to take, but we can think of other technologies which may be like this.
Bitcoin is a topical example: it’s still in the early stages where it looks either like a genius stroke to invest in, or a fool’s paradise/Ponzi scheme. In my first draft of this essay in 2012, I noted that we see what looks like a Bitcoin bubble as the price inflates from ~$0 to $165—yet, if Bitcoin were the Real Deal, we would expect large price increases as people learn of it and it directly gains value from increased use, an ecosystem slowly unlocking the fancy cryptographic features, etc. And in 2019, with 2012 a distant memory, well, one could say something similar, just with larger numbers…
Or take niche visionary technologies: if cryonics was correct in principle, yet turned out to be worthless for everyone doing it before 2030 (because the wrong perfusion techniques or cryopreservatives were used and some critical bit of biology was not vitrified) while practical post-2030 say, it would simply be yet another technology where visionaries were ultimately right despite all nay-saying and skepticism from normals but nevertheless wrong in a practical sense because they jumped on it too early, and so they wasted their money.
Indeed, do many things come to pass.
“Whatsoever thy hand findeth to do, do it with thy might; for there is no work, nor device, nor knowledge, nor wisdom, in the grave, whither thou goest.”
“Ummon addressed the assembly and said: ‘I am not asking you about the days before the fifteenth of the month. But what about after the fifteenth? Come and give me a word about those days.’ And he himself gave the answer for them: ‘Every day is a good day.’”
Where does this leave us? In what I would call, in a nod to Thiel’s ‘definite’ vs ‘indefinite optimism’, definitely-maybe optimism. Progress will happen and can be foreseen long before, but the details and exact timing are too difficult to get right, and the benefits of R&D lie in lying fallow until the ripe time and then being exploited in unpredictable ways.
Returning to Donald Michie: one could make fun of his extremely overly-optimistic AI projections, and write him off as the stock figure of the biased AI researcher blinded by the ‘Maes-Garreau law’ where AI is always scheduled for right when a researcher will retire17, but while he was wrong, it is unclear that this was a mistake, because in other cases an apparently doomed research project—Marconi’s attempt to radio across the Atlantic ocean—succeeded because of an unknown factor—the Kennelly–Heaviside layer18. We couldn’t know for sure that such projections were wrong, and the amount of money being spent back then on AI was truly trivial (and the commercial spinoffs likely paid for it all anyway).
Further, on the gripping hand, Michie suggests that such research efforts like Babbage’s should be thought of not as commercial R&D, expected to usually pay off right now, but as prototypes buying optionality, demonstrating that a particular technology was approaching its ‘ripe time’ & indicating what the bottlenecks are, so society can go after the bottlenecks and then has the option to scale up the prototype as soon as the bottlenecks are fixed19. Richard Hamming describes ripe time as finally enabling attacks on consequential problems20; Edward Boyden describes the development of both optogenetics & expansion microscopy as “failure rebooting”, revisiting (failed) past ideas which may now be workable in the light of progress in other areas21. As time passes, the number of options may open up, and any of them may bypass what was formerly a necessary or serial dependency which was fatal. Enough progress in one domain (particularly computing power) can sometimes make up for stasis in another domain.
So, what Babbage should have aimed for is not making a practical thinking machine which could churn out naval tables, but demonstrating that a programmable thinking machine is possible & useful, and currently limited by the slowness & size of its mechanical logic—so that transistors could be pursued with higher priority by governments, and programmable computers could be created with transistors as soon as possible, instead of the historical course of a meandering piecemeal development where Babbage’s work was forgotten & then repeatedly reinvented with delays (eg Konrad Zuse vs von Neumann). Similarly, the benefit of taking Moore’s law seriously is that one can plan ahead to take advantage of it22 even if one doesn’t know exactly when, if ever, it will happen.
Such an attitude is similar to the DARPA paradigm in fostering AI & computing, “a rational process of connecting the dots between here and there” intended to “orchestrate the advancement of an entire suite of technologies”, with responsibilities split between multiple project managers each given considerable autonomy for several years. These project managers tend to pick polarizing projects rather than consistent projects (Goldstein & Kearney 2017), ones which generate disagreement among reviewers or critics. Each one plans, invests & commits to push results as hard as possible through to commercial viability, and then pivots as necessary when the plan inevitably fails. (DARPA indeed saw itself as much like a VC firm.)
The benefit for someone like DARPA of a forecast like Moore’s law is that it provides one fixed trend to gauge overall timing to within a decade or so, and look for those dots which have lagged behind and become reverse salients.23 For an entrepreneur, the advantage of exponential thinking is more fatalistic: being able to launch in the window of time between just after technical feasibility but before someone else randomly gives it a try; if wrong and it was always impossible, it doesn’t matter when one launches, and if wrong because timing is wrong, one’s choice is effectively random and little is lost by delay.
“The road to wisdom?—Well, it’s plain
and simple to express:
Err
and err
and err again
but less
and less
and less.”
This presents a conflict between personal and social incentives. Socially, one wants people regularly tossing their bodies into the marketplace to be trampled by uncaring forces just on the off chance that this time it’ll finally work, and since the critical factors are unknown and constantly changing, one needs a sacrificial startup every once in a while to check (for a good idea, no number of failures is enough to prove that it should never be tried—many failures just imply that there should be a backoff). Privately, given the skewed returns, diminishing utility, the oversized negative impacts (a bad startup can ruin one’s life and drive one to suicide), the limited number of startups any individual can engage in (yielding gambler’s ruin)24, and the fact that startups & VC will capture only a minute percentage of the total gains from any success (most of which will turn into consumer surplus/positive externalities), the only startups that make any rational sense, which you wouldn’t have to be crazy to try, are the overdetermined ones which anyone can see are a great idea. However, those are precisely the startups that crazy people will have done years before when they looked like bad ideas, avoiding the waste of delay. Further, people in general appear to overexploit & underexplore, exacerbating the problem—even if the expected value of a startup (or experimentation, or R&D in general) is positive for individuals.
So, it seems that rapid progress depends on crazy people.
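The private side of that conflict can be made concrete with a little arithmetic. The sketch below is only illustrative: the success probability, payoff multiple, stake fraction, and number of attempts are hypothetical numbers chosen for the example, and logarithmic utility is assumed as one common stand-in for ‘diminishing utility’. It shows a bet whose expected monetary value is clearly positive while its expected log-wealth is negative, and how unlikely even one success is when only a few attempts are affordable.

```python
import math


def expected_value(p_success, payoff_multiple):
    """Expected gross return per unit staked (above 1.0 means profitable on average)."""
    return p_success * payoff_multiple


def expected_log_wealth(p_success, payoff_multiple, stake_fraction):
    """Expected log of final wealth, starting from wealth 1.0.

    A fraction of wealth is staked: on success the stake is multiplied by
    payoff_multiple, on failure the stake is lost entirely.
    """
    wealth_if_win = 1.0 - stake_fraction + stake_fraction * payoff_multiple
    wealth_if_lose = 1.0 - stake_fraction
    return (p_success * math.log(wealth_if_win)
            + (1.0 - p_success) * math.log(wealth_if_lose))


def p_any_success(p_success, attempts):
    """Chance of at least one success in a fixed number of independent attempts."""
    return 1.0 - (1.0 - p_success) ** attempts


# Hypothetical numbers, for illustration only.
P_SUCCESS = 0.1       # chance that any one startup works
PAYOFF_MULTIPLE = 20  # a success returns 20x the stake
STAKE_FRACTION = 0.5  # half of one's wealth (or career) goes into each try
ATTEMPTS = 3          # startups one person can realistically attempt

ev = expected_value(P_SUCCESS, PAYOFF_MULTIPLE)
elog = expected_log_wealth(P_SUCCESS, PAYOFF_MULTIPLE, STAKE_FRACTION)
chance = p_any_success(P_SUCCESS, ATTEMPTS)

print(f"expected return per unit staked: {ev:.2f}")
print(f"expected log-wealth change:      {elog:.3f}")
print(f"chance of any success in {ATTEMPTS} tries: {chance:.3f}")
```

Under these assumed numbers the bet doubles money on average, yet a log-utility individual is worse off taking it, which is one way to read the gap between what society wants tried and what any single founder should rationally try.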
There is a more than superficial analogy here, I think, to Thompson sampling25/posterior sampling (PSRL) Bayesian reinforcement learning. In RL’s multi-armed bandit setting, each turn one has a set of ‘arms’ or options with unknown payoffs and one wants to maximize the total long-term reward. The difficulty is in coping with failure: even good options may fail many times in a row, and bad options may succeed, so options cannot simply be ruled out after a failure or two, and if one is too hasty to write an option off, one may take a long time to realize that, losing out for many turns.
One of the simplest & most efficient MAB solutions, which maximizes the total long-term reward and minimizes ‘regret’ (opportunity cost), is Thompson sampling & its generalization PSRL26: randomly select each option with a probability equal to the current estimated probability that it is the most profitable option. This explores all options initially but gradually homes in on the most profitable option to exploit most of the time, while still occasionally exploring all the other options once in a while, just in case; strictly speaking Thompson sampling will never ban an option permanently, the probability of selecting an option merely becomes vanishingly rare. Bandit settings can further assume that options are ‘restless’ and the optimal option may ‘drift’ over time or ‘run out’ or ‘switch’, in which case one also estimates the probability that an option has switched, and when it does, one changes over to the new best option; instead of the regular Thompson sampling where bad options become ever more unlikely to be tried, a restless bandit results in constant low-level exploration because one must constantly check lest one fails to notice a switch.
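As a minimal sketch of the plain (non-restless) case, here is Thompson sampling on a Bernoulli bandit using only the Python standard library. The arm payoff probabilities, the uniform Beta(1, 1) prior, and the function name are assumptions made for illustration. Drawing one sample from each arm's posterior and playing the arm with the largest sample is equivalent to choosing each arm with probability equal to its current posterior probability of being the best.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return per-arm pulls, wins, losses."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # One draw from each arm's posterior, Beta(wins + 1, losses + 1),
        # which corresponds to a uniform Beta(1, 1) prior on the payoff rate.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                   for a in range(n_arms)]
        # Play the arm whose sampled payoff rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1
        # Observe a success or failure from the arm's true (hidden) rate.
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls, wins, losses


# Hypothetical payoff probabilities, unknown to the algorithm.
TRUE_PROBS = [0.1, 0.3, 0.8]
N_ROUNDS = 2000

pulls, wins, losses = thompson_sampling(TRUE_PROBS, N_ROUNDS)
for arm, count in enumerate(pulls):
    print(f"arm {arm}: pulled {count} times, {wins[arm]} successes")
```

No arm is ever banned here: a poorly performing arm simply wins the posterior draw less and less often. A restless variant would additionally have to discount or reset old evidence so that exploration never dies away; that is not implemented in this sketch.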
This bears a resemblance to startup rates over time: an initial burst of enthusiasm for a new ‘option’, when it still has high prior probability of being the most profitable option at the moment, triggers a bunch of startups selecting that option, but then when they fail, the posterior probability drops substantially; however, even if something now looks like a bad idea, there will still be people every once in a while who insist on trying again anyway, and, because the probability is not 0, once in a while they succeed wildly and everyone is astonished that ‘so, X is a thing now!’
In DARPA’s research funding and VC, they often aren’t looking for a plan which looks good on average to everyone, or which no one can find any particular problem with, but something closer to a plan which at least one person thinks could be awesome for some reason. An additional analogy from reinforcement learning is PSRL, which handles more complex problems by committing to a strategy and following it until the end, whether that is success or failure. A naive Thompson sampling would do badly in a long-term problem because at every step, it would ‘change its mind’ and be unable to follow any plan consistently for long enough to see what happens; what is necessary is to do ‘deep exploration’, following a single plan long enough to see how it works. Even if one thinks that plan is almost certainly wrong, one must “Disagree and commit”. The average of multiple plans is often worse than any single plan. The most informative plan is the most polarizing one.27
The system as a whole can be seen in RL terms. One theme I notice in many systems is that they follow a multi-level optimization structure where slow blackbox methods give rise to more efficient Bayesian inference. Ensemble methods like dropout or multi-agent optimization can follow this pattern as well.
A particularly germane example here is Krafft et al 2016/Krafft 2017 (discussion), which examines a large dataset of trades made by eToro online traders, who are able to clone financial trading strategies of more successful traders; as traders find successful strategies, others gradually imitate them, and so the system as a whole converges on better strategies in what they identify as a sort of particle filter-like implementation of “distributed Thompson sampling” which they dub “social sampling”. So for the most part, traders clone popular strategies, but with certain probabilities, they’ll randomly explore rarer apparently-unsuccessful strategies.
This sounds a good deal like individuals pursuing standard careers & occasionally exploring unusual strategies like a startup; they will occasionally explore strategies which have performed badly (ie. previous similar startups failed). Entrepreneurs, with their speculations and optimistic biases, serve as randomization devices to sample a strategy regardless of the ‘conventional wisdom’, which at that point may be no more than an information cascade; information cascades, however, can be broken by the existence of outliers who are either informed or act at random (“misfits”). While each time a failed option is tried, it may seem irrational (“how many times must VR fail before people finally give up on it‽”), it was still rational in the big picture to give it a try, as this collective strategy collectively minimizes regret & maximizes collective total long-term returns—as long as failed options aren’t tried too often.
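The imitate-mostly, explore-occasionally dynamic can be illustrated with a toy population simulation. This is a loose sketch inspired by the description above, not a reproduction of the Krafft et al model: the update rule (consider a strategy, try it once, adopt it only if that trial pays off), the payoff probabilities, and all parameter values are assumptions made for the example.

```python
import random
from collections import Counter


def social_sampling(true_probs, n_agents, n_rounds, explore_prob, seed=0):
    """Simulate agents who mostly copy others and occasionally explore at random.

    Returns a Counter mapping option index -> number of agents holding it.
    """
    rng = random.Random(seed)
    n_options = len(true_probs)
    # Everyone starts on a uniformly random option.
    choices = [rng.randrange(n_options) for _ in range(n_agents)]
    for _ in range(n_rounds):
        snapshot = list(choices)  # popularity as seen at the start of the round
        for agent in range(n_agents):
            if rng.random() < explore_prob:
                # The 'misfit' move: consider any option, however unpopular.
                candidate = rng.randrange(n_options)
            else:
                # Copying a random peer samples options in proportion to popularity.
                candidate = rng.choice(snapshot)
            # Try the candidate once and adopt it only if the trial pays off.
            if rng.random() < true_probs[candidate]:
                choices[agent] = candidate
    return Counter(choices)


# Hypothetical payoff probabilities and population settings.
TRUE_PROBS = [0.2, 0.5, 0.8]
N_AGENTS = 200

counts = social_sampling(TRUE_PROBS, N_AGENTS, n_rounds=300, explore_prob=0.05)
for option in range(len(TRUE_PROBS)):
    print(f"option {option}: {counts[option]} agents")
```

With the exploration probability set to zero, an option that happens to lose all its adherents can never be rediscovered, which is the information-cascade failure; the small random-exploration term is what keeps every option alive at a low level.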
What does this analogy suggest? The two failure modes of a MAB algorithm are investing too much in one option early on, and then investing too little later on; in the former, you inefficiently buy too much information on an option which happened to have good luck but is not guaranteed to be the best at the expense of others (which may in fact be the best), while in the latter, you buy too little & risk permanently making a mistake by prematurely rejecting an apparently-bad option (which simply had bad luck early on). To the extent that VC/startups stampede into particular sectors, this leads to inefficiency of the first kind—were so many ‘green energy’ startups necessary? When they began failing in a cluster, information-wise, that was highly redundant. And then on the other hand, if a startup idea becomes ‘debunked’, and no one is willing to invest in it ever, that idea may be starved of investment long past its ripe time, and this means big regret.
I think most people are aware of fads/stampedes in investing, but the latter error is not so commonly discussed. One idea is that a VC firm could explicitly track ideas that seem great but have had several failed startups, and try to schedule additional investments at ever greater intervals (similar to DS-PRL), which bounds losses (if the idea turns out to be truly a bad idea after all) but ensures eventual success (if a good one). For example, even if online pizza delivery has failed every time it’s tried, it still seems like a good idea that people will want to order pizza online via their smartphones, so one could try to do a pizza startup 2.5 years later, then 5 years later, then 10 years, then 20 years, or perhaps every time computer costs drop an order of magnitude, or perhaps every time the relevant market doubles in size? Since someone wanting to try the business again might not pop up at the exact time desired, a VC might need to create one themselves by trying to inspire someone to do it.
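The doubling schedule in the pizza example can be sketched as a simple backoff generator. The function name, the choice to treat 2.5, 5, 10, and 20 years as successive gaps between attempts (rather than as cumulative dates), and the 40-year horizon are assumptions made for illustration.

```python
def backoff_schedule(first_gap_years, factor, horizon_years):
    """Return retry times, in years after the last failure, with growing gaps.

    Each gap between attempts is `factor` times the previous gap; attempts
    falling beyond `horizon_years` are dropped.
    """
    if first_gap_years <= 0 or factor <= 1:
        raise ValueError("need a positive first gap and a growth factor above 1")
    times = []
    gap = first_gap_years
    t = gap
    while t <= horizon_years:
        times.append(t)
        gap *= factor  # wait longer before each successive retry
        t += gap
    return times


# Gaps of 2.5, 5, 10, and 20 years, as in the pizza example.
schedule = backoff_schedule(first_gap_years=2.5, factor=2.0, horizon_years=40.0)
print(schedule)  # [2.5, 7.5, 17.5, 37.5]
```

Because the gaps grow geometrically, the number of attempts grows only logarithmically with the horizon, which is what bounds the losses on a truly bad idea, while the idea is still never abandoned permanently. A trigger based on computer costs or market size would replace elapsed time with that quantity but keep the same shape.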
What other lessons could we draw if we thought about technology this way? The use of lottery grants is one idea which has been proposed, to help break the over-exploitation fostered by peer review; the randomization gives disfavored low-probability proposals (and people) a chance. If we think about multi-level optimization systems & population-based training, and optimization of evolution like strong amplifiers (which resemble small but networked communities: Pavlogiannis et al 2018), that would suggest we should have a bias against both large and small groups/institutes/granters, because small ones are buffeted by random noise/drift and can’t afford well-powered experiments, but large ones are too narrow-minded.28 But a network of medium ones can both explore well and then efficiently replicate the best findings across the network to exploit them.
Review of DARPA history book, Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993, Roland & Shiman 2002, which covers a large-scale DARPA effort to jumpstart real-world uses of AI in the 1980s by a multi-pronged research effort into more efficient computer chip R&D, supercomputing, robotics/self-driving cars, & expert system software. Roland & Shiman 2002 particularly focus on the various ‘philosophies’ of technological forecasting & development, which guided DARPA’s strategy in different periods, ultimately endorsing a weak technological determinism where the bottlenecks are too large for a small (in comparison to the global economy & global R&D) organization to overcome, so the best a DARPA can hope for is a largely agnostic & reactive strategy in which granters ‘surf’ technological changes, rapidly exploiting new technology while investing their limited funds into targeted research patching up any gaps or lags that accidentally open up and block broader applications.
While reading “Funding Breakthrough Research: Promises and Challenges of the ‘ARPA Model’”, Azoulay et al 2018, on DARPA, I noticed an interesting comment:
In this paper, we propose that the key elements of the ARPA model for research funding are: organizational flexibility on an administrative level, and significant authority given to program directors to design programs, select projects and actively manage projects. We identify the ARPA model’s domain as mission-oriented research on nascent S-curves within an inefficient innovation system.
…Despite a great deal of commentary on DARPA, lack of access to internal archival data has hampered efforts to study it empirically. One notable exception is the work of Roland and Shiman (2002),2 who offer an industrial history of DARPA’s effort to develop machine intelligence under the “Strategic Computing Initiative” [SCI]. They emphasize both the agency’s positioning in the research ecosystem—carrying military ideas to proof of concept that would be otherwise neglected—as well as the program managers’ role as “connectors” in that ecosystem. Roland and Shiman are to our knowledge the only academic researchers ever to receive internal access to DARPA’s archives. Recent work by Goldstein and Kearney (2018a) on ARPA-E is to-date the only quantitative analysis using internal program data from an ARPA agency. [For insights into this painful process, see the preface of Roland and Shiman (2002).]
The two Goldstein & Kearney 2018 papers sounded interesting but alas, are listed as “manuscript under review”/“manuscript in preparation”; only one is available as a preprint. I was surprised that an agency as well known and intimately involved in computing history could be described as having one internal history, ever, and looked up a PDF copy of Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993, Roland & Shiman 2002.
The preface makes clear the odd footnote: while they may have had some access to internal archival data, they had a lot less access than they requested, DARPA was not enthusiastic about it, and eventually canceled their book contract (they published anyway). This leads to an… interesting preface. You don’t often hear historians of solicited official histories describe the access as a “mixed blessing” and say things like “they never lied to us, as best as we can tell”, they just “simply could not understand why we wanted to see the materials we requested”, or recount that their “requests for access to these [emails] were most often met with laughter”, noting that “We were never explicitly denied access to records controlled by DARPA; we just never gained complete access.” Frustrated, they
…then asked if they could identify any document in the SC program that spoke the truth, that could be accepted at face value. They [ARPA interviewees] found this an intriguing question. They could not think of a single such document. All documents, in their view, distorted reality one way or another—always in pursuit of some greater good.
In one anecdote from the interviews, Lynn Conway shows up with a stack of internal DARPA documents, states that an NDA prevents her from talking about them (as if anyone cared about NDAs from decades before), and refuses to show any of the documents to the interviewer, leaving me rather bemused—why bother? (Although in this case, it may just be that Conway is a jerk—one might remember her from helping try to frame Michael Bailey for sexual abuse.) I was reminded a little of Carter Scholz’s also 2002 novel, Radiance, which touches on SDI and indirectly on SCI.
The book itself doesn’t seem to have suffered too badly for the birth pangs. It’s an overview of the birth and death of the SCI, organized in chunks by the manager. The division by manager is not an accident—R&S comment deprecatingly about DARPA personnel being focused on the technology and how they didn’t want them to “talk about people and politics” and invoke the strawman of “technological determinists”; they seem to adopt the common historian pose that a sophisticated historian focuses on people and it is naive & unsophisticated to invoke objective constraints of science & technology & physics. This is wrong in the context of SCI, as their in-depth recounting will eventually make clear. The people did not have much to do with the failures: stuff like gallium arsenide or expert systems or autonomous robots didn’t work out because they don’t work or are hard or require computing power unavailable at the time, not because some bureaucrat made a bad naming choice or ran afoul of the wrong Senator. People don’t matter to something like Moore’s law. Man proposes but Nature disposes—you can fake medicine or psychology easily, but it’s harder to fake a robot not running into trees. Fortunately, for all the time R&S spend on project managers shuffling around acronyms, they still devote adequate space to the actual science & technology and do a good job of it.
So what was SCI? It was a 1980s–1990 add-on to ARPA’s existing funding programs, where the spectre of Japan’s Fifth Generation Project was used to lobby Congress for additional R&D funding which would be devoted to a cluster of interconnected technological opportunities ARPA spied on the US horizon, to push them forward simultaneously and break the logjams. (As always, “funding comes from the threat”, though many were highly skeptical that Fifth Generation would go anywhere or that its intended goals—much of which was to simply work around flaws in Japanese language handling—were much of a threat, and most Western evaluations of it generally describe it as a failure or at least not a notably productive R&D investment.) The systems included gallium arsenide chips to replace silicon’s poor thermal/radiation tolerance and operate at faster frequencies as well, VLSI chips which would combine previously disparate chips onto a single small chip as part of a silicon design ecosystem which would design & manufacture chips much faster than previously29, parallel processing computers going far beyond just 1 or 2 processors, autonomous car robots, AI expert systems, and advanced user-friendly software tools in general. The name “Strategic Computing Initiative” was chosen to try to benefit from Reagan’s SDI, but while the military connections remained throughout, the connection was ultimately quite tenuous and the gallium arsenide chips were deliberately split out to SDI to avoid contamination, although the US military would still be the best customer for many of the products & the connections continued to alienate people. Surprisingly—shockingly, even—computer networking was not a major SCI focus: the ARPA networking PM Barry Leiner kept clear of SCI (not needing the money & fearing a repeat of know-nothing Republican Congressmen searching for something to axe). The funding ultimately amounted to $2,205,607, trivial compared to total military funding, but still real money.
The project implementation followed ARPA’s existing loose oversight paradigm, where traveling project managers were empowered to dispense grants to applicants on their own authority, depending primarily on their own good taste to match talented researchers with ripe opportunities, with bureaucracy limited to meeting with the grantees semi-annually or annually for progress reports & evaluation, often in groups so as to let researchers test each other’s mettle & form social ties. (“ARPA program managers like to repeat the quip that they are 75 entrepreneurs held together by a common travel agent.”) An ARPA PM would humbly ‘surf’ the cutting-edge, going with the waves rather than swimming upstream, so to speak, to follow growing trends while cutting their losses on dead ends, to bring things through the ‘valley of death’ between lab prototype and the real world:
Steven Squires, who rose from program manager to be Chief Scientist of SC and then director of its parent office, sought orders-of-magnitude increases in computing power through parallel connection of processors. He envisioned research as a continuum. Instead of point solutions, single technologies to serve a given objective, he sought multiple implementations of related technologies, an array of capabilities from which users could connect different possibilities to create the best solution for their particular problem. He called it “gray coding”. Research moved not from the white of ignorance to the black of revelation, but rather it inched along a trajectory stepping incrementally from one shade of gray to another. His research map was not a quantum leap into the unknown but a rational process of connecting the dots between here and there. These and other DARPA managers attempted to orchestrate the advancement of an entire suite of technologies. The desideratum of their symphony was connection. They perceived that research had to mirror technology. If the system components were to be connected, then the researchers had to be connected. If the system was to connect to its environment, then the researchers had to be connected to the users. Not everyone in SC shared these insights, but the founders did, and they attempted to instill this ethos in the program.
Done wrong, of course, this results in a corrupt slush fund doling out R&D funds to an incestuous network of grantees for technologies always just on the horizon and whose failure is always excused by the claim that high-risk research often won’t work out, or results in elaborate systems trying to do too many things and collapsing under the weight of many advanced half-debugged systems chaotically interacting (eg ILLIAC IV). Having been conceived in scientific sin and born of blue-uniform bureaucracy while midwifed by conniving committees, SCI’s prospects might not look too great.
So, did SCI work out? The answer is a definite, unqualified—maybe:
At the end of their decade, 1983–1993, the connection failed. SC never achieved the machine intelligence it had promised. It did, however, achieve some remarkable technological successes. And the program leaders and researchers learned as much from their failures as from their triumphs. They abandoned the weak components in their system and reconfigured the strong ones. They called the new system “high performance computing”. Under this new rubric they continued the campaign to improve computing systems. “Grand challenges” replaced the former goal, machine intelligence; but the strategy and even the tactics remained the same.
The end of SCI coincided with (and partially caused) the “AI Winter”, but SCI went beyond just the Lisp machine & expert system software companies we associate with the AI winter. Of the systems, some worked out, others were good ideas but the time wasn’t ripe in an unforeseeable way and have been maturing ever since, some have poked along in a kind of permanent stasis (not dead but not alive either), others were dead ends but dead ends in important ways, and some are plain dead. In order, one might list: parallel commodity processors and rapid development of large silicon chips via a subsidized foundry, the autonomous cars/vehicles and generalized machine intelligence systems and expert systems, Thinking Machines’s Connection Machine, and Josephson junctions.
Pining for the fjords: super-fast superconducting Josephson junctions were rapidly abandoned before becoming officially part of SCI research, while gallium arsenide suffered a similar fate—at the time, they were exciting and Cray Computers infamously bet big on the Cray 3 achieving its OOM improvement in part with gallium arsenide chips, but somehow it never quite worked out or replaced silicon and remains in a small niche. (I doubt it was SDI’s fault, since gallium arsenide has had 2 decades since, and there’s been a ton of commercial incentive to find a replacement for silicon as it gets ever harder to shrink silicon nodes.)
Important failures: autonomous vehicles and generalized AI systems represent an interesting intermediate case: the funded vehicles, like the work at CMU, were useless—expensive, slow, trivially confused by slight differences in roads or scenery, unable to cope in realtime with more than monochrome images with pitiful resolutions like 640x640px or smaller because the computer vision algorithms were too computationally demanding, and the development bogged down by endless tweaks and hacking with regular regressions in capability. But these research programs and demos were direct ancestors of the DARPA Grand Challenge, which itself kickstarted the current self-driving car boom a decade ago. ARPA and the military didn’t get the exciting vehicles promised by the early ’90s, but they do now have autonomous cars and especially drones, and it’s amazing to think that Google Waymo cars are wandering around Arizona now regularly picking up and dropping off riders without a single fatality or major injury after millions of miles. As far as I can tell, Waymo wouldn’t exist now without the DARPA Grand Challenge, and it seems possible that DARPA was encouraged by the mixed success of the SCI vehicles, so that’s an interesting case of potential success albeit delayed. (But then, we do expect that with technology—Amara’s law.)
Parallel computers: Thinking Machines benefited a lot from SCI as did other parallel computing projects, and while TM did fail and the computers we use now don’t resemble the Connection Machine at all30, the field of parallel processing was proven out (ie. systems with thousands of weak CPUs could be successfully built, programmed, realize OOM performance gains, and commercially sold); I’d noticed once that a lot of parallel computing architectures we use now seemed to stem from an efflorescence in the 1980s, but it was only while reading R&S and noting all the familiar names that I realized that that was not a coincidence because many of them were ARPA-funded at this time. Even without R&S noting that the parallel computing was successfully rolled over into “HPC”, SCI’s investment into parallel computing was a big success.
A successful adjunct to the parallel computing was an interesting program I’d never heard of before: MOSIS. MOSIS was essentially a government-subsidized chip foundry, competitive with commercial chip foundries, which would accept student & researcher submissions of VLSI chip designs like CPUs or ASICs and make physical chips in combined batches to save costs. Anyone with interesting new ideas could email in a design and get back within 2 months a real live chip for a few hundred dollars. The chips were made cheaply and quickly, were quality-checked, and came with an assurance of privacy, and MOSIS handled well over a thousand projects a year at its height (peaking at 1880 in 1989). This is quite a cool program to run and must have been a godsend, especially for anyone trying to make custom chips for parallel projects. (“SC also supported BBN’s Butterfly parallel processor, Charles Seitz’s Hypercube and Cosmic Cube at CalTech, Columbia’s Non-Von, and the CalTech Tree Machine. It supported an entire newcomer as well, Danny Hillis’s Connection Machine, coming out of MIT.47 All of these projects used MOSIS services to move their design ideas into experimental chips.”) MOSIS was involved in early GPU work (Clark’s Geometry Engine) and RISC designs like MIPS and even oddities like systolic array chips/computers like the iWarp. Sadly, MOSIS was a bit of a victim of its own success and drew political fire.
Expert systems and planners are generally listed as a ‘failure’ and the cause of the AI Winter, and it’s true they didn’t give us HAL as some GOFAI people hoped, but they did find a useful niche and have been important—R&S give a throwaway paragraph noting that one system from SCI, DART, was used in planning logistics for the first Gulf War and saved the DoD more money than the whole SCI program cost. (The listed reference, “DART: Revolutionizing Logistics Planning”, Hedberg 2002, actually makes the bolder claim that DART “paid back all of DARPA’s 30 years of investment in AI in a matter of a few months, according to Victor Reis, Director of DARPA at the time.” Which could be equally well taken as a comment on how expensive a war is, how inefficient DoD logistics planning was, or how little has been invested in AI.) It’s also worth noting that speech recognition based on Hidden Markov models & n-grams, the first speech recognition systems which were any use (underlying successes like Dragon NaturallySpeaking), was a success here, even if now obsolesced by deep learning.
Perhaps the area most relevant to contemporary discussions of deep learning is expert systems. Why was there such optimism? Expert systems had accomplished a few successes: MYCIN/DENDRAL (although MYCIN was never used in production), some mining/oil case studies like PROSPECTOR, a customer configuration assistant XCON for DEC… And SCI was a synergistic program, remember, providing the chips and then powerful parallel computers whose expert systems would scale up to the tens of thousands of rules per second estimated necessary for things like the autonomous vehicles:
Small wonder, then, that Robert Kahn and the architects of SC believed in 1983 that AI was ripe for exploitation. It was finally moving out of the laboratory and into the real world, out of the realm of toy problems and into the realm of real problems, out of the sterile world of theory and into the practical world of applications.
…That such a goal appeared within reach in the early 1980s is a measure of how far the field had already come. In the early 1970s, the MYCIN expert system had taken twenty person-years to produce just 475 rules.38 The full potential of expert systems lay in programs with thousands, even tens and hundreds of thousands, of rules. To achieve such levels, production of the systems had to be dramatically streamlined. The commercial firms springing up in the early 1980s were building custom systems one client at a time. DARPA would try to raise the field above that level, up to the generic or universal application.
Thus was shaped the SC agenda for AI. While the basic program within IPTO continued funding for all areas of AI, SC would seek “generic applications” in four areas critical to the program’s applications: (1) speech recognition would support Pilot’s Associate and Battle Management; (2) natural language would be developed primarily for Battle Management; (3) vision would serve primarily the Autonomous Land Vehicle; and (4) expert systems would be developed for all of the applications. If AI was the penultimate tier of the SC pyramid, then expert systems were the pinnacle of that tier. Upon them all applications depended. Development of a generic expert system that might service all three applications could be the crowning achievement of the program. Optimism on this point was fueled by the whole philosophy behind SC. AI in general, and expert systems in particular, had been hampered previously by lack of computing power. Feigenbaum, for example, had begun DENDRAL on an IBM 7090 computer, with about 130K bytes of core memory and an operating speed between 50 and 100,000 floating point operations per second.39 Computer power was already well beyond that stage, but SC promised to take it to unprecedented levels—a gigaflop by 1992. Speed and power would no longer constrain expert systems. If AI could deliver the generic expert system, SC would deliver the hardware to run it. Compared to existing expert systems running 2,000 rules at 50–100 rules per second, SC promised “multiple cooperating expert systems with planning capability” running 30,000 rules firing at 12,000 rules per second and six times real time.40
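The figures quoted above invite a back-of-the-envelope check of how large the promised jump was. The sketch below uses only numbers from the R&S excerpts (MYCIN's 20 person-years for 475 rules; 2,000 rules at 50–100 rules per second versus 30,000 rules at 12,000 rules per second). The one added assumption, which is mine and not R&S's, is that rule-writing effort scales linearly with rule count; real knowledge engineering may well have scaled worse.

```python
# Figures taken from the R&S excerpts quoted above.
MYCIN_PERSON_YEARS = 20
MYCIN_RULES = 475
EXISTING_RULES = 2_000
EXISTING_RATE_RANGE = (50, 100)   # rules per second, low and high
TARGET_RULES = 30_000
TARGET_RATE = 12_000              # rules per second

# Knowledge-engineering productivity implied by MYCIN.
rules_per_person_year = MYCIN_RULES / MYCIN_PERSON_YEARS

# ASSUMPTION (not from the source): effort scales linearly with rule count.
person_years_for_target = TARGET_RULES / rules_per_person_year

# How much bigger and faster the promised system was than existing ones.
rule_count_growth = TARGET_RULES / EXISTING_RULES
speedup_low = TARGET_RATE / EXISTING_RATE_RANGE[1]
speedup_high = TARGET_RATE / EXISTING_RATE_RANGE[0]

print(f"MYCIN productivity: {rules_per_person_year:.2f} rules per person-year")
print(f"Person-years for {TARGET_RULES:,} rules at that pace: "
      f"{person_years_for_target:,.0f}")
print(f"Rule count growth: {rule_count_growth:.0f}x")
print(f"Firing-rate speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
```

On these numbers, the hardware side needed a speedup of two orders of magnitude, while the knowledge side needed on the order of a thousand person-years of rule-writing at MYCIN's pace, unless production was "dramatically streamlined" as the excerpt puts it.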
What happened was that the hardware came into existence, but the expert systems didn’t scale. They instantly hit a combinatorial wall, couldn’t solve the grounding problem, and knowledge engineering never became feasible at the level where you might encode a human’s knowledge. Expert systems also struggled to be extended beyond symbolic systems to real data like vision or sound. AI didn’t have remotely enough computing power to do anything useful, and it didn’t have methods which could use the computing power if it had it. We got the VLSI chips, we got the gigahertz processors even without gallium arsenide, we got the gigaflops and then the teraflops and now the petaflops—but what do you do with an expert system on those? Nothing. The grand goals of SCI relied on all the parts doing their part, and one part fell through:
Only four years into the SC program, when Schwartz was about to terminate the IntelliCorp and Teknowledge contracts, expectations for expert systems were already being scaled back. By the time that Hayes-Roth revised his article for the 1992 edition of the Encyclopedia, the picture was still more bleak. There he made no predictions at all about program speeds. Instead he noted that rule-based systems still lacked “a precise analytical foundation for the problems solvable by RBSs . . . and a theory of knowledge organization that would enable RBSs to be scaled up without loss of intelligibility of performance.”108 SC contractors in other fields, especially applications, had to rely on custom-developed software of considerably less power and versatility than those envisioned when contracts were made with IntelliCorp and Teknowledge. Instead of a generic expert system, SC applications relied increasingly on “domain-specific software”, a change in terminology that reflected the direction in which the entire field was moving.109 This is strikingly similar to the pessimistic evaluation Schwartz had made in 1987. It was not just that IntelliCorp and Teknowledge had failed; it was that the enterprise was impossible at current levels of experience and understanding…Does this mean that AI has finally migrated out of the laboratory and into the marketplace? That depends on one’s perspective. In 1994 the U.S. Department of Commerce estimated the global market for AI systems to be about $1,918 million, with North America accounting for two-thirds of that total.119 Michael Schrage, of the Sloan School’s Center for Coordination Science at MIT, concluded in the same year that “AI is—dollar for dollar—probably the best software development investment that smart companies have made.”120 Frederick Hayes-Roth, in a wide-ranging and candid assessment, insisted that “KBS have attained a permanent and secure role in industry”, even while admitting the many shortcomings of this technology.121 Those shortcomings weighed heavily on AI authority Daniel Crevier, who concluded that “the expert systems flaunted in the early and mid-1980s could not operate as well as the experts who supplied them with knowledge. To true human experts, they amounted to little more than sophisticated reminding lists.”122 Even Edward Feigenbaum, the father of expert systems, has conceded that the products of the first generation have proven narrow, brittle, and isolated.123 As far as the SC agenda is concerned, Hayes-Roth’s 1993 opinion is devastating: “The current generation of expert and KBS technologies had no hope of producing a robust and general human-like intelligence.”124
…Each new [ALV] feature and capability brought with it a host of unanticipated problems. A new panning system, installed in early 1986 to permit the camera to turn as the road curved, unexpectedly caused the vehicle to veer back and forth until it ran off the road altogether.45 The software glitch was soon fixed, but the panning system had to be scrapped anyway; the heavy, 40-pound camera stripped the device’s gears whenever the vehicle made a sudden stop.46 Given such unanticipated difficulties and delays, Martin increasingly directed its efforts toward achieving just the specific capabilities required by the milestones, at the expense of developing more general capabilities. One of the lessons of the first demonstration, according to the ALV engineers, was the importance of defining “expected experimental results”, because “too much time was wasted doing things not appropriate to proof of concept.”47 Martin’s selection of technology was conservative. It had to be, as the ALV program could afford neither the lost time nor the bad publicity that a major failure would bring. One BDM observer expressed concern that the pressure of the demonstrations was encouraging Martin to cut corners, for instance by using the “flat earth” algorithm with its two-dimensional representation. ADS’s obstacle-avoidance algorithm was so narrowly focused that the company was unable to test it in a parking lot; it worked only on roads.84…The vision system proved highly sensitive to environmental conditions—the quality of light, the location of the sun, shadows, and so on. The system worked differently from month to month, day to day, and even test to test. Sometimes it could accurately locate the edge of the road, sometimes not. The system reliably distinguished the pavement of the road from the dirt on the shoulders, but it was fooled by dirt that was tracked onto the roadway by heavy vehicles maneuvering around the ALV. In the fall, the sun, now lower in the sky, reflected brilliantly off the myriads of polished pebbles in the tarmac itself, producing glittering reflections that confused the vehicle. Shadows from trees presented problems, as did asphalt patches from the frequent road repairs made necessary by the harsh Colorado weather and the constant pounding of the eight-ton vehicle.42
…Knowledge-based systems in particular were difficult to apply outside the environment for which they had been developed. A vision system developed for autonomous navigation, for example, probably would not prove effective for an automated manufacturing assembly line. “There’s no single universal mechanism for problem solving”, Amarel would later say, “but depending on what you know about a problem, and how you represent what you know about the problem, you may use one of a number of appropriate mechanisms.”…In another major shift in emphasis, SC2 removed “machine intelligence” from its own plateau on the pyramid, subsuming it under the general heading “software”. This seemingly minor shift in nomenclature signaled a profound reconceptualization of AI, both within DARPA and throughout much of the computer community. The effervescent optimism of the early 1980s gave way to more sober appraisal. AI did not scale. In spite of impressive achievements in some fields, designers could not make systems work at a level of complexity approaching human intelligence. Machines excelled at data storage and retrieval; they lagged in judgment, learning, and complex pattern recognition.
…During SC, AI had proved unable to exploit the powerful machines developed in SC’s architectures program to achieve Kahn’s generic capability in machine intelligence. On the fine-grained level, AI, including many developments from the SC program, is ubiquitous in modern life. It inhabits everything from automobiles and consumer electronics to medical devices and instruments of the fine arts. Ironically, AI now performs miracles unimagined when SC began, though it can’t do what SC promised.
Given how people keep reaching back to the AI Winter in discussions of connectionism—I mean, deep learning—it’s interesting to contrast the two paradigms.
While working on the Wikipedia article for Lisp machines (and articles on related high-profile successes like MYCIN/DENDRAL) back in 2009, I read many journals & magazines from the 1980s, the Lisp machine heyday, and even played with a Genera OS image in a VM; the more I read about AI, the MIT AI Lab, Lisp machines, the ‘AI winter’, and so on, the more impressed I was by the operating systems & tools (such as the sophisticated hypertext documentation & text editors and capabilities of Common Lisp & its predecessors, which still put contemporary OS ecosystems on Windows/Mac/Linux to shame in many ways31) and the less I was impressed by the actual AI algorithms of the era. In contrast, with deep learning, I am increasingly unimpressed by the surrounding ecosystem of software tools (with its endless layers of buggy Python & rigid C++) the more I use it, but more and more impressed by what is possible with deep learning.
Deep learning has long ago escaped into the commercial market, indeed, is primarily driven by industry researchers at this point. The case studies are innumerable (and many are secret due to their considerable commercial value). DL handles grounding problems & raw sensory data well and indeed struggles most on problems with richly formalized structures like hierarchies/categories/directed graphs (ML practitioners currently tend to use decision tree methods like XGBoost for those), or which require using rules & logical reasoning (somewhat like humans). Perhaps most importantly from the perspective of SCI and HPC, deep learning scales: it parallelizes in a number of ways, and it can soak up indefinite amounts of computing power & data. You can train a CNN on a few hundred or thousand images usefully32, but Facebook & Google have run experiments going from millions to large datasets such as hundreds of millions and billions of images (eg Gao et al 2017, Gross et al 2017, Sun et al 2017, Mahajan et al 2018, Laanait et al 2019, Anonymous et al 2019), and the CNNs steadily improve their performance on both their assigned task and in accomplishing transfer learning33. Similarly in reinforcement learning, the richer the resources available, the richer a NN can be trained (OA chart; consider how deep Zero’s NN is compared to the original AlphaGo, or Ape-X or Impala for learning many ALE games simultaneously, or OpenAI’s 5x5 DoTA progress via essentially brute force). Even self-driving car programs which are a byword for incompetence deal just fine with all the issues that bedeviled ALV by using, well, ‘a single universal mechanism for problem solving’ (which we call CNNs, which can do anything from image segmentation to human language translation). 
These points are all the more striking as there is no sign that hardware improvements are over or that any inherent limits have been hit; even the large-scale experiments criticized as ‘boil the oceans’ projects nevertheless spend what are trivial amounts of money by both global economic and R&D criteria, like a few million dollars of GPU time. But none of this could have been done in the 1980s, or early 1990s. (As Hinton says, why didn’t connectionism work back then? Because the computers were thousands of times too slow, the datasets were thousands of times too small, and some of the neural network details like initializations & activations were broken.)
Considering all this, it’s not a surprise that the AI part of SC didn’t pan out and eventually got axed, as it should have. Sometimes the time is not ripe. Hero can invent the steam engine, but you don’t get steam engine trains until it’s steam engine train time, and the best intentions of all the bureaucrats in the world can’t affect that much. The turnover in managers and political interference may well have been enough to “disrupt the careful orchestration that its ambitious agenda required”, but this was more in the nature of shooting a dead horse. R&S seem, somewhat reluctantly, to ultimately assent to the view they critiqued at the beginning, held by the ARPA staff, that the failure of SC is more a demonstration of technological determinism than of social & political contingency, and more about the technology than people:
…Thus, for all of their agency, their story appears to be one driven by the technology. If they were unable to socially construct this technology, to maintain agency over technological choice, does it then follow that some technological imperative shaped the SC trajectory, diverting it in the end from machine intelligence to high performance computing? Institutionally, SC is best understood as an analog of the development programs for the Polaris and Atlas ballistic missiles. An elaborate structure was created to sell the program, but in practice the plan bore little resemblance to day-to-day operations. Conceptually, SC is best understood by mixing Thomas Hughes’s framework of large-scale technological systems with Giovanni Dosi’s notions of research trajectories. Its experience does not quite map on Hughes’s model because the managers could not or would not bring their reverse salients on line. It does not quite map on Dosi because the managers regularly dealt with more trajectories and more variables than Dosi anticipates in his analyses. In essence, the managers of SC were trying to research and develop a complex technological system. They succeeded in developing some components; they failed to connect them in a system. The overall program history suggests that at this level of basic or fundamental research it is best to aim for a broad range of capabilities within the technology base and leave integration to others…While the Fifth Generation program contributed significantly to Japan’s national infrastructure in computer technology, it did not vault that country past the United States…SC played an important role, but even some SC supporters have noted that the Japanese were in any event headed on the wrong trajectory even before the United States mobilized itself to meet their challenge.
…In some ways the varying records of the SC applications shed light on the program models advanced by Kahn and Cooper at the outset. Cooper believed that the applications would pull technology development; Kahn believed that the evolving technology base would reveal what applications were possible. Kahn’s appraisal looks more realistic in retrospect. It is clear that expert systems enjoyed significant success in planning applications. This made possible applications ranging from Naval Battle Management to DART. Vision did not make comparable progress, thus precluding achievement of the ambitious goals set for the ALV. Once again, the program went where the technology allowed. Some reverse salients resisted efforts to orchestrate advance of the entire field in concert. If one component in a system did not connect, the system did not connect.
In the final analysis, SC failed for want of connection.
Reading about SC furnishes an unexpected lesson about the importance of believing in Moore’s Law and having techniques which can scale. What are we doing now which won’t scale, and what waves are we paddling up instead of surfing?
Excerpts from The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine, Lesch 2006, describing Heinrich Hörlein’s drug development programs & Thomas Edison’s electrical programs as strategically aimed at “reverse salients”: necessary steps which hold back the practical application of progress in an area, and where research efforts have disproportionate payoffs by removing a bottleneck.
From pg48, “A System of Invention”, The First Miracle Drugs: How the Sulfa Drugs Transformed Medicine, Lesch 2006:
Hörlein’s attitude was based not simply, or even primarily, on the situation of any particular area of research considered in isolation, but on his comprehensive overview of advance in areas in which chemistry and biomedicine intersected. These areas shared a number of generic problems and solutions, for example, the need to isolate a substance (natural product, synthetic product, body substance) in chemically pure form, the need to synthesize the substance and to do so economically if it was to go on the market, and the need for pharmacological, chemotherapeutic, toxicological, and clinical testing of the substance. Hörlein’s efforts to translate success in certain areas (vitamin deficiency disease, chemotherapy of protozoal infections) into optimism about possibilities in other areas (cancer, antibacterial chemotherapy) was characteristic. He regarded the chemical attack on disease as a many-fronted battle in which there was a generally advancing line but also many points at which advance was slow or arrested.
In this sense, Hörlein might be said to have thought—as Thomas Hughes has shown that Edison did—in terms of reverse salients and critical problems. Reverse salients are areas of research and development that are lagging in some obvious way behind the general line of advance. Critical problems are the research questions, cast in terms of the concrete particulars of currently available knowledge and technique and of specific exemplars or models (e.g., insulin, chemotherapy of kala-azar and malaria) that are solvable and whose solutions would eliminate the reverse salients.18
- On Edison, see Thomas P. Hughes, Networks of Power: Electrification in Western Society 1880–1930 (Baltimore, MD: Johns Hopkins University Press, 1983), 18–46.
- Ibid; and Thomas P. Hughes, “The evolution of large technological systems”, in Wiebe E. Bijker, Thomas P. Hughes, and Trevor Pinch, editors, The Social Construction of Technological Systems (Cambridge, MA: The MIT Press, 1987)
…What was systemic in Hörlein’s way of thinking was his concept of the organizational pattern or patterns that will best facilitate the production of valuable results in the areas in which medicine and chemistry interact. A valuable outcome is a result that has practical importance for clinical or preventive medicine and, implicitly, commercial value for industry. Hörlein perceived a need for a set of mutually complementary institutions and trained personnel whose interaction produces the desired results. The organizational pattern that emerges more or less clearly from Hörlein’s lectures is closely associated with his view of the typical phases or cycles of development of research in chemotherapy or physiological chemistry. He saw a need for friendly and mutually supportive relations between industrial research and development organizations, academic institutions, and clinicians. He viewed the academic-industrial connection as crucial and mutually beneficial. Underlying this view was his definition and differentiation of the relevant disciplines and his belief in their generally excellent condition in Germany. He saw a need for government support of appropriate institutions, especially research institutes in universities. Within industrial research organizations—and, implicitly, within academic ones—Hörlein called for special institutional arrangements to encourage appropriate interactions between chemistry and biomedicine.
An element of crucial—and to Hörlein, personal—importance in these interactions was the role of the research manager or “team leader.” When Hörlein spoke of the research done under his direction as “our work,” he used the possessive advisedly to convey a strong sense of his own participation. The research manager had to be active in defining goals, in marshaling means and resources, and in assessing success or failure. He had to intervene where necessary to minimize friction between chemists and medical researchers, an especially important task for chemotherapy as a composite entity. He had to publicize the company’s successes—a necessity for what was ultimately a commercial enterprise—and act as liaison between company laboratories and the academic and medical communities. Through it all, he had to take a long view of the value of research, not insisting on immediate results of medical or commercial value.
As a research manager with training and experience in pharmaceutical chemistry, a lively interest in medicine, and rapport with the medical community, Hörlein was well positioned to survey the field where chemistry and medicine joined battle against disease. He could spot the points where the enemy’s line was broken, and the reverse salients in his own. What he could not do—or could not do alone—was to direct the day-to-day operations of his troops, that is, to define the critical problems to be solved, to identify the terms of their solution, and to do the work that would carry the day. In the case of chemotherapy, these things could be effected only by the medical researcher and the chemist, each working on his own domain, and cooperatively. For his attack on one of the most important reverse salients—the chemotherapy of bacterial infections—Hörlein called upon the medical researcher Domagk and the chemists Mietzsch and Klarer.
Summary by one VC of a16z investment strategy.
In a strange way, sometimes familiarity can breed contempt—and conversely, the distance from the problem that comes from having a completely different professional background might actually make one a better founder. Though not venture backed, Southwest Airlines was cofounded in 1967 by Herb Kelleher and of course has gone on to become a very successful business. When interviewed many years later about why, despite being a lawyer by training, he was the natural founder for an airline business, Kelleher quipped: “I knew nothing about airlines, which I think made me eminently qualified to start one, because what we tried to do at Southwest was get away from the traditional way that airlines had done business.”
This has historically been less typical in the venture world, but, increasingly, as entrepreneurs take on more established industries—particularly those that are regulated—bringing a view of the market that is unconstrained by previous professional experiences may in fact be a plus. We often joke at a16z that there is a tendency to “fight the last battle” in an area in which one has long-standing professional exposure; the scars from previous mistakes run too deep and can make it harder for one to develop creative ways to address the business problem at hand. Perhaps had Kelleher known intimately of all the challenges of entering the airline business, he would have run screaming from the challenge versus deciding to take on the full set of risks.
Whatever the evidence, the fundamental question VCs are trying to answer is: Why back this founder against this problem set versus waiting to see who else may come along with a better organic understanding of the problem? Can I conceive of a team better equipped to address the market needs that might walk through our doors tomorrow? If the answer is no, then this is the team to back.
The third big area of team investigation for VCs focuses on the founder’s leadership abilities. In particular, VCs are trying to determine whether this founder will be able to create a compelling story around the company mission in order to attract great engineers, executives, sales and marketing people, etc. In the same vein, the founder has to be able to attract customers to buy the product, partners to help distribute the product, and, eventually, other VCs to fund the business beyond the initial round of financing. Will the founder be able to explain her vision in a way that causes others to want to join her on this mission? And will she walk through walls when the going gets tough—which it inevitably will in nearly all startups—and simply refuse to even consider quitting?
When Marc and Ben first started Andreessen Horowitz, they described this founder leadership capability as “egomaniacal.” Their theory—notwithstanding the choice of words—was that to make the decision to be a founder (a job fraught with likely failure), an individual needed to be so confident in her abilities to succeed that she would border on being so self-absorbed as to be truly egomaniacal. As you might imagine, the use of that term in our fund-raising deck for our first fund struck a chord with a number of our potential investors, who worried that we would back insufferable founders. We ultimately chose to abandon our word choice, but the principle remains today: You have to be partly delusional to start a company given the prospects of success and the need to keep pushing forward in the wake of the constant stream of doubters.
After all, nonobvious ideas that could in fact become big businesses are by definition nonobvious. My partner Chris Dixon describes our job as VCs as investing in good ideas that look like bad ideas. If you think about the spectrum of things in which you could invest, there are good ideas that look like good ideas. These are tempting, but likely can’t generate outsize returns because they are simply too obvious and invite too much competition that squeezes out the economic rents. Bad ideas that look like bad ideas are also easily dismissed; as the description implies, they are simply bad and thus likely to be trapdoors through which your investment dollars will vanish. The tempting deals are the bad ideas that look like good ideas, yet they ultimately contain some hidden flaw that reveals their true “badness”. This leaves good VCs to invest in good ideas that look like bad ideas—hidden gems that probably take a slightly delusional or unconventional founder to pursue. For if they were obviously good ideas, they would never produce venture returns.
Thiel uses the example of ‘New France’/the Louisiana Territory, in which the projections of John Law et al that it (and thus the Mississippi Company) would be as valuable as France itself turned out to be correct—just centuries later, with the benefits redounding to the British colonies. Even the Mississippi Company worked out: “The ships that went abroad on behalf of his great company began to turn a profit. The auditor who went through the company’s books concluded that it was entirely solvent—which isn’t surprising, when you consider that the lands it owned in America now produce trillions of dollars in economic value.” One could also say the same thing of China: countless European observers forecast that China was a ‘sleeping giant’ which, once it industrialized & modernized, would again be a global power. They were correct, but many of them would be surprised & disappointed how long it took.↩︎
Nathan Myhrvold’s patent troll company Intellectual Ventures is also featured in Malcolm Gladwell’s essay on multiple invention, “In the Air: Who says big ideas are rare?”; IV’s business model is to spew out patents for speculations that other people will then actually invent, who can then be extorted for license fees when they make the inventions work in the real world & produce value. (This is assisted by the fact that patents no longer require even the pretense of a working model.) As Bill Gates says, “I can give you fifty examples of ideas they’ve had where, if you take just one of them, you’d have a startup company right there.” Indeed—that this model works demonstrates the commonness of ‘multiples’, the worthlessness of ideas, and the moral bankruptcy of the current patent system.↩︎
At the margin, compared to other competitors in the VR space, like Valve’s concurrent efforts, and everything that the Rift built on, did Luckey and co really create ~$2.83b of new value? Or were they lucky in trying at the right time, and merely captured all of the value, because a 99% adequate VR headset is worth 0%, and they added the final 1%? If the latter, how could IP or economics be fixed to link intermediate contributions more closely to the final result, approaching a fairer distribution like the Shapley value, rather than commoditizing those contributions and yielding last-mover winner-take-all dynamics?↩︎
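To make the Shapley value concrete, here is a minimal sketch of a toy game in which a headset is worth something only once every component is present. The three contributor names and the payoff of 100 are hypothetical, chosen purely for illustration; nothing here models the actual Rift. The Shapley value averages each contributor's marginal contribution over every possible order of arrival, so no one is rewarded merely for arriving last.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining this coalition.
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


# Hypothetical contributors to a finished product (illustrative names only).
PLAYERS = ("displays", "tracking", "integration")


def headset_value(coalition):
    """Toy payoff: worth 100 only when every contribution is present, else 0."""
    return 100 if set(coalition) == set(PLAYERS) else 0


shares = shapley_values(PLAYERS, headset_value)
for player, share in shares.items():
    print(player, share)
```

In this toy game each contributor receives an equal third, whereas under last-mover dynamics whoever supplies the final piece captures the full 100.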
Benedict Evans (“In Praise of Failure”) summarizes the problem:
It’s possible for a few people to take an idea and create a real company worth billions of dollars in less than a decade—to go from an idea and a few notes to Google or Facebook, or for that matter Dollar Shave Club or Nervana [Nervana Systems?]. It’s possible for entrepreneurs to create something with huge impact.
But equally, anything with that much potential has a high likelihood of failure—if it was obviously a good idea with no risks, everyone would be doing it. Indeed, it’s inherent in really transformative ideas that they look like bad ideas—Google, Apple, Facebook and Amazon all did, sometimes several times over. In hindsight the things that worked look like good ideas and the ones that failed look stupid, but sadly it’s not that obvious at the time. Rather, this is how the process of invention and creation works. We try things—we try to create companies, products and ideas, and sometimes they work, and sometimes they change the world. And so, we see, in our world around half such attempts fail completely, and 5% or so go to the moon.
It’s worth noting that ‘looks like a bad idea’ is flexible here: I emphasize that many good ideas look like bad ideas because they’ve been tried before & failed, but many others look bad because a necessary change hasn’t yet happened or people underestimate existing technology.↩︎
Where there is, as Musk describes it, a “graveyard of companies” like Coda Automotive or Fisker Automotive. It may be relevant to note that Musk did not found Tesla; the two co-founders ultimately quit the company.↩︎
As late as 2007–2008, Blockbuster could have still beaten Netflix, as its “Total Access” program demonstrated, but CEO changes scuppered its last chance. Incidentally, this offers an example of why stock markets are fine with paying executives so much: a good executive can create—or destroy—the entire company. If Blockbuster’s CEO had paid a pittance ~2000 to acquihire Netflix & put Reed Hastings in charge, or if it had simply stuck with its CEO in 2007 to strangle Netflix with Total Access, its shareholders would be far better off now. But it didn’t.↩︎
Finding out these tidbits is one reason I enjoyed reading Founders at Work: Stories of Startups’ Early Days (ed Livingston 2009; “Introduction”), because the challenges are not always what you think they are. PayPal’s major challenge, for example, was not finding a market like eBay power sellers, but coping with fraud as they scaled, which apparently was the undoing of any number of rivals.↩︎
Personally, I was still using Dogpile until at least 2000.↩︎
From Frock 2006, Changing How the World Does Business: Fedex’s Incredible Journey to Success, in 1973:
On several occasions, we came within an inch of failure, because of dwindling financial resources, regulatory roadblocks, or unforeseen events like the Arab oil embargo. Once, Fred’s luck at the gaming tables of Las Vegas helped to save the company from financial disaster. Another time, we had to ask our employees to hold their paychecks while we waited for the next wave of financing…Fred dumped his entire inheritance into the company and was full speed ahead without concern for his personal finances.
…The loan guarantee from General Dynamics raised our hopes and increased our spirits, but also increased the pressure to finalize the private placement. We continued to be in desperate financial trouble, particularly with our suppliers. The most demanding suppliers when it came to payments were the oil companies. Every Monday, they required Federal Express to prepay for the anticipated weekly usage of jet fuel. By mid-July our funds were so meager that on Friday we were down to about $20,296 in the checking account, while we needed $97,421 for the jet fuel payment. I was still commuting to Connecticut on the weekends and really did not know what was going to transpire on my return.
However, when I arrived back in Memphis on Monday morning, much to my surprise, the bank balance stood at nearly $129,895. I asked Fred where the funds had come from, and he responded, “The meeting with the General Dynamics board was a bust and I knew we needed money for Monday, so I took a plane to Las Vegas and won $109,599.” I said, “You mean you took our last $20,296—how could you do that?” He shrugged his shoulders and said, “What difference did it make? Without the funds for the fuel companies, we couldn’t have flown anyway.” Fred’s luck held again. It was not much but it came at a critical time and kept us in business for another week.
This also illustrates the ex post & fine line between ‘visionary founder’ & ‘criminal con artist’; had Frederick W. Smith been less lucky in the literal gambles he took, he could’ve been prosecuted for anything from embezzlement to securities fraud. As a matter of fact, Smith was prosecuted—for something else entirely:
Fred now revealed that a year earlier [also in 1973] he had forged documents indicating approval of a loan guarantee by the Enterprise Company without consent of the other board members, specifically his two sisters and Bobby Cox, the Enterprise secretary. Our respected leader admitted his culpability to the Federal Express board of directors and to the investors and lenders we were counting on to support the second round of the private placement financing. While it is possible to understand that, under extreme pressure, Fred was acting to save Federal Express from almost certain bankruptcy, and even to empathize with what he did, it nevertheless appeared to be a serious breach of conduct…December 1975 was also the month that settled the matter of the forged loan guarantee documents for the Union Bank. At his trial, Fred testified that as president of the Enterprise board and with supporting letters from his sisters, he had authority to commit the board. After 10 hours of deliberation, he was acquitted. If convicted, he would have faced a prison term of up to five years.
Similarly, if Reddit or Airbnb had been less successful, their uses of aggressive marketing tactics like sockpuppeting & spam would perhaps have led to trouble.↩︎
To borrow a phrase from Kelly:
The electric incandescent lightbulb was invented, reinvented, coinvented, or “first invented” dozens of times. In their book Edison’s Electric Light: Biography of an Invention, Robert Friedel, Paul Israel, and Bernard Finn list 23 inventors of incandescent bulbs prior to Edison. It might be fairer to say that Edison was the very last “first” inventor of the electric light. These 23 bulbs (each an original in its inventor’s eyes) varied tremendously in how they fleshed out the abstraction of “electric lightbulb.” Different inventors employed various shapes for the filament, different materials for the wires, different strengths of electricity, different plans for the bases. Yet they all seemed to be independently aiming for the same archetypal design. We can think of the prototypes as 23 different attempts to describe the inevitable generic lightbulb.
This happens even in literature: Doyle’s Sherlock Holmes stories weren’t the first to invent “clues”, but the last (Moretti 2000, Moretti 2005, Batuman 2005), with other detective fiction writers doing things that can only be called ‘grotesque’; Moretti, baffled, recounts that “one detective, having deduced that ‘the drug is in the third cup of coffee’, proceeds to drink the coffee.”
“Kafka And His Precursors”, Borges 1951:
At one time I considered writing a study of Kafka’s precursors. I had thought, at first, that he was as unique as the phoenix of rhetorical praise; after spending a little time with him, I felt I could recognize his voice, or his habits, in the texts of various literatures and various ages…If I am not mistaken, the heterogeneous pieces I have listed resemble Kafka; if I am not mistaken, not all of them resemble each other. This last fact is what is most significant. Kafka’s idiosyncrasy is present in each of these writings, to a greater or lesser degree, but if Kafka had not written, we would not perceive it; that is to say, it would not exist. The poem “Fears and Scruples” by Robert Browning prophesies the work of Kafka, but our reading of Kafka noticeably refines and diverts our reading of the poem. Browning did not read it as we read it now. The word “precursor” is indispensable to the vocabulary of criticism, but one must try to purify it from any connotation of polemic or rivalry. The fact is that each writer creates his precursors. His work modifies our conception of the past, as it will modify the future. In this correlation, the identity or plurality of men doesn’t matter. The first Kafka of “Betrachtung” is less a precursor of the Kafka of the gloomy myths and terrifying institutions than is Browning or Lord Dunsany.
See also “An Oral History of Nintendo’s Power Glove”, and Polygon’s oral history of the Kinect, “All the money in the world couldn’t make Kinect happen: For a moment a decade ago, the game industry looked like a very different place”.↩︎
“Integrated Cognitive Systems”, Michie 1970 (pg93–96 of Michie, On Machine Intelligence):
How long is it likely to be before a machine can be developed approximating to adult human standards of intellectual performance? In a recent poll, thirty-five out of forty-two people engaged in this sort of research gave estimates between ten and one hundred years. [8: European AISB Newsletter, no. 9, 4 (1969)] There is also fair agreement that the chief obstacles are not hardware limitations. The speed of light imposes theoretical bounds on rates of information transfer, so that it was once reasonable to wonder whether these limits, in conjunction with physical limits to microminiaturization of switching and conducting elements, might give the biological system an irreducible advantage. But recent estimates [9, 10], which are summarized in Tables 7.1 and 7.2, indicate that this is not so, and that the balance of advantage in terms of sheer information-handling power may eventually lie with the computer rather than the brain. It seems a reasonable guess that the bottleneck will never again lie in hardware speeds and storage capacities, as opposed to purely logical and programming problems. Granted that an ICS can be developed, is now the right time to mount the effort?
Yet the principle of ‘unripe time’, distilled by F. M. Cornford more than half a century ago from the changeless stream of Cambridge academic life, has provided the epitaph of more than one premature technology. The aeroplane industry cannot now redeem Daedalus nor can the computer industry recover the money spent by the British Admiralty more than a hundred years ago in support of Charles Babbage and his calculating machine. Although Babbage was one of Britain’s great innovative geniuses, support of his work was wasted money in terms of tangible return on investment. It is now appreciated that of the factors needed to make the stored-program digital computer a technological reality only one was missing: the means to construct fast switching elements. The greater part of a century had to elapse before the vacuum tube arrived on the scene.
Translation, Katsuki Sekida, Two Zen Classics: The Gateless Gate and The Blue Cliff Records, 2005.↩︎
Which as a side note is wrong; compiled predictions actually indicate that AI researcher forecasts, while varying anywhere from a decade to centuries, typically cluster around 20 years in the future regardless of researcher age. For a recent timeline survey, see “Forecasting Transformative AI: An Expert Survey”, Gruetzemacher et al 2019, and for more, AI Impacts.org. (One wonders if a 20-year forecast might be driven by anthropics: in an exponentially-growing field, most researchers will be present in the final ‘generation’, and so a priori one could predict accurately that it will be 20 years to AI. In this regard, it is amusing to note the exponential growth of conferences like NIPS or ICML 2010–2019.)↩︎
…A further application of criterion 4 arises if theoretical infeasibility is demonstrated…But it is well to look on such negative proofs with caution. The possibility of broadcasting radio waves across the Atlantic was convincingly excluded by theoretical analysis. This did not deter Marconi from the attempt, even though he was as unaware of the existence of the Heaviside layer as everyone else.
It can reasonably be said that time was unripe for digital computing as an industrial technology. But it is by no means obvious that it was unripe for Babbage’s research and development effort, if only it had been conceived in terms of a more severely delimited objective: the construction of a working model. Such a device would not have been aimed at the then unattainable goal of economic viability; but its successful demonstration might, just conceivably, have greatly accelerated matters when the time was finally ripe. Vacuum tube technology was first exploited for high-speed digital computing in Britain during the Second World War. But it was left to Eckert and Mauchly several years later to rediscover and implement the conceptions of stored programs and conditional jumps, which had already been present in Babbage’s analytical engine. Only then could the new technology claim to have drawn level with Babbage’s design ideas of a hundred years earlier.
A kind of definition of Value of Information:
If you do not work on an important problem, it’s unlikely you’ll do important work. It’s perfectly obvious. Great scientists have thought through, in a careful way, a number of important problems in their field, and they keep an eye on wondering how to attack them. Let me warn you, ‘important problem’ must be phrased carefully. The three outstanding problems in physics, in a certain sense, were never worked on while I was at Bell Labs. By important I mean guaranteed a Nobel Prize and any sum of money you want to mention. We didn’t work on (1) time travel, (2) teleportation, and (3) antigravity. They are not important problems because we do not have an attack. It’s not the consequence that makes a problem important, it is that you have a reasonable attack. That is what makes a problem important.
BOYDEN: …One idea is, how do we find the diamonds in the rough, the big ideas but they’re kind of hidden in plain sight? I think we see this a lot. Machine learning, deep learning, is one of the hot topics of our time, but a lot of the math was worked out decades ago—backpropagation, for example, in the 1980s and 1990s. What has changed since then is, no doubt, some improvements in the mathematics, but largely, I think we’d all agree, better compute power and a lot more data.
So how could we find the treasure that’s hiding in plain sight? One of the ideas is to have sort of a SWAT team of people who go around looking for how to connect the dots all day long in these serendipitous ways.
COWEN: Two last questions. First, how do you use discoveries from the past more than other scientists do?
BOYDEN: One way to think of it is that, if a scientific topic is really popular and everybody’s doing it, then I don’t need to be part of that. What’s the benefit of being the 100,000th person working on something?
So I read a lot of old papers. I read a lot of things that might be forgotten because I think that there’s a lot of treasure hiding in plain sight. As we discussed earlier, optogenetics and expansion microscopy both begin from papers from other fields, some of which are quite old and which mostly had been ignored by other people.
I sometimes practice what I call ‘failure rebooting’. We tried something, or somebody else tried something, and it didn’t work. But you know what? Something happened that made the world different. Maybe somebody found a new gene. Maybe computers are faster. Maybe some other discovery from left field has changed how we think about things. And you know what? That old failed idea might be ready for prime time.
With optogenetics, people were trying to control brain cells with light going back to 1971. I was actually reading some earlier papers. There were people playing around with controlling brain cells with light going back to the 1940s. What is different? Well, this class of molecules that we put into neurons hadn’t been discovered yet.
“Was Moore’s Law Inevitable?”, Kevin Kelly again:
Listen to the technology, Carver Mead says. What do the curves say? Imagine it is 1965. You’ve seen the curves Gordon Moore discovered. What if you believed the story they were trying to tell us: that each year, as sure as winter follows summer, and day follows night, computers would get half again better, and half again smaller, and half again cheaper, year after year, and that in 5 decades they would be 30 million times more powerful than they were then, and cheap. If you were sure of that back then, or even mostly persuaded, and if a lot of others were as well, what good fortune you could have harvested. You would have needed no other prophecies, no other predictions, no other details. Just knowing that single trajectory of Moore’s, and none other, we would have educated differently, invested differently, prepared more wisely to grasp the amazing powers it would sprout.
It’s not enough to theorize about the possibility or prototype something in the lab if there is then no followup. The motivation to take something into the ‘real world’, which necessarily requires attacking the reverse salients, may be part of why corporate & academic research are both necessary; too little of either creates a bottleneck. A place like Bell Labs benefits from remaining in contact with the needs of commerce, as it provides a check on l’art pour l’art pathologies, a fertile source of problems, and can feed back the benefits of mass production/experience curves. (Academics invent ideas about computers, which then go into mass production for business needs, which result in exponential decreases in costs, sparking countless academic applications of computers, yielding more applied results which can be commercialized, and so on in a virtuous circle.) In recent times, corporate research has diminished, and that may be a bad thing: “The changing structure of American innovation: Some cautionary remarks for economic growth”, Arora et al 2020.↩︎
One might appeal to the Kelly criterion as a guide to how much individuals should wager on experiments, since the Kelly criterion gives optimal growth of wealth over the long-term while avoiding gambler’s ruin, but given the extremely small number of ‘wagers’ an individual engages in, with a highly finite horizon, the Kelly criterion’s assumptions are far from satisfied, and the true optimal strategy can be radically different from a naive Kelly criterion; I explore this difference more in “The Kelly Coin-flipping Game”, which is motivated by stock-market investing.↩︎
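For reference, the naive Kelly criterion for a simple binary bet can be sketched as follows. This is only the textbook long-run formula, f* = p − (1 − p)/b, which is exactly the version whose assumptions are questioned above; it is not the finite-horizon optimal strategy. The win probability and odds are hypothetical numbers chosen for illustration.

```python
import math


def kelly_fraction(p, b):
    """Naive Kelly stake for win probability p and net odds b (win b per 1 staked)."""
    return max(0.0, p - (1.0 - p) / b)


def expected_log_growth(f, p, b):
    """Expected log-wealth growth per bet when staking a fraction f of wealth."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)


# Hypothetical wager: 60% chance of winning at even odds.
p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)

# Sanity check: the formula should match the best stake found by brute force.
grid = [i / 100 for i in range(0, 100)]
best_on_grid = max(grid, key=lambda f: expected_log_growth(f, p, b))

print(f"Kelly fraction: {f_star:.2f}, best on grid: {best_on_grid:.2f}")
```

The formula maximizes expected log growth per bet, a guarantee that is only meaningful over many repeated wagers, which is why it can mislead someone who will only ever make a handful of them.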
Thompson sampling, incidentally, has been rediscovered↩︎
PSRL (posterior sampling, see also Ghavamzadeh et al 2016) generalizes Thompson sampling to more complex problems, MDPs or POMDPs in general: at each iteration, it assumes an entire collection or distribution of possible environments which are more complex than a single-step bandit, picks an environment at random based on its probability of being the real environment, finds the optimal actions for that one, and then acts on that solution; this does the same thing in smoothly balancing exploration with exploitation. Normal PSRL requires ‘episodes’, which don’t really have a real-world equivalent, but PSRL can be extended to handle continuous action—a nice example is deterministic schedule posterior sampling reinforcement learning (DS-PSRL), which does ‘back off’ in periodically stopping, and re-evaluating the optimal strategy based on accumulated evidence, but less & less often, so it does PSRL over increasingly large time windows.↩︎
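The single-step bandit case that PSRL generalizes can be sketched in a few lines. This is a minimal illustration of Thompson sampling on a Bernoulli bandit with Beta posteriors, not PSRL itself; the three success probabilities, the round count, and the uniform Beta(1, 1) prior are all assumptions made for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return per-arm successes and failures."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(n_rounds):
        # Draw one plausible payoff rate per arm from its posterior,
        # starting from a uniform Beta(1, 1) prior (an assumption).
        draws = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        # Act as if the sampled world were the real one: pull its best arm.
        arm = max(range(len(draws)), key=draws.__getitem__)
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


# Hypothetical arms with unknown (to the agent) success rates.
successes, failures = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```

Arms whose posteriors are still wide keep getting sampled occasionally, while pulls concentrate on the apparent best arm as evidence accumulates; no explicit exploration schedule is needed.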
Polarizing here could reflect a wide posterior value distribution, or if the posterior is being approximated by something like a mixture of experts or an ensemble of multiple models (like running multiple passes over a dropout-trained neural network, or a bootstrapped neural network ensemble). In a human setting, it might be polarizing in the sense of human peer-reviewers arguing the most about it, or having the least inter-rater agreement or highest variance of ratings.
As Goldstein & Kearney 2017 describe their analysis of the numerical peer reviewer ratings of ARPA-E proposals:
In other words, ARPA-E PDs tend to fund proposals on which reviewers disagree, given the same mean overall score. When minimum and maximum score are included in the same model, the coefficient on minimum score disappears. This suggests that ARPA-E PDs are more likely to select proposals that were highly-rated by at least one reviewer, but they are not deterred by the presence of a low rating. This trend persists when median score is included (Model 7 in Table 3). ARPA-E PDs tend to agree with the bulk of reviewers, and they also tend to agree with scores in the upper tail of the distribution. They use their discretion to surface proposals that have at least one champion, regardless of whether there are any detractors…The results show that there is greater ex ante uncertainty in the ARPA-E research portfolio compared to proposals with the highest mean scores (Model 1).
The different pathologies might be: small ones will collectively try lots of strange or novel ideas but will fail by running underpowered poorly-done experiments (for lack of funding & expertise) which convince no one, suffer from small-study biases, and merely pollute the literature, giving meta-analysts migraines. Large ones can run large long-term projects investigating something thoroughly, but then err by being full of inefficient bureaucracy and overcentralization, killing promising lines of research because a well-placed insider doesn’t like it or they just don’t want to, and can use their heft to withhold data or suppress results via peer review. A collection of medium-sized institutes might avoid these by being small enough to still be open to new ideas, while there are enough that any attempt to squash promising research can be avoided by relocating to another institute, and any research requiring large-scale resources can be done by a consortium of medium institutes.
Modern genomics strikes me as a bit like this. Candidate-gene studies were done by every Tom, Dick, and Harry, but the methodology failed completely because sample sizes many orders of magnitude larger were necessary. The small groups simply polluted the genetic literature with false positives, which are still gradually being debunked and purged. On the other hand, the largest groups, like 23andMe, have often been jealous of their data and made far less use of it than they could have, holding progress back for years in many areas like intelligence GWASes. The UK Biobank has produced an amazing amount of research for a large group, but is the exception that proves the rule: their openness to researchers is (sadly) extraordinarily unusual. Much progress has come from groups like SSGAC or PGC, which are consortiums of groups of all sizes (with some highly conditional participation from 23andMe).↩︎
Ironically, as I write this in 2018, DARPA has recently announced another attempt at “silicon compilers”, presumably sparked by commodity chips topping out and ASICs being required, which I can only summarize as “Verilog but let’s do it sanely this time and with FLOSS rather than a crazy tragedy-of-the-anticommons proprietary ecosystem of crap”.↩︎
Specifically, contemporary computers don’t use the dense grid of 1-bit processors with local memory which characterized the CM. They do increasingly feature thousands of ‘processor’ equivalents in the form of CPU cores and GPU cores, but those are all far more powerful than a CM CPU node. But we might yet see some convergence with the CM thanks to neural networks: neural networks are typically trained with wastefully precise floating point operations, slowing them down, thus the rise of ‘tensor cores’ and ‘TPUs’ using lower precision, like 8-bit integers, and it is possible to discretize neural nets all the way down to binary weights. This offers a lot of potential electricity savings, and if you have binary weights, why not binary computing elements as well…?↩︎
People tend to ignore this, but CNNs can work with a few hundred or even just one or two images, using transfer learning, few-shot learning, and aggressive regularization like data augmentation.↩︎
While the accuracy rates may increase by what looks like a tiny amount, and one might ask how important a change from 99% to 99.9% accuracy is, the large-scale training papers demonstrate that neural nets continue to learn hidden knowledge from the additional data which provide ever better semantic features which can be reused elsewhere.↩︎