Review of Roland & Shiman 2002, a history of a decade of ARPA/DARPA involvement in AI and supercomputing, and of the ARPA philosophy of technological acceleration; the effort yielded mixed results, perhaps due to ultimately insurmountable bottlenecks: the time was not yet ripe for many goals.
2018-07-04–2021-05-06 finished certainty: likely importance: 6
Review of DARPA history book, Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983–1993, Roland & Shiman 2002, which describes a large-scale DARPA effort to jumpstart real-world uses of AI in the 1980s by a multi-pronged research effort into more efficient computer chip R&D, supercomputing, robotics/self-driving cars, & expert system software. Roland & Shiman 2002 particularly focus on the various ‘philosophies’ of technological forecasting & development which guided DARPA’s strategy in different periods. They ultimately endorse a weak technological determinism: the bottlenecks are too large for a small organization (small in comparison to the global economy & global R&D) to break by itself. The best DARPA can hope for is therefore a largely agnostic & reactive strategy in which granters ‘surf’ technological changes. They rapidly exploit new technology while investing their limited funds in targeted research that patches up any gaps or lags which happen to open up and block broader applications. (For broader discussion of progress, see “Lessons from the Media Lab” & Bakewell.)
While reading “Funding Breakthrough Research: Promises and Challenges of the ‘ARPA Model’”, Azoulay et al 2018, on DARPA, I noticed an interesting comment:
In this paper, we propose that the key elements of the ARPA model for research funding are: organizational flexibility on an administrative level, and substantial authority given to program directors to design programs, select projects and actively manage projects. We identify the ARPA model’s domain as mission-oriented research on nascent S-curves within an inefficient innovation system.
…Despite a great deal of commentary on DARPA, lack of access to internal archival data has hampered efforts to study it empirically. One notable exception is the work of Roland & Shiman 2002,2 who offer an industrial history of DARPA’s effort to develop machine intelligence under the “Strategic Computing Initiative” [SCI]. They emphasize both the agency’s positioning in the research ecosystem—carrying military ideas to proof of concept that would be otherwise neglected—as well as the program managers’ role as “connectors” in that ecosystem. Roland and Shiman are to our knowledge the only academic researchers ever to receive internal access to DARPA’s archives. Recent work by Goldstein and Kearney (2018a) on ARPA-E is to-date the only quantitative analysis using internal program data from an ARPA agency. [For insights into this painful process, see the preface of Roland & Shiman 2002.]
The two Goldstein & Kearney 2018 papers sounded interesting but alas, are listed as “manuscript under review”/
The preface explains the odd footnote: while they may have had some access to internal archival data, they had much less access than they requested, DARPA was not enthusiastic about it, and DARPA eventually canceled their book contract (they published anyway). This leads to an… interesting preface. You don’t often hear historians writing commissioned official histories describe their access as a “mixed blessing” and say things like “they never lied to us, as best as we can tell”, they just “simply could not understand why we wanted to see the materials we requested”, or recount that their “requests for access to these [emails] were most often met with laughter”, noting that “We were never explicitly denied access to records controlled by DARPA; we just never gained complete access.” Frustrated, they
…then asked if they could identify any document in the SC program that spoke the truth, that could be accepted at face value. They [ARPA interviewees] found this an intriguing question. They could not think of a single such document. All documents, in their view, distorted reality one way or another—always in pursuit of some greater good.
In one anecdote from the interviews, Lynn Conway shows up with a stack of internal DARPA documents, states that an NDA prevents her from talking about them (as if anyone cared about NDAs from decades before), and refuses to show any of the documents to the interviewer, leaving me rather bemused—why bother? (Although in this case, it may just be that Conway is a jerk—one might remember her from helping try to frame Michael Bailey for sexual abuse.) I was reminded a little of Carter Scholz’s novel Radiance (also from 2002), which touches on SDI and indirectly on SCI.
The book itself doesn’t seem to have suffered too badly for the birth pangs. It’s an overview of the birth and death of the SCI, organized in chunks by manager. The division by manager is not an accident. R&S comment deprecatingly on DARPA personnel’s focus on the technology and on their reluctance to have the historians “talk about people and politics”, and they invoke the strawman of “technological determinists”. They seem to adopt the common historian pose that a sophisticated historian focuses on people, and that it is naive & unsophisticated to invoke objective constraints of science & technology & physics. This is wrong in the context of SCI, as their in-depth recounting will eventually make clear. The people did not have much to do with the failures. Stuff like gallium arsenide or expert systems or autonomous robots didn’t work out because they don’t work, or are hard, or require computing power unavailable at the time. It was not because some bureaucrat made a bad naming choice or ran afoul of the wrong Senator. People don’t matter to something like Moore’s law. Man proposes but Nature disposes: you can fake medicine or psychology easily, but it’s harder to fake a robot not running into trees. Fortunately, for all the time R&S spend on project managers shuffling around acronyms, they still devote adequate space to the actual science & technology and do a good job of it.
So what was SCI? It was a 1983–1993 add-on to ARPA’s existing funding programs. The spectre of Japan’s Fifth Generation Project was used to lobby Congress for additional R&D funding. That funding would be devoted to a cluster of interconnected technological opportunities ARPA spied on the US horizon, to push them forward simultaneously and break the logjams. (As always, “funding comes from the threat”, though many were highly skeptical that Fifth Generation would go anywhere, or that its intended goals were much of a threat; many of those goals were simply to work around flaws in Japanese language handling. Most Western evaluations describe it as a failure, or at least not a notably productive R&D investment.) The systems included gallium arsenide chips to replace silicon’s poor thermal/
The project implementation followed ARPA’s existing loose oversight paradigm, where traveling project managers were empowered to dispense grants to applicants on their own authority, depending primarily on their own good taste to match talented researchers with ripe opportunities, with bureaucracy limited to meeting with the grantees semi-annually or annually for progress reports & evaluation, often in groups so as to let researchers test each other’s mettle & form social ties. (“ARPA program managers like to repeat the quip that they are 75 entrepreneurs held together by a common travel agent.”) An ARPA PM would humbly ‘surf’ the cutting-edge, going with the waves rather than swimming upstream, so to speak, to follow growing trends while cutting their losses on dead ends, to bring things through the ‘valley of death’ between lab prototype and the real world:
Steven Squires, who rose from program manager to be Chief Scientist of SC and then director of its parent office, sought orders-of-magnitude increases in computing power through parallel connection of processors. He envisioned research as a continuum. Instead of point solutions, single technologies to serve a given objective, he sought multiple implementations of related technologies, an array of capabilities from which users could connect different possibilities to create the best solution for their particular problem. He called it “gray coding”. Research moved not from the white of ignorance to the black of revelation, but rather it inched along a trajectory stepping incrementally from one shade of gray to another. His research map was not a quantum leap into the unknown but a rational process of connecting the dots between here and there. These and other DARPA managers attempted to orchestrate the advancement of an entire suite of technologies. The desideratum of their symphony was connection. They perceived that research had to mirror technology. If the system components were to be connected, then the researchers had to be connected. If the system was to connect to its environment, then the researchers had to be connected to the users. Not everyone in SC shared these insights, but the founders did, and they attempted to instill this ethos in the program.
Done wrong, of course, this results in a corrupt slush fund doling out R&D funds to an incestuous network of grantees for technologies always just on the horizon, whose failure is always excused by the claim that high-risk research often won’t work out. Alternatively, it results in elaborate systems trying to do too many things and collapsing under the weight of many advanced half-debugged systems chaotically interacting (eg ILLIAC IV). Conceived in scientific sin, born of blue-uniform bureaucracy, and midwifed by conniving committees, SCI might not seem to have had great prospects.
So, did SCI work out? The answer is a definite, unqualified—maybe:
At the end of their decade, 1983–1993, the connection failed. SC never achieved the machine intelligence it had promised. It did, however, achieve some remarkable technological successes. And the program leaders and researchers learned as much from their failures as from their triumphs. They abandoned the weak components in their system and reconfigured the strong ones. They called the new system “high performance computing”. Under this new rubric they continued the campaign to improve computing systems. “Grand challenges” replaced the former goal, machine intelligence; but the strategy and even the tactics remained the same.
The end of SCI coincided with (and partially caused) the “AI Winter”, but SCI went beyond just the Lisp machine & expert system software companies we associate with the AI winter. Of the systems, some worked out, others were good ideas but the time wasn’t ripe in an unforeseeable way and have been maturing ever since, some have poked along in a kind of permanent stasis (not dead but not alive either), others were dead ends but dead ends in important ways, and some are plain dead. In order, one might list: parallel commodity processors and rapid development of large silicon chips via a subsidized foundry, the autonomous cars/
Pining for the fjords: super-fast superconducting Josephson junctions were rapidly abandoned before becoming officially part of SCI research, while gallium arsenide suffered a similar fate. At the time, both were exciting, and Cray Computers infamously bet big on the Cray 3 achieving its OOM improvement in part with gallium arsenide chips. But somehow gallium arsenide never quite worked out or replaced silicon, and it remains in a small niche. (I doubt it was SCI’s fault, since gallium arsenide has had 2 decades since, and there’s been a ton of commercial incentive to find a replacement for silicon as it gets ever harder to shrink silicon nodes.)
Important failures: autonomous vehicles and generalized AI systems represent an interesting intermediate case: the funded vehicles, like the work at CMU, were useless—expensive, slow, trivially confused by slight differences in roads or scenery, unable to cope in realtime with more than monochrome images with pitiful resolutions like 640x640px or smaller because the computer vision algorithms were too computationally demanding, and the development bogged down by endless tweaks and hacking with regular regressions in capability. But these research programs and demos were direct ancestors of the DARPA Grand Challenge, which itself kickstarted the current self-driving car boom a decade ago. ARPA and the military didn’t get the exciting vehicles promised by the early ’90s, but they do now have autonomous cars and especially drones, and it’s amazing to think that Google Waymo cars are wandering around Arizona now regularly picking up and dropping off riders without a single fatality or major injury after millions of miles. As far as I can tell, Waymo wouldn’t exist now without the DARPA Grand Challenge, and it seems possible that DARPA was encouraged by the mixed success of the SCI vehicles, so that’s an interesting case of potential success albeit delayed. (But then, we do expect that with technology—Amara’s law.)
Parallel computers: Thinking Machines benefited a lot from SCI as did other parallel computing projects, and while TM did fail and the computers we use now don’t resemble the Connection Machine at all2, the field of parallel processing was proven out (ie. systems with thousands of weak CPUs could be successfully built, programmed, realize OOM performance gains, and commercially sold); I’d noticed once that a lot of parallel computing architectures we use now seemed to stem from an efflorescence in the 1980s, but it was only while reading R&S and noting all the familiar names that I realized that that was not a coincidence because many of them were ARPA-funded at this time. Even without R&S noting that the parallel computing was successfully rolled over into “HPC”, SCI’s investment into parallel computing was a big success.
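Footnote 2 speculates that binarized neural nets might bring back something like the CM’s 1-bit processing elements. To make that concrete, here is my own sketch; it is not from R&S and does not reflect the CM’s actual instruction set. It shows that a dot product between two vectors of ±1 weights reduces to an XOR plus a bit count, the sort of operation cheap 1-bit hardware handles well. The bit encoding (+1 stored as 1, -1 as 0) is an assumption of the sketch.

```python
def pack(signs):
    """Pack a sequence of +1/-1 values into an integer bitmask.

    Assumed encoding: +1 -> bit set, -1 -> bit clear; element i maps to bit i.
    """
    bits = 0
    for i, s in enumerate(signs):
        if s not in (1, -1):
            raise ValueError("expected only +1/-1 values")
        if s == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +/-1 vectors stored as bitmasks.

    Matching positions contribute +1 and mismatches -1, so the result is
    n - 2 * (number of mismatches); mismatches are the set bits of a XOR b.
    """
    mask = (1 << n) - 1
    mismatches = bin((a_bits ^ b_bits) & mask).count("1")
    return n - 2 * mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
print(binary_dot(pack(a), pack(b), len(a)))  # prints 2
```

The same answer comes from `sum(x * y for x, y in zip(a, b))`. The bitwise version, however, replaces multiplications with a single XOR and popcount over a whole machine word at a time.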
A successful adjunct to the parallel computing was an interesting program I’d never heard of before: MOSIS. MOSIS was essentially a government-subsidized chip foundry, competitive with commercial chip foundries. It would accept student & researcher submissions of VLSI chip designs like CPUs or ASICs and make physical chips in combined batches to save costs. Anyone with interesting new ideas could email in a design and get back, within 2 months, a real live chip for a few hundred dollars. MOSIS made the chips cheaply and quickly, quality-checked them, assured designers of privacy, and ran thousands of projects a year (peaking at 1880 in 1989). This is quite a cool program to run and must have been a godsend, especially for anyone trying to make custom chips for parallel projects. (“SC also supported BBN’s Butterfly parallel processor, Charles Seitz’s Hypercube and Cosmic Cube at CalTech, Columbia’s Non-Von, and the CalTech Tree Machine. It supported an entire newcomer as well, Danny Hillis’s Connection Machine, coming out of MIT.47 All of these projects used MOSIS services to move their design ideas into experimental chips.”) It was involved in early GPU work (Clark’s Geometry Engine) and RISC designs like MIPS and even oddities like systolic array chips/
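The savings from combining many designs into one batch come from sharing a fabrication run’s large fixed costs. Below is a toy model of that cost sharing. The area-proportional split and all dollar figures are my assumptions for illustration, not MOSIS’s actual pricing formula.

```python
def shared_run_costs(fixed_run_cost, cost_per_mm2, areas_mm2):
    """Split one fabrication run's fixed cost across designs in proportion to die area.

    fixed_run_cost: hypothetical lump sum paid once per run, however many designs share it.
    cost_per_mm2:   hypothetical marginal cost per mm^2 of silicon used.
    areas_mm2:      die area of each design sharing the run.
    """
    total_area = sum(areas_mm2)
    return [fixed_run_cost * a / total_area + cost_per_mm2 * a for a in areas_mm2]


# Made-up numbers: one design alone bears the whole fixed cost...
print(shared_run_costs(40_000, 20, [2]))            # [40040.0]
# ...but batched with 99 designs of the same size, each pays 1% of it.
print(shared_run_costs(40_000, 20, [2] * 100)[0])   # 440.0
```

Under these assumptions, the per-project price falls almost in proportion to the number of designs sharing a run. A batching service like this therefore gets cheaper the more popular it becomes.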
Expert systems and planners are generally listed as a ‘failure’ and the cause of the AI Winter. It’s true they didn’t give us HAL as some GOFAI people hoped, but they did find a useful niche and have been important. R&S give a throwaway paragraph noting that one system from SCI, DART, was used in planning logistics for the first Gulf War and saved the DoD more money than the whole SCI program cost. (The listed reference, “DART: Revolutionizing Logistics Planning”, Hedberg 2002, actually makes the bolder claim that DART “paid back all of DARPA’s 30 years of investment in AI in a matter of a few months, according to Victor Reis, Director of DARPA at the time.” This could equally well be taken as a comment on how expensive a war is, how inefficient DoD logistics planning was, or how little has been invested in AI.) It’s also worth noting that speech recognition based on Hidden Markov models & n-grams was a success here, even if now obsolesced by deep learning. It produced the first speech recognition systems that were of any real use, underlying successes like Dragon NaturallySpeaking.
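For readers who haven’t met HMMs: the core decoding step in an HMM speech recognizer is the Viterbi algorithm. It finds the most probable sequence of hidden states (eg phones) given a sequence of acoustic observations. The toy sketch below uses an invented model. The two states, the quantized ‘low’/‘high’ energy observations, and all probabilities are made up for illustration. Real recognizers used vastly larger models over much richer acoustic features.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a sequence of observations.

    Works in log space to avoid underflow on long sequences; every probability
    used must be nonzero, because math.log(0) raises ValueError.
    """
    # best[t][s] = (log-prob of best path ending in state s at time t, predecessor state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, pred = max((prev[p][0] + math.log(trans_p[p][s]), p) for p in states)
            column[s] = (score + math.log(emit_p[s][obs]), pred)
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


# Invented toy model: is each audio frame silence or a vowel, given its energy?
states = ("silence", "vowel")
start_p = {"silence": 0.8, "vowel": 0.2}
trans_p = {"silence": {"silence": 0.7, "vowel": 0.3},
           "vowel": {"silence": 0.4, "vowel": 0.6}}
emit_p = {"silence": {"low": 0.9, "high": 0.1},
          "vowel": {"low": 0.2, "high": 0.8}}
print(viterbi(["low", "high", "high", "low"], states, start_p, trans_p, emit_p))
# ['silence', 'vowel', 'vowel', 'silence']
```

In a full recognizer, an n-gram language model would additionally score which word sequences are plausible. The decoder would then search for the path that is best under both the acoustic model and the language model.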
Perhaps the most relevant area to contemporary AI discussions of deep learning is expert systems. Why was there such optimism? Expert systems had accomplished a few successes: MYCIN/DENDRAL (although MYCIN was never used in production), some mining/
Small wonder, then, that Robert Kahn and the architects of SC believed in 1983 that AI was ripe for exploitation. It was finally moving out of the laboratory and into the real world, out of the realm of toy problems and into the realm of real problems, out of the sterile world of theory and into the practical world of applications.
…That such a goal appeared within reach in the early 1980s is a measure of how far the field had already come. In the early 1970s, the MYCIN expert system had taken twenty person-years to produce just 475 rules.38 The full potential of expert systems lay in programs with thousands, even tens and hundreds of thousands, of rules. To achieve such levels, production of the systems had to be dramatically streamlined. The commercial firms springing up in the early 1980s were building custom systems one client at a time. DARPA would try to raise the field above that level, up to the generic or universal application.
Thus was shaped the SC agenda for AI. While the basic program within IPTO continued funding for all areas of AI, SC would seek “generic applications” in four areas critical to the program’s applications: (1) speech recognition would support Pilot’s Associate and Battle Management; (2) natural language would be developed primarily for Battle Management; (3) vision would serve primarily the Autonomous Land Vehicle; and (4) expert systems would be developed for all of the applications. If AI was the penultimate tier of the SC pyramid, then expert systems were the pinnacle of that tier. Upon them all applications depended. Development of a generic expert system that might service all three applications could be the crowning achievement of the program. Optimism on this point was fueled by the whole philosophy behind SC. AI in general, and expert systems in particular, had been hampered previously by lack of computing power. Feigenbaum, for example, had begun DENDRAL on an IBM 7090 computer, with about 130K bytes of core memory and an operating speed between 50 and 100,000 floating point operations per second.39 Computer power was already well beyond that stage, but SC promised to take it to unprecedented levels—a gigaflop by 1992. Speed and power would no longer constrain expert systems. If AI could deliver the generic expert system, SC would deliver the hardware to run it. Compared to existing expert systems running 2,000 rules at 50–100 rules per second, SC promised “multiple cooperating expert systems with planning capability” running 30,000 rules firing at 12,000 rules per second and six times real time.40
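What ‘rules firing’ means: in a forward-chaining production system, each rule is an if-then pair. The engine repeatedly fires any rule whose conditions are all satisfied by the current facts, adding the rule’s conclusion as a new fact. Below is a minimal sketch with a hypothetical toy rule base, unrelated to any actual SC system. This naive loop re-tests every rule on every pass. Production-system engines of the era typically used incremental matching, such as the Rete algorithm, to avoid that cost. For scale, the targets quoted above amount to going from 50–100 firings per second to 12,000, a 120–240× speedup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # facts that must all be known for the rule to fire
    conclusion: str        # fact asserted when the rule fires


def forward_chain(facts, rules):
    """Fire rules until no rule adds a new fact; return (known facts, firing order)."""
    known = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for rule in rules:  # naive: re-test every rule on every pass
            if rule.conclusion not in known and rule.conditions <= known:
                known.add(rule.conclusion)
                fired.append(rule.name)
                changed = True
    return known, fired


# Hypothetical toy rule base (car trouble, not any real SC system).
rules = [
    Rule("R1", frozenset({"no_start", "lights_dim"}), "battery_weak"),
    Rule("R2", frozenset({"battery_weak"}), "recommend_charging"),
    Rule("R3", frozenset({"no_start", "fuel_empty"}), "recommend_refueling"),
]
known, fired = forward_chain({"no_start", "lights_dim"}, rules)
print(fired)  # ['R1', 'R2']
```

Every fact and rule here has to be hand-written by a knowledge engineer. That requirement is the scaling bottleneck the rest of the review describes.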
What happened was that the hardware came into existence, but the expert systems didn’t scale. They instantly hit a combinatorial wall, couldn’t solve the grounding problem, and knowledge engineering never became feasible at the level where you might encode a human’s knowledge. Expert systems also struggled to be extended beyond symbolic systems to real data like vision or sound. AI didn’t have remotely enough computing power to do anything useful, and it didn’t have methods which could use the computing power if it had it. We got the VLSI chips, we got the gigahertz processors even without gallium arsenide, we got the gigaflops and then the teraflops and now the petaflops—but what do you do with an expert system on those? Nothing. The grand goals of SCI relied on all the parts doing their part, and one part fell through:
Only four years into the SC program, when Schwartz was about to terminate the IntelliCorp and Teknowledge contracts, expectations for expert systems were already being scaled back. By the time that Hayes-Roth revised his article for the 1992 edition of the Encyclopedia, the picture was still more bleak. There he made no predictions at all about program speeds. Instead he noted that rule-based systems still lacked “a precise analytical foundation for the problems solvable by RBSs . . . and a theory of knowledge organization that would enable RBSs to be scaled up without loss of intelligibility of performance.”108 SC contractors in other fields, especially applications, had to rely on custom-developed software of considerably less power and versatility than those envisioned when contracts were made with IntelliCorp and Teknowledge. Instead of a generic expert system, SC applications relied increasingly on “domain-specific software”, a change in terminology that reflected the direction in which the entire field was moving.109 This is strikingly similar to the pessimistic evaluation Schwartz had made in 1987. It was not just that IntelliCorp and Teknowledge had failed; it was that the enterprise was impossible at current levels of experience and understanding…Does this mean that AI has finally migrated out of the laboratory and into the marketplace? That depends on one’s perspective. In 1994 the U.S. Department of Commerce estimated the global market for AI systems to be about $900 million, with North America accounting for two-thirds of that total.119 Michael Schrage, of the Sloan School’s Center for Coordination Science at MIT, concluded in the same year that “AI is—dollar for dollar—probably the best software development investment that smart companies have made.”120 Frederick Hayes-Roth, in a wide-ranging and candid assessment, insisted that “KBS have attained a permanent and secure role in industry”, even while admitting the many shortcomings of this technology.121 Those shortcomings weighed heavily on AI authority Daniel Crevier, who concluded that “the expert systems flaunted in the early and mid-1980s could not operate as well as the experts who supplied them with knowledge. To true human experts, they amounted to little more than sophisticated reminding lists.”122 Even Edward Feigenbaum, the father of expert systems, has conceded that the products of the first generation have proven narrow, brittle, and isolated.123 As far as the SC agenda is concerned, Hayes-Roth’s 1993 opinion is devastating: “The current generation of expert and KBS technologies had no hope of producing a robust and general human-like intelligence.”124
…Each new [ALV] feature and capability brought with it a host of unanticipated problems. A new panning system, installed in early 1986 to permit the camera to turn as the road curved, unexpectedly caused the vehicle to veer back and forth until it ran off the road altogether.45 The software glitch was soon fixed, but the panning system had to be scrapped anyway; the heavy, 40-pound camera stripped the device’s gears whenever the vehicle made a sudden stop.46 Given such unanticipated difficulties and delays, Martin increasingly directed its efforts toward achieving just the specific capabilities required by the milestones, at the expense of developing more general capabilities. One of the lessons of the first demonstration, according to the ALV engineers, was the importance of defining “expected experimental results”, because “too much time was wasted doing things not appropriate to proof of concept.”47 Martin’s selection of technology was conservative. It had to be, as the ALV program could afford neither the lost time nor the bad publicity that a major failure would bring. One BDM observer expressed concern that the pressure of the demonstrations was encouraging Martin to cut corners, for instance by using the “flat earth” algorithm with its two-dimensional representation. ADS’s obstacle-avoidance algorithm was so narrowly focused that the company was unable to test it in a parking lot; it worked only on roads.84…The vision system proved highly sensitive to environmental conditions—the quality of light, the location of the sun, shadows, and so on. The system worked differently from month to month, day to day, and even test to test. Sometimes it could accurately locate the edge of the road, sometimes not. The system reliably distinguished the pavement of the road from the dirt on the shoulders, but it was fooled by dirt that was tracked onto the roadway by heavy vehicles maneuvering around the ALV. 
In the fall, the sun, now lower in the sky, reflected brilliantly off the myriads of polished pebbles in the tarmac itself, producing glittering reflections that confused the vehicle. Shadows from trees presented problems, as did asphalt patches from the frequent road repairs made necessary by the harsh Colorado weather and the constant pounding of the eight-ton vehicle.42
…Knowledge-based systems in particular were difficult to apply outside the environment for which they had been developed. A vision system developed for autonomous navigation, for example, probably would not prove effective for an automated manufacturing assembly line. “There’s no single universal mechanism for problem solving”, Amarel would later say, “but depending on what you know about a problem, and how you represent what you know about the problem, you may use one of a number of appropriate mechanisms.”…In another major shift in emphasis, SC2 removed “machine intelligence” from its own plateau on the pyramid, subsuming it under the general heading “software”. This seemingly minor shift in nomenclature signaled a profound reconceptualization of AI, both within DARPA and throughout much of the computer community. The effervescent optimism of the early 1980s gave way to more sober appraisal. AI did not scale. In spite of impressive achievements in some fields, designers could not make systems work at a level of complexity approaching human intelligence. Machines excelled at data storage and retrieval; they lagged in judgment, learning, and complex pattern recognition.
…During SC, AI had proved unable to exploit the powerful machines developed in SC’s architectures program to achieve Kahn’s generic capability in machine intelligence. On the fine-grained level, AI, including many developments from the SC program, is ubiquitous in modern life. It inhabits everything from automobiles and consumer electronics to medical devices and instruments of the fine arts. Ironically, AI now performs miracles unimagined when SC began, though it can’t do what SC promised.
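The “flat earth” shortcut in the ALV excerpt presumably means assuming the ground is a plane. That assumption lets a single camera convert image positions into ground distances without any depth sensing. I don’t know the details of Martin’s actual algorithm. What follows is just the textbook flat-ground back-projection for a pinhole camera with a level optical axis, using made-up camera parameters. It also shows why the shortcut is fragile. Estimated distance is inversely proportional to how far a pixel lies below the horizon row. Near the horizon, a one-row error, or any slope in the road, therefore produces large distance errors.

```python
def ground_point(u, v, focal_px, cx, cy, camera_height):
    """Back-project pixel (u, v) onto an assumed flat ground plane.

    Assumes a pinhole camera whose optical axis is level, mounted camera_height
    above the ground; (cx, cy) is the principal point, so row cy is the horizon.
    Image rows increase downward. Returns (forward, lateral) distances in the
    units of camera_height, or None for pixels at or above the horizon.
    """
    rows_below_horizon = v - cy
    if rows_below_horizon <= 0:
        return None  # this ray never intersects the ground plane
    forward = focal_px * camera_height / rows_below_horizon
    lateral = (u - cx) * forward / focal_px
    return forward, lateral


# Hypothetical camera: 500 px focal length, 640x480 image, mounted 2 m up.
for row in (340, 260, 241):
    print(row, ground_point(320, row, 500, 320, 240, 2.0))
# 340 (10.0, 0.0)
# 260 (50.0, 0.0)
# 241 (1000.0, 0.0)
```

Moving from row 241 to row 242 would halve the estimated distance to 500 m. This is one illustration of how small image-level disturbances, like the glare and shadows described in the excerpt, could throw off a system built on this assumption.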
Given how people keep reaching back to the AI Winter in discussions of connectionism—I mean, deep learning—it’s interesting to contrast the two paradigms.
While working on the Wikipedia article for Lisp machines (and articles on related high-profile successes like MYCIN/
Deep learning long ago escaped into the commercial market and, indeed, is primarily driven by industry researchers at this point. The case studies are innumerable (and many are secret due to their considerable commercial value). DL handles grounding problems & raw sensory data well and indeed struggles most on problems with richly formalized structures like hierarchies/
Considering all this, it’s not a surprise that the AI part of SC didn’t pan out and eventually got axed, as it should have. Sometimes the time is not ripe. Hero can invent the steam engine, but you don’t get steam engine trains until it’s steam engine train time, and the best intentions of all the bureaucrats in the world can’t affect that much. The turnover in managers and political interference may well have been enough to “disrupt the careful orchestration that its ambitious agenda required”, but this was more in the nature of shooting a dead horse. R&S seem, somewhat reluctantly, to ultimately assent to the view they critiqued at the beginning, held by the ARPA staff: that the failure of SC is more a demonstration of technological determinism than of social & political contingency, and more about the technology than the people:
…Thus, for all of their agency, their story appears to be one driven by the technology. If they were unable to socially construct this technology, to maintain agency over technological choice, does it then follow that some technological imperative shaped the SC trajectory, diverting it in the end from machine intelligence to high performance computing? Institutionally, SC is best understood as an analog of the development programs for the Polaris and Atlas ballistic missiles. An elaborate structure was created to sell the program, but in practice the plan bore little resemblance to day-to-day operations. Conceptually, SC is best understood by mixing Thomas Hughes’s framework of large-scale technological systems with Giovanni Dosi’s notions of research trajectories. Its experience does not quite map on Hughes’s model because the managers could not or would not bring their reverse salients on line. It does not quite map on Dosi because the managers regularly dealt with more trajectories and more variables than Dosi anticipates in his analyses. In essence, the managers of SC were trying to research and develop a complex technological system. They succeeded in developing some components; they failed to connect them in a system. The overall program history suggests that at this level of basic or fundamental research it is best to aim for a broad range of capabilities within the technology base and leave integration to others…While the Fifth Generation program contributed substantially to Japan’s national infrastructure in computer technology, it did not vault that country past the United States…SC played an important role, but even some SC supporters have noted that the Japanese were in any event headed on the wrong trajectory even before the United States mobilized itself to meet their challenge.
…In some ways the varying records of the SC applications shed light on the program models advanced by Kahn and Cooper at the outset. Cooper believed that the applications would pull technology development; Kahn believed that the evolving technology base would reveal what applications were possible. Kahn’s appraisal looks more realistic in retrospect. It is clear that expert systems enjoyed substantial success in planning applications. This made possible applications ranging from Naval Battle Management to DART. Vision did not make comparable progress, thus precluding achievement of the ambitious goals set for the ALV. Once again, the program went where the technology allowed. Some reverse salients resisted efforts to orchestrate advance of the entire field in concert. If one component in a system did not connect, the system did not connect.
In the final analysis, SC failed for want of connection.
Reading about SC furnishes an unexpected lesson about the importance of believing in Moore’s Law and having techniques which can scale. What are we doing now which won’t scale, and what waves are we paddling up instead of surfing?
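The arithmetic of ‘believing in Moore’s Law’ is just compounding. The sketch below assumes a fixed doubling time; 18–24 months are the usual figures quoted, and 2 years is used here. The SC plan quoted above started from DENDRAL-era machines at roughly 50,000–100,000 operations per second and targeted a gigaflop. That is a factor of about 10,000–20,000, or roughly 13–14 doublings, which takes about 27–29 years at a 2-year doubling time.

```python
import math


def growth_factor(years, doubling_time_years=2.0):
    """Multiplicative improvement after `years` of steady exponential growth."""
    return 2 ** (years / doubling_time_years)


def years_to_reach(start, target, doubling_time_years=2.0):
    """Years of steady doubling needed to go from `start` to `target`."""
    return doubling_time_years * math.log2(target / start)


# The SC decade, 1983-1993, at an assumed 2-year doubling time:
print(growth_factor(10))                    # 32.0
# From ~10^5 ops/s to the gigaflop goal (10^9) at the same assumed rate:
print(round(years_to_reach(1e5, 1e9), 1))   # 26.6
```

A technique that improves only when hardware improves rides this curve for free. A technique whose bottleneck is something else, like hand-encoding knowledge, does not.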
Ironically, as I write this in 2018, DARPA has recently announced another attempt at “silicon compilers”, presumably sparked by commodity chips topping out and ASICs being required, which I can only summarize as “Verilog but let’s do it sanely this time and with FLOSS rather than a crazy tragedy-of-the-anticommons proprietary ecosystem of crap”.↩︎
Specifically, contemporary computers don’t use the dense grid of 1-bit processors with local memory which characterized the CM. They do increasingly feature thousands of ‘processor’ equivalents in the form of CPU cores and GPU cores, but those are all far more powerful than a CM processor node. But we might yet see some convergence with the CM thanks to neural networks. Neural networks are typically trained with wastefully precise floating point operations, which slows them down; hence the rise of ‘tensor cores’ and ‘TPUs’ using lower precision, like 8-bit integers. It is even possible to discretize neural nets all the way down to binary weights. This offers a lot of potential electricity savings, and if you have binary weights, why not binary computing elements as well…?↩︎
People tend to ignore this, but CNNs can work with a few hundred or even just one or two images, using transfer learning, few-shot learning, and aggressive regularization like data augmentation.↩︎
While the accuracy rates may increase by what looks like a tiny amount, and one might ask how important a change from 99% to 99.9% accuracy is, the large-scale training papers demonstrate that neural nets continue to learn hidden knowledge from the additional data which provide ever better semantic features which can be reused elsewhere.↩︎