Complexity no Bar to AI

Critics of AI risk suggest that diminishing returns to computing (formalized asymptotically) mean AI will be weak; this argument relies on a large number of questionable premises and ignores additional resources, constant factors, and nonlinear returns to small intelligence advantages, and so is highly unlikely to be correct.
computer-science, transhumanism, AI, R, insight-porn
2014-06-01–2019-06-09 finished certainty: likely importance: 10


Computational complexity theory describes the steep increase in computing power required for many algorithms to solve larger problems; frequently, the increase is large enough to render problems a few times larger totally intractable. Many of these algorithms are used in AI-relevant contexts. It has been argued that this implies that AIs will be fundamentally limited in accomplishing real-world tasks better than humans because they will run into the same computational complexity limits as humans, and so the consequences of developing AI will be small, as it is impossible for there to be any large fast global changes due to human or superhuman-level AIs. I examine the assumptions of this argument and find it neglects the many conditions under which computational complexity theorems are valid and so the argument doesn’t work: problems can be solved more efficiently than complexity classes would imply, large differences in problem solubility between humans and AIs are possible, greater resource consumption is possible, the real-world consequences of small differences on individual tasks can be large for agent impacts, such consequences can compound, and many agents can be created; any one of these independent objections being true destroys the argument.

Computational complexity theory attempts to describe the resource usage of algorithms from the abstract vantage point of considering how running time on some idealized computer relatively increases for a specific algorithm as the inputs scale in size towards infinity. For many important algorithms used in AI and programming in general, the difficulty turns out to increase steeply with extra data—comparison-based sorting algorithms like merge sort take only 𝒪(n·log(n)) and so you can sort just about any amount of data in a feasible time, but more interesting problems like the Traveling Salesman Problem or 3-SAT become NP-hard (or exponentially hard, or worse) as the data increases and quickly go from fast to feasible to impossible.
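
To give a concrete sense of the gap, here is a small R illustration (mine, not from the original text) tabulating how many ‘steps’ a few common growth rates require as the input grows:

```r
# Steps needed for n items under a few growth rates; the numbers, not the theory,
# are what make the harder scaling classes bite.
n <- c(10, 100, 1000, 10000)
data.frame(n         = n,
           n_log_n   = round(n * log2(n)),  # comparison sorting: still trivial at n = 10,000
           n_squared = n^2,                 # quadratic: getting expensive
           two_to_n  = 2^n)                 # exponential: overflows double precision past n ~ 1024
```

A quasilinear algorithm handles ten thousand items in roughly 130,000 steps, while an exponential one is already not even representable in floating point.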

Complexity implies Singularities are impossible

One argument against powerful artificial intelligences, and scenarios corresponding to a Singularity in general, draws from computational complexity theory.

For example, in “The Singularity Is Further Than It Appears”, Ramez Naam makes a number of objections, ranging from the possibility that neurons are more powerful than generally believed to the claim that corporations have not created a Singularity yet so they never will (some of which are criticized by William Hertling), but he starts with a computational complexity argument using computational chemistry as an example:

Are we headed for a Singularity? Is it imminent?…But regardless of which definition you use, there are good reasons to think that it’s not on the immediate horizon…This is the so-called ‘hard takeoff’ scenario, also called the FOOM model by some in the singularity world. It’s the scenario where in a blink of an eye, a ‘godlike’ intelligence bootstraps into being, either by upgrading itself or by being created by successive generations of ancestor AIs. It’s also, with due respect to Vernor Vinge, of whom I’m a great fan, almost certainly wrong. It’s wrong because most real-world problems don’t scale linearly. In the real world, the interesting problems are much much harder than that.

Graph of exponential scaling time in chemical modeling

Computational chemistry started in the 1950s. Today we have literally trillions of times more computing power available per dollar than was available at that time. But it’s still hard. Why? Because the problem is incredibly non-linear…How fast? The very fastest (and also, sadly, the most limited and least accurate) scale at N², which is still far worse than linear. By analogy, if designing intelligence is an N² problem, an AI that is 2× as intelligent as the entire team that built it (not just a single human) would be able to design a new AI that is only 70% as intelligent as itself. That’s not escape velocity.

A followup post by Naam, “Why AIs Won’t Ascend in the Blink of an Eye—Some Math”, describes it more directly:

In my previous post on why the Singularity is Further Than it Appears, I argued that creating more advanced minds is very likely a problem of non-linear complexity. That is to say, creating a mind of intelligence 2 is probably more than twice as hard as creating a mind of intelligence 1. The difficulty might go up exponentially. Or it might go up ‘merely’ with the cube or the square of the intelligence level you’re trying to reach. Blog reader Paul Baumbart took it upon himself to graph out how the intelligence of our AI changes over time, depending on the computational complexity of increasing intelligence.

Intelligence growth under Various Difficulty Assumptions

…Every other model Paul put into his spreadsheet showed convergence instead of divergence. Almost any non-linear difficulty in boosting intelligence means that no runaway occurs. (Note that these do not include the benefit of getting new hardware over time and general speedup from Moore’s Law, for so long as that continues. But they do include the benefit of designing new hardware for itself or any speedup that it can cause to Moore’s Law.) The bottom line, in green, is exponential difficulty (e^x). Many real-world problems are exponentially difficult as they grow in size. The ‘traveling salesman’ problem is an exponential problem (at least to find an exact solution). Modeling quantum mechanical systems is an exponential problem. Even some important scenarios of protein folding are exponentially difficult. So it’s not at all unlikely that boosting intelligence would fall into this category. And as you can see, if intelligence is exponentially difficult, the super-intelligence never ascends.

Or to put it perhaps more clearly: for a fixed amount of computation, at each greater level of intelligence, a smaller increase in intelligence can be realized with that amount of computation.
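
To see what this fixed-computation model implies, here is a minimal R sketch (my own toy model, not Naam’s or Baumbart’s exact spreadsheet), assuming the cumulative design cost of reaching intelligence level I is I, I², or e^I, and that exactly one unit of computation arrives per time-step:

```r
# Intelligence reachable after C units of total computation, if cumulative design cost
# in intelligence is linear (I = C), quadratic (I = sqrt(C)), or exponential (I = log(C)).
t <- 1:100                       # cumulative computation: 1 fixed unit per time-step
plot(t, t, type = "l", xlab = "Time (fixed computation per step)", ylab = "Intelligence")
lines(t, sqrt(t), col = "blue")  # N^2 difficulty: still grows, but ever more slowly
lines(t, log(t),  col = "red")   # exponential difficulty: crawls along logarithmically
```

Under these assumptions the nonlinear curves flatten out much as Naam describes; but note that the flattening is driven as much by the ‘one fixed unit per step’ assumption as by the difficulty exponent, a point taken up below.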

A somewhat similar argument is gestured at by Nathan Myhrvold:

Theorists have proved that some mathematical problems are actually so complicated that they will always be challenging or even impossible for computers to solve. So at least for now, people who can push forward the boundary of computationally hard problems need never fear for lack of work. This tells us something important about AI. Like mathematics, intelligence is not just one simple kind of problem, such as pattern recognition. It’s a huge constellation of tasks of widely differing complexity. So far, the most impressive demonstrations of “intelligent” performance by AI have been programs that play games like chess or Go at superhuman levels. These are tasks that are so difficult for human brains that even the most talented people need years of practice to master them. Meanwhile, many of the tasks that seem most basic to us humans—like running over rough terrain or interpreting body language—are all but impossible for the machines of today and the foreseeable future. As AI gets more capable, the sphere of jobs that computers can do faster or more accurately than people will expand. But an expanding universe of work will remain for humans, well outside the reach of automation.

Awkwardly, this argument contains its own refutation: chess/Go are computationally difficult in precisely the way Myhrvold claims will put mathematical problems out of reach of computers, and yet they have already fallen.1

This can be seen as one of the "structural objections" (to use Chalmers 2010’s taxonomy), where the failure to achieve an intelligence explosion is specifically attributed to increments in computing power solving ever fewer additional data points as data sizes scale. So the argument (filling in the gaps and omitting the various graphs showing hypothetical scalings) goes something like this:

  1. most tasks an intelligent agent (human or artificial intelligence) needs to solve are in difficult complexity classes, such as NP or NP-hard: Traveling Salesman, 3-SAT, belief propagation, neural network training, playing games like Go or chess, and so on
  2. a task in NP or a higher complexity class can only be solved for small problem sizes
  3. if a task can only be solved for small problem sizes, then the best agent will solve only slightly larger problem sizes
  4. the real-world reward to an agent from solving a slightly larger problem is only slightly greater
  5. the long-term consequence of slightly greater rewards is itself small
  6. if an AI becomes the best agent, then it must solve problems in difficult complexity classes (1), so it will be able to solve only slightly larger problems (2–3), receiving slightly greater rewards (4), with only small long-term consequences (5)
  7. if each AI has only small long-term consequences, all AIs together will have a small total long-term consequence
  8. thus, AIs becoming intelligent agents will have only small total long-term consequences

This argument is valid as far as it goes and can probably be formalized. But is the argument sound?

Complexity caveats

One difficulty with applying computational complexity theory outside its usual area is that people tend to neglect the requirements of complexity theory which give it its generality: that it omits the ‘constant factors’ and the actual runtime, that many of the statements are lower/upper bounds or statements about worst-case complexity, that the statements tend to be about specific algorithms—which are rarely the only way to solve a real-world problem—and that it doesn’t try to say anything about utility or consequences. Similarly, people sometimes reason that since a human and an AI would be in the same computability class (Turing-completeness), anything an AI could do or think, a human must also be able to do or think; but they neglect that humans do not have unbounded time or space like the idealized Turing machine, and there is no more reason to expect such understanding to be possible than to expect an ant to understand everything a human does before it dies of old age; an ant with galaxies of notebooks and billions of years could perhaps understand human civilization, but no such ant has ever existed or ever will, and that ant’s understanding of human action will ultimately be irrelevant. The question of whether such tasks are feasible for a "compact, efficient computer program", not just computable, may take on both metaphysical and practical significance (to paraphrase Scott Aaronson).

Laid out bare, I would have to say that the argument depends critically on each of the premises being true, but every one of premises 1–5 is either questionable or false.

Are all problems worst-case and NP-hard?

Premise 1 is incorrect because the proofs of those complexities generally depend on general solutions with deterministic exactly-optimal worst-case behavior. The apparent barrier of a complex problem can be bypassed by (in roughly increasing order of practical importance):

  1. optimizing complexity class: existing proofs could be incorrect, inapplicable (such as assuming classical rather than quantum computers), or based on open conjectures widely believed by humans one way but which could still resolve in the more advantageous direction (eg P≟NP)

  2. giving up determinism and using randomized algorithms which are faster but have some small probability of error or failure2 (they typically can be run many times if correctness is important; after a few runs, the probability of an error will be smaller than the probability that the computer hardware has suffered a hardware fault such as a cosmic-ray bitflip); randomization can be applied to algorithms and to data structures.

  3. needing good average-case behavior rather than worst-case behavior

    Rather than merge sort, one could use Quicksort—merge sort has better worst-case complexity than Quicksort (whose worst case renders it vulnerable to DoS attacks if there are adversaries who can force the worst-case behavior), but Quicksort is still usually faster. Likewise, in the worst case, 3-SAT/Traveling Salesman are wildly intractable for any realistic dataset like planning a trip across the USA; but the average-case performance is quite different and in practice, 3-SAT/Traveling Salesman are solved all the time, to the extent where SAT solvers are routinely used in computer security or theorem proving or type-checking, and logistics companies are able to heavily optimize their shipping with them.

    Similarly for linear programming’s simplex algorithm and other operations research algorithms, which are theoretically intimidating but on real-world problems yield solutions after reasonable runtimes—they work in practice, but not in theory. For example, TSP instances of up to 85,900 cities have been solved.

  4. giving up generality and specializing: an algorithm may be unnecessarily general.

    A comparison sort can be done in 𝒪(n·log(n)), yes, but one frequently doesn’t need to sort any kind of datatype for which an ordering is available but specifically strings, numbers, etc, for which quasi-linear/linear sorting is possible using counting sorts or radix sorts. An algorithm could also have prior information about the kind of data input which will be available—Timsort is aware that most inputs will be partially sorted already and can outperform a naive sort. Data structures can be tuned for particular distributions of data, and caching & JIT compilation & profile-guided optimization can be seen as specializing general algorithms to the current or likely-future inputs.

  5. giving up optimality and computing an approximation of the optimal answer (often very close to the optimal answer)

    Already mentioned is 3-SAT/TSP, for which there is a World Tour of 1,904,711 cities which had been solved with a tour within 0.0474% of the optimal tour by 2007, and planning problems soluble by translation into SAT form can have millions of clauses & variables3, enabling such silly applications as drawing portraits & art using the TSP; Naam gives chemistry as an example by noting that while the exact physics is totally intractable, approximations which are much more feasible are used instead. The last fraction of a percentage point of optimality can take truly enormous amounts of computation to squeeze out. (For a small worked example of how far cheap heuristics get, see the sketch just after this list.)

  6. changing the problem rather than succumbing to functional fixedness: many problems can be redefined or environments can be tweaked to bypass a challenge & leverage computer strengths.

    A self-driving car may not be as good at vision as a human driver, but additional sensors can be incorporated into it in a way that they cannot be for humans, as they would distract them; a warehouse robot may not be as good at driving around as a human worker, but the environment can be altered with white lines or barcode tags on the floor so the robots always know where they are. To quote Freeman Dyson’s paraphrase of John von Neumann4:

    All processes that are stable we shall predict. All processes that are unstable we shall control.

  7. solving different non-human problems which humans cannot or will not solve.

    As Richard Hamming says, “There are wavelengths that people cannot see, there are sounds that people cannot hear, and maybe computers have thoughts that people cannot think.” There are problems humans could never solve because it would require too much training, or too much memory, or too bizarre solutions. A human would never come up with many of the solutions that genetic algorithms or neural networks do, and they can be used on scales that humans never would; an unimportant but interesting example would be a CNN trained to geolocate random photographs—I can’t imagine any human beating such a CNN, or even trying. In such cases, scaling concerns are totally beside the point.
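
As a concrete illustration of point 5 (approximation), here is a short R sketch (mine; the instance and parameters are arbitrary) of the standard cheap heuristics for the Traveling Salesman Problem, greedy nearest-neighbor construction followed by 2-opt local improvement, which is the family of tricks that, scaled up, yields tours within a fraction of a percent of optimal on million-city instances:

```r
set.seed(1)
n  <- 100
xy <- matrix(runif(2 * n), ncol = 2)   # random cities in the unit square
d  <- as.matrix(dist(xy))              # pairwise distances

tour_length <- function(tour) sum(d[cbind(tour, c(tour[-1], tour[1]))])

# Greedy construction: always go to the nearest unvisited city.
nearest_neighbor <- function(start = 1) {
  tour <- start; left <- setdiff(1:n, start)
  while (length(left) > 0) {
    nxt  <- left[which.min(d[tail(tour, 1), left])]
    tour <- c(tour, nxt); left <- setdiff(left, nxt)
  }
  tour
}

# 2-opt: keep reversing segments while doing so shortens the tour.
two_opt <- function(tour) {
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    for (i in 2:(n - 2)) for (j in (i + 1):(n - 1)) {
      candidate <- c(tour[1:(i - 1)], rev(tour[i:j]), tour[(j + 1):n])
      if (tour_length(candidate) < tour_length(tour)) { tour <- candidate; improved <- TRUE }
    }
  }
  tour
}

t0 <- nearest_neighbor()
t1 <- two_opt(t0)
c(greedy = tour_length(t0), after_2opt = tour_length(t1))
```

Neither step has any worst-case guarantee worth mentioning, and neither needs one: on random instances this kind of local search routinely lands within a few percent of the optimal tour.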

Several of these categories might ring familiar to those interested in computer security, because computer security suffers from similar issues in the attempt to close the gap between the theoretical guarantees about the security of particular cryptography algorithms and what security one gets in practice.

In particular, the Snowden leaks have demonstrated the remarkable breadth of ways in which the NSA goes about breaking computer security without needing access to theoretical breakthroughs or exotic quantum computers (and indeed, the NSA is more than a little contemptuous of the academic computer security/cryptography communities for their misguided focus on theory at the expense of implementation): computers can be intercepted in the mail and hardware bugs implanted; computers can be monitored remotely using various radio and phreaking devices; airgapped networks can be jumped by malware hitch-hiking on USB drives or buried ineradicably inside the BIOSes of devices like hard drives which have their own processors; data which is not at rest can be stolen from otherwise-secure data centers by tapping private fiber optic links (eg Google); more public fiber optic cables such as undersea cables are hacked using ISP assistance and submarine operations, in some cases entire days of raw traffic being retained for analysis; encrypted data can be retained forever for future decryption (such as by the NSA’s active quantum computing R&D effort); Internet-wide attacks can be mounted by factoring certain very commonly used numbers using the NSA’s large computational resources and likely specialized ASICs (the amortized cost of factoring many keys simultaneously is different and much smaller than the usually calculated cost of cracking a single key); private keys can be stolen by using subpoenas or national security letters or hacking in or even physical break-ins; data can be traded with the intelligence agencies of other countries, or their own hacking operations can themselves be hacked by the NSA; backdoors can be introduced into otherwise-secure software (Dual_EC); commonly used software can be extensively audited, with bugs discovered and exploited long before they are publicly known; Internet connections can be hijacked and diverted to NSA servers to serve up malware. This gives an idea of the difficulties faced when trying to be secure: where does one trustably get one’s computer and the software on it? How many 0-day vulnerabilities are there in the operating system and all the cryptographic software? The encryption algorithms may be insecure, or implemented insecurely, or exist decrypted somewhere, or be run on subverted hardware, or the contents inferrable from metadata & other activity.

Hence, the exact difficulty of integer factoring or the existence of one-way functions is often among the least of the factors determining the security of a system.

Are all implementations equally fast?

“For every polynomial-time algorithm you have, there is an exponential algorithm that I would rather run.”

Alan Perlis5

Premise 3 ignores that complexity classes by design try to abstract away from the ‘constant factors’, which are the computation time determined not by input size but by the details of computer architectures, implementations, and available computing hardware. AIs and humans can be equally bound by asymptotic complexity, but still differ on performance. Scott Aaronson:

…while P≟NP has tremendous relevance to artificial intelligence, it says nothing about the differences, or lack thereof, between humans and machines. Indeed, P ≠ NP would represent a limitation on all classical digital computation, one that might plausibly apply to human brains just as well as to electronic computers. Nor does P ≠ NP rule out the possibility of robots taking over the world. To defeat humanity, presumably the robots wouldn’t need to solve arbitrary NP problems in polynomial time: they’d merely need to be smarter than us, and to have imperfect heuristics better than the imperfect heuristics that we picked up from a billion years of evolution! Conversely, while a proof of P=NP might hasten a robot uprising, it wouldn’t guarantee one.

But with carefully optimized code and specialized hardware (eg GPUs, ASICs), it is possible to see performance gains of multiple orders of magnitude, which implies that one can increase the input size several times before hitting the scaling wall that another agent, who paid less attention to constant factors, would hit much sooner. (Computational chemistry may be intractable, even with approximations, on classical hardware—but what about if one has a quantum computer with a few hundred qubits, enough that one can do quantum simulation of the chemistry?) The importance of constant factors is one of the major traps in practical use of complexity classes: a fancy algorithm with a superior complexity class may easily be defeated by a simpler algorithm with worse complexity but a faster implementation.6 (One reason that programmers are exhorted to benchmark, benchmark, benchmark!)
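
The size of these ‘constant factors’ is easy to demonstrate; a trivial R example (mine) in which the same 𝒪(n) algorithm differs by one to two orders of magnitude depending only on implementation:

```r
# Identical complexity class, very different constant factors: an interpreted R loop
# versus the optimized C implementation behind sum(). Both are O(n).
x <- runif(1e7)

slow_sum <- function(v) { s <- 0; for (i in seq_along(v)) s <- s + v[i]; s }

system.time(slow_sum(x))   # interpreted loop over 10 million elements
system.time(sum(x))        # same computation, compiled & vectorized
```

The asymptotics are untouched, but a large factor in the constants is exactly the kind of headroom that lets one agent handle inputs far larger than another’s before the scaling wall matters.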

This doesn’t disprove the complexity class, which is about asymptotic scaling and will still kick in at some point, but if it’s possible to double or decuple (or more) the input, this is enough of an increase that it’s hard to dismiss the difference between the best and worst agents’ problem sizes as being only ‘slight’.

Finally, increased resource use / brute force is always an option for a powerful agent. Particularly in his second post, Naam’s argument assumes fixed resources. This might be relevant to a few scenarios like an AI permanently confined to a single computer and unable to access more resources—but then, how intelligent could such an AI possibly be if it can’t get out? However, thanks to its intelligence, humanity now controls a large fraction of the biosphere’s energy, and an agent with a supercomputer, or tech giants like Google or Amazon who control millions of processor-cores, can compute things totally out of reach of other agents; no limits to the amount of computation that can be done on (or off) Earth have yet been reached. Increases in computing resources of thousands or millions of times, along with longer timescales, can overcome the asymptotics to achieve the next intelligence increase; if a human-level AI can ‘only’ accomplish a few dozen doublings, well…

Are all returns linear?

Premise 4 is where the argument starts trying to tie statements about complexity to real-world consequences. Naam argues:

By analogy, if designing intelligence is an N² problem, an AI that is 2× as intelligent as the entire team that built it (not just a single human) would be able to design a new AI that is only 70% as intelligent as itself. That’s not escape velocity.

But this doesn’t make any sense. First, Naam’s requirement for a Singularity is a straw man: ‘escape velocity’ is not a concept anyone has required to be true of the Singularity; if nothing else, there are limits to how much computation can be done in the observable universe, so it’s unlikely that there is such a thing as an ‘infinite intelligence’. At no point do Good or Vinge say that the Singularity is important only if the increase of intelligence can continue eternally without bound, and Vinge is clear that the Singularity is a metaphor with no actual infinity7; intelligence increases are important because wherever the improvements terminate, they will terminate at an intelligence level above humanity, which will give it capabilities beyond humanity’s. (Good, for example, in his cost projections, appears to have a diminishing returns model in mind when he speculates that if human-level intelligence can be created, then twice the cost would give a greater-than-human level intelligence, and in his later emphasis on ‘economy of meaning’; and Vinge says the Singularity is "the point where our old models must be discarded and a new reality rules", without making claims about indefinite intelligence increase, just that control of events will have an "intellectual runaway" from humans—but a runaway train doesn’t increase velocity exponentially until it attains the speed of light, it just escapes its operators’ control.)

Second, an intelligence explosion scaling even superlinearly at, say, 𝒪(n²), would result in absolutely enormous practical differences, although I can’t understand what model Naam has in mind exactly—designing intelligence can’t literally work as he describes with the AI getting dumber, because the original AI could simply copy itself to ‘design’ a new AI which is 100% as intelligent as itself at little computational cost, but it’s unclear what sort of input/output variables are going into this scaling equation. Naam’s endorsement of the spreadsheet/chart in the second post implies that he is thinking of a model in which the input is some unspecified unit of computation like 1 GPU-year, and the output is an additional ‘unit’ of intelligence, in which case it does make sense to observe that where the AI previously got a 100% increase in intelligence for spending that 1 GPU-year, now it only gets a <100% increase; in this scenario, it gets a smaller increase each computation unit and (with appropriate asymptotics) it may converge on some finite upper bound. But you could just as easily express this relationship the other way around, and note that the number of computation units for each doubling of intelligence is increasing steeply. Looked at this way, there’s no reason to expect convergence on a finite bound, or even the intelligence increase to slow down—because the fixed computation input assumption becomes glaring; the AI simply "must construct additional pylons", as it were.
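
The difference the fixed-input assumption makes can be shown with the same toy model as before (again my own assumptions, not Naam’s): keep the N² difficulty, but let the computation available per step grow as the agent converts intelligence into more hardware and resources:

```r
# Cumulative design cost to reach intelligence I is assumed to be I^2, so I = sqrt(C).
steps      <- 1:60
fixed      <- cumsum(rep(1, length(steps)))   # 1 unit of computation per step, forever
reinvested <- cumsum(1.25 ^ steps)            # compute per step grows 25%/step (more hardware)

plot(steps, sqrt(reinvested), type = "l", log = "y", ylim = c(1, 2000),
     xlab = "Time", ylab = "Intelligence (log scale)")
lines(steps, sqrt(fixed), col = "blue")
# Same 'difficulty', different inputs: with fixed inputs intelligence crawls along as
# sqrt(t); once inputs themselves grow, intelligence grows exponentially despite the N^2 cost.
```

The diminishing returns are real in both cases; what differs is whether the agent is allowed to construct additional pylons.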

A little perspective from animal intelligence may be helpful here; as a simple model, animal intelligence seems closely related to total neuron count, moderated by body size. Starting at 0, we have the sponge; by 250,000 neurons, we have the fly (which can accomplish behaviors like flying around but little in the way of planning) and the ant (simpler locomotion but capable of simple planning and, in conjunction with many other ants, surprisingly complex emergent behavior); at 1,000,000, we have the frustratingly tough cockroach, and at 16,000,000, the frog; by 71,000,000, the common house mouse, which can be taught tricks, solve complex planning tasks and mazes, and is respectably intelligent. Clearly the scaling here is not linear—it’s hard to argue that the mouse is 284 times smarter than a fly. The scaling gets worse as we continue; the star-nosed mole has 131,000,000 neurons, but is it twice as intelligent as the house mouse? Only at the octopus with 500,000,000 does one recognize a real difference in intelligence, and thankfully the cat shows up at 760,000,000. But for a creature which has ~11× the neurons, the cat doesn’t seem to be as good at catching mice as one might expect! From there, the neuron count gets worse and worse—capuchins need almost 4 billion neurons, macaques almost double that, and humans require a cool 86 billion neurons, 113× a cat (with elephants at 267 billion, but while many of those neurons are used up by their enormous body size, they are still eerily, disturbingly, intelligent). Plotted on a graph by some formal or informal measurement of behavioral complexity, we have a super-linear asymptotic; animal psychologists are always discovering ways in which human behaviors have roots in animal antecedents, implying that humans are, on an absolute level, not that much smarter. Surely each neuron added along the way suffered from diminishing returns. We already live in a world with diminishing returns to computational resources! Yet, despite that asymptotic, it clearly has been possible for humans to defy this scaling and develop brains with almost a hundred billion neurons (and elephants triple that) and considerable room for further growth, and this evolution has also led to enormous real-world consequences: humans do not just control the earth, they have remade it in their own image, driven countless species extinct or to the brink of extinction (as other primates can attest) as humans (and their world) change faster than most species are able to adapt, and done impossible things like gone to the moon. And all this in a blink of an eye.

Aside from the issue that the complexity claims are probably false, this premise is particularly questionable: small advantages on a task do translate to large real-world consequences, particularly in competitive settings. A horse or an athlete wins a race by a fraction of a second; a stock-market investing edge of 1% annually is worth a billionaire’s fortune; a slight advantage in picking each move in a game like chess translates to almost certain victory; a logistics/shipping company which could shave the remaining 1–2% of inefficiency off its planning algorithms would have a major advantage over its rivals inasmuch as shipping is one of their major costs & the profit margin of such companies is itself only a few percentage points of revenue; or consider winner-take-all markets. (Or think about safety in something like self-driving cars: even a small absolute difference in ‘reaction times’ between humans and machines could be enough to drive humans out of the market and perhaps ultimately even make them illegal.)

Turning to human intelligence, the absolute range of human intelligence is very small: differences in reaction times are small, working-memory spans range from 3–7 items, brain imaging studies have difficulty spotting neurological differences, the absolute genetic influence on intelligence is on net minimal, and this narrow range may be a general phenomenon about humans (Wechsler 1935); and yet, in human society, these tiny absolute differences are enormously important in determining who will become rich or poor, who will become a criminal, who will do cutting-edge scientific research, who will get into the Ivy Leagues, who will be a successful politician, and this holds true as high as IQ can be measured reliably. (I think this narrowness of objective performance may help explain why some events surprise a lot of observers: when we look at entities below the human performance window, we just see it as a uniform ‘bad’ level of performance, we can’t see any meaningful differences and can’t see any trends, so our predictions tend to be hilariously optimistic or pessimistic based on our prior views; then, when they finally enter the human performance window, we can finally apply our existing expertise and become surprised and optimistic, and then the entities can, with small objective increases in performance, move out of the human window entirely, and it becomes an activity at which humans are now uncompetitive, like chess (because even grandmasters are constantly making mistakes8), though they may still contribute a bit on the margin in things like advanced ‘centaur’ chess, until within a few years it becomes truly superhuman.)

This also ignores the many potential advantages of AIs which have nothing to do with computational complexity; AlphaGo may confront the same computational complexity of Go that human players do, but as software it is immortal and can be continuously improved, among other advantages (Sotala 2012).

Are all scenarios one-shot?

Premise 5 would seem to assume that there is no such thing as compound interest or exponential growth, or that small advantages cannot accumulate to become crushing ones; of course there is, and they can, for companies, countries, and individuals alike.

Something similar has been noted about human intelligence—while any particular day-to-day decision has little to do with intelligence, the effects of intelligence are consistently beneficial and accumulate over a lifetime, so the random noise starts to cancel out, and intelligence is seen to have strong correlations with long-term outcomes. More abstractly, many career or intellectual outcomes have been noticed to follow a roughly log-normal distribution (Burt 1943, Murray 2003); a log-normal distribution can be generated when an outcome is caused by a ‘leaky pipeline’ (scientific output might be due to motivation times intelligence times creativity…), in which case a small improvement on each variable can yield a large improvement in the output.9 For repeated actions, advantages can build up. A chess example: Magnus Carlsen may be the strongest human chess player in history, with a peak rating of ~2882; as of 2016, the top-rated chess engine is probably Komodo at ELO 3358. The ELO expected-score formula implies that if Komodo 2016 played peak Carlsen, it would have an expected score of 1 / (1 + 10^((2882 − 3358) / 400)) ≈ 0.94, so it would win ~94% of its games; this is impressive enough (it would lose only 1 time out of 20), however, in a standard 5-game match, it would win not 94%, but ~99.8% of the 5-game matches (losing only 1 time out of 500). One thinks of Amara’s law: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”
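
A quick R check of the arithmetic in that chess example (treating each game as an independent win/loss at the expected score and ignoring draws, as the text implicitly does):

```r
elo_expected <- function(a, b) 1 / (1 + 10^((b - a) / 400))
p <- elo_expected(3358, 2882)         # Komodo 2016 vs peak Carlsen
p                                     # ~0.94: wins ~19 games out of 20
sum(dbinom(3:5, size = 5, prob = p))  # chance of taking a majority of a 5-game match: ~0.998
```

The per-game edge of ~0.94 compounds into a ~99.8% match win rate, which is the general point about small advantages accumulating over repeated actions.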

Are AI agents rare?

Expanding on the observation that AIs can have advantages which are unrelated to computational complexity or solving larger instances of problems, is it really the case that a Singularity can happen only if AIs are able to surpass humans on particular algorithmic tasks? This is unlikely. For example, in a whole brain emulation scenario (Sandberg & Bostrom 2008), such uploaded minds would not necessarily be gifted with any new accomplishments with regard to complexity classes, but can we really swallow the argument’s conclusion that this scenario simply cannot lead to any major changes worthy of the name Singularity? Far from it—it seems that such an achievement would radically change human society in a number of ways, ranging from redefining mortality to accelerating neuroscience to transformations of the economy, as such emulated brains can be copied indefinitely onto as much hardware as is available (consider the extrapolations in Age of Em; a world-champion-level human Go player requires millions of children sampled and 15+ years to train, while an AlphaZero is simply copied over to another computer). It would be surprising if one could run human-level intelligences (perhaps arbitrarily many of them) on a pocket smartphone, with millions or billions of them across the world, even outnumbering regular biological humans, and still have no ‘Singularity’ through sheer numbers.

Parable of the Worms

[Image: explains to Spider-Man why he is unfriendly.]

Once upon a time, two worms were debating the prospect of “transwormism”, and specifically the possibility of hypothetical creatures from elsewhere in the space of all possible organisms, which might exceed worms by as much as worms exceed the bacteria they thrive upon, and the implications for their pile of composting manure which contains their worm civilization. Was this possible? Could all species be ranked on some sort of ‘intelligence’ or ‘power’ metric? Where would the resources to support such entities come from, and how would they be able to do anything more than worms themselves could do? Crawlviati argues for transwormism:

"There is no a priori reason to believe that worms must be the pinnacle of creation, and that there are no larger or more complex or more intelligent organisms possible. We should be applying the Poopernican Principle here—‘the manure piles in which we live are but a tiny segment of the universe in both space and time, of no privileged perspective’, and so in the Great Chain of Eating, we should expect us to be in a mediocre position.

Indeed, from the start of life, we can see many breakthroughs: multicellular life has produced endless forms most different from unicellular life, while fairly recently have neurons been invented & just increasing neuron count seems to yield considerable gains (look at us versus bacteria), so we live at an exciting time. We can speculate about the possibilities: a transworm might be completely different from us worms; or it might be similar in architecture to us worms, perhaps with a much longer body with many more neurons and so much smarter than us.

Regardless, a transworm would be difficult for us to predict, and may be able to develop very fast as it learns new ways of hunting bacteria and self-fertilization, in what we might call a ‘Moundularity’ in which it piles up resources and offspring faster than anyone else; inasmuch as a transworm may have very different priorities from us and change the environment to fit its needs, it would be dangerous to us."

Slimeplicius disagrees:

"Ridiculous! Think for a moment about your claims. We are blessed with 302 neurons, with which we can react to stimuli, move forward, move backward, hunt bacteria, and solve challenging navigational puzzles of many worm-lengths.

But these problems exhibit diminishing returns—wormputational complexity theory tells us that optimal maze navigation is exponentially difficult, for example, and many important problems are worse. Transworms would immediately find their additional cognition to be of ever less marginal value as they run up against the wall of wormputational complexity.

What could they possibly do with, say, 1000 neurons that would justify a metabolic cost over 3 times greater? And to be truly worthy of the name transworm, they might need tens of thousands, or even millions of neurons! (I won’t even bother to debunk fanciful extrapolations to billions of neurons, which would be more than exist in possibly all the manure piles in our observable universe put together.)

Consider the absurdity of such an architecture: could our manure pile support a single such transworm? Where would the food come from? For that matter, how would its body support so many neurons? And its genes could no longer specify cell placement one by one, but organization would have to somehow ‘emerge’ or be ‘learned’, and you are rather obscure about how this might happen.

Not to mention the many enormously complicated practical engineering problems you seem to gloss over in your blind faith in progress and lines on graphs: for example, diffusion would no longer work to feed each cell, requiring novel mechanisms solely to move fluids around to avoid overheating or starving to death.

If a transworm could exist, it would be exponentially difficult for it to eat bacteria and reproduce faster than regular worms, and its performance would likely converge with ours: it would solve our problems only slightly better than us, at tremendously increased cost. Consider Turing-completeness: anything a transworm could compute, us worms could also compute, is it not so? (We could even make an evolutionary argument: we have evolved to be as smart as is optimal in our niche—and no more or less.)

Certainly, any ‘Moundularity’ would be so slow that us worms would smell it coming long in advance and wriggle together in a big ball to crush it."

Crawlviati:

"Your argument seems overly narrow to me. Yes, I agree that it would be difficult to support so many neurons packed together in one worm, but I’m sure the engineering difficulties can be overcome—there seems to be no fundamental limit to wormputation much greater than 302 neurons, so there must be a way. And your food objection is likewise soluble: perhaps transworms can migrate from compost pile to garden to compost pile regularly as they exhaust resources in each one, or even figure out some way to easily knock down low-hanging fruit & let them rot.

They may not, bacterium for bacterium or cell for cell, be as efficient as us, but that doesn’t matter as long as the diminishing returns don’t turn into negative returns. As long as the returns are positive, they will be able to pay for their increased resource utilization and continue climbing up the exponential curves.

And what does ‘better’ even mean here? The wormputational complexity of a maze may increase sharply with maze size, but that’s a statement about mazes, not about comparing maze-solvers, which might be arbitrarily better or worse than each other, so there’s a problem with your argument: maybe they could solve mazes 100× faster.

Then there’s figuring out what any bit of performance means: if a transworm could solve mazes twice as fast as you or I, maybe it’ll get all the rotting food when it beats us in a race to the end, and not merely twice as much.

Heck, we’re worms—what do we know about the world? Maybe there’s more to life, the mound and everything; perhaps there are all sorts of cool things we could do, besides ‘stimulus, response; stimulus, response; stimulus, response’—if we could just think for once in our lives!"

Slimeplicius:

“These claims seem to rest entirely on what I might call an appeal to ignorance: maybe mazes can be run faster than we can, maybe there are great things which could be done with more neurons, maybe there’s lots of food we can’t obtain but could with more intelligence… Sure, maybe I can’t prove that there aren’t, but is any of this what a reasonable worm, the ordinary worm in the dirt, would believe? Certainly not. We are the pinnacle of civilization, and can hardly be expected to believe in the possibility of ‘transworms’ without even a live example of a transworm to point to. Create a transworm and perhaps then I will take your wild speculations more seriously.”

Crawlviati, plaintive:

“If you’ll just think a little more about the possibilities…”

Slimeplicius, dismissively:

“There are better things to worry about: like pile warming. What if our wastes and their decay make our mound too hot for us? We should discuss that instead.”

So they did. A week later, the farm was sold to a real estate developer to build townhouses on. The mound was flattened by a steam roller, and then paved over with asphalt—the construction workers neither loved nor hated the worms, but the worms had nothing to offer in trade, and were made of atoms useful for widening the new road to make some humans’ commute ever so slightly faster.

Conclusion

Computational complexity classes offer little guidance about the capabilities of humans, AIs, or other agents as they are too universal and generalized and do not tightly bind outcomes; at most, they demonstrate that neither humans nor AIs are omnipotent. If one wants to put limits on the ability of an AI by way of computational resources, a much more detailed analysis must be done linking data/sample efficiency or algorithmic performance to capability improvement to performance on an ensemble of tasks & access to additional resources, with the consequent economic, military, or social outcomes.

See Also

Appendix

Technology forecasting errors: functional fixedness in assuming dependencies

A classic cognitive bias in technological forecasting is motivated stopping and lack of imagination in considering possibilities. Many people use a mental model of technologies in which they proceed in a serial sequential fashion, assume every step is necessary and only all together are they sufficient, note that some particular step is difficult or unlikely to succeed, and conclude that the technology as a whole will fail & never happen. But in reality, few steps are truly required. Progress is predictably unpredictable: a technology only needs to succeed in one way to succeed, and to fail it must fail in all ways. There may be many ways to work around, approximate, brute force, reduce the need for, or skip entirely a step, or redefine the problem to no longer involve that step at all. Examples of this include the parallel projects used by the Manhattan Project & Apollo program, which reasoned that despite the formidable difficulties in each path to the end goal, at least one would work out—and they did. In forecasting, to counter this bias, one should make a strong effort to imagine all possible alternatives which could be pursued in parallel, and remember that overall failure requires all of them to fail.

“Gallant fellows, these soldiers; they always go for the thickest place in the fence.”

, of the 10

“In expressing full function, there are no fixed methods.”

11

One of the most common errors in criticizing a technology or in forecasting future progress is to engage in overly rigid interpretation of what that technology is: to assume that it will be the same in 20 years as it is now, using the same components to achieve the same goals, and each component helplessly dependent on another component such that failure of one is failure of all. This then leads to pessimistic extrapolations and assumptions of imminent failure which will look comic in retrospect. And too often a critic settles for finding a single way in which, if a technology were implemented or used in the dumbest way possible, that would be bad, as if that were the only way possible or as if misuse were universally inevitable, and declares the question settled. (One is reminded of the joke about a patient who goes to the doctor, assumes a yoga position, and complains, “doctor, it hurts when I do this!”, to which he replies, “don’t do that, then.”)

The core mistake in this fallacy is reminiscent of the creativity tests which ask the subject to accomplish a task like making a candle holder from common objects like a box of soap and thumbtacks, or to come up with as many possible uses for a brick as possible; a subject who insists that a brick is a brick, and will not reimagine the box of soap as an open box which could be mounted on the wall, will never solve the tasks well, will be surprised when others accomplish it easily, and perhaps feel deep down that the others cheated (even if it worked perfectly well).

Defining a system, one might treat it as a “conjunctive” system where everything has to work: D depends on C1, C1 depends on B1, and B1 depends on A1; so if any problem is found in A1, B1, or C1, the entire system can be treated as debunked and an impossible contraption. It suffices to show that probably ~A1, so probably ~D too. The problem with this way of thinking is that it may be adequate for immediate criticism of a specific system and a useful approach for engineering it, but it is not the same thing as answering whether D will be accomplished in 10 years or whether the system is a good idea! Because such a criticism can be fixed by using a different A2, or perhaps A3, or A4, or A5, or A6, or maybe not even needing something like A at all and switching to a different system {D, X1, Y1, Z1}. The critic asks “is there any way this system might not work?” and stops there; but the forecaster asks, “is there any system with a way that works?” and keeps going.
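
The arithmetic behind this distinction is worth making explicit; with made-up numbers (an R sketch of mine, not anything from the text):

```r
# Suppose goal D needs 4 steps and each step's most obvious implementation has a 70%
# chance of working.
p <- 0.70
p ^ 4                      # conjunctive reading: every named component must work -> ~0.24

# But if each step really has 3 independent alternative implementations (A1, A2, A3, ...),
# a step fails only if *all* of its alternatives fail:
p_step <- 1 - (1 - p) ^ 3  # -> ~0.97 per step
p_step ^ 4                 # disjunctive-within-steps reading -> ~0.90 for D overall
```

The same surface facts (‘each approach is shaky’) support wildly different forecasts depending on whether the critic or the forecaster is doing the counting.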

Most technology goals are “disjunctive” in the sense that “anything goes” as long as it accomplishes the final goal, and it doesn’t matter how many dead ends get tried. If one way ‘hurts’, “don’t do that, then” and find a different way. The critic needs to disprove all possible ways, and not just give in to the temptation of settling for using their expertise for the far easier (and far more useless) task of scattering around doubts about a handful of ways. To fly, it is not necessary to recreate a bird’s wing, only to find some way to fly—and there turn out to be many different propulsion methods which succeed at the task of flight but bear little resemblance to how a bird flies (hot air, lighter-than-air gases, propellers, jet engines, ramjets, rockets, gliders). A critic of the Wright or Montgolfier brothers would have been perfectly correct in noting that their methods could not and would never lead to bird flight, and wrong if they had then tried to proclaim that human flight was impossible or centuries away. Similarly, the Manhattan Project and the Apollo Project had a specific technology goal, but the goal was disjunctive in that any of several methods of refining uranium or plutonium or going to the Moon could have achieved the goal; each had formidable technological & scientific obstacles and had critics predicting failure, but only one had to work out; with money no object, they pursued multiple methods in parallel, and at least one did work out. Similarly, Thomas Edison knew many people had failed to create a useful electric lightbulb before him, and failed to create one himself after trying out hundreds of variants; did this mean that an electric lightbulb would never be invented? No, it just meant that he had to keep trying filaments until he found one that worked, and that was all he needed. One might analogize R&D to computer hackers: both share a security mindset where they only need to find one way in past the defenses thrown up in their path by Nature or corporations; if the firewall can’t be penetrated, then try hacking their printer, inject a malicious payload into their servers, or just try calling up the secretary, say you’re the CEO, and ask for the password!

There is rarely a need to be rigid about the exact method, as long as the result works; we are free to be flexible.

If virtual reality headsets require 4K resolution per eye to deliver satisfactory VR experiences but this is too hard for GPUs to render fast enough, does this prove VR is impossible, or does it simply mean we must get a little creative and explore alternative solutions like using “foveated rendering” to cheat and render only a fraction of the 4K? In 10 years, should you expect this to be an issue? No, because resolution & quality is a disjunctive problem, just like motion-sickness in VR before it, which was fixed by a combination of continuous resolution/optics/FPS/locomotion/controller-hand-tracking/software/game-design fixes; there is no component which can be proven to be unfixable and universal across all possible solutions. If vacuum tubes will, for good physics reasons, cease to be shrinkable soon, does that mean Moore’s law is doomed and computers will always be ENIAC-sized, or is it a disjunctive problem where any computational substrate can work, and the sigmoid of vacuum tubes will be replaced by other sigmoids of transistors, maintaining Moore’s law into the modern era? (Indeed, any exponential growth may turn out on closer inspection to be a stack of sigmoid growths, where alternatives are regularly found in order to make further progress; a correct argument that a particular technology will soon top out still does not prove that the exponential growth will stop, which would require proving that no alternatives would be found after the current sigmoid.) If GOFAI expert systems turn out to not be great, are expert systems a conjunctive problem where they are the only possible way of approaching artificial intelligence, or can artificial intelligence be solved in many ways and progress continue? If Bitcoin transactions are only pseudonymous and cryptocurrency blockchains scale poorly, are those conjunctive or disjunctive problems? (The answer should be easy even if you do not follow all the proposals for improvements like SegWit/Mimblewimble/sharding/MAST/scriptless scripts/Schnorr etc.) If genetic studies like GWASes are only able to explain a small percentage of measured individual differences at a particular point in time, does that prove GWASes are a failed methodology and that genetics in general is unimportant, or should we note that data will continue piling up and that there are a dozen ways in which GWASes are inefficient & we expect the growth in predictive power to continue? How many different sensors could self-driving cars draw on to achieve human-level safety? Etc. A similar observation holds for impossibility proofs and no-go theorems.
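
The ‘stack of sigmoids’ point is easy to visualize with a toy R sketch (the particular curves and numbers are arbitrary and mine):

```r
# Each technology generation saturates (a logistic curve), but successors with higher
# ceilings take over, so the combined envelope keeps growing roughly exponentially.
t <- seq(0, 50, by = 0.1)
sigmoid <- function(t, midpoint, height) height / (1 + exp(-(t - midpoint)))
total <- sigmoid(t, 10, 1) + sigmoid(t, 20, 4) + sigmoid(t, 30, 16) + sigmoid(t, 40, 64)
plot(t, total, type = "l", log = "y", xlab = "Time", ylab = "Capability (log scale)",
     main = "Stacked sigmoids approximating exponential growth")
```

A correct prediction that any one of these curves is about to flatten says nothing, by itself, about the envelope.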

Often, these arguments seem impressive but in practice, people continue to do the impossible or go where they shouldn’t; why?

The proofs may be entirely correct, but as with any proof’s assumptions and theorem, the devil is in the details: perhaps the assumptions are not plausible in the first place, or the theorem doesn’t mean what you thought it meant. It might be impossible to create an algorithm with 100% accuracy, but the errors occur only on inputs that never occur, or the result is an approximation which is nonetheless indistinguishable from an exact answer for all intents & purposes, or is less likely to be wrong than one’s computer is to be hit by a cosmic ray or one’s brain to suffer a stroke while thinking it through. And so on. A relevant example of this phenomenon comes up in discussing AI risk: one type of argument that AIs cannot ever be dangerous and AI risk is not real relies on an argument from worst-case algorithmic complexity, observing that many important problems like 3SAT are NP-hard, where the required computation quickly increases to infeasible levels, so humans and AIs must be equivalently intelligent, therefore AIs cannot be dangerous (see above); unfortunately, every premise is vulnerable—worst-case complexity is often irrelevant, the ignored constant factors can be critical, small differences in intelligence/capability can have large compounding effects particularly in competitions, problems can be rearranged to bypass or make them easier, extremely large amounts of computation can be thrown at problems, AIs could have many advantages like immortality which are simply orthogonal to computational complexity, etc. There is nothing wrong with the proofs of what complexity class 3SAT is in; it’s just that what is proven is not necessary to the argument, and what is necessary to the argument is not proven.

When is it valid to argue from a conjunctive perspective?

Sometimes, it is possible to come up with a universal proof like one based on laws of physics; if a goal turns out to be thermodynamically or information-theoretically impossible, that’s a good hint that no one is going to achieve it. So for the much-debated possibility of cryonics, a valid disproof would be to demonstrate that all the relevant information in the human brain is lost or destroyed within minutes/hours of death; an invalid conjunctive disproof would be to set up a chain of probabilities like "10% chance you die in a hospital, 10% chance you are vitrified within a few hours, 10% chance ALCOR doesn’t go bankrupt, 10% chance your body can be regrown healthily… thus, there’s ~0% chance it’ll work", because each step is not really conjunctive but disjunctive (one can die elsewhere, the time window is not known, a cryonics organization bankruptcy ≠ rethawing, growing bodies is not the only possible good outcome, etc).

Other times a technology may be inherently bottlenecked at a step all of whose possible implementations are very difficult, and it becomes quasi-conjunctive, versus a competing paradigm which is disjunctive. An example there is iterated embryo selection (IES) vs genome synthesis for creating ultra-highly-optimized human genomes: iterated embryo selection requires close control over converting stem cells to gametes and back in order to exert selection pressure over many ‘generations’ to optimize for particular traits, and given the trickiness of coaxing cells into regressing to stem cells and developing into particular cell types, there is probably only one effective procedure and no real alternatives; but ‘genome synthesis’ is simply the idea of synthesizing an entire human genome from scratch to a pre-specified optimally-designed genome, and, like the many competing techniques in genome sequencing, there are many different methods of combining nucleotides to build up a custom genome, which contribute to the (yes, exponential) fall in genome synthesis costs, and many further methods proposed (like specialized yeast) to reduce costs even further to the point where human genomes can be synthesized for a few million dollars each (and then reused arbitrarily often). IES is strongly conjunctive: if the stem cell steps can’t be solved, the entire cycle falls apart; genome synthesis is disjunctive: any of several methods can be used to synthesize nucleotides, and many more can be tried. Which one can we expect first? IES has a considerable headstart, but genome synthesis progresses like a metronome because if it gets blocked on one track, it can hop to another; so my money is on genome synthesis at the moment.

So in fore­cast­ing some­thing like GWASes, Bit­coin, or VR, here are some use­ful ques­tions to ask:

  1. Are there any hard con­straints from pow­er­ful the­o­ries like ther­mo­dy­nam­ics, evo­lu­tion, or eco­nom­ics bound­ing the sys­tem as a whole?
  2. Break­ing down by func­tion­al­i­ty, how many dif­fer­ent ways are there to imple­ment each log­i­cal com­po­nent? Is any step a seri­ous bot­tle­neck with no appar­ent way to cir­cum­vent, rede­fine, or solve with brute force?
  3. Do its past improve­ments describe any lin­ear or expo­nen­tial trend? What hap­pens if you extrap­o­late it 20 years (in the grand scheme of things, a minor error of tim­ing)? If the trend is expo­nen­tial, is it a set of stacked sig­moids indi­cat­ing con­sid­er­able flex­i­bil­ity in choice of technology/method?
  4. What experience-curve effects could this benefit from, in each of its components and as a whole (see the sketch after this list)? If the cost decreases 5% for every doubling of cumulative production, what happens, and how much demand do the price drops induce? Are any of the inputs cumulative, like dataset size (eg. number of genomes sequenced to date)?
  5. Does it ben­e­fit from var­i­ous forms of Moore’s laws for CPUs, RAM, HDD, or GPU FLOPs? Can fixed-­cost hard­ware or humans be replaced by soft­ware, Inter­net, or sta­tis­tics? Is any part par­al­leliz­able?
  6. What cal­iber of intel­lect has been applied to opti­miza­tion? What amount of ‘con­stant fac­tor’ slack is left in the sys­tem to be micro-op­ti­mized away?
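
The experience-curve question above can be made concrete with a minimal sketch (the 5% decline per doubling and the production figure are assumptions for illustration, not estimates for any particular technology):

    ## Experience curve (Wright's law): unit cost falls by a fixed fraction
    ## with every doubling of cumulative production.
    experienceCurveCost <- function(firstUnitCost, cumulativeUnits, declinePerDoubling=0.05) {
        firstUnitCost * (1 - declinePerDoubling)^log2(cumulativeUnits)
    }
    ## Relative unit cost after a million cumulative units at a 5% learning rate:
    experienceCurveCost(1, 1e6)
    # [1] ~0.36  (about 20 doublings of 0.95)

Even a modest per-doubling decline compounds into a large cumulative cost reduction, which is why induced demand and cumulative inputs matter so much for extrapolation.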

  1. Chess and Go, for example, have tremendous game tree complexity. (Even deciding the winner of a Go board is surprisingly difficult: PSPACE-hard, and EXPTIME-complete under some rulesets, so possibly worse than NP.) Nevertheless, AIs greatly surpass human abilities at them, and as of 2018, even ‘centaur’ teams no longer add anything to the AI performance. Since this is the case already, Myhrvold’s argument that math’s complexity makes it immune to AI is undermined by his own examples.↩︎

  2. “Shifts In Algo­rithm Design”, Lipton/Regan:

    Now today, in the 21st cen­tu­ry, we have a bet­ter way to attack prob­lems. We change the prob­lem, often to one that is more tractable and use­ful. In many sit­u­a­tions solv­ing the exact prob­lem is not really what a prac­ti­tioner needs. If com­put­ing X exactly requires too much time, then it is use­less to com­pute it. A per­fect exam­ple is the weath­er: com­put­ing tomor­row’s weather in a week’s time is clearly not very use­ful. The bril­liance of the cur­rent approach is that we can change the prob­lem. There are at least two major ways to do this:

    • Change the answer required. Allow approx­i­ma­tion, or allow a par­tial answer. Do not insist on an exact answer.
    • Change the algo­rith­mic method. Allow algo­rithms that can be wrong, or allow algo­rithms that use ran­dom­ness. Do not insist that the algo­rithm is a per­fect deter­min­is­tic one.

    This is exactly what Chayes and her co-au­thors have done.
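
    Not their problem, but a toy R illustration of both moves at once (the data and sample size here are arbitrary assumptions): accept an approximate answer, and compute it with randomness (sampling) rather than an exact deterministic pass over everything.

    ## Exact deterministic answer vs. a randomized approximation on toy data:
    set.seed(2019)                 # arbitrary seed, for reproducibility only
    x <- rnorm(1e7)
    median(x)                      # exact: examines all 10 million values
    median(sample(x, 1e4))         # approximate & randomized: a 0.1% sample
    ## The two answers typically agree to within a few hundredths, at a fraction of the cost.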

    ↩︎
  3. Rintanen 2012, discussing how to turn AI planning problems into SAT problems which can be solved efficiently, notes that

    A pecu­liar­ity of SAT prob­lems obtained by trans­la­tion from the stan­dard plan­ning bench­mark prob­lems from the plan­ning com­pe­ti­tions, in con­trast to SAT prob­lems rep­re­sent­ing many other appli­ca­tions, is their extremely large size and the fact that these prob­lems can still often be solved quick­ly. The largest SAT prob­lems Lin­geling solves (within the time bounds explained ear­lier) are instance 41 of AIRPORT (417476 propo­si­tional vari­ables, 92.9 mil­lion claus­es) and instance 26 of TRUCKS (926857 propo­si­tional vari­ables, 11.3 mil­lion claus­es).

    Our plan­ner solves instance 49 of AIRPORT (13840 actions and 14770 state vari­ables) with a com­pleted unsat­is­fi­a­bil­ity test for hori­zon 65, with 1.12 mil­lion propo­si­tional vari­ables and 108.23 mil­lion claus­es, and a plan for hori­zon 85, with 1.46 mil­lion propo­si­tional vari­ables and 141.54 mil­lion claus­es. The plan­ner also solves instance 33 of SATELLITE (989250 actions and 5185 state vari­ables), with a plan found for hori­zon 20, with 19.89 mil­lion propo­si­tional vari­ables and 69.99 mil­lion claus­es, back­track­-free in 14.50 sec­onds exclud­ing trans­la­tion into SAT and includ­ing search effort for shorter hori­zons. These are extreme cas­es. More typ­i­cal SAT instances have less than 2 mil­lion propo­si­tional vari­ables and a cou­ple of mil­lion clauses

    ↩︎
  4. Dyson 1988, summarizing/paraphrasing a weather-prediction lecture by von Neumann ~1950. It’s unclear whether von Neumann said this exact thing, although it is usually attributed to him.↩︎

  5. Attrib­uted to him in 2009 by his col­league .↩︎

  6. Some exam­ples of this folk wis­dom: Can­tor & Zassen­haus 1981:

    The asymp­tot­i­cally best algo­rithms fre­quently turn out to be worst on all prob­lems for which they are used.

    “Notes on Programming in C”, Rob Pike:

    Rule 3. Fancy algo­rithms are slow when n is small, and n is usu­ally small. Fancy algo­rithms have big con­stants. Until you know that n is fre­quently going to be big, don’t get fan­cy. (Even if n does get big, use Rule 2 first.)

    Knuth:

    In gen­eral I’m look­ing for more focus on algo­rithms that work fast with respect to prob­lems whose size, n, is fea­si­ble. Most of today’s lit­er­a­ture is devoted to algo­rithms that are asymp­tot­i­cally great, but they are help­ful only when n exceeds the size of the uni­verse…An­other issue, when we come down to earth, is the effi­ciency of algo­rithms on real com­put­ers. As part of the Stan­ford Graph­Base project I imple­mented four algo­rithms to com­pute min­i­mum span­ning trees of graphs, one of which was the very pretty method that you devel­oped with Cheri­ton and Karp. Although I was expect­ing your method to be the win­ner, because it exam­ines much of the data only half as often as the oth­ers, it actu­ally came out two to three times worse than Kruskal’s ven­er­a­ble method. Part of the rea­son was poor cache inter­ac­tion, but the main cause was a large con­stant fac­tor hid­den by O nota­tion.

    More specif­i­cal­ly: “Knuth did a com­par­i­son between Fibonacci heap and binary heaps for min­i­mum span­ning trees back in 1993 for his book Stan­ford Graph­Base. He found Fibonacci to be 30 to 60% slower than binary heaps at the graph sizes he was test­ing, 128 ver­tices at dif­fer­ent den­si­ties.”
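
    The same constant-factor effect is easy to reproduce in miniature (a toy R benchmark, not Knuth’s experiment; exact timings vary by machine): an interpreted O(log n) binary search against a ‘dumb’ but vectorized O(n) scan.

    ## Asymptotically better is not always faster: an interpreted O(log n) binary
    ## search vs. a vectorized O(n) scan with a much smaller constant factor.
    binarySearch <- function(v, target) {          # v assumed sorted
        lo <- 1L; hi <- length(v)
        while (lo <= hi) {
            mid <- (lo + hi) %/% 2L
            if (v[mid] == target) return(mid)
            if (v[mid] < target) lo <- mid + 1L else hi <- mid - 1L
        }
        NA_integer_
    }
    x <- sort(sample.int(1e6, 1e3)); target <- x[777]
    system.time(for (i in 1:1e4) binarySearch(x, target))
    system.time(for (i in 1:1e4) which(x == target)[1])
    ## At n = 1000 the 'dumb' scan is typically as fast or faster; only for much
    ## larger n does the logarithmic search's asymptotic advantage pay off.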

    On the Coppersmith–Winograd algorithm:

    The Cop­per­smith-Wino­grad algo­rithm is fre­quently used as a build­ing block in other algo­rithms to prove the­o­ret­i­cal time bounds. How­ev­er, unlike the Strassen algo­rithm, it is not used in prac­tice because it only pro­vides an advan­tage for matri­ces so large that they can­not be processed by mod­ern hard­ware.[6]

    Some algo­rithms are par­tic­u­larly infa­mous for their excel­lent asymp­tot­ics but abysmal con­stant fac­tors, such as the com­putable ver­sions of AIXI. Lip­ton dubs such algo­rithms “galac­tic algo­rithms”.↩︎

  7. The only author on the Singularity I know of who claims an actual indefinite increase in intelligence to infinity, taking ‘singularity’ quite literally and not as Vinge’s metaphor/comparison, would be Frank J. Tipler and his Omega Point ideas; but as far as I know, even assuming the correctness of his calculations, his infinite intelligence is physically possible only under a number of cosmological conditions, some of which do not seem to be true (such as a closed universe rather than a flat expanding one).↩︎

  8. Average Is Over, Cowen 2013:

    Vasik Rajlich, the pro­gram­mer behind , gives a more pes­simistic spin to what we have learned from the chess-­play­ing pro­grams. In Rajlich’s view, the strik­ing fact about chess is how hard it is for humans to play it well. The out­put from the pro­grams shows that we are mak­ing mis­takes on a very large num­ber of moves. Ken’s mea­sures show that even top grand­mas­ters, except at the very peaks of their per­for­mances, are for­tu­nate to match Rybka’s rec­om­men­da­tions 55% of the time. When I com­pare a grand­mas­ter game to an ongo­ing Rybka eval­u­a­tion, what I see is an ini­tial posi­tion of value being squan­dered away by blun­der­s—if only small ones—a­gain and again and again. It’s a bit depress­ing. Rajlich stresses that humans blun­der con­stant­ly, that it is hard to be objec­tive, hard to keep con­cen­trat­ing, and hard to cal­cu­late a large num­ber of vari­a­tions with exact­ness. He is not talk­ing here about the club patzer but rather the top grand­mas­ters: “I am sur­prised how far they are from per­fec­tion.” In ear­lier times these grand­mas­ters had a kind of aura about them among the chess-view­ing pub­lic, but in the days of the pro­grams the top grand­mas­ters now com­mand less respect. When a world-­class player plays a club expert, the world-­class player looks bril­liant and invin­ci­ble at the board. Indeed, the world-­class player does choose a lot of very good moves. At some point his supe­rior posi­tion starts “play­ing itself,” to use an expres­sion from the chess world, and just about every­thing falls into place. When the same world-­class player takes on Shred­der, to select an appro­pri­ately named pro­gram, he seems more like a hap­less fool who must exert great care to keep the sit­u­a­tion under con­trol at all. And yet it is the very same play­er. That gap—­be­tween our per­cep­tion of supe­rior human intel­lect and its actual real­i­ty—is the sober­ing les­son of the pro­grams.

    See also Anderson et al 2016, and for comparison, AlphaGo Zero: the first AlphaGo was trained to predict human players’ moves, achieving ~55% accuracy; Zero plays Go far better but predicts human moves noticeably worse (~51%), and its move predictions get worse even as it gets better at playing or at predicting the ultimate winner, implying that human experts, too, are able to choose the best move only ~50% of the time or less.↩︎

  9. Shock­ley notes that with 8 vari­ables and an advan­tage of 50%, the out­put under a log-nor­mal model would be increased by as much as 25×:

    ## Simulate multiplicative output: each of n.variables inputs is drawn from a
    ## Normal(1+advantage, 1) and the inputs are multiplied together; return the
    ## mean output over many simulated 'workers'.
    simulateLogNormal <- function(advantage, n.variables, iters=100000) {
      regular <- 1   # baseline ('regular') output is normalized to 1; kept for reference
      advantaged <- replicate(iters, Reduce(`*`, rnorm(n.variables, mean=(1+advantage), sd=1), 1))
      ma <- mean(advantaged)
      return(ma)     # mean output relative to the baseline of 1
    }
    simulateLogNormal(0.5, 8)
    # [1] 25.58716574

    With more vari­ables, the out­put dif­fer­ence would be larger still, and is con­nected to the . This poses a chal­lenge to those who expect small dif­fer­ences in abil­ity to lead to small out­put dif­fer­ences, as the log-nor­mal dis­tri­b­u­tion is com­mon in the real world, and also implies that hybrid human-AI sys­tems are unsta­ble as the human will quickly become a bot­tle­neck.↩︎

  10. Spo­ken to ; recorded pg296 of The Naval Mem­oirs 1910–1915, Keyes 1934.↩︎

  11. As quoted by in , Mikkel­son 1997.↩︎