Surprisingly Turing-Complete

A catalogue of software constructs, languages, or APIs which are unexpectedly Turing-complete; implications for security and reliability
computer-science⁠, philosophy⁠, insight-porn
2012-12-092019-06-15 finished certainty: highly likely importance: 6 backlinks

‘Computers’, in the sense of being Turing-complete, are extremely common. Almost any system of sufficient complexity—unless carefully engineered otherwise—may be found to ‘accidentally’ support Turing-complete somewhere inside it through ‘weird machines’ which can be rebuilt out of the original system’s parts, even systems which would appear to have not the slightest thing to do with computation. Software systems are especially susceptible to this, which often leads to serious security problems as the Turing-complete components can be used to run attacks on the rest of the system.

I provide a running catalogue of systems which have been, surprisingly, demonstrated to be Turing-complete. These examples may help unsee surface systems to see the weird machines and Turing-completeness lurking within.

Turing-completeness () is (avoiding pedantically rigorous formal definitions) the property of a system being able to, under some simple representation of input & output, compute any program of interest, including another computer in some form.

TC, besides being foundational to computer science and understanding many key issues like “why a perfect antivirus program is impossible”, is also weirdly common: one might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite—it is difficult to write a useful system which does not immediately tip over into TC. “Surprising” examples of this behavior remind us that TC lurks everywhere, and security is extremely difficult.

I like demonstrations of TC lurking in surprising places because they are often a display of considerable ingenuity, and feel like they are making a profound philosophical point about the nature of computation: computation is not something esoteric which can exist only in programming languages or computers carefully set up, but is something so universal to any reasonably complex system that TC will almost inevitably pop up unless actively prevented.

Accidentally Turing-complete

“Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”

 #1636, “ Stack”

They are probably best considered as a subset of “discovered” or “found” s (esolangs). So ⁠, as extraordinarily minimalist as it is, does not count; nor would a deliberately obfuscated language like (where it took years to write a trivial program) count because it was designed to be an esolang; but neither would count because questions about whether it was TC appeared almost immediately upon publication and being able to program Tetris in it is not surprising, and given the complexity of packet-switching Ethernet networks & routers it’s not necessarily too surprising if one can build a cellular automaton into them or encode logical circuits⁠, or if airplane ticket planning/validation is not just NP-hard or EXPSPACE-hard but undecidable (because of the complex rules airlines require). Many configuration or special-purpose languages or tools or complicated games turn out to violate the & be “accidentally Turing-complete”⁠, like Sendmail1⁠, MediaWiki templates⁠, sed or repeated regexp/find-replace commands in an editor (any form of or templating or compile-time computation is highly likely to be TC on its own or when iterated since they often turn out to support a or a language or tag system eg esolangs “/// ” or Thue), XSLT⁠, Infinite Minesweeper⁠, Dwarf Fortress2⁠, Starcraft, Minecraft⁠, ⁠, Transport Tycoon & Cities: Skyline⁠, & ⁠, ⁠/  etc are TC but these are not surprising either: many games support scripting (ie TC-ness) to make their development easier and enable fan modifications, so games’ TC may be as simple as including syntax for calling out to a better-known language like Perl, or it may just be an obscure part of a standard format (most people these days are probably unaware that & many fonts are programs based on stack machines, similar to DWARF debugging and ELF metadata⁠, or that go beyond in providing scripting capabilities and must be interpreted to be displayed; once one knows this, then fonts being TC are no more surprising than TeX documents being TC, leading of course, to many severe & fascinating font or media security vulnerabilities such as the BLEND vulnerability or SNES & NES code exploiting Linux systems⁠. Other formats, like PDF, are simply appalling.3). Similarly, such feats as creating a small Turing machine using Legos or 4 would not count, since we already know that mechanical computers work. On the other hand, “weird machines” are a fertile ground of “that’s TC?” reactions.

Surprisingly Turing-complete

Many cases of discovering TC seem to consist of simply noticing that a primitive in a system is a little too powerful/flexible. For example, if Boolean logic can be implemented, that’s a sign that more may be possible and turn Boolean circuits into full-blown circuit logic for a TM. Substitutions, definitions/abbreviations, regular expressions (especially with any extensions or custom features), or any other kind of ‘search and replace’ functionality is another red flag, as they suggest that a cellular automaton or tag system is lurking. This applies to anything which can change state based on ‘neighbors’, like a spreadsheet cell or a pixel. Any sort of scripting interface or API, even if locked down, may not be locked down quite enough. An actual scripting language or VM is so blatant as to be boring when (not if) someone finds a vulnerability or escape from the sandbox. Operations which take variable lengths of times or whose completion can’t easily be predicted from the start are another source of primitives, as they may ‘depend’ on the data they are operating over in some way, implementing different operations on different data, which may mean that they can be made equivalent to Boolean conditionals based on a careful encoding of data.

What is “surprising” may differ from person to person. Here is a list of accidentally-Turing-complete systems that I found surprising:

Possibly accidentally or surprisingly Turing-complete systems:

Security implications

It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC. This matters because, if one is clever, it provides an escape hatch from a system which is small, predictable, controllable, and secure, to one which could do anything. It’s hard enough to make a program do what it’s supposed to do without giving anyone in the world the ability to insert another program into your program. Even if there is no way to outright ‘escape’ the sandbox, such hidden programs can be dangerous, by extracting information about the surrounding program (eg JS embedded in a web page which can extract your passwords by using to attack your hardware directly, even if it can’t actually escape your web browser), or can take the host into strange & uncharted (and untested) territories.

That we find these demonstrations surprising is itself a demonstration of our lack of imagination and understanding of computers, computer security, and AI. We pretend that we are running programs on these simple abstract machines which do what we intuitively expect, but they run on computers which are bizarre, and our programs themselves turn out to be computers which are even more bizarre. Secure systems have to be built by construction; once the genie has been let out of the lamp, it’s difficult to patch the lamp.

An active area of research is into languages & systems carefully designed and proven to not be TC (eg. ). Why this effort to make a language in which many programs can’t be written? Because TC is intimately tied to & ⁠, allowing TC means that one is forfeiting all sorts of provability properties: in a non-TC language, one may be able to easily prove all sorts of useful things to know; for example, that programs terminate, that they are type-safe or not, that they can be easily converted into a logical theorem, that they consume a bounded amount of resources, that one implementation of a protocol is correct or equivalent to another implementation, that there are a lack of and a program can be transformed into a logically-equivalent but faster version (particularly important for declarative languages like where the being able to transform queries is key to acceptable performance, but of course one can do a surprising amount in SQL like 3D raytracing gradient descent for fitting machine learning models and some SQL extensions make it TC anyway by allowing either a to be encoded, the model DSL⁠, or to call out to ) etc.

Languages or systems which unintentionally cross over the line into being TC can be amusing or useful (although ), but they also have some serious implications: such systems, because they were never expected to be programmable, can be harmful, or extremely insecure & a cracker’s delight, as exemplified by the “language-theoretic security” paradigm, based on exploiting ⁠; some of the literature:

Most recently, & generalizations () can be interpreted as providing a whole ‘shadow computer’ in the CPU via which can be programmed to do things like run malware without visibly executing any of the malware instructions while having side-effects in the real computer. Spectre is interesting in being a class of vulnerabilities which have existed for decades in CPU architectures that were closely scrutinized for security problems, but just sort of fell into a collective human blind spot. Nobody thought of controllable speculative execution as being a ‘computer’ which could be ‘programmed’. Once someone noticed, because it was a powerful computer and of course TC, it could be used in many ways to attack stuff.

“Too powerful” languages can also manifest as nasty DoS attacks; the afl found that it could create an infinite loop in document formatting tool (first version, 43 years prior) by abusing some of the string substitution rules.

See Also


How Many Computers Are In Your Computer?

Why are there so many places for backdoors and weird machines in your “computer”? Because your computer is in fact scores or hundreds, perhaps even thousands, of computer chips, many of which host weird machines and are explicitly or implicitly capable of Turing-complete computations (many more powerful than desktops of bygone eras), working together to create the illusion of a single computer. Backdoors, bugs, weird machines, and security do not care about what you think—only where resources can be found and orchestrated into a computation.

Moved to “How Many Computers Are In Your Computer?”⁠.

On Seeing Through and Unseeing

To draw some parallels here and expand Dullien 2017⁠, I think unexpected Turing-complete systems and weird machines have something in common with heist movies or cons or stage magic: they all share a specific paradigm we might call the security mindset or hacker mindset.

What they/OP/security/ ⁠/hacking/  all have in common is that they show that the much-ballyhooed ‘hacker mindset’ is, fundamentally, a sort of reductionism run amok, where one ‘sees through’ abstractions to a manipulable reality.6 Like Neo in the Matrix—a deeply cliche analogy for hacking, but cliche because it resonates—one achieves enlightenment by seeing through the surface illusions of objects and can now see the endless lines of green code which make up the Matrix (and vice-versa).

In each case, the fundamental principle is that the hacker asks: “here I have a system W, which pretends to be made out of a few Xs⁠; however, it is really made out of many Y, which form an entirely different system, Z; I will now proceed to ignore the X and understand how Z works, so I may use the Y to thereby change W however I like”.

Moved to “On Seeing Through and Unseeing: The Hacker Mindet”⁠.

  1. The baroque complexity of possibly contributed to its infamous reputation for insecurity—it was one of the exploit vectors of the ⁠, and for years shipped with a remote root backdoor (WIZ).↩︎

  2. Dwarf Fortress provides clockwork mechanisms, so TC is unsurprising; but the water is implemented as a simple cellular automation, so there might be more ways of getting TC in DF! The DF wiki currently lists 4 potential ways of creating logic gates: the fluids, the clockwork mechanisms, mine-carts, and creature/animal logic gates involving doors+pressure-sensors.↩︎

  3. The full PDF specification is notoriously bloated. For example, in a PDF viewer which supports enough of the PDF spec, like the Google Chrome Browser, one can play Breakout (because PDF includes its own weird subset of JavaScript). The Adobe PDF viewer includes functionality as far afield as 3D CAD support.↩︎

  4. See Think Math’s domino logic gates & 2014 demonstration of a 4-bit adder implemented using domino logic⁠.↩︎

  5. Earlier versions required players to take all possible actions, but the 2019 paper claims to remove this assumption and force all actions, rendering the construction fully mechanical.↩︎

  6. ‘Thinking outside the box’ can be this, but often isn’t. This is a specific pattern of reductionism, and many instances of ‘thinking outside the box’ are other patterns, like putting on another layer, or eliminating the systems in question entirely.↩︎