Skip to main content

Surprisingly Turing-Complete

A catalogue of software constructs, languages, or APIs which are unexpectedly Turing-complete; implications for security and reliability

2012-12-092021-12-15 finished
certainty: highly likely importance: 6
backlinks similar bibliography

‘Computers’, in the sense of being Turing-complete, are extremely common. Almost any system of sufficient complexity—unless carefully engineered otherwise—may be found to ‘accidentally’ support Turing-complete somewhere inside it through ‘weird machines’ which can be rebuilt out of the original system’s parts, even systems which would appear to have not the slightest thing to do with computation. Software systems are especially susceptible to this, which often leads to serious security problems as the Turing-complete components can be used to run attacks on the rest of the system.

I provide a running catalogue of systems which have been, surprisingly, demonstrated to be Turing-complete. These examples may help unsee surface systems to see the weird machines and Turing-completeness lurking within.

Turing-completeness (TC) is (avoiding pedantically rigorous formal definitions) the property of a system being able to, under some simple representation of input & output, compute any program of interest, including another computer in some form.

TC, besides being foundational to computer science and understanding many key issues like “why a perfect antivirus program is impossible”, is also weirdly common: one might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite—it is difficult to write a useful system which does not immediately tip over into TC. “Surprising” examples of this behavior remind us that TC lurks everywhere, and security is extremely difficult.

I like demonstrations of TC lurking in surprising places because they are often a display of considerable ingenuity, and feel like they are making a profound philosophical point about the nature of computation: computation is not something esoteric which can exist only in programming languages or computers carefully set up, but is something so universal to any reasonably complex system that TC will almost inevitably pop up unless actively prevented.

Accidentally Turing-complete

“Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp⁠.”

Greenspun’s Tenth Law

XKCD #1636, “XKCD Stack”

They are probably best considered as a subset of “discovered” or “found” esoteric programming languages (esolangs). So FRACTRAN⁠, as extraordinarily minimalist as it is, does not count; nor would a deliberately obfuscated language like Malbolge (where it took years to write a trivial program & an interpreter was only solved in 2020) count because it was designed to be an esolang; but neither would Conway’s Game of Life count because questions about whether it was TC appeared almost immediately upon publication and being able to program Tetris in it is not surprising, and given the complexity of packet-switching Ethernet networks & routers it’s not necessarily too surprising if one can build a cellular automaton into them or encode logical circuits⁠, or if airplane ticket planning/​validation is not just NP-hard or EXPSPACE-hard but undecidable (because of the complex rules airlines require). Many configuration or special-purpose languages or tools or complicated games turn out to violate the Rule of least power & be “accidentally Turing-complete”⁠, like Sendmail1⁠, MediaWiki templates⁠, sed or repeated regexp / find-replace commands in an editor (any form of string substitution or templating or compile-time computation is highly likely to be TC on its own or when iterated since they often turn out to support a lambda calculus or a term-rewriting language or tag system eg. esolangs “ /  /  /  ” or Thue), XSLT⁠, Infinite Minesweeper⁠, Dwarf Fortress2⁠, Starcraft, Minecraft in Minecraft⁠, Ant⁠, Transport Tycoon & Cities: Skyline⁠, C++ templates & Java generics⁠, RNA⁠/​DNA computing etc are TC but these are not surprising either: many games support scripting (ie. TC-ness) to make their development easier and enable fan modifications, so games’ TC may be as simple as including syntax for calling out to a better-known language like Perl (which can’t even be parsed without undecidability coming up), or it may just be an obscure part of a standard format (most people these days are probably unaware that TrueType & many fonts are PostScript programs based on stack machines, similar to DWARF debugging and ELF metadata⁠, or that some music formats go beyond MIDI in providing scripting capabilities and must be interpreted to be displayed; once one knows this, then fonts being TC are no more surprising than TeX documents being TC, leading of course, to many severe & fascinating font or media security vulnerabilities such as the BLEND vulnerability or SNES & NES code exploiting Linux systems⁠. Other formats, like PDF, are simply appalling.3). Similarly, such feats as creating a small Turing machine using Legos or dominos4 would not count, since we already know that mechanical computers work. It may be somewhat surprising that the Z3 electromechanical computer is Turing-complete given that it could only “execute fixed sequences of of floating-point arithmetical operations (addition, subtraction, multiplication, division, and square roots)”—but with all those operations, the camel’s nose is in the tent.

On the other hand, “weird machines” are a fertile ground of “that’s TC?” reactions.

Surprisingly Turing-complete

Many cases of discovering TC seem to consist of simply noticing that a primitive in a system is a little too powerful/​flexible. For example, if Boolean logic can be implemented, that’s a sign that more may be possible and turn Boolean circuits into full-blown circuit logic for a TM. Substitutions, definitions/​abbreviations, regular expressions (especially with any extensions or custom features), or any other kind of ‘search and replace’ functionality is another red flag (eg. Notepad++ Turing machines), as they suggest that a cellular automaton or tag system is lurking. This applies to anything which can change state based on ‘neighbors’, like a spreadsheet cell or a pixel. Any sort of scripting interface or API, even if locked down, may not be locked down quite enough. An actual scripting language or VM is so blatant as to be boring when (not if) someone finds a vulnerability or escape from the sandbox. Operations which take variable lengths of times or whose completion can’t easily be predicted from the start are another source of primitives, as they may ‘depend’ on the data they are operating over in some way, implementing different operations on different data, which may mean that they can be made equivalent to Boolean conditionals based on a careful encoding of data.

What is “surprising” may differ from person to person. Here is a list of accidentally-Turing-complete systems that I found surprising:

Possibly accidentally or surprisingly Turing-complete systems:

  • CSS without mouse clicks? Perhaps some sort of Wang tile using reflections and conditionals?

  • Unicode? Nicolas Seriot suggests that Unicode’s bidirectional algorithms (intended for displaying scripts like Arabic or Hebrew which go right-to-left rather than left-to-right like English) may be complex enough to support a tag system via case folding rules (eg. Turkish).

    • Fonts themselves? Font rendering systems also support glyph substitution rules which are suspiciously close to tag systems and allow for tricks like solving Fizz Buzz or rendering sparklines⁠, although some implementations may not allow recursion and others have hardcoded depth limits.

      One interestingly practical use is ZawDecode⁠, a Burmese font which tries to partially solve the problem that much Burmese is written in a fake Unicode called Zawgyi font which defines the meaning of each Unicode point totally differently from the official Unicode Burmese, so a string of text which is written in ‘Zawgyi’ will be a ‘valid’ UTF-8 string but gibberish, and vice-versa. ZawDecode abuses ligatures to guess based on the gibberish what a string is written in, so it can either render it according to Zawgyi or as regular Burmese. (The author claims “85%” accuracy in guessing Zawgyi but no details are available in English.)

      By far the most impressive use of ‘font programming’ so far is Fontemon: World’s first video game in a font!” (a Choose-Your-Own-Adventure text game).

  • Human optical illusions? Changizi2008 presents ambiguous images in a circuit-like format, whose depth perception ‘flips’ based on the ‘input’ or top of the circuit, which are analogous to OR/​AND/​NOT/​XOR computations; the existence of these ‘visual circuits’ hints at the possibility of ‘TC-complete’ images (although many pieces, like a working memory⁠, are missing)

Security implications

It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC. This matters because, if one is clever, it provides an escape hatch from a system which is small, predictable, controllable, and secure, to one which could do anything. It’s hard enough to make a program do what it’s supposed to do without giving anyone in the world the ability to insert another program into your program. Even if there is no way to outright ‘escape’ the sandbox, such hidden programs can be dangerous, by extracting information about the surrounding program (eg. JS embedded in a web page which can extract your passwords by using Row hammer to attack your hardware directly, even if it can’t actually escape your web browser), or can take the host into strange & uncharted (and untested) territories.

That we find these demonstrations surprising is itself a demonstration of our lack of imagination and understanding of computers, computer security, and AI. We pretend that we are running programs on these simple abstract machines which do what we intuitively expect, but they run on computers which are bizarre, and our programs themselves turn out to be computers which are even more bizarre. Secure systems have to be built by construction; once the genie has been let out of the lamp, it’s difficult to patch the lamp.

An active area of research is into languages & systems carefully designed and proven to not be TC (eg. total functional programming). Why this effort to make a language in which many programs can’t be written? Because TC is intimately tied to Godel’s incompleteness theorems & Rice’s theorem⁠, allowing TC means that one is forfeiting all sorts of provability properties: in a non-TC language, one may be able to easily prove all sorts of useful things to know; for example, that programs terminate, that they are type-safe or not, that they can be easily converted into a logical theorem, that they consume a bounded amount of resources, that one implementation of a protocol is correct or equivalent to another implementation, that there are a lack of side-effects and a program can be transformed into a logically-equivalent but faster version (particularly important for declarative languages like SQL where the query optimizer being able to transform queries is key to acceptable performance, but of course one can do a surprising amount in SQL like 3D raytracing gradient descent for fitting machine learning models and some SQL extensions make it TC anyway by allowing either a cyclic tag system to be encoded, the model DSL⁠, or to call out to PL / SQL) etc.

Languages or systems which unintentionally cross over the line into being TC can be amusing or useful (although usually not), but they also have some serious implications: such systems, because they were never expected to be programmable, can be harmful, or extremely insecure & a cracker’s delight, as exemplified by the “language-theoretic security” paradigm, based on exploiting “weird machines”⁠; some of the literature:

Most recently, Spectre & generalizations (Mcilroy et al 2019) can be interpreted as providing a whole ‘shadow computer’ in the CPU via speculative execution which can be programmed to do things like run malware without visibly executing any of the malware instructions while having side-effects in the real computer. Spectre is interesting in being a class of vulnerabilities which have existed for decades in CPU architectures that were closely scrutinized for security problems, but just sort of fell into a collective human blind spot. Nobody thought of controllable speculative execution as being a ‘computer’ which could be ‘programmed’. Once someone noticed, because it was a powerful computer and of course TC, it could be used in many ways to attack stuff.

“Too powerful” languages can also manifest as nasty DoS attacks; the fuzz tester afl found that it could create an infinite loop in OpenBSD’s roff document formatting tool (first version, 43 years prior) by abusing some of the string substitution rules.

See Also


How Many Computers Are In Your Computer?

Moved to “How Many Computers Are In Your Computer?”.

On Seeing Through and Unseeing

Moved to “On Seeing Through and Unseeing: The Hacker Mindset”.

  1. The baroque complexity of Sendmail possibly contributed to its infamous reputation for insecurity—it was one of the exploit vectors of the Morris worm⁠, and for years shipped with a remote root backdoor (password: WIZ).↩︎

  2. Dwarf Fortress provides clockwork mechanisms, so TC is unsurprising; but the water is implemented as a simple cellular automation, so there might be more ways of getting TC in DF! The DF wiki currently lists 4 potential ways of creating logic gates: the fluids, the clockwork mechanisms, mine-carts, and creature/​animal logic gates involving doors+pressure-sensors.↩︎

  3. The full PDF specification is notoriously bloated. For example, in a PDF viewer which supports enough of the PDF spec, like the Google Chrome Browser, one can play Breakout (because PDF includes its own weird subset of JavaScript). The Adobe PDF viewer includes functionality as far afield as 3D CAD support.↩︎

  4. See Think Math’s domino logic gates & 2014 demonstration of a 4-bit adder implemented using domino logic⁠.↩︎

  5. Earlier versions required players to take all possible actions, but the 2019 paper claims to remove this assumption and force all actions, rendering the construction fully mechanical.↩︎

Further Reading