Surprisingly Turing-Complete

A catalogue of software constructs, languages, or APIs which are unexpectedly Turing-complete; implications for security and reliability (computer science)
created: 9 Dec 2012; modified: 01 Nov 2017; status: finished; confidence: highly likely; importance: 6

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.Greenspun’s Tenth Law

Turing-completeness (TC) is the property of a system being able to, under some simple representation of input & output, compute any program.

TC, besides being foundational to computer science and understanding many key issues like why a perfect antivirus program is impossible, is also weirdly common: one might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite and it is difficult to write a useful system which does not immediately tip over into TC. It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC. This can be amusing, useful (although usually not), harmful, or extremely insecure & a cracker’s delight (see language-theoretic security, based on exploiting weird machines1). Surprising examples of this behavior remind us that TC lurks everywhere, and security is extremely difficult.

Too powerful languages can also manifest as nasty DoS attacks; the fuzz tester afl found in OpenBSD’s roff that it could create an infinite loop by abusing some of the string substitution rules.

They are probably best considered as a subset of discovered or found esoteric programming languages (esolangs). So FRACTRAN, as extraordinarily minimalist as it is, does not count2; nor would a deliberately obfuscated language like Malbolge (where it took years to write a trivial program) count because it was designed to be an esolang; but neither would Conway’s Game of Life count because questions about whether it was TC appeared almost immediately upon publication and so it turning out to be TC is not surprising, and given the complexity of packet-switching networks & routers it’s not necessarily too surprising if one can build a cellular automaton into them or encode logical circuits, or if airplane ticket planning/validation is not just NP-hard or EXPSPACE-hard but undecidable (because of the complex rules airlines require). Many configuration or special-purpose languages or tools or complicated games turn out to violate the Rule of least power & be accidentally Turing-complete, like MediaWiki templates, sed or repeated regexp/find-replace commands in an editor (any form of string substitution or templating or compile-time computation is highly likely to be TC on its own or when iterated since they often turn out to support a lambda calculus or a term-rewriting language or tag system eg esolangs /// or Thue ), XSLT, Infinite Minesweeper, Dwarf Fortress3, Starcraft, Minecraft, Ant, Transport Tycoon, C++ templates & Java generics, DNA computing etc are TC but these are not surprising either: many games support scripting (ie TC-ness) to make their development easier and enable fan modifications, so games’ TC may be as simple as including syntax for calling out to a better-known language like Perl, or it may just be an obscure part of a standard format (most people these days are probably unaware that TrueType & many fonts are PostScript programs[^SVG] based on stack machines, similar to DWARF debugging and ELF metadata, or that some music formats go beyond MIDI in providing scripting capabilities and must be interpreted to be displayed; once one knows this, then fonts being TC are no more surprising than TeX documents being TC, leading of course, to many severe & fascinating font or media security vulnerabilities such as the BLEND vulnerability or SNES & NES code exploiting Linux systems). Similarly, such feats as creating a small Turing machine using Legos or dominos4 would not count, since we already know that mechanical computers work.

On the other hand, the vein of computer security research called weird machines is a fertile ground of that’s TC? reactions. What is surprising may differ from person to person.

Possibly accidentally Turing-complete systems:

  • CSS without the assumption of a driving mouse click
  • SVG: PostScript is TC by design, but what about the more modern vector graphics image format, SVG, which is written as XML, a (usually) not-TC document language? It seems like in conjunction with XSLT it may be, but I haven’t found any proofs or demonstrations of this in a normal web browser context. The SVG standard is large and occasionally horrifying: the (failed) SVG 1.2 standard tried to add to SVG images the ability to open raw network sockets.
  • Unicode: Nicolas Seriot suggests that Unicode’s bidirectional algorithms (intended for displaying scripts like Arabic or Hebrew which go right-to-left rather than left-to-right like English) may be complex enough to support a tag system via case folding rules (eg Turkish)

See also

  1. An active area of research is into languages & systems carefully designed and proven to not be TC (eg. total functional programming). Why this effort to make a language in which many programs can’t be written? Because TC is intimately tied to Godel’s incompleteness theorems & Rice’s theorem, allowing TC means that one is forfeiting all sorts of provability properties: in a non-TC language, one may be able to easily prove all sorts of useful things to know; for example, that programs terminate, that they are type-safe or not, that they can be easily converted into a logical theorem, that they consume a bounded amount of resources, that one implementation of a protocol is correct or equivalent to another implementation, that there are a lack of side-effects and a program can be transformed into a logically-equivalent but faster version (particularly important for declarative languages like SQL where the query optimizer being able to transform queries is key to acceptable performance, but of course some SQL extensions make it TC anyway by allowing either a cyclic tag system to be encoded, the model DSL, or to call out to PL/SQL) etc. Some of the literature on weird machines:

  2. Although linear NNs exploiting round-to-zero floating point mode in order to encode complex, potentially Turing-complete (for RNNs) behavior, which is invisible when run normally, would be both accidentally Turing-complete and a good example of langsec.

  3. Dwarf Fortress provides clockwork mechanisms, so TC is unsurprising; but the water is implemented as a simple cellular automation, so there might be more ways of getting TC in DF! The DF wiki currently lists 4 potential ways of creating logic gates: the fluids, the clockwork mechanisms, mine-carts, and creature/animal logic gates involving doors+pressure-sensors.

  4. See Think Math’s domino logic gates & 2014 demonstration of a 4-bit adder implemented using domino logic.