Surprisingly Turing-Complete

A catalogue of constructs which are unexpectedly Turing-complete; implications for security (computer science)
created: 9 Dec 2012; modified: 13 Mar 2017; status: finished; confidence: highly likely; importance: 6

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.Greenspun’s Tenth Law

Turing-completeness (TC) is the property of a system being able to, under some simple representation of input & output, compute any program.

TC, besides being foundational to computer science and understanding many key issues like why a perfect antivirus program is impossible, is also weirdly common: one might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite and it is difficult to write a useful system which does not immediately tip over into TC. It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC. This can be amusing, useful (although usually not), harmful, or extremely insecure & a cracker’s delight (see language-theoretic security, based on exploiting weird machines1). Surprising examples of this behavior remind us that TC lurks everywhere, and security is extremely difficult.

Too powerful languages can also manifest as nasty DoS attacks; the fuzz tester afl found in OpenBSD’s roff that it could create an infinite loop by abusing some of the string substitution rules.

They are probably best considered as a subset of discovered or found esoteric programming languages (esolangs). So FRACTRAN, as extraordinarily minimalist as it is, does not count; nor would a deliberately-obfuscated language like Malbolge (where it took years to write a trivial program) count because it was designed to be an esolang; but neither would Conway’s Game of Life count because questions about whether it was TC appeared almost immediately upon publication and so it turning out to be TC is not surprising, and given the complexity of packet-switching networks & routers it’s not necessarily too surprising if one can build a cellular automaton into them or encode logical circuits. Many configuration or special-purpose languages or tools or complicated games turn out to violate the Rule of least power & be accidentally Turing-complete, like MediaWiki templates, sed (any form of templates or compile-time computation is highly likely to be TC since they often turn out to support a lambda calculus or a term-rewriting language), XSLT, Infinite Minesweeper, Dwarf Fortress2, Starcraft, Minecraft, Ant, Transport Tycoon, C++ templates & Java generics, DNA computing etc are TC but these are not surprising either: many games support scripting (ie TC-ness) to make their development easier and enable fan modifications, so games’ TC may be as simple as including syntax for calling out to a better-known language like Perl, or it may just be an obscure part of a standard format (most people these days are probably unaware that TrueType & many fonts are PostScript programs based on stack machines, similar to DWARF debugging and ELF metadata, or that some music formats go beyond MIDI in providing scripting capabilities and must be interpreted to be displayed; once one knows this, then fonts being TC are no more surprising than TeX documents being TC, leading of course, to many severe & fascinating font or media security vulnerabilities such as the BLEND vulnerability or SNES & NES code exploiting Linux systems). Similarly, such feats as creating a small Turing machine using Legos would not count, since we already know that mechanical computers work. On the other hand, the vein of computer security research called weird machines is a fertile ground of that’s TC? reactions. What is surprising may differ from person to person.

See also

  1. An active area of research is into languages & systems carefully designed and proven to not be TC (eg. total functional programming). Why this effort to make a language in which many programs can’t be written? Because TC is intimately tied to Godel’s incompleteness theorems & Rice’s theorem, allowing TC means that one is forfeiting all sorts of provability properties: in a non-TC language, one may be able to easily prove all sorts of useful things to know; for example, that programs terminate, that they are type-safe or not, that they can be easily converted into a logical theorem, that they consume a bounded amount of resources, that one implementation of a protocol is correct or equivalent to another implementation, that there are a lack of side-effects and a program can be transformed into a logically-equivalent but faster version (particularly important for declarative languages like SQL where the query optimizer being able to transform queries is key to acceptable performance, but of course some SQL extensions make it TC anyway by allowing either a cyclic tag system to be encoded, the model DSL, or to call out to PL/SQL) etc. Some of the literature on weird machines:

  2. Dwarf Fortress provides clockwork mechanisms, so TC is unsurprising; but the water is implemented as a simple cellular automation, so there might be more ways of getting TC in DF! The DF wiki currently lists 4 potential ways of creating logic gates: the fluids, the clockwork mechanisms, minecarts, and creature/animal logic gates involving doors+pressure-sensors.