created: 9 Dec 2012; modified: 01 Nov 2017; status: finished; confidence: highly likely; importance: 6
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.–Greenspun’s Tenth Law
Turing-completeness (TC) is the property of a system being able to, under some simple representation of input & output, compute any program.
TC, besides being foundational to computer science and understanding many key issues like
why a perfect antivirus program is impossible, is also weirdly common: one might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite and it is difficult to write a useful system which does not immediately tip over into TC. It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC. This can be amusing, useful (although usually not), harmful, or extremely insecure & a cracker’s delight (see
language-theoretic security, based on exploiting
Surprising examples of this behavior remind us that TC lurks everywhere, and security is extremely difficult.
They are probably best considered as a subset of
found esoteric programming languages (esolangs). So FRACTRAN, as extraordinarily minimalist as it is, does not count2; nor would a deliberately obfuscated language like Malbolge (where it took years to write a trivial program) count because it was designed to be an esolang; but neither would Conway’s Game of Life count because questions about whether it was TC appeared almost immediately upon publication and so it turning out to be TC is not surprising, and given the complexity of packet-switching networks & routers it’s not necessarily too surprising if one can build a cellular automaton into them or encode logical circuits, or if airplane ticket planning/validation is not just NP-hard or EXPSPACE-hard but undecidable (because of the complex rules airlines require). Many configuration or special-purpose languages or tools or complicated games turn out to violate the Rule of least power & be
accidentally Turing-complete, like MediaWiki templates,
sed or repeated regexp/find-replace commands in an editor (any form of string substitution or templating or compile-time computation is highly likely to be TC on its own or when iterated since they often turn out to support a lambda calculus or a term-rewriting language or tag system eg esolangs
/// or Thue ), XSLT, Infinite Minesweeper, Dwarf Fortress3, Starcraft, Minecraft, Ant, Transport Tycoon, C++ templates & Java generics, DNA computing etc are TC but these are not surprising either: many games support scripting (ie TC-ness) to make their development easier and enable fan modifications, so games’ TC may be as simple as including syntax for calling out to a better-known language like Perl, or it may just be an obscure part of a standard format (most people these days are probably unaware that TrueType & many fonts are PostScript programs[^SVG] based on stack machines, similar to DWARF debugging and ELF metadata, or that some music formats go beyond MIDI in providing scripting capabilities and must be interpreted to be displayed; once one knows this, then fonts being TC are no more surprising than TeX documents being TC, leading of course, to many severe & fascinating font or media security vulnerabilities such as the BLEND vulnerability or SNES & NES code exploiting Linux systems). Similarly, such feats as creating a small Turing machine using Legos or dominos4 would not count, since we already know that mechanical computers work.
On the other hand, the vein of computer security research called
weird machines is a fertile ground of
that’s TC? reactions. What is
surprising may differ from person to person.
- Peano arithmetic: addition & multiplication on natural numbers is enough to be TC; in contrast, Presburger arithmetic removes multiplication and hence is not TC
- Wang tiles: multi-colored squares, whose placement is governed by the rule that adjacent colors must be the same (historically, not surprising to Wang, but was surprising to me and I think to a lot of other people)
- MMU shuffle computer RAM around to make programming easier; if a program sets up its share of memory properly, it can execute arbitrary computations via MMU page-faults (comments; paper) without ever running code itself by turning the MMU faulting mechanism into a one-instruction set computer.
: the apparently innocuous x86 assembler instruction
mov, which copies data between the CPU & RAM, can be used to implement a transport-triggered-architecture one instruction set computer (and for bonus points, it can be done using
x86 is Turing-complete with no registers
return-into-libc attacks: software libraries provide pre-packaged functions, each of which is intended to do one useful thing; a fully TC
languagecan be cobbled out of just calls to these functions and nothing else, which enables evasion of security mechanisms since the attacker is not running any recognizable code of his own. See, among many others,
The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86)&
On the Expressiveness of Return-into-libc Attacks.
- Pokemon Yellow:
Pokemon Yellow Total Control Hackoutlines an exploit of a memory corruption attack which allows one to write arbitrary Game Boy assembler programs by repeated in-game walking and item purchasing. (There are similar feats which have been developed by speedrun aficionados, but I tend to ignore most of them as they are
impure: for example, one can turn the SNES Super Mario World into an arbitrary game like Snake or Pong but you need the new programs loaded up into extra hardware, so in my opinion, it’s not really showing SMW to be unexpectedly TC and is different from the other examples. Similarly, one can go from Super Game Boy to SNES to arbitrary code like IRC. This distinction is debatable.)
- Braid: TC
- musical notation: given instructions for transposing successive notes, musical notation becomes the esolang Choon
- heart cells: interact in a way allowing logic gates and hence TC (perhaps not too surprising since cellular automatons were biologically motivated)
one category of weird machines doesn’t quite count since they require an assumption along the lines of the user mechanically clicking or making the only possible choice in order to drive the system into its next step; while the user provides no logical or computational power in the process, they aren’t as satisfying examples for this reason:
- Magic: the Gathering: TC, with the assumption that players mechanically take any option they are given, but otherwise all actions/plays are forced by Magic rules
- CSS: was designed to be a declarative markup language for tweaking the visual appearance of HTML pages, but CSS declarations interact just enough to allow an encoding of the cellular automaton Rule 110, under the assumption of mechanical mouse clicks on the web browser to advance state
- Microsoft PowerPoint animations (excluding macros, VBScript etc) can implement a Turing machine when linked appropriately (Wildenhain 2017; video; PPT), under the assumption of a user clicking on the only active animation triggers
Possibly accidentally Turing-complete systems:
- CSS without the assumption of a driving mouse click
- SVG: PostScript is TC by design, but what about the more modern vector graphics image format, SVG, which is written as XML, a (usually) not-TC document language? It seems like in conjunction with XSLT it may be, but I haven’t found any proofs or demonstrations of this in a normal web browser context. The SVG standard is large and occasionally horrifying: the (failed) SVG 1.2 standard tried to add to SVG images the ability to open raw network sockets.
- Unicode: Nicolas Seriot suggests that Unicode’s bidirectional algorithms (intended for displaying scripts like Arabic or Hebrew which go right-to-left rather than left-to-right like English) may be complex enough to support a tag system via case folding rules (eg Turkish)
An active area of research is into languages & systems carefully designed and proven to not be TC (eg. total functional programming). Why this effort to make a language in which many programs can’t be written? Because TC is intimately tied to Godel’s incompleteness theorems & Rice’s theorem, allowing TC means that one is forfeiting all sorts of provability properties: in a non-TC language, one may be able to easily prove all sorts of useful things to know; for example, that programs terminate, that they are type-safe or not, that they can be easily converted into a logical theorem, that they consume a bounded amount of resources, that one implementation of a protocol is correct or equivalent to another implementation, that there are a lack of side-effects and a program can be transformed into a logically-equivalent but faster version (particularly important for declarative languages like SQL where the query optimizer being able to transform queries is key to acceptable performance, but of course some SQL extensions make it TC anyway by allowing either a cyclic tag system to be encoded, the
modelDSL, or to call out to PL/SQL) etc. Some of the literature on
Exploit Programming: From Buffer Overflows to, Bratus et al 2011
Weird Machinesand Theory of Computation
The Halting Problems of Network Stack Insecurity, Sassaman et al 2011
The Page-Fault Weird Machine: Lessons in Instruction-less Computation, Bangert et al 2013
, Shapiro et al 2013
Weird Machinesin ELF: A Spotlight on the Underappreciated Metadata
Interrupt-oriented Bugdoor Programming: A minimalist approach to bugdooring embedded systems firmware, Tan et al 2014
The Weird Machines in Proof-Carrying Code, Vanegue 2014
Framing Signals - A Return to Portable Shellcode, Bosman & Bos 2014
Although linear NNs exploiting round-to-zero floating point mode in order to encode complex, potentially Turing-complete (for RNNs) behavior, which is invisible when run normally, would be both
accidentallyTuring-complete and a good example of langsec.↩
Dwarf Fortress provides clockwork mechanisms, so TC is unsurprising; but the water is implemented as a simple cellular automation, so there might be more ways of getting TC in DF! The DF wiki currently lists 4 potential ways of creating logic gates: the fluids, the clockwork mechanisms, mine-carts, and creature/animal logic gates involving doors+pressure-sensors.↩