Surprisingly Turing-Complete (gwern.net)
138 points by telekid on April 11, 2020 | hide | past | favorite | 49 comments



> This matters because, if one is clever, it provides an escape hatch from system which is small, predictable, controllable, and secure, to one which could do anything. It’s hard enough to make a program do what it’s supposed to do without giving anyone in the world the ability to insert another program into your program, which can then interfere with or take over its host

I take issue with the statement that TC means that something could "do anything". It means it can theoretically compute anything, but Turing completeness does not in itself give access to any additional resources; being TC does not magically allow something to talk to the internet, write to disk, or spawn new processes, or possibly even allocate new memory. Computers do so much more than just compute; TC might be the first step towards being able to "do anything", but it certainly is not the final step.

From a security point of view, what TC can do (but does not necessarily do!) is open the host to denial of service via excessive resource consumption or a non-terminating program. But as another comment noted, non-TC systems can also consume impractically large amounts of resources, even if their resource consumption is not theoretically unbounded (as it is, AFAIK, with Turing machines).


Author of one of the cited papers here. The author of the post falls into a common misconception of the weird machine literature (which led me to write my paper): conflating TC (the ability to compute any function worth computing) with the ability to transition the victim machine into states that should be unreachable via paths that should not exist (“weird machine programming”). It is a bit unfortunate that this misunderstanding is pervasive in early WM papers :-/ - this ensures perpetuation of the misunderstanding.


Complexity theorist here. In addition to your point, there's another commonly-overlooked problem: TC isn't quite the actual top! There are ways to make problems that, even with an oracle for solving Halting, are still hard [0].

It seems like folks are very quick to confuse the expressiveness of a machine with the expressiveness of analyzing programs for that machine; usually, a program is far harder to analyze than to run.

I think that a better way to view the post's author's point is by unpredictability. Given a short program in a weak setting, we can not only predict what the program will do, but what the program cannot do, usually because it is too short or too simple. In a Turing-complete setting, though, there are short programs with very unpredictable behavior.

[0] https://en.wikipedia.org/wiki/Turing_degree


I don't think this discussion thread is giving the writer's concern enough credit. Universality on its own obviously doesn't allow the instance to take over its host, but it can enable the bulk of the malicious payload to be encoded as legitimate instances of whatever P-hard optimization problem the cloud service solves, so that it need not be injected directly via the actual vulnerability that the malicious actor uses to take over the host.

> TC isn't quite the actual top! There are ways to make problems that, even with an oracle for solving Halting, are still hard [0].

I'm not sure why you're raising the subject of the degrees here. Malicious software doesn't need to have hypercomputational powers to be a threat.


I think that, if you're going to care about being exploited by real-world payloads, then Turing-completeness is a red herring; I agree with the thread-starter. For example, it is already bad enough to not be able to tell when computations are in P vs. NP, for responsiveness under load. It is not good when an NP database query halts a P Web server. For this reason, languages like Pola [0] which are far weaker than Turing-completeness are valuable.

And, if you thought that it was easy to be accidentally Turing-complete, wait until you see how easy it is to be accidentally NP [1]. The typical database query is in NP, because constraint satisfaction problems are in NP. So is the typical optimization problem.

[0] https://www.researchgate.net/publication/266217730_Pola_a_la...

[1] https://en.wikipedia.org/wiki/List_of_NP-complete_problems
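The "accidentally NP" shape is easy to make concrete. Here's a hypothetical Python sketch (function and variable names are mine) of the obvious algorithm for CNF satisfiability, the canonical constraint satisfaction problem: checking one candidate assignment is cheap, but the enumeration is 2^n in the number of variables.

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    """Decide CNF satisfiability by trying all 2**n_vars assignments.
    Clauses are tuples of nonzero ints: literal k means variable k is
    true, -k means it is false. This is the defining shape of an NP
    problem's naive algorithm: cheap verification, exponential search."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits  # a satisfying assignment
    return None  # unsatisfiable

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
clauses = [(1, -2), (2, 3), (-1, -3)]
print(brute_force_sat(clauses, 3))  # → (False, False, True)
```

Every additional variable doubles the search space, which is exactly why an innocuous-looking query language can become a denial-of-service vector.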


I do want to add (as my parent comment was a bit negative) that I do think non-TC programming is an area that deserves more research, as does research on better characterizing TC programs (http://raml.co/ being an example of that).


Yeah, it seems like "Turing-complete" is becoming a misused meme. Here's a related post and thread about the Dhall configuration language:

http://www.haskellforall.com/2020/01/why-dhall-advertises-ab...

My comment:

That is, it’s fine (and often necessary) for a config language to be Turing complete. (This doesn’t mean it needs global variables or mutation, etc.)

The real issue is side effects (I/O), which is not technically related to Turing-completeness.

https://lobste.rs/s/gcfdnn/why_dhall_advertises_absence_turi...

And for some bizarre reason a couple of people replying keep insisting that this serves some purpose, even though the post basically admits it's the wrong issue; they're doing it as a "signaling mechanism" (i.e. marketing to people who hold a misconception about computer science).

But that's not even the end of it -- as mentioned:

not only is the messaging focused on an irrelevant thing, but the language design is too! It would almost make more sense if the messaging were the only odd thing.


I find your first link particularly ironic. It claims that certain words are used beyond their literal meaning as a "signaling mechanism" for "like-minded people" that you're doing the right thing. And "non-Turing completeness" is one such "shibboleth" that signals to security-minded people that your configuration language is very secure and awesome.

Now I could claim on the contrary that whenever you're using "security" and "non-Turing completeness" in that way together, the only thing you're actually signaling is that you don't know WTF you're talking about.


Yeah, I'm not so sure. Security is about ensuring certain guarantees. If your configuration language is Turing complete, it's easy to get into a spot where you simply can't be certain of the final state of your system. That's not secure. A Turing-complete configuration language might be an instant red flag, like "perpetual motion" is to physicists.

It's possible to ensure termination of Turing complete languages by rejecting certain programs, but the work required is not something you'll find in a config file parsing library.


I’m unsure of how you mean that in a technical sense. It’s straightforward to make those guarantees in your interpreter.

And just to be clear, Dhall, the configuration language we’re talking about, is not TC, but powerful enough to compute the Ackermann function: https://gist.github.com/Gabriel439/77f715350ecc0443eed5fa613...

Add “ackermann 10 10” to your configuration file and you have something that’s technically proven to terminate, but won’t do so before the sun burns out. No security properties are gained here.
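For intuition, here's a naive Python transcription of that function (my own sketch, not the linked Dhall code): it provably terminates on every input, yet its running time grows faster than any fixed tower of exponentials, so a termination proof alone buys nothing practical.

```python
import sys

# The naive recursion nests deeply even for tiny inputs.
sys.setrecursionlimit(100_000)

def ackermann(m, n):
    """Total (always terminates) but not primitive recursive:
    ackermann(4, 2) already has 19,729 decimal digits' worth of
    recursion behind it, and ackermann(10, 10) is unthinkable."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# Small inputs are fine; ackermann(3, n) == 2**(n + 3) - 3.
print(ackermann(3, 3))  # → 61
```

Try `ackermann(4, 1)` and you'll wait a while; `ackermann(4, 2)` won't finish, despite the guarantee that it would eventually.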


The security property gained is that, with termination, other sorts of analyses that check security properties of interest become possible to prove. It doesn't mean they'll be tractable, but "possible" is a necessary precondition for "tractable".


> Turing completeness does not in itself give access to any additional resources; being TC does not magically allow something to talk to internet or write to disk or spawn new processes, or possibly not even allocate new memory. Computers do so much more than just compute; TC might be the first step towards being able to "do anything" but it certainly is not the final step.

In many cases it is the final step.

If you're trying to secure something that lacks any good reason to access the internet, it shouldn't be able to. And yet so many things like that still have internet access.

This creates a problem when you have a program which is only supposed to process some sensitive data and not export it off to the attacker, because as soon as the attacker can execute their own code, the process already had access to the sensitive data and to the internet. Or there is no sensitive data but the process already had access to the internet, so now the attacker is using your hardware to mine cryptocurrency or route their network traffic through your IP address.

We could stop giving network access to processes that otherwise shouldn't need it, but that requires overcoming the incumbent economic forces that use network access for telemetry and advertising. So there are a lot of people hoping that making things that aren't Turing-complete is easy. But it turns out to be pretty hard. So we may have to start pushing back against those economic forces.


> because as soon as the attacker can execute their own code, the process already had access to the sensitive data and to the internet.

That's where your thinking goes wrong. TC does not mean that the program can take over the host process control flow.


> That's where your thinking goes wrong. TC does not mean that the program can take over the host process control flow.

But that's commonly what happens in practice. Return into libc and similar do exactly that.

Not only that, compromising the host process control flow is not strictly required. You may be executing weird machine instructions and not machine code instructions, but the weird machine being inside the host process often means that it already has access to at least some of the host process address space. When it's the whole address space or even an interesting subset (e.g. read access to sensitive data, write access to anything that goes into an outgoing network buffer), achieving TC is already the end of the game.

Access to host process data need not even be so direct. Once you can execute weird machine code inside the host process, it enables timing attacks that may reveal more host process data -- especially when you're inside the host process control flow, even if you don't fully control it. Exporting data is likewise possible if you can bring about any externally-visible change in the host process behavior whatsoever, e.g. the timing of outgoing network packets.

There are cases where none of these things are true, but the cases where they are true are common. And they're also not generally regarded as vulnerabilities that could justify mitigations like denying network access, even though maybe they should be.


A good illustration of this is the linked TC regex[1], which can be implemented in Notepad++ or similar.

Just because you can hit "Replace all" a bunch of times to run the TC "regex code", transforming some input to some output, doesn't in itself mean you can make Notepad++ do something weird.

[1]: https://github.com/Tegmen/RegEx-Brainfuck-Interpreter


Author of one of the linked weird machine papers here. The use of “Turing complete” in both the ROP and the weird machine literature is both incorrect and misleading; I wrote some comments on this here: http://addxorrol.blogspot.com/2018/10/turing-completeness-we...

This does not detract from this post being a good, fun, and interesting read, but for anyone that is puzzled why “Turing complete” should imply “insecure”: It doesn’t.


> If that’s not enough, the SVG standard is large and occasionally horrifying: the (failed) SVG 1.2 standard tried to add to SVG images the ability to open raw network sockets.

!!!

!!!!!!!!!!!

From the SVG 1.2 draft:

> Note that these interfaces expose possible security concerns. The security model that these interfaces work under is defined by the user agent. However, there are a well-known set of common security guidelines used by the browser implementations in this area. For example, most do not allow access to hosts other than the host from which the document was retrieved.

> The next draft of SVG 1.2 will clearly list the minimum set of security features that an SVG user agent should put in place for these interfaces.

"Possible security concerns". No kidding. At least they were going to address them in the next draft version... though probably not by removing the ability to open sockets. Words fail me.


Python pickle files are a sequence of op-codes that run on the pickle VM. By default the VM allows calls to arbitrary Python functions. I'm still puzzling whether Python pickles without access to Python globals (e.g. using https://docs.python.org/3/library/pickle.html#restricting-gl...) are Turing complete. I don't think so, because the pickle VM has no branching or looping, but it does have a stack and my understanding of automata theory is not great.

My research/tinkering so far is https://github.com/moreati/pickle-fuzz
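For reference, the restriction the linked docs describe can be sketched like this (a hypothetical, maximally strict variant: it rejects every global lookup, which breaks unpickling of any class instance but still round-trips plain data):

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuse every global lookup, so the GLOBAL/STACK_GLOBAL and
    REDUCE opcodes can't resolve or call arbitrary Python functions."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden")

class Evil:
    def __reduce__(self):
        # The classic pickle payload shape: (callable, args).
        # Real payloads use os.system etc.; print is a stand-in.
        return (print, ("pwned",))

payload = pickle.dumps(Evil())

try:
    NoGlobalsUnpickler(io.BytesIO(payload)).load()
except pickle.UnpicklingError as e:
    print("blocked:", e)

# Plain data structures still round-trip fine.
print(NoGlobalsUnpickler(io.BytesIO(pickle.dumps([1, 2, 3]))).load())
```

Note this only blocks the function-call escape hatch; it doesn't answer the separate (and interesting) question of whether the remaining stack-machine opcodes are themselves TC.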


"Peano arithmetic: addition & multiplication on natural numbers is enough to be TC;"

My head swims when the situation is described with this level of vagueness. I mean, sure, the task of proving a theorem using the modern version of the Peano postulates is undecidable, and so I'd assume a map from theorems in the Peano system to proofs of those theorems would be Turing complete.

But a computation system based on calculating the values of simple arithmetic expressions isn't Turing complete. An expression involving just adding and multiplying constant integer values will terminate.


Perhaps he means that you could somehow abuse the induction axiom, although it seems to me that would be in a way that's not what the axiom was meant for.


I love this...

"...mov, which copies data between the CPU & RAM, can be used to implement a transport-triggered-architecture one instruction set computer, allowing for playing Doom..."

Click on "Doom" link and read:

"The mov-only DOOM renders approximately one frame every 7 hours, so playing this version requires somewhat increased patience."


If TrueType hinting is Turing complete, are its outputs observable from a Web Font context? Is it possible to write a WASM polyfill based on TrueType hinting?

From https://docs.microsoft.com/en-us/typography/opentype/spec/tt... looks to have 32-bit words, a dynamic heap, unrestricted JMP targets, a generous number of math functions, ...


The article mentions tt fonts as being based on the postscript language.


> “return-into-libc attacks”: software libraries provide pre-packaged functions, each of which is intended to do one useful thing; a fully TC ‘language’ can be cobbled out of just calls to these functions and nothing else, which enables evasion of security mechanisms since the attacker is not running any recognizable code of his own.

Note that ROP attacks in general tend to jump into the middle of functions because they have partially-cobbled together call states. ROP "chains" join together a couple of instructions followed by a return into something useful, but with "return-into-libc" it's usually to just jump straight midway into system and spawn a shell.

> Pokemon Yellow: “Pokemon Yellow Total Control Hack” outlines an exploit of a memory corruption attack which allows one to write arbitrary Game Boy assembler programs by repeated in-game walking and item purchasing. (There are similar feats which have been developed by speedrun aficionados, but I tend to ignore most of them as they are ‘impure’: for example, one can turn the SNES Super Mario World into an arbitrary game like Snake or Pong but you need the new programs loaded up into extra hardware, so in my opinion, it’s not really showing SMW to be unexpectedly TC and is different from the other examples.)

I fail to see the difference; as far as I understood it, the Super Mario World examples were done by just playing the game? (By the way, I hear that Ocarina of Time has something like this now, too.)

> This matters because, if one is clever, it provides an escape hatch from system which is small, predictable, controllable, and secure, to one which could do anything. It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC.

You can still prove sandboxing guarantees about executing Turing-complete programs.


Stuck in an appendix is a fascinating mini-essay, "How many computers are in your computer?"

https://gwern.net/Turing-complete#how-many-computers-are...


Indeed. Everything is a heterogeneous cluster now. Don't forget the input devices and outputs; one of my more interesting jobs was very tangentially being involved with https://www.flatfrog.com/inglass . Every dot around that screen has an ARM core and a small DSP.


This is definitely interesting, but there's always JavaScript in the browser. It's Turing complete by design, and it can be and is sandboxed with a lot of success. The quick conclusion that TC in itself is dangerous is not warranted, but when it's not intended, it can have unexpected consequences with security or other (stability) implications. That's what I take away from the article.


Turing complete is not a very high bar. Add a second stack to a pushdown automaton and it's Turing complete. Add two counters to an NFA and it's Turing complete. I don't think folks know what this means.
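The two-stack construction is worth spelling out: two stacks glued together simulate a Turing machine's tape, with one stack holding everything to the left of the head and the other holding the head cell and everything to its right. A hypothetical Python sketch of just the tape part:

```python
class TwoStackTape:
    """A Turing-machine tape built from two stacks: 'left' holds the
    cells left of the head (top of stack = nearest cell), 'right'
    holds the head cell and everything to its right. Moving the head
    is just popping one stack and pushing the other."""
    def __init__(self, blank=0):
        self.left, self.right, self.blank = [], [blank], blank

    def read(self):
        return self.right[-1]

    def write(self, sym):
        self.right[-1] = sym

    def move_right(self):
        self.left.append(self.right.pop())
        if not self.right:
            self.right.append(self.blank)  # extend tape lazily

    def move_left(self):
        if not self.left:
            self.left.append(self.blank)  # extend tape lazily
        self.right.append(self.left.pop())

tape = TwoStackTape()
tape.write(1); tape.move_right(); tape.write(2); tape.move_left()
print(tape.read())  # → 1
```

Pair this tape with any finite-state control and you have the full machine, which is why a PDA plus one extra stack jumps all the way to TC.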


A related idea that I’m interested in but find a bit hard to articulate is to describe “simple” Turing complete languages, where simplicity is defined more by ease of reasoning for a human than by any objective metric.

Basically, if I wanted to provide someone with a Turing complete language, what’s the simplest/easiest thing I could provide, that would still be useful?


The simplest you could give someone is probably the Turing machine itself, the Brainfuck language or the lambda calculus.

Simplifying, to have a TC programming language you need two things: RAM and the ability to decide your next state based on the memory contents.


Right, so I think the suggestion of brainfuck illustrates the difficulty I’m having articulating what I want, because while it’s TC and trivial to implement, it is basically impossible to use as a language. I think I’m going for simultaneous ease-of-implementation and ease-of-use rather than any actually type of “minimalism”.

I’m probably just looking for Lisp, really. It’s easy to implement and usable enough.


Yeah, a simple Lisp like Scheme is basically the lambda calculus.

Brainfuck is useless because the operations are useless. But if you give the user a few more things, like addressing memory, basic arithmetic and a way to define variables and functions, then you have something way more useful very quickly.


From what I understand, Forth might be a contender.


> Turing-completeness (TC) is ... the property of a system being able to ... compute any program of interest, including another computer in some form.

So Turing completeness implies recursive Turing completeness. It is the theoretical threshold at which a device is capable of reproduction, a sort of Schwarzschild radius for complex, heritable behavior, aka life.


Does finding weird and unexpected ways to do computation always imply a security risk?


I do research in this space professionally.

The answer is not always, but sometimes: discovering unintended states or transitions in the execution contract of a program is a common building block for exploits. However, proving that the execution contract of a program can be coerced into representing computations in a TC language doesn't necessarily prove that you can do anything interesting.

Complex formats like PDF are a good example of this: you can probably contrive a PDF input such that the parser's state represents a minimal (and TC) language while interpreting it (e.g. a language with a few mathematical operators and a "store" primitive), but that doesn't magically get you network access or arbitrary memory read/writes. You need to show that said language, when programmed in, can affect the overall execution contract in a way that violates high-level assumptions.

Some resources (FD: my company's) on the subject:

* https://blog.trailofbits.com/2019/11/01/two-new-tools-that-t...

* https://blog.trailofbits.com/2018/10/26/the-good-the-bad-and...


Just a heads up: I think the second post has a typo; the code defines "new_item" but refers to it as "item" throughout. (I'm also not sure I understand the safety added by dynamic_cast.)


Thanks for the heads up! I'll fix it.


I’m not sure why that’s explicitly mentioned; I believe that for analyzing security, TC is a bit of an academic red herring. (Warning: not an expert.) There are two kinds of security implication to consider here, I think.

One, can certain input data cause a large usage of computing resources (processing time or memory)? Non-TC mechanisms have repeatedly failed this test: ZIP bombs and XML entity bombs are notorious examples. On the other hand, you can easily put a resource cap on an interpreter of a theoretically TC language and be safe.

Two, can the untrusted code access resources that it shouldn’t (memory, files, sockets)? That’s mostly a quality of implementation issue, not one of Turing completeness. JavaScript interpreters have certainly been vulnerable to various exploits, but so have JPEG decoders. I don’t think TC is the issue here. (However, this is complicated a bit by side-channel attacks à la Spectre. I’m not sure how TC factors into those.)

In summary, I’m not convinced that Turing completeness is all that relevant for security. Am I wrong here?


Both can have exploits, as you've shown, but you haven't addressed the difficulty of securing one or the other. Turing-complete systems have a dramatically larger state space than non-Turing-complete implementations, and so at a fundamental level are more difficult to secure.


I simply don't think Turing completeness is the deciding factor in determining that difficulty. Hopefully someone can define this more clearly, but in my opinion, the issue is not abstract "state space", but implementation complexity and attack surface.

For example, it's a lot easier to write a secure Brainfuck interpreter (a very simple Turing complete language that can only access STDIN and STDOUT) than a program that securely extracts a ZIP file to disk.
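To illustrate, here's a hypothetical Python sketch of such an interpreter, with a step budget bolted on so that even a non-terminating Brainfuck program can't hang the host (which sidesteps the halting problem entirely: we never try to predict termination, we just stop):

```python
def run_bf(code, inp=b"", fuel=100_000, cells=30_000):
    """Interpret Brainfuck under a hard step budget ('fuel').
    TC language, tiny attack surface: the only I/O is the inp
    bytes and the returned output bytes."""
    # Pre-match brackets so jumps are O(1).
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    tape = bytearray(cells)
    pc = ptr = pos = 0
    out = bytearray()
    while pc < len(code):
        fuel -= 1
        if fuel < 0:
            raise TimeoutError("step budget exhausted")
        c = code[pc]
        if c == ">": ptr = (ptr + 1) % cells
        elif c == "<": ptr = (ptr - 1) % cells
        elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".": out.append(tape[ptr])
        elif c == ",":
            tape[ptr] = inp[pos] if pos < len(inp) else 0
            pos += 1
        elif c == "[" and tape[ptr] == 0: pc = jumps[pc]
        elif c == "]" and tape[ptr] != 0: pc = jumps[pc]
        pc += 1
    return bytes(out)

print(run_bf("++++++++[>++++++++<-]>+."))  # 8*8+1 = 65 → b'A'
```

The whole sandbox boundary is one function; compare that to the parsing, path handling, and filesystem interaction a ZIP extractor has to get right.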


At least theoretically, it means certain inputs can cause infinite loops, which, thanks to Turing completeness and the undecidability of the halting problem, cannot be detected in advance in all cases.


It means you can't easily (or statically) reason about the output given arbitrary (user) input. So yes, it makes it much more likely that security bugs are introduced.


Also it at least implies a guaranteed resource leak / DOS capability, which is a problem though it may or may not be considered a security issue.


Well, that depends; you can slap an rlimit on a process and it will no longer DoS you, even if it's executing Turing-complete code. It does mean that inside the Turing machine you can't really apply protections, but from the outside a Turing-complete language cannot magically escape its sandbox unless you give it (or it accidentally includes) a tool to do so.
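On POSIX systems that idea looks roughly like this (a hypothetical sketch; `run_capped` is my name, and `preexec_fn`/`RLIMIT_CPU` are Unix-only): run the untrusted code in a child process with a CPU-time cap, and the kernel terminates it regardless of what it computes.

```python
import resource
import subprocess
import sys

def run_capped(code, cpu_seconds=1):
    """Run untrusted Python in a child process under an rlimit CPU
    cap. An infinite loop is stopped by the kernel (SIGXCPU), not
    by any analysis of the code."""
    def limit():
        resource.setrlimit(resource.RLIMIT_CPU,
                           (cpu_seconds, cpu_seconds))
    return subprocess.run([sys.executable, "-c", code],
                          preexec_fn=limit, capture_output=True)

# Terminates normally, well within the budget:
ok = run_capped("print(2 + 2)")
print(ok.returncode, ok.stdout.decode().strip())  # 0 4

# Spins forever; the kernel stops it after ~1 CPU second:
killed = run_capped("while True: pass")
print(killed.returncode)  # negative: killed by a signal
```

A real sandbox would cap memory, file descriptors, and process count the same way (RLIMIT_AS, RLIMIT_NOFILE, RLIMIT_NPROC), but the CPU cap alone is enough to defuse the non-termination DoS.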


Unless I'm completely off my mark, any resource-limited system is not a true universal Turing machine in the theoretical sense. So depending on your level of pedantry, TC can mean a guaranteed DoS.


For the purposes of this conversation, we generally speak of the TC-ness of a system if we could somehow expand it indefinitely.

I believe there was someone who set up a TC roller coaster network in some Rollercoaster Tycoon sort of game. It had something like 25 slots of "RAM". Not very TC-ish. But if you could hypothetically expand it out, it would work.

So it's a common convention.


But...that's your whole computer too then.


Indeed.



