Julian Assange/Wikileaks made some headlines in 2010 when they released an insurance file, a 1.4GB AES-256-encrypted file available through BitTorrent. It’s generally assumed that copies of the encryption key have been left with Wikileaks supporters who will, in the appropriate contingency like Assange being assassinated, leak the key online to the thousands of downloaders of the insurance file, who will then read and publicize whatever contents are in it (speculated to be additional US documents Manning gave Wikileaks).

Of course, any one of those supporters could become disaffected and leak the key at any time. Or if there’s only 1 supporter, they might lose the key to a glitch or become disaffected in the opposite direction and refuse to transmit the key to anyone. If one trusts the person with the key absolutely, that’s fine. But wouldn’t it be nice if one didn’t have to trust another person like that? Cryptography does really well at eliminating the need to trust others, so maybe there’re better schemes.

Now, it’s hard to imagine how some abstract math could observe an assassination and decrypt embarrassing files. Perhaps a different question could be answered - can you design an encryption scheme which requires no trusted parties but can be broken after a certain date? (This would be useful for many things[^1].)

One could encrypt the file against information that will be known in the future, like stock prices - except wait, how can you find out what stock prices will be a year from now? You can’t use anything that is public knowledge now because that’d let the file be decrypted immediately, and by definition you don’t have access to information currently unknown but which will be known in the future, and if you generate the information yourself planning to release it, now you have problems - you can’t even trust yourself (what if you are abruptly assassinated like Gerald Bull?) much less your confederates.

# No trusted third-parties

Note that this bars a lot of the usual suggestions for cryptography schemes. For example, consider the general approach of key escrow (eg. Bellare & Goldwasser 1996): if you trust some people, you can just adopt a secret sharing protocol where they XOR together their keys to get the master key for the publicly distributed encrypted file. Or if you only trust some of those people (but are unsure which will try to betray you and either release early or late), you can adopt a threshold scheme where k of the n people suffice to reconstruct the master key, like Rabin & Thorpe 2006. (And you can connect multiple groups, so each decrypts some necessary keys for the next group; but this gives each group a consecutive veto on release…) Or perhaps something could be devised based on trusted timestamping like Crescenzo et al 1999 or Blake & Chan 2004 - but then don’t you need the trusted third party to survive on the network? Or secure multi-party computation (but don’t you need to be on the network, or risk all the parties saying screw it, we’re too impatient, let’s just pool our secrets and decrypt the file now?). Or you could exploit physics and use the speed of light to communicate with a remote computer on a spacecraft (except now we’re trusting the spacecraft as our third party, hoping no one stole its onboard private key & is able to decrypt our transmissions instantaneously)…
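The simplest n-of-n variant of such secret sharing is easy to sketch (a minimal Python illustration using XOR shares; this is not the k-of-n threshold variant, which needs polynomial interpolation):

```python
import os

def split_secret(key: bytes, n: int) -> list[bytes]:
    """Split `key` into n shares; all n must be XORed together to recover it.
    Any n-1 shares are uniformly random and reveal nothing about the key."""
    shares = [os.urandom(len(key)) for _ in range(n - 1)]
    last = key
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    shares.append(last)
    return shares

def recover_secret(shares: list[bytes]) -> bytes:
    """XOR all shares back together to reconstruct the key."""
    key = shares[0]
    for s in shares[1:]:
        key = bytes(a ^ b for a, b in zip(key, s))
    return key

key = os.urandom(32)            # a hypothetical AES-256 master key
shares = split_secret(key, 5)   # one share per supporter
assert recover_secret(shares) == key
```

The fragility the text describes is visible here: lose any single share and the key is gone forever, which is exactly why one would reach for a k-of-n threshold scheme instead.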

One approach is to focus on creating problems which can be solved with a large but precise amount of work, reasoning that if the problem can’t be solved in less than a month, then you can use that as a way to guarantee the file can’t be decrypted within a month’s time. (This would be a proof-of-work system.) This has its own problems[^2], but it at least delivers what it promises.

# Hashing

For example, one could take a hash like bcrypt, give it a random input, and hash it for a month. Each hash depends on the previous hash, and there’s no way to skip from the first hash to the trillionth hash. After a month, you use the final hash as the encryption key, and then release the encrypted file and the random input to all the world. The first person who wants to decrypt the file has no choice but to redo the trillion hashes in order to get the same encryption key you used.
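A minimal sketch of this serial-hashing construction (using SHA-256 rather than bcrypt for brevity; the seed size and iteration count here are illustrative stand-ins for a month of hashing):

```python
import hashlib, os

def timelock_key(seed: bytes, iterations: int) -> bytes:
    """Iterate SHA-256; each output feeds the next input, so the work is
    inherently serial - there is no shortcut to the final hash."""
    h = seed
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h

seed = os.urandom(32)
key = timelock_key(seed, 1_000_000)   # scale the count to the desired delay
# Publish: the encrypted file, the seed, and the iteration count; anyone
# wanting the key must redo the same million hashes in order.
```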

Nor can the general public (or the NSA) exploit the parallelism they have available, because each hash depends sensitively on the hash before it - the avalanche effect is a key property of cryptographic hashes. On the other hand, the person running this algorithm can run it in parallel.

One generates n random inputs (for n CPUs, presumably), and sets them hashing as before for however long one can spare. Then, one sets up a chain between the n results - the final hash of seed 1 is used to encrypt seed 2, the final hash of which is used to encrypt seed 3, and so on. Then one releases the encrypted file, the $n-1$ encrypted seeds, and the first seed. Now the public has to hash the first seed for a month, and only then can it unlock the second seed, and start hashing that for a month, and so on. (A similar scheme, A Guided Tour Puzzle for Denial of Service Prevention, uses network latency rather than hash outputs as the chained data - clients bounce from resource to resource - but this obviously requires an online server and is unsuitable for our purposes.)
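The chained-seeds construction can be sketched like so (XOR stands in for a real cipher, and the per-chain iteration count is illustrative; in practice each `hash_chain` call would run on its own CPU for a month):

```python
import hashlib, os

def hash_chain(seed: bytes, iterations: int) -> bytes:
    """Serially iterate SHA-256 from a seed."""
    h = seed
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

n, iters = 4, 100_000             # n CPUs, each hashing independently
seeds = [os.urandom(32) for _ in range(n)]
finals = [hash_chain(s, iters) for s in seeds]   # run these concurrently

# Chain the results: the final hash of chain i encrypts seed i+1.
encrypted_seeds = [xor(finals[i], seeds[i + 1]) for i in range(n - 1)]
# Publish: seeds[0], encrypted_seeds, the iteration count, and the file
# encrypted under finals[-1]. The public must grind through all n chains
# one after another, even though we built them in parallel.
```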

This is pretty clever. If one has a thousand CPUs handy, one can store up 3 years’ worth of computation-resistance in just a day. This satisfies a number of needs. But what about people who only have a normal computer? Fundamentally, this repeated hashing requires you to put in as much computation as you want your public to expend reproducing the computation, which is not enough. We want to force the public to expend more computation - potentially much more - than we put in. How can we do this?

It’s hard to see. At least, I haven’t thought of anything clever. Homomorphic encryption promises to let us encode arbitrary computations into an encrypted file, so one could imagine implementing the above hash chains inside the homomorphic computation, or perhaps just encoding a simple loop counting up to a very large number; but it’s not clear how one would let the public decrypt the result of the homomorphic encryption without also letting them tamper with the loop or whatever, and in any case, homomorphic encryption is currently a net-loss - it takes as much or more CPU time to create such a program as it would take to run the program, and in that case, one might as well use the previous hash schemes.

## Vulnerability of one-way functions

As it turns out, Time-Lock Puzzles in the Random Oracle Model (Mahmoody, Moran, and Vadhan 2011; slides) directly & formally analyzes the general power of one-way functions used for time-lock puzzles assuming a random oracle. Unfortunately, they find an opponent can exploit the oracle to gain speedups. Fortunately, the cruder scheme where one stores up computation (repeatedly asking the oracle at inputs based on its previous output) still works under their assumptions:

> A time-lock puzzle with a linear gap in parallel time. Although our negative results rule out strong time-lock puzzles, they still leave open the possibility for a weaker version: one that can be generated with n parallel queries to the oracle but requires n rounds of adaptive queries to solve. In a positive result, we show that such a puzzle can indeed be constructed… Although this work rules out black-box constructions (with a super-constant gap) from one-way permutations and collision-resistant hash functions, we have no reason to believe that time-lock puzzles based on other concrete problems (e.g., lattice-based problems) do not exist. Extending our approach to other general assumptions (e.g., trapdoor permutations) is also an interesting open problem.

That is, the puzzle constructor can construct the puzzle in parallel, and the solver has to solve it serially.

# Successive squaring

At this point, let’s see what the crypto experts have to say. Googling to see what the existing literature was (after I’d thought of the above schemes), I found that the relevant term is time-lock puzzles (from analogy with the bank vault time lock). In particular, Rivest/Shamir/Wagner have published a 1996 paper on the topic, Time-lock puzzles and timed-release crypto. The very first paragraph is interesting; apparently the question was first raised by Timothy C. May on the Cypherpunks mailing list. Unfortunately, there seem to be no archives online of the Cypherpunks mailing list (the provided URL is long dead); fortunately, May discusses the topic briefly in his Cyphernomicon, ch14.5; unfortunately, May’s solution (14.5.1) is essentially to punt to the legal system and rely on legal privilege and economic incentives to keep keys private.

Rivest et al agree with us that

> There are 2 natural approaches to implementing timed-release crypto:
>
> - Use time-lock puzzles - computational problems that can not be solved without running a computer continuously for at least a certain amount of time.
> - Use trusted agents who promise not to reveal certain information until a specified date.

And that for time-lock puzzles:

> Our goal is thus to design time-lock puzzles that, to the greatest extent possible, are intrinsically sequential in nature, and can not be solved substantially faster with large investments in hardware. In particular, we want our puzzles to have the property that putting computers to work together in parallel doesn’t speed up finding the solution. (Solving the puzzle should be like having a baby: two women can’t have a baby in 4.5 months.)

Rivest et al then point out that the most obvious approach - encrypt the file to a random short key, short enough that brute-forcing takes only a few months/years as opposed to eons - is flawed because brute-forcing a key is very parallelizable and amenable to special hardware[^3]. (And as well, the randomness of searching a key space means that the key might be found very early or very late; any estimate of how long it will take to brute force is just a guess.) One cute application of the same brute-forcing idea is Merkle’s Puzzles, where the time-lock puzzle is used to hide a key for a second party to communicate with the first-party creator, but it has the same drawback: it has the creator make many time-lock puzzles (any of which could be used by the second party) and raises the cost to the attacker (who might have to crack each puzzle), but can be defeated by a feasibly wealthy attacker, and offers only probabilistic guarantees (what if the attacker cracks the same puzzle the second party happens to choose?).

Rivest et al propose a scheme in which one encrypts the file with a very strong key as usual, but then one encrypts the key in such a way that one must calculate $\text{encryptedKey}^{2^t} \bmod n$, where t is the adjustable difficulty factor. Knowing the original primes behind n, one can easily avoid doing the successive squarings, since the exponent can be reduced first: $e = 2^t \bmod \phi(n)$ (Rivest says). This has the nice property that the puzzle constructor invests only $O(n)$ computing power, but the solver has to spend $O(n^2)$ computing power. (This scheme works in the random oracle model, but Barak & Mahmoody-Ghidary 2009 proves that is the best you can do.)
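A toy version of the Rivest/Shamir/Wagner puzzle (illustrative small primes and addition as the masking step; a real puzzle would use a secret 2048-bit RSA modulus):

```python
import random

# Toy parameters - the constructor keeps p, q (and hence phi) secret.
p, q = 999983, 1000003
n, phi = p * q, (p - 1) * (q - 1)
t = 10_000                 # number of squarings the solver must perform
a = 2                      # public base
key = random.randrange(n)  # the key being time-locked

# Constructor's trapdoor: reduce the exponent 2^t modulo phi(n) first,
# so only two cheap modular exponentiations are needed.
e = pow(2, t, phi)
b = pow(a, e, n)
puzzle = (key + b) % n     # publish (n, a, t, puzzle)

# Solver: without phi(n), t sequential squarings are unavoidable.
x = a
for _ in range(t):
    x = (x * x) % n
recovered = (puzzle - x) % n
assert recovered == key
```

The asymmetry is the whole point: the constructor's work is logarithmic in $2^t$ thanks to the $\phi(n)$ shortcut, while the solver's work grows linearly with t and cannot be parallelized across the squarings.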

Rivest has actually used this scheme for a time capsule commemorating the MIT Computer Science and Artificial Intelligence Laboratory; he expects his puzzle to take ~35 years. He offers some advice for anyone attempting to unlock this time-lock puzzle (may or may not be related to Mao’s 2000 paper Time-Lock Puzzle with Examinable Evidence of Unlocking Time):

> An interesting question is how to protect such a computation from errors. If you have an error in year 3 that goes undetected, you may waste the next 32 years of computing. Adi Shamir has proposed a slick means of checking your computation as you go, as follows. Pick a small (50-bit) prime c, and perform the computation modulo cn rather than just modulo n. You can check the result modulo c whenever you like; this should be an extremely effective check on the computation modulo n as well.
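Shamir’s check can be sketched as follows (toy parameters; c = 65537 stands in for the suggested ~50-bit prime, and the modulus reuses the small illustrative primes rather than a real 2048-bit n):

```python
# The puzzle's primes & modulus, as in the successive-squaring scheme.
p, q = 999983, 1000003
n = p * q
c = 65537                  # small auxiliary prime for error-checking
a, t = 2, 50_000
nc = n * c                 # perform the long computation modulo c*n
x = a % nc
for i in range(1, t + 1):
    x = (x * x) % nc
    if i % 10_000 == 0:
        # The true value mod c is cheap to compute directly: by Fermat's
        # little theorem the exponent 2^i reduces modulo c-1.
        expected = pow(a, pow(2, i, c - 1), c)
        assert x % c == expected, f"computation corrupted by step {i}"
result = x % n             # identical to having squared modulo n all along
```

Because x ≡ a^(2^i) both mod n and mod c at every step, an undetected bit-flip would almost certainly break the cheap mod-c check at the next checkpoint, instead of silently poisoning decades of squarings.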

## Constant factors

How well does this work? The complexity seems correct, but I worry about the constant factors. Back in 1996, computers were fairly homogeneous, and Rivest et al could reasonably write

> We know of no obvious way to parallelize it to any large degree. (A small amount of parallelization may be possible within each squaring.) The degree of variation in how long it might take to solve the puzzle depends on the variation in the speed of single computers and not on one’s total budget. Since the speed of hardware available to individual consumers is within a small constant factor of what is available to large intelligence organizations, the difference in time to solution is reasonably controllable.

But that doesn’t seem very true any more. Devices can differ dramatically now, even within the same computer; to take the example of Bitcoin mining, my laptop’s CPU can search for hashes at 4k/sec, while its GPU can search at 54m/second[^4]. This has had the negative effect of centralizing Bitcoin mining power, reducing the security of the network: while there are tens or hundreds of thousands of nodes in the Bitcoin P2P network, only a few of them are actual miners because CPU mining has become useless - the big miners, who have large server farms of GPUs or ASICs, collectively control much of the hash power. This has not yet been a problem, but it may become one. Using a (partially) memory-bound hash function is one of the selling points of a competing cryptocurrency, Litecoin.

Many scientific applications have moved to clusters of GPUs because they offer such great speedups, as have a number of cryptographic applications such as generating[^5] rainbow tables.

And then there are more exotic technologies like field-programmable gate arrays which may be specialized for successive squaring; if problems like the n-body problem can be handled with custom chips, why not multiplication? Offhand, I don’t know of any compelling argument to the effect that there are no large constant-factor speedups possible for multiplication/successive-squaring. Indeed, the general approach of exponentiation and factoring has to worry about the fact that the complexity of factoring has never been proven (and could still be very fast) and that there are speedups with quantum techniques like Shor’s algorithm.

# Memory-bound hashes

Of course, one could ask the same question of my original proposal - what makes you think that hashing can’t be sped up? You already supplied an example where cryptographic hashes were sped up astonishingly by a GPU, Bitcoin mining.

The difference is that hashing can be made to stress the weakest part of any modern computer system: the memory hierarchy’s terrible bandwidth and latency[^6]; the hash can blow through the fast die-level caches (the CPU & its cache) and force constant fetches from main RAM. Memory-bound functions were devised for anti-spam proof-of-work systems that wouldn’t unfairly penalize cellphones & PDAs while still being costly on desktops & workstations (which rules out the usual functions, like Hashcash, that stress the CPU). For example, the 2003 On Memory-Bound Functions for Fighting Spam; from the abstract:

> Burrows suggested that, since memory access speeds vary across machines much less than do CPU speeds, memory-bound functions may behave more equitably than CPU-bound functions; this approach was first explored by Abadi, Burrows, Manasse, and Wobber [8]. We further investigate this intriguing proposal. Specifically, we…
>
> 1. Provide an abstract function and prove an asymptotically tight amortized lower bound on the number of memory accesses required to compute an acceptable proof of effort; specifically, we prove that, on average, the sender of a message must perform many unrelated accesses to memory, while the receiver, in order to verify the work, has to perform significantly fewer accesses;
> 2. Propose a concrete instantiation of our abstract function, inspired by the RC4 stream cipher;
> 3. Describe techniques to permit the receiver to verify the computation with no memory accesses;
> 5. Give experimental results showing that our concrete memory-bound function is only about four times slower on a 233 MHz settop box than on a 3.06 GHz workstation, and that speedup of the function is limited even if an adversary knows the access sequence and uses optimal off-line cache replacement.

Abadi 2005, Moderately hard, memory-bound functions, develops more memory-bound functions and benchmarks them (partially replicated by Das & Doshi 2004):

> …we give experimental results for five modern machines that were bought within a two-year period in 2000-2002, and which cover a range of performance characteristics. All of these machines are sometimes used to send e-mail - even the settop box, which is employed as a quiet machine in a home… None of the machines have huge caches - the largest was on the server machine, which has a 512KB cache. Although the clock speeds of the machines vary by a factor of 12, the memory read times vary by a factor of only 4.2. This measurement confirms our premise that memory read latencies vary much less than CPU speeds.
>
> …At the high end, the server has lower performance than one might expect, because of a complex pipeline that penalizes branching code. In general, higher clock speeds correlate with higher performance, but the correlation is far from perfect… Second, the desktop machine is the most cost-effective one for both CPU-bound and memory-bound computations; in both cases, attackers are best served by buying the same type of machines as ordinary users. Finally, the memory-bound functions succeed in maintaining a performance ratio between the slowest and fastest machines that is not much greater than the ratio of memory read times.

Colin Percival continues the general trend in the context of finding password schemes which are resistant to cheap brute-forcing, inventing scrypt in the 2009 paper Stronger Key Derivation via Sequential Memory-Hard Functions. Percival notes that designing a really good memory-bound function requires not overly relying on latency, since his proofs do not incorporate latency, although in practice this might not be so bad:

> Existing widely used hash functions produce outputs of up to 512 bits (64 bytes), closely matching the cache line sizes of modern CPUs (typically 32-128 bytes), and the computing time required to hash even a very small amount of data (typically 200-2000 clock cycles on modern CPUs, depending on the hash used) is sufficient that the memory latency cost (typically 100-500 clock cycles) does not dominate the running time of ROMix.
>
> However, as semiconductor technology advances, it is likely that neither of these facts will remain true. Memory latencies, measured in comparison to CPU performance or memory bandwidth, have been steadily increasing for decades, and there is no reason to expect that this will cease — to the contrary, switching delays impose a lower bound of Ω(log N ) on the latency of accessing a word in an N-byte RAM, while the speed of light imposes a lower bound of Ω( √N ) for 2-dimensional circuits. Furthermore, since most applications exhibit significant locality of reference, it is reasonable to expect cache designers to continue to increase cache line sizes in an attempt to trade memory bandwidth for (avoided) memory latency.
>
> In order to avoid having ROMix become latency-limited in the future, it is necessary to apply it to larger hash functions. While we have only proved that ROMix is sequential memory-hard under the Random Oracle model, by considering the structure of the proof we note that the full strength of this model does not appear to be necessary.
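Percival’s ROMix, stripped to its core idea, looks something like this (a simplified sketch with SHA-256 standing in for scrypt’s actual BlockMix/Salsa20/8 core, and toy parameters throughout):

```python
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def romix(password: bytes, N: int) -> bytes:
    """Simplified ROMix (after Percival 2009): fill a table of N hash
    outputs, then make N data-dependent pseudorandom reads into it, so a
    fast solver must keep roughly N blocks live in memory."""
    V, x = [], H(password)
    for _ in range(N):            # first loop: fill memory sequentially
        V.append(x)
        x = H(x)
    for _ in range(N):            # second loop: unpredictable reads
        j = int.from_bytes(x[:8], "little") % N
        x = H(bytes(a ^ b for a, b in zip(x, V[j])))
    return x
```

Because the read index j is derived from the running state, the access pattern cannot be predicted in advance, which is what frustrates cache-friendly or trade-off-heavy hardware implementations.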

Percival constructs a password algorithm on top of his new hash function and then calculates costs using 2002 circuit prices:

> When used for interactive logins, it is 35 times more expensive than bcrypt and 260 times more expensive than PBKDF2; and when used for file encryption — where, unlike bcrypt and PBKDF2, scrypt uses not only more CPU time but also increases the die area required — scrypt increases its lead to a factor of 4000 over bcrypt and 20000 over PBKDF2.

That is quite a difference between the hashes, especially considering that bcrypt and PBKDF2 were already engineered to have adjustable difficulty for reasons similar to our time-lock crypto puzzles.

[^1]: Time-Lock Puzzles in the Random Oracle Model (Mahmoody, Moran, and Vadhan 2011):

    > In addition to the basic use of sending messages to the future, there are many other potential uses of timed-release crypto. Rivest, Shamir and Wagner 1996 suggest, among other uses, delayed digital cash payments, sealed-bid auctions and key escrow. Boneh and Naor define timed commitments and timed signatures and show that they can be used for fair contract signing, honesty-preserving auctions and more.

    Document embargoes and receipt-free voting are other applications; a cute application is Offline Submission with RSA Time-Lock Puzzles, Jerschow & Mauve 2010:

    > Our main contribution is an offline submission protocol which enables an author being currently offline to commit to his document before the deadline by continuously solving an RSA puzzle based on that document. When regaining Internet connectivity, he submits his document along with the puzzle solution which is a proof for the timely completion of the document.

    One not-so-cute use is in defeating antivirus software. Anti-Emulation Through Time-Lock Puzzles, Ebringer 2008 outlines it: one starts a program with a small time-lock puzzle which must be solved before the program does anything evil, in the hopes that the antivirus scanner will give up or stop watching before the puzzle has been solved and the program decrypts the evil payload; the puzzle’s math backing means no antivirus software can analyze or solve the puzzle first. The basic functionality cannot be blacklisted, as it is used by legitimate cryptography software such as OpenSSL, making blacklisting expensive collateral damage.

[^2]: Ebringer 2008, applying time-lock puzzles to enhancing the ability of computer viruses & trojans to defeat anti-virus scanners, describes Rivest’s original successive-squaring solution somewhat sarcastically:

    > Even in the original paper, the authors struggled to find a plausible use for it. To actually use the construction as a time-lock requires predicting the speed of CPUs in the future, resulting, at best, in a fuzzy release-date. This assumes that someone cares enough to want what is allegedly wrapped up in the puzzle to bother to compute the puzzle in the first place. It is not obvious that in the majority of situations, this would have a clear advantage over, say, leaving the information with a legal firm with instructions to release it on a particular date. Although this paper proposes a practical use for time-lock puzzles, the original authors would probably be dismayed that there is still not a widespread usage that appears to be of net benefit to humanity.

    On the other hand, a similar criticism could be and has been made about Bitcoin (supporters/users must expend massive computing power constantly just to keep it working, with no computational advantage over attackers), and that system has worked pretty well in practice.

[^3]: Colin Percival’s Insecurity in the Jungle (disk) presents a table giving times for brute-forcing MD5 hashes given various hardware; most dramatically, <$1M of custom ASIC hardware could brute-force a random 10-character string in 2 hours. (Hardware reaps extreme performance gains mostly when few memory accesses are required and a few fast operations are applied to small amounts of data; this is because flexibility imposes overhead, and when the overhead is incurred just to run fast instructions, the overhead dominates the entire operation. For example, graphics chips do just a relative handful of math to a frame, again and again, and so they gain orders of magnitude speedups by being specialized chips - as does any other program which is like that, which includes cryptographic hashes designed for speed like the ones Bitcoin uses.) Wikipedia gives an older example using FPGAs (also being used for Bitcoin hashing):

    > An important consideration to be made is that CPU-bound hash functions are still vulnerable to hardware implementations. For example, the literature provides efficient hardware implementations of SHA-1 in as low as 5000 gates, and able to produce a result in less than 400 clock cycles. Since multi-million gate FPGAs can be purchased at less than $100 price points, it follows that an attacker can build a fully unrolled hardware cracker for about $5000. Such a design, if clocked at 100MHz can try about 300,000 keys/second for the algorithm proposed above.

[^4]: Actual numbers; the difference really is that large.

[^5]: eg. the 2008 Graves thesis, High performance password cracking by implementing rainbow tables on nVidia graphics cards (IseCrack), claims a 100x speedup over CPU generation of rainbow tables; see also the actively developed utility RainbowCrack (from which you can even buy pre-generated rainbow tables).