Notes on Pascal’s Mugging

Gwern Branwen

Attempt to resolve Pascal’s mugging by proposing & justifying a linear prior probability which bounds losses.

2009-12-23–2019-09-13 draft certainty: possible importance: 5 similar bibliography

Background
The Mugging
Fixes
Bibliography
Optimal Sequential Decision Making

[Warning: JavaScript Disabled!]

[For support of key website features (link annotation popups/popovers & transclusions, collapsible sections, backlinks, tablesorting, image zooming, sidenotes etc), you must enable JavaScript.]

Pascal’s Mugging is a general problem in decision theory: it exposes the paradoxical and apparently irrational consequences of a thorough-going probabilistic approach to devising a rational utilitarian agent, by offering a deal which seems, intuitively, to be utterly worthless but which the agent will accept. It is generally thought that any system which allows an agent to be exploited this way, to be a ‘money pump’, is irrational & flawed. I attempt to dissect Pascal’s Mugging into 2 separate problems, one of an epistemological flaw in the agent, and another about how to deal with low-probability but high-payoff events.

Background

Classical decision theory, going back to von Neumann and Morgenstern’s Theory of Games and Economic Behavior, says that a rational agent pursuing an arbitrary goal will select the sequence of actions that leads to the greatest fulfillment of that goal. If we imagine a branching tree of actions, the agent will sum up every path through the tree, and take the highest path. But what if the branches are not certain, if taking a path through the tree like A->D->M is not guaranteed to wind up at M, if each step through the tree is only probabilistic? How does one weigh the probability of a step succeeding against the value of a step in such a way that it can be compared to the many other possible steps? The very simplest way is to multiply the value of a step against the probability of reaching it. This gives us some number for each step; we will call it the ‘expected value’.

So imagine an agent seeking money; he is offered a fair coin-flip where heads, he gains $6 and tails he loses $4. Here he assess three possible steps: don’t accept the deal, accept the deal and get heads, accept the deal and get tails. The value of not accepting the deal is 0, and the probability of 0 is 100%, so 0. The value of accepting the deal and getting heads is $6, and since it’s a fair coin, he has a 50% chance; 50% of $6 is $3. The value of accepting the deal and getting tails is -$4, and this too is a 50% chance; 50% of -4 is -2. So, heads has an expected value of 3; tails has an expected value of -2; added together, they have a value of 1. 1 is greater than 0, so the agent will take the coin-flip and will wax wealthy.

Taking the choice with the greatest expected value is a very simple policy, and it works well in many real-life situations. It tells us to avoid the lottery, for example, and casinos. It will tell us when we win in a game and when we wouldn’t. Besides simplicity, it has the law of large numbers on its side—after enough trials, we will have the expected value. Or if we imagine that there is an agent for every possible outcome, then the total wealth/value across all these possible agents is as high as is possible, no other approach can do better across all possible outcomes. (That is, we could imagine an otherwise highly irrational agent, which has precognition, so it will only play games with you if it knows it will win; this agent would win against an expected-value agent, but only because we have magically defined away all the many outcomes where that precognitive agent loses. Or we could imagine a highly risk-averse agent; it would pass up opportunities our expected-value agent wouldn’t, and in the long run would do worse than our agent.)

So this expected-value paradigm has long been one of the best contenders for a definition of rationality in actions. (As far as beliefs goes, the agent could be a Bayesian or some other statistical formalism.)

The Mugging

“There is a concept which corrupts and upsets all others. I refer not to Evil, whose limited realm is that of ethics; I refer to the infinite.”¹

So let us imagine our agent in the Mugging. As described by Nick Bostrom, our agent is accosted by the mugger and asked what probability it would assign any deal offered by the mugger. How the agent calculates this probability is irrelevant, but the agent does give a probability—one in one quadrillion. The mugger then offers the agent a deal: some money right now, in exchange for 100 quadrillion tomorrow. The agent sets the requested money (-20) against the expected value (100 quadrillion times 1-in-1-quadrillion = 100), and concludes a net benefit of 80, which is greater than the expected-value of not taking the deal (0). (Note that I use ‘money’ here as a place-holder of ‘utility’ or ‘goal-fulfillment’ or some such term.)

Obviously something has gone wrong here. We seem to have a money pump or a Dutch Book: a deal which leaves our supposedly supremely rational agent worse off than he could have been, and worse, a deal which the mugger can offer again and again until our agent is flat-broke. But what exactly went wrong? The agent followed the procedure exactly as we taught him, and it had worked well up to this point.

And this is a real issue, since the mugging could take the form of the mugger asking the agent to become his disciple and help spread his new cult, or to spend the rest of his life statistically analyzing the Bible for messages from God, or ask the agent’s help in bombing some buildings. The mugger can ask essentially anything of the agent, and get it. As a normative ideal of rationality, this leaves something to be desired, and without a good normative ideal of rationality, we cannot point out the errors of other peoples’ ways, or analyze situations where human intuitions about expected-value breaks down (such as with particularly complex financial agreements, or blue-sky research projects).

Fixes

The simplest fix is to set some upper bound on the agent’s possible value. So our formula would be something like ‘probability times value, unless value is greater than upper-bound in which case 0’. In this case, our agent would have a bound, say, 20 billion, and would consider the offer of 100 quadrillion as an offer for 0, and thus incurring a cost of -20 with no gain (100 quadrillion times 0), which is clearly worse than the status quo of 0. The mugger would go away unless he could somehow convince the agent that there was a better than one-in-a-billion chance that the mugger would deliver on 20 billion (as 1/100 million times 20 billion, minus the cost of the $20 is greater than 0).

The upper bound on value has been suggested by multiple authors (McGee1999, De Blanc2007 & 2009). The upper bound does have some intuitive appeal. If our agent is seeking pleasure, there’s only so much pleasure it can physiologically have at any one moment, and its lifespan is finite, so the maximum possible pleasure would also be finite. Given that, why would it want any more? In other areas, further gains decrease in value, and sometimes seem to be worthless. (When Bill Gates was valued at $90 billion, would another $100,000 have been worth anything whatsoever to him?)

But our grand theory of rationality should work for all agents. What if our agent were immortal? If it could only have 20 billion units of pleasure and then it considered any further pleasure absolutely worthless, would we see it living for a million years or so and then one day simply cease moving & starve to death because it had reached the upper limit and now every possible action was equally worthless? That seems rather odd.

And how would this limit be assigned? If we imagine making our agent, how could we calculate what the limit should be? We might say that ‘our agent can only have 10 orgasms a second and will live for a century, so we can set the limit to X’, but this might cause problems if our agent is upgraded to be able to experience pleasure equivalent to 30 orgasms a second or its lifespan is extended. And if the goal is tied to external things, we also have this problem; we might set out to create a rational Gandhi whose mission is to lessen the suffering of all mankind—but mankind may keep growing and expanding indefinitely. If we set the limit too high, then our agent is vulnerable to the Mugging; if the limit is too low, it will commit suicide without, from our perspective, the mission being fulfilled.

So the upper bound solution may solve some Muggings, but at costs unacceptable. Here too we can ask our intuitions: we do not feel vulnerable or persuaded by the Mugging even at our most rational, but do we also feel that we might one day cease to take pleasure in any activity and simply wait for death? I at least do not.

A second solution is to take into account the specific value offered by the mugger and thereby change the assessed probability. Robin Hanson (Yudkowsky2007), writing about a negative version of the Mugging (with astronomical threats instead of rewards), suggests that “It might be more promising to assume that states with many people hurt have a low correlation with what any random person claims to be able to effect.” That is, for an agent with a small probability to be confronted with a ‘real’ mugger (genuinely capable of delivering on his claim) is to be in a very unusual position: the agent is now able to affect possibly even more people than Hitler, Stalin, Napoleon, Mao, Alexander the Great etc. all put together. (Perhaps the mugger threatened to harm 20 quadrillion people.) Yet all those other people have no ‘real’ mugger of their own. So what are the odds of our agent just happening to be the only one out of 20 quadrillion people to run into the mugger? We might guess that it’s something like 1 in 20 quadrillion, which as it happens, exactly counterbalances the value, giving us a very small or even negative expected value and defeating the mugger.

The problem with this solution is that it seems to solve the Mugging at the expense of other problems. If we initially assume that the chance of us really affecting X people is 1/X, then this suggests some counterintuitive beliefs. For example, we ought to start out believing that the chance of humanity right now being able to do anything about global warming is on the order of 1 in hundreds of billions or trillions. Why? It seems plausible humanity will continue to exist for centuries or millennia, and the billions or trillions of humans living during those centuries will be affected by any fix of global warming. Now, one may be skeptical about the political process being effective or anti-global warming efforts bearing fruit, but one-in-trillions or even just one-in-billions seems unduly pessimistic.

And there is a situation in which everybody could in fact affect X people—if there is an infinite number of people. Scientists have suggested infinite universes on multiple occasions, and we cannot rule the idea out on any logical ground. Should our theory of rationality stand or fall on what the cosmologists currently think? That too seems a flaw.

The third solution is to object to the procedure entirely. The problem is that our agent provides his probability of any payout first, and then the mugger can easily construct a deal with positive expected-value. Baumann2009 objects to providing the probability of a payout first, but he doesn’t justify this except to call such an ‘independent’ probability ‘extremely implausible’. I think we can firm this up by pointing out that here, with that independent probability, infinities have snuck right back into the problem. And where infinities are, we should not be surprised to see confusion. Thus, we can dissect the Mugging into 2 distinct problems: the problem of how one should assess arbitrary statements, and how to value high-payoff-low-probability actions.

Consider the generalized form of the first question: the mugger asks the agent what its probability that the mugger’s next statement will be true. It replies 1 in n; the mugger then makes the promise ‘give me your wallet & I will give you 1000n tomorrow’ etc.

But what sort of probability could n be? It’s in effect trying to give a probability over anything the mugger might say or promise. I don’t think this makes sense, as even if we ignore the unbounded length of the mugger’s potential next statement, he could state strange things like ‘Turing machine X halts’ or ‘the quadrillionth digit of Chaitin’s uncomputable number Omega is 1’—state all manner of undecidable, intractable, or incoherent statements. We might get rid of outright nonsense with some stipulations on content, but we’d be left with perfectly reasonable yet undecidable or intractable statements. Giving any probability seems to me to be quite impossible.

Or to narrow it back down, maybe the agent is told the mugger offers deals paying out a random integer, and asked his assessment that the mugger will pay out on any offer. Each possible offer must have a probability, we feel, which is nonzero. (Per Baumann, we feel that whatever the odds of payout on 10 utilities/dollars is, it’s greater than that of 1000.) But there are an infinite number of integers, so if each one is nonzero, then what is your total assessment? We asked for ‘any’, so the individual probabilities must simply add up —if the mugger has secretly resolved to pay you only if the random integer is 20, but not otherwise, then the total must be non-zero, since if the agent said 0, and the mugger wound up paying because 20 came up, then that’d be a contradiction.

But what’s the sum of an infinite series of non-zero terms? The obvious answers are 1 or infinity, both of which are ridiculous answers, and there’s no apparent sensible limit to converge on. So it must be impossible to have a minimum probability for every possible offer. (I remember hearing once that there is no uniform random way to pick an integer from the set of all integers, which would also imply the impossibility of having any sort of maximum-entropy prior on all the integers.)

In a very general way, I think this defeats many concerns about agents succumbing to the ‘fanaticism’ problem of thinking it might be in a simulation just because someone mentioned the idea to it, and scouring the universe looking for a way out—you can’t form any sensible probabilities about things ‘outside’ your total assemblage of theories & data & principles, because that implies you have information (encoded as a probability) about those outside things which isn’t included in your assemblage, which is either irrational or contradictory, but if those outside things are part of your assemblage (if you really have good reason to believe them), then it’s not really a ‘mugging’ to pursue them.

An example: imagine our universe allows the agent an optimal outcome of 1 trillion utilities, and someone suggests the simulation argument to it, saying nothing stops the ‘real’ universe from being big enough to allow the agent 1 trillion squared utilities, and so it should go look for loopholes; the agent wouldn’t do it because the possibility would be based on actual research and investigations, which all argue that there is no simulation and that even if there were, the agent has no reason to believe it’s any particular size larger than the current universe, because per the previous arguments that would imply assessments over all the integers or all statements/strings/theories. Now, if the agent had real evidence of simulation, such as stars choreographically flashing on and off, or perhaps anthropic arguments from actually running its own simulations, then it might go off on the search—but that’s as it should be.

So, Baumann is right in rejecting providing the probability first, since it could well imply outright contradiction, but then how do we decide how to scale our probability by the promised sum?

If we don’t scale down in proportion, we just render ourselves muggable at some point, even though the mugger might not know what; but if we scale down proportionately, that seems to me to say something along the lines of ‘under all possible deals I expect a certain constant to be my expected utility’ (eg. if we put him at 1/1 quadrillion when he offers 10 quadrillion, then we expect 10 from the deal, just like we would if it were 1/100 quadrillion for his 1000 quadrillion offer…), which is kind of strange too, since we already have claimed that some sort of nonlinear scaling feels intuitively appropriate—do we decrease nonlinearly until we hit the total wealth of the earth & switch to linear, or just when the promise reaches into the millions, or what?

On the other hand, the disproportionately shrinking strategy may not work—imagine we are in a universe where we know the majority of the universe’s utility is in the hands of a perverse god who will send muggers out to do his bidding, only one of which will deliver (but there will be a delivery before the universe expires). In this universe, shouldn’t our probability that a mugger will deliver go up over time as the utility remains unawarded to someone? We don’t actually know that there is such a god, but mightn’t we assign it a nonzero probability since it doesn’t initially seem to run afoul of the previous arguments about the illegitimacy of assigning a probability over everything (a set & finite universe, number of mugger encounters, and utility.) This argument is weak, though, since we could imagine that the utility might not be guaranteed to be awarded to someone, in which case the odds would get worse over time as lottery tickets get used up and the deadline draws near—every increasing-odds scenario could be neutralized by a decreasing-odds scenario.

Bibliography

Baumann, Peter. “Counting on Numbers”, Analysis 69:3 (2009), pp. 446-448 Bostrom, Nick. “The Infinitarian Challenge to Aggregative Ethics” (2008), https://nickbostrom.com/ethics/infinite.pdf Bostrom, Nick. “Pascal’s Mugging”, Analysis 69:3 (2009), pp. 443-445. https://nickbostrom.com/papers/pascal.pdf De Blanc, Peter. “Convergence of Expected Utilities with Algorithmic Probability Distributions” (2007), https://arxiv.org/abs/0712.4318 De Blanc, Peter. “Convergence of Expected Utility for Universal AI” (2009), https://arxiv.org/abs/0907.5598 McGee, Vann. “An airtight Dutch book”, Analysis 59:4 (1999), pp. 257-65 Yudkowsky, Eliezer. “Pascal’s Mugging: Tiny Probabilities of Vast Utilities”, LessWrong.com, 2007-10-19. https://www.lesswrong.com/posts/a5JAiTdytou3Jg749/pascal-s-mugging-tiny-probabilities-of-vast-utilities

One way to try to escape a mugging is to unilaterally declare that all probabilities below a certain small probability will be treated as zero. With the right pair of lower limit and mugger’s credibility, the mugging will not take place.

But such a ad hoc method violates common axioms of probability theory, and thus we can expect there to be repercussions. It turns out to be easy to turn such a person into a money pump, if not by the mugging itself.

Suppose your friend adopts this position, and he says specifically that any probabilities less than or equal to 1/20 are 0. You then suggest a game; the two of you will roll a d20 die, and if the die turns up 1-19, you will pay him one penny and if the die turns up 20, he pays you one dollar—no, one bazillion dollars.

Your friend then calculates: there is a 19/20 chance that he will win a little money, and there is a 1/20 chance he will lose a lot of money—but wait, 1/20 is too small to matter! It is zero chance, by his rule. So, there is no way he can lose money on this game and he can only make money.

He is of course wrong, and on the 5th roll you walk away with everything he owns. (And you can do this as many times as you like.)

Nor can it be rescued simply by shrinking the probability; the same game that defeated a limit of 1/20 can easily be generalized to 1/n, and in fact, we can make the game even worse: instead of 1 to n-1 winning, we define 0 and 1 as winning, and each increment as losing. As n increases, the large percentage of instant losses increases.

So clearly this modification doesn’t work.

If we think about the scaling of the prior probability of a reward, there’s clearly 3 general sorts of behavior:

sublinear relationship where the probability shrinks slower than the reward increases
a linear relationship where the probability shrinks as fast as the reward increases
and a superlinear relationship where the probability shrinks faster than the reward increases.

The first doesn’t resolve the mugging. If the mugging is ever rejected, then the mugger merely increases the reward and can always devise an accepted mugging.

The second and third both resolve the mugging—the agent will reject a large mugging if it rejected a smaller one. But is there any reason to prefer the linear relationship to the superlinear?

Yes, by the same reasoning as leads us to reject the lower limit solution. We can come up with a game or a set of deals which presents the same payoff but which an agent will accept when they should reject and vice versa.

Let’s suppose a mugger tries a different mugging. Instead of asking for $20, he asks for a penny, and he mentions that he will ask this another 1999 times; because the reward is so small, the probability is much higher than if he had asked for a flat $20 and offered a consequently larger reward.

If the agent’s prior was not so low that they always rejected the mugger, then we would see an odd flip—for small sums, the agent would accept the mugging and then as the sums increased, it would suddenly start rejecting muggings, even when the same total sums would have been transacted. That is, just by changing some labels while leaving the aggregate utility alone, we change the agent’s decisions.

If the agent had been using a linear relationship, there would be no such flip. If the agent rejected the penny offer it would reject the $20 offer and vice versa. It would be consistently right (or wrong). The sublinear and superlinear relationships are inconsistently wrong.

http://sl4.org/archive/0812/index.html#19570

https://www.lesswrong.com/posts/Ge3EueY5kfogfbd7e/open-thread-august-2010-part-2?commentId=6GM2j5AJgzCxpWCmu#comments

While raking, I think I finally thought of a proof that the before-offer-probability can’t be known.

The question is basically ‘what fraction of all Turing machines making an offer (which is accepted) will then output a certain result?’

We could rewrite this as ’what is the probability that a random Turing machine will output a certain result?

We could then devise a rewriting of all those Turing machines into Turing machines that halt or not when their offer is accepted (eg. halting might = delivering, not halting = welshing on the deal. This is like Rice’s theorem).

Now we are asking ‘what fraction of all these Turing machines will halt?’

However, this is asking ‘what is Chaitin’s constant for this rewritten set of Turing machines?’ and that is uncomputable!

Since Turing machine-based agents are a subset of all agents that might try to employ Pascal’s Mugging (even if we won’t grant that agents must be Turing machines), the probability is at least partially uncomputable. A decision procedure which entails uncomputability is unacceptable, so we reject giving the probability in advance, and so our probability must be contingent on the offer’s details (like its payoff).

continuing above:

Last I emailed you, I had an intuition that a linear relationship was best but no real justification. I think I have one now that goes like this:

Inasmuch as the mugger could be the robotic instantiation of an arbitrary Turing Machine, making predictions about what the mugger will output (‘do’) in response to your input is tantamount to solving the Halting Problem (see Rice’s theorem). Trying to make a prediction about the fraction of all muggers which will welsh on the deal is like trying to compute Chaitin’s omega; if you only compute part of it, you will make mistakes when the offered reward is very close to the right value to make the expected utility positive, but to compute all of it is impossible. This is true for the same reasons that most universal priors (like for Solomonoff induction) are uncomputable.

So, it is impossible to be an optimal agent when being mugged. An agent must have an erroneous prior and make mistakes. The question becomes what kind of prior and what kind of mistakes should the agent make?

We can look at our prior as a monotonic function on a Cartesian graph with the y-axis for offered-reward and the x-axis for the prior of said reward; for example:

the graph might be a simple inverse relationship where prior is inverse to reward; small increases cause small decreases and large increases large decreases
it might look more like an exponential where a large increase in reward leads to only a small decrease in the prior probability
or it might look logarithmic where small reward increases lead to a large decrease the prior probability.

These three general classes are all there are. (small/small; large/small; small/large.)

Suppose we make our agent with strategy #2. The mugger presents the agent with a mugging, is turned down, and simply throws in a few exponentials and succeeds. The prior doesn’t decrease as fast as the reward increases, so a patient mugger will always win. And this is true even if the agent updates the prior probability so all the previous deals are now considered negative expected utility—the mugger comes back and simply escalates the offer!

Suppose we make our agent have #3. A mugger presents our agent with a mugging, but makes a mistake and offers too high a reward. The agent instantly rejects it because the large reward incurs a massive penalty. So far so good. But the mugger reflects, and decides to break the offer up into 1 billion smaller offers summing to the same reward. Since each of the 1 billion muggings is for a much smaller reward, they incur little penalty and the agent accepts them all and is thoroughly mugged. Oops.

We could try to repair #3 by translating the curve leftwards, if you will—reducing the prior probability down to something safe even at very small values. For example, if a #3 agent would take a dollar in exchange for a penny and its prior is 2%, it can be mugged, and one would repair by forcing the prior down to <1% so now it is either indifferent or rejects outright the mugging. But alas, as most ad hoc interventions do, this doesn’t fix the problem entirely. Now our mugger can simply make 100 billion offers for slivers of a penny rather than dollars. Each deal will seem like a good deal, yet add up to ruin. Not a good system. Worse, any adjustment to the prior based on updates from the mugger’s failure to deliver would seem to leave this hole intact! Suppose they run through the 100 billion offers and the next day the mugger is nowhere to be seen. So the agent cuts the prior in half. Now the mugger can come back and make offers for half slivers of pennies…

If our agent is a #1 agent, matters are a little different.

A #1 agent does not have the disaggregation problem of a #3 agent; whether it’s one big offer or a billion small offers, the expected utilities sum the same no matter how they are divvied up.

Does it have the escalation problem of a #2 agent? It would seem not to; the question of whether it would take a large reward is the same as the question of whether it would take the small reward—would it accept any mugging? It all depends on how far leftwards the line starts off, on whether the equation looks like $y \cdot x = 1$ or $y \cdot x = 0.9$ or $y \cdot x = 1.1$ . The degree that the constant differs from 1 is the degree to which our agent is prejudiced towards or against mugging. If the constant is 0.9 then it will accept no muggings, of course, since all rewards come pre-assessed as negative-sum offers; if the constant is 1.1, then vice versa.

So with a bad constant, our #1 agent could be even worse than a #2 or #3 agent! But actually, it’s better. It will make errors and accept some muggings, but each mugging the agent will learn. Each time the mugger fails to deliver, the constant will shrink until it reaches the magic 1 and then the agent will start rejecting muggings. Its losses are finite and limited by how large the constant starts off with, so it doesn’t matter what that constant is. It could be lower than 1, since our agent could learn to increase its constant as well as decrease; if the mugger really is a Dark Lord of the Matrix, the mugger simply has to offer a little proof and our agent will be happy to transact with him.

Considering all the possibilities, the agent with the fewest problems is #1, and we have our answer to how to avoid Pascal’s Mugging: choose a prior which decreases in simple proportion to the offer.

Now, it’s obvious that you don’t want to scale any prior probability more or less than the reward because either way involves you in difficulties. But do we have any non-ad hoc reason to specifically scale the size of the reward by 1/n, to regard the complexity as equal to the size?

I think there may be a Kolmogorov complexity argument that size does equal complexity, and welcome anyone capable of formally dealing with it. My intuition goes: Kolmogorov complexity is about lengths of strings in some language which map onto outputs. By the pigeonhole principle, not all outputs have short inputs. But various languages will favor different outputs with the scarce short inputs, much like a picture compressor won’t do so hot on music files. 3^^^^3 has a short representation in the usual mathematical language, but that language is just one of indefinitely many possible mathematical languages; there are languages where 3^^^^3 has a very long representation, one 3^^^^3 long even, or longer still! Just like there are inputs to zip which compress very well and inputs which aren’t so great.

Now, given that our particular language favors 3^^^^3 with such a short encoding rather than many other numbers relatively nearby to 3^^^^3, doesn’t Pascal’s mugging start looking like an issue with our language favoring certain numbers? It’s not necessary to favor 3^^^^3, other languages don’t favor it or actively disfavor it. Those other languages would seem to be as valid and logically true as the current one, just the proofs or programs may be longer and written differently of course.

So, what do you get when you abstract away from the particular outputs favored by a language? When you add together and average the languages with long encodings and the ones with short encodings for 3^^^^3? What’s the length of the average program for 3^^^^3? I suggest it’s 3^^^^3, with every language that gives it a shorter encoding counterbalanced by a language with an exactly longer encoding.

An example of Kolmogorov complexity: Fabrice Bellard has written a 475-byte C program which computes the prime 2^6972593 − 1, which has over 2 million digits—and does so in a few seconds. Other examples are pretty much anything written for the demoscene, which prized writing complex graphics & audio programs in just a few hundred bytes.