---
title: "The Kelly Coin-Flipping Game: Exact Solutions via Decision Trees"
description: Decision-theoretic analysis of how to optimally play Haghani & Dewey 2016's 300-round double-or-nothing coin-flipping game with an edge and ceiling better than using the Kelly Criterion. Computing and following an exact decision tree increases earnings by $6.6 over a modified KC.
tags: statistics, decision theory, Haskell, NN, Python, shell, R, C, C++, Bayes
author: Gwern Branwen, Arthur B., nshepperd, FeepingCreature, Gurkenglas
created: 19 Jan 2017
status: finished
confidence: likely
importance: 8
...
> Haghani & Dewey 2016 experiment with a double-or-nothing coin-flipping game where the player starts with \$25 and has an edge of 60%, and can play 300 times, choosing how much to bet each time, winning up to a maximum ceiling of \$250. Most of their subjects fail to play well, earning an average \$91, compared to Haghani & Dewey 2016's heuristic benchmark of ~\$240 in winnings achievable using a modified Kelly Criterion as their strategy. The KC, however, is not optimal for this problem as it ignores the ceiling and limited number of plays.
>
> We solve the problem of the value of optimal play exactly by using decision trees & dynamic programming for calculating the value function, with implementations in R, Haskell, and C. We also provide a closed-form exact value formula in R & Python, several approximations using Monte Carlo/random forests/neural networks, visualizations of the value function, and a Python implementation of the game for the OpenAI Gym collection. We find that optimal play yields \$246.61 on average (rather than ~\$240), and so the human players actually earned only 36.8% of what was possible, losing \$155.6 in potential profit. Comparing decision trees and the Kelly criterion for various horizons (bets left), the relative advantage of the decision tree strategy depends on the horizon: it is highest when the player can make few bets (at _b_=23, with a difference of ~\$36), and decreases with number of bets as more strategies hit the ceiling.
>
> In the Kelly game, the maximum winnings, number of rounds, and edge are fixed; we describe a more difficult generalized version in which the 3 parameters are drawn from Pareto, normal, and beta distributions and are unknown to the player (who can use Bayesian inference to try to estimate them during play). Upper and lower bounds are estimated on the value of this game. In the variant of this game where subjects are not told the exact edge of 60%, a Bayesian decision tree approach shows that performance can closely approach that of the decision tree, with a penalty for 1 plausible prior of only \$1. Two deep reinforcement learning agents, DQN & DDPG, are implemented but DQN fails to learn and DDPG doesn't show acceptable performance, indicating better deep RL methods may be required to solve the generalized Kelly game.
# Background
## Set up
The paper ["Rational Decision-Making Under Uncertainty: Observed Betting Patterns on a Biased Coin"](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2856963), by Haghani & Dewey 2016 runs an economics/psychology experiment on optimal betting in a simple coin-flipping game:
> What would you do if you were invited to play a game where you were given \$25 and allowed to place bets for 30 minutes on a coin that you were told was biased to come up heads 60% of the time? This is exactly what we did, gathering 61 young, quantitatively trained men and women to play this game. The results, in a nutshell, were that the majority of these 61 players didn't place their bets very well, displaying a broad panoply of behavioral and cognitive biases. About 30% of the subjects actually went bust, losing their full \$25 stake. We also discuss optimal betting strategies, valuation of the opportunity to play the game and its similarities to investing in the stock market. The main implication of our study is that people need to be better educated and trained in how to approach decision making under uncertainty. If these quantitatively trained players, playing the simplest game we can think of involving uncertainty and favourable odds, didn't play well, what hope is there for the rest of us when it comes to playing the biggest and most important game of all: investing our savings? In the words of [Ed Thorp](!Wikipedia "Edward O. Thorp"), who gave us helpful feedback on our research: "This is a great experiment for many reasons. It ought to become part of the basic education of anyone interested in finance or gambling."
More specifically:
> ...Prior to starting the game, participants read a detailed description of the game, which included a clear statement, in bold, indicating that the simulated coin had a 60% chance of coming up heads and a 40% chance of coming up tails. Participants were given \$25 of starting capital and it was explained in text and verbally that they would be paid, by check, the amount of their ending balance subject to a maximum payout. The maximum payout would be revealed if and when subjects placed a bet that if successful would make their balance greater than or equal to the cap. We set the cap at \$250...Participants were told that they could play the game for 30 minutes, and if they accepted the \$25 stake, they had to remain in the room for that amount of time.^5^ Participants could place a wager of any amount in their account, in increments of \$0.01, and they could bet on heads or tails...Assuming a player with agile fingers can put down a bet every 6 seconds, that would allow 300 bets in the 30 minutes of play.
## Near-optimal play
The authors make a specific suggestion about what near-optimal play in this game would be, based on the [Kelly criterion](!Wikipedia) which would yield bets each round of 20% of capital:
> The basic idea of the Kelly formula is that a player who wants to maximize the rate of growth of his wealth should bet a constant fraction of his wealth on each flip of the coin, defined by the function $(2 \cdot p) - 1$, where $p$ is the probability of winning. The formula implicitly assumes the gambler has log utility. It's intuitive that there should be an optimal fraction to bet; if the player bets a very high fraction, he risks losing so much money on a bad run that he would not be able to recover, and if he bet too little, he would not be making the most of what is a finite opportunity to place bets at favorable odds...We present the Kelly criterion as a useful heuristic a subject could gainfully employ. It may not be the optimal approach for playing the game we presented for several reasons. The Kelly criterion is consistent with the bettor having log-utility of wealth, which is a more tolerant level of risk aversion than most people exhibit. On the other hand, the subjects of our experiment likely did not view \$25 (or even \$250) as the totality of their capital, and so they ought to be less risk averse in their approach to maximizing their harvest from the game. The fact that there is some cap on the amount the subject can win should also modify the optimal strategy...In our game, the Kelly criterion would tell the subject to bet 20% ($(2 \cdot 0.6) - 1$) of his account on heads on each flip. So, the first bet would be \$5 (20% of \$25) on heads, and if he won, then he'd bet \$6 on heads (20% of \$30), but if he lost, he'd bet \$4 on heads (20% of \$20), and so on.
>
> ...If the subject rightly assumed we wouldn't be offering a cap of more than \$1,000 per player, then a reasonable heuristic would be to bet a constant proportion of one's bank using a fraction less than the Kelly criterion, and if and when the cap is discovered, reducing the betting fraction further depending on betting time remaining to glide in safely to the maximum payout. For example, betting 10% or 15% of one's account may have been a sound starting strategy. We ran simulations on the probability of hitting the cap if the subject bet a fixed proportion of wealth of 10%, 15% and 20%, and stopping when the cap was exceeded with a successful bet. We found there to be a 95% probability that the subjects would reach the \$250 cap following any of those constant proportion betting strategies, and so the expected value of the game as it was presented (with the \$250 cap) would be just under \$240. However, if they bet 5% or 40% of their bank on each flip, the probability of exceeding the cap goes down to about 70%.
This game is interesting as a test case because it is just easy enough to solve exactly on a standard computer in various ways, but also hard enough to defeat naive humans and be nontrivial.
## Subjects' performance
Despite the Kelly criterion being well-known and fairly intuitive, and the game being very generous, participants did not perform well:
> The sample was largely comprised of college age students in economics and finance and young professionals at finance firms. We had 14 analyst and associate level employees at two leading asset management firms. The sample consisted of 49 males and 12 females. Our prior was that these participants should have been well prepared to play a simple game with a defined positive expected value...Only 21% of participants reached the maximum payout of \$250,^7^ well below the 95% that should have reached it given a simple constant percentage betting strategy of anywhere from 10% to 20%.^8^ We were surprised that one third of the participants wound up with less money in their account than they started with. More astounding still is the fact that 28% of participants went bust and received no payout. That a game of flipping coins with an ex-ante 60/40 winning probability produced so many subjects that lost everything is startling. The average ending bankroll of those who did not reach the maximum and who also did not go bust, which represented 51% of the sample, was \$75. While this was a tripling of their initial \$25 stake, it still represents a very sub-optimal outcome given the opportunity presented. The average payout across all subjects was \$91, letting the authors off the hook relative to the \$250 per person they'd have had to pay out had all the subjects played well.
This is troubling because the problem is so well-defined and favorable to the players, and can be seen as a microcosm of the difficulties people experience in rational betting.
(While it's true that human subjects typically perform badly initially in games like the iterated prisoner's dilemma and need time to learn, it's also true that humans only have one life to learn stock market investment during, and these subjects all should've been well-prepared to play.)
Instead of expected earnings of ~\$240, the players earned \$91 - forfeiting \$149.
However, if anything, the authors understate the underperformance, because as they correctly note, the Kelly criterion is not guaranteed to be optimal in this problem due to the potential for different utility functions (what if we simply want to maximize expected wealth, not log wealth?), the fixed number of bets & the ceiling, as the Kelly criterion tends to assume that wealth can increase without limit & there is an indefinite time horizon.
# Optimality in the coin-flipping MDP
Indeed, we can see with a simple example that KC is suboptimal in terms of maximizing expected value: what if we are given only 1 bet (_b_=1) to use our \$25 on?
If we bet 20% (or less) per the KC, then
$$0.6 \cdot (25+5) + 0.4 \cdot (25-5)= 26$$
But if we bet everything:
$$0.6 \cdot (25+25) + 0.4 \cdot (25-25) = 30$$
It's true that 40% of the time, we go bankrupt and so we couldn't play again... but there are no more plays in _b_=1 so avoiding bankruptcy boots nothing.
We can treat this coin-flipping game as a tree-structured [Markov decision process](!Wikipedia).
For more possible bets, the value of a bet of a particular amount given a wealth _w_ and bets remaining _b_-1 will recursively depend on the best strategy for the two possible outcomes (weighted by probability), giving us a Bellman value equation to solve like:
$$V(w,b) = \max_{x \in [0,w]}[0.6 \cdot V(\min(w+x, 250), b-1) + 0.4 \cdot V(w-x, b-1) ]$$
To solve this equation, we can explore all possible sequences of bets and outcomes to a termination condition and reward, and then work use backwards induction, defining a decision tree which can be (reasonably) efficiently computed using memoization/[dynamic programming](!Wikipedia).
Given the problem setup, we can note a few things about the optimal strategy:
1. if the wealth ever reaches \$0, the game has effectively ended regardless of how many bets remain, because betting \$0 is the only possibility and it always returns \$0
2. similarly, if the wealth ever reaches the upper bound of \$250, the optimal strategy will effectively end the game by always betting \$0 after that regardless of how many bets remain, since it can't do better than \$250 and can only do worse.
These two shortcuts will make the tree *much* easier to evaluate because many possible sequences of bet amounts & outcomes will quickly hit \$0 or \$250 and require no further exploration.
3. a state with more bets is always of equal or better value than fewer
3. a state with more wealth is always equal or better value than less
3. the value of 0 bets is the current wealth
4. the value of 1 bet depends on the ceiling and current wealth: whether \$250 is >2x current wealth. If the ceiling more than twice current wealth, the optimal strategy with 1 bet left is to bet everything, since that has highest EV and there's no more bets to worry about going bankrupt & missing out on.
# Implementation of Game
A Python implementation is available in [OpenAI Gym](https://github.com/openai/gym):
~~~{.Python}
import gym
env = gym.make('KellyCoinflip-v0')
env.reset()
env._step(env.wealth*100*0.2) # bet with 20% KC
# ...
env._reset() # end of game, start a new one, etc
~~~
## Decision tree
We can write down the value function as a mutual recursion: `f` calculates the expected value of the current step, and calls `V` to estimate the value of future steps; `V` checks for the termination conditions _w_=\$0/\$250 & _b_=0 returning current wealth as the final payoff, and if not, calls `f` on every possible action to estimate their value and returns the max...
This mutual recursion bottoms out at the termination conditions.
As defined, this is grossly inefficient as every node in the decision tree will be recalculated many times despite yielding identical deterministic, referentially transparent results.
So we need to memoize results if we want to evaluate much beyond _b_=5.
### Approximate Value function
Implemented in R:
~~~{.R}
# devtools::install_github("hadley/memoise")
library(memoise)
f <- function(x, w, b) { 0.6*mV(min(w+x,250), b-1) + 0.4*mV(w-x, b-1)}
mf <- memoise(f)
V <- function(w,b) {
returns <- if (b>0 && w>0 && w<250) { sapply(seq(0, w, by=0.1), function(x) { mf(x,w,b) }) } else { w }
max(returns) }
mV <- memoise(V)
## sanity checks:
mV(25, 0) == 25 && mV(25, 1) == 30 && mV(0, 300) == 0
~~~
Given our memoized value function `mV` we can now take a look at expected value of optimal betting for bets _b_ from 0 to 300 with the fixed starting payroll of \$25.
Memoization is no panacea, and R/[`memoise`](https://cran.r-project.org/web/packages/memoise/index.html) is slow enough that I am forced to make one change to the betting: instead of allowing bets in penny/\$0.01 increments, I approximate by only allowing bets in \$1 increments, to reduce the branching factor to a maximum of 250 possible choices each round rather than 25000 choices.
(Some comparisons with decipenny and penny trees suggests that this makes little difference since early on the bet amounts are identical and later on being able to make <\$1 adjustments to bets yield small gains relative to total wealth; see the Haskell implementation.)
Of course, even with memoization and reducing the branching factor, it's still difficult to compute the value of the optimal strategy all the way to _b_=300 because the exponentially increasing number of strategies still need to be computed at least once.
With this R implementation, trees for up to _b_=150 can be computed with 4GB RAM & ~16h.
The value function for increasing bets:
~~~{.R}
vs <- sapply(1:150, function(b) { round(mV(25, b), digits=1) }); vs
# [1] 30.0 36.0 43.2 45.4 48.0 53.7 54.6 57.6 61.8 62.6 66.8 68.2 70.6 74.1 75.0 79.1 79.7 82.5 84.7 86.2 89.8 90.3 93.8
# [24] 94.5 97.0 98.8 100.3 103.2 103.9 107.2 107.6 110.3 111.4 113.3 115.2 116.4 119.0 119.6 122.4 122.9 125.3 126.2 128.1 129.5 130.8 132.8
# [47] 133.6 136.0 136.5 138.8 139.4 141.4 142.3 143.8 145.2 146.3 148.0 148.8 150.6 151.3 153.1 153.8 155.5 156.3 157.7 158.8 159.9 161.2 162.1
# [70] 163.6 164.2 165.8 166.4 167.9 168.5 169.9 170.7 171.8 172.8 173.7 174.8 175.6 176.8 177.5 178.7 179.3 180.5 181.1 182.3 182.9 184.0 184.7
# [93] 185.6 186.4 187.2 188.1 188.8 189.8 190.3 191.3 191.9 192.8 193.4 194.3 194.9 195.7 196.3 197.1 197.8 198.4 199.1 199.7 200.5 201.0 201.8
# [116] 202.3 203.0 203.5 204.3 204.8 205.4 206.0 206.6 207.1 207.7 208.3 208.8 209.4 209.9 210.5 210.9 211.5 211.9 212.5 213.0 213.5 213.9 214.5
# [139] 214.9 215.4 215.8 216.3 216.8 217.2 217.6 218.0 218.5 218.9 219.3 219.7
~~~
So as the number of bets escalates, our expected payoff increases fairly quickly and we can get very close to the ceiling of \$250 with canny betting.
#### Monte Carlo tree evaluation
If we do not have enough RAM to expand the full decision tree to its terminal nodes, there are many ways to approximate it.
A simple one is to expand the tree as many levels far down as possible, and then if a terminal node is reached, do exact backwards induction, otherwise, at each pseudo-terminal node, approximate the value function somehow and then do backwards induction as usual.
The deeper the depth, the closer the approximation becomes to the exact value, while still doing optimal planning within the horizon.
One way to approximate it would be to run a large number of simulations (perhaps 100) taking random actions until a terminal node is hit, and take the mean of the total values as the estimate.
This would be a Monte Carlo tree evaluation.
(This forms the conceptual basis of the famous MCTS/[Monte Carlo tree search](!Wikipedia).)
A random policy is handy because it can be used anywhere, but here we already know a good heuristic which does better than random: the KC. So we can use that instead.
This gives us as much optimality as we can afford.
Setting up a simulation of the coin-flipping game which can be played with various strategies:
~~~{.R}
game <- function(strategy, wealth, betsLeft) {
if (betsLeft>0) {
bet <- strategy(wealth, betsLeft)
wealth <- wealth - bet
flip <- rbinom(1,1,p=0.6)
winnings <- 2*bet*flip
wealth <- min(wealth+winnings, 250)
return(game(strategy, wealth, betsLeft-1)) } else { return(wealth); } }
simulateGame <- function(s, w=25, b=300, iters=100) { mean(replicate(iters, game(s, w, b))) }
kelly <- function(w, b) { 0.20 * w }
smarterKelly <- function(w, b) { if(w==250) {0} else { (2*0.6-1) * w } }
random <- function(w, b) { sample(seq(0, w, by=0.1), 1) }
evaluate <- function(w,b) { simulateGame(smarterKelly, w, b) }
mevaluate <- memoise(evaluate)
mevaluate(25, 10)
# [1] 35.30544906
mevaluate(25, 100)
# [1] 159.385275
mevaluate(25, 300)
# [1] 231.4763619
~~~
As expected the KC can do well and is just as fast to compute as a random action, so using it will give much better estimates of the value function for free.
With the Monte Carlo value function set up, the original value function can be slightly modified to include a maximum depth parameter and to evaluate using the MC value function instead once that maximum depth is hit:
~~~{.R}
f <- function(x, w, b, maxDepth) { 0.6*mV(min(w+x,250), b-1, maxDepth) + 0.4*mV(w-x, b-1, maxDepth)}
mf <- memoise(f)
V <- function(w,b, maxDepth) {
if (b<=(b-maxDepth)) { mevaluate(w,b) } else {
returns <- if (b>0 && w>0 && w<250) { sapply(seq(0, w, by=0.1), function(x) { mf(x,w,b, maxDepth) }) } else { w }
max(returns) }}
mV <- memoise(V)
mV(25, 300, 100); mV(25, 300, 50)
~~~
### Optimal next action
One is also curious what the optimal strategy *is*, not just how much we can make with the optimal strategy given a particular starting point.
As defined, I can't see how to make `mV` return the optimal choices along the way, since it has to explore all choices bottom-up and can't know what is optimal until it has popped all the way back to the root, and state can't be threaded in because that defeats the memoization.
But then I remembered the point of computing value functions: given the (memoized) value function for each state, we can then plan by simply using `V` in a greedy planning algorithm by asking, at each step, for the value of each possible action (which is memoized, so it's fast), and returning/choosing the action with the maximum value (which takes into account all the downstream effects, including our use of `V` at each future choice):
~~~{.R}
VPplan <- function(w, b) {
if (b==0) { return (0); } else {
returns <- sapply(seq(0, w), function(wp) { mf(wp, w, b); })
return (which.max(returns)-1); }
}
mVPplan <- memoise(VPplan)
## sanity checks:
mVPplan(250, 0)
# [1] 0
mVPplan(250, 1)
# [1] 0
mVPplan(25, 3)
# [1] 25
~~~
It's interesting that when _b_ is very small, we want to bet everything on our first bet, because there's not enough time to bother with recovering from losses; but as we increase _b_, the first bet will shrink & become more conservative:
~~~{.R}
firstAction <- sapply(1:150, function(b) { mVPplan(25, b) }); firstAction
# [1] 25 25 25 10 10 18 17 18 14 14 14 9 10 11 7 11 7 10 10 10 9 6 9 7 9 8 9 8 7 8 7 8 7 8 7 8 7 7 7 7 7 6 7 6 7 6 7
# [48] 6 6 6 6 6 6 6 6 6 6 6 6 5 6 5 6 5 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 5 4 5 4 5 4
# [95] 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3
# [142] 4 3 4 3 4 3 3 3 3
~~~
### Optimizing
The foregoing is interesting but the R implementation is too slow to examine the case of most importance: _b_=300.
The issue is not so much the intrinsic difficulty of the problem - since the R implementation's RAM usage is moderate even at _b_=150, indicating that the boundary conditions do indeed tame the exponential growth & turn it into something more like quadratic growth - but the slowness of computing.
Profiling suggests that most of the time is spent inside `memoise`:
~~~{.R}
# redefine to clear out any existing caching:
forget(mV)
Rprof(memory.profiling=TRUE, gc.profiling=TRUE, line.profiling=TRUE)
mV(w=25, b=9)
Rprof(NULL)
summaryRprof()
~~~
Most of the time is spent in functions which appear nowhere in our R code, like `lapply` or `deparse`, which points to the behind-the-scenes `memoise`.
Memoization *should* be fast because the decision tree can be represented as nothing but a multidimensional array in which one does lookups for each combination of _w_/_b_ for a stored value _V_, and array lookups ought to be near-instantaneous.
But apparently there is enough overhead to dominate our decision tree's computation.
R does not provide hash tables, and there don't seem to be any good libraries for this; what one can do is use R's "environments", which use hash tables under the hood, but are restricted to string keys (yes, really) as a poor man's hash table by using `digest` to serialize & hash R objects into string keys/
The backwards induction remains the same and the hash-table can be updated incrementally without any problem (useful for switching to alternative methods like MCTS), so the rewrite is easy:
~~~{.R}
## crude hash table
library(digest)
d <- function (o) { digest(o, algo="xxhash32") }
lookup <- function(key) { get(d(key), tree) }
set <- function(key,value) { assign(envir=tree, d(key), value) }
isSet <- function(key) { exists(d(key), envir=tree) }
## example:
tree <- new.env(size=1000000, parent=emptyenv())
set(c(25,300), 50); lookup(c(25, 300))
# [1] 50
tree <- new.env(size=1000000, parent=emptyenv())
f <- function(x, w, b) {
0.6*V(min(w+x,250), b-1) + 0.4*V(w-x, b-1) }
V <- function(w, b) {
if (isSet(c(w,b))) { return(lookup(c(w,b))) } else {
returns <- if (b>0 && w>0 && w<250) { sapply(seq(0, w, by=0.1), function(x) { f(x,w,b) }) } else { return(w) }
set(c(w,b), max(returns))
return(lookup(c(w,b)))
}
}
V(25,5)
~~~
While the overhead is not as bad as `memoise`, performance is still not great.
#### Python
Unmemoized version (a little different due to floating point):
~~~{.Python}
import numpy as np # for enumerating bet amounts as a float range
def F(w, x, b):
return 0.6*V(min(w+x,250), b-1) + 0.4*V(w-x, b-1)
def V(w, b):
if b>0 and w>0 and w<250:
bets = np.arange(0, w, 0.1).tolist()
returns = [F(w,x,b) for x in bets]
else:
returns = [w]
return max(returns)
V(25,0)
V(25,0)
# 25
V(25,1)
# 29.98
V(25,2)
# 35.967999999999996
V(25,3)
~~~
Python has hash-tables built in as dicts, so one can use those:
~~~{.Python}
import numpy as np
def F(w, x, b, dt):
return 0.6*V(min(w+x,250), b-1, dt) + 0.4*V(w-x, b-1, dt)
def V(w, b, dt):
if b>0 and w>0 and w<250:
if (w,b) in dict:
return dict[(w,b)]
else:
bets = np.arange(0, w, 0.1).tolist()
returns = [F(w,x,b,dt) for x in bets]
else:
returns = [w]
best = max(returns)
dict[(w,b)] = best
return best
dict = {}
V(25, 0, dict)
# 25
~~~
#### Haskell
Gurkenglas provides an inscrutably elegant & fast version of the value function which works in GHCi:
~~~{.Haskell}
(!!25) $ (!!300) $ (`iterate` [0..250]) $ \nextutilities -> (0:) $ (++[250]) $ tail $ init $ map maximum $
(zipWith . zipWith) (\up down -> 0.6 * up + 0.4 * down) (tail $ tails nextutilities) (map reverse $ inits nextutilities)
-- 246.2080494949234
-- it :: (Enum a, Fractional a, Ord a) => a
-- (7.28 secs, 9,025,766,704 bytes)
~~~
Noting:
> `iterate f x = x : iterate f (f x)` doesn't need to think much about memoization. From right to left, the line reads "Given how much each money amount is worth when _n_ games are left, you can zip together that list's prefixes and suffixes to get a list of options for each money amount at (_n_+1) games left. Use the best for each, computing the new utility as a weighted average of the previous. At 0 and 250 we have no options, so we must manually prescribe utilities of 0 and 250. Inductively use this to compute utility at each time from knowing that at the end, money is worth itself. Look at the utilites at 300 games left. Look at the utility of \$25." Inductively generating that whole list prescribes laziness.
Since it's Haskell, we can switch to compiled code and use fast arrays.
Implementations in Haskell have been written by Gurkenglas & nshepperd using [`array-memoize`](https://hackage.haskell.org/package/array-memoize) & [`Data.Vector`](https://hackage.haskell.org/package/vector) (respectively):
~~~{.Haskell}
import System.Environment (getArgs)
import Data.Function (iterate)
import Data.Function.ArrayMemoize (arrayMemo)
type Wealth = Int
type EV = Double
cap :: Wealth
cap = 250
value :: [Wealth -> EV]
value = iterate f fromIntegral where
f next = arrayMemo (0, cap) $ \x -> maximum [go w | w <- [0..min (cap - x) x]] where
go w = 0.6 * next (min cap (x+w)) + 0.4 * next (x-w)
main :: IO ()
main = do [x, b] <- map read <$> getArgs
print $ value !! b $ x
~~~
The second one is more low-level and uses laziness/"tying the knot" to implement the memoization; it is the fastest and is able to evaluate `memo 25 300` in <12s: the value turns out to be \$246.
(If we assume that all games either end in \$250 or \$0, then this implies that $\frac{246}{250} = 0.984$ or >98.4% of decision-tree games hit the max, as compared to Haghani & Dewey's estimate that ~95% of KC players would.)
The memoization is done internally, so to access all values we do more logic within the lexical scope.
~~~{.Haskell}
import System.Environment (getArgs)
import Data.Function (fix)
import Data.Function.Memoize (memoFix2)
import Data.Vector (Vector)
import qualified Data.Vector as V (generate, (!))
import qualified Data.MemoUgly as U (memo)
type Wealth = Int
type Bets = Int
type EV = Double
cap :: Wealth
cap = 250
value :: (Wealth -> Bets -> EV) -> Wealth -> Bets -> EV
value next 0 b = 0
value next w 0 = fromIntegral w
value next w b = maximum [go x | x <- [0..min (cap - w) w]]
where
go x = 0.6 * next (min cap (w+x)) (b-1) + 0.4 * next (w-x) (b-1)
-- There are 4 possible ways aside from 'array-memoize':
-- 1. Direct recursion.
direct :: Wealth -> Bets -> EV
direct = fix value
-- 2. Memoised recursion.
memo :: Wealth -> Bets -> EV
memo w b = cached_value w b
where
cached_value w b = table V.! b V.! w
table :: Vector (Vector EV)
table = V.generate (b + 1)
(\b -> V.generate (cap + 1)
(\w -> value cached_value w b))
-- 3. Memoised recursion using 'memoize' library; slower.
memo2 :: Wealth -> Bets -> EV
memo2 = memoFix2 value
-- 4. Memoize using 'memoize-ugly' library; also slower but global
memo3 :: Wealth -> Bets -> EV
memo3 = U.memo . direct
main :: IO ()
main = do [w, b] <- map read <$> getArgs
print (memo w b)
~~~
We can obtain all values with a function like
~~~{.Haskell}
memo3 :: [Wealth] -> [Bets] -> [EV]
memo3 ws bs = zipWith cached_value ws bs
where
cached_value w b = table V.! b V.! w
table :: Vector (Vector EV)
table = V.generate (maximum bs + 1)
(\b -> V.generate (cap + 1)
(\w -> value cached_value w b))
~~~
and evaluate:
~~~{.Haskell}
λ> map round (memo3 (repeat 25) [1..300])
-- [30,36,43,45,48,54,55,58,62,63,67,68,71,74,75,79,80,82,85,86,90,90,94,94,97,99,100,
-- 103,104,107,108,110,111,113,115,116,119,120,122,123,125,126,128,130,131,133,134,
-- 136,136,139,139,141,142,144,145,146,148,149,151,151,153,154,155,156,158,159,160,
-- 161,162,164,164,166,166,168,169,170,171,172,173,174,175,176,177,177,179,179,181,
-- 181,182,183,184,185,186,186,187,188,189,190,190,191,192,193,193,194,195,196,196,
-- 197,198,198,199,200,200,201,202,202,203,204,204,205,205,206,207,207,208,208,209,
-- 209,210,210,211,212,212,213,213,214,214,214,215,215,216,216,217,217,218,218,219,
-- 219,219,220,220,221,221,221,222,222,222,223,223,224,224,224,225,225,225,226,226,
-- 226,227,227,227,228,228,228,228,229,229,229,230,230,230,230,231,231,231,231,232,
-- 232,232,232,233,233,233,233,234,234,234,234,234,235,235,235,235,236,236,236,236,
-- 236,236,237,237,237,237,237,238,238,238,238,238,238,239,239,239,239,239,239,239,
-- 240,240,240,240,240,240,240,241,241,241,241,241,241,241,241,242,242,242,242,242,
-- 242,242,242,242,243,243,243,243,243,243,243,243,243,243,244,244,244,244,244,244,
-- 244,244,244,244,244,244,245,245,245,245,245,245,245,245,245,245,245,245,245,245,
-- 246,246,246,246,246,246,246,246,246,246,246,246,246]
~~~
### Exact value function
We could go even further with `zip [1..3000] (memo3 (repeat 25) [1..3000])` (which takes ~118s).
Interestingly, the value stops increasing around _b_=2150 (_V_=249.99009946798637) and simply repeats from there; it does not actually reach 250.
This is probably because with a stake of \$25 it is possible to go bankrupt ([gambler's ruin](!Wikipedia)) even betting the minimal fixed amount of \$1.
(The original Haskell code is modeled after the R, which discretized this way for tractability; however, since the Haskell code is so fast, this is now unnecessary.)
If this is the case, then being able to bet smaller amounts should allow expected value to converge to \$250 as exactly as can be computed, because the probability of going bankrupt after 2500 bets of \$0.01 each is effectively zero - as far as R and Haskell will compute without special measures, $0.4^{2500}=0$.
We can go back and model the problem exactly as it was in the paper by simply multiply everything by 100 & interpreting in pennies:
~~~{.Haskell}
cap = 25000 -- 250*100
-- ...
λ> zip [1..3000] (memo3 (repeat (25*100)) [1..3000])
[(1,3000.0),(2,3600.0),(3,4320.0),(4,4536.000000000002),(5,4795.200000000003),
(6,5365.440000000002),(7,5458.752000000004),(8,5757.350400000005),(9,6182.853120000005),(10,6260.488704000007),
(11,6676.137676800007),(12,6822.331453440009),(13,7060.11133132801),(14,7413.298296422411),(15,7499.673019514893),
(16,7913.219095166989),(17,7974.777338678492),(18,8250.07680018581),(19,8471.445742115113),(20,8625.387566014524),
(21,8979.7747405993),(22,9028.029532170909),(23,9384.523360172316),(24,9449.401095461177),(25,9699.670042282964),
(26,9882.821122181414),(27,10038.471393315525),(28,10323.078038637072),(29,10394.862038743217),(30,10760.507349554608),
(31,10763.85850433795),(32,11040.414570571116),(33,11141.364854687306),(34,11337.607662912955),(35,11524.013477359924),
(36,11648.117264906417),(37,11909.035644688667),(38,11968.56153182668),(39,12294.156320547045),(40,12296.062638839443),
(41,12554.172237535688),(42,12628.174503649356),(43,12820.701630209007),(44,12962.820770637838),(45,13095.979243218648),
(46,13298.241837094192),(47,13377.799870378874),(48,13632.949691989288),(49,13664.245374230446),(50,13935.591599024072),
(51,13953.650302164471),(52,14168.318904539414),(53,14244.57145315508),(54,14407.574739209653),(55,14535.761032800987),
(56,14651.76990919319),(57,14826.143029982914),(58,14899.500923102909),(59,15114.7924517843),(60,15149.530755496799),
(61,15399.983335449066),(62,15400.771450460255),(63,15604.908915203496),(64,15652.268440952908),(65,15813.476931576255),
(66,15903.186446379666),(67,16025.159151461572),(68,16152.79680966944),(69,16238.976728976708),(70,16400.466139297678),
(71,16454.05880056233),(72,16645.646128447275),(73,16669.63235112316),(74,16880.735724497223),(75,16885.012958972715),
(76,17058.916142797243),(77,17099.596358241968),(78,17239.339510457277),(79,17312.850756552616),(80,17421.275332110683),
(81,17524.309847338853),(82,17604.068867939182),(83,17733.56645905442),(84,17787.13463287746),(85,17940.266786525535),
(86,17969.95038247786),(87,18144.105153480315),(88,18152.051576292564),(89,18315.336828409476),(90,18333.026286703964),
(91,18467.997509430665),(92,18512.510521892633),(93,18621.43451689773),(94,18690.183932686265),(95,18775.180977557586),
(96,18865.765874427907),(97,18928.817685066606),(98,19039.0117965317),(99,19081.969208988696),(100,19209.70993405155),
(101,19234.300290845116),(102,19377.678277281033),(103,19385.51250899971),(104,19524.210291319585),(105,19535.341194745335),
(106,19651.63659936918),(107,19683.55258264968),(108,19779.19175398216),(109,19829.94117896368),(110,19906.559349360658),
(111,19974.327332744302),(112,20033.4549154762),(113,20116.554995203987),(114,20159.623419373274),(115,20256.489653646044),
(116,20284.836941737507),(117,20394.01642719967),(118,20408.892517902153),(119,20529.038285622184),(120,20531.61013289084),
(121,20639.865300258658),(122,20652.830860566657),(123,20744.14900653099),(124,20772.415137446053),(125,20848.093322557295),
(126,20890.24116222975),(127,20951.49506668366),(128,21006.203412595118),(129,21054.171837601196),(130,21120.21127127963),
(131,21155.960430614512),(132,21232.187753960352),(133,21256.71536121999),(134,21342.06833189189),(135,21356.30748944987),
(136,21449.7998427048),(137,21454.622738748447),(138,21545.893033861896),(139,21551.56090345761),(140,21629.60553083633),
(141,21647.034539300425),(142,21712.842497605183),(143,21740.96793155338),(144,21795.467123757197),(145,21833.2961358927),
(146,21877.356736609778),(147,21923.964087187007),(148,21958.401746908596),(149,22012.92577178374),(150,22038.504663866137),
(151,22100.143459100647),(152,22117.579175397976),(153,22185.586988586107),(154,22195.549289624447),(155,22269.23310835155),
(156,22272.348533907876),(157,22343.200726693787),(158,22347.9192078922),(159,22408.92268721723),(160,22422.211687202966),
(161,22474.09110290414),(162,22495.183774650206),(163,22538.619509420547),(164,22566.800095953415),(165,22602.430698828903),
(166,22637.031537177958),(167,22665.456036220217),(168,22705.854721234122),(169,22727.634820414445),(170,22773.2515209447),
(171,22788.913686133077),(172,22839.208606334214),(173,22849.246045182404),(174,22903.71702393249),(175,22908.591564315902),
(176,22966.44249475668),(177,22966.915677569094),(178,23017.91541566918),(179,23024.18913098028),(180,23068.1913376925),
(181,23080.387557725626),(182,23117.90218718125),(183,23135.491081806776),(184,23166.99717629234),(185,23189.48394853381),
(186,23215.431334191013),(187,23242.35418014611),(188,23263.165077204263),(189,23294.093255008785),(190,23310.163806371635),
(191,23344.695808912184),(192,23356.397530793947),(193,23394.159357087876),(194,23401.840515265318),(195,23442.484035635396),
(196,23446.47095075504),(197,23489.640293583914),(198,23490.270646383346),(199,23529.46834312825),(200,23533.224741609163),
(201,23567.312697440484),(202,23575.321437418483),(203,23604.666095269786),(204,23616.551745369172),(205,23641.497567737664),
(206,23656.90925341186),(207,23677.779921209272),(208,23696.38990746728),(209,23713.489458166325),(210,23734.991807798215),
(211,23748.60571568506),(212,23772.71501926872),(213,23783.111220502426),(214,23809.56139463541),(215,23816.991259707916),
(216,23845.534410064567),(217,23850.233666149958),(218,23880.639012115425),(219,23882.82861769509),(220,23914.391375329913),
(221,23914.76844952492),(222,23942.714790600083),(223,23946.0474787004),(224,23970.438187949254),(225,23976.66184026535),
(226,23997.74934793851),(227,24006.60933420138),(228,24024.631040582935),(229,24035.889282584474),(230,24051.068374275732),
(231,24064.50239633),(232,24077.048621078735),(233,24092.450650947027),(234,24102.561052984205),(235,24119.737170755707),
(236,24127.59678851838),(237,24146.366121052237),(238,24152.148649090945),(239,24172.342607735227),(240,24176.211024526496),
(241,24197.672583935127),(242,24199.779747244538),(243,24222.35471729034),(244,24222.851974583486),(245,24243.93438158678),
(246,24245.42607879143),(247,24263.972996501543),(248,24267.50154423283),(249,24283.6900658947),(250,24289.078871384656),
(251,24303.07565677869),(252,24310.159487219513),(253,24322.121319078917),(254,24330.74566159496),(255,24340.819980440137),
(256,24350.840429290023),(257,24359.165841053073),(258,24370.447517349417),(259,24377.154275094566),(260,24389.571277415347),
(261,24394.781738403053),(262,24408.216622744527),(263,24412.045682031203),(264,24426.388969625325),(265,24428.94447133704),
(266,24444.09418292577),(267,24445.477310292874),(268,24461.32346887292),(269,24461.644170708976),(270,24476.42136455785),
(271,24477.445726085083),(272,24490.501181704694),(273,24492.883289818776),(274,24504.335944986862),(275,24507.958757514363),
(276,24517.920620087607),(277,24522.6745531501),(278,24531.251041416166),(279,24537.03357887482),(280,24544.32387659046),
(281,24551.039168217816),(282,24557.136561611253),(283,24564.69504250772),(284,24569.687240115927),(285,24578.005270307505),
(286,24581.974706479406),(287,24590.97422968356),(288,24593.998352542163),(289,24603.606573136858),(290,24605.75811775705),
(291,24615.907195034095),(292,24617.254442557904),(293,24627.881197714756),(294,24628.488224763427),(295,24639.274079225856),
(296,24639.46077884001),(297,24649.128469095107),(298,24650.173797856685),(299,24658.72751316172),(300,24660.629317974468)
...
(1884,24999.999999999614),(1885,24999.99999999963),(1886,24999.999999999643),(1887,24999.999999999658),(1888,24999.99999999967),
(1889,24999.99999999968),(1890,24999.999999999694),(1891,24999.9999999997),(1892,24999.999999999716),(1893,24999.999999999727),
(1894,24999.999999999738),(1895,24999.99999999975),(1896,24999.99999999976),(1897,24999.999999999767),(1898,24999.99999999978),
(1899,24999.99999999979),(1900,24999.9999999998),(1901,24999.99999999981),(1902,24999.999999999818),(1903,24999.999999999825),
(1904,24999.999999999833),(1905,24999.999999999844),(1906,24999.999999999854),(1907,24999.99999999986),(1908,24999.99999999987),
(1909,24999.999999999876),(1910,24999.999999999884),(1911,24999.99999999989),(1912,24999.999999999898),(1913,24999.999999999905),
(1914,24999.999999999913),(1915,24999.99999999992),(1916,24999.999999999924),(1917,24999.999999999927),(1918,24999.999999999935),
(1919,24999.99999999994),(1920,24999.99999999995),(1921,24999.99999999995),(1922,24999.999999999956),(1923,24999.999999999964),
(1924,24999.999999999967),(1925,24999.99999999997),(1926,24999.999999999978),(1927,24999.999999999978),(1928,24999.999999999985),
(1929,24999.99999999999),(1930,24999.999999999993),(1931,24999.999999999993),(1932,25000.0),(1933,25000.0),
(1934,25000.0),(1935,25000.0),(1936,25000.0),(1937,25000.0),(1938,25000.0),
(1939,25000.0),(1940,25000.0),(1941,25000.0),(1942,25000.0),(1943,25000.0),
(1944,25000.0),(1945,25000.0),(1946,25000.0),(1947,25000.0),(1948,25000.0),
(1949,25000) ... ]
~~~~
With the exact penny version, the value of _b_=300 is now \$246.6062932 (taking ~3746s compiled), so the approximation costs an expected value of only \$0.61.
The full run reveals that convergence happens at _b_=1932 (which can be computed in ~48503s compiled), past which there is no need to compute further.
#### C/C++
[FeepingCreature](https://github.com/FeepingCreature) provides a vectorized C implementation of the exact penny game that runs in a few minutes (benefiting ~13% from profile-guided optimization when compiled with GCC).
He also notes that memoization/dynamic programming is overkill for this problem: because almost all states ultimately get evaluated, the dynamic programming doesn't avoid evaluating many states.
Since each state depends on an immediate successor and forms a clean tree, it is possible to calculate the tree much more directly by evaluating all the terminal nodes _b_=0, looping over the previous nodes _b_+1, then throwing away the no longer necessary _b_, and evaluating _b_+2, and so on; only two layers of the tree need to be stored at any time, which can potentially fit into on-processor cache & exhibit far better cache locality.
Another advantage of this representation is parallelism, as each node in the upper layer can be evaluated independently.
An implementation using OpenMP for threading can solve _b_=300 in ~8s:
~~~{.C}
#include
#include
#include
#include
#define ROUND_LIMIT 300
#define MONEY_LIMIT 25000
#define MONEY_START 2500
#define MONEY_FACTOR 100.0f
// #define DATATYPE float
// #define F(X) X ## f
#define DATATYPE double
#define F(X) X
#define THREADS 8
// MEMOIZED METHOD
typedef struct {
bool known;
DATATYPE ev;
} CacheEntry;
static DATATYPE decide(CacheEntry * const table, int money, int round) {
if (round < 0) return money;
if (money == 0 || money == MONEY_LIMIT) return money;
int index = round * (MONEY_LIMIT + 1) + money;
// int index = money * ROUND_LIMIT + round;
if (table[index].known) return table[index].ev;
DATATYPE best_bet_score = -1;
for (int bet = 0; bet <= money; bet++) {
DATATYPE winres = decide(table, (money + bet > MONEY_LIMIT) ? MONEY_LIMIT : (money + bet), round - 1);
DATATYPE loseres = decide(table, money - bet, round - 1);
DATATYPE combined = F(0.6) * winres + F(0.4) * loseres;
best_bet_score = F(fmax)(best_bet_score, combined);
}
table[index] = (CacheEntry) { true, best_bet_score };
return best_bet_score;
}
void method1() {
CacheEntry *table = calloc(sizeof(CacheEntry), ROUND_LIMIT * (MONEY_LIMIT + 1));
printf("running.\n");
int best_bet = 0;
DATATYPE best_bet_score = -1;
for (int bet = 0; bet <= MONEY_START; bet++) {
DATATYPE winres = decide(table, MONEY_START + bet, ROUND_LIMIT - 1);
DATATYPE loseres = decide(table, MONEY_START - bet, ROUND_LIMIT - 1);
DATATYPE combined = F(0.6) * winres + F(0.4) * loseres;
if (combined > best_bet_score) {
best_bet = bet;
best_bet_score = combined;
}
}
printf("first round: bet %f for expected total reward of %f\n", best_bet / MONEY_FACTOR, best_bet_score / MONEY_FACTOR);
int count_known = 0;
for (int i = 0; i < ROUND_LIMIT * (MONEY_LIMIT + 1); i++) {
if (table[i].known) count_known ++;
}
printf("known: %i of %i\n", count_known, ROUND_LIMIT * (MONEY_LIMIT + 1));
}
// BOTTOM-UP METHOD
// given pass n in "from", compute pass n-1 in "to"
static void propagate(DATATYPE *from, DATATYPE *to) {
// for each money state...
#pragma omp parallel for num_threads(THREADS)
for (int a = 0; a < THREADS; a++) {
// distribute workload so that inner loop passes are approximately equal
// f(x) = 2x F(x) = x^2 = a/THREADS x = sqrt(a / THREADS)
int low = (int) (MONEY_LIMIT * sqrtf((float)a / THREADS));
int high = (int) (MONEY_LIMIT * sqrtf((a + 1.0f) / THREADS));
if (a == THREADS - 1) high = MONEY_LIMIT + 1; // precise upper border
for (int m = low; m < high; m++) {
DATATYPE max_score = 0.0;
// determine the bet that maximizes our expected earnings
// to enable vectorization we break the loop into two so that each half only gets one branch of the conditional
// DATATYPE winval = from[(m + b > MONEY_LIMIT) ? MONEY_LIMIT : (m + b)];
int high_inner = (m + m > MONEY_LIMIT) ? (MONEY_LIMIT - m) : m;
for (int b = 0; b <= high_inner; b++) {
DATATYPE winval = from[m + b];
DATATYPE loseval = from[m - b];
DATATYPE combined = F(0.6) * winval + F(0.4) * loseval;
max_score = F(fmax)(max_score, combined);
}
for (int b = high_inner + 1; b <= m; b++) {
DATATYPE winval = from[MONEY_LIMIT];
DATATYPE loseval = from[m - b];
DATATYPE combined = F(0.6) * winval + F(0.4) * loseval;
max_score = F(fmax)(max_score, combined);
}
to[m] = max_score;
}
}
}
void method2() {
DATATYPE *buf = malloc(sizeof(DATATYPE) * (MONEY_LIMIT + 1));
DATATYPE *buf_to = malloc(sizeof(DATATYPE) * (MONEY_LIMIT + 1));
// set up base case: making no bet, we have money of i
for (int i = 0; i <= MONEY_LIMIT; i++) buf[i] = i;
for (int i = ROUND_LIMIT - 1; i >= 0; i--) {
propagate(buf, buf_to);
DATATYPE *temp = buf;
buf = buf_to;
buf_to = temp;
}
int best_bet = 0;
DATATYPE best_bet_score = -1;
for (int b = 0; b <= MONEY_START; b++) {
DATATYPE winval = buf[MONEY_START + b];
DATATYPE loseval = buf[MONEY_START - b];
DATATYPE combined = 0.6 * winval + 0.4 * loseval;
if (combined > best_bet_score) {
best_bet_score = combined;
best_bet = b;
}
}
printf("first round: bet %f for expected total reward of %f\n", best_bet / MONEY_FACTOR, best_bet_score / MONEY_FACTOR);
}
int main() {
method2();
return 0;
}
~~~
Some quick benchmarking on my 2016 Lenovo ThinkPad P70 laptop with 4 physical cores & 8 virtual cores (Intel Xeon E3-1505M v5 Processor 8MB Cache / up to 3.70GHz) shows a roughly linear speedup from 1 to 3 threads and then small gains thereafter to a minimum of 7.2s with 8 threads:
~~~{.Bash}
for C in `ls coingame-*.c`; do
gcc -Wall -Ofast -fwhole-program -march=native -fopenmp $C -o coingame -lm &&
echo "$C" &&
(for i in {1..10}; do
/usr/bin/time -f '%e' ./coingame; done
);
done
~~~
For levels of parallelism on my laptop from 1-8 threads, the average times are: 28.98s/15.44s/10.70s/9.22s/9.10s/8.4s/7.66s/7.20s.
Caio Oliveira provides [a parallel C++ implementation using `std::async`](https://gist.github.com/caioaao/84f174c03b8ab1d804bc3243bb95bdd4); compiled with the same optimizations on (`g++ -Ofast -fwhole-program -march=native -pthread --std=c++11 kc.cpp -o kc`), it runs in ~1m16s:
~~~{.Cpp}
#include
#include
#include