Skip to main content

Haskell directory


“The Kelly Coin-Flipping Game: Exact Solutions”, Branwen et al 2017

Coin-flip: “The Kelly Coin-Flipping Game: Exact Solutions”⁠, Gwern Branwen, Arthur B., nshepperd, FeepingCreature, Gurkenglas (2017-01-19; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Decision-theoretic analysis of how to optimally play Haghani & Dewey 2016’s 300-round double-or-nothing coin-flipping game with an edge and ceiling better than using the Kelly Criterion. Computing and following an exact decision tree increases earnings by $6.6 over a modified KC.

Haghani & Dewey 2016 experiment with a double-or-nothing coin-flipping game where the player starts with $30.4[^\$25.0^~2016~]{.supsub} and has an edge of 60%, and can play 300 times, choosing how much to bet each time, winning up to a maximum ceiling of $303.8[^\$250.0^~2016~]{.supsub}. Most of their subjects fail to play well, earning an average $110.6[^\$91.0^~2016~]{.supsub}, compared to Haghani & Dewey 2016’s heuristic benchmark of ~$291.6[^\$240.0^~2016~]{.supsub} in winnings achievable using a modified Kelly Criterion as their strategy. The KC, however, is not optimal for this problem as it ignores the ceiling and limited number of plays.

We solve the problem of the value of optimal play exactly by using decision trees & dynamic programming for calculating the value function, with implementations in R, Haskell⁠, and C. We also provide a closed-form exact value formula in R & Python, several approximations using Monte Carlo/​random forests⁠/​neural networks, visualizations of the value function, and a Python implementation of the game for the OpenAI Gym collection. We find that optimal play yields $246.61 on average (rather than ~$240), and so the human players actually earned only 36.8% of what was possible, losing $155.6 in potential profit. Comparing decision trees and the Kelly criterion for various horizons (bets left), the relative advantage of the decision tree strategy depends on the horizon: it is highest when the player can make few bets (at b = 23, with a difference of ~$36), and decreases with number of bets as more strategies hit the ceiling.

In the Kelly game, the maximum winnings, number of rounds, and edge are fixed; we describe a more difficult generalized version in which the 3 parameters are drawn from Pareto, normal, and beta distributions and are unknown to the player (who can use Bayesian inference to try to estimate them during play). Upper and lower bounds are estimated on the value of this game. In the variant of this game where subjects are not told the exact edge of 60%, a Bayesian decision tree approach shows that performance can closely approach that of the decision tree, with a penalty for 1 plausible prior of only $1. Two deep reinforcement learning agents, DQN & DDPG⁠, are implemented but DQN fails to learn and DDPG doesn’t show acceptable performance, indicating better deep RL methods may be required to solve the generalized Kelly game.

“Candy Japan’s New Box A/B Test”, Branwen 2016

Candy-Japan: “Candy Japan’s new box A/B test”⁠, Gwern Branwen (2016-05-06; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Bayesian decision-theoretic analysis of the effect of fancier packaging on subscription cancellations & optimal experiment design.

I analyze an A/​B test from a mail-order company of two different kinds of box packaging from a Bayesian decision-theory perspective, balancing posterior probability of improvements & greater profit against the cost of packaging & risk of worse results, finding that as the company’s analysis suggested, the new box is unlikely to be sufficiently better than the old. Calculating expected values of information shows that it is not worth experimenting on further, and that such fixed-sample trials are unlikely to ever be cost-effective for packaging improvements. However, adaptive experiments may be worthwhile.

“Statistical Notes”, Branwen 2014

Statistical-notes: “Statistical Notes”⁠, Gwern Branwen (2014-07-17; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Miscellaneous statistical stuff

Given two disagreeing polls, one small & imprecise but taken at face-value, and the other large & precise but with a high chance of being totally mistaken, what is the right Bayesian model to update on these two datapoints? I give ABC and MCMC implementations of Bayesian inference on this problem and find that the posterior is bimodal with a mean estimate close to the large unreliable poll’s estimate but with wide credible intervals to cover the mode based on the small reliable poll’s estimate.

“2012 Election Predictions”, Branwen 2012

2012-election-predictions: “2012 election predictions”⁠, Gwern Branwen (2012-11-05; ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Compiling academic and media forecaster’s 2012 American Presidential election predictions and statistically judging correctness; Nate Silver was not the best.

Statistically analyzing in R hundreds of predictions compiled for ~10 forecasters of the 2012 American Presidential election, and ranking them by Brier, RMSE, & log scores; the best overall performance seems to be by Drew Linzer and Wang & Holbrook, while Nate Silver appears as somewhat over-rated and the famous Intrade prediction market turning in a disappointing overall performance.

“‘HP: Methods of Rationality’ Review Statistics”, Branwen 2012

hpmor: “‘HP: Methods of Rationality’ review statistics”⁠, Gwern Branwen (2012-11-03; ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Recording fan speculation for retrospectives; statistically modeling reviews for ongoing story with R

The unprecedented gap in Methods of Rationality updates prompts musing about whether readership is increasing enough & what statistics one would use; I write code to download reviews, clean it, parse it, load into R, summarize the data & depict it graphically, run linear regression on a subset & all reviews, note the poor fit, develop a quadratic fit instead, and use it to predict future review quantities.

Then, I run a similar analysis on a competing fanfiction to find out when they will have equal total review-counts. A try at logarithmic fits fails; fitting a linear model to the previous 100 days of MoR and the competitor works much better, and they predict a convergence in <5 years.

A survival analysis finds no major anomalies in reviewer lifetimes, but an apparent increase in mortality for reviewers who started reviewing with later chapters, consistent with (but far from proving) the original theory that the later chapters’ delays are having negative effects.

“Hyphenation: Configurable Knuth-Liang Hyphenation”, Kmett 2012

“hyphenation: Configurable Knuth-Liang hyphenation”⁠, Edward A. Kmett (2012; ; backlinks; similar):

Configurable Knuth-Liang hyphenation: Uses the UTF8 encoded hyphenation patterns provided by hyph-utf8⁠.


hyphenate english_US "supercalifragilisticexpialidocious"-- ["su","per","cal","ifrag","ilis","tic","ex","pi","al","ado","cious"] 
hyphenate english_US "hyphenation"-- ["hy","phen","ation"]

“Archiving GitHub”, Branwen 2011

Archiving-GitHub: “Archiving GitHub”⁠, Gwern Branwen (2011-03-20; ⁠, ; similar):

Scraping and downloading Haskell-related repositories from GitHub

Tutorial of how to write a Haskell program to scrape Haskell-related repositories on GitHub and download them for offline installation, search, reference, and source code analysis, using TagSoup & curl⁠.


“Archiving URLs”, Branwen 2011

Archiving-URLs: “Archiving URLs”⁠, Gwern Branwen (2011-03-10; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Archiving the Web, because nothing lasts forever: statistics, online archive services, extracting URLs automatically from browsers, and creating a daemon to regularly back up URLs to multiple sources.

Links on the Internet last forever or a year, whichever comes first. This is a major problem for anyone serious about writing with good references, as link rot will cripple several% of all links each year, and compounding.

To deal with link rot, I present my multi-pronged archival strategy using a combination of scripts, daemons, and Internet archival services: URLs are regularly dumped from both my web browser’s daily browsing and my website pages into an archival daemon I wrote, which pre-emptively downloads copies locally and attempts to archive them in the Internet Archive⁠. This ensures a copy will be available indefinitely from one of several sources. Link rot is then detected by regular runs of linkchecker, and any newly dead links can be immediately checked for alternative locations, or restored from one of the archive sources.

As an additional flourish, my local archives are efficiently cryptographically timestamped using Bitcoin in case forgery is a concern, and I demonstrate a simple compression trick for substantially reducing sizes of large web archives such as crawls (particularly useful for repeated crawls such as my DNM archives).

“Zeo Sleep Self-experiments”, Branwen 2010

Zeo: “Zeo sleep self-experiments”⁠, Gwern Branwen (2010-12-28; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

EEG recordings of sleep and my experiments with things affecting sleep quality or durations: melatonin, potassium, vitamin D etc

I discuss my beliefs about Quantified Self⁠, and demonstrate with a series of single-subject design self-experiments using a Zeo. A Zeo records sleep via EEG; I have made many measurements and performed many experiments. This is what I have learned so far:

  1. the Zeo headband is wearable long-term
  2. melatonin improves my sleep
  3. one-legged standing does little
  4. Vitamin D at night damages my sleep & Vitamin D in morning does not affect my sleep
  5. potassium (over the day but not so much the morning) damages my sleep and does not improve my mood/​productivity
  6. small quantities of alcohol appear to make little difference to my sleep quality
  7. I may be better off changing my sleep timing by waking up somewhat earlier & going to bed somewhat earlier
  8. lithium orotate does not affect my sleep
  9. Redshift causes me to go to bed earlier
  10. ZMA: inconclusive results slightly suggestive of benefits

“Nootropics”, Branwen 2010

Nootropics: “Nootropics”⁠, Gwern Branwen (2010-01-02; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar)

“Writing a Wikipedia RSS Link Archive Bot”, Branwen 2009

Wikipedia-RSS-Archive-Bot: “Writing a Wikipedia RSS Link Archive Bot”⁠, Gwern Branwen (2009-11-02; ⁠, ⁠, ; backlinks; similar):

Archiving using Wikipedia Recent Changes RSS feed (obsolete).

Continuation of the 2009 Haskell Wikipedia link archiving bot tutorial, extending it from operating on a pre-specified list of articles to instead archiving links live by using TagSoup parsing Wikipedia Recent Changes for newly-added external links which can be archived using WebCite in parallel. (Note: these tutorials are obsolete. WebCite is largely defunct, doing archiving this way is not advised, and WP link archiving is currently handled by Internet Archive-specific plugins by the WMF. For a more general approach suitable for personal use, see the writeup of archiver-bot in Archiving URLs⁠.)

“Who Wrote The ‘Death Note’ Script?”, Branwen 2009

Death-Note-script: “Who Wrote The ‘Death Note’ Script?”⁠, Gwern Branwen (2009-11-02; ⁠, ⁠, ⁠, ; backlinks; similar):

Internal, external, stylometric evidence point to live-action leak of Death Note Hollywood script being real.

I give a history of the 2009 leaked script, discuss internal & external evidence for its realness including stylometrics; and then give a simple step-by-step Bayesian analysis of each point. We finish with high confidence in the script being real, discussion of how this analysis was surprisingly enlightening, and what followup work the analysis suggests would be most valuable.

“Spaced Repetition for Efficient Learning”, Branwen 2009

Spaced-repetition: “Spaced Repetition for Efficient Learning”⁠, Gwern Branwen (2009-03-11; ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Efficient memorization using the spacing effect: literature review of widespread applicability, tips on use & what it’s good for.

Spaced repetition is a centuries-old psychological technique for efficient memorization & practice of skills where instead of attempting to memorize by ‘cramming’, memorization can be done far more efficiently by instead spacing out each review, with increasing durations as one learns the item, with the scheduling done by software. Because of the greater efficiency of its slow but steady approach, spaced repetition can scale to memorizing hundreds of thousands of items (while crammed items are almost immediately forgotten) and is especially useful for foreign languages & medical studies.

I review what this technique is useful for, some of the large research literature on it and the testing effect (up to ~2013, primarily), the available software tools and use patterns, and miscellaneous ideas & observations on it.

“Summers of Code, 2006–2013”, Branwen 2009

Summer-of-Code: “Summers of Code, 2006–2013”⁠, Gwern Branwen (2009-02-11; ⁠, ; backlinks; similar):

A retrospective of 8 years of SoC, and lessons learned

A compilation of Haskell-related student projects 2006-2013, with evaluations of their usefulness to the Haskell community, thoughts on what makes a good project, and predictions for 2011-2013.

“In Defense of Inclusionism”, Branwen 2009

In-Defense-Of-Inclusionism: “In Defense of Inclusionism”⁠, Gwern Branwen (2009-01-15; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Iron Law of Bureaucracy: the downwards deletionism spiral discourages contribution and is how Wikipedia will die.

English Wikipedia is in decline. As a long-time editor & former admin, I was deeply dismayed by the process. Here, I discuss UI principles, changes in Wikipedian culture, the large-scale statistical evidence of decline, run small-scale experiments demonstrating the harm, and conclude with parting thoughts.

“Writing a Wikipedia Link Archive Bot”, Branwen 2008

Wikipedia-Archive-Bot: “Writing a Wikipedia Link Archive Bot”⁠, Gwern Branwen (2008-09-26; ⁠, ⁠, ; backlinks; similar):

Haskell: tutorial on writing a daemon to archive links in Wikipedia articles with TagSoup and WebCite; obsolete.

This is a 2008 tutorial demonstrating how to write a Haskell program to automatically archive Internet links into WebCite & Internet Archive to avoid linkrot, by parsing WP dumps, downloading & parsing WP articles for external links with the TagSoup HTML parsing library, using the WebCite/​IA APIs to archive them, and optimizing runtime. This approach is suitable for one-off crawls but not for live archiving using the RSS feed; for the next step, see Wikipedia RSS Archive Bot for a demonstration of how one could write a RSS-oriented daemon.