Advertisement for 'HTerm, The Graphical Terminal'

Statistically analyzing in R hundreds of predictions compiled for ~10 forecasters of the 2012 American Presidential election, and ranking them by Brier, RMSE, & log scores; the best overall performance seems to be by Drew Linzer and Wang & Holbrook, while Nate Silver appears as somewhat over-rated and the famous Intrade prediction market turning in a disappointing overall performance.

In November 2012, I was hired by CFAR to compile an extensive dataset of pundits, modelers, hobbyists, and academics who had attempted to statistically forecast the 2012 American presidential race and other minor races; the results were interesting in that they contradicted the lionization of Nate Silver’s forecasts in The New York Times. This page is a full listing of the R source code I used to produce my analysis for the CFAR essay; notes on the derivation of each dataset are stored at 2012-election-gwern-notes.txt.

The essay itself lives at “Was Nate Silver the Most Accurate 2012 Election Pundit?”.

Background

This election prediction judgment divided up into several sections dealing with different categories of predictions:

  1. the overall Presidential race predictions: probability of Obama victory, final electoral vote count, and percentage of popular vote
  2. the Presidential state-by-state predictions: the percentage Obama will take (vote share/margin/edge), as well as the probability he will win that state at all
  3. the Senate state-by-state predictions: similar, but normalized for the Democratic candidate

Few forecasters made predictions in all categories, the ones who did make predictions did not always make their full predictions public, etc. Note that all percentages are normalized in terms of that going to Obama, Democrats, or in some cases, Independents/Greens. The “Reality” ‘forecaster’ is the ground truth; these were all updated 23 November in what is hopefully a final update.

The point of these calculations is to extract Brier scores (for categorical predictions like percentage of Obama victory) and RMSE sums (for continuous/quantitative predictions like vote share). Intrade prices were interpreted as straightforward probabilities without any correction for Intrade’s long-shot bias1

Presidential

presidential <- read.csv("http://www.gwern.net/docs/2012-election-presidential.csv", row.names=1)
# Reality=2012 result; 2008=2008 results
presidential
                probability electoral popular
Reality              1.0000       332   50.79
2008                 1.0000       365   53.00
Nate Silver          0.9090       313   50.80
Drew Linzer          0.9900       332      NA
Simon Jackman        0.9140       332   50.80
DeSart               0.8862       303   51.37
Margin of Error      0.6800       303   51.50
Wang & Ferguson      1.0000       303   51.10
Intrade              0.6580       291   50.75
Josh Putnam              NA       332      NA
Unskewed Polls           NA       263   48.88

# probability can be scored as a Brier score; available in 'verification' library
install.packages("verification")
library(verification)
# handle lists & vectors for later
br <- function(obs, pred) brier(unlist(obs),
                                unlist(pred),
                                bins=FALSE)$bs # bins=FALSE avoids rounding
# convenience function
brp <- function(p) brier(presidential["Reality",]$probability,
                         presidential[p,]$probability,
                         bins=FALSE)$bs
lapply(rownames(presidential)[1:9], brp)
  • Reality: 0
  • 2008: 0
  • Wang: 0
  • Linzer: 0.0001
  • Jackman: 0.007396
  • Silver: 0.008281
  • DeSart: 0.01295044
  • Margin: 0.1024
  • Intrade: 0.116964
  • Random: 0.25 (50% guess is always 0.25)
# To score electorals and populars, we use RMSE
rmse <- function(obs, pred) sqrt(mean((obs-pred)^2,na.rm=TRUE))
rpe <- function(p) rmse(presidential["Reality",]$electoral, presidential[p,]$electoral)
lapply(rownames(presidential), rpe)
  • Reality: 0
  • Linzer: 0
  • Jackman: 0
  • Putnam: 0
  • Silver: 19
  • DeSart: 29
  • Margin: 29
  • Wang: 29
  • 2008: 33
  • Intrade: 41
  • Unskewed: 69
rpp <- function(p) rmse(presidential["Reality",]$popular, presidential[p,]$popular)
lapply(rownames(presidential)[c(1:9,11)], rpp)
  • Reality: 0
  • Wang: 0.31
  • DeSart: 0.58
  • Jackman: 0.01
  • Silver: 0.01
  • Intrade: 0.04
  • Margin: 0.71
  • 2008: 2.21
  • Unskewed: 1.91

State

State Win Probabilities

# Reality=final 2012 result - 0 for Romney states, 100 for Obama
# 2008=2008 state results (=Reality, negated for Obama loss of Indiana & North Carolina)
statewin <- read.csv("http://www.gwern.net/docs/2012-election-statewin.csv", row.names=1)
statewin
                    al       ak     az     ar     ca      co     ct      de
Reality         0.0000 0.000000 0.0000 0.0000 1.0000 1.00000 1.0000 1.00000
2008            0.0000 0.000000 0.0000 0.0000 1.0000 1.00000 1.0000 1.00000
Nate Silver     0.0000 0.000000 0.0200 0.0000 1.0000 0.80000 1.0000 1.00000
Drew Linzer     0.0000 0.000086 0.0000 0.0000 1.0000 0.98333 1.0000 0.98333
Margin of Error 0.0269 0.099800 0.4388 0.0451 0.9443 0.64710 0.9125 0.93770
Intrade         0.0000 0.000000 0.0600 0.0000 0.9500 0.55600 0.9900 0.96000
DeSart          0.0000 0.090000 0.0390 0.0000 1.0000 0.52300 0.9990 1.00000
Simon Jackman   0.0052 0.000000 0.0050 0.0000 1.0000 0.76520 1.0000 1.00000
Wang & Ferguson 0.0000 0.000000 0.0000 0.0000 1.0000 0.84000 1.0000 1.00000
Josh Putnam         NA       NA     NA     NA     NA      NA     NA      NA
Unskewed Polls      NA       NA     NA     NA     NA      NA     NA      NA
                   dc     fl     ga     hi     id     il indiana     ia      ks
Reality         1.000 1.0000 0.0000 1.0000 0.0000 1.0000  0.0000 1.0000 0.00000
2008            1.000 1.0000 0.0000 1.0000 0.0000 1.0000  1.0000 1.0000 0.00000
Nate Silver     1.000 0.5000 0.0000 1.0000 0.0000 1.0000  0.0000 0.8400 0.00000
Drew Linzer        NA 0.6040 0.0000 1.0000 0.0000 1.0000  0.0000 0.9966 0.03866
Margin of Error 1.000 0.4575 0.1972 0.9987 0.0086 0.9569  0.3273 0.6467 0.09710
Intrade         0.975 0.3300 0.0300 0.9750 0.0000 0.9890  0.0200 0.6630 0.00000
DeSart          1.000 0.4910 0.0300 1.0000 0.0000 1.0000  0.0020 0.7700 0.00000
Simon Jackman   1.000 0.5216 0.0014 1.0000 0.0000 1.0000  0.0000 0.8376 0.00000
Wang & Ferguson 1.000 0.5000 0.0000 1.0000 0.0000 1.0000  0.0000 0.8400 0.00000
Josh Putnam        NA     NA     NA     NA     NA     NA      NA     NA      NA
Unskewed Polls     NA     NA     NA     NA     NA     NA      NA     NA      NA
                      ky     la     me      md     ma     mi     mn         ms
Reality         0.000000 0.0000 1.0000 1.00000 1.0000 1.0000 1.0000 0.00000000
2008            0.000000 0.0000 1.0000 1.00000 1.0000 1.0000 1.0000 0.00000000
Nate Silver     0.000000 0.0000 1.0000 1.00000 1.0000 0.9900 1.0000 0.00000000
Drew Linzer     0.000000 0.0000 1.0000 1.00000 1.0000 1.0000 1.0000 0.07866667
Margin of Error 0.019100 0.0442 0.8403 0.96837 0.8988 0.6837 0.7149 0.13470000
Intrade         0.000000 0.0000 0.9300 0.94000 0.9950 0.8840 0.8490 0.00000000
DeSart          0.000000 0.0000 0.9930 1.00000 1.0000 0.9350 0.9610 0.00000000
Simon Jackman   0.000004 0.0000 1.0000 1.00000 1.0000 0.9998 0.9992 0.00000000
Wang & Ferguson 0.000000 0.0200 1.0000 1.00000 1.0000 1.0000 1.0000 0.00000000
Josh Putnam           NA     NA     NA      NA     NA     NA     NA         NA
Unskewed Polls        NA     NA     NA      NA     NA     NA     NA         NA
                      mo     mt     ne        nv     nh     nj     nm     ny
Reality         0.000000 0.0000 0.0000 1.0000000 1.0000 1.0000 1.0000 1.0000
2008            0.000000 0.0000 0.0000 1.0000000 1.0000 1.0000 1.0000 1.0000
Nate Silver     0.000000 0.0200 0.0000 0.9300000 0.8500 1.0000 0.9900 1.0000
Drew Linzer     0.000000 0.0000 0.0000 0.9993333 0.9980 1.0000 1.0000 1.0000
Margin of Error 0.447300 0.2436 0.0562 0.7710000 0.6886 0.8647 0.8579 0.9697
Intrade         0.050000 0.0500 0.0000 0.8370000 0.6490 0.9790 0.9390 0.9500
DeSart          0.052000 0.0080 0.0000 0.7680000 0.7560 0.9980 0.9740 1.0000
Simon Jackman   0.000004 0.0032 0.0000 0.9120000 0.8324 0.9998 0.9968 1.0000
Wang & Ferguson 0.000000 0.0000 0.0000 0.9900000 0.8400 1.0000 1.0000 1.0000
Josh Putnam           NA     NA     NA        NA     NA     NA     NA     NA
Unskewed Polls        NA     NA     NA        NA     NA     NA     NA     NA
                        nc     nd        oh     ok        or     pa     ri
Reality         0.00000000 0.0000 1.0000000 0.0000 1.0000000 1.0000 1.0000
2008            1.00000000 0.0000 1.0000000 0.0000 1.0000000 1.0000 1.0000
Nate Silver     0.26000000 0.0000 0.9100000 0.0000 1.0000000 0.9900 1.0000
Drew Linzer     0.08533333 0.0000 0.9986667 0.0000 0.9986667 1.0000 1.0000
Margin of Error 0.50030000 0.1284 0.6038000 0.0029 0.7886000 0.7562 0.9684
Intrade         0.23000000 0.0030 0.6550000 0.0010 0.9590000 0.8200 0.9500
DeSart          0.06600000 0.0000 0.7040000 0.0000 0.9430000 0.8810 1.0000
Simon Jackman   0.28120000 0.0000 0.9298000 0.0000 0.9726000 0.9910 1.0000
Wang & Ferguson 0.16000000 0.0000 0.9300000 0.0000 1.0000000 0.9300 1.0000
Josh Putnam             NA     NA        NA     NA        NA     NA     NA
Unskewed Polls          NA     NA        NA     NA        NA     NA     NA
                       sc     sd       tn     tx     ut     vt     va     wa
Reality         0.0000000 0.0000 0.000000 0.0000 0.0000 1.0000 1.0000 1.0000
2008            0.0000000 0.0000 0.000000 0.0000 0.0000 1.0000 1.0000 1.0000
Nate Silver     0.0000000 0.0000 0.000000 0.0000 0.0000 1.0000 0.7900 1.0000
Drew Linzer     0.1386667 0.0000 0.000000 0.0000 0.0000 1.0000 0.9760 1.0000
Margin of Error 0.1345000 0.1665 0.053100 0.0545 0.0035 0.9846 0.5046 0.8473
Intrade         0.0400000 0.0500 0.020000 0.0200 0.0450 0.9800 0.5800 0.9750
DeSart          0.0030000 0.0010 0.000000 0.0000 0.0000 1.0000 1.0000 0.9980
Simon Jackman   0.1290000 0.0068 0.000004 0.0000 0.0000 1.0000 0.7840 1.0000
Wang & Ferguson 0.0000000 0.0000 0.000000 0.0000 0.0000 1.0000 0.8400 1.0000
Josh Putnam            NA     NA       NA     NA     NA     NA     NA     NA
Unskewed Polls         NA     NA       NA     NA     NA     NA     NA     NA
                         wv     wi          wy
Reality         0.000000000 1.0000 0.000000000
2008            0.000000000 1.0000 0.000000000
Nate Silver     0.000000000 0.9700 0.000000000
Drew Linzer     0.001333333 1.0000 0.000666667
Margin of Error 0.042700000 0.6448 0.006900000
Intrade         0.020000000 0.7460 0.000000000
DeSart          0.000000000 0.8560 0.000000000
Simon Jackman   0.005400000 0.9698 0.000000000
Wang & Ferguson 0.000000000 0.9900 0.000000000
Josh Putnam              NA     NA          NA
Unskewed Polls           NA     NA          NA

brstate <- function(p) br(statewin["Reality",], statewin[p,])
lapply(rownames(statewin)[1:9], brstate)
  • Reality: 0
  • Drew Linzer: 0.00384326
  • Wang/Ferguson: 0.007615686
  • Nate Silver: 0.00911372
  • Simon Jackman: 0.00971369
  • DeSart/Holbrook: 0.01605542
  • Intrade: 0.02811906
  • 2008: 0.03921569
  • Margin of Error: 0.05075311
  • random (50%) guesser 0.25000000

State win vote-shares

Sources:

statemargin <- read.csv("http://www.gwern.net/docs/2012-election-statemargin.csv", row.names=1)
statemargin
                      al       ak       az       ar       ca       co       ct
Reality         38.42829 40.79253 44.44855 36.87899 59.69455 51.56536 58.38274
2008            38.80000 37.70000 45.00000 38.80000 60.90000 53.50000 60.50000
Nate Silver     36.70000 38.60000 46.20000 38.60000 58.10000 50.80000 56.60000
Drew Linzer     40.30000 37.50000 46.20000 37.10000 59.80000 51.20000 56.80000
Margin of Error 37.00000 41.00000 49.00000 39.00000 61.00000 53.00000 59.00000
Josh Putnam           NA       NA 46.59500       NA 58.39500 50.87500 55.92000
Unskewed Polls  37.78000 36.40000 43.95000 44.68000 57.65000 49.48000 54.55000
Intrade               NA       NA       NA       NA       NA       NA       NA
Simon Jackman   38.70000       NA 46.10000 36.40000 58.60000 51.00000 56.80000
DeSart          35.20000 32.20000 46.40000 38.70000 59.20000 50.10000 57.70000
Wang & Ferguson 42.50000 39.00000 46.00000 38.00000 57.50000 51.00000 56.50000
                      de       dc       fl       ga       hi       id       il
Reality         58.61074 90.91402 50.00787 45.48216 70.54523 32.62233 57.53322
2008            61.90000 92.90000 50.90000 47.00000 71.80000 36.10000 61.80000
Nate Silver     59.60000 93.00000 49.80000 45.50000 66.50000 32.10000 59.80000
Drew Linzer     61.00000       NA 50.20000 46.00000 65.60000 31.20000 60.20000
Margin of Error 60.00000 91.00000 49.00000 44.00000 70.00000 35.00000 61.00000
Josh Putnam           NA       NA 50.08000 45.38000       NA       NA 59.58000
Unskewed Polls  86.88000 57.40000 47.60000 43.20000 58.55000 30.95000 55.25000
Intrade               NA       NA       NA       NA       NA       NA       NA
Simon Jackman         NA 91.60000 50.10000 45.50000 65.00000 32.00000 59.60000
DeSart          60.50000 95.80000 49.90000 45.50000 66.60000 29.10000 60.80000
Wang & Ferguson 62.50000 90.00000 50.00000 46.00000 63.50000 32.00000 59.50000
                 indiana      ia       ks       ky       la       me       md
Reality         44.08345 51.9882 37.82721 37.80994 40.57746 55.96352 61.97419
2008            49.90000 54.0000 41.40000 41.10000 39.90000 57.60000 61.90000
Nate Silver     45.30000 51.1000 37.90000 40.30000 39.30000 55.90000 60.90000
Drew Linzer     44.30000 51.6000 41.10000 45.10000 39.70000 56.40000 61.30000
Margin of Error 47.00000 52.0000 41.00000 36.00000 39.00000 57.00000 62.00000
Josh Putnam     44.36500 51.2750       NA       NA 43.06500 56.17000 60.64500
Unskewed Polls  41.90000 49.8800 36.60000 40.80000 43.63000 51.90000 55.83000
Intrade               NA      NA       NA       NA       NA       NA       NA
Simon Jackman   44.80000 51.4000       NA 40.90000 38.90000 56.00000 61.00000
DeSart          43.10000 51.8000 39.40000 41.90000 39.30000 55.90000 61.90000
Wang & Ferguson 43.50000 51.0000 41.50000 44.50000 43.50000 55.50000 61.00000
                      ma       mi      mn       ms       mo       mt       ne
Reality         60.74886 54.30391 52.6497 43.54862 44.34962 41.70813 37.86805
2008            62.00000 57.40000 54.2000 42.80000 49.30000 47.20000 41.50000
Nate Silver     59.00000 53.00000 53.7000 45.60000 45.60000 45.20000 40.40000
Drew Linzer     60.00000 52.70000 54.2000 41.80000 45.30000 45.30000 42.50000
Margin of Error 58.00000 53.00000 54.0000 43.00000 49.00000 45.00000 40.00000
Josh Putnam     56.17000 52.78500 53.7650       NA 45.92500 45.46500       NA
Unskewed Polls  60.10000 51.75000 51.0300 39.83000 46.20000 38.80000 34.40000
Intrade               NA       NA      NA       NA       NA       NA       NA
Simon Jackman   59.70000 53.60000 54.0000       NA 45.30000 46.00000 42.80000
DeSart          62.80000 53.70000 54.3000 40.00000 46.10000 44.20000 39.80000
Wang & Ferguson 59.50000 52.75000 53.7500 44.00000 45.25000 45.75000 43.00000
                      nv       nh       nj       nm       ny       nc       nd
Reality         52.35625 51.98268 57.85939 52.99547 62.62461 48.35097 38.69731
2008            55.10000 54.30000 56.80000 56.70000 62.20000 49.90000 44.70000
Nate Silver     51.80000 51.40000 55.50000 54.10000 62.40000 48.90000 42.00000
Drew Linzer     52.20000 51.60000 56.60000 54.40000 63.20000 49.10000 41.70000
Margin of Error 55.00000 53.00000 57.00000 57.00000 62.00000 50.00000 43.00000
Josh Putnam     52.02500 51.51500 56.18000 54.56500 62.51000 49.22000       NA
Unskewed Polls  52.15000 50.03000 53.80000 53.53000 58.75000 44.98000 37.15000
Intrade               NA       NA       NA       NA       NA       NA       NA
Simon Jackman   51.90000 51.30000 56.00000 54.40000 62.70000 49.20000 43.00000
DeSart          51.80000 51.70000 57.10000 54.70000 64.60000 47.70000 40.10000
Wang & Ferguson 52.50000 51.00000 56.00000 53.00000 62.00000 49.00000 43.00000
                      oh       ok       or       pa       ri       sc       sd
Reality         50.14323 33.22768 54.30016 51.75834 62.70096 44.08803 39.86614
2008            51.20000 34.40000 57.10000 54.70000 63.10000 44.90000 44.70000
Nate Silver     51.30000 33.80000 53.60000 52.50000 61.80000 43.20000 42.50000
Drew Linzer     51.60000 33.50000 53.60000 52.70000 63.10000 44.30000 44.80000
Margin of Error 52.00000 31.00000 55.00000 55.00000 62.00000 43.00000 44.00000
Josh Putnam     51.47000       NA       NA 52.84500       NA       NA 44.79000
Unskewed Polls  47.75000 35.55000 50.53000 50.28000 59.73000 41.58000 39.88000
Intrade               NA       NA       NA       NA       NA       NA       NA
Simon Jackman   51.90000 34.50000 53.10000 53.10000 62.80000 44.80000 44.90000
DeSart          51.30000 35.40000 53.80000 52.90000 65.20000 43.30000 42.70000
Wang & Ferguson 51.50000 35.50000 53.00000 51.50000 62.00000 47.00000 44.50000
                      tn       tx       ut       vt       va       wa       wv
Reality         39.07377 41.36371 24.73251 66.57055 51.15646 56.13941 35.50631
2008            41.80000 43.80000 34.20000 67.80000 52.70000 57.50000 42.60000
Nate Silver     41.40000 41.20000 27.80000 66.20000 50.70000 56.20000 41.30000
Drew Linzer     43.30000 41.40000 26.70000 70.50000 51.10000 57.10000 42.80000
Margin of Error 39.00000 39.00000 31.00000 64.00000 50.00000 57.00000 38.00000
Josh Putnam     43.93500 42.44000 27.31000       NA 50.89500 56.68000       NA
Unskewed Polls  43.70000 39.85000 28.75000 56.53000 48.88000 51.43000 44.55000
Intrade               NA       NA       NA       NA       NA       NA       NA
Simon Jackman   44.00000 40.80000 26.90000 68.80000 51.00000 56.50000 41.50000
DeSart          41.30000 40.70000 25.90000 70.70000 50.10000 57.10000 39.80000
Wang & Ferguson 44.50000 42.00000 27.50000 65.50000 51.00000 57.00000 41.50000
                      wi       wy
Reality         52.80191 27.81889
2008            56.30000 32.70000
Nate Silver     52.40000 30.90000
Drew Linzer     52.50000 32.00000
Margin of Error 52.00000 33.00000
Josh Putnam     52.30500       NA
Unskewed Polls  49.98000 30.55000
Intrade               NA       NA
Simon Jackman   52.50000       NA
DeSart          52.60000 30.10000
Wang & Ferguson 52.25000 34.00000

What’s the equivalent of Brier function for outcomes which aren’t yes/no binary? A more quantitative measure; a common choice is the RMSE (which punishes outliers), in this case, we’re looking at the difference between the predicted edge in votes and the actual edge over all the states a predictor gave us numbers:

rmse <- function(obs, pred) sqrt(mean((obs-pred)^2,na.rm=TRUE))
rmsesm <- function(p) rmse(statemargin["Reality",], statemargin[p,])
lapply(rownames(statemargin), rmsesm)
  • Reality: 0
  • Nate Silver: 1.863676
  • Josh Putnam: 2.033683
  • Simon Jackman: 2.25422
  • DeSart & Holbrook: 2.414322
  • Margin of Error: 2.426244
  • Drew Linzer: 2.5285
  • Wang & Ferguson: 2.79083
  • 2008: 3.206457
  • Unskewed Polls: 7.245104

Senate

Senate Win Probabilities

senatewin <- read.csv("http://www.gwern.net/docs/2012-election-senatewin.csv", row.names=1)
                   az    ca    ct   de    fl   hi indiana    me   md    ma   mi
Reality         0.000 1.000 1.000 1.00 1.000 1.00    1.00 1.000 1.00 1.000 1.00
Nate Silver     0.040 1.000 0.960 1.00 1.000 1.00    0.70 0.930 1.00 0.940 1.00
Intrade         0.225 0.998 0.888 0.99 0.859 0.96    0.85 0.957 0.96 0.786 0.95
Wang & Ferguson 0.120 0.950 0.998 0.95 0.950 0.95    0.84 0.950 0.95 0.960 0.96
                  mn   ms    mo    mt   ne   nv   nj   nm   ny    nd   oh   pa
Reality         1.00 0.00 1.000 1.000 0.00 0.00 1.00 1.00 1.00 1.000 1.00 1.00
Nate Silver     1.00 0.00 0.980 0.340 0.01 0.17 1.00 0.97 1.00 0.080 0.97 0.99
Intrade         0.95 0.00 0.703 0.371 0.06 0.06 0.96 0.95 1.00 0.155 0.84 0.86
Wang & Ferguson 0.95 0.05 0.960 0.690 0.05 0.27 0.95 0.95 0.95 0.750 0.95 0.95
                  ri   tn    tx   ut   vt   va   wa    wv    wi   wy
Reality         1.00 0.00 0.000 0.00 0.00 1.00 1.00 1.000 1.000 0.00
Nate Silver     1.00 0.00 0.000 0.00 0.00 0.88 1.00 0.920 0.790 0.00
Intrade         0.99 0.00 0.025 0.00 0.05 0.78 0.96 0.951 0.626 0.00
Wang & Ferguson 0.95 0.05 0.050 0.05 0.05 0.96 0.95 0.950 0.720 0.05

The Senate win predictions (done only by Wang, Silver, & Intrade in this dataset):

brsw <- function (pundit) br(senatewin["Reality",], senatewin[pundit,])
lapply(rownames(senatewin), brsw)
  • Wang: 0.01246376
  • Silver: 0.04484545
  • Intrade: 0.04882958

To combine the state win predictions with the presidency win prediction and also the Senate race win predictions requires data on all 3, so still Wang vs Silver vs Intrade:

combineBinaryForecasts <- function(p) c(statewin[p,],
                                        senatewin[p,],
                                        presidential[p,]$probability)
brpssw <- function(pundit) br(combineBinaryForecasts("Reality"), combineBinaryForecasts(pundit))
lapply(rownames(senatewin), brpssw)
  • Wang: 0.009408282
  • Silver: 0.02297625
  • Intrade: 0.03720485

Senate win vote-shares

Source: Washington Post

senatemargin <- read.csv("http://www.gwern.net/docs/2012-election-senatemargin.csv", row.names=1)
senatemargin
              az   ca   ct   de   fl   hi   id   me   md   ma   mi   mn   ms
Reality     45.8 61.6 55.2 66.4 55.2 62.6 49.9 52.9 55.3 53.7 54.7 65.3 40.3
Nate Silver 46.6 59.6 52.6 66.5 53.2 56.6 50.0 53.0 60.8 51.7 56.0 63.7 32.3
              mo   mt   ne   nv   nj   nm   ny   nd   oh   pa   ri   tn   tx
Reality     54.7 48.7 41.8 44.7 58.5 51.0 71.9 50.5 50.3 53.6 64.8 30.4 40.5
Nate Silver 52.2 48.4 45.6 47.5 56.1 53.4 67.5 47.2 51.9 52.9 59.1 35.5 41.5
              ut   vt   va   wa   wv   wi   wy
Reality     30.2 24.8 52.5 60.2 60.6 51.5 21.6
Nate Silver 32.4 25.0 51.0 59.3 56.0 51.1 27.7

rmse <- function(obs, pred) sqrt(mean((obs-pred)^2,na.rm=TRUE))
r <- function(x) rmse(senatemargin["Reality",], senatemargin[x,])
r("Reality"); r("Nate Silver"); # no one else's predictions are available
  • Reality: 0
  • Nate Silver: 3.272197

Not bad at all.

Let’s combine the state margin with the electoral / popular to get an overall RMSE picture of the predictors:

r <- function(p) rmse(unlist(c(statemargin["Reality",],
                        presidential["Reality",]$electoral,
                        presidential["Reality",]$popular)),
                        unlist(c(statemargin[p,], presidential[p,]$electoral,
                                                  presidential[p,]$popular)))
lapply(rownames(statemargin), r)
  • Reality: 0
  • Josh Putnam: 2.002633
  • Simon Jackman: 2.206758
  • Drew Linzer: 2.503588
  • Nate Silver: 3.186463
  • DeSart: 4.635004
  • Margin of Error: 4.641332
  • Wang & Ferguson: 4.83369
  • 2008: 5.525641
  • Unskewed Polls: 11.84946

(Which shows you how bad Unskewed Polls was: we could fit Putnam, Jackman, Linzer, and Silver’s errors into his and have room left over.)

Log scores of win predictions

logScore <- function(obs, pred) sum(ifelse(obs, log(pred), log(1-pred)), na.rm=TRUE)

Example of the difference between Brier and log score:

# Oops!
brier(0,1,bins=FALSE)$bs
1
# But we can recover by getting the second right
brier(c(0,1),c(1,1),bins=FALSE)$bs
0.5

# Oops!
logScore(1, 0)
-Inf
# Can we recover? ...we're screwed
logScore(c(1,1), c(0,1))
-Inf

Presidency win prediction:

lsp <- function(p) logScore(1, presidential[p,]$probability)
lapply(rownames(presidential), lsp)
  • Reality: 0
  • 2008: 0
  • Wang & Ferguson: 0
  • Linzer: -0.01005034
  • Jackman: -0.08992471
  • Silver: -0.09541018
  • DeSart: -0.1208126
  • Margin of Error: -0.3856625
  • Intrade: -0.4185503

Applied to state win predictions:

ls <- function(p) logScore(statewin["Reality",], statewin[p,])
lapply(rownames(statewin), ls)
  • Reality: 0
  • Linzer: -0.9327548
  • Wang & Ferguson: -1.750359
  • Silver: -2.057887
  • Jackman: -2.254638
  • DeSart: -3.30201
  • Intrade: -5.719922
  • Margin of Error: -10.20808
  • 2008: -Inf

Now Senate win predictions:

lss <- function(p) logScore(senatewin["Reality",], senatewin[p,])
lapply(rownames(senatewin), lss)
  • Reality: 0
  • Wang & Ferguson: -2.89789
  • Silver: -4.911792
  • Intrade: -5.813129

And all of them together:

combineBinaryForecasts <- function(p) c(statewin[p,], senatewin[p,], presidential[p,]$probability)
lssp <- function(pundit) logScore(combineBinaryForecasts("Reality"), combineBinaryForecasts(pundit))
lapply(c("Wang & Ferguson", "Nate Silver", "Intrade"), lssp)
  • Reality: 0
  • Wang & Ferguson: -4.648249
  • Silver: -7.06509
  • Intrade: -11.9516

Summary tables

RMSEs

Predictor Presidential electoral Presidential popular State margins S+Pp+Sm2 Senate margins
Silver 19 0.01 1.81659 20.82659 3.272197
Linzer 0 2.5285
Wang 29 0.31 2.79083 32.10083
Jackman 0 0.01 2.25422 2.26422
DeSart 29 0.58 2.414322 31.99432
Intrade 41 0.04
2008 33 2.21 3.206457 38.41646
Margin 29 0.71 2.426244 32.13624
Putnam 0 2.033683
Unskewed 69 1.91 7.245104 78.1551

Brier scores

(0 is a perfect Brier score or RMSE.)

Predictor Presidential win State win Senate win St+Sn+P
Silver 0.008281 0.00911372 0.04484545 0.02297625
Linzer 0.0001 0.00384326
Wang 0 0.00761569 0.01246376 0.009408282
Jackman 0.007396 0.00971369
DeSart 0.012950 0.01605542
Intrade 0.116964 0.02811906 0.04882958 0.03720485
2008 0 0.03921569
Margin 0.1024 0.05075311
Random 0.2500 0.25000000 0.25000000 0.25000000

Log scores

We mentioned there were other proper scoring rules besides the Brier score; another binary-outcome rule, less used by political forecasters, is the “logarithmic scoring rule” (see Wikipedia or Eliezer Yudkowsky’s “Technical Explanation”); it has some deep connections to areas like information theory, data compression, and Bayesian inference, which makes it invaluable in some context. But because a log score ranges between 0 and negative Infinity (bigger is better/smaller worse) rather than 0 and 1 (smaller better) and has some different behaviors, it’s a bit harder to understand than a Brier score.

(One way in which the log score differs from the Brier score is treatment of 100/0% predictions: the log score of a 100% prediction which is wrong is negative Infinity, while in Brier it’d simply be 1 and one can recover; hence if you say 100% twice and are wrong once, your Brier score would recover to 0.5 but your log score will still be negative Infinity! This is what happens with the “2008” benchmark.)

Forecaster State win probabilities
Reality 0
Linzer -0.9327548
Wang & Ferguson -1.750359
Silver -2.057887
Jackman -2.254638
DeSart -3.30201
Intrade -5.719922
Margin of Error -10.20808
2008 -Infinity
Forecaster Presidential win probability
Reality 0
2008 0
Wang & Ferguson 0
Jackman -0.08992471
Linzer -0.01005034
Silver -0.09541018
DeSart -0.1208126
Intrade -0.4185503
Margin of Error -0.3856625

Note that the 2008 benchmark and Wang & Ferguson took a risk here by an outright 100% chance of victory, which the log score rewarded with a 0: if somehow Obama had lost, then the log score of any set of their predictions which included the presidential win probability would automatically be -Infinity, rendering them officially The Worst Predictors In The World. This is why one should allow for the unthinkable by including some fraction of percent; of course, I’m sure Wang & Ferguson don’t mean 100% literally but more like “it’s so close to 100% we can’t be bothered to report the tiny remaining possibility”.

Forecaster Senate win probabilities
Reality 0
Wang -2.89789
Silver -4.911792
Intrade -5.813129

See also


  1. I have been told that once Intrade pries have been corrected for this, the new results are comparable to Silver & Wang. This doesn’t necessarily surprise me, but during the original analysis I did not look into doing the long-shot bias correction because: hardly anyone does in discussions of prediction markets; it would’ve been more work; I’m not sure it’s really legitimate, since if Intrade is biased, then it’s biased - if someone produces extreme estimates which can be easily improved by regressing to some relevant mean, it doesn’t seem quite honest to present your corrected version instead as what they “really” meant.

  2. Summing together RMSEs from different metrics is statistically illegitimate & misleading since the summation will reflect almost entirely the electoral vote performance, since it’s on a scale much bigger than the other metrics. I include it for curiosity only.