> Mail is delivered by the USPS mailman at a regular but not observed time; what is observed is whether the mail has been delivered at a time, yielding somewhat-unusual "interval-censored data".
> I describe the problem of estimating when the mailman delivers, write a simulation of the data-generating process, and demonstrate analysis of interval-censored data in R using maximum-likelihood (survival analysis with Gaussian regression using `survival` library), MCMC (Bayesian model in JAGS), and likelihood-free Bayesian inference (custom ABC, using the simulation). This allows estimation of the distribution of mail delivery times.
> I compare those estimates from the interval-censored data with estimates from a (smaller) set of exact delivery-times provided by USPS tracking & personal observation, using a multilevel model to deal with heterogeneity apparently due to a change in USPS routes/postmen.
> Finally, I define a loss function on mail checks, enabling: a choice of optimal time to check the mailbox to minimize loss (exploitation); optimal time to check to maximize information gain (exploration); Thompson sampling (balancing exploration & exploitation indefinitely), and estimates of the value-of-information of another datapoint (to estimate when to stop exploration and start exploitation after a finite amount of data).

Consider a question of burning importance: what time does the mailman come bearing gifts, and when should I check my mail---considering that my mailbox is >110m/>7.5 minutes away, and that I *really* dislike walking out to it in up to 90F heat for a package or book only to discover the mail hasn't come yet?
No one wants to sit around all morning to spot the *exact* time the mailman comes. At least, I don't.
An easier measurement would be to go out at a random time in the morning, see whether the mail has come yet, and then (somehow) estimate from those observations.
Given a set of data like "2015-06-20 11:00AM: mail has not come yet; 2015-06-21 11:59AM: mail had come", how can we estimate?
This is not a normal setup for estimating a mean: our data is interestingly messed up---censored or truncated or interval-valued somehow.
[Survival analysis](!Wikipedia) seems like the appropriate paradigm.
This is not a simple survival analysis with "right-censoring", where each individual is followed until either the exact time of 'failure' is observed or follow-up stops at a censoring time.
(This would be right-censoring if instead we had gone out to the mailbox early in the morning and sat there waiting for the mailman to record when she came, occasionally getting bored around 10AM or 11AM and wandering off without seeing when the mail comes.)
This isn't "left-censoring" either (for left-censoring, we'd go out to the mailbox late in the morning when the mail might already be there, and if it isn't, then wait until it does come).
I don't think this is left or right truncation either, since data is collected each day and there are no sampling biases at play.
What this is, is interval censoring: when we go out to the mailbox at 11AM and discover the mail is there, we learn that the mail was delivered today sometime in the interval midnight-10:59AM, or if the mail isn't there, we learn it will be delivered later today sometime during the interval 11:01AM-midnight (hopefully closer to the first end than the second).
Interval censoring comes up in biostatistics for situations like periodic checkups for cancer, which does resemble our mail situation.
# Inference
## Interval-censored data
### ML
The R [`survival` library](http://cran.r-project.org/web/packages/survival/index.html) supports the usual right/left-censoring but also the interval-censoring.
It supports two encodings of intervals, `interval` and `interval2`[^survival-interval-encoding]; I use the latter, a format which works well with both the `survival` library and other tools like JAGS.
Times are written as minutes-since-midnight, so they can be handled as positive numbers rather than date-times (ie midnight=0, 11AM=660, noon=720, end-of-day midnight=1440, etc), and the lower and upper bounds on intervals are 0 and 1440 (so if I check the mail at 660 and it's there, then the interval is 0-660, and if it's not, 660-1440).
[^survival-interval-encoding]: From the documentation:
> `Surv(time, time2, event, type="interval")`
>
> 1. `time`: For interval data, the first argument is the starting time for the interval.
> 2. `time2`: ending time of the interval for interval censored or counting process data only. Intervals are assumed to be open on the left and closed on the right, `(start, end]`. For counting process data, event indicates whether an event occurred at the end of the interval.
> 3. `event`: The status indicator, normally 0=alive, 1=dead....For interval censored data, the status indicator is 0=right censored, 1=event at time, 2=left censored, 3=interval censored.
>
> Interval censored data can be represented in two ways. For the first use `type = "interval"` and the codes shown above. In that usage the value of the `time2` argument is ignored unless `event=3`. The second approach is to think of each observation as a time interval with _(-∞, t)_ for left censored, _(t, ∞)_ for right censored, _(t,t)_ for exact and _(t~1~, t~2~)_ for an interval. This is the approach used for `type = "interval2"`. Infinite values can be represented either by actual infinity (`Inf`) or `NA`. The second form has proven to be the more useful one.
>
> ...a subject's data for the pair of columns in the dataset `(time1, time2)` is _(t~e~, t~e~)_ if the event time _t~e~_ is known exactly; _(t~l~, NA)_ if right censored (where _t~l~_ is the censoring time); and _(t~l~, t~u~)_ if interval censored (where _t~l~_ is the lower and _t~u~_ is the upper bound of the interval).
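To make the two encodings concrete, here is a toy example (not part of the analysis; three hypothetical observations: an exact delivery at 660, a right-censored check at 660, and an interval 0-660):

~~~{.R}
library(survival)
# type="interval": censoring is specified by the event code
# (1=exact event, 0=right-censored, 3=interval-censored):
Surv(time=c(660, 660, 0), time2=c(660, NA, 660), event=c(1, 0, 3), type="interval")
# type="interval2": the same three observations, encoded purely by their
# interval endpoints, with NA standing in for infinity:
Surv(time=c(660, 660, 0), time2=c(660, NA, 660), type="interval2")
~~~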
~~~{.R}
# Routines to make it easier to work in minutes-since-midnight:
library(lubridate)
fromClock <- function(ts){ (hour(ts)*60) + minute(ts) + (second(ts)/60)}
toClock <- function(t) {
    h <- floor(t/60)
    m <- floor(t - h*60)
    sprintf("%0.2d:%0.2d", h, m) }
set.seed(2015-06-21)
# simulate a scenario in which the mailman tends to come around 11AM (660) and I tend to check around then,
# & generate interval data for each time, bounded by end-of-day/midnight below & above, collecting ~1 month:
simulateMailbox <- function(n, time) {
    deliveryTime <- round(rnorm(n, mean = time, sd = 30))
    checkTime    <- round(rnorm(n, mean = time, sd = 20))
    simulates <- mapply(function(ck, dy) { if (ck > dy) { return(c(0, ck)) } else { return(c(ck, 1440)) } },
                        checkTime, deliveryTime)
    return(data.frame(Time1=simulates[1,], Time2=simulates[2,])) }
mailSim <- simulateMailbox(30, 650); mailSim
## Time1 Time2
## 1 620 1440
## 2 0 651
## 3 0 627
## 4 624 1440
## 5 629 1440
## 6 664 1440
## 7 665 1440
## 8 652 1440
## ...
library(ggplot2)
png(file="~/wiki/images/maildelivery/simulated.png", width = 800, height = 500)
ggplot(mailSim) + geom_segment(aes(x=Time1, xend=Time2, y=1:nrow(mailSim), yend=1:nrow(mailSim))) +
    geom_vline(xintercept=650, color="blue") + ylab("Day") + xlab("Time")
invisible(dev.off())
~~~
Inferring the mean time of delivery might sound difficult with such extremely crude data of intervals 700 minutes wide or worse, but plotting the little simulated dataset and marking the true mean time of 650, we see it's not *that* bad---the mean time is probably whatever line passes through the most intervals:
![The simulated overlapping-intervals data, with the true mean time drawn in blue](/images/maildelivery/simulated.png)
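That eyeball heuristic---take the time covered by the most intervals---can be implemented directly as a crude sanity-check estimator (a quick sketch, not part of the analysis proper; it regenerates a simulated dataset like the one above):

~~~{.R}
set.seed(2015-06-21)
# same data-generating process as simulateMailbox above:
simData <- local({
    deliveryTime <- round(rnorm(30, mean=650, sd=30))
    checkTime    <- round(rnorm(30, mean=650, sd=20))
    simulates <- mapply(function(ck, dy) { if (ck > dy) c(0, ck) else c(ck, 1440) },
                        checkTime, deliveryTime)
    data.frame(Time1=simulates[1,], Time2=simulates[2,]) })
# count, for every minute of the day, how many intervals cover it,
# and take the midpoint of the best-covered region:
coverage <- sapply(0:1440, function(t) sum(simData$Time1 <= t & t <= simData$Time2))
mean((0:1440)[coverage == max(coverage)])
## somewhere near the true 650
~~~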
And also with our simulated dataset, we can see if the standard R `survival` library and an interval-censored model written in JAGS can recover the 650:
~~~{.R}
library(survival)
surv <- Surv(mailSim$Time1, mailSim$Time2, type="interval2")
s <- survfit(surv ~ 1, data=mailSim); summary(s)
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 625.5 30.0000 3.33599e+00 0.888800 0.0573976 0.783131 1.000000
## 641.0 26.6640 4.15541e+00 0.750287 0.0790267 0.610339 0.922323
## 647.0 22.5086 6.14506e+00 0.545451 0.0909091 0.393449 0.756178
## 651.5 16.3635 3.84506e-11 0.545451 0.0909091 0.393449 0.756178
## 664.5 16.3635 5.82315e-03 0.545257 0.0909124 0.393258 0.756005
## 675.0 16.3577 1.63577e+01 0.000000 NaN NA NA
plot(s)
png(file="~/wiki/images/maildelivery/simulated-survivalcurve.png", width = 800, height = 500)
plot(s)
invisible(dev.off())
sr <- survreg(surv ~ 1, dist="gaussian", data=mailSim); summary(sr)
## Value Std. Error z p
## (Intercept) 663.08028 9.742171 68.06289 0.00000e+00
## Log(scale) 3.41895 0.438366 7.79931 6.22465e-15
##
## Scale= 30.5374
##
## Gaussian distribution
## Loglik(model)= -15.8 Loglik(intercept only)= -15.8
## Number of Newton-Raphson Iterations: 11
## n= 30
~~~
![Survival curve (death=delivery)](/images/maildelivery/simulated-survivalcurve.png)
### MCMC
More Bayesianly, we can write an interval-censoring model in JAGS, which gives us the opportunity to use an informative prior about the mean time the mailman comes.
They work normal 9-5 hours as far as I know, so we can rule out anything outside 540-1020.
From past experience, I expect the mail to show up no earlier than 10AM (600) and no later than 1PM (780), with those extremes being rare and times around 11AM (650) much more common; so not a uniform distribution over 600-780 but a normal one centered on 650, and, somewhat arbitrarily treating those extremes as ~3 SDs out from the mean of delivery times, SD=~30 minutes---so in all, `dnorm(650, pow(30, -2))`.
The SD itself seems to me like it could range anywhere from a few minutes to an hour, but much beyond that is impossible (if the SD was over an hour, then every so often the mailman would have to come at 8AM! and if it was smaller than 10 minutes, then I would never have noticed much variation in the first place).
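Before running the model, it's worth sanity-checking what this prior actually asserts about delivery times; the normal quantile function makes that concrete (a side calculation, using a copy of the `toClock` helper defined earlier so it runs standalone):

~~~{.R}
toClock <- function(t) {
    h <- floor(t/60)
    m <- floor(t - h*60)
    sprintf("%0.2d:%0.2d", h, m) }
# central 95% & 99.8% mass of the dnorm(650, pow(30,-2)) prior on the mean:
sapply(qnorm(c(0.025, 0.975), mean=650, sd=30), toClock)
## "09:51" "11:48"
sapply(qnorm(c(0.001, 0.999), mean=650, sd=30), toClock)
## "09:17" "12:22"
~~~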
~~~{.R}
library(R2jags)
model1 <- function () {
    for (i in 1:n){
        y[i] ~ dinterval(t[i], dt[i,])
        t[i] ~ dnorm(mu,tau)
    }
    mu ~ dnorm(650, pow(30, -2))
    sd ~ dunif(10, 60)
    tau <- pow(1/sd, 2)
    y.new ~ dnorm(mu, tau)
}
# y=1 == Event=3 for `Surv`: event is hidden inside interval, not observed/left-/right-censored
data <- list("dt"=mailSim, "n"=nrow(mailSim), "y"=rep(1, nrow(mailSim)))
inits <- function() { list(mu=rnorm(1),sd=30,t=as.vector(apply(mailSim,1,mean))) }
params <- c("mu", "sd", "y.new")
j1 <- jags(data,inits, params, model1); j1
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## mu 665.055 9.841 647.724 658.092 664.334 670.983 686.264 1.008 280
## sd 40.153 11.592 18.626 30.643 40.813 49.707 59.048 1.047 51
## y.new 664.103 42.908 577.518 637.561 663.011 689.912 749.936 1.002 1300
## deviance 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1
~~~
Both approaches' point estimates of the mean time, 663-665 (11:03-11:05AM), come close to the true simulation value of 650 (11AM), and the prediction interval of 577-749 also sounds right, validating the models.
The estimated standard deviation isn't as accurate with a wide credible interval, reflecting that it's a harder parameter to estimate and the estimate is still vague with only _n_ = 30.
With a simulation and JAGS set up, we could also see how the posterior estimates of the mean, 95% CI of the mean, and predictive interval change as an additional datapoint is added:
~~~{.R}
animatePosteriors <- function (df, filename, lowerTrue, upperTrue) {
    # http://cran.r-project.org/web/packages/animation/index.html
    library(animation)
    saveGIF(
        for (n in 1:nrow(df)) {
            data <- list("dt"=df[1:n,], "n"=nrow(df[1:n,]), "y"=rep(1, nrow(df[1:n,])))
            inits <- function() { list(mu=rnorm(1), sd=30, t=as.vector(apply(df[1:n,], 1, mean))) }
            params <- c("mu", "sd", "y.new")
            j1 <- jags(data, inits, params, model1, progress.bar="none")
            lowerMean  <- j1$BUGSoutput$summary[c(2),][3]
            medianMean <- j1$BUGSoutput$mean$mu
            upperMean  <- j1$BUGSoutput$summary[c(2),][7]
            lowerPredictive <- j1$BUGSoutput$summary[c(4),][3]
            upperPredictive <- j1$BUGSoutput$summary[c(4),][7]
            ## WARNING: need an environment call for `ggplot` inside a function for local variables like
            ## 'n' to be visible: http://stackoverflow.com/a/29595312/329866
            ## WARNING: & you can't just call qplot due to an animation/ggplot2 bug; have to assign & print
            timeLimits <- seq(from=9*60, to=12.8*60, by=15)
            p <- ggplot(df[1:n,], environment = environment()) +
                coord_cartesian(xlim = timeLimits) +
                scale_x_continuous(breaks=timeLimits, labels=sapply(timeLimits, toClock)) +
                ylab("Day") + xlab("Time") +
                geom_segment(aes(x=df[1:n,]$Time1, xend=df[1:n,]$Time2, y=1:n, yend=1:n)) +
                geom_vline(xintercept=medianMean, color="blue") +
                geom_vline(xintercept=lowerMean, color="green") +
                geom_vline(xintercept=upperMean, color="green") +
                geom_vline(xintercept=lowerPredictive, color="red") +
                geom_vline(xintercept=upperPredictive, color="red") +
                geom_vline(xintercept=lowerTrue, color="red4") +
                geom_vline(xintercept=upperTrue, color="red4")
            print(p)
        },
        interval = 0.7, ani.width = 800, ani.height = 800,
        movie.name = filename) }
simData <- simulateMailbox(200, 650)
confintTrue <- round(qnorm(c(0.025, 0.975), mean=650, sd=30))
lowerPredictiveTrue <- confintTrue[1]
upperPredictiveTrue <- confintTrue[2]
animatePosteriors(simData, "/home/gwern/wiki/images/maildelivery/simulated-inferencesamplebysample.gif", lowerPredictiveTrue, upperPredictiveTrue)
~~~
Probably the most striking aspect of watching these summaries updated datum by datum, to me, is how the estimate of the mean homes in almost immediately on close to the true value (this isn't due solely to the informative prior either, as a completely uninformative `dunif(0,1440)` prior will zoom in within 4 or 5 datapoints as well, after some violent thrashing around).
What happens is that the intervals initially look uninformative---if the first two or three all turn out to be delivered (or all non-delivered), the mean delivery time could still be anywhere from ~11AM to midnight (or vice-versa)---but as soon as even one needle falls the other way, the mean suddenly snaps into tight focus and gets better from there.
While I understand this abrupt transition in hindsight (only a tiny subset of values around the overlapping tips of the needles can yield the observed flip-flops of delivery/non-delivery, while a mean time far from the tips would yield a completely consistent dataset of all deliveries/non-deliveries), I didn't expect this, simply reasoning that "one interval-censored datum seems very uninformative, so it must take many data to yield any sort of decent result for the mean & standard deviation and hence the predictions".
But, even as the mean becomes precisely estimated, the predictive interval---which is what, in the end, we really care about---remains obdurate and broad, because we assume the delivery time is generated by a normal distribution and so the predicted delivery times are the product of not just the mean but the standard deviation as well, and the standard deviation is hard to estimate (a 95% credible interval of 12-51!).
Also in hindsight this is predictable as well, since the flip-flopping needles may be sensitive to the mean, but not to the spread of delivery-times; the data would not look much different than it does if the mailman could deliver anywhere from 8AM to 3PM---on early days she delivers a few hours before I check the mailbox around 11AM and on late days she delivers a few hours after.
These considerations also raise questions about statistical power/optimal experiment design: what are the best times to sample from interval-censored data in order to estimate as precisely as possible with a limited budget of samples?
I searched for material on interval-censored data but didn't find anything directly addressing my question.
The flip-flops suggest that to estimate the mean, one should sample only at the current estimated mean, which maximizes the probability that there will be a net 50-50 split of delivery/non-delivery; but where should one sample for the SD as well?
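One way to make the first half of that guess precise (a side calculation of my own, not drawn from the survival-analysis literature): a single check at time _t_ is a Bernoulli observation---delivered with probability _p_ = Φ((t−μ)/σ)---and the Fisher information such an observation carries about μ is (dp/dμ)²/(p(1−p)), which is indeed maximized by checking at the estimated mean, while the analogous information about σ is zero at the mean and peaks out in the tails:

~~~{.R}
mu <- 650; sigma <- 30
t <- seq(540, 780, by=1)
p <- pnorm(t, mean=mu, sd=sigma)
# information a single check carries about the mean: maximized where p=0.5, ie t=mu
infoMu <- dnorm(t, mean=mu, sd=sigma)^2 / (p * (1-p))
t[which.max(infoMu)]
## 650
# information about the SD: zero at the mean, peaking roughly 1.6 SDs out
infoSigma <- (dnorm(t, mean=mu, sd=sigma) * (t-mu)/sigma)^2 / (p * (1-p))
t[which.max(infoSigma)]
~~~

So to pin down the SD as well, one would want to split checks between the estimated mean and times an hour or so away from it, rather than always sampling at the mean.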
~~~{.R}
mailInterval <- data.frame(
Date=as.POSIXct(c( "2015-06-20 11:00AM", "2015-06-21 11:06AM", "2015-06-23 11:03AM",
"2015-06-24 11:05AM", "2015-06-25 11:00AM", "2015-06-26 10:56AM",
"2015-06-27 10:45AM", "2015-06-29 10:31AM", "2015-06-30 10:39AM", "2015-07-01 10:27AM",
"2015-07-02 10:47AM", "2015-07-03 10:27AM", "2015-07-04 10:54AM", "2015-07-05 10:55AM",
"2015-07-06 11:21AM", "2015-07-07 10:01AM", "2015-07-08 10:20AM", "2015-07-09 10:50AM",
"2015-07-10 11:10AM", "2015-07-11 11:12AM", "2015-07-13 11:05AM", "2015-07-14 11:14AM",
"2015-07-15 11:40AM", "2015-07-16 11:24AM", "2015-07-17 11:03AM", "2015-07-18 10:46AM",
"2015-07-20 11:05AM", "2015-07-21 10:56AM", "2015-07-22 11:00AM", "2015-07-23 11:17AM",
"2015-07-24 11:15AM", "2015-07-27 11:11AM", "2015-07-28 10:44AM", "2015-07-29 11:18AM",
"2015-07-30 11:08AM", "2015-07-31 10:44AM", "2015-08-01 11:25AM", "2015-08-03 10:45AM",
"2015-08-04 10:45AM", "2015-08-05 10:44AM", "2015-08-06 10:33AM", "2015-08-07 10:55AM",
"2015-08-10 11:09AM", "2015-08-11 11:16AM", "2015-08-12 11:14AM", "2015-08-13 11:10AM",
"2015-08-14 11:02AM", "2015-08-15 11:04AM", "2015-08-18 11:15AM", "2015-08-20 11:20AM",
"2015-08-22 11:46AM", "2015-08-23 11:04AM", "2015-08-24 10:56AM", "2015-08-25 10:26AM",
"2015-08-26 11:57AM", "2015-08-27 10:15AM", "2015-08-31 12:35PM", "2015-09-01 11:44AM",
"2015-09-02 10:54AM", "2015-09-03 10:29AM", "2015-09-04 10:08AM", "2015-09-05 10:45AM",
"2015-09-07 10:58AM", "2015-09-08 10:25AM", "2015-09-09 10:29AM", "2015-09-10 10:48AM",
"2015-09-16 11:22AM", "2015-09-17 11:06AM", "2015-09-18 11:22AM", "2015-09-19 11:12AM",
"2015-09-21 11:52AM", "2015-09-23 12:02PM", "2015-09-24 10:00AM", "2015-09-25 12:08PM",
"2015-09-26 10:59AM", "2015-09-28 10:57AM", "2015-09-29 10:52AM" ),
"EDT"),
Delivered=c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE,
FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE,
FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE))
mailInterval$Time <- fromClock(mailInterval$Date)
mail <- with(mailInterval, data.frame(Time1 = ifelse(Delivered, 0, Time),
Time2 = ifelse(Delivered, Time, 1440)))
library(R2jags)
model1 <- function() {
    for (i in 1:n){
        y[i] ~ dinterval(t[i], dt[i,])
        t[i] ~ dnorm(mu,tau)
    }
    mu ~ dnorm(650, pow(30, -2))
    sd ~ dunif(10, 60)
    tau <- pow(1/sd, 2)
    y.new ~ dnorm(mu, tau)
}
data <- list("dt"=mail, "n"=nrow(mail), "y"=rep(1, nrow(mail)))
inits <- function() { list(mu=rnorm(1),sd=30,t=as.vector(apply(mail,1,mean))) }
params <- c("mu","sd", "y.new")
j1 <- jags(data, inits, params, model1, n.iter=100000); j1
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## mu 658.173 4.468 649.098 655.469 658.156 660.943 666.797 1.001 3000
## sd 27.138 7.976 16.034 21.417 25.459 30.916 48.149 1.001 3000
## y.new 658.098 27.960 601.899 640.912 657.942 675.067 712.985 1.001 3000
## deviance 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1
animatePosteriors(mail, "/home/gwern/wiki/images/maildelivery/real-inferencesamplebysample.gif", 0, 0)
~~~
### ABC
Because JAGS provides an interval-censored distribution in the form of `dinterval()` with a [likelihood function](!Wikipedia), we can use MCMC for inverse inference (reasoning from data to the underlying process).
But if it didn't, I wouldn't know how to write a likelihood function for it, and then the MCMC wouldn't work; I was, however, able to write a little simulation of how the underlying process of delivery-and-checking works, which, given a set of parameters, spits out simulated results generated by the process---probability or forward inference (reasoning from a version of an underlying process to see what data it creates).
This is a common situation: you can write a good simulation simply by describing how you think something works, but you can't write a likelihood function.
[ABC](!Wikipedia "Approximate Bayesian computation") (exemplified in the fun example ["Tiny Data, Approximate Bayesian Computation and the Socks of Karl Broman"](http://www.sumsar.net/blog/2014/10/tiny-data-and-the-socks-of-karl-broman/); see also Wilkinson's [intro](https://darrenjw.wordpress.com/2013/03/31/introduction-to-approximate-bayesian-computation-abc/ "Introduction to Approximate Bayesian Computation (ABC)") & [summary statistics](https://darrenjw.wordpress.com/2013/09/01/summary-stats-for-abc/ "Summary stats for ABC") posts) is a remarkably simple and powerful idea which lets us take a forward simulation and use it to run backwards inference.
(Reminds me a little of [Solomonoff induction](!Wikipedia "Solomonoff's theory of inductive inference").)
There's [an R package, of course](https://cran.r-project.org/web/packages/abc/index.html) which implements nice features & optimizations, but ABC is so simple we might as well write our own for transparency.
The simplest ABC goes like this:
You sample possible parameters from your prior, feed the set of parameters into your simulation, and if the result is identical to your data, you save that set of parameters.
At the end, you're left with a bunch of sets and that's your posterior distribution which you can look at the histograms of and calculate 95% densities etc.
So for the mail data, ABC goes like this:
~~~{.R}
simulateMailbox <- function(n, dTime, dSD) {
    deliveryTime <- round(rnorm(n, mean = dTime, sd = dSD))
    checkTime    <- round(rnorm(n, mean = dTime, sd = dSD))
    simulates <- mapply(function(ck, dy) { if (ck > dy) { return(c(0, ck)) } else { return(c(ck, 1440)) } },
                        checkTime, deliveryTime)
    return(data.frame(Time1=simulates[1,], Time2=simulates[2,])) }
# if both data-frames are sorted, comparison is easier
mailSorted <- mail[order(mail$Time1),]
mail_sim <- replicate(100000, {
    # mu ~ dnorm(650, 20)
    mu <- rnorm(n=1, mean=650, sd=20)
    # sd ~ dunif(10, 60)
    sd <- runif(n=1, min=10, max=60)
    newData <- simulateMailbox(nrow(mailSorted), mu, sd)
    newDataSorted <- newData[order(newData$Time1),]
    if (all(newDataSorted == mailSorted)) { return(c(Mu=mu, SD=sd)) } })
results <- Filter(function(x) {!is.null(x)}, mail_sim)
results <- data.frame(t(sapply(results,c)))
summary(results)
~~~
The first thing to note is efficiency: I can get a reasonable number of samples in a reasonable amount of time for _n_ = 1-3, but at 4 datapoints, it becomes slow.
There are so many possible datasets when 4 checks are simulated that almost all get rejected for not being identical to the real dataset, and it takes millions of samples and hours to run.
And this problem only gets worse for _n_ = 5 and bigger.
To run ABC more efficiently, you relax the requirement that the simulated data equal the real data exactly, and instead accept a pair of parameters if the simulated data is 'close enough' to the real data in terms of some (hopefully sufficient) summary statistic, like the mean.
I don't know what the sufficient statistics are for a set of interval-censored data, but I figure that if the means of the pairs of times are similar in both datasets, then they are probably close enough for ABC to work, so I can use that as a rejection tolerance; implementing that and playing around, it seems I can make the difference in means as tight as <1 minute while still running fast.
~~~{.R}
mail_abc <- function(samples) {
    results <- list()
    n <- 0
    while (n < samples) {
        # draw parameters from the priors:
        mu <- rnorm(n=1, mean=650, sd=20)
        sd <- runif(n=1, min=10, max=60)
        # simulate a dataset of the same size, and accept the parameters if the
        # means of both interval endpoints are within <1 minute of the real data's:
        newData <- simulateMailbox(nrow(mail), mu, sd)
        if (abs(mean(newData$Time1) - mean(mail$Time1)) < 1 &&
            abs(mean(newData$Time2) - mean(mail$Time2)) < 1) {
            n <- n + 1
            results[[n]] <- c(Mu=mu, SD=sd) }
    }
    return(data.frame(do.call(rbind, results))) }
~~~

> "...It is no surprise that when this model fails, it is the [likelihood](!Wikipedia "Likelihood function") rather than the [prior](!Wikipedia "Prior probability") that is causing the problem. In the binomial model under consideration here, the prior comes into the posterior distribution only once and the likelihood comes in _n_ times. It is perhaps merely an accident of history that **skeptics and subjectivists alike strain on the gnat of the prior distribution while swallowing the camel that is the likelihood.**"
>
> [Andrew Gelman](!Wikipedia), 2013^[Emphasis added; ["'Not Only Defended But Also Applied': The Perceived Absurdity of Bayesian Inference"](http://www.stat.columbia.edu/~gelman/research/published/feller8.pdf), Gelman & Robert 2013; ["The prior can generally only be understood in the context of the likelihood"](https://arxiv.org/abs/1708.07487), Gelman et al 2017.]

In setting up an analysis, one must [make a variety of choices](http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf "'The garden of forking paths: Why multiple comparisons can be a problem, even when there is no `fishing expedition` or `p-hacking` and the research hypothesis was posited ahead of time', Gelman & Loken 2013") of what models to use, which influences the result; this neglect of "model uncertainty" can lead to overconfident inferences or cause one to miss critical details.
In Bayesian analyses, the prior typically comes in for the most scrutiny as to how it determines the result; but the prior part of the model often has much less influence than the structural or functional form of the analysis---where it is assumed, for example, that the outcome is drawn from a normal distribution or another such common choice, even when it is not.
This choice itself is often arbitrary and can determine the result almost regardless of what the data says.
For example, the normal distribution has thin tails, so the probability of something being far out on a tail will be extremely small; combined with other considerations like measurement error, this can result in a model refusing to ever accept that a value really is 'extreme' without equally extreme amounts of data---a refusal that would not happen if any of a number of other perfectly possible distributions had been chosen instead, like the log-normal distribution or [Cauchy distribution](!Wikipedia). Yet the choice of normal distribution itself is usually due more to convenience than to any extremely strong evidence that the normal distribution is the true distribution ("every model is false, but some are useful"). The best response is simply to note that [one man's modus ponens is another man's modus tollens](/Modus): the probability that [one's math or model is wrong](/The-Existential-Risk-of-Mathematical-Error) sets an upper bound on how much we can get out of them, and a bound which often bars many of the inferences one might try to extract.
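To put a number on how restrictive thin tails are, compare the probability of landing 5 scale-units above the center under a normal versus a Cauchy distribution (a standalone illustration):

~~~{.R}
# probability of exceeding the center by >=5 scale units under each model:
pnorm(5, lower.tail=FALSE)   # normal: ~2.9e-7
pcauchy(5, lower.tail=FALSE) # Cauchy: ~0.063
pcauchy(5, lower.tail=FALSE) / pnorm(5, lower.tail=FALSE)
## ~220,000: the same observation is ordinary under one model & miraculous under the other
~~~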
For example:
1. the normal distribution is commonly used for its analytic tractability and asymptotic properties, but it has some aspects that are, in a sense, *too* powerful and restrictive: specifically, that its tails are thin. Thin tails can cause serious underestimation of rare phenomena (and not just in cases involving power laws) and can lead to absurdly powerful conclusions. Two examples:
- in [one analysis of charities](http://lesswrong.com/lw/745/why_we_cant_take_expected_value_estimates/ "Why We Can't Take Expected Value Estimates Literally (Even When They're Unbiased)"), the writer uses normal distributions and thus winds up arguing that with error in our assessments of charities' values, our estimate must be shrunken far towards the mean (which is entirely true); but [this is circular](http://lesswrong.com/lw/gzq/bayesian_adjustment_does_not_defeat_existential/), since by assuming normal distributions he has also assumed away the existence of large differences in value between charities.
If one were to grant the apparently innocuous assumption of normally distributed effectiveness of interventions, one would be forced to ignore that there *are* large differences---consider [knitting sweaters for penguins](http://www.giantflightlessbirds.com/2011/10/the-great-penguin-sweater-fiasco/ "The Great Penguin Sweater Fiasco") vs [stopping children from being infected by parasites](!Wikipedia "Schistosomiasis Control Initiative") or [killed by malaria](!Wikipedia "Against Malaria Foundation"); are they really within a few standard deviations of each other? Should we really ignore interventions like vaccines which claim to save lives at a cost of no more than tens of thousands of dollars, on the grounds that most charities require hundreds of thousands or millions of dollars to achieve goals like that? If we look at [Global Burden of Disease](!Wikipedia) and see extreme skews in what causes loss of QALYs, with some classifications being hundreds or thousands of times more damaging than others (like indoor fires), is it more likely that the data is wrong than right solely because the normal distribution says that the largest problems are many standard deviations out? It's clear that whatever the right distribution for modeling charity or health outcomes is, it is probably not literally & exactly a normal distribution and it should allow for large differences; and having abandoned the normal, we then lose the apparently magical ability to make confident predictions about how many charities are how effective.
- [in another analysis](http://polymatharchives.blogspot.com/2015/01/the-inappropriately-excluded.html), the author, a Michael Ferguson, argues that American society discriminates to a truly astonishing degree against people with high IQs, claiming that people with 150IQ have 97% less chance of achieving success than people with the ideal IQ of ~133, and thus, that past that, the chance of success drops almost to 0%, and that this is due to vicious constant discrimination & persecution. His analysis takes the strategy of noting the population IQ is normally distributed, by definition, at $\mathcal{N}(100,15)$ and taking a few old studies of elite groups like students at Harvard Medical School and noting the studies report means like 122 and standard deviations like 6, inferring the students are normally distributed at $\mathcal{N}(122,6)$, and then noting that when these two normal distributions are superimposed, there will be almost no elites as high as IQ 140 or 150 despite there being many such people in the general population and this underrepresentation gets more and more extreme as one goes further out on the tail; why did all the high IQ population members fail to show up in the elite distribution? It must be discrimination. Thus, he grimly warns his readers that "If your IQ is over 150, it is a clarion call; without direct intervention, your career prospects are very poor. If you are the parent of a child with a D15IQ over 150, immediate and dramatic action is required."
This would be a remarkable fact if true for many reasons (not least the waste & damage to science and humanity), but that is unlikely: because the underrepresentation is a mathematical necessity of the normal distribution due to the elite distribution being defined as having a smaller SD. The assumption that the elite distribution is normal drives the entire remarkable, unlikely result. But why assume elites are normally distributed? Most of the processes by which elite schools such as Harvard would select from the general population would *not* produce a normal distribution: if selected at a SAT threshold, the result will not be normal but a truncated normal; if selected on multiple factors such as test scores and personality, it will more likely be log-normal; and so on. That the studies of the Harvard students reported a mean and SD does not show that the student distribution was normal, and in fact, it's easy to simulate scenarios like the threshold model or a model in which increasing IQ produces increasing odds of admission, and show that the reported means & SDs are entirely possible under other highly non-normal distributions (intuitively, people as high as 140 or 150 are so rare that they don't affect the summary statistics much, regardless of whether they are heavily discriminated against or heavily favored for admission). The shocking conclusion follows from the mathematical/statistical assumption, but where did this assumption come from and should we grant it?
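That simulation is easy to carry out; a minimal sketch of the sharp-threshold scenario (the cutoff of IQ 115 is an arbitrary choice that happens to roughly reproduce the reported summary statistics):

~~~{.R}
set.seed(2015)
population <- rnorm(1e6, mean=100, sd=15)
# admit everyone above a sharp threshold: the result is a truncated normal, not a normal
admitted <- population[population > 115]
c(mean(admitted), sd(admitted))
## ~122.9 ~6.7 --- the kind of elite-group summary statistics reported
# a normal fitted to those summary statistics badly underpredicts the far tail:
pnorm(150, mean=mean(admitted), sd=sd(admitted), lower.tail=FALSE)
mean(admitted > 150)
## the true proportion above 150 is ~100x larger than the fitted normal implies,
## despite zero 'discrimination' anywhere in the selection process
~~~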
2. linear models have a similar problem as normal distributions: while convenient and simple, they have well-known problems at the extremes of the data or when projected beyond it. Textbooks give examples like a linear regression on ice cream sales predicting negative sales during the winter, or predicting that sales in summer will be orders of magnitude higher, and it's easy to make up other examples like a linear regression on sprinting times predicting that in another century, sprinters will finish races before they have started. Linear models have no problem predicting negative values backwards in time, or predicting absurdly high amounts in the far future, but they are convenient, and for the most part it's easy to simply ignore the nonsensical predictions or, if they're causing problems, use a technical fix: logging values, switching to a log-normal distribution (with its strictly positive support), using a sigmoid or logarithmic fit to account for the inevitable flattening out, handling ratio data with beta regression, fixing the end-values with spline regression, etc; if we make the mistake of modeling a bacterial population in an agar dish as exponentially increasing, we'll soon learn otherwise once a sigmoid or logistic curve makes a better fit. Mark Twain offered an amusing criticism in _[Life on the Mississippi](!Wikipedia)_, and indeed, one can often detect model problems by watching for instances where one gets "wholesale returns of conjecture out of such a trifling investment of fact", for that indicates that we may be engaged in postulating, which has all "the advantages of theft over honest toil":
> In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.
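Twain's arithmetic is easily replicated as a naive linear extrapolation (taking the Lower Mississippi's then-length as ~1,022 miles, an assumption chosen to match his "mile and three-quarters"):

~~~{.R}
## Twain's extrapolation, literally: 242 miles lost over 176 years, projected linearly
shorteningRate <- 242/176     # ~1.375 miles/year ("a trifle over one mile and a third")
shorteningRate * 1e6          # = 1,375,000 miles long in the "Old Oolitic Silurian"
1022 - shorteningRate * 742   # = 1.75 miles long 742 years from now
~~~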
An interesting use of OLS linear models is given in [Sharov & Gordon 2013](https://arxiv.org/abs/1304.3381 "Life Before Earth"), which follows up a [2006 paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1526419/ "'Genome increase as a clock for the origin and evolution of life', Sharov 2006"). By taking the [genome length](!Wikipedia "Genome size") of 8 biological groups (eg prokaryote genomes are ~5 logged, and prokaryotes originated 3-4bya, so it is assumed that genome lengths were ~5 logged 3-4bya) and regressing against date of origin (on a log scale), their linear model, in the mother of all extrapolations, extrapolates backwards to the smallest possible genome (and hence, origin of all life) around 9 billion years ago---well before Earth could have supported life or even existed, and consistent only with a panspermia hypothesis.
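The flavor of this extrapolation can be reproduced in miniature (the genome-length/date pairs below are made-up illustrative values, not Sharov & Gordon's actual dataset): regress log genome length on origin date, then solve for where the line hits zero:

~~~{.R}
## toy Sharov-style regression: x = origin in billions of years ago,
## y = log10 genome length; values are illustrative stand-ins, not the paper's data
bya    <- c(3.5, 2.0, 1.0, 0.5, 0.1)
logLen <- c(5.0, 6.5, 7.3, 7.8, 8.5)
fit    <- lm(logLen ~ bya)
## extrapolate backwards to a zero-length genome = implied "origin of life":
-coef(fit)[["(Intercept)"]] / coef(fit)[["bya"]]  # ~8.6bya, before the Earth existed
predict(fit, data.frame(bya=10))                  # negative 'genome length' at 10bya
~~~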
This is problematic for a number of reasons: genome length is being used as a clock, but the animal and bacteria genomes involved are all *contemporary* (since no DNA older than a million years or so has ever been recovered or the full length counted), which raises some questions about why we think genome length has any connection with time and how exactly this linear increase process is supposed to work (is evolution supposed to have stopped for the prokaryotes 3-4bya and their genome lengths been static ever since?); how they defined those particular taxonomic groups out of all the possibilities and those particular genome lengths & dates (and a questionable reason for excluding plants); measurement error and the very small data set imply considerable uncertainty in the estimates & a rejection of Earthly origins even at face value; genome lengths seem to have little to do with the passage of time but reflect peculiarities of biology & selection (viruses are under more selection pressure, and have faster generations, than any other kind of organism, and have innovated baffling numbers of tricks for infection, and yet instead of having ultra-long genomes reflecting these endless generations and innovations, they have ultra-small ones; and similarly, the Human Genome Project yielded humbling results in showing the human genome to be smaller than those of many organisms we humans look down upon---even some single-celled organisms---while genome sizes can be bizarrely huge, like the ferns' or [Mimivirus](!Wikipedia)'s) and vary tremendously within groups; there is no direct evidence showing that early life's genome lengths increased at the claimed trendline, and no particular reason to expect recent trends to hold exactly to the origins of life and times when life may not have been DNA-based or even RNA-based at all; genomes are known to shrink in size as well as increase under the right conditions (and one would hope that contemporary prokaryotes are more genetically efficient than the earliest
prokaryotes); mutation load & transcription errors place an upper bound on how large a genome *can* be ([Ohno 1972](/docs/genetics/selection/1972-ohno.pdf "An Argument for the Genetic Simplicity of Man and other Mammals")); many genomic phenomena like retroviral insertions or selfish jumping [transposons](!Wikipedia "Transposable element") change genome length for reasons having nothing to do with elapsed time; it's unclear why one should endorse constant increase in genome size rather than a population-genetics perspective of equilibria and occasional shifts; horizontal gene flow makes it unclear how much one can compare with multicellular organisms & vertical transmission; Gould argues that any increase in complexity can be due to a random walk with a lower bound, in which case there is no underlying trend at all, merely stochastic extremes occasionally reached; and the wide variations in "[junk DNA](!Wikipedia)" cannot be explained if genome length has much at all to do with a clock ticking since divergence from other species.
More relevantly here is the model misspecification: the model will happily project past 9bya and estimate how long the genomes were---in negative units---10bya. (To say nothing of extrapolating a few billion years into the future.) This is nonsensical: whatever life may use for genomes, it must be positive and never negative. So linearity is known a priori to be false and to break down at some point: the linear model is guaranteed to be incorrect in its predictions early in history; now, if one must reject the extrapolations for 10bya because the model has broken down by that point, why should one believe the projections from before the breakdown...? Why not infer that any linearity appears only later, after such regime-changing events as the creation of life, the switch to DNA, the expansion over the entire world, the oxygen catastrophe, etc? (If your ice cream linear model predicts selling -1000 ice cream cones at the North Pole, why believe it when it predicts selling -10 ice cream cones in Canada?) Why not use a model like a sigmoid which at least has a chance of being correct? The problem, of course, is that if one tried to fit a sigmoid or another such distribution or curve which could fit the true data, one would discover that there is no early data which could give any indication of how fast genome lengths were increasing very early on, like 4bya, since all the available data is from *now*. Where does the idea come from that genome lengths would increase slowly even in the first life forms, which discovered a virgin planet of unexploited organic goop? Examined closely, since there is no evidence for early linearity, we see that the modeling produces no additional evidence, and is merely an exercise in showing the consequences of an assumption: "if one assumes that genome lengths increase linearly [...] then with this particular set of lengths and assumed times, it logically follows that the first organisms originated 9bya".
This is a thoroughly bizarre premise which cannot survive the slightest contact with known sizes of genomes and influences, so one gets no more out of this than what one put in: since there was no particular reason to accept that assumption out of all possible assumptions, there's no additional reason to accept the conclusion as true either. On its own, it cannot give us any additional evidence for panspermia.
3. independence is often a questionable model assumption: many datapoints are highly correlated with each other and calculations assuming independence will become grotesquely overconfident. Is it more likely that the probability of a tied election is 10^-90^ and one could run trillions of elections without ever once coming near a tie, or that [the binomial assumption is simply wrong about voters](http://andrewgelman.com/2014/12/25/common-sense-statistics/) being independent and victory margins being accordingly distributed? Is it more likely that stock market movements are thin-tailed and large crashes would happen only once in billions of years, or that crashes are common and the distribution is fat-tailed? Is it more likely that there are many strong predictors of human populations' wealth and social factors, or that they are [highly correlated with neighbors due to descent & proximity & common geographic conditions](!Wikipedia "Galton's problem")? Is it more likely that the models were correct and the fisheries just accidentally collapsed, or that the models did not take into account autocorrelation/multilevelness and overstated accuracy through [pseudoreplication](!Wikipedia)? etc
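The election example can be made concrete with a sketch (the numbers are hypothetical: 1 million voters, candidate A favored by 51%): under the independence assumption an exact tie is essentially impossible, but letting the voters share a common swing---modeled as uncertainty in _p_ itself---makes ties ~85 orders of magnitude more likely:

~~~{.R}
n <- 1e6
## independent voters, each voting A with p=0.51: P(exact tie) ~ 10^-90
log10(dbinom(n/2, n, prob=0.51))
## correlated voters: a shared national swing makes p itself uncertain,
## say p ~ Beta with mean 0.51 & SD 0.02; then P(tie) is the integral of
## dbinom(n/2,n,p) over the Beta density, which is ~ dbeta(0.5,a,b)/(n+1)
## because the integral of dbinom(k,n,p) dp over [0,1] equals 1/(n+1):
m <- 0.51; s <- 0.02
nu <- m*(1-m)/s^2 - 1; a <- m*nu; b <- (1-m)*nu
dbeta(0.5, a, b) / (n+1) # ~2e-5: once every ~50,000 such elections, not once in 10^90
~~~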
In the case of mail deliveries, there's clearly model uncertainty.
While I used [normal distributions](!Wikipedia), I could as easily have used [_t_-distributions](!Wikipedia "Student's t-distribution") (fatter-tailed normals), [log-normals](!Wikipedia "Log-normal distribution"), [exponentials](!Wikipedia "Exponential distribution"), [uniforms](!Wikipedia "Uniform distribution (discrete)"), [betas](!Wikipedia "Beta distribution"), [overdispersed](!Wikipedia "Overdispersion") versions of some of the previous, or picked some even more exotic [univariate distribution](http://www.math.wm.edu/~leemis/chart/UDR/UDR.html "Univariate Distribution Relationships") and easily come up with a just-so story for some of them.
(It's a negative-binomial because we're modeling minutes until "failure" ie delivery; no, it's a log-normal because each stop slows down the mailman and delays multiply; no, it's a normal distribution because each stop adds time and the sum of many small random deviates is itself normally distributed; no, it's a _t_-distribution because sometimes the mail arrives much earlier or later and it's a more robust distribution we should be using anyway; no, it's even wider than that and a uniform distribution because the mailman starts and ends their shift at particular times but otherwise doesn't care about speed; no, it's a beta distribution over time because that's more flexible and can include the others as special-cases...)
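These just-so stories are nearly indistinguishable in the middle of the data; where they disagree is in the tails, which is what matters for late-delivery risk. A quick comparison of 99.9th-percentile delivery times, moment-matching each candidate to roughly the same mean (~650 minutes) & SD (~30)---parameter choices of mine for illustration, not fitted values:

~~~{.R}
## matched-moment tail comparison: the 99.9th-percentile delivery time under each story
qnorm(0.999, mean=650, sd=30)   # normal:     ~743 minutes
sdlog   <- sqrt(log(1 + (30/650)^2))
meanlog <- log(650) - sdlog^2/2
qlnorm(0.999, meanlog, sdlog)   # log-normal: ~749, barely fatter
650 + 30 * qt(0.999, df=5)      # t(5):       ~827, much fatter
qexp(0.999, rate=1/650)         # exponential: ~4490 minutes---an absurd 3-day tail
~~~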
Model uncertainty can be handled several ways ([Hooten & Hobbs 2015](https://www.stat.colostate.edu/~hooten/papers/pdf/Hooten_Hobbs_EcolMono_2015.pdf "A guide to Bayesian model selection for ecologists")), including:
- Penalty or regularization or sparsity priors (only handles selection of variables/predictors; gains largely eliminated in this case by informative priors)
- Diagnostics like posterior predictive checks, crossvalidation, and [AIC](!Wikipedia "Akaike information criterion")/[BIC](!Wikipedia "Bayesian information criterion")/[DIC](!Wikipedia "Deviance information criterion")/[WAIC](https://arxiv.org/abs/1208.6338 "'A Widely Applicable Bayesian Information Criterion', Watanabe 2012") ([overview](http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf "'Understanding predictive information criteria for Bayesian models', Gelman et al 2013")) can be used to check for lack of fit; but like tests in software development, they can only show the presence of bugs & not their absence
- Maximally flexible models can be used like nonparametric Bayesian models; if "all models are wrong", we can use models which are arbitrarily flexible in order to minimize the influence of our model misspecification ([Dunson 2013](/docs/statistics/bayes/2013-dunson.pdf "Nonparametric Bayes"))
- [Ensemble approaches](!Wikipedia "Ensemble learning") (often targeted at the narrow use-case of selecting variables/predictors in a linear model, but some are applicable to choosing between different distributions as well), particularly Bayesian approaches ([popularization](http://lesswrong.com/lw/hzu/model_combination_and_adjustment/); [Friel & Wyse 2011](https://arxiv.org/abs/1111.1957 "Estimating the model evidence: a review"); [Hooten & Hobbs 2015](http://www.esajournals.org/doi/full/10.1890/14-0661.1 "A guide to Bayesian model selection for ecologists")):
    - Bayesian model choice: models can be pitted against each other (eg using Bayes factors) implemented using [nested sampling](!Wikipedia), or [reversible jump MCMC](!Wikipedia)/product-space method ([Carlin & Chib 1995](http://stats.ma.ic.ac.uk/~das01/MyWeb/SCBI/Papers/CarlinChib.pdf "Bayesian model choice via Markov chain Monte Carlo methods"), [Lodewyckx et al 2011](http://ejwagenmakers.com/2011/LodewyckxEtAl2011.pdf "A tutorial on Bayes factor estimation with the product space method") & [source code](http://ppw.kuleuven.be/okp/software/scripts_tut_bfepsm/), [Tenan 2014](/docs/statistics/bayes/2014-tenan.pdf "Bayesian model selection: The steepest mountain to climb") & [source code](/docs/statistics/bayes/2014-tenan-supplement.zip)), or ABC, and the one with highest posterior probability is selected for any further analysis while all others are ignored
    - Bayesian model averaging: posterior probabilities of models are calculated, but all models are retained and their posteriors are combined weighted by the probability that model is the true one ([Hoeting et al 1999](https://projecteuclid.org/download/pdf_1/euclid.ss/1009212519 "Bayesian model averaging: a tutorial")); but with enough data, the posterior probability of the model closest to the truth will converge on 1 and BMA becomes a model selection procedure
    - Bayesian model combination: posterior probabilities of combinations/ensembles of models are calculated, yielding a richer set of 'models' which often outperform all of the original models; such as by learning an optimal weighting in a linear combination ([Monteith et al 2011](http://axon.cs.byu.edu/papers/Kristine.ijcnn2011.pdf "Turning Bayesian Model Averaging Into Bayesian Model Combination"))
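For concreteness, Bayesian model averaging's predictive distribution is simply a mixture of each model's posterior predictive, weighted by the posterior model probabilities; a sketch with hypothetical probabilities & predictive samples (not values estimated from the mail data):

~~~{.R}
set.seed(2015)
## hypothetical posterior model probabilities & posterior predictive samples
## (delivery times in minutes-since-midnight) for a normal vs t model:
modelProbs <- c(Normal=0.7, Student=0.3)
predN <- rnorm(10000, mean=650, sd=30)
predT <- 650 + 30 * rt(10000, df=5)
## BMA predictive: each draw comes from a model chosen with its posterior probability
pick    <- runif(10000) < modelProbs["Normal"]
bmaPred <- ifelse(pick, predN, predT)
quantile(bmaPred, c(0.025, 0.975)) # wider than the normal model's interval alone
~~~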
Of the non-ensemble approaches: regularization is already done via the informative priors; some of the diagnostics might work but others won't (JAGS won't calculate DIC for interval-censored data); nonparametric approaches would be too challenging for me to implement in JAGS and I'm not sure there is enough data to allow the nonparametric models to work well.
This leaves PPC, crossvalidation, and the ensemble approaches:
- Bayesian model combination is overkill
- Bayesian model averaging, given Bayesian model choice, may also be overkill and not meaningfully improve predictions enough to change final decisions about mail check times
- Bayesian model choice: nested sampling doesn't seem applicable, leaving reversible-jump/product-space; while JAGS examples are available in tutorial papers, I found them lengthy, complicated by using realistic data & models, hardwired to their specific use-cases, & still too difficult to understand with my limited JAGS skills. So I wound up implementing model choice using ABC.
### PPC
Posterior predictive checks (PPC) are a technique recommended by Gelman as a test of how appropriate the model is (as the model or likelihood often is far more important in forcing particular results than the priors one might've used), where one generates possible observations from the model and inferred posteriors, and sees how often the simulated data matches the real data; if that is rarely or never, then the model may be bad and one should fix it or possibly throw in additional models for Bayesian model comparison (which will hopefully then pick out the right model).
For continuous data or more than trivially small data, we have the same issue as in ABC (which is similar to PPCs): the simulated datasets will never exactly match the real data, so we must define some sort of summary or distance measure which lets us say that a particular simulated dataset is similar enough to the real data.
In the two summaries I tried for ABC, counting 'flip-flops' worked better, so I'll reuse that here.
For the simulation, we also need check or interval-censoring times, which I also treat as a normal distribution & estimate their means and SDs from the original data.
~~~{.R}
posteriorTimes <- k2$BUGSoutput$sims.list[["y.new2015"]]

## the check-time distribution is constant across simulations, so estimate it once:
nChecks    <- nrow(mail)
checkTimes <- c(mail[mail$Time1>0,]$Time1, mail[mail$Time2<1440,]$Time2)
meanCheck  <- mean(checkTimes)
sdCheck    <- sd(checkTimes)

simsPPC <- replicate(10000, {
    ## simulate one dataset: delivery times drawn from the posterior predictive,
    ## check times drawn from the estimated check distribution, then interval-censored:
    deliveryTime <- sample(posteriorTimes, size=nChecks, replace=TRUE)
    checkTime    <- round(rnorm(nChecks, mean=meanCheck, sd=sdCheck))
    newSamples   <- mapply(function (ck, dy) { if (ck>dy) { return(c(0,ck)) } else { return(c(ck,1440)) } },
                           checkTime, deliveryTime)
    newData <- data.frame(Time1=newSamples[1,], Time2=newSamples[2,])
    ## summary statistic: in how many checks had the mail already been delivered?
    return(sum(newData$Time1 == 0))
    })
png(file="~/wiki/images/maildelivery/ppc.png", width = 800, height = 500)
qplot(simsPPC) + geom_vline(xintercept=sum(mail$Time1 == 0), color="blue")
invisible(dev.off())
~~~
![Similarity of posterior predictive distribution's flip-flops with the original data's flip-flops](/images/maildelivery/ppc.png)
The true data has a summary near the middle of the distribution of the summaries of the posterior predictions, suggesting that the normal distribution's fit to the problem is OK (at least, when considered in isolation).
### Crossvalidation
CV comes in several flavors, but since the data is not big, I can use leave-one-out crossvalidation.
The usual error measures are unavailable due to interval-censoring, so after some pondering, I went with a zero-one loss based on whether a delivery is predicted or not at a particular check-time.
Mean-squared-error is out due to the interval-censoring, as are variants like absolute error. Thinking about it some, it seems to me that each datapoint is really made of two things: the check-time, and the delivery status. The check-time has nothing to do with the model quality and the model isn't trying to predict it; what I want the model to predict is whether the mail is delivered or not *if* I were to check at a particular time. The check-time is the predictor variable and the delivery-status is the response variable.
So I could ask the model's posterior predictive distribution of delivery-times (`y.new`) whether or not the mail would or would not be delivered at a particular time, and compare it against the held-out datapoint's actual delivery status, 1 if the model correctly predicts delivered/not-delivered and 0 if it predicts the opposite of what the data said. (So a zero-one loss.)
The base-rate of delivery vs non-delivery will be 50-50, so a good model should achieve a better mean zero-one loss than 0.5.
For distributions, we'll compare the normal used all along, the _t_-distribution (for its fatter tails), and a little more exotically, the log-normal & exponential distributions as well.
Of these, the first two are the most likely, and I think it's highly unlikely that the log-normal or exponential could fit better (because they will tend to predict extremely late deliveries, much later than I believe is at all realistic, while the _t_'s spread is more reasonable in accommodating some non-normality but not too much). I include those two just to see what will happen and test out the robustness of any approach used.
The 4 models in JAGS:
~~~{.R}
modelN <- "model { for (i in 1:n){
              y[i] ~ dinterval(t[i], dt[i,])
              t[i] ~ dnorm(mu,tau)
              }
           mu ~ dnorm(650, pow(30, -2))
           sd ~ dunif(10, 60)
           tau <- pow(1/sd, 2)
           y.new ~ dnorm(mu, tau) }"
modelT <- "model { for (i in 1:n){
              y[i] ~ dinterval(t[i], dt[i,])
              t[i] ~ dt(mu,tau,nu)
              }
           nu ~ dexp(1/30)
           mu ~ dnorm(650, pow(30, -2))
           sd ~ dunif(10, 60)
           tau <- pow(1/sd, 2)
           y.new ~ dt(mu, tau, nu) }" # the posterior predictive must be t-distributed too, not normal
modelL <- "model { for (i in 1:n) {
              y[i] ~ dinterval(t[i], dt[i,])
              t[i] ~ dlnorm(muL, tauL)
              }
           sd ~ dunif(10, 60)
           tauL <- pow(1/log(sd), 2)
           mu ~ dnorm(650, pow(30, -2))
           muL <- log(mu)
           y.new ~ dlnorm(muL, tauL) }"
modelE <- "model { for (i in 1:n){
              y[i] ~ dinterval(t[i], dt[i,])
              t[i] ~ dexp(theta)
              }
           # set the mean ~600, but remember rate is reciprocal:
           theta ~ dexp(1/600)
           y.new ~ dexp(theta) }"
~~~
Here's a try at implementing leave-one-out crossvalidation for those four JAGS models on the interval-censored mail data with a zero-one loss defined that way based on delivery status:
~~~{.R}
runModel <- function(newData, model, iter=5000) {
    ## set up and run the model and extract predicted delivery times
    data <- list("dt"=newData, "n"=nrow(newData), "y"=rep(1, nrow(newData)))
    inits <- function() { list(mu=rnorm(1,mean=600,sd=30), sd=30, t=as.vector(apply(newData,1,mean))) }
    params <- c("y.new")
    cv1 <- jags(data, inits, params, textConnection(model), n.iter=iter, progress.bar="none")
    posteriorTimes <- cv1$BUGSoutput$sims.list[["y.new"]]
    return(posteriorTimes)
}
loocvs <- function(dt, model) {
    results <- NULL
    for (i in 1:nrow(dt)) {
        ## set up relevant data for this particular fold in the crossvalidation
        ith <- dt[i,]
        newData <- dt[-i,] # drop the _i_th datapoint
        checkTime <- if (ith$Time1==0) { ith$Time2 } else { ith$Time1 }
        there <- if (ith$Time1==0) { TRUE } else { FALSE }
        posteriorTimes <- runModel(newData, model, iter=500) # train on the held-in data only
        ## score predictions against heldout data point
        results[i] <- mean(sapply(posteriorTimes,
function(t) { if (t