Caffeine wakeup experiment

Self-experiment on whether consuming caffeine immediately upon waking results in less time in bed & higher productivity. The results indicate a small and uncertain effect.
experiments, biology, statistics, decision-theory, Zeo, R, power-analysis, Bayes
2013-04-072016-11-10 finished certainty: possible importance: 5


One trick to com­bat morn­ing slug­gish­ness is to get caffeine extra-early by using caffeine pills shortly before or upon try­ing to get up. From 2013-2014 I ran a blinded & place­bo-con­trolled ran­dom­ized exper­i­ment mea­sur­ing the effect of caffeine pills in the morn­ing upon awak­en­ing time and daily pro­duc­tiv­i­ty. The esti­mated effect is small and the pos­te­rior prob­a­bil­ity rel­a­tively low, but a deci­sion analy­sis sug­gests that since caffeine pills are so cheap, it would be worth­while to con­duct another exper­i­ment; how­ev­er, increas­ing Zeo equip­ment prob­lems have made me hold off addi­tional exper­i­ments indefi­nite­ly.

With the com­ing of win­ter, I, like so many other peo­ple, have started to find sleep­ing in to be too tempt­ing: why get out of bed into the cold air when I can just snug­gle under my cov­ers and drowse another hour? This is bad because I was get­ting suffi­cient sleep as it was and did­n’t need more, and because I think it may exac­er­bate sleep iner­tia as the wak­ing process is dragged out for a long time. All in all, the days seemed less pro­duc­tive and drea­rier when­ever I crawled out of bed an hour later than usu­al.

Then I was reminded by Kaj Sotala of an blog post I’d seen a while back, “The Early Bird gets the Caffeine Pill”:

I set my alarm to 6:00 and 8:00. At 6:00 I go up, take a 50mg caffeine pill, and go to bed again. Then I sleep and wake up rested and ener­getic around 8. In my case the time for the pill to start work­ing seems to be 1.5 hours. A dose of one pill ensures that I wake up (but still yawn­ing) while two pills makes me start the day much more quick­ly. The added ben­e­fit is of course a reg­u­lar sleep sched­ule.

It sounds log­i­cal enough (why would­n’t a caffeine pill work?), and he cites a study suc­cess­fully try­ing a sim­i­lar trick with naps. I’d meant to try it out at some point, and win­ter was as good a rea­son as any. I already had an ample sup­ply of caffeine pills (tech­ni­cal­ly, pirac­etam+­caffeine+other­s), so I had just been pro­cras­ti­nat­ing on doing a design & set­ting up my usual RCT. I decided that I might as well try it out as a sim­ple easy non-blinded qua­si­-ex­per­i­men­tal alter­nate-day pilot exper­i­ment and if I felt like it after a month or two of data, I might try an RCT.

So on 2013-11-04, I started keep­ing a lit­tle jar of my caffeine+pirac­etam pills by my bed­side and using them on alter­nate days (specifi­cal­ly, my Zeo Smart­Wake fires in the 9-9:30AM win­dow and I take it then, while I may or may not snooze on). I stopped around April 2014.

Pilot analysis

The cor­re­la­tional data shows a 15-20 minute differ­ence in rise-time between caffeine & non-caffeine days.

First, does morn­ing caffeine affect total sleep or time awake? I would­n’t expect so, since it’s aimed at reduc­ing morn­ing wake­ful­ness:

zeo <- read.csv("https://www.gwern.net/docs/zeo/2014-06-28-gwern-zeodata-caffeinecorrelation.csv")
zeo$Morning.Caffeine <- as.logical(zeo$Morning.Caffeine)

wilcox.test(Total.Z ~ Morning.Caffeine, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Total.Z by Morning.Caffeine
# W = 2244, p-value = 0.7168
# alternative hypothesis: true location shift is not equal to 0

wilcox.test(Time.in.Wake ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Time.in.Wake by Morning.Caffeine
# W = 2090, p-value = 0.7623
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
#  -5  3
# sample estimates:
# difference in location
#                     -1

We should be able to see a shift in rise or wake time to an ear­lier time:

## convert "05/12/2014 06:45" to "06:45"
zeo$Rise.Time <- sapply(strsplit(as.character(zeo$Rise.Time), " "), function(x) { x[[2]] })
## convert "06:45" to 24300
interval <- function(x) { if (!is.na(x)) { if (grepl(" s",x)) as.integer(sub(" s","",x))
                                           else { y <- unlist(strsplit(x, ":"));
                                                  as.integer(y[[1]])*60 + as.integer(y[[2]]); }
                                                  }
                          else NA
                          }
zeo$Rise.Time <- sapply(zeo$Rise.Time, interval)
## `hist(zeo$Rise.Time)` looks normally distributed, but there's a big outlier, so we'll use a U-test:
wilcox.test(Rise.Time ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Rise.Time by Morning.Caffeine
# W = 2705, p-value = 0.01863
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
#   5 40
# sample estimates:
# difference in location
#                     20

A defi­nite hit! Ris­ing 20 min­utes ear­lier seems like a plau­si­ble esti­mate, too. Let’s take a look at the graph of rise-time over time:

zeo$Sleep.Date <- as.Date(zeo$Sleep.Date, format="%m/%d/%Y")
library(ggplot2)
qplot(Sleep.Date, Rise.Time, color=Morning.Caffeine, data=zeo)
What time I got up in the morn­ing, Novem­ber–June 2014; col­ored by whether affected by a caffeine wake-up pill

Two obser­va­tions imme­di­ately jump out:

  1. the blue points (caffeine-affect­ed) do seem to gen­er­ally be below the red points (caffeine-free) and the U-test’s claim is believ­able
  2. there seem to be very dis­tinct tem­po­ral pat­terns, which make any cor­re­la­tions or analy­sis treach­er­ous: before/after exper­i­ments will be worth­less since they will sam­ple from dis­tinct peri­ods of ris­ing-time, so an exper­i­ment should defi­nitely be blocked as pairs-of-days to min­i­mize the clear drift or sinu­soidal pat­tern.

A more pre­cise analy­sis with covari­ates is pos­si­ble; for exam­ple, depend­ing on how late I went to bed, that might affect when I get up in the morn­ing. But you have to be care­ful in what you look at - if you look at some­thing like ‘total sleep length’, well, that’s par­tially caused by sleep­ing in! It must be impos­si­ble for the vari­ables to be affected by sleep­ing in or not. So, Total.Z, Time.in.REM, etc are all out as covari­ates (at least, with­out dig­ging into the time-series data to cre­ate new ver­sions which exclude any sleep time after 8AM or so). I think we can safely include:

  1. how long it took to fall asleep;
  2. what time I went to sleep; which gives us a smaller esti­mate of 15 min­utes:
zeo$Start.of.Night <- sapply(strsplit(as.character(zeo$Start.of.Night), " "), function(x) { x[[2]] })
zeo$Start.of.Night <- sapply(zeo$Start.of.Night, interval)
summary(lm(formula = Rise.Time ~ Morning.Caffeine + Start.of.Night + Time.to.Z, data = zeo))
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -137.86  -32.13    1.84   32.29  109.22
#
# Coefficients:
#                      Estimate Std. Error t value Pr(>|t|)
# (Intercept)            63.982     45.647    1.40    0.163
# Morning.CaffeineTRUE  -15.847      8.321   -1.90    0.059
# Start.of.Night          0.519      0.100    5.17  7.7e-07
# Time.to.Z               0.286      0.271    1.05    0.294

Final­ly, let’s check for dam­age to my sleep; it’s no good avoid­ing sleep­ing in if that then makes me feel like shit:

wilcox.test(ZQ ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  ZQ by Morning.Caffeine
# W = 2086, p-value = 0.7491
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
#  -4  3
# sample estimates:
# difference in location
#                     -1
wilcox.test(Morning.Feel ~ Morning.Caffeine, conf.int=TRUE, data=zeo)
#
#   Wilcoxon rank sum test with continuity correction
#
# data:  Morning.Feel by Morning.Caffeine
# W = 2069, p-value = 0.6568
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
#  -1.34e-05  1.98e-05
# sample estimates:
# difference in location
#             -5.209e-05

These are the 2 main mea­sures of whether sleep qual­ity have degrad­ed, and both look good. So it seems the morn­ing caffeine cor­re­lates with ear­lier ris­ings but not with worse sleep or feel­ing bad when I get up.

Cor­re­la­tion!=­cau­sa­tion; there’s a plau­si­ble alter­na­tive: on days when I feel like sleep­ing in, I ‘for­got’ to take a caffeine pill. So it’s worth test­ing. How long does the exper­i­ment need to be for 80% power and a shift of 20 min­utes? (not 15m since not sure how reli­able that esti­mate is)

## Calculate effect size, plug into power formula:
t.test(Rise.Time ~ Morning.Caffeine, data=zeo)
#
#     Welch Two Sample t-test
#
# data:  Rise.Time by Morning.Caffeine
# t = 2.746, df = 81.84, p-value = 0.007417
# alternative hypothesis: true difference in means is not equal to 0
# 95% confidence interval:
#   6.23 38.99
# sample estimates:
# mean in group FALSE  mean in group TRUE
#               299.9               277.2
sd(zeo$Rise.Time)
# [1] 65.19
(299.9 - 277.2) / 65.19
# [1] 0.3482
power.t.test(d=0.3482, power=0.80, type="paired", alternative="one.sided")
#
#      Paired t test power calculation
#
#               n = 52.37
#           delta = 0.3482
#              sd = 1
#       sig.level = 0.05
#           power = 0.8
#     alternative = one.sided
#
# NOTE: n is number of *pairs*, sd is std.dev. of *differences* within pairs

Using d = 0.35 as an effect size esti­mate, a proper blind exper­i­ment (block­ing pairs of days) will take >100 days (>50 placebo pills, >50 caffeine pill­s).

First morning caffeine experiment

I began 2014-06-29. I made the placebo pills the usual way with Bisquick, tossed together with the caffeine pills to equal­ize any coat­ing; I made 120, more than I need­ed, because it’s always annoy­ing to set up & make pills, and it only took 40 min­utes from start to cleanup. Unfor­tu­nate­ly, a few days into the exper­i­ment it became clear that my old caffeine pills had absorbed some ambi­ent mois­ture and the toss­ing had not equal­ized the sur­face fla­vor, so the placebo pills could be eas­ily dis­tin­guished from the caffeine pills by both fla­vor & tex­ture, ren­der­ing this not a blinded & ran­dom­ized exper­i­ment but just a ran­dom­ized exper­i­ment. I ran out of caffeine on 2016-05-04 and ter­mi­nated the exper­i­ment with n = 441.

Analysis

Data preparation

caffeine.r <- read.csv("~/wiki/docs/zeo/2016-05-04-caffeinemorning.csv", colClasses=c("Date", "integer"))
caffeine.c <- read.csv("~/wiki/docs/zeo/2014-06-28-gwern-zeodata-caffeinecorrelation.csv")
caffeine.c$Date <- as.Date(caffeine.c$Sleep.Date, format="%m/%d/%Y")
mp <- read.csv("~/selfexperiment/mp.csv", colClasses=c("Date", "integer"))
caffeine <- merge(mp, merge(caffeine.r, subset(caffeine.c, select=c("Date", "Morning.Caffeine")), all=TRUE), all=TRUE)
## combine Morning.Caffeine.r + Morning.Caffeine while preserving the NAs:
caffeine$Caffeine <- unlist(Map(function(a,b) { if (!is.na(a) & !is.na(b)) { return(a+b); } else
    { if(is.na(a)) { return(b);} else { return(a);}}},
    caffeine$Morning.Caffeine.r, caffeine$Morning.Caffeine))

zeo <- read.csv("~/wiki/docs/zeo/gwern-zeodata.csv")
zeo$Date <- as.Date(zeo$Sleep.Date, format="%m/%d/%Y")
zeo$Rise.Time <- sapply(strsplit(as.character(zeo$Rise.Time), " "), function(x) { x[2] })
interval <- function(x) { if (!is.na(x)) { if (grepl(" s",x)) as.integer(sub(" s","",x))
                                           else { y <- unlist(strsplit(x, ":")); as.integer(y[[1]])*60 + as.integer(y[[2]]); }
                                         }
                          else NA
                        }
zeo$Rise.Time <- sapply(zeo$Rise.Time, interval)
zeo[(zeo$Date >= as.Date("2013-03-11")),]$Rise.Time  <-
 (zeo[(zeo$Date >= as.Date("2013-03-11")),]$Rise.Time + 226) %% (24*60)
zeo$Start.of.Night <- sapply(strsplit(as.character(zeo$Start.of.Night), " "), function(x) { x[2] })
zeo$Start.of.Night <- sapply(zeo$Start.of.Night, interval)

zeoCaffeine <- merge(caffeine, zeo, all=TRUE)

nightSubset <- subset(zeoCaffeine[zeoCaffeine$Date>=as.Date("2013-11-10") & zeoCaffeine$Date<=as.Date("2016-08-14"),],
  select=c("Date", "Start.of.Night", "Time.to.Z", "Total.Z", "Time.in.Wake", "Rise.Time", "Morning.Caffeine.r", "Morning.Caffeine", "Caffeine", "MP"))
write.csv(file="~/wiki/docs/zeo/2016-05-04-caffeine.csv", nightSubset, row.names=FALSE)

Bayesian analysis

caffeine <- read.csv("https://www.gwern.net/docs/zeo/2016-05-04-caffeine.csv", colClasses=c("Date", rep("numeric", 9)))

wilcox.test(Rise.Time ~ Morning.Caffeine.r, conf.int=TRUE, data=caffeine)
#   Wilcoxon rank sum test with continuity correction
#
# data:  Rise.Time by Morning.Caffeine.r
# W = 20672.5, p-value = 0.4470533
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -5.000008791 14.999948639
# sample estimates:
# difference in location
#             4.99994868

library(ggplot2)
qplot(Date, Rise.Time, color=as.logical(Morning.Caffeine.r), data=caffeine) + theme(legend.position = "none") + stat_smooth()
Wak­ing up time vs ran­dom­ized caffeine pill, 2014-2016

The U-test is unim­pressed, but we’ll con­tinue on with a more detailed medi­a­tion analy­sis, exam­in­ing the impact of the two caffeine vari­ables on Rise.Time and MP adjust­ing for Start.of.Night (as the only avail­able sleep vari­able unaffected by the inter­ven­tion). The plot of rise time shows clear tem­po­ral trends, which makes it a very good thing that ran­dom­iza­tion was done on a daily basis; we must be espe­cially cau­tious about any­thing which might be messed up by tem­po­ral trends, such as data impu­ta­tion (if I try expand­ing the dataset by imput­ing ‘0’ for all the days which did indeed not have caffeine, the results would be that using caffeine makes me wake up later, because much of the non-caffeine use is con­founded with ear­lier rise times ear­lier in the dataset).

I also trans­form to more nor­mal­ity some sleep vari­ables which are skewed, scale the dataset, and add some mildly infor­ma­tive pri­ors (im­ply­ing mostly that effects larger than 1SD are unlike­ly) to sta­bi­lize the blavaan run:

library(blavaan)
## transform
caffeine$Time.to.Z.root <- caffeine$Time.to.Z^(1/3)
caffeine$Rise.Time.root <- sqrt(caffeine$Rise.Time)
caffeine$Total.Z.2 <- caffeine$Total.Z^2
caffeine$Time.in.Wake.log <- log1p(caffeine$Time.in.Wake)

model1 <- 'Rise.Time.root    ~ Start.of.Night + Time.to.Z.root + Morning.Caffeine.r + Morning.Caffeine
           Total.Z.2         ~ Start.of.Night + Time.to.Z.root + Morning.Caffeine.r + Morning.Caffeine
           Time.in.Wake.log  ~ Start.of.Night + Time.to.Z.root + Morning.Caffeine.r + Morning.Caffeine

           MP                ~ Start.of.Night + Time.to.Z.root + Total.Z.2 + Time.in.Wake.log +
                               Morning.Caffeine.r + Morning.Caffeine + Rise.Time.root'

b <- bsem(model1, convergence="auto", burnin=300000,
          dp = dpriors(nu = "dnorm(0,1)", alpha = "dnorm(0,1)", beta = "dnorm(0,200)"),
          n.chains=7, jagcontrol=list(method="rjparallel"), fixed.x=FALSE,
          data=scale(caffeine[,-1]))
summary(b)
#                                                   Used       Total
#   Number of observations                           994        1016
#
#   Number of missing patterns                         7
#
#   Statistic                                 MargLogLik         PPP
#   Value                                      -8208.015       0.000
#
# Parameter Estimates:
#
#   Information                                     MCMC
#   Standard Errors                                 MCMC
#                      Post.Mean  Post.SD  HPD.025  HPD.975     PSRF         Prior
#   Rise.Time.root ~
#     Start.of.Night       0.230    0.039    0.152    0.306    1.039  dnorm(0,200)
#     Time.to.Z.root       0.008    0.033   -0.056    0.072    1.014  dnorm(0,200)
#     Morning.Cffn.r      -0.041    0.052   -0.142    0.063    1.046  dnorm(0,200)
#     Morning.Caffen      -0.175    0.077   -0.313   -0.019    1.024  dnorm(0,200)
#   Total.Z.2 ~
#     Start.of.Night      -0.307    0.037   -0.379   -0.235    1.015  dnorm(0,200)
#     Time.to.Z.root       0.073    0.031    0.012    0.134    1.009  dnorm(0,200)
#     Morning.Cffn.r      -0.040    0.048   -0.134    0.055    1.028  dnorm(0,200)
#     Morning.Caffen      -0.121    0.097   -0.294    0.071    1.055  dnorm(0,200)
#   Time.in.Wake.log ~
#     Start.of.Night      -0.079    0.034   -0.144    -0.01    1.002  dnorm(0,200)
#     Time.to.Z.root       0.130    0.032    0.068    0.192    1.001  dnorm(0,200)
#     Morning.Cffn.r       0.000    0.043   -0.086    0.083    1.003  dnorm(0,200)
#     Morning.Caffen      -0.030    0.061   -0.145    0.094    1.025  dnorm(0,200)
#   MP ~
#     Start.of.Night      -0.042    0.036   -0.112     0.03    1.000  dnorm(0,200)
#     Time.to.Z.root       0.069    0.033    0.005    0.132    1.000  dnorm(0,200)
#     Total.Z.2           -0.068    0.035   -0.137        0    1.002  dnorm(0,200)
#     Time.in.Wak.lg      -0.004    0.032   -0.067    0.058    1.000  dnorm(0,200)
#     Morning.Cffn.r       0.025    0.042   -0.058    0.107    1.003  dnorm(0,200)
#     Morning.Caffen       0.027    0.052   -0.073    0.131    1.003  dnorm(0,200)
#     Rise.Time.root      -0.056    0.034   -0.123    0.012    1.001  dnorm(0,200)
#
# Covariances:
#                         Post.Mean  Post.SD  HPD.025  HPD.975     PSRF         Prior
#   Start.of.Night ~~
#     Time.to.Z.root         -0.328    0.036     -0.4    -0.26    1.002    dbeta(1,1)
#     Morning.Cffn.r          0.039    0.071   -0.103    0.182    1.371    dbeta(1,1)
#     Morning.Caffen         -0.198    0.167   -0.519    0.128    1.200    dbeta(1,1)
#   Time.to.Z.root ~~
#     Morning.Cffn.r         -0.054    0.053   -0.157    0.051    1.152    dbeta(1,1)
#     Morning.Caffen          0.067    0.105   -0.136     0.28    1.236    dbeta(1,1)
#   Morning.Caffeine.r ~~
#     Morning.Caffen          0.161    0.216   -0.241    0.623    1.236    dbeta(1,1)
# ...

The fairly sim­i­lar esti­mates for Morning.Caffeine.r and Morning.Caffeine sug­gests that the qua­si­-ex­per­i­men­tal alter­nate-day inter­ven­tion yields the same effect as the ran­dom­ized one and they can be merged for a more pre­cise esti­mate (since blavaan does­n’t yet sup­port mul­ti­level mod­el­ing):

model2 <- 'Rise.Time.root    ~ Start.of.Night + Time.to.Z.root + Caffeine
           Total.Z.2         ~ Start.of.Night + Time.to.Z.root + Caffeine
           Time.in.Wake.log  ~ Start.of.Night + Time.to.Z.root + Caffeine

           MP                ~ Start.of.Night + Time.to.Z.root + Total.Z.2 + Time.in.Wake.log +
                               Caffeine + Rise.Time.root'
b2 <- bsem(model2, convergence="auto", burnin=60000,
          dp = dpriors(nu = "dnorm(0,1)", alpha = "dnorm(0,1)", beta = "dnorm(0,200)"),
          n.chains=7, jagcontrol=list(method="rjparallel"), fixed.x=FALSE,
          data=scale(caffeine[,-1]))
summary(b2)
#                                                   Used       Total
#   Number of observations                           994        1016
#
#   Number of missing patterns                         6
#
#   Statistic                                 MargLogLik         PPP
#   Value                                      -8206.054       0.000
#
# Parameter Estimates:
#
#   Information                                     MCMC
#   Standard Errors                                 MCMC
#
# Regressions:
#                      Post.Mean  Post.SD  HPD.025  HPD.975     PSRF         Prior
#   Rise.Time.root ~
#     Start.of.Night       0.254    0.031    0.193    0.315    1.000  dnorm(0,200)
#     Time.to.Z.root       0.009    0.031   -0.052    0.068    1.000  dnorm(0,200)
#     Caffeine            -0.013    0.037   -0.085    0.059    1.001  dnorm(0,200)
#   Total.Z.2 ~
#     Start.of.Night      -0.290    0.030   -0.349    -0.23    1.000  dnorm(0,200)
#     Time.to.Z.root       0.076    0.030    0.018    0.137    1.000  dnorm(0,200)
#     Caffeine            -0.043    0.037   -0.116    0.031    1.001  dnorm(0,200)
#   Time.in.Wake.log ~
#     Start.of.Night      -0.077    0.032   -0.139   -0.014    1.000  dnorm(0,200)
#     Time.to.Z.root       0.131    0.031    0.067     0.19    1.000  dnorm(0,200)
#     Caffeine             0.007    0.039   -0.069    0.085    1.001  dnorm(0,200)
#   MP ~
#     Start.of.Night      -0.047    0.035   -0.116     0.02    1.000  dnorm(0,200)
#     Time.to.Z.root       0.068    0.032    0.006    0.133    1.000  dnorm(0,200)
#     Total.Z.2           -0.071    0.033   -0.137   -0.006    1.000  dnorm(0,200)
#     Time.in.Wak.lg      -0.005    0.032   -0.069    0.057    1.000  dnorm(0,200)
#     Caffeine             0.040    0.036   -0.028    0.111    1.000  dnorm(0,200)
#     Rise.Time.root      -0.063    0.033   -0.129        0    1.000  dnorm(0,200)
#
# Covariances:
#                     Post.Mean  Post.SD  HPD.025  HPD.975     PSRF         Prior
#   Start.of.Night ~~
#     Time.to.Z.root     -0.329    0.036     -0.4   -0.258    1.001    dbeta(1,1)
#     Caffeine            0.064    0.068   -0.061    0.203    1.065    dbeta(1,1)
#   Time.to.Z.root ~~
#     Caffeine            0.013    0.048   -0.077    0.107    1.038    dbeta(1,1)
# ...

BF(b, b2)
# Laplace approximation to the log-Bayes factor (experimental):
#   -1.962

library(lavaan)
s  <- sem(model1, se="boot", missing="FIML", data=scale(caffeine[,-1]))
s2 <- sem(model2, se="boot", missing="FIML", data=scale(caffeine[,-1]))
anova(s, s2)
#    Df       AIC       BIC    Chisq Chisq diff Df diff Pr(>Chisq)
# s   3 16241.960 16374.307 61.42858
# s2  3 16279.827 16392.567 97.02922   35.60063       0 < 2.22e-16
summary(s)
# lavaan (0.5-20) converged normally after  25 iterations
#
#                                                   Used       Total
#   Number of observations                           994        1016
#
#   Number of missing patterns                         7
#
#   Estimator                                         ML
#   Minimum Function Test Statistic               61.429
#   Degrees of freedom                                 3
#   P-value (Chi-square)                           0.000
#
# Parameter Estimates:
#
#   Information                                 Observed
#   Standard Errors                            Bootstrap
#   Number of requested bootstrap draws             1000
#   Number of successful bootstrap draws            1000
#
# Regressions:
#                      Estimate  Std.Err  Z-value  P(>|z|)
#   Rise.Time.root ~
#     Start.of.Night      0.267    0.045    5.992    0.000
#     Time.to.Z.root      0.056    0.040    1.403    0.161
#     Morning.Cffn.r     -0.107    0.085   -1.261    0.207
#     Morning.Caffen     -0.378    0.052   -7.323    0.000
#   Total.Z.2 ~
#     Start.of.Night     -0.395    0.079   -5.029    0.000
#     Time.to.Z.root      0.095    0.047    2.005    0.045
#     Morning.Cffn.r     -0.087    0.079   -1.107    0.268
#     Morning.Caffen     -0.344    0.077   -4.439    0.000
#   Time.in.Wake.log ~
#     Start.of.Night     -0.097    0.041   -2.374    0.018
#     Time.to.Z.root      0.165    0.038    4.369    0.000
#     Morning.Cffn.r     -0.004    0.053   -0.075    0.940
#     Morning.Caffen     -0.092    0.066   -1.390    0.165
#   MP ~
#     Start.of.Night     -0.052    0.042   -1.227    0.220
#     Time.to.Z.root      0.082    0.036    2.287    0.022
#     Total.Z.2          -0.079    0.047   -1.685    0.092
#     Time.in.Wak.lg     -0.005    0.035   -0.136    0.892
#     Morning.Cffn.r      0.042    0.050    0.845    0.398
#     Morning.Caffen      0.045    0.074    0.609    0.543
#     Rise.Time.root     -0.049    0.049   -1.005    0.315

A log BF of >0 favors the sec­ond mod­el, while <0 favors the first mod­el; so a log BF of -1.962 or (ex­po­nen­ti­at­ed) a BF of 0.14 sug­gests that the merged caffeine vari­able is ~1/6th as likely as the orig­i­nal model and we should­n’t use it. This agrees with the lavaan-based ANOVA and AICs/BICs which favor the orig­i­nal two-caffeine-vari­able.

Total effect

The SEM expresses the fact that the morn­ing caffeine can affect a vari­ety of out­comes, which can them­selves affect the most impor­tant out­come, MP. So the morn­ing caffeine has a direct effect on the MP rat­ing, but also has an indi­rect effect through sleep vari­ables like rise time or total sleep. There’s a 78% prob­a­bil­ity that the caffeine caused an ear­lier rise time, but that does­n’t answer the broader ques­tion, for which we need to extract all the pos­si­ble path­ways and esti­mate a total effect.

samples <- combine.mcmc(b@external$runjags$mcmc)
## Morning.Caffeine.r is the third variable in the samples, "beta[3,1]"
posteriorSamplesRiseTimeCaffeine <- samples[,3]
mean(posteriorSamplesRiseTimeCaffeine<0)
# [1] 0.7870357143

# beta[7,1]
posteriorSamplesTotalzCaffeine <- samples[,7]
# beta[11,1]
posteriorSamplesTimeinwakeCaffeine <- samples[,11]
# beta[17,1]
posteriorSamplesMPCaffeine <- samples[,17]
# beta[19,1]
posteriorSamplesMPRiseTime <- samples[,19]
# beta[15,1]
posteriorSamplesMPTotalz <- samples[,15]
# beta[16,1]
posteriorSamplesMPTimeinwake <- samples[,16]

totalEffectMPCaffeine <- posteriorSamplesMPCaffeine + posteriorSamplesRiseTimeCaffeine*posteriorSamplesMPRiseTime +
    posteriorSamplesTotalzCaffeine*posteriorSamplesMPTotalz +
    posteriorSamplesMPTimeinwake*posteriorSamplesTimeinwakeCaffeine
mean(totalEffectMPCaffeine>0)
# [1] 0.76425
summary(totalEffectMPCaffeine * sd(caffeine$MP, na.rm=TRUE))
# 1. Empirical mean and standard deviation for each variable,
#    plus standard error of the mean:
#           Mean             SD       Naive SE Time-series SE
#   0.0301945943   0.0424681462   0.0002537957   0.0007091098
# 2. Quantiles for each variable:
#         2.5%          25%          50%          75%        97.5%
# -0.054155266  0.001929353  0.030434397  0.058675436  0.112366851

The indi­rect and direct effects cumu­la­tively imply a 76% prob­a­bil­ity that the morn­ing caffeine use caused an increase in my MP rat­ings.

Decision analysis

How much is that worth in prac­tice? 200mg caffeine pills cost ~$0.05/pill; with a mean effect of 0.03 (1/33), if I found it worth­while, I would implic­itly value a +1 in my MP rat­ing at $2.5.

(1 / (mean(totalEffectMPCaffeine) * sd(caffeine$MP, na.rm=TRUE))) * 0.05
# [1] 2.495673554
summary(totalEffectMPCaffeine * sd(caffeine$MP, na.rm=TRUE) * 4.8 - 0.05)
# 1. Empirical mean and standard deviation for each variable,
#    plus standard error of the mean:
#           Mean             SD       Naive SE Time-series SE
#   0.0461664235   0.1352563209   0.0008083111   0.0022584358
# 2. Quantiles for each variable:
#        2.5%         25%         50%         75%       97.5%
# -0.22247850 -0.04385523  0.04693017  0.13687474  0.30787592
(0.0461664235*365.25) / log(1.05)
# [1] 345.60831

I would cer­tainly pay $2.4/day or $75/month for a daily boost of +1; I would pay up to, I think, around $150/month ($4.8/day), sug­gest­ing that the profit would tech­ni­cally be profitable at ~$0.046/day (ig­nor­ing the has­sle of tak­ing a pill before wak­ing up) or a NPV of $345. That said, I would put hav­ing to take a pill before wak­ing up at some­where around $0.05 so… it’s a wash. Why both­er? That’s a triv­ial near-zero gain and it’s based on flimsy evi­dence with con­sid­er­able pos­si­bil­ity of net neg­a­tive effects.

My opin­ion here is that the ran­dom­ized result shows that it’s not worth­while for me despite the highly promis­ing ini­tial results.

What is the value of fur­ther exper­i­men­ta­tion? EVSI for a $12 jar of 240x200mg caffeine pills:

evsis <- replicate(50, {
    n_additional <- 240 # $12 jar of 200mg caffeine pills

    ## bootstrap collection of additional data from the randomized experiment:
    newData <- caffeine[!is.na(caffeine$Morning.Caffeine.r),]
    indices <- sample(1:nrow(newData), 240, replace=TRUE)
    newData <- rbind(caffeine, newData[indices,])

    ## delete Morning.Caffeine from the model since we don't need it anymore
    model3 <- 'Rise.Time.root    ~ Start.of.Night + Time.to.Z.root + Morning.Caffeine.r
               Total.Z.2         ~ Start.of.Night + Time.to.Z.root + Morning.Caffeine.r
               Time.in.Wake.log  ~ Start.of.Night + Time.to.Z.root + Morning.Caffeine.r

               MP                ~ Start.of.Night + Time.to.Z.root + Total.Z.2 + Time.in.Wake.log +
                                   Morning.Caffeine.r + Rise.Time.root'

    ## evaluate
    b_new <- bsem(model3, convergence="auto", burnin=300000, test="none",
                  dp = dpriors(nu = "dnorm(0,1)", alpha = "dnorm(0,1)", beta = "dnorm(0,200)"),
                  n.chains=7, jagcontrol=list(method="rjparallel"), fixed.x=FALSE,
                  data=scale(newData[,-1]))

    ## extract posteriors for coefficients, combine by paths, to get a net effect
    samples <- combine.mcmc(b_new@external$runjags$mcmc)
    posteriorSamplesRiseTimeCaffeine <- samples[,3]
    posteriorSamplesTotalzCaffeine <- samples[,7]
    posteriorSamplesTimeinwakeCaffeine <- samples[,11]
    posteriorSamplesMPCaffeine <- samples[,17]
    posteriorSamplesMPRiseTime <- samples[,19]
    posteriorSamplesMPTotalz <- samples[,15]
    posteriorSamplesMPTimeinwake <- samples[,16]
    totalEffectMPCaffeine <- posteriorSamplesMPCaffeine + posteriorSamplesRiseTimeCaffeine*posteriorSamplesMPRiseTime +
        posteriorSamplesTotalzCaffeine*posteriorSamplesMPTotalz +
        posteriorSamplesMPTimeinwake*posteriorSamplesTimeinwakeCaffeine
    ## estimate profit
    oldProfit <- 0
    newProfit <- mean(totalEffectMPCaffeine * sd(caffeine$MP, na.rm=TRUE) * 4.8 - 0.10) / log(1.05)
    print(newProfit)
    return(max(oldProfit, newProfit)) }
)
mean(evsis) - 12
# [1] 42.35478143

So I am going to have to run another self­-ex­per­i­ment.