Bitter Melon for blood glucose

Analysis of whether bitter melon reduces blood glucose in one self-experiment and utility of further self-experimentation
statistics, decision-theory, R, Bayes
2015-09-14–2016-07-29 finished certainty: likely importance: 6

I re-analyze a bitter-melon/blood-glucose self-experiment, finding a small effect of increasing blood glucose after correcting for temporal trends & daily variation, giving both frequentist & Bayesian analyses. I then analyze the self-experiment from a subjective Bayesian decision-theoretic perspective, cursorily estimating the costs of diabetes & benefits of intervention in order to estimate the Value of Information for the self-experiment and the benefit of further self-experimenting; I find that the expected value of more data (EVSI) is negative and further self-experimenting would not be optimal compared to trying out other anti-diabetes interventions.

Bitter melon is an Asian fruit which may reduce blood glucose levels (the studies apparently conflict and one might be a priori dubious of it1). In June 2015, Paul LaFontaine ran a self-experiment on 900mg of bitter melon extract taken with breakfast (Vitamin World brand: 100x450mg, $25), randomized daily for 20 days, measuring blood glucose levels with a normal fingerprick test kit before breakfast, at 10AM, and at 3PM; the 20-day randomization was followed by a single block of 13 days with bitter melon use. There was no placebo-control or blinding.

LaFontaine reports that his t-tests indicate no statistically-significant effects, with point-values indicating bitter melon harmfully increases blood glucose.


The first thing to note is what we see on taking his data, converting it to long format, and plotting it:

library(ggplot2)

melon <- read.csv(stdin(), header=TRUE)
# (data pasted in from LaFontaine; elided)

melon2 <- reshape(melon, varying=c("Read.wake", "Read.10am", "Read.3pm"),
                  timevar="Measurement", direction="long")
melon2 <- melon2[order(melon2$Date),]
qplot(Date, Read, color=as.logical(Bitter.Melon), data=melon2) +
 geom_smooth(aes(group=1)) + geom_point(size=I(4)) + theme(legend.position = "none")
LaFontaine 2015 June bitter melon self-experiment for reducing blood glucose levels

The plot has an unmistakable time trend: blood glucose levels increase almost linearly over time.


So while LaFontaine is certainly correct that bitter melon did not statistically-significantly lower total daily blood glucose levels, the point-estimates are disturbingly large:

with(melon, t.test((Read.wake+Read.10am+Read.3pm) ~ Bitter.Melon))
#   Welch Two Sample t-test
# data:  (Read.wake + Read.10am + Read.3pm) by Bitter.Melon
# t = -1.0975011, df = 16.927915, p-value = 0.2877892
# alternative hypothesis: true difference in means is not equal to 0
# 95% confidence interval:
#  -30.459040732   9.618131641
# sample estimates:
# mean in group 0 mean in group 1
#     314.1250000     324.5454545
with(melon, wilcox.test((Read.wake+Read.10am+Read.3pm) ~ Bitter.Melon, conf.int=TRUE))
#   Wilcoxon rank sum test with continuity correction
# data:  (Read.wake + Read.10am + Read.3pm) by Bitter.Melon
# W = 29.5, p-value = 0.2472612
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
#  -35.999976618   7.000064149
# sample estimates:
# difference in location
#            -11.2627309

Neither the t-test nor the U-test can yield valid results because their assumptions are violated (datapoints come from different distributions depending on what time they were collected), and since bitter melon is backloaded in the final 13 days, where blood glucose is higher than ever (for unknown reasons), that alone could drive the estimates of harm. So the temporal trend needs to be modeled somehow; one way, without going to full-blown time-series models, is to regress on the index of the date.

Another thing to notice in the plot is that blood glucose tests are both highly variable within a day and highly variable between days as well. The variability between days implies that a multilevel model would be a good fit here. The variability within days, on the other hand, implies that the different measurement times should be treated as covariates themselves, but also that we should remember that home blood glucose tests are not infinitely precise: the manufacturers claim an accuracy of something like ±5ng/ml (and in using my own blood glucose test strips, I find they can be much noisier than that), so modeling the measurement error would be worthwhile.
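To see why both variance components matter, here is a quick illustrative simulation (not LaFontaine's data; the day-level SD of ~4 and within-day SD of ~9 are rough assumptions chosen to match the magnitudes the multilevel fit later reports):

```r
set.seed(2015)
days <- 35
# simulate 35 days x 3 readings: a per-day mean plus within-day/measurement noise
dayMeans <- rnorm(days, mean=105, sd=4)
d <- data.frame(Date = rep(1:days, each=3),
                Read = rep(dayMeans, each=3) + rnorm(days*3, sd=9))
# between-day spread of daily means (inflated by within-day noise / sqrt(3)):
betweenDay <- sd(aggregate(Read ~ Date, data=d, FUN=mean)$Read)
# typical within-day spread:
withinDay <- mean(aggregate(Read ~ Date, data=d, FUN=sd)$Read)
betweenDay; withinDay
```

Both components come out clearly non-zero, which is the situation where pooling readings naively (as the t-test does) misattributes day-to-day drift to the treatment.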

A first stab at controlling for the temporal effect reduces the bitter melon estimate a bit, to 9.7:

summary(lm(I(Read.wake + Read.10am + Read.3pm) ~ Bitter.Melon + Exercise + as.integer(Date), data=melon))
# ...Coefficients:
#                     Estimate  Std. Error  t value   Pr(>|t|)
# (Intercept)      300.0368557  18.3528242 16.34827 5.7259e-11
# Bitter.Melon       9.7182611  10.0285767  0.96906    0.34788
# Exercise           2.7759120  13.3097375  0.20856    0.83760
# as.integer(Date)   1.1515280   0.8078333  1.42545    0.17450
# Residual standard error: 21.23433 on 15 degrees of freedom
#   (16 observations deleted due to missingness)
# Multiple R-squared:  0.1810341, Adjusted R-squared:  0.01724088
# F-statistic:  1.10526 on 3 and 15 DF,  p-value: 0.3777754

Switching to long format lets us immediately fit a multilevel model with random effects for days while also treating time of day as a covariate; the bitter melon estimate has now dropped to around a third of that, 3.7:

library(lme4)
mlm <- lmer(Read ~ Exercise + Bitter.Melon + Measurement + as.integer(Date) + (1|Date), data=melon2)
summary(mlm)
# ...Random effects:
#  Groups   Name        Variance Std.Dev.
#  Date     (Intercept) 13.44398 3.666603
#  Residual             79.75841 8.930756
# Number of obs: 85, groups:  Date, 35
# Fixed effects:
#                     Estimate  Std. Error  t value
# (Intercept)      104.5433728   4.2378163 24.66916
# Exercise           0.8480363   2.8328254  0.29936
# Bitter.Melon       3.7739611   2.6027090  1.45001
# as.integer(Date)   0.3386202   0.1422834  2.37990
# Measurement3pm    -8.4730381   2.5321712 -3.34616
# Measurementwake   -4.9616823   2.3702767 -2.09329
# Correlation of Fixed Effects:
#             (Intr) Exercs Bttr.M as.(D) Msrmn3
# Exercise    -0.726
# Bitter.Meln -0.355  0.095
# as.ntgr(Dt) -0.649  0.495 -0.209
# Measrmnt3pm -0.337  0.017  0.093 -0.012
# Measurmntwk -0.273 -0.012  0.022 -0.086  0.527
confint(mlm)
#                            2.5 %         97.5 %
# .sig01             0.00000000000   6.2022516958
# .sigma             7.30173759853  10.6754722267
# (Intercept)       96.53558709879 112.5507720395
# Exercise          -4.51695529051   6.2003858378
# Bitter.Melon      -1.15069394879   8.6949759136
# as.integer(Date)   0.06908453319   0.6074697878
# Measurement3pm   -13.39921098464  -3.5485825386
# Measurementwake   -9.49214591713  -0.4055767680

Blood glucose measurements definitely differ by time of day, and days do cluster, so this model works much better than the linear model or t-test did. Critically, we see that the bitter melon estimate is still much smaller than what we had from before, having dropped to a fraction of the original effect size; this shows that the violation of assumptions was driving much of the apparent harm.

As far as measurement error goes, it can be modeled by linear measurement-error models / errors-in-variables models / Deming regression / total least squares / orthogonal regression, but these usually assume that you have multiple measurements of the same datapoint (as if LaFontaine had measured 3 times immediately in the morning each day, instead of once at 3 different times of day) from which the size of the error can be estimated, while here we just have prior information.


So that motivates a switch to Bayesian modeling using JAGS. Here we have a multilevel (for days) measurement-error (blood glucose levels treated as a latent variable) model with fixed effects for exercise, bitter melon, date index, and the 3 times of day (manually turned into dummy variables since, unlike lm and lmer, JAGS doesn't automatically expand a factor into multiple dummy variables):

model1 <- "model {
    # hyperpriors for the per-day random intercepts; the original identifier
    # names were lost in transcription, so `sigma.b3` & `mu.b3` are reconstructions:
    sigma.b3 ~ dunif(0, 30)
    tau.b3 <- pow(sigma.b3, -2)
    mu.b3 ~ dunif(0, 10)

    for (j in 1:m) {
        b3_1[j] ~ dnorm(mu.b3, tau.b3)
        }

    for (i in 1:n) {
        Blood.noise[i] ~ dnorm(0, tau.Blood.noise)
        Blood.hat[i] <- a + b1*Blood.noise[i] + b4*Exercise[i] + b2*Bitter.Melon[i] +
                            b3_1[Date[i]] + b3_2*Date[i]  + b5*Morning[i]  + b6*am10[i] + b7*pm3[i]
        Blood[i] ~ dnorm(Blood.hat[i], tau)
        }
    a  ~ dnorm(0, .001)

    b1 ~ dnorm(0, .001)
    b2 ~ dnorm(0, .001)
    b3_2 ~ dnorm(0, .001)
    b4 ~ dnorm(0, .001)
    b5 ~ dnorm(0, .001)
    b6 ~ dnorm(0, .001)
    b7 ~ dnorm(0, .001)

    sigma ~ dunif(0, 20)
    tau <- pow(sigma, -2)

    # SD of LaFontaine's blood glucose measurements is ~9.2,
    # manufacturers claim within ~5 accuracy, so use that as the
    # prior for the accuracy of blood glucose measurements:
    tau.Blood.noise <- 1 / pow((5/9.2), 2)
    }"

library(runjags)
j1 <- with(melon2, run.jags(model1, data=list(n=nrow(melon2), Blood=Read, Bitter.Melon=Bitter.Melon,
                                   Date=as.integer(Date), m=length(levels(Date)), Exercise=Exercise,
                                   Morning=as.integer(Measurement=="wake"), am10=as.integer(Measurement=="10am"),
                                   pm3=as.integer(Measurement=="3pm")),
                         monitor=c("b1", "b2", "b3_2", "b4", "b5", "b6", "b7"), sample=500000))
# JAGS model summary statistics from 1000000 samples (chains = 2; adapt+burnin = 5000):
#       Lower95  Median Upper95    Mean      SD Mode     MCerr MC%ofSD SSeff   AC.500   psrf
# b1    -17.787 0.42309  17.967 0.33449  11.746   --   0.27033     2.3  1888  0.17493 1.0003
# b2     -1.172  3.9699  9.3653  3.9906  2.6803   --  0.026348       1 10348 0.017809      1
# b3_2 0.076985 0.36628 0.65914 0.36598 0.14869   -- 0.0015911     1.1  8733 0.017328 1.0002
# b4    -4.3368  1.3828  7.1201  1.3751  2.9205   --  0.031235     1.1  8742 0.015053 1.0006
# b5    -10.561  19.604  53.525  19.768  16.298   --   0.52169     3.2   976  0.39992 1.0038
# b6    -7.4183   24.57  56.427  24.724  16.304   --   0.52847     3.2   952  0.39991 1.0042
# b7    -14.914  16.107  49.277  16.305  16.324   --   0.53595     3.3   928  0.40045 1.0039

So here the mean estimate for bitter melon, b2, is 3.9, with negative values still possible. (Curiously, b4, whether LaFontaine exercised the morning of that day, is close to zero, though you would expect exercise to reduce blood glucose levels. Exercise may take time to kick in, so I wonder if I should have been treating it as a lagged variable and including a variable for having exercised the day before.)

We can also see how the posterior distribution of the bitter melon parameter evolves with the data:

library(animation)
saveGIF({
    for(n in 1:nrow(melon2)){
        newData <- melon2[1:n,]
        j <- with(newData, run.jags(model1, data=list(n=nrow(newData), Blood=Read, Bitter.Melon=Bitter.Melon,
                                     Date=as.integer(Date), m=length(levels(Date)), Exercise=Exercise,
                                     Morning=as.integer(Measurement=="wake"),
                                     am10=as.integer(Measurement=="10am"), pm3=as.integer(Measurement=="3pm")),
                                    monitor=c("b2"), sample=6000, silent.jags=TRUE, summarise=FALSE))
        coeff <- as.mcmc.list(j, vars="b2")

        p <- qplot(as.vector(coeff[[1]]), binwidth=1) +
              coord_cartesian(xlim = c(-11, 11)) +
              ylab("Posterior density") + xlab("Effect on blood glucose (ng/ml)") + ggtitle(n)
        print(p)
        }
    },
    interval = 0.5, ani.width = 800, ani.height=800,
    movie.name = "/home/gwern/wiki/images/nootropics/2015-lafontaine-bittermelon-samplebysample.gif")
Posterior estimates of the bitter melon effect evolving sample by sample

We can see by eye that, by the final measurements, the probability that bitter melon's effect size is negative (reduces blood glucose) has become small, because so little of the posterior distribution falls below zero. Going fully Bayesian winds up not changing our mean estimate for bitter melon noticeably, but it does have some nice side-effects, like giving us something far more interpretable than a non-statistically-significant p or t-value: the posterior probability that bitter melon reduces blood glucose levels, which in this case is:

coeff <- as.mcmc.list(j1, vars="b2")
sum(coeff[[1]]<0) / length(coeff[[1]])
# [1] 0.067156

So the posterior probability that bitter melon lowers blood glucose in this self-experiment is ~7%.


As well, LaFontaine wants to optimize his health and financial expenditures while not spending too much effort testing out an intervention:

As a result of this analysis I will no longer take Bitter Melon and save myself the money…I try to balance the strength of the statistics with pragmatic "no go" decisions on supplements and other mechanisms.

With the posterior distribution from the Bayesian model, we can examine this question directly: what is the current value of bitter melon, and is the current experiment sufficient to rule out bitter melon use, or to rule out collecting additional data?

To do a cost-benefit analysis, we assign costs to the use of bitter melon and to the risk & cost of developing diabetes, estimate how much a reduction in blood glucose reduces diabetes risk, and then work with the posterior distribution to estimate the value of taking bitter melon and of collecting more data:


  1. cost of bitter melon use: some browsing suggests that a good buy would be 120x600mg at $14; 600mg total is a recommended dose for extract, so this is 120 days' worth at $0.12/day or $44/year or, for indefinite consumption discounted at 5% annually, ~$901 (NPV)
  2. cost of diabetes: while I am not familiar with the literature, it's clear that diabetes is extremely expensive in every way: substantial ongoing costs to monitor blood glucose (the cheapest possible test strips are still ~$0.17 each, which at 3+ tests a day adds up), serious side-effects like blindness, increased rates of other diseases like cancer (themselves expensive), life-expectancy reductions, etc. Just the medical expenditures could easily be $124,600 (NPV) if diabetes is developed at age 40. (I learned after finishing that LaFontaine is older than that, so a better figure would have been $53–91,000. Another good source of information would be .) So avoiding it is important.
  3. how much does a reduction in blood glucose reduce the risk of diabetes, and how much is any given reduction in risk itself worth compared to the annual cost of bitter melon? My strategy here is to look at RCTs of how much drugs reduce blood glucose and how much drugs reduce diabetes rates, assume that the drugs exert this effect through the blood glucose reduction, and define the reduction in diabetes risk per ng/ml accordingly. Here too I am not familiar with the large literature, so what I did was look through one of the more recent meta-analyses; that and other meta-analyses didn't include reductions in blood glucose or any estimate of the kind I wanted, unfortunately, so I then looked for the largest single study included. It found that in their sample, fasting blood glucose went from 5.8mmol/L on placebo to 5.4mmol/L in the intervention group (104.5 vs 97.3), and this was associated with the intervention group having 60% of the risk of the control group2. (We could also try looking at existing decision-theory treatments of diabetes interventions like Li et al 2010.)
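The NPV in item 1 can be checked directly; assuming the continuous-discounting perpetuity formula c/log(1+r) (my reconstruction of the elided formula, which reproduces the $901 lifetime-cost figure used in the loss function later):

```r
# lifetime (perpetual) cost of $44/year of bitter melon, discounted at 5% annually:
npv <- 44 / log(1.05)
npv
```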

If the male American lifetime risk of diabetes is 0.328, and the cost of diabetes is at least $124,600, then the expected loss is 0.328 × $124,600 = $40,868.80; if an intervention lowering blood glucose by 7.2ng/ml is done and thus the risk is reduced to 60% of what it was, then the expected loss is 0.328 × 0.60 × $124,600 = $24,521.28, a reduction in loss of $16,347.52. Assuming linear response, each 1 ng/ml reduction is then worth $16,347.52 / 7.2 ≈ $2,270.
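Spelled out in R (all figures from the text):

```r
lifetimeRisk  <- 0.328      # male American lifetime diabetes risk
diabetesCost  <- 124600     # NPV of medical costs of diabetes
expectedLoss  <- lifetimeRisk * diabetesCost     # baseline expected loss
reducedLoss   <- 0.60 * expectedLoss             # risk cut to 60% of baseline
lossReduction <- expectedLoss - reducedLoss      # value of the intervention
valuePerNg    <- lossReduction / 7.2             # per ng/ml, assuming linearity
c(expectedLoss, lossReduction, valuePerNg)
```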

So, for example, we might ask for the probability bitter melon reduces blood glucose by >=1 ng/ml:

sum(coeff[[1]]<(-1)) / length(coeff[[1]])
# [1] 0.030978

3.1% is not much, but the expected value of a >=1ng/ml reduction is >$70.33 (0.031 × $2,270), ie. it's worth more than 1 year of bitter melon would cost, although still not more than the lifetime cost of bitter melon. On the other hand, the lowest value of bitter melon's effect with any substantial probability is -5, which, if it happened to be true, would be worth quite a bit: $10,449 (5 × $2,270 − $901).

What is our loss function over the bitter-melon effect-size posterior distribution? In the scenario where we take bitter melon: if the effect is negative (<0) and it reduces blood sugar, then the loss is $2,270 times the effect size (a negative quantity, ie. a gain) plus the cost of lifetime bitter melon ($901 from before). If bitter melon actually increases blood glucose (>0), then likewise: the increased blood glucose does harm to our health and we still pay for the bitter melon extract.

On the other hand, if we don't take bitter melon, then our loss is 0: we don't change our blood glucose and we don't pay anything more for bitter melon.

mean(coeff[[1]]*2270 + 901)
# [1] 9925.345678

In this case, since the posterior estimate for bitter melon is skewed so heavily towards increasing blood glucose, the expected loss is very dismal even beyond the cost of buying bitter melon; and since $0 < $9,925, based on the results of this self-experiment, we would prefer not to use bitter melon in the future.

Of course, we have other options, like collecting more information. Is it worthwhile to experiment more on bitter melon?


The upper bound on the value of more information is the expected value of perfect information (EVPI): our expected gain if an oracle told us the exact effect of bitter melon. But remember, we gain only if we switch actions based on new information; otherwise the information was just trivia.

If the oracle tells us bitter melon increases blood glucose by +1 or something, and so we shouldn't take it, this is worthless to us since we had already decided not to take it; if it told us that it reduced blood glucose by -0.4ng/ml, that exactly counterbalances the total cost of $901, so we still wouldn't change our action; while if the oracle tells us that bitter melon decreases blood glucose by -2, then we have learned something valuable, since a reduction of 2 ng/ml is worth $4.5k to us and the intervention only costs $901, for a big win of $3.6k, or even more if the effect was actually as much as -6. But we know it's very unlikely a posteriori that any decrease would be as extreme as -6, and still rather unlikely that it's as much as -2, so we need to discount these benefit estimates like $3.6k by how probable they are in the first place, and then average over all of them. This is easily done with the posterior samples: for each sample of the possible effect (the parameter b2), we note that the information was worth $0 if the effect is above -0.4, estimate the gain if it is below, and then take the average.
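The oracle logic can be packaged as a small helper to check the worked cases above (the function name `oracleValue` is mine, not LaFontaine's; the $2,270/ng/ml value and $901 lifetime cost are from the text):

```r
# value of learning the true effect bg (ng/ml change in blood glucose):
# worthless unless it would flip our decision, ie. unless the gain
# -bg*2270 exceeds the $901 lifetime cost (bg below about -0.4)
oracleValue <- function(bg) { if (bg >= -0.4) 0 else -bg*2270 - 901 }
oracleValue(1)    # already decided not to take it: $0
oracleValue(-2)   # 2*2270 - 901 = $3,639, the "big win" case
```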

mean(sapply(coeff[[1]], function(bg) { if(bg>=(-0.4)) { return(0); } else { return(-bg*2270 - 901); }}))
# [1] 132.6411989

Because we still haven't totally ruled out that bitter melon reduces blood glucose (which would be extremely valuable if it did), we would be willing to pay up to $132 for certainty, since we might learn that it does reduce blood glucose by a useful amount. $132 also implies that we might want to do more experimenting, since another bottle of bitter melon would not cost much and could probably drive down that remaining 7%; but $132 is also close to zero, so we might not.

We don't have access to any EVPI oracle, but we can instead try preposterior analysis: simulating future data, re-estimating the optimal decision based on the new posterior, seeing if it changes our decision to not take bitter melon, and estimating how beneficial that change is; then we combine the probability & size of the benefit to get an expected value and weigh it against the upfront cost of doing more experimenting. This gives us the expected value of sample information (EVSI). Ideally, EVSI is positive at the start of a trial; as information comes in, our posterior estimates firm up, our decisions become less likely to change, and the value of additional information decreases until EVSI becomes negative, at which point we can stop collecting data, because the cost of collecting it is no longer less than the reduction in bad decisions it might yield.

Let's say each datapoint costs $2 between the bitter melon & the hassle (and since we're measuring 3 times a day, each day costs $6). And we already defined all the other data, the model, and the losses, so we can calculate EVSI.

Calculating EVSI for collecting one more datapoint is easy, but it's also interesting to calculate it historically and get an idea of how EVSI increased or decreased over the trial:

data <- melon2
sampleValues <- data.frame(N=NULL, newOptimumLoss=NULL, sampleValue=NULL, sampleValueProfit=NULL)
for (n in seq(from=1, to=(nrow(data)+10))) {

    evsis <- replicate(20, {
            # if n is more than we collected, bootstrap hypothetical new data; otherwise, just take that prefix
            # and pretend we are doing a sequential trial where we have only collected the first n observations thus far
            if (n > nrow(data)) { newData <- rbind(data, data[sample(1:nrow(data), n - nrow(data), replace=TRUE),]) } else { newData <- data[1:n,] }

           kEVSI <- with(newData, run.jags(model1, data=list(n=nrow(newData), Blood=Read, Bitter.Melon=Bitter.Melon,
                                         Date=as.integer(Date), m=length(levels(Date)), Exercise=Exercise,
                                         Morning=as.integer(Measurement=="wake"),
                                         am10=as.integer(Measurement=="10am"), pm3=as.integer(Measurement=="3pm")),
                                        monitor=c("b2"), sample=1000, silent.jags=TRUE, summarise=FALSE))
            coeff <- as.mcmc.list(kEVSI, vars="b2")

            lossNonuse <- 0
            lossUse <- mean(sapply(coeff[[1]], function(bg) { return(-bg*2270 + 901); }))

            # compare to the previous estimated optimum using n-1 data
            if (n==1) { oldOptimum <- 0;  } else { oldOptimum <- sampleValues[n-1,]$newOptimumLoss; }
            newOptimum <- max(c(lossNonuse, lossUse))
            sampleValue <- newOptimum - oldOptimum
            sampleCost <- 2
            sampleValueProfit <- sampleValue - (n*sampleCost)

            return(list(N=n, newOptimumLoss=newOptimum, sampleValue=sampleValue, sampleValueProfit=sampleValueProfit))
    })
    sampleValues <- rbind(sampleValues, data.frame(N=n, newOptimumLoss=mean(unlist(evsis[2,])),
                                                   sampleValue=mean(unlist(evsis[3,])), sampleValueProfit=mean(unlist(evsis[4,]))))
    }
sampleValues
#       N newOptimumLoss     sampleValue sampleValueProfit
# 1     1    0.000000000     0.000000000       -2.00000000
# 2     2    0.000000000     0.000000000       -4.00000000
# 3     3    0.000000000     0.000000000       -6.00000000
# 4     4    0.000000000     0.000000000       -8.00000000
# 5     5    0.000000000     0.000000000      -10.00000000
# 6     6    0.000000000     0.000000000      -12.00000000
# 7     7    0.000000000     0.000000000      -14.00000000
# 8     8    0.000000000     0.000000000      -16.00000000
# 9     9    0.000000000     0.000000000      -18.00000000
# 10   10  766.302423286   766.302423286      746.30242329
# 11   11  414.117599555  -352.184823732     -374.18482373
# 12   12   33.173212243  -380.944387312     -404.94438731
# 13   13  453.109658628   419.936446385      393.93644639
# 14   14  519.121851308    66.012192680       38.01219268
# 15   15  129.521213019  -389.600638289     -419.60063829
# 16   16 1220.999692932  1091.478479912     1059.47847991
# 17   17 2065.063702988   844.064010056      810.06401006
# 18   18 1173.637439342  -891.426263646     -927.42626365
# 19   19    0.000000000 -1173.637439342    -1211.63743934
# 20   20    0.000000000     0.000000000      -40.00000000
# 21   21    0.000000000     0.000000000      -42.00000000
# 22   22    0.000000000     0.000000000      -44.00000000
# 23   23    3.830036441     3.830036441      -42.16996356
# 24   24    0.000000000    -3.830036441      -51.83003644
# 25   25    0.000000000     0.000000000      -50.00000000
# 26   26    0.000000000     0.000000000      -52.00000000
# 27   27    0.000000000     0.000000000      -54.00000000
# ...
# 99   99    0.000000000     0.000000000     -198.00000000
# 100 100    0.000000000     0.000000000     -200.00000000
# 101 101  103.108222757   103.108222757      -98.89177724
# 102 102    0.000000000  -103.108222757     -307.10822276
# 103 103    0.000000000     0.000000000     -206.00000000
# 104 104    0.000000000     0.000000000     -208.00000000
# 105 105    0.000000000     0.000000000     -210.00000000
# 106 106    0.000000000     0.000000000     -212.00000000
# 107 107    0.000000000     0.000000000     -214.00000000
# 108 108    0.000000000     0.000000000     -216.00000000
# 109 109    0.000000000     0.000000000     -218.00000000
# 110 110    0.000000000     0.000000000     -220.00000000
# 111 111    0.000000000     0.000000000     -222.00000000
# 112 112    0.000000000     0.000000000     -224.00000000
# 113 113    0.000000000     0.000000000     -226.00000000
# 114 114    0.000000000     0.000000000     -228.00000000
# 115 115    0.000000000     0.000000000     -230.00000000

So it looks like by the 20th measurement or so (corresponding to day #11, halfway through the first randomization), LaFontaine could have been reasonably certain (assuming non-informative priors etc.) that the expected gain from further experimentation with bitter melon did not outweigh the cost of additional experimentation. And taking another 10 samples is likewise expected to be a net loss, with none of the bootstrapped datapoints being able to shift the posterior enough to justify taking bitter melon.


The final outcome suggests that LaFontaine should not take bitter melon and (probably) shouldn't experiment further with it; the high cost of diabetes, though, indicates he should experiment much more with other anti-diabetes interventions (it's not clear to me whether such drugs are a good idea prophylactically, but there are many potential interventions, like exercise of various kinds).

There are many caveats to this con­clu­sion:

  1. I used a noninformative prior on the effects of bitter melon, which implies that it's as likely for bitter melon to drive blood glucose increases as decreases; this strikes me as implausible, and if I had tried to meta-analyze the past studies on bitter melon, I would probably have come up with a much stronger prior in favor of bitter melon, in which case the EVSI of sampling would take much longer to go negative and might have reversed the recommendations

    • on the other hand, the existing bitter melon papers also suggest that taking it in tablet form may be ineffective and only the fresh or juice forms work; in which case, the failure to show benefits was a foregone conclusion and LaFontaine shouldn't have bothered with testing something already believed not to work
  2. the temporal trend in the blood glucose is concerning because it is too steep to represent any kind of long-term trend (LaFontaine would be dead by now if his blood glucose really did increase by 0.36ng/ml a day), but it suggests something wacky was going on during his self-experiment (could the test strips have expired and been going bad, such that the results are near-meaningless?); this wackiness is a joker in the deck, since whatever is causing it could itself be neutralizing any benefit from bitter melon, or it could diverge from a linear trend in a way that increases the underlying sampling error beyond what is modeled

  3. the diabetes cost estimate is too low; I included only the direct medical costs, though the cost of lost QALYs is probably even larger, making the estimate not half what it should be. This underestimation would bias benefit estimates downwards and lead to premature ending of experimenting.

  4. in the other direction, the conversion from blood glucose reductions to diabetes risk is questionable; some of the anti-diabetes drugs like metformin are already believed to have effects not mediated solely through blood glucose. This overestimation of the effect of blood glucose reductions would bias estimates upwards and lead to too much experimenting.

I suspect problems #3 & #4 mostly cancel out; that problem #1 would have been a problem if LaFontaine had actually conducted a sequential trial based on EVSI, but since he over-collected data it doesn't wind up being an issue (a favorable prior probably would have been canceled out quickly); and that #2 is the major reason that the results could be wrong.

Still, an interesting self-experiment to try to analyze.

See Also

  1. While bitter melon is traditionally eaten, and many sweet fruits are not dangerous because they are evolved by the plant to be eaten by animals for various reasons, traditional use is far from a guarantee of safety (consider the widespread low-grade cyanide poisoning from one of the most common crops, , whose danger apparently does not outweigh the caloric value), and bitterness in seeds or fruits is a warning sign of potential low-grade toxicity from the wide world of ( and enjoying a bitter food must be learned); apple seeds are bitter and contain cyanide, likewise ('sweet' or domesticated almonds have been bred for lack of cyanide), & & are safe for humans only because we are slightly different from insects & dogs, and the fruits , , beach apples, and are all dangerous to humans. Plants 'want' their fruit to be eaten, but only by the specific pollinators and seed-spreaders they are co-evolved with… Hence, any fruit whose name literally contains 'bitter' is suspicious.↩︎

  2. Specifically:

    …There was no statistical evidence of an interaction between the rosiglitazone and ramipril arms of the DREAM study for the primary outcomes, secondary outcomes, or their components (interaction p>0·11 for all; data not shown). The primary outcome of diabetes or death was seen in [statistically-]significantly fewer individuals in the rosiglitazone group than in the placebo group (hazard ratio [HR] 0·40, 95% CI 0·35–0·46; p < 0·0001; table 2). There was no difference in the number of deaths (0·91, 0·55–1·49; p = 0·7) and a large difference in the frequency of diabetes (0·38, 0·33–0·44; p < 0·0001) between the two groups (table 2). The event curves for the primary outcome diverged by the time of the first assessment (after 1 year of follow-up; figure 2).

    …Figure 5 shows the effect of rosiglitazone on fasting and 2-h plasma glucose concentrations. The median fasting plasma glucose concentration was 0·5 mmol/L lower in the rosiglitazone group than in the placebo group (p < 0·0001); the 2-h plasma glucose concentration was 1·6 mmol/L lower (p < 0·0001). Mean systolic and diastolic blood pressure were 1·7 mm Hg and 1·4 mm Hg lower, respectively, in the rosiglitazone group than in the placebo group (p < 0·0001). Furthermore, mean hepatic ALT concentrations during the first year of therapy were 4·2 U/L lower in patients treated with rosiglitazone than those in the placebo group (p < 0·0001). All results are for the final visit apart from the ALT difference, which was at 1 year. Of note, there was no difference in the use of antihypertensive agents in the two groups during the trial. Finally, by the final visit mean bodyweight was increased by 2·2 kg more in the rosiglitazone group than in the placebo group (p < 0·0001). This increase in bodyweight in the rosiglitazone group was associated with a lower waist-to-hip ratio (p < 0·0001) because of an increase in hip circumference of 1·8 cm; there was no effect on waist circumference (figure 6).

    …This large, prospective, blinded international clinical trial shows that 8 mg of rosiglitazone daily, together with lifestyle recommendations, substantially reduces the risk of diabetes or death by 60% in individuals at high risk for diabetes. The absolute risk difference between treatment groups of 14·4% means that for every seven people with impaired fasting glucose or impaired glucose tolerance who are prescribed rosiglitazone for 3 years, one will be prevented from developing diabetes. Moreover, rosiglitazone [statistically-]significantly increased the likelihood of regression to normoglycaemia by about 70–80% compared with placebo. The reduction in diabetes reported here is of much the same magnitude as the reduction achieved with lifestyle approaches4,5 and greater than the reductions reported previously with drugs such as metformin4 or acarbose.3