Bitter Melon for blood glucose

Analysis of whether bitter melon reduces blood glucose in one self-experiment and utility of further self-experimentation
statistics, decision-theory, R, Bayes
2015-09-14–2016-07-29 finished certainty: likely importance: 6

I re-analyze a bitter-melon/blood-glucose self-experiment, finding a small effect of increasing blood glucose after correcting for temporal trends & daily variation, giving both frequentist & Bayesian analyses. I then analyze the self-experiment from a subjective Bayesian decision-theoretic perspective, cursorily estimating the costs of diabetes & benefits of intervention in order to estimate the Value of Information for the self-experiment and the benefit of further self-experimenting; I find that the expected value of more data (EVSI) is negative and further self-experimenting would not be optimal compared to trying out other anti-diabetes interventions.

Bitter melon is an Asian fruit which may reduce blood glucose levels (the studies apparently conflict and one might be a priori dubious of it1). In June 2015, Paul LaFontaine ran a self-experiment on 900mg of bitter melon extract taken with breakfast (Vitamin World brand: 100x450mg, $25), randomized daily for 20 days, measuring blood glucose levels with a normal fingerprick test kit before breakfast, at 10AM, and at 3PM; the 20-day randomization was followed by a single block of 13 days with bitter melon use. There was no placebo-control or blinding.

LaFontaine reports that his t-tests indicate no statistically-significant effects, with point-values indicating that bitter melon harmfully increases blood glucose.


The first thing to note is what happens upon taking his data, converting it to long format, and plotting it:

library(ggplot2)

melon <- read.csv(stdin(), header=TRUE) # LaFontaine's raw data pasted on stdin (omitted here)

melon2 <- reshape(melon, varying=c("Read.wake", "Read.10am", "Read.3pm"), v.names="Read",
                  timevar="Measurement", times=c("wake", "10am", "3pm"), direction="long")
melon2 <- melon2[order(melon2$Date),]
qplot(Date, Read, color=as.logical(Bitter.Melon), data=melon2) +
 geom_smooth(aes(group=1)) + geom_point(size=I(4)) + theme(legend.position = "none")
LaFontaine 2015 June bitter melon self-experiment for reducing blood glucose levels

The plot shows an unmistakable time trend: blood glucose levels increase almost linearly over time.


So while LaFontaine is certainly correct that bitter melon does not statistically-significantly lower total daily blood glucose levels, the point-estimates are disturbingly large:

with(melon, t.test((Read.wake+Read.10am+Read.3pm) ~ Bitter.Melon))
#   Welch Two Sample t-test
# data:  (Read.wake + Read.10am + Read.3pm) by Bitter.Melon
# t = -1.0975011, df = 16.927915, p-value = 0.2877892
# alternative hypothesis: true difference in means is not equal to 0
# 95% confidence interval:
#  -30.459040732   9.618131641
# sample estimates:
# mean in group 0 mean in group 1
#     314.1250000     324.5454545
with(melon, wilcox.test((Read.wake+Read.10am+Read.3pm) ~ Bitter.Melon, conf.int=TRUE))
#   Wilcoxon rank sum test with continuity correction
# data:  (Read.wake + Read.10am + Read.3pm) by Bitter.Melon
# W = 29.5, p-value = 0.2472612
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
#  -35.999976618   7.000064149
# sample estimates:
# difference in location
#            -11.2627309

Neither the t-test nor the U-test can yield valid results because their assumptions are violated (datapoints come from different distributions depending on what time they were collected), and since bitter melon is backloaded in the final 13 days, where blood glucose is higher than ever (for unknown reasons), that alone could drive the estimates of harm. So the temporal trend needs to be modeled somehow; one way, without going to full-blown time-series models, is to regress on the index of the date.

Another thing to notice in the plot is that blood glucose tests are both highly variable within a day, and highly variable between days as well. The variability between days implies that a multilevel model with per-day random effects would be good here. The variability within days, on the other hand, implies that the different measurement times should be treated as covariates themselves, but also that we should remember that home blood glucose tests are not infinitely precise: the manufacturers claim an accuracy of something like ±5ng/ml (and in using my own blood glucose test strips, I find they can be much noisier than that), so modeling the measurement error would be worthwhile.
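To make the two levels of variability concrete, here is a toy simulation (in Python, with invented round-number SDs loosely echoing the multilevel estimates further down: ~4 between days, ~9 within days, ±5 meter error), showing how observed readings pool day effects, within-day variation, and meter noise:

```python
import random, statistics

random.seed(1)

DAYS, READINGS_PER_DAY = 35, 3
BETWEEN_DAY_SD = 4   # day-to-day drift (cf. the random-effect SD ~3.7 below)
WITHIN_DAY_SD  = 9   # residual within-day variation (cf. residual SD ~8.9)
METER_SD       = 5   # manufacturer-claimed meter accuracy, +/-5 ng/ml

readings = []
for day in range(DAYS):
    day_effect = random.gauss(0, BETWEEN_DAY_SD)
    for _ in range(READINGS_PER_DAY):
        true_level = 105 + day_effect + random.gauss(0, WITHIN_DAY_SD)
        readings.append(true_level + random.gauss(0, METER_SD))  # what the meter reports

# The observed SD pools all three independent sources of variation:
observed_sd = statistics.stdev(readings)
implied_sd  = (BETWEEN_DAY_SD**2 + WITHIN_DAY_SD**2 + METER_SD**2) ** 0.5
print(round(observed_sd, 1), round(implied_sd, 1))
```

The meter alone contributes ±5 of that ~11 ng/ml spread, which is why treating readings as exact would overstate the precision of any effect estimate.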

A first stab at controlling for the temporal effect reduces the bitter melon estimate a bit, to 9.7:

summary(lm(I(Read.wake + Read.10am + Read.3pm) ~ Bitter.Melon + Exercise + as.integer(Date), data=melon))
# ...Coefficients:
#                     Estimate  Std. Error  t value   Pr(>|t|)
# (Intercept)      300.0368557  18.3528242 16.34827 5.7259e-11
# Bitter.Melon       9.7182611  10.0285767  0.96906    0.34788
# Exercise           2.7759120  13.3097375  0.20856    0.83760
# as.integer(Date)   1.1515280   0.8078333  1.42545    0.17450
# Residual standard error: 21.23433 on 15 degrees of freedom
#   (16 observations deleted due to missingness)
# Multiple R-squared:  0.1810341, Adjusted R-squared:  0.01724088
# F-statistic:  1.10526 on 3 and 15 DF,  p-value: 0.3777754

Switching to long format lets us immediately fit a multilevel model, with random effects for days and time of day treated as a covariate; the bitter melon estimate drops further, to 3.8 per measurement:

library(lme4)
mlm <- lmer(Read ~ Exercise + Bitter.Melon + Measurement + as.integer(Date) + (1|Date), data=melon2); summary(mlm)
# ...Random effects:
#  Groups   Name        Variance Std.Dev.
#  Date     (Intercept) 13.44398 3.666603
#  Residual             79.75841 8.930756
# Number of obs: 85, groups:  Date, 35
# Fixed effects:
#                     Estimate  Std. Error  t value
# (Intercept)      104.5433728   4.2378163 24.66916
# Exercise           0.8480363   2.8328254  0.29936
# Bitter.Melon       3.7739611   2.6027090  1.45001
# as.integer(Date)   0.3386202   0.1422834  2.37990
# Measurement3pm    -8.4730381   2.5321712 -3.34616
# Measurementwake   -4.9616823   2.3702767 -2.09329
# Correlation of Fixed Effects:
#             (Intr) Exercs Bttr.M as.(D) Msrmn3
# Exercise    -0.726
# Bitter.Meln -0.355  0.095
# as.ntgr(Dt) -0.649  0.495 -0.209
# Measrmnt3pm -0.337  0.017  0.093 -0.012
# Measurmntwk -0.273 -0.012  0.022 -0.086  0.527
confint(mlm)
#                            2.5 %         97.5 %
# .sig01             0.00000000000   6.2022516958
# .sigma             7.30173759853  10.6754722267
# (Intercept)       96.53558709879 112.5507720395
# Exercise          -4.51695529051   6.2003858378
# Bitter.Melon      -1.15069394879   8.6949759136
# as.integer(Date)   0.06908453319   0.6074697878
# Measurement3pm   -13.39921098464  -3.5485825386
# Measurementwake   -9.49214591713  -0.4055767680

Blood glucose measurements definitely differ by time of day, and days do cluster, so this model works much better than the linear model or t-test did. Critically, the bitter melon estimate is still much smaller than what we had from before, roughly a third of the naive estimate; this shows that the violation of assumptions was driving much of the apparent harm.

As far as measurement error goes, it can be modeled by linear measurement-error models (errors-in-variables models, Deming regression, total least squares, orthogonal regression), but these usually seem to assume that you have multiple measurements of the same datapoint (as if LaFontaine had measured 3 times immediately in the morning each day, instead of once at 3 different times of day), from which the size of the error can be estimated; here we just have prior information.


So that motivates a switch to Bayesian modeling using JAGS. Here we have a multilevel (for days) measurement-error (blood glucose levels treated as a latent variable) model with fixed effects for exercise, bitter melon, date index, and the 3 times of day (manually turned into dummy variables, since unlike lm and lmer, JAGS doesn't automatically expand a factor into multiple dummy variables):

model1 <- "model {
    # hyperpriors for the per-day random intercepts
    # (variable names reconstructed; the original listing was garbled):
    sigma.date ~ dunif(0, 30)
    tau.date <- pow(sigma.date, -2)
    mu.date ~ dunif(0, 10)
    for (j in 1:m) {
        b3_1[j] ~ dnorm(mu.date, tau.date)
    }

    for (i in 1:n) {
        Blood.noise[i] ~ dnorm(0, tau.Blood.noise)
        Blood.hat[i] <- a + b1*Blood.noise[i] + b4*Exercise[i] + b2*Bitter.Melon[i] +
                            b3_1[Date[i]] + b3_2*Date[i]  + b5*Morning[i]  + b6*am10[i] + b7*pm3[i]
        Blood[i] ~ dnorm(Blood.hat[i], tau)
    }

    a  ~ dnorm(0, .001)

    b1 ~ dnorm(0, .001)
    b2 ~ dnorm(0, .001)
    b3_2 ~ dnorm(0, .001)
    b4 ~ dnorm(0, .001)
    b5 ~ dnorm(0, .001)
    b6 ~ dnorm(0, .001)
    b7 ~ dnorm(0, .001)

    sigma ~ dunif(0, 20)
    tau <- pow(sigma, -2)

    # SD of LaFontaine's blood glucose measurements is ~9.2,
    # manufacturers claim within ~5 accuracy, so use that as the
    # prior for the accuracy of blood glucose measurements:
    tau.Blood.noise <- 1 / pow((5/9.2), 2)
    }"
library(runjags)
j1 <- with(melon2, run.jags(model1, data=list(n=nrow(melon2), Blood=Read, Bitter.Melon=Bitter.Melon,
                                   Date=as.integer(Date), m=length(levels(Date)), Exercise=Exercise,
                                   Morning=as.integer(Measurement=="wake"), am10=as.integer(Measurement=="10am"),
                                   pm3=as.integer(Measurement=="3pm")),
                         monitor=c("b1", "b2", "b3_2", "b4", "b5", "b6", "b7"), sample=500000))
# JAGS model summary statistics from 1000000 samples (chains = 2; adapt+burnin = 5000):
#       Lower95  Median Upper95    Mean      SD Mode     MCerr MC%ofSD SSeff   AC.500   psrf
# b1    -17.787 0.42309  17.967 0.33449  11.746   --   0.27033     2.3  1888  0.17493 1.0003
# b2     -1.172  3.9699  9.3653  3.9906  2.6803   --  0.026348       1 10348 0.017809      1
# b3_2 0.076985 0.36628 0.65914 0.36598 0.14869   -- 0.0015911     1.1  8733 0.017328 1.0002
# b4    -4.3368  1.3828  7.1201  1.3751  2.9205   --  0.031235     1.1  8742 0.015053 1.0006
# b5    -10.561  19.604  53.525  19.768  16.298   --   0.52169     3.2   976  0.39992 1.0038
# b6    -7.4183   24.57  56.427  24.724  16.304   --   0.52847     3.2   952  0.39991 1.0042
# b7    -14.914  16.107  49.277  16.305  16.324   --   0.53595     3.3   928  0.40045 1.0039

So here the mean estimate for bitter melon, b2, is 3.9, with negative values still possible. (Curiously, b4, whether LaFontaine exercised the morning of that day, is close to zero, though you would expect exercise to reduce blood glucose levels. Exercise may take time to kick in, so I wonder if I should have treated it as a lagged variable and included a variable for having exercised the day before?)

We can also see how the posterior distribution of the bitter melon parameter evolves with the data:

library(animation)
saveGIF(
    # (loop body reconstructed: the original listing was truncated)
    for (n in 1:nrow(melon2)) {
        newData <- melon2[1:n,]
        j <- with(newData, run.jags(model1, data=list(n=nrow(newData), Blood=Read, Bitter.Melon=Bitter.Melon,
                                     Date=as.integer(Date), m=length(levels(Date)), Exercise=Exercise,
                                     Morning=as.integer(Measurement=="wake"), am10=as.integer(Measurement=="10am"),
                                     pm3=as.integer(Measurement=="3pm")),
                                    monitor=c("b2"), sample=6000, silent.jags=TRUE, summarise=FALSE))
        coeff <- as.mcmc.list(j, vars="b2")

        p <- qplot(as.vector(coeff[[1]]), binwidth=1) +
              coord_cartesian(xlim = c(-11, 11)) +
              ylab("Posterior density") + xlab("Effect on blood glucose (ng/ml)") + ggtitle(n)
        print(p)
    },
    interval = 0.5, ani.width = 800, ani.height = 800,
    movie.name = "/home/gwern/wiki/images/nootropics/2015-lafontaine-bittermelon-samplebysample.gif")
Simulated data: posterior estimates evolving sample by sample

We can see by eye that by the final measurements, the probability that bitter melon's effect size is negative (reduces blood glucose) has become small, because so little of the posterior distribution falls below zero. The mean of the bitter melon estimate winds up not changing noticeably, but going fully Bayesian does have some nice side-effects, like giving us something far more interpretable than a non-statistically-significant p or t-value: the posterior probability that bitter melon reduces blood glucose levels, which in this case is:

coeff <- as.mcmc.list(j1, vars="b2")
sum(coeff[[1]]<0) / length(coeff[[1]])
# [1] 0.067156

So the posterior probability that bitter melon lowers blood glucose in this self-experiment is 7%.


As well, LaFontaine is concerned with optimizing his health and financial expenditures while not spending too much effort testing out an intervention:

As a result of this analysis I will no longer take Bitter Melon and save myself the money…I try to balance the strength of the statistics with pragmatic "no go" decisions on supplements and other mechanisms.

With the posterior distribution from the Bayesian model, we can examine this question directly: what is the current value of bitter melon, and is the current experiment sufficient to rule out bitter melon use, or rule out collecting additional data?

To do a cost-benefit analysis, we assign costs to the use of bitter melon and to the risk & cost of developing diabetes, estimate how much a reduction in blood glucose reduces diabetes risk, and then work with the posterior distribution to estimate expected losses:


  1. cost of bitter melon use: some browsing suggests that a good buy would be 120x600mg at $14; 600mg total is a recommended dose for extract, so this is 120 days' worth at $0.12/day or $44/year or, for indefinite consumption discounted at 5% annually, an NPV of ~$901 (44 / log(1.05))
  2. cost of diabetes: while not familiar with the literature, it's clear that diabetes is extremely expensive in every way: substantial ongoing costs to monitor blood glucose (the cheapest possible test strips are still ~$0.17 each, which at 3+ times a day adds up), serious side-effects like blindness, increased rates of other diseases like cancer (themselves expensive), life-expectancy reductions, etc. Just the medical expenditures could easily be $124,600 (NPV) if developed at age 40. (I learned after finishing that LaFontaine is older than that, so a better figure would have been $53-91,000.) So avoiding it is important.
  3. How much does a reduction in blood glucose reduce the risk of diabetes, and how much is any given reduction in risk itself worth compared to the annual cost of bitter melon? My strategy here is to look at RCTs of how much drugs reduce blood glucose and how much drugs reduce diabetes rates, assume that the drugs exert this effect through the blood glucose reduction, and define the reduction in diabetes risk per ng/ml accordingly. Here too I am not familiar with the large literature, so what I did was look through one of the more recent meta-analyses; that and other meta-analyses didn't include reductions in blood glucose or any estimate of the kind I wanted, unfortunately, so I then looked for the largest single study included. It found that in their sample, fasting blood glucose went from 5.8 mmol/L on placebo to 5.4 mmol/L in the intervention (104.5 vs 97.3), and this was associated with the intervention group having 40% of the risk of the control group (a 60% reduction)2. (We could also try looking at existing decision-theory treatments of diabetes interventions like Li et al 2010.)

If the male American lifetime risk of diabetes is 0.328, and the cost of diabetes is at least $124,600, then the expected loss is 0.328 × $124,600 = $40,869; if an intervention lowering blood glucose by 7.2 ng/ml is done, and risk is thus reduced by 60% to 40% of what it was, then the expected loss is 0.328 × 0.40 × $124,600 = $16,347.52. Assuming linear response, each 1 ng/ml of reduction was then valued at $16,347.52 / 7.2 ≈ $2,270.
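These back-of-the-envelope numbers are easy to check (a sketch in Python, reusing the $44/year, 5% discount rate, 0.328 lifetime risk, $124,600 cost, 60% risk reduction, and 7.2 ng/ml figures from the text, and following the text's accounting of dividing the post-intervention expected loss over the 7.2 ng/ml):

```python
import math

# lifetime cost of bitter melon: $44/year discounted at 5% in perpetuity
annual_cost = 44
lifetime_cost = annual_cost / math.log(1.05)          # ~ $901

# expected lifetime loss from diabetes
risk, cost = 0.328, 124600
expected_loss = risk * cost                           # ~ $40,869

# an intervention cutting risk to 40% of baseline leaves this expected loss...
reduced_loss = 0.40 * expected_loss                   # ~ $16,348

# ...attributed across a 7.2 ng/ml reduction, giving a value per ng/ml
value_per_ng = reduced_loss / 7.2                     # ~ $2,270

print(round(lifetime_cost), round(expected_loss), round(value_per_ng))
```

The ~$2,270/ng-ml and ~$901 lifetime-cost figures are the two constants that drive all the loss-function calculations that follow.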

So, for example, we might ask for the probability that bitter melon reduces blood glucose by >=1 ng/ml:

sum(coeff[[1]]<(-1)) / length(coeff[[1]])
# [1] 0.030978

3.1% is not much, but the expected value of a >=1 ng/ml reduction is >$70.33 (≈0.031 × $2,270), ie. it's worth more than 1 year of bitter melon would cost, although still not more than the lifetime cost of bitter melon. On the other hand, the lowest value of bitter melon's effect with any substantial probability is -5, which, if it happened to be true, would be worth quite a bit: $10,449 (5 × $2,270 − $901).

What is our loss function over the bitter-melon effect-size posterior distribution? In the scenario where we take bitter melon, the loss is $2,270 times the effect size, plus the lifetime cost of bitter melon ($901 from before): if the effect is negative and it reduces blood sugar, that first term is a negative loss (a gain) which offsets the cost; if bitter melon actually increases blood glucose (>0), then likewise the increased blood glucose harms our health and we still pay for the bitter melon extracts.

On the other hand, if we don't take bitter melon, then our loss is 0, since we don't change our blood glucose and we don't pay anything more for bitter melon.

mean(coeff[[1]]*2270 + 901)
# [1] 9925.345678

In this case, since the posterior estimate for bitter melon is skewed so heavily towards increasing blood glucose, the expected loss is dismal even beyond the cost of buying bitter melon; and since a loss of $0 is less than $9,925, based on the results of this self-experiment, we would prefer not to use bitter melon in the future.

Of course, we have other options, like collecting more information. Is it worthwhile to experiment more on bitter melon?


The upper bound on the value of more information is the expected value of perfect information (EVPI): our expected gain if an oracle told us the exact effect of bitter melon. But remember, we gain only if we switch actions based on the new information; otherwise, the information was just trivia.

If the oracle tells us bitter melon increases blood glucose by +1 or something and so we shouldn't take it, this is worthless to us, since we had already decided not to take it; if it told us that it reduced blood glucose by -0.4 ng/ml, that exactly counterbalances the total cost of $901, so we still wouldn't change our action; while if the oracle tells us that bitter melon decreases blood glucose by -2, then we have learned something valuable, since a reduction of -2 ng/ml is worth $4.5k to us and the intervention only costs $901, for a big win of $3.6k, or even more if it was actually as much as -6. But we know it's very unlikely a posteriori that any decrease would be as extreme as -6, and still rather unlikely that it's as much as -2, so we need to discount these benefit estimates like $3.6k by how probable they are in the first place. And then we want to average over all of them. This is easily done with the posterior samples: for each sample of the possible effect (the parameter b2), if it's above -0.4 we note that the information was worth $0, otherwise we estimate the gain; then we take the average.

mean(sapply(coeff[[1]], function(bg) { if(bg>=(-0.4)) { return(0); } else { return(-bg*2270 - 901); }}))
# [1] 132.6411989

Because we still haven't totally ruled out that bitter melon reduces blood glucose, which would be extremely valuable if it did, we would be willing to pay up to $132 for certainty, since we might learn that it does reduce blood glucose by a useful amount. $132 also implies that we might want to do more experimenting, since another bottle of bitter melon would not cost much and could probably drive down that remaining 7%; but it is also close to zero, so we might not.
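To make the EVPI recipe concrete in isolation from the JAGS fit, here is a self-contained sketch (in Python; the normal(4, 2.7) posterior is an assumed stand-in approximating the b2 mean/SD reported above, not the real MCMC samples):

```python
import random

random.seed(0)
VALUE_PER_NG, LIFETIME_COST = 2270, 901

# stand-in for the MCMC samples of b2 (effect on blood glucose, ng/ml)
posterior = [random.gauss(4.0, 2.7) for _ in range(100_000)]

def loss_take(effect):    # taking bitter melon: pay for it, get the effect
    return effect * VALUE_PER_NG + LIFETIME_COST

def loss_skip(effect):    # not taking it: no cost, no effect
    return 0.0

# decision under current uncertainty: pick the action with lower expected loss
exp_take = sum(map(loss_take, posterior)) / len(posterior)
best_now = min(exp_take, 0.0)   # taking loses ~$10k in expectation, so: skip

# with an oracle, we pick the better action separately for each possible true
# effect; EVPI is how much that per-draw flexibility saves on average
exp_oracle = sum(min(loss_take(e), loss_skip(e)) for e in posterior) / len(posterior)
evpi = best_now - exp_oracle
print(round(evpi))
```

With these stand-in numbers the result lands in the neighborhood of the $132 computed from the real posterior, since the normal approximation assigns similar mass to effects below the -0.4 break-even point.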

We don't have access to any EVPI oracle, but we can instead try preposterior analysis: simulating future data, re-estimating the optimal decision based on the new posterior, seeing if it changes our decision to not take bitter melon, and estimating how beneficial that change is, then combining the probability & size of benefit to get expected value and weighing it against the upfront cost of doing more experimenting. This gives us the expected value of sample information (EVSI). Ideally, EVSI is positive at the start of a trial; as information comes in, our posterior estimates firm up, our decisions become less likely to change, and the value of additional information decreases until EVSI becomes negative, at which point we can stop collecting data, because the cost of collecting it is no longer less than the reduction in bad decisions it might yield.
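The preposterior logic can be illustrated with a deliberately simplified conjugate sketch (in Python; the normal(4, 2.7) current posterior, the 16 ng/ml observation noise, and the $6/day cost are all invented for illustration rather than taken from the fitted model): simulate one more day's reading, update the posterior in closed form, and check whether the decision flips.

```python
import random

random.seed(2)
VALUE_PER_NG, LIFETIME_COST, DAY_COST = 2270, 901, 6

mu0, sd0 = 4.0, 2.7      # current posterior over the effect (ng/ml), assumed normal
noise_sd = 16.0          # assumed sd of one day's noisy estimate of the effect

def best_loss(mu):
    # loss is linear in the effect, so the expected loss of "take" is just
    # the loss at the posterior mean; "skip" always has loss 0
    return min(mu * VALUE_PER_NG + LIFETIME_COST, 0.0)

# preposterior simulation: draw a plausible true effect, simulate tomorrow's
# datapoint, update the posterior (conjugate normal-normal), and record how
# much the re-optimized decision improves on today's optimum
gains = []
for _ in range(50_000):
    true_effect = random.gauss(mu0, sd0)
    obs = random.gauss(true_effect, noise_sd)
    precision = 1 / sd0**2 + 1 / noise_sd**2
    mu1 = (mu0 / sd0**2 + obs / noise_sd**2) / precision
    gains.append(best_loss(mu0) - best_loss(mu1))  # positive only if the decision flips

evsi = sum(gains) / len(gains)
profit = evsi - DAY_COST
print(evsi, profit)
```

Here the posterior is already so firmly above zero that one more day essentially never flips the skip decision, so EVSI rounds to $0 and the extra day is a ~$6 expected loss: qualitatively the same conclusion the full bootstrap reaches.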

Let's say each datapoint costs $2 (and since we're measuring 3 times a day, each day costs $6) between the bitter melon & the hassle. And we have already defined all the other data, the model, and the losses, so we can calculate EVSI.

Calculating EVSI for collecting one more datapoint is easy, but it's also interesting to calculate it historically and get an idea of how EVSI increased or decreased over the trial:

data <- melon2
sampleValues <- data.frame(N=NULL, newOptimumLoss=NULL, sampleValue=NULL, sampleValueProfit=NULL)
for (n in seq(from=1, to=(nrow(data)+10))) {

    evsis <- replicate(20, {
            # if n is more than we collected, bootstrap hypothetical new data; otherwise, just take that prefix
            # and pretend we are doing a sequential trial where we have only collected the first n observations thus far
            if (n > nrow(data)) { newData <- rbind(data, data[sample(1:nrow(data), n - nrow(data) , replace=TRUE),]) } else { newData <- data[1:n,] }

           kEVSI <- with(newData, run.jags(model1, data=list(n=nrow(newData), Blood=Read, Bitter.Melon=Bitter.Melon,
                                         Date=as.integer(Date), m=length(levels(Date)), Exercise=Exercise,
                                         Morning=as.integer(Measurement=="wake"), am10=as.integer(Measurement=="10am"),
                                         pm3=as.integer(Measurement=="3pm")),
                                        monitor=c("b2"), sample=1000, silent.jags=TRUE, summarise=FALSE))
            coeff <- as.mcmc.list(kEVSI, vars="b2")

            lossNonuse <- 0
            lossUse <- mean(sapply(coeff[[1]], function(bg) { return(-bg*2270 + 901); }))

            # compare to the previous estimated optimum using n-1 data
            if (n==1) { oldOptimum <- 0;  } else { oldOptimum <- sampleValues[n-1,]$newOptimumLoss; }
            newOptimum <- max(c(lossNonuse, lossUse))
            sampleValue <- newOptimum - oldOptimum
            sampleCost <- 2
            sampleValueProfit <- sampleValue - (n*sampleCost)

            return(list(N=n, newOptimumLoss=newOptimum, sampleValue=sampleValue, sampleValueProfit=sampleValueProfit))
    })

    sampleValues <- rbind(sampleValues, data.frame(N=n, newOptimumLoss=mean(unlist(evsis[2,])),
                                                   sampleValue=mean(unlist(evsis[3,])), sampleValueProfit=mean(unlist(evsis[4,]))))
}
sampleValues
#       N newOptimumLoss     sampleValue sampleValueProfit
# 1     1    0.000000000     0.000000000       -2.00000000
# 2     2    0.000000000     0.000000000       -4.00000000
# 3     3    0.000000000     0.000000000       -6.00000000
# 4     4    0.000000000     0.000000000       -8.00000000
# 5     5    0.000000000     0.000000000      -10.00000000
# 6     6    0.000000000     0.000000000      -12.00000000
# 7     7    0.000000000     0.000000000      -14.00000000
# 8     8    0.000000000     0.000000000      -16.00000000
# 9     9    0.000000000     0.000000000      -18.00000000
# 10   10  766.302423286   766.302423286      746.30242329
# 11   11  414.117599555  -352.184823732     -374.18482373
# 12   12   33.173212243  -380.944387312     -404.94438731
# 13   13  453.109658628   419.936446385      393.93644639
# 14   14  519.121851308    66.012192680       38.01219268
# 15   15  129.521213019  -389.600638289     -419.60063829
# 16   16 1220.999692932  1091.478479912     1059.47847991
# 17   17 2065.063702988   844.064010056      810.06401006
# 18   18 1173.637439342  -891.426263646     -927.42626365
# 19   19    0.000000000 -1173.637439342    -1211.63743934
# 20   20    0.000000000     0.000000000      -40.00000000
# 21   21    0.000000000     0.000000000      -42.00000000
# 22   22    0.000000000     0.000000000      -44.00000000
# 23   23    3.830036441     3.830036441      -42.16996356
# 24   24    0.000000000    -3.830036441      -51.83003644
# 25   25    0.000000000     0.000000000      -50.00000000
# 26   26    0.000000000     0.000000000      -52.00000000
# 27   27    0.000000000     0.000000000      -54.00000000
# ...
# 99   99    0.000000000     0.000000000     -198.00000000
# 100 100    0.000000000     0.000000000     -200.00000000
# 101 101  103.108222757   103.108222757      -98.89177724
# 102 102    0.000000000  -103.108222757     -307.10822276
# 103 103    0.000000000     0.000000000     -206.00000000
# 104 104    0.000000000     0.000000000     -208.00000000
# 105 105    0.000000000     0.000000000     -210.00000000
# 106 106    0.000000000     0.000000000     -212.00000000
# 107 107    0.000000000     0.000000000     -214.00000000
# 108 108    0.000000000     0.000000000     -216.00000000
# 109 109    0.000000000     0.000000000     -218.00000000
# 110 110    0.000000000     0.000000000     -220.00000000
# 111 111    0.000000000     0.000000000     -222.00000000
# 112 112    0.000000000     0.000000000     -224.00000000
# 113 113    0.000000000     0.000000000     -226.00000000
# 114 114    0.000000000     0.000000000     -228.00000000
# 115 115    0.000000000     0.000000000     -230.00000000

So it looks like by the 20th measurement or so (corresponding to day #11, halfway through the first randomization), LaFontaine could have been reasonably certain (assuming non-informative priors etc.) that the expected gain from further experimentation with bitter melon did not outweigh the cost of additional experimentation. And taking another 10 samples is likewise expected to be a net loss, with none of the bootstrapped datapoints being able to shift the posterior enough to justify taking bitter melon.


The final outcome suggests that LaFontaine should not take bitter melon and (probably) shouldn't experiment further with it; the high cost of diabetes, though, indicates he should experiment much more with other anti-diabetes interventions (it's not clear to me whether drugs such as metformin are a good idea prophylactically, but there are many potential interventions, like exercise).

There are many caveats to this conclusion:

  1. I used a noninformative prior on the effects of bitter melon, which implies that it's as likely for bitter melon to increase blood glucose as to decrease it; this strikes me as implausible, and if I had tried to meta-analyze the past studies on bitter melon, I would probably have come up with a much stronger prior in favor of bitter melon, in which case the EVSI of sampling would have taken much longer to go negative and might have reversed the recommendations.

    • on the other hand, the existing bitter melon papers also suggest that taking it in tablet form may be ineffective and only the fresh or juice forms work; in which case, the failure to show benefits was a foregone conclusion and LaFontaine shouldn't have bothered with testing something already believed not to work
  2. the temporal trend in the blood glucose is concerning because it is too steep to represent any kind of long-term trend (LaFontaine would be dead by now if his blood glucose really did increase by 0.36ng/ml a day), but suggests something wacky was going on during his self-experiment (could the test strips have expired and been going bad, such that the results are near-meaningless?); this wackiness is a joker in the deck: whatever is causing it could itself be neutralizing any benefit from bitter melon, or it could diverge from a linear trend in a way that increases the underlying sampling error beyond what is modeled

  3. the diabetes cost estimate is too low: I included only the direct medical costs, though the cost of lost QALYs is probably even larger, making the estimate not half what it should be. This underestimation would bias benefit estimates downwards and lead to premature ending of experimenting.

  4. in the other direction, the conversion from blood glucose reductions to diabetes risk is questionable; some of the anti-diabetes drugs like metformin are already believed to have effects not mediated solely through blood glucose. This overestimation of the effect of blood glucose reductions would bias estimates upwards and lead to too much experimenting.

I sus­pect prob­lem #3 & #4 mostly can­cel out, that prob­lem #1 would have been a prob­lem if La­Fontaine had ac­tu­ally con­ducted a se­quen­tial trial based on EVSI but since he over-col­lected data it does­n’t wind up be­ing an is­sue (a fa­vor­able prior prob­a­bly would have been can­celed out quick­ly), and #2 is the ma­jor rea­son that the re­sults could be wrong.

Still, an interesting self-experiment to try to analyze.

See Also

  1. While bitter melon is traditionally eaten, and many sweet fruits are not dangerous because they are evolved by the plant to be eaten by animals for various reasons, traditional use is far from a guarantee of safety (consider the widespread low-grade cyanide poisoning from cassava, one of the most common crops, whose danger apparently does not outweigh the caloric value), and bitterness in seeds or fruits is a warning sign of potential low-grade toxicity (enjoying a bitter food must be learned); apple seeds are bitter and contain cyanide, likewise bitter almonds ('sweet' or domesticated almonds have been bred for lack of cyanide); some compounds are safe for humans only because we are slightly different from insects & dogs; and fruits such as beach apples are outright dangerous to humans. Plants 'want' their fruit to be eaten, but only by the specific pollinators and seed-spreaders they are co-evolved with… Hence, any fruit whose name literally contains 'bitter' is suspicious.↩︎

  2. Specifically:

    …There was no statistical evidence of an interaction between the rosiglitazone and ramipril arms of the DREAM study for the primary outcomes, secondary outcomes, or their components (interaction p>0·11 for all; data not shown). The primary outcome of diabetes or death was seen in [statistically-]significantly fewer individuals in the rosiglitazone group than in the placebo group (hazard ratio [HR] 0·40, 95% CI 0·35–0·46; p < 0·0001; table 2). There was no difference in the number of deaths (0·91, 0·55–1·49; p = 0·7) and a large difference in the frequency of diabetes (0·38, 0·33–0·44; p < 0·0001) between the two groups (table 2). The event curves for the primary outcome diverged by the time of the first assessment (after 1 year of follow-up; figure 2).

    …Figure 5 shows the effect of rosiglitazone on fasting and 2-h plasma glucose concentrations. The median fasting plasma glucose concentration was 0·5 mmol/L lower in the rosiglitazone group than in the placebo group (p < 0·0001); the 2-h plasma glucose concentration was 1·6 mmol/L lower (p < 0·0001). Mean systolic and diastolic blood pressure were 1·7 mm Hg and 1·4 mm Hg lower, respectively, in the rosiglitazone group than in the placebo group (p < 0·0001). Furthermore, mean hepatic ALT concentrations during the first year of therapy were 4·2 U/L lower in patients treated with rosiglitazone than those in the placebo group (p < 0·0001). All results are for the final visit apart from the ALT difference, which was at 1 year. Of note, there was no difference in the use of antihypertensive agents in the two groups during the trial. Finally, by the final visit mean bodyweight had increased by 2·2 kg more in the rosiglitazone group than in the placebo group (p < 0·0001). This increase in bodyweight in the rosiglitazone group was associated with a lower waist-to-hip ratio (p < 0·0001) because of an increase in hip circumference of 1·8 cm; there was no effect on waist circumference (figure 6).

    …This large, prospective, blinded international clinical trial shows that 8 mg of rosiglitazone daily, together with lifestyle recommendations, substantially reduces the risk of diabetes or death by 60% in individuals at high risk for diabetes. The absolute risk difference between treatment groups of 14·4% means that for every seven people with impaired fasting glucose or impaired glucose tolerance who are prescribed rosiglitazone for 3 years, one will be prevented from developing diabetes. Moreover, rosiglitazone [statistically-]significantly increased the likelihood of regression to normoglycaemia by about 70–80% compared with placebo. The reduction in diabetes reported here is of much the same magnitude as the reduction achieved with lifestyle approaches4,5 and greater than the reductions reported previously with drugs such as metformin4 or acarbose.3