# Life Extension Cost-Benefits

Attempts at considering the profitability of life-extension interventions for healthy adults (statistics, decision theory, R, survival analysis, power analysis, Bayes)
created: 1 June 2015; modified: 02 Jan 2018; status: in progress; confidence: likely;

We all want to live longer, but I find most discussions of available interventions unsatisfying because they tend to ignore any consideration of whether the intervention is too expensive (as the joke goes about veganism, you may live longer but will you want to?), and be simultaneously too gullible (nutrition research; looking at any health parameter which improves) and too narrow-minded (I don’t care if an intervention has p>0.05 if that still means a 90% probability of benefit).

What I would like to see is a practical guide, taking a decision theory perspective, which follows a procedure something like the following, in taking interventions which:

1. reduce all-cause mortality,

All-cause mortality (ACM) is the end-point of choice for life-extension because it is both closest to what we care about (additional life) and the hardest of all end-points: it is difficult to cheat, miscount, or not notice, and by being simple, consistent, and unambiguous, it avoids many failure modes such as subgroup hacking or dropping outliers or tweaking covariates (either a larger fraction of the controls are dead or not). It also can typically be extracted from most studies, even if they otherwise fail to report much of the data.

From a benefit point of view, average reduction in all-cause mortality is excellent for ignoring any zero-sum tradeoffs: it’s no good if reported benefits are only because the doctors writing up a trial kept slicing deaths by different categorizations of causes until they finally found a p<0.05, or if the gains only appear by excluding some deaths as irrelevant, or if the gains in one cause of death come at the expense of another cause (is it really helpful to avoid death by heart attack if the intervention just means you die of cancer instead?). For example, one primate study of caloric restriction found benefits only if it excluded a bunch of deaths in the CR group and defined them as not age-related, ignoring monkeys who died while taking blood samples under anesthesia, from injuries or from infections, such as gastritis and endometriosis. (Even in the followup paper several years later, the all-cause mortality reduction is much smaller than the age-related subcategory.) From a life-extension perspective, it is irrelevant if an intervention reduces some kinds of diseases while making one much more prone to dying from routine causes such as injuries or minor operations under anesthesia (especially since, in modern societies, one is guaranteed to be operated on at some point) - it either reduces all-cause mortality or it doesn’t.
2. as estimated in randomized trials,

Correlation is not causation and correlates only rarely turn out to be usefully causal. With aging, the situation is especially severe since aging is the exponential increase in mortality with time as all bodily systems gradually begin to fail; one cannot simply take a correlation of some chemical with aging or mortality and expect an intervention on it to have more than a minute chance of making a difference, since almost every chemical or substance or biomarker will change with age (eg papers on magnesium levels often note that surprisingly, blood levels of magnesium don’t change substantially with age). Worse, over more than a century of concerted effort and tinkering with a bewildering variety of thousands of interventions from injecting embryos to consuming testicles (or crushing one’s own) to yogurt to yoga, it can safely be said that (almost?) all tried interventions have failed, and the prior odds are strongly against any new one.
3. on healthy adult human populations

Studies of interventions in the sick are also unhelpful, as (hopefully!) the benefits observed may come from their particular disorder being treated. Animal studies are also unhelpful, as results cross-species are highly unreliable and model organisms may age in an entirely different manner from humans, who are unusually long-lived as it is. One challenge for this criterion is studies done in elderly populations: because death rates are so high, they offer higher power to detect reductions in mortality; but because that death rate is pushed so high by the aging process, the subjects will tend to have many chronic conditions and diseases as well. Do we ignore studies recruiting from, for example, 70-90yos who all inherently have health problems? I would say it’s probably better to err on the side of inclusion here as long as subjects were not selected for any particular health problem.
4. for which there are enough trials to do a random-effects/multilevel model, so one can extract a posterior predictive interval of the benefit,

It’s important to include heterogeneity from population to population as part of the uncertainty. One of the subtleties often missed in dealing with modeling is that a 95% CI around a parameter is not a CI around the outcome or result.
5. letting one compare mortality reduction against intervention cost,

For making decisions, the costs of the intervention must be taken into account. A taxing exercise regimen may be proven to increase longevity, but is that of any value if the regimen takes away much more of your life than it gives back? It may be that no matter whether our certainty is 99% or 99.99% that it works, we would never choose to embark on that regimen. On the other hand, for a cheap enough intervention, we may be willing to accept substantially lower probability: baby aspirin, for example, costs $7 & <10 seconds each morning / <60 minutes a year and we might be willing to use it if our final probability of any benefit is a mere 50%.
6. then yielding an expected value of the intervention (and if <$0, perhaps further analysis about whether it has a reasonable chance of ever becoming profitable and what the Value of Information might be of further trials)

If an intervention passes all these criteria, we can be highly confident that we will gain life, rather than lose it.
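To make criterion 4 concrete, here is a minimal R sketch of the gap between a confidence interval on the pooled effect and a predictive interval for a new population. The pooled log-RR, its standard error, and the between-study SD `tau` are made-up illustrative numbers, not estimates from any meta-analysis discussed on this page. The normal approximation is a simplifying assumption; a full Bayesian multilevel model would give a posterior predictive distribution instead.

```r
## Hypothetical random-effects meta-analysis summary (illustrative numbers only)
pooledLogRR <- log(0.93)  # pooled mean effect on the log-RR scale
sePooled    <- 0.03       # standard error of the pooled mean
tau         <- 0.10       # between-study SD (heterogeneity)

## 95% CI for the *average* effect across populations
ciMean <- exp(pooledLogRR + c(-1, 1) * qnorm(0.975) * sePooled)

## Approximate 95% predictive interval for the effect in a *new* population:
## uncertainty about the mean plus population-to-population variation
sdPredictive <- sqrt(sePooled^2 + tau^2)
predictiveInterval <- exp(pooledLogRR + c(-1, 1) * qnorm(0.975) * sdPredictive)

## Probability that the effect in a new population is beneficial (RR < 1)
pBenefit <- pnorm(0, mean = pooledLogRR, sd = sdPredictive)

print(round(ciMean, 3))
print(round(predictiveInterval, 3))
print(round(pBenefit, 3))
```

With these inputs the CI on the mean excludes RR=1 while the predictive interval does not, which is the distinction that matters to an individual deciding whether to adopt the intervention.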

Of course, these criteria are stringent: we can, for example, exclude almost everything which might appear in an issue of Life Extension Magazine for being too small, conducted in animals, conducted in sick patients, not being a randomized experiment at all, or reporting benefits only on an extremely specific biological endpoint like some cholesterol-related metric you’ve never heard of. Medical journals don’t offer rich pickings either, as they understandably tend to focus on experiments done in sick people rather than healthy people (apparently there’s more funding for the former - who knew?). And when a randomized experiment is done in healthy adult humans, the experiment may either be too small or have run for too short a time, since annual mortality rates for many participant groups are 1% or less; for any sort of effect to show up, you want (very expensive, very rare) studies which enroll at least a thousand subjects for a decade or more. It would not be surprising if we must intone the ritual ending to all Cochrane Reviews: “there is insufficient evidence for a recommendation”.
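A rough back-of-the-envelope check of why even a thousand subjects over a decade is marginal, as a sketch only. It assumes a constant 1% annual mortality (a simplification, since hazard rises with age), a hypothetical true RR of 0.90, and 500 subjects per arm. The standard error is the usual large-sample approximation for a log risk ratio.

```r
annualMortality <- 0.01   # assumed constant annual death rate
years           <- 10
nPerArm         <- 500    # 1000 subjects total
trueRR          <- 0.90   # hypothetical true effect

## cumulative probability of dying during follow-up
pControl   <- 1 - (1 - annualMortality)^years
pTreatment <- trueRR * pControl

## expected deaths per arm
deathsControl   <- nPerArm * pControl
deathsTreatment <- nPerArm * pTreatment

## large-sample standard error of log(RR)
seLogRR <- sqrt(1/deathsControl - 1/nPerArm + 1/deathsTreatment - 1/nPerArm)

## approximate 95% interval one would expect around the true RR
expectedCI <- exp(log(trueRR) + c(-1, 1) * qnorm(0.975) * seLogRR)

print(round(c(control = deathsControl, treatment = deathsTreatment), 1))
print(round(expectedCI, 2))
```

Under these assumptions the two arms differ by only about 5 expected deaths, and the expected interval comfortably spans RR=1.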

Fortunately, I do know of a few plausible candidate interventions. For example, baby aspirin has been tested in what must be dozens of trials at this point (generally turning in relative risk/RRs like 0.90 or 0.95 in the meta-analyses), is almost too cheap to bother including the cost, and has minimal side-effects (rarely, internal bleeding); it would not be surprising if a cost-benefit analysis indicated the best decision is taking baby aspirin.

Unfortunately, no one has done this sort of thing before, so for my own use, while not an expert on any of the interventions or on health economics in general, I thought I might try to sketch out some such analyses.

# Interventions

Evaluated:

Database for candidates:

• http://geroprotectors.org/?page=8&q[s]=organism_name+asc

Possible interventions:

• Aspirin
• Bisphosphonate: https://www.fightaging.org/archives/2011/02/bisphosphonates-and-an-unusual-longevity.php https://www.fightaging.org/archives/2015/12/more-investigation-of-bisphosphonates-and-reduced-mortality.php (sex-specific? redundant with vitamin D, which might be through bone/fracture too?)
• antioxidants? except I think most increase ACM…
• exercise (aerobic or resistance, primarily)
• D-Glucosamine 22828954 http://www.ncbi.nlm.nih.gov/pubmed/?term=22828954
• Magnesium 24259558 http://www.ncbi.nlm.nih.gov/pubmed/?term=24259558
• Lithium Chloride 21301855 http://www.ncbi.nlm.nih.gov/pubmed/?term=21301855
• controlling blood pressure to <=120mm Hg http://www.nejm.org/doi/full/10.1056/NEJMoa1511939 http://www.nejm.org/doi/suppl/10.1056/NEJMoa1511939/suppl_file/nejmoa1511939_appendix.pdf

While SPRINT can’t be generalized to healthy middle-aged people, since its inclusion criteria required past cardiovascular problems or being high risk, one of its inclusion criteria was simply being >75. The total mortality HR for the <75yos was 0.77, and for the >75yos, 0.68: not a statistically significant difference, but certainly not evidence that the reduction of blood pressure has a lesser or different effect in the healthy >75yos than the unhealthy <75yos.
• exercise
• weight loss
• diet:

• Mediterranean has at least one study http://www.nejm.org/action/showImage?doi=10.1056%2FNEJMoa1200303&iid=f01

## Exclusions

Examples of interventions which are interesting to me or often come up, but which can’t be evaluated or wouldn’t ever be profitable, and reasons for exclusion:

• caloric restriction: promising, but current randomized studies are too short to show any effect on mortality (eg in A 2-Year Randomized Controlled Trial of Human Caloric Restriction: Feasibility and Effects on Predictors of Health Span and Longevity (the CALERIE study), Ravussin et al 2015, no one died, so there is no all-cause mortality estimate)
• castration: often reported to increase longevity. But we can reject it as practical for a number of reasons: too serious a reduction in quality of life (going well beyond the loss of sex), social ostracism, probably must be done before puberty for any benefits (when informed consent is impossible), may not be legal, not verified in any randomized trials (a concern because not all historical eunuchs seem to show life expectancy gains; Egyptian eunuchs reportedly had severe losses), and at least one of the proposed mechanisms may have lost efficacy in the modern setting (greater resistance to infectious diseases)
• rapamycin: famous for life-extension results in animal experiments (especially mice), but the animal experiment environments may be unrealistic (raised under controlled, sometimes SPF conditions), and rapamycin’s primary use in humans is as an immunosuppressant, where it has been linked with bad outcomes. Optimal dosage is unclear; at least one mouse study found a higher dose did not extend lifespan as much while increasing certain cancer rates (Bitto et al 2016). Rapamycin may never undergo clinical trials in healthy adult humans because the risks are too great, although animal trials, such as a large-dog trial, continue (see A randomized controlled trial to establish effects of short-term rapamycin treatment in 24 middle-aged companion dogs, Urfer et al 2017)
• alcohol: a glass a day has long been noted to correlate with longer life anywhere one looks, but the confounds for moderate responsible drinking are too noticeable to put much faith in it and the debate continues to rage. It’s troubling that the economics literature seems to find health benefits to taxing alcohol, and that the more powerful Mendelian randomization design (while far from infallible) shows harm rather than benefit. But the real issue is that despite seeming like the easiest experiment ever to run (hand out free booze), there apparently are no randomized trials.

(I might do alcohol anyway to get an idea of what the profitability would be like and how skeptical one must be to erase benefits.)

# Definitions

## Value of Life

The valuation of one year of life I am using here is $50,000, due to its commonness and conservativeness (some valuations would be much higher, and so tend to favor the use of interventions).

## Population survival curve

To work with life expectancies and changes in them, I take a standard Gompertz curve model of age-related mortality, using a fit of the modern Dutch population given in Cramer & Kaas 2013.

S <- function(t, RR=1) { exp(-((RR*0.000016443)/log(1.1124) * (1.1124^t - 1))) }
# plot(sapply(0:120, S), xlab="Age", ylab="Fraction surviving")
## the min is because past age 104, hazard>1 which is wrong, so need a ceiling
H <- function(t, RR) { min(1, (0 + (RR*0.000016443)*1.1124^t)) }

It is also useful to draw random samples from a Gompertz-distribution of age-at-deaths, which can be used for generative model-related things like power simulations:

rgompertz <- function(n, RR=1) {
  ageHazards <- sapply(0:120, function(t) { H(t, RR) })
  ## for each possible age, randomly sample a life event according to that age's hazard;
  ## then the subject dies at the age at which their first FALSE is found.
  ## So TRUE,TRUE,TRUE,FALSE,TRUE,FALSE ~> died 4yo.
  deathAges <- replicate(n, Position(function(alive) { !alive },
    sapply(ageHazards, function(h) { sample(c(TRUE,FALSE), 1, prob=c(1-h, h)) })))
  return(deathAges)
}
## example uses:
mean(rgompertz(1000))
# [1] 77.705
mean(rgompertz(1000, RR=0.50))
# [1] 84.219

### Converting risk reduction to average life expectancy gain

However, interpreting an RR/OR/STR isn’t as simple as “probability of death is reduced in each time period by 15%, we have average life expectancies of ~80, therefore, X will make me live an extra $0.15 \cdot 80=12$ years”. (The intuition here is that death rates increase so much with time that a one-off reduction in each period doesn’t make a huge long-term difference. You can see that any constant intervention moves the curve up a little bit but does not flatten it.)
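For reference, the Gompertz model encoded in the R functions `S` and `H` above can be written out explicitly. The symbols $a$ and $b$ are shorthand introduced here (not notation from Cramer & Kaas 2013) for the two constants in the code, $a = 0.000016443$ and $b = 1.1124$, with the RR acting as a constant multiplier on the hazard:

$$h(t) = \mathrm{RR} \cdot a \cdot b^{t}, \qquad S(t) = \exp\left(-\int_0^t h(s)\,ds\right) = \exp\left(-\frac{\mathrm{RR} \cdot a}{\ln b}\left(b^{t} - 1\right)\right)$$

Remaining life expectancy at age $x$ is then the integral of survival conditional on having reached $x$:

$$e(x) = \int_x^{\infty} \frac{S(t)}{S(x)}\,dt$$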
The Gompertz curve hazard can be multiplied by the RR to give a particular trajectory with that change in risk, and the mean can be obtained by integrating over a lifetime (more efficient than estimating by simulation using our rgompertz). Doing two such integrations, one with a baseline RR of 1 and one with the new RR, then gives the two different life expectancies, and subtraction indicates the net gain/loss:

lifeExpectancy <- function(RR, age1, age2) { integrate(function(t){S(t, RR) / S(age1, RR)}, age1, age2)$value }
lifeExpectancyGain <- function(RR, startingAge=0, endingAge=Inf)
{ lifeExpectancy(RR, startingAge, endingAge) -
lifeExpectancy(1, startingAge, endingAge) }

## visualize:
rrs <- seq(0.01,2,by=0.05)
gainsVsRR <- data.frame(RR=rrs, Gain=sapply(rrs, lifeExpectancyGain))
plot(gainsVsRR, ylab="Gain (in years)")
RR Gain
0.01 43.22
0.06 26.40
0.11 20.71
0.16 17.19
0.21 14.64
0.26 12.64
0.31 10.99
0.36 9.58
0.41 8.36
0.46 7.28
0.51 6.32
0.56 5.44
0.61 4.64
0.66 3.90
0.71 3.21
0.76 2.57
0.81 1.98
0.86 1.41
0.91 0.88
0.96 0.38
1.01 -0.09
1.06 -0.55
1.11 -0.98
1.16 -1.39
1.21 -1.79
1.26 -2.17
1.31 -2.53
1.36 -2.88
1.41 -3.22
1.46 -3.55
1.51 -3.86
1.56 -4.17
1.61 -4.46
1.66 -4.75
1.71 -5.03
1.76 -5.30
1.81 -5.56
1.86 -5.82
1.91 -6.06
1.96 -6.31

If we plot a lot of samples of RR vs gain to get some intuition, it looks like a logarithmic curve, and indeed a linear model of gain on log(RR) turns out to fit extremely well (R² = 0.9999), certainly good enough for our purposes, so if we needed better performance we could replace two integrals by 4 arithmetic operations:

summary(lm(Gain ~ log(RR), data=gainsVsRR))
#         Min          1Q      Median          3Q         Max
# -0.00110357 -0.00095014 -0.00049523  0.00025105  0.39525452
#
# Coefficients:
#                  Estimate    Std. Error      t value   Pr(>|t|)
# (Intercept)  0.0009609580  0.0002853556      3.36758 0.00077752
# log(RR)     -9.3742723770  0.0006917444 -13551.64204 < 2.22e-16
#
# Residual standard error: 0.01024382 on 1499 degrees of freedom
# Multiple R-squared:  0.9999918,   Adjusted R-squared:  0.9999918
# F-statistic: 1.83647e+08 on 1 and 1499 DF,  p-value: < 2.2204e-16
lifeExpectancyGainLogApproximation <- function(RR) { 0.0009609580 + -9.3742723770*log(RR) }
lifeExpectancyGain(0.85)
# [1] 1.523915331
lifeExpectancyGainLogApproximation(0.85)
# [1] 1.52445767
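
One way to see why the log fit is so good, as a sketch of the reasoning: write $a = 0.000016443$ and $b = 1.1124$ for the Gompertz constants used in `S` (shorthand introduced here). Multiplying an exponentially growing hazard by a constant RR is the same as shifting it along the age axis:

$$\mathrm{RR} \cdot a \cdot b^{t} = a \cdot b^{\,t + \ln(\mathrm{RR})/\ln(b)}$$

So someone with a constant RR has the hazard of an untreated person who is $-\ln(\mathrm{RR})/\ln(b)$ years younger, and the from-birth gain is approximately that shift:

$$\text{gain} \approx -\frac{\ln \mathrm{RR}}{\ln b} = -\frac{\ln \mathrm{RR}}{\ln 1.1124} \approx -9.39 \ln \mathrm{RR}$$

This is close to the fitted slope of -9.374, and for RR=0.85 gives about 1.53 years. It is only approximate because it ignores the small amount of mortality at the start of life.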

Anyway, with this conversion of risk to years of gained life, we can see how much bang we get:

lifeExpectancyGain(0.90)
# [1] 0.9879195324
lifeExpectancyGain(0.85)
# [1] 1.523913984
lifeExpectancyGain(0.85) - lifeExpectancyGain(0.85, startingAge=30)
# [1] 0.02529745239
(0.02529745239 /  1.523913984 ) * 100
# [1] 1.660031515
lifeExpectancyGain(0.85) - lifeExpectancyGain(0.85, startingAge=60)
# [1] 0.2763949673

The back-loading of mortality means that a constant reduction in mortality risk like an RR of 0.85 would not deliver the naive expectation of 12 years but far less: ~1.5 years. On the plus side, this also means that if we wait to use an intervention until what might seem like late in life (like age 30), we have not suffered a large opportunity cost (of ~40%) like one might think, but much less (~1.7%). The gains are minimal early on, and the opportunity cost only begins to really mount starting in one’s 50s or 60s (delaying until 60 forgoes ~0.28 years, or ~18% of the gain). This has the important implication that such interventions will benefit younger people much less (at least when it comes to interventions which offer only a constant reduction in risk and do not affect the acceleration of mortality caused by the aging process itself), and so the risks are more likely to make interventions unprofitable for them.
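
To see how the opportunity cost of delay mounts with age, the calculation can be repeated over a grid of starting ages. This is a self-contained sketch: it redefines `S` with the same constants, and, as assumptions of the sketch rather than of the analysis above, integrates to age 130 instead of `Inf` (survival past 130 is negligible under this curve) and tightens `rel.tol` so that small differences between integrals are not swamped by numerical error.

```r
## Gompertz survival curve, same constants as S() in this section
S <- function(t, RR = 1) { exp(-((RR * 0.000016443) / log(1.1124) * (1.1124^t - 1))) }

## Remaining life expectancy at startingAge (upper limit 130 is an assumption)
lifeExpectancy <- function(RR, startingAge, endingAge = 130) {
    integrate(function(t) { S(t, RR) / S(startingAge, RR) },
              startingAge, endingAge, rel.tol = 1e-8)$value
}
lifeExpectancyGain <- function(RR, startingAge = 0) {
    lifeExpectancy(RR, startingAge) - lifeExpectancy(1, startingAge)
}

## Share of the from-birth gain forgone by starting at each age
startingAges  <- seq(0, 80, by = 10)
gainFromBirth <- lifeExpectancyGain(0.85)
opportunityCost <- data.frame(
    StartingAge = startingAges,
    Gain = sapply(startingAges, function(a) { lifeExpectancyGain(0.85, startingAge = a) }))
opportunityCost$PercentForgone <-
    100 * (gainFromBirth - opportunityCost$Gain) / gainFromBirth
print(round(opportunityCost, 3))
```

The age-30 and age-60 rows should reproduce the ~1.7% and ~18% figures from the calculations above.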

#### Maximum possible profit

More generally, it would be interesting to get an idea of what is the most profit possible for any particular reduction in ACM, by granting the most favorable possible assumptions: certainty that the RR is exact, that the intervention is side-effect-less and free both to start and continue, and it can begin at birth if need be. For the plausible range of RRs, 0.5-1.0:

profitByAge <- function(t, RR=0.85, yearValue=50000, annualCost, startCost, probabilityPenalty=(1/3)) {
(lifeExpectancyGain(RR, startingAge=t) * yearValue * probabilityPenalty) -
(annualCost*lifeExpectancy(RR,t,Inf) + startCost) }

rrs <- seq(0.5,1,by=0.025)
startAge <- 0
data.frame(RR=rrs,
Years=sapply(rrs, function(r) { lifeExpectancyGain(r, startAge) }),
Maximum.Profit=sapply(rrs, function(r){profitByAge(startAge, RR=r, annualCost=0, startCost=0,
probabilityPenalty=1)}))
#       RR        Years Maximum.Profit
# 1  0.500 6.5010477923   325052.38962
# 2  0.525 6.0433308206   302166.54103
# 3  0.550 5.6069241021   280346.20511
# 4  0.575 5.1899323362   259496.61681
# 5  0.600 4.7907023355   239535.11678
# 6  0.625 4.4077834664   220389.17332
# 7  0.650 4.0398958538   201994.79269
# 8  0.675 3.6859046025   184295.23012
# 9  0.700 3.3447986050   167239.93025
# 10 0.725 3.0156733541   150783.66770
# 11 0.750 2.6977164037   134885.82019
# 12 0.775 2.3901950076   119509.75038
# 13 0.800 2.0924464396   104622.32198
# 14 0.825 1.8038691126    90193.45563
# 15 0.850 1.5239153307    76195.76654
# 16 0.875 1.2520850389    62604.25194
# 17 0.900 0.9879204526    49396.02263
# 18 0.925 0.7310014242    36550.07121
# 19 0.950 0.4809414268    24047.07134
# 20 0.975 0.2373840605    11869.20302
# 21 1.000 0.0000000000        0.00000

This provides upper bounds, and can help gauge plausibility of interventions for various variables - for example, if we were a 90-yo man considering embarking on caloric restriction, we might note that the upper bounds look like

# ...
# 13 0.800 2.0924464396   104622.32198
# 14 0.825 1.8038691126    90193.45563
# 15 0.850 1.5239153307    76195.76654
# 16 0.875 1.2520850389    62604.25194
# 17 0.900 0.9879204526    49396.02263
# 18 0.925 0.7310014242    36550.07121
# 19 0.950 0.4809414268    24047.07134
# 20 0.975 0.2373840605    11869.20302
# 21 1.000 0.0000000000        0.00000

Then we have to take into account the uncertainty that caloric restriction works at all in the extremely elderly (a halving is probably optimistic), the high probability its reduction in mortality will be relatively modest and in the 0.90s, the large financial & time costs of safely doing such a stringent diet (easily thousands of dollars in time and higher-quality food ingredients), the need to learn new recipes and count calories, the possible side-effects like increasing frailty & muscle loss… Given that the upper bound for plausible effect sizes is merely 1 year/$50k, it strikes me as extremely unlikely that such a man could justify CR.

We could also go the other direction and ask what is the worst RR that justifies a particular dollar cost, using uniroot to find the zero of profitByAge:

costToRR <- function (startingAge=30, annualCost, startCost=0, probabilityPenalty=1) {
  uniroot(f=function(r) { profitByAge(startingAge, RR=r, annualCost=annualCost, startCost=startCost,
                                      probabilityPenalty=probabilityPenalty) },
          lower=.Machine$double.eps, upper=1)$root }
costToRR(annualCost=1000)
# [1] 0.9008035856
costToRR(annualCost=10)
# [1] 0.9989756733

### Power analysis

How many subjects does it take to detect a specified RR? This is useful for planning clinical trials but also in interpreting them. We can do simulation power analyses of an odds-ratio test on all-cause mortality by using rgompertz to generate a number of subjects’ death dates for different RRs (eg 0.85 vs 1.0), counting how many deaths in each group fall within a specified followup period of years, and running some sort of analysis based on that data. (A more efficient design would be to simulate out a full survival curve and do a log-rank test on a Gompertz or nonparametric curve, but this requires individual-level data while most papers report only the OR; compare the Gompertz curve power analysis of Petrascheck & Miller 2017.)
gompertzSimulation <- function(RR, n1, n2, startingAge, followupYears) {
  ## keep only simulated subjects who survive to the enrollment age
  experimental <- Filter(function(deathDate){deathDate>=startingAge}, rgompertz(n1, RR))
  control <- Filter(function(deathDate){deathDate>=startingAge}, rgompertz(n2, 1))
  ## deaths which fall inside the followup window
  experimentalDeaths <- Filter(function(deathDate){deathDate<=(startingAge+followupYears)}, experimental)
  controlDeaths <- Filter(function(deathDate){deathDate<=(startingAge+followupYears)}, control)
  df <- data.frame(RR=RR, Start=startingAge, Followup=followupYears,
                   N1=length(experimental), N2=length(control),
                   N1.deaths=length(experimentalDeaths), N2.deaths=length(controlDeaths))
  return(cbind(df, data.frame(RR.observed=(df$N1.deaths/df$N1) / (df$N2.deaths/df$N2))))
}
gompertzPower <- function(RR, n1, n2, startingAge, followupYears, iters=100) {
  library(parallel)
  library(plyr)
  ldply(mclapply(1:iters, function(i) { gompertzSimulation(RR, n1, n2, startingAge, followupYears); }))
}

Given a function to estimate power for a particular set of trial characteristics, we can also search for the sample size which would make a trial (or fixed-effect meta-analysis) of particular RR effects well-powered as a heuristic for what sort of sample sizes we should expect (remembering that being well-powered is neither necessary nor optimal for decision-making):

gompertzPowerSearch <- function (RR, startingAge=50, followupYears=10, targetPower=0.8, startingN=4000,
                                 iters=100, increment=200) {
  n <- startingN
  nPower <- 0
  ## increase the per-arm n until the simulated power reaches the target
  while (nPower < targetPower) {
    n <- n+increment
    sims <- gompertzPower(RR, n, n, startingAge, followupYears, iters)
    pvalues <- sapply(1:nrow(sims), function(i) {
      prop.test(c(sims[i,]$N1.deaths, sims[i,]$N2.deaths),
                c(sims[i,]$N1, sims[i,]$N2), alternative="less")$p.value })
    nPower <- sum(pvalues<=0.05) / length(pvalues)
  }
  return(n)
}
startingN <- 4500
for (RR in seq(0.80, 0.95, by=0.01)) {
n <- gompertzPowerSearch(RR, startingN=startingN)
startingN <- n
print(paste0("RR ", RR, "; necessary sample: ", n))
}
RR Well-powered: 1 arm Total n
0.80 4700 9400
0.81 6100 12200
0.82 6300 12600
0.83 6900 13800
0.84 7100 14200
0.85 8900 17800
0.86 9300 18600
0.87 11100 22200
0.88 12500 25000
0.89 15700 31400
0.90 18700 37400
0.91 21500 43000
0.92 28500 57000
0.93 34100 68200
0.94 46500 93000
0.95 68700 137400
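
As a cross-check on the simulated sample sizes, the same question can be asked analytically. This is a sketch, not the method used for the table above: it computes each arm's probability of death during followup directly from the Gompertz survival curve and passes the two proportions to `power.prop.test` from base R's `stats` package. The helper names `deathProbability` and `analyticN` are hypothetical.

```r
## Gompertz survival curve, same constants as S() earlier on this page
S <- function(t, RR = 1) { exp(-((RR * 0.000016443) / log(1.1124) * (1.1124^t - 1))) }

## Probability that a subject alive at startingAge dies within followupYears
deathProbability <- function(RR, startingAge = 50, followupYears = 10) {
    1 - S(startingAge + followupYears, RR) / S(startingAge, RR)
}

## Per-arm n for a one-sided two-proportion test (normal approximation)
analyticN <- function(RR, startingAge = 50, followupYears = 10, targetPower = 0.8) {
    pControl   <- deathProbability(1, startingAge, followupYears)
    pTreatment <- deathProbability(RR, startingAge, followupYears)
    ceiling(power.prop.test(p1 = pControl, p2 = pTreatment, power = targetPower,
                            sig.level = 0.05, alternative = "one.sided")$n)
}

rrs <- seq(0.80, 0.95, by = 0.05)
print(data.frame(RR = rrs, N.per.arm = sapply(rrs, analyticN)))
```

The analytic figures should land in the same ballpark as the simulated table without matching it exactly, since the simulation uses only 100 iterations per step and stops at the first n whose estimated power crosses the target.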

## Prior on RRs

Useful interventions are rare and our analyses should incorporate a skeptical informative prior reflecting our knowledge that it is highly unlikely that any particular intervention will genuinely increase life expectancy; in particular, RRs <0.5 (or >2.0) are rare. Since there are no previous compilations of randomized effects for interventions on all-cause mortality in healthy people to use for priors, I fall back on general-purpose but still informative skeptical priors found to work well across many problems by Gelman et al 2008 & Pedroza et al 2015: the Cauchy(0, 2.5²) and Normal(0, 0.35²) priors.
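
To get a feel for how skeptical these priors are, one can compute the prior mass they place on large effects. The sketch below rests on two assumptions about the notation: that both priors are on the log(RR) scale, and that 2.5 and 0.35 are the scale/SD parameters.

```r
## RR < 0.5 or RR > 2 corresponds to |log(RR)| > log(2)
largeEffect <- log(2)

## Normal(0, SD 0.35) on log(RR): two-sided tail mass beyond RR = 0.5 or 2
normalTail <- 2 * pnorm(-largeEffect, mean = 0, sd = 0.35)

## Cauchy(0, scale 2.5) on log(RR): same tail mass under the heavier-tailed prior
cauchyTail <- 2 * pcauchy(-largeEffect, location = 0, scale = 2.5)

## Central 95% prior interval for the RR under the Normal prior
normalInterval <- exp(qnorm(c(0.025, 0.975), mean = 0, sd = 0.35))

print(round(c(normalTail = normalTail, cauchyTail = cauchyTail), 3))
print(round(normalInterval, 2))
```

Under this reading the Normal prior puts roughly 5% of its mass outside RR 0.5-2.0, matching the statement that such effects are rare, while the Cauchy prior is far more permissive of large effects.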

# Vitamin D

## Causality

### Quasi-experimental

The 3 Mendelian randomization studies I found show clear negatives to genetically lower vitamin D levels:

1. Examination of 141 single-nucleotide polymorphisms in a discovery cohort of 1514 white participants…Composite outcome of incident hip fracture, myocardial infarction, cancer, and mortality over long-term follow-up….Among Cardiovascular Health Study participants, low 25-hydroxyvitamin D concentration was associated with hazard ratios for risk of the composite outcome of 1.40 (95% CI, 1.12-1.74) for those who had 1 minor allele at rs7968585 and 1.82 (95% CI, 1.31-2.54) for those with 2 minor alleles at rs7968585. In contrast, there was no evidence of an association (estimated hazard ratio, 0.93 [95% CI, 0.70-1.24]) among participants who had 0 minor alleles at this single-nucleotide polymorphism.

Table 2 gives associations with all-cause mortality: 1.3 (1.0-1.6); 0.8 (0.7-1.0); 1.2 (0.9-1.5); 0.8 (0.6-1.1); 0.8 (0.6-1.0).
2. Trummer et al 2013, Vitamin D and mortality: a Mendelian randomization study

a prospective cohort study of 3,316 male and female participants [total]…RESULTS: In a linear regression model adjusting for month of blood sampling, age, and sex, vitamin D concentrations were predicted by GC genotype (p < 0.001), CYP2R1 genotype (P = 0.068), and DHCR7 genotype (p < 0.001), with a coefficient of determination (r2) of 0.175. During a median follow-up time of 9.9 years, 955 persons (30.0%) died, including 619 deaths from cardiovascular causes. In a multivariate Cox regression adjusted for classical risk factors, GC, CYP2R1, and DHCR7 genotypes were not associated with all-cause mortality

Table 3: 0.95 (0.86-1.06), 1.00 (0.91-1.10); 0.94 (0.84-1.05);
3. 95,766 white participants of Danish descent from three cohorts…The multi-variable adjusted hazard ratios for a 20 nmol/L lower plasma 25-hydroxyvitamin D concentration were 1.19 (95% confidence interval 1.14 to 1.25) for all cause mortality…For DHCR7 and CYP2R1, we constructed an aggregate allele score of 0-4 as the sum of the number of 25-hydroxyvitamin D lowering alleles across the two genotypes in each gene….The hazard ratio per one DHCR7/CYP2R1 allele score increase was 1.02 (1.00 to 1.03; p=0.03) for all cause mortality

### RCTs

I searched Pubmed and Google Scholar on 2 January 2015 with the searches:

• vitamin D[All Fields] AND ((random allocation[MeSH Terms] OR (random[All Fields] AND allocation[All Fields]) OR random allocation[All Fields] OR randomized[All Fields]) AND (mortality[Subheading] OR mortality[All Fields] OR mortality[MeSH Terms])) AND (Meta-Analysis[ptyp] AND humans[MeSH Terms])
• vitamin d all-cause mortality meta-analysis since 2012

After reviewing hits and reading all the relevant looking papers and following citations, the most useful were:

1. We identified [k=]18 independent randomized controlled trials, including 57 311 participants. A total of 4777 deaths from any cause occurred during a trial size-adjusted mean of 5.7 years. Daily doses of vitamin D supplements varied from 300 to 2000 IU. The trial size-adjusted mean daily vitamin D dose was 528 IU. In 9 trials, there was a 1.4- to 5.2-fold difference in serum 25-hydroxyvitamin D between the intervention and control groups. The summary relative risk for mortality from any cause was 0.93 (95% confidence interval, 0.87-0.99).

2. Barnard & Colon-Emeric 2010, Extraskeletal effects of vitamin D in older adults: cardiovascular disease, mortality, mood, and cognition:

Re-reports Autier & Gandini 2007 and is redundant.
3. Elamin et al 2011, Vitamin D and cardiovascular outcomes: a systematic review and meta-analysis:

Vitamin D was associated with nonsignificant effects on the patient-important outcomes of death

[RR 0.96 (0.93-1.00); p=0.08; k=30]
4. Patient-level data were available for 24 869 people in [k=]6 trials (WHI CaD and five randomised, placebo controlled trials of calcium supplements)…The hazard ratio for death (all causes) was 1.04 (0.95 to 1.14, p=0.4).

5. The IPD analysis yielded data on 70,528 randomized participants (86.8% females) with a median age of 70 (interquartile range, 62-77) yr. Vitamin D with or without calcium reduced mortality by 7% [hazard ratio, 0.93; 95% confidence interval (CI), 0.88-0.99]. However, vitamin D alone did not affect mortality, but risk of death was reduced if vitamin D was given with calcium (hazard ratio, 0.91; 95% CI, 0.84-0.98). The number needed to treat with vitamin D plus calcium for 3 yr to prevent one death was 151. Trial level meta-analysis (24 trials with 88,097 participants) showed similar results, i.e. mortality was reduced with vitamin D plus calcium (odds ratio, 0.94; 95% CI, 0.88-0.99), but not with vitamin D alone (odds ratio, 0.98; 95% CI, 0.91-1.06).

k=8 & k=24
6. Vitamin D therapy significantly decreased all-cause mortality with a duration of follow-up longer than 3 years with a RR (95% CI) of 0.94 (0.90-0.98). [k=13] No benefit was seen in a shorter follow-up periods with a RR (95% CI) of 1.04 (0.97-1.12) [k=29]

Zheng does not report the pooled RR of all studies. I calculate it at RR=0.981, p=0.331
7. Lazzeroni et al 2013, Vitamin D Supplementation and Cancer: Review of Randomized Controlled Trials:

Focused on cancer, not all-cause mortality and so not relevant, but does note some methodological issues that apply to most vitamin D RCTs: subjects often do not take the provided supplements, and may be taking vitamin D on their own; the vitamin D doses used can vary widely; baseline levels of vitamin D in blood also can vary widely, which leads to both greater noise and, given the common U or J-shaped dose-response curves, might hide benefits or cause harm; vitamin D itself might interact with other drugs or supplements (eg Lazzeroni et al note an apparent interaction with estrogen supplementation in the Women’s Health Initiative Trial).
8. Bjelakovic et al 2014 (update of Bjelakovic et al 2011), Vitamin D supplementation for prevention of cancer in adults:

Vitamin D decreased all-cause mortality (1854/24,846 (7.5%) versus 2007/25,020 (8.0%); RR 0.93 (95% CI 0.88 to 0.98); p = 0.009; I² = 0%; [k]=15 trials; 49,866 participants; moderate quality evidence), but TSA indicates that this finding could be due to random errors.

9. Bjelakovic et al 2014, Vitamin D supplementation for prevention of mortality in adults

Accordingly, 56 randomised trials with 95,286 participants provided usable data on mortality…Vitamin D decreased mortality in all 56 trials analysed together (5,920/47,472 (12.5%) vs 6,077/47,814 (12.7%); RR 0.97 (95% confidence interval (CI) 0.94 to 0.99); P = 0.02; I2 = 0%). More than 8% of participants dropped out. Worst-best case and best-worst case scenario analyses demonstrated that vitamin D could be associated with a dramatic increase or decrease in mortality. When different forms of vitamin D were assessed in separate analyses, only vitamin D3 decreased mortality (4,153/37,817 (11.0%) vs 4,340/38,110 (11.4%); RR 0.94 (95% CI 0.91 to 0.98); P = 0.002; I2 = 0%; 75,927 participants; 38 trials). Vitamin D2, alfacalcidol and calcitriol did not significantly affect mortality. A subgroup analysis of trials at high risk of bias suggested that vitamin D2 may even increase mortality, but this finding could be due to random errors. Trial sequential analysis supported our finding regarding vitamin D3, with the cumulative Z-score breaking the trial sequential monitoring boundary for benefit, corresponding to 150 people treated over five years to prevent one additional death. 
We did not observe any statistically significant differences in the effect of vitamin D on mortality in subgroup analyses of trials at low risk of bias compared with trials at high risk of bias; of trials using placebo compared with trials using no intervention in the control group; of trials with no risk of industry bias compared with trials with risk of industry bias; of trials assessing primary prevention compared with trials assessing secondary prevention; of trials including participants with vitamin D level below 20 ng/mL at entry compared with trials including participants with vitamin D levels equal to or greater than 20 ng/mL at entry; of trials including ambulatory participants compared with trials including institutionalised participants; of trials using concomitant calcium supplementation compared with trials without calcium; of trials using a dose below 800 IU per day compared with trials using doses above 800 IU per day; and of trials including only women compared with trials including both sexes or only men.

Note that despite the almost identical authors, title, & year, this differs from the other Bjelakovic et al 2014 Cochrane review in focusing on mortality rather than cancer prevention.
10. k=41, RR 0.96 (0.93-1.00)
11. In randomised controlled trials, relative risks for all cause mortality were 0.89 (0.80 to 0.99) for vitamin D3 supplementation and 1.04 (0.97 to 1.11) for vitamin D2 supplementation.

k=22 total: k=14 for vitamin D3 and k=8 for vitamin D2.
12. Newberry et al 2014 (update of Chung et al 2009), Vitamin D and Calcium: A Systematic Review of Health Outcomes (Update):

For the re-analysis conducted in the original report, they excluded 5 of 18 trials in the Autier 2007 meta-analysis: One trial was on patients with congestive heart failure,206 one was published only in abstract form,207 in one trial the controls also received supplementation with vitamin D, albeit with a smaller dose,208 and two trials used vitamin D injections.209,210 One additional eligible RCT (Lyons 2007)185 was identified and included in our meta-analysis….3 RCTs from the previous systematic review [based on Autier & Gandini 2007?] and an additional C rated RCT were included in our reanalysis. [k=4] Three used daily doses that ranged between 400 and 880 IU, and one used 100,000 IU every 3 months. Our meta-analysis of the 4 RCTs (13,833 participants) shows absence of significant effects of vitamin D supplementation on all-cause mortality (RR = 0.97, 95% CI: 0.92, 1.02; random effects model). There is little evidence for between-study heterogeneity in these analyses.

As the selection is so limited, Newberry et al 2014 is redundant with the others.
13. Vitamin D3: k=9, 0.91 (0.82-1.02). Vitamin D2: k=8, 1.04 (0.97-1.11). As well, an interesting comparison of randomized & correlational studies on the same outcomes:

10 (7%) outcomes were examined by both meta-analyses of observational studies and meta-analyses of randomised controlled trials: cardiovascular disease, hypertension, birth weight, birth length, head circumference at birth, small for gestational age birth, mortality in patients with chronic kidney disease, all cause mortality, fractures, and hip fractures (table 5⇓). The direction of the association/effect and level of statistical significance was concordant only for birth weight, but this outcome could not be tested for hints of bias in the meta-analysis of observational studies (owing to lack of the individual data). The direction of the association/effect but not the level of statistical significance was concordant in 6 outcomes (cardiovascular disease, hypertension, birth length, head circumference small for gestational age births, and all cause mortality), but only 2 of them (cardiovascular disease and hypertension) could be tested and were found to be free from hint of bias and of low heterogeneity in the meta-analyses of observational studies. For mortality in chronic kidney disease patients, fractures in older populations, and hip fractures [3], both the direction and the level of significance of the association/effect were not concordant.

14. Autier et al 2014, Vitamin D status and ill health: a systematic review (appendix):

Results of meta-analyses and pooled analyses consistently showed that supplementation could significantly reduce the risk of all-cause mortality, with relative risks ranging from 0.93 to 0.96 (table 4)…Randomised controlled trials: The seven most recent meta-analyses summarised results of 88 randomised trials, some of which were included in several meta-analyses (table 4). We identified an additional 84 articles of randomised trials not included in published meta-analyses (table 5)…

Table 5 (studies not included in the listed meta-analyses) does not mention all-cause mortality, so Autier et al 2014 is not reporting any new studies of ACM and is redundant with the others; Table 4 simply summarizes 3 past meta-analyses of ACM:

1. Elamin et al 2011: k=30, RR=0.96 (0.93-1.00)
2. Bjelakovic et al 2011: k=50 [error? should be k=15?], RR=0.95 (0.91-0.99)
3. Rejnmark et al 2012: k=62 [error? should be k=8 & k=24?] total, split between small and large (n>1000) trials:
• k=38, RR=0.93 (0.88-0.99)
• k=24, RR=0.94 (0.88-0.99)
15. Avenell et al 2014 (update of Avenell et al 2009, which is an update of Avenell et al 2004), Vitamin D and vitamin D analogues for preventing fractures associated with involutional and post-menopausal osteoporosis:

…mortality was not adversely affected by either vitamin D or vitamin D plus calcium supplementation ([k=]29 trials, 71,032 participants, RR 0.97, 95% CI 0.93 to 1.01)

• High dose, intermittent vitamin D therapy did not decrease all-cause mortality among older adults. The risk ratio (95% CI) was 1.04 (0.91-1.17)…Mortality data was available in seven trials [k=7]

The overall thrust of the meta-analyses is that there is consistent and substantial evidence that regular vitamin D3 supplementation reduces all-cause mortality somewhere in the RR=0.90-1 range, that vitamin D2 may be worse but calcium seems irrelevant, and this effect seems to be general: the observable between-study heterogeneity is very small despite large differences in subject populations by gender, age, and dosage, and subgroup analyses do not find the effect confined to particular demographics or that a reduction in a particular disease is driving the ACM reduction (which while potentially due to lack of power, would be consistent with the generality of benefits in both the correlational and Mendelian randomization studies).
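
To get a feel for what an RR in this range means in absolute terms, the “number needed to treat” can be backed out from a control-arm death rate and an RR. A minimal sketch in R, using the vitamin D3 figures quoted from Bjelakovic et al 2014 above; the helper name `nnt` is mine, and applying the pooled RR to the crude control-arm death rate is a simplifying assumption:

```r
# Number needed to treat (NNT) = 1 / absolute risk reduction (ARR),
# where ARR = control death rate * (1 - RR).
# `nnt` is an illustrative helper, not a function from any package.
nnt <- function(controlRate, rr) { 1 / (controlRate * (1 - rr)) }

# Vitamin D3 subgroup quoted from Bjelakovic et al 2014:
# 4,340 deaths among 38,110 controls; pooled RR = 0.94
controlRate <- 4340 / 38110
d3NNT <- nnt(controlRate, 0.94)
d3NNT
```

This comes out at roughly 146, in line with the review’s “150 people treated over five years to prevent one additional death”.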

The most comprehensive meta-analysis by my count is Bolland et al 2014, which pools k=41 to get an estimate of RR 0.96 (0.93-1.00; p=0.04) based on an experimental death rate of $\frac{3824}{40379}$=0.0947 vs control death rate of $\frac{3950}{40794}$=0.09682. The 41 studies are:

The meta-analysis itself can be reproduced given Bolland’s forest plot & table, Figure 5.

vitaminD <- read.csv(stdin(), header=TRUE,
colClasses=c("factor","integer","integer","integer","integer","integer","logical"))
Study, Year,E.deaths, E.n, C.deaths, C.n,Calcium
Inkovaara, 1983, 41, 181, 26, 146, FALSE
Corless, 1985, 8, 41, 8, 41, FALSE
Ooms, 1995, 11, 177, 21, 171, FALSE
Lips A, 1996, 223, 1291, 251, 1287, FALSE
Komulainen, 1998, 2, 232, 2, 232, FALSE
Meyer, 2002, 169, 569, 163, 575, FALSE
Bischoff, 2003, 1, 62, 4, 60, FALSE
Cooper, 2003, 0, 93, 1, 94, FALSE
Latham, 2003, 11, 121, 3, 122, FALSE
Trivedi, 2003, 224, 1345, 247, 1341, FALSE
Avenell, 2004, 4, 70, 3, 64, FALSE
Harwood, 2004, 24, 113, 5, 37, FALSE
Aloia, 2005, 1, 104, 2, 104, FALSE
Flicker, 2005, 76, 313, 85, 312, FALSE
Grant, 2005, 438, 2649, 460, 2643, FALSE
Broe, 2007, 5, 99, 2, 25, FALSE
Burleigh, 2007, 16, 101, 13, 104, FALSE
Lappe, 2007, 4, 446, 18, 734, FALSE
Lyons, 2007, 947, 1725, 953, 1715, FALSE
Smith, 2007, 355, 4727, 354, 4713, FALSE
Björkman, 2008, 27, 150, 9, 68, FALSE
Chel, 2008, 25, 166, 33, 172, FALSE
Prince, 2008, 0, 151, 1, 151, FALSE
Zhu, 2008, 0, 39, 2, 81, FALSE
Lips B, 2010, 1, 114, 0, 112, FALSE
Sanders, 2010, 40, 1131, 47, 1125, FALSE
Glendenning, 2012, 2, 353, 0, 333, FALSE
Inkovaara, 1983, 2, 353, 0, 333, TRUE
Chapuy A, 1992, 258, 1634, 274, 1636, TRUE
Dawson-Hughes, 1997, 2, 187, 2, 202, TRUE
Baeksgaard, 1998, 0, 80, 1, 80, TRUE
Krieg, 1999, 21, 124, 26, 124, TRUE
Chapuy B, 2002, 70, 389, 43, 194, TRUE
Harwood, 2004, 17, 75, 5, 37, TRUE
Meier, 2004, 0, 30, 1, 25, TRUE
Brazier, 2005, 3, 95, 1, 97, TRUE
Grant, 2005, 221, 1306, 217, 1332, TRUE
Porthouse, 2005, 57, 1321, 68, 1993, TRUE
WHI trials, 2006, 744, 18176, 807, 18106, TRUE
Bolton-Smith, 2007, 0, 62, 1, 61, TRUE
Zhu, 2008, 0, 39, 2, 41, TRUE
Salovaara, 2010, 15, 1718, 13, 1714, TRUE

library(metafor)
rem <- rma(measure="RR", ai=E.deaths, bi=(E.n-E.deaths), ci=C.deaths, di=(C.n-C.deaths),
data=vitaminD, method="REML"); rem
# Random-Effects Model (k = 42; tau^2 estimator: REML)
#
# tau^2 (estimated amount of total heterogeneity): 0 (SE = 0.0015)
# tau (square root of estimated tau^2 value):      0
# I^2 (total heterogeneity / total variability):   0.00%
# H^2 (total variability / sampling variability):  1.00
#
# Test for Heterogeneity:
# Q(df = 41) = 36.5200, p-val = 0.6699
#
# Model Results:
#
# estimate       se     zval     pval    ci.lb    ci.ub
#  -0.0354   0.0186  -1.8990   0.0576  -0.0719   0.0011
## since I^2=0, this is equivalent to a fixed-effect meta-analysis:
fem <- rma(measure="RR", ai=E.deaths, bi=(E.n-E.deaths), ci=C.deaths, di=(C.n-C.deaths),
data=vitaminD, method="FE"); fem
#
# Fixed-Effects Model (k = 42)
#
# Test for Heterogeneity:
# Q(df = 41) = 36.5200, p-val = 0.6699
#
# Model Results:
#
# estimate       se     zval     pval    ci.lb    ci.ub
#  -0.0354   0.0186  -1.8990   0.0576  -0.0719   0.0011
## 'metafor' works in log RR, so convert the log RR estimates back to RRs:
exp(-0.0719); exp(-0.0354); exp(0.0011)
# [1] 0.9306239536
# [1] 0.9652192513
# [1] 1.001100605

(Oddly, the per-study deaths/_n_s in the Bolland forest plot/table don’t add up to the claimed totals, but are short by a few dozen each; since the RR calculated by metafor works out to be about the same, I assume it has something to do with how Bolland decided to handle including multiple subgroups from studies and is not important.)
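
The fixed-effect estimate is nothing more exotic than an inverse-variance weighted average of the per-trial log RRs. As a sketch of that calculation in base R, here it is done by hand on four of the larger vitamin-D-only trials from the table above; the choice of subset is mine and purely illustrative, so the result is not expected to match the full k=42 estimate:

```r
# Illustrative subset: four of the larger vitamin-D-only trials from the table above
# (Lips A 1996, Trivedi 2003, Grant 2005, Smith 2007).
trials <- data.frame(
    Study    = c("Lips A", "Trivedi", "Grant", "Smith"),
    E.deaths = c(223, 224, 438, 355),
    E.n      = c(1291, 1345, 2649, 4727),
    C.deaths = c(251, 247, 460, 354),
    C.n      = c(1287, 1341, 2643, 4713))

# log relative risk and its approximate sampling variance for each trial
trials$yi <- with(trials, log((E.deaths/E.n) / (C.deaths/C.n)))
trials$vi <- with(trials, 1/E.deaths - 1/E.n + 1/C.deaths - 1/C.n)

# fixed-effect pooling: weight each trial by the inverse of its variance
w      <- 1 / trials$vi
pooled <- sum(w * trials$yi) / sum(w)
se     <- sqrt(1 / sum(w))

# back-transform the pooled log RR and its 95% interval to the RR scale
exp(c(lower = pooled - 1.96*se, estimate = pooled, upper = pooled + 1.96*se))
```

Large trials have small variances and so dominate the weighted average, which is why a handful of big studies largely determine the pooled RR.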

Turning from frequentist to Bayesian methods, we could take the fixed-effect at face-value and use a Bayesian proportion test, for example:

library(BayesianFirstAid)
bayes.prop.test(c(sum(vitaminD$E.deaths), sum(vitaminD$C.deaths)), c(sum(vitaminD$E.n), sum(vitaminD$C.n)))
#   Bayesian First Aid propotion test
#
# data: c(sum(vitaminD$E.deaths), sum(vitaminD$C.deaths)) out of c(sum(vitaminD$E.n), sum(vitaminD$C.n))
# number of successes:   4065,  4174
# number of trials:     42152, 42537
# Estimated relative frequency of success [95% credible interval]:
#   Group 1: 0.096 [0.094, 0.099]
#   Group 2: 0.098 [0.095, 0.10]
# Estimated group difference (Group 1 - Group 2):
#   0 [-0.0056, 0.0024]
# The relative frequency of success is larger for Group 1 by a probability
# of 0.203 and larger for Group 2 by a probability of 0.797 .
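
The same kind of answer can be approximated without any extra packages, which makes it clearer what the proportion test is doing. This is a sketch assuming independent flat Beta(1,1) priors on each arm’s death rate (my assumption for the illustration, not a claim about BFA’s internals), using the pooled counts from the output above:

```r
set.seed(2015)
# pooled deaths / participants, copied from the output above
deaths <- c(treatment = 4065, control = 4174)
n      <- c(treatment = 42152, control = 42537)

# With a flat Beta(1,1) prior (an assumption of this sketch), each arm's
# death rate has a Beta(1 + deaths, 1 + survivors) posterior.
draws <- 100000
pTreatment <- rbeta(draws, 1 + deaths["treatment"], 1 + n["treatment"] - deaths["treatment"])
pControl   <- rbeta(draws, 1 + deaths["control"],   1 + n["control"]   - deaths["control"])

# posterior probability that the treatment-arm death rate is the lower one
probBenefit <- mean(pTreatment < pControl)
probBenefit
# median and 95% credible interval for the relative risk
quantile(pTreatment / pControl, c(0.025, 0.5, 0.975))
```

Up to Monte Carlo error, `probBenefit` should land close to the 0.797 reported above.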

Pooling the raw counts with BFA has downsides: it assumes there is no heterogeneity at all, but an I²=0 doesn’t guarantee that there is none (it’d be surprising if there wasn’t), so the predictive distribution it gives is overly narrow; and BFA doesn’t easily include our informative priors. We can switch to bayesmeta, which is easier to use than JAGS since it works directly with metafor’s effect-size output:

library(bayesmeta)
brem <- bayesmeta(escalc(measure="RR", ai=E.deaths, bi=(E.n-E.deaths),
ci=C.deaths, di=(C.n-C.deaths), data=vitaminD),
mu.prior.mean=0, mu.prior.sd=0.35^2)
brem
# ...ML and MAP estimates:
#                          tau             mu
# ML joint     3.878278037e-06 -0.03540646003
# ML marginal  0.000000000e+00             NA
# MAP joint    3.931136439e-07 -0.03460123602
# MAP marginal 0.000000000e+00 -0.03590402065
#
# marginal posterior summary:
#                     tau              mu
# mode      0.00000000000 -0.035904020655
# median    0.02518404476 -0.036069671647
# mean      0.03104981837 -0.036152849917
# sd        0.02543031565  0.022522077099
# 95% lower 0.00000000000 -0.080152143106
# 95% upper 0.08001681423  0.008768546338
## converting from log RRs to RR:
exp(c(-0.080152143106, -0.036152849917, 0.008768546338))
# [1] 0.9229759113 0.9644928596 1.0088071027
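
Since the decision-theoretic question is how probable a benefit is rather than whether the interval excludes 1, the posterior summary can be turned into a probability that RR<1. A rough sketch, assuming the marginal posterior of the log RR is approximately normal (an approximation of mine; the exact posterior is not exactly normal):

```r
# marginal posterior mean & sd of mu (log RR), copied from the bayesmeta summary above
muMean <- -0.036152849917
muSD   <-  0.022522077099

# Normal approximation (an assumption of this sketch):
# probability that the log RR is below 0, ie. that the RR is below 1
probRRbelow1 <- pnorm(0, mean = muMean, sd = muSD)
probRRbelow1
```

Under that approximation the probability of a mortality reduction is a little under 95%, even though the 95% credible interval slightly overlaps RR=1.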

## posterior predictive distribution of RR values including heterogeneity

## Sensitivity

We can also use the formula to ask how small (close to 1) the RR must be before, given the specified purchase & side-effect costs, there is no longer any age at which metformin is profitable:

costToRR(startingAge=optimalAge, annualCost=metforminAnnualCost, startCost=metforminStartCost,
probabilityPenalty=(1/3))
# [1] 0.9520242072

So if one doesn’t believe the true causal ACM RR is <=0.95, then metformin use may not be advisable.

## TAME power analysis

Using the previously defined gompertzPower function, we can ask a question like: assuming a clinical trial is investigating a claimed RR of 0.85, plans to enrol 3000 people aged 70yo, so presumably 1500 in each arm, and will follow up for 5 years (a trial which bears a certain resemblance to plans for TAME), how often would the observed RR be <1.0?

df0.85 <- gompertzPower(0.85, 1000, 1000, 50, 10, 100)
table(df0.85$RR.observed<1.0)
# FALSE  TRUE
#    22    78

Let’s consider something more plausible like the baby aspirin or some metformin estimates, an RR of 0.94:

df0.94 <- gompertzPower(0.94, 1500, 1500, 70, 5, 1000)
table(df0.94$RR.observed<1.0)
# FALSE  TRUE
#  236   764

76% power is likewise still acceptable. But just because a clinical trial plans to enrol n=3000 doesn’t mean it will succeed, as there is always attrition, and attrition can be severe depending on how unpleasant the side-effects are; for metformin, for example, the attrition could easily be 30%:

df0.94attrit <- gompertzPower(0.94, 1500*0.70, 1500*0.70, 70, 5, 1000)
table(df0.94attrit$RR.observed<1.0)
# FALSE  TRUE
#   254   746

Knocking us back to 74%. So under more realistic assumptions about attrition and effect sizes, there’s a good chance that there would not be a correct signed result. Finally, we could consider the probability that we would get both a RR in the desired direction and the difference would be large enough to be statistically-significant at p<0.05 when treated as a two-proportion problem:

df <- df0.94attrit
pvalues <- sapply(1:nrow(df), function(i) {
    prop.test(c(df[i,]$N1.deaths, df[i,]$N2.deaths), c(df[i,]$N1, df[i,]$N2), alternative="less")$p.value })
table(pvalues<=0.05)
# FALSE  TRUE
#  890   110

So it is highly unlikely (~10%) we could expect the realistic trial to deliver a statistically-significant reduction in mortality. (This lack of power for statistical-significance on ACM may be why the TAME investigators talk primarily about looking for reductions in cancer/heart attacks/strokes: as more common events than death, reductions will show up more clearly there, as they do in past metformin trials.) We don’t necessarily care about statistical-significance, since this is a decision-analysis approach: our question is whether the gained data shifts the posterior probability enough to change the optimal decision. But it’s good to have an idea of power in that sense to make it easier to interpret the research article & media reports (eg the trial might be a failure from the point of view of statistical-significance, even as it increases the profitability of using metformin).
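
For readers without the gompertzPower function to hand, the two notions of power (correct sign vs statistical-significance) can be illustrated with a much cruder simulation. This is a sketch only: `simPower` is a hypothetical helper of mine, and it replaces the age-dependent Gompertz mortality with a single assumed death probability over the whole follow-up (`baseline`), so its numbers will not match the tables above.

```r
set.seed(2015)
# simPower: hypothetical helper, NOT the gompertzPower function used above.
# Assumes one fixed death probability over the whole follow-up (`baseline`)
# instead of age-dependent Gompertz mortality.
simPower <- function(rr, nPerArm, baseline, iters = 1000) {
    # simulated death counts in the treatment and control arms
    treated <- rbinom(iters, nPerArm, baseline * rr)
    control <- rbinom(iters, nPerArm, baseline)
    # one-sided two-proportion test for each simulated trial
    pvalues <- sapply(1:iters, function(i) {
        prop.test(c(treated[i], control[i]), c(nPerArm, nPerArm),
                  alternative = "less")$p.value })
    c(correctSign = mean(treated < control),
      significant = mean(pvalues <= 0.05))
}

# RR = 0.94, 1050 per arm after 30% attrition; baseline = 0.15 is an illustrative guess
simPower(0.94, 1050, baseline = 0.15)
```

The qualitative pattern is the point: a trial of this size usually gets the sign right but only rarely reaches p<0.05.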

If we really wanted 80% power for p<0.05, how big a sample do we need? In this case, it turns out to be ~7500 per arm (after any attrition) or ~15k total.
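
A closed-form cross-check on sample sizes of this order is available from base R’s power.prop.test. The sketch below is not the Gompertz simulation: `sampleSize` is a hypothetical wrapper, and the control-arm death probability over follow-up (`baseline`) is an assumed input, so it is swept over several values to show how strongly the answer depends on it.

```r
# sampleSize: hypothetical helper wrapping base R's power.prop.test.
# `baseline` (control-arm death probability over follow-up) is an assumed input.
sampleSize <- function(rr, baseline, power = 0.80) {
    power.prop.test(p1 = baseline, p2 = baseline * rr, power = power,
                    sig.level = 0.05, alternative = "one.sided")$n
}

# required n per arm for RR = 0.94 across several assumed baseline death rates
baselines <- c(0.10, 0.15, 0.20, 0.30)
perArm <- sapply(baselines, function(b) { ceiling(sampleSize(0.94, b)) })
data.frame(baseline = baselines, perArm = perArm)
```

The required n per arm falls steeply as the baseline death rate rises, since more deaths mean more information. An older or sicker trial population is therefore cheaper to power, and any single per-arm figure is only as good as the mortality model behind it.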

1. Pooled experimental arm death rate: $\frac{1175+3693}{8809+33752}$=0.1143770118, control death rate $\frac{1118+3880}{9193+33712}$=0.1164899196; then the test of difference: prop.test(c((1175+3693), (1118+3880)), c((8809+33752), (9193+33712)))
2. Or to put it more concretely, $(0.0968279649 - 0.09470269199) \cdot 40794 \approx 87$ excess deaths in the control group, who did not take vitamin D.