survival analysis directory


“Germline Mutation Rates in Young Adults Predict Longevity and Reproductive Lifespan”, Cawthon et al 2020

“Germline mutation rates in young adults predict longevity and reproductive lifespan”⁠, Richard M. Cawthon, Huong D. Meeks, Thomas A. Sasani, Ken R. Smith, Richard A. Kerber, Elizabeth O’Brien et al (2020-06-19; similar):

Ageing may be due to mutation accumulation across the lifespan, leading to tissue dysfunction, disease, and death. We tested whether germline autosomal mutation rates in young adults predict their remaining survival, and, for women, their reproductive lifespans. Age-adjusted mutation rates (AAMRs) in 61 women and 61 men from the Utah CEPH (Centre d’Etude du Polymorphisme Humain) families were determined. Age at death, cause of death, all-site cancer incidence, and reproductive histories were provided by the Utah Population Database, Utah Cancer Registry, and Utah Genetic Reference Project. Higher AAMRs were statistically-significantly associated with higher all-cause mortality in both sexes combined. Subjects in the top quartile of AAMRs experienced more than twice the mortality of bottom quartile subjects (hazard ratio [HR], 2.07; 95% confidence interval [CI], 1.21–3.56; p = 0.008; median survival difference = 4.7 years). Fertility analyses were restricted to women whose age at last birth (ALB) was ≥ 30 years, the age when fertility begins to decline. Women with higher AAMRs had statistically-significantly fewer live births and a younger ALB. Adult germline mutation accumulation rates are established in adolescence, and later menarche in women is associated with delayed mutation accumulation. We conclude that germline mutation rates in healthy young adults may provide a measure of both reproductive and systemic ageing. Puberty may induce the establishment of adult mutation accumulation rates, just when DNA repair systems begin their lifelong decline.

“Statistical Reliability Analysis for a Most Dangerous Occupation: Roman Emperor”, Saleh 2019

“Statistical reliability analysis for a most dangerous occupation: Roman emperor”⁠, Joseph Homer Saleh (2019-12-23; similar):

Popular culture associates the lives of Roman emperors with luxury, cruelty, and debauchery, sometimes rightfully so. One missing attribute in this list is, surprisingly, that this mighty office was most dangerous for its holder. Of the 69 rulers of the unified Roman Empire, from Augustus (d. 14 CE) to Theodosius I (d. 395 CE), 62% suffered violent death. This has been known for a while, if not quantitatively at least qualitatively. What is not known, however, and has never been examined is the time-to-violent-death of Roman emperors.

This work applies the statistical tools of survival data analysis to an unlikely population, Roman emperors, and it examines a particular event in their rule, not unlike the focus of reliability engineering⁠, but instead of their time-to-failure⁠, their time-to-violent-death. We investigate the temporal signature of this seemingly haphazard stochastic process that is the violent death of a Roman emperor, and we examine whether there is some structure underlying the randomness in this process or not.

Nonparametric and parametric results show that: (1) emperors faced a statistically-significantly high risk of violent death in the first year of their rule, which is reminiscent of infant mortality in reliability engineering; (2) their risk of violent death further increased after 12 years, which is reminiscent of the wear-out period in reliability engineering; (3) their failure rate displayed a bathtub-like curve, similar to that of a host of mechanical engineering items and electronic components. Results also showed that the stochastic process underlying the violent deaths of emperors is remarkably well captured by a (mixture) Weibull distribution⁠.
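The bathtub-shaped hazard Saleh describes can be illustrated with a toy Weibull mixture (hypothetical parameters chosen for illustration, not the fitted values from the paper): a component with shape k < 1 generates the early-reign “infant mortality”, while a component with shape k > 1 generates the late-reign “wear-out”.

```python
import math

def weibull_sf(t, k, lam):
    """Weibull survival function S(t) = exp(-(t/lam)^k)."""
    return math.exp(-((t / lam) ** k))

def weibull_pdf(t, k, lam):
    """Weibull density f(t) = (k/lam)*(t/lam)^(k-1)*exp(-(t/lam)^k)."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))

def mixture_hazard(t, w, k1, lam1, k2, lam2):
    """Hazard of a 2-component Weibull mixture: h(t) = f(t) / S(t)."""
    f = w * weibull_pdf(t, k1, lam1) + (1 - w) * weibull_pdf(t, k2, lam2)
    s = w * weibull_sf(t, k1, lam1) + (1 - w) * weibull_sf(t, k2, lam2)
    return f / s

# Hypothetical parameters: early-failure component (k=0.5, scale 2 years)
# mixed equally with a wear-out component (k=3, scale 14 years):
h = [mixture_hazard(t, 0.5, 0.5, 2.0, 3.0, 14.0) for t in (0.5, 6.0, 15.0)]
# the hazard is high in year 1, lower mid-reign, and high again after year 12
```

With these (illustrative) parameters, the hazard at 6 years is below both the first-year and the 15-year hazard, reproducing the qualitative bathtub shape.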

We discuss the interpretation and possible reasons for this uncanny result, and we propose a number of fruitful venues for future work to help better understand the deeper etiology of the spectacle of regicide of Roman emperors.

“Effect of Aspirin on Disability-free Survival in the Healthy Elderly”, McNeil et al 2018

“Effect of Aspirin on Disability-free Survival in the Healthy Elderly”⁠, John J. McNeil, Robyn L. Woods, Mark R. Nelson, Christopher M. Reid, Brenda Kirpach, Rory Wolfe, Elsdon Storey et al (2018-10-18; similar):

Background: Information on the use of aspirin to increase healthy independent life span in older persons is limited. Whether 5 years of daily low-dose aspirin therapy would extend disability-free life in healthy seniors is unclear.

Methods: From 2010 through 2014, we enrolled community-dwelling persons in Australia and the United States who were 70 years of age or older (or ≥65 years of age among blacks and Hispanics in the United States) and did not have cardiovascular disease, dementia, or physical disability. Participants were randomly assigned to receive 100 mg per day of enteric-coated aspirin or placebo orally. The primary end point was a composite of death, dementia, or persistent physical disability. Secondary end points reported in this article included the individual components of the primary end point and major hemorrhage.

Results: A total of 19,114 persons with a median age of 74 years were enrolled, of whom 9525 were randomly assigned to receive aspirin and 9589 to receive placebo. A total of 56.4% of the participants were women, 8.7% were nonwhite, and 11.0% reported previous regular aspirin use. The trial was terminated at a median of 4.7 years of follow-up after a determination was made that there would be no benefit with continued aspirin use with regard to the primary end point. The rate of the composite of death, dementia, or persistent physical disability was 21.5 events per 1000 person-years in the aspirin group and 21.2 per 1000 person-years in the placebo group (hazard ratio, 1.01; 95% confidence interval [CI], 0.92 to 1.11; p = 0.79). The rate of adherence to the assigned intervention was 62.1% in the aspirin group and 64.1% in the placebo group in the final year of trial participation. Differences between the aspirin group and the placebo group were not substantial with regard to the secondary individual end points of death from any cause (12.7 events per 1000 person-years in the aspirin group and 11.1 events per 1000 person-years in the placebo group), dementia, or persistent physical disability. The rate of major hemorrhage was higher in the aspirin group than in the placebo group (3.8% vs. 2.8%; hazard ratio, 1.38; 95% CI, 1.18 to 1.62; p < 0.001).
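For intuition about how the reported event rates arise, a small sketch of the arithmetic (the event counts below are back-calculated from the published rates under a simplifying assumption of uniform ~4.7-year follow-up, so they are hypothetical, not taken from the paper):

```python
def rate_per_1000_py(events, person_years):
    """Crude incidence rate per 1000 person-years of follow-up."""
    return 1000 * events / person_years

# Hypothetical person-time: ~9525 aspirin and ~9589 placebo participants,
# each followed for a median 4.7 years (a simplification of the real data).
aspirin_py = 9525 * 4.7
placebo_py = 9589 * 4.7

# Back-derive event counts consistent with the reported rates:
aspirin_events = round(21.5 * aspirin_py / 1000)
placebo_events = round(21.2 * placebo_py / 1000)

aspirin_rate = rate_per_1000_py(aspirin_events, aspirin_py)   # ~21.5
placebo_rate = rate_per_1000_py(placebo_events, placebo_py)   # ~21.2
rate_ratio = aspirin_rate / placebo_rate  # crude analogue of the reported HR ~1.01
```

The crude rate ratio of ~1.01 matches the reported hazard ratio here only because event rates were nearly constant across arms; in general a Cox model, not a ratio of crude rates, is what the paper reports.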

Conclusions: Aspirin use in healthy elderly persons did not prolong disability-free survival over a period of 5 years but led to a higher rate of major hemorrhage than placebo. (Funded by the National Institute on Aging and others; ASPREE ClinicalTrials.gov number, NCT01038583)

“Epigenetic Prediction of Complex Traits and Death”, McCartney et al 2018

“Epigenetic prediction of complex traits and death”⁠, Daniel L. McCartney, Anna J. Stevenson, Stuart J. Ritchie, Rosie M. Walker, Qian Zhang, Stewart W. Morris et al (2018-04-03; similar):

Background: Genome-wide DNA methylation (DNAm) profiling has allowed for the development of molecular predictors for a multitude of traits and diseases. Such predictors may be more accurate than self-reported phenotypes, and could have clinical applications. Here, penalized regression models were used to develop DNAm predictors for body mass index (BMI), smoking status, alcohol consumption, and educational attainment in a cohort of 5,100 individuals. Using an independent test cohort comprising 906 individuals, the proportion of phenotypic variance explained in each trait was examined for DNAm-based and genetic predictors. Receiver operating characteristic curves were generated to investigate the predictive performance of DNAm-based predictors, using dichotomized phenotypes. The relationship between DNAm scores and all-cause mortality (n = 214 events) was assessed via Cox proportional-hazards models.

Results: The DNAm-based predictors explained different proportions of the phenotypic variance for BMI (12%), smoking (60%), alcohol consumption (12%) and education (3%). The combined genetic and DNAm predictors explained 20% of the variance in BMI, 61% in smoking, 13% in alcohol consumption, and 6% in education. DNAm predictors for smoking, alcohol, and education but not BMI predicted mortality in univariate models. The predictors showed moderate discrimination of obesity (AUC = 0.67) and alcohol consumption (AUC = 0.75), and excellent discrimination of current smoking status (AUC = 0.98). There was poorer discrimination of college-educated individuals (AUC = 0.59).

Conclusion: DNAm predictors correlate with lifestyle factors that are associated with health and mortality. They may supplement DNAm-based predictors of age to identify the lifestyle profiles of individuals and predict disease risk.

List of abbreviations

DNAm: DNA methylation

BMI: Body mass index

AUC: Area under the curve

CpG: Cytosine-phosphate-Guanine dinucleotide

EWAS: Epigenome-wide association study

GS:SFHS: Generation Scotland: The Scottish family health study

LBC1936: Lothian birth cohort 1936

LASSO: Least absolute shrinkage and selection operator

HR: Hazard ratio

CI: Confidence interval

STRADL: Stratifying resilience and depression longitudinally

“Relationship Foraging: Does Time Spent Searching Predict Relationship Length?”, Cohen & Todd 2018

“Relationship Foraging: Does time spent searching predict relationship length?”⁠, Samantha E. Cohen, Peter M. Todd (2018; similar):

Animals foraging for resources often need to alternate between searching for and benefiting from patches of those resources. Here we explore whether such patterns of behavior can usefully be applied to the human search for romantic relationships. Optimal foraging theory suggests that foragers should alter their time spent in patches based on how long they typically spend searching between patches. We test whether human relationship search can be described as a foraging task that fits this OFT prediction. By analyzing a large, demographically representative dataset on marriage and cohabitation timing using survival analysis, we find that the likelihood of a relationship ending per unit time goes down with increased duration of search before that relationship, in accord with the foraging prediction. We consider the possible applications and limits of a foraging perspective on mate search and suggest further directions for study.

“Genome-wide Meta-analysis Associates HLA-DQA1/DRB1 and LPA and Lifestyle Factors With Human Longevity”, Joshi et al 2017

“Genome-wide meta-analysis associates HLA-DQA1/DRB1 and LPA and lifestyle factors with human longevity”⁠, Peter K. Joshi, Nicola Pirastu, Katherine A. Kentistou, Krista Fischer, Edith Hofer, Katharina E. Schraut et al (2017-10-13; backlinks; similar):

Genomic analysis of longevity offers the potential to illuminate the biology of human aging. Here, using genome-wide association meta-analysis of 606,059 parents’ survival, we discover two regions associated with longevity (HLA-DQA1/​DRB1 and LPA). We also validate previous suggestions that APOE, CHRNA3/​5, CDKN2A/​B, SH2B3 and FOXO3A influence longevity. Next we show that giving up smoking, educational attainment, openness to new experience and high-density lipoprotein (HDL) cholesterol levels are most positively genetically correlated with lifespan while susceptibility to coronary artery disease (CAD), cigarettes smoked per day, lung cancer, insulin resistance and body fat are most negatively correlated. We suggest that the effect of education on lifespan is principally mediated through smoking while the effect of obesity appears to act via CAD. Using instrumental variables, we suggest that an increase of one body mass index unit reduces lifespan by 7 months while 1 year of education adds 11 months to expected lifespan.

“Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles”, Periáñez et al 2017

“Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles”⁠, África Periáñez, Alain Saas, Anna Guitart, Colin Magne (2017-10-06; similar):

Reducing user attrition, i.e. churn, is a broad challenge faced by several industries. In mobile social games, decreasing churn is decisive for increasing player retention and raising revenues. Churn prediction models make it possible to understand player loyalty and to anticipate when players will stop playing a game. Thanks to these predictions, several initiatives can be taken to retain those players who are more likely to churn.

Survival analysis focuses on predicting the time of occurrence of a certain event, churn in our case. Classical methods, like regressions, could be applied only when all players have left the game. The challenge arises for datasets with incomplete churn information, as most players are still connecting to the game. This is called a censored data problem and is in the nature of churn. Censoring is commonly dealt with using survival analysis techniques, but due to the inflexibility of classical survival statistical algorithms, the accuracy achieved is often poor. In contrast, novel ensemble learning techniques, increasingly popular in a variety of scientific fields, provide high-quality prediction results.

In this work, we develop, for the first time in the social games domain, a survival ensemble model which provides a comprehensive analysis together with an accurate prediction of churn. For each player, we predict the probability of churning as a function of time, which permits us to distinguish various levels of loyalty profiles. Additionally, we assess the risk factors that explain the predicted player survival times. Our results show that churn prediction by survival ensembles significantly improves the accuracy and robustness of traditional analyses, like Cox regression.

“Statistical Inference for Data-adaptive Doubly Robust Estimators With Survival Outcomes”, Díaz 2017

“Statistical Inference for Data-adaptive Doubly Robust Estimators with Survival Outcomes”⁠, Iván Díaz (2017-09-01; similar):

The consistency of doubly robust estimators relies on consistent estimation of at least one of two nuisance regression parameters. In moderate to large dimensions, the use of flexible data-adaptive regression estimators may aid in achieving this consistency. However, n^(1/2)-consistency of doubly robust estimators is not guaranteed if one of the nuisance estimators is inconsistent. In this paper we present a doubly robust estimator for survival analysis with the novel property that it converges to a Gaussian variable at an n^(1/2)-rate for a large class of data-adaptive estimators of the nuisance parameters, under the only assumption that at least one of them is consistently estimated at an n^(1/4)-rate. This result is achieved through adaptation of recent ideas in semiparametric inference, which amount to: (1) Gaussianizing (ie. making asymptotically linear) a drift term that arises in the asymptotic analysis of the doubly robust estimator, and (2) using cross-fitting to avoid entropy conditions on the nuisance estimators. We present the formula of the asymptotic variance of the estimator, which allows computation of doubly robust confidence intervals and p-values. We illustrate the finite-sample properties of the estimator in simulation studies, and demonstrate its use in a phase III clinical trial for estimating the effect of a novel therapy for the treatment of HER2 positive breast cancer.

“Heritability of Schizophrenia and Schizophrenia Spectrum Based on the Nationwide Danish Twin Register”, Hilker et al 2017

2017-hilker.pdf: “Heritability of Schizophrenia and Schizophrenia Spectrum Based on the Nationwide Danish Twin Register”⁠, Rikke Hilker, Dorte Helenius, Birgitte Fagerlund, Axel Skytthe, Kaare Christensen, Thomas M. Werge, Merete Nordentoft et al (2017-08-30; similar):

Background: Twin studies have provided evidence that both genetic and environmental factors contribute to schizophrenia (SZ) risk. Heritability estimates of SZ in twin samples have varied methodologically. This study provides updated heritability estimates based on nationwide twin data and an improved statistical methodology.

Methods: Combining 2 nationwide registers, the Danish Twin Register and the Danish Psychiatric Research Register, we identified a sample of twins born between 1951 and 2000 (n = 31,524 twin pairs). Twins were followed until June 1, 2011. Liability threshold models adjusting for censoring with inverse probability weighting were used to estimate proband-wise concordance rates and heritability of the diagnoses of SZ and SZ spectrum disorders.

Results: The proband-wise concordance rate of SZ is 33% in monozygotic twins and 7% in dizygotic twins. We estimated the heritability of SZ to be 79%. When expanding illness outcome to include SZ spectrum disorders, the heritability estimate was similar (73%).
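The proband-wise concordance rate counts, among all probands, the fraction whose co-twin is also affected; a concordant pair can contribute two probands. A minimal sketch with hypothetical pair counts (not the Danish register's actual counts):

```python
def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Proband-wise concordance: fraction of probands whose co-twin
    is also affected. Each concordant pair contributes two probands
    (each with an affected co-twin); each discordant pair contributes
    one proband (whose co-twin is unaffected)."""
    affected_cotwins = 2 * concordant_pairs
    probands = 2 * concordant_pairs + discordant_pairs
    return affected_cotwins / probands

# Hypothetical MZ counts that would yield the reported 33% rate:
mz = probandwise_concordance(concordant_pairs=10, discordant_pairs=40)
# 20 affected co-twins among 60 probands = 1/3
```

(The paper's actual estimates additionally adjust for censoring via inverse probability weighting, which this sketch omits.)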

Conclusions: The key strength of this study is the application of a novel statistical method accounting for censoring in the follow-up period to a nationwide twin sample. The estimated 79% heritability of SZ is congruent with previous reports and indicates a substantial genetic risk. The high genetic risk also applies to a broader phenotype of SZ spectrum disorders. The low concordance rate of 33% in monozygotic twins demonstrates that illness vulnerability is not solely indicated by genetic factors.

[Keywords: censoring, concordance, heritability, register, schizophrenia, twin study]

“A Long Journey to Reproducible Results: Replicating Our Work Took Four Years and 100,000 Worms but Brought Surprising Discoveries”, Lithgow et al 2017

“A long journey to reproducible results: Replicating our work took four years and 100,000 worms but brought surprising discoveries”⁠, Gordon J. Lithgow, Monica Driscoll, Patrick Phillips (2017-08-22; backlinks; similar):

About 15 years ago, one of us (G.J.L.) got an uncomfortable phone call from a colleague and collaborator. After nearly a year of frustrating experiments, this colleague was about to publish a paper chronicling his team’s inability to reproduce the results of our high-profile paper in a mainstream journal. Our study was the first to show clearly that a drug-like molecule could extend an animal’s lifespan. We had found over and over again that the treatment lengthened the life of a roundworm by as much as 67%. Numerous phone calls and e-mails failed to identify why this apparently simple experiment produced different results between the labs. Then another lab failed to replicate our study. Despite more experiments and additional publications, we couldn’t work out why the labs were getting different lifespan results. To this day, we still don’t know. A few years later, the same scenario played out with different compounds in other labs…In another, now-famous example, two cancer labs spent more than a year trying to understand inconsistencies. It took scientists working side by side on the same tumour biopsy to reveal that small differences in how they isolated cells—vigorous stirring versus prolonged gentle rocking—produced different results. Subtle tinkering has long been important in getting biology experiments to work. Before researchers purchased kits of reagents for common experiments, it wasn’t unheard of for a team to cart distilled water from one institution when it moved to another. Lab members would spend months tweaking conditions until experiments with the new institution’s water worked as well as before. Sources of variation include the quality and purity of reagents, daily fluctuations in microenvironment and the idiosyncratic techniques of investigators. With so many ways of getting it wrong, perhaps we should be surprised at how often experimental findings are reproducible.

…Nonetheless, scores of publications continued to appear with claims about compounds that slow ageing. There was little effort at replication. In 2013, the three of us were charged with that unglamorous task…Our first task, to develop a protocol, seemed straightforward.

But subtle disparities were endless. In one particularly painful teleconference, we spent an hour debating the proper procedure for picking up worms and placing them on new agar plates. Some batches of worms lived a full day longer with gentler technicians. Because a worm’s lifespan is only about 20 days, this is a big deal. Hundreds of e-mails and many teleconferences later, we converged on a technique but still had a stupendous three-day difference in lifespan between labs. The problem, it turned out, was notation—one lab determined age on the basis of when an egg hatched, others on when it was laid. We decided to buy shared batches of reagents from the start. Coordination was a nightmare; we arranged with suppliers to give us the same lot numbers and elected to change lots at the same time. We grew worms and their food from a common stock and had strict rules for handling. We established protocols that included precise positions of flasks in autoclave runs. We purchased worm incubators at the same time, from the same vendor. We also needed to cope with a large amount of data going from each lab to a single database. We wrote an iPad app so that measurements were entered directly into the system and not jotted on paper to be entered later. The app prompted us to include full descriptors for each plate of worms, and ensured that data and metadata for each experiment were proofread (the strain names MY16 and my16 are not the same). This simple technology removed small recording errors that could disproportionately affect statistical analyses.

Once this system was in place, variability between labs decreased. After more than a year of pilot experiments and discussion of methods in excruciating detail, we almost completely eliminated systematic differences in worm survival across our labs (see ‘Worm wonders’)…Even in a single lab performing apparently identical experiments, we could not eliminate run-to-run differences.

…We have found one compound that lengthens lifespan across all strains and species. Most do so in only two or three strains, and often show detrimental effects in others.

“Machine Learning for Survival Analysis: A Survey”, Wang et al 2017

“Machine Learning for Survival Analysis: A Survey”⁠, Ping Wang, Yan Li, Chandan K. Reddy (2017-08-15; backlinks; similar):

Accurately predicting the time of occurrence of an event of interest is a critical problem in longitudinal data analysis. One of the main challenges in this context is the presence of instances whose event outcomes become unobservable after a certain time point or when some instances do not experience any event during the monitoring period. Such a phenomenon is called censoring which can be effectively handled using survival analysis techniques.

Traditionally, statistical approaches have been widely developed in the literature to overcome this censoring issue. In addition, many machine learning algorithms are adapted to effectively handle survival data and tackle other challenging problems that arise in real-world data. In this survey, we provide a comprehensive and structured review of the representative statistical methods along with the machine learning techniques used in survival analysis and provide a detailed taxonomy of the existing methods. We also discuss several topics that are closely related to survival analysis and illustrate several successful applications in various real-world application domains. We hope that this paper will provide a more thorough understanding of the recent advances in survival analysis and offer some guidelines on applying these approaches to solve new problems that arise in applications with censored data.
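The product-limit (Kaplan-Meier) estimator is the canonical nonparametric answer to the censoring problem the survey describes; a minimal from-scratch sketch (illustrative, not code from the survey):

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t) under
    right-censoring. times: observed times; events: 1 = event
    observed, 0 = censored. Returns [(t, S(t))] at each event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        n = d = 0
        # count deaths d and total removals n at this time
        while i < len(data) and data[i][0] == t:
            n += 1
            d += data[i][1]
            i += 1
        if d:
            s *= 1 - d / n_at_risk   # multiply in conditional survival
            curve.append((t, s))
        n_at_risk -= n
    return curve

# 2 events at t=1 (5 at risk), censoring at t=2, event at t=3 (2 at risk):
km = kaplan_meier([1, 1, 2, 3, 4], [1, 1, 0, 1, 0])
# S(1) = 1 - 2/5 = 0.6; S(3) = 0.6 * (1 - 1/2) = 0.3
```

The key move is that a censored observation leaves the risk set without triggering a survival-probability update, which is exactly what naive regression on observed times gets wrong.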

“Candy Japan’s New Box A/B Test”, Branwen 2016

Candy-Japan: “Candy Japan’s new box A/B test”⁠, Gwern Branwen (2016-05-06; backlinks; similar):

Bayesian decision-theoretic analysis of the effect of fancier packaging on subscription cancellations & optimal experiment design.

I analyze an A/​B test from a mail-order company of two different kinds of box packaging from a Bayesian decision-theory perspective, balancing posterior probability of improvements & greater profit against the cost of packaging & risk of worse results, finding that as the company’s analysis suggested, the new box is unlikely to be sufficiently better than the old. Calculating expected values of information shows that it is not worth experimenting on further, and that such fixed-sample trials are unlikely to ever be cost-effective for packaging improvements. However, adaptive experiments may be worthwhile.

“Revisiting the Risks of Bitcoin Currency Exchange Closure”, Moore et al 2016

“Revisiting the Risks of Bitcoin Currency Exchange Closure”⁠, Tyler Moore, Nicolas Christin, Janos Szurdi (2016; backlinks; similar):

Bitcoin has enjoyed wider adoption than any previous cryptocurrency; yet its success has also attracted the attention of fraudsters who have taken advantage of operational insecurity and transaction irreversibility. We study the risk investors face from the closure of Bitcoin exchanges, which convert between Bitcoins and hard currency. We examine the track record of 80 Bitcoin exchanges established between 2010 and 2015. We find that nearly half (38) have since closed, with customer account balances sometimes wiped out. Fraudsters are sometimes to blame, but not always. 25 exchanges suffered security breaches, 15 of which subsequently closed. We present logistic regressions using longitudinal data on Bitcoin exchanges aggregated quarterly. We find that experiencing a breach is correlated with 13× greater odds that an exchange will close in that same quarter. We find that higher-volume exchanges are less likely to close (each doubling in trade volume corresponds to a 12% decrease in the odds of closure). We also find that exchanges that derive most of their business from trading less popular (fiat) currencies, which are offered by at most one competitor, are less likely to close.
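The reported odds effects can be translated between the odds-ratio and log-odds (coefficient) scales with a few lines of arithmetic (one reading of the published numbers; the coefficients here are back-derived, not the paper's fitted values):

```python
import math

# A logistic-regression coefficient b is a log-odds-ratio: OR = exp(b).
breach_or = 13.0
breach_coef = math.log(breach_or)       # implied coefficient ~2.56 on the log-odds scale

# "Each doubling in trade volume corresponds to a 12% decrease in the odds
# of closure": with volume entered as log2(volume), the implied per-unit
# odds ratio is 0.88 and the implied coefficient is log(0.88).
doubling_or = 0.88
volume_coef = math.log(doubling_or)     # ~ -0.128

# Multiplicativity of odds ratios: a 16x (2^4) volume increase implies
or_16x = doubling_or ** 4               # ~0.60, i.e. ~40% lower odds of closure
```

The point of the exercise is that odds ratios compose multiplicatively while coefficients compose additively, which is why per-doubling effects can be chained this way.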

“When Should I Check The Mail?”, Branwen 2015

Mail-delivery: “When Should I Check The Mail?”⁠, Gwern Branwen (2015-06-21; backlinks; similar):

Bayesian decision-theoretic analysis of local mail delivery times: modeling deliveries as survival analysis, model comparison, optimizing check times with a loss function⁠, and optimal data collection.

Mail is delivered by the USPS mailman at a regular but not observed time; what is observed is whether the mail has been delivered at a time, yielding somewhat-unusual “interval-censored data”. I describe the problem of estimating when the mailman delivers, write a simulation of the data-generating process, and demonstrate analysis of interval-censored data in R using maximum-likelihood (survival analysis with Gaussian regression using survival library), MCMC (Bayesian model in JAGS), and likelihood-free Bayesian inference (custom ABC, using the simulation). This allows estimation of the distribution of mail delivery times. I compare those estimates from the interval-censored data with estimates from a (smaller) set of exact delivery-times provided by USPS tracking & personal observation, using a multilevel model to deal with heterogeneity apparently due to a change in USPS routes/​postmen. Finally, I define a loss function on mail checks, enabling: a choice of optimal time to check the mailbox to minimize loss (exploitation); optimal time to check to maximize information gain (exploration); Thompson sampling (balancing exploration & exploitation indefinitely), and estimates of the value-of-information of another datapoint (to estimate when to stop exploration and start exploitation after a finite amount of data).
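The interval-censored likelihood at the heart of this analysis is easy to sketch: each observation contributes the probability mass the delivery-time distribution assigns to the interval known to bracket the delivery. A toy maximum-likelihood fit under a Gaussian model (hypothetical check times; the original analysis is in R with the survival library, so this is only an illustrative Python sketch):

```python
import math

def norm_cdf(x, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def interval_loglik(mu, sigma, intervals):
    """Log-likelihood of interval-censored data: each observation says
    only that the delivery fell somewhere in (lo, hi)."""
    ll = 0.0
    for lo, hi in intervals:
        p = norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma)
        ll += math.log(max(p, 1e-300))  # guard against log(0)
    return ll

# Hypothetical mailbox checks bracketing delivery (hours after midnight):
intervals = [(10.0, 11.5), (10.5, 12.0), (9.5, 11.0), (10.0, 12.5), (10.5, 11.5)]

# Crude grid-search MLE over (mu, sigma):
best = max(
    ((mu / 10, s / 10) for mu in range(90, 131) for s in range(2, 31)),
    key=lambda p: interval_loglik(p[0], p[1], intervals),
)
```

A real analysis would use a proper optimizer (or MCMC, as in the original), but the grid search makes the likelihood's structure explicit: intervals only constrain, they never pin down, the delivery time.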

“Only the Bad Die Young: Restaurant Mortality in the Western US”, Luo & Stark 2014

“Only the Bad Die Young: Restaurant Mortality in the Western US”⁠, Tian Luo, Philip B. Stark (2014-10-31; similar):

Do 9 out of 10 restaurants fail in their first year, as commonly claimed? No. Survival analysis of 1.9 million longitudinal microdata records for 81,000 full-service restaurants in a 20-year U.S. Bureau of Labor Statistics non-public census of business establishments in the western US shows that only 17% of independently owned full-service restaurant startups failed in their first year, compared with 19% for all other service-providing startups. The median lifespan of restaurants is about 4.5 years, slightly longer than that of other service businesses (4.25 years). However, the median lifespan of a restaurant startup with 5 or fewer employees is 3.75 years, slightly shorter than that of other service businesses of the same startup size (4.0 years).

“Alerts Over Time”, Branwen 2013

Google-Alerts: “Alerts Over Time”⁠, Gwern Branwen (2013-07-01; backlinks; similar):

Does Google Alerts return fewer results each year? A statistical investigation

Has Google Alerts been sending fewer results the past few years? Yes. Responding to rumors of its demise, I investigate the number of results in my personal Google Alerts notifications 2007-2013, and find no overall trend of decline until I look at a transition in mid-2011 where the results fall dramatically. I speculate about the cause and implications for Alerts’s future.

“Predicting Google Closures”, Branwen 2013

Google-shutdowns: “Predicting Google closures”⁠, Gwern Branwen (2013-03-28; backlinks; similar):

Analyzing predictors of Google abandoning products; predicting future shutdowns

Prompted by the shutdown of Google Reader⁠, I ponder the evanescence of online services and wonder what is the risk of them disappearing. I collect data on 350 Google products launched before March 2013, looking for variables predictive of mortality (web hits, service vs software, commercial vs free, FLOSS, social networking, and internal vs acquired). Shutdowns are unevenly distributed over the calendar year or Google’s history. I use logistic regression & survival analysis (which can deal with right-censorship) to model the risk of shutdown over time and examine correlates. The logistic regression indicates socialness, acquisitions, and lack of web hits predict being shut down, but the results may not be right. The survival analysis finds a median lifespan of 2824 days with a roughly Type III survival curve (high early-life mortality); a Cox regression finds similar results as the logistic - socialness, free, acquisition, and long life predict lower mortality. Using the best model, I make predictions about probability of shutdown of the most risky and least risky services in the next 5 years (up to March 2018). (All data & R source code is provided.)

“2012 Election Predictions”, Branwen 2012

2012-election-predictions: “2012 election predictions”⁠, Gwern Branwen (2012-11-05; backlinks; similar):

Compiling academic and media forecasters’ 2012 American Presidential election predictions and statistically judging correctness; Nate Silver was not the best.

Statistically analyzing in R hundreds of predictions compiled for ~10 forecasters of the 2012 American Presidential election, and ranking them by Brier, RMSE, & log scores; the best overall performance seems to be by Drew Linzer and Wang & Holbrook, while Nate Silver appears as somewhat over-rated and the famous Intrade prediction market turning in a disappointing overall performance.

“‘HP: Methods of Rationality’ Review Statistics”, Branwen 2012

hpmor: “‘HP: Methods of Rationality’ review statistics”⁠, Gwern Branwen (2012-11-03; backlinks; similar):

Recording fan speculation for retrospectives; statistically modeling reviews for ongoing story with R

The unprecedented gap in Methods of Rationality updates prompts musing about whether readership is increasing enough & what statistics one would use; I write code to download reviews, clean them, parse them, load them into R, summarize the data & depict it graphically, run linear regression on a subset & all reviews, note the poor fit, develop a quadratic fit instead, and use it to predict future review quantities.

Then, I run a similar analysis on a competing fanfiction to find out when they will have equal total review-counts. A try at logarithmic fits fails; fitting a linear model to the previous 100 days of MoR and the competitor works much better, and they predict a convergence in <5 years.

A survival analysis finds no major anomalies in reviewer lifetimes, but an apparent increase in mortality for reviewers who started reviewing with later chapters, consistent with (but far from proving) the original theory that the later chapters’ delays are having negative effects.

“Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?”, SalahEldeen & Nelson 2012

“Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?”⁠, Hany M. SalahEldeen, Michael L. Nelson (2012-09-13; ; backlinks; similar):

Social media content has grown exponentially in recent years and the role of social media has evolved from just narrating life events to actually shaping them. In this paper we explore how many resources shared in social media are still available on the live web or in public web archives. By analyzing six different event-centric datasets of resources shared in social media in the period from June 2009 to March 2012, we found about 11% lost and 20% archived after just a year and an average of 27% lost and 41% archived after two and a half years. Furthermore, we found a nearly linear relationship between time of sharing of the resource and the percentage lost, with a slightly less linear relationship between time of sharing and archiving coverage of the resource. From this model we conclude that after the first year of publishing, nearly 11% of shared resources will be lost and after that we will continue to lose 0.02% per day.
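The paper's conclusion amounts to a piecewise-linear loss model: ~11% lost in the first year, then ~0.02 percentage points more per day. A sketch of that model (the within-first-year linear interpolation is my assumption, not the paper's fitted form):

```python
def fraction_lost(days_since_sharing):
    """Approximate fraction of shared resources lost after a given age,
    per the paper's reported rates: ~11% in year one, ~0.02 pp/day after."""
    if days_since_sharing <= 365:
        return 0.11 * days_since_sharing / 365  # assumed linear ramp to 11%
    return 0.11 + 0.0002 * (days_since_sharing - 365)

one_year = fraction_lost(365)   # ~11%
two_years = fraction_lost(730)  # ~11% + 365 days of 0.02 pp/day
```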

“DNM-related Arrests, 2011–2015”, Branwen 2012

DNM-arrests: “DNM-related arrests, 2011–2015”⁠, Gwern Branwen (2012-07-14; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

A census database of all publicly-reported arrests and prosecutions connected to the Tor-Bitcoin drug darknet markets 2011-2015, and analysis of mistakes.

I compile a table and discussion of all known arrests and prosecutions related to English-language Tor-Bitcoin darknet markets (DNMs) such as Silk Road 1, primarily 2011–2015, along with discussion of how they came to be arrested.

“A/B Testing Long-form Readability on”, Branwen 2012

AB-testing: “A/B testing long-form readability on”⁠, Gwern Branwen (2012-06-16; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

A log of experiments done on the site design, intended to render pages more readable, focusing on the challenge of testing a static site, page width, fonts, plugins, and effects of advertising.

To gain some statistical & web development experience and to improve my readers’ experiences, I have been running a series of CSS A/​B tests since June 2012. As expected, most do not show any meaningful difference.

“Silk Road 1: Theory & Practice”, Branwen 2011

Silk-Road: “Silk Road 1: Theory & Practice”⁠, Gwern Branwen (2011-07-11; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

History, background, visiting, ordering, using, & analyzing the drug market Silk Road 1

The cypherpunk movement laid the ideological roots of Bitcoin and the online drug market Silk Road; balancing previous emphasis on cryptography, I emphasize the non-cryptographic market aspects of Silk Road, which are rooted in cypherpunk economic reasoning, and give a fully detailed account of how a buyer might use market information to rationally buy, finishing by discussing the strengths and weaknesses of Silk Road and what future developments are predicted by cypherpunk ideas.

“Archiving URLs”, Branwen 2011

Archiving-URLs: “Archiving URLs”⁠, Gwern Branwen (2011-03-10; ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ⁠, ; backlinks; similar):

Archiving the Web, because nothing lasts forever: statistics, online archive services, extracting URLs automatically from browsers, and creating a daemon to regularly back up URLs to multiple sources.

Links on the Internet last forever or a year, whichever comes first. This is a major problem for anyone serious about writing with good references, as link rot will cripple several percent of all links each year, compounding over time.
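The compounding is what makes even a modest annual rot rate devastating over a writing career; a one-liner makes the point (the 3%/year rate below is an illustrative assumption, not a measured figure):

```python
def links_surviving(annual_rot_rate, years):
    """Fraction of links still reachable after compounding annual loss."""
    return (1 - annual_rot_rate) ** years

# At an assumed 3%/year rot rate, fewer than half of one's links survive 25 years.
survivors = links_surviving(0.03, 25)
```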

To deal with link rot, I present my multi-pronged archival strategy using a combination of scripts, daemons, and Internet archival services: URLs are regularly dumped from both my web browser’s daily browsing and my website pages into an archival daemon I wrote, which pre-emptively downloads copies locally and attempts to archive them in the Internet Archive⁠. This ensures a copy will be available indefinitely from one of several sources. Link rot is then detected by regular runs of linkchecker, and any newly dead links can be immediately checked for alternative locations, or restored from one of the archive sources.

As an additional flourish, my local archives are efficiently cryptographically timestamped using Bitcoin in case forgery is a concern, and I demonstrate a simple compression trick for substantially reducing sizes of large web archives such as crawls (particularly useful for repeated crawls such as my DNM archives).

“How Fast Does the Grim Reaper Walk? Receiver Operating Characteristics Curve Analysis in Healthy Men Aged 70 and Over”, Stanaway et al 2011

“How fast does the Grim Reaper walk? Receiver operating characteristics curve analysis in healthy men aged 70 and over”⁠, Fiona F. Stanaway, Danijela Gnjidic, Fiona M. Blyth, David G. Le Couteur, Vasi Naganathan, Louise Waite et al (2011; similar):

Objective: To determine the speed at which the Grim Reaper (or Death) walks.

Design: Population based prospective study.

Setting: Older community dwelling men living in Sydney, Australia.

Participants: 1705 men aged 70 or more participating in CHAMP (Concord Health and Ageing in Men Project).

Main Outcome Measures: Walking speed (m/​s) and mortality. Receiver operating characteristics curve analysis was used to calculate the area under the curve for walking speed and determine the walking speed of the Grim Reaper. The optimal walking speed was estimated using the Youden index (sensitivity + specificity-1), a common summary measure of the receiver operating characteristics curve, and represents the maximum potential effectiveness of a marker.

Results: The mean walking speed was 0.88 (range 0.15–1.60) m⁄s. The highest Youden index (0.293) was observed at a walking speed of 0.82 m⁄s (2 miles (about 3 km) per hour), corresponding to a sensitivity of 63% and a specificity of 70% for mortality. Survival analysis showed that older men who walked faster than 0.82 m⁄s were 1.23× less likely to die (95% confidence interval 1.10 to 1.37) than those who walked slower (p = 0.0003). A sensitivity of 1.0 was obtained when a walking speed of 1.36 m⁄s (3 miles (about 5 km) per hour) or greater was used, indicating that no men with walking speeds of 1.36 m⁄s or greater had contact with Death.

Conclusion: The Grim Reaper’s preferred walking speed is 0.82 m⁄s (2 miles (about 3 km) per hour) under working conditions. As none of the men in the study with walking speeds of 1.36 m⁄s (3 miles (about 5 km) per hour) or greater had contact with Death, this seems to be the Grim Reaper’s most likely maximum speed; for those wishing to avoid their allotted fate, this would be the advised walking speed.
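The Youden-index threshold selection used in the paper — maximizing J = sensitivity + specificity − 1 over candidate cutoffs — is easy to reproduce. The walking-speed data below is invented for illustration, not the CHAMP cohort:

```python
def youden_best_threshold(speeds, died):
    """Pick the walking-speed cutoff maximizing Youden's J = sensitivity + specificity - 1.

    Here 'test positive' means walking slower than the cutoff
    (slow gait predicting death).
    """
    best_cut, best_j = None, -1.0
    for cut in sorted(set(speeds)):
        tp = sum(1 for s, d in zip(speeds, died) if s < cut and d)
        fn = sum(1 for s, d in zip(speeds, died) if s >= cut and d)
        tn = sum(1 for s, d in zip(speeds, died) if s >= cut and not d)
        fp = sum(1 for s, d in zip(speeds, died) if s < cut and not d)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Invented toy data: the three slowest walkers die, the three fastest do not.
cut, j = youden_best_threshold([0.5, 0.6, 0.8, 0.9, 1.1, 1.3],
                               [True, True, True, False, False, False])
```

With perfectly separated toy data the optimal cutoff achieves J = 1; real ROC data, like the paper's 0.82 m/s cutoff with J = 0.293, sits far below that ceiling.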

“Random Survival Forests”, Ishwaran et al 2008

“Random survival forests”⁠, Hemant Ishwaran, Udaya B. Kogalur, Eugene H. Blackstone, Michael S. Lauer (2008-11-11; backlinks; similar):

We introduce random survival forests, a random forests method for the analysis of right-censored survival data. New survival splitting rules for growing survival trees are introduced, as is a new missing data algorithm for imputing missing data. A conservation-of-events principle for survival forests is introduced and used to define ensemble mortality, a simple interpretable measure of mortality that can be used as a predicted outcome. Several illustrative examples are given, including a case study of the prognostic implications of body mass for individuals with coronary artery disease. Computations for all examples were implemented using the freely available R-software package, randomSurvivalForest.

“Full Publication of Results Initially Presented in Abstracts”, Scherer et al 2007

2008-scherer.pdf: “Full publication of results initially presented in abstracts”⁠, Roberta W. Scherer, Patricia Langenberg, Erik von Elm (2007; ; backlinks; similar):

Studies initially reported as conference abstracts that have positive results are subsequently published as full-length journal articles more often than studies with negative results.

Less than half of all studies, and about 60% of randomized or controlled clinical trials, initially presented as summaries or abstracts at professional meetings are subsequently published as peer-reviewed journal articles. An important factor appearing to influence whether a study described in an abstract is published in full is the presence of ‘positive’ results in the abstract. Thus, the efforts of persons trying to collect all of the evidence in a field may be stymied, first by the failure of investigators to take abstract study results to full publication, and second, by the tendency to take to full publication only those studies reporting ‘significant’ results. The consequence of this is that systematic reviews will tend to over-estimate treatment effects.

Background: Abstracts of presentations at scientific meetings are usually available only in conference proceedings. If subsequent full publication of abstract results is based on the magnitude or direction of study results, publication bias may result. Publication bias, in turn, creates problems for those conducting systematic reviews or relying on the published literature for evidence.

Objectives: To determine the rate at which abstract results are subsequently published in full, and the time between meeting presentation and full publication.

Search methods: We searched MEDLINE⁠, Embase⁠, The Cochrane Library, Science Citation Index, reference lists, and author files. Date of most recent search: June 2003. Selection criteria We included all reports that examined the subsequent full publication rate of biomedical results initially presented as abstracts or in summary form. Follow-up of abstracts had to be at least two years.

Data collection and analysis: Two reviewers extracted data. We calculated the weighted mean full publication rate and time to full publication. Dichotomous variables were analyzed using relative risk and random effects models. We assessed time to publication using Kaplan-Meier survival analyses.

Main results: Combining data from 79 reports (29,729 abstracts) resulted in a weighted mean full publication rate of 44.5% (95% confidence interval (CI) 43.9 to 45.1). Survival analyses resulted in an estimated publication rate at 9 years of 52.6% for all studies, 63.1% for randomized or controlled clinical trials⁠, and 49.3% for other types of study designs.

‘Positive’ results defined as any ‘significant’ result showed an association with full publication (RR = 1.30; CI 1.14 to 1.47), as did ‘positive’ results defined as a result favoring the experimental treatment (RR = 1.17; CI 1.02 to 1.35), and ‘positive’ results emanating from randomized or controlled clinical trials (RR = 1.18, CI 1.07 to 1.30).

Other factors associated with full publication include oral presentation (RR = 1.28; CI 1.09 to 1.49); acceptance for meeting presentation (RR = 1.78; CI 1.50 to 2.12); randomized trial study design (RR = 1.24; CI 1.14 to 1.36); and basic research (RR = 0.79; CI 0.70 to 0.89). Higher quality of abstracts describing randomized or controlled clinical trials was also associated with full publication (RR = 1.30, CI 1.00 to 1.71).

Authors’ conclusions: Only 63% of results from abstracts describing randomized or controlled clinical trials are published in full. ‘Positive’ results were more frequently published than not ‘positive’ results.

“Statistics Review 12: Survival Analysis”, Bewick et al 2004

“Statistics review 12: survival analysis”⁠, Viv Bewick, Liz Cheek, Jonathan Ball (2004; backlinks):

This review introduces methods of analyzing data arising from studies where the response variable is the length of time taken to reach a certain end-point, often death. The Kaplan-Meier methods, log rank test and Cox’s proportional hazards model are described.
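Of the three methods the review covers, the log-rank test is the least often shown in code; a minimal two-sample version over right-censored data (toy data invented for illustration):

```python
def log_rank_statistic(times1, events1, times2, events2):
    """Two-sample log-rank chi-square statistic (1 df) for right-censored data.

    Compares observed vs. expected deaths in group 1 under the null
    hypothesis that both groups share one hazard function.
    """
    data = ([(t, e, 0) for t, e in zip(times1, events1)] +
            [(t, e, 1) for t, e in zip(times2, events2)])
    event_times = sorted({t for t, e, _ in data if e})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 0)  # at risk, group 1
        n2 = sum(1 for tt, _, g in data if tt >= t and g == 1)  # at risk, group 2
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d2 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        expected1 += d * n1 / n  # hypergeometric expectation
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return (observed1 - expected1) ** 2 / variance

# Toy data: group 1 fails earlier than group 2 (all deaths observed, no censoring).
stat = log_rank_statistic([2, 4, 6], [True] * 3, [8, 10, 12], [True] * 3)
```

The statistic is referred to a chi-square distribution with 1 degree of freedom; values above ~3.84 reject equal survival at p < 0.05.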

“Joel on Software: Strategy Letter V”, Spolsky 2002

“Joel on Software: Strategy Letter V”⁠, Joel Spolsky (2002-06-11; ⁠, ; backlinks; similar):

Every product in the marketplace has substitutes and complements. A substitute is another product you might buy if the first product is too expensive. Chicken is a substitute for beef. If you’re a chicken farmer and the price of beef goes up, people will want more chicken, and you will sell more. A complement is a product that you usually buy together with another product. Gas and cars are complements. Computer hardware is a classic complement of computer operating systems. And babysitters are a complement of dinner at fine restaurants. In a small town, when the local five star restaurant has a two-for-one Valentine’s day special, the local babysitters double their rates. (Actually, the nine-year-olds get roped into early service.) All else being equal, demand for a product increases when the prices of its complements decrease.

Let me repeat that because you might have dozed off, and it’s important. Demand for a product increases when the prices of its complements decrease. For example, if flights to Miami become cheaper, demand for hotel rooms in Miami goes up—because more people are flying to Miami and need a room. When computers become cheaper, more people buy them, and they all need operating systems, so demand for operating systems goes up, which means the price of operating systems can go up.

…Once again: demand for a product increases when the price of its complements decreases. In general, a company’s strategic interest is going to be to get the price of their complements as low as possible. The lowest theoretically sustainable price would be the “commodity price”—the price that arises when you have a bunch of competitors offering indistinguishable goods. So:

Smart companies try to commoditize their products’ complements.

If you can do this, demand for your product will increase and you will be able to charge more and make more.

“Brms: an R Package for Bayesian Generalized Multivariate Non-linear Multilevel Models Using Stan”, Bürkner 2022

“brms: an R package for Bayesian generalized multivariate non-linear multilevel models using Stan”⁠, Paul Bürkner (⁠, ; backlinks; similar):

The brms package provides an interface to fit Bayesian generalized (non-)linear multivariate multilevel models using Stan⁠, which is a C++ package for performing full Bayesian inference. The formula syntax is very similar to that of the package lme4 to provide a familiar and simple interface for performing regression analyses.

A wide range of response distributions are supported, allowing users to fit—among others—linear, robust linear, count data, survival, response times, ordinal, zero-inflated, and even self-defined mixture models all in a multilevel context. Further modeling options include non-linear and smooth terms, auto-correlation structures, censored data, missing value imputation⁠, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Multivariate models (ie. models with multiple response variables) can be fit, as well.

Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs.

Model fit can easily be assessed and compared with posterior predictive checks, cross-validation, and Bayes factors.

Truncation (statistics)


Survivorship curve


Survival function


Survival analysis


Proportional hazards model


Kaplan-Meier estimator


Failure rate


Censoring (statistics)