Predicting Google closures

Analyzing predictors of Google abandoning products; predicting future shutdowns
statistics, archiving, predictions, R, survival-analysis, Google
2013-03-28–2019-04-04 finished certainty: likely importance: 7


Prompted by the shutdown of Google Reader, I ponder the evanescence of online services and wonder what the risk is of them disappearing. I collect data on 350 Google products launched before March 2013, looking for variables predictive of mortality (web hits, service vs software, commercial vs free, FLOSS, social networking, and internal vs acquired). Shutdowns are unevenly distributed over the calendar year and over Google’s history. I use logistic regression & survival analysis (which can deal with right-censorship) to model the risk of shutdown over time and examine correlates. The logistic regression indicates socialness, acquisitions, and lack of web hits predict being shut down, but the results may not be right. The survival analysis finds a median lifespan of 2824 days with a roughly Type III survival curve (high early-life mortality); a Cox regression finds similar results as the logistic - socialness, FLOSS, and being an acquisition predict higher mortality, while direct monetization, web hits, and an early launch predict lower mortality. Using the best model, I make predictions about the probability of shutdown of the most risky and least risky services in the next 5 years (up to March 2018). (All data & R source code is provided.)

Google has occasionally shut down services I use, and not always with serious warning (many tech companies are like that - here one day and gone the next - though Google is one of the least-worst); this is frustrating and tedious.

Naturally, we are preached at by apologists that Google owes us nothing and if it’s a problem then it’s all our fault and we should have prophesied the future better (and too bad about the ordinary people who may be screwed over or the unique history1 or data casually destroyed).

But how can we have any sort of rational expectation if we lack any data or ideas about how long Google will run anything, or why, or how it chooses to do what it does? So in the following essay, I try to get an idea of the risk, and hopefully the results are interesting, useful, or both.

A glance back

“This is something that literature has always been very keen on, that technology never gets around to acknowledging. The cold wind moaning through the empty stone box. When are you gonna own up to it? Where are the Dell PCs? This is Austin, Texas. Michael Dell is the biggest tech mogul in central Texas. Why is he not here? Why is he not at least not selling his wares? Where are the dedicated gaming consoles you used to love? Do you remember how important those were? I could spend all day here just reciting the names of the casualties in your line of work. It’s always the electronic frontier. Nobody ever goes back to look at the electronic forests that were cut down with chainsaws and tossed into the rivers. And then there’s this empty pretense that these innovations make the world ‘better’…Like: ‘If we’re not making the world better, then why are we doing this at all?’ Now, I don’t want to claim that this attitude is hypocritical. Because when you say a thing like that at South By: ‘Oh, we’re here to make the world better’—you haven’t even reached the level of hypocrisy. You’re stuck at the level of childish naivete.”

Bruce Sterling, “Text of SXSW2013 closing remarks”

The shutdown of the popular service Google Reader, announced on 2013-03-13, has brought home to many people that some products they rely on exist only at Google’s sufferance: it provides the products for reasons that are difficult for outsiders to divine, may have little commitment to a product234, may not include their users’ best interests, may choose to withdraw the product at any time for any reason5 (especially since most of the products are services6 & not open-source in any way, and may be too tightly coupled with the Google infrastructure7 to be spun off or sold, so when the CEO turns against it & no Googlers are willing to waste their careers championing it…), and users have no voice8 - only exit as an option.

Andy Baio (“Never Trust a Corporation to do a Library’s Job”) summarizes Google’s track record:

“Google’s mission is to organize the world’s information and make it universally accessible and useful.”

For years, Google’s mission included the preservation of the past. In 2001, Google made their first acquisition, the Deja archives. The largest collection of Usenet archives, Google relaunched it as Google Groups, supplemented with archived messages going back to 1981. In 2004, Google Books signaled the company’s intention to scan every known book, partnering with libraries and developing its own book scanner capable of digitizing 1,000 pages per hour. In 2006, Google News Archive launched, with historical news articles dating back 200 years. In 2008, they expanded it to include their own digitization efforts, scanning newspapers that were never online. In the last five years, starting around 2010, the shifting priorities of Google’s management left these archival projects in limbo, or abandoned entirely. After a series of redesigns, Google Groups is effectively dead for research purposes. The archives, while still online, have no means of searching by date. Google News Archives are dead, killed off in 2011, now directing searchers to just use Google. Google Books is still online, but curtailed their scanning efforts in recent years, likely discouraged by a decade of legal wrangling still in appeal. The official blog stopped updating in 2012 and the Twitter account’s been dormant since February 2013. Even Google Search, their flagship product, stopped focusing on the history of the web. In 2011, Google removed the Timeline view letting users filter search results by date, while a series of major changes to their search ranking algorithm increasingly favored freshness over older pages from established sources. (To the detriment of some.)…As it turns out, organizing the world’s information isn’t always profitable. Projects that preserve the past for the public good aren’t really a big profit center. Old Google knew that, but didn’t seem to care.

In the case of Reader, while Reader destroyed the original RSS reader market, there still exist some usable alternatives; the consequence is a shrinkage in the RSS audience as inevitably many users choose not to invest in a new reader or give up or interpret it as a deathblow to RSS, and an irreversible loss of Reader’s uniquely comprehensive RSS archives back to 2005. Although to be fair, I should mention 2 major points in favor of Google:

  1. a reason I did and still do use Google services is that, with a few lapses like Website Optimizer aside, they are almost unique in enabling users to back up their data via the work of the Data Liberation Front, and they have been far more proactive than many companies in encouraging users to back up data from dead services - for example, in automatically copying Buzz users’ data to their Google Drive.
  2. Google’s practice of undercutting all market incumbents with free services also has very large benefits9, so we shouldn’t focus just on the seen.

But nevertheless, every shutdown still hurts its users to some degree, even if we - currently10 - can rule out the most devastating possible shutdowns, like Gmail. It would be interesting to see if shutdowns are to some degree predictable, whether there are any patterns, whether common claims about relevant factors can be confirmed, and what the results might suggest for the future.

Data

Sources

Dead products

“The summer grasses—
the sole remnants of many
brave warriors’ dreams.”

Basho

I begin with a list of services/APIs/programs that Google has shut down or abandoned, taken from the Guardian article “Google Keep? It’ll probably be with us until March 2017 - on average: The closure of Google Reader has got early adopters and developers worried that Google services or APIs they adopt will just get shut off. An analysis of 39 shuttered offerings says how long they get” by Charles Arthur. Arthur’s list seemed relatively complete, but I’ve added in >300 items he missed based on the Slate graveyard, Weber’s “Google Fails 36% Of The Time”11, the Wikipedia category/list for Google acquisitions, the Wikipedia category/list of Google products, and finally the official Google History. (The additional shutdowns include many shutdowns predating 2010, suggesting that Arthur’s list was biased towards recent shutdowns.)

In a few cases, the start dates are well-informed guesses (eg. Google Translate), and dates of abandonment/shut-down are even harder to get due to the lack of attention paid to most (Joga Bonito), so I infer the date from archived pages on the Internet Archive, news reports, blogs such as Google Operating System, the dates of press releases, the shutdown of closely related services (eReader Play based on Reader), source code repositories (AngularJS), etc.; some are listed as discontinued (Google Catalogs) but are still supported, or were merged into other software (Spreadsheets, Docs, Writely, News Archive), or sold/given to third parties (Flu Shot Finder, App Inventor, Body), or active effort has ceased but the content remains, and so I do not list those as dead; for cases of acquired software/services that were shut down, I date the start from Google’s purchase.

Live products

“…He often lying broad awake, and yet / Remaining from the body, and apart / In intellect and power and will, hath heard / Time flowing in the middle of the night, / And all things creeping to a day of doom.”

Alfred Tennyson, “The Mystic”, Poems, Chiefly Lyrical

A major criticism of Arthur’s post was that it was fundamentally using the wrong data: if you have a dataset of all Google products which have been shut down, you can make statements like “the average dead Google product lived 1459 days”, but you can’t infer very much about a live product’s life expectancy - because you don’t know if it will join the dead products. If, for example, only 1% of products ever died, then 1459 days would be a massive underestimate of the average lifespan of all currently living products. With his data, you can only make inferences conditional on a product eventually dying; you cannot make an unconditional inference. Unfortunately, the unconditional question “will it die?” is the real question any Google user wants answered!

So drawing on the same sources, I have compiled a second list of living products; the ratio of living to dead gives a base rate for how likely a randomly selected Google product is to be canceled within the 1997-2013 window, and with the date of the founding of each living product, we can also do a simple right-censored survival analysis which will let us make still better predictions by extracting concrete results like mean time to shutdown. Some items are dead in the most meaningful sense since they have been closed to new users (Sync), lost major functionality (FeedBurner, Meebo), degraded severely due to neglect (eg. ), or just been completely neglected for a decade or more (Google Group’s Usenet archive) - but haven’t actually died or closed yet, so I list them as alive.
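
To make the censoring concrete, here is a minimal sketch of how the lifespans are encoded for the survival analysis used later (assuming the CSV columns described under Variables & Processing; the appendix sets explicit column classes):

library(survival)
google <- read.csv("https://www.gwern.net/docs/statistics/2013-google.csv")
# 'Dead' is the event indicator: still-alive products are FALSE, so their 'Days'
# lifetime is only a lower bound - a right-censored observation, printed with a '+'
head(Surv(google$Days, google$Dead, type="right"))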

Variables

“To my good friend Would I show, I thought, The plum blossoms, Now lost to sight Amid the falling snow.”

, VIII: 1426

Simply collecting the data is useful since it allows us to make some estimates like overall death-rates or median lifespan. But maybe we can do better than just base rates and find characteristics which let us crack open the Google black box a tiny bit. So finally, for all products, I have collected several covariates which I thought might help predict longevity:

  • Hits: the number of Google hits for a service

    While number of Google hits is a very crude measure, at best, for underlying variables like “popularity” or “number of users” or “profitability”, and clearly biased towards recently released products (there aren’t going to be as many hits for, say, “Google Answers” as there would have been if we had searched for it in 2002), it may add some insight.

    There do not seem to be any other free quality sources indicating either historical or contemporary traffic to a product URL/homepage which could be used in the analysis - services like Alexa or Google Ad Planner either are commercial, for domains only, or simply do not cover many of the URLs. (After I finished data collection, it was pointed out to me that while Google’s Ad Planner may not be useful, Google’s AdWords does yield a count of global searches for a particular query that month, which would have worked, albeit it would only indicate current levels of interest and nothing about historical levels.)

  • Type: a categorization into “service”/“program”/“thing”/“other”

    1. A service is anything primarily accessed through a web browser or API or the Internet; so Gmail or a browser loading fonts from a Google server, but not a Gmail notification program one runs on one’s computer or a FLOSS font available for download & distribution.

    2. A program is anything which is an application, plugin, library, framework, or all of these combined; some are very small (Authenticator) and some are very large (Android). This does include programs which require Internet connections or Google APIs, as well as programs for which the source code has not been released, so things in the program category are not immune to shutdown and may be useful only as long as Google supports them.

    3. A thing is anything which is primarily a physical object. A cellphone running Android or a Chromebook would be an example.

      In retrospect, I probably should have excluded this category entirely: there’s no reason to expect cellphones to follow the same lifecycle as a service or program, it leads to even worse classification problems (when does an Android cellphone ‘die’? should one even be looking at individual cellphones or laptops rather than entire product lines?), there tend to be many iterations of a product and they’re all hard to research, etc.

    4. Other is the catch-all category for things which don’t quite seem to fit. Where does a Google think-tank, charity, conference, or venture capital fund fit in? They certainly aren’t software, but they don’t seem to be quite services either.

  • Profit: whether Google directly makes money off a product

    This is a tricky one. Google excuses many of its products by saying that anything which increases Internet usage benefits Google, and so by this logic, every single one of its services could potentially increase profit; but this is a little stretched, the truth is very hard to judge by an outsider, and one would expect that products without direct monetization are more likely to be killed.

    Generally, I classify as for-profit any Google product directly relating to producing/displaying advertising, paid subscriptions, fees, or purchases (AdWords, Gmail, Blogger, Search, shopping engines, surveys); but many do not seem to have any form of monetization related to them (Alerts, Office, Drive, Gears, Reader12). Some services like Voice charge (for international calls) but the amounts are minor enough that one might wonder if classifying them as for-profit is really right. While it might make sense to define every feature added to, say, Google Search (eg. Personalized Search, or Search History) as being ‘for profit’ since Search lucratively displays ads, I have chosen to classify these secondary features as being not for profit.

  • FLOSS: whether the source code was released or Google otherwise made it possible for third parties to continue the service or maintain the application.

    In the long run, the utility of all non-Free software approaches zero. All non-Free software is a dead end.13

    Android, AngularJS, and Chrome are all examples of software products where Google losing interest would not be fatal; services spun off to third parties would also count. Many of the codebases rely on a proprietary Google API or service (especially the mobile applications), which means that this variable is not as meaningful and laudable as one might expect, so in the minority of cases where this variable is relevant, I code Dead & Ended as related to whether & when Google abandoned it, regardless of whether it was then picked up by third parties or not. (Example: App Inventor for Android is listed as dying in December 2011, though it was then half a year later handed over to MIT, who has supported it since.) It’s important to not naively believe that simply because source code is available, Google support doesn’t matter.

  • Acquisition: whether it was related to a purchase of a company or licensing, or internally developed.

    This is useful for investigating the so-called “Google black hole”: Google has bought many startups (DoubleClick, Dodgeball, Android, Picasa), or technologies/data licensed (SYSTRAN for Translate, Twitter data for Real-Time Search), but it’s claimed many stagnate & wither (Jaiku, JotSpot, Dodgeball, Zagat). So we’ll include this. If a closely related product is developed and released after purchase, like a mobile application, I do not class it as an acquisition; just products that were in existence when the company was purchased. I do not include products that Google dropped immediately on purchase (Apture, fflick, Sparrow, Reqwireless, PeakStream, Wavii) or where products based on them have not been released (BumpTop).

Hits

Ideally we would have Google hits from the day before a product was officially killed, but the past is, alas, no longer accessible to us, and we only have hits from searches I conducted 2013-04-01–2013-04-05. There are three main problems with the Google hits metric:

  1. the Web keeps growing, so 1 million hits in 2000 are not equivalent to 1 million hits in 2013
  2. services which are not killed live longer and can rack up more hits
  3. and the longer ago a product’s hits came into existence, the more likely the relevant hits may be to have disappeared themselves.

We can partially compensate by looking at hits averaged by lifespan; 100k hits means much less for something that lived for a decade than 100k hits means for something that lived just 6 months. What about the growth objection? We can estimate the size of Google’s index at any period and interpret the current hits as a fraction of the index when the service died (example: suppose Answers has 1 million hits, died in 2006, and in 2006 the index held 1 billion URLs; then we’d turn our 1m hit figure into 1/1000 or 0.001); this gives us our “deflated hits”. We’ll deflate the hits by first estimating the size of the index by fitting an exponential to the rare public reports and third-party estimates of the size of the Google index. The data points with the best linear fit:

Estimating Google WWW index size over time

It fits reasonably well. (A sigmoid might fit better, but maybe not, given the large disagreements towards the end.) With this we can then average over days as well, giving us 4 indices to use. We’ll look closer at the hit variables later.
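
A minimal sketch of the deflation arithmetic, mirroring the commented-out derivations in the appendix (the published CSV already contains these columns precomputed, so this is illustrative rather than the canonical code):

index <- read.csv("https://www.gwern.net/docs/statistics/2013-google-index.csv",
                  colClasses=c("Date","double","character"))
model1 <- lm(log(Size) ~ Date, data=index)   # exponential growth = linear fit on log(Size)
started <- as.Date(google$Started)
google$AvgHits <- google$Hits / as.integer(as.Date("2013-04-01") - started)
# total hits divided by the estimated index size when the product launched
google$DeflatedHits <- google$Hits / exp(predict(model1, newdata=data.frame(Date=started)))
google$AvgDeflatedHits <- log(google$AvgHits) / log(google$DeflatedHits)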

Processing

If a product has not ended, the end-date is defined as 2013-04-01 (which is when I stopped compiling products); then the total lifetime is simply the end-date minus the start-date. The final CSV is available at 2013-google.csv. (I welcome corrections from Googlers or Xooglers about any variables like launch or shutdown dates or products directly raising revenue.)

Analysis

“I spur my horse past ruins Ruins move a traveler’s heart the old parapets high and low the ancient graves great and small the shuddering shadow of a tumbleweed the steady sound of giant trees. But what I lament are the common bones unnamed in the records of Immortals.”

Han-Shan14

Descriptive

Loading up our hard-won data and looking at an R summary (for full source code reproducing all graphs and analyses below, see the appendix; I welcome statistical corrections or elaborations if accompanied by equally reproducible R source code), we can see we have a lot of data to look at:

    Dead            Started               Ended                 Hits               Type
#  Mode :logical   Min.   :1997-09-15   Min.   :2005-03-16   Min.   :2.04e+03   other  : 14
#  FALSE:227       1st Qu.:2006-06-09   1st Qu.:2012-04-27   1st Qu.:1.55e+05   program: 92
#  TRUE :123       Median :2008-10-18   Median :2013-04-01   Median :6.50e+05   service:234
#                  Mean   :2008-05-27   Mean   :2012-07-16   Mean   :5.23e+07   thing  : 10
#                  3rd Qu.:2010-05-28   3rd Qu.:2013-04-01   3rd Qu.:4.16e+06
#                  Max.   :2013-03-20   Max.   :2013-11-01   Max.   :3.86e+09
#    Profit          FLOSS         Acquisition       Social             Days         AvgHits
#  Mode :logical   Mode :logical   Mode :logical   Mode :logical   Min.   :   1   Min.   :      1
#  FALSE:227       FALSE:300       FALSE:287       FALSE:305       1st Qu.: 746   1st Qu.:    104
#  TRUE :123       TRUE :50        TRUE :63        TRUE :45        Median :1340   Median :    466
#                                                                  Mean   :1511   Mean   :  29870
#                                                                  3rd Qu.:2112   3rd Qu.:   2980
#                                                                  Max.   :5677   Max.   :3611940
#   DeflatedHits    AvgDeflatedHits  EarlyGoogle      RelativeRisk    LinearPredictor
#  Min.   :0.0000   Min.   :-36.57   Mode :logical   Min.   : 0.021   Min.   :-3.848
#  1st Qu.:0.0000   1st Qu.: -0.84   FALSE:317       1st Qu.: 0.597   1st Qu.:-0.517
#  Median :0.0000   Median : -0.54   TRUE :33        Median : 1.262   Median : 0.233
#  Mean   :0.0073   Mean   : -0.95                   Mean   : 1.578   Mean   : 0.000
#  3rd Qu.:0.0001   3rd Qu.: -0.37                   3rd Qu.: 2.100   3rd Qu.: 0.742
#  Max.   :0.7669   Max.   :  0.00                   Max.   :12.556   Max.   : 2.530
#  ExpectedEvents   FiveYearSurvival
#  Min.   :0.0008   Min.   :0.0002
#  1st Qu.:0.1280   1st Qu.:0.1699
#  Median :0.2408   Median :0.3417
#  Mean   :0.3518   Mean   :0.3952
#  3rd Qu.:0.4580   3rd Qu.:0.5839
#  Max.   :2.0456   Max.   :1.3443

Shutdowns over time

Google Reader: “Who is it in the blogs that calls on me? / I hear a tongue shriller than all the YouTubes / Cry ‘Reader!’ Speak, Reader is turn’d to hear.”

Dataset: “Beware the ideas of March.”

Julius Caesar, Act 1, scene 2, 15-19; with apologies.

An interesting aspect of the shutdowns is that they are unevenly distributed by month, as we can see with a chi-squared test (p = 0.014) and graphically, with a major spike in September and then March/April15:

Shutdowns binned by month of year, revealing peaks in September, March, and April
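
The month test is a chi-squared goodness-of-fit test against a uniform distribution over months, as in the appendix (a lightly adapted sketch):

dead  <- google[google$Dead,]                                      # just the shut-down products
m_fac <- factor(months(as.Date(dead$Ended)), levels=month.name)    # keep calendar order, not alphabetical
chisq.test(table(m_fac))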

As befits a company which has grown enormously since 1997, we can see other imbalances over time: eg. Google launched very few products from 1997-2004, and many more from 2005 and on:

Starts binned by year

We can plot lifetime against shutdown date to get a clearer picture:

All products scatter-plotted, date of opening vs lifespan

That clumpiness around 2009 is suspicious. To emphasize this bulge of shutdowns in late 2011-2012, we can plot the histogram of dead products by year and also a kernel density:

Shutdown density binned by year
Equivalent kernel density (default bandwidth)

The kernel density brings out an aspect of shutdowns we might have missed before: there seems to be an absence of recent shutdowns. There are 4 shutdowns scheduled for 2013 but the last one is scheduled for November, suggesting that we have seen the last of the 2013 casualties and that any future shutdowns may be for 2014.

What explains such graphs over time? One candidate is the 2011-04-04 accession of Larry Page to CEO, replacing Eric Schmidt, who had been hired to provide “adult supervision” for pre-IPO Google. Page respected Steve Jobs greatly (he and Brin suggested, before meeting Schmidt, that their CEO be Jobs). Isaacson’s Steve Jobs records that before his death, Jobs had strongly advised Page to “focus”, and asked “What are the five products you want to focus on?”, saying “Get rid of the rest, because they’re dragging you down.” And on 2011-07-14 Page posted:

…Greater focus has also been another big feature for me this quarter – more wood behind fewer arrows. Last month, for example, we announced that we will be closing Google Health and Google PowerMeter. We’ve also done substantial internal work simplifying and streamlining our product lines. While much of that work has not yet become visible externally, I am very happy with our progress here. Focus and prioritization are crucial given our amazing opportunities.

While some have tried to disagree, it’s hard not to conclude that indeed, a wall of shutdowns followed in late 2011 and 2012. But this sounds very much like a one-time purge: if one has a new focus on focus, then one may not be starting up as many services as before, and the services which one does start up should be more likely to survive.

Modeling

Logistic regression

A first step in predicting when a product will be shut down is predicting whether it will be shut down. Since we’re predicting a binary outcome (a product living or dying), we can use the usual tool: an ordinary logistic regression. Our first look uses the main variables plus the total hits:
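
The call producing this output, as in the appendix:

summary(glm(Dead ~ Type + Profit + FLOSS + Acquisition + Social + log(Hits),
            data=google, family="binomial"))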

# Coefficients:
#                 Estimate Std. Error z value Pr(>|z|)
# (Intercept)       2.3968     1.0680    2.24    0.025
# Typeprogram       0.9248     0.8181    1.13    0.258
# Typeservice       1.2261     0.7894    1.55    0.120
# Typething         0.8805     1.1617    0.76    0.448
# ProfitTRUE       -0.3857     0.2952   -1.31    0.191
# FLOSSTRUE        -0.1777     0.3791   -0.47    0.639
# AcquisitionTRUE   0.4955     0.3434    1.44    0.149
# SocialTRUE        0.7866     0.3888    2.02    0.043
# log(Hits)        -0.3089     0.0567   -5.45  5.1e-08

In a logistic regression, a coefficient >0 increases the chance of an event (shutdown) and <0 decreases it. So looking at the coefficients, we can venture some interpretations (a sketch converting these log-odds into odds ratios follows the list):

  • Google has a past history of screwing up social products and then killing them

    This is interesting for confirming the general belief that Google has handled its social properties badly in the past, but I’m not sure how useful this is for predicting the future: since Larry Page became obsessed with social in 2009, we might expect anything to do with “social” to be either merged into Google+ or otherwise kept on life support longer than it would have been before

  • Google is deprecating software products in favor of web services

    A lot of Google’s efforts with Firefox and then Chromium were for improving web browsers as a platform for delivering applications. As efforts like HTML5 mature, there is less incentive for Google to release and support standalone software.

  • But apparently not its FLOSS software

    This seems due to a number of its software releases being picked up by third parties (Wave, Etherpad, Refine), designed to be integrated into existing communities (Summer of Code projects), or apparently serving a strategic role (Android, Chromium, Dart, Go, Closure Tools, VP codecs), which we could summarize as ‘building up a browser replacement for operating systems’. (Why? )

  • Things which charge or show advertising are more likely to survive

    We expect this, but it’s good to have confirmation (if nothing else, it partially validates the data).

  • Popularity as measured by Google hits seems to matter

    …Or does it? This variable seems particularly treacherous and susceptible to reverse-causation issues (does lack of hits diagnose failure, or does failing cause lack of hits when I later searched?)
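
To read the coefficients above as something more intuitive than log-odds, they can be exponentiated into odds ratios; a minimal sketch (lmodel1 is just an illustrative name for the same fit as above):

lmodel1 <- glm(Dead ~ Type + Profit + FLOSS + Acquisition + Social + log(Hits),
               data=google, family="binomial")
exp(coef(lmodel1))            # eg. SocialTRUE: exp(0.79) ~= 2.2x the odds of being shut down
exp(confint.default(lmodel1)) # Wald 95% CIs on the odds-ratio scale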

Use of hits data

Is our popularity metric - or any of the 4 - trustworthy? All this data has been collected after the fact, sometimes by many years; what if the data have been contaminated by the fact that something shut down? For example, by a burst of publicity about an obscure service shutting down? (Ironically, this page is contributing to the inflation of hits for any dead service mentioned.) Are we just seeing information “leakage”? Leakage can be subtle, as I learned for myself doing this analysis.

Investigating further, hits by themselves do matter:

#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)   3.4052     0.7302    4.66  3.1e-06
# log(Hits)    -0.3000     0.0549   -5.46  4.7e-08

Average hits (hits over the product’s lifetime) turns out to be even more important:

#              Estimate Std. Error z value Pr(>|z|)
# (Intercept)    -2.297      1.586   -1.45    0.147
# log(Hits)       0.511      0.209    2.44    0.015
# log(AvgHits)   -0.852      0.217   -3.93  8.3e-05

This is more than a little strange; that the higher the average hits, the less likely to be killed makes perfect sense - but then, surely the higher the total hits, the less likely as well? But no. The mystery deepens as we bring in the third hit metric we developed:

#                   Estimate Std. Error z value Pr(>|z|)
# (Intercept)        -21.589     11.955   -1.81   0.0709
# log(Hits)            2.054      0.980    2.10   0.0362
# log(AvgHits)        -1.921      0.708   -2.71   0.0067
# log(DeflatedHits)   -0.456      0.277   -1.64   0.1001

And sure enough, if we run all 4 hit variables, 3 of them turn out to be statistically-significant and large:

#                   Estimate Std. Error z value Pr(>|z|)
# (Intercept)       -24.6898    12.4696   -1.98   0.0477
# log(Hits)           2.2908     1.0203    2.25   0.0248
# log(AvgHits)       -2.0943     0.7405   -2.83   0.0047
# log(DeflatedHits)  -0.5383     0.2914   -1.85   0.0647
# AvgDeflatedHits    -0.0651     0.0605   -1.08   0.2819

It’s not that the hit variables are somehow summarizing or proxying for the others, because if we toss in all the non-hits predictors and penalize parameters based on adding complexity without increasing fit, we still wind up with the 3 hit variables:
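
The complexity penalty here is stepwise model selection by AIC, as in the appendix:

summary(step(glm(Dead ~ Type + Profit + FLOSS + Acquisition + Social +
                   log(Hits) + log(AvgHits) + DeflatedHits + AvgDeflatedHits,
                 data=google, family="binomial")))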

#                   Estimate Std. Error z value Pr(>|z|)
# (Intercept)        -23.341     12.034   -1.94   0.0524
# AcquisitionTRUE      0.631      0.350    1.80   0.0712
# SocialTRUE           0.907      0.394    2.30   0.0213
# log(Hits)            2.204      0.985    2.24   0.0252
# log(AvgHits)        -2.068      0.713   -2.90   0.0037
# log(DeflatedHits)   -0.492      0.280   -1.75   0.0793
# ...
# AIC: 396.9

Most of the predictors were removed as not helping a lot, 3 of the 4 hit variables survived (but not the hits which were both averaged & deflated, suggesting that combination wasn’t adding much), and we see two of the better predictors from earlier survived: whether something was an acquisition and whether it was social.

The original hits variable has the wrong sign, as expected of data leakage; now the average and deflated hits have the predicted sign (the higher the hit count, the lower the risk of death), but this doesn’t put to rest my concerns: the average hits has the right sign, yes, but now the effect size seems way too high - we reject the hits with a log-odds of +2.1 as contaminated and a correlation almost 4 times larger than one of the known-good correlations (being an acquisition), but the average hits is -2 & almost as big a log-odds! The only variable which seems trustworthy is the deflated hits: it has the right sign and is a more plausible 5x smaller. I’ll use just the deflated hits variable (although I will keep in mind that I’m still not sure it is free from data leakage).

Survival curve

The logistic regression helped winnow down the variables but is limited to the binary outcome of shutdown or not; it can’t use the potentially very important variable of how many days a product has survived, for the simple reason that of course mortality will increase with time! (“But this long run is a misleading guide to current affairs. In the long run we are all dead.”)

For looking at survival over time, survival analysis might be a useful elaboration. Not being previously familiar with the area, I drew on Wikipedia, Fox & Weisberg’s appendix, Zhou’s tutorial, and Hosmer & Lemeshow’s Applied Survival Analysis for the following results using the survival library (see also CRAN Task View: Survival Analysis, and the taxonomy of survival analysis methods in the literature). Any errors are mine.

The initial characterization gives us an optimistic median of 2824 days (note that this is higher than Arthur’s mean of 1459 days because it addressed the conditionality issue discussed earlier by including products which were never canceled, and I made a stronger effort to collect pre-2009 products), but the lower bound is not tight and too little of the sample has died to get an upper bound:
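
The Kaplan-Meier fit behind these numbers and the curve below, as in the appendix:

surv <- survfit(Surv(google$Days, google$Dead, type="right") ~ 1)
surv                                   # prints the record/event counts and median above
plot(surv, xlab="Days", ylab="Survival probability (with 95% CI)")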

# records   n.max n.start  events  median 0.95LCL 0.95UCL
#     350     350     350     123    2824    2095      NA

Our overall survivorship curve looks a bit interesting:

Shutdown cumulative probability as a function of time

If there were constant mortality of products at each day after their launch, we would expect a “type II” curve where it looks like a straight line, and if the hazard increased with age like with humans we would see a “type I” graph in which the curve nose-dives; but in fact it looks like there’s a sort of “leveling off” of deaths, suggesting a “type III” curve; per Wikipedia:

…the greatest mortality is experienced early on in life, with relatively low rates of death for those surviving this bottleneck. This type of curve is characteristic of species that produce a large number of offspring (see ).

Very nifty: the survivorship curve is consistent with tech industry or startup philosophies of doing lots of things, iterating fast, and throwing things at the wall to see what sticks. (More pleasingly, it suggests that my dataset is not biased against the inclusion of short-lived products: if I had been failing to find a lot of short-lived products, then we would expect to see the true survivorship curve distorted into something of a type II or type I curve and not a type III curve where a lot of products are early deaths; so if there were a data collection bias against short-lived products, then the true survivorship curve must be even more extremely type III.)

However, it looks like the mortality only starts decreasing around 2000 days, so any product that far out must have been founded around or before 2005, which is when we previously noted that Google started pumping out a lot of products and may also have changed its shutdown-related behaviors; this could violate a basic assumption of Kaplan-Meier, that the underlying survival function isn’t itself changing over time.

Our next step is to fit a Cox proportional hazards regression to our covariates:
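
The regression, as in the appendix:

cmodel <- coxph(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type + DeflatedHits,
                data=google)
summary(cmodel)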

# ...n= 350, number of events= 123
#
#                     coef exp(coef) se(coef)     z Pr(>|z|)
# AcquisitionTRUE    0.130     1.139    0.257  0.51    0.613
# FLOSSTRUE          0.141     1.151    0.293  0.48    0.630
# ProfitTRUE        -0.180     0.836    0.231 -0.78    0.438
# SocialTRUE         0.664     1.943    0.262  2.53    0.011
# Typeprogram        0.957     2.603    0.747  1.28    0.200
# Typeservice        1.291     3.638    0.725  1.78    0.075
# Typething          1.682     5.378    1.023  1.64    0.100
# log(DeflatedHits) -0.288     0.749    0.036 -8.01  1.2e-15
#
#                   exp(coef) exp(-coef) lower .95 upper .95
# AcquisitionTRUE       1.139      0.878     0.688     1.884
# FLOSSTRUE             1.151      0.868     0.648     2.045
# ProfitTRUE            0.836      1.197     0.531     1.315
# SocialTRUE            1.943      0.515     1.163     3.247
# Typeprogram           2.603      0.384     0.602    11.247
# Typeservice           3.637      0.275     0.878    15.064
# Typething             5.377      0.186     0.724    39.955
# log(DeflatedHits)     0.749      1.334     0.698     0.804
#
# Concordance= 0.726  (se = 0.028 )
# Rsquare= 0.227   (max possible= 0.974 )
# Likelihood ratio test= 90.1  on 8 df,   p=4.44e-16
# Wald test            = 79.5  on 8 df,   p=6.22e-14
# Score (logrank) test = 83.5  on 8 df,   p=9.77e-15

And then we can also test whether any of the covariates are suspicious (that is, whether they violate the proportional-hazards assumption); in general they seem to be fine:
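
The check is the standard proportional-hazards diagnostic on the Cox fit, as in the appendix:

cox.zph(cmodel)   # per-covariate and global tests of the proportional-hazards assumption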

#                       rho  chisq     p
# AcquisitionTRUE   -0.0252 0.0805 0.777
# FLOSSTRUE          0.0168 0.0370 0.848
# ProfitTRUE        -0.0694 0.6290 0.428
# SocialTRUE         0.0279 0.0882 0.767
# Typeprogram        0.0857 0.9429 0.332
# Typeservice        0.0936 1.1433 0.285
# Typething          0.0613 0.4697 0.493
# log(DeflatedHits) -0.0450 0.2610 0.609
# GLOBAL                 NA 2.5358 0.960

My suspicion lingers, though, so I threw in another covariate (EarlyGoogle): whether a product was released before or after 2005. Does this add predictive value above and over simply knowing that a product is really old, and does the regression still pass the proportional assumption check? Apparently yes to both:
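
The extra covariate and re-fit, as in the appendix:

google$EarlyGoogle <- (as.POSIXlt(google$Started)$year + 1900) < 2005   # launched before 2005?
cmodel <- coxph(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type +
                                   DeflatedHits + EarlyGoogle, data=google)
summary(cmodel)
cox.zph(cmodel)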

#                      coef exp(coef) se(coef)     z Pr(>|z|)
# AcquisitionTRUE    0.1674    1.1823   0.2553  0.66    0.512
# FLOSSTRUE          0.1034    1.1090   0.2922  0.35    0.723
# ProfitTRUE        -0.1949    0.8230   0.2318 -0.84    0.401
# SocialTRUE         0.6541    1.9233   0.2601  2.51    0.012
# Typeprogram        0.8195    2.2694   0.7472  1.10    0.273
# Typeservice        1.1619    3.1960   0.7262  1.60    0.110
# Typething          1.6200    5.0529   1.0234  1.58    0.113
# log(DeflatedHits) -0.2645    0.7676   0.0375 -7.06  1.7e-12
# EarlyGoogleTRUE   -1.0061    0.3656   0.5279 -1.91    0.057
# ...
# Concordance= 0.728  (se = 0.028 )
# Rsquare= 0.237   (max possible= 0.974 )
# Likelihood ratio test= 94.7  on 9 df,   p=2.22e-16
# Wald test            = 76.7  on 9 df,   p=7.2e-13
# Score (logrank) test = 83.8  on 9 df,   p=2.85e-14
#                        rho   chisq     p
# ...
# EarlyGoogleTRUE   -0.05167 0.51424 0.473
# GLOBAL                  NA 2.52587 0.980

As predicted, the pre-2005 variable does indeed correlate with less chance of being shut down, is the third-largest predictor, and almost reaches a random16 level of statistical-significance - but it doesn’t trigger the assumption tester, so we’ll keep using the Cox model.

Now let’s interpret the model. The covariates tell us that to reduce the risk of shutdown, you want to:

  1. Not be an acquisition
  2. Not be FLOSS
  3. Be directly making money
  4. Not be related to social networking
  5. Have lots of Google hits relative to lifetime
  6. Have been launched early in Google’s lifetime

This all makes sense to me. I find particularly interesting the profit and social effects, but the odds are a little hard to understand intuitively; if being social increases the odds of shutdown by 1.9233 and not being directly profitable increases the odds by 1.215, what do those look like? We can graph pairs of survivorship curves, splitting the full dataset (omitting the confidence intervals for legibility, although they do overlap), to get a grasp of what these numbers mean:

All products over time, split by Profit variable
All products over time, split by Social variable
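
The two splits are stratified Kaplan-Meier fits, per the appendix:

smodel1 <- survfit(Surv(Days, Dead) ~ Profit, data=google)    # stratify by Profit
plot(smodel1, lty=c(1,2), xlab="Days", ylab="Fraction surviving by day")
legend("bottomleft", legend=c("Profit = no", "Profit = yes"), lty=c(1,2), inset=0.02)
smodel2 <- survfit(Surv(Days, Dead) ~ Social, data=google)    # and likewise by Social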

Random forests

Because I can, I was curious how random forests (Breiman 2001) might stack up against the logistic regression and against a base-rate predictor (that nothing was shut down, since ~65% of the products are still alive).

With randomForest, I trained a random forest as a classifier, yielding reasonable-looking error rates:
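
The classifier fit, as in the appendix (which reads Type in as a factor):

library(randomForest)
rf <- randomForest(as.factor(Dead) ~ Acquisition + FLOSS + Profit + Social +
                                     Type + DeflatedHits + EarlyGoogle,
                   importance=TRUE, data=google)
print(rf)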

#                Type of random forest: classification
#                      Number of trees: 500
# No. of variables tried at each split: 2
#
#         OOB estimate of  error rate: 31.71%
# Confusion matrix:
#       FALSE TRUE class.error
# FALSE   216   11     0.04846
# TRUE    100   23     0.81301

To compare the random forest’s accuracy with the logistic model’s accuracy, I interpreted the logistic estimate of shutdown odds >1 as predicting shutdown and <1 as predicting not-shutdown; I then compared the full sets of predictions with the actual shutdown status. (This is not a formal scoring rule like those I have employed in grading forecasts elsewhere, but this should be an intuitively understandable way of grading models’ predictions.)

The base-rate predictor got 65% right by definition, the logistic managed to score 68% correct17 (95% CI: 66-72%), and the random forest similarly got 68% (67-78%). These rates are not quite as bad as they may seem: I excluded the lifetime length (Days) from the logistic and random forests because unless one is handling it specially with survival analysis, it leaks information; so there’s predictive power being left on the table. A fairer comparison would use lifetimes.
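
The three accuracies are computed by scoring each model’s predictions against the recorded Dead column, assuming lmodel & rf are the fitted logistic & random-forest models from the appendix:

sum(FALSE == google$Dead) / nrow(google)                       # base rate: predict that nothing dies
sum((exp(predict(lmodel)) > 1) == google$Dead) / nrow(google)  # logistic: odds > 1 counts as a predicted shutdown
sum((as.logical(predict(rf))) == google$Dead) / nrow(google)   # random forest class predictions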

Random survival forests

The next step is to take into account lifetime length & estimated survival curves. We can do that using random survival forests (see also Mogensen et al 2012), implemented in randomForestSRC (successor to Ishwaran’s original library randomSurvivalForest). This initially seems very promising:
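
The fit, as in the appendix:

library(randomForestSRC)
rsf <- rfsrc(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type +
                                DeflatedHits + EarlyGoogle,
             data=google, nsplit=1)
print(rsf)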

#                          Sample size: 350
#                     Number of deaths: 122
#                      Number of trees: 1000
#           Minimum terminal node size: 3
#        Average no. of terminal nodes: 61.05
# No. of variables tried at each split: 3
#               Total no. of variables: 7
#                             Analysis: Random Forests [S]RC
#                               Family: surv
#                       Splitting rule: logrank *random*
#        Number of random split points: 1
#               Estimate of error rate: 35.37%

and even gives us a cute plot of how accuracy varies with how big the forest is (looks like we don’t need to tweak it) and how important each variable is as a predictor:

Visual comparison of the average usefulness of each variable to the decision trees

Estimating the prediction accuracy for this random survival forest like we did previously, we’re happy to see 78% of predictions correct. Building a predictor based on the Cox model, we get a lesser (but still better than the non-survival models) 72% accuracy.

How do these models perform when we check their robustness via the bootstrap? Not so great. The random survival forest collapses to 57-64% (95% CI on 200 replicates), but the Cox model only to 68-73%. This suggests to me that something is going wrong with the random survival forest model (overfitting? programming error?) and there’s no real reason to switch to the more complex random forests, so here too we’ll stick with the ordinary Cox model.
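
The robustness check resamples the dataset, refits the model on each resample, and scores that refit’s predictions against the full data, using the boot library; a condensed sketch for the logistic model (the Cox and random-survival-forest versions in the appendix follow the same pattern):

library(boot)
logisticPredictionAccuracy <- function(gb, indices) {
  g <- gb[indices,]                                  # bootstrap resample
  m <- glm(Dead ~ Acquisition + FLOSS + Profit + Social + Type +
                  DeflatedHits + EarlyGoogle + Days,
           data=g, family="binomial")
  # refit on the resample, then score its predictions on the original dataset
  sum((exp(predict(m, newdata=google)) > 1) == google$Dead) / nrow(google)
}
lbs <- boot(data=google, statistic=logisticPredictionAccuracy, R=20000)
boot.ci(lbs, type="norm")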

Predictions

Before making explicit predictions of the future, let’s look at the relative risks for products which haven’t been shut down. What does the Cox model consider the 10 most at-risk and likely to be shut down products?
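
A minimal sketch of how such a ranking can be read off the Cox fit: predict(cmodel, type="risk") returns each product’s hazard relative to the sample average (this is a sketch rather than the exact code used; the first CSV column is assumed to hold the product name):

google$RelativeRisk <- predict(cmodel, type="risk")            # hazard relative to the sample average
alive <- google[!google$Dead,]
alive[order(alive$RelativeRisk, decreasing=TRUE),][1:10, 1]    # 10 most at-risk live products
alive[order(alive$RelativeRisk),][1:10, 1]                     # 10 least at-risk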

It lists (in decreasingly risky order):

  1. Schemer
  2. Boutiques
  3. Magnifier
  4. Hotpot
  5. Page Speed Online API
  6. WhatsonWhen
  7. Unofficial Guides
  8. WDYL search engine
  9. Cloud Messaging
  10. Correlate

These all seem like reasonable products to single out (as much as I love Correlate for making it easier than ever to demonstrate “correlation ≠ causation”, I’m surprised it or Boutiques still exist), except for Cloud Messaging, which seems to be a key part of a lot of Android. And likewise, the list of the 10 least risky (in increasingly risky order):

  1. Search
  2. Translate
  3. AdWords
  4. Picasa
  5. Groups
  6. Image Search
  7. News
  8. Books
  9. Toolbar
  10. AdSense

One can’t imagine flagship products like Search or Books ever being shut down, so this list is good as far as it goes; I am skeptical about the actual unriskiness of Picasa and Toolbar given their general neglect and old-fashionedness, though I understand why the model favors them (both are pre-2005, proprietary, have many hits, and are advertising-supported). But let’s get more specific; looking at still-alive services, what predictions do we make about the odds of a selected batch surviving the next, say, 5 years? We can derive a survival curve for each member of the batch, adjusted for each subject’s covariates (and they visibly differ from each other):

Estimated survival curves for 15 interesting products (AdSense, Scholar, Voice, etc.)
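
A minimal sketch of how the per-product curves and the conditional 5-year figures below can be obtained; the product name and lookup column are illustrative assumptions, and conditioning on having survived to today is just a ratio of fitted survival probabilities:

voice <- google[google[,1] == "Voice",]        # column 1 assumed to be the product name; "Voice" illustrative
sf <- survfit(cmodel, newdata=voice)           # curve adjusted to this product's covariates
plot(sf, xlab="Days", ylab="Estimated survival probability")
t0 <- as.integer(as.Date("2013-04-01") - as.Date(voice$Started))   # the product's age today
s  <- summary(sf, times=c(t0, t0 + 5*365), extend=TRUE)$surv
s[2] / s[1]                                    # S(t0 + 5 years) / S(t0): conditional 5-year survival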

But these are the curves for hypothetical populations all like the specific product in question, starting from Day 0. Can we extract specific estimates assuming the product has survived to today (as by definition these live services have done)? Yes, but extracting them from the survival curves turns out to be a pretty gruesome hack; anyway, I derive the following 5-year estimates and, as commentary, register my own best guesses as well:

Product | 5-year survival | Personal guess | Relative risk vs average (lower = better) | Survived (March 2018)
AdSense | 100% | 99% | 0.07 | Yes
Blogger | 100% | 80% | 0.32 | Yes
Gmail | 96% | 99% | 0.08 | Yes
Search | 96% | 100% | 0.05 | Yes
Translate | 92% | 95% | 0.78 | Yes
Scholar | 92% | 85% | 0.10 | Yes
Alerts | 89% | 70% | 0.21 | Yes
Google+ | 79% | 85% | 0.36 | Yes18
Analytics | 76% | 97% | 0.24 | Yes
Chrome | 70% | 95% | 0.24 | Yes
Calendar | 66% | 95% | 0.36 | Yes
Docs | 63% | 95% | 0.39 | Yes
Voice19 | 44% | 50% | 0.78 | Yes
FeedBurner | 43% | 35% | 0.66 | Yes
Project Glass | 37% | 50% | 0.10 | No

One immediately spots that some of the model’s estimates seem questionable in the light of our greater knowledge of Google.

I am more pessimistic about the . And I think it’s absurd to give any serious credence to Analytics or Calendar or Docs being at risk (Analytics is a key part of the advertising infrastructure, and Calendar a sine qua non of any business software suite - much less the core of said suite, Docs!). The Glass estimate is also interesting: I don’t know if I agree with the model, given how famous Glass is and how much Google is pushing it - could its future really be so chancy? On the other hand, many tech fads have come and gone without a trace, hardware is always tricky, the more intimate a gadget the more design matters (Glass seems like the sort of thing Apple could make a blockbuster, but can Google?), and Glass has already received a hefty helping of criticism; in particular, the man most experienced with such HUDs (Steve Mann) has criticized Glass as being “much less ambitious” than the state of the art and worries that “Google and certain other companies are neglecting some important lessons. Their design decisions could make it hard for many folks to use these systems. Worse, poorly configured products might even damage some people’s eyesight and set the movement back years. My concern comes from direct experience.”

But some estimates are more forgivable - Google does have a bad track record with social media, so some level of skepticism about Google+ seems warranted (and indeed, in October 2018 Google quietly announced public Google+ would be shut down & henceforth be only an enterprise product) - and on FeedBurner or Voice, I agree with the model that their future is cloudy. The extreme optimism about Blogger is interesting, since before I began this project, I thought it was slowly dying and would inevitably shut down in a few years; but as I researched the timelines for various Google products, I noticed that Blogger seems to be favored in some ways: such as getting exclusive access to a few otherwise-shutdown things (eg. Scribe & Friend Connect); it was the ground zero for Google’s Dynamic Views skin redesign which was applied globally; and Google is still heavily using Blogger for all its official announcements even into the Google+ era.

Overall, these are pretty sane-sounding estimates.

Followups

“Show me the person who doesn’t die— death remains impartial. I recall a towering man who is now a pile of dust. The World Below knows no dawn though plants enjoy another spring; those visiting this sorrowful place the pine wind slays with grief.”

Han-Shan, #50

It seems like it might be worthwhile to continue compiling a database and do a followup analysis in 5 years (2018), by which point we can judge how my predictions stacked up against the model, and also because ~100 products may have been shut down (going by the >30 casualties of 2011 and 2012) and the survival curve & covariate estimates rendered that much sharper. So to compile updates, I’ve:

  • set up 2 Google Alerts searches:

    • google ("shut down" OR "shutting" OR "closing" OR "killing" OR "abandoning" OR "leaving")
    • google (launch OR release OR announce)
  • and subscribed to the aforementioned Google Operating System blog

These sources yielded ~64 candidates over the following year before I shut down additions on 2014-06-04.

See Also

Appendix

Source code

Run as R --slave --file=google.r:

set.seed(7777) # for reproducible numbers

library(survival)
library(randomForest)
library(boot)
library(randomForestSRC)
library(prodlim) # for 'sindex' call
library(rms)

# Generate Google corpus model for use in main analysis
# Load the data, fit, and plot:
index <- read.csv("https://www.gwern.net/docs/statistics/2013-google-index.csv",
                   colClasses=c("Date","double","character"))
# an exponential doesn't fit too badly:
model1 <- lm(log(Size) ~ Date, data=index); summary(model1)
# plot logged size data and the fit:
png(file="~/wiki/images/google/www-index-model.png", width = 3*480, height = 1*480)
plot(log(index$Size) ~ index$Date, ylab="WWW index size", xlab="Date")
abline(model1)
invisible(dev.off())

# Begin actual data analysis
google <- read.csv("https://www.gwern.net/docs/statistics/2013-google.csv",
                    colClasses=c("character","logical","Date","Date","double","factor",
                                 "logical","logical","logical","logical", "integer",
                                 "numeric", "numeric", "numeric", "logical", "numeric",
                                 "numeric", "numeric", "numeric"))
# google$Days <- as.integer(google$Ended - google$Started)
# derive all the Google index-variables
## hits per day to the present
# google$AvgHits <- google$Hits / as.integer(as.Date("2013-04-01") - google$Started)
## divide total hits for each product by total estimated size of Google index when that product started
# google$DeflatedHits <- log(google$Hits / exp(predict(model1, newdata=data.frame(Date = google$Started))))
## Finally, let's combine the two strategies: deflate and then average.
# google$AvgDeflatedHits <- log(google$AvgHits) / google$DeflatedHits
# google$DeflatedHits <- log(google$DeflatedHits)

cat("\nOverview of data:\n")
print(summary(google[-1]))

dead <- google[google$Dead,]

png(file="~/wiki/images/google/openedvslifespan.png", width = 1.5*480, height = 1*480)
plot(dead$Days ~ dead$Ended, xlab="Shutdown", ylab="Total lifespan")
invisible(dev.off())

png(file="~/wiki/images/google/shutdownsbyyear.png", width = 1.5*480, height = 1*480)
hist(dead$Ended, breaks=seq.Date(as.Date("2005-01-01"), as.Date("2014-01-01"), "years"),
                   main="shutdowns per year", xlab="Year")
invisible(dev.off())
png(file="~/wiki/images/google/shutdownsbyyear-kernel.png", width = 1*480, height = 1*480)
plot(density(as.numeric(dead$Ended)), main="Shutdown kernel density over time")
invisible(dev.off())

png(file="~/wiki/images/google/startsbyyear.png", width = 1.5*480, height = 1*480)
hist(google$Started, breaks=seq.Date(as.Date("1997-01-01"), as.Date("2014-01-01"), "years"),
     xlab="total products released in year")
invisible(dev.off())

# extract the month of each kill
m = months(dead$Ended)
# sort by chronological order, not alphabetical
m_fac = factor(m, levels = month.name)
# count by month
months <- table(sort(m_fac))
# shutdowns by month are imbalanced:
print(chisq.test(months))
# and visibly so:
png(file="~/wiki/images/google/shutdownsbymonth.png", width = 1.5*480, height = 1*480)
plot(months)
invisible(dev.off())

cat("\nFirst logistic regression:\n")
print(summary(glm(Dead ~ Type + Profit + FLOSS + Acquisition + Social + log(Hits),
                  data=google, family="binomial")))

cat("\nSecond logistic regression, focusing on treacherous hit data:\n")
print(summary(glm(Dead ~ log(Hits), data=google,family="binomial")))
cat("\nTotal + average:\n")
print(summary(glm(Dead ~ log(Hits) + log(AvgHits), data=google,family="binomial")))
cat("\nTotal, average, deflated:\n")
print(summary(glm(Dead ~ log(Hits) + log(AvgHits) + DeflatedHits, data=google,family="binomial")))
cat("\nAll:\n")
print(summary(glm(Dead ~ log(Hits) + log(AvgHits) + DeflatedHits + AvgDeflatedHits,
                  data=google, family="binomial")))
cat("\nStepwise regression through possible logistic regressions involving the hit variables:\n")
print(summary(step(glm(Dead ~ Type + Profit + FLOSS + Acquisition + Social +
                        log(Hits) + log(AvgHits) + DeflatedHits + AvgDeflatedHits,
                       data=google, family="binomial"))))

cat("\nEntering survival analysis section:\n")

cat("\nUnconditional Kaplan-Meier survival curve:\n")
surv <- survfit(Surv(google$Days, google$Dead, type="right") ~ 1)
png(file="~/wiki/images/google/overall-survivorship-curve.png", width = 1.5*480, height = 1*480)
plot(surv, xlab="Days", ylab="Survival Probability function with 95% CI")
invisible(dev.off())

cat("\nCox:\n")
cmodel <- coxph(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type + DeflatedHits,
                data = google)
print(summary(cmodel))
cat("\nTest proportional assumption:\n")
print(cox.zph(cmodel))

cat("\nPrimitive check for regime change (re-regress & check):\n")
google$EarlyGoogle <- (as.POSIXlt(google$Started)$year+1900) < 2005
cmodel <- coxph(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type +
                                   DeflatedHits + EarlyGoogle,
                data = google)
print(summary(cmodel))
print(cox.zph(cmodel))

cat("\nGenerating intuitive plots of social & profit;\n")
cat("\nPlot empirical survival split by profit...\n")
png(file="~/wiki/images/google/profit-survivorship-curve.png", width = 1.5*480, height = 1*480)
smodel1 <- survfit(Surv(Days, Dead) ~ Profit, data = google);
plot(smodel1, lty=c(1, 2), xlab="Days", ylab="Fraction surviving by Day");
legend("bottomleft", legend=c("Profit = no", "Profit = yes"), lty=c(1 ,2), inset=0.02)
invisible(dev.off())

cat("\nSplit by social...\n")
smodel2 <- survfit(Surv(Days, Dead) ~ Social, data = google)
png(file="~/wiki/images/google/social-survivorship-curve.png", width = 1.5*480, height = 1*480)
plot(smodel2, lty=c(1, 2), xlab="Days", ylab="Fraction surviving by Day")
legend("bottomleft", legend=c("Social = no", "Social = yes"), lty=c(1 ,2), inset=0.02)
invisible(dev.off())

cat("\nTrain some random forests for prediction:\n")
lmodel <- glm(Dead ~ Acquisition + FLOSS + Profit + Social + Type +
                     DeflatedHits + EarlyGoogle + Days,
                   data=google, family="binomial")
rf <- randomForest(as.factor(Dead) ~ Acquisition + FLOSS + Profit + Social +
                                     Type + DeflatedHits + EarlyGoogle,
                   importance=TRUE, data=google)
print(rf)
cat("\nVariables by importance for forests:\n")
print(importance(rf))

cat("\nBase-rate predictor of ~65% products alive:\n")
print(sum(FALSE == google$Dead) / nrow(google))
cat("\nLogistic regression's correct predictions:\n")
print(sum((exp(predict(lmodel))>1) == google$Dead) / nrow(google))
cat("\nRandom forest's correct predictions:\n")
print(sum((as.logical(predict(rf))) == google$Dead) / nrow(google))

cat("\nBegin bootstrap test of predictive accuracy...\n")
cat("\nGet a subsample, train logistic regression on it, test accuracy on original Google data:\n")
logisticPredictionAccuracy <- function(gb, indices) {
  g <- gb[indices,] # allows boot to select subsample
  # train new regression model on subsample
  lmodel <- glm(Dead ~ Acquisition + FLOSS + Profit + Social + Type +
                       DeflatedHits + EarlyGoogle + Days,
                   data=g, family="binomial")
  return(sum((exp(predict(lmodel, newdata=google))>1) == google$Dead) / nrow(google))
}
lbs <- boot(data=google, statistic=logisticPredictionAccuracy, R=20000, parallel="multicore", ncpus=4)
print(boot.ci(lbs, type="norm"))

cat("\nDitto for random forests:\n")
randomforestPredictionAccuracy <- function(gb, indices) {
  g <- gb[indices,]
  rf <- randomForest(as.factor(Dead) ~ Acquisition + FLOSS + Profit + Social + Type +
                       DeflatedHits + EarlyGoogle + Days,
                   data=g)
  return(sum((as.logical(predict(rf, newdata=google))) == google$Dead) / nrow(google))
}
rfbs <- boot(data=google, statistic=randomforestPredictionAccuracy, R=20000, parallel="multicore", ncpus=4)
print(boot.ci(rfbs, type="norm"))

cat("\nFancier comparison: random survival forests and full Cox model with bootstrap\n")
rsf <- rfsrc(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type + DeflatedHits + EarlyGoogle,
             data=google, nsplit=1)
print(rsf)

png(file="~/wiki/images/google/rsf-importance.png", width = 1.5*480, height = 1*480)
plot(rsf)
invisible(dev.off())

# calculate cumulative hazard function; adapted from Mogensen et al 2012 (I don't understand this)
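# (the random survival forest predicts a cumulative hazard function; survival is recovered as S(t) = exp(-H(t)))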
predictSurvProb.rsf <- function (object, newdata, times, ...) {
    N <- NROW(newdata)
    # class(object) <- c("rsf", "grow")
    S <- exp(-predict.rfsrc(object, test = newdata)$chf)
    if(N == 1) S <- matrix(S, nrow = 1)
    Time <- object$time.interest
    p <- cbind(1, S)[, 1 + sindex(Time, times),drop = FALSE]
    if(NROW(p) != NROW(newdata) || NCOL(p) != length(times))
     stop("Prediction failed")
    p
}
totals <- as.integer(as.Date("2013-04-01") - google$Started)
randomSurvivalPredictionAccuracy <- function(gb, indices) {
    g <- gb[indices,]
    rsfB <- rfsrc(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type +
                                     DeflatedHits + EarlyGoogle,
                 data=g, nsplit=1)

    predictionMatrix <- predictSurvProb.rsf(rsfB, google, totals)
    predictions <- rep(NA, nrow(google)) # survival probability of each product at its age as of 2013-04-01
    for (i in 1:nrow(google)) { predictions[i] <- predictionMatrix[i,i] }

    return(sum((predictions<0.50) == google$Dead) / nrow(google))
}
# accuracy on full Google dataset
print(randomSurvivalPredictionAccuracy(google, 1:nrow(google)))
# check this high accuracy using bootstrap
rsfBs <- boot(data=google, statistic=randomSurvivalPredictionAccuracy, R=200, parallel="multicore", ncpus=4)
print(rsfBs)
print(boot.ci(rsfBs, type="perc"))

coxProbability <- function(cm, d, t) {
    x <- survfit(cm, newdata=d)
    # survival probability at the first curve time-point past t
    p <- x$surv[Position(function(a) a>t, x$time)]
    # if the curve is empty, step back a day & retry; if t is past its end, treat survival as 0
    if (length(p)==0) { p <- coxProbability(cm, d, t-1) } else { if (is.na(p)) p <- 0 }
    p
    }
randomCoxPredictionAccuracy <- function(gb, indices) {
    g <- gb[indices,]
    cmodel <- coxph(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type + DeflatedHits,
                data = g)

    predictions <- rep(NA, nrow(google)) # conditional survival probability of each product at its age as of 2013-04-01
    for (i in 1:nrow(google)) { predictions[i] <- coxProbability(cmodel, google[i,], totals[i]) }

    return(sum((predictions<0.50) == google$Dead) / nrow(google))
    }
print(randomCoxPredictionAccuracy(google, 1:nrow(google)))
coxBs <- boot(data=google, statistic=randomCoxPredictionAccuracy, R=200, parallel="multicore", ncpus=4)
print(coxBs)
print(boot.ci(coxBs, type="perc"))

cat("\nRanking products by Cox risk ratio...\n")
google$RiskRatio <- predict(cmodel, type="risk")
alive <- google[!google$Dead,]

cat("\nExtract the 10 living products with highest estimated relative risks:\n")
print(head(alive[order(alive$RiskRatio, decreasing=TRUE),], n=10)$Product)

cat("\nExtract the 10 living products with lowest estimated relative risk:\n")
print(head(alive[order(alive$RiskRatio, decreasing=FALSE),], n=10)$Product)

cat("\nBegin calculating specific numerical predictions about remaining lifespans..\n")
cpmodel <- cph(Surv(Days, Dead) ~ Acquisition + FLOSS + Profit + Social + Type +
                                  DeflatedHits + EarlyGoogle,
               data = google, x=TRUE, y=TRUE, surv=TRUE)
predictees <- subset(google, Product %in% c("Alerts","Blogger","FeedBurner","Scholar",
                                            "Book Search","Voice","Gmail","Analytics",
                                            "AdSense","Calendar","Google+","Docs",
                                            "Search", "Project Glass", "Chrome", "Translate"))
# seriously ugly hack
conditionalProbability <- function (d, followupUnits) {
    chances <- rep(NA, nrow(d)) # stash results

    for (i in 1:nrow(d)) {

        # extract chance of particular subject surviving as long as it has:
        beginProb <- survest(cpmodel, d[i,], times=(d[i,]$Days))$surv
        if (length(beginProb)==0) { beginProb <- 1 } # set to a default

        tmpFollowup <- followupUnits # reset in each for loop
        while (TRUE) {
            # extract chance of subject surviving as long as it has + an arbitrary additional time-units
            endProb <- survest(cpmodel, d[i,], times=(d[i,]$Days + tmpFollowup))$surv
            # the survival curve may not reach that far! 'survest' returns 'numeric(0)' if it doesn't,
            # so we shrink the follow-up by 1 day and try again until 'survest' *does* return a usable answer
            if (length(endProb)==0) { tmpFollowup <- tmpFollowup - 1 } else { break }
        }

        # if 50% of all subjects survive to time t, and 20% of all survive to time t+100, say, what chance
        # does a survivor - at exactly time t - have of making it to time t+100? 40%: 0.20 / 0.50 = 0.40
        chances[i] <- endProb / beginProb
    }
    return(chances)
}
## the risks and survival estimate have been stashed in the original CSV to save computation
# google$RelativeRisk <- predict(cmodel, newdata=google, type="risk")
# google$LinearPredictor <- predict(cmodel, newdata=google, type="lp")
# google$ExpectedEvents <- predict(cmodel, newdata=google, type="expected")
# google$FiveYearSurvival <- conditionalProbability(google, 5*365.25)

# graphs survival curves for each of the 15
png(file="~/wiki/images/google/15-predicted-survivorship-curves.png", width = 1.5*480, height = 1*480)
plot(survfit(cmodel, newdata=predictees),
     xlab = "time", ylab="Survival", main="Survival curves for 15 selected Google products")
invisible(dev.off())
cat("\nPredictions for the 15 and also their relative risks:\n")
ps <- conditionalProbability(predictees, 5*365.25)
print(data.frame(predictees$Product, ps*100))
print(round(predict(cmodel, newdata=predictees, type="risk"), digits=2))

# Analysis done

cat("\nOptimizing the generated graphs by cropping whitespace & losslessly compressing them...\n")
system(paste('cd ~/wiki/images/google/ &&',
             'for f in *.png; do convert "$f" -crop',
             '`nice convert "$f" -virtual-pixel edge -blur 0x5 -fuzz 10% -trim -format',
             '\'%wx%h%O\' info:` +repage "$f"; done'))
system("optipng -o9 -fix ~/wiki/images/google/*.png", ignore.stdout = TRUE)

Leakage

While the hit-counts are a pos­si­ble form of leak­age, I acci­den­tally caused a clear case of leak­age while see­ing how ran­dom forests would do in pre­dict­ing shut­downs.

One way to get data leakage is to include the end-date: early on in my analysis, I removed the Dead variable, but it didn’t occur to me to remove the Ended date factor. The resulting random forest predicted every single shutdown correctly except for 8, an error rate of 2%. How did it turn in this nearly-omniscient set of predictions, and why did it get those 8 wrong? Because those 8 products are correctly marked “dead” in the original dataset (their shutdowns had already been announced by Google), but they were scheduled to die after the day I ran the code. So the random forest was simply emitting ‘dead’ for anything with an end date before 2013-04-04, and ‘alive’ for everything thereafter!

library(randomForest)
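# 'google[-1]' drops only the first column (the Product name), so the '.' formula still includes
# the Ended date - which gives the answer away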
rf <- randomForest(as.factor(Dead) ~ ., data=google[-1])
google[rf$predicted != google$Dead,]
#                      Product Dead    Started      Ended     Hits    Type Profit FLOSS Acquisition
# 24 Gmail Exchange ActiveSync TRUE 2009-02-09 2013-07-01   637000 service  FALSE FALSE       FALSE
# 30  CalDAV support for Gmail TRUE 2008-07-28 2013-09-16   245000 service  FALSE FALSE       FALSE
# 37                    Reader TRUE 2005-10-07 2013-07-01 79100000 service  FALSE FALSE       FALSE
# 38               Reader Play TRUE 2010-03-10 2013-07-01    43500 service  FALSE FALSE       FALSE
# 39                   iGoogle TRUE 2005-05-01 2013-11-01 33600000 service   TRUE FALSE       FALSE
# 74            Building Maker TRUE 2009-10-13 2013-06-01  1730000 service  FALSE FALSE       FALSE
# 75             Cloud Connect TRUE 2011-02-24 2013-04-30   530000 program  FALSE FALSE       FALSE
# 77   Search API for Shopping TRUE 2011-02-11 2013-09-16   217000 service   TRUE FALSE       FALSE
#    Social Days  AvgHits DeflatedHits AvgDeflatedHits
# 24  FALSE 1603   419.35    9.308e-06         -0.5213
# 30  FALSE 1876   142.86    4.823e-06         -0.4053
# 37   TRUE 2824 28868.61    7.396e-03         -2.0931
# 38  FALSE 1209    38.67    3.492e-07         -0.2458
# 39   TRUE 3106 11590.20    4.001e-03         -1.6949
# 74  FALSE 1327  1358.99    1.739e-05         -0.6583
# 75  FALSE  796   684.75    2.496e-06         -0.5061
# 77  FALSE  948   275.73    1.042e-06         -0.4080
# rf
# ...
#     Type of random forest: classification
#                      Number of trees: 500
# No. of variables tried at each split: 3
#
#         OOB estimate of  error rate: 2.29%
# Confusion matrix:
#       FALSE TRUE class.error
# FALSE   226    0     0.00000
# TRUE      8  115     0.06504
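
The fix is simply to refit without the leaky Ended date, keeping only covariates knowable in advance of any shutdown announcement (essentially the specification used in the main analysis); a minimal sketch, where rfClean is just an illustrative name:

library(randomForest)
# exclude the Product name and the leaky Ended date; use only pre-shutdown covariates
rfClean <- randomForest(as.factor(Dead) ~ Type + Profit + FLOSS + Acquisition + Social + DeflatedHits,
                        importance=TRUE, data=google)
print(rfClean)           # the OOB error rate is no longer a too-good-to-be-true ~2%
print(importance(rfClean))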

  1. One sober­ing exam­ple I men­tion in my : were gone within a year. I do not know what the full dimen­sion of the Reader RSS archive loss will be.↩︎

  2. Google Reader affords exam­ples of this lack of trans­parency on a key issue - Google’s will­ing­ness to sup­port Reader (ex­tremely rel­e­vant to users, and even more so to the third-party web ser­vices and appli­ca­tions which relied on Reader to func­tion); from Buz­zFeed’s “Google’s Lost Social Net­work: How Google acci­den­tally built a truly beloved social net­work, only to steam­roll it with Google+. The sad, sur­pris­ing story of Google Reader”:

    The diffi­culty was that Reader users, while hyper­-en­gaged with the pro­duct, never snow­balled into the tens or hun­dreds of mil­lions. Brian Shih became the prod­uct man­ager for Reader in the fall of 2008. “If Reader were its own star­tup, it’s the kind of com­pany that Google would have bought. Because we were at Google, when you stack it up against some of these prod­ucts, it’s tiny and isn’t worth the invest­ment”, he said. At one point, Shih remem­bers, engi­neers were pulled off Reader to work on OpenSo­cial, a “half-baked” devel­op­ment plat­form that never amounted to much. “There was always a polit­i­cal fight inter­nally on keep­ing peo­ple staffed on this lit­tle project”, he recalled. Some­one hung a sign in the Reader offices that said “DAYS SINCE LAST THREAT OF CANCELLATION.” The num­ber was almost always zero. At the same time, user growth - while small next to Gmail’s hun­dreds of mil­lions - more than dou­bled under Shi­h’s tenure. But the “senior types”, as Bilotta remem­bers, “would look at absolute user num­bers. They would­n’t look at mar­ket sat­u­ra­tion. So Reader was con­stantly on the chop­ping block.”

    So when news spread inter­nally of Read­er’s geld­ing, it was like Hem­ing­way’s line about going broke: “Two ways. Grad­u­al­ly, then sud­den­ly.” Shih found out in the spring that Read­er’s inter­nal shar­ing func­tions - the asym­met­ri­cal fol­low­ing mod­el, endemic com­ment­ing and lik­ing, and its advanced pri­vacy set­tings - would be super­seded by the forth­com­ing Google+ mod­el. Of course, he was for­bid­den from breath­ing a word to users.

    Marco Arment says “I’ve heard from mul­ti­ple sources that it effec­tively had a staff of zero for years”.↩︎

  3. Shih fur­ther writes on Quora:

    Let’s be clear that this has noth­ing to do with rev­enue vs oper­at­ing costs. Reader never made money directly (though you could maybe attribute some of Feed­burner and AdSense for Feeds usage to it), and it was­n’t the goal of the prod­uct. Reader has been fight­ing for approval/survival at Google since long before I was a PM for the prod­uct. I’m pretty sure Reader was threat­ened with de-staffing at least three times before it actu­ally hap­pened. It was often for some rea­son related to social:

    • 2008 - let’s pull the team off to build OpenSo­cial
    • 2009 - let’s pull the team off to build Buzz
    • 2010 - let’s pull the team off to build Google+

    It turns out they decided to kill it any­way in 2010, even though most of the engi­neers opted against join­ing G+. Iron­i­cal­ly, I think the rea­son Google always wanted to pull the Reader team off to build these other social prod­ucts was that the Reader team actu­ally under­stood social (and tried a lot of exper­i­ments over the years that informed the larger social fea­tures at the com­pa­ny) [See Read­er’s friends imple­men­ta­tions v1, v2, and v3, com­ments, pri­vacy con­trols, and shar­ing fea­tures. Actu­ally wait, you can’t see those any­more, since they were all ripped out­.]. Read­er’s social fea­tures also evolved very organ­i­cally in response to users, instead of being designed top-down like some of Google’s other efforts [Rob Fish­man’s Buz­zfeed arti­cle has good cov­er­age of this: Google’s Lost Social Net­work]. I sus­pect that it sur­vived for some time after being put into main­te­nance because they believed it could still be a use­ful source of con­tent into G+. Reader users were always vora­cious con­sumers of con­tent, and many of them fil­tered and shared a great deal of it. But after switch­ing the shar­ing fea­tures over to G+ (the so called “share-poca­lypse”) along with the redesigned UI, my guess is that usage just started to fall - par­tic­u­larly around shar­ing. I know that my shar­ing basi­cally stopped com­pletely once the redesign hap­pened [Reader redesign: Ter­ri­ble deci­sion, or worst deci­sion? I was a lot angrier then than I am now – now I’m just sad.]. Though Google did ulti­mately fix a lot of the UI issues, the shar­ing (and there­fore con­tent going into G+) would never recov­er. So with dwin­dling use­ful­ness to G+, (like­ly) dwin­dling or flat­ten­ing usage due to being in main­te­nance, and Google’s big drive to focus in the last cou­ple of years, what choice was there but to kill the pro­duct?

    ↩︎
  4. The sign story is con­firmed by another Googler; “Google Reader lived on bor­rowed time: cre­ator Chris Wetherell reflects”:

    “When they replaced shar­ing with +1 on Google Read­er, it was clear that this day was going to come”, he said. Wetherell, 43, is amazed that Reader has lasted this long. Even before the project saw the light of the day, Google exec­u­tives were unsure about the ser­vice and it was through sheer per­se­ver­ance that it squeaked out into the mar­ket. At one point, the man­age­ment team threat­ened to can­cel the project even before it saw the light of the day, if there was a delay. “We had a sign that said, ‘days since can­cel­la­tion’ and it was there from the very begin­ning”, added a very san­guine Wetherell. My trans­la­tion: Google never really believed in the pro­ject. Google Reader started in 2005 at what was really the golden age of RSS, blog­ging sys­tems and a new con­tent ecosys­tem. The big kahuna at that time was (ac­quired by ) and Google Reader was an upstart.

    ↩︎
  5. The offi­cial PR release stated that too lit­tle usage was the rea­son Reader was being aban­doned. Whether this is the gen­uine rea­son has been ques­tioned by third par­ties, who observe that Reader seems to drive far more traffic than another ser­vice which Google had yet to ax, Google+; that one app had >2m users who also had Reader accounts; that just one alter­na­tive to Reader (Feed­ly) had in excess of 3 mil­lion signups post-an­nounce­ment (re­port­ed­ly, up to 4 mil­lion); and the largest of sev­eral peti­tions to Google reached 148k sig­na­tures (less, though, than the >1m down­loads of the Android clien­t). Given that few users will sign up at Feedly specifi­cal­ly, sign a peti­tion, visit the Buz­zFeed net­work, or use the apps in ques­tion, it seems likely that Reader had closer to 20m users than 2m users when its clo­sure was announced. An unknown Google engi­neer has been quoted as say­ing in 2010 Reader had “tens of mil­lions active monthly users”. Xoogler Jenna Bilotta (left Google Novem­ber 2011) said

    “I think the rea­son why peo­ple are freak­ing out about Reader is because that Reader did stick,” she said, not­ing the wide­spread sur­prise that Google would shut down such a beloved prod­uct. “The num­bers, at least until I left, were still going up.”

    The most pop­u­lar feed on Google Reader in March 2013 had 24.3m sub­scribers (some pix­el-count­ing of an offi­cial user-count graph & infer­ence from a leaked video sug­gests Reader in total may’ve had >36m users in Jan 2011). Jason Scott in 2009 reminded us that this lack of trans­parency is com­pletely pre­dictable: “Since the dawn of time, com­pa­nies have hired peo­ple whose entire job is to tell you every­thing is all right and you can com­pletely trust them and the com­pany is as sta­ble as a rock, and to do so until they, them­selves, are fired because the com­pany is out of busi­ness.”↩︎

  6. This would not come as news to Jason Scott of , of course, but nev­er­the­less James Fal­lows points out that when a cloud ser­vice evap­o­rates, it’s sim­ply gone and gives an inter­est­ing com­par­ison:

    , in Mother Jones, on why the inabil­ity to rely on Google ser­vices is more dis­rup­tive than the famil­iar pre-cloud expe­ri­ence of hav­ing favorite pro­grams get orphaned. My exam­ple is : it has offi­cially been dead for nearly 20 years, but I can still use it (if I want, in a DOS ses­sion under the VMware Fusion Win­dows emu­la­tor on my Macs. Talk about lay­ered legacy sys­tem­s!). When a cloud pro­gram goes away, as Google Reader has done, it’s gone. There is no way you can keep using your own “legacy” copy, as you could with pre­vi­ous orphaned soft­ware.

    ↩︎
  7. From Gan­nes’s “Another Rea­son Google Reader Died: Increased Con­cern About Pri­vacy and Com­pli­ance”

    But at the same time, Google Reader was too deeply inte­grated into Google Apps to spin it off and sell it, like the com­pany did last year with its SketchUp 3-D mod­el­ing soft­ware.

    mat­tbar­rie on Hacker News:

    I’m here with Alan Noble who runs engi­neer­ing at Google Aus­tralia and ran the Google Reader project until 18 months ago. They looked at open sourc­ing it but it was too much effort to do so because it’s tied to closely to Google infra­struc­ture. Basi­cally it’s been culled due to long term declin­ing use.

    ↩︎
  8. The sheer size & dominance of some Google services have led to comparisons to natural monopolies, such as the Economist column “Google’s Google problem”. I saw this comparison mocked, but it’s worth noting that at least one Googler made the same comparison years before. From ’s 2011, part 7, section 2:

    While some Googlers felt sin­gled out unfairly for the atten­tion, the more mea­sured among them under­stood it as a nat­ural con­se­quence of Google’s increas­ing pow­er, espe­cially in regard to dis­trib­ut­ing and stor­ing mas­sive amounts of infor­ma­tion. “It’s as if Google took over the water sup­ply for the entire United States”, says Mike Jones, who han­dled some of Google’s pol­icy issues. “It’s only fair that soci­ety slaps us around a lit­tle bit to make sure we’re doing the right thing.”

    ↩︎
  9. Specifi­cal­ly, this can be seen as a sort of issue of reduc­ing : in some of the more suc­cess­ful acqui­si­tions, Google’s modus operandi was to take a very expen­sive or highly pre­mium ser­vice and make it com­pletely free while also improv­ing the qual­i­ty. Ana­lyt­ics, Maps, Earth, Feed­burner all come to mind as ser­vices whose pre­de­ces­sors (mul­ti­ple, in the cases of Maps and Earth) charged money for their ser­vices (some­times a great deal). This leads to dead­weight loss as peo­ple do not use them, who would ben­e­fit to some degree but not to the full amount of the price (plus other fac­tors like risk­i­ness of invest­ing time and money into try­ing it out). Google cites fig­ures like bil­lions of users over the years for sev­eral of these for­mer­ly-premium ser­vices, sug­gest­ing the gains from reduced dead­weight loss are large.↩︎

  10. If there is one truth of the tech indus­try, it’s that no giant (ex­cept IBM) sur­vives for­ev­er. Death rates for all cor­po­ra­tions and non­profits are very high, but par­tic­u­larly so for tech. One blog­ger asks a good ques­tion:

    As we come to rely more and more on the Inter­net, it’s becom­ing clear that there is a real threat posed by tying one­self to a 3rd party ser­vice. The Inter­net is famously designed to route around fail­ures caused by a nuclear strike - but it can­not defend against a ser­vice being with­drawn or a com­pany going bank­rupt. It’s tempt­ing to say that mul­ti­-bil­lion dol­lar com­pa­nies like Apple and Google will never dis­ap­pear - but a quick look at his­tory shows Nokia, Enron, Amstrad, Sega, and many more which have fallen from great heights until they are mere shells and no longer offer the ser­vices which many peo­ple once relied on…I like to pose this ques­tion to my pho­tog­ra­phy friends - “What would you do if Yahoo! sud­denly decided to delete all your Flickr pho­tos?” Some of them have back­ups - most faint at the thought of all their work van­ish­ing.

    ↩︎
  11. Weber’s con­clu­sion:

    We dis­cov­ered there’s been a total of about 251 inde­pen­dent Google prod­ucts since 1998 (avoid­ing add-on fea­tures and exper­i­ments that merged into other pro­ject­s), and found that 90, or approx­i­mately 36% of them have been can­celed. Awe­some­ly, we also col­lected 8 major flops and 14 major suc­cess­es, which means that 36% of its high­-pro­file prod­ucts are fail­ures. That’s quite the coin­ci­dence! NOTE: We did not manip­u­late data to come to this con­clu­sion. It was a happy acci­dent.

    In an even more happy acci­dent, my dataset of 350 prod­ucts yields 123 canceled/shutdown entries, or 35%!↩︎

  12. Some have jus­ti­fied Read­er’s shut­down as sim­ply a ratio­nal act, since Reader was not bring­ing in any money and Google is not a char­i­ty. The truth seems to be related more to Google’s lack of inter­est since the start - it’s hard to see how Google could pos­si­bly be able to mon­e­tize Gmail and not also mon­e­tize Read­er, which is con­firmed by two involved Googlers (from “Google Reader lived on bor­rowed time: cre­ator Chris Wetherell reflects”):

    I won­der, did the com­pany (Google) and the ecosys­tem at large mis­read the tea leaves? Did the world at large see an RSS/reader mar­ket when in real­ity the actual mar­ket oppor­tu­nity was in data and sen­ti­ment analy­sis? [Chris] Wetherell agreed. “The reader mar­ket never went past the exper­i­men­tal phase and none was iter­at­ing on the busi­ness mod­el,” he said. “Mon­e­ti­za­tion abil­i­ties were never tried.”

    “There was so much data we had and so much infor­ma­tion about the affin­ity read­ers had with cer­tain con­tent that we always felt there was mon­e­ti­za­tion oppor­tu­ni­ty,” he said. Dick Cos­tolo (cur­rently CEO of Twit­ter), who worked for Google at the time (hav­ing sold Google his com­pa­ny, Feed­burn­er), came up with many mon­e­ti­za­tion ideas but they fell on deaf ears. Cos­tolo, of course is work­ing hard to mine those affin­i­ty-and-con­text con­nec­tions for Twit­ter, and is suc­ceed­ing. What Cos­tolo under­stood, Google and its man­darins totally missed, as noted in this Novem­ber 2011 blog post by Chris who wrote:

    Reader exhibits the best unpaid rep­re­sen­ta­tion I’ve yet seen of a con­sumer’s rela­tion­ship to a con­tent pro­ducer. You pay for HBO? That’s a strong sig­nal. Con­sum­ing free stuff? Read­er’s model was a dream. Even bet­ter than Net­flix. You get affin­ity (which has clear mon­e­tary val­ue) for free, and a tracked pat­tern of behav­ior for the act of iter­at­ing over differ­ently sourced items - and a mech­a­nism for dis­trib­ut­ing that quickly to an osten­si­ble audi­ence which did­n’t include social guilt or gameifi­ca­tion - along with an exten­si­ble, scal­able plat­form avail­able via com­monly used web tech­nolo­gies - all of which would be an amaz­ing oppor­tu­nity for the right prod­uct vision­ary. Reader is (was?) for infor­ma­tion junkies; not just tech nerds. This mar­ket totally exists and is weirdly under­-served (and is pos­si­bly afflu­en­t).

    Over­all, from just the PR per­spec­tive, Google prob­a­bly would have been bet­ter off switch­ing Reader to a sub­scrip­tion model and then even­tu­ally killing it while claim­ing the fees weren’t cov­er­ing the costs. Offhand, 3 exam­ples of Google adding or increas­ing fees come to mind: the Maps API, Talk inter­na­tional calls (ap­par­ently free ini­tial­ly), and App Engine fees; the API price increase was even­tu­ally rescinded as far as I know, and no one remem­bers the lat­ter two (not even App Engine devs).↩︎

  13. , “Free­dom 0”; iron­i­cal­ly, Pil­grim (hired by Google in 2007) seems to be respon­si­ble for at least one of the entries being marked dead, Google’s Doc­type tech ency­clo­pe­dia, since it dis­ap­peared around the time of his “info­s­ui­cide” and has not been res­ur­rected - it was only par­tially FLOSS.↩︎

  14. #18 in The Col­lected Songs of Cold Moun­tain, Red Pine 2000, ISBN 1-55659-140-3↩︎

  15. Xoogler Rachel Kroll on this spike:

    I have some thoughts about the spikes on the death dates.

    Sep­tem­ber: all of the interns go back to school. These peo­ple who exist on the fringes of the sys­tem man­age to get a lot of work done, pos­si­bly because they are free of most of the over­head fac­ing real employ­ees. Once they leave, it’s up to the FTEs [Full Time Employ­ee] to own what­ever was cre­at­ed, and that does­n’t always work. I wish I could have kept some of them and swapped them for some of the ful­l-timers.

    March/April: Annual bonus time? That’s what it used to be, at least, and I say this as some­one who quit in May, and that was no acci­dent. Same thing: peo­ple leave, and that dooms what­ever they left.

    ↩︎
  16. 0.057; but as the old crit­i­cism of NHST goes, “surely God loves the 0.057 almost as much as the 0.050”.↩︎

  17. Specifi­cal­ly: build­ing a logis­tic model on a boot­strap sam­ple and then test­ing accu­racy against full Google dataset.↩︎

  18. But note that “sun­set­ting” of “con­sumer Google+” was announced in Octo­ber 2018.↩︎

  19. I include Voice even though I don’t use it or oth­er­wise find it inter­est­ing (my cri­te­ria for the other 10) because spec­u­la­tion has been rife and because a pre­dic­tion on its future was requested.↩︎