Alerts Over Time

Does Google Alerts return fewer results each year? A statistical investigation
statistics, shell, R, Google
2013-07-01–2013-11-26 finished certainty: likely importance: 4


Has Google Alerts been sending fewer results the past few years? Yes. Responding to rumors of its demise, I investigate the number of results in my personal Google Alerts notifications 2007-2013, and find no overall trend of decline until I look at a transition in mid-2011 where the results fall dramatically. I speculate about the cause and implications for Alerts's future.

While researching my essay on how long Google products survive before being killed (inspired by Reader), I came across speculation that Google Alerts, a service which runs search queries on your behalf & emails you about any new matching webpages (extremely useful for keeping abreast of topics, and one of the oldest Google services), had broken badly in 2012. When I saw this, I remembered thinking that my own alerts did not seem to be as useful as they once were, but I hadn't been sure if this was Alerts's fault or if my particular keywords were just less active than in the past. Google's official comments on the topic have been minimal1.

Alerts dying would be a problem for me, as I have used Alerts extensively since 2007-01-28 (2347 days) with 23 current Alerts (and many more in the past) - of my 501,662 total emails, 3,815 were Alert emails - and there did not seem to be any usable alternatives2. Troublingly, Alerts's RSS feeds were unavailable between July & September 2013.

As it happened, the survival model suggested that Alerts had a good chance of surviving a long time, and I put it from mind until I remembered that since I had used Alerts for so many years and had so many emails, I could easily check the claims empirically - did Alerts abruptly stop returning many hits? This is a straightforward question to answer: extract the subject/date/number of links from each Alerts email, stratify by unique alert, and regress over time. So I did.

Data

As part of my backup procedures, I fetch my Gmail emails daily using getmail4 into a Maildir under ~/mail/. Alerts uses an unchanging subject line like Subject: Google Alert - “Frank Herbert” -mason, so it is easy to find all its emails and separate them out:

$ find ~/mail/ -type f -exec fgrep -l 'Google Alert -' {} \;
/home/gwern/mail/new/1282125775.M532208P12203Q683Rb91205f53b0fec0d.craft
/home/gwern/mail/new/1282125789.M55800P12266Q737Rd98db4aa1e58e9ed.craft
...
$ find ~/mail/ -type f -exec fgrep -l 'Google Alert -' {} \; > alerts.txt
$ mkdir 2013-09-25-gwern-googlealertsemails/
$ mv `cat alerts.txt` 2013-09-25-gwern-googlealertsemails/

I deleted emails from a few alerts which were private; the remaining 72M of emails are available at 2013-09-25-gwern-googlealertsemails.tar.xz. Then a loop & some ad hoc shell-scripting extract the subject line, the date, and how many instances of “http://” there are in each email:

cd 2013-09-25-gwern-googlealertsemails/

echo "Search,Date,Links" >> alerts.csv # set up the header
for EMAIL in *.craft *.elan; do

    # alert name, taken from the subject line
    SUBJECT="`egrep '^Subject: Google Alert - ' $EMAIL  | sed -e 's/Subject: Google Alert - //'`"
    # date sent: keep the day/month/year fields, dropping the weekday & time, and strip any trailing HTML
    DATE="`egrep '^Date: ' $EMAIL | cut -d ' ' -f 3-5 | sed -e 's/<b>..*/ /'`"
    # link count, approximated by the number of lines containing 'http://'
    COUNT="`fgrep --no-filename --count 'http://' $EMAIL`"

    echo $SUBJECT,$DATE,$COUNT >> alerts.csv
done

The scripting isn't perfect, and I had to delete several spurious lines before I could read it into R and format it into a clean CSV:

alerts <- read.csv("alerts.csv", quote=c(), colClasses=c("character","character","integer"))
alerts$Date <- as.Date(alerts$Date, format="%d %b %Y")
write.csv(alerts, file="2013-09-25-gwern-googlealerts.csv", row.names=FALSE)
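
As a quick sanity check (a sketch, not part of the original pipeline), one can confirm the date parsing worked and see how the emails are distributed across alerts:

# any dates that failed to parse become NA
sum(is.na(alerts$Date))
# span of the data
range(alerts$Date, na.rm=TRUE)
# emails per alert, most prolific first
head(sort(table(alerts$Search), decreasing=TRUE))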

Analysis

Descriptive

alerts <- read.csv("https://www.gwern.net/docs/personal/2013-09-25-gwern-googlealerts.csv",
                   colClasses=c("factor","Date","integer"))
summary(alerts)
                     Search          Date                Links
 wikipedia              : 255   Min.   :2007-01-28   Min.   :  0.0
 Neon Genesis Evangelion: 247   1st Qu.:2008-12-29   1st Qu.: 10.0
 "Gene Wolfe"           : 246   Median :2011-02-07   Median : 22.0
 "Nick Bostrom"         : 224   Mean   :2010-10-06   Mean   : 37.9
 modafinil              : 186   3rd Qu.:2012-06-15   3rd Qu.: 44.0
 "Frank Herbert" -mason : 184   Max.   :2013-09-25   Max.   :563.0
 (Other)                :2585

# So many because I have deleted many topics I am no longer interested in,
# and refined the search criteria of others
length(unique(alerts$Search))
[1] 68

plot(Links ~ Date, data=alerts)
Links in each email, graphed over time

The first thing I notice is that it looks like the number of links per email is going up over time, with a spike in mid-2010. The second is that there's quite a bit of variation from email to email - while most are around 0, some are as high as 300. The third is that there's a weird early anomaly where emails are recorded as having 0 links; looking at those emails, they are encoded, for no apparent reason, and then all subsequent emails are in more sensible HTML/text formats. An ill-fated experiment by Google? I have no idea. The highest number is 563, which isn't very big; so despite the skew, I didn't bother to log-transform Links.
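
One can quantify both the zero-link anomaly and the skew directly (a quick sketch, not in the original analysis):

# when do the suspicious zero-link emails occur?
table(format(alerts$Date[alerts$Links == 0], "%Y"))
# how heavy is the right tail of the link counts?
quantile(alerts$Links, c(0.50, 0.90, 0.99, 1))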

Linear model

The spikiness prompts me to adjust for variation in sending rate and whatnot by bucketing emails into months, and to overlay a linear regression:

library(lubridate)
alerts$Date <- floor_date(alerts$Date, "month")
alerts <- aggregate(Links ~ Search + Date, alerts, "sum")

# a simple linear model agrees with *small* monthly increase, but notes that there is a tremendous
# amount of unexplained variation
lm <- lm(Links ~ Date, data=alerts); summary(lm)
...
Residuals:
   Min     1Q Median     3Q    Max
-175.8 -110.0  -60.5   43.3  992.6

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.13e+02   1.10e+02   -3.76  0.00018
Date         3.73e-02   7.38e-03    5.06  4.9e-07

Residual standard error: 175 on 1046 degrees of freedom
Multiple R-squared:  0.0239,    Adjusted R-squared:  0.023
F-statistic: 25.6 on 1 and 1046 DF,  p-value: 4.89e-07

plot(Links ~ Date, data=alerts)
abline(lm)
Total links in each search, by month

No big difference from the original plot: still a generic increasing trend. But here is a basic problem with this regression: does this increase reflect an increase in the number of alerts I am subscribed to, tweaks to each alert to make it return more hits (shifting from old alerts to new alerts), or an increase in links per unique alert? Only the last claim is the one we are interested in, but any of these or other phenomena could produce an increase.
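
A quick way to check the first possibility (a sketch; this check is not part of the original analysis) is to count how many distinct alerts contribute to each monthly total - if that count climbs over the same period, the aggregate rise may simply track the number of subscriptions:

# number of distinct alerts active in each month
active <- aggregate(Search ~ Date, alerts, function(s) { length(unique(s)) })
plot(Search ~ Date, data=active, ylab="Active alerts per month")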

Per alert

We could try treating each alert separately, doing a linear regression on each, and comparing with the linear model fit to all the data indiscriminately:

library(ggplot2)
qplot(Date, Links, color=Search, data=alerts) +
    stat_smooth(method="lm", se=FALSE, fullrange=TRUE, size=0.2) +
    geom_abline(aes(intercept=lm$coefficients[1], slope=lm$coefficients[2], color=c()), size=1) +
    ylim(0,1130) +
    theme(legend.position = "none")
Splitting the data by alert, regressing individually

The result is chaotic: individual alerts point every which way. Regressing on all alerts together confounds the issues, while regressing on individual alerts produces no agreement. We want some intermediate approach which respects the fact that alerts behave differently, but still yields a meaningful overall statement.

Multi-level model

What we want is to look at each unique alert, estimate its increase/decrease over time, and perhaps summarize all the slopes into a single grand slope. There is a hierarchical structure to the data: the overall slope of Google influences the slope of each alert, which influences the distribution of the data points around each slope.

We can do this with a multilevel model, using lme4.

We'll start by fitting & comparing 2 models:

  1. only the intercept varies between alerts, but all alerts increase or decrease at the same rate
  2. the intercept varies between alerts, and alerts also differ in their waxing or waning (sketched in the equation below)
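
Written out explicitly, model 2 amounts to something like the following, with $i$ indexing months and $j$ indexing alerts:

$$\text{Links}_{ij} = (\alpha + a_j) + (\beta + b_j)\,\text{Date}_{ij} + \epsilon_{ij}, \qquad (a_j, b_j) \sim \mathcal{N}(0, \Sigma), \quad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$$

where $\alpha$ & $\beta$ are the shared (fixed-effect) intercept and slope, and $a_j$ & $b_j$ are each alert's deviations from them; model 1 is the special case with every $b_j = 0$.
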
library(lme4)
mlm1 <- lmer(Links ~ Date + (1|Search), alerts); mlm1

Random effects:
 Groups   Name        Variance Std.Dev.
 Search   (Intercept) 28943    170
 Residual             12395    111
Number of obs: 1048, groups: Search, 68

Fixed effects:
             Estimate Std. Error t value
(Intercept) 427.93512  139.76994    3.06
Date         -0.01984    0.00928   -2.14

Correlation of Fixed Effects:
     (Intr)
Date -0.988

mlm2 <- lmer(Links ~ Date + (1+Date|Search), alerts); mlm2

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Search   (Intercept) 6.40e+06 2529.718
          Date        2.78e-02    0.167 -0.998
 Residual             8.36e+03   91.446
Number of obs: 1048, groups: Search, 68

Fixed effects:
            Estimate Std. Error t value
(Intercept) 295.5469   420.0588    0.70
Date         -0.0090     0.0278   -0.32

Correlation of Fixed Effects:
     (Intr)
Date -0.998

# compare the models: does model 2 buy us anything?
anova(mlm1, mlm2)

mlm1: Links ~ Date + (1 | Search)
mlm2: Links ~ Date + (1 + Date | Search)
     Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)
mlm1  4 13070 13089  -6531    13062
mlm2  6 12771 12801  -6379    12759   303      2     <2e-16

Model 2 is better on both simplicity & fit criteria, so we'll look at it more closely:

coef(mlm2)
$Search
                                                               (Intercept)      Date
                                                                    763.48 -0.046763
adult iodine supplementation (IQ OR intelligence OR cognitive)      718.80 -0.043157
AMD pacifica virtualization                                        -836.63  0.062123
(anime OR manga) (half-Japanese OR hafu OR half-American)           956.52 -0.059438
caloric restriction                                               -2023.10  0.153667
"Death Note" (script OR live-action OR Parlapanides)                866.63 -0.051314
"dual n-back"                                                      4212.59 -0.266879
dual n-back                                                        1213.85 -0.073265
electric sheep screensaver                                         -745.78  0.055937
"Frank Herbert"                                                     -93.28  0.013636
"Frank Herbert" -mason                                            10815.19 -0.676188
freenet project                                                   -1154.14  0.087199
"Gene Wolfe"                                                        496.01 -0.026575
Gene Wolfe                                                         1178.36 -0.072681
...
wikileaks                                                         -3080.74  0.227583
WikiLeaks                                                          -388.34  0.031441
wikipedia                                                         -1668.94  0.133976
Xen                                                                 390.01 -0.017209

The Date coefficient is in links per day (R stores dates as days since 1970-01-01), so when Xen has a slope of -0.017, that means it loses roughly 6 links a year.
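
To put the per-alert slopes on an annual scale, a quick sketch (this conversion is mine, not in the original analysis):

# convert per-day slopes to links per year, and count how many alerts trend up vs down
yearly <- coef(mlm2)$Search$Date * 365.25
summary(yearly)
table(yearly > 0)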

max(abs(coef(mlm2)$Search$Date))
[1] 0.6762

Which comes from the "Frank Herbert" -mason search, probably reflecting how relatively new the search is, or the effectiveness of the filter I added to the original "Frank Herbert" search. In general, the slopes are very similar, there seem to be as many positive slopes as negative ones, and the overall summary slope in the second model is a tiny negative one (-0.01); but most of the searches' slopes exclude zero in the caterpillar plot:

qqmath(ranef(mlm2, postVar=TRUE))

This says to me that there is no large change over time happening within each alert, as the original claims went, but there does seem to be something going on. When we plot the overall regression and the per-alert regressions, we see:

fixParam <- fixef(mlm2)
ranParam <- ranef(mlm2)$Search
params   <- cbind(ranParam[1]+fixParam[1], ranParam[2]+fixParam[2])
p <- qplot(Date, Links, color=Search, data=alerts)
p +
  geom_abline(aes(intercept=`(Intercept)`, slope=Date, color=rownames(params)), data=params, size=0.2) +
  geom_abline(aes(intercept=fixef(mlm2)[1], slope=fixef(mlm2)[2], color=c()), size=1) +
  ylim(0,1130) +
  theme(legend.position = "none")
Multi-level regression, grand and individual fits

This clearly makes more sense than regressing each alert separately: we avoid crazily steep slopes when there are just a few emails to work with, since those regressions get shrunk toward the overall regression. We also see no evidence for any large or statistically-significant change over time for alerts in general: some alerts increase over time but others decrease, and there is only a small overall decrease which we might blame on internal Google problems.
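
How few is "just a few"? A quick check (a sketch, not in the original analysis) of how many monthly observations each alert contributes:

# monthly observations per alert; the thinnest alerts are the ones most prone to wild stand-alone regressions
obs <- table(alerts$Search)
summary(as.vector(obs))
head(sort(obs))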

What about the fall?

Having done all this, I thought I was finished, until I remembered that the original bloggers didn't complain about a steady deterioration over time, but an abrupt one starting somewhere in 2012. What happens when I do a binary split and compare 2010/2011 to 2012/2013?

alertsRecent <- alerts[year(alerts$Date)>=2010,]
alertsRecent$Recent <- year(alertsRecent$Date) >= 2012
wilcox.test(Links ~ Recent, conf.int=TRUE, data=alertsRecent)

    Wilcoxon rank sum test with continuity correction

data:  Links by Recent
W = 71113, p-value = 6.999e-10
alternative hypothesis: true location shift is not equal to 0
95% confidence interval:
 34 75
sample estimates:
difference in location
                    53

I avoided a normality-based test like the t-test and used the Wilcoxon rank-sum test instead, because there's no reason to expect the number of links per month to follow a normal distribution. Regardless of the details, there's a big difference between the two time periods: 219 vs 140 links! A fall of 36% is certainly a serious decline, and it cannot be waved away as due to my Alerts settings (I always use “All results” and never “Only the best results”) nor, as we'll see now, as a confound like the possibilities that motivated the multi-level model:

R> alerts$Recent <- year(alerts$Date) >= 2012
R> mlm3 <- lmer(Links ~ Date + Recent + (1+Date|Search), alerts); mlm3

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Search   (Intercept) 9.22e+03 9.60e+01
          Date        9.52e-05 9.75e-03 -0.164
 Residual             1.18e+04 1.09e+02
Number of obs: 1048, groups: Search, 68

Fixed effects:
             Estimate Std. Error t value
(Intercept) -440.1540   175.3630   -2.51
Date           0.0413     0.0121    3.42
RecentTRUE  -102.2273    13.3224   -7.67

Correlation of Fixed Effects:
           (Intr) Date
Date       -0.993
RecentTRUE  0.630 -0.647

R> anova(mlm1, mlm2, mlm3)
Models:
mlm1: Links ~ Date + (1 | Search)
mlm2: Links ~ Date + (1 + Date | Search)
mlm3: Links ~ Date + Recent + (1 + Date | Search)
     Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)
mlm1  4 13070 13089  -6531    13062
mlm2  6 12771 12801  -6379    12759   303      2     <2e-16
mlm3  7 13015 13050  -6500    13001     0      1          1

It was mid-2011

A new model treating pre-2012 as different confirms a substantial Recent effect, although the anova comparison does not favor it over mlm2. Can we do better? A changepoint analysis fingers May/June 2011 as the culprit, giving a larger difference in means (254 vs 147):

library(changepoint)
plot(cpt.meanvar(alertsRecent$Links), ylab="Links")
Link count 2010-2013, depicting a regime transition in May/June 2011

With this new changepoint, the test is more significant:

alertsRecent <- alerts[year(alerts$Date)>=2010,]
alertsRecent$Recent <- alertsRecent$Date > "2011-05-01"
wilcox.test(Links ~ Recent, conf.int=TRUE, data=alertsRecent)

    Wilcoxon rank sum test with continuity correction

data:  Links by Recent
W = 63480, p-value = 4.61e-12
alternative hypothesis: true location shift is not equal to 0
95% confidence interval:
  62 112
sample estimates:
difference in location
                    87

And the fit improves by a large amount:

R> alerts$Recent <- alerts$Date > "2011-05-01"
R> mlm4 <- lmer(Links ~ Date + Recent + (1+Date|Search), alerts); mlm4

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Search   (Intercept) 8.64e+03 9.30e+01
          Date        9.28e-05 9.63e-03 -0.172
 Residual             1.11e+04 1.05e+02
Number of obs: 1048, groups: Search, 68

Fixed effects:
             Estimate Std. Error t value
(Intercept) -1.11e+03   1.87e+02   -5.91
Date         8.86e-02   1.30e-02    6.83
RecentTRUE  -1.65e+02   1.44e+01  -11.43

Correlation of Fixed Effects:
           (Intr) Date
Date       -0.994
RecentTRUE  0.709 -0.725

R> anova(mlm1, mlm2, mlm3, mlm4)
     Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)
mlm1  4 13070 13089  -6531    13062
mlm2  6 12771 12801  -6379    12759 302.7      2     <2e-16
mlm3  7 13015 13050  -6500    13001   0.0      1          1
mlm4  7 12948 12983  -6467    12934  66.8      0     <2e-16

Robustness

Is the fall robust against different samples of my data, using bootstrapping? The answer is yes, and the Wilcoxon test even turns out to have given us a pretty good confidence interval earlier:

library(boot)
recentEstimate <- function(dt, indices) {
  d <- dt[indices,] # allows boot to select subsample
  mlm4 <- lmer(Links ~ Date + Recent + (1+Date|Search), d)
  return(fixef(mlm4)[3])
}
bs <- boot(data=alerts, statistic=recentEstimate, R=10000, parallel="multicore", ncpus=4); bs
...
Bootstrap Statistics :
    original  bias    std. error
t1*   -164.8   34.06       17.44

boot.ci(bs)
...
Intervals :
Level      Normal              Basic
95%   (-233.0, -164.7 )   (-228.2, -156.7 )

Level     Percentile            BCa
95%   (-172.9, -101.4 )   (-211.8, -156.7 )

A confidence interval of (-159,-95) is both statistically-significant in this context and an effect size to be reckoned with. It seems this mid-2011 fall is real. I'm surprised to find such a precise, localized drop in my Alerts quantities. I did expect to find a decline, but I expected it to be a gradual incremental process as Google's search algorithms gradually excluded more and more links. I didn't expect to be able to say something like "in this month, results dropped by more than a third".

Panda?

I don't know of any changes to Google Alerts announced in May/June 2011, and the emails can't tell us directly what happened. But I can speculate.

There is one culprit that comes to mind for what may have changed in early 2011 and would then have led to a fall in collated links (a fall which would accumulate to statistical-significance in June 2011): the pervasive change to webpage rankings called Google Panda. It affected many websites & searches, had teething problems, reportedly boosted social networking sites (of which I generally see very few in my own alerts), and was rolled out globally in April 2011 - just in time to trigger a change in May/June (with continuous changes through 2011).

(We'll probably never know the true reason: Google is notoriously uncommunicative about many of its internal technical decisions and changes.)

Conclusion

So where does this leave us?

Well, the overall linear regressions turned out not to answer the question, but they were still educational in demonstrating the considerable diversity between alerts and the trickiness of understanding exactly what question we were asking; the variability and differences between alerts remind us not to be fooled by randomness and to look for big effects & the big picture - if someone says their alerts seem a little down, they may have been fooled by selective memory, but when they say their alerts went from 20 links an email to 3, then we should avoid unthinking skepticism and look more carefully.

When we investigated the claim directly, we didn't quite find what was claimed: there was no changepoint anywhere in 2012 as the bloggers said - they seem to have been half a year off from when the change occurred in my own alerts. What's going on there? It's hard to say. Google sometimes rolls out changes to users over long periods of time, so perhaps I was hit early by some change drastically reducing links. Or perhaps it simply took time for people to become certain that there were fewer links (in which case I have given them too little credit). Or perhaps separate SEO-related changes hit their searches after mine.

Is Alerts “broken”? Well, it's taken a clear hit: the number of found links is down, and my own impression is that the returned links are not such gems that they make up for their gem-like rarity. And it's certainly not good that the problem is now 2 years old without any discernible improvement.

But on closer inspection, the hit seems to have been a one-time deal, and if my Panda speculation is correct, it does not reflect any neglect or contempt by Google but simply more important factors - Search remaining high-quality will always be a higher priority than Alerts, because Search is the dog that wags the tail. My survival model may yet have the last laugh, and Alerts may outlast its more famous brethren.

I suppose it depends on whether you see the glass as half-full or half-empty: if half-full, then this is good news, because it means that Alerts isn't in as bad shape as it looks and may not soon be following Reader into the great Recycle Bin in the sky; if half-empty, then this is another example of how Google does not communicate with its users, makes changes unilaterally and invisibly, will degrade one service for another more profitable service, and how users are helpless in the face of its technical supremacy (who else can do as good a job of spidering the Internet for new content matching keywords?).

See Also


  1. “What's Wrong With Google Alerts? The small but useful service seems to be dying. One researcher uses empirical research to answer the questions that Google won't.”, BuzzFeed:

    Google has refused to shed light on the decline. Today, a Google spokesperson told BuzzFeed, “we're always working to improve our products - we'll continue making updates to Google Alerts to make it more useful for people.” In other words, a polite non-answer.

    ↩︎
  2. I have seen some alternative services suggested (Yahoo! Search Alerts, Talkwalker & Mention), but have not used them; the latter 2 do well in a comparison with Google Alerts.↩︎