Does Google Alerts return fewer results each year? A statistical investigation
2013-07-01–2013-11-26
finished
certainty: likely
importance: 4
Has Google Alerts been sending fewer results over the past few years? Yes. Responding to rumors of its demise, I investigate the number of results in my personal Google Alerts notifications 2007–2013, and find no overall trend of decline until I look at a transition in mid-2011 where the results fall dramatically. I speculate about the cause and implications for Alerts’s future.
While researching my Google services survival analysis essay on how long Google products survive before being killed (inspired by Reader), I came across speculation that Google Alerts, a service which runs search queries on your behalf & emails you about any new matching webpages (extremely useful for keeping abreast of topics and one of the oldest Google services), had broken badly in 2012. When I saw this, I remembered thinking that my own alerts did not seem to be as useful as they used to be, but I hadn’t been sure whether this was Alerts’s fault or whether my particular keywords were just less active than in the past. Google’s official comments on the topic have been minimal^{1}.
Alerts dying would be a problem for me, as I have used Alerts extensively since 2007-01-28 (2347 days), with 23 current Alerts and many more in the past (of my 501,662 total emails, 3,815 were Alert emails), and there did not seem to be any usable alternatives^{2}. Troublingly, Alerts’s RSS feeds were unavailable between July & September 2013.
As it happened, the survival model suggested that Alerts had a good chance of surviving a long time, and I put it out of mind until I remembered that since I had used Alerts for so many years and had so many emails, I could easily check the claims empirically: did Alerts abruptly stop returning many hits? This is a straightforward question to answer: extract the subject/
Data
As part of my backup procedures, I fetch my Gmail emails daily using getmail4 into a maildir. Alerts uses an unchanging subject line like Subject: Google Alert - "Frank Herbert" mason, so it is easy to find all its emails and separate them out.
# list the maildir files containing the Alerts subject prefix
find ~/mail/ -type f -exec fgrep -l 'Google Alert -' {} \;
# /home/gwern/mail/new/1282125775.M532208P12203Q683Rb91205f53b0fec0d.craft
# /home/gwern/mail/new/1282125789.M55800P12266Q737Rd98db4aa1e58e9ed.craft
# ...
find ~/mail/ -type f -exec fgrep -l 'Google Alert -' {} \; > alerts.txt
mkdir 20130925gwerngooglealertsemails/
mv $(cat alerts.txt) 20130925gwerngooglealertsemails/
I deleted emails from a few alerts which were private; the remaining 72MB of emails are available at 20130925gwerngooglealertsemails.tar.xz. Then a loop & some ad hoc shell-scripting extract the subject line, the date, and how many instances of “http:/
cd 20130925gwerngooglealertsemails/
echo "Search,Date,Links" > alerts.csv # set up the header
for EMAIL in *.craft *.elan; do
SUBJECT="$(egrep '^Subject: Google Alert - ' "$EMAIL" | sed -e 's/Subject: Google Alert - //')"
# "Date: Mon, 28 Jan 2007 ..." -> "28 Jan 2007"
DATE="$(egrep '^Date: ' "$EMAIL" | cut -d ' ' -f 3-5 | sed -e 's/<b>..*/ /')"
# note: --count counts matching lines, not individual links
COUNT="$(fgrep --no-filename --count 'http://' "$EMAIL")"
echo "$SUBJECT,$DATE,$COUNT" >> alerts.csv
done
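The same per-email extraction can also be sketched in R. This is a minimal sketch, not what I ran: parse_alert is a hypothetical helper, the example email is made up, and it assumes each email has been read into a character vector of lines (e.g. with readLines) with Subject and Date headers shaped like the ones above. It also makes one subtlety of the shell version visible: fgrep's count flag counts lines that contain http://, so several links on one line are counted once.

```r
# Hypothetical helper (not part of the original pipeline): extract the
# Search, Date, and link counts from one Alert email given as lines of text.
parse_alert <- function(lines) {
  subject <- grep("^Subject: Google Alert - ", lines, value = TRUE)[1]
  search  <- sub("^Subject: Google Alert - ", "", subject)

  # "Date: Mon, 28 Jan 2007 ..." -> fields 3 to 5, like `cut -d ' ' -f 3-5`
  date_hdr <- grep("^Date: ", lines, value = TRUE)[1]
  fields   <- strsplit(date_hdr, " ", fixed = TRUE)[[1]][3:5]

  # %b month abbreviations are locale-dependent, so parse in the C locale
  old <- Sys.getlocale("LC_TIME")
  invisible(Sys.setlocale("LC_TIME", "C"))
  on.exit(invisible(Sys.setlocale("LC_TIME", old)), add = TRUE)
  date <- as.Date(paste(fields, collapse = " "), format = "%d %b %Y")

  # gregexpr returns -1 for lines without a match
  hits <- gregexpr("http://", lines, fixed = TRUE)
  data.frame(Search = search, Date = date,
             # what `fgrep --count` reports: lines containing http://
             LinesWithLinks = sum(vapply(hits, function(m) m[1] != -1, logical(1))),
             # every occurrence, including several links on one line
             Links = sum(vapply(hits, function(m) sum(m != -1), integer(1))))
}

# Made-up example email
email <- c("Date: Mon, 28 Jan 2007 10:00:00 -0800",
           'Subject: Google Alert - "Frank Herbert" mason',
           '<a href="http://example.com/a">a</a> <a href="http://example.com/b">b</a>',
           "see also http://example.org/c")
parse_alert(email)
```

For this example, the line count is 2 but there are 3 links; for HTML emails that put many links on one line, the gap could be larger.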
The scripting isn’t perfect and I had to delete several spurious lines before I could read it into R and format it into a clean CSV:
# quote="" disables quote parsing, since subject lines contain literal " marks
alerts <- read.csv("alerts.csv", quote="", colClasses=c("character","character","integer"))
alerts$Date <- as.Date(alerts$Date, format="%d %b %Y")
write.csv(alerts, file="20130925gwerngooglealerts.csv", row.names=FALSE)
Analysis
Descriptive
alerts <- read.csv("https://www.gwern.net/docs/personal/20130925gwerngooglealerts.csv",
colClasses=c("factor","Date","integer"))
summary(alerts)
# Search Date Links
# wikipedia : 255 Min. :2007-01-28 Min. : 0.0
# Neon Genesis Evangelion: 247 1st Qu.:2008-12-29 1st Qu.: 10.0
# "Gene Wolfe" : 246 Median :2011-02-07 Median : 22.0
# "Nick Bostrom" : 224 Mean :2010-10-06 Mean : 37.9
# modafinil : 186 3rd Qu.:2012-06-15 3rd Qu.: 44.0
# "Frank Herbert" mason : 184 Max. :2013-09-25 Max. :563.0
# (Other) :2585
# So many unique searches (vs. 23 current Alerts) because I have deleted many topics
# I am no longer interested in, and refined the search criteria of others
length(unique(alerts$Search))
# [1] 68
plot(Links ~ Date, data=alerts)
The first thing I notice is that the number of links per email seems to be going up over time, with a spike in mid-2010. The second is that there’s quite a bit of variation from email to email: while most are around 0, some are as high as 300. The third is a weird early anomaly where emails are recorded as having 0 links; looking at those emails, they are base64-encoded for no apparent reason, and all subsequent emails are in more sensible HTML/Links.
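If those early emails carry the standard MIME header Content-Transfer-Encoding: base64 (an assumption; the check below was not part of my original analysis), they are easy to flag. A minimal sketch with a hypothetical helper and made-up emails, where the base64 line encodes <a href="http://example.com">: the literal string http:// never appears in the raw text, which is why a plain fgrep count comes out as 0.

```r
# Hypothetical check: does an email declare a base64-encoded body?
declares_base64 <- function(lines) {
  any(grepl("^Content-Transfer-Encoding:[[:space:]]*base64", lines,
            ignore.case = TRUE))
}

# Made-up emails; the last line of `encoded` is <a href="http://example.com"> in base64
encoded <- c("Subject: Google Alert - wikipedia",
             "Content-Transfer-Encoding: base64",
             "",
             "PGEgaHJlZj0iaHR0cDovL2V4YW1wbGUuY29tIj4=")
plain   <- c("Subject: Google Alert - wikipedia",
             "Content-Transfer-Encoding: 7bit",
             "",
             '<a href="http://example.com">')

declares_base64(encoded)                      # TRUE
declares_base64(plain)                        # FALSE
any(grepl("http://", encoded, fixed = TRUE))  # FALSE: the link is hidden
```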
Linear model
The spikiness prompts me to adjust for variation in sending rate and whatnot by bucketing emails into months, and to overlay a linear regression:
library(lubridate)
alerts$Date <- floor_date(alerts$Date, "month")
alerts <- aggregate(Links ~ Search + Date, alerts, "sum")
# a simple linear model agrees with a *small* increase over time, but notes that there is a tremendous
# amount of unexplained variation
lm <- lm(Links ~ Date, data=alerts); summary(lm)
# ...
# Residuals:
# Min 1Q Median 3Q Max
# -175.8 -110.0 -60.5 43.3 992.6
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 4.13e+02 1.10e+02 3.76 0.00018
# Date 3.73e-02 7.38e-03 5.06 4.9e-07
#
# Residual standard error: 175 on 1046 degrees of freedom
# Multiple R-squared: 0.0239, Adjusted R-squared: 0.023
# F-statistic: 25.6 on 1 and 1046 DF, p-value: 4.89e-07
plot(Links ~ Date, data=alerts)
abline(lm)
No big difference from the original plot: still a generic increasing trend. Here is a basic problem with this regression: does this increase reflect an increase in the number of alerts I am subscribed to, tweaks to each alert to make each return more hits (shifting from old alerts to new alerts), or an increase in links per unique alert? Only the last is what we are interested in, but any of these or other phenomena could produce an increase.
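The first possibility alone is enough to manufacture an upward trend. A small synthetic illustration (noise-free, made-up numbers, not my Alerts data): two alerts that each return a perfectly constant number of links per month, where the one subscribed to later simply returns more. Pooling them yields a positive slope even though neither alert changes at all.

```r
# Synthetic, noise-free data: two alerts with flat monthly counts
months <- seq(as.Date("2007-01-01"), as.Date("2013-09-01"), by = "month")
switch_date <- as.Date("2010-01-01")
old_alert <- data.frame(Search = "old", Date = months[months <  switch_date], Links = 10)
new_alert <- data.frame(Search = "new", Date = months[months >= switch_date], Links = 50)
toy <- rbind(old_alert, new_alert)

# One regression over everything, as in the pooled model above
pooled_slope <- coef(lm(Links ~ Date, data = toy))[["Date"]]

# A separate regression within each alert
within_slopes <- sapply(split(toy, toy$Search),
                        function(d) coef(lm(Links ~ Date, data = d))[["Date"]])

pooled_slope   # positive: the pooled fit "sees" growth
within_slopes  # zero up to floating-point error, for both alerts
```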
Per alert
We could try treating each alert separately and doing a linear regression on them, and comparing with the linear model on all data indiscriminately:
library(ggplot2)
qplot(Date, Links, color=Search, data=alerts) +
stat_smooth(method="lm", se=FALSE, fullrange=TRUE, size=0.2) +
# the pooled regression from above, drawn as one thick line
geom_abline(intercept=coef(lm)[1], slope=coef(lm)[2], size=1) +
ylim(0,1130) +
theme(legend.position = "none")
The result is chaotic. Individual alerts are pointing every which way. Regressing on every alert together confounds issues, and regressing on individual alerts produces no agreement. We want some intermediate approach which respects that alerts have different behavior, but yields a meaningful overall statement.
Multilevel model
What we want is to look at each unique alert, estimate its increase/
We can do this with a multilevel model, using lme4.
We’ll start by fitting & comparing 2 models:
1. only the intercept varies between each alert, but all alerts increase or decrease at the same rate
2. the intercept varies between each alert, and alerts also differ in their waxing or waning
library(lme4)
mlm1 <- lmer(Links ~ Date + (1|Search), alerts); mlm1
#
# Random effects:
# Groups Name Variance Std.Dev.
# Search (Intercept) 28943 170
# Residual 12395 111
# Number of obs: 1048, groups: Search, 68
#
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) 427.93512 139.76994 3.06
# Date 0.01984 0.00928 2.14
#
# Correlation of Fixed Effects:
# (Intr)
# Date 0.988
mlm2 <- lmer(Links ~ Date + (1+Date|Search), alerts); mlm2
#
# Random effects:
# Groups Name Variance Std.Dev. Corr
# Search (Intercept) 6.40e+06 2529.718
# Date 2.78e-02 0.167 0.998
# Residual 8.36e+03 91.446
# Number of obs: 1048, groups: Search, 68
#
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) 295.5469 420.0588 0.70
# Date 0.0090 0.0278 0.32
#
# Correlation of Fixed Effects:
# (Intr)
# Date 0.998
# compare the models: does model 2 buy us anything?
anova(mlm1, mlm2)
#
# mlm1: Links ~ Date + (1 | Search)
# mlm2: Links ~ Date + (1 + Date | Search)
# Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
# mlm1 4 13070 13089 -6531 13062
# mlm2 6 12771 12801 -6379 12759 303 2 <2e-16
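The columns of that table are tied together by simple arithmetic, which makes them easy to sanity-check. A sketch using the rounded numbers printed above: AIC is the deviance plus twice the parameter count, and the chi-squared column is the drop in deviance, referred to a chi-squared distribution with as many degrees of freedom as there are extra parameters. One hedge: model 2's extra parameters include a variance whose null value of zero lies on the boundary of its range, which makes the plain chi-squared reference conservative. With a statistic of 303, that does not change the conclusion.

```r
# Rounded deviances and parameter counts from the anova() table above
dev  <- c(mlm1 = 13062, mlm2 = 12759)
npar <- c(mlm1 = 4, mlm2 = 6)

# AIC = deviance + 2 * (number of parameters)
aic <- dev + 2 * npar
aic   # 13070 and 12771, as printed

# Likelihood-ratio test: drop in deviance vs. chi-squared with 6 - 4 = 2 df
chisq <- dev[["mlm1"]] - dev[["mlm2"]]   # 303
p <- pchisq(chisq, df = npar[["mlm2"]] - npar[["mlm1"]], lower.tail = FALSE)
p     # far below the 2e-16 that R prints as a floor
```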
Model 2 is better on both simplicity/
coef(mlm2)
# $Search
# (Intercept) Date
# 763.48 0.046763
# adult iodine supplementation (IQ OR intelligence OR cognitive) 718.80 0.043157
# AMD pacifica virtualization 836.63 0.062123
# (anime OR manga) (half-Japanese OR hafu OR half-American) 956.52 0.059438
# caloric restriction 2023.10 0.153667
# "Death Note" (script OR live-action OR Parlapanides) 866.63 0.051314
# "dual n-back" 4212.59 0.266879
# dual n-back 1213.85 0.073265
# electric sheep screensaver 745.78 0.055937
# "Frank Herbert" 93.28 0.013636
# "Frank Herbert" mason 10815.19 0.676188
# freenet project 1154.14 0.087199
# "Gene Wolfe" 496.01 0.026575
# Gene Wolfe 1178.36 0.072681
# ...
# wikileaks 3080.74 0.227583
# WikiLeaks 388.34 0.031441
# wikipedia 1668.94 0.133976
# Xen 390.01 0.017209
Because R stores a Date as a count of days, the Date slopes are in links per day: Xen’s slope of -0.017 means it loses roughly 6 links a year (0.0172 × 365 ≈ 6.3).
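Because the regression is on an R Date, the units are worth checking directly. A small sketch with made-up data shows that one unit of Date is one day, so a fitted slope is in links per day. per_day_to_per_year is a hypothetical helper, and only the slope magnitudes from the coef(mlm2) listing above are converted.

```r
# R stores a Date as the number of days since 1970-01-01
as.numeric(as.Date("1970-01-02"))   # 1

# So a regression on Date gives a slope in links per *day*
d <- data.frame(Date = as.Date("2011-01-01") + 0:9)
d$Links <- 2 * as.numeric(d$Date - as.Date("2011-01-01"))  # +2 links each day
coef(lm(Links ~ Date, data = d))[["Date"]]                   # 2

# Converting slope magnitudes from the coef(mlm2) listing to links per year
per_day_to_per_year <- function(slope) slope * 365.25
per_day_to_per_year(0.017209)   # Xen: about 6.3 links per year
per_day_to_per_year(0.6762)     # largest slope: about 247 links per year
```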
max(abs(coef(mlm2)$Search$Date))
# [1] 0.6762
This maximum comes from the "Frank Herbert" mason search, probably reflecting how relatively new the search is or the effectiveness of the filter I added to the original "Frank Herbert" search. In general, the slopes are very similar: there seem to be as many positive slopes as negative ones, and the overall summary slope in the second model is tiny and negative (-0.01); but most of the searches’ slopes exclude zero in the caterpillar plot:
This says to me that there is no large change over time happening within each alert, as the original claims went, but there does seem to be something going on. When we plot the overall regression and the per-alert regressions, we see:
fixParam <- fixef(mlm2)
ranParam <- ranef(mlm2)$Search
# each alert's line = overall (fixed) coefficient + that alert's random deviation
params <- cbind(ranParam[1]+fixParam[1], ranParam[2]+fixParam[2])
params$Search <- rownames(params)
p <- qplot(Date, Links, color=Search, data=alerts)
p +
geom_abline(aes(intercept=`(Intercept)`, slope=Date, color=Search), data=params, size=0.2) +
geom_abline(intercept=fixParam[1], slope=fixParam[2], size=1) +
ylim(0,1130) +
theme(legend.position = "none")
This clearly makes more sense than regressing each alert separately: we avoid crazily steep slopes when there are just a few emails to use, because those alerts’ regressions get shrunk toward the overall regression. We also see no evidence for any large or statistically-significant change over time for alerts in general: some alerts increase over time but others decrease, and there is only a small overall decrease, which we might blame on internal Google problems.
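The shrinkage is easiest to see in the simplest case: a random-intercept model like mlm1, with the variances treated as known. A sketch of the textbook partial-pooling formula: an alert’s own average is pulled toward the overall mean by a weight that grows with how many months of data it has. The two variance components are mlm1’s from above. The alert average and overall mean are hypothetical. mlm2’s random slopes are shrunk by the same logic, using a matrix version of the formula that this sketch does not implement.

```r
# Partial pooling for a random-intercept model with known variances:
# estimate_j = mu + B_j * (ybar_j - mu), with B_j = tau2 / (tau2 + sigma2 / n_j)
shrink <- function(ybar, n, mu, tau2, sigma2) {
  B <- tau2 / (tau2 + sigma2 / n)   # weight on the alert's own data, in (0, 1)
  mu + B * (ybar - mu)
}

tau2   <- 28943   # between-alert intercept variance, from mlm1
sigma2 <- 12395   # residual variance, from mlm1

# A hypothetical alert averaging 300 links/month, with an overall mean of 150:
shrink(ybar = 300, n = 1,  mu = 150, tau2 = tau2, sigma2 = sigma2)  # about 255
shrink(ybar = 300, n = 50, mu = 150, tau2 = tau2, sigma2 = sigma2)  # about 299
```

With one month of data, the estimate moves 30% of the way back toward the overall mean. With 50 months, it barely moves.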
What about the fall?
Having done all this, I thought I was finished until I remembered that the original bloggers didn’t complain about a steady deterioration over time, but an abrupt one starting somewhere in 2012. What happens when I do a binary split and compare 2010/
alertsRecent <- alerts[year(alerts$Date)>=2010,]
alertsRecent$Recent <- year(alertsRecent$Date) >= 2012
wilcox.test(Links ~ Recent, conf.int=TRUE, data=alertsRecent)
#
# Wilcoxon rank sum test with continuity correction
#
# data: Links by Recent
# W = 71113, p-value = 6.999e-10
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
# 34 75
# sample estimates:
# difference in location
# 53
I avoided a normality-based test like t.test and instead used the Wilcoxon test, because there’s no reason to expect the number of links per month to follow a normal distribution. Regardless of the details, there’s a big difference between the two time periods: 219 vs 140 links! A fall of 36% is certainly a serious decline, and it cannot be explained away by my Alerts settings (I always use “All results” and never “Only the best results”) nor, as we’ll see now, by a confound like the possibilities that motivated using a multilevel model:
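Two numbers here are easy to conflate. The 36% is simple arithmetic on the quoted 219 and 140. The Wilcoxon “difference in location” printed above (53) is not simply the gap between the two periods’ typical values. R’s documentation for wilcox.test explains that it estimates the median of the difference between a sample from each group (the Hodges-Lehmann estimator), not the difference in medians. A sketch with made-up numbers: in the exact small-sample case, the estimate is literally the median of all pairwise differences. With ties or larger samples, as in my data, R finds it numerically from the normal approximation instead.

```r
# The quoted before/after figures
decline <- (219 - 140) / 219
round(100 * decline)   # 36 (%)

# Made-up samples: no ties and small n, so wilcox.test uses its exact branch
x <- c(5, 9, 12, 20)   # "before"
y <- c(1, 2, 4, 15)    # "after"
hl <- wilcox.test(x, y, conf.int = TRUE)$estimate

hl                          # 6: the Hodges-Lehmann estimate
median(outer(x, y, "-"))    # 6: median of all pairwise differences x_i - y_j
median(x) - median(y)       # 7.5: the difference in medians is another quantity
```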
alerts$Recent <- year(alerts$Date) >= 2012
mlm3 <- lmer(Links ~ Date + Recent + (1+Date|Search), alerts); mlm3
#
# Random effects:
# Groups Name Variance Std.Dev. Corr
# Search (Intercept) 9.22e+03 9.60e+01
# Date 9.52e-05 9.75e-03 0.164
# Residual 1.18e+04 1.09e+02
# Number of obs: 1048, groups: Search, 68
#
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) 440.1540 175.3630 2.51
# Date 0.0413 0.0121 3.42
# RecentTRUE 102.2273 13.3224 7.67
#
# Correlation of Fixed Effects:
# (Intr) Date
# Date 0.993
# RecentTRUE 0.630 0.647
anova(mlm1, mlm2, mlm3)
# Models:
# mlm1: Links ~ Date + (1 | Search)
# mlm2: Links ~ Date + (1 + Date | Search)
# mlm3: Links ~ Date + Recent + (1 + Date | Search)
# Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
# mlm1 4 13070 13089 -6531 13062
# mlm2 6 12771 12801 -6379 12759 303 2 <2e-16
# mlm3 7 13015 13050 -6500 13001 0 1 1
It was mid-2011
A new model treating pre-2012 as different turns up with a better fit than mlm1 (though not, by AIC, better than mlm2). Can we do better? A changepoint fingers May/
library(changepoint)
plot(cpt.meanvar(alertsRecent$Links), ylab="Links")
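cpt.meanvar looks for a change in both mean and variance; as I understand the package, by default it searches for at most one change. The core idea can be sketched in base R for the simpler case of a single change in the mean only: try every split point and keep the one that minimises the within-segment sum of squared errors. single_mean_changepoint is a hypothetical helper and the counts are made up. This is not the changepoint package’s algorithm, which also penalises the fit and models the variance.

```r
# Single changepoint in the mean by exhaustive search: pick the split that
# minimises the total within-segment sum of squared errors.
single_mean_changepoint <- function(x, minseg = 2) {
  n <- length(x)
  stopifnot(n >= 2 * minseg)
  candidates <- minseg:(n - minseg)   # last index of the first segment
  sse <- vapply(candidates, function(k) {
    left  <- x[1:k]
    right <- x[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  candidates[which.min(sse)]
}

# Made-up monthly counts: a level around 220 that drops to around 140
links <- c(210, 230, 215, 225, 220, 140, 135, 150, 138, 142)
single_mean_changepoint(links)   # 5: the last month before the drop
```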
With this new changepoint, the test is more significant
alertsRecent <- alerts[year(alerts$Date)>=2010,]
alertsRecent$Recent <- alertsRecent$Date > as.Date("2011-05-01")
wilcox.test(Links ~ Recent, conf.int=TRUE, data=alertsRecent)
#
# Wilcoxon rank sum test with continuity correction
#
# data: Links by Recent
# W = 63480, p-value = 4.61e-12
# alternative hypothesis: true location shift is not equal to 0
# 95% confidence interval:
# 62 112
# sample estimates:
# difference in location
# 87
And the fit improves by a large amount:
alerts$Recent <- alerts$Date > as.Date("2011-05-01")
mlm4 <- lmer(Links ~ Date + Recent + (1+Date|Search), alerts); mlm4
#
# Random effects:
# Groups Name Variance Std.Dev. Corr
# Search (Intercept) 8.64e+03 9.30e+01
# Date 9.28e-05 9.63e-03 0.172
# Residual 1.11e+04 1.05e+02
# Number of obs: 1048, groups: Search, 68
#
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) 1.11e+03 1.87e+02 5.91
# Date 8.86e-02 1.30e-02 6.83
# RecentTRUE -1.65e+02 1.44e+01 -11.43
#
# Correlation of Fixed Effects:
# (Intr) Date
# Date 0.994
# RecentTRUE 0.709 0.725
anova(mlm1, mlm2, mlm3, mlm4)
# Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
# mlm1 4 13070 13089 -6531 13062
# mlm2 6 12771 12801 -6379 12759 302.7 2 <2e-16
# mlm3 7 13015 13050 -6500 13001 0.0 1 1
# mlm4 7 12948 12983 -6467 12934 66.8 0 <2e-16
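The last row of that table deserves a caveat. mlm3 and mlm4 have the same number of parameters, and neither is a special case of the other, so a chi-squared test on 0 degrees of freedom is not a meaningful likelihood-ratio test. For non-nested models, the AIC difference is the more natural comparison. One common reading of it is the relative likelihood exp((AIC_min - AIC_i)/2). A sketch using the rounded AICs printed above:

```r
# Rounded AIC values from the anova(mlm1, mlm2, mlm3, mlm4) table above
aic <- c(mlm3 = 13015, mlm4 = 12948)

# Relative likelihood of each model versus the best (lowest-AIC) one
rel_lik <- exp((min(aic) - aic) / 2)
rel_lik   # mlm4 = 1; mlm3 is about 3e-15 as plausible

# Both models have 7 parameters, so the AIC gap equals the deviance gap
# (13001 - 12934 = 67, up to rounding)
diff(rev(aic))   # 67
```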
Robustness
Is the fall robust against different samples of my data, using bootstrapping? The answer is yes, and the Wilcoxon test even turns out to have given us a pretty good confidence interval earlier:
library(boot)
recentEstimate <- function(dt, indices) {
d <- dt[indices,] # allows boot to select subsample
mlm4 <- lmer(Links ~ Date + Recent + (1+Date|Search), d)
return(fixef(mlm4)[3])
}
bs <- boot(data=alerts, statistic=recentEstimate, R=10000, parallel="multicore", ncpus=4); bs
# ...
# Bootstrap Statistics :
# original bias std. error
# t1* -164.8 34.06 17.44
boot.ci(bs)
# ...
# Intervals :
# Level Normal Basic
# 95% (-233.0, -164.7 ) (-228.2, -156.7 )
#
# Level Percentile BCa
# 95% (-172.9, -101.4 ) (-211.8, -156.7 )
A confidence interval of (-159,-95) is both statistically-significant in this context and an effect size to be reckoned with. It seems this mid-2011 fall is real. I’m surprised to find such a precise, localized drop in my Alerts quantities. I did expect to find a decline, but I expected it to be a gradual, incremental process as Google’s search algorithms excluded more and more links. I didn’t expect to be able to say something like “in this month, results dropped by more than a third”.
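The indices argument in recentEstimate is how boot does its resampling. On each replicate, it passes a fresh vector of row numbers drawn with replacement, and the statistic is recomputed on those rows. The Percentile line of boot.ci’s output is then essentially the empirical quantiles of the replicates. Below is a base-R sketch of that percentile interval. It assumes a much cheaper hypothetical statistic (a difference in means between periods, instead of the lme4 coefficient) and uses synthetic Poisson counts rather than my data.

```r
# Percentile bootstrap by hand: resample rows with replacement,
# recompute the statistic, and take empirical quantiles.
bootstrap_percentile <- function(data, statistic, R = 2000, level = 0.95) {
  n <- nrow(data)
  reps <- replicate(R, statistic(data, sample.int(n, n, replace = TRUE)))
  alpha <- 1 - level
  quantile(reps, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Hypothetical statistic with boot()'s (data, indices) signature:
# mean Links in the recent period minus mean Links before it
diff_in_means <- function(d, indices) {
  d <- d[indices, ]
  mean(d$Links[d$Recent]) - mean(d$Links[!d$Recent])
}

set.seed(2013)
toy <- data.frame(Recent = rep(c(FALSE, TRUE), each = 40),
                  Links  = c(rpois(40, 220), rpois(40, 140)))
ci <- bootstrap_percentile(toy, diff_in_means)
ci   # both ends negative: a fall, as in the toy data's construction
```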
Panda?
I don’t know of any changes announced to Google Alerts in May/
There is one culprit that comes to mind for what may have changed in early 2011 and would then have led to a fall in collated links (a fall which would accumulate to statistical significance in June 2011): the pervasive change to webpage rankings called Google Panda. It affected many websites & searches, had teething problems, reportedly boosted social networking sites (which I generally see very few of in my own alerts), and was rolled out globally in April 2011, just in time to trigger a change in May/
(We’ll probably never know the true reason: Google is notoriously uncommunicative about many of its internal technical decisions and changes.)
Conclusion
So where does this leave us?
Well, the overall linear regressions turned out not to answer the question, but they were still educational in demonstrating the considerable diversity between alerts and the trickiness of understanding exactly what question we were asking. The variability and differences between alerts remind us not to be fooled by randomness and to look for big effects & the big picture: if someone says their alerts seem a little down, they may have been fooled by selective memory, but when they say their alerts went from 20 links an email to 3, then we should avoid unthinking skepticism and look more carefully.
When we investigated the claim directly, we didn’t quite find what was claimed: there was no changepoint anywhere in 2012, as bloggers had reported; they seem to have been half a year off from when the change occurred in my own alerts. What’s going on there? It’s hard to say. Google sometimes rolls out changes to users over long periods of time, so perhaps I was hit early by changes that drastically reduced links. Or perhaps it simply took time for people to become certain that there were fewer links (in which case I have given them too little credit). Or perhaps separate SEO-related changes hit their searches after mine.
Is Alerts “broken”? Well, it’s taken a clear hit: the number of links found is down, and my own impression is that the returned links are not such gems that they make up for their gem-like rarity. And it’s certainly not good that the problem is now 2 years old without any discernible improvement.
But on closer inspection, the hit seems to have been a one-time deal, and if my Panda speculation is correct, it reflects not any neglect or contempt by Google but simply more important factors: Search remaining high-quality will always be a higher priority than Alerts, because Search is the dog that wags the tail. My survival model may yet have the last laugh, with Alerts outlasting its more famous brethren.
I suppose it depends on whether you see the glass as half-full or half-empty. If half-full, then this is good news, because it means that Alerts isn’t in as bad shape as it looks and may not soon be following Reader into the great Recycle Bin in the sky. If half-empty, then this is another example of how Google does not communicate with its users, makes changes unilaterally and invisibly, and will degrade one service for another more profitable service, and of how users are helpless in the face of its technical supremacy (who else can do as good a job of spidering the Internet for new content matching keywords?).

Google has refused to shed light on the decline. Today, a Google spokesperson told BuzzFeed, “we’re always working to improve our products - we’ll continue making updates to Google Alerts to make it more useful for people.” In other words, a polite non-answer.
I have seen some alternative services suggested (Yahoo! Search Alerts, talkwalker & Mention) but have not used them; the latter 2 do well in a comparison with Google Alerts.