Google directory

Links

“Internet Search Tips”, Branwen 2018

Search: “Internet Search Tips”, Gwern Branwen (2018-12-11):

A description of advanced tips and tricks for effective Internet research of papers/​books, with real-world examples.

Over time, I developed a certain google-fu and expertise in finding references, papers, and books online. Some of these tricks are not well-known, like checking the Internet Archive (IA) for books. I try to write down my search workflow, and give general advice about finding and hosting documents, with demonstration case studies⁠.

“Banner Ads Considered Harmful”, Branwen 2017

Ads: “Banner Ads Considered Harmful”, Gwern Branwen (2017-01-08):

9 months of daily A/​B-testing of Google AdSense banner ads on Gwern.net indicates banner ads decrease total traffic substantially, possibly due to spillover effects in reader engagement and resharing.

One source of complexity & JavaScript use on Gwern.net is the use of Google AdSense advertising to insert banner ads. In considering design & usability improvements, removing the banner ads comes up every time as a possibility, as readers do not like ads, but such removal comes at a revenue loss and it’s unclear whether the benefit outweighs the cost, suggesting I run an A/​B experiment. However, ads might be expected to have broader effects on traffic than individual page reading times/​bounce rates, affecting total site traffic instead through long-term effects on or spillover mechanisms between readers (eg. social media behavior), rendering the usual A/​B testing method of per-page-load/​session randomization incorrect; instead it would be better to analyze total traffic as a time-series experiment.

Design: A decision analysis of revenue vs readers yields a maximum acceptable total traffic loss of ~3%. Power analysis of historical Gwern.net traffic data demonstrates that the high autocorrelation yields low statistical power with standard tests & regressions but acceptable power with ARIMA models. I design a long-term Bayesian ARIMA(4,0,1) time-series model in which an A/B-test running January–October 2017 in randomized paired 2-day blocks of ads/no-ads uses client-local JS to determine whether to load & display ads, with total traffic data collected in Google Analytics & ad exposure data in Google AdSense. The A/B test ran from 2017-01-01 to 2017-10-15, affecting 288 days with collectively 380,140 pageviews in 251,164 sessions.
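As a rough illustration of what "randomized paired 2-day blocks" could look like, the sketch below builds a 288-day schedule in which each 4-day pair contains one ads block and one no-ads block in random order. This is not the site's actual client-side JS; the function name `paired_block_schedule`, the seed, and the fair coin flip are assumptions made for the example.

```python
import random


def paired_block_schedule(n_days, block_len=2, seed=0):
    """Return one boolean per day (True = ads shown).

    Hypothetical helper: days are grouped into pairs of blocks, and a
    coin flip decides which block of each pair gets ads, so every pair
    contains exactly one ads block and one no-ads block.
    """
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = []
    while len(schedule) < n_days:
        ads_first = rng.random() < 0.5
        for arm in (ads_first, not ads_first):
            schedule.extend([arm] * block_len)
    return schedule[:n_days]


schedule = paired_block_schedule(288)
print(sum(schedule), "ads days out of", len(schedule))
```

Pairing the blocks keeps the two arms balanced over any stretch of a few days, which matters when traffic is strongly autocorrelated.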

Correcting for a flaw in the randomization, the final results yield a surprisingly large estimate of an expected traffic loss of −9.7% (driven by the subset of users without adblock), with an implied −14% traffic loss if all traffic were exposed to ads (95% credible interval: −13% to −16%), exceeding my decision threshold for disabling ads & strongly ruling out the possibility of acceptably small losses which might justify further experimentation.

Thus, banner ads on Gwern.net appear to be harmful and AdSense has been removed. If these results generalize to other blogs and personal websites, an important implication is that many websites may be harmed by their use of banner ad advertising without realizing it.

“World Catnip Surveys”, Branwen 2015

Catnip-survey: “World Catnip Surveys”, Gwern Branwen (2015-11-15):

International population online surveys of cat owners about catnip and other cat stimulant use.

In compiling a meta-analysis of reports of catnip response rates in domestic cats, yielding a meta-analytic average of ~2⁄3, the available data suggests heterogeneity from cross-country differences in rates (possibly for genetic reasons) but is insufficient to definitively demonstrate the existence of or estimate those differences (particularly a possible extremely high catnip response rate in Japan). I use Google Surveys August–September 2017 to conduct a brief 1-question online survey of a proportional population sample of 9 countries about cat ownership & catnip use, specifically: Canada, the USA, UK, Japan, Germany, Brazil, Spain, Australia, & Mexico. In total, I surveyed n = 31,471 people, of whom n = 9,087 are cat owners, of whom n = 4,402 report having used catnip on their cat, and of whom n = 2,996 report a catnip response.
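The nested counts above imply a chain of conditional proportions, which a few lines of arithmetic make explicit. The sketch pools all 9 countries without reweighting (an assumption of this example, not necessarily how the survey analysis treats them), and the variable names are illustrative.

```python
# Counts reported in the survey summary (pooled across all 9 countries).
surveyed = 31_471
cat_owners = 9_087
used_catnip = 4_402
responders = 2_996

# Each proportion is conditional on the previous stage of the funnel.
p_owner = cat_owners / surveyed
p_used_given_owner = used_catnip / cat_owners
p_response_given_used = responders / used_catnip

print(f"P(cat owner)                = {p_owner:.1%}")
print(f"P(used catnip | owner)      = {p_used_given_owner:.1%}")
print(f"P(response | used catnip)   = {p_response_given_used:.1%}")
```

The pooled response proportion comes out at roughly 68%, close to the meta-analytic average of ~2⁄3, even though the per-country rates differ widely.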

The survey yields catnip response rates of Canada (82%), USA (79%), UK (74%), Japan (71%), Germany (57%), Brazil (56%), Spain (54%), Australia (53%), and Mexico (52%). The differences are substantial and of high posterior probability, supporting the existence of large cross-country differences. In additional analysis, the other conditional probabilities of cat ownership and trying catnip with a cat appear to correlate with catnip response rates; this intercorrelation suggests a “cat factor” of some sort influencing responses, although what causal relationship there might be between proportion of cat owners and proportion of catnip-responder cats is unclear.

An additional survey of a convenience sample of primarily US Internet users about catnip is reported, although the improbable catnip response rates compared to the population survey suggest the respondents are either highly unrepresentative or the questions caused demand bias.

“Alerts Over Time”, Branwen 2013

Google-Alerts: “Alerts Over Time”, Gwern Branwen (2013-07-01):

Does Google Alerts return fewer results each year? A statistical investigation

Has Google Alerts been sending fewer results the past few years? Yes. Responding to rumors of its demise, I investigate the number of results in my personal Google Alerts notifications 2007–2013, and find no overall trend of decline until I look at a transition in mid-2011 where the results fall dramatically. I speculate about the cause and implications for Alerts’s future.

“Cultural Drift: Cleaning Methods”, Branwen 2013

Sand: “Cultural drift: cleaning methods”, Gwern Branwen (2013-05-07):

Forgotten chores and their use by Romanticism

Some old books mention sandy floors and sprinkling water on the ground; these asides seem to go unnoticed by most/​all readers. I highlight them, explain and discuss their use as now-obsolete cleaning practices, poll Internet users to see how forgotten they are, and ponder implications. In an appendix, I discuss a similar issue I encountered in pre-Space-Race American science fiction.

“Predicting Google Closures”, Branwen 2013

Google-shutdowns: “Predicting Google closures”, Gwern Branwen (2013-03-28):

Analyzing predictors of Google abandoning products; predicting future shutdowns

Prompted by the shutdown of Google Reader, I ponder the evanescence of online services and wonder what is the risk of them disappearing. I collect data on 350 Google products launched before March 2013, looking for variables predictive of mortality (web hits, service vs software, commercial vs free, FLOSS, social networking, and internal vs acquired). Shutdowns are unevenly distributed over the calendar year or Google’s history. I use logistic regression & survival analysis (which can deal with right-censoring) to model the risk of shutdown over time and examine correlates. The logistic regression indicates socialness, acquisitions, and lack of web hits predict being shut down, but the results may not be right. The survival analysis finds a median lifespan of 2824 days with a roughly Type III survival curve (high early-life mortality); a Cox regression finds similar results as the logistic: socialness, free, acquisition, and long life predict lower mortality. Using the best model, I make predictions about probability of shutdown of the most risky and least risky services in the next 5 years (up to March 2018). (All data & R source code is provided.)
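To show how survival analysis handles right-censored products (those still alive when the data was collected), here is a minimal Kaplan-Meier sketch in plain Python. It is not the R code that accompanies the essay; the function names and the four-product toy dataset are invented for illustration.

```python
from itertools import groupby


def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each observed shutdown time.

    observed[i] is True if product i was shut down at durations[i], and
    False if it was still alive then (right-censored).
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    for t, group in groupby(pairs, key=lambda p: p[0]):
        group = list(group)
        deaths = sum(1 for _, event in group if event)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Censored products leave the risk set without counting as deaths.
        at_risk -= len(group)
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical lifespans in days; the second product is still running.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
print(curve)                   # [(1, 0.75), (3, 0.375), (4, 0.0)]
print(median_survival(curve))  # 3
```

Dropping the censored product instead of keeping it in the risk set until day 2 would bias the estimated lifespan downward, which is why a plain average of observed lifespans is not enough here.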

“A/B Testing Long-form Readability on Gwern.net”, Branwen 2012

AB-testing: “A/B testing long-form readability on Gwern.net”, Gwern Branwen (2012-06-16):

A log of experiments done on the site design, intended to render pages more readable, focusing on the challenge of testing a static site, page width, fonts, plugins, and effects of advertising.

To gain some statistical & web development experience and to improve my readers’ experiences, I have been running a series of CSS A/​B tests since June 2012. As expected, most do not show any meaningful difference.

“The Neural Net Tank Urban Legend”, Branwen 2011

Tanks: “The Neural Net Tank Urban Legend”, Gwern Branwen (2011-09-20):

AI folklore tells a story about a neural network trained to detect tanks which instead learned to detect time of day; investigating, this probably never happened.

A cautionary tale in artificial intelligence tells about researchers training a neural network (NN) to detect tanks in photographs, succeeding, only to realize the photographs had been collected under specific conditions for tanks/non-tanks and the NN had learned something useless like time of day. This story is often told to warn about the limits of algorithms and importance of data collection to avoid “dataset bias”/“data leakage” where the collected data can be solved using algorithms that do not generalize to the true data distribution, but the tank story is usually never sourced.

I collate many extant versions dating back a quarter of a century to 1992 along with two NN-related anecdotes from the 1960s; their contradictions & details indicate a classic “urban legend”, with a probable origin in a speculative question in the 1960s by Edward Fredkin at an AI conference about some early NN research, which was then classified & never followed up on.

I suggest that dataset bias is real but exaggerated by the tank story, giving a misleading indication of risks from deep learning and that it would be better to not repeat it but use real examples of dataset bias and focus on larger-scale risks like AI systems optimizing for wrong utility functions.

“Gwern.net Website Traffic”, Branwen 2011

Traffic: “Gwern.net Website Traffic”, Gwern Branwen (2011-02-03):

Meta page describing Gwern.net editing activity, traffic statistics, and referrer details, primarily sourced from Google Analytics (2011-present).

On a semi-annual basis, since 2011, I review Gwern.net website traffic using Google Analytics; although what most readers value is not what I value, I find it motivating to see total traffic statistics reminding me of readers (writing can be a lonely and abstract endeavour), and useful to see what are major referrers.

Gwern.net typically enjoys steady traffic in the 50–100k range per month, with occasional spikes from social media, particularly Hacker News; over the first decade (2010–2020), there were 7.98m pageviews by 3.8m unique users.

“About This Website”, Branwen 2010

About: “About This Website”, Gwern Branwen (2010-10-01):

Meta page describing Gwern.net site ideals of stable long-term essays which improve over time; idea sources and writing methodology; metadata definitions; site statistics; copyright license.

This page is about Gwern.net content; for the details of its implementation & design like the popup paradigm, see Design⁠; and for information about me, see Links⁠.

Miscellaneous