# In Defense of Inclusionism

Iron Law of Bureaucracy: the downwards deletionism spiral discourages contribution and is how Wikipedia will die.
2009-01-15–2018-11-28

English Wikipedia is in decline. As a long-time editor & former admin, I was deeply dismayed by the process. Here, I discuss UI principles, changes in Wikipedian culture, and the large-scale statistical evidence of decline; run small-scale experiments demonstrating the harm; and conclude with parting thoughts.

The fun­da­men­tal cause of the decline is the Eng­lish Wikipedi­a’s increas­ingly nar­row atti­tude as to what are accept­able top­ics and to what depth those top­ics can be explored, com­bined with a nar­rowed atti­tude as to what are accept­able sources, where aca­d­e­mic & media cov­er­age trumps any con­sid­er­a­tion of other fac­tors.

I started as an anon, mak­ing occa­sional small edits after I learned of WP from in 2004. I hap­pened to be a con­trib­u­tor to at the time, and when one of my more ency­clo­pe­dic arti­cles was reject­ed, I decided it might as well go on Wikipedia, so I reg­is­tered an account in 2005 and slowly got more seri­ous about edit­ing as I became more com­fort­able with WP and excited about its poten­tial. Before I wound down my edit­ing activ­i­ty, dis­mayed by the cul­tural changes, I had done . And old Wikipedia was excit­ing.

You can see this stark dif­fer­ence between old Wikipedia and mod­ern Wikipedia: in the early days you could have things like arti­cles on each chap­ter of Atlas Shrugged or each Poke­mon. Even if you per­son­ally did not like Objec­tivism or Poke­mon, you knew that you could go into just as much detail about the top­ics you liked best—Wikipedia was not paper! We talked ide­al­is­ti­cally about how Wikipedia could become an ency­clo­pe­dia of spe­cial­ist ency­clo­pe­di­as, the super­set of ency­clo­pe­dias. “would you expect to see a Bul­basaur arti­cle in a Poke­mon ency­clo­pe­dia? yes? then let’s have a Bul­basaur arti­cle”. The poten­tial was that Wikipedia would be the sum­mary of the Inter­net and books/media. Instead of punch­ing in a key­word to a search engine and get­ting 100 pages deal­ing with tiny frag­ments of the topic (in how­ever much detail), you would get a coher­ent overview sum­ma­riz­ing every­thing worth know­ing about the top­ic, for almost all top­ics.

But now Wikipedi­a’s nar­row­ing focus means, only some of what is worth know­ing, about some top­ics. Respectable top­ics. Main­stream top­ics. Unim­peach­ably Ency­clo­pe­dic top­ics.

These days, that ideal is completely gone. If you try to write niche articles on certain topics, people will tell you to save it for Wikia. I am not excited or interested in such a parochial project which excludes so many of my interests, which does not want me to go into great depth about even the interests it deems meritorious. A great many other people are not excited either, especially as they begin to realize that even if you navigate the culture correctly and get your material into Wikipedia, there is far from any guarantee that your contributions will be respected, not deleted, and improved. As for the amateurs and experts who wrote Wikipedia, why would they want to contribute to some place that doesn’t want them?

The Wikimedia Foundation (WMF) seems unable to address this issue. I read their plans and projections, and I predicted well in advance that they would totally fail, as they have. Their ‘solutions’ were band-aids which didn’t get at what I or others were diagnosing as the underlying problems. The “barriers to entry” like the complex markup are not the true issue. They are problems, certainly, but not the core problem: if they were resolved, Wikipedia’s decline would continue. WMF seems to think that a little more lipstick on the pig will fix everything. Barriers to entry are a problem for non-technical new users, yes, but they do not explain why technical new users are also not appearing. Where are all the young programmers? They can easily learn the markup and handle the other barriers; if those barriers were the only barriers, Wikipedia should be having no problems. Plenty of potential editors in that sea. But if you go to programmer hangouts like Hacker News, you’re not going to find everyone going “I don’t know what people are complaining about, editing Wikipedia works just great for me!”, because they’re quite as embittered and jaded as other groups.

What is to be done? Hard to say. Wikipedia has already exiled hun­dreds of sub­jec­t-area com­mu­ni­ties to Wikia, and I’d say the nar­row­ing began in 2007, so there’s been a good 6 years of iner­tia and time for the rot to set in. And I haven’t thought much about it because too many peo­ple deny that there is any prob­lem, and when they admit there is a prob­lem, they focus on triv­ial issues like the Medi­aWiki markup. Noth­ing I can do about it, any­way. Once the prob­lem has been diag­nosed, time to move on to other activ­i­ties.

Wikipedia will still exist. The cor­pus is too huge and valu­able to rot eas­i­ly. A sys­tem can decline with­out dying. MySpace still exists, and there is no rea­son Wikipedia can­not be MySpace—use­ful for some pur­pos­es, a shell of its for­mer glo­ry, a major break­through in its time, but fun­da­men­tally bypassed by other sources of infor­ma­tion. I don’t know what the Face­book to Wikipedi­a’s MySpace is, but the Inter­net sur­vived for decades with­out Wikipedia, we’ll get along with­out a live Wikipedia. Even though it is a huge loss of poten­tial.

# Friction

A peren­nial lure of tech­nol­ogy is its promise to let us do things that we could­n’t do before, and in ways we would­n’t before.

An example here would be Wikipedia and wikis in general: lowering the ‘cost’ of changing a page, and using software that makes undoing most vandalism far easier than doing it, sends participation through the roof. It’s not the technology itself that really matters, but how easy and comfortable it is to contribute. Hill has been investigating why Wikipedia, out of 8 comparable attempts to write an online encyclopedia, succeeded; his conclusion seems to be that Wikipedia succeeded by focusing on developing content and making contribution easy. “The contribution conundrum: Why did Wikipedia succeed while other encyclopedias failed?”:

One answer, which seems obvi­ous only in ret­ro­spect: Wikipedia attracted con­trib­u­tors because it was built around a famil­iar pro­duc­t—the ency­clo­pe­dia. Ency­clo­pe­dias aren’t just arti­facts; they’re also epis­temic frames. They employ a par­tic­u­lar—and, yet, uni­ver­sal—ap­proach to orga­niz­ing infor­ma­tion. Prior to Wikipedia, online ency­clo­pe­dias tried to do what we tend to think is a good thing when it comes to the web: chal­leng­ing old metaphors, explod­ing ana­log tra­di­tions, invent­ing entirely new form­s…An­other intrigu­ing find­ing: Wikipedia focused on sub­stan­tive con­tent devel­op­ment instead of tech­nol­o­gy. Wikipedia was the only project in the entire sam­ple, Hill not­ed, that did­n’t build its own tech­nol­o­gy. (It was, in fact, gen­er­ally seen as tech­no­log­i­cally unso­phis­ti­cated by other ency­clo­pe­di­as’ founders, who saw them­selves more as tech­nol­o­gists than as con­tent provider­s.) , for exam­ple, had sev­eral peo­ple ded­i­cated to build­ing its infra­struc­ture, but none devoted to build­ing its arti­cles. It was all very if you build it, they will come…There are two other key con­trib­u­tors to Wikipedi­a’s suc­cess with attract­ing con­trib­u­tors, Hill’s research sug­gests: Wikipedia offered low trans­ac­tion costs to par­tic­i­pa­tion, and it de-em­pha­sized the social own­er­ship of con­tent. Edit­ing Wikipedia is easy, and instant, and vir­tu­ally com­mit­men­t-free. “You can come along and do a dri­ve-by edit and never make a con­tri­bu­tion again,” Hill pointed out. And the fact that it’s dif­fi­cult to tell who wrote an arti­cle, or who edited it—rather than dis­cour­ag­ing con­tri­bu­tion, as you might assume—ac­tu­ally encour­aged con­tri­bu­tions, Hill found. “Low tex­tual own­er­ship resulted in more col­lab­o­ra­tion,” he put it. And that could well be because Wikipedi­a’s author­less struc­ture low­ers the pres­sure some might feel to con­tribute some­thing stel­lar. 
The pull of rep­u­ta­tion can dis­cour­age con­tri­bu­tions even as it can also encour­age them. So Wikipedia “took advan­tage of mar­ginal con­tri­bu­tions,” Hill not­ed—a sen­tence here, a graf there—which, added up, turned into arti­cles. Which, added up, turned into an ency­clo­pe­dia.

I’ve often thought that if the ‘barriers to entry’ were charted against ‘contributed effort’, one would see a relation between them. An entire essay could likely be written on how the small barriers the Wikipedia community put up (each individually reasonable, and not too onerous even in the aggregate), such as referencing requirements and the ban on anonymous page creation, led to the first sustained drop in contributors and contribution. The effect is nonlinear.

# New regimes

The best rule of thumb here is per­haps the one cited by in :

Accord­ing to a rule of thumb among engi­neers, any ten­fold quan­ti­ta­tive change is a qual­i­ta­tive change1, a fun­da­men­tally new sit­u­a­tion rather than a sim­ple extrap­o­la­tion.

Clear as mud, eh? Let’s try more quotes, then:

The human long­ing for free­dom of infor­ma­tion is a ter­ri­ble and won­der­ful thing. It delin­eates a piv­otal dif­fer­ence between men­tal eman­ci­pa­tion and slav­ery. It has launched protests, rebel­lions, and rev­o­lu­tions. Thou­sands have devoted their lives to it, thou­sands of oth­ers have even died for it. And it can be stopped dead in its tracks by requir­ing peo­ple to search for “how to set up proxy” before view­ing their anti-­gov­ern­ment web­site.

I was reminded of this recently by Eliez­er’s Less Wrong Progress Report. He men­tioned how sur­prised he was that so many peo­ple were post­ing so much stuff on Less Wrong, when very few peo­ple had ever taken advan­tage of Over­com­ing Bias’ pol­icy of accept­ing con­tri­bu­tions if you emailed them to a mod­er­a­tor and the mod­er­a­tor approved. Appar­ently all us folk brim­ming with ideas for posts did­n’t want to deal with the aggra­va­tion.2

We exam­ine open access arti­cles from three jour­nals at the Uni­ver­sity of Geor­gia School of Law and con­firm that legal schol­ar­ship freely avail­able via open access improves an arti­cle’s research impact. Open access legal schol­ar­ship—which today appears to account for almost half of the out­put of law fac­ul­ties—­can expect to receive 50% more cita­tions than non-open access writ­ings of sim­i­lar age from the same venue.34

There are tools to just say, “Give me your social secu­rity num­ber, give me your address and your moth­er’s maiden name, and we send you a phys­i­cal piece of paper and you sign it and send it back to us.” By the time that’s all accom­plished, you are a very safe user. But by then you are also not a user, because for every step you have to take, the dropoff rate is prob­a­bly 30%. If you take ten steps, and each time you lose one-third of the users, you’ll have no users by the time you’re done with the fourth step.5

For exam­ple, usabil­ity the­ory holds that if you make a task 10% eas­ier, you dou­ble the num­ber of peo­ple that can accom­plish it. I’ve always felt that if you can make it 10% eas­ier to fill in a , you’ll get twice as many bug reports. (When I removed two ques­tions from the signup page, the rate of new signups went up dra­mat­i­cal­ly).6

Think of these bar­ri­ers as an obsta­cle course that peo­ple have to run before you can count them as your cus­tomers. If you start out with a field of 1000 run­ners, about half of them will trip on the tires; half of the sur­vivors won’t be strong enough to jump the wall; half of those sur­vivors will fall off the rope lad­der into the mud, and so on, until only 1 or 2 peo­ple actu­ally over­come all the hur­dles. With 8 or 9 bar­ri­ers, every­body will have one non-ne­go­tiable deal killer…By inces­sant pound­ing on elim­i­nat­ing bar­ri­ers, [Mi­crosoft] slowly pried some mar­ket share away from Lotus.7

The vast major­ity of raters were pre­vi­ously only read­ers of Wikipedia. Of the reg­is­tered users that rated an arti­cle, 66% had no prior edit­ing activ­i­ty. For these reg­is­tered users, rat­ing an arti­cle rep­re­sents their first par­tic­i­pa­tory activ­ity on Wikipedia. These ini­tial results show that we are start­ing to engage these users beyond just pas­sive read­ing, and they seem to like it…Once users have suc­cess­fully sub­mit­ted a rat­ing, a ran­domly selected sub­set of them are shown an invi­ta­tion to edit the page. Of the users that were invited to edit, 17% attempted to edit the page. 15% of those ended up suc­cess­fully com­plet­ing an edit. These results strongly sug­gest that a feed­back tool could suc­cess­fully con­vert pas­sive read­ers into active con­trib­u­tors of Wikipedia. A rich text edi­tor could make this path to edit­ing even more promis­ing.8
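The arithmetic behind these quotes is easy to check. Here is a minimal sketch, assuming (as the quotes themselves do) that every barrier removes a fixed fraction of whoever is left; the function names are illustrative and do not come from any of the quoted sources:

```python
import math


def surviving_fraction(dropoff_per_step: float, steps: int) -> float:
    """Fraction of users remaining after `steps` barriers.

    Assumes each barrier independently loses the same fraction of whoever
    is left; this is a simplification, not a measured model.
    """
    return (1.0 - dropoff_per_step) ** steps


def barriers_until_tenfold_drop(dropoff_per_step: float) -> int:
    """Smallest number of barriers that leaves at most 1/10 of the users."""
    return math.ceil(math.log(0.1) / math.log(1.0 - dropoff_per_step))


# Sign-up quote: a 30% dropoff at each of ten steps.
print(f"10 steps at 30% dropoff: {surviving_fraction(0.30, 10):.1%} remain")

# Obstacle-course quote: 1000 runners, half lost at each of 9 barriers.
print(f"9 barriers at 50%: {1000 * surviving_fraction(0.5, 9):.1f} of 1000 remain")

# Article-feedback quote: 17% of invitees attempt an edit, 15% of those finish.
print(f"invited raters who complete an edit: {0.17 * 0.15:.2%}")

# How few barriers produce a tenfold (qualitative) change?
for dropoff in (0.30, 0.50):
    print(f"{dropoff:.0%} dropoff: tenfold drop after "
          f"{barriers_until_tenfold_drop(dropoff)} barriers")
```

Under these assumptions, ten steps at a 30% dropoff leave about 2.8% of users, nine halvings leave about 2 runners out of 1000, and about 2.55% of invited raters complete an edit. A tenfold drop takes only 7 barriers at a 30% dropoff, or 4 at 50%.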

# Toeing the precipice

It may take only a few restrictions before one has inched far enough along the ‘barriers’ axis that ‘contributions’ do in fact fall tenfold. One sees Wikipedia slowly adding restrictions:

Each of these steps seems harm­less enough, per­haps, because we can’t see the things which do not hap­pen as a result (this is a ver­sion of Frédéric Bas­tiat’s ). The legal­is­tic motto “that which is not explic­itly per­mit­ted is for­bid­den” has the virtue of being easy to apply, at least.

Few objected to the ban­ning of anony­mous page cre­ation by dur­ing the (we had to destroy the wiki to save it), and most of those were unprin­ci­pled ones. The objec­tor was all for a tougher War on Drugs—er, I mean Ter­ror, or was that Van­dal­ism? (maybe Pover­ty)—but they did­n’t want to be stam­peded into it by some bad PR. Too, few objected to s: ‘take that you scum­bag spam­mers!’ The ironic thing is, as a frac­tion of edits, van­dal­ism shrunk from 2003-2008 (re­main­ing roughly sim­i­lar since) and sim­i­lar­ly, users spe­cial­iz­ing in van­dal fight­ing and their work­load of edits have shrunk; graph­ing new con­tri­bu­tions by size, one finds that for both reg­is­tered and anony­mous users, the apogee was 2007 and van­dal­ism has been decreas­ing ever since. (A more ambigu­ous sta­tis­tic is the reduced num­ber of actions by new page patrollers.)

## Falling

“Who alive can say,
‘Thou art no Poet­—­may’st not tell thy dreams?’
Since every man whose soul is not a clod
Hath visions, and would speak, if he had loved,
And been well nur­tured in his mother tongue.”

But by 2007 the water had become hot enough to be felt by devo­tees of mod­ern fic­tion (that is, anime & manga fran­chis­es, video games, nov­els, etc.), and even the great Jimbo could not expect to see his arti­cles go un-AfD’d.

But who really cares about what some nerds like? What mat­ters is Nota­bil­ity with a cap­i­tal N, and the fact that our feel­ings were hurt by some Wiki­groan­ing! After all, clearly the proper way to respond to the obser­va­tion that was longer than is to delete its con­tents and have peo­ple read the short, scrawny—but seri­ous!— arti­cle instead.

If it does­n’t appear in Encarta or Ency­clo­pe­dia Bri­tan­ni­ca, or isn’t treated at the same (pro­por­tion­al) length, then it must go!

## By the numbers

“Imag­ine a world in which every sin­gle per­son on the planet is given free access to the sum of all human knowl­edge. That’s what we’re doing.”

Jimmy Wales, 2004

“…inclusionism generally is toxic. It lets a huge volume of garbage pile up. Deletionism just takes out the trash. We did it with damn Pokemon, and we’ll eventually do it with junk football “biographies”, with “football” in the sense of American and otherwise. We’ll sooner or later get it done with “populated places” and the like too.”

Todd Allen, 2019-07-05 (WP edi­tor 2004, admin 2007, Arb­com 2014–2016)

Delet­ing based on nota­bil­i­ty, fic­tion arti­cles in par­tic­u­lar, does­n’t merely ill-serve our read­ers (who are numer­ous; note how many of Wikipedi­a’s most pop­u­lar pages are fic­tion-re­lat­ed, both now and in 2007 or 2011, or how many Inter­net searches lead to Wikipedia for cul­tural con­tent9), but it also dam­ages the com­mu­ni­ty.

We can see it indirectly in the global statistics. The analyses (2007, 2008) show it. We are seeing fewer new editors, fewer new articles, fewer new images; less of everything, except tedium & bureaucracy.

Worse, it’s not just that the growth of Wikipedia has stopped accelerating in important metrics: the rate of increase has in some cases not merely stopped increasing, but started dropping!

"…the size of the active edit­ing com­mu­nity of the Eng­lish Wikipedia peaked in early 2007 and has declined some­what since then. Like Wikipedi­a’s arti­cle count, the num­ber of active edi­tors grew expo­nen­tially dur­ing the early years of the pro­ject. The arti­cle cre­ation rate (which is tracked at Wikipedi­a:­Size of Wikipedia) peaked around August 2006 at about 2400 net new arti­cles per day and has fallen since then, to around under 1400 in recent months. [The graph is mir­rored at Andrew Lih’s “Wikipedia Plateau?”.]

User:M­Bisanz has charted the num­ber of new accounts reg­is­tered per mon­th, which tells a very sim­i­lar sto­ry: March 2007 recorded the largest num­ber of new accounts, and the rate of new account cre­ation has fallen sig­nif­i­cantly since then. Declines in activ­ity have also been not­ed, and fret­ted about, at Wikipedi­a:Re­quests for admin­ship…"

This has been noted in multiple sources, such as Felipe Ortega’s 2009 thesis, “Wikipedia: A Quantitative Analysis”:

So far, our empir­i­cal analy­sis of the top ten Wikipedias has revealed that the sta­bi­liza­tion of the num­ber of con­tri­bu­tions from logged authors in Wikipedia dur­ing 2007 has influ­enced the evo­lu­tion of the pro­ject, break­ing down the steady grow­ing rate of pre­vi­ous years…

Unfor­tu­nate­ly, this results raise sev­eral impor­tant con­cerns for the Wikipedia pro­ject. Though we do not have empir­i­cal data from 2008, the change in the trend of births and deaths [new & inac­tive edi­tors] will clearly decrease the num­ber of avail­able logged authors in all lan­guage ver­sions, thus cut­ting out the capac­ity of the project to effec­tively under­take revi­sions and improve con­tents. Even more seri­ous is the slightly decreas­ing trend that is start­ing to appear in the monthly num­ber of births of most ver­sions. The rate of deaths, on the con­trary, does not seem to leave its ascend­ing ten­den­cy. Eval­u­at­ing the results for 2008 will be a key aspect to val­i­date the hypoth­e­sis that this trend has changed indeed, and that the Wikipedia project needs to put in prac­tice more aggres­sive mea­sures to attract new users, if they do not want to see the monthly effort decrease in due course, as a result of the lack of human authors.10

Ortega notes indi­ca­tions that this is a pathol­ogy unique to En:

“In the first place, we note the remark­able dif­fer­ence between the Eng­lish and the Ger­man lan­guage ver­sions. The first one presents one of the worst sur­vival curves in this series, along with the Por­tuguese Wikipedia, whereas the Ger­man ver­sion shows the best results until approx­i­mately 800 days. From that point on, the Japan­ese lan­guage ver­sion is the best one. In fact, the Ger­man, French, Japan­ese and Pol­ish Wikipedias exhibits some of the best sur­vival curves in the set, and only the Eng­lish ver­sion clearly devi­ates from this gen­eral trend. The most prob­a­ble expla­na­tion for this dif­fer­ence, tak­ing into account that we are con­sid­er­ing only logged authors in this analy­sis, is that the Eng­lish Wikipedia receives too con­tri­bu­tions from too many casual users, who never come back again after per­form­ing just a few revi­sions.”11

Erik Moeller of the WMF tried to wave away the results in Novem­ber 2009 by point­ing out that “The num­ber of peo­ple writ­ing Wikipedia peaked about two and a half years ago, declined slightly for a brief peri­od, and has remained sta­ble since then”, but he also shoots him­self in the foot by point­ing out that the num­ber of arti­cles keeps grow­ing. That is not a sus­tain­able dis­par­i­ty. Worse, as the orig­i­nal writ­ers leave, their arti­cles become —on which later edi­tors must engage in , try­ing to retrieve the orig­i­nal ref­er­ences or under­stand why some­thing was omit­ted, or must sim­ply remove con­tent because they do not under­stand the larger con­text or are igno­rant. (I have had con­sid­er­able dif­fi­culty answer­ing some straight­for­ward ques­tions about errors in arti­cles I researched and wrote entirely on my own; how well could a later edi­tor have han­dled the ques­tion­s?)

The num­bers have been depress­ing ever since, from the 2010 infor­mal & Foun­da­tion study12 on edi­tor demo­graph­ics to 2011 arti­cle con­tri­bu­tions; the WSJ’s sta­tis­ti­cian Carl Bia­lik wrote in Sep­tem­ber 2011 that “the num­ber of edi­tors is dwin­dling. Just 35,844 reg­is­tered edi­tors made five or more edits in June, down 34% from the March 2007 peak. Just a small share of Wikipedia edi­tors—about 3%—ac­count for 85% of the site’s activ­i­ty, a poten­tial prob­lem, since par­tic­i­pa­tion by these heavy users has fallen even more sharply.”
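The figures quoted in this section can be turned into rough magnitudes. Here is a small sketch using only numbers that appear in the quotes: 35,844 active editors, "down 34%" from the peak, and roughly 2400 falling to about 1400 net new articles per day. The helper names are hypothetical, and the results are only as precise as the rounded inputs:

```python
def implied_peak(current: float, decline_from_peak: float) -> float:
    """Peak value implied by a current value and a fractional decline from that peak."""
    return current / (1.0 - decline_from_peak)


def percent_change(old: float, new: float) -> float:
    """Signed percentage change from `old` to `new`."""
    return (new - old) / old * 100.0


# Bialik: 35,844 editors with five or more edits, "down 34%" from March 2007.
print(f"implied March 2007 peak: about {implied_peak(35_844, 0.34):,.0f} editors")

# Earlier quote: ~2400 net new articles per day at the peak, ~1400 more recently.
print(f"article creation rate: {percent_change(2400, 1400):.0f}%")
```

On these inputs the implied peak is roughly 54,300 active editors, and the article creation rate is down by about 42%.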

Only in 2010 and 2011 has the Foun­da­tion seemed to wake up and see what the num­bers were say­ing all along; while Wales says some of the right things like “A lot of edi­to­r­ial guide­li­nes…are impen­e­tra­ble to new users”, he also back­-hand­edly dis­misses it—“We are not replen­ish­ing our ranks. It is not a cri­sis, but I con­sider it to be impor­tant.” By Decem­ber 2011, Sue Gard­ner seems to reflect a more real­is­tic view in the WMF, call­ing it the “holy-shit slide”; I think she is worth quot­ing at length to empha­size the issue. From the 2011-12-19 “The Gard­ner inter­view”:

Much of the inter­view con­cerned the issues she raised in a land­mark address in Novem­ber to the board of Wiki­me­dia UK, in which she said the slide show­ing a graph of declin­ing edi­tor reten­tion (be­low) is what the Foun­da­tion calls “the holy-shit slide”. This is a huge, “really really bad” prob­lem, she told Wiki­me­dia UK, and is worst on the Eng­lish and Ger­man Wikipedias.

A promi­nent issue on the Eng­lish Wikipedia is whether attempts to achieve high qual­ity in arti­cles—and per­cep­tions that this is entan­gled with unfriendly treat­ment of new­bies by the com­mu­ni­ty—are asso­ci­ated with low rates of attract­ing and retain­ing new edi­tors. Although Gard­ner believes that high qual­ity and attract­ing new edi­tors are both crit­i­cal goals, her view is that qual­ity has not been the prob­lem, although she did­n’t define exactly what arti­cle qual­ity is. What we did­n’t know in 2007, she said, was that “qual­ity was doing fine, whereas par­tic­i­pa­tion was in seri­ous trou­ble. The Eng­lish Wikipedia was at the tail end of a sig­nif­i­cant drop in the reten­tion of new edi­tors: peo­ple were giv­ing up the edit­ing process more quickly than ever before.”

Par­tic­i­pa­tion mat­ters because it dri­ves qual­i­ty. Peo­ple come and go nat­u­ral­ly, and that means we need to con­tin­u­ally bring in and suc­cess­fully ori­ent new peo­ple. If we don’t, the com­mu­nity will shrink over time and qual­ity will suf­fer. That’s why par­tic­i­pa­tion is our top pri­or­ity right now.

…Dele­tions and rever­sions might be dis­taste­ful to new edi­tors, but how can we, for instance, main­tain strict stan­dards about biogra­phies of liv­ing peo­ple (BLP) with­out revert­ing prob­lem­atic edits and delet­ing inap­pro­pri­ate arti­cles? Gard­ner rejected the premise:

I don’t believe that qual­ity and open­ness are inher­ently opposed to each oth­er. Open­ness is what enables and moti­vates peo­ple to show up in the first place. It also means we’ll get some bad faith con­trib­u­tors and some who don’t have the basic com­pe­tence to con­tribute well. But that’s a rea­son­able price to pay for the over­all effec­tive­ness of an open sys­tem, and it does­n’t inval­i­date the basic premise of Wikipedia: that open­ness will lead to qual­i­ty.

…While stak­ing the Foun­da­tion’s claim to the more tech­ni­cal side of the equa­tion, Gard­ner does­n’t shrink from pro­vid­ing advice on how we can fix the cul­tural prob­lem.

If you look at new edi­tors’ talk pages, they can be pretty depress­ing—they’re often an unin­ter­rupted stream of warn­ings and crit­i­cisms. Expe­ri­enced edi­tors put those warn­ings there because they want to make Wikipedia bet­ter: their intent is good. But the over­all effect, we know, is that the new edi­tors get dis­cour­aged. They feel like they’re mak­ing mis­takes, that they’re get­ting in trou­ble, peo­ple don’t want their help. And so they leave, and who can blame them? We can mit­i­gate some of that by ton­ing down the intim­i­da­tion fac­tor of the warn­ings: mak­ing them sim­pler and friend­lier. We can also help by adding some praise and thanks into the mix. When the Foun­da­tion sur­veys cur­rent edi­tors, they tell us one of the things they enjoy most about edit­ing Wikipedia is when some­one they respect tells them they’re doing a good job. Praise and thanks are pow­er­ful.

…[Around the time of the and con­tro­ver­sies] Jimmy went to Wiki­me­dia and said “qual­ity … we need to do bet­ter”, [and through the dis­tor­tions of the rip­ple-­ef­fect in the pro­jects] there was this moral panic cre­ated around qual­ity … what Jimmy said gave a whole lot of peo­ple the license to be jerks. … Folks are play­ing Wikipedia like it’s a video game and their job is to kill van­dals … every now and again a nun or a tourist wan­ders in front of the AK47 and gets mur­dered …

Many peo­ple have com­plained that Wikipedia patrollers and admin­is­tra­tors have become insu­lar and taken on a bunker men­tal­i­ty, dri­ving new con­trib­u­tors away. Do you agree, and if so, how can this atti­tude be com­bated with­out alien­at­ing the cur­rent core con­trib­u­tors?

I would­n’t char­ac­ter­ize it as bunker men­tal­ity at all. It’s just a sys­tem that’s cur­rently opti­mized for com­bat­ing bad edits, while being insuf­fi­ciently con­cerned with the well-be­ing of new edi­tors who are, in good faith, try­ing to help the pro­jects. That’s under­stand­able, because it’s a lot eas­ier to opti­mize for one thing (no bad edit should sur­vive for very long) than for many things (good edits should be pre­served and built upon, new edi­tors should be wel­comed and coached, etc.). So I don’t think it’s an atti­tu­di­nal prob­lem, but more an issue of focus­ing energy now on re-bal­anc­ing to ensure our processes for patrolling edits, delet­ing con­tent, etc. are also designed to be encour­ag­ing and sup­port­ive of new peo­ple.

How can a cul­ture that has a heavy sta­tus quo bias be changed? How can the com­mu­nity be per­suaded to become less risk-a­verse?

My hope is that the com­mu­nity will become less risk-a­verse as the Foun­da­tion makes suc­cess­ful, use­ful inter­ven­tions. I believe the Vec­tor usabil­ity improve­ments are gen­er­ally seen as suc­cess­ful, although they of course haven’t gone far enough yet. Wik­ilove is a small fea­ture, but it’s been adopted by 13 Wikipedia lan­guage-ver­sions, plus Com­mons. The arti­cle feed­back tool is on the Eng­lish Wikipedia and is cur­rently being used in seven other pro­jects. The new-ed­i­tor feed­back dash­board is live on the Eng­lish and Dutch Wikipedias. New warn­ing tem­plates are being tested on the Eng­lish and Por­tuguese Wikipedias. And the first opt-in user-­fac­ing pro­to­type of the visual edi­tor will be avail­able within a few weeks. My hope is all this will cre­ate a vir­tu­ous cir­cle: sup­port for open­ness will begin to increase open­ness, which will begin to increase new edi­tor reten­tion, which will begin to relieve the work­load of expe­ri­enced edi­tors, which will enable every­one to relax a lit­tle and allow for more exper­i­men­ta­tion and play­ful­ness.

Regain­ing our sense of open­ness will be hard work: it flies in the face of some of our strongest and least healthy instincts as human beings. Peo­ple find it dif­fi­cult to assume good faith and to devolve pow­er. We nat­u­rally put up walls and our brains fall into us-ver­sus-them pat­terns. That’s nor­mal. But we need to resist it. The Wiki­me­dia projects are a tri­umph of human achieve­ment, and they’re built on a belief that human beings are gen­er­ally well-in­ten­tioned and want to help. We need to remem­ber that and to behave con­sis­tently with it.

I am skep­ti­cal that Gard­ner’s ini­tia­tives will change the curves (although they are not bad ideas); my gen­eral belief is that delet­ing pages, and the omnipresent threat of dele­tion, are far more harm­ful than com­plex markup. (I should note that Gard­ner has read and praised this essay, but also that much of this essay is based on my feel­ings and .)

Regard­less of whether the WMF really under­stands the issue, it is almost unin­ten­tion­ally hilar­i­ous to look at the pro­posed solu­tion­s—­for exam­ple, one amounts to restor­ing early Wikipedia cul­ture & prac­tices in pri­vate sand­box­es, pro­tected from the reg­u­lars & their guide­li­nes! Band-aids like Wik­ilove or arti­cle rat­ing but­tons are not get­ting at the core of the prob­lem; a com­mu­nity does not live on high­-qual­ity rat­ing tools () or die on poor ones (YouTube). The Foundation/developers some­times do the right thing, like strik­ing down an Eng­lish Wikipedia ‘con­sen­sus’ to restrict arti­cle cre­ation even fur­ther, but will it be enough? To quote Carl Bia­lik again:

Adding more edi­tors “is one of our top pri­or­i­ties for the year,” says Howie Fung, senior prod­uct man­ager for the Wiki­me­dia Foun­da­tion, which aims to increase the num­ber of edi­tors across all lan­guages of Wikipedia to 95,000 from 81,450 by June of next year.

The subsequent research has in some respects vindicated my views. Some have tried to argue that the declines are due to picking all the low-hanging fruit in articles or in available editors, or that lower-quality editors merited additional procedures. But what we see is not that new editors are worse or lower-quality, but that they are as high-quality and useful as they have been since 2006; nor is the decline due to a declining supply of new editors plus better procedures for winnowing them out. From “Kids these days: the quality of new Wikipedia editors over time” (“Research:Newcomer quality”):

What we found was encour­ag­ing: the qual­ity of new edi­tors has not sub­stan­tially changed since 2006. More­over, both in the early days of Wikipedia and now, the major­ity of new edi­tors are not out to obvi­ously harm the ency­clo­pe­dia (~80%), and many of them are leav­ing valu­able con­tri­bu­tions to the project in their first edit­ing ses­sion (~40%). How­ev­er, the rate of rejec­tion of all good-­faith new edi­tors’ first con­tri­bu­tions has been ris­ing steadi­ly, and, accord­ing­ly, reten­tion rates have fall­en. What this means is that while just as many pro­duc­tive con­trib­u­tors enter the project today as in 2006, they are enter­ing an envi­ron­ment that is increas­ingly chal­leng­ing, crit­i­cal, and/or hos­tile to their work. These lat­ter find­ings have also been con­firmed through pre­vi­ous research.

(I am struck by the fall in new­bie sur­vival rates for the high­est-qual­i­ty—‘golden’—ed­i­tors in 2006-2007. The Seigen­thaler affair was, rec­ol­lect, Novem­ber-De­cem­ber 2005.)

I sus­pected that Fung’s objec­tive would not be reached, as indeed it was not13.

Remember, most measures are directed against casual users. Power users can navigate the endless processes, or call in powerful friends, or simply wait a few years.14 The most powerful predictor of whether an editor will stop editing is… how much they are editing.15 User:Resident Mario (joined 2008) points in his December 2011 essay “Openness versus quality: why we’re doing it wrong, and how to fix it”16 to a dramatic graph of editor counts17:

And it’s casual users who mat­ter. We lost the cre­den­tialed experts years ago, if we ever had them. Sur­veys ask­ing why are almost otiose; they will do so if they are excep­tional or if they are man­ag­ing PR around a dis­cov­ery. But Wikipedia is not Long Con­tent; why would they con­tribute if they can get the traf­fic they desire just by insert­ing links18? Why would they build their intel­lec­tual houses on sand?19 They get the best of both world­s—­gain­ing traf­fic and avoid­ing the toxic dele­tion­ists.

And we can see this quite directly: when the general population of editors gets solicited to contribute to AfD, their !votes are different from those of the AfD regulars, and in particular, when keep !voters spread the word about an AfD, their recruits are much more likely to !vote keep as well, while would-be deleters do their cause no favor with publicity20. Can there be any more convincing proof that deletionism and its manifestations are a cancer on the Wikipedia corpus?

### The editing community is dead; who killed it?

Hav­ing dis­cussed the broad trend of dele­tion­ism and prob­lems with edi­tors, let’s look at one spe­cific dele­tion­ist prac­tice which has, as far as I know, never been exam­ined before, despite being a clas­sic dele­tion­ist prac­tice and, like most dele­tion­ist prac­tices, one that by the num­bers turns out to badly mis­serve both edi­tors and read­ers: the prac­tice of mov­ing links from Exter­nal Links to the Talk page.

The rea­son for my inter­est in this minor dele­tion­ist prac­tice is that I no longer edit as much as I used to, and so fre­quently when I find an excel­lent cita­tion (ar­ti­cle, review, inter­view etc.) I will often just copy it into the Exter­nal Links sec­tion or (if I am feel­ing espe­cially ener­get­ic) I will excerpt the impor­tant bits onto the arti­cle’s Talk page. I real­ized that this con­sti­tutes what one might call a “”: I could go back and see how often the excerpts were copied by another edi­tor into the arti­cle. This is bet­ter than just look­ing at “how often anime edi­tors edit” or “how often anime arti­cles are edited” because it is less related to out­side events—per­haps anime news was sim­ply bor­ing over that period or per­haps some new bots or scripts were rolled out. Whereas if there are no anime edi­tors who will edit even when pre­sented with gift-wrapped RSs (links & excerpts specif­i­cally called out for their atten­tion, and triv­ially copy­-­pasted into the arti­cle), then that’s pretty con­vinc­ing evi­dence that there is no longer a ‘there’ there—that the edi­tors are no longer active.

#### Sins of Omission: experiment 1

On at least two arti­cles (Talk:Gur­ren Lagan­n#In­ter­views & Talk:Royal Space Force: The Wings of Hon­nêamise#­Sources), I have been stren­u­ously opposed by edi­tors who object to hav­ing more than a hand­ful of links in the des­ig­nated Exter­nal Links sec­tion; they acknowl­edged the links were (most­ly) all undoubted RSs and rel­e­vant to the arti­cle—but they refused to incor­po­rate the links into the arti­cle. This is bad from every angle, yet few other edi­tors were inter­ested in help­ing me.

So I’ve begun going through my old main­space Talk edits using Special:Contributions, start­ing all the way back in April 2007 (>4 years ago, more than enough time for edi­tors to have made use of my gift­s!), look­ing for cases where I’ve dumped such ref­er­ences. I com­piled two lists, of 146 ani­me-re­lated edits, and 102 non-anime-re­lated edits.

Before going any further, it’s worth asking (to avoid post hoc rationalization) what you expect my results to be.

When asking yourself, remember that these edits, and a larger set of edits we’ll soon examine, are selected edits; they are high-quality edits, ones where I thought the relevant article must cover the material. They are not low-quality dumps of text or links by a passing anonymous editor or done out of idle amusement. What percentage would you expect to have been used after a week, enough time that most article-watchlisting editors will have seen the diff and had leisure to deal with tasks more complex than reverting vandalism? 50% doesn’t seem like a bad starting point. How about after a year? Or two? Maybe 70% or 90%? After that, if it hasn’t been dealt with, it’s probably not ever going to be dealt with (even assuming the section hasn’t been stuffed in an archive page). Hold onto your estimate.

Once the lists were com­piled and weed­ed, I wrote a Haskell pro­gram to do the analy­sis. The pro­gram loads the spec­i­fied Talk page URLs and extracts all URLs from the Talk diff so it can check whether any of them were linked in the Arti­cle (which, inci­den­tal­ly, leads to false pos­i­tives and an overesti­ma­tion21).
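The check can be illustrated with a short sketch. This is not the original Haskell program: it is a minimal Python approximation which assumes the Talk diff and the current Article wikitext have already been downloaded as strings. The names `extract_urls` and `was_used`, the URL pattern, and the sample strings are all hypothetical.

```python
import re

# Rough URL pattern (an assumption): stop at whitespace and at characters
# that usually delimit a link in wikitext or HTML.
URL_RE = re.compile(r'https?://[^\s\[\]|{}<>"]+')


def extract_urls(text):
    """Return the set of URLs found in a blob of wikitext or diff text."""
    return {url.rstrip(".,;:)") for url in URL_RE.findall(text)}


def was_used(talk_diff, article_text):
    """True if any URL posted in the Talk diff also appears in the Article.

    Matching on bare URLs can overestimate usage: a shared URL does not
    prove that the Talk suggestion is what put it into the Article.
    """
    return bool(extract_urls(talk_diff) & extract_urls(article_text))


talk_diff = "Interview: [http://example.com/interview.html Director interview]."
used_article = "<ref>http://example.com/interview.html</ref>"
unused_article = "No external references here."

print(was_used(talk_diff, used_article))
print(was_used(talk_diff, unused_article))
```

Counting an edit as "used" when any one of its URLs appears is the generous reading; requiring all URLs to appear would only push the rates lower.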

##### Results

The results for my edits when run on the two lists:

• ani­me: of 146 edits, 11 were used, or <8%
• non-anime: 102 edits, 3 used, or <3%

For com­par­ison, we can look at an edi­tor who has devoted much of her time to find­ing ref­er­ences for anime arti­cles—but made the colos­sal mis­take of believ­ing the EL par­ti­sans when they said exter­nal links should either be incor­po­rated into arti­cle text or listed on the talk page. User:Kreb­Markt has made per­haps thou­sands of such edits from impec­ca­ble RSs; it is pos­si­ble that my own con­tri­bu­tions are skewed down­wards, say, by a con­gen­i­tal inabil­ity to select good ref­er­ences. Hence, look­ing at her ref­er­ence-ed­its will pro­vide a cross-check.

I com­piled her most recent 1000 edits to the arti­cle talk space with a quick down­load: elinks -dump 'http://en.wikipedia.org/w/index.php?title=Special:Contributions&limit=500&contribs=user&target=KrebMarkt&namespace=1' 'http://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=20110227162151&limit=500&contribs=user&target=KrebMarkt&namespace=1' | grep '&diff='. Then I man­u­ally removed edits which were minor or did not seem to be her usual ref­er­ence-ed­its, result­ing in the fol­low­ing list of 958 edits from Decem­ber 2010 to Decem­ber 2011. (Kreb­Markt almost exclu­sively adds ani­me-re­lated ref­er­ences, so I did not pre­pare a non-anime list.) The results:

• Of the 958 edits adding ref­er­ences, 36 were used in the arti­cle, or <4%
• Combining my anime & non-anime with KrebMarkt’s edits, we have 1206 edits adding references, of which 50 were used in the article, or <4.15%

Besides it being sur­pris­ing that Kreb­Markt (not a par­tic­u­larly com­mit­ted inclu­sion­ist, if she be an inclu­sion­ist at all) had a suc­cess rate half mine, <4.15% is shock­ingly low.
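For readers who want to re-derive the percentages, the arithmetic is small enough to check directly. The sketch below only re-adds the counts reported above; the variable and function names are illustrative.

```python
# (edits adding references, edits whose references were later used),
# copied from the results reported above.
samples = {
    "anime (mine)": (146, 11),
    "non-anime (mine)": (102, 3),
    "KrebMarkt": (958, 36),
}


def pct(used, total):
    """Percentage of reference-adding edits that ended up used in the article."""
    return 100.0 * used / total


for name, (total, used) in samples.items():
    print(f"{name}: {used}/{total} = {pct(used, total):.2f}%")

total_edits = sum(total for total, _ in samples.values())
total_used = sum(used for _, used in samples.values())
print(f"combined: {total_used}/{total_edits} = {pct(total_used, total_edits):.2f}%")
print(f"ignored: {total_edits - total_used}")
```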

1156 ignored edits represent a staggering waste of editor-time22. This cannot be explained as our fault: we are both experienced editors (I began editing in 2004, and KrebMarkt in 2008), who know what good RSs are. And all of the edits contain good RSs. (The reader is invited to check edits and see for himself whether they are solid and valuable RSs, like reviews by the .) That fewer than 1⁄20 of our suggested references are included is due solely to the apathy or nonexistence of other editors. (If such a rate is a ‘success’, may the Almighty preserve us from a failure!)

Since that will not soon change for the bet­ter, this leads to one con­clu­sion: the idea that ref­er­ences hid­den on Talk pages will one day be used is false.

#### Sins of Omission: experiment 2

“Some­body remarked: ‘I can tell by my own reac­tion to it that this book is harm­ful.’ But let him only wait and per­haps one day he will admit to him­self that this same book has done him a great ser­vice by bring­ing out the hid­den sick­ness of his heart and mak­ing it vis­i­ble.”

We have looked at what sug­gest­ing addi­tions results in: abject fail­ure. The Wikipedia com­mu­nity is fail­ing at incor­po­rat­ing new links. Some attempted to jus­tify my exper­i­ment above: it’s OK because at least the exist­ing Exter­nal Links sec­tions are qual­ity sec­tions. This is des­per­ate spe­cial plead­ing, but we should test it. How is the edit­ing com­mu­nity at the flip side of the coin—re­tain­ing old links? If inclu­sion­ists’ sug­ges­tions are being ignored, is this at least fairly applied, with dele­tion­ists’ edits also futile?

Unfor­tu­nate­ly, test­ing this requires destruc­tive edit­ing. (We can’t sim­ply sug­gest on talk pages that exter­nal links be removed because that is both not how dele­tion­ists oper­ate and likely will result in no changes, per the pre­vi­ous exper­i­ment demon­strat­ing inac­tion on the part of edi­tors.)

The pro­ce­dure: remove ran­dom links and record whether they are restored to obtain a restora­tion rate.

• Editors might defer to other editors, so I will remove links as an anonymous IP user from multiple proxies; the restoration rate will naturally be an underestimate of what a registered editor would be able to commit, much less a tendentious deletionist.

• To avoid issues with cher­ry-pick­ing or biased selec­tion of links23, I will remove only the final exter­nal link on pages selected by Special:Random#External_links which have at least 2 exter­nal links in an ‘Exter­nal links’ sec­tion, and where the final exter­nal link is nei­ther an ‘offi­cial’ link nor tem­plate-­gen­er­at­ed. (This avoids issues where pages might have 5 or 10 ‘offi­cial’ exter­nal links to var­i­ous ver­sions or local­iza­tions, all of which an edi­tor could con­fi­dently and blindly revert the removal of; tem­plate-­gen­er­ated links also carry impri­maturs of author­i­ty.)

• The edit summary for each edit will be rm external link per [[WP:EL]], which has the nice property of being meaningless to anyone capable of critical thought (by definition, a link removal should be per one of WP:EL’s criteria, but which criterion?) but also official-looking like many deletionist edit-summaries.

This point is very impor­tant. We are not inter­ested in “van­dal­ism in gen­eral”, nor “all pos­si­ble forms of exter­nal link van­dal­ism” (like adding spam links, insert­ing gib­ber­ish, break­ing syn­tax), but in bad edits which mimic how a dele­tion­ist would edit. A dele­tion­ist would avoid cer­tain links, and would be sure to make some allu­sion to pol­i­cy. (Shades of : it is impos­si­ble to dis­tin­guish an actual dele­tion­ist’s edits from ran­dom dele­tions accom­pa­nied by repet­i­tive jar­gon.) If our exper­i­ment does not mimic these traits, our final mea­sure­ment of bad-edit rever­sion rate will sim­ply not be mea­sur­ing what we hoped to mea­sure.

• To avoid flood­ing issues and be less notice­able, no more than 5 or 10 links a day will be removed with at least 1 minute between each edit.

• To avoid building up credibility, I will not make any real edits with the anonymous IPs.

• After the last of the 100 links have been removed, I will wait 1 month (long enough for the edit to drop off all watch­lists and rever­sion rates become close to nonex­is­tent24) and restore all links. I pre­dict at least half will not be restored and cer­tainly not more than 90%.
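The selection rule in the list above is mechanical enough to write down. The following is only a sketch of that rule: the representation of a page's links as dictionaries with `official` and `templated` flags is made up for illustration, and judging whether a link is "official" was a manual call in practice.

```python
def link_to_remove(external_links):
    """Return the URL the procedure would remove from a page, or None.

    `external_links` is the ordered content of the page's 'External links'
    section; each entry is a dict with a 'url' and two boolean flags,
    'official' and 'templated' (a hypothetical representation).
    """
    if len(external_links) < 2:
        return None  # pages with fewer than 2 external links are skipped
    last = external_links[-1]
    if last["official"] or last["templated"]:
        return None  # official and template-generated links are left alone
    return last["url"]


page = [
    {"url": "http://example.org/", "official": True, "templated": False},
    {"url": "http://example.com/review", "official": False, "templated": False},
]
print(link_to_remove(page))      # http://example.com/review
print(link_to_remove(page[:1]))  # None
```

Always taking the final link, rather than letting the experimenter choose, is what removes the opportunity for cherry-picking.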

The full list of URL diffs is avail­able as an appen­dix.

After finishing the link removals, I briefly looked over the IPs’ contribution pages for (top), which specifies whether an edit is still the latest edit for that page (all reverted removals will by definition not still be the latest edit, but some non-reverted edits will have unrelated edits stealing the status, so the number gives an upper bound on how many removals were reverted). It looked like <10%.

I was also struck dur­ing the process of going through Special:Random by how many ‘Exter­nal Links’ sec­tions have been, in wretched sub­terfuges, renamed ‘Sources’, ‘Ref­er­ences’, ‘Fur­ther read­ing’, or the arti­cle has a long Ref­er­ences sec­tion stuffed with exter­nal links which are used once; per­haps edi­tors col­lec­tively know that putting a link into a sec­tion named ‘Exter­nal Links’ is paint­ing a cross-hair on its fore­head. Too, I was struck by the gen­eral qual­ity of the links: of the 100, I would have assented to the removal of no more than 5 (10 at the most). In gen­er­al, arti­cles err far on the side of includ­ing too few exter­nal links rather than too many.

How many readers were affected by my experiment over the course of the month of waiting? Feel free to estimate or give a range: 1,000 or 10,000 or maybe 100,000 readers? The articles are randomly picked, so it seems highly unlikely that there is significant overlap. But my best estimate, based on stats.grok.se data for the 100 articles’ traffic in March 2012, is that at least ~335,000 readers were affected25.

How many edi­tors were affect­ed? The 100 arti­cles edited were watch­listed by a median of 5 edi­tors each; unfor­tu­nate­ly, in lieu of tech­nolo­gies like Patrolled Revi­sions, we can­not esti­mate how many times each edit was checked by a human (as many of those edi­tors no doubt are inac­tive or do not mon­i­tor their watch­list close­ly).

What was the early reac­tion when I men­tioned this exper­i­ment? Ian Wool­lard said

…if you’d have picked some­thing other than exter­nal links, that might, or might not have been a good test.

Last time I checked (which admit­tedly was a while ago) Wikipedia had a notice­board whose entire pur­pose, was essen­tially to delete as many exter­nal links as pos­si­ble, they’d even added a pol­icy that said they could do that in every sin­gle case unless you could get a major­ity in a poll to keep indi­vid­ual links; oh and in prac­tice they pretty much !vote-stuffed those polls too by announc­ing the polls on the notice­board, so the chances of a clear major­ity was low. Oh, and there was a bunch of shady anony­mous IPs involved as well that swing around after the fact to edit war them away any­way if an exter­nal link they did­n’t favor gets through all that.

Basi­cal­ly, exter­nal links are one of the most hated parts of Wikipedia, and if hardly any of them got fixed it would­n’t sur­prise me, and would­n’t prove any­thing very much.

Exag­ger­a­tion? Well, con­sider what the active admin­is­tra­tor User:­Fu­ture Per­fect at Sun­rise wrote in the WP:AN/I dis­cus­sion:

Hmm, strange exper­i­ment. Given the huge num­ber of inap­pro­pri­ate exter­nal links we have, I really won­der: would­n’t a ran­dom removal of a hun­dred links catch so many bad links objec­tively wor­thy of removal that the net effect of the “van­dal­ism” might be more ben­e­fit than harm? If the exper­i­ment is meant to mea­sure how good the com­mu­nity is at revert­ing van­dal­ism, I can’t see how they can do that with­out hav­ing a mea­sure for these ran­dom ben­e­fi­cial hits.

None of the commenters rose to my challenge to estimate what the reversion rate should be, with the exception of the administrator User:Horologium (who identifies as a transwiki-ing exclusionist26, which in practice means deletionism) who looked at 19 articles and estimated that ~30% of ELs were bad by his standards (so we can infer that a reversion rate of anything but 70% will highly likely either be allowing good links to be deleted or defending bad links by his standards).

##### Results
1. Neither IP address was contacted, blocked, or banned at any point in the experiment

2. One arti­cle was delet­ed; my edit was not reverted before dele­tion (ac­cord­ing to the admin Toby Bartel)

3. Of the 100 edits, 3 were revert­ed:

3% is far worse than I had pre­dict­ed, and sta­tis­ti­cally sug­gests that the true rate is no higher than 7%27. This leads to one con­clu­sion: exter­nal links are highly vul­ner­a­ble to dele­tion­ism.
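The footnoted calculation is not reproduced here, but a bound of this kind is easy to sanity-check. The sketch below assumes a one-sided exact binomial (Clopper-Pearson) upper bound at the 95% level, which may not be the method behind the quoted figure; `binom_cdf` and `upper_bound` are illustrative names.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k, n, alpha=0.05):
    """One-sided exact upper confidence bound for a rate.

    Finds, by bisection, the largest p for which seeing k or fewer
    events in n trials still has probability at least alpha.
    """
    lo, hi = k / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # k or fewer is still plausible, so the bound is higher
        else:
            hi = mid
    return (lo + hi) / 2


bound = upper_bound(3, 100)
print(f"observed 3/100 = 3.0%; 95% upper bound = {bound:.1%}")
```

Under these assumptions the bound lands between 7% and 8%, the same neighborhood as the figure quoted above; other interval methods shift it by a point or so without changing the conclusion.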

##### Followup

A month after this exper­i­ment, I resur­veyed the 100 edits to see how many restora­tions had been revert­ed. 4 had been revert­ed:

1. Castell Dinas Bran
2. Pro­tec­tor (2009 film) (no expla­na­tion)
3. Osprey Pub­lish­ing (part of a whole­sale dele­tion of links)
4. Mar­i­lyn vos Savant (this one is ques­tion­able as well; link­ing to the Parade home­page seems dis­tinctly less use­ful to the reader than link­ing to Parade’s back­-archives where vos Savan­t’s columns are…)

Those who think that 3% was the correct reversion rate for the removals are invited to explain how 4% could be the correct reversion rate for the re-adding of the same links: if it was acceptable for 97% to be removed in the first place, how could it also be acceptable for 96% to then be restored?

#### Tallying the Damage

##### Ignoti, sed non occulti

One might try to defend this waste­ful prac­tice by claim­ing that some edi­tors and read­ers will go to the Talk page and there might notice and visit the deleted links. This could only ame­lio­rate the prob­lem slight­ly, but it’s worth inves­ti­gat­ing just how rarely Talk pages are vis­ited so we can explode this par­tic­u­lar instance of the ‘fal­lacy of the invis­i­ble’. How many of our read­ers actu­ally look at the talk page as well? (Do a quick esti­mate, as before, so you can know if you were right or wrong, and by how much.) I know some writ­ers writ­ing arti­cles on Wikipedia have men­tioned or rhap­sodized at length on the inter­est of the talk pages for arti­cles, but they are rare birds and sta­tis­ti­cally irrel­e­vant.

It might be enough simply to know how much traffic to talk pages there is, period. I doubt editors make up much of Wikipedia’s traffic, with the shriveling of the editing population, which never kept pace with the growth into a top 10/20 website, so that would give a good upper bound. It would seem to be very small; there’s not a single Talk page in the top 1000 on stats.grok.se’s top articles. We can look at individual articles; Talk:Anime has 273 hits over one month while the article has 128,657 hits (a factor of 471); or Talk:Barack Obama with 1800 over that month compared to the article with its 504,827 hits (a factor of 280).

The raw stats used by stats.grok.se are available for download, so we can look at all page hits, sum all article and all Talk hits, and see what the ratio is for the entire English Wikipedia on one day. (Each file seems to be an hour of the day, so I downloaded 24 and gunzipped them all.) We do some quick shell scripting. To find the aggregate hits for just talk pages:

grep -e '^en Talk:' -e '^en talk:' pagecounts-* | cut -d ' ' -f 3 | paste -sd + | bc
582771

To find aggre­gate hits for non-talk pages:

grep -h -e '^en ' pagecounts-* | grep -v -e '^en Talk:' -e '^en talk:' | cut -d ' ' -f 3 | paste -sd + | bc
202680742

The numbers look sane: 582,771 for all talk page hits versus 202,680,742 for all non-talk page hits. A factor of 347 is pretty much around where I was expecting based on those previous 2 pages. The traffic data developer, Domas, says the statistics exclude API hits but include logged-in editor hits, so we can safely say that anonymous users made far fewer than 583k Talk page views that day and hence the true ratios are worse than our previous ratios of 471/280/347. To put the relative numbers into proper perspective, we can convert into percentages:

• If we take the absolutely most favorable ratio, Obama’s at 280, and then further assume it was looked at by 0 logged-in users (yeah right), then that implies something posted on its talk page will be seen by <0.36% of interested readers ($\frac{1}{280} \approx 0.0036$).
• If we use the aggregate statistic and say, generously, that registered users make up only 90% of the talk page views, then something on the talk page will be seen by <0.029% of interested readers ($\frac{1}{347} \times 0.1 \approx 0.00029$).
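
The ratios and percentages in this section can be recomputed from the hit counts already quoted. This is a minimal sketch using only those numbers; the 10% reader share is the same generous assumption as in the second bullet, and the names are illustrative.

```python
# (article hits, Talk hits) as quoted above: one month for the two
# articles, one day for the whole English Wikipedia.
views = {
    "Anime": (128_657, 273),
    "Barack Obama": (504_827, 1_800),
    "en.wikipedia, one day": (202_680_742, 582_771),
}


def talk_ratio(article_hits, talk_hits):
    """How many article views there are for each Talk page view."""
    return article_hits / talk_hits


for name, (article_hits, talk_hits) in views.items():
    ratio = talk_ratio(article_hits, talk_hits)
    print(f"{name}: 1 Talk view per {ratio:.1f} article views "
          f"({100 / ratio:.3f}% of article views)")

# Assumption from the second bullet: only 10% of Talk views are readers.
aggregate = talk_ratio(*views["en.wikipedia, one day"])
reader_share = 0.10
print(f"aggregate, readers only: {100 * reader_share / aggregate:.4f}%")
```

The results agree with the figures in the text up to rounding; the text truncates the ratios to whole numbers.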
###### Measuring Talk page clicks: dual n-back experiment

Page views don’t tell us the most inter­est­ing thing, how many peo­ple would have clicked on the link if it had been on the arti­cle and not the Talk page. It’s impos­si­ble to answer this ques­tion in gen­er­al, unfor­tu­nate­ly, since Wikipedia does not track clicks.

However, I have approximated the ratio for at least one article: the dual n-back article links to my DNB FAQ. There are a few dozen visitors each day from Wikipedia, Google Analytics tells me. What will happen if the link is moved to the Talk page? The article and general interest in n-back haven’t changed; those variables are still the same. The same sort of people will be visiting the article and (not) visiting the Talk page. The visitor count will dramatically fall, probably to less than 1 a day. The link was in the article for perhaps half a year, since ~2011-07-14; on 2012-02-09, I shifted it to the Talk page with a fake message praising the contents, to mimic how an editor might genuinely post the link on the Talk page (asking the forbearance & cooperation of my fellow editors in hidden comments). I then scheduled a followup for 100 days: 2012-05-19.

It ought to be triv­ial and point­less—ev­ery­one should acknowl­edge that essen­tially no read­ers also read Talk pages, but it’s still worth pre­com­mit­ting: I pre­dict that Talk click­-throughs will aver­age <5% of Arti­cle click­-throughs, and the dif­fer­ence between the 2 datasets will be sta­tis­ti­cal­ly-sig­nif­i­cant at p < 0.05.

As promised, on 2012-05-20 I restored my FAQ link and began analy­sis:

1. Before:

Between 2011-07-14 and 2012-02-08 (a longer period), the totals were 31,454/23,538 (pageview/unique pageview), with 1,910/1,412 from the English Wikipedia and, as one would expect, a lesser 740/618 from the German Wikipedia28. n = 209, so the daily average of clicks from the English Wikipedia is $\frac{1910}{209} \approx 9.14$.

2. After:

Between 2012-02-10 and 12:50 PM 2012-05-20, my DNB FAQ received from all sources 21,803/16,899 page views (raw/unique). 327/164 page views were from the German Wikipedia, and there were 161/155 page views from the English Wikipedia. n = 100, so the daily average is $\frac{161}{100} = 1.61$.

Dividing the two averages shows that the average clicks in this period were ~17.6% of the earlier rate, not <5% as I had predicted. This difference between the two groups is statistically-significant at p < 0.001, needless to say29.
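The division can be spelled out from the English Wikipedia referral counts given above (raw page views, with n as the number of days in each period). This is only the arithmetic, not the significance test from the footnote; the variable names are illustrative.

```python
# English Wikipedia referrals to the FAQ (raw page views) and days observed,
# as reported above.
before_clicks, before_days = 1_910, 209  # link in the Article
after_clicks, after_days = 161, 100      # link on the Talk page

before_rate = before_clicks / before_days
after_rate = after_clicks / after_days
ratio = after_rate / before_rate

print(f"Article: {before_rate:.2f} clicks/day")
print(f"Talk:    {after_rate:.2f} clicks/day")
print(f"Talk as a share of Article: {ratio:.1%}")
```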

So, Talk page click-throughs are indeed lower than Article click-throughs, but more than 3 times larger than I expected. What happened? We know this can’t be the general case from looking at the stats.grok.se data: there just isn’t enough traffic to Talk pages for any reason.

My best guess is that the dual n-back article is simply a bad example. If we look at the April 2012 data as an example, we see that it gets something like 15 page views a day with occasional spikes and troughs, 568 visits over 30 days averaging 19 visits a day. There were 9 click-throughs on average during the previous sample, suggesting that something like half the readers are clicking through to one external link! This does not sound like “normal” article behavior, and suggests to me that the very short and incomplete nature of the dual n-back Wikipedia article is causing readers to look for further better information like my FAQ, which might cause readers to also resort to checking the talk page for information (where they would run into my glowing fake blurb visible on the first screen). Unfortunately, I cannot check this theory because currently only one article links to my site where I can gather Google Analytics information.

##### The Forgotten Reader

More instruc­tive is esti­mat­ing how many read­ers have been deprived of the chance to use the ref­er­ences for just the sub­set of 1206 edits we have already looked at above. We can reuse stats.grok.se with a lit­tle more pro­gram­ming; we will ask it how many hits/page-views, in total, there were in Novem­ber 2011 of the 472 unique arti­cles cov­ered by those 1206 edits.

The total: 8,480,394.

Extrap­o­lat­ing back­wards to 2007/2008 is left as an exer­cise for the read­er.

When we consider how false the idea is that this practice serves the editor, and how many readers are ill-served, the conclusion is that the common practice of ‘moving reference/link to the Talk page’ should be named for what it is: a subtle form of deletion.

It would be a service to our readers to end this practice entirely: if a link is good enough to be hidden on a Talk page (supposedly in the interests of incorporating it in the future, which we have seen is an empty promissory note), then it is good enough to put at the end of External Links or a Further Reading section, and the literally millions of affected readers will not be deprived of the chance to make use of them.

I fully expect to see this prac­tice for years to come.

## No club that would have me

“Elab­o­rate euphemisms may con­ceal your intent to kill, but behind any use of power over another the ulti­mate assump­tion remains: ‘I feed on your ener­gy.’”

Frank Her­bert’s (“Addenda to Orders in Coun­cil—The Emperor Paul Muad’dib”)

This result will come as no surprise to longtime inclusionists. The deletion process deletes most articles which enter it, and has long been complained about by outsiders. Entire communities (such as the 30 or online communities31) have been alienated by purges of articles, purges which not infrequently result in abuse of process, much newbie biting, and comical spectacles like AfD regulars (usually deletionists) insisting a given article is absolutely non-notable and experts in the relevant field demurring; a particularly good AfD may see statements of experts dismissed on speciously procedural grounds such as having been made in the expert’s blog (and so failing WP:RS, or perhaps simply being dismissed as WP:OR) and not a traditional medium (despite the accelerating abandonment of ‘traditional’ RSs by experts in many fields32). The trend has been clear. Andrew Lih, who has been editing Wikipedia even longer than myself (since 2003) and who wrote on Wikipedia, writes in “Unwanted: New articles in Wikipedia”:

’It’s incred­i­ble to me that the com­mu­nity in Wikipedia has come to this, that arti­cles so obvi­ously “keep” just a year ago, are being chal­lenged and locked out. When I was active back on the mail­ing lists in 2004, I was a well known dele­tion­ist. “Wiki isn’t paper, but it isn’t an attic,” I would say. Selec­tiv­ity mat­ters for a qual­ity ency­clo­pe­dia. But it’s a whole dif­fer­ent mood in 2007. Today, I’d be labeled a wild eyed inclu­sion­ist. I sus­pect most vet­eran Wikipedi­ans would be labeled a bleed­ing heart inclu­sion­ist too. How did we raise a new gen­er­a­tion of folks who want to wipe out so much, who would shoot first, and not ask ques­tions what­so­ev­er? [If Lih can write this in 2007, you can imag­ine how peo­ple who iden­ti­fied as inclu­sion­ists in 2004, such as myself or The Cunc­ta­tor, look to Wikipedi­ans who recently joined.]

It’s as if there is a cul­ture now in Wikipedia. There are throngs of dele­tion happy users, like grumpy old gate­keep­ers, toss­ing out cus­tomers and arti­cles if they don’t com­ply to some new prickly hard-nosed stan­dard. It used to be if an arti­cle was short, some­one would add to it. If there was spam, some­one would remove it. If facts were ques­tion­able, some­one would research it. The beauty of Wikipedia was the human fac­tor—rea­son­able peo­ple inter­act­ing and col­lab­o­rat­ing, build­ing off each oth­er’s work. It was impor­tant to start stuff, even if it was­n’t com­plete. Assume good faith, neu­tral point of view and if it’s not right, {{sofixit}}. Things would grow.’

I was par­tic­u­larly depressed to read in the com­ments things from admin­is­tra­tors whose names I rec­og­nize due to their long tenure on Wikipedia, like Lly­wrch (joined 2002):

“I’m sorry that you encountered that, Andrew—but not surprised. I had my own encounter with the new generation of “quote policy, not reasoning” deletionists; I feel as if I encountered (to quote from the song) “the forces of evil from a bozo nightmare.” No one—including me—looked good after that exchange. (I keep thinking that I should have said something different, but the surrealism of the situation multiplied with the square of my frustration kept me from my best.)”

Or Stbal­bach:

“I’m a long time editor, since 2003, ranked in the top 300 by number of edits (most in article space). On May 11th 2007 I mostly gave up on Wikipedia—there is something wrong with the community, in particular people deleting content. I’d never seen anything like it prior to late 2006 and 2007. Further, the use of “nag tags” at the top of articles is out of hand. It’s easier to nag and delete than it is to research and fix. Too many know-nothings who want to “help” have found a powerful niche by nagging and deleting without engaging in dialog and simply citing 3 letter rules. If a user is unwilling or incapable of working to improve an article they should not be placing nag tags or deleting content.”

Also inter­est­ing is Ta bu shi da yu’s com­ment, inas­much as Ta bu invented the infa­mous {{fact}}:

“I have also seen this hap­pen­ing. It’s incred­i­ble that those who are so incred­i­bly stu­pid can get away with mis­us­ing the speedy dele­tion tag! As for DRV… don’t make me laugh. It seems to be slanted to keep arti­cles delet­ed. I can’t agree more with your sen­ti­ments that if you know all the codes to WP:AFD, then you are a men­ace to Wikipedia.”

Why is this cul­ture chang­ing? In part because arti­cle writ­ing seems to get no more respect. A review arti­cle sum­ma­rizes the find­ings of Burke and Kraut 200833:

…it is prov­ing increas­ingly hard to become a Wikipedia admin­is­tra­tor: 2,700 can­di­dates were nom­i­nated between 2001 and 2008, with a suc­cess rate of 53%. The rate has dropped from 75.5% until 2005 to 42% in 2006 and 2007. Arti­cle con­tri­bu­tion was not a strong pre­dic­tor of suc­cess. The most suc­cess­ful can­di­dates were those who edited the Wikipedia pol­icy or project space; such an edit is worth ten arti­cle edits.

What sort of edi­tor, with a uni­verse of fas­ci­nat­ing top­ics to write upon, would choose to spend most of his time on the pol­icy name­space? What sort of edi­tor would choose to stop writ­ing arti­cles?34 Admin­is­tra­tors with min­i­mal expe­ri­ence in cre­at­ing con­tent—and much expe­ri­ence in destroy­ing it and rewrit­ing the rules to per­mit the destruc­tion of even more. Is this not almost the oppo­site of what one wants? And imag­ine how the authors must feel! An arti­cle is not a triv­ial under­tak­ing; some­time sit down, select a ran­dom sub­ject, and try to write a well-or­ga­nized, flu­ent, com­pre­hen­sive, and accu­rate ency­clo­pe­dia arti­cle on it. It’s not as easy as it looks, and it’s even harder to write a well-ref­er­enced and cor­rectly for­mat­ted one. To have an arti­cle deleted is bad enough; I can’t imag­ine any neo­phyte edi­tors want­ing to have any­thing to do with Wikipedia if an arti­cle of theirs got rail­roaded through AfD. It is eas­ier to destroy than to cre­ate, and destruc­tion is infec­tious. (In the study of 3.3 years of the online SF game , play­ers were found to ‘pay it for­ward’ when the sub­ject of neg­a­tive actions; the com­mu­nity was only saved from an epi­demic of attacks by the high mor­tal­ity & quit­ting rate of neg­a­tive edi­tors—I mean, neg­a­tive play­ers35.)

Delet­ing arti­cles and pil­ing on pol­icy after guide­line after pol­icy are both directly opposed to why Wikipedi­ans con­tribute! When sur­veyed in 2011:

‘The two most fre­quently selected rea­sons for con­tin­u­ing to edit Wikipedia were “I like the idea of vol­un­teer­ing to share knowl­edge” (71%) and “I believe that infor­ma­tion should be freely avail­able to every­one” (69%), fol­lowed by “I like to con­tribute to sub­ject mat­ters in which I have exper­tise” (63%) and “It’s fun” (60%).’

And iron­i­cal­ly, the more effort an edi­tor pours into a topic and the longer & more detailed the arti­cle becomes, the more blind hatred it inspires in dele­tion­ists. If you look at AfDs for small arti­cles or stubs, the dele­tion­ists seem pos­i­tively lucid & ratio­nal; but make the arti­cle 50kB long, and watch the rhetoric fly. I call this the fan­cruft effect: dele­tion­ists are men­tally aller­gic to infor­ma­tion they do not care about or like.

If a dele­tion­ist sees an arti­cle on “Lightsaber com­bat”36 and it’s just a page long, then he has lit­tle prob­lem with it. It may strike him as too big, but rea­son­able. But if the arti­cle dares to be com­pre­hen­sive, if it is clearly the prod­uct of many hours’ labor on the part of mul­ti­ple edi­tors, if there are touches like ref­er­ences and quotes—then some­thing is wrong on the Inter­net, the very uni­verse is out of joint that this arti­cle has been so well-de­vel­oped when so many more deserv­ing top­ics lan­guish, it is a cos­mic injus­tice. A dirty beg­gar is parad­ing around act­ing like an emper­or. The arti­cle does not know its place. It needs to be smacked down and hard. And who bet­ter than the dele­tion­ist?

What is the ulti­mate sta­tus-low­er­ing action which one can do to an edi­tor, short of actu­ally ban­ning or block­ing them? Delet­ing their arti­cles.

In a par­tic­u­lar sub­ject area, who is most likely to work on obscurer arti­cles? The experts and high­-­value edi­tors—they have the resources, they have the inter­est, they have the com­pe­ten­cy. Any­one who grew up in Amer­ica post-1980 can work on [[Darth Vader]]; many fewer can work on [[Grand Admiral Thrawn]]. Any­one can work on [[Basho]]; few can work on [[Fujiwara no Teika]].

What has Wikipedia been most likely to delete in its deletionist shift over the years? Those obscurer articles.

The proof is in the pud­ding: all the high-value/status Star Wars edi­tors have decamped for some­where they are val­ued; all the high-value/status Star Trek edi­tors, the Lost edi­tors… the list goes on. They left for a com­mu­nity that respected them and their work more; these spe­cific exam­ples are strik­ing because the edi­tors had to make a com­mu­ni­ty, but one should not sup­pose such depar­tures are lim­ited to fic­tion-re­lated arti­cles. There may be evap­o­ra­tive cool­ing of the com­mu­nity but it’s not towards the obses­sive fans.

“The great­est plea­sure is to van­quish your ene­mies and chase them before you, to rob them of their wealth and see those dear to them bathed in tears, to ride their horses and clasp to your bosom their wives and daugh­ters.”

Attrib­uted to

Out­siders! I real­ize it might sound like a stretch that any­one enjoys the power of nom­i­nat­ing arti­cles, that being a dele­tion­ist could be a joy­ful role. You say you under­stand how admin­is­tra­tors (with their abil­ity to directly delete, to ban, to roll­back etc.) could grow drunk on pow­er, but how could AfD nom­i­na­tions lead to such a feel­ing?

But I know from per­sonal expe­ri­ence that there is power exer­cised in nom­i­nat­ing for dele­tion. Well do I know the dark arts of gam­ing the sys­tem: of the clever use of tem­plates, of the process of delet­ing the arti­cle by care­fully chal­leng­ing and remov­ing piece after piece, of invok­ing the appro­pri­ate guide­lines and poli­cies to demol­ish argu­ments and ref­er­ences.

I have seen the wails and groans in the edit sum­maries & com­ments of my oppo­nents, and exulted in their defeat. It’s very real, the temp­ta­tion of exer­cis­ing this pow­er. It’s easy to con­vince your­self that you are doing the right thing, and merely enforc­ing the policies/guidelines as the larger com­mu­nity set them down. (Were all my nom­i­na­tions just? No, but I have suc­ceeded in fool­ing myself so well that I can no longer tell which ones truly did deserve dele­tion and which ones were deleted just because I dis­liked them or their authors.)

Who can say how many authors take it per­son­al­ly? The dele­tion process is inher­ently insult­ing: “Out of 2.5 mil­lion arti­cles, yours stands out as suck­ing so badly that it is irre­deemable and must be oblit­er­at­ed.” And it is ulti­mately sad37—life is short but must that be true of arti­cles as well as men?

# A Personal Look Back

“Once more and they think to thank you.”

I have , so I think I can speak from first-­hand expe­ri­ence here.

The problem with devoting this much effort to Wikipedia is not that your time is wasted. If you get this far, you’ve absorbed enough that you know how to make edits that will last and how to defend your material, and this guy in particular is making edits in areas particularly academic and safe from deletionists; and your articles will receive hundreds or thousands of visits a month (see stats.grok.se—I was a little shocked at how many page hits my articles collectively represent a month).

The prob­lem is that the ben­e­fits are going entirely to your read­ers. It’s a case-s­tudy in . Unlike FLOSS or other forms of cre­ation which build a port­fo­lio, you don’t even get intan­gi­bles like rep­u­ta­tion—to the extent any reader thinks about it, they’ll just men­tally thank the Wikipedia col­lec­tive. When you make 10,000 edits to your per­sonal wiki, you will prob­a­bly have writ­ten some pretty decent stuff, you will have estab­lished a per­sonal brand, etc. Maybe it’ll turn out great, maybe it’ll turn out to be worth noth­ing. But when you make 10,000 edits to Wikipedia, you are guar­an­teed to get noth­ing.

No doubt one can point to the occa­sional Wikipedia edi­tor who has ben­e­fited with a book con­tract or a job or some­thing. But what about all the other edi­tors in Wikipedi­a:List of Wikipedi­ans by num­ber of edits?

To again turn to myself; when I was pour­ing much of my free energy and research inter­est into improv­ing Wikipedia, I got noth­ing back except sat­is­fac­tion and being able to point peo­ple at bet­ter arti­cles dur­ing dis­cus­sions. I began writ­ing things that did­n’t fit on Wikipedia and got a per­sonal web­site because I did­n’t want to use some flaky free ser­vice, and the world did­n’t end. I now have an actual rep­u­ta­tion among some peo­ple; on occa­sion, peo­ple even email me with job offers to write things, hav­ing learned of me from my web­site. I owe my cur­rent (very mod­est) liv­ing to my writ­ings being clearly mine, and not “ran­dom stuff on Wikipedia”. I’m not say­ing any of this is very impres­sive, but I am say­ing that these are all ben­e­fits I would not have received had I con­tin­ued my edit­ing on Wikipedia. Now I occa­sion­ally add exter­nal links, and I try to defend arti­cles I pre­vi­ously wrote. Once in a blue moon I post some highly tech­ni­cal or fac­tual mate­r­ial I believe will be safe against even hard­core dele­tion­ists. But my glory days are long over. The game is no longer worth the can­dle.

Wikipedia is won­der­ful, but it’s sad to see peo­ple sac­ri­fic­ing so much of them­selves for it.

# What Is To Be Done?

Wikipedia was enabled by soft­ware. It enabled a com­mu­nity to form. This com­mu­nity did truly great work; it’s often said Wikipedia is his­toric, but I think most peo­ple have lost sight of how his­toric Wikipedia is as it fades into the back­ground of mod­ern life; per­haps only schol­ars of the future have enough per­spec­tive on this leviathan, in the same way that Diderot’s ency­clo­pe­dia was—­for all the con­tro­versy and ban­ning—not given its full due at pub­li­ca­tion. (But how could it? Ency­clo­pe­dias are more processes than fin­ished works, and of no ency­clo­pe­dia is this more true than Wikipedi­a.)

That community did great work, astonishing in breadth and depth, I said. But that community is also responsible for misusing the tools. If vandalism is easier to remove than create, then it will tend to disappear. But AfD is not vandalism. There are no technical fixes for deletionist editors. As long as most editors hold weak views, are willing to stand by while ‘nerdy’ topics feel the ax, and think ‘deletionists mostly get it correct’, the situation will not change.

Could dele­tion be a pos­i­tive feed­back cycle? Will the waves of dele­tion con­tinue to encour­age edi­tors to leave, to not sign up, to let the dele­tion­ists con­tinue their grisly work unop­posed, until Wikipedia is a shell of what it was?

Like the cool­ing dwarf star left by a super­nova—its lost bril­liance trav­el­ing onwards to eter­ni­ty.

# Appendices

## Analysis script

The following Haskell program requires the Haskell base libraries plus the TagSoup and HTTP libraries (cabal install tagsoup HTTP; the latter provides the Network.HTTP module imported below). The script is parallel. One can compile it like ghc -threaded -rtsopts -O2 script.hs; to run it, one pipes in a newline-delimited list of Talk page diff URLs, like ./script +RTS -N4 -RTS < urls.txt, which will then print out a summary like

Checked 1024 edits
112 were used

The fol­low­ing sec­tions pro­vide 3 lists of selected edits which one could input.
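Before the full listing, a network-free sketch of its matching step may be useful. It assumes, as the script appears to, that a suggested link counts as "used" when a URL occurring exactly once in the Talk-page diff text also occurs among the article's external links. The helper name wasUsed and the example URLs are hypothetical, introduced only for illustration; trimmer here is written with guards rather than copied verbatim, and like the script it recognizes only http:// links.

```haskell
import Data.List (elemIndices, intersect, isPrefixOf, nub)

-- Crop a string down to its first "http://" URL, or "" if it has none.
trimmer :: String -> String
trimmer [] = []
trimmer y
  | "http://" `isPrefixOf` y = takeWhile (`notElem` "[] ") y
  | otherwise                = trimmer (tail y)

-- Every URL mentioned in one chunk of wikitext.
urlsExtract :: String -> [String]
urlsExtract = filter (not . null) . map trimmer . words

-- Drop any item occurring more than once: a URL repeated in a diff is
-- assumed to sit on both the old and new sides, so it was not newly added.
rmDupes :: Eq a => [a] -> [a]
rmDupes xs = filter (\y -> length (elemIndices y xs) <= 1) xs

-- Hypothetical helper (not in the original script): does any URL newly
-- added in the Talk diff text also appear among the article's links?
wasUsed :: [String] -> [String] -> Bool
wasUsed talkText articleUrls =
  not (null (rmDupes talkUrls `intersect` nub articleUrls))
  where talkUrls = concatMap urlsExtract talkText
```

Loaded into GHCi, wasUsed ["* [http://a.example/review Review]"] ["http://a.example/review"] evaluates to True, while a URL that shows up twice in the diff text is discarded by rmDupes and so cannot count as a match.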

import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar, MVar)
import Control.Monad (liftM, void)
import Data.List (elemIndices, intersect, isPrefixOf, nub, sort)
import Network.HTTP (getRequest, rspBody, simpleHTTP)
import Text.HTML.TagSoup (parseTags, Tag(TagOpen, TagText))

main :: IO ()
main = do args <- liftM lines getContents

          results <- mapM parallel args
          results' <- mapM takeMVar results
          count <- liftM (length . filter id) $ sequence results'

          putStrLn $ "Checked " ++ show (length args) ++ " edits"
          putStrLn $ show count ++ " were used"

parallel :: String -> IO (MVar (IO Bool))
parallel s = do m <- newEmptyMVar
                _ <- forkIO $ void (putMVar m (comparePages s))
                return m

comparePages :: String -> IO Bool
comparePages url = do src <- liftM parseTags $ openURL url
                      let talkUrls = rmDupes $ concatMap urlsExtract $ text $ diff src

                      artcl <- liftM parseTags $ openURL (article url)
                      let articleUrls = uniq $ extractURLs artcl

                      return $ 0 /= length (talkUrls `intersect` articleUrls)
 where uniq = nub . sort -- don't double-count in `intersect`!
       -- throw out any URL appearing twice in a diff -
       -- must be an old URL in both new and old pages!
       -- only unique URLs could have been added.
       rmDupes x = filter (\y -> length (elemIndices y x) <= 1) x

openURL :: String -> IO String
openURL u = do res <- simpleHTTP $ getRequest u
               case res of
                 Left _ -> return ""
                 Right y -> return $ rspBody y

-- pull all text; hopefully, out of the diff-only part of the HTML page
text :: [Tag String] -> [String]
text src = [x | (TagText x) <- src]

urlsExtract :: String -> [String]
urlsExtract = filter (not . null) . map trimmer . words

{- crop a string down to the "http://" prefix, or return nothing
> trimmer "* http://foo.com"       → "http://foo.com"
> trimmer "* [http://foo.com]"     → "http://foo.com"
> trimmer "* [http://foo.com Foo]" → "http://foo.com"
> trimmer "# It will break!"       → "" -}
trimmer :: String -> String
trimmer [] = []
trimmer y = fst $ break (\x -> x=='[' || x ==']' || x==' ') $
              if "http://" `isPrefixOf` y then y else trimmer (tail y)

-- "http://en.wikipedia.org/w/index.php?title=Talk:Princess_Jellyfish&diff=prev&oldid=403146531"
-- → "https://en.wikipedia.org/wiki/Princess_Jellyfish"
article :: String -> String
article url = "https://en.wikipedia.org/wiki/" ++ takeWhile (/= '&') (drop 47 url)

-- pull out all external links (but not local relative links)
extractURLs :: [Tag String] -> [String]
extractURLs arg = [x | TagOpen "a" atts <- arg, (_,x) <- atts, "http://" `isPrefixOf` x]

-- cut everything up to "<table class='diff diff-contentalign-left'>", then cut everything after
-- "<!-- diff cache key enwiki: ... --> </table><hr class='diff-hr' />"
diff :: [Tag String] -> [Tag String]
diff = takeWhile ast' . dropWhile ast
 where ast, ast' :: Tag String -> Bool
       ast x = case x of
                 TagOpen "table" [("class","diff diff-contentalign-left")] -> False
                 _ -> True
       ast' x = case x of
                  TagOpen "hr" [("class","diff-hr")] -> False
                  _ -> True

## stats.grok.se script

This is compiled and run much the same way, minus the -rtsopts -threaded options (it is not parallel).

import Data.List (isInfixOf, nub, sort)
import Network.HTTP (getRequest, rspBody, simpleHTTP)
import Text.HTML.TagSoup (parseTags, Tag(TagText))

main :: IO ()
main = do stats <- fmap (nub . sort . map article . lines) getContents
          srcs <- mapM openURL stats
          print $ sum $ map total srcs

openURL :: String -> IO String
openURL u = do res <- simpleHTTP $ getRequest u
               case res of
                 Left _ -> return ""
                 Right y -> return $ rspBody y

article :: String -> String
article url = "http://stats.grok.se/en/201111/" ++ takeWhile (/= '&') (drop 47 url)

-- 'text' is presumably the helper defined in the analysis script above
total :: String -> Int
total s = read (head $ text $ parseTags s) :: Int
-- target: TagText " has been viewed 215 times in 201111.

Talk:Xam%27d:_Lost_Memories&diff=prev&oldid=403146136
Talk:Tantei_Opera_Milky_Holmes&diff=prev&oldid=403146076
Talk:Cross_Game&diff=prev&oldid=403146038
Talk:KimiKiss&diff=prev&oldid=403145989

1. From “Digital Filters II” in The Art of Doing Science and Engineering, 1997:

This is exactly the same mistake which was made endlessly by people in the early days of computers. I was told repeatedly, until I was sick of hearing it, computers were nothing more than large, fast desk calculators. “Anything you can do by a machine you can do by hand.”, so they said. This simply ignores the speed, accuracy, reliability, and lower costs of the machines vs. humans. Typically a single order of magnitude change (a factor of 10) produces fundamentally new effects, and computers are many, many times faster than hand computations. Those who claimed there was no essential difference never made any significant contributions to the development of computers. Those who did make significant contributions viewed computers as something new to be studied on their own merits and not as merely more of the same old desk calculators, perhaps souped up a bit. ↩︎

2. Yvain, . The connection to Wikipedia is obvious.↩︎

3. Abstract of “Citation Advantage of Open Access Legal Scholarship”, Donovan & Watson 2011. Are legal scholars lazy? Are law libraries ill-funded? Do legal scholars have little incentive to write well-researched papers? And yet, making papers a little easier to access results in a dramatic difference in citation. Swan 2010 surveyed 31 studies and found 27 showing benefits to OA.
For example, benefits to were found in by , and found similar results for computer science articles online or offline:

The mean number of citations to offline articles is 2.74, and the mean number of citations to online articles is 7.03, or 2.6 times greater than the number for offline articles. These numbers mask variations over time—in particular, older articles have more citations on average, and older articles are less likely to be online. When considering articles within each year, and averaging across all years from 1990 to 2000, we find that online articles are cited 4.5 times more often than offline articles. We also analyzed differences within each publication venue, where multiple years for the same conference are considered as separate venues. We computed the percentage increase in the average number of citations to online articles compared to offline articles. When offline articles were more highly cited, we used the negative of the percentage increase for offline articles. For example, if the average number of citations for offline articles is 2, and the average for online articles is 4, the percentage increase would be 100%. For the opposite situation, the percentage increase would be -100%. Figure 2 shows the results. Averaging the percentage increase across 1,494 venues containing at least five offline and five online articles results in an average of 336% more citations to online articles compared to offline articles published in the same venue [the first, second (median), and third quartiles of the distribution are 58%, 158%, and 361%]. ↩︎

4. On the other hand, one economics study showed no benefit, and Craig et al 2007 found no benefit in one physics subfield:

Three non-exclusive postulates have been proposed to account for the observed citation differences between OA and non-OA articles: an open access postulate, a selection bias postulate, and an early view postulate. The most rigorous study to date (in condensed matter physics) showed that, after controlling for the early view postulate, the remaining difference in citation counts between OA and non-OA articles is explained by the selection bias postulate. No evidence was found to support the OA postulate per se; i.e. article OA status alone has little or no effect on citations. Further studies using a similarly rigorous approach are required to determine the generality of this finding. ↩︎

5. , PayPal co-founder; pg 11, Founders at Work↩︎

6. Joel on Software, “FogBugz”↩︎

7. Joel on Software, “Strategy Letter III: Let Me Go Back!”↩︎

8. This exploratory study analyses the content of the search queries that led Australian Internet users from a search engine to a Wikipedia entry. The study used transaction logs from Hitwise that matched search queries with data on the lifestyle of the searcher. A total sample of 1760 search terms, stratified by search term frequency and lifestyle, was drawn…The results of the study suggest that Wikipedia is used more for lighter topics than for those of a more academic or serious nature. Significant differences among the various lifestyle segments were observed in the use of Wikipedia for queries on popular culture, cultural practice and science. ↩︎

9. pg 136, “4.4 Demographic Analysis of the Wikipedia Community”↩︎

10. ibid. pg 137↩︎

11. The Kaggle background information on the “Wikipedia’s Participation Challenge” includes an interesting extract from the WMF report:

“Between 2005 and 2007, newbies started having real trouble successfully joining the Wikimedia community. Before 2005 in the English Wikipedia, nearly 40% of new editors would still be active a year after their first edit. After 2007, only about 12-15% of new editors were still active a year after their first edit. Post-2007, lots of people were still trying to become Wikipedia editors. What had changed, though, is that they were increasingly failing to integrate into the Wikipedia community, and failing increasingly quickly.” ↩︎

12. Rather than reaching 95k editors, the actual March-July 2012 numbers were 76,274/75,141/76,956/74,402/76,400. In retrospect, my pessimistic 75% prediction that 95k would not be reached was actually ludicrously optimistic, given that the 95k editor mark has never been reached: the high-water mark seems to have been March 2007 with 90,618 editors with >5 edits that month. So we have been shrinking by ~2.8k editors a year: ((91 - 77) / (2012 - 2007)).↩︎

13. The successful recreation of article and the endless deletion debates about Daniel Brandt (crowned in success for the deletionists) again come to mind.↩︎

14. In June 2011, and the WMF announced a “Wikipedia’s Participation Challenge” to develop a better statistical model for predicting editor retention; while the training data was biased, the results are not too surprising: the single best predictor is the frequency of any edits prior to the cutoff. See 2nd place, Ernest Shackleton or contestant Keith T. Herring:

A randomly selected Wikipedia editor that has been active in the past year has approximately an 85% probability of being inactive (no new edits) in the next 5 months.
The most informative features (w/r/t the features I considered) captured both the edit timing and volume of an editor. More specifically the exponentially weighted edit volume of a user (edit weight decreases exponentially with increased time between the edit and the end of the observation period) with a half-life of 80 days provided the most predictive capability among the 206 features included in the model. Other attributes of the edit history, such as uniqueness of articles, article creation, comment behavior, etc. provided some additional useful information, although roughly an order of magnitude or less than the edit timing and volume when measured as global impact across the full non-conditioned editor universe. ↩︎

15. I disagree with parts of Mario’s essay; for example, his first example is wrong as there are countless articles to write from the sister wikis, and many specialist sources like _ have hundreds or thousands of entries that Wikipedia does not (I counted a dozen or so just linking to the articles written by Jonathan Clements—eg. many of the biography redlinks in “” or “”.) And every day, sites like the Anime News Network or New York Times post dozens of reviews or other references that can be easily & profitably worked into articles—but aren’t. One comment makes the good point that the theory of completeness would not predict any flatlining in the smaller and less complete wikis, yet we seem to observe a general flatlining. However, his reason 2 is similar to my own theory about the Seigenthaler affair and the BLP reaction, and his reason 3 is my previous point about process & the invisible/broken window fallacy.↩︎

16. Gardner’s December UK address contained other graphs worth looking at.↩︎

17. Which as links to credentialed sources will be uncontroversial and require little defense, vastly improving the ROI of editing Wikipedia. Wikipedia gets a great deal of traffic, and even highly obscure articles exert surprising influence; one can look at the traffic rates on specific pages with stats.grok.se.↩︎

18. To quote the great computer scientist in 2006:

I think that Wikipedia’s enormously successful, but it’s so brittle, you know, if I was, if I spent a lot of time writing an article for the Wikipedia, and I wanted to make sure nobody screwed it up, I would have to check that article every day to make sure that it was still okay, and you know, after I’ve done that I want to move on and go on to other, other things in my life. With , I wanted stability especially urgently because people are depending on it to be a fixed point that they can build on, so in that respect, I differ from the .

(The GPL contains clauses that users of GPLed code may use the terms of later versions of the GPL, which may fix any legal vulnerabilities or exploits discovered. This is a common practice among copyleft licenses and in fact, the WMF itself cross-licensed the entire set of Wikipedias and other projects from the to as well based on a one-time provision GNU added at WMF’s request.)↩︎

19. "We also found that there have been two bots (computer programs that edit Wikipedia)—BJBot and Jayden54Bot—that automatically notified article editors about AfD discussions and recruited them to participate per the established policy. These bots performed AfD notifications for several months, and offer us an opportunity to study the effect of recruitment that is purely policy driven.
We use a process like one described above to detect successful instances of bot-initiated recruitment: if a recruitment bot edited a user’s talk page, and that user !voted in an AfD within two days, then we consider that user to have been recruited by the bot.

Using the above processes, we identified 8,464 instances of successful recruiting. Table 2 shows a summary of who did the recruiting, and how their recruits !voted. We see large differences in !voting behavior, which suggests that there is bias in who people choose to recruit. (From these data we cannot tell whether the bias is an intentional effort to influence consensus, or the result of social network homophily [14].) Participants recruited by keep !voters were about four times less likely to support deletion as those recruited by delete !voters. The participants that bots recruited also appear unlikely to support deletion, which reflects the policy bias we observed earlier.

To see what effect participant recruitment has on decision quality, we introduce four binary variables: BotRecruit, NomRecruit, DeleteRecruit, and KeepRecruit. These variables indicate whether a bot, the AfD nominator, a delete !voter, or a keep !voter successfully recruited somebody to the group, respectively. Looking back to table 1, we find that regardless of the decision, none of the first three variables has a statistically-significant effect. On the other hand, when a keep !voter recruited someone to the discussion, we see a significant effect: delete decisions are more likely to be reversed.

We offer two possible explanations: the first is that recruitment by keep !voters, biased as it may appear, is a sign of positive community interest, and suggests that the article should be kept. If the community decides otherwise and deletes the article, then decision quality suffers. An alternative explanation is that keep !voter recruitment is a sign of activism among those who prefer to keep the article. These proponents may be especially persistent in maintaining the article’s existence in Wikipedia, even if it requires working to reverse a delete decision." ↩︎

20. It was too hard to extract only the URL(s) being added by a diff, so the script simply extracts all URLs it can find in the diff part of the HTML; so if an editor made 4 edits adding URLs A, B, C, and D, and only A were added to the article, then the script would 4 times extract A-D, spot A in the article, and declare victory. This may account for KrebMarkt’s increased success rate compared to my edits, because she is accustomed to piling up her suggested links in one tidy section.↩︎

21. I added a few links to Talk pages to time how long it took for a KrebMarkt-style edit: to go from the ANN page to a saved and reloaded page which I had checked by eye that the edit was correct was upwards of 30 seconds. >30 seconds times 958 edits is >479 minutes or roughly 8 hours; my excerpting edits take at least 5 minutes to do, so those 248 edits represent >20 hours of work.↩︎

22. Someone might object that picking the last link in an External Link section is not random at all. I am reminded of an anecdote describing a court case involving the draft back in Vietnam, where the plaintiff’s lawyer argued that the little cage and balls method was not random and was unfair because the balls on top were much more likely to be selected. The judge asked, “Unfair to whom?” As well, this methodology, while being quite as random as most methods, carries the usual advantages of determinism: anyone will be able to check whether I did in fact remove only last links which are not official or template-generated in External Link sections.
This is evidence that I did not simply cherry-pick the links that I thought were worst and so least likely to be restored. (If I were going to cherry pick under this procedure, I would have had to invest a great deal more effort: for each removal, I would have to find multiple candidates each of which satisfied the criteria and only then could I pick the worst final link; and then I would have to start over for the next removal, and since I had to check ~10 random articles for a possible final link, this implies for every removal, I’d be looking at something like 40+ random articles to do one removal or 200+ random articles a day! And this deception would have to be deliberate & planned—while most cases of bias are unconscious.)↩︎

23. Some editors pride themselves on detecting vandalism weeks or months after creation; they are highly unusual. When I was spending time reading academic publications on Wikipedia a few years ago, a number of them dealt with quantifying vandalism and reversions; almost all vandalism was reverted within days, and reversions which took longer than a month were very rare (0-10%, to be very generous). This was why I chose to wait a month, because waiting longer added nothing. A week would have been adequate. Relevant research on quantifying reversion rates over time: ↩︎

24. It’s not hard to estimate. Take the list of 100 diffs, and use an editor macro or a shell tool like sed to strip it down to a list of URL-encoded article names like so:

Castell_Dinas_Bran
Ron_O%27Neal
HUD_(video_gaming)
Protector_(2009_film)
...

Then, loop over the list to download the March 2012 summary page for that article, and filter out the total monthly hit-count (since we don’t care about dailies); example code:

$ for URL in `cat articles.txt`
do elinks -dump "http://stats.grok.se/en/201203/$URL" | fgrep " has been viewed "
done
[1]Castell_Dinas_Bran has been viewed 914 times in 201203.
[1]Ron_O'Neal has been viewed 7446 times in 201203.
[1]HUD_(video_gaming) has been viewed 7579 times in 201203.
...

This output is also easy to process with a macro or regexp, and once we have the monthly number for each article, all that remains is totaling them:

sum [914,7446,7579,542,3103,91,1665,5291,2452,102,272,3344,16214,32268,863,10307,476,
     3825,310,205,441,3028,187,94,115,211,207,522,269,182,1324,950,25660,162,14457,
     3881,200,3510,606,430,2048,164,214,136,77,8075,99,255,278,148,525,192,108,295,
     61,597,180,3491,753,527,766,113,1405,770,3683,288,873,26811,131,6625,93,212,
     538,313,7119,212,76,1130,7741,2136,179,263,632,870,714,338,2517,456,90,621,
     1323,316,1125,413,73223,122,12707,6573]
-- 335445

Note that this is probably an underestimate. It took weeks to remove all the links, doing it just 5 or 10 at a time, and the 30 day timer only started when link #100 was removed. So for link #1, something closer to 2 months passed…↩︎

25. His user page states as of 2012-05-19 under “My activities on Wikipedia” that

…My Wikipedia philosophy is quite complex, and defies easy categorization. My ideal for a more perfect Wikipedia would be to create many wikis for pop culture topics and transwiki many of the related articles on Wikipedia to them. (Some of these already exist in a fairly substantial format, such as and ). I see no reason why all 703 episodes of the live-action (and 17 of the 22 animated episodes) should have articles on Wikipedia, when Memory Alpha exists. (Before you go hating on me for that, note that I own all 720 episodes on DVD, as well as all but three of the movies.) This does not make me a deletionist, however.
I also believe in structurism, and a combination of two opposing philosophies mergism and seperatism; merging in small articles rather than deleting them and separating large articles rather than deleting content. I also agree with the tenets of exclusionism, although that also leads back to transwikism again. ↩︎

26. Since no one noticed the 100 removals were connected, we can assume each removal was statistically independent; this lets us calculate a . Specifically, with 3 successes and 100 samples, the 99% is 0-7%. We can derive this from Wolfram Alpha or one’s favorite statistical package if one doesn’t want to crunch the formula oneself. (Incidentally, Wikipedia has 3,960,143 articles as of 2012-06-01 according to Special:Statistics, and I went through perhaps 10 pages for each removal, so the total possible sample size is ~396,014. That 100 samples can give such a good estimate—as long as they are independent—is the same magic that makes things like s work; at least, as a child I found it magical that a sample of <1000 voters could predict so accurately the election results in a population of >300 million people.)↩︎

27. One might wonder why I had so much traffic to an English page; do just that many Germans know English? No, it turns out my link in their page didn’t come with an “English” warning. I added this warning on 2012-05-20, and while there was a major traffic spike after that and then a long outage June-September 2012 where the link was broken due to my own carelessness, the warning seems to have substantially reduced click-throughs according to my analytics.↩︎

28. It’s actually closer to p = 0.00000000000000022.
Assuming one has cleaned up the two CSVs by removing the initial summary data and the final total line, the statistical analysis goes like this:

before <- read.table("https://www.gwern.net/docs/wikipedia/2012-gwern-dnb-wikipedia-before.csv", header=TRUE, sep=",")
after <- read.table("https://www.gwern.net/docs/wikipedia/2012-gwern-dnb-wikipedia-after.csv", header=TRUE, sep=",")
before$Pageviews
[1]  1  0  2  3 12  3  9  3  3  2  3  0  3  1  9 11  5  6  7  7  7  5  5  7  0
[26]  1  9 21  3  6  6 12  5  9  7 13 11 11 11 10 12  5 12 16 13  4 14 14  9  3
[51]  9 11  4 10  5 11  4 21 15  3  7  1  7  4  5  2  4  7  4  5  5 12 14  9  5
[76]  7  3  3 16  9  6 15 12  6  7  4 14  5 13  5 11  3  2 12  2 19  5  5  9 14
[101]  6  6 14 11 17  5  3  2  3  6  8 26  5  8  5 10  9  3  7 11  7  7 17 14 16
[126]  7  3  4  5 13  8  7 11  3  6  7  8  6 11 16 13 15 11  9  5  6  3 11  7  7
[151]  6  7  6  9 11  6  8 16 10  4  5  9 10  3  6  5 11 25  9  9 17 17 23 21 23
[176] 34  8 15 10 21 20 10 12 21 17 11 30 17  6  7  9 17 12 19  6  7 13 12 12 10
[201] 14 11 13 14 13  9 10  6 10  8
after$Pageviews
[1] 7 5 3 5 2 2 3 2 1 2 2 2 0 1 1 3 1 2 3 2 1 5 1 0 1 2 1 0 3 2 0 2 1 0 1 1 4
[38] 2 1 1 1 0 1 0 1 4 0 1 1 0 0 0 0 2 0 0 2 1 0 0 0 1 1 0 1 0 1 1 2 1 2 1 1 3
[75] 2 4 1 3 2 3 3 2 2 1 1 1 1 1 1 3 2 4 1 2 4 2 2 2 2 1 1
wilcox.test(before$Pageviews, after$Pageviews)

Wilcoxon rank sum test with continuity correction

data: before$Pageviews and after$Pageviews
W = 20084, p-value < 2.2e-16

I’m not sure why R is reporting slightly different means than I listed previously, but the final result is not too surprising when you eyeball the data: this is a very large effect size. Specifically, the effect size as Cohen’s d is 1.28 (where 0.5 is described as “medium”, and >0.8 is “large”):

(mean(before$Pageviews) - mean(after$Pageviews)) / sd(append(before$Pageviews, after$Pageviews))
1.275841
↩︎
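The two statistics in footnote 28 can be cross-checked without R. What follows is a minimal sketch in Python, assuming the pageview vectors are copied by hand from the printout above; the function names are illustrative and not taken from any package. `mann_whitney_u` counts pairs directly (a tie counts one half), which is the quantity R's `wilcox.test` reports as W. `cohens_d_combined` mirrors the R one-liner, which divides by the standard deviation of both samples lumped together, while `cohens_d_pooled` uses the pooled within-group standard deviation and is included only for comparison.

```python
from statistics import mean, stdev, variance

# Daily pageviews copied from the R printout in footnote 28.
before = [
    1, 0, 2, 3, 12, 3, 9, 3, 3, 2, 3, 0, 3, 1, 9, 11, 5, 6, 7, 7, 7, 5, 5, 7, 0,
    1, 9, 21, 3, 6, 6, 12, 5, 9, 7, 13, 11, 11, 11, 10, 12, 5, 12, 16, 13, 4, 14, 14, 9, 3,
    9, 11, 4, 10, 5, 11, 4, 21, 15, 3, 7, 1, 7, 4, 5, 2, 4, 7, 4, 5, 5, 12, 14, 9, 5,
    7, 3, 3, 16, 9, 6, 15, 12, 6, 7, 4, 14, 5, 13, 5, 11, 3, 2, 12, 2, 19, 5, 5, 9, 14,
    6, 6, 14, 11, 17, 5, 3, 2, 3, 6, 8, 26, 5, 8, 5, 10, 9, 3, 7, 11, 7, 7, 17, 14, 16,
    7, 3, 4, 5, 13, 8, 7, 11, 3, 6, 7, 8, 6, 11, 16, 13, 15, 11, 9, 5, 6, 3, 11, 7, 7,
    6, 7, 6, 9, 11, 6, 8, 16, 10, 4, 5, 9, 10, 3, 6, 5, 11, 25, 9, 9, 17, 17, 23, 21, 23,
    34, 8, 15, 10, 21, 20, 10, 12, 21, 17, 11, 30, 17, 6, 7, 9, 17, 12, 19, 6, 7, 13, 12, 12, 10,
    14, 11, 13, 14, 13, 9, 10, 6, 10, 8,
]
after = [
    7, 5, 3, 5, 2, 2, 3, 2, 1, 2, 2, 2, 0, 1, 1, 3, 1, 2, 3, 2, 1, 5, 1, 0, 1,
    2, 1, 0, 3, 2, 0, 2, 1, 0, 1, 1, 4,
    2, 1, 1, 1, 0, 1, 0, 1, 4, 0, 1, 1, 0, 0, 0, 0, 2, 0, 0, 2, 1, 0, 0, 0, 1,
    1, 0, 1, 0, 1, 1, 2, 1, 2, 1, 1, 3,
    2, 4, 1, 3, 2, 3, 3, 2, 2, 1, 1, 1, 1, 1, 1, 3, 2, 4, 1, 2, 4, 2, 2, 2, 2, 1, 1,
]


def mann_whitney_u(xs, ys):
    """U statistic for xs: each (x, y) pair scores 1 if x > y, 0.5 if tied."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def cohens_d_combined(xs, ys):
    """Mean difference over the SD of all values lumped together (as in the R one-liner)."""
    return (mean(xs) - mean(ys)) / stdev(xs + ys)


def cohens_d_pooled(xs, ys):
    """Mean difference over the pooled within-group SD."""
    nx, ny = len(xs), len(ys)
    pooled_var = ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    return (mean(xs) - mean(ys)) / pooled_var ** 0.5


if __name__ == "__main__":
    print("U (R's W):       ", mann_whitney_u(before, after))
    print("d, combined SD:  ", round(cohens_d_combined(before, after), 4))
    print("d, pooled SD:    ", round(cohens_d_pooled(before, after), 4))
```

On these vectors the pair count should come out to 20084, matching W, and the combined-SD version to about 1.28. The pooled-SD version comes out larger (roughly 1.6), because lumping both samples together folds the gap between the two groups into the denominator; either way the effect counts as "large".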
29. See “Call For Halt To Wikipedia Web­comic Dele­tions” for an overview.↩︎

30. Anime and manga are par­tic­u­larly bad. The Amer­i­can and Japan­ese anime bub­bles of the 2000s popped, and with them went a flood of mag­a­zines and book­s—the eco­nomic real­ity has set in that they are sim­ply not sus­tain­able in a mod­ern envi­ron­ment, which of course is very use­ful to dele­tion­ists who want to apply rigid uni­ver­sal norms to arti­cles sans any con­text. This leads to odd sit­u­a­tions like experts self­-pub­lish­ing; from Brian Ruh’s ANN col­umn “The Ghost with the Most”:

This time, though, instead of a fic­tional book about the super­nat­ural I’m going to be exam­in­ing a non­fic­tion book about Japan­ese ghost­s—­Patrick Drazen’s A Gath­er­ing of Spir­its: Japan’s Ghost Story Tra­di­tion: From Folk­lore and Kabuki to Anime and Manga, which was recently self­-pub­lished through the ser­vice. This is Drazen’s sec­ond book; the first one, Anime Explo­sion! The What? Why? & Wow! of Japan­ese Ani­ma­tion, came out in 2002 from and was an intro­duc­tion to many of the gen­res and themes that can be found in ani­me.

I think the switch from a com­mer­cial press to self­-pub­li­ca­tion may indi­cate the direc­tion Eng­lish-lan­guage anime and manga schol­ar­ship may be head­ing in. A few years ago, when Japan­ese pop­u­lar cul­ture seemed like the Next Big Thing, there were more pub­lish­ers that seemed like they were will­ing to take a chance on books about anime and man­ga. Unfor­tu­nate­ly, as I know first­hand (and as I’ve heard from other authors, con­firm­ing that it’s not just me) these books did­n’t sell nearly as well as any­one was hop­ing, which in turn meant that these pub­lish­ers did­n’t want to take risks with addi­tional books along these lines. After all, all pub­lish­ers need to make money in one way or another to stay afloat. In the last few years, the major­ity of books on anime and manga have been pub­lished by uni­ver­sity press­es, per­haps most notably the Uni­ver­sity of Min­nesota Press. But I already gushed about them in my last column, so I’ll spare you from any addi­tional pub­lic dis­plays of affec­tion.

How­ev­er, this puts books like Drazen’s in an odd predica­ment. It’s not really an aca­d­e­mic book, since it lacks the ref­er­ences and the­o­ries some­thing like that would entail, which means it’s not a good can­di­date for a uni­ver­sity press. How­ev­er, since few pop­u­lar presses have seen their books on anime and manga reflect pos­i­tively on their bot­tom lines, there aren’t many other options these days other than self­-pub­lish­ing. Of course, these days pub­lish­ing a book on your own does­n’t have nearly the same con­no­ta­tions it did decades ago, when van­ity presses were the domain of those with more money (and ego) than sense. These days you can self­-pub­lish a qual­ity pro­duct, get it up on Ama­zon for all to see, and (if you’re savvy about these things) per­haps even make a tidy prof­it.

↩︎
31. ‘Taking Up the Mop: Identifying Future Wikipedia Administrators’, Moira Burke and Robert Kraut, in Proceedings of the Conference on Human Factors in Computing Systems, Florence, Italy, 5-10 April 2008, pp. 3441-6↩︎

32. From “Cultural Transformations in Wikipedia or ‘From Emancipation to Product Ideology’: An Interview with Christian Stegbauer”, collected in A Wikipedia Reader:

"Our 2006 research [Christian Stegbauer, ‘Wikipedia. Das Rätsel der Kooperation’ (‘Wikipedia: the mystery behind the cooperation’), Wiesbaden: VS, 2009, p. 279 et seq.] compared content on user pages from their original starting date to the present. We noticed a transformation from emancipation to product ideology among those who had reached leadership status, but not for ones less integrated. Typical statements from a user site’s first days would be: ‘Wikipedia is a great idea’; ‘[a] never-ending encyclopedia created by many different authors’; ‘everyone should be able to exchange their knowledge for free’; ‘Wikipedia is like fulfilling a dream—a book in which everyone can write what they want’; ‘the Internet shouldn’t be regarded as a goldmine’; ‘Making information available free of charge is an important task’; ‘the project’s concept is fantastic’; ‘the idea behind Wikipedia is well worth supporting’.

Six out of seven users who changed their ide­o­log­i­cal state­ments were core users, and five of these were admin­is­tra­tors. Half of them deleted their opin­ion on eman­ci­pa­tion ide­ol­ogy in the same instance they became admin­is­tra­tors. In five out of nine cas­es, they expressed the prod­uct ide­ol­o­gy, includ­ing remarks about ‘unrea­son­able’ peo­ple dam­ag­ing the pro­ject, about end­less dis­cus­sions that should not take place when energy should be invested in the arti­cles instead, and about ‘dif­fi­cult’ peo­ple who are not wel­come at Wikipedia. We also found phras­ing such as ‘cer­tain level of exper­tise is nec­es­sary for writ­ing the arti­cles’ or that lib­eral pro­cess­ing is the rea­son behind low qual­ity con­tri­bu­tion­s."

↩︎
33. From pg 5 of Thurner et al 2012 (or see pop­u­lar cov­er­age in eg. Tech­nol­ogy Review):

Tran­si­tion rates of actions of indi­vid­u­als show that pos­i­tive actions strongly induces pos­i­tive reac­tions. Neg­a­tive behav­ior on the other hand has a high ten­dency of being repeated instead of being rec­i­p­ro­cat­ed, show­ing the ‘propul­sive’ nature of neg­a­tive actions. How­ev­er, if we con­sider only reac­tions to neg­a­tive actions, we find that neg­a­tive reac­tions are highly over­rep­re­sent­ed. The prob­a­bil­ity of act­ing out neg­a­tive actions is about 10 times higher if a per­son received a neg­a­tive action at the pre­vi­ous timestep than if she received a pos­i­tive action.

…The analy­sis of binary time­series of play­ers (good-bad) shows that the behav­ior of almost all play­ers is ‘good’ almost all the time. Neg­a­tive actions are bal­anced to a large extent by good ones. Play­ers with a high frac­tion of neg­a­tive actions tend to have a sig­nif­i­cantly shorter life. This may be due to two rea­sons: First because they are hunted down by oth­ers and give up play­ing, sec­ond because they are unable to main­tain a social life and quit the game because of lone­li­ness or frus­tra­tion. We inter­pret these find­ings as empir­i­cal evi­dence for self orga­ni­za­tion towards rec­i­p­ro­cal, good con­duct within a human soci­ety. Note that the game allows bad behav­ior in the same way as good behav­ior but the extent of pun­ish­ment of bad behav­ior is freely decided by the play­ers.

It’s worth not­ing the dis­tinc­tion between ‘rec­i­p­ro­ca­tion’ and ‘repeated’; oth­er­wise this phe­nom­e­non might have an expla­na­tion as a sta­tis­ti­cal arti­fact result­ing from an ordi­nary game activ­ity like 1-on-1 fights or duels.↩︎

34. I bring up the ‘Lightsaber com­bat’ arti­cle because I did sub­stan­tial work ref­er­enc­ing it before its wik­i-dele­tion, but because it was redi­rected the orig­i­nal page his­tory still sur­vives. It is worth­while com­par­ing the orig­i­nal page with its in the ‘Lightsaber’ arti­cle.

I am chuffed to note that the merge has resulted in infe­rior ref­er­ences! eg. the quote in para­graph 2 is unsourced and has a {{fact}} tem­plate, but was ref­er­enced in the orig­i­nal. Fur­ther, that quote is triv­ially re-ref­er­enced (#3 hit in Google). My stan­dards may be too high, but I can’t help but think that it takes real incom­pe­tence to not only lose a ref­er­ence, but be unable to re-find such an eas­ily found quote.↩︎

35. The pathos has, at times, moved me to verse. To quote one of mine from WP:HAIKU (a homage to Basho’s famous verse in ):

Summer AFD -
the sole remnant of many
editors' hard work.

It is not a coin­ci­dence that I put that haiku before the final haiku on the page—a haiku com­ment­ing on edi­tors who have aban­doned or left the project:

The summer grasses.
I edit my user page
One last time - really.
↩︎
36. One could also avoid compilation and run it much more slowly as `cat urls.txt | runhaskell script`.↩︎