About This Website

Meta page describing gwern.net site ideals of stable long-term essays which improve over time; technical decisions using Markdown and static hosting; idea sources and writing methodology; metadata definitions; site statistics; copyright license.
personal, psychology, archiving, statistics, predictions, meta, Bayes, Google, design
2010-10-01–2020-12-06 · finished · certainty: highly likely · importance: 3


This page is about Gwern.net; for information about me, see .

The Content

“Of all the books I have delivered to the presses, none, I think, is as personal as the straggling collection mustered for this hodgepodge, precisely because it abounds in reflections and interpolations. Few things have happened to me, and I have read a great many. Or rather, few things have happened to me more worth remembering than Schopenhauer’s thought or the music of England’s words.”

“A man sets himself the task of portraying the world. Through the years he peoples a space with images of provinces, kingdoms, mountains, bays, ships, islands, fishes, rooms, instruments, stars, horses, and people. Shortly before his death, he discovers that that patient labyrinth of lines traces the image of his face.”

Jorge Luis Borges, Epilogue

The content here varies from to to / to to to to to investigations of or (or two topics at once: or or heck !).

I believe that someone who has been well-educated will think of something worth writing at least once a week; to a surprising extent, this has been true. (I added ~130 documents to this repository over the first 3 years.)

Target Audience

“Special knowledge can be a terrible disadvantage if it leads you too far along a path you cannot explain anymore.”

()

I don’t write simply to find things out, although curiosity is my primary motivator, as I find I want to read something which hasn’t been written—“…I realised that I wanted to read about them what I myself knew. More than this—what only I knew. Deprived of this possibility, I decided to write about them. Hence this book.”1 There are many benefits to keeping notes as they allow one to accumulate confirming and especially contradictory evidence2, and even drafts can be useful so you or simply decently respect the opinions of mankind.

The goal of these pages is not to be a model of concision, maximizing entertainment value per word, or to preach to a choir by elegantly repeating a conclusion. Rather, I am attempting to explain things to my future self, who is intelligent and interested, but has forgotten. What I am doing is explaining why I decided what I did to myself and noting down everything I found interesting about it for future reference. I hope my other readers, whoever they may be, might find the topic as interesting as I found it, and the essay useful or at least entertaining–but the intended audience is my future self.

Development

“I hate the water that thinks that it boiled itself on its own. I hate the seasons that think they cycle naturally. I hate the sun that thinks it rose on its own.”

Sodachi Oikura, (Sodachi Riddle, Part One)

It is everything I felt worth writing that didn’t fit somewhere like Wikipedia or was already written. I never expected to write so much, but I discovered that once I had a hammer, nails were everywhere, and that 3.

Long Site

“The Internet is self destructing paper. A place where anything written is soon destroyed by rapacious competition and the only preservation is to forever copy writing from sheet to sheet faster than they can burn. If it’s worth writing, it’s worth keeping. If it can be kept, it might be worth writing…If you store your writing on a third party site like , or even on your own site, but in the complex format used by blog/wiki software du jour you will lose it forever as soon as hypersonic wings of Internet labor flows direct people’s energies elsewhere. For most information published on the Internet, perhaps that is not a moment too soon, but how can the muse of originality soar when immolating transience brushes every feather?”

(“Self destructing paper”, 2006-12-05)

One of my personal interests is applying the idea of the . What and how do you write a personal site with the long-term in mind? We live most of our lives in the future, and the actuarial tables give me until the 2070–2080s, excluding any benefits from / or projects like . It is a commonplace in science fiction4 that longevity would cause widespread risk aversion. But on the other hand, it could do the opposite: the longer you live, the more long-shots you can afford to invest in. Someone with a timespan of 70 years has reason to protect against black swans—but also time to look for them.5 It’s worth noting that old people make many short-term choices, as reflected in increased suicide rates and reduced investment in education or new hobbies, and this is not due solely to the ravages of age but the proximity of death—the HIV-infected (but otherwise in perfect health) act similarly short-term.6

What sort of writing could you create if you worked on it (be it ever so rarely) for the next 60 years? What could you do if you started now?7

Keeping the site running that long is a challenge, and leads to the recommendations for : 100% software8, for data, textual human-readability, avoiding external dependencies910, and staticness11.

Preserving the content is another challenge. Keeping the content in a like protects against file corruption and makes it easier to mirror the content; regular backups12 help. I have taken additional measures: has archived most pages and almost all external links; the is also archiving pages & external links13. (For details, read .)

One could continue in this vein, devising ever more powerful & robust storage methods (perhaps combine the DVCS with through , a la bup), but what is one to fill the storage with?

Long Content

“What has been done, thought, written, or spoken is not culture; culture is only that fraction which is remembered.”

Gary Taylor (The Clock of the Long Now; emphasis added)14

‘Blog posts’ might be the answer. But I have read blogs for many years and most blog posts are the triumph of the hare over the tortoise. They are meant to be read by a few people on a weekday in 2004 and never again, and are quickly abandoned—and perhaps as Assange says, not a moment too soon. (But isn’t that sad? Isn’t it a terrible return for one’s time?) On the other hand, the best blogs always seem to be building something: they are rough drafts—works in progress15. So I did not wish to write a blog. Then what? More than just “evergreen content”, what would constitute Long Content as opposed to the existing culture of Short Content? How does one live in a Long Now sort of way?16

It’s shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad’Dib knew that every experience carries its lesson.17

My answer is that one uses such a framework to work on projects that are too big to work on normally or too tedious. (Conscientiousness is often lacking online or in volunteer communities18 and many useful things go undone.) Knowing your site will survive for decades to come gives you the mental wherewithal to tackle long-term tasks like gathering information for years, and such persistence can be useful19—if one holds onto every glimmer of genius for years, then even the dullest person may look a bit like a genius himself20. (Even experienced professionals can only write at their peak for a few hours a day—usually, it seems.) Half the challenge of fighting procrastination is the pain of starting—I find when I actually get into the swing of working on even dull tasks, it’s not so bad. So this suggests a solution: never start. Merely have perpetual drafts, which one tweaks from time to time. And the rest takes care of itself. I have a few examples of this:

  1. :

    When I read in Wired in 2008 that the obscure working memory exercise called dual n-back (DNB) had been found to increase IQ substantially, I was shocked. IQ is one of the most stubborn properties of one’s mind, one of the most fragile21, the hardest to affect positively22, but also one of the most valuable traits one could have23; if the technique panned out, it would be huge. Unfortunately, DNB requires a major time investment (as in, half an hour daily); which would be a bargain—if it delivers. So, to do DNB or not?

    Questions of great import like this are worth studying carefully. The wheels of academia grind exceeding slow, and only a fool expects unanimous answers from fields like psychology. Any attempt to answer the question ‘is DNB worthwhile?’ will require years and cover a breadth of material. This FAQ on DNB is my attempt to cover that breadth over those years.

  2. :

    I have been discussing since 2004. The task of interpreting Eva is very difficult; the source works themselves are a major time-sink24, and there are thousands of primary, secondary, and tertiary works to consider—personal essays, interviews, reviews, etc. The net effect is that many Eva fans ‘know’ certain things about Eva, such as not being a grand ‘screw you’ statement by Hideaki Anno or that the TV series was censored, but they no longer have proof. Because each fan remembers a different subset, they have irreconcilable interpretations. (Half the value of the page for me is having a place to store things I’ve said in countless fora which I can eventually turn into something more systematic.)

    To compile claims from all those works, to dig up forgotten references, to scroll through microfilms, buy issues of defunct magazines—all this is enough work to shatter the sanity of the stoutest salaryman. Which is why I began years ago and expect not to finish for years to come. (Finishing by 2020 seems like a good prediction.)

  3. : Years ago I was reading the papers of the economist Robin Hanson. I recommend his work highly; even if they are wrong, they are imaginative and some of the finest speculative fiction I have read. (Except they were non-fiction.) One night I had a dream in which I saw in a flash a medieval city run in part on Hansonian grounds; a version of his . A city must have another city as a rival, and soon I had remembered the strange ’90s idea of s, which was easily tweaked to work in a medieval setting. Finally, between them, was one of my favorite proposals, Buckminster Fuller’s megastructure.

    I wrote several drafts but always lost them. Sad25 and discouraged, I abandoned it for years. This fear leads straight into the next example.

  4. A Book reading list:

    Once, I didn’t have to keep reading lists. I simply went to the school library shelf where I left off and grabbed the next book. But then I began reading harder books, and they would cite other books, and sometimes would even have horrifying lists of hundreds of other books I ought to read (‘bibliographies’). I tried remembering the most important ones but quickly forgot. So I began keeping a book list on paper. I thought I would throw it away in a few months when I read them all, but somehow it kept growing and growing. I didn’t trust computers to store it before26, but now I do, and it lives on in digital form (currently on Goodreads—because they have export functionality). With it, I can track how my interests evolved over time27, and what I was reading at the time. I sometimes wonder if I will read them all even by 2070.

What is next? So far the pages will persist through time, and they will gradually improve over time. But a truly Long Now approach would be to make them be improved by time—make them more valuable the more time passes. ( remarks in that a group of monks carved thousands of scriptures into stone, hoping to preserve them for posterity—but posterity would value far more a carefully preserved collection of monk feces, which would tell us countless valuable things about important phenomena like global warming.)

One idea I am exploring is adding long-term predictions like the ones I make on PredictionBook.com. Many28 pages explicitly or implicitly make predictions about the future. As time passes, predictions would be validated or falsified, providing feedback on the ideas.29

For example, the Evangelion essay’s paradigm implies many things about the future movies in 30; is an extended prediction31 of future plot developments in series; has suggestions about what makes good projects, which could be turned into predictions by applying them to predict success or failure when the next Summer of Code choices are announced. And so on.

I don’t think “Long Content” is simply for working on things which are equivalent to a “monograph” (a work which attempts to be an exhaustive exposition of all that is known—and what has been recently discovered—on a single topic), although monographs clearly would benefit from such an approach. If I wrote a short essay cynically remarking on, say, Al Gore, predicting he’d sell out, registered some predictions, and came back 20 years later to see how it worked out, I would consider this “Long Content” (it gets more interesting with time, as the predictions reach maturation); but one couldn’t consider this a “monograph” in any ordinary sense of the word.

One of the ironies of this approach is that as a , I assign non-trivial probability to the world undergoing massive change during the 21st century due to any of a number of technologies such as artificial intelligence (such as 32) or ; yet here I am, planning as if I and the world were immortal.

I personally believe that one should “think Less Wrong and act Long Now”, if you follow me. I diligently do my daily and n-backing; I carefully design my website and writings to last decades, actively think about how to write material that improves with time, and work on writings that will not be finished for years (if ever). It’s a bit schizophrenic since both are totalized worldviews with drastically conflicting recommendations about where to invest my time. It’s a case of high versus low discount rates; and one could fairly accuse me of committing the , but then, I’m not sure that (certainly, I have more to show for my wasted time than most people).

The Long Now views its proposals like the Clock and the Long Library and as insurance—in case the future turns out to be surprisingly unsurprising. I view these writings similarly. If most ambitious predictions turn out right and the Singularity happens by 2050 or so, then much of my writings will be moot, but I will have all the benefits of said Singularity; if the Singularity never happens or ultimately pays off in a very disappointing way, then my writings will be valuable to me. By working on them, I hedge my bets.

Finding my ideas

To the extent I personally have any method for ‘getting started’ on writing something, it’s to pay attention to anytime you find yourself thinking, “how irritating that there’s no good webpage/Wikipedia article on X” or “I wonder if Y” or “has anyone done Z” or “huh, I just realized that A!” or “this is the third time I’ve had to explain this, jeez.”

The DNB FAQ started because I was irritated people were repeating themselves on the dual n-back mailing list; the article started because it was a pain to figure out where one could order modafinil; the trio of Death Note articles (, , ) all started because I had an amusing thought about information theory; the page was commissioned after I groused about how deeply sensationalist & shallow & ill-informed all the mainstream media articles on the Silk Road drug marketplace were (similarly for ); my Google shutdown analysis was based on thinking it was a pity that Arthur’s Guardian analysis was trivially & fatally flawed; and so on and so forth.

None of these seems special to me. Anyone could’ve compiled the DNB FAQ; anyone could’ve kept a list of online pharmacies where one could buy modafinil; someone tried something similar to my Google shutdown analysis before me (and the fancier statistics were all standard tools). If I have done anything meritorious with them, it was perhaps simply putting more work into them than someone else would have; to quote Teller:

“I think you’ll see what I mean if I teach you a few principles magicians employ when they want to alter your perceptions…Make the secret a lot more trouble than the trick seems worth. You will be fooled by a trick if it involves more time, money and practice than you (or any other sane onlooker) would be willing to invest.”

“My partner, Penn, and I once produced 500 live cockroaches from a top hat on the desk of talk-show host David Letterman. To prepare this took weeks. We hired an entomologist who provided slow-moving, camera-friendly cockroaches (the kind from under your stove don’t hang around for close-ups) and taught us to pick the bugs up without screaming like preadolescent girls. Then we built a secret compartment out of foam-core (one of the few materials cockroaches can’t cling to) and worked out a devious routine for sneaking the compartment into the hat. More trouble than the trick was worth? To you, probably. But not to magicians.”

Besides that, I think after a while writing/research can be a virtuous circle or autocatalytic. If you look at my repo statistics, you see that I haven’t always been writing as much. What seems to happen is that as I write more:

  • I learn more tools

    eg. I learned basic in R to answer what all the positive & negative , but then I was able to use it for iodine; I learned linear models for analyzing MoR reviews but now I can use them anywhere I want to, like in my .

    The “Feynman method” has been facetiously described as “find a problem; think very hard; write down the answer”, but Gian-Carlo Rota gives the real one:

    Richard Feynman was fond of giving the following advice on how to be a genius. You have to keep a dozen of your favorite problems constantly present in your mind, although by and large they will lay in a dormant state. Every time you hear or read a new trick or a new result, test it against each of your twelve problems to see whether it helps. Every once in a while there will be a hit, and people will say: “How did he do it? He must be a genius!”

  • I internalize a habit of noticing interesting questions that flit across my brain

    eg. in March 2013 while meditating: “I wonder if more doujin music gets released when unemployment goes up and people may have more spare time or fail to find jobs? Hey! That giant Touhou music torrent I downloaded, with its 45000 songs all tagged with release year, could probably answer that!” (One could argue that these questions probably should be ignored and not investigated in depth—Teller again—nevertheless, this is how things work for me.)

  • if you aren’t writing, you’ll ignore useful links or quotes; but if you stick them in small asides or footnotes as you notice them, eventually you’ll have something bigger.

    I grab things I see on Google Alerts & Scholar, Pubmed, Reddit, Hacker News, my RSS feeds, books I read, and note them somewhere until they amount to something. (An example would be my slowly accreting citations on IQ and economics.)

  • people leave comments, ping me on IRC, send me emails, or leave anonymous messages, all of which help

    Some examples of this come from my most popular page, on Silk Road 1:

    1. an anonymous message led me to investigate a vendor in depth and ponder the accusation leveled against them; I wrote it up and gave my opinions and thus I got another short essay to add to my SR page which I would not have had otherwise (and I think there’s a <20% chance that in a few years this will pay off and become a very interesting essay).
    2. CMU’s Nicolas Christin, who had scraped SR for many months and compiled all sorts of overall statistics, emailed me to point out I was citing inaccurate figures from the first version of his paper. I thanked him for the correction and, while I was replying, mentioned I had a hard time believing his paper’s claims about the extreme rarity of scams on SR as estimated through buyer feedback. After some back and forth and suggesting specific mechanisms by which the estimates could be positively biased, he was able to check his database and confirmed that there was at least one very large omission of scams in the scraped data and there was probably a general undersampling; so now I have a more accurate feedback estimate for my SR page (important for estimating risk of ordering) and he said he’ll acknowledge me in the/a paper, which is nice.

Information organizing

Occasionally people ask how I manage information and read things.

  1. For quotes or facts which are very important, I employ spaced repetition by adding them to my Mnemosyne

  2. I keep web clippings in Evernote; I also excerpt from research papers & books, and miscellaneous sources. This is useful for targeted searches when I remember a fact but not where I learned it, and for storing things which I don’t want to memorize but which have no logical home in my website or LW or elsewhere. It is also helpful for writing my and the , as I can read through my book excerpts to remind myself of the highlights, and at the end of the month review clippings from papers/webpages to find good things to reshare which I was too busy to share at the time or was unsure of their importance. I don’t make any use of more complex Evernote features.

    I periodically back up my Evernote using the Linux client Nixnote’s export feature. (I made sure there was a working export method before I began using Evernote, and use it only as long as Nixnote continues to work.)

    My workflow for dealing with PDFs, as of late 2014, is:

    1. if necessary, jailbreak the paper using Libgen or a university proxy, then upload a copy to Dropbox, named year-author.pdf
    2. read the paper, making excerpts as I go
    3. store the metadata & excerpts in Evernote
    4. if useful, integrate into Gwern.net with its title/year/author metadata, adding a local fulltext copy if the paper had to be jailbroken, otherwise rely on my custom archiving setup to preserve the remote URL
    5. hence, any future searches for the filename / title / key contents should result in hits either in my Evernote or Gwern.net
  3. Web pages are archived & backed up by . This is intended mostly for fixing dead links (eg to recover the fulltext of the original URL of an Evernote clipping).

  4. I don’t have any special book reading techniques. For really good books I excerpt from each chapter and stick the quotes into Evernote.

  5. I store insights and thoughts in various pages as parenthetical comments, footnotes, and appendices. If they don’t fit anywhere, I dump them in .

  6. Larger masses of citations and quotes typically get turned into pages.

  7. I make heavy use of RSS subscriptions for news. For that, I am currently using . (Not that I’m hugely thrilled about it. Google Reader was much better.)

  8. For projects and followups, I use reminders in Google Calendar.

  9. For recording personal data, I automate as much as possible (eg Zeo and arbtt) and I make a habit of the rest—getting up in the morning is a great time to build a habit of recording data because it’s a time of habits like eating breakfast and getting dressed.

Hence, to refind information, I use a combination of Google, Evernote, grep (on the Gwern.net files), occasionally Mnemosyne, and a good visual memory.

As far as writing goes, I do not use note-taking software or things like or —not that I think they are useless but I am worried about whether they would ever repay the large upfront investments of learning/tweaking or interfere with other things. Instead, I occasionally compile outlines of articles from comments on LW/Reddit/IRC, keep editing them with stuff as I remember them, search for relevant parts, allow little thoughts to bubble up while meditating, and pay attention to when I am irritated at people being wrong or annoyed that a particular topic hasn’t been written down yet.

Confidence tags

Most of the metadata in each page is self-explanatory: the date is the last time the page was meaningfully modified33, the tags are categorization, etc. The “status” tag describes the state of completion: whether it’s a pile of links & snippets & “notes”, or whether it is a “draft” which at least has some structure and conveys a coherent thesis, or it’s a well-developed draft which could be described as “in progress”, and finally when a page is done—in lieu of additional material turning up—it is simply “finished”.

The “confidence” tag is a little more unusual. I stole the idea from Muflax’s “epistemic state” tags; I use the same meaning for “log” for collections of data or links (“log entries that simply describe what happened without any judgment or reflection”); personal or reflective writing can be tagged “emotional” (“some cluster of ideas that got itself entangled with a complex emotional state, and I needed to externalize it to even look at it; in no way endorsed, but occasionally necessary (similar to fiction)”), and “fiction” needs no explanation (every author has some reason for writing the story or poem they do, but not even they always know whether it is an expression of their deepest fears, desires, history, or simply random thoughts). I drop his other tags in favor of giving my subjective probability using the :

  1. “certain”
  2. “highly likely”
  3. “likely”
  4. “possible” (my preference over Kesselman’s “Chances a Little Better [or Less]”)
  5. “unlikely”
  6. “highly unlikely”
  7. “remote”
  8. “impossible”

These are used to express my feeling about how well-supported the essay is, or how likely it is the overall ideas are right. (Of course, an interesting idea may be worth writing about even if very wrong, and even a long shot may be profitable to examine if the potential payoff is large enough.)

Importance tags

An additional useful bit of metadata would be a distinction between things which are trivial and those which are about more important topics which might change your life. Using , I’ve ranked pages in deciles from 0–10 on how important the topic is to myself, the intended reader, or the world. For example, topics like or are vastly more important, and would be ranked 10, than some poems or a dream or someone’s small nootropics self-experiment, which would be ranked 0–1.

Writing checklist

It turns out that writing essays (technical or philosophical) is a lot like writing code—there are so many ways to err that you need a process with as much automation as possible. My current checklist for finishing an essay:

Markdown checker

I’ve found that many errors in my writing can be caught by some simple scripts, which I’ve compiled into a shell script, markdown-lint.sh.

My linter:

  1. checks for corrupted non-text binary files

  2. checks a blacklist of domains which are either dead (eg Google+) or have a history of being unreliable (eg ResearchGate, NBER, PNAS); such links need34 to either be fixed, pre-emptively mirrored, or removed entirely.

    • a special case is PDFs hosted on IA; the IA is reliable, but I try to rehost such PDFs so they’ll show up in Google/Google Scholar for everyone else.
  3. Broken syntax: I’ve noticed that when I make Markdown syntax errors, they tend to be predictable and show up either in the original Markdown source, or in the rendered HTML. Two common source errors:

     "(www"
     ")www"

    And the following should rarely show up in the final rendered HTML:

     "\frac"
     "\times"
     "(http"
     ")http"
     "[http"
     "]http"
     " _ "
     "[^"
     "^]"
     "<!--"
     "-->"
     "<-- "
     "<-"
     "->"
     "$title$"
     "$description$"
     "$author$"
     "$tags$"
     "$category$"

    Similarly, I sometimes slip up in writing image/document links, so any link starting https://www.gwern.net or ~/wiki/ or /home/gwern/ is probably wrong. There are a few Pandoc-specific issues that should be checked for too, like duplicate footnote names and images without separating newlines or unescaped dollar signs (which can accidentally lead to sentences being rendered as TeX).

    A final pass with htmltidy finds many errors which slip through, like incorrectly-escaped URLs.

  4. Flag dangerous language: Imperial units are deprecated, but so too is the misleading language of NHST statistics (if one must talk of “significance”, I try to flag it as “statistically-significant” to warn the reader). I also avoid some other dangerous words like “obvious” (if it really is, why do I need to say it?).

  5. Bad habits:

    • proselint (with some checks disabled because they play badly with Markdown documents)
    • Another static warning is checking for too-long lines (most common in code blocks, although sometimes broken indentation will cause this), which will cause browsers to use scrollbars, and for which I’ve written a Pandoc script
    • one for a bad habit of mine—too-long footnotes
  6. duplicate and hidden-PDF URLs: a URL being linked multiple times is sometimes an error (too much copy-paste or insufficiently edited sections); PDF URLs should receive a visual annotation warning the reader it’s a PDF, but the CSS rules, which catch cases like .pdf$, don’t cover cases where the host quietly serves a PDF anyway, so all URLs are checked. (A URL which is a PDF can be made to trigger the PDF rule by appending #pdf.)

  7. broken links are detected with linkchecker. The best time to fix broken links is when you’re already editing a page.

While this throws many false positives, those are easy to ignore, and the script fights bad habits of mine while giving me much greater confidence that a page doesn’t have any merely technical issues that screw it up (without requiring me to constantly reread pages every time I modify them, lest an accidental typo while making an edit break everything).
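
A minimal sketch of the kind of checks involved (the real markdown-lint.sh is not reproduced here; the filename argument, the specific grep patterns, and the domain blacklist below are illustrative assumptions, not the actual script):

    #!/bin/bash
    # lint-sketch.sh: toy illustration of a Markdown lint pass over one page source.
    # Usage: ./lint-sketch.sh Essay.page
    PAGE="$1"

    # broken link syntax that should never appear in the Markdown source:
    grep -n -E '\((www|http)|\)(www|http)' "$PAGE" && echo "WARNING: malformed link syntax"

    # dead or unreliable domains that need fixing, mirroring, or removal (example blacklist):
    grep -n -E 'plus\.google\.com|researchgate\.net' "$PAGE" && echo "WARNING: blacklisted domain"

    # links accidentally pointing at the local filesystem or absolute site URLs:
    grep -n -E 'https://www\.gwern\.net|~/wiki/|/home/gwern/' "$PAGE" && echo "WARNING: bad link prefix"

    # dangerous wording to reconsider before publishing:
    grep -n -i -E '\b(significant|obvious)\b' "$PAGE" && echo "WARNING: dangerous wording"

    exit 0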

Anonymous feedback

Back in November 2011, lukeprog posted “Tell me what you think of me”, where he described his use of a Google Docs form for anonymous receipt of textual feedback or comments. Typically, most forms of communication are non-anonymous, or if they are anonymous, they’re public. One can set up pseudonyms and use those for private contact, but it’s not always that easy, and is definitely a series of (if anonymous feedback is not solicited, one has to feel it’s important enough to do and violate implicit norms against anonymous messages; one has to set up an identity; one has to compose and send off the message, etc).

I thought it was a good idea to try out, and on 2011-11-08, I set up my own anonymous feedback form and stuck it in the footer of all pages on Gwern.net, where it remains to this day. I did wonder if anyone would use the form, especially since I am easy to contact via email, use multiple sites like Reddit or LessWrong, and even my Disqus comments allow anonymous comments—so who, if anyone, would be using this form? I scheduled a followup in 2 years, on 2013-11-30, to review how the form fared.

754 days, 2.884m page views, and 1.350m unique visitors later, I have received 116 pieces of feedback (mean of 24.8k visits per feedback). I categorize them as follows in descending order of frequency:

  • Corrections, problems (technical or otherwise), suggested edits: 34
  • Praise: 31
  • Question/request (personal, tech support, etc): 22
  • Misc (eg gibberish, socializing, Japanese): 13
  • Criticism: 9
  • News/suggestions: 5
  • Feature request: 4
  • Request for cybering: 1
  • Extortion: 1 (see my blackmail page dealing with the September 2013 incident)

Some submissions cover multiple angles (they can be quite long), sometimes people double-submitted or left it blank, etc, so the numbers won’t sum to 116.

In general, a lot of the corrections were usable and fixed issues of varying importance, from typos to the entire site’s CSS being broken due to being uploaded with the wrong MIME type. One of the news/suggestion feedbacks was very valuable, as it led to writing the Silk Road mini-essay “A Mole?” A lot of the questions were a waste of my time; I’d say half related to Tor/Bitcoin/Silk-Road. (I also got an irritating number of emails from people asking me to, say, buy LSD or heroin off SR for them.) The feature requests were usually for a better RSS feed, which I tried to oblige by starting the page. The cybering and extortion were amusing, if nothing else. The praise was good for me mentally, as I don’t interact much with people.

I consider the anonymous feedback form to have been a success, I’m glad lukeprog brought it up on LW, and I plan to keep the feedback form indefinitely.

Feedback causes

One thing I wondered is whether feedback was purely a function of traffic (the more visits, the more people who could see the link in the footer and decide to leave a comment), or more related to time (perhaps people returning regularly and eventually being emboldened or noticing something to comment on). So I compiled daily hits, combined with the feedback dates, and looked at a graph of hits:

Hits over time for Gwern.net

The hits are heavily skewed by Hacker News & Reddit traffic spikes, and probably should be log transformed. Then I did a logistic regression on hits, log hits, and a simple time index:

# daily traffic (Day, Visits) plus a logical indicator of whether any feedback arrived that day
feedback <- read.csv("https://www.gwern.net/docs/traffic/2013-gwernnet-anonymousfeedback.csv",
                     colClasses=c("Date","logical","integer"))
plot(Visits ~ Day, data=feedback)   # eyeball raw daily traffic over time
feedback$Time <- 1:nrow(feedback)   # simple linear time index: 1..n days
# logistic regression of feedback on visits, log visits, & time, with stepwise selection
summary(step(glm(Feedback ~ log(Visits) + Visits + Time, family=binomial, data=feedback)))
# ...
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)
# (Intercept) -7.363507   1.311703   -5.61  2.0e-08
# log(Visits)  0.749730   0.173846    4.31  1.6e-05
# Time        -0.000881   0.000569   -1.55     0.12
#
# (Dispersion parameter for binomial family taken to be 1)
#
#     Null deviance: 578.78  on 753  degrees of freedom
# Residual deviance: 559.94  on 751  degrees of freedom
# AIC: 565.9

The logged hits work out better than regular hits, and survive into the simplified model. And the traffic influence seems much larger than the time variable (which is, curiously, negative).

Technical aspects

Popularity

On a semi-annual basis, since 2011, I review Gwern.net website traffic using Google Analytics; although what most readers value is not what I value, I find it motivating to see total traffic statistics reminding me of readers (writing can be a lonely and abstract endeavour), and useful to see what the major referrers are.

Gwern.net typically enjoys steady traffic in the 50–100k range per month, with occasional spikes from social media, particularly Hacker News; over the first decade (2010–2020), there were 7.98m pageviews by 3.8m unique users.

See

Colophon

Hosting

Gwern.net is served by Amazon S3 through the CloudFlare CDN. (Amazon charges less for bandwidth and disk space than NFSN, although one loses all the capabilities offered by Apache’s , and compression is difficult so must be handled by CloudFlare; total costs may turn out to be a wash and I will consider the switch to Amazon S3 a success if it can bring my monthly bill to <$10 or <$120 a year.)
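
Deploying a static site like this amounts to compiling the pages and syncing the result to the S3 bucket behind the CDN; a hedged sketch (the generator name `site` is the conventional Hakyll default, and the bucket name and flags here are illustrative assumptions, not my actual configuration):

    # compile Markdown -> static HTML into _site/ with the Hakyll-built generator
    ./site build
    # one-way sync of the compiled site up to the S3 bucket serving the domain;
    # --delete removes remote files that no longer exist locally
    aws s3 sync _site/ s3://www.example-bucket.net/ --delete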

From October 2010 to June 2012, the site was hosted on NearlyFreeSpeech.net, an old hosting company; its specific niche is controversial material and activist-friendly pricing. Its libertarian owners cast a jaundiced eye on s, and pricing is pay-as-you-go. I like the former aspect, but the latter sold me on NFSN. Before I stumbled on NFSN (someone mentioned it offhandedly while chatting), I was getting ready to pay $10–15 a month ($120 yearly) to Linode. Linode’s offerings are overkill since I do not run dynamic websites or something like Haskell.org (with wikis and mailing lists and repositories), but I didn’t know a good alternative. NFSN’s pricing meant that I paid for usage rather than large flat fees. I put in $32 to cover registering Gwern.net until 2014, and then another $10 to cover bandwidth & storage price. DNS aside, I was billed $8.27 for October-December 2010; DNS included, January-April 2011 cost $10.09. $10 covered months of Gwern.net for what I would have paid Linode in 1 month! In total, my 2010 costs were $39.44 (bill archive); my 2011 costs were $118.32 ($9.86 a month; archive); and my 2012 costs through June were $112.54 ($21 a month; archive); sum total: $270.30.

The switch to Amazon S3 hosting is complicated by my simultaneous addition of CloudFlare as a CDN; my total June 2012 Amazon bill is $1.62, with $0.19 for storage. CloudFlare claims it covered 17.5GB of the 24.9GB total bandwidth, so the $1.41 represents ~30% of my total bandwidth; multiplying $1.41 by 3 gives ~$4.2, so my hypothetical non-CloudFlare S3 bill is ~$4.5. Even at $10, this was well below the $21 monthly cost at NFSN. (The traffic graph indicates that June 2012 was a relatively quiet period, but I don’t think this eliminates the factor of 5.) From July 2012 to June 2013, my Amazon bills totaled $60, which is reasonable except for the steady increase ($1.62/$3.27/$2.43/$2.45/$2.88/$3.43/$4.12/$5.36/$5.65/$5.49/$4.88/$8.48/$9.26), being primarily driven by out-bound bandwidth (in June 2013, the $9.26 was largely due to the 75GB transferred—and that was after CloudFlare dealt with 82GB); $9.26 is much higher than I would prefer since that would be >$110 annually. This was probably due to all the graphics I included in the “Google shutdowns” analysis, since it returned to a more reasonable $5.14 on 42GB of traffic in August. September, October, November and December 2013 saw high levels maintained at $7.63/$12.11/$5.49/$8.75, so it’s probably a new normal. 2014 entailed new costs related to EC2 instances & S3 bandwidth spikes due to hosting a multi-gigabyte scientific dataset, so bills ran $8.51/$7.40/$7.32/$9.15/$26.63/$14.75/$7.79/$7.98/$8.98/$7.71/$7/$5.94. 2015 & 2016 were similar: $5.94/$7.30/$8.21/$9.00/$8.00/$8.30/$10.00/$9.68/$14.74/$7.10/$7.39/$8.03/$8.20/$8.31/$8.25/$9.04/$7.60/$7.93/$7.96/$9.98/$9.22/$11.80/$9.01/$8.87. 2017 saw costs increase due to one of my side-projects, aggressively increasing fulltexting of Gwern.net by providing more papers & scanning cited books, only partially offset by changes like lossy optimization of images & converting GIFs to WebMs: $12.49/$10.68/$11.02/$12.53/$11.05/$10.63/$9.04/$11.03/$14.67/$15.52/$13.12/$12.23 (total: $144.01). In 2018, I continued fulltexting: $13.08/$14.85/$14.14/$18.73/$18.88/$15.92/$15.64/$15.27/$16.66/$22.56/$23.59/$25.91 (total: $213).

For 2019, I made a determined effort to host more things, including whole websites like the OKCupid archives or rotten.com, and to include more images/videos (the StyleGAN anime faces tutorial alone must be easily 20MB+ just for images) and it shows in how my bandwidth costs exploded: $26.49/$37.56/$37.56/$37.56/$25.00/$25.00/$25.00/$25.00/$77.91/$124.45/$74.32/$79.19. I’ve begun considering a move of Gwern.net to my Hetzner dedicated server which has cheap bandwidth, combined with upgrading my Cloudflare CDN to keep site latency in check (even at $20/month, it’s still far cheaper than AWS S3 bandwidth).

Source

The revision history is kept in git; individual page sources can be read by appending .page to their URL.

Size

As of 2020-01-07, the source of Gwern.net is composed of >366 text files with >3.76m words or >27MB; this includes my writings & documents I have transcribed into Markdown, but excludes images, PDFs, HTML mirrors, source code, archives, infrastructure, popup annotations, and the revision history. With those included and everything compiled to the static35 HTML, the site is >18.3GB. The source repository contains >13,323 patches (this is an under-count as the creation of the repository on 2008-09-26 included already-written material).

Design

“People who are really serious about software should make their own hardware.”

Alan Kay, “Creative Think” 1982

The sorrow of web design & typography is that it all can matter just a little how you present your pages. A page can be terribly designed and render as typewriter text in 80-column ASCII monospace, and readers will still read it, even if they complain about it. And the most tastefully-designed page, with true smallcaps and correct use of em-dashes vs en-dashes vs hyphens vs minuses and all, which loads in a fraction of a second and is SEO-optimized, is of little avail if the page has nothing worth reading; no amount of typography can rescue a page of dreck. Perhaps 1% of readers could even name any of these details, much less recognize them. If we added up all the small touches, they surely make a difference to the reader’s happiness, but it would have to be a small one—say, 5%.36 It’s hardly worth it for writing just a few things.

But the joy of web design & typography is that just its presentation can matter a little to all your pages. Writing is hard work, and any new piece of writing will generally add to the pile of existing ones, rather than multiplying it all; it’s an enormous amount of work to go through all one’s existing writings and improve them somehow, so it usually doesn’t happen. Design improvements, on the other hand, benefit one’s entire website & all future readers, and so at a certain scale can be quite useful. I feel I’ve reached the point where it’s worth sweating the small stuff, typographically.

Principles

There are 4 design principles:

  1. Aesthetically-pleasing Minimalism

    The design esthetic is minimalist. I believe that helps one focus on the content. Anything besides the content is distraction and not design. ‘Attention!’, as would say37.

    The palette is deliberately kept to grayscale as an experiment in consistency and whether this constraint permits a readable aesthetically-pleasing website. Various classic typographical tools, like and are used for emphasis.

  2. Accessibility &

    Semantic markup is used where Markdown permits. JavaScript is not required for the core reading experience, only for optional features: comments, table-sorting, , and so on. Pages can even be read without much problem on a smartphone or in a text browser like elinks.

  3. Speed & Efficiency

    On an increasingly-bloated Internet, a website which is anywhere remotely as fast as it could be is a breath of fresh air. Readers deserve better. Gwern.net uses many tricks to offer nice features like sidenotes or LaTeX math at minimal cost.

  4. Structural Reading

    How should we present texts online? A web page, unlike many mediums such as print magazines, lets us provide an unlimited amount of text. We need not limit ourselves to overly concise constructions, which countenance contemplation but not conviction.

    The problem then becomes taming complexity and length, lest we hang ourselves with our own rope. Some readers want to read every last word about a particular topic, while most readers want the summary or are skimming through on their way to something else. A tree structure is helpful in organizing the concepts, but doesn’t solve the presentation problem: a book or article may be hierarchically organized, but it still must present every last leaf node at 100% size. Tricks like footnotes or appendices only go so far—having thousands of endnotes or 20 appendices to tame the size of the ‘main text’ is unsatisfactory, as while any specific reader is unlikely to want to read any specific appendix, they will certainly want to read an appendix & possibly many. The classic hypertext paradigm of simply having a rat’s-nest of links to hundreds of tiny pages to avoid any page being too big also breaks down, because how granular does one want to go? Should every section be a separate page? (Anyone who has attempted to read a manual knows how tedious that can be, where the default presentation of separate pages means that an entire page may contain only a single paragraph or sentence38, and it’s not clear that it’s much better than the other extreme, the monolithic which includes every detail under the sun and is impossible to navigate without one’s eyes glazing over, even using to navigate through dozens of irrelevant hits—every single time.) What about every reference in the bibliography, should there be 100 different pages for 100 different references?

    A web page, however, can be dynamic. The solution to the length problem is to progressively expose more beyond the default as the user requests it, and make requesting as easy as possible. For lack of a well-known term and by analogy to in /, I call this structural reading: the hierarchy is made visible & malleable to allow reading at multiple levels of the structure.

    A Gwern.net page can be read at multiple structural levels: title, metadata block, abstracts, margin notes, emphasized keywords in list items, footnotes/sidenotes, collapsible sections, popup link annotations, and fulltext links or internal links to other pages. So the reader can read (in increasing depth) the title/metadata, or the page abstract, or skim the headers/Table of Contents, then skim margin notes+item summaries, then read the body text, then click to uncollapse regions to read in-depth sections too, and then if they still want more, they can mouse over references to pull up the abstracts or excerpts, and then they can go even deeper by clicking the fulltext link to read the full original. Thus, a page may look short, and the reader can understand & navigate it easily, but like an iceberg, those readers who want to know more about any specific point will find much more under the surface.

Features

Notable features (compared to a standard Markdown static site):

  • sidenotes using both margins, with fallback to floating footnotes

  • code folding (collapsible sections/code blocks/tables)

  • JS-free LaTeX math rendering

  • Link popup annotations:

    Annotations are hand-written, and automatically extracted from Wikipedia/Arxiv/BioRxiv/MedRxiv/gwern.net/Crossref.

  • dark mode (with a )

  • click-to-zoom images & slideshows; full-width tables/images

  • Disqus comments

  • sortable tables; tables of various sizes

  • automatic inflation-adjustment of dollar amounts & exchange-rate conversion of Bitcoin amounts

  • link icons for filetype/domain/topic

  • infoboxes (Wikipedia-like, by way of Markdeep)

  • lightweight drop caps

  • epigraphs

  • TeX-like hyphenation for justified text (especially on Chrome); automatic smallcaps typesetting

  • 2-column lists

  • interwiki link syntax

Much of Gwern.net design and JS/CSS was developed by Said Achmiz, 2017–2020. Some inspiration has come from Tufte CSS & Matthew Butterick’s Practical Typography.

Abandoned

Worth noting are things I tried but abandoned (in roughly chronological order):

  • Gitit wiki: I preferred to edit files in Emacs/Bash rather than a GUI/browser-based wiki.

    A Pandoc-based wiki using Darcs as a history mechanism, serving mostly as a demo; the requirement that ‘one page edit = one Darcs revision’ quickly became stifling, and I began editing my Markdown files directly and recording patches at the end of the day, and syncing the HTML cache with my host (at the time, a personal directory on code.haskell.org). Eventually I got tired of that and figured that since I wasn’t using the wiki, but only the static compiled pages, I might as well switch to Hakyll and a normal static website approach.

  • jQuery sausages: unhelpful UI visualization of section lengths.

    A UI experiment, ‘sausages’ add a second scroll bar where vertical lozenges correspond to each top-level section of the page; it indicates to the reader how long each section is and where they are. (They look like a long link of pale white sausages.) I thought it might assist the reader in positioning themselves, like the popular ‘floating highlighted Table of Contents’ UI element, but without text labels, the sausages were meaningless. After a jQuery upgrade broke it, I didn’t bother fixing it.

  • Beeline Reader: a ‘reading aid’ which just annoyed readers.

    BLR tries to aid reading by coloring the beginnings & endings of lines to indicate the continuation and make it easier for the reader’s eyes to saccade to the correct next line without distraction (apparently dyslexic readers in particular have issues correctly fixating on the continuation of a line). The A/B test indicated no improvements in the time-on-page metric, and I received many complaints about it; I was not too happy with the browser performance or the appearance of it, either.

    I’m sympathetic to the goal and think syntax-highlighting aids are underused, but BLR was a bit half-baked and not worth the cost compared to more straightforward interventions like reducing paragraph lengths or more rigorous use of ‘structural reading’ formatting. (We may be able to do typography very differently in the future with new technology, like VR/AR headsets which come with technology intended for —forget simple tricks like emphasizing the beginning of the next line as the reader reaches the end of the current line; do we need ‘lines’ at all if we can do things like just-in-time display the next piece of text in-place to create an ‘infinite line’?)

  • Google Custom Search Engine: site search feature which too few people used.

    A ‘custom search engine’, a CSE is a souped-up site:gwern.net/ Google search query; I wrote one covering gwern.net and some of my accounts on other websites, and added it to the sidebar. Checking the analytics, perhaps 1 in 227 page-views used the CSE, and a decent number of them used it only by accident (eg searching “e”); an A/B test for a feature used so little would be powerless, and so I removed it rather than try to formally test it.

  • Tufte-CSS sidenotes: fundamentally broken, and superseded.

    An early admirer of Tufte-CSS for its sidenotes, I gave a Pandoc plugin a try only to discover a terrible drawback: the CSS didn’t support block elements & so the plugin simply deleted them. This bug apparently can be fixed, but the density of footnotes led to using sidenotes.js instead.

  • DjVu document format use: DjVu is a space-efficient document format with the fatal drawback that Google ignores it, and “if it’s not in Google, it doesn’t exist.”

    DjVu is a document format superior to PDFs, especially standard PDFs: I discovered that space savings of 5× or more were entirely possible, so I used it for most of my book scans. It worked fine in my document viewers, Internet Archive & Libgen preferred them, and so why not? Until one day I wondered if anyone was linking them and tried searching in Google Scholar for some. Not a single hit! (As it happens, GS seems to specifically filter out books.) Perplexed, I tried Google—also nothing. Huh‽ My scans have been visible for years, DjVu dates back to the 1990s and was widely used (if not remotely as popular as PDF), and G/GS picks up all my PDFs which are hosted identically. What about filetype:djvu? I discovered to my horror that on the entire Internet, Google indexed about 50 DjVu files. Total. While apparently at one time Google did index DjVu files, that time must be long past.

    Loath to take the space hit, which would noticeably increase my Amazon AWS S3 hosting costs, I looked into PDFs more carefully. I discovered PDF technology had advanced considerably over the default PDFs that gscan2pdf generates, and with compression, they were closer to DjVu in size; I could conveniently generate such PDFs using ocrmypdf.39 (A sketch of such an invocation appears after this list.) This let me convert over at moderate cost and now my documents do show up in Google.

  • Darcs/Github repo: no useful contributions or patches submitted, added considerable process overhead, and I accidentally broke the repo by checking in too-large PDFs from a failed post-DjVu optimization pass (I misread the result as being smaller, when it was much larger).

  • spaces in URLs: an OK idea but users are why we can’t have nice things.

    Gitit assumed ‘titles = filenames = URLs’, which simplified things and I liked space-separated filenames; I carried this over to Hakyll, but gradually, by monitoring analytics, realized this was a terrible mistake—as straightforward as URL-encoding spaces as %20 may seem to be, no one can do it properly. I didn’t want to fix it because by the time I realized how bad the problem was, it would have required breaking, or later on, redirecting, hundreds of URLs and updating all my pages. The final straw was when The Browser linked a page incorrectly, sending ~1500 people to the 404 page. I finally gave in and replaced spaces with hyphens. (Underscores are the other main option but because of Markdown, I worry that trades one error for another.) I suspect I should have also lower-cased all links while I was at it, but thus far it has not proven too hard to fix case errors & lower-case URLs are ugly.

    In retrospect, Sam Hughes was right: I should have made URLs as simple as possible (and then a bit simpler): a single word, lowercase alphanum, with no hyphens or underscores or spaces or punctuation of any sort. I am, however, locked in to longer hyphen-separated mixed-case URLs now.

  • banner ads (and ads in general): reader-hostile and probably a net financial loss.

    I hated running banner ads, but before my Patreon began working, it seemed the lesser of two evils. As my finances became less parlous, I became curious as to how much lesser—but I could find no Internet research whatsoever measuring something as basic as the traffic loss due to advertising! So I decided to measure it myself, with a proper sample size and cost-benefit analysis; the harm turned out to be so large that the analysis was unnecessary, and I removed AdSense permanently the first time I saw the results. Given the measured traffic reduction, I was probably losing several times more in potential donations than I ever earned from the ads. (Amazon affiliate links appear to not trigger this reaction, and so I’ve left them alone.)

  • Bitcoin/PayPal/Gittip/Flattr do­na­tion links: never worked well com­pared to Pa­tre­on.

    These methods were either single-shot or never hit a critical mass. One-off donations failed because people wouldn’t make a habit of donating when it took manual effort, and it was too inconvenient. Gittip/Flattr were similar to Patreon in bundling donors and making donation a regular thing, but never hit an adequate scale.

  • web fonts: slow and bug­gy.

    Google Fonts turned out to in­tro­duce no­tice­able la­tency in page ren­der­ing; fur­ther, its se­lec­tion of fonts is lim­it­ed, and the fonts out­dated or in­com­plete. We got both faster and nicer-look­ing pages by tak­ing the mas­ter Github ver­sions of Adobe Source Serif/Sans Pro (the Google Fonts ver­sion was both out­dated & in­com­plete then) and sub­set­ting them for gw­ern.net specifi­cal­ly.

  • JS math rendering: switched to static rendering during compilation for speed.

    For math rendering, MathJax and its alternatives are reasonable options (inasmuch as native MathML browser adoption is dead in the water). MathJax rendering is extremely slow on some pages: up to 6 seconds to load and render all the math. Not a great reading experience. When I learned that it was possible to preprocess MathJax-using pages, I dropped MathJax JS use the same day.

  • <q> quote tags for English quotations: divisive and a maintenance burden.

    I like the idea of treating English as a little (not a lot!) more like a formal language, such as a programming language, as it comes with benefits like syntax highlighting. In a program, the reader gets guidance from syntax highlighting indicating the logical nesting and structure of the ‘argument’; in a natural language document, it’s one damn letter after another, spiced up with the occasional punctuation mark or indentation. (If Lisp looks like “oatmeal with fingernail clippings mixed in” due to its lack of syntax, then English must be plain oatmeal!) One of the most basic kinds of syntax highlighting is simply highlighting strings and other literals vs code: I learned early on that syntax highlighting was worth it just to make sure you hadn’t forgotten a quote or parenthesis somewhere! The same is true of regular writing: if you are extensively quoting or naming things, the reader can get a bit lost in the thickets of curly quotes and be unsure who said what.

    I discovered an obscure HTML tag enabled by an obscurer Pandoc setting (--html-q-tags): the quote tag <q>, which replaces quote characters and is rendered by the browser as quotes (usually). Quote tags are parsed explicitly, rather than just being opaque natural-language text blobs, and so they, at least, can be manipulated easily by JS/CSS and syntax-highlighted. Anything inside a pair of quotes would be tinted gray to visually set it off, similarly to the blockquotes. I was proud of this tweak, which I’ve seen nowhere else.

    The problems with it were that not everyone was a fan (to say the least); it was not always correct (there are many double-quotes which are not literal quotes of anything, like rhetorical questions); and it interacted badly with everything else. The HTML/CSS/JS all had to be constantly rejiggered to deal with interactions with quotes, browser updates would silently break what was working, and Said Achmiz hated the look. I tried manually annotating quotes to ensure they were all correct and not used in dangerous ways, but even with interactive regexp search-and-replace to assist, the manual toil of constantly marking up quotes was a major obstacle to writing. So I gave in.

  • rubrication (red emphasis): a solution in search of a problem.

    Red emphasis is a visual strategy that works wonderfully well for many styles, but not, as far as I could find, for gwern.net. Using it on the regular website resulted in too much emphasis, and the lack of color anywhere else made the design inconsistent; we tried using it in dark mode to add some color & preserve night vision by making headers/links/drop-caps red, but it looked like “a vampire fansite”, as one reader put it. It is a good idea, but we just haven’t found a use for it. (Perhaps if I ever make another website, it will be designed around rubrication.)

  • wikipedia-popups.js: a JS li­brary writ­ten to im­i­tate Wikipedia pop­ups, which used the WP API to fetch ar­ti­cle sum­maries; ob­so­leted by the faster & more gen­eral lo­cal sta­tic link an­no­ta­tions.

    I dis­liked the de­lay and as I thought about it, it oc­curred to me that it would be nice to have pop­ups for other web­sites, like Arxiv/BioRxiv links—but they did­n’t have APIs which could be queried. If I fixed the first prob­lem by fetch­ing WP ar­ti­cle sum­maries while com­pil­ing ar­ti­cles and in­lin­ing them into the page, then there was no rea­son to in­clude sum­maries for only Wikipedia links, I could get sum­maries from any tool or ser­vice or API, and I could of course write my own! But that re­quired an al­most com­plete rewrite to turn it into popups.js.

  • link screen­shot pre­views: au­to­matic screen­shots too low-qual­i­ty, and un­pop­u­lar.

    To compensate for the lack of summaries for almost all links (even after I wrote the code to scrape various sites), I tried a feature I had seen elsewhere of ‘link previews’: small thumbnail-sized screenshots of a web page or PDF, loaded via JS when the mouse hovered over a link. (They were much too large, ~50kb, to inline statically like the link annotations.) They gave some indication of what the target content was, and could be generated automatically using a headless browser. I used Chromium’s built-in screenshot mode for web pages, and took the first page of PDFs.

    The PDFs worked fine, but the web­pages often broke: thanks to ads, newslet­ters, and the GDPR, count­less web­pages will pop up some sort of gi­ant modal block­ing any view of the page con­tent, de­feat­ing the point. (I have ex­ten­sions in­stalled like Al­waysKill­Sticky to block that sort of spam, but Chrome screen­shot can­not use any ex­ten­sions or cus­tomized set­tings, and the Chrome devs refuse to im­prove it.) Even when it did work and pro­duced a rea­son­able screen­shot, many read­ers dis­liked it any­way and com­plained. I was­n’t too happy ei­ther about hav­ing 10,000 tiny PNGs hang­ing around. So as I ex­panded link an­no­ta­tions steadi­ly, I fi­nally pulled the plug on the link pre­views. Too much for too lit­tle.

    • Link Archiving: my link archiving improved on the link screenshots in several ways. First, SingleFile saves pages inside a normal Chromium browsing instance, which does support extensions and user settings. Killing stickies alone eliminates half the bad archives, ad-block extensions eliminate a chunk more, and NoScript blacklists specific domains. (I initially used NoScript on a whitelist basis, but disabling JS breaks too many websites these days.) Finally, I decided to manually review every snapshot before it went live, to catch bad examples and either fix them by hand or add them to the blacklist.

  • auto dark mode: a good idea but users are why we can’t have nice things.

    OSes/browsers have defined a ‘global dark mode’ toggle the user can set if they want dark mode everywhere, and this preference is exposed to web pages (via the prefers-color-scheme media query); if you are implementing a dark mode for your website, it then seems natural to just make it a feature and turn it on iff the toggle is on. There is no need for complicated UI-cluttering widgets. And yet, if you do that, users will regularly complain about the website acting bizarre or being dark in the daytime, having apparently forgotten that they enabled it (or never understood what that setting meant).

    A wid­get is nec­es­sary to give read­ers con­trol, al­though even there it can be screwed up: many web­sites set­tle for a sim­ple nega­tion switch of the global tog­gle, but if you do that, some­one who sets dark mode at day will be ex­posed to blind­ing white at night… Our wid­get works bet­ter than that. Most­ly.

  • mul­ti­-col­umn foot­notes: mys­te­ri­ously buggy and yield­ing over­laps.

    Since most foot­notes are short, and no one reads the end­note sec­tion, I thought ren­der­ing them as two columns, as many pa­pers do, would be more space-effi­cient and tidy. It was a good idea, but it did­n’t work.

  • Hy­phe­nop­oly: it turned out to be more effi­cient (and not much harder to im­ple­ment) to hy­phen­ate the HTML dur­ing com­pi­la­tion than to run JS clientside.

    To work around Google Chrome’s 2-decade-long refusal to ship hyphenation dictionaries on desktop and enable CSS hyphenation (and incidentally use the better TeX hyphenation algorithm), the JS library Hyphenopoly will download the TeX English dictionary and typeset a webpage itself. While the performance cost was surprisingly minimal, it was there, and it caused problems with obscurer browsers like Internet Explorer.

    So we scrapped Hyphenopoly, and I later implemented a Hakyll function using a Haskell implementation of the TeX hyphenation algorithm & dictionary to insert at compile-time a soft hyphen everywhere a browser could usefully break a word, which enables Chrome to hyphenate correctly, at the moderate cost of inlining them and a few edge cases.40
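
    A toy version of that compile-time pass, using the Haskell hyphenation package (a Knuth/Liang implementation); the real Hakyll code walks the Pandoc AST and skips code spans & URLs, while this sketch just processes bare words:

        import Data.List (intercalate)
        import Text.Hyphenation (english_US, hyphenate)

        -- Insert a soft hyphen (U+00AD) at every legal break point of one word.
        softHyphenate :: String -> String
        softHyphenate = intercalate "\173" . hyphenate english_US

        -- Toy driver over the whitespace-separated words of a plain string.
        softHyphenateText :: String -> String
        softHyphenateText = unwords . map softHyphenate . words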

  • autopager keyboard shortcuts: binding Home/PgUp & End/PgDwn to go to the ‘previous’/‘next’ logical page turned out to be glitchy & confusing.

    HTML supports rel="prev"/rel="next" link metadata which specifies what URL is the logical next or previous page, which makes sense in many contexts like manuals or webcomics or series of essays; browsers make little use of this metadata, typically not even to preload the next page! (Opera apparently was one of the few exceptions.)

    Such metadata was typically available in older hypertext systems by default, and so older, more reader-oriented interfaces such as pre-Web hypertext browsers (e.g. GNU Info readers) frequently overloaded the standard page-up/down keybindings to, if one was already at the beginning/ending of a hypertext node, go to the logical previous/next node. This was convenient, since it made paging through a long series of info nodes fast, almost as if the entire info manual were a single long page, and it was easy to discover: most users will accidentally tap them twice at some point, either reflexively or by not realizing they were already at the top/bottom (as is the case on most info nodes, due to their egregious shortness). In comparison, navigating the HTML version of an info manual is frustrating: not only do you have to use the mouse to page through potentially dozens of 1-paragraph pages, each page takes noticeable time to load (because of failure to exploit preloading), whereas a local info browser is instantaneous.

    After defin­ing a global se­quence for Gw­ern.net pages, and adding a ‘navbar’ to the bot­tom of each page with previous/next HTML links en­cod­ing that se­quence, I thought it’d be nice to sup­port con­tin­u­ous scrolling through Gw­ern.net, and wrote some JS to de­tect whether at the top/bottom of page, and on each Home/PgUp/End/PgDwn, whether that key had been pressed in the pre­vi­ous 0.5s, and if so, pro­ceed to the previous/next page.

    This worked, but proved buggy and opaque in practice, and tripped up even me occasionally. Since so few people know about that pre-WWW hypertext UI pattern (as useful as it is), and they would be unlikely to discover it, or use it much if they did discover it, I removed it.

Tools

Soft­ware tools & li­braries used in the site as a whole:

  • The source files are written in Pandoc Markdown (Pandoc: John MacFarlane et al; GPL) (source files: Gwern Branwen, CC-0). The Pandoc Markdown uses a number of extensions; pipe tables are preferred for anything but the simplest tables; and I use semantic linefeeds (also called “semantic line breaks” or “ventilated prose”) formatting.

  • math is written in LaTeX, which compiles to MathML, rendered by MathJax (Apache)

  • the site is compiled with the Hakyll v4+ static site generator, used to generate Gwern.net, written in Haskell (Jasper Van der Jeugt et al; BSD); for the gory details, see hakyll.hs which implements the compilation, RSS feed generation, & parsing of interwiki links as well. This just generates the basic website; I do many additional optimizations/tests before & after uploading, which is handled by sync-gwern.net.sh (Gwern Branwen, CC-0)

    My preferred method of use is to browse & edit locally using Emacs, and then distribute using Hakyll. The simplest way to use Hakyll is to cd into your repository and run runhaskell hakyll.hs build (with hakyll.hs having whatever options you like). Hakyll will build a static HTML/CSS hierarchy inside _site/; you can then do something like firefox _site/index. (Because HTML extensions are not specified in the interest of cool URIs, you cannot use the Hakyll watch webserver as of January 2014.) Hakyll’s main advantage for me is relatively straightforward integration with the Pandoc Markdown libraries; Hakyll is not that easy to use, and so I do not recommend use of Hakyll as a general static site generator unless one is already adept with Haskell.
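
    For readers who have never seen Hakyll, a stripped-down hakyll.hs looks roughly like the following (the patterns and rules here are illustrative, not the real configuration, which is far longer and adds the Markdown extensions described below):

        {-# LANGUAGE OverloadedStrings #-}
        import Hakyll

        main :: IO ()
        main = hakyll $ do
          -- static assets (CSS, fonts, images, PDFs…) are copied through unchanged
          match "static/**" $ do
            route   idRoute
            compile copyFileCompiler
          -- Markdown pages are compiled by Pandoc to extensionless 'cool URI' paths
          match "**.page" $ do
            route   (setExtension "")
            compile (pandocCompiler >>= relativizeUrls)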

  • the CSS is bor­rowed from a mot­ley of sources and has been heav­ily mod­i­fied, but its ori­gin was the Hakyll home­page & Gi­tit; for specifics, see default.css

  • Mark­down ex­ten­sions:

    • I implemented a Pandoc Markdown plugin for a custom syntax for interwiki links in Gitit, and then ported it to Hakyll (defined in hakyll.hs); it allows linking to the English Wikipedia (among others) with syntax like [malefits](!Wiktionary) or [antonym of 'benefits'](!Wiktionary "Malefits"). CC-0. (A simplified sketch of such a rewrite appears after this list.)
    • inflation adjustment: a Pandoc Markdown plugin which allows automatic inflation-adjusting of dollar amounts, presenting the nominal amount & a current real amount, with a syntax like [$5]($1980).
    • Book affiliate links are handled through an affiliate tag appended in hakyll.hs
    • im­age di­men­sions are looked up at com­pi­la­tion time & in­serted into <img> tags as browser hints
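
    To give the flavor of these extensions, here is a toy version of the interwiki rewrite written as a Pandoc walk (much simplified from the real hakyll.hs plugin: it handles only two wikis, assumes Pandoc’s Text-based API, and all the helper names are mine):

        {-# LANGUAGE OverloadedStrings #-}
        import qualified Data.Text as T
        import Text.Pandoc.Definition (Inline (Link, Str), Pandoc)
        import Text.Pandoc.Walk (walk)

        -- Rewrite [text](!Wikipedia) / [text](!Wiktionary "Article") links into
        -- ordinary absolute URLs, using the title if given, else the link text.
        interwiki :: Pandoc -> Pandoc
        interwiki = walk expand
          where
            expand link@(Link attr label (url, title)) =
              case T.stripPrefix "!" url of
                Nothing   -> link
                Just site ->
                  let article = if T.null title then plainText label else title
                      base = case site of
                               "Wiktionary" -> "https://en.wiktionary.org/wiki/"
                               _            -> "https://en.wikipedia.org/wiki/"
                  in Link attr label (base <> T.replace " " "_" article, "")
            expand x = x
            plainText = T.concat . map inlineText   -- crude stringification
            inlineText (Str t) = t
            inlineText _       = " "
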
  • JavaScript:

    • Comments are outsourced to Disqus (since I am not interested in writing a dynamic system to do it, and their anti-spam techniques are much better than mine).
    • the float­ing foot­notes are via footnotes.js (Lukas Math­is, PD); when the browser win­dow is wide enough, the float­ing foot­notes are in­stead re­placed with mar­ginal notes/side­notes41 us­ing a cus­tom li­brary, sidenotes.js (Said Achmiz, MIT)
    Demon­stra­tion of side­notes on .
    • the HTML ta­bles are sortable via ta­ble­sorter (Chris­t­ian Bach; MIT/GPL)
    • the MathML is rendered using MathJax
    • analytics are handled by Google Analytics
    • A/B testing is done using ABalytics (Daniele Mazzini; MIT), which hooks into Google Analytics for individual-level testing; when doing site-level long-term testing, I simply write the JS manually.
    • popups.js: for loading introductions/summaries of all links when one mouses over a link; reads statically-generated annotations automatically populated from many sources (Wikipedia, Pubmed, BioRxiv, Arxiv, hand-written…), with special handling of YouTube videos (Said Achmiz, Shawn Presser; MIT)
    • im­age size: ful­l-s­cale im­ages (fig­ures) can be clicked on to zoom into them with slideshow mod­e—use­ful for fig­ures or graphs which do not com­fort­ably fit into the nar­row body—us­ing an­other cus­tom li­brary, image-focus.js (Said Achmiz; GPL)
  • er­ror check­ing: prob­lems such as bro­ken links are checked in 3 phas­es:

    • markdown-lint.sh: writ­ing time
    • sync-gwern.net.sh: dur­ing com­pi­la­tion, san­i­ty-checks file size & count; greps for bro­ken in­ter­wik­is; runs HTML tidy over pages to warn about in­valid HTML; tests live­ness & MIME types of var­i­ous pages post-u­pload; checks for du­pli­cates, read­-on­ly, banned file­types, too large or un­com­pressed im­ages, etc.
    • periodically, after upload: linkchecker, ArchiveBox, and archiver-bot

Implementation Details

There are a num­ber of lit­tle tricks or de­tails that web de­sign­ers might find in­ter­est­ing.

Effi­cien­cy:

  • fonts:

    • Adobe Source Serif/Sans/Code Pro: originally Gwern.net used Baskerville, but system Baskerville fonts don’t have adequate small caps. Adobe’s open-source “Source” font family of screen serifs, however, is high quality and comes with good small caps, multiple sets of numerals (‘old-style’ numbers for the body text and different numbers for tables), and looks particularly nice on Macs. (It is also subsetted to cut down the load time.) Small-caps CSS is automatically added to abbreviations/acronyms/initials by a Hakyll/Pandoc plugin, to avoid manual annotation (a toy version of that pass is sketched after this list of font details).

    • effi­cient drop caps by sub­set­ting: 1 drop cap is used on every page, but a typ­i­cal drop cap font will slowly down­load as much as a megabyte in or­der to ren­der 1 sin­gle let­ter.

      CSS font loads avoid down­load­ing font files which are en­tirely un­used, but they must down­load the en­tire font file if any­thing in it is used, so it does­n’t mat­ter that only one let­ter gets used. To avoid this, we split each drop cap font up into a sin­gle font file per let­ter and use CSS to load all the font files; since only 1 font file is used at all, only 1 gets down­load­ed, and it will be ~4kb rather than 168kb. This has been done for all the drop cap fonts used (yinit, Cheshire Ini­tials, Deutsche Zier­schrift, Goudy Ini­tialen, Kan­zlei Ini­tialen), and the nec­es­sary CSS can be seen in fonts.css. To spec­ify the drop cap for each page, a Hakyll meta­data field is used to pick the class and sub­sti­tuted into the HTML tem­plate.
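
    The small-caps pass mentioned above can be sketched as another Pandoc walk (a toy version, not the real plugin: it only catches bare runs of capital letters and assumes Pandoc’s Text-based API):

        {-# LANGUAGE OverloadedStrings #-}
        import Data.Char (isUpper)
        import qualified Data.Text as T
        import Text.Pandoc.Definition (Inline (Span, Str), Pandoc)
        import Text.Pandoc.Walk (walk)

        -- Wrap words of >=2 capital letters (eg. "DNB", "HTML") in a span which
        -- the site CSS then renders in small caps.
        smallcapsify :: Pandoc -> Pandoc
        smallcapsify = walk go
          where
            go str@(Str t)
              | T.length t >= 2 && T.all isUpper t = Span ("", ["smallcaps"], []) [str]
            go x = x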

  • lazy JavaScript load­ing by In­ter­sec­tionOb­server: sev­eral JS fea­tures are used rarely or not at all on many pages, but are re­spon­si­ble for much net­work ac­tiv­i­ty. For ex­am­ple, most pages have no ta­bles but ta­ble­sorter must be loaded any­way, and many read­ers will never get all the way to the Dis­qus com­ments at the bot­tom of each page, but Dis­qus will load any­way, caus­ing much net­work ac­tiv­ity and dis­turb­ing the reader be­cause the page is not ‘fin­ished load­ing’ yet.

    To avoid this, In­ter­sec­tionOb­server can be used to write a small JS func­tion which fires only when par­tic­u­lar page el­e­ments are vis­i­ble to the read­er. The JS then loads the li­brary which does its thing. So an In­ter­sec­tionOb­server can be de­fined to fire only when an ac­tual <table> el­e­ment be­comes vis­i­ble, and on pages with no ta­bles, this never hap­pens. Sim­i­larly for Dis­qus and image-focus.js. This trick is a lit­tle dan­ger­ous if a li­brary de­pends on an­other li­brary be­cause the load­ing might cause race con­di­tions; for­tu­nate­ly, only 1 li­brary, ta­ble­sorter, has a pre­req­ui­site, jQuery, so I sim­ply prepend jQuery to ta­ble­sorter and load ta­ble­sorter. (Other li­braries, like side­notes or WP pop­ups, aren’t lazy-loaded be­cause side­notes need to be ren­dered as fast as pos­si­ble or the page will jump around & be lag­gy, and WP links are so uni­ver­sal it’s a waste of time mak­ing them lazy since they will be in the first screen on every page & be loaded im­me­di­ately any­way, so they are sim­ply loaded asyn­chro­nously with the defer JS key­word.)

  • image optimization: PNGs are optimized by pngnq/advpng, JPEGs with mozjpeg, SVGs are minified, PDFs are compressed with ocrmypdf’s JBIG2 support. (GIFs are not used at all in favor of WebM/MP4 <video>s.)

  • JS/CSS mini­fi­ca­tion: be­cause Cloud­flare does Brotli com­pres­sion, mini­fi­ca­tion of JS/CSS has lit­tle ad­van­tage and makes de­vel­op­ment hard­er, so no mini­fi­ca­tion is done; the font files don’t need any spe­cial com­pres­sion ei­ther.

  • MathJax: getting well-rendered mathematical equations requires MathJax or a similar heavyweight JS library; worse, even after disabling features, the load & render time is extremely high: a page which is both large & has a lot of equations can visibly take >5s to finish rendering (as a progress bar that helpfully pops up informs the reader).

    The so­lu­tion here is to pre­ren­der Math­Jax lo­cally after Hakyll com­pi­la­tion, us­ing the lo­cal tool mathjax-node-page to load the fi­nal HTML files, parse the page to find all the math, com­pile the ex­pres­sions, de­fine the nec­es­sary CSS, and write the HTML back out. Pages still need to down­load the fonts but the over­all speed goes from >5s to <0.5s, and JS is not nec­es­sary at all.
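
    The driver for that step is simple: after Hakyll finishes, loop over the compiled HTML files and filter each one through mathjax-node-page’s command-line interface (installed, if I recall the package correctly, as mjpage). A sketch which ignores shell-quoting of odd filenames:

        import System.Process (callCommand)

        -- Typeset the math in one compiled page in place by piping it through mjpage.
        prerenderMath :: FilePath -> IO ()
        prerenderMath page =
          callCommand ("mjpage < " ++ page ++ " > " ++ page ++ ".tmp && mv " ++ page ++ ".tmp " ++ page)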

  • collapsible sections: managing the complexity of pages is a balancing act. It is good to provide all necessary code to reproduce results, but does the reader really want to look at a big block of code? Sometimes all readers would; sometimes only a few readers interested in the gory details will want to read the code. Similarly, a section might go into detail on a tangential topic or provide additional justification, which most readers don’t want to plow through to continue with the main theme. Should the code or section be deleted? No. But relegating it to an appendix, or another page entirely, is not satisfactory either: for code blocks particularly, one loses the literate-programming aspect if code blocks are being shuffled around out of order.

    A nice solution is to simply use a little JS to implement an approach where sections or code blocks can be visually shrunk or collapsed, and expanded on demand by a mouse click. Collapsed sections are specified by an HTML class (eg <div class="collapse"></div>), and summaries of a collapsed section can be displayed, defined by another class (<div class="collapseSummary">). This allows code blocks to be collapsed by default where they are lengthy or distracting, and entire regions to be collapsed & summarized, without resorting to many appendices or forcing the reader to an entirely separate page.

  • sidenotes: one might wonder why sidenotes.js is necessary when most sidenote uses are like Tufte-CSS and use a static HTML/CSS approach, which would avoid a JS library entirely and avoid visibly repainting the page after load?

    The problem is that Tufte-CSS-style sidenotes do not reflow and sit solely in the right margin (wasting the considerable whitespace on the left), and, depending on the implementation, may overlap, be pushed far down the page away from their referent, break when the browser window is too narrow, or not work on smartphones/tablets at all. The JS library is able to handle all these cases, including the most difficult ones like my annotated edition of Radiance. (Tufte-CSS-style epigraphs, however, pose no such problems, and we take the same approach of defining an HTML class & styling with CSS.)

  • Link icons: icons are defined for all filetypes used in Gwern.net and many commonly-linked websites such as Wikipedia, Gwern.net (within-page section links and between-page links get ‘§’ & logo icons respectively), or YouTube; all are inlined into default.css as data URIs; the SVGs are so small it would be absurd to have them be separate files.

  • Redirects: static sites have trouble with redirects, as they are just static files. AWS S3 does not support a .htaccess-like mechanism for rewriting URLs. To allow moving pages & fixing broken links, I wrote Hakyll.Web.Redirect for generating simple HTML pages with redirect metadata+JS, which simply redirect from URL 1 to URL 2. After moving to Nginx hosting, I converted all the redirects to regular rewrite rules.

    In ad­di­tion to page re­names, I mon­i­tor 404 hits in Google An­a­lyt­ics to fix er­rors where pos­si­ble, and Ng­inx logs. There are an as­ton­ish­ing num­ber of ways to mis­spell Gw­ern.net URLs, it turns out, and I have de­fined >10k redi­rects so far (in ad­di­tion to generic reg­exp rewrites to fix pat­terns of er­rors).
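
    For the static-site era, the Hakyll side of this was little more than a list fed to a redirect generator; roughly, using the createRedirects API from Hakyll.Web.Redirect as it exists in current Hakyll releases (the example paths here are made up):

        {-# LANGUAGE OverloadedStrings #-}
        import Hakyll
        import Hakyll.Web.Redirect (createRedirects)

        -- Each pair is (stub page to generate, URL it should redirect to);
        -- the paths are illustrative, not real Gwern.net redirects.
        redirects :: Rules ()
        redirects = createRedirects
          [ ("docs/old-name.html", "/docs/new-name")
          , ("About.html",         "/about")
          ]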

Benford’s law

Does Gw­ern.net fol­low the fa­mous Ben­ford’s law? A quick analy­sis sug­gests that it sort of does, ex­cept for the digit 2, prob­a­bly due to the many ci­ta­tions to re­search from the past 2 decades (>2000 AD).

In March 2013 I wondered, upon seeing a mention of Benford’s law: “if I extracted all the numbers from everything I’ve written on Gwern.net, would it satisfy Benford’s law?” It seems the answer is… almost. I generate the list of numbers by running a Haskell program to parse digits, commas, and periods, and then I process it with shell utilities.42 This can then be read into R to run a chi-squared test confirming the lack of fit (p ≈ 0) and generate this comparison of the data & Benford’s law43:

Histogram/barplot of parsed num­bers vs pre­dicted

There’s a clear re­sem­blance for every­thing but the digit ‘2’, which then blows the fit to heck. I have no idea why 2 is over­rep­re­sent­ed—it may be due to all the ci­ta­tions to re­cent aca­d­e­mic pa­pers which would in­volve num­bers start­ing with ‘2’ (2002, 2010, 2013…) and cause a dou­ble-count in both the ci­ta­tion and file­name, since if I look in the docs/ full­text fold­er, I see 160 files start­ing with ‘1’ but 326 start­ing with ‘2’. But this can’t be the en­tire ex­pla­na­tion since ‘2’ has 20.3k en­tries while to fit Ben­ford, it needs to be just 11.5k—leav­ing a gap of ~10k num­bers un­ex­plained. A mys­tery.

License

This site is li­censed un­der the pub­lic do­main (C­C-0) li­cense.

I be­lieve the pub­lic do­main li­cense re­duces and 44, en­cour­ages copy­ing (), gives back (how­ever lit­tle) to /, and costs me noth­ing45.


  1. , pg 19 of Russ­ian Sil­hou­ettes, on why he wrote his book of bi­o­graph­i­cal sketches of great So­viet chess play­ers. (As Richard­son asks (Vec­tors 1.0, 2001): “25. Why would we write if we’d al­ready heard what we wanted to hear?”)↩︎

  2. One danger of such an approach is that you will simply engage in confirmation bias, and build up an impressive-looking wall of citations that is completely wrong but effective in brainwashing yourself. The only solution is to be diligent about including criticism, so even if you do not escape brainwashing, at least your readers have a chance. Darwin, 1902:

    I had, al­so, dur­ing many years fol­lowed a golden rule, name­ly, that when­ever a pub­lished fact, a new ob­ser­va­tion or thought came across me, which was op­posed to my gen­eral re­sults, to make a mem­o­ran­dum of it with­out fail and at on­ce; for I had found by ex­pe­ri­ence that such facts and thoughts were far more apt to es­cape from the mem­ory than favourable ones. Ow­ing to this habit, very few ob­jec­tions were raised against my views which I had not at least no­ticed and at­tempted to an­swer.

    ↩︎
  3. “It is only the at­tempt to write down your ideas that en­ables them to de­vel­op.” –Wittgen­stein (pg 109, Rec­ol­lec­tions of Wittgen­stein); “I thought a lit­tle [while in the iso­la­tion tank], and then I stopped think­ing al­to­geth­er…in­cred­i­ble how idle­ness of body leads to idle­ness of mind. After 2 days, I’d turned into an id­iot. That’s the rea­son why, dur­ing a flight, as­tro­nauts are al­ways kept busy.” –O­ri­ana Fal­laci, quoted in Rocket Men: The Epic Story of the First Men on the Moon by Craig Nel­son.↩︎

  4. Such as uni­verse; con­sider the in­tro­duc­tion to the chrono­log­i­cally last story in that set­ting, “Safe at Any Speed” (Tales of Known Space).↩︎

  5. :

    “If the in­di­vid­ual lived five hun­dred or one thou­sand years, this clash (be­tween his in­ter­ests and those of so­ci­ety) might not ex­ist or at least might be con­sid­er­ably re­duced. He then might live and har­vest with joy what he sowed in sor­row; the suffer­ing of one his­tor­i­cal pe­riod which will bear fruit in the next one could bear fruit for him too.”

    ↩︎
  6. From Ag­ing and Old Age:

    One way to dis­tin­guish em­pir­i­cally be­tween ag­ing effects and prox­im­i­ty-to-death effects would be to com­pare, with re­spect to choice of oc­cu­pa­tion, in­vest­ment, ed­u­ca­tion, leisure ac­tiv­i­ties, and other ac­tiv­i­ties, el­derly peo­ple on the one hand with young or mid­dle-aged peo­ple who have trun­cated life ex­pectan­cies but are in ap­par­ent good health, on the oth­er. For ex­am­ple, a per­son newly in­fected with the AIDS virus (HIV) has roughly the same life ex­pectancy as a 65-year-old and is un­likely to have, as yet, [ma­jor] symp­toms. The con­ven­tional hu­man-cap­i­tal model im­plies that, after cor­rec­tion for differ­ences in in­come and for other differ­ences be­tween such per­sons and el­derly per­sons who have the same life ex­pectancy (a big differ­ence is that the for­mer will not have pen­sion en­ti­tle­ments to fall back up­on), the be­hav­ior of the two groups will be sim­i­lar. It does ap­pear to be sim­i­lar, so far as in­vest­ing in hu­man cap­i­tal is con­cerned; the trun­ca­tion of the pay­back pe­riod causes dis­in­vest­ment. And there is a high sui­cide rate among HIV-infected per­sons (even be­fore they have reached the point in the pro­gres­sion of the dis­ease at which they are clas­si­fied as per­sons with AIDS), just as there is, as we shall see in chap­ter 6, among el­derly per­sons.

    ↩︎
  7. John F. Kennedy, 1962:

    I am re­minded of the story of the great French Mar­shal Lyautey, who once asked his gar­dener to plant a tree. The gar­dener ob­jected that the tree was slow-grow­ing and would not reach ma­tu­rity for a hun­dred years. The Mar­shal replied, “In that case, there is no time to lose, plant it this after­noon.”

    ↩︎
  8. , “Free­dom 0”:

    In the long run, the util­ity of all non-Free soft­ware ap­proaches ze­ro. All non-Free soft­ware is a dead end.

    ↩︎
  9. These de­pen­den­cies can be sub­tle. Com­puter archivist Ja­son Scott writes of ser­vices that:

    URL short­en­ers may be one of the worst ideas, one of the most back­ward ideas, to come out of the last five years. In very re­cent times, per-site short­en­ers, where a web­site reg­is­ters a smaller ver­sion of its host­name and pro­vides a sin­gle small link for a more com­pli­cated piece of con­tent within it… those are fine. But these gen­er­al-pur­pose URL short­en­ers, with their shady or frag­ile se­tups and ut­ter de­pen­dence upon them, well. If we lose or , mil­lions of weblogs, es­says, and non-archived tweets lose their mean­ing. In­stant­ly. To some­one in the fu­ture, it’ll be like every­one from a cer­tain era of his­to­ry, say ten years of the 18th cen­tu­ry, started speak­ing in a one-time pad of cryp­to­graphic pass phras­es. We’re do­ing our best to stop it. Some of the short­en­ers have been help­ful, oth­ers have been hos­tile. A num­ber have died. We’re go­ing to re­lease tor­rents on a reg­u­lar ba­sis of these spread­sheets, these code break­ing spread­sheets, and we hope oth­ers do too.

    ↩︎
  10. re­marks (and the com­ments pro­vide even more ex­am­ples) fur­ther on URL short­en­ers:

    But the biggest bur­den falls on the click­er, the per­son who fol­lows the links. The ex­tra layer of in­di­rec­tion slows down brows­ing with ad­di­tional DNS lookups and server hits. A new and po­ten­tially un­re­li­able mid­dle­man now sits be­tween the link and its des­ti­na­tion. And the long-term archiv­abil­ity of the hy­per­link now de­pends on the health of a third par­ty. The short­ener may de­cide a link is a Terms Of Ser­vice vi­o­la­tion and delete it. If the short­ener ac­ci­den­tally erases a data­base, for­gets to re­new its do­main, or just dis­ap­pears, the link will break. If a top-level do­main changes its pol­icy on com­mer­cial use, the link will break. If the short­ener gets hacked, every link be­comes a po­ten­tial phish­ing at­tack.

    ↩︎
  11. A static text-source site has so many advantages for Long Content that I consider its use almost a no-brainer.

    • By na­ture, they com­pile most con­tent down to flat stand­alone tex­tual files, which al­low re­cov­ery of con­tent even if the orig­i­nal site soft­ware has bit-rot­ted or the source files have been lost or the com­piled ver­sions can­not be di­rectly used in new site soft­ware: one can parse them with XML tools or with quick hacks or by eye.
    • Site com­pil­ers gen­er­ally re­quire de­pen­den­cies to be de­clared up front, and the ap­proach makes ex­plic­it­ness and con­tent easy, but dy­namic in­ter­de­pen­dent com­po­nents diffi­cult, all of which dis­cour­ages creep­ing com­plex­ity and hid­den state.
    • A sta­tic site can be archived into a tar­ball of files which will be read­able as long as web browsers ex­ist (or after­wards if the HTML is rea­son­ably clean), but it could be diffi­cult to archive a CMS like Word­Press or Blogspot (the lat­ter does­n’t even pro­vide the con­tent in HTML—it only pro­vides a rat’s-nest of in­scrutable JavaScript files which then down­load the con­tent from some­where and dis­play it some­how; in­deed, I’m not sure how I would au­to­mate archiv­ing of such a site if I had to; I would need some sort of head­less browser to run the JS and se­ri­al­ize the fi­nal re­sult­ing DOM, pos­si­bly with some script­ing of mouse/keyboard ac­tion­s).
    • The con­tent is often not avail­able lo­cal­ly, or is stored in opaque bi­nary for­mats rather than text (if one is lucky, it will at least be a data­base), both of which make it diffi­cult to port con­tent to other web­site soft­ware; you won’t have the nec­es­sary pieces, or they will be in wildly in­com­pat­i­ble for­mats.
    • Sta­tic sites are usu­ally writ­ten in a rea­son­ably stan­dard­ized markup lan­guage such as Mark­down or LaTeX, in dis­tinc­tion to blogs which force one through WYSIWYG ed­i­tors or in­vent their own markup con­ven­tions, which is yet an­other bar­ri­er: pars­ing a pos­si­bly il­l-de­fined lan­guage.
    • The low­ered sysad­min efforts (who wants to be con­stantly clean­ing up spam or hacks on their Word­Press blog?) are a fi­nal ad­van­tage: lower run­ning costs make it more likely that a site will stay up rather than cease to be worth the has­sle.

    Sta­tic sites are not ap­pro­pri­ate for many kinds of web­sites, but they are ap­pro­pri­ate for web­sites which are con­tent-ori­ent­ed, do not need in­ter­ac­tiv­i­ty, ex­pect to mi­grate web­site soft­ware sev­eral times over com­ing decades, want to en­able archiv­ing by one­self or third par­ties (“lots of copies keeps stuff safe”), and to grace­fully de­grade after loss or bi­trot.↩︎

  12. Such as burn­ing the oc­ca­sional copy onto read­-only me­dia like DVDs.↩︎

  13. One can’t be sure; the IA is fed by Alexa crawls, and Alexa doesn’t guarantee pages will be crawled & preserved if one goes through their request form.↩︎

  14. I am dili­gent in back­ing up my files, in pe­ri­od­i­cally copy­ing my con­tent from the , and in pre­serv­ing viewed In­ter­net con­tent; why do I do all this? Be­cause I want to be­lieve that my mem­o­ries are pre­cious, that the things I saw and said are valu­able; “I want to meet them again, be­cause I be­lieve my feel­ings at that time were re­al.” My past is not trash to me, used up & dis­card­ed.↩︎

  15. Ex­am­ples of such blogs:

    1. Eliezer Yudkowsky’s contributions to LessWrong were the rough draft of a philosophy book (or two)
    2. John Robb’s Global Guerrillas led to his Brave New War: The Next Stage of Terrorism and the End of Globalization
    3. Kevin Kelly’s Technium was turned into What Technology Wants.

    An example of how not to do it would be Robin Hanson’s Overcoming Bias blog; it is stuffed with fascinating citations & sketches of ideas, but they never go anywhere, with the exception of his mind-emulation-economy posts, which were eventually published in 2016 as The Age of Em. Just his posts on medicine would make a fascinating essay or just a list, but he has never made one. ( would be a natural home for many of his posts’ contents, but will never be updated.)↩︎

  16. “Kevin Kelly An­swers Your Ques­tions”, 2011-09-06:

    [Ques­tion:] “One pur­pose of the is to en­cour­age long-term think­ing. Aside from the Clock, though, what do you think peo­ple can do in their every­day lives to adopt or pro­mote long-term think­ing?”

    Kevin Kelly: “The 10,000-year Clock we are building in the hills of west Texas is meant to remind us to think long-term, but learning how to do that as an individual is difficult. Part of the difficulty is that as individuals we are constrained to short lives, and are inherently not long-term. So part of the skill in thinking long-term is to place our values and energies in ways that transcend the individual—either in generational projects, or in social enterprises.”

    “As a start I rec­om­mend en­gag­ing in a project that will not be com­plete in your life­time. An­other way is to re­quire that your cur­rent projects ex­hibit some pay­off that is not im­me­di­ate; per­haps some small por­tion of it pays off in the fu­ture. A third way is to cre­ate things that get bet­ter, or run up in time, rather than one that de­cays and runs down in time. For in­stance a seedling grows into a tree, which has seedlings of its own. A pro­gram like which gives breed­ing pairs of an­i­mals to poor farm­ers, who in turn must give one breed­ing pair away them­selves, is an ex­otropic scheme, grow­ing up over time.”

    ↩︎
  17. ‘Princess Ir­u­lan’, , ↩︎

  18. GiveWell reports in “A good volunteer is hard to find” that of volunteers motivated enough to email them asking to help, something like <20% will complete the GiveWell test assignment and render meaningful help. Such persons would have been well-advised to have simply donated some money. I have long noted that many of the most popular pages on Gwern.net could have been written by anyone and drew on no unique talents of mine; I have on several occasions received offers to help with the DNB FAQ, none of which have resulted in actual help.↩︎

  19. An old sen­ti­ment; con­sider “A drop hol­lows out the stone” (Ovid, Epis­tles) or Thomas Car­lyle’s “The weak­est liv­ing crea­ture, by con­cen­trat­ing his pow­ers on a sin­gle ob­ject, can ac­com­plish some­thing. The strongest, by dis­pens­ing his over many, may fail to ac­com­plish any­thing. The drop, by con­tin­u­ally falling, bores its pas­sage through the hard­est rock. The hasty tor­rent rushes over it with hideous up­roar, and leaves no trace be­hind.” (The life of Friedrich Schiller, 1825)↩︎

  20. “Ten Lessons I Wish I Had Been Taught”, Gian-Carlo Rota:

    Richard Feyn­man was fond of giv­ing the fol­low­ing ad­vice on how to be a ge­nius. You have to keep a dozen of your fa­vorite prob­lems con­stantly present in your mind, al­though by and large they will lay in a dor­mant state. Every time you hear or read a new trick or a new re­sult, test it against each of your twelve prob­lems to see whether it helps. Every once in a while there will be a hit, and peo­ple will say: ‘How did he do it? He must be a ge­nius!’

    ↩︎
  21. IQ is sometimes used as a proxy for health, like height, because it sometimes seems like any health problem will damage IQ. Didn’t get much protein as a kid? Congratulations, your nerves will lack myelin and you will literally think slower. Missing some iodine? Say goodbye to <10 points! If you’re anemic or iron-deficient, that might increase to <15 points. Have tapeworms? There go some more points, and maybe centimeters off your adult height, thanks to the worms stealing nutrients from you. Have a rough birth and suffer a spot of hypoxia before you began breathing on your own? Tough luck, old bean. It is very easy to lower IQ; you can do it with a baseball bat. It’s the other way around that’s nearly impossible.↩︎

  22. And America has tried pretty hard over the past 60 years to affect IQ. The whole nature/nurture debate would be moot if there were some nutrient or educational system which could add even 10 points on average, because then we would use it on all the blacks. But it seems that I’m constantly reading about programs which boost IQ for a little while… and do nothing in the long run.↩︎

  23. For de­tails on the many valu­able cor­re­lates of the Con­sci­en­tious­ness per­son­al­ity fac­tor, see Con­sci­en­tious­ness and on­line ed­u­ca­tion.↩︎

  24. 25 episodes, 6 movies, >11 manga vol­umes—just to stick to the core works.↩︎

  25. , KKS XII: 609:

    More than my life
    What I most regret
    Is
    A dream unfinished
    And awakening.
    ↩︎
  26. As with Cloud Nine; I ac­ci­den­tally erased every­thing on a rou­tine ba­sis while mess­ing around with Win­dows.↩︎

  27. For ex­am­ple, I no­tice I am no longer deeply in­ter­ested in the oc­cult. Hope­fully this is be­cause I have grown men­tally and rec­og­nize it as rub­bish; I would be em­bar­rassed if when I died it turned out my youth­ful self had a bet­ter grasp on the real world.↩︎

  28. Some pages don’t have any con­nec­tion to pre­dic­tions. It’s pos­si­ble to make pre­dic­tions for some bor­der cases like the ter­ror­ism es­says (death tolls, achieve­ments of par­tic­u­lar groups’ pol­icy goal­s), but what about the short sto­ries or po­ems? My imag­i­na­tion fails there.↩︎

  29. Thinking of predictions is good mental discipline; we should always be able to cash out our beliefs in terms of the real world, or know why we cannot. Unfortunately, humans being humans, we need to actually track our predictions, lest our predicting degenerate into entertainment like political punditry.↩︎

  30. Dozens of the­o­ries have been put forth. I have been col­lect­ing & mak­ing pre­dic­tions; and am up to 219. It will be in­ter­est­ing to see how the movies turn out.↩︎

  31. I have 2 pre­dic­tions reg­is­tered about the the­sis on PB.­com: 1 re­viewer will ac­cept my the­ory by 2016 and the light nov­els will fin­ish by 2015.↩︎

  32. See Robin Han­son, “If Up­loads Come First”↩︎

  33. I orig­i­nally used last file mod­i­fi­ca­tion time but this turned out to be con­fus­ing to read­ers, be­cause I so reg­u­larly add or up­date links or add new for­mat­ting fea­tures that the file mod­i­fi­ca­tion time was usu­ally quite re­cent, and so it was mean­ing­less.↩︎

  34. Re­ac­tive archiv­ing is in­ad­e­quate be­cause such links may die be­fore my crawler gets to them, may not be archiv­able, or will just ex­pose read­ers to dead links for an un­ac­cept­ably long time be­fore I’d nor­mally get around to them.↩︎

  35. I like the sta­tic site ap­proach to things; it tends to be harder to use and more re­stric­tive, but in ex­change it yields bet­ter per­for­mance & leads to fewer has­sles or run­time is­sues. The sta­tic model of com­pil­ing a sin­gle mono­lithic site di­rec­tory also lends it­self to test­ing: any shell script or CLI tool can be eas­ily run over the com­piled site to find po­ten­tial bugs (which has be­come in­creas­ingly im­por­tant as site com­plex­ity & size in­creases so much that eye­balling the oc­ca­sional page is in­ad­e­quate).↩︎

  36. Rutter argues for this point in Web Typography, which is consistent with my own experiments, where even lousy changes are difficult to distinguish from zero effect despite large n, and with the general shambolic state of the Internet (eg as reviewed in the 2019 Web Almanac). If even loading times of multiple seconds cause only relatively modest traffic reductions, things like aligning columns properly or using section signs or sidenotes must have effects on behavior so close to zero as to be unobservable.↩︎

  37. Para­phrased from Di­a­logues of the Zen Mas­ters as quoted in pg 11 of the Ed­i­tor’s In­tro­duc­tion to Three Pil­lars of Zen:

    One day a man of the peo­ple said to Mas­ter Ikkyu: “Mas­ter, will you please write for me max­ims of the high­est wis­dom?” Ikkyu im­me­di­ately brushed out the word ‘At­ten­tion’. “Is that all? Will you not write some more?” Ikkyu then brushed out twice: ‘At­ten­tion. At­ten­tion.’ The man re­marked ir­ri­ta­bly that there was­n’t much depth or sub­tlety to that. Then Ikkyu wrote the same word 3 times run­ning: ‘At­ten­tion. At­ten­tion. At­ten­tion.’ Half-an­gered, the man de­mand­ed: “What does ‘At­ten­tion’ mean any­way?” And Ikkyu an­swered gen­tly: “At­ten­tion means at­ten­tion.”

    ↩︎
  38. The HTML ver­sions of GNU Info man­u­als are even worse, be­cause they fail to ex­ploit prefetch­ing & are slower than lo­cal doc­u­men­ta­tion, and take away all of the use­ful key­bind­ings which makes nav­i­gat­ing info man­u­als fast & con­ve­nient. That, decades lat­er, the GNU project keeps gen­er­at­ing doc­u­men­ta­tion in that for­mat, rather than at least as large sin­gle-page man­u­als with hy­per­linked table-of-con­tents, is a good ex­am­ple of how bad they are at UI/UX de­sign.↩︎

  39. Why don’t all PDF generators use that? Software patents, which make it hard to install the actual JBIG2 encoder (supposedly all JBIG2 encoding patents had expired by 2017, but no one, like Linux distros, wants to take the risk of unknown patents surfacing), which has to ship separately from ocrmypdf, and worries over edge-cases in JBIG2 where numbers might be visually changed to different numbers to save bits.↩︎

  40. Specifi­cal­ly: some OS/browsers pre­serve soft hy­phens in copy­-paste, which might con­fuse read­ers, so we use JS to delete soft hy­phens; this breaks for users with JS dis­abled, and on Lin­ux, the X GUI by­passes the JS en­tirely for mid­dle-click but no other way of copy­-past­ing.↩︎

  41. Side­notes have long been used as a ty­po­graphic so­lu­tion to dense­ly-an­no­tated texts such as the (first 2 pages), but have not shown up much on­line yet.

    An early & inspiring use of margin notes: Pierre Bayle’s Historical and Critical Dictionary, demonstrating recursive footnotes (1737, volume 4, pg901; source: Google Books)↩︎

  42. We write a short Haskell pro­gram as part of a pipeline:

    echo '{-# LANGUAGE OverloadedStrings #-};
          import Data.Text as T;
          main = interact (T.unpack . T.unlines . Prelude.filter (/="") .
                           T.split (not . (`elem` "0123456789,.")) . T.pack)' > ~/number.hs &&
    find ~/wiki/ -type f -name "*.page" -exec cat "{}" \; | runhaskell ~/number.hs |
     sort | tr -d ',' | tr -d '.' | cut -c 1 | sed -e 's/0$//' -e '/^$/d' > ~/number.txt
    ↩︎
  43. Graph then test:

    numbers <- read.table("number.txt")
    ta <- table(numbers$V1); ta
    
    #     1     2     3     4     5     6     7     8     9
    # 20550 20356  7087  5655  3900  2508  2075  2349  2068
    ## cribbing exact R code from http://www.math.utah.edu/~treiberg/M3074BenfordEg.pdf
    sta <- sum(ta)
    pb <- sapply(1:9, function(x) log10(1+1/x)); pb
    m <- cbind(ta/sta,pb)
    colnames(m)<- c("Observed Prop.", "Theoretical Prop.")
    barplot( rbind(ta/sta,pb/sum(pb)), beside = T, col = rainbow(7)[c(2,5)],
                  xlab = "First Digit")
    title("Benford's Law Compared to Writing Data")
    legend(16,.28, legend = c("From Page Data", "Theoretical"),
           fill = rainbow(7)[c(2,5)],bg="white")
    chisq.test(ta,p=pb)
    #
    #     Chi-squared test for given probabilities
    #
    # data:  ta
    # X-squared = 9331, df = 8, p-value < 2.2e-16
    ↩︎
  44. PD increases economic efficiency through, if nothing else, making works easier to find. Tim O’Reilly says that “Obscurity is a far greater threat to authors and creative artists than piracy.” If that is so, then that means that difficulty of finding works reduces the welfare of artists and consumers, because both forgo a beneficial trade (the artist loses any revenue and the consumer loses any enjoyment). Even small increases in inconvenience make big differences.↩︎

  45. Not that I could sell any­thing on this wiki; and if I could, I would pol­ish it as much as pos­si­ble, giv­ing me fresh copy­right.↩︎