About This Website

Meta page describing Gwern.net site ideals of stable long-term essays which improve over time; technical decisions using Markdown and static hosting; idea sources and writing methodology; metadata definitions; site statistics; copyright license.
personal⁠, psychology⁠, archiving⁠, statistics⁠, meta⁠, Bayes⁠, Google⁠, predictions⁠, design
2010-10-01–2021-02-20 finished certainty: highly likely importance: 3


This page is about Gwern.net; for information about me, see Links⁠.

The Content

“Of all the books I have delivered to the presses, none, I think, is as personal as the straggling collection mustered for this hodgepodge, precisely because it abounds in reflections and interpolations. Few things have happened to me, and I have read a great many. Or rather, few things have happened to me more worth remembering than Schopenhauer’s thought or the music of England’s words.”

“A man sets himself the task of portraying the world. Through the years he peoples a space with images of provinces, kingdoms, mountains, bays, ships, islands, fishes, rooms, instruments, stars, horses, and people. Shortly before his death, he discovers that that patient labyrinth of lines traces the image of his face.”

Jorge Luis Borges, Epilogue

The content here varies from statistics to psychology to self-experiments⁠/ Quantified Self to philosophy to poetry to programming to anime to investigations of online drug markets or leaked movie scripts (or two topics at once: anime & statistics or anime & criticism or heck anime & statistics & criticism!).

I believe that someone who has been well-educated will think of something worth writing at least once a week; to a surprising extent, this has been true. (I added ~130 documents to this repository over the first 3 years.)

Target Audience

“Special knowledge can be a terrible disadvantage if it leads you too far along a path you cannot explain anymore.”

(Frank Herbert, Dune)

I don’t write simply to find things out, although curiosity is my primary motivator, as I find I want to read something which hasn’t been written—“…I realised that I wanted to read about them what I myself knew. More than this—what only I knew. Deprived of this possibility, I decided to write about them. Hence this book.”1 There are many benefits to keeping notes, as they allow one to accumulate confirming and especially contradictory evidence2, and even drafts can be useful, if only so you avoid repeating yourself or simply decently respect the opinions of mankind.

The goal of these pages is not to be a model of concision, maximizing entertainment value per word, or to preach to a choir by elegantly repeating a conclusion. Rather, I am attempting to explain things to my future self, who is intelligent and interested, but has forgotten. What I am doing is explaining why I decided what I did to myself and noting down everything I found interesting about it for future reference. I hope my other readers, whoever they may be, might find the topic as interesting as I found it, and the essay useful or at least entertaining–but the intended audience is my future self.

Development

“I hate the water that thinks that it boiled itself on its own. I hate the seasons that think they cycle naturally. I hate the sun that thinks it rose on its own.”

Sodachi Oikura, Owarimonogatari (Sodachi Riddle, Part One)

It is everything I felt worth writing that didn’t fit somewhere like Wikipedia or was already written elsewhere. I never expected to write so much, but I discovered that once I had a hammer, nails were everywhere3.

Long Site

“The Internet is self destructing paper. A place where anything written is soon destroyed by rapacious competition and the only preservation is to forever copy writing from sheet to sheet faster than they can burn. If it’s worth writing, it’s worth keeping. If it can be kept, it might be worth writing…If you store your writing on a third party site, or even on your own site, but in the complex format used by blog/wiki software du jour you will lose it forever as soon as hypersonic wings of Internet labor flows direct people’s energies elsewhere. For most information published on the Internet, perhaps that is not a moment too soon, but how can the muse of originality soar when immolating transience brushes every feather?”

Julian Assange (“Self destructing paper”, 2006-12-05)

One of my personal interests is applying the idea of the Long Now. What and how do you write a personal site with the long-term in mind? We live most of our lives in the future, and the actuarial tables give me until the 2070–2080s, excluding any benefits from medical advances or life-extension projects. It is a commonplace in science fiction4 that longevity would cause widespread risk aversion. But on the other hand, it could do the opposite: the longer you live, the more long-shots you can afford to invest in. Someone with a timespan of 70 years has reason to protect against black swans—but also time to look for them.5 It’s worth noting that old people make many short-term choices, as reflected in increased suicide rates and reduced investment in education or new hobbies, and this is not due solely to the ravages of age but the proximity of death—the HIV-infected (but otherwise in perfect health) act similarly short-term.6

What sort of writing could you create if you worked on it (be it ever so rarely) for the next 60 years? What could you do if you started now?7

Keeping the site running that long is a challenge, and leads to the recommendations for Resilient Haskell Software: 100% open-source software8, open formats for data, textual human-readability, avoiding external dependencies910, and staticness11.

Preserving the content is another challenge. Keeping the content in a distributed version control system (DVCS) like git protects against file corruption and makes it easier to mirror the content; regular backups12 help. I have taken additional measures: most pages and almost all external links have been archived, and the Internet Archive is also archiving pages & external links13. (For details, read Archiving URLs.)

One could continue in this vein, devising ever more powerful & robust storage methods (perhaps combine the DVCS with redundant, error-corrected backups, a la bup), but what is one to fill the storage with?

Long Content

“What has been done, thought, written, or spoken is not culture; culture is only that fraction which is remembered.”

Gary Taylor (The Clock of the Long Now⁠; emphasis added)14

‘Blog posts’ might be the answer. But I have read blogs for many years and most blog posts are the triumph of the hare over the tortoise. They are meant to be read by a few people on a weekday in 2004 and never again, and are quickly abandoned—and perhaps as Assange says, not a moment too soon. (But isn’t that sad? Isn’t it a terrible return for one’s time?) On the other hand, the best blogs always seem to be building something: they are rough drafts—works in progress15. So I did not wish to write a blog. Then what? More than just “evergreen content”, what would constitute Long Content as opposed to the existing culture of Short Content? How does one live in a Long Now sort of way?16

It’s shocking to find how many people do not believe they can learn, and how many more believe learning to be difficult. Muad’Dib knew that every experience carries its lesson.17

My answer is that one uses such a framework to work on projects that are too big to work on normally or too tedious. (Conscientiousness is often lacking online or in volunteer communities18 and many useful things go undone.) Knowing your site will survive for decades to come gives you the mental wherewithal to tackle long-term tasks like gathering information for years, and such persistence can be useful19—if one holds onto every glimmer of genius for years, then even the dullest person may look a bit like a genius himself20⁠. (Even experienced professionals can only write at their peak for a few hours a day—usually first thing in the morning⁠, it seems.) Half the challenge of fighting procrastination is the pain of starting—I find when I actually get into the swing of working on even dull tasks, it’s not so bad. So this suggests a solution: never start. Merely have perpetual drafts, which one tweaks from time to time. And the rest takes care of itself. I have a few examples of this:

  1. DNB FAQ:

    When I read in Wired in 2008 that the obscure working memory exercise called dual n-back (DNB) had been found to increase IQ substantially, I was shocked. IQ is one of the most stubborn properties of one’s mind, one of the most fragile21, the hardest to affect positively22, but also one of the most valuable traits one could have23; if the technique panned out, it would be huge. Unfortunately, DNB requires a major time investment (as in, half an hour daily), which would be a bargain—if it delivers. So, to do DNB or not?

    Questions of great import like this are worth studying carefully. The wheels of academia grind exceeding slow, and only a fool expects unanimous answers from fields like psychology. Any attempt to answer the question ‘is DNB worthwhile?’ will require years and cover a breadth of material. This FAQ on DNB is my attempt to cover that breadth over those years.

  2. Neon Genesis Evangelion notes:

    I have been discussing Neon Genesis Evangelion (Eva) since 2004. The task of interpreting Eva is very difficult; the source works themselves are a major time-sink24, and there are thousands of primary, secondary, and tertiary works to consider—personal essays, interviews, reviews, etc. The net effect is that many Eva fans ‘know’ certain things about Eva, such as Eva not being a grand ‘screw you’ statement by Hideaki Anno or that the TV series was censored, but they no longer have proof. Because each fan remembers a different subset, they have irreconcilable interpretations. (Half the value of the page for me is having a place to store things I’ve said in countless fora which I can eventually turn into something more systematic.)

    To compile claims from all those works, to dig up forgotten references, to scroll through microfilms, buy issues of defunct magazines—all this is enough work to shatter the will of the stoutest salaryman. Which is why I began years ago and expect not to finish for years to come. (Finishing by 2020 seems like a good prediction.)

  3. Cloud Nine: Years ago I was reading the papers of the economist Robin Hanson. I recommend his papers highly; even if they are wrong, they are imaginative and some of the finest speculative fiction I have read. (Except they were non-fiction.) One night I had a dream in which I saw in a flash a medieval city run in part on Hansonian grounds, a version of one of his proposals. A city must have another city as a rival, and soon I had remembered a strange ’90s idea which was easily tweaked to work in a medieval setting. Finally, between them, was one of my favorite proposals, Buckminster Fuller’s Cloud Nine megastructure.

    I wrote several drafts but always lost them. Sad25 and discouraged, I abandoned it for years. That fear of loss leads straight into the next example.

  4. A Book reading list:

    Once, I didn’t have to keep reading lists. I simply went to the school library shelf where I left off and grabbed the next book. But then I began reading harder books, and they would cite other books, and sometimes would even have horrifying lists of hundreds of other books I ought to read (‘bibliographies’). I tried remembering the most important ones but quickly forgot. So I began keeping a book list on paper. I thought I would throw it away in a few months when I read them all, but somehow it kept growing and growing. I didn’t trust computers to store it before26⁠, but now I do, and it lives on in digital form (currently on Goodreads—because they have export functionality). With it, I can track how my interests evolved over time27⁠, and what I was reading at the time. I sometimes wonder if I will read them all even by 2070.

What is next? So far the pages will persist through time, and they will gradually improve over time. But a truly Long Now approach would be to make them be improved by time—make them more valuable the more time passes. (Stewart Brand remarks in The Clock of the Long Now that a group of monks carved thousands of scriptures into stone, hoping to preserve them for posterity—but posterity would value far more a carefully preserved collection of monk feces, which would tell us countless valuable things about important phenomena like global warming.)

One idea I am exploring is adding long-term predictions like the ones I make on PredictionBook.com⁠. Many28 pages explicitly or implicitly make predictions about the future. As time passes, predictions would be validated or falsified, providing feedback on the ideas.29

For example, the Evangelion essay’s paradigm implies many things about the future movies in the Rebuild of Evangelion series30; The Melancholy of Kyon is an extended prediction31 of future plot developments in the Haruhi Suzumiya series; Haskell Summer of Code has suggestions about what makes good projects, which could be turned into predictions by applying them to predict success or failure when the next Summer of Code choices are announced. And so on.

I don’t think “Long Content” is simply for working on things which are equivalent to a “monograph” (a work which attempts to be an exhaustive exposition of all that is known—and what has been recently discovered—on a single topic), although monographs clearly would benefit from such an approach. If I wrote a short essay cynically remarking on, say, Al Gore, predicting he’d sell out, registered some predictions, and came back 20 years later to see how it worked out, I would consider this “Long Content” (it gets more interesting with time, as the predictions reach maturation); but one couldn’t consider this a “monograph” in any ordinary sense of the word.

One of the ironies of this approach is that, as a transhumanist, I assign non-trivial probability to the world undergoing massive change during the 21st century due to any of a number of technologies such as artificial intelligence32; yet here I am, planning as if I and the world were immortal.

I personally believe that one should “think Less Wrong and act Long Now”, if you follow me. I diligently do my daily spaced-repetition review and n-backing; I carefully design my website and writings to last decades, actively think about how to write material that improves with time, and work on writings that will not be finished for years (if ever). It’s a bit schizophrenic since both are totalized worldviews with drastically conflicting recommendations about where to invest my time. It’s a case of high versus low discount rates; and one could fairly accuse me of committing the sunk cost fallacy, but then, I’m not sure that the sunk cost fallacy is a fallacy (certainly, I have more to show for my wasted time than most people).

The Long Now Foundation views its proposals, like the Clock and the Long Library, as insurance—in case the future turns out to be surprisingly unsurprising. I view these writings similarly. If most ambitious predictions turn out right and the Singularity happens by 2050 or so, then much of my writings will be moot, but I will have all the benefits of said Singularity; if the Singularity never happens or ultimately pays off in a very disappointing way, then my writings will be valuable to me. By working on them, I hedge my bets.

Finding my ideas

To the extent I personally have any method for ‘getting started’ on writing something, it’s to pay attention to anytime you find yourself thinking, “how irritating that there’s no good webpage/Wikipedia article on X” or “I wonder if Y” or “has anyone done Z” or “huh, I just realized that A!” or “this is the third time I’ve had to explain this, jeez.”

The DNB FAQ started because I was irritated people were repeating themselves on the dual n-back mailing list; the modafinil article started because it was a pain to figure out where one could order modafinil; the trio of Death Note articles (Anonymity⁠, Ending⁠, Script) all started because I had an amusing thought about information theory; the Silk Road 1 page was commissioned after I groused about how deeply sensationalist & shallow & ill-informed all the mainstream media articles on the Silk Road drug marketplace were (similarly for Bitcoin is Worse is Better); my Google survival analysis was based on thinking it was a pity that Arthur’s Guardian analysis was trivially & fatally flawed; and so on and so forth.

None of these seems special to me. Anyone could’ve compiled the DNB FAQ; anyone could’ve kept a list of online pharmacies where one could buy modafinil; someone tried something similar to my Google shutdown analysis before me (and the fancier statistics were all standard tools). If I have done anything meritorious with them, it was perhaps simply putting more work into them than someone else would have; to quote Teller:

“I think you’ll see what I mean if I teach you a few principles magicians employ when they want to alter your perceptions…Make the secret a lot more trouble than the trick seems worth. You will be fooled by a trick if it involves more time, money and practice than you (or any other sane onlooker) would be willing to invest.”

“My partner, Penn, and I once produced 500 live cockroaches from a top hat on the desk of talk-show host David Letterman. To prepare this took weeks. We hired an entomologist who provided slow-moving, camera-friendly cockroaches (the kind from under your stove don’t hang around for close-ups) and taught us to pick the bugs up without screaming like preadolescent girls. Then we built a secret compartment out of foam-core (one of the few materials cockroaches can’t cling to) and worked out a devious routine for sneaking the compartment into the hat. More trouble than the trick was worth? To you, probably. But not to magicians.”

Besides that, I think after a while writing/research can be a virtuous circle or autocatalytic. If one were to look at my repo statistics, one would see that I haven’t always been writing as much. What seems to happen is that as I write more:

  • I learn more tools

    eg. I learned basic meta-analysis in R to answer what all the positive & negative n-back studies summed to, but then I was able to use it for iodine; I learned linear models for analyzing MoR reviews but now I can use them anywhere I want to, like in my Touhou draft material.

    The “Feynman method” has been facetiously described as “find a problem; think very hard; write down the answer”, but Gian-Carlo Rota gives the real one:

    Richard Feynman was fond of giving the following advice on how to be a genius. You have to keep a dozen of your favorite problems constantly present in your mind, although by and large they will lay in a dormant state. Every time you hear or read a new trick or a new result, test it against each of your twelve problems to see whether it helps. Every once in a while there will be a hit, and people will say: “How did he do it? He must be a genius!”

  • I internalize a habit of noticing interesting questions that flit across my brain

    eg. in March 2013 while meditating: “I wonder if more doujin music gets released when unemployment goes up and people may have more spare time or fail to find jobs? Hey! That giant Touhou music torrent I downloaded, with its 45000 songs all tagged with release year, could probably answer that!” (One could argue that these questions probably should be ignored and not investigated in depth—Teller again—nevertheless, this is how things work for me.)

  • if you aren’t writing, you’ll ignore useful links or quotes; but if you stick them in small asides or footnotes as you notice them, eventually you’ll have something bigger.

    I grab things I see on Google Alerts & Scholar, Pubmed, Reddit, Hacker News, my RSS feeds, books I read, and note them somewhere until they amount to something. (An example would be my slowly accreting citations on IQ and economics⁠.)

  • people leave comments, ping me on IRC, send me emails, or leave anonymous messages, all of which help

    Some examples of this come from my most popular page, on Silk Road 1:

    1. an anonymous message led me to investigate a vendor in depth and ponder the accusation leveled against them⁠; I wrote it up and gave my opinions and thus I got another short essay to add to my SR page which I would not have had otherwise (and I think there’s a <20% chance that in a few years this will pay off and become a very interesting essay).
    2. CMU’s Nicholas Christin, who had scraped SR for many months and compiled all sorts of overall statistics, emailed me to point out I was citing inaccurate figures from the first version of his paper. I thanked him for the correction and while I was replying, mentioned I had a hard time believing his paper’s claims about the extreme rarity of scams on SR as estimated through buyer feedback. After some back and forth and suggesting specific mechanisms by which the estimates could be positively biased, he was able to check his database and confirmed that there was at least one very large omission of scams in the scraped data and there was probably a general undersampling; so now I have a more accurate feedback estimate for my SR page (important for estimating risk of ordering) and he said he’ll acknowledge me in the/a paper, which is nice.

Information organizing

Occasionally people ask how I manage information and read things.

  1. For quotes or facts which are very important, I employ spaced repetition by adding them to my Mnemosyne deck.

  2. I keep web clippings in Evernote; I also excerpt from research papers & books, and miscellaneous sources. This is useful for targeted searches when I remember a fact but not where I learned it, and for storing things which I don’t want to memorize but which have no logical home in my website or LW or elsewhere. It is also helpful for writing my book reviews and the monthly newsletter, as I can read through my book excerpts to remind myself of the highlights, and at the end of the month review clippings from papers/webpages to find good things to reshare which I was too busy to share at the time or was unsure of their importance. I don’t make any use of more complex Evernote features.

    I periodically back up my Evernote using the Linux client Nixnote’s export feature. (I made sure there was a working export method before I began using Evernote, and use it only as long as Nixnote continues to work.)

    My workflow for dealing with PDFs, as of late 2014, is:

    1. if necessary, jailbreak the paper using Libgen or a university proxy, then upload a copy to Dropbox, named year-author.pdf
    2. read the paper, making excerpts as I go
    3. store the metadata & excerpts in Evernote
    4. if useful, integrate into Gwern.net with its title/year/author metadata, adding a local fulltext copy if the paper had to be jailbroken, otherwise rely on my custom archiving setup to preserve the remote URL
    5. hence, any future searches for the filename / title / key contents should result in hits either in my Evernote or Gwern.net
  3. Web pages are archived & backed up by my custom archiving setup⁠. This is intended mostly for fixing dead links (eg to recover the fulltext of the original URL of an Evernote clipping).

  4. I don’t have any special book reading techniques. For really good books I excerpt from each chapter and stick the quotes into Evernote.

  5. I store insights and thoughts in various pages as parenthetical comments, footnotes, and appendices. If they don’t fit anywhere, I dump them in Notes⁠.

  6. Larger masses of citations and quotes typically get turned into pages.

  7. I make heavy use of RSS subscriptions for news. (Not that I’m hugely thrilled about my current RSS reader; Google Reader was much better.)

  8. For projects and followups, I use reminders in Google Calendar.

  9. For recording personal data, I automate as much as possible (eg Zeo and arbtt) and I make a habit of the rest—getting up in the morning is a great time to build a habit of recording data because it’s a time of habits like eating breakfast and getting dressed.

Hence, to refind information, I use a combination of Google, Evernote, grep (on the Gwern.net files), occasionally Mnemosyne, and a good visual memory.

As far as writing goes, I do not use note-taking software or similar specialized tools—not that I think they are useless, but I am worried about whether they would ever repay the large upfront investments of learning/tweaking or interfere with other things. Instead, I occasionally compile outlines of articles from comments on LW/Reddit/IRC, keep editing them with stuff as I remember them, search for relevant parts, allow little thoughts to bubble up while meditating, and pay attention to when I am irritated at people being wrong or annoyed that a particular topic hasn’t been written down yet.

Confidence tags

Most of the metadata in each page is self-explanatory: the date is the last time the page was meaningfully modified33⁠, the tags are categorization, etc. The “status” tag describes the state of completion: whether it’s a pile of links & snippets & “notes”, or whether it is a “draft” which at least has some structure and conveys a coherent thesis, or it’s a well-developed draft which could be described as “in progress”, and finally when a page is done—in lieu of additional material turning up—it is simply “finished”.

The “confidence” tag is a little more unusual. I stole the idea from Muflax’s “epistemic state” tags; I use the same meaning for “log”, for collections of data or links (“log entries that simply describe what happened without any judgment or reflection”); personal or reflective writing can be tagged “emotional” (“some cluster of ideas that got itself entangled with a complex emotional state, and I needed to externalize it to even look at it; in no way endorsed, but occasionally necessary (similar to fiction)”), and “fiction” needs no explanation (every author has some reason for writing the story or poem they do, but not even they always know whether it is an expression of their deepest fears, desires, history, or simply random thoughts). I drop his other tags in favor of giving my subjective probability using the Kesselman List of Estimative Words:

  1. “certain”
  2. “highly likely”
  3. “likely”
  4. “possible” (my preference over Kesselman’s “Chances a Little Better [or Less]”)
  5. “unlikely”
  6. “highly unlikely”
  7. “remote”
  8. “impossible”

These are used to express my feeling about how well-supported the essay is, or how likely it is the overall ideas are right. (Of course, an interesting idea may be worth writing about even if very wrong, and even a long shot may be profitable to examine if the potential payoff is large enough.)

Importance tags

An additional useful bit of metadata would be a distinction between things which are trivial and those which are about more important topics which might change your life. Using my interactive sorting tool Resorter, I’ve ranked pages in deciles from 0–10 on how important the topic is to myself, the intended reader, or the world. For example, topics like embryo selection for traits such as intelligence or evolutionary pressures towards autonomous AI are vastly more important than some poems or a dream or someone’s small nootropics self-experiment, and would be ranked 10, while the latter would be ranked 0–1.

Writing checklist

It turns out that writing essays (technical or philosophical) is a lot like writing code—there are so many ways to err that you need a process with as much automation as possible. My current checklist for finishing an essay:

Markdown checker

I’ve found that many errors in my writing can be caught by some simple scripts, which I’ve compiled into a shell script, markdown-lint.sh⁠.

My linter does the following:

  1. checks for corrupted non-text binary files

  2. checks a blacklist of domains which are either dead (eg Google+) or have a history of being unreliable (eg ResearchGate, NBER, PNAS); such links need34 to either be fixed, pre-emptively mirrored, or removed entirely.

    • a special case is PDFs hosted on IA; the IA is reliable, but I try to rehost such PDFs so they’ll show up in Google/Google Scholar for everyone else.
  3. Broken syntax: I’ve noticed that when I make Markdown syntax errors, they tend to be predictable and show up either in the original Markdown source, or in the rendered HTML. Two common source errors:

     "(www"
     ")www"

    And the following should rarely show up in the final rendered HTML:

     "\frac"
     "\times"
     "(http"
     ")http"
     "[http"
     "]http"
     " _ "
     "[^"
     "^]"
     "<!--"
     "-->"
     "<-- "
     "<-"
     "->"
     "$title$"
     "$description$"
     "$author$"
     "$tags$"
     "$category$"

    Similarly, I sometimes slip up in writing image/document links so any link starting https://www.gwern.net or ~/wiki/ or /home/gwern/ is probably wrong. There are a few Pandoc-specific issues that should be checked for too, like duplicate footnote names and images without separating newlines or unescaped dollar signs (which can accidentally lead to sentences being rendered as TeX).

    A final pass with htmltidy finds many errors which slip through, like incorrectly-escaped URLs.

  4. Flag dangerous language: Imperial units are deprecated, but so too is the misleading language of NHST statistics (if one must talk of “significance” I try to flag it as “statistically-significant” to warn the reader). I also avoid some other dangerous words like “obvious” (if it really is, why do I need to say it?).

  5. Bad habits:

    • proselint (with some checks disabled because they play badly with Markdown documents)
    • Another static warning is checking for too-long lines (most common in code blocks, although sometimes broken indentation will cause this) which will cause browsers to use scrollbars, for which I’ve written a Pandoc script⁠,
    • one for a bad habit of mine—too-long footnotes
  6. duplicate and hidden-PDF URLs: a URL being linked multiple times is sometimes an error (too much copy-paste or insufficiently edited sections); PDF URLs should receive a visual annotation warning the reader it’s a PDF, but the CSS rules, which catch cases like .pdf$, don’t cover cases where the host quietly serves a PDF anyway, so all URLs are checked. (A URL which is a PDF can be made to trigger the PDF rule by appending #pdf.)

  7. broken links are detected with linkchecker⁠. The best time to fix broken links is when you’re already editing a page.

While this throws many false positives, those are easy to ignore, and the script fights bad habits of mine while giving me much greater confidence that a page doesn’t have any merely technical issues that screw it up (without requiring me to constantly reread pages every time I modify them, lest an accidental typo while making an edit break everything).
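
markdown-lint.sh itself is a shell script, but most of the syntax checks boil down to searching for known-bad patterns; a minimal sketch of that kind of check in R (the file locations, pattern lists, and helper function here are illustrative, not the actual script) might look like:

# Illustrative only: flag tell-tale broken syntax in Markdown sources and rendered HTML.
bad.source   <- c("(www", ")www")                        # malformed links in the Markdown source
bad.rendered <- c("\\frac", "\\times", "(http", ")http",
                  "[http", "]http", "[^", "^]", "<!--", "-->")  # fragments that should never survive rendering

lint <- function(files, patterns) {
    for (f in files) {
        lines <- readLines(f, warn=FALSE)
        for (p in patterns) {
            hits <- grep(p, lines, fixed=TRUE)           # literal match, no regexps
            if (length(hits) > 0)
                cat(sprintf("%s:%d: suspicious pattern %s\n", f, hits, p), sep="")
        }
    }
}

lint(list.files(".",     pattern="\\.page$", recursive=TRUE), bad.source)
lint(list.files("_site", pattern="\\.html$", recursive=TRUE), bad.rendered)

The real script also layers on the domain blacklist, proselint, htmltidy, and linkchecker passes described above.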

Anonymous feedback

Back in November 2011, lukeprog posted “Tell me what you think of me” where he described his use of a Google Docs form for anonymous receipt of textual feedback or comments. Typically, most forms of communication are non-anonymous, or if they are anonymous, they’re public. One can set up pseudonyms and use those for private contact, but it’s not always that easy, and is definitely a series of trivial inconveniences (if anonymous feedback is not solicited, one has to feel it’s important enough to do and violate implicit norms against anonymous messages; one has to set up an identity; one has to compose and send off the message, etc).

I thought it was a good idea to try out, and on 2011-11-08, I set up my own anonymous feedback form and stuck it in the footer of all pages on Gwern.net where it remains to this day. I did wonder if anyone would use the form, especially since I am easy to contact via email, use multiple sites like Reddit or Lesswrong, and even my Disqus comments allow anonymous comments—so who, if anyone, would be using this form? I scheduled a followup in 2 years on 2013-11-30 to review how the form fared.

754 days, 2.884m page views, and 1.350m unique visitors later, I have received 116 pieces of feedback (mean of 24.8k visits per feedback). I categorize them as follows in descending order of frequency:

  • Corrections, problems (technical or otherwise), suggested edits: 34
  • Praise: 31
  • Question/request (personal, tech support, etc): 22
  • Misc (eg gibberish, socializing, Japanese): 13
  • Criticism: 9
  • News/suggestions: 5
  • Feature request: 4
  • Request for cybering: 1
  • Extortion: 1 (see my blackmail page dealing with the September 2013 incident)

Some submissions cover multiple angles (they can be quite long), sometimes people double-submitted or left it blank, etc, so the numbers won’t sum to 116.

In general, a lot of the corrections were usable and fixed issues of varying importance, from typos to the entire site’s CSS being broken due to being uploaded with the wrong MIME type. One of the news/suggestion feedbacks was very valuable, as it led to writing the Silk Road mini-essay “A Mole?” A lot of the questions were a waste of my time; I’d say half related to Tor/Bitcoin/Silk-Road. (I also got an irritating number of emails from people asking me to, say, buy LSD or heroin off SR for them.) The feature requests were usually for a better RSS feed, which I tried to oblige by starting the Changelog page. The cybering and extortion were amusing, if nothing else. The praise was good for me mentally, as I don’t interact much with people.

I consider the anonymous feedback form to have been a success, I’m glad lukeprog brought it up on LW, and I plan to keep the feedback form indefinitely.

Feedback causes

One thing I wondered is whether feedback was purely a function of traffic (the more visits, the more people who could see the link in the footer and decide to leave a comment), or more related to time (perhaps people returning regularly and eventually being emboldened or noticing something to comment on). So I compiled daily hits, combined with the feedback dates, and looked at a graph of hits:

Hits over time for Gwern.net

The hits are heavily skewed by Hacker News & Reddit traffic spikes, and probably should be log transformed. Then I did a logistic regression on hits, log hits, and a simple time index:

# daily traffic data; Feedback is a logical flag for days on which an anonymous submission arrived
feedback <- read.csv("https://www.gwern.net/docs/traffic/2013-gwernnet-anonymousfeedback.csv",
                     colClasses=c("Date","logical","integer"))
plot(Visits ~ Day, data=feedback)   # eyeball the traffic spikes
feedback$Time <- 1:nrow(feedback)   # simple linear time index: 1 per day
# logistic regression of feedback-or-not on logged hits, raw hits, & time; 'step' prunes useless predictors
summary(step(glm(Feedback ~ log(Visits) + Visits + Time, family=binomial, data=feedback)))
# ...
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)
# (Intercept) -7.363507   1.311703   -5.61  2.0e-08
# log(Visits)  0.749730   0.173846    4.31  1.6e-05
# Time        -0.000881   0.000569   -1.55     0.12
#
# (Dispersion parameter for binomial family taken to be 1)
#
#     Null deviance: 578.78  on 753  degrees of freedom
# Residual deviance: 559.94  on 751  degrees of freedom
# AIC: 565.9

The logged hits works out better than regular hits, and survives to the simplified model. And the traffic influence seems much larger than the time variable (which is, curiously, negative).
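
To put the log(Visits) coefficient on a more interpretable scale, it can be exponentiated into odds ratios (a standard reading of logistic-regression coefficients; the snippet below merely reuses the estimate printed above):

b <- 0.749730      # fitted coefficient on log(Visits) from the model above
exp(b)             # ~2.1: odds ratio per e-fold (~2.7x) increase in daily visits
exp(b * log(2))    # ~1.7: odds ratio for a doubling of daily visits

So, roughly, a doubling of daily traffic is associated with about 1.7x higher odds of receiving a piece of feedback that day, consistent with feedback being driven mostly by traffic rather than by the mere passage of time.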

Technical aspects

Popularity

On a semi-annual basis, since 2011, I review Gwern.net website traffic using Google Analytics; although what most readers value is not what I value, I find it motivating to see total traffic statistics reminding me of readers (writing can be a lonely and abstract endeavour), and useful to see what the major referrers are.

Gwern.net typically enjoys steady traffic in the 50–100k range per month, with occasional spikes from social media, particularly Hacker News; over the first decade (2010–2020), there were 7.98m pageviews by 3.8m unique users.

See Gwern.net Website Traffic

Colophon

Hosting

Gwern.net is served by Amazon S3 through the CloudFlare CDN. (Amazon charges less for bandwidth and disk space than NFSN, although one loses all the capabilities offered by Apache’s .htaccess, and compression is difficult so must be handled by CloudFlare; total costs may turn out to be a wash and I will consider the switch to Amazon S3 a success if it can bring my monthly bill to <$10 or <$120 a year.)

From October 2010 to June 2012, the site was hosted on NearlyFreeSpeech.net, an old hosting company; its specific niche is controversial material and activist-friendly pricing. Its libertarian owners cast a jaundiced eye on takedown demands, and pricing is pay-as-you-go. I like the former aspect, but the latter sold me on NFSN. Before I stumbled on NFSN (someone mentioned it offhandedly while chatting), I was getting ready to pay $10–15 a month ($120 yearly) to Linode. Linode’s offerings are overkill since I do not run dynamic websites or something like Haskell.org (with wikis and mailing lists and repositories), but I didn’t know a good alternative. NFSN’s pricing meant that I paid for usage rather than large flat fees. I put in $32 to cover registering Gwern.net until 2014, and then another $10 to cover bandwidth & storage price. DNS aside, I was billed $8.27 for October-December 2010; DNS included, January-April 2011 cost $10.09. $10 covered months of Gwern.net for what I would have paid Linode in 1 month! In total, my 2010 costs were $39.44 (bill archive); my 2011 costs were $118.32 ($9.86 a month; archive); and my 2012 costs through June were $112.54 ($21 a month; archive); sum total: $270.30.

The switch to Amazon S3 hosting is complicated by my simultaneous addition of CloudFlare as a CDN; my total June 2012 Amazon bill is $1.62, with $0.19 for storage. CloudFlare claims it covered 17.5GB of 24.9GB total bandwidth, so the $1.41 represents 30% of my total bandwidth; multiply 1.41 by 3 is ~4.23, and my hypothetical non-CloudFlare S3 bill is ~$4.5. Even at $10, this was well below the $21 monthly cost at NFSN. (The traffic graph indicates that June 2012 was a relatively quiet period, but I don’t think this eliminates the factor of 5.) From July 2012 to June 2013, my Amazon bills totaled $60, which is reasonable except for the steady increase ($1.62/$3.27/$2.43/$2.45/$2.88/$3.43/$4.12/$5.36/$5.65/$5.49/$4.88/$8.48/$9.26), being primarily driven by out-bound bandwidth (in June 2013, the $9.26 was largely due to the 75GB transferred—and that was after CloudFlare dealt with 82GB); $9.26 is much higher than I would prefer since that would be >$110 annually. This was probably due to all the graphics I included in the “Google shutdowns” analysis, since it returned to a more reasonable $5.14 on 42GB of traffic in August. September, October, November and December 2013 saw high levels maintained at $7.63/$12.11/$5.49/$8.75, so it’s probably a new normal. 2014 entailed new costs related to EC2 instances & S3 bandwidth spikes due to hosting a multi-gigabyte scientific dataset, so bills ran $8.51/$7.40/$7.32/$9.15/$26.63/$14.75/$7.79/$7.98/$8.98/$7.71/$7/$5.94. 2015 & 2016 were similar: $5.94/$7.30/$8.21/$9.00/$8.00/$8.30/$10.00/$9.68/$14.74/$7.10/$7.39/$8.03/$8.20/$8.31/$8.25/$9.04/$7.60/$7.93/$7.96/$9.98/$9.22/$11.80/$9.01/$8.87. 2017 saw costs increase due to one of my side-projects, aggressively increasing fulltexting of Gwern.net by providing more papers & scanning cited books, only partially offset by changes like lossy optimization of images & converting GIFs to WebMs: $12.49/$10.68/$11.02/$12.53/$11.05/$10.63/$9.04/$11.03/$14.67/$15.52/$13.12/$12.23 (total: $144.01). In 2018, I continued fulltexting: $13.08/$14.85/$14.14/$18.73/$18.88/$15.92/$15.64/$15.27/$16.66/$22.56/$23.59/$25.91 (total: $213).
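
Restating that June 2012 back-of-the-envelope estimate with the figures quoted above (R, just to make the arithmetic explicit):

bandwidth <- 1.41                  # S3's bandwidth charge for June 2012 (the ~30% CloudFlare did not serve)
storage   <- 0.19                  # the storage component of the $1.62 bill
share     <- (24.9 - 17.5) / 24.9  # ~0.30: fraction of the 24.9GB not served by CloudFlare
bandwidth * 3 + storage            # ~$4.4: the rough 'multiply by 3' estimate used above
bandwidth / share + storage        # ~$4.9: the same estimate using the exact share

Either way, the hypothetical all-S3 bill comes out well under the ~$21/month I had been paying at NFSN.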

For 2019, I made a determined effort to host more things, including whole websites like the OKCupid archives or rotten.com, and to include more images/videos (the StyleGAN anime faces tutorial alone must be easily 20MB+ just for images) and it shows in how my bandwidth costs exploded: $26.49/$37.56/$37.56/$37.56/$25.00/$25.00/$25.00/$25.00/$77.91/$124.45/$74.32/$79.19. I’ve begun considering a move of Gwern.net to my Hetzner dedicated server which has cheap bandwidth, combined with upgrading my Cloudflare CDN to keep site latency in check (even at $20/month, it’s still far cheaper than AWS S3 bandwidth).

Source

The revision history is kept in git; individual page sources can be read by appending .page to their URL.

Size

As of 2020-01-07, the source of Gwern.net is composed of >366 text files with >3.76m words or >27MB; this includes my writings & documents I have transcribed into Markdown, but excludes images, PDFs, HTML mirrors, source code, archives, infrastructure, popup annotations, and the revision history. With those included and everything compiled to the static35 HTML, the site is >18.3GB. The source repository contains >13,323 patches (this is an under-count, as the creation of the repository on 2008-09-26 included already-written material).

Design

“People who are really serious about software should make their own hardware.”

Alan Kay, “Creative Think” 1982

The sorrow of web design & typography is that how you present your pages can matter only a little. A page can be terribly designed and render as typewriter text in 80-column ASCII monospace, and readers will still read it, even if they complain about it. And the most tastefully-designed page, with true smallcaps and correct use of em-dashes vs en-dashes vs hyphens vs minuses and all, which loads in a fraction of a second and is SEO optimized, is of little avail if the page has nothing worth reading; no amount of typography can rescue a page of dreck. Perhaps 1% of readers could even recognize any of these details, much less name them. If we added up all the small touches, they surely make a difference to the reader’s happiness, but it would have to be a small one—say, 5%.36 It’s hardly worth it if one only writes a few things.

But the joy of web design & typography is that just its presentation can matter a little to all your pages. Writing is hard work, and any new piece of writing will generally add to the pile of existing ones, rather than multiplying it all; it’s an enormous amount of work to go through all one’s existing writings and improve them somehow, so it usually doesn’t happen. Design improvements, on the other hand, benefit one’s entire website & all future readers, and so at a certain scale, can be quite useful. I feel I’ve reached the point where it’s worth sweating the small stuff, typographically.

Principles

There are 4 design principles:

  1. Aesthetically-pleasing Minimalism

    The design esthetic is minimalist. I believe that helps one focus on the content. Anything besides the content is distraction and not design. ‘Attention!’, as the saying goes.37

    The palette is deliberately kept to grayscale as an experiment in consistency and whether this constraint permits a readable, aesthetically-pleasing website. Various classic typographical tools are used for emphasis instead.38

  2. Accessibility & Progressive Enhancement

    Semantic markup is used where Markdown permits. JavaScript is not required for the core reading experience, only for optional features: comments, table-sorting, sidenotes, and so on. Pages can even be read without much problem on a smartphone or in a text browser like elinks.

  3. Speed & Efficiency

    On an increasingly-bloated Internet, a website which is anywhere remotely as fast as it could be is a breath of fresh air. Readers deserve better. Gwern.net uses many tricks to offer nice features like sidenotes or LaTeX math at minimal cost.

  4. Structural Reading

    How should we present texts online? A web page, unlike many mediums such as print magazines, lets us provide an unlimited amount of text. We need not limit ourselves to overly concise constructions, which countenance contemplation but not conviction.

    The problem then becomes taming complexity and length, lest we hang ourselves with our own rope. Some readers want to read every last word about a particular topic, while most readers want the summary or are skimming through on their way to something else. A tree structure is helpful in organizing the concepts, but doesn’t solve the presentation problem: a book or article may be hierarchically organized, but it still must present every last leaf node at 100% size. Tricks like footnotes or appendices only go so far—having thousands of endnotes or 20 appendices to tame the size of the ‘main text’ is unsatisfactory as while any specific reader is unlikely to want to read any specific appendix, they will certainly want to read an appendix & possibly many. The classic hypertext paradigm of simply having a rat’s-nest of links to hundreds of tiny pages to avoid any page being too big also breaks down, because how granular does one want to go? Should every section be a separate page? (Anyone who attempted to read a manual knows how tedious that can be.39) What about every reference in the bibliography, should there be 100 different pages for 100 different references?

    A web page, however, can be dynamic. The solution to the length problem is to progressively expose more beyond the default as the user requests it, and to make requesting as easy as possible. For lack of a well-known term, and by analogy to code folding in IDEs and outliners, I call this structural reading: the hierarchy is made visible & malleable to allow reading at multiple levels of the structure.

    A Gwern.net page can be read at multiple structural levels: title, metadata block, abstracts, margin notes, emphasized keywords in list items, footnotes/sidenotes, collapsible sections, popup link annotations, and fulltext links or internal links to other pages. So the reader can read (in increasing depth) the title/metadata, or the page abstract, or skim the headers/Table of Contents, then skim margin notes+item summaries, then read the body text, then click to uncollapse regions to read in-depth sections too, and then if they still want more, they can mouse over references to pull up the abstracts or excerpts, and then they can go even deeper by clicking the fulltext link to read the full original. Thus, a page may look short, and the reader can understand & navigate it easily, but like an iceberg, those readers who want to know more about any specific point will find much more under the surface.

Miscellaneous principles: all visual differences should be semantic differences; every UI element that can react should visually change on hover, and have tooltips/summaries; hierarchies & progressions should come in cycles of 3 (eg bold > smallcaps > italics); function > form; more > less; self-contained > fragmented; convention (linters/checkers) > constraints; hypertext is a great idea, we should try that!; local > remote—every link dies someday (archives are expensive short-term but cheap long-term); reader > author; speed is the second-most important feature after correctness; always bet on text; earn your ornaments (if you go overboard on minimalism, you may barely be mediocre); “users won’t tell you when it’s broken”; UI consistency is underrated (when in doubt, copy Wikipedia); if you find yourself doing something 3 times, fix it.

Features

Notable features (compared to standard Markdown static site):

  • sidenotes using both margins, fallback to floating footnotes

  • source code syntax highlighting

  • code folding (collapsible sections/code blocks/tables)

  • JS-free LaTeX math rendering (but where possible, HTML+Unicode is used instead, as it is much more efficient & natural-looking)

  • Link popup annotations:

    Annotations are hand-written, and automatically extracted from Wikipedia/Arxiv/BioRxiv/MedRxiv/Gwern.net/Crossref.

  • dark mode (with a toggle)

  • click-to-zoom images & slideshows; full-width tables/images

  • Disqus comments

  • sortable tables; tables of various sizes

  • automatically inflation-adjust dollar amounts, exchange-rate Bitcoin amounts

  • link icons for filetype/domain/topic

  • infoboxes (Wikipedia-like by way of Markdeep)

  • lightweight drop caps

  • epigraphs

  • automatic smallcaps typesetting

  • multi-column lists

  • interwiki link syntax

Much of Gwern.net design and JS/CSS was developed by Said Achmiz⁠, 2017–2020. Some inspiration has come from Tufte CSS & Matthew Butterick’s Practical Typography⁠.

Abandoned

Worth noting are things I tried but abandoned (in roughly chronological order):

  • Gitit wiki: I preferred to edit files in Emacs/Bash rather than a GUI/browser-based wiki.

    A Pandoc-based wiki using Darcs as a history mechanism, serving mostly as a demo; the requirement that ‘one page edit = one Darcs revision’ quickly became stifling, and I began editing my Markdown files directly and recording patches at the end of the day, and syncing the HTML cache with my host (at the time, a personal directory on code.haskell.org). Eventually I got tired of that and figured that since I wasn’t using the wiki, but only the static compiled pages, I might as well switch to Hakyll and a normal static website approach.

  • jQuery sausages: unhelpful UI visualization of section lengths.

    A UI experiment, ‘sausages’ add a second scroll bar where vertical lozenges correspond to each top-level section of the page; it indicates to the reader how long each section is and where they are. (They look like a long link of pale white sausages.) I thought it might assist the reader in positioning themselves, like the popular ‘floating highlighted Table of Contents’ UI element, but without text labels, the sausages were meaningless. After a jQuery upgrade broke it, I didn’t bother fixing it.

  • Beeline Reader: a ‘reading aid’ which just annoyed readers.

    BLR tries to aid reading by coloring the beginnings & endings of lines to indicate the continuation and make it easier for the reader’s eyes to saccade to the correct next line without distraction (apparently dyslexic readers in particular have issue correctly fixating on the continuation of a line). The A/B test indicated no improvements in the time-on-page metric, and I received many complaints about it; I was not too happy with the browser performance or the appearance of it, either.

    I’m sympathetic to the goal and think syntax highlighting aids are underused, but BLR was a bit half-baked and not worth the cost compared to more straightforward interventions like reducing paragraph lengths or more rigorous use of ‘structural reading’ formatting. (We may be able to do typography very differently in the future with new technology, like VR/AR headsets which come with built-in eye-tracking—forget simple tricks like emphasizing the beginning of the next line as the reader reaches the end of the current line, do we need ‘lines’ at all if we can do things like just-in-time display the next piece of text in-place to create an ‘infinite line’?)

  • Google Custom Search Engine: site search feature which too few people used.

    A ‘custom search engine’, a CSE is a souped-up site:gwern.net/ Google search query; I wrote one covering Gwern.net and some of my accounts on other websites, and added it to the sidebar. Checking the analytics, perhaps 1 in 227 page-views used the CSE, and a decent number of them used it only by accident (eg searching “e”); an A/B test for a feature used so little would be powerless, and so I removed it rather than try to formally test it.

  • Tufte-CSS sidenotes: fundamentally broken, and superseded.

    An early admirer of Tufte-CSS for its sidenotes, I gave a Pandoc plugin a try only to discover a terrible drawback: the CSS didn’t support block elements & so the plugin simply deleted them. This bug apparently can be fixed, but the density of footnotes led to using sidenotes.js instead.

  • DjVu document format use: DjVu is a space-efficient document format with the fatal drawback that Google ignores it, and “if it’s not in Google, it doesn’t exist.”

    DjVu is a document format superior to PDFs, especially standard PDFs: I discovered that space savings of 5× or more were entirely possible, so I used it for most of my book scans. It worked fine in my document viewers, Internet Archive & Libgen preferred them, and so why not? Until one day I wondered if anyone was linking them and tried searching in Google Scholar for some. Not a single hit! (As it happens, GS seems to specifically filter out books.) Perplexed, I tried Google—also nothing. Huh‽ My scans have been visible for years, DjVu dates back to the 1990s and was widely used (if not remotely as popular as PDF), and G/GS picks up all my PDFs which are hosted identically. What about filetype:djvu? I discovered to my horror that on the entire Internet, Google indexed about 50 DjVu files. Total. While apparently at one time Google did index DjVu files, that time must be long past.

    Loathe to take the space hit, which would noticeably increase my Amazon AWS S3 hosting costs, I looked into PDFs more carefully. I discovered PDF technology had advanced considerably over the default PDFs that gscan2pdf generates, and with compression, they were closer to DjVu in size; I could conveniently generate such PDFs using ocrmypdf⁠.40 This let me convert over at moderate cost and now my documents do show up in Google.

  • Darcs/Github repo: no useful contributions or patches submitted, added considerable process overhead, and I accidentally broke the repo by checking in too-large PDFs from a failed post-DjVu optimization pass (I misread the result as being smaller, when it was much larger).

  • spaces in URLs: an OK idea but users are why we can’t have nice things.

    Gitit assumed ‘titles = filenames = URLs’, which simplified things and I liked spaced-separated filenames; I carried this over to Hakyll, but gradually, by monitoring analytics realized this was a terrible mistake—as straightforward as URL-encoding spaces as %20 may seem to be, no one can do it properly. I didn’t want to fix it because by the time I realized how bad the problem was, it would have required breaking, or later on, redirecting, hundreds of URLs and updating all my pages. The final straw was when The Browser linked a page incorrectly, sending ~1500 people to the 404 page. I finally gave in and replaced spaces with hyphens. (Underscores are the other main option but because of Markdown, I worry that trades one error for another.) I suspect I should have also lower-cased all links while I was at it, but thus far it has not proven too hard to fix case errors & lower-case URLs are ugly.

    In retrospect, Sam Hughes was right: I should have made URLs as simple as possible (and then a bit simpler): a single word, lowercase alphanum, with no hyphens or underscores or spaces or punctuation of any sort. I am, however, locked in to longer hyphen-separated mixed-case URLs now.

  • banner ads (and ads in general): reader-hostile and probably a net financial loss.

    I hated running banner ads, but before my Patreon began working, it seemed the lesser of two evils. As my finances became less parlous, I became curious as to how much lesser—but I could find no Internet research whatsoever measuring something as basic as the traffic loss due to advertising! So I decided to run an A/B test myself⁠, with a proper sample size and cost-benefit analysis; the harm turned out to be so large that the analysis was unnecessary, and I removed AdSense permanently the first time I saw the results. Given the measured traffic reduction, I was probably losing several times more in potential donations than I ever earned from the ads. (Amazon affiliate links appear to not trigger this reaction, and so I’ve left them alone.)

  • Bitcoin/PayPal/Gittip/Flattr donation links: never worked well compared to Patreon.

    These methods were either single-shot or never hit a critical mass. One-off donations failed because people wouldn’t make a habit if it was manual, and it was too inconvenient. Gittip/Flattr were similar to Patreon in bundling donators, and making it a regular thing, but never hit an adequate scale.

  • web fonts: slow and buggy.

    Google Fonts turned out to introduce noticeable latency in page rendering; further, its selection of fonts is limited, and the fonts outdated or incomplete. We got both faster and nicer-looking pages by taking the master Github versions of Adobe Source Serif/Sans Pro (the Google Fonts version was both outdated & incomplete then) and subsetting them for Gwern.net specifically.

  • MathJax JS: switched to static rendering during compilation for speed.

    For math rendering, MathJax and KaTeX are reasonable options (inasmuch as browser MathML adoption is dead in the water). MathJax rendering is extremely slow on some pages: up to 6 seconds to load and render all the math. Not a great reading experience. When I learned that it was possible to preprocess MathJax-using pages, I dropped MathJax JS use the same day.

  • <q> quote tags for English syntax highlighting: divisive and a maintenance burden.

    I like the idea of treating English as a little (not a lot!) more like a formal language, such as a programming language, as it comes with benefits like syntax highlighting. In a program, the reader gets guidance from syntax highlighting indicating logical nesting and structure of the ‘argument’; in a natural language document, it’s one damn letter after another, spiced up with the occasional punctuation mark or indentation. (If Lisp famously looks like “oatmeal with fingernail clippings mixed in”, then English must be plain oatmeal!) One of the most basic kinds of syntax highlighting is simply highlighting strings and other literals vs code: I learned early on that syntax highlighting was worth it just to make sure you hadn’t forgotten a quote or parenthesis somewhere! The same is true of regular writing: if you are extensively quoting or naming things, the reader can get a bit lost in the thickets of curly quotes and be unsure who said what.

    I discovered an obscure HTML tag enabled by an obscurer Pandoc setting: the quote tag <q>, which replaces quote characters and is rendered by the browser as quotes (usually). Quote tags are parsed explicitly, rather than just being opaque natural language text blobs, and so they, at least, can be manipulated easily by JS/CSS and syntax-highlighted. Anything inside a pair of quotes would be tinted a gray to visually set it off similarly to the blockquotes. I was proud of this tweak, which I’ve seen nowhere else.

    The problems with it were that not everyone was a fan (to say the least); it was not always correct (there are many double-quotes which are not literal quotes of anything, like rhetorical questions); and it interacted badly with everything else. There were puzzling drawbacks: eg web browsers drop the rendered quote marks from copy-paste, so we had to use JS to convert the <q> tags back to normal quote characters. Even when that was worked out, all the HTML/CSS/JS had to be constantly rejiggered to deal with interactions with them, browser updates would silently break what had been working, and Said Achmiz hated the look. I tried manually annotating quotes to ensure they were all correct and not used in dangerous ways, but even with interactive regexp search-and-replace to assist, the manual toil of constantly marking up quotes was a major obstacle to writing. So I gave in. It was not meant to be.
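
    A minimal sketch of enabling the <q> output itself, assuming pandoc's Haskell API (the option corresponds to the --html-q-tags command-line flag; the CSS/JS layering described above is where the real complexity lay):

        {-# LANGUAGE OverloadedStrings #-}
        import Data.Text (Text)
        import Text.Pandoc

        -- Convert Markdown to HTML5, emitting <q> tags for smart quotes.
        markdownToQTaggedHtml :: Text -> IO Text
        markdownToQTaggedHtml md = runIOorExplode $ do
          doc <- readMarkdown def { readerExtensions = pandocExtensions } md
          writeHtml5String def { writerHtmlQTags = True } doc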

  • typographic rubrication: a solution in search of a problem.

    Red emphasis is a visual strategy that works wonderfully well for many styles, but not, as far as I could find, for Gwern.net. Using it on the regular website resulted in too much emphasis, and the lack of color anywhere else made the design inconsistent; we tried using it in dark mode to add some color & preserve night vision by making headers/links/drop-caps red, but it looked like “a vampire fansite”, as one reader put it. It is a good idea; we just haven’t found a use for it. (Perhaps if I ever make another website, it will be designed around rubrication.)

  • wikipedia-popups.js: a JS library written to imitate Wikipedia popups, which used the WP API to fetch article summaries; obsoleted by the faster & more general local static link annotations.

    I disliked the delay, and as I thought about it, it occurred to me that it would be nice to have popups for other websites, like Arxiv/BioRxiv links—but they didn’t have APIs which could be queried. If I fixed the first problem by fetching WP article summaries while compiling articles and inlining them into the page, then there was no reason to include summaries only for Wikipedia links: I could get summaries from any tool or service or API, and I could of course write my own! But that required an almost complete rewrite to turn it into popups.js.
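
    A hedged sketch of the compile-time fetching idea, using Wikipedia’s REST summary endpoint and assuming the http-conduit & aeson libraries (the actual annotation code adds caching, HTML cleanup, and dispatch across many sources):

        {-# LANGUAGE OverloadedStrings #-}
        import Data.Aeson (Value)
        import Data.Aeson.Types (parseMaybe, withObject, (.:))
        import Data.Text (Text)
        import Network.HTTP.Simple (getResponseBody, httpJSON, parseRequest)

        -- Fetch the lead-section summary ("extract") of an English Wikipedia
        -- article, to be inlined into the page as a link annotation at compile time.
        wpSummary :: String -> IO (Maybe Text)
        wpSummary article = do
          req <- parseRequest ("https://en.wikipedia.org/api/rest_v1/page/summary/" ++ article)
          response <- httpJSON req
          let body = getResponseBody response :: Value
          return $ parseMaybe (withObject "summary" (.: "extract")) body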

  • link screenshot previews: automatic screenshots too low-quality, and unpopular.

    To compensate for the lack of summaries for almost all links (even after I wrote the code to scrape various sites), I tried a feature I had seen elsewhere, ‘link previews’: small thumbnail-sized screenshots of a web page or PDF, loaded using JS when the mouse hovered over a link. (They were much too large, ~50kb each, to inline statically like the link annotations.) They gave some indication of what the target content was, and could be generated automatically using a headless browser. I used Chromium’s built-in screenshot mode for web pages, and took the first page of PDFs.

    The PDFs worked fine, but the webpages often broke: thanks to ads, newsletters, and the GDPR, countless webpages will pop up some sort of giant modal blocking any view of the page content, defeating the point. (I have extensions installed like AlwaysKillSticky to block that sort of spam, but Chrome screenshot cannot use any extensions or customized settings, and the Chrome devs refuse to improve it.) Even when it did work and produced a reasonable screenshot, many readers disliked it anyway and complained. I wasn’t too happy either about having 10,000 tiny PNGs hanging around. So as I expanded link annotations steadily, I finally pulled the plug on the link previews. Too much for too little.

    • Link Archiving: my link archiving improved on the link screenshots in several ways. First, SingleFile saves pages inside a normal Chromium browsing instance, which does support extensions and user settings. Killing stickies alone eliminates half the bad archives, ad block extensions eliminate a chunk more, and NoScript blacklists specific domains. (I initially used NoScript on a whitelist basis, but disabling JS breaks too many websites these days.) Finally, I decided to manually review every snapshot before it went live to catch bad examples and either fix them by hand or add them to the blacklist.

  • auto dark mode: a good idea, but users are why we can’t have nice things.

    OSes/browsers have defined a ‘global dark mode’ toggle which the user can set if they want dark mode everywhere, and this preference is exposed to web pages (the prefers-color-scheme media query); if you are implementing a dark mode for your website, it then seems natural to just make it a feature and turn it on iff the toggle is on. There is no need for complicated UI-cluttering widgets. And yet—if you do that, users will regularly complain about the website acting bizarre or being dark in the daytime, having apparently forgotten that they enabled it (or never understood what that setting meant).

    A widget is necessary to give readers control, although even there it can be screwed up: many websites settle for a simple negation switch of the global toggle, but if you do that, someone who flips the switch to get dark mode during the day will be blasted with a blinding white page at night, when the negated OS setting switches back… Our widget works better than that. Mostly.
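
    The underlying logic is easier to see as a toy model than in the actual JS (this is an illustration only, not the site’s code): with a tri-state widget, ‘Auto’ keeps tracking the OS’s prefers-color-scheme setting, while the two explicit settings override it; a bare negation switch cannot express ‘follow the OS’.

        -- Toy model of the theme-selection logic (illustration only).
        data OSPreference = OSLight | OSDark               -- what prefers-color-scheme reports
        data Widget       = Auto | ForceLight | ForceDark  -- the reader-facing toggle
        data Theme        = Light | Dark deriving Show

        effectiveTheme :: Widget -> OSPreference -> Theme
        effectiveTheme Auto       OSLight = Light
        effectiveTheme Auto       OSDark  = Dark
        effectiveTheme ForceLight _       = Light
        effectiveTheme ForceDark  _       = Dark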

  • multi-column footnotes: mysteriously buggy and yielding overlaps.

    Since most footnotes are short, and no one reads the endnote section, I thought rendering them as two columns, as many papers do, would be more space-efficient and tidy. It was a good idea, but it didn’t work.

  • Hyphenopoly: it turned out to be more efficient (and not much harder to implement) to hyphenate the HTML during compilation than to run JS clientside.

    To work around Google Chrome’s 2-decade-long refusal to ship hyphenation dictionaries on desktop and enable CSS hyphenation (and, incidentally, to use the better TeX hyphenation algorithm), the JS library Hyphenopoly will download the TeX English dictionary and typeset a webpage itself. While the performance cost was surprisingly minimal, it was there, and it caused problems with obscurer browsers like Internet Explorer.

    So we scrapped Hyphenopoly, and I later implemented a Hakyll function using an implementation of the TeX hyphenation algorithm & dictionary to insert, at compile time, a ‘soft hyphen’ everywhere a browser could usefully break a word; this enables Chrome to hyphenate correctly, at the moderate cost of inlining them and a few edge cases.41 (A sketch of this pass follows below.)

    Desktop Chrome finally shipped hyphen support in early 2020, and I removed the soft-hyphen hyphenation pass in April 2021 when CanIUse indicated >96% global support.
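
    A minimal sketch of that compile-time pass, assuming the Haskell hyphenation package (the real Hakyll version walked the Pandoc AST and special-cased code, URLs, & other fragile text):

        import Data.List (intercalate)
        import Text.Hyphenation (english_US, hyphenate)

        -- Insert U+00AD (soft hyphen) at every legal break point of a word,
        -- so browsers without hyphenation dictionaries can still break lines.
        softHyphenate :: String -> String
        softHyphenate = intercalate "\173" . hyphenate english_US

        -- e.g. softHyphenate "hyphenation" == "hy\173phen\173ation"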

  • autopager keyboard shortcuts: binding the Home/PgUp & End/PgDwn keyboard shortcuts to go to the ‘previous’/‘next’ logical page turned out to be glitchy & confusing.

    HTML supports rel="prev"/rel="next" link relations, which specify what URL is the logical previous or next one; this makes sense in many contexts like manuals or webcomics or series of essays, but browsers make little use of this metadata—typically not even to preload the next page! (Opera apparently was one of the few exceptions.)

    Such metadata was typically available in older hypertext systems by default, and so older, more reader-oriented interfaces like pre-Web hypertext readers (such as Info browsers) frequently overloaded the standard page-up/down keybindings to go to the logical previous/next node if one was already at the beginning/end of a node. This was convenient, since it made paging through a long series of info nodes fast, almost as if the entire info manual were a single long page, and it was easy to discover: most users will accidentally tap the keys twice at some point, either reflexively or by not realizing they were already at the top/bottom (as is usually the case on info nodes, due to their egregious shortness). In comparison, navigating the HTML version of an info manual is frustrating: not only do you have to use the mouse to page through potentially dozens of 1-paragraph pages, but each page takes noticeable time to load (because of the failure to exploit preloading), whereas a local info browser is instantaneous.

    After defining a global sequence for Gwern.net pages, and adding a ‘navbar’ to the bottom of each page with previous/next HTML links encoding that sequence, I thought it’d be nice to support continuous scrolling through Gwern.net, and wrote some JS to detect whether the reader was at the top/bottom of the page and, on each Home/PgUp/End/PgDwn press, whether that key had already been pressed in the previous 0.5s; if so, it proceeded to the previous/next page.

    This worked, but proved buggy and opaque in practice, and occasionally tripped up even me. Since so few people know about that pre-WWW hypertext UI pattern (useful as it is), and they would be unlikely to discover it or use it much if they did, I removed it.

Tools

Software tools & libraries used in the site as a whole:

  • The source files are written in Pandoc Markdown (Pandoc: John MacFarlane et al; GPL) (source files: Gwern Branwen, CC-0). The Pandoc Markdown uses a number of extensions; pipe tables are preferred for anything but the simplest tables; and I use semantic linefeeds (also called “semantic line breaks” or “ventilated prose”) formatting.

  • math is written in LaTeX and rendered by MathJax (Apache)

  • syntax highlighting: we originally used Pandoc’s builtin highlighting themes, but most clashed with the overall appearance; after looking through all the existing themes, we took inspiration from Pygments’s algol_nu (BSD), which is based on the original ALGOL report, and typeset it in the IBM Plex Mono font42

  • the site is compiled with the Hakyll v4+ static site generator, written in Haskell (Hakyll: Jasper Van der Jeugt et al; BSD); for the gory details, see hakyll.hs, which implements the compilation, RSS feed generation, & parsing of interwiki links as well. This just generates the basic website; I do many additional optimizations/tests before & after uploading, which are handled by sync-gwern.net.sh (Gwern Branwen, CC-0)

    My preferred method of use is to browse & edit locally using Emacs, and then distribute using Hakyll. The simplest way to use Hakyll is to cd into your repository and runhaskell hakyll.hs build (with hakyll.hs having whatever options you like). Hakyll will build a static HTML/CSS hierarchy inside _site/; you can then do something like firefox _site/index. (Because HTML extensions are not specified in the interest of cool URIs⁠, you cannot use the Hakyll watch webserver as of January 2014.) Hakyll’s main advantage for me is relatively straightforward integration with the Pandoc Markdown libraries; Hakyll is not that easy to use, and so I do not recommend it as a general static site generator unless one is already adept with Haskell.
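
    For orientation, a stripped-down hakyll.hs looks something like the following (a sketch only, with hypothetical file paths; the real one is far larger, adding the annotation, interwiki, and testing machinery described on this page):

        {-# LANGUAGE OverloadedStrings #-}
        import Hakyll

        main :: IO ()
        main = hakyll $ do
          -- templates must be compiled before they can be applied
          match "templates/*" $ compile templateBodyCompiler
          -- copy static assets through unchanged
          match "static/**" $ do
            route   idRoute
            compile copyFileCompiler
          -- compile every Markdown page to extensionless HTML (cool URIs) via Pandoc
          match "*.page" $ do
            route   (setExtension "")
            compile $ pandocCompiler
              >>= loadAndApplyTemplate "templates/default.html" defaultContext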

  • the CSS is borrowed from a motley of sources and has been heavily modified, but its origin was the Hakyll homepage & Gitit⁠; for specifics, see default.css

  • Markdown syntax extensions:

    • I implemented a Pandoc Markdown plugin for a custom syntax for interwiki links in Gitit, and then ported it to Hakyll (defined in hakyll.hs); it allows linking to the English Wikipedia & its sister projects (among others) with syntax like [malefits](!Wiktionary) or [antonym of 'benefits'](!Wiktionary "Malefits"). CC-0. (A sketch of the idea follows this list.)
    • inflation adjustment: a Pandoc Markdown plugin allows automatic inflation-adjusting of dollar amounts, presenting the nominal amount & a current real amount, with a syntax like [$5]($1980).
    • Book affiliate links go through an Amazon affiliate tag appended in hakyll.hs
    • image dimensions are looked up at compilation time & inserted into <img> tags as browser hints
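
    To give the flavor of these Pandoc-level extensions, here is a hedged sketch (not the hakyll.hs code, which supports many wikis and does proper article-name escaping) of an interwiki rewrite as a Pandoc filter:

        {-# LANGUAGE OverloadedStrings #-}
        import qualified Data.Text as T
        import Text.Pandoc.Definition
        import Text.Pandoc.Shared (stringify)
        import Text.Pandoc.Walk (walk)

        -- Rewrite links whose target is "!Wiktionary" into real Wiktionary URLs,
        -- using the link title if given, otherwise the link text, as the article name.
        interwiki :: Inline -> Inline
        interwiki (Link attr label ("!Wiktionary", title)) =
          let article = if T.null title then stringify label else title
          in  Link attr label ("https://en.wiktionary.org/wiki/" <> T.replace " " "_" article, "")
        interwiki x = x

        -- Applied to a whole document during compilation:
        interwikiTransform :: Pandoc -> Pandoc
        interwikiTransform = walk interwiki
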
  • JavaScript:

    • Comments are outsourced to Disqus (since I am not interested in writing a dynamic system to do it, and their anti-spam techniques are much better than mine).

    • the HTML tables are sortable via tablesorter (Christian Bach; MIT/GPL)

    • the MathML is rendered using MathJax

    • analytics are handled by Google Analytics

    • A/B testing is done using ABalytics (Daniele Mazzini; MIT), which hooks into Google Analytics (see testing notes) for individual-level testing; when doing site-level long-term testing like the advertising A/B tests⁠, I simply write the JS manually.

    • popups.js: for loading introductions/summaries/previews of all links when one mouses over a link; it reads annotations, which are manually written & automatically populated from many sources (Wikipedia, Pubmed, BioRxiv, Arxiv, hand-written…), with special handling of YouTube videos (Said Achmiz, Shawn Presser; MIT).

      Note that ‘links’ here is interpreted broadly: almost everything can be ‘popped up’. This includes links to sections (or div IDs) on the current or other pages, PDFs (often page-linked using the obscure but handy #page=N feature), source code files (which are syntax-highlighted by Pandoc), locally-mirrored web pages, footnotes/sidenotes, any such links within the popups themselves recursively…

      • the floating footnotes are handled by the generalized tooltip popups (they were originally implemented via footnotes.js); when the browser window is wide enough, the floating footnotes are instead replaced with marginal notes/sidenotes43 using a custom library, sidenotes.js (Said Achmiz, MIT)

        Demonstration of sidenotes on Radiance.
    • image size: full-scale images (figures) can be clicked on to zoom into them with slideshow mode—useful for figures or graphs which do not comfortably fit into the narrow body—using another custom library, image-focus.js (Said Achmiz; GPL)

  • error checking: problems such as broken links are checked in 3 phases:

    • markdown-lint.sh: checks at writing time
    • sync-gwern.net.sh: during compilation, sanity-checks file size & count; greps for broken interwikis; runs HTML tidy over pages to warn about invalid HTML; tests liveness & MIME types of various pages post-upload; checks for duplicates, read-only, banned filetypes, too large or uncompressed images, etc.
    • Link rot tools: linkchecker⁠, ArchiveBox⁠, and archiver-bot

Implementation Details

There are a number of little tricks or details that web designers might find interesting.

Efficiency:

  • fonts:

    • Adobe Source Serif Pro⁠ / Source Sans Pro⁠ / Source Code Pro: originally Gwern.net used Baskerville⁠, but system Baskerville fonts don’t have small caps. Adobe’s open-source “Source” font family of screen serifs⁠, however, is high quality and actively-maintained, comes with good small caps and multiple sets of numerals (‘old-style’ numbers for the body text and different numbers for tables), and looks particularly nice on Macs. (It is also subsetted to cut down the load time.) Small caps CSS is automatically added to abbreviations/acronyms/initials by a Hakyll/Pandoc plugin, to avoid manual annotation.
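
      A hedged sketch of what such a plugin can look like (not the actual plugin, which is more careful about initials, possessives, & punctuation): a Pandoc filter that wraps bare all-capital words in a span carrying a CSS class for the small-caps styling.

          {-# LANGUAGE OverloadedStrings #-}
          import Data.Char (isUpper)
          import qualified Data.Text as T
          import Text.Pandoc.Definition
          import Text.Pandoc.Walk (walk)

          -- Wrap bare acronyms (2+ capital letters) in <span class="smallcaps">.
          smallcapsAcronyms :: Inline -> Inline
          smallcapsAcronyms (Str w)
            | T.length w >= 2 && T.all isUpper w = Span ("", ["smallcaps"], []) [Str w]
          smallcapsAcronyms x = x

          addSmallcaps :: Pandoc -> Pandoc
          addSmallcaps = walk smallcapsAcronyms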

    • efficient drop caps by subsetting: 1 drop cap is used on every page, but a typical drop cap font will slowly download as much as a megabyte in order to render 1 single letter.

      CSS font loads avoid downloading font files which are entirely unused, but they must download the entire font file if anything in it is used, so it doesn’t matter that only one letter gets used. To avoid this, we split each drop cap font up into a single font file per letter and use CSS to load all the font files; since only 1 font file is used at all, only 1 gets downloaded, and it will be ~4kb rather than 168kb. This has been done for all the drop cap fonts used (yinit⁠, Cheshire Initials⁠, Deutsche Zierschrift⁠, Goudy Initialen⁠, Kanzlei Initialen), and the necessary CSS can be seen in fonts.css⁠. To specify the drop cap for each page, a Hakyll metadata field is used to pick the class and substituted into the HTML template.
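
      The per-letter splitting itself can be scripted; a hedged sketch assuming fonttools’ pyftsubset CLI (flag spellings from memory, and the filenames are hypothetical), driven from Haskell like the rest of the build:

          import Control.Monad (forM_)
          import System.Process (callProcess)
          import Text.Printf (printf)

          -- Produce one tiny single-letter font file per capital letter from a
          -- large drop-cap font, so a page downloads only the letter it uses.
          main :: IO ()
          main = forM_ ['A'..'Z'] $ \c ->
            callProcess "pyftsubset"
              [ "yinit.ttf"                              -- input drop-cap font (hypothetical filename)
              , printf "--unicodes=U+%04X" (fromEnum c)  -- keep only this letter
              , printf "--output-file=yinit-%c.ttf" c
              ]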

  • lazy JavaScript loading by IntersectionObserver: several JS features are used rarely or not at all on many pages, but are responsible for much network activity. For example, most pages have no tables but tablesorter must be loaded anyway, and many readers will never get all the way to the Disqus comments at the bottom of each page, but Disqus will load anyway, causing much network activity and disturbing the reader because the page is not ‘finished loading’ yet.

    To avoid this, IntersectionObserver can be used to write a small JS function which fires only when particular page elements are visible to the reader. The JS then loads the library, which does its thing. So an IntersectionObserver can be defined to fire only when an actual <table> element becomes visible, and on pages with no tables, this never happens. Similarly for Disqus and image-focus.js. This trick is a little dangerous if a library depends on another library, because the lazy loading might cause race conditions; fortunately, only 1 library, tablesorter, has a prerequisite, jQuery, so I simply prepend jQuery to tablesorter and lazy-load the combined file. (Other libraries, like sidenotes or WP popups, aren’t lazy-loaded: sidenotes need to be rendered as fast as possible or the page will jump around & be laggy, and WP links are so universal that it’s a waste of time making them lazy, since they will be in the first screen of every page & be loaded immediately anyway; those are simply loaded asynchronously with the HTML defer attribute.)

  • image optimization: PNGs are optimized by pngnq/advpng, JPEGs with mozjpeg, SVGs are minified, and PDFs are compressed with ocrmypdf’s JBIG2 support. (GIFs are not used at all in favor of WebM/MP4 <video>s.)

  • JS/CSS minification: because Cloudflare does Brotli compression, minification of JS/CSS has little advantage and makes development harder, so no minification is done; the font files don’t need any special compression either.

  • MathJax: getting well-rendered mathematical equations requires MathJax or a similar heavyweight JS library; worse, even after disabling features, the load & render time is extremely high—a page like the embryo selection page which is both large & has a lot of equations can visibly take >5s (as a progress bar that helpfully pops up informs the reader).

    The solution here is to prerender MathJax locally after Hakyll compilation⁠, using the local tool mathjax-node-page to load the final HTML files, parse the page to find all the math, compile the expressions, define the necessary CSS, and write the HTML back out. Pages still need to download the fonts but the overall speed goes from >5s to <0.5s, and JS is not necessary at all.
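
    A hedged sketch of that post-compilation step for a single page, assuming mathjax-node-page’s mjpage CLI filters HTML from stdin to stdout (the invocation is from memory and may differ; the real sync script also caches & error-checks):

        import System.Process (readProcess)

        -- Statically render the MathJax on one compiled page, in place:
        -- pipe the HTML through mjpage and overwrite the file.
        -- (readProcess only returns after the process exits and its output
        -- is fully read, so overwriting the same file is safe.)
        prerenderMath :: FilePath -> IO ()
        prerenderMath page = do
          html     <- readFile page
          rendered <- readProcess "mjpage" [] html
          writeFile page rendered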

  • collapsible sections: managing the complexity of pages is a balancing act. It is good to provide all the code necessary to reproduce results, but does the reader really want to look at a big block of code? Some readers always want to; sometimes only the few readers interested in the gory details will want to read the code. Similarly, a section might go into detail on a tangential topic or provide additional justification which most readers don’t want to plow through to continue with the main theme. Should the code or section be deleted? No. But relegating it to an appendix, or to another page entirely, is not satisfactory either—for code blocks particularly, one loses the literate-programming aspect if code blocks are being shuffled around out of order.

    A nice solution is to simply use a little JS to implement an approach where sections or code blocks can be visually shrunk or collapsed, and expanded on demand by a mouse click. Collapsed sections are specified by an HTML class (eg <div class="collapse"></div>), and a summary of a collapsed section can be displayed, defined by another class (<div class="collapseSummary">). This allows code blocks to be collapsed by default where they are lengthy or distracting, and entire regions to be collapsed & summarized, without resorting to many appendices or forcing the reader to an entirely separate page.

  • sidenotes: one might wonder why sidenotes.js is necessary when most sidenote implementations are like Tufte CSS and use a static HTML/CSS approach, which would avoid both a JS library and visibly repainting the page after load.

    The problem is that Tufte-CSS-style sidenotes do not reflow and are solely on the right margin (wasting the considerable whitespace on the left), and, depending on the implementation, may overlap, be pushed far down the page away from their referent, break when the browser window is too narrow, or not work on smartphones/tablets at all. The JS library handles all of these cases, even the most difficult ones like my annotated edition of Radiance⁠. (Tufte-CSS-style epigraphs⁠, however, pose no such problems, and we take the same approach of defining an HTML class & styling with CSS.)

  • Link icons: icons are defined for all filetypes used in Gwern.net and for many commonly-linked websites such as Wikipedia, Gwern.net itself (within-page section links and between-page links get ‘§’ & logo icons respectively), or YouTube; all are inlined into default.css as data URIs⁠; the SVGs are so small it would be absurd to have them be separate files.

  • Redirects: static sites have trouble with redirects, as they are just static files. AWS S3 does not support a .htaccess-like mechanism for rewriting URLs. To allow moving pages & fixing broken links, I wrote Hakyll.Web.Redirect for generating simple HTML pages with redirect metadata+JS, which simply redirect from URL 1 to URL 2. After moving to Nginx hosting, I converted all the redirects to regular rewrite rules.

    In addition to page renames, I monitor 404 hits in Google Analytics and the Nginx logs to fix errors where possible. There are an astonishing number of ways to misspell Gwern.net URLs, it turns out, and I have defined >10k redirects so far (in addition to generic regexp rewrites to fix whole patterns of errors).
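
    The old Hakyll-side approach was little more than a list of pairs; a minimal sketch assuming Hakyll’s Hakyll.Web.Redirect module (the example mappings are hypothetical):

        {-# LANGUAGE OverloadedStrings #-}
        import Hakyll
        import Hakyll.Web.Redirect (createRedirects)

        -- Each pair generates a stub HTML page at the old URL whose
        -- metadata+JS forwards the reader to the new URL.
        oldLinks :: [(Identifier, String)]
        oldLinks =
          [ ("old-page.html", "/new-page")      -- hypothetical examples
          , ("Old%20Name",    "/New-Name")
          ]

        main :: IO ()
        main = hakyll $ createRedirects oldLinks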

Benford’s law

Does Gwern.net follow the famous Benford’s law? A quick analysis suggests that it sort of does, except for the digit 2, probably due to the many citations to research from the past 2 decades (>2000 AD).

In March 2013 I wondered, upon seeing a mention of Benford’s law: “if I extracted all the numbers from everything I’ve written on Gwern.net, would it satisfy Benford’s law?” It seems the answer is… almost. I generate the list of numbers by running a Haskell program to parse out digits, commas, and periods, and then process the result with shell utilities.44 This can then be read into R to run a chi-squared test, which confirms lack of fit (p≈0), and to generate this comparison of the data & Benford’s law45:

Barplot of the first-digit distribution of parsed numbers vs the Benford’s law prediction

There’s a clear resemblance for everything but the digit ‘2’, which then blows the fit to heck. I have no idea why 2 is overrepresented—it may be due to all the citations to recent academic papers which would involve numbers starting with ‘2’ (2002, 2010, 2013…) and cause a double-count in both the citation and filename, since if I look in the docs/ fulltext folder, I see 160 files starting with ‘1’ but 326 starting with ‘2’. But this can’t be the entire explanation since ‘2’ has 20.3k entries while to fit Benford, it needs to be just 11.5k—leaving a gap of ~10k numbers unexplained. A mystery.

License

This site is licensed under the public domain (CC-0) license.

I believe the public domain license reduces transaction costs and legal friction46⁠, encourages copying, gives back (however little) to free software/free culture, and costs me nothing47⁠.


  1. ⁠, pg 19 of Russian Silhouettes⁠, on why he wrote his book of biographical sketches of great Soviet chess players. (As Richardson asks (Vectors 1.0, 2001): “25. Why would we write if we’d already heard what we wanted to hear?”)↩︎

  2. One danger of such an approach is that you will simply engage in cherry-picking, and build up an impressive-looking wall of citations that is completely wrong but effective in brainwashing yourself. The only solution is to be diligent about including criticism—so even if you do not escape brainwashing, at least your readers have a chance. Charles Darwin, 1902:

    I had, also, during many years followed a golden rule, namely, that whenever a published fact, a new observation or thought came across me, which was opposed to my general results, to make a memorandum of it without fail and at once; for I had found by experience that such facts and thoughts were far more apt to escape from the memory than favourable ones. Owing to this habit, very few objections were raised against my views which I had not at least noticed and attempted to answer.

    ↩︎
  3. “It is only the attempt to write down your ideas that enables them to develop.” –Wittgenstein (pg 109, Recollections of Wittgenstein); “I thought a little [while in the isolation tank], and then I stopped thinking altogether…incredible how idleness of body leads to idleness of mind. After 2 days, I’d turned into an idiot. That’s the reason why, during a flight, astronauts are always kept busy.” –Oriana Fallaci, quoted in Rocket Men: The Epic Story of the First Men on the Moon by Craig Nelson.↩︎

  4. Such as Larry Niven’s Known Space universe; consider the introduction to the chronologically last story in that setting, “Safe at Any Speed” (Tales of Known Space).↩︎

  5. :

    “If the individual lived five hundred or one thousand years, this clash (between his interests and those of society) might not exist or at least might be considerably reduced. He then might live and harvest with joy what he sowed in sorrow; the suffering of one historical period which will bear fruit in the next one could bear fruit for him too.”

    ↩︎
  6. From Aging and Old Age:

    One way to distinguish empirically between aging effects and proximity-to-death effects would be to compare, with respect to choice of occupation, investment, education, leisure activities, and other activities, elderly people on the one hand with young or middle-aged people who have truncated life expectancies but are in apparent good health, on the other. For example, a person newly infected with the AIDS virus (HIV) has roughly the same life expectancy as a 65-year-old and is unlikely to have, as yet, [major] symptoms. The conventional human-capital model implies that, after correction for differences in income and for other differences between such persons and elderly persons who have the same life expectancy (a big difference is that the former will not have pension entitlements to fall back upon), the behavior of the two groups will be similar. It does appear to be similar, so far as investing in human capital is concerned; the truncation of the payback period causes disinvestment. And there is a high suicide rate among HIV-infected persons (even before they have reached the point in the progression of the disease at which they are classified as persons with AIDS), just as there is, as we shall see in chapter 6, among elderly persons.

    ↩︎
  7. John F. Kennedy⁠, 1962:

    I am reminded of the story of the great French Marshal Lyautey, who once asked his gardener to plant a tree. The gardener objected that the tree was slow-growing and would not reach maturity for a hundred years. The Marshal replied, “In that case, there is no time to lose, plant it this afternoon.”

    ↩︎
  8. Mark Pilgrim⁠, “Freedom 0”:

    In the long run, the utility of all non-Free software approaches zero. All non-Free software is a dead end.

    ↩︎
  9. These dependencies can be subtle. Computer archivist Jason Scott writes of URL-shortening services that:

    URL shorteners may be one of the worst ideas, one of the most backward ideas, to come out of the last five years. In very recent times, per-site shorteners, where a website registers a smaller version of its hostname and provides a single small link for a more complicated piece of content within it… those are fine. But these general-purpose URL shorteners, with their shady or fragile setups and utter dependence upon them, well. If we lose or ⁠, millions of weblogs, essays, and non-archived tweets lose their meaning. Instantly. To someone in the future, it’ll be like everyone from a certain era of history, say ten years of the 18th century, started speaking in a one-time pad of cryptographic pass phrases. We’re doing our best to stop it. Some of the shorteners have been helpful, others have been hostile. A number have died. We’re going to release torrents on a regular basis of these spreadsheets, these code breaking spreadsheets, and we hope others do too.

    ↩︎
  10. remarks (and the comments provide even more examples) further on URL shorteners:

    But the biggest burden falls on the clicker, the person who follows the links. The extra layer of indirection slows down browsing with additional DNS lookups and server hits. A new and potentially unreliable middleman now sits between the link and its destination. And the long-term archivability of the hyperlink now depends on the health of a third party. The shortener may decide a link is a Terms Of Service violation and delete it. If the shortener accidentally erases a database⁠, forgets to renew its domain, or just disappears⁠, the link will break. If a top-level domain changes its policy on commercial use⁠, the link will break. If the shortener gets hacked, every link becomes a potential phishing attack.

    ↩︎
  11. A static text-source site has so many advantages for Long Content that I consider its use almost a no-brainer.

    • By nature, they compile most content down to flat standalone textual files, which allow recovery of content even if the original site software has bit-rotted or the source files have been lost or the compiled versions cannot be directly used in new site software: one can parse them with XML tools or with quick hacks or by eye.
    • Site compilers generally require dependencies to be declared up front, and the approach makes explicitness and static content easy, but dynamic interdependent components difficult, all of which discourages creeping complexity and hidden state.
    • A static site can be archived into a tarball of files which will be readable as long as web browsers exist (or afterwards if the HTML is reasonably clean), but it could be difficult to archive a CMS like WordPress or Blogspot (the latter doesn’t even provide the content in HTML—it only provides a rat’s-nest of inscrutable JavaScript files which then download the content from somewhere and display it somehow; indeed, I’m not sure how I would automate archiving of such a site if I had to; I would need some sort of headless browser to run the JS and serialize the final resulting DOM, possibly with some scripting of mouse/keyboard actions).
    • With blog/CMS software, by contrast, the content is often not available locally, or is stored in opaque binary formats rather than text (if one is lucky, it will at least be a database), both of which make it difficult to port content to other website software; you won’t have the necessary pieces, or they will be in wildly incompatible formats.
    • Static sites are usually written in a reasonably standardized markup language such as Markdown or LaTeX, in distinction to blogs which force one through WYSIWYG editors or invent their own markup conventions, which is yet another barrier: parsing a possibly ill-defined language.
    • The lowered sysadmin efforts (who wants to be constantly cleaning up spam or hacks on their WordPress blog?) are a final advantage: lower running costs make it more likely that a site will stay up rather than cease to be worth the hassle.

    Static sites are not appropriate for many kinds of websites, but they are appropriate for websites which are content-oriented, do not need interactivity, expect to migrate website software several times over coming decades, want to enable archiving by oneself or third parties (“lots of copies keeps stuff safe”), and to gracefully degrade after loss or bitrot.↩︎

  12. Such as burning the occasional copy onto read-only media like DVDs.↩︎

  13. One can’t be sure; the IA is fed by Alexa’s crawls, and Alexa doesn’t guarantee pages will be crawled & preserved if one goes through their request form.↩︎

  14. I am diligent in backing up my files, in periodically copying my content from the ⁠, and in preserving viewed Internet content; why do I do all this? Because I want to believe that my memories are precious, that the things I saw and said are valuable; “I want to meet them again, because I believe my feelings at that time were real.” My past is not trash to me, used up & discarded.↩︎

  15. Examples of such blogs:

    1. Eliezer Yudkowsky’s contributions to LessWrong were the rough draft of a philosophy book (or two)
    2. John Robb’s Global Guerrillas lead to his Brave New War: The Next Stage of Terrorism and the End of Globalization
    3. Kevin Kelly’s Technium was turned into What Technology Wants⁠.

    An example of how not to do it would be Robin Hanson’s Overcoming Bias blog; it is stuffed with fascinating citations & sketches of ideas, but they never go anywhere, with the exception of his mind-emulation-economy posts, which were eventually published in 2016 as The Age of Em. Just his posts on medicine would make a fascinating essay or even just a list—but he has never made one. ( would be a natural home for many of his posts’ contents, but will never be updated.)↩︎

  16. “Kevin Kelly Answers Your Questions”⁠, 2011-09-06:

    [Question:] “One purpose of the is to encourage long-term thinking. Aside from the Clock, though, what do you think people can do in their everyday lives to adopt or promote long-term thinking?”

    Kevin Kelly: “The 10,000-year Clock we are building in the hills of west Texas is meant to remind us to think long-term, but learning how to do that as an individual is difficult. Part of the difficulty is that as individuals we are constrained to short lives, and are inherently not long-term. So part of the skill in thinking long-term is to place our values and energies in ways that transcend the individual—either in generational projects, or in social enterprises.”

    “As a start I recommend engaging in a project that will not be complete in your lifetime. Another way is to require that your current projects exhibit some payoff that is not immediate; perhaps some small portion of it pays off in the future. A third way is to create things that get better, or run up in time, rather than one that decays and runs down in time. For instance a seedling grows into a tree, which has seedlings of its own. A program like which gives breeding pairs of animals to poor farmers, who in turn must give one breeding pair away themselves, is an exotropic scheme, growing up over time.”

    ↩︎
  17. ‘Princess Irulan’, ⁠, ↩︎

  18. GiveWell reports in “A good volunteer is hard to find” that, of volunteers motivated enough to email them asking to help, something like <20% will complete the GiveWell test assignment and render meaningful help. Such persons would have been well-advised to have simply donated some money. I have long noted that many of the most popular pages on Gwern.net could have been written by anyone and drew on no unique talents of mine; I have on several occasions received offers to help with the DNB FAQ—none of which have resulted in actual help.↩︎

  19. An old sentiment; consider “A drop hollows out the stone” (Ovid, Epistles) or Thomas Carlyle’s “The weakest living creature, by concentrating his powers on a single object, can accomplish something. The strongest, by dispensing his over many, may fail to accomplish anything. The drop, by continually falling, bores its passage through the hardest rock. The hasty torrent rushes over it with hideous uproar, and leaves no trace behind.” (The life of Friedrich Schiller, 1825)↩︎

  20. “Ten Lessons I wish I had been Taught”⁠, Gian-Carlo Rota:

    Richard Feynman was fond of giving the following advice on how to be a genius. You have to keep a dozen of your favorite problems constantly present in your mind, although by and large they will lay in a dormant state. Every time you hear or read a new trick or a new result, test it against each of your twelve problems to see whether it helps. Every once in a while there will be a hit, and people will say: ‘How did he do it? He must be a genius!’

    ↩︎
  21. IQ is sometimes used as a proxy for health, like height, because it sometimes seems like any health problem will damage IQ. Didn’t get much protein as a kid? Congratulations, your nerves will lack myelin and you will literally think slower. Missing some iodine? Say good bye to <10 points! If you’re anemic or iron-deficient, that might increase to <15 points. Have tapeworms? There go some more points, and maybe centimeters off your adult height, thanks to the worms stealing nutrients from you. Have a rough birth and suffer a spot of hypoxia before you began breathing on your own? Tough luck, old bean. It is very easy to lower IQ; you can do it with a baseball bat. It’s the other way around that’s nearly impossible.↩︎

  22. And America has tried pretty hard over the past 60 years to affect IQ. The whole nature/nurture debate would be moot if there were some nutrient or educational system which could add even 10 points on average, because then we would use it on all the blacks. But it seems that I’m constantly reading about programs like Head Start which boost IQ for a little while… and do nothing in the long run.↩︎

  23. For details on the many valuable correlates of the Conscientiousness personality factor, see Conscientiousness and online education⁠.↩︎

  24. 25 episodes, 6 movies, >11 manga volumes—just to stick to the core works.↩︎

  25. ⁠, KKS XII: 609:

    More than my life
    What I most regret
    Is
    A dream unfinished
    And awakening.
    ↩︎
  26. As with Cloud Nine; I accidentally erased everything on a routine basis while messing around with Windows.↩︎

  27. For example, I notice I am no longer deeply interested in the occult. Hopefully this is because I have grown mentally and recognize it as rubbish; I would be embarrassed if when I died it turned out my youthful self had a better grasp on the real world.↩︎

  28. Some pages don’t have any connection to predictions. It’s possible to make predictions for some border cases like the terrorism essays (death tolls, achievements of particular groups’ policy goals), but what about the short stories or poems? My imagination fails there.↩︎

  29. Thinking of predictions is good mental discipline; we should always be able to cash out our beliefs in terms of the real world, or know why we cannot. Unfortunately, humans being humans, we need to actually track our predictions—lest our predicting degenerate into entertainment like political punditry.↩︎

  30. Dozens of theories have been put forth. I have been collecting & making predictions; and am up to 219. It will be interesting to see how the movies turn out.↩︎

  31. I have 2 predictions registered about the thesis on PB.com: 1 reviewer will accept my theory by 2016 and the light novels will finish by 2015⁠.↩︎

  32. See Robin Hanson, “If Uploads Come First”↩︎

  33. I originally used last file modification time but this turned out to be confusing to readers, because I so regularly add or update links or add new formatting features that the file modification time was usually quite recent, and so it was meaningless.↩︎

  34. Reactive archiving is inadequate because such links may die before my crawler gets to them, may not be archivable, or will just expose readers to dead links for an unacceptably long time before I’d normally get around to them.↩︎

  35. I like the static site approach to things; it tends to be harder to use and more restrictive, but in exchange it yields better performance & leads to fewer hassles or runtime issues⁠. The static model of compiling a single monolithic site directory also lends itself to testing: any shell script or CLI tool can be easily run over the compiled site to find potential bugs (which has become increasingly important as site complexity & size increases so much that eyeballing the occasional page is inadequate).↩︎

  36. Rutter argues for this point in Web Typography⁠, which is consistent with my own A/B tests where even lousy changes are difficult to distinguish from zero effect despite large n, and with the general shambolic state of the Internet (eg as reviewed in the 2019 Web Almanac). If users will not install adblock and loading times of multiple seconds have relatively modest traffic reductions, things like aligning columns properly or using section signs or sidenotes must have effects on behavior so close to zero as to be unobservable.↩︎

  37. Paraphrased from Dialogues of the Zen Masters as quoted in pg 11 of the Editor’s Introduction to Three Pillars of Zen:

    One day a man of the people said to Master Ikkyu: “Master, will you please write for me maxims of the highest wisdom?” Ikkyu immediately brushed out the word ‘Attention’. “Is that all? Will you not write some more?” Ikkyu then brushed out twice: ‘Attention. Attention.’ The man remarked irritably that there wasn’t much depth or subtlety to that. Then Ikkyu wrote the same word 3 times running: ‘Attention. Attention. Attention.’ Half-angered, the man demanded: “What does ‘Attention’ mean anyway?” And Ikkyu answered gently: “Attention means attention.”

    ↩︎
  38. And also, admittedly, for esthetic value. One earns the right to add ‘extraneous’ details by first putting in the hard work of removing the actual extraneous details; only after the ground has been cleared—the ‘data-ink ratio’ maximized, the ‘chartjunk’ removed—can one see what is actually beautiful to add.↩︎

  39. The default presentation of separate pages means that an entire page may contain only a single paragraph or sentence. The HTML versions of many technical manuals (typically compiled from LaTeX, Docbook, or GNU Info) are even worse, because they fail to exploit prefetching & are slower than local documentation, and take away all of the useful keybindings which make navigating info manuals fast & convenient. Reading such documentation in a web browser is Chinese water torture. (That, decades later, the GNU project keeps generating documentation in that format, rather than at least as large single-page manuals with hyperlinked tables-of-contents, is a good example of how bad they are at UI/UX design.) And it’s not clear that it’s that much worse than the other extreme, the monolithic single-page manual which includes every detail under the sun and is impossible to navigate without one’s eyes glazing over, even using text search to navigate through dozens of irrelevant hits—every single time!↩︎

  40. Why don’t all PDF generators use that? Software patents, which make it hard to install the actual JBIG2 encoder (it has to ship separately from ocrmypdf; supposedly all JBIG2 encoding patents had expired by 2017, but no one, such as the Linux distros, wants to take the risk of unknown patents surfacing), and worries over edge-cases in JBIG2 where numbers might be visually changed to different numbers to save bits.↩︎

  41. Specifically: some OSes/browsers preserve soft hyphens in copy-paste, which might confuse readers, so we use JS to delete soft hyphens on copy; this breaks for users with JS disabled, and on Linux the X GUI’s middle-click paste bypasses the JS entirely (though no other way of copy-pasting does). There were some additional costs: the soft hyphens made the final HTML source code harder to read, made regexp & string searches/replaces more error-prone, and apparently some screen readers are so incompetent that they pronounce every soft hyphen!↩︎

  42. An unusual choice, as one does not associate IBM with font design excellence, but nevertheless, it was our choice after blind comparison of ~20 code fonts with (which we consider a requirement for code).↩︎

  43. Sidenotes have long been used as a typographic solution to densely-annotated texts such as the (first 2 pages), but have not shown up much online yet.

    Pierre Bayle’s Historical and Critical Dictionary, demonstrating recursive  (1737, volume 4, pg901; source: Google Books)

    An early & inspiring use of margin/side notes.↩︎

  44. We write a short Haskell program as part of a pipeline:

    echo '{-# LANGUAGE OverloadedStrings #-};
          import Data.Text as T;
          main = interact (T.unpack . T.unlines . Prelude.filter (/="") .
                           T.split (not . (`elem` "0123456789,.")) . T.pack)' > ~/number.hs &&
    find ~/wiki/ -type f -name "*.page" -exec cat "{}" \; | runhaskell ~/number.hs |
     sort | tr -d ',' | tr -d '.' | cut -c 1 | sed -e 's/0$//' -e '/^$/d' > ~/number.txt
    ↩︎
  45. Graph then test:

    numbers <- read.table("number.txt")
    ta <- table(numbers$V1); ta
    
    #     1     2     3     4     5     6     7     8     9
    # 20550 20356  7087  5655  3900  2508  2075  2349  2068
    ## cribbing exact R code from http://www.math.utah.edu/~treiberg/M3074BenfordEg.pdf
    sta <- sum(ta)
    pb <- sapply(1:9, function(x) log10(1+1/x)); pb
    m <- cbind(ta/sta,pb)
    colnames(m)<- c("Observed Prop.", "Theoretical Prop.")
    barplot( rbind(ta/sta,pb/sum(pb)), beside = T, col = rainbow(7)[c(2,5)],
                  xlab = "First Digit")
    title("Benford's Law Compared to Writing Data")
    legend(16,.28, legend = c("From Page Data", "Theoretical"),
           fill = rainbow(7)[c(2,5)],bg="white")
    chisq.test(ta,p=pb)
    #
    #     Chi-squared test for given probabilities
    #
    # data:  ta
    # X-squared = 9331, df = 8, p-value < 2.2e-16
    ↩︎
  46. PD increases economic efficiency through—if nothing else—making works easier to find. Tim O’Reilly says that “Obscurity is a far greater threat to authors and creative artists than piracy.” If that is so, then the difficulty of finding works reduces the welfare of artists and consumers, because both forgo a beneficial trade (the artist loses any revenue and the consumer loses any enjoyment). Even small increases in inconvenience make big differences⁠.↩︎

  47. Not that I could sell anything on this wiki; and if I could, I would polish it as much as possible, giving me fresh copyright.↩︎