Meta page describing Gwern.net website design experiments and post-mortem analyses.
- jQuery Sausages Scrollbar
- Beeline Reader
- Google Custom Search Engine
- Tufte-CSS Sidenotes
- DjVu Files
- Darcs/Github repo
- Spaced-Separated URLs
- Donation Links
- Google Web Fonts
- Quote Syntax Highlighting
- Link Screenshot Previews
- Automatic Dark Mode
- Multi-Column Footnotes
- Hyphenopoly Hyphenation
- Knuth-Plass Line Breaking
- Automatic Smallcaps
- Disqus Comments
Often the most interesting part of any design are the parts that are invisible—what was tried but did not work. Sometimes they were unnecessary, other times users didn’t understand them because it was too idiosyncratic, and sometimes we just can’t have nice things.
Some post-mortems of things I tried on Gwern.net but abandoned (in chronological order).
A Pandoc-based wiki using Darcs as a history mechanism, serving mostly as a demo; the requirement that ‘one page edit = one Darcs revision’ quickly became stifling, and I began editing my Markdown files directly and recording patches at the end of each day, and syncing the HTML cache with my host (at the time, a personal directory on
code.haskell.org). Eventually I got tired of that and figured that since I wasn’t using the wiki, but only the static compiled pages, I might as well switch to Hakyll and a normal static website approach.
jQuery sausages: unhelpful UI visualization of section lengths.
A UI experiment, ‘sausages’ add a second scroll bar where vertical lozenges correspond to each top-level section of the page; it indicates to the reader how long each section is and where they are. (They look like a long link of pale white sausages.) I thought it might assist the reader in positioning themselves, like the popular ‘floating highlighted Table of Contents’ UI element, but without text labels, the sausages were meaningless. After a jQuery upgrade broke it, I didn’t bother fixing it.
Beeline Reader: a ‘reading aid’ which just annoyed readers.
BLR tries to aid reading by coloring the beginnings & endings of lines to indicate the continuation and make it easier for the reader’s eyes to saccade to the correct next line without distraction (apparently dyslexic readers in particular have issue correctly fixating on the continuation of a line). The A/B test indicated no improvements in the time-on-page metric, and I received many complaints about it; I was not too happy with the browser performance or the appearance of it, either.
I’m sympathetic to the goal and think syntax highlighting aids are underused, but BLR was a bit half-baked and not worth the cost compared more straightforward interventions like reducing paragraph lengths or more rigorous use of ‘structural reading’ formatting. (We may be able to do typography differently in the future with new technology, like VR/AR headsets which come with eye tracking technology intended for foveated rendering—forget simple tricks like emphasizing the beginning of the next line as the reader reaches the end of the current line, do we need ‘lines’ at all if we can do things like just-in-time display the next piece of text in-place to create an ‘infinite line’?)
Google CSE: website search feature which too few people used.
A ‘custom search engine’, a CSE is a souped-up
site:gwern.net/ Google search query; I wrote one covering Gwern.net and some of my accounts on other websites, and added it to the sidebar. Checking the analytics, perhaps 1 in 227 page-views used the CSE, and a decent number of them used it only by accident (eg. searching “e”); an A/B testing for a feature used so little would be powerless, and so I removed it rather than try to formally test it.
An early admirer of Tufte-CSS for its sidenotes, I gave a Pandoc plugin a try only to discover a terrible drawback: the CSS didn’t support block elements & so the plugin simply deleted them. This bug apparently can be fixed, but the density of footnotes led to using
DjVu document format use: DjVu is a space-efficient document format with the fatal drawback that Google ignores it, and “if it’s not in Google, it doesn’t exist.”
DjVu is a document format superior to PDFs, especially standard PDFs: I discovered that space savings of 5× or more were entirely possible, so I used it for most of my book scans. It worked fine in my document viewers, Internet Archive & Libgen preferred them, and so why not? Until one day I wondered if anyone was linking them and tried searching in Google Scholar for some. Not a single hit! (As it happens, GS seems to specifically filter out books.) Perplexed, I tried Google—also nothing. Huh‽ My scans have been visible for years, DjVu dates to the 1990s and was widely used (if not remotely as popular as PDF), and G/GS picks up all my PDFs which are hosted identically. What about
filetype:djvu? I discovered to my horror that on the entire Internet, Google indexed about 50 DjVu files. Total. While apparently at one time Google did index DjVu files, that time must be long past.
Loathe to take the space hit, which would noticeably increase my Amazon AWS S3 hosting costs, I looked into PDFs more carefully. I discovered PDF technology had advanced considerably over the default PDFs that gscan2pdf generates, and with JBIG2 compression, they were closer to DjVu in size; I could conveniently generate such PDFs using ocrmypdf.1 This let me convert over at moderate cost and now my documents do show up in Google.
Darcs Patch-tag/Github Git repo: no useful contributions or patches submitted, added considerable process overhead, and I accidentally broke the repo by checking in too-large PDFs from a failed post-DjVu optimization pass (I misread the result as being smaller, when it was much larger).
I removed the site-content repo and replaced it with an infrastructure-specific repo for easier collaboration with Said Achmiz.
Spaces in URLs: an OK idea but users are why we can’t have nice things.
Gitit assumed ‘titles = filenames = URLs’, which simplified things and I liked spaced-separated filenames; I carried this over to Hakyll, but gradually, by monitoring analytics realized this was a terrible mistake—as straightforward as URL-encoding spaces as
%20 may seem to be, no one can do it properly. I didn’t want to fix it because by the time I realized how bad the problem was, it would have required breaking, or later on, redirecting, hundreds of URLs and updating all my pages. The final straw was when The Browser linked a page incorrectly, sending ~1500 people to the 404 page. I finally gave in and replaced spaces with hyphens. (Underscores are the other main option but because of Markdown, I worry that trades one error for another.) I suspect I should have also lower-cased all links while I was at it, but thus far it has not proven too hard to fix case errors & lower-case URLs are ugly.
In retrospect, Sam Hughes was right: I should have made URLs as simple as possible (and then a bit simpler): a single word, lowercase alphanumerical, with no hyphens or underscores or spaces or punctuation of any sort. I am, however, locked in to longer hyphen-separated mixed-case URLs now.
AdSense banner ads (and ads in general): reader-hostile and probably a net financial loss.
I hated running banner ads, but before my Patreon began working, it seemed the lesser of two evils. As my finances became less parlous, I became curious as to how much lesser—but I could find no Internet research whatsoever measuring something as basic as the traffic loss due to advertising! So I decided to run an A / B test myself, with a proper sample size and cost-benefit analysis; the harm turned out to be so large that the analysis was unnecessary, and I removed AdSense permanently the first time I saw the results. Given the measured traffic reduction, I was probably losing several times more in potential donations than I ever earned from the ads. (Amazon affiliate links appear to not trigger this reaction, and so I’ve left them alone.)
Google Fonts web fonts: slow and buggy.
The original idea of Google Fonts was a trusted high-performance provider of a wide variety of modern, multi-lingual, subsetted drop-in fonts which would likely be cached by user browsers if you used a common font. You want a decent Baskerville font? Just customize a bit of CSS and off you go!
The reality turned out to be a bit different. The cache story turned out to be mostly wishful thinking as caches expired too quickly, and in any case, privacy concerns meant that major web browsers all split caches across domains, so a Google Font download on your domain did nothing at all to help with the download on my domain. With no cache help and another domain connection required, Google Fonts turned out to introduce noticeable latency in page rendering. The variety of fonts offered turned out to be somewhat illusory: while expanding over time, its selection of fonts was back then limited, and the fonts outdated or incomplete. Google Fonts was not trusted at all and routinely cited as an example of the invasiveness of the Google panopticon (without any abuse ever documented that I saw—nevertheless, it was), and for additional lulz, Google Fonts may have been declared illegal by the EU’s elastic interpretation of the GDPR.
Removing Google Fonts was one of the first design & performance optimizations Said made. We got both faster and nicer-looking pages by taking the master Github versions of Adobe Source Serif/Sans Pro (the Google Fonts version was both outdated & incomplete then) and subsetting them for Gwern.net specifically.
MathJax JS: switched to static rendering during compilation for speed.
For math rendering, MathJax and KaTeX are reasonable options (inasmuch as MathML browser adoption is dead in the water). MathJax rendering is extremely slow on some pages: up to 6 seconds to load and render all the math. Not a great reading experience. When I learned that it was possible to preprocess MathJax-using pages, I dropped MathJax JS use the same day.
I like the idea of treating English as a little (not a lot!) more like a formal language, such as a programming language, as it comes with benefits like syntax highlighting. In a program, the reader gets guidance from syntax highlighting indicating logical nesting and structure of the ‘argument’; in a natural language document, it’s one damn letter after another, spiced up with the occasional punctuation mark or indentation. (If Lisp looks like “oatmeal with fingernail clippings mixed in” due to the lack of “syntactic sugar”, then English must be plain oatmeal!) One of the most basic kinds of syntax highlighting is simply highlighting strings and other literals vs code: I learned early on that syntax highlighting was worth it just to make sure you hadn’t forgotten a quote or parenthesis somewhere! The same is true of regular writing: if you are extensively quoting or naming things, the reader can get a bit lost in the thickets of curly quotes and be unsure who said what.
I discovered an obscure HTML tag enabled by an obscurer Pandoc setting: the quote tag
<q>, which replaces quote characters and is rendered by the browser as quotes (usually). Quote tags are parsed explicitly, rather than just being opaque natural language text blobs, and so they, at least, can be manipulated easily by JS/CSS and syntax-highlighted. Anything inside a pair of quotes would be tinted a gray to visually set it off similarly to the blockquotes. I was proud of this tweak, which I’ve seen nowhere else.
The problems with it was that not everyone was a fan (to say the least); it was not always correct (there are many double-quotes which are not literal quotes of anything, like rhetorical questions); and it interacted badly with everything else. There were puzzling drawbacks: eg. web browsers delete them from copy-paste, so we had to use JS to convert them to normal quotes. Even when it was worked out, all the HTML/CSS/JS had to be constantly rejiggered to deal with interactions with them, browser updates would silently break what was working, and Said hated the look. I tried manually annotating quotes to ensure they were all correct and not used in dangerous ways, but even with interactive regexp search-and-replace to assist, the manual toil of constantly marking up quotes was a major obstacle to writing. So I gave in. It was not meant to be.
Typographic rubrication: a solution in search of a problem.
Red emphasis is a visual strategy that works wonderfully well for many styles, but not Gwern.net that I could find. Using it on the regular website resulted in too much emphasis and the lack of color anywhere else made the design inconsistent; we tried using it in dark mode to add some color & preserve night vision by making headers/links/drop-caps red, but it looked like, as one reader put it, “a vampire fansite”. It is a good idea, but we just haven’t found a use for it. (Perhaps if I ever make another website, it will be designed around rubrication.)
wikipedia-popups.js: a JS library written to imitate Wikipedia popups, which used the WP API to fetch article summaries; obsoleted by the faster & more general local static link annotations.
I disliked the delay and as I thought about it, it occurred to me that it would be nice to have popups for other websites, like Arxiv/BioRxiv links—but they didn’t have APIs which could be queried. If I fixed the first problem by fetching WP article summaries while compiling articles and inlining them into the page, then there was no reason to include summaries for only Wikipedia links, I could get summaries from any tool or service or API, and I could of course write my own! But that required an almost complete rewrite to turn it into
The general popups functionality now handles WP articles as a special-case, which happens to call their API, but could also call another API, pop up the URL in an iframe (whether within the current page, on another page, or even on another website entirely), rewrite the URL being popped up in an iframe (such as trying to fetch a syntax-highlighted version of a linked file, or fetching the Ar5iv HTML version of an Arxiv paper), or fetch a pre-generated page like an annotation or backlinks or similar-links page.
Auto-dark mode: a good idea but “users are why we can’t have nice things”.
OSes/browsers have defined a ‘global dark mode’ toggle the user can set if they want dark mode everywhere, and this is available to a web page; if you are implementing a dark mode for your website, it then seems natural to just make it a feature and turn on iff the toggle is on. There is no need for complicated UI-cluttering widgets with complicated implementations. And yet—if you do do that, users will regularly complain about the website acting bizarre or being dark in the daytime, having apparently forgotten that they enabled it (or never understood what that setting meant).
A widget is necessary to give readers control, although even there it can be screwed up: many websites settle for a simple negation switch of the global toggle, but if you do that, someone who sets dark mode at day will be exposed to blinding white at night… Our widget works better than that. Mostly.
Multi-column footnotes: mysteriously buggy and yielding overlaps.
Since most footnotes are short, and no one reads the endnote section, I thought rendering them as two columns, as many papers do, would be more space-efficient and tidy. It was a good idea, but it didn’t work.
Hyphenopoly: it turned out to be more efficient (and not much harder to implement) to hyphenate the HTML during compilation than to run JS client-side.
To work around Google Chrome’s 2-decade-long refusal to ship hyphenation dictionaries on desktop and enable justified text (and incidentally use the better TeX hyphenation algorithm), the JS library Hyphenopoly will download the TeX English dictionary and typeset a webpage itself. While the performance cost was surprisingly minimal, it was there, and it caused problems with obscurer browsers like Internet Explorer.
So we scrapped Hyphenopoly, and I later implemented a Hakyll function using a Haskell version of the TeX hyphenation algorithm & dictionary to insert at compile-time a ‘soft hyphen’ everywhere a browser could usefully break a word, which enables Chrome to hyphenate correctly, at the moderate cost of inlining them and a few edge cases.2
Desktop Chrome finally shipped hyphen support in early 2020, and I removed the soft-hyphen hyphenation pass in April 2021 when CanIUse indicated >96% global support.
Knuth-Plass Line breaking: not to be confused with Knuth-Liang hyphenation discussed before, which simply optimizes the set of legal hyphens, Knuth-Plass line breaking tries to optimize the actual chosen linebreaks.
Particularly on narrow screens, justified text does not fit well, and must be distorted to fit, by microtypographic techniques like inserting spaces between/within words or changing glyph size. The default line breaking that web browsers use is a bad one: it is a greedy algorithm, which produces many unnecessary poor layouts, causing many stretched out words and blatant rivers. This bad layout gets worse the narrower the text, and so on Gwern.net lists on mobile, there are a lot of bad-looking list items when fully-justified with greedy layout.
Knuth-Plass instead looks at paragraphs as a whole, and calculates every possible layout to pick the best one. As can be seen in any TeX output, the results are much better. Knuth-Plass (or its competitors) would solve the justified mobile layout problem.
Unfortunately, no browser implements any such algorithm (aside from a brief period where Internet Explorer, of all browsers, apparently did); there is a property in CSS4,
typeset explicitly excludes lists and blockquotes, Bramstein commenting in 2014 that “This is mostly a tech-demo, not something that should be used in production. I’m still hopeful browser will implement this functionality natively at some point.” Knight’s
tex-linebreak suffers from fatal bugs too. Matthew Petroff has a demo which uses the brilliantly stupid brute-force approach of pre-calculating offline the Knuth-Plass linebreaks for every possible width—after all, monitor widths can only range ~1–4000px with the ‘readable’ range one cares about being a small subset of that—but it’s unclear, to say the least, how I’d ever use such a thing for Gwern.net, and doubtless has serious bugs of its own.) There are also questions about whether the performance on long pages would be acceptable, as the JS libraries rely on inserting & manipulating a lot of elements in order to force the browser to break where it should break, but those are largely moot when the prototypes are so radically incomplete. (My prediction is that the cost would be acceptable with some optimizations and constraints like considering a maximum of n lines.)
So the line breaking situation is insoluble for the foreseeable future. We decided to disable full justification on narrow screens, and settle for ragged-right.
Autopager keyboard shortcuts: binding Home/PgUp & End/PgDwn keyboard shortcuts to go to the ‘previous’/‘next’ logical page turned out to be glitchy & confusing.
HTML supports previous/next attributes (
"next") on links which specify what URL is the logical next or previous URL, which makes sense in many contexts like manuals or webcomics/web serials or series of essays; browsers make little use of this metadata—typically not even to preload the next page! (Opera apparently was one of the few exceptions.)
Such metadata was typically available in older hypertext systems by default, and so older more reader-oriented interfaces like pre-Web hypertext readers such info browsers frequently overloaded the standard page-up/down keybindings to, if one was already at the beginning/ending of a hypertext node, go to the logical previous/next node. This was convenient, since it made paging through a long series of info nodes fast, almost as if the entire info manual were a single long page, and it was easy to discover: most users will accidentally tap them twice at some point, either reflexively or by not realizing they were already at the top/bottom (as is the case on most info nodes due to egregious shortness). In comparison, navigating the HTML version of an info manual is frustrating: not only do you have to use the mouse to page through potentially dozens of 1-paragraph pages, each page takes noticeable time to load (because of failure to exploit preloading) whereas a local info browser is instantaneous.
After defining a global sequence for Gwern.net pages, and adding a ‘navbar’ to the bottom of each page with previous/next HTML links encoding that sequence, I thought it’d be nice to support continuous scrolling through Gwern.net, and wrote some JS to detect whether at the top/bottom of page, and on each Home/PgUp/End/PgDwn, whether that key had been pressed in the previous 0.5s, and if so, proceed to the previous/next page.
This worked, but proved buggy and opaque in practice, and tripped up even me occasionally. Since so few people know about that pre-WWW hypertext UI pattern (as useful as it is), would be unlikely to discover it, or use it much if they did discover it, I removed it.
.smallcaps-auto class: the typography of Gwern.net relies on “smallcaps”. We use smallcaps extensively as an additional form of emphasis going beyond italic, bold, and capitalization (and this motivated the switch from system Baskerville fonts to Source Serif Pro fonts). For example, keywords in lists can be emphasized as bold 1st top-level, italics 2nd level, and smallcaps 3rd level, making them much easier to scan.
However, there are other uses of smallcaps: acronyms/initials. 2 capital letters, like “AM”, don’t stand out; but names like “NASA” or phrases like “HTML/CSS” stick out for the same reason that writing in all-caps is ‘shouting’—capital letters are big! Putting them in smallcaps to condense them is a typographic refinement recommended by some typographers.3
Manually annotating every such case is a lot of work, even using interactive regexp search-and-replace. After a month or two, I resolved to do it automatically in Pandoc. So I created a rewrite plugin which would regexes on every string in the Pandoc AST for hits, split, and annotate the match in a HTML span element marked up with the
.smallcaps-auto class, which was styled by CSS like the existing
.smallcaps class. (Final code version.)
Doing so using Pandoc’s tree traversal library proved to be highly challenging due to a bunch of issues, and slow (I believe it doubled website compilation times due to the extravagant inefficiency of the traversal code & cost of running complex regexps on every possible node repeatedly). The rewrite approach meant that spans could be nested repeatedly, generating pointless
<span><span><span>... sequences (only partially ameliorated by more rewrite code to detect & remove those). The smallcaps regex was also hard to get right, and constantly sprouted new special-cases and exceptions. The injected span elements caused further complications downstream as they would break pattern-matches or add raw HTML to text I was not expecting to have raw HTML in it. The smallcaps themselves had many odd side-effects, like interactions with italics & link drop-shadow trick necessary for underlined links. The speed penalty did not stop at the website compilation, but affected users: Gwern.net pages are already intensive on browsers because of the extensive hyperlinks & formatting yielding a final large DOM (each atom of which caused additional load from the also-expanding set of JS & CSS), and the smallcaps markup added hundreds of additional DOM nodes to some pages. I also suspect that the very visibility of smallcaps contributed to the sense of “too fancy” or “overload” that many Gwern.net readers complain about: even if they don’t explicitly notice the smallcaps are smallcaps, they still notice that there is something unusual about all the acronyms. (If smallcaps were much more common, this would stop being a problem; but it is a problem and will remain one for as long as smallcaps are an exotic typographic flourish which must be explicitly enabled for each instance.)
The last straw was a change in annotations for Gwern.net essays to include their Table of Contents for easier browsing, where the ToCs in annotations got smallcaps-auto but the original ToCs did not (simply because the original ToCs are generated by Pandoc long after the rewrites are done, and are inaccessible to Pandoc plugins), creating an inconsistency and requiring even more CSS workarounds. At this point, with Said not a fan of smallcaps-auto and myself more than a little fed up, we decided to cut our losses and scrap the feature.
I still think that the idea of automatically using smallcaps for all-caps phrases like acronyms is valid—especially in technical writing, an acronym soup is overwhelming!—but the costs of doing so in the HTML DOM as CSS/HTML markup is too high for both writers & readers. It may make more sense for this sort of non-semantic change to be treated as a ligature and done by the font instead, which will have more control of the layout and avoid the need for special-cases. With smallcaps automatically done by the font, it can become a universal feature of online text, and lose its unpleasant unfamiliarity.
Disqus JS-based commenting system:
A commenting system was the sine qua non of blogs in the 2000s, but they required either a server to process comments (barring static websites) or an extortionately-expensive service using oft-incompatible plugins (barring blogging); they were also one of the most reliable ways (after being hacked thanks to WordPress) to kill a blog by filling it up with spam. Disqus helped disrupt incumbents by providing spam-filtering in a free JS-based service; while proprietary and lightly ad-supported at the time, it had some nice features like email moderation, and it supported the critical features of comment exports & anonymous comments. It quickly became the default choice for static websites which wanted a commenting system—like mine.
I set up Gwern.net’s Disqus in 2010-10-10; I removed it 4,212 days later, on 2022-04-21 (archive of comment exports).
There was no single reason to scrap Disqus, just a steady accumulation of minor issues:
Shift to social media: the lively blogosphere of the 2000s gave way in the 2010s to social media like Digg, Twitter, Reddit, Facebook—even in geek circles, momentum moved from on-blog comments to aggregators like Hacker News.
While there are still blogs with more comments on them than aggregators (eg. SlateStarCodex/Astral Codex Ten or LessWrong), this was increasingly only possible with a discrete community which centered on that blog. The culture of regular unaffiliated readers leaving comments is gone. I routinely saw aggregator:site comment ratios of >100:1. In the year before removal, I received 134 comments across >900,000 pageviews. For comparison, the last front-page Hacker News discussion had 254 comments, and the last weekly Astral Codex Ten ‘open thread’ discussion has >6× comments.
So, now I add links to those social media discussions in the “External Links” sections of pages to serve the purpose that the comment section used to. If no one is using the Disqus comments, why bother? (Much less move to an alternative like Commento, which costs >$100/year.) I am not the first blogger to observe that their commenting system has become vestigial, and remove it.
Monetization decay: it is a law of Internet companies that scrappy disruptive startups become extractive sclerotic incumbents as the VC money runs out & investors demand a return.
Disqus never became a unicorn and was eventually acquired by some sort of ad company. The new owners have not wrecked it the way many acquisitions go (eg. SourceForge), but it is clearly no longer as dynamic or invested-in as it used to, the spam-filtering seemed to occasionally fall behind the attackers, and the Disqus-injected advertising has gradually gotten heavier.
Many Disqus-user websites are unaware that Disqus lets you disable advertising on your website (it’s buried deep in the config), but Disqus’s reputation for advertising is bad enough that readers will accuse you of having Disqus ads anyway! (I think they look at one of the little boxes/page-cards for other pages on the same website which Disqus provides as recommendations, and without checking each one, assume that the rest are ads.) My ad experiments only investigated the harms of real advertising, so I don’t know how bad the effect of fake ads is—but I doubt it’s good.
odd bugs: One example of this decay is that I could never figure out why some Disqus comments on Gwern.net just… disappeared.
They weren’t casualties of page renames changing the URL, because comments disappeared on pages that had never been renamed. They weren’t deleted, because I knew I didn’t & the author would complain about me deleting them so they didn’t either. They weren’t marked as spam in the dashboard (as odd as retroactive spam-filtering would be, given that they had been approved initially). In fact, they weren’t anywhere in the dashboard that I could see, which made reporting issues to Disqus rather odd (and given the Disqus decay, I lacked faith that reporting bugs would help). The only way I knew they existed was if I had a URL to them (because I linked them as a reference) or if I could retrieve the original Disqus email of the comment.
So there are people out there who have left critical comments on Gwern.net, and are convinced that I deleted the comments to censor them and cover up what an intellectual fraud I am. Less than ideal. (One benefit of outsourcing comments to social media is that if someone is blamed for a bug, it won’t be me.)
dark mode: Disqus was designed for the 2000s, not the 2020s. Starting in the late 2010s, “dark mode” became a fad, driven mostly by smartphone use of web browsers in night-time contexts.
Disqus has some support for dark mode patched in, but it doesn’t integrate seamlessly into a website’s native customized dark mode. Since we put a lot of effort into making Gwern.net’s dark mode great, Disqus was a frustration.
Performance: Disqus was never lightweight. But the sheer weight of all of the (dynamic, uncached) JS & CSS it pulled in, filled with warnings & errors, only seemed to grow over the years.
Even with all of the features added to Gwern.net, I think the Disqus continued to outweigh it. Much of the burden looked to have little to do with commenting, and more to do with ads & tracking. It was frustrating to struggle with performance optimizations, only for any gains to be blown away as soon as the Disqus loaded, or during debugging, see the browser dev console rendered instantly unreadable.
It helped to use tricks like IntersectionObserver to avoid loading Disqus until the reader scrolled to the end of the page, but these brought their own problems. (Getting IntersectionObserver to work at all was tricky, and this trick creates new bugs: for example, I can only use 1 IntersectionObserver at a time without it breaking mysteriously; or, if a user clicks on a URL containing a Disqus ID anchor like
#comment-123456789, when Disqus has not loaded then that ID cannot exist and so the browser will load the page & not jump to the comment. As we have code to check for wrong anchors, this further causes spurious errors to be logged.) The weight of these wasn’t too bad (the Gwern.net side of Disqus was only ~250 lines of JS, 20 lines of CSS, & 10 of HTML), but the added complexity of interactions was.
Poor integration: Disqus increasingly just does not fit into Gwern.net and cannot be made to.
The dark mode & performance problems are examples of this, but it goes further. For example, the Disqus comment box does not respect the Gwern.net CSS and always looked lopsided because it did not line up with the main body. Disqus does not ‘know’ about page moves, so comments would be lost when I moved pages (which deterred me from ever renaming anything). Dealing with spam comments was annoying but had no solution other than locking down comments, defeating the point.
As the design sophistication increases, the lack of control becomes a bigger fraction of the remaining problems.
So eventually, a straw broke the camel’s back and I removed Disqus.
Why don’t all PDF generators use that? Software patents, which makes it hard to install the actual JBIG2 encoder (supposedly all JBIG2 encoding patents had expired by 2017, but no one, like Linux distros, wants to take the risk of unknown patents surfacing), which has to ship separately from ocrmypdf, and worries over edge-cases in JBIG2 where numbers might be visually changed to different numbers to save bits.↩︎
Specifically: some OS/browsers preserve soft hyphens in copy-paste, which might confuse readers, so we use JS to delete soft hyphens; this breaks for users with JS disabled, and on Linux, the X GUI bypasses the JS entirely for middle-click but no other way of copy-pasting. There were some additional costs: the soft-hyphens made the final HTML source code harder to read, made regexp & string searches/replaces more error-prone, and apparently some screen readers are so incompetent that they pronounce every soft-hyphen!↩︎