Subscripts For Citations

A typographic proposal: replace cumbersome inline citation formats like ‘Foo et al. (2010)’ with subscripted dates/sources like ‘Foo…2010’. Intuitive, easily implemented, consistent, and compact.
technology, design, philosophy
2020-01-08–2020-03-23 · finished · certainty: certain · importance: 2

I pro­pose reviv­ing an old Gen­eral Seman­tics nota­tion: bor­row from sci­en­tific nota­tion and use sub­scripts like ‘Gwern2020’ for denot­ing sources (like cita­tion, tim­ing, or medi­um). Using sub­script indices is flex­i­ble, com­pact, uni­ver­sally tech­ni­cally sup­port­ed, and intu­itive. While (cur­rent­ly) unusu­al, sub­script­ing might be a use­ful trick for clearer writ­ing, com­pared to omit­ting such infor­ma­tion or using stan­dard cum­ber­some cir­cum­lo­cu­tions.

I do not believe in the Sapir-Whorf hypothesis so beloved of 20th-century thinkers & SF, or that we can make ourselves much more rational by One Weird Linguistic Trick. There is no far transfer, and the benefits of improved vocabulary/notation are inherently domain-specific: you think the same thoughts in English as you do in Chinese. But, like good typography, good linguistic conventions may be worth, all told, as much as, say, 5% of whatever one values, and that’s not nothing. In ‘rectifying names’, be realistic: aim low. (It’s definitely worthwhile to do things like spellcheck your writing, after all, even though no amount of spellchecking can rescue a bad idea.)

Good Writing Conventions

Checklist approach. I already use a few unusual conventions, like attempting to be more systematic about the strength of my claims, or always linking fulltext in citations (and improving on that with link annotations, which do not just link the fulltext but present the abstract/excerpts/summary as well), and I employ a few more domain-specific tricks, like avoiding the word ‘significance’ in statistics contexts, inflation-adjusting currencies automatically (to avoid the tedium of doing it by hand & so not doing it at all), or using research-specific checklists. Without straying too far afield, attempting to do everything in formal logic, or lapsing into serious eccentricity, what else could be done?

Subscripts For Citations/Dates/Sources/Evidentials

One idea for more pre­cise Eng­lish writ­ing which I think could be use­fully revived is broader use of sub­scripts.

Distinguishing things named the same. The subscripting idea is derived from General Semantics (GS)1, which itself borrows it from standard scientific notation in physics/statistics/mathematics/chemistry/programming: a subscript is an index distinguishing multiple versions of something, such as quantity, location, or time, eg xₜ vs xₜ₊₁. Subscripts are typically not seen outside STEM contexts, aside from a few obscure uses.


However, there are many places we could use subscripting to be clearer & more compact about which version we are referring to, using subscripts as evidentials; and because the notation is clearer & more compact, we can afford to use it in more places without wasting space/effort/patience. Citations are a good use case. Why write “Friedenbach (2012)” if we can write “Friedenbach2012”? The latter is shorter, easier to read, less ambiguous (especially inside parentheticals, where the standard form produces nested parentheses like “(see Friedenbach (2012))”), and doesn’t come in a dozen different slightly-varying house styles.


But why restrict sub­script­ing to for­mal pub­li­ca­tions or writ­ten doc­u­ments? Apply it to any quote, state­ment, or opin­ion where index­ing vari­ables like time might be rel­e­vant. Refus­ing to allow easy ref­er­ences to any­thing not a book is but codex chau­vin­ism.

One convention, arbitrary metadata. It is a unified notation: regardless of whether something was thought, spoken, or written by me in 2020, it gets the same notation, “Gwern2020”. The evidential can be expanded as necessary: if it’s a paper or essay, the ‘2020’ can be a hyperlink; if it’s a ‘personal communication’, there can be a bibliography entry stating as much; and if it’s the author writing about their own beliefs/actions/statements in 2020, further information is neither necessary nor usually possible (and the subscript avoids awkward custom phraseology like “As I thought back in 2020 or so…”). In contrast, normal citation style cumbersomely uses a different format for each, or provides no guidance at all: how do you gracefully cite a paper written one year, whose author changed their mind 5 years later based on new results, and who told you so 10 years after that? (“Dr. Bach originally maintained A (Bach et al. (2000)) but gradually modified his position until 2005, when he recalls writing in his diary that he had lost all confidence in A (personal communication, according to Frieden 2015)…”)
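To make the ‘one convention, arbitrary metadata’ idea concrete, here is a minimal Python sketch. It assumes the subscripts are published as HTML `<sub>` markup. The `Evidential` class and its `to_html` method are hypothetical names for illustration, not an existing library. A name is rendered at full size, and the index (a year, or a medium like ‘FB’) goes in the subscript. The subscript becomes a hyperlink only when there is something to link to:

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class Evidential:
    """Hypothetical record for one subscripted reference."""
    name: str                  # full-size text, eg "Gwern" or "I"
    index: str                 # subscript text, eg "2020" or "FB"
    url: Optional[str] = None  # optional link target (eg a paper's fulltext)

    def to_html(self) -> str:
        sub = escape(self.index)
        if self.url is not None:
            # Link only the subscript, so the prose itself stays uncluttered.
            sub = f'<a href="{escape(self.url, quote=True)}">{sub}</a>'
        return f"{escape(self.name)}<sub>{sub}</sub>"


if __name__ == "__main__":
    print(Evidential("Gwern", "2020").to_html())
    print(Evidential("Bey", "2000", "https://example.com/cats").to_html())
```

A personal communication would simply omit `url` and be explained by a bibliography entry instead. The sketch leaves that bibliography out.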

Multiple Authors

‘et al’ = ‘…’. Single or double authorship is straightforward: just ‘Friedenbach2012’ or ‘Frieden & Bach2012’. But how should multi-author citations, currently denoted by ‘et al’ or ‘et al.’ or even an italicized ‘et al.’, be handled?

The existing ‘et al’ notation is pretty ridiculous. Not only does it take up 5 or 6 characters and use natural language where a symbol belongs, it’s ambiguous & hard to machine-parse, and it’s not even English2! Writing ‘Foo et al’ with only the date subscripted, or writing ‘Foo’ with ‘et al 2010’ subscripted as a whole, doesn’t look nice, and either way makes the subscripting far less compact.

My current suggestion is to do the expected thing: when you elide or omit something in English or technical writing, how do you express that? Why, with an ‘…’, of course. So one would write ‘Foo…2010’, with a full-size ellipsis and only the date subscripted, or possibly subscript the ellipsis along with the date. (I think the former is probably better, since there is less risk of confusion over what is being elided.)
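The authorship rule above is simple enough to mechanize: one author stands alone, two are joined by ‘&’, and three or more collapse to the first author plus an ellipsis, with only the date subscripted. Here is a sketch in Python, again assuming HTML output. `cite` is a hypothetical helper:

```python
from html import escape

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS, '…'


def cite(authors: list[str], year: str) -> str:
    """Render a citation with a subscripted date (HTML-escaped output).

    1 author:   Friedenbach<sub>2012</sub>
    2 authors:  Frieden &amp; Bach<sub>2012</sub>
    3+ authors: Foo…<sub>2010</sub>  (full-size ellipsis, date-only subscript)
    """
    if not authors:
        raise ValueError("need at least one author")
    if len(authors) == 1:
        head = authors[0]
    elif len(authors) == 2:
        head = f"{authors[0]} & {authors[1]}"
    else:
        head = authors[0] + ELLIPSIS  # elide every author after the first
    return f"{escape(head)}<sub>{escape(year)}</sub>"


if __name__ == "__main__":
    print(cite(["Foo", "Bar", "Baz"], "2010"))
```

The `&` comes out as `&amp;` only because the output is HTML. A browser displays it as ‘Frieden & Bach’.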

Unicode Ellipsis

Funky alternatives. The horizontal ellipsis isn’t the only kind. There are several others in Unicode, including the midline ‘⋯’, the vertical ‘⋮’, and even the “down right diagonal ellipsis” ‘⋱’, so one could write ‘Foo⋯2010’ or ‘Foo⋮2010’ or ‘Foo⋱2010’. (I’m not sure about support for these particular Unicode characters, but they show up without issue in my Firefox, Emacs, and urxvt, so they shouldn’t be too rare.) The vertical ellipsis is nice, but unfortunately its first/top dot is hard to see because it almost overlaps with the final letter, making it look like a weird colon. The midline ellipsis is middling and doesn’t really have any virtue. But I particularly like the last one, the down-right-diagonal ellipsis, because it works so well visually: it leads the eye down and to the right, and makes clear that the omission is an entire phrase, so to speak.
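Most keyboards lack these characters. The small Python sketch below lists each candidate's code point, its official Unicode name (looked up with the standard `unicodedata` module), and an HTML numeric character reference that can be pasted into markup. `CANDIDATES` and `describe` are illustrative names:

```python
import unicodedata

# The four candidate ellipsis characters discussed above.
CANDIDATES = ["\u2026", "\u22ef", "\u22ee", "\u22f1"]


def describe(ch: str) -> str:
    """Code point, official Unicode name, and HTML numeric character reference."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} &#x{ord(ch):X};"


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch), f"Foo{ch}2010")
```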

Technical Support

Subscripts: already in a theater near you. Because subscripting is already used so much in technical writing, it is reasonably familiar to anyone who took high-school chemistry & can be quickly figured out from context by those who’ve forgotten. It is also well-supported by fonts, markup languages, and word processors. It’s written x~t~ in Pandoc Markdown & some Markdown extensions like markdown-it’s (but not Reddit’s), x<sub>t</sub> in HTML, x<subscript>t</subscript> in DocBook, x_t in TeX/LaTeX math, and x\ :sub:`t` in reStructuredText. It also has keybindings, like C-= in Microsoft Office and C-, in Google Docs. So subscripting can be used almost everywhere immediately.
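Here is one citation emitted in each of the markup languages named above (HTML, Pandoc Markdown, DocBook, LaTeX, and reStructuredText). This is a Python sketch; `subscript` is a hypothetical helper. It assumes the index is short plain text such as a year. It handles only HTML/XML escaping and Pandoc's rule that spaces inside `~…~` must be backslash-escaped. LaTeX special characters are *not* escaped:

```python
from html import escape


def subscript(base: str, index: str, fmt: str) -> str:
    """Render `base` followed by a subscripted `index` in the given markup."""
    if fmt == "html":
        return f"{escape(base)}<sub>{escape(index)}</sub>"
    if fmt == "docbook":
        return f"{escape(base)}<subscript>{escape(index)}</subscript>"
    if fmt == "pandoc":
        # Pandoc's `subscript` extension needs spaces inside ~...~ escaped.
        return base + "~" + index.replace(" ", "\\ ") + "~"
    if fmt == "latex":
        # Text-mode subscript; inside math one would write x_{t} instead.
        return base + "\\textsubscript{" + index + "}"
    if fmt == "rst":
        # The escaped space lets the :sub: role attach directly to the word.
        return base + "\\ :sub:`" + index + "`"
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for fmt in ("html", "pandoc", "docbook", "latex", "rst"):
        print(f"{fmt:8} {subscript('Gwern', '2020', fmt)}")
```

The LaTeX branch uses `\textsubscript` rather than the math-mode `x_t`, because a citation sits in running text rather than in an equation.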

Example Use

Exam­ple: here are 3 ver­sions of a text; one stripped of cita­tions and evi­den­tials, one with them writ­ten out in long form, and one with sub­scripts:

  1. I went to Istan­bul for a trip, and saw all the friendly street cats there, just as I’d read about in Abdul Bey; he quotes the local Hakim Abdul say­ing that the cats even look dif­fer­ent from cats else­where (but after fur­ther thought, I’m not sure I agree with that there). I and my wife had a won­der­ful trip, although while she clearly enjoyed the trip to the city, she claimed the traf­fic was ter­ri­bly oppres­sive and ruined the trip. (Oh real­ly?)
  2. In 2010, I went to Istan­bul for a trip, and saw all the friendly street cats there, just as I’d read about in Abdul Bey’s 2000 Street Cats of Istan­bul; he quotes the local Hakim Abdul in 1970 say­ing that the cats even look dif­fer­ent from cats else­where (but after fur­ther thought as I write this now in 2020, I’m not sure I agree with Bey (2000)). I and my wife had a won­der­ful trip, although while she clearly enjoyed the trip to the city, on Face­book she claimed the traf­fic was ter­ri­bly oppres­sive and ruined the trip. (Oh real­ly?)
  3. I2010 went to Istan­bul for a trip, and saw all the friendly street cats there, just as I’d read about in Abdul Bey2000 (Street Cats of Istan­bul); he quotes the local Hakim Abdul1970 say­ing that the cats even look dif­fer­ent from cats else­where (but after fur­ther thought, I’m not sure I2020 agree with Bey2000). I and my wife had a won­der­ful trip, although while she clearly enjoyed the trip to the city, she claimedFB the traf­fic was ter­ri­bly oppres­sive and ruined the trip. (Oh real­ly?)

In the first ver­sion, sup­press­ing the meta­data leads to a con­fus­ing pas­sage. What did Bey write? We don’t learn when Abdul expressed his opin­ion—which is impor­tant because Istan­bul, as a large fast-­grow­ing metrop­o­lis, may have changed greatly over the 40 years from quote to vis­it. When did the speaker become skep­ti­cal of the claim Istan­bul cats both act & look dif­fer­ent? What might explain the wife’s incon­sis­ten­cy, and which ver­sion should we put more weight on?

The second version answers all these questions, but at the cost of considerable prolixity, jamming in comma phrases to specify date or source. Few people would want to either write or read such a passage, and the result has a distinctly fussy pseudo-academic air. Unsurprisingly, few people will bother with this, any more than they will bother providing inflation-adjusted dollar amounts for something from a decade ago (even though omitting the adjustment is misleading by a good 15% or so, and compounding), checking a paywalled paper, or redoing calculations in Roman numerals.

The third ver­sion may look a lit­tle alien because of the sub­scripts, but it pro­vides all the infor­ma­tion of the sec­ond ver­sion plus a lit­tle more (by mak­ing explicit the implicit ‘2020’), in con­sid­er­ably less space (as we can delete the cir­cum­lo­cu­tions in favor of a sin­gle con­sis­tent sub­scrip­t), and reads more pleas­antly (the meta­data is lit­er­ally out of the way until we decide we need it).
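Because the subscript is a separate, uniform piece of markup, software can exploit this ‘out of the way until we need it’ property too. The Python sketch below assumes the third version is stored as HTML with plain `<sub>` tags and no nested markup inside them. `read_plain` and `metadata` are hypothetical helpers. The first strips every subscript to recover the uncluttered reading. The second pairs each subscript with the word it is attached to:

```python
import re

SUB = re.compile(r"<sub>(.*?)</sub>")
# A run of non-space characters immediately followed by a subscript.
ATTACHED = re.compile(r"(\S+?)<sub>(.*?)</sub>")


def read_plain(html: str) -> str:
    """Drop all subscripts: the text a reader skimming past the metadata sees."""
    return SUB.sub("", html)


def metadata(html: str) -> list[tuple[str, str]]:
    """Return (word, subscript) pairs in document order."""
    return ATTACHED.findall(html)


if __name__ == "__main__":
    text = ("I<sub>2010</sub> went to Istanbul, just as I'd read about in "
            "Abdul Bey<sub>2000</sub>; she claimed<sub>FB</sub> the traffic was terrible.")
    print(read_plain(text))
    print(metadata(text))
```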

Possible Alternative Notation

I considered 4 alternatives:

  1. Superscripts: already overloaded as footnotes & powers.

  2. Bang notation: another possible notation for disambiguating is the “X!Y” notation, which is associated with online fandoms & fanfiction, and gives notation like “2020!gwern”.

    This nota­tion puts the meta­data first, which is con­fus­ing yodas­peak (what does the ‘2020’ refer to? It dan­gles until you read on); it makes it inline & ful­l-­sized, and then tacks on an addi­tional char­ac­ter just to take up even more space; it’s con­fus­ing and unusual to any­one who isn’t famil­iar with it from online fan­fic­tion already, and to those who are famil­iar, it is low-s­ta­tus and has bad con­no­ta­tions.

  3. Ruby annotations: HTML has markup for ‘ruby’ annotations (with spotty browser support & no support at all in most other formats), which are similar to superscripts and intended for interlinear glosses.

    Unfortunately, in a horizontal language like English (as opposed to Chinese/Japanese), they are difficult to make legible at all. Example:

    Exam­ple of HTML ‘ruby’ inter­lin­ear gloss (eg <ruby>$200<rt>$375 in 2020</ruby>).
  4. New symbols: the lack of font, editor, or word-processor support kills any proposal for new symbols, so they can be rejected out of hand.


Deal-break­er: low sta­tus? The major down­side, of course, is that sub­script­ing is novel and weird. It at least is not asso­ci­ated with any­thing bad (such as fanfic­s), and is asso­ci­ated with sci­ence & tech­nol­o­gy, but I’m sure it will deter read­ers any­way. Does it do enough good to be worth using despite the con­sid­er­able hit to weird­ness points? That I don’t know.

  1. I am considerably less impressed by other GS linguistic suggestions, but subscripting seems like it may be worth rescuing.↩︎

  2. Actually, it’s not even Latin, because it’s an abbreviation for the actual Latin phrase et alii (or et aliae, or et alia, depending on gender). The abbreviation saves you one character and also spares you from correctly declining the Latin (this is fractal, is what I’m saying). But as pseudo-Latin, it gets italicized by many writers, as foreign words/phrases usually are. That is even more work and even more visual clutter, and it introduces ambiguity with other uses of italics like titles. A terrible notation.↩︎