Subscripts For Citations

A typographic proposal: replace cumbersome inline citation formats like ‘Foo et al. (2010)’ with subscripted dates/sources like ‘Foo…<sub>2010</sub>’. Intuitive, easily implemented, consistent, compact, and usable for evidentials in general.
technology, design, philosophy
2020-01-08–2020-11-29 · finished · certainty: certain · importance: 2


I propose reviving an old General Semantics notation: borrow from scientific notation and use subscripts like ‘Gwern<sub>2020</sub>’ for denoting sources (like citation, timing, or medium). Using subscript indices is flexible, compact, universally technically supported, and intuitive. This convention can go beyond formal academic citation and be extended further to ‘evidentials’ in general, indicating the source & date of statements. While (currently) unusual, subscripting might be a useful trick for clearer writing, compared to omitting such information or using standard cumbersome circumlocutions.

I don’t believe in the Sapir-Whorf hypothesis so beloved of 20th century thinkers & SF, or that we can make ourselves much more rational by One Weird Linguistic Trick. There is no far transfer, and the benefits of improved vocabulary/notation are inherently domain-specific. You think the same thoughts in English as you do in Chinese. But, like good typography, good linguistic conventions may be worth all told, say, even as much as 5% of whatever one values, and that’s not nothing. In ‘rectifying names’, be realistic: aim low. (It’s definitely worthwhile to do things like spellcheck your writings, after all, even though no amount of spellcheck can rescue a bad idea.)

Good Writing Conventions

Checklist approach. I already use a few unusual conventions, like attempting to be more systematic about the strength of my claims or always linking fulltext in citations (and improving on that with link annotations, which do not just link fulltext but present the abstract/excerpts/summary as well), and I employ a few more domain-specific tricks like avoiding use of the word ‘significance’ in statistics contexts, automatically inflation-adjusting currencies (to avoid the inconvenience of doing it by hand & so not doing it at all), or using research-specific checklists. Without attempting to do everything in formal logic or straying into serious eccentricity, what else could be done?

Subscripts For Citations/Dates/Sources/Evidentials

One idea for more precise English writing which I think could be usefully revived is broader use of subscripts.

Distinguishing things named the same. The subscripting idea is derived from General Semantics (GS)<sup>1</sup>, which itself borrows it from standard scientific notation, like physics/statistics/mathematics/chemistry/programming: a subscript is an index distinguishing multiple versions of something, such as quantity, location, or time, eg x<sub>t</sub> vs x<sub>t+1</sub>. They’re typically not seen outside STEM contexts, aside from a few obscure uses like ruby annotations/interlinear glosses.

Citations

However, there are many places we could use subscripting to be clearer & more compact about which version we are referring to, using them as evidentials, and because it’s clearer & more compact, we can afford to use it more places without it wasting space/effort/patience. Citations are a good use case. Why write “Friedenbach (2012)” if we can write “Friedenbach<sub>2012</sub>”? The latter is shorter, easier to read, less ambiguous (especially if we use it in parentheticals, see Friedenbach (2012)), and doesn’t come in a dozen different slightly-varying house styles.

Evidentials

But why restrict subscripting to formal publications or written documents? Apply it to any quote, statement, or opinion where indexing variables like time might be relevant. Refusing to allow easy references to anything not a book is but codex chauvinism.

One convention, arbitrary metadata. It is a unified notation: regardless of whether something was thought, spoken, or written by me in 2020, it gets the same notation, “Gwern<sub>2020</sub>”. The evidential can be expanded as necessary: if it’s a paper or essay, the ‘2020’ can be a hyperlink, or if it’s a ‘personal communication’, then there can be a bibliography entry stating as much, or if it’s the author writing about their own beliefs/actions/statements in 2020, further information is neither necessary nor usually possible (and it avoids awkward custom phraseology like “As I thought back in 2020 or so….”). In contrast, normal citation style cumbersomely uses a different format for each, or provides no guidance: how do you gracefully cite a paper written one year but whose author changed their mind 5 years later based on new results and who told you so 10 years after that? (“Dr. Bach originally maintained A (Bach et al. (2000)) but gradually modified his position until 2005 when he recalls writing in his diary he had lost all confidence in A (personal communication, according to Frieden 2015)…”)

Multiple Authors

‘et al’ = ‘…’. Single or double authorship is straightforward, just ‘Friedenbach<sub>2012</sub>’ or ‘Frieden & Bach<sub>2012</sub>’. But how should multi-author citations, currently denoted by ‘et al’ (or ‘et al.’ or even an italicized ‘et al.’), be handled?

The existing ‘et al’ notation is pretty ridiculous: not only does it take up 6 letters and use natural language where a symbol should be, it’s ambiguous & hard to machine-parse, and it’s not even English<sup>2</sup>! Writing ‘Foo et al<sub>2010</sub>’ or ‘Foo<sub>et al 2010</sub>’ doesn’t look nice, and it makes the subscripting far less compact.

My current suggestion is to do the expected thing: when you elide or omit something in English or technical writing, how do you express that? Why, with an ‘…’, of course. So one would write ‘Foo…<sub>2010</sub>’ or possibly ‘Foo<sub>…2010</sub>’. (I think the former is probably better, since there is less risk of confusion over what is being elided.)
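As a rough illustration of how mechanical the conversion could be, here is a minimal Python sketch that rewrites conventional author-year citations into the subscripted form, emitting HTML `<sub>` tags. The function name `subscript_citations` and the regular expression are my own assumptions for illustration; the pattern only handles simple one-word surnames, an optional second author joined by ‘&’, and an optional ‘et al’, so real-world citations would need something more robust.

```python
import re

# Matches 'Foo (2010)', 'Foo & Bar (2010)', or 'Foo et al. (2010)'.
# Assumption: surnames are single capitalized words; real citations are messier.
CITATION = re.compile(
    r"(?P<authors>[A-Z][\w'-]+(?: & [A-Z][\w'-]+)?)"
    r"(?P<etal> et al\.?)?"
    r" \((?P<year>\d{4})\)"
)


def subscript_citations(text: str) -> str:
    """Rewrite inline author-year citations as subscripted HTML."""

    def repl(match: re.Match) -> str:
        # 'et al' collapses to a full-size ellipsis placed before the subscript.
        ellipsis = "\u2026" if match.group("etal") else ""
        return f"{match.group('authors')}{ellipsis}<sub>{match.group('year')}</sub>"

    return CITATION.sub(repl, text)


if __name__ == "__main__":
    print(subscript_citations("See Friedenbach (2012) and Foo et al. (2010)."))
```

The ellipsis is kept outside the `<sub>` tag, matching the first of the two variants above.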

Unicode Ellipsis

Funky alternatives. Horizontal ellipses aren’t the only kind: there are several others in Unicode, including midline ‘⋯’ and vertical ‘⋮’ and even the “down right diagonal ellipsis” ‘⋱’, so one could do ‘Foo⋯<sub>2010</sub>’ or ‘Foo⋮<sub>2010</sub>’ or ‘Foo⋱<sub>2010</sub>’. (I’m not sure about support for these particular Unicode characters, but they show up without issue in my Firefox, Emacs, and urxvt, so they shouldn’t be too rare.) The vertical ellipsis is nice but unfortunately it’s hard to see the first/top dot because it almost overlaps with the final letter, making it look like a weird colon. The midline ellipsis is middling, and doesn’t really have any virtue. But I particularly like the last one, down-right-diagonal ellipsis, because it works visually so well: it leads the eye down and to the right and is clear about the omission being an entire phrase, so to speak.
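For anyone wanting to type or search for these characters, a small Python sketch using the standard `unicodedata` module lists their code points and official names. The helper name `describe` is hypothetical, and the characters are written as escapes so the snippet does not depend on the editor’s encoding.

```python
import unicodedata

# Horizontal, midline, vertical, and down-right-diagonal ellipses.
ELLIPSES = ["\u2026", "\u22ef", "\u22ee", "\u22f1"]


def describe(char: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


if __name__ == "__main__":
    for char in ELLIPSES:
        print(char, describe(char))
```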

Generalized Evidentials

Evidentials using authors or years are short enough that they can be laid out as simple subscripts. There is no new typographic issue with that. But as discussed above, there is no need to limit it to formal publications; knowledge can be derived from many sources, and even in the most formal academic writing, there are the occasional pseudo-citations like “Foo 2010 (personal communication)”. A complete evidential (like “Foo told me so on the second day of our Black Forest camping trip in 2010”) would be awkward to read if naively subscripted.

In a noninteractive format, such evidentials probably must be relegated to footnotes/endnotes; in an interactive format like HTML, we can do better.

For HTML, CSS supports setting maximum widths & truncating with ellipsis overflows, while expanding width on hover, so one can do something roughly like this:

.evidential { display: inline-block; white-space: nowrap; max-width: 10ch; overflow: hidden; text-overflow: ellipsis; }
.evidential:hover, .evidential:focus { max-width: min-content; position: relative; bottom: -0.5em; font-size: 80%; }
CSS pro­to­type of ex­pand­ing sub­scripts for long ev­i­den­tials.

Then roughly the first 10 characters will be displayed, truncated by ‘…’, and if the reader hovers over it with their mouse, it expands to reveal the arbitrarily-long evidential. The CSS seems tricky to get right, so it might be easier to resort to JavaScript-based popups like my existing link annotations/definitions.
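The markup side of the prototype could be generated along these lines. This is only a sketch under my own assumptions: the function name `evidential_html` is hypothetical, the `title` attribute is added as a plain-tooltip fallback for when the stylesheet is missing, and `tabindex="0"` is there because a bare `<span>` is not keyboard-focusable, so the `:focus` rule above would otherwise never apply to it.

```python
from html import escape


def evidential_html(claim: str, source: str) -> str:
    """Append `source` to `claim` as a span the `.evidential` rules can truncate."""
    # Escape both strings so quotes or ampersands in an evidential cannot break the markup.
    src = escape(source, quote=True)
    return (
        f"{escape(claim)}"
        f'<span class="evidential" tabindex="0" title="{src}">{src}</span>'
    )


if __name__ == "__main__":
    print(evidential_html(
        "Foo",
        "told me so on the second day of our Black Forest camping trip in 2010",
    ))
```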

Technical Support

Subscripts: already in a theater near you. Because it’s already used so much in technical writing, subscripting is reasonably familiar to anyone who took highschool chemistry & can be quickly figured out from context for those who’ve forgotten, and it’s well-supported by fonts and markup languages and word processors: it’s written x~t~ in Pandoc & some Markdown extensions like markdown-it (but not Reddit), x<sub>t</sub> in HTML, x<subscript>t</subscript> in DocBook, x_t in TeX/LaTeX, x\ :sub:`t` in reStructuredText; and it has keybindings C-= in Microsoft Office, C-B in LibreOffice, C-, in Google Docs etc. So subscripting can be used almost everywhere immediately.
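To make the markup comparison concrete, here is a small Python sketch that emits the same subscripted citation in several of the formats listed above. The names `SUBSCRIPT_TEMPLATES` and `subscript` are hypothetical. Two choices are mine rather than the text’s: for LaTeX it uses the text-mode `\textsubscript{}` command instead of math-mode `x_t`, since citations sit in running prose, and for Pandoc it backslash-escapes spaces, which Pandoc requires inside `~…~`. No other escaping of markup-special characters is attempted.

```python
# One template per output format; {base} is the name, {sub} the subscripted index.
SUBSCRIPT_TEMPLATES = {
    "pandoc": "{base}~{sub}~",
    "html": "{base}<sub>{sub}</sub>",
    "docbook": "{base}<subscript>{sub}</subscript>",
    "latex": "{base}\\textsubscript{{{sub}}}",
    "rst": "{base}\\ :sub:`{sub}`",
}


def subscript(base: str, sub: str, fmt: str) -> str:
    """Render `base` with `sub` as a subscript in the markup named by `fmt`."""
    if fmt == "pandoc":
        # Pandoc subscripts may not contain unescaped spaces.
        sub = sub.replace(" ", "\\ ")
    return SUBSCRIPT_TEMPLATES[fmt].format(base=base, sub=sub)


if __name__ == "__main__":
    for fmt in SUBSCRIPT_TEMPLATES:
        print(f"{fmt:8} {subscript('Gwern', '2020', fmt)}")
```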

Example Use

Example: here are 3 versions of a text; one stripped of citations and evidentials, one with them written out in long form, and one with subscripts:

  1. I went to Istanbul for a trip, and saw all the friendly street cats there, just as I’d read about in Abdul Bey; he quotes the local Hakim Abdul saying that the cats even look different from cats elsewhere (but after further thought, I’m not sure I agree with that there). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, she claimed the traffic was terribly oppressive and ruined the trip. (Oh really?)
  2. In 2010, I went to Istanbul for a trip, and saw all the friendly street cats there, just as I’d read about in Abdul Bey’s 2000 Street Cats of Istanbul; he quotes the local Hakim Abdul in 1970 saying that the cats even look different from cats elsewhere (but after further thought as I write this now in 2020, I’m not sure I agree with Bey (2000)). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, on Facebook she claimed the traffic was terribly oppressive and ruined the trip. (Oh really?)
  3. I<sub>2010</sub> went to Istanbul for a trip, and saw all the friendly street cats there, just as I’d read about in Abdul Bey<sub>2000</sub> (Street Cats of Istanbul); he quotes the local Hakim Abdul<sub>1970</sub> saying that the cats even look different from cats elsewhere (but after further thought, I’m not sure I<sub>2020</sub> agree with Bey<sub>2000</sub>). I and my wife had a wonderful trip, although while she clearly enjoyed the trip to the city, she claimed<sub>FB</sub> the traffic was terribly oppressive and ruined the trip. (Oh really?)

In the first version, suppressing the metadata leads to a confusing passage. What did Bey write? We don’t learn when Abdul expressed his opinion, which is important because Istanbul, as a large fast-growing metropolis, may have changed greatly over the 40 years from quote to visit. When did the speaker become skeptical of the claim Istanbul cats both act & look different? What might explain the wife’s inconsistency, and which version should we put more weight on?

The second version answers all these questions, but at the cost of considerable prolixity, jamming in comma phrases to specify date or source. Few people would want to either write or read such a passage, and the prolixity has a distinctly fussy pseudo-academic air. Unsurprisingly, few people will bother with this, any more than they will bother providing inflation-adjusted dollar amounts of something from a decade ago (even though that’s misleading by a good 15% or so, and compounding), or checking a paywalled paper, or redoing calculations in Roman numerals.

The third version may look a little alien because of the subscripts, but it provides all the information of the second version plus a little more (by making explicit the implicit ‘2020’), in considerably less space (as we can delete the circumlocutions in favor of a single consistent subscript), and reads more pleasantly (the metadata is literally out of the way until we decide we need it).

Possible Alternative Notation

I considered 4 alternatives:

  1. Superscripts: already overloaded as footnotes & powers

  2. Bang notation: another possible notation for disambiguating is the “X!Y” notation (apparently derived from UUCP bang paths), which is associated with online fandoms & fanfiction, and gives notation like “2020!gwern”.

    This notation puts the metadata first, which is confusing yodaspeak (what does the ‘2020’ refer to? It dangles until you read on); it makes it inline & full-sized, and then tacks on an additional character just to take up even more space; it’s confusing and unusual to anyone who isn’t familiar with it from online fanfiction already, and to those who are familiar, it is low-status and has bad connotations.

  3. Ruby annotations: as mentioned above, there is HTML support (but with spotty browser support & no support at all in most other formats) for ‘ruby’ annotations, which are similar to superscripts and intended for interlinear glosses.

    Unfortunately, in a horizontal language like English (as opposed to Chinese/Japanese), they require extremely high line heights to be at all legible.

    Figure: example of HTML ‘ruby’ interlinear gloss (eg <ruby>$200<rt>$375 in 2020</rt></ruby>).
  4. New symbols: the lack of font, editor, or word processor support kills any new symbol proposal, so these can be rejected out of hand.

Disadvantages

Deal-breaker: low status? The major downside, of course, is that subscripting is novel and weird. It at least is not associated with anything bad (such as fanfics), and is associated with science & technology, but I’m sure it will deter readers anyway. Does it do enough good to be worth using despite the considerable hit to weirdness points? That I don’t know.


  1. I am considerably less impressed by other GS linguistic suggestions, but subscripting seems like it may be worth rescuing.↩︎

  2. Actually, it’s not even Latin because it’s an abbreviation for the actual Latin phrase, et alii (to save you one character and also avoid the need to correctly decline the Latin; this is fractal, is what I’m saying), but as pseudo-Latin, that means that many will italicize it, as foreign words/phrases usually are. But now that is even more work, even more visual clutter, and it introduces ambiguity with other uses of italics like titles. A terrible notation, and what could be more pretentious?↩︎