How Complex Are Individual Differences?

Individual human brains are more predictable and similar than they are different, reflecting low Kolmogorov complexity and implying that beta uploading may be more feasible than guessed, with suggestions on optimizing archived information.
psychology, philosophy, sociology, statistics, transhumanism, NN, insight-porn
2010-06-23–2019-06-14; in progress; certainty: likely; importance: 4

Every hu­man is differ­ent in a myr­iad of ways, from their mem­o­ries to their per­son­al­ity to their skills or knowl­edge to in­tel­li­gence and cog­ni­tive abil­i­ties to moral and po­lit­i­cal val­ues. But hu­mans are also often re­mark­ably sim­i­lar, oc­cu­py­ing a small area of mind-space—even a chim­panzee is alien in a way that their rec­og­niz­able sim­i­lar­i­ties to us only em­pha­size, never mind some­thing like an oc­to­pus, much less aliens or AIs.

So this raises a ques­tion: are in­di­vid­u­als, with their differ­ences, more or less in­for­ma­tion-the­o­ret­i­cally com­plex than some generic av­er­age hu­man brain is in to­tal? That is, if you some­how man­aged to en­code an av­er­age hu­man brain into a com­puter up­load (I’ll as­sume pat­ternism here), into a cer­tain num­ber of bits, would you then re­quire as many or more bits to con­vert that av­er­age brain into a spe­cific per­son, or would you re­quire many few­er?

This bears on some is­sues such as:

  1. in a com­puter up­load sce­nar­io, would it be nec­es­sary to scan every in­di­vid­ual hu­man brain in minute de­tail in or­der to up­load them, or would it be pos­si­ble to map only one brain in depth and then more coarse meth­ods would suffice for all sub­se­quent up­loads? To mold or evolve an up­load to­wards one spe­cialty or an­oth­er, would it be fea­si­ble to start with the generic brain and train it, or are brains so com­plex and idio­syn­cratic that one would have to start with a spe­cific brain al­ready close to the de­sired goal?
  2. is cry­on­ics (ex­pen­sive and highly un­likely to work) the only way to re­cover from death, or would it be pos­si­ble to aug­ment poor vit­ri­fi­ca­tion with sup­ple­men­tal in­for­ma­tion like di­aries to en­able full re­viv­i­fi­ca­tion? Or would it be pos­si­ble to be recre­ated en­tirely from sur­viv­ing data and records, so-called “beta up­load­ing” or “beta sim­u­la­tions”, in some more mean­ing­ful method than “sim­u­late all pos­si­ble hu­man brains”?

When I introspect, I do not feel especially complex or unique or more than the product of the inputs over my life. I feel I am the product of a large number of inbuilt & learned mechanisms, heuristics, and memories, operating mechanistically, repeatably, and unconsciously. Once in a great while, while reading old blog posts or reviewing old emails, I compose a long reply, only to discover that I had written one already, which is similar or even exactly the same almost down to the word. Chilled, I feel like an automaton, just another system as limited and predictable to a greater intelligence as a Sphex wasp or my cat are to me: not even an especially unique one, but a mediocre result of my particular assortment of genes and mutation load and congenital defects and infections and development noise and shared environment and media consumption.

One way to approach the question is to ask how complex the brain could be.

Descriptive Complexity

Work­ing from the bot­tom up, we could ask how much in­for­ma­tion it takes to en­code in­di­vid­ual brains. The Whole Brain Em­u­la­tion Roadmap re­ports a num­ber of es­ti­mates of how much stor­age an up­load might re­quire, which re­flects the com­plex­ity of a brain at var­i­ous lev­els of de­tail, in “Ta­ble 8: Stor­age de­mands (em­u­la­tion on­ly, hu­man brain)” (pg79):

Table 8: Storage demands (emulation only, human brain) [Sandberg & Bostrom 2008]
Level | Model | # entities | Bytes per entity | Memory demands (Tb) | Earliest year, $1 million
1 | Computational module | 100–1,000? | ? | ? | ?
2 | Brain region connectivity | 10^5 regions, 10^7 connections | 3? (2-byte connectivity, 1 byte weight) | 3 ∙ 10^-5 | Present
3 | Analog network population model | 10^8 populations, 10^13 connections | 5 (3-byte connectivity, 1 byte weight, 1 byte extra state variable) | 50 | Present
4 | Spiking neural network | 10^11 neurons, 10^15 connections | 8 (4-byte connectivity, 4 state variables) | 8,000 | 2019
5 | Electrophysiology | 10^15 compartments x 10 state variables = 10^16 | 1 byte per state variable | 10,000 | 2019
6 | Metabolome | 10^16 compartments x 10^2 metabolites = 10^18 | 1 byte per state variable | 10^6 | 2029
7 | Proteome | 10^16 compartments x 10^3 proteins and metabolites = 10^19 | 1 byte per state variable | 10^7 | 2034
8 | States of protein complexes | 10^16 compartments x 10^3 proteins x 10 states = 10^20 | 1 byte per state variable | 10^8 | 2038
9 | Distribution of complexes | 10^16 compartments x 10^3 proteins and metabolites x 100 states/locations | 1 byte per state variable | 10^9 | 2043
9 | Full 3D EM map (Fiala, 2002) | 50x2.5x2.5 nm | 1 byte per voxel, compressed | 10^9 | 2043
10 | Stochastic behaviour of single molecules | 10^25 molecules | 31 (2 bytes molecule type, 14 bytes position, 14 bytes velocity, 1 byte state) | 3.1∙10^14 | 2069
11 | Quantum | Either ≈ 10^26 atoms, or smaller number of quantum-state carrying molecules | Qbits | ? | ?

The most likely scale is the spiking neural network but not lower levels (like individual neural compartments or molecules), which they quote at 10^11 neurons with 10^15 connections, at 4 bytes for the connections and 4 bytes for neural state, giving 8,000 terabytes, which is quite large. (A byte for weights may be overgenerous; Bartol et al 2015 estimates synaptic weights at >4.7 bits, not 8 bits.) If 8,000TB is anywhere close to the true complexity of oneself, beta simulations are highly unlikely to result in any meaningful continuity of self, as even if one did nothing but write in diaries every waking moment, the raw text would never come anywhere near 8,000TB (typing speeds tend to top out at ~100WPM, or 2.8 billion words or ~22.4GB or ~0.02TB in a lifetime).
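
The diary arithmetic can be sketched as follows. The waking hours per day, lifespan, and bytes per word are assumptions chosen here for illustration (they are not stated in the text), picked because they reproduce the ~2.8 billion word figure.

```python
# Rough sketch: lifetime diary output versus the 8,000 TB spiking-network estimate.
# Assumptions (illustrative only): 16 waking hours/day, 80 years, 8 bytes per word.
WORDS_PER_MINUTE = 100
WAKING_HOURS_PER_DAY = 16
YEARS = 80
BYTES_PER_WORD = 8
BRAIN_TB = 8_000  # Level 4 estimate from Table 8

words = WORDS_PER_MINUTE * 60 * WAKING_HOURS_PER_DAY * 365 * YEARS
diary_bytes = words * BYTES_PER_WORD
diary_tb = diary_bytes / 1e12

print(f"{words / 1e9:.1f} billion words")
print(f"{diary_bytes / 1e9:.1f} GB = {diary_tb:.4f} TB")
print(f"fraction of the brain estimate: {diary_tb / BRAIN_TB:.1e}")
```

Under these assumptions, nonstop typing covers only a few millionths of the spiking-network estimate.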

Bandwidth & Storage Bounds

Another approach is functional: how much information can the human brain actually store when tested? Estimates of human brain capacity made this way tend to be far lower.

One often cited quan­tifi­ca­tion is Lan­dauer 1986. Lan­dauer tested in­for­ma­tion re­ten­tion/re­call us­ing text read­ing, pic­ture recog­ni­tion mem­o­ry, and au­to­bi­o­graph­i­cal re­call, find­ing com­pa­ra­ble stor­age es­ti­mates for all modal­i­ties:

Table 1: Estimates of Information Held in Human Memory (Landauer 1986)
Source of parameters | Method of estimate | Input rate (b/s) | Loss rate (b/b/s) | Total (bits)
Concentrated reading | 70-year linear accumulation | 1.2 | |
Picture recognition | 70-year linear accumulation | 2.3 | |
Central values | asymptotic | 2.0 | 10^-9 |
 | net gain over 70 years | | |
Word knowledge | semantic nets x 15 domains | | |

In the same vein, Mol­lica & Pi­anta­dosi 2019 es­ti­mate nat­ural lan­guage as a whole at 1.5MB, bro­ken down by bits as fol­lows (Ta­ble 1):

Table 1: Summary of estimated bounds [in bits] across levels of linguistic analysis. (Mollica & Piantadosi 2019)
Section | Domain | Lower bound | Best guess | Upper bound
2.1 | phonemes | 375 | 750 | 1,500
2.2 | phonemic wordforms | 200,000 | 400,000 | 640,000
2.3 | lexical semantics | 553,809 | 12,000,000 | 40,000,000

Based on the forgetting curve and man-centuries of flashcard review data, Wozniak estimates that permanent long-term recall of declarative memory is limited to 200–300 flashcard items per year per daily minute of review. A lifetime of ~80 years with a reasonable amount of time spent on review, say 10 minutes a day, would therefore top out at a maximum of ~240,000 items memorized; as each flashcard should encode a minimum fact such as a single word’s definition, which is certainly less than a kilobyte of entropy, long-term memory is bounded at around 240MB. (For comparison, see profiles of people with hyperthymesia, whose most striking attribute is normal memories combined with extremely large amounts of time spent diarizing or recalling or quizzing themselves on the past; people who keep diaries often note when rereading how much they forget, which in a way emphasizes how little autobiographical memory apparently matters for continuity of identity.)
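
The flashcard bound is simple multiplication; a minimal sketch, taking the 1 kilobyte per item as the deliberately generous ceiling described above:

```python
# Sketch of the flashcard-based bound on declarative long-term memory.
ITEMS_PER_YEAR_PER_DAILY_MINUTE = (200, 300)  # Wozniak's range, as quoted above
DAILY_MINUTES = 10
YEARS = 80
BYTES_PER_ITEM = 1_000  # generous ceiling: "less than a kilobyte of entropy"

low_items, high_items = (
    rate * DAILY_MINUTES * YEARS for rate in ITEMS_PER_YEAR_PER_DAILY_MINUTE
)
high_mb = high_items * BYTES_PER_ITEM / 1e6

print(f"{low_items:,} to {high_items:,} items")
print(f"upper bound: {high_mb:.0f} MB")
```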

Implicit memory, such as of images, can also be used to store information. For example, Drucker 2010 employed visual recognition memory to carry out a calculation; he cites as precedent Standing 1973:

In one of the most wide­ly-cited stud­ies on recog­ni­tion mem­o­ry, Stand­ing showed par­tic­i­pants an epic 10,000 pho­tographs over the course of 5 days, with 5 sec­onds’ ex­po­sure per im­age. He then tested their fa­mil­iar­i­ty, es­sen­tially as de­scribed above. The par­tic­i­pants showed an 83% suc­cess rate, sug­gest­ing that they had be­come fa­mil­iar with about 6,600 im­ages dur­ing their or­deal. Other vol­un­teers, trained on a smaller col­lec­tion of 1,000 im­ages se­lected for vivid­ness, had a 94% suc­cess rate.

At an 80% accuracy rate, we can even calculate how many bits of information can be stored this way; a calculation gives 5.8 kilobits/725 bytes as the upper limit[1], or 0.23 bits/s, but the decrease of the recognition success rate with larger image sets suggests that total recognition memory could not be used to store more than a few kilobytes.

Aside from low recall of generic autobiographical memory, childhood amnesia sets in around age 3–4, resulting in almost total loss of all episodic memory prior to then, and the loss of perhaps 5% of all lifetime memories; the developmental theory is that due to limited memory capacity, the initial memories simply get overwritten by later childhood learning & experiences. Childhood amnesia is sometimes underestimated: many ‘memories’ are pseudo-memories of stories as retold by relatives.

Working memory is considered a bottleneck that information must pass through in order to be stored in short-term memory and then potentially into long-term memory, but it is also small and slow. Digit span is a simple test of working memory, in which one tries to store & recall short random sequences of the integers 0–10; a normal adult forward digit span (without resorting to mnemonics or other strategies) might be around 7, and requires several seconds to store and recall; thus, digit span suggests a maximum storage rate on the order of tens of bits per second, or 8.3GB over a lifetime.

A good reader will read at 200–300 words per minute; Claude Shannon estimated a single character is <1 bit; English words would weigh in at perhaps 8 bits per word (as vocabularies tend to top out at around 100,000 words, each word would be at most ~17 bits), or 40 bits per second at 300 words per minute. At 1 hour a day over an ~80-year lifetime, reading would thus convey a maximum of 40 bits/s × 3,600 s × 365 days × 80 years ≈ 4.2 billion bits, or roughly 525MB. Landauer’s subjects read 180 words per minute and he estimated 0.4 bits per word for a rate of 1.2 bits per second.
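
The per-second reading rates can be checked with a few lines. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
import math

# Ceiling on bits per word if every word in a 100,000-word vocabulary were equally likely.
VOCABULARY = 100_000
max_bits_per_word = math.log2(VOCABULARY)

# Fast reader: 300 words/minute at the essay's ~8 bits per word.
fast_reader_bits_per_second = 300 / 60 * 8

# Landauer's subjects: 180 words/minute at 0.4 bits retained per word.
landauer_bits_per_second = 180 / 60 * 0.4

print(f"max bits per word: {max_bits_per_word:.2f}")
print(f"fast reader: {fast_reader_bits_per_second:.0f} bits/s")
print(f"Landauer: {landauer_bits_per_second:.1f} bits/s")
```

The gap between the two rates reflects the difference between what is read and what Landauer estimated is actually retained.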

Deep learn­ing re­searcher Ge­offrey Hin­ton has re­peat­edly noted (in an ob­ser­va­tion echoed by ad­di­tional re­searchers dis­cussing the re­la­tion­ship be­tween su­per­vised learn­ing vs re­in­force­ment learn­ing vs un­su­per­vised learn­ing like Yann Le­Cun) that the num­ber of synapses & neu­rons in the brain ver­sus the length of our life­time im­plies that much of our brains must be spent learn­ing generic un­su­per­vised rep­re­sen­ta­tions of the world (such as via pre­dic­tive mod­el­ing):

The brain has about 10^14 synapses and we only live for about 10^9 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 10^5 dimensions of constraint per second.

Since we all experience similar perceptual worlds with shared optical illusions etc, and there is little feedback from our choices, this would appear to challenge the idea of large individual differences. (A particularly acute problem given that in reinforcement learning, the supervision is limited to the rewards, which provide a fraction of a bit spread over thousands or millions of actions taken in very high-dimensional perceptual states, each of which may be hundreds of thousands or millions of parameters themselves.)

Eric Drexler (2019) notes that current deep learning models in computer vision have now achieved near-human or superhuman performance across a wide range of tasks requiring model sizes typically around 500MB; the visual cortex and brain regions related to visual processing make up a large fraction of the brain (perhaps as much as 20%), suggesting that either deep models are extremely efficient parameter-wise compared to the brain[2], the brain’s visual processing is extremely powerful in a way not captured by any benchmarks such that current benchmarks require <<1% of human visual capability, or that a relatively small number of current GPUs (eg 1000–15000) are equivalent to the brain.[3] Another way to state the point would be to consider AlphaGo Zero, whose final trained model (without any kind of compression/distillation/sparsification) is probably ~300MB (roughly estimating from convolution layer sizes) and achieves strongly superhuman performance in playing Go better than any living human (and thus better than any human ever) despite Go professionals studying Go from early childhood full-time in specialized Go schools as insei and playing or studying detailed analyses of tens of thousands of games drawing on 2 millennia of Go study; if a human brain truly requires 1 petabyte and is equal to Zero in efficiency of encoding Go knowledge, that implies the Go knowledge of a world champion occupies <0.00003% of their brain’s capacity.
This may be true but surely it would be surprising, especially given the observable gross changes in brain region volume for similar professionals such as the changes in hippocampal volume & distribution in London taxi drivers learning “the Knowledge” (Maguire et al 2000), and with the reliance of chess expertise on long-term memory (The Cambridge Handbook of Expertise and Expert Performance) & Go expertise on reusing the visual cortex. Consider also neuron counts across species: other primates or cetaceans can have neuron counts fairly similar to humans’, implying that much of the brain is dedicated to basic mammalian tasks like vision or motor coordination, and hence that all the things we see as critically important to our sense of selves, such as our religious or political views, are supported by a small fraction of the brain indeed.
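
The AlphaGo Zero comparison is a single division; a minimal sketch using the essay's rough figures (a ~300MB model against a 1 petabyte brain):

```python
# Sketch: share of a 1 PB brain that a ~300 MB Go model would occupy.
ALPHAGO_ZERO_BYTES = 300e6  # the essay's rough estimate of the trained model size
BRAIN_BYTES = 1e15          # 1 petabyte

fraction = ALPHAGO_ZERO_BYTES / BRAIN_BYTES
percent = fraction * 100

print(f"fraction: {fraction:.1e}")
print(f"percent:  {percent:.5f}%")
```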

Overparameterization and biological robustness

The human brain appears to be highly redundant (albeit with functionality highly localized to specific points as demonstrated by lesions and ‘grandmother cells’) and, particularly if the damage is slow or early in life, capable of coping with loss of large amounts of tissue. The brains of the elderly are noticeably smaller than those of young people as neural loss continues throughout the aging process; but this results in loss of identity & continuity only after decades of decay and at the extremes like senile dementia & Alzheimer’s disease; likewise, personal identity can survive concussion or other trauma.[4] Epileptic patients who undergo the effective procedure of hemispherectomy, surgically disconnecting or removing entirely 1 cerebral hemisphere, typically recover well as their remaining brain adapts to the new demands, although sometimes suffering side-effects; this is often cited as an example of neuroplasticity. Reported cases of people with severely reduced brain tissue show that apparently profoundly reduced brain volumes, as much as down to 10% of normal, are still capable of relatively normal functioning[5]; similarly, human brain volumes vary considerably and the correlation between brain volume and intelligence is relatively weak, both within humans and between species. Many animals & insects are capable of surprisingly complex specialized behavior in the right circumstances[6] despite sometimes having few neurons.

An interesting parallel is with artificial neural networks/deep learning; it is widely known that the NNs typically trained, with millions or billions of parameters and dozens or hundreds of layers, are grossly overparameterized & deep & energy/FLOPS-demanding, because there are many ways that NNs with equivalent intelligence can be trained which need only a few layers or many fewer parameters or an order of magnitude fewer FLOPS. (Some examples of NNs being compressed in size or FLOPs by anywhere from 50% to ~17,000%: Gastaldi 2017, Rawat & Wang 2017, Rosenfeld & Tsotsos 2018, Cheng et al 2018, Learn2Compress, Kusupati et al 2018, Tool 2018, Google 2019, Liu 2020, Amadori 2019, He et al 202, among many others.) This implies that while a NN may be extremely expensive to train initially to a given level of performance and be large and slow, order of magnitude gains in efficiency are then possible (in addition to the gains from various forms of hyperparameter optimization); there is no reason to expect this to not apply to a human-level NN, so while the first human-level NN may require supercomputers to train, it will quickly be possible to run it on vastly more modest hardware (for 2017-style NNs, at least, there is without a doubt a serious “hardware overhang”).

One suggestion for why those small shallow fast NNs cannot be trained directly but must be distilled/compressed down from larger NNs is that the larger NNs are inherently easier to train, because the overparameterization provides a smoother loss landscape and more ways to travel between local optima and find a near-global optimum; this would explain why NNs need to be overparameterized, to train in a feasible amount of time, and perhaps why human brains are so overparameterized as well. So artificial NNs are not as inherently complex as they seem, but have much simpler forms; perhaps human brains are not as complex as they look but can be compressed down to much simpler faster forms.

Predictive Complexity

A third tack is to treat the brain as a black box and take a Tur­ing-test-like view: if a sys­tem’s out­puts can be mim­ic­ked or pre­dicted rea­son­ably well by a smaller sys­tem, then that is the real com­plex­i­ty. So, how pre­dictable are hu­man choices and traits?

One ma­jor source of pre­dic­tive power is ge­net­ics.

A whole human genome has ~3 billion base-pairs, which can be stored in 1–3GB[7]. Individual human differences in genetics turn out to be fairly modest in magnitude: there are perhaps 5 million small mutations and 2,500 larger structural variants compared to a reference genome (Auton et al 2015, Seo et al 2016), many of which are correlated or familial, so given a reference genome such as a relative’s, the 1GB shrinks to a much smaller delta, ~125MB with one approach, but possibly down to the MB range with more sophisticated encoding techniques such as using genome graphs rather than reference sequences. Just SNP genotyping (generally covering common SNPs present in >=1% of the population) will typically cover ~500,000 SNPs with 3 possible values, requiring less than 1MB.
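
The SNP storage claim can be checked directly. The sketch below treats each SNP as an independent, uniformly distributed 3-valued symbol, which is an assumption that overstates the true entropy, and also shows a naive fixed-width packing at 2 bits per SNP.

```python
import math

N_SNPS = 500_000
VALUES_PER_SNP = 3  # e.g. 0, 1, or 2 copies of the minor allele

# Information-theoretic ceiling under the uniform, independent assumption.
ideal_bits = N_SNPS * math.log2(VALUES_PER_SNP)
ideal_kb = ideal_bits / 8 / 1000

# Naive packing: 2 bits per SNP, wasting one of the four codes.
naive_kb = N_SNPS * 2 / 8 / 1000

print(f"ideal: {ideal_kb:.0f} KB, naive 2-bit packing: {naive_kb:.0f} KB")
```

Either way, the result is on the order of 100KB, comfortably under the 1MB quoted.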

A large fraction of individual differences can be ascribed to those small genetic differences. (For more background on modern behavioral genetics findings, see my genetics link bibliography.) Across thousands of measured traits in twin studies, from blood panels of biomarkers to diseases to anthropometric traits like height to psychological traits like personality or intelligence to social outcomes like socioeconomic status, the average measured heritability predicts ~50% of variance, with shared family environment accounting for 17%; this is a lower bound as it omits corrections for issues like assortative mating or measurement error or genetic mosaicism (particularly common in neurons). SNPs alone account for somewhere around 15% (same caveats). Humans are often stated to have relatively little genetic variation and to be homogeneous compared to other wild species, apparently due to loss of genetic diversity during human expansion out of Africa; presumably if much of that diversity remained, heritabilities might be even larger. Further, there is genetic overlap between traits, which influence each other, making them more predictable; and the genetic influence is often partially responsible for longitudinal consistency in individuals’ traits over their lifetimes. A genome might seem to be redundant with a data source like one’s electronic health records, but it’s worth remembering that many things are not recorded in health or other records, and genomes are informative about the things which didn’t happen just as much as things which did happen: you can only die once, but a genome has information on your vulnerability to all the diseases and conditions you didn’t happen to get.

In psychology and sociology research, some variables are so pervasively & powerfully predictive that they are routinely measured & included in analyses of everything, such as the standard demographic variables of gender, ethnicity, and socioeconomic status, among others. Nor is it uncommon to discover that a large mass of questions can often be boiled down to a few hypothesized latent factors (eg philosophy). Whether it is the extensive predictive power of stereotypes (Jussim et al 2015) or the ability of people to make above-chance guesses based on faces or photographs of bedrooms, or any number of odd psychology results, there appears to be extensive entanglement among personal variables; personality can famously be inferred from Facebook ‘likes’ with high accuracy approaching or surpassing informant ratings, or from book/movie preferences, and from an information-theoretic text-compression perspective, much of the information in one’s Twitter tweets can be extracted from the tweets of the accounts one interacts with. A single number such as IQ (a variable with an entropy of only a few bits), for example, can predict much of education outcomes, and predict more in conjunction with the Big Five’s Conscientiousness & Openness. A full-scale IQ test will also provide measurements of relevant variables such as visuospatial ability or vocabulary size, and in total might be ~10 bits. So a large number of such variables could be recorded in a single kilobyte while predicting many things about a person. Facebook statuses, for example, can be mined to extract all of these: one study demonstrates that a small FB sample can predict 13% of (measured) IQ, 9–17% of (measured) Big Five variance, and similar performances on a number of other personality traits; another extracts a large fraction of Big Five variance from smartphone activity logs, while a third uses shopping/purchases to extract a lesser fraction.
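
One way to put a rough number on how little such a score weighs is the differential entropy of a normal distribution. The sketch below assumes IQ is normally distributed with a standard deviation of 15 and reported in whole points (assumptions made here for illustration), so that the differential entropy approximates the entropy of the rounded score.

```python
import math

IQ_SD = 15  # assumed standard deviation, in IQ points

# Differential entropy of a normal distribution, in bits:
# h = 0.5 * log2(2 * pi * e * sigma^2)
iq_entropy_bits = 0.5 * math.log2(2 * math.pi * math.e * IQ_SD ** 2)

# How many ~10-bit full-scale test results fit in one kilobyte (8,000 bits)?
full_tests_per_kilobyte = 8_000 // 10

print(f"IQ score entropy: about {iq_entropy_bits:.2f} bits")
print(f"10-bit test results per kilobyte: {full_tests_per_kilobyte}")
```

On these assumptions a single score is worth about 6 bits, consistent with the ~10 bits suggested for a full-scale test with subscores.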

More generally, almost every pair of variables shows a small but non-zero correlation (assuming sufficient data to overcome sampling error), leading to the observation that “everything is correlated”; this is generally attributed to all variables being embedded in universal causal networks, such that, as Meehl notes, one can find baffling and (only apparently) arbitrary differences such as male/female differences in agreement with the statement “I think Lincoln was greater than Washington.” Thus, not only are there a few numbers which can predict a great deal of variance, even arbitrary variables collectively are less unpredictable than they may seem due to the hidden connections.

Personal Identity and Unpredictability

After all this, there will surely be er­rors in mod­el­ing and many recorded ac­tions or pref­er­ences will be un­pre­dictable. To what ex­tent does this er­ror term im­ply per­sonal com­plex­i­ty?

Mea­sure­ment er­ror is in­her­ent in all col­lectible dat­a­points, from IQ to In­ter­net logs; some of this is due to idio­syn­crasies of the spe­cific mea­sur­ing method such as the ex­act phras­ing of each ques­tion, but a de­cent chunk of it dis­ap­pears when mea­sure­ments are made many times over long time in­ter­vals. The im­pli­ca­tion is that our re­sponses have a con­sid­er­able com­po­nent of ran­dom­ness to them and so per­fect re­con­struc­tion to match recorded data is both im­pos­si­ble & un­nec­es­sary.

Taking these errors too seriously would lead into a difficult position, that these transient effects are vital to personal identity. But this brings up the classic free will vs determinism objection: if we have free will in the sense of unpredictable uncaused choices, then the choices cannot follow by physical law from our preferences & wishes, so in what sense is this our own ‘will’, and why would we want such a dubious gift at all? Similarly, if our responses to IQ tests or personality inventories have daily fluctuations which cannot be traced to any stable long-term properties of our selves nor to immediate aspects of our environments, but the fluctuations appear to be entirely random & unpredictable, how can they ever constitute a meaningful part of our personal identities?

Data sources

In terms of cost-effec­tive­ness and use­ful­ness, there are large differ­ences in cost & stor­age-size for var­i­ous pos­si­ble in­for­ma­tion sources:

  • IQ and per­son­al­ity in­ven­to­ries: free to ~$100; 1–100 bits

  • whole-genome se­quenc­ing: <$1000; <1GB

    Given the steep de­crease in genome se­quenc­ing costs his­tor­i­cal­ly, one should prob­a­bly wait an­other decade or so to do any whole-genome se­quenc­ing.

  • writ­ings, in de­scend­ing or­der; $0 (nat­u­rally pro­duced), <10GB (com­pressed)

    • chat/IRC logs
    • per­sonal writ­ings
    • life­time emails
  • di­ariz­ing: 4000 hours or >$29k? (10 min­utes per day over 60 years at min­i­mum wage), <2MB

  • au­to­bi­og­ra­phy: ? hours; <1MB

  • cry­on­ics for brain preser­va­tion: $80,000+, ?TB
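
The diarizing cost in the list above can be reproduced as follows. The hourly wage of $7.25 (the US federal minimum wage) is an assumption made here; the text says only "minimum wage".

```python
# Sketch: time and opportunity cost of lifelong diarizing.
MINUTES_PER_DAY = 10
YEARS = 60
MIN_WAGE_USD = 7.25  # assumption: US federal minimum wage

hours_exact = MINUTES_PER_DAY * 365 * YEARS / 60
cost_exact = hours_exact * MIN_WAGE_USD

# The list rounds the time up to 4,000 hours before pricing it.
cost_rounded = 4_000 * MIN_WAGE_USD

print(f"{hours_exact:.0f} hours, ${cost_exact:,.0f}")
print(f"rounded to 4,000 hours: ${cost_rounded:,.0f}")
```

Under that wage assumption, the ">$29k" figure corresponds to the rounded 4,000 hours, while the unrounded total is somewhat lower.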

While just know­ing IQ or Big Five is cer­tainly nowhere near enough for con­ti­nu­ity of per­sonal iden­ti­ty, it’s clear that mea­sur­ing those vari­ables has much greater bang-for-buck than record­ing the state of some neu­rons.

Depth of data collection

For some of the­se, one could spend al­most un­lim­ited effort, like in writ­ing an au­to­bi­og­ra­phy—one could ex­ten­sively in­ter­view rel­a­tives or at­tempt to cue child­hood mem­o­ries by re­vis­it­ing lo­ca­tions or reread­ing books or play­ing with tools.

But would that be all that use­ful? If one can­not eas­ily rec­ol­lect a child­hood mem­o­ry, then (pace the im­pli­ca­tions of child­hood am­ne­sia and the free will ar­gu­ment) can it re­ally be all that cru­cial to one’s per­sonal iden­ti­ty? And any crit­i­cal in­flu­ences from for­got­ten data should still be in­fer­able from the effects; if one is, say, badly trau­ma­tized by one’s abu­sive par­ents into be­ing un­able to form ro­man­tic re­la­tion­ships, the in­abil­ity is the im­por­tant part and will be ob­serv­able from a lack of ro­man­tic re­la­tion­ships, and the for­got­ten abuse does­n’t need to be re­trieved & stored.


  • Death Note Anonymity
  • Con­sci­en­tious­ness and on­line ed­u­ca­tion
  • Sim­u­la­tion in­fer­ences
  • http­s://www.g­w­­doc­s/psy­chol­o­gy/2012-bain­bridge.pdf
  • Wech­sler, range of hu­man vari­a­tion
  • why are sib­lings differ­en­t?, Plomin
  • lessons of be­hav­ioral ge­net­ics, Plomin


Efficient natural languages

A single English character can be expressed (in ASCII) using a byte, or ignoring the wasted high-order bit, a full 7 bits. But English is pretty predictable, and isn’t using those 7 bits to good effect. Shannon found that each character was carrying more like 1 (0.6-1.3) bit of unguessable information (differing from genre to genre[8]); Hamid Moradi found 1.62-2.28 bits on various books[9]; Brown et al 1992 found <1.72 bits; Teahan & Cleary 1996 got 1.46; Cover & King 1978 came up with 1.3 bits[10]; and another study found 1.6 bits for English and that compressibility was similar to this when using translations in Arabic/Chinese/French/Greek/Japanese/Korean/Russian/Spanish (with Japanese as an outlier). In practice, existing algorithms can make it down to just 2 bits to represent a character, and theory suggests the true entropy is around 0.8 bits per character.[11] (This, incidentally, implies that the highest bandwidth ordinary human speech can attain is around 55 bits per second[12], with normal speech across many languages estimated to be somewhat lower.) Languages can vary in how much they convey in a single ‘word’: ancient Egyptian conveys ~7 bits per word and modern Finnish around 10.4[13] (with word ordering adding at least another 3 bits in most languages); but we’ll ignore those complications.

Whatever the true entropy, it’s clear existing English spelling is pretty wasteful. How many characters could we get away with? We could ask, how many bits does it take to uniquely specify 1 out of, say, 100,000 words? Well, n bits can uniquely specify 2^n items; we want at least 100,000 items covered by our bits, and as it happens, 2^17 is 131,072, which gives us some room to spare. (2^16 only gives us 65,536, which would be enough for a pidgin or something.) We already pointed out that a character can be represented by 7 bits (in ASCII), so each character accounts for 7 of those 17 bits. 7+7+7 > 17, so 3 characters. In this encoding, one of our 100,000 words would look like ‘AxC’ (and we’d have ~31,000 unused 17-bit codes to spare). That’s not so bad.
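
The counting argument can be written out as a short sketch; the names are illustrative.

```python
import math

VOCABULARY = 100_000
BITS_PER_CHAR = 7  # ASCII without the high-order bit

bits_needed = math.ceil(math.log2(VOCABULARY))         # smallest n with 2^n >= 100,000
chars_needed = math.ceil(bits_needed / BITS_PER_CHAR)  # whole characters to hold n bits

spare_codes = 2 ** bits_needed - VOCABULARY
spare_triplets = 2 ** (BITS_PER_CHAR * chars_needed) - VOCABULARY

print(f"{bits_needed} bits, {chars_needed} characters")
print(f"spare 17-bit codes: {spare_codes:,}")
print(f"spare 3-character strings: {spare_triplets:,}")
```

Because 3 characters actually hold 21 bits, the full space of triplets leaves far more slack than the 17-bit count alone suggests.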

But as has often been pointed out, one of the advantages of our verbose system which can take as many as 9 characters to express a word like ‘advantage’ is that the waste also lets us understand partial messages. The example given is a disemvoweled sentence: ‘y cn ndrstnd Nglsh txt vn wtht th vwls’. Word lengths themselves correspond roughly to frequency of use[14] or average information content.[15]

The answer given when anyone points out that a compressed file can be turned to nonsense by a single error is that errors aren’t that common, and the ‘natural’ redundancy is very inefficient in correcting for errors[16]; further, while there are some reasons to expect languages to have evolved towards efficiency, we have at least 2 arguments that they may yet be very inefficient:

  1. natural languages differ dramatically in almost every way, as evidenced by the difficulty Chomskyians have in finding universals of language; for example, average word length differs considerably from language to language. (Compare German and English; they are closely related, yet one is shorter.)

    And specifically, natural languages seem to vary considerably in how much they can convey in a given time-unit; speakers make up for low-entropy syllables by speaking faster (and vice-versa), but even after multiplying the number of syllables by rate, the languages still differ by as much as 30%[17].

  2. speakers may prefer a concise short language with powerful error-detecting and correction, since speaking is so tiring and metabolically costly; but listeners would prefer not to have to think hard and prefer that the speaker do all the work for them, and would thus prefer a less concise language with less powerful error-detection and correction[18].

One in­ter­est­ing nat­ural ex­per­i­ment in bi­nary en­cod­ing of lan­guages is the Kele lan­guage; its high and low tones add 1 bit to each syl­la­ble, and when the tones are trans­lated to drum­beats, it takes about 8:1 rep­e­ti­tion:

Kele is a tonal lan­guage with two sharply dis­tinct tones. Each syl­la­ble is ei­ther low or high. The drum lan­guage is spo­ken by a pair of drums with the same two tones. Each Kele word is spo­ken by the drums as a se­quence of low and high beats. In pass­ing from hu­man Kele to drum lan­guage, all the in­for­ma­tion con­tained in vow­els and con­so­nants is lost…in a tonal lan­guage like Kele, some in­for­ma­tion is car­ried in the tones and sur­vives the tran­si­tion from hu­man speaker to drums. The frac­tion of in­for­ma­tion that sur­vives in a drum word is small, and the words spo­ken by the drums are cor­re­spond­ingly am­bigu­ous. A sin­gle se­quence of tones may have hun­dreds of mean­ings de­pend­ing on the miss­ing vow­els and con­so­nants. The drum lan­guage must re­solve the am­bi­gu­ity of the in­di­vid­ual words by adding more words. When enough re­dun­dant words are added, the mean­ing of the mes­sage be­comes unique.

…She [his wife] sent him a mes­sage in drum lan­guage…the mes­sage needed to be ex­pressed with re­dun­dant and re­peated phras­es: “White man spirit in for­est come come to house of shin­gles high up above of white man spirit in for­est. Woman with yam awaits. Come come.” Car­ring­ton heard the mes­sage and came home. On the av­er­age, about eight words of drum lan­guage were needed to trans­mit one word of hu­man lan­guage un­am­bigu­ous­ly. West­ern math­e­mati­cians would say that about one eighth of the in­for­ma­tion in the hu­man Kele lan­guage be­longs to the tones that are trans­mit­ted by the drum lan­guage.19

With a good forward error-correcting code (FEC), you can compress and eat your cake too. Exactly how much error we can detect or correct is given by the Shannon limit.

If we suppose that each word is 3 characters long, and we get 1 error every 2 words on average, our channel capacity is 6 characters’ worth of bits (7*6, or 42), and our mistake rate is 1⁄6 of the characters (or 7⁄42). Substituting these into the Shannon limit, in Haskell we evaluate20:

42 / 1 - (-(1/6 * logBase 2 (1/6) + (1 - 1/6) * logBase 2 (1 - 1/6)))

Which evaluates to ~41. In other words, we started with 42 bits of possibly corrupted information, assumed a certain error rate, and asked how much could we communicate given that error rate; the difference is whatever we had to spend on ECC: 1 bit. Try comparing that to a vowel-scheme. The vowels would not guarantee detection or correction (you may be able to decode ‘he st’ as ‘he sat’, but can you decode ‘he at’ correctly?), and even worse, vowels demand an entire character, a single block of 7–8 bits, and can’t be subtly spread over all the characters. So if our 2 words had one vowel, we just blew 7 bits of information on that and that alone, and if there were more than 1 vowel…
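For anyone re-running the calculation, it may help to name the pieces. This is a rough sketch: `binaryEntropy`, `asWritten`, and `parenthesized` are illustrative names, and the second reading is shown only for comparison, as an assumption about how the same symbols could be grouped rather than a claim about which form was intended:

```haskell
-- Binary entropy H2(p) in bits, defined for 0 < p < 1.
binaryEntropy :: Double -> Double
binaryEntropy p = negate (p * logBase 2 p + (1 - p) * logBase 2 (1 - p))

-- The expression exactly as given in the text. In Haskell `/` binds
-- tighter than `-`, so this parses as (42 / 1) - H2(1/6).
asWritten :: Double
asWritten = 42 / 1 - binaryEntropy (1 / 6)

-- Comparison only (an assumption, not the text's figure): the same
-- symbols with the denominator grouped as 1 - H2(1/6).
parenthesized :: Double
parenthesized = 42 / (1 - binaryEntropy (1 / 6))
```

In GHCi, `asWritten` comes to roughly 41.35 (the ~41 quoted above), while `parenthesized` comes to roughly 120, so the grouping matters a great deal to the result.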

Of course, the Shan­non limit is the the­o­ret­i­cal as­ymp­totic ideal and re­quires com­plex so­lu­tions hu­mans could­n’t men­tally cal­cu­late on the fly. In re­al­i­ty, we would have to use some­thing much sim­pler and hence could­n’t get away with de­vot­ing just 1 bit to the FEC. But hope­fully it demon­strates that vow­els are a re­ally atro­cious form of er­ror-cor­rec­tion. What would be a good com­pro­mise be­tween hu­manly pos­si­ble sim­plic­ity and in­effi­ciency (com­pared to the Shan­non lim­it)? I don’t know.

The existence of similar neuronal pathways across languages & cultures for reading suggests that there could be hard limits on what sorts of languages can be efficiently understood; for example, characters in all alphabets take on average 3 strokes to write. Richard Hamming, who invented much of the early error-correcting codes, once devised a scheme for AT&T (similar to the ISBN check digit): number the digits, letters, and space from 0 to 36, take a position-weighted sum of a word’s symbols modulo 37, and prefix the word with the check symbol that brings that sum to 0. This checksum handles what Hamming considered the most common human errors like repeating or swapping digits.21 A related idea is encoding bits into audible words which are as phonetically distant as possible, so a binary string (such as a cryptographic hash) can be spoken and heard with minimum possibility of error; see the PGP word list or the 32-bit Mnemonic encoder scheme.
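A small sketch of such a modulo-37 weighted check may make the idea concrete. It follows the description Hamming gives (symbols 0–9, A–Z, and space valued 0 to 36, check symbol in the first position, total check sum of 0), but the exact weighting is an assumption here: each symbol is weighted by its 1-based position. The names `alphabet`, `symbolValue`, `weightedSum`, `encode`, and `isValid` are hypothetical, and messages are assumed to be shorter than 37 symbols:

```haskell
import Data.Char (toUpper)
import Data.List (elemIndex)

-- The 37 symbols, valued 0..36 by their index in this string.
alphabet :: String
alphabet = ['0' .. '9'] ++ ['A' .. 'Z'] ++ " "

-- Numeric value of a symbol, or Nothing if it is outside the alphabet.
symbolValue :: Char -> Maybe Int
symbolValue c = elemIndex (toUpper c) alphabet

-- Sum of (position * value) modulo 37, positions counted from 1
-- (the position weighting is this sketch's assumption).
weightedSum :: [Int] -> Int
weightedSum vals = sum (zipWith (*) [1 ..] vals) `mod` 37

-- Prefix the message with the check symbol that makes the total sum 0.
-- The message itself occupies positions 2, 3, ...
encode :: String -> Maybe String
encode msg = do
  vals <- mapM symbolValue msg
  let rest = sum (zipWith (*) [2 ..] vals) `mod` 37
      check = (37 - rest) `mod` 37
  pure ((alphabet !! check) : map toUpper msg)

-- A received word passes if every symbol is known and the sum is 0.
isValid :: String -> Bool
isValid coded = maybe False ((== 0) . weightedSum) (mapM symbolValue coded)
```

For example, `encode "BOLT 42"` yields `Just "9BOLT 42"`, and `isValid "9BOLT 24"` is `False`, because swapping the two adjacent digits changes the weighted sum. Since 37 is prime, any single changed symbol or swap of two different symbols is caught as long as the positions involved are less than 37 apart.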

  1. If p = 0.2 (based on the 80% suc­cess rate), then .↩︎

  2. Not that improbable, given how incredibly bizarre and convoluted much of neurobiology is, and the severe constraints a brain must operate under.↩︎

  3. Lan­dauer notes some­thing very sim­i­lar in his con­clu­sion:

    Thus, the es­ti­mates all point to­ward a func­tional learned mem­ory con­tent of around a bil­lion bits for a ma­ture per­son. The con­sis­tency of the num­bers is re­as­sur­ing…­Com­puter sys­tems are now be­ing built with many bil­lion bit hard­ware mem­o­ries, but are not yet nearly able to mimic the as­so­cia­tive mem­ory pow­ers of our “bil­lion” bit func­tional ca­pac­i­ty. An at­trac­tive spec­u­la­tion from these jux­ta­posed ob­ser­va­tions is that the brain uses an enor­mous amount of ex­tra ca­pac­ity to do things that we have not yet learned how to do with com­put­ers. A num­ber of the­o­ries of hu­man mem­ory have pos­tu­lated the use of mas­sive re­dun­dancy as a means for ob­tain­ing such prop­er­ties as con­tent and con­text ad­dress­abil­i­ty, sen­si­tiv­ity to fre­quency of ex­pe­ri­ence, re­sis­tance to phys­i­cal dam­age, and the like (e.g., Lan­dauer, 1975; Hop­field, 1982; Ack­ley, Hin­ton, & Se­jnowski, 1985). Pos­si­bly we should not be look­ing for mod­els and mech­a­nisms that pro­duce stor­age economies (e.g., Collins & Quil­lian, 1972), but rather ones in which mar­vels are pro­duced by profli­gate use of ca­pac­i­ty.

  4. The classic example of Phineas Gage shows that large sections of the brain can be destroyed by trauma while still preserving continuity of identity, even as continuity of self is damaged or eliminated: Gage was able to remember his life and remain functional, but with severe personality changes.↩︎

  5. That is, they are still ca­pa­ble of or­di­nary liv­ing al­though they often have pro­found prob­lems, and it is un­clear what the brain vol­ume re­duc­tion ac­tu­ally means, be­cause it affects pri­mar­ily the white mat­ter rather than the gray mat­ter, and no one seems to know how many neu­rons in the gray mat­ter may be lost. Re­ports of above-av­er­age IQ in se­vere hy­dro­cephalus cases are un­trust­wor­thy. For more ex­ten­sive dis­cus­sion of it all, see .↩︎

  6. Pos­ing se­ri­ous chal­lenges to at­tempts to mea­sure an­i­mal in­tel­li­gence, as it is easy both to an­thro­po­mor­phize & grossly over­es­ti­mate their cog­ni­tion, but also grossly un­der­es­ti­mate it due to poor choices of re­wards, sen­sory modal­i­ties, or tasks—.↩︎

  7. The raw data from the se­quencer would be much larger as it con­sists of many re­peated over­lap­ping se­quence runs, but this does­n’t re­flect the ac­tual genome’s size.↩︎

  8. “The Man Who In­vented Mod­ern Prob­a­bil­ity”, Nau­tilus is­sue 4:

    To mea­sure the artis­tic merit of texts, Kol­mogorov also em­ployed a let­ter-guess­ing method to eval­u­ate the en­tropy of nat­ural lan­guage. In in­for­ma­tion the­o­ry, en­tropy is a mea­sure of un­cer­tainty or un­pre­dictabil­i­ty, cor­re­spond­ing to the in­for­ma­tion con­tent of a mes­sage: the more un­pre­dictable the mes­sage, the more in­for­ma­tion it car­ries. Kol­mogorov turned en­tropy into a mea­sure of artis­tic orig­i­nal­i­ty. His group con­ducted a se­ries of ex­per­i­ments, show­ing vol­un­teers a frag­ment of Russ­ian prose or po­etry and ask­ing them to guess the next let­ter, then the next, and so on. Kol­mogorov pri­vately re­marked that, from the view­point of in­for­ma­tion the­o­ry, So­viet news­pa­pers were less in­for­ma­tive than po­et­ry, since po­lit­i­cal dis­course em­ployed a large num­ber of stock phrases and was highly pre­dictable in its con­tent. The verses of great po­ets, on the other hand, were much more diffi­cult to pre­dict, de­spite the strict lim­i­ta­tions im­posed on them by the po­etic form. Ac­cord­ing to Kol­mogorov, this was a mark of their orig­i­nal­i­ty. True art was un­like­ly, a qual­ity prob­a­bil­ity the­ory could help to mea­sure.

  9. H. Moradi, “En­tropy of Eng­lish text: Ex­per­i­ments with hu­mans and a ma­chine learn­ing sys­tem based on rough sets”, In­for­ma­tion Sci­ences, An In­ter­na­tional Jour­nal 104 (1998), 31-47↩︎

  10. T. M. Cov­er, “A con­ver­gent gam­bling es­ti­mate of the en­tropy of Eng­lish”, IEEE Trans. In­for­ma­tion The­ory, Vol­ume IT-24, no. 4, pp. 413-421, 1978↩︎

  11. Pe­ter Grass­berg­er, (2002)↩︎

  12. Rapper Ricky Brown apparently set a rapping speed record in 2005 with “723 syllables in 51.27 seconds”, which is 14.1 syllables a second; if we assume that a syllable is 3 characters on average, and go with an estimate of 1.3 bits per character, then the bits per second (b/s) is 14.1 × 3 × 1.3, or 55 b/s. This is something of a lower bound; one Korean rapper claims 17 syllables a second, which would be 66 b/s.↩︎

  13. See , Mon­te­murro 2011↩︎

  14. Which would be a sort of Huffman coding; see also “Entropy, and Short Codes”.↩︎

  15. “Word lengths are op­ti­mized for effi­cient com­mu­ni­ca­tion”: “We demon­strate a sub­stan­tial im­prove­ment on one of the most cel­e­brated em­pir­i­cal laws in the study of lan­guage, 75-y-old the­ory that word length is pri­mar­ily de­ter­mined by fre­quency of use. In ac­cord with ra­tio­nal the­o­ries of com­mu­ni­ca­tion, we show across 10 lan­guages that av­er­age in­for­ma­tion con­tent is a much bet­ter pre­dic­tor of word length than fre­quen­cy. This in­di­cates that hu­man lex­i­cons are effi­ciently struc­tured for com­mu­ni­ca­tion by tak­ing into ac­count in­ter­word sta­tis­ti­cal de­pen­den­cies. Lex­i­cal sys­tems re­sult from an op­ti­miza­tion of com­mu­nica­tive pres­sures, cod­ing mean­ings effi­ciently given the com­plex sta­tis­tics of nat­ural lan­guage use.”↩︎

  16. If we argue that vowels are serving a useful purpose, then there’s a problem. There are only 5 vowels and some semi-vowels, so we have at the very start given up at least 20 letters: tons of possibilities. To make a business analogy, you can’t burn 90% of your revenue on booze & parties, and make it up on volume. Even the most trivial error-correction is better than vowels. For example, the last letter of every word could specify how many letters there were and what fraction are vowels; ‘a’ means there was 1 letter and it was a vowel, ‘A’ means 1 consonant, ‘b’ means 2 vowels, ‘B’ means 2 consonants, ‘c’ means 1 vowel & 1 consonant (in that order), ‘C’ means the reverse, etc. So if you see ‘John looked _tc Julian’, the trailing ‘c’ implies the missing letter is a vowel, which could only be ‘a’.

    This point may be clearer if we look at systems of writing. Ancient Hebrew, for example, was an abjad script, with vowel-indications (like the niqqud) coming much later. Ancient Hebrew is also a dead language, no longer spoken in the vernacular by its descendants until its revival as Modern Hebrew, so oral traditions would not help much. Nevertheless, the Bible is still well-understood, and the lack of vowels is rarely an issue; even the complete absence of modern punctuation didn’t cause very many problems. The examples I know of are striking for their unimportance: the exact pronunciation of the Tetragrammaton or whether the thief crucified with Jesus immediately went to heaven.↩︎

  17. An early study found that read­ing speed in Chi­nese and Eng­lish were sim­i­lar when the in­for­ma­tion con­veyed was sim­i­lar (“Com­par­a­tive pat­terns of read­ing eye move­ment in Chi­nese and Eng­lish”); “A cross-lan­guage per­spec­tive on speech in­for­ma­tion rate” in­ves­ti­gated ex­actly how a num­ber of lan­guages traded off num­ber of syl­la­bles ver­sus talk­ing speed by record­ing a set of trans­lated sto­ries by var­i­ous na­tive speak­ers, and found that the two pa­ra­me­ters did not coun­ter-bal­ance ex­act­ly:

    In­for­ma­tion rate is shown to re­sult from a den­si­ty/rate trade-off il­lus­trated by a very strong neg­a­tive cor­re­la­tion be­tween the IDL and SRL. This re­sult con­firms the hy­poth­e­sis sug­gested fifty years ago by Karl­gren (1961:676) and re­ac­ti­vated more re­cently (Green­berg and Fos­ler-Lussier (2000); Locke (2008)): ‘It is a chal­leng­ing thought that gen­eral op­ti­mal­iza­tion rules could be for­mu­lated for the re­la­tion be­tween speech rate vari­a­tion and the sta­tis­ti­cal struc­ture of a lan­guage. Judg­ing from my ex­per­i­ments, there are rea­sons to be­lieve that there is an equi­lib­rium be­tween in­for­ma­tion value on the one hand and du­ra­tion and sim­i­lar qual­i­ties of the re­al­iza­tion on the other’ (Karl­gren 1961). How­ev­er, IRL ex­hibits more than 30% of vari­a­tion be­tween Japan­ese (0.74) and Eng­lish (1.08), in­val­i­dat­ing the first hy­poth­e­sis of a strict cross-lan­guage equal­ity of rates of in­for­ma­tion.

  18. , Can­cho 2002. From the ab­stract:

    In this ar­ti­cle, the early hy­poth­e­sis of Zipf of a prin­ci­ple of least effort for ex­plain­ing the law is shown to be sound. Si­mul­ta­ne­ous min­i­miza­tion in the effort of both hearer and speaker is for­mal­ized with a sim­ple op­ti­miza­tion process op­er­at­ing on a bi­nary ma­trix of sig­nal-ob­ject as­so­ci­a­tions. Zipf’s law is found in the tran­si­tion be­tween ref­er­en­tially use­less sys­tems and in­dex­i­cal ref­er­ence sys­tems. Our find­ing strongly sug­gests that Zipf’s law is a hall­mark of sym­bolic ref­er­ence and not a mean­ing­less fea­ture. The im­pli­ca­tions for the evo­lu­tion of lan­guage are dis­cussed

  19. “How We Know”, by Freeman Dyson in The New York Review of Books (review of James Gleick’s The Information: A History, a Theory, a Flood)↩︎

  20. Haskell note: us­ing log­Base be­cause log is the nat­ural log­a­rithm, not the bi­nary log­a­rithm used in in­for­ma­tion the­o­ry.↩︎

  21. from “Cod­ing The­ory II” in The Art of Do­ing Sci­ence and En­gi­neer­ing, Richard W. Ham­ming 1997:

    I was once asked by AT&T how to code things when hu­mans were us­ing an al­pha­bet of 26 let­ter, ten dec­i­mal dig­its, plus a ‘space’. This is typ­i­cal of in­ven­tory nam­ing, parts nam­ing, and many other nam­ing of things, in­clud­ing the nam­ing of build­ings. I knew from tele­phone di­al­ing er­ror data, as well as long ex­pe­ri­ence in hand com­put­ing, hu­mans have a strong ten­dency to in­ter­change ad­ja­cent dig­its, a 67 is apt to be­come a 76, as well as change iso­lated ones, (usu­ally dou­bling the wrong dig­it, for ex­am­ple a 556 is likely to emerge as 566). Thus sin­gle er­ror de­tect­ing is not enough…Ed Gilbert, sug­gested a weighted code. In par­tic­u­lar he sug­gested as­sign­ing the num­bers (val­ues) 0, 1, 2, …, 36 to the sym­bols 0,1,…, 9, A, B, …, Z, space.

    …To en­code a mes­sage of n sym­bols leave the first sym­bol, k = 1, blank and what­ever the re­main­der is, which is less than 37, sub­tract it from 37 and use the cor­re­spond­ing sym­bol as a check sym­bol, which is to be put in the first po­si­tion. Thus the to­tal mes­sage, with the check sym­bol in the first po­si­tion, will have a check sum of ex­actly 0. When you ex­am­ine the in­ter­change of any two differ­ent sym­bols, as well as the change of any sin­gle sym­bol, you see it will de­stroy the weighted par­ity check, mod­ulo 37 (pro­vided the two in­ter­changed sym­bols are not ex­actly 37 sym­bols apart!). With­out go­ing into the de­tails, it is es­sen­tial the mod­u­lus be a prime num­ber, which 37 is.

    …If you were to use this en­cod­ing, for ex­am­ple, for in­ven­tory parts names, then the first time a wrong part name came to a com­put­er, say at trans­mis­sion time, if not be­fore (per­haps at or­der prepa­ra­tion time), the er­ror will be caught; you will not have to wait un­til the or­der gets to sup­ply head­quar­ters to be later told that there is no such part or else they have sent the wrong part! Be­fore it leaves your lo­ca­tion it will be caught and hence is quite eas­ily cor­rected at that time. Triv­ial? Yes! Effec­tive against hu­man er­rors (as con­trasted with the ear­lier white noise), yes!