How Complex Are Individual Differences?

Individual human brains are more predictable and similar than they are different, reflecting low Kolmogorov complexity and implying that beta uploading may be more feasible than guessed, with suggestions on optimizing archived information.
psychology, philosophy, sociology, statistics, transhumanism, NN, insight-porn
2010-06-23–2019-06-14 · in progress · certainty: likely · importance: 4


Every human is differ­ent in a myr­iad of ways, from their mem­o­ries to their per­son­al­ity to their skills or knowl­edge to intel­li­gence and cog­ni­tive abil­i­ties to moral and polit­i­cal val­ues. But humans are also often remark­ably sim­i­lar, occu­py­ing a small area of mind-space—even a chim­panzee is alien in a way that their rec­og­niz­able sim­i­lar­i­ties to us only empha­size, never mind some­thing like an octo­pus, much less aliens or AIs.

So this raises a ques­tion: are indi­vid­u­als, with their differ­ences, more or less infor­ma­tion-the­o­ret­i­cally com­plex than some generic aver­age human brain is in total? That is, if you some­how man­aged to encode an aver­age human brain into a com­puter upload (I’ll assume pat­ternism here), into a cer­tain num­ber of bits, would you then require as many or more bits to con­vert that aver­age brain into a spe­cific per­son, or would you require many few­er?

This bears on some issues such as:

  1. in a computer upload scenario, would it be necessary to scan every individual human brain in minute detail in order to upload them, or would it be possible to map only one brain in depth and let coarser methods suffice for all subsequent uploads? To mold or evolve an upload towards one specialty or another, would it be feasible to start with the generic brain and train it, or are brains so complex and idiosyncratic that one would have to start with a specific brain already close to the desired goal?
  2. is cry­on­ics (ex­pen­sive and highly unlikely to work) the only way to recover from death, or would it be pos­si­ble to aug­ment poor vit­ri­fi­ca­tion with sup­ple­men­tal infor­ma­tion like diaries to enable full reviv­i­fi­ca­tion? Or would it be pos­si­ble to be recre­ated entirely from sur­viv­ing data and records, so-called “beta upload­ing” or “beta sim­u­la­tions”, in some more mean­ing­ful method than “sim­u­late all pos­si­ble human brains”?

When I introspect, I do not feel especially complex or unique or more than the product of the inputs over my life. I feel I am the product of a large number of inbuilt & learned mechanisms, heuristics, and memories, operating mechanistically, repeatably, and unconsciously. Once in a great while, while reading old blog posts or reviewing old emails, I compose a long reply, only to discover that I had already written one, similar or even almost identical down to the word; chilled, I feel like an automaton, just another system as limited and predictable to a greater intelligence as a Sphex wasp or my cat is to me, not even an especially unique one, but a mediocre result of my particular assortment of genes, mutation load, congenital defects, infections, developmental noise, shared environment, and media consumption.

One way to approach the question is to ask how complex the brain could possibly be.

Descriptive Complexity

Work­ing from the bot­tom up, we could ask how much infor­ma­tion it takes to encode indi­vid­ual brains. The Whole Brain Emu­la­tion Roadmap reports a num­ber of esti­mates of how much stor­age an upload might require, which reflects the com­plex­ity of a brain at var­i­ous lev­els of detail, in “Table 8: Stor­age demands (em­u­la­tion only, human brain)” (pg79):

Table 8: Stor­age demands (em­u­la­tion only, human brain) [Sand­berg & Bostrom 2008]
Level   Model   # entities   Bytes per entity   Memory demands (Tb)   Earliest year, $1 million
1   Computational module   100–1,000?   ?   ?   ?
2   Brain region connectivity   10^5 regions, 10^7 connections   3? (2 bytes connectivity, 1 byte weight)   3∙10^-5   Present
3   Analog network population model   10^8 populations, 10^13 connections   5 (3 bytes connectivity, 1 byte weight, 1 byte extra state variable)   50   Present
4   Spiking neural network   10^11 neurons, 10^15 connections   8 (4 bytes connectivity, 4 state variables)   8,000   2019
5   Electrophysiology   10^15 compartments x 10 state variables = 10^16   1 byte per state variable   10,000   2019
6   Metabolome   10^16 compartments x 10^2 metabolites = 10^18   1 byte per state variable   10^6   2029
7   Proteome   10^16 compartments x 10^3 proteins and metabolites = 10^19   1 byte per state variable   10^7   2034
8   States of protein complexes   10^16 compartments x 10^3 proteins x 10 states = 10^20   1 byte per state variable   10^8   2038
9   Distribution of complexes   10^16 compartments x 10^3 proteins and metabolites x 100 states/locations = 10^21   1 byte per state variable   10^9   2043
9   Full 3D EM map (Fiala, 2002)   50x2.5x2.5 nm resolution   1 byte per voxel, compressed   10^9   2043
10   Stochastic behaviour of single molecules   10^25 molecules   31 (2 bytes molecule type, 14 bytes position, 14 bytes velocity, 1 byte state)   3.1∙10^14   2069
11   Quantum   Either ≈10^26 atoms, or a smaller number of quantum-state-carrying molecules   Qbits   ?   ?

The most likely scale is the spiking neural network but not lower levels (like individual neural compartments or molecules), which they quote at 10^11 neurons with 10^15 connections, at 4 bytes for the connections and 4 bytes for neural state, giving 8,000 terabytes—which is quite large. (A byte for weights may be overgenerous; Bartol et al 2015 estimates synaptic weights at >4.7 bits, not 8 bits.) If 8,000TB is anywhere close to the true complexity of oneself, beta simulations are highly unlikely to result in any meaningful continuity of self, as even if one did nothing but write in diaries every waking moment, the raw text would never come anywhere near 8,000TB (typing speeds tend to top out at ~100WPM, or ~2.8 billion words or ~22.4GB or ~0.02TB in a lifetime).
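
To make the comparison concrete, here is the back-of-the-envelope arithmetic as a small Haskell sketch (the 16 waking hours a day and ~8 bytes per English word are my assumptions, not figures from the sources):

    -- Rough upper bound on lifetime typed output vs. an 8,000TB emulation.
    wordsPerMinute, wakingHoursPerDay, years :: Double
    wordsPerMinute    = 100
    wakingHoursPerDay = 16
    years             = 80

    lifetimeWords :: Double
    lifetimeWords = wordsPerMinute * 60 * wakingHoursPerDay * 365 * years   -- ~2.8e9 words

    lifetimeBytes :: Double
    lifetimeBytes = lifetimeWords * 8   -- assume ~8 bytes per word of plain text

    main :: IO ()
    main = do
      putStrLn ("lifetime words:      " ++ show lifetimeWords)
      putStrLn ("lifetime terabytes:  " ++ show (lifetimeBytes / 1e12))   -- ~0.02TB
      putStrLn ("fraction of 8,000TB: " ++ show (lifetimeBytes / 8e15))   -- ~3e-6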

Bandwidth & Storage Bounds

Another approach is functional: how much information can the human brain actually store and recall when tested? Such functional estimates of human memory capacity tend to be far lower than the structural estimates above.

One often cited quan­tifi­ca­tion is Lan­dauer 1986. Lan­dauer tested infor­ma­tion retention/recall using text read­ing, pic­ture recog­ni­tion mem­o­ry, and auto­bi­o­graph­i­cal recall, find­ing com­pa­ra­ble stor­age esti­mates for all modal­i­ties:

Table 1: Estimates of Information Held in Human Memory (Landauer 1986)
Source of parameters   Method of estimate   Input rate (b/s)   Loss rate (b/b/s)   Total (bits)
Concentrated reading   70-year linear accumulation   1.2   ?   ?
Picture recognition   70-year linear accumulation   2.3   ?   ?
Central values   asymptotic net gain over 70 years   2.0   10^-9   ?
Word knowledge   semantic nets x 15 domains   ?   ?   ?

In the same vein, Mollica & Piantadosi 2019 estimate the information a speaker must acquire for natural language as a whole at ~1.5MB, broken down in bits as follows (Table 1):

Table 1: Sum­mary of esti­mated bounds [in bits] across lev­els of lin­guis­tic analy­sis. (Mol­lica & Pianta­dosi 2019)
sec­tion domain lower bound best guess upper bound
2.1 phonemes 375 750 1,500
2.2 phone­mic word­forms 200,000 400,000 640,000
2.3 lex­i­cal seman­tics 553,809 12,000,000 40,000,000

Based on the forgetting curve and man-centuries of data on spaced repetition, Wozniak estimates that permanent long-term recall of declarative memory is limited to 200–300 flashcard items per year per daily minute of review, so a lifetime of ~80 years with a reasonable amount of daily review, say 10 minutes, would top out at a maximum of ~240,000 items memorized; as each flashcard should encode a minimal fact such as a single word’s definition, which is certainly less than a kilobyte of entropy, long-term memory is bounded at around 240MB. (For comparison, see profiles of people with hyperthymesia, whose most striking attribute is normal memories combined with extremely large amounts of time spent diarizing or recalling or quizzing themselves on the past; people who keep diaries often note when rereading how much they forget, which in a way emphasizes how little autobiographical memory apparently matters for continuity of identity.)
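
The arithmetic behind that bound, as a quick sketch (taking the upper figure of 300 items per year per daily minute and a generous 1,000 bytes of entropy per item):

    -- Upper bound on declarative memories maintainable by lifelong spaced repetition.
    itemsPerYearPerDailyMinute, dailyMinutes, years :: Double
    itemsPerYearPerDailyMinute = 300   -- upper end of the 200-300 estimate
    dailyMinutes               = 10
    years                      = 80

    lifetimeItems, lifetimeMB :: Double
    lifetimeItems = itemsPerYearPerDailyMinute * dailyMinutes * years   -- 240,000 items
    lifetimeMB    = lifetimeItems * 1000 / 1e6                          -- ~240MB at <=1KB/item

    main :: IO ()
    main = print (lifetimeItems, lifetimeMB)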

Implicit memory, such as recognition memory for images, can also be used to store information. For example, Drucker 2010 employed visual recognition memory to multiply large numbers; he cites as precedent Standing 1973:

In one of the most wide­ly-cited stud­ies on recog­ni­tion mem­o­ry, Stand­ing showed par­tic­i­pants an epic 10,000 pho­tographs over the course of 5 days, with 5 sec­onds’ expo­sure per image. He then tested their famil­iar­i­ty, essen­tially as described above. The par­tic­i­pants showed an 83% suc­cess rate, sug­gest­ing that they had become famil­iar with about 6,600 images dur­ing their ordeal. Other vol­un­teers, trained on a smaller col­lec­tion of 1,000 images selected for vivid­ness, had a 94% suc­cess rate.

At an 80% accuracy rate, we can even calculate how many bits of information can be entrusted to the messenger using the noisy-channel coding theorem; a calculation gives 5.8 kilobits/725 bytes as the upper limit1, or 0.23 bits/s, but the decline of the recognition success rate with larger image sets suggests that total recognition memory could not be used to store more than a few kilobytes.
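
The machinery behind such a calculation is the binary entropy function; here is a minimal sketch under one simple channel model (my assumption: each of Standing’s 10,000 images is treated as a single use of a binary symmetric channel with a 20% error rate; this lands in the same low-kilobit range, though it does not reproduce the footnote’s exact figure):

    -- Binary entropy H(p), in bits.
    h :: Double -> Double
    h p = negate (p * logBase 2 p + (1 - p) * logBase 2 (1 - p))

    -- Capacity per image if each familiarity judgment is modeled as one use of a
    -- binary symmetric channel with a 20% error rate (an assumed channel model).
    perImageBits :: Double
    perImageBits = 1 - h 0.2   -- ~0.28 bits per image

    main :: IO ()
    main = print (perImageBits, 10000 * perImageBits)   -- ~2.8 kilobits across 10,000 images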

Aside from low recall of generic autobiographical memory, childhood amnesia sets in around age 3–4, resulting in almost total loss of all episodic memory prior to then, and the loss of perhaps 5% of all lifetime memories; the developmental theory is that, due to limited memory capacity, the initial memories simply get overwritten by later childhood learning & experiences. Childhood amnesia is sometimes underestimated: many ‘memories’ are pseudo-memories of stories as retold by relatives.

Working memory is considered a bottleneck that information must pass through in order to be stored in short-term memory and then potentially into long-term memory, but it is also small and slow. Digit span is a simple test of working memory, in which one tries to store & recall short random sequences of the digits 0–9; a normal adult forward digit span (without resorting to mnemonics or other strategies) might be around 7, and requires several seconds to store and recall; thus, digit span suggests a maximum storage rate on the order of tens of bits per second, or ~8.3GB over a lifetime.
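
How the lifetime figure depends on the assumed time per memorized sequence, as a sketch (the deliberately generous 24-hours-a-day assumption and the per-trial timings are mine):

    -- Bits per memorized digit-span sequence: 7 digits x log2(10) bits/digit.
    bitsPerSpan :: Double
    bitsPerSpan = 7 * logBase 2 10   -- ~23 bits

    -- Lifetime bound as a function of assumed seconds per store-and-recall cycle,
    -- doing nothing else 24 hours a day for 80 years.
    lifetimeGB :: Double -> Double
    lifetimeGB secondsPerTrial = bitsPerSpan / secondsPerTrial * 3600 * 24 * 365 * 80 / 8 / 1e9

    main :: IO ()
    main = mapM_ (\t -> print (t, lifetimeGB t)) [1, 2, 5]
    -- ~(1,7.3), (2,3.7), (5,1.5): single-digit GB, the same order as the ~8.3GB figure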

A good reader will read at 200–300 words per minute; Claude Shannon estimated a single character carries ~1 bit; English words would weigh in at perhaps 8 bits per word (as vocabularies tend to top out at around 100,000 words, each word could be specified in at most ~17 bits), or roughly 40 bits per second, so at 1 hour a day, a lifetime of reading would convey a maximum on the order of 500MB. Landauer’s subjects read 180 words per minute and he estimated 0.4 bits retained per word, for a rate of 1.2 bits per second.
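
Working that arithmetic through under the stated assumptions (a sketch; the figures are simply the ones quoted above):

    -- Reading: ~300 WPM at ~8 bits/word = ~40 bits/s; 1 hour/day for 80 years.
    bitsPerSecond, secondsPerDay, days :: Double
    bitsPerSecond = 300 / 60 * 8   -- 40 bits/s
    secondsPerDay = 3600           -- 1 hour of reading per day
    days          = 365 * 80

    lifetimeMB :: Double
    lifetimeMB = bitsPerSecond * secondsPerDay * days / 8 / 1e6

    main :: IO ()
    main = print lifetimeMB   -- ~525 MB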

Deep learning researcher Geoffrey Hinton has repeatedly noted (an observation echoed by other researchers, such as Yann LeCun, discussing the relationship between supervised, reinforcement, and unsupervised learning) that the number of synapses & neurons in the brain versus the length of our lifetime implies that much of our brains must be spent learning generic unsupervised representations of the world (such as via predictive modeling):

The brain has about 10^14 synapses and we only live for about 10^9 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 10^5 dimensions of constraint per second.

Since we all experience similar perceptual worlds with shared optical illusions etc, and there is little feedback from our choices, this would appear to challenge the idea of large individual differences. (A particularly acute problem given that in reinforcement learning, the supervision is limited to the rewards, which provide a fraction of a bit spread over thousands or millions of actions taken in very high-dimensional perceptual states, each of which may itself be hundreds of thousands or millions of parameters.)

Eric Drexler (2019) notes that current deep learning models in computer vision have now achieved near-human or superhuman performance across a wide range of tasks while requiring model sizes typically around 500MB; the visual cortex and brain regions related to visual processing make up a large fraction of the brain (perhaps as much as 20%), suggesting that either deep models are extremely parameter-efficient compared to the brain2, the brain’s visual processing is extremely powerful in a way not captured by any benchmarks such that current benchmarks require <<1% of human visual capability, or that a relatively small number of current GPUs (eg 1000–15000) are equivalent to the brain.3 Another way to state the point would be to consider AlphaGo Zero, whose final trained model (without any kind of compression/distillation/sparsification) is probably ~300MB (roughly estimating from convolution layer sizes), and which achieves strongly superhuman performance, playing Go better than any living human (and thus better than any human ever), despite Go professionals studying Go from early childhood full-time in specialized Go schools as insei and playing or studying detailed analyses of tens of thousands of games drawing on 2 millennia of Go study; if a human brain truly requires 1 petabyte and is equal to Zero in efficiency of encoding Go knowledge, that implies the Go knowledge of a world champion occupies <0.00003% of their brain’s capacity. This may be true, but surely it would be surprising, especially given the observable gross changes in brain region volume for similar professionals, such as the changes in hippocampal volume & distribution in London taxi drivers learning “the Knowledge” (Maguire et al 2000), the reliance of chess expertise on long-term memory (The Cambridge Handbook of Expertise and Expert Performance), and Go expertise’s reuse of the visual cortex. Consider also cross-species neuron counts: other primates or cetaceans can have fairly similar neuron counts to humans, implying that much of the brain is dedicated to basic mammalian tasks like vision or motor coordination, and thus that all the things we see as critically important to our sense of selves, such as our religious or political views, are supported by a small fraction of the brain indeed.

Overparameterization and biological robustness

The human brain appears to be highly redundant (albeit with functionality highly localized to specific points, as demonstrated by lesions and ‘grandmother cells’), and, particularly if the damage is slow or early in life, capable of coping with the loss of large amounts of tissue. The brains of the elderly are noticeably smaller than those of young people, as neural loss continues throughout the aging process; but this results in loss of identity & continuity only after decades of decay and at the extremes, as in senile dementia & Alzheimer’s disease; likewise, personal identity can survive concussion or other trauma.4 Epileptic patients who undergo the effective procedure of hemispherectomy, surgically disconnecting or removing 1 cerebral hemisphere entirely, typically recover well as their remaining brain adapts to the new demands, although sometimes suffering side-effects, and the procedure is often cited as an example of neuroplasticity. Reported cases of people with severe hydrocephalus show that apparently profoundly reduced brain volumes, as much as down to 10% of normal, are still capable of relatively normal functioning5; similarly, human brain volumes vary considerably and the correlation between brain volume and intelligence is relatively weak, both within humans and between species. Many animals & insects are capable of surprisingly complex specialized behavior in the right circumstances6 despite sometimes few neurons (eg the jumping spider Portia).

An interesting parallel is with artificial neural networks/deep learning: it is widely known that the NNs typically trained, with millions or billions of parameters and dozens or hundreds of layers, are grossly overparameterized, deep, & energy/FLOPS-demanding, because there are many ways that NNs of equivalent intelligence can be trained which need only a few layers, many fewer parameters, or an order of magnitude fewer FLOPS. (Some examples of NNs being compressed in size or FLOPs by anywhere from 50% to ~17,000%: Gastaldi 2017, Rawat & Wang 2017, Rosenfeld & Tsotsos 2018, Cheng et al 2018, Learn2Compress, Kusupati et al 2018, Tool 2018, Google 2019, Liu 2020, Amadori 2019, He et al 2020, among many others.) This implies that while a NN may be extremely expensive to train initially to a given level of performance, and be large and slow, order-of-magnitude gains in efficiency are then possible (in addition to the gains from various forms of hyperparameter optimization or architecture search); there is no reason to expect this not to apply to a human-level NN, so while the first human-level NN may require supercomputers to train, it will quickly be possible to run it on vastly more modest hardware (for 2017-style NNs, at least, there is without a doubt a serious “hardware overhang”).
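
To illustrate how such order-of-magnitude reductions arise, a sketch with made-up round numbers (illustrative only, not figures from any of the cited papers):

    -- How pruning and quantization compound into an order-of-magnitude size reduction.
    params, fp32Bytes, compressedBytes :: Double
    params          = 1e9                -- a hypothetical 1-billion-parameter network
    fp32Bytes       = params * 4         -- 32-bit floats: 4GB
    compressedBytes = params * 0.10 * 1  -- prune to 10% of weights, quantize to 8 bits each

    main :: IO ()
    main = print (fp32Bytes / compressedBytes)   -- 40x smaller, before any cleverer tricks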

One suggestion for why those small shallow fast NNs cannot be trained directly but must be distilled/compressed down from larger NNs is that the larger NNs are inherently easier to train: the overparameterization provides a smoother loss landscape and more ways to travel between local optima and find a near-global optimum; this would explain why NNs need to be overparameterized in order to train in a feasible amount of time, and perhaps why human brains are so overparameterized as well. So artificial NNs are not as inherently complex as they seem, but have much simpler forms; perhaps human brains are likewise not as complex as they look, but can be compressed down to much simpler, faster forms.

Predictive Complexity

A third tack is to treat the brain as a black box and take a Turing-test-like view: if a system’s outputs can be mimicked or predicted reasonably well by a smaller system, then the smaller system’s size is a better measure of the real complexity. So, how predictable are human choices and traits?

One major source of pre­dic­tive power is genet­ics.

A whole human genome has ~3 billion base-pairs, which can be stored in 1–3GB7. Individual human differences in genetics turn out to be fairly modest in magnitude: there are perhaps 5 million small mutations and 2,500 larger structural variants compared to a reference genome (Auton et al 2015, Seo et al 2016), many of which are correlated or familial, so given a reference genome such as a relative’s, the 1GB shrinks to a much smaller delta, ~125MB with one approach, but possibly down to the single-MB range with more sophisticated encoding techniques such as using genome graphs rather than reference sequences. Just SNP genotyping (generally covering common SNPs present in >=1% of the population) will typically cover ~500,000 SNPs with 3 possible values each, requiring less than 1MB.
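
As a rough sketch of the storage arithmetic (using the approximate counts above; the 2-bits-per-base and log2(3)-per-genotype encodings are idealized, ignoring quality scores and metadata):

    -- Rough genome storage arithmetic.
    basePairs, snps :: Double
    basePairs = 3.1e9
    snps      = 500000

    fullGenomeMB, snpChipKB :: Double
    fullGenomeMB = basePairs * 2 / 8 / 1e6        -- 2 bits/base, uncompressed: ~775MB
    snpChipKB    = snps * logBase 2 3 / 8 / 1e3   -- 3-valued genotypes: ~99KB, well under 1MB

    main :: IO ()
    main = print (fullGenomeMB, snpChipKB)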

A large fraction of individual differences can be ascribed to those small genetic differences. (For more background on modern behavioral genetics findings, see my genetics link bibliography.) Across thousands of measured traits in twin studies, from blood panels of biomarkers to diseases to anthropometric traits like height to psychological traits like personality or intelligence to social outcomes like socioeconomic status, the average measured heritability predicts ~50% of variance (Polderman et al 2015), with shared family environment accounting for 17%; this is a lower bound, as it omits corrections for issues like assortative mating, measurement error, or genetic mosaicism (particularly common in neurons). SNPs alone account for somewhere around 15% (same caveats). Humans are often said to have relatively little variation and to be homogeneous compared to other wild species, apparently due to loss of genetic diversity during the human expansion out of Africa; presumably if much of that diversity remained, heritabilities might be even larger. Further, there is pervasive genetic correlation, where traits overlap and influence each other, making them more predictable; and the genetic influence is often partially responsible for longitudinal consistency in individuals’ traits over their lifetimes. A genome might seem to be redundant with a data source like one’s electronic health records, but it’s worth remembering that many things are not recorded in health or other records, and genomes are informative about the things which didn’t happen just as much as the things which did happen—you can only die once, but a genome has information on your vulnerability to all the diseases and conditions you didn’t happen to get.

In psychology and sociology research, some variables are so pervasively & powerfully predictive that they are routinely measured & included in analyses of everything, such as the standard demographic variables of gender, ethnicity, socioeconomic status, and so on. Nor is it uncommon to discover that a large mass of questions can often be boiled down to a few hypothesized latent factors (eg in philosophy). Whether it is the extensive predictive power of stereotypes (Jussim et al 2015), the ability of people to make above-chance guesses based on faces or photographs of bedrooms, or any number of odd psychology results, there appears to be extensive entanglement among personal variables; personality can famously be inferred from Facebook ‘likes’ with accuracy approaching or surpassing informant ratings, or from book preferences, and from an information-theoretic text-compression perspective, much of the information in one’s Twitter tweets can be extracted from the tweets of the accounts one interacts with. A single number such as IQ (one variable, having an entropy of only a few bits), for example, can predict much of education outcomes, and predicts more in conjunction with the Big Five’s Conscientiousness & Openness. A full-scale IQ test will also provide measurements of relevant variables such as visuospatial ability or vocabulary size, and in total might come to ~10 bits. So a large number of such variables could be recorded in a single kilobyte while predicting many things about a person. Facebook statuses, for example, can be mined to extract all of these: one analysis demonstrates that a small FB sample can predict 13% of (measured) IQ and 9–17% of (measured) Big Five variance, with similar performance on a number of other personality traits; another study extracts a large fraction of Big Five variance from smartphone activity logs, while a third uses shopping/purchases to extract a lesser fraction.
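
The kilobyte claim is easy to check as arithmetic (a sketch; the ~10-bit figure is the rough per-test estimate above):

    -- How many ~10-bit psychometric measurements fit into a single kilobyte.
    bitsPerMeasurement, kilobyteBits :: Double
    bitsPerMeasurement = 10
    kilobyteBits       = 8 * 1024

    main :: IO ()
    main = print (kilobyteBits / bitsPerMeasurement)   -- ~819 measurements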

More generally, almost every pair of variables shows a small but non-zero correlation (assuming sufficient data to overcome sampling error), leading to the observation that “everything is correlated”; this is generally attributed to all variables being embedded in universal causal networks, such that, as Meehl notes, one can find baffling and (only apparently) arbitrary differences such as male/female differences in agreement with the statement “I think Lincoln was greater than Washington.” Thus, not only are there a few numbers which can predict a great deal of variance; even arbitrary variables collectively are less unpredictable than they may seem, due to the hidden connections.

Personal Identity and Unpredictability

After all this, there will surely be errors in mod­el­ing and many recorded actions or pref­er­ences will be unpre­dictable. To what extent does this error term imply per­sonal com­plex­i­ty?

Mea­sure­ment error is inher­ent in all col­lectible dat­a­points, from IQ to Inter­net logs; some of this is due to idio­syn­crasies of the spe­cific mea­sur­ing method such as the exact phras­ing of each ques­tion, but a decent chunk of it dis­ap­pears when mea­sure­ments are made many times over long time inter­vals. The impli­ca­tion is that our responses have a con­sid­er­able com­po­nent of ran­dom­ness to them and so per­fect recon­struc­tion to match recorded data is both impos­si­ble & unnec­es­sary.

Taking these errors too seriously would lead to a difficult position: that these transient effects are vital to personal identity. But this brings up the classic free will vs determinism objection: if we have free will in the sense of unpredictable uncaused choices, then the choices cannot follow by physical law from our preferences & wishes, so in what sense is this our own ‘will’, and why would we want such a dubious gift at all? Similarly, if our responses to IQ tests or personality inventories have daily fluctuations which cannot be traced to any stable long-term properties of our selves nor to immediate aspects of our environments, but instead appear to be entirely random & unpredictable, how can they ever constitute a meaningful part of our personal identities?

Data sources

In terms of cost-effec­tive­ness and use­ful­ness, there are large differ­ences in cost & stor­age-size for var­i­ous pos­si­ble infor­ma­tion sources:

  • IQ and per­son­al­ity inven­to­ries: free to ~$100; 1–100 bits

  • whole-genome sequenc­ing: <$1000; <1GB

    Given the steep decrease in genome sequenc­ing costs his­tor­i­cal­ly, one should prob­a­bly wait another decade or so to do any whole-genome sequenc­ing.

  • writ­ings, in descend­ing order; $0 (nat­u­rally pro­duced), <10GB (com­pressed)

    • chat/IRC logs
    • per­sonal writ­ings
    • life­time emails
  • diariz­ing: 4000 hours or >$29k? (10 min­utes per day over 60 years at min­i­mum wage), <2MB

  • auto­bi­og­ra­phy: ? hours; <1MB

  • cry­on­ics for brain preser­va­tion: $80,000+, ?TB

While just know­ing IQ or Big Five is cer­tainly nowhere near enough for con­ti­nu­ity of per­sonal iden­ti­ty, it’s clear that mea­sur­ing those vari­ables has much greater bang-for-buck than record­ing the state of some neu­rons.

Depth of data collection

For some of the­se, one could spend almost unlim­ited effort, like in writ­ing an auto­bi­og­ra­phy—one could exten­sively inter­view rel­a­tives or attempt to cue child­hood mem­o­ries by revis­it­ing loca­tions or reread­ing books or play­ing with tools.

But would that be all that use­ful? If one can­not eas­ily rec­ol­lect a child­hood mem­o­ry, then (pace the impli­ca­tions of child­hood amne­sia and the free will argu­ment) can it really be all that cru­cial to one’s per­sonal iden­ti­ty? And any crit­i­cal influ­ences from for­got­ten data should still be infer­able from the effects; if one is, say, badly trau­ma­tized by one’s abu­sive par­ents into being unable to form roman­tic rela­tion­ships, the inabil­ity is the impor­tant part and will be observ­able from a lack of roman­tic rela­tion­ships, and the for­got­ten abuse does­n’t need to be retrieved & stored.

TODO:

  • Death Note Anonymity
  • Con­sci­en­tious­ness and online edu­ca­tion
  • Sim­u­la­tion infer­ences
  • https://www.gwern.net/docs/psychology/2012-bainbridge.pdf
  • Wech­sler, range of human vari­a­tion
  • why are sib­lings differ­en­t?, Plomin
  • lessons of behav­ioral genet­ics, Plomin

Appendix

Efficient natural languages

A single English character can be expressed (in ASCII) using a byte, or, ignoring the wasted high-order bit, a full 7 bits. But English is pretty predictable, and isn’t using those 7 bits to good effect. Shannon found that each character was carrying more like 1 (0.6–1.3) bit of unguessable information (differing from genre to genre8); Hamid Moradi found 1.62–2.28 bits on various books9; Brown et al 1992 found <1.72 bits; Teahan & Cleary 1996 got 1.46; Cover & King 1978 came up with 1.3 bits10; and another study found 1.6 bits for English, with similar compressibility for translations into Arabic/Chinese/French/Greek/Japanese/Korean/Russian/Spanish (with Japanese as an outlier). In practice, existing algorithms can make it down to just 2 bits to represent a character, and theory suggests the true entropy is around 0.8 bits per character.11 (This, incidentally, implies that the highest bandwidth ordinary human speech can attain is around 55 bits per second12, with normal speech across many languages estimated at ~39 bits per second.) Languages can vary in how much they convey in a single ‘word’—ancient Egyptian conveying ~7 bits per word and modern Finnish around 10.413 (with word ordering adding at least another ~3 bits in most languages); but we’ll ignore those complications.

Whatever the true entropy, it’s clear existing English spelling is pretty wasteful. How many characters could we get away with? We could ask: how many bits does it take to uniquely specify 1 out of, say, 100,000 words? Well, n bits can uniquely specify 2^n items; we want at least 100,000 items covered by our bits, and as it happens, 2^17 is 131,072, which gives us some room to spare. (2^16 only gives us 65,536, which would be enough for a pidgin or something.) We already pointed out that a character can be represented by 7 bits (in ASCII), so each character accounts for 7 of those 17 bits. 7+7+7 > 17, so 3 characters. In this encoding, one of our 100,000 words would look like ‘AxC’ (and we’d have 30,000 unused triplets to spare). That’s not so bad.
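
Or, doing the arithmetic directly (a quick sketch):

    -- Bits needed to uniquely index a 100,000-word vocabulary.
    vocab :: Double
    vocab = 100000

    main :: IO ()
    main = do
      print (logBase 2 vocab)                        -- ~16.6 bits
      print (ceiling (logBase 2 vocab) :: Integer)   -- 17 bits; 3 ASCII characters (21 bits) suffice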

But as has often been pointed out, one of the advan­tages of our ver­bose sys­tem which can take as many as 9 char­ac­ters to express a word like ‘advan­tage’ is that the waste also lets us under­stand par­tial mes­sages. The exam­ple given is a dis­emvow­eled sen­tence: ‘y cn ndrstnd Nglsh txt vn wtht th vwls’. Word lengths them­selves cor­re­spond roughly to fre­quency of use14 or aver­age infor­ma­tion con­tent.15

The answer, when anyone points out that a compressed file can be turned to nonsense by a single error, is that errors aren’t that common, and that the ‘natural’ redundancy is a very inefficient way of correcting for errors16; further, while there are some reasons to expect languages to have evolved towards efficiency, we have at least 2 arguments that they may yet be very inefficient:

  1. natural languages differ dramatically in almost every way, as evidenced by the difficulty Chomskyans have in finding the universal grammar of language; for example, average word length differs considerably from language to language. (Compare German and English; they are closely related, yet one is shorter.)

    And specifically, natural languages seem to vary considerably in how much they can convey in a given time-unit; speakers make up for low-entropy syllables by speaking faster (and vice-versa), but even after multiplying the number of syllables by rate, the languages still differ by as much as 30%17.

  2. speakers may prefer a concise short language with powerful error-detection and correction, since speaking is so tiring and metabolically costly; but listeners would prefer not to have to think hard and prefer that the speaker do all the work for them, and would thus prefer a less concise language with less powerful error-detection and correction18

One interesting natural experiment in binary encoding of languages is the drum language of the Kele; its high and low tones add 1 bit to each syllable, and when the tones are translated to drumbeats, it takes about an 8:1 repetition to disambiguate:

Kele is a tonal lan­guage with two sharply dis­tinct tones. Each syl­la­ble is either low or high. The drum lan­guage is spo­ken by a pair of drums with the same two tones. Each Kele word is spo­ken by the drums as a sequence of low and high beats. In pass­ing from human Kele to drum lan­guage, all the infor­ma­tion con­tained in vow­els and con­so­nants is lost…in a tonal lan­guage like Kele, some infor­ma­tion is car­ried in the tones and sur­vives the tran­si­tion from human speaker to drums. The frac­tion of infor­ma­tion that sur­vives in a drum word is small, and the words spo­ken by the drums are cor­re­spond­ingly ambigu­ous. A sin­gle sequence of tones may have hun­dreds of mean­ings depend­ing on the miss­ing vow­els and con­so­nants. The drum lan­guage must resolve the ambi­gu­ity of the indi­vid­ual words by adding more words. When enough redun­dant words are added, the mean­ing of the mes­sage becomes unique.

…She [his wife] sent him a mes­sage in drum lan­guage…the mes­sage needed to be expressed with redun­dant and repeated phras­es: “White man spirit in for­est come come to house of shin­gles high up above of white man spirit in for­est. Woman with yam awaits. Come come.” Car­ring­ton heard the mes­sage and came home. On the aver­age, about eight words of drum lan­guage were needed to trans­mit one word of human lan­guage unam­bigu­ous­ly. West­ern math­e­mati­cians would say that about one eighth of the infor­ma­tion in the human Kele lan­guage belongs to the tones that are trans­mit­ted by the drum lan­guage.19

With a good error-correcting code, you can compress and eat your cake too. Exactly how much error we can detect or correct is given by the Shannon limit:

If we suppose that each word is 3 characters long, and we get 1 error every 2 words on average, our channel capacity is 6 characters’ worth of bits (7 × 6 = 42), and our mistake rate is 1⁄6 of the characters (7 of the 42 bits). Substituting the error rate into the binary entropy function H(p) = −(p · log2(p) + (1 − p) · log2(1 − p)) and subtracting from the 42 bits, in Haskell we evaluate20:

42 / 1 - (-(1/6 * logBase 2 (1/6) + (1 - 1/6) * logBase 2 (1 - 1/6)))

Which evaluates to ~41. In other words, we started with 42 bits of possibly corrupted information, assumed a certain error rate, and asked how much could we communicate given that error rate; the difference is whatever we had to spend on ECC—1 bit. Try comparing that to a vowel-scheme. The vowel would not guarantee detection or correction (you may be able to decode ‘he st’ as ‘he sat’, but can you decode ‘he at’ correctly?), and even worse, vowels demand an entire character, a single block of 7–8 bits, and can’t be subtly spread over all the characters. So if our 2 words had one vowel, we just blew 7 bits of information on that and that alone, and if there were more than 1 vowel…

Of course, the Shan­non limit is the the­o­ret­i­cal asymp­totic ideal and requires com­plex solu­tions humans could­n’t men­tally cal­cu­late on the fly. In real­i­ty, we would have to use some­thing much sim­pler and hence could­n’t get away with devot­ing just 1 bit to the FEC. But hope­fully it demon­strates that vow­els are a really atro­cious form of error-cor­rec­tion. What would be a good com­pro­mise between humanly pos­si­ble sim­plic­ity and ineffi­ciency (com­pared to the Shan­non lim­it)? I don’t know.

The existence of similar neuronal pathways across languages & cultures for reading suggests that there could be hard limits on what sorts of languages can be efficiently understood—for example, characters in all alphabets taking on average ~3 strokes to write. Richard Hamming, who invented much of the early error-correcting codes, once devised a scheme for IBM (similar to other check-digit schemes): number letters or characters from 1 to 37, and add them all up modulo 37, which is the new prefix to the word. This checksum handles what Hamming considered the most common human errors, like repeating or swapping digits.21 A related idea is encoding bits into audible words which are as phonetically distant as possible, so a binary string (such as a cryptographic hash) can be spoken and heard with minimum possibility of error; see the PGP word list or the 32-bit Mnemonic encoder scheme.


  1. If p = 0.2 (based on the 80% suc­cess rate), then .↩︎

  2. Not that improbable, given how incredibly bizarre and convoluted much of neurobiology is, and the severe metabolic constraints a brain must operate under.↩︎

  3. Lan­dauer notes some­thing very sim­i­lar in his con­clu­sion:

    Thus, the esti­mates all point toward a func­tional learned mem­ory con­tent of around a bil­lion bits for a mature per­son. The con­sis­tency of the num­bers is reas­sur­ing…­Com­puter sys­tems are now being built with many bil­lion bit hard­ware mem­o­ries, but are not yet nearly able to mimic the asso­cia­tive mem­ory pow­ers of our “bil­lion” bit func­tional capac­i­ty. An attrac­tive spec­u­la­tion from these jux­ta­posed obser­va­tions is that the brain uses an enor­mous amount of extra capac­ity to do things that we have not yet learned how to do with com­put­ers. A num­ber of the­o­ries of human mem­ory have pos­tu­lated the use of mas­sive redun­dancy as a means for obtain­ing such prop­er­ties as con­tent and con­text address­abil­i­ty, sen­si­tiv­ity to fre­quency of expe­ri­ence, resis­tance to phys­i­cal dam­age, and the like (e.g., Lan­dauer, 1975; Hop­field, 1982; Ack­ley, Hin­ton, & Sejnowski, 1985). Pos­si­bly we should not be look­ing for mod­els and mech­a­nisms that pro­duce stor­age economies (e.g., Collins & Quil­lian, 1972), but rather ones in which mar­vels are pro­duced by profli­gate use of capac­i­ty.

    ↩︎
  4. The classic example of Phineas Gage shows that large sections of the brain can be destroyed by trauma and simultaneously still preserve continuity of identity while damaging or eliminating continuity of self, due to Gage being able to remember his life and remain functional but with severe personality changes.↩︎

  5. That is, they are still capa­ble of ordi­nary liv­ing although they often have pro­found prob­lems, and it is unclear what the brain vol­ume reduc­tion actu­ally means, because it affects pri­mar­ily the white mat­ter rather than the gray mat­ter, and no one seems to know how many neu­rons in the gray mat­ter may be lost. Reports of above-av­er­age IQ in severe hydro­cephalus cases are untrust­wor­thy. For more exten­sive dis­cus­sion of it all, see .↩︎

  6. Posing serious challenges to attempts to measure animal intelligence, as it is easy both to anthropomorphize & grossly overestimate their cognition, but also to grossly underestimate it due to poor choices of rewards, sensory modalities, or tasks.↩︎

  7. The raw data from the sequencer would be much larger as it con­sists of many repeated over­lap­ping sequence runs, but this does­n’t reflect the actual genome’s size.↩︎

  8. “The Man Who Invented Mod­ern Prob­a­bil­ity”, Nau­tilus issue 4:

    To mea­sure the artis­tic merit of texts, Kol­mogorov also employed a let­ter-guess­ing method to eval­u­ate the entropy of nat­ural lan­guage. In infor­ma­tion the­o­ry, entropy is a mea­sure of uncer­tainty or unpre­dictabil­i­ty, cor­re­spond­ing to the infor­ma­tion con­tent of a mes­sage: the more unpre­dictable the mes­sage, the more infor­ma­tion it car­ries. Kol­mogorov turned entropy into a mea­sure of artis­tic orig­i­nal­i­ty. His group con­ducted a series of exper­i­ments, show­ing vol­un­teers a frag­ment of Russ­ian prose or poetry and ask­ing them to guess the next let­ter, then the next, and so on. Kol­mogorov pri­vately remarked that, from the view­point of infor­ma­tion the­o­ry, Soviet news­pa­pers were less infor­ma­tive than poet­ry, since polit­i­cal dis­course employed a large num­ber of stock phrases and was highly pre­dictable in its con­tent. The verses of great poets, on the other hand, were much more diffi­cult to pre­dict, despite the strict lim­i­ta­tions imposed on them by the poetic form. Accord­ing to Kol­mogorov, this was a mark of their orig­i­nal­i­ty. True art was unlike­ly, a qual­ity prob­a­bil­ity the­ory could help to mea­sure.

    ↩︎
  9. H. Moradi, “Entropy of Eng­lish text: Exper­i­ments with humans and a machine learn­ing sys­tem based on rough sets”, Infor­ma­tion Sci­ences, An Inter­na­tional Jour­nal 104 (1998), 31-47↩︎

  10. T. M. Cov­er, “A con­ver­gent gam­bling esti­mate of the entropy of Eng­lish”, IEEE Trans. Infor­ma­tion The­ory, Vol­ume IT-24, no. 4, pp. 413-421, 1978↩︎

  11. Peter Grass­berg­er, (2002)↩︎

  12. Rapper Ricky Brown apparently set a rapping speed record in 2005 with “723 syllables in 51.27 seconds”, which is 14.1 syllables a second; if we assume that a syllable is 3 characters on average, and go with an estimate of 1.3 bits per character, then the bits per second (b/s) is 14.1 × 3 × 1.3 ≈ 55 b/s. This is something of a lower bound; Korean rapper Outsider claims 17 syllables per second, which would be ~66 b/s.↩︎

  13. See Montemurro 2011.↩︎

  14. Which would be a sort of Huffman coding; see also “Entropy, and Short Codes”.↩︎

  15. “Word lengths are optimized for efficient communication”: “We demonstrate a substantial improvement on one of the most celebrated empirical laws in the study of language, Zipf’s 75-y-old theory that word length is primarily determined by frequency of use. In accord with rational theories of communication, we show across 10 languages that average information content is a much better predictor of word length than frequency. This indicates that human lexicons are efficiently structured for communication by taking into account interword statistical dependencies. Lexical systems result from an optimization of communicative pressures, coding meanings efficiently given the complex statistics of natural language use.”↩︎

  16. If we argue that vow­els are serv­ing a use­ful pur­pose, then there’s a prob­lem. There are only 3 vow­els and some semi­-vow­els, so we have at the very start given up at least 20 let­ter­s—­tons of pos­si­bil­i­ties. To make a busi­ness anal­o­gy, you can’t burn 90% of your rev­enue on booze & par­ties, and make it up on vol­ume. Even the most triv­ial error-cor­rec­tion is bet­ter than vow­els. For exam­ple, the last let­ter of every word could spec­ify how many let­ters there were and what frac­tion are vow­els; ‘a’ means there was 1 let­ter and it was a vow­el, ‘A’ means 1 con­so­nant, ‘b’ means 2 vow­els, ‘B’ means 2 con­so­nants’, ‘c’ means 1 vowel & 1 con­so­nant (in that order), ‘C’ means the reverse, etc. So if you see ’John looked _tc Julian’, the trail­ing ‘c’ implies the miss­ing let­ter is a vow­el, which could only be ‘a’.

    This point may be clearer if we look at systems of writing. Ancient Hebrew, for example, was an abjad script, with vowel-indications (like the niqqud) coming much later. Ancient Hebrew is also a dead language, no longer spoken in the vernacular by its descendants until its revival as Modern Hebrew, so oral traditions would not help much. Nevertheless, the Bible is still well-understood, and the lack of vowels is rarely an issue; even the complete absence of modern punctuation didn’t cause very many problems. The examples I know of are striking for their unimportance—the exact pronunciation of the Tetragrammaton, or whether the thief crucified with Jesus immediately went to heaven.↩︎

  17. An early study found that read­ing speed in Chi­nese and Eng­lish were sim­i­lar when the infor­ma­tion con­veyed was sim­i­lar (“Com­par­a­tive pat­terns of read­ing eye move­ment in Chi­nese and Eng­lish”); “A cross-lan­guage per­spec­tive on speech infor­ma­tion rate” inves­ti­gated exactly how a num­ber of lan­guages traded off num­ber of syl­la­bles ver­sus talk­ing speed by record­ing a set of trans­lated sto­ries by var­i­ous native speak­ers, and found that the two para­me­ters did not coun­ter-bal­ance exact­ly:

    Infor­ma­tion rate is shown to result from a density/rate trade-off illus­trated by a very strong neg­a­tive cor­re­la­tion between the IDL and SRL. This result con­firms the hypoth­e­sis sug­gested fifty years ago by Karl­gren (1961:676) and reac­ti­vated more recently (Green­berg and Fos­ler-Lussier (2000); Locke (2008)): ‘It is a chal­leng­ing thought that gen­eral opti­mal­iza­tion rules could be for­mu­lated for the rela­tion between speech rate vari­a­tion and the sta­tis­ti­cal struc­ture of a lan­guage. Judg­ing from my exper­i­ments, there are rea­sons to believe that there is an equi­lib­rium between infor­ma­tion value on the one hand and dura­tion and sim­i­lar qual­i­ties of the real­iza­tion on the other’ (Karl­gren 1961). How­ev­er, IRL exhibits more than 30% of vari­a­tion between Japan­ese (0.74) and Eng­lish (1.08), inval­i­dat­ing the first hypoth­e­sis of a strict cross-lan­guage equal­ity of rates of infor­ma­tion.

    ↩︎
  18. Ferrer i Cancho 2002. From the abstract:

    In this arti­cle, the early hypoth­e­sis of Zipf of a prin­ci­ple of least effort for explain­ing the law is shown to be sound. Simul­ta­ne­ous min­i­miza­tion in the effort of both hearer and speaker is for­mal­ized with a sim­ple opti­miza­tion process oper­at­ing on a binary matrix of sig­nal-ob­ject asso­ci­a­tions. Zipf’s law is found in the tran­si­tion between ref­er­en­tially use­less sys­tems and index­i­cal ref­er­ence sys­tems. Our find­ing strongly sug­gests that Zipf’s law is a hall­mark of sym­bolic ref­er­ence and not a mean­ing­less fea­ture. The impli­ca­tions for the evo­lu­tion of lan­guage are dis­cussed

    ↩︎
  19. “How We Know”, by Freeman Dyson in The New York Review of Books (review of James Gleick’s The Information: A History, a Theory, a Flood)↩︎

  20. Haskell note: using log­Base because log is the nat­ural log­a­rithm, not the binary log­a­rithm used in infor­ma­tion the­o­ry.↩︎

  21. from “Cod­ing The­ory II” in The Art of Doing Sci­ence and Engi­neer­ing, Richard W. Ham­ming 1997:

    I was once asked by AT&T how to code things when humans were using an alpha­bet of 26 let­ter, ten dec­i­mal dig­its, plus a ‘space’. This is typ­i­cal of inven­tory nam­ing, parts nam­ing, and many other nam­ing of things, includ­ing the nam­ing of build­ings. I knew from tele­phone dial­ing error data, as well as long expe­ri­ence in hand com­put­ing, humans have a strong ten­dency to inter­change adja­cent dig­its, a 67 is apt to become a 76, as well as change iso­lated ones, (usu­ally dou­bling the wrong dig­it, for exam­ple a 556 is likely to emerge as 566). Thus sin­gle error detect­ing is not enough…Ed Gilbert, sug­gested a weighted code. In par­tic­u­lar he sug­gested assign­ing the num­bers (val­ues) 0, 1, 2, …, 36 to the sym­bols 0,1,…, 9, A, B, …, Z, space.

    …To encode a mes­sage of n sym­bols leave the first sym­bol, k = 1, blank and what­ever the remain­der is, which is less than 37, sub­tract it from 37 and use the cor­re­spond­ing sym­bol as a check sym­bol, which is to be put in the first posi­tion. Thus the total mes­sage, with the check sym­bol in the first posi­tion, will have a check sum of exactly 0. When you exam­ine the inter­change of any two differ­ent sym­bols, as well as the change of any sin­gle sym­bol, you see it will destroy the weighted par­ity check, mod­ulo 37 (pro­vided the two inter­changed sym­bols are not exactly 37 sym­bols apart!). With­out going into the details, it is essen­tial the mod­u­lus be a prime num­ber, which 37 is.

    …If you were to use this encod­ing, for exam­ple, for inven­tory parts names, then the first time a wrong part name came to a com­put­er, say at trans­mis­sion time, if not before (per­haps at order prepa­ra­tion time), the error will be caught; you will not have to wait until the order gets to sup­ply head­quar­ters to be later told that there is no such part or else they have sent the wrong part! Before it leaves your loca­tion it will be caught and hence is quite eas­ily cor­rected at that time. Triv­ial? Yes! Effec­tive against human errors (as con­trasted with the ear­lier white noise), yes!

    ↩︎