How Complex Are Individual Differences?

Individual human brains are more predictable and similar than they are different, reflecting low Kolmogorov complexity and implying that beta uploading may be more feasible than guessed, with suggestions on optimizing archived information. (psychology, philosophy, sociology, statistics, transhumanism, NN)
created: 23 June 2010; modified: 09 Jan 2018; status: in progress; confidence: likely; importance: 4

Every human is different in a myriad of ways, from their memories to their personality to their skills or knowledge to intelligence and cognitive abilities to moral and political values. But humans are also often remarkably similar, occupying a small area of mind-space - even a chimpanzee is alien in a way that their recognizable similarities to us only emphasize, never mind something like an octopus, much less aliens or AIs.

So this raises a question: are individuals, with their differences, more or less information-theoretically complex than some generic average human brain is in total? That is, if you somehow managed to encode an average human brain into a computer upload (I’ll assume patternism here), into a certain number of bits, would you then require as many or more bits to convert that average brain into a specific person, or would you require many fewer?

This bears on some issues such as:

  1. in a computer upload scenario, would it be necessary to scan every individual human brain in minute detail in order to upload them, or would it be possible to map only one brain in depth and then use coarser methods for all subsequent uploads? To mold or evolve an upload towards one specialty or another, would it be feasible to start with the generic brain and train it, or are brains so complex and idiosyncratic that one would have to start with a specific brain already close to the desired goal?
  2. is cryonics (expensive and highly unlikely to work) the only way to recover from death, or would it be possible to augment poor vitrification with supplemental information like diaries to enable full revivification? Or would it be possible to be recreated entirely from surviving data and records, so-called beta uploading or beta simulations, by some more meaningful method than simulating all possible human brains?

When I introspect, I do not feel especially complex or unique or more than the product of the inputs over my life. I feel I am the product of a large number of inbuilt & learned mechanisms, heuristics, and memories, operating mechanistically, repeatably, and unconsciously. Once in a great while, while reading old blog posts or reviewing old emails, I compose a long reply, only to discover that I had written one already, which is similar or even exactly the same almost down to the word. Chilled, I feel like an automaton, just another system as limited and predictable to a greater intelligence as a Sphex wasp or my cat are to me. I am not even an especially unique system, but a mediocre result of my particular assortment of genes and mutation load and congenital defects and infections and developmental noise and shared environment and media consumption.

One way is to ask how complex the brain could be.

Descriptive Complexity

Working from the bottom up, we could ask how much information it takes to encode individual brains.

The Whole Brain Emulation Roadmap reports a number of estimates of how much storage an upload might require, which reflect the complexity of a brain at various levels of detail, in Table 8: Storage demands (emulation only, human brain) (pg79):

Table 8: Storage demands (emulation only, human brain)

Level | Model | # entities | Bytes per entity | Memory demands (TB) | Earliest year, $1 million
1 | Computational module | 100-1,000? | ? | ? | ?
2 | Brain region connectivity | 10^5 regions, 10^7 connections | 3? (2-byte connectivity, 1-byte weight) | 3 × 10^-5 | Present
3 | Analog network population model | 10^8 populations, 10^13 connections | 5 (3-byte connectivity, 1-byte weight, 1-byte extra state variable) | 50 | Present
4 | Spiking neural network | 10^11 neurons, 10^15 connections | 8 (4-byte connectivity, 4 state variables) | 8,000 | 2019
5 | Electrophysiology | 10^15 compartments × 10 state variables = 10^16 | 1 byte per state variable | 10,000 | 2019
6 | Metabolome | 10^16 compartments × 10^2 metabolites = 10^18 | 1 byte per state variable | 10^6 | 2029
7 | Proteome | 10^16 compartments × 10^3 proteins and metabolites = 10^19 | 1 byte per state variable | 10^7 | 2034
8 | States of protein complexes | 10^16 compartments × 10^3 proteins × 10 states = 10^20 | 1 byte per state variable | 10^8 | 2038
9 | Distribution of complexes | 10^16 compartments × 10^3 proteins and metabolites × 100 states/locations | 1 byte per state variable | 10^9 | 2043
9 | Full 3D EM map (Fiala, 2002) | 50 × 2.5 × 2.5 nm voxels | 1 byte per voxel, compressed | 10^9 | 2043
10 | Stochastic behaviour of single molecules | 10^25 molecules | 31 (2 bytes molecule type, 14 bytes position, 14 bytes velocity, 1 byte state) | 3.1 × 10^14 | 2069
11 | Quantum | Either ≈10^26 atoms, or smaller number of quantum-state carrying molecules | Qbits | ? | ?

The most likely scale is the spiking neural network level, not lower levels (like individual neural compartments or molecules). They quote it at 10^11 neurons with 10^15 connections, at 4 bytes for the connections and 4 bytes for neural state, giving 8,000 terabytes, which is quite large. If 8,000TB is anywhere close to the true complexity of oneself, beta simulations are highly unlikely to result in any meaningful continuity of self. Even if one did nothing but write in diaries every waking moment, the raw text would never come anywhere near 8,000TB: typing speeds tend to top out at ~100WPM, which over ~80 years of 16-hour waking days comes to ~2.8 billion words, or ~22.4GB (~0.022TB) in a lifetime.
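The gap can be made concrete with a little arithmetic. Below is a minimal sketch. The constants for 16 waking hours a day, 80 years, and ~8 bytes of raw text per word are this sketch's own assumptions, chosen because they reproduce the lifetime figures above. The 8 bytes per connection comes from Table 8.

```python
# Back-of-the-envelope comparison: level-4 (spiking neural network) storage
# from Table 8 versus the most text one could type in a lifetime.

# Table 8, level 4: 10^15 connections at 8 bytes each.
SNN_CONNECTIONS = 10**15
SNN_BYTES_PER_CONNECTION = 8
snn_bytes = SNN_CONNECTIONS * SNN_BYTES_PER_CONNECTION

# Assumptions of this sketch (not from the Roadmap):
WPM = 100                  # typing speed ceiling
WAKING_HOURS_PER_DAY = 16  # "every waking moment"
YEARS = 80                 # lifetime length
BYTES_PER_WORD = 8         # raw, uncompressed text

lifetime_words = WPM * 60 * WAKING_HOURS_PER_DAY * 365 * YEARS
lifetime_text_bytes = lifetime_words * BYTES_PER_WORD

GB = 10**9
TB = 10**12
print(f"spiking network: {snn_bytes / TB:,.0f} TB")
print(f"lifetime typing: {lifetime_words / 1e9:.1f} billion words, "
      f"{lifetime_text_bytes / GB:.1f} GB ({lifetime_text_bytes / TB:.3f} TB)")
print(f"shortfall factor: ~{snn_bytes / lifetime_text_bytes:,.0f}x")
```

Even under these generous assumptions, the text falls short by a factor of several hundred thousand.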

Bandwidth & Storage Bounds

Another approach is functional: how much information can the human brain actually store when tested? Such functional estimates of human memory capacity tend to be far lower.

One often cited quantification is Landauer 1986. Landauer tested information retention/recall using text reading, picture recognition memory, and autobiographical recall, finding comparable storage estimates for all modalities:

Table 1: Estimates of Information Held in Human Memory

Source of parameters | Method of estimate | Input rate (b/s) | Loss rate (b/b/s) | Total (bits)
Concentrated reading | 70-year linear accumulation | 1.2 |  | 1.8 × 10^9
Picture recognition | 70-year linear accumulation | 2.3 |  | 3.4 × 10^9
Central values | asymptotic | 2.0 | 10^-9 | 2.0 × 10^9
 | net gain over 70 years |  |  | 1.4 × 10^9
Word knowledge | semantic nets × 15 domains |  |  | 0.5 × 10^9
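The two linear-accumulation rows can be roughly reproduced by assuming ~16 waking hours per day. That is this sketch's assumption; the table does not state it. The asymptotic row is the steady state of a simple gain/loss model, $dM/dt = r - \lambda M$, which levels off at $r/\lambda$. The net-gain row depends on details not given here, so the sketch does not attempt it.

```python
# Assumption of this sketch: input accrues only during ~16 waking hours/day.
SECONDS_PER_WAKING_YEAR = 365 * 16 * 3600


def linear_accumulation(input_rate_bps: float, years: float = 70) -> float:
    """Total bits if everything taken in were retained (no forgetting)."""
    return input_rate_bps * years * SECONDS_PER_WAKING_YEAR


def asymptotic_capacity(input_rate_bps: float, loss_rate: float) -> float:
    """Steady state of dM/dt = r - loss_rate * M, i.e. M = r / loss_rate."""
    return input_rate_bps / loss_rate


reading = linear_accumulation(1.2)
pictures = linear_accumulation(2.3)
asymptote = asymptotic_capacity(2.0, 1e-9)

for name, bits in [("reading", reading), ("pictures", pictures), ("asymptote", asymptote)]:
    print(f"{name:>10}: {bits:.1e} bits")
```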

Based on the forgetting curve and man-centuries of data on spaced repetition, Wozniak estimates that permanent long-term recall of declarative memory is limited to 200-300 flashcard items per year per daily minute of review. A lifetime of ~80 years with a reasonable amount of daily review, say 10 minutes, would therefore top out at a maximum of $300 \cdot 10 \cdot 80 = 240,000$ items memorized. Each flashcard should encode a minimal fact, such as a single word’s definition, which is certainly less than a kilobyte of entropy, so long-term memory is bounded at around 240MB. (For comparison, see profiles of people with highly superior autobiographical memory, whose most striking attribute is normal memories combined with extremely large amounts of time spent diarizing or recalling or quizzing themselves on the past; people who keep diaries often note when rereading how much they forget, which in a way emphasizes how little autobiographical memory apparently matters for continuity of identity.)
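The bound is a single product. Here is a minimal sketch using the upper end of Wozniak's range, with the "<1KB per item" ceiling treated as exactly 1,000 bytes. That rounding is this sketch's simplification.

```python
# Upper bound on permanently retained flashcard items, per Wozniak's estimate.
ITEMS_PER_YEAR_PER_DAILY_MINUTE = 300  # upper end of the 200-300 range
DAILY_REVIEW_MINUTES = 10
YEARS = 80
MAX_BYTES_PER_ITEM = 1000              # assumption: "<1 KB" taken as 1,000 bytes

items = ITEMS_PER_YEAR_PER_DAILY_MINUTE * DAILY_REVIEW_MINUTES * YEARS
upper_bound_bytes = items * MAX_BYTES_PER_ITEM
print(f"{items:,} items, at most ~{upper_bound_bytes / 1e6:.0f} MB")
```

Using the lower end of 200 items instead gives 160,000 items, or ~160MB.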

Implicit memory, such as recognition memory of images, can be used to store information. For example, Drucker 2010, in Multiplying 10-digit numbers using Flickr: The power of recognition memory, employed visual memory to calculate $9883603368 \times 4288997768 = 42390752785149282624$; he cites as precedent Standing 1973:

In one of the most widely-cited studies on recognition memory, Standing showed participants an epic 10,000 photographs over the course of 5 days, with 5 seconds’ exposure per image. He then tested their familiarity, essentially as described above. The participants showed an 83% success rate, suggesting that they had become familiar with about 6,600 images during their ordeal. Other volunteers, trained on a smaller collection of 1,000 images selected for vividness, had a 94% success rate.

At an 80% accuracy rate, we can even use Shannon’s theorem to calculate how many bits of information could be stored this way; a calculation gives 5.8 kilobits/725 bytes as the upper limit1, or 0.23 bits/s. The decline in recognition accuracy as the image set grows (from 94% for 1,000 vivid images to 83% for 10,000) further suggests that total recognition memory could not be used to store more than a few kilobytes.

Aside from low recall of generic autobiographical memory, childhood amnesia sets in around age 3-4. This results in almost total loss of all episodic memory prior to then, perhaps 5% of all lifetime memories. The developmental theory is that, due to limited memory capacity, the initial memories simply get overwritten by later childhood learning & experiences. Childhood amnesia is sometimes underestimated, since many apparent early memories are actually pseudo-memories of stories retold by relatives.

Working memory is considered a bottleneck that information must pass through in order to be stored in short-term memory and then potentially into long-term memory, but it is also small and slow. Forward digit span is a simple test of working memory, in which one tries to store & recall short random sequences of the digits 0-9. A normal adult forward digit span (without resorting to mnemonics or other strategies) might be around 7, and takes several seconds to store and recall. Each digit is one of 10 possibilities and so carries $\log_2(10) \approx 3.32$ bits; at roughly one digit per second, digit span suggests a maximum storage rate of ~3.32 bits per second, or ~8 gigabits (~1GB) over a lifetime.
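As a rough check, here is a minimal sketch. It assumes one digit per second and an 80-year lifetime of continuous input; both are assumptions of the sketch. Note that the natural unit of the result is gigabits, and dividing by 8 gives gigabytes.

```python
import math

BITS_PER_DIGIT = math.log2(10)  # one of 10 equiprobable digits, 0-9
DIGITS_PER_SECOND = 1           # assumption: ~7 digits take several seconds
SECONDS_PER_YEAR = 365.25 * 24 * 3600
YEARS = 80                      # assumption: lifetime length

bits_per_second = BITS_PER_DIGIT * DIGITS_PER_SECOND
lifetime_bits = bits_per_second * SECONDS_PER_YEAR * YEARS
print(f"{bits_per_second:.2f} bits/s -> {lifetime_bits / 1e9:.1f} gigabits "
      f"(~{lifetime_bits / 8 / 1e9:.2f} GB)")
```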

A good reader will read at 200-300 words per minute. Claude Shannon estimated a single character is <1 bit, so English words would weigh in at perhaps 8 bits per word (as vocabularies tend to top out at around 100,000 words, each word would be at most ~17 bits). At 300 words per minute that is 40 bits per second, so at 1 hour a day, a lifetime of reading would convey a maximum of roughly 0.5GB. Landauer’s subjects read 180 words per minute and he estimated 0.4 bits per word, for a rate of 1.2 bits per second.
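The reading bound works out as follows. This is a minimal sketch assuming an 80-year lifetime, which is the sketch's assumption; the other constants come from the paragraph above. Landauer's figure is included for comparison.

```python
import math

WORDS_PER_MINUTE = 300   # fast end of 200-300 wpm
BITS_PER_WORD = 8        # rough estimate from ~1 bit per character
HOURS_PER_DAY = 1
YEARS = 80               # assumption: lifetime length

bits_per_second = WORDS_PER_MINUTE * BITS_PER_WORD / 60
lifetime_bits = bits_per_second * 3600 * HOURS_PER_DAY * 365 * YEARS

# A fixed-length code for a 100,000-word vocabulary needs log2(100,000) bits/word.
vocab_bits = math.log2(100_000)

# Landauer: 180 words/minute at 0.4 bits/word.
landauer_bps = 180 / 60 * 0.4

print(f"{bits_per_second:.0f} bits/s; lifetime ~{lifetime_bits / 8 / 1e6:.0f} MB")
print(f"max bits/word for a 100k vocabulary: {vocab_bits:.1f}")
print(f"Landauer's reading rate: {landauer_bps:.1f} bits/s")
```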

Deep learning researcher Geoffrey Hinton has repeatedly noted the following, in an observation echoed by other researchers such as Yann LeCun when discussing supervised vs reinforcement vs unsupervised learning. The number of synapses & neurons in the brain, relative to the length of our lifetime, implies that much of our brain must be spent learning generic unsupervised representations of the world (such as via predictive modeling):

The brain has about 10^14 synapses and we only live for about 10^9 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 10^5 dimensions of constraint per second.

Since we all experience similar perceptual worlds with shared optical illusions etc, and there is little feedback from our choices, this would appear to challenge the idea of large individual differences. (This is a particularly acute problem given that in reinforcement learning, the supervision is limited to the rewards. Rewards provide a fraction of a bit spread over thousands or millions of actions, taken in perceptual states that may each comprise hundreds of thousands or millions of values.)
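Hinton's ratio can be made concrete with his round numbers. The minimal sketch below compares it against the ~40 bits/s upper bound on reading estimated earlier. That comparison is this sketch's own framing, not Hinton's, and it treats one parameter as needing at least one bit of constraint, which is a simplifying assumption.

```python
# Hinton's round figures, as quoted above.
SYNAPSES = 10**14
LIFETIME_SECONDS = 10**9   # roughly 32 years

# Upper bound on reading throughput from earlier in this essay
# (300 words/minute at ~8 bits/word); used only for comparison.
READING_BITS_PER_SECOND = 40

params_per_second = SYNAPSES // LIFETIME_SECONDS
shortfall = params_per_second // READING_BITS_PER_SECOND

print(f"~{params_per_second:,} parameters to constrain per second of life")
print(f"reading alone supplies ~{shortfall:,}x too few bits")
```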

Eric Drexler notes that current deep learning models in computer vision have now achieved near-human or superhuman performance across a wide range of tasks, with model sizes typically around 500MB. The visual cortex and brain regions related to visual processing make up a large fraction of the brain (perhaps as much as 20%). This suggests one of three things: deep models are extremely efficient compared to the brain2, the brain’s visual processing is powerful in a way not captured by any benchmarks, or a relatively small number of current GPUs (eg 1000-15000) are equivalent to the brain.3 Consider also animals by neuron count. Other primates and cetaceans can have neuron counts fairly similar to humans’, which implies that much of the brain is dedicated to basic mammalian tasks like vision or motor coordination. It follows that all the things we see as critically important to our sense of self, such as our religious or political views, are supported by a small fraction of the brain indeed.

Overparameterization and biological robustness

The human brain appears to be highly redundant (albeit with functionality highly localized to specific points, as demonstrated by lesions and grandmother cells). It is also capable of coping with the loss of large amounts of tissue, particularly if the damage is slow or occurs early in life. The brains of the elderly are noticeably smaller than those of young people, as neural loss continues throughout the aging process. But this results in loss of identity & continuity only after decades of decay and at extremes like senile dementia & Alzheimer’s disease; likewise, personal identity can survive concussion or other trauma.4 Epileptic patients who undergo hemispherectomy, an effective procedure which surgically disconnects or entirely removes 1 cerebral hemisphere, typically recover well as their remaining brain adapts to the new demands, albeit sometimes with side-effects; the procedure is often cited as an example of neuroplasticity. Reported cases of people with hydrocephalus, such as that due to Dandy-Walker syndrome, show that apparently profoundly reduced brain volumes, as low as 10% of normal, are still capable of relatively normal functioning5. Similarly, human brain volumes vary considerably, and the correlation between intelligence and brain volume is relatively weak, both within humans and between species. Many animals & insects are capable of surprisingly complex specialized behavior in the right circumstances6 despite sometimes having few neurons (eg the Portia spider).

An interesting parallel is with artificial neural networks/deep learning. It is widely known that the NNs typically trained, with millions or billions of parameters and dozens or hundreds of layers, are grossly overparameterized & deep & energy/FLOPS-demanding. We know this because there are many ways to train NNs of equivalent intelligence which need only a few layers, many fewer parameters, or an order of magnitude fewer FLOPS. (Some examples of NNs being compressed in size or FLOPs by anywhere from 50% to ~17,000%: Molchanov et al 2017, Gastaldi 2017, Narang et al 2017, Neklyudov et al 2017, Zhang et al 2017, Lobacheva et al 2017, Botha et al 2017, Rawat & Wang 2017, Xu et al 2017, Ashok et al 2017, Zhu & Gupta 2017, Shayer et al 2017, Dai et al 2017, Kligvasser et al 2017, Zun et al 2017, Wu et al 2017, Lin et al 2017, Ye et al 2017.) This implies that while a NN may initially be extremely expensive to train to a given level of performance, and may be large and slow, order-of-magnitude gains in efficiency are then possible (in addition to the gains from various forms of hyperparameter optimization or agency). There is no reason to expect this not to apply to a human-level NN, so while the first human-level NN may require supercomputers to train, it will quickly be possible to run it on vastly more modest hardware (for 2017-style NNs, at least, there is without a doubt a serious hardware overhang).

Why can those small, shallow, fast NNs not be trained directly, instead of being distilled/compressed down from larger NNs? One suggestion is that the larger NNs are inherently easier to train: the overparameterization provides a smoother loss landscape and more ways to travel between local optima and find a near-global optimum. This would explain why NNs need to be overparameterized to train in a feasible amount of time, and perhaps why human brains are so overparameterized as well. So artificial NNs are not as inherently complex as they seem, but have much simpler forms; perhaps human brains likewise are not as complex as they look, and can be compressed down to much simpler, faster forms.

Predictive Complexity

A third tack is to treat the brain as a black box and take a Turing-test-like view: if a system’s outputs can be mimicked or predicted reasonably well by a smaller system, then that is the real complexity. So, how predictable are human choices and traits?

One major source of predictive power is genetics.

A whole human genome has ~3 billion base-pairs, which can be stored in 1-3GB7. Individual human differences in genetics turn out to be fairly modest in magnitude: there are perhaps 5 million small mutations and 2500 larger structural variants compared to a reference genome (Auton et al 2015, Chaisson et al 2015, Seo et al 2016, Paten et al 2017), many of which are correlated or familial, so given a reference genome such as a relative’s, the 1GB shrinks to a much smaller delta, ~125MB with one approach, but possibly down to the MB range with more sophisticated encoding techniques such as using genome graphs rather than reference sequences. Just SNP genotyping (generally covering common SNPs present in >=1% of the population) will typically cover ~500,000 SNPs with 3 possible values, requiring less than 1MB.
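The storage figures follow from simple encodings. Below is a minimal sketch. It assumes a naive 2-bit packing per base for a single haploid ~3-billion-base sequence (ignoring diploidy and ambiguous bases), and a uniform distribution over the 3 genotypes per SNP. The latter gives the maximum entropy, so real SNP data would compress further.

```python
import math

BASE_PAIRS = 3_000_000_000
BITS_PER_BASE = 2        # A, C, G, T -> 2 bits with naive packing
SNPS_ON_ARRAY = 500_000
GENOTYPES_PER_SNP = 3    # e.g. AA, AB, BB at a biallelic SNP

genome_bytes = BASE_PAIRS * BITS_PER_BASE / 8
# Entropy bound assuming all 3 genotypes are equally likely (an upper bound).
snp_bytes = SNPS_ON_ARRAY * math.log2(GENOTYPES_PER_SNP) / 8

print(f"2-bit genome: {genome_bytes / 1e9:.2f} GB (vs 3 GB at 1 byte/base)")
print(f"SNP array: ~{snp_bytes / 1e3:.0f} KB")
```

The 2-bit packing gives the ~0.75GB lower end, while storing one ASCII character per base gives the 3GB upper end.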

A large fraction of individual differences can be ascribed to those small genetic differences. (For more background on modern behavioral genetics findings, see my genetics link bibliography.) Twin studies cover thousands of measured traits: blood panels of biomarkers, diseases, anthropometric traits like height, psychological traits like personality or intelligence, and social outcomes like socioeconomic status. Across them, the average measured heritability is ~50% (Polderman et al 2015), with shared family environment accounting for 17%. This is a lower bound, as it omits corrections for issues like assortative mating or measurement error or genetic mosaicism (particularly common in neurons). SNPs alone account for somewhere around 15% (see GCTA & Ge et al 2016; same caveats). Humans are often stated to have relatively little genetic variation compared to other wild species, apparently due to loss of genetic diversity during the human expansion out of Africa; presumably if much of that diversity remained, heritabilities might be even larger. Further, there is pervasive pleiotropy in the genetic influences, where traits overlap and influence each other, making them more predictable. The genetic influence is also often partially responsible for the longitudinal consistency of individuals’ traits over their lifetimes.

In psychology and sociology research, some variables are so pervasively & powerfully predictive that they are routinely measured & included in analyses of everything. Examples include the standard demographic variables of gender, ethnicity, socioeconomic status, intelligence, or Big Five personality factors. Nor is it uncommon to discover that a large mass of questions can be boiled down to a few hypothesized latent factors (eg philosophy). Consider the extensive predictive power of stereotypes (Jussim et al 2015), the ability of people to make above-chance thin-slicing guesses based on faces or photographs of bedrooms, or any number of odd psychology results. A single number such as IQ, for example, can predict much of the variance in educational outcomes, and predicts more in conjunction with the Big Five’s Conscientiousness & Openness. IQ is a normally distributed variable, whose differential entropy is $\frac{1}{2} \ln(2 \pi e \sigma^2)$, or ≈1.42 nats ≈ 2.05 bits on a standardized scale with $\sigma = 1$. A full-scale IQ test will also provide measurements of relevant variables such as visuospatial ability or vocabulary size, and in total might be ~10 bits. So a large number of such variables could be recorded in a single kilobyte while predicting many things about a person.
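The entropy figure is easy to check. Below is a minimal sketch. It relies on a standard caveat: differential entropy depends on the units of measurement, so it is not literally a count of stored bits. The information in an actual recorded score depends on the precision to which it is measured.

```python
import math


def normal_entropy_nats(sigma: float) -> float:
    """Differential entropy of N(mu, sigma^2) in nats: 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)


h_nats = normal_entropy_nats(1.0)   # standardized scale, sigma = 1
h_bits = h_nats / math.log(2)       # convert nats to bits
print(f"sigma=1: {h_nats:.2f} nats = {h_bits:.2f} bits")

# On the conventional IQ scale (sigma = 15) the value shifts by log2(15) bits,
# illustrating the dependence on units.
print(f"sigma=15: {normal_entropy_nats(15) / math.log(2):.2f} bits")
```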

More generally, almost every pair of variables shows a small but non-zero correlation (assuming sufficient data to overcome sampling error), leading to the observation that everything is correlated. This is generally attributed to all variables being embedded in universal causal networks. As a result, as Meehl notes, one can find baffling and (only apparently) arbitrary differences, such as male/female differences in agreement with the statement “I think Lincoln was greater than Washington.” Thus, not only are there a few numbers which can predict a great deal of variance, but even arbitrary variables are collectively less unpredictable than they may seem, due to the hidden connections.

Personal Identity and Unpredictability

After all this, there will surely be errors in modeling and many recorded actions or preferences will be unpredictable. To what extent does this error term imply personal complexity?

Measurement error is inherent in all collectible datapoints, from IQ to Internet logs. Some of this is due to idiosyncrasies of the specific measuring method, such as the exact phrasing of each question, but a decent chunk of it averages out when measurements are repeated many times over long time intervals. The implication is that our responses have a considerable component of randomness to them, and so perfect reconstruction to match recorded data is both impossible & unnecessary.

Taking these errors too seriously would lead to a difficult position: that these transient effects are vital to personal identity. This brings up the classic free will vs determinism objection. If we have free will in the sense of unpredictable, uncaused choices, then the choices cannot follow by physical law from our preferences & wishes; in what sense is this our own will, and why would we want such a dubious gift at all? Similarly, suppose our responses to IQ tests or personality inventories have daily fluctuations which cannot be traced to any stable long-term properties of our selves, nor to immediate aspects of our environments, but instead appear to be entirely random & unpredictable. How can such fluctuations ever constitute a meaningful part of our personal identities?

Data sources

In terms of cost-effectiveness and usefulness, there are large differences in cost & storage-size for various possible information sources:

  • IQ and personality inventories: free to ~$100; 1-100 bits
  • whole-genome sequencing: <$1000; <1GB

    Given the steep decrease in genome sequencing costs historically, one should probably wait another decade or so to do any whole-genome sequencing.
  • writings, in descending order; $0 (naturally produced), <10GB (compressed)

    • chat/IRC logs
    • personal writings
    • lifetime emails
  • diarizing: ~3,650 hours or ~$26k? (10 minutes per day over 60 years at a minimum wage of $7.25/hour), <2MB
  • autobiography: ? hours; <1MB
  • cryonics for brain preservation: $80,000+, ?TB
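The diarizing cost is the only entry that needs arithmetic. Here is a minimal sketch, assuming the US federal minimum wage of $7.25/hour and an average year of 365.25 days; both are assumptions of the sketch.

```python
MINUTES_PER_DAY = 10
YEARS = 60
DAYS_PER_YEAR = 365.25   # assumption: average year including leap days
HOURLY_WAGE_USD = 7.25   # assumption: US federal minimum wage

hours = MINUTES_PER_DAY * DAYS_PER_YEAR * YEARS / 60
cost_usd = hours * HOURLY_WAGE_USD
print(f"{hours:,.0f} hours, ~${cost_usd:,.0f} at ${HOURLY_WAGE_USD}/hour")
```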

While just knowing IQ or Big Five is certainly nowhere near enough for continuity of personal identity, it’s clear that measuring those variables has much greater bang-for-buck than recording the state of some neurons.

Depth of data collection

For some of these, one could spend almost unlimited effort, like in writing an autobiography - one could extensively interview relatives or attempt to cue childhood memories by revisiting locations or rereading books or playing with tools.

But would that be all that useful? If one cannot easily recollect a childhood memory, then (pace the implications of childhood amnesia and the free will argument) can it really be all that crucial to one’s personal identity? And any critical influences from forgotten data should still be inferable from the effects; if one is, say, badly traumatized by one’s abusive parents into being unable to form romantic relationships, the inability is the important part and will be observable from a lack of romantic relationships, and the forgotten abuse doesn’t need to be retrieved & stored.

TODO: efficient natural language; Death Note Anonymity; Conscientiousness and online education; Simulation inferences; docs/psychology/2012-bainbridge.pdf; Wechsler, range of human variation

  1. If p = 0.2 (based on the 80% success rate), then $\frac{10000}{1 - (p \times \log_2 p + (1 - p) \times \log_2 (1 - p))} = 5807.44$.

  2. Not that improbable, given how incredibly bizarre and convoluted much of neurobiology is, and the severe metabolic, biological, evolutionary, genetic, and developmental constraints a brain must operate under.

  3. Landauer notes something very similar in his conclusion:

    Thus, the estimates all point toward a functional learned memory content of around a billion bits for a mature person. The consistency of the numbers is reassuring…Computer systems are now being built with many billion bit hardware memories, but are not yet nearly able to mimic the associative memory powers of our billion bit functional capacity. An attractive speculation from these juxtaposed observations is that the brain uses an enormous amount of extra capacity to do things that we have not yet learned how to do with computers. A number of theories of human memory have postulated the use of massive redundancy as a means for obtaining such properties as content and context addressability, sensitivity to frequency of experience, resistance to physical damage, and the like (e.g., Landauer, 1975; Hopfield, 1982; Ackley, Hinton, & Sejnowski, 1985). Possibly we should not be looking for models and mechanisms that produce storage economies (e.g., Collins & Quillian, 1972), but rather ones in which marvels are produced by profligate use of capacity.

  4. The classic example of Phineas Gage shows that large sections of the brain can be destroyed by trauma while still preserving continuity of identity but damaging or eliminating continuity of self: Gage was able to remember his life and remain functional, but suffered severe personality changes.

  5. A widely discussed paper, de Oliveira et al 2012’s Revisiting hydrocephalus as a model to study brain resilience, has been retracted. John Hawks notes some reasons to be skeptical of John Lorber’s original anecdote (Lewin 1980) about a hydrocephalic with an IQ of 126 & a math degree.

  6. Posing serious challenges to attempts to measure animal intelligence, as it is easy both to anthropomorphize & grossly overestimate their cognition and to grossly underestimate it due to poor choices of rewards, sensory modalities, or tasks; see If a Lion Could Talk: Animal Intelligence and the Evolution of Consciousness.

  7. The raw data from the sequencer would be much larger as it consists of many repeated overlapping sequence runs, but this doesn’t reflect the actual genome’s size.