Modular Brain AUNNs for Uploads

Proposal for applying the AUNN neural net architecture to reconstruction of brains in a modular piece-wise fashion.

Emulating an entire brain from scratch using a single monolithic AUNN is probably too hard.

We can instead continue the previous Herculaneum papyri discussion of embeddings & constraints and propose a modularized brain, where each module is an AUNN instance, communicating embeddings with other AUNNs. These AUNNs are then collectively trained both to reconstruct the raw data of their region, to learn local algorithms, and to reconstruct global metadata like coarse EEG signals or functional connectivity, to emulate overall activity.

The modularization helps with tractability, but it also enables progressive replacement of AUNN units with more biologically-plausible simulations, and can help prioritize which brain regions are of most scientific value to scan, conserving limited scanning resources.

With sufficiently good brain AUNNs, this may even enable upload of specific individuals by scanning the minimum possible brain regions which functionally distinguish individuals.

My second proposed application is for making progress towards whole brain emulation. I previously alluded to AUNNs being a good fit for the high-dimensional, multimodal, sparse, heterogeneous piles of data accumulated about many brains, which would allow concentrating all knowledge into a single model which can flexibly transfer across them all to make up for the radically underspecified data & structure.

Challenges

But brains don’t look like a simple ‘drop in’ AUNN application for a few reasons:

  1. brains are enormously complex objects; information-theoretic calculations suggest that brains contain far more information than other kinds of data like images or videos. Where regular data might have kilobytes of intrinsic information, brains will have gigabytes. The AUNN for a single brain, much less thousands of them, will be truly titanic.

  2. brain data is available at every scale from single neurons to total brain metabolic activity, but spatial coverage is inversely related to resolution: a global brain measure provides information on no individual neurons or even regions, while a single-neuron recording dataset, which records that neuron’s activity in incredible detail, may cover just one neuron out of the entire brain. Similarly, connectome data comes in individual slices, which allow detailed electron-microscope images but only of a vanishingly small region.

  3. brain data is almost entirely disjoint: 1 modality per subject. The single-neuron recording comes from mouse A, while the EEG comes from mouse B, and the connectome slices come from mouse C, and so on. Creating a harmonized unified dataset on deeply-phenotyped individual subjects would be nice (and has shown great value in cases like the UK Biobank), but given the realities of research & funding, one probably shouldn’t wait on it. In some cases, this is impossible: one cannot record the EEG in old age of a mouse already sacrificed as a pup for connectome scanning, and the fixation of connectome scanning rules out many other post-mortem analyses like transcriptomes.

  4. brain data is consistently sparse/biased: while some regions may have a relatively large amount of data, others are terra incognita across all datasets. There may be weak information available from fMRI or constraints from global measures like EEG, but there will be no data at all from many other modalities like connectome scanning (which usually proceeds region by region, prioritizing the scientifically-interesting or easily-scanned regions). The dark regions in a specific brain cannot be reconstructed from other brains because the regions are missing there too.

  5. brains are simultaneously modularized & global: we know much about how brains have highly distinct specialized regions which are densely self-interconnected and communicate with each other in consistent patterns, but there are also some long-range connections everywhere, forming small-world networks. Learning all this from scratch will be challenging.

Given all this, if we do the naive approach of creating the largest AUNN we can and then training on all available data, it will probably not work well. It will be impossibly data-hungry, struggle to transfer at all, and leave large brain regions untrained & producing total nonsense.

Modularized AUNNs

How can we fix this up? One approach would be to take some inspiration from old scanning proposals like ‘Moravecian uploading’ and treat it as a divide-and-conquer problem: pick a neuron, and model it until you can predict it accurately; then replace it with the model, and study the next neuron. The brain is modular, so let’s break it up into modules and assign an AUNN to each module.

Now that it’s learning just this one brain region, the AUNN has a much easier job. The individual AUNNs will be useful on their own, but they can also be used together as a single ‘AUNN brain’. In fact, why have separate AUNNs? Let’s just jointly train a single AUNN which is provided with brain-module & subject indices in addition to its regular index, similar to having an ID for individual papyri: this is critical structure to encode into the AUNN.
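
As a minimal sketch of this conditioning (assuming a PyTorch MLP as the AUNN body; the vocabulary sizes, dimensions, and index encoding below are placeholders, not part of the proposal), the only architectural change is that the net consumes learned module & subject embeddings alongside the data index:

```python
import torch
import torch.nn as nn

class ModularAUNN(nn.Module):
    """One tied-weight AUNN conditioned on (data index, module ID, subject ID)."""
    def __init__(self, n_modules, n_subjects, d=256, out_dim=1, max_index=2**20):
        super().__init__()
        self.module_emb = nn.Embedding(n_modules, d)    # learned module-ID code
        self.subject_emb = nn.Embedding(n_subjects, d)  # learned subject-ID code
        self.index_proj = nn.Linear(1, d)               # any index encoding would do here
        self.body = nn.Sequential(
            nn.Linear(3 * d, 4 * d), nn.GELU(),
            nn.Linear(4 * d, 4 * d), nn.GELU(),
            nn.Linear(4 * d, out_dim),
        )
        self.max_index = max_index

    def forward(self, index, module_id, subject_id):
        # Normalize the raw data index so the MLP sees a bounded input.
        x = self.index_proj((index.float() / self.max_index).unsqueeze(-1))
        m = self.module_emb(module_id)
        s = self.subject_emb(subject_id)
        return self.body(torch.cat([x, m, s], dim=-1))

# Usage: predict datum #12345 of module 7 in subject 3.
net = ModularAUNN(n_modules=100, n_subjects=50)
y_hat = net(torch.tensor([12345]), torch.tensor([7]), torch.tensor([3]))
```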

AUNN DAG. OK, so we have a single AUNN which we can condition on module+subject to model a specific brain region. Conceptually, we have ‘unrolled’ the AUNN region by region, but still optimize a single model instance with tied-weights.1 That helps deal with the overall complexity and the disjointness, but it doesn’t address the sparse/biased availability of data, or the global connectivity.

Cross-AUNN Embeddings

OK, so how do we connect them up? We can use the same trick as with the Herculaneum papyri, except instead of a language model providing additional supervision, it is now another AUNN, which takes an embedding and provides some sort of loss on it to improve global connectivity & module communication. This AUNN itself provides an embedding for other AUNNs, and so on through the directed acyclic graph of brain modules. (In the interests of tractability, we punt on true recurrence/bidirectional connections by using a DAG of AUNNs plus time-steps.)
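
One possible concrete instantiation, with the caveat that the toy DAG, the mean-pooled parent context, and the `link` head below are all assumptions for illustration: each module’s AUNN emits an embedding for its children, and a consistency loss penalizes mismatch between what the parents’ embeddings imply and what the child actually produces.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy DAG of brain modules: module -> list of parent modules.
DAG = {0: [], 1: [0], 2: [0, 1], 3: [2]}
D = 128

class EmbeddingAUNN(nn.Module):
    """Shared AUNN that also emits an embedding for downstream modules."""
    def __init__(self, n_modules, d=D):
        super().__init__()
        self.module_emb = nn.Embedding(n_modules, d)
        self.index_proj = nn.Linear(1, d)
        self.trunk = nn.Sequential(nn.Linear(3 * d, 2 * d), nn.GELU())
        self.pred_head = nn.Linear(2 * d, 1)   # local raw-data prediction
        self.emb_head = nn.Linear(2 * d, d)    # embedding handed downstream
        self.link = nn.Linear(d, d)            # parent context -> expected child embedding

    def forward(self, index, module_id, parent_context):
        h = self.trunk(torch.cat([self.index_proj(index),
                                  self.module_emb(module_id),
                                  parent_context], dim=-1))
        return self.pred_head(h), self.emb_head(h)

def dag_pass(net, index):
    """One topological sweep over the module DAG; returns per-module embeddings
    and a consistency loss coupling each module to its parents."""
    embs, consistency = {}, torch.tensor(0.0)
    for m, parents in DAG.items():             # dict order is already topological here
        ctx = (torch.stack([embs[p] for p in parents]).mean(0)
               if parents else torch.zeros(1, D))
        pred, emb = net(index, torch.tensor([m]), ctx)
        if parents:                            # downstream supervision of the embedding
            consistency = consistency + F.mse_loss(net.link(ctx), emb)
        embs[m] = emb
    return embs, consistency

# Usage: one normalized data index, shared net across all modules.
net = EmbeddingAUNN(n_modules=len(DAG))
embs, c_loss = dag_pass(net, torch.tensor([[0.5]]))
```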

Global Losses

How would one learn global connectivity? Well, you could go bottom-up: model a brain in such detail that you know every angstrom of it and so the global connections are all correct and any simulated dynamics Just Work™. But the other way is the way the brain learned global connectivity in the first place: top-down, and the right global connectivity is whatever makes the simulated dynamics Just Work™. If you have a lot of modules with unknown connectivity, but you have global data like diffusion tensor imaging or EEG or fMRI over time or Big Five personality inventories or IQ, you now have a target to optimize the modules’ connectivity for: whatever connections make the modules collectively emit the right EEG or fMRI.
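
A toy sketch of such a top-down global loss, under strong simplifying assumptions (each module’s activity is a single scalar per time-step, the EEG is approximated as a linear readout of module activities, and the unknown inter-module connectivity is an explicit learnable matrix):

```python
import torch
import torch.nn as nn

class GlobalDynamicsLoss(nn.Module):
    """Fit inter-module connectivity so that simulated activity, aggregated through
    a crude EEG-like readout, matches an observed global signal."""
    def __init__(self, n_modules, n_channels):
        super().__init__()
        # Learnable directed connectivity between modules: the unknown we care about.
        self.W = nn.Parameter(0.01 * torch.randn(n_modules, n_modules))
        # Linear readout standing in for e.g. an EEG lead field.
        self.readout = nn.Linear(n_modules, n_channels, bias=False)

    def forward(self, local_activity, observed_eeg):
        # local_activity: (T, n_modules) activities predicted by the module AUNNs.
        # observed_eeg:   (T, n_channels) measured global signal to match.
        sim = [local_activity[0]]
        for t in range(1, local_activity.shape[0]):
            # Next-step activity = local drive + whatever flows in over the connections.
            sim.append(torch.tanh(local_activity[t] + sim[-1] @ self.W))
        sim = torch.stack(sim)                             # (T, n_modules)
        return torch.mean((self.readout(sim) - observed_eeg) ** 2)

# Usage with random stand-in data: 100 time-steps, 20 modules, 32 EEG channels.
loss_fn = GlobalDynamicsLoss(n_modules=20, n_channels=32)
loss = loss_fn(torch.randn(100, 20), torch.randn(100, 32))
```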

Fire together, wire together. So, our hooked-up AUNNs are all being trained both to predict their own specific raw data and to optimize for a global pattern, like a brain imaging reading of which brain regions are most activated or the brain making a discrete choice. (At this point we must begin furiously handwaving over the details of these global losses: many are not differentiable like a LLM is, so may require blackbox optimization like RL, or may require a limited form of BPTT in unfolding the AUNN over multiple time-steps to match a time-series.) These constrain the embeddings being passed around, and also the connectivity itself: AUNNs can be connected to more distant AUNNs rather than just the immediately neighboring ones, but with regularization to discourage long-range communication, keeping it to the minimum & creating sparsity.
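
The sparsity side of this could be as simple as a distance-weighted L1 penalty on that connectivity matrix, so that local wiring is barely penalized while long-range wiring has to earn its keep through the global losses (the distance matrix here is a hypothetical precomputed inter-module distance):

```python
import torch

def connection_sparsity_penalty(W, distances, lam=1e-3):
    """Distance-weighted L1 penalty on the module connectivity matrix:
    nearby wiring is barely penalized, long-range wiring must earn its keep."""
    return lam * (distances * W.abs()).sum()

# e.g. total_loss = local_reconstruction + global_dynamics_loss
#                 + connection_sparsity_penalty(loss_fn.W, D)
# where D[i, j] is a precomputed physical distance between modules i and j.
```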

Progressive Uploading

Lego blocks. Aside from trying to bring a whole brain emulation into the realm of feasibility at all, taking a modularized approach has some benefits: we can progressively swap in more neurologically-realistic components, using AUNNs as glue or for unknown areas. At some point scans or first-principles simulations may be good enough to model a nontrivial region at acceptable fidelity (perhaps having been initialized using predictions from the respective AUNN module); that region can be swapped in, and removed from the AUNN training indices.
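
Mechanically, the swap could be as simple as a registry that routes each module ID either to the shared AUNN or to its replacement simulator, dropping swapped modules from the training set (a sketch assuming the ModularAUNN-style interface above; `N_MODULES` and the simulator callables are placeholders):

```python
import torch

N_MODULES = 100                      # placeholder count of brain modules
simulators = {}                      # module_id -> callable(index, subject_id) -> output
trainable_modules = set(range(N_MODULES))

def swap_in(module_id, simulator):
    """Replace an AUNN module with a higher-fidelity simulation and
    drop it from the AUNN's training indices."""
    simulators[module_id] = simulator
    trainable_modules.discard(module_id)

def query(aunn, index, module_id, subject_id):
    """Route a query to the swapped-in simulator if one exists, else to the shared AUNN."""
    if module_id in simulators:
        return simulators[module_id](index, subject_id)
    return aunn(index, torch.tensor([module_id]), torch.tensor([subject_id]))
```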

If active learning can be made to work for AUNNs (or at all, really), then it would assist progressive uploading by showing where to prioritize scans.
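
One hedged sketch of what that prioritization might look like: rank candidate modules by predictive uncertainty, here approximated with Monte Carlo sampling under dropout (this assumes the AUNN contains dropout layers; an ensemble or any other uncertainty estimate would serve just as well).

```python
import torch

def rank_regions_for_scanning(aunn, candidate_indices, module_ids, subject_id, n_samples=8):
    """Rank brain modules by Monte Carlo predictive variance at a set of candidate
    data indices; the most uncertain modules get scanned first."""
    aunn.train()                     # keep dropout active for Monte Carlo sampling
    scores = {}
    for m in module_ids:
        preds = torch.stack([
            aunn(candidate_indices,
                 torch.full_like(candidate_indices, m),
                 torch.full_like(candidate_indices, subject_id))
            for _ in range(n_samples)
        ])
        scores[m] = preds.var(dim=0).mean().item()   # disagreement across samples
    return sorted(scores, key=scores.get, reverse=True)

# e.g. priorities = rank_regions_for_scanning(net, torch.arange(1000), range(100), subject_id=3)
```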

We are the 1%. And perhaps we do not need to scan that much, if we can scan the right parts. It may well be that available datasets can constrain most of human brain structures in general; our brains are all built on the same principles and constructed by almost identical genomes and learning the same world and how to do similar things, and personal identity is robust to huge changes in most of the brain (while being extremely sensitive to damage in specific regions like the prefrontal cortex). We are not so different, you and I—that I am I and you are you is but an accident of personal history—a few million letters of DNA, a few megabytes of personal history… That none of our neurons are connected identically can hide the enormous overlap of our brains, but cannot erase it. This is common in neural networks: networks that may seem to not share a single parameter in common turn out to be almost the same given the right permutation symmetry or embedding, or turn out to map onto similar highly-linear low-dimensional latents (and this appears to be more true for more powerful models, in another blessing of scale). Or consider the enormous successes in generative models of few-shot learning & finetuning: the better the generative model, the fewer samples, parameters, or FLOPS required.2

Few-Shot Minds

From this perspective, we can implement uploading of specific specimens by pretraining on as many subjects & modalities as possible to build a foundation brain model; then specializing to a specific subject by acquiring as much of the subject’s global information to condition on as possible, doing active-learning prioritization of regions, and iteratively scanning until the budget runs out. As the foundation brain model gets better, the necessary amount of scanning decreases.
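
A minimal sketch of that specialization step, assuming the ModularAUNN-style interface above: freeze the foundation model and fit only the new subject’s embedding on whatever scan batches the budget allows (the learning rate, batch format, and squared-error loss are placeholders).

```python
import torch

def specialize_to_subject(aunn, new_subject_id, scan_batches, budget, lr=1e-3):
    """Few-shot specialization: freeze the foundation brain model and fit only the
    subject embedding table on whatever scan batches the budget allows."""
    for p in aunn.parameters():
        p.requires_grad_(False)
    aunn.subject_emb.weight.requires_grad_(True)   # only the new subject's row gets gradient
    opt = torch.optim.Adam([aunn.subject_emb.weight], lr=lr)
    for step, (index, module_id, target) in enumerate(scan_batches):
        if step >= budget:                         # stop when the scan budget is spent
            break
        pred = aunn(index, module_id, torch.full_like(module_id, new_subject_id))
        loss = torch.mean((pred.squeeze(-1) - target) ** 2)
        opt.zero_grad(); loss.backward(); opt.step()
    return aunn
```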


  1. An interesting efficiency question here would be how to specialize each AUNN instance: presumably one should be able to partially-evaluate an AUNN with just the module+subject index, to produce a specialized AUNN which can then train at high speed on solely data index inputs. But how can one do that, and how can one merge the gradients across the many specialized AUNNs to improve the original generalist AUNN? Note that if we can do that, we can accelerate AUNN training in general for any case where we are willing to build in IDs. In an RNN, as long as the specialization is the start of an episode, one can simply cache the hidden state, and initialize all instances with that before BPTT; in a Transformer or AUNN, it’s less clear how this would work—propagate the activations until they reach a dependency on unknown inputs, then store an irregular triangle-ish model snapshot?↩︎

  2. For example, the mid-2023 efficiency of image generators for specialization would have struck most machine learning practitioners in 2019–2020 as simply unbelievable. I know, because in 2019 I was trying, with little success, to convince people that instead of pursuing super-specialized GANs and carefully curated datasets, their efforts would be better spent creating a single large scaled-up generative model trained on all available datasets, which they could then prompt or finetune for their particular interest.↩︎
