“Crowdsourcing The Best GPT-2-1.5b Poetry”, (2020-02-09):
[Public-editable Google Docs document for coordinating a read through a large sample of neural-net-generated poetry, to locate the best poem samples for displaying in the GPT-2 writeup.]
I used a large neural net model, GPT-2-1.5b, trained on hundreds of megabytes of poetry, to generate 1 million words of poetry. That’s too much for me to read by myself to find the best poems. Perhaps you’d like to help?
- Pick an unread URL from ‘Open Samples’ below, open it, and remove it from the list.
- Read it. (Each URL is ≤ 1000 lines, so it should be fun.)
- Add any good poems to ‘Selected Samples’ at the end of this document.
- Enjoy reading the current ‘Selected Samples’—or pick another URL to read!
SingleFile is a Web Extension (and a CLI tool) compatible with Chrome, Firefox (Desktop and Mobile), Microsoft Edge, Vivaldi, Brave, Waterfox, Yandex browser, and Opera. It helps you to save a complete web page into a single HTML file.
SingleFile can be installed on:
- Firefox: https://addons.mozilla.org/firefox/addon/single-file
- Chrome: https://chrome.google.com/extensions/detail/mpiodijhokgodhhofbcjdecpffjipkle
- Microsoft Edge: https://microsoftedge.microsoft.com/addons/detail/efnbkdcfmcmnhlkaijjjmhjjgladedno
Admonitions are small break-out boxes with notes, tips, warnings, etc. for the reader. They begin with a title line of a pattern of three exclamation marks, an optional CSS class, and an optional title. All following lines that are indented at least three spaces are included in the body, which may include multiple paragraphs. The default stylesheet provides classes for “note” (default), “tip”, “warning”, and “error”.
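A minimal sketch of the admonition syntax just described (the CSS class and the title are both optional; indentation of at least three spaces marks the body):

```
!!! WARNING: Check your backups
    Any following line indented at least three spaces
    is included in the admonition body.

    Multiple paragraphs are allowed.
```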
You can use the CSS columns style to make an HTML multicolumn block. Then, just use regular Markdeep within it and the browser will automatically apply your multicolumn layout… multi-column only works well if you know that you have very short sections (as in this example), or if you were planning on printing to separate pages when done.
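Sketched concretely, a multicolumn block is ordinary Markdeep wrapped in an HTML element carrying the CSS `columns` style (vendor-prefixed properties shown for older browsers; the exact attribute values are illustrative):

```
<div style="columns:2; -webkit-columns:2; -moz-columns:2">

Regular Markdeep content here, which the browser
automatically flows across the columns.

</div>
```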
Markdeep is a technology for writing plain text documents that will look good in any web browser, whether local or remote. It supports diagrams, calendars, equations, and other features as extensions of Markdown syntax. Because the source is plain text, Markdeep works well with software development toolchains. Markdeep is free and easy to use. It doesn’t require a plugin or Internet connection. Your document never leaves your machine and there’s nothing to install. Just start writing in your favorite text editor. You don’t have to export, compile, or otherwise process your document. Here’s an example of a text editor and a browser viewing the same file simultaneously:…Markdeep is ideal for design documents, specifications, README files, code documentation, lab reports, blogs, and technical web pages.
Markdeep was created by Morgan McGuire (Casual Effects) with inspiration from John Gruber’s Markdown and Donald Knuth’s and Leslie Lamport’s LaTeX. Unique features:
Diagrams · Insert documents into one another · LaTeX equation typesetting and numbering · Table of contents · Reference images and embedded images · Document title and subtitle formatting · Schedules and calendars · Section numbering and references · Figure, listing, and table numbering and references · Smart quotes · Embedded video · CSS stylesheets · Page breaks · En dash, em dash, ×, minus, and degrees · Attributes on links · Unindexed sections · Works in any browser by adding one line to the bottom of a text document · Fallback to ASCII in a browser if you have neither the local file nor Internet access · Optionally process server-side with node.js · Optionally batch process to PDF with headless browser flags · HTML export to static content using ?export in the URL or “Rasterizer”
“Auto-smallcaps filter”, (2020-02-19):
Description of a Pandoc plugin I wrote for use on Gwern.net which automatically rewrites any string of 3 or more capital letters (eg. “NSA”, “GAN”, or “NASA”) to render in small caps via CSS; this is typographically nicer to read, as the small caps make acronyms less visually overwhelming than regular capital letters.
This is trickier to implement than the usual Pandoc plugin because strings must be parsed and broken up, so it’s not a straightforward 1-for-1 substitution, and I explain the necessary recursive tricks to make it correct & typecheck.
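The actual plugin is a Haskell Pandoc filter operating on the document AST, but the core requirement of parsing and splitting strings rather than doing a 1-for-1 substitution can be illustrated with a small stand-alone Python sketch (the regex, class name, and HTML output here are illustrative assumptions, not the plugin’s real interface):

```python
import re

# Hypothetical sketch of the string-splitting step: runs of >= 3 capital
# letters are wrapped in a small-caps span; everything else passes through
# untouched. (The real filter walks the Pandoc AST in Haskell rather than
# substituting into raw HTML strings.)
ACRONYM = re.compile(r"\b[A-Z]{3,}\b")

def smallcaps(text: str) -> str:
    """Wrap each all-caps acronym of 3+ letters in a CSS small-caps span."""
    return ACRONYM.sub(
        lambda m: f'<span class="smallcaps">{m.group(0)}</span>', text)

print(smallcaps("The NSA trained a GAN for NASA."))
```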
Millions are exposed to the human immunodeficiency virus type 1 (HIV-1) every year, but not all acquire the virus, suggesting a potential role for host genetics in the moderation of HIV-1 acquisition. Here, we analyzed summary statistics from the largest genome-wide association study of HIV-1 acquisition to date, consisting of 6,334 infected patients and 7,247 population controls, to advance our understanding of the genetic mechanisms implicated in this trait. We found that HIV-1 acquisition is polygenic and heritable, with SNP heritability estimates explaining 28–42% of the variance in this trait at a population level. Genetic correlations with UK Biobank data revealed associations with smoking, prospective memory and socioeconomic traits. Gene-level enrichment analysis identified EF-hand calcium binding domain 14 as a novel susceptibility gene for HIV–1 acquisition. We also observed that susceptibility variants for HIV-1 acquisition were statistically-significantly enriched for genes expressed in T-cells, but also in striatal and hippocampal neurons. Finally, we tested how polygenic risk scores for HIV-1 acquisition influence blood levels of 35 inflammatory markers in 406 HIV-1-negative individuals. We found that higher genetic risk for HIV-1 acquisition was associated with lower levels of C-C motif chemokine ligand 17. Our findings corroborate a complex model for HIV-1 acquisition, whereby susceptibility is partly heritable and moderated by specific behavioral, cellular and immunological parameters.
“The transcriptional legacy of developmental stochasticity”, (2019-12-12):
Genetic variation, epigenetic regulation and major environmental stimuli are key contributors to phenotypic variation, but the influence of minor perturbations or “noise” has been difficult to assess in mammals. In this work, we uncover one major axis of random variation with a large and permanent influence: developmental stochasticity. By assaying the transcriptome of wild monozygotic quadruplets of the nine-banded armadillo, we find that persistent changes occur early in development, and these give rise to clear transcriptional signatures which uniquely characterize individuals relative to siblings. Comparing these results to human twins, we find the transcriptional signatures which define individuals exhibit conserved co-expression, suggesting a substantial fraction of phenotypic and disease discordance within mammals arises from developmental stochasticity.
One sentence summary
Longitudinal gene expression in identical armadillo quadruplets reveals a major role for developmental stochasticity.
Background: Being adopted early in life, an indicator of exposure to early-life adversity, has been consistently associated with poor mental health outcomes in adulthood. Such associations have largely been attributed to stressful environments, eg., exposure to trauma, abuse, or neglect. However, mental health is substantially heritable, and genetic influences may contribute to the exposure to childhood adversity, resulting in potential genetic confounding of such associations.
Methods: Here, we explored associations between childhood adoption and mental health-related outcomes in midlife in 243,797 UK Biobank participants (n adopted = 3151). We used linkage disequilibrium score regression and polygenic risk scores for depressive symptoms, schizophrenia, neuroticism, and subjective well-being to address potential genetic confounding (gene-environment correlations) and gene-environment interactions. As outcomes, we explored depressive symptoms, bipolar disorder, neuroticism, loneliness, and mental health-related socioeconomic and psychosocial measures in adoptees compared with non-adopted participants.
Results: Adoptees were slightly worse off on almost all mental, socioeconomic, and psychosocial measures. Each standard deviation increase in polygenic risk for depressive symptoms, schizophrenia, and neuroticism was associated with a 6%, 5%, and 6% increase in the odds of being adopted, respectively. Statistically-significant gene-environment correlations between adoption status and depressive symptoms, major depression, and schizophrenia were observed. No evidence for gene-environment interaction between genetic risk and adoption on mental health was found.
Conclusions: The association between childhood adoption and mental health cannot fully be attributed to stressful environments but is partly explained by differences in genetic risk between adoptees and those who have not been adopted (ie., gene-environment correlation).
[Keywords: childhood adversity, depressive symptoms, gene-environment interplay, neuroticism, polygenic risk scores, schizophrenia]
Replicable genetic association signals have consistently been found through genome-wide association studies in recent years. The recent dramatic expansion of study sizes improves power of estimation of effect sizes, genomic prediction, causal inference, and polygenic selection, but it simultaneously increases susceptibility of these methods to bias due to subtle population structure. Standard methods using genetic principal components to correct for structure might not always be appropriate and we use a simulation study to illustrate when correction might be ineffective for avoiding biases. New methods such as trans-ethnic modeling and chromosome painting allow for a richer understanding of the relationship between traits and population structure. We illustrate the arguments using real examples (stroke and educational attainment) and provide a more nuanced understanding of population structure, which is set to be revisited as a critical aspect of future analyses in genetic epidemiology. We also make simple recommendations for how problems can be avoided in the future. Our results have particular importance for the implementation of meta-analysis, for prediction of traits, and for causal inference.
Symbionts that distort their host’s sex ratio by favouring the production and survival of females are common in arthropods. Their presence produces intense Fisherian selection to return the sex ratio to parity, typified by the rapid spread of host ‘suppressor’ loci that restore male survival/development. In this study, we investigated the genomic impact of a selective event of this kind in the butterfly Hypolimnas bolina. Through linkage mapping, we first identified a genomic region that was necessary for males to survive Wolbachia-induced male-killing. We then investigated the genomic impact of the rapid spread of suppression, which converted the Samoan population of this butterfly from a 100∶1 female-biased sex ratio in 2001 to a 1∶1 sex ratio by 2006. Models of this process revealed the potential for a chromosome-wide effect. To measure the impact of this episode of selection directly, the pattern of genetic variation before and after the spread of suppression was compared. Changes in allele frequencies were observed over a 25 cM region surrounding the suppressor locus, with a reduction in overall diversity observed at loci that co-segregate with the suppressor. These changes exceeded those expected from drift and occurred alongside the generation of linkage disequilibrium. The presence of novel allelic variants in 2006 suggests that the suppressor was likely to have been introduced via immigration rather than through de novo mutation. In addition, further sampling in 2010 indicated that many of the introduced variants were lost or had declined in frequency since 2006. We hypothesize that this loss may have resulted from a period of purifying selection, removing deleterious material that introgressed during the initial sweep. Our observations of the impact of suppression of sex ratio distorting activity reveal a very wide genomic imprint, reflecting its status as one of the strongest selective forces in nature.
Author Summary: The sex ratio of the offspring produced by an individual can be an evolutionary battleground. In many arthropod species, maternally inherited microbes selectively kill male hosts, and the host may in turn evolve strategies to restore the production or survival of males. When males are rare, the intensity of selection on the host may be extreme. We recently observed one such episode, in which the population sex ratio of the butterfly Hypolimnas bolina shifted from 100 females per male to near parity, through the evolution of a suppressor gene. In our current study, we investigate the hypothesis that the strength of selection in this case was so strong that the genomic impact would go well beyond the suppressor gene itself. After mapping the location of the suppressor within the genome of H. bolina, we examined changes in genetic variation at sites on the same chromosome as the suppressor. We show that a broad region of the genome was affected by the spread of the suppressor. Our data also suggest that the selection may have been sufficiently strong to introduce deleterious material into the population, which was later purged by selection.
What makes a paper independently reproducible? Debates on reproducibility center around intuition or assumptions but lack empirical results. Our field focuses on releasing code, which is important, but is not sufficient for determining reproducibility. We take the first step toward a quantifiable answer by manually attempting to implement 255 papers published from 1984 until 2017, recording features of each paper, and performing statistical analysis of the results. For each paper, we did not look at the authors’ code, if released, in order to prevent bias toward discrepancies between code and paper.
“Quantifying Independently Reproducible Machine Learning”, (2020-02-06):
How reproducible is the latest ML research, and can we begin to quantify what impacts its reproducibility? This question served as motivation for my NeurIPS 2019 paper. Based on a combination of masochism and stubbornness, over the past eight years I have attempted to implement various ML algorithms from scratch. This has resulted in a ML library called JSAT. My investigation in reproducible ML has also relied on personal notes and records hosted on Mendeley and Github. With these data, and clearly no instinct for preserving my own sanity, I set out to quantify and verify reproducibility! As I soon learned, I would be engaging in meta-science, the study of science itself.
…Some of the results were unsurprising. For example, the number of authors shouldn’t have any particular importance to a paper’s reproducibility, and it did not have a statistically-significant relationship. Hyperparameters are the knobs we can adjust to change an algorithm’s behavior, but are not learned by the algorithm itself. Instead, we humans must set their values (or devise a clever way to pick them). Whether or not a paper detailed the hyperparameters used was found to be statistically-significant, and we can intuit why. If you don’t tell the reader what the settings were, the reader has to guess. That takes work and time, and is error-prone! So, some of our results have given credence to the ideas the community has already been pursuing in order to make papers more reproducible. What is important is that we can now quantify why these are good things to be pursuing. Other findings follow basic logic, such as the finding that papers that are easier to read are easier to reproduce, likely because they are easier to understand.
- …Having fewer equations per page makes a paper more reproducible.
- Empirical papers may be more reproducible than theory-oriented papers.
- Sharing code is not a panacea.
- Having detailed pseudo code is just as reproducible as having no pseudo code.
- Creating simplified example problems does not appear to help with reproducibility.
- Please, check your email.
…After completing this effort, my inclination is that there is room for improvement, but that we in the AI/ML field are doing a better job than most disciplines. A 62% success rate is higher than many meta-analyses from other sciences, and I suspect my 62% number is lower than reality…Finally, it has been pointed out to me that I may have created the most un-reproducible ML research ever. But in reality, it leads to a number of issues regarding how we do the science of meta-science, to study how we implement and evaluate our research. With that, I hope I’ve encouraged you to read my paper for further details and discussion. Think about how your own work fits into the larger picture of human knowledge and science. As the avalanche of new AI and ML research continues to grow, our ability to leverage and learn from all this work will be highly dependent on our ability to distill ever more knowledge down to a digestible form. At the same time, our process and systems must result in reproducible work that does not lead us astray. I have more work I would like to do in this space, and I hope you will join me.
Covariant.ai has developed a platform that consists of off-the-shelf robot arms equipped with cameras, a special gripper, and plenty of computer power for figuring out how to grasp objects tossed into warehouse bins. The company, emerging from stealth Wednesday, announced the first commercial installations of its AI-equipped robots: picking boxes and bags of products for a German electronics retailer called Obeta.
…The company was founded in 2017 by Pieter Abbeel, a prominent AI professor at UC Berkeley, and several of his students. Abbeel pioneered the application of machine learning to robotics, and he made a name for himself in academic circles in 2010 by developing a robot capable of folding laundry (albeit very slowly). Covariant uses a range of AI techniques to teach robots how to grasp unfamiliar objects. These include reinforcement learning, in which an algorithm trains itself through trial and error, a little like the way animals learn through positive and negative feedback…Besides reinforcement learning, Abbeel says his company’s robots make use of imitation learning, a way of learning by observing demonstrations of perception and grasping by another algorithm, and meta-learning, a way of refining the learning process itself. Abbeel says the system can adapt and improve when a new batch of items arrive. “It’s training on the fly”, he says. “I don’t think anybody else is doing that in the real world.”
…But reinforcement learning is finicky and needs lots of computer power. “I used to be skeptical about reinforcement learning, but I’m not anymore”, says Geoffrey Hinton, a professor at the University of Toronto who also works part time at Google. Hinton says the amount of computer power needed to make reinforcement learning work has often seemed prohibitive, so it is striking to see commercial success. He says it is particularly impressive that Covariant’s system has been running in a commercial setting for a prolonged period.
…Peter Puchwein, vice president of innovation at Knapp, says he is particularly impressed by the way Covariant.ai’s robots can grasp even products in transparent bags, which can be difficult for cameras to perceive. “Even as a human being, if you have a box with 20 products in poly bags, it’s really hard to take just one out”, he says…Late last year, the international robot maker ABB ran a contest. It invited 20 companies to design software for its robot arms that could sort through bins of random items, from cubes to plastic bags filled with other objects. Ten of the companies were based in Europe, and the other half were in the United States. Most came nowhere close to passing the test. A few could handle most tasks but failed on the trickier cases. Covariant was the only company that could handle every task as swiftly and efficiently as a human. “We were trying to find weaknesses”, said Marc Segura, managing director of service robotics at ABB. “It is easy to reach a certain level on these tests, but it is super difficult not to show any weaknesses.”
“Transformers as Soft Reasoners over Language”, (2020-02-14):
Beginning with McCarthy’s Advice Taker (1959), AI has pursued the goal of providing a system with explicit, general knowledge and having the system reason over that knowledge. However, expressing the knowledge in a formal (logical or probabilistic) representation has been a major obstacle to this research. This paper investigates a modern approach to this problem where the facts and rules are provided as natural language sentences, thus bypassing a formal representation. We train transformers to reason (or emulate reasoning) over these sentences using synthetically generated data. Our models, that we call RuleTakers, provide the first empirical demonstration that this kind of soft reasoning over language is learnable, can achieve high (99%) accuracy, and generalizes to test data requiring substantially deeper chaining than seen during training (95%+ scores). We also demonstrate that the models transfer well to two hand-authored rulebases, and to rulebases paraphrased into more natural language. These findings are significant as they suggest a new role for transformers, namely as limited “soft theorem provers” operating over explicit theories in language. This in turn suggests new possibilities for explainability, correctability, and counterfactual reasoning in question-answering.
“Turing-NLG: A 17-billion-parameter language model by Microsoft”, (2020-02-10):
Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a demo of the model, including its freeform generation, question answering, and summarization capabilities, to academics for feedback and research purposes. <|endoftext|>
– This summary was generated by the Turing-NLG language model itself.
…Following the trend that larger natural language models lead to better results, Microsoft is introducing Turing Natural Language Generation (T-NLG), the largest model ever published at 17 billion parameters, which outperforms the state of the art on a variety of language modeling benchmarks and also excels when applied to numerous practical tasks, including summarization and question answering. This work would not be possible without breakthroughs produced by the DeepSpeed library (compatible with PyTorch) and ZeRO optimizer, which can be explored more in this accompanying blog post.
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
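As an illustration of the text-to-text framing, every task becomes a string-to-string mapping distinguished only by a task prefix. The prompt/target pairs below are paraphrased examples of this convention, not outputs of the released models:

```python
# Hypothetical prompt -> target pairs in the T5-style text-to-text format:
# translation, summarization, and classification all share one interface,
# with the task named by a plain-text prefix.
examples = {
    "translate English to German: That is good.": "Das ist gut.",
    "summarize: <long article text>": "<short summary>",
    "cola sentence: The car barked.": "unacceptable",
}

for prompt, target in examples.items():
    print(prompt, "->", target)
```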
Larger language models are dramatically more useful for NLP tasks such as article completion, question answering, and dialog systems. Training the largest neural language model has recently been the best way to advance the state of the art in NLP applications. Two recent papers, BERT and GPT-2, demonstrate the benefits of large scale language modeling. Both papers leverage advances in compute and available text corpora to substantially surpass state of the art performance in natural language understanding, modeling, and generation. Training these models requires hundreds of exaflops of compute and clever memory management to trade recomputation for a reduced memory footprint. However, for very large models beyond a billion parameters, the memory on a single GPU is not enough to fit the model along with the parameters needed for training, requiring model parallelism to split the parameters across multiple GPUs. Several approaches to model parallelism exist, but they are difficult to use, either because they rely on custom compilers, or because they scale poorly or require changes to the optimizer.
In this work, we implement a simple and efficient model parallel approach by making only a few targeted modifications to existing PyTorch transformer implementations. Our code is written in native Python, leverages mixed precision training, and utilizes the NCCL library for communication between GPUs. We showcase this approach by training an 8.3 billion parameter transformer language model with 8-way model parallelism and 64-way data parallelism on 512 GPUs, making it the largest transformer based language model ever trained at 24× the size of BERT and 5.6× the size of GPT-2. We have published the code that implements this approach at our GitHub repository.
Our experiments are conducted on NVIDIA’s DGX SuperPOD. Without model parallelism, we can fit a baseline model of 1.2B parameters on a single V100 32GB GPU, and sustain 39 TeraFLOPS during the overall training process, which is 30% of the theoretical peak FLOPS for a single GPU in a DGX2-H server. Scaling the model to 8.3 billion parameters on 512 GPUs with 8-way model parallelism, we achieved up to 15.1 PetaFLOPS sustained performance over the entire application and reached 76% scaling efficiency compared to the single case.
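The quoted throughput figures are internally consistent, which a few lines of arithmetic confirm:

```python
# Checking the Megatron-LM scaling numbers quoted above.
baseline = 39e12     # FLOPS sustained by the 1.2B-parameter model on 1 V100
gpus = 8 * 64        # 8-way model parallel x 64-way data parallel = 512 GPUs
sustained = 15.1e15  # FLOPS sustained by the 8.3B model across all 512 GPUs

perfect = baseline * gpus         # linear-scaling ideal: ~20 PFLOPS
efficiency = sustained / perfect  # the quoted 76% scaling efficiency

print(gpus, round(efficiency, 2))  # → 512 0.76
```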
Large deep learning models offer substantial accuracy gains, but training billions to trillions of parameters is challenging. Existing solutions such as data and model parallelisms exhibit fundamental limitations to fit these models into limited device memory, while obtaining computation, communication and development efficiency. We develop a novel solution, Zero Redundancy Optimizer (ZeRO), to optimize memory, vastly improving training speed while increasing the model size that can be efficiently trained. ZeRO eliminates memory redundancies in data-parallel and model-parallel training while retaining low communication volume and high computational granularity, allowing us to scale the model size proportional to the number of devices with sustained high efficiency. Our analysis on memory requirements and communication volume demonstrates: ZeRO has the potential to scale beyond 1 trillion parameters using today’s hardware.
We implement and evaluate ZeRO: it trains large models of over 100B parameters with super-linear speedup on 400 GPUs, achieving throughput of 15 petaflops. This represents an 8× increase in model size and 10× increase in achievable performance over state-of-the-art. In terms of usability, ZeRO can train large models of up to 13B parameters (e.g., larger than Megatron GPT 8.3B and T5 11B) without requiring model parallelism which is harder for scientists to apply. Last but not least, researchers have used the system breakthroughs of ZeRO to create the world’s largest language model (Turing-NLG, 17B parameters) with record breaking accuracy.
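A rough sketch of the memory accounting behind ZeRO’s savings, using the paper’s mixed-precision Adam figures (2 bytes of fp16 weights + 2 bytes of fp16 gradients + 12 bytes of fp32 optimizer state = 16 bytes per parameter); the 17B parameter count is Turing-NLG’s scale, while the 256-GPU partitioning degree is an arbitrary illustration:

```python
# Back-of-the-envelope model-state memory, ZeRO-paper style: with
# mixed-precision Adam, each parameter costs ~16 bytes of model state
# (fp16 weight + fp16 gradient + fp32 weight/momentum/variance copies).
def model_state_gb(params, devices=1, bytes_per_param=16):
    """Per-device GB of model states when partitioned over `devices`."""
    return params * bytes_per_param / devices / 1e9

params = 17e9  # Turing-NLG scale
print(round(model_state_gb(params), 1))       # replicated: ~272 GB per GPU
print(round(model_state_gb(params, 256), 1))  # split over 256 GPUs: ~1.1 GB
```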
“Mesh-TensorFlow: Deep Learning for Supercomputers”, (2018-11-05):
Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. However, batch-splitting suffers from problems including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and to implement, particularly on large clusters. We introduce Mesh-TensorFlow, a language for specifying a general class of distributed tensor computations. Where data-parallelism can be viewed as splitting tensors and operations along the “batch” dimension, in Mesh-TensorFlow, the user can specify any tensor-dimensions to be split across any dimensions of a multi-dimensional mesh of processors. A Mesh-TensorFlow graph compiles into a SPMD program consisting of parallel operations coupled with collective communication primitives such as Allreduce. We use Mesh-TensorFlow to implement an efficient data-parallel, model-parallel version of the Transformer sequence-to-sequence model. Using TPU meshes of up to 512 cores, we train Transformer models with up to 5 billion parameters, surpassing state of the art results on WMT’14 English-to-French translation task and the one-billion-word language modeling benchmark. Mesh-Tensorflow is available at https://github.com/tensorflow/mesh .
The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form which approximates well the generalization error in practice. Capitalizing on the successful concept of model scaling (eg., width, depth), we are able to simultaneously construct such a form and specify the exact models which can attain it across model/data scales. Our construction follows insights obtained from observations conducted over a range of model/data scales, in various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales, and provides accurate predictions from small-scale to large-scale models and data.
“Deep Learning Scaling is Predictable, Empirically”, (2017-12-01):
Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art.
This paper presents a large scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, resulting in exponents—the “steepness” of the learning curve—yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications on deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
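The practical content of a power-law learning curve is easy to state in code: with error(n) = a·n^(−b), growing the training set by a fixed factor shrinks the error by a fixed factor, regardless of where you start. The constants below are made up for illustration:

```python
# Hypothetical power-law generalization-error curve: error(n) = a * n**-b,
# where b is the "steepness" exponent discussed above.
a, b = 5.0, 0.35  # made-up fit constants for the demo

def err(n):
    return a * n ** (-b)

# Doubling the training set multiplies the error by 2**-b, independent of n:
ratio = err(2_000_000) / err(1_000_000)
print(round(ratio, 3))  # → 0.785, i.e. 2**-0.35
```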
Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by the number of steps necessary to reach a goal out-of-sample error. We study how this relationship varies with the training algorithm, model, and data set, and find extremely large variation between workloads. Along the way, we show that disagreements in the literature on how batch size affects model quality can largely be explained by differences in metaparameter tuning and compute budgets at different batch sizes. We find no evidence that larger batch sizes degrade out-of-sample performance. Finally, we discuss the implications of our results on efforts to train neural networks much faster in the future. Our experimental data is publicly available as a database of 71,638,836 loss measurements taken over the course of training for 168,160 individual models across 35 workloads.
“How AI Training Scales”, (2018-12-14):

We’ve discovered that the gradient noise scale, a simple statistical metric, predicts the parallelizability of neural network training on a wide range of tasks. Since complex tasks tend to have noisier gradients, increasingly large batch sizes are likely to become useful in the future, removing one potential limit to further growth of AI systems. More broadly, these results show that neural network training need not be considered a mysterious art, but can be rigorized and systematized.
In an increasing number of domains it has been demonstrated that deep learning models can be trained using relatively large batch sizes without sacrificing data efficiency. However the limits of this massive data parallelism seem to differ from domain to domain, ranging from batches of tens of thousands in ImageNet to batches of millions in RL agents that play the game Dota 2. To our knowledge there is limited conceptual understanding of why these limits to batch size differ or how we might choose the correct batch size in a new domain. In this paper, we demonstrate that a simple and easy-to-measure statistic called the gradient noise scale predicts the largest useful batch size across many domains and applications, including a number of supervised learning datasets (MNIST, SVHN, CIFAR-10, ImageNet, Billion Word), reinforcement learning domains (Atari and Dota), and even generative model training (autoencoders on SVHN). We find that the noise scale increases as the loss decreases over a training run and depends on the model size primarily through improved model performance. Our empirically-motivated theory also describes the tradeoff between compute-efficiency and time-efficiency, and provides a rough model of the benefits of adaptive batch-size training.
…We have found that by measuring the gradient noise scale, a simple statistic that quantifies the signal-to-noise ratio of the network gradients, we can approximately predict the maximum useful batch size. Heuristically, the noise scale measures the variation in the data as seen by the model (at a given stage in training). When the noise scale is small, looking at a lot of data in parallel quickly becomes redundant, whereas when it is large, we can still learn a lot from huge batches of data…We’ve found it helpful to visualize the results of these experiments in terms of a tradeoff between wall time for training and total bulk compute that we use to do the training (proportional to dollar cost). At very small batch sizes, doubling the batch allows us to train in half the time without using extra compute (we run twice as many chips for half as long). At very large batch sizes, more parallelization doesn’t lead to faster training. There is a “bend” in the curve in the middle, and the gradient noise scale predicts where that bend occurs.
…more powerful models have a higher gradient noise scale, but only because they achieve a lower loss. Thus, there’s some evidence that the increasing noise scale over training isn’t just an artifact of convergence, but occurs because the model gets better. If this is true, then we expect future, more powerful models to have higher noise scale and therefore be more parallelizable. Second, tasks that are subjectively more difficult are also more amenable to parallelization…we have evidence that more difficult tasks and more powerful models on the same task will allow for more radical data-parallelism than we have seen to date, providing a key driver for the continued fast exponential growth in training compute.
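The statistic itself is easy to sketch. The "simple" noise scale is B_simple = tr(Σ)/|G|², where Σ is the per-example gradient covariance and G the true (mean) gradient; below is a toy estimator that assumes direct access to per-example gradients (practical large-scale estimators instead use gradient norms measured at two batch sizes). All numbers are synthetic.

```python
import numpy as np

def noise_scale(grads):
    """Estimate B_simple = tr(Sigma) / |G|^2 from per-example gradients.
    grads: (n_examples, n_params) array."""
    mean_grad = grads.mean(axis=0)                # estimate of G
    trace_cov = grads.var(axis=0, ddof=1).sum()   # estimate of tr(Sigma)
    return trace_cov / np.dot(mean_grad, mean_grad)

rng = np.random.default_rng(0)
# Noisy per-example gradients around a true direction; a higher noise
# level implies a larger maximum useful batch size.
g = rng.normal(loc=[1.0, 0.5, -0.2], scale=2.0, size=(10_000, 3))
b_simple = noise_scale(g)   # roughly tr(Sigma)/|G|^2 = 12/1.29 ~ 9.3
```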
“Scaling Laws for Neural Language Models”, (2020-01-23):
The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget.
Larger models are substantially more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping substantially before convergence.
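For concreteness, the paper's combined model-size/dataset-size law, L(N, D) = ((N_c/N)^(α_N/α_D) + D_c/D)^α_D, can be evaluated directly. The constants below are the approximate fitted values reported for this setup and should be treated as indicative rather than authoritative.

```python
# Approximate fitted constants from the paper (loss in nats):
ALPHA_N, ALPHA_D = 0.076, 0.095
N_C, D_C = 8.8e13, 5.4e13   # in non-embedding parameters and tokens

def loss(n_params, n_tokens):
    """Predicted cross-entropy loss L(N, D) under the combined scaling law."""
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

# Power-law returns: doubling the model at fixed data lowers loss only slightly.
l1 = loss(1e9, 1e10)
l2 = loss(2e9, 1e10)
```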
“The messy, secretive reality behind OpenAI’s bid to save the world: The AI moonshot was founded in the spirit of transparency. This is the inside story of how competitive pressure eroded that idealism.”, (2020-02-17):
There are two prevailing technical theories about what it will take to reach AGI. In one, all the necessary techniques already exist; it’s just a matter of figuring out how to scale and assemble them. In the other, there needs to be an entirely new paradigm; deep learning, the current dominant technique in AI, won’t be enough. Most researchers fall somewhere between these extremes, but OpenAI has consistently sat almost exclusively on the scale-and-assemble end of the spectrum. Most of its breakthroughs have been the product of sinking dramatically greater computational resources into technical innovations developed in other labs.
Brockman and Sutskever deny that this is their sole strategy, but the lab’s tightly guarded research suggests otherwise. A team called “Foresight” runs experiments to test how far they can push AI capabilities forward by training existing algorithms with increasingly large amounts of data and computing power. For the leadership, the results of these experiments have confirmed its instincts that the lab’s all-in, compute-driven strategy is the best approach. For roughly six months, these results were hidden from the public because OpenAI sees this knowledge as its primary competitive advantage. Employees and interns were explicitly instructed not to reveal them, and those who left signed nondisclosure agreements. It was only in January that the team, without the usual fanfare, quietly posted a paper on one of the primary open-source databases for AI research. People who experienced the intense secrecy around the effort didn’t know what to make of this change. Notably, another paper with similar results from different researchers had been posted a month earlier.
…The man driving OpenAI’s strategy is Dario Amodei, the ex-Googler who now serves as research director. When I meet him, he strikes me as a more anxious version of Brockman. He has a similar sincerity and sensitivity, but an air of unsettled nervous energy. He looks distant when he talks, his brows furrowed, a hand absentmindedly tugging his curls. Amodei divides the lab’s strategy into two parts. The first part, which dictates how it plans to reach advanced AI capabilities, he likens to an investor’s “portfolio of bets.” Different teams at OpenAI are playing out different bets. The language team, for example, has its money on a theory postulating that AI can develop a substantial understanding of the world through mere language learning. The robotics team, in contrast, is advancing an opposing theory that intelligence requires a physical embodiment to develop. As in an investor’s portfolio, not every bet has an equal weight. But for the purposes of scientific rigor, all should be tested before being discarded. Amodei points to GPT-2, with its remarkably realistic auto-generated texts, as an instance of why it’s important to keep an open mind. “Pure language is a direction that the field and even some of us were somewhat skeptical of”, he says. “But now it’s like, ‘Wow, this is really promising.’” Over time, as different bets rise above others, they will attract more intense efforts. Then they will cross-pollinate and combine. The goal is to have fewer and fewer teams that ultimately collapse into a single technical direction for AGI. This is the exact process that OpenAI’s latest top-secret project has supposedly already begun.
“GPT-3: Language Models are Few-Shot Learners”, (2020-05-28):
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions—something which current NLP systems still largely struggle to do.
Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10× more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
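The "purely via text interaction" setup can be sketched as simple prompt construction: K demonstration pairs followed by the query, with no gradient updates. The format below is illustrative, not the paper's exact template.

```python
def few_shot_prompt(instruction, demos, query):
    """Build a few-shot prompt: an instruction, K demonstration pairs,
    and an incomplete final line for the model to complete.
    demos: list of (input, output) pairs shown in the context window."""
    lines = [instruction]
    for x, y in demos:
        lines.append(f"{x} => {y}")
    lines.append(f"{query} =>")   # the model's completion is the answer
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Unscramble the word.",
    [("tca", "cat"), ("odg", "dog")],
    "dbri",
)
```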
…The precise architectural parameters for each model are chosen based on computational efficiency and load-balancing in the layout of models across GPU’s. Previous work [KMH+20] suggests that validation loss is not strongly sensitive to these parameters within a reasonably broad range.
“REALM: Retrieval-Augmented Language Model Pre-Training”, (2020-02-10; ):
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts.
To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents.
We demonstrate the effectiveness of Retrieval Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a substantial margin (4–16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
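The retrieve-and-marginalize structure can be sketched numerically: p(y|x) = Σ_z p(z|x)·p(y|x, z), where the retrieval distribution p(z|x) is a softmax over document relevance scores (here, inner products of toy embeddings). All embeddings and probabilities below are invented for illustration.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())   # subtract max for numerical stability
    return e / e.sum()

query_emb = np.array([1.0, 0.0])
doc_embs = np.array([[0.9, 0.1],     # relevant document
                     [0.0, 1.0]])    # irrelevant document
p_z = softmax(doc_embs @ query_emb)  # retrieval distribution p(z|x)

# Toy p(y|x,z): rows are documents, columns are two candidate answers.
p_y_given_z = np.array([[0.8, 0.2],
                        [0.3, 0.7]])
p_y = p_z @ p_y_given_z              # marginalize over retrieved documents
```

Because backpropagation flows through both factors, the retriever and the predictor can be trained jointly from the same masked-language-modeling signal.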
It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries. In this short paper, we measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge. We show that this approach scales with model size and performs competitively with open-domain systems that explicitly retrieve answers from an external knowledge source when answering questions. To facilitate reproducibility and future work, we release our code and trained models at https://goo.gle/t5-cbqa.
[Distill.pub interactive explainer: you can train small CNNs to coordinate as cellular automata to create complex damage-resilient global patterns using standard deep learning techniques like backpropagation, since CNNs are differentiable; the CNN updates such that when it is executed simultaneously in hundreds of ‘cells’, each cell can coordinate appropriately to emit a particular color and eg. form a complex lizard shape. Because it’s decentralized, any individual cell can be deleted and the damage healed.]
What is clear is that evolution has learned to exploit the laws of physics and computation to implement the highly robust morphogenetic software that runs on genome-encoded cellular hardware. This process is extremely robust to perturbations. Even when the organism is fully developed, some species still have the capability to repair damage—a process known as regeneration. Some creatures, such as salamanders, can fully regenerate vital organs, limbs, eyes, or even parts of the brain! Morphogenesis is a surprisingly adaptive process. Sometimes even a very atypical development process can result in a viable organism—for example, when an early mammalian embryo is cut in two, each half will form a complete individual—monozygotic twins!
The biggest puzzle in this field is the question of how the cell collective knows what to build and when to stop. The sciences of genomics and stem cell biology are only part of the puzzle, as they explain the distribution of specific components in each cell, and the establishment of different types of cells. While we know of many genes that are required for the process of regeneration, we still do not know the algorithm that is sufficient for cells to know how to build or remodel complex organs to a very specific anatomical end-goal. Thus, one major lynch-pin of future work in biomedicine is the discovery of the process by which large-scale anatomy is specified within cell collectives, and how we can rewrite this information to have rational control of growth and form.
…Let’s try to develop a cellular automata update rule that, starting from a single cell, will produce a predefined multicellular pattern on a 2D grid. This is our analogous toy model of organism development. To design the CA, we must specify the possible cell states, and their update function. Typical CA models represent cell states with a set of discrete values, although variants using vectors of continuous values exist. The use of continuous values has the virtue of allowing the update rule to be a differentiable function of the cell’s neighbourhood’s states. The rules that guide individual cell behavior based on the local environment are analogous to the low-level hardware specification encoded by the genome of an organism. Running our model for a set amount of steps from a starting configuration will reveal the patterning behavior that is enabled by such hardware.
…This article describes a toy embryogenesis and regeneration model. This is a major direction for future work, with many applications in biology and beyond. In addition to the implications for understanding the evolution and control of regeneration, and harnessing this understanding for biomedical repair, there is the field of bioengineering. As the field transitions from synthetic biology of single cell collectives to a true synthetic morphology of novel living machines, it will be essential to develop strategies for programming system-level capabilities, such as anatomical homeostasis (regenerative repair)…let’s speculate about what a “more physical” implementation of such a system could look like. We can imagine it as a grid of tiny independent computers, simulating individual cells. Each of those computers would require approximately 10Kb of ROM to store the “cell genome”: neural network weights and the control code, and about 256 bytes of RAM for the cell state and intermediate activations. The cells must be able to communicate their 16-value state vectors to neighbors. Each cell would also require an RGB-diode to display the color of the pixel it represents. A single cell update would require about 10k multiply-add operations and does not have to be synchronised across the grid. We propose that cells might wait for random time intervals between updates. The system described above is uniform and decentralised. Yet, our method provides a way to program it to reach the predefined global state, and recover this state in case of multi-element failures and restarts. We therefore conjecture this kind of modeling may be used for designing reliable, self-organising agents. On the more theoretical machine learning front, we show an instance of a decentralized model able to accomplish remarkably complex tasks. 
We believe this direction to be opposite to the more traditional global modeling used in the majority of contemporary work in the deep learning field, and we hope this work to be an inspiration to explore more decentralized learning modeling.
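The "grid of tiny independent computers" description maps naturally onto a toy implementation: each cell holds a 16-value state vector, one update is a function of the cell's 3×3 neighbourhood, and cells fire asynchronously at random intervals. The linear random-weight "rule" below merely stands in for the article's trained network; everything here is an illustrative sketch, not the authors' code.

```python
import numpy as np

H, W, C = 8, 8, 16                       # grid size and channels per cell
rng = np.random.default_rng(0)
state = np.zeros((H, W, C))
state[H // 2, W // 2, :] = 1.0           # start from a single "seed" cell
# Untrained linear update rule: maps the flattened 3x3xC neighbourhood
# to a C-dimensional residual update (a trained net would go here).
rule = rng.normal(scale=0.1, size=(9 * C, C))

def step(state, fire_rate=0.5):
    """One asynchronous CA step: each interior cell updates with
    probability fire_rate, based only on its local neighbourhood."""
    new = state.copy()
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            if rng.random() > fire_rate:   # stochastic, unsynchronised firing
                continue
            neigh = state[i - 1:i + 2, j - 1:j + 2].reshape(-1)
            new[i, j] += neigh @ rule      # residual update of the cell state
    return new

state = step(state)
```

Because each cell reads only its neighbours, deleting cells or restarting them from zero leaves the rest of the grid's dynamics intact, which is the property the article exploits for damage recovery.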
“Explorable Explanations”, (2011-03-10):
Do our reading environments encourage active reading? Or do they utterly oppose it? A typical reading tool, such as a book or website, displays the author’s argument, and nothing else. The reader’s line of thought remains internal and invisible, vague and speculative. We form questions, but can’t answer them. We consider alternatives, but can’t explore them. We question assumptions, but can’t verify them. And so, in the end, we blindly trust, or blindly don’t, and we miss the deep understanding that comes from dialogue and exploration.
Explorable Explanations is my umbrella project for ideas that enable and encourage truly active reading. The goal is to change people’s relationship with text. People currently think of text as information to be consumed. I want text to be used as an environment to think in.
This essay presents examples of a few initial ideas:
- A reactive document allows the reader to play with the author’s assumptions and analyses, and see the consequences….The reader can play with the premise and assumptions of various claims, and see the consequences update immediately. It’s like a spreadsheet without the spreadsheet.
- An explorable example makes the abstract concrete, and allows the reader to develop an intuition for how a system works.
- Contextual information allows the reader to learn related material just-in-time, and cross-check the author’s claims.
“How the Horrific 1918 Flu Spread Across America: The toll of history’s worst epidemic surpasses all the military deaths in World War I and World War II combined. And it may have begun in the United States”, (2017-11):
Although some researchers argue that the 1918 pandemic began elsewhere, in France in 1916 or China and Vietnam in 1917, many other studies indicate a U.S. origin. The Australian immunologist and Nobel laureate Macfarlane Burnet, who spent most of his career studying influenza, concluded the evidence was “strongly suggestive” that the disease started in the United States and spread to France with “the arrival of American troops.” Camp Funston had long been considered as the site where the pandemic started until my historical research, published in 2004, pointed to an earlier outbreak in Haskell County.
Wherever it began, the pandemic lasted just 15 months but was the deadliest disease outbreak in human history, killing between 50 million and 100 million people worldwide, according to the most widely cited analysis. An exact global number is unlikely ever to be determined, given the lack of suitable records in much of the world at that time. But it’s clear the pandemic killed more people in a year than AIDS has killed in 40 years, more than the bubonic plague killed in a century. The impact of the pandemic on the United States is sobering to contemplate: Some 670,000 Americans died.
…The killing created its own horrors. Governments aggravated them, partly because of the war. For instance, the U.S. military took roughly half of all physicians under 45—and most of the best ones. What proved even more deadly was the government policy toward the truth. When the United States entered the war, Woodrow Wilson demanded that “the spirit of ruthless brutality…enter into the very fibre of national life.” So he created the Committee on Public Information, which was inspired by an adviser who wrote, “Truth and falsehood are arbitrary terms…The force of an idea lies in its inspirational value. It matters very little if it is true or false.” At Wilson’s urging, Congress passed the Sedition Act, making it punishable with 20 years in prison to “utter, print, write or publish any disloyal, profane, scurrilous, or abusive language about the form of government of the United State…or to urge, incite, or advocate any curtailment of production in this country of any thing or things…necessary or essential to the prosecution of the war.” Government posters and advertisements urged people to report to the Justice Department anyone “who spreads pessimistic stories…cries for peace, or belittles our effort to win the war.”
Against this background, while influenza bled into American life, public health officials, determined to keep morale up, began to lie.
Early in September, a Navy ship from Boston carried influenza to Philadelphia, where the disease erupted in the Navy Yard. The city’s public health director, Wilmer Krusen, declared that he would “confine this disease to its present limits, and in this we are sure to be successful. No fatalities have been recorded. No concern whatever is felt.” The next day two sailors died of influenza. Krusen stated they died of “old-fashioned influenza or grip”, not Spanish flu. Another health official declared, “From now on the disease will decrease.” The next day 14 sailors died—and the first civilian. Each day the disease accelerated. Each day newspapers assured readers that influenza posed no danger. Krusen assured the city he would “nip the epidemic in the bud.”
By September 26, influenza had spread across the country, and so many military training camps were beginning to look like Devens that the Army canceled its nationwide draft call. Philadelphia had scheduled a big Liberty Loan parade for September 28. Doctors urged Krusen to cancel it, fearful that hundreds of thousands jamming the route, crushing against each other for a better view, would spread disease. They convinced reporters to write stories about the danger. But editors refused to run them, and refused to print letters from doctors. The largest parade in Philadelphia’s history proceeded on schedule. The incubation period of influenza is two to three days. Two days after the parade, Krusen conceded that the epidemic “now present in the civilian population was…assuming the type found in” Army camps. Still, he cautioned not to be “panic stricken over exaggerated reports.” He needn’t have worried about exaggeration; the newspapers were on his side. “Scientific Nursing Halting Epidemic”, an Inquirer headline blared. In truth, nurses had no impact because none were available: Out of 3,100 urgent requests for nurses submitted to one dispatcher, only 193 were provided. Krusen finally and belatedly ordered all schools closed and banned all public gatherings—yet a newspaper nonsensically said the order was not “a public health measure” and “there is no cause for panic or alarm.” There was plenty of cause. At its worst, the epidemic in Philadelphia would kill 759 people…in one day. Priests drove horse-drawn carts down city streets, calling upon residents to bring out their dead; many were buried in mass graves. More than 12,000 Philadelphians died—nearly all of them in six weeks.
Across the country, public officials were lying. U.S. Surgeon General Rupert Blue said, “There is no cause for alarm if precautions are observed.” New York City’s public health director declared “other bronchial diseases and not the so-called Spanish influenza…[caused] the illness of the majority of persons who were reported ill with influenza.” The Los Angeles public health chief said, “If ordinary precautions are observed there is no cause for alarm.” For an example of the press’s failure, consider Arkansas. Over a four-day period in October, the hospital at Camp Pike admitted 8,000 soldiers. Francis Blake, a member of the Army’s special pneumonia unit, described the scene: “Every corridor and there are miles of them with double rows of cots…with influenza patients…There is only death and destruction.” Yet seven miles away in Little Rock, a headline in the Gazette pretended yawns: “Spanish influenza is plain la grippe—same old fever and chills.”
People knew this was not the same old thing, though. They knew because the numbers were staggering—in San Antonio, 53% of the population got sick with influenza. They knew because victims could die within hours of the first symptoms—horrific symptoms, not just aches and cyanosis but also a foamy blood coughed up from the lungs, and bleeding from the nose, ears and even eyes. And people knew because towns and cities ran out of coffins. People could believe nothing they were being told, so they feared everything, particularly the unknown. How long would it last? How many would it kill? Who would it kill? With the truth buried, morale collapsed. Society itself began to disintegrate.
“A Sea Story: One of the worst maritime disasters in European history took place a decade ago. It remains very much in the public eye. On a stormy night on the Baltic Sea, more than 850 people lost their lives when a luxurious ferry sank below the waves. From a mass of material, including official and unofficial reports and survivor testimony, our correspondent has distilled an account of the Estonia's last moments—part of his continuing coverage for the magazine of anarchy on the high seas”, (2004-05):
Psychology and survival: the MS Estonia was a car ferry whose nose came off in a Baltic storm at night. It did not sink instantly, but nearly 90% of the passengers on it died by not reacting fast enough and escaping to the deck, where they had a chance to survive the tilting ship before it went under. The people who were alarmed by the clanking/thumping when the ship first got into trouble mostly survived, and anyone who dithered (including just by trying to get dressed before running to the deck) died, trapped when the hallways and stairways went vertical or filled up with water. A large number of the people who survived were naked or in their underwear, or were strong young men who could still climb and force their way out, after leaving behind their loved ones.
“This Is Why Your Holiday Travel Is Awful: The long, sordid history of New York’s Penn Station shows how progressives have made it too hard for the government to do big things—and why, believe it or not, Robert Caro is to blame”, (2019-11-29):
[Discussion of why infrastructure development is so extraordinarily costly and slow in NYC: a major part of it is a deliberate creation of a tragedy of the anticommons, where a large number of entities can kill or delay projects for little or no reason. Exemplified by a case study of Penn Station, one of the most heavily-used train/subway stations in the world, which is universally acknowledged to have been in desperate need of major renovations for well over 30 years (where sewage recently poured through the ceiling), and yet any major renovations seem as distant as when discussions first began. Entities ranging from the US Postal Service (jealous of its underused rooms) to Amtrak (financial failure) to untrustworthy real estate developers to preservationist activists (in love with a brick wall) to Jim Dolan (owner of Madison Square Garden) to 9/11 (disruption and creating new reasons for US Postal Service intransigence) to the NY State Senate have all conspired to delay and disrupt any progress.]
2019-archer.pdf: “The reality and evolutionary [importance] of human psychological sex differences”, (2019-03-20; ):
The aims of this article are: (1) to provide a quantitative overview of sex differences in human psychological attributes; and (2) to consider evidence for their possible evolutionary origins. Sex differences were identified from a systematic literature search of meta-analyses and large-sample studies. These were organized in terms of evolutionary [importance] as follows:
- characteristics arising from inter-male competition (within-sex aggression; impulsiveness and sensation-seeking; fearfulness; visuospatial and object-location memory; object-centred orientations);
- those concerning social relations that are likely to have arisen from women’s adaptations for small-group interactions and men’s for larger co-operative groups (person-centred orientation and social skills; language; depression and anxiety);
- those arising from female choice (sexuality; mate choice; sexual conflict).
There were sex differences in all categories, whose magnitudes ranged from
- small (object location memory; negative emotions), to
- medium (mental rotation; anxiety disorders; impulsivity; sex drive; interest in casual sex), to
- large (social interests and abilities; sociosexuality); and
- very large (escalated aggression; systemizing; sexual violence).
Evolutionary explanations were evaluated according to whether:
- similar differences occur in other mammals;
- there is cross-cultural consistency;
- the origin was early in life or at puberty;
- there was evidence for hormonal influences; and
- where possible, whether there was evidence for evolutionarily derived design features.
The evidence was positive for most features in most categories, suggesting evolutionary origins for a broad range of sex differences. Attributes for which there was no sex difference are also noted. Within-sex variations are discussed as limitations to the emphasis on sex differences.
For many traits, males show greater variability than females, with possible implications for understanding sex differences in health and disease. Here, the ENIGMA (Enhancing Neuro Imaging Genetics through Meta-Analysis) Consortium presents the largest-ever mega-analysis of sex differences in variability of brain structure, based on international data spanning nine decades of life. Subcortical volumes, cortical surface area and cortical thickness were assessed in MRI data of 16,683 healthy individuals 1–90 years old (47% females). We observed patterns of greater male than female between-subject variance for all brain measures. This pattern was stable across the lifespan for 50% of the subcortical structures, 70% of the regional area measures, and nearly all regions for thickness. Our findings that these sex differences are present in childhood implicate early life genetic or gene-environment interaction mechanisms. The findings highlight the importance of individual differences within the sexes, that may underpin sex-specific vulnerability to disorders.
“Typologies of extreme longevity myths”, (2010):
Purpose. Political, national, religious, and other motivations have led the media and even scientists to errantly accept extreme longevity claims prima facie. We describe various causes of false claims of extraordinary longevity. Design and Methods. American Social Security Death Index files for the period 1980-2009 were queried for individuals with birth and death dates yielding ages 110+ years of age. Frequency was compared to a list of age-validated supercentenarians maintained by the Gerontology Research Group who died during the same time period. Age claims of 110+ years and the age validation experiences of the authors facilitated a list of typologies of false age claims. Results. Invalid age claim rates increase with age from 65% at age 110–111 to 98% by age 115 to 100% for 120+ years. Eleven typologies of false claims were: Religious Authority Myth, Village Elder Myth, Fountain of Youth Myth (substance), Shangri-La Myth (geographic), Nationalist Pride, Spiritual Practice, Familial Longevity, Individual and/or Family Notoriety, Military Service, Administrative Entry Error, and Pension-Social Entitlement Fraud. Conclusions. Understanding various causes of false extreme age claims is important for placing current, past, and future extreme longevity claims in context and for providing a necessary level of skepticism.
[Probably not a fraud. Summary of the state of the Calment centenarian fraud accusations by Novoselov & Zak: the tax-fraud theory appears to rest on wildly overestimated tax-burden estimates and has been abandoned in favor of a motive of covering up a death from tuberculosis which might have hurt the family department store’s revenue; locals insist they or their relatives knew the two Calments too well for a switch; Yvonne’s child would have had to be in on it, as would a local notary; more of Jeanne’s stories about obscure people she knew appear to have been validated; Yvonne’s funeral was highly public; Jeanne’s anomalously tall late-life height appears to have been mis-measured, the real height being much lower, as expected at her great age; the nose-fibroma argument is inconclusive; the photographs are too low-quality for proper forensic analysis; and one anecdote of calling Jeanne ‘Yvonne’ has been recanted. The fraud theory appears to be on very shaky ground now… but the mystery of Jeanne Calment’s longevity remains.]
1974-auerbach.pdf: “A simple procedure for the long-term cultivation of chicken embryos”, (1974-12; ):
A method is described which permits the growth of chicken embryos in petri dishes from the third to the 20th day of incubation. The procedure is relatively simple and has the advantage of providing ready access to the embryo and its membranes for tissue grafting, for introduction of teratogenic agents, and for microscopic observation of morphogenesis and growth…we found it essential to develop methods that would permit rapid and ready observation of large numbers of eggs under conditions facilitating examination with transmitted light, permitting time-lapse photography, and encouraging routine access to the grafted tissue. The procedures we describe in this report have now been used by us for growing several thousand eggs during the past several months…
2014-tahara.pdf: “A Novel Shell-less Culture System for Chick Embryos Using a Plastic Film as Culture Vessels”, (2014; ):
The development of shell-less culture methods for bird embryos with high hatchability would be useful for the efficient generation of transgenic chickens, embryo manipulations, tissue engineering, and basic studies in regenerative medicine. To date, studies of culture methods for bird embryos include the whole embryo culture using narrow windowed eggshells, surrogate eggshells, and an artificial vessel using a gas-permeable membrane. However, there are no reports achieving high hatchability of >50% using completely artificial vessels. To establish a simple method for culturing chick embryos with high hatchability, we examined various culture conditions, including methods for calcium supplementation and oxygen aeration. In the embryo cultures where the embryos were transferred to the culture vessel after 55–56h incubation, more than 90% of embryos survived until day 17 when a polymethylpentene film was used as a culture vessel with calcium lactate and distilled water supplementations. The aeration of pure oxygen to the surviving embryos from day 17 yielded a hatchability of 57.1% (8 out of 14). Thus, we successfully achieved a high hatchability with this method in chicken embryo culture using an artificial vessel.
Objective: To determine the implications of car ownership for physical activity and weight in a global city.
Design: Quasi-experimental cross sectional study.
Setting: Beijing, China, 2011-15.
Participants: People aged 18 and older from a random sample of households who had entered a permit lottery to purchase a vehicle between January 2011 and November 2015.
Interventions: Permit allowing purchase of a vehicle within six months of permit issuance.
Main outcome measures: Transit use (number of subway and bus rides each week), physical activity (minutes of walking or bicycling each day), and weight, measured once in early 2016.
Results: Of 937 people analysed in total, 180 had won a permit to purchase a new vehicle. Winning the permit lottery resulted in the purchase of an additional vehicle 91% of the time (95% confidence interval 89% to 94%; p < 0.001). About five years after winning, winners took statistically-significantly fewer weekly transit rides (−2.9 rides (−5.1 to −0.7); p = 0.01) and walked and cycled statistically-significantly less (−24.2 minutes (−40.3 to −8.1); p = 0.003) than those who did not win the lottery. Average weight did not change statistically-significantly between lottery winners and losers. Among those aged 50 and older, however, winners’ weight had increased relative to that of losers (10.3 kg (0.5 to 20.2); p = 0.04) 5.1 years after winning.
Conclusions: These data indicate that vehicle ownership in a rapidly growing global city led to long term reductions in physical activity and increase in weight. Continuing increases in car use and ownership in developing and middle income countries could adversely affect physical health and obesity rates.
In vision, two mixtures, each containing an independent set of many different wavelengths, may produce a common color percept termed “white.” In audition, two mixtures, each containing an independent set of many different frequencies, may produce a common perceptual hum termed “white noise.” Visual and auditory whites emerge upon two conditions: when the mixture components span stimulus space, and when they are of equal intensity. We hypothesized that if we apply these same conditions to odorant mixtures, “whiteness” may emerge in olfaction as well. We selected 86 molecules that span olfactory stimulus space and individually diluted them to a point of about equal intensity. We then prepared various odorant mixtures, each containing various numbers of molecular components, and asked human participants to rate the perceptual similarity of such mixture pairs. We found that as we increased the number of non-overlapping, equal-intensity components in odorant mixtures, the mixtures became more similar to each other, despite not having a single component in common. With ~30 components, most mixtures smelled alike. After participants were acquainted with a novel, arbitrarily named mixture of ~30 equal-intensity components, they later applied this name more readily to other novel mixtures of ~30 equal-intensity components spanning stimulus space, but not to mixtures containing fewer components or to mixtures that did not span stimulus space. We conclude that a common olfactory percept, “olfactory white”, is associated with mixtures of ~30 or more equal-intensity components that span stimulus space, implying that olfactory representations are of features of molecules rather than of molecular identity.
“In the 1970s, the CIA Created a Robot Dragonfly Spy. Now We Know How It Works. Newly released documents show how the CIA created one of the world's first examples of insect robotics.”, (2020-02-18):
…So the agency turned to retroreflectors, tiny glass beads that reflect light (in this case, a laser beam) back at its source…In addition to being incredibly maneuverable, dragonflies are exceptionally good gliders compared to other insects, which helps them conserve energy on long flights. The scientist brought in some specimens, and when Adkins pressed him on the issue, “the old fellow plucked the insect from its perch and tossed it into the air”, Adkins wrote. “It made about two circuits and landed nicely on the desk.”
The demonstration convinced Adkins, but the team still needed to figure out how to replicate a dragonfly’s wings, which flap 1,800 times per minute. To pull this off, scientists used a tiny fluidic oscillator, a device with no moving parts that’s completely driven by gas produced by lithium nitrate crystals. When initial tests showed that the prototype couldn’t carry the required 0.2 gm payload, designers added additional thrust by venting exhaust backward, much like jet propulsion. After a quick dragonfly-inspired paint job, the drone was ready for (covert) action, weighing just under a gram. Its glittering ‘eyes’ were the glass retroreflector beads destined to snoop on unsuspecting targets.
…While the CIA now had its robo-bug, it still needed a way to control it. Radio control was out of the question because any extra weight would doom the small insectothopter. So CIA scientists turned to the same lasers used for the retroreflectors. This was a portable laser unit, known as ROME, that produced an invisible infrared beam. The idea was that the laser would heat a bimetallic strip that would then open or close the dragonfly’s exhaust. While effectively throttling the ‘engine’, another laser—acting like a kind of rudder—would then steer the drone to its desired destination. With its gas-pumping engine and laser-based navigation system, the insectothopter could fly for only 60 seconds. But this was more than enough to get the dragonfly—and its payload—to a target some 200 meters away.
…The biggest problem with the insectothopter’s design was that an operator had to keep a laser manually trained on the drone during flight. Easily done in a static wind tunnel, less so in blustery and unpredictable conditions…In theory, the insectothopter could still be flown in less than 7MPH winds, but “the ultimate demonstration of controlled powered flight has not yet been achieved”, Adkins ultimately reported. “Though the flight tests were impressive, control in any kind of crosswind was too difficult.”
For more than half a century, governments all over the world trusted a single company to keep the communications of their spies, soldiers and diplomats secret. The company, Crypto AG, got its first break with a contract to build code-making machines for U.S. troops during World War II. Flush with cash, it became a dominant maker of encryption devices for decades, navigating waves of technology from mechanical gears to electronic circuits and, finally, silicon chips and software. The Swiss firm made millions of dollars selling equipment to more than 120 countries well into the 21st century. Its clients included Iran, military juntas in Latin America, nuclear rivals India and Pakistan, and even the Vatican.
But what none of its customers ever knew was that Crypto AG was secretly owned by the CIA in a highly classified partnership with West German intelligence. These spy agencies rigged the company’s devices so they could easily break the codes that countries used to send encrypted messages. The decades-long arrangement, among the most closely guarded secrets of the Cold War, is laid bare in a classified, comprehensive CIA history of the operation obtained by The Washington Post and ZDF, a German public broadcaster, in a joint reporting project. The account identifies the CIA officers who ran the program and the company executives entrusted to execute it. It traces the origin of the venture as well as the internal conflicts that nearly derailed it. It describes how the United States and its allies exploited other nations’ gullibility for years, taking their money and stealing their secrets.
The operation, known first by the code name “Thesaurus” and later “Rubicon”, ranks among the most audacious in CIA history. “It was the intelligence coup of the century”, the CIA report concludes. “Foreign governments were paying good money to the U.S. and West Germany for the privilege of having their most secret communications read by at least two (and possibly as many as five or six) foreign countries.” From 1970 on, the CIA and its code-breaking sibling, the National Security Agency, controlled nearly every aspect of Crypto’s operations—presiding with their German partners over hiring decisions, designing its technology, sabotaging its algorithms and directing its sales targets. Then, the U.S. and West German spies sat back and listened. They monitored Iran’s mullahs during the 1979 hostage crisis, fed intelligence about Argentina’s military to Britain during the Falklands War, tracked the assassination campaigns of South American dictators and caught Libyan officials congratulating themselves on the 1986 bombing of a Berlin disco.
…The German spy agency, the BND, came to believe the risk of exposure was too great and left the operation in the early 1990s. But the CIA bought the Germans’ stake and simply kept going, wringing Crypto for all its espionage worth until 2018, when the agency sold off the company’s assets, according to current and former officials.
…This story is based on the CIA history and a parallel BND account, also obtained by The Post and ZDF, interviews with current and former Western intelligence officials as well as Crypto employees. Many spoke on the condition of anonymity, citing the sensitivity of the subject. It is hard to overstate how extraordinary the CIA and BND histories are. Sensitive intelligence files are periodically declassified and released to the public. But it is exceedingly rare, if not unprecedented, to glimpse authoritative internal histories of an entire covert operation. The Post was able to read all of the documents, but the source of the material insisted that only excerpts be published.
[One of Polygon’s oral histories: this feature chronicles the inception and R&D of the Kinect motion-tracking device, Microsoft’s bold experiment in revolutionizing UI by shipping tens of millions of units for the Xbox, and its gradual fade-out and eventual death. While beloved by many, finding niches in everything from hospitals to nursing homes to research labs to art installations (making the Kinect the Velvet Underground of peripherals?), and powering unique games, it just couldn’t quite find its footing despite selling >10 million units (perhaps a cautionary tale for VR enthusiasts).]
“Old CSS, New CSS”, (2020-02-01):
[Why is web programming so screwed up? A highly-opinionated history of how worse-is-better played out online from 1995 to now, by a programmer who started writing HTML ~1996 and has seen the evolution of it all up close: HTML was never designed to support even 1% of the things it is expected to do, requiring gruesome workarounds like tables for positioning anything or using images for rounded corners, and has been constantly extended with ad hoc and poorly-thought-through capabilities, sabotaged further by the exigencies of history like the ‘browser wars’ between Netscape & Microsoft, and then Microsoft simply killing Internet Explorer (IE) development for several years after achieving a near-total global monopoly. With a vast amount of work, HTML/CSS can now support many desirable web pages, but the historical legacy continues to live on, in the use of now-obsolete workarounds, features which no one uses, strange inconsistencies & limitations, etc.]
“Predicting the outcome of roulette”, (2012-04-28):
There have been several popular reports of various groups exploiting the deterministic nature of the game of roulette for profit. Moreover, through its history the inherent determinism in the game of roulette has attracted the attention of many luminaries of chaos theory. In this paper we provide a short review of that history and then set out to determine to what extent that determinism can really be exploited for profit. To do this, we provide a very simple model for the motion of a roulette wheel and ball and demonstrate that knowledge of initial position, velocity and acceleration is sufficient to predict the outcome with adequate certainty to achieve a positive expected return. We describe two physically realisable systems to obtain this knowledge both incognito and in situ. The first system relies only on a mechanical count of rotation of the ball and the wheel to measure the relevant parameters. By applying these techniques to a standard casino-grade European roulette wheel we demonstrate an expected return of at least 18%, well above the -2.7% expected of a random bet. With a more sophisticated, albeit more intrusive, system (mounting a digital camera above the wheel) we demonstrate a range of systematic and statistically-significant biases which can be exploited to provide an improved guess of the outcome. Finally, our analysis demonstrates that even a very slight slant in the roulette table leads to a very pronounced bias which could be further exploited to substantially enhance returns.
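The two headline figures in the abstract (−2.7% for a random bet, +18% with prediction) follow from simple expected-value arithmetic. A minimal sketch, assuming a European wheel (37 pockets) and a single-number bet paying 35:1; the exact betting strategy in the paper may differ:

```python
def expected_return(p_win: float, payout: int = 35) -> float:
    """Expected profit per unit staked: win `payout` with probability p_win,
    lose the unit stake otherwise."""
    return p_win * payout - (1 - p_win)

# A random single-number bet on a European wheel: p = 1/37.
random_ev = expected_return(1 / 37)  # = -1/37 ~= -0.027, the -2.7% quoted above

# Hit probability a predictor would need for an +18% expected return:
# solve p*35 - (1-p) = 0.18  =>  p = 1.18 / 36 ~= 0.0328 (vs 1/37 ~= 0.0270)
needed_p = 1.18 / 36

print(f"random bet EV: {random_ev:+.4f}")
print(f"hit rate needed for +18%: {needed_p:.4f} (random: {1/37:.4f})")
```

The striking point is how small the required edge is: raising the hit probability from ~2.70% to ~3.28% (about a 21% relative improvement in prediction) already flips the game from a −2.7% loss to an +18% expected gain, which is why even crude physical measurements of ball and wheel rotation suffice.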
2005-vanderkloot.pdf: “Lawrence Bragg's role in the development of sound-ranging in World War I”, (2005-09-06; ):
In 1915, when Lawrence Bragg was a 25-year-old Second Lieutenant in the Royal Horse Artillery, seconded to ‘Maps GHQ’, he learned that he and his father had shared the Nobel Prize in physics. Lawrence’s equation was crucial for winning the prize and he had been wounded by his father’s early dissemination of their work with casual attribution to ‘my son’. Lawrence was responsible for developing methods for pinpointing the position of enemy artillery pieces by recording the boom of their firing with an array of microphones. It was a simple idea but difficult to implement. Step by step, Bragg and the group he assembled solved the problems and developed a system that worked. Sound-ranging was valuable in the British victory at Cambrai in 1917 and vital for that at Amiens in 1918: the ‘black day of the German Army’. He received the MC and the OBE. His Army service manifested both his scientific leadership and administrative skills, which culminated in the demonstrations of the validity of the dream he enunciated in his Nobel lecture: that X-rays could be used to resolve the structure of the most complicated molecules.
[Profile of Judy Sheindlin, star of the long-running (>23 years) daytime television show Judge Judy where, as an arbitrator, she berates litigants and resolves an endless line of small-claims cases. This article covers her biography as she evolved from an ambitious young Jewish woman in NYC who entered corporate law but left to become a stay-at-home mom and ultimately a reality-show star running, after >5000 episodes, a finely-tuned machine for dragnetting cases from across the country, making a fortune from royalties and renewals: her net worth is anywhere up to $400 million.]
2008-alper.pdf: “Anesthetizing the Public Conscience: Lethal Injection and Animal Euthanasia”, (2008-08-05; ):
Lawyers challenging lethal injection on behalf of death row inmates have frequently argued that lethal injection protocols do not comport with standard practices for the euthanasia of animals. This article studies state laws governing animal euthanasia and concludes that many more states than have previously been recognized ban the use of paralyzing agents in animal euthanasia. In fact, 97.6% of lethal injection executions in this country have taken place in states that have banned, for use in animal euthanasia, the same drugs that are used in those states during executions. Moreover, a study of the legislative history of state euthanasia laws reveals that the concerns raised about paralyzing drugs in the animal euthanasia context are identical in many ways to the concerns that lawyers for death row inmates are currently raising about the use of those drugs in the lethal injection executions of human beings. This article takes an in depth look at animal euthanasia and its relationship to lethal injection by examining in Part I the history and origins of the paralyzing drugs that veterinarians and animal welfare experts refuse to allow in animal euthanasia; in Part II the standards of professional conduct for veterinary and animal shelter professionals; in Part III, the state laws and regulations governing animal euthanasia; and finally in Part IV, the legislative history that led to the enactment of the various states’ animal euthanasia laws and regulations.
[Keywords: death penalty, lethal injection, animal euthanasia, capital punishment.]
In the late 1970s, when Texas was considering whether to adopt Oklahoma’s three-drug lethal injection formula for the execution of prisoners, Dr. Ralph Gray, the doctor in charge of medical care in Texas prisons, consulted with a Texas veterinarian named Dr. Gerry Etheredge.1 Dr. Etheredge told Dr. Gray that veterinarians used an overdose of one drug, an anesthetic called sodium pentobarbital, to euthanize animals and that it was a “very safe, very effective, and very cheap” method of euthanasia.2 Dr. Etheredge remembers that Dr. Gray had only one objection to using a similar method to execute human beings. “He said it was a great idea”, Dr. Etheredge recalled, “except that people would think we are treating people the same way that we’re treating animals. He was afraid of a hue and cry.”3 Texas rejected Dr. Etheredge’s one-drug, anesthetic-only recommendation and, in 1982, became the first state to actually use lethal injection—via the three-drug formula—as a method of execution.4 This history is almost hard to believe in light of the fact that three decades later, death row inmates in Texas, as well as in nearly every other death penalty state, are challenging the three-drug formula on the grounds that the method is less reliable, and therefore less humane, than the method used to euthanize animals.5
…It was through the use of curare in vivisection that people began to consider the implications of what curare did not do, namely serve any anesthetic function. While curare inhibits all voluntary movement, it does nothing at all to affect consciousness, cognition, or the ability to feel pain.46…Dr. Hoggan described the experience of a dog subjected to vivisection while paralyzed by curare.51 Curare, he testified, was used to:
render [the] dog helpless and incapable of any movement, even of breathing, which function was performed by a machine blowing through its windpipe. All this time, however, its intelligence, its sensitiveness, and its will, remained intact . . . . In this condition the side of the face, the interior of the belly, and the hip, were dissected out . . . continuously for ten consecutive hours . . . .52
In 1868, the Swedish physiologist A. F. Holmgren condemned curare as “the most cruel of all poisons.”53…in 1864 Claude Bernard offered another description of such a deceptively peaceful death:
A gentle sleep seems to occupy the transition from life to death. But it is nothing of the sort; the external appearances are deceitful. . . . [I]n fact . . . we discover that this death, which appears to steal on in so gentle a manner and so exempt from pain is, on the contrary, accompanied by the most atrocious sufferings that the imagination of man can conceive.81
No inmate has ever survived a botched lethal injection, so we do not know what it feels like to lie paralyzed on a gurney, unable even to blink an eye, consciously suffocating, while potassium burns through the veins on its way to the heart, until it finally causes cardiac arrest. But aided by the accounts of people who have suffered conscious paralysis on the operating table, one can begin to imagine.
[Discussion of “inverse p-zombies” via excerpts of “Inverse zombies, anesthesia awareness, and the hard problem of unconsciousness”, Mashour & LaRock 2008: the problem of telling when someone is conscious but otherwise appears and acts unconscious, a problem of particular concern in anesthesia for surgery—anesthesia occasionally fails, resulting in ‘anesthesia awareness’, leaving the patient fully conscious and feeling every last bit of the surgery, as they are completely paralyzed but are cut open and operated on for hours, which they describe as being every bit as horrific as one would think, leading to tortured memories and PTSD symptoms. Strikingly, death row executions by lethal injection use a cocktail of chemicals which are almost designed to produce this (rather than the simple single reliable drug universally used for euthanasia by veterinarians), suggesting that, as peaceful as the executions may look, the convicts may actually be enduring extraordinary agony and terror during the several minutes it takes to kill them.
Further, anesthesia appears to often operate by erasing memories, so it is possible that anesthesia awareness during surgery is much more common than realized, and underestimated because the victims’ long-term memories are blocked from forming. There are some indications that surgery is associated with bad psychiatric symptoms even in cases where the patient does not recall any anesthesia awareness, suggesting that the trauma is preserved in other parts of the mind.
While doctors continue to research the problem of detecting consciousness, it is far from solved. Most people, confronted with a hypothetical about getting money in exchange for being tortured but then administered an amnesiac, would say that the torture is an intrinsically bad thing even if it is then forgotten; but perhaps we are, unawares, making the opposite choice every time we go in for surgery under general anesthesia?]
1978-dennett.pdf: “Why you can't make a computer that feels pain”, Daniel C. Dennett
“A Martian Sends a Postcard Home [with commentary]”, (2016-03-17):
Craig Raine is a British poet, born in 1944, who is known as an exponent of “Martian poetry”, by which is meant the expression of familiar concepts in unfamiliar ways. The term derived from his poem “A Martian Sends a Postcard Home”, which was first published in the New Statesman in 1977.
One does not need to believe in Martians to enjoy this poem, only in the concept of being able to perceive human behaviour and institutions with complete detachment, as though one had never come across them before. Or rather, as Craig Raine does, to express one’s impressions of humanity in terms that seem strange and puzzling at first and need a little working out before one realises what it is to which the poet is referring. It is in working out the puzzles that the reader derives a lot of fun from this poem.
“Does Who You Are at 7 Determine Who You Are at 63?: In 1964, with Seven Up! Michael Apted stumbled into making what has become the most profound documentary series in the history of cinema. Fifty-five years later, the project is reaching its conclusion.”, (2019-11-27):
[This is a unique and very well-done take on Seven Up. It has the added dimension of how much of a learning process this was for Apted (the director), as well as other aspects that I hadn’t picked up from other sources.
I first discovered this series a few weeks ago. I found the idea fascinating and I expected to be keen to dig into it. So I read some of the pieces that I found online and I stopped. My expectation of wanting to dive into this living soap opera turned into a feeling of bleak depression. Part of it has something to do with the blandness of even the happiest of near-endings, and part of it has something to do with the sadness of seeing a seven year old quickly progress in age to that point in life when we’re sort of forced to evaluate who and where we are. It was far too quick of a journey for me. Their lives are presented like a history book, that places an emphasis on wars and other human struggles. It’s also similar to a newscast: the bad news overwhelms and the good news is boring, so it doesn’t get much attention. It’s a CV that demands to know “what have you done with your life?” in a series of bullet points that skews toward points of merit.
I suppose that part of my feeling has to do with the fact that I’m at that point in life myself. Family and peers are getting sick and dying. I’ll be doing the same. A lot of us aren’t mentally prepared for what it’s really like to be here. I think I’ve been working my way through it pretty well, but it takes a lot of emotional and philosophical work that we may not have a lot of experience with.
For me, Seven Up pokes and prods at life’s battle wounds without enough attention to the boring bits that may actually dominate a life, which might be where our focus needs to be if we’re to attain the contentment that should perhaps be our goal, whatever our class.]
“The fascinating and ego-killing existence of human wormholes”, (2016-06-15):
He (Medicine Crow) was a fascinating man, not just for what he did but also for what he represents to us now. He was, to use a phrase coined by Jason Kottke, a “human wormhole.” His unusually long life is a reminder of how connected the past and present really are.
A curator at the Smithsonian described meeting Medicine Crow as “you’re shaking hands with the 19th century.” Which is an amazing concept. A few intrepid historians on reddit recently discovered an even more amazing one, calculating that it would take a chain of just six individuals who shook hands with one another to connect Barack Obama to George Washington across the centuries (Obama → Queen Elizabeth II → Herbert Hoover → William H. Taft → Benjamin Harrison → William Henry Harrison → Benjamin Harrison V. → George Washington).
I’ve become fascinated with discovering and tracking some of these reminders. For some time now, I’ve kept a file of them on 4×6 notecards in my house. My friends and I email these moments to each other as we find them—some absurd (Oscar Wilde and Walt Whitman may have hooked up), coincidental (Orson Welles claimed to have been in the Biograph Theater in Chicago where John Dillinger was killed by the FBI) and some that are so unbelievable that they might just blow your mind (there’s a video from a 1956 CBS game show, I’ve Got a Secret, with a very old guest whose secret was that he was in Ford’s Theatre when Lincoln was assassinated. Appearing with him on the show? Lucille Ball.)
Here in modern life, it’s easy to think the past is dead and distant, until we bump up against the reality of Faulkner’s admonition that it’s not really even past. England’s government only recently paid off debts it incurred as far back as 1720 from events like the South Sea Bubble, the Napoleonic wars, the empire’s abolition of slavery, and the Irish potato famine—meaning that for more than a decade and a half of the twenty-first century there was still a direct and daily connection to the eighteenth and nineteenth centuries. (The US is still paying pensions related to both the Civil War and the Spanish-American War.)
…Did you know that Tom Pratt, a football coach whose team the Arizona Cardinals narrowly missed going to the Super Bowl in 2015, was also on the coaching staff for the Kansas City Chiefs in the very first Super Bowl fifty years ago? Or that there are whales alive today who were born before Melville published Moby Dick? Or that the world’s oldest tortoise, Jonathan, lives on an island in the Atlantic and is 183 years old? Or that President John Tyler, born in 1790, who took office just ten years after little Jonathan was born, still has living grandchildren?
War is perhaps the strangest source of these anomalies. Did you know that Winston Churchill and James Bond creator Ian Fleming’s father fought in the same unit in WWI? When Fleming’s father was killed, Churchill wrote his obituary. General Simon Bolivar Buckner was a Confederate general in the Civil War (he surrendered to Grant at Fort Donelson). His son Simon Bolivar Buckner Jr also became a General, and he died at Okinawa some 83 years later. General MacArthur—his father, Arthur MacArthur, Jr.—was a Civil War hero for the Union. Stonewall Jackson had a granddaughter who lived to be 104. She died in 1991.
In high school, a promising young student at the Virginia Military Institute named George Marshall petitioned the president for a military commission. Which president did the creator of the Marshall Plan petition? William McKinley (just months before the man’s life was cut short by an assassin’s bullet). And most unbelievably, what of the fact that Robert Todd Lincoln was present as his father died from an assassin’s bullet, was at the train station when President James Garfield was assassinated, and was in attendance at the event at which McKinley was assassinated? Three assassinations, spread out over 40 years. Robert Todd Lincoln himself lived to be 82, dying in 1926. He could have read stories published by F. Scott Fitzgerald. He drove in a car. He talked on the telephone. He would have heard jazz music.
And these are just the events of so-called modern history.
We forget that woolly mammoths walked the earth while the pyramids were being built. We don’t realize that Cleopatra lived closer to our time than she did to the construction of those famous pyramids that marked her kingdom. We forget that Ovid and Jesus were alive at the same time. When British workers excavated the land in Trafalgar Square to build Nelson’s Column and its famous bronze lions, in the ground they found the bones of actual lions, who’d roamed that exact spot just a few thousand years before.
[Explanation of XKCD #1393, “Timeghost”:]
Megan has been haunted by a Timeghost for some time. It is obviously not the first time the ghost has arrived to let Megan know that “…ooOOOOOOOOooo… Tiiiime is passiiiing!” The ghost is dedicated to making people feel old by having them think about the passage of time. It is shown to reference time periods related to well-known people and events, such as famous actors and the release of movies and TV shows. Megan is just annoyed that it is back and wishes it would go away…But one thing about the prediction is true—they will eventually die. And this is the scary part about realizing how old you are and that you are quickly getting older: You will die, and “soon” (for some value thereof).
The comic seems to be using “factoid” to mean a small fact. “Factoid” can also mean a “questionable or spurious statement presented as a fact”, but that does not seem to be the intended usage here. In this instance, some of the factoids are easily verifiable, while others are reasonable assumptions based on the number of years passed since the individual events. Several sources advocate using the word “factlet” for a brief interesting fact, reserving “factoid” for unverifiable or untrue statements passed off as fact…“Timeghost” might be a literal interpretation of ‘Zeitgeist’, a German term for “spirit of the time” that refers to the school of thought which influences or dominates the art and culture of a time period. All the events and people mentioned in this comic may be considered influences on present-day art and culture.
Randall has covered making people feel old several times in 647: Scary, 891: Movie Ages, 973: MTV Generation (in which White Hat utters Cueball’s “That can’t be right” line), and 1477: Star Wars. Also see the blag post Odd Temporal Milestones. This is, however, so far the only one that makes a prediction of anyone’s death. A similar ghost with a much different agenda was seen in 1108: Cautionary Ghost. Similarly annoying fact(oids) were given in 1272: Shadowfacts.
“The Great Buenos Aires Bank Heist: They were an all-star crew. They cooked up the perfect plan. And when they pulled off the caper of the century, it made them more than a fortune—it made them folk heroes.”, (2020-02-20):
For more than six hours, the nation was transfixed. The police had nicknamed Walter “the Man in the Gray Suit.” He was instantly famous. The hostages, Walter said, were being treated well. The mood inside seemed oddly ebullient: At one point, Walter and another robber could be heard singing “Happy Birthday” to a bank employee whose phone had been buzzing with birthday messages from friends and family. At 3:30 in the afternoon, Walter asked for pizzas; the hostages were hungry, he said. Then, only a few minutes later, Walter went silent. For over three hours, police leaders and city officials fretted over what to do as further attempts to reach Walter failed. Finally a team of special-forces officers took up position outside the bank. At 7PM, they burst inside. But there was no shoot-out, no commotion. And no sign of the thieves. The hostages were dispersed on three floors—the lobby level, a mezzanine space, and down in a basement conference room, which had been locked from the inside. They were all unharmed.
It wasn’t until detectives reached the basement that they discovered what the robbers had truly been after. There, in the expanse of the bank’s subterranean level, hundreds of reinforced-steel safe-deposit boxes lined the walls. And in a place like San Isidro, at a time like 2006, those boxes represented a veritable treasure trove. Argentines are uniquely distrustful of their banks, and for good reason. They’ve been betrayed by them, over and over. Most famously in 2001, when the collapse of the national banking system, known as the corralito, erased entire fortunes, affecting millions. With no faith in accounts, bank customers began tucking their savings—their cash, jewelry, and other valuables—into safe-deposit boxes. And this particular bank, situated in one of the richest enclaves of Argentina, must have seemed especially enticing, flush as its deposit boxes were sure to be with the fortunes of the city’s most well-to-do.
Somehow the thieves had smashed open a huge number of the boxes—143 of the bank’s 400—and cleaned them out. But what exactly they’d grabbed, or where they’d gone, was a mystery. Cops swept every inch of the bank’s three floors but failed to locate a single member of the gang. The bank had only two exits—both of which had been covered by police since the siege began. All of the building’s windows were intact. And the robbers were not hiding among the hostages. They’d simply vanished. The thieves had left a few things behind. Detectives found a battery pack, a tool that they surmised had been used to crack the boxes, a row of toy guns laid neatly on the floor, and a note, taped to the wall above the toys. It was handwritten and must have seemed like a taunt: “In a neighborhood of rich people, without weapons or grudges, it’s just money, not love.”
[JPL-sponsored Art Deco/WPA poster series with the concept of advertising travel in the Solar System & to exoplanets; public domain & free to download/print.]
A creative team of visual strategists at JPL, known as “The Studio”, created the poster series, which is titled “Visions of the Future.” Nine artists, designers, and illustrators were involved in designing the 14 posters, which are the result of many brainstorming sessions with JPL scientists, engineers, and expert communicators. Each poster went through a number of concepts and revisions, and each was made better with feedback from the JPL experts.
David Delgado, creative strategy: “The posters began as a series about exoplanets—planets orbiting other stars—to celebrate NASA’s study of them. (The NASA program that focuses on finding and studying exoplanets is managed by JPL.) Later, the director of JPL was on vacation at the Grand Canyon with his wife, and they saw a similarly styled poster that reminded them of the exoplanet posters. They suggested it might be wonderful to give a similar treatment to the amazing destinations in our solar system that JPL is currently exploring as part of NASA. And they were right! The point was to share a sense of things on the edge of possibility that are closely tied to the work our people are doing today. The JPL director has called our people ‘architects of the future.’ As for the style, we gravitated to the style of the old posters the WPA created for the national parks. There’s a nostalgia for that era that just feels good.”
Joby Harris, illustrator: “The old WPA posters did a really great job delivering a feeling about a far-off destination. They were created at a time when color photography was not very advanced, in order to capture the beauty of the national parks from a human perspective. These posters show places in our solar system (and beyond) that likewise haven’t been photographed on a human scale yet—or in the case of the exoplanets might never be, at least not for a long time. It seemed a perfect way to help people imagine these strange, new worlds.”
David Delgado: “The WPA poster style is beloved, and other artists have embraced it before us. Our unique take was to take one specific thing about the place and focus on the science of it. We chose exoplanets that had really interesting, strange qualities, and everything about the poster was designed to amplify the concept. The same model guided us for the posters that focus on destinations in the solar system.”
Lois Kim, typography: “We worked hard to get the typography right, since that was a very distinctive element in creating the character of those old posters. We wanted to create a retro-future feel, so we didn’t adhere exactly to the period styles, but they definitely informed the design. The Venus poster has a very curvy, flowy font, for example, to evoke a sense of the clouds.”
Handel’s tale of intrigue and impropriety in ancient Rome arrives in cinemas on February 29, with star mezzo-soprano Joyce DiDonato as the controlling, power-hungry Agrippina and Harry Bicket conducting. Sir David McVicar’s production ingeniously reframes the action of this black comedy about the abuse of power to “the present”, where it should loudly resonate. The all-star cast features mezzo-soprano Kate Lindsey as Agrippina’s son and future emperor Nerone, soprano Brenda Rae as the seductive Poppea, countertenor Iestyn Davies as the ambitious officer Ottone, and bass Matthew Rose as the weary emperor Claudius. This live cinema transmission is part of the Met’s award-winning Live in HD series, bringing opera to more than 2,200 theaters in more than 70 countries worldwide.
This production was originally created by the Théâtre Royal de la Monnaie / De Munt Brussels and adapted by the Metropolitan Opera. Sung in Italian. Estimated Run Time: 3 hrs 35 mins.
- Conductor: Harry Bicket
- Narciso: Nicholas Tamagna
- Poppea: Brenda Rae
- Agrippina: Joyce DiDonato
- Nerone: Kate Lindsey
- Ottone: Iestyn Davies
- Pallante: Duncan Rock
- Claudio: Matthew Rose
World Premiere: Teatro San Giovanni Crisostomo, Venice, 1709
This early Italian opera was a success that secured Handel’s international reputation and played a large role in paving the way for his lucrative, high-profile subsequent career in London. While he continued to develop artistically for the next 50 years, his entire life’s genius is perfectly evident in this first great operatic accomplishment. Even today, the issues at stake in Agrippina—the power plays, sexual politics, and cults of personality played out against a fickle public—continue to resonate.