“Multimodal Neurons in Artificial Neural Networks [CLIP]”, (2021-03-04):
[Investigation of CLIP activations: CLIP detects a wide variety of entities, like Spiderman, Lady Gaga, or Halle Berry, in a variety of media, such as photos, (images of) text, people in costumes, drawings, or just similar terms; previous cruder smaller NNs lacked this ‘conceptual’ level, only responding to the exact person’s photograph.
CLIP neurons further specialize in famous individuals, human emotions, religions, human attributes such as age/gender/facial-features, geographic regions (down to specific cities), holidays, art styles (such as anime vs painting), media franchises (Pokemon, Star Wars, Minecraft, Batman etc), brands, images of text, and abstract concepts like ‘star’ or ‘LGBTQ+’ or numbers or time or color. Such conceptual neurons also have ‘opposite’ neurons, like Donald Trump vs “musicians like Nicki Minaj and Eminem, video games like Fortnite, civil rights activists like Martin Luther King Jr., and LGBT symbols like rainbow flags.” The capabilities are best in English, but there are limited foreign-language capabilities as well.
Given the ‘conceptual’ level of neurons, it’s not too surprising that the overloaded/entangled/“polysemantic” neurons that Distill.pub has documented in VGG16 (which appear undesirable and to reflect the crudity of the NN’s knowledge) are much less present in CLIP, and the neurons appear to learn much cleaner concepts.
The power of the zero-shot classification, and the breadth of CLIP’s capabilities, can lead to some counterintuitive results, like their discovery of what they dub typographic attacks: writing “iPod” on a piece of paper and sticking it on the front of a Granny Smith apple can lead to the text string “iPod” being much more ‘similar’ to the image than the text string “Granny Smith”.
“CLIP: Connecting Text and Images: We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision. CLIP can be applied to any visual classification benchmark by simply providing the names of the visual categories to be recognized, similar to the “zero-shot” capabilities of GPT-2 and GPT-3.”, (2021-01-05):
[CLIP paper] We present a neural network that aims to address these problems: it is trained on a wide variety of images with a wide variety of natural language supervision that’s abundantly available on the internet. By design, the network can be instructed in natural language to perform a great variety of classification benchmarks, without directly optimizing for the benchmark’s performance, similar to the “zero-shot” capabilities of GPT-2 and GPT-3. This is a key change: by not directly optimizing for the benchmark, we show that it becomes much more representative: our system closes this “robustness gap” by up to 75% while matching the performance of the original ResNet50 on ImageNet zero-shot without using any of the original 1.28M labeled examples.
Approach: We show that scaling a simple pre-training task is sufficient to achieve competitive zero-shot performance on a great variety of image classification datasets. Our method uses an abundantly available source of supervision: the text paired with images found across the internet. This data is used to create the following proxy training task for CLIP: given an image, predict which, out of a set of 32,768 randomly sampled text snippets, was actually paired with it in our dataset.
In order to solve this task, our intuition is that CLIP models will need to learn to recognize a wide variety of visual concepts in images and associate them with their names. As a result, CLIP models can then be applied to nearly arbitrary visual classification tasks. For instance, if the task of a dataset is classifying photos of dogs vs cats we check for each image whether a CLIP model predicts the text description “a photo of a dog” or “a photo of a cat” is more likely to be paired with it.
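The zero-shot recipe above—embed the image, embed one caption per class, pick the closest—can be sketched in a few lines. This is a minimal illustration using made-up toy embeddings, not CLIP’s actual encoders:

```python
import numpy as np

def zero_shot_classify(image_emb, class_names, text_embs):
    """Pick the class whose text embedding is most similar to the image embedding."""
    # Normalize so the dot product is cosine similarity, as CLIP uses.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img
    return class_names[int(np.argmax(sims))]

# Toy stand-in embeddings (a real system would use CLIP's image & text encoders).
classes = ["a photo of a dog", "a photo of a cat"]
text_embs = np.array([[1.0, 0.1], [0.1, 1.0]])
image_emb = np.array([0.2, 0.9])   # closer to the "cat" prompt's direction
print(zero_shot_classify(image_emb, classes, text_embs))  # → a photo of a cat
```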
- CLIP is highly efficient…In the end, our best performing CLIP model trains on 256 GPUs for 2 weeks which is similar to existing large scale image models.
- CLIP is flexible and general: Because they learn a wide range of visual concepts directly from natural language, CLIP models are substantially more flexible and general than existing models. We find they are able to zero-shot perform many different tasks. To validate this we have measured CLIP’s zero-shot performance on over 30 different datasets including tasks such as fine-grained object classification, geo-localization, action recognition in videos, and OCR. [While CLIP’s zero-shot OCR performance is mixed, its semantic OCR representation is quite useful. When evaluated on the SST-2 NLP dataset rendered as images, a linear classifier on CLIP’s representation matches a CBoW model with direct access to the text. CLIP is also competitive at detecting hateful memes without needing ground truth text.] In particular, learning OCR is an example of an exciting behavior that does not occur in standard ImageNet models.
…CLIP allows people to design their own classifiers and removes the need for task-specific training data. [See also “AudioCLIP: Extending CLIP to Image, Text and Audio”, Guzhov et al 2021; CLIP notebook compilation for art, “Alien Dreams: An Emerging Art Scene”/“AI Generated Art Scene Explodes as Hackers Create Groundbreaking New Tools”.]
“Evolving Reinforcement Learning Algorithms”, (2021-01-08):
[Blog] We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapping from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
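For concreteness, the loss the search rediscovers when learning from scratch is the familiar squared TD error for a value-based agent. A minimal tabular sketch (the Q-table, state, and reward values are invented for illustration):

```python
import numpy as np

def td_loss(q, s, a, r, s_next, gamma=0.99):
    """Squared TD error: the DQN-style loss the graph search rediscovers."""
    target = r + gamma * np.max(q[s_next])   # bootstrapped one-step target
    return (q[s, a] - target) ** 2

# Toy tabular Q-function on a 3-state, 2-action problem.
q = np.array([[0.0, 1.0],
              [0.5, 0.2],
              [0.0, 0.0]])
loss = td_loss(q, s=0, a=1, r=1.0, s_next=1)
# target = 1.0 + 0.99 * 0.5 = 1.495; loss = (1.0 - 1.495)^2 = 0.245025
```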
2021-scanlon.pdf: “Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain”, (2021-03-08):
[Blog] Preventing and mitigating high severity collisions is one of the main opportunities for Automated Driving Systems (ADS) to improve road safety.
This study evaluated the Waymo Driver’s performance within real-world fatal collision scenarios that occurred in a specific operational design domain (ODD). To address the rare nature of high-severity collisions, this paper describes the addition of novel techniques to established safety impact assessment methodologies.
A census of fatal, human-involved collisions was examined for years 2008 through 2017 for Chandler, AZ, which overlaps the current geographic ODD of the Waymo One fully automated ride-hailing service. Crash reconstructions were performed on all available fatal collisions that involved a passenger vehicle as one of the first collision partners and an available map in this ODD to determine the pre-impact kinematics of the vehicles involved in the original crashes. The final dataset consisted of a total of 72 crashes and 91 vehicle actors (52 initiators and 39 responders) for simulations.
Next, a novel counterfactual “what-if” simulation method was developed to synthetically replace human-driven crash participants one at a time with the Waymo Driver. This study focused on the Waymo Driver’s performance when replacing one of the first two collision partners.
The results of these simulations showed that the Waymo Driver was successful in avoiding all collisions when replacing the crash initiator, that is, the road user who made the initial, unexpected maneuver leading to a collision. Replacing the driver reacting (the responder) to the actions of the crash initiator with the Waymo Driver resulted in an estimated 82% of simulations where a collision was prevented and an additional 10% of simulations where the collision severity was mitigated (reduction in crash-level serious injury risk). The remaining 8% of simulations with the Waymo Driver in the responder role had a similar outcome to the original collision. All of these “unchanged” collisions involved both the original vehicle and the Waymo Driver being struck in the rear in a front-to-rear configuration.
These results demonstrate the potential of fully automated driving systems to improve traffic safety compared to the performance of the humans originally involved in the collisions. The findings also highlight the major importance of driving behaviors that prevent entering a conflict situation (eg. maintaining safe time gaps and not surprising other road users). However, methodological challenges in performing single instance counterfactual simulations based solely on police report data and uncertainty in ADS performance may result in variable performance, requiring additional analysis and supplemental methodologies.
This study’s methods provide insights on rare, severe events that would otherwise only be experienced after operating in extreme real-world driving distances (many billions of driving miles).
But while we often hear that autonomous driving technology could make a dramatic difference, before today there has not been a published scenario-based study that we’re aware of that looks into how autonomous technology performs in scenarios that led to fatal crashes by human drivers.
Today, we’re releasing the results of a study into how the Waymo Driver might perform in such tragic situations, which builds on the research that we released in October. While the October study showed that the Waymo Driver was only involved in minor collisions over more than 6 million miles driven in reality on public roads, our most recent study shows how the Waymo Driver likely would have performed in the majority of fatal crashes that occurred on the same roads over a 10 year period. The results are encouraging.
…For our analysis, we collected information on every fatal crash that took place in Chandler, Arizona between 2008–2017. We excluded crashes that didn’t match situations that the Waymo Driver would face in the real world today, such as when crashes occurred outside of our current operating domain. Then, the data was used to carefully reconstruct each crash using best-practice methods. Once we had the reconstructions, we simulated how the Waymo Driver might have performed in each scenario.
In total, the simulated Waymo Driver completely avoided or mitigated 100% of crashes aside from the crashes in which it was struck from behind, including every instance that involved a pedestrian or cyclist (20 simulations in total). This is the first time an autonomous technology company has shared its evaluation for how the system might perform in real-world fatal crash scenarios.
…In other words, even when a human driver did something to initiate a crash, such as running a red light, the simulated Waymo Driver avoided or mitigated the vast majority of these fatal crashes.
Replaying the same scenario discussed above, for example, the simulated Waymo Driver is approaching from the right of the screen, and has the right of way at a green light. But as it approaches the intersection, it spots the speeding car approaching from the bottom, predicts that it isn’t going to stop at the red light, and slows considerably until the speeder passes, avoiding the crash:
You can read more about our methodologies and our full results in our academic paper.
“ML Scaling subreddit”, (2020-10-30):
Subreddit for discussing AI, machine learning, or deep learning approaches involving big numbers: billions of parameters, millions of n, petaflops, etc. eg GPT-3. Most research is conducted at much smaller scale; this subreddit is for research analogous to ‘high energy physics’, requiring specialized approaches, large investments, consortia, etc.
Topics: How? Who? Why do they work? What are they good for? What resources are available? Who will pay & how? What is the future of such approaches? What global consequences will there be?
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a control environment, that is the highly curated dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset.
In this work, we explore if self-supervision lives up to its expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real world setting. Interestingly, we also observe that self-supervised models are good few-shot learners achieving 77.9% top-1 with access to only 10% of ImageNet. Code: this URL. [See also “DetCon: Efficient Visual Pretraining with Contrastive Detection”, Hénaff et al 2021; “DINO: Emerging Properties in Self-Supervised Vision Transformers”, Caron et al 2021]
“Learning from videos to understand the world”, (2021-03-12):
- Today, we’re announcing a project called “Learning from Videos”, designed to automatically learn audio, textual, and visual representations from the data in publicly available videos uploaded to Facebook.
- By learning from videos spanning nearly every country and hundreds of languages, this project will not just help us continuously improve our core AI systems for applications like content recommendation and policy enforcement—it will enable entirely new experiences.
- This is also part of our broader efforts toward building machines that learn like humans do—from any example, not just ones where experts have labeled.
- The first application is now live in Instagram Reels’ [TikTok-style 15s-long videos] recommendation system.
…Although we’ve just scratched the surface, using semi-supervised learning and Generalized Data Transformations (GDT), a state-of-the-art self-supervised framework for video understanding, we’ve built and deployed an AI model in Instagram Reels’ recommendation system. Within six months of developing it, learning on the videos uploaded to Facebook has already improved our computer vision and speech recognition systems. And this is just the beginning of our Learning from Videos project. Early experiments in applying wav2vec 2.0 to real-world videos also show a 20% reduction in speech recognition errors, which could improve a wide range of applications like auto-captioning and tasks that help flag harmful content like hate speech. And we’re researching ways to apply new capabilities, like multimodal video retrieval, in order to make it easier for people to surface key moments in time from their trove of digital memories.
Improving Reels recommendations with self-supervision: Finding similar Reels fits particularly well with self-supervised models because Reels tend to be highly stylized, featuring common patterns across trendy videos. Popular videos often consist of the same music set to the same dance moves, but created and acted by different people. Self-supervised models automatically learn “themes”, group them together, and implicitly make them available to the recommendation system. We’re using self-supervision to suggest videos that are relevant to recently watched videos, while filtering out near-duplicates—without explicit training labels for each classification task. To achieve this, we leveraged Generalized Data Transformations (GDT), our state-of-the-art method for building video embeddings, which systematically learns the relationships between the sound and images in a video. Since building this technology last year, we’ve pioneered the large-scale application of GDT to the representation of Reels data, by training a series of models on a data set of millions of Reels and videos from Instagram…We ran the model in production and made its output available in real time to the ranking system. Using this approach, we were able to run online A/B tests that showed positive results.
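The near-duplicate filtering described above amounts to comparing video embeddings and dropping candidates too close to something already watched. A toy sketch with invented 2-dimensional embeddings and a hypothetical similarity threshold (real GDT embeddings are far higher-dimensional):

```python
import numpy as np

def filter_near_duplicates(candidates, watched, threshold=0.95):
    """Drop candidate videos whose embedding is nearly identical to an
    already-watched video. `threshold` is a hypothetical tuning knob."""
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    c, w = norm(candidates), norm(watched)
    max_sim = (c @ w.T).max(axis=1)   # best cosine match against the watched set
    return candidates[max_sim < threshold]

watched = np.array([[1.0, 0.0]])
cands = np.array([[0.99, 0.05],   # near-duplicate of the watched video
                  [0.00, 1.00]])  # genuinely different video
kept = filter_near_duplicates(cands, watched)   # keeps only the dissimilar one
```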
Better speech recognition for more languages and domains: Recently, speech models have been able to successfully learn the entire structure of language using mostly raw speech data—and to improve on traditional, supervised methods. Our latest technique for learning speech representations, called wav2vec 2.0, works by first masking a portion of the speech and then learning to predict masked speech units. To provide an idea of the speed of progress, wav2vec 2.0 and self-training require only 10 minutes of transcribed audio to achieve very good speech recognition results on the LibriSpeech industry benchmark. The same results required nearly 1,000 hours of transcribed audio just one year ago.
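The masked-prediction setup can be illustrated by the masking step alone: sample span starts, blank out the spans, and (in the real model) train to predict the quantized units underneath. A sketch with invented frame features, using roughly the span-masking hyperparameters reported for wav2vec 2.0:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_spans(features, p_start=0.065, span=10):
    """Mask contiguous spans of latent speech frames, wav2vec 2.0-style:
    each frame starts a span with probability p_start; spans may overlap."""
    T = len(features)
    mask = np.zeros(T, dtype=bool)
    for t in np.nonzero(rng.random(T) < p_start)[0]:
        mask[t:t + span] = True
    masked = features.copy()
    masked[mask] = 0.0   # the real model substitutes a learned mask embedding
    return masked, mask

feats = rng.standard_normal((100, 8))   # 100 frames of 8-dim latent features
masked, mask = mask_spans(feats)
```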
…To test the method on real-world data, we applied wav2vec 2.0 on millions of hours of unlabeled videos and just 100 hours of labeled data. We achieved strong improvements of about 20% relative word error reduction, compared with supervised-only baselines with the 100 hours. This proves, for the first time, that self-supervised pre-training with wav2vec 2.0 is effective for real-world data sets that are not as curated as the LibriSpeech corpus used in the original paper. The video data we trained wav2vec on is largely varied, and we found that wav2vec performs particularly well for subdomains and accents where little labeled data exists.
As a next step, we’re now working on scaling wav2vec 2.0 to more data and more languages. These models will reduce labeling needs for new automatic speech recognition domains (eg. AR glasses and virtual gaming), improve the performance of low-resource and medium-resource models, and improve other speech and audio tasks. As part of these efforts, we’re currently working on training a multilingual model with millions of hours of speech from 25 languages.
…Jointly learning video, audio, text to recall digital memories: …Recent advances have made it possible to create a joint representation of audio, visual, and textual signals in a single vector space. As part of our latest research efforts, we are using the combination of Facebook videos and their associated text (title, caption, descriptions) as the key lever for multimodal understanding; we’ve previously achieved this for images rather than videos, using billions of public images and thousands of hashtags. In this research model, we extract a visual clip—which is a short sequence of visual frames—from a video every second. Our system analyzes this sequence using a convolutional neural network (CNN) to produce a vector of numbers that represents the information in the clip. This information is aggregated across time, both with another CNN and with an attention model. The output of this process is an overall representation of the information in the visual part of the video. We follow a similar process with audio…As a next step, we’re now working on scaling this feature up to millions of videos before we can start testing the feature in production.
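The aggregation step—many per-second clip vectors reduced to one video vector—can be sketched as simple attention pooling. The learned query `w` and the dimensions here are invented for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(clip_vectors, w):
    """Aggregate per-second clip vectors into one video vector via a learned
    attention weighting; `w` stands in for a hypothetical learned query."""
    scores = clip_vectors @ w        # one relevance score per clip
    weights = softmax(scores)        # normalized attention over time
    return weights @ clip_vectors    # weighted average of clip vectors

clips = np.random.randn(30, 16)      # 30 one-second clips, 16-dim each
video_vec = attention_pool(clips, w=np.random.randn(16))
```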
…Our Learning from Videos project signals a paradigm shift in the way machines are able to understand videos, sending us on the path to build smarter AI systems. This work will allow us to move away from AI that requires people to look at and label videos by hand, and will make it possible for us to build AI systems that use the most advanced techniques, such as self-supervision, to improve recommendations, search, and retrieval, and other important applications for everyone on Facebook. As our systems continuously learn, they will become more reliable, efficient, and personalized, so that sharing and rediscovering moments can one day be effortless. We are excited to continue our research in the space as we share more of our findings and work to productionize cutting-edge AI research that improves our core technology systems, unlocking new experiences for the billions of people around the world who use our products and services every day.
In the past few years, we have witnessed remarkable breakthroughs in self-supervised representation learning. Despite the success and adoption of representations learned through this paradigm, much is yet to be understood about how different training methods and datasets influence performance on downstream tasks.
In this paper, we analyze contrastive approaches as one of the most successful and popular variants of self-supervised representation learning. We perform this analysis from the perspective of the training algorithms, pre-training datasets and end tasks. We examine over 700 training experiments including 30 encoders, 4 pre-training datasets and 20 diverse downstream tasks.
Our experiments address various questions regarding the performance of self-supervised models compared to their supervised counterparts, current benchmarks used for evaluation, and the effect of the pre-training data on end task performance.
We hope the insights and empirical evidence provided by this work will help future research in learning better visual representations.
Here we provide a summary of the analysis:
- First, we showed that a backbone trained in a supervised fashion on ImageNet is not the best encoder for end tasks other than ImageNet classification and Pets classification (which is a similar end task).
- Second, we showed that in many cases there is little to no correlation between ImageNet accuracy and the performance of end tasks that are not semantic image-level.
- Third, we showed different training algorithms provide better encoders for certain classes of end tasks. More specifically, MoCo v2 proved better for pixel-wise tasks and SwAV showed better performance on image-level tasks.
- Fourth, we showed that structural end tasks benefit more from self-supervision compared to semantic tasks.
- Fifth, we showed pre-training the encoder on the same or similar dataset to that of the end task provides higher performance. This is a well-known fact for supervised representation learning, but it was not evident for self-supervised methods that do not use any labels.
- Sixth, we showed that representations learned on unbalanced ImageNet are as good or even slightly better than representations learned from balanced data.
Deep Convolutional Neural Networks (CNNs) have long been the architecture of choice for computer vision tasks. Recently, Transformer-based architectures like Vision Transformer (ViT) have matched or even surpassed ResNets for image classification. However, details of the Transformer architecture—such as the use of non-overlapping patches—lead one to wonder whether these networks are as robust.
In this paper, we perform an extensive study of a variety of different measures of robustness of ViT models and compare the findings to ResNet baselines. We investigate robustness to input perturbations as well as robustness to model perturbations.
We find that when pre-trained with a sufficient amount of data, ViT models are at least as robust as the ResNet counterparts on a broad range of perturbations. We also find that Transformers are robust to the removal of almost any single layer, and that while activations from later layers are highly correlated with each other, they nevertheless play an important role in classification.
One-sentence Summary: Transformers applied directly to image patches and pre-trained on large datasets work really well on image classification.
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer can perform very well on image classification tasks when applied directly to sequences of image patches. When pre-trained on large amounts of data [JFT-300M] and transferred to multiple recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc), Vision Transformer attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train…Our Vision Transformer, pre-trained on the JFT-300M dataset, approaches or beats state of the art on multiple image recognition benchmarks, reaching accuracy of 88.36% on ImageNet, 90.77% on ImageNet-ReaL, 94.55% on CIFAR-100, and 77.16% on the VTAB suite of 19 tasks…Interestingly, our models took substantially less compute to pre-train than prior state of the art; however, we note that pre-training efficiency may be affected not only by the architecture choice, but also other parameters, such as training schedule, optimizer, weight decay, etc. We provide a controlled study of performance vs. compute for different architectures in Section 4.4…Finally, [we plan] to further scale ViT, given that the performance does not seem yet to be saturating with the increased model size.
[Keywords: computer vision, image recognition, self-attention, transformer, large-scale training]
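The “non-overlapping patches” design is easy to make concrete: a 224×224 image cut into 16×16 patches yields the 196-token sequence the Transformer consumes. A sketch of the patch-extraction step alone (omitting the learned linear projection and position embeddings):

```python
import numpy as np

def patchify(image, patch=16):
    """Split an H×W×C image into non-overlapping patch×patch tiles and flatten
    each one, producing the raw token sequence a ViT consumes."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    return (image.reshape(H // patch, patch, W // patch, patch, C)
                 .transpose(0, 2, 1, 3, 4)        # group the two grid axes first
                 .reshape(-1, patch * patch * C))  # one flat vector per patch

img = np.zeros((224, 224, 3))
tokens = patchify(img)   # 14 × 14 = 196 patches, each 16·16·3 = 768 values
```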
What is newsworthy? This question should haunt everyone with a platform.
Last month, Stanford HAI published the AI Index Report 2021, a 222-page report on the state of AI, put together by an all-star team supported by a lot of data and strong connections to technical experts. What was newsworthy in this report? According to The Verge, “Artificial intelligence research continues to grow as China overtakes U.S. in AI journal citations.” In fact, the article takes its cue from what the report authors themselves deemed important, given that “China overtakes the U.S. in AI journal citations” features as one of the report’s 9 key takeaways.
Dig deeper into the data, however, and you’ll uncover alternative takeaways. Look at the cross-national statistics on average field-weighted citation impact (FWCI) of AI authors, for example, which gives a sense of the quality of the average AI publication from a region. Interestingly enough, the U.S. actually increased its relative lead in FWCI over China over the past couple years. According to the 2019 version of the AI Index, the FWCI of US publications was about 1.5 times greater than China’s; in 2021, that gap has widened to almost 2 times greater (p. 24).
So, working off the same materials as released in the AI index, here’s another way one could have distilled key takeaways: “The U.S. increases its lead over China in average impact of AI publications.” Or, if you wanted to be cheeky: “China lags behind Turkey in average impact of AI publications.” Just as newsworthy, in my opinion.
However, what I found most newsworthy about the AI Index went beyond horse-race reporting about “who’s winning the AI race‽” Instead, I was most intrigued by the rise of commercially available machine translation (MT) systems, covered on page 64. According to data from Intento, a startup that assesses MT services, there are now 28 cloud MT systems with pre-trained models that are commercially available—an increase from just 8 in 2017. But wait … there’s more: Intento also reports an incredible spike in MT language coverage, with 16,000+ language pairs supported by at least one MT provider (slide 33 of Intento’s “State of Machine Translation” report).
…Somehow, these incredible advances in translation are not relevant to the effect of AI on U.S.-China relations, at least based on existing discussions. Compare the complete dearth of Twitter discussions centered on the following keywords: U.S., China, and “machine translation” against what you get when you replace “machine translation” with “facial recognition.” Consider another reference point, the recently published 756-page report by the National Security Commission on Artificial Intelligence (NSCAI). 62 of those pages mention the word “weapon” at least once. Only 9 pages mention the word “translation”, and most do not substantively discuss translation (eg. the word appears in a bibliographic reference for a translated text).
Yet, I could make a convincing case that translation is more important than targeting for U.S. national security. Think about the potential of improved translation capabilities for the intelligence community. Another obvious vector is the effect of translation on diplomacy.
Top 9 Takeaways:
- AI investment in drug design and discovery increased substantially: “Drugs, Cancer, Molecular, Drug Discovery” received the greatest amount of private AI investment in 2020, with more than USD 13.8 billion, 4.5× higher than 2019.
- The industry shift continues: In 2019, 65% of graduating North American PhDs in AI went into industry—up from 44.4% in 2010, highlighting the greater role industry has begun to play in AI development.
- Generative everything: AI systems can now compose text, audio, and images to a sufficiently high standard that humans have a hard time telling the difference between synthetic and non-synthetic outputs for some constrained applications of the technology.
- AI has a diversity challenge: In 2019, 45% of new U.S. resident AI PhD graduates were white—by comparison, 2.4% were African American and 3.2% were Hispanic.
- China overtakes the US in AI journal citations: After surpassing the US in the total number of journal publications several years ago, China now also leads in journal citations; however, the US has consistently (and substantially) more AI conference papers (which are also more heavily cited) than China over the last decade.
- The majority of the US AI PhD grads are from abroad—and they’re staying in the US: The percentage of international students among new AI PhDs in North America continued to rise in 2019, to 64.3%—a 4.3% increase from 2018. Among foreign graduates, 81.8% stayed in the United States and 8.6% have taken jobs outside the United States.
- Surveillance technologies are fast, cheap, and increasingly ubiquitous: The technologies necessary for large-scale surveillance are rapidly maturing, with techniques for image classification, face recognition, video analysis, and voice identification all seeing substantial progress in 2020.
- AI ethics lacks benchmarks and consensus: Though a number of groups are producing a range of qualitative or normative outputs in the AI ethics domain, the field generally lacks benchmarks that can be used to measure or assess the relationship between broader societal discussions about technology development and the development of the technology itself. Furthermore, researchers and civil society view AI ethics as more important than industrial organizations.
- AI has gained the attention of the U.S. Congress: The 116th Congress is the most AI-focused congressional session in history with the number of mentions of AI in congressional record more than triple that of the 115th Congress.
“GPT-3: Language Models are Few-Shot Learners”, (2020-05-28):
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions—something which current NLP systems still largely struggle to do.
Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10× more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
…The precise architectural parameters for each model are chosen based on computational efficiency and load-balancing in the layout of models across GPU’s. Previous work [KMH+20] suggests that validation loss is not strongly sensitive to these parameters within a reasonably broad range.
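The few-shot setup described above—tasks and demonstrations specified purely via text interaction, with no gradient updates—amounts to simple prompt construction. A minimal sketch (the instruction wording and `=>` separator format are hypothetical, not GPT-3’s actual prompt conventions):

```python
def build_few_shot_prompt(instruction, demonstrations, query):
    """Assemble an instruction, K demonstrations, and a new query into one
    prompt string; the model is expected to continue after the final '=>'."""
    lines = [instruction]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # completion point: no fine-tuning involved
    return "\n".join(lines)

# K=2 few-shot demonstrations for a word-unscrambling task:
prompt = build_few_shot_prompt(
    "Unscramble the word:",
    [("tac", "cat"), ("odg", "dog")],
    "drib",
)
print(prompt)
```

The point of the sketch is that the "training" for the task lives entirely in the prompt: changing the demonstrations changes the task, with no update to the model's weights.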
[Paper: “DALL·E: Zero-Shot Text-to-Image Generation”, Ramesh et al 2021. Re-implementation: DALL·E Mini (writeup). cf CogView, Wu Dao. Availability through OA API still planned as of 2021-09-05.] DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text-image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.
GPT-3 showed that language can be used to instruct a large neural network to perform a variety of text generation tasks. iGPT showed that the same type of neural network can also be used to generate images with high fidelity. [iGPT is another answer to the question of “how do we do images autoregressively, but not at the exorbitant cost of generating pixels 1 by 1?”; iGPT uses ‘super pixels’ & very small images, while DALL·E uses VAE ‘tokens’ corresponding roughly to small squares, so the token sequence is relatively small, and the VAE does the actual decoding to raw pixels.] We extend these findings to show that manipulating visual concepts through language is now within reach.
DALL·E’s vocabulary has tokens for both text and image concepts. Specifically, each image caption is represented using a maximum of 256 BPE-encoded tokens with a vocabulary size of 16,384, and the image is represented using 1,024 tokens with a vocabulary size of 8,192. The images are preprocessed to 256×256 resolution during training. Similar to VQ-VAE, each image is compressed to a 32×32 grid of discrete latent codes using a discrete VAE that we pretrained using a continuous relaxation. We found that training using the relaxation obviates the need for an explicit codebook, EMA loss, or tricks like dead code revival, and can scale up to large vocabulary sizes.
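The token budget above can be checked with a little arithmetic; this sketch only restates the figures given in the paragraph:

```python
# DALL·E input layout as described: up to 256 BPE text tokens (vocab 16,384),
# then 1,024 image tokens (vocab 8,192), one per cell of the 32×32 grid of
# discrete VAE latent codes representing a 256×256 image.
TEXT_TOKENS, TEXT_VOCAB = 256, 16_384
GRID = 32                                   # 32×32 latent grid
IMAGE_TOKENS, IMAGE_VOCAB = GRID * GRID, 8_192
IMAGE_RES = 256

assert IMAGE_TOKENS == 1024
sequence_length = TEXT_TOKENS + IMAGE_TOKENS           # full transformer sequence
compression = (IMAGE_RES * IMAGE_RES) / IMAGE_TOKENS   # pixels per image token
print(sequence_length, compression)                    # 1280 64.0
```

So the transformer only ever sees 1,280 tokens per text-image pair, with each image token standing in for an 8×8 patch of pixels that the VAE decoder reconstructs—this is what keeps autoregressive generation affordable compared to pixel-by-pixel modeling.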
…Capabilities: We find that DALL·E is able to create plausible images for a great variety of sentences that explore the compositional structure of language. We illustrate this using a series of interactive visuals in the next section. The samples shown for each caption in the visuals are obtained by taking the top 32 of 512 after reranking with CLIP, but we do not use any manual cherry-picking, aside from the thumbnails and standalone images that appear outside.
Controlling attributes: We test DALL·E’s ability to modify several of an object’s attributes, as well as the number of times that it appears.
Drawing multiple objects
Visualizing perspective and three-dimensionality
Visualizing internal and external structure
Inferring contextual details
…With varying degrees of reliability, DALL·E provides access to a subset of the capabilities of a 3D rendering engine via natural language. It can independently control the attributes of a small number of objects, and to a limited extent, how many there are, and how they are arranged with respect to one another. It can also control the location and angle from which a scene is rendered, and can generate known objects in compliance with precise specifications of angle and lighting conditions.
Zero-shot visual reasoning: GPT-3 can be instructed to perform many kinds of tasks solely from a description and a cue to generate the answer supplied in its prompt, without any additional training. For example, when prompted with the phrase “here is the sentence ‘a person walking his dog in the park’ translated into French:”, GPT-3 answers “un homme qui promène son chien dans le parc.” This capability is called zero-shot reasoning. We find that DALL·E extends this capability to the visual domain, and is able to perform several kinds of image-to-image translation tasks when prompted in the right way. [See also CLIP.]
We did not anticipate that this capability would emerge, and made no modifications to the neural network or training procedure to encourage it. Motivated by these results, we measure DALL·E’s aptitude for analogical reasoning problems by testing it on Raven’s Progressive Matrices, a visual IQ test that saw widespread use in the 20th century. Rather than treating the IQ test as a multiple-choice problem, as originally intended, we ask DALL·E to complete the bottom-right corner of each image using argmax sampling, and consider its completion to be correct if it is a close visual match to the original. DALL·E is often able to solve matrices that involve continuing simple patterns or basic geometric reasoning, such as those in sets B and C. It is sometimes able to solve matrices that involve recognizing permutations and applying boolean operations, such as those in set D. The instances in set E tend to be the most difficult, and DALL·E gets almost none of them correct. For each of the sets, we measure DALL·E’s performance on both the original images, and the images with the colors inverted. The inversion of colors should pose no additional difficulty for a human, yet does generally impair DALL·E’s performance, suggesting its capabilities may be brittle in unexpected ways.
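The scoring rule above counts a completion as correct if it is a "close visual match" to the original bottom-right cell; the exact metric is not stated, so this sketch assumes a mean-absolute-difference threshold as one plausible instantiation:

```python
import numpy as np

def is_close_match(completion, original, threshold=0.05):
    """Hypothetical match criterion: mean absolute pixel difference below a
    threshold. Both arguments are float arrays in [0, 1] of the same shape;
    the threshold value is an arbitrary assumption, not from the paper."""
    completion = np.asarray(completion, dtype=float)
    original = np.asarray(original, dtype=float)
    return float(np.abs(completion - original).mean()) <= threshold

cell = np.zeros((8, 8))                   # stand-in for the true matrix cell
print(is_close_match(cell + 0.01, cell))  # True: tiny perturbation still matches
print(is_close_match(cell + 0.5, cell))   # False: grossly different completion
```

Any pixel-space criterion like this would also explain the brittleness under color inversion: the generator, not the scorer, has to reproduce the inverted palette exactly.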
Nine months since the launch of our first commercial product, the OpenAI API, more than 300 applications are now using GPT-3, and tens of thousands of developers around the globe are building on our platform. We currently generate an average of 4.5 billion words per day, and continue to scale production traffic.
…As we scale access, our team is continually improving the platform—from implementing a content filter to offering new features for developers including our recently launched:
- Answers endpoint: Searches provided information (documents, knowledge bases etc.) for relevant context to be added to the prompt before completing with GPT-3. Can be used to build applications like customer support bots with no fine-tuning.
- Classifications endpoint: Can leverage labeled training data without fine-tuning. By searching for the closest examples with respect to the input query and adding them to the prompt, it often matches the performance of state-of-the-art fine-tuned models, providing an autoML solution that is easy to configure and adapt.
- Enhanced search endpoint: Provides the backbone for the Answers and Classifications endpoints that scales to a large number of documents while also being cheap and fast.
- …Prompt library: Provides starter prompt design examples for dozens of use cases that users can begin programming with directly in Playground, like a Spreadsheet Generator, Grammar Corrector, or Airport Code Extractor.
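The idea behind the Classifications endpoint—retrieve the labeled examples closest to the query and prepend them to the prompt instead of fine-tuning—can be sketched as follows. This is illustrative only: the endpoint's actual search mechanism is not specified here, so bag-of-words cosine similarity stands in for it:

```python
from collections import Counter
import math

def cosine(a, b):
    """Bag-of-words cosine similarity between two strings (a toy retriever)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_classification_prompt(labeled, query, k=2):
    """Select the k nearest labeled examples and format them as few-shot
    demonstrations, leaving the query's label for the model to complete."""
    nearest = sorted(labeled, key=lambda ex: cosine(ex[0], query), reverse=True)[:k]
    lines = [f"Text: {text}\nLabel: {label}" for text, label in nearest]
    lines.append(f"Text: {query}\nLabel:")
    return "\n\n".join(lines)

examples = [("great movie, loved it", "positive"),
            ("terrible plot, boring", "negative"),
            ("loved the soundtrack", "positive")]
print(build_classification_prompt(examples, "loved every minute", k=2))
```

Because retrieval happens per-query, new labeled data takes effect immediately—no retraining step—which is what makes the approach an "autoML solution that is easy to configure and adapt."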
Over the course of development, humans learn myriad facts about items in the world, and naturally group these items into useful categories and structures. This semantic knowledge is essential for diverse behaviors and inferences in adulthood. How is this richly structured semantic knowledge acquired, organized, deployed, and represented by neuronal networks in the brain? We address this question by studying how the nonlinear learning dynamics of deep linear networks acquires information about complex environmental structures. Our results show that this deep learning dynamics can self-organize emergent hidden representations in a manner that recapitulates many empirical phenomena in human semantic development. Such deep networks thus provide a mathematically tractable window into the development of internal neural representations through experience.
An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: What are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences?
We address this question by mathematically analyzing the nonlinear dynamics of learning in deep linear networks. We find exact solutions to this learning dynamics that yield a conceptual explanation for the prevalence of many disparate phenomena in semantic cognition, including the hierarchical differentiation of concepts through rapid developmental transitions, the ubiquity of semantic illusions between such transitions, the emergence of item typicality and category coherence as factors controlling the speed of semantic processing, changing patterns of inductive projection over development, and the conservation of semantic similarity in neural representations across species.
Thus, surprisingly, our simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep-learning dynamics to give rise to these regularities.
[Keywords: semantic cognition, deep learning, neural networks, generative models]
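The "mathematically tractable" dynamics discussed above can be illustrated with a minimal two-layer linear network: modes of the target map with larger singular values are acquired earlier, producing the stage-like developmental transitions the paper describes. This is a toy sketch in the spirit of that analysis (the learning rate, target singular values, and crossing thresholds are arbitrary choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.diag([3.0, 1.0])               # two modes, singular values 3 and 1
W1 = rng.normal(scale=1e-3, size=(2, 2))   # small random initialization
W2 = rng.normal(scale=1e-3, size=(2, 2))
lr = 0.05

strengths = []
for step in range(400):
    err = target - W2 @ W1                 # from loss 0.5 * ||target - W2 W1||^2
    W2, W1 = W2 + lr * err @ W1.T, W1 + lr * W2.T @ err
    strengths.append(np.linalg.svd(W2 @ W1, compute_uv=False))
strengths = np.array(strengths)

# The strong mode (s=3) crosses half-strength well before the weak mode (s=1):
step_strong = int(np.argmax(strengths[:, 0] > 1.5))
step_weak = int(np.argmax(strengths[:, 1] > 0.5))
print(step_strong < step_weak)             # True
```

Each mode sits near zero for a long plateau and then rises rapidly to its target value, with the onset time set by the singular value—the sigmoidal, stage-like learning curves that the paper maps onto hierarchical differentiation in semantic development.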
- Human brain organoids are expanded relative to nonhuman apes prior to neurogenesis
- Ape neural progenitors go through a newly identified transition morphotype state
- A delayed morphological transition with shorter cell cycles underlies human expansion
- ZEB2 is an evolutionary regulator of this transition
The human brain has undergone rapid expansion since humans diverged from other great apes, but the mechanism of this human-specific enlargement is still unknown. Here, we use cerebral organoids derived from human, gorilla, and chimpanzee cells to study developmental mechanisms driving evolutionary brain expansion.
We find that neuroepithelial differentiation is a protracted process in apes, involving a previously unrecognized transition state characterized by a change in cell shape. Furthermore, we show that human organoids are larger due to a delay in this transition, associated with differences in interkinetic nuclear migration and cell cycle length. Comparative RNA sequencing (RNA-seq) reveals differences in expression dynamics of cell morphogenesis factors, including ZEB2, a known epithelial-mesenchymal transition regulator.
We show that ZEB2 promotes neuroepithelial transition, and its manipulation and downstream signaling lead to acquisition of nonhuman ape architecture in the human context and vice versa, establishing an important role for neuroepithelial cell shape in human brain expansion.
[Keywords: brain, evolution, cell shape, organoids, brain expansion, neuroepithelium, neural stem cells, ZEB2, gorilla, chimpanzee]
It is one of the defining attributes of being human: when compared with our closest primate relatives, we have incredibly large brains.
…Tests on the tiny “brain organoids” reveal a hitherto unknown molecular switch that controls brain growth and makes the human organ three times larger than brains in the great apes. Tinker with the switch and the human brain loses its growth advantage, while the great ape brain can be made to grow more like a human’s.
“What we see is a difference in cellular behaviour very, very early on that allows the human brain to grow larger”, said Dr Madeleine Lancaster, a developmental biologist at the Medical Research Council’s Laboratory of Molecular Biology in Cambridge. “We are able to account for almost all of the size difference.”
The healthy human brain typically reaches about 1,500cm³ in adulthood, roughly three times the size of the 500cm³ gorilla brain or the 400cm³ chimp brain. But working out why has been fraught with difficulty, not least because developing human and great ape brains cannot easily be studied.
After several weeks, the human brain organoids were by far the largest of the lot, and close examination revealed why. In human brain tissue, so-called neural progenitor cells—which go on to make all of the cells in the brain—divided more than those in great ape brain tissue.
Lancaster, whose study is published in Cell, added: “You have an increase in the number of those cells, so once they switch to making the different brain cells, including neurons, you have more to start with, so you get an increase in the whole population of brain cells across the entire cortex.”
Mathematical modelling of the process showed that the difference in cell proliferation happens so early in brain development, that it ultimately leads to a near doubling in the number of neurons in the adult human cerebral cortex compared with that in the great apes.
2012-herculanohouzel.pdf: “The remarkable, yet not extraordinary, human brain as a scaled-up primate brain and its associated cost”, (2012-06-19; ):
[Herculano-Houzel 2009] Neuroscientists have become used to a number of “facts” about the human brain: It has 100 billion neurons and 10- to 50-fold more glial cells; it is larger than expected for its body size among primates and mammals in general, and therefore the most cognitively able; it consumes an outstanding 20% of the total body energy budget despite representing only 2% of body mass because of an increased metabolic need of its neurons; and it is endowed with an overdeveloped cerebral cortex, the largest compared with brain size.
These facts led to the widespread notion that the human brain is literally extraordinary: an outlier among mammalian brains, defying evolutionary rules that apply to other species, with a uniqueness seemingly necessary to justify the superior cognitive abilities of humans over mammals with even larger brains. These facts, with deep implications for neurophysiology and evolutionary biology, are not grounded on solid evidence or sound assumptions, however.
Our recent development of a method that allows rapid and reliable quantification of the numbers of cells that compose the whole brain has provided a means to verify these facts. Here, I review this recent evidence and argue that, with 86 billion neurons and just as many nonneuronal cells, the human brain is a scaled-up primate brain in its cellular composition and metabolic cost, with a relatively enlarged cerebral cortex that does not have a relatively larger number of brain neurons, yet is remarkable in its cognitive abilities and metabolism simply because of its extremely large number of neurons.
Research unveiled on Thursday in Science finds that crows know what they know and can ponder the content of their own minds, a manifestation of higher intelligence and analytical thought long believed the sole province of humans and a few other higher mammals.
A second study, also in Science, looked in unprecedented detail at the neuroanatomy of pigeons and barn owls, finding hints to the basis of their intelligence that likely applies to corvids’, too.
“Together, the two papers show that intelligence/consciousness are grounded in connectivity and activity patterns of neurons” in the most neuron-dense part of the bird brain, called the pallium, neurobiologist Suzana Herculano-Houzel of Vanderbilt University, who wrote an analysis of the studies for Science, told STAT. “Brains can appear diverse, and at the same time share profound similarities. The extent to which similar properties present themselves might be simply a matter of scale: how many neurons are available to work.”
…Scientists have long known that crows and ravens have unusually large forebrains, but unlike mammals’ forebrains—the neocortex—corvids’ do not have the 6 connected layers thought to produce higher intelligence. But theirs do have “connectivity patterns … reminiscent of the neocortex”, scientists led by Martin Stacho of Ruhr-University in Germany reported.
Specifically, the pigeons’ and owls’ neurons meet at right angles, forming computational circuits organized in columns. “The avian version of this connectivity blueprint could conceivably generate computational properties reminiscent of the [mammalian] neocortex”, they write. “Similar microcircuits … achieve largely identical cognitive outcomes from seemingly vastly different forebrains.” That is, evolution invented connected, circuit-laden brain structure at least twice.
“In theory, any brain that has a large number of neurons connected into associative circuitry … could be expected to add flexibility and complexity to behavior”, said Herculano-Houzel. “That is my favorite operational definition of intelligence: behavioral flexibility.”
That enables pigeons to home, count, and be as trainable as monkeys. But for sheer smarts we’re still in the corvid camp. A 2014 study showed that New Caledonian crows, rooks, and European jays can solve an Aesop’s Fable challenge, dropping stones into a water-filled tube to bring a floating bit of food within reach, something kids generally can’t do until age 7. These birds were the first nonhuman animals to solve the task.
2021-kirschhock.pdf: “Behavioral and Neuronal Representation of Numerosity Zero in the Crow”, (2021-06-02; ):
[media] Different species of animals can discriminate numerosity, the countable number of objects in a set. The representations of countable numerosities have been deciphered down to the level of single neurons. However, despite its importance for human number theory, a special numerical quantity, the empty set (numerosity zero), has remained largely unexplored. We explored the behavioral and neuronal representation of the empty set in carrion crows.
Crows were trained to discriminate small numerosities including the empty set. Performance data showed a numerical distance effect for the empty set in one crow, suggesting that the empty set and countable numerosities are represented along the crows’ “mental number line.” Single-cell recordings in the endbrain region nidopallium caudolaterale (NCL) showed a considerable proportion of NCL neurons tuned to the preferred numerosity zero. As evidenced by neuronal distance and size effects, NCL neurons integrated the empty set in the neural number line. A subsequent neuronal population analysis using a statistical classifier approach showed that the neuronal numerical representations were predictive of the crows’ success in the task. These behavioral and neuronal data suggest that the conception of the empty set as a cognitive precursor of a zero-like number concept is not an exclusive property of the cerebral cortex of primates.
Zero as a quantitative category can be implemented not only in the layered neocortex of primates, but also in the anatomically distinct endbrain circuitries of birds that evolved based on convergent evolution.
The conception of “nothing” as number “zero” is celebrated as one of the greatest achievements in mathematics. To explore whether precursors of zero-like concepts can be found in vertebrates with a cerebrum that anatomically differs starkly from our primate brain, we investigated this in carrion crows. We show that crows can grasp the empty set as a null numerical quantity that is mentally represented next to number one. Moreover, we show that single neurons in an associative avian cerebral region specifically respond to the empty set and show the same physiological characteristics as for countable quantities. This suggests that zero as a quantitative category can also be implemented in the anatomically distinct endbrain circuitries of birds that evolved based on convergent evolution.
[Keywords: corvid, songbird, single-neuron recordings, nidopallium caudolaterale, numbers, empty set]
Human eye color is highly heritable, but its genetic architecture is not yet fully understood.
We report the results of the largest genome-wide association study for eye color to date, involving up to 192,986 European participants from 10 populations. We identify 124 independent associations arising from 61 discrete genomic regions, including 50 previously unidentified. We find evidence for genes involved in melanin pigmentation, but we also find associations with genes involved in iris morphology and structure. Further analyses in 1636 Asian participants from two populations suggest that iris pigmentation variation in Asians is genetically similar to Europeans, albeit with smaller effect sizes. Our findings collectively explain 53.2% (95% confidence interval, 45.4 to 61.0%) of eye color variation using common single-nucleotide polymorphisms.
Overall, our study outcomes demonstrate that the genetic complexity of human eye color considerably exceeds previous knowledge and expectations, highlighting eye color as a genetically highly complex human trait.
…In Europeans, the 112 autosomal SNPs identified through conditional analysis (all autosomal SNPs shown in table S1) explained 99.96% (SE = 6.5%, p = 4.8 × 10⁻²⁷⁹) of the liability scale for blue eyes (against brown eyes) and 38.5% (SE = 5.7%, p = 2.2 × 10⁻¹³⁰) for intermediate eyes in the TwinsUK cohort, which was one of the VisiGen cohorts used for replication. Using the same linear scale as the GWAS analysis, these autosomal SNPs explained 53.2% (SE = 4.0%, p = 1.2 × 10⁻³²²) of the total phenotypic variation in eye color in TwinsUK.
2021-fagereng.pdf: “Why Do Wealthy Parents Have Wealthy Children?”, (2021-02-05; ):
We show that family background matters statistically-significantly for children’s accumulation of wealth and investor behavior as adults, even when removing the genetic connection between children and the parents raising them. The analysis is made possible by linking Korean-born children who were adopted at infancy by Norwegian parents to a population panel data set with detailed information on wealth and socioeconomic characteristics. The mechanism by which these Korean-Norwegian adoptees were assigned to adoptive families is known and effectively random. This mechanism allows us to estimate the causal effects from an adoptee being raised in one type of family versus another.
…The linear rank correlations are 0.24 and 0.16 for the samples of non-adoptees and adoptees, respectively. This means that, on average, a 10 percentile increase in parent net wealth is associated with a 2.4 percentile increase in a biological child’s net wealth and a 1.6 percentile increase in an adoptee’s net wealth…On average, the adoptees accrue an extra US$2,250 of wealth if they are assigned to an adoptive family with US$10,000 of additional wealth. The magnitude of this estimate suggests that adoptees raised by parents with a wealth level that is 10% above the mean of the parent generation can expect to obtain a wealth level that is almost 3.7% above the mean of the child generation.
…We find that the indirect effects arising from changes in the observed mediator variables explain about 37% of the average causal effect from assignment to wealthier parents on children’s accumulation of wealth. Direct transfers of wealth are the most important mediator variable, accounting for almost 90% of the indirect effect.
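The rank-correlation and mediation figures above imply some simple arithmetic, checked here; all input values are taken directly from the text:

```python
# Rank correlations: a 10-percentile increase in parental net-wealth rank is
# associated with these percentile increases in the child's rank.
rank_corr_bio, rank_corr_adopt = 0.24, 0.16
print(round(10 * rank_corr_bio, 1))    # 2.4 percentile (biological children)
print(round(10 * rank_corr_adopt, 1))  # 1.6 percentile (adoptees)

# Mediation: indirect effects are ~37% of the total causal effect, and direct
# wealth transfers account for ~90% of that indirect component, so transfers
# alone cover roughly a third of the total effect of parental wealth:
share_via_transfers = 0.37 * 0.90
print(round(share_via_transfers, 2))   # 0.33
```

The remaining ~63% of the causal effect is a direct effect not captured by the observed mediators, which is why the authors emphasize both transfers and environment-driven behavior.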
…Columns 1 and 2 in panel A of Table 5 suggest that both family environment and genetics are important in explaining the variation in children’s wealth accumulation. Shared environment accounts for about 16% (10%) of the variation in net (financial) wealth accumulation. Relative to shared environment, the genetic factors explain a larger portion (twice as much or more) of the variation in wealth accumulation (both net and financial wealth). These findings are consistent with the results in Table 3, showing statistically-significant but less wealth transmission from parents to adoptees as compared with non-adoptees.
As shown in column 3 in panel A of Table 5, shared environment is also important for explaining the variation in financial risk-taking, as measured by the risky share. By comparison, genetic factors explain little of the variation in this measure of financial risk-taking. In column 4 of Table 5, we report results for education as measured by years of schooling. These results are close to the American study of Korean adoptees by Sacerdote (2007), who finds that 9% of the variation in years of schooling can be explained by shared environment, while 60% is attributable to genes.
The emergence of parasites in evolving replicating systems appears to be inevitable. Parasites emerge readily in models and laboratory experiments of the hypothesised earliest replicating systems: the RNA world. Phylogenetic reconstructions also suggest very early evolution of viruses and other parasitic mobile genetic elements in our biosphere. The evolution of such parasites would lead to extinction unless prevented by compartmentalization or spatial pattern formation, and the emergence of multilevel selection. Today and apparently since the earliest times, many intricate defence and counter-defence strategies have evolved.
Here we bring together for the first time automata chemistry models and spatial RNA world models, to study the emergence of parasites and the evolving complexity to cope with the parasites. Our system is initialized with a hand-designed program string that copies other program strings one character at a time, with a small chance of point mutation. Almost immediately, short parasites arise; these are copied more quickly, and so have an evolutionary advantage. Spatial pattern formation, in the form of chaotic waves of replicators followed by parasites, can prevent extinction. The replicators also become shorter, and so are replicated faster. They evolve a mechanism to slow down replication, which reduces the difference of replication rate of replicators and parasites. They also evolve explicit mechanisms to discriminate copies of self from parasites; these mechanisms become increasingly complex. Replicators speciate into lineages and can become longer, despite the fitness cost that entails.
We do not see a classical co-evolutionary arms-race of a replicator and a parasite lineage: instead new parasite species continually arise from mutated replicators, rather than from evolving parasite lineages. Finally we note that evolution itself evolves, for example by effectively increasing point mutation rates, and by generating novel emergent mutational operators. The inevitable emergence of parasites in replicator systems drives the evolution of complex replicators and complex ecosystems with high population density. Even in the absence of parasites, the evolved replicators outperform the initial replicator and the early short replicators.
Modelling replication as an active computational process opens up many degrees of freedom that are exploited not only to meet environmental challenges, but also to modify the evolutionary process itself.
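The core selection pressure described above—copying proceeds one character at a time, so a short parasite that is copied but cannot copy out-replicates its host—can be caricatured with a deterministic, well-mixed model. This is an illustration of the pressure only, not the paper's automata chemistry, which adds space, mutation, and the pattern formation that can rescue replicators (the string lengths here are arbitrary assumptions):

```python
REPL_LEN, PARA_LEN = 60, 15                 # parasite 4× shorter → copied 4× faster
rate_r, rate_p = 1 / REPL_LEN, 1 / PARA_LEN  # copies completed per copier per step

r, p = 0.99, 0.01                            # population fractions; only `r` copies
for step in range(500):
    copiers = r                              # parasites contribute no copying
    # each type grows in proportion to how often copiers finish copying it:
    r, p = r + copiers * r * rate_r, p + copiers * p * rate_p
    total = r + p                            # renormalize: fixed resources
    r, p = r / total, p / total

print(p > 0.5)  # True: in a well-mixed population, parasites take over
```

The parasite's per-capita growth rate exceeds the replicator's by the length ratio whenever any copiers remain, so without compartments or spatial waves the replicators are driven toward extinction—exactly the outcome the spatial model must, and does, avoid.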
A vaccine for COVID-19 is urgently needed. Several vaccine trial designs may substantially accelerate vaccine testing and approval, but also increase risks to human subjects. Concerns about whether the public would see such designs as ethical represent an important roadblock to their implementation; accordingly, both the World Health Organization and numerous scholars have called for consulting the public regarding them.
We answered these calls by conducting a cross-national survey (n = 5920) in Australia, Canada, Hong Kong, New Zealand, South Africa, Singapore, the United Kingdom, and the United States. The survey explained key differences between traditional vaccine trials and two accelerated designs: a challenge trial or a trial integrating a Phase II safety and immunogenicity trial into a larger Phase III efficacy trial.
Respondents’ answers to comprehension questions indicate that they largely understood the key differences and ethical trade-offs between the designs from our descriptions. We asked respondents whether they would prefer scientists to conduct traditional trials or one of these two accelerated designs. We found broad majorities prefer for scientists to conduct challenge trials (75%) and integrated trials (63%) over standard trials. Even as respondents acknowledged the risks, they perceived both accelerated trials as similarly ethical to standard trial designs. This high support is consistent across every geography and demographic subgroup we examined, including vulnerable populations.
These findings may help assuage some of the concerns surrounding accelerated designs.
In 2009, Harry Hong, a spiky-haired twenty-four-year-old Angeleno, delivered the site’s first certified max-out, and Adam Cornelius, another Tetris enthusiast and a filmmaker, began working on a documentary about the remarkable achievement. When Harrison saw the project on Kickstarter, he donated a few hundred dollars to help complete the film, but added a caveat. “You can’t just talk about Harry Hong”, he recalls writing. “You’ve got to talk about Jonas Neubauer. You’ve got to talk about Thor Aackerlund. You’ve got to get these guys together and have a tournament and see who’s actually the best.”
Some of the players who gathered for the first classic-Tetris tournament, for all their thousands of hours of practice, were in the dark about basic tactics. Hong was stunned to learn that his strategy of scoring Tetrises by dropping long bars into a left-side gap was suboptimal. Due to piece-flipping mechanics, a right-side gap was superior. Dana Wilcox, one of the highest-scoring players on the Twin Galaxies leaderboard, discovered that she’d played for 20 years without knowing that the blocks could be spun in either direction.
…Learning to “hyper-tap” was a priority. Thor had been the first to hyper-tap, but, by 2017, Koryan Nishio, a Japanese programmer in his forties, was the only prominent player using the technique. (“It seemed like a lot of work for a video game”, Vince Clemente, who has co-organized the classic-Tetris tournament since its inception, explained.) To Joseph, though, it was the obvious way to go. To tap quickly, he developed a unique one-handed grip: with his right thumb on the control pad, he flexed his right bicep until his arm shook, pressing down with each tremor, about fifteen times per second. He turned his thumb into a jackhammer.
…Jonas quit his job to stream full-time on Twitch—broadcasting an efficient, battle-tested style for amateurs to emulate. When Joseph won the tournament again, in 2019, he inspired more young players. In 2020 alone, 131 players maxed out; between 1990 and 2019, 87 players had maxed out. Kids had killed the Tetris curve.
These new players see a max-out not as an impossibility, but as a rite of passage. Before even buying the game, most of the rising generation of classic-Tetris players have already watched hours of the best performances, hard-wiring beautiful stacking strategies. As they begin practicing, they often join one of many classic-Tetris servers on Discord, where hundreds of people are online all the time, ready to discuss any aspect of the game. It’s there that they often learn the most common hyper-tapping grip—holding the controller sideways, with the directional pad facing up—and how to properly tense the right arm so that it shakes quickly and consistently. They study the principles of developing a relatively even stack with a built-out left side, and discuss how dropping a pair of tetrominoes in a complementary orientation can reduce the need for a timely T-piece. They can imitate Joseph’s “hyper-tap quick-tap”, in which he sneaks in a left-handed tap among a right-thumb flurry, or watch Jonas’s “Tetris Spin Class” and observe how certain flips can clear a line and make the stack Tetris ready.
What took Jonas years to figure out takes new players minutes. “You don’t need to experiment for hours trying to figure out what works and what doesn’t”, Jacob Huff, a nineteen-year-old who maxed out last March after playing for two months, said. “You can ask someone in the Discord and they’ll tell you every spin that you can do.” Strategies born on Discord are practiced and scrutinized on Twitch, then put to the test in a growing pool of competitions: Classic Tetris Monthly, Classic Tetris League, Classic Tetris Gauntlet, Classic Tetris Brawl. Thanks to hyper-tapping and more efficient stacking, players build higher and higher, almost refusing to accept any line clearance that’s not a Tetris. To the older generation, the style seems reckless. To newer players, it’s simply the best way to play.
…By the quarter-final [of the championship], the entire old guard had vanished. The remaining players were all of the YouTube generation, with many explicitly crediting its algorithm for introducing them to classic Tetris.
2021-singh.pdf: “Magic, Explanations, and Evil: The Origins and Design of Witches and Sorcerers [and replies]”, (2021-02-25):
In nearly every documented society, people believe that some misfortunes are caused by malicious group mates using magic or supernatural powers. Here I report cross-cultural patterns in these beliefs and propose a theory to explain them.
Using the newly created Mystical Harm Survey, I show that several conceptions of malicious mystical practitioners, including sorcerers (who use learned spells), possessors of the evil eye (who transmit injury through their stares and words), and witches (who possess superpowers, pose existential threats, and engage in morally abhorrent acts), recur around the world.
I argue that these beliefs develop from three cultural selective processes: a selection for intuitive magic, a selection for plausible explanations of impactful misfortune, and a selection for demonizing myths that justify mistreatment.
Separately, these selective schemes produce traditions as diverse as shamanism, conspiracy theories, and campaigns against heretics—but around the world, they jointly give rise to the odious and feared witch. I use the tripartite theory to explain the forms of beliefs in mystical harm and outline 10 predictions for how shifting conditions should affect those conceptions:
- People are more likely to believe in sorcerers as sorcery techniques become more effective-seeming.
- People are more likely to ascribe injury to mystical harm when they are distrustful of others, persecuted, or otherwise convinced of harmful intent (“Accusations of Mystical Harm Track Distrust and Suspicions of Harmful Intent”).
- The emotions attributed to malicious practitioners will be those that most intensely and frequently motivate aggression (“Accusations of Mystical Harm Track Distrust and Suspicions of Harmful Intent”).
- People are more likely to attribute injury to mystical harm when they lack alternative explanations (“Mystical Harm Explains Impactful and Unexplainable Misfortunes”).
- The greater the impact of the misfortune, the more likely people are to attribute it to mystical harm (“Mystical Harm Explains Impactful and Unexplainable Misfortunes”).
- Practitioners of mystical harm are more likely to become demonized during times of stressful uncertainty.
- The traits ascribed to malicious practitioners will become more heinous or sensational as Condoners become more trustful or reliant on information from Campaigners.
- Malicious practitioners will become less demonized when there is less disagreement or resistance about their removal.
- The traits that constitute demonization will be those that elicit the most punitive outrage, controlling for believability (“Witches Are Well Designed to Induce Punitive Outrage”).
- Malicious practitioners whose actions can more easily explain catastrophe, such as those who employ killing magic compared with love magic, will be easier to demonize.
Societally corrosive beliefs can persist when they are intuitively appealing or they serve some believers’ agendas.
Microdosing is the practice of regularly using low doses of psychedelic drugs. Anecdotal reports suggest that microdosing enhances well-being and cognition; however, such accounts are potentially biased by the placebo effect.
This study used a ‘self-blinding’ citizen science initiative, where participants were given online instructions on how to incorporate placebo control into their microdosing routine without clinical supervision. The study was completed by 191 participants, making it the largest placebo-controlled trial on psychedelics to-date.
All psychological outcomes improved statistically-significantly from baseline to the end of the 4-week dosing period for the microdose group; however, the placebo group also improved and no statistically-significant between-groups differences were observed. Acute (emotional state, drug intensity, mood, energy, and creativity) and post-acute (anxiety) scales showed small, but statistically-significant microdose vs. placebo differences; however, these results can be explained by participants breaking blind.
The findings suggest that anecdotal benefits of microdosing can be explained by the placebo effect.
…It is worth noting that the current study was designed to protect blinding integrity by including placebos for the microdose group as well, administering the microdose capsules on different days of the week and by including the half-half group. The 3-arm design can be seen as a strength in this regard, adding ambiguity and thus strengthening blinding. Illustrative of the integrity of the blind, we received several emails from participants in the PL group who were in disbelief after opening their unused envelopes containing unused capsules after the conclusion of the study:
- “I counted the number of cut blotters I had in the left overs: they are 8…so you must be right… Which is incredible […] Some days during the test were really, really focused and colours more vivid. This sensation was really new to me”.
- “I have just checked the remaining envelopes and it appears that I was indeed taking placebos throughout the trial. I’m quite astonished […] It seems I was able to generate a powerful ‘altered consciousness’ experience based only the expectation around the possibility of a microdose”.
- “An empty pill with strong belief/intentions makes nearly everything. You put spirituality into an empty pill here…wow!”
eLife digest: Psychedelic psychotherapy, therapy enhanced with psychedelic drugs such as LSD or psilocybin (the active ingredient of ‘magic mushrooms’), has been suggested to improve psychological well-being. For this reason, trials on psychedelic therapy for the treatment of depression, addiction and other conditions are ongoing. Recently, ‘microdosing’—a way of administering psychedelics that involves taking about 10% of a recreational dose 2 or 3× per week—has gained popularity. Unlike taking large doses of psychedelics, microdosing does not induce hallucinations, but anecdotal reports suggest that it yields similar benefits as psychedelic therapy.
A key feature of modern medicine are ‘placebo control’ studies that compare two groups of patients: one that takes a drug and another that takes inactive pills, known as placebos. Crucially, neither group knows whether they are taking drug or placebo. This control ensures that observed effects are due to the drug itself and not to unrelated psychological causes. For example, in trials of mood medicines, participants often expect to feel happier, which in itself improves their mood even when taking a placebo. This is known as the placebo effect.
Restrictive drug policies make placebo-controlled studies on psychedelics difficult and expensive, in particular for microdosing, which involves taking psychedelics over a longer time period. To overcome this problem, Szigeti et al. developed a new citizen-science approach, where microdosers implemented their own placebo control based on online instructions. The advantages are the low cost and the ability to recruit participants globally. The experiment was completed by 191 microdosers, making it the largest placebo-controlled study on psychedelics to-date, for a fraction of the cost of an equivalent clinical study.
The trial examined whether psychedelic microdosing can improve cognitive function and psychological well-being. The team found that microdosing statistically-significantly increased a number of psychological measures, such as well-being and life satisfaction. However, participants taking placebo also improved: there were no statistically-significant differences between the two groups. The findings confirmed positive anecdotes about microdosing improving people’s moods, but at the same time show that taking empty capsules, knowing they might be microdoses, has the same benefits. This result suggests that the observed benefits are not caused by the microdose, but rather by psychological expectations.
The study’s innovative ‘do-it-yourself’ approach to placebo control may serve as a template for future citizen science studies on other popular phenomena where positive expectations and social factors could play a role, such as cannabidiol (CBD) oils, nootropics and nutrition.
Psychedelic microdosing describes the ingestion of near-threshold perceptible doses of classic psychedelic substances. Anecdotal reports and observational studies suggest that microdosing may promote positive mood and well-being, but recent placebo-controlled studies failed to find compelling evidence for this.
The present study collected web-based mental health and related data using a prospective (before, during and after) design. Individuals planning a weekly microdosing regimen completed surveys at strategic timepoints, spanning a core 4-week test period. 81 participants completed the primary study endpoint. Results revealed increased self-reported psychological well-being, emotional stability and reductions in state anxiety and depressive symptoms at the four-week primary endpoint, plus increases in psychological resilience, social connectedness, agreeableness, nature relatedness and aspects of psychological flexibility. However, positive expectancy scores at baseline predicted subsequent improvements in well-being, suggestive of a substantial placebo response. This study highlights a role for positive expectancy in predicting positive outcomes following psychedelic microdosing and cautions against zealous inferences on its putative therapeutic value.
…Due to the pragmatic challenges of doing so via an online observational study, the present study did not include a placebo control condition. We did, however, employ a prospective, naturalistic design that included baseline sampling of expectations about possible outcomes from the impending microdosing. Well-being, state anxiety and depressive symptom scores were measured weekly on five occasions (pre-dosing at baseline to week 4 of the microdosing regimen) in order to track time-dependent changes. Neuroticism/emotional stability was measured pre-dosing at baseline and post-dosing at week 4 only. It was predicted that well-being and emotional stability would be increased, and that depression and anxiety scores would be decreased, at the key-endpoint (4 weeks) compared with baseline. Capitalising on the nature of the prospective design, we also predicted that baseline positive expectations about microdosing would be related to any subsequent improvements in well-being, depressive symptoms and anxiety scores. Finally, exploratory analyses were performed to assess pre-post changes in a range of secondary psychological outcomes of interest.
…Expectancy effect on main outcome change scores: One-tailed partial correlations using the Pearson coefficient were employed in order to investigate the effects of baseline expectations on endpoint change scores (endpoint minus baseline) for the primary outcome variables (well-being, depressive symptoms and anxiety), whilst controlling for the corresponding baseline scores. In line with our main hypothesis, expectations for well-being improvement were statistically-significantly associated with change scores in well-being (r = 0.275 [d = −0.57], p = 0.007), depressive symptoms (r = −0.263 [d = −0.54], p = 0.009) and anxiety (r = −0.220 [d = −0.45], p = 0.025). These results indicate that baseline expectations were predictive of mental health change at the study endpoint.
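The partial-correlation approach described above (correlating expectation with change scores while controlling for baseline) can be sketched by residualization; this is a minimal illustration with invented toy data, not the authors’ analysis code:

```python
import numpy as np

def partial_corr(x, y, z):
    # First-order Pearson partial correlation r_xy.z: regress x and y
    # on z (with an intercept) by least squares, then correlate the
    # two residual vectors.
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy data standing in for the study's variables: baseline well-being,
# baseline expectation of improvement, and endpoint change scores.
rng = np.random.default_rng(0)
baseline = rng.normal(size=200)
expectation = 0.5 * baseline + rng.normal(size=200)
change = 0.3 * expectation - 0.4 * baseline + rng.normal(size=200)
r = partial_corr(expectation, change, baseline)  # positive, since change loads on expectation
```

Residualizing both variables on the covariate before correlating is algebraically equivalent to the usual first-order partial-correlation formula, and makes explicit what “controlling for baseline” does: any association explainable by baseline scores alone is removed.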
2016-foroughi.pdf: “Placebo effects in cognitive training”, (2016-07-05):
Placebo effects pose problems for some intervention studies, particularly those with no clearly identified mechanism. Cognitive training falls into that category, and yet the role of placebos in cognitive interventions has not yet been critically evaluated. Here, we show clear evidence of placebo effects after a brief cognitive training routine that led to substantial fluid intelligence gains. Our goal is to emphasize the importance of ruling out alternative explanations before attributing the effect to interventions. Based on our findings, we recommend that researchers account for placebo effects before claiming treatment effects.
Although a large body of research shows that general cognitive ability is heritable and stable in young adults, there is recent evidence that fluid intelligence can be heightened with cognitive training. Many researchers, however, have questioned the methodology of the cognitive-training studies reporting improvements in fluid intelligence: specifically, the role of placebo effects.
We designed a procedure to intentionally induce a placebo effect via overt recruitment in an effort to evaluate the role of placebo effects in fluid intelligence gains from cognitive training. Individuals who self-selected into the placebo group by responding to a suggestive flyer showed improvements after a single, 1-h session of cognitive training that equates to a 5-point to 10-point increase on a standard IQ test. Controls responding to a non-suggestive flyer showed no improvement.
These findings provide an alternative explanation for effects observed in the cognitive-training literature and the brain-training industry, revealing the need to account for confounds in future research.
…We also observed differences between groups for scores on the Theories of Intelligence scale, which measures beliefs regarding the malleability of intelligence (34). The participants in the placebo group reported substantially higher scores on this index compared with controls [B = 14.96, SE = 1.93, t(48) = 7.75, p < 0.0001, d = 2.15], indicating a greater confidence that intelligence is malleable. These findings indicate that our manipulation via recruitment flyer produced statistically-significantly different groups with regard to expectancy. We did not detect differences in Need for Cognition scores (41) [B = 0.56, SE = 5.67, t(48) = 0.10, p = 0.922] (Figure 3). Together, these results support the interpretation that participants self-selected into groups based on differing expectations.
2021-brown.pdf: “Can You Ever Be Too Smart for Your Own Good? Comparing Linear and Nonlinear Effects of Cognitive Ability on Life Outcomes”, (2021-03-08):
Despite a long-standing expert consensus about the importance of cognitive ability for life outcomes, contrary views continue to proliferate in scholarly and popular literature. This divergence of beliefs presents an obstacle for evidence-based policymaking and decision-making in a variety of settings. One commonly held idea is that greater cognitive ability does not matter or is actually harmful beyond a certain point (sometimes stated as > 100 or 120 IQ points). We empirically tested these notions using data from four longitudinal, representative cohort studies comprising 48,558 participants in the United States and United Kingdom from 1957 to the present. We found that ability measured in youth has a positive association with most occupational, educational, health, and social outcomes later in life. Most effects were characterized by a moderate to strong linear trend or a practically null effect (mean R2 range = 0.002–.256). Nearly all nonlinear effects were practically insignificant in magnitude (mean incremental R2 = 0.001) or were not replicated across cohorts or survey waves. We found no support for any downside to higher ability and no evidence for a threshold beyond which greater scores cease to be beneficial. Thus, greater cognitive ability is generally advantageous—and virtually never detrimental.
Effective management of global crises relies on expert judgment of their societal effects. How accurate are such judgments?
In the spring of 2020, we asked behavioral scientists (n = 717) and lay Americans (n = 394) to make predictions about COVID-19 pandemic-related societal change across social and psychological domains. Six months later we obtained retrospective assessments for the same domains (scientists: n = 270; lay people: n = 411). Scientists and lay people were equally inaccurate in judging COVID’s impact, both in prospective predictions and retrospective assessments. Across studies and samples, estimates of the magnitude of change were off by more than 20% and less than half of participants accurately predicted the direction of changes. Critically, these insights go against public perceptions of behavioral scientists’ ability to forecast such changes (n = 203): behavioral scientists were considered most likely to accurately predict societal change and most sought after for recommendations across a wide range of professions.
Taken together, we find that behavioral scientists and lay people fared poorly at predicting the societal consequences of the pandemic and misperceive what effects it may have already had.
Working memory (WM) training has been proposed as a promising intervention to enhance cognitive abilities, but convincing evidence for transfer to untrained abilities is lacking. Prevalent limitations of training studies include the narrow assessment of both WM and cognitive abilities, the analysis of manifest variables subject to measurement error, and training dosages too low to likely cause changes in the cognitive system.
To address these limitations, we conducted a 2-year longitudinal study to investigate the effects of WM training on latent factors of WM capacity, fluid intelligence and crystallized intelligence. 112 students initially attending 9th grade practiced a heterogeneous set of validated WM tasks on a bi-weekly basis. A control group of 113 students initially attending 9th grade participated in the pretest and posttest. Broad and prototypical measures of fluid and crystallized intelligence served as measures of near and far transfer.
We found substantial and reliable training effects on the practiced WM tasks, as well as on a latent factor constituted by them. However, no transfer of training effects to latent factors of fluid or crystallized intelligence was observed.
These results question the utility and validity of WM training as a means of improving cognitive abilities.
[Keywords: cognitive training, intelligence, change score models, transfer effects, working memory]
- Dream reports given after people awaken are often fragmentary and distorted
- Our methods allow for two-way communication with individuals during a lucid dream
- For a proof-of-concept demonstration, we presented math problems and yes-no questions
- Dreamers answered in real time with volitional eye movements or facial muscle signals
Dreams take us to a different reality, a hallucinatory world that feels as real as any waking experience. These often-bizarre episodes are emblematic of human sleep but have yet to be adequately explained. Retrospective dream reports are subject to distortion and forgetting, presenting a fundamental challenge for neuroscientific studies of dreaming.
Here we show that individuals who are asleep and in the midst of a lucid dream (aware of the fact that they are currently dreaming) can perceive questions from an experimenter and provide answers using electrophysiological signals. We implemented our procedures for two-way communication during polysomnographically verified rapid-eye-movement (REM) sleep in 36 individuals. Some had minimal prior experience with lucid dreaming, others were frequent lucid dreamers, and one was a patient with narcolepsy who had frequent lucid dreams.
During REM sleep, these individuals exhibited various capabilities, including performing veridical perceptual analysis of novel information, maintaining information in working memory, computing simple answers, and expressing volitional replies. Their responses included distinctive eye movements and selective facial muscle contractions, constituting correctly answered questions on 29 occasions across 6 of the individuals tested.
These repeated observations of interactive dreaming, documented by four independent laboratory groups, demonstrate that phenomenological and cognitive characteristics of dreaming can be interrogated in real time. This relatively unexplored communication channel can enable a variety of practical applications and a new strategy for the empirical exploration of dreams.
[Keywords: REM sleep, interactive dreaming, lucid dream, sleep learning, targeted memory reactivation, sensory processing, two-way communication, sleep mentation, dreams, consciousness]
- Lilliputian hallucinations are not as harmless as traditionally assumed.
- Their etiology is diverse, with CNS pathology accounting for a third of the cases.
- Therefore, in most cases auxiliary investigations are advisable.
- Treatment is directed at the underlying cause.
- A failure of size constancy may explain part of the underlying mechanism.
Lilliputian hallucinations concern hallucinated human, animal or fantasy entities of minute size. Having been famously described by the French psychiatrist Raoul Leroy in 1909, who wrote from personal experience, to date they are mentioned almost routinely in textbooks of psychiatry, albeit with little in-depth knowledge.
I therefore systematically reviewed 145 case reports and case series comprising 226 case descriptions, concluding that lilliputian hallucinations are visual (61%) or multimodal (39%) in nature. In 97% of the cases, they are perceived as grounded in the actual environment, thus indicating involvement of higher-level regions of the perceptual network subserving the fusion of sensory and hallucinatory content. Perceptual release and deafferentiation [“loss of peripheral afferent input, believed to lead under many circumstances to central hyperirritability or excitatory states”] are the most likely underlying mechanisms. Etiology is extremely diverse, with schizophrenia spectrum disorder, alcohol use disorder and loss of vision accounting for 50% of the cases and neurological disease for 36%. Recovery was obtained in 62% of the cases, whereas 18% of the cases ended in chronicity and 8% in death.
Recommendations are made for clinical practice and future research.
[Keywords: Alcohol hallucinosis, Charles Bonnet syndrome, entity experience, intoxication, multimodal hallucination, psychedelics, size constancy]
Unlike Granny and her giant group of salmon-eating family members, transient orcas travel in smaller packs and are known for their wily hunting abilities: They can tip a sheet of ice in order to catapult a seal into the sea, or take down a porpoise in midair…A paper published this month describes how transients in the Salish Sea can intentionally strand themselves, hauling their bodies out on land, in order to hunt seals.
…Suddenly, the whales disappeared, and a uniform ripple appeared on the water’s surface. A small seal was swimming near the rocky shoreline, and the orca family had used its massive collective bulk to send an underwater pressure wave racing toward it. A second ripple rose from the surface, and the seal, knocked off balance, disappeared. Very quickly, it was clear that the family had triumphed: Gulls circled overhead, eager to claim the bits of seal that the whales would leave behind.
This is the hunt, the daily fight of mammal-eating orcas. It’s a dance with these creatures, a constant balance of risk and reward—the more aggressive the prey, the more likely they are to be injured in the battle. While residents have to work together to hunt salmon, salmon don’t fight back. For the transients, Hafey said, every meal is a potential death match: “It’s as if every time you opened the fridge you had to have mortal combat with a turkey to get a sandwich.”
Granny and her kin are considered part of the same species as transient killer whales, Orcinus orca. But residents and transients have lived separate lives for at least a quarter-million years. They generally do their best to avoid each other, and they don’t even speak the same language—the patterns and sounds they use to communicate are completely different. Over time, each type has established cultural traditions that are passed from generation to generation. While transients’ small groups enable them to hunt more quietly and effectively, residents’ large extended families allow them to work together to locate and forage for fish. Biology isn’t destiny, but for orcas, food sources might be.
…But these island celebrities are slowly dying. 40% of the Chinook salmon runs that enter the Salish Sea are already extinct, and a large proportion of the rest are threatened or endangered. The fish that are still around are much smaller than their predecessors, forcing whales to work harder and swim more for their meals. The resident population now numbers only 74, down from 97 in 1996.
Meanwhile, the sea lion population on the West Coast, which was protected from hunting in the United States and Canada in the 1970s, has bounced back from near extinction and is close to its historic size. The mammal-eating transient orcas are thriving in part because of this boom: During the years that Tahlequah was believed to suffer a miscarriage and the death of her newborn calf, T37A birthed the five calves who now played by her side. The transient population, which in 2018 reached 349, grew at about 4% a year for most of the past decade, and is well on its way to replacing the residents as the dominant killer whale in the Salish Sea.
…Without a reliable supply of fish, the resident orcas are beginning to behave more like transients. But, unlike the transients, they can’t just start eating squid, herring, or seals—they learn from birth that fish is their only food. “They can’t change their diet”, Deborah Giles, an orca researcher with the University of Washington Center for Conservation Biology, told me. “Theoretically, they could, but I’m reluctant to say they will switch, because they have this deep, intense cultural direction from their moms not to eat that thing.” (By occupying different positions on the food chain, the residents and transients avoid competing with each other, lessening the likelihood of aggressive encounters.)
The transient orcas are changing their behavior as well, Shields told me. While they typically travel and hunt as small family units of three to five whales, she’s recently seen them traveling in groups of 20 to 40. The groups are almost like the resident superpods of years past, Shields said. Researchers have nicknamed them “T-parties.” “They’re definitely less focused on being stealthy and hunting”, Shields continued. It’s possible that mammal-eating orcas have such abundant food that they don’t need to spend as much time hunting—and can spend more time socializing.
…The residents are speaking, loudly, to anyone who is listening. They are moving away from their summer homes, searching high and low for salmon they once found with ease. They are struggling to give birth, to keep their babies alive, to keep up with a rapidly shifting world. At the same time, the transients are quietly waiting to be heard.
1995-watanabe.pdf: “Estimation of the total saliva volume produced per day in five-year-old children”, (1995-08-01):
15 boys and 15 girls were asked to record for 2 days the time spent awake, eating meals or snacks, and sleeping. The salivary flow rates elicited by chewing foods were also determined.
The mean flow rate (± SD) of unstimulated saliva was 0.26 ± 0.16 ml/min and that of saliva while chewing six different foods was 3.6 ± 0.8 ml/min. The mean times spent eating, and awake but not eating, were 80.8 ± 27.3 and 820 ± 59 min, respectively, and the volumes of saliva produced during those periods would average about 288 and 208 ml, respectively.
If the flow rate is virtually zero during sleep, the estimated total salivary volume produced per day is calculated to be about 500 ml.
[Keywords: unstimulated salivary volume, stimulated salivary volume, chewing foods, children]
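The abstract’s estimate can be checked with simple arithmetic; a sketch using only the summary rates quoted above (the paper’s 288 ml and 208 ml figures come from finer-grained, food-specific flow rates, so the products here differ slightly, but the daily total still lands near 500 ml):

```python
# Back-of-the-envelope check of the daily saliva-volume estimate,
# using the summary figures from the abstract.
unstimulated_rate = 0.26  # ml/min, awake but not eating
chewing_rate = 3.6        # ml/min, while eating
eating_minutes = 80.8     # mean min/day spent eating
awake_minutes = 820       # mean min/day awake but not eating

eating_volume = chewing_rate * eating_minutes       # ~291 ml (paper: ~288 ml)
resting_volume = unstimulated_rate * awake_minutes  # ~213 ml (paper: ~208 ml)
# Flow during sleep is taken as virtually zero, so these two terms
# are the whole estimate.
total = eating_volume + resting_volume              # ~504 ml, i.e. roughly 500 ml/day
```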
“The Aesthetic-Usability Effect”, (2017-01-29):
Users are more tolerant of minor usability issues when they find an interface visually appealing. This aesthetic-usability effect can mask UI problems and can prevent issue discovery during usability testing. Identify instances of the aesthetic-usability effect in your user research by watching what your users do, as well as listening to what they say.
It’s a familiar frustration to usability-test moderators: You watch a user struggle through a suboptimal UI, encountering many errors and obstacles. Then, when you ask the user to comment on her experience, all she can talk about is the site’s great color scheme:
During usability testing, one user encountered many issues while shopping on the FitBit site, ranging from minor annoyances in the interaction design to serious flaws in the navigation. She was able to complete her task, but with difficulty. However, in a post-task questionnaire, she rated the site very highly in ease of use. “It’s the colors they used”, she said. “Looks like the ocean, it’s calm. Very good photographs.” The positive emotional response caused by the aesthetic appeal of the site helped mask its usability issues.
Instances like this are often the result of the aesthetic-usability effect.
Definition: The aesthetic-usability effect refers to users’ tendency to perceive attractive products as more usable. People tend to believe that things that look better will work better—even if they aren’t actually more effective or efficient.
“They Might Never Tell You It’s Broken”, (2019-11-02):
As part of my PhD, I developed Higgs, an experimental JIT compiler…I developed it on GitHub, completely in the open, and wrote about my progress on this blog. Pretty soon, the project had 300 stars on GitHub, a handful of open source contributors, and I was receiving some nice feedback.
…One day, someone I had been exchanging with on the chat room for two weeks reached out to me to signal a strange bug. They couldn’t get the tests to pass and were getting a segmentation fault. I was puzzled. They asked me if Higgs had MacOS support. I explained that I’d never tested it on MacOS myself, but I couldn’t see any reason why it wouldn’t work. I told this person that the problem was surely on their end. Higgs had been open source for over a year. It was a pretty niche project, but I knew for a fact that at least 40–60 people must have tried it, and at least 50% of these people must have been running MacOS. I assumed that surely, if Higgs didn’t run on MacOS at all, someone would have opened an issue by now. Again, I was wrong.
…It’s a horrifying thought, but it could be that for every one person who opens an issue on GitHub, 100 or more people have already tried your project, run into that same bug, and simply moved on.
[Gwern.net examples of this include: 400,000+ Chinese visitors to This Waifu Does Not Exist not mentioning that the mobile version was horribly broken; Apple users not mentioning that 80% of Gwern.net videos didn’t play for them; the Anime Faces page loading 500MB+ of files on each page load… Another fun example: popups on all Wikipedias worldwide for ~5 months (September 2020–January 2021), could be disabled but not re-enabled (affecting ~24 billion page views per month or ~120 billion page views total); no one mentioned it until we happened to investigate the feature while cloning it for Gwern.net.]
Apple keeps doing things in the Mac OS that leave the user-experience (UX) community scratching its collective head, things like hiding the scroll bars and placing invisible controls inside the content region of windows on computers.
Apple’s mobile devices are even worse: It can take users upwards of 5 seconds to accurately drop the text pointer where they need it, but Apple refuses to add the arrow keys that have belonged on the keyboard from day-one.
Apple’s strategy is exactly right—up to a point:
Apple’s decisions may look foolish to those schooled in UX, but balance that against the fact that Apple consistently makes more money than the next several leaders in the industry combined.
While it’s true Apple is missing something—arrow keys—we in the UX community are missing something, too: Apple’s razor-sharp focus on a user many of us often fail to even consider: The potential user, the buyer. During the first Jobsian era at Apple, I used to joke that Steve Jobs cared deeply about Apple customers from the moment they first considered purchasing an Apple computer right up until the time their check cleared the bank.
…What do most buyers not want? They don’t want to see all kinds of scary-looking controls surrounding a media player. They don’t want to see a whole bunch of buttons they don’t understand. They don’t want to see scroll bars. They do want to see clean screens with smooth lines. Buyers want to buy Ferraris, not tractors, and that’s exactly what Apple is selling.
… Let me offer two examples of Apple objects that aid in selling products, but make life difficult for users thereafter.
The Apple Dock: The Apple Dock is a superb device for selling computers for pretty much the same reasons that it fails miserably as a day-to-day device: A single glance at the Dock lets the potential buyer know that this is a computer that is beautiful, fun, approachable, easy to conquer, and you don’t have to do a lot of reading. Of course, not one of these attributes is literally true, at least not if the user ends up exploiting even a fraction of the machine’s potential, but such is the nature of merchandizing, and the Mac is certainly easier than the competition.
The real problem with the Dock is that Apple simultaneously stripped out functionality that was far superior, though less flashy, when they put the Dock in.
Invisible Scroll Bars:
“Gee, the screen looks so clean! This computer must be easy to use!” So goes the thinking of the buyer when seeing a document open in an Apple store, exactly the message Apple intends to impart. The problem right now is that Apple’s means of delivering that message is actually making the computer less easy to use!
…the scroll bar has become a vital status device as well, letting you know at a glance the size of and your current position within a document…Hiding the scroll bar, from a user’s perspective, is madness. If the user wants to actually scroll, it’s bad enough: He or she is now forced to use a thumbwheel or gesture to invoke scrolling, as the scroll bar is no longer even present. However, if the user simply wants to see their place within the document, things can quickly spiral out of control: The only way to get the scroll bar to appear is to initiate scrolling, so the only way to see where you are right now in a document is to scroll to a different part of the document! It may only require scrolling a line or two, but it is still crazy on the face of it! And many windows contain panels with their own scroll bars as well, so trying to trick the correct one into turning on, if you can do so at all (good luck with Safari!) can be quite a challenge…(The scroll bars, even when turned on, are hard to see with their latest mandatory drab gray replacing bright blue and are now so thin they take around twice as long to target as earlier scroll bars. When a company ships products either before user testing or after ignoring the results of that testing, both their product and their users suffer.)
…Industrial design: Borrow the aesthetic, ignore the limitation
While Apple has copied over the aesthetics of industrial design into the software world, they have also copied over its limitation: Whether it be a tractor, Ferrari, or electric toaster, that piece of hardware, in the absence of upgradeable software, will look and act the same the first time you use it as the thousandth time. Software doesn’t share that natural physical limitation, and Apple must stop acting as though it does.
“Lights and Shadows”, (2020-07-01):
[Tutorial on illumination & geometry with interactive JS widgets for visualizing ray-casting of lights and shadows. Topics: light power, position, logarithmic perception, distance & angle governing intensity (‘irradiance’), radians, casting onto spheres, luminance, reflections, and color.]
It’s hard to describe how paramount light is. Ultimately, it is the only thing we see. But just as important as the presence of light is its absence. To talk about light we have to start in darkness, so let’s jump straight into it. Light is the visible portion of electromagnetic radiation, but in this article I’m not going to discuss any of the underlying details like wave-particle duality. Instead, I’ll try to explain how light creates so many beautiful effects seen in everyday life. In the demonstration below you can use the sliders to control the position and size of a rectangular light source. You can also drag around the scene to see it from different angles…
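The two relationships the tutorial builds on, intensity falling with the square of distance and with the angle of incidence, can be written as a minimal sketch (not code from the article, which uses interactive JS widgets; function and parameter names here are illustrative):

```python
import math

def irradiance(power_w, distance_m, angle_deg):
    """Irradiance on a surface from an idealized point light.

    The light's power spreads over a sphere of area 4*pi*r^2
    (inverse-square law), and a tilted surface intercepts less of it
    in proportion to the cosine of the angle between its normal and
    the incoming ray (Lambert's cosine law).
    """
    spherical_area = 4 * math.pi * distance_m ** 2
    return (power_w / spherical_area) * math.cos(math.radians(angle_deg))

# Doubling the distance cuts the irradiance to roughly a quarter,
# and grazing incidence (90 degrees) receives essentially nothing.
near = irradiance(100, 1.0, 0)
far = irradiance(100, 2.0, 0)
```

This is why a lamp twice as far away looks much more than twice as dim, and why surfaces facing away from a light fall into shadow gradually rather than abruptly.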
“Large Batch Simulation for Deep Reinforcement Learning”, (2021-03-12):
We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single 8-GPU machine.
The key idea of our approach is to design a 3D renderer and embodied navigation simulator around the principle of “batch simulation”: accepting and executing large batches of requests simultaneously. Beyond exposing large amounts of work at once, batch simulation allows implementations to amortize in-memory storage of scene assets, rendering work, data loading, and synchronization costs across many simulation requests, dramatically improving the number of simulated agents per GPU and overall simulation throughput.
To balance DNN inference and training costs with faster simulation, we also build a computationally efficient policy DNN that maintains high task performance, and modify training algorithms to maintain sample efficiency when training with large mini-batches.
By combining batch simulation and DNN performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a state-of-the-art system using a 64-GPU cluster over 3 days. We provide open-source reference implementations of our batch 3D renderer and simulator to facilitate incorporation of these ideas into RL systems.
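The batch-simulation principle, one call advancing many agents at once so that fixed per-step costs are amortized across the whole batch, can be sketched with a toy vectorized environment (hypothetical class names and dynamics, not the paper’s actual 3D renderer):

```python
import numpy as np

class BatchSimulator:
    """Toy batch simulator: one step() call services N agents at once,
    so dispatch, synchronization, and shared scene-asset costs are paid
    once per batch instead of once per agent."""

    def __init__(self, num_agents, obs_dim=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.num_agents = num_agents
        self.obs_dim = obs_dim

    def reset(self):
        # One (num_agents, obs_dim) array of observations for the batch.
        return self.rng.standard_normal((self.num_agents, self.obs_dim))

    def step(self, actions):
        # Execute the entire batch of requests in one vectorized pass.
        assert actions.shape[0] == self.num_agents
        obs = self.rng.standard_normal((self.num_agents, self.obs_dim))
        rewards = -np.abs(actions).sum(axis=1)      # placeholder dynamics
        dones = self.rng.random(self.num_agents) < 0.01
        return obs, rewards, dones

sim = BatchSimulator(num_agents=1024)
obs = sim.reset()
obs, rewards, dones = sim.step(np.zeros((1024, 2)))
```

The same batched interface is what lets the policy DNN run inference on all 1,024 observations in a single forward pass, balancing simulation and inference costs as the abstract describes.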
2021-meyer.pdf: “The Use and Misuse of Income Data and Extreme Poverty in the United States”, (2021; ):
Recent research suggests that the share of US households living on less than $2/person/day is high and rising.
We reexamine such extreme poverty by linking SIPP and CPS data to administrative tax and program data.
We find that more than 90% of those reported to be in extreme poverty are not, once we include in-kind transfers, replace survey reports of earnings and transfer receipt with administrative records, and account for ownership of substantial assets. More than half of all misclassified households have incomes from the administrative data above the poverty line, and many have middle-class measures of material well-being.
2006-mackenzie.pdf: “Is economics performative? Option theory and the construction of derivatives markets”, (2006-08-23; ):
The thesis that economics is “performative” (Callon 1998) has provoked much interest but also some puzzlement and not a little confusion. The purpose of this article is to examine from the viewpoint of performativity one of the most successful areas of modern economics, the theory of options, and in so doing hopefully to clarify some of the issues at stake. To claim that economics is performative is to argue that it does things, rather than simply describing (with greater or lesser degrees of accuracy) an external reality that is not affected by economics. But what does economics do, and what are the effects of it doing what it does?
That the theory of options is an appropriate place around which to look for performativity is suggested by two roughly concurrent developments. Since the 1950s, the academic study of finance has been transformed from a low-status, primarily descriptive activity to a high-status, analytical, mathematical, Nobel-prize-winning enterprise. At the core of that enterprise is a theoretical account of options dating from the start of the 1970s (Black-Scholes). Around option theory there has developed a large array of sophisticated mathematical analyses of financial derivatives. (A “derivative” is a contract or security, such as an option, the value of which depends upon the price of another asset or upon the level of an index or interest rate.)
…Away from the hubbub, computers were used to generate Black-Scholes prices. Those prices were reproduced on sets of paper sheets which floor traders could carry around, often tightly wound cylindrically with only immediately relevant rows visible so that a quick squint would reveal the relevant price. While some individual traders and trading firms produced their own sheets, others used commercial services. Perhaps the most widely used sheets were sold by Fischer Black himself: see figure 2. Each month, Black would produce computer-generated sheets of theoretical prices for all the options traded on U.S. options exchanges, and have them photocopied and sent to those who subscribed to his pricing service. In 1975, for example, sheets for 100 stocks, with 3 volatility estimates for each stock, cost $300 per month (~$1,170 in current dollars), while a basic service with one stock and one volatility estimate cost $15 per month (~$59) (Black 1975b, “The Option Service: An Introduction”)
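Each cell of such a sheet was a theoretical price from the Black-Scholes formula for a European call. A minimal sketch using only the standard library (the example inputs are hypothetical, not figures from Black’s actual sheets):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    """Black-Scholes (1973) price of a European call option:
    spot price S, strike K, time to expiry T (in years),
    risk-free rate r, and annualized volatility sigma."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# One hypothetical sheet cell: spot $40, strike $40, 3 months to
# expiry, 5% risk-free rate, 30% volatility.
price = black_scholes_call(40, 40, 0.25, 0.05, 0.30)
```

A full sheet is just this function tabulated over a grid of spot prices and expiry dates, one table per volatility estimate, which is why Black’s three-volatility service cost more than the basic one.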
At first sight, Black’s sheets look like monotonous arrays of figures. They were, however, beautifully designed for their intended role in “distributed cognition” (Hutchins 1995a and b). Black included what options traders using the Black-Scholes-Merton model needed to know, but no more than they needed to know—there is virtually no redundant information on a sheet—hence their easy portability. He found an ad hoc but satisfactory way of dealing with the consequences of dividends for option pricing (an issue not addressed in the original version of the model), and devoted particular care to the crucial matter of the estimation of volatility. Even the physical size of the sheets was well-judged. Prices had first to be printed on the large computer line-printer paper of the period, but they were then photo-reduced onto standard-sized paper, differently colored for options traded on the different exchanges. The resultant sheets were small enough for easy handling, but not so small that the figures became too hard to read (the reproduction in figure 2 is smaller than full-scale).
How were Black’s sheets and similar option pricing services used? They could, of course, simply be used to set option prices. In April 1976, options trading began on the Pacific Stock Exchange in San Francisco, and financial economist Mark Rubinstein became a trader there. He told me in an interview that he found his fellow traders on the new exchange initially heavily reliant on Black’s sheets: “I walked up [to the most active option trading ‘crowd’] and looked at the screen [of market prices] and at the sheet and it was identical. I said to myself, ‘academics have triumphed’” (Rubinstein 2000).
It was long assumed that only humans can distinguish the living from the dead. Renewed interest in this question over the last decade has led several authors to assert that non-human primates are also aware of death. We investigate this issue by comparing the behaviours of monkeys and great apes toward helpless conspecifics, basing our analysis on published reports.
We first examine the behaviours of mothers towards the body of their dead offspring. They may carry the corpse for days or more before abandoning it. They groom, inspect and protect it, sometimes allowing group members to explore it, and rare cases of cannibalism have been reported. No substantial difference is observed in the way that monkeys and great apes treat the bodies of infants.
We then examine responses to collapsed (still able to move and react) and inanimate (unresponsive or dead) conspecifics. Monkeys and great apes guard, care for and inspect their helpless partners, and also manipulate and mobilize them. Through these actions, individuals may inform themselves about the state of their partners, test their responsiveness and/or attempt to rouse them. It appears that only chimpanzees and gorillas show violent actions such as display behaviours and the rough treatment of bodies. They can also make distress calls, and periods of “stunned silence” sometimes occur in chimpanzees, indicating that they are experiencing intense emotion.
Finally, we argue that while both monkeys and great apes detect body dysfunction through the victims’ inability to wake up and move, only great apes can understand that something serious has happened. The signs of emotional disturbance reported in them indicate that they may believe that inanimate conspecifics have entered a state of “dormancy”, meaning that they are unlikely to regain wakefulness. However, there is no evidence that any non-human primates are aware of mortality.
[Keywords: death, emotion, distress, empathy, mental representation, epimeletic behaviour, primate]