December 2020 gwern.net newsletter with links on AI and technology; major new site feature: fully-generalized recursive popups.
newsletter · 2019-12-26–2021-01-06 · finished · certainty: log · importance: 0
Gwern.net: recursive link annotations (memoized for efficiency in a “Link Bibliography”); fully generalized cross-element/page/document/website popups on desktop (demo); syntax highlighting of source code popups (demo); directory indexes
“Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures”, Ding et al 2020 (“DL can’t do reasoning”; reminds me of Santoro et al 2017; the bitter lesson?—if “neurosymbolic” types want their work to be relevant in even 5 years, they’d do better to focus on creating datasets which will induce symbolic reasoning eg if you want understanding of common noun weights, scrape Amazon & dump metadata of millions of listings formatted with shipping weight at the end (forcing a flexible understanding of titles/descriptions, quantities, and special-cases like fragility or danger), or convert knowledge graph databases like OpenCyc/Wikidata to text to distill them into a NN model.)
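(As a toy illustration of the dataset-formatting idea above—putting the shipping weight last so a left-to-right language model must infer it from the rest of the listing—here is a minimal Python sketch; the field names and example listing are hypothetical, not an actual Amazon scrape.)

```python
# Hypothetical sketch: format scraped product listings so the shipping weight
# comes last, forcing a model trained on the text to infer it from the
# title/description/quantity. Field names are made up for illustration.
def listing_to_training_text(listing: dict) -> str:
    lines = [
        f"Title: {listing['title']}",
        f"Description: {listing['description']}",
        f"Quantity: {listing.get('quantity', 1)}",
    ]
    if listing.get("fragile"):
        lines.append("Note: fragile item, extra packaging required.")
    # The target (shipping weight) is always placed at the end, so a
    # left-to-right language model must predict it from everything above.
    lines.append(f"Shipping weight: {listing['shipping_weight_kg']} kg")
    return "\n".join(lines)

example = {
    "title": "Cast iron skillet, 12 inch",
    "description": "Pre-seasoned cast iron pan for stovetop and oven use.",
    "quantity": 2,
    "shipping_weight_kg": 5.4,
}
print(listing_to_training_text(example))
```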
“Imitating Interactive Intelligence”, Interactive Agents Group 2020 (DM blog; “With each doubling of the dataset size, performance grew by approximately the same increment…agents trained to imitate human action and language demonstrate powerful combinatorial generalisation capabilities.” DeepMind gets closer to the goal of matching a mouse.1)
“What to Make of Rod McKuen?” (revisiting the now-forgotten pop poet of the masses; another example of childhood abuse spurring an insatiable need for achievement, made-up if need be)
Newsletter tag: archive of all issues back to 2013 for the gwern.net newsletter (monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.)
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up reasonably-interesting changes and send it out to the mailing list in addition to a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as an RSS feed). Submissions are categorized similarly to the monthly newsletter and will typically be collated there.
Efficiently navigating a superpressure balloon in the stratosphere requires the integration of a multitude of cues, such as wind speed and solar elevation, and the process is complicated by forecast errors and sparse wind measurements. Coupled with the need to make decisions in real time, these factors rule out the use of conventional control techniques. Here we describe the use of reinforcement learning to create a high-performing flight controller. Our algorithm uses data augmentation and a self-correcting design to overcome the key technical challenge of reinforcement learning from imperfect data, which has proved to be a major obstacle to its application to physical systems. We deployed our controller to station Loon superpressure balloons at multiple locations across the globe, including a 39-day controlled experiment over the Pacific Ocean. Analyses show that the controller outperforms Loon’s previous algorithm and is robust to the natural diversity in stratospheric winds. These results demonstrate that reinforcement learning is an effective solution to real-world autonomous control problems in which neither conventional methods nor human intervention suffice, offering clues about what may be needed to create artificially intelligent agents that continuously interact with real, dynamic environments.
Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks in order to gradually synthesize images in a coarse-to-fine manner. Here, we present a new architecture for image generators, where the color value at each pixel is computed independently given the value of a random latent vector and the coordinate of that pixel. No spatial convolutions or similar operations that propagate information across pixels are involved during the synthesis. We analyze the modeling capabilities of such generators when trained in an adversarial fashion, and observe the new generators to achieve similar generation quality to state-of-the-art convolutional generators. We also investigate several interesting properties unique to the new architecture.
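A minimal PyTorch sketch of the core idea as stated in the abstract—each pixel's colour computed independently from the latent vector plus that pixel's coordinate, with no convolutions or attention—is below. This is a toy stand-in for illustration only; the actual architecture is considerably more elaborate.

```python
import torch
import torch.nn as nn

class CoordinatePixelGenerator(nn.Module):
    """Toy coordinate-based generator: every pixel is an independent MLP
    evaluation of (latent vector, pixel coordinate). No convolutions or
    attention propagate information between pixels."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Tanh(),   # RGB in [-1, 1]
        )

    def forward(self, z, height, width):
        # Normalized pixel coordinates in [-1, 1], shape (H*W, 2).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, height),
            torch.linspace(-1, 1, width),
            indexing="ij",
        )
        coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
        # Broadcast the single latent vector to every pixel.
        z_rep = z.expand(coords.shape[0], -1)
        rgb = self.mlp(torch.cat([z_rep, coords], dim=-1))
        return rgb.reshape(height, width, 3)

gen = CoordinatePixelGenerator()
image = gen(torch.randn(1, 64), height=32, width=32)  # (32, 32, 3)
```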
…if an idea comes to mind of artwork I would like to see which doesn’t exist yet, I need to pay real money to commission an artist to do so. In the English-language market, there are quite a few options. These range from art websites such as DeviantArt and FurAffinity, to services which provide you with YCH (Your Character Here) artwork to bid for such as YCH.commishes.
But if you have favourite Japanese artists on services such as Pixiv, then the language barrier may prevent you from outright enquiring whether an artist is taking a commission. This is where Skeb.jp comes in, a Japanese artwork and voice-over commissioning service, which to date has received over 100,000 requests and has thousands of artists taking requests from the public. Like a small but growing number of Japanese artwork sites, English-language support is incorporated into the website. But even more substantial is the ability for English-writing users to submit requests of their own through the DeepL Translator service.
…Each artist profile provides a direct Yes/No answer as to whether they are taking requests, a sample of their public works, approximate rates (minimum and recommended), and how long it typically takes them to deliver the requested artwork. This saves much needless back-and-forth between client and artist over simple information. Instead, the website has you fill out a request form (example below) which allows you to provide specifics and payment. The simple one-page form allows you to enter an overview of the artwork you want commissioned (which is translated into English), provide a sum you are happy to pay, choose whether you want the artwork to be SFW (Safe for Work) or NSFW (Not Safe for Work), and set a few other specifications. Some components of this form (eg. whether NSFW requests are acceptable) can be dictated directly by the artist; otherwise, whatever the client specifies cannot be changed by either party after submission. The request is then sent off to the artist, who has the exclusive right to accept or decline it. Depending on the deadline selected, the artist either has 30 days to accept with a delivery deadline of 60 days after submission, or 7 days to accept with a delivery deadline of 90 days after submission.
…DeepL is a fantastic machine-learning translator service which I use regularly. But adding to the game of chance, translations can on occasion come out with surreal interpretations. These range from instances of こんばんは [“good evening”] being rendered with a dozen exclamation points appended, to things which don’t match what you wrote at all. Beyond the occasional translation issue, there are also character limits. Artists can dictate whether they want requests which are 140 characters in length or 1,000 characters in length. In theory, this is great, as it allows artists to choose between brief requests which leave room for their own creativity and extended requests that draw more on the client’s creativity. But with the 140-character limit, it can get tough to fit more than a short sentence or two of English within the count once it has been translated into Japanese.
DeepL Translator is a free neural machine translation service launched on 28 August 2017 and developed by DeepL GmbH (Linguee), based in Cologne, Germany. It has received positive press asserting that it is more accurate and nuanced than Google Translate.
Artificial intelligence (AI) is surpassing human performance in a growing number of domains. However, there is limited evidence of its economic effects. Using data from a digital platform, we study a key application of AI: machine translation. We find that the introduction of a new machine translation system has significantly increased international trade on this platform, increasing exports by 10.9%. Furthermore, heterogeneous treatment effects are consistent with a substantial reduction in translation costs. Our results provide causal evidence that language barriers significantly hinder trade and that AI has already begun to improve economic efficiency in at least one domain.
This paper explores the hypothesis that the diversity of human languages, right now a barrier to ‘interoperability’ in communication and trade, will become significantly less of a barrier as machine translation technologies are deployed over the next several years. I argue that machine translation will do for ideas in the 2020s what container shipping did for goods trade in the second half of the 20th century. But as with container shipping or railroads in the 19th century, this new boundary-breaking technology does not reduce all boundaries equally, and it creates new challenges for the distribution of ideas and thus for innovation and economic growth. How we develop, license, commercialize, and deploy machine translation will be a critical determinant of its impact on trade, political coalitions, diversity of thought and culture, and the distribution of wealth. [Keywords: trade, globalization, machine translation, inequality, productivity]
Neural networks have achieved success in a wide array of perceptual tasks, but it is often stated that they are incapable of solving tasks that require higher-level reasoning. Two new task domains, CLEVRER and CATER, have recently been developed to focus on reasoning, as opposed to perception, in the context of spatio-temporal interactions between objects. Initial experiments on these domains found that neuro-symbolic approaches, which couple a logic engine and language parser with a neural perceptual front-end, substantially outperform fully-learned distributed networks, a finding that was taken to support the above thesis.
Here, we show on the contrary that a fully-learned neural network with the right inductive biases can perform substantially better than all previous neural-symbolic models on both of these tasks, particularly on questions that most emphasize reasoning over perception. Our model makes critical use of both self-attention and learned “soft” object-centric representations, as well as BERT-style semi-supervised predictive losses. These flexible biases allow our model to surpass the previous neuro-symbolic state-of-the-art using less than 60% of available labelled data.
Together, these results refute the neuro-symbolic thesis laid out by previous work involving these datasets, and they provide evidence that neural networks can indeed learn to reason effectively about the causal, dynamic structure of physical events.
Relational reasoning is a central component of generally intelligent behavior, but has proven difficult for neural networks to learn. In this paper we describe how to use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. We tested RN-augmented networks on three tasks: visual question answering using a challenging dataset called CLEVR, on which we achieve state-of-the-art, super-human performance; text-based question answering using the bAbI suite of tasks; and complex reasoning about dynamic physical systems. Then, using a curated dataset called Sort-of-CLEVR we show that powerful convolutional networks do not have a general capacity to solve relational questions, but can gain this capacity when augmented with RNs. Our work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.
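A hedged sketch of the Relation Network module described here—a shared MLP g applied to every ordered pair of object embeddings, summed, then passed through a second MLP f—is below; it omits the question conditioning and CNN front-end used in the paper.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """RN(O) = f( sum over pairs (i, j) of g(o_i, o_j) ): a shared MLP g scores
    every ordered pair of object embeddings; the summed pair features go to f."""
    def __init__(self, obj_dim=32, hidden=128, out_dim=10):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden), nn.ReLU())
        self.f = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                               nn.Linear(hidden, out_dim))

    def forward(self, objects):            # objects: (batch, n_objects, obj_dim)
        b, n, d = objects.shape
        o_i = objects.unsqueeze(2).expand(b, n, n, d)
        o_j = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([o_i, o_j], dim=-1)        # all ordered pairs
        pair_feats = self.g(pairs).sum(dim=(1, 2))   # permutation-invariant sum
        return self.f(pair_feats)

rn = RelationNetwork()
logits = rn(torch.randn(4, 8, 32))   # 4 scenes of 8 objects -> (4, 10)
```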
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore’s law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.
…In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess…A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale…In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge—knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods…In computer vision…Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
…We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that (1) AI researchers have often tried to build knowledge into their agents, (2) this always helps in the short term, and is personally satisfying to the researcher, but (3) in the long run it plateaus and even inhibits further progress, and (4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
Cyc is a long-term artificial intelligence project that aims to assemble a comprehensive ontology and knowledge base that spans the basic concepts and rules about how the world works. Hoping to capture common sense knowledge, Cyc focuses on implicit knowledge that other AI platforms may take for granted. This is contrasted with facts one might find somewhere on the internet or retrieve via a search engine or Wikipedia. Cyc enables AI applications to perform human-like reasoning and be less "brittle" when confronted with novel situations.
Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, can use under the CC0 public domain license. Wikidata is powered by the software Wikibase.
Subreddit for discussing AI, machine learning, or deep learning approaches involving big numbers: billions of parameters, millions of n, petaflops, etc. eg GPT-3. Most research is conducted at much smaller scale; this subreddit is for research analogous to ‘high energy physics’, requiring specialized approaches, large investments, consortia, etc.
Topics: How? Who? Why do they work? What are they good for? What resources are available? Who will pay & how? What is the future of such approaches? What global consequences will there be?
Open Philanthropy is interested in when AI systems will be able to perform various tasks that humans can perform (“AI timelines”). To inform our thinking, I investigated what evidence the human brain provides about the computational power sufficient to match its capabilities. I consulted with more than 30 experts, and considered four methods of generating estimates [simulating neurons, comparing brain region sizes to similarly powerful algorithms, laws of physics limits, & IO bandwidth/latency], focusing on floating point operations per second (FLOP/s) as a metric of computational power.
In brief, I think it more likely than not that 10^15 FLOP/s is enough to perform tasks as well as the human brain (given the right software, which may be very hard to create). And I think it unlikely (<10%) that more than 10^21 FLOP/s is required.1 But I’m not a neuroscientist, and the science here is very far from settled.2 I offer a few more specific probabilities, keyed to one specific type of brain model, in the report’s appendix.
For context: the Fugaku supercomputer (~$1 billion in 2020 dollars) performs ~4×10^17 FLOP/s, and a V100 GPU (~$10,000) performs up to ~10^14 FLOP/s.3 But even if my best-guesses are right, this doesn’t mean we’ll see AI systems as capable as the human brain anytime soon. In particular: actually creating/training such systems (as opposed to building computers that could in principle run them) is a substantial further challenge.
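Some back-of-the-envelope arithmetic with the figures quoted above (the 10^15 FLOP/s best guess, a ~$10,000 V100 at ~10^14 FLOP/s, Fugaku at ~4×10^17 FLOP/s):

```python
# Back-of-the-envelope comparison using the figures quoted above
# (10^15 FLOP/s best guess, V100 ~10^14 FLOP/s at ~$10,000,
#  Fugaku ~4x10^17 FLOP/s at ~$1 billion).
brain_flops  = 1e15
v100_flops   = 1e14
v100_cost    = 10_000
fugaku_flops = 4e17

v100s_needed = brain_flops / v100_flops            # ~10 GPUs
print(f"V100s to match 1e15 FLOP/s: {v100s_needed:.0f} "
      f"(~${v100s_needed * v100_cost:,.0f} of hardware)")
print(f"Fugaku / brain budget: {fugaku_flops / brain_flops:.0f}x")  # ~400x
```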
Estimates of FLOP/s budgets large enough to perform tasks as well as the human brain (given the right software).
Mechanistic estimates suggesting that 10^13–10^17 FLOP/s would be enough to match the human brain’s task-performance seem plausible to me. Some considerations point to higher numbers; some, to lower numbers. Of these, the latter seem to me stronger.
I give less weight to functional method estimates. However, I take estimates based on the visual cortex as some weak evidence that 10^13–10^17 FLOP/s isn’t much too low. Some estimates based on deep neural network models of retinal neurons point to higher numbers, but I take these as even weaker evidence.
I think it unlikely that the required number of FLOP/s exceeds the bounds suggested by the limit method. However, I don’t think the method itself is airtight.
Communication method estimates may well prove informative, but I haven’t vetted them.
“Imitating Interactive Intelligence”, Interactive Agents Group (Josh Abramson, Arun Ahuja, Arthur Brussee, Federico Carnevale, Mary Cassin, Stephen Clark, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu) (2020-12-10):
A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalize beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount. See videos for an overview of the manuscript, training time-lapse, and human-agent interactions.
…Although the agents do not yet attain human-level performance, we will soon describe scaling experiments which suggest that this gap could be closed substantially simply by collecting more data…The scripted probe tasks are imperfect measures of model performance, but as we have shown above, they tend to be well correlated with model performance under human evaluation. With each doubling of the dataset size, performance grew by approximately the same increment. The rate of performance, in particular for instruction-following tasks, was larger for the BG·A model compared to B·A. Generally, these results give us confidence that we could continue to improve the performance of the agents straightforwardly by increasing the dataset size.
…After training, we asked the models to “Lift an orange duck” or “What colour is the duck?”…Figure 15D shows that the agent trained without orange ducks performed almost as well on these restricted Lift and Color probe tasks as an agent trained with all of the data. These results demonstrate explicitly what our results elsewhere suggest: that agents trained to imitate human action and language demonstrate powerful combinatorial generalisation capabilities. While they have never encountered the entity, they know what an “orange duck” is and how to interact with one when asked to do so for the first time. This particular example was chosen at random; we have every reason to believe that similar effects would be observed for other compound concepts.
Artificial neural networks are most commonly trained with the back-propagation algorithm, where the gradient for learning is provided by back-propagating the error, layer by layer, from the output layer to the hidden layers. A recently discovered method called feedback-alignment shows that the weights used for propagating the error backward don’t have to be symmetric with the weights used for propagating the activation forward. In fact, random feedback weights work equally well, because the network learns how to make the feedback useful.
In this work, the feedback alignment principle is used for training hidden layers more independently from the rest of the network, and from a zero initial condition. The error is propagated through fixed random feedback connections directly from the output layer to each hidden layer. This simple method is able to achieve zero training error even in convolutional networks and very deep networks, completely without error back-propagation.
The method is a step towards biologically plausible machine learning because the error signal is almost local, and no symmetric or reciprocal weights are required. Experiments show that the test performance on MNIST and CIFAR is almost as good as those obtained with back-propagation for fully connected networks. If combined with dropout, the method achieves 1.45% error on the permutation invariant MNIST task.
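To make the mechanism concrete, here is a minimal NumPy sketch of direct feedback alignment on a toy 2-hidden-layer ReLU network: fixed random feedback matrices project the output error straight to each hidden layer in place of backpropagated gradients. It illustrates the principle only; it is not the paper's implementation or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a 2-hidden-layer ReLU network.
d_in, h, d_out, n = 20, 64, 5, 512
X = rng.standard_normal((n, d_in))
Y = rng.standard_normal((n, d_out))

W1 = rng.standard_normal((d_in, h)) * 0.1
W2 = rng.standard_normal((h, h)) * 0.1
W3 = rng.standard_normal((h, d_out)) * 0.1

# Fixed random feedback matrices: project the output error directly
# to each hidden layer (never updated, not tied to W2/W3).
B1 = rng.standard_normal((d_out, h)) * 0.1
B2 = rng.standard_normal((d_out, h)) * 0.1

lr = 1e-3
for step in range(200):
    a1 = X @ W1; h1 = np.maximum(a1, 0)
    a2 = h1 @ W2; h2 = np.maximum(a2, 0)
    y_hat = h2 @ W3
    e = y_hat - Y                      # output error (MSE gradient)

    # DFA: hidden-layer "gradients" come from random projections of e,
    # gated by the local ReLU derivative, instead of backpropagated error.
    delta2 = (e @ B2) * (a2 > 0)
    delta1 = (e @ B1) * (a1 > 0)

    W3 -= lr * h2.T @ e
    W2 -= lr * h1.T @ delta2
    W1 -= lr * X.T @ delta1

    if step % 50 == 0:
        print(step, float(np.mean(e ** 2)))   # training loss should fall
```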
Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective, and study the applicability of Direct Feedback Alignment (DFA) to neural view synthesis, recommender systems, geometric learning, and natural language processing. In contrast with previous studies limited to computer vision tasks, our findings show that it successfully trains a large range of state-of-the-art deep learning architectures, with performance close to fine-tuned backpropagation. When a larger gap between DFA and backpropagation exists, like in Transformers, we attribute this to a need to rethink common practices for large and complex architectures. At variance with common beliefs, our work supports that challenging tasks can be tackled in the absence of weight transport.
The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale-up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute nodes, communication is challenging to orchestrate: this is a bottleneck to further scaling. In this work, we argue that alternative training methods can mitigate these issues, and can inform the design of extreme-scale training hardware. Indeed, using a synaptically asymmetric method with a parallelizable backward pass, such as Direct Feedback Alignment, communication needs are drastically reduced. We present a photonic accelerator for Direct Feedback Alignment, able to compute random projections with trillions of parameters. We demonstrate our system on benchmark tasks, using both fully-connected and graph convolutional networks. Our hardware is the first architecture-agnostic photonic co-processor for training neural networks. This is a significant step towards building scalable hardware, able to go beyond backpropagation, and opening new avenues for deep learning.
Graphcore is a semiconductor company that develops accelerators for AI and machine learning. It aims to make a massively parallel Intelligence Processing Unit (IPU) that holds the complete machine learning model inside the processor.
Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count so have the compute budgets and times required to train them, increasing the need for compute-efficient methods that parallelize training. Two common approaches to parallelize the training of deep networks have been data and model parallelism. While useful, data and model parallelism suffer from diminishing returns in terms of compute efficiency for large batch sizes.
In this paper, we investigate how to continue scaling compute efficiently beyond the point of diminishing returns for large batches through local parallelism, a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation. Local parallelism enables fully asynchronous layer-wise parallelism with a low memory footprint, and requires little communication overhead compared with model parallelism.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
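A toy PyTorch sketch of the flavour of local parallelism: each block is trained against its own auxiliary head, with a detach() at block boundaries so no gradient (and hence no backward communication) crosses between blocks. This is a simplified greedy local-loss variant for illustration, not the paper's exact truncated layer-wise scheme.

```python
import torch
import torch.nn as nn

# Toy "local parallelism": each block gets its own auxiliary classifier and
# optimizer; detach() stops gradients at block boundaries, so blocks could in
# principle be updated asynchronously on different devices.
blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
    nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
])
aux_heads = nn.ModuleList([nn.Linear(64, 10), nn.Linear(64, 10)])
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, aux_heads)]
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(128, 32)
y = torch.randint(0, 10, (128,))

h = x
for block, head, opt in zip(blocks, aux_heads, opts):
    h = block(h.detach())          # no gradient flows into earlier blocks
    loss = loss_fn(head(h), y)     # purely local objective for this block
    opt.zero_grad()
    loss.backward()
    opt.step()
```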
New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms. In this work we evaluate the use of small batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline parallel training algorithm that has significant hardware advantages. We introduce two methods, Spike Compensation and Linear Weight Prediction, that effectively mitigate the downsides caused by the asynchronicity of Pipelined Backpropagation and outperform existing techniques in our setting. We show that appropriate normalization and small batch sizes can also aid training. With our methods, fine-grained Pipelined Backpropagation using a batch size of one can match the accuracy of SGD for multiple networks trained on CIFAR-10 and ImageNet. Simple scaling rules allow the use of existing hyperparameters for traditional training without additional tuning.
Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present the Pile: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets—many of which derive from academic or professional sources. [Common Crawl, PubMed Central, Bibliotik (Books3), OpenWebText2, arXiv, Github, FreeLaw, Stack Exchange, USPTO Backgrounds, PubMed Abstracts, Gutenberg (PG-19), OpenSubtitles, English Wikipedia, DeepMind Mathematics, Ubuntu IRC, BookCorpus2, EuroParl, Hacker News, YouTubeSubtitles, PhilPapers, NIH ExPorter, Enron Emails]
Our evaluation of the untuned performance of GPT-2 and GPT-3 on the Pile shows that these models struggle on many of its components, such as academic writing. Conversely, models trained on the Pile improve significantly over both Raw CC and CC-100 on all components of the Pile, while improving performance on downstream evaluations.
Through an in-depth exploratory analysis, we document potentially concerning aspects of the data for prospective users. We make publicly available the code used in its construction.
[Improving estimates of Transformer scaling curves: the graph usually shown for how GPT-3 model scaling leads to better performance on benchmark tasks is misleading, because the tasks have different ceilings & floors, and because the smaller models were trained for an unfair amount of time—every model was trained on the same (very large) fixed amount of text data, even though this is extremely wasteful of FLOPs, not how practitioners would want to train models, and results in the smallest models performing much better than they ‘ought’ to. Finnveden rescales each benchmark task to be 0–100% (random–perfect), and considers models trained in compute-optimal fashion.
Plotting against loss.
When he re-analyzes the reported benchmark performance, he finds that GPT-3 scaling is far smoother in model size than the original graphs would indicate, with fewer exceptions. (Two of them are explained, I believe, as simply BPE-caused problems; and the third was adversarially collected to target language model weaknesses, and GPT models may just be starting to solve them.)
Extrapolating.
Finnveden further considers extrapolating the scaling law and cross-referencing with model sizes, budgets, and estimated limits on dataset sizes.]
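A sketch of the rescaling-and-extrapolation step being described, using made-up numbers rather than Finnveden's data: normalize each benchmark between its random-guess floor and a 100% ceiling, fit a trend in log-compute, and extrapolate to near-ceiling performance.

```python
import numpy as np

# Illustrative only: made-up accuracies for one benchmark across model scales.
random_baseline = 0.25       # e.g. 4-way multiple-choice floor
ceiling = 1.00
log10_compute = np.array([19, 20, 21, 22, 23])            # training FLOP (fake)
accuracy      = np.array([0.27, 0.33, 0.45, 0.60, 0.74])  # raw benchmark score

# Rescale to 0-100% of the random-to-perfect range.
rescaled = (accuracy - random_baseline) / (ceiling - random_baseline)

# Fit a straight line in log-compute and extrapolate to "close to optimal".
slope, intercept = np.polyfit(log10_compute, rescaled, 1)
target = 0.9                                              # 90% of the way to perfect
log10_needed = (target - intercept) / slope
print(f"Extrapolated log10(FLOP) for 90% rescaled performance: {log10_needed:.1f}")
```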
Conclusions:
On benchmark performance, GPT-3 seems to be in line with performance predicted by smaller sizes, and doesn’t seem to particularly break or accelerate the trend…
Close-to-optimal performance on these benchmarks seems like it’s at least ~3 orders of magnitude compute away (costing around $1b at current prices). This means that I’d be somewhat surprised if a 100× scaling brought us there immediately; but another 100× scaling after that might do it (for reference, a 10,000× increase in compute would correspond to a bit more than 100× increase in size, which is the difference between GPT-2 and GPT-3). If we kept scaling these models naively, I’d think it’s more likely than not that we’d get there after increasing the training FLOP by ~5–6 orders of magnitude (costing $100b–$1t at current prices).
Taking into account both software improvements and potential bottlenecks like data, I’d be inclined to update that downwards, maybe an order of magnitude or so (for a total cost of ~$10–100b). Given hardware improvements in the next 5–10 years, I would expect that to fall further to ~$1–10b.
I think this would be more than sufficient for automating the tasks mentioned above—though rolling out changes in practice could still take years. (Note that some of these tasks could be automated with today’s model sizes, already, if sufficient engineering work was spent to fine-tune them properly. I’m making the claim that automation will quite easily be doable by this point, if it hasn’t already been done.)
Assuming that hardware and algorithmic progress have reduced the cost of inference by at least 10×, this will cost less than 1 cent per word.
I think this would probably not be enough to automate the majority of human economic activity or otherwise completely transform society (but I think we should be investing substantial resources in preparing for that eventuality).
If I adopt the framework from Ajeya Cotra’s draft report—where a model with the right number of parameters can become ~human-equivalent at tasks with a certain horizon length if trained on the right number of data points of that horizon length—I’m inclined to treat these extrapolations as a guess for how many parameters will be required for ~human-equivalence. Given that Cotra’s model’s median number of parameters is close to my best guess of where near-optimal performance is achieved, the extrapolations do not contradict the model’s estimates, and constitute some evidence for the median being roughly right.
“CPM: A Large-scale Generative Chinese Pre-trained Language Model”, Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun (2020-12-01):
Pre-trained Language Models (PLMs) have proven to be beneficial for various downstream NLP tasks. Recently, GPT-3, with 175 billion parameters and 570GB training data, drew a lot of attention due to the capacity of few-shot (even zero-shot) learning. However, applying GPT-3 to address Chinese NLP tasks is still challenging, as the training corpus of GPT-3 is primarily English, and the parameters are not publicly available. In this technical report, we release the Chinese Pre-trained Language Model (CPM) with generative pre-training on large-scale Chinese training data. To the best of our knowledge, CPM, with 2.6 billion parameters and 100GB Chinese training data, is the largest Chinese pre-trained language model, which could facilitate several downstream Chinese NLP tasks, such as conversation, essay generation, cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many NLP tasks in the settings of few-shot (even zero-shot) learning. The code and parameters are available.
Background: Many human diseases are known to have a genetic contribution. While genome-wide studies have identified many disease-associated loci, it remains challenging to elucidate causal genes. In contrast, exome sequencing provides an opportunity to identify new disease genes and large-effect variants of clinical relevance. We therefore sought to determine the contribution of rare genetic variation in a curated set of human diseases and traits using a unique resource of 200,000 individuals with exome sequencing data from the UK Biobank.
Methods and Results: We included 199,832 participants with a mean age of 68 at follow-up. Exome-wide gene-based tests were performed for 64 diseases and 23 quantitative traits using a mixed-effects model, testing rare loss-of-function and damaging missense variants. We identified 51 known and 23 novel associations with 26 diseases and traits at a false-discovery-rate of 1%. There was a striking risk associated with many Mendelian disease genes including: MYBPC3 with over a 100-fold increased odds of hypertrophic cardiomyopathy, PKD1 with a greater than 25-fold increased odds of chronic kidney disease, and BRCA2, BRCA1, ATM and PALB2 with 3 to 10-fold increased odds of breast cancer. Notable novel findings included an association between GIGYF1 and type 2 diabetes (OR 5.6, p = 5.35×10^−8), elevated blood glucose, and lower insulin-like-growth-factor-1 levels. Rare variants in CCAR2 were also associated with diabetes risk (OR 13, p = 8.5×10^−8), while COL9A3 was associated with cataract (OR 3.4, p = 6.7×10^−8). Notable associations for blood lipids and hypercholesterolemia included NR1H3, RRBP1, GIGYF1, SCGN, APH1A, PDE3B and ANGPTL8. A number of novel genes were associated with height, including DTL, PIEZO1, SCUBE3, PAPPA and ADAMTS6, while BSN was associated with body-mass-index. We further assessed putatively pathogenic variants in known Mendelian cardiovascular disease genes and found that between 1.3 and 2.3% of the population carried likely pathogenic variants in known cardiomyopathy, arrhythmia or hypercholesterolemia genes.
Conclusions: Large-scale population sequencing identifies known and novel genes harboring high-impact variation for human traits and diseases. A number of novel findings, including GIGYF1, represent interesting potential therapeutic targets. Exome sequencing at scale can identify a meaningful proportion of the population that carries a pathogenic variant underlying cardiovascular disease.
Here we review the motivation for creating the enhancing neuroimaging genetics through meta-analysis (ENIGMA) Consortium and the genetic analyses undertaken by the consortium so far. We discuss the methodological challenges, findings, and future directions of the genetics working group. A major goal of the working group is tackling the reproducibility crisis affecting “candidate gene” and genome-wide association analyses in neuroimaging. To address this, we developed harmonized analytic methods, and support their use in coordinated analyses across sites worldwide, which also makes it possible to understand heterogeneity in results across sites. These efforts have resulted in the identification of hundreds of common genomic loci robustly associated with brain structure. We have found both pleiotropic and specific genetic effects associated with brain structures, as well as genetic correlations with psychiatric and neurological diseases.
Why Did We Build The Enigma Consortium? The consortium was formed in 2009, largely in response to the growing evidence of a lack of reproducibility dubbed “the replication crisis” in imaging genetics. At this time, the first major works of the Psychiatric Genomics Consortium were being presented at conferences (Neale et al 2010; The Schizophrenia Psychiatric Genome-Wide Association Study [GWAS] Consortium, 2011a, 2011b), and we had observed the improvement in statistical power and increase in reproducibility that could be achieved through large-scale meta-analysis. In late 2009, we were beginning to see a series of GWAS publications using phenotypes derived from magnetic resonance imaging (MRI) attempting to answer complex and important questions in psychiatry and neurology. At that time, it was common to see GWAS papers reporting not only main effect analyses but also interactions with diagnosis or putative risk variables in sample sizes of less than 1,000 people.
…In response to these issues, Thompson and Martin sent an email to neuro-imaging groups around the world asking for interest in being part of a collaborative meta-analysis consortium focusing on imaging genetics. The key points in this email were that, although every group would understandably want to publish its own paper reporting their own findings, (a) the power calculations do not change just because the phenotype acquisition is expensive, (b) it was likely that the individual studies would not be large enough to find significant genetic effects, and (c) even if they did, it would still be necessary to replicate these findings in independent samples. From these beginnings, the ENIGMA consortium now involves more than 2,000 scientists from over 400 institutions in more than 40 countries.
The speed, expense and throughput of genomic sequencing impose limitations on its use for time-sensitive acute cases, such as rare or antibiotic resistant infections, and large-scale testing that is necessary for containing COVID-19 outbreaks using source-tracing. The major bottleneck for increasing the bandwidth and decreasing operating costs of next-generation sequencers (NGS) is the flow cell that supplies reagents for the biochemical processes; this subsystem has not significantly improved since 2005.
Here we report a new method for sourcing reagents based on surface coating technology (SCT): the DNA adhered onto the biochip is directly contacted by a reagent-coated polymeric strip. Compared with flow cells the reagent layers are an order of magnitude thinner while both the reagent exchange rate and biochip area are orders of magnitude greater. These improvements drop the turn-around time from days to twelve hours and the cost for whole genome sequencing (WGS) from about $1000 to $15, as well as increase data production by several orders of magnitude.
This makes NGS more affordable than many blood tests while rapidly providing detailed genomic information about microbial and viral pathogens, cancers and genetic disorders for targeted treatments and personalized medicine. This data can be pooled in population-wide databases for accelerated research and development as well as providing detailed real-time data for tracking and containing outbreaks, such as the current COVID-pandemic.
BGI, currently known as the BGI Group, formerly known as the Beijing Genomics Institute, is a Chinese genome sequencing company, headquartered in Shenzhen, Guangdong, China.
Low-coverage whole genome sequencing followed by imputation has been proposed as a cost-effective genotyping approach for disease and population genetics studies. However, its competitiveness against SNP arrays is undermined as current imputation methods are computationally expensive and unable to leverage large reference panels.
Here, we describe a method, GLIMPSE, for phasing and imputation of low-coverage sequencing datasets from modern reference panels. We demonstrate its remarkable performance across different coverages and human populations. It achieves imputation of a full genome for less than $1, outperforming existing methods by orders of magnitude, with an increased accuracy of more than 20% at rare variants. We also show that 1x coverage enables effective association studies and is better suited than dense SNP arrays to assess the impact of rare variations. Overall, this study demonstrates the promising potential of low-coverage imputation and suggests a paradigm shift in the design of future genomic studies.
“Reprogramming to recover youthful epigenetic information and restore vision”, Yuancheng Lu, Benedikt Brommer, Xiao Tian, Anitha Krishnan, Margarita Meer, Chen Wang, Daniel L. Vera, Qiurui Zeng, Doudou Yu, Michael S. Bonkowski, Jae-Hyun Yang, Songlin Zhou, Emma M. Hoffmann, Margarete M. Karg, Michael B. Schultz, Alice E. Kane, Noah Davidsohn, Ekaterina Korobkina, Karolina Chwalek, Luis A. Rajman, George M. Church, Konrad Hochedlinger, Vadim N. Gladyshev, Steve Horvath, Morgan E. Levine, Meredith S. Gregory-Ksander, Bruce R. Ksander, Zhigang He, David A. Sinclair (2020-12-02):
Ageing is a degenerative process that leads to tissue dysfunction and death. A proposed cause of ageing is the accumulation of epigenetic noise that disrupts gene expression patterns, leading to decreases in tissue function and regenerative capacity. Changes to DNA methylation patterns over time form the basis of ageing clocks, but whether older individuals retain the information needed to restore these patterns—and, if so, whether this could improve tissue function—is not known.
Over time, the central nervous system (CNS) loses function and regenerative capacity. Using the eye as a model CNS tissue, here we show that ectopic expression of Oct4 (also known as Pou5f1), Sox2 and Klf4 genes (OSK) in mouse retinal ganglion cells restores youthful DNA methylation patterns and transcriptomes, promotes axon regeneration after injury, and reverses vision loss in a mouse model of glaucoma and in aged mice. The beneficial effects of OSK-induced reprogramming in axon regeneration and vision require the DNA demethylases TET1 and TET2.
These data indicate that mammalian tissues retain a record of youthful epigenetic information—encoded in part by DNA methylation—that can be accessed to improve tissue function and promote regeneration in vivo.
Ageing has negative consequences for all the cells and organs in our bodies. Our brains are no exception. Neurons in the developing brain form circuits that can adapt to change and regenerate in response to injury. These capacities have long been known to diminish over time, but the molecular shifts that underlie this deterioration have remained mysterious. Lu et al 2020 show in a paper in Nature that neurons of the eye can be programmed to revert to a youthful state in which they reacquire their ability to resist injury and to regenerate. The authors’ findings shed light on mechanisms of ageing and point to a potent therapeutic target for age-related neuronal diseases.
…Lu et al. asked whether it is possible to revert RGCs to a younger ‘age’, and whether doing so would allow the cells to regenerate. They infected RGCs in mice with adeno-associated viruses. These harmless viruses had been genetically engineered to induce expression of three of the ‘Yamanaka factors’—a group of four transcription factors (Oct4, Sox2, Klf4 and c-Myc) that can trigger mature cell types to adopt an immature state6. Such an approach normally comes with hazards in vivo: Yamanaka factors can cause cells to adopt unwanted new identities and characteristics, leading to tumours or death7. Fortunately, Lu and co-workers found that they could circumvent these hazards by expressing just Oct4, Sox2 and Klf4 (together called OSK).
The authors tested the infected RGCs’ ability to regenerate if the cells’ axons were crushed. They found that the OSK-expressing viruses triggered RGC regeneration and long-distance axon extension following damage to the optic nerve (Fig. 1), with no apparent alterations to RGC identity, formation of retinal tumours or any other ill effects. OSK expression had beneficial effects on RGC axon regeneration in both young and aged mice. In some cases, the regenerated axons extended all the way from the eye to the optic chiasm (the location at the base of the brain at which the optic nerves from each eye cross to the opposite brain hemisphere). It is notable that the effects of OSK are seen in older animals, because studies of RGC regeneration are often conducted in relatively young animals, which have a residual natural regenerative ability. Thus, the evidence suggests that Lu and colleagues’ approach can fully restore long-distance regenerative capacity in mature RGCs—a milestone for the field.
…Why might reprogramming old RGCs to a younger state promote regeneration and restore vision? An emerging model in the field of ageing is that, over time, cells accumulate epigenetic noise—molecular changes that alter patterns of gene expression, including transcriptional changes and shifts in the patterns of methyl groups on DNA. Collectively, these changes cause cells to lose their identity and so to lose the DNA-expression, RNA-expression and protein-expression patterns that once promoted their youthful resilience. Given the growing excitement about DNA methylation as a marker of cell age, the authors asked whether OSK expression somehow counteracts the negative effects of ageing or axon injury on DNA methylation.
Aging is a degenerative process leading to tissue dysfunction and death. A proposed cause of aging is the accumulation of epigenetic noise, which disrupts youthful gene expression patterns that are required for cells to function optimally and recover from damage. Changes to DNA methylation patterns over time form the basis of ‘aging clocks’, but whether old individuals retain information to reset the clocks and, if so, whether it would improve tissue function is not known. Of all the tissues in the body, the central nervous system (CNS) is one of the first to lose regenerative capacity. Using the eye as a model tissue, we show that expression of Oct4, Sox2, and Klf4 genes (OSK) in mice resets youthful gene expression patterns and the DNA methylation age of retinal ganglion cells, promotes axon regeneration after optic nerve crush injury, and restores vision in a mouse model of glaucoma and in normal aged mice. This process, which we call the reversal of information loss via epigenetic reprogramming or REVIVER, requires non-global, active DNA demethylation by TET enzymes and the downstream enzyme TDG, indicating that alterations in DNA methylation patterns may not simply indicate age, but participate in aging. Thus, old tissues retain a faithful record of youthful epigenetic information that can be accessed for functional age reversal.
“Reconstitution of the oocyte transcriptional network with transcription factors”, Nobuhiko Hamazaki, Hirohisa Kyogoku, Hiromitsu Araki, Fumihito Miura, Chisako Horikawa, Norio Hamada, So Shimamoto, Orie Hikabe, Kinichi Nakashima, Tomoya S. Kitajima, Takashi Ito, Harry G. Leitch, Katsuhiko Hayashi (2020-12-16):
During female germline development, oocytes become a highly specialized cell type and form a maternal cytoplasmic store of crucial factors. Oocyte growth is triggered at the transition from primordial to primary follicle and is accompanied by dynamic changes in gene expression1, but the gene regulatory network that controls oocyte growth remains unknown. Here we identify a set of transcription factors that are sufficient to trigger oocyte growth. By investigation of the changes in gene expression and functional screening using an in vitro mouse oocyte development system, we identified eight transcription factors, each of which was essential for the transition from primordial to primary follicle. Notably, enforced expression of these transcription factors swiftly converted pluripotent stem cells into oocyte-like cells that were competent for fertilization and subsequent cleavage. These transcription-factor-induced oocyte-like cells were formed without specification of primordial germ cells, epigenetic reprogramming or meiosis, and demonstrate that oocyte growth and lineage-specific de novo DNA methylation are separable from the preceding epigenetic reprogramming in primordial germ cells. This study identifies a core set of transcription factors for orchestrating oocyte growth, and provides an alternative source of ooplasm, which is a unique material for reproductive biology and medicine.
“This demonstrates that you can go directly from stem cells to oocytes. I think that is exciting,” Petra Hajkova, a developmental epigeneticist at Imperial College London who was not involved in the study, tells The Scientist. The work, she notes, will help researchers explore the basic biology of oocyte development. In the future, says study coauthor Nobuhiko Hamazaki of Kyushu University, the research could aid in cloning endangered animals or helping women with mitochondrial diseases to have healthy children.
…“It’s believed that oocytes develop from germ cells, but we could make oocytes from non-germ cells,” he explains. “At first, I was so surprised that I could not believe my results, so I repeated the experiment again and again and when I got the same results, I was finally convinced.”…“I was initially in complete disbelief to see mouse stem cells so quickly and easily take the form of oocytes based on introducing just a handful of factors, but repeated experiments proved it was true,” said Nobuhiko Hamazaki, PhD, first author on the study reporting the results and assistant professor at Kyushu University at the time of the research. “To find that eight transcription factors could lead to such big changes was quite astonishing.”
…Richard Schultz, a cell biologist at the University of California, Davis, who was not involved in the study, says the work to identify the core set of transcription factors that can drive embryonic stem cells into a state where they look like oocytes is impressive. But the egglike cells don’t undergo meiosis, so they are not functional. “It’s a big step, but only 95% there. We haven’t gotten 100% there” to understanding the factors essential for maturation of germline egg cells to oocytes and then to viable eggs with half their chromosomes. Despite not working out the pathway to meiosis, the work “enabled us to produce a large number of oocytes. We believe that this technology can accelerate basic biological research on oocytes, which are still one of the most mysterious cell types,” Hamazaki says. He explains that the work could improve animal cloning because of the vast number of oocytes produced by the team’s technique…“Cytoplasm from oocytes is an invaluable resource in reproductive biology and medicine, and this method could provide a novel tool for producing large amounts of it without any invasive procedures,” commented Hayashi. “While the processes could still be much more complex for humans, these initial results in mice are very promising.”
This report examines public perceptions of biotechnology, evolution and the relationship between science and religion. Data in this report come from a survey conducted in 20 publics from October 2019 to March 2020 across Europe, Russia, the Americas and the Asia-Pacific region. Surveys were conducted by face-to-face interview in Russia, Poland, the Czech Republic, India and Brazil. In all other places, the surveys were conducted by telephone. All surveys were conducted with representative samples of adults ages 18 and older in each survey public.
…A 20-public median of 63% say scientific research on gene editing is a misuse—rather than an appropriate use—of technology, according to the survey fielded in publics across Europe, the Asia-Pacific region, the United States, Canada, Brazil and Russia.
However, views on specific instances where gene editing might be used highlight the complex and contextual nature of public attitudes. Majorities say it would be appropriate to change a baby’s genetic characteristics to treat a serious disease the baby would have at birth (median of 70%), and somewhat smaller shares, though still about half or more, say using these techniques to reduce the risk of a serious disease that could occur over the course of the baby’s lifetime would be appropriate (60%). But a median of just 14% say it would be appropriate to change a baby’s genetic characteristics to make the baby more intelligent. A far larger share (median of 82%) would consider this to be a misuse of technology.
Our task is simple: we will consider whether the rate of scientific progress has slowed down, and more generally what we know about the rate of scientific progress, based on these literatures and other metrics we have been investigating. This investigation will take the form of a conceptual survey of the available data. We will consider which measures are out there, what they show, and how we should best interpret them, to attempt to create the most comprehensive and wide-ranging survey of metrics for the progress of science. In particular, we integrate a number of strands in the productivity growth literature, the “science of science” literature, and various historical literatures on the nature of human progress.
…To sum up the basic conclusions of this paper, there is good and also wide-ranging evidence that the rate of scientific progress has indeed slowed down. In the disparate and partially independent areas of productivity growth, total factor productivity, GDP growth, patent measures, researcher productivity, crop yields, life expectancy, and Moore’s Law, we have found support for this claim.
One implication here is we should not be especially optimistic about the productivity slowdown, as that notion is commonly understood, ending any time soon. There is some lag between scientific progress and practical outputs, and with science at less than its maximum dynamic state, one might not expect future productivity to fare so well either. Under one more specific interpretation of the data, a new General Purpose Technology might be required to kickstart economic growth once again.
We abstract the concept of a randomized controlled trial (RCT) as a triple (β, b, s), where β is the primary efficacy parameter, b the estimate and s the standard error (s > 0). The parameter β is either a difference of means, a log odds ratio or a log hazard ratio. If we assume that b is unbiased and normally distributed, then we can estimate the full joint distribution of (β, b, s) from a sample of pairs (b_i, s_i).
We have collected 23,747 such pairs from the Cochrane Database of Systematic Reviews to do so. Here, we report the estimated distribution of the signal-to-noise ratio β⁄s and the achieved power. We estimate the median achieved power to be 0.13. We also consider the exaggeration ratio which is the factor by which the magnitude of β is overestimated. We find that if the estimate is just significant at the 5% level, we would expect it to overestimate the true effect by a factor of 1.7.
This exaggeration is sometimes referred to as the winner’s curse and it is undoubtedly to a considerable extent responsible for disappointing replication results. For this reason, we believe it is important to shrink the unbiased estimator, and we propose a method for doing so.
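A rough Monte Carlo sketch of the winner’s-curse effect described above; this is not the paper’s Cochrane-based estimation, and the prior over true effects, the standard error, and the “just significant” window are illustrative assumptions, but it shows how conditioning on bare significance inflates low-powered estimates:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative assumption (not the paper's fitted SNR distribution): true
# effects beta drawn around zero with a fixed standard error s.
n = 1_000_000
s = 1.0
beta = rng.normal(0.0, 0.5 * s, size=n)   # assumed prior over true effects
b = rng.normal(beta, s)                   # unbiased, normally distributed estimates

z = np.abs(b) / s
just_sig = (z > 1.96) & (z < 2.2)         # "just significant" at the 5% level

# Exaggeration ratio: how much just-significant estimates overstate |beta|.
exaggeration = np.abs(b[just_sig]) / np.abs(beta[just_sig])
print("median exaggeration among just-significant results:",
      round(float(np.median(exaggeration)), 2))

# Achieved power of a two-sided 5% test, given each true effect.
power = norm.sf(1.96 - beta / s) + norm.cdf(-1.96 - beta / s)
print("median achieved power under this assumed prior:",
      round(float(np.median(power)), 2))
```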
The Busy Beaver function, with its incomprehensibly rapid growth, has captivated generations of computer scientists, mathematicians, and hobbyists. In this survey, I offer a personal view of the BB function 58 years after its introduction, emphasizing lesser-known insights, recent progress, and especially favorite open problems.
Examples of such problems include: when does the BB function first exceed the Ackermann function? Is the value of BB(20) independent of set theory? Can we prove that BB(n + 1) > 2^BB(n) for large enough n? Given BB(n), how many advice bits are needed to compute BB(n + 1)? Do all Busy Beavers halt on all inputs, not just the 0 input? Is it decidable, given n, whether BB(n) is even or odd?
Scott Joel Aaronson is an American theoretical computer scientist and David J. Bruton Jr. Centennial Professor of Computer Science at the University of Texas at Austin. His primary areas of research are quantum computing and computational complexity theory.
“In math, there is a very permeable boundary between what’s an amusing recreation and what is actually important,” said Scott Aaronson, a theoretical computer scientist at the University of Texas, Austin who recently published a survey of progress in “BusyBeaverology.” The recent work suggests that the search for long-running computer programs can illuminate the state of mathematical knowledge, and even tell us what’s knowable. According to researchers, the busy beaver game provides a concrete benchmark for evaluating the difficulty of certain problems, such as the unsolved Goldbach conjecture and Riemann hypothesis. It even offers a glimpse of where the logical bedrock underlying math breaks down. The logician Kurt Gödel proved the existence of such mathematical terra incognita nearly a century ago. But the busy beaver game can show where it actually lies on a number line, like an ancient map depicting the edge of the world.
…For instance, if you’re only allowed one rule, and you want to ensure that the Turing machine halts, you’re forced to include the halt instruction right away. The busy beaver number of a one-rule machine, or BB(1), is therefore 1. But adding just a few more rules instantly blows up the number of machines to consider. Of 6,561 possible machines with two rules, the one that runs the longest—six steps—before halting is the busy beaver. But some others simply run forever. None of these are the busy beaver, but how do you definitively rule them out? Turing proved that there’s no way to automatically tell whether a machine that runs for a thousand or a million steps won’t eventually terminate.
That’s why finding busy beavers is so hard. There’s no general approach for identifying the longest-running Turing machines with an arbitrary number of instructions; you have to puzzle out the specifics of each case on its own. In other words, the busy beaver game is, in general, “uncomputable.” Proving that BB(2) = 6 and that BB(3) = 21 was difficult enough that Radó’s student Shen Lin earned a doctorate for the work in 1965. Radó considered BB(4) “entirely hopeless,” but the case was finally solved in 1983. Beyond that, the values virtually explode; researchers have identified a five-rule Turing machine, for instance, that runs for 47,176,870 steps before stopping, so BB(5) is at least that big. BB(6) is at least 7.4 × 10^36,534. Proving the exact values “will need new ideas and new insights, if it can be done at all,” said Aaronson.
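For concreteness, here is a small Turing-machine runner that replays a commonly cited 2-state, 2-symbol champion machine; the transition table below is the standard textbook example, and the step cap only guards against non-halting machines, which, as noted above, cannot be detected automatically in general:

```python
# Minimal Turing-machine runner; replays a standard 2-state, 2-symbol
# busy-beaver champion, which halts after 6 steps with 4 ones on the tape.
from collections import defaultdict

# (state, symbol) -> (write, move, next_state); 'H' means halt.
BB2 = {
    ('A', 0): (1, +1, 'B'),
    ('A', 1): (1, -1, 'B'),
    ('B', 0): (1, -1, 'A'),
    ('B', 1): (1, +1, 'H'),
}

def run(machine, start='A', max_steps=10_000):
    tape = defaultdict(int)            # two-way infinite blank tape of 0s
    pos, state, steps = 0, start, 0
    while state != 'H' and steps < max_steps:
        write, move, state = machine[(state, tape[pos])]
        tape[pos] = write
        pos += move
        steps += 1
    return steps, sum(tape.values())

steps, ones = run(BB2)
print(f"halted after {steps} steps with {ones} ones")   # 6 steps, 4 ones
```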
…The Goldbach conjecture, for instance, asks whether every even integer greater than 2 is the sum of two primes. Proving the conjecture true or false would be an epochal event in number theory, allowing mathematicians to better understand the distribution of prime numbers. In 2015, an anonymous GitHub user named Code Golf Addict published code for a 27-rule Turing machine that halts if—and only if—the Goldbach conjecture is false. It works by counting upward through all even integers greater than 4; for each one, it grinds through all the possible ways to get that integer by adding two others, checking whether the pair is prime. When it finds a suitable pair of primes, it moves up to the next even integer and repeats the process. If it finds an even integer that can’t be summed by a pair of prime numbers, it halts. Running this mindless machine isn’t a practical way to solve the conjecture, because we can’t know if it will ever halt until it does. But the busy beaver game sheds some light on the problem. If it were possible to compute BB(27), that would provide a ceiling on how long we’d have to wait for the Goldbach conjecture to be settled automatically. That’s because BB(27) corresponds to the maximum number of steps this 27-rule Turing machine would have to execute in order to halt (if it ever did). If we knew that number, we could run the Turing machine for exactly that many steps. If it halted by that point, we’d know the Goldbach conjecture was false. But if it went that many steps and didn’t halt, we’d know for certain that it never would—thus proving the conjecture true…In 2016, Aaronson established a similar result in collaboration with Yuri Matiyasevich and Stefan O’Rear. They identified a 744-rule Turing machine that halts if and only if the Riemann hypothesis is false
…In 2016, he and his graduate student Adam Yedidia specified a 7,910-rule Turing machine that would only halt if ZF set theory is inconsistent. This means BB(7,910) is a calculation that eludes the axioms of ZF set theory. Those axioms can’t be used to prove that BB(7,910) represents one number instead of another, which is like not being able to prove that 2 + 2 = 4 instead of 5…“So much of math can be encoded as a question of, ‘Does this Turing machine halt or not?’” Aaronson said. “If you knew all the busy beaver numbers, then you could settle all of those questions.”
What if your dataset doesn’t fit in RAM? I will present the algorithm I use for shuffling large datasets. It isn’t novel, and one can find multiple instances of people reinventing it or something similar (and in essence it descends from Rao). However, I don’t know of anywhere that states the algorithm, shows why it’s correct, and gets into the particular practical issues we address below.
A 2-pass shuffle algorithm: Suppose we have data x_0, …, x_(n-1). Choose an M sufficiently large that a set of n⁄M points can be shuffled in RAM using something like Fisher–Yates, but small enough that you can have M open files for writing (with decent buffering). Create M “piles” p_0, …, p_(M-1) that we can write data to. The mental model of a “pile” here is that it’s a file you can append to, but in practice you might, say, have several piles exist as datasets in the same HDF5 file. The first pass of the algorithm is to split the data into these M piles, and the second pass shuffles each pile and appends it to the final result.
// First pass
create empty piles p[0], ..., p[M-1]
for i = 0, ..., n-1 do
    j := uniform random draw from [0, ..., M-1]
    append x[i] to pile p[j]

// Second pass (perhaps done lazily)
for j = 0, ..., M-1 do
    shuffle p[j] in RAM with Fisher–Yates // or whatever is convenient
    append p[j] to output file
Example of a shuffle: We start with unshuffled data (top); the first pass leaves M=6 piles (middle); the second pass yields shuffled data (bottom).
Assuming you have enough memory to satisfy the above constraint on M and assuming that drawing a random number is O(1), this is a linear time algorithm; the constant factor is dominated by having to read and write each data point twice in external storage (but the reading/writing can be done in blocks rather than one point at a time). Since the reading and writing is stream-oriented, the algorithm still works for data with variable record length.
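A minimal runnable sketch of the 2-pass shuffle above, using one temporary file per pile and newline-delimited records; the pile count, file layout, and function name are illustrative choices rather than the post’s actual implementation:

```python
import os
import random
import tempfile

def two_pass_shuffle(input_path, output_path, n_piles=100, seed=0):
    """Shuffle a large line-oriented file that may not fit in RAM.

    First pass: scatter records uniformly at random into n_piles temporary
    files. Second pass: load each pile (assumed to fit in RAM), shuffle it
    with random.shuffle (a Fisher-Yates shuffle), and append it to the output.
    """
    rng = random.Random(seed)
    tmpdir = tempfile.mkdtemp()
    piles = [open(os.path.join(tmpdir, f"pile_{j}"), "w") for j in range(n_piles)]

    with open(input_path) as f:           # first pass: scatter
        for line in f:
            rng.choice(piles).write(line)
    for p in piles:
        p.close()

    with open(output_path, "w") as out:   # second pass: shuffle each pile
        for j in range(n_piles):
            path = os.path.join(tmpdir, f"pile_{j}")
            with open(path) as p:
                records = p.readlines()
            rng.shuffle(records)
            out.writelines(records)
            os.remove(path)
    os.rmdir(tmpdir)
```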
…Appendix: Performance comparison: The 2-pass shuffle seemed so obviously better than random access into a file that I hadn’t bothered to measure how much faster it actually is. One approach works, the other doesn’t, what’s there to measure? But the post was met with a lot of skepticism about whether it is faster at all, apparently on the basis that the 2-pass algorithm has an extra read/write and SSDs are fast. (Even with uncompressed data on local SSDs, sequential traversals are 48 times as fast as random access traversals for my data.) So I measured the difference and found that, for my data and how it is stored, the 2-pass approach is 1000 times as fast as random access (and that’s before incorporating further improvements to the 2-pass approach that are done in practice, which are to parallelize the first pass and integrate it with the data preprocessing). If this sounds too good to be true, bear in mind that this is not a comparison to some highly-regarded practice; it is a comparison to a bad idea, like quicksort against bubblesort.
A general method is given for generating random permutations of integers using a table of random sampling numbers and without wasting the random numbers read. This is more convenient in practice, especially when random permutations of large numbers of elements are needed. It is suggested that even for permutations of small numbers, the method offers greater scope than consulting a table of a limited number of random permutations. [See also Sandelius 1962, hence the description of this as “Rao-Sandelius shuffling”.]
The paper describes a randomization procedure consisting in distributing a deck of cards into 10 decks using random decimal digits and repeating this step with each deck consisting of three or more cards. One random digit is used for randomizing a deck of two cards. This procedure, which is essentially a particular case of a general procedure described by Rao 1961, is called the “multistage randomization procedure”, or MRP. Some applications are described. A recursive formula is given for the expected number of random digits required by MRP for the randomization of n symbols. A measure of the efficiency of a randomization procedure is presented. The efficiency of MRP is compared with the efficiencies of two other randomization procedures, and it is proved that MRP has an asymptotic efficiency of 100%.
In computing, external memory algorithms or out-of-core algorithms are algorithms that are designed to process data that are too large to fit into a computer's main memory at once. Such algorithms must be optimized to efficiently fetch and access data stored in slow bulk memory such as hard drives or tape drives, or when memory is on a computer network. External memory algorithms are analyzed in the external memory model.
The Fisher–Yates shuffle is an algorithm for generating a random permutation of a finite sequence—in plain terms, the algorithm shuffles the sequence. The algorithm effectively puts all the elements into a hat; it continually determines the next element by randomly drawing an element from the hat until no elements remain. The algorithm produces an unbiased permutation: every permutation is equally likely. The modern version of the algorithm is efficient: it takes time proportional to the number of items being shuffled and shuffles them in place.
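A minimal sketch of the modern in-place variant described above:

```python
import random

def fisher_yates(items, rng=random):
    """In-place Fisher-Yates shuffle: O(n) time, every permutation equally likely."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)            # draw an element from the "hat" [0, i]
        items[i], items[j] = items[j], items[i]
    return items

print(fisher_yates(list(range(10))))
```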
Society is becoming increasingly dependent on survey research. However, surveys can be impacted by participants who are non-attentive, respond randomly to survey questions, and misrepresent who they are and their true attitudes. The impact that such respondents can have on public health research has rarely been systematically examined.
In this study we examine whether Americans began to engage in dangerous cleaning practices to avoid COVID-19 infection. Prior reports have suggested that people began to engage in highly dangerous cleaning practices during the COVID-19 pandemic, including ingesting household cleansers such as bleach. In a series of studies totaling close to 1400 respondents, we show that 80–90% of reports of household cleanser ingestion are made by problematic respondents. These respondents report impossible claims such as ‘recently having had a fatal heart attack’ and ‘eating concrete for its iron content’ at a similar rate to ingesting household cleaners. Additionally, respondents’ frequent misreading or misinterpreting the intent of questions accounted for the rest of such claims.
Once inattentive, mischievous, and careless respondents are taken out of the analytic sample we find no evidence that people ingest cleansers. The relationship between dangerous cleaning practices and health outcomes also becomes non-significant once problematic respondents are taken out of the analytic sample. These results show that reported ingestion of household cleaners and other similar dangerous practices are an artifact of problematic respondent bias.
The implications of these findings for public health and medical survey research, as well as best practices for avoiding problematic respondents in surveys are discussed.
Brief review of Scott Alexander’s “lizardman constant”: human survey-takers will, with >0% probability, endorse the most absurd items on a survey, for a mix of reasons like laziness, boredom, humor, sabotage, ignorance, and stupidity. For example, 4% of respondents may endorse the claim ‘lizard-people rule the earth’, 5% of atheists believe in God, and so on. This cautions us against taking survey results about extremely unusual people or traits too literally, or expecting perfectly accurate results, as given the lizardman constant and other crud factors, it is entirely possible that some or all of the outliers may just be the lizardman constant at work.
In this paper, we consider the task of digitally voicing silent speech, where silently mouthed words are converted to audible speech based on electromyography (EMG) sensor measurements that capture muscle impulses. While prior work has focused on training speech synthesis models from EMG collected during vocalized speech, we are the first to train from EMG collected during silently articulated speech. We introduce a method of training on silent EMG by transferring audio targets from vocalized to silent signals. Our method greatly improves intelligibility of audio generated from silent EMG compared to a baseline that only trains with vocalized data, decreasing transcription word error rate from 64% to 4% in one data condition and 88% to 68% in another. To spur further development on this task, we share our new dataset of silent and vocalized facial EMG measurements.
Speech neuroprosthetics aim to provide a natural communication channel to individuals who are unable to speak due to physical or neurological impairments. Real-time synthesis of acoustic speech directly from measured neural activity could enable natural conversations and significantly improve quality of life, particularly for individuals who have severely limited means of communication. Recent advances in decoding approaches have led to high quality reconstructions of acoustic speech from invasively measured neural activity. However, most prior research utilizes data collected during open-loop experiments of articulated speech, which neglects the critical human-in-the-loop aspect of a practical speech neuroprosthetic.
Here we present an approach that synthesizes audible speech in real-time for both imagined and whispered speech conditions. Using a participant implanted with stereotactic depth electrodes, we were able to reliably generate audible speech in real-time. The decoding models rely predominately on frontal activity suggesting that speech processes have similar representations when vocalized, whispered, or imagined. Our real-time synthesis approach represents an essential step towards investigating how patients will learn to operate a closed-loop speech neuroprosthesis, as well as the development of techniques that incorporate co-adaptation of the user and system for optimized performance.
“Thinking ahead: prediction in context as a keystone of language in humans and machines”, Ariel Goldstein, Zaid Zada, Eliav Buchnik, Mariano Schain, Amy Price, Bobbi Aubrey, Samuel A. Nastase, Amir Feder, Dotan Emanuel, Alon Cohen, Aren Jansen, Harshvardhan Gazula, Gina Choe, Aditi Rao, Catherine Kim, Colton Casto, Fanda Lora, Adeen Flinker, Sasha Devore, Werner Doyle, Patricia Dugan, Daniel Friedman, Avinatan Hassidim, Michael Brenner, Yossi Matias, Ken A. Norman, Orrin Devinsky, Uri Hasson (2020-12-03):
Departing from classical rule-based linguistic models, advances in deep learning have led to the development of a new family of self-supervised deep language models (DLMs). These models are trained using a simple self-supervised autoregressive objective, which aims to predict the next word in the context of preceding words in real-life corpora. After training, autoregressive DLMs are able to generate new "context-aware" sentences with appropriate syntax and convincing semantics and pragmatics.
Here we provide empirical evidence for the deep connection between autoregressive DLMs and the human language faculty using a 30-min spoken narrative and electrocorticographic (ECoG) recordings. Behaviorally, we demonstrate that humans have a remarkable capacity for word prediction in natural contexts, and that, given a sufficient context window, DLMs can attain human-level prediction performance. Next, we leverage DLM embeddings to demonstrate that many electrodes spontaneously predict the meaning of upcoming words, even hundreds of milliseconds before they are perceived. Finally, we demonstrate that contextual embeddings derived from autoregressive DLMs capture neural representations of the unique, context-specific meaning of words in the narrative.
Our findings suggest that deep language models provide an important step toward creating a biologically feasible computational framework for generative language.
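A toy sketch of the self-supervised autoregressive objective described above (score each next word given its left context); the tiny corpus and the random bigram logit table standing in for a deep network are illustrative assumptions:

```python
import numpy as np

# Toy corpus and vocabulary; real DLMs use huge corpora and deep networks.
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
ids = np.array([vocab.index(t) for t in tokens])
V = len(vocab)

# Illustrative stand-in for a language model: a table of logits mapping the
# previous token to a distribution over the next token (a bigram model).
rng = np.random.default_rng(0)
logits_table = rng.normal(size=(V, V))

def autoregressive_nll(ids, logits_table):
    """Average negative log-likelihood of each token given its left context."""
    nll = 0.0
    for prev, nxt in zip(ids[:-1], ids[1:]):
        logits = logits_table[prev]
        log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
        nll -= log_probs[nxt]
    return nll / (len(ids) - 1)

print("per-token NLL:", autoregressive_nll(ids, logits_table))
```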
Deep artificial neural networks have been proposed as a model of primate vision. However, these networks are vulnerable to adversarial attacks, whereby introducing minimal noise can fool networks into misclassifying images. Primate vision is thought to be robust to such adversarial images. We evaluated this assumption by designing adversarial images to fool primate vision. To do so, we first trained a model to predict responses of face-selective neurons in macaque inferior temporal cortex. Next, we modified images, such as human faces, to match their model-predicted neuronal responses to a target category, such as monkey faces. These adversarial images elicited neuronal responses similar to the target category. Remarkably, the same images fooled monkeys and humans at the behavioral level. These results challenge fundamental assumptions about the similarity between computer and primate vision and show that a model of neuronal activity can selectively direct primate visual behavior.
Particular deep artificial neural networks (ANNs) are today’s most accurate models of the primate brain’s ventral visual stream. Here we report that, using a targeted ANN-driven image synthesis method, new luminous power patterns (i.e. images) can be applied to the primate retinae to predictably push the spiking activity of targeted V4 neural sites beyond naturally occurring levels. More importantly, this method, while not yet perfect, already achieves unprecedented independent control of the activity state of entire populations of V4 neural sites, even those with overlapping receptive fields. These results show how the knowledge embedded in today’s ANN models might be used to non-invasively set desired internal brain states at neuron-level resolution, and suggest that more accurate ANN models would produce even more accurate control.
Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by matching the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.
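For intuition, a minimal sketch of one standard way such small perturbations are constructed, the fast gradient sign method; the logistic-regression “classifier” and the ε budget here are stand-in assumptions, whereas the papers above attack deep networks and models of neural responses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "image classifier": logistic regression on a flattened image.
d = 28 * 28
w, b = rng.normal(size=d), 0.0
x = rng.uniform(0, 1, size=d)             # a hypothetical input image
y = 1                                     # its true label

def loss_grad_x(x, y, w, b):
    """Gradient of the logistic loss with respect to the input pixels."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w

# Fast gradient sign method: a small step that maximally increases the loss
# under an L-infinity budget epsilon, keeping pixels in [0, 1].
epsilon = 0.05
x_adv = np.clip(x + epsilon * np.sign(loss_grad_x(x, y, w, b)), 0, 1)

print("max pixel change:", np.max(np.abs(x_adv - x)))
```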
Researchers generally receive little training in experimental deception.
Drawing on the field of magic, we present a novel model of effective deception.
First, deception should have many “layers” rather than a single cover story.
Second, these layers should be subtle rather than explicitly stated.
We provide strategies for improving deception and thus the reliability of research.
Social psychologists, placebo scientists, and consumer researchers often require deception in their studies, yet they receive little training on how to deceive effectively. Ineffective deception, however, can lead to suspicion and compromise the validity of research. The field of magic offers a potential solution; magicians have deceived audiences for millennia using a variety of robust techniques. As former professional magicians, we propose the Swiss cheese model of deception and argue that deception should be subtle yet elaborate. Subtle deception involves techniques such as fake mistakes, planted assumptions, and convincers. Elaborate deception involves layering many of these techniques rather than relying on a single cover story. We have demonstrated the potency of these principles by making participants believe implausible ideas, such as that a machine is controlling their mind or that the placebo they consumed was a psychedelic drug. These principles can help researchers reduce demand characteristics, improve blinding, and increase the generalisability of studies that require deception. [Keywords: deception, suspicion, magic, placebo, blinding, ethics]
1.1: Deceive elaborately with many layers: Co-author A.R. used to perform an act in which he would appear to read the mind of an audience member. The secret was simply that the audience member he selected for the demonstration was a paid confederate; the apparently impromptu mind reading was actually a scripted exchange. In the middle of one show, a man in the theatre stood up and shouted, “I was here last week and he chose the same woman. She’s a stooge!” After some commotion and hesitation, the magician invited the heckler onto the stage and then proceeded to read his mind instead. The act was powerful for the audience and particularly so for the initial confederate. The magician later “confided” to her that he could indeed genuinely read minds, but it was cognitively taxing for him, which is why he hired her as a confederate. The confederate was so impressed that she praised his magical powers in front of friends and colleagues for years after the performance. As it turns out, the heckler was the magician’s uncle—yet another confederate. This additional layer of deception was intended to fool the audience as well as the initial confederate.
Magicians often use such elaborate forms of deception (Kuhn et al 2014; Teller, 2012). Audiences may suspect stooges in a magic show, but they are less likely to suspect one stooge to cover up another. In other cases, magicians may show up at a restaurant hours before a performance to stick playing cards under each of the tables, one of which will be used in a casual magic trick over dinner. Or, the spouse of a magician may pretend to not understand English in order to discreetly eavesdrop and signal information undetected from the audience. Such elaborate acts, requiring considerable time, money, or effort, can be difficult for lay audiences to imagine and are thus particularly deceptive (Teller, 2012).
In research, deception is often confined to a few layers, such as a bogus device or a false explanation of what a task is measuring (Sieber et al 1995), though adding more layers may increase the effectiveness of the deception. In one study (Olson et al 2016), we had to convince educated participants that a (sham) MRI scanner could both read their mind and insert thoughts into their head; we were testing whether the delusion of thought insertion could be reproduced in a non-clinical population. To do so, we used various layers to strengthen the deception. The first 30 min of the protocol included fake MRI safety screenings, a lab technician (surrounded by scientific paraphernalia) describing the complex workings of the machine, and a sham calibration procedure. As in magic, such deception can lead participants down one explanatory path (e.g., that a novel technology will control their mind), making them less likely to discover the underlying “secret” (Thomas & Didierjean, 2016). These many layers constitute costly signalling: the effort involved in the procedure was specifically intended to make participants less likely to infer that it was all a sham (Galang, 2018). In a replication, removing one of the key layers of deception made the procedure less convincing (Pailhès, Olson, & Kuhn, in progress). Related studies of machine mind reading and thought insertion that used fewer layers of deception have also resulted in higher rates of suspicion or somewhat weaker effects (Ali et al., 2014; Swiney & Sousa, 2013).
Elaborate deceptive methods are occasionally required in placebo research. In a study applying the Swiss cheese model, we used a dozen researchers in lab coats, a security guard, a handful of confederates, sham blood pressure feedback, and fake drug information sheets to convince participants that the placebos they consumed were actually psychedelic drugs (Olson et al 2020). Accordingly, some of the participants reported alterations in consciousness similar to what one would expect from a moderate dose of the actual drug. In a study of placebo alcohol, Bernstein et al (2016) also used various layers of deception: confederates made off-hand comments about friends who got drunk while previously completing the study, the researchers sprayed the room with an alcohol scent, and the (non-alcoholic) drinks had real alcohol rubbed along the rim for subtle taste cues…When guessing three people’s chosen playing cards, they [mentalists] will intentionally get the last one slightly wrong (e.g., guessing the Seven of Diamonds rather than the Seven of Hearts) to make the situation appear more plausible and lead people to believe it is telepathy rather than a trick (Burger, 1983). This trickery is effective because it is more difficult for audiences to imagine that such seemingly costly mistakes would be carefully planned to improve the show (Galang, 2018).
Just as vision scientists study visual art and illusions to elucidate the workings of the visual system, so too can cognitive scientists study cognitive illusions to elucidate the underpinnings of cognition. Magic shows are a manifestation of accomplished magic performers’ deep intuition for and understanding of human attention and awareness. By studying magicians and their techniques, neuroscientists can learn powerful methods to manipulate attention and awareness in the laboratory. Such methods could be exploited to directly study the behavioural and neural basis of consciousness itself, for instance through the use of brain imaging and other neural recording techniques. [See their “Table 1: Psychological Assumptions” for a taxonomy.]
An intrusive thought is an unwelcome, involuntary thought, image, or unpleasant idea that may become an obsession, is upsetting or distressing, and can feel difficult to manage or eliminate. When such thoughts are associated with obsessive-compulsive disorder (OCD), depression, body dysmorphic disorder (BDD), and sometimes attention-deficit hyperactivity disorder (ADHD), the thoughts may become paralyzing, anxiety-provoking, or persistent. Intrusive thoughts may also be associated with episodic memory, unwanted worries or memories from OCD, post-traumatic stress disorder, other anxiety disorders, eating disorders, or psychosis. Intrusive thoughts, urges, and images are of inappropriate things at inappropriate times, and generally have aggressive, sexual, or blasphemous themes.
Trypophobia is an aversion to the sight of irregular patterns or clusters of small holes or bumps. It is not officially recognized as a mental disorder, but may be diagnosed as a specific phobia if excessive fear and distress occur. People may express only disgust to trypophobic imagery.
The marbled crayfish, or Marmorkrebs, is a parthenogenetic crayfish that was discovered in the pet trade in Germany in the 1990s. Marbled crayfish are closely related to the "slough crayfish", Procambarus fallax, which is widely distributed across Florida. No natural populations of marbled crayfish are known. Information provided by one of the original pet traders as to where the marbled crayfish originated was deemed "totally confusing and unreliable". The informal name Marmorkrebs is German for "marbled crayfish".
As the United States and the world discuss the possibility of conflict extending into space, it is important to have a general understanding of what is physically possible and practical. Scenes from Star Wars, books, and TV shows portray a world very different from what we are likely to see in the next 50 years, if ever, given the laws of physics. To describe how physics constrains the space-to-space engagements of a conflict that extends into space, this paper lays out five key concepts:
satellites move quickly,
satellites move predictably,
space is big,
timing is everything, and
satellites maneuver slowly.
It is meant to be accessible to policymakers and decision-makers, helping to frame discussions of space conflict. It does not explore geopolitical considerations.
[Review of orbital dynamics: satellites are difficult to hit with any ordinary weapon because of their speed and distance, and are most easily attacked by getting into the same orbit. However, satellites are heavily constrained by their initial fuel reserve and reliant on subtle maneuvers unfolding over many orbits to gradually approach a desired point and time; a bad position or one maneuver could easily cost the entire fuel budget. In lieu of long-distance attack capabilities like powerful lasers, attacks must be planned long in advance, and, like cyberwarfare, are more likely to resemble ambushes than conventional battle.]
Microtypography is the name given to a range of methods for improving the readability and appearance of text, especially justified text. The methods reduce the appearance of large interword spaces and create edges to the text that appear more even. Microtypography methods can also increase reading comprehension of text, reducing the cognitive load of reading.
This paper discusses a new approach to the problem of dividing the text of a paragraph into lines of approximately equal length. Instead of simply making decisions one line at a time, the method considers the paragraph as a whole, so that the final appearance of a given line might be influenced by the text on succeeding lines.
A system based on three simple primitive concepts called ‘boxes’, ‘glue’, and ‘penalties’ provides the ability to deal satisfactorily with a wide variety of typesetting problems in a unified framework, using a single algorithm that determines optimum breakpoints. The algorithm avoids backtracking by a judicious use of the techniques of dynamic programming.
Extensive computational experience confirms that the approach is both efficient and effective in producing high-quality output. The paper concludes with a brief history of line-breaking methods, and an appendix presents a simplified algorithm that requires comparatively few resources.
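In the spirit of the simplified algorithm mentioned in the appendix, here is a minimal dynamic-programming sketch that chooses breakpoints for the paragraph as a whole by minimizing squared slack; it omits the full box/glue/penalty machinery (stretch/shrink, hyphenation, penalties), and the badness function and sample text are illustrative:

```python
def break_paragraph(words, line_width):
    """Choose breakpoints minimizing total squared slack over all lines
    (last line free), considering the paragraph as a whole rather than
    greedily one line at a time."""
    n = len(words)
    INF = float("inf")

    def badness(i, j):  # cost of putting words[i:j] on one line
        length = sum(len(w) for w in words[i:j]) + (j - i - 1)  # chars + spaces
        if length > line_width:
            return INF
        if j == n:                       # last line: leftover space is free
            return 0
        return (line_width - length) ** 2

    # best[i] = minimal cost of typesetting words[i:]; choice[i] = next break.
    best, choice = [0.0] * (n + 1), [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = INF
        for j in range(i + 1, n + 1):
            cost = badness(i, j)
            if cost == INF:              # line already overfull; longer lines only worse
                break
            if cost + best[j] < best[i]:
                best[i], choice[i] = cost + best[j], j

    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:choice[i]]))
        i = choice[i]
    return lines

text = ("A system based on three simple primitive concepts called boxes glue "
        "and penalties determines optimum breakpoints for whole paragraphs").split()
print("\n".join(break_paragraph(text, 30)))
```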
A polyglot is a book that contains side-by-side versions of the same text in several different languages. Some editions of the Bible or its parts are polyglots, in which the Hebrew and Greek originals are exhibited along with historical translations. Polyglots are useful for studying the history of the text and its interpretation.
This thesis investigates the possibility of improving the quality of text composition. Two typographic extensions were examined: margin kerning and composing with font expansion.
Margin kerning is the adjustment of the characters at the margins of a typeset text. A simplified form of margin kerning is hanging punctuation. Margin kerning is needed for optical alignment of the margins of a typeset text, because mechanical justification of the margins makes them look rather ragged. Some characters can make a line appear shorter to the human eye than others. Shifting such characters by an appropriate amount into the margins would greatly improve the appearance of a typeset text.
Composing with font expansion is the method to use a wider or narrower variant of a font to make interword spacing more even. A font in a loose line can be substituted by a wider variant so the interword spaces are stretched by a smaller amount. Similarly, a font in a tight line can be replaced by a narrower variant to reduce the amount that the interword spaces are shrunk by. There is certainly potential danger of font distortion when using such manipulations, thus they must be used with extreme care. The potentiality to adjust a line width by font expansion can be taken into consideration while a paragraph is being broken into lines, in order to choose better breakpoints.
These typographic extensions were implemented in pdfTeX, an extension of TeX. Extensive experiments were done to examine the influence of the extensions on the quality of typesetting. The extensions turned out to noticeably improve the appearance of a typeset text. A number of ‘real-world’ documents have been typeset using these typographic extensions, including this thesis.
In recent years, a controversy has arisen in Japan regarding an ongoing landscape policy proposing to eliminate the forest of utility poles and electric wires that covers almost all urban and rural landscapes. The controversy is somewhat peculiar vis-à-vis the existing study of landscape, partly because of the utterly ubiquitous and non-monumental characteristics of the poles and partly because of the general apathy in public reaction to them. Drawing upon diverse academic sources, this interdisciplinary exploration unfolds a complex entanglement of tacit landscape ideas behind the controversy. The author discusses the effectiveness and limits of addressing both the substantial and visual aspects of the poles vis-à-vis the public and policy makers by using three conceptual frameworks: (1) ‘erasure’ in the landscape as palimpsest, (2) the dual aspects of ‘noise’, and (3) artialisation, in order to understand this mundane element of technological objects in the context of creating contemporary landscapes.
…Utility-pole aesthetics: In contrast with this rather ghostly genealogy of artialisation, Hideaki Anno, a film director, has maximally explored the aesthetic potential of these poles and wires in his works, with his blatant claim for the aesthetics of such pole-covered landscapes…Regarding the importance of iconography in formulating landscape ideas (Cosgrove, 2006; Cosgrove and Daniels, 1988) or artialisation (Roger, 1997), the audience of Anno’s animation works immediately comprehends the detailed depiction of such technoscapes involving utility poles, wires and high voltage towers in his works, in sharp contrast to the more conventional way of simply omitting these items or using symbols in their place. Ryusuke Hikawa, a film critic, explains that Anno elects to put these elements in the forefront, focusing on their hidden life and history in his sagas.11 Anno has been becoming more outspoken in recent years in defence of the beauty of these poles, probably in response to the no-poles campaign that has become publicly visible. As he said in an interview conducted in 2000:
As I grew up close to a factory, it was my archetypal image. Even now I love such things as factories and masses of iron. I love also utility poles; especially their functional beauty (kinô-bi). I know there’s a movement in political circles to remove these poles. I wonder what motivates them to further impoverish the urban landscape, which has already been so boring. There would be no charm of landscape in Tokyo without utility poles. (emphasis added)12
On another occasion, he reiterates the concept of the functional beauty of these poles:
Utility poles have only functional beauty (kinô-bi). Their concise form exists as uniformity in every city . . . The disinterestedness of such poles, without any compromise to the general landscape, is something that I adore that is irreplaceable with other things.13
In parallel with Anno’s unique support of the poles’ beauty with his poetic depiction of them in his works, on a Japanese photographic SNS site called Ingrum there is a page dedicated to photos of utility poles, many of them clearly reminiscent of scenes in Anno’s Evangelion; their number had reached 107,147 as of late 2018 and is still growing.14 Pixiv, another Japanese SNS site for both professional and amateur graphic artists, has a specific category of drawings for utility poles.15 There is even a site for the best drawings of utility-pole related landscapes, with a caption referring to the ‘inorganic beauty of electric wires’, which says, ‘we find these poles everywhere outside, while usually we don’t pay attention to them. Once, however, we attend to them, we are captured by their functional, inorganic beauty.’16 In what is called the Pixiv-Encyclopedia, the entry for utility poles is defined as ‘something nostalgic for the Japanese, while their number is decreasing due to the policy of burying them underground’.17
Related to such efforts to reappraise the aesthetic value of these poles on the web, there is a site on the web that collects critical comments on the very picture of Mt Fuji covered with utility poles in the photography competition for a No-Poles Landscape mentioned above. There are quite a few comments that underscore that the utility poles that cover the Mt Fuji print actually enhance the beauty of the scenery in the context of modern technology.18
In computer networking, IP over Avian Carriers (IPoAC) is a proposal to carry Internet Protocol (IP) traffic by birds such as homing pigeons. IP over Avian Carriers was initially described in RFC 1149, a Request for Comments (RFC) issued by the Internet Engineering Task Force (IETF), written by D. Waitzman, and released on April 1, 1990. It is one of several April Fools' Day Request for Comments.
…On 28 April 2001, IPoAC was implemented by the Bergen Linux user group, under the name CPIP (for “Carrier Pigeon Internet Protocol”). They sent nine packets over a distance of approximately five kilometers (three miles), each carried by an individual pigeon and containing one ping (ICMP Echo Request), and received four responses.
It is unknown how abundant extraterrestrial life is, or whether such life might be complex or intelligent. On Earth, the emergence of complex intelligent life required a preceding series of evolutionary transitions such as abiogenesis, eukaryogenesis, and the evolution of sexual reproduction, multicellularity, and intelligence itself. Some of these transitions could have been extraordinarily improbable, even in conducive environments. The emergence of intelligent life late in Earth’s lifetime is thought to be evidence for a handful of rare evolutionary transitions, but the timing of other evolutionary transitions in the fossil record is yet to be analyzed in a similar framework.
Using a simplified Bayesian model that combines uninformative priors and the timing of evolutionary transitions, we demonstrate that expected evolutionary transition times likely exceed the lifetime of Earth, perhaps by many orders of magnitude. Our results corroborate the original argument suggested by Brandon Carter that intelligent life in the Universe is exceptionally rare, assuming that intelligent life elsewhere requires analogous evolutionary transitions. Arriving at the opposite conclusion would require exceptionally conservative priors, evidence for much earlier transitions, multiple instances of transitions, or an alternative model that can explain why evolutionary transitions took hundreds of millions of years without appealing to rare chance events.
Although the model is simple, it provides an initial basis for evaluating how varying biological assumptions and fossil record data impact the probability of evolving intelligent life, and also provides a number of testable predictions, such as that some biological paradoxes will remain unresolved and that planets orbiting M dwarf stars are uninhabitable.
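A toy Monte Carlo illustration of the underlying selection effect, not the paper’s Bayesian model; the habitable window and the assumed expected waiting time are arbitrary choices, meant only to show that successful histories complete the transition comfortably within the window even when the true expected time vastly exceeds it:

```python
import numpy as np

rng = np.random.default_rng(0)

window = 5.0        # assumed habitable lifetime, in Gyr (illustrative)
mean_wait = 200.0   # assumed true expected waiting time for one hard step, in Gyr

# Simulate many planets; each transition happens at an exponential waiting time.
waits = rng.exponential(mean_wait, size=2_000_000)
successes = waits[waits < window]       # only these histories produce observers

print("fraction of planets where the step completes in time:",
      round(len(successes) / len(waits), 4))
print("mean completion time on successful planets (Gyr):",
      round(float(successes.mean()), 2))  # well inside the window despite mean_wait >> window
```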
A constructed language is a language whose phonology, grammar, and vocabulary, instead of having developed naturally, are consciously devised. Constructed languages may also be referred to as artificial languages, planned languages or invented languages and in some cases, fictional languages. Planned languages are languages that have been purposefully designed. They are the result of deliberate controlling intervention, thus of a form of language planning.
Rodney Marvin McKuen was an American poet, singer-songwriter, and actor. He was one of the best-selling poets in the United States during the late 1960s. Throughout his career, McKuen produced a wide range of recordings, which included popular music, spoken word poetry, film soundtracks and classical music. He earned two Academy Award nominations and one Pulitzer nomination for his music compositions. McKuen's translations and adaptations of the songs of Jacques Brel were instrumental in bringing the Belgian songwriter to prominence in the English-speaking world. His poetry deals with themes of love, the natural world and spirituality. McKuen's songs sold over 100 million recordings worldwide, and 60 million books of his poetry were sold as well.
In news that surprises nobody, Goodreads last week quietly announced the deprecation of their public APIs. And I mean really quietly—the only people who were told about this were those unfortunate enough to have their existing API keys disabled without warning. Other than a small banner at the top of the API docs which mentions vague “plans to retire these tools”, nobody else appears to have heard anything from Goodreads, including those whose API keys remain active…So this is an “announcement” much in the way a windshield announces its presence to bugs on a highway, and with the same consequences: dead bugs. Some developers have taken to the API discussion boards and blogs, but the overall impression I’m getting is grim acceptance. Really the surprising thing is how long it took them: Amazon has been in charge at Goodreads for almost 8 years now, and I think we’ve all been expecting this to come at some point.
So why now? What’s changed? Well, the fact is the market’s changing—and Goodreads isn’t. Alternative options are starting to emerge, and since Goodreads has forgotten how to innovate, it wants to use its market position to stifle innovation instead.
This is a compilation of my book reviews. Book reviews are sorted by star, and sorted by length of review within each star level, under the assumption that longer reviews are of more interest to readers.
Subscription page for the monthly gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives since December 2013.
A superpressure balloon (SPB) is a style of aerostatic balloon where the volume of the balloon is kept relatively constant in the face of changes in ambient pressure outside the balloon, and the temperature of the contained lifting gas. This allows the balloon to keep a stable altitude for long periods. This is in contrast with much more common variable-volume balloons, which are either only partially filled with lifting gas, or made with more elastic materials. Also referred to as pumpkin or Ultra Long Distance Balloons (ULDB) balloons, the sealed balloon envelopes have a pumpkin shape at flight altitude. In a variable-volume balloon, the volume of the lifting gas changes due to heating and cooling in the diurnal cycle. The cycle is magnified by a greenhouse effect inside the balloon, while the surrounding atmospheric gas is subject to a much more limited cyclical temperature change. As the lift gas heats and expands, the displacement of atmospheric gas increases, while the balloon weight remains constant. Its buoyancy increases, and this leads to a rise in altitude unless it is compensated by venting gas. Conversely, if the balloon cools and drops, it becomes necessary to release ballast. Since both ballast and gas are finite, there is a limit to how long a variable-volume balloon can compensate in order to stabilize its altitude. In contrast, a superpressure balloon will change altitude much less without compensation maneuvers.
Loon LLC is an Alphabet Inc. subsidiary working on providing Internet access to rural and remote areas. The company uses high-altitude balloons in the stratosphere at an altitude of 18 km (11 mi) to 25 km (16 mi) to create an aerial wireless network with up to 1 Mbit/s speeds. Named in reference to the balloons used, Project Loon began as a research and development project by X, but later spun out into a separate company in July 2018.
Pixiv is a Japanese online community for artists. It was first launched as a beta test on September 10, 2007 by Takahiro Kamitani and Takanori Katagiri. Pixiv Inc. is headquartered in Sendagaya, Shibuya, Tokyo, Japan. As of September 2016, the site consists of over 20 million members, over 43 million submissions, and receives over 3.7 billion page views monthly. Pixiv aims to provide a place for artists to exhibit their illustrations and get feedback via a rating system and user comments. Works are organized in an extensive tag structure which forms the backbone of the website.
The ability to reason about temporal and causal events from videos lies at the core of human intelligence. Most video reasoning benchmarks, however, focus on pattern recognition from complex visual and language input, instead of on causal structure. We study the complementary problem, exploring the temporal and causal structures behind videos of objects with simple visual appearance.
To this end, we introduce the CoLlision Events for Video REpresentation and Reasoning (CLEVRER), a diagnostic video dataset for systematic evaluation of computational models on a wide range of reasoning tasks. Motivated by the theory of human causal judgment, CLEVRER includes four types of questions: descriptive (e.g., "what color"), explanatory ("what is responsible for"), predictive ("what will happen next"), and counterfactual ("what if").
We evaluate various state-of-the-art models for visual reasoning on our benchmark. While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations. We also study an oracle model that explicitly combines these components via symbolic representations.
Computer vision has undergone a dramatic revolution in performance, driven in large part through deep features trained on large-scale supervised datasets. However, much of these improvements have focused on static image analysis; video understanding has seen rather modest improvements. Even though new datasets and spatiotemporal models have been proposed, simple frame-by-frame classification methods often still remain competitive. We posit that current video datasets are plagued with implicit biases over scene and object structure that can dwarf variations in temporal structure.
In this work, we build a video dataset with fully observable and controllable object and scene bias, and which truly requires spatiotemporal understanding in order to be solved. Our dataset, named CATER, is rendered synthetically using a library of standard 3D objects, and tests the ability to recognize compositions of object movements that require long-term reasoning.
In addition to being a challenging dataset, CATER also provides a plethora of diagnostic tools to analyze modern spatiotemporal video architectures by being completely observable and controllable. Using CATER, we provide insights into some of the most recent state of the art deep video architectures.
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Our world can be succinctly and compactly described as structured scenes of objects and relations. A typical room, for example, contains salient objects such as tables, chairs and books, and these objects typically relate to each other by their underlying causes and semantics. This gives rise to correlated features, such as position, function and shape. Humans exploit knowledge of objects and their relations for learning a wide spectrum of tasks, and more generally when learning the structure underlying observed data.
In this work, we introduce relation networks (RNs)—a general purpose neural network architecture for object-relation reasoning. We show that RNs are capable of learning object relations from scene description data. Furthermore, we show that RNs can act as a bottleneck that induces the factorization of objects from entangled scene description inputs, and from distributed deep representations of scene images provided by a variational autoencoder. The model can also be used in conjunction with differentiable memory mechanisms for implicit relation discovery in one-shot learning tasks. Our results suggest that relation networks are a potentially powerful architecture for solving a variety of problems that require object relation reasoning.
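A minimal numpy sketch of the relation-network form described above: a shared pairwise function g applied to every pair of objects, summed, and passed through a readout f. The layer sizes and random placeholder weights are illustrative, not the paper’s trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP as a placeholder for a learned g or f network."""
    weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for w in weights[:-1]:
            x = np.maximum(x @ w, 0.0)   # ReLU hidden layers
        return x @ weights[-1]
    return forward

obj_dim, g_dim, out_dim = 8, 32, 4
g = mlp([2 * obj_dim, 64, g_dim])        # pairwise relation function g_theta
f = mlp([g_dim, 64, out_dim])            # readout f_phi

def relation_network(objects):
    """RN(O) = f( sum over all object pairs of g(o_i, o_j) )."""
    pair_sum = np.zeros(g_dim)
    for o_i in objects:
        for o_j in objects:
            pair_sum += g(np.concatenate([o_i, o_j]))
    return f(pair_sum)

scene = rng.normal(size=(5, obj_dim))    # 5 objects with 8 features each
print(relation_network(scene))
```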
When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint model weaknesses. We present a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.
One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent. To measure progress towards that goal, we argue for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering. Our tasks measure understanding in several ways: whether a system is able to answer questions via chaining facts, simple induction, deduction and many more. The tasks are designed to be prerequisites for any system that aims to be capable of conversing with a human. We believe many existing learning systems can currently not solve them, and hence our aim is to classify these tasks into skill sets, so that researchers can identify (and then rectify) the failings of their systems. We also extend and improve the recently introduced Memory Networks model, and show it is able to solve some, but not all, of the tasks.
“Language Models are Few-Shot Learners”, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei (2020-05-28):
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions—something which current NLP systems still largely struggle to do.
Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10× more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
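As a concrete illustration of specifying a task "purely via text interaction" (a hypothetical prompt in the spirit of the paper's few-shot setting, not an excerpt from it), a few-shot translation prompt might be built like this:

```python
# A hypothetical few-shot prompt: the task is specified entirely as text,
# a handful of demonstrations followed by a new query. No gradient updates
# occur; the model simply continues the text after the final "=>".
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "plush giraffe => girafe en peluche\n"
    "cheese =>"
)
print(prompt)   # the model's completion is taken as its answer
```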
Fugaku(Japanese: 富岳) – named after an alternative name for Mount Fuji – is a claimed exascale supercomputer, at the RIKEN Center for Computational Science in Kobe, Japan. It started development in 2014 as the successor to the K computer, and was scheduled to start operating in 2021. Fugaku made its debut in 2020, and became the fastest supercomputer in the world in the June 2020 TOP500 list, as well as becoming the first ARM architecture-based computer to achieve this. As of November 2020, Fugaku is still the fastest supercomputer in the world.
China and the United States are locked in a contest to develop the world’s most powerful computers. Now a massive machine in Japan has topped them both.
A long-awaited supercomputer called Fugaku, installed in the city of Kobe by the government-sponsored Riken institute, took first place in a twice-yearly speed ranking that was released on Monday. The Japanese machine carried out 2.8 times more calculations a second than an IBM system at Oak Ridge National Laboratory in Tennessee, which Fugaku bumped to second place in the so-called Top500 list. Another IBM system, at Lawrence Livermore National Laboratory in California, slid to third place in the ranking from second, while systems in China moved to the fourth and fifth spots from third and fourth.
…Japan remains a relatively small player in supercomputing. China placed 226 systems in the latest Top500 list; the U.S. total was 114, though they accounted for a greater share of aggregate computing power. But Japan has a long history of pushing the state of the art in computing. A prominent example is the K Supercomputer, its predecessor at Riken, which took the No. 1 spot on the Top500 list in 2011 before being displaced the next year by a system at Livermore.
…Horst Simon, who has studied Fugaku as deputy director of research at Lawrence Berkeley National Laboratory in California, called it a “very remarkable, very admirable” product. But it may not last long as the world’s fastest supercomputer in view of forthcoming Department of Energy systems at Oak Ridge and Livermore and likely advances in China, he said.
Fugaku, another name for Mount Fuji, required some lofty spending. The six-year budget for the system and related technology development totaled about $1 billion, compared with the $600 million price tags for the biggest planned U.S. systems.
The brain processes information through many layers of neurons. This deep architecture is representationally powerful, but it complicates learning by making it hard to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame to a neuron by computing exactly how it contributed to an error. To do this, it multiplies error signals by matrices consisting of all the synaptic weights on the neuron’s axon and farther downstream. This operation requires a precisely choreographed transport of synaptic weight information, which is thought to be impossible in the brain.
Here we present a surprisingly simple algorithm for deep learning, which assigns blame by multiplying error signals by random synaptic weights. We show that a network can learn to extract useful information from signals sent through these random feedback connections. In essence, the network learns to learn. We demonstrate that this new mechanism performs as quickly and accurately as backpropagation on a variety of problems and describe the principles which underlie its function.
Our demonstration provides a plausible basis for how a neuron can be adapted using error signals generated at distal locations in the brain, and thus dispels long-held assumptions about the algorithmic constraints on learning in neural circuits.
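A minimal NumPy sketch of the mechanism (toy sizes, squared-error loss, untuned learning rate): the fixed random matrix `B` plays the role that the transposed forward weights would play in backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: learn y = A x with a one-hidden-layer network.
n_in, n_hid, n_out, n_samples = 20, 64, 5, 2000
A = rng.normal(size=(n_out, n_in))
X = rng.normal(size=(n_samples, n_in))
Y = X @ A.T

W1 = rng.normal(scale=0.1, size=(n_hid, n_in))   # forward weights, layer 1
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))  # forward weights, layer 2
B  = rng.normal(scale=0.1, size=(n_hid, n_out))  # FIXED random feedback weights
                                                 # (backprop would use W2.T here)
lr = 0.02
for epoch in range(200):
    H = np.tanh(X @ W1.T)          # hidden activations
    Y_hat = H @ W2.T               # network output
    E = Y_hat - Y                  # output error
    # Blame is assigned to hidden units through the random feedback matrix B
    # instead of the transposed forward weights:
    delta_H = (E @ B.T) * (1 - H**2)
    W2 -= lr * E.T @ H / n_samples
    W1 -= lr * delta_H.T @ X / n_samples

print("final MSE:", float(np.mean(E**2)))
```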
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (Θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
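A highly simplified sketch of the query-and-composite loop (not the paper's implementation: the MLP, positional encoding, and training loop are omitted, and `field` below is a dummy stand-in for a trained network):

```python
import numpy as np

def field(xyz, view_dir):
    """Stand-in for the trained MLP: maps 3D points plus a viewing direction
    to (volume density sigma, view-dependent RGB). Dummy values here."""
    sigma = np.exp(-np.sum(xyz**2, axis=-1))           # fake density blob at origin
    rgb = np.clip(0.5 + 0.5 * view_dir, 0.0, 1.0) * np.ones_like(xyz)
    return sigma, rgb

def render_ray(origin, direction, t_near=0.0, t_far=4.0, n_samples=64):
    """Classic volume rendering: sample the field along the ray and
    alpha-composite colors weighted by accumulated transmittance."""
    t = np.linspace(t_near, t_far, n_samples)
    pts = origin + t[:, None] * direction              # (n_samples, 3)
    sigma, rgb = field(pts, direction)
    delta = np.diff(t, append=t[-1] + (t[1] - t[0]))   # segment lengths
    alpha = 1.0 - np.exp(-sigma * delta)               # opacity per segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)        # composited pixel color

print(render_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0])))
```

Because every step above is differentiable, the same loop can be run inside an autodiff framework and the field optimized to match posed photographs, which is the paper's central point.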
We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well; this is an idea that many have explored in the past, and we hope our result motivates further research into applying this idea on larger and more diverse datasets.
GPT-3, announced by OpenAI in May 2020, is the largest neural network ever trained, by over an order of magnitude. Trained on Internet text data, it is the successor to GPT-2, which surprised everyone by its natural language understanding & generation ability. GPT-3 is even more surprising in that this vast increase in size did not run into diminishing returns, as many expected, but the benefits of scale continued to happen as forecasted by OpenAI. These benefits were not merely learning more facts & text than GPT-2, but qualitatively distinct & surprising in showing meta-learning: while GPT-2 learned how to do common natural language tasks like text summarization, GPT-3 instead learned how to follow directions and learn new tasks from a few examples. (As a result, GPT-3 outputs & interaction are more fascinating & human-like than GPT-2.)
While the immediate applications of GPT-3, like my poetry or humor writings, are nice, the short-term implications of GPT-3 are much more important.
First, while GPT-3 is expensive by conventional DL standards, it is cheap by scientific/commercial/military/government budget standards, and the results indicate that models could be made much larger. Second, models can also be made much more powerful, as GPT is an old approach known to be flawed in both minor & major ways, and far from an ‘ideal’ Transformer. Third, GPT-3’s capabilities come from learning on raw (unsupervised) data; that has long been one of the weakest areas of DL, holding back progress in other areas like reinforcement learning or robotics. Models like GPT-3 suggest that large unsupervised models will be vital components of future DL systems, as they can be ‘plugged into’ systems to immediately provide understanding of the world, humans, natural language, and reasoning.
The meta-learning has a longer-term implication: it is a demonstration of the blessings of scale, where problems with simple neural networks vanish, and they become more powerful, more generalizable, more human-like when simply made very large & trained on very large datasets with very large compute—even though those properties are believed to require complicated architectures & fancy algorithms (and this perceived need drives much research). Unsupervised models benefit from this, as training on large corpuses like Internet-scale text presents a myriad of difficult problems to solve; this is enough to drive meta-learning despite GPT not being designed for meta-learning in any way. (This family of phenomena is perhaps driven by neural networks functioning as ensembles of many sub-networks, all averaging out to an Occam’s razor, which for small data & models learn superficial or memorized parts of the data, but can be forced into true learning by making the problems hard & rich enough.)
The blessings of scale in turn support a radical theory: an old AI paradigm held by a few pioneers in connectionism (early artificial neural network research) and by more recent deep learning researchers, the scaling hypothesis. The scaling hypothesis regards the blessings of scale as the secret of AGI: intelligence is ‘just’ simple neural units & learning algorithms applied to diverse experiences at a (currently) unreachable scale. As increasing computational resources permit running such algorithms at the necessary scale, the neural networks will get ever more intelligent.
When? Estimates of Moore’s law-like progress curves decades ago by pioneers like Hans Moravec indicated that it would take until the 2010s for the sufficiently-cheap compute for tiny insect-level prototype systems to be available, and the 2020s for the first sub-human systems to become feasible, and these forecasts are holding up. (Despite this vindication, the scaling hypothesis is so unpopular an idea, and difficult to prove in advance rather than as a fait accompli, that while the GPT-3 results finally drew some public notice after OpenAI enabled limited public access & people could experiment with it live, it is unlikely that many entities will modify their research philosophies, much less kick off an ‘arms race’.)
More concerningly, GPT-3’s scaling curves, unpredicted meta-learning, and success on various anti-AI challenges suggest that, in terms of futurology, AI researchers’ forecasts are an emperor sans garments: they have no coherent model of how AI progress happens, why GPT-3 was possible, what specific achievements should cause alarm, or where intelligence comes from, and they do not learn from any falsified predictions. Their primary concerns appear to be supporting the status quo, placating public concern, and remaining respectable. As such, their comments on AI risk are meaningless: they would make the same public statements whether the scaling hypothesis were true or not.
Depending on what investments are made into scaling DL, and how fast compute grows, the 2020s should be quite interesting—sigmoid or singularity?
Photonics is the physical science of light (photon) generation, detection, and manipulation through emission, transmission, modulation, signal processing, switching, amplification, and sensing. Though photonics covers all of light's technical applications over the whole spectrum, most photonic applications are in the range of visible and near-infrared light. The term photonics developed as an outgrowth of the first practical semiconductor light emitters invented in the early 1960s and optical fibers developed in the 1970s.
In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. Random projection methods are known for their power, simplicity, and low error rates when compared to other methods. Experimental results indicate that random projection preserves pairwise distances well, although systematic empirical evaluations remain sparse. They have been applied to many natural language tasks under the name random indexing.
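A small demonstration of the distance-preservation property (a NumPy sketch with arbitrary sizes): project high-dimensional points through a scaled random Gaussian matrix and compare pairwise distances before and after.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_points, d_high, d_low = 200, 10_000, 300

X = rng.normal(size=(n_points, d_high))
# Random Gaussian projection, scaled so distances are preserved in
# expectation (Johnson-Lindenstrauss style).
R = rng.normal(size=(d_high, d_low)) / np.sqrt(d_low)
X_low = X @ R

ratio = pdist(X_low) / pdist(X)        # pairwise-distance ratios after/before
print(f"distance ratio: mean={ratio.mean():.3f}, sd={ratio.std():.3f}")  # ≈ 1, small spread
```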
Compared to GPT-2, GPT-3 improves performance on character-level tasks like rhyming, alliteration, punning, anagrams or permutations, acrostic poems, and arithmetic less than expected, despite being very good at many other closely-related kinds of writing, like satire.
Why? A plausible explanation is an obscure technical detail: as a performance optimization, GPT does not see characters but sub-word-chunks called “byte-pair encodings” (BPEs). Because GPTs never see characters but opaque partial-words, which vary chaotically based on the specific word and even the surrounding context, they are unable to easily learn about character-level aspects of language, like similar spellings or sounds, and are forced to learn relationships much more indirectly, like by brute-force memorizing of pairs of words.
Some experiments with reformatting GPT-3’s poorest-performing tasks to avoid inconsistent BPE encodings of strings show small to large performance gains, consistent with this theory.
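To see why BPEs obscure spelling, consider a toy merge table (invented for illustration; GPT's real vocabulary has tens of thousands of merges): the same letters end up inside different opaque tokens depending on context, so nothing in the token sequence directly exposes the characters.

```python
# Toy BPE segmenter: greedily apply a fixed, ranked list of merges to a word.
# This is a simplification of how real BPE tokenizers segment text.
MERGES = [("t", "h"), ("th", "e"), ("i", "n"), ("in", "g"),
          ("r", "e"), ("re", "a"), ("rea", "d")]

def bpe(word):
    tokens = list(word)
    for a, b in MERGES:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i:i + 2] = [a + b]     # merge the adjacent pair
            else:
                i += 1
    return tokens

for w in ["read", "reading", "thread", "breathe"]:
    print(w, "->", bpe(w))
# read    -> ['read']
# reading -> ['read', 'ing']
# thread  -> ['th', 'read']
# breathe -> ['b', 'rea', 'the']
```

The letters r-e-a-d appear in all four words, but the model sees them as four unrelated token sequences, which is exactly the kind of inconsistency that makes rhyming or anagrams hard to learn.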
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.
We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset—matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples.
The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text.
These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome. It consists of two steps: the first step is to select only the subset of DNA that encodes proteins. These regions are known as exons – humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.
“Meta-analysis of genome-wide association studies of attention-deficit/hyperactivity disorder”, Benjamin M. Neale, Sarah E. Medland, Stephan Ripke, Philip Asherson, Barbara Franke, Klaus-Peter Lesch, Stephen V. Faraone, Thuy Trang Nguyen, Helmut Schäfer, Peter Holmans, Mark Daly, Hans-Christoph Steinhausen, Christine Freitag, Andreas Reif, Tobias J. Renner, Marcel Romanos, Jasmin Romanos, Susanne Walitza, Andreas Warnke, Jobst Meyer, Haukur Palmason, Jan Buitelaar, Alejandro Arias Vasquez, Nanda Lambregts-Rommelse, Michael Gill, Richard J. L. Anney, Kate Langley, Michael O'Donovan, Nigel Williams, Michael Owen, Anita Thapar, Lindsey Kent, Joseph Sergeant, Herbert Roeyers, Eric Mick, Joseph Biederman, Alysa Doyle, Susan Smalley, Sandra Loo, Hakon Hakonarson, Josephine Elia, Alexandre Todorov, Ana Miranda, Fernando Mulas, Richard P. Ebstein, Aribert Rothenberger, Tobias Banaschewski, Robert D. Oades, Edmund Sonuga-Barke, James McGough, Laura Nisenbaum, Frank Middleton, Xiaolan Hu, Stan Nelson (2010):
Although twin and family studies have shown attention-deficit/hyperactivity disorder (ADHD) to be highly heritable, genetic variants influencing the trait at a genome-wide significant level have yet to be identified. As prior genome-wide association studies (GWAS) have not yielded significant results, we conducted a meta-analysis of existing studies to boost statistical power. We used data from four projects: a) the Children's Hospital of Philadelphia (CHOP); b) phase I of the International Multicenter ADHD Genetics project (IMAGE); c) phase II of IMAGE (IMAGE II); and d) the Pfizer-funded study from the University of California, Los Angeles, Washington University, and Massachusetts General Hospital (PUWMa). The final sample size consisted of 2,064 trios, 896 cases, and 2,455 controls. For each study, we imputed HapMap single nucleotide polymorphisms, computed association test statistics and transformed them to z-scores, and then combined weighted z-scores in a meta-analysis. No genome-wide significant associations were found, although an analysis of candidate genes suggests that they may be involved in the disorder. Given that ADHD is a highly heritable disorder, our negative results suggest that the effects of common ADHD risk variants must, individually, be very small or that other types of variants, e.g., rare ones, account for much of the disorder's heritability.
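The "combined weighted z-scores" step is the standard Stouffer-style meta-analysis; a minimal sketch with made-up per-study results and square-root-of-sample-size weights (the paper's exact weighting scheme is not reproduced here):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-study results for one SNP: two-sided p-values, effect
# directions, and sample sizes (all invented for illustration).
p    = np.array([0.04, 0.30, 0.12, 0.55])
sign = np.array([+1, +1, -1, +1])            # direction of each estimated effect
n    = np.array([2064, 896, 2455, 1200])

z = sign * norm.isf(p / 2)                   # per-study signed z-scores
w = np.sqrt(n)                               # weights ∝ sqrt(sample size)
z_meta = (w * z).sum() / np.sqrt((w ** 2).sum())
p_meta = 2 * norm.sf(abs(z_meta))
print(f"combined z = {z_meta:.2f}, combined p = {p_meta:.3g}")
```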
We examined the role of common genetic variation in schizophrenia in a genome-wide association study of substantial size: a stage 1 discovery sample of 21,856 individuals of European ancestry and a stage 2 replication sample of 29,839 independent subjects. The combined stage 1 and 2 analysis yielded genome-wide significant associations with schizophrenia for seven loci, five of which are new (1p21.3, 2q32.3, 8p23.2, 8q21.3 and 10q24.32-q24.33) and two of which have been previously implicated (6p21.32-p22.1 and 18q21.2). The strongest new finding (p = 1.6 × 10−11) was with rs1625579 within an intron of a putative primary transcript for MIR137 (microRNA 137), a known regulator of neuronal development. Four other schizophrenia loci achieving genome-wide significance contain predicted targets of MIR137, suggesting MIR137-mediated dysregulation as a previously unknown etiologic mechanism in schizophrenia. In a joint analysis with a bipolar disorder sample (16,374 affected individuals and 14,044 controls), three loci reached genome-wide significance: CACNA1C (rs4765905, p = 7.0 × 10−9), ANK3 (rs10994359, p = 2.5 × 10−8) and the ITIH3-ITIH4 region (rs2239547, p = 7.8 × 10−9).
We conducted a combined genome-wide association study (GWAS) of 7,481 individuals with bipolar disorder (cases) and 9,250 controls as part of the Psychiatric GWAS Consortium. Our replication study tested 34 SNPs in 4,496 independent cases with bipolar disorder and 42,422 independent controls and found that 18 of 34 SNPs had p < 0.05, with 31 of 34 SNPs having signals with the same direction of effect (p = 3.8 × 10−7). An analysis of all 11,974 bipolar disorder cases and 51,792 controls confirmed genome-wide significant evidence of association for CACNA1C and identified a new intronic variant in ODZ4. We identified a pathway comprised of subunits of calcium channels enriched in bipolar disorder association intervals. Finally, a combined GWAS analysis of schizophrenia and bipolar disorder yielded strong association evidence for SNPs in CACNA1C and in the region of NEK4-ITIH1-ITIH3-ITIH4. Our replication results imply that increasing sample sizes in bipolar disorder will confirm many additional loci.
Publication bias is a persisting problem in meta-analyses for evidence-based medicine. As a consequence, small studies with large treatment effects are more likely to be reported than studies with a null result, which causes funnel-plot asymmetry.
Here, we investigated treatment effects from 57,186 studies from 1922 to 2019, and overall 99,129 meta-analyses and 5,557 large meta-analyses from the Cochrane Database of Systematic Reviews. Altogether 19% (95%-CI from 18% to 20%) of the meta-analyses demonstrated evidence for asymmetry, but only 3.9% (95%-CI from 3.4% to 4.4%) showed evidence for publication bias after further assessment of funnel plots. Adjusting treatment effects resulted in overall less evidence for efficacy, and treatment effects in some medical specialties or published in prestigious journals were more likely to be statistically significant.
These results suggest that asymmetry driven by exaggerated effects in small studies is a greater concern than publication bias itself.
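One common way such asymmetry is assessed (not necessarily the exact procedure used in this paper) is Egger's regression test: regress each study's standardized effect on its precision and check whether the intercept departs from zero. A sketch with simulated small-study bias:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate a meta-analysis with small-study effects: the true effect is 0.1,
# but small (high-SE) studies only get "published" when their estimate is large.
true_effect, k = 0.1, 400
se = rng.uniform(0.05, 0.5, size=k)
est = rng.normal(true_effect, se)
published = (se < 0.15) | (est > 0.2)        # small studies need big effects
est, se = est[published], se[published]

# Egger's test: regress standardized effect (est/se) on precision (1/se);
# an intercept far from zero indicates funnel-plot asymmetry. A formal test
# would also assess the intercept's statistical significance.
slope, intercept, r, p, stderr = stats.linregress(1.0 / se, est / se)
print(f"Egger intercept = {intercept:.2f} (> 0 suggests asymmetry)")
```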
In statistics, regression toward the mean is the phenomenon whereby, if a sample point of a random variable is extreme, a future point will tend to be closer to the mean or average on further measurement. To avoid making incorrect inferences, regression toward the mean must be considered when designing scientific experiments and interpreting data. Historically, what is now called regression toward the mean was also called reversion to the mean and reversion to mediocrity.
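A quick simulation of the phenomenon, assuming a simple true-score-plus-noise model: select individuals whose first measurement is extreme and observe that their second measurement is, on average, closer to the population mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_score = rng.normal(0, 1, n)            # stable underlying trait
test1 = true_score + rng.normal(0, 1, n)    # two noisy measurements of it
test2 = true_score + rng.normal(0, 1, n)

extreme = test1 > 2.0                        # select on an extreme first score
print("mean of test1 among selected:", test1[extreme].mean())   # ≈ 2.7
print("mean of test2 among selected:", test2[extreme].mean())   # ≈ half that,
# since the test-retest correlation in this model is 0.5: regression to the mean
```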
[On the Busy Beaver function.] The construction of non-computable functions used in this paper is based on the principle that a finite, non-empty set of non-negative integers has a largest element. Also, this principle is used only for sets which are exceptionally well-defined by current standards. No enumeration of computable functions is used, and in this sense the diagonal process is not employed. Thus, it appears that an apparently self-evident principle, of constant use in every area of mathematics, yields non-constructive entities.
In computability theory, the Ackermann function, named after Wilhelm Ackermann, is one of the simplest and earliest-discovered examples of a total computable function that is not primitive recursive. All primitive recursive functions are total and computable, but the Ackermann function illustrates that not all total computable functions are primitive recursive. After Ackermann’s publication of his function, many authors modified it to suit various purposes, so that today “the Ackermann function” may refer to any of numerous variants of the original function. One common version, the two-argument Ackermann–Péter function, is defined for nonnegative integers m and n by the double recursion below.
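The standard Ackermann–Péter recursion is:

$$
A(m, n) =
\begin{cases}
n + 1 & \text{if } m = 0 \\
A(m - 1,\, 1) & \text{if } m > 0 \text{ and } n = 0 \\
A(m - 1,\, A(m, n - 1)) & \text{if } m > 0 \text{ and } n > 0
\end{cases}
$$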
Goldbach's conjecture is one of the oldest and best-known unsolved problems in number theory and all of mathematics. It states that every even whole number greater than 2 is the sum of two prime numbers.
In mathematics, the Riemann hypothesis is a conjecture that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 1⁄2. Many consider it to be the most important unsolved problem in pure mathematics. It is of great interest in number theory because it implies results about the distribution of prime numbers. It was proposed by Bernhard Riemann (1859), after whom it is named.
The well-defined but noncomputable functions Σ(k) and S(k) given by T. Rado as the “score” and “shift number” for the k-state Turing machine “Busy Beaver Game” were previously known only for k ≤ 3. The largest known lower bounds yielding the relations Σ(4) ≥ 13 and S(4) ≥ 107, reported by this author, supported the conjecture that these lower bounds are the actual particular values of the functions for k = 4.
The four-state case has previously been reduced to solving the blank input tape halting problem of only 5,820 individual machines. In this final stage of the k = 4 case, one appears to move into a heuristic level of higher order where it is necessary to treat each machine as representing a distinct theorem.
The remaining set consists of two primary classes in which a machine and its tape are viewed as the representation of a growing string of cellular automata. The proof techniques, embodied in programs, are entirely heuristic, while the inductive proofs, once established by the computer, are completely rigorous and become the key to the proof of the new and original mathematical results: Σ(4) = 13 and S(4) = 107.
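A minimal sketch of what "score" and "shift number" mean operationally: simulate a Turing machine from a blank tape, counting shifts until it halts and ones left on the tape. The transition table below is the well-known 2-state busy beaver champion (Σ(2) = 4, S(2) = 6), used only because the 4-state champion's table is longer; the halting problem for arbitrary machines is, of course, what makes the general function noncomputable.

```python
# Transition table: (state, read symbol) -> (write symbol, head move, next state).
# This is the standard 2-state busy beaver champion, attaining
# Σ(2) = 4 ones and S(2) = 6 shifts.
HALT = "H"
TM_2STATE = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, HALT),
}

def run(machine, max_steps=10_000):
    tape, pos, state, steps = {}, 0, "A", 0
    while state != HALT and steps < max_steps:
        write, move, state = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
    return sum(tape.values()), steps         # ("score", "shift number")

print(run(TM_2STATE))                        # -> (4, 6)
```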
I have only done a little bit of social science research, but it was enough to make me hate people. One study I helped with analyzed whether people from different countries had different answers on a certain psychological test. So we put up a website where people answered some questions about themselves (like “what country are you from?”) and then took the psychological test. And so of course people screwed it up in every conceivable way. There were the merely dumb, like the guy who put “male” as his nationality and “American” as his gender. But there were also the actively malicious or at least annoying, like the people (yes, more than one) who wrote in “Martian”.
I think we all probably know someone like this, maybe a couple people like this. I also think most of us don’t know someone who believes reptilian aliens in human form control all the major nations of Earth. Public Policy Polling’s recent poll on conspiracy theories mostly showed up on my Facebook feed as “4% of Americans believe lizardmen are running the Earth” (of note, an additional 7% of Americans are “not sure” whether lizardmen are running the Earth or not.)
Imagine the situation. You’re at home, eating dinner. You get a call from someone who says “Hello, this is Public Policy Polling. Would you mind answering some questions for us?” You say “Sure”. An extremely dignified sounding voice says—and this is the exact wording of the question—“Do you believe that shape-shifting reptilian people control our world by taking on human form and gaining political power to manipulate our society, or not?” Then it urges you to press 1 if yes, press 2 if no, press 3 if not sure. So first we get the people who think “Wait, was 1 the one for if I did believe in lizardmen, or if I didn’t? I’ll just press 1 and move on to the next question.” Then we get the people who are like “I never heard it before, but if this nice pollster thinks it’s true, I might as well go along with them.” Then we get the people who are all “F#&k you, polling company, I don’t want people calling me when I’m at dinner. You screw with me, I tell you what I’m going to do. I’m going to tell you I believe lizard people are running the planet.” And then we get the people who put “Martian” as their nationality in psychology experiments. Because some men just want to watch the world burn.
Do these three groups total 4% of the US population? Seems plausible.
…But sometimes it’s not some abstruse subtle bias. Sometimes it’s not a good-natured joke. Sometimes people might just be actively working to corrupt your data.
Another link I’ve seen on my Facebook wall a few times is this one: “Are Climate Change Sceptics More Likely To Be Conspiracy Theorists?” It’s based on a paper by Stephen Lewandowsky et al called “NASA Faked The Moon Landing, Therefore Climate Science Is A Hoax—An Analysis Of The Motivated Rejection Of Science”. The paper’s thesis was that climate change skeptics are motivated by conspiracy ideation—a belief that there are large groups of sinister people out to deceive them. This seems sort of reasonable on the face of it—being a climate change skeptic requires going against the belief of the entire scientific establishment. My guess is that there probably is a significant link here waiting to be discovered.…But a bunch of global warming skeptics started re-analyzing the data and coming up with their own interpretations…More interestingly, they found that pretty much all of the link between global warming skepticism and stupidity was a couple of people (there were so few skeptics, and so few conspiracy believers, that these couple of people made up a pretty big proportion of them, and way more than enough to get a “significant” difference with the global warming believers). Further, most of these couple of people had given the maximally skeptical answer to every single question about global warming, and the maximally credulous answer to every single question about conspiracies.
The danger here now seems obvious. Global warming believer blogs publish a link to this study, saying gleefully that it’s going to prove that global warming skeptics are idiots who also think NASA faked the moon landing and the world is run by lizardmen or whatever. Some global warming believers decide to help this process along by pretending to be super-strong global warming skeptics and filling in the stupidest answers they can to every question. The few real global warming skeptics who take the survey aren’t enough signal to completely drown out this noise. Therefore, they do the statistics and triumphantly announce that global warming skepticism is linked to stupid beliefs.
…The lesson from all three of the cases in this post seems clear. When we’re talking about very unpopular beliefs, polls can only give a weak signal. Any possible source of noise—jokesters, cognitive biases, or deliberate misbehavior—can easily overwhelm the signal. Therefore, polls that rely on detecting very weak signals should be taken with a grain of salt.
Electromyography (EMG) is an electrodiagnostic medicine technique for evaluating and recording the electrical activity produced by skeletal muscles. EMG is performed using an instrument called an electromyograph to produce a record called an electromyogram. An electromyograph detects the electric potential generated by muscle cells when these cells are electrically or neurologically activated. The signals can be analyzed to detect medical abnormalities, activation level, or recruitment order, or to analyze the biomechanics of human or animal movement. In computer science, EMG is also used as middleware in gesture recognition, allowing physical actions to be input to a computer as a form of human-computer interaction.
Magicians use misdirection to prevent you from realizing the methods used to create a magical effect, thereby allowing you to experience an apparently impossible event. Magicians have acquired much knowledge about misdirection, and have suggested several taxonomies of misdirection. These describe many of the fundamental principles in misdirection, focusing on how misdirection is achieved by magicians. In this article we review the strengths and weaknesses of past taxonomies, and argue that a more natural way of making sense of misdirection is to focus on the perceptual and cognitive mechanisms involved. Our psychologically-based taxonomy has three basic categories, corresponding to the types of psychological mechanisms affected: perception, memory, and reasoning. Each of these categories is then divided into subcategories based on the mechanisms that control these effects. This new taxonomy can help organize magicians’ knowledge of misdirection in a meaningful way, and facilitate the dialog between magicians and scientists.
But magic’s not easy to pick apart with machines, because it’s not really about the mechanics of your senses. Magic’s about understanding—and then manipulating—how viewers digest the sensory information.
I think you’ll see what I mean if I teach you a few principles magicians employ when they want to alter your perceptions.
Exploit pattern recognition. I magically produce four silver dollars, one at a time, with the back of my hand toward you. Then I allow you to see the palm of my hand empty before a fifth coin appears. As Homo sapiens, you grasp the pattern, and take away the impression that I produced all five coins from a hand whose palm was empty.
Make the secret a lot more trouble than the trick seems worth. You will be fooled by a trick if it involves more time, money and practice than you (or any other sane onlooker) would be willing to invest.
My partner, Penn, and I once produced 500 live cockroaches from a top hat on the desk of talk-show host David Letterman. To prepare this took weeks. We hired an entomologist who provided slow-moving, camera-friendly cockroaches (the kind from under your stove don’t hang around for close-ups) and taught us to pick the bugs up without screaming like preadolescent girls. Then we built a secret compartment out of foam-core (one of the few materials cockroaches can’t cling to) and worked out a devious routine for sneaking the compartment into the hat. More trouble than the trick was worth? To you, probably. But not to magicians.
It’s hard to think critically if you’re laughing. We often follow a secret move immediately with a joke. A viewer has only so much attention to give, and if he’s laughing, his mind is too busy with the joke to backtrack rationally.
Keep the trickery outside the frame. I take off my jacket and toss it aside. Then I reach into your pocket and pull out a tarantula. Getting rid of the jacket was just for my comfort, right? Not exactly. As I doffed the jacket, I copped the spider.
To fool the mind, combine at least two tricks. Every night in Las Vegas, I make a children’s ball come to life like a trained dog. My method—the thing that fools your eye—is to puppeteer the ball with a thread too fine to be seen from the audience. But during the routine, the ball jumps through a wooden hoop several times, and that seems to rule out the possibility of a thread. The hoop is what magicians call misdirection, a second trick that “proves” the first. The hoop is genuine, but the deceptive choreography I use took 18 months to develop (see No. 2—More trouble than it’s worth).
Nothing fools you better than the lie you tell yourself. David P. Abbott was an Omaha magician who invented the basis of my ball trick back in 1907. He used to make a golden ball float around his parlor. After the show, Abbott would absent-mindedly leave the ball on a bookshelf while he went to the kitchen for refreshments. Guests would sneak over, heft the ball and find it was much heavier than a thread could support. So they were mystified. But the ball the audience had seen floating weighed only five ounces. The one on the bookshelf was a heavy duplicate, left out to entice the curious. When a magician lets you notice something on your own, his lie becomes impenetrable.
If you are given a choice, you believe you have acted freely. This is one of the darkest of all psychological secrets. I’ll explain it by incorporating it (and the other six secrets you’ve just learned) into a card trick worthy of the most annoying uncle:
The Effect: I cut a deck of cards a couple of times, and you glimpse flashes of several different cards. I turn the cards facedown and invite you to choose one, memorize it and return it. Now I ask you to name your card. You say (for example), “The queen of hearts.” I take the deck in my mouth, bite down and groan and wiggle to suggest that your card is going down my throat, through my intestines, into my bloodstream and finally into my right foot. I lift that foot and invite you to pull off my shoe and look inside. You find the queen of hearts. You’re amazed. If you happen to pick up the deck later, you’ll find it’s missing the queen of hearts.
The Secret(s): First, the preparation: I slip a queen of hearts in my right shoe, an ace of spades in my left and a three of clubs in my wallet. Then I manufacture an entire deck out of duplicates of those three cards. That takes 18 decks, which is costly and tedious (No. 2—More trouble than it’s worth). When I cut the cards, I let you glimpse a few different faces. You conclude the deck contains 52 different cards (No. 1—Pattern recognition). You think you’ve made a choice, just as when you choose between two candidates preselected by entrenched political parties (No. 7—Choice is not freedom). Now I wiggle the card to my shoe (No. 3—If you’re laughing…). When I lift whichever foot has your card, or invite you to take my wallet from my back pocket, I turn away (No. 4—Outside the frame) and swap the deck for a normal one from which I’d removed all three possible selections (No. 5—Combine two tricks). Then I set the deck down to tempt you to examine it later and notice your card missing (No. 6—The lie you tell yourself).
To learn whether criticism and regulation of research practices have been followed by a reduction of deception or use of more acceptable approaches to deception, the contents of all 1969, 1978, 1986, and 1992 issues of the Journal of Personality and Social Psychology were examined. Deception research was coded according to type of (non)informing (e.g., false informing, consent to deception, no informing), possible harmfulness of deception employed (e.g., powerfulness of induction, morality of the behavior induced, privacy of behavior), method of deception (e.g., bogus device or role, false purpose of study, false feedback), and debriefing employed. Use of confederates has been partly replaced by uses of computers. “Consent” with false informing declined after 1969, then rose in 1992. Changes in the topics studied (e.g., attribution, socialization, personality) largely accounted for the decline in deception in 1978 and 1986. More attention needs to be given to ways of respecting subjects’ autonomy, to appropriate debriefing and desensitizing, and to selecting the most valid and least objectionable deception methods.
We deceived participants into believing a machine could influence their thoughts.
Participants chose arbitrary numbers while inside the machine.
They felt less control and made slower decisions during the apparent influencing.
Some participants reported feeling an unknown source controlling their decisions.
This method may model psychiatric symptoms such as thought insertion.
In order to study the feeling of control over decisions, we told 60 participants that a neuroimaging machine could read and influence their thoughts. While inside a mock brain scanner, participants chose arbitrary numbers in two similar tasks. In the Mind-Reading Task, the scanner appeared to guess the participants’ numbers; in the Mind-Influencing Task, it appeared to influence their choice of numbers. We predicted that participants would feel less voluntary control over their decisions when they believed that the scanner was influencing their choices. As predicted, participants felt less control and made slower decisions in the Mind-Influencing Task compared to the Mind-Reading Task. A second study replicated these findings. Participants’ experience of the ostensible influence varied, with some reporting an unknown source directing them towards specific numbers. This simulated thought insertion paradigm can therefore influence feelings of voluntary control and may help model symptoms of mental disorders. [Keywords: sense of agency, thought insertion, volition, deception, magic, phenomenology]
Implanting an unfamiliar idea in the mind can prevent people from finding an obvious one.
Highlighting false solutions diverts suspicion away from the secret of the trick.
Even if subjects search for alternative solutions, most of them fail to discover it.
In everyday life, several factors limit the human capacity to think differently. The present study shows that implanting an unlikely and unfamiliar idea in the mind can prevent participants from finding a more obvious one. To demonstrate this, we used a technique often adopted by magicians to misrepresent the method of a trick: the false solution. Our results reveal that a single exposure to an unlikely false solution (the magician can influence the spectator’s choice with his gesture) before the presentation of a card trick can prevent participants from finding the real (more obvious) secret of a trick, even if they are invited to search for an alternative solution. [Keywords: magic, fixing effect, Einstellung, problem solving, illusion]
Experimental deception has not been seriously examined in terms of its impact on reproducible science. I demonstrate, using data from the Open Science Collaboration’s Reproducibility Project (2015), that experiments involving deception have a higher probability of not replicating and have smaller effect sizes compared to experiments that do not have deception procedures. This trend is possibly due to missing information about the context and performance of agents in the studies in which the original effects were generated, leading to either compromised internal validity, or an incomplete specification and control of variables in replication studies. Of special interest are the mechanisms by which deceptions are implemented and how these present challenges for the efficient transmission of critical information from experimenter to participant. I rehearse possible frameworks that might form the basis of a future research program on experimental deception and make some recommendations as to how such a program might be initiated.
Rationale: Is it possible to have a psychedelic experience from a placebo alone? Most psychedelic studies find few effects in the placebo control group, yet these effects may have been obscured by the study design, setting, or analysis decisions.
Objective: We examined individual variation in placebo effects in a naturalistic environment resembling a typical psychedelic party.
Methods: 33 students completed a single-arm study ostensibly examining how a psychedelic drug affects creativity. The 4-h study took place in a group setting with music, paintings, coloured lights, and visual projections. Participants consumed a placebo that we described as a drug resembling psilocybin, which is found in psychedelic mushrooms. To boost expectations, confederates subtly acted out the stated effects of the drug and participants were led to believe that there was no placebo control group. The participants later completed the 5-Dimensional Altered States of Consciousness Rating Scale, which measures changes in conscious experience.
Results: There was considerable individual variation in the placebo effects; many participants reported no changes while others showed effects with magnitudes typically associated with moderate or high doses of psilocybin. In addition, the majority (61%) of participants verbally reported some effect of the drug. Several stated that they saw the paintings on the walls “move” or “reshape” themselves, others felt “heavy…as if gravity [had] a stronger hold”, and one had a “come down” before another “wave” hit her.
Conclusion: Understanding how context and expectations promote psychedelic-like effects, even without the drug, will help researchers to isolate drug effects and clinicians to maximise their therapeutic potential.
…In the second sample, before the debriefing, we asked participants to guess whether they had taken a psychedelic, a placebo, or whether they were uncertain. Overall, 35% reported being certain they had taken a placebo, 12% were certain that they had taken a psychedelic, and the rest (53%) were uncertain. In the first sample, we did not ask this question, but the same number of people spontaneously reported being certain that they had taken a psychedelic drug. During the debriefing, when we revealed the placebo nature of the study, many participants appeared shocked. Several gasped and started laughing. One stated, “It’s very funny!”, and another replied, “It’s sad!” One of the participants who had sat with a group near the paintings throughout the study asked, “So we were all sober and just watching these paintings for 45 minutes‽”
[“This is a remarkable study, and probably the most elaborate placebo ever reported. But how well did the trick work? The authors say that after they revealed the truth, some of the participants expressed shock. However, 35% of them said they were “certain” they had taken a placebo when quizzed just before the debriefing. Only 12% were “certain” that they’d taken a real psychedelic drug, which suggests that the deception was only partially successful.
Some of the participants did report very strong effects on a questionnaire of ‘psychedelic effects’. However, I noticed that the effects reported tended to be the more abstract kind, such as “insight” and “bliss”. In terms of actual hallucinogenic effects like ‘complex imagery’ and ‘elementary imagery’ (i.e. seeing things), no participants reported effects equal to even a low dose of LSD, let alone a stronger dose. See the rather confusing Figure 2 for details.” —Neuroskeptic]
Background: The primary goal of this study was to establish a paradigm for credibly administering placebo alcohol to underage drinkers. We also sought to create a new, valid procedure for establishing placebo alcohol believability.
Method: Participants were 138 American college students (66.7% female) predominantly (90.0%) under the legal drinking age. Groups of 2–3 participants and one same-sex confederate consumed mixed drinks, purportedly containing alcohol, ad-lib in a naturalistic bar-laboratory for 20 minutes. All beverages, however, were non-alcoholic but we used visual, olfactory, and taste cues to maximize placebo credibility. Also, the confederate made two scripted statements designed to increase the perception of drinking real alcohol. After the drinking portion, participants responded to survey items related to alcohol consumption and intoxication. Next, they were individually debriefed, with open-ended responses used to make a determination of whether the participant was deceived with respect to placebo alcohol.
Results: All participants estimated consuming some amount of alcohol. However, using a more conservative criterion for estimating alcohol believability based on the debrief, 89.1% of participants were classified as deceived. Deceived participants were much more likely than non-deceived participants to estimate having a positive Blood Alcohol Content, and to say their current level of intoxication was typical given the amount of alcohol consumed.
Discussion: Credibly administering placebo alcohol to underage drinkers is possible. This approach carries great potential for future laboratory work. In addition, the methodology used here to classify participants as deceived or not deceived appears valid based on self-reported BAC estimation and intoxication levels. [Keywords: alcohol, placebo, confederate, bar-laboratory, college]
Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics.
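A canonical illustration of the method (not tied to any particular application above): computing the length of the longest common subsequence of two strings by filling a table of overlapping subproblems.

```python
def lcs_length(a: str, b: str) -> int:
    """Dynamic programming: dp[i][j] is the length of the longest common
    subsequence of a[:i] and b[:j], built up from smaller subproblems."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

print(lcs_length("dynamic", "programming"))  # 3 ('a', 'm', 'i')
```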
The computer program pdfTeX is an extension of Knuth's typesetting program TeX, and was originally written and developed into a publicly usable product by Hàn Thế Thành as a part of the work for his PhD thesis at the Faculty of Informatics, Masaryk University, Brno, Czech Republic. The idea of making this extension to TeX was conceived during the early 1990s, when Jiří Zlatuška and Phil Taylor discussed some developmental ideas with Donald Knuth at Stanford University. Knuth later met Hàn Thế Thành in Brno during his visit to the Faculty of Informatics to receive an honorary doctorate from Masaryk University.
TeX (stylized within the system with a lowered ‘E’) is a typesetting system which was designed and mostly written by Donald Knuth and released in 1978. TeX is a popular means of typesetting complex mathematical formulae; it has been noted as one of the most sophisticated digital typographical systems.
Hideaki Anno is a Japanese animator and filmmaker. He is best known for creating the anime series Neon Genesis Evangelion. His style has become defined by his incorporation of postmodernism and the extensive portrayal of characters' thoughts and emotions, often through unconventional scenes presenting the mental deconstruction of those characters.
In the form in which it was originally expounded, the anthropic principle was presented as a warning to astrophysical and cosmological theorists of the risk of error in the interpretation of astronomical and cosmological information unless due account is taken of the biological restraints under which the information was acquired. However, the converse message is also valid: biological theorists also run the risk of error in the interpretation of the evolutionary record unless they take due heed of the astrophysical restraints under which evolution took place. After an introductory discussion of the ordinary (‘weak’) anthropic principle and of its more contestable (‘strong’) analogue, a new application of the former to the problem of the evolution of terrestrial life is presented. It is shown that the evidence suggests that the evolutionary chain included at least one but probably not more than two links that were highly improbable (a priori) in the available time interval.
If we are not to conclude that most planets like Earth have evolved life as intelligent as we are, we must presume Earth is not random. This selection effect, however, also implies that the origin of life need not be as easy as the early appearance of life on Earth suggests. If a series of major evolutionary transitions were required to produce intelligent life, selection implies that a subset of these were “critical steps”, with durations that are similarly distributed. The time remaining from now until simple life is no longer possible on Earth must also be similarly distributed. I show how these results provide timing tests to constrain models of critical evolutionary transitions.
The prediction that (due to the limited amount of hydrogen available as fuel in the Sun) the future duration of our favourable terrestrial environment will be short (compared with the present age of the Earth) has been interpreted as evidence for a hard step scenario. This means that some of the essential steps (such as the development of eukaryotes) in the evolution process leading to the ultimate emergence of intelligent life would have been hard, in the sense of being against the odds in the available time, so that they are unlikely to have been achieved in most of the earth-like planets that may one day be discovered in nearby extra-solar systems.
It was originally estimated that only one or two of the essential evolutionary steps had to have been hard in this sense, but it has become apparent that this figure may need upward revision, because recent studies of climatic instability suggest that the possible future duration of our biologically favourable environment may be shorter than had been supposed, only about 1 gigayear rather than 5.
On the basis of the statistical requirement of roughly equal spacing between hard steps, it is argued that the best fit with the fossil record is now obtainable by postulating the number of hard steps to be 5, if our evolution was exclusively terrestrial, or 6 if, as now seems very plausible, the first step occurred on Mars.
Structurally complex life and intelligence evolved late on Earth; models for the evolution of global temperature suggest that, due to the increasing solar luminosity, the future life span of the (eukaryote) biosphere will be “only” about another billion years, a short time compared to the ~4 Ga since life began. A simple stochastic model (Carter 1983) suggests that this timing might be governed by the necessity to pass a small number, n, of very difficult evolutionary steps, with n < 10 and a best guess of n = 4, in order for intelligent observers like ourselves to evolve.
Here I extend the model analysis to derive probability distributions for each step. Past steps should tend to be evenly spaced through Earth’s history, and this is consistent with identification of the steps with some of the major transitions in the evolution of life on Earth. A complementary approach, identifying the critical steps with major reorganizations in Earth’s biogeochemical cycles, suggests that the Archean-Proterozoic and Proterozoic-Phanerozoic transitions might be identified with critical steps.
The success of the model lends support to a “Rare Earth” hypothesis (Rare Earth: Why Complex Life Is Uncommon in the Universe, Ward and Brownlee, 2000): structurally complex life is separated from prokaryotes by several very unlikely steps and, hence, will be much less common than prokaryotes. Intelligence is one further unlikely step, so it is much less common still.
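The "evenly spaced" prediction of this critical-steps model can be checked with a toy simulation (assumptions: a normalized habitable window, a handful of sequential exponential waiting times with means longer than the window, conditioned on all steps completing in time):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, mean_wait = 1.0, 4, 3.0      # habitable window (normalized), 4 hard steps,
                                   # each step's expected waiting time > T

# Rejection-sample many candidate histories at once: sequential exponential
# waiting times, kept only if all n steps complete within the window T.
waits = rng.exponential(mean_wait, size=(2_000_000, n))
times = np.cumsum(waits, axis=1)                 # completion time of each step
successes = times[times[:, -1] < T]
print(len(successes), "successful histories")

# Conditional on success, the k-th step's mean completion time is roughly
# k/(n+1) * T, i.e. the hard steps end up approximately evenly spaced.
print(successes.mean(axis=0))                    # ≈ [0.2, 0.4, 0.6, 0.8]
```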
A simple stochastic model for evolution, based upon the need to pass a sequence of n critical steps (Carter 1983, Watson 2008) is applied to both terrestrial and extraterrestrial origins of life.
In the former case, the time at which humans have emerged during the habitable period of the Earth suggests a value of n = 4. Progressively adding earlier evolutionary transitions (Maynard Smith and Szathmary, 1995) gives an optimum fit when n = 5, implying either that their initial transitions are not critical or that habitability began around 6 Ga ago.
The origin of life on Mars or elsewhere within the Solar System is excluded by the latter case and the simple anthropic argument is that extraterrestrial life is scarce in the Universe because it does not have time to evolve. Alternatively, the timescale can be extended if the migration of basic progenotic material to Earth is possible. If extra transitions are included in the model to allow for Earth migration, then the start of habitability needs to be even earlier than 6 Ga ago. Our present understanding of Galactic habitability and dynamics does not exclude this possibility.
We conclude that Galactic punctuated equilibrium (Cirkovic et al. 2009), proposed as a way round the anthropic problem, is not the only way of making life more common in the Galaxy.
Life arose on Earth sometime in the first few hundred million years after the young planet had cooled to the point that it could support water-based organisms on its surface. The early emergence of life on Earth has been taken as evidence that the probability of abiogenesis is high, if starting from young Earth-like conditions. We revisit this argument quantitatively in a Bayesian statistical framework. By constructing a simple model of the probability of abiogenesis, we calculate a Bayesian estimate of its posterior probability, given the data that life emerged fairly early in Earth’s history and that, billions of years later, curious creatures noted this fact and considered its implications. We find that, given only this very limited empirical information, the choice of Bayesian prior for the abiogenesis probability parameter has a dominant influence on the computed posterior probability. Although terrestrial life’s early emergence provides evidence that life might be abundant in the universe if early-Earth-like conditions are common, the evidence is inconclusive and indeed is consistent with an arbitrarily low intrinsic probability of abiogenesis for plausible uninformative priors. Finding a single case of life arising independently of our lineage (on Earth, elsewhere in the solar system, or on an extrasolar planet) would provide much stronger evidence that abiogenesis is not extremely rare in the universe.
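A toy version of the prior-sensitivity point, loosely following the paper's setup (the timescales, grid bounds, and thresholds below are stylized choices for illustration, not the paper's values): model abiogenesis as a Poisson process with unknown rate λ, condition on life having emerged early enough for observers to evolve, and compare posteriors under a uniform versus a log-uniform prior.

```python
import numpy as np

# Stylized numbers (Gyr): life is observed to have emerged within t_obs of
# habitability, and it had to emerge by t_max for observers to exist at all.
t_obs, t_max = 0.5, 3.5
lam = np.logspace(-12, 3, 40_000)             # abiogenesis rate λ, events/Gyr

# Likelihood of early emergence, conditioned on the anthropic requirement
# that life emerged in time for observers: P(by t_obs | by t_max, λ).
likelihood = np.expm1(-lam * t_obs) / np.expm1(-lam * t_max)

def posterior_prob(prior, region):
    post = prior * likelihood
    post /= np.trapz(post, lam)
    return np.trapz(np.where(region, post, 0.0), lam)

rare = lam < 0.01                             # "abiogenesis is intrinsically rare"
print("P(rare | data), uniform-in-λ prior:",
      round(posterior_prob(np.ones_like(lam), rare), 3))   # ≈ 0
print("P(rare | data), log-uniform prior: ",
      round(posterior_prob(1.0 / lam, rare), 3))            # substantially > 0
```

The two priors give very different answers to the same question, which is the paper's point: with only the single datum of early emergence, the prior dominates the posterior.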
Goodreads is an American social cataloging website that allows individuals to search its database of books, annotations, quotes, and reviews. Users can sign up and register books to generate library catalogs and reading lists. They can also create their own groups of book suggestions, surveys, polls, blogs, and discussions. The website's offices are located in San Francisco. The company is owned by the online retailer Amazon.
This page is a compilation of my anime/manga reviews; it is compiled from my MyAnimeList account & newsletter. Reviews are sorted by rating in descending order.
This is a compilation of my film/television/theater reviews; it is compiled from my newsletter. Reviews are sorted by rating in descending order.
“Attention Is All You Need”, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017-06-12):
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
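The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; a minimal single-head NumPy sketch (toy sizes, no masking or multi-head split):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: each query position attends to all key positions,
    weighting the values by softmax(Q K^T / sqrt(d_k))."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n_q, d_v) attended values

rng = np.random.default_rng(0)
n, d_model = 6, 16                                   # toy sequence length & width
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)                                     # (6, 16)
```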
We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a "pretraining" step for a later supervised sequence learning algorithm. In other words, the parameters obtained from the unsupervised step can be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after being pretrained with the two approaches are more stable and generalize better. With pretraining, we are able to train long short term memory recurrent networks up to a few hundred timesteps, thereby achieving strong performance in many text classification tasks, such as IMDB, DBpedia and 20 Newsgroups.
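A toy PyTorch sketch of the second approach (sizes invented, one gradient step only): a sequence autoencoder compresses the input into the LSTM's final state, a decoder is trained to reproduce the sequence, and the pretrained encoder then initializes a supervised classifier.

```python
import torch
import torch.nn as nn

vocab, emb, hid, n_classes = 1000, 64, 128, 2

class SeqAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        _, state = self.encoder(self.embed(tokens))  # compress sequence into state
        # Teacher-forced decoding: feed the sequence shifted right (0 = <bos>)
        # and predict the original tokens back.
        shifted = torch.cat([torch.zeros_like(tokens[:, :1]), tokens[:, :-1]], dim=1)
        dec_out, _ = self.decoder(self.embed(shifted), state)
        return self.out(dec_out)                     # logits over the vocabulary

ae = SeqAutoencoder()
tokens = torch.randint(1, vocab, (8, 20))
loss = nn.functional.cross_entropy(ae(tokens).reshape(-1, vocab), tokens.reshape(-1))
loss.backward()                                      # one unsupervised pretraining step

# After pretraining, reuse the embedding + encoder as the starting point of a
# supervised text classifier:
clf_head = nn.Linear(hid, n_classes)
_, (h_n, _) = ae.encoder(ae.embed(tokens))
class_logits = clf_head(h_n[-1])                     # (8, n_classes)
```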
The Cochrane Library is a collection of databases in medicine and other healthcare specialties provided by Cochrane and other organizations. At its core is the collection of Cochrane Reviews, a database of systematic reviews and meta-analyses which summarize and interpret the results of medical research. The Cochrane Library aims to make the results of well-conducted controlled trials readily available and is a key resource in evidence-based medicine.
Functional fixedness is a cognitive bias that limits a person to using an object only in the way it is traditionally used. The concept of functional fixedness originated in Gestalt psychology, a movement in psychology that emphasizes holistic processing. Karl Duncker defined functional fixedness as a mental block against using an object in a new way that is required to solve a problem. This "block" limits the ability of an individual to use components given to them to complete a task, as they cannot move past the original purpose of those components. For example, someone who needs a paperweight but has only a hammer may not see how the hammer can be used as a paperweight. Functional fixedness is this inability to see a hammer's use as anything other than pounding nails; the person cannot think of the hammer in any way other than its conventional function.
Einstellung is the development of a mechanized state of mind. Often called a problem-solving set, Einstellung refers to a person's predisposition to solve a given problem in a specific manner even though better or more appropriate methods of solving the problem exist.
Empirically analyzing empirical evidence: One of the central goals in any scientific endeavor is to understand causality. Experiments that seek to demonstrate a cause/effect relation most often manipulate the postulated causal factor. Aarts et al 2015 describe the replication of 100 experiments reported in papers published in 2008 in 3 high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study.
Introduction: Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. Scientific claims should not gain credence because of the status or authority of their originator but by the replicability of their supporting evidence. Even research of exemplary quality may have irreproducible empirical findings because of random or systematic error.
Rationale: There is concern about the rate and predictors of reproducibility, but limited evidence. Potentially problematic practices include selective reporting, selective analysis, and insufficient specification of the conditions necessary or sufficient to obtain the results. Direct replication is the attempt to recreate the conditions believed sufficient for obtaining a previously observed finding and is the means of establishing reproducibility of a finding with new data. We conducted a large-scale, collaborative effort to obtain an initial estimate of the reproducibility of psychological science.
Results: We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. There is no single standard for evaluating replication success. Here, we evaluated reproducibility using statistical significance and P values, effect sizes, subjective assessments of replication teams, and meta-analysis of effect sizes. The mean effect size (r) of the replication effects (M_r = 0.197, SD = 0.257) was half the magnitude of the mean effect size of the original effects (M_r = 0.403, SD = 0.188), representing a substantial decline. 97% of original studies had statistically-significant results (p < 0.05). 36% of replications had statistically-significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically-significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
Conclusion: No single indicator sufficiently describes replication success, and the five indicators examined here are not the only ways to evaluate reproducibility. Nonetheless, collectively these results offer a clear conclusion: A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors, review in advance for methodological fidelity, and high statistical power to detect the original effect sizes. Moreover, correlational evidence is consistent with the conclusion that variation in the strength of initial evidence (such as original P value) was more predictive of replication success than variation in the characteristics of the teams conducting the research (such as experience and expertise). The latter factors certainly can influence replication success, but they did not appear to do so here.
Reproducibility is not well understood because the incentives for individual scientists prioritize novelty over replication. Innovation is the engine of discovery and is vital for a productive, effective scientific enterprise. However, innovative ideas become old news fast. Journal reviewers and editors may dismiss a new test of a published idea as unoriginal. The claim that “we already know this” belies the uncertainty of scientific evidence. Innovation points out paths that are possible; replication points out paths that are likely; progress relies on both. Replication can increase certainty when findings are reproduced and promote innovation when they are not. This project provides accumulating evidence for many findings in psychological research and suggests that there is still more work to do to verify whether we know what we think we know.
Figure 1: Original study effect size versus replication effect size (correlation coefficients). Diagonal line represents replication effect size equal to original effect size. Dotted line represents replication effect size of 0. Points below the dotted line were effects in the opposite direction of the original. Density plots are separated by statistically-significant (blue) and non-statistically-significant (red) effects.
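One way to see why “high-powered” replications could still come up statistically-significant only 36% of the time: the designs were powered against the original effect sizes, not the much smaller effects the replications actually found. A back-of-the-envelope check (my illustration, not the paper’s analysis), using the two mean effect sizes above, the Fisher-z approximation for correlations, and an assumed two-sided α = 0.05:

```python
import numpy as np
from scipy.stats import norm

def power_for_r(rho, n, alpha=0.05):
    """Approximate power of a two-sided test of r = 0 at sample size n (Fisher-z approximation)."""
    delta = np.arctanh(rho) * np.sqrt(n - 3)             # noncentrality of the z-transformed r
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - delta) + norm.cdf(-z_crit - delta)

# Smallest sample size giving ~90% power against the original mean effect, r = 0.403:
n = 4
while power_for_r(0.403, n) < 0.90:
    n += 1
print(n, round(power_for_r(0.403, n), 2))    # ~61 participants, ~0.90 power vs the original effect
print(round(power_for_r(0.197, n), 2))       # ~0.33 power if the true effect is the replication mean, r = 0.197
```

Under these assumptions, a study powered at 90% against r = 0.403 has only about one-in-three power against r = 0.197, in line with the observed replication rate.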
[Discussion of what Carter names the “anthropic principle”: “what we can expect to observe must be restricted by the conditions necessary for our presence as observers. (Although our situation is not necessarily central, it is inevitably privileged to some extent.)”
Carter appeals to this to explain various “large number coincidences” in particle physics & cosmology: relationships between stellar masses & the proton mass, the Hubble expansion rate & the age of the universe, radiation pressure permitting solid matter, and the value of the gravitational constant. These coincidences have been used to justify exotic theories of physics (varying constants, etc.), but are in fact implied by our existence.
While that doesn’t necessarily ‘explain’ the relationships, this basic statistical/philosophical requirement greatly undermines the appeal of such theories, particularly given the plausibility of various kinds of multiverse and ensemble theories. There may be no particular reason for any specific ‘large number coincidence’ other than (a) it was possible and (b) it is required for our existence so we could not observe otherwise.]
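The selection effect is easy to see in a toy Monte Carlo (my illustration, not Carter’s calculation): draw a dimensionless “constant” from a broad ensemble, let observers arise only inside a narrow, assumed habitable window, and the value conditional on being observed at all clusters tightly around that window, even though nothing fine-tuned the ensemble itself.

```python
import numpy as np

rng = np.random.default_rng(0)
constant = rng.uniform(0, 1000, size=1_000_000)      # ensemble of "universes" with a random constant
habitable = (constant > 99) & (constant < 101)       # assumed narrow window that permits observers

print(constant.mean())                # ~500: the typical universe shows no "coincidence"
print(constant[habitable].mean())     # ~100: what observers actually measure
print(habitable.mean())               # ~0.002: most universes contain no one to notice
```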
The consequences of creating a mouse-scale brain—as opposed to a ‘mouse-in-a-can’—will be weird: even AI experts have poor imaginations and tend to imagine AI minds as either glorified calculators or humans-in-a-robot-suit, rather than trying to imagine a diverse Technium richer than biological ecosystems, filled with minds with bizarre patterns of weaknesses & strengths (GPT-3/AI-Dungeon or AlphaGo/AlphaStar/OA5 ‘delusions’ are just the start), evolving at a computer pace, filling & creating new niches in a global economy. “The street finds its own use for things”—I couldn’t predict what somebody in China would do with my anime GANs last year, so how can anyone hope to predict what everyone can do with a mind decades from now? If your vision of the future is not weird & disturbing (and orthogonal to current culture wars), you aren’t trying.↩︎