“Solving Rubik’s Cube With A Robot Hand”, Akkaya et al 2019 (blog; video: Dactyl followup with improved curriculum-learning of machine-generated domain randomizations in a simulation; the diversity of simulations leads to emergent meta-learning, and zero-shot sim2real transfer & robustness to untrained interference. An example of harder tasks leading to better learning (see previously Clune 2019), because neural nets are lazy!)
“How Airbnb Is Silently Changing Himalayan Villages”, Shanu Athiparambath (“All fixed, fast-frozen relations, with their train of ancient and venerable prejudices and opinions, are swept away, all new-formed ones become antiquated before they can ossify. All that is solid melts into air…”)
Roar (1981 film) (“Roar used real lions during filming, resulting in at least 70 cast and crew being injured. A crew member was scalped, another had his throat bitten out, and much of the footage of attacks was used in the final cut of the film. The blood seen in the movie is real.”)
The Beginning Was the End, Maerth 1971 (“eating brains produces an aphrodisiac effect & this caused apes to become addicted, organizing brain hunts wherein the males of another tribe were eaten & females raped in a frenzy of brain-induced sex & violence…The book contains no references whatsoever, based alternately on alleged conversations with present-day cannibals, the eating of ape brain by the author and direct insight from deep meditation.”)
Newsletter tag: archive of all issues back to 2013 for the gwern.net newsletter (monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.)
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up reasonably-interesting changes and send them out to the mailing list along with a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as an RSS feed). Submissions are categorized similarly to the monthly newsletter and typically will be collated there.
Genome-wide association studies often explore links between particular genes and phenotypes of interest. Known genetic variants, however, are responsible for only a small fraction of human lifespan variation evident from genetic twin studies. To account for the missing longevity variance, we hypothesized that the cumulative effect of deleterious variants may affect human longevity. Here, we report that the burden of rarest protein-truncating variants (PTVs) negatively impacts both human healthspan and lifespan in two large independent cohorts. Longer-living subjects have both fewer rarest PTVs and less damaging PTVs. In contrast, we show that the burden of frequent PTVs and rare non-PTVs is less deleterious, lacking association with longevity. The combined effect of rare PTVs is similar to that of known variants associated with longer lifespan and accounts for 1–2 years of lifespan variability. We further find that somatic accumulation of PTVs accounts for a minute fraction of mortality and morbidity acceleration and hence provides little support for its causal role in aging. Thus, damaging mutations, germline and somatic, can only contribute to aging as a result of higher-order effects including interactions of multiple forms of damage.
“Associations of autozygosity with a broad range of human phenotypes”, David W. Clark, Yukinori Okada, Kristjan H. S. Moore, Dan Mason, Nicola Pirastu, Ilaria Gandin, Hannele Mattsson, Catriona L. K. Barnes, Kuang Lin, Jing Hua Zhao, Patrick Deelen, Rebecca Rohde, Claudia Schurmann, Xiuqing Guo, Franco Giulianini, Weihua Zhang, Carolina Medina-Gomez, Robert Karlsson, Yanchun Bao, Traci M. Bartz, Clemens Baumbach, Ginevra Biino, Matthew J. Bixley, Marco Brumat, Jin-Fang Chai, Tanguy Corre, Diana L. Cousminer, Annelot M. Dekker, David A. Eccles, Kristel R. van Eijk, Christian Fuchsberger, He Gao, Marine Germain, Scott D. Gordon, Hugoline G. de Haan, Sarah E. Harris, Edith Hofer, Alicia Huerta-Chagoya, Catherine Igartua, Iris E. Jansen, Yucheng Jia, Tim Kacprowski, Torgny Karlsson, Marcus E. Kleber, Shengchao Alfred Li, Ruifang Li-Gao, Anubha Mahajan, Koichi Matsuda, Karina Meidtner, Weihua Meng, May E. Montasser, Peter J. van der Most, Matthias Munz, Teresa Nutile, Teemu Palviainen, Gauri Prasad, Rashmi B. Prasad, Tallapragada Divya Sri Priyanka, Federica Rizzi, Erika Salvi, Bishwa R. Sapkota, Daniel Shriner, Line Skotte, Melissa C. Smart, Albert Vernon Smith, Ashley van der Spek, Cassandra N. Spracklen, Rona J. Strawbridge, Salman M. Tajuddin, Stella Trompet, Constance Turman, Niek Verweij, Clara Viberti, Lihua Wang, Helen R. Warren, Robyn E. Wootton, Lisa R. Yanek, Jie Yao, Noha A. Yousri, Wei Zhao, Adebowale A. Adeyemo, Saima Afaq, Carlos Alberto Aguilar-Salinas, Masato Akiyama, Matthew L. Albert, Matthew A. Allison, Maris Alver, Tin Aung, Fereidoun Azizi, Amy R. Bentley, Heiner Boeing, Eric Boerwinkle, Judith B. Borja, Gert J. de Borst, Erwin P. Bottinger, Linda Broer, Harry Campbell, Stephen Chanock, Miao-Li Chee, Guanjie Chen, Yii-Der I. Chen, Zhengming Chen, Yen-Feng Chiu, Massimiliano Cocca, Francis S. Collins, Maria Pina Concas, Janie Corley, Giovanni Cugliari, Rob M. van Dam, Anna Damulina, Maryam S. Daneshpour, Felix R. Day, Graciela E. Delgado, Klodian Dhana, Alexander S. F. Doney, Marcus Dörr, Ayo P. Doumatey, Nduna Dzimiri, S. Sunna Ebenesersdóttir, Joshua Elliott, Paul Elliott, Ralf Ewert, Janine F. Felix, Krista Fischer, Barry I. Freedman, Giorgia Girotto, Anuj Goel, Martin Gögele, Mark O. Goodarzi, Mariaelisa Graff, Einat Granot-Hershkovitz, Francine Grodstein, Simonetta Guarrera, Daniel F. Gudbjartsson, Kamran Guity, Bjarni Gunnarsson, Yu Guo, Saskia P. Hagenaars, Christopher A. Haiman, Avner Halevy, Tamara B. Harris, Mehdi Hedayati, David A. van Heel, Makoto Hirata, Imo Höfer, Chao Agnes Hsiung, Jinyan Huang, Yi-Jen Hung, M. Arfan Ikram, Anuradha Jagadeesan, Pekka Jousilahti, Yoichiro Kamatani, Masahiro Kanai, Nicola D. Kerrison, Thorsten Kessler, Kay-Tee Khaw, Chiea Chuen Khor, Dominique P. V. de Kleijn, Woon-Puay Koh, Ivana Kolcic, Peter Kraft, Bernhard K. Krämer, Zoltán Kutalik, Johanna Kuusisto, Claudia Langenberg, Lenore J. Launer, Deborah A. Lawlor, I-Te Lee, Wen-Jane Lee, Markus M. Lerch, Liming Li, Jianjun Liu, Marie Loh, Stephanie J. London, Stephanie Loomis, Yingchang Lu, Jian’an Luan, Reedik Mägi, Ani W. Manichaikul, Paolo Manunta, Gísli Másson, Nana Matoba, Xue W. Mei, Christa Meisinger, Thomas Meitinger, Massimo Mezzavilla, Lili Milani, Iona Y. Millwood, Yukihide Momozawa, Amy Moore, Pierre-Emmanuel Morange, Hortensia Moreno-Macías, Trevor A. Mori, Alanna C. Morrison, Taulant Muka, Yoshinori Murakami, Alison D. Murray, Renée de Mutsert, Josyf C. Mychaleckyj, Mike A. Nalls, Matthias Nauck, Matt J. Neville, Ilja M. Nolte, Ken K. 
Ong, Lorena Orozco, Sandosh Padmanabhan, Gunnar Pálsson, James S. Pankow, Cristian Pattaro, Alison Pattie, Ozren Polasek, Neil Poulter, Peter P. Pramstaller, Lluis Quintana-Murci, Katri Räikkönen, Sarju Ralhan, Dabeeru C. Rao, Wouter van Rheenen, Stephen S. Rich, Paul M. Ridker, Cornelius A. Rietveld, Antonietta Robino, Frank J. A van Rooij, Daniela Ruggiero, Yasaman Saba, Charumathi Sabanayagam, Maria Sabater-Lleal, Cinzia Felicita Sala, Veikko Salomaa, Kevin Sandow, Helena Schmidt, Laura J. Scott, William R. Scott, Bahareh Sedaghati-Khayat, Bengt Sennblad, Jessica van Setten, Peter J. Sever, Wayne H-H Sheu, Yuan Shi, Smeeta Shrestha, Sharvari Rahul Shukla, Jon K. Sigurdsson, Timo Tonis Sikka, Jai Rup Singh, Blair H. Smith, Alena Stančáková, Alice Stanton, John M. Starr, Lilja Stefansdottir, Leon Straker, Patrick Sulem, Gardar Sveinbjornsson, Morris A. Swertz, Adele M. Taylor, Kent D. Taylor, Natalie Terzikhan, Yih-Chung Tham, Gudmar Thorleifsson, Unnur Thorsteinsdottir, Annika Tillander, Russell P. Tracy, Teresa Tusié-Luna, Ioanna Tzoulaki, Simona Vaccargiu, Jagadish Vangipurapu, Jan H. Veldink, Veronique Vitart, Uwe Völker, Eero Vuoksimaa, Salma M. Wakil, Melanie Waldenberger, Gurpreet S. Wander, Ya Xing Wang, Nicholas J. Wareham, Sarah Wild, Chittaranjan S. Yajnik, Jian-Min Yuan, Lingyao Zeng, Liang Zhang, Jie Zhou, Najaf Amin, Folkert W. Asselbergs, Stephan J. L. Bakker, Diane M. Becker, Benjamin Lehne, David A. Bennett, Leonard H. van den Berg, Sonja I. Berndt, Dwaipayan Bharadwaj, Lawrence F. Bielak, Murielle Bochud, Mike Boehnke, Claude Bouchard, Jonathan P. Bradfield, Jennifer A. Brody, Archie Campbell, Shai Carmi, Mark J. Caulfield, David Cesarini, John C. Chambers, Giriraj Ratan Chandak, Ching-Yu Cheng, Marina Ciullo, Marilyn Cornelis, Daniele Cusi, George Davey Smith, Ian J. Deary, Rajkumar Dorajoo, Cornelia M. van Duijn, David Ellinghaus, Jeanette Erdmann, Johan G. Eriksson, Evangelos Evangelou, Michele K. Evans, Jessica D. Faul, Bjarke Feenstra, Mary Feitosa, Sylvain Foisy, Andre Franke, Yechiel Friedlander, Paolo Gasparini, Christian Gieger, Clicerio Gonzalez, Philippe Goyette, Struan F. A Grant, Lyn R. Griffiths, Leif Groop, Vilmundur Gudnason, Ulf Gyllensten, Hakon Hakonarson, Anders Hamsten, Pim van der Harst, Chew-Kiat Heng, Andrew A. Hicks, Hagit Hochner, Heikki Huikuri, Steven C. Hunt, Vincent W. V. Jaddoe, Philip L. De Jager, Magnus Johannesson, Åsa Johansson, Jost B. Jonas, J. Wouter Jukema, Juhani Junttila, Jaakko Kaprio, Sharon L. R. Kardia, Fredrik Karpe, Meena Kumari, Markku Laakso, Sander W. van der Laan, Jari Lahti, Matthias Laudes, Rodney A. Lea, Wolfgang Lieb, Thomas Lumley, Nicholas G. Martin, Winfried März, Giuseppe Matullo, Mark I. McCarthy, Sarah E. Medland, Tony R. Merriman, Andres Metspalu, Brian F. Meyer, Karen L. Mohlke, Grant W. Montgomery, Dennis Mook-Kanamori, Patricia B. Munroe, Kari E. North, Dale R. Nyholt, Jeffery R. O’connell, Carole Ober, Albertine J. Oldehinkel, Walter Palmas, Colin Palmer, Gerard G. Pasterkamp, Etienne Patin, Craig E. Pennell, Louis Perusse, Patricia A. Peyser, Mario Pirastu, Tinca J. C. Polderman, David J. Porteous, Danielle Posthuma, Bruce M. Psaty, John D. Rioux, Fernando Rivadeneira, Charles Rotimi, Jerome I. Rotter, Igor Rudan, Hester M. Den Ruijter, Dharambir K. Sanghera, Naveed Sattar, Reinhold Schmidt, Matthias B. Schulze, Heribert Schunkert, Robert A. Scott, Alan R. Shuldiner, Xueling Sim, Neil Small, Jennifer A. Smith, Nona Sotoodehnia, E-Shyong Tai, Alexander Teumer, Nicholas J. 
Timpson, Daniela Toniolo, David-Alexandre Tregouet, Tiinamaija Tuomi, Peter Vollenweider, Carol A. Wang, David R. Weir, John B. Whitfield, Cisca Wijmenga, Tien-Yin Wong, John Wright, Jingyun Yang, Lei Yu, Babette S. Zemel, Alan B. Zonderman, Markus Perola, Patrik K. E. Magnusson, André G. Uitterlinden, Jaspal S. Kooner, Daniel I. Chasman, Ruth J. F. Loos, Nora Franceschini, Lude Franke, Chris S. Haley, Caroline Hayward, Robin G. Walters, John R. B. Perry, Tōnu Esko, Agnar Helgason, Kari Stefansson, Peter K. Joshi, Michiaki Kubo, James F. Wilson (2019-10-31):
In many species, the offspring of related parents suffer reduced reproductive success, a phenomenon known as inbreeding depression. In humans, the importance of this effect has remained unclear, partly because reproduction between close relatives is both rare and frequently associated with confounding social factors. Here, using genomic inbreeding coefficients (FROH) for >1.4 million individuals, we show that FROH is significantly associated (p < 0.0005) with apparently deleterious changes in 32 out of 100 traits analysed. These changes are associated with runs of homozygosity (ROH), but not with common variant homozygosity, suggesting that genetic variants associated with inbreeding depression are predominantly rare. The effect on fertility is striking: FROH equivalent to the offspring of first cousins is associated with a 55% decrease [95% CI 44–66%] in the odds of having children. Finally, the effects of FROH are confirmed within full-sibling pairs, where the variation in FROH is independent of all environmental confounding.
Engineering biology with recombinant DNA, broadly called synthetic biology, has progressed tremendously in the last decade, owing to continued industrialization of DNA synthesis, discovery and development of molecular tools and organisms, and increasingly sophisticated modeling and analytic tools. However, we have yet to understand the full potential of engineering biology because of our inability to write and test whole genomes, which we call synthetic genomics. Substantial improvements are needed to reduce the cost and increase the speed and reliability of genetic tools. Here, we identify emerging technologies and improvements to existing methods that will be needed in four major areas to advance synthetic genomics within the next 10 years: genome design, DNA synthesis, genome editing, and chromosome construction (see table). Similar to other large-scale projects for responsible advancement of innovative technologies, such as the Human Genome Project, an international, cross-disciplinary effort consisting of public and private entities will likely yield maximal return on investment and open new avenues of research and biotechnology.
“Grandmaster level in StarCraft II using multi-agent reinforcement learning”, Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H. Choi, Richard Powell, Timo Ewalds, Petko Georgiev, Junhyuk Oh, Dan Horgan, Manuel Kroiss, Ivo Danihelka, Aja Huang, Laurent Sifre, Trevor Cai, John P. Agapiou, Max Jaderberg, Alexander S. Vezhnevets, Rémi Leblond, Tobias Pohlen, Valentin Dalibard, David Budden, Yury Sulsky, James Molloy, Tom L. Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Yuhuai Wu, Roman Ring, Dani Yogatama, Dario Wünsch, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Koray Kavukcuoglu, Demis Hassabis, Chris Apps, David Silver (2019-10-30):
Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional e-sports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions, the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.
“Solving Rubik's Cube with a Robot Hand”, OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang (2019-10-16):
We demonstrate that models trained only in simulation can be used to solve a manipulation problem of unprecedented complexity on a real robot. This is made possible by two key components: a novel algorithm, which we call automatic domain randomization (ADR) and a robot platform built for machine learning. ADR automatically generates a distribution over randomized environments of ever-increasing difficulty. Control policies and vision state estimators trained with ADR exhibit vastly improved sim2real transfer. For control policies, memory-augmented models trained on an ADR-generated distribution of environments show clear signs of emergent meta-learning at test time. The combination of ADR with our custom robot platform allows us to solve a Rubik’s cube with a humanoid robot hand, which involves both control and state estimation problems. Videos summarizing our results are available: https://openai.com/blog/solving-rubiks-cube/
We’ve trained a pair of neural networks to solve the Rubik’s Cube with a human-like robot hand. The neural networks are trained entirely in simulation, using the same reinforcement learning code as OpenAI Five paired with a new technique called Automatic Domain Randomization (ADR). The system can handle situations it never saw during training, such as being prodded by a stuffed giraffe. This shows that reinforcement learning isn’t just a tool for virtual tasks, but can solve physical-world problems requiring unprecedented dexterity.
…Since May 2017, we’ve been trying to train a human-like robotic hand to solve the Rubik’s Cube. We set this goal because we believe that successfully training such a robotic hand to do complex manipulation tasks lays the foundation for general-purpose robots. We solved the Rubik’s Cube in simulation in July 2017. But as of July 2018, we could only manipulate a block on the robot. Now, we’ve reached our initial goal. Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it. Our robot still hasn’t perfected its technique though, as it solves the Rubik’s Cube 60% of the time (and only 20% of the time for a maximally difficult scramble).
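To make the ADR loop concrete, here is a minimal Python sketch of automatic domain randomization as described in the abstract: each simulation parameter's randomization range starts as a point estimate and is widened whenever the policy succeeds with that parameter pinned at its current boundary. The class, step sizes, and thresholds below are illustrative assumptions for exposition, not OpenAI's actual implementation.

```python
import random

class ADRParameter:
    """One randomized simulation parameter (e.g. cube mass or friction)
    whose range [lo, hi] starts as a point and expands over training."""
    def __init__(self, default, step=0.02):
        self.lo = self.hi = default
        self.step = step

    def sample(self):
        return random.uniform(self.lo, self.hi)

def adr_update(param, boundary, perf, expand_at=0.8, shrink_at=0.4):
    """Widen the range if the policy succeeds while the parameter is
    pinned at its current boundary; narrow it if the policy fails."""
    if perf >= expand_at:
        if boundary == "hi":
            param.hi += param.step
        else:
            param.lo -= param.step
    elif perf <= shrink_at:
        if boundary == "hi":
            param.hi = max(param.lo, param.hi - param.step)
        else:
            param.lo = min(param.hi, param.lo + param.step)

# Usage: occasionally evaluate with one parameter fixed at a boundary,
# then feed the measured success rate back into adr_update().
cube_mass = ADRParameter(default=0.0902)          # kg (illustrative)
adr_update(cube_mass, boundary="hi", perf=0.85)   # range expands upward
```

Because the distribution of environments only grows as the policy masters the current one, the curriculum is generated automatically, which is what the paper credits for the emergent meta-learning at test time.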
Perhaps the most ambitious scientific quest in human history is the creation of general artificial intelligence, which roughly means AI that is as smart or smarter than humans. The dominant approach in the machine learning community is to attempt to discover each of the pieces required for intelligence, with the implicit assumption that some future group will complete the Herculean task of figuring out how to combine all of those pieces into a complex thinking machine. I call this the "manual AI approach". This paper describes another exciting path that ultimately may be more successful at producing general AI. It is based on the clear trend in machine learning that hand-designed solutions eventually are replaced by more effective, learned solutions. The idea is to create an AI-generating algorithm (AI-GA), which automatically learns how to produce general AI. Three Pillars are essential for the approach: (1) meta-learning architectures, (2) meta-learning the learning algorithms themselves, and (3) generating effective learning environments. I argue that either approach could produce general AI first, and both are scientifically worthwhile irrespective of which is the fastest path. Because both are promising, yet the ML community is currently committed to the manual approach, I argue that our community should increase its research investment in the AI-GA approach. To encourage such research, I describe promising work in each of the Three Pillars. I also discuss AI-GA-specific safety and ethical considerations. Because it may be the fastest path to general AI and because it is inherently scientifically interesting to understand the conditions in which a simple algorithm can produce general AI (as happened on Earth where Darwinian evolution produced human intelligence), I argue that the pursuit of AI-GAs should be considered a new grand challenge of computer science research.
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
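As a concrete illustration of the text-to-text framing described above, here is a toy Python helper in the spirit of the paper's task prefixes ("translate English to German:", "summarize:", "cola sentence:"): every task, even classification, becomes an input-string/output-string pair. The helper itself is hypothetical, for exposition only, not the released T5 code.

```python
def to_text_to_text(task, **fields):
    """Convert a supervised example into (input text, target text)."""
    if task == "translate":
        return (f"translate English to German: {fields['text']}",
                fields["translation"])
    if task == "summarize":
        return (f"summarize: {fields['text']}", fields["summary"])
    if task == "cola":  # grammatical acceptability -> a *string* label
        return (f"cola sentence: {fields['text']}",
                "acceptable" if fields["label"] else "unacceptable")
    raise ValueError(f"unknown task: {task}")

src, tgt = to_text_to_text("translate",
                           text="That is good.",
                           translation="Das ist gut.")
# src = "translate English to German: That is good.", tgt = "Das ist gut."
```

Once every task looks like this, a single encoder-decoder model with a single training objective can be pre-trained and fine-tuned across all of them, which is what makes the paper's systematic comparisons possible.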
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
We introduce our efforts towards building a universal neural machine translation (NMT) system capable of translating between any language pair. We set a milestone towards this goal by building a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples. Our system demonstrates effective transfer learning ability, significantly improving translation quality of low-resource languages, while keeping high-resource language translation quality on-par with competitive bilingual baselines. We provide in-depth analysis of various aspects of model building that are crucial to achieving quality and practicality in universal NMT. While we prototype a high-quality universal translation system, our extensive empirical analysis exposes issues that need to be further addressed, and we suggest directions for future research.
Fine-tuning pre-trained Neural Machine Translation (NMT) models is the dominant approach for adapting to new languages and domains. However, fine-tuning requires adapting and maintaining a separate model for each target task. We propose a simple yet efficient approach for adaptation in NMT. Our proposed approach consists of injecting tiny task specific adapter layers into a pre-trained model. These lightweight adapters, with just a small fraction of the original model size, adapt the model to multiple individual tasks simultaneously. We evaluate our approach on two tasks: (i) Domain Adaptation and (ii) Massively Multilingual NMT. Experiments on domain adaptation demonstrate that our proposed approach is on par with full fine-tuning on various domains, dataset sizes and model capacities. On a massively multilingual dataset of 103 languages, our adaptation approach bridges the gap between individual bilingual models and one massively multilingual model for most language pairs, paving the way towards universal machine translation.
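The adapter idea is simple enough to show in a few lines. A minimal PyTorch sketch of a bottleneck adapter block in the spirit of the paper: a small residual module injected after each frozen transformer layer, so only these few parameters are trained per task or language pair. Dimensions and placement here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: LayerNorm, down-projection, nonlinearity,
    up-projection, residual connection. Only this module is trained;
    the pre-trained model's weights stay frozen and shared."""
    def __init__(self, d_model=512, d_bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(self.norm(x))))

# One tiny Adapter per task; a fraction of the full model's parameters.
adapter = Adapter()
h = torch.randn(8, 20, 512)   # (batch, sequence, d_model) activations
h = adapter(h)                # adapted representation, same shape
```

The design choice that matters is the residual connection: a freshly initialized adapter starts out close to the identity, so injecting it does not disturb the pre-trained model before fine-tuning begins.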
This paper presents a study of semi-supervised learning with large convolutional networks. We propose a pipeline, based on a teacher/student paradigm, that leverages a large collection of unlabelled images (up to 1 billion). Our main goal is to improve the performance for a given target architecture, like ResNet-50 or ResNext. We provide an extensive analysis of the success factors of our approach, which leads us to formulate some recommendations to produce high-accuracy models for image classification with semi-supervised learning. As a result, our approach brings important gains to standard architectures for image, video and fine-grained classification. For instance, by leveraging one billion unlabelled images, our learned vanilla ResNet-50 achieves 81.2% top-1 accuracy on the ImageNet benchmark.
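A runnable toy version of that teacher/student pipeline, with scikit-learn classifiers standing in for the deep networks: (1) train a teacher on the labeled set, (2) rank unlabeled examples by teacher confidence and keep the top-K per class as pseudo-labels, (3) train a student on the pseudo-labeled set, (4) refit on the true labels as a stand-in for fine-tuning. The data, classifier, and K below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 16))            # small labeled set
y_lab = (X_lab[:, 0] > 0).astype(int)
X_unl = rng.normal(size=(10_000, 16))         # large unlabeled pool

teacher = LogisticRegression().fit(X_lab, y_lab)      # 1. supervised teacher
proba = teacher.predict_proba(X_unl)

K = 500
pseudo_X, pseudo_y = [], []
for c in range(proba.shape[1]):                       # 2. top-K per class
    top = np.argsort(-proba[:, c])[:K]                #    most confident examples
    pseudo_X.append(X_unl[top])
    pseudo_y.append(np.full(K, c))

student = LogisticRegression().fit(                   # 3. train the student
    np.vstack(pseudo_X), np.concatenate(pseudo_y))
student.fit(                                          # 4. "fine-tune" on true labels
    np.vstack([np.vstack(pseudo_X), X_lab]),
    np.concatenate(pseudo_y + [y_lab]))
```

Ranking per class (rather than taking a global confidence cutoff) keeps the pseudo-labeled set class-balanced, which the paper identifies as one of the success factors.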
[Artbreeder is an interactive GAN generator website. Originally named “Ganbreeder” and providing only the 256px BigGAN generator, it now provides a variety of BigGAN & StyleGAN models, including the anime portrait StyleGAN model. (It is more general than the similar Waifu Labs, but my anime model is not as good.) Users can generate random samples and explore slight variants of them to gradually explore the “latent space” and find interesting images, but they can also edit images more directly, upload existing images to find the most similar image produced by the model, etc. A popular website, it has generated >56m images from September 2019 to January 2020.]
Generating high-quality anime faces has long been a task neural networks struggled with. The invention of StyleGAN in 2018 has effectively solved this task and I have trained a StyleGAN model which can generate high-quality anime faces at 512px resolution. To show off the recent progress, I made a website, “This Waifu Does Not Exist” for displaying random StyleGAN 2 faces. TWDNE displays a different neural-net-generated face & plot summary every 15s. The site was popular and went viral online, especially in China. The model can also be used interactively for exploration & editing in the Artbreeder online service.
TWDNE faces have been used as screensavers, user avatars, character art for game packs or online games, uploaded to Pixiv, given away in streams, and used in a research paper (Noguchi & Harada 2019). TWDNE results also helped inspire Sizigi Studio’s online interactive waifu GAN, Waifu Labs, which generates even better anime faces than my StyleGAN results.
One fertile source of leprechauns seems to be the observation that researchers do not read many of the papers that they cite in their own papers. The frequency of this can be inferred from pre-digital papers, based on bibliographic errors: if a citation has mistakes in it, such that one could not have actually looked up the paper in a library or database, and those mistakes were copied from another paper, then the authors almost certainly did not read the paper (otherwise they would have fixed the mistakes when they found them out the hard way) and simply copied the citation. The empirically-measured spread of bibliographic errors suggests that researchers frequently do not read the papers they cite. The frequency can be further confirmed by examining citations to see when the citers misdescribe the original paper, "quotation errors", showing that the errors involved are substantial and not merely bibliographic. ... How often do authors not read their cites? This might seem near-impossible to answer, but bibliographic analysis offers a cute trick. In olden times, citations and bibliographies had to be compiled by hand; this is an error-prone process, but one may make a different error from another author citing the same paper, and one might correct any error on reading the original. On the other hand, if you cite a paper because you blindly copied the citation from another paper and never get around to reading it, you may introduce additional errors but you definitely won't fix any error in what you copied. So one can get an idea of how frequent non-reads are by tracing lineages of bibliographic errors: the more people copy around the same wrong version of a citation (out of the total set of citations for that cite), the fewer of them must be actually reading it. Such copied errors turn out to be quite common and represent a large fraction of citations, and thus suggest that many papers are being cited without being read. Simkin & Roychowdhury venture a guess that as many as 80% of authors citing a paper have not actually read the original. [Bibliography of papers in the area.]
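The copying argument lends itself to a quick simulation. A toy Monte Carlo sketch of misprint propagation: each new citer either reads the original (writing a fresh citation, with a small chance of a *new* typo) or copies an earlier citation verbatim, typos included. The copy and typo probabilities below are invented for illustration, not Simkin & Roychowdhury's fitted values.

```python
import random

def simulate(n_citers=1000, p_copy=0.8, p_typo=0.05, seed=0):
    """Fraction of citations that are repeated erroneous copies."""
    rng = random.Random(seed)
    citations = []        # each citation = frozenset of typo IDs it carries
    next_typo = 0
    for _ in range(n_citers):
        if citations and rng.random() < p_copy:
            citations.append(rng.choice(citations))   # copy, errors and all
        else:
            typos = set()
            if rng.random() < p_typo:                 # fresh read, maybe a new typo
                typos.add(next_typo)
                next_typo += 1
            citations.append(frozenset(typos))
    repeated = sum(1 for c in citations if c and citations.count(c) > 1)
    return repeated / n_citers

# With a high copy rate, most erroneous citations are *identical* repeats:
# the observable signature used to infer how few citers read the original.
print(simulate())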
Evidence across social science indicates that average effects of persuasive messages are small. One commonly offered explanation for these small effects is heterogeneity: Persuasion may only work well in specific circumstances. To evaluate heterogeneity, we repeated an experiment weekly in real time using 2016 U.S. presidential election campaign advertisements. We tested 49 political advertisements in 59 unique experiments on 34,000 people. We investigate heterogeneous effects by sender (candidates or groups), receiver (subject partisanship), content (attack or promotional), and context (battleground versus non-battleground, primary versus general election, and early versus late). We find small average effects on candidate favorability and vote. These small effects, however, do not mask substantial heterogeneity even where theory from political science suggests that we should find it. During the primary and general election, in battleground states, for Democrats, Republicans, and Independents, effects are similarly small. Heterogeneity with large offsetting effects is not the source of small average effects.
Purpose: This paper aims to analyse the governance structure of monasteries to gain new insights and apply them to solve agency problems of modern corporations. In an historic analysis of crises and closures it asks if Benedictine monasteries were and are capable of solving agency problems. The analysis shows that monasteries established basic governance instruments very early and therefore were able to survive for centuries.
Design/methodology/approach: The paper uses a dataset of all Benedictine abbeys that ever existed in Bavaria, Baden‐Württemberg, and German‐speaking Switzerland to determine their lifespan and the reasons for closures. The governance mechanisms are analyzed in detail. Finally, it draws conclusions relevant to the modern corporation. The theoretical foundations are based upon principal agency theory, psychological economics, as well as embeddedness theory.
Findings: The monasteries that are examined show an average lifetime of almost 500 years and only a quarter of them dissolved as a result of agency problems. This paper argues that this success is due to an appropriate governance structure that relies strongly on internal control mechanisms.
Research limitations/implications: Benedictine monasteries and stock corporations differ fundamentally regarding their goals. Additional limitations of the monastic approach are the tendency to promote groupthink, the danger of dictatorship and the lifelong commitment.
Practical implications: The paper adds new insights into the corporate governance debate designed to solve current agency problems and facilitate better control.
Originality/value: By analyzing monasteries, a new approach is offered to understand the efficiency of internal behavioral incentives and their combination with external control mechanisms in corporate governance.
[Letter from the eastern Himalayas about the social and economic impact of Airbnb.]
It’s expensive to farm in Himalayan villages like mine. The farms are small and cannot leverage economies of scale. Hill people see the process of selling land as a humiliating ordeal they would never consider. Everybody chips in to cultivate the land. Women spend many hours a day cutting grass for their cows. This is not yet a division of labour society. It is this world that Airbnb has penetrated, turning it upside down.
Millions of people stay in Airbnb homes every night. It’s not trust which makes this possible. My pup is fearless when he sleeps with the door wide open, in a cottage in the woods. There are leopards around. Dogs here don’t live very long. He doesn’t trust leopards, but he knows they are afraid of humans. My pup sleeps on my bed, and so is well-protected from the vicissitudes of life. But I’m not the living proof that dogs can trust leopards. Dogs wouldn’t need humans to guard them if they could trust leopards. Similarly, Airbnb puts hosts and guests in a position where behaving badly would ruin their reputations. In one of my bad moods, I held my pup quite firmly. At midnight, he ran out of the cottage and barked for hours. I couldn’t bring him back to my bed. I did something he thought I wouldn’t consider. He felt I betrayed his trust in me. I’m, here, talking about a more meaningful form of trust. Intellectuals miss this obvious distinction, because they’re not the wonderful people they think they are. The distinction between trust and assurance is all too obvious. But if doing wrong doesn’t fill you with moral horror, you won’t get it. You can’t trust anybody who doesn’t feel that way, and there are not many such people. Unconditional trustworthiness is one of the rarest things in the world. Institutions can’t produce this kind of trust, because people aren’t conditionable beyond a point. In any case, how do you produce something you don’t even understand?
As they regale me with talk of their younger selves and their trips to Jamaica, Aruba, Cozumel, and Mazatlán, they present the very picture of well-adjusted adulthood on the verge of retirement. Except for one fairly major thing. As we chat, McKinnon makes clear that she has no memories of all those cruises. No memories of buying the lizard or finding that oilcloth collage. She doesn’t remember any vacation she’s ever taken. In fact, she cannot recall a single moment in her marriage to Green or before it.
For decades, scientists suspected that someone like Susie McKinnon might exist. They figured she was probably out there, living an ordinary life—hard to tell apart from the next person in line at the grocery store, yet fundamentally different from the rest of us. And sure enough, they found her (or rather, she found them) in 2006. “I don’t remember being smaller or having to reach up for things. I have no impressions of myself as a kid.” McKinnon is the first person ever identified with a condition called severely deficient autobiographical memory. She knows plenty of facts about her life, but she lacks the ability to mentally relive any of it, the way you or I might meander back in our minds and evoke a particular afternoon. She has no episodic memories—none of those impressionistic recollections that feel a bit like scenes from a movie, always filmed from your perspective. To switch metaphors: Think of memory as a favorite book with pages that you return to again and again. Now imagine having access only to the index. Or the Wikipedia entry.
…McKinnon first began to realize that her memory was not the same as everyone else’s back in 1977, when a friend from high school, who was studying to be a physician’s assistant, asked if she would participate in a memory test as part of a school assignment. When her friend asked basic questions about her childhood as part of the test, McKinnon would reply, “Why are you asking stuff like this? No one remembers that!” She knew that other people claimed to have detailed memories, but she always thought they embellished and made stuff up—just like she did. McKinnon’s friend was so disturbed by her responses that she suggested McKinnon get her memory checked by a professional. McKinnon put the exchange aside for almost three decades. Then one day in 2004, she came across an article about Endel Tulving, the researcher who had originally characterized the difference between episodic and semantic memory.
Aphantasia is a condition characterized by an inability to voluntarily visualize mental imagery. Many people with aphantasia also report an inability to recall sounds, smells, or sensations of touch. Some also report prosopagnosia, the inability to recognize faces.
SDAM (Severely Deficient Autobiographical Memory) is a recently identified condition in which a person is unable to recall events from their past. It is not memory loss, Alzheimer's, or dementia. It is associated with the inability to vividly recall experiences, such as not having many, if any at all, childhood memories. Another example would be not remembering many details about your wedding day. SDAM seems to be related somewhat to Aphantasia (the inability to vividly picture things in your mind). Both, though, seem to exist on a spectrum. Obviously it does not seem to limit normal functioning or learning ability. We are here to find out more about it and its impacts.
Profoundly impaired autobiographical re-experiencing in healthy adults.
Deficit specific to episodic (especially visual), rather than semantic processes.
Impaired activation of midline structures during autobiographical memory retrieval.
Absence of late positive component with intact recognition.
Performance on everyday mnemonic tasks mediated by non-episodic processes.
Abstract: Recollection of previously experienced events is a key element of human memory that entails recovery of spatial, perceptual, and mental state details. While deficits in this capacity in association with brain disease have serious functional consequences, little is known about individual differences in autobiographical memory (AM) in healthy individuals. Recently, healthy adults with highly superior autobiographical capacities have been identified (e.g., LePort et al 2012). Here we report data from three healthy, high functioning adults with the reverse pattern: lifelong severely deficient autobiographical memory (SDAM) with otherwise preserved cognitive function. Their self-reported selective inability to vividly recollect personally experienced events from a first-person perspective was corroborated by absence of functional magnetic resonance imaging (fMRI) and event-related potential (ERP) biomarkers associated with naturalistic and laboratory episodic recollection, as well as by behavioral evidence of impaired episodic retrieval, particularly for visual information. Yet learning and memory were otherwise intact, as long as these tasks could be accomplished by non-episodic processes. Thus these individuals function normally in day-to-day life, even though their past is experienced in the absence of recollection. [Keywords: Episodic memory, Autobiographical memory, Hippocampus, Case study.]
The syndromes of highly superior autobiographical memory (HSAM) and severely deficient autobiographical memory (SDAM) have come under recent investigation. These syndromes pose challenges for theories of memory.
Research on individual differences in autobiographical memory across the spectrum has also emerged, complementing prior work involving individual differences in laboratory-based episodic memory.
Additional research that is focused on HSAM and SDAM, particularly those involving larger sample sizes, will provide a novel platform for understanding the cognitive and neural factors that are associated with the formation and retention of autobiographical memories.
Although humans have a remarkable capacity to recall a wealth of detail from the past, there are marked interindividual differences in the quantity and quality of our mnemonic experiences. Such differences in autobiographical memory may appear self-evident, yet there has been little research on this topic. In this review, we synthesize an emerging body of research regarding individual differences in autobiographical memory. We focus on two syndromes that fall at the extremes of the ‘remembering’ dimension: highly superior autobiographical memory (HSAM) and severely deficient autobiographical memory (SDAM). We also discuss findings from research on less extreme individual differences in autobiographical memory. This avenue of research is pivotal for a full description of the behavioral and neural substrates of autobiographical memory. [Keywords: Episodic memory, highly superior autobiographical memory, severely deficient autobiographical memory]
Presents a case report of a 65-year-old man who became unable to summon images to the mind's eye after coronary angioplasty. Following a popular description of this case, the authors were contacted by over twenty individuals who recognized themselves in the article's account of 'blind imagination', with the important difference that their imagery impairment had been lifelong. Here the authors describe the features of this condition, elicited by a questionnaire, and suggest a name—aphantasia—for this poorly recognized phenomenon. 21 individuals contacted them because of their lifelong reduction of visual imagery. They explored the features of their condition with a questionnaire devised for the purpose and the Vividness of Visual Imagery Questionnaire (VVIQ). Participants typically became aware of their condition in their teens or twenties when, through conversation or reading, they realized that most people who 'saw things in the mind's eye', unlike our participants, enjoyed a quasi-visual experience.
In this paper, we shall use Tulving's seminal empirical and theoretical research including the ‘Spoon Test’ to explore memory and mental time travel and its origins and role in planning for the future. We will review the comparative research on future planning and episodic foresight in pre-verbal children and non-verbal animals to explore how this may be manifest as wordless thoughts. [Keywords: Mental time travel, episodic memory, convergent evolution of cognition, corvids, child development, subjective experience of thinking.]
[On why Henry Darger, an elderly, solitary dishwasher, wrote and illustrated a 15,000+ page unpublished fantasy novel.]
I’m here today to tell you about a book I read recently, namely Henry Darger: In The Realms Of The Unreal, by John MacGregor. It’s a study of Henry Darger, a man I instantly became obsessed with upon encountering his Wikipedia entry sometime last fall.
Here’s a quick sketch of who Darger was, which will hopefully give you an idea of why I find him so fascinating. He was a reclusive man who worked various dishwashing jobs for most of his life. He only had one real friend in the course of his life, and although he occasionally interacted with the other residents of his apartment complex, they just saw him as a peculiar, taciturn eccentric. But when Darger was on his deathbed, his landlord Nathan Lerner began to clean out his room and discovered something incredible. Unknown to everyone around him, Darger had been writing and painting. Writing and painting a lot. Among the objects Lerner discovered were fifteen massive volumes comprising one continuous fictional work entitled The Story of the Vivian Girls, in What is Known as the Realms of the Unreal, of the Glandeco-Angelinian War Storm Caused by the Child Slave Rebellion. In total, the typed, single-spaced text was 15,145 pages long—one of the longest fictional works ever produced by a human being, if not the longest. (Whether it is the longest or not depends on what counts as a single work; there are some long works of serial pulp fiction that, in total, are longer, but that’s only if you add up the length of hundreds of installments.) This was not Darger’s only writing project. There was also a sort of sequel, Crazy House, which ran to around 10,000 pages, and the 5000-page autobiography The History of My Life, as well as numerous journals and other miscellany. And then there were the paintings, hundreds of huge, odd-looking compositions depicting battles, scenes of torture, and heroic adventures. (You can see some of Darger’s art thanks to Google Image Search here).
It turned out that the paintings were illustrations for Darger’s 15,145-page masterwork, called In The Realms Of The Unreal for short. In The Realms Of The Unreal is, in some very broad sense, a fantasy novel. It takes place on a planet far larger than Earth, which Earth is said to orbit as a moon. This planet is mostly composed of Catholic nations, of which the most important to the plot are Angelinia, Calverinia and Abbieannia. (Protestants do not appear to exist in this world, though—confusingly enough—one of the Catholic nations is called Protestantia.) The story is about a war between the Catholic nations and the atheist nation Glandelinia, which is inhabited by evil, sadistic people who practice institutionalized child slavery. Shortly before the time period described in the text, some of the child slaves mounted a rebellion, led by a heroic 10-year-old named Annie Aronburg. The Glandelineans quashed the rebellion and killed Aronburg, but this started a chain of events that led to a Glandelinean invasion of Calverinia and eventually a full-scale war between the Catholic nations and Glandelinia. In The Realms Of The Unreal tells the story of this war, an incredibly long succession of huge battles, espionage missions, scenes of torture in the Glandelinean slave camps, and so on. The protagonists, curiously enough, are a set of seven prepubescent sisters—the titular Vivian girls—who follow the Christian armies, spy on the Glandelineans, and narrowly escape mortal danger on innumerable occasions. The battles are mostly realistic in nature—though they involve millions of combatants—but the world is an enchanted one, filled with chimeric beasts called “Blengiglomenean creatures” (or “Blengins,” for short) which assist and protect the Vivian girls.
…The problem comes when MacGregor tries to interpret the text psychologically, which happens often. MacGregor is a Freudian analyst—he studied with Anna Freud, in fact—and he is mainly interested in Darger as a psychological subject. Now, this is not the time or place to hash out whether Freudian psychology does or doesn’t succeed, generally speaking, at explaining the human mind. But even if I withhold judgment on MacGregor’s Freudian premises, his account of Darger’s psychology is just really, really bad and frustrating….So, without further ado, here are some interesting things about Henry Darger:
In the Realms, there are numerous characters named after Darger…These Dargers do not all seem to be distinct in the author’s mind, and it’s often confusing which one is being referred to in any given instance.
Darger’s paintings are filled with prepubescent girls—usually the Vivian girls, but there are also sometimes anonymous child slaves, etc. They are usually depicted naked, even when there is no good reason for this…The little girls usually, but not always, have penises…
Darger collected lots of random junk in the course of his menial job. He was particularly fond of photographs of children…
The inspiration for writing the Realms was the loss of a particular newspaper clipping, a photo of Elsie Paroubek, a little girl who had been murdered, and whose murder was all over the Chicago papers for a short time. Darger’s journals express no particular interest in this picture until he discovered that he had lost it. After that, he spent much of the rest of his life in a profound state of anger at God, who he believed had taken the picture from him. He saw the fictional war between Christians and Glandelineans as a way of punishing God for taking the picture by causing harm to millions of (fictional?) Christians.
…Darger’s 5000-page work The History Of My Life is putatively an autobiography. However, that word does not accurately describe the vast majority of its contents. The first several hundred pages of the work are indeed an account of Darger’s early life. However, after describing a scene in which his younger self is entranced by the sight of a powerful storm, he apparently gets distracted by the storm and spends the remaining 4000-some pages of the text describing the wake of destruction caused by a fictional twister called “Sweetie Pie,” with no further mention of his own life whatsoever.
…Near the end of his life, Darger apparently spent a lot of time playing with string. In his journal he recounts collecting string and coiling and uncoiling it, and huge amounts of string were found in his room after his death.
…Any account of Darger’s psychology is going to have to explain this weirdness. This is what, I contend, John MacGregor’s account fails to do. Fails pretty massively, in fact—massively enough that Darger seems less, rather than more, comprehensible after you read MacGregor try to “explain” him…But MacGregor also tells us that the battles sometimes lasted for hundreds of pages, and that they include vast amounts of bureaucratic detail (about particular regiments, commanders, tactical maneuvers, etc.—lots and lots of proper names), but that none of this detail is in any way self-consistent (so that it is impossible, for instance, to form a mental picture of the shape of the battlefield that does not distort over time). And that Darger is obsessed with what some might consider the more “boring” details of war—he spends huge amounts of time describing the way the supply lines work, for instance. It’s still conceivable that this sort of ridiculously long bureaucratic catalogue could be an expression of pent-up rage, but if so, it’s a very odd one, and naturally raises the question of just what sort of guy would deal with his frustrations by going home from his job every night and writing about the tedious technical details of a fictional war. But that’s exactly the question MacGregor does not want to answer…If writing this stuff was somehow pornographic for Darger, then how is it that so much of the text is composed of moralizing about the glorious Christians and the wicked Glandelineans, describing military maneuvers in mind-numbing detail, and so on, rather than talking about anything that smacks in any way of overt sexuality? Remember that this is a 15,000-page text in which no one ever gets it on; if we’re looking at a sexual fantasy, it must be the coyest sexual fantasy ever produced by the human race.
Henry Joseph Darger Jr. was an American writer, novelist and artist who worked as a hospital custodian in Chicago, Illinois. He has become famous for his posthumously discovered 15,145-page, single-spaced fantasy manuscript called The Story of the Vivian Girls, in What Is Known as the Realms of the Unreal, of the Glandeco-Angelinian War Storm, Caused by the Child Slave Rebellion, along with several hundred drawings and watercolor paintings illustrating the story.
Geschwind syndrome, also known as Gastaut-Geschwind, is a group of behavioral phenomena evident in some people with temporal lobe epilepsy. It is named for one of the first individuals to categorize the symptoms, Norman Geschwind, who published prolifically on the topic from 1973 to 1984. There is controversy surrounding whether it is a true neuropsychiatric disorder. Temporal lobe epilepsy causes chronic, mild, interictal changes in personality, which slowly intensify over time. Geschwind syndrome includes five primary changes: hypergraphia, hyperreligiosity, atypical sexuality, circumstantiality, and intensified mental life. Not all symptoms must be present for a diagnosis. Only some people with epilepsy or temporal lobe epilepsy show features of Geschwind syndrome.
Falsification may demarcate science from non-science as the rational way to test the truth of hypotheses. But experimental evidence from studies of reasoning shows that people often find falsification difficult. We suggest that domain expertise may facilitate falsification. We consider new experimental data about chess experts’ hypothesis testing. The results show that chess masters were readily able to falsify their plans. They generated move sequences that falsified their plans more readily than novice players, who tended to confirm their plans. The finding that experts in a domain are more likely to falsify their hypotheses has important implications for the debate about human rationality.
[Post-Heartbleed/Shellshock discussion of the economics of funding open source software: universally used & economically invaluable as a public good anyone can & does use, it is also essentially completely unfunded, leading to serious problems in long-term maintenance & improvement, exemplified by the Heartbleed bug—core cryptographic code run by almost every networked device on the planet could not fund more than a part-time developer.]
Our modern society—everything from hospitals to stock markets to newspapers to social media—runs on software. But take a closer look, and you’ll find that the tools we use to build software are buckling under demand…Nearly all software today relies on free, public code (called “open source” code), written and maintained by communities of developers and other talent. Much like roads or bridges, which anyone can walk or drive on, open source code can be used by anyone—from companies to individuals—to build software. This type of code makes up the digital infrastructure of our society today. Just like physical infrastructure, digital infrastructure needs regular upkeep and maintenance. In the United States, over half of government spending on transportation and water infrastructure goes just to maintenance. But financial support for digital infrastructure is much harder to come by. Currently, any financial support usually comes through sponsorships, direct or indirect, from software companies. Maintaining open source code used to be more manageable. Following the personal computer revolution of the early 1980s, most commercial software was proprietary, not shared. Software tools were built and used internally by companies, and their products were licensed to customers. Many companies felt that open source code was too nascent and unreliable for commercial use. In their view, software was meant to be charged for, not given away for free. Today, everybody uses open source code, including Fortune 500 companies, government, major software companies and startups. Sharing, rather than building proprietary code, turned out to be cheaper, easier, and more efficient.
This increased demand puts additional strain on those who maintain this infrastructure, yet because these communities are not highly visible, the rest of the world has been slow to notice. Most of us take opening a software application for granted, the way we take turning on the lights for granted. We don’t think about the human capital necessary to make that happen. In the face of unprecedented demand, the costs of not supporting our digital infrastructure are numerous. On the risk side, there are security breaches and interruptions in service, due to infrastructure maintainers not being able to provide adequate support. On the opportunity side, we need to maintain and improve these software tools in order to support today’s startup renaissance, which relies heavily on this infrastructure. Additionally, open source work builds developers’ portfolios and helps them get hired, but the talent pool is remarkably less diverse than in tech overall. Expanding the pool of contributors can positively affect who participates in the tech industry at large.
No individual company or organization is incentivized to address the problem alone, because open source code is a public good. In order to support our digital infrastructure, we must find ways to work together. Current examples of efforts to support digital infrastructure include the Linux Foundation’s Core Infrastructure Initiative and Mozilla’s Open Source Support (MOSS) program, as well as numerous software companies in various capacities. Sustaining our digital infrastructure is a new topic for many, and the challenges are not well understood. In addition, infrastructure projects are distributed across many people and organizations, defying common governance models. Many infrastructure projects have no legal entity at all. Any support strategy needs to accept and work with the decentralized, community-centric qualities of open source code. Increasing awareness of the problem, making it easier for institutions to contribute time and money, expanding the pool of open source contributors, and developing best practices and policies across infrastructure projects will all go a long way in building a healthy and sustainable ecosystem.
Joel Spolsky in 2002 identified a major pattern in technology business & economics: the pattern of “commoditizing your complement”, an alternative to vertical integration, where companies seek to secure a chokepoint or quasi-monopoly in products composed of many necessary & sufficient layers by dominating one layer while fostering so much competition in another layer above or below its layer that no competing monopolist can emerge, prices are driven down to marginal costs elsewhere in the stack, total price drops & increases demand, and the majority of the consumer surplus of the final product can be diverted to the quasi-monopolist. A classic example is the commodification of PC hardware by the Microsoft OS monopoly, to the detriment of IBM & benefit of MS.
This pattern explains many otherwise odd or apparently self-sabotaging ventures by large tech companies into apparently irrelevant fields, such as the high rate of releasing open-source contributions by many Internet companies or the intrusion of advertising companies into smartphone manufacturing & web browser development & statistical software & fiber-optic networks & municipal WiFi & radio spectrum auctions & DNS (Google): they are pre-emptive attempts to commodify another company elsewhere in the stack, or defenses against it being done to them.
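The incentive is easy to see in a toy model: a product needs one unit each of layer A (our quasi-monopoly) and layer B (the complement), with linear demand for the bundle; A's profit-maximizing price and resulting profit both rise as B's price falls toward marginal cost. All demand and cost numbers below are invented for illustration.

```python
def best_profit_for_A(price_B, a=100.0, b=1.0, cost_A=0.0):
    """Monopolist A's optimal price and profit given complement B's price.

    Demand for the bundle: q = a - b * (price_A + price_B).
    Maximizing (price_A - cost_A) * q over price_A gives
    price_A* = (a/b - price_B + cost_A) / 2.
    """
    p_A = (a / b - price_B + cost_A) / 2
    q = a - b * (p_A + price_B)
    return p_A, (p_A - cost_A) * q

print(best_profit_for_A(price_B=30.0))  # monopolized complement: (35.0, 1225.0)
print(best_profit_for_A(price_B=0.0))   # commoditized complement: (50.0, 2500.0)
```

Driving the complement's price from 30 to 0 doubles A's optimal profit in this toy economy, which is exactly why funding open-source competitors to a rival's layer can be rational rather than charitable.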
A Public Lending Right (PLR) programme is a programme intended either to compensate authors for the potential loss of sales from their works being available in public libraries, or to serve as governmental support of the arts through support of works available in public libraries, such as books, music and artwork.
We propose a design for philanthropic or publicly-funded seeding to allow (near) optimal provision of a decentralized, self-organizing ecosystem of public goods. The concept extends ideas from Quadratic Voting to a funding mechanism for endogenous community formation. Citizens make public goods contributions to projects of value to them. The amount received by the project is (proportional to) the square of the sum of the square roots of contributions received. Under the “standard model” this yields first best public goods provision. Variations can limit the cost, help protect against collusion and aid coordination. We discuss applications to campaign finance, open source software ecosystems, news media finance and urban public projects. More broadly, we relate our mechanism to political theory, discussing how this solution to the public goods problem may furnish neutral and non-authoritarian rules for society that nonetheless support collective organization.
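In symbols (my own notation, not necessarily the paper’s): if citizen *i* contributes $c_i^p$ to project *p*, the project receives

$$F^p = \left(\sum_i \sqrt{c_i^p}\right)^{2},$$

with the gap between $F^p$ and the sum of direct contributions made up by the philanthropic matching pool.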
By making an individual donation, you contribute to a public good. This funding is guaranteed to be met by matching funding, widening the reach of your donation. What you do becomes “law.” By donating with one-to-one matching, you increase the power of any single donation in direct proportion to the size of the donation, making people more likely to feel their money is having an impact. This is the premise of “Donate $1, [Company X] will match $1” programs. CLR takes this one step further, by emphasizing the importance of unique, individual contributors—even if they each only contribute a small amount. In short, while matching programs have traditionally chosen ‘equal matching’ by default, CLR tries to answer the question: when funding public goods, what is the ‘optimal’ match to maximize individual donations?
There’s a great amount of experimentation in sustaining open source (Nadia Eghbal’s Lemonade Stand is a seminal resource, for those interested). Yet, naturally, it’s hard to solve the problem from the ground up: public goods are simply hard to fund. If we can find ‘ground up’ solutions, we can shift our open source conversations from ‘sustaining open source’ (all we can ask for, today) to ‘growing open source’, promoting a thriving, healthy internet infrastructure. The CLR mechanism is a concrete proposal for turning grassroots donations into something much larger. It requires only a simple formula (sketched in code below) to achieve this goal:
Crowdfund individual donations towards open source projects.
‘Match’ or ‘top off’ the contributions of individuals with government, grant, or private philanthropy funding.
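A minimal sketch of that formula in code (the `contributions` mapping is hypothetical; real deployments cap and rescale the match to fit the available pool):

```python
from math import sqrt

def clr_match(contributions):
    """Quadratic funding: a project receives the square of the sum of the
    square roots of its individual contributions; the 'match' is the gap
    between that amount and what was directly donated."""
    results = {}
    for project, donations in contributions.items():
        donated = sum(donations)
        funded = sum(sqrt(d) for d in donations) ** 2
        results[project] = {"donated": donated, "match": funded - donated}
    return results

# Many small donors beat one large donor of the same total:
print(clr_match({"A": [1.0] * 100,   # 100 donors x $1 -> $10,000 total funding
                 "B": [100.0]}))     # 1 donor x $100  -> $100, no match
```

Note how 100 donors giving $1 each attract vastly more matching than one donor giving $100: the mechanism rewards breadth of support rather than size of wallet.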
This is something we’re obviously interested in at Gitcoin. It just so happens we’ve launched a crowdfunding platform directing contributions towards open source projects, Gitcoin Grants. The timing to explore CLR couldn’t be better.
…Gitcoin Grants, given its Sybil-resistance via our Github integration, may be one of the best-suited parties to help implement “Liberal Radicalism” ideas in a real and constructive way within open source communities. These experiments fit quite well with what we’re doing both at Gitcoin Labs and Gitcoin Grants. We plan to carry on with CLR experiments. Please feel free to join our public Discourse around the topic and share it with anyone who you think might be interested in contributing to the discussion. We don’t expect Liberal Radicalism to be a panacea, but are excited to engage in conversation and experimentation along the way. We look forward to continued conversation with the RadicalxChange community as we continue our research into structural support for a more resilient, open internet.
Gitcoin is excited to announce our first formal experiment with CLR, with $25,000 in matching contributions from Gitcoin’s CLR Fund. Our sponsors for this fund include the Ethereum Foundation and ConsenSys, via their respective grants programs, and unnamed donors in the Ethereum ecosystem.
As outlined in our recent post, the CLR mechanism is a concrete proposal for turning your small donations into something much larger. It requires a simple formula to achieve this goal.
Crowdfund individual donations towards open source projects.
Match those donations with funding from governments, grant programs, or private philanthropists.
We are providing the $25,000 match…EDIT 2019/02/23: Results announced here.
A few weeks ago, we announced a radical experiment in Open Source Funding. Using the matching method outlined in “Liberal Radicalism”—a paper by Glen Weyl, Vitalik Buterin, and Zoë Hitzig—we announced a $25K fund to match any contributions made to 25 Ethereum infrastructure projects.
Gitcoin’s CLR Matching, By The Numbers
The top three projects in matching funding for this round were Prysmatic Labs, Moloch DAO, and Uniswap
$13,242 was contributed by 132 unique contributors across 26 projects
The top 10 projects all received over $1,000 in matching donations from the CLR fund
A total of $38,242 was contributed to Ethereum OSS infrastructure in two weeks
…We are encouraged by the results of the first round of CLR matching and are hopeful for the emergence of new mechanisms for funding public goods. We’ll explore a few of these in future rounds, and are especially interested in inflation-funding mechanisms to fund public infrastructure.
Expanding the scope of memory systems: what types of understanding can they be used for?
Improving the mnemonic medium: making better cards
Two cheers for mnemonic techniques
How important is memory, anyway?
How to invent Hindu-Arabic numerals?
Part II: Exploring tools for thought more broadly:
Why isn’t there more work on tools for thought today?
Questioning our basic premises
What if the best tools for thought have already been discovered?
Isn’t this what the tech industry does? Isn’t there a lot of ongoing progress on tools for thought?
Why not work on AGI or BCI instead?
Serious work and the aspiration to canonical content
Stronger emotional connection through an inverted writing structure
Summary and Conclusion
… in Quantum Country an expert writes the cards, an expert who is skilled not only in the subject matter of the essay, but also in strategies which can be used to encode abstract, conceptual knowledge. And so Quantum Country provides a much more scalable approach to using memory systems to do abstract, conceptual learning. In some sense, Quantum Country aims to expand the range of subjects users can comprehend at all. In that, it has very different aspirations to all prior memory systems.
More generally, we believe memory systems are a far richer space than has previously been realized. Existing memory systems barely scratch the surface of what is possible. We’ve taken to thinking of Quantum Country as a memory laboratory. That is, it’s a system which can be used both to better understand how memory works, and also to develop new kinds of memory system. We’d like to answer questions such as:
What are new ways memory systems can be applied, beyond the simple, declarative knowledge of past systems?
How deep can the understanding developed through a memory system be? What patterns will help users deepen their understanding as much as possible?
How far can we raise the human capacity for memory? And with how much ease? What are the benefits and drawbacks?
Might it be that one day most human beings will have a regular memory practice, as part of their everyday lives? Can we make it so memory becomes a choice; is it possible to in some sense solve the problem of memory?
Spaced repetition is a centuries-old psychological technique for efficient memorization & practice of skills: instead of attempting to memorize by ‘cramming’, one spaces out each review, with increasing durations as one learns the item, the scheduling handled by software. Because of the greater efficiency of its slow-but-steady approach, spaced repetition can scale to memorizing hundreds of thousands of items (while crammed items are almost immediately forgotten) and is especially useful for foreign languages & medical studies.
I review what this technique is useful for, some of the large research literature on it and the testing effect (up to ~2013, primarily), the available software tools and use patterns, and miscellaneous ideas & observations on it.
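To make the ‘expanding interval’ scheduling concrete, here is a toy sketch loosely in the style of the SM-2 family of algorithms (the constants are illustrative, not Anki’s or SuperMemo’s exact values):

```python
from dataclasses import dataclass

@dataclass
class Card:
    interval: float = 1.0   # days until the next review
    ease: float = 2.5       # multiplier, grown/shrunk by performance

def review(card: Card, recalled: bool) -> Card:
    """Expanding-interval scheduling: each successful recall multiplies the
    gap before the next review; a failure resets the card for relearning."""
    if recalled:
        card.interval *= card.ease
        card.ease = min(card.ease + 0.1, 3.0)
    else:
        card.interval = 1.0
        card.ease = max(card.ease - 0.2, 1.3)
    return card

# Four straight successes push the gap from 1 day to 2.5, 6.5, ~18, then ~49
# days: this is how a few minutes a day can maintain hundreds of thousands of items.
```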
Quantum computing is the use of quantum phenomena such as superposition and entanglement to perform computation. Computers that perform quantum computations are known as quantum computers. Quantum computers are believed to be able to solve certain computational problems, such as integer factorization, substantially faster than classical computers. The study of quantum computing is a subfield of quantum information science.
There is a commonly held belief that Helvetica is the signage typeface of the New York City subway system, a belief reinforced by Helvetica, Gary Hustwit’s popular 2007 documentary about the typeface. But it is not true—or rather, it is only somewhat true. Helvetica is the official typeface of the MTA today, but it was not the typeface specified by Unimark International when it created a new signage system at the end of the 1960s. Why was Helvetica not chosen originally? What was chosen in its place? Why is Helvetica used now, and when did the changeover occur? To answer those questions this essay explores several important histories: of the New York City subway system, transportation signage in the 1960s, Unimark International and, of course, Helvetica. These four strands are woven together, over nine pages, to tell a story that ultimately transcends the simple issue of Helvetica and the subway.
…The sign system that Noorda and Vignelli first proposed to the NYCTA in 1966 has proved remarkably resilient. It endures today despite a number of severe changes that make one wonder if it can even be attributed to them and Unimark anymore. Their modular system survives but only as graphic units rather than physical components. The black stripe, mistakenly created by the sign shop but then integrated into the 1970 standards manual, exists in a variety of colors and iterations. The black-on-white color scheme is now reversed. The colored disks are still used—some with the original artwork—but the colors themselves have changed. Finally, Standard Medium has given way to Helvetica Medium—or, more accurately, to Neue Helvetica 65. Yet not only is the Unimark DNA still in evidence but it has served as the basis for a much broader transportation system identity. So, the answer to whether or not Helvetica is the typeface of the New York City subway system is that it is—but that it was not.
David Volfovich Chudnovsky and Gregory Volfovich Chudnovsky are American mathematicians and engineers known for their world-record mathematical calculations and developing the Chudnovsky algorithm used to calculate the digits of π with extreme precision.
A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them. The result is a high-performance parallel computing cluster from inexpensive personal computer hardware.
The Chudnovsky algorithm is a fast method for calculating the digits of π, based on Ramanujan’s π formulae. It was published by the Chudnovsky brothers in 1988 and was used in the world record calculations of 2.7 trillion digits of π in December 2009, 10 trillion digits in October 2011, 22.4 trillion digits in November 2016, 31.4 trillion digits in September 2018–January 2019, and 50 trillion digits on January 29, 2020.
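For reference, the underlying series (each additional term contributes roughly 14 more decimal digits of π, which is what makes such record computations feasible):

$$\frac{1}{\pi} = 12\sum_{q=0}^{\infty} \frac{(-1)^q\,(6q)!\,(545140134\,q + 13591409)}{(3q)!\,(q!)^3\,640320^{3q+3/2}}$$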
The Hunt of the Unicorn or the Unicorn Tapestries is a series of European tapestries dating from the late Middle Ages. This series of seven tapestries now in The Cloisters in New York was possibly made – or at least designed – in Paris at the turn of the sixteenth century. They are one of the canonical works of Late Middle Ages/Early Renaissance art and show a group of noblemen and hunters in pursuit of a unicorn through an idealised French landscape. The tapestries were woven in wool, metallic threads, and silk. The vibrant colours, still evident today, were produced from dye plants: weld (yellow), madder (red), and woad (blue).
Web Authentication (WebAuthn) is a web standard published by the World Wide Web Consortium (W3C). WebAuthn is a core component of the FIDO2 Project under the guidance of the FIDO Alliance. The goal of the project is to standardize an interface for authenticating users to web-based applications and services using public-key cryptography.
A SIM swap scam is a type of account takeover fraud that generally targets a weakness in two-factor authentication and two-step verification in which the second factor or step is a text message (SMS) or call placed to a mobile telephone.
Multi-factor authentication is an electronic authentication method in which a computer user is granted access to a website or application only after successfully presenting two or more pieces of evidence to an authentication mechanism: knowledge, possession, and inherence. It protects the user from an unknown person trying to access their data such as personal ID details or financial assets.
[Parsing CRAN to see which of R’s strange set of features are actually used in the real world—not laziness or its weirdo context-dependent scoping, it turns out.]
R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features.
…Corpus Gathering: We curated a large corpus of R programs composed of over 1000 executable R packages from the Bioconductor and CRAN repositories, as well as hand-picked end-user code and small performance benchmark programs that we wrote ourselves.
Implementation Evaluation: We evaluate the status of the R implementation. While its speed is not acceptable for use in production systems, many end users report being vastly more productive in R than in other languages. R is decidedly single-threaded, its semantics has no provisions for concurrency, and its implementation is hopelessly non-thread safe. Memory usage is also an issue; even small programs have been shown to use immoderate amounts of heap for data and meta-data. Improving speed and memory usage will require radical changes to the implementation, and a tightening of the language definition.
Language Evaluation: We examine the usage and adoption of different language features. R permits many programming styles, access to implementation details, and little enforcement of data encapsulation. Given the large corpus at hand, we look at the usage impacts of these design decisions.
…Given the nature of R, many numerical functions are written in C or Fortran; one could thus expect execution time to be dominated by native libraries. The time spent in calls to foreign functions, on average 22%, shows that this is clearly not the case.
…As a language, R is like French; it has an elegant core, but every rule comes with a set of ad-hoc exceptions that directly contradict it.
In programming language theory, lazy evaluation, or call-by-need, is an evaluation strategy which delays the evaluation of an expression until its value is needed and which also avoids repeated evaluations (sharing). The sharing can reduce the running time of certain functions by an exponential factor over other non-strict evaluation strategies, such as call-by-name, which blindly re-evaluates the same expression each time it is needed, regardless of whether the result could be memoized.
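A toy illustration of call-by-need in Python (which is strict, so the laziness must be simulated with an explicit thunk); the cached result is the ‘sharing’ that distinguishes call-by-need from call-by-name:

```python
class Thunk:
    """Call-by-need: delay evaluation until forced, then cache (share) the result."""
    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:            # the expression is evaluated at most once...
            self._value = self._compute()
            self._done = True
            self._compute = None      # drop the closure so it can be collected
        return self._value            # ...and the result is shared thereafter

t = Thunk(lambda: print('computing...') or 42)
t.force()   # prints 'computing...' and returns 42
t.force()   # returns the cached 42 without re-evaluating
```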
In computer programming, the scope of a name binding—an association of a name to an entity, such as a variable—is the part of a program where the name binding is valid, that is where the name can be used to refer to the entity. In other parts of the program the name may refer to a different entity, or to nothing at all. The scope of a name binding is also known as the visibility of an entity, particularly in older or more technical literature—this is from the perspective of the referenced entity, not the referencing name.
This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called 'the end-to-end argument', suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. Examples discussed in the paper include bit error recovery, security using encryption, duplicate message suppression, recovery from system crashes, and delivery acknowledgement. Low level mechanisms to support these functions are justified only as performance enhancements.
Every couple of weeks I get questions along the lines of “should I checksum application files, given that the disk already has error correction?” or “given that TCP/IP has error correction on every communications packet, why do I need to have application level network error detection?” Another frequent question is “non-ECC mother boards are much cheaper—do we really need ECC on memory?” The answer is always yes. At scale, error detection and correction at lower levels fails to correct or even detect some problems. Software stacks above introduce errors. Hardware introduces more errors. Firmware introduces errors. Errors creep in everywhere and absolutely nobody and nothing can be trusted. Over the years, each time I have had an opportunity to see the impact of adding a new layer of error detection, the result has been the same. It fires fast and it fires frequently. In each of these cases, I predicted we would find issues at scale. But, even starting from that perspective, each time I was amazed at the frequency the error correction code fired.
On one high scale, on-premise server product I worked upon, page checksums were temporarily added to detect issues during a limited beta release. The code fired constantly, and customers were complaining that the new beta version was “so buggy they couldn’t use it”. Upon deep investigation at some customer sites, we found the software was fine, but each customer had one, and sometimes several, latent data corruptions on disk. Perhaps it was introduced by hardware, perhaps firmware, or possibly software. It could even have been corruption introduced by one of our previous releases when those pages were last written. Some of these pages may not have been written for years. I was amazed at the amount of corruption we found and started reflecting on how often I had seen “index corruption” or other reported product problems that were probably corruption introduced in the software and hardware stacks below us. The disk has complex hardware and hundreds of thousands of lines of code, while the storage area network has complex data paths and over a million lines of code. The device driver has tens of thousands of lines of code. The operating system has millions of lines of code. And our application had millions of lines of code. Any of us can screw up, each has an opportunity to corrupt, and it’s highly likely that the entire aggregated millions of lines of code have never been tested in precisely the combination and on the hardware that any specific customer is actually currently running.
…This incident reminds us of the importance of never trusting anything from any component in a multi-component system. Checksum every data block, and have well-designed and well-tested failure modes for even unlikely events. Rather than have complex recovery logic for the near-infinite number of faults possible, have simple, brute-force recovery paths that you can use broadly and test frequently. Remember that all hardware, all firmware, and all software have faults and introduce errors. Don’t trust anyone or anything. Have test systems that inject bit flips and corruption, and ensure the production system can operate through these faults—at scale, rare events are amazingly common.
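A minimal sketch of this end-to-end discipline, assuming nothing about the reliability of the layers below (the block format here is hypothetical):

```python
import hashlib
import struct

def write_block(f, payload: bytes) -> None:
    """Prefix each block with its length and SHA-256, so corruption anywhere
    in the stack below (disk, firmware, driver, OS) is detectable on read."""
    f.write(struct.pack('>I', len(payload)))
    f.write(hashlib.sha256(payload).digest())
    f.write(payload)

def read_block(f) -> bytes:
    length = struct.unpack('>I', f.read(4))[0]
    digest = f.read(32)
    payload = f.read(length)
    if hashlib.sha256(payload).digest() != digest:
        raise IOError('block checksum mismatch: corruption somewhere below the application')
    return payload
```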
The end-to-end principle is a design framework in computer networking. In networks designed according to this principle, application-specific features reside in the communicating end nodes of the network, rather than in intermediary nodes, such as gateways and routers, that exist to establish the network.
Personal travel appears to be much more under the control of basic instincts than of economic drives. This may be the reason for the systematic mismatch between the results of cost benefit analysis and the actual behavior of travelers. In this paper we put together a list of the basic instincts that drive and contain travelers’ behavior, showing how they mesh with technological progress and economic constraints.
…the empirical conclusion reached by Zahavi is that all over the world the mean exposure time for man is around one hour per day.
…When introducing mechanical transportation with speeds higher than 5 km/hr, the physical size of the city can grow in proportion, as the historical analysis applied to the city of Berlin clearly shows (Figure 2). The commuting fields, based on cars, of a dozen American cities are reported in Figure 3. On the same chart and to the same scale, the Greek villages of Figure 1 are shown in schematic form. Cars make all the difference. As they have a speed of 6 or 7 times greater than a pedestrian, they expand daily connected space 6 or 7 times in linear terms, or about 50 times in area. Ancient cities typically had a maximum population of about 1 million people. Today the population may tend to reach 50 million people in conurbations like Mexico City (Figure 4), with a population density equal to that of Hadrian’s Rome. If the Japanese complete a Shinkansen Maglev (a magnetically levitated train) connecting Tokyo to Osaka in less than one hour with a large transportation capacity, then we may witness a city of 100 million people. If we expand the reasoning, we can muse about a city of 1 billion people, which would require an efficient transportation system with a mean speed of only 150 km/hr.
…There is another fundamental observation made by Zahavi that links instincts and money. Because of its generality it could be dubbed a money instinct. People spend about 13% of their disposable income on traveling. The percentage is the same in Germany or Canada, now or in 1930. Within this budget, time and money are allocated between the various modes of transport available to the traveller in such a way as to maximize mean speed. The very poor man walks and makes 5 km/day, the very rich man flies and makes 500 km/day. The rest sit in between. People owning a car use it for about one hour a day (Figure 12) and travel about 50 km/day (Figure 13). People who do not have a car spend less than 13% of their disposable income, however, presumably because public services are underrated and consequently there is no possibility of spending that share of income traveling one hour per day (Figure 14). Contrary to the risk of all this “exposure,” the number of people killed by road traffic seems to be invariant to the number of vehicles (Figure 15).
Technology introduces faster and faster means of transportation, which also are more expensive in terms of time of use. These new technologies are introduced roughly every 55 years in tune with the Kondratiev cycle. Their complete adoption takes about 100 years (Figure 16). We are now in the second Kondratiev for cars and most mobility comes from them. It was about 10 km/day earlier, and is now about 40 km/day. Airplanes are making inroads into this situation and they promise to bring the next leap forward in mobility, presumably with the help of Maglev trains. Hypersonic airplanes promise to glue the world into a single territory: the famous global village.
Dark Net Markets (DNMs) are websites found on the Dark Net that facilitate the anonymous trade of illegal items such as drugs and weapons. Despite repeated law enforcement interventions on DNMs, the ecosystem has continued to grow since the first DNM, Silk Road, in 2011. This research project investigates the resilience of the ecosystem and tries to understand which characteristics allow it to evade law enforcement.
This thesis comprises three studies. The first uses a dataset containing publicly available, scraped data from 34 DNMs to quantitatively measure the impact of a large-scale law enforcement operation, Operation Onymous, on the vendor population. This impact is compared to the impact of the closure of the DNM Evolution in an exit scam. For both events, the impact on different vendor populations (for example, those who are directly affected and those who aren’t) is compared and the characteristics that make vendors resilient to each event are investigated.
In the second study, a dataset acquired from the server of the DNM Silk Road 2.0 [by UK LEA] is used to better understand the relationships between buyers and vendors. Network analysis and statistical techniques are used to explore when buyers trade and who with. This dataset is also used to measure the impact of a hack on Silk Road 2.0 on its population.
In the final study, discussions from the forum site Reddit were used to qualitatively assess user perceptions of two law enforcement interventions. These interventions were distinct in nature—one, Operation Hyperion, involved warning users and arresting individuals and the second, Operation Bayonet, actively closed a DNM. Grounded Theory was used to identify topics of conversation and directly compare the opinions held by users on each intervention.
These studies were used to evaluate hypotheses incorporated into two models of resilience. One model focuses on individual users and one on the ecosystem as a whole. The models were then used to discuss current law enforcement approaches on combating DNMs and how they might be improved.
In the first study of this thesis, several methodologies for data preparation and validation within the study of DNMs were developed. In particular, this work presents a new technique for validating a publicly available dataset that has been used in multiple studies in this field. This is the first attempt to formally validate the dataset and determine what can reasonably be used for research. The discussion of the dataset has implications for research already using the dataset and future research on datasets collected using the same methodology.
In order to conduct the second study in this thesis, a dataset was acquired from a law enforcement agency. This dataset gives new insight into how buyers behave on DNMs. Buyers are an understudied group because their activities are often hidden, so analysis of this dataset reveals new insights into their behaviour. The results of this study have been used to comment on existing work using less complete datasets and contribute new findings.
The third study in this thesis presents a qualitative analysis of two law enforcement interventions. This is the first work to assess the impact of either intervention and so provides new insights into how they were received by the DNM ecosystem. It uses qualitative techniques which are rare within this discipline and so provides a different perspective, for example by revealing how individuals perceive the harms of law enforcement interventions on DNMs. The value of this work has been recognised through its acceptance at a workshop at the IEEE European Symposium on Security and Privacy, 2019.
Part of this research has been conducted in consultation with a [UK] law enforcement agency who provided data for this research. The results of this research are framed specifically for this agency and other law enforcement groups currently investigating DNMs. Several suggestions are made on how to improve the efficacy of law enforcement interventions on DNMs.
…A response to the criticisms of Dolliver (2015a) has been presented in Dolliver (2015b). Here, Dolliver (2015b) attempts to provide further evidence that Silk Road 2.0 overestimated the number of listings advertised by including the results of a manual inspection of the site. The response also calls into question the use of the Branwen dataset which was collected by an independent researcher and has not been peer-reviewed. Dolliver (2015b) claims that the “manually crawling approach” adopted by Van Buskirk et al. (2015) is also problematic as it will miss listings that are uploaded and removed during the time it takes to crawl the site. Finally, other, unpublished datasets cited in Dolliver (2015b) also point to Silk Road 2.0 being especially volatile in nature before it was closed down and show that the number of listings varied by thousands from week to week. This volatility could potentially explain the contradicting depictions of Silk Road 2.0 given by Dolliver (2015a) and Munksgaard et al. (2016) and allow for both studies to have accurately described the site. However, empirical evidence in the form of police reports that describe the size of Silk Road 2.0 after its closure shows that the data collected by Dolliver (2015a) is an underestimate. Indeed, new data presented in this body of work also demonstrates that Silk Road 2.0 was bigger than Dolliver (2015a) claims, even at the beginning of its lifetime.
Using a Markov Chain Monte Carlo optimization algorithm and a computer simulation, I find the passenger ordering which minimizes the time required to board the passengers onto an airplane. The model that I employ assumes that the time that a passenger requires to load his or her luggage is the dominant contribution to the time needed to completely fill the aircraft. The optimal boarding strategy may reduce the time required to board an airplane by over a factor of four and possibly more depending upon the dimensions of the aircraft. In addition, knowledge of the optimal boarding procedure can inform decisions regarding changes to methods that are employed by a particular carrier. I explore some of the salient features of the optimal boarding method and discuss practical modifications to it. Finally, I mention some of the benefits that could come from implementing an improved passenger boarding scheme.
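As a sketch of the general approach (not the paper’s actual simulation): a deliberately crude single-aisle boarding model, with Metropolis-style acceptance over random passenger swaps:

```python
import math
import random

ROWS, STOW = 30, 3   # one passenger per row; stowing luggage takes 3 ticks

def boarding_time(order):
    """Crude single-aisle model: each tick a passenger steps forward one row
    if the aisle ahead is clear; at their own row they stow, then sit."""
    n = len(order)
    pos = [-i for i in range(n)]        # queue strung out behind the door (row 0)
    stow = [STOW] * n
    seated = [False] * n
    t = 0
    while not all(seated):
        t += 1
        occupied = {pos[i] for i in range(n) if not seated[i]}
        for i in range(n):              # front of the queue moves first
            if seated[i]:
                continue
            if pos[i] == order[i]:      # arrived at own row
                stow[i] -= 1
                if stow[i] == 0:
                    seated[i] = True
                    occupied.discard(pos[i])
            elif pos[i] + 1 not in occupied:
                occupied.discard(pos[i])
                pos[i] += 1
                occupied.add(pos[i])
    return t

def optimize(steps=5000, temp=2.0):
    """Metropolis search over boarding orders: always accept improvements,
    occasionally accept worse orders to escape local minima."""
    order = list(range(ROWS))
    random.shuffle(order)
    t_cur = boarding_time(order)
    best, t_best = list(order), t_cur
    for _ in range(steps):
        i, j = random.sample(range(ROWS), 2)
        order[i], order[j] = order[j], order[i]      # propose: swap two passengers
        t_new = boarding_time(order)
        if t_new <= t_cur or random.random() < math.exp((t_cur - t_new) / temp):
            t_cur = t_new
            if t_new < t_best:
                best, t_best = list(order), t_new
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best, t_best
```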
A Vickrey auction is a type of sealed-bid auction. Bidders submit written bids without knowing the bid of the other people in the auction. The highest bidder wins but the price paid is the second-highest bid. This type of auction is strategically similar to an English auction and gives bidders an incentive to bid their true value. The auction was first described academically by Columbia University professor William Vickrey in 1961 though it had been used by stamp collectors since 1893. In 1797 Johann Wolfgang von Goethe sold a manuscript using a sealed-bid, second-price auction.
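The mechanism fits in a few lines (a sketch; ties and reserve prices ignored):

```python
def vickrey(bids: dict) -> tuple:
    """Sealed-bid second-price auction: the highest bidder wins,
    but pays only the second-highest bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    return ranked[0], bids[ranked[1]]

print(vickrey({'alice': 120, 'bob': 100, 'carol': 80}))  # ('alice', 100)
```

Because the price is set by the runner-up rather than the winner, shading one’s bid below one’s true value can only lose auctions one would have profitably won, which is why truthful bidding is a dominant strategy.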
Computer keyboards can be classified by the switch technology that they use. Computer alphanumeric keyboards typically have 80 to 110 durable switches, generally one for each key. The choice of switch technology affects key response and pre-travel (the distance a key moves before actuating). Some newer keyboard models use hybrids of various technologies to achieve greater cost savings.
[A look into the signature typefaces of Evangelion: Matisse EB, mechanical compression for distorted resizing, and title cards. Covered typefaces: Matisse/Helvetica/Neue Helvetica/Times/Helvetica Condensed/Chicago/Cataneo/Futura/Eurostile/ITC Avant Garde Gothic/Gill Sans.]
Evangelion was among the first anime to create a consistent typographic identity across its visual universe, from title cards to NERV’s user interfaces. Subcontractors usually painted anything type-related in an anime by hand, so it was a novel idea at the time for a director to use desktop typesetting to exert typographic control. Although sci-fi anime tended to use either sans serifs or hand lettering that mimicked sans serifs in 1995, Anno decided to buck that trend, choosing a display serif for stronger visual impact. After flipping through Fontworks’ specimen catalog, he personally selected the extra-bold (EB) weight of Matisse (マティス), a Mincho-style serif family…A combination of haste and inexperience gave Matisse a plain look and feel, which turned out to make sense for Evangelion. The conservative skeletal construction restrained the characters’ personality so it wouldn’t compete with the animation; the extreme stroke contrast delivered the desired visual punch. Despite the fact that Matisse was drawn on the computer, many of its stroke corners were rounded, giving it a hand-drawn, fin-de-siècle quality.
…In addition to a thorough graphic identity, Evangelion also pioneered a deep integration of typography as a part of animated storytelling—a technique soon to be imitated by later anime. Prime examples are the show’s title cards and flashing type-only frames mixed in with the animation. The title cards contain nothing but crude, black-and-white Matisse EB, and are often mechanically compressed to fit into interlocking compositions. This brutal treatment started as a hidden homage to the title cards in old Toho movies from the sixties and seventies, but soon became visually synonymous with Evangelion after the show first aired. Innovating on the media of animated storytelling, Evangelion also integrates type-only flashes. Back then, these black-and-white, split-second frames were Anno’s attempt at imprinting subliminal messages onto the viewer, but have since become Easter eggs for die-hard Evangelion fans as well as motion signatures for the entire franchise.
…Established in title cards, this combination of Matisse EB and all-caps Helvetica soon bled into various aspects of Evangelion, most notably the HUD user interfaces in NERV. Although it would be possible to attribute the mechanical compression to technical limitations or typographic ignorance, its ubiquitous occurrence did evoke haste and, at times, despair—an emotional motif perfectly suited to a post-apocalyptic story with existentialist themes.
In films, an intertitle, also known as a title card, is a piece of filmed, printed text edited into the midst of the photographed action at various points. Intertitles used to convey character dialogue are referred to as "dialogue intertitles", and those used to provide related descriptive/narrative material are referred to as "expository intertitles". In modern usage, the terms refer to similar text and logo material inserted at or near the start of films and television shows.
Roar is a 1981 American adventure comedy film written, produced, and directed by Noel Marshall. Roar's story follows Hank, a naturalist who lives on a nature preserve in Africa with lions, tigers, and other big cats. When his family visits him, they are instead confronted by the group of animals. The film stars Marshall as Hank, his real-life wife Tippi Hedren as his wife Madeleine, with Hedren's daughter Melanie Griffith and Marshall's sons John and Jerry Marshall in supporting roles.
The Beginning Was the End is a 1971 pseudo-scientific book written by Oscar Kiss Maerth that claims that humankind evolved from cannibalistic apes. Its premise:
One ape discovered that eating the fresh brain of one's own kind increases the sexual impulses. He and his descendants became addicted to brains and hunted for them. It was not until later that they noticed that their intelligence increased as a result. The outcome of this process is HOMO SAPIENS.
Old Internet users will remember Rotten.com. I didn’t much care for the main site, but I enjoyed their writeups in the ‘Rotten Library’ section. The website has been offline for years now and shows no sign of coming back, so I have put up a mirror of the Rotten Library (What’s New).
(I used zscole’s archive, compressed the JPEGs, and rewrote all the absolute links to make it work on Gwern.net, and fixed a few errors I found along the way—principally broken links and links to entries which appear to’ve never been written.)
Rotten.com was a shock site with the tagline "An archive of disturbing illustration," active from 1996 to 2012. It was devoted to morbid curiosities, pictures of violent acts, deformities, autopsy or forensic photographs, depictions of perverse sex acts, and disturbing or misanthropic historical curiosities. Founded in 1996, it was operated by Soylent Communications. Site updates slowed in 2009, with the final update in February 2012. The website's front page was last archived in January 2018.
Edward Rolf Tufte is an American statistician and professor emeritus of political science, statistics, and computer science at Yale University. He is noted for his writings on information design and as a pioneer in the field of data visualization.
Dating back to medieval manuscripts, text has often been highlighted using a particular distinct bright red. The contrast of black and red on a white background is highly visible and striking, and this has been reused many times, in a way which I have not noticed for other colors. I call these uses rubrication and collate examples I have noticed from many time periods. This design pattern does not seem to have a widely-accepted name or be commonly discussed, so I propose extending the term “rubrication” to all instances of this pattern, not merely religious texts.
Why this rubrication design pattern? Why red, specifically, and not, say, orange or purple? Is it just a historical accident? Cross-cultural research suggests that for humans, red may be intrinsically more noticeable & has a higher contrast with black, explaining its perennial appeal as a design pattern.
Regardless, it is a beautiful design pattern which has been used in many interesting ways over the millennia, and perhaps may inspire the reader.
Manon is an opéra comique in five acts by Jules Massenet to a French libretto by Henri Meilhac and Philippe Gille, based on the 1731 novel L’histoire du chevalier des Grieux et de Manon Lescaut by the Abbé Prévost. It was first performed at the Opéra-Comique in Paris on 19 January 1884, with sets designed by Eugène Carpezat, Auguste Alfred Rubé and Philippe Chaperon, and Jean-Baptiste Lavastre.
Review of Manon, a French opera about a beautiful countryside girl whose craving for the ‘good life’ leads her into the Parisian demimondaine as a courtesan.
Exemplifying Girardian mimesis, Manon wants what everyone else wants, and wants what she can’t have, like her spurned lover only once he has taken religious vows. She plays off suitors, who compete in negative-sum games for her favors, until eventually she goes too far and is imprisoned, destroying her health; cast down, she realizes that ‘living only for pleasure’ was not the ideal life.
This scenario seems to exemplify the extent to which polygynous competition can result in negative-sum games, making almost everyone worse off except a few winners (and those possibly only temporarily).
Turandot is an opera in three acts by Giacomo Puccini, posthumously completed by Franco Alfano in 1926, and set to a libretto in Italian by Giuseppe Adami and Renato Simoni. Its best-known aria is "Nessun dorma".
Fairy tale opera: a despotic Oriental princess chops off the heads of suitors if they cannot answer her riddle. A random prince happens to do so, sets her a counter-riddle, she fails, he tells her the answer, she falls in love with him for no reason, The End. Yeah, pretty dumb. Some amazing costumes and sets, though.
Subscription page for the monthly gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives since December 2013.
Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator’s input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128×128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Fréchet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.6.
Thanks to the recent development of deep generative models, it is becoming possible to generate high-quality images with both fidelity and diversity. However, the training of such generative models requires a large dataset. To reduce the amount of data required, we propose a new method for transferring prior knowledge of the pre-trained generator, which is trained with a large dataset, to a small dataset in a different domain. Using such prior knowledge, the model can generate images leveraging some common sense that cannot be acquired from a small dataset. In this work, we propose a novel method focusing on the parameters for batch statistics, scale and shift, of the hidden layers in the generator. By training only these parameters in a supervised manner, we achieved stable training of the generator, and our method can generate higher quality images compared to previous methods without collapsing, even when the dataset is small (~100). Our results show that the diversity of the filters acquired in the pre-trained generator is important for the performance on the target domain. Our method makes it possible to add a new class or domain to a pre-trained generator without disturbing the performance on the original domain.
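A rough PyTorch sketch of the idea as the abstract describes it (not the authors’ released code): freeze the pretrained generator and train only the scale & shift parameters of its normalization layers on the small target dataset:

```python
import torch.nn as nn

def batch_stats_params(generator: nn.Module) -> list:
    """Freeze the whole pretrained generator, then re-enable gradients only
    for the scale (weight) & shift (bias) of its normalization layers."""
    for p in generator.parameters():
        p.requires_grad = False
    trainable = []
    for m in generator.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.InstanceNorm2d)):
            for p in (m.weight, m.bias):
                if p is not None:
                    p.requires_grad = True
                    trainable.append(p)
    return trainable  # pass just these to the optimizer; the rest of G stays fixed
```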
[Waifu Labs is an interactive website for generating (1024px?) anime faces using a customized StyleGAN trained on Danbooru2018. Similar to Artbreeder, it supports face exploration and face editing, and at the end, a user can purchase prints of a particular face.]
We taught a world-class artificial intelligence how to draw anime. All the drawings you see were made by a non-human artist! Wild, right? It turns out machines love waifus almost as much as humans do. We proudly present the next chapter of human history: lit waifu commissions from the world's smartest AI artist. In less than 5 minutes, the artist learns your preferences to make the perfect waifu just for you.
A single case study recently documented one woman’s ability to recall accurately vast amounts of autobiographical information, spanning most of her lifetime, without the use of practiced mnemonics (Parker et al 2006). The current study reports findings based on 11 participants expressing this same memory ability, now referred to as Highly Superior Autobiographical Memory (HSAM). Participants were identified and subsequently characterized based on screening for memory of public events. They were then tested for personal autobiographical memories as well as for memory assessed by laboratory memory tests. Additionally, whole-brain structural MRI scans were obtained. Results indicated that HSAM participants performed significantly better at recalling public as well as personal autobiographical events as well as the days and dates on which these events occurred. However, their performance was comparable to age- and sex-matched controls on most standard laboratory memory tests. Neuroanatomical results identified nine structures as being morphologically different from those of control participants. The study of HSAM may provide new insights into the neurobiology of autobiographical memory.
Heartbleed is a security bug in the OpenSSL cryptography library, which is a widely used implementation of the Transport Layer Security (TLS) protocol. It was introduced into the software in 2012 and publicly disclosed in April 2014. Heartbleed may be exploited regardless of whether the vulnerable OpenSSL instance is running as a TLS server or client. It results from improper input validation in the implementation of the TLS heartbeat extension. Thus, the bug's name derives from heartbeat. The vulnerability is classified as a buffer over-read, a situation where more data can be read than should be allowed.
Shellshock, also known as Bashdoor, is a family of security bugs in the Unix Bash shell, the first of which was disclosed on 24 September 2014. Shellshock could enable an attacker to cause Bash to execute arbitrary commands and gain unauthorized access to many Internet-facing services, such as web servers, that use Bash to process requests.
Helvetica is an independent feature-length documentary film about typography and graphic design, centered on the eponymous typeface. Directed by Gary Hustwit, it was released in 2007 to coincide with the 50th anniversary of the typeface's introduction in 1957 and is considered the first of the Design Trilogy by the director.
The Metropolitan Transportation Authority (MTA) is a public benefit corporation responsible for public transportation in the New York City metropolitan area of the U.S. state of New York. The MTA is the largest public transit authority in the United States, serving 12 counties in Downstate New York, along with two counties in southwestern Connecticut under contract to the Connecticut Department of Transportation, carrying over 11 million passengers on an average weekday systemwide, and over 850,000 vehicles on its seven toll bridges and two tunnels per weekday.
Unimark International was an international design firm headquartered in Chicago, Illinois. It was founded in 1965 by six partners: Ralph Eckerstrom, Massimo Vignelli, Bob Noorda, James Fogelman, Wally Gutches, and Larry Klein. Although they were not listed as founding partners, Jay Doblin and Robert Moldafsky joined the new firm almost immediately. Initially, Unimark had three offices: Chicago, Milan and New York. Additional offices opened around the world, but these were often short-lived as the client base and funding varied, and as American and global economic issues influenced the viability of each office.
The New York City Subway is a rapid transit system owned by the City of New York and leased to the New York City Transit Authority, a subsidiary agency of the state-run Metropolitan Transportation Authority (MTA). Opened on October 27, 1904, the New York City Subway is one of the world's oldest public transit systems, one of the most-used, and the one with the most stations. The New York City Subway is the largest rapid transit system in the world by number of stations, with 472 stations in operation. Stations are located throughout the boroughs of Manhattan, Brooklyn, Queens, and the Bronx.
Massimo Vignelli was an Italian designer who worked in a number of areas ranging from package design through houseware design and furniture design to public signage and showroom design. He was the co-founder of Vignelli Associates, with his wife, Lella. His ethos was, "If you can design one thing, you can design everything," and this was reflected in the broad range of his work.
Generative neural networks, such as GANs, have struggled for years to generate decent-quality anime faces, despite their great success with photographic imagery such as real human faces. The task has now been effectively solved, for anime faces as well as many other domains, by the development of a new generative adversarial network, StyleGAN, whose source code was released in February 2019.
The appendix gives samples of my failures with earlier GANs for anime face generation, and I provide samples & model from a relatively large-scale BigGAN training run suggesting that BigGAN may be the next step forward to generating full-scale anime images.
A minute of reading could save an hour of debugging!
I continue my AI poetry generation experiments with OpenAI’s 2020 GPT-3, which is 116× larger, and much more powerful, than the 2019 GPT-2. GPT-3, however, is not merely a quantitative tweak yielding “GPT-2 but better”—it is qualitatively different, exhibiting eerie runtime learning capabilities allowing even the raw model, with zero finetuning, to “meta-learn” many textual tasks purely by example or instruction. One does not train or program GPT-3 in a normal way, but one engages in dialogue and writes prompts to teach GPT-3 what one wants.
Experimenting through the OpenAI Beta API in June 2020, I find that GPT-3 does not just match my finetuned GPT-2-1.5b-poetry for poem-writing quality, but exceeds it, while being versatile in handling poetry, Tom Swifty puns, science fiction, dialogue like Turing’s Turing-test dialogue, literary style parodies… As the pièce de résistance, I recreate Stanislaw Lem’s Cyberiad’s “Trurl’s Electronic Bard” poetry using GPT-3. (Along the way, I document instances of how the BPE text encoding unnecessarily damages GPT-3’s performance on a variety of tasks, how to best elicit the highest-quality responses, common errors people make in using GPT-3, and test out GPT-3’s improvements in NN weak points like logic or commonsense knowledge.)
GPT-3’s samples are not just close to human level: they are creative, witty, deep, meta, and often beautiful. They demonstrate an ability to handle abstractions, like style parodies, I have not seen in GPT-2 at all. Chatting with GPT-3 feels uncannily like chatting with a human. I was impressed by the results reported in the GPT-3 paper, and after spending a week trying it out, I remain impressed.
This page records GPT-3 samples I generated in my explorations, and thoughts on how to use GPT-3 and its remaining weaknesses. I hope you enjoy them even a tenth as much as I enjoyed testing GPT-3 and watching the completions scroll across my screen.
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
“Language Models are Few-Shot Learners”, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei (2020-05-28):
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions—something which current NLP systems still largely struggle to do.
Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10× more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
This report describes AJ, a woman whose remembering dominates her life. Her memory is “nonstop, uncontrollable, and automatic.” AJ spends an excessive amount of time recalling her personal past with considerable accuracy and reliability. If given a date, she can tell you what she was doing and what day of the week it fell on. She differs from other cases of superior memory who use practiced mnemonics to remember vast amounts of personally irrelevant information.
We propose the name hyperthymestic syndrome, from the Greek word thymesis meaning remembering, and that AJ is the first reported case. [Since renamed Highly Superior Autobiographical Memory (HSAM).]
Danbooru2019 Portraits is a dataset of n = 302,652 (16GB) 512px anime faces cropped from ‘solo’ SFW Danbooru2019 images in a relatively broad ‘portrait’ style encompassing necklines/ears/hats/etc rather than tightly focused on the face, upscaled to 512px as necessary, and low-quality images deleted by manual review using Discriminator ranking. This dataset has been used for creating TWDNE.
Deep learning for computer vision relies on large annotated datasets. Classification/categorization has benefited from the creation of ImageNet, which classifies 1m photos into 1000 categories. But classification/categorization is a coarse description of an image which limits application of classifiers, and there is no comparably large dataset of images with many tags or labels which would allow learning and detecting much richer information about images. Such a dataset would ideally be >1m images with at least 10 descriptive tags each which can be publicly distributed to all interested researchers, hobbyists, and organizations. There are currently no such public datasets, as ImageNet, Birds, Flowers, and MS COCO fall short either on image or tag count or restricted distribution. I suggest that the “image boorus” be used. The image boorus are longstanding web databases which host large numbers of images which can be ‘tagged’ or labeled with an arbitrary number of textual descriptions; they were developed for and are most popular among fans of anime, who provide detailed annotations.
The best known booru, with a focus on quality, is Danbooru. We provide a torrent/rsync mirror which contains ~3tb of 3.69m images with 108m tag instances (of 392k defined tags, ~29/image) covering Danbooru from 2005-05-24–2019-12-31 (final ID: #3,734,659), providing the image files & a JSON export of the metadata. We also provide a smaller torrent of SFW images downscaled to 512×512px JPGs (295GB; 2,828,400 images) for convenience.
Our hope is that a Danbooru2019 dataset can be used for rich large-scale classification/tagging & learned embeddings, test out the transferability of existing computer vision techniques (primarily developed using photographs) to illustration/anime-style images, provide an archival backup for the Danbooru community, feed back metadata improvements & corrections, and serve as a testbed for advanced techniques such as conditional image generation or style transfer.
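As a concrete illustration of working with the metadata export, here is a minimal sketch which tabulates tag frequencies; it assumes a JSON-Lines layout (one post record per line) with a `tags` list per post, and the filename is hypothetical, so consult the dataset documentation for the actual layout.

```python
import json
from collections import Counter

# Minimal sketch, assuming a JSON-Lines metadata export where each line is
# one post record with a "tags" list of {"name": ...} objects; the filename
# is hypothetical, so check the dataset documentation for the real layout.
tag_counts = Counter()
with open("danbooru2019/metadata.json", encoding="utf-8") as f:
    for line in f:
        post = json.loads(line)
        tag_counts.update(tag["name"] for tag in post.get("tags", []))

print(tag_counts.most_common(20))  # the 20 most frequently-used tags
```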
Discussion of how to modify existing images with GANs. There are several possibilities: train a second NN to map an image back to its original encoding; run blackbox search over encodings, repeatedly tweaking a candidate encoding to approximate a target face; or the whitebox approach, directly backpropagating through the model from the image to the encoding while holding the model fixed. All of these have been implemented for StyleGAN, and a combination works best. There are even GUIs for editing StyleGAN anime faces!
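A minimal sketch of the whitebox approach, assuming a pretrained generator `G` (the `z_dim` attribute and plain pixel-space loss are illustrative stand-ins; real implementations typically add a perceptual loss):

```python
import torch
import torch.nn.functional as F

# Whitebox GAN inversion sketch: hold the pretrained generator G fixed and
# run gradient descent on the latent code z until G(z) matches the target.
def invert(G, target, steps=1000, lr=0.05):
    z = torch.randn(1, G.z_dim, requires_grad=True)  # initial guess at encoding
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(G(z), target)  # pixel-space loss for simplicity
        loss.backward()                  # gradients flow through G into z only
        opt.step()
    return z.detach()                    # an encoding approximating the target
```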
I explore BigGAN, another recent GAN with SOTA results on the most complex image domain tackled by GANs so far, ImageNet. BigGAN’s capabilities come at a steep compute cost, however. I experiment with 128px ImageNet transfer learning (successful) with ~6 GPU-days, and from-scratch 256px anime portraits of 1000 characters on an 8×2080ti machine for a month (mixed results). My BigGAN results are good but compromised by practical problems with the released BigGAN code base. While BigGAN is not yet superior to StyleGAN for many purposes, BigGAN-like approaches may turn out to be necessary to scale to whole anime images.
Compared to GPT-2, GPT-3 improves performance on character-level tasks like rhyming, alliteration, punning, anagrams or permutations, acrostic poems, and arithmetic less than expected, despite being very good at many other closely-related kinds of writing, like satire.
Why? A plausible explanation is an obscure technical detail: as a performance optimization, GPT does not see characters but sub-word chunks called “byte-pair encodings” (BPEs). Because GPTs never see characters, only opaque partial words which vary chaotically with the specific word and even the surrounding context, they cannot easily learn character-level aspects of language, like similar spellings or sounds, and are forced to learn such relationships much more indirectly, such as by brute-force memorization of pairs of words.
Some experiments with reformatting GPT-3’s poorest-performing tasks to avoid inconsistent BPE encodings of strings show small to large performance gains, consistent with this theory.
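The instability is easy to see directly; a small sketch using the GPT-2 BPE tokenizer from the HuggingFace `transformers` library (the exact token splits will depend on the vocabulary, so treat the outputs as illustrative):

```python
from transformers import GPT2TokenizerFast

# Tokenize variants of the 'same' word with the GPT-2 BPE vocabulary.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
for s in ["hello", " hello", "Hello", "HELLO"]:
    print(repr(s), "->", tok.tokenize(s))
# Case and leading whitespace change the token sequence entirely, so the
# model never sees a stable character-level representation from which to
# learn spellings, sounds, or rhymes.
```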
The GPT-3 neural network is so large a model in terms of power and dataset that it exhibits qualitatively different behavior: you do not apply it to a fixed set of tasks which were in the training dataset, requiring retraining on additional data if one wants to handle a new task (as one would have to retrain GPT-2); instead, you interact with it, expressing any task in terms of natural language descriptions, requests, and examples, tweaking the prompt until it “understands” & it meta-learns the new task based on the high-level abstractions it learned from the pretraining.
This is a rather different way of using a DL model, and it’s better to think of it as a new kind of programming, where the prompt is now a “program” which programs GPT-3 to do new things.
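As an illustration of what such a prompt “program” looks like, here is a hypothetical few-shot translation prompt in the style of the GPT-3 paper’s examples:

```python
# The entire 'program' is the prompt text: an instruction plus worked
# examples. GPT-3 is expected to complete the pattern (e.g. with "fromage")
# rather than being retrained on a translation dataset.
prompt = """Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# Editing the instruction or the examples 'reprograms' the model to a new task.
```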
The Discriminator of a GAN is trained to detect outliers or bad datapoints. So it can be used for cleaning the original dataset of aberrant samples. This works reasonably well and I obtained BigGAN/StyleGAN quality improvements by manually deleting the worst samples (typically badly-cropped or low-quality faces), but has peculiar behavior which indicates that the Discriminator is not learning anything equivalent to a “quality” score but may be doing some form of memorization of specific real datapoints. What does this mean for how GANs work?
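A minimal sketch of Discriminator ranking for data cleaning, assuming a pretrained discriminator `D` that returns a realism score per image (the names are stand-ins, not any specific GAN codebase’s API):

```python
import torch

# Score every image with the frozen Discriminator and sort worst-first, so a
# human can review and delete the lowest-ranked (likely aberrant) samples.
@torch.no_grad()
def rank_by_discriminator(D, images, filenames):
    scores = D(images).squeeze(1).tolist()   # higher = judged more 'real'
    return sorted(zip(scores, filenames))    # ascending: worst samples first

# The deletion pass stays manual: per the observations above, the scores
# behave more like memorization of specific real datapoints than a true
# 'quality' metric, so they are only a guide for human review.
```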
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.
We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset—matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples.
The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text.
These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
In February 2019, following up on my 2015–2016 text-generation experiments with char-RNNs, I experiment with the cutting-edge Transformer NN architecture for language modeling & text generation. Using OpenAI’s GPT-2-117M model pre-trained on a large Internet corpus and nshepperd’s finetuning code, I retrain GPT-2-117M on a large (117MB) Project Gutenberg poetry corpus. I demonstrate how to train 2 variants: “GPT-2-poetry”, trained on the poems as a continuous stream of text, and “GPT-2-poetry-prefix”, with each line prefixed with the metadata of the PG book it came from. In May 2019, I trained the next-largest GPT-2, GPT-2-345M, similarly, for a further quality boost in generated poems. In October 2019, I retrained GPT-2-117M on a Project Gutenberg corpus with improved formatting, and combined it with a contemporary poem dataset based on the Poetry Foundation’s website.
With just a few GPU-days on 1080ti GPUs, GPT-2-117M finetuning can produce high-quality poetry which is more thematically consistent than my char-RNN poems, capable of modeling subtle features like rhyming, and sometimes even a pleasure to read. I list the many possible ways to improve poem generation and further approach human-level poems. For the highest-quality AI poetry to date, see my followup page, “GPT-3 Creative Writing”.
Char-RNNs are unsupervised generative models which learn to mimic text sequences. I suggest extending char-RNNs with inline metadata such as genre or author prefixed to each line of input, allowing for better & more efficient use of metadata, and more controllable sampling of generated output by feeding in the desired metadata. A 2015 experiment using torch-rnn on a set of ~30 Project Gutenberg e-books (1 per author) to train a large char-RNN shows that a char-RNN can learn to remember metadata such as authors, learn associated prose styles, and often generate text visibly similar to that of a specified author.
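The data transformation itself is simple; a minimal sketch (the tag format and corpus structure here are hypothetical):

```python
# Prefix each line of each author's text with an author tag, so the char-RNN
# learns to associate the tag with that author's style; at sampling time,
# seeding generation with e.g. "AUSTEN|" steers output toward that style.
def tag_corpus(texts_by_author):
    lines = []
    for author, text in texts_by_author.items():
        for line in text.splitlines():
            lines.append(f"{author}|{line}")
    return "\n".join(lines)

training_text = tag_corpus({"AUSTEN": "...", "MELVILLE": "..."})
```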
The Poetry Foundation is a Chicago-based American foundation created to promote poetry in the wider culture. It was formed from Poetry magazine, which it continues to publish, with a 2003 gift of $200 million from philanthropist Ruth Lilly.
In November 2019, I experimented with training a GPT-2 neural net model to generate folk music in the high-level ABC music text format, following previous work in 2016 which used a char-RNN trained on a dataset from ‘The Session’. A GPT-2 hypothetically can improve on an RNN by better global coherence & copying of patterns, without problems with the hidden-state bottleneck.
I encountered problems with the standard GPT-2 model’s encoding of text which damaged results, but after fixing that, I successfully trained it on n = 205,304 ABC music pieces taken from The Session & ABCnotation.com. The resulting music samples are in my opinion quite pleasant. (A similar model was later retrained by Geerlings & Meroño-Peñuela 2020.)
We followed the ABC folk model with an ABC-MIDI model: a dataset of 453k ABC pieces decompiled from MIDI pieces, which fit into GPT-2-117M with an expanded context window when trained on TPUs. The MIDI pieces are far more diverse and challenging, and GPT-2 underfits and struggles to produce valid samples, but when sampling succeeds, it can generate even better musical samples.
Standard language generation neural network models, like GPT-2, are trained via likelihood training to imitate human text corpuses. Generated text suffers from persistent flaws like repetition, due to myopic word-by-word generation, and such models cannot improve on the training data because they are trained only to predict ‘realistic’ completions of it.
A proposed alternative is to use reinforcement learning to train the NNs, to encourage global properties like coherence & lack of repetition, and potentially improve over the original corpus’s average quality. Preference learning trains a reward function on human ratings, and uses that as the ‘environment’ for a blackbox DRL algorithm like PPO.
OpenAI released a codebase implementing this dual-model preference learning approach for textual generation, based on GPT-2. Having previously used GPT-2 for poetry & music generation, I experimented with GPT-2 preference learning for unconditional music and poetry generation.
I found that preference learning seemed to work better for music than poetry, and seemed to reduce the presence of repetition artifacts, but the results, at n ≈ 7,400 ratings compiled over 23 iterations of training+sampling November 2019–January 2020, are not dramatically better than alternative improvements like scaling up models, more thorough data-cleaning, or more stringent sample curation. My blind ratings using n ≈ 200 comparisons showed no large advantage for the RL-tuned samples (winning only 93 of 210 comparisons, or 44%).
This may be due to insufficient ratings, bad hyperparameters, or not using samples generated with common prefixes, but I suspect it’s the first, as some NLP tasks in Ziegler et al 2019 required up to 60k ratings for good performance, and the reward model appeared to achieve poor performance & succumb easily to adversarial examples.
Working with it, I suspect that preference learning is unnecessarily sample-inefficient & data-inefficient, and that the blackbox reinforcement learning approach is inferior to directly using the reward model to optimize text samples, and propose two major architectural overhauls: have the reward model directly model the implied ranking of every datapoint, and drop the agent model entirely in favor of backprop-powered gradient ascent which optimizes sequences to maximize the reward model’s output.
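A minimal sketch of the second proposal, gradient ascent against a frozen reward model (all names are stand-ins for illustration; real sequences would need the optimized soft embeddings snapped back to discrete tokens):

```python
import torch

# Drop the RL agent entirely: treat a relaxed (continuous) token-embedding
# sequence as the free variable and backprop through the frozen reward model
# to find a sequence maximizing its predicted reward.
def ascend_reward(reward_model, seq_len=64, emb_dim=768, steps=500, lr=0.1):
    soft_tokens = torch.randn(1, seq_len, emb_dim, requires_grad=True)
    opt = torch.optim.Adam([soft_tokens], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -reward_model(soft_tokens).sum()  # maximize reward = minimize -reward
        loss.backward()                          # reward model stays fixed
        opt.step()
    return soft_tokens.detach()  # then round each position to the nearest real token
```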
…Black is GPT-2. Its excuse [for this chess blunder] is that it’s a text prediction program with no concept of chess. As far as it knows, it’s trying to predict short alphanumeric strings like “e2e4” or “Nb7”. Nobody told it this represents a board game. It doesn’t even have a concept of 2D space that it could use to understand such a claim. But it still captured my rook! Embarrassing!…Last month, I asked him if he thought GPT-2 could play chess. I wondered if he could train it on a corpus of chess games written in standard notation (where, for example, e2e4 means “move the pawn at square e2 to square e4”). There are literally millions of games written up like this. GPT-2 would learn to predict the next string of text, which would correspond to the next move in the chess game. Then you would prompt it with a chessboard up to a certain point, and it would predict how the chess masters who had produced its training data would continue the game – ie make its next move using the same heuristics they would. Gwern handed the idea to his collaborator Shawn Presser, who had a working GPT-2 chess engine running within a week:…You can play against GPT-2 yourself by following the directions in the last tweet, though it won’t be much of a challenge for anyone better than I am.
…What does this imply? I’m not sure (and maybe it will imply more if someone manages to make it actually good). It was already weird to see something with no auditory qualia learn passable poetic meter. It’s even weirder to see something with no concept of space learn to play chess. Is any of this meaningful? How impressed should we be that the same AI can write poems, compose music, and play chess, without having been designed for any of those tasks? I still don’t know.
While training a GPT-2-117M on a folk music corpus written in ABC format, persistent syntax errors kept being generated by an otherwise-high-quality model: random spaces would be generated, rendering a music piece either erroneous or lower-quality. Why? It appears to be an issue with the GPT BPE encoder’s handling of spaces, which makes it difficult to emit the right space-separated characters. We found that ABC does not actually require spaces, and we simply removed all spaces from the corpus—noticeably improving quality of generated pieces.
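The preprocessing fix amounts to a one-liner; a sketch with hypothetical filenames:

```python
# Since ABC notation does not require spaces, stripping them all from the
# training corpus sidesteps the BPE space-handling problem entirely.
with open("the-session.abc", encoding="utf-8") as f:
    corpus = f.read()
with open("the-session-nospace.abc", "w", encoding="utf-8") as f:
    f.write(corpus.replace(" ", ""))
```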
Generating symbolic music with language models is a promising research area, with potential applications in automated music composition. Recent work shows that Transformer architectures can learn to generate compelling four-instrument scores from large MIDI datasets. In this paper, we re-train the small (117M) GPT-2 model with a large dataset in ABC notation, and generate samples of single-instrument folk music. Our BLEU- and ROUGE-based quantitative, and survey-based qualitative, evaluations suggest that ABC notation is learned with syntactical and semantic correctness, and that samples contain robust and believable n-grams.
To expand the ABC GPT-2 model to cover a wider variety of musical genres, I turn to the next-most compact widespread music encoding format: MIDI. There are hundreds of thousands of MIDIs which can be decompiled to ABC format, averaging ~10k BPEs—within GPT-2-117M’s feasible context window when trained on TPUs (which permit training of context windows up to 30k wide).
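A sketch of the decompilation step, assuming the `midi2abc` utility from the abcMIDI suite is installed (the directory layout here is hypothetical):

```python
import subprocess
from pathlib import Path

# Decompile each MIDI file to ABC text via midi2abc (which prints ABC to
# stdout), skipping files that fail to convert.
Path("abc").mkdir(exist_ok=True)
for midi in Path("midis").glob("*.mid"):
    result = subprocess.run(["midi2abc", str(midi)],
                            capture_output=True, text=True)
    if result.returncode == 0 and result.stdout.strip():
        (Path("abc") / f"{midi.stem}.abc").write_text(result.stdout)
```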
We combine the earlier ABC corpus with 2 large MIDI datasets decompiled to ABC, yielding ~453k usable ABC-MIDI musical files (~5.1GB of text). We trained January–April 2020 on our TPU swarm (with many interruptions), achieving a final loss of ~0.2 (still underfitting).
Sampling from the final model is hit-or-miss, as it is prone to the likelihood repetition trap, and because it generates instruments one by one, it is common for instruments to be cut off or otherwise broken during sampling (indicating that sampling is increasingly a bigger problem than training for long-range sequence modeling). However, successful pieces are possible, and they are musically far more diverse than the folk ABC corpus, with many pleasingly complex samples.
This work applies natural language modeling to generate plausible strategic moves in the ancient game of Go. We train the Generative Pretrained Transformer (GPT-2) to mimic the style of Go champions as archived in Smart Game Format (SGF), which offers a text description of move sequences. The trained model further generates valid but previously unseen strategies for Go. Because GPT-2 preserves punctuation and spacing, the raw output of the text generator provides inputs to game visualization and creative patterns, such as the Sabaki project’s game engine using auto-replays. Results demonstrate that language modeling can capture both the sequencing format of championship Go games and their strategic formations. Compared to random game boards, the GPT-2 fine-tuning shows efficient opening move sequences favoring corner play over less advantageous center and side play. Game generation as a language modeling task offers novel approaches to more than 40 other board games where historical text annotation provides training data (e.g., Amazons & Connect 4/6).
This work demonstrates that natural language transformers can support more generic strategic modeling, particularly for text-archived games. In addition to learning natural language skills, the abstract transformer architecture can generate meaningful moves on a chessboard. With further fine-tuning, the transformer learns complex gameplay by training on 2.8 million chess games in Portable Game Notation. After 30,000 training steps, OpenAI’s Generative Pre-trained Transformer (GPT-2) optimizes weights for 774 million parameters. This fine-tuned Chess Transformer generates plausible strategies and displays game formations identifiable as classic openings, such as English or the Slav Exchange. Finally, in live play, the novel model demonstrates a human-to-transformer interface that correctly filters illegal moves and provides a novel method to challenge the transformer’s chess strategies. We anticipate future work will build on this transformer’s promise, particularly in other strategy games where features can capture the underlying complex rule syntax from simple but expressive player annotations.
A decompiler is a computer program that takes an executable file as input, and attempts to create a high-level source file which can be recompiled successfully. It is therefore the opposite of a compiler, which takes a source file and makes an executable. Decompilers are usually unable to perfectly reconstruct the original source code, and as such, will frequently produce obfuscated code. Nonetheless, decompilers remain an important tool in the reverse engineering of computer software.