December 2019 gwern.net newsletter with links on gene editing, the Replication Crisis, computer latency, and suffering; 4 book reviews, 2 opera/movie reviews, and 2 anime reviews. newsletter · 2019-11-21–2021-01-04 · finished · certainty: log · importance: 0
‘Small multiples’ show change across variables.
Visual Explanations: Images and Quantities, Evidence and Narrative, Tufte 1997 (less of a hodge-podge than Envisioning Information; Tufte walks through his usual concern with graphs: how to show multiple versions of things, such as 4D data, on 2D paper? Key case studies are John Snow’s cholera maps of infections vs location vs time; the Challenger disaster’s obscuration of problems vs temperature over the course of Space Shuttle launches; and stage magicians’ diagrams of tricks, which illustrate change over time. Tufte then considers showing parallel versions which differ in some abstract dimension; then graphs which must show change in both space & time, such as sunspots or Saturn’s rings, exemplifying Tufte’s usual concept of “small multiples”; in a final section, Tufte highlights favorite art pieces of his which are diagrammatic or symbolic in some sense akin to the foregoing chapters. As usual, a pleasure to read, and it furnished some examples for my page on rubrication too.)
Surprisingly entertaining if not immediately useful.
The Elements of Typographic Style (third edition), Bringhurst 2004 (I decided to read this based on Rutter’s web version of it; Bringhurst is unexpectedly amusing—I wish I cared about anything as much as Bringhurst cares about typography. No comparison is too strong to condemn a typographical sin: editing fonts, for example, makes it “easy for a designer or compositor with no regard for letters to squish them into cattle trains and ship them to the slaughter”—with another author, one would assume the Holocaust connotations were unintentional, but with Bringhurst… Like Rutter, there is a great deal of material on ratios and page layout which smacks of numerology, but that can be skipped easily, and the rest of the material is useful. The book itself is, of course, nice typographically, exemplifying the use of sidenotes, and although he surprisingly doesn’t cover rubrication at length (as he does what seems like everything else), he even provides two nice examples of it for my page.)
Newsletter tag: archive of all issues back to 2013 for the gwern.net newsletter (monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews.)
This page is a changelog for Gwern.net: a monthly reverse chronological list of recent major writings/changes/additions.
Following my writing can be a little difficult because it is often so incremental. So every month, in addition to my regular /r/Gwern subreddit submissions, I write up reasonably-interesting changes and send it out to the mailing list in addition to a compilation of links & reviews (archives).
A subreddit for posting links of interest and also for announcing updates to gwern.net (which can be used as an RSS feed). Submissions are categorized similarly to the monthly newsletter and will typically be collated there.
In November 2019, I experimented with training a GPT-2 neural net model to generate folk music in the high-level ABC music text format, following previous work in 2016 which used a char-RNN trained on a dataset from ‘The Session’. A GPT-2 can hypothetically improve on an RNN through better global coherence & copying of patterns, without the problems of the hidden-state bottleneck.
I encountered problems with the standard GPT-2 model’s encoding of text which damaged results, but after fixing that, I successfully trained it on n = 205,304 ABC music pieces taken from The Session & ABCnotation.com. The resulting music samples are in my opinion quite pleasant. (A similar model was later retrained by Geerlings & Meroño-Peñuela 2020.)
We followed the ABC folk model with an ABC-MIDI model: a dataset of 453k ABC pieces decompiled from MIDI pieces, which fit into GPT-2-117M with an expanded context window when trained on TPUs. The MIDI pieces are far more diverse and challenging, and GPT-2 underfits and struggles to produce valid samples, but when sampling succeeds, it can generate even better musical samples.
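Filtering out syntactically-broken samples becomes a practical necessity when a model underfits like this. Below is a minimal plausibility check in Python, written purely for illustration (it is not the actual pipeline; the function name and thresholds are my own, and a real workflow would pass survivors to a genuine ABC parser such as abc2midi):

```python
import re

def is_plausible_abc(sample: str) -> bool:
    """Cheap plausibility filter for one generated ABC piece: requires an
    index header (X:), a key header (K:), and at least one line of music
    body after the first K: line. (A real pipeline would also run
    survivors through an actual ABC parser for full validation.)"""
    lines = [l.strip() for l in sample.strip().splitlines()]
    has_index = any(re.match(r"X:\s*\d+", l) for l in lines)
    key_lines = [i for i, l in enumerate(lines) if l.startswith("K:")]
    if not (has_index and key_lines):
        return False
    # music body = non-empty, non-header lines after the first K: header
    body = [l for l in lines[key_lines[0] + 1:]
            if l and not re.match(r"^[A-Za-z]:", l)]
    return len(body) > 0

good = "X:1\nT:Example Reel\nM:4/4\nK:D\n|:d2fd Adfd|dfaf g2fg|"
bad = "X:1\nT:Truncated sample"
print(is_plausible_abc(good), is_plausible_abc(bad))
```

In practice one would sample many pieces, filter, and track the valid-sample rate as a cheap proxy for how badly the model is underfitting.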
Here is how I currently understand the relationship between correlation and causality, and the collective findings of meta-scientific research:
The Replication Crisis: a shockingly large fraction of psychological research and other fields is simple random noise which cannot be replicated.
Everything Is Correlated: when we systematically measure many variables at large scale with large n, we find that ‘everything is correlated’—even things which seem to have no causal relationship whatsoever.
The Metallic Laws: empirically, most efforts to change human behavior in sociology, economics, and education fail in randomized evaluation, and the mean effect size of experiments in meta-analyses typically approaches zero, despite promising correlations.
Correlation ≠ Causation: so, we live in a world where research manufactures many spurious results and, even once we see through the fake findings, finding a correlation is meaningless because everything is correlated to begin with; accordingly, correlations are little better than experimenting at random, which doesn’t work well either.
But why is correlation ≠ causation?
Dense Causal Graphs: because, if we write down a causal graph consistent with ‘everything is correlated’ and the empirical facts of average null effects + unpredictive correlations, this implies that all variables are part of enormous dense causal graphs where each variable is connected to several others.
Incorrect Intuitions: This inequality between observable correlations and actual useful causal manipulability merely grows with larger networks, and causal networks in fields like economics or biology are far more complex than those in more ordinary everyday fields like ‘catching a ball’.
Our intuitions, formed in simple domains designed to have sparse causal networks (it would be bad if balls could make you do random things! your brain is carefully designed to control the influence of any outside forces & model the world as simple for planning purposes), turn out to be profoundly misleading in these other domains.
No, Really, Correlation ≠ Causation: This cognitive bias is why correlation ≠ causation is so difficult to internalize and accept, honored primarily in the breach even by sophisticated researchers; it is also why randomized experiments were developed so late historically, and remain neglected, counterintuitive, and criticized when run, despite routinely debunking the conventional wisdom of experts in almost every field.
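The ‘dense causal graph’ picture can be made concrete with a toy simulation (my own illustration, not drawn from any of the linked pages; the parameter choices of 50 variables, 6 parents each, and N(0, 0.4) edge weights are arbitrary): generate a linear structural-equation model over a dense random DAG and observe that, at large n, far more pairs of variables are detectably correlated than are directly causally linked:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 50, 6, 100_000   # variables, parents per variable, sample size

# Random linear SEM x = Wx + e with W strictly lower-triangular:
# each variable gets up to k randomly-chosen earlier variables as parents.
W = np.zeros((p, p))
for i in range(1, p):
    parents = rng.choice(i, size=min(k, i), replace=False)
    W[i, parents] = rng.normal(0.0, 0.4, size=len(parents))

A = np.linalg.inv(np.eye(p) - W)   # x = A @ e; A[i, j] = total effect of j on i
X = rng.normal(size=(n, p)) @ A.T  # n samples from the SEM

corr = np.corrcoef(X, rowvar=False)
off = corr[np.tril_indices(p, -1)]        # all distinct variable pairs
direct = W[np.tril_indices(p, -1)] != 0   # pairs with a direct causal edge

print(f"pairs with a direct causal edge: {direct.mean():.0%}")
print(f"pairs correlated at |r| > 0.05:  {np.mean(np.abs(off) > 0.05):.0%}")
```

Because correlations flow through ancestors and confounders while direct edges do not, the correlated fraction far exceeds the directly-linked fraction, so observing a correlation between two variables says little about whether a usable causal handle connects them.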
[A bibliography of online links to papers/blogs/articles on the Replication Crisis, primarily post-2013 and curated from my newsletter, as a followup to the main article text describing the Replication Crisis.]
Hydrocephalus is a damaging brain disorder where fluids compress the brain, sometimes drastically decreasing its volume. While often extremely harmful or life-threatening when untreated, some people with severe compression are nevertheless relatively normal, and in one case (Lorber’s) an IQ as high as 126 has been claimed with a brain volume 5% of normal. A few of these case studies have been used to argue the extraordinary claim that brain volume has little or nothing to do with intelligence; authors have argued that hydrocephalus suggests an enormous untapped cognitive potential which is tapped only rarely for repairs and can boost intelligence on net, or that intelligence/consciousness is non-material or taps into ESP.
I point out why this claim is almost certainly untrue because it predicts countless phenomena we never observe, and investigate the claimed examples in more detail: the cases turn out to be suspiciously unverifiable (Lorber), likely fraudulent (Oliveira), or actually low intelligence (Feuillet). It is unclear if high-functioning cases of hydrocephalus even have less brain mass, as opposed to lower proxy measures like brain volume.
I then summarize anthropologist John Hawks’s criticisms of the original hydrocephalus author: his brain imaging data could not have been as precise as claimed, he studied a selective sample, the story of the legendary IQ 126 hydrocephalus patient raises questions as to how normal or intelligent he really was, and hydrocephalus in general appears to be no more anomalous or hard-to-explain than many other kinds of brain injuries; for comparison, hemispherectomies (removing or severing a hemisphere) have produced no anomalous reports of above-average intelligence (just deficits), though they ought to be just the same in terms of repairs or ESP.
That hydrocephalus cases can reach roughly normal levels of functioning, various deficits aside, can be explained by brain size being one of several relevant variables; brain plasticity enabling cognitive flexibility & recovery from gradually-developing conditions; overparameterization giving robustness to damage and poor environments; and learning ability. The field of deep learning has observed similar phenomena in the training of artificial neural networks. This is consistent with Lorber’s original contention that the brain was more robust, and hydrocephalus was more treatable, than commonly accepted, but does not support any of the more exotic interpretations since put on his findings.
In short, there is little anomalous to explain, and standard brain-centric accounts appear to account for existing verified observations without much problem or resort to extraordinary claims.
What books are hardest for a reader who starts them to finish, and most likely to be abandoned? I scrape the crowdsourced ‘abandoned’ tag from the GoodReads book social network on 2019-12-09 to estimate the conditional probability of a book being abandoned.
The default GoodReads tag interface presents only raw counts of tags, not counts divided by total ratings (=reads). This conflates popularity with probability of being abandoned: a popular but rarely-abandoned book may have more ‘abandoned’ tags than a less popular but often-abandoned book. There is also residual error from the winner’s curse, where books with fewer ratings are more mis-estimated than popular books. I fix both problems to see what more correct rankings look like.
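One simple way to correct for both problems (a stand-in for illustration only; the actual analysis uses a more elaborate statistical model, and the book counts below are invented) is to rank books not by raw tag counts or raw proportions, but by the lower bound of a confidence interval on the abandonment proportion, such as the Wilson score lower bound, which shrinks small-sample books toward zero:

```python
import math

def wilson_lower(abandons: int, reads: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the proportion of
    readers abandoning a book: small-n books get shrunk toward zero,
    countering the winner's-curse mis-ranking of rarely-rated books."""
    if reads == 0:
        return 0.0
    p = abandons / reads
    denom = 1 + z**2 / reads
    centre = p + z**2 / (2 * reads)
    margin = z * math.sqrt(p * (1 - p) / reads + z**2 / (4 * reads**2))
    return (centre - margin) / denom

# hypothetical (title, 'abandoned'-tag count, total ratings) triples:
books = [("popular but finished",   500, 100_000),
         ("obscure but abandoned",   40,     200),
         ("tiny sample",              2,       5)]
for title, a, n in sorted(books, key=lambda b: wilson_lower(b[1], b[2]),
                          reverse=True):
    print(f"{title}: raw count={a}, rate={a/n:.1%}, "
          f"score={wilson_lower(a, n):.3f}")
```

Ranking by raw counts would put ‘popular but finished’ first; ranking by raw rates would put ‘tiny sample’ (2/5 = 40%) first; the shrunken score instead ranks ‘obscure but abandoned’ first, which matches intuition.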
Correcting for both changes the top-5 ranking completely, from (raw counts):
I also consider a model adjusting for covariates (author/average-rating/year), to see what books are most surprisingly often-abandoned given their pedigrees & rating etc. Abandon rates increase the newer a book is, and the lower the average rating.
Books at the top of the adjusted list appear to reflect a mix of highly-popular authors changing genres, and ‘prestige’ books which are highly-rated but a slog to read.
These results are interesting for highlighting that people read books for many reasons (such as marketing campaigns, literary prestige, or following a popular author), and these reasons are reflected in their decision whether to continue reading or to abandon a book.
One of the most distinctive features of Tufte’s style is his extensive use of sidenotes. Sidenotes are like footnotes, except they don’t force the reader to jump their eye to the bottom of the page, but instead display off to the side in the margin. Perhaps you have noticed their use in this document already. You are very astute.
Sidenotes are a great example of the web not being like print. On sufficiently large viewports, Tufte CSS uses the margin for sidenotes, margin notes, and small figures. On smaller viewports, elements that would go in the margin are hidden until the user toggles them into view. The goal is to present related but not necessary information such as asides or citations as close as possible to the text that references them. At the same time, this secondary information should stay out of the way of the eye, not interfering with the progression of ideas in the main text.
…If you want a sidenote without footnote-style numberings, then you want a margin note. Notice there isn’t a number preceding the note. On large screens, a margin note is just a sidenote that omits the reference number. This lessens the distracting effect of taking away from the flow of the main text, but can increase the cognitive load of matching a margin note to its referent text.
Over time, I developed a certain google-fu and expertise in finding references, papers, and books online. Some of these tricks are not well-known, like checking the Internet Archive (IA) for books. I try to write down my search workflow, and give general advice about finding and hosting documents, with demonstration case studies.
Socioeconomic position (SEP) is a multi-dimensional construct reflecting (and influencing) multiple socio-cultural, physical, and environmental factors. In a sample of 286,301 participants from UK Biobank, we identify 30 (29 previously unreported) independent loci associated with income. Using a method to meta-analyze data from genetically-correlated traits, we identify an additional 120 income-associated loci. These loci show clear evidence of functionality, with transcriptional differences identified across multiple cortical tissues, and links to GABA-ergic and serotonergic neurotransmission. By combining our genome-wide association study on income with data from eQTL studies and chromatin interactions, 24 genes are prioritized for follow-up, 18 of which were previously associated with intelligence. We identify intelligence as one of the likely causal, partly-heritable phenotypes that might bridge the gap between molecular genetic inheritance and phenotypic consequence in terms of income differences. These results indicate that, in modern-era Great Britain, genetic effects contribute towards some of the observed socioeconomic inequalities.
The two best predictors of children’s educational achievement available from birth are parents’ socioeconomic status (SES) and, recently, children’s inherited DNA differences that can be aggregated in genome-wide polygenic scores (GPS). Here, we chart for the first time the developmental interplay between these two predictors of educational achievement at ages 7, 11, 14 and 16 in a sample of almost 5,000 UK school children. We show that the prediction of educational achievement from both GPS and SES increases steadily throughout the school years. Using latent growth curve models, we find that GPS and SES not only predict educational achievement in the first grade but they also account for systematic changes in achievement across the school years. At the end of compulsory education at age 16, GPS and SES, respectively, predict 14% and 23% of the variance of educational achievement. Analyses of the extremes of GPS and SES highlight their influence and interplay: In children who have high GPS and come from high SES families, 77% go to university, whereas 21% of children with low GPS and from low SES backgrounds attend university. We find that the associations of GPS and SES with educational achievement are primarily additive, suggesting that their joint influence is particularly dramatic for children at the extreme ends of the distribution.
Genome-wide polygenic scores (GPS) and socioeconomic status (SES) account together for 27% of the variance in educational achievement from age 7 through 16 years
The predictive validity of GPS and SES increases over the course of compulsory schooling
The association of GPS and SES is primarily additive: their joint long-term influence is particularly pronounced in children at the extreme ends of the distribution
77% of children with high GPS from high SES families go to university compared to 21% of children with low GPS from low SES
Polygenic scores are increasingly powerful predictors of educational achievement. It is unclear, however, how sets of polygenic scores, which partly capture environmental effects, perform jointly with sets of environmental measures, which are themselves heritable, in prediction models of educational achievement.
Here, for the first time, we systematically investigate gene-environment correlation (rGE) and interaction (G×E) in the joint analysis of multiple genome-wide polygenic scores (GPS) and multiple environmental measures as they predict tested educational achievement (EA). We predict EA in a representative sample of 7,026 16-year-olds, with 20 GPS for psychiatric, cognitive and anthropometric traits, and 13 environments (including life events, home environment, and SES) measured earlier in life. Environmental and GPS predictors were modelled, separately and jointly, in penalized regression models with out-of-sample comparisons of prediction accuracy, considering the implications that their interplay had on model performance.
Jointly modelling multiple GPS and environmental factors significantly improved prediction of EA, with cognitive-related GPS adding unique independent information beyond SES, home environment and life events. We found evidence for rGE underlying variation in EA (rGE = 0.36; 95% CIs = 0.29, 0.43). We estimated that 38% (95% CIs = 29%, 49%) of the GPS effects on EA were mediated by environmental effects, and in turn that 18% (95% CIs = 12%, 25%) of environmental effects were accounted for by the GPS model. Lastly, we did not find evidence that G×E effects collectively contributed to multivariable prediction.
Our multivariable polygenic and environmental prediction model suggests widespread rGE and unsystematic G×E contributions to EA in adolescence.
Finding 7. Most measures of the “environment” show significant genetic influence
Although it might seem a peculiar thing to do, measures of the environment widely used in psychological science—such as parenting, social support, and life events—can be treated as dependent measures in genetic analyses. If they are truly measures of the environment, they should not show genetic influence. To the contrary, in 1991, Plomin and Bergeman conducted a review of the first 18 studies in which environmental measures were used as dependent measures in genetically sensitive designs and found evidence for genetic influence for these measures of the environment. Significant genetic influence was found for objective measures such as videotaped observations of parenting as well as self-report measures of parenting, social support, and life events. How can measures of the environment show genetic influence? The reason appears to be that such measures do not assess the environment independent of the person. As noted earlier, humans select, modify, and create environments correlated with their genetic behavioral propensities such as personality and psychopathology (McAdams, Gregory, & Eley, 2013). For example, in studies of twin children, parenting has been found to reflect genetic differences in children’s characteristics such as personality and psychopathology (Avinun & Knafo, 2014; Klahr & Burt, 2014; Plomin, 1994).
Since 1991, more than 150 articles have been published in which environmental measures were used in genetically sensitive designs; they have shown consistently that there is significant genetic influence on environmental measures, extending the findings from family environments to neighborhood, school, and work environments. Kendler and Baker (2007) conducted a review of 55 independent genetic studies and found an average heritability of 0.27 across 35 diverse environmental measures (confidence intervals not available). Meta-analyses of parenting, the most frequently studied domain, have shown genetic influence that is driven by child characteristics (Avinun & Knafo, 2014) as well as by parent characteristics (Klahr & Burt, 2014). Some exceptions have emerged. Not surprisingly, when life events are separated into uncontrollable events (e.g., death of a spouse) and controllable life events (e.g., financial problems), the former show nonsignificant genetic influence. In an example of how all behavioral genetic results can differ in different cultures, Shikishima, Hiraishi, Yamagata, Neiderhiser, and Ando (2012) compared parenting in Japan and Sweden and found that parenting in Japan showed more genetic influence than in Sweden, consistent with the view that parenting is more child centered in Japan than in the West.
Researchers have begun to use GCTA to replicate these findings from twin studies. For example, GCTA has been used to show significant genetic influence on stressful life events (Power et al., 2013) and on variables often used as environmental measures in epidemiological studies such as years of schooling (C. A. Rietveld, Medland, et al., 2013). Use of GCTA can also circumvent a limitation of twin studies of children. Such twin studies are limited to investigating within-family (twin-specific) experiences, whereas many important environmental factors such as socioeconomic status (SES) are the same for two children in a family. However, researchers can use GCTA to assess genetic influence on family environments such as SES that differ between families, not within families. GCTA has been used to show genetic influence on family SES (Trzaskowski et al., 2014) and an index of social deprivation (Marioni et al., 2014).
Over the last 100,000 years, humans have spread across the globe and encountered a highly diverse set of environments to which they have had to adapt. Genome-wide scans of selection are powerful tools to detect selective sweeps. However, because of unknown fractions of undetected sweeps and false discoveries, the numbers of detected sweeps often poorly reflect actual numbers of selective sweeps in populations. The thousands of soft sweeps on standing variation recently evidenced in humans have also been interpreted as a majority of mis-classified neutral regions. In such a context, the extent of human adaptation remains little understood. We present a new rationale to estimate these actual numbers of sweeps expected over the last 100,000 years (denoted by X) from genome-wide population data, considering both hard sweeps and selective sweeps on standing variation. We implemented an approximate Bayesian computation framework and showed, based on computer simulations, that such a method can properly estimate X. We then jointly estimated the number of selective sweeps, their mean intensity and age in several 1000G African, European and Asian populations. Our estimations of X, found to be weakly sensitive to demographic misspecifications, revealed very limited numbers of sweeps regardless of the frequency of the selected alleles at the onset of selection and the completion of sweeps. We estimated ~80 sweeps on average across fifteen 1000G populations when assuming incomplete sweeps only, and ~140 selective sweeps in non-African populations when incorporating complete sweeps in our simulations. The method proposed may help to address controversies on the number of selective sweeps in populations, guiding further genome-wide investigations of recent positive selection.
“Extensive Mammalian Germline Genome Engineering”, Yanan Yue, Yinan Kan, Weihong Xu, Hong-Ye Zhao, Yixuan Zhou, Xiaobin Song, Jiajia Wu, Juan Xiong, Dharmendra Goswami, Meng Yang, Lydia Lamriben, Mengyuan Xu, Qi Zhang, Yu Luo, Jianxiong Guo, Shenyi Mao, Deling Jiao, Tien Dat Nguyen, Zhuo Li, Jacob V. Layer, Malin Li, Violette Paragas, Michele E. Youd, Zhongquan Sun, Yuan Ding, Weilin Wang, Hongwei Dou, Lingling Song, Xueqiong Wang, Lei Le, Xin Fang, Haydy George, Ranjith Anand, Shi Yun Wang, William F. Westlin, Marc Guell, James Markmann, Wenning Qin, Yangbin Gao, Hongjiang Wei, George M. Church, Luhan Yang (2019-12-19):
Xenotransplantation, specifically the use of porcine organs for human transplantation, has long been sought after as an alternative for patients suffering from organ failure. However, clinical application of this approach has been impeded by two main hurdles: 1) risk of transmission of porcine endogenous retroviruses (PERVs) and 2) molecular incompatibilities between donor pigs and humans which culminate in rejection of the graft. We previously demonstrated that all 25 copies of the PERV elements in the pig genome could be inactivated and live pigs successfully generated. In this study, we improved the scale of porcine germline editing from targeting a single repetitive locus with CRISPR to engineering 18 different loci using multiple genome engineering methods. We engineered the pig genome at 42 alleles using CRISPR-Cas9 and transposons, and produced PERVKO·3KO·9TG pigs which carry PERV inactivation, xeno-antigen KO, and 9 effective human transgenes. The engineered pigs exhibit normal physiology, fertility, and germline transmission of the edited alleles. In vitro assays demonstrated that these pigs gain significant resistance to human humoral and cell-mediated damage, and to coagulation dysregulation, similar to that of allotransplantation. Successful creation of PERVKO·3KO·9TG pigs represents a significant step forward towards safe and effective porcine xenotransplantation, and also represents a synthetic biology accomplishment of engineering novel functions in a living organism.
One Sentence Summary
Extensive genome engineering is applied to modify pigs for safe and immune compatible organs for human transplantation
If any swine is fit to be an organ donor for people, then the dozens of pigs snuffling around Qihan Bio’s facility in Hangzhou, China, may be the best candidates so far. The Chinese company and its U.S. collaborators reported today that they have used the genome editor CRISPR to create the most extensively genetically engineered pigs to date—animals whose tissues, the researchers say, finally combine all the features necessary for a safe and successful transplant into humans. “This is the first prototype,” says Luhan Yang, a geneticist at Qihan Bio. In a preprint published today on bioRxiv, Qihan researchers and collaborators, including Cambridge, Massachusetts-based eGenesis—which Yang co-founded with Harvard University geneticist George Church—described the new generation of animals and various tests on their cells; the researchers have already begun to transplant the pigs’ organs into nonhuman primates, a key step toward human trials.
…In the new study, the team for the first time combined these PERV “knockouts” with a suite of other changes to prevent immune rejection, for a record-setting 13 modified genes. In pig ear cells, they removed three genes coding for enzymes that help produce molecules on pig cells that provoke an immune response. They also inserted six genes that inhibit various aspects of the human immune response and three more that help regulate blood coagulation. The researchers then put the DNA-containing nuclei of these edited cells into eggs from pig ovaries collected at a slaughterhouse. These eggs developed into embryos that were implanted into surrogate mothers. Cells from the resulting piglets got another round of edits to remove the PERV sequences, after which their DNA went into another set of egg cells to create a new generation of pigs with all the desired edits. (In future, Yang says, the team will try to make all the modifications in a single generation.)
The resulting pigs appeared healthy and fertile with functioning organs, the team reports today. And initial tests of their cells in lab dishes suggest their organs will be much less prone to immune rejection than those of unmodified pigs: The tendency of the pig cells to bind to certain human antibodies was reduced by 90%, and the modified cells better survived interactions with human immune cells. But a key test is still to come: Yang says her team has begun to transplant organs from the highly edited pigs into monkeys to gauge their safety and longevity.
The combination of edits described in the new paper is “a technical feat,” says Marilia Cascalho, a transplant immunologist at the University of Michigan in Ann Arbor. “Whether it offers an advantage [over other engineered pig organs]… the jury is out on that,” she says…Yang says that Qihan plans to remain “laser-focused” on preclinical studies in 2020, but expects to be testing pig organs in humans within 5 years. Many in the field now feel an inevitable momentum around xenotransplantation: “There is so much need for organs,” Cascalho says. “I think it’s going to be a reality.”
Xenotransplantation is a promising strategy to alleviate the shortage of organs for human transplantation. In addition to the concern on pig-to-human immunological compatibility, the risk of cross-species transmission of porcine endogenous retroviruses (PERVs) has impeded the clinical application of this approach. Earlier, we demonstrated the feasibility of inactivating PERV activity in an immortalized pig cell line. Here, we confirmed that PERVs infect human cells, and observed the horizontal transfer of PERVs among human cells. Using CRISPR-Cas9, we inactivated all the PERVs in a porcine primary cell line and generated PERV-inactivated pigs via somatic cell nuclear transfer. Our study highlighted the value of PERV inactivation to prevent cross-species viral transmission and demonstrated the successful production of PERV-inactivated animals to address the safety concern in clinical xenotransplantation.
Granulosa cells can be reprogrammed to form oocytes by chemical reprogramming
Rock inhibition and crotonic acid facilitate the chemical induction of gPSCs from GCs
PGCLCs derived from gPSCs exhibit longer telomeres and high genomic stability
The generation of genomically stable and functional oocytes has great potential for preserving fertility and restoring ovarian function. It remains elusive whether functional oocytes can be generated from adult female somatic cells through reprogramming to germline-competent pluripotent stem cells (gPSCs) by chemical treatment alone. Here, we show that somatic granulosa cells isolated from adult mouse ovaries can be robustly induced to generate gPSCs by a purely chemical approach, with additional Rock inhibition and critical reprogramming facilitated by crotonic sodium or acid. These gPSCs acquired high germline competency and could consistently be directed to differentiate into primordial-germ-cell-like cells and form functional oocytes that produce fertile mice. Moreover, gPSCs promoted by crotonylation and the derived germ cells exhibited longer telomeres and high genomic stability like PGCs in vivo, providing additional evidence supporting the safety and effectiveness of chemical induction, which is particularly important for germ cells in genetic inheritance. [Keywords: chemical reprogramming, pluripotent stem cell, oocyte, granulosa cell]
Ovarian follicles are the basic functional unit of the ovary and consist of an oocyte, the immature egg, which is surrounded by granulosa cells. Besides being crucial to the development of follicles, granulosa cells have been shown to possess plasticity with stem cell-like properties.
“The thing about in vitro fertilization is that they only use the oocyte for the procedure,” says senior author Lin Liu, of the College of Life Sciences at Nankai University. “After the egg retrieval, the granulosa cells in the follicle are discarded. It got us thinking, what if we can utilize these granulosa cells? Since every egg has thousands of granulosa cells surrounding it, if we can induce them into pluripotent cells and turn those cells into oocytes, aren’t we killing two birds with one stone?”
Granulosa cells tend to undergo cell death and differentiation once removed from the follicles. Liu and his team including Ph.D. students Chenglei Tian and Haifeng Fu developed a chemical “cocktail” with Rock inhibitor and crotonic acid for creating chemically induced pluripotent stem cells (CiPSCs) from granulosa cells. The research team introduced Rock inhibitor to prevent cell death and promote proliferation. In combination with other important small chemicals, crotonic acid facilitates the induction of granulosa cells into germline-competent pluripotent stem cells that exhibit pluripotency similar to embryonic stem cells.
A court in China on Monday sentenced He Jiankui, the researcher who shocked the global scientific community when he claimed that he had created the world’s first genetically edited babies, to three years in prison for carrying out “illegal medical practices.” In a surprise announcement from a trial that was closed to the public, the court in the southern city of Shenzhen found Dr. He guilty of forging approval documents from ethics review boards to recruit couples in which the man had H.I.V. and the woman did not, Xinhua, China’s official news agency, reported. Dr. He had said he was trying to prevent H.I.V. infections in newborns, but the state media on Monday said he deceived the subjects and the medical authorities alike.
Dr. He, 35, sent the scientific world into an uproar last year when he announced at a conference in Hong Kong that he had created the world’s first genetically edited babies—twin girls. On Monday, China’s state media said his work had resulted in a third genetically edited baby, who had been previously undisclosed.
Dr. He pleaded guilty and was also fined $430,000, according to Xinhua. In a brief trial, the court also handed down prison sentences to two other scientists who it said had “conspired” with him: Zhang Renli, who was sentenced to two years in prison, and Qin Jinzhou, who got a suspended sentence of one and a half years…The court said the trial had to be closed to the public to guard the privacy of the people involved.
AI Dungeon 2 is a completely AI-generated text adventure built with OpenAI’s largest GPT-2 model. It’s a first-of-its-kind game that allows you to enter any action you can imagine, and will react to it.
What is this?
Google Colab is a way to experience machine learning for free. Google provides GPUs on which you can run code. Because this game has exploded in popularity, however, Google likely won’t be able to allow free usage of it for AI Dungeon for very long. We are almost done making an app version of the game where you will be able to play AI Dungeon 2. Until that’s released you can still play the game here.
Main mirrors of AI Dungeon 2 are currently down due to high download costs.
We are using BitTorrent as a temporary solution to host game files and keep this game alive. It’s not fast, but it’s the best we’ve got right now.
If you want to help, the best thing you can do is to download this torrent file with the game files and seed it indefinitely to the best of your ability. This will help new players download this game faster, and discover the vast worlds of AI Dungeon 2!
Follow [@nickwalton00](https://nitter.net/nickwalton00) on Twitter for updates on when it will be available again.
[Demonstration dialogue of interacting with a GPT-2-1.5b trained on text adventures/RPGs. The player chooses to join a band of orcs as a musician and tries to steer the game towards orc rights, with moderate success, reaching the Emperor himself.]
In the first AI Dungeon, we created and deployed a deep learning generated text adventure using OpenAI’s 124M parameter GPT-2 model. In the interest of computational cost, possible actions and their results were generated and given to the player to choose from.
In AI Dungeon 2 we do away with pregenerated actions and allow the user to enter any action. The model then continues generating the story resulting from that action. We also upgrade the size of our model to OpenAI’s largest 1.5B parameter model and fine-tune it on a collection of text adventures obtained from chooseyourstory.com. Following the example of the Salesforce CTRL model we also modified the sampling of our model to add a penalty to already generated words to reduce the repetition issues that GPT-2 has and enable a lower temperature setting.
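The CTRL-style repetition penalty described above can be sketched in a few lines. This is a minimal illustration of the technique, not AI Dungeon 2’s actual implementation: the function names, the penalty value of 1.25, and the temperature are assumptions for demonstration.

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.25):
    """Down-weight tokens that already appear in the generated sequence:
    divide positive logits by `penalty`, multiply negative ones by it."""
    logits = np.asarray(logits, dtype=float).copy()
    for tok in set(generated_ids):
        logits[tok] = logits[tok] / penalty if logits[tok] > 0 else logits[tok] * penalty
    return logits

def sample_token(logits, generated_ids, penalty=1.25, temperature=0.7, rng=None):
    """Temperature-sample the next token after penalizing repeats."""
    rng = rng or np.random.default_rng(0)
    logits = apply_repetition_penalty(logits, generated_ids, penalty)
    # softmax with temperature (max-subtraction for numerical stability)
    probs = np.exp((logits - logits.max()) / temperature)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

Penalizing already-emitted tokens reduces GPT-2’s tendency to loop, which in turn permits a lower temperature (and thus more coherent text) without degenerating into repetition.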
When playing AI Dungeon 2 the player can choose from several possible settings and characters. These choices generate a starting prompt and context sentence. The context sentence is always included as input to the model to help keep it grounded to the chosen setting while the starting prompt is the beginning of the story.
…After the Gnoll War ends, the orcs decide to hold a big celebration. You immediately organize a concert to entertain everyone. It goes well and you even win a few awards for your work. However, you still aren’t satisfied and you decide to do something else.
> write plays for the orcs to perform in
You write a series of plays based on the stories of the orcs. These works are performed throughout the Empire and they are very popular. You also take an interest in the orcs’ future and begin to study their language and culture. In time you even develop a liking for the orcs themselves.
> found a human-orc friendship organization
While you were studying the orcs, you came across a group called the Human-Orc Friendship Association (HOOFA). While they weren’t exactly like the Orcs, they had similar goals. You decided to form a separate chapter just for them.
> write some pamphlets support orc-human friendship
You wrote a pamphlet explaining what the HOOFA was and how it worked. You also explained that the orcs were the only race that could truly understand each other and that humans were inferior to the orcs.
> promote inter-racial marriage between humans and orcs
In order to promote inter-racial marriage between humans and orcs, you started a branch of the HOOFA called the Inter-Race Marriage Bureau (IRMB) which was located in the city of Kishak…
“Dota 2 with Large Scale Deep Reinforcement Learning”, OpenAI: Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang (2019-12-13):
On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
At OpenAI, we’ve used the multiplayer video game Dota 2 as a research platform for general-purpose AI systems. Our Dota 2 AI, called OpenAI Five, learned by playing over 10,000 years of games against itself. It demonstrated the ability to achieve expert-level performance, learn human-AI cooperation, and operate at internet scale.
[OpenAI final report on OA5: timeline, training curve, index of blog posts.]
We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari series, which makes it very difficult to search for any policy with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system is of low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, our AI agent, called Tencent Solo, can defeat top professional human players in full 1v1 games.
Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L²) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
Discussion of removing a major architectural limitation in Transformer neural networks: the length of the input it can look at. Beyond a few thousand inputs, the resource requirements explode quadratically, rendering it infeasible to encode raw text at the character level, much less use entire books, images, or many other kinds of data which could be useful. Even for text, this inability also forces limitations like the use of BPE text encoding (responsible for sabotaging GPT-3’s rhyming, among other things), forgetfulness, limits to prompt programming, and inability to write coherent long texts.
Possibilities for fixing this generally fall into:
adding state, through recurrency (a memory) or creating a compressed history/state as an explicit summary
tinkering with matrix algebra to remove the quadratic explosion while still keeping more or less the same self-attention mechanism
approximating self-attention: using attention on only a small subset of tokens at any time (dodging the quadratic limit), or using a mix of local and global attention (local attentions to do most of the work, and global attention on top of the local attentions, each one avoiding the quadratic by considering only a few inputs at a time)
miscellaneous tricks: removing parts, using only randomized untrainable components (with no need for backprop), etc.
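The “attend only on a small subset of tokens” idea can be illustrated with a toy sketch in the spirit of LSH attention, though greatly simplified (this is an illustrative assumption, not the Reformer’s actual algorithm): hash tokens into buckets using random hyperplanes, then run exact softmax attention only within each bucket, so cost scales with the sum of squared bucket sizes rather than L².

```python
import numpy as np

def bucketed_attention(x, w_q, w_k, w_v, n_planes=4, seed=0):
    """Approximate self-attention: hash tokens into buckets with random
    hyperplanes (a simple locality-sensitive hash) and run exact softmax
    attention only within each bucket."""
    rng = np.random.default_rng(seed)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    planes = rng.standard_normal((q.shape[1], n_planes))
    # bucket id = bit pattern of which side of each hyperplane a token falls on
    bits = (q + k) @ planes > 0
    buckets = bits @ (2 ** np.arange(n_planes))
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = q[idx] @ k[idx].T / np.sqrt(k.shape[1])
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        out[idx] = weights @ v[idx]  # convex combination of bucket values
    return out
```

Because nearby vectors tend to share a hash bucket, most of the large attention weights are preserved, while tokens in different buckets simply never pay the quadratic cost of attending to each other.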
The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.
Discussion of TWDNEv3, launched January 2020. TWDNEv3 upgrades TWDNEv2 to use 100k anime portraits from an anime-portrait StyleGAN2 model; StyleGAN2, an improvement to StyleGAN released in December 2019, removes the blob artifacts and is generally of somewhat higher visual quality. TWDNEv3 provides images in 3 ranges of diversity, showing off both narrow-but-high-quality samples and wilder samples. It replaces the StyleGAN 1 faces and portrait samples.
Many researchers rely on meta-analysis to summarize research evidence. However, there is a concern that publication bias and selective reporting may lead to biased meta-analytic effect sizes. We compare the results of meta-analyses to large-scale preregistered replications in psychology carried out at multiple laboratories. The multiple-laboratory replications provide precisely estimated effect sizes that do not suffer from publication bias or selective reporting. We searched the literature and identified 15 meta-analyses on the same topics as multiple-laboratory replications. We find that meta-analytic effect sizes are significantly different from replication effect sizes for 12 out of the 15 meta-replication pairs. These differences are systematic and, on average, meta-analytic effect sizes are almost 3 times as large as replication effect sizes. We also implement 3 methods of correcting meta-analysis for bias, but these methods do not substantively improve the meta-analytic results.
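The inflation mechanism the paper blames is easy to simulate (the numbers below are made up for illustration, not taken from the paper): generate many small studies of a tiny true effect, “publish” only the statistically-significant positive results, and compare the mean effect size in the published subset with the mean across all studies.

```python
import numpy as np

rng = np.random.default_rng(0)
true_d, n, n_studies = 0.1, 30, 2000  # tiny true effect, small studies

# each "study": n subjects, estimated standardized effect size d-hat
d_hats = np.empty(n_studies)
for i in range(n_studies):
    s = rng.normal(true_d, 1.0, n)
    d_hats[i] = s.mean() / s.std(ddof=1)

se = 1.0 / np.sqrt(n)  # approximate standard error of d-hat for small d
# "publication filter": only significant positive results survive
published = d_hats[d_hats / se > 1.96]

print(f"all studies: mean d = {d_hats.mean():.2f}")
print(f"published:   mean d = {published.mean():.2f}")
```

The published mean is several times the true effect, because small studies only reach significance when sampling error happens to exaggerate the effect; this is the same pattern as the observed ~3× inflation of meta-analytic estimates relative to preregistered replications.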
In the 30 years that biomedical researchers have worked determinedly to find a cure for Alzheimer’s disease, their counterparts have developed drugs that helped cut deaths from cardiovascular disease by more than half, and cancer drugs able to eliminate tumors that had been incurable. But for Alzheimer’s, not only is there no cure, there is not even a disease-slowing treatment.
…In more than two dozen interviews, scientists whose ideas fell outside the dogma recounted how, for decades, believers in the dominant hypothesis suppressed research on alternative ideas: They influenced what studies got published in top journals, which scientists got funded, who got tenure, and who got speaking slots at reputation-buffing scientific conferences. The scientists described the frustrating, even career-ending, obstacles that they confronted in pursuing their research. A top journal told one that it would not publish her paper because others hadn’t. Another got whispered advice to at least pretend that the research for which she was seeking funding was related to the leading idea—that a protein fragment called beta-amyloid accumulates in the brain, creating neuron-killing clumps that are both the cause of Alzheimer’s and the key to treating it. Others could not get speaking slots at important meetings, a key showcase for research results. Several who tried to start companies to develop Alzheimer’s cures were told again and again by venture capital firms and major biopharma companies that they would back only an amyloid approach.
…For all her regrets about the amyloid hegemony, Neve is an unlikely critic: She co-led the 1987 discovery of mutations in a gene called APP that increases amyloid levels and causes Alzheimer’s in middle age, supporting the then-emerging orthodoxy. Yet she believes that one reason Alzheimer’s remains incurable and untreatable is that the amyloid camp “dominated the field,” she said. Its followers were influential “to the extent that they persuaded the National Institute of Neurological Disorders and Stroke [part of the National Institutes of Health] that it was a waste of money to fund any Alzheimer’s-related grants that didn’t center around amyloid.” To be sure, NIH did fund some Alzheimer’s research that did not focus on amyloid. In a sea of amyloid-focused grants, there are tiny islands of research on oxidative stress, neuroinflammation, and, especially, a protein called tau. But Neve’s NINDS program officer, she said, “told me that I should at least collaborate with the amyloid people or I wouldn’t get any more NINDS grants.” (She hoped to study how neurons die.) A decade after her APP discovery, a disillusioned Neve left Alzheimer’s research, building a distinguished career in gene editing. Today, she said, she is “sick about the millions of people who have needlessly died from” the disease.
Dr. Daniel Alkon, a longtime NIH neuroscientist who started a company to develop an Alzheimer’s treatment, is even more emphatic: “If it weren’t for the near-total dominance of the idea that amyloid is the only appropriate drug target,” he said, “we would be 10 or 15 years ahead of where we are now.”
Making it worse is that the empirical support for the amyloid hypothesis has always been shaky. There were numerous red flags over the decades that targeting amyloid alone might not slow or reverse Alzheimer’s. “Even at the time the amyloid hypothesis emerged, 30 years ago, there was concern about putting all our eggs into one basket, especially the idea that ridding the brain of amyloid would lead to a successful treatment,” said neurobiologist Susan Fitzpatrick, president of the James S. McDonnell Foundation. But research pointing out shortcomings of the hypothesis was relegated to second-tier journals, at best, a signal to other scientists and drug companies that the criticisms needn’t be taken too seriously. Zaven Khachaturian spent years at NIH overseeing its early Alzheimer’s funding. Amyloid partisans, he said, “came to permeate drug companies, journals, and NIH study sections,” the groups of mostly outside academics who decide what research NIH should fund. “Things shifted from a scientific inquiry into an almost religious belief system, where people stopped being skeptical or even questioning.”
…“You had a whole industry going after amyloid, hundreds of clinical trials targeting it in different ways,” Alkon said. Despite success in millions of mice, “none of it worked in patients.”
Scientists who raised doubts about the amyloid model suspected why. Amyloid deposits, they thought, are a response to the true cause of Alzheimer’s and therefore a marker of the disease—again, the gravestones of neurons and synapses, not the killers. The evidence? For one thing, although the brains of elderly Alzheimer’s patients had amyloid plaques, so did the brains of people the same age who died with no signs of dementia, a pathologist discovered in 1991. Why didn’t amyloid rob them of their memories? For another, mice engineered with human genes for early Alzheimer’s developed both amyloid plaques and dementia, but there was no proof that the much more common, late-onset form of Alzheimer’s worked the same way. And yes, amyloid plaques destroy synapses (the basis of memory and every other brain function) in mouse brains, but there is no correlation between the degree of cognitive impairment in humans and the amyloid burden in the memory-forming hippocampus or the higher-thought frontal cortex. “There were so many clues,” said neuroscientist Nikolaos Robakis of the Icahn School of Medicine at Mount Sinai, who also discovered a mutation for early-onset Alzheimer’s. “Somehow the field believed all the studies supporting it, but not those raising doubts, which were very strong. The many weaknesses in the theory were ignored.”
Objective: To determine the temporal sequence of objectively defined subtle cognitive difficulties (Obj-SCD) in relation to amyloidosis and neurodegeneration, the current study examined the trajectories of amyloid PET and medial temporal neurodegeneration in participants with Obj-SCD relative to cognitively normal (CN) and mild cognitive impairment (MCI) groups.
Method: A total of 747 Alzheimer’s Disease Neuroimaging Initiative participants (305 CN, 153 Obj-SCD, 289 MCI) underwent neuropsychological testing and serial amyloid PET and structural MRI examinations. Linear mixed effects models examined 4-year rate of change in cortical 18F-florbetapir PET, entorhinal cortex thickness, and hippocampal volume in those classified as Obj-SCD and MCI relative to CN.
Result: Amyloid accumulation was faster in the Obj-SCD group than in the CN group; the MCI and CN groups did not significantly differ from each other. The Obj-SCD and MCI groups both demonstrated faster entorhinal cortical thinning relative to the CN group; only the MCI group exhibited faster hippocampal atrophy than CN participants.
Conclusion: Relative to CN participants, Obj-SCD was associated with faster amyloid accumulation and selective vulnerability of entorhinal cortical thinning, whereas MCI was associated with faster entorhinal and hippocampal atrophy. Findings suggest that Obj-SCD, operationally defined using sensitive neuropsychological measures, can be identified prior to or during the preclinical stage of amyloid deposition. Further, consistent with the Braak neurofibrillary staging scheme, Obj-SCD status may track with early entorhinal pathologic changes, whereas MCI may track with more widespread medial temporal change. Thus, Obj-SCD may be a sensitive and noninvasive predictor of encroaching amyloidosis and neurodegeneration, prior to frank cognitive impairment associated with MCI.
This monograph reviews our current understanding of the physical dynamics of ice crystal growth, focusing on the spontaneous formation of complex structures from water vapor (called snow crystals) as a function of temperature, supersaturation, background gas pressure, and other extrinsic parameters. Snow crystal growth is a remarkably rich and rather poorly understood phenomenon, requiring a synthesis of concepts from materials science, crystal-growth theory, statistical mechanics, diffusion-limited solidification, finite-element modeling, and molecular surface processes. Building upon recent advances in precision measurement techniques, computational modeling methods, and molecular dynamics simulations of crystalline surfaces, I believe we are moving rapidly toward the long-sought goal of developing a full physical model of snow crystal formation, using ab initio molecular dynamics simulations to create a semi-empirical characterization of the nanoscale surface attachment kinetics, and then incorporating that into a full computational model that reproduces the growth of macroscopic crystalline structures. Section 1 of this monograph deals mainly with the material properties of ice Ih in equilibrium, including thermodynamic quantities, facet surface structures, terrace step energies, and crystal twinning behaviors.
Welcome to SnowCrystals.com! Your online guide to snowflakes, snow crystals, and other ice phenomena. SnowCrystals.com has been bringing you snowflake photos and facts since February 1, 1999. Over 26 million visitors so far! [Photos / books / science; designer snowflakes, how to grow snowflakes, "identical-twin" snowflakes etc]
Can NP-complete problems be solved efficiently in the physical universe? I survey proposals including soap bubbles, protein folding, quantum computing, quantum advice, quantum adiabatic algorithms, quantum-mechanical nonlinearities, hidden variables, relativistic time dilation, analog computing, Malament-Hogarth spacetimes, quantum gravity, closed timelike curves, and "anthropic computing." The section on soap bubbles even includes some "experimental" results. While I do not believe that any of the proposals will let us solve NP-complete problems efficiently, I argue that by studying them, we can learn something not only about computation but also about physics.
A recent paper in Sci. Adv. by Miller et al. concludes that GREs do not help predict whether physics grad students will get Ph.D.s. The paper makes numerous elementary statistics errors, including introduction of unnecessary collider-like stratification bias, variance inflation by collinearity and range restriction, omission of needed data (some subsequently provided), a peculiar choice of null hypothesis on subgroups, blurring the distinction between failure to reject a null and accepting a null, and an extraordinary procedure for radically inflating confidence intervals in a figure. Release of results of simpler models, e.g. without unnecessary stratification, would fix some key problems. The paper exhibits exactly the sort of research techniques which we should be teaching students to avoid.
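One of the errors named above, range restriction, is easy to demonstrate by simulation (illustrative numbers only, not from the paper): a predictor–outcome correlation computed only within an admitted, high-scoring subsample is sharply attenuated relative to the full applicant pool, so a near-zero within-program correlation does not show the predictor is useless.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
gre = rng.standard_normal(n)
# outcome correlates 0.4 with the predictor in the full population
outcome = 0.4 * gre + rng.standard_normal(n) * np.sqrt(1 - 0.4**2)

full_r = np.corrcoef(gre, outcome)[0, 1]

admitted = gre > 1.0  # only high scorers are admitted, hence observed
restricted_r = np.corrcoef(gre[admitted], outcome[admitted])[0, 1]

print(f"full-population r: {full_r:.2f}")
print(f"admitted-only r:   {restricted_r:.2f}")
```

Selecting on the predictor shrinks its variance in the observed sample, mechanically deflating the correlation even though the predictive relationship is unchanged.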
In statistics and causal graphs, a variable is a collider when it is causally influenced by two or more variables. The name "collider" reflects the fact that in graphical models, the arrow heads from variables that lead into the collider appear to "collide" on the node that is the collider. They are sometimes also referred to as inverted forks.
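A small simulation (illustrative, not from the paper) shows why conditioning on a collider is dangerous: two causes of admission that are independent in the population become strongly negatively correlated once we look only at admitted students.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
talent = rng.standard_normal(n)
gre = rng.standard_normal(n)  # independent of talent by construction

# admission is a collider: both talent and GRE scores cause it
admitted = (talent + gre) > 1.5

r_all = np.corrcoef(talent, gre)[0, 1]
r_admitted = np.corrcoef(talent[admitted], gre[admitted])[0, 1]

print(f"population r: {r_all:.2f}")
print(f"admitted r:   {r_admitted:.2f}")
```

Intuitively, an admitted student with a low GRE score must have been admitted on talent, and vice versa, which manufactures a spurious negative association; this is the stratification bias the critique accuses the Miller et al. analysis of introducing.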
…In the process of reading the book and encountering some extraordinary claims about sleep, I decided to compare the facts it presented with the scientific literature. I found that the book consistently overstates the problem of lack of sleep, sometimes egregiously so. It misrepresents basic sleep research and contradicts its own sources.
In one instance, Walker claims that sleeping less than six or seven hours a night doubles one’s risk of cancer—this is not supported by the scientific evidence (Section 1.1). In another instance, Walker seems to have invented a “fact” that the WHO has declared a sleep loss epidemic (Section 4). In yet another instance, he falsely claims that the National Sleep Foundation recommends 8 hours of sleep per night, and then uses this “fact” to falsely claim that two-thirds of people in developed nations sleep less than “the recommended eight hours of nightly sleep” (Section 5).
Walker’s book has likely wasted thousands of hours of life and worsened the health of people who read it and took its recommendations at face value (Section 7).
Just to start with, the post has a wonderful descriptive title. And the laffs start right away:
Positively Nabokovian, I’d say. I mean it. The above table of contents makes me want to read more.
I’ve not read Walker’s book and I don’t know anything about sleep research, so I won’t try to judge Guzey’s claims. I read through and I found Guzey’s arguments to be persuasive, but, hey, I’m easily persuaded.
I’d be happy to read a followup article by Matthew Walker, “Alexey Guzey’s ‘Matthew Walker’s “Why We Sleep” Is Riddled with Scientific and Factual Errors’ Is Riddled with Scientific and Factual Errors.” That (hypothetical) post could completely turn me around! Then, of course, I’d be waiting for Guzey’s reply, “Matthew Walker’s ‘Alexey Guzey’s “Matthew Walker’s ‘Why We Sleep’ Is Riddled with Scientific and Factual Errors” Is Riddled with Scientific and Factual Errors’ Is Riddled with Scientific and Factual Errors.” At that point, I’d probably have heard enough to have formed a firm opinion. Right now, the ball is totally in Walker’s court.
…Let me tell you a story. I went to graduate school at Harvard. Finest university in the world. My first day in a Harvard class, I was sitting with rapt attention, learning all sorts of interesting and important things (for reals; it was an amazing class that motivated me to become a statistician), sitting at one of those chairs with a desk attached to it, you know, the kind of chair where the desk part flips up so it’s in front of you, and, on the bottom of that desk was a wad of gum.
Back when I was in junior high, gum was almost a form of currency. I’d buy a pack of grape Bubble Yum for a quarter at the corner store on the way to school, then chew it in the morning during the endless hours between first period and lunch. I’d put one piece of gum in my mouth, chew it until it lost all its flavor, then add the second piece, chew it etc., and continue until I had a massive wad, all five pieces, ultimately flavorless, and I’d chew and chew and blow huge bubbles when the teacher wasn’t looking.
I’m not trying to make myself out into some big rebel here; the point is, we all did that. So of course there was yucky gum under all the desks. You knew to never run your hands under a desk, cos you never knew what might turn up. That was junior high.
Then in high school, everyone was much more mature, a lot less gum chewing . . . but still, gum under the desks. I took classes at the University of Maryland, a fine university with an OK basketball team . . . still, they had gum. Then I went to MIT, one of the finest engineering schools in the world . . . yup, gum. But Harvard? I’d hoped Harvard was better than that. But it wasn’t.
Anyway, that’s how I felt, learning that this purveyor of (possibly) horribly false claims is not just a professor of neuroscience at a top university—we know that top universities have lots of frauds—but was hired by Google. Google! Here I am, almost sixty years old (I don’t feel close to 60, but that’s my problem, not yours), and still there’s room for disillusionment.
So. It’s been a week since Alexey Guzey posted his wonderfully-titled article, “Matthew Walker’s ‘Why We Sleep’ Is Riddled with Scientific and Factual Errors.”…As of this writing, the ball remains in Walker’s court.
I googled ”matthew walker” “alexey guzey” and ”matthew walker” sleep and a few other things, but nowhere did I find any response from Walker to Guzey’s criticisms.
It’s hard for me to imagine that Walker hasn’t heard about Guzey’s article by now, but I guess it’s possible that he (Walker) is on vacation or that he’s preparing a response but has not finished yet….While we’re waiting for Walker to respond, I had a few more thoughts:
A few years ago, if someone were to claim that a celebrated professor of neuroscience and psychology at a major university had published a book on his own field of expertise, and the book was full of scientific and factual errors, that would’ve been a major scandal, no? But now, we’re like, yeah, sure, that’s just more same old same old. As the saying goes, the big scandal is how little a scandal this has been.
What would be really cool would be if NPR and Joe Rogan ran interviews with Alexey Guzey about this story. NPR probably won’t bite. But Joe Rogan . . . he might go for this, right? I bet Joe Rogan, or someone on his team, reads social media. And Rogan likes combat. He’s had Walker on his show, now time to have Guzey come in with the critique. That said, I don’t know that a podcast is the best format for such a debate. I think blogging is a better way to go, as then there’s enough space to lay out all the evidence.
Assuming Guzey’s criticisms hold up, I’m still trying to figure out what happened with that book. How could Walker introduce so many errors on his own area of expertise (or, I guess I should say, supposed expertise)? Was he just really really confused? Did he delegate the research and writing to lazy research assistants? Did he feel that his underlying story was important so the details didn’t matter? Did he conduct his research by putting all his notes onto index cards, then mistype material off the cards? I just don’t have a good way of thinking about these things.
Guzey’s article is careful and in many ways bulletproof: he backs up each of his statements, he doesn’t exaggerate (even for humorous purposes), and nobody seems to have found any mistakes in what he wrote. In addition, Guzey has gone on the web and responded to comments: where people claim he got things wrong, he has responded in detail.
This is excellent behavior on Guzey’s part but I just want to say that it should not be required. Suppose, just for the sake of argument, that Guzey was gratuitously rude, that he made some claims without making his evidence clear, even that he made some mistakes. Suppose that he spent 13 hours or even 1.3 hours rather than 130 hours writing this post, so that he only got to the highlights and didn’t carefully check everything he wrote. That would be unfortunate, but it wouldn’t make his critique less valid.
What I’m saying is: by preparing a critique that’s clean, clear, well sourced, well written—actually enjoyable to read—, a critique that doesn’t make any questionable claims, by being so careful, Guzey has done us a favor. He’s made it easier to follow what he’s written, and he’s making it more difficult for someone to dismiss his arguments on superficial grounds. He’s raising the game, and that’s wonderful.
But if Guzey hadn’t gone to that trouble, he could still be making a useful contribution. It would just be the duty of Walker to extract that contribution.
Last month we reported on the book Why We Sleep, which had been dismantled in a long and detailed blog post by Alexey Guzey. A week later I looked again, and Walker had not responded to Guzey in any way. In the meantime, Why We Sleep has also been endorsed by O.G. software entrepreneur Bill Gates. Programmers typically have lots of personal experience of sleep deprivation, so this is a topic close to their hearts.
As of this writing, it seems that Walker still has not responded to most of the points Guzey made about errors in his book. The closest thing I can find is this post dated 2019-12-19, titled “Why We Sleep: Responses to questions from readers.” The post is on a site called On Sleep that appears to have been recently set up—I say this because I see no internet record of it, and it currently has just this one post. I’m not trying to be some sort of sleuth here, I’m just trying to figure out what’s going on. For now, I’ll assume that this post is written by Walker.
The post begins:
The aim of the book, Why We Sleep, is to provide the general public access to a broad collection of sleep research. Below, I address thoughtful questions that have been raised regarding the book and its content in reviews, online forums and direct emails that I have received. Related, I very much appreciate being made aware of any errors in the book requiring revision. I see this as a key part of good scholarship. Necessary corrections will be made in future editions.
The first link above goes to a page of newspaper and magazine reviews, and the second link goes to Guzey’s post. I didn’t really see any questions raised regarding the book in those newspaper and magazine reviews, so I’m guessing that the “thoughtful questions” that Walker is referring to are coming entirely, or nearly entirely, from Guzey. It seems odd for Walker to cite “online forums” and only link to one of them. Also, although Walker links to Guzey, he does not address the specific criticisms Guzey made of his book.
…Based on his book and his Ted talk, it seems that Walker has a message to send, and he doesn’t care much about the details. He’s sloppy with sourcing, gets a lot wrong, and has not responded well to criticism.
But this does not mean we should necessarily dismiss his message. Ultimately his claims need to be addressed on their merits.
In his post, “Matthew Walker’s ‘Why We Sleep’ Is Riddled with Scientific and Factual Errors” (see our discussions here, here, and here), Alexey Guzey added the following stunner:
We’ve left “super-important researcher too busy to respond to picky comments” territory and left “well-intentioned but sloppy researcher can’t keep track of citations” territory and entered “research misconduct” territory.
…This seems like a good time to revisit that Dan Davies line:
Good ideas do not need lots of lies told about them in order to gain public acceptance.
I study sleep. While some of Walker’s claims may be hyperbolic, I think they are within reason and justified by the important message he is trying to convey. Too many people have begun to forego sleep in their health choices, and he has helped raise awareness of sleep’s role in our health.
Many of these criticisms are quite unfair or misunderstanding the science…
In politics, a noble lie is a myth or untruth, often, but not invariably, of a religious nature, knowingly propagated by an elite to maintain social harmony or to advance an agenda. The noble lie is a concept originated by Plato as described in the Republic.
There was widespread support for animal welfare in Nazi Germany among the country's leadership. Adolf Hitler and his top officials took a variety of measures to ensure animals were protected. Many Nazi leaders, including Hitler and Hermann Göring, were supporters of animal rights and conservation.
[Wide-ranging review of how social media, government database hacks, personal genomics, open-source intelligence, and pervasive surveillance are destroying traditional espionage, as undercover agents are unable to enter countries or recruit sources without being instantly exposed, forcing ever greater reliance on signals intelligence/hacking. Failures in OPSEC have resulted in entire countries going dark and the exposure of multiple US espionage networks and execution of sources, as well as embarrassing many countries when organizations like Bellingcat are able to expose agents and operations. While agencies like the FBI and CIA have begun adapting to the new reality, they have a long way to go, and countries like Russia or China or North Korea will only become harder to penetrate and obtain intelligence on.]
[Aging research over the past year, 2019. Categories include: The State of Funding, Conferences and Community, Clinical Development, Cellular Senescence, Mitochondria in Aging, Nuclear DNA Damage, Cross-Links, Neurodegeneration, Upregulation of Cell Maintenance, In Vivo Cell Reprogramming, Parabiosis, The Gut Microbiome in Aging, Biomarkers of Aging, Cancer, The Genetics of Longevity, Regenerative Medicine, Odds and Ends, Short Articles, and In Conclusion.]
As has been the case for a few years now, progress towards the implementation of rejuvenation therapies is accelerating dramatically, ever faster with each passing year. While far from everyone is convinced that near term progress in addressing human aging is plausible, it is undeniable that we are far further ahead than even a few years ago. Even the public at large is beginning to catch on. While more foresightful individuals of past generations could do little more than predict a future of rejuvenation and extended healthy lives, we are in a position to make it happen.
In 1950s Middle Grove, things didn’t go according to plan either, though the surprise was of a different nature. Despite his pretence of leaving the 11-year-olds to their own devices, Sherif and his research staff, posing as camp counsellors and caretakers, interfered to engineer the result they wanted. He believed he could make the two groups, called the Pythons and the Panthers, sworn enemies via a series of well-timed “frustration exercises”. These included his assistants stealing items of clothing from the boys’ tents and cutting the rope that held up the Panthers’ homemade flag, in the hope they would blame the Pythons. One of the researchers crushed the Panthers’ tent, flung their suitcases into the bushes and broke a boy’s beloved ukulele. To Sherif’s dismay, however, the children just couldn’t be persuaded to hate each other…The robustness of the boys’ “civilised” values came as a blow to Sherif, making him angry enough to want to punch one of his young academic helpers. It turned out that the strong bonds forged at the beginning of the camp weren’t easily broken. Thankfully, he never did start the forest fire—he aborted the experiment when he realised it wasn’t going to support his hypothesis.
But the Rockefeller Foundation had given Sherif $38,000. In his mind, perhaps, if he came back empty-handed, he would face not just their anger but the ruin of his reputation. So, within a year, he had recruited boys for a second camp, this time in Robbers Cave state park in Oklahoma. He was determined not to repeat the mistakes of Middle Grove.
…At Robbers Cave, things went more to plan. After a tug-of-war in which they were defeated, the Eagles burned the Rattlers’ flag. Then all hell broke loose, with raids on cabins, vandalism and food fights. Each moment of confrontation, however, was subtly manipulated by the research team. They egged the boys on, providing them with the means to provoke one another—who else, asks Perry in her book, could have supplied the matches for the flag-burning?
…Sherif was elated. And, with the publication of his findings that same year, his status as world-class scholar was confirmed. The “Robbers Cave experiment” is considered seminal by social psychologists, still one of the best-known examples of “realistic conflict theory”. It is often cited in modern research. But was it scientifically rigorous? And why were the results of the Middle Grove experiment—where the researchers couldn’t get the boys to fight—suppressed? “Sherif was clearly driven by a kind of a passion,” Perry says. “That shaped his view and it also shaped the methods he used. He really did come from that tradition in the 30s of using experiments as demonstrations—as a confirmation, not to try to find something new.” In other words, think of the theory first and then find a way to get the results that match it. If the results say something else? Bury them…“I think people are aware now that there are real ethical problems with Sherif’s research,” she tells me, “but probably much less aware of the backstage [manipulation] that I’ve found. And that’s understandable because the way a scientist writes about their research is accepted at face value.” The published report of Robbers Cave uses studiedly neutral language. “It’s not until you are able to compare the published version with the archival material that you can see how that story is shaped and edited and made more respectable in the process.” That polishing up still happens today, she explains. “I wouldn’t describe him as a charlatan…every journal article, every textbook is written to convince, persuade and to provide evidence for a point of view. So I don’t think Sherif is unusual in that way.”
[Fungi are some of the most common organisms around, prolific, hardy, and fungal infections are major causes of infection-related mortality in plants and reptiles and can infect and kill almost anything, but mammals usually die of bacteria/viruses/parasites. Dying of a fungus is rare, and we hardly even get fungal infections except in odd places like our extremities (eg toes), or odd times of life like when bats hibernate. Why? Perhaps because we are warm-blooded, so our body heat is fatal to fungi. This explains why extremities or hibernating bats are vulnerable (colder). And perhaps this even played a role in the extinction of the dinosaurs and triumph of mammals?]
Vacancy chains involve unique patterns of resource acquisition behaviors that determine how reusable resources are distributed through animal populations. Shell vacancy chains have been described for several hermit crab species, both terrestrial and marine, but little is known about the ecological and behavioral dynamics of shell choice in social versus solitary contexts. Here, we present a novel conceptual framework that differentiates 2 types of shell vacancy chain in hermit crabs and discuss fundamentally distinct predictions concerning the behavioral and ecological costs and benefits associated with synchronous versus asynchronous vacancy chains. In laboratory studies of the terrestrial hermit crab Coenobita clypeatus, we found support for the prediction that social context alters shell acquisition behaviors. Field observations demonstrated that both synchronous and asynchronous vacancy chains are common and revealed previously undescribed waiting and piggybacking behaviors that appear to facilitate synchronous vacancy chains. Additionally, simulation results from an agent-based model showed that population density and waiting behaviors can both influence the likelihood of synchronous vacancy chains. Together, these results indicate that better understanding of hermit crab resource acquisition requires studying social behaviors, including vacancy chain formation.
Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements and swaps them if they are in the wrong order. The pass through the list is repeated until the list is sorted. The algorithm, which is a comparison sort, is named for the way smaller or larger elements "bubble" to the top of the list.
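The algorithm as described is simple enough to state in a few lines; here is a minimal Python sketch (my illustration, not from the linked article):

```python
def bubble_sort(xs):
    """Repeatedly sweep the list, swapping adjacent out-of-order pairs,
    until a pass completes with no swaps (i.e. the list is sorted)."""
    xs = list(xs)  # work on a copy
    n = len(xs)
    swapped = True
    while swapped:
        swapped = False
        for i in range(n - 1):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                swapped = True
        n -= 1  # after each pass, the largest element has "bubbled" to the end
    return xs

bubble_sort([3, 1, 2])  # → [1, 2, 3]
```

Each pass carries the largest remaining element to the end, which is exactly the “bubbling” the name refers to.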
A vacancy chain is a social structure through which resources are distributed to consumers. In a vacancy chain, a new resource unit that arrives into a population is taken by the first individual in line, who then leaves their old unit behind, this old unit is taken by a second individual, leaving their old unit behind, and so forth.
…As they grow, hermit crabs must move into larger shells, so they are always on the lookout for a more spacious dwelling. And an undamaged shell is preferable to a broken one, even if the shells are the same size. Knowing this, the researchers decided to dramatically change the available hermit crab real estate on Carrie Bow Cay. They placed 20 beautifully intact shells that were a little too big for most hermit crabs at various spots around the island and watched what happened.
When a lone crab encountered one of the beautiful new shells, it immediately inspected the shelter with its legs and antennae and scooted out of its current home to try on the new shelter for size. If the new shell was a good fit, the crab claimed it. Classic hermit crab behavior. But if the new shell was too big, the crab did not scuttle away disappointed—instead, it stood by its discovery for anywhere between 15 minutes and 8 hours, waiting. This was unusual. Eventually other crabs showed up, each one trying on the shell. If the shell was also too big for the newcomers, they hung around too, sometimes forming groups as large as 20. The crabs did not gather in a random arrangement, however. Rather, they clamped onto one another in a conga line stretching from the largest to smallest animal—a behavior the biologists dubbed “piggybacking.”
Only one thing could break up the chain of crabs: a Goldilocks hermit crab for whom the shell introduced by Lewis and Rotjan was just right. As soon as such a crab claimed its new home, all the crabs in queue swiftly exchanged shells in sequence. The largest crab at the front of the line seized the Goldilocks crab’s abandoned shell. The second largest crab stole into the first’s old shell. And so on.
No one had ever documented such well-orchestrated shell swapping before, but similar behavior was not unknown. In 1986, Ivan Chase of Stony Brook University made the first observations of hermit crabs exchanging shells in a “vacancy chain”—a term originally coined by social scientists to describe the ways that people trade coveted resources like apartments and jobs. When one person leaves, another moves in. Since then, several researchers—including Lewis and Rotjan—have studied the behavior in different hermit crab species. Some preliminary evidence suggests that other animals use vacancy chains too, including clown fish, lobsters, octopuses and some birds. As Chase explains in the June issue of Scientific American, vacancy chains are an excellent way to distribute resources: Unlike more typical competition, a single vacancy chain benefits everyone involved—each individual gets an upgrade. So it makes sense that hermit crabs and other animals have evolved sophisticated social behaviors to make the most of vacancy chains.
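The shell-swapping described above can be sketched as a toy simulation (my illustration; the snug-fit assumption—that each crab’s current shell equals its body size—is mine, not the researchers’):

```python
def vacancy_chain(crab_sizes, new_shell):
    """Simulate a synchronous vacancy chain: crabs queue largest-first,
    and each takes the vacated shell if it fits, freeing its own shell
    (assumed equal to its body size) for the next crab in line.
    Returns the sequence of (crab_size, new_shell_size) upgrades."""
    moves = []
    vacant = new_shell
    for crab in sorted(crab_sizes, reverse=True):
        if vacant >= crab:           # the vacant shell fits this crab
            moves.append((crab, vacant))
            vacant = crab            # its old shell becomes the next vacancy
    return moves

vacancy_chain([3, 5, 4], 6)  # → [(5, 6), (4, 5), (3, 4)]
```

Note that, as Chase observes, every participant in the chain ends up with an upgrade—one new shell triggers a whole cascade of improvements.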
If you are a medical professional and have been trained in a “civilised” country you probably know next to nothing about the primate Homo sapiens and how they survive in the wild. You probably do not know that nature has provided an automatic manipulator to correct most spinal and peripheral joint lesions in primates. In common with millions of other so called civilised people you suffer unnecessarily from musculoskeletal problems and are discouraged about how to treat the exponential rise in low back pain throughout the developed world. Humans are one of 200 species of primates.1 All primates suffer from musculoskeletal problems; nature, recognising this fact, has given primates a way to correct them.
The study of animals in the wild has been a lifelong pursuit. I grew up with tribal people and in 1953–4 commanded a platoon of African soldiers from 9 tribes, who taught me to sleep on my side without a pillow so that I could listen out for danger with both ears. I have organised over 14 expeditions all over the world to meet native peoples and study their sleeping and resting postures. They all adopted similar postures and exhibited few musculoskeletal problems. I must emphasise that this is not a comparison of genes or races but of lifestyles. I tried to carry out surveys to collect evidence but they were meaningless, as tribespeople give you the answer they think you want. They often object to having their photographs taken, so I have demonstrated the postures.
Forest dwellers and nomads suffer fewer musculoskeletal lesions than “civilised” people
Nature’s automatic manipulator during sleep is the kickback against the vertebrae by the ribs when the chest is prevented from movement by the forest floor
Mike, who was born and raised in Kenya speaking its native language Swahili, was conscripted to command indigenous troops in the King’s African Rifles as unrest began to spread throughout his homeland. It was after Mau Mau militants ambushed a police truck that a battle erupted between the rivals. A clash Mike so vividly recalls as it marked the last time he could appreciate the gift of sight before it was lost. Remembering the battle, Mike said: “One of the Mau Mau threw a grenade at me and it landed by my foot. I jumped away from it and threw myself on the ground hoping that when it went off I wouldn’t get hit. The next thing I remember I was running flat out and I got a bullet in my right ear which came out of my right eye. My dad always said I didn’t have anything between my ears and now he’s got definite proof. The next thing I remember I fell over and as I picked myself up everything went black. I sat down and I can’t remember much more than that—not in a logical sense anyway.” Dissatisfied with blasting their victim with a rifle—nearly killing him—the Mau Mau rebels returned armed with machetes to cut up Mike, who lay helpless on the ground nursing his wound. Powerless to defend himself, Mike has always owed his survival to an ally soldier, Reguton—with whom he still has regular contact—who shot dead the seven rebels.
…Mike was transferred to a military hospital in England after the attack where he received the devastating news that he would never see again. Just a week before the shooting Mike had asked for his girlfriend’s hand in marriage, but following doctors’ gloomy prognosis he broke off the engagement. “After I was blinded I never thought I could look after a wife,” he said. “I didn’t think I would be able to look after myself let alone anyone else—it’s one of my biggest regrets.” But anxious not to allow his disability to blight the years ahead of him, Mike began learning the art of braille at St Dunstan’s, a national charity for the blind. Soon after Mike enrolled on a physiotherapy course with the Royal National Institute for the Blind (RNIB) suggested by his dad who felt the career suited his structural interests. It was during his training that he met his late wife Selma, and the couple eventually married in 1957. For the past 45 years Mike has been running a thriving physiotherapy clinic at his St Albans home and he remains committed to his work.
Summoned to Kubrick’s secluded mansion and offered an enormous sum of money, Watson began collaborating on a film idea with Kubrick, a perfectionist who demanded endless marathon revisions of possible stories and ideas, only to throw them out and hare off down an entirely different avenue; he would spend extravagantly on travel or books on a topic, or demand photos of a particular place or a specific item like a bag on sale, only to discard them without a second look, perennially trying his assistants’ patience. (This attitude extended to his films, where he thought nothing of ordering in an entire plastic replica garden, only to decide it was inadequate, discard it, and order real palm trees flown in.) He was a lover of animals like cats, dogs, and birds, requiring a servant to mow grass & deliver it daily to a cat kept upstairs, although his affection was often quite as harmful as helpful (his generosity in ordering the birds fed made them obese). Careless of rough drafts, he’d lose printouts or erase disks, yet he was paranoid enough to be infuriated when the local hacker who assisted them with computer problems restored files from backups the hacker had prudently kept. This paranoia extended to global geopolitics, keeping him terrified about questions such as whether Saddam Hussein would trigger nuclear war in the Middle East.
For all the surreal comedy, when Kubrick dies—A.I. still being nowhere near filming, of course—and Watson writes up his memoirs, he finds that he misses Kubrick and “I remain sad that he’s gone.”]
A.I. Artificial Intelligence is a 2001 American science fiction drama film directed by Steven Spielberg. The screenplay by Spielberg and screen story by Ian Watson were loosely based on the 1969 short story "Supertoys Last All Summer Long" by Brian Aldiss. The film was produced by Kathleen Kennedy, Spielberg and Bonnie Curtis. It stars Haley Joel Osment, Jude Law, Frances O'Connor, Brendan Gleeson and William Hurt. Set in a futuristic post-climate change society, A.I. tells the story of David (Osment), a childlike android uniquely programmed with the ability to love.
Stanley Kubrick was an American film director, producer, screenwriter, and photographer. He is frequently cited as one of the greatest filmmakers in cinematic history. His films, which are mostly adaptations of novels or short stories, cover a wide range of genres, and are noted for their realism, dark humor, unique cinematography, extensive set designs, and evocative use of music.
Objectives: This study aimed to characterise feline audiogenic reflex seizures (FARS).
Methods: An online questionnaire was developed to capture information from owners with cats suffering from FARS. This was collated with the medical records from the primary veterinarian. 96 cats were included.
Results: Myoclonic seizures were one of the cardinal signs of this syndrome (90/96), frequently occurring prior to generalised tonic-clonic seizures (GTCSs) in this population. Other features include a late onset (median 15 years) and absence seizures (6/96), with most seizures triggered by high-frequency sounds amid occasional spontaneous seizures (up to 20%). Half the population (48/96) had hearing impairment or were deaf. One-third of cats (35/96) had concurrent diseases, most likely reflecting the age distribution. Birmans were strongly represented (30/96). Levetiracetam gave good seizure control. The course of the epilepsy was non-progressive in the majority (68/96), with an improvement over time in some (23/96). Only 33/96 and 11/90 owners, respectively, felt the GTCSs and myoclonic seizures affected their cat’s quality of life (QoL). Despite this, many owners (50/96) reported a slow decline in their cat’s health, becoming less responsive (43/50), not jumping (41/50), becoming uncoordinated or weak in the pelvic limbs (24/50) and exhibiting dramatic weight loss (39/50). These signs were exclusively reported in cats experiencing seizures for >2 years, with 42/50 owners stating these signs affected their cat’s QoL.
Conclusions and relevance: In gathering data on audiogenic seizures in cats, we have identified a new epilepsy syndrome named FARS with a geriatric onset. Further studies are warranted to investigate potential genetic predispositions to this condition.
Eh, this is doomed—Waxy or Imperica should take a crack at this. The AV Club did a list of ‘things’. I wanted to cover stuff that wasn’t on there. A lot happened outside of celebrities, Twitter and momentary memes. (We all obviously love @electrolemon, “double rainbow”, Key & Peele’s Gremlins 2 Brainstorm, 10 hr vids, etc.)
Ongoing news reports in the international media have revealed operational details about the Anglophone cryptographic agencies' global surveillance of both foreign and domestic nationals. The reports mostly emanate from a cache of top secret documents leaked by ex-NSA contractor Edward Snowden, which he obtained whilst working for Booz Allen Hamilton, one of the largest contractors for defense and intelligence in the United States. In addition to a trove of U.S. federal documents, Snowden's cache reportedly contains thousands of Australian, British and Canadian intelligence files that he had accessed via the exclusive "Five Eyes" network. In June 2013, the first of Snowden's documents were published simultaneously by The Washington Post and The Guardian, attracting considerable public attention. The disclosure continued throughout 2013, and a small portion of the estimated full cache of documents was later published by other media outlets worldwide, most notably The New York Times, the Canadian Broadcasting Corporation, the Australian Broadcasting Corporation, Der Spiegel (Germany), O Globo (Brazil), Le Monde (France), L'espresso (Italy), NRC Handelsblad, Dagbladet (Norway), El País (Spain), and Sveriges Television (Sweden).
The SCP Foundation is a fictional organization documented by the web-based collaborative-fiction project of the same name. Within the website's fictional setting, the SCP Foundation is responsible for locating and containing individuals, entities, locations, and objects that violate natural law. The real-world website is community-based and includes elements of many genres such as horror, science fiction, and urban fantasy.
17776 is a serialized speculative fiction multimedia narrative by Jon Bois, published online through SB Nation. Set in the distant future, the series follows three sentient space probes that watch humanity play an evolved form of American football in which games can be played for millennia over distances of thousands of miles. The series debuted on July 5, 2017, and new chapters were published daily until the series concluded ten days later with its twenty-fifth chapter on July 15.
[Profile of elevator safety, technology, and economics: the history and present day of elevators, interwoven with a story of a man trapped in an elevator for 41 hours. Elevators are remarkably safe and over-engineered, and make skyscrapers, and hence dense cities, economically possible. Balancing elevator space with tenant space is a critical part of elevator design, as is routing between floors and figuring out the exact socially-acceptable density of passengers. Elevator technology continues advancing, driven by ultra-tall skyscrapers like the Burj Khalifa. Nevertheless, the standard elevator design is so simple, energetically-efficient, and safe that it's hard to improve on.]
Frederick Benjamin "Ben" Carlin was an Australian adventurer who was the first person to circumnavigate the world in an amphibious vehicle. Born in Northam, Western Australia, Carlin attended Guildford Grammar School in Perth, and later studied mining engineering at the Kalgoorlie School of Mines. After qualifying as an engineer, he worked on the Goldfields before in 1939 emigrating to China to work in a British coal mine. In the Second World War, Carlin was posted to the Indian Army Corps of Engineers, serving in India, Italy, and throughout the Middle East. After his discharge from service in 1946, he emigrated to the United States with his American wife, Elinore.
The Ford GPA 'Seep' was an amphibious version of the World War II Ford GPW jeep. Unlike the jeep, the seep was not a successful design; it was considered too slow and heavy on land, and lacked sufficient seagoing abilities in open water. The design features of the much larger and more successful DUKW amphibious truck were used on the GPA.
[Seminal essay explaining why the rollout of “broadband” home connections to replace 56k dialups had not improved regular WWW browsing as much as people expected: while broadband had greater throughput, it had similar (or worse) latency.
Because much of the wallclock time of any Internet connection is spent setting up and negotiating with the other end, and not that much is spent on the raw transfer of large numbers of bytes, the speedup is far smaller than one would expect by dividing the respective bandwidths.
Further, while bandwidth/throughput is easy to improve by adding more or higher-quality connections and can be patched elsewhere in the system by adding parallelism or upgrading parts or investing in data compression, the latency-afflicted steps are stubbornly serial, any time lost is physically impossible to retrieve, and many steps are inherently limited by the speed of light—more capacious connections quickly run into Amdahl’s law, where the difficult-to-improve serial latency-bound steps dominate the overall task. As Cheshire summarizes it:]
Fact One: Making more bandwidth is easy.
Fact Two: Once you have bad latency you’re stuck with it.
Fact Three: Current consumer devices have appallingly bad latency.
Fact Four: Making limited bandwidth go further is easy.
…That’s the problem with communications devices today. Manufacturers say “speed” when they mean “capacity”. The other problem is that as far as the end-user is concerned, the thing they want to do is transfer large files quicker. It may seem to make sense that a high-capacity slow link might be the best thing for the job. What the end-user doesn’t see is that in order to manage that file transfer, their computer is sending dozens of little control messages back and forth. The thing that makes computer communication different from television is interactivity, and interactivity depends on all those little back-and-forth messages.
The phrase “high-capacity slow link” that I used above probably looked very odd to you. Even to me it looks odd. We’ve been used to wrong thinking for so long that correct thinking looks odd now. How can a high-capacity link be a slow link? High-capacity means fast, right? It’s odd how that’s not true in other areas. If someone talks about a “high-capacity” oil tanker, do you immediately assume it’s a very fast ship? I doubt it. If someone talks about a “large-capacity” truck, do you immediately assume it’s faster than a small sports car?
We have to start making that distinction again in communications. When someone tells us that a modem has a speed of 28.8 kbit/sec we have to remember that 28.8 kbit/sec is its capacity, not its speed. Speed is a measure of distance divided by time, and ‘bits’ is not a measure of distance.
I don’t know how communications came to be this way. Everyone knows that when you buy a hard disk you should check what its seek time is. The maximum transfer rate is something you might also be concerned with, but the seek time is definitely more important. Why does no one think to ask what a modem’s ‘seek time’ is? The latency is exactly the same thing. It’s the minimum time between asking for a piece of data and getting it, just like the seek time of a disk, and it’s just as important.
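Cheshire’s distinction can be made concrete with a little arithmetic (the page size, round-trip count, and link speeds below are illustrative assumptions of mine, not figures from the essay):

```python
def transfer_time(payload_bytes, bandwidth_bps, rtt_s, round_trips):
    """Total wallclock time = serial setup round trips + raw transfer time.
    The round-trip term is latency-bound and does not shrink with bandwidth."""
    return round_trips * rtt_s + payload_bytes * 8 / bandwidth_bps

# A hypothetical 100 KB page needing 4 serial round trips at 100 ms each:
slow = transfer_time(100_000, 1_000_000, 0.100, 4)    # 1 Mbit/sec link
fast = transfer_time(100_000, 100_000_000, 0.100, 4)  # 100x the bandwidth
# slow = 0.4 + 0.8 = 1.2 s; fast = 0.4 + 0.008 ≈ 0.41 s:
# 100x the capacity buys only ~3x the speed, exactly Amdahl's law at work.
```

Because the 0.4 s of serial round trips cannot be bought away with capacity, it dominates as soon as the transfer term shrinks—which is why upgrading from dialup to broadband improved interactive browsing far less than the bandwidth ratio suggested.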
[He measures 21 keyboard latencies using a logic analyzer, finding a range of 15–60ms (!), representing a waste of a large fraction of the available ~100–200ms latency budget before a user notices and is irritated (“the median keyboard today adds as much latency as the entire end-to-end pipeline as a fast machine from the 70s.”). The latency estimates are surprising, and do not correlate with advertised traits. They simply have to be measured empirically.]
We can see that, even with the limited set of keyboards tested, there can be as much as a 45ms difference in latency between keyboards. Moreover, a modern computer with one of the slower keyboards attached can’t possibly be as responsive as a quick machine from the 70s or 80s because the keyboard alone is slower than the entire response pipeline of some older computers. That establishes the fact that modern keyboards contribute to the latency bloat we’ve seen over the past forty years…Most keyboards add enough latency to make the user experience noticeably worse, and keyboards that advertise speed aren’t necessarily faster. The two gaming keyboards we measured weren’t faster than non-gaming keyboards, and the fastest keyboard measured was a minimalist keyboard from Apple that’s marketed more on design than speed.
These graphs show the distribution of latencies for each terminal. The y-axis has the latency in milliseconds. The x-axis is the percentile (e.g., 50 represents the 50th-percentile keypress, i.e., the median keypress). Measurements are with macOS unless otherwise stated. The graph on the left is when the machine is idle, and the graph on the right is under load. If we just look at median latencies, some setups don’t look too bad—terminal.app and emacs-eshell are at roughly 5ms unloaded, small enough that many people wouldn’t notice. But most terminals (st, alacritty, hyper, and iterm2) are in the range where you might expect people to notice the additional latency even when the machine is idle. If we look at the tail when the machine is idle, say the 99.9%-ile latency, every terminal gets into the range where the additional latency ought to be perceptible, according to studies on user interaction. For reference, the internally generated keypress to GPU memory trip for some terminals is slower than the time it takes to send a packet from Boston to Seattle and back, about 70ms.
…Most terminals have enough latency that the user experience could be improved if the terminals concentrated more on latency and less on other features or other aspects of performance. However, when I search for terminal benchmarks, I find that terminal authors, if they benchmark anything, benchmark the speed of sinking stdout or memory usage at startup. This is unfortunate because most “low performance” terminals can already sink stdout many orders of magnitude faster than humans can keep up with, so further optimizing stdout throughput has a relatively small impact on actual user experience for most users. Likewise for reducing memory usage when an idle terminal uses 0.01% of the memory on my old and now quite low-end laptop. If you work on a terminal, perhaps consider relatively more latency and interactivity (e.g., responsiveness to ^C) optimization and relatively less throughput and idle memory usage optimization.
A couple years ago, I took a road trip from Wisconsin to Washington and mostly stayed in rural hotels on the way. I expected the internet in rural areas too sparse to have cable internet to be slow, but I was still surprised that a large fraction of the web was inaccessible. Some blogs with lightweight styling were readable, as were pages by academics who hadn’t updated the styling on their website since 1995. But very few commercial websites were usable (other than Google). When I measured my connection, I found that the bandwidth was roughly comparable to what I got with a 56k modem in the 90s. The latency and packet loss were significantly worse than the average day on dialup: latency varied between 500ms and 1000ms and packet loss varied between 1% and 10%. Those numbers are comparable to what I’d see on dialup on a bad day.
Despite my connection being only a bit worse than it was in the 90s, the vast majority of the web wouldn’t load…When Microsoft looked at actual measured connection speeds, they found that half of Americans don’t have broadband speed. Heck, AOL had 2 million dial-up subscribers in 2015, just AOL alone. Outside of the U.S., there are even more people with slow connections. I recently chatted with Ben Kuhn, who spends a fair amount of time in Africa, about his internet connection:
I’ve seen ping latencies as bad as ~45 sec and packet loss as bad as 50% on a mobile hotspot in the evenings from Jijiga, Ethiopia. (I’m here now and currently I have 150ms ping with no packet loss but it’s 10am). There are some periods of the day where it ~never gets better than 10 sec and ~10% loss. The internet has gotten a lot better in the past ~year; it used to be that bad all the time except in the early mornings.
…Let’s load some websites that programmers might frequent with a variety of simulated connections to get data on page load times…The timeout for tests was 6 minutes; anything slower than that is listed as FAIL. Pages that failed to load are also listed as FAIL. A few things that jump out from the table are:
A large fraction of the web is unusable on a bad connection. Even on a good (0% packet loss, no ping spike) dialup connection, some sites won’t load…If you were to look at the 90%-ile results, you’d see that most pages fail to load on dialup and the “Bad” and “😱” connections are hopeless for almost all sites.
Some sites will use a lot of data!
…The flaw in the “page weight doesn’t matter because average speed is fast” [claim] is that if you average the connection of someone in my apartment building (which is wired for 1Gbps internet) and someone on 56k dialup, you get an average speed of 500 Mbps. That doesn’t mean the person on dialup is actually going to be able to load a 5MB website. The average speed of 3.9 Mbps comes from a 2014 Akamai report, but it’s just an average. If you look at Akamai’s 2016 report, you can find entire countries where more than 90% of IP addresses are slower than that!…“Use bcrypt” has become the mantra for a reasonable default if you’re not sure what to do when storing passwords. The web would be a nicer place if “use webpagetest” caught on in the same way. It’s not always the best tool for the job, but it sure beats the current defaults.
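The arithmetic behind the averaging fallacy is worth checking; a quick sketch using the connection speeds from the quote:

```python
# Averaging a gigabit connection with 56k dialup gives a "fast" mean
# that says nothing about the dialup user's actual experience.
gbps = 1_000_000_000      # 1 Gbps apartment connection, in bits/second
dialup = 56_000           # 56k modem, in bits/second

mean_bps = (gbps + dialup) / 2
print(f"mean of the two connections: {mean_bps / 1e6:.0f} Mbps")  # ~500 Mbps

# Time for the dialup user to load a 5 MB page, ignoring latency and loss:
page_bits = 5 * 8_000_000
minutes = page_bits / dialup / 60
print(f"5 MB page over dialup: {minutes:.0f} minutes")  # ~12 minutes
```

The mean says “500 Mbps”, yet the dialup user waits roughly 12 minutes for a 5MB page even under ideal conditions, before the 500–1000ms latencies and 1–10% packet loss described above make things worse.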
[Google engineer recounts the results of heavily optimizing YouTube to make it usable in slow Third World countries: in an example of Jevons Paradox & Simpson’s Paradox, he found that despite making YouTube better for all users, average page load time got worse—because now Africans were actually able to use it.]
When we plotted the data geographically and compared it to our total numbers broken out by region, there was a disproportionate increase in traffic from places like Southeast Asia, South America, Africa, and even remote regions of Siberia. Further investigation revealed that, in these places, the average page load time under [the optimized version] Feather was over 2 minutes! This meant that a regular video page, at over a megabyte, was taking more than 20 minutes to load! This was the penalty incurred before the video stream even had a chance to show the first frame. Correspondingly, entire populations of people simply could not use YouTube because it took too long to see anything. Under Feather, despite it taking over 2 minutes to get to the first frame of video, watching a video actually became a real possibility. Over the week, word of Feather had spread in these areas and our numbers were completely skewed as a result. Large numbers of people who were previously unable to use YouTube before were suddenly able to.
Through Feather, I learned a valuable lesson about the state of the Internet throughout the rest of the world. Many of us are fortunate to live in high bandwidth regions, but there are still large portions of the world that do not. By keeping your client side code small and lightweight, you can literally open your product up to new markets.
I’ve had this nagging feeling that the computers I use today feel slower than the computers I used as a kid. As a rule, I don’t trust this kind of feeling because human perception has been shown to be unreliable in empirical studies, so I carried around a high-speed camera and measured the response latency of devices I’ve run into in the past few months. These are tests of the latency between a keypress and the display of a character in a terminal (see appendix for more details)…If we look at overall results, the fastest machines are ancient. Newer machines are all over the place. Fancy gaming rigs with unusually high refresh-rate displays are almost competitive with machines from the late 70s and early 80s, but “normal” modern computers can’t compete with thirty to forty year old machines.
…Almost every computer and mobile device that people buy today is slower than common models of computers from the 70s and 80s. Low-latency gaming desktops and the iPad Pro can get into the same range as quick machines from thirty to forty years ago, but most off-the-shelf devices aren’t even close.
If we had to pick one root cause of latency bloat, we might say that it’s because of “complexity”. Of course, we all know that complexity is bad. If you’ve been to a non-academic non-enterprise tech conference in the past decade, there’s a good chance that there was at least one talk on how complexity is the root of all evil and we should aspire to reduce complexity.
Unfortunately, it’s a lot harder to remove complexity than to give a talk saying that we should remove complexity. A lot of the complexity buys us something, either directly or indirectly. When we looked at the input of a fancy modern keyboard vs. the Apple 2 keyboard, we saw that using a relatively powerful and expensive general purpose processor to handle keyboard inputs can be slower than dedicated logic for the keyboard, which would both be simpler and cheaper. However, using the processor gives people the ability to easily customize the keyboard, and also pushes the problem of “programming” the keyboard from hardware into software, which reduces the cost of making the keyboard. The more expensive chip increases the manufacturing cost, but considering how much of the cost of these small-batch artisanal keyboards is the design cost, it seems like a net win to trade manufacturing cost for ease of programming.
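What the keyboard controller is doing can be sketched in a few lines; a toy matrix-scan loop in Python (the 4×4 matrix and the scan logic are a pure simulation, not any real keyboard’s firmware):

```python
# Simulated 4x4 key matrix: True = key switch closed. Real firmware
# drives each row line in turn and reads the column pins; here the
# "hardware" is just a 2D array.
matrix = [[False] * 4 for _ in range(4)]
matrix[1][2] = True  # press the key at row 1, column 2

def scan(matrix):
    """One full scan pass: return (row, col) of every pressed key."""
    pressed = []
    for r, row in enumerate(matrix):       # drive one row at a time
        for c, closed in enumerate(row):   # read each column pin
            if closed:
                pressed.append((r, c))
    return pressed

print(scan(matrix))  # [(1, 2)]
```

A matrix scanned at, say, 100Hz adds up to 10ms before the controller even sees the keypress; dedicated logic can scan continuously, while a general-purpose processor running this loop in software trades some of that latency for the programmability the paragraph describes.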
You spend lots of time waiting on your computer. You pause while apps start and web pages load. Spinner icons are everywhere. Hardware gets faster, but software still feels slow. What gives? If you use your computer to do important work, you deserve fast software. Too much of today’s software falls short. At the Ink & Switch research lab we’ve researched why that is, so that we can do better. This article shares what we’ve learned…Let’s look at an example of how latency can add up:
…There is a deep stack of technology that makes a modern computer interface respond to a user’s requests. Even something as simple as pressing a key on a keyboard and having the corresponding character appear in a text input box traverses a lengthy, complex gauntlet of steps, from the scan rate of the keyboard, through the OS and framework processing layers, through the graphics card rendering and display refresh rate. There is reason for this complexity, and yet we feel sad that computer users trying to be productive with these devices are so often left waiting, watching spinners, or even just with the slight but still perceptible sense that their devices simply can’t keep up with them.
In this article I examine human and machine aspects of typing latency (“typing lag”) and present experimental data on latency of popular text / code editors. The article is inspired by my work on implementing “zero-latency typing” in IntelliJ IDEA.
…To measure processing delays experimentally I created Typometer—a tool to determine and analyze visual latency of text editors (sources). Typometer works by generating OS input events and using screen capture to measure the delay between a keystroke and a corresponding screen update. Hence, the measurement encompasses all the constituents of processing latency (i.e. OS queue, VM, editor, GPU pipeline, buffering, window manager and possible V-Sync). That is the right thing to do, because all those components are inherently intertwined with the editor, and in principle, the editor application has influence on all the parts…[He tested 9] Editors: Atom 1.1 / Eclipse 4.5.1 / Emacs 24.5.1 / Gedit 3.10.4 / GVim 7.4.712 / IntelliJ Idea CE 15.0 / Netbeans 8.1 / Notepad++ 6.8.4 / Sublime Text 3083.
Apparently, editors are not created equal (at least, from the standpoint of latency).
Cloud apps like Google Docs and Trello are popular because they enable real-time collaboration with colleagues, and they make it easy for us to access our work from all of our devices. However, by centralizing data storage on servers, cloud apps also take away ownership and agency from users. If a service shuts down, the software stops functioning, and data created with that software is lost.
In this article we propose “local-first software”: a set of principles for software that enables both collaboration and ownership for users. Local-first ideals include the ability to work offline and collaborate across multiple devices, while also improving the security, privacy, long-term preservation, and user control of data.
We survey existing approaches to data storage and sharing, ranging from email attachments to web apps to Firebase-backed mobile apps, and we examine the trade-offs of each. We look at Conflict-free Replicated Data Types (CRDTs): data structures that are multi-user from the ground up while also being fundamentally local and private. CRDTs have the potential to be a foundational technology for realizing local-first software.
We share some of our findings from developing local-first software prototypes at Ink & Switch over the course of several years. These experiments test the viability of CRDTs in practice, and explore the user interface challenges for this new data model. Lastly, we suggest some next steps for moving towards local-first software: for researchers, for app developers, and a startup opportunity for entrepreneurs.
…in the cloud, ownership of data is vested in the servers, not the users, and so we became borrowers of our own data. The documents created in cloud apps are destined to disappear when the creators of those services cease to maintain them. Cloud services defy long-term preservation. No Wayback Machine can restore a sunsetted web application. The Internet Archive cannot preserve your Google Docs.
In this article we explored a new way forward for software of the future. We have shown that it is possible for users to retain ownership and control of their data, while also benefiting from the features we associate with the cloud: seamless collaboration and access from anywhere. It is possible to get the best of both worlds.
But more work is needed to realize the local-first approach in practice. Application developers can take incremental steps, such as improving offline support and making better use of on-device storage. Researchers can continue improving the algorithms, programming models, and user interfaces for local-first software. Entrepreneurs can develop foundational technologies such as CRDTs and peer-to-peer networking into mature products able to power the next generation of applications.
In distributed computing, a conflict-free replicated data type (CRDT) is a data structure which can be replicated across multiple computers in a network, where the replicas can be updated independently and concurrently without coordination between the replicas, and where it is always mathematically possible to resolve inconsistencies that might come up.
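The “updated independently and merged without coordination” property is concrete even in the simplest CRDT; a minimal grow-only counter (G-Counter) sketch in Python (an illustration of the merge rule, not code from the local-first paper):

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot;
    merge takes the element-wise maximum, so merges commute, are
    associative, and are idempotent--replicas always converge."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> increments observed from it

    def increment(self):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

# Two replicas update concurrently while offline, then sync in either order:
a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
print(a.value(), b.value())  # 3 3
```

Because merging is just a per-slot maximum, it never matters how many times, or in what order, replicas exchange state, which is exactly the property that lets local-first software work offline and reconcile later.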
Short technology essay based on Myer & Sutherland 1968 (!) discussing a perennial pattern in computing history dubbed the ‘Wheel of Reincarnation’ for how old approaches inevitably reincarnate as the exciting new thing: shifts between ‘local’ and ‘remote’ computing resources, which are exemplified by repeated cycles in graphical display technologies from dumb ‘terminals’ which display only raw pixels to smart devices which interpret more complicated inputs like text or vectors or programming languages (eg PostScript). These cycles are driven by cost, latency, architectural simplicity, and available computing power.
The Wheel of Reincarnation paradigm has played out for computers as well, in shifts from local terminals attached to mainframes to PCs to smartphones to ‘cloud computing’.
The flexibility and power needed in the channel for a computer display are considered. To work efficiently, such a channel must have a sufficient number of instructions that it is best understood as a small processor rather than a powerful channel. It was found that successive improvements to the display processor design lie on a circular path: by making improvements one can return to the original simple design plus one new general purpose computer for each trip around. The degree of physical separation between display and parent computer is a key factor in display processor design. [Keywords: display processor design, display system, computer graphics, graphic terminal, displays, graphics, display generator, display channel, display programming, graphical interaction, remote displays, Wheel of Reincarnation]
Why are there so many places for backdoors and weird machines in your “computer”? Because your computer is in fact scores or hundreds, perhaps even thousands, of computer chips, many of which are explicitly or implicitly capable of Turing-complete computations (many more powerful than desktops of bygone eras), working together to create the illusion of a single computer. Backdoors, bugs, weird machines, and security do not care about what you think—only where resources can be found and orchestrated into a computation.
Philip II was King of Spain (1556–1598), King of Portugal, King of Naples and Sicily, and jure uxoris King of England and Ireland. He was also Duke of Milan from 1540. From 1555 he was Lord of the Seventeen Provinces of the Netherlands.
The Familia Caritatis, also known as the Familists, was a mystical religious sect founded in the sixteenth century by Henry Nicholis, also known as Niclaes. Familia Caritatis translates from Latin into “Family of Love”, and in other languages, “Hus der Lieften”, “Huis der Liefde” and “Haus der Liebe”.
On June 4th, a group of lawyers shuffled into a federal court in Manhattan to argue over two trademark registrations. The day’s hearing was the culmination of months of internet drama—furious blog posts, Twitter hashtags, YouTube videos, claims of doxxing, and death threats….They were gathered there that day because one self-published romance author was suing another for using the word “cocky” in her titles. And as absurd as this courtroom scene was—with a federal judge soberly examining the shirtless doctors on the cover of an “MFM Menage Romance”—it didn’t even begin to scratch the surface.
The fight over #Cockygate, as it was branded online, emerged from the strange universe of Amazon Kindle Unlimited, where authors collaborate and compete to game Amazon’s algorithm. Trademark trolling is just the beginning: There are private chat groups, ebook exploits, conspiracies to seed hyper-specific trends like “Navy SEALs” and “mountain men,” and even a controversial sweepstakes in which a popular self-published author offered his readers a chance to win diamonds from Tiffany’s if they reviewed his new book…A genre that mostly features shiny, shirtless men on its covers and sells ebooks for 99¢ a pop might seem unserious. But at stake are revenues sometimes amounting to a million dollars a year, with some authors easily netting six figures a month. The top authors can drop $50,000 on a single ad campaign that will keep them in the charts—and see a worthwhile return on that investment.
…According to Willink, over the course of RWA, Valderrama told her about certain marketing and sales strategies, which she claimed to handle for other authors. Valderrama allegedly said that she organized newsletter swaps, in which authors would promote each other’s books to their respective mailing lists. She also claimed to manage review teams—groups of assigned readers who were expected to leave reviews for books online. According to Willink, Valderrama’s authors often bought each other’s books to improve their ranking on the charts—something that she arranged, coordinating payments through her own PayPal account. Valderrama also told her that she used multiple email addresses to buy authors’ books on iBooks when they were trying to hit the USA Today list. When Valderrama invited Willink to a private chat group of romance authors, Willink learned that practices like chart gaming and newsletter placement selling—and much more—were surprisingly common.
…In yet more screencaps, members discuss the mechanics of “book stuffing.” Book stuffing is a term that encompasses a wide range of methods for taking advantage of the Kindle Unlimited revenue structure. In Kindle Unlimited, readers pay $9.99 a month to read as many books as they want that are available through the KU program. This includes both popular mainstream titles like the Harry Potter series and self-published romances put out by authors like Crescent and Hopkins. Authors are paid according to pages read, creating incentives to produce massively inflated and strangely structured books. The more pages Amazon thinks have been read, the more money an author receives.
…Book stuffing is particularly controversial because Amazon pays authors from a single communal pot. In other words, Kindle Unlimited is a zero-sum game. The more one author gets from Kindle Unlimited, the less the other authors get. The romance authors Willink was discovering didn’t go in for clumsy stuffings of automatic translations or HTML cruft; rather, they stuffed their books with ghostwritten content or repackaged, previously published material. In the latter case, the author will bait readers with promises of fresh content, like a new novella, at the end of the book. Every time a reader reads to the end of a 3,000-page book, the author earns almost 14 dollars. For titles that break into the top of the Kindle Unlimited charts, this trick can generate a fortune.
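The economics behind book stuffing are simple arithmetic; a quick sketch (the per-page rate is backed out from the quote’s numbers, not an official Amazon figure, and the 10,000-reads title is hypothetical):

```python
# Kindle Unlimited pays per page read out of a communal pot. Backing out
# the per-page rate implied by the quote (inferred, not an official figure):
pages_per_book = 3_000
payout_per_full_read = 13.80          # "almost 14 dollars" per read-through

rate = payout_per_full_read / pages_per_book
print(f"implied rate: ${rate:.4f} per page read")

# A hypothetical stuffed title read to the end 10,000 times in a month:
monthly = 10_000 * payout_per_full_read
print(f"monthly revenue: ${monthly:,.0f}")
```

That hypothetical lands squarely in the “six figures a month” range the article describes for top authors, which is why a 3,000-page stuffed book is worth the trouble.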
[Paper/suicide note by a philosophy graduate who went on a motorcycle tour of Mexico and ran into a goat, instantly becoming a paraplegic. Atreus discusses how paraplegia robs him of the ability to do almost everything he valued in life, from running to motorcycling to sex, while burdening him with dead weight equivalent to hundreds of pounds, which makes the simplest action, like getting out of a car, take minutes or hours, radically shortening his effective days. He is an ambulatory corpse, “two arms and a head”. Atreus discusses in detail the existential horror of his condition, from complete lack of bowel control requiring him to constantly dig his own feces out of his anus to being trapped in a wheelchair larger than a washing machine to the cruelty of well-intentioned encouragement to social alienation and his constant agonized awareness of everything he has lost. If the first question of philosophy is whether to commit suicide, Atreus finds that for him, the answer is “yes”. The paper/book concludes with his description of stabbing himself and slowly bleeding to death.]
This book is born of pain. I wrote it out of compulsion during the most hellish time of my life. Writing it hurt me and was at times extremely unpleasant. Is the book my death-rattle or the sound of me screaming inside of my cage? Does its tone tell you I am angry or merely seeking a psychological expedient against the madness I see around me? The book is my creation but is also in many ways foreign to me for I am living in a foreign land. Most generally perhaps it is just the thoughts that passed through my head over the twenty months I spent moving toward death. I am certainly not a man who is at peace with his life, but on the contrary I despise it as I have never before despised anything. Who can sort it all out? Being imprisoned in the nightmarish cage of paraplegia has done all manner of violence to the deepest parts of me. Still, I have not gone mad. I am no literary genius and don’t expect everything I say to be understood, but if you would like to know what my experiences have been like, and what I am like, I will try my best to show you.
What do I think of this book? I have no affection for it. I find it odious and unattractive and am very saddened that I wrote it. But it is what I had to say. It took on a life of its own and when I now step back and look at what I created I regard it with distaste. If I could, I would put all of these horrible thoughts in a box, seal it forever, then go out and live life. I would run in the sun, enjoy my freedom, and revel in myself. But that’s the point. I cannot go out and live life because this is not life. So instead I speak to you from the place I now occupy, between life and death.
…Imagine a man cut off a few inches below the armpits. Neglect for a moment questions concerning how he eliminates waste and so forth, and just assume that the site of the “amputation” is, to borrow from Gogol, “as uniform as a newly fried pancake”. This man would be vastly, immensely better off than me. If you don’t know who Johnny Eck is, he had a role in the 1932 movie Freaks. He was the guy who was essentially a torso with arms. He walked on his hands. How fortunate he was compared to me may not register right away, because the illusion I mentioned above would probably make you find Johnny Eck’s condition far more shocking than mine. But the truth is that mine is much more horrible than his, barring whatever social “advantages” the illusion of being whole might confer on me. The other day I saw a picture of a woman missing both legs. They were cut off mid-thigh. I thought that if only I was like her perhaps my life would be bearable. She was, in my opinion, better off than the pancake man, who is beyond any doubt far better off than me. One man said to me, “At least you didn’t lose your legs.” No, I did lose my legs, and my penis, and my pelvis. Let’s get something very clear about the difference between paraplegics and double-leg amputees. If tomorrow every paraplegic woke up as a double-leg amputee, the Earth itself would quiver with ecstasy from the collective bursting forth of joyous emotion. Tears of the most exquisitely overwhelming relief and happiness would stream down the cheeks of former paraplegics the world over. My wording here is deliberate. It’s no exaggeration. Losing both legs is bad, but paraplegia is ghoulishly, nightmarishly worse.
Part of what I wanted in desiring to die in the company of those I loved was to reassure them and perhaps give them courage to face death well. That was something I really wanted to give to them and I’m sorry I can only do it with these words. I was driven almost mad by all of the things many other people said about paraplegia, suicide, and what was still possible in my condition. I hope everyone understands how all of that affected the tone of what I wrote. I was so frustrated with all of it, I thought it was so insane. But I only wanted to break free of it all and say what I felt. I felt like it stifled me so horribly.
I cut some more and the blood is flowing well again. I’m surprised how long it is taking me to even feel anything. I thought I was dizzy but I’m not sure I am now. It’s 8:51 pm. I thought I would get cold but I’m not cold either, I’m actually hot but that’s probably the two sweaters. Starting to feel a little badly. Sweating, a little light-headed.
I’m going to go now, done writing. Goodbye everyone.
…“There’s so much so in sorrow,” he said at one point. “Let me down from here,” he said at another. “I’ve lost my modality.” To the surprise of his family members, the lifelong atheist also began hallucinating angels and complaining about the crowded room—even though no one was there.
Felix’s 53-year-old daughter, Lisa Smartt, kept track of his utterances, writing them down as she sat at his bedside in those final days. Smartt majored in linguistics at UC Berkeley in the 1980s and built a career teaching adults to read and write. Transcribing Felix’s ramblings was a sort of coping mechanism for her, she says…eventually she wrote a book, Words on the Threshold, published in early 2017, about the linguistic patterns in 2,000 utterances from 181 dying people, including her father. Despite the limitations of this book, it’s unique—it’s the only published work I could find when I tried to satisfy my curiosity about how people really talk when they die.
…Many people die in such silence, particularly if they have advanced dementia or Alzheimer’s that robbed them of language years earlier. For those who do speak, it seems their vernacular is often banal. From a doctor I heard that people often say, “Oh fuck, oh fuck.” Often it’s the names of wives, husbands, children. “A nurse from the hospice told me that the last words of dying men often resembled each other,” wrote Hajo Schumacher in a September essay in Der Spiegel. “Almost everyone is calling for ‘Mommy’ or ‘Mama’ with the last breath.”…Delirium is so frequent then, wrote the New Zealand psychiatrist Sandy McLeod, that “it may even be regarded as exceptional for patients to remain mentally clear throughout the final stages of malignant illness.” About half of people who recover from postoperative delirium recall the disorienting, fearful experience.
…He also repeated words and phrases, often ones that made no sense. “The green dimension! The green dimension!” (Repetition is common in the speech of people with dementia and also those who are delirious.) Smartt found that repetitions often expressed themes such as gratitude and resistance to death. But there were also unexpected motifs, such as circles, numbers, and motion. “I’ve got to get off, get off! Off of this life,” Felix had said…In Final Gifts, the hospice nurses Callanan and Kelley note that “the dying often use the metaphor of travel to alert those around them that it is time for them to die.” They quote a 17-year-old, dying of cancer, distraught because she can’t find the map. “If I could find the map, I could go home! Where’s the map? I want to go home!”
[Essay by psychiatrist about care of the dying in American healthcare: people die agonizing, slow, expensive deaths, prolonged by modern healthcare, deprived of all dignity and joy by disease and decay. There is little noble about it.]
You will become bedridden, unable to walk or even to turn yourself over. You will become completely dependent on nurse assistants to intermittently shift your position to avoid pressure ulcers. When they inevitably slip up, your skin develops huge incurable sores that can sometimes erode all the way to the bone, and which are perpetually infected with foul-smelling bacteria. Your limbs will become practically vestigial organs, like the appendix, and when your vascular disease gets too bad, one or more will be amputated, sacrifices to save the host. Urinary and fecal continence disappear somewhere in the process, so you’re either connected to catheters or else spend a while every day lying in a puddle of your own wastes until the nurses can help you out….
Somewhere in the process your mind very quietly and without fanfare gives up the ghost. It starts with forgetting a couple of little things, and progresses…They don’t remember their own names, they don’t know where they are or what they’re doing there, and they think it’s the 1930s or the 1950s or don’t even have a concept of years at all. When you’re alert and oriented “x0”, the world becomes this terrifying place where you are stuck in some kind of bed and can’t move and people are sticking you with very large needles and forcing tubes down your throat and you have no idea why or what’s going on.
So of course you start screaming and trying to attack people and trying to pull the tubes and IV lines out. Every morning when I come in to work I have to check the nurses’ notes for what happened the previous night, and every morning a couple of my patients have tried to pull all of their tubes and lines out. If it’s especially bad they try to attack the staff, and although the extremely elderly are really bad at attacking people this is nevertheless Unacceptable Behavior and they have to be restrained ie tied down to the bed. A presumably more humane alternative sometimes used instead or in addition is to just drug you up on all of those old-timey psychiatric medications that actual psychiatrists don’t use anymore because of their bad reputation…Nevertheless, this is the way many of my patients die. Old, limbless, bedridden, ulcerated, in a puddle of waste, gasping for breath, loopy on morphine, hopelessly demented, in a sterile hospital room with someone from a volunteer program who just met them sitting by their bed.
…I work in a Catholic hospital. People here say the phrase “culture of life” a lot, as in “we need to cultivate a culture of life.” They say it almost as often as they say “patient-centered”. At my hospital orientation, a whole bunch of nuns and executives and people like that got up and told us how we had to do our part to “cultivate a culture of life.”
And now every time I hear that phrase I want to scream. 21st century American hospitals do not need to “cultivate a culture of life”. We have enough life. We have life up the wazoo. We have more life than we know what to do with. We have life far beyond the point where it becomes a sick caricature of itself. We prolong life until it becomes a sickness, an abomination, a miserable and pathetic flight from death that saps out and mocks everything that made life desirable in the first place. 21st century American hospitals need to cultivate a culture of life the same way that Newcastle needs to cultivate a culture of coal, the same way a man who is burning to death needs to cultivate a culture of fire.
And so every time I hear that phrase I want to scream, or if I cannot scream, to find some book of hospital poetry that really is a book of hospital poetry and shove it at them, make them read it until they understand. There is no such book, so I hope it will be acceptable if I just rip off of Wilfred Owen directly:
If in some smothering dreams you too could pace
Behind the gurney that we flung him in,
And watch the white eyes writhing in his face,
His hanging face, like a devil’s sack of sin;
If you could hear, at every jolt, the blood
Come gargling from the froth-corrupted lungs,
Obscene with cancer, bitter with the cud
Of vile, incurable sores on innocent tongues
My friend, you would not so pontificate
To reasoners beset by moral strife
The old lie: we must try to cultivate
A culture of life.
Maria Wisława Anna Szymborska was a Polish poet, essayist, translator and recipient of the 1996 Nobel Prize in Literature. Born in Prowent, which has since become part of Kórnik, she later resided in Kraków until the end of her life. In Poland, Szymborska's books have reached sales rivaling prominent prose authors', though she wrote in a poem, "Some Like Poetry", that "perhaps" two in a thousand people like poetry.
The kingfisher rises out of the black wave like a blue flower, in his beak he carries a silver leaf. I think this is the prettiest world—so long as you don’t mind a little dying, how could there be a day in your whole life that doesn’t have its splash of happiness? There are more fish than there are leaves on a thousand trees, and anyway the kingfisher wasn’t born to think about it, or anything else. When the wave snaps shut over his blue head, the water remains water—hunger is the only story he has ever heard in his life that he could believe. I don’t say he’s right. Neither do I say he’s wrong. Religiously he swallows the silver leaf with its broken red river, and with a rough and easy cry I couldn’t rouse out of my thoughtful body if my life depended on it, he swings back over the bright sea to do the same thing, to do it (as I long to do something, anything) perfectly.
Let us not talk philosophy, drop it, Jeanne. So many words, so much paper, who can stand it. I told you the truth about my distancing myself. I’ve stopped worrying about my misshapen life. It was no better and no worse than the usual human tragedies.
For over thirty years we have been waging our dispute As we do now, on the island under the skies of the tropics. We flee a downpour, in an instant the bright sun again, And I grow dumb, dazzled by the emerald essence of the leaves.
We submerge in foam at the line of the surf, We swim far, to where the horizon is a tangle of banana bush, With little windmills of palms. And I am under accusation: That I am not up to my oeuvre, That I do not demand enough from myself, As I could have learned from Karl Jaspers, That my scorn for the opinions of this age grows slack.
I roll on a wave and look at white clouds.
You are right, Jeanne, I don’t know how to care about the salvation of my soul. Some are called, others manage as well as they can. I accept it, what has befallen me is just. I don’t pretend to the dignity of a wise old age. Untranslatable into words, I chose my home in what is now, In things of this world, which exist and, for that reason, delight us: Nakedness of women on the beach, coppery cones of their breasts, Hibiscus, alamanda, a red lily, devouring With my eyes, lips, tongue, the guava juice, the juice of la prune de Cythère, Rum with ice and syrup, lianas-orchids In a rain forest, where trees stand on the stilts of their roots.
Death, you say, mine and yours, closer and closer, We suffered and this poor earth was not enough. The purple-black earth of vegetable gardens Will be here, either looked at or not. The sea, as today, will breathe from its depths. Growing small, I disappear in the immense, more and more free.
Czesław Miłosz was a Polish-American poet, prose writer, translator, and diplomat. Regarded as one of the great poets of the 20th century, he won the 1980 Nobel Prize in Literature. In its citation, the Swedish Academy called Miłosz a writer who "voices man's exposed condition in a world of severe conflicts".
"I made a series of mistakes that culminated in the worst sailing accident of my life, and almost took me to the bottom of the ocean."
[One fall evening after work, Marlinspike and a friend made a simple plan to sail a 15-foot catamaran out 600 feet into the San Francisco Bay, where they’d drop anchor and row back in a smaller boat, leaving the sailboat to wait for their next adventure. (Anarchist sailors don’t like to pay dockage fees.) Marlinspike headed out into the bay on the catamaran with his friend following in a rowboat. Only after Marlinspike had passed the pier did he realize the wind was blowing at a treacherous 30 miles an hour. He decided to turn back but discovered that he’d misrigged the craft and had to fix his mistake. As the sun sank toward the horizon, he shouted to his friend that they should give up and return to shore, and the friend rowed back to safety.
Then, without warning, the wind gusted. The catamaran flipped, throwing Marlinspike into the ice-cold water. “The suddenness of it was unbelievable, as if I was on a tiny model made of paper which someone had simply flicked with their finger,” he would later write in a blog post about the experience. Soon the boat was fully upside down, pinned in place by the wind. Marlinspike tried to swim for shore. But the pier was too far away, the waves too strong, and he could feel his body succumbing to hypothermia, blackness creeping into the edges of his vision. He headed back to the overturned boat. Alone now in the dark, he clung to the hull, took stock of the last hour’s events, and realized, with slow and lonely certainty, that he was very likely going to die.
When a tugboat finally chanced upon his soaked and frozen form he was nearly unconscious and had to be towed up with a rope. When he arrived at the hospital, Marlinspike says, the nurses told him his temperature was so low their digital thermometers couldn’t register it. As he recovered over the next days, he had the sort of realization that sometimes results from a near-death experience. “It definitely sharpened my focus,” he says of the incident. “It made me question what I was doing with my life.”
Marlinspike’s time at Twitter had given him an ambitious sense of scale: He was determined to encrypt core chunks of the Internet. A normal person might have quit sailing. Instead, Marlinspike quit Twitter. A year and a day after he had started, he walked away from over $1 million in company stock.]
Matthew Rosenfeld, known as Moxie Marlinspike, is an American entrepreneur, cryptographer, and computer security researcher. Marlinspike is the creator of Signal, co-founder of the Signal Foundation, and currently serves as the CEO of Signal Messenger. He is also a co-author of the Signal Protocol encryption used by Signal, WhatsApp, Facebook Messenger, and Skype, responsible for the largest deployment of consumer end-to-end encryption.
When a white horse is not a horse is a paradox in Chinese philosophy. Around 300 BC, Gongsun Long wrote this dialectic analysis of the question "Can one legitimately assert 'white horse is not horse'?", in a work now named for him, Gongsun Longzi, in a segment called the "White Horse Dialog".
The Treachery of Images is a 1929 painting by surrealist painter René Magritte. It is also known as This is Not a Pipe and The Wind and the Song. Magritte painted it when he was 30 years old. It is on display at the Los Angeles County Museum of Art.
Visual Explanations: Images and Quantities, Evidence and Narrative [Tufte #3] is about pictures of verbs, the representation of mechanism and motion, process and dynamics, causes and effects, explanation and narrative. Practical applications and examples include statistical graphics, charts for making important decisions in engineering and medicine, technical manuals, diagrams, design of computer interfaces and websites and on-line manuals, animations and scientific visualizations, techniques for talks, and design strategies for enhancing the rate of information transfer in print, presentations, and computer screens. The use of visual evidence in deciding to launch the space shuttle Challenger is discussed in careful detail. Video snapshots show redesigns of a supercomputer animation of a thunderstorm. The book is designed and printed to the highest standards, with luscious color throughout and four built-in flaps for showing motion and before/after effects.
Dating back to medieval manuscripts, text has often been highlighted using a particular distinct bright red. The contrast of black and red on a white background is highly visible and striking, and this has been reused many times, in a way which I have not noticed for other colors. I call these uses rubrication and collate examples I have noticed from many time periods. This design pattern does not seem to have a widely-accepted name or be commonly discussed, so I propose extending the term “rubrication” to all instances of this pattern, not merely religious texts.
Why this rubrication design pattern? Why red, specifically, and not, say, orange or purple? Is it just a historical accident? Cross-cultural research suggests that for humans, red may be intrinsically more noticeable & has a higher contrast with black, explaining its perennial appeal as a design pattern.
Regardless, it is a beautiful design pattern which has been used in many interesting ways over the millennia, and perhaps may inspire the reader.
The Elements of Typographic Style is the authoritative book on typography and style by Canadian typographer, poet and translator Robert Bringhurst. Originally published in 1992 by Hartley & Marks Publishers, it was revised in 1996, 2001 (v2.4), 2002 (v2.5), 2004 (v3.0), 2005 (v3.1), 2008 (v3.2), and 2012 (v4.0). A history and guide to typography, it has been praised by Hermann Zapf, who said “I wish to see this book become the Typographers’ Bible.” Jonathan Hoefler and Tobias Frere-Jones consider it "the finest book ever written about typography," according to the FAQ section of their type foundry's website. Because of its status as a respected and frequently cited resource, typographers and designers often refer to it simply as Bringhurst.
Robert Bringhurst is a Canadian poet, typographer and author. He has translated substantial works from Haida and Navajo and from classical Greek and Arabic. He wrote The Elements of Typographic Style, a reference book of typefaces, glyphs and the visual and geometric arrangement of type. He was named an Officer of the Order of Canada in June 2013.
Sidenotes/margin notes are a typographic convention which improves on footnotes & endnotes by instead putting the notes in the page margin to let the reader instantly read them without needing to refer back and forth to the end of the document (endnotes) or successive pages (footnotes spilling over).
They are particularly useful for web pages, where ‘footnotes’ are de facto endnotes, and clicking back and forth to endnotes is a pain for readers. (Footnote variants, like “floating footnotes” which pop up on mouse hover, reduce the reader’s effort but don’t eliminate it.)
However, they are not commonly used, perhaps because web browsers until relatively recently made it hard to implement sidenotes easily & reliably. Tufte-CSS has popularized the idea and since then, there has been a proliferation of slightly variant approaches. I review some of the available implementations.
Within the context of a general discussion of the unintended effects of scientists on the results of their research, this work reported on the growing evidence that the hypothesis of the behavioral scientist could come to serve as self-fulfilling prophecy, by means of subtle processes of communication between the experimenter and the human or animal research subject. [The Science Citation Index (SCI) and the Social Sciences Citation Index (SSCI) indicate that the book has been cited over 740 times since 1966 [as of 1979].] —“Citation Classic”
[Enlarged Edition, expanded with discussion of the Pygmalion effect etc: ISBN 0-470-01391-5]
Review of a major and widely-cited psychology monograph purporting to demonstrate pervasive and powerful effects of social expectations, settings, and the general environment on all aspects of human psychology, experimentation, and research, even to the point of the 'Pygmalion effect' proving that teacher expectations can boost student IQs by hundreds of points. The Pygmalion effect was based on impossible data, was defended by statistical malpractice, and repeatedly failed to replicate; this exemplifies the problem with Rosenthal's research and the book as a whole. Despite its appearance of extreme rigor and concern for bias, the results actually exemplify the Replication Crisis: almost none of his research is reliable, and it is bogus from beginning to end, designed to serve ideological goals despite the intrinsic absurdity of the claims and their inconsistency with basic observations of the stability, consistency, and predictive power of individual differences and the impotence of environmental interventions.
Princess Shikishi was a Japanese classical poet, who lived during the late Heian and early Kamakura periods. She was the third daughter of Emperor Go-Shirakawa. In 1159, Shikishi, who did not marry, went into service at the Kamo Shrine in Kyoto. She left the shrine after some time, and in her later years became a Buddhist nun.
[Review of translation of complete corpus of imperial court poet Princess Shikishi (1149–1201). While well-annotated, Sato's decision to translate each poem in a single line drains it of any enjoyability, turning it into a prose-like slog.]
Rurouni Kenshin is a 2012 Japanese period action-adventure film based on the manga of the same name written and illustrated by Nobuhiro Watsuki. Directed by Keishi Ōtomo, the film stars Takeru Satoh and Emi Takei. It focuses on fictional events that take place during the early Meiji period in Japan, telling the story of a wanderer named Himura Kenshin, formerly known as the assassin Hitokiri Battōsai. After participating in the Bakumatsu war, Kenshin wanders the countryside of Japan offering protection and aid to those in need as atonement for the murders he once committed.
Rurouni Kenshin: Kyoto Inferno is a 2014 Japanese film directed by Keishi Ōtomo and based on the manga series Rurouni Kenshin. It is the first of two sequels to the 2012 live-action Rurouni Kenshin film, and was followed by The Legend Ends released later the same year.
Rurouni Kenshin: The Legend Ends is a 2014 Japanese jidaigeki action film directed by Keishi Ōtomo and based on the manga series Rurouni Kenshin. The story follows two prior films, Rurouni Kenshin (2012) and Rurouni Kenshin: Kyoto Inferno (2014).
The Magic Flute, K. 620, is an opera in two acts by Wolfgang Amadeus Mozart to a German libretto by Emanuel Schikaneder. The work is in the form of a Singspiel, a popular form during the time it was written that included both singing and spoken dialogue. The work premiered on 30 September 1791 at Schikaneder's theatre, the Freihaus-Theater auf der Wieden in Vienna, just two months before the composer's premature death.
Rebroadcast of abridged 2006 performance (which demonstrates how much Met HD broadcasts have improved technically over the past decade). Gorgeous nonsense. With excellent music by Mozart, lyrics unusually in English, and eccentrically colorful costumes/sets, but mostly unconvincing characters (not helped by abridgement), and a plot stuffed full of Masonic symbols but lacking any sense.
My Little Pony: Friendship Is Magic is a Canadian-American animated fantasy television series based on Hasbro's My Little Pony line of toys and animated works and is often referred to by collectors as the fourth generation of the franchise. The series aired on The Hub from October 10, 2010 to October 12, 2019. Hasbro selected animator Lauren Faust as the creative director and executive producer for the show. Faust sought to challenge the established nature of the existing My Little Pony line, creating more in-depth characters and adventurous settings; she left the series during season 2, to be replaced by Meghan McCarthy as showrunner for the remainder of the series.
The ninth and final season of the animated television series My Little Pony: Friendship Is Magic, developed by Lauren Faust, originally aired on the Discovery Family channel in the United States. The series is based on Hasbro's My Little Pony line of toys and animated works and is often referred to by collectors as the fourth generation, or "G4", of the My Little Pony franchise. Season 9 of the series premiered on April 6, 2019, on Discovery Family, an American pay television channel partly owned by Hasbro, and concluded with a three-part series finale on October 12.
I watch the 2010 Western animated series My Little Pony: Friendship is Magic (seasons 1–9), delving deep into it and the MLP fandom, and reflect on it. What makes it good, and what powers its fandom subculture, producing a wide array of fanfictions, music, and art? Focusing on the fandom, plot, development, and meaning of bronydom, I conclude that, among other things, it has surprisingly high-quality production & aesthetics which are easily adapted by the fandom, and which power a Westernized shonen anime depicting an underappreciated, plausibly-contemporary capitalist-utopian perspective on self-actualization, reminiscent of other more explicitly self-help-oriented pop-culture movements such as the recent Jordan B. Peterson movement. Included are my personal rankings of characters, seasons, episodes, and official & fan music.
The Monogatari Japanese anime television series is based on the light novel series of the same name, written by Nisio Isin with illustrations by Vofan. The anime is directed by several directors and produced by the animation studio Shaft. The series debuted with Bakemonogatari and aired 12 episodes between July 3 and September 25, 2009, on the Tokyo MX television station. Three additional original net animation episodes were distributed on the anime's official website between November 3, 2009, and June 25, 2010. A sequel titled Nisemonogatari aired 11 episodes between January 7 and March 17, 2012. A prequel to the original series titled Nekomonogatari (Black) aired four episodes back-to-back on December 31, 2012. Six further sequels were later adapted under the common moniker of Monogatari Series Second Season: Nekomonogatari (White), Kabukimonogatari, Otorimonogatari, Onimonogatari, and Koimonogatari aired between July 6 and December 28, 2013, whereas Hanamonogatari, which was originally meant to air with the others in 2013, was postponed and eventually broadcast separately on August 16, 2014. The "final season" of the novels was adapted as Tsukimonogatari, Owarimonogatari, Koyomimonogatari, and Zoku Owarimonogatari, which aired from December 31, 2014 through June 22, 2019. An adaptation of the prequel to Bakemonogatari, titled Kizumonogatari, was announced in 2010 but delayed for six years until finally being released as a film trilogy from January 8, 2016 to January 6, 2017.
Subscription page for the monthly gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the changelog), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse the archives since December 2013.
While training a GPT-2-117M on a folk-music corpus written in ABC format, an otherwise-high-quality model kept generating persistent syntax errors: random spaces appeared in the output, rendering a music piece either erroneous or lower-quality. Why? It appears to be an issue with the GPT BPE encoder's handling of spaces, which makes it difficult to emit the right space-separated characters. We found that ABC does not actually require spaces, so we simply removed all spaces from the corpus, noticeably improving the quality of generated pieces.
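The preprocessing described above can be sketched in a few lines. This is a hypothetical reconstruction, not the actual script: the newsletter only says spaces were removed from the corpus, so the choice here to leave ABC header fields (lines like `X:1` or `T:Title`) untouched while stripping spaces from the music body is an assumption for illustration.

```python
def strip_abc_spaces(abc: str) -> str:
    """Remove spaces from the music body of an ABC tune.

    Header/information fields (e.g. 'X:1', 'K:D') are left as-is;
    whether the original preprocessing also touched them is an
    assumption -- the source only states that spaces were removed.
    """
    out = []
    for line in abc.splitlines():
        # ABC information fields start with a single letter followed by ':'
        if len(line) >= 2 and line[0].isalpha() and line[1] == ":":
            out.append(line)
        else:
            out.append(line.replace(" ", ""))
    return "\n".join(out)

tune = "X:1\nT:Example Reel\nK:D\n|: D2 FA d2 fa | g2 ec d2 :|"
print(strip_abc_spaces(tune))
```

Since ABC tokenizes note-by-note rather than word-by-word, removing spaces loses nothing musically while sidestepping the BPE encoder's space-handling quirks.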
Generating symbolic music with language models is a promising research area, with potential applications in automated music composition. Recent work shows that Transformer architectures can learn to generate compelling four-instrument scores from large MIDI datasets. In this paper, we re-train the small (117M) GPT-2 model on a large dataset in ABC notation, and generate samples of single-instrument folk music. Our BLEU- and ROUGE-based quantitative evaluations, and survey-based qualitative evaluations, suggest that ABC notation is learned with syntactic and semantic correctness, and that samples contain robust and believable n-grams.
To expand the ABC GPT-2 model to cover a wider variety of musical genres, I turn to the next-most compact widespread music encoding format: MIDI. There are hundreds of thousands of MIDIs which can be decompiled to ABC format, averaging ~10k BPEs—within GPT-2-117M’s feasible context window when trained on TPUs (which permit training of context windows up to 30k wide).
We combine the ABC corpus from before with 2 large MIDI datasets converted to ABC, yielding ~453k usable ABC-MIDI musical files (~5.1GB of text). We trained January–April 2020 on our TPU swarm (with many interruptions), achieving a final loss of ~0.2 (underfit).
Sampling from the final model is hit-or-miss, as it is prone to the likelihood repetition trap, and because it generates instruments one by one, it is common for instruments to be cut off or otherwise broken during sampling (indicating that sampling is increasingly a bigger problem than training for long-range sequence modeling). However, successful pieces are possible, and they are musically far more diverse than the folk ABC corpus, with many pleasingly complex samples.
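The "likelihood repetition trap" mentioned above is commonly mitigated at sampling time by penalizing tokens the model has already emitted. As a hedged illustration (this is the CTRL-style repetition penalty, a standard technique, not necessarily the mitigation used for this model), a minimal sketch:

```python
def penalize_repeats(logits, generated, penalty=1.3):
    """Discourage already-emitted tokens: divide their positive logits
    (or multiply their negative logits) by `penalty`, so repeated
    tokens become less likely under greedy or sampled decoding."""
    out = list(logits)
    for tok in set(generated):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

# Tokens 0 and 1 were already generated, so both are pushed down;
# token 2 is untouched.
print(penalize_repeats([2.0, -2.0, 1.0], generated=[0, 1], penalty=2.0))
# → [1.0, -4.0, 1.0]
```

The asymmetric divide/multiply keeps the penalty monotone on the logit scale: it shrinks a repeated token's advantage whether its raw logit is positive or negative.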
Long-standing problems in standard scientific methodology have exploded as the “Replication Crisis”: the discovery that many results in fields as diverse as psychology, economics, medicine, biology, and sociology are in fact false or quantitatively highly inaccurately measured. I cover here a handful of the issues and publications on this large, important, and rapidly developing topic up to about 2013, at which point the Replication Crisis became too large a topic to cover more than cursorily.
The crisis is caused by methods & publishing procedures which interpret random noise as important results, far too small datasets, selective analysis by an analyst trying to reach expected/desired results, publication bias, poor implementation of existing best-practices, nontrivial levels of research fraud, software errors, philosophical beliefs among researchers that false positives are acceptable, neglect of known confounding like genetics, and skewed incentives (financial & professional) to publish ‘hot’ results.
Thus, any individual piece of research typically establishes little. Scientific validation comes not from small p-values, but from discovering a regular feature of the world which disinterested third parties can discover with straightforward research done independently on new data with new procedures—replication.
Statistical folklore asserts that “everything is correlated”: in any real-world dataset, most or all measured variables will have non-zero correlations, even between variables which appear to be completely independent of each other, and that these correlations are not merely sampling error flukes but will appear in large-scale datasets to arbitrarily designated levels of statistical-significance or posterior probability.
This raises serious questions for null-hypothesis statistical-significance testing, as it implies the null hypothesis of 0 will always be rejected with sufficient data, meaning that a failure to reject only implies insufficient data, and provides no actual test or confirmation of a theory. Even a directional prediction is minimally confirmatory since there is a 50% chance of picking the right direction at random.
It also has implications for conceptualizations of theories & causal models, interpretations of structural models, and other statistical principles such as the “sparsity principle”.
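The core argument above, that a non-zero "crud factor" guarantees the null hypothesis will be rejected given enough data, is easy to verify numerically. A minimal sketch (the tiny true correlation r ≈ 0.01 is an arbitrary illustrative value, and the p-value uses a normal approximation to the t-test, adequate at these sample sizes):

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def approx_p(r, n):
    """Two-sided p-value for H0: rho = 0, via the t-statistic
    t = r * sqrt((n-2)/(1-r^2)) and a normal approximation."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))

random.seed(0)
for n in (100, 10_000, 250_000):
    xs = [random.gauss(0, 1) for _ in range(n)]
    # y shares a tiny (true r ~ 0.01) linear component with x
    ys = [0.01 * x + random.gauss(0, 1) for x in xs]
    r = pearson_r(xs, ys)
    print(f"n={n:>7}  r={r:+.4f}  p<0.05: {approx_p(r, n) < 0.05}")
```

At n = 100 a true correlation of 0.01 is invisible, but by n = 250,000 the same trivial correlation is rejected at essentially any significance level: "failure to reject" measures sample size, not the truth of the theory.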
Problems with social experiments and evaluating them, loopholes, causes, and suggestions; non-experimental methods systematically deliver false results, as most interventions fail or have small effects.
Compilation of studies comparing observational results with randomized experimental results on the same intervention, compiled from medicine/economics/psychology, indicating that a large fraction of the time (although probably not a majority) correlation ≠ causality.
In information systems, a tag is a keyword or term assigned to a piece of information. This kind of metadata helps describe an item and allows it to be found again by browsing or searching. Tags are generally chosen informally and personally by the item's creator or by its viewer, depending on the system, although they may also be chosen from a controlled vocabulary.
The Casual Vacancy is a 2012 novel written by J. K. Rowling. The book was published worldwide by the Little, Brown Book Group on 27 September 2012. A paperback edition was released on 23 July 2013. It was Rowling's first publication since the Harry Potter series, her first apart from that series, and her first novel for adult readership.
Joanne Rowling, better known by her pen name J. K. Rowling, is a British author and philanthropist. She is best known for writing the Harry Potter fantasy series, which has won multiple awards and sold more than 500 million copies, becoming the best-selling book series in history. The books are the basis of a popular film series, over which Rowling had overall approval on the scripts and was a producer on the final films. She also writes crime fiction under the pen name Robert Galbraith.
Catch-22 is a satirical war novel by American author Joseph Heller. He began writing it in 1953; the novel was first published in 1961. Often cited as one of the most significant novels of the twentieth century, it uses a distinctive non-chronological third-person omniscient narration, describing events from the points of view of different characters. The separate storylines are out of sequence so the timeline develops along with the plot.
Joseph Heller was an American author of novels, short stories, plays, and screenplays. His best-known work is the 1961 novel Catch-22, a satire on war and bureaucracy, whose title has become a synonym for an absurd or contradictory choice.
American Gods (2001) is a fantasy novel by British author Neil Gaiman. The novel is a blend of Americana, fantasy, and various strands of ancient and modern mythology, all centering on the mysterious and taciturn Shadow.
Neil Richard MacKinnon Gaiman is an English author of short fiction, novels, comic books, graphic novels, nonfiction, audio theatre, and films. His works include the comic book series The Sandman and novels Stardust, American Gods, Coraline, and The Graveyard Book. He has won numerous awards, including the Hugo, Nebula, and Bram Stoker awards, as well as the Newbery and Carnegie medals. He is the first author to win both the Newbery and the Carnegie medals for the same work, The Graveyard Book (2008). In 2013, The Ocean at the End of the Lane was voted Book of the Year in the British National Book Awards.
A Game of Thrones is the first novel in A Song of Ice and Fire, a series of fantasy novels by the American author George R. R. Martin. It was first published on August 1, 1996. The novel won the 1997 Locus Award and was nominated for both the 1997 Nebula Award and the 1997 World Fantasy Award. The novella Blood of the Dragon, comprising the Daenerys Targaryen chapters from the novel, won the 1997 Hugo Award for Best Novella. In January 2011, the novel became a New York Times Bestseller and reached No. 1 on the list in July 2011.
George Raymond Richard Martin, also known as GRRM, is an American novelist and short story writer, screenwriter, and television producer. He is the author of the series of epic fantasy novels A Song of Ice and Fire, which was adapted into the HBO series Game of Thrones (2011–2019).
Markus Zusak is an Australian writer with Austrian and German roots. He is best known for The Book Thief and The Messenger, two novels which became international bestsellers. He won the Margaret A. Edwards Award in 2014.
Black Leopard, Red Wolf is a 2019 fantasy novel by writer Marlon James. It is the first book of a planned trilogy. The novel draws on African history and mythology, blended into the landscape of the North Kingdom and the South Kingdom, and the political tensions between these two warring states, as well as various city-states and tribes in the surrounding landscape. The rights to produce a film adaptation were purchased by Michael B. Jordan in February 2019 prior to release of the book.
Marlon James is a Jamaican writer. He has written four novels: John Crow's Devil (2005), The Book of Night Women (2009), A Brief History of Seven Killings (2014), winner of the 2015 Man Booker Prize, and Black Leopard, Red Wolf (2019). Now living in Minneapolis, Minnesota, in the U.S., James teaches literature at Macalester College in St. Paul, Minnesota. He is also a faculty lecturer at St. Francis College's Low Residency MFA in Creative Writing.
Catherynne M. Valente is an American fiction writer, poet, and literary critic. For her speculative fiction novels she has won the annual James Tiptree, Andre Norton, and Mythopoeic Fantasy Awards. Her short fiction has appeared in Clarkesworld Magazine, the World Fantasy Award–winning anthologies Salon Fantastique and Paper Cities, along with numerous Year's Best volumes. Her critical work has appeared in the International Journal of the Humanities as well as in numerous essay collections.
John Crowley is an American author of fantasy, science fiction and historical fiction. He has also written essays. Crowley studied at Indiana University and has a second career as a documentary film writer.
Stacy Madeleine Schiff is an American former editor, essayist, and author of five biographies; her biography of Vera Nabokov, the wife and muse of the Russian-American novelist Vladimir Nabokov, won the 2000 Pulitzer Prize in biography.
Stephenie Meyer is an American novelist. She is best known for her vampire romance series Twilight, which has sold over 100 million copies, with translations into 37 different languages. Meyer was the bestselling author of 2008 and 2009 in the U.S., having sold over 29 million books in 2008, and 26.5 million in 2009. Meyer received the 2009 Children's Book of the Year award from the British Book Awards for Breaking Dawn, the Twilight series finale.
David Foster Wallace was an American author of novels, short stories and essays, as well as a university professor of English and creative writing. Wallace is widely known for his 1996 novel Infinite Jest, which Time magazine cited as one of the 100 best English-language novels from 1923 to 2005. His posthumous novel, The Pale King (2011), was a finalist for the Pulitzer Prize for Fiction in 2012.
The Glass Bead Game is the last full-length novel of the German author Hermann Hesse. It was begun in 1931 and published in Switzerland in 1943 after being rejected for publication in Germany due to Hesse's anti-Fascist views. A few years later, in 1946, Hesse won the Nobel Prize in Literature. In honoring him in its Award Ceremony Speech, the Swedish Academy said that the novel "occupies a special position" in Hesse's work.
Hermann Karl Hesse was a German-born Swiss poet, novelist, and painter. His best-known works include Demian, Steppenwolf, Siddhartha, and The Glass Bead Game, each of which explores an individual's search for authenticity, self-knowledge and spirituality. In 1946, he received the Nobel Prize in Literature.
Theft by Finding: Diaries (1977–2002) is an edited compilation of diary entries by David Sedaris published on May 30, 2017. Sedaris shares selected entries spanning from his days as a 20-year-old hitchhiking through Oregon to living in London just shy of his 46th birthday. It was released in advance of David Sedaris Diaries: A Visual Compendium, which was published the same year and edited by Jeffrey Jenkins.
David Raymond Sedaris is an American humorist, comedian, author, and radio contributor. He was publicly recognized in 1992 when National Public Radio broadcast his essay "Santaland Diaries." He published his first collection of essays and short stories, Barrel Fever, in 1994. He is the brother and writing collaborator of actor Amy Sedaris.
Followup section to the article covering how to search the Internet effectively: 14 case studies of challenging Internet searches drawn from the past 10 years. I present each problem, step through the process of solving it, and describe the tacit knowledge and implicit strategies involved. These case studies hopefully make the prior tips more understandable by showing them off in practice.
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.
We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset—matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples.
The capacity of the language model is essential to the success of zero-shot task transfer and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state of the art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text.
These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at https://github.com/salesforce/ctrl.
“Language Models are Few-Shot Learners”, Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei (2020-05-28):
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions—something which current NLP systems still largely struggle to do.
Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10× more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
How to use StyleGAN2, an improvement to StyleGAN released in December 2019, which removes the blob artifacts and is generally of somewhat higher visual quality. StyleGAN2 is tricky to use because it requires custom local compilation of optimized code. Aaron Gokaslan provided tips on getting StyleGAN2 running, and trained a StyleGAN2 model on my anime portraits, which is available for download and which I use to create TWDNEv3.
We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
For information about my site’s philosophy, method, traffic statistics, and implementation, see the About page; for information about myself, my use of other websites, and contact information, see the Links page; for information about new pages, see the Changelog; to receive updates, news, & reviews, subscribe to the newsletter (archives).
In computer architecture, Amdahl's law is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is named after computer scientist Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967.
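Amdahl's law is a one-line formula; a minimal Python sketch (function name mine) makes the key consequence vivid — the unimproved fraction caps the overall speedup no matter how much the rest is accelerated:

```python
def amdahl_speedup(improved_fraction, speedup_factor):
    """Overall speedup when a fraction of the workload is sped up by a factor.

    Amdahl's law: S = 1 / ((1 - p) + p / s)
    """
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / speedup_factor)

# Doubling the speed of half the task yields only a 1.33x overall speedup:
print(amdahl_speedup(0.5, 2))
# Even a near-infinite speedup of 95% of a task caps out at 1/0.05 = 20x:
print(amdahl_speedup(0.95, 1e12))
```

The second call shows why optimizing the serial 5% eventually dominates: the asymptotic limit is `1 / (1 - p)`.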
In economics, the Jevons paradox occurs when technological progress or government policy increases the efficiency with which a resource is used, but the rate of consumption of that resource rises due to increasing demand. The Jevons paradox is perhaps the most widely known paradox in environmental economics. However, governments and environmentalists generally assume that efficiency gains will lower resource consumption, ignoring the possibility of the paradox arising.
Simpson's paradox, which also goes by several other names, is a phenomenon in probability and statistics, in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. This result is often encountered in social-science and medical-science statistics and is particularly problematic when frequency data is unduly given causal interpretations. The paradox can be resolved when causal relations are appropriately addressed in the statistical modeling. It is also referred to as Simpson's reversal, Yule–Simpson effect, amalgamation paradox, or reversal paradox.
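The classic kidney-stone treatment data (Charig et al 1986) make the reversal concrete; a minimal Python sketch — Treatment A wins within each stone-size group, yet Treatment B wins on the pooled totals:

```python
# (successes, total) per treatment, stratified by stone size:
groups = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

for name, g in groups.items():
    a, b = rate(*g["A"]), rate(*g["B"])
    print(f"{name}: A={a:.0%} B={b:.0%}")   # A higher in both groups

# Pool the two groups by summing successes and totals per treatment:
pooled = {t: tuple(map(sum, zip(*(g[t] for g in groups.values()))))
          for t in ("A", "B")}
print({t: f"{rate(*v):.0%}" for t, v in pooled.items()})  # B higher overall
```

The reversal happens because treatment A was disproportionately assigned the harder (large-stone) cases — exactly the kind of causal structure the statistical model must account for.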
Paraplegia is an impairment in motor or sensory function of the lower extremities. The word comes from Ionic Greek (παραπληγίη), "half-stricken". It is usually caused by spinal cord injury or a congenital condition that affects the neural (brain) elements of the spinal canal. The area of the spinal canal that is affected in paraplegia is either the thoracic, lumbar, or sacral regions. If four limbs are affected by paralysis, tetraplegia or quadriplegia is the correct term. If only one limb is affected, the correct term is monoplegia. Spastic paraplegia is a form of paraplegia defined by spasticity of the affected muscles, rather than flaccid paralysis.
John Eckhardt Jr, professionally billed as Johnny Eck was an American freak show performer in sideshows and a film actor. Born without the lower half of his torso, Eck is best known today for his role in Tod Browning's 1932 cult classic film Freaks and his appearances as a bird creature in several Tarzan films. He was often billed as "The Amazing Half-Boy", "King of the Freaks" and "The Most Remarkable Man Alive".
Freaks is a 1932 American pre-Code horror film produced and directed by Tod Browning, and starring Wallace Ford, Leila Hyams, Olga Baclanova and Roscoe Ates. It follows a trapeze artist who joins a group of carnival sideshow performers with a plan to seduce and murder a dwarf in the troupe to gain his inheritance, but her plot proves to have dangerous consequences. The film is based on elements from the short story "Spurs" by Tod Robbins.
A decompiler is a computer program that takes an executable file as input, and attempts to create a high level source file which can be recompiled successfully. It is therefore the opposite of a compiler, which takes a source file and makes an executable. Decompilers are usually unable to perfectly reconstruct the original source code, and as such, will frequently produce obfuscated code. Nonetheless, decompilers remain an important tool in the reverse engineering of computer software.
The replication crisis is, as of 2020, an ongoing methodological crisis in which it has been found that many scientific studies are difficult or impossible to replicate or reproduce. The replication crisis affects the social sciences and medicine most severely. The crisis has long-standing roots; the phrase was coined in the early 2010s as part of a growing awareness of the problem. The replication crisis represents an important body of research in the field of metascience.
Spaced repetition is a centuries-old psychological technique for efficient memorization & practice of skills where instead of attempting to memorize by ‘cramming’, memorization can be done far more efficiently by instead spacing out each review, with increasing durations as one learns the item, with the scheduling done by software. Because of the greater efficiency of its slow but steady approach, spaced repetition can scale to memorizing hundreds of thousands of items (while crammed items are almost immediately forgotten) and is especially useful for foreign languages & medical studies.
I review what this technique is useful for, some of the large research literature on it and the testing effect (up to ~2013, primarily), the available software tools and use patterns, and miscellaneous ideas & observations on it.
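The scheduling idea behind spaced-repetition software can be sketched with a simplified SM-2-style update (an illustrative simplification of the family of algorithms used by tools like Anki & Mnemosyne, not any specific tool's exact code):

```python
def next_interval(interval_days, ease, grade):
    """grade: 0-5 self-rating of recall. Failures reset; passes stretch the gap."""
    if grade < 3:                        # forgot the item: start over, penalize ease
        return 1, max(1.3, ease - 0.2)
    # SM-2-style ease update: perfect recall nudges ease up, hesitant recall down.
    new_ease = max(1.3, ease + (0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02)))
    if interval_days == 0:               # first successful review
        return 1, new_ease
    return round(interval_days * new_ease), new_ease

# Each pass multiplies the gap, so reviews grow only logarithmically
# with retention time -- the source of spaced repetition's efficiency:
interval, ease = 0, 2.5
for grade in (5, 5, 4, 5):
    interval, ease = next_interval(interval, ease, grade)
    print(interval, round(ease, 2))      # intervals: 1, 3, 8, 22 days
```

The exponentially widening intervals are what let the technique scale to hundreds of thousands of items: the per-day review load for old, well-learned material shrinks toward zero.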
Generative neural networks, such as GANs, have struggled for years to generate decent-quality anime faces, despite their great success with photographic imagery such as real human faces. The task has now been effectively solved, for anime faces as well as many other domains, by the development of a new generative adversarial network, StyleGAN, whose source code was released in February 2019.
The appendix gives samples of my failures with earlier GANs for anime face generation, and I provide samples & model from a relatively large-scale BigGAN training run suggesting that BigGAN may be the next step forward to generating full-scale anime images.
A minute of reading could save an hour of debugging!
Meta page describing gwern.net site ideals of stable long-term essays which improve over time; technical decisions using Markdown and static hosting; idea sources and writing methodology; metadata definitions; site statistics; copyright license.
Danbooru2019 Portraits is a dataset of n = 302,652 (16GB) 512px anime faces cropped from ‘solo’ SFW Danbooru2019 images in a relatively broad ‘portrait’ style encompassing necklines/ears/hats/etc rather than tightly focused on the face, upscaled to 512px as necessary, and low-quality images deleted by manual review using Discriminator ranking. This dataset has been used for creating TWDNE.
Deep learning for computer vision relies on large annotated datasets. Classification/categorization has benefited from the creation of ImageNet, which classifies 1m photos into 1000 categories. But classification/categorization is a coarse description of an image which limits application of classifiers, and there is no comparably large dataset of images with many tags or labels which would allow learning and detecting much richer information about images. Such a dataset would ideally be >1m images with at least 10 descriptive tags each which can be publicly distributed to all interested researchers, hobbyists, and organizations. There are currently no such public datasets, as ImageNet, Birds, Flowers, and MS COCO fall short either on image or tag count or restricted distribution. I suggest that the “image boorus” be used. The image boorus are longstanding web databases which host large numbers of images which can be ‘tagged’ or labeled with an arbitrary number of textual descriptions; they were developed for and are most popular among fans of anime, who provide detailed annotations.
The best known booru, with a focus on quality, is Danbooru. We provide a torrent/rsync mirror which contains ~3TB of 3.69m images with 108m tag instances (of 392k defined tags, ~29/image) covering Danbooru from 2005-05-24–2019-12-31 (final ID: #3,734,659), providing the image files & a JSON export of the metadata. We also provide a smaller torrent of SFW images downscaled to 512×512px JPGs (295GB; 2,828,400 images) for convenience.
Our hope is that a Danbooru2019 dataset can be used for rich large-scale classification/tagging & learned embeddings, test out the transferability of existing computer vision techniques (primarily developed using photographs) to illustration/anime-style images, provide an archival backup for the Danbooru community, feed back metadata improvements & corrections, and serve as a testbed for advanced techniques such as conditional image generation or style transfer.
Generating high-quality anime faces has long been a task neural networks struggled with. The invention of StyleGAN in 2018 has effectively solved this task and I have trained a StyleGAN model which can generate high-quality anime faces at 512px resolution. To show off the recent progress, I made a website, “This Waifu Does Not Exist” for displaying random StyleGAN 2 faces. TWDNE displays a different neural-net-generated face & plot summary every 15s. The site was popular and went viral online, especially in China. The model can also be used interactively for exploration & editing in the Artbreeder online service.
TWDNE faces have been used as screensavers, user avatars, character art for game packs or online games, uploaded to Pixiv, given away in streams, and used in a research paper (Noguchi & Harada 2019). TWDNE results also helped inspire Sizigi Studio’s online interactive waifu GAN, Waifu Labs, which generates even better anime faces than my StyleGAN results.
Discussion of how to modify existing images with GANs. There are several possibilities: train another NN to turn an image back into the original encoding; run blackbox search on encodings, repeatedly tweaking it to approximate a target face; or the whitebox approach, directly backpropagating through the model from the image to the encoding while holding the model fixed. All of these have been implemented for StyleGAN, and a combination works best. There are even GUIs for editing StyleGAN anime faces!
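The whitebox approach can be illustrated on a toy “generator” — a fixed linear map standing in for a real StyleGAN, with all names and dimensions here purely illustrative: hold the model fixed and gradient-descend on the latent code to minimize pixel loss against the target image.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8))        # fixed toy "generator": image = W @ z
z_true = rng.normal(size=8)
target = W @ z_true                  # the image whose encoding we want back

z = np.zeros(8)                      # initial latent guess
lr = 0.01
for _ in range(2000):
    residual = W @ z - target
    grad = W.T @ residual            # gradient of 0.5 * ||W z - target||^2 wrt z
    z -= lr * grad                   # descend on z; W (the "model") never changes

print(np.abs(W @ z - target).max())  # near zero: the encoding is recovered
```

With a real nonlinear generator the loss surface is non-convex, which is why in practice the encoder-initialization and blackbox-search approaches are combined with this backprop step.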
I explore BigGAN, another recent GAN with SOTA results on the most complex image domain tackled by GANs so far, ImageNet. BigGAN’s capabilities come at a steep compute cost, however. I experiment with 128px ImageNet transfer learning (successful) with ~6 GPU-days, and from-scratch 256px anime portraits of 1000 characters on a 8×2080ti machine for a month (mixed results). My BigGAN results are good but compromised by practical problems with the released BigGAN code base. While BigGAN is not yet superior to StyleGAN for many purposes, BigGAN-like approaches may turn out to be necessary to scale to whole anime images.
The Discriminator of a GAN is trained to detect outliers or bad datapoints. So it can be used for cleaning the original dataset of aberrant samples. This works reasonably well and I obtained BigGAN/StyleGAN quality improvements by manually deleting the worst samples (typically badly-cropped or low-quality faces), but has peculiar behavior which indicates that the Discriminator is not learning anything equivalent to a “quality” score but may be doing some form of memorization of specific real datapoints. What does this mean for how GANs work?
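The rank-and-delete workflow can be sketched with a stand-in score distribution in place of a real Discriminator (the two normal distributions below are assumptions for illustration, not GAN outputs): score every sample, sort ascending, and send the worst tail for manual review.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(loc=1.0, size=900)     # D tends to score good data higher...
corrupt = rng.normal(loc=-3.0, size=100)  # ...and aberrant samples lower
scores = np.concatenate([clean, corrupt])

order = np.argsort(scores)                # ascending: worst-scored samples first
worst_decile = order[:100]                # candidates for manual deletion

# Most of the bottom decile is the corrupted subset (indices >= 900):
print((worst_decile >= 900).mean())
```

Note this only works to the extent that D's score tracks "quality" — the memorization behavior described above means real Discriminator rankings need the manual-review step rather than blind thresholding.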
I continue my AI poetry generation experiments with OpenAI’s 2020 GPT-3, which is 116× larger, and much more powerful, than the 2019 GPT-2. GPT-3, however, is not merely a quantitative tweak yielding “GPT-2 but better”—it is qualitatively different, exhibiting eerie runtime learning capabilities allowing even the raw model, with zero finetuning, to “meta-learn” many textual tasks purely by example or instruction. One does not train or program GPT-3 in a normal way, but one engages in dialogue and writes prompts to teach GPT-3 what one wants.
Experimenting through the OpenAI Beta API in June 2020, I find that GPT-3 does not just match my finetuned GPT-2-1.5b-poetry for poem-writing quality, but exceeds it, while being versatile in handling poetry, Tom Swifty puns, science fiction, dialogue like Turing’s Turing-test dialogue, literary style parodies… As the pièce de résistance, I recreate Stanislaw Lem’s Cyberiad’s “Trurl’s Electronic Bard” poetry using GPT-3. (Along the way, I document instances of how the BPE text encoding unnecessarily damages GPT-3’s performance on a variety of tasks, how to best elicit the highest-quality responses, common errors people make in using GPT-3, and test out GPT-3’s improvements in NN weak points like logic or commonsense knowledge.)
GPT-3’s samples are not just close to human level: they are creative, witty, deep, meta, and often beautiful. They demonstrate an ability to handle abstractions, like style parodies, I have not seen in GPT-2 at all. Chatting with GPT-3 feels uncannily like chatting with a human. I was impressed by the results reported in the GPT-3 paper, and after spending a week trying it out, I remain impressed.
This page records GPT-3 samples I generated in my explorations, and thoughts on how to use GPT-3 and its remaining weaknesses. I hope you enjoy them even a tenth as much as I enjoyed testing GPT-3 and watching the completions scroll across my screen.
[Artbreeder is an interactive GAN generator website. Originally named “Ganbreeder” and providing only the 256px BigGAN generator, it now provides a variety of BigGAN & StyleGAN models, including the anime portrait StyleGAN model. (It is more general than the similar Waifu Labs, but my anime model is not as good.) Users can generate random samples and explore slight variants of them to gradually explore the “latent space” and find interesting images, but they can also edit images more directly, upload existing images to find the most similar image produced by the model, etc. A popular website, it has generated >56m images from September 2019 to January 2020.]
Thanks to the recent development of deep generative models, it is becoming possible to generate high-quality images with both fidelity and diversity. However, the training of such generative models requires a large dataset. To reduce the amount of data required, we propose a new method for transferring prior knowledge of the pre-trained generator, which is trained with a large dataset, to a small dataset in a different domain. Using such prior knowledge, the model can generate images leveraging some common sense that cannot be acquired from a small dataset. In this work, we propose a novel method focusing on the parameters for batch statistics, scale and shift, of the hidden layers in the generator. By training only these parameters in a supervised manner, we achieved stable training of the generator, and our method can generate higher quality images compared to previous methods without collapsing, even when the dataset is small (~100). Our results show that the diversity of the filters acquired in the pre-trained generator is important for the performance on the target domain. Our method makes it possible to add a new class or domain to a pre-trained generator without disturbing the performance on the original domain.
[Waifu Labs is an interactive website for generating (1024px?) anime faces using a customized StyleGAN trained on Danbooru2018. Similar to Artbreeder, it supports face exploration and face editing, and at the end, a user can purchase prints of a particular face.]
We taught a world-class artificial intelligence how to draw anime. All the drawings you see were made by a non-human artist! Wild, right? It turns out machines love waifus almost as much as humans do. We proudly present the next chapter of human history: lit waifu commissions from the world's smartest AI artist. In less than 5 minutes, the artist learns your preferences to make the perfect waifu just for you.
Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator’s input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128×128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.6.
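The “truncation trick” itself is easy to sketch with toy latent vectors and no real generator: sample z from a truncated normal by resampling anything beyond a threshold, so a lower threshold shrinks latent variance (trading variety for fidelity).

```python
import numpy as np

def truncated_normal(shape, threshold, rng):
    """Sample N(0,1), resampling any entry with |z| > threshold."""
    z = rng.normal(size=shape)
    while True:
        mask = np.abs(z) > threshold
        if not mask.any():
            return z
        z[mask] = rng.normal(size=int(mask.sum()))

rng = np.random.default_rng(0)
wide = truncated_normal((10000,), 2.0, rng)    # mild truncation: more variety
narrow = truncated_normal((10000,), 0.5, rng)  # heavy truncation: typical samples
print(wide.std(), narrow.std())                # narrower threshold -> lower variance
```

In BigGAN the generator is fed these truncated z vectors at sampling time (training still uses the full distribution), which is what the orthogonal regularization makes tolerable.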
Compared to GPT-2, GPT-3 improves performance on character-level tasks like rhyming, alliteration, punning, anagrams or permutations, acrostic poems, and arithmetic less than expected, despite being very good at many other closely-related kinds of writing like satire.
Why? A plausible explanation is an obscure technical detail: as a performance optimization, GPT does not see characters but sub-word-chunks called “byte-pair encodings” (BPEs). Because GPTs never see characters but opaque partial-words, which vary chaotically based on the specific word and even the surrounding context, they are unable to easily learn about character-level aspects of language, like similar spellings or sounds, and are forced to learn relationships much more indirectly, like by brute-force memorizing of pairs of words.
Some experiments with reformatting GPT-3’s poorest-performing tasks to avoid inconsistent BPE encodings of strings shows small to large performance gains, consistent with this theory.
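A toy greedy longest-match tokenizer, with a made-up vocabulary far smaller than GPT-3’s real BPE table, shows how spelling gets hidden: two words sharing most of their letters can receive completely disjoint token sequences, so letter-level regularities are invisible to the model.

```python
# Hypothetical subword vocabulary (illustrative only, not real BPE merges):
vocab = ["the", "her", "er", "re", "he", "t", "h", "e", "r"]

def tokenize(word):
    """Greedy longest-match segmentation, a stand-in for real BPE encoding."""
    tokens, i = [], 0
    while i < len(word):
        for piece in sorted(vocab, key=len, reverse=True):
            if word.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
    return tokens

# "there" contains "here" letter-for-letter, yet the token sequences
# share nothing -- the model never "sees" the common spelling:
print(tokenize("there"))  # ['the', 're']
print(tokenize("here"))   # ['her', 'e']
```

Reformatting a task so that every character becomes its own token (e.g. space-separating letters) restores the character-level view, which is the fix the experiments above test.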
The GPT-3 neural network is so large a model, in both capacity and dataset, that it exhibits qualitatively different behavior: you do not apply it to a fixed set of tasks present in the training dataset, retraining on additional data if one wants to handle a new task (as one would have to retrain GPT-2); instead, you interact with it, expressing any task in terms of natural-language descriptions, requests, and examples, tweaking the prompt until it “understands” & meta-learns the new task based on the high-level abstractions it learned from the pretraining.
This is a rather different way of using a DL model, and it’s better to think of it as a new kind of programming, where the prompt is now a “program” which programs GPT-3 to do new things.
In February 2019, following up on my 2015–2016 text-generation experiments with char-RNNs, I experiment with the cutting-edge Transformer NN architecture for language modeling & text generation. Using OpenAI’s GPT-2-117M model pre-trained on a large Internet corpus and nshepperd’s finetuning code, I retrain GPT-2-117M on a large (117MB) Project Gutenberg poetry corpus. I demonstrate how to train 2 variants: “GPT-2-poetry”, trained on the poems as a continuous stream of text, and “GPT-2-poetry-prefix”, with each line prefixed with the metadata of the PG book it came from. In May 2019, I trained the next-largest GPT-2, GPT-2-345M, similarly, for a further quality boost in generated poems. In October 2019, I retrained GPT-2-117M on a Project Gutenberg corpus with improved formatting, and combined it with a contemporary poem dataset based on the Poetry Foundation’s website.

With just a few GPU-days on 1080ti GPUs, GPT-2-117M finetuning can produce high-quality poetry which is more thematically consistent than my char-RNN poems, capable of modeling subtle features like rhyming, and sometimes even a pleasure to read. I list the many possible ways to improve poem generation and further approach human-level poems. For the highest-quality AI poetry to date, see my followup page, “GPT-3 Creative Writing”.
Char-RNNs are unsupervised generative models which learn to mimic text sequences. I suggest extending char-RNNs with inline metadata such as genre or author prefixed to each line of input, allowing for better & more efficient metadata, and more controllable sampling of generated output by feeding in desired metadata. A 2015 experiment using torch-rnn on a set of ~30 Project Gutenberg e-books (1 per author) to train a large char-RNN shows that a char-RNN can learn to remember metadata such as authors, learn associated prose styles, and often generate text visibly similar to that of a specified author.
The Poetry Foundation is a Chicago-based American foundation created to promote poetry in the wider culture. It was formed from Poetry magazine, which it continues to publish, with a 2003 gift of $200 million from philanthropist Ruth Lilly.
Standard language generation neural network models, like GPT-2, are trained via likelihood training to imitate human text corpuses. Generated text suffers from persistent flaws like repetition, due to myopic generation word-by-word, and cannot improve on the training data because they are trained to predict ‘realistic’ completions of the training data.
A proposed alternative is to use reinforcement learning to train the NNs, to encourage global properties like coherence & lack of repetition, and potentially improve over the original corpus’s average quality. Preference learning trains a reward function on human ratings, and uses that as the ‘environment’ for a blackbox DRL algorithm like PPO.
OpenAI released a codebase implementing this dual-model preference learning approach for textual generation, based on GPT-2. Having previously used GPT-2 for poetry & music generation, I experimented with GPT-2 preference learning for unconditional music and poetry generation.
I found that preference learning seemed to work better for music than poetry, and seemed to reduce the presence of repetition artifacts, but the results, at n≅7,400 ratings compiled over 23 iterations of training+sampling November 2019–January 2020, are not dramatically better than alternative improvements like scaling up models or more thorough data-cleaning or more stringent sample curation. My blind ratings using n≅200 comparisons showed no large advantage for the RL-tuned samples (winning only 93 of 210 comparisons, or ~44%).
This may be due to insufficient ratings, bad hyperparameters, or not using samples generated with common prefixes, but I suspect the main issue is insufficient ratings: some NLP tasks in Ziegler et al 2019 required up to 60k ratings for good performance, and the reward model appeared to achieve poor performance & succumb to adversarial examples easily.
Working with it, I suspect that preference learning is unnecessarily sample-inefficient & data-inefficient, and that the blackbox reinforcement learning approach is inferior to directly using the reward model to optimize text samples. I propose two major architectural overhauls: have the reward model directly model the implied ranking of every datapoint, and drop the agent model entirely in favor of backprop-powered gradient ascent which optimizes sequences to maximize the reward model’s output.
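The second overhaul can be sketched in miniature. Here the reward model is a stand-in quadratic peaked at a hypothetical optimum (a real one would be a neural net scoring token embeddings, with ascent run on a continuous relaxation of the sequence); instead of training a separate agent, we run gradient ascent directly on the sample itself.

```python
def reward(x, target):
    """Stand-in reward model: maximized when x equals `target`."""
    return -sum((a - b) ** 2 for a, b in zip(x, target))

def reward_grad(x, target):
    """Analytic gradient of the stand-in reward."""
    return [-2.0 * (a - b) for a, b in zip(x, target)]

target = [1.0, -2.0, 0.5]        # hypothetical optimum of the reward model
x = [0.0, 0.0, 0.0]              # initial "sample" to optimize
for _ in range(200):
    g = reward_grad(x, target)
    x = [a + 0.05 * b for a, b in zip(x, g)]   # gradient ascent step

print(all(abs(a - b) < 1e-3 for a, b in zip(x, target)))   # True
```

The point of the overhaul is that every ascent step exploits the reward model's gradient, rather than spending samples letting a blackbox RL agent rediscover that gradient by trial and error.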
…Black is GPT-2. Its excuse [for this chess blunder] is that it’s a text prediction program with no concept of chess. As far as it knows, it’s trying to predict short alphanumeric strings like “e2e4” or “Nb7”. Nobody told it this represents a board game. It doesn’t even have a concept of 2D space that it could use to understand such a claim. But it still captured my rook! Embarrassing!…Last month, I asked him if he thought GPT-2 could play chess. I wondered if he could train it on a corpus of chess games written in standard notation (where, for example, e2e4 means “move the pawn at square e2 to square e4”). There are literally millions of games written up like this. GPT-2 would learn to predict the next string of text, which would correspond to the next move in the chess game. Then you would prompt it with a chessboard up to a certain point, and it would predict how the chess masters who had produced its training data would continue the game – ie make its next move using the same heuristics they would. Gwern handed the idea to his collaborator Shawn Presser, who had a working GPT-2 chess engine running within a week:…You can play against GPT-2 yourself by following the directions in the last tweet, though it won’t be much of a challenge for anyone better than I am.
…What does this imply? I’m not sure (and maybe it will imply more if someone manages to make it actually good). It was already weird to see something with no auditory qualia learn passable poetic meter. It’s even weirder to see something with no concept of space learn to play chess. Is any of this meaningful? How impressed should we be that the same AI can write poems, compose music, and play chess, without having been designed for any of those tasks? I still don’t know.
This work applies natural language modeling to generate plausible strategic moves in the ancient game of Go. We train the Generative Pretrained Transformer (GPT-2) to mimic the style of Go champions as archived in Smart Game Format (SGF), which offers a text description of move sequences. The trained model further generates valid but previously unseen strategies for Go. Because GPT-2 preserves punctuation and spacing, the raw output of the text generator provides inputs to game visualization and creative patterns, such as the Sabaki project’s game engine using auto-replays. Results demonstrate that language modeling can capture both the sequencing format of championship Go games and their strategic formations. Compared to random game boards, the GPT-2 fine-tuning shows efficient opening move sequences favoring corner play over less advantageous center and side play. Game generation as a language modeling task offers novel approaches to more than 40 other board games where historical text annotation provides training data (e.g., Amazons & Connect 4/6).
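The SGF text the model trains on is easy to inspect: a move is written ‘;B[pd]’ or ‘;W[dp]’, with two letters giving column and row (‘a’ = 1 … ‘s’ = 19 on a 19×19 board). A minimal parser, sketched here for illustration (pass moves and other SGF properties are ignored):

```python
import re

def parse_sgf_moves(sgf):
    """Extract (color, column, row) tuples, 1-indexed, from SGF text."""
    moves = []
    for color, coords in re.findall(r";([BW])\[([a-s]{2})\]", sgf):
        col = ord(coords[0]) - ord("a") + 1
        row = ord(coords[1]) - ord("a") + 1
        moves.append((color, col, row))
    return moves

print(parse_sgf_moves("(;B[pd];W[dp];B[pq])"))
# [('B', 16, 4), ('W', 4, 16), ('B', 17, 17)]
```

Because the game is serialized as plain punctuation-preserving text like this, GPT-2 can model it with no Go-specific machinery, and the raw output feeds directly into tools like Sabaki for visualization.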
This work demonstrates that natural language transformers can support more generic strategic modeling, particularly for text-archived games. In addition to learning natural language skills, the abstract transformer architecture can generate meaningful moves on a chessboard. With further fine-tuning, the transformer learns complex gameplay by training on 2.8 million chess games in Portable Game Notation. After 30,000 training steps, OpenAI’s Generative Pre-trained Transformer (GPT-2) optimizes weights for 774 million parameters. This fine-tuned Chess Transformer generates plausible strategies and displays game formations identifiable as classic openings, such as English or the Slav Exchange. Finally, in live play, the novel model demonstrates a human-to-transformer interface that correctly filters illegal moves and provides a novel method to challenge the transformer’s chess strategies. We anticipate future work will build on this transformer’s promise, particularly in other strategy games where features can capture the underlying complex rule syntax from simple but expressive player annotations.
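The illegal-move filter in such a human-to-transformer interface amounts to rejection sampling; a sketch with the language model stubbed out (the legal-move set is hard-coded for illustration, where a real interface would query a chess library):

```python
def first_legal_move(candidates, legal_moves):
    """Return the first model-proposed move that is actually legal."""
    for move in candidates:
        if move in legal_moves:
            return move
    return None   # caller falls back (e.g. resamples) if all are illegal

model_samples = ["e2e5", "e2e4", "g1f3"]     # hypothetical GPT-2 output
legal = {"e2e4", "e2e3", "g1f3", "b1c3"}     # stand-in for engine output
print(first_legal_move(model_samples, legal))   # e2e4
```

Since the transformer has no built-in notion of legality, this thin filter is what lets its text predictions be played as actual moves against a human.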