- “GPT-2 Preference Learning for Music Generation”, Branwen 2019
- “GPT-2 Neural Network Poetry”, Branwen & Presser 2019
- “This Waifu Does Not Exist”, Branwen 2019
- “Making Anime Faces With StyleGAN”, Branwen 2019
- “Making Anime With BigGAN”, Branwen 2019
- “Internet Search Tips”, Branwen 2018
- “Candy Japan’s New Box A/B Test”, Branwen 2016
- “Easy Cryptographic Timestamping of Files”, Branwen 2015
- “RNN Metadata for Mimicking Author Style”, Branwen 2015
- “When Should I Check The Mail?”, Branwen 2015
- “The Sort --key Trick”, Branwen 2014
- “A/B Testing Long-form Readability on Gwern.net”, Branwen 2012
- “Silk Road 1: Theory & Practice”, Branwen 2011
- “Archiving GitHub”, Branwen 2011
- “Archiving URLs”, Branwen 2011
- “Writing a Wikipedia RSS Link Archive Bot”, Branwen 2009
- “Writing a Wikipedia Link Archive Bot”, Branwen 2008
Experiments with OpenAI’s ‘preference learning’ approach, which trains a NN to predict global quality of datapoints, and then uses reinforcement learning to optimize that directly, rather than proxies. I am unable to improve quality, perhaps due to too-few ratings.
Standard language-generation neural network models, like GPT-2, are trained via likelihood training to imitate human text corpuses. Generated text suffers from persistent flaws like repetition, due to myopic word-by-word generation, and such models cannot improve on the training data because they are trained only to predict ‘realistic’ completions of it.
A proposed alternative is to use reinforcement learning to train the NNs, to encourage global properties like coherence & lack of repetition, and potentially improve over the original corpus’s average quality. Preference learning trains a reward function on human ratings, and uses that as the ‘environment’ for a blackbox DRL algorithm like PPO.
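As a rough illustration of the reward-modeling half (a minimal sketch, not code from OpenAI’s released codebase): the reward model is trained on pairwise human choices with a Bradley-Terry-style logistic loss, so the preferred sample of each pair receives the higher score. Here `reward_model` is a hypothetical network mapping token IDs to a scalar:

```python
# Minimal sketch of a pairwise preference (Bradley-Terry-style) loss for reward
# modeling; `reward_model` is assumed to map a batch of token-ID sequences to a
# scalar score per sequence.
import torch.nn.functional as F

def preference_loss(reward_model, preferred_ids, rejected_ids):
    """Negative log-likelihood of the human choices under
    P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))."""
    r_pref = reward_model(preferred_ids)   # shape: (batch,)
    r_rej = reward_model(rejected_ids)     # shape: (batch,)
    return -F.logsigmoid(r_pref - r_rej).mean()
```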
OpenAI released a codebase implementing this dual-model preference learning approach for textual generation, based on GPT-2. Having previously used GPT-2 for poetry & music generation, I experimented with GPT-2 preference learning for unconditional music and poetry generation.
I found that preference learning seemed to work better for music than poetry, and seemed to reduce the presence of repetition artifacts, but the results, at n ≈ 7,400 ratings compiled over 23 iterations of training+sampling November 2019–January 2020, are not dramatically better than alternative improvements like scaling up models, more thorough data-cleaning, or more stringent sample curation. My blind ratings using n ≈ 200 comparisons showed no large advantage for the RL-tuned samples (winning only 93 of 210 comparisons, or 44%).
This may be due to insufficient ratings, bad hyperparameters, or not using samples generated with common prefixes, but I suspect it’s the first, as some NLP tasks in Ziegler et al 2019 required up to 60k ratings for good performance, and the reward model appeared to achieve poor performance & succumb to adversarial examples easily.
Working with it, I suspect that preference learning is unnecessarily sample-inefficient & data-inefficient, and that the blackbox reinforcement learning approach is inferior to directly using the reward model to optimize text samples, and propose two major architectural overhauls: have the reward model directly model the implied ranking of every datapoint, and drop the agent model entirely in favor of backprop-powered gradient ascent which optimizes sequences to maximize the reward model’s output.
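To make the second proposal concrete, here is a hypothetical sketch (not code from the article) of optimizing a sample by backprop alone: relax the discrete token sequence to a soft one-hot distribution and run gradient ascent against the frozen reward model’s score, skipping the RL agent entirely:

```python
# Hypothetical sketch of backprop-powered sample optimization: learn logits over
# the vocabulary at each position and ascend the frozen reward model's score.
import torch

def optimize_sequence(reward_model, vocab_size, seq_len, steps=200, lr=0.1):
    logits = torch.randn(1, seq_len, vocab_size, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        soft_tokens = torch.softmax(logits, dim=-1)  # soft one-hot rows
        # `reward_model` is assumed to accept a distribution over tokens (eg. by
        # multiplying it into its embedding matrix) and return a scalar score.
        reward = reward_model(soft_tokens)
        (-reward).backward()                         # ascend the reward
        opt.step()
        opt.zero_grad()
    return logits.argmax(dim=-1)                     # discretize to token IDs
```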
- Why Preference Learning?
- Data Increases
- Architectural Improvements
- Optimization by Backprop, not Blackbox
- Bradley-Terry Preference Learning
- Decision Transformers: Preference Learning As Simple As Possible
- External Links
Demonstration tutorial of retraining OpenAI’s GPT-2 (a text-generating Transformer neural network) on large poetry corpuses to generate high-quality English verse.
In February 2019, following up on my 2015–2016 text-generation experiments with char-RNNs, I experiment with the cutting-edge Transformer NN architecture for language modeling & text generation. Using OpenAI’s GPT-2-117M (117M parameters) model pre-trained on a large Internet corpus and nshepperd’s finetuning code, I retrain GPT-2-117M on a large (117MB) Project Gutenberg poetry corpus. I demonstrate how to train 2 variants: “GPT-2-poetry”, trained on the poems as a continuous stream of text, and “GPT-2-poetry-prefix”, with each line prefixed with the metadata of the PG book it came from. In May 2019, I trained the next-largest GPT-2, GPT-2-345M, similarly, for a further quality boost in generated poems. In October 2019, I retrained GPT-2-117M on a Project Gutenberg corpus with improved formatting, and combined it with a contemporary poem dataset based on Poetry Foundation’s website.
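The two variants differ only in corpus preprocessing; a minimal sketch of the kind of script that builds the prefix variant (the file layout and `book_id|line` delimiter here are illustrative, not the exact format used):

```python
# Illustrative preprocessing for a "poetry-prefix"-style corpus: prefix every
# line with an ID for the Project Gutenberg book it came from, so the model can
# learn per-source styles and be steered at sampling time.
import glob, os

with open("poetry-prefix.txt", "w", encoding="utf-8") as out:
    for path in sorted(glob.glob("gutenberg-poetry/*.txt")):
        book_id = os.path.splitext(os.path.basename(path))[0]  # eg. "pg1041"
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line.strip():                       # skip blank lines
                    out.write(f"{book_id}|{line}\n")   # metadata|text
```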
With just a few GPU-days on 1080ti GPUs, GPT-2-117M finetuning can produce high-quality poetry which is more thematically consistent than my char-RNN poems, capable of modeling subtle features like rhyming, and sometimes even a pleasure to read. I list the many possible ways to improve poem generation and further approach human-level poems. For the highest-quality AI poetry to date, see my followup pages, “GPT-3 Creative Writing”/“GPT-3 Non-Fiction”.
For anime plot summaries, see TWDNE; for generating ABC-formatted folk music, see “GPT-2 Folk Music” & “GPT-2 Preference Learning for Music and Poetry Generation”; for playing chess, see “A Very Unlikely Chess Game”; for the Reddit comment generator, see SubSimulatorGPT-2; for fanfiction, the Ao3; and for video games, the walkthrough model. For OpenAI’s GPT-3 followup, see “GPT-3: Language Models are Few-Shot Learners”.
- GPT-2-117M: Generating Poetry
- Training GPT-2-117M To Generate Poetry
- Essay on Criticism
- 8 Famous First Lines
- “Jabberwocky”, Lewis Carroll
- External Links
I describe how I made the website ThisWaifuDoesNotExist.net (TWDNE) for displaying random anime faces generated by StyleGAN neural networks, and how it went viral.
Generating high-quality anime faces has long been a task neural networks struggled with. The invention of StyleGAN in 2018 has effectively solved this task, and I have trained a StyleGAN model which can generate high-quality anime faces at 512px resolution. To show off the recent progress, I made a website, “This Waifu Does Not Exist”, for displaying random StyleGAN 2 faces. TWDNE displays a different neural-net-generated face & plot summary every 15s. The site was popular and went viral online, especially in China. The model can also be used interactively for exploration & editing in the Artbreeder online service.
TWDNE faces have been used as screensavers, user avatars, character art for game packs or online games, painted watercolors, uploaded to Pixiv, given away in streams, and used in a research paper (Noguchi & Harada 2019). TWDNE results also helped inspire Sizigi Studio’s online interactive waifu GAN, Waifu Labs, which generates even better anime faces than my StyleGAN results.
A tutorial explaining how to train and generate high-quality anime faces with StyleGAN 1/2 neural networks, and tips/scripts for effective StyleGAN use.
Generative neural networks, such as GANs, have struggled for years to generate decent-quality anime faces, despite their great success with photographic imagery such as real human faces. The task has now been effectively solved, for anime faces as well as many other domains, by the development of a new generative adversarial network, StyleGAN, whose source code was released in February 2019.
I show off my StyleGAN 1/2 CC-0-licensed anime faces & videos, provide downloads for the final models & anime portrait face dataset, provide the ‘missing manual’ & explain how I trained them based on Danbooru2017/2018 with source code for the data preprocessing, and document installation, configuration, & training tricks.
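As a flavor of the data preprocessing involved, a minimal sketch of a face-cropping pass using OpenCV with nagadomi’s `lbpcascade_animeface` cascade (paths, margins, & thresholds here are illustrative, not the article’s exact settings):

```python
# Sketch of an anime-face cropping pass over a Danbooru-style image dump, using
# OpenCV with nagadomi's `lbpcascade_animeface.xml` cascade; illustrative only.
import glob, os
import cv2

cascade = cv2.CascadeClassifier("lbpcascade_animeface.xml")
os.makedirs("faces", exist_ok=True)

for path in glob.glob("danbooru/*.jpg"):
    img = cv2.imread(path)
    if img is None:
        continue
    gray = cv2.equalizeHist(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5, minSize=(128, 128))
    for i, (x, y, w, h) in enumerate(detections):
        face = cv2.resize(img[y:y+h, x:x+w], (512, 512))  # 512px training size
        name = os.path.splitext(os.path.basename(path))[0]
        cv2.imwrite(os.path.join("faces", f"{name}-{i}.png"), face)
```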
For application, I document various scripts for generating images & videos, briefly describe the website “This Waifu Does Not Exist” I set up as a public demo & its followup This Anime Does Not Exist.ai (TADNE) (see also Artbreeder), discuss how the trained models can be used for transfer learning such as generating high-quality faces of anime characters with small datasets (eg. Holo or Asuka Souryuu Langley), and touch on more advanced StyleGAN applications like encoders & controllable generation.
The anime face graveyard gives samples of my failures with earlier GANs for anime face generation, and I provide samples & a model from a relatively large-scale BigGAN training run suggesting that BigGAN may be the next step forward to generating full-scale anime images.
A minute of reading could save an hour of debugging!
- Training requirements
- Data Preparation
- Transfer Learning
- Anime Faces → Character Faces
- Emilia (Re:Zero)
- Anime Faces → Anime Headshots
- Anime Faces → Portrait
- Anime Faces → Male Faces
- Anime Faces → Ukiyo-e Faces
- Anime Faces → Western Portrait Faces
- Anime Faces → Danbooru2018
- FFHQ Variations
- Reversing StyleGAN To Control & Modify Images
- StyleGAN 2
- Future Work
- See Also
- External Links
Experiments in using BigGAN to generate anime faces and whole anime images; semi-successful.
Following my StyleGAN anime face experiments, I explore BigGAN, another recent GAN with SOTA results on one of the most complex image domains tackled by GANs so far (ImageNet). BigGAN’s capabilities come at a steep compute cost, however.
Using the unofficial BigGAN-PyTorch reimplementation, I experimented in 2019 with 128px ImageNet transfer learning (successful) with ~6 GPU-days, and from-scratch 256px anime portraits of 1000 characters on an 8×2080ti machine for a month (mixed results). My BigGAN results are good but compromised by the compute expense & practical problems with the released BigGAN code base. While BigGAN is not yet superior to StyleGAN for many purposes, BigGAN-like approaches may be necessary to scale to whole anime images.
For followup experiments, Shawn Presser, I, & others (collectively, “Tensorfork”) have used Tensorflow Research Cloud TPU credits & the compare_gan BigGAN reimplementation. Running this at scale on the full Danbooru2019 dataset in May 2020, we reached the best anime GAN results to date (later exceeded by This Anime Does Not Exist).
- BigGAN Advantages
- BigGAN Disadvantages
A description of advanced tips and tricks for effective Internet research of papers/books, with real-world examples.
Over time, I developed a certain google-fu and expertise in finding references, papers, and books online. Some of these tricks are not well-known, like checking the Internet Archive (IA) for books. I try to write down my search workflow, and give general advice about finding and hosting documents, with demonstration case studies.
- Web pages
- Case Studies
- Missing Appendix
- Misremembered Book
- Missing Website
- Speech → Book
- Rowling Quote On Death
- Crowley Quote
- Finding The Right ‘SAGE’
- UK Charity Financials
- Nobel Lineage Research
- Dead URL
- Description But No Citation
- Finding Followups
- How Many Homeless?
- Citation URL With Typo
- Too Narrow
- Try It
- Really, Just Try It
- (Try It!)
- Yes, That Works Too
- Beating PDF Passwords
- Lewontin’s Thesis
- See Also
- External Links
Bayesian decision-theoretic analysis of the effect of fancier packaging on subscription cancellations & optimal experiment design.
I analyze an A/B test from a mail-order company of two different kinds of box packaging from a Bayesian decision-theory perspective, balancing posterior probability of improvements & greater profit against the cost of packaging & risk of worse results, finding that, as the company’s analysis suggested, the new box is unlikely to be sufficiently better than the old. Calculating expected values of information shows that it is not worth experimenting on further, and that such fixed-sample trials are unlikely to ever be cost-effective for packaging improvements. However, adaptive experiments may be worthwhile.
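A minimal sketch of the core Bayesian comparison (the cancellation counts below are placeholders, not Candy Japan’s actual data): Beta-binomial posteriors over the two boxes’ cancellation rates, and the posterior probability that the new box is an improvement, which the decision analysis then weighs against packaging costs:

```python
# Minimal sketch of a Beta-binomial A/B comparison of cancellation rates; the
# counts are made-up placeholders, not the company's real data.
import numpy as np

rng = np.random.default_rng(0)

def posterior_samples(cancels, subscribers, n=100_000):
    # Beta(1,1) prior updated with observed cancellations.
    return rng.beta(1 + cancels, 1 + subscribers - cancels, size=n)

old = posterior_samples(cancels=30, subscribers=400)
new = posterior_samples(cancels=22, subscribers=400)

p_better = np.mean(new < old)            # P(new box cancels less)
expected_reduction = np.mean(old - new)  # expected drop in cancellation rate
print(f"P(new better) = {p_better:.2f}, E[reduction] = {expected_reduction:.3%}")
```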
- New Box A/B test
- See Also
Scripts for convenient, free, secure, Bitcoin-based dating of large numbers of files/strings.
Local archives are useful for personal purposes, but sometimes, in investigations that may be controversial, you want to be able to prove that the copy you downloaded was not modified and you need to timestamp it and prove the exact file existed on or before a certain date. This can be done by creating a cryptographic hash of the file and then publishing that hash to global chains like centralized digital timestampers or the decentralized Bitcoin blockchain. Current timestamping mechanisms tend to be centralized, manual, cumbersome, or cost too much to use routinely. Centralization can be overcome by timestamping to Bitcoin; costing too much can be overcome by batching up an arbitrary number of hashes and creating just 1 hash/timestamp covering them all; manual & cumbersome can be overcome by writing programs to handle all of this and incorporating them into one’s workflow. So using an efficient cryptographic timestamping service (the OriginStamp Internet service), we can write programs to automatically & easily timestamp arbitrary files & strings, timestamp every commit to a Git repository, and webpages downloaded for archival purposes. We can implement the same idea offline, without reliance on OriginStamp, but at the cost of additional software dependencies like a Bitcoin client.
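A minimal sketch of the batching idea (submission to OriginStamp or Bitcoin is omitted; only the digest is built): hash each file, then hash the concatenated hashes, so a single timestamp covers arbitrarily many files:

```python
# Minimal sketch of batched hashing for timestamping: one digest covers every
# file passed on the command line, so only that one hash needs timestamping.
import hashlib, sys

def sha256_file(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

file_hashes = sorted(sha256_file(p) for p in sys.argv[1:])  # sort for reproducibility
batch_digest = hashlib.sha256("\n".join(file_hashes).encode()).hexdigest()
print(batch_digest)  # timestamp this single hash to cover the whole batch
```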
Teaching a text-generating char-RNN to automatically imitate many different authors by labeling the input text by author; additional experiments include imitating Geocities and retraining GPT-2 on a large Project Gutenberg poetry corpus.
Char-RNNs are unsupervised generative models which learn to mimic text sequences. I suggest extending char-RNNs with inline metadata such as genre or author prefixed to each line of input, allowing for better & more efficient use of metadata, and more controllable sampling of generated output by feeding in desired metadata. A 2015 experiment using `torch-rnn` on a set of ~30 Project Gutenberg e-books (1 per author) to train a large char-RNN shows that a char-RNN can learn to remember metadata such as authors, learn associated prose styles, and often generate text visibly similar to that of a specified author.
I further try & fail to train a char-RNN on Geocities HTML for unclear reasons.
More successfully, I experiment in 2019 with a recently-developed alternative to char-RNNs, the Transformer NN architecture, by finetuning OpenAI’s GPT-2-117M Transformer model on a much larger (117MB) Project Gutenberg poetry corpus using both unlabeled lines & lines with inline metadata (the source book). The generated poetry is much better. And GPT-3 is better still.
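The payoff of inline metadata is controllable sampling: seed generation with the desired label and the model continues in that source’s style. A hypothetical sketch using a finetuned checkpoint loaded through the Hugging Face transformers library (the experiments themselves used torch-rnn & nshepperd’s TensorFlow code; the checkpoint path & prefix format here are illustrative):

```python
# Illustrative: steer a metadata-finetuned GPT-2 by prompting with the desired
# source label. Assumes a finetuned checkpoint already converted to the Hugging
# Face format at ./gpt2-poetry-prefix (hypothetical path).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("./gpt2-poetry-prefix")
model = GPT2LMHeadModel.from_pretrained("./gpt2-poetry-prefix")

prompt = "pg1041|"   # ask for lines in the style of one PG source book (ID illustrative)
input_ids = tokenizer.encode(prompt, return_tensors="pt")
out = model.generate(input_ids, max_length=200, do_sample=True,
                     top_k=40, temperature=0.9,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```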
- Handling multiple corpuses
- External Links
Bayesian decision-theoretic analysis of local mail delivery times: modeling deliveries as survival analysis, model comparison, optimizing check times with a loss function, and optimal data collection.
Mail is delivered by the USPS mailman at a regular but not observed time; what is observed is whether the mail has been delivered at a given check time, yielding somewhat-unusual “interval-censored data”. I describe the problem of estimating when the mailman delivers, write a simulation of the data-generating process, and demonstrate analysis of interval-censored data in R using maximum-likelihood (survival analysis with Gaussian regression using the `survival` library), MCMC (Bayesian model in JAGS), and likelihood-free Bayesian inference (custom ABC, using the simulation). This allows estimation of the distribution of mail delivery times. I compare those estimates from the interval-censored data with estimates from a (smaller) set of exact delivery-times provided by USPS tracking & personal observation, using a multilevel model to deal with heterogeneity apparently due to a change in USPS routes/postmen. Finally, I define a loss function on mail checks, enabling: a choice of optimal time to check the mailbox to minimize loss (exploitation); optimal time to check to maximize information gain (exploration); Thompson sampling (balancing exploration & exploitation indefinitely); and estimates of the value-of-information of another datapoint (to estimate when to stop exploration and start exploitation after a finite amount of data).
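At the heart of the analysis is the interval-censored likelihood: each mailbox check only reveals that the delivery time falls in an interval, so the model is fit by maximizing the probability assigned to those intervals. A minimal Python sketch with made-up data (the article itself works in R):

```python
# Sketch of interval-censored maximum likelihood for a Normal delivery-time
# model: each observation is a (lower, upper) bound on the delivery time, eg.
# "not delivered at 10:30, delivered by 11:15" -> (10.5, 11.25). Data made up.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

intervals = np.array([(10.5, 11.25), (10.0, 11.0), (10.75, 11.5),
                      (9.5, 11.0), (10.5, 12.0)])   # hours after midnight

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                        # keep sigma positive
    lo, hi = intervals[:, 0], intervals[:, 1]
    p = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)
    return -np.sum(np.log(np.clip(p, 1e-12, None)))

fit = minimize(neg_log_lik, x0=[10.5, 0.0])
print("estimated delivery time:", fit.x[0], "hours; sd:", np.exp(fit.x[1]))
```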
- Decision theory
Commandline folklore: sorting files by filename or content before compression can save large amounts of space by exposing redundancy to the compressor. Examples and comparisons of different sorts.
Programming folklore notes that one way to get better lossless compression efficiency is by the precompression trick of rearranging files inside the archive to group ‘similar’ files together and expose redundancy to the compressor, in accordance with information-theoretical principles. A particularly easy and broadly-applicable way of doing this, which does not require using any unusual formats or tools and is fully compatible with the default archive methods, is to sort the files by filename and especially file extension.
I show how to do this with the standard Unix command-line `sort` tool, using the so-called “`sort --key` trick”, and give examples of the large space-savings possible from my archiving work for personal website mirrors and for making darknet market mirror datasets, where the redundancy at the file level is particularly extreme and the `sort --key` trick shines compared to the naive approach.
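The article’s examples use the shell, but the idea is language-agnostic; a small Python sketch of the same precompression trick, feeding files to the archiver grouped by extension and then by name so the compressor sees similar files adjacently:

```python
# Python sketch of the same precompression idea: add files to the archive
# grouped by extension (then name), so similar files sit next to each other and
# the compressor can exploit the redundancy across them.
import os, tarfile

def sort_key(path):
    root, ext = os.path.splitext(path)
    return (ext, os.path.basename(root), path)   # group by extension first

paths = [os.path.join(d, f) for d, _, fs in os.walk("mirror/") for f in fs]

with tarfile.open("mirror.tar.xz", "w:xz") as tar:
    for path in sorted(paths, key=sort_key):
        tar.add(path)
```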
A log of experiments done on the site design, intended to render pages more readable, focusing on the challenge of testing a static site, page width, fonts, plugins, and effects of advertising.
To gain some statistical & web development experience and to improve my readers’ experiences, I have been running a series of CSS A/B tests since June 2012. As expected, most do not show any meaningful difference.
- Problems with “conversion” metric
- Ideas for testing
- Resumption: ABalytics
- Line height
- Null test
- Text & background color
- List symbol and font-size
- Blockquote formatting
- Font size & ToC background
- Section header capitalization
- ToC formatting
- BeeLine Reader text highlighting
- Floating footnotes
- Indented paragraphs
- Sidebar elements
- Moving sidebar’s metadata into page
- Banner Ad Effect on Total Traffic
- Deep reinforcement learning
- Indentation + Left-Justified Text
History, background, visiting, ordering, using, & analyzing the drug market Silk Road 1.
The cypherpunk movement laid the ideological roots of Bitcoin and the online drug market Silk Road; balancing previous emphasis on cryptography, I emphasize the non-cryptographic market aspects of Silk Road, which are rooted in cypherpunk economic reasoning, give a fully detailed account of how a buyer might use market information to rationally buy, and finish by discussing the strengths and weaknesses of Silk Road and what future developments are predicted by cypherpunk ideas.
- Silk Road as Cyphernomicon’s black markets
- Silk Road as a marketplace
- Silk Road
- Legal wares
- LSD case study
- Future Developments
- See Also
- External Links
- Reddit advice
- A mole?
- Bitcoin exchange risk
- Estimating DPR’s fortune minus expenses & exchange rate
- The Bet: BMR or Sheep to die in a year (by Oct 2014)
- Archives of SR pages
Archiving the Web, because nothing lasts forever: statistics, online archive services, extracting URLs automatically from browsers, and creating a daemon to regularly back up URLs to multiple sources.
Links on the Internet last forever or a year, whichever comes first. This is a major problem for anyone serious about writing with good references, as link rot will cripple several percent of all links each year, compounding over time.
To deal with link rot, I present my multi-pronged archival strategy using a combination of scripts, daemons, and Internet archival services: URLs are regularly dumped from both my web browser’s daily browsing and my website pages into an archival daemon I wrote, which pre-emptively downloads copies locally and attempts to archive them in the Internet Archive. This ensures a copy will be available indefinitely from one of several sources. Link rot is then detected by regular runs of `linkchecker`, and any newly dead links can be immediately checked for alternative locations, or restored from one of the archive sources.
As an additional flourish, my local archives are efficiently cryptographically timestamped using Bitcoin in case forgery is a concern, and I demonstrate a simple compression trick for substantially reducing sizes of large web archives such as crawls (particularly useful for repeated crawls such as my DNM archives).
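A toy sketch of the pre-emptive archiving step (the actual daemon described in the article is more elaborate): keep a local copy of each URL and ask the Internet Archive’s Save Page Now endpoint to archive it as well:

```python
# Toy sketch of pre-emptive URL archiving: save a local copy of each URL and
# submit it to the Internet Archive's Save Page Now endpoint. The real daemon
# handles queuing, retries, and multiple archive services.
import hashlib, pathlib
import requests

def archive(url, out_dir="archive"):
    pathlib.Path(out_dir).mkdir(exist_ok=True)
    # Local copy, named by a hash of the URL to avoid filesystem-unfriendly names.
    local = pathlib.Path(out_dir) / (hashlib.sha1(url.encode()).hexdigest() + ".html")
    local.write_bytes(requests.get(url, timeout=60).content)
    # Remote copy via the Internet Archive.
    requests.get("https://web.archive.org/save/" + url, timeout=60)

for url in open("urls.txt"):
    url = url.strip()
    if url:
        archive(url)
```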
- Link rot
- Reacting to broken links
- External Links
Archiving links using the Wikipedia Recent Changes RSS feed (obsolete).
Continuation of the 2008 Haskell Wikipedia link archiving bot tutorial, extending it from operating on a pre-specified list of articles to archiving links live, by using TagSoup to parse the Wikipedia Recent Changes feed for newly-added external links, which are then archived in parallel using WebCite. (Note: these tutorials are obsolete. WebCite is largely defunct, doing archiving this way is not advised, and WP link archiving is currently handled by Internet Archive-specific plugins by the WMF. For a more general approach suitable for personal use, see the writeup of `archiver-bot` in Archiving URLs.)
Haskell: tutorial on writing a daemon to archive links in Wikipedia articles with TagSoup and WebCite; obsolete.
This is a 2008 tutorial demonstrating how to write a Haskell program to automatically archive Internet links into WebCite & Internet Archive to avoid linkrot, by parsing WP dumps, downloading & parsing WP articles for external links with the TagSoup HTML parsing library, using the WebCite/IA APIs to archive them, and optimizing runtime. This approach is suitable for one-off crawls but not for live archiving using the RSS feed; for the next step, see Wikipedia RSS Archive Bot for a demonstration of how one could write an RSS-oriented daemon.