
Python directory


“Technology Holy Wars Are Coordination Problems”, Branwen 2020

Holy-wars: “Technology Holy Wars are Coordination Problems”, Gwern Branwen (2020-06-15):

Flamewars over platforms & upgrades are so bitter not because people are jerks but because the choice will influence entire ecosystems, benefiting one platform through network effects & avoiding ‘bitrot’ while subtly sabotaging the rest through ‘bitcreep’.

The enduring phenomenon of ‘holy wars’ in computing, such as the bitterness around the prolonged Python 2 to Python 3 migration, is not due to mere pettiness or love of conflict, but because they are a coordination problem: dominant platforms enjoy strong network effects, such as reduced ‘bitrot’ as it is regularly used & maintained by many users, and can inflict a mirror-image ‘bitcreep’ on other platforms which gradually are neglected and begin to bitrot because of the dominant platform.

The outright negative effect of bitcreep means that holdouts do not just cost early adopters the possible network effects; they also greatly reduce the value of a given thing, and may leave the early adopters actually worse off and more miserable on a daily basis. Given the extent to which holdouts have benefited from the community, holdout behavior is perceived as parasitic and immoral by adopters, while holdouts in turn deny any moral obligation and resent the methods that adopters use to increase adoption (such as, in the absence of formal controls, informal ones like bullying).

This desperate need for there to be a victor, and the large technical benefits/​costs to those who choose the winning/​losing side, explain the (only apparently) disproportionate energy, venom, and intractability of holy wars⁠.

Perhaps if we explicitly understand holy wars as coordination problems, we can avoid the worst excesses and tap into knowledge about the topic to better manage things like language migrations.

“This Waifu Does Not Exist”, Branwen 2019

TWDNE: “This Waifu Does Not Exist”, Gwern Branwen (2019-02-19):

I describe how I made the website (TWDNE) for displaying random anime faces generated by StyleGAN neural networks, and how it went viral.

Generating high-quality anime faces has long been a task neural networks struggled with. The invention of StyleGAN in 2018 has effectively solved this task and I have trained a StyleGAN model which can generate high-quality anime faces at 512px resolution. To show off the recent progress, I made a website, “This Waifu Does Not Exist” for displaying random StyleGAN 2 faces. TWDNE displays a different neural-net-generated face & plot summary every 15s. The site was popular and went viral online, especially in China. The model can also be used interactively for exploration & editing in the Artbreeder online service⁠.

TWDNE faces have been used as screensavers, user avatars, character art for game packs or online games, painted watercolors, uploaded to Pixiv, given away in streams, and used in a research paper (Noguchi & Harada 2019). TWDNE results also helped inspire Sizigi Studio’s online interactive waifu GAN, Waifu Labs, which generates even better anime faces than my StyleGAN results.

“Making Anime Faces With StyleGAN”, Branwen 2019

Faces: “Making Anime Faces With StyleGAN”, Gwern Branwen (2019-02-04):

A tutorial explaining how to train and generate high-quality anime faces with StyleGAN 1/​2 neural networks, and tips/​scripts for effective StyleGAN use.

Generative neural networks, such as GANs, have struggled for years to generate decent-quality anime faces, despite their great success with photographic imagery such as real human faces. The task has now been effectively solved, for anime faces as well as many other domains, by the development of a new generative adversarial network, StyleGAN⁠, whose source code was released in February 2019.

I show off my StyleGAN 1/​2 CC-0-licensed anime faces & videos, provide downloads for the final models & anime portrait face dataset, provide the ‘missing manual’ & explain how I trained them based on Danbooru2017/​2018 with source code for the data preprocessing, and document installation & configuration & training tricks.

For application, I document various scripts for generating images & videos, briefly describe the website “This Waifu Does Not Exist” I set up as a public demo & its followup This Anime Does Not Exist (TADNE) (see also Artbreeder), discuss how the trained models can be used for transfer learning such as generating high-quality faces of anime characters with small datasets (eg. Holo or Asuka Souryuu Langley), and touch on more advanced StyleGAN applications like encoders & controllable generation.

The anime face graveyard gives samples of my failures with earlier GANs for anime face generation, and I provide samples & model from a relatively large-scale BigGAN training run suggesting that BigGAN may be the next step forward to generating full-scale anime images.


“Making Anime With BigGAN”, Branwen 2019

BigGAN: “Making Anime With BigGAN”, Gwern Branwen (2019-02-04):

Experiments in using BigGAN to generate anime faces and whole anime images; semi-successful.

Following my StyleGAN anime face experiments⁠, I explore BigGAN⁠, another recent GAN with SOTA results on one of the most complex image domains tackled by GANs so far (ImageNet). BigGAN’s capabilities come at a steep compute cost, however.

Using the unofficial BigGAN-PyTorch reimplementation, I experimented in 2019 with 128px ImageNet transfer learning (successful) with ~6 GPU-days, and from-scratch 256px anime portraits of 1000 characters on an 8×2080ti machine for a month (mixed results). My BigGAN results are good but compromised by the compute expense & practical problems with the released BigGAN code base. While BigGAN is not yet superior to StyleGAN for many purposes, BigGAN-like approaches may be necessary to scale to whole anime images.

For followup experiments, Shawn Presser, I, and others (collectively, “Tensorfork”) have used Tensorflow Research Cloud TPU credits & the compare_gan BigGAN reimplementation. Running this at scale on the full Danbooru2019 dataset in May 2020, we reached the best anime GAN results to date (later exceeded by This Anime Does Not Exist).

“The Kelly Coin-Flipping Game: Exact Solutions”, Branwen et al 2017

Coin-flip: “The Kelly Coin-Flipping Game: Exact Solutions”, Gwern Branwen, Arthur B., nshepperd, FeepingCreature, Gurkenglas (2017-01-19):

Decision-theoretic analysis of how to optimally play Haghani & Dewey 2016’s 300-round double-or-nothing coin-flipping game with an edge and a ceiling, doing better than the Kelly Criterion. Computing and following an exact decision tree increases earnings by $6.6 over a modified KC.

Haghani & Dewey 2016 experiment with a double-or-nothing coin-flipping game where the player starts with $25, has an edge of 60%, and can play 300 times, choosing how much to bet each time, winning up to a maximum ceiling of $250. Most of their subjects fail to play well, earning an average of $91, compared to Haghani & Dewey 2016’s heuristic benchmark of ~$240 in winnings achievable using a modified Kelly Criterion as their strategy. The KC, however, is not optimal for this problem, as it ignores the ceiling and the limited number of plays.
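The modified-Kelly benchmark can be sketched in a few lines (an illustrative Python simulation under my own assumptions about the cap handling, not Haghani & Dewey’s exact implementation): the Kelly fraction for an even-money bet with win probability p is 2p − 1, so a 60% edge means betting 20% of the bankroll, truncated so a win never overshoots the ceiling.

```python
import random

def kelly_fraction(p):
    # Kelly fraction for an even-money bet: f* = p - (1 - p) = 2p - 1
    return 2 * p - 1

def simulate_modified_kelly(start=25.0, cap=250.0, rounds=300, p=0.6,
                            n_sims=2000, seed=0):
    """Average final wealth betting a ceiling-capped Kelly fraction
    each flip; a sketch of the benchmark strategy, not the authors'
    exact rules."""
    rng = random.Random(seed)
    f = kelly_fraction(p)               # 0.2 at a 60% edge
    total = 0.0
    for _ in range(n_sims):
        w = start
        for _ in range(rounds):
            if w >= cap:                # stop once the ceiling is hit
                break
            bet = min(f * w, cap - w)   # never bet past the ceiling
            w += bet if rng.random() < p else -bet
        total += min(w, cap)
    return total / n_sims
```

With these assumptions the simulated average lands in the same neighborhood as the paper’s ~$240 benchmark, illustrating why the strategy is strong but still leaves money on the table.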

We solve the problem of the value of optimal play exactly by using decision trees & dynamic programming for calculating the value function, with implementations in R, Haskell⁠, and C. We also provide a closed-form exact value formula in R & Python, several approximations using Monte Carlo/​random forests⁠/​neural networks, visualizations of the value function, and a Python implementation of the game for the OpenAI Gym collection. We find that optimal play yields $246.61 on average (rather than ~$240), and so the human players actually earned only 36.8% of what was possible, losing $155.6 in potential profit. Comparing decision trees and the Kelly criterion for various horizons (bets left), the relative advantage of the decision tree strategy depends on the horizon: it is highest when the player can make few bets (at b = 23, with a difference of ~$36), and decreases with number of bets as more strategies hit the ceiling.
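The backward-induction computation of the value function can be sketched briefly (illustrative Python with $1 bet granularity and a shortened horizon for tractability; the paper’s actual implementations are in R, Haskell & C and use finer granularity over all 300 rounds):

```python
def optimal_value(start, rounds, cap=250, p=0.6):
    """Expected final wealth under optimal play in the capped
    double-or-nothing game, by dynamic programming over
    (wealth, rounds-left) with integer-dollar bets."""
    V = list(range(cap + 1))            # horizon 0: value = current wealth
    for _ in range(rounds):
        nxt = [0.0] * (cap + 1)
        for w in range(cap + 1):
            best = V[w]                 # betting 0 is always allowed
            for b in range(1, w + 1):
                ev = p * V[min(w + b, cap)] + (1 - p) * V[w - b]
                if ev > best:
                    best = ev
            nxt[w] = best
        V = nxt
    return V[start]
```

The recursion makes the horizon effect visible: at the ceiling the optimal bet is 0 (any bet can only lose), and with more rounds remaining the value weakly increases, which is why the decision tree’s edge over Kelly shrinks as the horizon grows and more strategies hit the ceiling.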

In the Kelly game, the maximum winnings, number of rounds, and edge are fixed; we describe a more difficult generalized version in which the 3 parameters are drawn from Pareto, normal, and beta distributions and are unknown to the player (who can use Bayesian inference to try to estimate them during play). Upper and lower bounds are estimated on the value of this game. In the variant of this game where subjects are not told the exact edge of 60%, a Bayesian decision tree approach shows that performance can closely approach that of the decision tree, with a penalty of only $1 for one plausible prior. Two deep reinforcement learning agents, DQN & DDPG, are implemented, but DQN fails to learn and DDPG doesn’t show acceptable performance, indicating that better deep RL methods may be required to solve the generalized Kelly game.
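For the unknown-edge variant, the Bayesian update itself is simple, because a Beta prior on the win probability is conjugate to coin flips. A minimal sketch (the Beta(1,1) uniform prior here is my illustrative choice, not necessarily the prior used in the paper):

```python
def update_edge(wins, losses, a=1.0, b=1.0):
    """Posterior mean of the win probability p after observing flips,
    starting from a Beta(a, b) prior: the posterior is
    Beta(a + wins, b + losses), with mean (a+wins)/(a+b+wins+losses)."""
    return (a + wins) / (a + b + wins + losses)
```

A player can plug this posterior-mean edge into the decision-tree recursion at each step, which is the rough shape of a Bayesian decision-tree strategy that converges toward the true 60% edge as flips accumulate.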