
text games directory


“Automated Crossword Solving”, Wallace et al 2022

“Automated Crossword Solving”, Eric Wallace, Nicholas Tomlin, Albert Xu, Kevin Yang, Eshaan Pathak, Matthew Ginsberg, Dan Klein (2022-05-19):

[blog⁠; Github] We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles.

Our system works by generating answer candidates for each crossword clue using neural question answering models and then combining loopy belief propagation with local search [using ByT5 to handle puns/humor/word-play] to find full puzzle solutions.

Compared to existing approaches [like Dr.Fill], our system improves exact puzzle accuracy from 57% to 82% on crosswords from The New York Times and obtains 99.9% letter accuracy on themeless puzzles. Our system also won first place at the top human crossword tournament⁠, which marks the first time that a computer program has surpassed human performance at this event.

To facilitate research on question answering and crossword solving, we analyze our system’s remaining errors and release a dataset of over 6 million question-answer pairs.

Figure 2: An overview of the Berkeley Crossword Solver. We use a neural question answering model to generate answer probabilities for each question, and then refine the probabilities with loopy belief propagation. Finally, we fill the grid with greedy search and iteratively improve uncertain areas of the puzzle using local search.
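As a rough illustration of the greedy fill step only (not belief propagation or local search), here is a minimal sketch on a made-up two-slot grid. The slot names, candidate words, probabilities, and the `greedy_fill` helper are all hypothetical and are not taken from the paper's code.

```python
# Toy grid: two crossing slots. Each slot has (answer, probability) candidates,
# standing in for the output of a question answering model. All data is made up.
candidates = {
    "1A": [("CAT", 0.6), ("COT", 0.4)],
    "1D": [("OWL", 0.7), ("ARM", 0.3)],
}
# Each crossing (slot_a, i, slot_b, j) means slot_a[i] must equal slot_b[j].
crossings = [("1A", 1, "1D", 0)]


def consistent(slot, word, fill):
    """Check a word against the letters of already-filled crossing slots."""
    for a, i, b, j in crossings:
        if a == slot and b in fill and fill[b][j] != word[i]:
            return False
        if b == slot and a in fill and fill[a][i] != word[j]:
            return False
    return True


def greedy_fill(candidates):
    """Fill the most confident slots first with their best consistent candidate."""
    fill = {}
    order = sorted(candidates, key=lambda s: -max(p for _, p in candidates[s]))
    for slot in order:
        for word, _ in sorted(candidates[slot], key=lambda c: -c[1]):
            if consistent(slot, word, fill):
                fill[slot] = word
                break  # a slot with no consistent candidate stays empty
    return fill


fill = greedy_fill(candidates)
```

Here the more confident 1D is filled first with OWL, which forces 1A to fall back from CAT to COT. A slot left empty or filled with a low-probability word is the kind of uncertain region a later local search pass would revisit.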

“Word Golf”, Xia 2021

“Word Golf”, Eric Xia (2021-12):

[Word Golf is a puzzle game similar to Word ladder+Six Degrees of Wikipedia: given a starting word like “dragon” and a grid of ‘similar’ possible words (nearby in the GloVe graph of word embeddings), the player tries to navigate to a target word like “tower” in as few words as possible.

This requires intuiting similarity of meaning & dimensions of descriptions in English to guess what will make progress towards the target without leaving one trapped in a semantic niche with no way out.

It is a creative use of word embeddings, like “Alpha (A translation of Genesis 1)” or Entendrepreneur: a “Portmanteau & Rhyme Generator” using word embeddings (Simon 2018) or Semantle⁠, which demonstrates dark knowledge and “Toward A Universal Law Of Generalization For Psychological Science”, Shepard 1987⁠, like asking people which is more likely, Big Foot or Nessie⁠; other game examples include Mysterium or Chronology or GeoGuessr⁠. cf. “calibration training”]
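The navigation mechanic can be sketched roughly as follows, assuming cosine similarity over word vectors. The 2-D vectors, the vocabulary, and the greedy player strategy are invented for illustration; the actual game uses GloVe embeddings and a human player.

```python
import math

# Made-up 2-D "embeddings"; a real game would load GloVe vectors instead.
EMB = {
    "dragon": (0.9, 0.1),
    "knight": (0.8, 0.4),
    "castle": (0.6, 0.6),
    "tower": (0.3, 0.9),
    "banana": (-0.8, 0.2),
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def neighbors(word, k=2):
    """The k most similar other words: the 'grid' offered to the player."""
    others = [w for w in EMB if w != word]
    return sorted(others, key=lambda w: -cosine(EMB[word], EMB[w]))[:k]


def greedy_path(start, target, k=2, max_steps=10):
    """A naive player: always pick the offered word closest to the target."""
    path = [start]
    while path[-1] != target and len(path) <= max_steps:
        options = [w for w in neighbors(path[-1], k) if w not in path]
        if not options:
            break  # trapped in a semantic niche with no unvisited neighbor
        path.append(max(options, key=lambda w: cosine(EMB[w], EMB[target])))
    return path


path = greedy_path("dragon", "tower")
```

With these toy vectors the greedy player goes from "dragon" to "castle" to "tower". A human player has no direct access to the similarity scores and has to guess which neighbor makes progress.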

“AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021

“AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Tongshuang Wu, Michael Terry, Carrie J. Cai (2021-10-04):

Although large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks. In response, we introduce the concept of Chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step. We first define a set of LLM primitive operations useful for Chain construction, then present an interactive system where users can modify these Chains, along with their intermediate results, in a modular way. In a 20-person user study, we found that Chaining not only improved the quality of task outcomes, but also significantly enhanced system transparency, controllability, and sense of collaboration. Additionally, we saw that users developed new ways of interacting with LLMs through Chains: they leveraged sub-tasks to calibrate model expectations, compared and contrasted alternative strategies by observing parallel downstream effects, and debugged unexpected model outputs by “unit-testing” sub-components of a Chain. In two case studies, we further explore how LLM Chains may be used in future applications.
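The core idea of chaining, where each step consumes the previous step's output and every intermediate result stays inspectable, can be sketched as below. The step functions are plain string transformations standing in for LLM calls; `run_chain` and both steps are hypothetical names, not the paper's primitives.

```python
from typing import Callable, List

Step = Callable[[str], str]


def run_chain(steps: List[Step], text: str) -> List[str]:
    """Run steps in order and keep every intermediate result for inspection."""
    results = [text]
    for step in steps:
        results.append(step(results[-1]))
    return results


# Stand-ins for LLM calls (hypothetical; a real system would prompt a model).
def split_points(text: str) -> str:
    return "\n".join(s.strip() for s in text.split(".") if s.strip())


def number_points(text: str) -> str:
    lines = text.splitlines()
    return "\n".join(f"{i}. {line}" for i, line in enumerate(lines, 1))


trace = run_chain([split_points, number_points], "Fix the intro. Shorten the ending.")
```

Because `trace` holds each intermediate output, a single step can be checked or edited in isolation, which loosely mirrors the "unit-testing" of sub-components that the abstract describes.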

“A Recipe For Arbitrary Text Style Transfer With Large Language Models”, Reif et al 2021

“A Recipe For Arbitrary Text Style Transfer with Large Language Models”, Emily Reif, Daphne Ippolito, Ann Yuan, Andy Coenen, Chris Callison-Burch, Jason Wei (2021-09-08):

In this paper, we leverage large language models (LMs) to perform zero-shot text style transfer⁠. We present a prompting method that we call augmented zero-shot learning, which frames style transfer as a sentence rewriting task and requires only a natural language instruction, without model fine-tuning or exemplars in the target style. Augmented zero-shot learning is simple and demonstrates promising results not just on standard style transfer tasks such as sentiment, but also on arbitrary transformations such as “make this melodramatic” or “insert a metaphor.”
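A prompt of this kind might be assembled as in the sketch below: a few exemplars of unrelated rewrites, followed by the new sentence and a natural-language instruction, with no exemplar in the target style. The template wording, the brace delimiters, and the exemplar sentences are assumptions made for illustration and may differ from the paper's exact prompt.

```python
# Exemplars of *other* rewrites, deliberately not in the target style (made up).
EXEMPLARS = [
    ("The weather is fine.", "more scary", "The sky darkened as if it knew."),
    ("I ate lunch.", "more formal", "I partook of a midday meal."),
]

TEMPLATE = "Here is some text: {{{src}}} Here is a rewrite of the text, which is {how}: {{{out}"


def build_prompt(text: str, style: str) -> str:
    """Frame style transfer as sentence rewriting; the model completes the last brace."""
    parts = [TEMPLATE.format(src=s, how=h, out=o + "}") for s, h, o in EXEMPLARS]
    parts.append(TEMPLATE.format(src=text, how=style, out=""))
    return "\n".join(parts)


prompt = build_prompt("The cat sat.", "more melodramatic")
```

The prompt ends on an open brace, so a language model's continuation up to the closing brace would be read off as the rewritten sentence.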

“1992: Silverwolf”, Reed 2021

“1992: Silverwolf”, Aaron A. Reed (2021-06-03):

At the address was “a white crumbling turn-of-the-century house overlooking the tiny fishing village of Burtonport”, where women could take a paid holiday that would immerse them in the life of a proper boarding school girl of an earlier time. “There were no electric lights in the place”, one game journalist wrote upon visiting: “the maid who answered the door was surely not of this decade.” The students wore bonnets and period clothes while attending lessons on mathematics, literature, and penmanship; plastic and other modern materials were forbidden; the headmistress was a severe woman in black who enforced strict discipline—stricter, at times, than some of the students might have preferred. “Quite where computers fit into this situation is difficult to understand”, another journalist wrote⁠; and nobody could really put their finger on what the “situation” even was. Were the group “Victorian cultists?” Were they LARPers? Were they con artists preying on emotionally immature women? Were they a game studio with a very unusual front? Or was there, as one embarrassed Irish reporter asked⁠, “almost a gay element to the activities here?” Answers were not then forthcoming. Few are even today.

…Oxford, 1971. 2 years after Stonewall, a wave of student and activist groups are loosely uniting under the mantle of the Gay Liberation Front, accelerating queer and feminist conversations about equal rights and alternatives to hegemonic patriarchy. At women’s college Lady Margaret Hall, one student group bonds over a difference with most of their sisters-in-arms: they reject the crass, drug-fueled and sex-fueled decadence of the 1960s, even while admitting it “left openings for a new feminist consciousness”, as one member would later write: “We welcome [the rock culture of the sixties] as we would welcome typhoid in the enemy’s water supply. But we do not drink it ourselves.” Out of this group would arise several radical separatist movements with overlapping membership, including a religious one called Lux Madriana—worshiping a female god with rituals supposedly passed down from a “magical matriarchal community” in a distant past—and an elaborately fleshed-out otherworld called Aristasia⁠. Much like the rich fantasy worlds created by Tolkien or the Brontë sisters, Aristasia became an ever-growing obsession for its creators, with its own customs, calendar, literature, and history, to the extent that some of the worldbuilders eventually dropped out of university to attend their own unofficial Aristasian school instead. In Aristasia there were 2 genders, both female (assertive brunettes and demure blondes); the decadent modern world was known as The Pit; and the word for person was not man but maid.

Eventually some number of this group took up residence in the remote coastal house in Burtonport, which would become the stage for their next decade of inventing new realities. At first they styled themselves a community of “Rhennish” folk, the last descendants of a five-thousand-year-old matriarchal culture, and called themselves the “Silver Sisterhood.” But their plans to live off the land fell through, and after a few seasons it seemed a quite different group was occupying the house, now called St. Bride’s School. St. Bride’s billed itself as something between a real school and a holiday retreat, posting ads for week-long terms where students would “spend 24 hours a day living in a different time, living a different life.” The staff and students observed a strict hierarchy, with obedient students appointed prefects to keep the others in line, and prefects reporting in turn to teachers: “Some maids like to tell others what to do”, as a visitor summarized the philosophy during the Silver Sisterhood days, “and some maids like to be told what to do.” Both the Sisterhood and St. Bride’s attracted copious media attention—which seems likely to have been deliberately sought out—and from news clips it’s clear at least some residents of both groups were the same people, though going by different names and speaking with changed accents. It was the first of many transformations.

…One of these was a title called Silverwolf, which was to be released alongside an original comic by Langridge. It was based on a serialized fantasy story appearing in a lesbian periodical called Artemis, which the St. Bride’s crew were also distributing under yet different aliases. The stories were credited to “Laeretta Krenne-Genovene with illustrations by Michele Dennis”; one or both of these people may, or may not, have been Langridge. The stories tapped into the deep well of Aristasian mythology, and the recap at the start of one episode gives a sense of their flavor:

Modern English schoolgirl Petra Stone is a reincarnation of the matriarchal warrior princess Mayanna. The princess and the schoolgirl exist as 2 independent personalities. She has been taken back into ancient matriarchal Britain by an Amazon group: Rahiyana, the leader; Thunder, a 7-foot powerhouse; Whirlwind, the teen tornado and a shape-shifting imp called Uisce. But the evil patriarchal Lord Fear is determined to kill Petra and has sent in pursuit of the group a powerful and mysterious band known only as the Swarm.

In the text adventure based on the stories, you play as Petra’s 4 Amazon companions, switching between them on a quest to help the reincarnated princess gain the power to become Silverwolf. The game is split into 2 parts which can be played in either order: they may originally have come on 2 sides of the same cassette tape. In one part you play as Rahiyana and Whirlwind, trying to escort Petra to the Holy Mountain where she can complete the ritual to transform into Silverwolf; in the other, you play Thunder and Uisce trying to retrieve the enchanted sword that Silverwolf will wield. Each of the 4 Amazon women has their own special power, and you must switch between them using commands like BECOME WHIRLWIND to complete the game. Transformation is in fact a recurring motif: Uisce can turn into any creature she sees by typing TURN INTO, and this includes other people—in some sequences you’ll need to BECOME UISCE and then TURN INTO THUNDER to complete a puzzle. To activate Rahiyana’s archery skills, the player needs to summon the power of Diana into her body by typing the phrase HAYA DYANA. The game, like its creators, is obsessed with becoming other people, or allowing them to become you…In one puzzle sequence, you must make use of Uisce’s shape-shifting to reach a series of progressively more unlikely areas. Spotting a bullfrog in the rushes of a lake, you can transform into it to leap to a lily pad. From the lily pad you can see a dragonfly, which you can in turn become to fly to a hidden beach. On the beach is a sand-castle, and the dragonfly is small enough to see that it’s a fortress home for a band of fairies. Becoming a fairy lets you enter the castle and recover a buried key.

…The group’s former publisher suspects their primary motive was always financial: “I think, basically, St Bride’s were in business: they were doing it on a commercial basis, however un-commercial they may have looked!” But some of the school’s pupils in later years would come to characterize the group as dangerously earnest, with one describing it as a cult. “There was something sinister at the heart of it”, she wrote: “The founder was a remarkable person but was leading a fantasy life—we were living in someone else’s fantasy.” While much about the Games Mistresses would shift across their decades of fronts and personas, disconnection from the everyday world was a constant theme. “We really, truly are not living in the same place as you”, one once wrote⁠; “I don’t like the modern world, and I don’t live in it”, Scarlett has said⁠. “We don’t concern ourselves with the present at all. We live in a little world inside our house… it’s a world apart, really, where we are.” Perhaps from this perspective, an interest in the transporting power of games, electronic or otherwise, becomes less difficult to understand.

“Decision Transformer: Reinforcement Learning via Sequence Modeling”, Chen et al 2021

“Decision Transformer: Reinforcement Learning via Sequence Modeling”, Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas et al (2021-06-02):

[online DT] We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT⁠. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.

Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite the simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym⁠, and Key-to-Door tasks.

Decision Transformer: autoregressive sequence modeling for RL: We take a simple approach: each modality (return, state, or action) is passed into an embedding network (convolutional encoder for images, linear layer for continuous states). The embeddings are then processed by an autoregressive transformer model, trained to predict the next action given the previous tokens using a linear output layer. Evaluation is also easy: we can initialize by a desired target return (eg. 1 or 0 for success or failure) and the starting state in the environment. Unrolling the sequence—similar to standard autoregressive generation in language models—yields a sequence of actions to execute in the environment.
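The data layout described above can be sketched without any neural network: compute the remaining return at each step (often called the return-to-go) and interleave it with states and actions. The function names and the tuple-based token format are illustrative assumptions; the embedding networks and the Transformer itself are omitted.

```python
def returns_to_go(rewards):
    """R_t = sum of rewards from step t to the end of the trajectory."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return rtg[::-1]


def interleave(rewards, states, actions):
    """Flatten a trajectory into (return, state, action) tokens in time order."""
    seq = []
    for R, s, a in zip(returns_to_go(rewards), states, actions):
        seq += [("return", R), ("state", s), ("action", a)]
    return seq


def initial_context(target_return, start_state):
    """At evaluation time, start from the desired return and the first state."""
    return [("return", target_return), ("state", start_state)]


train_seq = interleave([0, 0, 1], ["s0", "s1", "s2"], ["a0", "a1", "a2"])
eval_ctx = initial_context(1.0, "s0")
```

A trained model would be asked to predict the next action token after `eval_ctx`; after acting, the observed reward is subtracted from the target return and the new return and state are appended before predicting again.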

Sequence modeling as multitask learning: One effect of this type of modeling is that we perform conditional generation, where we initialize a trajectory by inputting our desired return. Decision Transformer does not yield a single policy; rather, it models a wide distribution of policies. If we plot average achieved return against the target return of a trained Decision Transformer, we find distinct policies are learned that can reasonably match the target, trained only with supervised learning. Furthermore, on some tasks (such as Q✱bert and Seaquest), we find Decision Transformer can actually extrapolate outside of the dataset and model policies achieving higher return!

[Paper⁠; Github⁠; see also MuZero⁠, “goal-conditioned” or “upside-down reinforcement learning” (such as “morethan” prompting), Shawn Presser’s GPT-2 chess model (& Cheng’s almost-DT chess transformer), value equivalent models, Ortega et al 2021 on ‘delusions’⁠. Simultaneous work at BAIR invents Decision Transformer as Trajectory Transformer⁠. Note that DT, being in the ‘every task is a generation task’ paradigm of GPT, lends itself nicely to preference learning simply by formatting human-ranked choices of a sequence.

The simplicity of this version of the control codes or ‘inline metadata trick’ (eg. CTRL) means it can be reused with almost any generative model where some measure of quality or reward is available (even if only self-critique like likelihood of a sequence eg. in Meena-style best-of ranking or inverse prompting): you have an architecture floorplan DALL·E? Use standard architecture software to score plans by their estimated thermal efficiency/​sunlight/​etc; prefix these scores, retrain, & decode for good floorplans maximizing thermal efficiency/​sunlight. You have a regular DALL·E? Sample n samples per prompt, CLIP-rank the images, prefix their ranking, retrain… No useful CLIP? Then use the CogView self-text-captioning trick to turn generated images back into text, rank by text likelihood… Choose Your Own Adventure AI Dungeon game-tree? Rank completions by player choice, feed back in for preference learning… All of the work is done by the data, as long as the generative model is smart enough.]
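A minimal sketch of the prefixing step in this trick: score each sample, convert the rank into a coarse quality bin, and prepend it as inline metadata before retraining. The `<quality=N>` tag format, the bin count, and the length-based scorer are hypothetical placeholders for whatever quality measure is available.

```python
def prefix_with_rank(samples, score, n_bins=3):
    """Tag each sample with a coarse quality bin derived from its score rank."""
    ranked = sorted(samples, key=score, reverse=True)
    tagged = []
    for i, sample in enumerate(ranked):
        bin_id = i * n_bins // len(ranked)  # 0 is the best bin
        tagged.append(f"<quality={bin_id}> {sample}")
    return tagged


# Hypothetical scorer: string length stands in for a CLIP score, a likelihood, etc.
data = prefix_with_rank(["aa", "a", "aaaa", "aaa", "aaaaa", "aaaaaa"], score=len)
```

After retraining a generative model on `data`, decoding would start from the prompt `<quality=0> ` to request samples resembling the best-scored bin.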

“AI Dungeon Public Disclosure Vulnerability Report—GraphQL Unpublished Adventure Data Leak”, AetherDevSecOps 2021

“AI Dungeon Public Disclosure Vulnerability Report—GraphQL Unpublished Adventure Data Leak”, AetherDevSecOps (2021-04-28):

On April 18th, I discovered a vulnerability in the AI Dungeon GraphQL API that allowed unpublished adventures [games], unpublished scenarios [settings], and unpublished posts [stories] to be leaked. These resources could be read in bulk, at a rate of ~1000 requests per minute. Unfortunately, this is, in fact, the second time I have discovered this exact vulnerability. The first time, the issue was reported and fixed, but after finding it again, I can see that simply reporting the issue was a mistake…There was nothing preventing me from collecting more data, but what was gathered seemed sufficient to demonstrate the vulnerability fully—adventures dating all the way back to Dec 16th, 2019 were at risk.

A Surprising Observation: Looking at the resulting aggregated data led to a surprising observation. There were a lot of lewd or otherwise nsfw user action fragments—way more than I had anticipated. As a bit of followup analysis, I checked what percentage of adventures had explicitly lewd (18+) actions, and what percentage had nsfw actions.

The results are… surprising, to say the least. Out of the 188k adventures (and 3.9M user actions) analyzed:

  • 87.3k (46.3% of all adventures sampled) are NSFW and…
  • 59.1k (31.4% (!) of all adventures sampled) are explicit (18+)

Autoincrementing IDs: Autoincrementing IDs are, in my opinion, by far the biggest issue. They allow someone to read all resources, simply by starting from 1 and counting upwards. Had these not been used, a secondary vulnerability would have needed to be discovered alongside the vote vulnerability in order to exploit either one. Otherwise, there would be no way to figure out what the private adventure IDs are, even if they could be read through a vulnerability. I recommend deprecating and removing autoincrementing IDs completely, as soon as possible. After that point, leaking or publishing a non-UUID ID should be treated as a security issue in itself.

Also note: autoincrementing IDs allow anyone to trivially figure out roughly how many of each resource exist. For AI Dungeon (as of April 19th), these would be:

  • ~1B actions
  • ~50M adventures
  • ~800K scenarios
  • ~250K comments—10% on posts, 25% as nested comments, 50% on scenarios, 5% on adventures, 10% on “story” posts
  • ~20K posts
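The difference between the two ID schemes can be illustrated with a small sketch. The two store classes are hypothetical and say nothing about how AI Dungeon's backend is actually written; only `uuid.uuid4` is a real standard-library call.

```python
import uuid


class SequentialStore:
    """IDs are 1, 2, 3, ... so one known ID reveals all the others."""

    def __init__(self):
        self.rows, self.next_id = {}, 1

    def add(self, row):
        rid = self.next_id
        self.rows[rid] = row
        self.next_id += 1
        return rid


class UuidStore:
    """IDs are random UUIDs, which cannot be enumerated by counting."""

    def __init__(self):
        self.rows = {}

    def add(self, row):
        rid = str(uuid.uuid4())
        self.rows[rid] = row
        return rid


seq, rnd = SequentialStore(), UuidStore()
for title in ["a", "b", "c"]:
    last_seq = seq.add(title)
    last_rnd = rnd.add(title)

# Seeing the newest sequential ID gives both the total count and every valid ID.
guessable = list(range(1, last_seq + 1))
```

Random IDs only remove the enumeration and counting problems; an access check on each read is still needed, since an ID can leak by other routes.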

“Collaborative Storytelling With Large-scale Neural Language Models”, Nichols et al 2020

“Collaborative Storytelling with Large-scale Neural Language Models”, Eric Nichols, Leo Gao, Randy Gomez (2020-11-20):

Storytelling plays a central role in human socializing and entertainment. However, much of the research on automatic storytelling generation assumes that stories will be generated by an agent without any human interaction. In this paper, we introduce the task of collaborative storytelling, where an artificial intelligence agent and a person collaborate to create a unique story by taking turns adding to it. We present a collaborative storytelling system which works with a human storyteller to create a story by generating new utterances based on the story so far. We constructed the storytelling system by tuning a publicly-available large scale language model on a dataset of writing prompts and their accompanying fictional works. We identify generating sufficiently human-like utterances to be an important technical issue and propose a sample-and-rank approach to improve utterance quality. Quantitative evaluation shows that our approach outperforms a baseline, and we present qualitative evaluation of our system’s capabilities.
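In outline, a sample-and-rank step draws several candidate continuations and keeps the one a separate scorer prefers. The sketch below assumes this generic form; the generator and the length-based scorer are toy stand-ins, and the paper's actual ranking model is not reproduced here.

```python
import random


def sample_and_rank(generate, score, context, n=5, seed=0):
    """Draw n candidate utterances and return the one the scorer ranks highest."""
    rng = random.Random(seed)
    candidates = [generate(context, rng) for _ in range(n)]
    return max(candidates, key=lambda c: score(context, c))


# Hypothetical stand-ins for a language model and a learned ranker.
def toy_generate(context, rng):
    return rng.choice(["The end.", "Then it rained.", "Suddenly, a dragon appeared!"])


def toy_score(context, candidate):
    return len(candidate)  # pretend that longer utterances are better


reply = sample_and_rank(toy_generate, toy_score, "Once upon a time,")
```

Raising `n` trades more generation cost for a better chance that at least one candidate scores well.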

“GPT-3 Creative Fiction”, Branwen 2020

GPT-3: “GPT-3 Creative Fiction”, Gwern Branwen (2020-06-19):

Creative writing by OpenAI’s GPT-3 model, demonstrating poetry, dialogue, puns, literary parodies, and storytelling. Plus advice on effective GPT-3 prompt programming & avoiding common errors.

I continue my AI poetry generation experiments with OpenAI’s 2020 GPT-3, which is 116× larger, and much more powerful, than the 2019 GPT-2⁠. GPT-3, however, is not merely a quantitative tweak yielding “GPT-2 but better”—it is qualitatively different, exhibiting eerie runtime learning capabilities allowing even the raw model, with zero finetuning, to “meta-learn” many textual tasks purely by example or instruction. One does not train or program GPT-3 in a normal way, but one engages in dialogue and writes prompts to teach GPT-3 what one wants.

Experimenting through the OpenAI Beta API in June 2020, I find that GPT-3 does not just match my finetuned GPT-2-1.5b-poetry for poem-writing quality, but exceeds it, while being versatile in handling poetry⁠, Tom Swifty puns⁠, science fiction, dialogue like Turing’s Turing-test dialogue⁠, literary style parodies… As the pièce de résistance, I recreate Stanislaw Lem’s Cyberiad’s “Trurl’s Electronic Bard” poetry using GPT-3. (Along the way, I document instances of how the BPE text encoding unnecessarily damages GPT-3’s performance on a variety of tasks, how to best elicit the highest-quality responses, common errors people make in using GPT-3, and test out GPT-3’s improvements in NN weak points like logic or commonsense knowledge.)

GPT-3’s samples are not just close to human level: they are creative, witty, deep, meta, and often beautiful. They demonstrate an ability to handle abstractions, like style parodies, I have not seen in GPT-2 at all. Chatting with GPT-3 feels uncannily like chatting with a human. I was impressed by the results reported in the GPT-3 paper, and after spending a week trying it out, I remain impressed.

This page records GPT-3 samples I generated in my explorations, and thoughts on how to use GPT-3 and its remaining weaknesses⁠. I hope you enjoy them even a tenth as much as I enjoyed testing GPT-3 and watching the completions scroll across my screen.

“Measuring the Algorithmic Efficiency of Neural Networks”, Hernandez & Brown 2020

“Measuring the Algorithmic Efficiency of Neural Networks”, Danny Hernandez, Tom B. Brown (2020-05-08):

Three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. Algorithmic progress has traditionally been more difficult to quantify than compute and data. In this work, we argue that algorithmic progress has an aspect that is both straightforward to measure and interesting: reductions over time in the compute needed to reach past capabilities. We show that the number of floating-point operations required to train a classifier to AlexNet-level performance on ImageNet has decreased by a factor of 44× between 2012 and 2019. This corresponds to algorithmic efficiency doubling every 16 months over a period of 7 years. By contrast, Moore’s Law would only have yielded an 11× cost improvement. We observe that hardware and algorithmic efficiency gains multiply and can be on a similar scale over meaningful horizons, which suggests that a good model of AI progress should integrate measures from both.
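The arithmetic behind these figures can be checked in a few lines. The 24-month doubling period used for Moore's Law is an assumption of this sketch; the abstract does not state which period it uses.

```python
import math


def doubling_time_months(factor, years):
    """Months per doubling implied by a total improvement factor over a period."""
    return 12 * years / math.log2(factor)


def gain_from_doubling(months_per_doubling, years):
    """Total improvement factor implied by a fixed doubling period."""
    return 2 ** (12 * years / months_per_doubling)


algo_months = doubling_time_months(44, 7)  # about 15.4, reported as ~16 months
moore_gain = gain_from_doubling(24, 7)     # about 11.3, reported as 11x
combined = 44 * moore_gain                 # the two gains multiply
```

Under these assumptions a 44× reduction over 7 years implies a doubling roughly every 15 to 16 months, and a 24-month doubling period gives about 11× over the same span, which matches the quoted figures.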

“GPT-2 Preference Learning for Music Generation”, Branwen 2019

GPT-2-preference-learning: “GPT-2 Preference Learning for Music Generation”, Gwern Branwen (2019-12-16):

Experiments with OpenAI’s ‘preference learning’ approach, which trains a NN to predict global quality of datapoints, and then uses reinforcement learning to optimize that directly, rather than proxies. I am unable to improve quality, perhaps due to too-few ratings.

Standard language generation neural network models, like GPT-2⁠, are trained via likelihood training to imitate human text corpuses. Generated text suffers from persistent flaws like repetition, due to myopic generation word-by-word, and cannot improve on the training data because they are trained to predict ‘realistic’ completions of the training data.

A proposed alternative is to use reinforcement learning to train the NNs, to encourage global properties like coherence & lack of repetition, and potentially improve over the original corpus’s average quality. Preference learning trains a reward function on human ratings, and uses that as the ‘environment’ for a blackbox DRL algorithm like PPO⁠.

OpenAI released a codebase implementing this dual-model preference learning approach for textual generation, based on GPT-2. Having previously used GPT-2 for poetry & music generation⁠, I experimented with GPT-2 preference learning for unconditional music and poetry generation.

I found that preference learning seemed to work better for music than poetry, and seemed to reduce the presence of repetition artifacts, but the results, at n ≈ 7,400 ratings compiled over 23 iterations of training+sampling November 2019–January 2020, are not dramatically better than alternative improvements like scaling up models or more thorough data-cleaning or more stringent sample curation. My blind ratings using n ≈ 200 comparisons showed no large advantage for the RL-tuned samples (winning only 93 of 210 comparisons, or about 44%).

This may be due to insufficient ratings, bad hyperparameters, or not using samples generated with common prefixes, but I suspect it’s the first, as some NLP tasks in Ziegler et al 2019 required up to 60k ratings for good performance, and the reward model appeared to achieve poor performance & succumb to adversarial examples easily.

Working with it, I suspect that preference learning is unnecessarily sample-inefficient & data-inefficient, and that the blackbox reinforcement learning approach is inferior to directly using the reward model to optimize text samples, and propose two major architectural overhauls: have the reward model directly model the implied ranking of every datapoint, and drop the agent model entirely in favor of backprop-powered gradient ascent which optimizes sequences to maximize the reward model’s output⁠.

“AI Dungeon 2 Colab Notebook”, Walton 2019

“AI Dungeon 2 Colab notebook”, Nick Walton (2019-12-14):

AI Dungeon 2 is a completely AI-generated text adventure built with OpenAI’s largest GPT-2 model. It’s a first-of-its-kind game that lets you enter any action you can imagine and reacts to it.

What is this?

Google Colab is a way to experience machine learning for free. Google provides GPUs that you can run code on. Because this game exploded, however, Google likely won’t be able to allow free usage of it for AI Dungeon for very long. We are almost done making an app version of the game where you will be able to play AI Dungeon 2. Until that’s released you can still play the game here.

Main mirrors of AI Dungeon 2 are currently down due to high download costs.

We are using BitTorrent as a temporary solution to host game files and keep this game alive. It’s not fast, but it’s the best we’ve got right now.

If you want to help, the best thing you can do is to download this torrent file with game files and seed it indefinitely to the best of your ability. This will help new players download this game faster, and discover the vast worlds of AI Dungeon 2!

  • Follow @nickwalton00 on Twitter for updates on when it will be available again.
  • Support AI Dungeon 2 on Patreon to help me to continue improving the game with all the awesome ideas I have for its future!

How to play

  1. Click “Tools” -> “Settings…” -> “Theme” -> “Dark” (optional but recommended)
  2. Go to Main Game section below
  3. Run Install block
  4. Run Download Model block
  5. It will then take a couple minutes to boot up as the model is downloaded and loaded onto the GPU.
  6. Run the game block
  7. If you have questions about getting it to work then please go to the GitHub repo to get help.

“AI Dungeon 2”, Walton 2019

“AI Dungeon 2”, Nick Walton (2019-12):

[AI Dungeon 2 is a project which trains GPT-2-1.5b on logs from text adventure games; when used interactively by a human, it “plays RPG games” with you, but because it is powered by GPT-2-1.5b, it is immensely flexible and can cope (to some degree) with almost any input, producing bizarre, hilarious, or surprisingly logical sequences of adventures.

It became popular overnight, crushing Walton with GCP egress bandwidth bills, and has been turned into an app and community to support distribution and development. See also the subreddit and story examples like “My Musical Troupe of Orcs Uses Music to Advance Orc Rights”.]

“AI Dungeon 2: My Musical Troupe of Orcs Uses Music to Advance Orc Rights”, Walton 2019

“AI Dungeon 2: My Musical Troupe of Orcs Uses Music to Advance Orc Rights”, Nick Walton (2019-11-26):

[Demonstration dialogue of interacting with a GPT-2-1.5b trained on text adventures/​RPGs. The player chooses to join a band of orcs as a musician and tries to steer the game towards orc rights, with moderate success, reaching the Emperor himself.]

In the first AI Dungeon, we created and deployed a deep learning generated text adventure using OpenAI’s 124M parameter GPT-2 model. In the interest of computational cost, possible actions and their results were generated and given to the player to choose from.

In AI Dungeon 2 we do away with pregenerated actions and allow the user to enter any action. The model then continues generating the story resulting from that action. We also upgrade the size of our model to OpenAI’s largest 1.5B parameter model and fine-tune it on a collection of text adventures obtained from […]. Following the example of the Salesforce CTRL model, we also modified the sampling of our model to add a penalty to already generated words, which reduces the repetition issues that GPT-2 has and enables a lower temperature setting.
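A CTRL-style repetition penalty can be sketched as a discount on the logits of tokens that were already generated, applied before temperature and softmax. The penalty and temperature values below are illustrative, and the sign-aware treatment of negative logits is a common implementation variant assumed here, not something stated in the source.

```python
import math
import random


def penalized_probs(logits, generated, penalty=1.2, temperature=0.7):
    """Softmax over logits after discounting tokens that were already generated."""
    adjusted = []
    for token, x in enumerate(logits):
        if token in generated:
            # Push the logit toward "less likely" whatever its sign.
            x = x / penalty if x > 0 else x * penalty
        adjusted.append(x / temperature)
    top = max(adjusted)
    exps = [math.exp(a - top) for a in adjusted]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, generated, rng, **kwargs):
    """Draw one token id from the penalized distribution."""
    probs = penalized_probs(logits, generated, **kwargs)
    return rng.choices(range(len(logits)), weights=probs)[0]


probs = penalized_probs([2.0, 2.0, -1.0], generated={0})
token = sample([2.0, 2.0, -1.0], {0}, random.Random(0))
```

Tokens 0 and 1 start with equal logits, but token 0 has already been generated, so it ends up with the lower probability. Because repeats are suppressed directly, a lower temperature can be used without the output collapsing into loops.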

When playing AI Dungeon 2 the player can choose from several possible settings and characters. These choices generate a starting prompt and context sentence. The context sentence is always included as input to the model to help keep it grounded to the chosen setting while the starting prompt is the beginning of the story.
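One way to keep the context sentence always present is to reserve room for it and trim only the oldest story text. The sketch below assumes that approach; the character budget (standing in for a token limit), the `> ` action marker, and the function name are illustrative choices, not details confirmed by the post.

```python
def build_input(context, story_so_far, action, max_chars=200):
    """Always keep the context sentence; drop the oldest story text if too long."""
    tail = story_so_far + "\n> " + action + "\n"
    budget = max_chars - len(context) - 1  # 1 for the newline after the context
    if len(tail) > budget:
        tail = tail[-budget:] if budget > 0 else ""
    return context + "\n" + tail


context = "You are a knight in the kingdom of Larion."
model_input = build_input(context, "x" * 500, "look around", max_chars=120)
```

However long the story grows, the model input still begins with the context sentence and ends with the player's latest action.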

[Dialogue sample:]

…After the Gnoll War ends, the orcs decide to hold a big celebration. You immediately organize concert to entertain everyone. It goes well and you even win a few awards for your work. However, you still aren’t satisfied and you decide to do something else.

> write plays for the orcs to perform in

You write a series of plays based on the stories of the orcs. These works are performed throughout the Empire and they are very popular. You also take an interest in the orcs’ future and begin to study their language and culture. In time you even develop a liking for the orcs themselves.

> found a human-orc friendship organization

While you were studying the orcs, you came across a group called the Human-Orc Friendship Association (HOOFA). While they weren’t exactly like the Orcs, they had similar goals. You decided to form a separate chapter just for them.

> write some pamphlets support orc-human friendship

You wrote a pamphlet explaining what the HOOFA was and how it worked. You also explained that the orcs were the only race that could truly understand each other and that humans were inferior to the orcs.

> promote inter-racial marriage between humans and orcs

In order to promote inter-racial marriage between humans and orcs, you started a branch of the HOOFA called the Inter-Race Marriage Bureau (IRMB) which was located in the city of Kishak…

“GPT-2 Neural Network Poetry”, Branwen & Presser 2019

GPT-2: “GPT-2 Neural Network Poetry”, Gwern Branwen, Shawn Presser (2019-03-03):

Demonstration tutorial of retraining OpenAI’s GPT-2 (a text-generating Transformer neural network) on large poetry corpuses to generate high-quality English verse.

In February 2019, following up on my 2015–2016 text-generation experiments with char-RNNs⁠, I experiment with the cutting-edge Transformer NN architecture for language modeling & text generation. Using OpenAI’s GPT-2-117M (117M) model pre-trained on a large Internet corpus and nshepperd’s finetuning code, I retrain GPT-2-117M on a large (117MB) Project Gutenberg poetry corpus. I demonstrate how to train 2 variants: “GPT-2-poetry”, trained on the poems as a continuous stream of text, and “GPT-2-poetry-prefix”, with each line prefixed with the metadata of the PG book it came from. In May 2019, I trained the next-largest GPT-2, GPT-2-345M, similarly, for a further quality boost in generated poems. In October 2019, I retrained GPT-2-117M on a Project Gutenberg corpus with improved formatting, and combined it with a contemporary poem dataset based on Poetry Foundation’s website⁠.
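[The difference between the two training formats can be sketched as a small preprocessing step. The separator character, the use of a book ID as the metadata, and the dropping of blank lines are assumptions made for illustration; the summary above only says that each line is prefixed with metadata of its source book.]

```python
def prefix_lines(book_id, text, sep="|"):
    """Prefix every non-blank line of a poem with its source-book metadata.

    `book_id` and `sep` are hypothetical choices for the metadata format.
    """
    return "\n".join(
        f"{book_id}{sep}{line}"
        for line in text.splitlines()
        if line.strip()  # assumption: blank lines are dropped
    )

def continuous_stream(poems):
    """The unprefixed variant: poems concatenated as one stream of text."""
    return "\n".join(poems)
```

[With a prefix on every line, the same metadata can later be supplied as a prompt to steer generation toward a particular book’s style.]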

With just a few GPU-days on 1080ti GPUs, GPT-2-117M finetuning can produce high-quality poetry which is more thematically consistent than my char-RNN poems, capable of modeling subtle features like rhyming, and sometimes even a pleasure to read. I list the many possible ways to improve poem generation and further approach human-level poems. For the highest-quality AI poetry to date, see my followup pages, “GPT-3 Creative Writing”⁠/​“GPT-3 Non-Fiction”⁠.

For anime plot summaries, see TWDNE⁠; for generating ABC-formatted folk music, see “GPT-2 Folk Music” & “GPT-2 Preference Learning for Music and Poetry Generation”⁠; for playing chess, see “A Very Unlikely Chess Game”⁠; for the Reddit comment generator, see SubSimulatorGPT-2⁠; for fanfiction, the Ao3⁠; and for video games, the walkthrough model⁠. For OpenAI’s GPT-3 followup, see “GPT-3: Language Models are Few-Shot Learners”⁠.

“This Waifu Does Not Exist”, Branwen 2019

TWDNE: “This Waifu Does Not Exist”, Gwern Branwen (2019-02-19):

I describe how I made the website (TWDNE) for displaying random anime faces generated by StyleGAN neural networks, and how it went viral.

Generating high-quality anime faces has long been a task neural networks struggled with. The invention of StyleGAN in 2018 has effectively solved this task and I have trained a StyleGAN model which can generate high-quality anime faces at 512px resolution. To show off the recent progress, I made a website, “This Waifu Does Not Exist” for displaying random StyleGAN 2 faces. TWDNE displays a different neural-net-generated face & plot summary every 15s. The site was popular and went viral online, especially in China. The model can also be used interactively for exploration & editing in the Artbreeder online service⁠.

TWDNE faces have been used as screensavers, user avatars, character art for game packs or online games, painted watercolors, uploaded to Pixiv, given away in streams, and used in a research paper (Noguchi & Harada 2019). TWDNE results also helped inspire Sizigi Studio’s online interactive waifu GAN, Waifu Labs, which generates even better anime faces than my StyleGAN results.

“These Maps Reveal the Hidden Structures of Choose Your Own Adventure Books: If You Decide to See More, Click on This Story”, Laskow 2017

“These Maps Reveal the Hidden Structures of Choose Your Own Adventure Books: If you decide to see more, click on this story”, Sarah Laskow (2017-06-13):

The last installment of the original “Choose Your Own Adventure” series came out in 1998, but since 2004, Chooseco⁠, founded by one of the series’ original authors, R. A. Montgomery, has been republishing classic volumes, as well as new riffs on the form of interactive fiction that seemed ubiquitous in the 1980s and ’90s. The new editions also carry an additional feature—maps of the hidden structure of each book.

Tattoo of Death, Choose Your Own Adventure #22 (All maps courtesy of ChooseCo)

For years, fans have been creating visualizations of the forking structures of “Choose Your Own Adventure” books. Often, they’re interested in the types of outcomes at the end of each path. One map labels each ending as “new life, return home, or death”, and another separates them into “cliffhanger, solution, or death.” Christian Swinehart’s extensive graphical analysis of the books labels the endings as “great, favorable, mediocre, disappointing, or catastrophic.”

…Mapping the bones of the books can have other purposes, too. Nick Montfort, a poet and professor at the Massachusetts Institute of Technology who studies interactive fiction, has a habit of asking people what they know about “Choose Your Own Adventure” books. “They often say, ‘You have two choices after every page’”, he says. “That’s not true. Sometimes you have one choice. Sometimes you have more than two. When you show the maps, you can see that these books don’t look exactly the same.” The older volumes, for instance, tend to have more endings than the later ones, and three of the oldest—Journey Under the Sea, Space and Beyond, and By Balloon to the Sahara—have 42 endings each, more than any other books in the series…In just about every case, it can be surprising how a simple choice leads you down a complex path. In By Balloon to the Sahara, you’re in a balloon and are presented with a choice on the very first page. Storm clouds are on the horizon. Choice 1: “If you act now, you can release gas from the balloon and land before the storm overtakes you.” Choice 2: “Perhaps the storm will pass quickly. Maybe you can ride it out.” That’s just the beginning, since this book has the most decision points—48—of the series.

…There is yet another possibility in these nonlinear books: hidden endings. Inside UFO 54-40 has a hidden ending that’s only available to a reader who ignores the decisions and flips to it without prompting. But it’s there. “It’s a two-page, big illustration of this city”, says Montfort, the MIT professor. “The land of Ultima. As you flip through the book, even if you’re being very obedient, you can’t help but wonder what this text is.”

…Maps like the ones Chooseco created can reveal the structure of a book that gives readers choices, but though the multiple story lines are part of what makes the series so fun, they’re not the only thing that defines it. The meat of “Choose Your Own Adventure” stories are gender-neutral romps in worlds where there are no obviously right or wrong moral choices. There’s danger around every bend, usually in the form of something like space monkeys, malicious ghosts, or conniving grown-ups. Even with a map, there’s no way to find out what really comes next without making a choice and flipping to another page.

“Complex Game Worlds, Simple Interfaces”, Hracek 2015

“Complex game worlds, simple interfaces”, Filip Hracek (2015-08-25):

[AI game paradigm: highly-complicated simulations but with AI decision support, like providing the top n ranked choices.]

The whole point of “egamebook” is to allow for complex game worlds that are controlled by a series of simple choices. By simple, I don’t mean “easy” or “without complex consequences”. I just mean they’re not a complicated interface. They’re a list of 2 to 5 options to choose from.

More generally, I’m interested in systems that present complex simulations in conversational form.

…In games, AI is generally written for the enemies. Some games have allies or wingmen, and those also need AI. In other words, AI is written for all agents in the game except for the player.

But that’s exactly what you want. You want to write your AI in a way that it can be applied to the player. Or, more precisely, to the User Interface (UI).

…This is what I tried to do this past weekend when I entered the Ludum Dare #33 competition. (It’s a challenge to create a game in one weekend from scratch, solo.) I used the (still very much incomplete) egamebook library as the engine, and my fuzzy logic library as the basis of the AI. I made a little prototype called Loch Ness⁠.

The game is, of course, very flawed. It does receive quite favorable reviews, but there’s only so much you can do in 2 days, especially if you strive for a strategy game. For me, though, the biggest success is that it only gives you a few options at a time, and they’re not dumb, and you still play in a sandbox world.

The way I did this was simple, really. I wrote the AI code that scores different strategic moves according to their immediate desirability. (For example, moving troops from a well-supplied place to a place where they would starve receives a low score. Attacking an undefended enemy city receives a high score. And so on.) In traditional AI fashion, this code is then used by the opposing factions by scoring all possible moves and then picking the most desirable ones.

But—since I already have a mechanism to score moves—I can use the same thing for the player. I score all the possibilities, sort them, then pick the first few and bring them up as options.

This makes sure that you don’t need to pick from 100 or more options, most of which are irrelevant or dumb. But it still gives you the freedom of a simulated world.
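[The idea of reusing one move-scoring function for both the AI factions and the player’s menu can be sketched like this. It is an illustration under assumptions, not the egamebook code: the `Move` fields and the weighted-sum score are hypothetical stand-ins for the author’s fuzzy logic library.]

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Move:
    description: str
    supply_after: float    # hypothetical: 0 = troops starve, 1 = well supplied
    target_defense: float  # hypothetical: 0 = undefended target, 1 = fortress

def desirability(move):
    """Immediate desirability; a plain weighted sum, assumed for illustration."""
    return 0.5 * move.supply_after + 0.5 * (1.0 - move.target_defense)

def rank_moves(moves, score=desirability):
    """Score every possible move and sort from most to least desirable."""
    return sorted(moves, key=score, reverse=True)

def ai_choice(moves):
    """An opposing faction simply takes the best-scoring move."""
    return rank_moves(moves)[0]

def player_options(moves, n=3):
    """The player is shown only the top n moves as a short menu."""
    return rank_moves(moves)[:n]
```

[Since both callers go through `rank_moves`, any improvement to the scoring makes the enemy smarter and the player’s menu more relevant at the same time.]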

“48% of Americans Know What Gamebooks Are”, Hracek 2015

“48% of Americans know what gamebooks are”, Filip Hracek (2015-03-10):

I recently ran a miniature survey to gauge interest in egamebook and to find out more about the kind of people who might be interested in it…There were only 2 questions:

  1. Have you ever read a Choose-Your-Own-Adventure book or a gamebook?

    • 52%: No, I’ve never heard about those
    • 20%: No, but I’ve heard of them
    • 21%: Yes, long time ago
    • 5%: Yes, recently
  2. How does this prototype of a mobile e-gamebook look to you?

“/r/AIDungeon/”, Reddit 2022

“/r/AIDungeon/”, Reddit:

[Subreddit for sharing AI Dungeon 2 game transcripts and discussing bugs/​issues/​upgrades; FAQ⁠.]

The Sumerian Game


Deadline (video game)


AI Dungeon