Choose-Your-Own-Adventure AI Dungeon Games

Neural networks like GPT-2 power text adventure games where you can do anything; but they are too expensive. I propose that if we turn them into Choose Your Own Adventure hypertext games, they become feasible and enable entirely new gameplay.
NN, fiction, GPT
2021-06-06–2021-06-18 · finished · certainty: possible · importance: 4


A useful variation on AI Dungeon-style (AID) text games would be to turn them into shared public game trees of pre-generated options which the player selects from, Choose Your Own Adventure-style.

This can dramatically reduce costs as players spend most of their time reading cached output (rarely needing or wanting to generate brand-new output requiring a NN run), can increase quality as players collectively uprank the actions/outcomes which are highest-quality, and caters to newbies who don’t understand the power of NN-backed text games and flail around.

Revisiting AI Dungeon (AID) in the light of a year of GPT-3, I would like to propose a radical redesign of it based on the problems it has encountered. AID is one of the most interesting uses of the GPT-2 & GPT-3 neural network models; computer games, forever frustrated by their inability to offer worlds as arbitrarily complex and realistic as a human running games (which is effectively a collaborative fiction writing exercise), no matter how many verbs are added to the text adventure parser or how many 3D artists are hired, finally can handle anything a player can type in, and improvise with it. At its best, an AI Dungeon game really feels like simulating an entire world. Of course, as current NNs have many limitations (they are far from human-level intelligence, lack any kind of memory, are unable to plan, are not trained to create high-quality fiction, have weak common sense, etc), AID and its imitators typically do not deliver such a peak experience. In practice, AID, as run by Nick Walton’s startup Latitude as of April–June 2021, has been experiencing more serious practical problems relating to GPT-3 cost, and its use of the OpenAI API & OA’s mandatory moderation thereof.

AID Problems

AID and imitators are currently designed in the most straightforward way possible: a seed text, often unique or customized, and then text generation step by step after waiting for player text responses. This causes 3 major problems:

  1. cost: every turn invokes a full-cost call to a NN model. No matter what you do, this is always going to be expensive. This has devastating consequences on what players can be allowed to do and how often, how hard they must be monetized, how one is beholden to APIs, centralization requiring/enabling censorship, etc.
  2. quality: models like GPT-3 can generate stunningly good output when cherry-picked at a level like ‘best of 20’; the other 19, however, range from ‘meh’ to ‘atrocious’. Few people have the patience, or funds, to do 20 samples per action/outcome, however, because of #1. If one could, however, the results might be stunning.
  3. learning curve: the default experience for AID seems to be to open it up, type in a few sentences like “Hello, I am a human”, decide it’s boring, and quit. It doesn’t show you what it can do; one has to elicit it by demanding cool stuff, of the sort most people quite understandably expect computers to be completely incapable of. (How can a computer be a good D&D DM when it also takes 20 seconds to load a social media website, which then is broken?)

Stuck

Fixing these AID problems is intrinsically difficult. The AID model is inherently extremely expensive, and tweaks like model sparsification/distillation can only deliver so much in the way of cost savings before the quality goes to hell. The cost of GPT-3 is already a near-fatal problem for Latitude; progress in deep learning is fast, but we still aren’t going to see GPT-3-level models on low-end GPUs for several years to come (which is a long time to wait, however amusing these problems may seem in retrospect), and whatever cost/efficiency gains algorithms & hardware give us, players may respond to by escalating demands. (It was not so long ago that people were dreaming of access to GPT-2-1.5b; running & finetuning that or models up to 9b parameters is now ordinary, but GPT-3 has spoiled people for them. Once GPT-3-level NNs become available on home computers, players will doubtless clamor for voice-synthesis/3D-image/human-level interactions, much as there is still an audience for ever more realistic 3D FPS games, despite regular predictions that ‘the latest PS1 [or PS2 or PS5 or Crysis] game has finally exhausted the possibilities of incremental gains in fancy graphics and now developers will have to focus on gameplay’.)

Rethinking Game Trees

To solve this, we need to rethink AID entirely; AID is the first and simplest way to do a text game with NN models, but it may not be the best.

Games are often thought of in terms of ‘trees’ defined by actions and possible outcomes, like in chess or Go. Text generation is also a tree—just a wide tree. Typically, in GPT-related apps, this is hidden from the player: you just see the blinking cursor, not a tree of possible text completions from there. Of this tree, the player sees only one branch; he can erase outputs and rewind to an earlier branch, but still can only see his current branch, and is blind to all the options. Anyway, it would be too difficult and expensive to show the player all of the branches, because that requires generating many more samples, only to throw them away, likely unseen.

But what if we embraced the tree hidden behind the AID interface? What if we didn’t throw them away? What if we kept all those branches we so expensively generated, and shared them with other players? How would we do that?

Choose Your Own Adventure

By playing not ‘neural net D&D’ but ‘neural net Choose Your Own Adventure’ (CYOA) text games instead: a game where instead of ‘generating’ the next turn, each page instead gives you a few good options to choose.

In a CYOA version of AID, everyone starts with a given scenario, which describes a situation and lists, say, 5 possible actions. This is the root of the tree. Each of these actions is pre-generated by a model; a player chooses one to go to the next scene/node, and its outcome is fetched from the database to show to the player; only if the outcome hasn’t been generated before & is not cached does the NN have to generate it. Then another set of 5 actions is listed (also usually pre-generated). And so on. Unlike regular CYOAs, the adventure is never ending, because the NN just keeps expanding out terminal nodes on demand. And while it may sound complex, it is easy to implement.[1]
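The fetch-or-generate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Node`, `GameTree`, and `fake_generate` are hypothetical names, an in-memory list stands in for the database, and `fake_generate` stands in for the real (expensive) NN call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One page of the adventure: an outcome plus the actions offered next."""
    outcome: str
    actions: List[str]
    # Maps an action index to the ID of the child node, once generated.
    children: Dict[int, int] = field(default_factory=dict)


class GameTree:
    """Shared tree; the NN is only invoked when a player steps off the cache."""

    def __init__(self, generate: Callable[[str, str], Node], root: Node):
        self.generate = generate          # hypothetical wrapper around the NN
        self.nodes: List[Node] = [root]   # node ID = index (stand-in for a DB)
        self.nn_calls = 0

    def choose(self, node_id: int, action: int) -> int:
        """Follow an action from a node, generating the child only if missing."""
        node = self.nodes[node_id]
        if action not in node.children:   # cache miss: pay for one generation
            child = self.generate(node.outcome, node.actions[action])
            self.nn_calls += 1
            self.nodes.append(child)
            node.children[action] = len(self.nodes) - 1
        return node.children[action]


def fake_generate(outcome: str, action: str) -> Node:
    """Placeholder for the NN: returns an outcome and 5 follow-up actions."""
    return Node(outcome=f"You {action}.",
                actions=[f"option {i}" for i in range(5)])


root = Node("A space-octopus boards your ship.",
            ["throw the anchor at it", "mind-meld with it"])
tree = GameTree(fake_generate, root)
```

If two players both pick action 0 at the root, the second call to `tree.choose(0, 0)` returns the same node ID without touching the NN; only the first player paid for the generation.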

Switching to a CYOA model solves all 3 problems with AID, and is not merely a cost-saving crutch coming at the expense of game quality—a CYOA approach enables many things that are hard or impossible with the default AID interface.

CYOA Advantages

Newbies

A newbie firing it up for the first time is greeted by a UI that is as simple as can be, working just like a hypertext web browser, and immediately sucking them in with interesting options and outcomes on every page, more comprehensive, creative, and high-quality than any normal single-authored CYOA could ever hope to be, since no single author can compete with the hivemind of AI+human-community; the adventure never ends, it just slows down a bit when you go out of cache. There is no ‘cold start’ problem like there is with AID. No one will ever ask “but what do I do” of a CYOA. (What do you do? You click to find out whether throwing the anchor at the space-octopus works or if you should instead try to mind-meld with it to negotiate a peace-treaty between the space-octopii & humanity, that’s what you do. And then you discover the advanced ways to use it.)

Amortizing Generation Cost

As players play through the game tree, the actions/outcomes are cached permanently, and the tree fills out. Soon, players are spending most of their time hitting cached pages. Such interactions are ~100% free: they’re just a few kilobytes fetched from a database. The more that players play, the cheaper each play/player becomes. Because of this cheapness, it is easy to scale up to many players, easy to afford high-quality rather than low-quality models, etc. (The use of a few ranked choices is critical: if regular freeform text inputs were allowed, they would probably not be all that much more interesting than curated choices, but the ‘branching factor’ would be like tens of thousands per node, and you can forget about caching anything.)
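To make the branching-factor point concrete, here is a toy simulation, not a measurement. It assumes players pick uniformly at random among the available actions (real players would cluster on popular choices, which should only raise the hit rate), and the player counts, depth, and function names are all made up for illustration.

```python
import random
from typing import Set, Tuple


def cache_hit_rate(players: int, depth: int, branching: int, seed: int = 0) -> float:
    """Fraction of turns served from cache when players walk a shared tree."""
    rng = random.Random(seed)
    cache: Set[Tuple[int, ...]] = set()   # every path that has been generated
    hits = turns = 0
    for _ in range(players):
        path: Tuple[int, ...] = ()
        for _ in range(depth):
            path += (rng.randrange(branching),)  # assumed: uniform random pick
            turns += 1
            if path in cache:
                hits += 1
            else:
                cache.add(path)           # cache miss: one NN generation
    return hits / turns


def cost_per_turn(hit_rate: float, cost_per_generation: float) -> float:
    """Average cost of a turn if cached turns are treated as free."""
    return (1.0 - hit_rate) * cost_per_generation


# 5 curated choices per node vs. a freeform-like 10,000 possible inputs.
curated = cache_hit_rate(players=2000, depth=3, branching=5)
freeform = cache_hit_rate(players=2000, depth=3, branching=10_000)
print(f"curated hit rate:  {curated:.3f}")
print(f"freeform hit rate: {freeform:.3f}")
```

With 5 choices per node there are only 5 + 25 + 125 = 155 distinct nodes in the first three levels, so at most 155 of the 6,000 simulated turns can ever miss the cache; with a freeform-sized branching factor, almost every turn lands on a node nobody has visited before.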

CYOA vs AID is not a hard dichotomy, but a spectrum of novelty vs cost.

A CYOA can interpolate anywhere between the 2 extremes: the CYOA extreme is 100% cached, pre-generated, and static; it is free to play, and you can host hundreds of thousands of players on a single dedicated server for a few bucks a month of hosting costs. The other extreme is the default AID extreme: there is zero caching, everything is generated on demand by expensive NNs just for that turn, and immediately thrown away; highly active players can easily rack up scores or hundreds of dollars in OA API costs a month, and startups can burn through millions in venture capital catering to relatively small playerbases, with few options for how to cut costs (OA isn’t going to cut the API bill, so what do you do? Degrade the model players paid for? Between a rock & a hard place there, and hell hath few furies like a gamer scorned). In between, one has a mix of a cached tree and occasional generation events.

CYOA operators can adjust their spot on the spectrum by tweaking who can generate what, when, and at what cost. For example, perhaps there is a ‘free tier’ of players who are unable to generate at all, and can only traverse existing nodes; or perhaps free tier players can generate at will, but only using the smallest cheapest lowest-quality model (in which case it will be extremely unlikely that they’ll generate a better choice than the community has curated using the highest-quality model’s suggestions, and they won’t generate unless they have to). If costs are too low, generations can be upgraded, new options generated to be added to popular nodes, players can be directed to interesting but under-explored subtrees to start fleshing them out, one’s idle GPU can go around filling out missing nodes during downtime just to be prepared for future players… ‘Whales’ are critical to game profitability, and they can be catered to in many ways, far more easily than in AID (which is already maximally expensive): perhaps they get 20 options instead of 5 generated for them, or perhaps they get dedicated GPUs filling out nodes in advance, so they never have to waste a few seconds waiting & twiddling their thumbs (ie if they are at node depth X, then children nodes at depth X+1 & depth X+2 are generated in the background for them, rather than waiting for them to make a choice and only then generating a node).
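The background fill for whales (depth X+1 & X+2) could look something like the following sketch. Everything here is an assumption made for illustration: nodes are keyed by their path of action indices from the root, a plain dict stands in for the database, and `generate` is a hypothetical stand-in for the NN call.

```python
from collections import deque
from typing import Callable, Dict, Tuple

Path = Tuple[int, ...]  # sequence of action indices from the root


def prefetch(cache: Dict[Path, str], here: Path,
             generate: Callable[[Path], str],
             branching: int = 5, lookahead: int = 2) -> int:
    """Fill in every missing node up to `lookahead` levels below `here`.

    Returns the number of NN calls that were needed.
    """
    calls = 0
    queue = deque([here])
    while queue:
        path = queue.popleft()
        if len(path) - len(here) >= lookahead:
            continue                      # deep enough, do not expand further
        for action in range(branching):
            child = path + (action,)
            if child not in cache:        # only pay for nodes nobody has seen
                cache[child] = generate(child)
                calls += 1
            queue.append(child)
    return calls


def fake_generate(path: Path) -> str:
    """Placeholder for the NN: a deterministic dummy outcome."""
    return "outcome for " + "/".join(map(str, path))


cache: Dict[Path, str] = {(): "root scene"}
first_pass = prefetch(cache, (), fake_generate)   # fills depth 1 and depth 2
second_pass = prefetch(cache, (), fake_generate)  # everything already cached
```

The same function, run from an idle GPU over under-explored subtrees instead of a paying player's current node, would cover the "filling out missing nodes during downtime" case; the operator's knobs are just `branching` and `lookahead`.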

There are many options and possible settings for choosing the point on the spectrum that balances cost, novelty, and playability, while avoiding the messy situations with AID’s costs. This makes it more likely that acceptable tradeoffs and an economically-feasible business model can be found.

Optimizing Trees

One can go further than simply offering up a tree with choices by optimizing the tree.

Choices can be ranked by popularity: bad choices, which few players care enough to choose, can automatically fall off the list and be replaced by new choices. Choices could be explicitly voted on to filter even more heavily. Players could have the option to reroll at any given point to stir in new options; this doesn’t undermine the cost savings because after filtering through 10 or 20 choices at a given node, the top 5 choices will almost certainly be much better than a random new choice, and so players just won’t bother unless the choices really are terrible. And nothing bars editing choices or writing new ones by hand to insert at a given node; the NN doesn’t care. Further, I refer to it as a tree, but it doesn’t need to be a tree either, only a graph (classic CYOAs were not trees either, and sometimes weren’t even connected graphs!); the straightforward generation approach is relatively unlikely to loop back to existing nodes (the generated text would need to be identical, which is extremely unlikely), but more advanced models might be able to choose to link to an existing node, and players may find it quite useful to edit the game-tree as a graph to set up ‘hubs’ or ‘episodes’ or ‘loops’.
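A minimal sketch of popularity ranking with automatic drop-off follows. The scoring rule (a Laplace-smoothed pick rate) and the pruning thresholds are my own assumptions, chosen only so that brand-new choices are not buried before anyone has seen them; `Choice`, `top_choices`, and `prune` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Choice:
    text: str
    shown: int = 0    # how many times players were offered this choice
    picked: int = 0   # how many times they actually took it

    @property
    def score(self) -> float:
        # Assumed scoring rule: smoothed pick rate, so an unseen choice
        # starts at 0.5 instead of 0 and gets a chance to be tried.
        return (self.picked + 1) / (self.shown + 2)


def top_choices(choices: List[Choice], k: int = 5) -> List[Choice]:
    """The k best-scoring choices to display at a node."""
    return sorted(choices, key=lambda c: c.score, reverse=True)[:k]


def prune(choices: List[Choice], min_shown: int = 50,
          min_score: float = 0.05) -> List[Choice]:
    """Drop choices that were shown often but almost never picked."""
    return [c for c in choices
            if c.shown < min_shown or c.score >= min_score]


node_choices = [
    Choice("mind-meld with the space-octopus", shown=100, picked=60),
    Choice("file a complaint", shown=100, picked=1),
    Choice("throw the anchor", shown=0, picked=0),   # freshly rerolled
]
best = [c.text for c in top_choices(node_choices, k=2)]
kept = [c.text for c in prune(node_choices)]
```

Here the unpopular "file a complaint" falls off the list after 100 showings, while the freshly rerolled choice survives and ranks second until players have had a chance to judge it. Explicit votes or hand-written choices would slot into the same list.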

As a concrete example of how ranking improves quality game-wide, it is worth noting that the ranking mechanism also helps solve AID’s problems with long-range coherency & memory, and problems like characters changing sex mid-scene as the NN ‘forgets’; if players do not want such contradictions or forgettings, then they simply will not choose the actions or outcomes which entail contradictions, and will tend to pick the ones which are consistent and continue the story well. The players collectively serve as the long-term memory and make up for the NN’s failings. (AID tries to solve this with a limited ‘notes’ function which hardwires ‘facts’ into each generation as a kind of long-term memory, but that uses up the context window and still doesn’t work well.)

Because of this flexibility, you may be able to use a cheap small NN and not the big fancy ones like GPT-3: sure, maybe the average completion is not nearly as good, but the community will just upvote the best ones, edit in new ones as necessary, and will make the quality competitive. (You could start competing with AID right now using GPT-neo on your home desktop, no need to be a slave to the OA API and its many burdens!) Choices can also be tagged by players: “funny”, “dramatic”, “friendly”, “sexy” (perhaps color-coded)…

Ranking & RL Finetuning

Given enough rankings, one can then train a ranker to predict the score of a choice. This can be used to screen new generations and to rerank existing sets of options on top of popularity/votes, and one could even do OA-style RL finetuning to train a new, better NN which directly tries to generate good completions rather than merely statistically likely completions (which is what a default or fiction-finetuned model is doing), enabling high-quality generations without any human feedback.[2]
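Before any ranker can be trained, the play logs have to be turned into training data. One common shape for such data is pairwise preferences (the picked option beats each option shown alongside it); the sketch below only does that conversion and does not train anything. The `Visit` record and its field names are hypothetical, and treating every unpicked option as a 'loser' is a simplifying assumption.

```python
from typing import List, NamedTuple, Tuple


class Visit(NamedTuple):
    """One logged decision: what the player saw and which option they took."""
    context: str          # the scene text shown to the player
    options: List[str]    # the choices offered at that node
    picked: int           # index of the option the player chose


def preference_pairs(log: List[Visit]) -> List[Tuple[str, str, str]]:
    """Expand each visit into (context, preferred, rejected) triples."""
    pairs = []
    for visit in log:
        winner = visit.options[visit.picked]
        for i, option in enumerate(visit.options):
            if i != visit.picked:
                # Assumption: the picked option is preferred over every
                # other option that was on screen at the same time.
                pairs.append((visit.context, winner, option))
    return pairs


log = [
    Visit("A space-octopus boards your ship.",
          ["throw the anchor", "mind-meld with it", "hide"], picked=1),
    Visit("A space-octopus boards your ship.",
          ["throw the anchor", "mind-meld with it", "hide"], picked=0),
]
pairs = preference_pairs(log)
```

Each visit with n options yields n−1 pairs, so even modest traffic through popular nodes produces a sizable preference dataset; disagreements between players (as in the two visits above) simply show up as conflicting pairs for the ranker to average over.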

Emergent Gameplay

A CYOA might sound stale and boring, but the curation means it’ll be quite the opposite because of the social aspects.

The social aspects come from the fact that CYOA paths are deep-linkable URLs, in a way AID is not: I can easily hand you a link to an arbitrarily deep node, and you can click on it and start playing exactly the same thing I am playing; there is nothing remotely like this for AID, because even if you hand me a scenario, our games immediately diverge and there is no way for me to easily share my version with you. But with a CYOA tree, it’s all just URLs/paths/pointers. And “the street finds its own use for things”.
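As an illustration of 'it’s all just URLs/paths/pointers', a node can be addressed by nothing more than the list of choices taken from the root. The URL scheme and the `example.com` base below are hypothetical, as are the function names.

```python
from typing import Tuple

Path = Tuple[int, ...]               # action indices taken from the root
BASE = "https://example.com/play"    # hypothetical base URL


def path_to_url(path: Path, base: str = BASE) -> str:
    """Turn a sequence of choices into a shareable deep link."""
    return "/".join([base, *map(str, path)])


def url_to_path(url: str, base: str = BASE) -> Path:
    """Recover the sequence of choices from a deep link."""
    if not url.startswith(base):
        raise ValueError(f"not a game link: {url}")
    tail = url[len(base):].strip("/")
    return tuple(int(part) for part in tail.split("/")) if tail else ()


link = path_to_url((2, 0, 4))        # root -> choice 2 -> choice 0 -> choice 4
```

Anyone opening `link` replays the same three choices through the shared tree and lands on exactly the same node, which is all that announcing a trail to a sub-tree, or a streamer pointing fans at a fork, requires.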

So a player can ‘author’ an adventure by carefully curating a premise and then choosing actions and backing up and editing, creating a full-fledged scenario in collaboration with the NN, and then announce the trail down the overall tree to the new sub-tree when the sub-tree is satisfactory. Or players can diverge into different sub-adventures and continuums, and communities can develop elaborate in-jokes and allusions across nodes, or influence the rankings to create ‘canonical’ stories or environments. If you’ve ever seen ‘forum quests’ (Internet evolution of play-by-post tabletop RPG games), imagine that, but with parasocial factions contending to decide which option at a key node becomes the ‘True End’ and which ones are just ‘Bad Ends’. There might be elaborate discussions as groups step through adventures, debating which choices they all will take, with heretics reviled & exiled… Streamers might let fans vote to decide what fork they go down, and their hordes of fans follow them down it, voting with their feet for specific choices, and sculpting a particular stream.

Combined: The CYOA Flywheel

So, this creates positive feedback loops in multiple ways:

  1. cheap play draws in players, amortizing fixed costs over more players;
  2. the more players you have, the more caching reduces cost per player, enabling you to offer the same level of service or cut prices or invest in more generation;
  3. the more players and nodes, the better ranked choices are, increasing player happiness while decreasing node generation (further reducing cost);
  4. and the better choices and scenarios are by default, the more players will be drawn in by the quality rather than low costs, feeding into the previous 3 feedback loops;
  5. the more players, the more likely community effects are to kick in and enable creation of brandnew ways to play, driving 1–4.

Players of this might look down on regular AID-likes: “You pay how much‽ Like, is that per year? And you get random and illogical completions most of the time, while being totally unable to share it, and it takes several hours just to get an idea what you can do? Can you even do quests together in your AID with your clan? Wait, you don’t know what a ‘quest’ is? smh! Well, I suppose that’s OK for fapping (you pervert), but that sounds like about it. Why don’t you try CYOA for a bit? I found a funny one the other day, where you turn into a goose, who is naughty…”

Limitations: Gaming In Public

The main flaw with this is that people might want the contents of the tree to remain secret: it’s not enough to merely have no names associated with it, but the content itself must be secret. Perhaps the content has private data like names of people one knows, or the thought of anyone ever reading it is just sheer mortification. If people only want limited customization, like “Hi, $NAME!”, then that’s easy to handle with templating/prompts, somewhat like AID’s current world-variables, but that doesn’t work for fucked-up porn. I don’t see how CYOA-AID can easily handle such use-cases, aside from making trees per-account private, and the player having to pay full-freight for any new nodes. But this is no worse than what they currently undergo with AID, so that seems fine to me. People generating fucked-up porn are no worse off, are potentially better off if they create shared porn-trees, and are much better off when they play the regular trees. Perhaps everyone greatly overestimates the appeal of customization, because in AID, it costs exactly the same as non-customized; OnlyFans notwithstanding, there are millions of viewers for regular porn stars’ videos, after all.


  1. While the ranker+RL model would be nontrivial, most of this would be relatively trivial to implement as a CRUD web app, nothing complicated. It’s just a tree, after all. You keep a list of action IDs associated with each outcome ID, and follow the pointers, etc.

  2. OA-style PPO might be obsolete now. I found it difficult, requiring multiple RAM-hungry models with special finicky slow reinforcement learning training for relatively poor results, and so no one does it; but newer work demonstrates that you can probably do this with a single self-supervised model simply by treating it as a text finetuning problem and formatting ranked choices appropriately before doing normal self-supervised finetuning. So if you can finetune a GPT model on a fiction dataset, you are also trivially able to train a ranker & RL-optimized model!