Neural networks like GPT-2 power text adventure games where you can do anything; but they are too expensive. I propose that if we turn them into Choose Your Own Adventure hypertext games, they become feasible and enable entirely new gameplay.
2021-06-06–2021-06-18 finished certainty: possible importance: 4
A useful variation on AI Dungeon-style (AID) text games would be to turn them into shared public game trees of pre-generated options which the player selects from, Choose-Your-Own-Adventure-book style.
This can dramatically reduce costs as players spend most of their time reading cached output (rarely needing or wanting to generate brand-new output requiring a NN run), can increase quality as players collectively uprank the highest-quality actions/outcomes, and caters to newbies who don’t understand the power of NN-backed text games and flail around.
Revisiting AI Dungeon (AID) in the light of a year of GPT-3, I would like to propose a radical redesign of it based on the problems it has encountered. AID is one of the most interesting uses of the GPT-2 & GPT-3 neural network models: computer games have forever been frustrated by their inability to offer worlds as arbitrarily complex and realistic as a human Dungeon Master running Dungeons & Dragons games (which is effectively a collaborative fiction writing exercise), no matter how many verbs are added to the text adventure parser or how many 3D artists are hired; with these models, a game can finally handle anything a player can type in, and improvise with it. At its best, an AI Dungeon game like “My Musical Troupe of Orcs Uses Music to Advance Orc Rights” really feels like simulating an entire world. Of course, as current NNs have many limitations (they are far from human-level intelligence, lack any kind of memory, are unable to plan, are not trained to create high-quality fiction, have weak common sense, etc.), AID and its imitators typically do not deliver such a peak experience. In practice, AID, as run by Nick Walton’s startup Latitude as of April–June 2021, has been experiencing more serious practical problems relating to GPT-3 cost, and its use of the OpenAI API & OA’s mandatory moderation thereof.
AID and imitators are currently designed in the most straightforward way possible: a seed text, often unique or customized, and then text generation step by step after waiting for player text responses. This causes 3 major problems:
- cost: every turn invokes a full-cost call to a NN model. No matter what you do, this is always going to be expensive. This has devastating consequences on what players can be allowed to do and how often, how hard they must be monetized, how one is beholden to APIs, centralization requiring/enabling censorship, etc.
- quality: models like GPT-3 can generate stunningly good output when cherry-picked at a level like ‘best of 20’; the other 19, however, range from ‘meh’ to ‘atrocious’. Because of the cost problem, few people have the patience, or funds, to do 20 samples per action/outcome. If one could, the results might be stunning.
- learning curve: the default experience for AID seems to be to open it up, type in a few sentences like “Hello, I am a human”, decide it’s boring, and quit. It doesn’t show you what it can do; one has to elicit it by demanding cool stuff, of the sort most people quite understandably expect computers to be completely incapable of. (How can a computer be a good D&D DM when it also takes 20 seconds to load a social media website, which then is broken?)
Fixing these AID problems is intrinsically difficult. The AID model is inherently extremely expensive, and tweaks like model sparsification/
To solve this, we need to rethink AID entirely; AID is the first and simplest way to do a text game with NN models, but it may not be the best.
Games are often thought of in terms of ‘trees’ defined by actions and possible outcomes, like in chess or Go. Text generation is also a tree—just a wide tree. Typically, in GPT-related apps, this is hidden from the player: you just see the blinking cursor, not a tree of possible text completions from there. Of this tree, the player sees only one branch; he can erase outputs and rewind to an earlier branch, but still can only see his current branch, and is blind to all the options. Anyway, it would be too difficult and expensive to show the player all of the branches, because that requires generating many more samples, only to throw them away, likely unseen.
But what if we embraced the tree hidden behind the AID interface? What if we didn’t throw them away? What if we kept all those branches we so expensively generated, and shared them with other players? How would we do that?
By playing not ‘neural net D&D’ but ‘neural net Choose Your Own Adventure’ (CYOA) text games instead: a game where instead of ‘generating’ the next turn, each page instead gives you a few good options to choose.
In a CYOA version of AID, everyone starts with a certain scenario(s), which describes a situation and lists, say, 5 possible actions. This is the root of the tree. Each of these actions is pre-generated by a model; a player chooses one to go to the next scene/
Switching to a CYOA model solves all 3 problems with AID, and is not merely a cost-saving crutch coming at the expense of game quality—a CYOA approach enables many things that are hard or impossible with the default AID interface.
A newbie firing it up for the first time is greeted by a UI that is as simple as can be, working just like a hypertext web browser, and immediately sucking them in with interesting options and outcomes on every page. These are more comprehensive, creative, and high-quality than any normal single-authored CYOA could ever hope to be, since no single author can compete with the hivemind of AI+human community: the adventure never ends, it just slows down a bit when you go out of cache. There is no ‘cold start’ problem like there is with AID. No one will ever ask “but what do I do” of a CYOA. (What do you do? You click to find out whether throwing the anchor at the space-octopus works or if you should instead try to mind-meld with it to negotiate a peace-treaty between the space-octopii & humanity, that’s what you do. And then you discover the advanced ways to use it.)
As players play through the game tree, the actions/
CYOA vs AID is not a hard dichotomy, but a spectrum of novelty vs cost.
A CYOA can interpolate anywhere between the 2 extremes: the CYOA extreme is 100% cached, pre-generated, and static; it is free to play, and you can host hundreds of thousands of players on a single dedicated server for a few bucks a month of hosting costs. The other extreme is the default AID extreme: there is zero caching, everything is generated on demand by expensive NNs just for that turn, and immediately thrown away; highly active players can easily rack up scores or hundreds of dollars in OA API costs a month, and startups can burn through millions in venture capital catering to relatively small playerbases, with few options for how to cut costs (OA isn’t going to cut the API bill, so what do you do? Degrade the model players paid for? Between a rock & a hard place there, and hell hath few furies like a gamer scorned). In between, one has a mix of a cached tree and occasional generation events.
CYOA operators can adjust their spot on the spectrum by tweaking who can generate what when at what cost. For example, perhaps there is a ‘free tier’ who is unable to generate at all, and can only traverse existing nodes; or perhaps free tier players can generate at will, but only using the smallest cheapest lowest-quality model (in which case it will be extremely unlikely that they’ll generate a better choice than the community has curated using the highest-quality model’s suggestions, and they won’t generate unless they have to). If costs are too low, generations can be upgraded, new options generated to be added to popular nodes, players can be directed to interesting but under-explored subtrees to start fleshing them out, one’s idle GPU can go around filling out missing nodes during downtime just to be prepared for future players… ‘Whales’ are critical to game profitability, and they can be catered to in many ways, far more easily than in AID (which is already maximally expensive): perhaps they get 20 options instead of 5 generated for them, or perhaps they get dedicated GPUs filling out nodes in advance, so they never have to waste a few seconds waiting & twiddling their thumbs (ie if they are at node depth X, then children nodes at depth X+1 & depth X+2 are generated in the background for them, rather than waiting for them to make a choice and only then generating a node).
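The tiering and background-generation ideas can be sketched roughly as follows. This is only an illustration under assumptions of my own: the tier names, the `TIERS` table, and the `expand` callback (standing in for whatever NN call produces a node's children) are all hypothetical, and the tree is reduced to a dict mapping a node ID to its child IDs.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical tier table: which model (if any) a player may invoke,
# and how many levels below the current node are filled in ahead of time.
TIERS: Dict[str, Dict[str, object]] = {
    "free": {"model": None, "prefetch_depth": 0},    # traverse cached nodes only
    "paid": {"model": "large", "prefetch_depth": 0},  # may generate on demand
    "whale": {"model": "large", "prefetch_depth": 2},  # depth X+1 and X+2 pre-generated
}


def model_for(tier: str) -> Optional[str]:
    """Return the model a tier may use, or None if it cannot generate at all."""
    return TIERS[tier]["model"]  # type: ignore[return-value]


def prefetch(
    children: Dict[str, List[str]],
    node: str,
    depth: int,
    expand: Callable[[str], List[str]],
) -> int:
    """Ensure every node within `depth` levels below `node` exists.

    A node missing from `children` has not been expanded yet; `expand` is the
    (assumed) expensive generation call. Returns how many times it was invoked.
    """
    if depth == 0:
        return 0
    generated = 0
    if node not in children:
        children[node] = expand(node)
        generated += 1
    for child in children[node]:
        generated += prefetch(children, child, depth - 1, expand)
    return generated
```

Run in the background for a whale sitting at node X, `prefetch(children, X, 2, expand)` would only pay for the nodes nobody has visited yet; a second call over the same subtree costs nothing.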
There are many options and possible settings for choosing the point on the spectrum that balances cost, novelty, and playability, while avoiding the messy situations with AID’s costs. This makes it more likely that acceptable tradeoffs and an economically-feasible business model can be found.
One can go further than simply offering up a tree with choices by optimizing the tree.
Choices can be ranked by popularity: bad choices, which few players care enough to choose, can automatically fall off the list and be replaced by new choices. Choices could be explicitly voted on to filter even more heavily. Players could have the option to reroll at any given point to stir in new options. This doesn’t undermine the cost savings because after filtering through 10 or 20 choices at a given node, the top 5 choices will almost certainly be much better than a random new choice, and so players just won’t bother unless the choices really are terrible. And nothing bars editing choices or writing new ones by hand to insert at a given node; the NN doesn’t care. Further, I refer to it as a tree, but it doesn’t need to be a tree either, only a graph (classic CYOAs were not trees either, and sometimes weren’t even connected graphs!); the straightforward generation approach is relatively unlikely to loop back to existing nodes (the generated text would need to be identical, which is extremely unlikely), but more advanced models might be able to choose to link to existing nodes, and players may find it quite useful to edit the game-tree as a graph to set up ‘hubs’ or ‘episodes’ or ‘loops’.
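As a rough sketch of popularity ranking at a single node: the smoothing prior and the pruning thresholds below are arbitrary placeholder numbers I have picked for illustration, and the `Choice` record is hypothetical. The point is just that a pick-rate counter per choice is enough to show the top 5 and to let unpopular choices fall off.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Choice:
    text: str
    shown: int = 0   # how many times this choice was displayed to a player
    picked: int = 0  # how many times a player selected it


def score(c: Choice, prior_shown: int = 10, prior_rate: float = 0.2) -> float:
    """Smoothed pick rate; the prior gives brand-new choices a fair trial."""
    return (c.picked + prior_rate * prior_shown) / (c.shown + prior_shown)


def visible_choices(choices: List[Choice], k: int = 5) -> List[Choice]:
    """The k best-scoring choices, which is all that most players ever see."""
    return sorted(choices, key=score, reverse=True)[:k]


def prune(choices: List[Choice], min_shown: int = 50, min_rate: float = 0.05) -> List[Choice]:
    """Drop choices that have had a fair number of views but are rarely picked."""
    return [
        c for c in choices
        if c.shown < min_shown or c.picked / c.shown >= min_rate
    ]
```

Explicit votes, rerolls, or hand-written choices would all feed into the same list; a reroll simply appends a fresh `Choice` with zero counts and lets it compete.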
The ranking mechanism, it’s worth noting as a concrete example of how it improves quality game-wide, further helps solve AID’s problems with long-range coherency & memory, and problems like characters changing sex mid-scene as the NN ‘forgets’—if players do not want such contradictions or forgettings, then they simply will not choose the actions or outcomes which entail contradictions, and will tend to pick the ones which are consistent and continue the story well. The players collectively serve as the long-term memory and make up for the NN’s failings. (AID tries to solve this with a limited ‘notes’ function which hardwires ‘facts’ into each generation as a kind of long-term memory, but that uses up the context window and still doesn’t work well.)
Because of this flexibility, you may be able to use a cheap small NN and not the big fancy ones like GPT-3: sure, maybe the average completion is not nearly as good, but the community will just upvote the best ones, edit in new ones as necessary, and will make the quality competitive. (You could start competing with AID right now using GPT-neo on your home desktop, no need to be a slave to the OA API and its many burdens!) Choices can also be tagged by players: “funny”, “dramatic”, “friendly”, “sexy” (perhaps color-coded)…
Given enough rankings, one can then train a ranker to predict the score of a choice. This can be used to screen new generations, rerank existing sets of options on top of popularity/
A CYOA might sound stale and boring, but the curation means it’ll be quite the opposite because of the social aspects.
It would be like a hybrid of HyperCard & MUDs (eg Kingdom of Loathing), because CYOA paths are deep-linkable URLs, in a way AID is not: I can easily hand you a link to an arbitrarily deep node, and you can click on it and start playing exactly the same thing I am playing; there is nothing remotely like this for AID—even if you hand me a scenario, our games immediately diverge and there is no way for me to easily share my version with you. But with a CYOA tree, it’s all just URLs/paths/pointers. And “the street finds its own use for things”.
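To make the deep-linking point concrete, here is one way a path could double as a URL. The `/play` prefix and the nested-dict node layout are assumptions for the sake of the example; any stable encoding of the sequence of choices would do.

```python
from typing import Any, Dict, List

PREFIX = "/play"  # hypothetical URL prefix


def encode_path(indices: List[int]) -> str:
    """Turn a sequence of choice indices into a shareable path, eg [2, 0] -> '/play/2/0'."""
    return "/".join([PREFIX, *map(str, indices)])


def decode_path(path: str) -> List[int]:
    """Inverse of encode_path; rejects paths outside the game prefix."""
    if path != PREFIX and not path.startswith(PREFIX + "/"):
        raise ValueError(f"not a game path: {path!r}")
    return [int(part) for part in path[len(PREFIX):].split("/") if part]


def resolve(root: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Walk from the root to the node a link points at.

    Each node is assumed to be a dict with a 'children' list; an index that
    does not exist raises IndexError.
    """
    node = root
    for index in decode_path(path):
        node = node["children"][index]
    return node
```

Handing someone `/play/2/0` then drops them at exactly the node I am on, however deep it is.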
So a player can ‘author’ an adventure by carefully curating a premise and then choosing actions and backing up and editing, creating a full-fledged scenario in collaboration with the NN, and then announce the trail down the overall tree to the new sub-tree when the sub-tree is satisfactory. Or players can diverge into different sub-adventures and continuums, and communities can develop elaborate in-jokes and allusions across nodes, or influence the rankings to create ‘canonical’ stories or environments. If you’ve ever seen ‘forum quests’ (Internet evolution of play-by-post tabletop RPG games), imagine that, but with parasocial factions contending to decide which option at a key node becomes the ‘True End’ and which ones are just ‘Bad Ends’. There might be elaborate discussions as groups step through adventures, debating which choices they all will take, with heretics reviled & exiled… Streamers might let fans vote to decide what fork they go down, and their hordes of fans follow them down it, voting with their feet for specific choices, and sculpting a particular stream.
So, this creates positive feedback loops in multiple ways:
- cheap play draws in players, amortizing fixed costs over more players;
- the more players you have, the more caching reduces cost per player, enabling you to offer the same level of service or cut prices or invest in more generation;
- the more players and nodes, the better ranked choices are, increasing player happiness while decreasing node generation (further reducing cost);
- and the better choices and scenarios are by default, the more players will be drawn in by the quality rather than low costs, feeding into the previous 3 feedback loops;
- the more players, the more likely community effects are to kick in and enable creation of brand-new ways to play, driving 1–4.
Players of this might look down on regular AID-likes: “You pay how much‽ Like, is that per year? And you get the random and illogical completions most of the time, while being totally unable to share it, and it takes several hours just to get an idea what you can do? Can you even do quests together in your AID with your clan or VTubers? Wait, you don’t know what a ‘quest’ is? smh! Well, I suppose that’s OK for fapping (you pervert), but that sounds like about it. Why don’t you try CYOA for a bit? I found a funny one the other day, where you turn into a goose, who is naughty…”
The main flaw with this is that people might want the contents of the tree to remain secret—that it’s not enough to merely have no names associated with it, but the content itself must be secret. Perhaps the content has private data like names of people one knows, or the thought of anyone ever reading it is just sheer mortification. If people only want limited customization, like “Hi, $NAME!”, then that’s easy to handle with templating/
While the ranker+RL model would be nontrivial, most of this would be relatively trivial to implement as a CRUD web app, nothing complicated. It’s just a tree, after all. You keep a list of action IDs associated with each outcome ID, and follow the pointers etc.
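A minimal sketch of that bookkeeping, assuming an in-memory store and a `generate` callback that stands in for the NN (its signature is my invention, as are the class and field names). Each outcome holds a list of action IDs, each action points at its outcome once one exists, and the NN is only called on a cache miss.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Action:
    text: str
    outcome_id: Optional[int] = None  # None until some player first picks this action


@dataclass
class Outcome:
    text: str
    action_ids: List[int] = field(default_factory=list)


# Hypothetical generator: story so far -> (outcome text, candidate action texts).
Generator = Callable[[List[str]], Tuple[str, List[str]]]


class GameTree:
    def __init__(self, generate: Generator) -> None:
        self.generate = generate
        self.actions: Dict[int, Action] = {}
        self.outcomes: Dict[int, Outcome] = {}
        self._ids = itertools.count()
        self.generation_calls = 0  # how often the expensive model was invoked

    def add_outcome(self, text: str, action_texts: List[str]) -> int:
        """Store an outcome page and its actions; returns the outcome ID."""
        outcome = Outcome(text)
        for action_text in action_texts:
            action_id = next(self._ids)
            self.actions[action_id] = Action(action_text)
            outcome.action_ids.append(action_id)
        outcome_id = next(self._ids)
        self.outcomes[outcome_id] = outcome
        return outcome_id

    def choose(self, action_id: int, story: List[str]) -> int:
        """Follow an action's pointer, generating the outcome only if it is missing."""
        action = self.actions[action_id]
        if action.outcome_id is None:  # cache miss: pay for one generation
            self.generation_calls += 1
            text, options = self.generate(story + [action.text])
            action.outcome_id = self.add_outcome(text, options)
        return action.outcome_id
```

The second player to pick the same action gets the stored outcome back without `generate` being called again; a real deployment would presumably put the two dicts in a database table each.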
OA-style PPO might be obsolete now. I found it difficult, requiring multiple RAM-hungry models with special finicky slow reinforcement learning training for relatively poor results, and so no one does it. But the new Decision Transformer demonstrates that you can probably do this with a single self-supervised model simply by treating it as a text finetuning problem and formatting ranked choices appropriately before doing normal finetuning self-supervised training. So if you can finetune a GPT model on a fiction dataset, you are also trivially able to train a ranker & RL-optimized model!
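What ‘formatting ranked choices appropriately’ would look like is not settled; one guess, offered purely as an illustration, is to prefix each training example with a discretized quality token so that an ordinary language model learns text conditional on the score. The `<|quality:N|>` token, the 5-bucket scale, and the `> ` action marker are all made up here.

```python
def quality_bucket(score: float, n_buckets: int = 5) -> int:
    """Map a score in [0, 1] to an integer bucket from 1 (worst) to n_buckets (best)."""
    return min(n_buckets, max(1, 1 + int(score * n_buckets)))


def format_example(context: str, choice: str, score: float, n_buckets: int = 5) -> str:
    """One finetuning example: quality token, then the scene, then the chosen action."""
    bucket = quality_bucket(score, n_buckets)
    return f"<|quality:{bucket}|>\n{context}\n> {choice}"


def generation_prompt(context: str, n_buckets: int = 5) -> str:
    """At sampling time, ask for the top bucket and let the model write the action."""
    return f"<|quality:{n_buckets}|>\n{context}\n> "
```

The same finetuned model could then, in principle, be used both ways: prompted with the top-bucket token to generate better choices, or asked which bucket token is most likely given a choice, as a crude ranker.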