This quarter I started building an AI TTRPG Game Master, which I have been tentatively calling “The Wizard’s Codex”. In this post I’ll talk about my motivations to work on this project, what I accomplished this quarter, and where I am planning to take this project in the future. The idea for this project is ambitious in scope, and while I still have a lot more to build before I even reach an MVP, I am proud of the progress I made this quarter. The project has gone from an amorphous concept in my head to the start of something tangible, and I have a much clearer picture of what the project might become and the challenges that lie ahead. I want to give a big thank you to Christina for welcoming me into her independent project cohort this quarter; I really appreciated the space you created to explore ideas and chat about games, design, AI, and more. It’s been a great quarter!
Background
I have been a gamer for as long as I had access to a computer, and while I find myself playing games less as life gets busier, every day I am more drawn to games as a medium for expression. Like every gamer, there are certain genres and types of fun that resonate with me more than others. I have found that my favorite games tend to be those that facilitate the creation of emergent narratives. This can happen across a broad range of genres, but I find them most reliably in simulation games and sandbox RPGs.
When I mention simulation games, I’m talking about games more like The Sims and Cities Skylines, rather than those like Microsoft Flight Simulator or Farming Simulator. I like these games because the complexity of their systems can give them a toy-like quality. Chaos theory tells us that with enough complexity, even systems governed by completely deterministic rules can produce seemingly random behavior. Games and toys with complex simulations therefore tickle my brain in a similar way to games of randomness and chance, which have been shown to elicit highs and addictive behavior among gamers; this is why gambling can be so addictive for some people. While I think a simulation game is much less problematic than gambling, there is definitely an addicting, slot-machine-esque experience to re-rolling the simulation and seeing what you’ll get this time.
In Civilization V, you can set up a game of only computer-controlled Civilizations, and watch them duke it out across centuries. There is no human input, only the chance to watch a unique and complicated history play itself out. I LOVE this. There is a whole genre of “Battle Simulator” games that have cropped up that scratch a similar itch. People can’t help but watch how a Roman Legion would fare against a mob of Zombies. Even though each soldier is simulated as an exact copy of each other, the theme of the game makes it easy to imagine countless stories going on amidst the battle. A lot of the narrative in simulation games, and thus a lot of the fun for me, exists within the story I tell myself in my head as I play.

I think that implementing Conway’s Game of Life in CS106X back in 2017 is when it first clicked for me how much I love emergence in games. I was fascinated that a simple set of rules could lead to so much complexity. So, a good simulation game is awesome for creating emergent narratives because the complexity of the systems can create an enormous number of unique outcomes, each enjoyable to observe.
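The rules themselves fit in a handful of lines, which is exactly what fascinated me. Here is a minimal sketch using NumPy, with the grid wrapping at the edges:

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One generation of Conway's Game of Life on a wrapping (toroidal) grid."""
    # Count the eight neighbors of every cell by summing shifted copies
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A dead cell is born with exactly 3 neighbors; a live cell survives with 2 or 3
    return (neighbors == 3) | (grid & (neighbors == 2))

# A "glider": five cells whose pattern travels diagonally forever
grid = np.zeros((8, 8), dtype=bool)
for y, x in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[y, x] = True

for _ in range(4):  # after 4 steps the glider reappears shifted by (1, 1)
    grid = life_step(grid)
```

Nothing in those two rules mentions "travel", yet the glider crawls across the grid indefinitely; that gap between the rules and the behavior is the emergence that hooked me.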
Sandbox RPGs are the other genre of games that scratches the emergent storytelling itch for me. In a game like Skyrim I can join a civil war on the side of the Empire or rebels, I can work my way up the ladder of the Thieves Guild, Assassin’s Guild, or College of Wizards. I can become a blacksmith, or a trader, or a hunter. I can become a werewolf, a vampire, or join the groups that hunt them. There is so much to do that who knows how much I have left undiscovered even after nearly 150 hours in the game.
There are already games that attempt to bridge these two genres and have found commercial success. One of my all-time favorite games, Mount and Blade: Warband, does exactly this. In this game you start off as a penniless peasant in a medieval world divided into a number of kingdoms and cultures. At first you fight in tournaments and do simple tasks to earn money and reputation, but eventually you can start to accumulate real power. With enough cash and skill you can recruit villagers into your warband, and over time train them into elite troops. You could become a trader, making millions buying low and selling high in marketplaces across the game world’s simulated economy. You can become a lord and rise up the ranks of an existing kingdom through conquest or marriage, become a mercenary, or rebel and start your own kingdom.

At the start of the game you are doing hand to hand combat in your encounters, and by the end you are giving commands to your armies, controlling the high-level tactics and positioning of your troops. The game is a sandbox RPG, and delivers simulations of battles, diplomatic relations, and an economy sim. This game delivers high-level simulation experiences like controlling the development of a nation like in Civ V, or watching massive battles like in Ultimate Epic Battle Simulator, but it grounds you as an individual within the world. To have full god-like control over the simulation and deliver the power fantasy many games aim to deliver, you have to acquire power as an individual in the game world. This makes it feel so much more rewarding because it forces the player to become invested in the game world before they have power over it.
Games like RimWorld and Dwarf Fortress offer another take on the blend between genres. Both are fundamentally colony-management sims, but they make a more intentional effort to create narrative than pure-sim games like Cities Skylines. In RimWorld, you pick a storyteller that will modulate the difficulty of the game and send random events to your colony meant to keep you on the edge of survival, where all the best stories are experienced. Dwarf Fortress simulates the world so deeply, with so many interacting layers and systems, that stories can emerge from the game well beyond the imaginations of its creators. I’ve never played because the complexity of the game makes the learning curve intimidating to me, but there are videos and reddit posts dedicated to people sharing the wacky stories that unfold in the game, and those are a ton of fun. Players in Dwarf Fortress and RimWorld are managing their sims and trying to get their colonies to succeed, but often it’s the completely unexpected stories that are the most fun moments in the game. These games encourage players to focus less on the outcomes of their colonies and more on the stories they create while playing.
I mention these games as they are the closest analog to the Sandbox Fantasy TTRPG experience I’ve set out to build. Baldur’s Gate 3 seems like it should have made this list because it implements D&D rules, but it’s a meticulously hand-crafted linear narrative and the game I am making will likely have more in common with the procedural RPGs I mentioned earlier.
The last and most relevant influences on the game are TTRPGs, Dungeons and Dragons in particular. While D&D can be used to tell a linear adventure like Baldur’s Gate 3, the experience of a “Sandbox Campaign” is what I aim to create with The Wizard’s Codex. A Sandbox D&D campaign is the ultimate emergent narrative generator. The flexibility of D&D’s ruleset and improv-style role-playing gameplay means that it can be a vessel for almost any story you could dream up.
Why Make this Game?
I’ve already established that I love games with emergent narratives, and I believe LLMs have the power to transform most industries, including gaming. I also think that D&D, and the TTRPG genre in particular, is having a big cultural moment, and it naturally lends itself to being revolutionized by AI. Dungeons and Dragons is a game I have played several times and have come to be really fascinated by. While it was once known only to the biggest nerds, pop culture is now filled with references to it, and many people who have never played know about it. What’s more, the most common complaint I hear about D&D is that people don’t have access to play it.
Being a good game master takes skill, practice, and hours of preparation for each session. A good game master can be hard to find, and if they are the only one in a group, they often get “abused” by the other players into always hosting games for them in an incredibly asymmetric relationship. Lastly, the “Sandbox Campaign” is considered the hardest style of game to run for a game master. With a linear story, many of the locations, characters, and story beats can be prepped ahead of time. In a Sandbox, you may create a pool of locations and characters to draw from, but the majority of the game rests on the DM’s ability to improvise creative stories. Being a game master is incredibly difficult and those who are good at it are stretched thin, so there is a big demand to play this game that is really hard to meet currently.
Right now, the biggest limitations of LLMs as a technology are that they speak too confidently and hallucinate. In the context of this game, though, these limitations aren’t as big of a problem. As a game master, your job is literally to make things up and present them confidently. With the capabilities of LLMs aligning somewhat obviously with the requirements of a game master, it’s no surprise that others have been building at the intersection of D&D and AI.
Similar Projects in this Space
My very first exploration of using LLMs to play text-based RPG games was years ago, when ChatGPT became the first LLM to go viral. I tried simply asking it to run an RPG game for me, and it was actually pretty fun. But that fun only lasted a few minutes. It became clear to me that the LLM was not capable of tracking money, health, or an inventory, nor could it even keep track of the story it was telling me. It had a surreal, hallucinatory feeling similar to how dreams can sometimes play out, where you’re somewhere at one moment, then you turn around and you’re suddenly somewhere completely different.
The game starts off fine, but quickly de-coheres. There was no story arc and a strong lack of object permanence. Things happened seemingly randomly, with no concern for the pacing of the game. The thing that hurt the game most for me, though, was ChatGPT’s suggestibility. Anything you wanted to do in the RPG, the LLM would do, regardless of how well it actually fit into the story and game universe. Your character could go from peasant to king in a single prompt, and characters could be revived or invented at a moment’s notice; this made it feel more like a (bad) story-writing tool than a game. My fantasy RPG game should not let me “sprout laser gun arms and assault the King’s castle”, but ChatGPT just went along with it. In more recent testing on newer ChatGPT models, the output stayed on topic and coherent for longer, but eventually the same fundamental issues arose. The suggestibility has somehow gotten even worse, though.
So I went looking elsewhere, and I found AI Dungeon. This game was a bit better than ChatGPT, but it’s essentially the same product with additional fine-tuning to tell stories in specific genres and with more emotional range. Ultimately, it has the same architectural hurdles as ChatGPT: it’s just a chatbot with some fine-tuning, with no underlying game rules that keep the story grounded. It does not keep a coherent view of the game world, and frankly works more as a story-writing companion than a game. It was fun to play for longer than ChatGPT, but still lost its luster for me very quickly. So, none of the existing options could satisfy the desire I had to play this endless story-telling game I was imagining. Guess I’ll have to make it myself.
Approach
The idea is simple. I want to create a game, where an AI Game Master runs you through a TTRPG-like experience. It should be a game focused on telling a story, but it should be a game, not just a story-telling tool. It needs to address the deficiencies of existing competitors. Namely, it needs to:
- Keep a coherent view of the world
- Use the narrative to push back on players, to create stories with conflict and a feeling of progression
World Representation
To address the first concern, I have been working on designing a data schema for tracking objects in the world. After some research on the approaches leading firms use to give their models memory, I determined that a graph-based approach would work best: a graph representation of the world lets us define “objects” and connections between them. Quickly, though, I realized the approach used by chatbot providers was built to be extremely general. Existing approaches extract “objects” as plaintext, and create connections between them by extracting “knowledge triples” that look like:
(<object>, <fact>, <object>)
For a concrete example, from the following text “Sally gave Bob chocolates because he loves sweets”, one triple you might extract is:
(“bob”, “loves”, “sweets”)
This approach is great for a chat-bot where all input, output, and internal data are in plaintext, but not as great for what I was building. Since the objects I was tracking needed to interface with game rules and mechanics, I would need to track specific details about entities in a structured format, and I didn’t need such a generalizable solution. By instructing our extractors to look specifically for characters, locations, factions, items, and concepts (magic systems, religions, ideologies, etc.), we can write more specific prompts, which improves quality at the expense of generalizability.
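As a sketch of the difference, a type-specific extractor can target a schema the game rules can rely on, instead of free-form triples. Everything below is illustrative rather than the project’s actual schema:

```python
from dataclasses import dataclass, field

# The five entity types the extractors target (see the list that follows)
ENTITY_TYPES = ("character", "location", "faction", "concept", "item")

@dataclass
class Entity:
    name: str
    entity_type: str  # one of ENTITY_TYPES
    description: str = ""
    properties: dict = field(default_factory=dict)

def extraction_prompt(entity_type: str, message: str) -> str:
    """Build a type-specific extraction prompt; the specificity buys quality."""
    assert entity_type in ENTITY_TYPES
    return (
        f"From the game-master message below, extract any NEW {entity_type} "
        f"entities as JSON objects with fields 'name', 'description', and "
        f"'properties'.\n---\n{message}"
    )
```

With one prompt template per entity type, each prompt can spell out what counts as that type and what fields matter, something a generic triple extractor can’t do.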
So, I settled on tracking the game state using entities of the following types:
- Character
- Any living being or creature with agency in the game world
- Location
- Places in the world
- Faction
- Groups that hold power in the game world
- Concept
- Ideas or concepts that hold gravity within or shape the dynamics of the game world, e.g. religions, ideologies, cultures, prophecies
- Item
- Inanimate objects: vehicles, tools, weapons, furniture, etc
Entities of all types can have properties, which act as key-value pairs or as ways to attach attributes to certain entities. For example, character entities have a property called “appearance” with a description of their looks, which is later fed into an image model to generate character portraits. As another example, certain entities have a value-less “storyRelevant” property, which can be attached to ensure the entity shows up in the player journal in the frontend.
Entities also have relations, which directionally connect two entities. A character might have the ‘leader_of’ relation pointing to a faction, which itself might have the ‘believes_in’ relation pointing to a religion. Later, when entities get serialized to be used as context for an LLM query, we can tell the serializer how many edges to traverse along relations. With a single traversal you might get an entity string that looks like this (with details and descriptions removed):
```
[location]: Pinewater City
Relation: ruled_by:
- [character]: King Xavier
```
But with an additional traversal it could be:
```
[location]: Pinewater City
Relation: ruled_by:
- [character]: King Xavier
  Relation: ruler_of:
  - [faction]: The Verdant Republic
```
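A depth-limited serializer like the one producing the strings above can be sketched in a few lines. The Entity shape and output format here are simplified stand-ins for the real schema:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    etype: str
    name: str
    relations: list = field(default_factory=list)  # (relation_name, Entity) pairs

def serialize(entity, depth, indent=""):
    """Render an entity, following relation edges up to `depth` hops."""
    lines = [f"{indent}[{entity.etype}]: {entity.name}"]
    if depth > 0:
        for rel, target in entity.relations:
            lines.append(f"{indent}Relation: {rel}:")
            sub = serialize(target, depth - 1, indent + "  ").split("\n")
            sub[0] = indent + "- " + sub[0].lstrip()  # mark target as a list item
            lines.extend(sub)
    return "\n".join(lines)

# Entities from the example above
xavier = Entity("character", "King Xavier")
republic = Entity("faction", "The Verdant Republic")
city = Entity("location", "Pinewater City", [("ruled_by", xavier)])
xavier.relations.append(("ruler_of", republic))
```

Calling `serialize(city, depth=1)` yields only the king; `depth=2` pulls in the republic as well, which is exactly the knob for deciding how much of the graph ends up in an LLM’s context.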
This system of entities, properties, and relations is expressive enough to capture the structure of nearly any fantasy world you could imagine. While initially a little drunk on the gen-AI Kool-Aid, I envisioned a game with an agent at the center that was simply instructed to “Run a game of D&D; here is a rule-less entity graph database you can use to organize your game”, but I ultimately decided on a hybrid. I’ve chosen to create a set of predefined properties and relations common to certain entity types, and maybe down the road I will give the LLM the capability to create its own property and relation types. For example, I specifically prompt an LLM to generate an “appearance” string for each character, rather than letting an LLM agent reason, “this is a character, so I should probably create an ‘appearance’ property.”
For most types of entities, it’s relatively easy to create prompts that can generate them or extract them. For instance, in the world building pipeline, I have a stage that generates a leader for every faction that is generated. Later in gameplay, when the game master responds we run an extractor on the messages that checks whether any new characters have been mentioned, and creates new entities for them. Both of these are fairly straightforward, with prompts along the lines of “create a character that is the leader of this faction” or “extract any new characters introduced in the latest chat message”.
Items are also easy to extract, and while factions and concepts seem a bit harder, I feel comfortable building an MVP that keeps the factions and concepts (nations, organizations, magic systems, prophecies, religions) fixed for the duration of the campaign. Locations, however, turned out to be the most challenging to represent while also being one of the most important entity types in the game.
Locations
Eventually, I found that ‘location’ entities were going to require a more structured design than the other entity types. Every moment of gameplay takes place within a location, and gameplay will involve traversing the graph from location to location, so the structure is incredibly important. Though countries have borders, locations in the physical world are continuous, so it’s hard to reason about how to break a space up into discrete “location” entities.
To begin, I noticed that the way we think about locations is inherently structured into a hierarchy. Europe is a location, as is the UK, as is London, as is The Volley Pub on 211 Old St London, and you could even argue that a specific seat at the bar is a location. A location can be encapsulated by another, and a location can encapsulate multiple other locations. To help structure this hierarchy, I created a ‘location_type’ property that all location entities have, with the following enum values: World, Continent, Region, Settlement, Building, and Scene. In this hierarchy, ‘Scenes’ are the most granular type of location, which you could imagine as a movie set. While a movie might be “set in LA”, every moment within the movie occurs in a more specific scene within LA: a bar, the beach, Hollywood Blvd, etc. Scenes in this game likewise represent that most granular, most specific definition of a location: the place where the action is actually happening.
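Since the enum values run from coarse to fine, a containment rule can lean on that ordering. A tiny sketch, with the caveat that (as described later) buildings end up acting as containers for scenes rather than map tiles, so the strict ordering is a simplification:

```python
from enum import IntEnum

class LocationType(IntEnum):
    """Location granularity, from coarsest to finest."""
    WORLD = 0
    CONTINENT = 1
    REGION = 2
    SETTLEMENT = 3
    BUILDING = 4
    SCENE = 5

def can_contain(parent: LocationType, child: LocationType) -> bool:
    """A location may only encapsulate strictly more granular locations."""
    return child > parent
```

Encoding the hierarchy as ordered values gives the generation and validation code one cheap invariant to check when building the location graph.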
Though I was able to start generating game worlds with this design, I hit a roadblock when I went to design the agent game master. How exactly do I reason about creating new locations? For character entities I have gotten good baseline outputs from just giving the model the chat history and asking “did the last message introduce any new characters?”, but how do you ask that question about locations? If the character walks across town, is that a new location? If they travel for some time, how far have they gotten? How do I track distances between locations using the graph structure?
There were so many unanswered questions that I had to move away from agent design and back to refining my world representation. Eventually I settled on a hybrid representation for locations: in addition to the entity graph, I generate a small 2D map to represent the world. An early extraction stage has an LLM decide how many continents the world should have, conditioned on the world summary, and this is used to procedurally generate a map by sampling Perlin noise at various resolutions. This is an incredibly common approach for generating terrain maps. Here is a blog post that goes into it a bit more.
Through iteration I’ve settled on the following approach. First I generate a 100×50 world map with the number of continents the LLM asked for. I’ve defined a region to be roughly 16km x 16km (~10mi x 10mi), so this representation can hold up to 5,000 regions. Though the map is generated at the start of the game, it is just a 100×50 grid of values representing the elevation of each region (more layers like wealth, rainfall, and temperature to be added to the map generator in the future). As we move through the world, we pre-generate entities for only the regions adjacent to the one containing the player.
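The layered-noise idea can be sketched without a noise library: sum a few octaves of smooth random noise at doubling frequencies, then pick a sea level by quantile. This uses simple value noise rather than true Perlin noise, omits the continent-count control, and the 100×50 size and 60% water fraction are just illustrative defaults:

```python
import numpy as np

def smooth_noise(shape, freq, rng):
    """Random values on a coarse (freq+1)x(freq+1) grid, bilinearly upsampled."""
    coarse = rng.random((freq + 1, freq + 1))
    ys = np.linspace(0, freq, shape[0])
    xs = np.linspace(0, freq, shape[1])
    y0 = np.minimum(ys.astype(int), freq - 1)
    x0 = np.minimum(xs.astype(int), freq - 1)
    fy = (ys - y0)[:, None]  # fractional position between grid rows
    fx = (xs - x0)[None, :]  # fractional position between grid cols
    top = coarse[y0][:, x0] * (1 - fx) + coarse[y0][:, x0 + 1] * fx
    bot = coarse[y0 + 1][:, x0] * (1 - fx) + coarse[y0 + 1][:, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def elevation_map(width=100, height=50, octaves=4, seed=0):
    """Sum octaves of smooth noise: each one finer-grained and fainter."""
    rng = np.random.default_rng(seed)
    total = np.zeros((height, width))
    amplitude, freq = 1.0, 4
    for _ in range(octaves):
        total += amplitude * smooth_noise((height, width), freq, rng)
        amplitude /= 2  # finer octaves contribute less height...
        freq *= 2       # ...but add higher-frequency detail
    return total / total.max()  # normalize elevations into (0, 1]

elev = elevation_map()
sea_level = np.quantile(elev, 0.6)  # ~60% of tiles end up underwater
land = elev > sea_level
```

Thresholding by quantile rather than a fixed elevation keeps the land/water ratio stable across seeds, which makes downstream stages (like placing settlements) easier to reason about.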

Regions that have an entity generated are further broken down into a region map, made up of an 8×8 grid of scene locations. Scenes can be land or water, and land scenes can be wilderness or settlement scenes. The ‘settlement’ location still exists within the hierarchy, but as a logical container for several settlement scenes within the region map. Given the 16km x 16km region size, it follows that scenes are roughly 2km x 2km in size. In practice this will vary, as an extremely dense urban region will likely represent a much smaller space. The 2km x 2km measure is most useful for helping the LLM reason concretely about boundaries within regions that are continuous expanses of wilderness, and for calculating approximate distances between locations.
The final design work left for the location system is around how exactly to handle building generation. Buildings don’t exist as locations on the map; rather, they act similarly to settlements, as containers for scenes. That said, scenes directly contained in a settlement location will always have a coordinate on the map, while scenes in buildings do not have their own unique coordinate. Here is a visual to (hopefully) help clarify; it depicts a location hierarchy and how it maps onto the region map grid.

I still have many questions to answer. How do I determine the number of buildings that should be contained within a specific tile of a settlement? Once you have a building entity, how many scenes should it contain? Is a room in a building a scene? Is a whole floor? For the first question, I’m leaning towards assigning a ‘density’ property with values ‘rural’, ‘suburban’, and ‘urban’, which determines whether a settlement scene branches into 4, 10, or 100 locations. For rural locations, I plan on generating the four buildings as soon as I create the central settlement scene, so the buildings can be used to inform the game master’s output: “…you see a blacksmith as well as a general store…”. For suburban locations, we’ll pre-generate the names of the 10 buildings so the GM can describe the scene properly, then generate the actual location entities when the player goes to enter a building. For urban scenes, I plan on just letting the game master hallucinate freely toward the player, then extracting its answers into entities.
With smaller locations, you don’t want the LLM contradicting which buildings are present or constantly adding new buildings to a scene you’ve explored thoroughly: “…you know that quaint village square you’ve spent hours in? Well there’s been a nightclub here this whole time…”. For an urban location, the fantasy equivalent of NYC, it feels completely reasonable to discover new locations each time you visit: “…as you explore the maze of back alleyways you come across a tea shop you’ve never seen before…”.
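The branching policy above reduces to a small lookup. This sketch uses illustrative names, with the three generation strategies stubbed out as labels rather than real functions:

```python
# density -> (building count, generation strategy); the strategy strings are
# labels for the three plans described above, not real function names.
DENSITY_POLICY = {
    "rural":    (4,   "pregenerate_entities"),  # full entities up front
    "suburban": (10,  "pregenerate_names"),     # names now, entities on entry
    "urban":    (100, "extract_from_gm"),       # GM improvises, extract after
}

def buildings_for(density: str):
    """Look up how a settlement scene of this density should branch."""
    return DENSITY_POLICY[density]
```

Keeping the policy in data rather than scattered through prompts makes it easy to tune the counts later without touching the generation code.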
I look forward to finalizing this design soon, at which point I’ll have a fully fleshed-out set of rules for how to represent locations and how to generate new ones. I still need to design how the bot should determine that a player wants to move from one scene to another. Should I use an LLM to determine the player’s intention? Should I have a keyword, where a word like “travel” triggers reading the message with an LLM? Maybe traveling should be a UI-based action done on the map page? I need to spend some more time brainstorming here.
Ultimately, I am putting a lot of effort into the design of the location entities because I believe it is crucial to delivering the “grounded” feeling I want to create in this game. Whereas current approaches deliver psychedelic, dream-like stories that decohere over time, I want to create a game that feels like it has real places. I want them to be places that you can visit over time and develop a real attachment to. Going on a journey or epic quest is such a fantasy trope at this point (like in Tolkien’s Lord of the Rings, which is basically all traveling) that delivering travel gameplay that feels immersive, fun, and challenging is a goal I am designing around.
Resisting Player Intentions
For this to feel like a game instead of a fantasy story generator, it needs to push back on the player’s intentions. What we need is a set of rules that help create a feeling of challenge and progression within an improvised, mostly natural-language game environment. Luckily, the whole reason I want to make an AI-powered TTRPG is that TTRPG rulesets are already designed to do exactly that. Even better, Wizards of the Coast publishes their core D&D ruleset as a commercially-usable System Reference Document. In fact, their 10-year refresh of the D&D ruleset happened recently, and an updated SRD 5.2 was released this year.
The SRD provides definitions for characters and mechanics in the flavor of D&D. Most importantly it includes D&D’s attributes, skills, and proficiencies mechanics. When you attempt to do something in the game world, unless it is a completely trivial task, the Game Master will likely ask you to “roll for ___”. You want to lie your way past security at an event? Roll for deception.
The number you roll gets a positive or negative modifier added to it based on the skills of your characters, then this number is compared against a Difficulty Class (DC). This mechanic creates a dynamic in the game where skills you are good at are resilient to bad rolls, and skills you are bad at will require lucky rolls for success. Crucially, this means that not everything that the player tries to do will succeed. This means players have to creatively strategize on how to solve dilemmas in the game while leaning into the strengths of their character in order to manage risk.
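The whole mechanic is small enough to sketch directly (ignoring special rules like critical successes and failures):

```python
import random

def skill_check(modifier: int, dc: int, rng=random) -> bool:
    """One d20 skill check: the roll plus the skill modifier must meet the DC."""
    roll = rng.randint(1, 20)
    return roll + modifier >= dc

def success_chance(modifier: int, dc: int) -> float:
    """Exact probability of success, found by enumerating all 20 rolls."""
    return sum(1 for roll in range(1, 21) if roll + modifier >= dc) / 20
```

Enumerating the 20 outcomes makes the resilience point concrete: against a DC 15 check, a +5 deception modifier succeeds 55% of the time, while a -1 modifier succeeds only 25% of the time.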
The SRD also includes guidance for game masters on how to apply the rules across the scenarios that come up in gameplay: how to reason about when skill-check rolls should happen, which skill a player’s action should require, and how to assign DCs. So, how will I incorporate the SRD rules within my game?
Wouldn’t it be nice if we could build a Game Master bot that, rather than having every single mechanic memorized, is able to run a game more like a real person does? A real game master not only has notebooks where they track the characters and locations of their world, like our entity system does, but also keeps the rulebook available as a reference. The goal for this project is to use AI to create a Game Master that processes a player’s turn similarly to how a human game master would. I will create an agent with access to the rulebook and a set of functions for interacting with the maps and database. First, I will have the agent query the rulebook for snippets of sections it thinks are needed to process the player’s next action, and these will get added to the next step’s context. Next, I have the agent plan what to do. I’m looking for something like:
“””The player said they wanted to hit the goblin. Given the chat history and the scene entities in my context, it looks like the player is looking to hit the ‘Goblin Leader’ entity at index 2 in my context. Based on the rules I queried earlier, I’ll need to see the stat block for the Goblin’s armor to calculate their armor class, so first I’ll query the database for that. Then I will call the roll_dice function…”””
It will take some experimentation to see what level of abstraction works best for an agent’s toolset. Should I just give it a query_entity function and let it sift through the returned data, or expose more specific functions like get_character_health, get_inventory, and get_home_location that have semantic meaning? Perhaps the agent is great at sifting through data but drops in performance when given too many action options. Perhaps delivering only relevant context with no noise will prove important for output quality, or maybe we choose it as a cost-reduction strategy to use fewer context tokens. This is an open research question within this project.
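To make the trade-off concrete, here are the two tool surfaces side by side as hypothetical tool schemas; neither is the project’s final API:

```python
# Option A: one generic query tool; the agent sifts the returned data itself.
GENERIC_TOOLS = [
    {
        "name": "query_entity",
        "description": "Return the serialized record for one entity.",
        "parameters": {"entity_id": "string", "traversal_depth": "integer"},
    },
]

# Option B: many narrow tools with semantic meaning. Less sifting for the
# agent, but a larger action space to choose from on every step.
SEMANTIC_TOOLS = [
    {"name": "get_character_health", "parameters": {"character_id": "string"}},
    {"name": "get_inventory", "parameters": {"character_id": "string"}},
    {"name": "get_home_location", "parameters": {"character_id": "string"}},
]
```

Option A keeps the action space tiny but spends context tokens on raw records; Option B returns exactly the fact asked for but multiplies the choices the agent must weigh each turn.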
This cutting-edge agentic experimentation is something I’m incredibly excited to mess around with, but it has taken a while to get to the place where I can begin that work. To create an agent that can navigate a graph database for relevant entries, it’s really important to have a graph database with entries to navigate around. Otherwise, there is no way to test that your approach is working. So, I ran into a bootstrapping problem. Before I could create an agent to manage a world, I’d have to create a world for it to manage.
Results
The main result of this quarter has been a world-generation tool, along with the beginnings of the agent that will act upon the world.
World Generation
The first step of world generation is a questionnaire that the game runs the player through. It asks a set of thought-provoking world-building questions, then uses the answers to write a ‘world summary’: a few-paragraph description of the world, its unique details, the factions within it, their goals, and so on. It is a high-level premise of the fantasy world, used as input to many of the generation prompts in order to keep high-level consistency across the world entities. I spent a decent amount of time engineering a world-building system that would be flexible: players can ask for suggestions, ask the LLM to come up with answers for them, or ask the LLM to fill the whole questionnaire out itself if the player wants a complete surprise.
Using the existing APIs, you have to choose between structured output and streamed output. Structured output is incredibly useful: you get responses containing already-parsed Python objects that can be directly manipulated in code, which makes integrating LLMs into a data pipeline incredibly convenient. Streaming, on the other hand, is incredibly useful for user-facing applications because it dramatically reduces perceived latency. With streaming, you see text pop up token by token; with streaming off, you receive the full chunk of text only after the last token has been generated. So streaming often means you start seeing responses seconds before you’d see them otherwise.
Why is this trade-off relevant? I built the questionnaire bot in such a way that it responds to the user with plain-text messages until the user has answered the current question. This way, the LLM can guide the conversation until it gets the user to answer, say if the player is off-topic or overly vague. As a simplified example, this capability lets us use a single LLM prompt to have the following interaction:
LLM: What are the nations that hold power in your world?
Player: I’m hungry.
LLM: Sorry you’re hungry! You should get up to grab a snack, and when you’re back let me know what sort of nations you’d like to see in your game!
Player: I want one monolithic empire called “Centrality”, that…
LLM: nations=[{name="Centrality", description="Centrality is the sole state ruling over the lands of…"}]
The chatbot will continuously engage the user until they answer the question. This LLM must be able to stream its plaintext responses when it is talking to the user, but when it responds with structured data, that data needs to be parsed and should not be passed along to the user. If all you want is to stream structured data, it’s simple enough to load the streamed output into a buffer and parse it when the stream is done. The issue is that when you take an LLM stream and return it to the user, you are sending the stream onward before you’ve seen what is being written to it.
I am proud to have engineered a system for this project that enables streamed outputs for responses containing structured data and plaintext, and that can also carry system events and more. The solution is a multi-layer design (inspired heavily by my time working inside the Windows networking stack) where each layer acts as a middleman that listens in on the stream it is passing up. A layer can let data pass through untouched (when it’s a plaintext message to the user) or intercept and process it (when structured data needs to be passed between layers).
To do so, the LLM outputs a tag preceding its response indicating what kind of response it is. The questionnaire bot has three response types, indicated by the following tags:
- [#MESSAGE#] – A message to the player; this should get sent to the frontend.
- [#ANSWER#] – The player’s answer to the question, extracted from the chat into a structured format. This should be intercepted and stored in the database.
- [#CREATE_CONTENT#] – The user wants an AI-generated answer to this question; send it to another LLM to come up with a clever answer, then save that to the database.
Each layer listens to the data it streams and decides whether to forward it on its output stream or hold onto it. Layers can also emit tags and other non-LLM-generated data of their own. For instance, when the questionnaire LLM sends an [#ANSWER#] tag, the questionnaire bot stores the answer and the layer above (the world generator) won’t even see it. Only after the final question has been answered does the questionnaire bot assemble one object with all the answers and stream it to the layer above, preceded by an [#ANSWER#] tag. This is definitely the most technical and computer-systems-y portion of this project, so apologies if this brief explanation isn’t thorough enough to explain it clearly.
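The pass-through-or-intercept behavior of a single layer can be sketched as a generator. This is a simplified illustration under one big assumption: here each stream item arrives as an already-split (tag, text) pair, whereas the real system has to recognize tags inside a raw token stream. The function and variable names are illustrative, not the project’s actual API.

```python
def questionnaire_layer(stream, answer_store):
    """Hypothetical middle layer in the multi-layer stream design.

    [#MESSAGE#] chunks pass through to the layer above untouched;
    [#ANSWER#] payloads are intercepted and stored, so the layer
    above never sees them.
    """
    for tag, text in stream:
        if tag == "[#MESSAGE#]":
            yield text                      # pass through toward the frontend
        elif tag == "[#ANSWER#]":
            answer_store.append(text)       # intercept: consumed at this layer

answers = []
upstream = [
    ("[#MESSAGE#]", "What nations hold power in your world?"),
    ("[#ANSWER#]", '{"name": "Centrality"}'),
]
forwarded = list(questionnaire_layer(upstream, answers))
# forwarded contains only the message; answers holds the intercepted payload
```

Because each layer is itself an iterable, layers compose naturally: the world generator can wrap `questionnaire_layer(...)` in its own generator, intercepting its own tags while forwarding the rest.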

World generation exists in a multi-stage pipeline, which I have engineered to be easily expandable and configurable. Each stage contributes a small part of the world graph and can have dependencies on other stages. For example, the first stage uses the questionnaire to generate the world summary. Once the world summary stage completes, any stage that depends on it starts to run. The stages are all scheduled as parallel tasks to maximize parallelism and lower latency: after the summary is generated, the stages that extract locations, factions, and concepts from it and create entities for them each run in parallel. Once those are done, the stages that create relations between entities can start running. This granular, stage-based approach maximizes the parallelizability of the world generation tasks, and the framework I designed handles all the concurrency between dependencies. It even validates that there are no dependency loops.
Right now, the world generates a varied list of factions: nations, clans, movements, orders, corporations, etc. It also creates concepts like the world’s types of magic, religions, and ideologies. It creates the characters that lead those factions and the locations in the world where the factions are based. Relations are made between factions, their leaders, other factions, the concepts they believe in, and more. Relations between characters are created to represent who knows whom and what characters think of each other, and diplomatic statuses are set between nations. The goal of the stage-based world generation framework I built was to minimize the difficulty of extending it in the future. As the project progresses, I am sure I will revisit world generation to add additional layers of complexity and richness to the entities and their relations. Here are some screenshots of a single character’s entity entry, so you can see how properties and relations come together to create interesting characters and relationships.


Future Work
There is a lot I look forward to working on in this project; perhaps I’ll take CS399 again another quarter. Most obviously, I want to finish my MVP game loop. This will essentially be a text-based walking sim where you can traverse the world, explore locations, and talk to characters. After that, I can start to fold in additional complexity like items, inventory, skill checks, etc.
Package and Stylize as a Game
Something else I feel strongly about is that, despite this being a text- and menu-based game, the package it is served in needs to be stylized. I think it will be important to demonstrate that this is a project of passion with care put into it, not an AI-slop generator. There are a ton of those around, and AI is good at generating slop, but the work I’m putting into this project is aimed at getting the best-quality, most engaging outputs from these LLMs, and I want a package that demonstrates care and effort as well. To that end, I want to build the game frontend in Godot, as a 3D-rendered book with the UI on its pages. I want to lean into “The Wizard’s Codex” as the magical book containing every possible story, and create an atmosphere in the game that supports that theme.
Refine the Approach to Image Generation
The one thing I know I don’t want to do with this game is plow along with absolutely no concern for the very valid concerns surrounding the use of AI. I know many artists hate AI, and I totally get why. I was told this quarter in CS247G “you could probably sell your pixel art,” and while I appreciated the compliment, I had to push back on that idea because I really think that small-time freelance artistry is dying, at least in its current form. This is sad because making art sounds like a fun and fulfilling career, one I would have been interested in, but that opportunity seems to be disappearing and that sucks.
That said, the ability to dynamically create images of the game world as the story progresses is an incredibly impactful feature for a game like this one. I think there is a balance that can be struck that enables the usage of this tech while minimizing the perceived “theft” of others’ work. Crucial to this will be creating my own art (or contracting someone who is on board with the idea) to be used as the “design language” for art in my game. If all the art I generate is of a style I worked hard to create myself, it feels less like theft than if I shipped a Studio Ghibli-styled game.
Secondly, despite this being an AI game, I still view games as a form of art and expression, so I want to be very deliberate about the art I deliver. It’s the same motivation behind wanting to re-stylize the game to feel more like a game. I am trying to create something that is fun and that creates a vibe and atmosphere people enjoy. To do so, I want control over the details of the experience, including the style of the images. But I don’t only want to control what is in the images; I want to enforce what gets left out of them.
You’ll see in the example above that the “village elder” image, a woman in her late forties, pretty much looks like a hot mid-twenties woman with gray hair. This was the least sexualized of the portraits generated for the women in my character cast. Though image generation is a fairly new feature, in my limited testing of how it renders “fantasy characters” I’ve already found that it consistently produces overly sexualized images of the women (especially in comparison to the images it generates for male characters). It’s giving “they capitulated to the demands of Gamergate,” and that is NOT the game I am trying to make. I think there is certainly space to explore sexuality in this game; RPGs like Dragon Age and Baldur’s Gate have been doing that for years, but there is so much more that I’d like it to do too. I think I can improve this aspect of the images by being more specific in the image prompts, and creating a proprietary art style should help too.
So, in order to mitigate the moral hazard of using image generators, curate my artistic vision, and avoid playing into a culture I don’t want to be a part of, creating my own visual design for this game will be a crucial next step.
Explore Experimental UX/UI
I want to explore the idea of the game having two text feeds. One would be the “chat” feed, where the player can type messages asking the Game master a question or indicating their next move in the world. In this feed, the game master would respond in a conversational way, answering questions or presenting the game-y descriptions of the game: “you see person A, B, and C. You see location X, Y, Z”. This chat presents information the player needs to make decisions in the game world.
For some reason though, I like the idea of a polished, fantasy-novel version of your adventure getting produced as an artifact of playing the game. Perhaps after a few of the player’s turns, the outcomes of those turns can get aggregated and turned into a paragraph in the “story” feed. For example, a chat message might say:
“… as you walk into the town square, you notice an elder seated on the park bench reading, two dwarves in a heated discussion by the general store, and a notice board in the center of the plaza. What do you do?”
Let’s say the player’s character’s name is ‘Murphy’ and let’s say they approached the elder after this message from the game master. The story feed might get updated with something like:
“As Murphy made his way into the plaza of the first village he’d seen in miles, he wasted no time finding a shaded bench to rest at. Murphy’s rest was not long, however, as soon after he sat the old man on the opposite bench looked up from his book and…”
Now, I’m not claiming that the writing above is any good (I’m no fantasy author myself), but notice that it is in third person, referring to the player by the character’s name rather than “you”. It does not list out options for story hooks like the game chat does; the story goes straight to the elder the player chose to interact with rather than mentioning the dwarves or the notice board.
In addition to the multiple feeds, I want to explore whether the interface for all actions should just be chat. I mentioned earlier the possibility of adding a dropdown to the chat to select its context, either ‘Do’, ‘Say’, ‘Ask GM’ or something similar. Maybe traveling should be handled via the map? I am only scratching the surface of the possible design questions there are to explore in this project, and being both an AI application and a game means that working on these questions presents an opportunity for both innovation and creativity.
Boost LLM Creativity
Lastly, I have some half-baked ideas about how to boost LLM creativity that I want to explore when the project is far enough along. First, I’ve already improved the creativity of character names considerably from when the project started. When you ask an LLM to come up with character names, the ‘most probable token’ aspect of its generation leads to incredibly same-y sounding names. I’ve had a world with Sylvie, Sylvester, and Sylvia. Sometimes the LLM gets fixated on a prefix, and all the names collapse to something similar.
I fixed the lack of variety in names by creating a dataset of names, scraped from ChatGPT with prompts like “Please generate 1000 names for male elves in a fantasy setting.” For each species in D&D, I created a set of hundreds of masculine, feminine, and gender-neutral first names, as well as a set of last names per species. Then, when creating characters, I just sample a name from the dataset rather than ask the LLM to come up with one on the fly.
I think a similar method can be used to create named locations rather than the generic-sounding ones I am producing now. The “Rolling Grasslands” could be the “LizardHelm Grasslands”. Just by picking two random words, I could give regions, buildings, scenes, and locations names, and then have the LLM generate around the name we pre-generate.
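Both the name sampling and the two-random-words idea are simple to sketch. The datasets below are tiny hypothetical slices (the real datasets hold hundreds of names per species), and all names and word lists here are made up for illustration.

```python
import random

# Hypothetical slices of the pre-generated name datasets.
ELF_FIRST_NAMES = {
    "masculine": ["Thalion", "Erevan"],
    "feminine": ["Naivara", "Lia"],
    "neutral": ["Ariel", "Rowan"],
}
LOCATION_PREFIXES = ["Lizard", "Iron", "Whisper", "Ember"]
LOCATION_SUFFIXES = ["Helm", "Fen", "Reach", "Hollow"]

def sample_character_name(dataset, gender, rng=random):
    # Sample from the dataset instead of asking the LLM for a name,
    # avoiding the "Sylvie / Sylvester / Sylvia" collapse.
    return rng.choice(dataset[gender])

def sample_location_name(kind, rng=random):
    # Two random words make a specific name the LLM can generate around.
    return f"{rng.choice(LOCATION_PREFIXES)}{rng.choice(LOCATION_SUFFIXES)} {kind}"

name = sample_character_name(ELF_FIRST_NAMES, "feminine")
place = sample_location_name("Grasslands")  # e.g. "LizardHelm Grasslands"
```

The pre-sampled name then goes into the character- or location-generation prompt as a fixed constraint, so the LLM fleshes out the entity around it rather than inventing the name itself.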
People complain about LLM outputs being too generic, but this doesn’t have to be the case. An LLM simply produces the most probable outcome (with sampling adding some randomness), so if you say “Come up with a fantasy character for me” it will basically produce the average of all fantasy characters ever made, which feels like a generic pile of garbage. If you instead say “Generate a character that leads the Order of the Falcons, a group that has trained falcons to capture messenger pigeons to spy on communications, and offers their espionage as a service,” the produced character will be much more specific. Just as constraints often help human creativity, to get ‘creative’ outputs from an LLM you simply have to give it enough context that it is forced to produce something specific rather than general. How much better could these characters be if, before the LLM generated them, we sampled personality traits, strengths, quirks, and flaws, and forced the LLM to integrate them into the character?
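That trait-sampling idea could look something like the sketch below: pre-sample constraints, then bake them into the generation prompt. The trait lists and prompt wording are entirely hypothetical placeholders, not anything the project currently uses.

```python
import random

# Hypothetical trait pools; real pools would be much larger and curated.
TRAITS = ["stubborn", "superstitious", "meticulous", "soft-hearted"]
QUIRKS = ["collects feathers", "speaks only in proverbs", "never sits down"]
FLAWS = ["holds grudges for decades", "fears open water", "cannot keep a secret"]

def build_character_prompt(faction_desc, rng=random):
    # Pre-sampled constraints push the LLM away from the "average of all
    # fantasy characters" toward something specific it must reconcile.
    trait, quirk, flaw = rng.choice(TRAITS), rng.choice(QUIRKS), rng.choice(FLAWS)
    return (
        f"Generate a character that leads {faction_desc}. "
        f"They are {trait}, they {quirk}, and they {flaw}. "
        "Integrate all three attributes into their backstory and personality."
    )

prompt = build_character_prompt(
    "the Order of the Falcons, a group offering falcon-based espionage as a service"
)
```

Because the constraints are sampled outside the LLM, two characters from the same faction prompt will still diverge, which is exactly the kind of forced specificity described above.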
Another thing I want to explore is having a section in the context for the game master bot where I sample a random entity from the game world and include it with the current game context. I’d tell the LLM that this is a random entity that can be incorporated into the current story or ignored as desired. This would be in an effort to increase the random connections in the game world and force “creativity” in the story the LLM tells. Maybe there is no mention of the “magic laser crystals” in the scene you are playing out, but it could get randomly inserted into the agent’s context and slip into the story subtly. I am super excited to keep exploring the problem of LLM creativity.
Hierarchical Narrative Pacers
One of my favorite lectures in CS247G was the one about narrative structure. I have even bought the “Save the Cat” audiobook and plan on listening to it over the coming weeks; even though it is about film, I want to learn how pacing and tension are used to create compelling narrative structures. Later in this project, I have an idea for a set of storyteller bots that track the current moment in the story through a short-term (episode) lens, a medium-term (season) lens, and a long-term (campaign) lens. These storytellers could track the evolution of the story and send suggestions to the GM on what should happen in order to control pacing and create big narrative moments. In RimWorld, the storyteller AI tries to keep you in the sweet spot of gameplay where things aren’t going so poorly for your colony that it’s frustrating, but not so well that you’re bored. Similarly, these story analyzers could give suggestions to the GM to make sure the player isn’t steamrolling the game or getting crushed.
Conclusion
I hope it’s clear that I have put a considerable amount of time and effort into this project this quarter. It has been a ton of fun and an easy project to pour myself into. I think that as a standalone quarter-long project, the world builder I created is already pretty cool. That said, there is still a lot to do, and I look forward to the interesting challenges this project will bring. Again, I really appreciated having the opportunity to work on an independent project under Christina this quarter, and I hope at least some of my yapping about it was interesting.