Where We Left Off
Last spring I wrote about the beginnings of an AI-powered TTRPG I’d been building for CS399. At the start, even I wasn’t sure exactly what I was building. But like a sailor who can smell land miles before it appears in their spyglass, I could sense there was something exciting to be built at the intersection of AI and narrative games, even if I couldn’t see it clearly yet. So that spring I let my creativity and curiosity guide me, and I ended up building a fantasy world-building engine. By the end of that quarter I had an app that implemented the equivalent of a world-building “session 0” in Dungeons & Dragons. You could chat with a Game Master, answer some questions, watch a procedurally generated map appear, and browse a journal of AI-generated factions, characters, and locations. It was cool, but it wasn’t a game yet. There was no gameplay loop, no combat, no way to actually play in the worlds being created.
A lot has changed since then. I’ve spent time learning more about game design, business, and project management, and I’ve strengthened my convictions about what I want The Wizard’s Codex to be. Like Minecraft, The Wizard’s Codex sits at the intersection of a game, a game engine, and a gaming platform. It’s an opinionated TTRPG-like that lets players live out any story they’d like within a fantasy world. An AI Game Master improvises the story, but a real game engine holds the world together.
What Got Built This Quarter
This quarter was heads-down building. 243 commits landed on main across three major workstreams: a full tactical combat system, a 3D map renderer, and a UI/UX overhaul. I also invested a lot of time in infrastructure and workflows that keep me moving fast as a solo dev. I won’t cover everything here, but I want to hit the highlights.
Tactical Combat
This is the feature area that, once fully integrated, will do the most to make The Wizard’s Codex feel like a game. This quarter I built the full backend for a grid-based tactical combat engine grounded in the 5e SRD rules: attack resolution with weapon tables, proficiency bonuses, advantage/disadvantage, critical hits, armor class calculation, and death saving throws. These are the mechanics that Baldur’s Gate and D&D players are used to playing within. The grid itself supports cover (calculated via raycasting), flanking, elevation bonuses, and line of sight. I also added a system for environmental tile effects like fire, ice, poison clouds, and oil slicks that interact with each other. Spill oil on the ground, then light it on fire. Get an enemy wet, then hit them with lightning for amplified damage. Emergent interactions like these are exactly what makes tactical combat fun in tabletop games, and they fall naturally out of a small set of rules.
I also built a civilian AI system that makes combat feel like part of a larger game rather than a separate mode. Bystanders in combat scenes are driven by an 8-state finite state machine; a typical arc is that they start unaware, notice the fighting, panic, and flee. Different archetypes (merchants, guards, children) react differently. When combat ends, everyone in the scene is affected by what just happened, and the location itself can be transformed by what happened on the battle map. If you fight off goblins invading a village but launch a fireball that sets the village on fire during the battle, the journal will update to show a burning village and angry villagers. It’s a small system, but it makes combat scenes feel like they’re happening in a living world, not an empty arena.
The most technically interesting piece is the NPC agent system. Each enemy combatant is controlled by its own LLM agent that receives a compact tactical context: its stats, a 15×15 ASCII grid of the battlefield around it, visible enemies, available actions, and its personality and goals. One problem I ran into is that LLMs keep getting trained to be smarter, but in a story game some characters should be smarter than others. How do you ask an LLM to come up with a battle strategy that is “medium smart”? Rather than telling the LLM “you’re in the 65th percentile of tactical intelligence, generate a strategy that reflects that,” we ask it to generate a gradient of tactical actions ranging from tactical genius to catastrophically stupid, and we modulate how we sample from those options based on the character’s intelligence.
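To make the rules above concrete, here is a minimal Python sketch of a single attack roll under the 5e SRD core rules. It covers advantage and disadvantage, natural 20s and 1s, proficiency, and doubled damage dice on a crit. It is an illustration, not the project's actual combat code. The function names, the `AttackResult` type, and the injectable `rng` parameter are assumptions made for the example, and weapon tables, cover, and flanking are left out.

```python
import random
from dataclasses import dataclass


def ability_modifier(score: int) -> int:
    """5e SRD ability modifier: (score - 10) / 2, rounded down."""
    return (score - 10) // 2


def proficiency_bonus(level: int) -> int:
    """5e SRD proficiency bonus: +2 at levels 1-4, rising by 1 every 4 levels."""
    return 2 + (level - 1) // 4


def roll_d20(rng, advantage: bool = False, disadvantage: bool = False) -> int:
    """Roll a d20; advantage keeps the higher of two rolls, disadvantage the lower."""
    if advantage and not disadvantage:
        return max(rng.randint(1, 20), rng.randint(1, 20))
    if disadvantage and not advantage:
        return min(rng.randint(1, 20), rng.randint(1, 20))
    return rng.randint(1, 20)  # both or neither: they cancel out


@dataclass
class AttackResult:
    natural: int
    total: int
    hit: bool
    critical: bool
    damage: int


def resolve_attack(rng, attack_bonus: int, target_ac: int,
                   dice: tuple[int, int], damage_bonus: int,
                   advantage: bool = False, disadvantage: bool = False) -> AttackResult:
    natural = roll_d20(rng, advantage, disadvantage)
    total = natural + attack_bonus
    critical = natural == 20
    # A natural 20 always hits and a natural 1 always misses; otherwise compare to AC.
    hit = critical or (natural != 1 and total >= target_ac)
    damage = 0
    if hit:
        count, sides = dice
        if critical:
            count *= 2  # a critical hit doubles the damage dice, not the flat bonus
        damage = max(0, sum(rng.randint(1, sides) for _ in range(count)) + damage_bonus)
    return AttackResult(natural, total, hit, critical, damage)


if __name__ == "__main__":
    # A level-5 fighter with Strength 16 swinging a longsword (1d8) at AC 15.
    str_mod = ability_modifier(16)
    print(resolve_attack(random.Random(7), str_mod + proficiency_bonus(5), 15, (1, 8), str_mod))
```

The sketch passes the random source in rather than calling the global `random` module. This keeps resolution deterministic under test, which matters when the same pipeline resolves both player and NPC actions.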
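Here is a stripped-down Python sketch of how such a state machine might look. It models only the four states named above, because the post doesn't list the other four. The per-archetype fear thresholds are invented for illustration: a child panics almost immediately, while a guard holds position much longer.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CivilianState(Enum):
    # Four of the eight states: the ones named in the post. The rest are omitted.
    UNAWARE = auto()
    ALERTED = auto()   # has noticed the fighting
    PANICKED = auto()
    FLEEING = auto()


# Hypothetical per-archetype thresholds: accumulated fear needed to escalate.
THRESHOLDS = {
    "child":    {"panic": 1.0, "flee": 2.0},
    "merchant": {"panic": 3.0, "flee": 5.0},
    "guard":    {"panic": 8.0, "flee": 12.0},
}


@dataclass
class Civilian:
    name: str
    archetype: str
    state: CivilianState = CivilianState.UNAWARE
    fear: float = 0.0

    def tick(self, threat: float, sees_combat: bool) -> CivilianState:
        """Advance at most one state per combat round."""
        limits = THRESHOLDS[self.archetype]
        if self.state is CivilianState.UNAWARE:
            if sees_combat:
                self.state = CivilianState.ALERTED
            return self.state
        self.fear += threat
        if self.state is CivilianState.ALERTED and self.fear >= limits["panic"]:
            self.state = CivilianState.PANICKED
        elif self.state is CivilianState.PANICKED and self.fear >= limits["flee"]:
            self.state = CivilianState.FLEEING
        return self.state
```

Only the threshold table differs between archetypes. A merchant and a guard share the same transition logic but escalate at very different points.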
A genius tactician almost always picks the optimal move. A dumb goblin picks randomly from good and bad options alike. The combat engine then validates and executes the chosen action through the same resolution pipeline that handles player actions. After each round, another LLM pass takes the mechanical event log (“Theron hit Groth for 9 slashing damage”) and smooths it into a narrative combat recap. The result is combat that resolves like a real TTRPG but reads like a fantasy novel.
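One plausible way to turn a ranked list of options into intelligence-dependent behavior is a softmax over rank, with a temperature derived from the character's Intelligence score. The sketch below assumes the LLM has already returned its options ordered from best to worst. The INT-to-temperature mapping and the example actions are hypothetical, not the project's tuned values.

```python
import math
import random


def temperature(int_score: int) -> float:
    """Hypothetical mapping from an Intelligence score to sampling temperature.

    INT 20+ is nearly greedy; INT 3 is close to uniform over the options.
    """
    return max(0.05, (20 - int_score) / 2)


def action_weights(n_options: int, int_score: int) -> list[float]:
    """Softmax-style weights over rank (0 = genius option, n-1 = catastrophic)."""
    t = temperature(int_score)
    return [math.exp(-rank / t) for rank in range(n_options)]


def pick_action(ranked_actions: list[str], int_score: int, rng=random) -> str:
    weights = action_weights(len(ranked_actions), int_score)
    return rng.choices(ranked_actions, weights=weights, k=1)[0]


# Example options, as an LLM might rank them for a goblin archer.
ranked = [
    "Flank the wizard from the ridge",   # tactical genius
    "Shoot the wizard from cover",
    "Charge the nearest fighter",
    "Guard the treasure chest",
    "Run into the burning oil",          # catastrophically stupid
]
```

A low temperature concentrates almost all probability on the top-ranked option. A high temperature flattens the distribution toward uniform, which matches the genius-versus-goblin behavior described above.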
The backend combat system is fully implemented and I’ve started on the frontend. There’s a working grid renderer and some early UI pieces in place. Still ahead are fully integrating the combat map with the new 3D renderer, building out the complete combat HUD, and tackling the mountain of QA work that comes with a system this complex. Getting combat to a shippable v0 is a top priority for next quarter.
3D Map Renderer
The map got a complete rewrite this quarter. I migrated from a PixiJS-based pseudo-3D renderer to real Three.js with actual elevation geometry: land tiles use BoxGeometry with height proportional to their elevation, so you can actually see mountain ranges rise up from the terrain. Because the geometry is truly 3D, we can show the map from an angle that looks exactly like the old 2D map, but we can also rotate it and change camera angles. This gives the world much more depth, and it was also a prerequisite for combat. On a grid with varied elevation and a fixed camera angle, some tiles can end up hidden “behind” taller tiles closer to the camera, so players need to be able to rotate the view. The refactor was a fair amount of work, but switching to 3D has been pure upside.
At first, the 3D map was a lot slower than the 2D map. The naive implementation created 15,000+ individual meshes for a world map, which meant far too many draw calls. I packed all 245 tile textures into a single 1024×1024 texture atlas, then collapsed the meshes into a handful of BatchedMeshes, cutting draw calls by 80-90%. I also switched from a continuous 60fps render loop to an on-demand dirty-flag pattern, so the map only re-renders when something changes. On a laptop, if you’re not moving the camera, the map costs essentially nothing to render. At this point, the 3D map is more performant than the 2D map I was using before. Finally, I built a camera system with zoom-dependent pitch: zoomed out, you get a top-down strategic view; zoom in, and the camera tilts to show the terrain from a more dramatic angle. At a certain zoom threshold, the renderer automatically transitions from the world map to a higher-detail regional view, a classic level-of-detail (LOD) transition.
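The dirty-flag pattern and the zoom-dependent pitch are both simple enough to sketch independently of Three.js. The Python below models only the camera logic. In the real renderer, the counter increment would instead be a call like `renderer.render(scene, camera)` inside the animation loop. The zoom range, the 90° and 45° pitch endpoints, and the regional-view threshold are hypothetical values.

```python
from dataclasses import dataclass

# Hypothetical tuning values for illustration.
MIN_ZOOM, MAX_ZOOM = 1.0, 10.0
TOP_DOWN_PITCH_DEG = 90.0   # straight down when fully zoomed out
CLOSE_PITCH_DEG = 45.0      # tilted, more dramatic view when fully zoomed in
REGIONAL_VIEW_ZOOM = 7.0    # LOD threshold for switching to the regional map


def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


@dataclass
class MapCamera:
    zoom: float = MIN_ZOOM
    dirty: bool = True  # render once on startup
    renders: int = 0

    @property
    def pitch_deg(self) -> float:
        """Pitch interpolates linearly from top-down to tilted as zoom increases."""
        t = (self.zoom - MIN_ZOOM) / (MAX_ZOOM - MIN_ZOOM)
        return lerp(TOP_DOWN_PITCH_DEG, CLOSE_PITCH_DEG, t)

    @property
    def view(self) -> str:
        return "regional" if self.zoom >= REGIONAL_VIEW_ZOOM else "world"

    def set_zoom(self, zoom: float) -> None:
        zoom = min(max(zoom, MIN_ZOOM), MAX_ZOOM)
        if zoom != self.zoom:
            self.zoom = zoom
            self.dirty = True  # only a real change schedules a redraw

    def on_frame(self) -> None:
        """Called every animation frame; does nothing unless something changed."""
        if not self.dirty:
            return
        self.renders += 1  # stand-in for renderer.render(scene, camera)
        self.dirty = False
```

The frame callback still fires every frame, but an idle map skips the expensive render entirely. That is why a stationary camera costs almost nothing.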
UI/UX Overhaul
I reskinned the entire app in a pixel art theme — custom pixel fonts, stone and wood panel textures, hand-picked leather backgrounds, redesigned icons for every tab. Over 43 stylesheets got touched. This might sound cosmetic, but I think it’s one of the most important things I did this quarter. In a landscape full of AI demos that look like they were built in a weekend, the presentation is a statement: someone cared about this. This is a game, not a prototype. I want opening The Wizard’s Codex to feel like opening a game, not opening an app.
I also built out a real navigation architecture: a main menu hub with Continue, New Adventure, Load Game, and Discover Worlds options. The New Adventure screen presents different world creation paths as cards. The Load Game screen has a two-panel layout with sort, filter, search, and game management. Small things, but they add up to an experience that feels polished rather than stitched together. There is still a lot of work to do on the visual and UX design, and this is far from the final iteration. But I was getting tired of my game looking and feeling like every other vibe-coded weekend app out there.
The Rest
A few other things that don’t need their own sections but are worth mentioning: I built an in-game fantasy calendar system so the world tracks time as the player progresses. I improved the world creation chat bot with engagement tracking and an auto-completion system that fills in missing world details while staying coherent with what the player already established. I fixed the session auth system so players don’t get randomly logged out mid-game.
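A common way to build a calendar like this, sketched here under assumptions, is to store a single elapsed-day counter as the source of truth and derive the displayed date from it. The month names and lengths below are placeholders, not the game's actual calendar.

```python
from dataclasses import dataclass

# Hypothetical six-month calendar; names and lengths are placeholders.
MONTHS = [("Frostwane", 30), ("Thawmoot", 30), ("Seedfall", 31),
          ("Highsun", 31), ("Emberfall", 30), ("Deepnight", 28)]
DAYS_PER_YEAR = sum(length for _, length in MONTHS)  # 180


@dataclass(frozen=True)
class FantasyDate:
    year: int
    month: str
    day: int  # 1-based day of the month


def date_from_elapsed_days(elapsed: int, start_year: int = 1) -> FantasyDate:
    """Convert a single elapsed-day counter (the stored game clock) into a calendar date."""
    years, day_of_year = divmod(elapsed, DAYS_PER_YEAR)
    for name, length in MONTHS:
        if day_of_year < length:
            return FantasyDate(start_year + years, name, day_of_year + 1)
        day_of_year -= length
    raise AssertionError("unreachable: day_of_year is always < DAYS_PER_YEAR")
```

Keeping one integer as the canonical clock makes saving, loading, and comparing points in time trivial. The calendar is just a presentation layer over that integer.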
On the infrastructure side, I built a containerized parallel workspace system: a Dockerfile plus a setup script that spins up isolated Docker containers, each with its own git worktree. This lets me run up to four AI coding agents on different branches simultaneously without them stepping on each other. I also added OpenTelemetry distributed tracing and a pyinstrument profiling middleware so I can actually see what’s slow and why. These aren’t player-facing features, but they’re what let a solo developer ship 243 commits in a quarter.
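The core of a setup like this is one `git worktree add` and one `docker run` per agent. The sketch below only builds the command lines and does not execute them, so it can be inspected safely. The image name `codex-dev:latest`, the `-agent-N` directory naming, and the `tail -f /dev/null` keep-alive are assumptions, not the project's actual script.

```python
import shlex
from pathlib import Path

MAX_AGENTS = 4  # up to four agents in parallel, per the post


def workspace_commands(repo: Path, branches: list[str],
                       image: str = "codex-dev:latest") -> list[list[str]]:
    """Build (but don't run) the commands for one isolated workspace per branch.

    Assumes each branch is new (git's -b flag creates it) and that `image` was
    built from the project's Dockerfile; both names here are hypothetical.
    """
    if len(branches) > MAX_AGENTS:
        raise ValueError(f"at most {MAX_AGENTS} parallel workspaces")
    commands = []
    for i, branch in enumerate(branches):
        tree = repo.parent / f"{repo.name}-agent-{i}"
        # A separate git worktree per branch: agents never share a working directory.
        commands.append(["git", "-C", str(repo), "worktree", "add", "-b", branch, str(tree)])
        # A separate container per worktree, with only that worktree mounted.
        commands.append([
            "docker", "run", "-d", "--name", f"agent-{i}",
            "-v", f"{tree}:/workspace", "-w", "/workspace",
            image, "tail", "-f", "/dev/null",  # keep the container alive for the agent
        ])
    return commands


if __name__ == "__main__":
    for cmd in workspace_commands(Path("/src/wizards-codex"), ["feat/combat-hud", "fix/session-auth"]):
        print(shlex.join(cmd))
```

Worktrees share one object database, so each extra workspace costs only a checkout rather than a full clone. Mounting only that worktree keeps each agent's file changes confined to its own branch.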
Theses About AI-Native Game Design
In a recent piece called “Why No AI Games?”, game designer Frank Lantz (founding chair of the NYU Game Center, creator of Universal Paperclips, and author of The Beauty of Games) makes an observation I haven’t been able to stop thinking about. It’s been five years since the current AI era kicked off, and despite all the hype, there hasn’t been a single genuinely groundbreaking AI-powered game. AI is transforming how games get made, sure, but where are the new experiences? He runs through the usual suspects (AI Dungeon, whose CEO I’ve met several times now, and a couple of viral party games) and, like me, finds them all underwhelming. His theory is that LLMs’ soft, stochastic logic just isn’t intrinsically fun the way physics engines and 3D renderers turned out to be. Simple deterministic rules produce fun through emergence; starting with a bunch of unpredictable complexity short-circuits the whole process.
I think he’s half right. He’s absolutely right that no one has nailed it yet, and he’s right that just wrapping an LLM in a game shell doesn’t work. But I don’t think the conclusion is that AI doesn’t belong in games: I think the problem is that people keep trying to use AI as the game, when they should be using AI inside a game. The technology isn’t the bottleneck. The design thinking is. Too many people building in this space want to reinvent the wheel without taking the time to learn the principles of fun that game designers have spent decades formalizing. The CEO of AI Dungeon himself told me they focus on hiring people from outside the games industry. And while I agree there’s a creative rot and ballooning operational complexity at AAA studios, the past twenty years have been a renaissance for indie games and small and medium-sized studios. There’s an enormous body of knowledge about what makes games work. We should be standing on the shoulders of giants, letting existing games and game design principles inform where AI fits best, not ignoring them. Over this past year of building, I’ve developed some strong convictions about what AI-native game design actually looks like, and I want to lay a few of them out here.
Thesis 1: LLMs are reasoning engines, not game engines
Years ago, when I first tried to play a chat-based RPG with ChatGPT, it was fun for about five minutes before the world started to decohere. Characters appeared and vanished. Locations shifted without explanation. I could talk my way into being king in a single prompt, or summon a t-rex with laser guns for arms. It had that surreal, dreamlike quality where you turn a corner and suddenly you’re somewhere completely different with no explanation. Every AI game I’ve tried since then, AI Dungeon included, hits the same wall eventually: no object permanence, no real consequences, and the LLM will say yes to anything. That sounds like freedom but it actually kills the experience. A game where you can do anything is a game where nothing matters.
The core issue is that people keep trying to use LLMs as the entire game. LLMs are semantic reasoning engines, though, not execution engines. They’re extraordinary at understanding player intent, generating coherent narrative, voicing believable characters, and making judgment calls in ambiguous situations. They are terrible at arithmetic, state tracking, and deterministic logic. If you ask an LLM to simultaneously be the narrator, the rules engine, the database, and the world simulator, it will fail at most of those jobs. My answer is: don’t make the LLM the game. Make the LLM a layer on top of proven game systems. In The Wizard’s Codex, tried and tested SRD mechanics handle combat math. A graph database tracks every entity, relationship, and location in the world. NumPy generates terrain from noise functions. The LLM sits on top of all of that and makes it narratively legible. It’s the difference between a chatbot pretending to be a game master and an actual game engine with an AI narrator.
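The layering can be shown in miniature. A deterministic function owns the numbers and the state, and the narrator only ever receives a list of settled facts. In this sketch, the `llm` argument is a stand-in for whatever model client is used: any function from a prompt string to text. The `Event` type and the prompt wording are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    actor: str
    verb: str
    target: str
    amount: Optional[int] = None
    damage_type: str = ""

    def describe(self) -> str:
        if self.amount is None:
            return f"{self.actor} {self.verb} {self.target}"
        return f"{self.actor} {self.verb} {self.target} for {self.amount} {self.damage_type} damage"


def apply_damage(hp: dict[str, int], attacker: str, target: str,
                 amount: int, damage_type: str) -> list[Event]:
    """Rules layer: owns the numbers and mutates game state deterministically."""
    hp[target] = max(0, hp[target] - amount)
    events = [Event(attacker, "hit", target, amount, damage_type)]
    if hp[target] == 0:
        events.append(Event(target, "was felled by", attacker))
    return events


def narrate(events: list[Event], llm: Callable[[str], str]) -> str:
    """Narration layer: the model sees settled facts and may only describe them."""
    facts = "\n".join(f"- {e.describe()}" for e in events)
    prompt = ("Retell these combat events as a short, vivid recap. "
              "Do not add, change, or contradict any fact.\n" + facts)
    return llm(prompt)
```

The model only receives outcomes the engine has already committed to. A narration that contradicts them is therefore a prompt-following failure that can be checked for, not a change to game state.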
This is where I actually agree with Lantz more than I disagree. He says simple deterministic rules produce fun through emergence, and he’s right. That’s been true since chess, since Go, and since Conway’s Game of Life. Games like Dwarf Fortress and RimWorld lean heavily into emergent storytelling: stories rise out of a simulation in which a set of deterministic rules generates unpredictable complexity. What LLMs unlock is the ability to take the data from a simulation and present it in a way that feels narratively meaningful.
When a dozen simple combat rules interact and a flanking maneuver, an oil slick, and a lightning spell combine to turn the tide of a fight, that’s the kind of unpredictable complexity that players have found fun and exciting for decades. When the system then narrates that sequence as a dramatic turning point in a battle, with the NPCs reacting in character, that’s something new. The interesting stuff is still coming from the rules. The LLM just makes it legible as a narrative.
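Here is a toy version of that kind of rule set, assuming a tile grid stored as a dictionary keyed by coordinates. The reaction table and the 2× lightning-on-wet multiplier are illustrative values, not the game's numbers, and poison clouds are omitted for brevity. The point is that a flood fill plus a three-entry table is already enough for one spark to ignite an entire oil slick.

```python
from collections import deque
from enum import Enum


class Surface(Enum):
    NONE = "none"
    OIL = "oil"
    FIRE = "fire"
    WATER = "water"
    ICE = "ice"


# Hypothetical reaction table: (surface on the tile, incoming element) -> new surface.
REACTIONS = {
    (Surface.OIL, "fire"): Surface.FIRE,
    (Surface.ICE, "fire"): Surface.WATER,
    (Surface.WATER, "cold"): Surface.ICE,
}


def apply_element(grid: dict[tuple[int, int], Surface],
                  start: tuple[int, int], element: str) -> int:
    """Apply an element to one tile; the reaction floods the connected patch of that surface.

    Returns how many tiles changed.
    """
    surface = grid.get(start, Surface.NONE)
    result = REACTIONS.get((surface, element))
    if result is None:
        return 0
    queue, seen = deque([start]), {start}
    while queue:
        x, y = queue.popleft()
        grid[(x, y)] = result
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in seen and grid.get(nxt) is surface:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


def damage_multiplier(damage_type: str, statuses: set[str]) -> float:
    """Hypothetical rule: lightning deals double damage to wet targets."""
    return 2.0 if damage_type == "lightning" and "wet" in statuses else 1.0
```

None of these rules knows about the others. The combined outcome (a lit slick, a melted ice sheet, an electrocuted wet goblin) emerges from composing them.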
Thesis 2: Don’t ask the LLM to be creative. Constrain it into creativity.
If you ask an LLM to “generate a fantasy town,” you will get the expected value over all fantasy towns ever written. You’ll get cobblestone streets, a bustling marketplace, a mysterious old woman in a cloak, and a tavern called something like The Silver Stag. It’s not bad, exactly. It’s just the average. It’s the town that would emerge if you melted down every fantasy novel ever published and poured the resulting slurry into a mold.
But what if, before the LLM ever sees the prompt, you’ve already accumulated a rich set of constraints about this town? The player described a world with harsh winters and a dominant theocracy during world creation. The noise-based map generator produced specific values for this tile’s elevation, temperature, rainfall, and soil quality. And over the course of gameplay, a war was fought in this region, a faction collapsed, and refugees settled here. All of that becomes conditioning context. I think of it in explicitly statistical terms:
E[town] ≠ E[town | the world’s lore]
The unconditioned town is generic slop. The conditioned town is forced to be specific, because it has to reconcile with everything the player established, everything the procedural systems generated, and everything that’s happened in the game so far. And the second one is vastly more interesting.
In artistic fields, people say that limitations are the key to creativity. Poets choose to write sonnets. Filmmakers work within budgets. A painter limits their palette. I apply the same principle to LLMs. Before I ever ask an LLM to describe a location, I’ve already procedurally generated a mountain of data about it. The noise-based map gives me elevation, temperature, and rainfall. From those I derive biomes, soil quality, and resource availability. The entity graph tells me which faction controls this region, what religion is practiced here, what trade routes pass through, and what wars were fought nearby. Then I hand all of that to the LLM and say: make this narratively cohesive. The LLM doesn’t get to invent a generic fantasy town. It has to explain why this specific place, with these specific stats, in this specific corner of this specific world, is the way it is. The procedural generation is the creative constraint, and the LLM is a narrative smoothing function that weaves the data into something that feels like it was authored.
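The pipeline described above (noise fields first, then derived biomes and soil) can be sketched with NumPy. This sketch uses simple value noise with smoothstep interpolation rather than whatever noise function the project actually uses; Perlin and simplex noise are common choices. Every threshold and weight here is illustrative.

```python
import numpy as np

OCEAN, MOUNTAIN, TUNDRA, DESERT, FOREST, GRASSLAND = range(6)


def value_noise(shape: tuple[int, int], cells: int, rng: np.random.Generator) -> np.ndarray:
    """Smoothly interpolate a coarse (cells+1)x(cells+1) lattice of random values."""
    h, w = shape
    lattice = rng.random((cells + 1, cells + 1))
    ys = np.linspace(0, cells, h, endpoint=False)
    xs = np.linspace(0, cells, w, endpoint=False)
    y0, x0 = ys.astype(int), xs.astype(int)
    ty, tx = (ys - y0)[:, None], (xs - x0)[None, :]
    ty, tx = ty * ty * (3 - 2 * ty), tx * tx * (3 - 2 * tx)  # smoothstep easing
    a, b = lattice[np.ix_(y0, x0)], lattice[np.ix_(y0, x0 + 1)]
    c, d = lattice[np.ix_(y0 + 1, x0)], lattice[np.ix_(y0 + 1, x0 + 1)]
    top, bottom = a + (b - a) * tx, c + (d - c) * tx
    return top + (bottom - top) * ty


def fractal_noise(shape, rng, base_cells=4, octaves=4):
    """Sum octaves of value noise (finer lattice, smaller amplitude), normalized to [0, 1)."""
    total = np.zeros(shape)
    norm = 0.0
    for i in range(octaves):
        amp = 0.5 ** i
        total += amp * value_noise(shape, base_cells * 2 ** i, rng)
        norm += amp
    return total / norm


def classify_biomes(elevation, temperature, rainfall):
    """First matching rule wins; all thresholds are illustrative."""
    return np.select(
        [elevation < 0.35, elevation > 0.75, temperature < 0.3, rainfall < 0.3, rainfall > 0.6],
        [OCEAN, MOUNTAIN, TUNDRA, DESERT, FOREST],
        default=GRASSLAND,
    )


def generate_tiles(shape=(64, 64), seed=0):
    rng = np.random.default_rng(seed)
    elevation = fractal_noise(shape, rng)
    # Higher ground runs colder (hypothetical lapse factor of 0.4).
    temperature = np.clip(fractal_noise(shape, rng) - 0.4 * (elevation - 0.5), 0.0, 1.0)
    rainfall = fractal_noise(shape, rng)
    biome = classify_biomes(elevation, temperature, rainfall)
    # Soil is best with plenty of rain at mild temperatures, and absent at sea.
    soil = np.where(biome == OCEAN, 0.0, rainfall * (1.0 - np.abs(temperature - 0.6)))
    return {"elevation": elevation, "temperature": temperature,
            "rainfall": rainfall, "biome": biome, "soil": soil}
```

Every tile ends up with a bundle of concrete numbers (elevation, temperature, rainfall, biome, soil). That bundle is the kind of conditioning data that can be handed to the LLM alongside the graph context.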
This pattern shows up everywhere in the project. Early on, when I let the LLM name characters, I got worlds with Sylvie, Sylvester, and Sylvia living in the same village. The model fixates on a high-probability token prefix and just rides it. So I built a dataset of over 300,000 names, pre-generated per species and gender, and now I just sample from it. Much more variety than any LLM would produce on the fly. For continent names, I use a hybrid: procedural generation produces a phonetic “seed” and then an LLM pass customizes it to fit the world’s flavor. The principle is always the same. Generate the low-probability, high-variance data using procedural methods, then use the LLM to smooth it into narrative. Don’t ask the LLM to be random. It’s not good at random. It’s good at making things make sense.
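Here is a minimal sketch of both ideas, with a tiny hypothetical name table standing in for the 300,000-name dataset. Within one settlement, it skips any name that shares a four-letter prefix with a name already chosen. This is one simple way to rule out a Sylvie/Sylvia/Sylvester village. The syllable inventories for the phonetic seed are also invented for the example.

```python
import random

# Tiny stand-in for the real pre-generated name dataset (hypothetical entries).
NAMES = {
    ("elf", "female"): ["Sylvie", "Sylvia", "Aerin", "Thalia", "Miravel", "Ysolde"],
    ("dwarf", "male"): ["Brom", "Bromli", "Durgan", "Keld", "Orrin", "Thrain"],
}


def sample_names(species: str, gender: str, k: int, rng, prefix_len: int = 4) -> list[str]:
    """Sample k names for one settlement, skipping any that share a prefix with one already chosen."""
    pool = NAMES[(species, gender)][:]
    rng.shuffle(pool)
    chosen, prefixes = [], set()
    for name in pool:
        prefix = name[:prefix_len].lower()
        if prefix in prefixes:
            continue
        chosen.append(name)
        prefixes.add(prefix)
        if len(chosen) == k:
            return chosen
    raise ValueError("not enough distinct names for this settlement")


# Hypothetical syllable inventories for continent-name seeds.
ONSETS = ["k", "th", "v", "dr", "s", "m", "gr"]
VOWELS = ["a", "e", "i", "o", "u", "ae"]
CODAS = ["", "n", "r", "th", "l", "sk"]


def phonetic_seed(rng, syllables: int = 3) -> str:
    """Low-probability raw material; an LLM pass would then adapt it to the world's flavor."""
    return "".join(rng.choice(ONSETS) + rng.choice(VOWELS) + rng.choice(CODAS)
                   for _ in range(syllables)).capitalize()


if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_names("elf", "female", 3, rng), phonetic_seed(rng))
```

The randomness lives entirely in cheap procedural code. Only the final seed would go to the LLM, which is asked to make it fit rather than to invent it.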
The entity graph amplifies this even further. A building isn’t generated in isolation. It knows about the town it’s in, the town knows about its region, the region knows about its continent, and the continent knows about the world. Using the graph, I can traverse relationships to assemble a generation context that captures the relevant lore surrounding whatever I’m about to create. A tavern in a coastal fishing village controlled by a theocratic faction will feel completely different from a tavern in a mountain mining town run by a merchant guild, even if the underlying generation prompt is identical. The difference is entirely in the conditioning context. That’s what makes the world feel interconnected instead of randomly assembled.
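The traversal can be sketched over a plain dictionary standing in for the graph database. The entity ids, relation names (`located_in`, `controlled_by`, `trade_route`), and descriptions below are hypothetical. The function walks up the containment chain and, at each level, pulls in directly related entities such as the controlling faction.

```python
# Hypothetical miniature entity graph: id -> (description, {relation: [target ids]}).
GRAPH = {
    "tavern": ("A tavern (the entity being generated)", {"located_in": ["saltmere"]}),
    "saltmere": ("Saltmere, a coastal fishing village",
                 {"located_in": ["gull_coast"], "controlled_by": ["dawn_church"]}),
    "gull_coast": ("The Gull Coast, a windswept region",
                   {"located_in": ["varnholm"], "trade_route": ["amber_road"]}),
    "varnholm": ("Varnholm, a continent with harsh winters", {"located_in": ["world"]}),
    "world": ("A world dominated by a theocracy", {}),
    "dawn_church": ("The Church of the Dawn, a theocratic faction", {}),
    "amber_road": ("The Amber Road, a trade route", {}),
}


def containment_chain(entity: str) -> list[str]:
    """Follow located_in edges upward: building -> town -> region -> continent -> world."""
    chain = [entity]
    while True:
        parents = GRAPH[chain[-1]][1].get("located_in", [])
        if not parents or parents[0] in chain:  # stop at the root (or on a cycle)
            return chain
        chain.append(parents[0])


def generation_context(entity: str) -> str:
    """Collect each ancestor plus its directly related, non-containment neighbors."""
    lines, seen = [], set()
    for node in containment_chain(entity):
        description, edges = GRAPH[node]
        if node not in seen:
            lines.append(f"- {description}")
            seen.add(node)
        for relation, targets in edges.items():
            if relation == "located_in":
                continue  # the chain itself already covers containment
            for target in targets:
                if target not in seen:
                    lines.append(f"- ({node} {relation} {target}) {GRAPH[target][0]}")
                    seen.add(target)
    return "\n".join(lines)
```

Swap the village's `controlled_by` edge to a merchant guild and the same generation prompt receives a different context. That difference in context is what the paragraph above attributes the difference in output to.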
I have a future idea I’m excited about that pushes this even further. Right now, the graph traversal pulls in related entities for context. But what if, occasionally, the system randomly sampled an unrelated entity from somewhere else in the world and injected it into the generation context? A sort of serendipity engine. Maybe the generation of a blacksmith in a remote village gets a random injection of context about a prophesied magical crystal from the other side of the world, and now this blacksmith has heard rumors about it, or has a fragment of one in his workshop. The LLM would have to reconcile the connection, and in doing so, it would create the kind of surprising cross-world link that makes a fictional world feel like it was crafted by a real author with a grand plan. Right now this is just an idea, but it’s the kind of thing that gets me most excited about this project.
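Since the serendipity engine is still an idea, the following is only a sketch of one way it could work. With a small probability, it picks one entity from outside the assembled context and appends it to the prompt, along with an instruction to connect it. The probability `p`, the function names, and the prompt wording are all hypothetical.

```python
import random
from typing import Optional


def pick_wildcard(context_ids: set[str], all_ids: list[str], rng, p: float = 0.1) -> Optional[str]:
    """With probability p, return one entity id from outside the assembled context."""
    if rng.random() >= p:
        return None
    outsiders = [e for e in all_ids if e not in context_ids]
    return rng.choice(outsiders) if outsiders else None


def build_prompt(task: str, context_lines: list[str], wildcard_lore: Optional[str]) -> str:
    prompt = task + "\n\nContext:\n" + "\n".join(context_lines)
    if wildcard_lore is not None:
        prompt += ("\n\nUnrelated lore from elsewhere in the world. If it fits, "
                   "find a plausible, subtle connection:\n" + wildcard_lore)
    return prompt


if __name__ == "__main__":
    ids = ["blacksmith", "remote_village", "prophesied_crystal"]
    print(pick_wildcard({"blacksmith", "remote_village"}, ids, random.Random(3), p=0.5))
```

Keeping `p` small matters: if every generation got a wildcard, the cross-world links would stop feeling surprising.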
Thesis 3: Be opinionated. More Minecraft than Roblox.
When AI can generate anything, the hardest problem becomes choosing what not to do. It’s a kind of painter’s block. The canvas is infinite, every color is available, and you’re paralyzed by possibility. I think this is why so many AI game companies end up building platforms instead of games. If the tech can handle any setting, any genre, any story, why constrain it? Build a general-purpose narrative engine and let users do whatever they want. That’s the Roblox approach. It sounds smart on paper, but I think it’s a trap. Roblox is an incredible platform, but nobody talks about Roblox the way people talk about Minecraft. Minecraft is a game. It has a point of view. It made choices.
Think about the creeper from Minecraft. What even is a creeper? It’s a weird green thing that silently walks up to you and explodes. It doesn’t come from any mythology or existing genre convention. Zombies and skeletons, sure, every survival game has those. But a silent exploding cactus monster? That’s a specific, opinionated, and borderline risky design choice. And now the creeper face is one of the most recognizable 20-pixel patterns in the entire world. That’s what happens when designers make choices instead of hedging. Minecraft is full of choices like this. It committed to blocky voxel graphics when everyone else was chasing photorealism. It committed to a crafting system, a day/night cycle, and a specific set of biomes. Each of those choices closed doors, and each one made the game more distinctive because of it.
I try to apply the same thinking to The Wizard’s Codex. It’s a fantasy setting, full stop. A noise-based procedural map constrains the game to a specific type of world; we can’t do space operas or interdimensional travel (though I might add alternate planes of reality in the future, our own version of the Nether perhaps), but locking into our world type lets us make a cool 3D map that adds a lot to the game. SRD mechanics give the game real pushback: you can fail, you can die, your skill checks can go badly, and that matters way more than letting the LLM vibe its way through consequences. I chose grid-based tactical combat over theater of the mind, which meant investing heavily in one system rather than offering a shallow version of everything. And the pixel art aesthetic took weeks to build, but it creates a hand-crafted harness around the AI-generated illustrations so the overall experience still feels curated and intentional. Every one of these was a scope reduction that closed off possibilities, and every one of them has ended up feeling irreplaceable. These constraints are where taste shows through.
I think taste and curation matter more in the AI era, not less. When anyone can generate a fantasy world with a single prompt, the thing that separates a memorable game from forgettable slop is the evidence that a human made deliberate choices about what to include and what to leave out. Anyone can buy ingredients at the store and cook a fancy meal at home. People still pay to eat at restaurants, and the best restaurants aren’t selling access to food, they’re selling curation. You’re paying to taste a specific recipe dreamt up by someone who has spent years obsessing over how the flavors work together. I want The Wizard’s Codex to feel like that. The pixel art theme I spent weeks building isn’t just cosmetic. It sends a message: someone cared about this. Those human touches are the differentiator now.
Thesis 4: It has to be a game, not a toy
There’s a massive difference between an LLM that simulates a world and a game. I actually love simulations. Many of my favorite games lean heavily into simulation: Dwarf Fortress, RimWorld, Mount and Blade, Civilization. But we can’t rely on the emergent unpredictability of a simulation alone to keep people entertained in the long term. People need an expressive interface through which to interact with the simulation and impose their will on it. And even in a game where you can do anything, we need scaffolding to help players latch onto story hooks, to give them a reason to care about what’s happening. Without friction and failure states, without the need to actually strategize, you’ll poke at the simulation for a while and then move on. I’ve watched people try ChatGPT RPGs and AI Dungeon, and the pattern is always the same: ten minutes of wonder, then a slow realization that nothing they do matters because the system will accommodate anything. That’s a toy. Toys are fun for a bit, but games are the things people sink hundreds of hours into, because games push back. When Lantz says that LLMs’ soft logic “short-circuits” the emergence that makes games fun, I think he’s describing exactly this problem. If the LLM is the whole game, there’s no resistance, no structure for interesting decisions to emerge from.
The Wizard’s Codex pushes back. Skill checks resist your intentions: you can try to lie to the guard, but if your charisma is low and you roll badly, you’re getting arrested. Combat has real death saving throws, and the enemies are actually trying to win. NPCs have a patience system that decays over the course of a conversation; a stranger in a tavern isn’t going to be open to you interrogating them on their personal details, unless they’re particularly extroverted or you’ve worked up to a meaningful relationship with them. Character creation constrains your starting power level so that progression actually means something, because if you can start as the king then there’s nowhere to go. These systems exist to create the friction that makes choices feel meaningful. The LLM handles what it’s good at: improvising dialogue, narrating consequences, giving characters a voice. But the game engine is what makes those moments matter, because the game engine is what makes you earn them.
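As an illustration of that kind of friction, here is a hypothetical patience model in Python. Each question costs patience, personal questions cost more, and both extroversion and relationship strength stretch how far that patience goes. The formula and weights are invented for the example; they show the shape of the mechanic, not the game's tuning.

```python
from dataclasses import dataclass


@dataclass
class ConversationPatience:
    extroversion: float   # 0.0 (reserved) .. 1.0 (chatty)
    relationship: float   # 0.0 (stranger) .. 1.0 (close friend)
    patience: float = 10.0

    def cost(self, intimacy: float) -> float:
        """Cost of one question; personal questions (intimacy near 1.0) cost more."""
        base = 1.0 + 4.0 * intimacy
        # Extroverts and close friends tolerate more (hypothetical weights).
        tolerance = 1.0 + self.extroversion + 2.0 * self.relationship
        return base / tolerance

    def ask(self, intimacy: float) -> bool:
        """Spend patience on a question; returns False once the NPC disengages."""
        self.patience = max(0.0, self.patience - self.cost(intimacy))
        return self.patience > 0.0
```

Under these made-up numbers, a reserved stranger shuts down after two personal questions, while a close, chatty friend can field several. The LLM then voices whichever reaction the numbers produce.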
What’s Next
The immediate priorities are straightforward: finish the combat frontend, build character creation, and launch the website so people can actually play. Even if it’s a pre-alpha build that is feature-incomplete and breaks often at first, I need to get it into people’s hands. I’ve been heads-down building systems for a long time now, and I’m confident in the engine I’ve put together. But confidence in the engine is not the same as knowing the game is fun. With a system this complex, there are a lot of knobs to tune and a lot of interactions between systems that will only reveal themselves through real play. Finding the fun is going to be an iterative process, a little like panning for gold. I know it’s in there, but it’s going to take work to filter out what works best from what just works.
Last spring there was only intuition, a gut feeling that there was land beyond the horizon. Now I can see the coastline clearly in my spyglass, and it’s full speed ahead. The project has gone from an amorphous idea to something with real shape and real momentum. And the thing that gave it shape, more than anything, was choosing what to leave out. Every time I committed to a specific choice and closed off other possibilities, the project got better. That’s perhaps the through-line of everything I’ve learned this year: in a world where AI can generate anything, the most important skill is knowing what not to build.