Hex and The Wizard’s Codex [Ryan’s CS199 Final Reflection]

Introduction

This was my first, and unfortunately last, independent study. To say that I’m only a little disappointed that I waited until my final quarter here at Stanford to have such a fulfilling learning experience would be a considerable understatement. Yet, at the same time, I’m grateful that this could act as the informal capstone of my 5 years at this institution, a delightful and impactful culmination of my undergraduate and coterm journey. In many ways, CS199 felt like a fitting conclusion to my time here: a chance to combine my personal interest in games with my academic interest in computer science, packaged neatly into one project or, as it turned out, two.

Initial Ideation

Games have been a recurring theme for me throughout my coterm year — from CS377G in the fall to CS247G in the spring, I’ve spent a large majority of the year studying, analyzing, designing, and teaching games. As a result, it should’ve come as no surprise that I entered this course wanting to build something that would let me play more of them. Initially, I was considering two main project ideas: the first being an AI-powered rulebook assistant designed to help players learn games faster, answer rules questions, and reduce onboarding time and friction; the second being a locally-run AI-narrated Dungeons & Dragons-esque experience that functioned as an endlessly branching interactive story that could be played from anywhere on a phone or laptop.

At first, the second idea was particularly enthralling to me. Much of my previous work, including Anchor and an earlier attempt at building a lightweight Telltale-style game in Unity, was interactive fiction built around choice-based narratives. The idea of creating a program capable of generating unique and novel adventures for each player felt like a natural extension of those interests.

However, after spending a week and a half experimenting with local LLMs through Ollama on both my laptop and my outdated gaming PC, I was quickly forced to recognize the practical and computational limitations of this idea. Neither machine had enough computing power to adequately run a local model at the scale this program needed, and building an AI game master capable of maintaining world state, narrative consistency, and meaningful player agency proved to be much more demanding than I originally anticipated. I wasn’t giving up on this project, but I decided to shelve it until I got back to my more powerful gaming computer at home (spoiler alert: I wouldn’t have to wait that long!).

As a result, I pivoted to the rulebook assistant. There were two main allures here: the first was my personal desire to finally tackle the growing collection of board games I’ve amassed through Facebook Marketplace but haven’t yet learned; the second was its applicability to CS247G and CS377G, where students play a multitude of new games every class and the learning objective often lies not in the rulebook but in the gameplay itself. In this environment, reducing onboarding time seemed useful both personally and academically!

Hex, The Rulebook Assistant

Needfinding and Competitive Analysis:

Working on Hex (my rulebook assistant named after the magic-powered supercomputer in Discworld) was a significant learning opportunity. At the beginning of this independent study, before building anything, I wanted to better understand the people I was designing for. Initially, my preconceived notion was that this would be largely experienced tabletop gamers. However, the more I thought about it, the more I realized that the overarching goal of this project was not to simply help existing players, but to help more people enter the tabletop gaming community. As such, I conducted needfinding not only with my Dungeons & Dragons group back home and avid tabletop boardgamers on campus, but with people who actively avoided tabletop games or found them too intimidating to even get into. Across around 10 interviews, several themes emerged. I’ve distilled them here: 

  • Reliable edge-case resolution with citations to the original rules.
  • Definitive answers rather than speculative interpretations.
  • Assistance decoding lengthy or poorly written rulebooks.
  • Beginner-friendly onboarding.
  • Round-by-round guidance during gameplay.
  • Resolution of disputes that arise at the table.
  • Identification of frequently misunderstood rules.
  • Strong safeguards against fabricated information.
  • The absolute minimum amount of information required to begin playing.

Some interviewees suggested additional features like strategy recommendations, FAQ integration, and house-rule management, but I quickly realized I couldn’t build a project that satisfied every single person’s request — it would be a monstrosity. Instead, I focused on the throughline of the interviews: reducing confusion, supporting onboarding, and providing trustworthy answers during gameplay. From an entrepreneurial perspective, this was a particularly fun phase since it set the groundwork for the entire project and allowed me to hone the skill of identifying underlying needs rather than designing around individual feature requests.

Boardgameassistant.ai, one of the published sites, seems to no longer be running.

After the interviews, I conducted a competitive analysis of the existing market for rulebook assistants and AI-supported tabletop tools. Surprisingly, the space was relatively sparse. While a handful of projects exist, many appear abandoned or support only a limited selection of games. More interestingly, discussions across Reddit and board game forums revealed a common disdain for and distrust of AI-generated rules assistance due to hallucinations, and I knew this was something I would need to overcome. This insight became one of the most important influences on my technical direction. Rather than creating a generalized assistant with access to a wide corpus of information, I wanted a system trained and isolated exclusively on uploaded rulebooks. Every answer needed to be traceable to the source material, and the system needed mechanisms to avoid hallucinations when the evidence was sparse or inconclusive.

This fundamentally altered my assumptions about the project. Going in, I expected the largest challenge to be the capability of my model and whether it would adequately answer questions. However, after talking with users, I realized the larger core issue was trust. Even a highly capable system would be unusable if players didn’t believe the answers it provided or didn’t have a method to audit its responses. This reinforced my decision to pursue a RAG-based architecture.

The nitty gritty and building Hex:

When building Hex, I deliberately chose not to use a prepackaged RAG framework. While I’m aware that existing solutions would’ve significantly sped up my process, I wanted to understand the mechanics of retrieval systems firsthand, especially given their prevalence within the tech industry. What better way to learn about RAG than to study it and build it from the ground up? This meant building the ingestion pipeline, embedding workflow, and retrieval logic myself. To understand how this would function, I familiarized myself with the relevant RAG literature, including material from Databricks and IBM and Singh et al.’s paper on agentic RAG. The resulting system was built using Next.js, Supabase, pgvector, Cohere embeddings, and OpenAI models for both language and vision processing.
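To make those three pieces concrete, here is a minimal Python sketch of an ingest, embed, and retrieve loop. It is not Hex’s code: the `embed` function is a toy word-count stand-in for a real embedding model (Hex used Cohere embeddings), a plain list stands in for pgvector, and every name below is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    game: str
    page: int
    text: str
    vector: Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(game: str, pages: list[str]) -> list[Chunk]:
    """One chunk per page keeps the sketch short; real chunking is usually finer-grained."""
    return [Chunk(game, i + 1, text, embed(text)) for i, text in enumerate(pages)]


def retrieve(index: list[Chunk], game: str, question: str, k: int = 2) -> list[tuple[float, Chunk]]:
    """Score only chunks from the selected game and return the k most similar."""
    q = embed(question)
    scored = [(cosine(q, c.vector), c) for c in index if c.game == game]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Usage: two made-up rulebook pages, one question.
index = ingest("Coup", [
    "Each player starts with two coins and two influence cards.",
    "To assassinate, pay three coins and choose a player.",
])
top_hits = retrieve(index, "Coup", "How many coins do I start with?", k=1)
```

Filtering by `game` before scoring mirrors the idea of keeping each rulebook’s corpus isolated, and keeping the page number on every chunk is what makes a citation possible later.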

One of the first challenges I faced involved document ingestion. Coming into the project, I assumed all rulebooks could be handled fairly simply. They are, after all, just text documents, right? … Right? The answer, unfortunately, was not that simple. Some rulebooks contained clean text layers, while others were effectively a collection of images, diagrams, and scanned pages. What initially seemed like a fairly easy retrieval process quickly became a document-understanding problem. Supporting the variety of rulebooks that exist forced me to think beyond traditional text-based retrieval systems and incorporate OCR and vision-based processing. More importantly, it also reminded me how easy it is to accidentally design for idealized data rather than the reality users actually face.
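One way to picture the ingestion decision is as a per-page router. The Python sketch below is only an illustration under assumed heuristics: the `Page` fields, the 200-character threshold, and the three pipeline labels are all made up for this example and are not how Hex necessarily decides.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    extracted_text: str  # whatever the PDF text layer yielded, possibly empty
    image_count: int     # number of embedded images found on the page


def choose_pipeline(page: Page, min_chars: int = 200) -> str:
    """Return 'text', 'ocr', or 'text+vision' for a page (the threshold is a guess)."""
    has_text = len(page.extracted_text.strip()) >= min_chars
    if not has_text:
        return "ocr"          # scanned or flattened page: no usable text layer
    if page.image_count > 0:
        return "text+vision"  # readable text plus diagrams worth describing
    return "text"
```

A text-heavy rulebook would mostly route to `text`, while a scanned one would fall through to `ocr` on nearly every page.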

Example of the differing styles of rulebooks available. This Coup rulebook is largely textual, whereas the King of Tokyo rulebook has a lot of diagrams and flattened text from a scanned image.

This realization became especially apparent during playtests. When I first tested my prototype on games like Coup, the system performed surprisingly well. I’d never played Coup before, but I was able to get the game up and running in under 5 minutes. At that point, I felt as though I’d largely solved my retrieval challenge. Further playtests with Secret Hitler similarly bolstered my confidence that my system was in good shape. That confidence crumbled when I playtested with Cartographers.

Response from earlier Hex iteration that couldn’t return scoring card information.

From a retrieval perspective, the response was technically correct: the retrieved text didn’t explicitly describe the appearance of the scoring cards. But from a player perspective, the answer was obviously unsatisfying because the rulebook clearly showed what the cards looked like. Looking back now, this was a really important moment, since it forced me to see my program as a system that needed to understand the multiple ways games communicate information to players.

More recent image showing the response to a question about what scoring cards look like. The response now describes them and returns a screenshot of the relevant visual information.

Another major lesson came from my focus on trust. One of the strongest themes from both my competitive research and interviews was skepticism toward AI-generated rulings, and having tried to use AI systems to moderate games before, I understand why. So rather than optimizing for broad knowledge of games and conversational flexibility, I optimized for trust. That choice shows up in many of my architectural decisions: game-specific document corpora, source citations, screenshot grounding, and a greater propensity for the system to decline to answer when it has insufficient evidence to ground a response. Moreover, I built a dedicated gameplay interface for the website where players can read the rulebook while simultaneously interacting with the assistant. This required implementing a two-page rulebook viewer, persistent conversations, screenshot-based queries, page caching, prefetching, and a docked assistant interface.
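The “decline when evidence is thin” behavior can be sketched as a small gate that sits between retrieval and generation. This Python example is an assumption-laden illustration, not Hex’s implementation: the `Evidence` shape, the 0.35 score cutoff, and the response dictionary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    page: int
    text: str
    score: float  # similarity between the question and this passage, 0..1


def build_response(evidence: list[Evidence], min_score: float = 0.35, min_hits: int = 1) -> dict:
    """Pass grounded context to the model only when enough strong passages exist."""
    strong = [e for e in evidence if e.score >= min_score]
    if len(strong) < min_hits:
        # Refuse rather than guess: nothing in the rulebook supports an answer.
        return {"answered": False, "message": "I couldn't find this in the rulebook.", "citations": []}
    strong.sort(key=lambda e: e.score, reverse=True)
    return {
        "answered": True,
        "context": "\n".join(e.text for e in strong),  # the only text the model may draw on
        "citations": [e.page for e in strong],         # page numbers shown to the player
    }
```

The cutoff trades coverage for trust: raising `min_score` produces more refusals but fewer answers that the player cannot trace back to a page.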

How the interface functions / looks now.

By the end of the quarter, Hex had become a much larger learning opportunity than I’d originally anticipated. My understanding of AI systems and products has grown, but more importantly, I learned that one of the most difficult parts of building a system is creating something that people trust enough to even ask and use in the first place. Had I had more weeks within the quarter, I would’ve loved to smooth out the user interface and make it look more professional. Right now, the product is geared around functionality rather than a pleasurable user experience. Future changes would include redesigning the UI, and further building out my model’s abilities to a point that I would readily release this to the world. Though I worked on this for a quarter, this is a project I plan to continue and hone such that I, and many others, can delve deeply and quickly into board games!

The Wizard’s Codex

(First, I wanted to acknowledge this is getting a bit long, so I’ll try to keep this brief!)

This was an unanticipated detour. While working on Hex, I also got wrapped into working on Luciano’s CS399 project, The Wizard’s Codex. Thank you, Christina, again, for introducing the two of us. This has been one of the most entertaining and fruitful learning experiences I’ve had this quarter! While I originally anticipated that joining this project would mean pushing a lot of code and building out a ton of features, this was not the case. This was a humbling opportunity to learn from someone who has not only worked in industry for years, but has also been working on this project for a year. Rather than building an isolated project from scratch, I found myself learning to understand the existing codebase and the game’s overarching architecture. I met with Luciano on a near-weekly basis, having marathon meetings where we would spend hours talking about the game’s architecture, product decisions, potential features, AI workflows, and implementation tradeoffs. Through these meetings, I learned a significant amount about professional software development that I’d never been exposed to before. I was taught environment management through Miniconda, feature-branch workflows, collaborative development practices (like making staging branches and how to pull and push with git in accordance with industry standards), how AI-assisted engineering works in practice, product management tooling through Linear and how it connects to Claude Code, and the realities of extending a large existing codebase without breaking existing code.

Before making any tangible contributions, I spent substantial time understanding the system: the way it organized locations, factions, and relationships as node graphs, how it masked backend calculations by running them in parallel with other processes, and much more. Unlike most academic projects, where I typically delve into development immediately, this experience taught me that contributing to a production codebase often requires extensive onboarding and architectural understanding first. Additionally, I ran in-depth playtests to better understand how the system runs, as well as to note any potential flaws or bugs, given that Luciano told me he hadn’t playtested the product for a long period yet. Once I felt I had a good grasp on the code and the experience, and following an attempted implementation of the inventory mechanic, I assumed ownership of two major features: World Studio and Discover Worlds. These features were isolated from the code Luciano was working on, so I felt more comfortable taking them on, confident that I couldn’t break any of his work from there.

The three options to build your world: Guided Creation, World Studio, or Discover Worlds.

World Studio was originally a grayed-out stub, so this was an opportunity for me to own something from the ground up. This feature introduced a structured alternative to The Wizard’s Codex’s conversational worldbuilding process (Guided Creation) and allowed players to fine-tune their world more precisely through a guided interface that included geography type, personalizable factions, nations, magic and religion, lore, narration style, and central conflicts. I wanted to give full creative ownership to the player: an opportunity for them to craft their experience, from the laws of the world, to the people they interact with, to the way their story is conveyed to them. One of the most interesting learning experiences in this segment was understanding how information moves through the generation pipeline. The frontend collected a wide range of user inputs, from sliders to dropdowns to text fields, while the backend generation system expected a very particular input format. With Claude Code now omnipresent, the challenge was less about writing code and more about understanding the interfaces between systems. Spending time tracing how user choices were reflected in the narrative and world generation, coupled with what I’ve gleaned from Luciano, has given me a much deeper appreciation for system design and data flow, and for how they impact the user experience.
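The frontend-to-backend handoff described above is essentially a translation layer. The Python sketch below shows the general shape of one, under loud assumptions: the `StudioForm` fields, the 0 to 100 slider range, and the request keys are invented for illustration and do not reflect The Wizard’s Codex’s actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class StudioForm:
    geography: str                                     # dropdown value
    magic_level: int                                   # slider, assumed 0-100
    factions: list[str] = field(default_factory=list)  # free-text entries
    narration_style: str = "neutral"                   # dropdown value
    conflict: str = ""                                 # text field


def to_generation_request(form: StudioForm) -> dict:
    """Validate and normalize raw form state into the shape a generator might expect."""
    if not 0 <= form.magic_level <= 100:
        raise ValueError("magic_level must be between 0 and 100")
    factions = [name.strip() for name in form.factions if name.strip()]
    return {
        "geography": form.geography.strip().lower(),
        "magic": {"prevalence": form.magic_level / 100},  # slider mapped to 0.0-1.0
        "factions": [{"name": name} for name in factions],
        "narration_style": form.narration_style,
        "central_conflict": form.conflict.strip() or None,  # blank field becomes None
    }
```

Keeping the normalization in one function means a new slider or dropdown only has to be mapped in one place before it reaches the generation pipeline.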

My completed world studio. Users can visualize what tangible changes their design choices will have on their experience. Customizability is also present in having the ability to explicitly define what Factions/Nations/Religion/Magic systems you want in your world!

This experience also drastically changed how I think about AI-assisted development. Tools like Claude make it easy to implement and push features quickly, but don’t remove the baseline need to understand architecture. It was only through Luciano’s tutelage that I learned crucial aspects of system design, such as hiding long processes in parallel with other ones to improve the user experience. This wasn’t something Claude was going to come up with, because Claude is focused around pushing code, not curating an overall seamless experience for a user. That’s still something only a person can do.
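To illustrate the idea of hiding a long process behind something the player is already watching, here is a small Python `asyncio` sketch. It is a generic illustration rather than the project’s code: `generate_world`, `play_intro`, and the sleep durations are hypothetical stand-ins for a slow backend call and an intro sequence.

```python
import asyncio


async def generate_world(log: list[str]) -> str:
    log.append("generation started")
    await asyncio.sleep(0.05)  # stands in for a slow model or backend call
    log.append("generation finished")
    return "world-ready"


async def play_intro(log: list[str]) -> None:
    log.append("intro started")
    await asyncio.sleep(0.02)  # stands in for intro text or animation the player sees
    log.append("intro finished")


async def start_adventure(log: list[str]) -> str:
    task = asyncio.create_task(generate_world(log))  # start the slow work right away
    await play_intro(log)                            # the player is occupied meanwhile
    return await task                                # wait only for whatever time remains


events: list[str] = []
result = asyncio.run(start_adventure(events))
```

Run sequentially, the player would wait for the intro and then for the full generation; overlapped like this, the perceived wait is only the portion of generation that outlasts the intro.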

The second feature, Discover Worlds, focused on world exploration rather than creation. Inspired by Spotify’s ‘Discover Weekly’, I wanted to make a curated list of the top 5-6 worlds that matched a user’s play preferences, while also allowing them to browse all available worlds. Users can browse these curated worlds, preview them, edit them, or launch these adventures directly from the interface. Architecturally, one of the most important design decisions was making sure this feature flowed into the same underlying world-generation pipeline as World Studio and Guided Creation.
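A curated list like this can be approximated with a simple overlap score. The Python sketch below ranks worlds by Jaccard similarity between their tags and a player’s preferences; the tag-based model, the `World` type, and the sample data are assumptions made for illustration, not the feature’s actual matching logic.

```python
from dataclasses import dataclass


@dataclass
class World:
    name: str
    tags: set[str]


def curate(worlds: list[World], preferences: set[str], limit: int = 5) -> list[World]:
    """Rank worlds by Jaccard overlap with the player's preferences, best first."""
    def score(world: World) -> float:
        union = world.tags | preferences
        return len(world.tags & preferences) / len(union) if union else 0.0

    ranked = sorted(worlds, key=lambda w: (-score(w), w.name))  # name breaks ties
    return [w for w in ranked if score(w) > 0][:limit]


catalog = [
    World("Ashfall", {"horror", "survival"}),
    World("Brightmere", {"cozy", "mystery"}),
    World("Cinder Court", {"intrigue", "horror", "politics"}),
]
picks = curate(catalog, {"horror", "intrigue"})
```

Worlds with no overlap are dropped from the curated list but would still appear in the full browsable catalog.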

My completed Discover Worlds feature. Users can browse their curated worlds, all available worlds, and filter them to their liking!

Though working on The Wizard’s Codex diverted time away from Hex, my experience on this project has been invaluable. Beyond being fun, it has been one of the most educational parts of the quarter: I’ve gained industry knowledge and hands-on experience working on an AI-powered game, as well as an inside look at how small-scale startups begin, from funding, to creating onboarding materials, to organizing codebases.

Moving forward, I’ve already discussed with Luciano my desire to continue working on The Wizard’s Codex. Creating a system like this is something that I’ve been passionate about for the past year, and I’d love to help bring this project to fruition. I’m hopeful I’ll have the chance to keep contributing to this project, helping flesh out narrative aspects such as giving NPCs more agency, implementing quests, and so much more. Whatever happens, I’m glad I’ve had this opportunity.

Conclusion

I entered CS199 expecting to spend the entire quarter building technical artifacts. While I definitely did some of that, I also learned lessons about product development, user research, and software architecture. Through both Hex and The Wizard’s Codex, I learned that products are largely defined by the systems surrounding them: their data quality, retrieval design, user experience, trust, and feedback loops. Many of the issues I encountered throughout the quarter had little to do with prompting and everything to do with understanding how users were going to use the product, and then designing reliable systems around those needs. Similarly, I found that product insights shape the technical architecture just as much as the technical architecture shapes the product. Lastly, I learned to embrace the spontaneity of projects. When I undertake personal projects and endeavors, I often find myself trying to create a very structured roadmap of where I want to go and what I want my project to look like. But if this quarter has taught me anything, it’s that projects rarely unfold exactly as you plan them to. In these divergences and detours, I’ve learned so much. Thank you to Luciano for being patient, answering my questions, and letting me be the first person to assist on his baby of a project, and to Christina for not only all your support this quarter, but for the opportunity to get to know you and work with you and Nina over the past year. It’s genuinely been the highlight of my Stanford experience.
