Rich Plots, Real Improvisation

What Alice Saw

I shared a tool in a Discord channel the other day, and a new DM named Alice messaged me back:

“I’m gonna be so honest. This is Greek to me. I have no clue what you’re showing or what problem it solves. I’m a new DM.”

Fair.

So I sent her a Google Doc I use to prep my Out of the Abyss campaign. Here’s a short snippet.

Score Name	NPC/Faction	Current Value	Next Threshold	What Triggers Next
Zuggtmoy’s Wedding	Zuggtmoy / Neverlight Grove	Elevated — 2 increases (Blingdenstone expansion + Basidia’s evacuation removed internal resistance)	Wedding completion / Araumycos union	Continued party absence from Underdark; no faction opposes Zuggtmoy; fungal spread reaches critical mass
Juiblex Rebirth	Juiblex	Low-Moderate — 1 increase (declared intent to consume Zuggtmoy’s domain)	Juiblex manifests a new physical form or begins attacking Zuggtmoy’s territory	Zuggtmoy’s wedding weakens her defenses; time passes without intervention

A minute later she replied:

“Ooooooooh. It’s for your campaign story. I thought it was a software thing, not creative.”

She was right. I’d explained the machinery before I’d explained the use.

So here’s the useful version:

I use AI to help me maintain campaign canon across long-running games, but I do not let it decide what counts as canon.

That distinction is the whole system.

What Alice saw was a planning document. At the top were four campaign clocks: four villain plots advancing in the background, each one changed by something the party had done, or failed to do, at the table.

Under that were faction states, NPC dossiers, and plot notes: who knows what, who wants what, what the party has learned, and what is changing offscreen.

I scan that document before every session. In a minute or two, I know what moved, why it moved, and what pressure is building in the world if the party does nothing.

I can’t run the campaign I want to run without this doc.

I also can’t write it by hand. Not across two campaigns. Not across a year of sessions. Not with dozens of named NPCs, each dragging their own history behind them.

This essay is about how I got it written anyway.

What Kind of Game I Want

I want a particular kind of game.

I want players making strange, committed, character-driven choices that I could not have predicted in advance.

That’s not abstract for me. I played Baldur’s Gate 3 cold, no guide, no walkthrough, and made a series of choices as Shadowheart that apparently almost nobody makes. Not because I was optimizing for rarity, but because they made sense for the version of her I was playing.

That’s what I want at the table: not correct choices, but real ones.

I run two D&D campaigns: a heavily modified Out of the Abyss campaign for the Ember Vanguard, and a Dragon of Icespire Peak / Lost Mine hybrid set in Phandalin. I actively push both parties to go somewhere I didn’t plan for. If the plot I prepped isn’t the plot they want, that’s fine.

That’s the deal.

Pick Any Two

The problem is that I want three things at once.

First: deep prep. Texture. NPCs with interiors. Villains whose behavior today is a consequence of something they chose eight sessions ago. Plots that keep moving, whether the party is watching or not.

Second: flexibility. When the party walks past the dungeon I built, something has to be where they actually went, and it has to feel like it was always going to be there.

Third: consistency. The villain I run in session 24 has to behave like the villain I ran in session 6. If I forget what Shal already said, did, or knew, the illusion cracks.

For a long time, that combination felt impossible.

Prep deep, and the moment players deviate, you’re improvising on top of prep that no longer applies. Prep loose, and the world gets thin. Try to prep every branch, and you end up burning the time you were trying to save.

And over a long campaign, the hardest problem is quieter: you lose track of what actually happened. The next scene drifts a little. Then a little more. Nobody stops the game to point out the contradiction. The fiction just gets lighter.

That’s the part people don’t say out loud. When the party walks past four hours of prep, I’m not upset because they missed it. I’m upset because I burned four hours on something that no longer matters. That lost time becomes thinner prep for the next session, then less energy in the session after that. Players feel it too. They become less willing to push into unplanned territory if they can sense I’m paying for it.

Before LLMs, I had more or less given up on getting all three.

What I Tried First

I wrote summaries from memory. That works for one campaign, maybe. It breaks fast when there’s too much to hold.

Then I tried using an LLM to write the summaries. Better than nothing, but imprecise in ways I didn’t always catch, and the errors showed up later, when they were harder to spot and more expensive to fix.

Then I found GMAssistant.app. That was a real improvement. It gave me solid summaries of what happened. But a D&D session isn’t just a sequence of actions. It’s dialogue, tension, implication, half-finished intentions, weird emotional turns. The recap could get the action right while still losing the feel of the session.

So I went further. I combined GMAssistant recaps with verbatim VTT transcripts from our Zoom calls. Then I built tooling around that. Then more tooling around the tooling. Six weeks, maybe two months, of real work.

I thought I was solving the record problem.

I was. Partly.

The Failure Mode

What I didn’t realize until it nearly cost me a scene was that I was also building a new kind of failure.

A few months into Out of the Abyss, I was designing the endgame around an earlier scene. A PC named Daz had come across evidence implicating a major NPC. My planned encounter assumed he had taken that evidence with him.

I checked the LLM-generated recap to confirm. It said Daz had discovered the evidence. Good enough, I thought, and I kept writing.

Then, by accident, I re-read the original session summary.

What had actually happened was narrower. Daz had looked at some unusual books on a shelf and left the room. He hadn’t opened them. He hadn’t taken them.

“Noticed” had become “discovered” in the paraphrase. And “discovered” had quietly shaded, in my head, into “obtained.”

If I had run the encounter as written, I would have retconned my own campaign.

That’s the failure mode.

The model hadn’t lied. It had paraphrased. But the paraphrase was fluent. It read like canon. It read so much like canon that I stopped checking the source.

And once that paraphrase enters the next stage of the pipeline, it hardens. A summary becomes a dossier entry. A dossier entry becomes a threat score. A threat score shapes the next session. Small errors do not stay small.

My first version of the tool had bought me depth and flexibility at the cost of consistency, and it had done it invisibly.

The obvious fix would be to go back to the source every time. But if I have to re-read everything every time, I don’t need the tool.

What Actually Worked

What actually worked was putting myself back in the middle.

Now I use the same loop at every layer.

First, the model reads what I can’t read quickly: a transcript, a stack of summaries, a year of sessions. It gives me candidate structure: NPC lists, draft dossiers, scene candidates, recap material.

Second, I review that structure. I fix names. I merge duplicates. I cut paraphrases that slipped into invention. I restore what matters and remove what doesn’t.

This step is not optional. This step is the work.

Third, the model takes the reviewed structure and renders it as prose: a dossier, a narrative recap, a planning document, or a threat tracker.

The model is strong at the first and third steps. It is unreliable at the second. Scope, attribution, ordering, what counts as canon, what matters dramatically: those are creative decisions. Those stay with me.

Skip that middle step, and the errors compound. Keep it, and the whole loop holds.
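
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling described in this essay: the names Candidate, extract, review, and render are hypothetical, and the extract and render functions are stand-ins for what would be model calls. The only point it makes is structural: nothing reaches the render step unless a human approved it.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One model-extracted claim awaiting GM review (hypothetical shape)."""
    claim: str             # the model's paraphrase
    source_quote: str      # the source text the paraphrase was drawn from
    approved: bool = False


def extract(source_lines):
    """Step 1 stand-in: a real version would call a model here."""
    return [Candidate(claim=line.strip(), source_quote=line.strip())
            for line in source_lines if line.strip()]


def review(candidates, decisions):
    """Step 2: the GM decides.

    `decisions` maps a candidate's index to corrected text, or to None
    to reject it. Anything without an explicit decision is left out.
    """
    reviewed = []
    for i, cand in enumerate(candidates):
        corrected = decisions.get(i)
        if corrected is None:
            continue  # rejected, or never looked at
        reviewed.append(Candidate(corrected, cand.source_quote, approved=True))
    return reviewed


def render(reviewed):
    """Step 3 stand-in: refuses any input that skipped review."""
    if not all(c.approved for c in reviewed):
        raise ValueError("unreviewed candidate reached the render step")
    return "\n".join(f"- {c.claim}" for c in reviewed)


candidates = extract([
    "Daz discovered the evidence.",
    "Daz took the evidence with him.",
])
reviewed = review(candidates, {
    0: "Daz noticed unusual books on a shelf and left without opening them.",
    1: None,  # never happened at the table, so it is rejected
})
print(render(reviewed))
```

Keeping the source quote next to each claim is a design choice in this sketch: it makes the middle step a comparison against the record rather than a judgment about whether the paraphrase sounds right.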

That’s the system.

What I Have Now

The threat tracker Alice saw is the direct output of it. Session material gets extracted into per-NPC dossiers. I review and reconcile them. Then the tool synthesizes the planning document from that reviewed canon.

That’s how I get villain clocks that stay consistent across a year of play. It’s how I get session recaps that read like narrative chapters without drifting into invention. It’s how I get pre-session cheat sheets I can trust.

It’s also how I keep two campaigns inside one world without the whole thing collapsing under its own weight.
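
Part of reconciling dossiers is catching one NPC recorded under several spellings. A rough sketch of how candidate duplicates could be surfaced for review, using Python’s standard difflib module: the function names, the 0.8 threshold, and the variant spellings in the example are all assumptions made for illustration. The sketch only proposes groups. Whether two names are really the same person is still a call for the GM.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """True when two names are close enough to be worth a second look."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_name_variants(names, threshold=0.8):
    """Group names that resemble the first name seen in each group.

    The output is a list of merge candidates for a human to confirm,
    not a list of confirmed merges.
    """
    groups = []
    for name in names:
        for group in groups:
            if similar(name, group[0], threshold):
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


# Hypothetical variant spellings, as a transcript might produce them.
names = ["Captain Tolubb", "Captain Tollub", "Capt. Tolubb", "Zuggtmoy"]
for group in group_name_variants(names):
    if len(group) > 1:
        print("Possible duplicates:", ", ".join(group))
```

A threshold like this will produce both false matches and misses, which is one more reason the merge itself stays a manual decision.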

What I have now is not magic. It’s just finally the right division of labor.

I have a threat tracker that tells me, before every session, which villain plans have moved and what moved them.

I have session recaps that read like narrative chapters, but only after I verify the details that matter.

I have NPC dossiers where Captain Tolubb is one NPC, not three variant spellings pretending to be different people.

I have two campaigns sharing one coherent world across a year of play.

And most importantly, I no longer mind when the party walks past the dungeon.

Because the prep I’m doing now is not the kind that gets wasted.

If You Want This Too

If you want this for your own game, the practical lesson is simple:

Use AI to read, sort, summarize, and draft.

Do not use it to silently decide what is true.

That part stays with you.

The tools are free. They’re also crude. I’m one GM iterating on my own campaigns, not a product team, and the learning curve is not zero.

I don’t make money on any of this. It’s a hobby. My goal is simply for more of us to have better tools, and for the people trying to make a living doing this to actually make a living.

So if this sparks something for you, build the version you want. If something I built is useful, take it. Fork it. Break it. Send it to a friend who runs games.

I would rather more of us had working tools than fewer.

The last essay was for people interested in the machinery: the loop, the trust layers between documents, and the searchable index I use to query reviewed content mid-session.

This one was for Alice.