Play it here: https://jwbatey.com/parallax/

Caution, it’s alpha at best.

This page is mostly AI text… in the spirit of this project. I am using this project as a way to test how far AI has advanced. I set up the structure and described the game I want, and the scaffolding (which the AI can edit) goes and actually generates the game.

Parallax: a point-and-click adventure built with model-generated content

I’m building Parallax, a 2D point-and-click adventure that borrows its pacing and tone from Monkey Island. The twist is practical: most of the writing and art starts as output from language models and image models, then gets assembled and tested through a repeatable pipeline.

I’m not trying to let a model “make a game on its own.” I treat models like content compilers. They take a small set of inputs and produce structured assets (JSON graphs, prompts, images) that I can inspect, validate, and rerun like any other build.

The bet is that game content can be handled like compiled output: you keep a small, reviewable source-of-truth, and you generate everything else in a way that’s repeatable and inspectable. Models are useful here because they’re fast at producing draft artifacts. The project only stays sane if those artifacts are constrained, validated, and easy to regenerate.


What I’m aiming for


Where it is right now

Parallax has two parts that cooperate.

I keep the project split into three layers:

That separation sounds mundane, but it’s the difference between “I can rerun this safely” and “I’m scared to touch anything.”

1) Content generation pipeline (Python)

One pipeline run can produce:

The loop is “edit inputs, regenerate outputs.” I try hard to avoid hand-editing downstream artifacts.

Under the hood, the pipeline behaves like a small build system:

That last point matters more than it sounds: when something gets weird, I can answer “what produced this?” without guessing.
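One way to make "what produced this?" answerable is a sidecar manifest written next to each generated artifact. This is a minimal sketch, not the project's actual code; the function names and manifest fields are my own illustration of the idea.

```python
import hashlib
import json
from pathlib import Path

def input_fingerprint(paths):
    """Hash the source inputs so each output can record what produced it."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(Path(p).read_bytes())
    return h.hexdigest()[:16]

def record_provenance(output_path, inputs, stage):
    """Write a sidecar manifest next to the generated artifact."""
    manifest = {
        "output": str(output_path),
        "stage": stage,
        "inputs": [str(p) for p in sorted(inputs)],
        "fingerprint": input_fingerprint(inputs),
    }
    Path(str(output_path) + ".manifest.json").write_text(
        json.dumps(manifest, indent=2)
    )
    return manifest
```

With something like this in place, a weird output can be traced back to the exact input set that produced it, and a changed fingerprint tells you a rerun is warranted.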

2) A playable debug environment (static HTML/JS)

The Parallax Debug Suite is a static HTML/JavaScript toolkit that renders the game from data:

A few debug affordances have been disproportionately valuable:

It also has a deliberately retro entry point (a DOS-style boot screen). That part is pure tone-setting, and it makes me smile.


The “LLM as compiler” workflow

Generation starts from a small set of stable inputs:

Typical flow:

One choice that has paid off: dialogue generation happens in two passes. Pass one is quick drafting. Pass two reshapes the draft into a stricter graph format and runs validation. It keeps the pace high while still giving me structure I can trust.
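The two-pass shape might look something like the sketch below. The `model_call` argument stands in for whichever client is actually used, and the node/edge field names are assumptions, not the project's real schema.

```python
import json

def draft_dialogue(scene_brief, model_call):
    """Pass one: a fast, free-form draft."""
    return model_call(f"Write a short dialogue for: {scene_brief}")

def to_strict_graph(draft, model_call):
    """Pass two: reshape the draft into a strict node/edge graph, then validate."""
    raw = model_call(
        "Convert this dialogue into JSON with `nodes` (id, speaker, text) "
        f"and `edges` (from, to):\n{draft}"
    )
    graph = json.loads(raw)
    validate_graph(graph)  # fail loudly before anything downstream runs
    return graph

def validate_graph(graph):
    """Minimal structural check: every edge must point at a real node."""
    ids = {n["id"] for n in graph["nodes"]}
    for e in graph["edges"]:
        if e["from"] not in ids or e["to"] not in ids:
            raise ValueError(f"dangling edge {e['from']} -> {e['to']}")
```

The important property is that pass one is allowed to be sloppy, and pass two is the gate: nothing unvalidated reaches the game data.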

Validation is mostly boring rules that prevent expensive debugging later:

When a run fails, I’d rather get a blunt error with a file/line pointer than discover it mid-playthrough.
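A validator in that spirit collects every problem with a pointer back to the offending file and index, then fails once with the whole list. The required-field names here are hypothetical; the pattern is the point.

```python
import json
from pathlib import Path

# Hypothetical schema: whatever fields the engine actually requires.
REQUIRED_HOTSPOT_FIELDS = ("id", "name", "mask_color")

def validate_scene(scene_path):
    """Blunt checks with a file pointer, run before anything is playable."""
    scene = json.loads(Path(scene_path).read_text())
    errors = []
    for i, hs in enumerate(scene.get("hotspots", [])):
        for field in REQUIRED_HOTSPOT_FIELDS:
            if field not in hs:
                errors.append(f"{scene_path}: hotspots[{i}] missing '{field}'")
    if errors:
        raise ValueError("\n".join(errors))
    return scene
```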

All generated artifacts land under story_specific_gen/. Keeping that boundary sharp has saved me from editing the wrong layer more than once.


Data-driven design and tooling choices

Everything important is data

Most of the world lives in JSON:

Both the debug suite and the runtime read these files. Story logic is not hard-coded.

Version control is part of the workflow: source inputs are reviewed like code, generated outputs are treated like build artifacts, and the game/debug suite stays thin. That keeps creative iteration compatible with normal engineering hygiene: diffs, rollbacks, reproducible runs.

Mask-first interaction debugging

Hotspots and navigation get tested through overlay masks, not only hand-tuned coordinates. One rule I learned the annoying way: mask files have to match the full-screen image's dimensions and naming conventions, or the overlays lie to you.

I also keep the mask-to-hotspot mapping explicit. A mask is only useful if it’s unambiguous how pixels map back to IDs in hotspots.json (palette indices, RGB codes, or a lookup table—pick one, document it, stick to it). “Looks right” is not a passing test when the engine is reading it.
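Concretely, an RGB-keyed lookup table is one way to keep that mapping explicit and checkable. The palette entries and hotspot IDs below are invented for illustration; the real project's conventions may differ.

```python
# Hypothetical documented table: one mask RGB value per hotspot ID.
MASK_PALETTE = {
    (255, 0, 0): "door_cellar",
    (0, 255, 0): "sign_tavern",
}

def check_mask(mask_size, screen_size):
    """Masks must match the full-screen image exactly, or overlays lie."""
    if mask_size != screen_size:
        raise ValueError(f"mask {mask_size} != screen {screen_size}")

def hotspot_at(pixels, x, y):
    """pixels: 2D grid of (r, g, b) tuples. Returns a hotspot ID or None."""
    return MASK_PALETTE.get(pixels[y][x])
```

Because the mapping is a plain table, "looks right" gets replaced by a test: every palette color maps to exactly one ID, and every pixel either resolves or is deliberately background.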

A web debugger with no build step

The debugger runs from a plain local server (for example, python -m http.server). Entry points:

Explicit state, persisted for testing

Debug state lives in localStorage under debugState. That makes it easy to refresh, reproduce an issue, and test gating logic across scenes and dialogue without rebuilding anything.

Generated outputs are treated as build artifacts

Renders can be skipped when outputs already exist, with explicit flags to redo work when needed. That avoids accidental rerenders and keeps costs predictable.
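The skip-if-exists behavior is a few lines of guard logic. This is a generic sketch with a hypothetical `force` flag, not the pipeline's actual interface.

```python
from pathlib import Path

def render_if_needed(out_path, render_fn, force=False):
    """Skip work when the artifact already exists, unless explicitly forced."""
    out = Path(out_path)
    if out.exists() and not force:
        print(f"skip {out} (exists)")
        return out
    out.write_bytes(render_fn())  # render_fn is the expensive model call
    return out
```

The default is the cheap path; spending money again requires opting in, which is what keeps rerun costs predictable.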

One unavoidable wrinkle is that models are stochastic. I handle that in three ways:

When something comes out worse, I don’t argue with it—I adjust the upstream constraint until the pipeline reliably produces something usable again.


Notes from the generation boundary

The hardest problems show up where text intent has to become a picture.

A practical lesson: image generation behaves better when I treat prompts like a contract, not a vibe. For each screen, I keep a short “must include” list (key interactables, doors/exits, landmarks), plus a “must not” list (floating objects, unreadable signage, cropped exits). I also keep camera rules stable—framing, horizon line, implied player height—so navigation doesn’t feel like teleporting between unrelated illustrations.
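Treating the prompt as a contract can be as literal as assembling it from those lists. A minimal sketch, with example content invented for illustration:

```python
def build_prompt(scene, must_include, must_not, camera_rules):
    """Assemble an image prompt from explicit contract lists plus stable camera rules."""
    return "\n".join([
        f"Scene: {scene}",
        "Must include: " + ", ".join(must_include),
        "Must NOT include: " + ", ".join(must_not),
        "Camera: " + camera_rules,
    ])

prompt = build_prompt(
    "harbor tavern exterior at dusk",
    ["tavern door", "hanging sign", "exit to the docks on the left"],
    ["floating objects", "unreadable signage", "cropped exits"],
    "eye-level, fixed horizon at one-third height, implied player height 1.7m",
)
```

Because the camera rules are a stable string shared across screens, regenerating one background can't silently change the framing conventions the rest of the game uses.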

One recurring failure mode: when a hotspot name encodes a container/content relationship or a state, often written like Container / Item, image generation may drop the container and draw only the item. The interaction ends up wrong because the picture is missing the thing the player is supposed to click.

The fix has been simple:

That gets visuals back in sync with the intended interaction.


What I’m working on next


Cost visibility is part of the design

I track spend and time per phase (arrange/plan/render/characters). Full runs have landed in the “tens of dollars” range across calls to OpenAI models and Google models, so tooling that lets me rerun only what changed is part of the work, not a nice extra.

For my own sanity, I record the boring details too: per-stage wall time, cache hits/misses, image counts, and token/call counts where I can get them. If I’m going to iterate like a developer, I want the feedback loops to look like developer tooling.
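A small context manager is enough to get per-stage wall time and run counts into one place; costs and token counts can be accumulated into the same table. A sketch of the pattern, not the project's actual instrumentation:

```python
import time
from contextlib import contextmanager

STATS = {}

@contextmanager
def stage(name):
    """Accumulate wall time and run count per pipeline phase."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        s = STATS.setdefault(name, {"wall_s": 0.0, "runs": 0})
        s["wall_s"] += time.perf_counter() - t0
        s["runs"] += 1
```

Wrapping each phase (`with stage("render"): ...`) makes "which phase ate the budget?" a lookup instead of a guess.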