Recent EF-Map work has not come from asking one agent to "make it better." The useful loop is more controlled than that: audit, document, chunk, implement, smoke test, and deploy.
This blog post is about the workflow, not about one magic model. It is the current process that works for a solo builder using LLMs heavily on a large EVE Frontier project. It is imperfect, it changes as the tools change, and it still depends on human domain review.
Why EF-Map needs structure
EF-Map is not a small demo app. It has a 3D map, routing, Intelligence, Killboard, State of the Frontier, Solar System View, Smart Assemblies, Smart Gate data, public and tribe marks, helper integrations, static content, and a lot of EVE Frontier-specific language.
A vague prompt is not enough for that kind of product. An agent can inspect files and make edits, but it does not automatically know which UX details matter to scouts, where community terminology is sensitive, or when a feature technically works but says the wrong thing. The process has to give the agent context, constraints, and a narrow target.
The planning layer: ChatGPT web with repo access
The first layer is planning. In my current setup, ChatGPT web can use connected repository access to read EF-Map docs, generated audit files, branch diffs, and committed planning documents. That makes it useful for turning a rough idea into an implementation prompt.
This is where I usually ask for scope: what should be done first, what should stay out of scope, what files are likely involved, and what validation should be mandatory. It is also where large documents get split into safer batches.
One practical benefit, at least in my current setup, is that planning in ChatGPT web is separate from the coding-agent session budget I am trying to preserve. That may change, so treat it as a workflow observation rather than a platform rule.
Product names and availability move quickly
OpenAI currently documents Codex as its coding agent and the Codex IDE extension as a way to use that agent in VS Code-compatible editors. OpenAI also documents GitHub access through ChatGPT apps. Anthropic documents Claude Code, Claude Cowork, and Claude in Chrome as separate surfaces. Anthropic's public docs also show that model names and access can change quickly, including recent Fable availability changes. Verify what is available on your plan before building a process around one model picker label.
The execution layer: VS Code agents
The second layer is execution. VS Code agents, including Codex-style agents, work well when given a precise task. They can inspect the workspace, edit files, run tests, build the site, run HTTP checks, and deploy to preview or production when the repo workflow requires it.
They work less well when handed a giant audit and told to do everything. That turns one ambiguous report into a broad multi-surface change, which is exactly where product drift and accidental refactors creep in. EF-Map work goes better when the prompt says what to touch, what not to touch, which checks to run, and what acceptance looks like.
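One way to make that structure hard to skip is to treat the prompt as data before it becomes text. The following is a minimal sketch, not EF-Map's actual tooling: the `BatchPrompt` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatchPrompt:
    """One scoped implementation batch. All field names are illustrative."""

    goal: str
    touch: list[str]          # files or areas the agent may edit
    do_not_touch: list[str]   # explicit non-goals
    checks: list[str]         # checks the agent must run and report
    acceptance: list[str]     # what "done" looks like to the human reviewer

    def render(self) -> str:
        """Render the batch as plain text with a fixed section order."""

        def section(title: str, items: list[str]) -> str:
            body = "\n".join(f"- {item}" for item in items) or "- (none)"
            return f"{title}:\n{body}"

        return "\n\n".join(
            [
                f"Goal: {self.goal}",
                section("Touch", self.touch),
                section("Do not touch", self.do_not_touch),
                section("Checks to run", self.checks),
                section("Acceptance", self.acceptance),
            ]
        )


# Hypothetical example values, loosely based on the terminology batch.
prompt = BatchPrompt(
    goal="Replace 'kills / deaths' with 'kills / losses' in report copy",
    touch=["report copy strings"],
    do_not_touch=["routing code", "map rendering"],
    checks=["run the test suite", "build the site"],
    acceptance=["no 'kills / deaths' label remains in the report"],
)
print(prompt.render())
```

An empty list renders as `- (none)`, so a missing non-goal section is visible in the prompt instead of silently absent.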
The audit layer: Claude Cowork
The third layer is audit. The Claude Code Cowork blog post covers this in detail, so the short version is enough here: Claude Cowork plus Claude in Chrome can inspect the live app and the repo together.
That was useful for EF-Map because a lot of the hard product work is browser behavior, not isolated code. Cowork could click through the live site, inspect the repo, and produce UX or engagement audits that I would struggle to do manually at the same breadth. Those audits then became Markdown documents and implementation prompts, not automatic patches.
The document-first pattern
For big ideas, the first output should often be a document. A VS Code agent can create a Markdown audit, spec, or implementation plan, then commit and push it. Once the document is in the repo, ChatGPT web can read it through repository access and help decide what to do next.
This sounds slower than asking the agent to code immediately, but it has been faster in practice. The document becomes durable project memory. It survives context resets. It gives later agents something concrete to load. It also makes it easier for a human to say, "yes, do batch one," or "no, this part misunderstands the product."
Chunking the work
Large reports become batches. That was the pattern behind recent EF-Map engagement work: one pass for State of the Frontier links back into the map, another for copy feedback and text summaries, another for PNG share cards and QR cards, another for the Frontier Report dot and count-up metrics, and another for terminology cleanup.
Each batch got its own scoped prompt, branch or commit, validation run, preview deploy when appropriate, and manual smoke test. The goal was not to move slowly. The goal was to keep each change small enough that a human could understand and accept it.
| Batch type | Agent does | Human checks |
|---|---|---|
| UX audit | Inspects browser and repo, writes a report, ranks fixes. | Whether the findings match real player workflows. |
| Implementation | Edits scoped files, runs tests/builds, reports the diff. | Whether the behavior makes sense in EVE Frontier context. |
| Static content | Writes blog/report pages, metadata, sitemap entries, SEO checks. | Whether wording is true, useful, and shareable. |
| Deploy | Builds, deploys preview or production, runs HTTP checks. | Whether the live page or feature is acceptable to ship. |
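The split in the table can be expressed as a simple gate: a batch is shippable only when both columns are satisfied. This is a rough sketch under that assumption, with hypothetical names (`BatchStatus`, `can_ship`), not a description of how EF-Map tracks batches.

```python
from dataclasses import dataclass


@dataclass
class BatchStatus:
    """Tracks one batch. Field names are illustrative."""

    name: str
    agent_checks_passed: bool = False  # tests, build, HTTP checks reported by the agent
    human_accepted: bool = False       # manual review with domain knowledge
    human_note: str = ""               # why the reviewer accepted or rejected it


def can_ship(batch: BatchStatus) -> bool:
    """A batch ships only when the agent checks AND the human review both pass."""
    return batch.agent_checks_passed and batch.human_accepted


# Green checks alone are not enough: this batch still waits for human review.
share_cards = BatchStatus(name="PNG share cards", agent_checks_passed=True)
print(can_ship(share_cards))

share_cards.human_accepted = True
share_cards.human_note = "Cards read well when pasted into Discord."
print(can_ship(share_cards))
```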
Why manual smoke testing still matters
Agents test code. Humans test meaning.
That distinction matters on EF-Map. A button can pass a click test and still be the wrong product action. A card can render and still be useless in Discord. A report can have valid HTML and still overclaim what the data proves. A route note can include the right link and still confuse a player.
The best examples were small. "Kills / deaths" became "kills / losses" because structure losses are not player deaths. Share cards needed QR codes because a URL printed inside a PNG is not clickable. State of the Frontier wording needed to stay bounded by fact packs and verification instead of sounding like official or complete universe truth.
Manual smoke testing is where "technically works" becomes "right for this product." It catches wrong terminology, bad assumptions, unusable links, unclear cards, and misread intent. It also brings community feedback back into the loop.
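Once a human has caught a wording problem, the decision can be recorded so it does not regress. The sketch below is one possible way to do that, assuming a hand-maintained rule list; the `TERMINOLOGY_RULES` mapping and the function name are hypothetical, and the check only finds phrases a reviewer has already flagged.

```python
import re

# Hypothetical rule list: discouraged phrase -> preferred wording.
TERMINOLOGY_RULES = {
    "kills / deaths": "kills / losses",
}


def find_terminology_issues(
    text: str, rules: dict[str, str] = TERMINOLOGY_RULES
) -> list[tuple[int, str, str]]:
    """Return (line_number, discouraged_phrase, suggestion) for each match."""
    issues = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for phrase, suggestion in rules.items():
            # Case-insensitive literal match; re.escape keeps "/" and spaces literal.
            if re.search(re.escape(phrase), line, flags=re.IGNORECASE):
                issues.append((line_number, phrase, suggestion))
    return issues


sample = "Weekly summary\nKills / Deaths by region"
for line_number, phrase, suggestion in find_terminology_issues(sample):
    print(f"line {line_number}: found '{phrase}', prefer '{suggestion}'")
```

A check like this does not replace the manual pass. It only keeps an already-made human decision from being undone by a later edit.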
What agents are good at here
In this workflow, agents are good at repeated, bounded work. They can inspect a repo quickly, find related files, generate or update documentation, make mechanical edits, run validation, build static pages, deploy previews, check metadata, and produce structured closeouts.
That is why the blog workflow itself fits agents well. The agent reads the blog guide, creates the HTML file, prepends `posts.json`, adds the blog index card, updates the sitemap, runs SEO and build checks, commits, pushes, deploys, and verifies the live URL. A human still reviews the prose and share preview.
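The "prepend `posts.json`" step is a good example of a mechanical edit. Here is a minimal sketch, assuming the index is a JSON array of objects that each carry a `slug` key; the real file's schema is not shown in this post, so the key name and both function names are assumptions.

```python
import json
from pathlib import Path


def prepend_entry(posts: list[dict], entry: dict) -> list[dict]:
    """Return a new list with entry first; reject a slug that is already listed."""
    if any(post.get("slug") == entry["slug"] for post in posts):
        raise ValueError(f"slug already in index: {entry['slug']}")
    return [entry, *posts]


def prepend_to_file(index_path: Path, entry: dict) -> None:
    """Read the JSON index, prepend the entry, and write it back."""
    posts = json.loads(index_path.read_text(encoding="utf-8"))
    updated = prepend_entry(posts, entry)
    index_path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Keeping the list logic separate from the file I/O means the duplicate-slug guard can be tested without touching the real index.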
What agents are still bad at
Agents are still weak at knowing what not to build. They can miss game and community context. They can choose technically correct but awkward wording. They can overclaim unless the prompt makes the evidence boundary explicit. They can optimize the code diff while missing the product trade-off.
This is why the human is still in the loop. The operator does not need to be a traditional software developer to add value. Domain knowledge is the review layer. Knowing how EVE Frontier players talk, what a feature is supposed to mean, and what would be embarrassing to ship is part of the engineering system.
How to try the workflow
The compact version is:
- Start with a clear goal and a narrow surface.
- Ask for an audit, spec, or implementation plan first.
- Commit and push the document so it becomes repo context.
- Review it in ChatGPT web with repository access.
- Chunk the work into batches and pick one implementation batch at a time.
- Give the VS Code agent explicit files, non-goals, validation, and deploy instructions.
- Require tests, build checks, HTTP checks, and a preview deploy when relevant.
- Manually smoke test the preview with domain knowledge.
- Merge and deploy to production only work that has been accepted.
- Keep a closeout so the next prompt starts from known state.
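The HTTP-check step in the list above can be as small as "every listed URL returns 200". The following sketch uses only the Python standard library; the function names are hypothetical, and a real check would likely also look at page content or metadata.

```python
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a GET request, or 0 if the request never completed."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions that carry the status code.
        return err.code
    except OSError:
        # DNS failures, refused connections, and timeouts.
        return 0


def failed_checks(
    urls: list[str], fetch: Callable[[str], int] = fetch_status
) -> dict[str, int]:
    """Map each URL that did not return 200 to the status it returned."""
    results = {url: fetch(url) for url in urls}
    return {url: status for url, status in results.items() if status != 200}
```

The `fetch` parameter is injectable so the logic can be exercised without a network, for example `failed_checks(urls, fetch=lambda url: 200)`. An empty result means every URL answered 200. It says nothing about whether the page is right, which is what the manual smoke test is for.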
Where this may go next
More automated agent chains are coming. They will be useful for bounded, data-driven tasks: routing-performance experiments, mechanical refactors, schema checks, repetitive documentation updates, and other work with clear pass/fail signals.
I am more cautious about fully autonomous chains for domain-heavy product work. EF-Map is full of EVE Frontier-specific assumptions, half-visible data semantics, player language, and community context. That is exactly where an agent can look productive while steering the product in the wrong direction.
For now, the EF-Map process keeps the human in the loop. The tools help audit, chunk, implement, and verify. The human still decides whether the work should exist.
Conclusion
The win is not replacing the builder. The win is turning vague ideas into structured prompts and reviewed changes.
EF-Map's process is imperfect, but it is repeatable. It has shipped real improvements across UI audits, State of the Frontier, share cards, report links, copy feedback, search intelligence, database inventory work, and public blog posts. The workflow is the useful part: audit, chunk, implement, smoke test, and only then deploy.