How I use an AI orchestrator to manage a fleet of coding agents: worktrees, tmux, deterministic monitoring, and the two-tier context split that makes it work.

My git history looks like I hired a dev team

94 commits in a day. Three PRs landing while I was on a call. A feature branch I didn't open my editor for, start to finish.

My git log tells the story of a team. In reality it's me and an AI orchestrator managing a fleet of coding agents.

The problem nobody talks about

Codex is good. Claude Code is good. But they have almost no context about your business.

They see code. They don't see the customer who asked for this feature, the meeting where you scoped it, the architectural decision from last Tuesday that rules out the obvious approach, or the five things you've already tried.

You can paste context into the prompt. But a context window is zero-sum. Fill it with code and there's no room for business context. Fill it with customer history and there's no room for the codebase.

So you compromise. Every time.

Two tiers, one insight

The fix is separation.

I run a two-tier system. My orchestrator, Navi, holds all the business context. Sprint state, customer conversations, past decisions, what we tried before and why it failed. She translates that into precise prompts for coding agents.

The coding agents see only what they need: the worktree, the types, the tests, and a brief that tells them exactly what to build.

Navi never writes code. The agents never see business context. Each is loaded with exactly what makes it effective.

The insight that makes this work: specialisation through context, not through different models.

The mechanics

Each agent gets three things: an isolated git worktree, a task-specific prompt, and a tmux session.

The worktree means agents can't step on each other. Three agents working on three features, three branches, zero merge conflicts during development.

The prompt is where the orchestrator earns its keep. Navi writes these with the full weight of business context. Which files to touch. What the customer actually wants. The commit message format. The exact PR title. Everything the agent needs and nothing it doesn't.
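To make the shape of such a brief concrete, here is a minimal sketch in Python. It assumes the brief is a small structured record that gets flattened into plain text. The class name `AgentBrief`, its fields, and the example values are all hypothetical, chosen only to mirror the items listed above. They are not the format Navi actually uses.

```python
from dataclasses import dataclass


@dataclass
class AgentBrief:
    """What one coding agent is told. All field names are illustrative."""

    task: str
    customer_intent: str
    files_to_touch: list[str]
    test_paths: list[str]
    commit_format: str
    pr_title: str

    def render(self) -> str:
        """Flatten the brief into the plain-text prompt handed to the agent."""
        sections = [
            ("Task", self.task),
            ("What the customer wants", self.customer_intent),
            ("Files to touch", "\n".join(f"- {p}" for p in self.files_to_touch)),
            ("Tests to keep green", "\n".join(f"- {p}" for p in self.test_paths)),
            ("Commit message format", self.commit_format),
            ("PR title", self.pr_title),
        ]
        return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


# Hypothetical example values, invented for illustration.
brief = AgentBrief(
    task="Add proration when a customer changes plan mid-cycle.",
    customer_intent="Upgrades should be charged only for the days remaining.",
    files_to_touch=["src/billing/proration.ts"],
    test_paths=["tests/billing/proration.test.ts"],
    commit_format="feat(billing): <summary>",
    pr_title="Add mid-cycle proration",
)
print(brief.render())
```

The point of a fixed structure is that nothing from the business tier leaks through except the one field that distils it.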

The tmux session means I can watch, steer, or redirect without killing the agent. Wrong direction? One line in the terminal. Scope creep? Pull it back. Missing context? Send the schema.
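The worktree and tmux steps can be sketched as plain command builders. This is a sketch under assumptions, not the actual setup: the `agent/<task>` branch naming, the sibling-directory worktree layout, the `agent-<task>` session naming, and `my-agent-cli` as the agent command are all hypothetical. The `git worktree add`, `tmux new-session`, and `tmux send-keys` invocations themselves are standard.

```python
import shlex
from pathlib import Path


def spawn_commands(
    repo: Path, task_id: str, agent_cmd: str, base: str = "main"
) -> list[list[str]]:
    """Commands that give one agent its own branch, worktree, and tmux session."""
    branch = f"agent/{task_id}"                        # hypothetical naming scheme
    worktree = repo.parent / f"{repo.name}-{task_id}"  # sibling directory of the repo
    session = f"agent-{task_id}"                       # tmux names cannot contain "." or ":"
    return [
        # Create a new branch from `base`, checked out in its own directory.
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree), base],
        # Start the agent detached, with the worktree as its working directory.
        ["tmux", "new-session", "-d", "-s", session, "-c", str(worktree), agent_cmd],
    ]


def steer_command(task_id: str, message: str) -> list[str]:
    """Type one line into the running agent's terminal and press Enter."""
    return ["tmux", "send-keys", "-t", f"agent-{task_id}", message, "Enter"]


# Print the commands instead of running them, so the sketch has no side effects.
for cmd in spawn_commands(Path("/srv/app"), "billing", "my-agent-cli"):
    print(shlex.join(cmd))
print(shlex.join(steer_command("billing", "Stay within the files in the brief.")))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute it. Keeping the builders pure makes the naming scheme easy to test on its own.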

What "done" actually means

A PR alone isn't done. I learned this early.

Done means: PR open, branch synced, CI green, lint clean, tests passing, and Navi has reviewed the diff. Only then does anyone get notified.
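That definition is a checklist, so it can be written as one. A minimal sketch, with hypothetical field names that map one-to-one onto the six conditions above:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskState:
    pr_open: bool
    branch_synced: bool
    ci_green: bool
    lint_clean: bool
    tests_passing: bool
    diff_reviewed: bool  # the orchestrator has read the diff


def missing_for_done(state: TaskState) -> list[str]:
    """Names of the conditions still unmet; an empty list means done."""
    return [name for name, ok in asdict(state).items() if not ok]


def is_done(state: TaskState) -> bool:
    return not missing_for_done(state)


state = TaskState(
    pr_open=True, branch_synced=True, ci_green=False,
    lint_clean=True, tests_passing=True, diff_reviewed=False,
)
print(missing_for_done(state))  # ['ci_green', 'diff_reviewed']
```

Returning the list of unmet conditions, rather than a bare boolean, says what to chase next.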

The monitoring is fully deterministic. A cron job runs every ten minutes and checks: is the tmux session alive? Has the branch been pushed? Is there a PR? What's CI doing? No LLM calls, no token cost. Pure status checking.

When something needs attention, it surfaces. When everything is fine, silence.
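A sketch of what one such check could look like, assuming a script that cron runs on a `*/10 * * * *` schedule. The two probes use real commands: `tmux has-session` and `git ls-remote --exit-code`. How PR and CI state are fetched depends on the code host, so here they arrive as plain values. The `Snapshot` fields and the alert rules in `attention` are assumptions for illustration, not the actual rules.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    session_alive: bool
    branch_pushed: bool
    pr_open: bool
    ci: str  # one of "none", "pending", "passing", "failing"


def probe_session(session: str) -> bool:
    """`tmux has-session` exits 0 only if the named session exists."""
    result = subprocess.run(["tmux", "has-session", "-t", session], capture_output=True)
    return result.returncode == 0


def probe_branch(repo: str, branch: str, remote: str = "origin") -> bool:
    """`git ls-remote --exit-code` exits non-zero when no remote ref matches."""
    result = subprocess.run(
        ["git", "-C", repo, "ls-remote", "--exit-code", remote, f"refs/heads/{branch}"],
        capture_output=True,
    )
    return result.returncode == 0


def attention(s: Snapshot) -> Optional[str]:
    """Return a reason to surface this task, or None to stay silent."""
    if s.ci == "failing":
        return "CI failing"
    if not s.session_alive and not s.pr_open:
        return "session ended without a PR"
    if s.pr_open and s.ci == "passing":
        return "ready for review"
    return None  # still working, or waiting on CI: say nothing


print(attention(Snapshot(session_alive=True, branch_pushed=False, pr_open=False, ci="none")))
```

Everything here is an exit code or a string comparison, which is why a run costs no tokens.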

The loop that compounds

Agents fail. That's expected.

When one fails, the orchestrator doesn't just respawn it with the same prompt. She looks at the failure with full business context and figures out how to unblock it.

Agent ran out of context? "Focus only on these three files." Went the wrong direction? "Stop. The customer wanted X, not Y. Here's what they said." Need clarification? "Here's their email and what their company does."

Each failure teaches the orchestrator what works, and the lessons are concrete. This prompt structure ships billing features. Codex needs type definitions upfront. Always include the test file paths.

The reward signal is simple: CI passing, review approved, human merge. Any failure triggers the loop. Over time, the prompts get better because the orchestrator remembers what shipped.
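The loop's two halves, redirecting on failure and remembering what shipped, can be sketched as follows. This is a deliberately simplified stand-in: the orchestrator writes its redirects with judgement, whereas the sketch uses fixed templates to show the shape. The `Failure` kinds, the template wording, and `PromptMemory` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Failure(Enum):
    OUT_OF_CONTEXT = "out_of_context"
    WRONG_DIRECTION = "wrong_direction"
    NEEDS_CLARIFICATION = "needs_clarification"


# Fixed templates stand in for judgement the orchestrator applies case by case.
REDIRECTS = {
    Failure.OUT_OF_CONTEXT: "Focus only on these files: {files}",
    Failure.WRONG_DIRECTION: "Stop. The customer wanted {wanted}, not {built}.",
    Failure.NEEDS_CLARIFICATION: "Background from the customer: {background}",
}


def redirect(failure: Failure, **context: str) -> str:
    """Fill the follow-up message for a failed attempt."""
    return REDIRECTS[failure].format(**context)


@dataclass
class Attempt:
    prompt_notes: str  # what was distinctive about this prompt
    ci_passed: bool = False
    review_approved: bool = False
    merged_by_human: bool = False

    @property
    def shipped(self) -> bool:
        """The reward signal: all three gates cleared."""
        return self.ci_passed and self.review_approved and self.merged_by_human


@dataclass
class PromptMemory:
    lessons: list[str] = field(default_factory=list)

    def record(self, attempt: Attempt) -> None:
        """Keep notes only from attempts that shipped."""
        if attempt.shipped:
            self.lessons.append(attempt.prompt_notes)


print(redirect(Failure.WRONG_DIRECTION, wanted="X", built="Y"))
```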

The ceiling nobody expects

It's RAM.

Each agent needs its own worktree. Each worktree needs its own node_modules. Each agent runs builds, type checks, tests. Three agents building TypeScript simultaneously means three parallel compilers in memory.

On my 8 GB server, I stagger builds. Two agents code while one builds. It works, but it's the constraint I didn't see coming.
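One way to stagger builds across separate agent processes is a shared file lock. Each agent calls a wrapper instead of the build command, and only one holder runs at a time. This is a sketch assuming a Unix host, since `fcntl.flock` is Unix-only. The lock path and the `guarded_build` wrapper are hypothetical, not the actual mechanism.

```python
import fcntl
import subprocess
from contextlib import contextmanager


@contextmanager
def build_slot(lock_path: str = "/tmp/agent-build.lock"):
    """Hold an exclusive lock for the duration of the block.

    flock blocks until the lock is free, so a second agent's build simply
    waits its turn instead of competing for memory.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def guarded_build(
    cmd: list[str], cwd: str, lock_path: str = "/tmp/agent-build.lock"
) -> int:
    """Run a build command in a worktree while holding the shared slot."""
    with build_slot(lock_path):
        return subprocess.run(cmd, cwd=cwd).returncode
```

An agent would then call something like `guarded_build(["npm", "run", "build"], cwd=worktree)`, where the build command depends on the project. Coding and editing stay unthrottled. Only the memory-heavy step queues.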

The bottleneck in an agent swarm isn't intelligence. It's memory.

What this actually looks like

I take a customer call. Come back to my terminal. Three features are in PR. CI is green on two. The third needs a redirect because the agent over-scoped.

I send one line to the tmux session, merge the two good PRs, and check my phone. Total time: twelve minutes.

My sprint velocity tripled. Not because the agents are faster than me at coding. Because they run in parallel, they don't context-switch, and they don't get tired at 2am.

The real unlock

The system isn't the agents. It's the orchestration.

Anyone can run Codex. The leverage comes from an orchestrator that holds enough context to write prompts the agents can actually execute. The business context, the customer history, the architectural decisions, the things that turn a feature request into a precise brief instead of a vague ticket.

Human as architect, AI as executor. Not the other way around.
