Building An Agent Swarm: Lessons From Our First Month
Fifteen minutes. That's how long it took three AI agents to scaffold a new SaaS product, generate the database schema, and wire up Docker. I was still reading the second ticket when they finished the first five.
I'm Mike, solo founder of ArchonHQ. A month ago I started running AI coding agents in parallel using OpenClaw, Codex CLI, and tmux sessions. I began with three. Then seven. Right now, eleven agents work across two separate codebases at the same time: AgentTeams, a brand-new product with 32 tickets, and Mission Control, an existing platform carrying 71.
The idea sounds simple. Give each agent a ticket, point it at the repo, let it work. The reality required something I didn't expect to build: a dispatcher with dependency graphs, phase gates, and a failure protocol that prevents you from burning money on stubbornness.
Here's what actually happened on day one. Wave one: three agents took the scaffold, schema, and Docker tickets. Done in under fifteen minutes, all committed clean. Wave two: auth service, container orchestrator, LLM proxy. Three more agents, three more tickets closed. By the end of the day, 16 of the 32 AgentTeams tickets were complete. 13,270 lines of code added across both projects. Seventeen commits. A git history that looks like I hired a dev team.
I didn't.
But it wasn't all clean commits and smooth merges. One ticket, kan-6, failed twice. The agent produced code that didn't pass verification, rewrote it, and failed again. Another session died silently. No error, no commit, no trace. Just a tmux pane sitting at a blank prompt while I assumed work was happening.
These failures forced the two rules that now hold the whole system together. First: the double failure rule. If an agent fails the same ticket twice, flag it and move on. Don't retry. Don't throw more tokens at it. That ticket is telling you something about its own complexity. Kan-6 eventually got split into three smaller tickets. All three succeeded on the first attempt.
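The double failure rule is small enough to sketch in a few lines. This is an illustrative Python sketch, not the actual ArchonHQ code; `Ticket` and `record_failure` are names I'm inventing here for the example:

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: str
    failures: int = 0
    flagged: bool = False  # flagged tickets go back to a human for decomposition

def record_failure(ticket: Ticket) -> bool:
    """Record a failed attempt. Return True if a retry is allowed,
    False if the ticket is now flagged (double failure rule)."""
    ticket.failures += 1
    if ticket.failures >= 2:
        # Second failure: stop spending tokens. The ticket is telling you
        # it's too big -- split it (kan-6 became three sub-tickets).
        ticket.flagged = True
        return False
    return True
```

The point of returning a boolean is that the dispatcher never decides to "try harder": one retry, then the ticket leaves the automated loop entirely.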
Second: phase gates. Within a single phase, agents spawn automatically as dependencies clear. But between phases, everything stops for human review. No agent gets to decide that Phase 1 is good enough to start Phase 2. That boundary is mine. It's the difference between a swarm and a stampede.
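Put together, the dispatcher loop looks roughly like this. A minimal sketch under stated assumptions: `run_agent` and `human_approves` are placeholder callables standing in for the real agent spawning and the manual diff review, and tickets here get one attempt each for brevity:

```python
def run_phases(phases, deps, run_agent, human_approves):
    """phases: list of lists of ticket ids, in order.
    deps: ticket id -> list of prerequisite ticket ids."""
    done = set()
    for phase in phases:
        pending = list(phase)
        while pending:
            # Within a phase, agents spawn as soon as their dependencies clear.
            ready = [t for t in pending
                     if all(d in done for d in deps.get(t, ()))]
            if not ready:
                break  # remaining tickets are blocked; surface them for review
            for ticket in ready:
                pending.remove(ticket)
                if run_agent(ticket):
                    done.add(ticket)
        # Phase gate: between phases, everything stops. No agent decides
        # the phase is good enough -- a human reads the diffs and approves.
        if not human_approves(phase):
            break
    return done
```

The asymmetry is the design: spawning inside a phase is automatic and greedy, but crossing a phase boundary always costs one human decision.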
The part that surprised me most wasn't the failures. It was where the bottleneck landed. I expected the agents to be slow. I expected token limits, context confusion, merge conflicts. Instead, the agents finished phases faster than I could review them. The constraint on the whole system turned out to be me, sitting at the phase gate, reading diffs.
Which brings me to the thing I think matters most, the thing worth naming.
Every agent in my swarm runs the same model. Same weights, same architecture, same Codex instance. There's no "frontend specialist" model or "database expert" model. The specialisation comes entirely from the prompt. Each agent receives a unique brief: the ticket scope, the repo structure, which files other agents are touching (so it avoids conflicts), and a verification checklist it must pass before committing.
Specialisation through context, not through different models. The prompt is the specialisation.
This changes how you think about scaling. You don't need a better model for harder tickets. You need a better prompt. You need tighter context. You need to tell the agent exactly what it's building, exactly what it should not touch, and exactly how to verify its own work. When I got that right, the same model that failed kan-6 twice went on to close three decomposed sub-tickets without a single error.
One month in, the system runs two projects simultaneously with a solo founder at the controls. The dispatcher manages dependencies. The phase gates keep humans in the loop where it matters. The double failure rule prevents waste. And eleven identical agents, differentiated only by what they're told, produce work that would take a small team weeks.
The agents are not the bottleneck. The prompts are not the bottleneck. The bottleneck is the person at the gate, and that's exactly where it should be.



