Developer Onboarding Guide
Everything you need to start working on ArchonHQ.
Developer Onboarding Guide
You can use this guide to start working on ArchonHQ quickly. It assumes you can read a terminal and have done software development before. Skip to whatever section you need.
1. System Overview
ArchonHQ is a SaaS dashboard built on top of OpenClaw, an agentic AI runtime. The product gives teams a browser UI to manage AI agents, sprints, billing, and deployments. The backend is Next.js with Drizzle ORM against Postgres.
Component Map
ArchonHQ (this repo) is the web application. It runs as a Docker container on the production server, exposed via Traefik reverse proxy. The dev instance runs as a raw Node process on separate ports.
OpenClaw is the AI gateway. It handles agent orchestration, cron scheduling, message routing, and the Telegram bot. Navi (described below) runs inside OpenClaw.
AiPipe is the LLM router that proxies model API calls. It sits between the application and providers like Anthropic and OpenAI, handling key rotation, cost tracking, and fallbacks.
tls-proxy is a small Node.js TLS termination proxy (tls-proxy.js). It listens on port 3000 (HTTPS) and forwards to the container on port 3002. A watchdog script (start-proxy.sh) keeps it alive. It gets restarted automatically at the end of every deploy.
Cloudflare Tunnel (cloudflared) creates an encrypted tunnel from Cloudflare's edge to the server. Public traffic hits Cloudflare, passes through the tunnel to tls-proxy on port 3000, then to the Docker container on port 3002. The tunnel token lives in pass at apis/cloudflare-tunnel-token.
Postgres runs locally on the server via a Unix socket. Connection string: postgresql://openclaw@/mission_control?host=/var/run/postgresql.
Traefik handles routing and TLS inside the Coolify network. The production container connects to the coolify Docker network and registers with Traefik via labels.
Key Repos
openclaw-mission-control: the web app (this repo)navi-ops: the Go CLI that drives autonomous DevOps operationsopenclaw-starter: one-command VPS installer for getting your own OpenClaw instance
URL Reference
- Production:
https://archonhq.ai - Dev preview:
https://dev.archonhq.ai - Dev local HTTP:
http://127.0.0.1:3003 - Dev local HTTPS:
https://<tailscale-host>:3004 - Container health:
http://127.0.0.1:3002/api/health
2. Working with Navi
Navi is the DevOps brain of this project. It is not a chatbot you prompt occasionally. It is an autonomous agent that reads sprint.json every 30 minutes, classifies work items, dispatches sub-agents for safe work, and pings Telegram when it needs a decision.
The primary interface is Telegram (chat ID configured in your OpenClaw workspace). Most operational communication happens there, not in Slack or email.
What Navi Actually Does
Every 30 minutes, navi-ops run fires via OpenClaw cron. The loop is:
- Load
workflow/sprint.json - Classify every item as AUTO, NEEDS_USER, or BLOCKED
- Dispatch sub-agents for AUTO items in parallel via the dispatcher at
http://127.0.0.1:7070 - Send Telegram alerts for NEEDS_USER items
- Log everything to
/home/openclaw/.openclaw/workspace/navi-ops.log
The Three States
AUTO means Navi executes without asking. All of these must be true: the epic is in_progress or active, the item is todo, the item type is Quick, all dependencies are done, autoMergeToDev is true, and the CONFIDENCE_SCORE after implementation reaches 95 or above.
NEEDS_USER means Navi sends a Telegram message and waits. Any one of these triggers it: the epic is still backlog, the item type is Standard or Split, autoMergeToDev is false, CONFIDENCE_SCORE is below 95, or the work touches public posting, secrets, OAuth, or a production deploy.
BLOCKED means a dependency is unresolved. Navi logs it and moves on silently.
Common Request Patterns
To start a new feature, describe what you want in Telegram. Navi will write a pre-flight spec (Phase 0) and ask for approval before writing any code.
To check the current sprint state: navi-ops plan
To see system health: navi-ops status
To manually trigger the autonomous loop: navi-ops run
To preview what the loop would do without executing: navi-ops run --dry-run
Getting Your Own Navi Clone
See Section 10.
3. Sprint and Feature Request Workflow
Work flows through six phases. Understanding this prevents confusion about why something is not deployed yet.
Phase 0: Pre-flight Spec
Before any code is written, Navi (or you) writes a spec covering assumptions, success criteria, and explicit scope exclusions. This is not optional. It is the contract that lets CONFIDENCE_SCORE mean something.
Write a feature request by messaging Navi in Telegram with what you want, in plain language. Navi will draft the spec and ask you to approve it. Once you approve, the item enters the sprint.
Phase 1: Stories
Complex items (type Split) get broken into sub-items. Navi uses the architect agent (Claude Sonnet) to design the decomposition, then the planner agent to write individual specs.
Simple items (type Quick) skip this and go straight to implementation.
Phase 2: Readiness Check
Before implementation, a readiness check runs to verify the spec is complete enough to execute. CONFIDENCE_SCORE is evaluated here. Below 95, the item becomes NEEDS_USER and you get a Telegram message with the score breakdown.
Phase 3: Implementation
The code-agent (gpt-5.3-codex in Development Mode) writes the code. The tdd-guide writes tests in parallel where possible. Changes are committed to a feature/* branch.
Phase 4: Quality Contract
The code-reviewer agent reviews the implementation. If the item touches auth, tenant logic, file handling, or user input, the security-reviewer also runs. Both must pass before merge.
Phase 5: Docs Gate
This is a hard gate before any feature reaches production. For items where docsRequired: true in sprint.json, the following must exist:
docs/features/<docSlug>.mddocs/technical/<docSlug>.md- Roadmap entry moved to Delivered
- Landing page reviewed for accuracy
navi-ops release check fails without these. If gaps exist, the doc-updater agent (Claude Opus) is dispatched automatically.
Production Release
Mike reviews dev.archonhq.ai, then says "merge to main" in Telegram. Navi runs navi-ops release check (which runs regression tests, pre-release-check.sh, and the docs gate). If everything passes with ALL CLEAR, Navi does the merge and push. GitHub Actions deploys from there.
sprint.json as Source of Truth
Everything lives in workflow/sprint.json. The sprint ID is the ISO week (e.g. 2026-W08). Epics have phases, statuses, and items. Items have types, dependencies, autoMergeToDev flags, docsRequired flags, and docSlug values.
Mike approves three things and only three things: the Monday sprint plan, the production release (dev to main), and items that come back with CONFIDENCE_SCORE below 95.
4. Development Setup
Prerequisites
- Node.js 22 (check with
node --version) - pnpm or npm (the repo uses npm scripts but pnpm works fine)
- Go 1.22 or later (for building navi-ops)
- Postgres running locally
- Tailscale (recommended for accessing the dev preview URL)
Clone and Install
git clone [email protected]:<org>/openclaw-mission-control.git
cd openclaw-mission-control
npm installEnvironment Variables
Copy the key list below into .env.local at the repo root. All values come from the pass store or the team's shared secret manager. Do not hardcode any values.
Required keys:
DATABASE_URL
NEXTAUTH_URL
NEXTAUTH_SECRET
GOOGLE_CLIENT_ID
GOOGLE_CLIENT_SECRET
GATEWAY_URL
WORKSPACE_PATH
SSL_CERT
SSL_KEY
API_SECRET
TELEGRAM_BOT_TOKEN
TELEGRAM_CHAT_ID
RESEND_API_KEY
STRIPE_SECRET_KEY
STRIPE_WEBHOOK_SECRET
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY
STRIPE_PRO_PRICE_ID
STRIPE_TEAM_PRICE_ID
STRIPE_PRO_REGULAR_PRICE_ID
STRIPE_TEAM_REGULAR_PRICE_ID
AIPIPE_URL
AIPIPE_ADMIN_SECRET
ANTHROPIC_API_KEY
DEPLOY_WEBHOOK_SECRET
COOLIFY_API_TOKEN
COOLIFY_APP_UUID
ENCRYPTION_KEY
GITHUB_PATFor local dev, NEXTAUTH_URL should be https://dev.archonhq.ai. Never set it to a localhost URL in .env.local since the pre-release check will flag that as an env leak.
Database details Setup
The database runs as the local openclaw Postgres user via Unix socket:
psql "postgresql://openclaw@/mission_control?host=/var/run/postgresql"To apply schema migrations:
DATABASE_URL="postgresql://openclaw@/mission_control?host=/var/run/postgresql" npx drizzle-kit pushRunning the Dev Server
bash start-dev.shThis kills any existing dev instance, sources .env.local, and starts Next.js via tsx server.ts on:
- HTTP:
http://127.0.0.1:3003 - HTTPS:
https://<tailscale-host>:3004
Logs go to /tmp/mc-dev.log. The server compiles on demand, so no pre-build is needed. Code changes are live on the next page load.
To stop the dev server cleanly:
kill $(cat /tmp/mc-dev.pid)Do not use pkill -f next. It will kill the exec shell. Always use the PID file.
5. Build, Test, and Debug
TypeScript Check
Before committing anything, run:
npx tsc --noEmitZero errors is the standard. The pre-release check enforces this and will block a merge if TS errors exist.
Regression Suite
The regression suite covers build, database, public API, auth enforcement, pages, newsletter, Stripe, middleware, infrastructure, and content integrity.
Run against the local dev server:
bash scripts/regression-test.shRun against production to verify a deploy:
bash scripts/regression-test.sh --prodRun against a specific URL:
bash scripts/regression-test.sh --base http://localhost:3003Exit code 0 means all pass. Exit code 1 means failures. The output includes a Results summary line you can read at a glance.
Pre-Release Check
This script must pass before every dev to main merge. It runs automatically from the pre-push hook, but you can run it manually at any point:
bash scripts/pre-release-check.shIt checks (in order):
- Regression suite (runs the full regression test as its first gate)
- Git branch and unpushed commits
- Changed files scanned for env-specific values baked into source
- TypeScript errors
- Coolify env var audit for duplicates and required keys
NEXTAUTH_URLset correctly for production- Active Stripe prices
- Production and dev returning HTTP 200
- Infrastructure running (Cloudflare tunnel, tls-proxy, Postgres)
Run with --fix-coolify to automatically delete Coolify env duplicates:
bash scripts/pre-release-check.sh --fix-coolifyDev Server Logs
tail -f /tmp/mc-dev.logFor production server logs, check the Docker container:
docker logs archonhq --tail 100 -fFor tls-proxy logs:
tail -f /tmp/tls-proxy.logFor navi-ops autonomous loop logs:
navi-ops log --n 50Or read the raw file:
tail -f /home/openclaw/.openclaw/workspace/navi-ops.logCommon Error Patterns
Server fails to start: Check /tmp/mc-dev.log for the first error. Usually a missing env var or port conflict. Make sure nothing is already using 3003/3004.
Auth redirect loops: NEXTAUTH_URL is wrong. For dev it must be https://dev.archonhq.ai, not localhost.
Database connection errors: Postgres may not be running, or the Unix socket path is wrong. Try psql "postgresql://openclaw@/mission_control?host=/var/run/postgresql" directly.
TypeScript errors after pulling: Run npm install first. A dependency may have updated its types.
Coolify env duplicates: Run bash scripts/pre-release-check.sh --fix-coolify. Duplicate env vars in Coolify cause silent config problems.
6. Git Workflow
Branch Model
feature/<story-id>-short-description → dev → mainAll feature work happens on feature/* branches cut from dev. PRs target dev, not main. The dev branch deploys to dev.archonhq.ai for review. main is production only.
Never commit directly to dev or main. Always use a branch.
Conventional Commits
Every commit message follows this format:
<type>: <short description>
<optional body with evidence: test count, build status>Types: feat, fix, refactor, perf, docs, test, chore, ci
One commit per story. Do not mix unrelated cleanup with behavior changes. For significant changes, include verification evidence in the commit body (test counts, build status, what was checked).
Examples:
feat: add team billing portal with Stripe customer portal redirect
Regression: 47 pass, 0 fail. TypeScript: 0 errors. Tested locally against Stripe test mode.fix: correct NEXTAUTH_URL env leak in checkout route fallbackdocs: add Phase 5 docs gate documentation to technical guideAuto-merge to Dev
When a Quick item meets all AUTO conditions (CONFIDENCE_SCORE ≥ 95, build passes, zero test failures, scope matches the pre-flight spec), Navi merges automatically:
git checkout dev && git merge feature/xxx --no-ff && git pushNavi then sends a one-line passive notification to Telegram. No action needed.
Merge to Main
This requires Mike's explicit approval. Always. Not for hotfixes, not at CONFIDENCE_SCORE 100, not for anything "trivial". The pre-push hook enforces this: direct pushes to main (non-merge commits) are blocked.
The hook also runs pre-release-check.sh automatically when you push a merge commit to main. If the check fails, the push is rejected.
To bypass in a genuine emergency (requires Mike's explicit say-so):
git push --no-verifyPRs and Review
Open PRs against dev. The code-reviewer agent reviews automatically as part of the sprint workflow. For security-sensitive changes (auth, tenancy, file handling, input validation), the security-reviewer also runs.
7. Deploy and Monitor
Full Deploy Pipeline
- Mike approves "merge to main" in Telegram
navi-ops release checkruns: regression tests +pre-release-check.sh+ Phase 5 docs gate- ALL CLEAR: Navi runs
git checkout main && git merge dev --no-ff && git push - GitHub Actions triggers on push to
main - Actions SSH into the production server
- Server pulls latest code and builds a new Docker image (
archonhq:new) - New container starts on port 3005 for a health check (
/api/health) - Health check passes: old container is stopped and removed, new container starts on port 3002 as
archonhq start-proxy.shrestarts tls-proxy to restore the CF path- Old images are pruned
The full sequence takes about 3 to 5 minutes. There is a brief gap (around 2 seconds) during container swap when the CF tunnel path is interrupted.
Checking Deploy Status
Watch GitHub Actions in the repo's Actions tab, or check the container directly:
docker ps | grep archonhq
curl -sf http://127.0.0.1:3002/api/health && echo "healthy"navi-ops Release Commands
Run the full release gate before merging:
navi-ops release checkIf Phase 5 gaps exist, dispatch the doc-updater:
navi-ops release docsExecute the dev to main merge (only after ALL CLEAR):
navi-ops release deployAlert Conditions
Navi sends Telegram alerts automatically for:
- Production returning non-200: CRITICAL, message says "URGENT: prod is {code}"
- Cloudflare tunnel process not running: CRITICAL
- Regression test failure: HIGH
- Gateway RSS above 1000 MB: HIGH
- Available RAM below 400 MB: HIGH
- NEEDS_USER item ready for decision: INFO
- CONFIDENCE_SCORE below 95: INFO with score breakdown
If you get a CRITICAL alert, check the tls-proxy and cloudflared processes first. Most production outages are one of those two.
Log Locations
| What | Where |
|---|---|
| Dev server | /tmp/mc-dev.log |
| Prod container | docker logs archonhq |
| tls-proxy | /tmp/tls-proxy.log |
| navi-ops loop | /home/openclaw/.openclaw/workspace/navi-ops.log |
| OpenClaw gateway | openclaw gateway status then check its log path |
8. DevOps Tools Reference
navi-ops
The Go CLI for autonomous DevOps. Binary at /home/openclaw/.local/bin/navi-ops.
navi-ops status # full system snapshot: git, health, board, regression, docs gate
navi-ops plan # classify all sprint items as AUTO/NEEDS_USER/BLOCKED/DONE
navi-ops plan --filter AUTO # show only AUTO items
navi-ops run # execute autonomous work loop
navi-ops run --dry-run # preview what the loop would do without executing
navi-ops release check # full release gate (regression + pre-release-check + docs)
navi-ops release docs # dispatch doc-updater for Phase 5 gaps
navi-ops release deploy # execute dev to main merge (requires ALL CLEAR)
navi-ops log # show recent run log entries
navi-ops log --n 100 # show last 100 entries
navi-ops cron install # install the 30-min heartbeat cron job
navi-ops cron status # check cron job statusopenclaw CLI
Manages the OpenClaw gateway, agents, and cron jobs.
openclaw gateway status # check gateway health and memory
openclaw gateway start # start the gateway
openclaw gateway stop # stop the gateway
openclaw gateway restart # restart the gateway
openclaw agent --agent main -m "your message" # send a message to the main agent
openclaw agent --agent code-agent -m "..." # invoke a specific sub-agentAiPipe
The LLM router. Check health:
curl $AIPIPE_URL/healthzAdmin endpoint (requires AIPIPE_ADMIN_SECRET):
curl -H "Authorization: Bearer $AIPIPE_ADMIN_SECRET" $AIPIPE_URL/admin/statsCloudflare Tunnel
Check if cloudflared is running:
pgrep -a cloudflaredRestart if down:
cloudflared tunnel run <tunnel-name> &The tunnel token is in pass at apis/cloudflare-tunnel-token.
pass Store
All secrets live in the pass store. Never put them in files.
pass show apis/cloudflare-tunnel-token
pass show anthropic/api-keyFor scripts, use bash automation/secrets.sh <path> which wraps pass show safely.
Agent Roles
These sub-agents are dispatched by the autonomous loop or manually via openclaw agent:
architect: designs story decomposition for complex Split items (Sonnet, Research Mode)planner: writes specs for Standard and Split items (Sonnet, Research Mode)code-agent: all implementation work (gpt-5.3-codex, Development Mode)tdd-guide: writes tests alongside implementation (gpt-5.3-codex, Development Mode)code-reviewer: reviews implementation before merge (Sonnet, Review Mode)security-reviewer: reviews auth, tenant, file, and input changes (Sonnet, Review Mode)build-error-resolver: fixes build failures, max 5 runs (gpt-5.3-codex, Development Mode)doc-updater: fills Phase 5 documentation gaps, auto-spawned (Claude Opus, Development Mode)e2e-runner: runs regression-test.sh and smoke tests (Sonnet, Development Mode)refactor-cleaner: cleanup pass after feature work (gpt-5.3-codex, Development Mode)
Agent role definitions live in /home/openclaw/.openclaw/workspace/workflow/agents/*.md.
9. Hard Rules
These are non-negotiable. They exist because someone (or some agent) learned the hard way.
Never push or merge to main without Mike's explicit approval. Not for hotfixes. Not at CONFIDENCE_SCORE 100. Not for "trivial" docs changes. The pre-push hook blocks direct pushes. If it ever needs bypassing, Mike must say so explicitly. Bypassing without that is a breach of the process.
Never hardcode secrets. All config goes in .env.local (gitignored) or the pass store. If a value is sensitive, it does not go in source. No exceptions.
Never use pkill -f next. This kills the exec shell, not just the server. Always use the PID file: kill $(cat /tmp/mc-dev.pid) for dev or kill $(cat /tmp/mc.pid) for prod.
Never re-enable mission-control.service. The systemd service is permanently disabled. The server is managed via PID files and Docker. If you see instructions to systemctl enable mission-control, they are outdated.
Never auto-post to social media. All tweets and LinkedIn posts require Mike's explicit review and approval. Navi will never post automatically. Any workflow that could post publicly must have a manual gate.
Never skip Phase 5 docs gate. If navi-ops release check fails on the docs gate, do not bypass it. Run navi-ops release docs to dispatch the doc-updater, or write the docs yourself.
Karpathy coding standards apply to all code. Target 200 to 400 lines per file, never exceed 800. Explicit error handling everywhere, no silent catch blocks. Validate all external input at API boundaries. Surgical changes only: touch only files that are in scope for the current story. No mixing unrelated cleanup with behavior changes.
One hypothesis at a time. When debugging, change one thing, verify, then move to the next. Do not stack speculative changes.
Max 2 research note emails per day to the work email address. Cron handles these. Do not send extras during heartbeats or ad-hoc sessions.
10. Getting Your Own Navi Clone
This section is for contributors who want to run the same autonomous DevOps workflow on their own infrastructure, separate from Mike's instance.
What It Means
Your own Navi clone is a separate OpenClaw instance running on a separate VPS, with its own SOUL.md, AGENTS.md, USER.md, and IDENTITY.md configured for you. It shares the same workflow structure (sprint phases, WORKFLOW_AUTO.md, agent roles) but operates completely independently. It talks to your Telegram, not Mike's.
Install OpenClaw on a Fresh VPS
curl -fsSL https://raw.githubusercontent.com/MikeS071/openclaw-starter/main/vps-install.sh | sudo bashThis handles: OpenClaw install and systemd user service, UFW firewall, Fail2ban, Tailscale VPN, structured workspace files, and the ocl management CLI. Requires Ubuntu 22.04 or 24.04. Takes about 5 minutes.
If you already have OpenClaw installed:
git clone https://github.com/MikeS071/openclaw-starter.git
cd openclaw-starter
bash install.shConfigure Your Persona
After install, edit these files in ~/.openclaw/workspace/:
SOUL.md: your working principles, autonomy rules, coding standardsUSER.md: who you are, your preferences, context Navi needs to knowAGENTS.md: session rules, memory protocol, group chat behaviorIDENTITY.md: Navi's name and personality for your instance
These files are loaded into every session. They are the core of how your Navi behaves. Spend time on SOUL.md in particular. A vague SOUL.md means an inconsistent agent.
Configure the LLM Provider
openclaw onboard --non-interactive --accept-risk --auth-choice anthropicOr for OpenAI:
openclaw onboard --non-interactive --accept-risk --auth-choice openai-api-keyStart the Gateway
nohup openclaw gateway run > /tmp/openclaw-gateway.log 2>&1 &On a properly installed VPS this runs automatically via the systemd user service on reboot.
Test it:
openclaw agent --agent main -m "Say hello"Connect to ArchonHQ Dashboard
Once your OpenClaw gateway is running, you can connect it to an ArchonHQ dashboard in BYO (Bring Your Own) mode. In the dashboard, go to Settings and select "Connect your own OpenClaw instance". You will need your gateway URL and API secret.
Use the Same Workflow Structure
Copy WORKFLOW_AUTO.md from this repo's workspace to your own instance. This gives your Navi the same phase discipline: pre-flight specs before code, CONFIDENCE_SCORE gates, Phase 5 docs requirement, and the same release rules. The workflow is designed to be persona-agnostic. Your Navi and Mike's Navi will follow the same process even though they are separate instances that never talk to each other.
Manage Your Installation
ocl status # check gateway health
ocl logs # view recent logs
ocl restart # restart the gateway
ocl backup # backup your config
ocl update # update to latest OpenClaw
ocl harden # restrict SSH to Tailscale onlyOne Important Note
Your instance is completely separate. It will not interfere with Mike's Navi, and Mike's Navi will not affect yours. They share no state, no crons, no databases. The only shared artifact is the workflow protocol documented here.