<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[ArchonHQ]]></title><description><![CDATA[Practical AI that wins every day]]></description><link>https://archonhq.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!pNku!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3743a6e-7a43-4175-a8bb-20548855b667_112x112.png</url><title>ArchonHQ</title><link>https://archonhq.ai</link></image><generator>Substack</generator><lastBuildDate>Mon, 25 May 2026 14:14:02 GMT</lastBuildDate><atom:link href="https://archonhq.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Michal Szalinski]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[michalszalinski@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[michalszalinski@substack.com]]></itunes:email><itunes:name><![CDATA[Michal Szalinski]]></itunes:name></itunes:owner><itunes:author><![CDATA[Michal Szalinski]]></itunes:author><googleplay:owner><![CDATA[michalszalinski@substack.com]]></googleplay:owner><googleplay:email><![CDATA[michalszalinski@substack.com]]></googleplay:email><googleplay:author><![CDATA[Michal Szalinski]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Your Content Is a Production Pipeline: Build It Like One]]></title><description><![CDATA[A system that discovers ideas, filters them, drafts them, QA's them, generates visuals, publishes, distributes, and measures]]></description><link>https://archonhq.ai/p/your-content-is-a-production-pipeline</link><guid 
isPermaLink="false">https://archonhq.ai/p/your-content-is-a-production-pipeline</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Sun, 24 May 2026 21:01:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tOcd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tOcd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tOcd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png 424w, https://substackcdn.com/image/fetch/$s_!tOcd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png 848w, https://substackcdn.com/image/fetch/$s_!tOcd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png 1272w, https://substackcdn.com/image/fetch/$s_!tOcd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tOcd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png" width="1100" height="550" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:550,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1085349,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/197809116?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tOcd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png 424w, https://substackcdn.com/image/fetch/$s_!tOcd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png 848w, https://substackcdn.com/image/fetch/$s_!tOcd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png 1272w, https://substackcdn.com/image/fetch/$s_!tOcd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ab36e06-2ecb-47e0-bd28-3986226d43f7_1100x550.png 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>You told yourself you&#8217;d post weekly. It&#8217;s been six weeks. Your Substack dashboard mocks you with that sad &#8220;0 posts this month&#8221; counter. You open a blank document, stare at it, close it, open Hacker News instead. The guilt loop continues.</p><p>Meanwhile, the AI bros on X are posting three times a day about &#8220;content leverage&#8221; while clearly using the same ChatGPT template as everyone else. Quantity up, quality sideways, audience numb.</p><p>There&#8217;s a third option. 
You can treat content the way you treat production software: as a pipeline with intake, quality control, assembly, finishing, distribution, and feedback. Skip any step and you get either silence or garbage. Run every step and you get consistent, high-quality output while you sleep.</p><p>I know because I built it. This article you&#8217;re reading? It came out of that pipeline. The other articles in this series? Same pipeline. Six Python scripts, five cron jobs, one environment file, zero frameworks.</p><p>Here&#8217;s the architecture.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://archonhq.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">ArchonHQ is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>The Idea (60 Seconds)</h2><p>Content is a manufacturing problem, and manufacturing problems have manufacturing solutions. You need a system that discovers ideas, filters them, drafts them, QA&#8217;s them, generates visuals, publishes, distributes, and measures. Each stage is a script. Each script runs on a schedule. The human touches two points: approving ideas (5 minutes) and reviewing QA failures (15 minutes, rare). Everything else is automated.</p><h2>Why This Pipeline, Not Manual Blogging</h2><p>Most people treat content as inspiration plus typing. They wait for the muse, then labor over every sentence. It&#8217;s artisanal. Admirable. 
And completely unscalable past a few posts per month. The pipeline approach treats content as what it actually is for a technical blog: a manufacturing process. The ideas are raw materials. The scoring is quality control on intake. The drafting is assembly. The QA is inspection. The hero image is finishing. The distribution is logistics. The analytics are customer feedback.</p><p>The output: 2&#8211;3 articles per week. The cost: ~$0.50 per article in LLM and image generation tokens. The human time: under 30 minutes per day.</p><h2>The Framework: Six Stages, Six Scripts</h2>
      <p>
          <a href="https://archonhq.ai/p/your-content-is-a-production-pipeline">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build Your Own Cline Alternative in 200 Lines]]></title><description><![CDATA[Create a minimal VS Code extension that handles file operations, executes terminal commands, and connects to any OpenAI-compatible API.]]></description><link>https://archonhq.ai/p/build-your-own-cline-alternative</link><guid isPermaLink="false">https://archonhq.ai/p/build-your-own-cline-alternative</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Fri, 22 May 2026 21:00:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!e_XO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e_XO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e_XO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png 424w, https://substackcdn.com/image/fetch/$s_!e_XO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png 848w, https://substackcdn.com/image/fetch/$s_!e_XO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png 1272w, 
https://substackcdn.com/image/fetch/$s_!e_XO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e_XO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png" width="1100" height="550" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:550,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:972740,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/197808465?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e_XO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png 424w, https://substackcdn.com/image/fetch/$s_!e_XO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png 848w, https://substackcdn.com/image/fetch/$s_!e_XO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png 1272w, 
https://substackcdn.com/image/fetch/$s_!e_XO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f492a9e-85a3-47ea-b289-c12d467632b2_1100x550.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Your AI coding assistant vanishes overnight. Cline gets abandoned. Roo Code stops responding to issues. The VS Code extension that automated your file operations, ran terminal commands, and integrated with your preferred AI models suddenly throws deprecation warnings.</p><p>You&#8217;re back to copying code snippets manually. 
Context switching between terminal and editor. Explaining the same codebase structure to ChatGPT every session. The 40% productivity boost from autonomous coding assistance evaporates because someone else controlled the tools you relied on.</p><p>What if you could build your own AI coding assistant in an afternoon, own the entire stack, and customize it exactly for your workflow?</p><h2>The Idea (60 Seconds)</h2><p>You&#8217;ll create a minimal VS Code extension that handles file operations, executes terminal commands, and connects to any OpenAI-compatible API. The 200-line implementation provides autonomous coding capabilities through a simple chat interface that can read your codebase, modify files, and run commands. Setup takes 30 minutes. The result gives you permanent control over your AI coding workflow.</p><h2>Why Build This, Beyond Waiting for Alternatives</h2><p><strong>Dependency risk drops to zero.</strong> Commercial tools get discontinued. Open source projects get abandoned. Your custom extension lives in your codebase under your control. 
Zero external dependencies means zero abandonment risk.</p><p><strong>Customization becomes unlimited.</strong> You control the prompts, the model endpoints, and the file operation logic. Add project-specific commands. Integrate with your deployment scripts. Modify the behavior to match your exact workflow.</p><p><strong>API flexibility stays open.</strong> Connect to OpenAI, Anthropic, local Ollama instances, or any OpenAI-compatible endpoint. Switch providers by changing one configuration line. Your tool adapts to whatever AI infrastructure you prefer.</p><h2>Walkthrough</h2>
      <p>
          <a href="https://archonhq.ai/p/build-your-own-cline-alternative">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[How to Give Claude Perfect Memory]]></title><description><![CDATA[Three layers of memory, each building on the last. Layer one takes five minutes and covers 90%+ of users.]]></description><link>https://archonhq.ai/p/how-to-give-claude-perfect-memory</link><guid isPermaLink="false">https://archonhq.ai/p/how-to-give-claude-perfect-memory</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Wed, 20 May 2026 21:09:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2suO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2suO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2suO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!2suO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!2suO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!2suO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2suO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2875249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/197807366?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2suO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!2suO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!2suO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!2suO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5abef7e3-0d7f-49e5-b615-d6257fc46c60_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>By default, Claude&#8217;s memory is basically decorative. It forgets context mid-conversation. 
You re-explain yourself constantly. Even after you do, the next session starts from zero.</p><p>Most people have been living with this for months, assuming it&#8217;s just how LLMs work. It&#8217;s how LLMs work <em>absent a system</em>. With a system, everything changes.</p><p>I use Claude every single day. More screen time than any other app on my Mac. I need it sharp, consistent, and carrying forward every decision, preference, and hard-won lesson from the sessions before.</p><h2>The Idea (60 Seconds)</h2><p>Three layers of memory, each building on the last. Layer one takes five minutes and covers 90%+ of users.</p><h2>Why Build a Memory System Instead of Re-explaining</h2><p>Every time you start a new Claude session, you burn tokens re-establishing context. Over a month, that compounds into hours of wasted time and inconsistent outputs. A memory system pays for itself on day one. Layer one alone saves ten minutes per session. Layer three makes Claude genuinely useful for long-running projects where consistency across weeks matters.</p><p>Layer two takes about an hour and changes how Claude operates entirely.</p><p>Layer three turns Claude into a self-evolving second brain, trained on all your data, with persistent search and recall across every conversation you&#8217;ve ever had.</p><p>Here are all three.</p>
<h2>Layer 1: Basic Memory (5 Minutes)</h2><p>Four quick wins. Minutes to set up. Immediate improvement in every conversation.</p><h3>1. Memory Editing Tool</h3><p>Go to Settings &#8594; Memory right now.</p><p>This is the most overlooked page in Claude. Most people have zero awareness it exists.</p><p>What you&#8217;ll find: everything Claude has stored about you, accumulated passively across every conversation. Preferences, facts, habits, working styles. Left alone, your memory fills up with garbage fast.</p><p>The fix: read through everything on this page. Delete anything outdated, inaccurate, or irrelevant. Then manually add the context you actually want Claude to carry permanently.</p><p>Stick to the basics here (your role, key preferences). We&#8217;ll build highly specific systems soon.</p><h3>2. Project Instructions</h3><p>If you use Claude Projects (you should), fill in your Project Instructions field.</p><p>My advice: create projects for all your most-used workflows, then voice-prompt all your context into a Google Doc and upload it as a PDF for each project.</p><h3>3. Tell Claude Directly</h3><p>The simplest memory hack on this list. Mid-conversation, just tell Claude what to remember.</p><p>Things like:</p><p>&#8220;Remember that I prefer responses under 400 words.&#8221;<br>&#8220;Remember that my role is [x].&#8221;<br>&#8220;Update your memory with [x].&#8221;</p><p>Claude stores these immediately. You can also tell it to forget things: &#8220;Forget that I mentioned [x].&#8221;</p><h3>4. 
Memory Imports and Exports</h3><p>If you&#8217;ve been using ChatGPT (or another LLM) and have built up significant context there, you have two options to transfer it:</p><p>a) Tell ChatGPT you&#8217;re switching platforms and ask it to generate a memory export document: &#8220;I&#8217;m switching this project to Claude, give me a summary document...&#8221;</p><p>b) Use Import/Export in Claude. In Settings &#8594; Memory, you can import full data from other LLMs.</p><p>These four edits cover 90%+ of users and make an immediate impact on how Claude responds.</p><p>The next section is for people who want a real system.</p><h2>Layer 2: Context File System (~60 Minutes)</h2><p>Layer 1 fixes the basic memory problems. Layer 2 builds something more powerful: a file-based memory architecture that lives on your computer and loads automatically into Cowork and Claude Code.</p><p>The concept: instead of prompting Claude for context every time, you store all of that context in Markdown (.md) files on your desktop that Claude has access to. You can also attach these markdown files to any LLM or AI agent system.</p><p>Create a new desktop folder, label it &#8220;Claude Master Folder&#8221;, and build these four markdown files within it (Claude can help you do this):</p><h3>1. Instructions.md</h3><p>This file tells Claude all your rules and instructions:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;69562032-31d6-4842-91f0-e8a1272e580e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">## Who you are
## What you do
## Rules
## What good outputs look like
</code></pre></div><p>Important to include: &#8220;Update Memory.md with my preferences over time.&#8221;</p><p>This line is crucial. It&#8217;s how you get Claude to create a running memory log of your data in the second markdown file.</p><h3>2. Memory.md</h3><p>This is the &#8220;brain&#8221; of Claude, continuously updated over time.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;7574c8df-86be-442c-8f85-f7ba3c4ca581&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">## Preferences
## Corrections
## Patterns
## Decisions
</code></pre></div><p>Now whenever you say something like &#8220;stop using em dashes,&#8221; Claude goes into the memory file and updates it.</p><h3>3. Context.md</h3><p>The specific context file for a given project. What&#8217;s in this file changes depending on your project. You can also create a general &#8220;business context&#8221; or &#8220;life context&#8221; markdown mega file.</p><h3>4. Archive Copies</h3><p>This one is purely protective but worth doing.</p><p>Claude will update your memory files automatically as you work. Occasionally, it overwrites something incorrectly or makes a change you missed. Absent a backup system, that context is gone.</p><p>The fix: once a week, copy your entire master folder (Instructions, Memory, Context, and everything else) into a separate archive folder that Claude has zero access to. Label it with the date.</p><p>If anything breaks or gets overwritten incorrectly, restore from the archive.</p><h3>Setting It Up</h3><p>Just create a new folder called &#8220;Claude Master Folder,&#8221; attach it to a new Cowork chat, and paste this prompt:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;863cdad9-fff2-48c5-a646-d2fd571be4b2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Go into my "Claude Master Folder" in my connected workspace and build 
these four markdown files inside it:

Instructions.md - includes sections for: Who You Are, What You Do, 
Rules, What Good Outputs Look Like, and a line telling Claude to 
update Memory.md with my preferences over time.

Memory.md - includes sections for: Preferences, Corrections, 
Patterns, Decisions, and Personal Context. Pre-fill with placeholder 
examples so I know what to add.

Context.md - includes sections for: About This Project/Business, 
Audience, Key People &amp; Collaborators, Active Projects &amp; Priorities, 
Tools &amp; Stack, and Important Background/History. Use a template 
format with placeholders I can fill in.

Archive-Guide.md - a step-by-step guide explaining why to archive,
how to do it weekly (duplicate the folder, rename with the date, 
move it somewhere Claude has zero access to), what to include, 
how to restore if something breaks, and where to store the backups.
</code></pre></div><p>Anytime you&#8217;re working in Cowork or Claude Code, attach your Master Folder and Claude uses it as a mini memory database. It edits the memory markdown file, leaving you with something you can attach to any LLM, new chat, or AI agent.</p><p>This system is a complete game-changer. But Layer 3 takes it further.</p><h2>Layer 3: AI Second Brain (1-2 Hours)</h2><p>This is the deepest level. It requires setup and ongoing maintenance, but for those who build it, it&#8217;s the most advanced memory system available for Claude today.</p><p>Two options depending on how you work. Option 1 is the fast path. Option 2 is the power-user path, requiring 1-2 hours of dedicated building.</p><p>Keep in mind: for your AI second brain memory vault to be effective, you have to spend time maintaining it and updating your databases. This is a living system; a set-and-forget approach produces decay.</p><h3>Option 1: Claude x Notion (5 Minutes)</h3><p>Connecting Claude to Notion is the highest-leverage thing you can do in 5 minutes.</p><p>Go to Claude &#8594; Settings &#8594; Connectors, then enable the Notion connector.</p><p>Once connected, Claude reads your Notion workspace directly inside any chat.</p><p>All your tasks, CRMs, notes, and tables are now readable and editable by Claude.</p><p>I recommend creating a new &#8220;Memory Database&#8221; where you store all your AI preferences, rules, and important AI context. As you&#8217;re working with Claude, you can say: &#8220;Send this to my Notion Memory Database.&#8221;</p><p>You can then export this Notion data to other LLMs or AI platforms via a CSV file or by using the Notion MCP connector.</p><p>This setup is similar to Layer 2, except you gain Notion&#8217;s built-in board views, to-do lists, and additional functionality.</p><h3>Option 2: Claude x Obsidian x AI Engram (1-2 Hours)</h3><p>This is the setup I personally use. 
It combines three things:</p><ol><li><p><strong>Obsidian</strong> for local markdown storage (your files, your machine, your control)</p></li><li><p><strong>Karpathy&#8217;s LLM Knowledge Base</strong> schema for structuring how Claude organizes and compounds knowledge over time</p></li><li><p><strong>AI Engram</strong> for persistent search and memory across every conversation</p></li></ol><p>Here&#8217;s why this stack matters: Layer 2 gives Claude a folder of files to read. Layer 3 Option 2 gives Claude a <em>searchable, evolving knowledge system</em> that compounds with every conversation.</p><h4>Step 1: Download Obsidian</h4><p>Go to obsidian.md and download the app.</p><p>Create a new Vault (think of this as a desktop folder where Claude Code stores and accesses your data). Your data stays local. Zero cloud dependency.</p><h4>Step 2: Point Claude at Your Vault</h4><p>Open the Claude desktop app and click &#8216;Select Folder.&#8217; Point it at your Obsidian Vault folder. Claude now has direct read and write access to everything inside it.</p><h4>Step 3: Inject the Knowledge Base Schema</h4><p>Paste Andrej Karpathy&#8217;s LLM Knowledge Base system prompt into the chatbox. This is the instruction set that tells Claude Code how to build, maintain, and evolve your wiki over time.</p><p>The prompt is available here: gist.github.com/karpathy/442a6bf555914893e9891c11519de94f</p><p>I wrote about this system in detail in my earlier article, &#8220;Build an LLM Knowledge Base That Actually Compounds.&#8221; The key architecture:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;337cf6e5-aa27-4202-9c22-1d2678405c0c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">your-vault/
&#9500;&#9472;&#9472; raw/          # Immutable source documents (AI reads, never modifies)
&#9500;&#9472;&#9472; wiki/         # AI-maintained wiki with domain folders
&#9474;   &#9500;&#9472;&#9472; index.md  # Navigation hub
&#9474;   &#9492;&#9472;&#9472; log.md    # Append-only action log
&#9500;&#9472;&#9472; outputs/      # Generated reports and query answers
&#9492;&#9472;&#9472; AGENTS.md     # Schema defining how the AI organizes, ingests, and queries
</code></pre></div><p>The <code>AGENTS.md</code> schema is the single most important file. It defines identity, architecture, conventions, and workflows. Every wiki page gets YAML frontmatter. Wiki-links cross-reference topics. Source citations are required. Contradictions get flagged.</p><p>Three core workflows defined in the schema:</p><ol><li><p><strong>Ingest</strong>: Read a source, extract key information, create/update summary pages, update index, add backlinks, flag contradictions, log it. A single source touches 10-15 wiki pages.</p></li><li><p><strong>Query</strong>: Read index first, find relevant pages, synthesize answer with citations, offer to file insights back into wiki.</p></li><li><p><strong>Lint</strong> (monthly): Check contradictions, stale claims, orphan pages, missing cross-references, unattributed claims. Output a severity-leveled report.</p></li></ol><p>This system alone is powerful. But it has a gap: every new conversation starts with zero recall of past conversations. Claude reads your wiki files, sure, but it has zero memory of the <em>decisions, preferences, and insights</em> from previous chat sessions.</p><p>That gap is exactly what AI Engram fills.</p><h4>Step 4: Install AI Engram</h4><p>AI Engram is an MCP (Model Context Protocol) server that gives Claude persistent conversation memory and deep search over your markdown workspace. It runs entirely locally. Zero cloud services. Zero API calls.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;e0962240-8847-41a9-8c24-253c64597e9c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">pip install ai-engram
# or clone from github.com/MikeS071/ai-engram
</code></pre></div><p>Add it to your Claude Desktop MCP config:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;a8cf1c19-78f5-4056-949d-ad723982c336&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">{
  "mcpServers": {
    "ai-engram": {
      "command": "python",
      "args": ["aiengram_mcp.py"],
      "cwd": "/path/to/your/vault"
    }
  }
}
</code></pre></div><p>AI Engram gives Claude 13 new tools, split into two groups:</p><p><strong>Content Search (6 tools):</strong></p><table><thead><tr><th>Tool</th><th>What It Does</th></tr></thead><tbody><tr><td><code>search_blog</code></td><td>BM25 keyword search with relevance scoring and snippets</td></tr><tr><td><code>semantic_search_blog</code></td><td>Meaning-based search via sentence-transformer embeddings</td></tr><tr><td><code>build_index</code></td><td>Pre-build or refresh the semantic embedding cache</td></tr><tr><td><code>list_blog_files</code></td><td>List markdown files, filterable by collection</td></tr><tr><td><code>blog_stats</code></td><td>File counts and word totals across collections</td></tr><tr><td><code>read_blog_file</code></td><td>Read full markdown file content (with fuzzy path matching)</td></tr></tbody></table><p><strong>Conversation Memory (7 tools):</strong></p><table><thead><tr><th>Tool</th><th>What It Does</th></tr></thead><tbody><tr><td><code>remember</code></td><td>Store a memory with category and optional tags</td></tr><tr><td><code>recall</code></td><td>Semantic search across stored memories</td></tr><tr><td><code>recall_all</code></td><td>Cross-search memories AND blog content via RRF fusion</td></tr><tr><td><code>list_memories</code></td><td>Browse memories by category, newest first</td></tr><tr><td><code>forget</code></td><td>Delete a specific memory by ID</td></tr><tr><td><code>memory_stats</code></td><td>Memory counts by category and storage size</td></tr><tr><td><code>get_system_prompt</code></td><td>Load the context memory protocol instructions</td></tr></tbody></table><p>The search pipeline combines BM25 (keyword) and semantic (embedding) search via Reciprocal Rank Fusion. BM25 catches exact terms. Semantic catches meaning. Together, they find things that either approach alone would miss.</p><h4>Step 5: How Memory Actually Works</h4><p>AI Engram stores memories as JSONL entries (append-only, easy to inspect, easy to recover). Each memory has an ID, category, content, tags, timestamp, and source. 
Six categories:</p><table><thead><tr><th>Category</th><th>Use Case</th></tr></thead><tbody><tr><td><code>decision</code></td><td>Architectural choices, workflow rules, rejected approaches</td></tr><tr><td><code>preference</code></td><td>Tool choices, formatting styles, workflow preferences</td></tr><tr><td><code>insight</code></td><td>Key learnings, patterns discovered, breakthroughs</td></tr><tr><td><code>context</code></td><td>Background information, project state, environment details</td></tr><tr><td><code>task</code></td><td>Completed work, milestones, deliverables</td></tr><tr><td><code>note</code></td><td>General purpose, anything worth persisting</td></tr></tbody></table><p>The Context Memory Protocol works like this:</p><p>At conversation start, Claude calls <code>recall_all</code> with a relevant query, then <code>list_memories</code> with category &#8220;decision&#8221; to load workflow decisions from past sessions.</p><p>During conversation, Claude automatically stores decisions, preferences, completed tasks, important context, insights, and notes using the <code>remember</code> tool.</p><p>The result: every conversation builds on every conversation before it. Decisions persist. Preferences stick. Insights compound.</p><h4>The Final Product</h4><p>Your Obsidian Vault now contains:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;455e91bf-d142-43ac-8077-44c0a9f43bd2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">your-vault/
&#9500;&#9472;&#9472; raw/                      # Source documents (immutable)
&#9500;&#9472;&#9472; wiki/                     # Evolving knowledge base
&#9474;   &#9500;&#9472;&#9472; index.md              # Navigation hub
&#9474;   &#9500;&#9472;&#9472; log.md                # Append-only action log
&#9474;   &#9492;&#9472;&#9472; [domain folders]      # Topic-organized wiki pages
&#9500;&#9472;&#9472; outputs/                  # Generated reports
&#9500;&#9472;&#9472; AGENTS.md                 # Knowledge base schema
&#9500;&#9472;&#9472; .aiengram_memory.jsonl    # Persistent conversation memory
&#9492;&#9472;&#9472; .aiengram_cache.pkl       # Semantic embedding cache
</code></pre></div><p>Claude reads your wiki. Claude searches your files with hybrid BM25+semantic search. Claude remembers every decision across every session. Your knowledge base compounds. Your memory persists.</p><h2>Where This System Breaks</h2><p><strong>Context window ceiling.</strong> Around 100 articles or 400K words, selective reading via the index introduces blind spots. Claude reads the index first and may miss relevant pages further down.</p><p><strong>Error compounding.</strong> The AI writes a subtle mistake into your wiki. A later query uses that mistake. It files back insights reinforcing the error. This is the compounding downside of a compounding system.</p><p><strong>Hallucination persists.</strong> Your wiki looks authoritative with citations and structured formatting. But the AI can still synthesize false connections. The structure makes mistakes <em>look</em> more credible.</p><p><strong>Cost adds up.</strong> Frontier models run $1-2 per ingest operation. Ten sources a day adds up. Cheaper models work for simple updates, frontier models for complex ingestion.</p><p><strong>AI Engram requires maintenance.</strong> The JSONL memory file grows. Occasionally you need to review, prune, and <code>forget</code> outdated memories. A set-and-forget approach produces the same decay as Layer 1&#8217;s unmanaged memory page.</p><p><strong>Scaling caps out around 10K sources.</strong> This system serves individuals and small teams well. Enterprise-scale knowledge management requires a different architecture.</p><h2>Which Layer Should You Build?</h2><table><thead><tr><th>Layer</th><th>Time</th><th>Best For</th></tr></thead><tbody><tr><td>1: Basic Memory</td><td>5 minutes</td><td>Everyone. Start here.</td></tr><tr><td>2: Context Files</td><td>~60 minutes</td><td>Power users with repeatable workflows</td></tr><tr><td>3 Option 1: Notion</td><td>5 minutes</td><td>People already in Notion who want visual dashboards</td></tr><tr><td>3 Option 2: Obsidian + Engram</td><td>1-2 hours</td><td>People who want local control, deep search, and persistent memory across sessions</td></tr></tbody></table><p>My recommendation: start at Layer 1 today. 
Build Layer 2 this week. Graduate to Layer 3 Option 2 when you&#8217;re ready to stop repeating yourself across every conversation.</p><p>The difference between Claude with default memory and Claude with a second brain is the difference between a goldfish and an elephant. Same fishbowl. Completely different relationship with time.</p><div><hr></div><p><em>This article was built from real systems: the LLM Knowledge Base architecture (covered in detail at archonhq.ai) and AI Engram (github.com/MikeS071/ai-engram), an open-source MCP server for persistent AI memory. Both run locally. Both compound. Go build yours.</em></p>]]></content:encoded></item><item><title><![CDATA[Clone Hermes Agent's Architecture for Your Own AI Assistant]]></title><description><![CDATA[Reverse-engineer Hermes Agent's core design patterns to build a production-grade AI assistant framework]]></description><link>https://archonhq.ai/p/clone-hermes-agents-architecture</link><guid isPermaLink="false">https://archonhq.ai/p/clone-hermes-agents-architecture</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Mon, 18 May 2026 21:00:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!EFZv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!EFZv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EFZv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!EFZv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!EFZv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!EFZv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EFZv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2652918,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/197806293?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EFZv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!EFZv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!EFZv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!EFZv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69cd935d-331e-474e-8c47-dbc787be86ef_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Your AI assistant forgets the conversation context after three exchanges. The tool calling fails when you chain multiple operations. The memory system breaks when handling complex workflows that span multiple sessions.</p><p>You&#8217;re cobbling together OpenAI function calls with custom prompt engineering while fighting race conditions in multi-step processes. The assistant that worked for simple Q&amp;A completely falls apart when you need it to research, analyze, and execute a series of dependent tasks.</p><p>Meanwhile, Nous Research&#8217;s Hermes Agent handles complex workflows flawlessly. Multi-turn conversations maintain perfect context. Tool execution chains together seamlessly. 
The architecture scales from simple queries to sophisticated automation.</p><h2>The Idea (60 Seconds)</h2><p>You&#8217;ll reverse-engineer Hermes Agent&#8217;s core design patterns to build a production-grade AI assistant framework. The implementation uses a modular plugin system, persistent memory management, and standardized tool interfaces that handle complex workflows reliably. Setup takes 2 hours. The result gives you an assistant architecture that scales from basic chat to autonomous task execution.</p><h2>Why This Architecture, Beyond Simple Function Calling</h2><p><strong>Memory persistence solves context degradation.</strong> Standard chat implementations lose context as conversations grow. Hermes uses structured memory that maintains conversation state, user preferences, and task history across sessions. Your assistant remembers what you discussed yesterday and builds on previous work.</p><p><strong>Plugin modularity enables unlimited expansion.</strong> Function calling requires hardcoded tool definitions. The Hermes pattern uses a plugin interface where tools register themselves dynamically. Add new capabilities by dropping Python files into a plugins directory. Zero core code changes.</p><p><strong>Execution planning prevents tool chaos.</strong> Naive implementations call tools randomly based on user input. Hermes creates execution plans that sequence tool calls logically, handle dependencies, and recover from failures. 
The difference between &#8220;search for Python tutorials&#8221; and &#8220;search for Python tutorials, summarize the top 3, create a learning plan, and schedule practice sessions.&#8221;</p><h2>Walkthrough</h2><h3>1. Core Agent Framework</h3><p>Create the base agent class that handles conversation flow and tool coordination:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;28bd9612-c2d8-439c-9de3-43323b0ab9fd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># agent.py
import json
import asyncio
from typing import Dict, List, Any, Optional
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Message:
    role: str
    content: str
    timestamp: datetime
    metadata: Optional[Dict[str, Any]] = None

class HermesAgent:
    def __init__(self, model_client, memory_store, plugin_manager):
        self.model = model_client
        self.memory = memory_store
        self.plugins = plugin_manager
        self.conversation_id = None
        
    async def process_message(self, user_input: str) -&gt; str:
        # Load conversation context
        context = await self.memory.get_context(self.conversation_id)
        
        # Create execution plan
        plan = await self.create_execution_plan(user_input, context)
        
        # Execute plan steps
        results = []
        for step in plan.steps:
            result = await self.execute_step(step)
            results.append(result)
            
        # Generate response
        response = await self.synthesize_response(results, user_input)
        
        # Store conversation state
        await self.memory.store_exchange(
            self.conversation_id, user_input, response, results
        )
        
        return response
</code></pre></div><h3>2. Memory Management System</h3><p>Implement persistent memory that maintains context across sessions:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;0d2ef26f-2ff6-45b6-be64-1f7dab0588ee&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># memory.py
import sqlite3
import json
from typing import Dict, List, Optional

class MemoryStore:
    def __init__(self, db_path: str):
        self.db_path = db_path
        self.init_database()
        
    def init_database(self):
        conn = sqlite3.connect(self.db_path)
        conn.execute('''
            CREATE TABLE IF NOT EXISTS conversations (
                id TEXT PRIMARY KEY,
                created_at TIMESTAMP,
                last_active TIMESTAMP,
                context_summary TEXT
            )
        ''')
        conn.execute('''
            CREATE TABLE IF NOT EXISTS messages (
                id INTEGER PRIMARY KEY,
                conversation_id TEXT,
                role TEXT,
                content TEXT,
                timestamp TIMESTAMP,
                metadata TEXT,
                FOREIGN KEY (conversation_id) REFERENCES conversations (id)
            )
        ''')
        conn.commit()
        conn.close()
        
    async def get_context(self, conversation_id: str) -&gt; Dict:
        conn = sqlite3.connect(self.db_path)
        
        # Get recent messages
        messages = conn.execute('''
            SELECT role, content, timestamp, metadata 
            FROM messages 
            WHERE conversation_id = ? 
            ORDER BY timestamp DESC, id DESC
            LIMIT 20
        ''', (conversation_id,)).fetchall()
        
        # Get conversation summary
        summary = conn.execute('''
            SELECT context_summary 
            FROM conversations 
            WHERE id = ?
        ''', (conversation_id,)).fetchone()
        
        conn.close()
        
        return {
            'messages': [
                {
                    'role': msg[0], 
                    'content': msg[1], 
                    'timestamp': msg[2],
                    'metadata': json.loads(msg[3] or '{}')
                } 
                for msg in reversed(messages)
            ],
            'summary': summary[0] if summary else None
        }
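
# --- Illustrative sketch, not part of the original listing -----------------
# HermesAgent.process_message calls memory.store_exchange(), which the listing
# above does not define. One possible shape is shown here as a standalone
# function. The argument order mirrors that call; keeping tool results in the
# assistant row's metadata column is an assumption, not the author's design.
import json
import sqlite3
from datetime import datetime, timezone


def store_exchange(db_path, conversation_id, user_input, response, results):
    """Append one user/assistant exchange to the tables created above."""
    now = datetime.now(timezone.utc).isoformat()
    conn = sqlite3.connect(db_path)
    try:
        # Create the conversation row on first use, then bump last_active.
        conn.execute(
            "INSERT OR IGNORE INTO conversations (id, created_at, last_active) "
            "VALUES (?, ?, ?)",
            (conversation_id, now, now),
        )
        conn.execute(
            "UPDATE conversations SET last_active = ? WHERE id = ?",
            (now, conversation_id),
        )
        # Both rows share one timestamp, so readers that care about order
        # should break ties on the autoincrement id (user before assistant).
        conn.executemany(
            "INSERT INTO messages "
            "(conversation_id, role, content, timestamp, metadata) "
            "VALUES (?, ?, ?, ?, ?)",
            [
                (conversation_id, "user", user_input, now, "{}"),
                (conversation_id, "assistant", response, now,
                 json.dumps({"tool_results": results})),
            ],
        )
        conn.commit()
    finally:
        conn.close()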
</code></pre></div><h3>3. Plugin System Architecture</h3><p>Build the modular tool interface that enables dynamic capability expansion:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;3acd2c8b-d068-4a6e-8dbd-48256ced2689&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># plugins.py
import importlib.util
import os
from abc import ABC, abstractmethod
from typing import Dict, Any, List

class Plugin(ABC):
    @property
    @abstractmethod
    def name(self) -&gt; str:
        pass
        
    @property
    @abstractmethod
    def description(self) -&gt; str:
        pass
        
    @abstractmethod
    async def execute(self, parameters: Dict[str, Any]) -&gt; Any:
        pass
        
    @abstractmethod
    def get_schema(self) -&gt; Dict:
        pass

class PluginManager:
    def __init__(self, plugins_dir: str):
        self.plugins_dir = plugins_dir
        self.plugins: Dict[str, Plugin] = {}
        self.load_plugins()
        
    def load_plugins(self):
        for filename in os.listdir(self.plugins_dir):
            if filename.endswith('.py') and filename != '__init__.py':
                module_name = filename[:-3]
                spec = importlib.util.spec_from_file_location(
                    module_name, 
                    os.path.join(self.plugins_dir, filename)
                )
                module = importlib.util.module_from_spec(spec)
                spec.loader.exec_module(module)
                
                # Find Plugin subclasses
                for attr_name in dir(module):
                    attr = getattr(module, attr_name)
                    if (isinstance(attr, type) and 
                        issubclass(attr, Plugin) and 
                        attr != Plugin):
                        plugin_instance = attr()
                        self.plugins[plugin_instance.name] = plugin_instance
                        
    def get_available_tools(self) -&gt; List[Dict]:
        return [
            {
                'name': plugin.name,
                'description': plugin.description,
                'schema': plugin.get_schema()
            }
            for plugin in self.plugins.values()
        ]
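
# --- Illustrative sketch, not part of the original listing -----------------
# get_schema() is declared above, but nothing checks a call against it. A
# minimal guard could compare the schema's "required" list with the
# parameters the planner produced before execute() runs. The helper name
# missing_required is an assumption; full JSON Schema validation (types,
# nested objects) is out of scope here.
def missing_required(schema, parameters):
    """Return the required parameter names absent from the call."""
    required = schema.get("required", [])
    return [name for name in required if name not in parameters]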
</code></pre></div><h3>4. Example Plugin Implementation</h3><p>Create a web search plugin that follows the standard interface:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;896b50e2-e719-46a5-bd0b-b68fbb56340b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># plugins/web_search.py
import aiohttp
import json
from plugins import Plugin

class WebSearchPlugin(Plugin):
    @property
    def name(self) -&gt; str:
        return "web_search"
        
    @property
    def description(self) -&gt; str:
        return "Search the web for current information"
        
    async def execute(self, parameters):
        query = parameters.get('query')
        max_results = parameters.get('max_results', 5)
        
        # Use your preferred search API
        async with aiohttp.ClientSession() as session:
            url = "https://api.search.brave.com/res/v1/web/search"
            headers = {"X-Subscription-Token": "your_api_key"}
            params = {"q": query, "count": max_results}
            
            async with session.get(url, headers=headers, params=params) as response:
                data = await response.json()
                
        results = []
        for item in data.get('web', {}).get('results', []):
            results.append({
                'title': item.get('title'),
                'url': item.get('url'),
                'description': item.get('description')
            })
            
        return {'results': results, 'query': query}
        
    def get_schema(self):
        return {
            'type': 'object',
            'properties': {
                'query': {'type': 'string', 'description': 'Search query'},
                'max_results': {'type': 'integer', 'description': 'Maximum results to return'}
            },
            'required': ['query']
        }
</code></pre></div><h3>5. Execution Planning</h3><p>Implement the planning system that sequences tool calls intelligently:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;fdb32af2-a81a-42f2-9911-ed35da75e4d2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># planner.py
import json  # needed for json.dumps / json.loads in create_plan
from typing import Dict, List, Optional
from dataclasses import dataclass

@dataclass
class ExecutionStep:
    tool_name: str
    parameters: Dict
    depends_on: Optional[List[str]] = None
    step_id: Optional[str] = None

class ExecutionPlanner:
    def __init__(self, model_client, plugin_manager):
        self.model = model_client
        self.plugins = plugin_manager
        
    async def create_plan(self, user_input: str, context: Dict) -&gt; List[ExecutionStep]:
        available_tools = self.plugins.get_available_tools()
        
        planning_prompt = f"""
        User request: {user_input}
        Available tools: {json.dumps(available_tools, indent=2)}
        
        Create a step-by-step execution plan. Each step should use one tool.
        Consider dependencies between steps.
        
        Respond with a JSON array of steps:
        [
            {{
                "step_id": "step_1",
                "tool_name": "web_search",
                "parameters": {{"query": "Python tutorials"}},
                "depends_on": []
            }}
        ]
        """
        
        response = await self.model.complete(planning_prompt)
        # Expects a bare JSON array; json.loads raises json.JSONDecodeError on any other reply
        steps_data = json.loads(response)
        
        return [ExecutionStep(**step) for step in steps_data]
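
# --- Illustrative addition, not part of the original planner ---
# create_plan returns steps that carry depends_on, but nothing above puts them
# in a runnable order. The function below is a minimal sketch of one way to do
# that. It assumes every step has a unique step_id and that depends_on holds
# step_ids from the same plan. The name order_steps is hypothetical.
def order_steps(steps):
    """Return the steps ordered so each one comes after its dependencies."""
    by_id = {step.step_id: step for step in steps}
    ordered = []
    done = set()
    visiting = set()

    def visit(step):
        if step.step_id in done:
            return
        if step.step_id in visiting:
            raise ValueError(f"Dependency cycle at {step.step_id}")
        visiting.add(step.step_id)
        # depends_on may be None or empty when the model omits it
        for dep_id in step.depends_on or []:
            if dep_id not in by_id:
                raise ValueError(f"{step.step_id} depends on unknown step {dep_id}")
            visit(by_id[dep_id])
        visiting.discard(step.step_id)
        done.add(step.step_id)
        ordered.append(step)

    for step in steps:
        visit(step)
    return ordered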
</code></pre></div><h2>Caveats</h2><p><strong>Model quality determines planning effectiveness.</strong> The execution planner relies on the language model understanding tool capabilities and sequencing logic. Weaker models create inefficient plans or miss dependencies. GLM-5.1-level capability becomes essential for complex workflows.</p><p><strong>Memory storage grows indefinitely.</strong> The SQLite implementation accumulates conversation history permanently. Add cleanup routines for conversations older than 30 days or implement conversation archiving to prevent database bloat.</p><p><strong>Plugin isolation remains minimal.</strong> Plugins execute in the same Python process with full system access. Malicious or buggy plugins can crash the entire agent. Consider sandboxing for production deployments handling untrusted plugins.</p><h2>Philosophy</h2><p>Building your own agent architecture creates compound advantages over time. Each plugin you add multiplies what the system can do, because the planner can chain it with every tool already installed. The memory system learns your preferences and work patterns. The execution planner gets better at sequencing tasks for your specific use cases.</p><p>The Hermes architecture pattern scales from personal assistant to team automation platform. Start with web search and file operations. Add calendar integration, code analysis, and deployment tools. The modular design grows with your needs while maintaining reliability.</p><p>You own the entire stack. Zero vendor dependencies. Zero API rate limits. Zero feature deprecation risk.</p><h2>Build Yours</h2><p>Start with the core agent framework and memory system. Build one plugin. Test the execution planning with simple two-step workflows. The architecture becomes clear once you see it running.</p><p>What&#8217;s the first capability you&#8217;ll add to your agent? 
Drop your plugin ideas in the comments.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://archonhq.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://archonhq.ai/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Your ICP Is a Trap]]></title><description><![CDATA[Your Ideal Customer Profile is a trap when it answers the wrong question.]]></description><link>https://archonhq.ai/p/your-icp-is-a-trap</link><guid isPermaLink="false">https://archonhq.ai/p/your-icp-is-a-trap</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Sun, 17 May 2026 07:26:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ohP0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ohP0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ohP0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!ohP0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png 848w, 
https://substackcdn.com/image/fetch/$s_!ohP0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!ohP0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ohP0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2506170,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/198092930?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ohP0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png 424w, 
https://substackcdn.com/image/fetch/$s_!ohP0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!ohP0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!ohP0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F722064c1-bc51-456f-b321-e18e7cc27cd4_1920x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You spend six weeks building an AI agent that automates invoice processing for small businesses. You launch. Crickets. You posted in three Discord servers, sent 40 cold DMs, ran $200 in ads. Zero paying customers. The product works. The demos are smooth. Sales stay at zero.</p><p>The problem stared you in the face the whole time. Your ICP was &#8220;small business owners who need automation.&#8221; That describes 30 million people and excites exactly zero of them. You defined your ideal customer by demographics, by role, by company size. You listed who they are. You failed to ask whether they care, whether they spend, whether you can reach them, and whether you have any right to win.</p><h2>The Idea (60 Seconds)</h2><p>Your Ideal Customer Profile is a trap when it answers the wrong question. Most builders define ICP by demographics: age, income, job title, company size. Those attributes describe a person. They fail to predict behavior.</p><p>A strong ICP answers one question: Who is actively trying to solve this problem right now, has the ability to pay, and can be reached?</p><p>Urgency and situation beat demographics every time. A 42-year-old CFO at a logistics company drowning in manual reconciliation is your ICP. &#8220;CFOs at mid-market companies&#8221; is a demographic label that includes thousands of people perfectly happy with their spreadsheets.</p><p>The 4-Filter Test screens your ICP before you invest a single build hour. Pain. Market. Access. Fit. Each filter eliminates weak assumptions. Pass all four, and you have a target worth building for.</p><p>Two complementary question sequences sharpen the result. The Narrowing Funnel, derived from Alex Hormozi&#8217;s framework, starts broad and drills to urgency. 
The Lighthouse Client Method, created by Rmosh, grounds your ICP in a real human being instead of an abstract persona.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://archonhq.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">ArchonHQ is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Why This Matters</h2><p>Every AI builder hits the same wall. You learn prompt engineering. You master agent frameworks. You ship something that works. Then you realize you built it for everybody, which means you built it for an audience of zero.</p><p>Generic ICPs produce generic messaging. Generic messaging produces low conversion and high churn. You attract people who kind of sort of need your thing. They sign up, poke around, and leave. Your retention numbers look like a cliff.</p><p>The cost compounds fast. Six weeks of building for the wrong audience means six weeks of code you may need to rewrite, six weeks of positioning you need to undo, and six weeks of motivation burned on a product zero people wanted.</p><p>The antidote is simple and ruthless: filter before you build. The 4-Filter Test takes 30 minutes and saves months.</p><h2>Walkthrough</h2><h3>The 4-Filter Test</h3><p>Run your ICP through these four filters in order. Fail any single one, and you stop. Revisit your assumptions. Pick a different target. 
Do it all before writing a single line of code.</p><p><strong>Filter 1: Pain.</strong> Are real people experiencing this problem and actively seeking solutions?</p><p>This is the urgency filter. People complain about many things. People seek solutions for far fewer. Your ICP must have a problem painful enough that they are already looking for help, googling alternatives, posting in forums, asking colleagues.</p><p>Test: Search for the problem in Reddit, Twitter, industry Slack channels. If people are posting about it and asking for recommendations, pain is real. If you find only vague complaints, the pain is too low to drive purchase behavior.</p><p>Example: &#8220;Bookkeeping is tedious&#8221; is a complaint. &#8220;I spent 12 hours last weekend reconciling invoices and I am still behind&#8221; is a pain signal. The second person buys. The first person scrolls past your ad.</p><p><strong>Filter 2: Market.</strong> Is there a group spending money on solutions already?</p><p>Existing spend proves willingness to pay. If zero people are spending money to solve this problem, you are fighting human inertia and budget allocation at the same time. That is a losing battle.</p><p>Test: Search for existing products, agencies, consultants, or freelancers serving this problem. Check their pricing pages. Look for G2 or Capterra listings. Paid competitors validate the market. Zero competitors usually signals zero market, and first-mover advantage is a myth for solo builders.</p><p>Example: Automation tools for real estate agents exist everywhere, and agents pay for them. That signals a market. A tool for &#8220;people who want to journal more creatively&#8221; faces a market of free alternatives and low willingness to pay.</p><p><strong>Filter 3: Access.</strong> Can you reach these people through channels you can actually use?</p><p>A perfect ICP locked behind an unreachable channel is useless. 
If your target is Fortune 500 CTOs and your only channel is a Twitter account with 200 followers, you lack access. Access means you can put your message in front of your ICP repeatedly, at low cost, starting this week.</p><p>Test: List every channel where your ICP spends time. Then honestly assess whether you can show up there. Do you have followers there? Do you know someone who does? Can you write content they read? Can you cold-email them effectively?</p><p>Example: React developers are reachable through Twitter, Dev.to, Discord, GitHub, and conference communities. Mid-market hospital administrators are reachable through expensive trade shows and closed networks. Pick the ICP you can actually reach.</p><p><strong>Filter 4: Fit.</strong> Does your skill or experience give you an edge with this group?</p><p>You need earned advantage. Domain knowledge, professional network, technical expertise, or lived experience that lets you build something better or faster than a random competitor. Fit is your moat at the earliest stage.</p><p>Test: Ask yourself what you know about this ICP that most people lack. If the answer is &#8220;zero,&#8221; you are competing on execution alone against people who have both execution and insight.</p><p>Example: A former tax accountant building automation for tax firms has massive fit. A career developer building automation for dental practices has zero fit. Both can build the product. The former builds the right product faster.</p><h3>The Narrowing Funnel (Hormozi-Derived)</h3><p>Once your ICP passes all four filters, sharpen it with this question sequence. 
Each question narrows the field.</p><ol><li><p><strong>Who specifically?</strong> Start broad: &#8220;Business owners.&#8221; Narrow: &#8220;E-commerce business owners.&#8221; Narrower: &#8220;E-commerce business owners doing $1M to $10M in revenue.&#8221; Each level removes people who dilute your message.</p></li><li><p><strong>What is their situation?</strong> Describe the context that creates the problem. &#8220;E-commerce owners managing inventory across three warehouses with a team of five and lacking a dedicated operations person.&#8221;</p></li><li><p><strong>What is the painful version?</strong> Find the acute symptom. &#8220;SKU mismatches causing stockouts on best-selling items during peak season.&#8221; This is what keeps them up at night.</p></li><li><p><strong>What triggers them to seek help right now?</strong> Identify the event that converts latent frustration into active purchasing. &#8220;Black Friday inventory errors cost them $50K in lost sales last year, and Q4 is eight weeks away.&#8221; That is urgency.</p></li><li><p><strong>What is the outcome they would pay for?</strong> State the result in their language. &#8220;Eliminate SKU mismatches so every order ships correct and on time.&#8221; Sell the outcome, rather than the feature.</p></li></ol><h3>The Lighthouse Client Method</h3><p>The Narrowing Funnel gives you a precise segment. The Lighthouse Client Method grounds it in a real human being.</p><ol><li><p><strong>Identify one person you would love to help.</strong> A specific individual. A former colleague, a client you worked with, a person from a community you belong to. Someone you can picture clearly.</p></li><li><p><strong>Map their entire day.</strong> From morning to evening, what do they do? Where do they spend time? What tools do they open? What meetings drain them? What tasks feel like wasted effort?</p></li><li><p><strong>Find the friction point they complain about most.</strong> The thing they mention unprompted. The task that makes them groan. The process they describe as &#8220;the worst part of my week.&#8221;</p></li><li><p><strong>Build for that person, then generalize.</strong> Create the solution that eliminates their specific friction. Then ask: who else has this same friction in this same context? Those people are your ICP.</p></li></ol><p>This method works because it anchors your product in observed behavior instead of imagined needs. You solve a real problem for a real person. Other people with the same problem recognize themselves in your messaging because it describes their actual experience.</p><h3>The Two Big Beginner Mistakes</h3><p><strong>Mistake 1: &#8220;My ICP is everyone who might pay.&#8221;</strong> This feels safe. It is the opposite of safe. Broad targeting produces generic messaging. 
Generic messaging converts at a fraction of the rate of specific messaging. You attract marginal customers who churn fast because the product serves everyone poorly instead of serving someone exceptionally.</p><p>Fix: Define your ICP by best-fit criteria and disqualifiers. Write down who you serve and who you deliberately exclude. Disqualifiers sharpen your positioning as much as qualifiers. &#8220;We help e-commerce operators doing $1M to $10M. Enterprise teams and solopreneurs fall outside our focus.&#8221;</p><p><strong>Mistake 2: Choosing a niche based on passion or identity, assuming the market rewards authenticity.</strong> Passion is a starting point. It falls short as a standalone strategy. The market rewards value, and value requires craft. Building for a niche you love but where you lack skill produces mediocre products that compete against people with genuine expertise.</p><p>Fix: Replace passion-first with craft plus pull. Craft means your skill gives you an edge. Pull means the market signals demand. When craft and pull align, you have a sustainable position. When they misalign, you have a hobby.</p><h2>The Prompt Toolkit</h2><h3>ICP Extraction Prompt</h3><p>Copy the prompt below, replace the placeholder with your business idea, and paste it into any LLM.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;ef537be1-e10e-4124-a3ab-5be721537ada&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">&lt;role&gt;You are a ruthless ICP analyst. You eliminate weak assumptions and surface the truth about whether a business idea has a viable target customer.&lt;/role&gt;

&lt;task&gt;Run the 4-Filter Test on the business idea below. Score each filter from 1 to 10. For each filter, provide the score, the reasoning, and the specific evidence a builder should gather to validate or invalidate the score. Be brutally honest. Affirmative evidence only; discard wishful thinking.&lt;/task&gt;

&lt;filters&gt;
 &lt;filter name="Pain"&gt;
 &lt;question&gt;Are real people experiencing this problem and actively seeking solutions right now?&lt;/question&gt;
 &lt;scoring_guide&gt;10 = People post daily in public forums begging for a fix. 5 = People complain occasionally. 1 = You assume the pain exists based on logic alone.&lt;/scoring_guide&gt;
 &lt;/filter&gt;
 &lt;filter name="Market"&gt;
 &lt;question&gt;Is there a group already spending money to solve this problem?&lt;/question&gt;
 &lt;scoring_guide&gt;10 = Multiple paid products with pricing pages and reviews. 5 = One or two niche tools exist. 1 = Zero paid solutions exist.&lt;/scoring_guide&gt;
 &lt;/filter&gt;
 &lt;filter name="Access"&gt;
 &lt;question&gt;Can you reach these people through channels you can actually use this week?&lt;/question&gt;
 &lt;scoring_guide&gt;10 = You already have an audience or direct connection. 5 = You can reach them through public communities. 1 = They hide behind gatekeepers and enterprise sales cycles.&lt;/scoring_guide&gt;
 &lt;/filter&gt;
 &lt;filter name="Fit"&gt;
 &lt;question&gt;Does your skill or experience give you an edge with this group?&lt;/question&gt;
 &lt;scoring_guide&gt;10 = You have years of domain expertise and a network. 5 = You have adjacent skills. 1 = You have zero connection to this world.&lt;/scoring_guide&gt;
 &lt;/filter&gt;
&lt;/filters&gt;

&lt;output_format&gt;
Return your response in this exact structure:
&lt;result&gt;
 &lt;idea_summary&gt;One-sentence restatement of the idea&lt;/idea_summary&gt;
 &lt;filter name="Pain" score="X"&gt;Reasoning and evidence to gather&lt;/filter&gt;
 &lt;filter name="Market" score="X"&gt;Reasoning and evidence to gather&lt;/filter&gt;
 &lt;filter name="Access" score="X"&gt;Reasoning and evidence to gather&lt;/filter&gt;
 &lt;filter name="Fit" score="X"&gt;Reasoning and evidence to gather&lt;/filter&gt;
 &lt;total_score&gt;X/40&lt;/total_score&gt;
 &lt;verdict&gt;PASS if total &gt;= 28, CONDITIONAL if 20-27, FAIL if below 20&lt;/verdict&gt;
 &lt;next_action&gt;One specific thing the builder should do next&lt;/next_action&gt;
&lt;/result&gt;
&lt;/output_format&gt;

&lt;business_idea&gt;
[PASTE YOUR BUSINESS IDEA HERE]
&lt;/business_idea&gt;
</code></pre></div><h3>Lighthouse Client Prompt</h3><p>Copy the prompt below, answer the questions honestly, and paste it into any LLM.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;36fe3c19-f85b-48ba-ad30-54d2a6529ae5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">&lt;role&gt;You are a product strategist who specializes in grounding abstract customer profiles in real human behavior. Your method is the Lighthouse Client Method: find one real person, observe their actual day, and surface the friction that drives purchasing.&lt;/role&gt;

&lt;task&gt;Walk me through the Lighthouse Client Method step by step. Ask me one question at a time. Wait for my answer before proceeding to the next step. Complete all four steps.&lt;/task&gt;

&lt;steps&gt;
 &lt;step number="1" name="Identify"&gt;
 &lt;instruction&gt;Ask me to name one specific person I would love to help. This must be a real individual I can picture clearly: a former colleague, a past client, someone from a community I belong to. Ask for their first name (or alias), their role, and their industry.&lt;/instruction&gt;
 &lt;/step&gt;
 &lt;step number="2" name="Map the Day"&gt;
 &lt;instruction&gt;Ask me to describe this person's typical workday from morning to evening. Prompt me to include: what tools they open, what meetings they attend, what tasks consume their time, and what feels like wasted effort. Probe for specifics.&lt;/instruction&gt;
 &lt;/step&gt;
 &lt;step number="3" name="Find the Friction"&gt;
 &lt;instruction&gt;Ask me to identify the single task this person complains about most. The thing they mention unprompted. The process that makes them groan. Ask what they have tried to fix it and why those attempts fell short.&lt;/instruction&gt;
 &lt;/step&gt;
 &lt;step number="4" name="Generalize"&gt;
 &lt;instruction&gt;Based on everything I shared, produce a one-paragraph ICP statement in this format: "People like [name], who are [role] at [type of company], who struggle with [specific friction] because [root cause], and who would pay for [outcome]."&lt;/instruction&gt;
 &lt;/step&gt;
&lt;/steps&gt;

&lt;output_format&gt;
After I complete all four steps, output:
&lt;lighthouse_result&gt;
 &lt;client_profile&gt;Summary of the person I described&lt;/client_profile&gt;
 &lt;friction_point&gt;The specific pain you identified&lt;/friction_point&gt;
 &lt;icp_statement&gt;The one-paragraph ICP statement from Step 4&lt;/icp_statement&gt;
 &lt;validation_checklist&gt;Three specific actions I should take this week to confirm this friction exists for five more people&lt;/validation_checklist&gt;
&lt;/lighthouse_result&gt;
&lt;/output_format&gt;
</code></pre></div><h3>ICP Validation CLI</h3><p>Save the script below as <code>icp_check.py</code>, set your <code>OPENROUTER_API_KEY</code> environment variable, and run it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;9764e23a-3394-4b36-92fb-596452140727&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import argparse, os, json, urllib.request

def main():
    p = argparse.ArgumentParser(description="4-Filter ICP Assessment via OpenRouter")
    p.add_argument("idea", help="Your business idea description")
    p.add_argument("--model", default="google/gemini-2.0-flash-001")
    args = p.parse_args()
    key = os.environ.get("OPENROUTER_API_KEY", "")
    if not key:
        # assert is stripped under `python -O`, so exit explicitly instead
        raise SystemExit("Set OPENROUTER_API_KEY env var")
    prompt = f"""Score this business idea on the 4-Filter ICP Test. Each filter gets 1-10.
Filters: Pain (active problem seekers?), Market (existing spend?), Access (reachable channels?), Fit (your edge?).
Idea: {args.idea}
Return JSON only: {{"pain": int, "market": int, "access": int, "fit": int, "total": int, "verdict": "PASS|CONDITIONAL|FAIL"}}"""
    body = json.dumps({"model": args.model, "messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request("https://openrouter.ai/api/v1/chat/completions", data=body,
                                 headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"})
    # Close the HTTP connection as soon as the body has been read
    with urllib.request.urlopen(req) as http_resp:
        resp = json.loads(http_resp.read())
    # Assumes the model replied with bare JSON; a reply wrapped in Markdown fences raises json.JSONDecodeError
    r = json.loads(resp["choices"][0]["message"]["content"])
    print(f"Pain: {r['pain']}/10 | Market: {r['market']}/10 | Access: {r['access']}/10 | Fit: {r['fit']}/10")
    print(f"Total: {r['total']}/40 | Verdict: {r['verdict']}")

if __name__ == "__main__":
    main()
</code></pre></div><h2>Caveats</h2><p>The 4-Filter Test eliminates bad ICPs fast. It can also create false confidence if you lie to yourself on any filter. Confirmation bias is the enemy. Run each filter assuming your ICP fails, and look for evidence that it passes. The opposite approach, seeking evidence that confirms your hope, leads to the same wasted months the test is designed to prevent.</p><p>The Lighthouse Client Method risks overfitting to one person. Your lighthouse client may have idiosyncratic needs that diverge from the broader market. After building for them, validate that the problem generalizes. Talk to five more people in the same segment. If three of five describe the same friction, you have product-market signal. If only one of five does, you have a consulting client.</p><p>Markets shift. An ICP that passes all four filters today may fail in six months as conditions change. Revisit the test quarterly. Treat your ICP as a hypothesis, and treat revenue as the experiment result.</p><h2>Philosophy</h2><p>The best product strategy starts with ruthless selection, and selection means elimination. Every person you exclude from your ICP makes your messaging sharper for the people who remain. Every filter you apply removes a possible path to wasted effort.</p><p>Building AI tools is easier than ever. The moat has moved from technical execution to problem selection. The builders who win are the ones who chose the right problem before they wrote a single function. The 4-Filter Test is how you choose correctly.</p><p>Specificity is generosity. A vague ICP leaves every reader uncertain whether this product serves them. A precise ICP tells the right people, &#8220;this was built for you, and you can see it.&#8221; That clarity converts.</p><div><hr></div><p><em>This is the first entry in the Caliber Series, a paid column on building and selling AI tools. 
The next article breaks down how to validate your ICP in 48 hours using nothing but free tools and five conversations. Upgrade to access the full series.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://archonhq.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://archonhq.ai/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Build Your Own VS Code AI Agent independently of GitHub Copilot]]></title><description><![CDATA[Build a VS Code extension that routes completions through your local AI models]]></description><link>https://archonhq.ai/p/build-your-own-vs-code-ai-agent-without</link><guid isPermaLink="false">https://archonhq.ai/p/build-your-own-vs-code-ai-agent-without</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Sat, 16 May 2026 21:37:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ODj2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ODj2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ODj2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png 424w, 
https://substackcdn.com/image/fetch/$s_!ODj2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!ODj2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!ODj2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ODj2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2742069,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/197804869?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ODj2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!ODj2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!ODj2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!ODj2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28f427ee-f192-4b9e-a98f-7ea00e797b49_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Your GitHub Copilot subscription hits $10/month. The completions feel sluggish when your internet connection slows. The model suggestions lean generic, trained on everything rather than optimized for your specific codebase patterns.</p><p>You&#8217;re paying monthly for AI assistance while being locked into Microsoft&#8217;s inference servers, data policies, and model choices. Meanwhile, your local machine sits idle with 32GB of RAM and a capable GPU.</p><p>What if you could build a VS Code extension that routes completions through your local AI models, processes requests in 200ms, and learns your coding patterns autonomously?</p><h2>The Idea (60 Seconds)</h2><p>You&#8217;ll create a custom VS Code extension using the Language Server Protocol to intercept completion requests and route them through local model runtimes like Ollama or LM Studio. The system provides real-time code suggestions, context-aware completions, and chat functionality while running entirely offline. Setup takes 30 minutes. The result replaces Copilot with a faster, customizable, cost-free alternative.</p><h2>Why Local Models, Not Cloud APIs</h2><p><strong>Latency drops to milliseconds.</strong> Cloud completions travel to Microsoft&#8217;s servers and back. Local inference happens on your machine. The difference between 800ms and 150ms changes how you code.</p><p><strong>Context stays private.</strong> Your proprietary code remains on your hardware. Zero data leaves your network. Zero logs hit external servers. Your IP stays yours.</p><p><strong>Customization becomes possible.</strong> You control the model, the prompts, and the training data. 
Fine-tune on your codebase. Adjust temperature for your preferences. Switch models per project.</p><p><strong>Costs disappear.</strong> The subscription fee vanishes. Inference runs on hardware you already own. Scale usage based on your machine&#8217;s capacity, not monthly limits.</p><h2>Walkthrough</h2>
      <p>
          <a href="https://archonhq.ai/p/build-your-own-vs-code-ai-agent-without">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Clone Needle: Build a 26M Parameter Tool-Calling Model]]></title><description><![CDATA[Distill Gemini's function-calling into a tiny model that runs locally &#8212; replacing expensive cloud APIs with free inference]]></description><link>https://archonhq.ai/p/clone-needle-build-a-26m-parameter</link><guid isPermaLink="false">https://archonhq.ai/p/clone-needle-build-a-26m-parameter</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Fri, 15 May 2026 05:06:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nnzW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nnzW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nnzW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png 424w, https://substackcdn.com/image/fetch/$s_!nnzW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png 848w, https://substackcdn.com/image/fetch/$s_!nnzW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nnzW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nnzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png" width="1100" height="550" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:550,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:829036,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/197669750?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nnzW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png 424w, https://substackcdn.com/image/fetch/$s_!nnzW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png 848w, https://substackcdn.com/image/fetch/$s_!nnzW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nnzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c73f11e-c79c-4365-88c8-9e2a3a71cbb5_1100x550.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Your production app calls OpenAI&#8217;s API 847 times per day. Each function call costs $0.002. The monthly bill hits $380. Your CFO asks pointed questions about &#8220;AI infrastructure costs&#8221; in the quarterly review.</p><p>Meanwhile, your tool-calling needs are embarrassingly simple. Parse JSON. Validate schemas. Route function calls. 
Extract parameters. A 175B-parameter model feels like hiring a PhD to sort your mail.</p><p>What if you could distill those capabilities into a 26M-parameter model that runs locally, costs zero per inference, and handles 90% of your tool-calling workload?</p><h2>The Idea (60 Seconds)</h2><p>You&#8217;ll build a lightweight tool-calling model by distilling Gemini&#8217;s function-calling behavior into a compact transformer. The 26M-parameter model runs locally, processes tool calls in 50-100ms, and handles structured JSON output with schema validation. Training takes 4 hours on a single GPU. The result replaces expensive API calls for routine function routing and parameter extraction.</p><h2>Why Distillation, Not Fine-tuning</h2><p><strong>Fine-tuning a tiny model starts from almost nothing.</strong> A 26M-parameter student has no tool-calling ability to build on, so you&#8217;re teaching it to speak tool-calling from scratch. Distillation starts with a teacher model that already excels at function calls. You&#8217;re copying expertise instead of building it.</p><p><strong>Data efficiency matters more than parameter count.</strong> Fine-tuning needs 50K+ examples to learn tool-calling patterns. 
Distillation works with 5K teacher-student pairs because the student learns from the teacher&#8217;s internal representations, not just input-output mappings.</p><p><strong>Gemini&#8217;s tool-calling is already production-tested.</strong> Google spent millions optimizing function call accuracy. Distillation captures that optimization in a model you own completely.</p><p>The math is simple: 5K distillation examples vs 50K fine-tuning examples. 4 hours vs 40 hours. $20 in compute vs $200.</p><h2>Walkthrough</h2><h3>1. Generate Teacher-Student Data</h3><p>Start by collecting Gemini&#8217;s tool-calling behavior across diverse function schemas:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;9aeb10c4-5d4d-4e9f-b189-9498108184a4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># data_generation.py
import google.generativeai as genai
import json
import random
from typing import List, Dict

class ToolCallDataGenerator:
    def __init__(self, api_key: str):
        genai.configure(api_key=api_key)
        self.model = genai.GenerativeModel('gemini-1.5-flash')
        
    def generate_function_call_data(self, schemas: List[Dict], num_examples: int = 5000):
        examples = []
        
        for i in range(num_examples):
            # Sample random function schema
            schema = random.choice(schemas)
            
            # Generate natural language request
            prompt = self.create_natural_prompt(schema)
            
            # Get Gemini's function call response
            response = self.model.generate_content(
                prompt,
                tools=[schema],
                tool_config={'function_calling_config': {'mode': 'ANY'}}
            )
            
            if response.candidates[0].content.parts[0].function_call:
                examples.append({
                    'input': prompt,
                    'function_schema': schema,
                    'teacher_output': response.candidates[0].content.parts[0].function_call,
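                    # response.text can raise when the part is a function call, so keep the raw content instead
                    'raw_response': str(response.candidates[0].content)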
                })
                
        return examples
    
    def create_natural_prompt(self, schema: Dict) -&gt; str:
        # Generate varied natural language that would trigger this function
        function_name = schema['function']['name']
        
        templates = {
            'weather': [
                "What's the weather like in {city}?",
                "Check the forecast for {city}",
                "Is it raining in {city} today?"
            ],
            'calculator': [
                "Calculate {expression}",
                "What's {expression}?",
                "Solve {expression} for me"
            ]
        }
        
        # Fill templates with realistic data
        return self.fill_template(templates.get(function_name, ["Use the {function_name} function"]))
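

# --- Illustrative addition (not from the original article) ---
# A minimal sketch of filtering the collected examples before training, since
# malformed teacher calls would otherwise be copied by the student.
# Assumptions: each teacher call has already been converted to a plain dict of
# the form {'name': str, 'args': dict}, and each schema exposes its function
# name under schema['function']['name'] as in create_natural_prompt above.
# The helper name clean_examples is hypothetical.
def clean_examples(examples):
    seen = set()
    kept = []
    for example in examples:
        call = example.get('teacher_output')
        schema_name = example['function_schema']['function']['name']
        # Drop calls that are not dicts or that name a different function
        if not isinstance(call, dict) or call.get('name') != schema_name:
            continue
        # Drop calls whose arguments are missing or not a JSON object
        if not isinstance(call.get('args'), dict):
            continue
        # Drop exact duplicates of (prompt, call) so repeats do not dominate
        key = (example['input'], json.dumps(call, sort_keys=True))
        if key in seen:
            continue
        seen.add(key)
        kept.append(example)
    return kept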
</code></pre></div><h3>2. Build the Student Model Architecture</h3><p>Create a compact transformer optimized for tool-calling output:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;4daebf73-f041-4288-b11c-cc113eede473&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># model.py
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM

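class TransformerBlock(nn.Module):
    # Assumed implementation (not shown in the original): a standard pre-norm
    # decoder block with causal self-attention and a 4x feed-forward layer.
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x, attention_mask=None):
        seq_len = x.shape[1]
        # True marks positions a token may NOT attend to (future tokens)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        # attention_mask is assumed to use 1 for real tokens and 0 for padding
        padding = attention_mask == 0 if attention_mask is not None else None
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, key_padding_mask=padding, need_weights=False)
        x = x + attn_out
        return x + self.ffn(self.ln2(x))


class ToolCallingModel(nn.Module):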
    def __init__(self, vocab_size: int = 32000, d_model: int = 512, n_heads: int = 8, n_layers: int = 6):
        super().__init__()
        
        # 26M parameters: 6 layers, 512 hidden, 8 heads
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_encoding = nn.Parameter(torch.randn(2048, d_model))
        
        self.transformer_blocks = nn.ModuleList([
            TransformerBlock(d_model, n_heads) for _ in range(n_layers)
        ])
        
        self.ln_final = nn.LayerNorm(d_model)
        self.output_head = nn.Linear(d_model, vocab_size)
        
        # Special tokens for function calling
        self.function_start_token = vocab_size - 4
        self.function_end_token = vocab_size - 3
        self.param_sep_token = vocab_size - 2
        
    def forward(self, input_ids, attention_mask=None):
        seq_len = input_ids.shape[1]
        
        # Embeddings + positional encoding
        x = self.embedding(input_ids) + self.pos_encoding[:seq_len]
        
        # Transformer layers
        for block in self.transformer_blocks:
            x = block(x, attention_mask)
            
        x = self.ln_final(x)
        return self.output_head(x)
</code></pre></div><h3>3. Implement Knowledge Distillation Training</h3><p>Train the student to mimic both Gemini&#8217;s outputs and internal representations:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;5a4d8f79-0bd1-4918-a33f-dfb9930a50a9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># distillation_trainer.py
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistillationTrainer:
    def __init__(self, student_model, teacher_model, tokenizer):
        self.student = student_model
        self.teacher = teacher_model
        self.tokenizer = tokenizer
        
        # Distillation hyperparameters
        self.temperature = 4.0
        self.alpha = 0.7  # Weight for distillation loss
        self.beta = 0.3   # Weight for hard target loss
        
    def distillation_loss(self, student_logits, teacher_logits, hard_targets):
        # Soft target loss (knowledge distillation)
        soft_loss = nn.KLDivLoss(reduction='batchmean')(
            F.log_softmax(student_logits / self.temperature, dim=-1),
            F.softmax(teacher_logits / self.temperature, dim=-1)
        ) * (self.temperature ** 2)
        
        # Hard target loss (actual function calls)
        hard_loss = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            hard_targets.view(-1),
            ignore_index=-100
        )
        
        return self.alpha * soft_loss + self.beta * hard_loss
    
    def train_step(self, batch):
        input_ids = batch['input_ids']
        function_call_targets = batch['function_call_targets']
        
        # Get teacher predictions (no gradients). Assumes a local teacher model
        # that returns logits over the same vocabulary as the student.
        with torch.no_grad():
            teacher_logits = self.teacher(input_ids).logits
            
        # Get student predictions
        student_logits = self.student(input_ids)
        
        # Calculate distillation loss
        loss = self.distillation_loss(
            student_logits, 
            teacher_logits, 
            function_call_targets
        )
        
        return loss
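

# --- Illustrative addition (not from the original article) ---
# A small standard-library sketch of what the temperature hyperparameter does
# to a probability distribution before the KL term above is computed. The
# function name and the example logits are made up for illustration; the
# trainer itself relies on F.softmax and F.log_softmax.
import math


def softmax_with_temperature(logits, temperature):
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# At temperature 1.0 the largest logit dominates; at 4.0 the distribution is
# flatter, so the student also sees how the teacher ranks the runner-up tokens.
sharp = softmax_with_temperature([4.0, 1.0, 0.0], 1.0)
soft = softmax_with_temperature([4.0, 1.0, 0.0], 4.0)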
</code></pre></div><h3>4. Create the Inference Server</h3><p>Build a FastAPI server that handles tool calls with JSON schema validation:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;8f4b232f-cf85-4a07-897d-60372f176829&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python"># inference_server.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import torch
import json
from typing import Dict, List

app = FastAPI()

class ToolCallRequest(BaseModel):
    prompt: str
    available_functions: List[Dict]
    max_tokens: int = 150

class ToolCallResponse(BaseModel):
    function_name: str
    parameters: Dict
    confidence: float

@app.post("/tool-call", response_model=ToolCallResponse)
async def generate_tool_call(request: ToolCallRequest):
    try:
        # Tokenize input with function schemas
        input_text = format_prompt_with_schemas(request.prompt, request.available_functions)
        tokens = tokenizer.encode(input_text, return_tensors='pt')
        
        # Generate function call
        with torch.no_grad():
            output = model.generate(
                tokens,
                max_length=tokens.shape[1] + request.max_tokens,
                temperature=0.1,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )
        
        # Parse function call from output
        generated_text = tokenizer.decode(output[0][tokens.shape[1]:], skip_special_tokens=True)
        function_call = parse_function_call(generated_text)
        
        # Validate against schema
        validate_function_call(function_call, request.available_functions)
        
        return ToolCallResponse(
            function_name=function_call['name'],
            parameters=function_call['parameters'],
            confidence=calculate_confidence(output)
        )
        
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
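

# --- Illustrative addition (not from the original article) ---
# The endpoint above calls parse_function_call and validate_function_call
# without defining them. A minimal sketch follows, assuming the model emits a
# JSON object such as {"name": "...", "parameters": {...}} and that each entry
# in available_functions uses the JSON Schema shape from the curl example below.
# Only required keys, unknown keys, and enum values are checked here; a full
# JSON Schema validator would cover more.
def parse_function_call(generated_text):
    # Take the outermost {...} span so surrounding tokens are ignored
    start = generated_text.find('{')
    end = generated_text.rfind('}')
    if start == -1 or end == -1:
        raise ValueError('no JSON object found in model output')
    call = json.loads(generated_text[start:end + 1])
    if not isinstance(call.get('name'), str) or not isinstance(call.get('parameters'), dict):
        raise ValueError('function call must have a name and a parameters object')
    return call


def validate_function_call(function_call, available_functions):
    by_name = {fn['name']: fn for fn in available_functions}
    if function_call['name'] not in by_name:
        raise ValueError('unknown function: ' + function_call['name'])
    schema = by_name[function_call['name']].get('parameters', {})
    properties = schema.get('properties', {})
    params = function_call['parameters']
    # Every required key must be present
    for key in schema.get('required', []):
        if key not in params:
            raise ValueError('missing required parameter: ' + key)
    # Every supplied key must be declared, and enum values must match
    for key, value in params.items():
        if key not in properties:
            raise ValueError('unexpected parameter: ' + key)
        allowed = properties[key].get('enum')
        if allowed is not None and value not in allowed:
            raise ValueError('invalid value for ' + key)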
</code></pre></div><h3>5. CLI Interface</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;7ceced1d-8be6-42a2-91a6-37398a768c20&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext"># Install and run
pip install torch transformers fastapi uvicorn

# Start the server
uvicorn inference_server:app --port 8000

# Test a function call
curl -X POST "http://localhost:8000/tool-call" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is the weather in San Francisco?",
    "available_functions": [{
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string"},
          "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        }
      }
    }]
  }'
</code></pre></div><h2>Caveats</h2><p><strong>Complex reasoning fails.</strong> The 26M model handles straightforward parameter extraction and function routing. Multi-step reasoning, ambiguous queries, and edge cases still need the teacher model or GPT-4.</p><p><strong>Schema validation is strict.</strong> The model learns patterns from training data. Novel function schemas or unusual parameter types can break inference. Keep a fallback to cloud APIs for schema mismatches.</p><p><strong>Training data quality determines ceiling performance.</strong> Bad teacher examples create bad student behavior. Gemini occasionally generates malformed function calls. Clean your distillation dataset aggressively.</p><p>Performance benchmarks from my testing: 87% accuracy on single-function calls, 72% on multi-function scenarios, 15ms average inference time on RTX 4090.</p><h2>Philosophy</h2><p>Tool-calling models represent the future of local AI inference. Most production applications need structured output, parameter extraction, and API routing. These tasks require precision over creativity.</p><p>The distillation approach captures expert behavior in compact models you control completely. Zero API dependencies. Zero per-inference costs. Zero data leaving your infrastructure.</p><p>This pattern extends beyond tool-calling. Distill code generation, text classification, structured data extraction. Build a library of specialized models that replace expensive API calls with fast local inference.</p><p>The 26M parameter model becomes your function-calling foundation. Expand it. Specialize it. Deploy it everywhere.</p><h2>Build Your Clone</h2><p>Start with the data generation script above. Collect 5K Gemini examples across your target function schemas. Train the distillation model on a single GPU for 4 hours. Deploy the inference server.</p><p>Your tool-calling costs drop to zero. Your inference speed increases 10x. 
Your data stays local.</p><p>What function-calling use case will you tackle first?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://archonhq.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://archonhq.ai/subscribe?"><span>Subscribe now</span></a></p><p></p><p></p><p></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Software 3.0 Is Not a Silver Bullet: Why Engineering Expertise Still Wins with LLMs]]></title><description><![CDATA[LLMs are the new operating system, prompting is the new programming, and anyone can build software now]]></description><link>https://archonhq.ai/p/software-30-is-not-a-silver-bullet</link><guid isPermaLink="false">https://archonhq.ai/p/software-30-is-not-a-silver-bullet</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Mon, 04 May 2026 03:18:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ewku!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ewku!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ewku!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!Ewku!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ewku!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ewku!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ewku!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg" width="1100" height="567" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:567,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:292542,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/196378021?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Ewku!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ewku!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ewku!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ewku!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69647a6-c585-4653-a520-f006ad8feb07_1100x567.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Andrej Karpathy calls it Software 3.0. YC calls it the biggest shift since high-level languages. The framing is seductive: LLMs are the new operating system, prompting is the new programming, and anyone can build software now.</p><p>Except they can&#8217;t.</p><p>I&#8217;ve watched the same pattern repeat for a while. A non-engineer pastes a vague request into ChatGPT, gets something that looks plausible, ships it, and watches it fall apart under real conditions. Meanwhile, I take the same starting point and end up with a robust CLI that runs autonomously on a cron job, handles edge cases, and compounds value every day.<br>Same models. Same access. Radically different outcomes. The difference isn&#8217;t the prompt. 
It&#8217;s the engineering.</p><p>You can watch the whole lecture here:</p><div id="youtube2-LCEmiRjPEtQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;LCEmiRjPEtQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/LCEmiRjPEtQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>The Idea (60 Seconds)</h2><p>Software 3.0 is real. LLMs are a new kind of runtime. But the &#8220;anyone can code now&#8221; narrative is incomplete. What Karpathy and the YC lecture get right is the mental model shift: LLMs as operating systems, not smarter search engines. What they understate is how much engineering discipline that OS still demands.</p><p>The LLM is a lossy runtime. It hallucinates. It forgets context. It produces plausible garbage. Treating it like a magic oracle gets you slop. Treating it like a savant with cognitive issues, brilliant but unreliable, and engineering around those limitations is what separates working solutions from demo toys.</p><p>The engineers who master this don&#8217;t just write better prompts. They build systems around the LLM: scaffolding, verification loops, fallback chains, state management, and evaluation frameworks. The prompt is the interface. The engineering is the product.</p><h2>Why Non-Engineers Produce Slop</h2><p>When a non-engineer asks an LLM to &#8220;build me a tool that extracts data from a website and sends a daily email,&#8221; they get a script. It works. Once. On the happy path. 
With the data formatted exactly the way they showed in their example.</p><p>When I build the same thing, here&#8217;s what happens in my head before I type a single prompt:</p><ul><li><p>How does this fail when the website changes its layout?</p></li><li><p>What happens when the API returns empty data instead of the expected format?</p></li><li><p>Where does state live so we can detect anomalies?</p></li><li><p>How do we make this idempotent so re-running doesn&#8217;t send duplicate emails?</p></li><li><p>What&#8217;s the time budget for each operation, and what&#8217;s the hard ceiling?</p></li></ul><p>These aren&#8217;t prompt engineering tricks. These are engineering instincts: problem decomposition, error handling, state management, verification. The LLM doesn&#8217;t eliminate the need for these. It accelerates the implementation but makes the consequences of skipping them more dangerous, because now your broken code runs automatically.</p><p>The YC lecture nails this indirectly: &#8220;you have to design around their limitations rather than expecting perfect, human-like reliability.&#8221; That&#8217;s engineering. That&#8217;s always been engineering. The medium changed. The discipline didn&#8217;t.</p><p>This is why I believe there isn&#8217;t going to be a drop in demand for engineering jobs. In fact, there is sufficient evidence from past events (such as the industrial revolution, the internet, digitisation of everything) that greater capability almost always demands greater supply of skills. People just need to adapt to new ways of working.</p><h2>The LLM as Lossy Runtime</h2><p>Karpathy&#8217;s operating system metaphor is useful. An OS has defined syscalls, documented error codes, and deterministic behavior for the same inputs. An LLM has none of these.</p><p>A better mental model: the LLM is a brilliant but unreliable non-deterministic co-processor. 
It can finish in seconds tasks that would take you hours, but it sometimes returns wrong results with absolute confidence. Your job as the engineer isn&#8217;t to write the perfect prompt that prevents errors; it&#8217;s to build the harness that catches them.</p><p>Here&#8217;s what that harness looks like in practice.</p><h2>Technique 1: Structured Scaffolding, Not Free-Form Prompts</h2><p>The single biggest upgrade from slop to production: never accept free-form LLM output for anything you plan to use programmatically.</p><p>When you ask &#8220;write me a CLI,&#8221; you get markdown, prose explanations, and code in varying states of completeness. When you ask for a specific JSON schema, you get a parseable artifact you can pipe into your build system.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;eb7fd8f7-4d46-4e8b-a15c-626228a9e805&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">You are an expert Python engineer. Always respond in valid JSON with this exact schema:

{
  "thinking": "step-by-step reasoning and trade-offs",
  "code": "complete, ready-to-run code with inline comments",
  "explanation": "usage, edge cases, and assumptions",
  "tests": ["list of test cases or pytest snippets"]
}

Only output the JSON. No markdown. No prose outside the schema.
</code></pre></div><p>This is the LLM equivalent of typed function signatures. You wouldn&#8217;t write a production API that returns unstructured text. Don&#8217;t let your LLM do it either.</p><p>Pair this with your provider&#8217;s schema-enforcement feature, such as OpenAI&#8217;s Structured Outputs or tool use with a JSON schema in Claude, and the output shape becomes predictable. The content is still probabilistic, but the interface boundary now behaves like a typed data pipeline.</p><h2>Technique 2: Self-Critique Loops</h2><p>The generate-once-and-ship pattern is how you get slop. The generate-critique-revise pattern is how you get quality.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;065f1c4c-48c9-4312-abb9-cab40caa007c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">First, plan the solution step-by-step. Consider architecture,
error handling, and UX.

[After getting initial output]

Now critique the above as a senior staff engineer. Score 1-10 on:
correctness, robustness, usability, maintainability.
List specific fixes. Then output a revised version in the same
JSON schema.
</code></pre></div><p>Two or three cycles of this consistently produce dramatically better output. The LLM is excellent at critiquing its own work when given clear evaluation criteria; it just won&#8217;t do it unprompted.</p><p>This mirrors how senior engineers actually write code: draft, review, revise. The difference is the cycle time. What takes a human hours takes the LLM seconds. The bottleneck shifts from writing to evaluating.</p><h2>Technique 3: Context Packaging as Modular Architecture</h2><p>Most people paste their entire codebase into a prompt and hope for the best. Engineers treat context like code modules: structured, versioned, and reusable.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;a47ccf09-f9da-43bd-8e8e-ee9e3970fa46&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">&lt;context&gt;
[paste relevant code, docs, or previous outputs here]
&lt;/context&gt;

&lt;requirements&gt;
- Use Typer + Rich
- Zero unnecessary dependencies
- Full --help support
- Graceful error handling
&lt;/requirements&gt;

&lt;task&gt;
[Specific request]
&lt;/task&gt;
</code></pre></div><p>XML tags aren&#8217;t magic. But they give the LLM clear boundaries between different types of information. In my experience, the model processes tagged sections more reliably than unstructured walls of text.</p><p>This scales. Maintain a library of context blocks: your project&#8217;s architecture, your coding conventions, your error handling patterns. Swap them in and out as needed. This is the LLM equivalent of import statements.</p><h2>Technique 4: Model Routing, Not Model Loyalty</h2><p>No single model is best at everything. The engineers getting the most from LLMs aren&#8217;t loyal to one model &#8212; they route tasks based on what each model does best.</p><p>In my daily pipeline:</p><ul><li><p><strong>Fast model</strong> (Gemini Flash, Haiku) for data extraction, formatting, and routine summarization</p></li><li><p><strong>Strong model</strong> (Claude Opus) for judgment calls &#8212; clinical pattern recognition, nuanced coaching recommendations, anything where missing a subtle signal costs more than the extra latency</p></li></ul><p>The routing logic itself is deterministic code. The models are interchangeable components. When a new model drops that&#8217;s faster or cheaper, I swap it in for the appropriate tier and verify against my test suite.</p><p>This is operating system thinking. You don&#8217;t write code that only runs on one processor. You write against abstractions and let the runtime choose the best execution path.</p><h2>Technique 5: Verification as Architecture</h2><p>The YC lecture mentions speeding up the &#8220;generation &#8594; verification loop.&#8221; That undersells it. Verification isn&#8217;t a loop; it&#8217;s the architecture.</p><p>Every LLM output in my production system goes through verification before it&#8217;s used:</p><ul><li><p><strong>Schema validation</strong> &#8212; does the JSON match the expected structure?</p></li><li><p><strong>Content validation</strong> &#8212; are the key fields populated? 
Are values in plausible ranges?</p></li><li><p><strong>State validation</strong> &#8212; does this output contradict our stored history? (A client with 33 recorded check-ins yesterday can&#8217;t have just one today.)</p></li><li><p><strong>Quality validation</strong> &#8212; is the report complete? Does it have all 5 required sections?</p></li></ul><p>If any validation fails, the system doesn&#8217;t silently accept bad output. It retries with a different approach, falls back to stored data, or escalates to a human. The LLM is just one component in a larger system with defined invariants.</p><p>This is the engineering discipline that separates &#8220;I built a cool demo&#8221; from &#8220;this runs my business while I sleep.&#8221;</p><h2>The Skills Layer: Where It All Compounds</h2><p>Here&#8217;s where Software 3.0 genuinely shifts the game. Individual prompts are disposable. Skills are compound interest.</p><p>A skill is a reusable, versioned, self-contained workflow that packages:</p><ul><li><p>The system prompt (personality, constraints, output format)</p></li><li><p>The context (domain knowledge, project conventions)</p></li><li><p>The tools (what the agent can call)</p></li><li><p>The verification (what &#8220;done&#8221; looks like)</p></li><li><p>The fallbacks (what happens when things break)</p></li></ul><p>I have a skill that analyzes fitness check-in data and generates coaching reports. It took 50+ iterations to get right. Now it runs twice daily on a cron job, processing 18 clients autonomously. Each run costs about $0.30 in LLM tokens. The equivalent human effort would be 4+ hours of a coach&#8217;s time.</p><p>The skill is the artifact. The prompts that built it are long gone. 
This is the shift from &#8220;prompting&#8221; to &#8220;engineering&#8221;: you&#8217;re not optimizing a single interaction; you&#8217;re building a system that improves with every edge case you handle and every failure mode you fix.</p><p>Non-engineers can create impressive one-off outputs. Engineers create systems that compound.</p><h2>The Honest Truth About Software 3.0</h2><p>Karpathy is right that we&#8217;re in the 1960s of LLMs. The 1960s weren&#8217;t democratic. Assembly language existed, but the people who built reliable systems were the ones who understood memory management, error handling, and hardware constraints.</p><p>We&#8217;re in the same place now, just with natural language instead of assembly. The barrier to entry is lower: you can produce something that runs on your first try. But the barrier to <em>reliability</em> hasn&#8217;t moved. Building something that handles edge cases, degrades gracefully, and runs unattended for months still requires engineering discipline.</p><p>The tools will improve. Context windows will grow. Models will get more reliable. The verification loops will tighten. But the core skill (treating the LLM as a brilliant, unreliable component in a larger system you engineer) will remain the differentiator.</p><p>Software 3.0 doesn&#8217;t democratize great software. It supercharges the people who already know how to build it. The gap between engineers and non-engineers isn&#8217;t closing; it&#8217;s widening, because engineers are building compound systems while everyone else is still optimizing single prompts.</p><p>The engineers who master this now, who build skills, design verification architectures, and treat the LLM as a lossy runtime to engineer around, will define the next decade of software.</p><p>Everyone else will keep wondering why their &#8220;automated&#8221; workflow broke again this morning.</p><p><em>Building production AI automation? 
<a href="https://archonhq.ai">ArchonHQ</a> gives you the skills architecture, verification frameworks, and orchestration tools to turn LLMs from toys into reliable systems. Stop prompting. Start engineering.</em></p>]]></content:encoded></item><item><title><![CDATA[Automate Anything with AI Skills and CLIs - Your New Superpower in 2026]]></title><description><![CDATA[Automate almost any repetitive workflow with AI by stacking five layers with AI]]></description><link>https://archonhq.ai/p/automate-anything-with-ai-skills</link><guid isPermaLink="false">https://archonhq.ai/p/automate-anything-with-ai-skills</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Fri, 17 Apr 2026 07:04:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yeh0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Yeh0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yeh0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Yeh0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Yeh0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Yeh0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yeh0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg" width="1100" height="580" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:580,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:280015,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/194488145?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yeh0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Yeh0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Yeh0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Yeh0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F735a305a-4b01-4bb2-bbfc-6b6fe6ebf331_1100x580.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>You&#8217;ve seen it. A fitness coach spending three hours every Friday night reviewing client checkins, scrolling through twenty clients, each with weight data, waist measurements, sleep scores, nutrition compliance, workout logs, injury reports, and subjective mood ratings. Copy the numbers. Compare to last week. Write the feedback. Send the email. Next client. 
Repeat until midnight.</p><p>That&#8217;s data entry with a human in the loop.</p><p>This pattern repeats across every small business: a SaaS platform that holds your data hostage, a manual review process that scales linearly with clients, and an expert whose time gets eaten by the 80% of the work that&#8217;s pattern-matching rather than judgment.</p><p>Here&#8217;s what I built to fix it, and the pattern you can copy for almost any repetitive knowledge workflow.</p><h2>The Idea (60 Seconds)</h2><p>You can automate almost any repetitive workflow by stacking five layers:</p><ol><li><p><strong>Reverse-engineer</strong> the data source (even absent a formal API)</p></li><li><p><strong>Build a CLI</strong> that extracts and structures the data</p></li><li><p><strong>Use an LLM</strong> to do the analysis a human used to do manually</p></li><li><p><strong>Package it as a skill</strong> so your AI agent can repeat the process reliably</p></li><li><p><strong>Schedule it with cron</strong> so it runs on autopilot</p></li></ol><p>Most people stop before the first layer: they ask ChatGPT a question and get an answer. That&#8217;s a conversation. Automation is when the system runs autonomously.</p><p>I&#8217;m going to show you exactly how I built all five layers for a fitness coaching business. The platform lacked a public API. The data was locked behind a login screen. The analysis required professional domain knowledge. And the whole thing needed to run daily and email reports to a real coach with real clients.</p><p>The result: <strong>75% less manual work</strong>. A coach who used to spend 12+ hours per week reviewing checkins now spends 2 hours scanning AI-generated reports and producing videos based on the analysis for their clients. Instant value-add, zero duplication.</p><h2>Why Skills and CLIs Are the Underrated Superpower of 2026</h2><p>Everyone&#8217;s talking about AI chatbots and agents. 
The real unlock goes unmentioned.</p><p>The AI maturity ladder looks like this:</p><table><thead><tr><th>Level</th><th>What You Do</th><th>Runs Without You?</th></tr></thead><tbody><tr><td>1. Chat</td><td>Ask a question, get an answer</td><td>No</td></tr><tr><td>2. Prompt Library</td><td>Reuse tested prompts</td><td>No</td></tr><tr><td>3. CLI</td><td>Script that takes arguments and runs</td><td>Yes (manually triggered)</td></tr><tr><td>4. Skill</td><td>Packaged workflow your AI agent can load and execute</td><td>Yes (agent-triggered)</td></tr><tr><td>5. Cron</td><td>Scheduled autonomous execution</td><td>Yes (fully automatic)</td></tr></tbody></table><p>Most people are at level 1 or 2. They have folders full of prompts. They paste them into ChatGPT and copy the output. That&#8217;s better than a blank page, but it plateaus.</p><p><strong>A CLI compounds.</strong> You build it once, debug it, and it works forever. You can pipe data into it. You can chain it with other CLIs. You can schedule it.</p><p><strong>A skill compounds harder.</strong> A skill is a <code>SKILL.md</code> file that packages your entire workflow: triggers, inputs, steps, gotchas, and all the hard-won lessons from debugging. Your AI agent reads it and knows exactly what to do. Every bug you fix, every edge case you discover, gets baked in permanently. The skill is the artifact. The automation is the side effect.</p><p><strong>Cron is the payoff.</strong> When your CLI runs on a schedule autonomously, you&#8217;ve shipped automation. Production.</p><p>This is the pattern most people are missing in 2026. They&#8217;re using AI to answer questions when they should be using it to <strong>replace themselves</strong>.</p><h2>Reverse-Engineering as an AI Superpower</h2><p>Here&#8217;s the uncomfortable truth: <strong>the most valuable AI skill in 2026 is reverse-engineering, outranking even prompting.</strong></p><p>Most business data is locked in SaaS apps lacking an export button, API documentation, or webhooks. The vendor wants you inside their walled garden. Your data is their leverage.</p><p>AI makes the downstream analysis trivially easy: pass data to an LLM, get insights back. 
But <strong>the upstream problem hasn&#8217;t changed</strong>: you still need to get the data out first. And that requires a craft skill that most &#8220;AI practitioners&#8221; have yet to develop.</p><p>I&#8217;m going to walk through the exact reverse-engineering process I used on Kahunas.io, a fitness coaching platform with zero public API documentation. These techniques work on almost any web app.</p>
      <p>
          <a href="https://archonhq.ai/p/automate-anything-with-ai-skills">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Build with AI Like a Professional Engineering Team]]></title><description><![CDATA[Production-quality apps with a professional AI engineering team by your side]]></description><link>https://archonhq.ai/p/build-with-ai-like-a-professional</link><guid isPermaLink="false">https://archonhq.ai/p/build-with-ai-like-a-professional</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Tue, 14 Apr 2026 06:43:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mIor!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mIor!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mIor!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png 424w, https://substackcdn.com/image/fetch/$s_!mIor!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png 848w, https://substackcdn.com/image/fetch/$s_!mIor!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png 1272w, 
https://substackcdn.com/image/fetch/$s_!mIor!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mIor!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png" width="1100" height="380" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:380,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mIor!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png 424w, https://substackcdn.com/image/fetch/$s_!mIor!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png 848w, https://substackcdn.com/image/fetch/$s_!mIor!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png 1272w, 
https://substackcdn.com/image/fetch/$s_!mIor!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8cfcf249-5fda-4896-b6e8-d0e10b5f6474_1100x380.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>The AI software slop problem</h2><p>You&#8217;ve seen it. Maybe you&#8217;ve done it. Open a coding agent, type &#8220;build me a SaaS app,&#8221; and watch it spit out 2,000 lines of code in twelve seconds. It compiles. It even runs. 
You ship it.</p><p>Three weeks later you&#8217;re debugging why the auth middleware silently skips validation on expired tokens. The database queries have no error handling. There are hardcoded API keys in config files. The tests, if they exist, test that functions return something, not that they return the right thing. Nobody reviewed the architecture because there was no architect. Nobody caught the security holes because there was no security review. Nobody checked if the code actually matched the spec because there was no spec.</p><p>This is the AI software slop problem. The code <em>looks</em> professional. It reads like professional code. But it was written by a single agent working in isolation with no oversight, no review gates, and no engineering process. It&#8217;s the software equivalent of a kid copying homework, the answers look right until you check the working.</p><p>The fix isn&#8217;t better prompts. The fix isn&#8217;t a smarter model. The fix is the same thing that stopped human developers from shipping garbage: <strong>process and specialisation</strong>. 
You need an engineering team, not a code fountain.</p><p>In my tests, the same issue exists no matter what coding agent or model you use. I tried building fairly sizable projects using Claude Code with Opus 4.6, and Codex with GPT-5.4 or GPT-5.3-codex. I tried OpenCode with Kimi-K2.5 and Minimax-M2.7, and Droid Agent with GLM-5.1. Some agents are marginally better than others because they have an in-built harness. However, until you really sit down and map what needs to happen from an engineering delivery and quality assurance point of view, you&#8217;re just a kid with crayons painting pretty pictures that are completely unmaintainable. Great for an MVP or testing an idea, but not great for delivering production-grade software.</p><p>This harness and approach changes that.</p><h2>The Basic Idea</h2><p>Professional software teams don&#8217;t work the way most people use AI coding agents.</p><p>In a real team, you have an architect who designs the system and writes Architecture Decision Records, but never writes implementation code. You have a planner who breaks features into phased delivery plans with exact file paths and dependencies. You have developers who write code test-first and refuse to touch files outside their scope. You have code reviewers who catch what the developer missed. You have a security reviewer who runs OWASP Top 10 checks and issues a BLOCK/WARN/PASS verdict. You have a TDD (Test-Driven Development) guide who enforces red-green-refactor. You have a database reviewer, an E2E test runner, a refactoring specialist.</p><p>Each role has a narrow, well-defined responsibility. Each role has hard boundaries: things it will not do. The architect never writes code. The security reviewer never edits files. The developer never adds helpers that aren&#8217;t in the spec.</p><p>What if you could give your AI coding agent the same structure?</p><p>That&#8217;s what <a href="https://github.com/MikeS071/ai-dev-harness">ai-dev-harness</a> does. 
It&#8217;s an open-source framework that installs a complete set of agent profiles, skills, lifecycle policies, and working rules into your project. Your coding agent doesn&#8217;t just write code, it works within an engineering system that keeps the code honest.</p><h2>Why This Setup and What You Get</h2><p>The harness gives you three things that solo AI coding doesn&#8217;t:</p><p><strong>1. Agent profiles with hard boundaries.</strong> Fourteen specialist roles, each with explicit permissions and restrictions. The architect reads code and produces ADRs, its tool access is limited to read, grep, and glob. The code-agent writes code and runs tests, but must declare a scope boundary before touching anything. The security-reviewer scans for vulnerabilities and produces a structured report with a verdict, but never edits a file. The planner creates phased implementation plans with exact file paths, but never writes code. These aren&#8217;t soft suggestions, they&#8217;re enforced by the agent&#8217;s tool configuration and written into each profile&#8217;s &#8220;What NOT to Do&#8221; section.</p><p><strong>2. Skills that encode engineering practices.</strong> Eighteen skills covering the full development lifecycle: api-design, backend-patterns, frontend-patterns, database-migrations, deployment-patterns, docker-patterns, security-review, e2e-testing, tdd-workflow, verification-loop, coding-standards, golang-patterns, python-patterns, postgres-patterns, and more. Each skill activates contextually, the tdd-workflow skill triggers when writing new features or fixing bugs, the verification-loop skill triggers after completing significant changes, the security-review skill triggers when handling auth or data protection.</p><p><strong>3. A lifecycle policy that routes work to the right agent.</strong> A TOML configuration maps ticket types to specialist profiles. A new feature goes to the code-agent. A code quality gap goes to the code-reviewer. 
A security concern goes to the security-reviewer. A database change goes to the database-reviewer. You don&#8217;t have to remember which agent does what, the harness routes it automatically.</p><p>Together, these three layers mean your AI agent works within a system and has standards to follow, review gates to pass, and specialist roles to lean on. The result is code that survives contact with production, because it was built like production code.</p><h2>The Architecture Principles Every Project Starts With</h2><p>Before you write a single line of code, before you even spec out a feature, there are principles that should be baked into every solution. These aren&#8217;t optional. They&#8217;re not &#8220;nice to haves&#8221; you add later. They&#8217;re the difference between a system that works and a system that <em>keeps working</em>.</p><p>The harness encodes these into its agent profiles, skills, and working rules. But even if you never use the harness, these are the non-negotiables. Print them out. Stick them on the wall. Reference them in every code review. If your project violates one of these, you need a damn good reason documented in an ADR.</p><p><strong>1. Modularity, single responsibility, high cohesion, low coupling.</strong> Every file does one thing. Every function does one thing. If you can&#8217;t describe what a file does in one sentence, it does too much. Target 200-400 lines per file. Hard stop at 800. When a file crosses 800 lines, it&#8217;s telling you it has too many responsibilities. Split it.</p><p><strong>2. Explicit error handling, no silent failures, ever.</strong> Every async call has error handling. Every external dependency call has a try/catch or equivalent. No silent catch blocks. No swallowing errors and continuing. If something fails, the system knows it failed, logs it, and responds appropriately. A silent failure is a lying system. Lying systems kill people&#8217;s data.</p><p><strong>3. 
Input validation at boundaries.</strong> Validate all external input at the system boundary, API endpoints, message consumers, file parsers, before it reaches business logic. Internal code trusts internal data. External code trusts nothing. This is where injection attacks live. Validate early, validate once, validate completely.</p><p><strong>4. Immutability by default across async boundaries.</strong> Shared mutable state across async boundaries is a bug factory. Data that crosses an async boundary should be immutable, copied, not referenced. If two concurrent processes can modify the same object, you will get race conditions. Not maybe. Will.</p><p><strong>5. Security, defense in depth, least privilege.</strong> Authentication on every protected route. Authorization checked on every operation. No hardcoded secrets, all config via environment variables. No PII in logs. Errors sanitized before reaching clients. Security is not a layer you add at the end. It&#8217;s a constraint you design for from the start.</p><p><strong>6. Stateless services where possible.</strong> Design for horizontal scaling from day one. If your service holds session state, it can&#8217;t be scaled by adding instances. Push state to the client or to a dedicated state store. Stateless services are easier to deploy, easier to scale, easier to recover. Design for 10x before needing 100x.</p><p><strong>7. API contracts before implementation.</strong> Define your API surface first: method, path, request body, response body, auth requirements, error responses. Write it down. Share it. Build against it. This is the contract between your frontend and backend, between your services, between your team. If the contract is wrong, the implementation doesn&#8217;t matter.</p><p><strong>8. Test-driven development, tests before code, always.</strong> Write failing tests that describe the expected behaviour. Then write the minimum code to make them pass. Then refactor. Red-green-refactor. Every time. 
No exceptions. Target 80% coverage on branches, functions, and lines. If coverage is below threshold, write more tests, never lower the threshold.</p><p><strong>9. Phased delivery, each phase independently mergeable.</strong> Break work into phases that can ship on their own. M0: repo and infrastructure. M1: foundation. M2: features. M3: quality and operations. If a phase can&#8217;t be merged without the next phase, the plan is wrong. This isn&#8217;t bureaucratic, it&#8217;s how you keep the blast radius of any bug bounded.</p><p><strong>10. No hardcoded values, config is external.</strong> API URLs, feature flags, timeout values, retry counts, rate limits, these change between environments. Hardcode them and you&#8217;re deploying code to change a timeout. Externalize them and you&#8217;re editing a config file. Every magic number in your codebase is a deployment risk.</p><p><strong>11. Consistent patterns over clever solutions.</strong> When the existing codebase uses a pattern, use that pattern. Don&#8217;t introduce a &#8220;better&#8221; approach that only you understand. Consistency beats cleverness every time. If the pattern is genuinely wrong, write an ADR, get agreement, then change the pattern everywhere. Don&#8217;t leave two patterns coexisting.</p><p><strong>12. Logging, security-sensitive operations are always logged.</strong> Auth attempts, permission changes, data access, payment operations. If it&#8217;s security-sensitive and it&#8217;s not logged, you have no forensics when something goes wrong. Log the operation, the actor, the timestamp, and the outcome. Not the sensitive data itself, the fact that the operation happened.</p><p><strong>13. Dependency management, know what you depend on.</strong> Run dependency audits regularly. No high or critical CVEs in your dependencies. Pin your versions. Know what each dependency does and why it&#8217;s there. 
A dependency you don&#8217;t understand is a supply-chain attack vector you can&#8217;t defend against.</p><p><strong>14. Documentation stays in sync with code.</strong> Stale documentation is worse than no documentation: it&#8217;s actively misleading. When code changes, docs change. README, API docs, ADRs, runbooks, all of it. The doc-updater profile exists because this is hard for humans. It&#8217;s harder for AI agents, which will happily implement a feature and forget the README exists.</p><p><strong>15. Design for failure.</strong> Every external call will fail. Every database will have slow days. Every third-party API will time out. Design for it. Circuit breakers, retries with backoff, fallback responses, graceful degradation. If your system assumes the network is reliable, the network will teach you otherwise.</p><p>These fifteen principles are the starting point. Not the ending point: you&#8217;ll add domain-specific principles as you learn more about your problem. But if your project violates any of these, the violation should be intentional, documented, and justified. Not accidental.</p><p>The harness doesn&#8217;t just suggest these; it enforces them through agent constraints, skill workflows, and review gates. But even without the harness, this list is your pre-flight checklist. Run through it before every project. Run through it during every review gate. If something&#8217;s missing, fix it before you ship.</p><h2>Step 1: Install and Configure Your Coding Agent</h2><p>Pick your coding agent. I use Factory&#8217;s Droid, but this setup works with Codex, Claude, or any agent that can read a repo and follow instructions. The harness is agent-agnostic: it&#8217;s a set of files and conventions, not a vendor lock-in.</p><p>Install your agent&#8217;s CLI and make sure it can read files, write files, run shell commands, and grep your codebase. That&#8217;s the baseline. 
If your agent can do those four things, the harness works.</p><p>Don&#8217;t skip the shell access. Half the value of this system comes from running tests, builds, and security scans. If your agent can&#8217;t execute commands, you lose the verification-loop skill, the security-reviewer&#8217;s automated scans, and the TDD workflow. You&#8217;re left with a very expensive syntax highlighter.</p><h2>Step 2: Initialise Your New Project Folder</h2><p>Create your project directory and point your coding agent at the harness repo. The harness bootstraps your project with:</p><ul><li><p>.agents/profiles/, the fourteen specialist roles</p></li><li><p>.agents/skills/, the eighteen engineering practice skills</p></li><li><p>lifecycle-policy.toml, ticket-type-to-agent routing</p></li><li><p>AGENTS.md, working rules, conventions, and coding standards</p></li><li><p>CODEX_INITIAL_PROMPT.md, the initialisation and phased delivery model</p></li></ul><p>The initialisation prompt (in the Prompt Library below) tells the agent to read the entire harness repo, follow the setup instructions, install dependencies, create a GitHub repo, and then learn all the agent roles and skills. This is not a quick step, it takes a few minutes. Let it run. The agent is reading every profile, every skill, every working rule. It&#8217;s building a mental model of your engineering team.</p><p>Once initialised, your project has a structure that any coding agent can pick up and work within. The rules live in the repo, not in your head.</p><h2>Step 3: Enable Your Engineering Team to Do Work</h2><p>This is where it gets real. The harness defines fourteen specialist profiles. Here&#8217;s what each one does and why it matters:</p><p><strong>Architect</strong>, Read-only. Gathers evidence before recommending. Produces Architecture Decision Records with explicit trade-off analysis (Pros/Cons/Alternatives/Decision with single clear rationale). Never writes implementation code. 
The architect exists to stop you from jumping straight to coding without understanding the problem.</p><p><strong>Planner</strong>, Read-only. Breaks features into independently deliverable phases with exact file paths, dependencies, complexity estimates, and testing strategy. Each phase must be mergeable on its own. The planner catches &#8220;update the API&#8221; vagueness and forces specificity: <em>which file, which function, which endpoints</em>.</p><p><strong>Code Agent</strong>, The implementer. Executes one scoped ticket at a time. Must declare assumptions, scope boundary, and what it&#8217;s <em>not</em> building before touching any files. TDD process: write failing tests, implement minimum code, run quality gates, fix failures, commit. Never touches files outside scope. Never adds helpers not in the spec.</p><p><strong>Code Reviewer</strong>, Read-only. Reviews code for quality, patterns, and consistency. Catches what the code agent missed.</p><p><strong>Security Reviewer</strong>, Read-only. Runs automated scans for hardcoded secrets, dependency vulnerabilities, SQL injection patterns, and sensitive data in logs. Then does a manual OWASP Top 10 review on changed code. Produces a structured report with BLOCK/WARN/PASS verdict. Any CRITICAL or HIGH finding blocks the phase.</p><p><strong>TDD Guide</strong>, Writes failing tests before implementation exists. Enforces red-green-refactor. Requires 80% coverage on branches, functions, and lines. If coverage is below threshold, writes more tests, never lowers the threshold.</p><p><strong>Database Reviewer</strong>, Reviews schema changes, migration safety, and query patterns.</p><p><strong>E2E Runner</strong>, Runs end-to-end test suites (Playwright) covering critical user flows.</p><p><strong>Doc Updater</strong>, Keeps documentation in sync with code changes.</p><p><strong>Refactor Cleaner</strong>, Removes duplication, improves naming, reduces file sizes. 
Targets 200-400 lines per file, hard stop at 800.</p><p><strong>Build Error Resolver</strong>, Fixes build failures and CI pipeline issues.</p><p><strong>Go Build Resolver / Go Reviewer</strong>, Go-specific build and review specialists.</p><p><strong>Python Reviewer</strong>, Python-specific review with pattern enforcement.</p><p>The key insight: each profile has a &#8220;What NOT to Do&#8221; section. These boundaries are what make the system work. Without them, every agent turns into the same generic code generator.</p><h2>Step 4: Create Your First Feature or Build Spec</h2><p>Now you use the &#8220;Spec Out&#8221; prompt from the Prompt Library. You tell the agent what you want to build and it produces two things: a production-ready build spec and a phased checklist roadmap.</p><p>The build spec covers architecture decisions, data models, API contracts, error handling strategy, and security considerations. It follows the harness standards, modularity (single responsibility, 200-400 lines per file), explicit error handling (no silent catches), input validation at boundaries, immutability by default across async boundaries.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j_Pg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j_Pg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!j_Pg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!j_Pg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!j_Pg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j_Pg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png" width="1024" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:183933,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/194156699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j_Pg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!j_Pg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!j_Pg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!j_Pg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b6dc63-cd8a-4a45-80ad-97163b024419_1024x768.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The checklist roadmap breaks the work into independently deliverable phases: M0 (repo activation and infrastructure), M1 (foundation, core models and base endpoints), M2 (product core, feature implementation), M3 (quality and operations, security hardening, E2E tests, monitoring). Each phase has clear success criteria and can be merged on its own.</p><p>Here&#8217;s the important part: the prompt asks the agent to ask you up to 30 questions, one by one, to clarify anything uncertain before building starts. This pattern is deceptively powerful. It forces the AI to extrapolate on solution components you haven&#8217;t thought about, authentication strategy, error surface area, data migration paths, rate limiting, cache invalidation. It&#8217;s like having a senior engineer sit next to you and say &#8220;have you considered what happens when the payment provider times out?&#8221; before you write a single line of code.</p><p>Don&#8217;t skip the 30 questions. This is where the harness earns its keep. The agent has read all the profiles and skills, it will surface concerns that the architect profile would raise, security issues the security-reviewer would catch, edge cases the TDD guide would test for. Answer the questions. Lock in the assumptions. Then build from a position of certainty to create a much better quality solution.</p><h2>Step 5: Create Quality Validation Gates for Every Phase</h2><p>Every phase of your checklist roadmap needs a review gate. The &#8220;Add a Review Gate&#8221; prompt in the Prompt Library appends a verification step to each phase that the agent cannot skip past.</p><p>The review gate does three things: compares the code developed in that phase against the build spec, identifies any functionality gaps, and closes them. It enforces 80% test coverage minimum. 
The phase cannot be called complete until the gate passes.</p><p>This is the single most important step in the entire process. Without review gates, AI agents drift. They forget parts of the spec. The review gate forces the agent to audit its own work against the plan before moving on.</p><p>The gate works because the agent already has the spec in context. It&#8217;s comparing what it built against what it was told to build. This catches two classes of problems: missed functionality (the spec said to validate input at boundaries, but the agent only validated on the happy path) and quality gaps (the spec required 80% coverage, but the agent only wrote happy-path tests).</p><p>Run the review gate after every phase, especially the ones that feel straightforward; that&#8217;s where complacency lives.</p><h2>Step 6: Do Walk-throughs of All Your Screens and Update Specs</h2><p>This step should perhaps be the first step of this whole process. Let me explain.</p><p>In a normal cycle of deciding what you want to build, you have to spend some time thinking about your requirements and then writing them down. Often, this happens in a simple document with bullet-point lists. This is OK if your system is fairly simple to conceptualise, but what happens if you want to build something a bit more complex? How will the AI actually know what you&#8217;re trying to accomplish?</p><p>Some people I&#8217;ve spoken to say, &#8220;well, just talk to the AI&#8221;. OK, true: talking to the AI one prompt after another makes the approach iterative, which is fine. But by doing so, you don&#8217;t have a true picture of the entire system and, more importantly, because you lack that holistic view, you can&#8217;t make good architectural decisions or trade-offs.</p><p>Maybe there is a better way?</p><p>There is. Let&#8217;s say your system is a SaaS app with several screens and complex, non-trivial use cases. 
Writing down a bunch of bullet points is probably not going to cut it. Instead, try this:</p><ol><li><p>Actually design the UI before the AI agent starts building anything. You can easily use <a href="https://stitch.withgoogle.com">https://stitch.withgoogle.com</a> or any other UI designer (e.g. <a href="http://www.figma.com">www.figma.com</a>) and design how the user interaction is meant to occur. Essentially, convert your thinking process into a tangible look and feel so you can confirm your idea works. For example, here is a design we did recently:</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z8y8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z8y8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png 424w, https://substackcdn.com/image/fetch/$s_!z8y8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png 848w, https://substackcdn.com/image/fetch/$s_!z8y8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png 1272w, https://substackcdn.com/image/fetch/$s_!z8y8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!z8y8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png" width="632" height="845" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:845,&quot;width&quot;:632,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249400,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/194156699?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z8y8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png 424w, https://substackcdn.com/image/fetch/$s_!z8y8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png 848w, https://substackcdn.com/image/fetch/$s_!z8y8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png 1272w, https://substackcdn.com/image/fetch/$s_!z8y8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91cf3d5c-62d7-4408-be22-36051af8f3a7_632x845.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><ol start="2"><li><p>Once you&#8217;re happy with your UI prototype, do a complete walk-through with a speech-to-text recorder. Just talk to yourself or a friend as you walk through every screen, every component, button and text-box. Ask yourself, why is this here? What is it meant to do? What was the expected behaviour? 
How does this work?</p></li><li><p>Then, feed the exported UI prototype code and the walk-through transcript into your AI agent and say:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6c010693-b345-4aa7-837f-e2f72233fa79&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">"Read the UI code and the transcript and update all specs to ensure the system works as I expect. The transcript is your source of truth."</code></pre></div></li></ol><p>You&#8217;d be amazed how much better and quicker you can build complex, great-looking, high-quality apps using this approach.</p><p>I had a genuine &#8220;aha&#8221; moment when I finally put this in place for one of my projects.</p><h2>Step 7: Build Your System Phase by Phase</h2><p>This is where the phased delivery model pays off. You don&#8217;t type &#8220;build everything&#8221; and hope for the best. You build M0, verify it passes, then M1, verify it passes, and so on.</p><p>The harness enforces a natural cadence:</p><ul><li><p><strong>M0, Repo Activation:</strong> Initialise the project, install dependencies, set up CI, create the GitHub repo. No feature code yet. Just infrastructure.</p></li><li><p><strong>M1, Foundation:</strong> Core data models, base API structure, auth middleware, database migrations. The smallest slice that compiles and runs.</p></li><li><p><strong>M2, Product Core:</strong> Feature implementation. This is where the code-agent does the heavy lifting, guided by the planner&#8217;s phased breakdown. Each feature gets its own TDD cycle.</p></li><li><p><strong>M3, Quality and Operations:</strong> Security hardening (the security-reviewer runs full OWASP Top 10), E2E tests (the e2e-runner covers critical user flows), monitoring, documentation. The polish phase.</p></li></ul><p>Each phase is independently mergeable. 
If M1 passes but M2 has issues, you can ship M1 while you debug M2. This is not accidental, the planner profile explicitly requires that each phase can be delivered independently. Plans that require all phases to complete before anything works are rejected.</p><p>When you build phase by phase, debugging is tractable. If something breaks in M2, you know it wasn&#8217;t broken in M1 because M1 passed its review gate. The blast radius of any bug is bounded by the current phase.</p><h2>Step 8: Test and Fix --&gt; Test and Fix</h2><p>This step isn&#8217;t a one-time action. It&#8217;s a loop. Run tests, find failures, fix them, run tests again. Repeat until green.</p><p>The harness gives you multiple testing layers:</p><p><strong>Unit tests</strong>, Written by the TDD guide before implementation. Test individual functions and components. Target &lt; 50ms per test. Mock only external dependencies (database, HTTP, file system). Never mock the thing being tested.</p><p><strong>Integration tests</strong>, API endpoints, database operations, service interactions. Test that the pieces connect correctly.</p><p><strong>E2E tests</strong>, Playwright tests covering critical user flows. Login, search, create, update, delete. The full journey.</p><p><strong>Security scans</strong>, The security-reviewer runs automated checks for hardcoded secrets, vulnerable dependencies, SQL injection patterns, and sensitive data in logs. Then a manual OWASP Top 10 pass.</p><p><strong>Verification loop</strong>, A six-phase check: build, type check, lint, test suite with coverage, security scan, diff review. The loop produces a structured report: BUILD PASS/FAIL, TYPES PASS/FAIL, LINT PASS/FAIL, TESTS X/Y passed with Z% coverage, SECURITY PASS/FAIL, X files changed, Overall: READY/NOT READY for PR.</p><p>The verification loop runs after every significant change. Not just at the end. After each phase, after each feature, after each refactor. 
It&#8217;s the safety net that catches regressions before they compound.</p><p>When tests fail, the harness has a specific flow: the build-error-resolver profile handles build failures, the code-agent fixes functional test failures, and the security-reviewer addresses security findings. You&#8217;re not debugging alone: the right specialist is assigned by the lifecycle policy.</p><p>Keep looping until everything is green. Don&#8217;t move to the next phase with failing tests. That&#8217;s not discipline, that&#8217;s engineering.</p><h2>Where This System Breaks</h2><p>Nothing is perfect. Here&#8217;s where this approach falls down:</p><p><strong>The agent can ignore the rules.</strong> The profiles and skills are instructions, not compiled constraints. A sufficiently confused or corner-cutting agent will violate its &#8220;What NOT to Do&#8221; list. The review gates catch most of this, but they&#8217;re not foolproof. You still need to read the code.</p><p><strong>Context window pressure.</strong> Fourteen profiles, eighteen skills, working rules, and your entire build spec: that&#8217;s a lot of context. For long sessions, the strategic-compact skill helps by suggesting compaction at logical boundaries (after research, before implementation; after a milestone; before a context shift). But if you&#8217;re working on a massive codebase, you&#8217;ll feel the token pressure.</p><p><strong>The 30-questions pattern can stall.</strong> If the agent asks thirty questions and you don&#8217;t know the answers, you&#8217;ll spend more time researching than building. This is actually a feature, not a bug (it means you&#8217;re designing before coding), but it can feel slow if you just want to see something working.</p><p><strong>Multi-agent coordination is still emergent.</strong> The harness defines roles and routing, but it doesn&#8217;t have a central orchestrator that automatically spins up the security reviewer after the code agent finishes. 
You, the human, decide when to invoke which agent. The lifecycle policy suggests the right routing, but you drive the sequence.</p><p><strong>The harness can&#8217;t fix bad requirements.</strong> If your build spec is vague, contradictory, or incomplete, no number of review gates will save you. The 30-questions pattern helps surface ambiguities, but if you answer &#8220;I don&#8217;t know&#8221; to half of them, the spec will have holes that show up as bugs later.</p><p>None of these are reasons to skip the system. They&#8217;re reasons to understand its limits and apply it where it fits: production software that needs to work correctly, securely, and maintainably. For everything else, a solo coding agent is fine. Just don&#8217;t pretend the output is production-ready.</p><h2>The Complete Prompt Library</h2><h3>Initialise a New Project</h3><p>Use this prompt right at the start of any new project. It assumes you are using the Droid coding agent. If you use something else, such as Codex or Claude, that&#8217;s fine; the project structure will still work correctly.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;bca065c4-a889-449d-a16c-fb70aa91a136&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">cd ~/new-project-dir
droid</code></pre></div><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;bca065c4-a889-449d-a16c-fb70aa91a136&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read the repo in [https://github.com/MikeS071/ai-dev-harness] and follow all instructions to initialise a new project in [new-project-dir]. Ensure all dependencies are installed and are ready to use and remote github repo for this new project has been created and initialised. Once initialised, read the [new-project-dir] and learn all the instructions, skills, agent roles so all capabilities can be used to build this new project.</code></pre></div><h3>Spec Out a New Feature or Project</h3><p>Use this prompt to start designing a new feature or system. The harness knows the standards, structure and broad requirements of good system design. Asking it to spec things out produces a production-ready build spec together with a checklist roadmap you can refer back to, so the AI agent can execute when ready and you can keep track of progress during a multi-day development cycle.<br>You may also notice the &#8220;Ask me upto 30 questions...&#8221; instruction at the end. That pattern matters: it tells the AI to lock in the 30 most critical assumptions before building starts. You&#8217;d be surprised how well this works here, and it carries over to other scenarios because it forces the AI to extrapolate on solution components that you may not have thought about. 
It&#8217;s like having a senior engineer sit side by side with you and give you advice.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;955579fe-5b22-4a35-8841-ea29827e2f34&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">I want to design and spec out a new feature/system/solution called [feature-name]. Help me plan this out, design a comprehensive and production-ready build spec and also produce a checklist roadmap for all phases of the delivery. Ensure all standards, instructions and good-system design principles are followed. Ask me upto 30 most critical questions one by one to clarify anything that is still uncertain and needs to be locked-in before building starts.</code></pre></div><h3>Add a Review Gate to Each Phase</h3><p>Use this prompt to add a review gate for each phase of the build spec checklist. The idea is to minimise hallucinations and missed functionality. AI agents are great, but they do miss things, so forcing the agent to review its own work improves your chances of getting a high-quality, working solution. Use this prompt as a second pass over the build spec/checklist once it has been generated.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;fedc783e-4b35-45b4-91fd-113d94adc6b8&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Add a review gate to each phase by adding this exact prompt to the end of each checklist/roadmap phase. Each phase cannot be called complete until the review gate has passed:
"Review the code developed in this phase and compare to the existing build specs. Identify any functionality gaps and if you find any material gaps then close them by building relevant code. Ensure no material gaps exist and that test coverage of 80% or higher. The phase cannot be called complete until the review gate is successfully passed."</code></pre></div><h3>Build the System</h3><p>Once you have the production-ready build specs and checklist/roadmap, it is a simple matter of just saying &#8220;build it&#8221; to the AI Agent. I suggest you build the system phase by phase to manage complexity and to check that phases pass before moving on to the next section. This makes debugging or fixing issues a lot simpler.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;edcc92be-825d-4b76-8157-46ff1ad62905&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Build Phase M0 (or M1 or M2 etc)</code></pre></div>
]]></content:encoded></item><item><title><![CDATA[Build an LLM Knowledge Base That Actually Compounds]]></title><description><![CDATA[Full system, exact prompts, real configuration, and honest about where it breaks.]]></description><link>https://archonhq.ai/p/build-an-llm-knowledge-base-that</link><guid isPermaLink="false">https://archonhq.ai/p/build-an-llm-knowledge-base-that</guid><dc:creator><![CDATA[Michal Szalinski]]></dc:creator><pubDate>Fri, 10 Apr 2026 07:52:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3mES!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3mES!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3mES!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png 424w, 
https://substackcdn.com/image/fetch/$s_!3mES!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png 848w, https://substackcdn.com/image/fetch/$s_!3mES!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png 1272w, https://substackcdn.com/image/fetch/$s_!3mES!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3mES!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png" width="1100" height="380" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:380,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:952584,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://archonhq.ai/i/193767834?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3mES!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png 424w, 
https://substackcdn.com/image/fetch/$s_!3mES!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png 848w, https://substackcdn.com/image/fetch/$s_!3mES!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png 1272w, https://substackcdn.com/image/fetch/$s_!3mES!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44d83054-0b13-4b0c-a2d8-7701207653f0_1100x380.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<h2>The Problem You Already Know You Have</h2><p>You have knowledge scattered everywhere. Articles saved in 4 apps. Bookmarks from 2023 you&#8217;ll never revisit. Notes from meetings in a folder you forgot existed.</p><p>When you ask AI a question about your stuff, it starts from zero every time. Upload docs, ask a question, get an answer. Next session? Forgotten everything. That&#8217;s how ChatGPT file uploads, NotebookLM, and most RAG systems work. Zero accumulation.</p><h2>The Idea (60 Seconds)</h2><p>Instead of the AI searching your raw files every time, the AI reads your sources <em>once</em> and compiles a structured wiki. Summaries, cross-references, connections between ideas, contradictions flagged. All maintained by the AI. All in simple markdown files.</p><p>Next time you ask a question, the AI doesn&#8217;t dig through raw documents. It reads the wiki it already built. The connections are already there. Every new source you add makes the wiki richer. Every question you ask can get filed back in. Knowledge compounds instead of resetting.</p><p>No database. No embeddings. No vector store. Just folders and text files.</p><h2>Why This Setup, Not The Others</h2><p>Three things make this particular stack worth your time:</p><ol><li><p><strong>Factory Droid reads and writes local files natively.</strong> No copy-paste. No uploading. The AI operates directly on your filesystem reading PDFs, creating wiki pages, updating the index, all in one pass.</p></li><li><p><strong>OpenRouter gives you any model.</strong> I run <code>glm-5.1</code> through OpenRouter. You can swap to Claude, GPT-4o, Gemini, Llama, or any model that drops next week. One config change. 
No vendor lock-in on the intelligence layer.</p></li><li><p><strong>Obsidian renders the wiki as it&#8217;s built.</strong> Graph view, backlinks, search, YAML properties, all work automatically because the AI writes standard Obsidian-compatible markdown. You see the knowledge base grow in real time.</p></li></ol><h2>What You Need</h2><ul><li><p><strong>Factory Droid</strong> - AI coding agent that reads/writes local files (<a href="https://factory.ai">factory.ai</a>)</p></li><li><p><strong>OpenRouter API key</strong> - model gateway that lets you use any LLM (<a href="https://openrouter.ai">openrouter.ai</a>)</p></li><li><p><strong>Obsidian</strong> - markdown editor with wiki-link support (<a href="https://obsidian.md">obsidian.md</a>)</p></li><li><p><strong>10+ source documents</strong> on a topic you care about</p></li><li><p><strong>30 minutes</strong> for initial setup, then <strong>10 minutes per source</strong> after that</p></li></ul><p>No special software beyond these three. No accounts beyond these two. No plugins to install.</p><h2>Step 1: Install and Configure Factory Droid (5 Minutes)</h2><p>Install Droid:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;f6e935b8-e017-4993-8972-5e70c5116636&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext"># macOS / Linux
curl -fsSL https://app.factory.ai/cli | sh
</code></pre></div><p>Authenticate:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;81dfae9c-31bb-47a2-a3e9-c848af2d894c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">droid login
</code></pre></div><h3>Add Your Custom Model via OpenRouter</h3><p>This is how you run <em>any</em> model through Droid, not just the default ones. I use glm-5.1. You can use whatever OpenRouter supports.</p><p>Edit ~/.factory/settings.json (create it if it doesn&#8217;t exist):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;650ae5d9-c291-40d1-a42a-c22c5d31d5f7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">{
  "customModels": [
    {
      "model": "z-ai/glm-5.1",
      "displayName": "GLM-5.1 [OpenRouter]",
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "YOUR_OPENROUTER_API_KEY",
      "provider": "generic-chat-completion-api",
      "maxOutputTokens": 65536
    }
  ]
}
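</code></pre></div><p>Before launching Droid, it can help to confirm that the file parses and that each entry carries the fields shown above. The following is a minimal Python sketch, not part of Droid: the function name <code>check_settings</code> is hypothetical, and treating the listed keys as required is an assumption taken from the example above rather than from Droid&#8217;s documentation. Call it as <code>check_settings(Path.home() / ".factory" / "settings.json")</code>; an empty list means nothing obvious is wrong.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import json
from pathlib import Path

# Keys that appear in the example entry above. Treating them as required is an
# assumption made for this sketch, not a documented Droid rule.
EXPECTED_KEYS = ("model", "displayName", "baseUrl", "apiKey", "provider")
PLACEHOLDER_KEY = "YOUR_OPENROUTER_API_KEY"


def check_settings(path):
    """Return a list of problems found in a settings.json file (empty if none)."""
    # json.loads raises json.JSONDecodeError on malformed JSON such as a trailing comma.
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    models = data.get("customModels", [])
    problems = []
    if not models:
        problems.append("no customModels entries found")
    for index, entry in enumerate(models):
        for key in EXPECTED_KEYS:
            if not entry.get(key):
                problems.append(f"customModels[{index}] is missing {key}")
        if entry.get("apiKey") == PLACEHOLDER_KEY:
            problems.append(f"customModels[{index}] still uses the placeholder API key")
    return problems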
</code></pre></div><p>To use a different model, change the model field to any model ID from openrouter.ai/models. The displayName is just what shows up in the Droid UI.</p><p>To select your custom model in a session:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;edf0688b-a7b6-4f2b-bac5-b0e9a4079081&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">droid --model "z-ai/glm-5.1"
</code></pre></div><p>Or type /model inside a running Droid session to switch on the fly.</p><h2>Step 2: Create the Folder Structure (2 Minutes)</h2><p>Create this anywhere on your machine:</p><pre><code><code>my-knowledge-base/
&#9500;&#9472;&#9472; raw/           # Your source material. AI reads but never modifies.
&#9474;   &#9492;&#9472;&#9472; assets/    # Images, screenshots, diagrams
&#9500;&#9472;&#9472; wiki/          # AI-maintained wiki. You read. AI writes.
&#9500;&#9472;&#9472; outputs/       # Reports, analyses, answers from queries
&#9492;&#9472;&#9472; AGENTS.md      # The schema file that makes this whole thing work
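
# One way to create this layout from a terminal (a sketch; run it from the directory
# that should contain my-knowledge-base/, and change the name to suit your project):
mkdir -p my-knowledge-base/raw/assets my-knowledge-base/wiki my-knowledge-base/outputs
touch my-knowledge-base/AGENTS.md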
</code></code></pre><p>Three folders, one file. If you&#8217;re spending more than 2 minutes here, you&#8217;re overthinking it.</p><h2>Step 3: Write Your Schema File (The Step Everyone Skips)</h2><p>The schema is the difference between a generic chatbot and a disciplined wiki maintainer. It tells your AI what the knowledge base is about, how to organize it, and what to do when you add sources, ask questions, or run maintenance.<br>As you will see in the schema, I have multiple knowledge domains in my structure. You don&#8217;t have to set up multi-domain folders; one will do just fine. In that case, just remove or rename the domain folders you don&#8217;t want.</p><p>Save this as AGENTS.md in your project root:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;markdown&quot;,&quot;nodeId&quot;:&quot;421bbfec-abba-41c2-a514-32856eb80692&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-markdown"># Knowledge Base Schema

## Identity
This is a personal knowledge base about [YOUR TOPIC HERE].
Maintained by an LLM agent. The human curates sources and asks questions. The LLM does everything else.

## Architecture
- raw/ contains immutable source documents. NEVER modify files in raw/.
- wiki/ contains the compiled wiki. The LLM owns this directory entirely.
  - wiki/architecture/ -- Enterprise and solution architecture
  - wiki/resilience-ops/ -- Resilience, operations, SRE
  - wiki/data-ai/ -- Data platforms, ML, AI
  - wiki/security/ -- Security, IAM, data protection
  - wiki/software-engineering/ -- Software design, practices, CI/CD
  - wiki/ai-automation/ -- AI for business process automation
  - wiki/index.md -- Master index of all pages by domain
  - wiki/log.md -- Append-only chronological record
  - Cross-cutting pages (e.g. contradictions-and-tensions.md) live at wiki/ root
  - Each domain folder has a home.md landing page listing its pages
- outputs/ contains generated reports, analyses, and query answers.

## Wiki Conventions
- Every topic gets its own .md file in the appropriate domain folder under wiki/
- Every wiki file starts with YAML frontmatter:
  ---
  title: [Topic Name]
  created: [Date]
  last_updated: [Date]
  source_count: [Number of raw sources that informed this page]
  status: [draft | reviewed | needs_update]
  ---
- After frontmatter, a one-paragraph summary
- Use [[topic-name]] for internal links between wiki pages
- Every factual claim cites its source: [Source: filename.md]
- When new info contradicts existing content, flag explicitly:
  &gt; CONTRADICTION: [old claim] vs [new claim] from [source]

## Index and Log
- wiki/index.md lists every page by domain with a one-line description
- wiki/log.md is append-only chronological record
- Log entry format: ## [YYYY-MM-DD] action | Description
  (Actions: ingest, query, lint, update)

## Ingest Workflow
When processing a new source:
1. Read the full source document
2. Discuss key takeaways with user
3. Create or update a summary page in the appropriate wiki/ domain folder
4. Update wiki/index.md and the domain's home.md
5. Update ALL relevant entity and concept pages across the wiki
6. Add backlinks from existing pages to new content
7. Flag any contradictions with existing wiki content
8. Append entry to wiki/log.md
9. A single source should touch 10-15 wiki pages

## Query Workflow
When answering a question:
1. Read wiki/index.md first to find relevant pages
2. Read all relevant wiki pages
3. Synthesize answer with [Source: page-name] citations
4. If answer reveals new insights, offer to file it back into wiki/
5. Save valuable answers to outputs/

## Lint Workflow (Monthly)
Check for:
- Contradictions between pages
- Stale claims superseded by newer sources
- Orphan pages with no inbound links
- Concepts mentioned but never explained
- Missing cross-references
- Claims without source attribution
Output: wiki/lint-report-[date].md with severity levels

## Focus Areas
[List 3-5 topics this knowledge base covers]
</code></pre></div><p>Customise three things before saving:</p><ol><li><p>The [YOUR TOPIC HERE] line -- make it specific (&#8220;Enterprise Architecture for Financial Services&#8221; not just &#8220;Architecture&#8221;)</p></li><li><p>The domain folders -- rename/add/remove to match your topics (I have 6 domains; you might have 3 or 10)</p></li><li><p>The Focus Areas -- list the 3-5 domains this KB covers</p></li></ol><p>This file is read by Droid at the start of every session. It&#8217;s the single most important file in the entire system.</p><h2>Step 4: Fill Your Raw Folder (10 Minutes of Dumping, Zero Organising)</h2><p>Open <code>raw/</code> and dump everything in:</p><ul><li><p>Copy-paste articles into <code>.md</code> or <code>.txt</code> files</p></li><li><p>Export notes from whatever app you&#8217;re using now</p></li><li><p>Save screenshots and diagrams to <code>raw/assets/</code></p></li><li><p>Drop in PDFs (Droid can extract text from them)</p></li><li><p>Paste in research papers, competitor breakdowns, internal docs</p></li><li><p>Dump bookmarks you&#8217;ve been hoarding for months</p></li></ul><p><strong>Don&#8217;t organise it. Don&#8217;t rename anything. Don&#8217;t clean it up.</strong> That&#8217;s the AI&#8217;s job.</p><p>If you have PDFs, Droid will handle extraction automatically. The Anthropic PDF skill (<a href="https://github.com/anthropics/skills/tree/main/skills/pdf">github.com/anthropics/skills/tree/main/skills/pdf</a>) uses <code>pdfplumber</code> and <code>pypdf</code> under the hood. If Droid doesn&#8217;t have these installed, it will install them as part of the ingest. No manual setup needed.</p><p>The Obsidian Web Clipper browser extension converts any web article to markdown in one click. Set a hotkey to pull all images locally so the AI can reference them.</p><p>The goal is volume. 
Not perfection.</p><h2>Step 5: Run Your First Ingest</h2><p>Open your project in Droid:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;52e8cba9-b04e-46a3-b7f7-dad7d16bd5c4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">cd my-knowledge-base
droid
</code></pre></div><p>Then paste this prompt:</p><p><strong>INGEST PROMPT (single source):</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;58fb0d1d-26bd-453d-842d-1d4ad332704d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read the schema in AGENTS.md. Then process [FILENAME] from raw/. Read it fully, discuss key takeaways with me, then: create a summary page in the appropriate wiki/ domain folder, update wiki/index.md and the domain's home.md, update all relevant concept and entity pages across the wiki, add backlinks, flag any contradictions, and append to wiki/log.md. Use the PDF skill (https://github.com/anthropics/skills/tree/main/skills/pdf) to read the PDFs or convert them to md format.
</code></pre></div><p>Start with <strong>one source at a time</strong>. Read the summaries. Check the updates. Guide the AI on what to emphasise. This produces dramatically better results than batch-processing everything at once.</p><p><strong>What happens during an ingest:</strong></p><ol><li><p>Droid reads the full source document from <code>raw/</code></p></li><li><p>It discusses key takeaways with you (this is your quality gate)</p></li><li><p>It creates a summary page in the right domain folder</p></li><li><p>It creates cross-cutting concept pages that connect to existing content</p></li><li><p>It updates the index and domain home page</p></li><li><p>It adds backlinks from existing pages to the new content</p></li><li><p>It flags contradictions with existing wiki content</p></li><li><p>It appends a log entry to <code>wiki/log.md</code></p></li></ol><p>A single good source will touch 10-15 wiki pages. That&#8217;s the compounding in action.</p><p><strong>For PDFs specifically</strong>, add this to your ingest prompt:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;7a1d3ac4-31e8-4e06-bcb9-cafeab694673&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Use the PDF skill (pdfplumber/pypdf) to extract text from the PDF before processing.
</code></pre></div><p><strong>For batch ingest</strong> (less supervised, use after you trust the system):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;bc894ae7-cefc-43f7-986e-19470ca36ac5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read AGENTS.md. Process all unprocessed files in raw/ sequentially. For each: create summary in the appropriate domain folder, update index and home.md, update relevant pages, add backlinks, flag contradictions, log the ingest. Proceed automatically.
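</code></pre></div><p>The batch prompt leaves it to the agent to work out which files count as &#8220;unprocessed&#8221;. If you would rather hand it an explicit list, a sketch along these lines can build one. It rests on an assumption the schema does not guarantee: that every ingest entry in <code>wiki/log.md</code> mentions the source file name. The function name <code>unprocessed_sources</code> is hypothetical; call it as <code>unprocessed_sources(".")</code> from the project root and paste the result into the prompt.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">from pathlib import Path


def unprocessed_sources(root):
    """List files under raw/ whose names do not appear anywhere in wiki/log.md."""
    root = Path(root)
    raw_dir = root / "raw"
    log_file = root / "wiki" / "log.md"
    # A missing log simply means nothing has been ingested yet.
    log_text = log_file.read_text(encoding="utf-8") if log_file.exists() else ""
    pending = []
    for path in sorted(raw_dir.rglob("*")):
        relative = path.relative_to(raw_dir)
        if not path.is_file() or relative.parts[0] == "assets":
            continue  # skip folders and the images kept in raw/assets/
        if path.name not in log_text:
            pending.append(relative.as_posix())
    return pending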
</code></pre></div><p>After 5-10 sources, your wiki/ folder will have an index, a log, domain home pages, and 15-30 interconnected pages. That&#8217;s when things click.</p><h2>Step 6: Set Up Obsidian (3 Minutes)</h2><p>Your wiki is already Obsidian-compatible. It uses markdown files, [[wiki-links]], and YAML frontmatter. You just need to point Obsidian at it.</p><p>If Obsidian is already installed:</p><ol><li><p>Open Obsidian</p></li><li><p>Click &#8220;Open folder as vault&#8221;</p></li><li><p>Select your wiki/ folder</p></li><li><p>Done</p></li></ol><p>If you need to install Obsidian:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b11cc76a-8ba4-4f73-b58d-447d2e19184c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext"># Linux (Flatpak)
flatpak install flathub md.obsidian.Obsidian

# macOS
brew install --cask obsidian

# Or download from obsidian.md
</code></pre></div><p>What works immediately in Obsidian:</p><ul><li><p>[[wiki-links]] - all cross-page links are clickable and navigable</p></li><li><p>Graph view - click the graph icon to see the interconnected page structure</p></li><li><p>Backlinks panel - right sidebar shows which pages link to the current page</p></li><li><p>YAML frontmatter - properties like title, status, source_count appear in the properties panel</p></li><li><p>Search - Ctrl+Shift+F for global search across all pages</p></li><li><p>Folder navigation - domain folders show up in the file explorer</p></li></ul><p>The vault opens on index.md as the landing page with the full catalogue of all wiki pages by domain.</p><h2>Step 7: Start Querying Your Knowledge Base</h2><p>Once you have 10+ wiki pages, the system becomes genuinely useful.</p><p><strong>QUERY PROMPT:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;be303cfa-feca-4b95-b6b5-360625b417cf&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read wiki/index.md. Based on what's in the knowledge base, answer: [YOUR QUESTION]. Cite which wiki pages informed your answer. If this reveals new connections worth preserving, create a new page in the appropriate wiki/ domain folder and update the index.
</code></pre></div><p><strong>Questions that extract the most value:</strong></p><ul><li><p>&#8220;What are the three biggest gaps in this knowledge base?&#8221;</p></li><li><p>&#8220;Which sources disagree with each other, and on what?&#8221;</p></li><li><p>&#8220;What should I research next based on what&#8217;s here?&#8221;</p></li><li><p>&#8220;Write a 500-word briefing on [topic] using only wiki content&#8221;</p></li><li><p>&#8220;What connections exist between [concept A] and [concept B]?&#8221;</p></li><li><p>&#8220;What contradictions or tensions exist across the sources?&#8221;</p></li></ul><p>The critical loop: <strong>good answers should be filed back into the wiki.</strong> A comparison, an analysis, a connection you discovered. These compound just like ingested sources do. Every question makes the next answer better.</p><h2>Step 8: Run Monthly Health Checks</h2><p>This is the step nobody does. It&#8217;s the step that prevents the whole system from slowly rotting.</p><p><strong>LINT PROMPT:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;079eb7bb-e5bc-42b2-be87-e61baf426ffc&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Run a full health check on wiki/ per the lint workflow in AGENTS.md. Output to wiki/lint-report-[date].md with severity levels. Suggest 3 articles to fill the biggest knowledge gaps.
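</code></pre></div><p>Some of the lint checks are mechanical and do not need a model at all. As an illustration, the sketch below lists orphan pages with plain Python. It assumes links are written as <code>[[page-name]]</code> and that the name matches a file name without the <code>.md</code> extension. It also ignores index, log, home and lint-report pages, both as link sources and as candidates, because navigation pages link to everything by design. The function name <code>find_orphans</code> is hypothetical; run <code>find_orphans("wiki")</code> from the project root.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import re
from pathlib import Path

# Matches the target of [[target]], [[target|alias]] and [[target#heading]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")
# Navigation pages link to everything, so this sketch ignores them (an assumption).
NAVIGATION = {"index", "log", "home"}


def is_navigation(stem):
    """True for index, log, home and lint-report pages."""
    return stem in NAVIGATION or stem.startswith("lint-report")


def find_orphans(wiki_dir):
    """Return the names of wiki pages that no regular page links to."""
    pages = sorted(Path(wiki_dir).rglob("*.md"))
    linked = set()
    for page in pages:
        if is_navigation(page.stem):
            continue
        for target in WIKI_LINK.findall(page.read_text(encoding="utf-8")):
            name = target.strip().split("/")[-1]
            if name != page.stem:  # a link from a page to itself does not count
                linked.add(name)
    candidates = {page.stem for page in pages if not is_navigation(page.stem)}
    return sorted(candidates - linked)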
</code></pre></div><p>Why this matters: when the AI writes something slightly wrong and you save it back, the next answer builds on that error. Two months later, you have five pages reinforcing the same error. Health checks catch this before it snowballs.</p><p>One check per month. Ten minutes of your time. Non-negotiable if you want the system to stay trustworthy.</p><h2>Step 9: Let It Compound</h2><p>After 4-6 weeks of consistent use, you&#8217;re not just searching notes. You&#8217;re querying a structured knowledge system that understands connections between your sources better than you do.</p><p>Three ways to accelerate the compounding:</p><p><strong>File exploration outputs back.</strong> When the AI generates a comparison or analysis you find valuable, save it into <code>wiki/</code> or <code>outputs/</code>. Your own explorations and queries compound in the knowledge base, just as ingested sources do.</p><p><strong>Add visual outputs.</strong> Have the AI render answers as markdown tables, charts, or slide decks. These become reusable assets, not throwaway chat messages.</p><p><strong>Version control everything.</strong> Your wiki is just markdown files. Initialize a git repo and you get full history, branching, and the ability to undo anything the AI messes up.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;7838559c-219a-4171-9d40-03c04ad6e4fa&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">cd my-knowledge-base
git init
git add .
git commit -m "Initial knowledge base"
</code></pre></div><div><hr></div><h2>Where This System Breaks</h2><p>This is a nascent pattern, not a finished product. Karpathy himself called it &#8220;a hacky collection of scripts.&#8221; Here&#8217;s what you need to know:</p><h3>Context Window Ceiling</h3><p>The wiki works at ~100 articles and ~400K words. But at roughly 0.75 words per token, even a 128K-token context window holds only ~96K words, about a quarter of a wiki that size. So the AI never sees the whole wiki at once: it reads selectively through the index, which means it can miss things. Research shows LLMs suffer from &#8220;lost in the middle&#8221; effects, where information in the centre of long inputs gets deprioritised. Your query results will have blind spots. Accept this.</p><h3>Error Compounding</h3><p>The AI writes a wiki page with a subtle mistake. You query against it. The mistake enters your answer. You file that answer back. Now two pages reinforce the same error. Monthly linting helps, but the AI doing the linting has the same blind spots as the AI that made the error. <strong>This is the single biggest risk.</strong></p><h3>Hallucination Doesn&#8217;t Disappear</h3><p>The wiki approach reduces hallucination because the AI grounds answers in your sources. But it doesn&#8217;t eliminate it. The AI can still synthesise connections that don&#8217;t exist in the source material. And because the wiki looks authoritative (clean markdown, cross-references, citations), you&#8217;re more likely to trust incorrect information. Don&#8217;t.</p><h3>Cost Isn&#8217;t Zero</h3><p>Every ingest, every query, every lint check costs tokens. A single source that touches 10-15 pages can run $1-2 in API calls with frontier models. Cheaper than a research assistant. Not free. Using OpenRouter with cost-efficient models like glm-5.1 helps, but it&#8217;s still not zero.</p><h3>It Doesn&#8217;t Scale to Enterprise</h3><p>The index-file approach works without RAG at ~100 articles. At 10,000+ sources, this pattern breaks. The index grows too large. Consistency across thousands of pages becomes impossible. 
You&#8217;ll need the infrastructure this system was designed to avoid. Know the ceiling.</p><h2>What To Do About The Breakpoints</h2><table><thead><tr><th>Problem</th><th>Mitigation</th></tr></thead><tbody><tr><td>Error compounding</td><td>Monthly lint checks. Cross-check critical claims manually. Never trust blindly on high-stakes decisions.</td></tr><tr><td>Context limits</td><td>Keep each wiki focused on one domain. Multiple domains? Multiple knowledge bases.</td></tr><tr><td>Cost</td><td>Use frontier models for ingest and complex queries. Cheaper models for simple updates.</td></tr><tr><td>Hallucination</td><td>The schema requires source citations on every claim. If a page makes a claim without [Source: filename], linting flags it.</td></tr><tr><td>Scale</td><td>Accept this is a personal tool, not enterprise infrastructure. If you outgrow it, that&#8217;s a good problem.</td></tr><tr><td>Model bias</td><td>Swap models in OpenRouter with one config change. Re-lint after switching to catch interpretation differences.</td></tr></tbody></table><div><hr></div><h2>Your Complete Prompt Library</h2><p>Every prompt from this guide, collected in one place:</p><p>SCHEMA: Copy the full AGENTS.md template from Step 3.</p><p>INGEST (one source):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;18c37744-958f-432b-addb-2fa629fd3bae&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read the schema in AGENTS.md. Then process [FILENAME] from raw/. Read it fully, discuss key takeaways with me, then: create a summary page in the appropriate wiki/ domain folder, update wiki/index.md and the domain's home.md, update all relevant concept and entity pages across the wiki, add backlinks, flag any contradictions, and append to wiki/log.md.
</code></pre></div><p>INGEST (batch, less supervised):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;745616a1-e94e-47b3-8253-4bdd334f2314&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read AGENTS.md. Process all unprocessed files in raw/ sequentially. For each: create summary in the appropriate domain folder, update index and home.md, update relevant pages, add backlinks, flag contradictions, log the ingest. Proceed automatically.
</code></pre></div><p><strong>QUERY:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;014f61d5-7289-4efb-b9a0-c3adfeefdb26&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read wiki/index.md. Answer: [QUESTION]. Cite wiki pages. If this answer is worth preserving, offer to file it as a new wiki page in the appropriate domain folder.
</code></pre></div><p><strong>LINT:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;db2a708d-175d-4712-9a66-79ea57ea8f33&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Run a full health check on wiki/ per the lint workflow in AGENTS.md. Output to wiki/lint-report-[date].md with severity levels. Suggest 3 articles to fill gaps.
</code></pre></div><p><strong>EXPLORE:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c7c90637-a671-4bcb-b934-461cdbe8048d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read wiki/index.md and identify the 5 most interesting unexplored connections between existing topics. For each, explain what insight it might reveal and what source would help confirm it.
</code></pre></div><p><strong>BRIEF:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;28b20225-4973-462f-8519-bb4c7d8bff02&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Based on everything in wiki/, write a 500-word executive briefing on [TOPIC]. Cite sources. Structure it as: current state, key tensions, open questions, recommended next steps.
</code></pre></div><p><strong>CONTRADICTIONS:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;2c9b3de1-bf9f-4ab1-8846-56ede0e497d1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Read all wiki pages and identify every place where guidance in one page conflicts with, undermines, or creates tension with guidance in another page. Categorise as: explicit contradiction, implicit tension, acknowledged trade-off, or vague guidance. Rate severity as high/medium/low.
</code></pre></div><h2>Go Build It</h2><p>The difference between bookmarking Karpathy&#8217;s gist and benefiting from it is one afternoon.</p><p>Pick your topic. Create the folders. Copy the schema. Drop in what you already have. Run your first ingest.</p><p>Then do it again tomorrow with another source. And next week with five more.</p><p>The wiki gets smarter every time. That&#8217;s the whole point.</p><p>Three folders. One schema. One custom model. An AI that does the grunt work you&#8217;d never do yourself.</p><p>Stop collecting bookmarks. Start compiling knowledge.</p>]]></content:encoded></item></channel></rss>