Clone Hermes Agent's Architecture for Your Own AI Assistant
Reverse-engineer Hermes Agent's core design patterns to build a production-grade AI assistant framework
Your AI assistant forgets the conversation context after three exchanges. The tool calling fails when you chain multiple operations. The memory system breaks when handling complex workflows that span multiple sessions.
You’re cobbling together OpenAI function calls with custom prompt engineering while fighting race conditions in multi-step processes. The assistant that worked for simple Q&A completely falls apart when you need it to research, analyze, and execute a series of dependent tasks.
Meanwhile, Nous Research’s Hermes Agent is built for exactly these workflows. Multi-turn conversations keep their context. Tool calls chain into one another. The architecture scales from simple queries to sophisticated automation.
The Idea (60 Seconds)
You’ll reverse-engineer Hermes Agent’s core design patterns to build a production-grade AI assistant framework. The implementation uses a modular plugin system, persistent memory management, and standardized tool interfaces that keep multi-step workflows dependable. Setup takes about two hours. The result is an assistant architecture that scales from basic chat to autonomous task execution.
Why This Architecture, Beyond Simple Function Calling
Memory persistence solves context degradation. Standard chat implementations lose context as conversations grow. Hermes uses structured memory that maintains conversation state, user preferences, and task history across sessions. Your assistant remembers what you discussed yesterday and builds on previous work.
Plugin modularity keeps expansion cheap. Plain function calling usually means tool definitions hardcoded into the agent. The Hermes pattern uses a plugin interface where tools register themselves dynamically. Add new capabilities by dropping Python files into a plugins directory. Zero core code changes.
Execution planning prevents tool chaos. Naive implementations fire tools one at a time in reaction to user input, with no view of the whole task. Hermes creates execution plans that sequence tool calls logically, handle dependencies, and recover from failures. That is the difference between handling “search for Python tutorials” and handling “search for Python tutorials, summarize the top 3, create a learning plan, and schedule practice sessions.”
Walkthrough
1. Core Agent Framework
Create the base agent class that handles conversation flow and tool coordination:
# agent.py
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class Message:
    role: str
    content: str
    timestamp: datetime
    metadata: Optional[Dict[str, Any]] = None  # None means "no metadata"


class HermesAgent:
    def __init__(self, model_client, memory_store, plugin_manager):
        self.model = model_client
        self.memory = memory_store
        self.plugins = plugin_manager
        self.conversation_id = None  # set this before calling process_message

    # create_execution_plan, execute_step, and synthesize_response are not
    # shown in this walkthrough. create_execution_plan must return an object
    # with a .steps list; the planner in step 5 returns a plain list, so
    # wrap its result or iterate over it directly.
    async def process_message(self, user_input: str) -> str:
        # Load conversation context
        context = await self.memory.get_context(self.conversation_id)

        # Create execution plan
        plan = await self.create_execution_plan(user_input, context)

        # Execute plan steps in the order the plan lists them
        results = []
        for step in plan.steps:
            result = await self.execute_step(step)
            results.append(result)

        # Generate response
        response = await self.synthesize_response(results, user_input)

        # Store conversation state
        await self.memory.store_exchange(
            self.conversation_id, user_input, response, results
        )
        return response
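The agent above calls `execute_step`, which the walkthrough never defines. Here is a minimal sketch of what it could look like, written as a standalone function so it runs on its own. The function signature, the result dictionary shape (`ok`, `output`, `error`), and the `EchoTool` stand-in are all assumptions for illustration, not Hermes Agent's actual implementation.

```python
import asyncio
from typing import Any, Dict


async def execute_step(plugins: Dict[str, Any], tool_name: str,
                       parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Run one plan step and report the outcome instead of raising."""
    plugin = plugins.get(tool_name)
    if plugin is None:
        return {"tool": tool_name, "ok": False, "error": f"unknown tool: {tool_name}"}
    try:
        output = await plugin.execute(parameters)
    except Exception as exc:  # one failing tool should not abort the whole plan
        return {"tool": tool_name, "ok": False, "error": repr(exc)}
    return {"tool": tool_name, "ok": True, "output": output}


class EchoTool:
    """Stand-in plugin used only for this demonstration."""

    async def execute(self, parameters: Dict[str, Any]) -> str:
        return parameters["text"].upper()


if __name__ == "__main__":
    tools = {"echo": EchoTool()}
    print(asyncio.run(execute_step(tools, "echo", {"text": "hi"})))
    print(asyncio.run(execute_step(tools, "missing", {})))
```

Returning a result dictionary rather than raising lets the agent pass failures to the response step, so the model can explain what went wrong.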
2. Memory Management System
Implement persistent memory that maintains context across sessions:
# memory.py
import json
import sqlite3
from typing import Dict


class MemoryStore:
    def __init__(self, db_path: str):
        self.db_path = db_path
        self.init_database()

    def init_database(self):
        """Create the conversations and messages tables if they do not exist."""
        conn = sqlite3.connect(self.db_path)
        conn.execute('''
            CREATE TABLE IF NOT EXISTS conversations (
                id TEXT PRIMARY KEY,
                created_at TIMESTAMP,
                last_active TIMESTAMP,
                context_summary TEXT
            )
        ''')
        conn.execute('''
            CREATE TABLE IF NOT EXISTS messages (
                id INTEGER PRIMARY KEY,
                conversation_id TEXT,
                role TEXT,
                content TEXT,
                timestamp TIMESTAMP,
                metadata TEXT,
                FOREIGN KEY (conversation_id) REFERENCES conversations (id)
            )
        ''')
        conn.commit()
        conn.close()

    async def get_context(self, conversation_id: str) -> Dict:
        # Note: sqlite3 calls block the event loop. That is acceptable for a
        # single-user assistant; move them to a worker thread under load.
        conn = sqlite3.connect(self.db_path)

        # Get the 20 most recent messages (newest first; id breaks timestamp ties)
        messages = conn.execute('''
            SELECT role, content, timestamp, metadata
            FROM messages
            WHERE conversation_id = ?
            ORDER BY timestamp DESC, id DESC
            LIMIT 20
        ''', (conversation_id,)).fetchall()

        # Get conversation summary
        summary = conn.execute('''
            SELECT context_summary
            FROM conversations
            WHERE id = ?
        ''', (conversation_id,)).fetchone()
        conn.close()

        return {
            'messages': [
                {
                    'role': msg[0],
                    'content': msg[1],
                    'timestamp': msg[2],
                    'metadata': json.loads(msg[3] or '{}')
                }
                # Flip back to chronological order for the model
                for msg in reversed(messages)
            ],
            'summary': summary[0] if summary else None
        }
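The agent in step 1 calls `memory.store_exchange(...)`, but `MemoryStore` above stops at `get_context`. Here is a minimal sketch of the write path, assuming the same two-table schema. It is a standalone function that takes `db_path`, so it runs on its own. The choice to store tool results as JSON in the assistant message's `metadata` column, and to store timestamps as ISO-8601 UTC strings, are assumptions rather than anything the original design specifies.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, List


def store_exchange(db_path: str, conversation_id: str, user_input: str,
                   response: str, results: List[Any]) -> None:
    """Persist one user/assistant exchange (hypothetical helper)."""
    now = datetime.now(timezone.utc).isoformat()
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            # Create the conversation row on first use, then bump last_active.
            conn.execute(
                "INSERT OR IGNORE INTO conversations (id, created_at, last_active) "
                "VALUES (?, ?, ?)",
                (conversation_id, now, now),
            )
            conn.execute(
                "UPDATE conversations SET last_active = ? WHERE id = ?",
                (now, conversation_id),
            )
            # The user row is inserted first, so it gets the lower id.
            conn.executemany(
                "INSERT INTO messages "
                "(conversation_id, role, content, timestamp, metadata) "
                "VALUES (?, ?, ?, ?, ?)",
                [
                    (conversation_id, "user", user_input, now, "{}"),
                    (conversation_id, "assistant", response, now,
                     json.dumps({"tool_results": results}, default=str)),
                ],
            )
    finally:
        conn.close()
```

Both rows share one timestamp, so any query that reads them back in order should break ties on `id`.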
3. Plugin System Architecture
Build the modular tool interface that enables dynamic capability expansion:
# plugins.py
import importlib.util  # plain "import importlib" does not guarantee .util is loaded
import inspect
import os
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Plugin(ABC):
    @property
    @abstractmethod
    def name(self) -> str:
        pass

    @property
    @abstractmethod
    def description(self) -> str:
        pass

    @abstractmethod
    async def execute(self, parameters: Dict[str, Any]) -> Any:
        pass

    @abstractmethod
    def get_schema(self) -> Dict:
        pass


class PluginManager:
    def __init__(self, plugins_dir: str):
        self.plugins_dir = plugins_dir
        self.plugins: Dict[str, Plugin] = {}
        self.load_plugins()

    def load_plugins(self):
        # Import every .py file in the plugins directory as its own module
        for filename in sorted(os.listdir(self.plugins_dir)):
            if filename.endswith('.py') and filename != '__init__.py':
                module_name = filename[:-3]
                spec = importlib.util.spec_from_file_location(
                    module_name,
                    os.path.join(self.plugins_dir, filename)
                )
                module = importlib.util.module_from_spec(spec)
                spec.loader.exec_module(module)

                # Find concrete Plugin subclasses and register one instance each
                for attr_name in dir(module):
                    attr = getattr(module, attr_name)
                    if (isinstance(attr, type) and
                            issubclass(attr, Plugin) and
                            attr is not Plugin and
                            not inspect.isabstract(attr)):
                        plugin_instance = attr()
                        self.plugins[plugin_instance.name] = plugin_instance

    def get_available_tools(self) -> List[Dict]:
        return [
            {
                'name': plugin.name,
                'description': plugin.description,
                'schema': plugin.get_schema()
            }
            for plugin in self.plugins.values()
        ]
4. Example Plugin Implementation
Create a web search plugin that follows the standard interface:
# plugins/web_search.py
import aiohttp

# Resolves to plugins.py. Keep the plugins/ directory free of __init__.py,
# or the package would shadow the module and this import would fail.
from plugins import Plugin


class WebSearchPlugin(Plugin):
    @property
    def name(self) -> str:
        return "web_search"

    @property
    def description(self) -> str:
        return "Search the web for current information"

    async def execute(self, parameters):
        query = parameters.get('query')
        max_results = parameters.get('max_results', 5)

        # Use your preferred search API
        async with aiohttp.ClientSession() as session:
            url = "https://api.search.brave.com/res/v1/web/search"
            # Placeholder key: load the real one from configuration, not source code
            headers = {"X-Subscription-Token": "your_api_key"}
            params = {"q": query, "count": max_results}
            async with session.get(url, headers=headers, params=params) as response:
                response.raise_for_status()  # surface HTTP errors instead of parsing them
                data = await response.json()

        # Keep only the fields the agent needs
        results = []
        for item in data.get('web', {}).get('results', []):
            results.append({
                'title': item.get('title'),
                'url': item.get('url'),
                'description': item.get('description')
            })
        return {'results': results, 'query': query}

    def get_schema(self):
        return {
            'type': 'object',
            'properties': {
                'query': {'type': 'string', 'description': 'Search query'},
                'max_results': {'type': 'integer', 'description': 'Maximum results to return'}
            },
            'required': ['query']
        }
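Every plugin publishes a schema, but nothing shown so far checks the model's parameters against it before `execute` runs. Here is a minimal sketch of such a check, covering only the `required` list and the basic `type` field used by the schema above. The `validate_parameters` helper is hypothetical; a full JSON Schema validator would cover far more.

```python
from typing import Any, Dict, List

# JSON Schema type names mapped to the Python types they accept
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_parameters(schema: Dict[str, Any], parameters: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the parameters pass."""
    errors = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in parameters:
            errors.append(f"missing required parameter: {key}")
    for key, value in parameters.items():
        spec = properties.get(key)
        if spec is None:
            errors.append(f"unexpected parameter: {key}")
            continue
        type_name = spec.get("type")
        expected = _TYPES.get(type_name)
        if expected is None:
            continue  # no type declared, or one this sketch does not know
        # bool is a subclass of int in Python, so reject it for numeric types
        is_bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"parameter {key} should be of type {type_name}")
    return errors
```

Run it between planning and execution, and feed any errors back to the model so it can repair the step.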
5. Execution Planning
Implement the planning system that sequences tool calls intelligently:
# planner.py
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExecutionStep:
    tool_name: str
    parameters: Dict
    depends_on: Optional[List[str]] = None  # step_ids that must finish first
    step_id: Optional[str] = None


class ExecutionPlanner:
    def __init__(self, model_client, plugin_manager):
        self.model = model_client
        self.plugins = plugin_manager

    async def create_plan(self, user_input: str, context: Dict) -> List[ExecutionStep]:
        available_tools = self.plugins.get_available_tools()

        # Doubled braces ({{ }}) are literal braces inside an f-string
        planning_prompt = f"""
        User request: {user_input}
        Available tools: {json.dumps(available_tools, indent=2)}
        Create a step-by-step execution plan. Each step should use one tool.
        Consider dependencies between steps.
        Respond with a JSON array of steps:
        [
          {{
            "step_id": "step_1",
            "tool_name": "web_search",
            "parameters": {{"query": "Python tutorials"}},
            "depends_on": []
          }}
        ]
        """
        response = await self.model.complete(planning_prompt)

        # Assumes the model returns bare JSON; json.loads raises otherwise
        steps_data = json.loads(response)
        return [ExecutionStep(**step) for step in steps_data]
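The planner records `depends_on` for each step, but nothing shown so far uses it to decide run order. One way to do that is a topological sort, sketched here with the standard library's `graphlib` (Python 3.9+). The `order_steps` helper and its plain-dictionary input are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def order_steps(steps: List[Dict]) -> List[str]:
    """Return step_ids in an order that runs every dependency first."""
    # Map each step to the set of steps it depends on
    graph = {s["step_id"]: set(s.get("depends_on") or []) for s in steps}
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise ValueError(f"plan references undefined steps: {sorted(unknown)}")
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle
        raise ValueError(f"plan has a dependency cycle: {exc.args[1]}") from exc


if __name__ == "__main__":
    plan = [
        {"step_id": "plan", "depends_on": ["summarize"]},
        {"step_id": "search", "depends_on": []},
        {"step_id": "summarize", "depends_on": ["search"]},
    ]
    print(order_steps(plan))
```

Rejecting cycles and dangling references up front means a malformed plan fails before any tool has run, not halfway through.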
Caveats
Model quality determines planning effectiveness. The execution planner relies on the language model understanding tool capabilities and sequencing logic. Weaker models create inefficient plans or miss dependencies. For complex workflows, plan on a model at roughly GLM-5.1 level or better.
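Whatever model you use, treat its plan as untrusted input. Here is a minimal sketch of a defensive parser that strips a Markdown code fence if the model added one, then rejects malformed plans and unknown tools. The `parse_plan` helper and its error messages are hypothetical.

```python
import json
from typing import Dict, List, Set

FENCE = "`" * 3  # Markdown code-fence marker
REQUIRED_KEYS = {"step_id", "tool_name", "parameters"}


def parse_plan(raw: str, tool_names: Set[str]) -> List[Dict]:
    """Parse model output into plan steps, raising ValueError on anything suspect."""
    # Drop fence lines so fenced and bare JSON parse the same way
    lines = [ln for ln in raw.strip().splitlines() if not ln.strip().startswith(FENCE)]
    try:
        steps = json.loads("\n".join(lines))
    except json.JSONDecodeError as exc:
        raise ValueError(f"plan is not valid JSON: {exc}") from exc
    if not isinstance(steps, list):
        raise ValueError("plan must be a JSON array of steps")
    for index, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {index} is not an object")
        missing = REQUIRED_KEYS - set(step)
        if missing:
            raise ValueError(f"step {index} is missing keys: {sorted(missing)}")
        if step["tool_name"] not in tool_names:
            raise ValueError(f"step {index} uses unknown tool: {step['tool_name']}")
    return steps
```

On a `ValueError`, one reasonable policy is to send the error message back to the model and ask for a corrected plan, with a cap on retries.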
Memory storage grows indefinitely. The SQLite implementation accumulates conversation history permanently. Add cleanup routines for conversations older than 30 days or implement conversation archiving to prevent database bloat.
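Here is a minimal sketch of such a cleanup routine, assuming the two-table schema from step 2 and `last_active` values stored as ISO-8601 UTC strings, which compare correctly as text. The `delete_stale_conversations` helper is hypothetical. Adapt the comparison if you store timestamps in another format.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def delete_stale_conversations(db_path: str, max_age_days: int = 30) -> int:
    """Delete conversations idle longer than max_age_days; return how many were removed."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=max_age_days)).isoformat()
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one transaction: messages and conversations go together
            # Remove the messages first, while the parent rows still identify them
            conn.execute(
                "DELETE FROM messages WHERE conversation_id IN "
                "(SELECT id FROM conversations WHERE last_active < ?)",
                (cutoff,),
            )
            cursor = conn.execute(
                "DELETE FROM conversations WHERE last_active < ?", (cutoff,)
            )
            return cursor.rowcount
    finally:
        conn.close()
```

Run it on agent startup or from a daily scheduled job. To archive instead of delete, copy the matching rows to another database file before removing them.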
Plugin isolation remains minimal. Plugins execute in the same Python process with full system access. Malicious or buggy plugins can crash the entire agent. Consider sandboxing for production deployments handling untrusted plugins.
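A first step toward isolation is running a plugin in its own interpreter process, so a crash or hang cannot take the agent down with it. Here is a minimal sketch, assuming a hypothetical convention where a plugin script reads its parameters as JSON on stdin and prints its result as JSON on stdout. This contains failures only. It is not a security sandbox, because the child process still has your user's file and network access.

```python
import asyncio
import json
import sys
from typing import Any, Dict


async def run_plugin_process(script_path: str, parameters: Dict[str, Any],
                             timeout_s: float = 10.0) -> Dict[str, Any]:
    """Run a plugin script in a child interpreter with a time limit."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, script_path,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        # Send the parameters, then wait for the child to exit
        stdout, stderr = await asyncio.wait_for(
            proc.communicate(json.dumps(parameters).encode()), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        proc.kill()        # a hung plugin is terminated, not awaited forever
        await proc.wait()  # reap the child so it does not linger
        return {"ok": False, "error": f"timed out after {timeout_s}s"}
    if proc.returncode != 0:
        return {"ok": False, "error": stderr.decode(errors="replace").strip()}
    try:
        return {"ok": True, "output": json.loads(stdout)}
    except json.JSONDecodeError:
        return {"ok": False, "error": "plugin did not print valid JSON"}
```

For genuinely untrusted plugins, layer operating-system controls on top, such as containers or a restricted user account.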
Philosophy
Building your own agent architecture creates compound advantages over time. Each plugin you add can be chained with every tool already installed, so capabilities grow faster than the plugin count. The memory system accumulates your preferences and work patterns. With that history in context, the execution planner can sequence tasks better for your specific use cases.
The Hermes architecture pattern scales from personal assistant to team automation platform. Start with web search and file operations. Add calendar integration, code analysis, and deployment tools. The modular design grows with your needs while maintaining reliability.
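You own the orchestration layer. The agent loop, the memory store, and the plugin interface are your code, so no framework vendor can deprecate a feature out from under you. The model endpoint and any external APIs your plugins call still bring their own rate limits and terms, unless you self-host those too.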
Build Yours
Start with the core agent framework and memory system. Build one plugin. Test the execution planning with simple two-step workflows. The architecture becomes clear once you see it running.
What’s the first capability you’ll add to your agent? Drop your plugin ideas in the comments.


