Gary Wu

Autonomous Agent Frameworks


Org Status: 🟡 Dormant Cloudflare: N/A Last Audited: 2026-04-28


The autonomous agent ecosystem exploded in early 2026. OpenClaw became the fastest-growing open-source project in GitHub history, crossing 302K stars in 60 days. A wave of lightweight alternatives followed — NanoClaw, PicoClaw, NullClaw, ZeroClaw, NanoBot, TinyClaw — each making different tradeoffs around memory, skill systems, communication patterns, and execution models. Meanwhile, enterprise frameworks like LangGraph, CrewAI, AutoGen, and the OpenAI Agents SDK matured toward production readiness.

This article systematically compares 18 agent frameworks across every dimension that matters: architecture, memory, skills, communication, execution, state management, and ecosystem. The goal is not to crown a winner — it is to map the design space so you can pick the right patterns for your system.

What you will learn:


  1. The Problem
  2. Framework Overview
  3. Deep Dives
  4. Master Feature Matrix
  5. Memory Systems Compared
  6. Skills and Tools Compared
  7. Communication Patterns Compared
  8. Execution Models Compared
  9. State Management Compared
  10. Ecosystem Compared
  11. Architecture Patterns That Emerge
  12. Implementation Deep Dives
  13. Anti-Patterns
  14. Recommendations by Use Case
  15. References

Building an autonomous agent that works reliably requires solving at least six hard problems simultaneously:

  1. Memory — How does the agent remember what happened last session? Last week? How does it decide what to forget?
  2. Skills — How do you teach an agent a repeatable procedure? Can skills be shared, versioned, composed?
  3. Communication — How do agents talk to each other? How does a human interrupt or redirect?
  4. Execution — Where does the agent run? How do you sandbox it? How do you control cost?
  5. State — Can an agent resume from a checkpoint? Can it survive a crash?
  6. Coordination — In multi-agent systems, who decides what runs next?

No single framework solves all six perfectly. Most are strong in one or two areas and weak in the rest. The frameworks that emerged in 2025-2026 represent the first generation of serious attempts at production-grade autonomous agents, and they make radically different design choices.

What changes if you get this right

A well-chosen agent architecture lets you:

What happens if you get it wrong


The Claw Family (OpenClaw-derived)

The single biggest event in the agent framework space was OpenClaw’s explosive growth. Peter Steinberger’s personal AI assistant went from obscure side project to 302K GitHub stars in 60 days, surpassing React’s 10-year record. Steinberger joined OpenAI in February 2026 and moved the project to an open-source foundation.

This spawned an entire family of lightweight alternatives:

| Framework | Language | Binary Size | RAM | Startup | Stars | Focus |
|---|---|---|---|---|---|---|
| OpenClaw | TypeScript | ~200MB (Node) | ~1GB | ~5s | 302K | Full-featured personal AI assistant |
| NanoClaw | TypeScript | ~50MB (Node) | ~200MB | ~3s | 22K | Container-isolated, Claude-native |
| NanoBot | Python | N/A | ~100MB | ~2s | 27K | Ultra-light, knowledge graph memory |
| PicoClaw | Go | <10MB | <10MB | <1s | 18K | Edge/IoT, $10 hardware |
| ZeroClaw | Rust | ~16MB | ~5MB | <50ms | 17K | Trait-driven, pluggable everything |
| NullClaw | Zig | 678KB | ~1MB | <2ms | 12K | Smallest possible, hardware peripherals |
| TinyClaw | TypeScript | ~60MB (Node) | ~300MB | ~3s | 8K | Multi-agent teams, collaboration |

Enterprise / Research Frameworks

| Framework | Language | Stars | Focus |
|---|---|---|---|
| LangGraph | Python/TS | 10K | Stateful agent graphs, checkpointing |
| CrewAI | Python | 46K | Multi-agent orchestration, role-based |
| AutoGen / Microsoft Agent Framework | Python/.NET | 38K | Enterprise multi-agent, async messaging |
| OpenAI Agents SDK | Python/TS | 15K | Production evolution of Swarm |
| MetaGPT | Python | 42K | SOP-driven software company simulation |
| Claude Agent SDK | Python/TS | N/A | Claude Code tooling as a framework |
| Pydantic AI | Python | 15K | Type-safe agents, structured output |
| Agent Zero | Python | 12K | Auto-learning, hierarchical subordinates |
| Cloudflare Agents SDK | TypeScript | N/A | Durable Objects as agents, edge-native |

Legacy / Educational

| Framework | Language | Stars | Status |
|---|---|---|---|
| AutoGPT | Python/TS | 170K | Pivoted to low-code platform |
| BabyAGI | Python | 20K | Experimental, self-building |
| SuperAGI | Python | 15K | Stalled since Jan 2024 |
| Swarm (OpenAI) | Python | 18K | Replaced by Agents SDK |

OpenClaw

What it is: The most popular autonomous AI agent framework. Runs locally, connects to 20+ messaging platforms, and provides a hub-and-spoke architecture with a local WebSocket gateway.

Strengths:

Weaknesses:

Architecture:

User (WhatsApp/Telegram/Slack/...)
        |
Local Gateway (ws://127.0.0.1:18789)
        |
Agent Runtime Sessions
  |-- SOUL.md      (personality)
  |-- AGENTS.md    (behavior rules, injected every turn)
  |-- MEMORY.md    (long-term curated facts)
  |-- memory/      (daily logs, temporal decay)
  |-- skills/      (SKILL.md playbooks, selectively injected)
  |-- SQLite       (sessions, search index)

Skill definition example:

---
name: deploy-to-cloudflare
description: Deploy a Cloudflare Workers project using Wrangler
requires:
  binaries: ["wrangler", "node"]
  env: ["CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_API_TOKEN"]
tags: ["deployment", "cloudflare", "infrastructure"]
---


1. Verify wrangler.jsonc exists in the project root
2. Run `wrangler whoami` to verify authentication
3. Run `wrangler deploy` and capture output
4. Verify deployment by checking the output URL
5. If deployment fails, read error output and attempt fix


- If CLOUDFLARE_API_TOKEN is missing, tell the user
- If wrangler.jsonc is missing, check for wrangler.toml and convert
- If deployment fails with route conflict, suggest manual resolution

Key insight: OpenClaw proved that skills-as-markdown is a viable pattern. No SDK, no compilation, no dependency management. Just a folder with a SKILL.md file. This pattern has been adopted by nearly every framework in the Claw family.


NanoClaw

What it is: A container-isolated, Claude-native alternative to OpenClaw. ~500 lines of core TypeScript. Each agent runs in its own Linux container with filesystem isolation.

Strengths:

Weaknesses:

Architecture:

Platform -> Channel.onMessage() -> storeMessage(SQLite)
  -> MessageLoop polls -> GroupQueue enqueues
  -> runContainerAgent() spawns container
  -> Claude Agent SDK processes
  -> IPC files written -> Host polls & routes
  -> Channel delivers to platform

IPC structure:

data/ipc/{group}/
  |-- messages/            # Outbound message JSON files
  |-- tasks/               # Schedule/pause/cancel task JSONs
  |-- current_tasks.json   # Host -> container snapshot
  |-- available_groups.json

Tool definition example (MCP tools inside container):

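// NanoClaw exposes these as MCP tools to Claude inside each container
import { readFile, writeFile } from "node:fs/promises";

// currentGroup is assumed to be provided by the container runtime
const tools = {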
  send_message: async (params: { group: string; text: string }) => {
    // Write JSON to data/ipc/{group}/messages/
    await writeFile(
      `data/ipc/${params.group}/messages/${Date.now()}.json`,
      JSON.stringify({ text: params.text, timestamp: new Date().toISOString() })
    );
  },
  schedule_task: async (params: { name: string; cron: string; prompt: string }) => {
    await writeFile(
      `data/ipc/${currentGroup}/tasks/${params.name}.json`,
      JSON.stringify({ action: "schedule", cron: params.cron, prompt: params.prompt })
    );
  },
  list_tasks: async () => {
    const tasks = await readFile(`data/ipc/${currentGroup}/current_tasks.json`, "utf8");
    return JSON.parse(tasks);
  },
};

Key insight: NanoClaw proves you can build a production-capable agent system in ~500 lines by standing on the Claude Agent SDK. The “skills over features” model — where customization means code changes, not configuration — avoids the configuration sprawl that plagues larger frameworks.
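
The host side of this filesystem IPC can be sketched in a few lines of Python. This is an illustration of the polling pattern, not NanoClaw's actual host code: `drain_outbox` and `poll_forever` are hypothetical names, and the directory layout is the one from the IPC structure above.

```python
import json
import time
from pathlib import Path
from typing import Callable


def drain_outbox(ipc_root: Path, group: str) -> list[dict]:
    """Read and remove every pending message file for one group, oldest first."""
    outbox = ipc_root / group / "messages"
    if not outbox.is_dir():
        return []
    delivered = []
    # The tool example names files by millisecond timestamp, so sorting by
    # name yields delivery order as long as the names have equal length.
    for path in sorted(outbox.glob("*.json")):
        try:
            payload = json.loads(path.read_text())
        except json.JSONDecodeError:
            # Probably a partially written file; leave it for the next poll.
            continue
        delivered.append(payload)
        path.unlink()
    return delivered


def poll_forever(ipc_root: Path, group: str,
                 deliver: Callable[[dict], None], interval: float = 1.0) -> None:
    """Poll the outbox on a fixed interval and hand each message to `deliver`."""
    while True:
        for message in drain_outbox(ipc_root, group):
            deliver(message)
        time.sleep(interval)
```

The trade-off is visible in the code: there is no broker to run and every message is inspectable on disk, but delivery latency is bounded below by the poll interval.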


NanoBot (HKUDS)

What it is: An ultra-lightweight Python alternative to OpenClaw. ~4,000 lines. Delivers core agent functionality with 99% less code than OpenClaw.

Strengths:

Weaknesses:

Memory example:



PicoClaw

What it is: An ultra-lightweight Go-based agent. <10MB RAM, boots in under 1 second, runs on $10 RISC-V hardware. 95% of core code is AI-generated through a self-bootstrapping process.

Strengths:

Weaknesses:

Use case: Edge computing, IoT devices, Raspberry Pi, self-hosted agents on cheap hardware.


NullClaw

What it is: The smallest possible autonomous agent. 678KB binary, ~1MB RAM, <2ms startup. Written in raw Zig with zero dependencies — no Python, no JVM, no Go runtime.

Strengths:

Weaknesses:

Architecture pattern:

// NullClaw's vtable-driven extension model
// Every subsystem implements a simple interface

const ProviderVTable = struct {
    init: *const fn (config: *const Config) anyerror!void,
    complete: *const fn (messages: []const Message) anyerror!Response,
    embed: *const fn (text: []const u8) anyerror![]f32,
    deinit: *const fn () void,
};

const ChannelVTable = struct {
    init: *const fn (config: *const Config) anyerror!void,
    receive: *const fn () anyerror!?InboundMessage,
    send: *const fn (message: OutboundMessage) anyerror!void,
    deinit: *const fn () void,
};

// Register a new provider:
pub fn registerProvider(name: []const u8, vtable: ProviderVTable) void {
    provider_registry.put(name, vtable);
}

Key insight: NullClaw proves that a full-featured agent runtime (providers, channels, tools, memory, sandbox) can fit in under 1MB. The vtable pattern keeps extensibility cheap: each subsystem is a plain struct of function pointers, so a call costs one indirect jump, with no reflection, no runtime type information, and no allocation.


ZeroClaw

What it is: A Rust-based agent runtime. ~16MB binary, ~5MB RAM. Trait-driven architecture where every subsystem is swappable.

Strengths:

Weaknesses:

Memory configuration:

[memory]
backend = "sqlite"
vector_weight = 0.7
keyword_weight = 0.3
embedding_model = "nomic-embed-text"
embedding_cache_size = 10000
auto_recall = true

[memory.retention]
short_term_hours = 24
long_term_days = 365
decay_half_life_days = 30

[skills]
paths = ["./skills", "~/.zeroclaw/skills"]
community_registry = "https://registry.zeroclawlabs.ai"

TinyClaw

What it is: A multi-agent team collaboration framework. Agents work in teams, communicate through persistent chat rooms, and hand off work via chain execution and fan-out patterns.

Strengths:

Weaknesses:

Multi-agent team definition:

// TinyClaw team configuration
const team = {
  id: "content-team",
  agents: [
    {
      id: "researcher",
      model: "claude-sonnet-4-5-20250514",
      systemPrompt: "You are a research specialist...",
      tools: ["web_search", "read_file", "write_file"],
    },
    {
      id: "writer",
      model: "claude-sonnet-4-5-20250514",
      systemPrompt: "You are a content writer...",
      tools: ["read_file", "write_file", "send_message"],
    },
    {
      id: "reviewer",
      model: "claude-haiku-3-5-20241022",
      systemPrompt: "You are a content reviewer...",
      tools: ["read_file", "send_message"],
    },
  ],
  // Persistent team chat room -- all agents see all messages
  chatRoom: {
    persistence: "sqlite",
    broadcastAll: true,
  },
};

// Fan-out: researcher mentions @writer and @reviewer
// Both receive the message and process in parallel
// Responses flow back to the team chat room
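
The fan-out described in the comments above can be sketched with `asyncio`. This is a Python illustration of the pattern rather than TinyClaw's implementation: `run_agent` is a hypothetical stand-in for a model call, and the `@mention` syntax is taken from the comment.

```python
import asyncio
import re

MENTION = re.compile(r"@([\w-]+)")


async def run_agent(agent_id: str, message: str) -> str:
    """Hypothetical stand-in for a model call made by one agent."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{agent_id} handled: {message}"


async def fan_out(message: str, team: set[str], chat_room: list[str]) -> list[str]:
    """Dispatch a message to every mentioned team member in parallel."""
    # Deduplicate mentions while keeping order; ignore names outside the team
    targets = [m for m in dict.fromkeys(MENTION.findall(message)) if m in team]
    replies = await asyncio.gather(*(run_agent(t, message) for t in targets))
    chat_room.extend(replies)  # responses flow back to the shared room
    return list(replies)
```

`asyncio.gather` returns results in the order the targets were mentioned, so the chat room stays deterministic even though the agents run concurrently.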

Agent Zero

What it is: A general-purpose AI agent with the best auto-learning system in the open-source space. Hierarchical superior-subordinate model with Docker-isolated execution.

Strengths:

Weaknesses:

Auto-learning memory flow:


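# After each conversation turn: a utility LLM extracts memorable items
extracted = utility_model.extract(
    conversation_turn,
    categories=["facts", "solutions", "preferences", "errors"]
)

# Embed each extracted item and add it to the vector index
for item in extracted:
    embedding = embedding_model.embed(item.text)
    faiss_index.add(embedding, metadata={
        "category": item.category,
        "timestamp": now(),
        "source_conversation": conversation_id,
    })

# At query time: embed the query, then retrieve the 10 nearest memories
query_embedding = embedding_model.embed(current_query)
memories = faiss_index.search(query_embedding, k=10)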

Key insight: Agent Zero’s auto-learning is the single most differentiating feature in the entire framework landscape. No other framework automatically extracts, categorizes, embeds, and consolidates knowledge from conversations. Every other system requires the user or developer to explicitly manage memory.


LangGraph

What it is: LangChain’s agent graph framework. Models agent workflows as state machines with nodes, edges, and reducers. Reached 1.0 GA in October 2025.

Strengths:

Weaknesses:

State and checkpointing example:

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.postgres import PostgresSaver
from typing import TypedDict, Annotated
from operator import add

class AgentState(TypedDict):
    messages: Annotated[list, add]  # Reducer: append new messages
    task_status: str
    research_results: list[dict]
    draft: str
    review_feedback: str

def research_node(state: AgentState) -> dict:
    """Research node - fetches data and updates state."""
    results = search_api(state["messages"][-1])
    return {
        "research_results": results,
        "task_status": "researched",
    }

def write_node(state: AgentState) -> dict:
    """Write node - drafts content from research."""
    draft = llm.invoke(f"Write based on: {state['research_results']}")
    return {"draft": draft, "task_status": "drafted"}

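def review_node(state: AgentState) -> dict:
    """Review node - human can interrupt here."""
    feedback = llm.invoke(f"Review: {state['draft']}")
    # Assumed convention: the reviewer replies with "APPROVED" when satisfied.
    # Without an "approved" status the conditional edge below never reaches END.
    text = getattr(feedback, "content", str(feedback))
    status = "approved" if "APPROVED" in text else "reviewed"
    return {"review_feedback": text, "task_status": status}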

graph = StateGraph(AgentState)
graph.add_node("research", research_node)
graph.add_node("write", write_node)
graph.add_node("review", review_node)
graph.add_edge(START, "research")
graph.add_edge("research", "write")
graph.add_edge("write", "review")
graph.add_conditional_edges("review", lambda s: END if s["task_status"] == "approved" else "write")

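# PostgresSaver is built from a connection string and used as a context manager
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()  # Create the checkpoint tables on first use
    app = graph.compile(checkpointer=checkpointer)

    config = {"configurable": {"thread_id": "article-draft-42"}}
    result = app.invoke({"messages": ["Write about agent frameworks"]}, config)

    # Same thread_id: the run continues from the saved checkpoint
    resumed = app.invoke({"messages": ["Add more code examples"]}, config)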

CrewAI

What it is: Multi-agent orchestration with role-based agent definitions. Dual architecture: Crews (autonomous teams) and Flows (event-driven workflows). 45.9K GitHub stars.

Strengths:

Weaknesses:

Agent and crew definition:

from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive information about autonomous agent frameworks",
    backstory="You are an experienced tech researcher...",
    tools=[web_search, file_reader],
    memory=True,
    verbose=True,
    llm="anthropic/claude-sonnet-4-5-20250514",
)

writer = Agent(
    role="Technical Writer",
    goal="Produce clear, detailed technical articles",
    backstory="You are a staff engineer who writes...",
    tools=[file_writer, markdown_formatter],
    memory=True,
    llm="anthropic/claude-sonnet-4-5-20250514",
)

research_task = Task(
    description="Research {topic} and compile findings",
    expected_output="Structured research notes with citations",
    agent=researcher,
)

writing_task = Task(
    description="Write a comprehensive article based on research",
    expected_output="A 2000+ word technical article in Markdown",
    agent=writer,
    context=[research_task],  # Writer gets researcher's output
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    memory=True,
    memory_config={
        "provider": "mem0",
        "config": {"vector_store": {"provider": "chroma"}},
    },
)

result = crew.kickoff(inputs={"topic": "agent memory systems"})

AutoGen / Microsoft Agent Framework

What it is: Microsoft’s multi-agent framework. AutoGen v0.4 was a complete rewrite with async event-driven architecture. Now merging with Semantic Kernel into the Microsoft Agent Framework, targeting GA by end of Q1 2026.

Strengths:

Weaknesses:

AutoGen v0.4 agent example:

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

model = OpenAIChatCompletionClient(model="gpt-4o")

coder = AssistantAgent(
    name="coder",
    model_client=model,
    system_message="You write Python code to solve tasks.",
)

reviewer = AssistantAgent(
    name="reviewer",
    model_client=model,
    system_message="You review code for bugs and improvements.",
)

termination = TextMentionTermination("APPROVED")

team = RoundRobinGroupChat(
    participants=[coder, reviewer],
    termination_condition=termination,
    max_turns=10,
)

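import asyncio

async def main() -> None:
    # team.run is a coroutine, so it has to be awaited inside an event loop
    result = await team.run(task="Write a function to calculate Fibonacci numbers")
    print(result)

asyncio.run(main())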

OpenAI Agents SDK

What it is: The production evolution of Swarm. Lightweight, provider-agnostic multi-agent framework with handoffs, guardrails, and tracing.

Strengths:

Weaknesses:

Agent with handoff:

from agents import Agent, Runner

# Specialists are defined first so the triage agent can reference them
research_agent = Agent(
    name="Research",
    instructions="Search for information and summarize findings.",
    tools=[web_search_tool],
)

coding_agent = Agent(
    name="Coding",
    instructions="Write and debug code.",
    tools=[code_execution_tool],
)

triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right specialist.",
    handoffs=[research_agent, coding_agent],  # Agent objects, not name strings
)

result = Runner.run_sync(triage_agent, "I need to find the best API for geocoding")

MetaGPT

What it is: A multi-agent framework that simulates a software company. Agents have roles (Product Manager, Architect, Engineer, QA) and collaborate through a shared message pool following Standard Operating Procedures (SOPs).

Strengths:

Weaknesses:


Claude Agent SDK

What it is: Anthropic’s framework for building agents on top of Claude Code. Provides the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript.

Strengths:

Weaknesses:


Cloudflare Agents SDK

What it is: Each agent is a Durable Object with built-in state persistence, scheduling, WebSocket communication, and queue integration. Edge-native.

Strengths:

Weaknesses:

import { Agent, AIChatAgent } from "@cloudflare/agents";

class ResearchAgent extends AIChatAgent<Env, AgentState> {
  async onChatMessage(onFinish: StreamCallbacks["onFinish"]) {
    const response = await generateText({
      model: anthropic("claude-sonnet-4-5-20250514"),
      system: this.system,
      messages: this.messages,
      tools: this.tools,
      onFinish: (result) => {
        // State auto-persists to Durable Object SQLite
        this.setState({
          ...this.state,
          lastResearch: result.text,
          researchCount: (this.state.researchCount || 0) + 1,
        });
        onFinish(result);
      },
    });
  }

  // Built-in scheduling
  async onAlarm() {
    // Runs on configured schedule
    await this.runDailyResearch();
  }
}

Pydantic AI

What it is: A type-safe agent framework built on Pydantic. Catches agent logic errors at development time through Python’s type system.

Strengths:

Weaknesses:


BabyAGI

What it is: A task-driven autonomous agent that runs a create-execute-reprioritize loop. The OG autonomous agent from 2023.

Current status: Evolved into an experimental self-building agent framework. The newest version is the agent that builds itself.

Historical significance: Proved the task loop pattern (create tasks -> execute -> reprioritize -> repeat) that influenced nearly every subsequent framework.
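
The task loop is small enough to sketch in full. The version below is a minimal Python illustration of create, execute, reprioritize, repeat, not BabyAGI's code: the three callables are hypothetical hooks where a real system would call an LLM, and `max_steps` is an added safety bound.

```python
from collections import deque
from typing import Callable


def run_task_loop(
    objective: str,
    execute: Callable[[str], str],
    create: Callable[[str, str], list[str]],
    prioritize: Callable[[list[str]], list[str]],
    max_steps: int = 10,
) -> list[tuple[str, str]]:
    """Run the create -> execute -> reprioritize loop until no tasks remain."""
    tasks = deque([objective])
    history: list[tuple[str, str]] = []
    while tasks and len(history) < max_steps:
        task = tasks.popleft()
        result = execute(task)                      # do the work
        history.append((task, result))
        tasks.extend(create(task, result))          # derive follow-up tasks
        tasks = deque(prioritize(list(tasks)))      # reorder the whole queue
    return history
```

The step cap matters in practice: because `create` can always return more tasks, an unbounded loop is the default failure mode of this pattern.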


AutoGPT

What it is: The original viral autonomous agent (170K stars). Has pivoted from fully autonomous execution to a low-code platform with a block-based agent builder.

Current status: Now a platform rather than a framework. Users build agents using modular blocks through a Next.js UI with FastAPI backend and PostgreSQL storage.


SuperAGI

What it is: A dev-first autonomous agent framework with a GUI, toolkit marketplace, and APM dashboard.

Current status: Stalled. No releases since January 2024. Issues go unanswered. The company has pivoted. Security vulnerabilities remain unaddressed. Do not use for new projects.


This is the comprehensive comparison across all frameworks and all dimensions.

Architecture

| Framework | Type | Language | Async | LLM Support | Binary/Runtime |
|---|---|---|---|---|---|
| OpenClaw | Single agent | TypeScript | Yes | Multi-provider (20+) | Node.js |
| NanoClaw | Single agent | TypeScript | Yes | Claude only | Node.js + Container |
| NanoBot | Single agent | Python | Yes | Multi-provider | Python |
| PicoClaw | Single agent | Go | Yes | Multi-provider (7+) | Static binary |
| ZeroClaw | Single agent | Rust | Yes | Multi-provider | Static binary |
| NullClaw | Single agent | Zig | Yes | Multi-provider (23+) | Static binary |
| TinyClaw | Multi-agent | TypeScript | Yes | Multi-provider | Node.js |
| Agent Zero | Hierarchical | Python | Yes | Multi-provider (3 slots) | Python + Docker |
| LangGraph | Graph-based | Python/TS | Yes | Multi-provider | Python/Node.js |
| CrewAI | Multi-agent | Python | Yes | Multi-provider | Python |
| AutoGen | Multi-agent | Python/.NET | Yes | Multi-provider | Python/.NET |
| OpenAI SDK | Multi-agent | Python/TS | Yes | Multi-provider (100+) | Python/Node.js |
| MetaGPT | Multi-agent | Python | Yes | Multi-provider | Python |
| Claude SDK | Single agent | Python/TS | Yes | Claude only | Python/Node.js |
| CF Agents | Single/Multi | TypeScript | Yes | Multi-provider | Cloudflare Workers |
| Pydantic AI | Single agent | Python | Yes | Multi-provider (10+) | Python |
| BabyAGI | Single agent | Python | No | OpenAI primary | Python |
| AutoGPT | Platform | Python/TS | Yes | Multi-provider | Docker |

Memory Systems

| Framework | Short-Term | Long-Term | Vector Search | Keyword Search | Auto-Learning | Shared Memory | Temporal Decay |
|---|---|---|---|---|---|---|---|
| OpenClaw | Session JSONL | MEMORY.md + daily logs | sqlite-vec | BM25 | No | No (opt-in IPC) | 30-day half-life |
| NanoClaw | SQLite | Per-group CLAUDE.md | No | No | No | No | No |
| NanoBot | In-memory | Knowledge graph | Planned | Planned | Graph-based | No | No |
| PicoClaw | In-memory | File-based | Planned | Planned | No | No | No |
| ZeroClaw | In-memory | SQLite/Markdown | Cosine sim (0.7) | FTS5 BM25 (0.3) | No | No | Configurable |
| NullClaw | In-memory | SQLite | Cosine sim | FTS5 BM25 | No | No | No |
| TinyClaw | Per-agent | Team chat history | No | No | No | Team chat rooms | No |
| Agent Zero | Context window | FAISS vectors | FAISS IndexFlatIP | No | Yes (auto) | Subordinate inheritance | No (consolidation instead) |
| LangGraph | Graph state | Checkpointer (PG/SQLite) | Via LangChain | Via LangChain | No | Shared graph state | No |
| CrewAI | Short-term memory | Long-term w/ vector DB | Chroma/custom | Hybrid | No | Crew-level shared | Configurable half-life |
| AutoGen | Session-based | Configurable | Via plugins | Via plugins | No | Group chat history | No |
| OpenAI SDK | Session | Session (auto-managed) | No built-in | No built-in | No | Via handoff context | No |
| MetaGPT | Role context | Shared message pool | No built-in | No built-in | No | Shared message pool | No |
| Claude SDK | Conversation | File-based (CLAUDE.md) | No built-in | No built-in | No | No | No |
| CF Agents | DO SQLite state | DO SQLite + Vectorize | Vectorize | No built-in | No | DO RPC | No |
| Pydantic AI | Conversation | Durable execution | Via integration | Via integration | No | No | No |
| BabyAGI | Task list | Pinecone vectors | Pinecone | No | No | No | No |
| AutoGPT | Block state | PostgreSQL | Via blocks | Via blocks | No | No | No |

Skills and Tools

| Framework | Skill Format | Composable | Shared Registry | Built-in Tools | Custom Skills |
|---|---|---|---|---|---|
| OpenClaw | SKILL.md (Markdown + YAML) | Yes | 5,400+ skills | Shell, browser, file, memory | Drop-in folder |
| NanoClaw | Code changes | Yes | No | MCP tools (send, schedule, list) | Fork and modify |
| NanoBot | ClawHub skills | Yes | ClawHub registry | Shell, file, MCP | MCP + ClawHub |
| PicoClaw | Config-based | Limited | No | Shell, file, messaging | Config |
| ZeroClaw | TOML manifests | Yes | Community registry | Shell, file, memory, git, browser, cron | TOML + folder |
| NullClaw | Vtable interface | Yes | No | 18+ (file, shell, memory, browser, hw) | Zig interface impl |
| TinyClaw | Agent config | Yes | No | Read, write, search, message | TypeScript functions |
| Agent Zero | Python functions | Yes | No | Shell, browser, code exec, delegate | Python functions |
| LangGraph | Python functions | Yes | LangChain hub | LangChain tools ecosystem | Python/TS functions |
| CrewAI | Python decorators | Yes | 7,000+ tools | 600+ integrations | Python decorators |
| AutoGen | Python/.NET | Yes | No | Code execution, web | Functions + plugins |
| OpenAI SDK | Functions/MCP | Yes | No | Functions, MCP, hosted tools | Functions |
| MetaGPT | Role actions | Yes | No | Code, write, design, review | Python classes |
| Claude SDK | Python/TS functions + MCP | Yes | MCP ecosystem | Read, Write, Edit, Bash, + more | Functions + MCP |
| CF Agents | TypeScript + MCP | Yes | MCP ecosystem | DO state, schedule, queue, SQL | TS functions + MCP |
| Pydantic AI | Typed Python functions | Yes | MCP/A2A | MCP ecosystem | Typed functions |
| BabyAGI | Python functions | Limited | No | Search, code execution | Python functions |
| AutoGPT | Blocks (low-code) | Yes | Marketplace | Web, file, code, AI | Block builder |

Communication Patterns

| Framework | Agent-to-Agent | Human-in-Loop | Message Passing | Event System |
|---|---|---|---|---|
| OpenClaw | Opt-in IPC (sessions_send/spawn) | Chat interface | WebSocket gateway | No |
| NanoClaw | Filesystem IPC (JSON polling) | Chat interface | JSON files, 1s poll | No |
| NanoBot | No | Chat interface | Messaging platforms | No |
| PicoClaw | Planned | Chat interface | Gateway command | No |
| ZeroClaw | No | Chat interface | Channels | No |
| NullClaw | No | Chat interface | 18 channels | Webhooks |
| TinyClaw | Team chat rooms (broadcast) | Chat interface | Actor model + message queue | Event feed |
| Agent Zero | Subordinate spawning | Superior chain (human at top) | Direct invocation | No |
| LangGraph | Graph edges | Interrupt nodes | State reducers | No |
| CrewAI | Task delegation | Callback system | Task context passing | Flows (event-driven) |
| AutoGen | Async messaging | Chat interface | Pub/sub, request/response | OpenTelemetry |
| OpenAI SDK | Handoffs | Human-in-loop API | Handoff functions | Tracing |
| MetaGPT | Shared message pool (pub/sub) | No | Publish/subscribe by role | No |
| Claude SDK | No (single agent) | Hooks | MCP | No |
| CF Agents | DO RPC | WebSocket | DO RPC, Queues, Workflows | Queue events |
| Pydantic AI | No (single agent) | No | MCP, A2A | No |
| BabyAGI | No | No | Task list | No |
| AutoGPT | Block connections | UI approval | Block graph | Webhooks |

Execution Model

| Framework | Runs On | Sandboxing | Cost Control | Parallel Exec | Error Handling |
|---|---|---|---|---|---|
| OpenClaw | Local machine | None (runs as user) | Model fallback chains | No | Retry + fallback |
| NanoClaw | Local + containers | Linux containers | Claude only | Per-group parallel | Container restart |
| NanoBot | Local machine | None | Model selection | No | Retry |
| PicoClaw | Any ($10 hw) | None | Cheap models | No | Retry |
| ZeroClaw | Any (self-hosted) | Workspace isolation | Tool allowlists | No | Retry |
| NullClaw | Any (embedded) | Multi-layer sandbox | Minimal by design | No | Retry |
| TinyClaw | Local + Docker | Docker (via tinyclaw-infra) | Model per agent | Fan-out parallel | Actor supervision |
| Agent Zero | Docker containers | Docker per agent | 3-slot model mixing | Subordinate parallel | Subordinate retry |
| LangGraph | Local/cloud | None built-in | Token tracking | Parallel branches | Checkpoint recovery |
| CrewAI | Local/cloud | None built-in | Token tracking | Parallel tasks | Task retry |
| AutoGen | Local/cloud/Azure | Code execution sandbox | Token tracking | Async agents | Retry + fallback |
| OpenAI SDK | Local/cloud | None built-in | Token tracking | Async agents | Guardrails |
| MetaGPT | Local | Docker for code exec | Token tracking | Role parallel | SOP recovery |
| Claude SDK | Local | Tool permissions | API pricing | No | Retry |
| CF Agents | Cloudflare edge | DO isolation | DO CPU/memory limits | Queue workers | Alarm retry + DLQ |
| Pydantic AI | Local/cloud | None built-in | Token tracking | No | Durable execution |
| BabyAGI | Local | None | Token tracking | No | Loop retry |
| AutoGPT | Docker | Docker | Credit system | Block parallel | Block retry |

Memory is the most differentiated axis across frameworks. Here is a detailed breakdown.

Memory Architecture Patterns

Four distinct patterns have emerged:

Pattern 1: File-Based Memory (OpenClaw, NanoClaw, Claude SDK)

workspace/
  MEMORY.md          # Curated long-term facts (append-only)
  SOUL.md            # Agent personality (rarely changes)
  AGENTS.md          # Behavior rules (injected every prompt)
  memory/
    2026-03-16.md    # Daily activity log
    2026-03-15.md    # Yesterday (also loaded at startup)

Pros: Human-readable, version-controllable, easy to debug
Cons: No semantic search without additional index, linear scan
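
Loading this kind of memory is mostly file concatenation. The Python sketch below assumes the workspace layout shown above and the "today plus yesterday" startup rule from its comments; `load_memory_context` is a hypothetical helper, not an API from any of these frameworks.

```python
from datetime import date, timedelta
from pathlib import Path


def load_memory_context(workspace: Path, today: date) -> str:
    """Concatenate the curated memory file with the two most recent daily logs."""
    yesterday = today - timedelta(days=1)
    candidates = [
        workspace / "MEMORY.md",
        workspace / "memory" / f"{today.isoformat()}.md",
        workspace / "memory" / f"{yesterday.isoformat()}.md",
    ]
    sections = []
    for path in candidates:
        if path.is_file():  # missing logs are simply skipped
            sections.append(f"## {path.name}\n{path.read_text().strip()}")
    return "\n\n".join(sections)
```

Because the result is plain text, it can be diffed, committed, and pasted into a bug report, which is the main argument for this pattern.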

Pattern 2: Vector Database Memory (Agent Zero, BabyAGI, CrewAI)

memory_store = FAISSMemory(
    index_type="IndexFlatIP",  # Inner product; equals cosine similarity when vectors are L2-normalized
    dimension=1536,
    auto_extract=True,         # LLM extracts facts after each turn
    auto_consolidate=True,     # Merge related memories over time
    categories=["facts", "solutions", "preferences", "errors"],
)

Pros: Semantic retrieval, scales to millions of memories
Cons: Opaque (hard to inspect), embedding model dependency, cost

Pattern 3: Hybrid Search (OpenClaw, ZeroClaw, NullClaw)

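-- Hybrid search: vector similarity + BM25 keyword matching
-- ZeroClaw's SQLite implementation (sketch). cosine_similarity is assumed to be
-- an application-registered SQL function; it is not a SQLite built-in.
-- memories_fts.rowid is assumed to match memories.id.

WITH vec AS (
  -- Vector search (70% weight)
  SELECT id, content,
         cosine_similarity(embedding, :query_embedding) AS vec_score
  FROM memories
),
kw AS (
  -- Keyword search (30% weight)
  -- FTS5 bm25() returns lower (more negative) values for better matches,
  -- so it is negated here to make higher mean better
  SELECT rowid AS id,
         -bm25(memories_fts) AS kw_score
  FROM memories_fts
  WHERE memories_fts MATCH :query_text
)
-- Combined score; rows with no keyword hit get a keyword score of 0
SELECT vec.id, vec.content,
       (0.7 * vec.vec_score + 0.3 * COALESCE(kw.kw_score, 0)) AS combined_score
FROM vec
LEFT JOIN kw ON kw.id = vec.id
ORDER BY combined_score DESC
LIMIT 10;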

Pros: Best of both worlds, catching semantic similarity AND exact matches
Cons: More complex, requires both embedding model and FTS index
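
The same weighting can be shown without a database. The Python sketch below is a toy version of hybrid ranking under two stated assumptions: the keyword score is a simple term-overlap fraction standing in for BM25, and both scores are already on a 0 to 1 scale so the 0.7/0.3 weights are meaningful. None of the function names come from ZeroClaw or OpenClaw.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms present in the text (crude stand-in for BM25)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_rank(
    query: str,
    query_vec: list[float],
    docs: list[tuple[str, list[float]]],
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
) -> list[tuple[float, str]]:
    """Score each (text, embedding) pair and return results best-first."""
    scored = [
        (vector_weight * cosine(query_vec, vec)
         + keyword_weight * keyword_score(query, text), text)
        for text, vec in docs
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

A document whose embedding is unrelated to the query but which contains every query term still scores 0.3 here, which is the "exact match" safety net that pure vector search lacks.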

Pattern 4: Graph Memory (NanoBot)

User
  |-- works_on --> data-pipeline
  |       |-- uses --> pandas
  |       |-- deployed_on --> AWS Lambda
  |-- prefers --> type-hints
  |-- asked_about --> agent-frameworks (recent)

Pros: Relationship-aware retrieval, good for ongoing projects
Cons: Graph construction is imprecise, harder to debug than files
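
A graph memory of this shape is essentially a store of (subject, relation, object) triples plus a bounded traversal. The Python sketch below illustrates that idea with the example graph above; `GraphMemory` is a hypothetical class, not NanoBot's implementation, and a real system would have an LLM extract the triples.

```python
from collections import defaultdict


class GraphMemory:
    """Tiny in-memory triple store with breadth-first recall."""

    def __init__(self) -> None:
        # subject -> list of (relation, object) edges, in insertion order
        self._edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        if (relation, obj) not in self._edges[subject]:
            self._edges[subject].append((relation, obj))

    def recall(self, start: str, depth: int = 2) -> list[tuple[str, str, str]]:
        """Return every fact reachable from `start` within `depth` hops."""
        facts: list[tuple[str, str, str]] = []
        frontier, seen = [start], {start}
        for _ in range(depth):
            next_frontier = []
            for node in frontier:
                for relation, obj in self._edges.get(node, []):
                    facts.append((node, relation, obj))
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        return facts
```

Recalling from `User` with `depth=2` pulls in not only what the user works on but also what that project uses, which is the relationship-aware retrieval a flat vector store cannot express directly.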

Memory Retrieval Strategies

| Strategy | Used By | How It Works |
|---|---|---|
| Temporal decay | OpenClaw, ZeroClaw | Recent memories weighted higher. 30-day half-life. Evergreen files skip decay |
| Auto-extraction | Agent Zero | Utility LLM extracts facts/solutions/errors after every turn |
| Auto-consolidation | Agent Zero | Related memories merged and summarized periodically |
| Adaptive-depth recall | CrewAI | Retrieval depth adjusts based on query complexity |
| Composite scoring | CrewAI | Weighted combination of recency, semantic similarity, and importance |
| Compaction | OpenClaw | When context fills, important info flushed to MEMORY.md, older conversation summarized |
| Embedding cache | ZeroClaw | LRU cache of 10K embeddings to avoid recomputation |
| Thread checkpointing | LangGraph | Full state snapshot at every graph node, recoverable by thread ID |
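
Temporal decay with a half-life reduces to one formula: a memory's weight is 0.5 raised to (age / half-life). The Python sketch below applies it as a multiplier on a relevance score. How the frameworks actually combine decay with relevance is not stated in the table, so the multiplication in `rerank` is an assumption, as are the function names.

```python
def decay_weight(age_days: float, half_life_days: float = 30.0,
                 evergreen: bool = False) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    if evergreen:
        return 1.0  # evergreen entries skip decay entirely
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def rerank(memories: list[tuple[str, float, float, bool]],
           half_life_days: float = 30.0) -> list[tuple[str, float, float, bool]]:
    """Sort (text, relevance, age_days, evergreen) tuples by decayed relevance."""
    return sorted(
        memories,
        key=lambda m: m[1] * decay_weight(m[2], half_life_days, m[3]),
        reverse=True,
    )
```

With a 30-day half-life, a 90-day-old memory keeps one eighth of its relevance, so a moderately relevant note from today can outrank a highly relevant one from last quarter unless the older one is marked evergreen.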

What the Best Memory Systems Have in Common

  1. Hybrid search is converging as the standard. Pure vector search misses exact terms. Pure keyword search misses meaning. The 70/30 vector/keyword split in ZeroClaw and OpenClaw appears to be the sweet spot.

  2. File-based memory for human-inspectable state. Several Claw-family frameworks keep long-term memory in plain files (OpenClaw's MEMORY.md, NanoClaw's per-group CLAUDE.md). This is not technically optimal, but it is debuggable, versionable, and portable.

  3. Auto-learning is the biggest gap. Only Agent Zero automatically extracts and consolidates knowledge. Every other framework requires explicit memory management.


Skill Definition Patterns

Pattern 1: Markdown Playbooks (OpenClaw, ZeroClaw)

The most popular pattern. A skill is a directory with a SKILL.md file containing YAML frontmatter (requirements, tags, environment) and markdown instructions.

---
name: keyword-research
description: Research keywords using DataForSEO API
requires:
  env: ["DATAFORSEO_LOGIN", "DATAFORSEO_PASSWORD"]
tags: ["seo", "research", "marketing"]
platforms: ["all"]
---

OpenClaw injects a compact XML list of available skills into the system prompt. Skills are selectively loaded based on the current context.
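
A rough Python sketch of that injection step follows. The XML element names (`available_skills`, `skill`) and the tag-overlap selection rule are assumptions made for illustration; the article only says that the list is compact XML and that loading is selective.

```python
import xml.etree.ElementTree as ET


def select_skills(skills: list[dict], context_tags: list[str]) -> list[dict]:
    """Keep skills that share at least one tag with the current context."""
    wanted = set(context_tags)
    return [s for s in skills if wanted & set(s.get("tags", []))]


def skills_to_xml(skills: list[dict]) -> str:
    """Render name and description only, keeping the prompt footprint small."""
    root = ET.Element("available_skills")
    for skill in skills:
        node = ET.SubElement(root, "skill", name=skill["name"])
        node.text = skill["description"]
    return ET.tostring(root, encoding="unicode")
```

Only the name and description reach the prompt; the full SKILL.md body would be loaded later, once the model decides to use a skill, which keeps thousands of installed skills from consuming the context window.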

Pattern 2: Typed Functions (Pydantic AI, LangGraph, OpenAI SDK)

Skills are Python/TypeScript functions with type annotations.

from pydantic_ai import Agent
from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    relevance_score: float

agent = Agent(
    "anthropic:claude-sonnet-4-5-20250514",
    system_prompt="You are a research assistant.",
)

@agent.tool
async def search_web(query: str, max_results: int = 5) -> list[SearchResult]:
    """Search the web for information."""
    results = await search_api.search(query, limit=max_results)
    return [
        SearchResult(
            title=r["title"],
            url=r["url"],
            snippet=r["snippet"],
            relevance_score=r["score"],
        )
        for r in results
    ]

Pattern 3: Role Actions (MetaGPT, CrewAI)

Skills are embedded in agent roles as expected behaviors.

class Architect(Role):
    name: str = "Architect"
    profile: str = "Software Architect"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.set_actions([WriteDesign, WriteAPIDesign])
        self._watch({WritePRD})  # Subscribe to Product Manager output

Pattern 4: MCP Tools (Claude SDK, NanoClaw, Pydantic AI)

Skills are exposed as MCP (Model Context Protocol) servers.

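// Claude Agent SDK: MCP servers as skill providers
import { query } from "@anthropic-ai/claude-agent-sdk";

const run = query({
  prompt: "Research keywords for the new landing page", // illustrative prompt
  options: {
    // mcpServers is an object keyed by server name
    mcpServers: {
      "seo-tools": {
        type: "stdio",
        command: "npx",
        args: ["-y", "@my-org/seo-mcp-server"],
      },
      database: {
        type: "sse",
        url: "https://api.example.com/mcp",
      },
    },
  },
});

for await (const message of run) {
  console.log(message.type);
}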

Pattern 5: Vtable Interfaces (NullClaw)

Skills are compiled interface implementations.

const ToolVTable = struct {
    name: []const u8,
    description: []const u8,
    parameters: []const ParameterDef,
    execute: *const fn (params: ParameterMap) anyerror!ToolResult,
};

// Adding a new tool means implementing this interface
pub const git_commit_tool = ToolVTable{
    .name = "git_commit",
    .description = "Create a git commit with the given message",
    .parameters = &[_]ParameterDef{
        .{ .name = "message", .type = .string, .required = true },
        .{ .name = "files", .type = .string_array, .required = false },
    },
    .execute = &gitCommitImpl,
};

Skill Ecosystem Size

| Framework | Built-in | Community | Registry |
|---|---|---|---|
| OpenClaw | ~50 bundled | 5,400+ in registry | openclaw/skills |
| CrewAI | 600+ integrations | 7,000+ tools | CrewAI Tools |
| LangGraph | LangChain ecosystem | LangChain Hub | LangChain Hub |
| NanoBot | ~10 core | ClawHub | ClawHub |
| ZeroClaw | ~20 built-in | Community packs | TOML registry |
| NullClaw | 18+ built-in | Small | None |
| Claude SDK | Claude Code toolset + MCP | MCP ecosystem | MCP servers |
| Others | 5-15 built-in | Small | None |

Pattern 1: No Agent-to-Agent Communication

Used by: NanoBot, PicoClaw, ZeroClaw, NullClaw, Claude SDK, Pydantic AI, BabyAGI

These frameworks are single-agent systems. Communication is always agent-to-human via messaging platforms or CLI.

Pattern 2: Filesystem IPC (Polling)

Used by: NanoClaw

Host process polls data/ipc/{group}/ every 1 second
  -> Reads JSON files from messages/, tasks/
  -> Processes and deletes after handling
  -> Container agents write to IPC directory
  -> Host routes messages to platforms

Simple, debuggable, but adds 1-second latency per message hop.
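
The polling loop above can be sketched in a few lines. This is not NanoClaw's code: the `IpcMessage` shape and the `pollOnce` helper are assumptions made for illustration, and the demo uses a temporary directory rather than `data/ipc/{group}/`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical message shape; the source only says the files are JSON.
interface IpcMessage {
  to: string;
  text: string;
}

// One polling pass: read every *.json file in `dir`, hand it to `handle`,
// and delete it only after the handler returns without throwing.
function pollOnce(dir: string, handle: (msg: IpcMessage) => void): number {
  let processed = 0;
  const files = fs.readdirSync(dir).filter((f) => f.endsWith(".json")).sort();
  for (const file of files) {
    const fullPath = path.join(dir, file);
    try {
      const msg = JSON.parse(fs.readFileSync(fullPath, "utf-8")) as IpcMessage;
      handle(msg);
      fs.unlinkSync(fullPath);
      processed++;
    } catch {
      // Leave the file in place so the next pass can retry it
      // (it may have been half-written when we read it).
    }
  }
  return processed;
}

// Demo: a "container agent" drops a file, the "host" picks it up.
const ipcDir = fs.mkdtempSync(path.join(os.tmpdir(), "ipc-"));
fs.writeFileSync(
  path.join(ipcDir, "0001.json"),
  JSON.stringify({ to: "telegram", text: "done" })
);

const delivered: IpcMessage[] = [];
const processedCount = pollOnce(ipcDir, (msg) => delivered.push(msg));

// A real host would run this on a timer, for example:
// setInterval(() => pollOnce(ipcDir, routeToPlatform), 1000);
```

Writers can avoid the half-written-file case entirely by writing to a temporary name and renaming it into place, since a rename within one filesystem is atomic.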

Pattern 3: Handoff Functions

Used by: OpenAI Agents SDK, Swarm (legacy)

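from agents import Agent

billing_agent = Agent(name="Billing", instructions="Handle billing questions.")
tech_support_agent = Agent(name="Tech Support", instructions="Handle technical issues.")

# handoffs takes Agent (or Handoff) objects, not agent names as strings
triage = Agent(
    name="Triage",
    handoffs=[billing_agent, tech_support_agent],
    instructions="Route user to the right specialist based on their question.",
)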

Clean and minimal. Works well for customer support routing patterns. Limited for complex multi-step coordination.

Pattern 4: Shared Message Pool (Pub/Sub)

Used by: MetaGPT


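class SharedMessagePool:
    """Conceptual sketch of role-filtered pub/sub, not MetaGPT's actual classes."""

    def __init__(self):
        self.subscribers: list[Subscription] = []

    def publish(self, message: Message, sender_role: str):
        # Deliver to every subscriber whose role filter accepts this sender
        for subscriber in self.subscribers:
            if subscriber.should_receive(message, sender_role):
                subscriber.inbox.append(message)

    def subscribe(self, agent: Agent, filter_roles: list[str]):
        self.subscribers.append(Subscription(agent, filter_roles))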

Pattern 5: Team Chat Rooms (Broadcast)

Used by: TinyClaw

// Every team has a persistent chat room
// All agents see all messages from teammates
// Agents mention teammates to delegate: "@writer please draft this"

interface TeamMessage {
  from: string;      // "researcher"
  to?: string;       // "@writer" or broadcast (no to)
  team: string;      // "content-team"
  content: string;
  timestamp: number;
}

// Fan-out: researcher mentions @writer and @reviewer
// Both process in parallel
// Responses flow back to team room
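
How mentions turn into delegation is not spelled out in the source. A minimal sketch, assuming "@name" tokens in the message body select who should act and a message without mentions is a broadcast, could look like this. `parseMentions` and `recipientsFor` are hypothetical helpers, not TinyClaw functions.

```typescript
interface TeamMessage {
  from: string;
  to?: string;
  team: string;
  content: string;
  timestamp: number;
}

// Extract "@name" mentions from message text, without the "@" prefix.
function parseMentions(content: string): string[] {
  const matches = content.match(/@([a-z0-9_-]+)/gi) ?? [];
  const names = matches.map((m) => m.slice(1).toLowerCase());
  return names.filter((n, i) => names.indexOf(n) === i); // de-duplicate
}

// Decide who should act on a message: mentioned teammates if any are
// present, otherwise every teammate except the sender (broadcast).
function recipientsFor(msg: TeamMessage, members: string[]): string[] {
  const mentioned = parseMentions(msg.content).filter(
    (name) => members.includes(name) && name !== msg.from
  );
  return mentioned.length > 0 ? mentioned : members.filter((m) => m !== msg.from);
}

const team = ["researcher", "writer", "reviewer"];

// Fan-out: two mentions, so two agents are asked to act.
const delegated = recipientsFor(
  {
    from: "researcher",
    team: "content-team",
    content: "@writer please draft this, @reviewer please check sources",
    timestamp: Date.now(),
  },
  team
);

// No mentions: everyone except the sender receives it.
const broadcast = recipientsFor(
  { from: "writer", team: "content-team", content: "Draft is ready", timestamp: Date.now() },
  team
);
```

Each name in `delegated` would then be dispatched in parallel, with replies posted back to the same room.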

Pattern 6: Graph Edges and State Reducers

Used by: LangGraph


from operator import add
from typing import Annotated, TypedDict

class State(TypedDict):
    # add reducer: new messages are appended, not replaced
    messages: Annotated[list[str], add]
    # last-write-wins for simple values
    status: str
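
To make the reducer idea concrete outside of LangGraph, here is a small TypeScript sketch of per-field merge rules. It mirrors the append versus last-write-wins split above; `WorkflowState`, `reducers`, and `applyUpdate` are illustrative names and not part of any framework.

```typescript
// A reducer combines the current value of a field with an update to it.
type Reducer<T> = (current: T, update: T) => T;

interface WorkflowState {
  messages: string[];
  status: string;
}

// Per-field merge rules: messages are appended, status is last-write-wins.
const reducers: { [K in keyof WorkflowState]: Reducer<WorkflowState[K]> } = {
  messages: (current, update) => current.concat(update),
  status: (_current, update) => update,
};

// Apply a partial update returned by a node to the shared state.
function applyUpdate(
  state: WorkflowState,
  update: Partial<WorkflowState>
): WorkflowState {
  return {
    messages:
      update.messages !== undefined
        ? reducers.messages(state.messages, update.messages)
        : state.messages,
    status:
      update.status !== undefined
        ? reducers.status(state.status, update.status)
        : state.status,
  };
}

let workflowState: WorkflowState = { messages: [], status: "idle" };
workflowState = applyUpdate(workflowState, {
  messages: ["research started"],
  status: "running",
});
workflowState = applyUpdate(workflowState, { messages: ["research finished"] });
// workflowState.messages holds both entries; workflowState.status is still "running".
```

Because each field declares how concurrent writes combine, two nodes can update the same state without one silently overwriting the other.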

Pattern 7: DO RPC (Direct Method Invocation)

Used by: Cloudflare Agents SDK

// Agent-to-agent communication via Durable Object RPC
// No HTTP request to build or parse: the call goes through a stub
import { Agent } from "agents";

class CoordinatorAgent extends Agent<Env> {
  async delegateResearch(topic: string) {
    // One researcher instance per topic
    const researcher = this.env.RESEARCH_AGENT.get(
      this.env.RESEARCH_AGENT.idFromName(topic)
    );
    // RPC method call on the stub; arguments and results are still serialized
    const result = await researcher.research(topic);
    return result;
  }
}

Human-in-the-Loop Mechanisms

| Framework | Mechanism | Granularity |
|---|---|---|
| LangGraph | Interrupt nodes | Per-node in graph |
| CrewAI | Callbacks | Per-task |
| AutoGen | Chat interface | Per-message |
| OpenAI SDK | Human-in-loop API | Per-run |
| Agent Zero | Superior chain | Human is top of hierarchy |
| TinyClaw | Team chat | Agents and humans share chat room |
| OpenClaw | Chat platforms | Per-message |
| CF Agents | WebSocket | Real-time bidirectional |

Local CLI Execution

Used by: Most frameworks (OpenClaw, NanoBot, PicoClaw, ZeroClaw, NullClaw, Agent Zero, LangGraph, CrewAI)

The agent runs as a local process. Advantages: full machine access, no cloud costs, privacy. Disadvantages: requires a running machine, no horizontal scaling, no fault tolerance.

Container-Isolated Execution

Used by: NanoClaw, Agent Zero, AutoGPT

Each agent runs in its own container. NanoClaw uses Docker or Apple Container. Agent Zero uses Docker per subordinate. This provides OS-level isolation but adds startup latency.

Edge/Serverless Execution

Used by: Cloudflare Agents SDK

// Each agent is a Durable Object
// Runs on Cloudflare's edge network (300+ cities)
// Automatically hibernates when idle (no cost)
// Wakes on request, alarm, or queue message

export default {
  async fetch(request: Request, env: Env) {
    const agentId = env.AGENT.idFromName("my-agent");
    const agent = env.AGENT.get(agentId);
    return agent.fetch(request);
  },
};

Advantages: global distribution, automatic scaling, pay-per-use, built-in persistence. Disadvantages: platform lock-in, CPU time limits, no local filesystem.

Sandboxing Comparison

| Framework | Sandbox Type | Isolation Level | Performance Impact |
|---|---|---|---|
| NanoClaw | Linux container | OS-level | Moderate (container startup) |
| Agent Zero | Docker | OS-level | Moderate |
| NullClaw | Multi-layer sandbox | Process-level | Minimal |
| ZeroClaw | Workspace isolation | Filesystem-level | None |
| CF Agents | Durable Object | V8 isolate | None |
| LangGraph | None built-in | None | None |
| OpenClaw | None | Runs as user | None |
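
Filesystem-level workspace isolation usually comes down to refusing any path that resolves outside an allowed directory. The source does not show how ZeroClaw implements this, so the following is only a sketch of that idea; `isInsideWorkspace` is a hypothetical helper.

```typescript
import * as path from "node:path";

// Returns true when `requested` (relative or absolute) resolves to a location
// inside `workspace`. This is a lexical check only: it does not resolve
// symlinks, so it is a guard rail rather than a hard security boundary.
function isInsideWorkspace(workspace: string, requested: string): boolean {
  const root = path.resolve(workspace);
  const target = path.resolve(root, requested);
  const relative = path.relative(root, target);
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  return !escapes;
}

const allowed = isInsideWorkspace("/srv/agent", "notes/todo.md"); // true
const blocked = isInsideWorkspace("/srv/agent", "../secrets.env"); // false
```

A check like this runs in the same process as the agent, so anything with shell access can sidestep it. Container or isolate sandboxes in the table enforce the boundary outside the agent's own process.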

Cost Control Mechanisms

| Framework | Mechanism | How It Works |
|---|---|---|
| OpenClaw | Model fallback chains | Try cheap model first, escalate on failure |
| Agent Zero | 3-slot model mixing | Different models for chat/utility/embedding |
| LangGraph | Token tracking | State includes token counts per node |
| CrewAI | Token tracking | Per-agent token budgets |
| CF Agents | DO CPU/memory limits | Hard limits per request, alarm-based budgets |
| OpenAI SDK | Token tracking | Per-run cost visibility |
| ZeroClaw | Tool allowlists | Limit which tools can run (prevents expensive operations) |
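
The "try cheap model first, escalate on failure" row can be sketched as a loop over tiers. This is an illustration rather than OpenClaw's implementation: `ModelTier`, `runWithFallback`, and the stub tiers are hypothetical, and `run` is synchronous here only to keep the example short (a real model call would be async).

```typescript
// Hypothetical tier: `run` stands in for a real model call.
interface ModelTier {
  name: string;
  run: (prompt: string) => string;
}

interface ChainResult {
  model: string;
  output: string;
  attempts: number;
}

// Try tiers in order (cheapest first). Escalate when a tier throws or when
// `accept` rejects its output. Throws if every tier fails.
function runWithFallback(
  tiers: ModelTier[],
  prompt: string,
  accept: (output: string) => boolean
): ChainResult {
  let attempts = 0;
  for (const tier of tiers) {
    attempts++;
    try {
      const output = tier.run(prompt);
      if (accept(output)) return { model: tier.name, output, attempts };
    } catch {
      // Treat errors (timeouts, rate limits) like a rejected answer.
    }
  }
  throw new Error(`All ${tiers.length} tiers failed`);
}

// Stub tiers for demonstration only.
const demoTiers: ModelTier[] = [
  { name: "cheap", run: () => "" }, // returns an empty answer
  {
    name: "mid",
    run: () => {
      throw new Error("rate limited");
    },
  },
  { name: "strong", run: (p) => `answer to: ${p}` },
];

const fallbackResult = runWithFallback(
  demoTiers,
  "summarize the issue",
  (out) => out.length > 0
);
// fallbackResult.model is "strong" and fallbackResult.attempts is 3.
```

The `accept` callback is where most of the design effort goes in practice: schema validation, a test run, or a cheap grader model are all plausible choices.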

Persistence Models

| Framework | State Backend | Checkpoint/Resume | Session Management | Crash Recovery |
|---|---|---|---|---|
| OpenClaw | SQLite + files | Session JSONL (branching) | Per-agent sessions | Resume from JSONL |
| NanoClaw | SQLite | Crash recovery built-in | Per-group | SQLite WAL |
| NanoBot | Knowledge graph | Persistent graph | In-memory sessions | Graph persists |
| ZeroClaw | SQLite/Markdown | File-based | Per-workspace | File recovery |
| NullClaw | SQLite | File-based | Per-workspace | Binary restart |
| TinyClaw | SQLite | Per-agent workspace | Per-team | Actor restart |
| Agent Zero | FAISS + files | FAISS persists to disk | Per-conversation | FAISS reload |
| LangGraph | PG/SQLite/Memory | Full checkpoint at every node | Thread-based | Checkpoint recovery |
| CrewAI | Configurable vector DB | Task-level | Per-crew | Task retry |
| AutoGen | Configurable | Session-based | Per-group | Session replay |
| OpenAI SDK | Session API | Session history | Auto-managed | Session resume |
| MetaGPT | In-memory + files | No built-in | Per-run | No |
| Claude SDK | File-based | No built-in | Conversation | No |
| CF Agents | DO SQLite | Automatic (every setState) | Per-DO instance | DO auto-recovery |
| Pydantic AI | Configurable | Durable execution | Per-conversation | Progress preserved |
| BabyAGI | Pinecone | Task list | Per-run | Vector reload |
| AutoGPT | PostgreSQL | Block state | Per-agent | DB recovery |

Key insight: LangGraph and Cloudflare Agents SDK have the strongest state persistence stories. LangGraph checkpoints at every graph node with full state snapshots. CF Agents auto-persist state to Durable Object SQLite on every setState() call. Both can resume from a failure without extra code. Most other frameworks in the table need some manual work to reach the same guarantee, with Pydantic AI's durable execution as the closest alternative.
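
For frameworks without built-in checkpointing, the core of resume-from-failure is small. The sketch below persists state after each step to a JSON file; it is a simplified stand-in for what LangGraph or Durable Objects do, and `Checkpoint`, `Step`, and `runWorkflow` are names invented for this example.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface Checkpoint {
  nextStep: number; // index of the first step that has not completed
  state: Record<string, unknown>;
}

type Step = (state: Record<string, unknown>) => Record<string, unknown>;

function loadCheckpoint(file: string): Checkpoint {
  if (!fs.existsSync(file)) return { nextStep: 0, state: {} };
  return JSON.parse(fs.readFileSync(file, "utf-8")) as Checkpoint;
}

// Run steps in order, persisting state after each one. If a step throws,
// the checkpoint still points at that step, so the next call retries it
// without repeating the steps that already finished.
function runWorkflow(file: string, steps: Step[]): Record<string, unknown> {
  const checkpoint = loadCheckpoint(file);
  for (let i = checkpoint.nextStep; i < steps.length; i++) {
    checkpoint.state = { ...checkpoint.state, ...steps[i](checkpoint.state) };
    checkpoint.nextStep = i + 1;
    // Write to a temp file, then rename, so a crash mid-write cannot
    // leave a truncated checkpoint behind.
    fs.writeFileSync(file + ".tmp", JSON.stringify(checkpoint));
    fs.renameSync(file + ".tmp", file);
  }
  return checkpoint.state;
}

// Demo: the second step fails once, then succeeds on resume.
const checkpointFile = path.join(
  fs.mkdtempSync(path.join(os.tmpdir(), "ckpt-")),
  "run.json"
);
const calls: string[] = [];
let failOnce = true;
const workflowSteps: Step[] = [
  () => {
    calls.push("fetch");
    return { issue: 42 };
  },
  () => {
    calls.push("plan");
    if (failOnce) {
      failOnce = false;
      throw new Error("simulated crash");
    }
    return { plan: "fix the bug" };
  },
];

try {
  runWorkflow(checkpointFile, workflowSteps);
} catch {
  // The first run crashes inside "plan".
}
// The second run resumes at "plan"; "fetch" is not repeated.
const finalState = runWorkflow(checkpointFile, workflowSteps);
```

Note that a step which crashes is re-executed on resume, so steps with external side effects still need to be idempotent.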


| Framework | GitHub Stars | Last Active | Docs Quality | Plugin System | Community |
|---|---|---|---|---|---|
| OpenClaw | 302K | Daily | Excellent | Skills registry (5,400+) | Massive |
| NanoBot (HKUDS) | 27K | Daily | Good | ClawHub | Growing |
| NanoClaw | 22K | Daily | Good | Fork-and-modify | Growing |
| PicoClaw | 18K | Weekly | Fair | Limited | Growing |
| ZeroClaw | 17K | Weekly | Good | TOML manifests | Growing |
| NullClaw | 12K | Weekly | Fair | Vtable interfaces | Small |
| TinyClaw | 8K | Weekly | Fair | Agent configs | Small |
| CrewAI | 46K | Daily | Excellent | 7,000+ tools | Large |
| MetaGPT | 42K | Monthly | Good | Role system | Large |
| AutoGen | 38K | Daily | Good | Plugin system | Large |
| BabyAGI | 20K | Monthly | Fair | Limited | Declining |
| Swarm | 18K | Archived | Fair | Deprecated | Migrating to Agents SDK |
| OpenAI SDK | 15K | Daily | Excellent | MCP + tools | Growing |
| Pydantic AI | 15K | Daily | Excellent | MCP + A2A | Growing |
| SuperAGI | 15K | Stalled (Jan 2024) | Outdated | Marketplace | Dead |
| Agent Zero | 12K | Weekly | Good | Python functions | Niche |
| AutoGPT | 170K | Monthly | Fair | Block marketplace | Declining |

After analyzing 18 frameworks, several consensus patterns emerge:

1. Markdown Files as Persistent Memory

Every Claw-family framework uses MEMORY.md. Claude Code uses CLAUDE.md. MetaGPT uses structured artifacts. The pattern is universal: human-readable text files for state that humans need to inspect.

MEMORY.md   # What the agent knows (curated facts)
SOUL.md     # Who the agent is (personality, boundaries)
AGENTS.md   # How the agent behaves (rules, injected every turn)

This is not technically optimal (linear scan, no semantic search), but it solves the debuggability problem that vector-only systems fail at.

2. Hybrid Search Is the Memory Sweet Spot

The frameworks with the best memory systems all combine vector similarity with keyword matching.

Pure vector search misses exact terms (“wrangler.jsonc” won’t match “wrangler configuration”). Pure keyword search misses semantic similarity (“deployment tool” won’t match “wrangler”). The 70/30 split appears to be the emerging consensus.

3. Skills Are Just Markdown

OpenClaw proved it. ZeroClaw adopted it. The skill-as-markdown-file pattern has become the standard:

skills/
  my-skill/
    SKILL.md        # YAML frontmatter + markdown instructions
    (optional files) # Templates, configs, reference code

No SDK, no compilation, no runtime dependency. The agent reads the skill file and follows the instructions. This works because modern LLMs are good enough to follow structured markdown instructions reliably.

4. Container Isolation for Security

NanoClaw and Agent Zero both use container isolation per agent. This is the only pattern that provides real security — everything else (workspace isolation, tool allowlists) is convention-based and can be bypassed.

5. Polling, Not Pushing

NanoClaw polls IPC directories every second. OpenClaw watches file changes. BabyAGI runs a poll loop. The pattern: simple polling is more reliable than complex push systems for agent coordination.

6. Single Agent Scales Down, Multi-Agent Scales Up

The Claw-family frameworks (OpenClaw, NanoClaw, PicoClaw, ZeroClaw, NullClaw) are all single-agent systems. They scale down well: NullClaw runs in about 1MB of RAM.

Multi-agent systems (CrewAI, AutoGen, TinyClaw, MetaGPT) scale up — they can handle complex workflows with specialized roles. But they add significant complexity.

The pattern: start with a single agent, add multi-agent only when a single agent demonstrably cannot handle the workload.

7. State Machines for Complex Workflows

LangGraph’s graph model and CrewAI’s Flows both treat agent workflows as state machines.

For anything beyond simple chat, state machines are emerging as the coordination primitive.
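
As a rough illustration of what "state machine as coordination primitive" means in code, here is a transition table for an issue-handling workflow. The phases and outcomes are invented for this example and do not come from LangGraph or CrewAI.

```typescript
type Phase = "triage" | "implement" | "review" | "done" | "failed";
type Outcome = "ok" | "retry" | "error";

// Allowed transitions: (current phase, outcome) -> next phase.
// Terminal phases have no outgoing transitions.
const transitions: Record<Phase, Partial<Record<Outcome, Phase>>> = {
  triage: { ok: "implement", error: "failed" },
  implement: { ok: "review", retry: "implement", error: "failed" },
  review: { ok: "done", retry: "implement", error: "failed" },
  done: {},
  failed: {},
};

// Advance the machine one step; reject transitions the table does not allow.
function nextPhase(current: Phase, outcome: Outcome): Phase {
  const next = transitions[current][outcome];
  if (next === undefined) {
    throw new Error(`Illegal transition: ${current} + ${outcome}`);
  }
  return next;
}

// Replay a sequence of outcomes from the initial phase.
function replay(outcomes: Outcome[]): Phase[] {
  const visited: Phase[] = ["triage"];
  for (const outcome of outcomes) {
    visited.push(nextPhase(visited[visited.length - 1], outcome));
  }
  return visited;
}

const phaseTrace = replay(["ok", "ok", "retry", "ok", "ok"]);
// triage -> implement -> review -> implement -> review -> done
```

Because every legal move is listed in one table, the workflow can be validated, logged, and replayed without reading the agent's prompts.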


How to Build a Hybrid Memory System

The best memory systems combine multiple retrieval strategies. Here is a TypeScript implementation inspired by OpenClaw and ZeroClaw:

import Database from "better-sqlite3"; // the default export is the constructor

interface Memory {
  id: string;
  content: string;
  category: "fact" | "solution" | "preference" | "error";
  embedding: Float32Array;
  created_at: number;
  importance: number;
}

interface SearchResult {
  memory: Memory;
  score: number;
  source: "vector" | "keyword" | "both";
}

class HybridMemoryStore {
  private db: Database.Database;
  private vectorWeight = 0.7;
  private keywordWeight = 0.3;
  private decayHalfLifeDays = 30;

  constructor(dbPath: string) {
    this.db = new Database(dbPath);
    this.initSchema();
  }

  private initSchema() {
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS memories (
        id TEXT PRIMARY KEY,
        content TEXT NOT NULL,
        category TEXT NOT NULL,
        embedding BLOB,
        created_at INTEGER NOT NULL,
        importance REAL DEFAULT 1.0
      );
      CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts
        USING fts5(content, id UNINDEXED);
    `);
  }

  async store(memory: Omit<Memory, "id" | "created_at">): Promise<string> {
    const id = crypto.randomUUID();
    const created_at = Date.now();

    this.db.prepare(`
      INSERT INTO memories (id, content, category, embedding, created_at, importance)
      VALUES (?, ?, ?, ?, ?, ?)
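    `).run(id, memory.content, memory.category,
           // Copy only this view's bytes; its ArrayBuffer may be larger than the view
           Buffer.from(memory.embedding.buffer, memory.embedding.byteOffset,
                       memory.embedding.byteLength),
           created_at, memory.importance);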

    this.db.prepare(`
      INSERT INTO memories_fts (id, content) VALUES (?, ?)
    `).run(id, memory.content);

    return id;
  }

  async search(
    query: string,
    queryEmbedding: Float32Array,
    limit: number = 10
  ): Promise<SearchResult[]> {
    // Vector search
    const vectorResults = this.vectorSearch(queryEmbedding, limit * 2);

    // Keyword search
    const keywordResults = this.keywordSearch(query, limit * 2);

    // Merge with weights
    const merged = this.mergeResults(vectorResults, keywordResults);

    // Apply temporal decay
    const decayed = merged.map((r) => ({
      ...r,
      score: r.score * this.temporalDecay(r.memory.created_at),
    }));

    return decayed
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }

  private vectorSearch(
    queryEmbedding: Float32Array,
    limit: number
  ): SearchResult[] {
    const rows = this.db.prepare(`
      SELECT id, content, category, embedding, created_at, importance
      FROM memories WHERE embedding IS NOT NULL
    `).all() as any[];

    return rows
      .map((row) => {
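        // Copy the BLOB into a fresh, aligned buffer before viewing it as floats;
        // a Node Buffer's .buffer may be a larger shared ArrayBuffer
        const embedding = new Float32Array(new Uint8Array(row.embedding).buffer);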
        const similarity = this.cosineSimilarity(queryEmbedding, embedding);
        return {
          memory: { ...row, embedding },
          score: similarity * this.vectorWeight,
          source: "vector" as const,
        };
      })
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }

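  private keywordSearch(query: string, limit: number): SearchResult[] {
    // bm25() returns lower (more negative) values for better matches,
    // so ascending order puts the best matches first.
    const rows = this.db.prepare(`
      SELECT m.id, m.content, m.category, m.embedding, m.created_at, m.importance,
             bm25(memories_fts) AS bm25_score
      FROM memories_fts
      JOIN memories m ON memories_fts.id = m.id
      WHERE memories_fts MATCH ?
      ORDER BY bm25_score
      LIMIT ?
    `).all(query, limit) as any[];

    // BM25 magnitudes are unbounded, so scale them to [0, 1] within this
    // result set before mixing them with cosine similarity.
    const maxScore = Math.max(
      ...rows.map((r) => Math.abs(r.bm25_score)),
      Number.EPSILON
    );

    return rows.map(({ bm25_score, ...row }) => ({
      memory: {
        ...row,
        // Decode the BLOB the same way vectorSearch does
        embedding: new Float32Array(new Uint8Array(row.embedding ?? []).buffer),
      },
      score: (Math.abs(bm25_score) / maxScore) * this.keywordWeight,
      source: "keyword" as const,
    }));
  }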

  private mergeResults(
    vectorResults: SearchResult[],
    keywordResults: SearchResult[]
  ): SearchResult[] {
    const merged = new Map<string, SearchResult>();

    for (const r of vectorResults) {
      merged.set(r.memory.id, r);
    }

    for (const r of keywordResults) {
      const existing = merged.get(r.memory.id);
      if (existing) {
        existing.score += r.score;
        existing.source = "both";
      } else {
        merged.set(r.memory.id, r);
      }
    }

    return Array.from(merged.values());
  }

  private temporalDecay(createdAt: number): number {
    const ageMs = Date.now() - createdAt;
    const ageDays = ageMs / (1000 * 60 * 60 * 24);
    return Math.pow(0.5, ageDays / this.decayHalfLifeDays);
  }

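  // Public so other components (such as a consolidation job) can compare embeddings
  cosineSimilarity(a: Float32Array, b: Float32Array): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    const denom = Math.sqrt(normA) * Math.sqrt(normB);
    return denom === 0 ? 0 : dot / denom; // A zero vector has no direction
  }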
}

How to Build an Auto-Learning System

Inspired by Agent Zero’s approach, here is how to add auto-extraction to any agent:

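// Assumed helpers, not defined in this article: thin wrappers around
// whichever LLM and embedding providers you use.
declare function callLLM(model: string, prompt: string): Promise<string>;
declare function callEmbedding(model: string, text: string): Promise<Float32Array>;

interface ExtractedKnowledge {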
  content: string;
  category: "fact" | "solution" | "preference" | "error";
  confidence: number;
}

class AutoLearner {
  private memoryStore: HybridMemoryStore;
  private extractionModel: string;
  private embeddingModel: string;

  constructor(
    memoryStore: HybridMemoryStore,
    extractionModel = "claude-3-5-haiku-20241022", // cheap model for extraction
    embeddingModel = "nomic-embed-text"
  ) {
    this.memoryStore = memoryStore;
    this.extractionModel = extractionModel;
    this.embeddingModel = embeddingModel;
  }

  async learnFromConversation(
    userMessage: string,
    assistantResponse: string
  ): Promise<ExtractedKnowledge[]> {
    const extracted = await this.extract(userMessage, assistantResponse);

    for (const item of extracted) {
      if (item.confidence < 0.7) continue; // Skip low-confidence extractions

      // Check for duplicates via semantic search
      const embedding = await this.embed(item.content);
      const existing = await this.memoryStore.search(item.content, embedding, 3);

      const isDuplicate = existing.some((r) => r.score > 0.9);
      if (isDuplicate) continue;

      await this.memoryStore.store({
        content: item.content,
        category: item.category,
        embedding,
        importance: item.confidence,
      });
    }

    return extracted;
  }

  private async extract(
    userMessage: string,
    assistantResponse: string
  ): Promise<ExtractedKnowledge[]> {
    const prompt = `Extract factual knowledge from this conversation turn.

For each piece of knowledge, classify it:
- fact: Something true about the user, their project, or their environment
- solution: A problem-solution pair that worked
- preference: A user preference or style choice
- error: A mistake or anti-pattern discovered

Return a JSON array of objects with the keys "content", "category", and "confidence" (a number from 0 to 1). Only include high-confidence items.

User: ${userMessage}
Assistant: ${assistantResponse}

Extract:`;

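    const response = await callLLM(this.extractionModel, prompt);
    try {
      const parsed = JSON.parse(response);
      return Array.isArray(parsed) ? parsed : [];
    } catch {
      // The model returned something other than JSON; learn nothing this turn
      return [];
    }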
  }

  private async embed(text: string): Promise<Float32Array> {
    return callEmbedding(this.embeddingModel, text);
  }

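  async consolidate(): Promise<number> {
    // Run periodically to merge related memories.
    // Assumes the store also exposes getAll() and delete(id), which the
    // HybridMemoryStore sketch above does not define.
    const allMemories = await this.memoryStore.getAll();
    const consumed = new Set<string>(); // IDs already merged away in this pass
    let mergeCount = 0;

    for (let i = 0; i < allMemories.length; i++) {
      if (consumed.has(allMemories[i].id)) continue;
      for (let j = i + 1; j < allMemories.length; j++) {
        if (consumed.has(allMemories[j].id)) continue;
        const similarity = this.memoryStore.cosineSimilarity(
          allMemories[i].embedding,
          allMemories[j].embedding
        );

        if (similarity > 0.85) {
          // Merge: ask LLM to combine
          const merged = await this.merge(allMemories[i], allMemories[j]);
          await this.memoryStore.store(merged);
          await this.memoryStore.delete(allMemories[i].id);
          await this.memoryStore.delete(allMemories[j].id);
          consumed.add(allMemories[i].id);
          consumed.add(allMemories[j].id);
          mergeCount++;
          break; // Memory i is gone; the merged entry is considered on the next pass
        }
      }
    }

    return mergeCount;
  }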

  private async merge(a: Memory, b: Memory): Promise<Omit<Memory, "id" | "created_at">> {
    const prompt = `Merge these two related memories into one concise statement:
1: ${a.content}
2: ${b.content}

Merged:`;

    const content = await callLLM(this.extractionModel, prompt);
    const embedding = await this.embed(content);

    return {
      content,
      category: a.category,
      embedding,
      importance: Math.max(a.importance, b.importance),
    };
  }
}

How to Build a Skill Loader

A TypeScript implementation of the SKILL.md pattern:

import { readdir, readFile } from "fs/promises";
import { join } from "path";
import { parse as parseYaml } from "yaml";

interface SkillMetadata {
  name: string;
  description: string;
  requires?: {
    binaries?: string[];
    env?: string[];
  };
  tags?: string[];
  platforms?: string[];
}

interface Skill {
  metadata: SkillMetadata;
  instructions: string;
  path: string;
}

class SkillLoader {
  private skillPaths: string[];
  private loadedSkills: Map<string, Skill> = new Map();

  constructor(skillPaths: string[]) {
    // Pass paths in ascending priority (bundled, user, workspace):
    // a later path overrides an earlier one when skill names collide
    this.skillPaths = skillPaths;
  }

  async loadAll(): Promise<Map<string, Skill>> {
    for (const basePath of this.skillPaths) {
      try {
        const dirs = await readdir(basePath, { withFileTypes: true });
        for (const dir of dirs) {
          if (!dir.isDirectory()) continue;
          const skillPath = join(basePath, dir.name, "SKILL.md");
          try {
            const skill = await this.parseSkillFile(skillPath);
            // Later paths override earlier (workspace > user > bundled)
            this.loadedSkills.set(skill.metadata.name, skill);
          } catch {
            // Skip invalid skill files
          }
        }
      } catch {
        // Skip missing directories
      }
    }

    return this.loadedSkills;
  }

  private async parseSkillFile(path: string): Promise<Skill> {
    const content = await readFile(path, "utf-8");

    // Parse YAML frontmatter (tolerates Windows line endings)
    const frontmatterMatch = content.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n([\s\S]*)$/);
    if (!frontmatterMatch) {
      throw new Error(`Invalid SKILL.md format: ${path}`);
    }

    const metadata = parseYaml(frontmatterMatch[1]) as SkillMetadata;
    const instructions = frontmatterMatch[2].trim();

    return { metadata, instructions, path };
  }

  async getAvailableSkills(
    context: { env: Record<string, string>; platform: string }
  ): Promise<Skill[]> {
    const available: Skill[] = [];

    for (const skill of this.loadedSkills.values()) {
      // Check environment requirements
      if (skill.metadata.requires?.env) {
        const missing = skill.metadata.requires.env.filter(
          (e) => !context.env[e]
        );
        if (missing.length > 0) continue;
      }

      // Check platform requirements
      if (
        skill.metadata.platforms &&
        !skill.metadata.platforms.includes("all") &&
        !skill.metadata.platforms.includes(context.platform)
      ) {
        continue;
      }

      available.push(skill);
    }

    return available;
  }

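  formatForSystemPrompt(skills: Skill[]): string {
    // OpenClaw-style: inject compact XML list into system prompt
    // Escape attribute values so a quote or "<" in a description cannot break the XML
    const esc = (s: string) =>
      s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");
    const lines = skills.map(
      (s) =>
        `<skill name="${esc(s.metadata.name)}" description="${esc(s.metadata.description)}" />`
    );
    return `<available_skills>\n${lines.join("\n")}\n</available_skills>`;
  }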
}

// Usage:
// Paths are listed lowest priority first because later paths override earlier ones.
const loader = new SkillLoader([
  "./bundled-skills",            // Built-in skills (lowest priority)
  `${process.env.HOME}/.openclaw/skills`,  // User skills
  "./skills",                    // Workspace skills (highest priority)
]);

const skills = await loader.loadAll();
const available = await loader.getAvailableSkills({
  env: process.env as Record<string, string>,
  platform: process.platform,
});

const systemPromptSkills = loader.formatForSystemPrompt(available);

How to Build a Multi-Executor Router

For systems like Code Turtle that need to route tasks to different LLM backends based on cost and capability:

interface ExecutorConfig {
  name: string;
  model: string;
  provider: "anthropic" | "openai" | "google" | "openrouter" | "workers-ai";
  costPerMTokInput: number;
  costPerMTokOutput: number;
  maxContextTokens: number;
  capabilities: Set<"code" | "reasoning" | "simple" | "embedding">;
  rateLimit: { requests: number; perSeconds: number };
  currentUsage: { requests: number; windowStart: number };
}

class ExecutorRouter {
  private executors: ExecutorConfig[];
  private dailyBudget: number;
  private dailySpend: number = 0;

  constructor(executors: ExecutorConfig[], dailyBudget: number) {
    this.executors = executors;
    this.dailyBudget = dailyBudget;
  }

  selectExecutor(task: {
    type: "code" | "reasoning" | "simple";
    estimatedInputTokens: number;
    estimatedOutputTokens: number;
    priority: "high" | "normal" | "low";
  }): ExecutorConfig | null {
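    // Filter by capability and by whether the task fits the context window
    const capable = this.executors.filter(
      (e) =>
        e.capabilities.has(task.type) &&
        task.estimatedInputTokens + task.estimatedOutputTokens <= e.maxContextTokens
    );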

    // Filter by rate limit
    const available = capable.filter((e) => {
      const now = Date.now() / 1000;
      if (now - e.currentUsage.windowStart > e.rateLimit.perSeconds) {
        e.currentUsage = { requests: 0, windowStart: now };
      }
      return e.currentUsage.requests < e.rateLimit.requests;
    });

    if (available.length === 0) return null;

    // Estimate cost for each
    const withCost = available.map((e) => ({
      executor: e,
      estimatedCost:
        (task.estimatedInputTokens / 1_000_000) * e.costPerMTokInput +
        (task.estimatedOutputTokens / 1_000_000) * e.costPerMTokOutput,
    }));

    // Budget check
    const affordable = withCost.filter(
      (e) => this.dailySpend + e.estimatedCost <= this.dailyBudget
    );

    if (affordable.length === 0) {
      // Budget exceeded -- only allow free tier
      const free = withCost.filter((e) => e.estimatedCost === 0);
      return free.length > 0 ? free[0].executor : null;
    }

    // Strategy by priority
    if (task.priority === "high") {
      // Most expensive affordable model, using price as a rough proxy for capability
      return affordable.sort(
        (a, b) => b.estimatedCost - a.estimatedCost
      )[0].executor;
    }

    // Normal/Low: cheapest capable model
    return affordable.sort(
      (a, b) => a.estimatedCost - b.estimatedCost
    )[0].executor;
  }

  recordUsage(executor: ExecutorConfig, actualCost: number): void {
    executor.currentUsage.requests++;
    this.dailySpend += actualCost;
  }
}

// Example configuration:
const executors: ExecutorConfig[] = [
  {
    name: "claude-sonnet",
    model: "claude-sonnet-4-5-20250929",
    provider: "anthropic",
    costPerMTokInput: 3,
    costPerMTokOutput: 15,
    maxContextTokens: 200_000,
    capabilities: new Set(["code", "reasoning", "simple"]),
    rateLimit: { requests: 50, perSeconds: 60 },
    currentUsage: { requests: 0, windowStart: 0 },
  },
  {
    name: "claude-haiku",
    model: "claude-3-5-haiku-20241022",
    provider: "anthropic",
    costPerMTokInput: 0.8,
    costPerMTokOutput: 4,
    maxContextTokens: 200_000,
    capabilities: new Set(["simple", "code"]),
    rateLimit: { requests: 100, perSeconds: 60 },
    currentUsage: { requests: 0, windowStart: 0 },
  },
  {
    name: "gemini-flash",
    model: "gemini-2.0-flash",
    provider: "google",
    costPerMTokInput: 0.1,
    costPerMTokOutput: 0.4,
    maxContextTokens: 1_000_000,
    capabilities: new Set(["simple", "reasoning"]),
    rateLimit: { requests: 15, perSeconds: 60 },
    currentUsage: { requests: 0, windowStart: 0 },
  },
  {
    name: "workers-ai-llama",
    model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    provider: "workers-ai",
    costPerMTokInput: 0,  // Free tier
    costPerMTokOutput: 0,
    maxContextTokens: 8_192,
    capabilities: new Set(["simple"]),
    rateLimit: { requests: 300, perSeconds: 60 },
    currentUsage: { requests: 0, windowStart: 0 },
  },
];

| Don’t | Do Instead | Why |
|---|---|---|
| Store all memory in a vector database only | Use hybrid search (vector + keyword) with human-readable MEMORY.md | Vector-only misses exact terms and is impossible to debug by inspection |
| Give agents unrestricted shell access | Use container isolation (NanoClaw) or tool allowlists (ZeroClaw) | One rm -rf / away from disaster. Security is not optional |
| Use one model for everything | Route by task complexity: Haiku for simple, Sonnet for code, Opus for reasoning | A single Opus call can cost up to roughly 60x a Haiku call, depending on the model versions. Cost control requires routing |
| Build multi-agent when single-agent suffices | Start single-agent, add coordination only when demonstrably needed | Multi-agent adds latency, cost, and debugging complexity |
| Rely on LLM memory alone (no persistent store) | Persist key facts to files or database after every session | LLMs forget everything between context windows. Persistent memory is infrastructure, not optional |
| Use push-based agent coordination | Use polling (check for new tasks every N seconds) | Push systems are harder to debug, harder to make reliable, harder to replay |
| Skip idempotency in task execution | Check if task was already completed before executing | Agents retry. Networks flake. With at-least-once delivery, duplicate execution will eventually happen |
| Store embeddings without an LRU cache | Cache recent embeddings (ZeroClaw caches 10K) | Embedding API calls are slow and expensive. Cache hits are free |
| Use a stalled/dead framework (SuperAGI, early BabyAGI) | Use actively maintained frameworks | Unpatched security vulnerabilities, no bug fixes, community has moved on |
| Lock into a single LLM provider | Abstract the LLM interface (provider trait/interface) | Pricing changes, outages, and new models are constant. Provider agnosticism is insurance |
| Inject all skills into every prompt | Filter skills by context, load only relevant ones | 5,400 skills in the prompt would consume the entire context window. Selective injection is mandatory |
| Build custom memory from scratch | Adopt the hybrid search pattern (SQLite + FTS5 + vector) | This is a solved problem. OpenClaw, ZeroClaw, and NullClaw all converge on the same design |
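
The idempotency row is the one most often skipped, so here is a minimal sketch of the check. `CompletionLedger` and `runOnce` are hypothetical names, and the in-memory map stands in for a durable store such as SQLite or D1.

```typescript
// Hypothetical ledger of finished tasks. In production this would live in a
// durable store rather than in process memory.
class CompletionLedger {
  private results = new Map<string, string>();

  get(taskId: string): string | undefined {
    return this.results.get(taskId);
  }

  record(taskId: string, result: string): void {
    this.results.set(taskId, result);
  }
}

// Run `work` at most once per task ID. A redelivered task returns the
// recorded result instead of executing again.
function runOnce(
  ledger: CompletionLedger,
  taskId: string,
  work: () => string
): string {
  const previous = ledger.get(taskId);
  if (previous !== undefined) return previous;
  const result = work();
  ledger.record(taskId, result);
  return result;
}

const ledger = new CompletionLedger();
let executions = 0;
const openPullRequest = () => {
  executions++;
  return `pr-${executions}`;
};

const firstRun = runOnce(ledger, "issue-17:open-pr", openPullRequest);
// Duplicate delivery of the same task: the side effect does not run again.
const secondRun = runOnce(ledger, "issue-17:open-pr", openPullRequest);
// firstRun and secondRun are both "pr-1".
```

This sketch still has a gap: a crash after `work()` but before `record()` leads to a second execution. Closing it requires the side effect itself to be idempotent, for example by passing the task ID to the downstream API as a deduplication key.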

Personal AI Assistant (Chat + Automation)

Recommended: OpenClaw (full-featured) or NanoBot (lightweight)

OpenClaw if you want the largest ecosystem, most messaging platform integrations, and battle-tested memory system. NanoBot if you want something lighter with knowledge graph memory and Python extensibility. NanoClaw if you want container security and are Claude-only.

Edge / IoT / Embedded

Recommended: PicoClaw (Go, <10MB) or NullClaw (Zig, 678KB)

PicoClaw for the widest architecture support (RISC-V, ARM, MIPS, x86) and Go’s ecosystem. NullClaw for the absolute smallest footprint and hardware peripheral support.

Multi-Agent Team Workflows

Recommended: CrewAI (role-based) or LangGraph (graph-based)

CrewAI for intuitive role definitions and the largest tool ecosystem (7,000+). LangGraph for the strongest checkpointing, persistence, and human-in-the-loop story. AutoGen if you need enterprise features and Azure integration.

TinyClaw is interesting for lightweight team collaboration with fan-out patterns.

Production Enterprise Deployment

Recommended: LangGraph + PostgresSaver or Microsoft Agent Framework

LangGraph is GA at v1.0.10 with full checkpoint recovery and production tooling. Microsoft Agent Framework for Azure-native shops that need GDPR, OpenTelemetry, and enterprise support.

Serverless / Edge-Native Agents

Recommended: Cloudflare Agents SDK

The only framework where each agent is a Durable Object with auto-persistent state, built-in scheduling, WebSocket communication, and queue integration. No alternative matches this for edge-native deployment.

Auto-Learning Agents

Recommended: Agent Zero (adopt the pattern) + your framework of choice

No other framework does automatic knowledge extraction. Implement Agent Zero’s extract-embed-consolidate pattern on top of whatever framework you choose.

Cost-Optimized Autonomous Workers

Recommended: Build a multi-executor router (see implementation above)

No framework handles this well out of the box. You need to build a routing layer that picks the cheapest capable model per task and enforces daily budgets. The executor router pattern in this article is a starting point.

Code Review / Development Agents

Recommended: Claude Agent SDK or OpenAI Agents SDK

Claude Agent SDK gives you the full Claude Code toolset (Read, Write, Edit, Bash). OpenAI Agents SDK gives you provider flexibility with clean handoff patterns.

Recommendation Matrix

| Use Case | First Choice | Runner-Up | Avoid |
| --- | --- | --- | --- |
| Personal assistant | OpenClaw | NanoBot | SuperAGI |
| Edge / IoT | PicoClaw | NullClaw | OpenClaw (too heavy) |
| Multi-agent teams | CrewAI | LangGraph | BabyAGI |
| Enterprise | LangGraph | Microsoft Agent Framework | Swarm (deprecated) |
| Serverless | CF Agents SDK | None | Local-only frameworks |
| Auto-learning | Agent Zero pattern | None | Frameworks without persistence |
| Cost-optimized | Custom router | None | Single-provider frameworks |
| Code agents | Claude Agent SDK | OpenAI Agents SDK | AutoGPT |
| Type safety | Pydantic AI | LangGraph | Untyped frameworks |
| Software company sim | MetaGPT | CrewAI | Single-agent frameworks |

Our autonomous issue worker needs:

  1. Multi-executor backends — The executor router pattern above solves this. Abstract the LLM interface, route by task complexity and budget. Start with Claude Sonnet for code, Haiku for simple tasks, Workers AI for free-tier simple tasks.

  2. A skill system — Adopt the SKILL.md pattern from OpenClaw. Skills as markdown files with YAML frontmatter. No SDK, no compilation. The skill loader implementation above is directly usable.

  3. A coordinator (L2) — LangGraph’s graph model is the best pattern for multi-step workflows. But for a single-agent system, a simple state machine with checkpoint/resume is sufficient. Start with Pydantic AI’s durable execution or implement checkpoint/resume over D1.

  4. Memory across runs — Implement the hybrid memory system above. SQLite + FTS5 + vector embeddings. Store MEMORY.md for human inspection. Add auto-learning (Agent Zero pattern) once the basic system works.

  5. Cost optimization — The executor router with daily budgets, rate limit awareness, and free-tier fallback. Track actual costs per task and adjust routing thresholds based on real data.
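The "simple state machine with checkpoint/resume" from item 3 can be sketched in a few lines. This is an illustration under stated assumptions, not the article's implementation: Python's built-in `sqlite3` stands in for D1 (which is SQLite-based), and the step names, table layout, and function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical linear workflow for an issue worker; the order defines the state machine.
STEPS = ["fetch_issue", "plan", "apply_patch", "open_pr"]


def open_store(path=":memory:"):
    """Open the checkpoint store (local SQLite here, standing in for D1)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "task_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL, state TEXT NOT NULL)"
    )
    db.commit()
    return db


def load_checkpoint(db, task_id):
    """Return (next_step_index, state); a task with no checkpoint starts at step 0."""
    row = db.execute(
        "SELECT next_step, state FROM checkpoints WHERE task_id = ?", (task_id,)
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else (0, {})


def save_checkpoint(db, task_id, next_step, state):
    db.execute(
        "INSERT OR REPLACE INTO checkpoints (task_id, next_step, state) VALUES (?, ?, ?)",
        (task_id, next_step, json.dumps(state)),
    )
    db.commit()


def run_task(db, task_id, handlers):
    """Run the remaining steps, persisting a checkpoint after each completed step.

    handlers maps a step name to a function taking the state dict and returning
    the updated state dict. If a handler raises, the last saved checkpoint stays
    intact, so calling run_task again resumes at the failed step.
    """
    step, state = load_checkpoint(db, task_id)
    while step < len(STEPS):
        state = handlers[STEPS[step]](state)
        step += 1
        save_checkpoint(db, task_id, step, state)
    return state
```

Because a checkpoint is written only after a step succeeds, a crash mid-step means that step runs again on resume, so step handlers should be idempotent or detect work that is already done.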

Patterns to adopt: SKILL.md skills, hybrid memory search, auto-learning extraction, multi-executor routing, checkpoint/resume state

Patterns to skip: Multi-agent coordination (overkill for our use case), push-based communication (polling is simpler), full graph workflow engine (state machine is sufficient)


