Autonomous Agent Frameworks

Org Status: 🟡 Dormant | Cloudflare: N/A | Last Audited: 2026-04-28

The autonomous agent ecosystem exploded in early 2026. OpenClaw became the fastest-growing open-source project in GitHub history, crossing 302K stars in 60 days. A wave of lightweight alternatives followed — NanoClaw, PicoClaw, NullClaw, ZeroClaw, NanoBot, TinyClaw — each making different tradeoffs around memory, skill systems, communication patterns, and execution models. Meanwhile, enterprise frameworks like LangGraph, CrewAI, AutoGen, and the OpenAI Agents SDK matured toward production readiness.

This article systematically compares 18 agent frameworks across every dimension that matters: architecture, memory, skills, communication, execution, state management, and ecosystem. The goal is not to crown a winner — it is to map the design space so you can pick the right patterns for your system.

What you will learn:

How each framework implements memory (short-term, long-term, vector search, auto-learning)
How skills and tools are defined, composed, and shared across agents
What communication patterns exist between agents and with humans
How execution models differ (local CLI, cloud, serverless, edge)
Which architectural patterns emerge as consensus across frameworks
Recommendations by use case with honest tradeoffs

The Problem
Framework Overview
Deep Dives
Master Feature Matrix
Memory Systems Compared
Skills and Tools Compared
Communication Patterns Compared
Execution Models Compared
State Management Compared
Ecosystem Compared
Architecture Patterns That Emerge
Implementation Deep Dives
Anti-Patterns
Recommendations by Use Case
References

Building an autonomous agent that works reliably requires solving at least six hard problems simultaneously:

Memory — How does the agent remember what happened last session? Last week? How does it decide what to forget?
Skills — How do you teach an agent a repeatable procedure? Can skills be shared, versioned, composed?
Communication — How do agents talk to each other? How does a human interrupt or redirect?
Execution — Where does the agent run? How do you sandbox it? How do you control cost?
State — Can an agent resume from a checkpoint? Can it survive a crash?
Coordination — In multi-agent systems, who decides what runs next?

No single framework solves all six perfectly. Most are strong in one or two areas and weak in the rest. The frameworks that emerged in 2025-2026 represent the first generation of serious attempts at production-grade autonomous agents, and they make radically different design choices.

What changes if you get this right

A well-chosen agent architecture lets you:

Run autonomous workflows that span hours or days without human intervention
Route tasks to the cheapest capable model (Haiku for simple, Opus for complex)
Resume from failures without losing progress
Share learned procedures across agents without copy-pasting prompts
Scale from a single local agent to a distributed multi-agent system
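
The routing idea in the list above can be sketched as a lookup over a cost-ordered table of model tiers. This is a minimal illustration rather than any framework's API: the tier names, the relative costs, and the `estimate_complexity` heuristic are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical tier label, not a real model ID
    max_complexity: int   # highest task complexity this tier should handle
    relative_cost: float  # illustrative cost units, not real pricing


# Hypothetical tiers, ordered from cheapest to most expensive.
TIERS = [
    ModelTier("haiku-tier", max_complexity=3, relative_cost=1.0),
    ModelTier("sonnet-tier", max_complexity=7, relative_cost=4.0),
    ModelTier("opus-tier", max_complexity=10, relative_cost=20.0),
]


def estimate_complexity(task: str) -> int:
    """Toy heuristic (an assumption): long or design-heavy tasks score higher."""
    score = 1 + len(task.split()) // 20
    score += 3 * sum(kw in task.lower() for kw in ("refactor", "design", "prove"))
    return min(score, 10)


def route(task: str) -> ModelTier:
    """Return the cheapest tier whose capability covers the task."""
    complexity = estimate_complexity(task)
    for tier in TIERS:
        if complexity <= tier.max_complexity:
            return tier
    return TIERS[-1]
```

In practice the complexity estimate is the hard part; a real router might use a small classifier model or past failure rates instead of keyword counts.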

What happens if you get it wrong

Agents that forget everything between sessions
Skills that are brittle, non-transferable prompt hacks
Communication that is either missing (agents can’t coordinate) or chaotic (message storms)
Execution that blows through token budgets with no circuit breaker
State that vanishes on crash, forcing full restarts
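
The missing circuit breaker mentioned above can be as small as a counter that refuses further model calls once a token limit is crossed. The sketch below is an assumption about one reasonable shape for it; `TokenBudget`, `BudgetExceeded`, and `run_agent_loop` are hypothetical names, and the per-step costs stand in for real usage numbers reported by a provider.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run would spend more tokens than it is allowed to."""


class TokenBudget:
    """Circuit breaker: trips once cumulative token usage passes the limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0
        self.tripped = False

    def charge(self, tokens: int) -> None:
        """Record usage for one model call; refuse further calls once tripped."""
        if self.tripped:
            raise BudgetExceeded("budget already exhausted")
        self.used += tokens
        if self.used > self.limit:
            self.tripped = True
            raise BudgetExceeded(f"used {self.used} of {self.limit} tokens")


def run_agent_loop(step_costs: list[int], budget: TokenBudget) -> int:
    """Run steps until done or the breaker trips; return completed step count."""
    completed = 0
    for cost in step_costs:  # each cost stands in for one LLM call's usage
        try:
            budget.charge(cost)
        except BudgetExceeded:
            break
        completed += 1
    return completed
```

Once tripped, the breaker stays open, so a retry loop cannot quietly keep spending.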

The Claw Family (OpenClaw-derived)

The single biggest event in the agent framework space was OpenClaw’s explosive growth. Peter Steinberger’s personal AI assistant went from obscure side project to 302K GitHub stars in 60 days, surpassing React’s 10-year record. Steinberger joined OpenAI in February 2026 and moved the project to an open-source foundation.

This spawned an entire family of lightweight alternatives:

Framework	Language	Binary Size	RAM	Startup	Stars	Focus
OpenClaw	TypeScript	~200MB (Node)	~1GB	~5s	302K	Full-featured personal AI assistant
NanoClaw	TypeScript	~50MB (Node)	~200MB	~3s	22K	Container-isolated, Claude-native
NanoBot	Python	N/A	~100MB	~2s	27K	Ultra-light, knowledge graph memory
PicoClaw	Go	<10MB	<10MB	<1s	18K	Edge/IoT, $10 hardware
ZeroClaw	Rust	~16MB	~5MB	<50ms	17K	Trait-driven, pluggable everything
NullClaw	Zig	678KB	~1MB	<2ms	12K	Smallest possible, hardware peripherals
TinyClaw	TypeScript	~60MB (Node)	~300MB	~3s	8K	Multi-agent teams, collaboration

Enterprise / Research Frameworks

Framework	Language	Stars	Focus
LangGraph	Python/TS	10K	Stateful agent graphs, checkpointing
CrewAI	Python	46K	Multi-agent orchestration, role-based
AutoGen / Microsoft Agent Framework	Python/.NET	38K	Enterprise multi-agent, async messaging
OpenAI Agents SDK	Python/TS	15K	Production evolution of Swarm
MetaGPT	Python	42K	SOP-driven software company simulation
Claude Agent SDK	Python/TS	N/A	Claude Code tooling as a framework
Pydantic AI	Python	15K	Type-safe agents, structured output
Agent Zero	Python	12K	Auto-learning, hierarchical subordinates
Cloudflare Agents SDK	TypeScript	N/A	Durable Objects as agents, edge-native

Legacy / Educational

Framework	Language	Stars	Status
AutoGPT	Python/TS	170K	Pivoted to low-code platform
BabyAGI	Python	20K	Experimental, self-building
SuperAGI	Python	15K	Stalled since Jan 2024
Swarm (OpenAI)	Python	18K	Replaced by Agents SDK

OpenClaw

What it is: The most popular autonomous AI agent framework. Runs locally, connects to 20+ messaging platforms, and provides a hub-and-spoke architecture with a local WebSocket gateway.

Strengths:

Massive ecosystem: 5,400+ skills in the official registry, 103 production-ready agent templates
File-based memory system that is easy to inspect, debug, and version control
Hybrid search (vector + BM25) over per-agent SQLite with temporal decay
Multi-provider LLM support with fallback chains
Skills are just markdown files (SKILL.md) — no SDK, no compilation

Weaknesses:

Heavy: ~1GB RAM, 5-second startup. Not suitable for edge or constrained environments
Local-only architecture. Cloud deployment requires MoltWorker or similar wrappers
The 302K stars created a gold rush of low-quality forks and plugins
Agent-to-agent communication is opt-in and somewhat clunky

Architecture:

User (WhatsApp/Telegram/Slack/...)
        |
Local Gateway (ws://127.0.0.1:18789)
        |
Agent Runtime Sessions
  |-- SOUL.md      (personality)
  |-- AGENTS.md    (behavior rules, injected every turn)
  |-- MEMORY.md    (long-term curated facts)
  |-- memory/      (daily logs, temporal decay)
  |-- skills/      (SKILL.md playbooks, selectively injected)
  |-- SQLite       (sessions, search index)

Skill definition example:

---
name: deploy-to-cloudflare
description: Deploy a Cloudflare Workers project using Wrangler
requires:
  binaries: ["wrangler", "node"]
  env: ["CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_API_TOKEN"]
tags: ["deployment", "cloudflare", "infrastructure"]
---


1. Verify wrangler.jsonc exists in the project root
2. Run `wrangler whoami` to verify authentication
3. Run `wrangler deploy` and capture output
4. Verify deployment by checking the output URL
5. If deployment fails, read error output and attempt fix


- If CLOUDFLARE_API_TOKEN is missing, tell the user
- If wrangler.jsonc is missing, check for wrangler.toml and convert
- If deployment fails with route conflict, suggest manual resolution

Key insight: OpenClaw proved that skills-as-markdown is a viable pattern. No SDK, no compilation, no dependency management. Just a folder with a SKILL.md file. This pattern has been adopted by nearly every framework in the Claw family.
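
Because a skill is only a file, checking whether it can run is a matter of reading the `requires` block and looking for the binaries and environment variables it names. The sketch below assumes the simple front matter layout shown in the example (inline JSON-style lists). `parse_requires` and `missing_requirements` are hypothetical helpers, not OpenClaw functions, and a real loader would use a full YAML parser.

```python
import json
import os
import shutil
from typing import Mapping, Optional


def parse_requires(skill_md: str) -> dict[str, list[str]]:
    """Extract the `requires:` lists from SKILL.md front matter."""
    _, front_matter, _ = skill_md.split("---", 2)
    requires: dict[str, list[str]] = {}
    in_requires = False
    for line in front_matter.splitlines():
        if line.startswith("requires:"):
            in_requires = True
        elif in_requires and line.startswith("  ") and ":" in line:
            key, value = line.strip().split(":", 1)
            requires[key] = json.loads(value)  # inline lists are valid JSON
        elif line.strip():
            in_requires = False  # any other top-level key ends the block
    return requires


def missing_requirements(
    requires: Mapping[str, list[str]],
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, list[str]]:
    """Return the binaries not on PATH and the env vars that are unset."""
    env = os.environ if env is None else env
    return {
        "binaries": [b for b in requires.get("binaries", []) if shutil.which(b) is None],
        "env": [v for v in requires.get("env", []) if not env.get(v)],
    }
```

A runtime could call `missing_requirements` before injecting a skill and skip any skill whose result is non-empty.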

NanoClaw

What it is: A container-isolated, Claude-native alternative to OpenClaw. ~500 lines of core TypeScript. Each agent runs in its own Linux container with filesystem isolation.

Strengths:

Security through container isolation (Docker or Apple Container)
Built directly on the Claude Agent SDK
Dead simple filesystem IPC (JSON files polled by host every second)
“Skills over Features” philosophy — users add capabilities by having Claude modify the codebase

Weaknesses:

Claude-only. No multi-provider support
Container overhead means slightly higher latency per request
Smaller ecosystem than OpenClaw

Architecture:

Platform -> Channel.onMessage() -> storeMessage(SQLite)
  -> MessageLoop polls -> GroupQueue enqueues
  -> runContainerAgent() spawns container
  -> Claude Agent SDK processes
  -> IPC files written -> Host polls & routes
  -> Channel delivers to platform

IPC structure:

data/ipc/{group}/
  |-- messages/            # Outbound message JSON files
  |-- tasks/               # Schedule/pause/cancel task JSONs
  |-- current_tasks.json   # Host -> container snapshot
  |-- available_groups.json

Tool definition example (MCP tools inside container):

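// NanoClaw exposes these as MCP tools to Claude inside each container
import { mkdir, readFile, writeFile } from "node:fs/promises";

// Assumption: the host injects the current group name when the container starts
declare const currentGroup: string;

const tools = {
  send_message: async (params: { group: string; text: string }) => {
    // Write JSON to data/ipc/{group}/messages/, creating the directory if needed
    const dir = `data/ipc/${params.group}/messages`;
    await mkdir(dir, { recursive: true });
    await writeFile(
      `${dir}/${Date.now()}.json`,
      JSON.stringify({ text: params.text, timestamp: new Date().toISOString() })
    );
  },
  schedule_task: async (params: { name: string; cron: string; prompt: string }) => {
    // One JSON file per task; the host picks it up on its next poll
    const dir = `data/ipc/${currentGroup}/tasks`;
    await mkdir(dir, { recursive: true });
    await writeFile(
      `${dir}/${params.name}.json`,
      JSON.stringify({ action: "schedule", cron: params.cron, prompt: params.prompt })
    );
  },
  list_tasks: async () => {
    // Read as UTF-8 so JSON.parse receives a string rather than a Buffer
    const tasks = await readFile(`data/ipc/${currentGroup}/current_tasks.json`, "utf8");
    return JSON.parse(tasks);
  },
};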

Key insight: NanoClaw proves you can build a production-capable agent system in ~500 lines by standing on the Claude Agent SDK. The “skills over features” model — where customization means code changes, not configuration — avoids the configuration sprawl that plagues larger frameworks.
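
The host side of the filesystem IPC described above can be sketched as a loop that drains each group's `messages/` directory once per second. This is an illustration of the pattern under stated assumptions, not NanoClaw's actual code: `drain_outbox`, `poll_forever`, and the `deliver` callback are hypothetical names.

```python
import json
import time
from pathlib import Path
from typing import Callable


def drain_outbox(ipc_root: Path, deliver: Callable[[str, dict], None]) -> int:
    """Deliver and delete every pending message file; return how many were sent."""
    sent = 0
    # Millisecond-timestamp file names sort chronologically within a group.
    for path in sorted(ipc_root.glob("*/messages/*.json")):
        group = path.parent.parent.name
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # file may be half-written; retry on the next poll
        deliver(group, payload)
        path.unlink()  # delete only after delivery so a crash cannot drop it
        sent += 1
    return sent


def poll_forever(ipc_root: Path, deliver: Callable[[str, dict], None],
                 interval_s: float = 1.0) -> None:
    """Poll the IPC directory at a fixed interval (one second by default)."""
    while True:
        drain_outbox(ipc_root, deliver)
        time.sleep(interval_s)
```

Deleting after delivery gives at-least-once semantics: a crash between the two steps resends the message on restart instead of losing it.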

NanoBot (HKUDS)

What it is: An ultra-lightweight Python alternative to OpenClaw. ~4,000 lines. Delivers core agent functionality with 99% less code than OpenClaw.

Strengths:

Stateful knowledge graph memory — the agent builds a local graph of user history and context
Model-agnostic: works with OpenAI, Anthropic, local models
MCP support for external tool integration
ClawHub skill — search and install public agent skills
Clean Python codebase that is easy to read and extend

Weaknesses:

Python means slower startup than Go/Rust/Zig alternatives
Knowledge graph memory is more complex to debug than flat file memory
Smaller community than OpenClaw (27K vs 302K stars)

Memory example:
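
A minimal sketch of what a knowledge-graph memory can look like, assuming facts are stored as (subject, relation, object) triples. The `GraphMemory` class and its methods are hypothetical and are not NanoBot's actual API.

```python
from collections import defaultdict


class GraphMemory:
    """Toy knowledge graph: subject -> relation -> set of objects."""

    def __init__(self) -> None:
        self._edges: dict[str, dict[str, set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def remember(self, subject: str, relation: str, obj: str) -> None:
        """Store one fact as a directed, labeled edge."""
        self._edges[subject][relation].add(obj)

    def recall(self, subject: str, relation: str) -> set[str]:
        """Return every object linked to `subject` by `relation`."""
        return set(self._edges.get(subject, {}).get(relation, set()))

    def neighbors(self, subject: str, depth: int = 1) -> set[str]:
        """Entities reachable from `subject` within `depth` hops."""
        seen: set[str] = set()
        frontier = {subject}
        for _ in range(depth):
            next_frontier: set[str] = set()
            for node in frontier:
                for objects in self._edges.get(node, {}).values():
                    next_frontier |= objects - seen
            seen |= next_frontier
            frontier = next_frontier
        seen.discard(subject)
        return seen
```

Multi-hop lookups such as `neighbors("user", depth=2)` are what a graph gives you over flat files, and also what makes the memory harder to inspect by eye.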

PicoClaw

What it is: An ultra-lightweight Go-based agent. <10MB RAM, boots in 1 second, runs on $10 RISC-V hardware. 95% of core code is AI-generated through a self-bootstrapping process.

Strengths:

Single static binary across RISC-V, ARM, MIPS, and x86
400x faster startup than OpenClaw
Gateway command for multi-platform messaging (Telegram, Discord, QQ, DingTalk)
Multi-provider LLM support (OpenRouter, Anthropic, OpenAI, DeepSeek, Groq)

Weaknesses:

Multi-agent collaboration is still in progress (Issue #294)
Smaller skill ecosystem
Memory system less sophisticated than OpenClaw’s hybrid search

Use case: Edge computing, IoT devices, Raspberry Pi, self-hosted agents on cheap hardware.

NullClaw

What it is: The smallest possible autonomous agent. 678KB binary, ~1MB RAM, <2ms startup. Written in raw Zig with zero dependencies — no Python, no JVM, no Go runtime.

Strengths:

23+ LLM providers, 18 channels, 18+ tools in a 678KB binary
Hybrid vector + FTS5 memory search in self-contained SQLite
Hardware peripheral support (MaixCam, sensors)
Multi-layer sandbox for security
Vtable-driven architecture — every subsystem is pluggable

Weaknesses:

Zig ecosystem is small — fewer contributors, harder to find developers
Documentation is sparser than TypeScript/Python alternatives
Community and plugin ecosystem much smaller

Architecture pattern:

// NullClaw's vtable-driven extension model
// Every subsystem implements a simple interface

const ProviderVTable = struct {
    init: *const fn (config: *const Config) anyerror!void,
    complete: *const fn (messages: []const Message) anyerror!Response,
    embed: *const fn (text: []const u8) anyerror![]f32,
    deinit: *const fn () void,
};

const ChannelVTable = struct {
    init: *const fn (config: *const Config) anyerror!void,
    receive: *const fn () anyerror!?InboundMessage,
    send: *const fn (message: OutboundMessage) anyerror!void,
    deinit: *const fn () void,
};

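// Register a new provider. `provider_registry` is assumed to be a
// std.StringHashMap(ProviderVTable); put() can fail on allocation,
// so the error is propagated instead of being discarded.
pub fn registerProvider(name: []const u8, vtable: ProviderVTable) !void {
    try provider_registry.put(name, vtable);
}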

Key insight: NullClaw proves that a full-featured agent runtime (providers, channels, tools, memory, sandbox) can fit in under 1MB. The vtable pattern helps make this possible: each subsystem is a plain struct of function pointers, so pluggability costs one pointer indirection per call rather than a runtime or a heavyweight abstraction layer.

ZeroClaw

What it is: A Rust-based agent runtime. ~16MB binary, ~5MB RAM. Trait-driven architecture where every subsystem is swappable.

Strengths:

Hybrid memory search: 70% vector (cosine similarity) + 30% FTS5 (BM25), tunable weights
Authentication pairing, workspace isolation, explicit tool allowlists
TOML-based skill manifests with community skill packs
Auto-recall: context automatically retrieved based on task
Embedding cache (LRU, 10K entries) for performance

Weaknesses:

Rust compile times slow down development iteration
Smaller community than Go or TypeScript alternatives
Multiple unofficial forks create confusion about which repository is the official one

Memory configuration:

[memory]
backend = "sqlite"
vector_weight = 0.7
keyword_weight = 0.3
embedding_model = "nomic-embed-text"
embedding_cache_size = 10000
auto_recall = true

[memory.retention]
short_term_hours = 24
long_term_days = 365
decay_half_life_days = 30

[skills]
paths = ["./skills", "~/.zeroclaw/skills"]
community_registry = "https://registry.zeroclawlabs.ai"
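
To make the weights and half-life in this configuration concrete, the sketch below blends a vector score and a keyword score and then applies exponential decay by age. How ZeroClaw actually combines these terms is not stated in the source; multiplying the blended score by the decay factor, and assuming both input scores are normalized to [0, 1], are assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def decay_factor(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: a memory loses half its weight every half-life."""
    return 0.5 ** (age_days / half_life_days)


def hybrid_score(
    vector_sim: float,
    keyword_score: float,
    age_days: float,
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
    half_life_days: float = 30.0,
) -> float:
    """Weighted blend of both scores, discounted by the memory's age."""
    blended = vector_weight * vector_sim + keyword_weight * keyword_score
    return blended * decay_factor(age_days, half_life_days)
```

With the defaults above, a 30-day-old memory with vector similarity 0.8 and keyword score 0.5 scores (0.7 × 0.8 + 0.3 × 0.5) × 0.5 = 0.355.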

TinyClaw

What it is: A multi-agent team collaboration framework. Agents work in teams, communicate through persistent chat rooms, and hand off work via chain execution and fan-out patterns.

Strengths:

First-class multi-agent teams: coder, reviewer, writer, researcher collaborate autonomously
TinyOffice web portal for monitoring agents, teams, queues, and event feeds
Actor model with simple message queue for agent-to-agent communication
Cross-channel context sharing (Discord, WhatsApp, Telegram)
Fan-out: one agent mentions a teammate, work distributes in parallel

Weaknesses:

Higher memory footprint than single-agent alternatives (~300MB)
Team coordination adds latency vs single-agent execution
Newer framework — API still evolving

Multi-agent team definition:

// TinyClaw team configuration
const team = {
  id: "content-team",
  agents: [
    {
      id: "researcher",
      model: "claude-sonnet-4-5-20250929",
      systemPrompt: "You are a research specialist...",
      tools: ["web_search", "read_file", "write_file"],
    },
    {
      id: "writer",
      model: "claude-sonnet-4-5-20250929",
      systemPrompt: "You are a content writer...",
      tools: ["read_file", "write_file", "send_message"],
    },
    {
      id: "reviewer",
      model: "claude-3-5-haiku-20241022",
      systemPrompt: "You are a content reviewer...",
      tools: ["read_file", "send_message"],
    },
  ],
  // Persistent team chat room -- all agents see all messages
  chatRoom: {
    persistence: "sqlite",
    broadcastAll: true,
  },
};

// Fan-out: researcher mentions @writer and @reviewer
// Both receive the message and process in parallel
// Responses flow back to the team chat room
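
The fan-out behavior described in the comments above can be sketched as mention parsing followed by concurrent dispatch. This is a generic illustration using Python's asyncio, not TinyClaw's implementation: `run_agent` is a hypothetical stand-in for a real model call, and the `@name` mention syntax is assumed from the description.

```python
import asyncio
import re

MENTION = re.compile(r"@(\w+)")


async def run_agent(agent_id: str, message: str) -> str:
    """Stand-in for a real model call (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{agent_id} handled: {message}"


async def fan_out(message: str, team: set[str]) -> list[str]:
    """Dispatch `message` to every mentioned teammate concurrently."""
    # Ignore mentions of names that are not on the team; sort for stable order.
    mentioned = sorted({name for name in MENTION.findall(message) if name in team})
    replies = await asyncio.gather(*(run_agent(name, message) for name in mentioned))
    return list(replies)
```

`asyncio.gather` returns replies in the order the coroutines were passed, so the result order is deterministic even though the agents run concurrently.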

Agent Zero

What it is: A general-purpose AI agent with the best auto-learning system in the open-source space. Hierarchical superior-subordinate model with Docker-isolated execution.

Strengths:

Auto-learning is the killer feature: FAISS vector memory with automatic fact extraction after every agent turn
Auto-consolidation: related memories merge and summarize over time
Model-agnostic with 3 slots (chat, utility, embedding) that can mix providers
Subordinate spawning: any agent can create sub-agents for specialized tasks
The agent genuinely gets better over time without explicit training

Weaknesses:

Smaller community (12K stars)
Docker dependency for execution isolation
Python-only
No built-in messaging platform integration (it is a framework, not a personal assistant)

Auto-learning memory flow:


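# Pseudocode: this flow runs after every agent turn
# 1. The utility model extracts memorable items from the turn
extracted = utility_model.extract(
    conversation_turn,
    categories=["facts", "solutions", "preferences", "errors"]
)

# 2. Each item is embedded and stored alongside its metadata
for item in extracted:
    embedding = embedding_model.embed(item.text)
    faiss_index.add(embedding, metadata={
        "category": item.category,
        "timestamp": now(),
        "source_conversation": conversation_id,
    })

# 3. On later turns, the query is embedded and the nearest memories are recalled
query_embedding = embedding_model.embed(current_query)
memories = faiss_index.search(query_embedding, k=10)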

Key insight: Agent Zero’s auto-learning is its most differentiating feature. Among the frameworks compared here, it is the only one that automatically extracts, categorizes, embeds, and consolidates knowledge from conversations. Most other systems require the user or developer to manage memory explicitly.

LangGraph

What it is: LangChain’s agent graph framework. Models agent workflows as state machines with nodes, edges, and reducers. Reached 1.0 GA in October 2025.

Strengths:

Strongest checkpointing and persistence story: MemorySaver, SqliteSaver, PostgresSaver
Thread-based state management with time-travel debugging
Human-in-the-loop built into the graph model (interrupt nodes)
Reducer-driven state schemas prevent data loss in multi-agent systems
Production-proven at scale

Weaknesses:

LangChain dependency (large, complex)
Graph DSL has a learning curve
Python-centric (TypeScript support exists but lags)
Overhead for simple use cases

State and checkpointing example:

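from operator import add
from typing import Annotated, TypedDict

from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import END, START, StateGraph

# `search_api` and `llm` are placeholders for your own search client
# and a LangChain chat model.

class AgentState(TypedDict):
    messages: Annotated[list, add]  # Reducer: append new messages
    task_status: str
    research_results: list[dict]
    draft: str
    review_feedback: str

def research_node(state: AgentState) -> dict:
    """Research node - fetches data and updates state."""
    results = search_api(state["messages"][-1])
    return {
        "research_results": results,
        "task_status": "researched",
    }

def write_node(state: AgentState) -> dict:
    """Write node - drafts content from research and any earlier feedback."""
    feedback = state.get("review_feedback", "")
    draft = llm.invoke(
        f"Write based on: {state['research_results']}\nAddress this feedback: {feedback}"
    ).content
    return {"draft": draft, "task_status": "drafted"}

def review_node(state: AgentState) -> dict:
    """Review node - approves the draft or sends it back for another pass."""
    feedback = llm.invoke(
        f"Review this draft. Reply APPROVED if it is ready: {state['draft']}"
    ).content
    # Without an explicit approval status the review -> write loop never ends
    status = "approved" if "APPROVED" in feedback else "needs_revision"
    return {"review_feedback": feedback, "task_status": status}

def route_after_review(state: AgentState) -> str:
    """Finish once approved; otherwise loop back to the writer."""
    return END if state["task_status"] == "approved" else "write"

graph = StateGraph(AgentState)
graph.add_node("research", research_node)
graph.add_node("write", write_node)
graph.add_node("review", review_node)
graph.add_edge(START, "research")
graph.add_edge("research", "write")
graph.add_edge("write", "review")
graph.add_conditional_edges("review", route_after_review)

# from_conn_string() is a context manager; setup() creates the checkpoint tables
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()
    # Pass interrupt_before=["review"] here to pause for a human before review
    app = graph.compile(checkpointer=checkpointer)

    # Every call with the same thread_id shares one checkpointed history
    config = {"configurable": {"thread_id": "article-draft-42"}}
    result = app.invoke({"messages": ["Write about agent frameworks"]}, config)

    # A later call on the same thread restores state from the checkpoint
    resumed = app.invoke({"messages": ["Add more code examples"]}, config)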
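
The reducer idea behind `Annotated[list, add]` is worth seeing in isolation: keys with a reducer are merged with the existing value, while keys without one are overwritten. The sketch below is a plain-Python illustration of that rule, not LangGraph's internals; `REDUCERS` and `apply_update` are hypothetical names.

```python
from operator import add
from typing import Any, Callable

# Keys listed here are merged with their reducer; all other keys are overwritten.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"messages": add}


def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with one node's partial update folded in."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in state:
            merged[key] = reducer(state[key], value)  # e.g. list concatenation
        else:
            merged[key] = value  # last write wins
    return merged
```

This is why two nodes can both emit `messages` without one erasing the other's output, whereas two writes to `task_status` leave only the later value.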

CrewAI

What it is: Multi-agent orchestration with role-based agent definitions. Dual architecture: Crews (autonomous teams) and Flows (event-driven workflows). 45.9K GitHub stars.

Strengths:

Intuitive role-based agent model (researcher, writer, editor)
600+ platform integrations, 7,000+ tools
Advanced memory: weighted strategies (recency, semantic, importance), configurable half-life
Adaptive-depth recall with composite scoring
Flows for enterprise event-driven workflows

Weaknesses:

Python-only
Can be slow for simple tasks (agent overhead)
Memory system complexity can be hard to debug
Commercial features locked behind enterprise tier

Agent and crew definition:

from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive information about autonomous agent frameworks",
    backstory="You are an experienced tech researcher...",
    tools=[web_search, file_reader],
    memory=True,
    verbose=True,
    llm="anthropic/claude-sonnet-4-5-20250929",
)

writer = Agent(
    role="Technical Writer",
    goal="Produce clear, detailed technical articles",
    backstory="You are a staff engineer who writes...",
    tools=[file_writer, markdown_formatter],
    memory=True,
    llm="anthropic/claude-sonnet-4-5-20250929",
)

research_task = Task(
    description="Research {topic} and compile findings",
    expected_output="Structured research notes with citations",
    agent=researcher,
)

writing_task = Task(
    description="Write a comprehensive article based on research",
    expected_output="A 2000+ word technical article in Markdown",
    agent=writer,
    context=[research_task],  # Writer gets researcher's output
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    memory=True,
    memory_config={
        "provider": "mem0",
        "config": {"vector_store": {"provider": "chroma"}},
    },
)

result = crew.kickoff(inputs={"topic": "agent memory systems"})

AutoGen / Microsoft Agent Framework

What it is: Microsoft’s multi-agent framework. AutoGen v0.4 was a complete rewrite with async event-driven architecture. Now merging with Semantic Kernel into the Microsoft Agent Framework, targeting GA by end of Q1 2026.

Strengths:

Enterprise-ready: OpenTelemetry, GDPR, Azure integration
Cross-language: Python and .NET with more planned
Distributed agent networks across organizational boundaries
Graph-based workflows for explicit multi-agent orchestration
Session-based state management

Weaknesses:

Complexity: merging two large frameworks creates a steep learning curve
Microsoft-ecosystem bias (Azure, Semantic Kernel)
Migration path from AutoGen to Agent Framework is non-trivial
Heavyweight for simple use cases

AutoGen v0.4 agent example:

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    model = OpenAIChatCompletionClient(model="gpt-4o")

    coder = AssistantAgent(
        name="coder",
        model_client=model,
        system_message="You write Python code to solve tasks.",
    )

    reviewer = AssistantAgent(
        name="reviewer",
        model_client=model,
        system_message=(
            "You review code for bugs and improvements. "
            "Reply APPROVED when the code is ready."
        ),
    )

    # Stop when the reviewer says APPROVED, or after 10 turns at the latest
    termination = TextMentionTermination("APPROVED")

    team = RoundRobinGroupChat(
        participants=[coder, reviewer],
        termination_condition=termination,
        max_turns=10,
    )

    result = await team.run(task="Write a function to calculate Fibonacci numbers")
    print(result.stop_reason)

# team.run() is a coroutine, so it needs an event loop
asyncio.run(main())

OpenAI Agents SDK

What it is: The production evolution of Swarm. Lightweight, provider-agnostic multi-agent framework with handoffs, guardrails, and tracing.

Strengths:

Clean, minimal API (agents, handoffs, tools, guardrails)
Built-in tracing for debugging agent runs
Sessions for automatic conversation history management
Provider-agnostic: supports 100+ LLMs
Realtime voice agents

Weaknesses:

No built-in persistence/checkpointing (unlike LangGraph)
Limited memory system (session-based only)
Newer framework — less battle-tested at scale

Agent with handoff:

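from agents import Agent, Runner

# `web_search_tool` and `code_execution_tool` are placeholders for tools you
# define yourself (for example with the SDK's @function_tool decorator).

research_agent = Agent(
    name="Research",
    instructions="Search for information and summarize findings.",
    tools=[web_search_tool],
)

coding_agent = Agent(
    name="Coding",
    instructions="Write and debug code.",
    tools=[code_execution_tool],
)

# Handoffs take Agent objects, not names, so the specialists are defined first
triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right specialist.",
    handoffs=[research_agent, coding_agent],
)

result = Runner.run_sync(triage_agent, "I need to find the best API for geocoding")
print(result.final_output)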

MetaGPT

What it is: A multi-agent framework that simulates a software company. Agents have roles (Product Manager, Architect, Engineer, QA) and collaborate through a shared message pool following Standard Operating Procedures (SOPs).

Strengths:

SOP-driven workflows produce structured deliverables (PRDs, designs, code, tests)
Shared message pool with pub/sub — agents subscribe to relevant messages by role
Enforces structured outputs (not just chat), improving downstream quality
Academic backing with published research

Weaknesses:

Opinionated toward software development workflows
Heavy Python dependency tree
Less flexible for non-software-development use cases
SOPs are rigid — hard to adapt mid-execution
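
The shared message pool listed under Strengths is a publish/subscribe pattern keyed by role. The sketch below shows the general shape under stated assumptions: `Message`, `MessagePool`, and the topic names are hypothetical, and this is not MetaGPT's actual class design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str   # role that produced the message
    topic: str    # kind of deliverable, e.g. "prd", "design", "code"
    content: str


class MessagePool:
    """Shared pool: roles subscribe to topics and receive matching messages."""

    def __init__(self) -> None:
        self._subscriptions: dict[str, set[str]] = defaultdict(set)  # topic -> roles
        self._inboxes: dict[str, list[Message]] = defaultdict(list)  # role -> messages
        self.history: list[Message] = []  # full log, visible to every role

    def subscribe(self, role: str, topic: str) -> None:
        self._subscriptions[topic].add(role)

    def publish(self, message: Message) -> None:
        """Log the message and copy it to each subscriber except the sender."""
        self.history.append(message)
        for role in self._subscriptions.get(message.topic, ()):
            if role != message.sender:
                self._inboxes[role].append(message)

    def drain(self, role: str) -> list[Message]:
        """Return and clear the pending messages for `role`."""
        return self._inboxes.pop(role, [])
```

An SOP then reduces to a fixed subscription table: the Architect subscribes to "prd", the Engineer to "design", and so on down the pipeline.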

Claude Agent SDK

What it is: Anthropic’s framework for building agents on top of Claude Code. Provides the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript.

Strengths:

Full Claude Code toolset (Read, Write, Edit, Bash, etc.) out of the box
Custom tools as Python/TypeScript functions
MCP integration for external services (Slack, GitHub, Google Drive, Asana)
Hook system for intercepting and modifying agent behavior
First-class code review capabilities (multi-agent)

Weaknesses:

Claude-only (no multi-provider support)
Newer — ecosystem still developing
Requires Anthropic API key (no subscription arbitrage)

Cloudflare Agents SDK

What it is: Each agent is a Durable Object with built-in state persistence, scheduling, WebSocket communication, and queue integration. Edge-native.

Strengths:

Every agent is a Durable Object — state auto-persists to SQLite
this.schedule() for cron, this.queue() for async work
Agent-to-agent communication via DO RPC (zero HTTP overhead)
WebSocket state sync with React via useAgent() hook
MCPAgent class for building MCP servers
AgentWorkflow for bidirectional Workflow-Agent RPC

Weaknesses:

Cloudflare platform lock-in
No local development story without Miniflare
Durable Object limitations (memory, CPU time)
Smaller community

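import { Agent, AIChatAgent } from "@cloudflare/agents";
// generateText and anthropic come from the Vercel AI SDK packages
import { generateText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";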

class ResearchAgent extends AIChatAgent<Env, AgentState> {
  async onChatMessage(onFinish: StreamCallbacks["onFinish"]) {
    const response = await generateText({
      model: anthropic("claude-sonnet-4-5-20250929"),
      system: this.system,
      messages: this.messages,
      tools: this.tools,
      onFinish: (result) => {
        // State auto-persists to Durable Object SQLite
        this.setState({
          ...this.state,
          lastResearch: result.text,
          researchCount: (this.state.researchCount || 0) + 1,
        });
        onFinish(result);
      },
    });
  }

  // Built-in scheduling
  async onAlarm() {
    // Runs on configured schedule
    await this.runDailyResearch();
  }
}

Pydantic AI

What it is: A type-safe agent framework built on Pydantic. Catches agent logic errors at development time through Python’s type system.

Strengths:

Type safety: IDE autocompletion and static analysis for agent code
Structured output with continuous streaming validation
Durable execution: preserves progress across failures and restarts
Model-agnostic: OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, Bedrock, Vertex
MCP and Agent2Agent protocol support

Weaknesses:

Python-only
Newer framework (15K stars, still growing)
Less opinionated about multi-agent patterns

BabyAGI

What it is: A task-driven autonomous agent that runs a create-execute-reprioritize loop. The OG autonomous agent from 2023.

Current status: Evolved into an experimental self-building agent framework. The newest version is the agent that builds itself.

Historical significance: Proved the task loop pattern (create tasks -> execute -> reprioritize -> repeat) that influenced nearly every subsequent framework.
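
The task loop pattern is small enough to sketch in full. The version below is an illustration of the create, execute, reprioritize cycle described above, not BabyAGI's code: `task_loop` is a hypothetical name, the three callbacks stand in for LLM calls, and the iteration cap is an added safeguard against runaway loops.

```python
from collections import deque
from typing import Callable


def task_loop(
    objective: str,
    first_task: str,
    execute: Callable[[str, str], str],
    create_tasks: Callable[[str, str, str], list[str]],
    prioritize: Callable[[str, list[str]], list[str]],
    max_iterations: int = 10,
) -> list[tuple[str, str]]:
    """Run execute -> create -> reprioritize until the queue empties or the cap hits."""
    queue: deque[str] = deque([first_task])
    results: list[tuple[str, str]] = []
    for _ in range(max_iterations):
        if not queue:
            break
        task = queue.popleft()
        result = execute(objective, task)                    # 1. execute the top task
        results.append((task, result))
        queue.extend(create_tasks(objective, task, result))  # 2. create follow-ups
        queue = deque(prioritize(objective, list(queue)))    # 3. reprioritize the rest
    return results
```

Without `max_iterations`, a task creator that always returns new work would run indefinitely.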

AutoGPT

What it is: The original viral autonomous agent (170K stars). Has pivoted from fully autonomous execution to a low-code platform with a block-based agent builder.

Current status: Now a platform rather than a framework. Users build agents using modular blocks through a Next.js UI with FastAPI backend and PostgreSQL storage.

SuperAGI

What it is: A dev-first autonomous agent framework with a GUI, toolkit marketplace, and APM dashboard.

Current status: Stalled. No releases since January 2024. Issues go unanswered. The company has pivoted. Security vulnerabilities remain unaddressed. Do not use for new projects.

This is the comprehensive comparison across all frameworks and all dimensions.

Architecture

Framework	Type	Language	Async	LLM Support	Binary/Runtime
OpenClaw	Single agent	TypeScript	Yes	Multi-provider (20+)	Node.js
NanoClaw	Single agent	TypeScript	Yes	Claude only	Node.js + Container
NanoBot	Single agent	Python	Yes	Multi-provider	Python
PicoClaw	Single agent	Go	Yes	Multi-provider (7+)	Static binary
ZeroClaw	Single agent	Rust	Yes	Multi-provider	Static binary
NullClaw	Single agent	Zig	Yes	Multi-provider (23+)	Static binary
TinyClaw	Multi-agent	TypeScript	Yes	Multi-provider	Node.js
Agent Zero	Hierarchical	Python	Yes	Multi-provider (3 slots)	Python + Docker
LangGraph	Graph-based	Python/TS	Yes	Multi-provider	Python/Node.js
CrewAI	Multi-agent	Python	Yes	Multi-provider	Python
AutoGen	Multi-agent	Python/.NET	Yes	Multi-provider	Python/.NET
OpenAI SDK	Multi-agent	Python/TS	Yes	Multi-provider (100+)	Python/Node.js
MetaGPT	Multi-agent	Python	Yes	Multi-provider	Python
Claude SDK	Single agent	Python/TS	Yes	Claude only	Python/Node.js
CF Agents	Single/Multi	TypeScript	Yes	Multi-provider	Cloudflare Workers
Pydantic AI	Single agent	Python	Yes	Multi-provider (10+)	Python
BabyAGI	Single agent	Python	No	OpenAI primary	Python
AutoGPT	Platform	Python/TS	Yes	Multi-provider	Docker

Memory Systems

Framework	Short-Term	Long-Term	Vector Search	Keyword Search	Auto-Learning	Shared Memory	Temporal Decay
OpenClaw	Session JSONL	MEMORY.md + daily logs	sqlite-vec	BM25	No	No (opt-in IPC)	30-day half-life
NanoClaw	SQLite	Per-group CLAUDE.md	No	No	No	No	No
NanoBot	In-memory	Knowledge graph	Planned	Planned	Graph-based	No	No
PicoClaw	In-memory	File-based	Planned	Planned	No	No	No
ZeroClaw	In-memory	SQLite/Markdown	Cosine sim (0.7)	FTS5 BM25 (0.3)	No	No	Configurable
NullClaw	In-memory	SQLite	Cosine sim	FTS5 BM25	No	No	No
TinyClaw	Per-agent	Team chat history	No	No	No	Team chat rooms	No
Agent Zero	Context window	FAISS vectors	FAISS IndexFlatIP	No	Yes (auto)	Subordinate inheritance	No (consolidation instead)
LangGraph	Graph state	Checkpointer (PG/SQLite)	Via LangChain	Via LangChain	No	Shared graph state	No
CrewAI	Short-term memory	Long-term w/ vector DB	Chroma/custom	Hybrid	No	Crew-level shared	Configurable half-life
AutoGen	Session-based	Configurable	Via plugins	Via plugins	No	Group chat history	No
OpenAI SDK	Session	Session (auto-managed)	No built-in	No built-in	No	Via handoff context	No
MetaGPT	Role context	Shared message pool	No built-in	No built-in	No	Shared message pool	No
Claude SDK	Conversation	File-based (CLAUDE.md)	No built-in	No built-in	No	No	No
CF Agents	DO SQLite state	DO SQLite + Vectorize	Vectorize	No built-in	No	DO RPC	No
Pydantic AI	Conversation	Durable execution	Via integration	Via integration	No	No	No
BabyAGI	Task list	Pinecone vectors	Pinecone	No	No	No	No
AutoGPT	Block state	PostgreSQL	Via blocks	Via blocks	No	No	No

Skills and Tools

Framework	Skill Format	Composable	Shared Registry	Built-in Tools	Custom Skills
OpenClaw	SKILL.md (Markdown + YAML)	Yes	5,400+ skills	Shell, browser, file, memory	Drop-in folder
NanoClaw	Code changes	Yes	No	MCP tools (send, schedule, list)	Fork and modify
NanoBot	ClawHub skills	Yes	ClawHub registry	Shell, file, MCP	MCP + ClawHub
PicoClaw	Config-based	Limited	No	Shell, file, messaging	Config
ZeroClaw	TOML manifests	Yes	Community registry	Shell, file, memory, git, browser, cron	TOML + folder
NullClaw	Vtable interface	Yes	No	18+ (file, shell, memory, browser, hw)	Zig interface impl
TinyClaw	Agent config	Yes	No	Read, write, search, message	TypeScript functions
Agent Zero	Python functions	Yes	No	Shell, browser, code exec, delegate	Python functions
LangGraph	Python functions	Yes	LangChain hub	LangChain tools ecosystem	Python/TS functions
CrewAI	Python decorators	Yes	7,000+ tools	600+ integrations	Python decorators
AutoGen	Python/.NET	Yes	No	Code execution, web	Functions + plugins
OpenAI SDK	Functions/MCP	Yes	No	Functions, MCP, hosted tools	Functions
MetaGPT	Role actions	Yes	No	Code, write, design, review	Python classes
Claude SDK	Python/TS functions + MCP	Yes	MCP ecosystem	Read, Write, Edit, Bash, + more	Functions + MCP
CF Agents	TypeScript + MCP	Yes	MCP ecosystem	DO state, schedule, queue, SQL	TS functions + MCP
Pydantic AI	Typed Python functions	Yes	MCP/A2A	MCP ecosystem	Typed functions
BabyAGI	Python functions	Limited	No	Search, code execution	Python functions
AutoGPT	Blocks (low-code)	Yes	Marketplace	Web, file, code, AI	Block builder

Communication Patterns

Framework	Agent-to-Agent	Human-in-Loop	Message Passing	Event System
OpenClaw	Opt-in IPC (sessions_send/spawn)	Chat interface	WebSocket gateway	No
NanoClaw	Filesystem IPC (JSON polling)	Chat interface	JSON files, 1s poll	No
NanoBot	No	Chat interface	Messaging platforms	No
PicoClaw	Planned	Chat interface	Gateway command	No
ZeroClaw	No	Chat interface	Channels	No
NullClaw	No	Chat interface	18 channels	Webhooks
TinyClaw	Team chat rooms (broadcast)	Chat interface	Actor model + message queue	Event feed
Agent Zero	Subordinate spawning	Superior chain (human at top)	Direct invocation	No
LangGraph	Graph edges	Interrupt nodes	State reducers	No
CrewAI	Task delegation	Callback system	Task context passing	Flows (event-driven)
AutoGen	Async messaging	Chat interface	Pub/sub, request/response	OpenTelemetry
OpenAI SDK	Handoffs	Human-in-loop API	Handoff functions	Tracing
MetaGPT	Shared message pool (pub/sub)	No	Publish/subscribe by role	No
Claude SDK	No (single agent)	Hooks	MCP	No
CF Agents	DO RPC	WebSocket	DO RPC, Queues, Workflows	Queue events
Pydantic AI	No (single agent)	No	MCP, A2A	No
BabyAGI	No	No	Task list	No
AutoGPT	Block connections	UI approval	Block graph	Webhooks

Execution Model

Framework	Runs On	Sandboxing	Cost Control	Parallel Exec	Error Handling
OpenClaw	Local machine	None (runs as user)	Model fallback chains	No	Retry + fallback
NanoClaw	Local + containers	Linux containers	Claude only	Per-group parallel	Container restart
NanoBot	Local machine	None	Model selection	No	Retry
PicoClaw	Any ($10 hw)	None	Cheap models	No	Retry
ZeroClaw	Any (self-hosted)	Workspace isolation	Tool allowlists	No	Retry
NullClaw	Any (embedded)	Multi-layer sandbox	Minimal by design	No	Retry
TinyClaw	Local + Docker	Docker (via tinyclaw-infra)	Model per agent	Fan-out parallel	Actor supervision
Agent Zero	Docker containers	Docker per agent	3-slot model mixing	Subordinate parallel	Subordinate retry
LangGraph	Local/cloud	None built-in	Token tracking	Parallel branches	Checkpoint recovery
CrewAI	Local/cloud	None built-in	Token tracking	Parallel tasks	Task retry
AutoGen	Local/cloud/Azure	Code execution sandbox	Token tracking	Async agents	Retry + fallback
OpenAI SDK	Local/cloud	None built-in	Token tracking	Async agents	Guardrails
MetaGPT	Local	Docker for code exec	Token tracking	Role parallel	SOP recovery
Claude SDK	Local	Tool permissions	API pricing	No	Retry
CF Agents	Cloudflare edge	DO isolation	DO CPU/memory limits	Queue workers	Alarm retry + DLQ
Pydantic AI	Local/cloud	None built-in	Token tracking	No	Durable execution
BabyAGI	Local	None	Token tracking	No	Loop retry
AutoGPT	Docker	Docker	Credit system	Block parallel	Block retry

Memory is the most differentiated axis across frameworks. Here is a detailed breakdown.

Memory Architecture Patterns

Four distinct patterns have emerged:

Pattern 1: File-Based Memory (OpenClaw, NanoClaw, Claude SDK)

workspace/
  MEMORY.md          # Curated long-term facts (append-only)
  SOUL.md            # Agent personality (rarely changes)
  AGENTS.md          # Behavior rules (injected every prompt)
  memory/
    2026-03-16.md    # Daily activity log
    2026-03-15.md    # Yesterday (also loaded at startup)

Pros: Human-readable, version-controllable, easy to debug
Cons: No semantic search without additional index, linear scan

Pattern 2: Vector Database Memory (Agent Zero, BabyAGI, CrewAI)

memory_store = FAISSMemory(
    index_type="IndexFlatIP",  # Inner product; equals cosine similarity only for L2-normalized vectors
    dimension=1536,
    auto_extract=True,         # LLM extracts facts after each turn
    auto_consolidate=True,     # Merge related memories over time
    categories=["facts", "solutions", "preferences", "errors"],
)

Pros: Semantic retrieval, scales to millions of memories
Cons: Opaque (hard to inspect), embedding model dependency, cost

Pattern 3: Hybrid Search (OpenClaw, ZeroClaw, NullClaw)

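-- Hybrid search: vector similarity + BM25 keyword matching
-- Sketch of ZeroClaw's SQLite approach. cosine_similarity() is not built into
-- SQLite; it is assumed here to be an application-defined function.
-- :query_embedding and :query_text are bound parameters.

WITH vec AS (
  -- Vector search (70% weight)
  SELECT id, content,
         cosine_similarity(embedding, :query_embedding) AS vec_score
  FROM memories
),
kw AS (
  -- Keyword search (30% weight). FTS5 bm25() returns negative values where
  -- more negative means a better match, so negate it to make higher = better.
  -- The result is unbounded; rescale it before weighting in production.
  SELECT id, -bm25(memories_fts) AS kw_score
  FROM memories_fts
  WHERE memories_fts MATCH :query_text
)
-- Combined score: rows without a keyword hit get kw_score = 0
SELECT vec.id, vec.content,
       (0.7 * vec.vec_score + 0.3 * COALESCE(kw.kw_score, 0)) AS combined_score
FROM vec
LEFT JOIN kw ON kw.id = vec.id
ORDER BY combined_score DESC
LIMIT 10;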

Pros: Best of both worlds: catches semantic similarity AND exact matches
Cons: More complex, requires both embedding model and FTS index

Pattern 4: Graph Memory (NanoBot)

User
  |-- works_on --> data-pipeline
  |       |-- uses --> pandas
  |       |-- deployed_on --> AWS Lambda
  |-- prefers --> type-hints
  |-- asked_about --> agent-frameworks (recent)

Pros: Relationship-aware retrieval, good for ongoing projects
Cons: Graph construction is imprecise, harder to debug than files
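To make the graph pattern concrete, here is a minimal sketch of a triple store with hop-limited retrieval. It is an illustration only: `GraphMemory`, `add`, and `related` are hypothetical names, not NanoBot's API, and a real system would also need entity resolution and persistence.

```typescript
// Minimal in-memory triple store; all names here are illustrative.
type Triple = { subject: string; relation: string; object: string };

class GraphMemory {
  private triples: Triple[] = [];

  add(subject: string, relation: string, object: string): void {
    const exists = this.triples.some(
      (t) => t.subject === subject && t.relation === relation && t.object === object
    );
    if (!exists) this.triples.push({ subject, relation, object });
  }

  // Breadth-first walk over outgoing edges, up to maxHops away from the start node.
  related(start: string, maxHops: number): Triple[] {
    const found: Triple[] = [];
    const visited = new Set<string>([start]);
    let frontier: string[] = [start];

    for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
      const next: string[] = [];
      for (const node of frontier) {
        for (const t of this.triples) {
          if (t.subject !== node) continue;
          found.push(t);
          if (!visited.has(t.object)) {
            visited.add(t.object);
            next.push(t.object);
          }
        }
      }
      frontier = next;
    }
    return found;
  }
}

const graph = new GraphMemory();
graph.add("User", "works_on", "data-pipeline");
graph.add("data-pipeline", "uses", "pandas");
graph.add("data-pipeline", "deployed_on", "AWS Lambda");
graph.add("User", "prefers", "type-hints");

const oneHop = graph.related("User", 1);  // facts stated directly about the user
const twoHops = graph.related("User", 2); // also pulls in project details
```

The hop limit is the main tuning knob in this sketch: one hop returns only direct facts, while two hops also surface what the user's project depends on.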

Memory Retrieval Strategies

Strategy	Used By	How It Works
Temporal decay	OpenClaw, ZeroClaw	Recent memories weighted higher. OpenClaw uses a 30-day half-life; ZeroClaw's is configurable. Evergreen files skip decay
Auto-extraction	Agent Zero	Utility LLM extracts facts/solutions/errors after every turn
Auto-consolidation	Agent Zero	Related memories merged and summarized periodically
Adaptive-depth recall	CrewAI	Retrieval depth adjusts based on query complexity
Composite scoring	CrewAI	Weighted combination of recency, semantic similarity, and importance
Compaction	OpenClaw	When context fills, important info flushed to MEMORY.md, older conversation summarized
Embedding cache	ZeroClaw	LRU cache of 10K embeddings to avoid recomputation
Thread checkpointing	LangGraph	Full state snapshot at every graph node, recoverable by thread ID
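The embedding cache row is easy to sketch. The version below is an assumption about how such a cache could work, not ZeroClaw's code: `EmbeddingCache` is a hypothetical class that relies on the insertion order of a JavaScript `Map` to track recency.

```typescript
// LRU cache keyed by input text. A Map iterates in insertion order,
// so the first key is always the least recently used entry.
class EmbeddingCache {
  private entries = new Map<string, Float32Array>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(text: string): Float32Array | undefined {
    const hit = this.entries.get(text);
    if (hit === undefined) return undefined;
    // Re-insert to mark this entry as most recently used.
    this.entries.delete(text);
    this.entries.set(text, hit);
    return hit;
  }

  set(text: string, embedding: Float32Array): void {
    if (this.entries.has(text)) this.entries.delete(text);
    this.entries.set(text, embedding);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

const cache = new EmbeddingCache(2); // a real deployment might use 10_000
cache.set("a", new Float32Array([1, 0]));
cache.set("b", new Float32Array([0, 1]));
cache.get("a");                           // "a" becomes most recently used
cache.set("c", new Float32Array([1, 1])); // evicts "b"
```

Keying by raw text is the simplest choice; hashing the text first would keep memory use predictable when inputs are long.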

What the Best Memory Systems Have in Common

Hybrid search is converging as the standard. Pure vector search misses exact terms. Pure keyword search misses meaning. The 70/30 vector/keyword split used by ZeroClaw and OpenClaw is a sensible default rather than a proven optimum; tune it against your own queries.
File-based memory for human-inspectable state. OpenClaw (MEMORY.md), NanoClaw and the Claude SDK (CLAUDE.md), and ZeroClaw (Markdown alongside SQLite) all keep long-term state in plain files. It is not technically optimal, but it is debuggable, versionable, and portable.
Auto-learning is the biggest gap. Only Agent Zero automatically extracts and consolidates knowledge with an LLM. Apart from NanoBot's graph-based updates, every other framework requires explicit memory management.

Skill Definition Patterns

Pattern 1: Markdown Playbooks (OpenClaw, ZeroClaw)

The most popular pattern. A skill is a directory with a SKILL.md file containing YAML frontmatter (requirements, tags, environment) and markdown instructions.

---
name: keyword-research
description: Research keywords using DataForSEO API
requires:
  env: ["DATAFORSEO_LOGIN", "DATAFORSEO_PASSWORD"]
tags: ["seo", "research", "marketing"]
platforms: ["all"]
---

OpenClaw injects a compact XML list of available skills into the system prompt. Skills are selectively loaded based on the current context.

Pattern 2: Typed Functions (Pydantic AI, LangGraph, OpenAI SDK)

Skills are Python/TypeScript functions with type annotations.

from pydantic_ai import Agent
from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    url: str
    snippet: str
    relevance_score: float

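agent = Agent(
    "anthropic:claude-sonnet-4-5",
    system_prompt="You are a research assistant.",
)

# tool_plain registers a tool that does not need the run context;
# @agent.tool expects a RunContext as the first parameter.
@agent.tool_plain
async def search_web(query: str, max_results: int = 5) -> list[SearchResult]:
    """Search the web for information."""
    # search_api is a placeholder for your own async search client
    results = await search_api.search(query, limit=max_results)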
    return [
        SearchResult(
            title=r["title"],
            url=r["url"],
            snippet=r["snippet"],
            relevance_score=r["score"],
        )
        for r in results
    ]

Pattern 3: Role Actions (MetaGPT, CrewAI)

Skills are embedded in agent roles as expected behaviors.

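from metagpt.actions import WriteDesign, WritePRD
from metagpt.roles import Role

class Architect(Role):
    name: str = "Architect"
    profile: str = "Software Architect"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # WriteAPIDesign is illustrative; define or import it alongside WriteDesign
        self.set_actions([WriteDesign, WriteAPIDesign])
        self._watch({WritePRD})  # Subscribe to Product Manager output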

Pattern 4: MCP Tools (Claude SDK, NanoClaw, Pydantic AI)

Skills are exposed as MCP (Model Context Protocol) servers.

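// Claude Agent SDK: MCP servers as skill providers.
// Servers are passed as a name-keyed map in the query options;
// check the option shapes against the SDK version you have installed.
import { query } from "@anthropic-ai/claude-agent-sdk";

const run = query({
  prompt: "Find keyword opportunities for our docs site", // illustrative prompt
  options: {
    mcpServers: {
      "seo-tools": {
        type: "stdio",
        command: "npx",
        args: ["-y", "@my-org/seo-mcp-server"],
      },
      database: {
        type: "sse",
        url: "https://api.example.com/mcp",
      },
    },
  },
});

for await (const message of run) {
  if (message.type === "result") console.log(message);
}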

Pattern 5: Vtable Interfaces (NullClaw)

Skills are compiled interface implementations.

const ToolVTable = struct {
    name: []const u8,
    description: []const u8,
    parameters: []const ParameterDef,
    execute: *const fn (params: ParameterMap) anyerror!ToolResult,
};

// Adding a new tool means implementing this interface
pub const git_commit_tool = ToolVTable{
    .name = "git_commit",
    .description = "Create a git commit with the given message",
    .parameters = &[_]ParameterDef{
        .{ .name = "message", .type = .string, .required = true },
        .{ .name = "files", .type = .string_array, .required = false },
    },
    .execute = &gitCommitImpl,
};

Skill Ecosystem Size

Framework	Built-in	Community	Registry
OpenClaw	~50 bundled	5,400+ in registry	openclaw/skills
CrewAI	600+ integrations	7,000+ tools	CrewAI Tools
LangGraph	LangChain ecosystem	LangChain Hub	LangChain Hub
NanoBot	~10 core	ClawHub	ClawHub
ZeroClaw	~20 built-in	Community packs	TOML registry
NullClaw	18+ built-in	Small	None
Claude SDK	Claude Code toolset + MCP	MCP ecosystem	MCP servers
Others	5-15 built-in	Small	None

Pattern 1: No Agent-to-Agent Communication

Used by: NanoBot, PicoClaw, ZeroClaw, NullClaw, Claude SDK, Pydantic AI, BabyAGI

These frameworks are single-agent systems. Communication is always agent-to-human via messaging platforms or CLI.

Pattern 2: Filesystem IPC (Polling)

Used by: NanoClaw

Host process polls data/ipc/{group}/ every 1 second
  -> Reads JSON files from messages/, tasks/
  -> Processes and deletes after handling
  -> Container agents write to IPC directory
  -> Host routes messages to platforms

Simple, debuggable, but adds 1-second latency per message hop.
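One poll cycle of this loop can be sketched as follows. This is not NanoClaw's code: the directory is hidden behind a hypothetical `IpcDir` interface so the example runs without touching disk, and a real host would back it with `fs.readdir`, `fs.readFile`, and `fs.unlink`. It also assumes writers create a temporary file and rename it to `.json` when complete, so any `.json` file seen by the poller is fully written.

```typescript
// Hypothetical abstraction over an IPC directory.
interface IpcDir {
  list(): string[];
  read(name: string): string;
  remove(name: string): void;
}

interface IpcMessage {
  group: string;
  text: string;
}

// One poll cycle: read each JSON file in name order, route it, then delete it.
function pollOnce(dir: IpcDir, route: (msg: IpcMessage) => void): number {
  let handled = 0;
  for (const name of dir.list().sort()) {
    if (!name.endsWith(".json")) continue;
    try {
      const msg = JSON.parse(dir.read(name)) as IpcMessage;
      route(msg);
      handled++;
    } catch {
      // Malformed file: drop it so one bad message cannot block the queue.
    }
    dir.remove(name);
  }
  return handled;
}

// In-memory stand-in for the directory, used only for this demonstration.
class MemoryDir implements IpcDir {
  private files = new Map<string, string>();
  write(name: string, body: string): void {
    this.files.set(name, body);
  }
  list(): string[] {
    return Array.from(this.files.keys());
  }
  read(name: string): string {
    const body = this.files.get(name);
    if (body === undefined) throw new Error(`missing file: ${name}`);
    return body;
  }
  remove(name: string): void {
    this.files.delete(name);
  }
}

const ipc = new MemoryDir();
ipc.write("001.json", JSON.stringify({ group: "family", text: "dinner at 7" }));
ipc.write("002.json", "{ not valid json");

const delivered: IpcMessage[] = [];
const handledCount = pollOnce(ipc, (m) => delivered.push(m));
// A real host would repeat this on a timer, e.g. setInterval(() => pollOnce(...), 1000).
```

Sorting file names gives a stable processing order if writers use sortable names such as timestamps or sequence numbers.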

Pattern 3: Handoff Functions

Used by: OpenAI Agents SDK, Swarm (legacy)

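from agents import Agent

# handoffs takes Agent objects, not name strings
billing_agent = Agent(name="Billing", instructions="Handle billing questions.")
tech_support_agent = Agent(name="Tech Support", instructions="Handle technical issues.")
triage = Agent(
    name="Triage",
    handoffs=[billing_agent, tech_support_agent],
    instructions="Route user to the right specialist based on their question.",
)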

Clean and minimal. Works well for customer support routing patterns. Limited for complex multi-step coordination.

Pattern 4: Shared Message Pool (Pub/Sub)

Used by: MetaGPT


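# Illustrative sketch of the pattern, not MetaGPT's actual classes.
# Message, Agent, and Subscription are assumed to be defined elsewhere.
class SharedMessagePool:
    def __init__(self):
        self.subscribers = []  # list of Subscription

    def publish(self, message: Message, sender_role: str):
        for subscriber in self.subscribers:
            if subscriber.should_receive(message, sender_role):
                subscriber.inbox.append(message)

    def subscribe(self, agent: Agent, filter_roles: list[str]):
        self.subscribers.append(Subscription(agent, filter_roles))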

Pattern 5: Team Chat Rooms (Broadcast)

Used by: TinyClaw

// Every team has a persistent chat room
// All agents see all messages from teammates
// Agents mention teammates to delegate: "@writer please draft this"

interface TeamMessage {
  from: string;      // "researcher"
  to?: string;       // "@writer" or broadcast (no to)
  team: string;      // "content-team"
  content: string;
  timestamp: number;
}

// Fan-out: researcher mentions @writer and @reviewer
// Both process in parallel
// Responses flow back to team room
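The mention-based fan-out can be sketched in a few lines. This is an assumption about how such routing could work, not TinyClaw's implementation: `TeamRoom`, `parseMentions`, and the per-agent queues are hypothetical, and member names are assumed to be lowercase.

```typescript
interface TeamMessage {
  from: string;
  to?: string;
  team: string;
  content: string;
  timestamp: number;
}

// Extract "@name" mentions, lowercased and de-duplicated, in order of appearance.
function parseMentions(content: string): string[] {
  const seen: string[] = [];
  const pattern = /@([a-z][a-z0-9_-]*)/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    const mentioned = match[1].toLowerCase();
    if (seen.indexOf(mentioned) === -1) seen.push(mentioned);
  }
  return seen;
}

class TeamRoom {
  readonly team: string;
  readonly log: TeamMessage[] = [];                  // shared history, visible to all
  private queues = new Map<string, TeamMessage[]>(); // per-agent work queues

  constructor(team: string, members: string[]) {
    this.team = team;
    for (const member of members) this.queues.set(member, []);
  }

  // Returns the agents that should act on the message.
  post(from: string, content: string, timestamp: number): string[] {
    const mentioned = parseMentions(content).filter(
      (member) => member !== from && this.queues.has(member)
    );
    // No valid mention means broadcast to every teammate except the sender.
    const recipients = mentioned.length > 0
      ? mentioned
      : Array.from(this.queues.keys()).filter((member) => member !== from);

    const message: TeamMessage = { from, team: this.team, content, timestamp };
    if (mentioned.length === 1) message.to = `@${mentioned[0]}`;
    this.log.push(message);

    for (const member of recipients) {
      const queue = this.queues.get(member);
      if (queue) queue.push(message);
    }
    return recipients;
  }

  pending(member: string): TeamMessage[] {
    return this.queues.get(member) ?? [];
  }
}

const room = new TeamRoom("content-team", ["researcher", "writer", "reviewer"]);
const askedTo = room.post("researcher", "@writer please draft this, @reviewer check the sources", 1);
const broadcastTo = room.post("writer", "draft is ready", 2);
```

Because each recipient has its own queue, the writer and reviewer can process the first message in parallel, which is the fan-out behavior described above.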

Pattern 6: Graph Edges and State Reducers

Used by: LangGraph


from operator import add
from typing import Annotated, TypedDict

class State(TypedDict):
    # add reducer: new messages are appended, not replaced
    messages: Annotated[list[str], add]
    # last-write-wins for simple values
    status: str
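The reducer idea is independent of LangGraph, so here is a small language-neutral sketch of it in TypeScript. The names `Reducer`, `reducers`, and `applyUpdate` are hypothetical; the point is only that each state key declares how a partial update from a node is folded into the existing value.

```typescript
type Reducer<T> = (current: T, update: T) => T;

interface State {
  messages: string[];
  status: string;
}

// One reducer per key: append for lists, last-write-wins for scalars.
const reducers: { [K in keyof State]: Reducer<State[K]> } = {
  messages: (current, update) => current.concat(update),
  status: (_current, update) => update,
};

// Fold a node's partial update into the state; untouched keys keep their value.
function applyUpdate(state: State, update: Partial<State>): State {
  return {
    messages:
      update.messages !== undefined
        ? reducers.messages(state.messages, update.messages)
        : state.messages,
    status:
      update.status !== undefined
        ? reducers.status(state.status, update.status)
        : state.status,
  };
}

let graphState: State = { messages: [], status: "idle" };
graphState = applyUpdate(graphState, {
  messages: ["researcher: found 3 sources"],
  status: "researching",
});
graphState = applyUpdate(graphState, { messages: ["writer: draft ready"] });
```

With an append reducer, two nodes can each return only their own message and neither overwrites the other, which is what makes parallel branches safe to merge.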

Pattern 7: DO RPC (Direct Method Invocation)

Used by: Cloudflare Agents SDK

// Agent-to-agent communication via Durable Object RPC
// No HTTP request to build or parse: the call goes through Workers RPC to the target object

class CoordinatorAgent extends Agent<Env> {
  async delegateResearch(topic: string) {
    const researcher = this.env.RESEARCH_AGENT.get(
      this.env.RESEARCH_AGENT.idFromName(topic)
    );
    // RPC method call: no HTTP layer, but arguments and results are still serialized
    const result = await researcher.research(topic);
    return result;
  }
}

Human-in-the-Loop Mechanisms

Framework	Mechanism	Granularity
LangGraph	Interrupt nodes	Per-node in graph
CrewAI	Callbacks	Per-task
AutoGen	Chat interface	Per-message
OpenAI SDK	Human-in-loop API	Per-run
Agent Zero	Superior chain	Human is top of hierarchy
TinyClaw	Team chat	Agents and humans share chat room
OpenClaw	Chat platforms	Per-message
CF Agents	WebSocket	Real-time bidirectional

Local CLI Execution

Used by: Most frameworks (OpenClaw, NanoBot, PicoClaw, ZeroClaw, NullClaw, LangGraph, CrewAI)

The agent runs as a local process. Advantages: full machine access, no cloud costs, privacy. Disadvantages: requires a running machine, no horizontal scaling, no fault tolerance.

Container-Isolated Execution

Used by: NanoClaw, Agent Zero, AutoGPT

Each agent runs in its own container. NanoClaw uses Docker or Apple Container. Agent Zero uses Docker per subordinate. This provides OS-level isolation but adds startup latency.

Edge/Serverless Execution

Used by: Cloudflare Agents SDK

// Each agent is a Durable Object
// Runs on Cloudflare's edge network (300+ cities)
// Automatically hibernates when idle (no compute charges while hibernated)
// Wakes on request, alarm, or queue message

export default {
  async fetch(request: Request, env: Env) {
    const agentId = env.AGENT.idFromName("my-agent");
    const agent = env.AGENT.get(agentId);
    return agent.fetch(request);
  },
};

Advantages: global distribution, automatic scaling, pay-per-use, built-in persistence. Disadvantages: platform lock-in, CPU time limits, no local filesystem.

Sandboxing Comparison

Framework	Sandbox Type	Isolation Level	Performance Impact
NanoClaw	Linux container	OS-level	Moderate (container startup)
Agent Zero	Docker	OS-level	Moderate
NullClaw	Multi-layer sandbox	Process-level	Minimal
ZeroClaw	Workspace isolation	Filesystem-level	None
CF Agents	Durable Object	V8 isolate	None
LangGraph	None built-in	None	None
OpenClaw	None	Runs as user	None

Cost Control Mechanisms

Framework	Mechanism	How It Works
OpenClaw	Model fallback chains	Try cheap model first, escalate on failure
Agent Zero	3-slot model mixing	Different models for chat/utility/embedding
LangGraph	Token tracking	State includes token counts per node
CrewAI	Token tracking	Per-agent token budgets
CF Agents	DO CPU/memory limits	Hard limits per request, alarm-based budgets
OpenAI SDK	Token tracking	Per-run cost visibility
ZeroClaw	Tool allowlists	Limit which tools can run (prevents expensive operations)
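A model fallback chain with a spending cap can be sketched as below. This is a generic illustration, not OpenClaw's implementation: `ModelTier`, `runWithFallback`, and the flat per-call cost are assumptions made for the example, and the model calls are stubs.

```typescript
interface ModelTier {
  name: string;
  costPerCall: number; // illustrative flat cost in USD
  call: (prompt: string) => Promise<string>;
}

interface FallbackResult {
  output: string;
  model: string;
  spent: number;
  attempts: string[];
}

// Try tiers in order (cheapest first) and escalate on failure,
// stopping before any call that would exceed the budget.
async function runWithFallback(
  tiers: ModelTier[],
  prompt: string,
  budget: number
): Promise<FallbackResult> {
  let spent = 0;
  const attempts: string[] = [];

  for (const tier of tiers) {
    if (spent + tier.costPerCall > budget) break;
    attempts.push(tier.name);
    spent += tier.costPerCall; // in this sketch, failed calls are still billed
    try {
      const output = await tier.call(prompt);
      return { output, model: tier.name, spent, attempts };
    } catch {
      // Fall through to the next, more capable tier.
    }
  }
  throw new Error(`no model succeeded within budget; tried: ${attempts.join(", ")}`);
}

// Stub tiers: the cheap model fails, the stronger one answers.
const tiers: ModelTier[] = [
  { name: "cheap", costPerCall: 0.001, call: async () => { throw new Error("rate limited"); } },
  { name: "strong", costPerCall: 0.02, call: async (p) => `answer to: ${p}` },
];

const pending = runWithFallback(tiers, "summarize the log", 0.05);
```

Checking the budget before each call, rather than after, means the chain never overspends by one expensive attempt.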

Persistence Models

Framework	State Backend	Checkpoint/Resume	Session Management	Crash Recovery
OpenClaw	SQLite + files	Session JSONL (branching)	Per-agent sessions	Resume from JSONL
NanoClaw	SQLite	Crash recovery built-in	Per-group	SQLite WAL
NanoBot	Knowledge graph	Persistent graph	In-memory sessions	Graph persists
ZeroClaw	SQLite/Markdown	File-based	Per-workspace	File recovery
NullClaw	SQLite	File-based	Per-workspace	Binary restart
TinyClaw	SQLite	Per-agent workspace	Per-team	Actor restart
Agent Zero	FAISS + files	FAISS persists to disk	Per-conversation	FAISS reload
LangGraph	PG/SQLite/Memory	Full checkpoint at every node	Thread-based	Checkpoint recovery
CrewAI	Configurable vector DB	Task-level	Per-crew	Task retry
AutoGen	Configurable	Session-based	Per-group	Session replay
OpenAI SDK	Session API	Session history	Auto-managed	Session resume
MetaGPT	In-memory + files	No built-in	Per-run	No
Claude SDK	File-based	No built-in	Conversation	No
CF Agents	DO SQLite	Automatic (every setState)	Per-DO instance	DO auto-recovery
Pydantic AI	Configurable	Durable execution	Per-conversation	Progress preserved
BabyAGI	Pinecone	Task list	Per-run	Vector reload
AutoGPT	PostgreSQL	Block state	Per-agent	DB recovery

Key insight: LangGraph and Cloudflare Agents SDK have the strongest state persistence stories. LangGraph checkpoints at every graph node with full state snapshots. CF Agents auto-persist state to Durable Object SQLite on every setState() call. Both resume from failure without extra application code. Pydantic AI's durable execution and NanoClaw's built-in crash recovery come close; most other frameworks recover only what the application explicitly saved.
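The checkpoint-and-resume behavior can be illustrated with a small sketch. It is not LangGraph's or Cloudflare's API: `CheckpointStore` and `runThread` are hypothetical, the store is in-memory, and state is assumed to be JSON-serializable.

```typescript
interface Checkpoint<S> {
  threadId: string;
  step: number;
  state: S;
}

class CheckpointStore<S> {
  private latest = new Map<string, Checkpoint<S>>();

  save(threadId: string, step: number, state: S): void {
    // Deep copy via JSON so later mutations cannot alter the snapshot.
    const copy = JSON.parse(JSON.stringify(state)) as S;
    this.latest.set(threadId, { threadId, step, state: copy });
  }

  load(threadId: string): Checkpoint<S> | undefined {
    return this.latest.get(threadId);
  }
}

type Step<S> = (state: S) => S;

// Run steps in order, snapshotting after each one. If a checkpoint exists
// for the thread, skip the steps that already completed.
function runThread<S>(
  threadId: string,
  steps: Step<S>[],
  initial: S,
  store: CheckpointStore<S>
): S {
  const saved = store.load(threadId);
  let state = saved ? saved.state : initial;
  const start = saved ? saved.step + 1 : 0;

  for (let i = start; i < steps.length; i++) {
    state = steps[i](state); // may throw; the previous checkpoint stays intact
    store.save(threadId, i, state);
  }
  return state;
}

interface Draft {
  notes: string[];
}

let failOnce = true;
const pipeline: Step<Draft>[] = [
  (s) => ({ notes: s.notes.concat("researched") }),
  (s) => {
    if (failOnce) {
      failOnce = false;
      throw new Error("transient failure");
    }
    return { notes: s.notes.concat("drafted") };
  },
];

const checkpoints = new CheckpointStore<Draft>();
let crashed = false;
try {
  runThread("thread-1", pipeline, { notes: [] }, checkpoints);
} catch {
  crashed = true;
}
// The second run resumes at step 1 instead of repeating the research step.
const resumed = runThread("thread-1", pipeline, { notes: [] }, checkpoints);
```

A production store would write each checkpoint to durable storage (SQLite, Postgres, or Durable Object storage) before moving on, so that a process crash leaves the same recovery point as the thrown error does here.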

Framework	GitHub Stars	Last Active	Docs Quality	Plugin System	Community
OpenClaw	302K	Daily	Excellent	Skills registry (5,400+)	Massive
NanoBot (HKUDS)	27K	Daily	Good	ClawHub	Growing
NanoClaw	22K	Daily	Good	Fork-and-modify	Growing
PicoClaw	18K	Weekly	Fair	Limited	Growing
ZeroClaw	17K	Weekly	Good	TOML manifests	Growing
NullClaw	12K	Weekly	Fair	Vtable interfaces	Small
TinyClaw	8K	Weekly	Fair	Agent configs	Small
CrewAI	46K	Daily	Excellent	7,000+ tools	Large
MetaGPT	42K	Monthly	Good	Role system	Large
AutoGen	38K	Daily	Good	Plugin system	Large
BabyAGI	20K	Monthly	Fair	Limited	Declining
Swarm	18K	Archived	Fair	Deprecated	Migrating to Agents SDK
OpenAI SDK	15K	Daily	Excellent	MCP + tools	Growing
Pydantic AI	15K	Daily	Excellent	MCP + A2A	Growing
SuperAGI	15K	Stalled (Jan 2024)	Outdated	Marketplace	Dead
Agent Zero	12K	Weekly	Good	Python functions	Niche
AutoGPT	170K	Monthly	Fair	Block marketplace	Declining

After analyzing 18 frameworks, several consensus patterns emerge:

1. Markdown Files as Persistent Memory

OpenClaw keeps curated facts in MEMORY.md, NanoClaw keeps a CLAUDE.md per group, and Claude Code uses CLAUDE.md. MetaGPT uses structured artifacts. The pattern recurs across otherwise unrelated frameworks: human-readable text files for state that humans need to inspect.

MEMORY.md   # What the agent knows (curated facts)
SOUL.md     # Who the agent is (personality, boundaries)
AGENTS.md   # How the agent behaves (rules, injected every turn)

This is not technically optimal (linear scan, no semantic search), but it solves the debuggability problem that vector-only systems fail at.

2. Hybrid Search Is the Memory Sweet Spot

The frameworks with the best memory systems all combine vector similarity with keyword matching:

OpenClaw: sqlite-vec + BM25
ZeroClaw: cosine similarity (0.7) + FTS5 BM25 (0.3)
NullClaw: vector + FTS5

Pure vector search misses exact terms (“wrangler.jsonc” won’t match “wrangler configuration”). Pure keyword search misses semantic similarity (“deployment tool” won’t match “wrangler”). The 70/30 split appears to be the emerging default; treat it as a starting point and tune it on your own queries.

3. Skills Are Just Markdown

OpenClaw proved it. ZeroClaw adopted it. The skill-as-markdown-file pattern has become the standard:

skills/
  my-skill/
    SKILL.md        # YAML frontmatter + markdown instructions
    (optional files) # Templates, configs, reference code

No SDK, no compilation, no runtime dependency. The agent reads the skill file and follows the instructions. This works because modern LLMs are good enough to follow structured markdown instructions reliably.

4. Container Isolation for Security

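NanoClaw and Agent Zero both use container isolation per agent. Among the self-hosted frameworks this is the strongest boundary, because it is enforced by the operating system rather than by the agent runtime. Workspace isolation and tool allowlists are enforced by the framework itself, so a bug in the runtime or a successful prompt injection can bypass them.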

5. Polling, Not Pushing

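NanoClaw polls IPC directories every second. OpenClaw watches file changes. BabyAGI runs a poll loop. The pattern: for agent coordination, a simple pull loop over durable state (files, a task list) is easier to debug and to recover after a crash than a push pipeline, at the cost of added latency.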

6. Single Agent Scales Down, Multi-Agent Scales Up

The Claw family (OpenClaw, NanoClaw, PicoClaw, ZeroClaw, NullClaw) is built around a single agent per instance; OpenClaw and NanoClaw add IPC, but coordination is not their core model. These frameworks scale down beautifully: NullClaw runs on 1MB RAM.

Multi-agent systems (CrewAI, AutoGen, TinyClaw, MetaGPT) scale up — they can handle complex workflows with specialized roles. But they add significant complexity.

The pattern: start with a single agent, add multi-agent only when a single agent demonstrably cannot handle the workload.

7. State Machines for Complex Workflows

LangGraph’s graph model and CrewAI’s Flows both treat agent workflows as state machines. This provides:

Deterministic routing (edges determine next step)
Checkpoint/resume (state snapshots at each node)
Human-in-the-loop (interrupt at specific nodes)
Observability (trace the path through the graph)

For anything beyond simple chat, state machines are emerging as the coordination primitive.
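The four properties listed above can be shown in one small sketch. It is a generic state machine, not LangGraph's or CrewAI's API: the node names, `FlowNode`, and `runFlow` are hypothetical.

```typescript
type NodeName = "research" | "review" | "publish" | "done";

interface FlowState {
  draft: string;
  approved: boolean | null;
  trace: NodeName[]; // observability: the path taken through the graph
}

interface FlowNode {
  run: (state: FlowState) => FlowState;
  next: (state: FlowState) => NodeName; // edge function: routing depends only on state
  interruptBefore?: boolean;            // pause for a human before this node runs
}

const flow: Record<Exclude<NodeName, "done">, FlowNode> = {
  research: {
    run: (s) => ({ ...s, draft: "3 sources summarized" }),
    next: () => "review",
  },
  review: {
    interruptBefore: true,
    run: (s) => s,
    next: (s) => (s.approved ? "publish" : "research"),
  },
  publish: {
    run: (s) => ({ ...s, draft: s.draft + " [published]" }),
    next: () => "done",
  },
};

// Runs until the flow finishes or reaches an interrupt node.
// Pass resuming = true to continue past the interrupt that paused the last run.
function runFlow(
  start: NodeName,
  state: FlowState,
  resuming: boolean
): { at: NodeName; state: FlowState } {
  let current = start;
  let skipInterrupt = resuming;
  let hops = 0;

  while (current !== "done" && hops++ < 50) { // hop cap guards against routing loops
    const node = flow[current];
    if (node.interruptBefore && !skipInterrupt) return { at: current, state };
    skipInterrupt = false;
    state = node.run(state);
    state = { ...state, trace: state.trace.concat(current) };
    current = node.next(state);
  }
  return { at: current, state };
}

const paused = runFlow("research", { draft: "", approved: null, trace: [] }, false);
// A human inspects paused.state.draft here, then approves.
const finished = runFlow(paused.at, { ...paused.state, approved: true }, true);
```

The value returned at the pause (`at` plus `state`) is exactly what a checkpointer would persist, which is why interrupts and resume tend to come as a pair in these frameworks.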

How to Build a Hybrid Memory System

The best memory systems combine multiple retrieval strategies. Here is a TypeScript implementation inspired by OpenClaw and ZeroClaw:

import Database from "better-sqlite3"; // the package's default export is the constructor

interface Memory {
  id: string;
  content: string;
  category: "fact" | "solution" | "preference" | "error";
  embedding: Float32Array;
  created_at: number;
  importance: number;
}

interface SearchResult {
  memory: Memory;
  score: number;
  source: "vector" | "keyword" | "both";
}

class HybridMemoryStore {
  private db: Database;
  private vectorWeight = 0.7;
  private keywordWeight = 0.3;
  private decayHalfLifeDays = 30;

  constructor(dbPath: string) {
    this.db = new Database(dbPath);
    this.initSchema();
  }

  private initSchema() {
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS memories (
        id TEXT PRIMARY KEY,
        content TEXT NOT NULL,
        category TEXT NOT NULL,
        embedding BLOB,
        created_at INTEGER NOT NULL,
        importance REAL DEFAULT 1.0
      );
      CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts
        USING fts5(content, id UNINDEXED);
    `);
  }

  async store(memory: Omit<Memory, "id" | "created_at">): Promise<string> {
    const id = crypto.randomUUID();
    const created_at = Date.now();

    this.db.prepare(`
      INSERT INTO memories (id, content, category, embedding, created_at, importance)
      VALUES (?, ?, ?, ?, ?, ?)
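    `).run(id, memory.content, memory.category,
           // Respect byteOffset/byteLength in case the array is a view on a larger buffer
           Buffer.from(memory.embedding.buffer, memory.embedding.byteOffset,
                       memory.embedding.byteLength),
           created_at, memory.importance);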

    this.db.prepare(`
      INSERT INTO memories_fts (id, content) VALUES (?, ?)
    `).run(id, memory.content);

    return id;
  }

  async search(
    query: string,
    queryEmbedding: Float32Array,
    limit: number = 10
  ): Promise<SearchResult[]> {
    // Vector search
    const vectorResults = this.vectorSearch(queryEmbedding, limit * 2);

    // Keyword search
    const keywordResults = this.keywordSearch(query, limit * 2);

    // Merge with weights
    const merged = this.mergeResults(vectorResults, keywordResults);

    // Apply temporal decay
    const decayed = merged.map((r) => ({
      ...r,
      score: r.score * this.temporalDecay(r.memory.created_at),
    }));

    return decayed
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }

  private vectorSearch(
    queryEmbedding: Float32Array,
    limit: number
  ): SearchResult[] {
    const rows = this.db.prepare(`
      SELECT id, content, category, embedding, created_at, importance
      FROM memories WHERE embedding IS NOT NULL
    `).all() as any[];

    return rows
      .map((row) => {
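        // Copy the BLOB bytes first: a Node Buffer may be a view into a larger shared pool
        const embedding = new Float32Array(new Uint8Array(row.embedding).buffer);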
        const similarity = this.cosineSimilarity(queryEmbedding, embedding);
        return {
          memory: { ...row, embedding },
          score: similarity * this.vectorWeight,
          source: "vector" as const,
        };
      })
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }

  private keywordSearch(query: string, limit: number): SearchResult[] {
    const rows = this.db.prepare(`
      SELECT m.id, m.content, m.category, m.embedding, m.created_at, m.importance,
             bm25(memories_fts) as bm25_score
      FROM memories_fts f
      JOIN memories m ON f.id = m.id
      WHERE memories_fts MATCH ?
      ORDER BY bm25_score
      LIMIT ?
    `).all(query, limit) as any[];

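    return rows.map((row) => {
      // FTS5 bm25() is negative, with better matches more negative.
      // Squash the magnitude into [0, 1) so it is comparable to cosine similarity.
      const relevance = Math.abs(row.bm25_score);
      return {
        memory: { ...row, embedding: new Float32Array(new Uint8Array(row.embedding).buffer) },
        score: (relevance / (1 + relevance)) * this.keywordWeight,
        source: "keyword" as const,
      };
    });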
  }

  private mergeResults(
    vectorResults: SearchResult[],
    keywordResults: SearchResult[]
  ): SearchResult[] {
    const merged = new Map<string, SearchResult>();

    for (const r of vectorResults) {
      merged.set(r.memory.id, r);
    }

    for (const r of keywordResults) {
      const existing = merged.get(r.memory.id);
      if (existing) {
        existing.score += r.score;
        existing.source = "both";
      } else {
        merged.set(r.memory.id, r);
      }
    }

    return Array.from(merged.values());
  }

  private temporalDecay(createdAt: number): number {
    const ageMs = Date.now() - createdAt;
    const ageDays = ageMs / (1000 * 60 * 60 * 24);
    return Math.pow(0.5, ageDays / this.decayHalfLifeDays);
  }

  // Not private: the AutoLearner consolidation pass reuses it
  cosineSimilarity(a: Float32Array, b: Float32Array): number {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    const denom = Math.sqrt(normA) * Math.sqrt(normB);
    return denom === 0 ? 0 : dot / denom; // a zero vector has no direction
  }
}
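The temporalDecay method above is an exponential half-life. As a worked illustration, assume a vector score v and a keyword score k that are both scaled to [0, 1] (the symbols v, k, t, and h are introduced here for the example), with t the memory's age in days and h the half-life:

```latex
\[
  w(t) = \left(\tfrac{1}{2}\right)^{t/h}, \qquad h = 30 \text{ days}
\]
\[
  \mathrm{score}(t) = \bigl(0.7\,v + 0.3\,k\bigr)\, w(t)
\]
\[
  v = 0.8,\; k = 0.5:\qquad
  \mathrm{score}(0) = 0.71,\quad
  \mathrm{score}(30) = 0.355,\quad
  \mathrm{score}(60) = 0.1775
\]
```

A strong match therefore loses half its score every 30 days, so a fresh memory with a combined score of 0.36 outranks a month-old memory that originally scored 0.71.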

How to Build an Auto-Learning System

Inspired by Agent Zero’s approach, here is how to add auto-extraction to any agent:

interface ExtractedKnowledge {
  content: string;
  category: "fact" | "solution" | "preference" | "error";
  confidence: number;
}

class AutoLearner {
  private memoryStore: HybridMemoryStore;
  private extractionModel: string;
  private embeddingModel: string;

  constructor(
    memoryStore: HybridMemoryStore,
    extractionModel = "claude-3-5-haiku-20241022", // cheap model for extraction
    embeddingModel = "nomic-embed-text"
  ) {
    this.memoryStore = memoryStore;
    this.extractionModel = extractionModel;
    this.embeddingModel = embeddingModel;
  }

  async learnFromConversation(
    userMessage: string,
    assistantResponse: string
  ): Promise<ExtractedKnowledge[]> {
    const extracted = await this.extract(userMessage, assistantResponse);

    for (const item of extracted) {
      if (item.confidence < 0.7) continue; // Skip low-confidence extractions

      // Check for duplicates via semantic search
      const embedding = await this.embed(item.content);
      const existing = await this.memoryStore.search(item.content, embedding, 3);

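      // search() scores are weighted and time-decayed (the vector part is capped at 0.7),
      // so this threshold only fires when keyword overlap is also strong
      const isDuplicate = existing.some((r) => r.score > 0.9);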
      if (isDuplicate) continue;

      await this.memoryStore.store({
        content: item.content,
        category: item.category,
        embedding,
        importance: item.confidence,
      });
    }

    return extracted;
  }

  private async extract(
    userMessage: string,
    assistantResponse: string
  ): Promise<ExtractedKnowledge[]> {
    const prompt = `Extract factual knowledge from this conversation turn.

For each piece of knowledge, classify it:
- fact: Something true about the user, their project, or their environment
- solution: A problem-solution pair that worked
- preference: A user preference or style choice
- error: A mistake or anti-pattern discovered

Return JSON array. Only include high-confidence items.

User: ${userMessage}
Assistant: ${assistantResponse}

Extract:`;

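    // callLLM and callEmbedding are assumed helpers that wrap your provider's client
    const response = await callLLM(this.extractionModel, prompt);
    try {
      const parsed = JSON.parse(response);
      return Array.isArray(parsed) ? parsed : [];
    } catch {
      return []; // the model returned something other than JSON
    }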
  }

  private async embed(text: string): Promise<Float32Array> {
    return callEmbedding(this.embeddingModel, text);
  }

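  async consolidate(): Promise<number> {
    // Run periodically to merge related memories.
    // Assumes the store exposes getAll() and delete(id) (not defined on
    // HybridMemoryStore above) and a callable cosineSimilarity().
    // The pairwise comparison is O(n^2), so batch or index for large stores.
    const allMemories = await this.memoryStore.getAll();
    const consumed = new Set<string>(); // ids already merged away in this pass
    let mergeCount = 0;

    for (let i = 0; i < allMemories.length; i++) {
      if (consumed.has(allMemories[i].id)) continue;
      for (let j = i + 1; j < allMemories.length; j++) {
        if (consumed.has(allMemories[j].id)) continue;
        const similarity = this.memoryStore.cosineSimilarity(
          allMemories[i].embedding,
          allMemories[j].embedding
        );

        if (similarity > 0.85) {
          // Merge: ask LLM to combine
          const merged = await this.merge(allMemories[i], allMemories[j]);
          await this.memoryStore.store(merged);
          await this.memoryStore.delete(allMemories[i].id);
          await this.memoryStore.delete(allMemories[j].id);
          consumed.add(allMemories[i].id);
          consumed.add(allMemories[j].id);
          mergeCount++;
          break; // memory i no longer exists; move on to the next i
        }
      }
    }

    return mergeCount;
  }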

  private async merge(a: Memory, b: Memory): Promise<Omit<Memory, "id" | "created_at">> {
    const prompt = `Merge these two related memories into one concise statement:
1: ${a.content}
2: ${b.content}

Merged:`;

    const content = await callLLM(this.extractionModel, prompt);
    const embedding = await this.embed(content);

    return {
      content,
      category: a.category,
      embedding,
      importance: Math.max(a.importance, b.importance),
    };
  }
}
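Extraction models do not always return clean JSON, so it helps to validate the response before storing anything. The sketch below is one way to do that; `parseExtraction` is a hypothetical helper, and its strategy of slicing from the first `[` to the last `]` is a simplifying assumption that fails if the surrounding prose itself contains brackets.

```typescript
interface ExtractedKnowledge {
  content: string;
  category: "fact" | "solution" | "preference" | "error";
  confidence: number;
}

const CATEGORIES = ["fact", "solution", "preference", "error"];

// Pull a JSON array out of a model reply and keep only well-formed items.
function parseExtraction(raw: string): ExtractedKnowledge[] {
  // Models often wrap JSON in prose; keep only the outermost array.
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];

  const valid: ExtractedKnowledge[] = [];
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const { content, category, confidence } = item as Record<string, unknown>;
    if (typeof content !== "string" || content.trim() === "") continue;
    if (typeof category !== "string" || CATEGORIES.indexOf(category) === -1) continue;
    if (typeof confidence !== "number" || confidence < 0 || confidence > 1) continue;
    valid.push({
      content,
      category: category as ExtractedKnowledge["category"],
      confidence,
    });
  }
  return valid;
}

const reply =
  'Here is the extraction:\n' +
  '[{"content":"User deploys with wrangler","category":"fact","confidence":0.9},' +
  '{"content":"","category":"fact","confidence":0.8},' +
  '{"content":"Prefers tabs","category":"opinion","confidence":0.7}]';

const items = parseExtraction(reply); // keeps only the first item
```

Dropping invalid items silently is a deliberate choice for a background learner: a missed memory is cheaper than a corrupted one.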

How to Build a Skill Loader

A TypeScript implementation of the SKILL.md pattern:

import { readdir, readFile } from "fs/promises";
import { join } from "path";
import { parse as parseYaml } from "yaml";

interface SkillMetadata {
  name: string;
  description: string;
  requires?: {
    binaries?: string[];
    env?: string[];
  };
  tags?: string[];
  platforms?: string[];
}

interface Skill {
  metadata: SkillMetadata;
  instructions: string;
  path: string;
}

class SkillLoader {
  private skillPaths: string[];
  private loadedSkills: Map<string, Skill> = new Map();

  constructor(skillPaths: string[]) {
    // Paths are applied in order and later paths override earlier ones,
    // so pass them from lowest to highest priority (bundled, user, workspace)
    this.skillPaths = skillPaths;
  }

  async loadAll(): Promise<Map<string, Skill>> {
    for (const basePath of this.skillPaths) {
      try {
        const dirs = await readdir(basePath, { withFileTypes: true });
        for (const dir of dirs) {
          if (!dir.isDirectory()) continue;
          const skillPath = join(basePath, dir.name, "SKILL.md");
          try {
            const skill = await this.parseSkillFile(skillPath);
            // Later paths override earlier (workspace > user > bundled)
            this.loadedSkills.set(skill.metadata.name, skill);
          } catch {
            // Skip invalid skill files
          }
        }
      } catch {
        // Skip missing directories
      }
    }

    return this.loadedSkills;
  }

  private async parseSkillFile(path: string): Promise<Skill> {
    const content = await readFile(path, "utf-8");

    // Parse YAML frontmatter
    // \r?\n also accepts files saved with Windows line endings
    const frontmatterMatch = content.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n([\s\S]*)$/);
    if (!frontmatterMatch) {
      throw new Error(`Invalid SKILL.md format: ${path}`);
    }

    const metadata = parseYaml(frontmatterMatch[1]) as SkillMetadata;
    const instructions = frontmatterMatch[2].trim();

    return { metadata, instructions, path };
  }

  async getAvailableSkills(
    context: { env: Record<string, string>; platform: string }
  ): Promise<Skill[]> {
    const available: Skill[] = [];

    for (const skill of this.loadedSkills.values()) {
      // Check environment requirements
      if (skill.metadata.requires?.env) {
        const missing = skill.metadata.requires.env.filter(
          (e) => !context.env[e]
        );
        if (missing.length > 0) continue;
      }

      // Check platform requirements
      if (
        skill.metadata.platforms &&
        !skill.metadata.platforms.includes("all") &&
        !skill.metadata.platforms.includes(context.platform)
      ) {
        continue;
      }

      available.push(skill);
    }

    return available;
  }

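  formatForSystemPrompt(skills: Skill[]): string {
    // OpenClaw-style: inject compact XML list into system prompt.
    // Escape attribute values so quotes or angle brackets in a skill
    // description cannot break the XML structure.
    const escapeAttr = (value: string): string =>
      value
        .replace(/&/g, "&amp;")
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
    const lines = skills.map(
      (s) =>
        `<skill name="${escapeAttr(s.metadata.name)}" description="${escapeAttr(s.metadata.description)}" />`
    );
    return `<available_skills>\n${lines.join("\n")}\n</available_skills>`;
  }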
}

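// Usage: list paths from lowest to highest priority, because loadAll()
// lets later paths overwrite earlier ones.
const loader = new SkillLoader([
  "./bundled-skills",            // Built-in skills (lowest priority)
  `${process.env.HOME}/.openclaw/skills`,  // User skills
  "./skills",                    // Workspace skills (highest priority)
]);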

const skills = await loader.loadAll();
const available = await loader.getAvailableSkills({
  env: process.env as Record<string, string>,
  platform: process.platform,
});

const systemPromptSkills = loader.formatForSystemPrompt(available);
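
Because `loadAll` writes every skill into one map keyed by name, the order of the path array is what decides precedence: whichever directory is scanned last wins. A minimal sketch of that rule in isolation, using a hypothetical `mergeSkillLayers` helper and a simplified `LayeredSkill` shape (neither is part of OpenClaw or of the loader above):

```typescript
// Hypothetical, simplified skill shape for illustration only.
interface LayeredSkill {
  name: string;
  source: string; // which directory the skill came from
}

// Merge layers in order; a later layer replaces an earlier skill of the same name.
function mergeSkillLayers(layers: LayeredSkill[][]): Map<string, LayeredSkill> {
  const merged = new Map<string, LayeredSkill>();
  for (const layer of layers) {
    for (const skill of layer) {
      merged.set(skill.name, skill);
    }
  }
  return merged;
}

const bundledLayer: LayeredSkill[] = [
  { name: "git-commit", source: "bundled" },
  { name: "summarize", source: "bundled" },
];
const userLayer: LayeredSkill[] = [{ name: "summarize", source: "user" }];
const workspaceLayer: LayeredSkill[] = [
  { name: "git-commit", source: "workspace" },
];

// Lowest priority first, highest priority last.
const mergedSkills = mergeSkillLayers([bundledLayer, userLayer, workspaceLayer]);
```

Here `git-commit` resolves to the workspace copy and `summarize` to the user copy, so passing the directories in the opposite order would silently let bundled skills shadow local overrides.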

How to Build a Multi-Executor Router

For systems like Code Turtle that need to route tasks to different LLM backends based on cost and capability:

interface ExecutorConfig {
  name: string;
  model: string;
  provider: "anthropic" | "openai" | "google" | "openrouter" | "workers-ai";
  costPerMTokInput: number;
  costPerMTokOutput: number;
  maxContextTokens: number;
  capabilities: Set<"code" | "reasoning" | "simple" | "embedding">;
  rateLimit: { requests: number; perSeconds: number };
  currentUsage: { requests: number; windowStart: number };
}

class ExecutorRouter {
  private executors: ExecutorConfig[];
  private dailyBudget: number;
  private dailySpend: number = 0;

  constructor(executors: ExecutorConfig[], dailyBudget: number) {
    this.executors = executors;
    this.dailyBudget = dailyBudget;
  }

  selectExecutor(task: {
    type: "code" | "reasoning" | "simple";
    estimatedInputTokens: number;
    estimatedOutputTokens: number;
    priority: "high" | "normal" | "low";
  }): ExecutorConfig | null {
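    // Filter by capability and by context window, so a large task is never
    // routed to a model that cannot hold it
    const capable = this.executors.filter(
      (e) =>
        e.capabilities.has(task.type) &&
        task.estimatedInputTokens + task.estimatedOutputTokens <=
          e.maxContextTokens
    );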

    // Filter by rate limit
    const available = capable.filter((e) => {
      const now = Date.now() / 1000;
      if (now - e.currentUsage.windowStart > e.rateLimit.perSeconds) {
        e.currentUsage = { requests: 0, windowStart: now };
      }
      return e.currentUsage.requests < e.rateLimit.requests;
    });

    if (available.length === 0) return null;

    // Estimate cost for each
    const withCost = available.map((e) => ({
      executor: e,
      estimatedCost:
        (task.estimatedInputTokens / 1_000_000) * e.costPerMTokInput +
        (task.estimatedOutputTokens / 1_000_000) * e.costPerMTokOutput,
    }));

    // Budget check
    const affordable = withCost.filter(
      (e) => this.dailySpend + e.estimatedCost <= this.dailyBudget
    );

    if (affordable.length === 0) {
      // Budget exceeded -- only allow free tier
      const free = withCost.filter((e) => e.estimatedCost === 0);
      return free.length > 0 ? free[0].executor : null;
    }

    // Strategy by priority
    if (task.priority === "high") {
      // Most expensive affordable executor, using price as a rough proxy
      // for model quality (still within the daily budget)
      return affordable.sort(
        (a, b) => b.estimatedCost - a.estimatedCost
      )[0].executor;
    }

    // Normal/Low: cheapest capable model
    return affordable.sort(
      (a, b) => a.estimatedCost - b.estimatedCost
    )[0].executor;
  }

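  recordUsage(executor: ExecutorConfig, actualCost: number): void {
    executor.currentUsage.requests++;
    this.dailySpend += actualCost;
  }

  // dailySpend only grows, so call this at each day boundary (for example
  // from a scheduled job); otherwise the budget stays exhausted forever.
  resetDailySpend(): void {
    this.dailySpend = 0;
  }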
}

// Example configuration:
const executors: ExecutorConfig[] = [
  {
    name: "claude-sonnet",
    model: "claude-sonnet-4-5-20250929",
    provider: "anthropic",
    costPerMTokInput: 3,
    costPerMTokOutput: 15,
    maxContextTokens: 200_000,
    capabilities: new Set(["code", "reasoning", "simple"]),
    rateLimit: { requests: 50, perSeconds: 60 },
    currentUsage: { requests: 0, windowStart: 0 },
  },
  {
    name: "claude-haiku",
    model: "claude-3-5-haiku-20241022",
    provider: "anthropic",
    costPerMTokInput: 0.8,
    costPerMTokOutput: 4,
    maxContextTokens: 200_000,
    capabilities: new Set(["simple", "code"]),
    rateLimit: { requests: 100, perSeconds: 60 },
    currentUsage: { requests: 0, windowStart: 0 },
  },
  {
    name: "gemini-flash",
    model: "gemini-2.0-flash",
    provider: "google",
    costPerMTokInput: 0.1,
    costPerMTokOutput: 0.4,
    maxContextTokens: 1_000_000,
    capabilities: new Set(["simple", "reasoning"]),
    rateLimit: { requests: 15, perSeconds: 60 },
    currentUsage: { requests: 0, windowStart: 0 },
  },
  {
    name: "workers-ai-llama",
    model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    provider: "workers-ai",
    costPerMTokInput: 0,  // Free tier
    costPerMTokOutput: 0,
    maxContextTokens: 8_192,
    capabilities: new Set(["simple"]),
    rateLimit: { requests: 300, perSeconds: 60 },
    currentUsage: { requests: 0, windowStart: 0 },
  },
];
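
The router tracks `dailySpend` as a running total, so something has to notice when a new day starts. One way to handle that is to move budget tracking into a small helper that resets itself on the first call of each UTC day. The sketch below is an assumption about how you might structure it; `DailyBudget` and `estimateCost` are hypothetical names, not part of any framework discussed here.

```typescript
// Hypothetical helper: tracks spend per UTC day and resets on rollover.
class DailyBudget {
  private spent = 0;
  private day: string;
  private limit: number;

  constructor(limit: number, now: Date = new Date()) {
    this.limit = limit;
    this.day = DailyBudget.dayKey(now);
  }

  private static dayKey(d: Date): string {
    return d.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  }

  // Zero the counter the first time we are called on a new UTC day.
  private rollover(now: Date): void {
    const key = DailyBudget.dayKey(now);
    if (key !== this.day) {
      this.day = key;
      this.spent = 0;
    }
  }

  canAfford(cost: number, now: Date = new Date()): boolean {
    this.rollover(now);
    return this.spent + cost <= this.limit;
  }

  record(cost: number, now: Date = new Date()): void {
    this.rollover(now);
    this.spent += cost;
  }

  remaining(now: Date = new Date()): number {
    this.rollover(now);
    return this.limit - this.spent;
  }
}

// Dollar cost of one call, given per-million-token prices
// (the same formula the router uses for its estimates).
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricePerMTokInput: number,
  pricePerMTokOutput: number
): number {
  return (
    (inputTokens / 1_000_000) * pricePerMTokInput +
    (outputTokens / 1_000_000) * pricePerMTokOutput
  );
}

// Usage with the Sonnet prices from the example configuration.
const budget = new DailyBudget(5);
const sonnetCallCost = estimateCost(100_000, 20_000, 3, 15);
if (budget.canAfford(sonnetCallCost)) {
  budget.record(sonnetCallCost);
}
```

Passing `now` explicitly keeps the helper easy to test and lets you choose a different day boundary (for example a billing timezone) without touching the router.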

Don’t	Do Instead	Why
Store all memory in a vector database only	Use hybrid search (vector + keyword) with human-readable MEMORY.md	Vector-only misses exact terms and is impossible to debug by inspection
Give agents unrestricted shell access	Use container isolation (NanoClaw) or tool allowlists (ZeroClaw)	One `rm -rf /` away from disaster. Security is not optional
Use one model for everything	Route by task complexity: Haiku for simple, Sonnet for code, Opus for reasoning	An Opus call can cost up to 60x a Haiku call for the same tokens, depending on the model versions compared. Cost control requires routing
Build multi-agent when single-agent suffices	Start single-agent, add coordination only when demonstrably needed	Multi-agent adds latency, cost, and debugging complexity
Rely on LLM memory alone (no persistent store)	Persist key facts to files or database after every session	LLMs forget everything between context windows. Persistent memory is infrastructure, not optional
Use push-based agent coordination	Use polling (check for new tasks every N seconds)	Push systems are harder to debug, harder to make reliable, harder to replay
Skip idempotency in task execution	Check if task was already completed before executing	Agents retry. Networks flake. With at-least-once delivery, duplicate execution will eventually happen
Store embeddings without an LRU cache	Cache recent embeddings (ZeroClaw caches 10K)	Embedding API calls are slow and expensive. Cache hits are free
Use a stalled/dead framework (SuperAGI, early BabyAGI)	Use actively maintained frameworks	Unpatched security vulnerabilities, no bug fixes, community has moved on
Lock into a single LLM provider	Abstract the LLM interface (provider trait/interface)	Pricing changes, outages, and new models are constant. Provider agnosticism is insurance
Inject all skills into every prompt	Filter skills by context, load only relevant ones	5,400 skills in the prompt would consume the entire context window. Selective injection is mandatory
Build custom memory from scratch	Adopt the hybrid search pattern (SQLite + FTS5 + vector)	This is a solved problem. OpenClaw, ZeroClaw, and NullClaw all converge on the same design
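
As an illustration of the embedding-cache row above, here is a minimal LRU cache sketch. It relies on the fact that a JavaScript `Map` iterates in insertion order, so re-inserting a key on every read keeps the least recently used entry at the front. `LruCache` and `cachedEmbedder` are hypothetical names, and the `embed` parameter stands in for whatever provider call you use; this is not ZeroClaw's implementation.

```typescript
// Minimal LRU cache built on Map's insertion-order iteration.
class LruCache<V> {
  private entries = new Map<string, V>();
  private capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Wrap an embedding function so repeated texts skip the API call.
// `embed` is a placeholder for your provider's embedding request.
function cachedEmbedder(
  embed: (text: string) => Promise<number[]>,
  capacity = 10_000
): (text: string) => Promise<number[]> {
  const cache = new LruCache<number[]>(capacity);
  return async (text: string): Promise<number[]> => {
    const hit = cache.get(text);
    if (hit) return hit;
    const vector = await embed(text);
    cache.set(text, vector);
    return vector;
  };
}
```

Keying on the raw text is the simplest option. If your inputs are long, hashing the text first keeps the keys small at the cost of one extra step per lookup.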

Personal AI Assistant (Chat + Automation)

Recommended: OpenClaw (full-featured) or NanoBot (lightweight)

OpenClaw if you want the largest ecosystem, most messaging platform integrations, and battle-tested memory system. NanoBot if you want something lighter with knowledge graph memory and Python extensibility. NanoClaw if you want container security and are Claude-only.

Edge / IoT / Embedded

Recommended: PicoClaw (Go, <10MB) or NullClaw (Zig, 678KB)

PicoClaw for the widest architecture support (RISC-V, ARM, MIPS, x86) and Go’s ecosystem. NullClaw for the absolute smallest footprint and hardware peripheral support.

Multi-Agent Team Workflows

Recommended: CrewAI (role-based) or LangGraph (graph-based)

CrewAI for intuitive role definitions and the largest tool ecosystem (7,000+). LangGraph for the strongest checkpointing, persistence, and human-in-the-loop story. AutoGen if you need enterprise features and Azure integration.

TinyClaw is interesting for lightweight team collaboration with fan-out patterns.

Production Enterprise Deployment

Recommended: LangGraph + PostgresSaver or Microsoft Agent Framework

LangGraph is GA at v1.0.10 with full checkpoint recovery and production tooling. Microsoft Agent Framework for Azure-native shops that need GDPR, OpenTelemetry, and enterprise support.

Serverless / Edge-Native Agents

Recommended: Cloudflare Agents SDK

The only framework in this comparison where each agent is a Durable Object with auto-persistent state, built-in scheduling, WebSocket communication, and queue integration. None of the other frameworks covered here matches this for edge-native deployment.

Auto-Learning Agents

Recommended: Agent Zero (adopt the pattern) + your framework of choice

None of the other frameworks compared here does automatic knowledge extraction. Implement Agent Zero’s extract-embed-consolidate pattern on top of whatever framework you choose.
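
To make the extract-embed-consolidate idea concrete, here is a generic sketch of the three steps. It is an assumption about how such a pipeline could be structured, not Agent Zero's actual code: `extractFacts` and `embed` are hypothetical callbacks you would back with an LLM call and an embedding API, and the 0.9 similarity threshold is an arbitrary starting point.

```typescript
interface MemoryItem {
  text: string;
  embedding: number[];
  updatedAt: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Consolidate: if the closest stored memory is a near-duplicate, replace it
// with the newer fact; otherwise append. Returns a new array.
function consolidate(
  store: MemoryItem[],
  incoming: MemoryItem,
  threshold = 0.9
): MemoryItem[] {
  let bestIndex = -1;
  let bestScore = -Infinity;
  store.forEach((item, i) => {
    const score = cosineSimilarity(item.embedding, incoming.embedding);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  });
  if (bestIndex >= 0 && bestScore >= threshold) {
    const next = store.slice();
    next[bestIndex] = incoming;
    return next;
  }
  return [...store, incoming];
}

// Extract -> embed -> consolidate, run once at the end of a session.
async function learnFromSession(
  transcript: string,
  store: MemoryItem[],
  extractFacts: (transcript: string) => Promise<string[]>, // hypothetical LLM call
  embed: (text: string) => Promise<number[]> // hypothetical embedding call
): Promise<MemoryItem[]> {
  let next = store;
  for (const text of await extractFacts(transcript)) {
    const embedding = await embed(text);
    next = consolidate(next, { text, embedding, updatedAt: Date.now() });
  }
  return next;
}
```

Replacing near-duplicates rather than appending keeps the store from filling up with restatements of the same fact, which is the main job of the consolidate step.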

Cost-Optimized Autonomous Workers

Recommended: Build a multi-executor router (see implementation above)

No framework handles this well out of the box. You need to build a routing layer that picks the cheapest capable model per task and enforces daily budgets. The executor router pattern in this article is a starting point.

Code Review / Development Agents

Recommended: Claude Agent SDK or OpenAI Agents SDK

Claude Agent SDK gives you the full Claude Code toolset (Read, Write, Edit, Bash). OpenAI Agents SDK gives you provider flexibility with clean handoff patterns.

Recommendation Matrix

Use Case	First Choice	Runner-Up	Avoid
Personal assistant	OpenClaw	NanoBot	SuperAGI
Edge / IoT	PicoClaw	NullClaw	OpenClaw (too heavy)
Multi-agent teams	CrewAI	LangGraph	BabyAGI
Enterprise	LangGraph	Microsoft Agent Framework	Swarm (deprecated)
Serverless	CF Agents SDK	—	Local-only frameworks
Auto-learning	Agent Zero pattern	—	Frameworks without persistence
Cost-optimized	Custom router	—	Single-provider frameworks
Code agents	Claude Agent SDK	OpenAI Agents SDK	AutoGPT
Type safety	Pydantic AI	LangGraph	Untyped frameworks
Software company sim	MetaGPT	CrewAI	Single-agent frameworks

Our autonomous issue worker needs:

Multi-executor backends — The executor router pattern above solves this. Abstract the LLM interface, route by task complexity and budget. Start with Claude Sonnet for code, Haiku for simple tasks, Workers AI for free-tier simple tasks.
A skill system — Adopt the SKILL.md pattern from OpenClaw. Skills as markdown files with YAML frontmatter. No SDK, no compilation. The skill loader implementation above is directly usable.
A coordinator (L2) — LangGraph’s graph model is the best pattern for multi-step workflows. But for a single-agent system, a simple state machine with checkpoint/resume is sufficient. Start with Pydantic AI’s durable execution or implement checkpoint/resume over D1.
Memory across runs — Implement the hybrid memory system above. SQLite + FTS5 + vector embeddings. Store MEMORY.md for human inspection. Add auto-learning (Agent Zero pattern) once the basic system works.
Cost optimization — The executor router with daily budgets, rate limit awareness, and free-tier fallback. Track actual costs per task and adjust routing thresholds based on real data.

Patterns to adopt: SKILL.md skills, hybrid memory search, auto-learning extraction, multi-executor routing, checkpoint/resume state.

Patterns to skip: multi-agent coordination (overkill for our use case), push-based communication (polling is simpler), full graph workflow engine (a state machine is sufficient).
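
The "simple state machine with checkpoint/resume" mentioned for the coordinator can be sketched in a few dozen lines. Everything here is illustrative: the step names, the `CheckpointStore` interface, and `runTask` are assumptions, and the in-memory store stands in for a D1 table or any other durable backend.

```typescript
type Step = "plan" | "implement" | "test" | "done";

interface Checkpoint {
  taskId: string;
  step: Step;
  data: Record<string, unknown>;
}

// Hypothetical storage interface; a D1 table, SQLite file, or KV store could back it.
interface CheckpointStore {
  load(taskId: string): Promise<Checkpoint | undefined>;
  save(checkpoint: Checkpoint): Promise<void>;
}

class InMemoryCheckpointStore implements CheckpointStore {
  private rows = new Map<string, Checkpoint>();

  async load(taskId: string): Promise<Checkpoint | undefined> {
    return this.rows.get(taskId);
  }

  async save(checkpoint: Checkpoint): Promise<void> {
    // Copy so later mutation by the caller cannot alter the saved state.
    this.rows.set(checkpoint.taskId, {
      ...checkpoint,
      data: { ...checkpoint.data },
    });
  }
}

type StepHandler = (
  data: Record<string, unknown>
) => Promise<{ next: Step; data: Record<string, unknown> }>;

// Run a task from its last checkpoint, or from "plan" if none exists.
async function runTask(
  taskId: string,
  handlers: Partial<Record<Step, StepHandler>>,
  store: CheckpointStore
): Promise<Checkpoint> {
  let checkpoint: Checkpoint = (await store.load(taskId)) ?? {
    taskId,
    step: "plan",
    data: {},
  };
  while (checkpoint.step !== "done") {
    const handler = handlers[checkpoint.step];
    if (!handler) throw new Error(`No handler for step ${checkpoint.step}`);
    const result = await handler(checkpoint.data);
    checkpoint = { taskId, step: result.next, data: result.data };
    // Persist after every transition so a crash resumes at this step.
    await store.save(checkpoint);
  }
  return checkpoint;
}
```

Because the checkpoint is written only after a handler succeeds, a crash mid-step re-runs that step on resume. That is the reason the idempotency check from the anti-patterns table matters for every handler.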

Official Documentation

OpenClaw Documentation — Full documentation for the OpenClaw framework
OpenClaw Skills Guide — How the SKILL.md skill system works
NanoClaw GitHub — Container-isolated Claude-native agent
NanoBot GitHub — Ultra-lightweight OpenClaw alternative
PicoClaw GitHub — Go-based agent for edge/IoT hardware
ZeroClaw GitHub — Rust-based trait-driven agent runtime
NullClaw GitHub — 678KB Zig agent with vtable architecture
TinyClaw GitHub — Multi-agent team collaboration framework
Agent Zero GitHub — Auto-learning hierarchical agent framework
LangGraph Documentation — Graph API for stateful agents
CrewAI Documentation — Multi-agent orchestration framework
CrewAI Memory System — Short-term, long-term, entity, and contextual memory
AutoGen Documentation — Microsoft’s multi-agent framework
Microsoft Agent Framework Overview — Enterprise agent platform combining AutoGen and Semantic Kernel
OpenAI Agents SDK — Production evolution of Swarm
OpenAI Agents SDK GitHub — Source repository
MetaGPT GitHub — SOP-driven multi-agent collaboration
Claude Agent SDK Overview — Anthropic’s agent framework
Claude Agent SDK Python — Python implementation
Claude Agent SDK TypeScript — TypeScript implementation
Cloudflare Agents SDK — Edge-native agents on Durable Objects
Pydantic AI Documentation — Type-safe agent framework
Pydantic AI GitHub — Source repository
AutoGPT GitHub — Original autonomous agent platform
BabyAGI GitHub — Task-driven autonomous agent
SuperAGI GitHub — Stalled autonomous agent framework
Swarm GitHub — OpenAI’s educational multi-agent framework (deprecated)

Articles and Analysis

OpenClaw Beat React’s 10-Year GitHub Record in 60 Days — Analysis of OpenClaw’s explosive growth
OpenClaw Wikipedia — Background and history
NanoClaw Solves OpenClaw’s Security Issues — VentureBeat coverage of container isolation approach
PicoClaw and NanoBot vs OpenClaw — Lightweight alternatives comparison
Meet NullClaw: 678KB Zig AI Framework — MarkTechPost technical overview
ZeroClaw: A Minimal Rust-Based AI Agent Framework — DEV Community deep dive
OpenClaw, NanoBot, PicoClaw, ZeroClaw: The Claw Craziness — Overview of the Claw ecosystem
Building Agents with the Claude Agent SDK — Anthropic engineering blog
MetaGPT Research Paper — Academic paper on SOP-driven multi-agent collaboration
What is MetaGPT? - IBM — IBM’s overview of MetaGPT
Agent Zero: Revolutionary AI Framework — Tutorial and overview
Agent Zero Memory and Learning — DeepWiki analysis of auto-learning system
What is BabyAGI? - IBM — IBM’s overview of BabyAGI
Birth of BabyAGI — Yohei Nakajima’s original post

Comparison Articles

The 2026 AI Agent Framework Decision Guide: LangGraph vs CrewAI vs Pydantic AI — DEV Community comparison
Top 7 Agentic AI Frameworks in 2026 — AlphaMatch overview
AI Agent Frameworks 2026: LangGraph vs CrewAI & More — Let’s Data Science comparison
LangGraph vs CrewAI vs AutoGen: Which Framework in 2026? — ML Journey comparison
CrewAI vs AutoGen: Usage, Performance & Features in 2026 — Head-to-head comparison
Agent Zero vs AutoGen: Multi-Agent 2026 Guide — The AI Journal comparison
OpenClaw vs NanoBot: Which AI Agent Framework? — DataCamp comparison
12 Best Open-Source AI Agents & Frameworks in 2026 — Taskade comprehensive list
Rust Agent Runtime Showdown — Rust agent frameworks compared

Ecosystem and Community

Awesome OpenClaw Skills — 5,400+ skills filtered and categorized
Awesome OpenClaw Agents — 103 production-ready agent templates
OpenClaw Skills System (DeepWiki) — Technical analysis of skill loading
NanoClaw CLAUDE.md — Agent behavior rules
TinyClaw TinyOffice Portal — Web management portal
TinyClaw Infrastructure — Docker orchestration layer
ZeroClaw Migration Assessment — OpenClaw to ZeroClaw migration guide
PicoClaw Go Package — Go documentation
NanoBot Study Guide — Learn agent architecture in 3 days
Awesome Agents List — Comprehensive agent framework directory

Tutorials and Guides

Setting Up Skills in OpenClaw — Step-by-step skill creation
PicoClaw Setup Guide: Go Binary AI Assistant on $10 Hardware — Edge deployment guide
What are OpenClaw Skills? A Developer’s Guide — DigitalOcean developer guide
How to Build Custom OpenClaw Skills — LumaDock tutorial
Building Production-Ready AI Agents in 2026 using OpenAI Agent SDK — Architecture-first approach
NanoBot Tutorial: A Lightweight OpenClaw Alternative — DataCamp tutorial
LangGraph: Build Stateful Multi-Agent Systems That Don’t Crash — Production LangGraph patterns
Pydantic AI Tutorial: Build Type-Safe AI Agents — MyEngineeringPath tutorial

Platform Documentation

OpenClaw Official Site — Landing page and quick start
PicoClaw Official Site — Go-powered performance-first assistant
NullClaw Architecture — Technical architecture documentation
ZeroClaw Labs — Official site (beware unofficial domains)
NanoBot Official — MCP agent framework (nanobot-ai, different from HKUDS nanobot)
AutoGPT Platform — Low-code agent builder
Agent Zero Official — AI framework and computer assistant
CrewAI Open Source — Multi-agent orchestration
OpenAI AgentKit — Production agent deployment
Cloudflare Agents Landing — Edge-native agent platform