The human talks to the business. The business talks to the code. Every conversation makes the system smarter.
Most autonomous agent frameworks solve the wrong problem. They make a single agent that can code faster. The actual problem is different: a human runs a business with 40 repos, dozens of brands, continuous drift, and no time to start individual coding sessions for every repo that needs work. The human needs to talk at the business level — “Income Coach isn’t converting” — and have that translate into concrete code changes across multiple repos, automatically, with memory of what was tried and why.
This article describes Prime — a hierarchical control plane where persistent AI agents manage an organization autonomously. The human talks to the org-level agent about business priorities. The org agent decomposes intent into repo-level tasks. Per-repo agents execute, learn, and report back. The system gets smarter with every cycle.
Built on Cloudflare Agents SDK, Durable Objects, and a Napkin-inspired memory system that uses the best model for retrieval — not a separate embedding model.
Table of Contents
- Why This Exists
- The Vision: Business Language In, Code Changes Out
- Architecture Overview
- The Agent Hierarchy
- The Conversation Interface
- Memory: Napkin-Inspired Progressive Disclosure
- The Wake Cycle
- Skill System: SKILL.md
- Auto-Learning: Knowledge Distillation
- The Run Sheet: Control Plane to Data Plane
- The Dispatcher: Dumb Resource Manager
- Competitive Position
- Implementation: Current State and Migration
- Schema and Data Model
- References
Why This Exists
The Conversation Gap
Every agent framework in the 18-framework survey solves some version of “human tells agent what to code.” OpenClaw connects to 20+ chat platforms. Claude Agent SDK gives you terminal access. CrewAI orchestrates teams. LangGraph checkpoints multi-step workflows.
None of them solve: human talks about business → system produces code changes across multiple repos.
The gap is not in execution capability. Modern agents can write code, fix CI, add configurations, create PRs. The gap is in decomposition — translating business intent into concrete technical work across a portfolio of repos, with memory of what was tried, awareness of what’s blocked, and intelligence to know when NOT to act.
The Architecture Gap
OpenClaw has 302K stars and 5,400+ skills. It is also a single-agent system that requires a running machine, forgets everything between sessions (unless you manually maintain MEMORY.md), and has no concept of organizational hierarchy.
The Cloudflare Agents SDK provides the missing primitive: Durable Objects as persistent agents. A DO has its own SQLite database, survives indefinitely, hibernates at zero cost, wakes instantly on events, and is always addressable. This is the container that makes always-on agents economically viable.
But the Agents SDK is infrastructure, not architecture. It gives you the building blocks. This article describes what you build with them.
What Changes
| Before (current) | After (Prime) |
|---|---|
| Human starts a Claude Code session per repo | Human talks to OrgPrime about business priorities |
| Agent forgets when session ends | Agents have persistent memory across all sessions |
| One repo at a time | 40 repos managed simultaneously |
| Manual task decomposition | Business intent auto-decomposes into repo tasks |
| Dispatcher does everything (scan, evaluate, dispatch) | Control plane thinks, data plane executes |
| No conversation history | Every decision recorded, auditable, learnable |
| System is idle unless human is talking | System is always on, always aware |
The Vision: Business Language In, Code Changes Out
Human: "Income Coach isn't converting. The onboarding flow is
too long and the value prop isn't clear on the landing page."
OrgPrime reasons:
- This is a business-level concern spanning multiple repos
- income-coach repo: onboarding flow is code
- brand-systems repo: value prop is brand positioning
- frontasy repo: landing page renders the positioning
OrgPrime decomposes:
1. garywu/brand-systems → create issue: "sharpen Income Coach
value prop — current messaging doesn't communicate immediate
value to first-time visitors"
2. garywu/income-coach → create issue: "simplify onboarding —
reduce steps from 5 to 2, defer profile completion to after
first value delivery"
3. garywu/frontasy → create issue: "update Income Coach landing
page copy to match new positioning" (blocked on #1)
Each RepoPrime:
- Reads its CLAUDE.md for repo context
- Reads the issue OrgPrime created
- Reasons about implementation approach
- Submits jobs to the dispatcher
- Reports outcomes back to OrgPrime
OrgPrime tracks:
- All three repos progressing toward the same business goal
- Dependencies (frontasy blocked on brand-systems)
- Whether the business intent was actually addressed
This is not a hypothetical. Every piece of this — DO hierarchy, job dispatch, GitHub issues as memory, signal scanning, CI-aware merge — is either built or partially built in garywu/mulan. What’s missing is the intelligence layer: the LLM reasoning that converts business language into technical decomposition.
Architecture Overview
Human (WebSocket / Telegram / CLI)
│
▼
OrgPrime DO (one per GitHub org)
│ ├── Persistent SQLite memory
│ ├── Conversation history
│ ├── Business context + priorities
│ ├── Cross-repo awareness
│ └── Decomposes intent → repo tasks
│
├── RepoPrime DO × N (one per repo)
│ ├── Persistent SQLite memory
│ ├── CLAUDE.md identity
│ ├── Signal awareness (CI, standards, PRs)
│ ├── Attempt history
│ ├── Skill registry
│ └── Reasons about repo-specific work
│
▼
Dispatcher CF Worker (one per org)
│ ├── Reads run sheet from OrgPrime
│ ├── Manages runner capacity
│ ├── Deduplicates jobs
│ ├── Records every attempt
│ └── Reports outcomes to RepoPrimes
│
├── CF Runner (inline, free)
├── CI Runner (GitHub Actions, deterministic)
└── Local Runner (Claude Agent SDK, AI-driven)
The Three Separations
1. Control Plane vs Data Plane
The control plane (OrgPrime + RepoPrimes) thinks. The data plane (Dispatcher + Runners) executes. Neither does both. This separation means:
- A bug in dispatch logic doesn’t affect reasoning
- A bug in reasoning doesn’t break job execution
- Each layer can be improved independently
2. Business Level vs Code Level
OrgPrime speaks business language. RepoPrimes speak code language. The translation happens at the OrgPrime → RepoPrime boundary via GitHub issues. Issues are the API between business intent and technical execution.
3. Memory vs Execution
Memory lives in DO SQLite (private to each agent) and GitHub issues (shared, auditable). Execution lives in the dispatcher and runners. Memory survives indefinitely. Execution is ephemeral — jobs start, run, complete, and their outcomes feed back into memory.
The Agent Hierarchy
OrgPrime DO — The Business Agent
One per GitHub organization. The human’s primary interface.
Knows:
- All repos in the portfolio and their current state
- Business priorities and strategic context
- Cross-repo dependencies and patterns
- Conversation history with the human
- Which RepoPrimes are active, blocked, or idle
Does:
- Receives business-level input from human
- Decomposes into repo-level tasks (via GitHub issues)
- Produces the run sheet (prioritized work backlog)
- Coordinates cross-repo work
- Escalates to human when something needs judgment
- Tracks whether business intent was actually delivered
Does NOT:
- Execute any code
- Manage runners or capacity
- Make repo-specific technical decisions (delegated to RepoPrimes)
Wake triggers:
- Human conversation (WebSocket)
- RepoPrime escalation (DO RPC)
- Scheduled alarm (daily review)
- Dispatcher flag (job failed 3x)
RepoPrime DO — The Repo Agent
One per managed repository. Autonomous within its scope.
Knows:
- This repo’s CLAUDE.md (identity, rules, constraints)
- Current signal states (CI, biome, commitlint, etc.)
- Open issues and PRs
- Attempt history (what was tried, what failed, why)
- Skills applicable to this repo
Does:
- Reads issues created by OrgPrime or humans
- Reasons about implementation approach (one LLM call)
- Submits jobs to dispatcher
- Comments on GitHub issues (records decisions and outcomes)
- Flags OrgPrime when something needs cross-repo attention
- Distills knowledge from completed jobs into memory
Does NOT:
- Execute code directly (delegates to runners via dispatcher)
- Make business-level decisions (defers to OrgPrime)
- Communicate with other RepoPrimes (goes through OrgPrime)
Wake triggers:
- GitHub webhook (issue labeled, CI failed, PR merged)
- OrgPrime delegation (new issue created)
- Dispatcher notification (job completed/failed)
- Scheduled alarm (1h/6h/24h adaptive)
Dispatcher — The Resource Manager
One per organization. Pure mechanics, zero intelligence.
Does:
- Reads run sheet from OrgPrime
- Checks runner availability and capacity
- Deduplicates against pending/running jobs
- Dispatches top N jobs within capacity
- Records every attempt with outcome
- Reports outcomes to RepoPrimes
- Scans repo signals via GitHub API (proxied through API Mom)
- Generates README dashboard
Does NOT:
- Decide what to work on (reads run sheet)
- Evaluate policy (OrgPrime decides priorities)
- Create jobs (RepoPrimes create jobs)
- Reason about failures (RepoPrimes reason)
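The mechanical part of that list (deduplicate, then dispatch the top N within capacity) can be sketched as a pure function. This is a minimal illustration, not the mulan implementation: the `QueuedJob` shape, the `selectDispatchable` name, and the choice of `(repo, jobType)` as the dedup key are all assumptions.

```typescript
// Hypothetical job shape; the article does not define this type.
interface QueuedJob {
  repo: string;
  jobType: string;
  rank: number; // lower rank = higher priority, as on the run sheet
}

// Assumed dedup key: at most one in-flight job per (repo, jobType).
const jobKey = (j: { repo: string; jobType: string }): string =>
  `${j.repo}::${j.jobType}`;

function selectDispatchable(
  runSheet: QueuedJob[],
  inFlight: QueuedJob[],
  capacity: number,
): QueuedJob[] {
  const seen = new Set(inFlight.map(jobKey));
  const picked: QueuedJob[] = [];
  // Walk the sheet in priority order without mutating the caller's array.
  for (const job of [...runSheet].sort((a, b) => a.rank - b.rank)) {
    if (picked.length >= capacity) break;
    const key = jobKey(job);
    if (seen.has(key)) continue; // already pending, running, or picked
    seen.add(key);
    picked.push(job);
  }
  return picked;
}
```

Nothing in the function decides what matters. It only filters and truncates a list that was ranked elsewhere, which is the point of keeping the dispatcher free of intelligence.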
The Conversation Interface
WebSocket via Agents SDK
The Cloudflare Agents SDK provides native WebSocket support with hibernation. OrgPrime maintains a persistent WebSocket connection to the human’s client:
// Imports assume the Cloudflare Agents SDK ("agents") and the Vercel AI SDK ("ai")
import { Agent, type Connection } from "agents"
import { generateObject } from "ai"

export class OrgPrime extends Agent<Env, OrgState> {
  // WebSocket message from human
  async onMessage(connection: Connection, message: string) {
    // 1. Store message in conversation history (DO SQLite).
    //    Columns are named because the table has six and only four are set here.
    this.sql`INSERT INTO conversations (id, role, content, created_at) VALUES (
      ${crypto.randomUUID()}, 'human', ${message}, ${Date.now()}
    )`
    // 2. Build context (progressive disclosure from memory)
    const context = await this.buildContext(message)
    // 3. One LLM call: reason about business intent.
    //    generateObject resolves to a result whose parsed value is on `.object`.
    const { object: decision } = await generateObject({
      model: this.getModel(),
      schema: OrgDecisionSchema,
      system: context.pinnedContext,
      prompt: this.buildPrompt(context, message),
    })
    // 4. Execute decisions (create issues, delegate to RepoPrimes)
    await this.executeDecisions(decision.actions)
    // 5. Respond to human
    connection.send(JSON.stringify({
      type: 'response',
      summary: decision.summary,
      actions: decision.actions.map(a => a.reason),
    }))
    // 6. Distill new knowledge from this conversation turn
    await this.distillConversation(message, decision)
  }

  // WebSocket hibernation: zero cost when human isn't talking
  async onClose(connection: Connection) {
    // Connection state persists in DO SQLite
    // Next connection resumes with full history
  }
}
Multi-Platform Support
OrgPrime’s WebSocket is the canonical interface. Platform adapters translate:
| Platform | Adapter | Status |
|---|---|---|
| CLI (terminal) | Direct WebSocket | Priority 1 |
| Telegram | Bot webhook → OrgPrime HTTP → WebSocket | Built (@brewdbot) |
| Web dashboard | WebSocket from browser | Future |
| Slack | Slack Events API → OrgPrime HTTP | Future |
The adapter is thin — it translates platform message format to OrgPrime’s WebSocket protocol. All intelligence lives in OrgPrime.
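To make "thin" concrete, here is a sketch of what a Telegram adapter's translation step might look like. The `TelegramUpdate` fields are a small subset of the Telegram Bot API `Update` object; the `OrgPrimeMessage` wire format is hypothetical, since the article does not specify OrgPrime's protocol.

```typescript
// Subset of a Telegram Bot API Update (only the fields used here).
interface TelegramUpdate {
  update_id: number;
  message?: { chat: { id: number }; text?: string };
}

// Hypothetical OrgPrime wire format; not defined in the article.
interface OrgPrimeMessage {
  type: "human_message";
  platform: "telegram";
  replyTo: string; // where the adapter should send OrgPrime's response
  content: string;
}

// Pure translation: no reasoning, no state, no business logic.
function fromTelegram(update: TelegramUpdate): OrgPrimeMessage | null {
  const text = update.message?.text?.trim();
  if (!update.message || !text) return null; // ignore non-text updates
  return {
    type: "human_message",
    platform: "telegram",
    replyTo: String(update.message.chat.id),
    content: text,
  };
}
```

An adapter for another platform would differ only in the input type and the `platform` tag.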
Conversation Memory
Every conversation turn is stored in OrgPrime’s DO SQLite:
CREATE TABLE conversations (
id TEXT PRIMARY KEY,
role TEXT NOT NULL, -- 'human' | 'org-prime' | 'system'
content TEXT NOT NULL,
related_repos TEXT, -- JSON array of repos mentioned
related_issues TEXT, -- JSON array of issues referenced
created_at INTEGER NOT NULL
);
This gives OrgPrime:
- Full conversation history across sessions (survives hibernation)
- Ability to reference prior conversations (“remember when we discussed…”)
- Context for understanding follow-up messages
- Audit trail of every human ↔ system interaction
Memory: Napkin-Inspired Progressive Disclosure
Traditional RAG bolts a smaller, dumber embedding model onto a capable LLM to pre-filter information. This inverts the decision hierarchy — the least capable model makes the most important decision (what context to retrieve).
Inspired by the Napkin memory system, Prime uses progressive disclosure — the LLM itself navigates a structured knowledge base using its full reasoning capability.
The Four Levels
Level 0: Pinned Context (always loaded, <500 tokens)
Loaded on every wake cycle. Contains only what the agent MUST know:
For OrgPrime:
# Org Context
- 40 repos across garywu org
- 3-tier execution: CF (free) → CI (deterministic) → Local (AI)
- Budget: $X/day across all repos
- Current focus: [from last human conversation]
- Blocked: [repos waiting on human input]
For RepoPrime:
# [repo-name] Context
- Stack: TypeScript, Cloudflare Workers, D1
- Priority: P1 (revenue-generating)
- Current state: CI passing, 3 open automated issues
- Last human direction: [from OrgPrime delegation]
This is the equivalent of CLAUDE.md but distilled to what matters RIGHT NOW.
Level 1: Keyword Map (loaded on wake, ~200 tokens)
A TF-IDF weighted taxonomy of the agent’s memory, generated from DO SQLite:
decisions/
keywords: onboarding, conversion, brand, positioning
notes: 12
attempts/
keywords: fix-ci, biome, commitlint, tsc-errors
notes: 45
patterns/
keywords: rate-limit, timeout, dependency, cascade
notes: 8
skills/
keywords: add-biome, fix-commitlint, fix-husky, diagnose-ci
notes: 10
The LLM reads this map and decides which folders to search. No embedding model involved — the best model makes the retrieval decision.
TF-IDF weighting:
- Headings: 3x weight
- Note titles: 2x weight
- Body text: 1x weight
- Terms appearing across all folders: suppressed (not distinctive)
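The weighting rules above could be implemented roughly as follows. This is a sketch under assumptions: headings are detected as lines starting with `#`, folders play the role of documents for the IDF term, and the `Note` shape and `buildKeywordMap` name are illustrative rather than taken from the implementation.

```typescript
interface Note {
  folder: string;
  title: string;
  content: string;
}

const tokenize = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9][a-z0-9-]*/g) ?? [];

function buildKeywordMap(notes: Note[], topN = 4): Map<string, string[]> {
  // folder -> term -> weighted term frequency
  const tf = new Map<string, Map<string, number>>();
  const add = (folder: string, text: string, weight: number): void => {
    const terms = tf.get(folder) ?? new Map<string, number>();
    for (const t of tokenize(text)) terms.set(t, (terms.get(t) ?? 0) + weight);
    tf.set(folder, terms);
  };
  for (const n of notes) {
    add(n.folder, n.title, 2); // note titles: 2x
    for (const line of n.content.split("\n")) {
      add(n.folder, line, line.startsWith("#") ? 3 : 1); // headings 3x, body 1x
    }
  }
  const folderCount = tf.size;
  const result = new Map<string, string[]>();
  for (const [folder, terms] of tf) {
    const scored: Array<[string, number]> = [];
    for (const [term, freq] of terms) {
      let df = 0;
      for (const other of tf.values()) if (other.has(term)) df++;
      // idf = ln(N / df) is zero when a term occurs in every folder,
      // which suppresses non-distinctive terms.
      const score = freq * Math.log(folderCount / df);
      if (score > 0) scored.push([term, score]);
    }
    scored.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
    result.set(folder, scored.slice(0, topN).map(([term]) => term));
  }
  return result;
}
```

One consequence of this particular IDF choice: with a single folder every term is suppressed, so a real implementation would need a fallback for brand-new agents.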
Level 2: Search (on-demand, BM25)
When the LLM needs more context, it searches memory:
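-- BM25 search over memory notes.
-- recency_weight and backlink_score are placeholders for per-note factors
-- computed elsewhere. FTS5 rank is negative (more negative = better match),
-- so ascending order puts the best match first.
SELECT memory.title,
       snippet(memory_fts, 1, '', '', '...', 16) AS snippet,
       memory.folder, memory.updated_at
FROM memory_fts
JOIN memory ON memory.rowid = memory_fts.rowid
WHERE memory_fts MATCH ?
ORDER BY memory_fts.rank * recency_weight * backlink_score
LIMIT 10;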
Key design decisions (from Napkin):
- Return matching lines only — no surrounding context (saves tokens)
- Hide numeric scores — prevents the LLM from anchoring on numbers instead of reasoning about semantic fit
- Recency weighting — newer notes rank higher (temporal decay without explicit pruning)
- Backlink scoring — notes referenced by other notes rank higher (markdown PageRank)
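As an illustration of the backlink factor, the sketch below counts inbound links from the `backlinks` column (which stores the note IDs each note links to) and turns the count into a dampened boost. It is a plain in-degree count, a simplification of PageRank, and the logarithmic dampening is an assumption rather than something the article or Napkin specifies.

```typescript
// Row subset from the memory table: `backlinks` is a JSON array of
// note IDs that this note links to.
interface NoteLinks {
  id: string;
  backlinks: string;
}

// Count how many distinct other notes link to each note.
function inboundLinkCounts(notes: NoteLinks[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const n of notes) counts.set(n.id, 0);
  for (const n of notes) {
    const targets = new Set<string>(JSON.parse(n.backlinks) as string[]);
    for (const t of targets) {
      // Ignore self-links and links to notes that no longer exist.
      if (t !== n.id && counts.has(t)) counts.set(t, (counts.get(t) ?? 0) + 1);
    }
  }
  return counts;
}

// Assumed dampening: 1 for unreferenced notes, growing logarithmically.
const backlinkScore = (inbound: number): number => 1 + Math.log(1 + inbound);
```

Because FTS5 ranks are negative, multiplying a rank by a score of 1 or more moves well-linked notes toward the front of an ascending sort.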
Level 3: Full Read (on-demand)
Complete note content, only when Level 2 points to something relevant:
SELECT content FROM memory WHERE id = ?;
This is the equivalent of reading a full GitHub issue or a complete SKILL.md file. The LLM navigates here deliberately, not by accident.
Why Not Vector Search?
At our scale (40 repos, ~50 memory notes per repo = ~2,000 total notes), vector search adds complexity without benefit:
| Dimension | Vector Search | Progressive Disclosure |
|---|---|---|
| Retrieval intelligence | Embedding model (smaller, dumber) | The LLM itself (best model available) |
| Infrastructure | Embedding pipeline + vector DB | DO SQLite FTS5 (built-in, free) |
| Debugging | Opaque cosine similarities | Readable keyword maps + BM25 |
| Update cost | Re-embed on every change | No pipeline; FTS5 index updates in the same write (via triggers for external-content tables) |
| Cold start | Needs embeddings computed | Works immediately from text |
For corpora of 100K+ documents, vector search is necessary. For organizational memory at our scale, it’s overhead that produces worse results.
DO SQLite Schema
Each agent (OrgPrime and every RepoPrime) has its own SQLite database:
-- Memory notes (the knowledge base)
CREATE TABLE memory (
id TEXT PRIMARY KEY,
folder TEXT NOT NULL, -- 'decisions', 'attempts', 'patterns', 'skills'
title TEXT NOT NULL,
content TEXT NOT NULL,
backlinks TEXT DEFAULT '[]', -- JSON array of note IDs this links to
source TEXT, -- 'distillation' | 'human' | 'job-outcome'
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
-- FTS5 index for BM25 search
CREATE VIRTUAL TABLE memory_fts USING fts5(
title, content, folder,
content=memory, content_rowid=rowid
);
-- Keyword map cache (regenerated on write)
CREATE TABLE keyword_map (
folder TEXT PRIMARY KEY,
keywords TEXT NOT NULL, -- JSON array of {term, weight} sorted by weight
note_count INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
-- Conversation history (OrgPrime only)
CREATE TABLE conversations (
id TEXT PRIMARY KEY,
role TEXT NOT NULL,
content TEXT NOT NULL,
related_repos TEXT,
related_issues TEXT,
created_at INTEGER NOT NULL
);
-- Working state
CREATE TABLE working (
key TEXT PRIMARY KEY,
value TEXT NOT NULL,
updated_at INTEGER NOT NULL
);
-- Decision log (audit trail)
CREATE TABLE decisions (
id TEXT PRIMARY KEY,
reasoning TEXT NOT NULL,
actions TEXT NOT NULL, -- JSON array
wake_reason TEXT NOT NULL,
decided_at INTEGER NOT NULL
);
The Wake Cycle
When an agent wakes (from alarm, webhook, conversation, or delegation):
1. Load Pinned Context (Level 0)
→ Read from working memory: current plan, in-flight jobs, last decision
→ For RepoPrime: read CLAUDE.md (cached in SQLite, refresh if >24h)
2. Load Keyword Map (Level 1)
→ Generated from memory table, cached in keyword_map table
→ ~200 tokens of navigational context
3. Understand the Trigger
→ What happened? (alarm, webhook event, human message, delegation)
→ Search memory (Level 2) for relevant history
→ Read specific notes (Level 3) if needed
4. Reason (ONE LLM call)
→ Input: pinned context + keyword map + trigger + relevant memory
→ Output: structured decision (actions + reasoning + next wake time)
→ Constraint: max 5 actions per wake cycle
5. Act
→ Submit jobs to dispatcher
→ Create/comment on GitHub issues
→ Delegate to RepoPrimes (OrgPrime only)
→ Flag OrgPrime (RepoPrime only)
→ Update working memory
6. Distill
→ Extract knowledge from this cycle into memory notes
→ Update keyword map
7. Schedule Next Wake
→ Work pending: 1h alarm
→ Nothing pending: 6h alarm
→ Blocked on human: 24h alarm
→ Just had conversation: 10min alarm (responsiveness)
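Step 7 reduces to a small lookup. The sketch below encodes the four intervals listed; the `WakeState` names are illustrative, and how an agent picks a state when several apply (for example, work pending while blocked on a human) is not specified in the article and is left to the caller.

```typescript
// Hypothetical state names for the four cases in step 7.
type WakeState = "conversation" | "work_pending" | "idle" | "blocked_on_human";

const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

// Delay until the next alarm, in milliseconds.
function nextWakeDelayMs(state: WakeState): number {
  switch (state) {
    case "conversation":
      return 10 * MINUTE_MS; // stay responsive after a human message
    case "work_pending":
      return 1 * HOUR_MS;
    case "idle":
      return 6 * HOUR_MS;
    case "blocked_on_human":
      return 24 * HOUR_MS;
  }
}
```

The returned delay would then be handed to whatever alarm mechanism the agent runtime provides.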
The Single LLM Call Principle
Each wake cycle makes exactly ONE LLM call for reasoning. Not per event, not per repo, not per job. One call with full context, producing a structured decision.
This is critical for cost control and coherence:
- Cost: At 40 repos with 6h default alarms, that’s ~160 LLM calls/day across all agents. At Haiku pricing, that’s cents per day.
- Coherence: One call sees the full picture. Multiple calls per wake risk contradictory decisions.
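The call-count figure follows directly from the alarm interval. A small sketch of the arithmetic, with a hypothetical function name, makes it easy to see how the number moves when the interval changes:

```typescript
// Scheduled reasoning calls per day, assuming one LLM call per wake
// and that every agent wakes only on its alarm (no webhook wakes).
function reasoningCallsPerDay(agents: number, wakeIntervalHours: number): number {
  if (wakeIntervalHours <= 0) throw new RangeError("wakeIntervalHours must be positive");
  return agents * (24 / wakeIntervalHours);
}

const repoAgentsOnly = reasoningCallsPerDay(40, 6); // 40 * 4 = 160
const allWorkPending = reasoningCallsPerDay(40, 1); // 40 * 24 = 960
```

This is a floor rather than a ceiling: webhook, delegation, and conversation wakes add calls on top of the scheduled ones, and the 1h interval for repos with pending work raises the total sixfold for those repos.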
The model selection follows the Wallet layer pattern (API Mom):
- OrgPrime: Sonnet (business reasoning requires strong capability)
- RepoPrime: Haiku for routine wakes, Sonnet for complex situations
- Workers AI (free) for simple signal evaluation
Skill System: SKILL.md
Adopted from OpenClaw’s proven pattern: skills are markdown files, not code.
# skills/fix-commitlint/SKILL.md
---
name: fix-commitlint
description: Add conventional commit linting to a repo
runner: ci
signals: [commitlint]
requires:
files: [package.json]
confidence: 0.95
last_success: 2026-03-20
success_rate: 47/50
---
## Instructions
1. Install @commitlint/cli and @commitlint/config-conventional
2. Create commitlint.config.cjs extending config-conventional
3. Add commitlint to husky commit-msg hook
4. Verify: echo "fix: test" | npx commitlint
## Error Patterns
- If husky is not installed, run fix-husky skill first
- If package.json has no "prepare" script, add "prepare": "husky"
- If commitlint.config.cjs conflicts with existing config, check
for .commitlintrc.json and remove it
## Learned From
- garywu/frontasy#12 (2026-03-15): initial implementation
- garywu/niche-fi#8 (2026-03-18): husky dependency discovered
- garywu/svg-generators#30 (2026-03-21): config format conflict
Skill Lifecycle
1. Manual Creation
Human or agent writes SKILL.md for a known procedure
2. Selective Injection
RepoPrime's Level 1 keyword map includes skill names
LLM reads relevant skills before reasoning about a job
3. Crystallization (Auto-Learning)
Job succeeds → distillation extracts the procedure
→ Creates or updates SKILL.md with new error patterns
→ Updates success_rate and last_success
4. Skill Inheritance
Universal skills (add-biome) → apply to all repos
Vertical skills (fix-cloudflare-worker) → apply to CF Worker repos
Repo-specific skills → apply to one repo only
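The bookkeeping in step 3 (updating `success_rate` and `last_success`) might look like the sketch below. It assumes `success_rate` keeps the `successes/attempts` form shown in the frontmatter example; the type and function names are hypothetical.

```typescript
interface SkillStats {
  successes: number;
  attempts: number;
  lastSuccess: string | null; // ISO date, e.g. "2026-03-20"
}

// Parse the "47/50" form used in the SKILL.md frontmatter.
function parseSuccessRate(raw: string): { successes: number; attempts: number } {
  const m = /^(\d+)\/(\d+)$/.exec(raw.trim());
  if (!m) throw new Error(`Unrecognized success_rate: ${raw}`);
  return { successes: Number(m[1]), attempts: Number(m[2]) };
}

// Return updated stats after one job outcome; the input is not mutated.
function recordOutcome(stats: SkillStats, succeeded: boolean, isoDate: string): SkillStats {
  return {
    successes: stats.successes + (succeeded ? 1 : 0),
    attempts: stats.attempts + 1,
    lastSuccess: succeeded ? isoDate : stats.lastSuccess,
  };
}

const formatSuccessRate = (s: SkillStats): string => `${s.successes}/${s.attempts}`;
```

Failures still increment the attempt count, so a skill that stops working shows a falling rate even though it is only rewritten on success.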
Skill Storage
Skills live in two places:
- Org-level skills: In OrgPrime’s DO SQLite (shared across repos)
- Repo-level skills: In each RepoPrime’s DO SQLite (repo-specific)
RepoPrime inherits org-level skills and can override them with repo-specific versions.
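A sketch of that override rule, assuming skills are matched by name (the article does not say what identifies "the same" skill at both levels) and using a hypothetical `Skill` shape:

```typescript
interface Skill {
  name: string;
  scope: "org" | "repo";
  body: string; // SKILL.md content
}

// Merge org-level and repo-level skills; on a name clash the repo version wins.
function resolveSkills(orgSkills: Skill[], repoSkills: Skill[]): Skill[] {
  const byName = new Map<string, Skill>();
  for (const s of orgSkills) byName.set(s.name, s);
  for (const s of repoSkills) byName.set(s.name, s); // overrides org entry
  return [...byName.values()].sort((a, b) => a.name.localeCompare(b.name));
}
```

Whole-skill replacement keeps the rule easy to reason about; merging individual sections of two SKILL.md files would be a different and more fragile design.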
Auto-Learning: Knowledge Distillation
Inspired by Agent Zero’s auto-learning pattern, adapted for structured markdown and DO SQLite.
When Distillation Happens
| Trigger | What Gets Distilled |
|---|---|
| Job completed (success) | Procedure → skill, outcome → attempt note |
| Job completed (failure) | Error pattern → skill update, blocker → pattern note |
| Human conversation | Business context → decision note, priority change → working memory |
| Signal change detected | State transition → pattern note |
| RepoPrime escalation | Cross-repo pattern → OrgPrime pattern note |
The Distillation Prompt
You are a knowledge distiller for the {agent_name} agent.
Given this event:
{event_type}: {event_summary}
{event_details}
And the current memory structure:
{keyword_map}
Extract knowledge into one of these categories:
- decisions/ — why something was decided, with context
- attempts/ — what was tried, outcome, what was learned
- patterns/ — recurring patterns (errors, dependencies, signals)
- skills/ — repeatable procedures (SKILL.md format)
Rules:
- Link to existing notes using [[note-title]] when relevant
- Use YAML frontmatter with: title, folder, source, related_repos
- If updating an existing note, return the note ID and the update
- Be concise — memory notes should be <200 words
- Include ONLY information not obvious from the code itself
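Filling the `{placeholder}` slots in a prompt like this one is a one-function job. The helper below is a hypothetical sketch; it throws on a missing value so that a prompt with an unfilled slot never reaches the model.

```typescript
// Replace {name} placeholders with values; fail loudly on any gap.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string): string => {
    const value = Object.prototype.hasOwnProperty.call(values, key)
      ? values[key]
      : undefined;
    if (value === undefined) throw new Error(`Missing template value: ${key}`);
    return value;
  });
}

// Example with two of the placeholders from the prompt; the values are made up.
const filledLine = fillTemplate("{event_type}: {event_summary}", {
  event_type: "job_failed",
  event_summary: "commitlint config conflict",
});
```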
Temporal Decay
Memory notes are not deleted. They decay naturally through BM25 recency weighting:
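-- Recency weight: notes updated recently rank higher
-- The decaying part has a 30-day half-life and the weight is floored at 0.5,
-- so a 30-day-old note gets 0.75 of a fresh note's weight
-- FTS5 rank is negative (more negative = better match), so sort ascending
-- Assumes updated_at is stored in Unix seconds
SELECT memory.id, memory.title, memory.folder,
  memory_fts.rank * (0.5 + 0.5 * EXP(-0.693 * (unixepoch('now') - memory.updated_at) / 2592000.0))
  AS weighted_rank
FROM memory_fts
JOIN memory ON memory.rowid = memory_fts.rowid
WHERE memory_fts MATCH ?
ORDER BY weighted_rank;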
This means:
- Recent knowledge surfaces first
- Old knowledge is still searchable but deprioritized
- No pruning logic needed — the ranking handles it
- A note that gets updated (referenced, linked) resets its decay
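Evaluating the weighting expression outside SQL shows its shape. The sketch below reproduces the same formula in TypeScript, assuming timestamps in Unix seconds: the decaying part halves every 30 days, and the 0.5 floor means old notes never fall below half the weight of a fresh one.

```typescript
const HALF_LIFE_SECONDS = 30 * 24 * 60 * 60; // 2,592,000

// Same expression as the SQL: 0.5 + 0.5 * exp(-ln2 * age / halfLife).
function recencyWeight(updatedAtSec: number, nowSec: number): number {
  const age = Math.max(0, nowSec - updatedAtSec); // clamp future timestamps
  return 0.5 + 0.5 * Math.exp((-Math.LN2 * age) / HALF_LIFE_SECONDS);
}

const fresh = recencyWeight(0, 0); // 1
const oneHalfLife = recencyWeight(0, HALF_LIFE_SECONDS); // 0.75
const twoHalfLives = recencyWeight(0, 2 * HALF_LIFE_SECONDS); // 0.625
```

The floor is what makes "no pruning" safe: a strongly matching old note can still outrank a weakly matching new one.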
The Run Sheet: Control Plane to Data Plane
The run sheet is OrgPrime’s output — a prioritized list of work for the dispatcher.
interface RunSheetItem {
rank: number
repo: string
signal: string
jobType: string
runner: 'cf' | 'ci' | 'local'
reason: string // why this matters (business context)
approach: string // how to do it (technical guidance)
issueNumber?: number // tracking issue
cooldownHours: number // don't retry before this
blockedBy?: string[] // other run sheet items that must complete first
businessGoal?: string // which human conversation spawned this
}
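The `blockedBy` field implies a readiness check before dispatch. A minimal sketch follows, assuming each run sheet item carries a stable identifier that `blockedBy` entries refer to; the `id` field is a hypothetical addition, since the interface above does not define one.

```typescript
// Subset of RunSheetItem plus a hypothetical `id` for dependency lookups.
interface SheetItem {
  id: string;
  rank: number;
  blockedBy?: string[];
}

// Items that are not yet done and whose dependencies have all completed,
// in rank order.
function readyItems(items: SheetItem[], completed: Set<string>): SheetItem[] {
  return items
    .filter((it) => !completed.has(it.id))
    .filter((it) => (it.blockedBy ?? []).every((dep) => completed.has(dep)))
    .sort((a, b) => a.rank - b.rank);
}
```

In the Income Coach example, the frontasy item would stay out of the ready list until the brand-systems item completes.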
Run Sheet vs Current Policy Rules
| Current (POLICY_RULES) | Prime (Run Sheet) |
|---|---|
| Static if/else in code | Dynamic, regenerated each OrgPrime wake |
| Signal → job type mapping | Business intent → prioritized work list |
| Same rules for all repos | Per-repo reasoning by RepoPrime |
| No business context | Links work to business goals |
| No dependency tracking | Explicit blockedBy relationships |
| No cooldown intelligence | Cooldown based on attempt history |
The Dispatcher: Dumb Resource Manager
The dispatcher’s role shrinks significantly in the Prime architecture. It becomes a pure execution engine:
What Stays
- Scan repo signals via GitHub API (proxied through API Mom)
- Drain pending jobs to runners (CF, CI, Local)
- Record attempt outcomes in D1
- Batch-commit README dashboard
- Dispatch via repository_dispatch to CI runner
- Report job outcomes to RepoPrime DOs
What Moves to Prime
- Policy evaluation → OrgPrime produces run sheet
- Job creation → RepoPrimes submit jobs
- Priority logic → OrgPrime reasons about priorities
- Failure analysis → RepoPrimes reason about failures
What’s New
- Read run sheet from OrgPrime (replaces policy evaluation)
- Notify RepoPrime on job completion (via DO RPC)
- Respect blockedBy dependencies in run sheet
Competitive Position
vs OpenClaw (302K stars)
| Dimension | OpenClaw | Prime |
|---|---|---|
| Scope | Single agent, single repo | Hierarchical: org + N repos |
| Persistence | Local process, dies when stopped | DO hibernation, zero cost when idle |
| Memory | File-based + SQLite + vector search | DO SQLite + progressive disclosure (no embeddings) |
| Skills | 5,400+ in registry (SKILL.md) | Adopts SKILL.md pattern, starts with ~10 |
| Conversation | 20+ platform adapters | WebSocket + Telegram (extensible) |
| Multi-repo | No | Native — OrgPrime coordinates 40 repos |
| Cost control | None | API Mom intelligent router + daily budgets |
| Business decomposition | No — human must specify repo and task | OrgPrime decomposes business intent |
| Auto-learning | No (manual MEMORY.md) | Agent Zero-style distillation |
| Always-on | Requires running machine | Cloudflare edge, $0 when idle |
Prime’s advantage: Hierarchical decomposition + always-on + cost control. OpenClaw’s advantage: Ecosystem size + platform coverage.
vs Agent Zero (12K stars)
| Dimension | Agent Zero | Prime |
|---|---|---|
| Auto-learning | FAISS embeddings, auto-extract | BM25 + progressive disclosure, auto-distill |
| Hierarchy | Superior/subordinate chain | OrgPrime/RepoPrime with DO persistence |
| Persistence | FAISS files on disk | DO SQLite (survives indefinitely) |
| Execution | Docker containers | 3-tier (CF/CI/Local) |
Prime’s advantage: Persistent state without infrastructure, edge deployment. Agent Zero’s advantage: More mature auto-learning implementation.
vs LangGraph (10K stars)
| Dimension | LangGraph | Prime |
|---|---|---|
| State | PostgresSaver checkpoints | DO SQLite (no external DB needed) |
| Graph | Arbitrary node graphs | Two-level hierarchy (simpler, sufficient) |
| Human-in-loop | Interrupt nodes | WebSocket conversation |
| Persistence | Requires PostgreSQL | Built into DO (zero config) |
Prime’s advantage: No infrastructure requirements, always-on without a server. LangGraph’s advantage: More flexible graph patterns, larger community.
The Unique Position
No framework in the survey combines:
- Always-on persistence (DO hibernation)
- Hierarchical agent coordination (org → repo)
- Business-to-code decomposition (conversation → issues → jobs)
- Cost-controlled execution (API Mom routing)
- Auto-learning memory (distillation into DO SQLite)
This is not a general-purpose agent framework. It is an organizational control plane — purpose-built for the specific problem of managing a portfolio of software repos from business-level conversations.
Implementation: Current State and Migration
What’s Built (garywu/mulan)
| Component | Status | Notes |
|---|---|---|
| Dispatcher CF Worker | Live | Scans, dispatches, commits README |
| BrainDO (→ OrgPrime) | Partial | Alarm, run sheet, flag escalation. No LLM, no WebSocket. |
| RepoPrimeDO | Partial | Boot/wake/plan-result routes. No Agents SDK, no SQLite, no LLM. |
| D1 schema | Live | jobs, repos, signal_history, runners, pr_outcomes |
| CF Runner | Live | Inline job execution in worker |
| CI Runner | Live | GitHub Actions deterministic handlers |
| Local Runner | Partial | Claude Agent SDK, claims jobs |
| GitHub API proxy | Live | Centralized via github-fetch.ts + API Mom |
| Telegram bot | Live | @brewdbot, send-only |
| Signal scanning | Live | 12 checks per repo, smart scan optimization |
| Batch commit | Live | Git Trees API, single commit per tick |
| PAUSED kill switch | Live | Emergency brake for dispatcher |
Migration Path
Phase 1: Foundation (fix current implementation)
- Migrate BrainDO and RepoPrimeDO to Cloudflare Agents SDK (extends Agent)
- Add DO SQLite memory tables (replace KV storage)
- Wire Agents SDK alarm scheduling (replace manual alarm)
- Move policy evaluation out of dispatcher into OrgPrime
Phase 2: Memory (Napkin-inspired)
- Implement 4-level progressive disclosure in DO SQLite
- Add FTS5 index for BM25 search
- Build keyword map generator
- Implement temporal decay weighting
- Add distillation on job completion
Phase 3: Conversation
- Add WebSocket to OrgPrime via Agents SDK
- Implement conversation history in DO SQLite
- Build business intent → repo task decomposition (LLM call)
- Wire Telegram adapter to OrgPrime WebSocket
- Add one LLM call per wake cycle to RepoPrime
Phase 4: Skills and Learning
- Implement SKILL.md format in DO SQLite
- Build skill injection into RepoPrime’s LLM context
- Add crystallization: successful job → skill extraction
- Implement skill inheritance (org → repo)
Phase 5: Run Sheet
- OrgPrime generates run sheet from LLM reasoning
- Dispatcher reads run sheet (replaces POLICY_RULES)
- Add blockedBy dependency tracking
- Link run sheet items to business goals from conversations
Schema and Data Model
D1 (Shared, Org-Wide) — Unchanged
-- Signal states for all repos (dispatcher writes, Primes read)
CREATE TABLE repos (remote TEXT PRIMARY KEY, state TEXT NOT NULL);
-- Job queue (RepoPrimes submit, dispatcher executes)
CREATE TABLE jobs (
id TEXT PRIMARY KEY, type TEXT, repo TEXT, status TEXT,
needs TEXT, payload TEXT, result TEXT,
priority INTEGER, created_at INTEGER, updated_at INTEGER,
completed_at INTEGER, claimed_by TEXT
);
-- Signal transition history (dispatcher writes, Primes read)
CREATE TABLE signal_history (
id TEXT PRIMARY KEY, repo TEXT, signal TEXT,
prev TEXT, curr TEXT, changed_at INTEGER
);
-- PR outcomes (dispatcher writes, Primes read for regression detection)
CREATE TABLE pr_outcomes (
pr_url TEXT PRIMARY KEY, job_type TEXT, repo TEXT,
baseline_signals TEXT, outcome TEXT, merged_at INTEGER
);
-- Run sheet (OrgPrime writes, dispatcher reads)
CREATE TABLE run_sheet (
rank INTEGER NOT NULL, repo TEXT NOT NULL,
signal TEXT NOT NULL, job_type TEXT NOT NULL,
runner TEXT NOT NULL, reason TEXT NOT NULL,
approach TEXT NOT NULL, issue_number INTEGER,
cooldown_hours INTEGER DEFAULT 24,
blocked_by TEXT, business_goal TEXT,
updated_at INTEGER NOT NULL
);
DO SQLite (Private, Per-Agent) — New
See the Memory section for the complete schema.
References
This Project
- garywu/mulan — Implementation repository
- mulan#131 — RFC: Prime as Durable Object
- mulan#123 — Epic: Autonomous Org Maintenance
- atlas#404 — Brain as org control plane DO
Architecture Articles
- Autonomous Agent Frameworks Compared — 18-framework survey
- The Autonomous Entity Pattern — The meta-framework
- Prime: Persistent Org-Level AI Agents on Cloudflare — Prior article (superseded by this one)
- The Three-Layer AI Agent Architecture — Container / Brain / Wallet
- Never Fail Twice: Escalation Ladder — Skill crystallization
External
- Napkin Memory System — Progressive disclosure memory for agents
- Cloudflare Agents SDK — DO-based agent runtime
- OpenClaw — 302K-star agent framework (benchmark)
- Agent Zero — Auto-learning hierarchical agents