Gary Wu

Prime: A Conversational Control Plane


The human talks to the business. The business talks to the code. Every conversation makes the system smarter.

Most autonomous agent frameworks solve the wrong problem. They make a single agent that can code faster. The actual problem is different: a human runs a business with 40 repos, dozens of brands, continuous drift, and no time to start individual coding sessions for every repo that needs work. The human needs to talk at the business level — “Income Coach isn’t converting” — and have that translate into concrete code changes across multiple repos, automatically, with memory of what was tried and why.

This article describes Prime — a hierarchical control plane where persistent AI agents manage an organization autonomously. The human talks to the org-level agent about business priorities. The org agent decomposes intent into repo-level tasks. Per-repo agents execute, learn, and report back. The system gets smarter with every cycle.

Built on Cloudflare Agents SDK, Durable Objects, and a Napkin-inspired memory system that uses the best model for retrieval — not a separate embedding model.



Why This Exists

The Conversation Gap

Every agent framework in the 18-framework survey solves some version of “human tells agent what to code.” OpenClaw connects to 20+ chat platforms. Claude Agent SDK gives you terminal access. CrewAI orchestrates teams. LangGraph checkpoints multi-step workflows.

None of them solve: human talks about business → system produces code changes across multiple repos.

The gap is not in execution capability. Modern agents can write code, fix CI, add configurations, create PRs. The gap is in decomposition — translating business intent into concrete technical work across a portfolio of repos, with memory of what was tried, awareness of what’s blocked, and intelligence to know when NOT to act.

The Architecture Gap

OpenClaw has 302K stars and 5,400+ skills. It is also a single-agent system that requires a running machine, forgets everything between sessions (unless you manually maintain MEMORY.md), and has no concept of organizational hierarchy.

The Cloudflare Agents SDK provides the missing primitive: Durable Objects as persistent agents. A DO has its own SQLite database, survives indefinitely, hibernates at zero cost, wakes instantly on events, and is always addressable. This is the container that makes always-on agents economically viable.

But the Agents SDK is infrastructure, not architecture. It gives you the building blocks. This article describes what you build with them.

What Changes

| Before (current) | After (Prime) |
| --- | --- |
| Human starts a Claude Code session per repo | Human talks to OrgPrime about business priorities |
| Agent forgets when session ends | Agents have persistent memory across all sessions |
| One repo at a time | 40 repos managed simultaneously |
| Manual task decomposition | Business intent auto-decomposes into repo tasks |
| Dispatcher does everything (scan, evaluate, dispatch) | Control plane thinks, data plane executes |
| No conversation history | Every decision recorded, auditable, learnable |
| System is idle unless human is talking | System is always on, always aware |

The Vision: Business Language In, Code Changes Out

Human: "Income Coach isn't converting. The onboarding flow is
        too long and the value prop isn't clear on the landing page."

OrgPrime reasons:
  - This is a business-level concern spanning multiple repos
  - income-coach repo: onboarding flow is code
  - brand-systems repo: value prop is brand positioning
  - frontasy repo: landing page renders the positioning

OrgPrime decomposes:
  1. garywu/brand-systems → create issue: "sharpen Income Coach
     value prop — current messaging doesn't communicate immediate
     value to first-time visitors"
  2. garywu/income-coach → create issue: "simplify onboarding —
     reduce steps from 5 to 2, defer profile completion to after
     first value delivery"
  3. garywu/frontasy → create issue: "update Income Coach landing
     page copy to match new positioning" (blocked on #1)

Each RepoPrime:
  - Reads its CLAUDE.md for repo context
  - Reads the issue OrgPrime created
  - Reasons about implementation approach
  - Submits jobs to the dispatcher
  - Reports outcomes back to OrgPrime

OrgPrime tracks:
  - All three repos progressing toward the same business goal
  - Dependencies (frontasy blocked on brand-systems)
  - Whether the business intent was actually addressed

This is not a hypothetical. Every piece of this — DO hierarchy, job dispatch, GitHub issues as memory, signal scanning, CI-aware merge — is either built or partially built in garywu/mulan. What’s missing is the intelligence layer: the LLM reasoning that converts business language into technical decomposition.


Architecture Overview

Human (WebSocket / Telegram / CLI)


OrgPrime DO (one per GitHub org)
  │  ├── Persistent SQLite memory
  │  ├── Conversation history
  │  ├── Business context + priorities
  │  ├── Cross-repo awareness
  │  └── Decomposes intent → repo tasks

  ├── RepoPrime DO × N (one per repo)
  │     ├── Persistent SQLite memory
  │     ├── CLAUDE.md identity
  │     ├── Signal awareness (CI, standards, PRs)
  │     ├── Attempt history
  │     ├── Skill registry
  │     └── Reasons about repo-specific work


Dispatcher CF Worker (one per org)
  │  ├── Reads run sheet from OrgPrime
  │  ├── Manages runner capacity
  │  ├── Deduplicates jobs
  │  ├── Records every attempt
  │  └── Reports outcomes to RepoPrimes

  ├── CF Runner (inline, free)
  ├── CI Runner (GitHub Actions, deterministic)
  └── Local Runner (Claude Agent SDK, AI-driven)

The Three Separations

1. Control Plane vs Data Plane

The control plane (OrgPrime + RepoPrimes) thinks. The data plane (Dispatcher + Runners) executes. Neither does both. This separation means:

2. Business Level vs Code Level

OrgPrime speaks business language. RepoPrimes speak code language. The translation happens at the OrgPrime → RepoPrime boundary via GitHub issues. Issues are the API between business intent and technical execution.

3. Memory vs Execution

Memory lives in DO SQLite (private to each agent) and GitHub issues (shared, auditable). Execution lives in the dispatcher and runners. Memory survives indefinitely. Execution is ephemeral — jobs start, run, complete, and their outcomes feed back into memory.


The Agent Hierarchy

OrgPrime DO — The Business Agent

One per GitHub organization. The human’s primary interface.

Knows:

Does:

Does NOT:

Wake triggers:

RepoPrime DO — The Repo Agent

One per managed repository. Autonomous within its scope.

Knows:

Does:

Does NOT:

Wake triggers:

Dispatcher — The Resource Manager

One per organization. Pure mechanics, zero intelligence.

Does:

Does NOT:


The Conversation Interface

WebSocket via Agents SDK

The Cloudflare Agents SDK provides native WebSocket support with hibernation. OrgPrime maintains a persistent WebSocket connection to the human’s client:

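// Assumed imports: the Agents SDK package ("agents") and the Vercel AI SDK ("ai")
import { Agent, type Connection } from "agents"
import { generateObject } from "ai"

export class OrgPrime extends Agent<Env, OrgState> {
  // WebSocket message from human
  async onMessage(connection: Connection, message: string) {
    // 1. Store message in conversation history (DO SQLite)
    //    Columns are named because the table also has related_repos and related_issues
    this.sql`INSERT INTO conversations (id, role, content, created_at) VALUES (
      ${crypto.randomUUID()}, 'human', ${message}, ${Date.now()}
    )`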

    // 2. Build context (progressive disclosure from memory)
    const context = await this.buildContext(message)

    // 3. One LLM call — reason about business intent
    //    generateObject resolves to a result whose `object` field holds the parsed decision
    const { object: decision } = await generateObject({
      model: this.getModel(),
      schema: OrgDecisionSchema,
      system: context.pinnedContext,
      prompt: this.buildPrompt(context, message),
    })

    // 4. Execute decisions (create issues, delegate to RepoPrimes)
    await this.executeDecisions(decision.actions)

    // 5. Respond to human
    connection.send(JSON.stringify({
      type: 'response',
      summary: decision.summary,
      actions: decision.actions.map(a => a.reason),
    }))

    // 6. Distill new knowledge from this conversation turn
    await this.distillConversation(message, decision)
  }

  // WebSocket hibernation — zero cost when human isn't talking
  async onClose(connection: Connection) {
    // Connection state persists in DO SQLite
    // Next connection resumes with full history
  }
}

Multi-Platform Support

OrgPrime’s WebSocket is the canonical interface. Platform adapters translate:

| Platform | Adapter | Status |
| --- | --- | --- |
| CLI (terminal) | Direct WebSocket | Priority 1 |
| Telegram | Bot webhook → OrgPrime HTTP → WebSocket | Built (@brewdbot) |
| Web dashboard | WebSocket from browser | Future |
| Slack | Slack Events API → OrgPrime HTTP | Future |

The adapter is thin — it translates platform message format to OrgPrime’s WebSocket protocol. All intelligence lives in OrgPrime.
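
To make "thin" concrete, here is a minimal sketch of what a Telegram adapter's translation step might look like. The `TelegramUpdate` type covers only the handful of Telegram Bot API fields used here, and `PrimeMessage` is a hypothetical envelope for OrgPrime's WebSocket protocol; the article does not define that protocol, so the field names are assumptions.

```typescript
// Minimal subset of a Telegram Bot API update (only the fields used here).
interface TelegramUpdate {
  update_id: number;
  message?: {
    text?: string;
    chat: { id: number };
  };
}

// Hypothetical envelope for OrgPrime's WebSocket protocol (an assumption,
// not defined in the article).
interface PrimeMessage {
  type: "human_message";
  platform: "telegram" | "cli" | "web" | "slack";
  replyTo: string; // opaque handle the adapter uses to route the response
  text: string;
}

// Translate one Telegram update. Updates without text (stickers, joins,
// edits) return null so the adapter can ignore them.
function fromTelegram(update: TelegramUpdate): PrimeMessage | null {
  const text = update.message?.text?.trim();
  if (!update.message || !text) return null;
  return {
    type: "human_message",
    platform: "telegram",
    replyTo: String(update.message.chat.id),
    text,
  };
}
```

The adapter carries no reasoning of its own: it normalizes the payload and forwards it, which keeps every platform interchangeable from OrgPrime's point of view.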

Conversation Memory

Every conversation turn is stored in OrgPrime’s DO SQLite:

CREATE TABLE conversations (
  id TEXT PRIMARY KEY,
  role TEXT NOT NULL,        -- 'human' | 'org-prime' | 'system'
  content TEXT NOT NULL,
  related_repos TEXT,        -- JSON array of repos mentioned
  related_issues TEXT,       -- JSON array of issues referenced
  created_at INTEGER NOT NULL
);

This gives OrgPrime:


Memory: Napkin-Inspired Progressive Disclosure

Traditional RAG bolts a smaller, dumber embedding model onto a capable LLM to pre-filter information. This inverts the decision hierarchy — the least capable model makes the most important decision (what context to retrieve).

Inspired by the Napkin memory system, Prime uses progressive disclosure — the LLM itself navigates a structured knowledge base using its full reasoning capability.

The Four Levels

Level 0: Pinned Context (always loaded, <500 tokens)

Loaded on every wake cycle. Contains only what the agent MUST know:

For OrgPrime:

# Org Context
- 40 repos across garywu org
- 3-tier execution: CF (free) → CI (deterministic) → Local (AI)
- Budget: $X/day across all repos
- Current focus: [from last human conversation]
- Blocked: [repos waiting on human input]

For RepoPrime:

# [repo-name] Context
- Stack: TypeScript, Cloudflare Workers, D1
- Priority: P1 (revenue-generating)
- Current state: CI passing, 3 open automated issues
- Last human direction: [from OrgPrime delegation]

This is the equivalent of CLAUDE.md but distilled to what matters RIGHT NOW.

Level 1: Keyword Map (loaded on wake, ~200 tokens)

A TF-IDF weighted taxonomy of the agent’s memory, generated from DO SQLite:

decisions/
  keywords: onboarding, conversion, brand, positioning
  notes: 12
attempts/
  keywords: fix-ci, biome, commitlint, tsc-errors
  notes: 45
patterns/
  keywords: rate-limit, timeout, dependency, cascade
  notes: 8
skills/
  keywords: add-biome, fix-commitlint, fix-husky, diagnose-ci
  notes: 10

The LLM reads this map and decides which folders to search. No embedding model involved — the best model makes the retrieval decision.
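
As a rough illustration of how such a map could be generated, the sketch below treats each folder as one document and scores terms with a simple TF-IDF variant. The tokenizer, the exact IDF formula, and the `topN` cutoff are all assumptions chosen for brevity; the article does not specify them.

```typescript
interface Note { folder: string; content: string }
interface Keyword { term: string; weight: number }

// Lowercase, split on anything that is not a letter, digit, or hyphen,
// and drop very short tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9-]+/).filter((t) => t.length > 2);
}

// Each folder is one "document": tf = term count within the folder,
// idf = ln(1 + folders / foldersContainingTerm). Terms that appear in every
// folder score low; terms specific to one folder score high.
function buildKeywordMap(notes: Note[], topN = 4): Map<string, Keyword[]> {
  const counts = new Map<string, Map<string, number>>(); // folder -> term -> count
  for (const note of notes) {
    const terms = counts.get(note.folder) ?? new Map<string, number>();
    for (const t of tokenize(note.content)) terms.set(t, (terms.get(t) ?? 0) + 1);
    counts.set(note.folder, terms);
  }

  const folderFreq = new Map<string, number>(); // term -> folders containing it
  for (const terms of counts.values()) {
    for (const t of terms.keys()) folderFreq.set(t, (folderFreq.get(t) ?? 0) + 1);
  }

  const result = new Map<string, Keyword[]>();
  for (const [folder, terms] of counts) {
    const scored = [...terms].map(([term, tf]) => ({
      term,
      weight: tf * Math.log(1 + counts.size / (folderFreq.get(term) ?? 1)),
    }));
    // Highest weight first; ties broken alphabetically for stable output.
    scored.sort((a, b) => b.weight - a.weight || a.term.localeCompare(b.term));
    result.set(folder, scored.slice(0, topN));
  }
  return result;
}
```

The output maps directly onto the `keywords` lines shown above: a few high-weight terms per folder, cheap enough to regenerate whenever a note is written.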

TF-IDF weighting:

Level 2: Search (on-demand, BM25)

When the LLM needs more context, it searches memory:

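-- BM25 search over memory notes
-- FTS5's rank is negative (more negative = better match), so ascending order is best-first.
-- recency_weight and backlink_score are placeholders for per-note multipliers;
-- they are not columns in the schema shown in this article.
SELECT m.title,
       snippet(memory_fts, 1, '[', ']', '...', 12) AS snippet,
       m.folder, m.updated_at
FROM memory_fts
JOIN memory AS m ON m.rowid = memory_fts.rowid
WHERE memory_fts MATCH ?
ORDER BY memory_fts.rank * recency_weight * backlink_score
LIMIT 10;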

Key design decisions (from Napkin):

Level 3: Full Read (on-demand)

Complete note content, only when Level 2 points to something relevant:

SELECT content FROM memory WHERE id = ?;

This is the equivalent of reading a full GitHub issue or a complete SKILL.md file. The LLM navigates here deliberately, not by accident.

At our scale (40 repos, ~50 memory notes per repo = ~2,000 total notes), vector search adds complexity without benefit:

| Dimension | Vector Search | Progressive Disclosure |
| --- | --- | --- |
| Retrieval intelligence | Embedding model (smaller, dumber) | The LLM itself (best model available) |
| Infrastructure | Embedding pipeline + vector DB | DO SQLite FTS5 (built-in, free) |
| Debugging | Opaque cosine similarities | Readable keyword maps + BM25 |
| Update cost | Re-embed on every change | No pipeline, FTS5 auto-updates |
| Cold start | Needs embeddings computed | Works immediately from text |

For corpora of 100K+ documents, vector search is necessary. For organizational memory at our scale, it’s overhead that produces worse results.

DO SQLite Schema

Each agent (OrgPrime and every RepoPrime) has its own SQLite database:

-- Memory notes (the knowledge base)
CREATE TABLE memory (
  id TEXT PRIMARY KEY,
  folder TEXT NOT NULL,      -- 'decisions', 'attempts', 'patterns', 'skills'
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  backlinks TEXT DEFAULT '[]',  -- JSON array of note IDs this links to
  source TEXT,               -- 'distillation' | 'human' | 'job-outcome'
  created_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL
);

-- FTS5 index for BM25 search
CREATE VIRTUAL TABLE memory_fts USING fts5(
  title, content, folder,
  content=memory, content_rowid=rowid
);

-- Keyword map cache (regenerated on write)
CREATE TABLE keyword_map (
  folder TEXT PRIMARY KEY,
  keywords TEXT NOT NULL,    -- JSON array of {term, weight} sorted by weight
  note_count INTEGER NOT NULL,
  updated_at INTEGER NOT NULL
);

-- Conversation history (OrgPrime only)
CREATE TABLE conversations (
  id TEXT PRIMARY KEY,
  role TEXT NOT NULL,
  content TEXT NOT NULL,
  related_repos TEXT,
  related_issues TEXT,
  created_at INTEGER NOT NULL
);

-- Working state
CREATE TABLE working (
  key TEXT PRIMARY KEY,
  value TEXT NOT NULL,
  updated_at INTEGER NOT NULL
);

-- Decision log (audit trail)
CREATE TABLE decisions (
  id TEXT PRIMARY KEY,
  reasoning TEXT NOT NULL,
  actions TEXT NOT NULL,      -- JSON array
  wake_reason TEXT NOT NULL,
  decided_at INTEGER NOT NULL
);

The Wake Cycle

When an agent wakes (from alarm, webhook, conversation, or delegation):

1. Load Pinned Context (Level 0)
   → Read from working memory: current plan, in-flight jobs, last decision
   → For RepoPrime: read CLAUDE.md (cached in SQLite, refresh if >24h)

2. Load Keyword Map (Level 1)
   → Generated from memory table, cached in keyword_map table
   → ~200 tokens of navigational context

3. Understand the Trigger
   → What happened? (alarm, webhook event, human message, delegation)
   → Search memory (Level 2) for relevant history
   → Read specific notes (Level 3) if needed

4. Reason (ONE LLM call)
   → Input: pinned context + keyword map + trigger + relevant memory
   → Output: structured decision (actions + reasoning + next wake time)
   → Constraint: max 5 actions per wake cycle

5. Act
   → Submit jobs to dispatcher
   → Create/comment on GitHub issues
   → Delegate to RepoPrimes (OrgPrime only)
   → Flag OrgPrime (RepoPrime only)
   → Update working memory

6. Distill
   → Extract knowledge from this cycle into memory notes
   → Update keyword map

7. Schedule Next Wake
   → Work pending: 1h alarm
   → Nothing pending: 6h alarm
   → Blocked on human: 24h alarm
   → Just had conversation: 10min alarm (responsiveness)
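
Step 7 can be expressed as a small pure function. In this sketch the four delays come from the schedule above, while the `WakeState` names and the precedence used when several conditions hold at once (shortest delay wins) are assumptions.

```typescript
type WakeState = "just_conversed" | "work_pending" | "idle" | "blocked_on_human";

const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;

// Delays taken from the wake-cycle schedule.
function nextWakeDelayMs(state: WakeState): number {
  switch (state) {
    case "just_conversed":
      return 10 * MINUTE_MS;
    case "work_pending":
      return 1 * HOUR_MS;
    case "idle":
      return 6 * HOUR_MS;
    case "blocked_on_human":
      return 24 * HOUR_MS;
  }
}

// Assumed precedence: when several conditions hold, the shortest delay wins.
function classifyWake(flags: {
  justConversed: boolean;
  workPending: boolean;
  blockedOnHuman: boolean;
}): WakeState {
  if (flags.justConversed) return "just_conversed";
  if (flags.workPending) return "work_pending";
  if (flags.blockedOnHuman) return "blocked_on_human";
  return "idle";
}
```

Keeping the policy in one function makes it easy to log why an agent chose a given alarm, and to tune the intervals without touching the wake loop.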

The Single LLM Call Principle

Each wake cycle makes exactly ONE LLM call for reasoning. Not per event, not per repo, not per job. One call with full context, producing a structured decision.
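
A structured decision is only useful if its constraints are enforced after the call returns. The sketch below shows one way to apply the five-action cap from the wake cycle; the `Decision` and `Action` shapes are hypothetical, and it assumes the model lists actions in priority order so that truncation drops the least important ones.

```typescript
// Hypothetical shapes for the structured output of the single LLM call.
interface Action {
  kind: "submit_job" | "create_issue" | "comment" | "delegate" | "flag";
  repo: string;
  reason: string;
}

interface Decision {
  reasoning: string;
  actions: Action[];
  nextWakeMs: number;
}

const MAX_ACTIONS_PER_WAKE = 5;

// Keep the first five actions and return the overflow separately so it can
// be logged or carried into the next wake cycle instead of silently lost.
function enforceActionCap(decision: Decision): { decision: Decision; dropped: Action[] } {
  return {
    decision: { ...decision, actions: decision.actions.slice(0, MAX_ACTIONS_PER_WAKE) },
    dropped: decision.actions.slice(MAX_ACTIONS_PER_WAKE),
  };
}
```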

This is critical for cost control and coherence:

The model selection follows the Wallet layer pattern (API Mom):


Skill System: SKILL.md

Adopted from OpenClaw’s proven pattern: skills are markdown files, not code.

# skills/fix-commitlint/SKILL.md
---
name: fix-commitlint
description: Add conventional commit linting to a repo
runner: ci
signals: [commitlint]
requires:
  files: [package.json]
confidence: 0.95
last_success: 2026-03-20
success_rate: 47/50
---

## Instructions

1. Install @commitlint/cli and @commitlint/config-conventional
2. Create commitlint.config.cjs extending config-conventional
3. Add commitlint to the husky commit-msg hook (commitlint needs the commit message, which pre-commit does not receive)
4. Verify: echo "fix: test" | npx commitlint

## Error Patterns

- If husky is not installed, run fix-husky skill first
- If package.json has no "prepare" script, add "prepare": "husky"
- If commitlint.config.cjs conflicts with existing config, check
  for .commitlintrc.json and remove it

## Learned From

- garywu/frontasy#12 (2026-03-15): initial implementation
- garywu/niche-fi#8 (2026-03-18): husky dependency discovered
- garywu/svg-generators#30 (2026-03-21): config format conflict
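
Because the metadata sits in frontmatter, an agent can read a skill's track record without parsing the instructions. The following is a deliberately small sketch that extracts only top-level `key: value` pairs and converts `success_rate` to a ratio. It is not a YAML parser: nested keys such as `requires.files` are skipped, and the function names are illustrative.

```typescript
// Extract top-level "key: value" pairs from the first frontmatter block.
// Indented lines and keys without an inline value are ignored.
function parseFrontmatter(markdown: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const match = /^---\r?\n([\s\S]*?)\r?\n---\s*$/m.exec(markdown);
  if (!match) return fields;
  for (const line of match[1].split(/\r?\n/)) {
    const kv = /^([A-Za-z_]+):\s*(.+)$/.exec(line);
    if (kv) fields[kv[1]] = kv[2].trim();
  }
  return fields;
}

// Convert "47/50" into 0.94; return null when the value is absent or malformed.
function successRatio(raw: string | undefined): number | null {
  const m = raw ? /^(\d+)\s*\/\s*(\d+)$/.exec(raw) : null;
  if (!m) return null;
  const total = Number(m[2]);
  return total === 0 ? null : Number(m[1]) / total;
}
```

A RepoPrime could use the ratio to decide whether a skill is trustworthy enough for the CI runner or should be escalated to an AI-driven attempt instead.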

Skill Lifecycle

1. Manual Creation
   Human or agent writes SKILL.md for a known procedure

2. Selective Injection
   RepoPrime's Level 1 keyword map includes skill names
   LLM reads relevant skills before reasoning about a job

3. Crystallization (Auto-Learning)
   Job succeeds → distillation extracts the procedure
   → Creates or updates SKILL.md with new error patterns
   → Updates success_rate and last_success

4. Skill Inheritance
   Universal skills (add-biome) → apply to all repos
   Vertical skills (fix-cloudflare-worker) → apply to CF Worker repos
   Repo-specific skills → apply to one repo only

Skill Storage

Skills live in two places:

RepoPrime inherits org-level skills and can override them with repo-specific versions.
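
The override rule can be sketched as a precedence lookup. The article states that repo-specific skills override org-level ones; treating vertical skills as overriding universal ones is an assumption, as are the `Skill` and `RepoInfo` shapes.

```typescript
type SkillScope = "universal" | "vertical" | "repo";

interface Skill {
  name: string;
  scope: SkillScope;
  appliesTo?: string; // vertical tag for "vertical", repo name for "repo"
  body: string;
}

interface RepoInfo {
  name: string;
  verticals: string[];
}

// Higher number wins when two applicable skills share a name.
const PRECEDENCE: Record<SkillScope, number> = { universal: 0, vertical: 1, repo: 2 };

function resolveSkills(all: Skill[], repo: RepoInfo): Map<string, Skill> {
  const resolved = new Map<string, Skill>();
  for (const skill of all) {
    const applies =
      skill.scope === "universal" ||
      (skill.scope === "vertical" && repo.verticals.includes(skill.appliesTo ?? "")) ||
      (skill.scope === "repo" && skill.appliesTo === repo.name);
    if (!applies) continue;
    const current = resolved.get(skill.name);
    if (!current || PRECEDENCE[skill.scope] >= PRECEDENCE[current.scope]) {
      resolved.set(skill.name, skill);
    }
  }
  return resolved;
}
```

The result does not depend on the order in which skills are listed, so org-level and repo-level registries can be concatenated freely before resolution.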


Auto-Learning: Knowledge Distillation

Inspired by Agent Zero’s auto-learning pattern, adapted for structured markdown and DO SQLite.

When Distillation Happens

| Trigger | What Gets Distilled |
| --- | --- |
| Job completed (success) | Procedure → skill, outcome → attempt note |
| Job completed (failure) | Error pattern → skill update, blocker → pattern note |
| Human conversation | Business context → decision note, priority change → working memory |
| Signal change detected | State transition → pattern note |
| RepoPrime escalation | Cross-repo pattern → OrgPrime pattern note |

The Distillation Prompt

You are a knowledge distiller for the {agent_name} agent.

Given this event:
{event_type}: {event_summary}
{event_details}

And the current memory structure:
{keyword_map}

Extract knowledge into one of these categories:
- decisions/  — why something was decided, with context
- attempts/   — what was tried, outcome, what was learned
- patterns/   — recurring patterns (errors, dependencies, signals)
- skills/     — repeatable procedures (SKILL.md format)

Rules:
- Link to existing notes using [[note-title]] when relevant
- Use YAML frontmatter with: title, folder, source, related_repos
- If updating an existing note, return the note ID and the update
- Be concise — memory notes should be <200 words
- Include ONLY information not obvious from the code itself

Temporal Decay

Memory notes are not deleted. They decay naturally through BM25 recency weighting:

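-- Recency weight: notes updated recently rank higher.
-- The decaying half of the weight has a 30-day half-life and the weight never
-- drops below 0.5: fresh = 1.0, 30 days old = 0.75, very old approaches 0.5.
-- Assumes updated_at is stored in Unix seconds.
-- FTS5's rank is negative (more negative = better match), so sort ascending.
SELECT m.*,
  memory_fts.rank * (0.5 + 0.5 * EXP(-0.693 * (unixepoch('now') - m.updated_at) / 2592000.0))
  AS weighted_rank
FROM memory_fts
JOIN memory AS m ON m.rowid = memory_fts.rowid
WHERE memory_fts MATCH ?
ORDER BY weighted_rank ASC;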
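
As a worked check of the weighting expression in that query, the same formula can be evaluated in TypeScript. This sketch uses `Math.LN2` where the SQL rounds to 0.693 and assumes timestamps in Unix seconds. Because of the constant 0.5 term, the weight has a floor: a 30-day-old note gets 0.75, not 0.5, and no note ever drops below half the weight of a fresh one.

```typescript
const HALF_LIFE_SECONDS = 30 * 24 * 60 * 60; // 2,592,000 seconds = 30 days

// weight = 0.5 + 0.5 * 2^(-age / halfLife)
function recencyWeight(updatedAtSec: number, nowSec: number): number {
  const ageSec = Math.max(0, nowSec - updatedAtSec);
  return 0.5 + 0.5 * Math.exp((-Math.LN2 * ageSec) / HALF_LIFE_SECONDS);
}

const now = 1_800_000_000; // arbitrary reference time for the example
const weights = {
  fresh: recencyWeight(now, now), // 1.0
  thirtyDays: recencyWeight(now - HALF_LIFE_SECONDS, now), // 0.75
  sixtyDays: recencyWeight(now - 2 * HALF_LIFE_SECONDS, now), // 0.625
};
```

The floor is what keeps old notes findable: a strong BM25 match from a year ago can still outrank a weak match from yesterday.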

This means:


The Run Sheet: Control Plane to Data Plane

The run sheet is OrgPrime’s output — a prioritized list of work for the dispatcher.

interface RunSheetItem {
  rank: number
  repo: string
  signal: string
  jobType: string
  runner: 'cf' | 'ci' | 'local'
  reason: string           // why this matters (business context)
  approach: string         // how to do it (technical guidance)
  issueNumber?: number     // tracking issue
  cooldownHours: number    // don't retry before this
  blockedBy?: string[]     // other run sheet items that must complete first
  businessGoal?: string    // which human conversation spawned this
}
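
To show how `blockedBy` and `cooldownHours` might be used together, here is a sketch of the filter a dispatcher could apply before picking work. It uses a trimmed-down item type to stay self-contained, and it assumes items are identified by a `repo:signal` key; the article does not say how `blockedBy` entries reference other items, so that key format is hypothetical.

```typescript
// Trimmed-down view of a run sheet item (only the fields this filter needs).
interface SheetItem {
  rank: number;
  repo: string;
  signal: string;
  cooldownHours: number;
  blockedBy?: string[];
}

// Hypothetical identity for an item; blockedBy entries use the same format.
const itemKey = (item: SheetItem): string => `${item.repo}:${item.signal}`;

// Items whose blockers are all complete and whose cooldown has elapsed,
// ordered by rank (lowest rank first).
function dispatchable(
  items: SheetItem[],
  completed: Set<string>,
  lastAttemptMs: Map<string, number>,
  nowMs: number,
): SheetItem[] {
  return items
    .filter((item) => (item.blockedBy ?? []).every((dep) => completed.has(dep)))
    .filter((item) => {
      const last = lastAttemptMs.get(itemKey(item));
      return last === undefined || nowMs - last >= item.cooldownHours * 3_600_000;
    })
    .sort((a, b) => a.rank - b.rank);
}
```

With the Income Coach example, the frontasy item stays out of the result until the brand-systems item is marked complete, without the dispatcher needing to understand why.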

Run Sheet vs Current Policy Rules

| Current (POLICY_RULES) | Prime (Run Sheet) |
| --- | --- |
| Static if/else in code | Dynamic, regenerated each OrgPrime wake |
| Signal → job type mapping | Business intent → prioritized work list |
| Same rules for all repos | Per-repo reasoning by RepoPrime |
| No business context | Links work to business goals |
| No dependency tracking | Explicit blockedBy relationships |
| No cooldown intelligence | Cooldown based on attempt history |

The Dispatcher: Dumb Resource Manager

The dispatcher’s role shrinks significantly in the Prime architecture. It becomes a pure execution engine:

What Stays

What Moves to Prime

What’s New


Competitive Position

vs OpenClaw (302K stars)

| Dimension | OpenClaw | Prime |
| --- | --- | --- |
| Scope | Single agent, single repo | Hierarchical: org + N repos |
| Persistence | Local process, dies when stopped | DO hibernation, zero cost when idle |
| Memory | File-based + SQLite + vector search | DO SQLite + progressive disclosure (no embeddings) |
| Skills | 5,400+ in registry (SKILL.md) | Adopts SKILL.md pattern, starts with ~10 |
| Conversation | 20+ platform adapters | WebSocket + Telegram (extensible) |
| Multi-repo | No | Native: OrgPrime coordinates 40 repos |
| Cost control | None | API Mom intelligent router + daily budgets |
| Business decomposition | No; human must specify repo and task | OrgPrime decomposes business intent |
| Auto-learning | No (manual MEMORY.md) | Agent Zero-style distillation |
| Always-on | Requires running machine | Cloudflare edge, $0 when idle |

Prime’s advantage: Hierarchical decomposition + always-on + cost control. OpenClaw’s advantage: Ecosystem size + platform coverage.

vs Agent Zero (12K stars)

| Dimension | Agent Zero | Prime |
| --- | --- | --- |
| Auto-learning | FAISS embeddings, auto-extract | BM25 + progressive disclosure, auto-distill |
| Hierarchy | Superior/subordinate chain | OrgPrime/RepoPrime with DO persistence |
| Persistence | FAISS files on disk | DO SQLite (survives indefinitely) |
| Execution | Docker containers | 3-tier (CF/CI/Local) |

Prime’s advantage: Persistent state without infrastructure, edge deployment. Agent Zero’s advantage: More mature auto-learning implementation.

vs LangGraph (10K stars)

| Dimension | LangGraph | Prime |
| --- | --- | --- |
| State | PostgresSaver checkpoints | DO SQLite (no external DB needed) |
| Graph | Arbitrary node graphs | Two-level hierarchy (simpler, sufficient) |
| Human-in-loop | Interrupt nodes | WebSocket conversation |
| Persistence | Requires PostgreSQL | Built into DO (zero config) |

Prime’s advantage: No infrastructure requirements, always-on without a server. LangGraph’s advantage: More flexible graph patterns, larger community.

The Unique Position

No framework in the survey combines:

  1. Always-on persistence (DO hibernation)
  2. Hierarchical agent coordination (org → repo)
  3. Business-to-code decomposition (conversation → issues → jobs)
  4. Cost-controlled execution (API Mom routing)
  5. Auto-learning memory (distillation into DO SQLite)

This is not a general-purpose agent framework. It is an organizational control plane — purpose-built for the specific problem of managing a portfolio of software repos from business-level conversations.


Implementation: Current State and Migration

What’s Built (garywu/mulan)

| Component | Status | Notes |
| --- | --- | --- |
| Dispatcher CF Worker | Live | Scans, dispatches, commits README |
| BrainDO (→ OrgPrime) | Partial | Alarm, run sheet, flag escalation. No LLM, no WebSocket. |
| RepoPrimeDO | Partial | Boot/wake/plan-result routes. No Agents SDK, no SQLite, no LLM. |
| D1 schema | Live | jobs, repos, signal_history, runners, pr_outcomes |
| CF Runner | Live | Inline job execution in worker |
| CI Runner | Live | GitHub Actions deterministic handlers |
| Local Runner | Partial | Claude Agent SDK, claims jobs |
| GitHub API proxy | Live | Centralized via github-fetch.ts + API Mom |
| Telegram bot | Live | @brewdbot, send-only |
| Signal scanning | Live | 12 checks per repo, smart scan optimization |
| Batch commit | Live | Git Trees API, single commit per tick |
| PAUSED kill switch | Live | Emergency brake for dispatcher |

Migration Path

Phase 1: Foundation (fix current implementation)

Phase 2: Memory (Napkin-inspired)

Phase 3: Conversation

Phase 4: Skills and Learning

Phase 5: Run Sheet


Schema and Data Model

D1 (Shared, Org-Wide) — Unchanged

-- Signal states for all repos (dispatcher writes, Primes read)
CREATE TABLE repos (remote TEXT PRIMARY KEY, state TEXT NOT NULL);

-- Job queue (RepoPrimes submit, dispatcher executes)
CREATE TABLE jobs (
  id TEXT PRIMARY KEY, type TEXT, repo TEXT, status TEXT,
  needs TEXT, payload TEXT, result TEXT,
  priority INTEGER, created_at INTEGER, updated_at INTEGER,
  completed_at INTEGER, claimed_by TEXT
);

-- Signal transition history (dispatcher writes, Primes read)
CREATE TABLE signal_history (
  id TEXT PRIMARY KEY, repo TEXT, signal TEXT,
  prev TEXT, curr TEXT, changed_at INTEGER
);

-- PR outcomes (dispatcher writes, Primes read for regression detection)
CREATE TABLE pr_outcomes (
  pr_url TEXT PRIMARY KEY, job_type TEXT, repo TEXT,
  baseline_signals TEXT, outcome TEXT, merged_at INTEGER
);

-- Run sheet (OrgPrime writes, dispatcher reads)
CREATE TABLE run_sheet (
  rank INTEGER NOT NULL, repo TEXT NOT NULL,
  signal TEXT NOT NULL, job_type TEXT NOT NULL,
  runner TEXT NOT NULL, reason TEXT NOT NULL,
  approach TEXT NOT NULL, issue_number INTEGER,
  cooldown_hours INTEGER DEFAULT 24,
  blocked_by TEXT, business_goal TEXT,
  updated_at INTEGER NOT NULL
);
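
Rows in `run_sheet` use snake_case columns while `RunSheetItem` uses camelCase, so the dispatcher needs a small mapping step when it reads the table. The sketch below assumes `blocked_by` holds a JSON array encoded as text, consistent with how other TEXT columns in this article store arrays but not stated for this column; the `RunSheetRow` type and function names are illustrative.

```typescript
type Runner = "cf" | "ci" | "local";

// One row as it would come back from the run_sheet table.
interface RunSheetRow {
  rank: number;
  repo: string;
  signal: string;
  job_type: string;
  runner: string;
  reason: string;
  approach: string;
  issue_number: number | null;
  cooldown_hours: number;
  blocked_by: string | null; // assumed: JSON array encoded as text
  business_goal: string | null;
  updated_at: number;
}

interface RunSheetItem {
  rank: number;
  repo: string;
  signal: string;
  jobType: string;
  runner: Runner;
  reason: string;
  approach: string;
  issueNumber?: number;
  cooldownHours: number;
  blockedBy?: string[];
  businessGoal?: string;
}

function isRunner(value: string): value is Runner {
  return value === "cf" || value === "ci" || value === "local";
}

// Convert a row to the interface shape, rejecting unknown runner values
// and omitting optional fields whose column is NULL.
function rowToItem(row: RunSheetRow): RunSheetItem {
  if (!isRunner(row.runner)) throw new Error(`unknown runner: ${row.runner}`);
  const item: RunSheetItem = {
    rank: row.rank,
    repo: row.repo,
    signal: row.signal,
    jobType: row.job_type,
    runner: row.runner,
    reason: row.reason,
    approach: row.approach,
    cooldownHours: row.cooldown_hours,
  };
  if (row.issue_number !== null) item.issueNumber = row.issue_number;
  if (row.blocked_by !== null) item.blockedBy = JSON.parse(row.blocked_by) as string[];
  if (row.business_goal !== null) item.businessGoal = row.business_goal;
  return item;
}
```

Validating `runner` at this boundary means a malformed row fails when it is read, not later when a job is dispatched to a runner that does not exist.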

DO SQLite (Private, Per-Agent) — New

See the Memory section for the complete schema.


