Agency is the capacity to generate goals from ambient state — not the capacity to execute given goals. Every system in the “AI agent” category today is a very capable executor; almost none does open-ended goal generation from rich ambient state. That missing capability is the distinguishing primitive. Once it exists, executors become a commodity layer you swap as the market evolves. An active-memory substrate is a plausible-but-not-yet-proven path to that primitive; a spike against our own wiki substrate produced roughly 65% validation — real signal where the substrate was mature, silence where it wasn’t.
1. “Agent” Is Doing Too Much Work
The word “agent” has been overloaded to the point where it carries almost no architectural signal. A queue-polling background worker is called an agent. A chat-completion call in a for-loop is called an agent. A Durable Object with an alarm, a Claude Code session, a cron-invoked shell script — all called agents.
Cluster these systems by what they actually do and a cleaner taxonomy falls out. Two roles, not one:
- Executor. Given a goal, decompose it, choose tools, act, handle errors, stop when done.
- Goal generator. Given a substrate (world, codebase, wiki, portfolio), continuously decide which problem is worth solving next, and phrase it as a goal an executor can act on.
These are different computations with different inputs, success conditions, and failure modes.
An executor takes goal → plan → actions → outcome. Judged on whether outcome matches goal. The goal is exogenous: a human, a ticket, or a parent agent supplies it.
A goal generator takes ambient state → candidate goals → ranking → broadcast. Judged on whether the goals it surfaces are real, well-scoped, and worth doing when nobody asked. The substrate is exogenous; the goal is endogenous.
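Written as type signatures, the difference between the two roles is which side of the interface the goal sits on. The sketch below is a minimal Python version. It assumes a goal is a bare description and ambient state is a flat dict of boolean signals. `Goal`, `Outcome`, `Executor`, `GoalGenerator`, the two toy classes, and `drive` are hypothetical names chosen for illustration; they are not any framework's API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Goal:
    description: str


@dataclass(frozen=True)
class Outcome:
    goal: Goal
    succeeded: bool


class Executor(Protocol):
    # goal -> plan -> actions -> outcome. The goal is an input: someone else chose it.
    def run(self, goal: Goal) -> Outcome: ...


class GoalGenerator(Protocol):
    # ambient state -> candidate goals -> ranking. No goal is an input.
    def propose(self, ambient_state: dict[str, bool]) -> list[Goal]: ...


class FailingSignalGenerator:
    """Toy generator: proposes one goal per signal that is currently False."""

    def propose(self, ambient_state: dict[str, bool]) -> list[Goal]:
        return [Goal(f"fix {name}") for name, ok in sorted(ambient_state.items()) if not ok]


class AlwaysSucceeds:
    """Toy executor: reports every goal it is handed as completed."""

    def run(self, goal: Goal) -> Outcome:
        return Outcome(goal, succeeded=True)


def drive(generator: GoalGenerator, executor: Executor, state: dict[str, bool]) -> list[Outcome]:
    # The only coupling between the two roles is the Goal value passed across.
    return [executor.run(goal) for goal in generator.propose(state)]
```

You can replace `AlwaysSucceeds` with any object that has a `run(goal)` method, and the generator does not change. That independence is the separation the rest of this article relies on.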
Dennett’s intentional stance draws roughly this line from the outside: a system is “agent-like” to the degree it’s usefully predictable as pursuing goals over time. Korsgaard, in Self-Constitution, pushes further: practical agency requires choosing what to pursue. Friston’s active-inference framing arrives at a related place: an agent reduces its own surprise by acting on the world, which implies it is the thing deciding which prediction error to reduce next. None of these frames reduces agency to “executes a given instruction.” All locate agency at the point where the goal is generated.
We can adopt that as a working definition without claiming to have solved philosophy:
Agency is the capacity to generate goals from ambient state, and to keep generating them, coherently, without being prompted.
Everything downstream of the generated goal — decomposition, tool selection, retries, escalation — is execution. Valuable, hard to do well, increasingly commoditized. But not the thing that makes a system agentic.
This article argues that almost every system currently marketed as an “agent” is, under this definition, an executor. That categorization is not a dismissal: executors are critically useful, and the best ones are extremely sophisticated. The categorization clarifies what the hard open problem actually is, and where the durable moat lives if you solve it.
2. Audit the Field: What Do Existing Systems Actually Do?
Before claiming a gap, we should demonstrate it. Below is a capability audit of ten systems that get called “agents” in 2025–2026 discourse. Each column is a concrete, verifiable capability. I am deliberately ranking capabilities from easy to hard:
- Reactive response. Receives an event and responds.
- Scheduled execution. Wakes on a timer and runs a fixed routine.
- Memory-augmented context. Retains state across invocations and retrieves it on future ones.
- Drift detection. Compares desired state to actual state and flags the delta.
- Subgoal decomposition. Given a goal, produces and executes subgoals.
- Open-ended goal generation. Given ambient state only, proposes novel goals worth pursuing.
Each row draws on either the framework’s own docs or our internal surveys of ten context-continuity systems (/Users/admin/Work/_readme/wiki/business/autonomous-agents-context-continuity.md) and eleven active-memory systems (/Users/admin/Work/_readme/wiki/business/active-memory-sota-survey.md).
| System | 1. Reactive | 2. Scheduled | 3. Memory | 4. Drift detect | 5. Subgoal decomp | 6. Open-ended goal gen |
|---|---|---|---|---|---|---|
| Claude Code | Yes (user turn) | No | CLAUDE.md + MEMORY.md | No | Yes (TODO list) | No — waits for prompt |
| Claude Agent SDK | Yes | No (caller) | JSONL session transcripts | No | Yes (query() loop) | No |
| OpenClaw | Yes (message) | Yes (dreaming cron) | Workspace files + SQLite + active-memory | No | Yes | No — conversational |
| Letta / MemGPT | Yes | Yes (sleep-time compute) | Core blocks + archival vector | Pressure warning only | Yes (agent loop) | No — self-edits, doesn’t author goals |
| Mem0 | Yes (add/search) | No | Vector + BM25 + entity links | No | N/A (memory layer) | No — not a goal system |
| Zep / Graphiti | Yes (episode) | Yes (community rebuild) | Bi-temporal knowledge graph | Edge invalidation = fact-level drift | N/A | No — substrate, not planner |
| MoltWorker | Yes (sandbox events) | Yes (wake-cron) | R2-as-filesystem + SDK sessions | No | Yes (nested agents) | No |
| AutoGPT / BabyAGI (2023) | N/A | Loop | Vector store | No | Yes (task list) | Attempted — failed (see §3) |
| “Prime” (our org-level pattern) | Yes (webhook) | Yes (DO alarm) | 4-layer (DO SQLite + GitHub issues + CLAUDE.md + D1) | Yes (signals vs desired) | Yes (dispatcher jobs) | Yes, in a bounded domain (repo standards) |
| Honcho (AGPL) | Yes | Yes (dreamer) | Relational + peer representations | Derivation deltas | N/A | Partial — derives, broadcasts, but doesn’t rank goals |
A few observations from the table:
- Columns 1–3 are saturated. Reactive response, scheduling, and memory are table stakes. Disagreements are about shape (files vs graph, summarization vs dreaming) not about whether to have memory.
- Column 4 — drift detection — thins out. Graphiti’s bi-temporal edges detect fact-level drift (a new episode contradicts an old edge; the old edge gets `invalid_at` set). Honcho’s deriver emits events on representation change. Prime explicitly compares signals (CI, biome, commitlint, Dependabot) against desired standards. But most systems don’t have an opinion about what the world should look like, and therefore have nothing to detect drift against.
- Column 5 — subgoal decomposition — is solved for given goals. Hand any of these systems a well-formed goal and they’ll break it down. That’s what “agent loops” do.
- Column 6 — open-ended goal generation — is the one almost nobody passes. Claude Code waits for the turn. OpenClaw is conversational. Letta self-edits memory but is still driven by a conversational turn. Mem0/Zep/Graphiti are substrates, not planners. MoltWorker runs goal-bearing tasks handed to it by Claude Code. The Claude Agent SDK is the executor’s executor — great tools for running a goal, but the goal comes from the caller.
Two rows approach column 6, both with qualifications. Prime generates goals from repo-standards drift — ambient state is a set of signals it knows how to score, against a desired state it has opinions about. That is open-ended goal generation in a bounded domain. Honcho’s deriver + dreamer + webhooks stack emits derivation events — arguably proto-goals — but does not rank or prioritize across them in a way an executor could consume without further framing.
No system in this table does what a reasonable person would mean by “an agent that, given my whole operational state, figures out what I should work on today and sets it up to be worked on.” That is the gap.
3. AutoGPT / BabyAGI: The Failure That Taught Us What the Gap Looks Like
It is worth naming the one category that did claim column 6 and whose results are by now a well-documented cautionary tale. In 2023, AutoGPT (github.com/Significant-Gravitas/AutoGPT) and BabyAGI (github.com/yoheinakajima/babyagi) went viral on a simple loop:
1. Given a high-level objective, ask the LLM: “what tasks should I do to achieve this?”
2. Pick the top task.
3. Execute it.
4. Ask the LLM: “given what just happened, what new tasks should I add?”
5. Re-prioritize.
6. Go to step 2.
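The loop above fits in a few lines, which is part of why it spread so quickly. The sketch below is a minimal Python version with a stub in place of the LLM. `task_loop`, `fake_propose`, `fake_execute`, and the `max_steps` cap are assumptions made for illustration; this is not AutoGPT's or BabyAGI's code. Nothing in the loop ever reads the world.

```python
from collections import deque
from typing import Callable


def task_loop(
    objective: str,
    propose: Callable[[str, str], list[str]],
    execute: Callable[[str], str],
    prioritize: Callable[[list[str]], list[str]] = lambda tasks: tasks,
    max_steps: int = 10,
) -> tuple[list[str], list[str]]:
    """BabyAGI-style loop. Every task originates in `propose` (the LLM's output);
    nothing checks a task against external state before it is executed."""
    tasks = deque(propose(objective, ""))          # 1. ask for tasks
    done: list[str] = []
    for _ in range(max_steps):                      # the cap is an addition for this sketch
        if not tasks:
            break
        task = tasks.popleft()                      # 2. pick the top task
        result = execute(task)                      # 3. execute it
        done.append(task)
        tasks.extend(propose(objective, result))    # 4. ask for new tasks
        tasks = deque(prioritize(list(tasks)))      # 5. re-prioritize; 6. loop back to step 2
    return done, list(tasks)


def fake_propose(objective: str, last_result: str) -> list[str]:
    # Stand-in for the LLM: always invents two follow-up tasks.
    base = last_result or objective
    return [base + "/a", base + "/b"]


def fake_execute(task: str) -> str:
    # The "result" is just the task's own name: no contact with the world.
    return task
```

Here `fake_propose` invents two follow-ups per task. With that stub, `task_loop("grow revenue", fake_propose, fake_execute, max_steps=5)` finishes five tasks and leaves seven queued. The backlog grows by one per step, and every entry is derived from the model's own output.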
The community built this in a weekend. It looked like open-ended goal generation. It was not. Within weeks, retrospectives from users, researchers, and the authors themselves documented the same failure mode: tasks were syntactically plausible but unmoored from reality. The system would spend hours creating sub-tasks that referred to files it had not read, facts it had not verified, URLs it had hallucinated. Tasks spawned subtasks, the tree grew, nothing shipped. Users would come back after an hour to an LLM bill, a log of a hundred “tasks,” and zero concrete progress.
The postmortem is not “the prompt was bad.” It is structural, and it is the lesson that informs everything else in this article:
Open-ended goal generation without rich, grounded, continuously-updated ambient state produces hallucination dressed as autonomy.
AutoGPT’s loop asked the LLM to generate goals from the LLM’s own priors and its own running narrative. The substrate was the conversation plus whatever scratch files the LLM wrote. No facts ledger, no typed entities, no drift signals, no graph of what the user actually cares about. The LLM was forced to pretend it had ambient state, and it hallucinated that state turn by turn.
Contrast a human “generating goals from ambient state”: someone walks into a kitchen, notices the trash is full, sees the counter is sticky, remembers a package was supposed to arrive, and hears a notification ping. The ambient state is enormously rich and continuously verifiable against reality. Goal generation is cheap precisely because the substrate is high-resolution and real-time.
The 2023 experiment revealed, by failing, that the substrate is the rate-limiting reagent. You cannot brute-force column 6 with a for-loop. The field internalized the lesson and largely retreated from column 6, focusing on columns 1–5 for the next two years. That is what the 2026 agent-framework landscape mostly looks like.
4. Prime as a Bounded-Domain Existence Proof
If you restrict the domain enough, open-ended goal generation works today. Our own Prime pattern (/Users/admin/Work/_readme/articles/org-prime-agent-architecture/README.md) is one such example, and it is useful to walk through why it works in order to see what a general-purpose version would need.
Prime is a hierarchy of persistent Durable Object agents (Org Prime, Repo Prime × N) monitoring a multi-repo GitHub org. Each Repo Prime has:
- A desired state (CLAUDE.md encodes standards: biome configured, commitlint present, CI green, Dependabot enabled).
- An actual state (periodic scan of GitHub signals).
- A drift detector (compares the two).
- A decision cycle (one LLM call per wake picks the next action from a constrained set).
- A dispatcher (submits jobs to a runner pool).
- A memory (GitHub issues as episodic, DO SQLite as working, D1 as shared, CLAUDE.md as semantic).
This system does generate goals. A new repo joins the org; Repo Prime wakes, scans, notices biome is missing, creates a tracking issue, dispatches a job to add biome.json, comments the outcome. Nobody asked. The goal was generated from ambient state.
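The drift detector is the piece that turns “nobody asked” into a concrete delta. The sketch below is a minimal Python version. It assumes the desired state has already been parsed out of CLAUDE.md into a dict, and that the scan has produced a matching dict of observed signals. `DESIRED`, the signal names, and `detect_drift` are hypothetical.

```python
# Hypothetical signal names. In Prime the desired state lives in CLAUDE.md and
# the actual state comes from a GitHub scan; here both are plain dicts.
DESIRED = {"biome": True, "commitlint": True, "ci_green": True, "dependabot": True}


def detect_drift(desired: dict[str, bool], actual: dict[str, bool]) -> list[str]:
    """Return the standards whose observed value differs from the desired one.

    A signal missing from the scan counts as drift: unknown is not compliant.
    """
    return sorted(name for name, want in desired.items() if actual.get(name) != want)
```

Take the new-repo case above. A scan reporting `{"biome": False, "commitlint": True, "ci_green": True}` yields `["biome", "dependabot"]`. Biome is misconfigured, and Dependabot was never observed at all.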
Why does this work where AutoGPT didn’t? Four reasons:
- The substrate is real. Repo state comes from GitHub’s API, CI from GitHub Actions, file presence from `GET /contents`. Nothing is narrated by the LLM.
- The desired state is explicit. CLAUDE.md says what “good” looks like. The LLM never has to invent what the user wants.
- The goal vocabulary is bounded. `{fix-ci, add-biome, add-commitlint, add-dependabot, merge-mulan-pr, ...}` is a finite catalog. The LLM is choosing from a menu.
- The loop is interrupted. Each wake produces at most five actions, then the DO hibernates. There is no unbounded recursion; the agent re-grounds from the real world on every wake.
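The last two properties are a finite menu and a capped, interrupted loop. Together they keep a wake from turning into an AutoGPT-style spiral. The sketch below is a minimal Python version of one decision cycle. `CATALOG`, the signal names, `wake`, and the pass-through default for `choose` are assumptions; in Prime, that choice is the single LLM call per wake.

```python
from typing import Callable

# Four of the goal types named above, keyed by the drifted standard that triggers
# them. The signal names are hypothetical; the point is that the menu is finite.
CATALOG = {
    "ci_green": "fix-ci",
    "biome": "add-biome",
    "commitlint": "add-commitlint",
    "dependabot": "add-dependabot",
}
MAX_ACTIONS_PER_WAKE = 5


def wake(
    drift: list[str],
    choose: Callable[[list[str]], list[str]] = lambda options: options,
) -> list[str]:
    """One decision cycle: map drift onto the catalog, let `choose` order the
    options, drop anything off-menu, and cap the result. The caller then
    hibernates; the next wake starts from a fresh scan, not from this list."""
    options = [CATALOG[name] for name in drift if name in CATALOG]
    chosen = [goal for goal in choose(options) if goal in options]
    return chosen[:MAX_ACTIONS_PER_WAKE]
```

The output of `choose` is filtered against the menu. An invented goal such as `"rewrite-everything"` is therefore dropped rather than dispatched.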
This is open-ended goal generation inside a domain where the substrate is dense and the goal vocabulary is discrete. It is structurally analogous to how AlphaZero’s MCTS generates goals (moves) inside a bounded game tree: the board state is dense and verifiable, the move vocabulary is finite, and the loop re-grounds after every move. In-domain goal generation is tractable when the substrate is high-resolution and the action space is enumerable.
Prime is an existence proof. What it does not show is that the same pattern scales to open-ended domains — where the desired state isn’t a YAML file of standards, the goal vocabulary isn’t a fixed catalog, and the substrate isn’t a REST API. That’s the harder problem, and it’s the one the active-memory hypothesis tries to address.
5. The Active-Memory Hypothesis: Goal Generation as a Substrate Function
If Prime’s recipe is “dense verifiable substrate + desired-state doc + bounded goal vocabulary + interrupted loop,” the question for general-purpose agency becomes: can we build that recipe where the desired state is not a YAML file?
The active-memory survey (/Users/admin/Work/_readme/wiki/business/active-memory-sota-survey.md) outlines one plausible path, composed from three systems that each hold a piece:
- Graphiti’s bi-temporal graph — every edge carries `valid_at`/`invalid_at`/`expired_at` plus an `episodes: list[str]` provenance trail. Contradicting facts don’t delete the old edge; they invalidate it. This gives a substrate that can detect its own drift.
- A-MEM’s evolution prompt + neighbor update — on every write, an LLM judge looks at the new note and its K nearest neighbors and decides whether to link, strengthen, or mutate the neighbors. The write-propagation primitive.
- Honcho’s deriver / dreamer / webhooks — writes enqueue jobs; a background deriver runs pipelines; a dreamer runs scheduled reorganization passes; webhooks broadcast substrate-state-change events. The architectural shape that turns a passive store into an active one.
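In the Graphiti behavior described above, a contradicted fact is invalidated rather than deleted, and that is what lets a store notice its own drift. The sketch below is a simplified Python version. It assumes a naive contradiction rule (same source and relation, different target) in place of whatever judgment a real system applies. `Edge` and `add_fact` are hypothetical; they only borrow the field names mentioned above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Edge:
    # Simplified stand-in. Field names follow those attributed to Graphiti's
    # EntityEdge above; this is not Graphiti's class or its logic.
    source: str
    relation: str
    target: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None   # when the fact stopped holding in the world
    expired_at: Optional[datetime] = None   # when the store stopped treating it as current
    episodes: list[str] = field(default_factory=list)  # provenance trail


def add_fact(edges: list[Edge], new: Edge, now: datetime) -> list[Edge]:
    """Append `new`; invalidate (never delete) live edges it contradicts.

    Assumed contradiction rule: same source and relation, different target.
    The returned edges are the drift signal a goal generator could consume.
    """
    contradicted = [
        e for e in edges
        if e.invalid_at is None
        and (e.source, e.relation) == (new.source, new.relation)
        and e.target != new.target
    ]
    for e in contradicted:
        e.invalid_at = new.valid_at
        e.expired_at = now
    edges.append(new)
    return contradicted
```

Both the old and the new fact stay in the store. The old one carries the time it stopped being true, which makes “what changed since last week?” a query rather than a reconstruction.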
Compose these and you get a substrate where writes propagate through pipelines (potentially mutating related nodes), contradictions get flagged rather than silently overwritten, scheduled consolidation passes produce candidate summaries and priorities, and threshold-crossing events fire webhooks that an executor can pick up.
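The composition above can be sketched as a store whose writes do work. The toy Python version below assumes that heat is a single float per node and that a pipeline is any function of `(store, node)`. `ActiveStore` and `propagate_heat` are hypothetical names. The neighbor update here is a fixed increment rather than an LLM judge.

```python
from collections import defaultdict
from typing import Callable


class ActiveStore:
    """Toy active substrate. A write links the node to its neighbors, bumps its
    heat, runs every registered pipeline, then fires webhooks for any node whose
    heat crossed the threshold during that write."""

    def __init__(self, threshold: float = 1.0) -> None:
        self.heat: defaultdict[str, float] = defaultdict(float)
        self.links: defaultdict[str, set[str]] = defaultdict(set)
        self.pipelines: list[Callable[["ActiveStore", str], None]] = []
        self.webhooks: list[Callable[[str, float], None]] = []
        self.threshold = threshold

    def write(self, node: str, weight: float = 0.5, neighbors: tuple[str, ...] = ()) -> None:
        before = dict(self.heat)
        for nb in neighbors:
            self.links[node].add(nb)
            self.links[nb].add(node)
        self.heat[node] += weight
        for pipeline in self.pipelines:  # pipelines may mutate related nodes
            pipeline(self, node)
        crossed = [n for n, h in self.heat.items()
                   if h >= self.threshold and before.get(n, 0.0) < self.threshold]
        for n in crossed:
            for hook in self.webhooks:
                hook(n, self.heat[n])


def propagate_heat(store: ActiveStore, node: str) -> None:
    # Crude stand-in for an A-MEM-style neighbor update: instead of an LLM judge
    # deciding how to mutate neighbors, each linked node just gets warmer.
    for nb in store.links[node]:
        store.heat[nb] += 0.25
```

Suppose you write to `pricing` and `intake`, both linked to `launch`. Those writes warm `launch` to 0.5 without firing anything. A direct write to `launch` then pushes it to 1.0, and the webhook fires once for `launch`. That event is the kind of signal an executor or goal ranker could subscribe to.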
On top of that substrate, goal generation becomes — plausibly — a read-query over the substrate. “What should we work on?” becomes: find the highest-heat unresolved thread, rank by staleness × importance, check against desired-state pages for the relevant product, return a typed goal object. The LLM is no longer hallucinating ambient state; it’s phrasing an SQL-plus-semantic-search query against a graph that has been actively curating itself.
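Over that substrate, the “what should we work on?” query could look roughly like the Python sketch below. It uses an in-memory filter and sort in place of the SQL-plus-semantic-search query, and the semantic half is omitted. `Thread`, its fields, `next_goals`, and the goal dict's keys are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    id: str
    product: str
    resolved: bool
    days_stale: int
    importance: float   # 0..1, however the substrate scores it
    source: str         # citation back into the substrate


def next_goals(threads: list[Thread], desired_state: dict[str, str], k: int = 3) -> list[dict]:
    """Rank unresolved threads by staleness x importance and emit typed goals.

    Threads for products with no desired-state page are dropped: the query
    returns nothing for them rather than inventing a target.
    """
    candidates = [t for t in threads if not t.resolved and t.product in desired_state]
    ranked = sorted(candidates, key=lambda t: t.days_stale * t.importance, reverse=True)
    return [
        {
            "type": "resolve-thread",
            "thread": t.id,
            "reason": f"stale {t.days_stale}d, importance {t.importance}",
            "citation": t.source,
            "desired_state": desired_state[t.product],
        }
        for t in ranked[:k]
    ]
```

The `product in desired_state` filter is where the AutoGPT lesson shows up. A product with no stated target yields no goal rather than an invented one.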
This is not a solved architecture. It is a hypothesis with a clear shape:
Goal-generation quality is proportional to substrate quality. Build the substrate well enough and goal generation becomes a querying problem. Build it poorly and you reproduce AutoGPT.
The remainder of this article treats that hypothesis as something to test, not something to assume.
6. The Spike: Honest 65% Validation
On 2026-04-19 we ran a small spike (/Users/admin/Work/jane/wiki/6-research/goal-generation-spike-s1.md) against this hypothesis. The setup was as clean as we could make it:
- Substrate: our own wiki corpus across four products — First 300 (venture judgment practice), Book Telic (IP generation studio), UberMesh (content substrate), and Scalable Media (brand operator). This is a partially-typed substrate: mostly markdown pages with frontmatter, some numeric status.json files, some YAML agent identities. It is nowhere near the richness of the Graphiti + A-MEM + Honcho composition described above. It is, however, what we actually have.
- Task: “Produce a coherent, grounded, dispatchable 30-day monetization plan without the operator re-briefing anything.” This is not a narrow, bounded-catalog task in the Prime sense — it’s open-ended planning over four separate business units with different monetization models.
- Method: a systematic read of stated goals, stated blockers, integration points, and cross-references; a synthesis of “what is the loudest signal in the substrate right now?”; a week-by-week plan with confidence annotations and source citations for every goal.
- Scoring: the spike itself ended with a self-rating across four axes: coherence, grounding, specificity, sequencing realism.
The result: roughly 65% validation. The spike’s own per-axis self-rating was:
| Axis | Score | Notes |
|---|---|---|
| Coherence | 7/10 | Two spine events (ship book 1, deploy intake chatbot) with weeks 2–4 compounding around them. Nothing for UberMesh or Scalable Media. |
| Grounding | 8/10 | Every goal cited a specific doc and line range; most claims validated against two different files. |
| Specificity | 7/10 | ~60% of goals could be dispatched to Haiku directly from the goal + citation; another ~25% to Sonnet. |
| Sequencing realism | 6/10 | Aggressive: pulled month-2 work into month-1 for a single-operator pipeline. |
Two things are worth saying clearly about this result.
Where it worked. Where the substrate was mature — Book Telic has an explicit revenue model, twelve-month plan, economics calculations, status.json showing books_published: 0 against a month-1 target of 1; First 300 has a published thesis, scored portfolio, ICP definition, a transfer document naming the next spike — the generated goals were concrete, well-sequenced, and cited sources the system could re-verify. They read like goals a human operator would produce. Anyone reading the spike’s “Week 1” section could dispatch the work without further briefing.
Where it didn’t. In UberMesh, there was no monetization path stated in the substrate itself. The spike correctly refused to invent one. In Scalable Media, the only monetization framing came from a three-day-old memory file, not from the repo’s own docs. The spike correctly flagged this as insufficient grounding. Under the AutoGPT-failure-mode test, this is the right behavior: silence beats hallucination. But it also means the plan covered two products rather than four. The other two didn’t fail; they returned null.
The honest takeaway is not “goal generation works!” and not “goal generation fails!” It is structural:
The rate-limiting factor was substrate maturity, not architecture. Where the wiki had real revenue, strategy, status, and dependency docs, goal generation produced genuine signal. Where the wiki had aspiration without ground-level detail, goal generation either declined to produce goals or produced LOW-confidence goals correctly marked as such.
This tells us what to fix. Not “write a smarter planner.” Rather: “make the substrate denser on the 30% of surface area where it’s thin.” Different kind of work, and it compounds — every page written improves the next goal-generation cycle.
Extrapolating: a substrate at 100% maturity (every product has revenue, status, dependency, and desired-state docs with cross-links) might produce goal generation at 85–90% validation. If so, getting from 65% to 90% would be a matter of filling in substrate, not rewriting the planner. This first spike is consistent with the active-memory hypothesis without proving it.
One spike is not a proof. We have not tested a hardened substrate, substrate-evolution over time, or the same prompt against competitors. But we now have a result shaped like the hypothesis.
7. Agent-Agnosticism: Executors as a Commodity Layer
Grant the argument so far. Suppose agency — open-ended goal generation from ambient state — turns out to be solvable as a function of substrate quality plus a goal-ranking layer on top. What follows for system design?
The consequence is architectural:
Once you own goal generation and the substrate it runs on, the executor becomes a commodity.
This is a concrete claim about which interfaces matter and which don’t.
Today, a lot of engineering effort goes into picking “the right agent framework.” OpenClaw vs Claude Agent SDK vs MoltWorker vs LangGraph vs CrewAI vs AutoGen. Committing to one creates lock-in, usually at the shape-of-work level (conversational vs loop vs graph) rather than at the goal level.
Suppose goal generation is separated cleanly, so that a goal is an output of the substrate shaped as a structured record with type, reason, citation, success criterion, and confidence. Then the executor on the other side can be anything that can consume that record. Claude Code is a great executor for goals that need a human in the loop; the Claude Agent SDK for scripted determinism; MoltWorker for sandboxed nested runs; OpenClaw for conversational work; Prime for repo-shaped work.
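The Python sketch below shows one possible shape for that record. The five fields come from the paragraph above. Their names, their types, and the validation rules are assumptions, and `GoalRecord` is a hypothetical name.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GoalRecord:
    # The five fields named above; field names and types are assumptions.
    goal_type: str               # drawn from a domain's goal vocabulary
    reason: str                  # why the substrate surfaced this goal
    citations: tuple[str, ...]   # where the supporting evidence lives
    success_criterion: str       # how an executor, or a checker, knows it is done
    confidence: float            # 0..1

    def __post_init__(self) -> None:
        # Refuse ungrounded goals at the boundary: the AutoGPT lesson as a type check.
        if not self.citations:
            raise ValueError("goal has no citation")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def to_json(self) -> str:
        # Any executor that can parse this JSON can consume the goal.
        return json.dumps(asdict(self), sort_keys=True)
```

A record with no citations is rejected at construction time. As a result, an executor never receives an ungrounded goal in the first place.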
The question stops being “which agent framework do we commit to?” and becomes “which executor is best for this goal right now?” The answer changes week-to-week as the market evolves. You can run executors concurrently against the same goal stream, A/B them, route goals by capability match. This is the same move the API-Mom router makes at the model layer — model choice as routing, not commitment. One layer up, for agent frameworks.
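Routing by capability match can be as small as a registry and a subset test. In the Python sketch below, the capability tags paraphrase the executor pairings listed earlier in this section. `EXECUTORS`, `route`, and `ab_pick` are hypothetical names. The sketch makes no claim about how any of these products actually expose themselves.

```python
import hashlib

# Hypothetical capability tags, paraphrasing the pairings in the text. The
# registry is the only place executor choice lives, so swapping is an edit here.
EXECUTORS: dict[str, set[str]] = {
    "claude-code": {"human-in-the-loop"},
    "agent-sdk": {"scripted"},
    "moltworker": {"sandboxed", "nested"},
    "openclaw": {"conversational"},
    "prime": {"repo-shaped"},
}


def route(required: set[str], executors: dict[str, set[str]] = EXECUTORS) -> list[str]:
    """Executors whose capabilities cover the goal's requirements, sorted by name."""
    return sorted(name for name, caps in executors.items() if required <= caps)


def ab_pick(goal_id: str, candidates: list[str]) -> str:
    """Deterministic A/B split: a given goal always lands on the same candidate."""
    if not candidates:
        raise ValueError("no executor covers this goal")
    digest = int(hashlib.sha256(goal_id.encode()).hexdigest(), 16)
    return candidates[digest % len(candidates)]
```

`ab_pick` hashes the goal id, so a given goal always goes to the same candidate. That keeps an A/B comparison between executors stable across retries.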
The durable asset is not the executor. Executors are replaceable. The durable asset is:
- The substrate — the typed, content-addressed, actively-curated store of what’s true about the world.
- The goal generator — the continuous process that queries the substrate and emits goal records.
- The interface between them — the goal-record format, which is the only thing any executor needs to understand.
Everything below the goal record is commodity. Everything above it is frontend and human-in-the-loop UX.
This is the Kubernetes split: the control plane owns desired state and drift; the data plane is dumb and swappable. Container runtimes such as containerd and CRI-O are interchangeable behind the kubelet’s Container Runtime Interface. The substrate-plus-goal-generator is the control plane for autonomous operation; the executor zoo is the data plane. Prime’s article made this argument for the narrow case of repo standards; the claim here is that it generalizes.
One consequence worth flagging: if you bet your company on being a “best-in-class agent executor,” you are betting on a commoditizing layer. Your moat, if you have one, has to be upstream of the executor.
8. The Clean Definition
The definition the article has been working toward:
Agency is the capacity, given ambient state, to generate goals worth pursuing, and to continue doing so coherently over time.
Philosophically clean: it separates agency from execution rather than bundling them. Operationally clean: it gives you a test — can the system produce a ranked stream of goals without being prompted, such that the goals hold up under inspection? Practically clean: it tells you where to invest — upstream of the executor, in the substrate and the ranking layer.
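Part of that operational test can be mechanized. The Python sketch below assumes each goal is a dict with `id`, `score`, and `citations`. It reduces “holds up under inspection” to two crude checks: every citation resolves in the substrate, and the stream is ranked. Whether the stream was produced unprompted depends on how it was generated, and a check like this cannot see that. `agency_test` is a hypothetical name.

```python
def agency_test(goals: list[dict], substrate: set[str]) -> list[str]:
    """Return the reasons a goal stream fails; an empty list means it passes.

    Crude proxy for "holds up under inspection": every goal cites something,
    every citation resolves in the substrate, and scores never increase.
    """
    if not goals:
        return ["no goals produced"]
    problems = []
    for g in goals:
        if not g["citations"]:
            problems.append(f"{g['id']}: no citations")
            continue
        missing = [c for c in g["citations"] if c not in substrate]
        if missing:
            problems.append(f"{g['id']}: unresolved citations {missing}")
    scores = [g["score"] for g in goals]
    if any(later > earlier for earlier, later in zip(scores, scores[1:])):
        problems.append("stream is not ranked")
    return problems
```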
Testable corollaries:
- A system with no substrate cannot have agency. (AutoGPT, §3.)
- A system with a substrate but no continuous goal-generation process does not have agency even if it could. The capacity has to be running.
- A system with a goal generator but a shallow substrate produces shallow goals. (§6, the 35% that didn’t validate.)
- A system with an actively-curated substrate (write-propagation, bi-temporal invalidation, scheduled reorganization) has higher-quality agency than one with a passive store.
- Agency is not binary. It scales with substrate quality and goal-generator sophistication. Prime has bounded-domain agency; general agency is the same architecture with a richer substrate.
And a consequence:
Most of what the industry calls “agents” in 2026 are execution surfaces. They are useful. They are sophisticated. They are not agents in the sense this article means.
Categorization, not insult. The best execution surfaces — Claude Code, OpenClaw, Claude Agent SDK, MoltWorker — are valuable precisely because they are great at executing. The problem isn’t that they’re executors; it’s that “agent” has been applied to them in a way that obscures what’s actually missing.
9. What to Build If You Want Agency
The practical implications for builders separate cleanly by altitude:
If you are building an executor (an agent framework, a Claude-based coding tool, a sandbox runner): optimize for the handoff from a goal record. Accept structured goal inputs with success criteria and citations. Report outcomes back. Don’t try to also be a goal generator — that’s a different product. Make it easy for someone else’s goal generator to drive you.
If you are building a substrate (a memory system, a knowledge graph, a content-addressed store): the active-memory survey names the primitive you need — write propagation. Writes must trigger pipelines. Pipelines must mutate related entities. Scheduled consolidation passes must re-rank and re-embed. Threshold-crossing events must emit webhooks. Without these, you have a database, and goal generation over a database fails the AutoGPT test.
If you are building a goal generator: the substrate-quality problem is your problem. You cannot generate good goals from a thin substrate. Roadmap: adopt a bi-temporal edge schema, adopt the deriver/dreamer/webhooks shape, adopt a per-domain desired-state doc convention, adopt a bounded goal vocabulary per domain, run the spike, measure, fix the substrate where it produced null results, re-run.
If you are betting your company on any single agent framework: separate the layers. Keep goal generation and the substrate in your own hands. Use whoever’s executor is best this quarter; be ready to swap.
Most of all: stop asking “is this system autonomous?” and start asking “does this system generate its own goals from a substrate it continuously re-grounds in?” The first is a branding question. The second is an engineering question with a test attached.
10. Honest Caveats
Three things this article does not claim.
It does not claim goal generation is solved. Our own spike was 65% validation on a partially-mature substrate. That is a promising first data point, not a victory. The test that matters is running the same shape of spike on a substrate we have deliberately hardened, against a goal-generation stack we have deliberately built, and seeing whether 65% becomes 85%. We have not run that test.
It does not claim existing systems are “wrong.” OpenClaw, Letta, Claude Code, MoltWorker, Agent SDK — these are excellent at what they do. Re-categorizing them as executors rather than agents changes their position on the taxonomy; it does not reduce their value. Naming the layer clearly makes them more useful, because it’s then clear what they should integrate with.
It does not claim active memory is the only path. Active memory is a plausible substrate shape, informed by prior art and tested in a partial spike. Other shapes exist — RL over long horizons, model-internal memory (EM-LLM’s direction), human-in-the-loop-with-agent-assist, formal planning over typed domains (PDDL descendants). The active-memory bet is that goal generation becomes a substrate-query problem once the substrate has the shape described in §5. We could be wrong. The test is empirical.
These caveats are the precondition for the article’s central claim being useful: we draw a line between two distinct things (executors and goal generators), name the hard open problem (open-ended goal generation), point at one specific path through it (active memory), and report our own partial result honestly. If the path fails, the distinction still holds, and whatever solves goal generation will still be the thing that matters.
Related Articles
- Context is a Harness Artifact — why an agent’s “context window” is not the substrate.
- The Harness Is a Prompt Compiler — how the outer loop shapes what the LLM sees.
- Thinking Is Substrate Self-Modification — the theoretical companion to this article; this article is its operational test.
- Memory as Lazy Queries Over the World — why memory is a retrieval function, not a container.
- Dreaming and the Effect Gate — what scheduled reorganization passes do and why they matter.
- Org Prime Agent Architecture — the bounded-domain existence proof referenced in §4.
- Autonomous Entity Pattern — the hierarchy-of-agents pattern that executor-swap assumes.
Sources
Internal
- `/Users/admin/Work/_readme/wiki/business/autonomous-agents-context-continuity.md` — survey of 10 context-continuity frameworks (OpenClaw, NanoClaw, NanoBot, Claude Agent SDK, Claude Code, Hermes Agent, LangGraph, Agent Zero, Cloudflare Agents SDK, MoltWorker).
- `/Users/admin/Work/_readme/wiki/business/active-memory-sota-survey.md` — survey of 11 active-memory systems (Letta, Mem0, Graphiti, Cognee, A-MEM, MemoryOS, Honcho, LangMem, HippoRAG, EM-LLM, Anthropic Memory Tool).
- `/Users/admin/Work/jane/wiki/6-research/goal-generation-spike-s1.md` — the 2026-04-19 spike referenced in §6, including self-rating (7/8/7/6 across coherence, grounding, specificity, sequencing) and gap analysis.
- `/Users/admin/Work/_readme/articles/org-prime-agent-architecture/README.md` — bounded-domain existence proof of open-ended goal generation.
- `/Users/admin/Work/_readme/articles/autonomous-entity-pattern/README.md` — the hierarchy-of-agents pattern.
External
- AutoGPT retrospectives (2023–2024) and the BabyAGI task-loop paper (Nakajima, 2023) — the failure-mode analysis summarized in §3 is the consensus view across community postmortems, not a single source.
- Dennett, D. (1987). The Intentional Stance. MIT Press — the predictive-stance framing of agency.
- Korsgaard, C. (2009). Self-Constitution: Agency, Identity, and Integrity. Oxford University Press — practical agency as self-authored goal-setting.
- Friston, K. (2010). “The free-energy principle: a unified brain theory?” Nat Rev Neurosci. — active-inference framing of agents as surprise-minimizers.
- Silver, D., et al. (2017). “Mastering the game of Go without human knowledge.” Nature 550 — AlphaZero as bounded-domain goal generation via MCTS.
- Graphiti edge schema — `graphiti_core/edges.py`, `EntityEdge` class with bi-temporal validity fields.
- A-MEM memory-evolution prompt — `memory_layer.py`, `AgenticMemorySystem.evolution_system_prompt`.
- Honcho active-substrate decomposition — `src/deriver/`, `src/dreamer/`, `src/webhooks/` (AGPL-3.0; pattern referenced, code not reused).