Gary Wu

AI Context Efficiency


Org Status: 🟡 Dormant | Cloudflare: N/A | Last Audited: 2026-04-28


Most people use AI coding assistants like a search engine with attitude. They open a chat, dump in a vague question, get a vague answer, and close the tab. They are leaving 80% of the capacity on the table. The developers who get 10-50x more value from the same subscription have figured out something fundamental: the context window is a computer, and you need to program it.

This is a practitioner’s field guide to context engineering for AI coding assistants. It is based on running 50+ AI sessions per week across multiple concurrent projects, tracking every token, and iterating on the system for months. The patterns work with Claude Code, Cursor, Copilot, Windsurf, Aider, or any tool that gives you a context window to manage.

What you will learn:


  1. The Context Window Problem
  2. Token Economics: What You Are Actually Paying For
  3. Session-Based Workflows
  4. Token Budgeting
  5. Context Layering: The 3-Layer Architecture
  6. Two-Lane Concurrency
  7. Capacity Pacing
  8. Persistent Memory Systems
  9. The Train System
  10. Tool and Agent Design
  11. Comparisons
  12. Anti-Patterns
  13. References

The Context Window Problem

Every AI coding assistant has a context window: a fixed-size buffer of tokens that represents everything the model can “see” at once. For Claude, this is 200K tokens (about 150K words). For GPT-4o, it is 128K. For Gemini, it is up to 1M but with significant degradation at the long end.

Here is the problem: most users fill this window with garbage.

A typical unstructured AI coding session looks like this:

  1. Open a new conversation
  2. Paste in 3 files (8K tokens)
  3. Ask a question (50 tokens)
  4. Get a response (2K tokens)
  5. Ask a follow-up that contradicts the first question (80 tokens)
  6. Paste in 2 more files because the AI “doesn’t understand” (6K tokens)
  7. Get a response that hallucinated because the context is now a mess (3K tokens)
  8. Give up and start a new conversation

Total tokens consumed: ~20K. Useful work accomplished: maybe 2K tokens of actual insight, buried in 18K tokens of noise. And the next session starts from zero because nothing was persisted.

Compare this to a structured session:

  1. AI opens, reads a 2K-token memory file that contains project state, decisions, and priorities
  2. AI reads a 1K-token priority list from a GitHub issue
  3. AI identifies the goal, reads exactly the files it needs (5K tokens)
  4. AI executes the work, committing code, updating issues, logging progress
  5. AI closes, writing 500 tokens of state back to memory for the next session

Total tokens consumed: ~15K. Useful work: a shipped feature, updated documentation, and persistent state that makes the next session even more efficient.

Key insight: Context engineering is not about cramming more into the window. It is about curating what goes in so that every token earns its place.

Why Existing Approaches Fall Short

“Just chat with it”: No persistence, no structure, no continuity. Each conversation starts from scratch. You re-explain your project, your preferences, your architecture every single time.

“Paste in everything”: Context windows have a quality problem, not just a size problem. Research shows that model attention degrades as context grows, a phenomenon Anthropic calls “context rot.” Pasting in your entire codebase does not help; it hurts.

“Use the biggest model”: A 1M-token context window does not solve the curation problem. It just means you can be sloppy with more tokens before things break. The retrieval accuracy at the edges of a 1M window is measurably worse than a well-curated 50K window.

“Memory features will save me”: ChatGPT’s memory, Windsurf’s Cascade memories, and similar features store facts but not workflows. They cannot encode “when you open a dev session, read the priority list, check for conflicts with the other lane, and pick a goal.” That requires structure.

What Changes If You Get This Right

The difference between unstructured and structured AI usage is not incremental. It is categorical:

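| Metric                            | Unstructured | Structured |
|-----------------------------------|--------------|------------|
| Sessions before useful output     | 3-5          | 1          |
| Context wasted on re-explanation  | 40-60%       | <5%        |
| Cross-session continuity          | None         | Full       |
| Concurrent workstreams            | 1            | 2+         |
| Weekly capacity utilization       | 20-40%       | 90-100%    |
| Subscription ROI (API-equivalent) | 3-5x         | 15-60x     |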

Those are real numbers from tracked sessions. On a single day running structured sessions, one developer logged $91 in API-equivalent value on a $200/month subscription --- a 13x daily ROI. On a day of unstructured chatting, the same developer might generate $5-10 of value.


Token Economics: What You Are Actually Paying For

Before you can manage context efficiently, you need to understand what tokens cost and where they go.

Token Pricing (Claude API, March 2026)

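| Model             | Input (per 1M tokens) | Output (per 1M tokens) | Context Window     |
|-------------------|-----------------------|------------------------|--------------------|
| Claude Opus 4.6   | $5.00                 | $25.00                 | 200K (1M extended) |
| Claude Sonnet 4.6 | $3.00                 | $15.00                 | 200K               |
| Claude Haiku 4.5  | $1.00                 | $5.00                  | 200K               |
| GPT-4o            | $2.50                 | $10.00                 | 128K               |
| GPT-4o mini       | $0.15                 | $0.60                  | 128K               |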

Key insight: For the Claude models above, output tokens cost 5x as much as input tokens (4x for the GPT-4o models). On Claude, a 2K-token response costs the same as a 10K-token input. This means verbose AI responses are disproportionately expensive, and extended thinking (which bills as output) can dominate your budget.
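
The pricing arithmetic can be checked with a few lines of TypeScript. This is a minimal sketch using the per-million prices from the table above; the model keys and the function names `apiCostUSD` and `outputToInputRatio` are hypothetical labels chosen for illustration, not identifiers from any SDK.

```typescript
// Per-million-token prices (USD), copied from the pricing table above.
// The string keys are illustrative labels, not official API model IDs.
type Model =
  | "claude-opus-4.6"
  | "claude-sonnet-4.6"
  | "claude-haiku-4.5"
  | "gpt-4o"
  | "gpt-4o-mini";

const PRICES: Record<Model, { input: number; output: number }> = {
  "claude-opus-4.6": { input: 5.0, output: 25.0 },
  "claude-sonnet-4.6": { input: 3.0, output: 15.0 },
  "claude-haiku-4.5": { input: 1.0, output: 5.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// API-equivalent cost in USD for a single request.
function apiCostUSD(model: Model, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// How many input tokens one output token is "worth" for a given model.
function outputToInputRatio(model: Model): number {
  const p = PRICES[model];
  return p.output / p.input;
}

// A 2K-token response and a 10K-token input cost the same on a 5x model.
console.log(apiCostUSD("claude-opus-4.6", 0, 2_000), apiCostUSD("claude-opus-4.6", 10_000, 0));
```

On a subscription these dollar figures are only "API-equivalent" values, but the ratio still shows which side of the exchange burns capacity fastest.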

Subscription vs API Economics

Most individual developers use subscriptions, not API billing:

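| Plan           | Monthly Cost | Approximate Weekly Token Budget | API-Equivalent Value |
|----------------|--------------|---------------------------------|----------------------|
| Claude Pro     | $20          | ~1M output tokens               | ~$100-200/week       |
| Claude Max 5x  | $100         | ~5M output tokens               | ~$500-1000/week      |
| Claude Max 20x | $200         | ~20M output tokens              | ~$2000-4000/week     |
| Cursor Pro     | $20          | Varies (500 requests/mo)        | ~$100-300/week       |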

On a subscription plan, you are not paying per token --- you are paying for a weekly capacity bucket. This fundamentally changes the optimization target: you want to use 100% of your capacity, not minimize usage. An unused subscription day is wasted money.

Where Tokens Actually Go

Here is a real breakdown from a tracked dev session:

Session: Train 6 S1 - 21 articles published
API-equivalent cost: ~$45
Duration: ~3 hours

Token breakdown (approximate):
  System prompt + CLAUDE.md:     4,000 tokens  (input, cached)
  Memory file (MEMORY.md):       3,500 tokens  (input)
  GitHub issues (#12, #13, #30): 2,500 tokens  (input)
  Playbook (CLAUDE-main.md):     2,000 tokens  (input)
  Code files read during work:  35,000 tokens  (input)
  Tool calls + responses:       20,000 tokens  (input/output)
  AI reasoning + output:        80,000 tokens  (output)
  Extended thinking:            50,000 tokens  (output, billed as output)
  ─────────────────────────────────────────────
  Total:                       ~197,000 tokens

The session startup cost (everything before actual work begins) was about 12K tokens. That is the β€œtax” you pay to restore context. The goal of context engineering is to minimize this tax while maximizing the signal it delivers.

The Input Token Tax

Every message you send to the AI includes the entire conversation history as input tokens. If your conversation is 50K tokens deep, every new message costs 50K input tokens plus your new message plus the response. This means:

// The cost of message N in a conversation
interface MessageCost {
  // Everything before this message
  conversationHistory: number; // grows with every exchange
  // The new content
  newInput: number;           // your message + any file reads
  // The response
  output: number;             // AI's response + thinking

  // Total input = conversationHistory + newInput
  // This is why long conversations get expensive fast
}

// Example: 20-message conversation
// Message 1:  input=2K,  output=1K  → total=3K
// Message 5:  input=15K, output=2K  → total=17K
// Message 10: input=40K, output=3K  → total=43K
// Message 20: input=90K, output=2K  → total=92K
// By message 20, you're paying 90K tokens just to maintain history

This is why Claude Code has auto-compaction at ~95% of the context window, and why /clear between unrelated tasks is not optional --- it is essential.
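
To see how quickly resent history adds up, here is a minimal simulation. It assumes the simplest possible model: every turn resends the full history, with no prompt caching and no compaction. The names `Turn`, `billedInputPerTurn`, and `totalBilledInput` are hypothetical.

```typescript
interface Turn {
  newInput: number; // tokens you add this turn (message + file reads)
  output: number;   // tokens the model generates this turn
}

// Input tokens billed on each turn when the whole history is resent every time.
function billedInputPerTurn(turns: Turn[]): number[] {
  let history = 0;
  const billed: number[] = [];
  for (const t of turns) {
    billed.push(history + t.newInput);
    history += t.newInput + t.output; // both sides of the exchange join the history
  }
  return billed;
}

// Total input tokens billed across the whole conversation.
function totalBilledInput(turns: Turn[]): number {
  return billedInputPerTurn(turns).reduce((sum, n) => sum + n, 0);
}

// Ten identical turns: 1K tokens in, 1K tokens out.
const tenTurns: Turn[] = [];
for (let i = 0; i < 10; i++) {
  tenTurns.push({ newInput: 1_000, output: 1_000 });
}

// You typed 10K tokens of new input, but were billed for ten times that.
console.log(totalBilledInput(tenTurns)); // 100000
```

Billed input grows roughly with the square of the conversation length, which is why clearing between unrelated tasks pays off so quickly.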


Session-Based Workflows

The most impactful pattern in this entire guide is treating AI interactions as sessions rather than conversations. A session has a lifecycle: it opens, does work, and closes. Each phase has specific responsibilities.

The Session Lifecycle

┌──────────┐     ┌──────────┐     ┌──────────┐
│   OPEN   │────▶│   WORK   │────▶│  CLOSE   │
│          │     │          │     │          │
│ • Load   │     │ • Execute│     │ • Persist│
│   state  │     │   goal   │     │   state  │
│ • Brief  │     │ • Log    │     │ • Update │
│ • Pick   │     │  progress│     │   memory │
│   goal   │     │ • Stay   │     │ • Log    │
│          │     │   focused│     │   cost   │
└──────────┘     └──────────┘     └──────────┘

The Session Interface

interface Session {
  type: "dev" | "planning" | "ops" | "architecture" | "monetization";
  goal: string;       // One sentence. If you can't state it, don't start.
  lane: "main" | "aux"; // Which concurrency lane (more on this later)

  open(): Promise<SessionContext>;
  work(): Promise<WorkResult>;
  close(): Promise<void>;
}

interface SessionContext {
  memory: MemoryFile;        // Persistent state from last session
  priorities: PriorityList;  // What needs doing, ranked
  sessionLog: LogEntry[];    // Recent session history (detect conflicts)
  planningBook: Signal[];    // Strategic signals to process
  capacityBudget: TokenBudget; // How much output you can afford today
}

interface WorkResult {
  shipped: string[];         // Concrete deliverables
  decisions: string[];       // Choices made and why
  insights: string[];        // Things that change how you think
  blockers: string[];        // What went wrong
}

OPEN: Restoring Context Efficiently

The OPEN phase is where context engineering matters most. You need to reload enough state for the AI to be productive without wasting tokens on information it does not need.

Here is a real OPEN sequence from a production system:


1. Read MEMORY.md (status, issue index, decisions)
2. Run capacity check (weekly token budget: report pace)
3. Read session log (scan for active parallel sessions)
4. Read priorities (pick a goal)
5. Read planning book (is strategic planning overdue?)

Then: Self-brief in 3-5 lines. State where we are, what was done
last, what is next. If a parallel session is active, note it.

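Each step is intentional: memory restores state, the capacity check sets the pace, the session log reveals parallel sessions, the priorities supply the goal, and the planning book shows whether strategic planning is overdue.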

Key insight: The OPEN sequence is your session’s β€œboot sequence.” Like a computer BIOS, it should be fast, deterministic, and load only what is needed. A bloated OPEN wastes tokens on every single session.
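
The boot-sequence idea can be sketched as a function that runs a fixed list of loaders and reports what the startup cost. This is an illustration only: the `OpenStep` loaders are hypothetical stand-ins for reading MEMORY.md, the session log, and so on, and the 4-characters-per-token estimate is a rough assumption, not a real tokenizer.

```typescript
// One OPEN step: a label plus a loader that returns the text to put in context.
interface OpenStep {
  name: string;
  load: () => string;
}

interface OpenResult {
  order: string[];         // step names in the order they ran
  estimatedTokens: number; // rough startup cost
  overBudget: boolean;     // true when the boot sequence is bloated
}

// Rough heuristic (an assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Runs the steps in a fixed order and totals their estimated token cost.
function runOpen(steps: OpenStep[], budgetTokens: number): OpenResult {
  const order: string[] = [];
  let estimatedTokens = 0;
  for (const step of steps) {
    estimatedTokens += estimateTokens(step.load());
    order.push(step.name);
  }
  return { order, estimatedTokens, overBudget: estimatedTokens > budgetTokens };
}

// Stub loaders; a real system would read files and issues here.
const bootResult = runOpen(
  [
    { name: "memory", load: () => "status, issue index, decisions" },
    { name: "capacity", load: () => "weekly pace report" },
    { name: "session-log", load: () => "recent OPEN/CLOSE entries" },
    { name: "priorities", load: () => "ranked goals" },
    { name: "planning-book", load: () => "unprocessed signals" },
  ],
  12_000,
);
console.log(bootResult.order.join(" -> "), bootResult.overBudget);
```

Keeping the step list as data makes the OPEN cost auditable: each loader can be measured and trimmed on its own.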

WORK: Staying Focused

The WORK phase has one rule: do the thing you said you would do at OPEN.


Rules:
- One group, one goal. Don't wander.
- If strategy is wrong, that IS the finding: surface it, stop building.
- Log progress as you go (comments on issues, commit messages)
- If you notice something strategic, capture it in the planning book
  immediately. Don't try to solve it now.
- Seek close context: after the main goal, look for small wins in
  the same area. Don't start new groups.

The β€œcapture, don’t execute” pattern is critical. During a dev session, you will inevitably notice architectural issues, missing features, or strategic opportunities. Without discipline, these become rabbit holes that consume your entire session. Instead:

// Anti-pattern: Scope creep during execution
async function devSession() {
  await fixLoginBug();          // The goal
  await refactorAuthModule();   // "While I'm here..."
  await addOAuth2Support();     // "This would be easy to add..."
  await redesignUserModel();    // "Actually the data model is wrong..."
  // Result: nothing shipped, everything half-done
}

// Pattern: Capture and continue
async function devSession() {
  await fixLoginBug();          // The goal
  // Noticed auth module needs refactoring
  await captureToPlanning("🏗️ ARCHITECTURE: auth module needs refactoring - see commit abc123");
  // Noticed OAuth would be valuable
  await captureToPlanning("💡 OPPORTUNITY: OAuth2 support - 3 customers asked for it");
  // Done. One thing shipped.
}

CLOSE: Persisting State

CLOSE is the most underrated phase. It is where you convert a session’s ephemeral knowledge into persistent state that compounds across sessions.


1. Update priorities: check off done items
2. Update work issues: description current, has progress comments, close if done
3. Capture to planning book: anything strategic not yet captured
4. Log the session: type, summary, what shipped
5. Update daily log: insights, connections, blockers
6. Update MEMORY.md, but only your lane's section
7. Log session cost: track API-equivalent spend
8. Check capacity: report pace vs weekly budget
9. Say "Session closed. Safe to /clear."

The final line is important. /clear destroys the conversation context, but everything important has been persisted to files and issues. The next session can reconstruct full context in ~12K tokens instead of the 100K+ tokens the conversation consumed.

interface CloseChecklist {
  // State persistence (required)
  memoryUpdated: boolean;        // MEMORY.md reflects current state
  prioritiesUpdated: boolean;    // Done items checked off
  issuesCurrent: boolean;        // Work issues have progress comments
  sessionLogged: boolean;        // #12 has CLOSE entry

  // Knowledge capture (required)
  planningBookCaptured: boolean; // Strategic signals preserved
  dailyLogWritten: boolean;      // Insights, decisions, connections

  // Analytics (required)
  costLogged: boolean;           // API-equivalent spend tracked
  capacityChecked: boolean;      // Weekly budget pace reported
}

// Gate: the minimum that must hold before /clear.
// An interface cannot carry a method body, so the check is a standalone function.
function safeToClose(checklist: CloseChecklist): boolean {
  return checklist.memoryUpdated
    && checklist.sessionLogged
    && checklist.costLogged;
}

Session Types

Not all sessions are the same. Different work requires different context loads and different OPEN sequences:

type SessionType =
  | "dev"           // Execute one goal. Ship code.
  | "planning"      // Evaluate past work. Launch next sprint.
  | "ops"           // Build automation. Reduce interactive time.
  | "architecture"  // Process ideas. Design systems.
  | "monetization"; // Find the shortest path to money.

// Each type loads different context at OPEN:
const contextLoads: Record<SessionType, string[]> = {
  dev: ["MEMORY.md", "#13 priorities", "#12 session log", "#30 planning book"],
  planning: ["MEMORY.md", "#30 planning book (ALL comments)", "#13 priorities", "#12 session log"],
  ops: ["MEMORY.md", "#49 ops log", "#12 session log"],
  architecture: ["MEMORY.md", "Ideas/Inbox/*", "#33 pipeline", "#13 priorities"],
  monetization: ["MEMORY.md", "#50 monetization log", "#23 revenue loop", "Brands/*.md"],
};

Planning sessions load more context than dev sessions because they need to see the full picture. Dev sessions load the minimum needed to execute a single goal. This is intentional --- every token of context load is a tax, and you should only pay for what you need.


Token Budgeting

If you cannot measure it, you cannot manage it. Token budgeting means tracking what costs tokens, auditing your session startup cost, and cutting bloat.

Measuring Session Cost

Here is a lightweight cost-tracking script that runs at session CLOSE:

#!/bin/bash

TYPE="${1:-W}"
DESC="${2:-session}"
TODAY=$(date +"%Y-%m-%d")
STATE_FILE=".session-cost-state"
LOG_FILE="session-costs.md"

CURRENT=$(npx ccusage daily --limit 1 --json 2>/dev/null | python3 -c "
import sys, json
d = json.load(sys.stdin)
days = d.get('daily', [])
today = '$TODAY'
for day in days:
    if day.get('date','') == today:
        print(f'{day.get(\"totalCost\", 0):.2f}')
        sys.exit()
if days:
    print(f'{days[-1].get(\"totalCost\", 0):.2f}')
" 2>/dev/null)

PREV_COST="0"
if [ -f "$STATE_FILE" ]; then
  PREV_DATE=$(head -1 "$STATE_FILE")
  PREV_COST=$(tail -1 "$STATE_FILE")
  [ "$PREV_DATE" != "$TODAY" ] && PREV_COST="0"
fi

# Fall back to 0 if ccusage returned nothing, so bc always gets a valid expression
CURRENT="${CURRENT:-0}"
DELTA=$(echo "$CURRENT - $PREV_COST" | bc 2>/dev/null)

echo "$TODAY" > "$STATE_FILE"
echo "$CURRENT" >> "$STATE_FILE"
echo "| $TODAY | [$TYPE] | ~\$$DELTA | ~\$$CURRENT | $DESC |" >> "$LOG_FILE"
echo "Logged: [$TYPE] $TODAY - session: ~\$$DELTA (day total: ~\$$CURRENT)"

Real Session Cost Data

Here is actual session cost data from one day of tracked sessions:

| Date       | Type | Session  | Day Total | Description                                    |
|------------|------|----------|-----------|------------------------------------------------|
| 2026-03-12 | [A]  | ~$4      | ~$4       | Social intelligence + vault tooling              |
| 2026-03-12 | [P]  | ~$3      | ~$7       | Train system established, Train 4 launched       |
| 2026-03-12 | [A]  | ~$2      | ~$9       | Two-lane concurrency + session type design       |
| 2026-03-12 | [D]  | ~$4      | ~$13      | Affiliate blocks live on 33 articles             |
| 2026-03-12 | [O]  | ~$0.18   | ~$13      | Fixed cost tracking, created smoke tests         |
| 2026-03-12 | [O]  | ~$1.82   | ~$15      | Stale issue detection + planning prep tools      |
| 2026-03-12 | [D]  | ~$4.91   | ~$20      | Affiliate directory (12 tools)                   |
| 2026-03-12 | [D]  | ~$20     | ~$40      | 1Password CLI + domain portfolio audit           |
| 2026-03-12 | [D]  | ~$6      | ~$48      | Custom domain wiring, standard pages             |
| 2026-03-12 | [O]  | ~$7      | ~$55      | Twitter API proxy + MCP tools                    |
| 2026-03-12 | [O]  | ~$3.50   | ~$59      | MCP server consolidation (3→1, 21 tools)         |
| 2026-03-12 | [D]  | ~$4      | ~$71      | Standard pages, content gen triggered            |
| 2026-03-12 | [P]  | ~$2      | ~$73      | Train 4 evaluated, Train 5 launched              |
| 2026-03-12 | [O]  | ~$11     | ~$85      | YouTube intelligence via API proxy               |
| 2026-03-12 | [D]  | ~$3      | ~$89      | SM→GatherFeed intelligence wiring                |
| 2026-03-12 | [P]  | ~$2      | ~$91      | Train 5 evaluated, Train 6 launched              |

That is 22 sessions in one day, producing $91 in API-equivalent value on a $200/month subscription. Averaged over a month, the subscription costs about $6.60 per day, so that single day returned roughly 13.8x its daily cost.

Notice the variance: ops sessions cost $0.18-$11, dev sessions $1-$45, planning sessions $2-3. This variance is useful --- it tells you which session types are token-heavy and where to optimize.
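
A per-type summary like the one above can be computed straight from the cost log. The sketch below assumes the log rows follow the `| date | [T] | ~$session | ~$dayTotal | description |` layout shown earlier; `parseCostLog`, `summarizeByType`, and `dailyRoi` are hypothetical helper names, and the ROI helper assumes an average month of 365/12 days.

```typescript
interface CostRow {
  date: string;
  type: string;       // single-letter session type, e.g. "D", "O", "P"
  sessionUSD: number; // API-equivalent cost of that session
}

// Parses markdown table rows; header and separator rows are skipped.
function parseCostLog(markdown: string): CostRow[] {
  const rows: CostRow[] = [];
  for (const line of markdown.split("\n")) {
    // The first cell is the empty text before the leading pipe.
    const [, date = "", type = "", cost = ""] = line.split("|").map((c) => c.trim());
    if (!/^\[[A-Z]\]$/.test(type) || !/^~?\$[0-9.]+$/.test(cost)) continue;
    rows.push({ date, type: type.charAt(1), sessionUSD: parseFloat(cost.replace(/^~?\$/, "")) });
  }
  return rows;
}

interface TypeSummary {
  count: number;
  min: number;
  max: number;
  total: number;
}

// Groups sessions by type so the token-heavy types stand out.
function summarizeByType(rows: CostRow[]): Map<string, TypeSummary> {
  const out = new Map<string, TypeSummary>();
  for (const r of rows) {
    const s = out.get(r.type) ?? { count: 0, min: Infinity, max: -Infinity, total: 0 };
    s.count += 1;
    s.min = Math.min(s.min, r.sessionUSD);
    s.max = Math.max(s.max, r.sessionUSD);
    s.total += r.sessionUSD;
    out.set(r.type, s);
  }
  return out;
}

// Value produced in one day divided by the subscription's average daily cost.
function dailyRoi(dayValueUSD: number, monthlyPriceUSD: number, daysPerMonth: number = 365 / 12): number {
  return dayValueUSD / (monthlyPriceUSD / daysPerMonth);
}
```

For the day logged above, `dailyRoi(91, 200)` comes out at about 13.8.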

Auditing Your OPEN Cost

The session startup cost is the tax you pay before any real work begins. Here is how to audit it:

// Before optimization: 42K token OPEN sequence
interface BloatedOpen {
  claudeMd: 4_000;           // System instructions
  memoryMd: 8_000;           // Entire memory file, including stale sections
  issueView12: 12_000;       // Full session log (all 50+ comments)
  issueView13: 6_000;        // Full priorities (with all comments)
  issueView30: 8_000;        // Full planning book (all comments)
  playbook: 4_000;           // Full playbook file
  // Total: ~42,000 tokens BEFORE any work begins
}

// After optimization: 12K token OPEN sequence
interface OptimizedOpen {
  claudeMd: 2_000;           // Trimmed to essentials, moved details to skills
  memoryMd: 3_500;           // Structured with lane sections, only read your lane
  sessionContextSh: 2_500;   // Lightweight fetcher: last 5 comments only
  prioritiesBody: 1_500;     // Body only, skip comment history
  playbook: 2_000;           // Lane-specific playbook, not both
  capacityCheck: 500;        // Compact capacity report
  // Total: ~12,000 tokens β€” 71% reduction
}
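
The audit itself is simple arithmetic, and it is worth automating so the numbers stay honest. This is a minimal sketch using the token counts from the two listings above; `AuditItem`, `totalTokens`, `reductionPercent`, and `biggestCosts` are hypothetical names.

```typescript
interface AuditItem {
  name: string;
  tokens: number;
}

// Token counts copied from the "before" and "after" OPEN listings above.
const bloatedOpen: AuditItem[] = [
  { name: "claudeMd", tokens: 4_000 },
  { name: "memoryMd", tokens: 8_000 },
  { name: "issueView12", tokens: 12_000 },
  { name: "issueView13", tokens: 6_000 },
  { name: "issueView30", tokens: 8_000 },
  { name: "playbook", tokens: 4_000 },
];

const optimizedOpen: AuditItem[] = [
  { name: "claudeMd", tokens: 2_000 },
  { name: "memoryMd", tokens: 3_500 },
  { name: "sessionContextSh", tokens: 2_500 },
  { name: "prioritiesBody", tokens: 1_500 },
  { name: "playbook", tokens: 2_000 },
  { name: "capacityCheck", tokens: 500 },
];

function totalTokens(items: AuditItem[]): number {
  return items.reduce((sum, item) => sum + item.tokens, 0);
}

// Percentage of startup tokens removed, rounded to a whole percent.
function reductionPercent(before: AuditItem[], after: AuditItem[]): number {
  return Math.round((1 - totalTokens(after) / totalTokens(before)) * 100);
}

// Largest items first, so you know where to cut next.
function biggestCosts(items: AuditItem[], top: number = 3): AuditItem[] {
  return items.slice().sort((a, b) => b.tokens - a.tokens).slice(0, top);
}

console.log(totalTokens(bloatedOpen), totalTokens(optimizedOpen), reductionPercent(bloatedOpen, optimizedOpen));
```

Sorting by cost points at the full issue views first, which is where the biggest savings in this audit came from.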

The key optimization was the lightweight fetcher pattern --- a shell script that replaces expensive gh issue view calls with targeted API queries:

#!/bin/bash

REPO="your-org/your-repo"
TARGET="${1:-all}"

fetch_issue_body() {
  local num=$1
  local title=$2
  echo "--- #$num $title ---"
  gh api "repos/$REPO/issues/$num" --jq '.body' 2>/dev/null | head -80
  echo ""
}

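fetch_recent_comments() {
  local num=$1
  local title=$2
  local count=${3:-5}
  echo "--- #$num $title (last $count comments) ---"
  # The per-issue comments endpoint returns comments oldest-first and does not
  # accept sort/direction, so page through them, emit one line per comment,
  # and keep only the last $count lines.
  gh api --paginate "repos/$REPO/issues/$num/comments?per_page=100" \
    --jq '.[] | "[" + .created_at[0:10] + "] " + (.body[0:500] | gsub("\\s+"; " "))' 2>/dev/null \
    | tail -n "$count"
  echo ""
}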

if [ "$TARGET" = "12" ] || [ "$TARGET" = "all" ]; then
  fetch_recent_comments 12 "Session Log" 5
fi

if [ "$TARGET" = "13" ] || [ "$TARGET" = "all" ]; then
  fetch_issue_body 13 "Priorities"
fi

if [ "$TARGET" = "30" ] || [ "$TARGET" = "all" ]; then
  fetch_recent_comments 30 "Planning Book" 5
fi

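This script keeps the OPEN cost down by fetching only what is needed: issue bodies capped at 80 lines, and the five most recent comments truncated to 500 characters each. In the audit above, the three full issue views accounted for ~26K of the 42K startup tokens.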

Token Budget by Session Type

Different session types should have different token budgets:

interface TokenBudget {
  sessionType: SessionType;
  openCost: number;        // Target startup cost
  workBudget: number;      // Expected working tokens
  totalTarget: number;     // Target total session cost
}

const budgets: TokenBudget[] = [
  { sessionType: "dev",           openCost: 12_000, workBudget: 150_000, totalTarget: 180_000 },
  { sessionType: "planning",      openCost: 20_000, workBudget: 40_000,  totalTarget: 60_000  },
  { sessionType: "ops",           openCost: 8_000,  workBudget: 80_000,  totalTarget: 100_000 },
  { sessionType: "architecture",  openCost: 15_000, workBudget: 60_000,  totalTarget: 80_000  },
  { sessionType: "monetization",  openCost: 18_000, workBudget: 30_000,  totalTarget: 50_000  },
];

// Planning is expensive to OPEN but cheap to execute (mostly analysis, not code)
// Dev is moderate to OPEN but expensive to execute (reading + writing code)
// Ops varies wildly (quick scripts vs complex infrastructure)

Context Layering: The 3-Layer Architecture

The most effective context management pattern is a 3-layer architecture where information is loaded progressively based on need.

Layer 1: Always-Loaded (Tiny Index)

This layer is loaded into every session, regardless of type. It must be small --- under 200 lines, ideally under 4K tokens.



Brief project description. One paragraph max.


Directory layout. Just the top-level names.


- Dev [D]: Execute one goal. Ship code.
- Planning [P]: Evaluate and plan.
- Ops [O]: Build automation.
- Architecture [A]: Process ideas.
- Monetization [M]: Find revenue paths.


At OPEN, read your lane's playbook:
- Dev or Planning → Read CLAUDE-main.md
- Others → Read CLAUDE-aux.md


1. Reference issues by number β€” don't re-explain
2. Don't re-read files already in conversation
3. Grep/Glob directly β€” no agent for simple searches


List of available shell tools with one-line descriptions

This file tells the AI what kind of thing it is, what kinds of work exist, and where to find more detailed instructions. It does NOT contain the detailed instructions themselves --- those live in Layer 2.

Key insight: Layer 1 is a routing table, not a manual. It tells the AI where to look, not what to do. This keeps the always-loaded cost under 4K tokens.

Layer 2: Session-Loaded (Playbooks, Memory)

This layer is loaded based on the session type determined at OPEN. Different sessions load different playbooks.

// Layer 2 files and when they load
interface Layer2 {
  // Loaded by Dev and Planning sessions only
  "CLAUDE-main.md": {
    content: "Full OPEN/WORK/CLOSE playbook for dev and planning";
    tokens: 2_000;
    loadedBy: ["dev", "planning"];
  };

  // Loaded by Ops, Architecture, and Monetization sessions only
  "CLAUDE-aux.md": {
    content: "Full OPEN/WORK/CLOSE playbook for aux lane sessions";
    tokens: 3_000;
    loadedBy: ["ops", "architecture", "monetization"];
  };

  // Loaded by all sessions
  "MEMORY.md": {
    content: "Persistent state: project status, issue index, decisions";
    tokens: 3_500;
    loadedBy: ["all"];
    sections: {
      "MAIN LANE STATUS": "Written by dev/planning only",
      "AUX LANE STATUS": "Written by ops/arch only",
      "SHARED SECTIONS": "Both lanes read, additive only",
    };
  };
}

The key design decision is splitting playbooks by lane. A dev session never loads the aux playbook, and vice versa. This saves 2-3K tokens per session and prevents the AI from confusing session types.
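
The lane split can be expressed as a small lookup that assembles the startup context for a session type. This is a sketch under stated assumptions: the token figures are the estimates quoted in this guide, Layer 1 is treated as a single ~4K file, and `laneOf` and `startupContext` are hypothetical helper names.

```typescript
type SessionType = "dev" | "planning" | "ops" | "architecture" | "monetization";
type Lane = "main" | "aux";

interface ContextFile {
  path: string;
  tokens: number; // estimate, not a measured value
}

// Dev and Planning belong to the main lane; everything else is aux.
function laneOf(type: SessionType): Lane {
  return type === "dev" || type === "planning" ? "main" : "aux";
}

const LAYER_1: ContextFile[] = [{ path: "CLAUDE.md", tokens: 4_000 }];

const PLAYBOOKS: Record<Lane, ContextFile> = {
  main: { path: "CLAUDE-main.md", tokens: 2_000 },
  aux: { path: "CLAUDE-aux.md", tokens: 3_000 },
};

const MEMORY: ContextFile = { path: "MEMORY.md", tokens: 3_500 };

// Layer 1 plus the Layer 2 files for this session's lane only.
function startupContext(type: SessionType): { files: string[]; tokens: number } {
  const selected = [...LAYER_1, PLAYBOOKS[laneOf(type)], MEMORY];
  return {
    files: selected.map((f) => f.path),
    tokens: selected.reduce((sum, f) => sum + f.tokens, 0),
  };
}

console.log(startupContext("dev"));
console.log(startupContext("ops"));
```

Because the playbook is chosen by lane, the other lane's playbook never appears in the file list at all.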

Here is what a lane-specific playbook looks like:


Read this at OPEN for Dev or Planning sessions.
Do NOT read CLAUDE-aux.md (save tokens).

---


### OPEN (refresh → brief → goal → log)

1. Read MEMORY.md (status, issue index, decisions)
2. Run capacity check (weekly budget: report pace)
3. Read session log (scan for active parallel sessions)
4. Read priorities (pick a group to work on)
5. Read planning book (is planning overdue? 5+ unprocessed entries?)

### WORK (one goal, tight context, finish it)

- One group, one goal. Don't wander.
- If strategy is wrong, that IS the finding: surface it, stop.
- Log progress as you go on the work issue.

### CLOSE

1. Update priorities: check off done items
2. Update work issues: current descriptions, close if done
3. Capture to planning book: strategic signals
4. Log session to #12
5. Update MEMORY.md (your lane's section only)
6. Log session cost
7. Check capacity
8. "Session closed. Safe to /clear."

Layer 3: On-Demand (Full Files, APIs)

This layer is never pre-loaded. The AI fetches it during the WORK phase when it needs specific information.

// Layer 3: fetched on demand during work
type OnDemandSource =
  | { type: "file"; path: string }           // Read a specific source file
  | { type: "api"; endpoint: string }        // Call an API
  | { type: "search"; query: string }        // Search the codebase
  | { type: "issue"; number: number }        // Read a GitHub issue
  | { type: "subagent"; agent: string }      // Delegate to a specialized agent
  | { type: "web"; url: string };            // Fetch documentation

// Examples of on-demand fetches:
// - Reading a specific source file the AI needs to modify
// - Querying a database for keyword research data
// - Running a codebase search for usage patterns
// - Delegating a multi-file scan to a sub-agent
// - Fetching external documentation for a library

The key principle: Layer 3 content should NEVER be loaded speculatively. If the AI might need a file, it should not pre-load it β€œjust in case.” It should fetch it when the need is confirmed.
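
One way to enforce "fetch on confirmed need" in tooling is a lazy, memoizing loader: nothing is read until it is asked for, and nothing is read twice. The class below is a sketch, not part of any tool named in this guide; `OnDemandContext` and the injected `readSource` function are hypothetical.

```typescript
// Lazy loader for Layer 3 content. The reader function is injected so the
// same logic works for files, issues, or API calls.
class OnDemandContext {
  private cache: Map<string, string> = new Map();
  private readSource: (key: string) => string;
  reads = 0; // how many times the underlying source was actually hit

  constructor(readSource: (key: string) => string) {
    this.readSource = readSource;
  }

  // Returns cached text if this key was already fetched in the session.
  get(key: string): string {
    const hit = this.cache.get(key);
    if (hit !== undefined) return hit;
    const text = this.readSource(key);
    this.reads += 1;
    this.cache.set(key, text);
    return text;
  }

  // Keys that have been pulled into context so far.
  loadedKeys(): string[] {
    return Array.from(this.cache.keys());
  }
}

// Stub reader; a real one would read from disk or call an API.
const layer3 = new OnDemandContext((key) => "contents of " + key);
layer3.get("src/auth/login.ts");
layer3.get("src/auth/login.ts"); // served from cache, no second read
console.log(layer3.reads, layer3.loadedKeys());
```

The `reads` counter doubles as an audit trail: if it is high at CLOSE, the session was pulling in more Layer 3 material than its goal probably needed.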

The Full Architecture

┌──────────────────────────────────────────┐
│              Context Window              │
│                                          │
│  ┌─ Layer 1: Always-Loaded (~4K) ───────┐│
│  │ CLAUDE.md (routing table)            ││
│  │ System prompt                        ││
│  └──────────────────────────────────────┘│
│                                          │
│  ┌─ Layer 2: Session-Loaded (~8K) ──────┐│
│  │ Lane playbook (main OR aux)          ││
│  │ MEMORY.md (persistent state)         ││
│  │ Priority list (body only)            ││
│  │ Session log (last 5 comments)        ││
│  └──────────────────────────────────────┘│
│                                          │
│  ┌─ Layer 3: On-Demand (varies) ────────┐│
│  │ Source files (as needed)             ││
│  │ API responses (as needed)            ││
│  │ Sub-agent results (summarized)       ││
│  │ Documentation (fetched when used)    ││
│  └──────────────────────────────────────┘│
│                                          │
│  ┌─ Conversation History (grows) ───────┐│
│  │ Previous messages + responses        ││
│  │ (auto-compacted at ~95% capacity)    ││
│  └──────────────────────────────────────┘│
└──────────────────────────────────────────┘

Budget allocation:
  Layer 1:    ~4K tokens  (2% of 200K window)
  Layer 2:    ~8K tokens  (4% of 200K window)
  Layer 3:    ~50K tokens (25% of 200K window, variable)
  History:    ~120K tokens (60% of 200K window, grows)
  Headroom:   ~18K tokens (9% buffer for compaction)

Designing Each Layer

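Layer 1 design rules: keep it under 200 lines (ideally under 4K tokens), and treat it as a routing table that points to the playbooks rather than containing them.

Layer 2 design rules: load by session type, split playbooks by lane so a session never reads the other lane's playbook, and write only to your own lane's section of MEMORY.md.

Layer 3 design rules: never pre-load speculatively; fetch a file, issue, or API response only once the need is confirmed.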


Two-Lane Concurrency

Once you have session-based workflows working, you unlock a powerful capability: running two AI sessions simultaneously on different workstreams without them colliding.

The Problem

If two AI sessions are both modifying the same files, reading the same issues, and updating the same state, they will conflict.

The Solution: Lane-Based Partitioning

Partition your work into two lanes, each with its own owned resources:

interface Lane {
  name: "main" | "aux";
  sessionTypes: SessionType[];
  ownedResources: Resource[];
  sharedResources: Resource[];
}

const mainLane: Lane = {
  name: "main",
  sessionTypes: ["dev", "planning"],
  ownedResources: [
    "#13 priorities",           // Main lane restructures freely
    "work issues",              // Dev work tracked here
    "CLAUDE-main.md",           // Main lane playbook
    "MEMORY.md ## MAIN STATUS", // Main lane's memory section
  ],
  sharedResources: [
    "#12 session log",          // Both lanes append
    "#30 planning book",        // Both lanes append
    "MEMORY.md ## SHARED",      // Both lanes read, additive only
  ],
};

const auxLane: Lane = {
  name: "aux",
  sessionTypes: ["ops", "architecture", "monetization"],
  ownedResources: [
    "#33 ideas pipeline",       // Architecture tracking
    "#49 ops tracking",         // Ops tracking
    "#50 monetization tracking", // Monetization tracking
    "CLAUDE-aux.md",            // Aux lane playbook
    "MEMORY.md ## AUX STATUS",  // Aux lane's memory section
  ],
  sharedResources: [
    "#12 session log",
    "#30 planning book",
    "MEMORY.md ## SHARED",
  ],
};

Conflict Prevention Rules


1. **Comments are safe, edits are dangerous.**
   Prefer additive comments on shared issues over overwriting
   descriptions or files.

2. **Each lane owns different resources.**
   Main lane owns #13, work issues, CLAUDE-main.md.
   Aux lane owns #33/#49/#50, CLAUDE-aux.md.
   Shared: #12, #30, MEMORY.md.

3. **OPEN: detect the other lane.**
   Scan recent session log comments for unmatched OPEN markers.
   If a sibling is active, note it.

4. **CLOSE: write to your lane's MEMORY section.**
   MEMORY.md has ## MAIN LANE STATUS and ## AUX LANE STATUS.
   Each lane writes only to its own section.
   Shared sections are additive only.

5. **Priorities: main lane writes, aux lane appends.**
   Dev/Planning restructure #13 freely.
   Architecture/Ops only ADD items, never reorder.

6. **Planning book: entry-count trigger.**
   5+ unprocessed entries since last CLEARED.
   Either lane can comment; only Planning clears.

7. **File edits: coordinate by area.**
   Main lane edits code and project notes.
   Aux lane edits Ideas/, Standards/, agents, study decks.
   If both need the same file, aux lane defers.
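
Several of these rules reduce to one question: may this lane write to this resource, and in which mode? The sketch below encodes rules 1, 2, 4, and 5 as a lookup. The resource labels mirror the lane definitions above, while `canWrite`, `WriteMode`, and the `OWNERS` table are hypothetical; rule 6 (only Planning clears the planning book) and rule 7 (file areas) are not modeled.

```typescript
type Lane = "main" | "aux";
type WriteMode = "append" | "overwrite";

// Who owns each tracked resource. "shared" means both lanes may append.
const OWNERS: Record<string, Lane | "shared"> = {
  "#13 priorities": "main",
  "CLAUDE-main.md": "main",
  "MEMORY.md ## MAIN LANE STATUS": "main",
  "#33 ideas pipeline": "aux",
  "#49 ops tracking": "aux",
  "#50 monetization tracking": "aux",
  "CLAUDE-aux.md": "aux",
  "MEMORY.md ## AUX LANE STATUS": "aux",
  "#12 session log": "shared",
  "#30 planning book": "shared",
  "MEMORY.md ## SHARED": "shared",
};

function canWrite(lane: Lane, resource: string, mode: WriteMode): boolean {
  const owner = OWNERS[resource];
  if (owner === undefined) return false;            // unknown resource: refuse rather than guess
  if (owner === lane) return true;                  // owners may append or overwrite
  if (owner === "shared") return mode === "append"; // shared resources are additive only
  // Rule 5: the aux lane may add items to #13 but never restructure it.
  return resource === "#13 priorities" && mode === "append";
}

console.log(canWrite("aux", "#13 priorities", "append"));    // true
console.log(canWrite("aux", "#13 priorities", "overwrite")); // false
```

A check like this can run before every write in a session's tooling, turning the conflict rules from conventions into guardrails.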

Detecting Concurrent Sessions

At OPEN, each session checks for siblings:
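
A minimal sketch of that check, assuming each session log comment contains a marker of the form `OPEN [D]` or `CLOSE [D]` (the letter being the session type): walk the comments oldest-first and report any OPEN without a later matching CLOSE. The functions `parseEntry` and `activeSessions` are hypothetical names.

```typescript
interface LogEntry {
  marker: "OPEN" | "CLOSE";
  type: string; // single-letter session type, e.g. "D" or "O"
}

// Extracts the marker from a comment; comments without one are ignored.
function parseEntry(comment: string): LogEntry | null {
  const m = /\b(OPEN|CLOSE)\s+\[([A-Z])\]/.exec(comment);
  if (m === null) return null;
  return { marker: m[1] === "OPEN" ? "OPEN" : "CLOSE", type: m[2] ?? "" };
}

// Returns the session types that have an OPEN with no later CLOSE.
function activeSessions(commentsOldestFirst: string[]): string[] {
  const open: string[] = [];
  for (const comment of commentsOldestFirst) {
    const entry = parseEntry(comment);
    if (entry === null) continue;
    if (entry.marker === "OPEN") {
      open.push(entry.type);
    } else {
      const i = open.lastIndexOf(entry.type);
      if (i >= 0) open.splice(i, 1); // this CLOSE matches the most recent OPEN of its type
    }
  }
  return open;
}

const siblings = activeSessions([
  "OPEN [D] - Dev session: wire custom domains",
  "OPEN [O] - Ops session: build MCP tools",
  "CLOSE [D] - Domains wired",
]);
console.log(siblings); // ["O"]
```

One caveat: if only the last few comments are fetched, an OPEN that scrolled out of that window will be missed, so a long-running sibling can go undetected.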

Valid Concurrent Combinations

Main lane:  Dev ←→ Planning (they alternate, never concurrent)
Aux lane:   One of: Ops, Architecture, Monetization (never overlap)

Valid combos:
  Dev + Ops                 ✅
  Dev + Architecture        ✅
  Dev + Monetization        ✅
  Planning + Ops            ✅
  Planning + Architecture   ✅

Invalid combos:
  Dev + Planning            ❌ (both main lane)
  Ops + Architecture        ❌ (both aux lane)
  Dev + Dev                 ❌ (same type)
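
The combination table reduces to one rule: at most one session per lane. A minimal sketch of that rule follows; `laneOf` and `canRunTogether` are hypothetical names. Note that under this reading Planning + Monetization would also be valid, although the list above does not mention it, so treat that as an inference rather than a documented combination.

```typescript
type SessionType = "dev" | "planning" | "ops" | "architecture" | "monetization";

const laneOf: Record<SessionType, "main" | "aux"> = {
  dev: "main",
  planning: "main",
  ops: "aux",
  architecture: "aux",
  monetization: "aux",
};

// Two sessions may run concurrently only if they sit in different lanes.
// Same-type pairs (Dev + Dev) fail automatically because they share a lane.
function canRunTogether(a: SessionType, b: SessionType): boolean {
  return laneOf[a] !== laneOf[b];
}

console.log(canRunTogether("dev", "ops"));      // different lanes
console.log(canRunTogether("dev", "planning")); // both main lane
```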

Real-World Example

Here is how two concurrent sessions worked on the same day:

09:00  🟢 OPEN [D] - Dev session: wire custom domains for 10 sites
       (main lane reads MEMORY.md, #13, picks goal)

09:15  🟢 OPEN [O] - Ops session: build YouTube intelligence MCP tools
       (aux lane reads MEMORY.md, #49, detects dev session active)
       (avoids editing any code the dev session is touching)

10:30  🔴 CLOSE [D] - Domains wired, standard pages deployed
       (updates MEMORY.md ## MAIN LANE STATUS)

10:45  🔴 CLOSE [O] - 5 YouTube MCP tools built, 14/14 smoke tests pass
       (updates MEMORY.md ## AUX LANE STATUS)
       (appends YouTube tools to #13 priorities; does not reorder)

Both sessions produced value without conflicts because they operated on partitioned resources. The dev session edited production code; the ops session built new tooling in a separate directory.


Subscription plans have weekly token budgets. Capacity pacing means tracking your usage against the weekly cap and adjusting your session frequency to use 100% of your budget without hitting the wall.

The Capacity Check Script

#!/bin/bash

MODE="${1:-open}"
WEEKLY_CAP="${CLAUDE_WEEKLY_CAP:-5000000}"

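DATA=$(npx ccusage weekly --limit 1 --json 2>/dev/null)

# Pass values through the environment instead of splicing them into the
# Python source, so quotes or backslashes in the JSON cannot break the script.
MODE="$MODE" CAP="$WEEKLY_CAP" DOW="$(date +%u)" DATA="$DATA" python3 -c "
import sys, json, os

mode = os.environ['MODE']
cap = int(os.environ['CAP'])
dow = int(os.environ['DOW'])  # 1=Mon 7=Sun

# An empty DATA (ccusage failed or is not installed) falls through to 'No weekly data'
d = json.loads(os.environ['DATA'] or '{}')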
weeks = d.get('weekly', [])
if not weeks:
    print('No weekly data')
    sys.exit()

w = weeks[-1]
output = w.get('outputTokens', 0)

ideal_pct = dow / 7 * 100
actual_pct = min(output / cap * 100, 999) if cap > 0 else 0
diff = actual_pct - ideal_pct
remaining = max(cap - output, 0)
days_left = 7 - dow + 1

budget_per_day = remaining / days_left if days_left > 0 else 0

if diff > 15:
    status = 'AHEAD - Throttle'
elif diff > 5:
    status = 'SLIGHTLY AHEAD - Moderate pace'
elif diff > -10:
    status = 'ON PACE - Good'
elif diff > -25:
    status = 'BEHIND - Room to push'
else:
    status = 'FAR BEHIND - Push harder'

bar_w = 30
filled = min(int(actual_pct / 100 * bar_w), bar_w)
bar = '#' * filled + '-' * (bar_w - filled)

print(f'Capacity: [{bar}] {actual_pct:.0f}% | Day {dow}/7 | {status}')
print(f'Output: {output:,} / {cap:,} | Remaining: {remaining:,}')
print(f'Budget: {budget_per_day:,.0f} tokens/day for {days_left} remaining days')
"

Pacing Strategy

interface PacingStrategy {
  // Weekly budget (output tokens)
  weeklyBudget: number;

  // Current state
  dayOfWeek: number;  // 1-7
  tokensUsed: number;
  tokensRemaining: number;

  // Derived
  idealPace: number;     // weeklyBudget * (dayOfWeek / 7)
  actualPace: number;    // tokensUsed
  dailyBudget: number;   // tokensRemaining / daysLeft
}

// Decision. An interface cannot hold a method body, so the logic lives in
// a function. diff is the relative deviation from ideal pace, in percent
// (the capacity check script compares percentage points instead).
function recommendation(p: PacingStrategy): string {
  const diff = (p.actualPace / p.idealPace - 1) * 100;

  if (diff > 15) return "THROTTLE: Run fewer sessions, use lighter models";
  if (diff > 5)  return "MODERATE: Normal session count, avoid heavy ops";
  if (diff > -10) return "ON PACE: Keep going, this is ideal";
  if (diff > -25) return "PUSH: Run extra sessions today, use heavier models";
  return "SPRINT: Maximum sessions, large-scope work, use Opus freely";
}

Adjusting Session Weight

When you are behind pace, increase session weight (use Opus, tackle larger goals, run more sessions). When ahead, decrease it (use Sonnet, tackle smaller goals, skip optional sessions).

Week pacing example (5M weekly cap):

Mon: Used 400K (8%) - Ideal: 14% - FAR BEHIND
     → Run 3-4 heavy dev sessions, use Opus

Tue: Used 1.2M (24%) - Ideal: 29% - BEHIND
     → Run 2-3 sessions, normal weight

Wed: Used 2.1M (42%) - Ideal: 43% - ON PACE
     → Normal operations

Thu: Used 3.0M (60%) - Ideal: 57% - SLIGHTLY AHEAD
     → Moderate pace, prefer Sonnet

Fri: Used 3.8M (76%) - Ideal: 71% - AHEAD
     → Lighter sessions, small scope

Sat: Used 4.3M (86%) - Ideal: 86% - ON PACE
     → Normal wind-down

Sun: Used 4.9M (98%) - Ideal: 100% - GOOD
     → One final session to close out the week
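
The arithmetic behind each row can be sketched as follows. This assumes a Monday-to-Sunday week (day 1 to 7) as in the capacity script, and counts today among the remaining days. `paceFor` and the `Pace` shape are hypothetical names for illustration.

```typescript
interface Pace {
  usedPct: number;     // share of the weekly cap already consumed
  idealPct: number;    // share an even pace would have consumed by today
  dailyBudget: number; // tokens per remaining day, counting today
}

function paceFor(cap: number, used: number, dayOfWeek: number): Pace {
  const daysLeft = 7 - dayOfWeek + 1;
  return {
    usedPct: (used / cap) * 100,
    idealPct: (dayOfWeek / 7) * 100,
    dailyBudget: Math.max(cap - used, 0) / daysLeft,
  };
}

// Wednesday (day 3) of the example week: 2.1M used of a 5M cap.
const wed = paceFor(5000000, 2100000, 3);
console.log(
  `used ${wed.usedPct.toFixed(0)}%, ideal ${wed.idealPct.toFixed(0)}%, ` +
    `${Math.round(wed.dailyBudget)} tokens/day left`
);
```

The daily budget is the more actionable number: it says how much room each remaining day has, regardless of which status label applies.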

The Usage Report

A weekly usage report provides the big picture:

#!/bin/bash

echo "Daily breakdown (last 14 days):"
npx ccusage daily --limit 14 2>/dev/null

echo ""
echo "Weekly breakdown:"
npx ccusage weekly --limit 4 2>/dev/null

echo ""
echo "How to read:"
echo "Cost = API-equivalent (what tokens would cost without subscription)."
echo "Budget: \$6.60/day on Max plan."
echo "Most work days show \$100-400 = 15-60x ROI."
echo "Low days (<\$10) = subscription underused."
echo "Goal: use Claude Code every day, no zero days."

Key insight: On a Max subscription ($200/month), a day with $0 in API-equivalent usage is a $6.60 loss. The goal is not to minimize spend --- it is to maximize the value extracted from a fixed-cost resource. Zero days are the enemy.


The fundamental problem with AI coding sessions is that /clear destroys everything. Persistent memory systems ensure that critical state survives conversation resets.

The Memory File

MEMORY.md is the backbone of cross-session persistence. It is loaded at Layer 2 (every session reads it) and updated at CLOSE (every session writes to it).



## MAIN LANE STATUS
_Written by Dev [D] and Planning [P] sessions only._

- **Current train**: Train 6 - "Ship Content"
  - S1 done: 21 articles published on llctax.co
  - S2 next: ugc.marketing (20+ articles)
- **Active brands**: llctax.co (score 149), ugc.marketing (score 114)
- **GatherFeed**: 18 RSS feeds, 185+ posts, cron every 6h
- **Blockers for DRR > $0**: AdSense ID (human), affiliate signups (human)


## AUX LANE STATUS
_Written by Ops [O] and Architecture [A] sessions only._

- **Last ops session**: YouTube MCP tools built, 14/14 smoke tests
- **API Mom providers**: brave, perplexity, dataforseo, twitter, youtube
- **Automation**: smoke test cron (4h), 6 shell tools, 5 sub-agents
- **Monetization**: No sessions yet. DRR = $0 across all brands.


## SHARED
_Both lanes read. Additive edits only._


- Scramjet: pipeline engine (D1+R2+DO+Workflows+Queues)
- Scalable Media: autonomous brand operator
- GatherFeed: async research engine
- Pages-plus: multi-site SSR
- API Mom: managed keys, cost attribution, cache


### Active (revenue path)
- #17 Brand Engine · #23 Revenue loop · #62 Batch content gen
### Open (background)
- #1 Monorepo · #32 Observability · #65 Bookmarks ingest


- Event-driven: all inter-service via Queues
- DRR is the KPI β€” daily recurring revenue per brand
- Always wrangler.jsonc, never .toml

Memory File Design Principles

interface MemoryFileDesign {
  // Structure
  sections: {
    laneOwned: string[];     // Written by one lane only
    shared: string[];         // Both lanes read, additive edits
  };

  // Sizing
  maxTokens: 4_000;          // Must fit comfortably in Layer 2 budget
  maxLines: 200;              // Enforced by periodic curation

  // Content rules
  rules: [
    "State, not history. Current status, not what happened.",
    "References, not content. Issue numbers, not issue descriptions.",
    "Decisions, not debates. What was decided, not why alternatives were rejected.",
    "Lane separation. Each lane writes only to its own section.",
    "Shared sections are additive only. Never delete another lane's additions.",
  ];

  // Anti-patterns
  avoid: [
    "Session transcripts (use session log issue instead)",
    "Full code blocks (use file paths instead)",
    "Stale information (prune at Planning sessions)",
    "Duplicated content (reference once, don't repeat)",
  ];
}
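
The sizing limits are easy to check mechanically. Below is a minimal sketch that assumes roughly four characters per token, which is only a rough heuristic for English text; a real tokenizer will give different counts. `checkMemorySize` is a hypothetical helper.

```typescript
interface SizeReport {
  lines: number;
  estimatedTokens: number;
  withinBudget: boolean;
}

// Assumption: ~4 characters per token. Good enough for a warning,
// not for exact accounting.
function checkMemorySize(text: string, maxTokens = 4000, maxLines = 200): SizeReport {
  const lines = text.split("\n").length;
  const estimatedTokens = Math.ceil(text.length / 4);
  return {
    lines,
    estimatedTokens,
    withinBudget: lines <= maxLines && estimatedTokens <= maxTokens,
  };
}

const sample = "## MAIN LANE STATUS\n- Current train: Train 6\n";
console.log(checkMemorySize(sample));
```

Running a check like this at CLOSE gives an early signal that curation is due before the file starts crowding the Layer 2 budget.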

GitHub Issues as Knowledge Base

Beyond MEMORY.md, GitHub issues serve as a persistent knowledge base with specific roles:

interface KnowledgeBase {
  // System issues (always exist, never close)
  "#12": {
    name: "Session Log";
    purpose: "Chronicle of all sessions: OPEN/CLOSE markers, summaries";
    pattern: "Append-only comments. Never edit previous comments.";
    readAt: "OPEN (last 5 comments for sibling detection)";
    writeAt: "OPEN (🟢 marker) and CLOSE (🔴 marker)";
  };

  "#13": {
    name: "Priorities";
    purpose: "What needs doing, ranked by importance";
    pattern: "Body = current priorities. Main lane rewrites freely.";
    readAt: "OPEN (body only)";
    writeAt: "CLOSE (check off done items)";
  };

  "#30": {
    name: "Planning Book";
    purpose: "Strategic signals captured during work";
    pattern: "Append-only. Cleared only by Planning sessions.";
    readAt: "OPEN (last 5 comments, check if 5+ unprocessed)";
    writeAt: "During WORK (when noticing strategic signals)";
    tags: [
      "SEQUENCE - something should happen in a different order",
      "MISSING - a prerequisite we did not account for",
      "OPPORTUNITY - a new revenue or efficiency opportunity",
      "ARCHITECTURE - a system design consideration",
      "PRIORITY - something should be ranked differently",
      "PROCESS - a workflow improvement",
      "OPS - an automation opportunity",
    ];
  };
}
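
The planning book's entry-count trigger can be sketched as a scan for the last CLEARED comment. This assumes that clearing comments start with the word `CLEARED` and that every other comment counts as one entry; `unprocessedEntries` and `planningDue` are hypothetical helpers.

```typescript
// Count entries that follow the most recent comment starting with "CLEARED".
function unprocessedEntries(comments: string[]): number {
  let lastCleared = -1;
  comments.forEach((comment, index) => {
    if (comment.trim().indexOf("CLEARED") === 0) lastCleared = index;
  });
  return comments.length - lastCleared - 1;
}

// A Planning session is due once the backlog reaches the threshold (5 by default).
function planningDue(comments: string[], threshold = 5): boolean {
  return unprocessedEntries(comments) >= threshold;
}

const book = [
  "OPS - smoke tests could run on a schedule",
  "CLEARED - processed 1 entries",
  "MISSING - keyword storage does not exist yet",
  "OPPORTUNITY - affiliate blocks on more sites",
];
console.log(unprocessedEntries(book), planningDue(book));
```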

File-Based vs Database Memory

// File-based memory (what we use)
// Pros: readable, versionable, works offline, AI can read/write directly
// Cons: no query engine, manual curation needed, concurrent writes need rules

interface FileMemory {
  "MEMORY.md": "Cross-session state, updated at CLOSE";
  "session-costs.md": "Cost tracking log, append-only";
  "Log/Daily/*.md": "Daily logs with insights, connections, decisions";
  "Standards/*.md": "Institutional knowledge, updated at Planning";
}

// Database memory (what tools like ChatGPT use)
// Pros: queryable, automatic, no manual curation
// Cons: opaque, not versionable, AI can't control what's stored

interface DatabaseMemory {
  chatgpt: "Automatic fact extraction, stored in profile";
  windsurf: "Cascade memories: auto-generated during conversation";
  copilot: "Copilot Spaces: scoped to repos, not sessions";
}

Key insight: File-based memory is more work to maintain but gives you full control. Database memory is easier but opaque. For serious AI workflows, you want both: file-based memory for structured state, and the tool’s built-in memory for ad hoc facts.

The Curation Problem

Memory files grow stale. Old status entries, completed issues still listed, decisions that have been superseded --- all of these waste tokens on every session.

The solution is periodic curation, triggered by the planning cycle:

interface CurationProcess {
  trigger: "Planning session (every 5-10 dev sessions)";
  tasks: [
    "Remove completed items from MAIN/AUX STATUS sections",
    "Archive closed issues from the index",
    "Update SHARED SECTIONS with current infrastructure state",
    "Prune KEY DECISIONS that are now obvious/established",
    "Verify all referenced file paths still exist",
    "Check MEMORY.md token count: must stay under 4K",
  ];

  // Automated curation via sub-agent
  agent: "vault-curator";
  model: "sonnet"; // Cheap model for curation
  output: "Report of changes made, stale items removed, token count";
}

Daily Logs as Institutional Knowledge

Each session produces a daily log entry. Over time, these become searchable institutional knowledge:



- [A] Social intelligence + vault tooling (4 session types, canvas diagrams)
- [P] Train system established, Trains 1-3 evaluated, Train 4 launched
- [D] Affiliate blocks live on bankstatementtoexcel (33 articles, 6 tools)
- [O] Smoke test automation, vault-curator validation
- [D] 1Password CLI + domain portfolio audit (181 domains)
- [O] Tailscale mesh: 4 endpoints with SSH


- 1Password CLI + service account tokens = no human needed for secrets
- Domain portfolio strategy: near-zero marginal cost per site means buy aggressively
- Smoke tests catch regressions that would otherwise go unnoticed for days


- [[domain-portfolio-strategy]] informed by [[monetization-vocabulary]]
- [[api-mom]] proxy pattern enables [[twitter]] and [[youtube]] intelligence
- [[affiliate-blocks]] pattern from [[composable-archetypes]]


- Train system adopted: planning + N dev sessions, max 5 per train
- Two-lane concurrency: main lane (dev/planning) + aux lane (ops/arch/monetization)
- Revenue items go FIRST in every train plan, not last

Sessions need to roll up into something larger. Without a macro-level structure, you get “session soup” --- productive individual sessions that do not add up to a coherent outcome.

What is a Train?

A train is a planning cycle: one planning session (the locomotive) followed by N execution sessions (the cars). The planning session sets the destination. The execution sessions get there. The next planning session evaluates results and launches a new train.

interface Train {
  number: number;
  name: string;           // Goal-oriented, e.g., "Ship Content"
  plannedAt: Date;
  sessions: PlannedSession[];  // Max 5 dev sessions
  successCriteria: string[];   // Measurable, revenue-linked

  // Metrics (filled during/after execution)
  planCompletion: number;      // items done / items planned
  workArchRatio: number;       // % of sessions that shipped work (target >80%)
  driftSessions: number;       // sessions that went off-plan (target: 0)
  drrDelta: number;            // change in daily recurring revenue
  issuesNet: number;           // created - closed (target: <=0)
}

interface PlannedSession {
  number: number;
  goal: string;
  revenueTest: string;  // "Does this make money? How?"
  status: "planned" | "complete" | "skipped";
}

Train Rules

These rules were learned from tracking 6 trains over several weeks:


1. **Revenue-closing items go FIRST**, not last.
   Train 1-3 pattern: build infra first, monetize later.
   Result: $0 revenue after 15 sessions.
   Fix: put the money-making work at the front of every train.

2. **No architecture sessions during a train.**
   Architecture is exploration. Trains are execution.
   If you notice an architecture issue, capture it on #30.
   Don't solve it now.

3. **Max 5 dev sessions per train.**
   Longer trains drift. Shorter trains force focus.
   If you can't accomplish the goal in 5 sessions,
   the goal is too big, so split it.

4. **Every session passes the revenue test.**
   "Does this make money?" If the answer is "eventually"
   or "it enables something that will," be suspicious.
   Prefer direct revenue impact.

5. **Close >= create for issues per train.**
   Issue count is a leading indicator of complexity creep.
   If a train creates more issues than it closes,
   something is wrong.

6. **No side quests.**
   Capture on #30, don't execute.
   Side quests are the #1 killer of train productivity.
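
Several of these rules are mechanical enough to lint a train plan before launch. The sketch below covers rules 2 to 5; rule 1 (ordering) and rule 6 (side quests) need human judgment. The `kind` field and the `checkTrain` function are hypothetical additions, not part of the `Train` interface above.

```typescript
interface PlanSession {
  goal: string;
  revenueTest: string;            // answer to "Does this make money? How?"
  kind: "dev" | "architecture";   // hypothetical field for illustration
}

interface TrainPlan {
  sessions: PlanSession[];
  issuesCreated: number;
  issuesClosed: number;
}

function checkTrain(plan: TrainPlan): string[] {
  const problems: string[] = [];
  const devCount = plan.sessions.filter((s) => s.kind === "dev").length;
  if (devCount > 5) problems.push("more than 5 dev sessions: split the goal");
  if (plan.sessions.some((s) => s.kind === "architecture")) {
    problems.push("architecture session in a train: capture it on #30 instead");
  }
  if (plan.sessions.some((s) => s.revenueTest.trim() === "")) {
    problems.push("session without a revenue test");
  }
  if (plan.issuesCreated > plan.issuesClosed) {
    problems.push("train creates more issues than it closes");
  }
  return problems;
}

console.log(
  checkTrain({
    sessions: [{ goal: "Publish 20 articles", revenueTest: "Ad revenue per article", kind: "dev" }],
    issuesCreated: 1,
    issuesClosed: 2,
  })
);
```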

Evolution Through Trains

Here is how the train system evolved through actual use:

Train 1-3: Architecture drift
  Problem: Spent all sessions building infra, no revenue
  Pattern: "Let me just set up one more thing before monetizing"
  Lesson: Revenue items go FIRST

Train 4: Human-blocked
  Problem: Plan required human actions (AdSense signup, domain transfers)
    that didn't happen, blocking the whole train
  Lesson: Human-gated items must complete BEFORE train launches

Train 5: Prerequisite discovery
  Problem: Plan assumed infra existed that didn't
    (keyword research assumed GatherFeed had keyword storage; it didn't)
  Lesson: Identify ALL prerequisites at planning time.
    If infra doesn't exist, that IS the session.

Train 6: Pure execution
  Result: 21 articles published in Session 1 alone
  Pattern: All infra in place, all human gates cleared,
    sessions are pure output
  Lesson: Front-load the boring work so trains are fast

Key insight: Trains get better. Train 1 produced $0 of value in 5 sessions. Train 6 produced 21 published articles in session 1. The system learns from itself because each planning session evaluates the previous train and adjusts.

Planning Session Structure


Step 0 β€” Evaluate previous train
  - Plan completion % (items done / planned)
  - Work/Arch ratio (target >80%)
  - Drift sessions (target: 0)
  - DRR delta
  - Issues net (created - closed)
  Write findings in the train file. Identify what caused drift.

Step 1 β€” Gather
  Read all #30 comments since last CLEARED.
  Group by theme.

Step 2 β€” Assess
  For each theme: real signal or noise?
  Changes sequence/strategy/groups?

Step 3 β€” Standards audit
  Spawn standards-auditor agent.
  Pick 1-2 gaps for next train.

Step 4 β€” Restructure
  Rewrite #13. Update tracking issues.
  Create/close issues. Revenue items first.

Step 5 β€” Launch next train
  Create train file with:
  - 5 planned dev sessions with revenue test justification
  - Architecture decisions made HERE (not deferred)
  - Measurable success criteria

Step 6 β€” Clear planning book
  Comment "CLEARED β€” processed N entries"
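
Step 0 is mostly arithmetic over the sessions in the finished train. A minimal sketch, assuming each session records whether it was planned, whether it shipped work, and how many plan items it completed; `SessionRecord`, `TrainResult`, and `evaluateTrain` are hypothetical names, and DRR delta is left out because it comes from revenue data rather than session records.

```typescript
interface SessionRecord {
  planned: boolean;     // was this session in the train plan?
  shippedWork: boolean; // did it ship work, as opposed to exploration?
  itemsDone: number;    // plan items completed in this session
}

interface TrainResult {
  itemsPlanned: number;
  sessions: SessionRecord[];
  issuesCreated: number;
  issuesClosed: number;
}

function evaluateTrain(t: TrainResult) {
  const done = t.sessions.reduce((sum, s) => sum + s.itemsDone, 0);
  const shipped = t.sessions.filter((s) => s.shippedWork).length;
  return {
    planCompletion: t.itemsPlanned > 0 ? done / t.itemsPlanned : 0,
    workArchRatio: t.sessions.length > 0 ? shipped / t.sessions.length : 0, // target > 0.8
    driftSessions: t.sessions.filter((s) => !s.planned).length,             // target 0
    issuesNet: t.issuesCreated - t.issuesClosed,                            // target <= 0
  };
}

console.log(
  evaluateTrain({
    itemsPlanned: 10,
    sessions: [
      { planned: true, shippedWork: true, itemsDone: 4 },
      { planned: true, shippedWork: true, itemsDone: 3 },
      { planned: false, shippedWork: false, itemsDone: 0 },
    ],
    issuesCreated: 2,
    issuesClosed: 5,
  })
);
```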

AI coding assistants have tools: file read/write, bash execution, web search, and increasingly, sub-agents. How you design and use these tools has a massive impact on context efficiency.

Sub-Agents: When and Why

Sub-agents are isolated AI instances that receive a focused task, execute it, and return a summary. The key benefit is context isolation: the sub-agent’s working context (file reads, API responses, intermediate reasoning) stays in the sub-agent. Only the final summary returns to the main conversation.

interface SubAgentDesign {
  // When to use a sub-agent
  when: [
    "Expensive scanning: reads many files across many projects",
    "Repeatable process: same analysis runs regularly",
    "Bulk data processing: API responses too large for main context",
    "Specialized domain: needs a focused system prompt",
  ];

  // When NOT to use a sub-agent
  whenNot: [
    "A simple Grep/Glob answers the question",
    "The task is one-off and won't recur",
    "The output is small enough for the main conversation",
  ];

  // Design rules
  rules: [
    "Read-only by default. Agents audit and report; no code modification.",
    "Concise output. Keep reports under 100 lines.",
    "Use Sonnet unless the task requires Opus-level reasoning.",
    "Track in a registry. Every agent gets a row in the registry table.",
  ];
}

Agent Registry

Here is a real agent registry from a production system:

| Agent                | Model  | Trigger                    | Purpose                          |
|----------------------|--------|----------------------------|----------------------------------|
| standards-auditor    | sonnet | Planning (mandatory)       | Scan projects against standards  |
| seo-keyword-analyzer | sonnet | On-demand                  | Process bulk keyword data        |
| article-writer       | opus   | On-demand                  | Write comprehensive articles     |
| vault-curator        | sonnet | Planning, on-demand        | Maintain vault health            |
| competitor-analyzer  | sonnet | Dev (content), monetization| Analyze competitor websites      |
| ideas-pipeline-sync  | sonnet | Architecture               | Validate Ideas pipeline state    |

Context Isolation Math

// Without sub-agent: everything in main context
async function auditWithoutAgent() {
  // Read 10 project configs (10 * 2K = 20K tokens)
  const configs = await readAllProjectConfigs();
  // Read 4 standards (4 * 3K = 12K tokens)
  const standards = await readAllStandards();
  // Process (reasoning: ~10K tokens)
  const report = analyzeCompliance(configs, standards);
  // Total added to main context: ~42K tokens
  // This stays in context for the rest of the session
  return report;
}

// With sub-agent: isolated context
async function auditWithAgent() {
  // Spawn sub-agent with focused system prompt
  const report = await runAgent("standards-auditor", {
    task: "Audit all active projects against current standards",
    // Sub-agent reads 42K tokens internally
    // Only the summary comes back
  });
  // Total added to main context: ~2K tokens (the summary)
  // 40K tokens of working context stayed in the sub-agent
  return report; // 95% context savings
}
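
The 95% figure follows directly from the token estimates in the comments above. A small sketch of the arithmetic, using the same illustrative numbers; `contextSavings` is a hypothetical helper.

```typescript
// Fraction of the working context that never reaches the main conversation.
function contextSavings(workingTokens: number, summaryTokens: number): number {
  return (workingTokens - summaryTokens) / workingTokens;
}

// 10 configs at 2K + 4 standards at 3K + ~10K of reasoning = 42K tokens.
const working = 10 * 2000 + 4 * 3000 + 10000;
const saved = contextSavings(working, 2000);
console.log(`${working} tokens internal, ${(saved * 100).toFixed(0)}% kept out of main context`);
```

The saving applies to the main conversation only. The sub-agent still spends the full 42K internally, so isolation buys context headroom rather than a lower token bill.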

Tool Selection Hierarchy

Not every task needs the most powerful tool. Use the lightest tool that gets the job done:

// From cheapest to most expensive context-wise:
type ToolHierarchy = [
  // 1. Direct tools (0 context overhead beyond the response)
  "Grep",           // Search file contents: returns only matches
  "Glob",           // Find files by pattern: returns only paths
  "Read",           // Read a specific file: returns content

  // 2. Shell commands (small overhead from command + output)
  "Bash",           // Run a command: returns stdout/stderr

  // 3. API calls (moderate overhead from request + response)
  "gh api",         // GitHub API: targeted queries
  "gh issue view",  // Full issue view: can be expensive

  // 4. Sub-agents (large internal cost, small returned cost)
  "Agent",          // Spawns isolated instance: returns summary only

  // 5. Web tools (variable, often large)
  "WebSearch",      // Search results: moderate tokens
  "WebFetch",       // Full page content: can be very large
];

// Rule: Always try the cheapest tool first
// Don't use Agent for something Grep can solve
// Don't use WebFetch if you just need a URL
// Don't use gh issue view if you just need the body

Hooks for Context Preprocessing

Claude Code hooks can preprocess data before the AI sees it, dramatically reducing context consumption:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/filter-test-output.sh"
          }
        ]
      }
    ]
  }
}
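
#!/bin/bash
# ~/.claude/hooks/filter-test-output.sh
# Reads the PreToolUse payload on stdin and, for test commands, rewrites
# the command so that only failures reach the context window.

input=$(cat)
cmd=$(echo "$input" | jq -r '.tool_input.command')

if [[ "$cmd" =~ ^(npm test|pytest|go test) ]]; then
  # Filter to show only failures: reduces 10K lines to ~50.
  # Caveat: the pipeline's exit status is grep/head's, not the test runner's.
  filtered_cmd="$cmd 2>&1 | grep -A 5 -E '(FAIL|ERROR|error:)' | head -100"
  # Build the JSON with jq so quotes inside the command are escaped correctly.
  jq -n --arg cmd "$filtered_cmd" \
    '{hookSpecificOutput: {hookEventName: "PreToolUse", permissionDecision: "allow", updatedInput: {command: $cmd}}}'
else
  echo "{}"
fi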

This single hook can save 5-10K tokens per test run by filtering out passing tests and only showing failures.
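
To make the filtering step concrete, here is a sketch of what the `grep -A 5 ... | head -100` pipeline does, written as a plain function over output lines. It is an approximation: real `grep -A` also prints `--` separators between non-adjacent groups, which this version omits. `filterFailures` is a hypothetical name.

```typescript
// Keep each matching line plus `after` lines of trailing context,
// then cap the result at `limit` lines.
function filterFailures(lines: string[], after = 5, limit = 100): string[] {
  const pattern = /(FAIL|ERROR|error:)/;
  const kept: string[] = [];
  let remaining = 0; // context lines still owed after the last match
  for (const line of lines) {
    if (pattern.test(line)) {
      kept.push(line);
      remaining = after;
    } else if (remaining > 0) {
      kept.push(line);
      remaining--;
    }
    if (kept.length >= limit) break;
  }
  return kept;
}

const output = ["PASS auth.test.ts", "FAIL login.test.ts", "  expected 200, got 401", "PASS db.test.ts"];
console.log(filterFailures(output, 1));
```

The cap matters as much as the match: a run where everything fails would otherwise flood the context just as badly as an unfiltered run.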

Skills vs CLAUDE.md

Claude Code supports skills --- on-demand knowledge packs that load only when invoked, keeping them out of the always-loaded context:

// Anti-pattern: everything in CLAUDE.md (always loaded)
// CLAUDE.md = 800 lines, 6K tokens loaded EVERY session
// Including: PR review guidelines, database migration steps,
// deployment checklist, testing standards...
// Most sessions use 0-1 of these sections.

// Pattern: CLAUDE.md as index, skills as content
// CLAUDE.md = 150 lines, 2K tokens (routing table)
// Skills loaded on-demand:
//   "pr-review" skill     → 2K tokens, loaded only during PR reviews
//   "db-migration" skill  → 1.5K tokens, loaded only during migrations
//   "deploy" skill        → 1K tokens, loaded only during deploys
//   "testing" skill       → 1.5K tokens, loaded only during test work

// Savings: 4K tokens per session that doesn't need specialized knowledge
// That's 4K * 20 sessions/week = 80K tokens/week saved

AI Coding Assistants: Context Management Features

| Feature | Claude Code | Cursor | Copilot CLI | Windsurf | Aider | Continue.dev |
|---|---|---|---|---|---|---|
| Config file | CLAUDE.md | .cursor/rules/*.mdc | .github/copilot-instructions.md | Global + workspace rules | CONVENTIONS.md | config.yaml |
| Config location | Project root, ~/.claude/ | .cursor/ dir | .github/ dir | Settings + project | Project root, ~/.aider.conf.yml | ~/.continue/ |
| Hierarchical config | Yes (project + user + memory) | Yes (global + project rules) | Yes (org + repo) | Yes (global + workspace) | Yes (home + repo + cwd) | Yes (global + workspace) |
| Persistent memory | MEMORY.md (file-based, user-controlled) | Limited (chat history) | Copilot Spaces | Cascade Memories (auto-generated) | Chat history files | Custom context providers |
| Session lifecycle | Manual (OPEN/WORK/CLOSE via CLAUDE.md) | None (continuous chat) | None | None | None (stateless by design) | None |
| Context window | 200K tokens | ~128K (model-dependent) | Model-dependent | Model-dependent | Model-dependent | Model-dependent |
| Auto-compaction | Yes (at ~95%) | No (starts new chat) | No | No | No (repo map compression) | No |
| Sub-agents | Yes (isolated context) | No | Agent teams (preview) | No | No | No |
| Hooks/preprocessing | Yes (PreToolUse, PostToolUse) | No | No | No | No | Custom context providers |
| Cost tracking | /cost, /stats, ccusage | Limited | GitHub billing | Limited | Token counting per model | No |
| Concurrent sessions | Yes (with lane partitioning) | No (single workspace) | Possible (multiple terminals) | No | Possible (multiple terminals) | No |
| Model switching | /model mid-session | Per-chat model selection | Limited | Model selection | --model flag | Per-provider config |

Detailed Comparison: Context Persistence Approaches

| Approach | How It Works | Strengths | Weaknesses |
|---|---|---|---|
| Claude Code CLAUDE.md + MEMORY.md | Markdown files in project root. AI reads at session start, writes at session end. User controls structure and content. | Full control, version-controlled, structured sections, lane partitioning, works offline | Manual maintenance required, needs discipline to update, can grow stale |
| ChatGPT Custom Instructions + Memory | System-level instructions (1500 chars). Auto-extracted memories from conversations. | Zero effort, always active, works across all chats | Tiny instruction limit, opaque memory, no file-based persistence, no session structure |
| Cursor .cursor/rules | MDC files with frontmatter (description, globs, alwaysApply). Activated based on file patterns or always. | Pattern-based activation (load rules only when editing matching files), reduces noise | No memory system, no session lifecycle, no cross-session persistence beyond rules |
| Aider CONVENTIONS.md | Markdown file loaded as context. Repo map for compressed codebase overview. Model-agnostic. | Simple, model-agnostic, repo map is clever, works with any LLM | No session structure, no memory, no sub-agents, no compaction |
| Windsurf Rules + Memories | Global/workspace rules (saved prompts). Cascade memories (auto-generated during conversation). | Auto-generated memories, separates rules from memories | Opaque memory management, no session lifecycle, memories are per-project only |
| GitHub Copilot Spaces | Collaborative containers with repos, issues, docs, instructions. Scoped context across sessions. | Team-oriented, pulls in multiple repos, persistent across sessions | Not session-based, no lifecycle management, no token budgeting, no cost tracking |
| Continue.dev Context Providers | YAML config with custom context providers. HTTP, MCP, and built-in providers. @ mentions to load context. | Extensible, MCP integration, works with any model | Manual context selection, no auto-loading, no session lifecycle |

When to Use What

// Decision tree for choosing a context management approach

function chooseApproach(needs: ProjectNeeds): string {
  if (needs.multiSessionContinuity && needs.concurrentWorkstreams) {
    return "Claude Code with full session system";
    // You need MEMORY.md, lane partitioning, train system
  }

  if (needs.multiSessionContinuity && !needs.concurrentWorkstreams) {
    return "Claude Code with simplified sessions";
    // MEMORY.md + single-lane OPEN/WORK/CLOSE
  }

  if (needs.teamConsistency && !needs.sessionLifecycle) {
    return "Cursor rules or Copilot instructions";
    // Rules/instructions for conventions, no session management
  }

  if (needs.modelAgnostic) {
    return "Aider CONVENTIONS.md";
    // Works with any model, simple and portable
  }

  if (needs.autoMemory && !needs.control) {
    return "Windsurf or ChatGPT memory";
    // Let the tool handle memory automatically
  }

  return "Start with CLAUDE.md/rules, add layers as needed";
}

| Don’t | Do Instead | Why |
|---|---|---|
| Dump entire codebase into context | Use 3-layer architecture: index → playbook → on-demand | Context rot degrades model attention. 50K of well-curated tokens beats 200K of everything. |
| Start every session from scratch | Use MEMORY.md with OPEN sequence | Re-explaining your project costs 5-10K tokens per session. Memory costs 3K once. |
| Keep one long conversation going forever | OPEN/WORK/CLOSE with /clear between sessions | Long conversations compound input token costs. Message 20 costs 90K+ input tokens. |
| Put everything in CLAUDE.md | Split into CLAUDE.md (index) + playbooks (session-loaded) + skills (on-demand) | CLAUDE.md loads every session. 800 lines = 6K tokens taxed on every interaction. |
| Use gh issue view for large issues | Use lightweight fetcher (body only, last N comments, truncated) | A 50-comment issue can be 15K tokens. Last 5 comments is 2K. |
| Let both lanes edit the same files | Partition resources by lane with clear ownership rules | Concurrent edits create conflicts. Comments are safe; edits are dangerous. |
| Architecture rabbit holes during execution | Capture to planning book, continue with the goal | Side quests are the #1 killer of session productivity. Capture, don’t execute. |
| Ignore weekly capacity budget | Run capacity check at OPEN and CLOSE | Burning 80% of your budget by Wednesday means Thursday-Sunday are throttled. |
| Use Opus for everything | Use Sonnet for most work, Opus for complex reasoning | Opus costs 5x more (output) and uses more extended thinking tokens. |
| Run sub-agents for simple tasks | Use Grep/Glob/Read directly | Sub-agents have overhead (spawn, system prompt, return). A simple Grep is instant. |
| Store session transcripts in memory | Store state and decisions, not history | “We discussed X” wastes tokens. “Decision: X, because Y” is actionable. |
| Treat AI subscription as free | Track API-equivalent cost, target ROI | $200/month subscription generating $10/day value = 1.5x ROI. Target 15-60x. |
| Plan trains with >5 sessions | Split into smaller trains with focused goals | Longer trains drift. 5-session trains force prioritization and completion. |
| Skip the CLOSE phase | Always run the full CLOSE checklist | Skipping CLOSE means the next session starts from scratch. 2 minutes of CLOSE saves 10 minutes of re-orientation. |
| Load both playbooks at OPEN | Load only your lane’s playbook | Each playbook is 2-3K tokens. Loading both wastes 2-3K per session. |
| Use generic prompts (“make it better”) | Specific prompts (“add input validation to auth.ts login function”) | Vague prompts trigger broad scanning. Specific prompts trigger targeted work. |

The most important shift in thinking is not about any specific technique. It is about treating your AI subscription as an employee with an ROI target.

The ROI Framework

interface AIEmployeeMetrics {
  // Cost
  monthlyCost: number;          // Subscription fee ($20-$200/month)
  weeklyBudget: number;         // monthlyCost / 4.3

  // Value (API-equivalent)
  weeklyApiEquivalent: number;  // What tokens would cost at API rates
  roi: number;                  // weeklyApiEquivalent / weeklyBudget

  // Utilization
  sessionsPerWeek: number;      // How many sessions you ran
  zeroDays: number;             // Days with no AI usage (wasted capacity)
  capacityUtilization: number;  // % of weekly token budget used

  // Effectiveness
  shippedPerSession: number;    // Concrete deliverables per session
  driftRate: number;            // % of sessions that went off-plan
  revenueImpact: number;       // DRR delta attributable to AI work
}

Real Numbers

Weekly report for a Max subscriber ($200/month):

  Subscription cost:     $46/week
  API-equivalent value:  $400-700/week (typical)
  ROI:                   8-15x

  Sessions:              20-30/week
  Zero days:             0-1 (target: 0)
  Capacity utilization:  85-95%

  Shipped per session:   1.2 concrete deliverables
  Drift rate:            <10%

  Comparison:
  - A junior developer costs $1200-2000/week
  - AI at $46/week doing 50% of that work = massive ROI
  - But ONLY if you manage context properly
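
The ROI line in the report is the ratio of API-equivalent value to weekly subscription cost. A small sketch of that calculation, using the `monthlyCost / 4.3` conversion from the framework above; `weeklyRoi` is a hypothetical helper and the dollar inputs are the report's own figures.

```typescript
interface RoiReport {
  weeklyCost: number; // subscription cost per week
  roi: number;        // API-equivalent value divided by weekly cost
}

function weeklyRoi(monthlyCost: number, weeklyApiEquivalent: number): RoiReport {
  const weeklyCost = monthlyCost / 4.3;
  return { weeklyCost, roi: weeklyApiEquivalent / weeklyCost };
}

const low = weeklyRoi(200, 400);
const high = weeklyRoi(200, 700);
console.log(
  `$${low.weeklyCost.toFixed(2)}/week, ROI ${low.roi.toFixed(1)}x to ${high.roi.toFixed(1)}x`
);
```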

The Revenue Test

Every session should pass the revenue test: “Does this make money?”

type RevenueTest = {
  direct: "This session directly produces revenue (publish articles, deploy ads)";
  enabling: "This session enables revenue in the next 1-2 sessions (build pipeline, fix bug)";
  infrastructure: "This session builds infrastructure that reduces future cost per session";
  none: "This session produces no revenue impact. WHY are we doing it?";
};

// Train 1-3 pattern (bad): 15 sessions of infrastructure, $0 revenue
// Train 6 pattern (good): 21 articles published in session 1

// The fix: revenue-closing items go FIRST in every plan
// Infrastructure is a means, not an end
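
The "revenue-closing items go FIRST" rule can be sketched as a simple sort over plan items. The `PlanItem` shape, the `RANK` table, and the `orderPlan` name are hypothetical; the four categories mirror the keys of `RevenueTest` above, and ranking them in that order is an assumption about how the rule would be applied.

```typescript
// Categories mirror the keys of RevenueTest.
type RevenueCategory = "direct" | "enabling" | "infrastructure" | "none";

interface PlanItem {
  title: string;
  category: RevenueCategory;
}

// Lower rank sorts earlier in the plan. The ranking is an assumption.
const RANK: Record<RevenueCategory, number> = {
  direct: 0,
  enabling: 1,
  infrastructure: 2,
  none: 3,
};

// Returns a new array with revenue-closing items first.
// Items with no revenue impact stay in the plan but sink to the bottom,
// where they are easy to spot and question.
function orderPlan(items: PlanItem[]): PlanItem[] {
  return [...items].sort((a, b) => RANK[a.category] - RANK[b.category]);
}

const plan = orderPlan([
  { title: "Refactor build scripts", category: "infrastructure" },
  { title: "Publish 20 articles", category: "direct" },
  { title: "Fix publishing pipeline bug", category: "enabling" },
]);
console.log(plan.map((item) => item.title));
```

Copying the array before sorting leaves the original plan untouched, so the before and after orderings can be compared at OPEN.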

Subscription Selection

// Minimal shape of the usage data this function reads
interface UsagePattern {
  sessionsPerWeek: number;
  isTeam: boolean;
  estimateWeeklyOutputTokens(): number;
}

function chooseSubscription(usage: UsagePattern): string {
  // For teams: checked first, because the individual tiers below
  // cover every usage level and would otherwise always match
  if (usage.isTeam) {
    // Average $100-200/developer/month with Sonnet
    // Set per-user TPM limits based on team size
    return "Claude API with team billing";
  }

  const weeklyOutputTokens = usage.estimateWeeklyOutputTokens();
  const sessionsPerWeek = usage.sessionsPerWeek;

  if (sessionsPerWeek < 5 || weeklyOutputTokens < 500_000) {
    // Light usage, occasional sessions
    return "Claude Pro ($20/month)";
  }

  if (sessionsPerWeek < 15 || weeklyOutputTokens < 2_000_000) {
    // Regular daily usage, 2-3 sessions/day
    return "Claude Max 5x ($100/month)";
  }

  // Reached only when sessionsPerWeek >= 15 and weeklyOutputTokens >= 2_000_000
  // Heavy usage, 3-5+ sessions/day
  // ROI: if each session produces $20+ in value, 15+ sessions/week = $300+/week
  // $200/month = $46/week → 6.5x ROI at minimum
  return "Claude Max 20x ($200/month)";
}

The Zero-Day Problem

A zero day is a day where your AI subscription sits unused.

On a $200/month Max plan:
  Daily cost: $6.60
  A zero day = $6.60 wasted

If you have 2 zero days per week:
  Weekly waste: $13.20
  Monthly waste: $57.20
  Annual waste: $686

That's enough for 3.4 months of the subscription.

Fix: capacity pacing. Check your pace at OPEN.
If behind, push more sessions. If ahead, lighter sessions.
Target: 0 zero days per week.
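
The arithmetic above can be reproduced with a small calculator. This is a sketch under stated assumptions: the function name `zeroDayWaste` and its return shape are hypothetical, and it uses unrounded values (365 days and 52 weeks per year), so its results land a few dollars below the rounded figures quoted above.

```typescript
interface ZeroDayWaste {
  daily: number;                // cost of one unused day
  weekly: number;               // waste per week
  monthly: number;              // waste per month (annual / 12)
  annual: number;               // waste per year
  monthsOfSubscription: number; // annual waste expressed in months of the plan
}

function zeroDayWaste(monthlyCost: number, zeroDaysPerWeek: number): ZeroDayWaste {
  if (monthlyCost <= 0) {
    throw new RangeError("monthlyCost must be positive");
  }
  if (zeroDaysPerWeek < 0 || zeroDaysPerWeek > 7) {
    throw new RangeError("zeroDaysPerWeek must be between 0 and 7");
  }
  const daily = (monthlyCost * 12) / 365;
  const weekly = daily * zeroDaysPerWeek;
  const annual = weekly * 52;
  return {
    daily,
    weekly,
    monthly: annual / 12,
    annual,
    monthsOfSubscription: annual / monthlyCost,
  };
}

// Example: the $200/month plan with 2 zero days per week.
const waste = zeroDayWaste(200, 2);
console.log(`~$${waste.annual.toFixed(0)}/year wasted`);
```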

Stacking AI Tools

The meta-game includes stacking multiple AI tools for different purposes:

interface AIToolStack {
  // Primary: Claude Code for coding sessions
  claudeCode: {
    cost: "$200/month";
    use: "Dev, planning, ops, architecture sessions";
    contextSystem: "Full OPEN/WORK/CLOSE with MEMORY.md";
  };

  // Secondary: ChatGPT for quick questions, brainstorming
  chatGPT: {
    cost: "$20/month";
    use: "Quick questions, brainstorming, research";
    contextSystem: "Custom instructions + auto memory";
  };

  // Tertiary: Perplexity for search-heavy research
  perplexity: {
    cost: "$20/month";
    use: "Research, fact-checking, competitive analysis";
    contextSystem: "None (stateless search)";
  };

  // Total: $240/month for a full AI productivity stack
  // ROI target: 10x minimum = $2400/month in value
  // Realistic: $5000-10000/month for a skilled practitioner
}

Here is what a fully optimized AI working day looks like:

07:00  bash capacity-check.sh open
       → 🟢 ON PACE: 42% used, day 4/7, 320K tokens/day budget

07:05  🟢 OPEN [D] - Train 6, Session 2: publish 20 articles on ugc.marketing
       (reads MEMORY.md: 3.5K tokens)
       (reads session-context.sh output: 2.5K tokens)
       (reads CLAUDE-main.md playbook: 2K tokens)
       OPEN cost: 12K tokens total

07:10  WORK - Generates articles, publishes via pipeline
       (reads template files, competitor analysis, keyword data on demand)

09:30  🔴 CLOSE [D] - 22 articles published, 3 competitor archetypes added
       (updates MEMORY.md, logs cost, checks capacity)
       API-equivalent: ~$35

09:35  🟢 OPEN [O] - Ops: build analytics verification tool
       (reads MEMORY.md, #49 ops tracking)
       OPEN cost: 8K tokens

10:00  WORK - Builds analytics verification script

10:45  🔴 CLOSE [O] - Analytics tool built, 3 brands verified
       API-equivalent: ~$7

10:50  🟢 OPEN [D] - Train 6, Session 3: publish 15 articles on vibemarketing.pro
       (note: different domain, same workflow)
       OPEN cost: 12K tokens

12:30  🔴 CLOSE [D] - 15 articles published
       API-equivalent: ~$25

       Day total: ~$67 API-equivalent, 3 sessions
       Subscription cost: ~$6.60/day
       ROI: 10.2x

       bash capacity-check.sh close
       → 🟢 ON PACE: 56% used, day 4/7, good
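
The internals of `capacity-check.sh` are not shown here, so the following is only a sketch of one way the pace status could be derived. It assumes a straight-line weekly target (1/7 of capacity per day) and a hypothetical 5-point tolerance; `checkPace`, `PaceStatus`, and `TOLERANCE_PCT` are illustrative names, not the script's actual logic. Under those assumptions it agrees with both readings in the log above.

```typescript
type PaceStatus = "BEHIND" | "ON PACE" | "AHEAD";
type Checkpoint = "open" | "close";

// Hypothetical tolerance: how many percentage points away from the
// straight-line target still counts as on pace.
const TOLERANCE_PCT = 5;

function checkPace(usedPct: number, dayOfWeek: number, checkpoint: Checkpoint): PaceStatus {
  if (dayOfWeek < 1 || dayOfWeek > 7) {
    throw new RangeError("dayOfWeek must be between 1 and 7");
  }
  // At OPEN the current day's share has not been spent yet; at CLOSE it has.
  const daysElapsed = checkpoint === "open" ? dayOfWeek - 1 : dayOfWeek;
  const targetPct = (daysElapsed / 7) * 100;

  if (usedPct < targetPct - TOLERANCE_PCT) return "BEHIND";
  if (usedPct > targetPct + TOLERANCE_PCT) return "AHEAD";
  return "ON PACE";
}

// The two checkpoints from the day above.
console.log(checkPace(42, 4, "open"));
console.log(checkPace(56, 4, "close"));
```

A "BEHIND" result at OPEN is the cue to schedule an extra session that day; "AHEAD" suggests lighter sessions until the target catches up.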

Implementation Checklist

If you want to implement this system, here is the minimal viable version:

Week 1: Foundation

Week 2: Optimization

Week 3: Automation

Week 4: Scale

The Minimal Session System

If you do nothing else from this guide, implement this:



[One paragraph description]


1. At OPEN: read MEMORY.md, pick a goal
2. At WORK: execute the goal, don't wander
3. At CLOSE: update MEMORY.md, say "safe to /clear"


./MEMORY.md - read it first, update it last


[What's done, what's in progress, what's next]


[Choices made and why, so you don't re-debate them]


[What's stuck and what's needed to unstick it]

Last updated: [date] by [session type]

That is 20 lines of configuration. It takes 5 minutes to set up. And it will immediately improve every AI session you run because the AI starts with context instead of starting from zero.
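
For readers who prefer to generate the memory file at CLOSE rather than edit it by hand, here is a minimal sketch that renders the three bracketed sections and the "Last updated" footer. The section headings ("Current State", "Decisions", "Blockers"), the `MemoryState` shape, and the `renderMemory` name are all assumptions, since the template's own headings did not survive in the text above.

```typescript
// Hypothetical shape for the three bracketed sections of MEMORY.md.
interface MemoryState {
  status: string;    // what's done, what's in progress, what's next
  decisions: string; // choices made and why
  blockers: string;  // what's stuck and what's needed to unstick it
}

// Section headings below are assumed, not taken from the original template.
function renderMemory(state: MemoryState, date: string, sessionType: string): string {
  return [
    "# MEMORY",
    "",
    "## Current State",
    state.status,
    "",
    "## Decisions",
    state.decisions,
    "",
    "## Blockers",
    state.blockers,
    "",
    `Last updated: ${date} by ${sessionType}`,
    "",
  ].join("\n");
}

// Example: the text a CLOSE step would write back to ./MEMORY.md.
const memoryText = renderMemory(
  {
    status: "Pipeline built; 22 articles published; next: 15 more",
    decisions: "Revenue-closing items go first in every plan",
    blockers: "None",
  },
  "2026-04-28",
  "dev session",
);
console.log(memoryText);
```

Returning a string instead of writing the file directly keeps the sketch side-effect free; the caller decides where and when to save it.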




This article is based on running 50+ AI coding sessions per week across 10+ concurrent projects, tracking every token, and iterating on the system through 6 trains of development. The patterns are tool-agnostic but use Claude Code as the primary example because that is where the deepest experience lies. Every number cited is from real session cost logs, not estimates.

