Gary Wu

AI Usage Postmortem


Org Status: 🟡 Dormant Cloudflare: N/A Last Audited: 2026-04-28


Every developer using AI coding tools has experienced the same moment: mid-flow, deep in a complex refactor, and the tool stops responding. “You’ve reached your usage limit.” No warning, no breakdown, no explanation of what consumed the budget. Just a wall. This article surveys every tool, technique, and data source available in 2026 for understanding where your AI coding tokens actually go — and identifies what’s still missing.

What you’ll learn:


  1. The Problem: Quota Blindness
  2. The Data Layer: What Each Tool Stores Locally
  3. Core Concepts
  4. Survey of Existing Tools
  5. Patterns: How the Best Tools Work
  6. Small Examples
  7. The Gap: What’s Missing
  8. The ncdu Analogy
  9. Cross-Platform Opportunity
  10. Comparisons
  11. Anti-Patterns
  12. Commercial Potential
  13. What Should Be Built
  14. References

The Problem: Quota Blindness

The AI coding assistant market hit $3.5 billion in 2025. By early 2026, 85% of developers regularly use AI tools for coding, and 70% juggle two to four tools simultaneously. GitHub Copilot has 20M+ users. Claude Code overtook it as the most-used AI coding tool. Cursor crossed $500M ARR.

Every one of these tools has usage limits. None of them adequately explains where your budget went.

The 5-Hour Window Problem

Claude Code operates on a rolling 5-hour window. Anthropic doesn’t publish exact limits; approximate figures put Pro at about 44,000 tokens per window, Max5 at around 88,000, and Max20 at roughly 220,000. Starting in August 2025, Anthropic added weekly limits on top of the 5-hour windows as a response to unsustainable consumption rates.

The problem is not the limit; it’s the opacity. You don’t know what consumed your budget, and the tool gives you no breakdown to find out.

The $131 Cursor Surprise

When Cursor switched to usage-based pricing in June 2025, one developer’s bill jumped from $19 to $131, nearly seven times the original amount (an increase of roughly 590%). Claude accounted for 93% of their token consumption. They had no idea until the invoice arrived.

The 2.1.1 Bug Report

In January 2026, users reported hitting 5-hour limits 4x faster after updating to Claude Code 2.1.1. Some found that rolling back to 2.0.61 fixed it. Others disagreed. Anthropic said they hadn’t identified a flaw. Without granular usage data, nobody could prove anything — it was a he-said-she-said between users and the provider.

Key insight: The token economy is the first economy where consumers have no receipts. You pay (in quota or dollars), you consume, and you have no itemized bill. Every other metered utility — electricity, water, cloud compute, cellular data — gives you a breakdown. AI coding tools do not.

What Changes If You Get This Right

A developer who built Claude Spend discovered that just 3 marathon conversations consumed 60%+ of all their tokens due to compounding context. With that single insight, they could restructure their workflow — shorter sessions, more frequent /clear commands, targeted context loading — and extend their effective quota by 2-3x without spending more money.

That’s the prize: same budget, dramatically more output. You just need to see where the tokens go.


The Data Layer: What Each Tool Stores Locally

Before we survey the tracking tools, we need to understand the raw material they work with. Every AI coding assistant stores conversation data locally. The format, location, and richness of this data determine what any analytics tool can possibly show you.

Claude Code: JSONL Transcripts

Claude Code stores conversation transcripts as JSONL files (one JSON object per line) in two locations:

~/.claude/projects/<project-name>/<conversation-id>.jsonl   # Project-specific
~/.claude/transcripts/<session-id>.jsonl                      # All sessions

Each line represents an event. The schema:

// Core event types in Claude Code JSONL files
interface UserEvent {
  type: "user";
  timestamp: string;       // ISO 8601
  content: string;         // The user's prompt text
}

interface ToolUseEvent {
  type: "tool_use";
  timestamp: string;
  tool_name: string;       // "Read", "Write", "Bash", "Grep", etc.
  tool_input: Record<string, unknown>;
}

interface ToolResultEvent {
  type: "tool_result";
  timestamp: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
}

// Assistant messages with token usage
interface AssistantMessage {
  parentUuid: string;
  sessionId: string;
  version: string;
  gitBranch: string;
  cwd: string;
  uuid: string;
  timestamp: string;
  isSidechain: boolean;
  isApiErrorMessage: boolean;
  message: {
    role: "assistant";
    content: ContentBlock[];
    usage: TokenUsage;
  };
}

interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

Key insight: The usage field on assistant messages is the gold mine. It tells you exactly how many tokens were consumed per API call, split by type. The isSidechain flag distinguishes parallel agent calls from main conversation flow. The three input token types (regular, cache creation, cache read) have dramatically different costs — cache reads are 90% cheaper.

A typical session produces 300-800 JSONL lines. Heavy sessions with lots of file reads and edits can produce thousands. Here’s what a real session breakdown looks like:

// Counting message types from a real Claude Code session
const messageCounts = {
  user: 35,        // 35 human prompts
  tool_use: 171,   // 171 tool invocations
  tool_result: 166  // 166 tool results (some tools fail)
};
// Ratio: ~5 tool calls per human message
// This is typical for agentic coding sessions
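
A tally like the one above can be reproduced from a transcript. The following is a minimal sketch, assuming each JSONL line carries a top-level `type` field as in the schema described earlier; `countEventTypes` is a hypothetical helper, not part of any existing tool.

```typescript
// Hypothetical helper: tally events by their top-level `type` field.
function countEventTypes(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // ignore blank lines
    try {
      const entry = JSON.parse(line) as { type?: unknown };
      const type = typeof entry.type === "string" ? entry.type : "unknown";
      counts[type] = (counts[type] ?? 0) + 1;
    } catch {
      // A line cut off mid-write is not valid JSON; skip it.
    }
  }
  return counts;
}

// Made-up sample transcript, including one truncated line.
const sample = [
  '{"type":"user","timestamp":"2026-03-01T10:00:00Z"}',
  '{"type":"tool_use","tool_name":"Read"}',
  '{"type":"tool_result","tool_name":"Read"}',
  '{"type":"tool_use","tool_name":"Bash"}',
  '{"type":"tool_use","tool_na',
].join("\n");

console.log(countEventTypes(sample));
```

Dividing the `tool_use` count by the `user` count gives the tool-calls-per-prompt ratio quoted above.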

Cursor: SQLite + Undocumented API

Cursor stores local data in SQLite databases, primarily state.vscdb. This includes chat history, compose logs, and metrics. The key discovery that enabled third-party tracking was an undocumented API endpoint found by observing browser redirects during CSV downloads:

// Cursor data access paths
interface CursorDataSources {
  // Local SQLite database with chat history
  localDb: "~/Library/Application Support/Cursor/User/globalStorage/state.vscdb";

  // JWT tokens for API auth (extracted from SQLite)
  authToken: "state.vscdb -> JWT -> session token conversion";

  // Undocumented usage API (discovered via browser redirect interception)
  usageApi: "https://www.cursor.com/api/usage"; // requires session token

  // Tokscale cache location for synced data
  tokscaleCache: "~/.config/tokscale/cursor-cache/";
}

Cursor’s local data is stored in unencrypted SQLite, a documented privacy concern in their community forums. This is both a risk and an opportunity: it means third-party tools can read it.

GitHub Copilot: NDJSON Export + Metrics API

Copilot provides the most structured data access of any tool, but primarily for organization admins:

interface CopilotDataSources {
  // Organization-level API (admin access required)
  metricsApi: "GET /orgs/{org}/copilot/metrics";
  usageApi: "GET /orgs/{org}/copilot/usage";

  // NDJSON export for raw data
  ndjsonExport: "Settings > Copilot > Export usage data";

  // VS Code extension local logs
  extensionLogs: "~/.vscode/extensions/github.copilot-*/logs/";

  // Token tracking extension data
  tokenTracker: "VS Code status bar via robBos.copilot-token-tracker";
}

// Metrics available via API
interface CopilotMetrics {
  total_active_users: number;
  total_engaged_users: number;
  copilot_ide_code_completions: {
    total_engaged_users: number;
    total_code_suggestions: number;
    total_code_acceptances: number;
    total_code_lines_suggested: number;
    total_code_lines_accepted: number;
  };
  copilot_ide_chat: {
    total_engaged_users: number;
    total_chats: number;
    total_chat_insertion_events: number;
    total_chat_copy_events: number;
  };
}

The gap: individual developers on personal plans have almost no visibility. The VS Code token tracker extension estimates usage from local logs, but Copilot doesn’t expose per-request token counts the way Claude Code does.
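
For organizations that do have API access, the completion counters above reduce to a couple of ratios. A minimal sketch, assuming the flattened field names shown in the interface above (the real API response may nest them differently); `acceptanceRates` and the sample numbers are hypothetical. These are acceptance metrics, not token counts.

```typescript
// Subset of the completion metrics shown above.
interface CompletionMetrics {
  total_code_suggestions: number;
  total_code_acceptances: number;
  total_code_lines_suggested: number;
  total_code_lines_accepted: number;
}

// Hypothetical helper: each ratio is null when its denominator is zero.
function acceptanceRates(m: CompletionMetrics): {
  bySuggestion: number | null;
  byLine: number | null;
} {
  const ratio = (num: number, den: number): number | null =>
    den > 0 ? num / den : null;
  return {
    bySuggestion: ratio(m.total_code_acceptances, m.total_code_suggestions),
    byLine: ratio(m.total_code_lines_accepted, m.total_code_lines_suggested),
  };
}

// Made-up numbers for illustration only.
const rates = acceptanceRates({
  total_code_suggestions: 400,
  total_code_acceptances: 100,
  total_code_lines_suggested: 1000,
  total_code_lines_accepted: 200,
});
console.log(rates);
```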

Gemini CLI: JSON Chat Files

Gemini CLI stores conversation data in JSON files:

~/.gemini/tmp/*/chats/*.json

Gemini CLI also has built-in analytics via the /stats command, which shows token usage split by model and tool usage. Google provides a pre-configured monitoring dashboard in Google Cloud Monitoring when OpenTelemetry is configured.

interface GeminiDataSources {
  // Local chat files
  chatFiles: "~/.gemini/tmp/*/chats/*.json";

  // Built-in stats command
  builtInStats: "/stats"; // shows model, tool usage, idle time

  // OpenTelemetry export to GCP
  otelExport: {
    dashboardTemplate: "Gemini CLI Monitoring";
    directGcpExport: true; // No intermediate collector needed
  };
}

OpenAI Codex CLI: Session Files

Codex CLI stores session data locally:

~/.codex/sessions/

Codex has a /status command for checking remaining limits during a session, but real-time cost visibility is a feature request, not a shipped feature. The data format is similar to Claude Code’s JSONL approach.

interface CodexDataSources {
  // Local session storage
  sessions: "~/.codex/sessions/";

  // Usage records (when available)
  usageRecords: "~/.codex/usage/";

  // Built-in status check
  builtInStatus: "/status"; // shows remaining limits
}

Core Concepts

Token Types and Their Costs

Not all tokens cost the same. Understanding the four types is essential for optimization:

interface TokenBreakdown {
  // Regular input tokens — full price
  input_tokens: number;

  // Output tokens — typically 3-5x more expensive than input
  output_tokens: number;

  // Cache creation — charged at ~1.25x input rate (one-time cost)
  cache_creation_input_tokens: number;

  // Cache read — charged at ~0.1x input rate (90% discount!)
  cache_read_input_tokens: number;
}

// Real pricing for Claude Sonnet 4 (as of March 2026)
const SONNET_PRICING = {
  input: 3.00,            // per million tokens
  output: 15.00,          // per million tokens
  cacheCreation: 3.75,    // per million tokens
  cacheRead: 0.30,        // per million tokens — 10x cheaper!
};

function calculateCost(usage: TokenBreakdown): number {
  return (
    (usage.input_tokens * SONNET_PRICING.input +
     usage.output_tokens * SONNET_PRICING.output +
     (usage.cache_creation_input_tokens ?? 0) * SONNET_PRICING.cacheCreation +
     (usage.cache_read_input_tokens ?? 0) * SONNET_PRICING.cacheRead) /
    1_000_000
  );
}

Key insight: Cache efficiency is the single biggest lever for cost optimization. At the rates above, a session where 80% of input tokens are cache reads pays roughly 3.6x less for input than one sending everything fresh (0.8 × 0.1 + 0.2 × 1.0 = 0.28 of the full input price). The tracking tools that distinguish cache hits from misses are dramatically more useful than those that just count total tokens.
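
The effect of cache reads on input cost can be worked out from the multipliers listed above. This is a rough sketch: `relativeInputCost` is a hypothetical helper, and it deliberately ignores the cache-creation surcharge and output tokens, so real savings will be somewhat smaller.

```typescript
// Cache reads are billed at roughly 0.1x the regular input rate (see above).
const CACHE_READ_MULTIPLIER = 0.1;

// Hypothetical helper: input cost relative to sending everything uncached,
// when `cacheReadFraction` of input tokens is served from cache and the
// remainder is billed at the regular input rate.
function relativeInputCost(cacheReadFraction: number): number {
  if (cacheReadFraction < 0 || cacheReadFraction > 1) {
    throw new RangeError("cacheReadFraction must be between 0 and 1");
  }
  return cacheReadFraction * CACHE_READ_MULTIPLIER + (1 - cacheReadFraction);
}

for (const fraction of [0, 0.5, 0.8, 0.95]) {
  const cost = relativeInputCost(fraction);
  const percent = Math.round(fraction * 100);
  console.log(
    `${percent}% cached: ${cost.toFixed(3)} of full input price, ` +
      `${(1 / cost).toFixed(1)}x cheaper`
  );
}
```

The savings curve is steep near the top: under these assumptions, each extra point of cache-hit rate matters more at 90% than at 50%.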

The 5-Hour Rolling Window

Claude Code’s billing window is the most complex quota system among AI coding tools:

interface BillingWindow {
  // 5-hour rolling window
  windowDuration: 18_000_000; // 5 hours in ms (5 * 60 * 60 * 1000)

  // Tier-specific limits (approximate, Anthropic doesn't publish exact numbers)
  limits: {
    pro: 44_000;      // ~44K tokens per 5-hour window
    max5: 88_000;     // ~88K tokens
    max20: 220_000;   // ~220K tokens
  };

  // Weekly limit overlay (added August 2025)
  weeklyLimit: {
    enabled: true;
    // Resets on a rolling basis, not calendar-aligned
    // Exact amounts vary and are not published
  };
}

// The key insight: your effective budget is
// min(windowRemaining, weeklyRemaining)
// You can hit either limit independently
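
The min() rule above can be written down directly. A minimal sketch, assuming both remainders are already known in tokens (the weekly amount is not published, so in practice it would be an estimate); `effectiveBudget` is a hypothetical helper.

```typescript
interface RemainingBudget {
  windowRemaining: number; // tokens left in the current 5-hour window
  weeklyRemaining: number; // tokens left under the weekly limit
}

// Hypothetical helper: the smaller remainder is what can actually be spent,
// and `bindingLimit` names the limit that will be hit first.
function effectiveBudget(b: RemainingBudget): {
  tokens: number;
  bindingLimit: "window" | "weekly";
} {
  const tokens = Math.max(0, Math.min(b.windowRemaining, b.weeklyRemaining));
  const bindingLimit =
    b.windowRemaining <= b.weeklyRemaining ? "window" : "weekly";
  return { tokens, bindingLimit };
}

// Illustrative numbers only.
console.log(effectiveBudget({ windowRemaining: 30_000, weeklyRemaining: 12_000 }));
```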

Context Compounding

The most expensive pattern in AI coding is context compounding — where each conversational turn re-sends the entire history:

// How context compounds in a long session
function simulateContextGrowth(turns: number, avgResponseTokens: number): number[] {
  const contextPerTurn: number[] = [];
  let totalContext = 0;

  for (let i = 0; i < turns; i++) {
    // Each turn sends ALL previous context + new prompt
    const promptTokens = 200; // average user prompt
    totalContext += promptTokens + avgResponseTokens;
    contextPerTurn.push(totalContext);
  }

  return contextPerTurn;
}

// With 500-token average responses over 20 turns:
// Turn 1:   700 tokens sent
// Turn 5:   3,500 tokens sent
// Turn 10:  7,000 tokens sent
// Turn 20:  14,000 tokens sent
// CUMULATIVE: 147,000 tokens total input
//
// Without compounding (if each turn were independent):
// 20 × 700 = 14,000 tokens total
//
// That's a 10.5x multiplier from compounding alone.

Key insight: The Claude Spend developer discovered that 3 marathon conversations consumed 60%+ of all tokens. This is the direct result of context compounding. Short, focused sessions with /clear between tasks are dramatically cheaper than long-running sessions.
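
To see why shorter sessions help, the same simplified model can be compared across one long session and several short ones. This is a sketch under the assumptions of the simulation above (every turn adds a fixed amount of context and resends all of it); the function names are hypothetical, and it ignores the cost of reloading context after a reset.

```typescript
// Cumulative input tokens for one session of `turns` turns, where each turn
// adds `tokensPerTurn` of context and resends the full history.
// Closed form: tokensPerTurn * (1 + 2 + ... + turns).
function cumulativeInput(turns: number, tokensPerTurn: number): number {
  return (tokensPerTurn * turns * (turns + 1)) / 2;
}

// Hypothetical comparison: the same turns split into equal-sized sessions,
// with context reset (for example via /clear) between sessions.
function cumulativeInputSplit(
  turns: number,
  tokensPerTurn: number,
  sessions: number
): number {
  if (sessions <= 0 || turns % sessions !== 0) {
    throw new RangeError("turns must divide evenly into sessions");
  }
  return sessions * cumulativeInput(turns / sessions, tokensPerTurn);
}

const oneLong = cumulativeInput(20, 700);            // 147,000
const fourShort = cumulativeInputSplit(20, 700, 4);  // 4 x 10,500 = 42,000
console.log({ oneLong, fourShort, ratio: oneLong / fourShort });
```

In this model, splitting 20 turns into four sessions of five cuts cumulative input by 3.5x, provided the tasks really are independent enough to start from a clean context.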

OpenTelemetry: The Official Instrumentation Layer

Claude Code has first-class OpenTelemetry support — the most comprehensive of any AI coding tool. This is the “right” way to collect data at scale:

// Claude Code OTEL configuration
interface OtelConfig {
  // Required: enable telemetry
  CLAUDE_CODE_ENABLE_TELEMETRY: "1";

  // Metrics exporter (time-series data)
  OTEL_METRICS_EXPORTER: "otlp" | "prometheus" | "console";

  // Logs/events exporter (per-event data)
  OTEL_LOGS_EXPORTER: "otlp" | "console";

  // OTLP endpoint
  OTEL_EXPORTER_OTLP_PROTOCOL: "grpc" | "http/json" | "http/protobuf";
  OTEL_EXPORTER_OTLP_ENDPOINT: string; // e.g., "http://localhost:4317"
}

// Metrics exported by Claude Code
interface ClaudeCodeMetrics {
  "claude_code.session.count": Counter;
  "claude_code.lines_of_code.count": Counter; // attributes: type=added|removed
  "claude_code.pull_request.count": Counter;
  "claude_code.commit.count": Counter;
  "claude_code.cost.usage": Counter;           // USD, per model
  "claude_code.token.usage": Counter;          // per type, per model
  "claude_code.code_edit_tool.decision": Counter;
  "claude_code.active_time.total": Counter;    // seconds
}

// Events exported (richer than metrics)
interface ClaudeCodeEvents {
  "claude_code.user_prompt": {
    prompt_length: number;
    prompt?: string; // only if OTEL_LOG_USER_PROMPTS=1
  };
  "claude_code.api_request": {
    model: string;
    cost_usd: number;
    duration_ms: number;
    input_tokens: number;
    output_tokens: number;
    cache_read_tokens: number;
    cache_creation_tokens: number;
    speed: "fast" | "normal";
  };
  "claude_code.tool_result": {
    tool_name: string;
    success: boolean;
    duration_ms: number;
    tool_result_size_bytes: number;
  };
  "claude_code.api_error": {
    model: string;
    error: string;
    status_code: string;
    attempt: number;
  };
}

The OTEL approach gives you the richest data, but requires infrastructure (Grafana, Prometheus, Honeycomb, etc.). Most individual developers don’t run observability stacks. This is the gap that local analytics tools fill.
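
Even without a full observability stack, the per-request events are easy to roll up once they are parsed into objects. A minimal sketch, assuming events of the `claude_code.api_request` shape shown above; how the events are captured is out of scope here, and `totalsByModel` plus the model names are placeholders.

```typescript
// Fields taken from the claude_code.api_request event shape shown above.
interface ApiRequestEvent {
  model: string;
  cost_usd: number;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_creation_tokens: number;
}

interface ModelTotals {
  requests: number;
  costUsd: number;
  tokens: number;
}

// Hypothetical helper: roll events up into per-model totals.
function totalsByModel(events: ApiRequestEvent[]): Map<string, ModelTotals> {
  const totals = new Map<string, ModelTotals>();
  for (const e of events) {
    const t = totals.get(e.model) ?? { requests: 0, costUsd: 0, tokens: 0 };
    t.requests += 1;
    t.costUsd += e.cost_usd;
    t.tokens +=
      e.input_tokens + e.output_tokens + e.cache_read_tokens + e.cache_creation_tokens;
    totals.set(e.model, t);
  }
  return totals;
}

// Made-up events with placeholder model names.
const events: ApiRequestEvent[] = [
  { model: "model-a", cost_usd: 0.25, input_tokens: 1000, output_tokens: 500, cache_read_tokens: 8000, cache_creation_tokens: 0 },
  { model: "model-a", cost_usd: 0.5, input_tokens: 2000, output_tokens: 1000, cache_read_tokens: 0, cache_creation_tokens: 4000 },
  { model: "model-b", cost_usd: 0.125, input_tokens: 300, output_tokens: 100, cache_read_tokens: 0, cache_creation_tokens: 0 },
];
console.log(Array.from(totalsByModel(events).entries()));
```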


Survey of Existing Tools

The ecosystem has exploded. Here’s every significant tool, organized by approach.

ccusage (11.6K stars)

The dominant tool in the space. Written in TypeScript, maintained by ryoppippi, with 61 contributors and releases up to v18.0.10.

npx ccusage daily           # Daily token usage and costs
npx ccusage monthly         # Monthly aggregated reports
npx ccusage session         # Per-conversation breakdown
npx ccusage blocks          # 5-hour billing window tracking
npx ccusage statusline      # One-line summary for shell prompts

npx ccusage daily --since 2026-03-01 --breakdown --json
npx ccusage session --project my-project --compact

What it reads: Local JSONL files from ~/.claude/projects/

Key features:

Architecture: Pure TypeScript, minimal dependencies, no build step required. Uses LiteLLM pricing data. The @ccusage/codex package adds Codex CLI support.

// ccusage JSON output structure (from --json flag)
interface CcusageDailyOutput {
  date: string;
  totalInputTokens: number;
  totalOutputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
  totalCost: number;       // USD
  models: {
    [modelName: string]: {
      inputTokens: number;
      outputTokens: number;
      cost: number;
    };
  };
}
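
The JSON output lends itself to quick scripting. The sketch below assumes rows shaped like the interface above and that `totalInputTokens` excludes cache tokens; both are assumptions to verify against real `--json` output. `summarizeDaily` is a hypothetical helper.

```typescript
// Subset of the daily row shape shown above (an assumption, not a spec).
interface DailyRow {
  date: string;
  totalInputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
  totalCost: number; // USD
}

// Hypothetical helper: total spend, cache-read share of input-side tokens,
// and the most expensive day in the range.
function summarizeDaily(rows: DailyRow[]): {
  totalCost: number;
  cacheReadShare: number;
  costliestDay: string | null;
} {
  let totalCost = 0;
  let cacheRead = 0;
  let inputSide = 0;
  let costliest: DailyRow | null = null;
  for (const row of rows) {
    totalCost += row.totalCost;
    cacheRead += row.cacheReadTokens;
    inputSide += row.totalInputTokens + row.cacheCreationTokens + row.cacheReadTokens;
    if (costliest === null || row.totalCost > costliest.totalCost) costliest = row;
  }
  return {
    totalCost,
    cacheReadShare: inputSide > 0 ? cacheRead / inputSide : 0,
    costliestDay: costliest?.date ?? null,
  };
}

// Made-up rows for illustration.
const dailyRows: DailyRow[] = [
  { date: "2026-03-01", totalInputTokens: 100, cacheCreationTokens: 100, cacheReadTokens: 800, totalCost: 1.5 },
  { date: "2026-03-02", totalInputTokens: 500, cacheCreationTokens: 0, cacheReadTokens: 500, totalCost: 2.5 },
];
console.log(summarizeDaily(dailyRows));
```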

tokscale (1.2K stars)

The most ambitious cross-platform play. Built by junhoyeo with a Rust core for performance.

npx tokscale@latest
bunx tokscale@latest

tokscale login          # GitHub auth
tokscale submit         # Upload usage data
tokscale submit --dry-run

Supported platforms (15+):

Platform | Data Location | Method
Claude Code | ~/.claude/projects/ | JSONL parsing
Codex CLI | ~/.codex/sessions/ | Session file parsing
Gemini CLI | ~/.gemini/tmp/*/chats/*.json | JSON parsing
Cursor IDE | ~/.config/tokscale/cursor-cache/ | API sync
OpenCode | ~/.local/share/opencode/opencode.db | SQLite
Amp (AmpCode) | ~/.local/share/amp/threads/ | File parsing
OpenClaw | ~/.openclaw/agents/ | File parsing
Factory Droid | ~/.factory/sessions/ | File parsing
Pi | ~/.pi/agent/sessions/ | File parsing
Kimi CLI | ~/.kimi/sessions/ | File parsing
Qwen CLI | ~/.qwen/projects/ | File parsing
Roo Code | VS Code extension storage | Storage API
Kilo | VS Code extension storage | Storage API

Key features:

Key insight: tokscale’s leaderboard feature turns token consumption into a social metric — a “Kardashev scale” for AI-augmented development. This is either brilliant gamification or a race to the bottom of cost efficiency, depending on your perspective.

Claude Spend (236 stars)

The tool that went viral on Reddit and got its creator their first 100 real users.

npx claude-spend
npx claude-spend --port 8080 --no-open

What it provides: A web dashboard on localhost showing:

Key differentiator: Web UI instead of terminal output. Opens a browser dashboard automatically. Privacy-first — no data leaves your machine.

Tier 2: Real-Time Monitors

Claude Code Usage Monitor (7K stars)

A Python-based real-time terminal monitor by Maciek-roboblog.

uv pip install claude-monitor
pip install claude-monitor

claude-monitor
cmonitor
ccm

Key features:

ML predictions: The tool analyzes all sessions from the last 8 days, calculates P90 percentiles, and predicts when you’ll hit limits with 95% confidence. This is the only tool that tries to predict rather than just report.
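
The percentile step can be illustrated in a few lines. This sketch is not taken from the tool’s source: it assumes one plausible reading, where the P90 of recent session totals stands in for an unpublished limit, and it uses the nearest-rank definition of a percentile. The helper names and sample totals are hypothetical, and the confidence estimate is not modeled.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical projection: minutes until `limit` is reached at a steady
// burn rate given in tokens per minute.
function minutesUntilLimit(limit: number, used: number, burnRate: number): number {
  if (burnRate <= 0) return Infinity;
  return Math.max(0, limit - used) / burnRate;
}

// Made-up token totals for ten recent sessions.
const sessionTotals = [
  12_000, 30_000, 18_000, 44_000, 9_000,
  25_000, 41_000, 15_000, 22_000, 38_000,
];
const estimatedLimit = percentile(sessionTotals, 90);
console.log(estimatedLimit, minutesUntilLimit(estimatedLimit, 21_000, 400));
```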

claude-code-limit-tracker (15 stars)

A shell-integrated tracker by TylerGallenbeck that displays quota in your terminal prompt.

uv run python install.py

Technical approach: Reads ~/.claude/projects/, calculates session durations from timestamps, counts prompts (filtering system messages), identifies model from assistant responses, and tracks per-model quotas separately for Sonnet and Opus.

Tier 3: Platform-Native Apps

Claude Usage Tracker — macOS (1.6K stars)

A native macOS menu bar app by hamed-elfayome, built with Swift/SwiftUI.

How it works: Unlike JSONL-parsing tools, this one authenticates directly with Claude’s API using session keys from browser cookies or Claude Code CLI credentials. It hits api.anthropic.com/api/oauth/usage for real utilization percentages.

Features:

Key differentiator: It gets the actual quota percentages from Anthropic’s API rather than estimating from local data. This means it knows your exact remaining budget, not just what you’ve consumed.

TokenMeter (macOS)

Another macOS menu bar app (Priyans-hu/tokenmeter) that reads Claude Code’s OAuth token from macOS Keychain and calls the same usage API endpoint.

CodexBar (macOS)

steipete/CodexBar shows usage stats for both OpenAI Codex and Claude Code from the menu bar, without requiring login.

Tier 4: Observability-Stack Integrations

claude-code-monitor — OpenTelemetry (7 stars)

A dedicated OTEL receiver that captures Claude Code telemetry and provides a web dashboard.

Architecture:

claude_telemetry

TechNickAI/claude_telemetry is a drop-in replacement that swaps the claude command for claudia, wrapping Claude Code with OpenTelemetry instrumentation. Exports to Logfire, Sentry, Honeycomb, or Datadog.

claude-code-otel

ColeMurray/claude-code-otel provides a comprehensive observability solution with Docker Compose configurations, Prometheus, and Grafana dashboards.

Tier 5: Browser Extensions (Cursor-Specific)

Extension | Platform | Method | Features
Cursor Stats | Chrome | Dashboard scraping | Usage visualization
Cursor Token Tracker | Chrome | API | Cost tracking, budget projection, trend charts
Cursor Usage & Cost Tracker | VS Code | Extension | In-editor monitoring, monthly budget alerts
Cursor-Pulse | VS Code | Extension | Real-time quota tracking

Tier 6: Copilot-Specific

Tool | Type | Features
Copilot Token Tracker | VS Code Extension | Daily/monthly estimates in status bar
copilot-metrics-viewer | Web Dashboard | Org-level Copilot Business API visualization
Copilot Fluency Score | VS Code Extension | 4-stage maturity model across 6 categories

Tier 7: Log Viewers (Non-Analytics)

Tool | Purpose
claude-code-log | Converts JSONL transcripts to readable HTML
clog | Web-based conversation log viewer with real-time monitoring
tps-viewer | Visualizes Claude Code tokens-per-second over time
claude-monitor-gnome-extension | GNOME Shell 48 taskbar monitor

Patterns: How the Best Tools Work

Pattern 1: JSONL Streaming Parser

The foundation of every Claude Code analytics tool. Parse JSONL files, extract token usage, aggregate by time period.

import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

interface TokenUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface ParsedMessage {
  timestamp: Date;
  sessionId: string;
  model?: string;
  usage?: TokenUsage;
  type: "user" | "assistant" | "tool_use" | "tool_result";
  toolName?: string;
  isSidechain: boolean;
}

function parseJsonlFile(filePath: string): ParsedMessage[] {
  const content = readFileSync(filePath, "utf-8");
  const lines = content.trim().split("\n");
  const messages: ParsedMessage[] = [];

  for (const line of lines) {
    try {
      const entry = JSON.parse(line);

      const parsed: ParsedMessage = {
        timestamp: new Date(entry.timestamp),
        sessionId: entry.sessionId ?? "",
        type: entry.type,
        isSidechain: entry.isSidechain ?? false,
        toolName: entry.tool_name,
      };

      // Extract usage from assistant messages
      if (entry.message?.usage) {
        parsed.usage = entry.message.usage;
        parsed.model = inferModel(entry);
      }

      messages.push(parsed);
    } catch {
      // Skip malformed lines — JSONL files can be interrupted mid-write
      continue;
    }
  }

  return messages;
}

function inferModel(entry: Record<string, unknown>): string | undefined {
  // Model identification varies by Claude Code version
  // Some versions include it directly, others require inference
  const message = entry.message as Record<string, unknown> | undefined;
  if (message?.model) return message.model as string;

  // Heuristic: check content for model markers
  const content = message?.content;
  if (Array.isArray(content)) {
    for (const block of content) {
      if (typeof block === "object" && block !== null && "model" in block) {
        return (block as { model: string }).model;
      }
    }
  }

  return undefined;
}

function discoverAllSessions(): Map<string, ParsedMessage[]> {
  const projectsDir = join(homedir(), ".claude", "projects");
  const sessions = new Map<string, ParsedMessage[]>();

  try {
    const projects = readdirSync(projectsDir);
    for (const project of projects) {
      const projectDir = join(projectsDir, project);
      const files = readdirSync(projectDir).filter((f) =>
        f.endsWith(".jsonl")
      );

      for (const file of files) {
        const filePath = join(projectDir, file);
        const messages = parseJsonlFile(filePath);
        const sessionId = file.replace(".jsonl", "");
        sessions.set(`${project}/${sessionId}`, messages);
      }
    }
  } catch {
    // Directory may not exist on fresh installs
  }

  return sessions;
}

When to use: Any time you need to analyze Claude Code usage. This is the foundational pattern.
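
The parser above stops at extraction; the "aggregate by time period" step could look like the following. A minimal sketch that buckets by UTC calendar day, which is an assumption (a real tool may prefer the local timezone); `UsageRecord` mirrors the relevant fields of ParsedMessage, and `aggregateByDay` is a hypothetical helper.

```typescript
// Mirrors the fields of ParsedMessage that matter for aggregation.
interface UsageRecord {
  timestamp: Date;
  usage?: {
    input_tokens: number;
    output_tokens: number;
    cache_creation_input_tokens?: number;
    cache_read_input_tokens?: number;
  };
}

// Hypothetical helper: total tokens per UTC calendar day (YYYY-MM-DD).
function aggregateByDay(records: UsageRecord[]): Map<string, number> {
  const byDay = new Map<string, number>();
  for (const r of records) {
    // Skip entries without usage, and entries whose timestamp failed to parse.
    if (!r.usage || Number.isNaN(r.timestamp.getTime())) continue;
    const day = r.timestamp.toISOString().slice(0, 10);
    const total =
      r.usage.input_tokens +
      r.usage.output_tokens +
      (r.usage.cache_creation_input_tokens ?? 0) +
      (r.usage.cache_read_input_tokens ?? 0);
    byDay.set(day, (byDay.get(day) ?? 0) + total);
  }
  return byDay;
}

// Made-up records for illustration.
const records: UsageRecord[] = [
  { timestamp: new Date("2026-03-01T23:30:00Z"), usage: { input_tokens: 100, output_tokens: 50 } },
  { timestamp: new Date("2026-03-02T00:10:00Z"), usage: { input_tokens: 200, output_tokens: 80, cache_read_input_tokens: 1000 } },
  { timestamp: new Date("2026-03-02T09:00:00Z") }, // user prompt, no usage
  { timestamp: new Date("not a date"), usage: { input_tokens: 1, output_tokens: 1 } },
];
console.log(Array.from(aggregateByDay(records).entries()));
```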

Gotchas:

Pattern 2: Rolling Window Calculator

Computing remaining budget in the 5-hour billing window.

interface WindowStatus {
  windowStart: Date;
  windowEnd: Date;
  tokensUsed: number;
  tokenLimit: number;
  remainingTokens: number;
  percentUsed: number;
  burnRate: number; // tokens per minute
  projectedExhaustion: Date | null;
  minutesUntilReset: number;
}

type SubscriptionTier = "pro" | "max5" | "max20";

const TIER_LIMITS: Record<SubscriptionTier, number> = {
  pro: 44_000,
  max5: 88_000,
  max20: 220_000,
};

const WINDOW_DURATION_MS = 5 * 60 * 60 * 1000;

function calculateWindowStatus(
  messages: ParsedMessage[],
  tier: SubscriptionTier,
  now: Date = new Date()
): WindowStatus {
  const windowStart = new Date(now.getTime() - WINDOW_DURATION_MS);
  const windowEnd = now;
  const tokenLimit = TIER_LIMITS[tier];

  // Sum tokens in current window (only from non-sidechain messages with usage)
  const windowMessages = messages.filter(
    (m) =>
      m.timestamp >= windowStart &&
      m.timestamp <= windowEnd &&
      m.usage &&
      !m.isSidechain
  );

  let tokensUsed = 0;
  for (const msg of windowMessages) {
    if (msg.usage) {
      tokensUsed +=
        msg.usage.input_tokens +
        msg.usage.output_tokens +
        (msg.usage.cache_creation_input_tokens ?? 0) +
        (msg.usage.cache_read_input_tokens ?? 0);
    }
  }

  const remainingTokens = Math.max(0, tokenLimit - tokensUsed);
  const percentUsed = (tokensUsed / tokenLimit) * 100;

  // Calculate burn rate from last 30 minutes of activity
  const recentWindow = new Date(now.getTime() - 30 * 60 * 1000);
  const recentMessages = windowMessages.filter(
    (m) => m.timestamp >= recentWindow
  );
  let recentTokens = 0;
  for (const msg of recentMessages) {
    if (msg.usage) {
      recentTokens +=
        msg.usage.input_tokens +
        msg.usage.output_tokens +
        (msg.usage.cache_creation_input_tokens ?? 0) +
        (msg.usage.cache_read_input_tokens ?? 0);
    }
  }

  const minutesElapsed = Math.max(
    1,
    (now.getTime() - recentWindow.getTime()) / 60_000
  );
  const burnRate = recentTokens / minutesElapsed;

  // Project when tokens will run out
  let projectedExhaustion: Date | null = null;
  if (burnRate > 0 && remainingTokens > 0) {
    const minutesLeft = remainingTokens / burnRate;
    projectedExhaustion = new Date(now.getTime() + minutesLeft * 60_000);
  }

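  // Minutes until the oldest message ages out of the window and frees budget.
  // With no usage in the window there is nothing to wait for, so report 0.
  const oldestInWindow = windowMessages.reduce(
    (oldest, m) => (m.timestamp < oldest ? m.timestamp : oldest),
    now
  );
  const minutesUntilReset =
    windowMessages.length === 0
      ? 0
      : (oldestInWindow.getTime() + WINDOW_DURATION_MS - now.getTime()) / 60_000;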

  return {
    windowStart,
    windowEnd,
    tokensUsed,
    tokenLimit,
    remainingTokens,
    percentUsed,
    burnRate,
    projectedExhaustion,
    minutesUntilReset,
  };
}

When to use: Real-time monitoring, shell status lines, burn rate alerts.
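
For the status-line and alert use cases, the window status has to be reduced to something glanceable. A minimal sketch, assuming a subset of the WindowStatus fields above; the thresholds (critical at 90% used, warn when projected exhaustion precedes the reset) and the helper names are hypothetical choices.

```typescript
// Subset of WindowStatus needed for a one-line summary.
interface StatusInput {
  percentUsed: number;
  burnRate: number; // tokens per minute
  remainingTokens: number;
  minutesUntilReset: number;
}

type AlertLevel = "ok" | "warn" | "critical";

// Hypothetical thresholds: critical at 90% used or more; warn when the
// budget is projected to run out before the window resets.
function alertLevel(s: StatusInput): AlertLevel {
  if (s.percentUsed >= 90) return "critical";
  const minutesLeft = s.burnRate > 0 ? s.remainingTokens / s.burnRate : Infinity;
  return minutesLeft < s.minutesUntilReset ? "warn" : "ok";
}

function formatStatusLine(s: StatusInput): string {
  const pct = Math.min(100, Math.round(s.percentUsed));
  const reset = Math.max(0, Math.round(s.minutesUntilReset));
  return `[${alertLevel(s)}] ${pct}% used | ${Math.round(s.burnRate)} tok/min | reset in ${reset}m`;
}

// Illustrative numbers: 22,000 tokens left at 200 tok/min lasts 110 minutes,
// which is less than the 180 minutes until reset.
console.log(
  formatStatusLine({
    percentUsed: 50,
    burnRate: 200,
    remainingTokens: 22_000,
    minutesUntilReset: 180,
  })
);
```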

Gotchas:

Pattern 3: Cost Attribution by Activity Type

The most valuable and least-implemented pattern. Categorize token spend by what you were actually doing.

type ActivityType =
  | "file_reading"    // Read, Glob, Grep
  | "code_writing"    // Edit, Write
  | "code_execution"  // Bash
  | "conversation"    // Pure text exchange
  | "planning"        // TodoWrite, agent orchestration
  | "search"          // WebSearch, WebFetch
  | "git_operations"  // git commands via Bash
  | "unknown";

interface ActivityCost {
  type: ActivityType;
  tokenCount: number;
  estimatedCost: number;
  messageCount: number;
  percentage: number;
}

function classifyToolActivity(toolName: string, toolInput?: Record<string, unknown>): ActivityType {
  // Direct tool classification
  const toolMap: Record<string, ActivityType> = {
    Read: "file_reading",
    Glob: "file_reading",
    Grep: "file_reading",
    Edit: "code_writing",
    Write: "code_writing",
    NotebookEdit: "code_writing",
    Bash: "code_execution",
    TodoWrite: "planning",
    TodoRead: "planning",
    WebSearch: "search",
    WebFetch: "search",
    Agent: "planning",
  };

  if (toolMap[toolName]) {
    // Special case: Bash commands that are git operations
    if (toolName === "Bash" && toolInput) {
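      // Treat a command as git work when `git` starts the command line or
      // follows a shell separator (&&, ||, ;, |), e.g. "cd repo && git status".
      const command =
        typeof toolInput.command === "string" ? toolInput.command : "";
      if (/(^|&&|\|\||[;|])\s*git\s/.test(command)) {
        return "git_operations";
      }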
    }
    return toolMap[toolName];
  }

  return "unknown";
}

function attributeCosts(
  messages: ParsedMessage[],
  pricing: Record<string, number>
): ActivityCost[] {
  // Group messages into "turns" — each user message starts a turn
  // that includes all subsequent tool calls and assistant responses
  const turns: { activity: ActivityType; tokens: number }[] = [];
  let currentActivity: ActivityType = "conversation";
  let currentTokens = 0;

  for (const msg of messages) {
    if (msg.type === "user") {
      // Save previous turn
      if (currentTokens > 0) {
        turns.push({ activity: currentActivity, tokens: currentTokens });
      }
      currentActivity = "conversation";
      currentTokens = 0;
    }

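    if (msg.type === "tool_use" && msg.toolName) {
      // Last tool wins: a turn is labeled by the final tool it invoked.
      // ParsedMessage carries no tool input, so the Bash -> git_operations
      // special case in classifyToolActivity never fires from here.
      currentActivity = classifyToolActivity(msg.toolName);
    }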

    if (msg.usage) {
      const total =
        msg.usage.input_tokens +
        msg.usage.output_tokens +
        (msg.usage.cache_creation_input_tokens ?? 0) +
        (msg.usage.cache_read_input_tokens ?? 0);
      currentTokens += total;
    }
  }

  // Don't forget the last turn
  if (currentTokens > 0) {
    turns.push({ activity: currentActivity, tokens: currentTokens });
  }

  // Aggregate by activity type
  const totals = new Map<ActivityType, { tokens: number; count: number }>();
  let grandTotal = 0;

  for (const turn of turns) {
    const existing = totals.get(turn.activity) ?? { tokens: 0, count: 0 };
    existing.tokens += turn.tokens;
    existing.count += 1;
    totals.set(turn.activity, existing);
    grandTotal += turn.tokens;
  }

  return Array.from(totals.entries())
    .map(([type, data]) => ({
      type,
      tokenCount: data.tokens,
      estimatedCost: (data.tokens / 1_000_000) * (pricing.blendedRate ?? 5.0),
      messageCount: data.count,
      percentage: grandTotal > 0 ? (data.tokens / grandTotal) * 100 : 0,
    }))
    .sort((a, b) => b.tokenCount - a.tokenCount);
}

When to use: Understanding why you’re burning tokens. The most actionable pattern.
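
In attributeCosts above, all of a turn’s tokens are credited to whichever tool was called last in that turn. One alternative is to split the turn’s tokens evenly across its tool calls. This is a hypothetical sketch, not how any surveyed tool works, and an even split is still crude: reading one huge file costs more than a small edit.

```typescript
type Activity =
  | "file_reading"
  | "code_writing"
  | "code_execution"
  | "conversation";

// Hypothetical alternative to "last tool wins": divide a turn's tokens
// evenly across the tool calls made during that turn.
function splitTurnTokens(
  activities: Activity[],
  tokens: number
): Map<Activity, number> {
  const shares = new Map<Activity, number>();
  if (activities.length === 0) {
    shares.set("conversation", tokens); // no tools: pure text exchange
    return shares;
  }
  const perCall = tokens / activities.length;
  for (const activity of activities) {
    shares.set(activity, (shares.get(activity) ?? 0) + perCall);
  }
  return shares;
}

// A turn with three reads followed by one edit, 8,000 tokens in total.
const turnShares = splitTurnTokens(
  ["file_reading", "file_reading", "file_reading", "code_writing"],
  8_000
);
console.log(Array.from(turnShares.entries()));
```

With "last tool wins", the whole 8,000 tokens would be labeled code_writing; here 6,000 go to file_reading and 2,000 to code_writing.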

Gotchas:

Pattern 4: Session Comparison Dashboard

Compare sessions to find anomalies — the “which session ate my budget?” pattern.

interface SessionSummary {
  id: string;
  project: string;
  startTime: Date;
  endTime: Date;
  durationMinutes: number;
  totalTokens: number;
  inputTokens: number;
  outputTokens: number;
  cacheHitRate: number;        // percentage of input from cache
  toolCallCount: number;
  userMessageCount: number;
  tokensPerMinute: number;
  costEstimate: number;
  topTools: { name: string; count: number }[];
  model: string;
}

function summarizeSession(
  id: string,
  project: string,
  messages: ParsedMessage[]
): SessionSummary {
  if (messages.length === 0) {
    return {
      id, project,
      startTime: new Date(), endTime: new Date(),
      durationMinutes: 0, totalTokens: 0,
      inputTokens: 0, outputTokens: 0,
      cacheHitRate: 0, toolCallCount: 0,
      userMessageCount: 0, tokensPerMinute: 0,
      costEstimate: 0, topTools: [], model: "unknown",
    };
  }

  const sorted = [...messages].sort(
    (a, b) => a.timestamp.getTime() - b.timestamp.getTime()
  );
  const startTime = sorted[0].timestamp;
  const endTime = sorted[sorted.length - 1].timestamp;
  const durationMinutes = Math.max(
    1,
    (endTime.getTime() - startTime.getTime()) / 60_000
  );

  let inputTokens = 0;
  let outputTokens = 0;
  let cacheReadTokens = 0;
  let cacheCreationTokens = 0;
  const toolCounts = new Map<string, number>();
  let userMessageCount = 0;
  let model = "unknown";

  for (const msg of messages) {
    if (msg.type === "user") userMessageCount++;
    if (msg.type === "tool_use" && msg.toolName) {
      toolCounts.set(msg.toolName, (toolCounts.get(msg.toolName) ?? 0) + 1);
    }
    if (msg.usage) {
      inputTokens += msg.usage.input_tokens;
      outputTokens += msg.usage.output_tokens;
      cacheReadTokens += msg.usage.cache_read_input_tokens ?? 0;
      cacheCreationTokens += msg.usage.cache_creation_input_tokens ?? 0;
    }
    if (msg.model) model = msg.model;
  }

  const totalInput = inputTokens + cacheReadTokens + cacheCreationTokens;
  const totalTokens = totalInput + outputTokens;
  const cacheHitRate = totalInput > 0
    ? (cacheReadTokens / totalInput) * 100
    : 0;

  const topTools = Array.from(toolCounts.entries())
    .map(([name, count]) => ({ name, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, 5);

  return {
    id,
    project,
    startTime,
    endTime,
    durationMinutes,
    totalTokens,
    inputTokens,
    outputTokens,
    cacheHitRate,
    toolCallCount: Array.from(toolCounts.values()).reduce((s, c) => s + c, 0),
    userMessageCount,
    tokensPerMinute: totalTokens / durationMinutes,
    costEstimate: totalTokens * 0.000005, // rough blended rate
    topTools,
    model,
  };
}

// Find the most expensive sessions in a time period
function findTopSessions(
  sessions: Map<string, ParsedMessage[]>,
  since: Date,
  limit: number = 10
): SessionSummary[] {
  const summaries: SessionSummary[] = [];

  for (const [key, messages] of sessions) {
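    // Keys are "project/sessionId". Split on the last "/" so project names
    // that themselves contain slashes stay intact.
    const sep = key.lastIndexOf("/");
    const project = sep >= 0 ? key.slice(0, sep) : "unknown";
    const sessionId = sep >= 0 ? key.slice(sep + 1) : key;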
    const filtered = messages.filter((m) => m.timestamp >= since);
    if (filtered.length === 0) continue;

    summaries.push(summarizeSession(sessionId, project, filtered));
  }

  return summaries
    .sort((a, b) => b.totalTokens - a.totalTokens)
    .slice(0, limit);
}

When to use: Post-mortem analysis after hitting a limit. “Which session ate my budget?”

Connection to Pattern 3: Combine with activity attribution to answer “which session ate my budget AND why?”
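
The "find anomalies" half of this pattern can be sketched separately from the ranking above. The following is a minimal illustration, not part of any shipped tool: `SessionStats` is a hypothetical trimmed-down view of `SessionSummary`, and the 3x-median threshold is an arbitrary assumption chosen only to make the idea concrete.

```typescript
// Hypothetical, trimmed-down view of SessionSummary for this sketch.
interface SessionStats {
  id: string;
  totalTokens: number;
}

interface Anomaly {
  id: string;
  totalTokens: number;
  ratioToMedian: number;
}

// Median of a non-empty numeric array (does not mutate the input).
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Flag sessions whose token total exceeds `threshold` times the median.
// The default of 3 is an assumption, not a recommended value.
function findAnomalousSessions(
  sessions: SessionStats[],
  threshold = 3
): Anomaly[] {
  if (sessions.length === 0) return [];
  const med = median(sessions.map((s) => s.totalTokens));
  if (med <= 0) return [];
  return sessions
    .map((s) => ({
      id: s.id,
      totalTokens: s.totalTokens,
      ratioToMedian: s.totalTokens / med,
    }))
    .filter((a) => a.ratioToMedian > threshold)
    .sort((a, b) => b.ratioToMedian - a.ratioToMedian);
}
```

The median is used instead of the mean so that one runaway session does not raise the baseline it is being compared against.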

Pattern 5: OpenTelemetry Collection Pipeline

For teams that want production-grade observability.

// docker-compose.yml equivalent configuration
interface OtelPipelineConfig {
  // Step 1: Configure Claude Code to export
  claudeCodeEnv: {
    CLAUDE_CODE_ENABLE_TELEMETRY: "1";
    OTEL_METRICS_EXPORTER: "otlp";
    OTEL_LOGS_EXPORTER: "otlp";
    OTEL_EXPORTER_OTLP_PROTOCOL: "grpc";
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4317";
    OTEL_RESOURCE_ATTRIBUTES: "department=engineering,team.id=platform";
  };

  // Step 2: OTEL Collector receives and routes data
  collector: {
    receivers: ["otlp/grpc:4317", "otlp/http:4318"];
    processors: ["batch", "memory_limiter"];
    exporters: ["prometheus", "loki"];
  };

  // Step 3: Prometheus scrapes metrics
  prometheus: {
    scrapeInterval: "15s";
    targets: ["otel-collector:8889"];
  };

  // Step 4: Grafana visualizes everything
  grafana: {
    dataSources: ["prometheus", "loki"];
    dashboards: ["claude-code-usage", "team-comparison"];
  };
}

// Useful PromQL queries for Grafana dashboards
const GRAFANA_QUERIES = {
  // Token usage rate per user. `by` modifies an aggregation operator,
  // so rate() has to be wrapped in sum().
  tokenRate: `sum by (user_account_uuid, type) (rate(claude_code_token_usage_total[5m]))`,

  // Cost per team per day
  dailyCostByTeam: `sum(increase(claude_code_cost_usage_total[24h])) by (department)`,

  // Cache hit rate
  cacheHitRate: `
    sum(rate(claude_code_token_usage_total{type="cacheRead"}[1h]))
    /
    sum(rate(claude_code_token_usage_total{type=~"input|cacheRead|cacheCreation"}[1h]))
    * 100
  `,

  // Most active sessions
  activeSessionTokens: `topk(10, sum(claude_code_token_usage_total) by (session_id))`,

  // Tool usage frequency
  toolFrequency: `sum(increase(claude_code_tool_result_total[24h])) by (tool_name)`,

  // Average session duration
  avgSessionDuration: `avg(claude_code_active_time_total) by (user_account_uuid)`,
};

When to use: Teams of 5+ developers. Enterprise environments. When you need historical trending, alerting, and cross-developer comparison.

Gotchas:


Example 1: Quick Token Count for Current Session

import { readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

function getCurrentSessionTokens(): { input: number; output: number; cost: number } {
  const transcriptsDir = join(homedir(), ".claude", "transcripts");
  const files = readdirSync(transcriptsDir)
    .map((f) => ({
      name: f,
      mtime: statSync(join(transcriptsDir, f)).mtime.getTime(),
    }))
    .sort((a, b) => b.mtime - a.mtime);

  if (files.length === 0) return { input: 0, output: 0, cost: 0 };

  const latest = readFileSync(join(transcriptsDir, files[0].name), "utf-8");
  let totalInput = 0;
  let totalOutput = 0;

  for (const line of latest.trim().split("\n")) {
    try {
      const entry = JSON.parse(line);
      if (entry.message?.usage) {
        const u = entry.message.usage;
        // Default every field to 0 so a partial usage object cannot turn the totals into NaN
        totalInput += (u.input_tokens ?? 0) + (u.cache_read_input_tokens ?? 0)
                      + (u.cache_creation_input_tokens ?? 0);
        totalOutput += u.output_tokens ?? 0;
      }
    } catch { continue; }
  }

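  // Rough upper bound at $3/M input and $15/M output. Cache reads are counted
  // at the full input rate here, although they are priced lower (see Example 2).
  const cost = (totalInput * 3 + totalOutput * 15) / 1_000_000;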
  return { input: totalInput, output: totalOutput, cost };
}

Example 2: Cache Efficiency Report

function cacheEfficiencyReport(messages: ParsedMessage[]): void {
  let freshInput = 0;
  let cacheRead = 0;
  let cacheCreation = 0;

  for (const msg of messages) {
    if (!msg.usage) continue;
    freshInput += msg.usage.input_tokens;
    cacheRead += msg.usage.cache_read_input_tokens ?? 0;
    cacheCreation += msg.usage.cache_creation_input_tokens ?? 0;
  }

  const totalInput = freshInput + cacheRead + cacheCreation;
  const hitRate = totalInput > 0 ? (cacheRead / totalInput) * 100 : 0;

  // Cost comparison: actual vs. if everything were fresh
  const actualCost = (freshInput * 3 + cacheRead * 0.3 + cacheCreation * 3.75) / 1_000_000;
  const freshCost = (totalInput * 3) / 1_000_000;
  const savings = freshCost - actualCost;

  console.log(`Cache hit rate: ${hitRate.toFixed(1)}%`);
  console.log(`Actual cost:    $${actualCost.toFixed(4)}`);
  console.log(`Without cache:  $${freshCost.toFixed(4)}`);
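  // Guard against division by zero when there were no input tokens at all
  const savingsPct = freshCost > 0 ? (savings / freshCost) * 100 : 0;
  console.log(`Savings:        $${savings.toFixed(4)} (${savingsPct.toFixed(0)}%)`);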
}

Example 3: Tool Usage Frequency Heatmap Data

function toolUsageByHour(messages: ParsedMessage[]): Map<number, Map<string, number>> {
  const heatmap = new Map<number, Map<string, number>>();

  for (const msg of messages) {
    if (msg.type !== "tool_use" || !msg.toolName) continue;
    const hour = msg.timestamp.getHours();

    if (!heatmap.has(hour)) heatmap.set(hour, new Map());
    const hourMap = heatmap.get(hour)!;
    hourMap.set(msg.toolName, (hourMap.get(msg.toolName) ?? 0) + 1);
  }

  return heatmap;
}

// Output: which tools you use at which hours
// Reveals patterns like: "I do most file reading in the morning,
// most Bash execution in the afternoon"

Example 4: DuckDB Analysis of JSONL Files

-- Load all JSONL files from Claude Code projects directory
-- DuckDB can query JSONL files directly without importing
SELECT
    json_extract_string(line, '$.type') as event_type,
    json_extract_string(line, '$.timestamp') as ts,
    json_extract_string(line, '$.tool_name') as tool,
    json_extract(line, '$.message.usage.input_tokens')::int as input_tokens,
    json_extract(line, '$.message.usage.output_tokens')::int as output_tokens,
    json_extract(line, '$.message.usage.cache_read_input_tokens')::int as cache_read
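-- read_ndjson_objects yields one JSON value per line; read_text would return
-- each whole file as a single row and has no per-line column
FROM read_ndjson_objects('~/.claude/projects/**/*.jsonl') AS t(line)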
WHERE json_extract_string(line, '$.message.usage') IS NOT NULL
ORDER BY ts DESC
LIMIT 100;

-- Aggregate cost by day
SELECT
    date_trunc('day', json_extract_string(line, '$.timestamp')::timestamp) as day,
    sum(json_extract(line, '$.message.usage.input_tokens')::int) as total_input,
    sum(json_extract(line, '$.message.usage.output_tokens')::int) as total_output,
    sum(json_extract(line, '$.message.usage.cache_read_input_tokens')::int) as cache_reads,
    round(
        (sum(json_extract(line, '$.message.usage.input_tokens')::int) * 3.0
         + sum(json_extract(line, '$.message.usage.output_tokens')::int) * 15.0
         + sum(COALESCE(json_extract(line, '$.message.usage.cache_read_input_tokens')::int, 0)) * 0.3
        ) / 1000000.0, 4
    ) as estimated_cost_usd
-- One row per JSONL line (read_text would return one row per file)
FROM read_ndjson_objects('~/.claude/projects/**/*.jsonl') AS t(line)
WHERE json_extract_string(line, '$.message.usage') IS NOT NULL
GROUP BY day
ORDER BY day DESC;

Example 5: Detect Context Compounding

function detectCompounding(messages: ParsedMessage[]): {
  turnsBeforeCompounding: number;
  compoundingFactor: number;
  recommendation: string;
} {
  // Track input token growth per API call
  const inputTokenSeries: number[] = [];

  for (const msg of messages) {
    if (msg.usage && !msg.isSidechain) {
      const totalInput =
        msg.usage.input_tokens +
        (msg.usage.cache_read_input_tokens ?? 0) +
        (msg.usage.cache_creation_input_tokens ?? 0);
      inputTokenSeries.push(totalInput);
    }
  }

  if (inputTokenSeries.length < 3) {
    return {
      turnsBeforeCompounding: inputTokenSeries.length,
      compoundingFactor: 1,
      recommendation: "Not enough data to detect compounding",
    };
  }

  // Find the point where input tokens start growing >20% per turn
  let compoundingStart = inputTokenSeries.length;
  for (let i = 1; i < inputTokenSeries.length; i++) {
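    const prev = inputTokenSeries[i - 1];
    // Treat a zero-token previous call as "no growth" instead of dividing by zero
    const growth = prev > 0 ? inputTokenSeries[i] / prev : 1;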
    if (growth > 1.2) {
      compoundingStart = i;
      break;
    }
  }

  const first = inputTokenSeries[0];
  const last = inputTokenSeries[inputTokenSeries.length - 1];
  // Avoid Infinity when the first recorded call had zero input tokens
  const compoundingFactor = first > 0 ? last / first : 1;

  let recommendation = "Context growth is manageable.";
  if (compoundingFactor > 5) {
    recommendation =
      `Context grew ${compoundingFactor.toFixed(1)}x. Consider using /clear ` +
      `after turn ${compoundingStart} to reset context.`;
  } else if (compoundingFactor > 3) {
    recommendation =
      `Context grew ${compoundingFactor.toFixed(1)}x. Session is getting expensive. ` +
      `Break complex tasks into shorter sessions.`;
  }

  return {
    turnsBeforeCompounding: compoundingStart,
    compoundingFactor,
    recommendation,
  };
}

Example 6: Shell Status Line Integration

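#!/usr/bin/env zsh
# The %F{...} color codes and RPROMPT used below are zsh features;
# source this file from ~/.zshrc.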

claude_usage() {
  local dir="$HOME/.claude/transcripts"
  local latest=$(ls -t "$dir"/*.jsonl 2>/dev/null | head -1)
  [ -z "$latest" ] && return

  # Extract token counts from the most recent session
  local tokens=$(python3 -c "
import json, sys
total = 0
with open('$latest') as f:
    for line in f:
        try:
            d = json.loads(line)
            u = d.get('message', {}).get('usage', {})
            total += u.get('input_tokens', 0) + u.get('output_tokens', 0)
            total += u.get('cache_read_input_tokens', 0)
        except: pass
print(total)
" 2>/dev/null)

  # Bail out quietly if python3 is missing or produced no output
  [ -z "$tokens" ] && return

  if [ "$tokens" -gt 100000 ]; then
    echo -n "%F{red}${tokens}tok%f"   # Red: high usage
  elif [ "$tokens" -gt 50000 ]; then
    echo -n "%F{yellow}${tokens}tok%f" # Yellow: moderate
  else
    echo -n "%F{green}${tokens}tok%f"  # Green: low
  fi
}

setopt PROMPT_SUBST  # required so $(claude_usage) is re-evaluated for every prompt
RPROMPT='$(claude_usage)'

Example 7: Multi-Platform Session Discovery

// Discover sessions across all supported platforms
interface PlatformSession {
  platform: string;
  path: string;
  lastModified: Date;
  sizeBytes: number;
}

function discoverAllPlatformSessions(): PlatformSession[] {
  const home = homedir();
  const sessions: PlatformSession[] = [];

  const platforms: { name: string; glob: string }[] = [
    { name: "Claude Code", glob: ".claude/projects/**/*.jsonl" },
    { name: "Claude Code Transcripts", glob: ".claude/transcripts/*.jsonl" },
    { name: "Codex CLI", glob: ".codex/sessions/*" },
    { name: "Gemini CLI", glob: ".gemini/tmp/*/chats/*.json" },
    { name: "Amp", glob: ".local/share/amp/threads/*" },
    { name: "OpenClaw", glob: ".openclaw/agents/*" },
    { name: "Factory Droid", glob: ".factory/sessions/*" },
    { name: "Pi", glob: ".pi/agent/sessions/*" },
    { name: "Kimi CLI", glob: ".kimi/sessions/*" },
    { name: "Qwen CLI", glob: ".qwen/projects/*" },
  ];

  for (const platform of platforms) {
    try {
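      // Build the absolute pattern. join() only concatenates path segments;
      // it does not expand wildcards.
      const fullGlob = join(home, platform.glob);
      // Illustrative only: a real implementation would expand fullGlob with a
      // glob library such as fast-glob, stat each match, and push a
      // PlatformSession onto `sessions`. As written, the result stays empty.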
      console.log(`Scanning ${platform.name}: ${fullGlob}`);
    } catch {
      continue;
    }
  }

  return sessions;
}

Example 8: Weekly Budget Optimizer

interface OptimizationSuggestion {
  category: string;
  currentCost: number;
  projectedSavings: number;
  action: string;
  confidence: "high" | "medium" | "low";
}

function generateOptimizations(
  sessions: SessionSummary[]
): OptimizationSuggestion[] {
  const suggestions: OptimizationSuggestion[] = [];

  // Check for long sessions that should be split
  const longSessions = sessions.filter((s) => s.durationMinutes > 60);
  if (longSessions.length > 0) {
    const avgCost = longSessions.reduce((s, x) => s + x.costEstimate, 0) / longSessions.length;
    suggestions.push({
      category: "Session Length",
      currentCost: avgCost,
      projectedSavings: avgCost * 0.4, // 40% savings from splitting
      action: `Split ${longSessions.length} sessions >60min. Use /clear between subtasks.`,
      confidence: "high",
    });
  }

  // Check for low cache hit rates
  const lowCacheSessions = sessions.filter((s) => s.cacheHitRate < 30);
  if (lowCacheSessions.length > 0) {
    suggestions.push({
      category: "Cache Efficiency",
      currentCost: lowCacheSessions.reduce((s, x) => s + x.costEstimate, 0),
      projectedSavings: lowCacheSessions.reduce((s, x) => s + x.costEstimate, 0) * 0.5,
      action: "Improve CLAUDE.md to reduce context re-sending. Use /compact more often.",
      confidence: "medium",
    });
  }

  // Check for excessive file reading
  for (const session of sessions) {
    const readCalls = session.topTools.find((t) => t.name === "Read");
    if (readCalls && readCalls.count > 20) {
      suggestions.push({
        category: "File Reading",
        currentCost: session.costEstimate * 0.3,
        projectedSavings: session.costEstimate * 0.15,
        action: `Session ${session.id}: ${readCalls.count} file reads. Use Grep/Glob to narrow before reading.`,
        confidence: "medium",
      });
    }
  }

  return suggestions.sort((a, b) => b.projectedSavings - a.projectedSavings);
}

Despite 15+ tools in the ecosystem, significant gaps remain.

Gap 1: No Unified Cross-Platform View

tokscale supports 15+ platforms but still has limitations. No tool today can answer: “How many total tokens did I consume across Claude Code, Cursor, and Copilot this week, and what was the blended cost?” Each platform uses different units, pricing models, and data formats.

Gap 2: Activity Attribution Is Primitive

Every tool can tell you how many tokens you used. Almost none can tell you why. The Pattern 3 code above (activity classification) doesn’t exist in any shipped product. When a developer hits their limit, they need to know: “Was it the large codebase read, the long conversation, or the repeated failed builds?”

Gap 3: No Optimization Suggestions

The tools report. None prescribe. An ideal system would say: “Your session at 2:15 PM consumed 40% of your daily budget. 70% of that was re-reading files you’d already seen. Next time, use /clear after the first task and start a fresh session for the second.”

Gap 4: No Proactive Warnings

The Claude Code Usage Monitor’s ML predictions are the closest thing to proactive warnings. But even it can only predict based on past patterns — it can’t intervene. Nobody has built the equivalent of a “low battery” notification that fires at 80% usage with suggestions for how to stretch the remaining 20%.
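
A "low battery" check of this kind is simple to sketch. The function below is a hypothetical illustration: the name `checkBudget`, the 80% and 95% thresholds, and the idea of projecting time-to-exhaustion from a recent tokens-per-minute rate are all assumptions, and the limit passed in would have to come from the user's plan rather than from any official API.

```typescript
type WarningLevel = "ok" | "warning" | "critical";

interface BudgetStatus {
  level: WarningLevel;
  percentUsed: number;
  tokensRemaining: number;
  // Only present when the recent burn rate is positive
  minutesUntilExhausted?: number;
}

// Classify current usage against a limit and project when it runs out.
// warnAt / criticalAt are fractions of the limit (assumed defaults).
function checkBudget(
  used: number,
  limit: number,
  tokensPerMinute: number,
  warnAt = 0.8,
  criticalAt = 0.95
): BudgetStatus {
  if (limit <= 0) throw new RangeError("limit must be positive");

  const fraction = used / limit;
  const tokensRemaining = Math.max(0, limit - used);

  let level: WarningLevel = "ok";
  if (fraction >= criticalAt) level = "critical";
  else if (fraction >= warnAt) level = "warning";

  const status: BudgetStatus = {
    level,
    percentUsed: fraction * 100,
    tokensRemaining,
  };
  if (tokensPerMinute > 0) {
    status.minutesUntilExhausted = tokensRemaining / tokensPerMinute;
  }
  return status;
}
```

With the approximate Pro figure quoted earlier (44,000 tokens per window), `checkBudget(36_000, 44_000, 400)` reports a `"warning"` with 8,000 tokens and 20 minutes remaining at the current rate.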

Gap 5: No Team Comparison Without Enterprise

Claude Code’s official analytics are only available on Team and Enterprise plans. Individual Pro/Max subscribers — who arguably need this data most — have no official visibility. The OTEL export works for individuals, but requires running infrastructure.

Gap 6: Billing Window Drift Is Invisible

The 5-hour rolling window means your budget constantly refreshes as old usage falls off. But no tool clearly shows: “In 23 minutes, your oldest expensive session will expire from the window, freeing up 15,000 tokens.” This is the equivalent of showing a cell phone data counter without showing when the billing period resets.
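
The missing "what frees up next" view can be sketched as follows. This assumes a pure sliding window in which each usage event ages out exactly five hours after it occurred; the provider's real quota accounting may work differently, so treat `forecastWindow` and its types as a hypothetical model rather than a description of Anthropic's behavior.

```typescript
interface UsageEvent {
  timestamp: Date;
  tokens: number;
}

interface ExpiryForecast {
  tokensInWindow: number;
  // The oldest event still counted, and when it ages out
  nextExpiry?: { inMinutes: number; tokens: number };
}

// Assumed window length: 5 hours, expressed in milliseconds.
const WINDOW_MS = 5 * 60 * 60 * 1000;

function forecastWindow(events: UsageEvent[], now: Date): ExpiryForecast {
  const nowMs = now.getTime();
  const cutoff = nowMs - WINDOW_MS;

  // Keep only events inside the window, oldest first
  const active = events
    .filter((e) => e.timestamp.getTime() > cutoff && e.timestamp.getTime() <= nowMs)
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());

  const tokensInWindow = active.reduce((sum, e) => sum + e.tokens, 0);
  if (active.length === 0) return { tokensInWindow };

  const oldest = active[0];
  const inMinutes = (oldest.timestamp.getTime() + WINDOW_MS - nowMs) / 60_000;
  return { tokensInWindow, nextExpiry: { inMinutes, tokens: oldest.tokens } };
}
```

Feeding it an event of 15,000 tokens logged 4 hours 37 minutes ago produces the "in 23 minutes, 15,000 tokens free up" message described above.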

Gap 7: Cross-Session Context Recommendations

If a developer works on the same project across 5 sessions per day, they re-establish context each time. No tool tracks cross-session patterns to suggest: “You’ve spent 12,000 tokens on context setup for project X today. Consider using a persistent CLAUDE.md file to reduce this.”
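
One way to approximate that figure is to treat the input tokens of each session's first API call as its context-setup cost and sum them per project. That proxy is an assumption of this sketch, as are the `SessionStart` shape and the function name; a real tool would need a better definition of "setup".

```typescript
// Hypothetical record: one entry per session, carrying the total input
// tokens of that session's first API call.
interface SessionStart {
  project: string;
  sessionId: string;
  firstCallInputTokens: number;
}

interface SetupCost {
  sessions: number;
  setupTokens: number;
}

// Sum first-call input tokens per project as a rough proxy for the cost
// of re-establishing context at the start of every session.
function contextSetupByProject(starts: SessionStart[]): Map<string, SetupCost> {
  const byProject = new Map<string, SetupCost>();
  for (const s of starts) {
    const entry = byProject.get(s.project) ?? { sessions: 0, setupTokens: 0 };
    entry.sessions += 1;
    entry.setupTokens += s.firstCallInputTokens;
    byProject.set(s.project, entry);
  }
  return byProject;
}
```

Five sessions on the same project at 2,400 setup tokens each would surface as the 12,000-token total used in the example message above.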


ncdu is a terminal-based disk usage analyzer. DaisyDisk is its GUI equivalent for macOS. Both solve the same problem: “Where is my disk space going?” They share design principles that apply perfectly to token usage:

Principle 1: Hierarchical Drill-Down

ncdu shows disk usage as a tree: directory > subdirectory > file. You can drill down to find the single file consuming the most space.

The token equivalent:

Week > Day > Session > Turn > API Call
  |      |      |        |       |
  |      |      |        |       └── 2,340 tokens (Read: package.json)
  |      |      |        └── 8,200 tokens (3 API calls)
  |      |      └── 45,000 tokens (12 turns, 47 tool calls)
  |      └── 89,000 tokens (3 sessions)
  └── 312,000 tokens (5 days)
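
A tree like this can be built by rolling flat per-turn records up through each level. The sketch below stops at the turn level and uses hypothetical names (`TurnRecord`, `UsageNode`, `buildUsageTree`); it assumes the records have already been extracted from the session logs.

```typescript
interface UsageNode {
  label: string;
  tokens: number;
  children: UsageNode[];
}

// Hypothetical flat input: one record per conversation turn.
interface TurnRecord {
  day: string; // e.g. "2026-03-10"
  sessionId: string;
  turn: number;
  tokens: number;
}

// Find a child by label, creating it if it does not exist yet.
function getOrCreateChild(parent: UsageNode, label: string): UsageNode {
  let child = parent.children.find((c) => c.label === label);
  if (!child) {
    child = { label, tokens: 0, children: [] };
    parent.children.push(child);
  }
  return child;
}

// Sort every level largest-first, the way ncdu orders directories.
function sortBySize(node: UsageNode): void {
  node.children.sort((a, b) => b.tokens - a.tokens);
  for (const child of node.children) sortBySize(child);
}

// Roll records up into week > day > session > turn.
function buildUsageTree(records: TurnRecord[]): UsageNode {
  const root: UsageNode = { label: "week", tokens: 0, children: [] };
  for (const r of records) {
    const day = getOrCreateChild(root, r.day);
    const session = getOrCreateChild(day, r.sessionId);
    const turn = getOrCreateChild(session, `turn ${r.turn}`);
    // Every ancestor accumulates the same tokens
    for (const node of [root, day, session, turn]) node.tokens += r.tokens;
  }
  sortBySize(root);
  return root;
}
```

Because each parent holds the sum of its children, drilling down is just a matter of following the first child at every level.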

Principle 2: Relative Sizing

DaisyDisk uses a sunburst diagram where segment size represents proportional usage. The token equivalent would show sessions as proportional blocks — immediately revealing the “big spender” sessions.

┌─────────────────────────────────────────────────┐
│ Weekly Token Budget: 312K / 500K (62%)          │
├──────────────────────┬──────────┬──────┬────────┤
│ Mon: 89K (29%)       │ Tue: 72K │ Wed  │ Thu    │
│ ┌──────────┬───────┐ │ (23%)    │ 65K  │ 86K    │
│ │session-1 │ s-2   │ │          │ (21%)│ (28%)  │
│ │ 45K      │ 32K   │ │          │      │        │
│ │ (51%)    │ (36%) │ │          │      │        │
│ └──────────┴───────┘ │          │      │        │
│ s-3: 12K (13%)       │          │      │        │
└──────────────────────┴──────────┴──────┴────────┘

And here is the equivalent diagram translated into Mermaid syntax:

block-beta
  columns 4
  Weekly["Weekly Token Budget: 312K / 500K (62%)"]:4
  block:Mon:1
    columns 2
    TitleMon["Mon: 89K (29%)"]:2
    s1["session-1<br/>45K (51%)"]:1
    s2["s-2<br/>32K (36%)"]:1
    s3["s-3<br/>12K (13%)"]:2
  end
  Tue["Tue: 72K<br/>(23%)"]
  Wed["Wed: 65K<br/>(21%)"]
  Thu["Thu: 86K<br/>(28%)"]

Principle 3: Actionability

ncdu lets you delete files directly from the interface. The token equivalent would offer actions:

Principle 4: Zero-Config

ncdu works with ncdu /path. No configuration, no setup, no accounts. The token equivalent is npx ccusage daily. The best tools already follow this principle.

What the ncdu-for-Tokens Looks Like

Token Usage Analyzer — Week of March 10-17, 2026
Usage: 312,441 / 500,000 tokens (62.5%)

  ████████████████████████████████░░░░░░░░░░░░░░░░░░

  Day         Sessions  Tokens    Cost     Cache%
  ─────────────────────────────────────────────────
  Mon Mar 10       3     89,231   $0.58    45%
  Tue Mar 11       2     72,104   $0.47    62%
  Wed Mar 12       4     65,890   $0.42    71%  ← best cache day
  Thu Mar 13       3     85,216   $0.55    38%  ← worst cache day

  Top Sessions (by token cost):
  ─────────────────────────────────────────────────
  1. atlas/refactor-pipeline    45,221 tok  $0.29  Mon 2:15pm  62min
     → 60% file reading, 25% code writing, 15% conversation
     → Recommendation: Split after file exploration phase

  2. api-mom/fix-auth-bug       38,102 tok  $0.25  Thu 9:30am  45min
     → 40% bash execution, 35% code writing, 25% file reading
     → Low cache rate (22%). Prime context with CLAUDE.md.

  3. pages-plus/new-component   32,445 tok  $0.21  Mon 4:00pm  38min
     → 55% code writing, 30% file reading, 15% conversation
     → Good cache rate (68%). Well-structured session.

  [d]rill down  [s]ort by  [e]xport json  [q]uit

This doesn’t exist yet. The closest is ccusage’s session view combined with Claude Spend’s web dashboard. But nobody has combined hierarchical drill-down, activity attribution, and optimization recommendations into a single tool.
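
The header and progress bar of such a mock-up take only a few lines to render. The helpers below are a sketch with hypothetical names; the block characters and the 50-column default are assumptions chosen to resemble the mock-up, not output from any existing tool.

```typescript
// Render a fixed-width bar: filled blocks for used budget, light blocks
// for the remainder. Usage above the total is clamped to a full bar.
function renderUsageBar(used: number, total: number, width = 50): string {
  if (total <= 0 || width <= 0) return "";
  const fraction = Math.min(Math.max(used / total, 0), 1);
  const filled = Math.round(fraction * width);
  return "█".repeat(filled) + "░".repeat(width - filled);
}

// Format the summary line shown above the bar.
function formatUsageLine(used: number, total: number): string {
  const pct = total > 0 ? (used / total) * 100 : 0;
  return (
    `Usage: ${used.toLocaleString("en-US")} / ` +
    `${total.toLocaleString("en-US")} tokens (${pct.toFixed(1)}%)`
  );
}
```

For example, `renderUsageBar(50, 100, 10)` yields five filled and five empty cells, and `formatUsageLine(312_441, 500_000)` reproduces the "Usage: 312,441 / 500,000 tokens (62.5%)" line from the mock-up.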


The Data Landscape

| Platform | Local Data Format | Location | Token Data | Cost Data | Cache Data |
|---|---|---|---|---|---|
| Claude Code | JSONL | ~/.claude/projects/ | Per-API-call | Calculable | Yes (4 types) |
| Codex CLI | JSONL/JSON | ~/.codex/sessions/ | Per-session | Limited | No |
| Gemini CLI | JSON | ~/.gemini/tmp/*/chats/ | Per-session | Via /stats | Yes (context caching) |
| Cursor | SQLite + API | state.vscdb + API | Via API | Via API | No |
| Copilot | NDJSON + API | Extension logs + API | Org-level | Org-level | No |
| OpenCode | SQLite | ~/.local/share/opencode/ | Per-session | Limited | No |

The Unification Problem

Building a cross-platform analyzer requires solving three problems:

1. Schema Normalization

// Universal token event (normalized across platforms)
interface NormalizedTokenEvent {
  platform: "claude-code" | "codex" | "gemini" | "cursor" | "copilot" | "opencode";
  timestamp: Date;
  sessionId: string;
  project?: string;
  model: string;
  tokens: {
    input: number;
    output: number;
    cacheRead?: number;
    cacheWrite?: number;
    reasoning?: number; // Codex/OpenAI-specific
  };
  cost: {
    amount: number;
    currency: "USD";
    confidence: "exact" | "estimated";
  };
  activity?: {
    type: ActivityType;
    toolName?: string;
  };
}

2. Pricing Normalization

Each platform prices differently. Claude Code charges per-token with cache discounts. Cursor uses a monthly credit system. Copilot is flat-rate for individuals but token-based for enterprises. Gemini CLI is free within limits, then per-token.

interface PricingModel {
  platform: string;
  type: "per-token" | "credit-based" | "flat-rate" | "free-tier-then-token";
  inputRate: number;         // per million tokens
  outputRate: number;        // per million tokens
  cacheReadRate?: number;    // per million tokens
  monthlyCredit?: number;    // USD (Cursor)
  freeLimit?: number;        // tokens (Gemini)
}

const PRICING_MODELS: PricingModel[] = [
  {
    platform: "claude-code",
    type: "per-token",
    inputRate: 3.00,
    outputRate: 15.00,
    cacheReadRate: 0.30,
  },
  {
    platform: "cursor",
    type: "credit-based",
    inputRate: 3.00, // varies by model
    outputRate: 15.00,
    monthlyCredit: 20.00,
  },
  {
    platform: "copilot",
    type: "flat-rate",
    inputRate: 0, // included in subscription
    outputRate: 0,
  },
  {
    platform: "gemini",
    type: "free-tier-then-token",
    inputRate: 1.25,
    outputRate: 5.00,
    freeLimit: 1_500_000, // per day
  },
];
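
Applying a per-token entry from a table like this to a token count is plain arithmetic. The sketch below uses its own minimal types (`TokenCounts`, `Rates`) so that it stands alone; they mirror the rate fields of `PricingModel` but are hypothetical, and credit-based or free-tier plans would need extra logic that is not shown.

```typescript
interface TokenCounts {
  input: number;
  output: number;
  cacheRead?: number;
}

// Rates in USD per million tokens, mirroring the fields used above.
interface Rates {
  inputRate: number;
  outputRate: number;
  cacheReadRate?: number;
}

// Estimate cost in USD. When no cache-read rate is known, cache reads
// fall back to the full input rate, which gives a conservative upper bound.
function estimateCostUsd(tokens: TokenCounts, rates: Rates): number {
  const cacheRead = tokens.cacheRead ?? 0;
  const cacheRate = rates.cacheReadRate ?? rates.inputRate;
  return (
    (tokens.input * rates.inputRate +
      tokens.output * rates.outputRate +
      cacheRead * cacheRate) /
    1_000_000
  );
}

// Blended cost across several platforms is the sum of the per-platform estimates.
function blendedCostUsd(items: { tokens: TokenCounts; rates: Rates }[]): number {
  return items.reduce((sum, i) => sum + estimateCostUsd(i.tokens, i.rates), 0);
}
```

A flat-rate entry with both rates set to 0 contributes nothing to the blended total, which is why subscription cost would have to be tracked separately.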

3. Activity Classification Across Platforms

Each platform uses different tool names. Claude Code calls it “Read” and “Write”. Cursor calls it “file read” and “file edit”. Copilot doesn’t expose tool-level data at all. A cross-platform tool needs a mapping layer.
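
A mapping layer of this kind can start as a nested lookup table. The sketch below uses only the tool names mentioned above and the `file_reading` / `code_writing` category labels that appear elsewhere in this article; the table structure and the `classifyTool` helper are hypothetical, and a real mapping would need many more entries per platform.

```typescript
type Activity = "file_reading" | "code_writing" | "unknown";

// platform -> (platform-specific tool name -> normalized activity)
const TOOL_ACTIVITY_MAP = new Map<string, Map<string, Activity>>([
  [
    "claude-code",
    new Map<string, Activity>([
      ["Read", "file_reading"],
      ["Write", "code_writing"],
    ]),
  ],
  [
    "cursor",
    new Map<string, Activity>([
      ["file read", "file_reading"],
      ["file edit", "code_writing"],
    ]),
  ],
  // Copilot exposes no tool-level data, so it has no entry here.
]);

// Resolve a platform-specific tool name to a normalized activity.
// Unmapped platforms or tools fall through to "unknown".
function classifyTool(platform: string, toolName: string): Activity {
  return TOOL_ACTIVITY_MAP.get(platform)?.get(toolName) ?? "unknown";
}
```

Returning `"unknown"` instead of throwing keeps aggregate reports usable when a platform adds a tool the table has not caught up with.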

Why tokscale Leads

tokscale already supports 15+ platforms because it focuses on the common denominator: token counts. It doesn’t try to normalize activities or provide optimization recommendations — it just counts tokens across platforms. This is the right v1 approach. The question is whether anyone will build v2 with the deeper analysis.


Tool Comparison Matrix

| Feature | ccusage | tokscale | Claude Spend | Usage Monitor | Limit Tracker | Claude Usage Tracker (macOS) |
|---|---|---|---|---|---|---|
| Stars | 11.6K | 1.2K | 236 | 7K | 15 | 1.6K |
| Language | TypeScript | TS + Rust | TypeScript | Python | Python | Swift |
| Interface | CLI | CLI (TUI) | Web (localhost) | Terminal (Rich) | Shell statusline | macOS menu bar |
| Platforms | Claude + 4 more | 15+ platforms | Claude only | Claude only | Claude only | Claude only |
| Real-time | Yes (live mode) | No | No | Yes | Yes | Yes |
| Cost tracking | Yes | Yes | Yes | Yes | Estimate | Yes (API) |
| Cache breakdown | Yes | Yes | Limited | Yes | No | No |
| ML predictions | No | No | No | Yes | No | No |
| MCP integration | Yes | No | No | No | No | No |
| JSON export | Yes | Yes | No | No | No | Yes |
| Activity attribution | No | No | Limited | No | No | No |
| Optimization suggestions | No | No | No | No | No | No |
| Zero config | Yes (npx) | Yes (npx) | Yes (npx) | pip install | uv run | DMG install |
| Offline capable | Yes | Yes | Yes | Yes | Yes | No (needs API) |

Approach Comparison

| Approach | Pros | Cons | Best For |
|---|---|---|---|
| JSONL parsing (ccusage, Claude Spend) | Offline, private, detailed per-API-call data | Only sees local data, no quota awareness | Individual developers, post-mortem analysis |
| API-based (Claude Usage Tracker macOS) | Exact quota percentages from Anthropic | Requires auth, network dependent, less granular | Real-time quota monitoring |
| OpenTelemetry (OTEL pipeline) | Production-grade, supports alerting, team-wide | Requires infrastructure, setup cost | Teams, enterprises |
| Browser extension (Cursor extensions) | Integrated into workflow, real-time | Platform-specific, limited depth | Cursor users specifically |
| Shell integration (limit-tracker) | Always visible, zero overhead | Limited detail, text-only | Developers who live in the terminal |
| Cross-platform (tokscale) | Unified view across tools | Jack-of-all-trades depth | Multi-tool users |

Platform Analytics Comparison

| Capability | Claude Code | Cursor | Copilot | Gemini CLI | Codex CLI |
|---|---|---|---|---|---|
| Official dashboard | Teams/Enterprise only | Yes (teams) | Yes (org admins) | GCP monitoring | No |
| Individual analytics | /cost command only | Settings page | Extension estimates | /stats command | /status command |
| OTEL export | Native, comprehensive | No | No | Native, with GCP | No |
| Local data richness | Excellent (JSONL per-call) | Good (SQLite) | Limited (logs) | Good (JSON) | Moderate |
| Third-party ecosystem | 10+ tools | 5+ tools | 3+ tools | 1-2 tools | 1-2 tools |
| Per-request token data | Yes | Via API | No (org aggregate) | Yes | Limited |
| Cache token tracking | Yes (4 types) | No | No | Yes (context cache) | No |
| Tool-level attribution | Yes (in JSONL) | No | No | No | No |

| Don’t | Do Instead | Why |
|---|---|---|
| Run a single session for 3+ hours without /clear | Break into 30-60 min focused sessions | Context compounding can multiply token cost 5-10x |
| Read entire files when you need one function | Use Grep/Glob first, then read targeted lines | A 2000-line Read costs 2-3K input tokens every turn it stays in context |
| Ignore cache hit rates | Monitor with ccusage daily --breakdown | Cache reads cost 90% less; low cache rates mean you’re overpaying |
| Track only total tokens | Break down by input/output/cache types | Output tokens cost 3-5x more than input; a session with verbose output is disproportionately expensive |
| Wait until you hit the limit to check usage | Add ccusage statusline to your shell prompt | Proactive monitoring prevents the “sudden wall” experience |
| Use one tool’s analytics for all platforms | Use tokscale for cross-platform, ccusage for Claude-deep-dives | No single tool is best at everything |
| Run OTEL with high-cardinality metrics | Disable session.id attribute if metrics storage is expensive | Each session creates a new time series, so cardinality is unbounded |
| Assume the 5-hour window is your only limit | Track weekly usage too | Weekly limits exist independently and can block you even with 5-hour budget remaining |
| Build custom analytics from scratch | Start with npx ccusage daily --json and build on top | The JSONL parsing is solved; focus your effort on the analysis layer |
| Set OTEL_LOG_USER_PROMPTS=1 in shared environments | Leave it disabled, log only prompt length | User prompts can contain proprietary code, credentials, or personal data |

The Market

Willingness to Pay

The existing tools are all open-source. But there are signals:

Potential Business Models

| Model | Price Point | Target | Viability |
|---|---|---|---|
| Freemium CLI | Free + $5-10/mo for advanced features | Individual developers | Medium: hard to gate CLI features |
| Team dashboard SaaS | $10-20/user/mo | Engineering teams | High: teams need comparison, budgeting, alerting |
| Enterprise OTEL add-on | $50-100/user/mo | Large orgs | High: integrates with existing observability |
| API proxy with analytics | Pay-per-use markup | API users | Low: adds latency and trust issues |
| Desktop app | $15-30 one-time | Mac/Windows users | Medium: DaisyDisk model, but small market |

The “Mint for AI” Opportunity

Mint (the personal finance app) succeeded by aggregating all financial accounts into one view and providing categorized spending analysis with optimization suggestions. The AI equivalent:

Nobody has built this. The pieces exist (tokscale for multi-platform, ccusage for deep analysis, Usage Monitor for predictions), but the integrated product does not.

Why Open Source Might Win

The challenge for commercial products: the data is local. Users don’t want to upload their coding conversations to a SaaS. The macOS menu bar apps and CLI tools work precisely because they run locally. A commercial product would need to either:

  1. Run entirely locally (like DaisyDisk — one-time purchase, no cloud)
  2. Aggregate only metadata (token counts, not content)
  3. Provide team-level value that justifies cloud aggregation

Option 2 is most likely. Teams already share token usage data via OTEL to Grafana. A hosted version with better UX and built-in optimization recommendations could charge $10-20/user/month.


Based on the gap analysis, here’s what the ecosystem actually needs:

1. Activity Attribution Engine (The Missing Layer)

Every tool counts tokens. Nobody classifies them by activity type. Build a middleware that:

// The API that should exist
import { AttributionEngine } from "@token-analytics/attribution";

const engine = new AttributionEngine();
const enriched = engine.analyze("~/.claude/projects/my-project/session.jsonl");

console.log(enriched.summary);
// {
//   file_reading: { tokens: 45000, cost: 0.14, percentage: 35 },
//   code_writing: { tokens: 32000, cost: 0.10, percentage: 25 },
//   code_execution: { tokens: 25000, cost: 0.08, percentage: 20 },
//   conversation: { tokens: 18000, cost: 0.06, percentage: 14 },
//   git_operations: { tokens: 8000, cost: 0.03, percentage: 6 },
// }

console.log(enriched.recommendations);
// [
//   { action: "Use Grep before Read", savings: "~15%", confidence: "high" },
//   { action: "Split after exploration phase", savings: "~25%", confidence: "medium" },
// ]

2. Cross-Platform Budget Tracker

Extend tokscale’s approach with:

3. Proactive Warning System

A daemon (or shell integration) that:

4. Team Leaderboard with Context

tokscale’s global leaderboard is fun but meaningless. A team leaderboard with context would show:

5. Historical Analysis Dashboard

A lightweight web app (like Claude Spend, but deeper) that provides:



Published March 17, 2026. This survey covers tools and data available as of this date. The AI coding assistant ecosystem moves fast — tools may have added features or new entrants may have appeared since publication.

