Org Status: 🟡 Dormant Cloudflare: N/A Last Audited: 2026-04-28
Every developer using AI coding tools has experienced the same moment: mid-flow, deep in a complex refactor, and the tool stops responding. “You’ve reached your usage limit.” No warning, no breakdown, no explanation of what consumed the budget. Just a wall. This article surveys every tool, technique, and data source available in 2026 for understanding where your AI coding tokens actually go — and identifies what’s still missing.
What you’ll learn:
- What open-source tools exist for tracking AI coding assistant usage (15+ projects surveyed)
- How each tool works technically — what data it reads, how it calculates costs
- What raw data is available from Claude Code, Cursor, Copilot, Gemini CLI, and Codex
- The gap between what exists and what developers actually need
- The “ncdu analogy” — what great token usage UX would look like
- Cross-platform opportunity and commercial potential
- What should be built next
Contents:
- The Problem: Quota Blindness
- The Data Layer: What Each Tool Stores Locally
- Core Concepts
- Survey of Existing Tools
- Patterns: How the Best Tools Work
- Small Examples
- The Gap: What’s Missing
- The ncdu Analogy
- Cross-Platform Opportunity
- Comparisons
- Anti-Patterns
- Commercial Potential
- What Should Be Built
- References
The Problem: Quota Blindness
The AI coding assistant market hit $3.5 billion in 2025. By early 2026, 85% of developers regularly use AI tools for coding, and 70% juggle two to four tools simultaneously. GitHub Copilot has 20M+ users. Claude Code overtook it as the most-used AI coding tool. Cursor crossed $500M ARR.
Every one of these tools has usage limits. None of them adequately explains where your budget went.
The 5-Hour Window Problem
Claude Code operates on a rolling 5-hour window. Anthropic does not publish exact figures, but community estimates put Pro users at approximately 44,000 tokens per window, Max5 users at around 88,000, and Max20 users at roughly 220,000. Starting in August 2025, Anthropic added weekly limits on top of the 5-hour windows in response to unsustainable consumption rates.
The problem is not the limit — it’s the opacity. You don’t know:
- Which session consumed the most tokens
- Whether your context was efficiently cached or re-sent fresh
- Which tool calls were expensive (a Read of a 2000-line file vs. a Bash echo)
- Whether your conversation hit the “context compounding” problem where each turn re-sends the entire history
- What your burn rate is relative to the window reset
The $131 Cursor Surprise
When Cursor switched to usage-based pricing in June 2025, one developer’s bill jumped from $19 to $131, an increase of roughly 590% (about 6.9x). Claude accounted for 93% of their token consumption. They had no idea until the invoice arrived.
The 2.1.1 Bug Report
In January 2026, users reported hitting 5-hour limits 4x faster after updating to Claude Code 2.1.1. Some found that rolling back to 2.0.61 fixed it. Others disagreed. Anthropic said they hadn’t identified a flaw. Without granular usage data, nobody could prove anything — it was a he-said-she-said between users and the provider.
Key insight: The token economy is the first economy where consumers have no receipts. You pay (in quota or dollars), you consume, and you have no itemized bill. Every other metered utility — electricity, water, cloud compute, cellular data — gives you a breakdown. AI coding tools do not.
What Changes If You Get This Right
A developer who built Claude Spend discovered that just 3 marathon conversations consumed 60%+ of all their tokens due to compounding context. With that single insight, they could restructure their workflow — shorter sessions, more frequent /clear commands, targeted context loading — and extend their effective quota by 2-3x without spending more money.
That’s the prize: same budget, dramatically more output. You just need to see where the tokens go.
The Data Layer: What Each Tool Stores Locally
Before we survey the tracking tools, we need to understand the raw material they work with. Every AI coding assistant stores conversation data locally. The format, location, and richness of this data determine what any analytics tool can possibly show you.
Claude Code: JSONL Transcripts
Claude Code stores conversation transcripts as JSONL files (one JSON object per line) in two locations:
~/.claude/projects/<project-name>/<conversation-id>.jsonl # Project-specific
~/.claude/transcripts/<session-id>.jsonl # All sessions
Each line represents an event. The schema:
// Core event types in Claude Code JSONL files
interface UserEvent {
type: "user";
timestamp: string; // ISO 8601
content: string; // The user's prompt text
}
interface ToolUseEvent {
type: "tool_use";
timestamp: string;
tool_name: string; // "Read", "Write", "Bash", "Grep", etc.
tool_input: Record<string, unknown>;
}
interface ToolResultEvent {
type: "tool_result";
timestamp: string;
tool_name: string;
tool_input: Record<string, unknown>;
}
// Assistant messages with token usage
interface AssistantMessage {
parentUuid: string;
sessionId: string;
version: string;
gitBranch: string;
cwd: string;
uuid: string;
timestamp: string;
isSidechain: boolean;
isApiErrorMessage: boolean;
message: {
role: "assistant";
content: ContentBlock[];
usage: TokenUsage;
};
}
interface TokenUsage {
input_tokens: number;
output_tokens: number;
cache_creation_input_tokens?: number;
cache_read_input_tokens?: number;
}
Key insight: The usage field on assistant messages is the gold mine. It tells you exactly how many tokens were consumed per API call, split by type. The isSidechain flag distinguishes parallel agent calls from main conversation flow. The three input token types (regular, cache creation, cache read) have dramatically different costs: cache reads are 90% cheaper than regular input.
A typical session produces 300-800 JSONL lines. Heavy sessions with lots of file reads and edits can produce thousands. Here’s what a real session breakdown looks like:
// Counting message types from a real Claude Code session
const messageCounts = {
user: 35, // 35 human prompts
tool_use: 171, // 171 tool invocations
tool_result: 166 // 166 tool results (some tools fail)
};
// Ratio: ~5 tool calls per human message
// This is typical for agentic coding sessions
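A tally like the one above takes only a few lines to produce. The sketch below assumes each line is a JSON object with a top-level type field, as in the schema shown earlier. countEventTypes is a hypothetical helper name, and the sample input is made up for illustration.

```typescript
// Hypothetical helper: tally events by their top-level "type" field.
// Assumes one JSON object per line, as in the schema described above.
function countEventTypes(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // ignore blank lines
    try {
      const entry: unknown = JSON.parse(line);
      if (typeof entry !== "object" || entry === null) continue;
      const type = (entry as { type?: unknown }).type;
      if (typeof type !== "string") continue;
      counts[type] = (counts[type] ?? 0) + 1;
    } catch {
      // A line cut off mid-write is not valid JSON; skip it.
    }
  }
  return counts;
}

// Illustrative input, not taken from a real transcript.
const sample = [
  '{"type":"user","timestamp":"2026-03-01T10:00:00Z"}',
  '{"type":"tool_use","tool_name":"Read"}',
  '{"type":"tool_result","tool_name":"Read"}',
  '{"type":"tool_use","tool_name":"Bash"}',
  '{"type":"tool_use","tool_na', // truncated line, skipped by the parser
].join("\n");

const sampleCounts = countEventTypes(sample);
console.log(sampleCounts); // user: 1, tool_use: 2, tool_result: 1
```

Skipping unparseable lines rather than failing keeps the tally usable while a session is still being written.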
Cursor: SQLite + Undocumented API
Cursor stores local data in SQLite databases, primarily state.vscdb. This includes chat history, compose logs, and metrics. The key discovery that enabled third-party tracking was an undocumented API endpoint found by observing browser redirects during CSV downloads:
// Cursor data access paths
interface CursorDataSources {
// Local SQLite database with chat history
localDb: "~/Library/Application Support/Cursor/User/globalStorage/state.vscdb";
// JWT tokens for API auth (extracted from SQLite)
authToken: "state.vscdb -> JWT -> session token conversion";
// Undocumented usage API (discovered via browser redirect interception)
usageApi: "https://www.cursor.com/api/usage"; // requires session token
// Tokscale cache location for synced data
tokscaleCache: "~/.config/tokscale/cursor-cache/";
}
Cursor’s local data is stored in unencrypted SQLite, a documented privacy concern in their community forums. This is both a risk and an opportunity: it means third-party tools can read it.
GitHub Copilot: NDJSON Export + Metrics API
Copilot provides the most structured data access of any tool, but primarily for organization admins:
interface CopilotDataSources {
// Organization-level API (admin access required)
metricsApi: "GET /orgs/{org}/copilot/metrics";
usageApi: "GET /orgs/{org}/copilot/usage";
// NDJSON export for raw data
ndjsonExport: "Settings > Copilot > Export usage data";
// VS Code extension local logs
extensionLogs: "~/.vscode/extensions/github.copilot-*/logs/";
// Token tracking extension data
tokenTracker: "VS Code status bar via robBos.copilot-token-tracker";
}
// Metrics available via API
interface CopilotMetrics {
total_active_users: number;
total_engaged_users: number;
copilot_ide_code_completions: {
total_engaged_users: number;
total_code_suggestions: number;
total_code_acceptances: number;
total_code_lines_suggested: number;
total_code_lines_accepted: number;
};
copilot_ide_chat: {
total_engaged_users: number;
total_chats: number;
total_chat_insertion_events: number;
total_chat_copy_events: number;
};
}
The gap: individual developers on personal plans have almost no visibility. The VS Code token tracker extension estimates usage from local logs, but Copilot doesn’t expose per-request token counts the way Claude Code does.
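The org-level metrics carry no token counts, but they do support derived ratios such as acceptance rate. The following is a minimal sketch that uses only the completion fields from the interface above. acceptanceRates is a hypothetical helper, and the numbers are made up.

```typescript
// Subset of the completion metrics shown above; field names follow that interface.
interface CompletionMetrics {
  total_code_suggestions: number;
  total_code_acceptances: number;
  total_code_lines_suggested: number;
  total_code_lines_accepted: number;
}

interface AcceptanceRates {
  bySuggestion: number; // accepted suggestions / total suggestions
  byLine: number; // accepted lines / suggested lines
}

// Hypothetical helper: returns 0 instead of NaN when nothing was suggested.
function acceptanceRates(m: CompletionMetrics): AcceptanceRates {
  const ratio = (num: number, den: number): number => (den > 0 ? num / den : 0);
  return {
    bySuggestion: ratio(m.total_code_acceptances, m.total_code_suggestions),
    byLine: ratio(m.total_code_lines_accepted, m.total_code_lines_suggested),
  };
}

// Made-up numbers for illustration only.
const rates = acceptanceRates({
  total_code_suggestions: 1000,
  total_code_acceptances: 250,
  total_code_lines_suggested: 4000,
  total_code_lines_accepted: 800,
});
console.log(rates.bySuggestion, rates.byLine); // 0.25 0.2
```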
Gemini CLI: JSON Chat Files
Gemini CLI stores conversation data in JSON files:
~/.gemini/tmp/*/chats/*.json
Gemini CLI also has built-in analytics via the /stats command, which shows token usage split by model and tool usage. Google provides a pre-configured monitoring dashboard in Google Cloud Monitoring when OpenTelemetry is configured.
interface GeminiDataSources {
// Local chat files
chatFiles: "~/.gemini/tmp/*/chats/*.json";
// Built-in stats command
builtInStats: "/stats"; // shows model, tool usage, idle time
// OpenTelemetry export to GCP
otelExport: {
dashboardTemplate: "Gemini CLI Monitoring";
directGcpExport: true; // No intermediate collector needed
};
}
OpenAI Codex CLI: Session Files
Codex CLI stores session data locally:
~/.codex/sessions/
Codex has a /status command for checking remaining limits during a session, but real-time cost visibility is a feature request, not a shipped feature. The data format is similar to Claude Code’s JSONL approach.
interface CodexDataSources {
// Local session storage
sessions: "~/.codex/sessions/";
// Usage records (when available)
usageRecords: "~/.codex/usage/";
// Built-in status check
builtInStatus: "/status"; // shows remaining limits
}
Core Concepts
Token Types and Their Costs
Not all tokens cost the same. Understanding the four types is essential for optimization:
interface TokenBreakdown {
// Regular input tokens — full price
input_tokens: number;
// Output tokens — typically 3-5x more expensive than input
output_tokens: number;
// Cache creation — charged at ~1.25x input rate (one-time cost)
cache_creation_input_tokens: number;
// Cache read — charged at ~0.1x input rate (90% discount!)
cache_read_input_tokens: number;
}
// Real pricing for Claude Sonnet 4 (as of March 2026)
const SONNET_PRICING = {
input: 3.00, // per million tokens
output: 15.00, // per million tokens
cacheCreation: 3.75, // per million tokens
cacheRead: 0.30, // per million tokens — 10x cheaper!
};
function calculateCost(usage: TokenBreakdown): number {
return (
(usage.input_tokens * SONNET_PRICING.input +
usage.output_tokens * SONNET_PRICING.output +
(usage.cache_creation_input_tokens ?? 0) * SONNET_PRICING.cacheCreation +
(usage.cache_read_input_tokens ?? 0) * SONNET_PRICING.cacheRead) /
1_000_000
);
}
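Key insight: Cache efficiency is the single biggest lever for cost optimization. At the rates above, and ignoring cache-creation overhead, a session where 80% of input tokens are cache reads pays roughly 3.6x less for input than one sending everything fresh (0.2 × 1.0 + 0.8 × 0.1 = 0.28 of the full input price), and the gap widens as the cached share grows. The tracking tools that distinguish cache hits from misses are dramatically more useful than those that just count total tokens.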
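To make the cache arithmetic concrete, here is a small worked example using the Sonnet input and cache-read prices listed above. It is a sketch that deliberately ignores cache-creation and output costs. inputCost is a hypothetical helper name.

```typescript
// Per-million-token prices copied from the Sonnet table above.
const PRICE = { input: 3.0, cacheRead: 0.3 };

// Input-side cost in USD for a given number of input tokens, where
// `cacheReadShare` (0..1) of them are served from cache and the rest are
// billed at the full input rate. Cache-creation and output costs are ignored.
function inputCost(totalInputTokens: number, cacheReadShare: number): number {
  const cached = totalInputTokens * cacheReadShare;
  const fresh = totalInputTokens - cached;
  return (fresh * PRICE.input + cached * PRICE.cacheRead) / 1_000_000;
}

const allFresh = inputCost(1_000_000, 0); // 3.00 USD
const mostlyCached = inputCost(1_000_000, 0.8); // 0.2 * 3.00 + 0.8 * 0.30 = 0.84 USD
const savingsFactor = allFresh / mostlyCached; // about 3.57

console.log(allFresh.toFixed(2), mostlyCached.toFixed(2), savingsFactor.toFixed(2));
```

Raising the cached share to 95% in the same function gives a factor of roughly 6.9, so the last few percentage points of cache hit rate matter most.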
The 5-Hour Rolling Window
Claude Code’s billing window is the most complex quota system among AI coding tools:
interface BillingWindow {
// 5-hour rolling window
windowDuration: 18_000_000; // 5 hours in ms (5 * 60 * 60 * 1000), written out because a type cannot contain arithmetic
// Tier-specific limits (approximate, Anthropic doesn't publish exact numbers)
limits: {
pro: 44_000; // ~44K tokens per 5-hour window
max5: 88_000; // ~88K tokens
max20: 220_000; // ~220K tokens
};
// Weekly limit overlay (added August 2025)
weeklyLimit: {
enabled: true;
// Resets on a rolling basis, not calendar-aligned
// Exact amounts vary and are not published
};
}
// The key insight: your effective budget is
// min(windowRemaining, weeklyRemaining)
// You can hit either limit independently
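The min() rule in the comment above can be written out directly. This is a sketch only: QuotaState and effectiveBudget are hypothetical names, and the remaining-token figures would have to come from whatever tracker you use.

```typescript
interface QuotaState {
  windowRemaining: number; // tokens left in the current 5-hour window
  weeklyRemaining: number; // tokens left under the weekly limit
}

type BindingLimit = "window" | "weekly";

// Hypothetical helper: the usable budget is whichever limit runs out first.
function effectiveBudget(q: QuotaState): { tokens: number; binding: BindingLimit } {
  const tokens = Math.max(0, Math.min(q.windowRemaining, q.weeklyRemaining));
  const binding: BindingLimit =
    q.weeklyRemaining < q.windowRemaining ? "weekly" : "window";
  return { tokens, binding };
}

// Illustrative numbers: plenty of 5-hour budget left, but the weekly cap is close.
const budget = effectiveBudget({ windowRemaining: 30_000, weeklyRemaining: 12_000 });
console.log(budget.tokens, budget.binding); // 12000 weekly
```

Reporting which limit is binding matters as much as the number itself, because the two limits reset on different schedules.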
Context Compounding
The most expensive pattern in AI coding is context compounding — where each conversational turn re-sends the entire history:
// How context compounds in a long session
function simulateContextGrowth(turns: number, avgResponseTokens: number): number[] {
const contextPerTurn: number[] = [];
let totalContext = 0;
for (let i = 0; i < turns; i++) {
// Each turn sends ALL previous context + new prompt
const promptTokens = 200; // average user prompt
totalContext += promptTokens + avgResponseTokens;
contextPerTurn.push(totalContext);
}
return contextPerTurn;
}
// With 500-token average responses over 20 turns:
// Turn 1: 700 tokens sent
// Turn 5: 3,500 tokens sent
// Turn 10: 7,000 tokens sent
// Turn 20: 14,000 tokens sent
// CUMULATIVE: 147,000 tokens total input
//
// Without compounding (if each turn were independent):
// 20 × 700 = 14,000 tokens total
//
// That's a 10.5x multiplier from compounding alone.
Key insight: The Claude Spend developer discovered that 3 marathon conversations consumed 60%+ of all tokens. This is the direct result of context compounding. Short, focused sessions with /clear between tasks are dramatically cheaper than long-running sessions.
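The effect of splitting work into shorter sessions can be estimated with the same simplified model as the simulation above. The sketch below compares one 20-turn session against four 5-turn sessions. It assumes 700 tokens of new history per turn and ignores any cost of re-loading context after a reset, so treat the ratio as an upper bound. cumulativeInput is a hypothetical helper name.

```typescript
// Cumulative input tokens for one session of `turns` turns, using the same
// simplified model as the simulation above: every turn re-sends the whole
// history, and each turn adds `tokensPerTurn` to that history.
function cumulativeInput(turns: number, tokensPerTurn: number): number {
  let context = 0;
  let total = 0;
  for (let i = 0; i < turns; i++) {
    context += tokensPerTurn; // history grows by one prompt plus one response
    total += context; // and the full history is sent again
  }
  return total;
}

const TOKENS_PER_TURN = 700; // 200-token prompt + 500-token response

const oneLongSession = cumulativeInput(20, TOKENS_PER_TURN); // 147,000
const fourShortSessions = 4 * cumulativeInput(5, TOKENS_PER_TURN); // 4 * 10,500 = 42,000

console.log(oneLongSession / fourShortSessions); // 3.5
```

Under these assumptions the same 20 turns cost 3.5x fewer input tokens when split into four sessions.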
OpenTelemetry: The Official Instrumentation Layer
Claude Code has first-class OpenTelemetry support — the most comprehensive of any AI coding tool. This is the “right” way to collect data at scale:
// Claude Code OTEL configuration
interface OtelConfig {
// Required: enable telemetry
CLAUDE_CODE_ENABLE_TELEMETRY: "1";
// Metrics exporter (time-series data)
OTEL_METRICS_EXPORTER: "otlp" | "prometheus" | "console";
// Logs/events exporter (per-event data)
OTEL_LOGS_EXPORTER: "otlp" | "console";
// OTLP endpoint
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc" | "http/json" | "http/protobuf";
OTEL_EXPORTER_OTLP_ENDPOINT: string; // e.g., "http://localhost:4317"
}
// Metrics exported by Claude Code
interface ClaudeCodeMetrics {
"claude_code.session.count": Counter;
"claude_code.lines_of_code.count": Counter; // attributes: type=added|removed
"claude_code.pull_request.count": Counter;
"claude_code.commit.count": Counter;
"claude_code.cost.usage": Counter; // USD, per model
"claude_code.token.usage": Counter; // per type, per model
"claude_code.code_edit_tool.decision": Counter;
"claude_code.active_time.total": Counter; // seconds
}
// Events exported (richer than metrics)
interface ClaudeCodeEvents {
"claude_code.user_prompt": {
prompt_length: number;
prompt?: string; // only if OTEL_LOG_USER_PROMPTS=1
};
"claude_code.api_request": {
model: string;
cost_usd: number;
duration_ms: number;
input_tokens: number;
output_tokens: number;
cache_read_tokens: number;
cache_creation_tokens: number;
speed: "fast" | "normal";
};
"claude_code.tool_result": {
tool_name: string;
success: boolean;
duration_ms: number;
tool_result_size_bytes: number;
};
"claude_code.api_error": {
model: string;
error: string;
status_code: string;
attempt: number;
};
}
The OTEL approach gives you the richest data, but requires infrastructure (Grafana, Prometheus, Honeycomb, etc.). Most individual developers don’t run observability stacks. This is the gap that local analytics tools fill.
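Once api_request events have been captured somewhere, summarizing them does not need a full observability stack. The sketch below assumes the events are already collected into an array (how they get there is outside its scope) and uses the attribute names listed above. totalsByModel and the model names are placeholders.

```typescript
// Shape of one api_request event, using the attribute names listed above.
interface ApiRequestEvent {
  model: string;
  cost_usd: number;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_creation_tokens: number;
}

interface ModelTotals {
  requests: number;
  costUsd: number;
  totalTokens: number;
}

// Hypothetical helper: fold a list of events into per-model totals.
function totalsByModel(events: ApiRequestEvent[]): Map<string, ModelTotals> {
  const totals = new Map<string, ModelTotals>();
  for (const e of events) {
    const t = totals.get(e.model) ?? { requests: 0, costUsd: 0, totalTokens: 0 };
    t.requests += 1;
    t.costUsd += e.cost_usd;
    t.totalTokens +=
      e.input_tokens + e.output_tokens + e.cache_read_tokens + e.cache_creation_tokens;
    totals.set(e.model, t);
  }
  return totals;
}

// Made-up events; the model names are placeholders.
const sampleEvents: ApiRequestEvent[] = [
  { model: "model-a", cost_usd: 0.5, input_tokens: 1000, output_tokens: 200, cache_read_tokens: 5000, cache_creation_tokens: 0 },
  { model: "model-a", cost_usd: 0.25, input_tokens: 500, output_tokens: 100, cache_read_tokens: 6000, cache_creation_tokens: 300 },
  { model: "model-b", cost_usd: 0.125, input_tokens: 100, output_tokens: 50, cache_read_tokens: 0, cache_creation_tokens: 0 },
];

const byModel = totalsByModel(sampleEvents);
console.log(byModel.get("model-a")); // requests: 2, costUsd: 0.75, totalTokens: 13100
```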
Survey of Existing Tools
The ecosystem has exploded. Here’s every significant tool, organized by approach.
Tier 1: Full-Featured CLI Analyzers
ccusage (11.6K stars)
The dominant tool in the space. Written in TypeScript, maintained by ryoppippi, with 61 contributors and releases up to v18.0.10.
npx ccusage daily # Daily token usage and costs
npx ccusage monthly # Monthly aggregated reports
npx ccusage session # Per-conversation breakdown
npx ccusage blocks # 5-hour billing window tracking
npx ccusage statusline # One-line summary for shell prompts
npx ccusage daily --since 2026-03-01 --breakdown --json
npx ccusage session --project my-project --compact
What it reads: Local JSONL files from ~/.claude/projects/
Key features:
- Daily, monthly, session, and billing block aggregation
- Per-model cost breakdown (Opus vs. Sonnet)
- Cache token tracking (creation vs. read, separately)
- Live monitoring mode with real-time burn rate
- JSON export for programmatic use
- Offline mode with pre-cached pricing
- MCP server integration (@ccusage/mcp)
- Multi-platform: Claude Code + Codex CLI + OpenCode + Pi + Amp
Architecture: Pure TypeScript, minimal dependencies, no build step required. Uses LiteLLM pricing data. The @ccusage/codex package adds Codex CLI support.
// ccusage JSON output structure (from --json flag)
interface CcusageDailyOutput {
date: string;
totalInputTokens: number;
totalOutputTokens: number;
cacheCreationTokens: number;
cacheReadTokens: number;
totalCost: number; // USD
models: {
[modelName: string]: {
inputTokens: number;
outputTokens: number;
cost: number;
};
};
}
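JSON export makes the report scriptable. The sketch below assumes the output has already been parsed into an array of rows shaped like CcusageDailyOutput above. The actual top-level layout of the --json output may differ by version, so check it before relying on this. mostExpensiveDay and cacheReadShare are hypothetical helpers, and the rows are made up.

```typescript
// Only the fields this sketch needs from the interface above.
interface DailyRow {
  date: string;
  totalCost: number; // USD
  totalInputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
}

// Hypothetical helper: the most expensive day, or undefined for an empty report.
function mostExpensiveDay(rows: DailyRow[]): DailyRow | undefined {
  return rows.reduce<DailyRow | undefined>(
    (max, row) => (max === undefined || row.totalCost > max.totalCost ? row : max),
    undefined
  );
}

// Share of input-side tokens served from cache (0..1).
function cacheReadShare(row: DailyRow): number {
  const inputSide = row.totalInputTokens + row.cacheCreationTokens + row.cacheReadTokens;
  return inputSide > 0 ? row.cacheReadTokens / inputSide : 0;
}

// Made-up rows for illustration.
const report: DailyRow[] = [
  { date: "2026-03-01", totalCost: 4.2, totalInputTokens: 10_000, cacheCreationTokens: 20_000, cacheReadTokens: 70_000 },
  { date: "2026-03-02", totalCost: 11.7, totalInputTokens: 50_000, cacheCreationTokens: 50_000, cacheReadTokens: 100_000 },
];

const worstDay = mostExpensiveDay(report);
if (worstDay) {
  console.log(worstDay.date, cacheReadShare(worstDay)); // 2026-03-02 0.5
}
```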
tokscale (1.2K stars)
The most ambitious cross-platform play. Built by junhoyeo with a Rust core for performance.
npx tokscale@latest
bunx tokscale@latest
tokscale login # GitHub auth
tokscale submit # Upload usage data
tokscale submit --dry-run
Supported platforms (15+):
| Platform | Data Location | Method |
|---|---|---|
| Claude Code | ~/.claude/projects/ | JSONL parsing |
| Codex CLI | ~/.codex/sessions/ | Session file parsing |
| Gemini CLI | ~/.gemini/tmp/*/chats/*.json | JSON parsing |
| Cursor IDE | ~/.config/tokscale/cursor-cache/ | API sync |
| OpenCode | ~/.local/share/opencode/opencode.db | SQLite |
| Amp (AmpCode) | ~/.local/share/amp/threads/ | File parsing |
| OpenClaw | ~/.openclaw/agents/ | File parsing |
| Factory Droid | ~/.factory/sessions/ | File parsing |
| Pi | ~/.pi/agent/sessions/ | File parsing |
| Kimi CLI | ~/.kimi/sessions/ | File parsing |
| Qwen CLI | ~/.qwen/projects/ | File parsing |
| Roo Code | VS Code extension storage | Storage API |
| Kilo | VS Code extension storage | Storage API |
Key features:
- Native Rust core (“10x faster processing”)
- Interactive TUI with 4 views
- GitHub-style contribution graph (9 color themes)
- Global leaderboard with public profiles
- Real-time pricing via LiteLLM with OpenRouter fallback
- Detailed breakdowns: input, output, cache read/write, reasoning tokens
- JSON export
Key insight: tokscale’s leaderboard feature turns token consumption into a social metric — a “Kardashev scale” for AI-augmented development. This is either brilliant gamification or a race to the bottom of cost efficiency, depending on your perspective.
Claude Spend (236 stars)
The tool that went viral on Reddit and got its creator their first 100 real users.
npx claude-spend
npx claude-spend --port 8080 --no-open
What it provides: A web dashboard on localhost showing:
- Token usage per conversation
- Daily usage trends
- Model-specific consumption
- Insight: which prompts cost the most
Key differentiator: Web UI instead of terminal output. Opens a browser dashboard automatically. Privacy-first — no data leaves your machine.
Tier 2: Real-Time Monitors
Claude Code Usage Monitor (7K stars)
A Python-based real-time terminal monitor by Maciek-roboblog.
uv pip install claude-monitor
pip install claude-monitor
claude-monitor
cmonitor
ccm
Key features:
- Rich terminal UI with color-coded progress bars
- Machine learning-based predictions (P90 percentile analysis)
- Custom plan detection: analyzes last 192 hours to calculate personalized limits
- Multi-plan support: Pro, Max5, Max20, Custom
- WCAG-compliant contrast
- Configurable refresh rates (0.1-20 Hz)
- Auto theme detection (light/dark/classic)
ML predictions: The tool analyzes all sessions from the last 8 days (192 hours), calculates the 90th percentile (P90) of usage, and predicts when you’ll hit limits with 95% confidence. This is the only tool that tries to predict rather than just report.
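For readers unfamiliar with P90, here is a minimal nearest-rank percentile over per-session token totals. The tool's actual method is not documented here, so this illustrates the statistic, not its implementation. The session totals are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of the
// samples are less than or equal to it. Returns undefined for an empty list.
function percentile(values: number[], p: number): number | undefined {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Made-up token totals for ten past sessions.
const sessionTotals = [
  12_000, 18_000, 9_000, 40_000, 22_000, 15_000, 31_000, 27_000, 11_000, 90_000,
];

const p90 = percentile(sessionTotals, 90);
console.log(p90); // 40000
```

Using P90 rather than the maximum keeps one outlier session (90,000 here) from dominating the estimate of a heavy but typical session.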
claude-code-limit-tracker (15 stars)
A shell-integrated tracker by TylerGallenbeck that displays quota in your terminal prompt.
uv run python install.py
Technical approach: Reads ~/.claude/projects/, calculates session durations from timestamps, counts prompts (filtering system messages), identifies model from assistant responses, and tracks per-model quotas separately for Sonnet and Opus.
Tier 3: Platform-Native Apps
Claude Usage Tracker — macOS (1.6K stars)
A native macOS menu bar app by hamed-elfayome, built with Swift/SwiftUI.
How it works: Unlike JSONL-parsing tools, this one authenticates directly with Claude’s API using session keys from browser cookies or Claude Code CLI credentials. It hits api.anthropic.com/api/oauth/usage for real utilization percentages.
Features:
- 6-tier pace system with colored markers
- Multi-profile support (unlimited accounts)
- Terminal statusline integration
- Usage history with interactive charts
- Export to JSON/CSV
- macOS Keychain storage for credentials
- Apple code signed — no security warnings
Key differentiator: It gets the actual quota percentages from Anthropic’s API rather than estimating from local data. This means it knows your exact remaining budget, not just what you’ve consumed.
TokenMeter (macOS)
Another macOS menu bar app (Priyans-hu/tokenmeter) that reads Claude Code’s OAuth token from macOS Keychain and calls the same usage API endpoint.
CodexBar (macOS)
steipete/CodexBar shows usage stats for both OpenAI Codex and Claude Code from the menu bar, without requiring login.
Tier 4: Observability-Stack Integrations
claude-code-monitor — OpenTelemetry (7 stars)
A dedicated OTEL receiver that captures Claude Code telemetry and provides a web dashboard.
Architecture:
- OTLP Receiver on port 4318
- Web Dashboard on port 3000
- Prometheus Exporter on port 9464
- Data stored in daily-usage.json (no external DB required)
claude_telemetry
TechNickAI/claude_telemetry is a drop-in replacement that swaps the claude command for claudia, wrapping Claude Code with OpenTelemetry instrumentation. Exports to Logfire, Sentry, Honeycomb, or Datadog.
claude-code-otel
ColeMurray/claude-code-otel provides a comprehensive observability solution with Docker Compose configurations, Prometheus, and Grafana dashboards.
Tier 5: Browser and Editor Extensions (Cursor-Specific)
| Extension | Platform | Method | Features |
|---|---|---|---|
| Cursor Stats | Chrome | Dashboard scraping | Usage visualization |
| Cursor Token Tracker | Chrome | API | Cost tracking, budget projection, trend charts |
| Cursor Usage & Cost Tracker | VS Code | Extension | In-editor monitoring, monthly budget alerts |
| Cursor-Pulse | VS Code | Extension | Real-time quota tracking |
Tier 6: Copilot-Specific
| Tool | Type | Features |
|---|---|---|
| Copilot Token Tracker | VS Code Extension | Daily/monthly estimates in status bar |
| copilot-metrics-viewer | Web Dashboard | Org-level Copilot Business API visualization |
| Copilot Fluency Score | VS Code Extension | 4-stage maturity model across 6 categories |
Tier 7: Log Viewers (Non-Analytics)
| Tool | Purpose |
|---|---|
| claude-code-log | Converts JSONL transcripts to readable HTML |
| clog | Web-based conversation log viewer with real-time monitoring |
| tps-viewer | Visualizes Claude Code tokens-per-second over time |
| claude-monitor-gnome-extension | GNOME Shell 48 taskbar monitor |
Patterns: How the Best Tools Work
Pattern 1: JSONL Streaming Parser
The foundation of every Claude Code analytics tool. Parse JSONL files, extract token usage, aggregate by time period.
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
interface TokenUsage {
input_tokens: number;
output_tokens: number;
cache_creation_input_tokens?: number;
cache_read_input_tokens?: number;
}
interface ParsedMessage {
timestamp: Date;
sessionId: string;
model?: string;
usage?: TokenUsage;
type: "user" | "assistant" | "tool_use" | "tool_result";
toolName?: string;
isSidechain: boolean;
}
function parseJsonlFile(filePath: string): ParsedMessage[] {
const content = readFileSync(filePath, "utf-8");
const lines = content.trim().split("\n");
const messages: ParsedMessage[] = [];
for (const line of lines) {
try {
const entry = JSON.parse(line);
const parsed: ParsedMessage = {
timestamp: new Date(entry.timestamp),
sessionId: entry.sessionId ?? "",
type: entry.type,
isSidechain: entry.isSidechain ?? false,
toolName: entry.tool_name,
};
// Extract usage from assistant messages
if (entry.message?.usage) {
parsed.usage = entry.message.usage;
parsed.model = inferModel(entry);
}
messages.push(parsed);
} catch {
// Skip malformed lines — JSONL files can be interrupted mid-write
continue;
}
}
return messages;
}
function inferModel(entry: Record<string, unknown>): string | undefined {
// Model identification varies by Claude Code version
// Some versions include it directly, others require inference
const message = entry.message as Record<string, unknown> | undefined;
if (message?.model) return message.model as string;
// Heuristic: check content for model markers
const content = message?.content;
if (Array.isArray(content)) {
for (const block of content) {
if (typeof block === "object" && block !== null && "model" in block) {
return (block as { model: string }).model;
}
}
}
return undefined;
}
function discoverAllSessions(): Map<string, ParsedMessage[]> {
  const projectsDir = join(homedir(), ".claude", "projects");
  const sessions = new Map<string, ParsedMessage[]>();
  let projects: string[];
  try {
    projects = readdirSync(projectsDir);
  } catch {
    // Directory may not exist on fresh installs
    return sessions;
  }
  for (const project of projects) {
    const projectDir = join(projectsDir, project);
    let files: string[];
    try {
      files = readdirSync(projectDir).filter((f) => f.endsWith(".jsonl"));
    } catch {
      // Not a readable directory (e.g. a stray file): skip it rather than
      // abort discovery for every other project
      continue;
    }
    for (const file of files) {
      const messages = parseJsonlFile(join(projectDir, file));
      // Strip only the trailing extension, not the first ".jsonl" match
      const sessionId = file.slice(0, -".jsonl".length);
      sessions.set(`${project}/${sessionId}`, messages);
    }
  }
  return sessions;
}
When to use: Any time you need to analyze Claude Code usage. This is the foundational pattern.
Gotchas:
- JSONL files can be interrupted mid-write (always handle parse errors)
- The isSidechain flag is critical: sidechain operations have separate context
- cache_read_input_tokens may be undefined in older Claude Code versions
- File paths use URL-encoded project names (spaces become %20, etc.)
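The parser above extracts usage but stops short of the "aggregate by time period" step. A minimal sketch of that step follows, grouping by UTC calendar day. totalTokensByDay and UsageRecord are hypothetical names, and the records are made up. Any object with a timestamp and an optional usage field, such as ParsedMessage, fits the shape.

```typescript
interface UsageRecord {
  timestamp: Date;
  usage?: {
    input_tokens: number;
    output_tokens: number;
    cache_creation_input_tokens?: number;
    cache_read_input_tokens?: number;
  };
}

// Hypothetical helper: total tokens per UTC calendar day (YYYY-MM-DD).
function totalTokensByDay(records: UsageRecord[]): Map<string, number> {
  const byDay = new Map<string, number>();
  for (const r of records) {
    // Skip entries without usage, and invalid dates (toISOString would throw).
    if (!r.usage || Number.isNaN(r.timestamp.getTime())) continue;
    const day = r.timestamp.toISOString().slice(0, 10);
    const tokens =
      r.usage.input_tokens +
      r.usage.output_tokens +
      (r.usage.cache_creation_input_tokens ?? 0) +
      (r.usage.cache_read_input_tokens ?? 0);
    byDay.set(day, (byDay.get(day) ?? 0) + tokens);
  }
  return byDay;
}

// Made-up records for illustration.
const records: UsageRecord[] = [
  { timestamp: new Date("2026-03-01T09:00:00Z"), usage: { input_tokens: 100, output_tokens: 50, cache_read_input_tokens: 1000 } },
  { timestamp: new Date("2026-03-01T23:30:00Z"), usage: { input_tokens: 200, output_tokens: 100 } },
  { timestamp: new Date("2026-03-02T00:10:00Z"), usage: { input_tokens: 10, output_tokens: 5, cache_creation_input_tokens: 500 } },
  { timestamp: new Date("2026-03-02T01:00:00Z") }, // no usage: ignored
];

const daily = totalTokensByDay(records);
console.log(daily.get("2026-03-01"), daily.get("2026-03-02")); // 1450 515
```

Grouping by UTC keeps the result independent of the machine's time zone. A tool aimed at end users would more likely group by local day.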
Pattern 2: Rolling Window Calculator
Computing remaining budget in the 5-hour billing window.
interface WindowStatus {
windowStart: Date;
windowEnd: Date;
tokensUsed: number;
tokenLimit: number;
remainingTokens: number;
percentUsed: number;
burnRate: number; // tokens per minute
projectedExhaustion: Date | null;
minutesUntilReset: number;
}
type SubscriptionTier = "pro" | "max5" | "max20";
const TIER_LIMITS: Record<SubscriptionTier, number> = {
pro: 44_000,
max5: 88_000,
max20: 220_000,
};
const WINDOW_DURATION_MS = 5 * 60 * 60 * 1000;
function calculateWindowStatus(
messages: ParsedMessage[],
tier: SubscriptionTier,
now: Date = new Date()
): WindowStatus {
const windowStart = new Date(now.getTime() - WINDOW_DURATION_MS);
const windowEnd = now;
const tokenLimit = TIER_LIMITS[tier];
// Sum tokens in current window (only from non-sidechain messages with usage)
const windowMessages = messages.filter(
(m) =>
m.timestamp >= windowStart &&
m.timestamp <= windowEnd &&
m.usage &&
!m.isSidechain
);
let tokensUsed = 0;
for (const msg of windowMessages) {
if (msg.usage) {
tokensUsed +=
msg.usage.input_tokens +
msg.usage.output_tokens +
(msg.usage.cache_creation_input_tokens ?? 0) +
(msg.usage.cache_read_input_tokens ?? 0);
}
}
const remainingTokens = Math.max(0, tokenLimit - tokensUsed);
const percentUsed = (tokensUsed / tokenLimit) * 100;
// Calculate burn rate from last 30 minutes of activity
const recentWindow = new Date(now.getTime() - 30 * 60 * 1000);
const recentMessages = windowMessages.filter(
(m) => m.timestamp >= recentWindow
);
let recentTokens = 0;
for (const msg of recentMessages) {
if (msg.usage) {
recentTokens +=
msg.usage.input_tokens +
msg.usage.output_tokens +
(msg.usage.cache_creation_input_tokens ?? 0) +
(msg.usage.cache_read_input_tokens ?? 0);
}
}
const minutesElapsed = Math.max(
1,
(now.getTime() - recentWindow.getTime()) / 60_000
);
const burnRate = recentTokens / minutesElapsed;
// Project when tokens will run out
let projectedExhaustion: Date | null = null;
if (burnRate > 0 && remainingTokens > 0) {
const minutesLeft = remainingTokens / burnRate;
projectedExhaustion = new Date(now.getTime() + minutesLeft * 60_000);
}
// Minutes until oldest messages fall out of window
const oldestInWindow = windowMessages.reduce(
(oldest, m) => (m.timestamp < oldest ? m.timestamp : oldest),
now
);
const minutesUntilReset =
(oldestInWindow.getTime() + WINDOW_DURATION_MS - now.getTime()) / 60_000;
return {
windowStart,
windowEnd,
tokensUsed,
tokenLimit,
remainingTokens,
percentUsed,
burnRate,
projectedExhaustion,
minutesUntilReset,
};
}
When to use: Real-time monitoring, shell status lines, burn rate alerts.
Gotchas:
- Anthropic doesn’t publish exact token limits — the numbers above are community estimates
- The weekly limit overlay means you can hit a wall even with 5-hour budget remaining
- Burn rate calculation needs smoothing — a single large file read can spike it
- Cache read tokens count toward usage even though they’re cheaper in dollar terms
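One common way to smooth a spiky burn rate is an exponential moving average. The sketch below applies it to per-minute token counts. It is one possible smoothing choice, not something the tools above are confirmed to use. smoothedBurnRate and the sample values are hypothetical.

```typescript
// Exponential moving average over per-minute token counts.
// `alpha` in (0, 1]: higher values react faster, lower values smooth more.
function smoothedBurnRate(tokensPerMinute: number[], alpha = 0.2): number {
  let ema: number | undefined;
  for (const sample of tokensPerMinute) {
    // The first sample seeds the average; later samples are blended in.
    ema = ema === undefined ? sample : alpha * sample + (1 - alpha) * ema;
  }
  return ema ?? 0;
}

// Made-up samples: steady usage with one large file read in minute 4.
const perMinute = [100, 100, 100, 5000, 100];

const smoothed = smoothedBurnRate(perMinute);
console.log(smoothed.toFixed(0)); // 884
```

The spike still raises the estimate, but to about 884 tokens per minute rather than 5,000, and it decays over the following minutes instead of triggering a false exhaustion alarm.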
Pattern 3: Cost Attribution by Activity Type
The most valuable and least-implemented pattern. Categorize token spend by what you were actually doing.
type ActivityType =
| "file_reading" // Read, Glob, Grep
| "code_writing" // Edit, Write
| "code_execution" // Bash
| "conversation" // Pure text exchange
| "planning" // TodoWrite, agent orchestration
| "search" // WebSearch, WebFetch
| "git_operations" // git commands via Bash
| "unknown";
interface ActivityCost {
type: ActivityType;
tokenCount: number;
estimatedCost: number;
messageCount: number;
percentage: number;
}
function classifyToolActivity(toolName: string, toolInput?: Record<string, unknown>): ActivityType {
// Direct tool classification
const toolMap: Record<string, ActivityType> = {
Read: "file_reading",
Glob: "file_reading",
Grep: "file_reading",
Edit: "code_writing",
Write: "code_writing",
NotebookEdit: "code_writing",
Bash: "code_execution",
TodoWrite: "planning",
TodoRead: "planning",
WebSearch: "search",
WebFetch: "search",
Agent: "planning",
};
if (toolMap[toolName]) {
// Special case: Bash commands that are git operations
if (toolName === "Bash" && toolInput) {
const command = (toolInput.command as string) ?? "";
if (command.startsWith("git ") || command.includes("| git")) {
return "git_operations";
}
}
return toolMap[toolName];
}
return "unknown";
}
function attributeCosts(
messages: ParsedMessage[],
pricing: Record<string, number>
): ActivityCost[] {
// Group messages into "turns" — each user message starts a turn
// that includes all subsequent tool calls and assistant responses
const turns: { activity: ActivityType; tokens: number }[] = [];
let currentActivity: ActivityType = "conversation";
let currentTokens = 0;
for (const msg of messages) {
if (msg.type === "user") {
// Save previous turn
if (currentTokens > 0) {
turns.push({ activity: currentActivity, tokens: currentTokens });
}
currentActivity = "conversation";
currentTokens = 0;
}
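if (msg.type === "tool_use" && msg.toolName) {
// ParsedMessage carries no tool input, so the git special case in
// classifyToolActivity never fires here: Bash git commands are counted
// as "code_execution" unless a tool input field is added and passed through.
currentActivity = classifyToolActivity(msg.toolName);
}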
if (msg.usage) {
const total =
msg.usage.input_tokens +
msg.usage.output_tokens +
(msg.usage.cache_creation_input_tokens ?? 0) +
(msg.usage.cache_read_input_tokens ?? 0);
currentTokens += total;
}
}
// Don't forget the last turn
if (currentTokens > 0) {
turns.push({ activity: currentActivity, tokens: currentTokens });
}
// Aggregate by activity type
const totals = new Map<ActivityType, { tokens: number; count: number }>();
let grandTotal = 0;
for (const turn of turns) {
const existing = totals.get(turn.activity) ?? { tokens: 0, count: 0 };
existing.tokens += turn.tokens;
existing.count += 1;
totals.set(turn.activity, existing);
grandTotal += turn.tokens;
}
return Array.from(totals.entries())
.map(([type, data]) => ({
type,
tokenCount: data.tokens,
estimatedCost: (data.tokens / 1_000_000) * (pricing.blendedRate ?? 5.0),
messageCount: data.count,
percentage: grandTotal > 0 ? (data.tokens / grandTotal) * 100 : 0,
}))
.sort((a, b) => b.tokenCount - a.tokenCount);
}
When to use: Understanding why you’re burning tokens. The most actionable pattern.
Gotchas:
- Tool classification is heuristic — a Bash command could be anything
- Token usage is reported at the API call level, not the tool level — attribution requires correlation
- A single turn might involve multiple activities (read a file, then edit it)
- This pattern requires the JSONL files to include tool_name, which they do in recent Claude Code versions
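One way to soften the "multiple activities per turn" gotcha is to split a turn's tokens across every activity seen in that turn, instead of crediting only the last tool call. The sketch below is an illustration under stated assumptions, not how any shipped tool works: `TurnRecord` and `splitTurnTokens` are hypothetical names, and weighting by tool-call count is only a rough proxy for each activity's real token share.

```typescript
// Hypothetical types for illustration; not part of any existing tool.
type Activity = string;

interface TurnRecord {
  tokens: number;         // total tokens recorded for the turn
  activities: Activity[]; // one entry per tool call seen in the turn
}

// Split a turn's tokens across its activities, weighted by call count.
// Turns without tool calls are credited to "conversation".
function splitTurnTokens(turn: TurnRecord): Map<Activity, number> {
  const shares = new Map<Activity, number>();
  if (turn.activities.length === 0) {
    shares.set("conversation", turn.tokens);
    return shares;
  }
  const perCall = turn.tokens / turn.activities.length;
  for (const activity of turn.activities) {
    shares.set(activity, (shares.get(activity) ?? 0) + perCall);
  }
  return shares;
}

const splitExample = splitTurnTokens({
  tokens: 9000,
  activities: ["file_reading", "file_reading", "code_writing"],
});
console.log(Object.fromEntries(splitExample));
```

With two reads and one write, two thirds of the turn is credited to file reading and one third to code writing. A more faithful version would weight by the size of each tool result, which the per-call usage data does not report directly.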
Pattern 4: Session Comparison Dashboard
Compare sessions to find anomalies — the “which session ate my budget?” pattern.
interface SessionSummary {
id: string;
project: string;
startTime: Date;
endTime: Date;
durationMinutes: number;
totalTokens: number;
inputTokens: number;
outputTokens: number;
cacheHitRate: number; // percentage of input from cache
toolCallCount: number;
userMessageCount: number;
tokensPerMinute: number;
costEstimate: number;
topTools: { name: string; count: number }[];
model: string;
}
function summarizeSession(
id: string,
project: string,
messages: ParsedMessage[]
): SessionSummary {
if (messages.length === 0) {
return {
id, project,
startTime: new Date(), endTime: new Date(),
durationMinutes: 0, totalTokens: 0,
inputTokens: 0, outputTokens: 0,
cacheHitRate: 0, toolCallCount: 0,
userMessageCount: 0, tokensPerMinute: 0,
costEstimate: 0, topTools: [], model: "unknown",
};
}
const sorted = [...messages].sort(
(a, b) => a.timestamp.getTime() - b.timestamp.getTime()
);
const startTime = sorted[0].timestamp;
const endTime = sorted[sorted.length - 1].timestamp;
const durationMinutes = Math.max(
1,
(endTime.getTime() - startTime.getTime()) / 60_000
);
let inputTokens = 0;
let outputTokens = 0;
let cacheReadTokens = 0;
let cacheCreationTokens = 0;
const toolCounts = new Map<string, number>();
let userMessageCount = 0;
let model = "unknown";
for (const msg of messages) {
if (msg.type === "user") userMessageCount++;
if (msg.type === "tool_use" && msg.toolName) {
toolCounts.set(msg.toolName, (toolCounts.get(msg.toolName) ?? 0) + 1);
}
if (msg.usage) {
inputTokens += msg.usage.input_tokens;
outputTokens += msg.usage.output_tokens;
cacheReadTokens += msg.usage.cache_read_input_tokens ?? 0;
cacheCreationTokens += msg.usage.cache_creation_input_tokens ?? 0;
}
if (msg.model) model = msg.model;
}
const totalInput = inputTokens + cacheReadTokens + cacheCreationTokens;
const totalTokens = totalInput + outputTokens;
const cacheHitRate = totalInput > 0
? (cacheReadTokens / totalInput) * 100
: 0;
const topTools = Array.from(toolCounts.entries())
.map(([name, count]) => ({ name, count }))
.sort((a, b) => b.count - a.count)
.slice(0, 5);
return {
id,
project,
startTime,
endTime,
durationMinutes,
totalTokens,
inputTokens,
outputTokens,
cacheHitRate,
toolCallCount: Array.from(toolCounts.values()).reduce((s, c) => s + c, 0),
userMessageCount,
tokensPerMinute: totalTokens / durationMinutes,
costEstimate: totalTokens * 0.000005, // rough blended rate
topTools,
model,
};
}
// Find the most expensive sessions in a time period
function findTopSessions(
sessions: Map<string, ParsedMessage[]>,
since: Date,
limit: number = 10
): SessionSummary[] {
const summaries: SessionSummary[] = [];
for (const [key, messages] of sessions) {
const [project, sessionId] = key.split("/");
const filtered = messages.filter((m) => m.timestamp >= since);
if (filtered.length === 0) continue;
summaries.push(summarizeSession(sessionId, project, filtered));
}
return summaries
.sort((a, b) => b.totalTokens - a.totalTokens)
.slice(0, limit);
}
When to use: Post-mortem analysis after hitting a limit. “Which session ate my budget?”
Connection to Pattern 3: Combine with activity attribution to answer “which session ate my budget AND why?”
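To go from "top sessions" to "anomalous sessions", one simple option is to compare each session against the median. This is a minimal sketch, assuming only that each summary exposes `id` and `totalTokens`; `findAnomalousSessions` is a hypothetical helper, and the default factor of 3 is an arbitrary starting point rather than a tuned value.

```typescript
// Minimal shape: only the fields this sketch needs from SessionSummary.
interface SessionLike {
  id: string;
  totalTokens: number;
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Flag sessions whose token total exceeds `factor` times the median.
function findAnomalousSessions<T extends SessionLike>(
  sessions: T[],
  factor = 3
): T[] {
  if (sessions.length < 3) return []; // too few sessions to define "typical"
  const typical = median(sessions.map((s) => s.totalTokens));
  return sessions.filter((s) => s.totalTokens > typical * factor);
}

const flagged = findAnomalousSessions([
  { id: "a", totalTokens: 10_000 },
  { id: "b", totalTokens: 12_000 },
  { id: "c", totalTokens: 11_000 },
  { id: "d", totalTokens: 90_000 },
]);
console.log(flagged.map((s) => s.id));
```

The median is used instead of the mean so that one runaway session does not raise the baseline it is being compared against.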
Pattern 5: OpenTelemetry Collection Pipeline
For teams that want production-grade observability.
// docker-compose.yml equivalent configuration
interface OtelPipelineConfig {
// Step 1: Configure Claude Code to export
claudeCodeEnv: {
CLAUDE_CODE_ENABLE_TELEMETRY: "1";
OTEL_METRICS_EXPORTER: "otlp";
OTEL_LOGS_EXPORTER: "otlp";
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc";
OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4317";
OTEL_RESOURCE_ATTRIBUTES: "department=engineering,team.id=platform";
};
// Step 2: OTEL Collector receives and routes data
collector: {
receivers: ["otlp/grpc:4317", "otlp/http:4318"];
processors: ["batch", "memory_limiter"];
exporters: ["prometheus", "loki"];
};
// Step 3: Prometheus scrapes metrics
prometheus: {
scrapeInterval: "15s";
targets: ["otel-collector:8889"];
};
// Step 4: Grafana visualizes everything
grafana: {
dataSources: ["prometheus", "loki"];
dashboards: ["claude-code-usage", "team-comparison"];
};
}
// Useful PromQL queries for Grafana dashboards
const GRAFANA_QUERIES = {
// Token usage rate per user
tokenRate: `sum by (user_account_uuid, type) (rate(claude_code_token_usage_total[5m]))`,
// Cost per team per day
dailyCostByTeam: `sum(increase(claude_code_cost_usage_total[24h])) by (department)`,
// Cache hit rate
cacheHitRate: `
sum(rate(claude_code_token_usage_total{type="cacheRead"}[1h]))
/
sum(rate(claude_code_token_usage_total{type=~"input|cacheRead|cacheCreation"}[1h]))
* 100
`,
// Most active sessions
activeSessionTokens: `topk(10, sum(claude_code_token_usage_total) by (session_id))`,
// Tool usage frequency
toolFrequency: `sum(increase(claude_code_tool_result_total[24h])) by (tool_name)`,
// Average session duration
avgSessionDuration: `avg(claude_code_active_time_total) by (user_account_uuid)`,
};
When to use: Teams of 5+ developers. Enterprise environments. When you need historical trending, alerting, and cross-developer comparison.
Gotchas:
- Requires running infrastructure (Docker, Prometheus, Grafana)
- Cost metrics are approximations — official billing data lives in the Claude Console
- Session IDs create high cardinality — watch metric storage costs
- Privacy: OTEL_LOG_USER_PROMPTS is off by default for good reason
Example 1: Quick Token Count for Current Session
import { existsSync, readFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
function getCurrentSessionTokens(): { input: number; output: number; cost: number } {
const transcriptsDir = join(homedir(), ".claude", "transcripts");
// Bail out early if no transcripts directory exists on this machine
if (!existsSync(transcriptsDir)) return { input: 0, output: 0, cost: 0 };
const files = readdirSync(transcriptsDir)
.filter((f) => f.endsWith(".jsonl")) // ignore anything that is not a transcript
.map((f) => ({
name: f,
mtime: statSync(join(transcriptsDir, f)).mtime.getTime(),
}))
.sort((a, b) => b.mtime - a.mtime);
if (files.length === 0) return { input: 0, output: 0, cost: 0 };
const latest = readFileSync(join(transcriptsDir, files[0].name), "utf-8");
let totalInput = 0;
let totalOutput = 0;
for (const line of latest.trim().split("\n")) {
try {
const entry = JSON.parse(line);
if (entry.message?.usage) {
const u = entry.message.usage;
totalInput += u.input_tokens + (u.cache_read_input_tokens ?? 0)
+ (u.cache_creation_input_tokens ?? 0);
totalOutput += u.output_tokens;
}
} catch { continue; }
}
// Rough upper bound at $3/M input and $15/M output: cache reads are priced as fresh input here
const cost = (totalInput * 3 + totalOutput * 15) / 1_000_000;
return { input: totalInput, output: totalOutput, cost };
}
Example 2: Cache Efficiency Report
function cacheEfficiencyReport(messages: ParsedMessage[]): void {
let freshInput = 0;
let cacheRead = 0;
let cacheCreation = 0;
for (const msg of messages) {
if (!msg.usage) continue;
freshInput += msg.usage.input_tokens;
cacheRead += msg.usage.cache_read_input_tokens ?? 0;
cacheCreation += msg.usage.cache_creation_input_tokens ?? 0;
}
const totalInput = freshInput + cacheRead + cacheCreation;
const hitRate = totalInput > 0 ? (cacheRead / totalInput) * 100 : 0;
// Cost comparison: actual vs. if everything were fresh
const actualCost = (freshInput * 3 + cacheRead * 0.3 + cacheCreation * 3.75) / 1_000_000;
const freshCost = (totalInput * 3) / 1_000_000;
const savings = freshCost - actualCost;
console.log(`Cache hit rate: ${hitRate.toFixed(1)}%`);
console.log(`Actual cost: $${actualCost.toFixed(4)}`);
console.log(`Without cache: $${freshCost.toFixed(4)}`);
const savingsPct = freshCost > 0 ? (savings / freshCost) * 100 : 0; // avoid NaN when there is no input
console.log(`Savings: $${savings.toFixed(4)} (${savingsPct.toFixed(0)}%)`);
}
Example 3: Tool Usage Frequency Heatmap Data
function toolUsageByHour(messages: ParsedMessage[]): Map<number, Map<string, number>> {
const heatmap = new Map<number, Map<string, number>>();
for (const msg of messages) {
if (msg.type !== "tool_use" || !msg.toolName) continue;
const hour = msg.timestamp.getHours();
if (!heatmap.has(hour)) heatmap.set(hour, new Map());
const hourMap = heatmap.get(hour)!;
hourMap.set(msg.toolName, (hourMap.get(msg.toolName) ?? 0) + 1);
}
return heatmap;
}
// Output: which tools you use at which hours
// Reveals patterns like: "I do most file reading in the morning,
// most Bash execution in the afternoon"
Example 4: DuckDB Analysis of JSONL Files
-- Load all JSONL files from Claude Code projects directory
-- DuckDB can query JSONL files directly without importing
SELECT
json_extract_string(line, '$.type') as event_type,
json_extract_string(line, '$.timestamp') as ts,
json_extract_string(line, '$.tool_name') as tool,
json_extract(line, '$.message.usage.input_tokens')::int as input_tokens,
json_extract(line, '$.message.usage.output_tokens')::int as output_tokens,
json_extract(line, '$.message.usage.cache_read_input_tokens')::int as cache_read
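-- read_ndjson_objects yields one JSON value per input line; its column is aliased to "line"
FROM read_ndjson_objects('~/.claude/projects/**/*.jsonl') AS t(line)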
WHERE json_extract_string(line, '$.message.usage') IS NOT NULL
ORDER BY ts DESC
LIMIT 100;
-- Aggregate cost by day
SELECT
date_trunc('day', json_extract_string(line, '$.timestamp')::timestamp) as day,
sum(json_extract(line, '$.message.usage.input_tokens')::int) as total_input,
sum(json_extract(line, '$.message.usage.output_tokens')::int) as total_output,
sum(json_extract(line, '$.message.usage.cache_read_input_tokens')::int) as cache_reads,
round(
(sum(json_extract(line, '$.message.usage.input_tokens')::int) * 3.0
+ sum(json_extract(line, '$.message.usage.output_tokens')::int) * 15.0
+ sum(COALESCE(json_extract(line, '$.message.usage.cache_read_input_tokens')::int, 0)) * 0.3
) / 1000000.0, 4
) as estimated_cost_usd
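-- read_ndjson_objects yields one JSON value per input line; its column is aliased to "line"
FROM read_ndjson_objects('~/.claude/projects/**/*.jsonl') AS t(line)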
WHERE json_extract_string(line, '$.message.usage') IS NOT NULL
GROUP BY day
ORDER BY day DESC;
Example 5: Detect Context Compounding
function detectCompounding(messages: ParsedMessage[]): {
turnsBeforeCompounding: number;
compoundingFactor: number;
recommendation: string;
} {
// Track input token growth per API call
const inputTokenSeries: number[] = [];
for (const msg of messages) {
if (msg.usage && !msg.isSidechain) {
const totalInput =
msg.usage.input_tokens +
(msg.usage.cache_read_input_tokens ?? 0) +
(msg.usage.cache_creation_input_tokens ?? 0);
inputTokenSeries.push(totalInput);
}
}
if (inputTokenSeries.length < 3) {
return {
turnsBeforeCompounding: inputTokenSeries.length,
compoundingFactor: 1,
recommendation: "Not enough data to detect compounding",
};
}
// Find the point where input tokens start growing >20% per turn
let compoundingStart = inputTokenSeries.length;
for (let i = 1; i < inputTokenSeries.length; i++) {
const previous = inputTokenSeries[i - 1];
if (previous === 0) continue; // growth is undefined from a zero baseline
const growth = inputTokenSeries[i] / previous;
if (growth > 1.2) {
compoundingStart = i;
break;
}
}
const first = inputTokenSeries[0];
const last = inputTokenSeries[inputTokenSeries.length - 1];
const compoundingFactor = first > 0 ? last / first : 1; // avoid Infinity/NaN on a zero first call
let recommendation = "Context growth is manageable.";
if (compoundingFactor > 5) {
recommendation =
`Context grew ${compoundingFactor.toFixed(1)}x. Consider using /clear ` +
`after turn ${compoundingStart} to reset context.`;
} else if (compoundingFactor > 3) {
recommendation =
`Context grew ${compoundingFactor.toFixed(1)}x. Session is getting expensive. ` +
`Break complex tasks into shorter sessions.`;
}
return {
turnsBeforeCompounding: compoundingStart,
compoundingFactor,
recommendation,
};
}
Example 6: Shell Status Line Integration
#!/usr/bin/env zsh
# Add to ~/.zshrc. RPROMPT and the %F{...} color codes are zsh-specific.
claude_usage() {
  local dir="$HOME/.claude/transcripts"
  local latest
  latest=$(ls -t "$dir"/*.jsonl 2>/dev/null | head -1)
  [ -z "$latest" ] && return
  # Extract token counts from the most recent session.
  # The path is passed as an argument so odd filenames cannot break the script.
  local tokens
  tokens=$(python3 -c '
import json, sys
total = 0
with open(sys.argv[1]) as f:
    for line in f:
        try:
            d = json.loads(line)
            u = d.get("message", {}).get("usage", {})
            total += u.get("input_tokens", 0) + u.get("output_tokens", 0)
            total += u.get("cache_read_input_tokens", 0)
        except Exception:
            pass
print(total)
' "$latest" 2>/dev/null)
  tokens=${tokens:-0} # fall back to 0 if python3 is missing or fails
  if [ "$tokens" -gt 100000 ]; then
    echo -n "%F{red}${tokens}tok%f" # Red: high usage
  elif [ "$tokens" -gt 50000 ]; then
    echo -n "%F{yellow}${tokens}tok%f" # Yellow: moderate
  else
    echo -n "%F{green}${tokens}tok%f" # Green: low
  fi
}
setopt PROMPT_SUBST # required so $(claude_usage) is re-evaluated for each prompt
RPROMPT='$(claude_usage)'
Example 7: Multi-Platform Session Discovery
// Discover sessions across all supported platforms
interface PlatformSession {
platform: string;
path: string;
lastModified: Date;
sizeBytes: number;
}
function discoverAllPlatformSessions(): PlatformSession[] {
const home = homedir();
const sessions: PlatformSession[] = [];
const platforms: { name: string; glob: string }[] = [
{ name: "Claude Code", glob: ".claude/projects/**/*.jsonl" },
{ name: "Claude Code Transcripts", glob: ".claude/transcripts/*.jsonl" },
{ name: "Codex CLI", glob: ".codex/sessions/*" },
{ name: "Gemini CLI", glob: ".gemini/tmp/*/chats/*.json" },
{ name: "Amp", glob: ".local/share/amp/threads/*" },
{ name: "OpenClaw", glob: ".openclaw/agents/*" },
{ name: "Factory Droid", glob: ".factory/sessions/*" },
{ name: "Pi", glob: ".pi/agent/sessions/*" },
{ name: "Kimi CLI", glob: ".kimi/sessions/*" },
{ name: "Qwen CLI", glob: ".qwen/projects/*" },
];
for (const platform of platforms) {
try {
// Use glob to find matching files
const fullGlob = join(home, platform.glob);
// In practice, use a glob library like fast-glob
// This is illustrative of the discovery pattern
console.log(`Scanning ${platform.name}: ${fullGlob}`);
} catch {
continue;
}
}
return sessions;
}
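The function above only logs the patterns it would scan. For the simplest case (all files with a given extension under one root), discovery can be done without a glob library. The sketch below assumes Node.js 20 or newer for the `recursive` option of `readdirSync`; `findSessionFiles` and `FoundSession` are hypothetical names, and only the Claude Code path already listed above is used in the usage line.

```typescript
import { existsSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";

interface FoundSession {
  platform: string;
  path: string;
  lastModified: Date;
  sizeBytes: number;
}

// Recursively list files under `root` whose names end with `extension`,
// newest first. Returns an empty list when the platform is not installed.
function findSessionFiles(
  platform: string,
  root: string,
  extension: string
): FoundSession[] {
  if (!existsSync(root)) return [];
  const entries = readdirSync(root, { recursive: true }) as string[];
  const found: FoundSession[] = [];
  for (const relative of entries) {
    if (!relative.endsWith(extension)) continue;
    const fullPath = join(root, relative);
    try {
      const stats = statSync(fullPath);
      if (!stats.isFile()) continue;
      found.push({
        platform,
        path: fullPath,
        lastModified: stats.mtime,
        sizeBytes: stats.size,
      });
    } catch {
      continue; // file vanished or is unreadable; skip it
    }
  }
  return found.sort(
    (a, b) => b.lastModified.getTime() - a.lastModified.getTime()
  );
}

const claudeSessions = findSessionFiles(
  "Claude Code",
  join(homedir(), ".claude", "projects"),
  ".jsonl"
);
console.log(`Found ${claudeSessions.length} Claude Code session files`);
```

Patterns with wildcards in the middle of the path, such as the Gemini CLI entry, still need a real glob implementation or a per-platform walker.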
Example 8: Weekly Budget Optimizer
interface OptimizationSuggestion {
category: string;
currentCost: number;
projectedSavings: number;
action: string;
confidence: "high" | "medium" | "low";
}
function generateOptimizations(
sessions: SessionSummary[]
): OptimizationSuggestion[] {
const suggestions: OptimizationSuggestion[] = [];
// Check for long sessions that should be split
const longSessions = sessions.filter((s) => s.durationMinutes > 60);
if (longSessions.length > 0) {
// Sum (not average) so currentCost is comparable with the other suggestions
const totalCost = longSessions.reduce((s, x) => s + x.costEstimate, 0);
suggestions.push({
category: "Session Length",
currentCost: totalCost,
projectedSavings: totalCost * 0.4, // assumes ~40% savings from splitting (heuristic)
action: `Split ${longSessions.length} sessions >60min. Use /clear between subtasks.`,
confidence: "high",
});
}
// Check for low cache hit rates
const lowCacheSessions = sessions.filter((s) => s.cacheHitRate < 30);
if (lowCacheSessions.length > 0) {
suggestions.push({
category: "Cache Efficiency",
currentCost: lowCacheSessions.reduce((s, x) => s + x.costEstimate, 0),
projectedSavings: lowCacheSessions.reduce((s, x) => s + x.costEstimate, 0) * 0.5,
action: "Improve CLAUDE.md to reduce context re-sending. Use /compact more often.",
confidence: "medium",
});
}
// Check for excessive file reading
for (const session of sessions) {
const readCalls = session.topTools.find((t) => t.name === "Read");
if (readCalls && readCalls.count > 20) {
suggestions.push({
category: "File Reading",
currentCost: session.costEstimate * 0.3,
projectedSavings: session.costEstimate * 0.15,
action: `Session ${session.id}: ${readCalls.count} file reads. Use Grep/Glob to narrow before reading.`,
confidence: "medium",
});
}
}
return suggestions.sort((a, b) => b.projectedSavings - a.projectedSavings);
}
Despite 15+ tools in the ecosystem, significant gaps remain.
Gap 1: No Unified Cross-Platform View
tokscale supports 15+ platforms but still has limitations. No tool today can answer: “How many total tokens did I consume across Claude Code, Cursor, and Copilot this week, and what was the blended cost?” Each platform uses different units, pricing models, and data formats.
Gap 2: Activity Attribution Is Primitive
Every tool can tell you how many tokens you used. Almost none can tell you why. The Pattern 3 code above (activity classification) doesn’t exist in any shipped product. When a developer hits their limit, they need to know: “Was it the large codebase read, the long conversation, or the repeated failed builds?”
Gap 3: No Optimization Suggestions
The tools report. None prescribe. An ideal system would say: “Your session at 2:15 PM consumed 40% of your daily budget. 70% of that was re-reading files you’d already seen. Next time, use /clear after the first task and start a fresh session for the second.”
Gap 4: No Proactive Warnings
The Claude Code Usage Monitor’s ML predictions are the closest thing to proactive warnings. But even it can only predict based on past patterns — it can’t intervene. Nobody has built the equivalent of a “low battery” notification that fires at 80% usage with suggestions for how to stretch the remaining 20%.
Gap 5: No Team Comparison Without Enterprise
Claude Code’s official analytics are only available on Team and Enterprise plans. Individual Pro/Max subscribers — who arguably need this data most — have no official visibility. The OTEL export works for individuals, but requires running infrastructure.
Gap 6: Billing Window Drift Is Invisible
The 5-hour rolling window means your budget constantly refreshes as old usage falls off. But no tool clearly shows: “In 23 minutes, your oldest expensive session will expire from the window, freeing up 15,000 tokens.” This is the equivalent of showing a cell phone data counter without showing when the billing period resets.
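The "in 23 minutes, 15,000 tokens free up" readout can be sketched as follows. This assumes the simple model described above, in which each usage event leaves the window exactly five hours after it happened; whether the provider's accounting actually works per event is an assumption. `UsageEvent` and `upcomingExpirations` are hypothetical names.

```typescript
interface UsageEvent {
  timestamp: Date;
  tokens: number;
}

const WINDOW_MS = 5 * 60 * 60 * 1000; // 5-hour rolling window

// For each event still inside the window, report when it drops out
// and how many tokens that frees, soonest first.
function upcomingExpirations(
  events: UsageEvent[],
  now: Date
): { minutesUntilFree: number; tokensFreed: number }[] {
  const nowMs = now.getTime();
  return events
    .filter((e) => {
      const age = nowMs - e.timestamp.getTime();
      return age >= 0 && age < WINDOW_MS;
    })
    .map((e) => ({
      minutesUntilFree: Math.ceil(
        (e.timestamp.getTime() + WINDOW_MS - nowMs) / 60_000
      ),
      tokensFreed: e.tokens,
    }))
    .sort((a, b) => a.minutesUntilFree - b.minutesUntilFree);
}

const referenceTime = new Date("2026-03-10T15:00:00Z");
const expirations = upcomingExpirations(
  [
    { timestamp: new Date("2026-03-10T10:23:00Z"), tokens: 15_000 }, // 4h37m old
    { timestamp: new Date("2026-03-10T14:00:00Z"), tokens: 4_000 }, // 1h old
    { timestamp: new Date("2026-03-10T09:00:00Z"), tokens: 9_000 }, // already outside the window
  ],
  referenceTime
);
console.log(expirations[0]); // { minutesUntilFree: 23, tokensFreed: 15000 }
```

A status line could show only the first entry; a dashboard could plot the whole list as a "budget recovery" timeline.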
Gap 7: Cross-Session Context Recommendations
If a developer works on the same project across 5 sessions per day, they re-establish context each time. No tool tracks cross-session patterns to suggest: “You’ve spent 12,000 tokens on context setup for project X today. Consider using a persistent CLAUDE.md file to reduce this.”
ncdu is a terminal-based disk usage analyzer. DaisyDisk is its GUI equivalent for macOS. Both solve the same problem: “Where is my disk space going?” They share design principles that apply perfectly to token usage:
Principle 1: Hierarchical Drill-Down
ncdu shows disk usage as a tree: directory > subdirectory > file. You can drill down to find the single file consuming the most space.
The token equivalent:
Week > Day > Session > Turn > API Call
| | | | |
| | | | └── 2,340 tokens (Read: package.json)
| | | └── 8,200 tokens (3 API calls)
| | └── 45,000 tokens (12 turns, 47 tool calls)
| └── 89,000 tokens (3 sessions)
└── 312,000 tokens (4 days)
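The hierarchy maps naturally onto a tree whose totals roll up from the leaves. The following is a minimal sketch of that data structure, with hypothetical names (`UsageNode`, `rollupTokens`, `renderTree`) and made-up sample figures; a real tool would build the tree from parsed session data.

```typescript
interface UsageNode {
  label: string;
  tokens: number; // own count for leaves; ignored when children exist
  children?: UsageNode[];
}

// Total for a node: the sum of its children, or its own count for a leaf.
function rollupTokens(node: UsageNode): number {
  if (!node.children || node.children.length === 0) return node.tokens;
  return node.children.reduce((sum, child) => sum + rollupTokens(child), 0);
}

// Render the tree largest-first, indenting one level per depth (ncdu-style).
function renderTree(
  node: UsageNode,
  depth = 0,
  lines: string[] = []
): string[] {
  const total = rollupTokens(node).toLocaleString("en-US");
  lines.push(`${"  ".repeat(depth)}${total}  ${node.label}`);
  const children = [...(node.children ?? [])].sort(
    (a, b) => rollupTokens(b) - rollupTokens(a)
  );
  for (const child of children) renderTree(child, depth + 1, lines);
  return lines;
}

const weekTree: UsageNode = {
  label: "Week",
  tokens: 0,
  children: [
    { label: "Tue", tokens: 72_000 },
    {
      label: "Mon",
      tokens: 0,
      children: [
        { label: "session-1", tokens: 45_000 },
        { label: "session-2", tokens: 32_000 },
        { label: "session-3", tokens: 12_000 },
      ],
    },
  ],
};
console.log(renderTree(weekTree).join("\n"));
```

Sorting children by their rolled-up total at every level is what makes the biggest consumer surface first, which is the core of the ncdu interaction.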
Principle 2: Relative Sizing
DaisyDisk uses a sunburst diagram where segment size represents proportional usage. The token equivalent would show sessions as proportional blocks — immediately revealing the “big spender” sessions.
┌─────────────────────────────────────────────────┐
│ Weekly Token Budget: 312K / 500K (62%) │
├──────────────────────┬──────────┬──────┬────────┤
│ Mon: 89K (29%) │ Tue: 72K │ Wed │ Thu │
│ ┌──────────┬───────┐ │ (23%) │ 65K │ 86K │
│ │session-1 │ s-2 │ │ │ (21%)│ (28%) │
│ │ 45K │ 32K │ │ │ │ │
│ │ (51%) │ (36%) │ │ │ │ │
│ └──────────┴───────┘ │ │ │ │
│ s-3: 12K (13%) │ │ │ │
└──────────────────────┴──────────┴──────┴────────┘
And here is the equivalent diagram translated into Mermaid syntax:
block-beta
columns 4
Weekly["Weekly Token Budget: 312K / 500K (62%)"]:4
block:Mon:1
columns 2
TitleMon["Mon: 89K (29%)"]:2
s1["session-1<br/>45K (51%)"]:1
s2["s-2<br/>32K (36%)"]:1
s3["s-3<br/>12K (13%)"]:2
end
Tue["Tue: 72K<br/>(23%)"]
Wed["Wed: 65K<br/>(21%)"]
Thu["Thu: 86K<br/>(28%)"]
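The proportions shown in these diagrams are straightforward to compute. The sketch below derives the per-day percentages and a fixed-width usage bar from raw counts; `usageBar` and `shares` are hypothetical helper names, and the 500K weekly budget is the illustrative figure from the mock-up, not a real plan limit.

```typescript
// Render a fixed-width usage bar for `used` out of `budget`.
function usageBar(used: number, budget: number, width = 50): string {
  const ratio = budget > 0 ? Math.min(1, Math.max(0, used / budget)) : 0;
  const filled = Math.round(ratio * width);
  return "█".repeat(filled) + "░".repeat(width - filled);
}

// Share of each part relative to the sum of all parts, as whole percentages.
function shares(parts: Record<string, number>): Record<string, number> {
  const total = Object.values(parts).reduce((s, v) => s + v, 0);
  const result: Record<string, number> = {};
  for (const [label, value] of Object.entries(parts)) {
    result[label] = total > 0 ? Math.round((value / total) * 100) : 0;
  }
  return result;
}

const dayShares = shares({ Mon: 89_000, Tue: 72_000, Wed: 65_000, Thu: 86_000 });
console.log(usageBar(312_000, 500_000));
console.log(dayShares); // { Mon: 29, Tue: 23, Wed: 21, Thu: 28 }
```

Rounded percentages will not always sum to exactly 100; a display that needs that property would have to distribute the rounding remainder explicitly.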
Principle 3: Actionability
ncdu lets you delete files directly from the interface. The token equivalent would offer actions:
- “This session used 45K tokens. 60% was context re-sending. [Split into 3 sessions next time]”
- “Cache hit rate was 12%. [Add a CLAUDE.md file to prime context]”
- “You read the same 5 files in 3 different sessions today. [Create a context preset]”
Principle 4: Zero-Config
ncdu works with ncdu /path. No configuration, no setup, no accounts. The token equivalent is npx ccusage daily. The best tools already follow this principle.
What the ncdu-for-Tokens Looks Like
Token Usage Analyzer — Week of March 10-17, 2026
Usage: 312,441 / 500,000 tokens (62.5%)
████████████████████████████████░░░░░░░░░░░░░░░░░░
Day Sessions Tokens Cost Cache%
─────────────────────────────────────────────────
Mon Mar 10 3 89,231 $0.58 45%
Tue Mar 11 2 72,104 $0.47 62%
Wed Mar 12 4 65,890 $0.42 71% ← best cache day
Thu Mar 13 3 85,216 $0.55 38% ← worst cache day
Top Sessions (by token cost):
─────────────────────────────────────────────────
1. atlas/refactor-pipeline 45,221 tok $0.29 Mon 2:15pm 62min
→ 60% file reading, 25% code writing, 15% conversation
→ Recommendation: Split after file exploration phase
2. api-mom/fix-auth-bug 38,102 tok $0.25 Thu 9:30am 45min
→ 40% bash execution, 35% code writing, 25% file reading
→ Low cache rate (22%). Prime context with CLAUDE.md.
3. pages-plus/new-component 32,445 tok $0.21 Mon 4:00pm 38min
→ 55% code writing, 30% file reading, 15% conversation
→ Good cache rate (68%). Well-structured session.
[d]rill down [s]ort by [e]xport json [q]uit
This doesn’t exist yet. The closest is ccusage’s session view combined with Claude Spend’s web dashboard. But nobody has combined hierarchical drill-down, activity attribution, and optimization recommendations into a single tool.
The Data Landscape
| Platform | Local Data Format | Location | Token Data | Cost Data | Cache Data |
|---|---|---|---|---|---|
| Claude Code | JSONL | ~/.claude/projects/ | Per-API-call | Calculable | Yes (4 types) |
| Codex CLI | JSONL/JSON | ~/.codex/sessions/ | Per-session | Limited | No |
| Gemini CLI | JSON | ~/.gemini/tmp/*/chats/ | Per-session | Via /stats | Yes (context caching) |
| Cursor | SQLite + API | state.vscdb + API | Via API | Via API | No |
| Copilot | NDJSON + API | Extension logs + API | Org-level | Org-level | No |
| OpenCode | SQLite | ~/.local/share/opencode/ | Per-session | Limited | No |
The Unification Problem
Building a cross-platform analyzer requires solving three problems:
1. Schema Normalization
// Universal token event (normalized across platforms)
interface NormalizedTokenEvent {
platform: "claude-code" | "codex" | "gemini" | "cursor" | "copilot" | "opencode";
timestamp: Date;
sessionId: string;
project?: string;
model: string;
tokens: {
input: number;
output: number;
cacheRead?: number;
cacheWrite?: number;
reasoning?: number; // Codex/OpenAI-specific
};
cost: {
amount: number;
currency: "USD";
confidence: "exact" | "estimated";
};
activity?: {
type: ActivityType;
toolName?: string;
};
}
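As an illustration of what one platform adapter might look like, here is a sketch that maps a parsed Claude Code JSONL line onto a trimmed-down version of the normalized event. The `message.usage.*` and `timestamp` fields are the ones used throughout this article; the `sessionId` and `message.model` field names are assumptions, and `normalizeClaudeEntry` is a hypothetical function.

```typescript
// Trimmed-down normalized event, limited to the fields this sketch fills in.
interface NormalizedTokens {
  platform: "claude-code";
  timestamp: Date;
  sessionId: string;
  model: string;
  tokens: { input: number; output: number; cacheRead: number; cacheWrite: number };
}

// Convert one parsed JSONL line into the normalized shape.
// Returns null for lines that carry no usage data.
function normalizeClaudeEntry(entry: any): NormalizedTokens | null {
  const usage = entry?.message?.usage;
  if (!usage) return null;
  return {
    platform: "claude-code",
    timestamp: new Date(entry.timestamp),
    sessionId: entry.sessionId ?? "unknown", // assumed field name
    model: entry.message.model ?? "unknown", // assumed field name
    tokens: {
      input: usage.input_tokens ?? 0,
      output: usage.output_tokens ?? 0,
      cacheRead: usage.cache_read_input_tokens ?? 0,
      cacheWrite: usage.cache_creation_input_tokens ?? 0,
    },
  };
}

const sampleEvent = normalizeClaudeEntry(
  JSON.parse(
    '{"timestamp":"2026-03-10T14:15:00Z","sessionId":"abc",' +
      '"message":{"model":"example-model","usage":' +
      '{"input_tokens":120,"output_tokens":40,"cache_read_input_tokens":900}}}'
  )
);
console.log(sampleEvent?.tokens);
```

Each additional platform would get its own adapter with the same return type, so everything downstream only ever sees the normalized shape.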
2. Pricing Normalization
Each platform prices differently. Claude Code charges per-token with cache discounts. Cursor uses a monthly credit system. Copilot is flat-rate for individuals but token-based for enterprises. Gemini CLI is free within limits, then per-token.
interface PricingModel {
platform: string;
type: "per-token" | "credit-based" | "flat-rate" | "free-tier-then-token";
inputRate: number; // per million tokens
outputRate: number; // per million tokens
cacheReadRate?: number; // per million tokens
monthlyCredit?: number; // USD (Cursor)
freeLimit?: number; // tokens (Gemini)
}
const PRICING_MODELS: PricingModel[] = [
{
platform: "claude-code",
type: "per-token",
inputRate: 3.00,
outputRate: 15.00,
cacheReadRate: 0.30,
},
{
platform: "cursor",
type: "credit-based",
inputRate: 3.00, // varies by model
outputRate: 15.00,
monthlyCredit: 20.00,
},
{
platform: "copilot",
type: "flat-rate",
inputRate: 0, // included in subscription
outputRate: 0,
},
{
platform: "gemini",
type: "free-tier-then-token",
inputRate: 1.25,
outputRate: 5.00,
freeLimit: 1_500_000, // per day
},
];
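Given rates like these, the per-token part of a cost estimate reduces to one formula. The sketch below covers only that part; it does not model credits, flat-rate subscriptions, or free tiers. `estimateCost`, `Rates`, and `TokenCounts` are hypothetical names, and the rates in the usage line are the illustrative figures from the table above.

```typescript
interface Rates {
  inputRate: number; // USD per million tokens
  outputRate: number; // USD per million tokens
  cacheReadRate?: number; // USD per million tokens
}

interface TokenCounts {
  input: number;
  output: number;
  cacheRead?: number;
}

// Estimate USD cost from per-million-token rates.
// Cache reads fall back to the fresh input rate when no cache rate is set.
function estimateCost(rates: Rates, tokens: TokenCounts): number {
  const cacheRate = rates.cacheReadRate ?? rates.inputRate;
  return (
    (tokens.input * rates.inputRate +
      tokens.output * rates.outputRate +
      (tokens.cacheRead ?? 0) * cacheRate) /
    1_000_000
  );
}

const estimatedUsd = estimateCost(
  { inputRate: 3.0, outputRate: 15.0, cacheReadRate: 0.3 },
  { input: 100_000, output: 20_000, cacheRead: 500_000 }
);
console.log(estimatedUsd.toFixed(2)); // "0.75"
```

In this example the 500K cached tokens cost $0.15, half of what the 100K fresh input tokens cost, which is why cache hit rate dominates the bill in long sessions.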
3. Activity Classification Across Platforms
Each platform uses different tool names. Claude Code calls it “Read” and “Write”. Cursor calls it “file read” and “file edit”. Copilot doesn’t expose tool-level data at all. A cross-platform tool needs a mapping layer.
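A mapping layer of this kind can start as a plain lookup table. The sketch below lists only the tool names mentioned in the paragraph above; real platforms expose more, and the activity labels and the `toCommonActivity` helper are hypothetical.

```typescript
type CommonActivity = "file_reading" | "code_writing" | "unknown";

// Platform-specific tool names mapped to a shared activity vocabulary.
const TOOL_ACTIVITY_MAP: Record<string, Record<string, CommonActivity>> = {
  "claude-code": { Read: "file_reading", Write: "code_writing" },
  cursor: { "file read": "file_reading", "file edit": "code_writing" },
};

// Unknown platforms and unknown tools both resolve to "unknown",
// so callers never have to handle a missing entry.
function toCommonActivity(platform: string, toolName: string): CommonActivity {
  return TOOL_ACTIVITY_MAP[platform]?.[toolName] ?? "unknown";
}

console.log(toCommonActivity("claude-code", "Read")); // file_reading
console.log(toCommonActivity("cursor", "file edit")); // code_writing
console.log(toCommonActivity("copilot", "anything")); // unknown
```

Keeping the table as data rather than code means new platforms can be added without touching the classifier.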
Why tokscale Leads
tokscale already supports 15+ platforms because it focuses on the common denominator: token counts. It doesn’t try to normalize activities or provide optimization recommendations — it just counts tokens across platforms. This is the right v1 approach. The question is whether anyone will build v2 with the deeper analysis.
Tool Comparison Matrix
| Feature | ccusage | tokscale | Claude Spend | Usage Monitor | Limit Tracker | Claude Usage Tracker (macOS) |
|---|---|---|---|---|---|---|
| Stars | 11.6K | 1.2K | 236 | 7K | 15 | 1.6K |
| Language | TypeScript | TS + Rust | TypeScript | Python | Python | Swift |
| Interface | CLI | CLI (TUI) | Web (localhost) | Terminal (Rich) | Shell statusline | macOS menu bar |
| Platforms | Claude + 4 more | 15+ platforms | Claude only | Claude only | Claude only | Claude only |
| Real-time | Yes (live mode) | No | No | Yes | Yes | Yes |
| Cost tracking | Yes | Yes | Yes | Yes | Estimate | Yes (API) |
| Cache breakdown | Yes | Yes | Limited | Yes | No | No |
| ML predictions | No | No | No | Yes | No | No |
| MCP integration | Yes | No | No | No | No | No |
| JSON export | Yes | Yes | No | No | No | Yes |
| Activity attribution | No | No | Limited | No | No | No |
| Optimization suggestions | No | No | No | No | No | No |
| Zero config | Yes (npx) | Yes (npx) | Yes (npx) | pip install | uv run | DMG install |
| Offline capable | Yes | Yes | Yes | Yes | Yes | No (needs API) |
Approach Comparison
| Approach | Pros | Cons | Best For |
|---|---|---|---|
| JSONL parsing (ccusage, Claude Spend) | Offline, private, detailed per-API-call data | Only sees local data, no quota awareness | Individual developers, post-mortem analysis |
| API-based (Claude Usage Tracker macOS) | Exact quota percentages from Anthropic | Requires auth, network dependent, less granular | Real-time quota monitoring |
| OpenTelemetry (OTEL pipeline) | Production-grade, supports alerting, team-wide | Requires infrastructure, setup cost | Teams, enterprises |
| Browser extension (Cursor extensions) | Integrated into workflow, real-time | Platform-specific, limited depth | Cursor users specifically |
| Shell integration (limit-tracker) | Always visible, zero overhead | Limited detail, text-only | Developers who live in the terminal |
| Cross-platform (tokscale) | Unified view across tools | Jack-of-all-trades depth | Multi-tool users |
Platform Analytics Comparison
| Capability | Claude Code | Cursor | Copilot | Gemini CLI | Codex CLI |
|---|---|---|---|---|---|
| Official dashboard | Teams/Enterprise only | Yes (teams) | Yes (org admins) | GCP monitoring | No |
| Individual analytics | /cost command only | Settings page | Extension estimates | /stats command | /status command |
| OTEL export | Native, comprehensive | No | No | Native, with GCP | No |
| Local data richness | Excellent (JSONL per-call) | Good (SQLite) | Limited (logs) | Good (JSON) | Moderate |
| Third-party ecosystem | 10+ tools | 5+ tools | 3+ tools | 1-2 tools | 1-2 tools |
| Per-request token data | Yes | Via API | No (org aggregate) | Yes | Limited |
| Cache token tracking | Yes (4 types) | No | No | Yes (context cache) | No |
| Tool-level attribution | Yes (in JSONL) | No | No | No | No |
| Don’t | Do Instead | Why |
|---|---|---|
| Run a single session for 3+ hours without /clear | Break into 30-60 min focused sessions | Context compounding can multiply token cost 5-10x |
| Read entire files when you need one function | Use Grep/Glob first, then read targeted lines | A 2000-line Read costs 2-3K input tokens every turn it stays in context |
| Ignore cache hit rates | Monitor with ccusage daily --breakdown | Cache reads cost 90% less — low cache rates mean you’re overpaying |
| Track only total tokens | Break down by input/output/cache types | Output tokens cost 3-5x more than input — a session with verbose output is disproportionately expensive |
| Wait until you hit the limit to check usage | Add ccusage statusline to your shell prompt | Proactive monitoring prevents the “sudden wall” experience |
| Use one tool’s analytics for all platforms | Use tokscale for cross-platform, ccusage for Claude-deep-dives | No single tool is best at everything |
| Run OTEL with high-cardinality metrics | Disable session.id attribute if metrics storage is expensive | Each session creates a new time series — unbounded cardinality |
| Assume the 5-hour window is your only limit | Track weekly usage too | Weekly limits exist independently and can block you even with 5-hour budget remaining |
| Build custom analytics from scratch | Start with npx ccusage daily --json and build on top | The JSONL parsing is solved — focus your effort on the analysis layer |
| Set OTEL_LOG_USER_PROMPTS=1 in shared environments | Leave it disabled, log only prompt length | User prompts can contain proprietary code, credentials, or personal data |
The Market
- 85% of developers use AI coding tools regularly
- Claude Code is the most-used AI coding tool as of 2026
- 70% of developers use 2-4 AI tools simultaneously
- Cursor crossed $500M ARR with usage-based pricing
- Every pricing change generates backlash and demand for visibility
Willingness to Pay
The existing tools are all open-source. But there are signals:
- ccusage has a sponsorship request (11.6K stars suggests strong demand)
- Claude Spend got 100 users from a single Reddit post
- The Cursor budget tracker HackerNoon article drove significant traffic
- Multiple developers independently built the same category of tool
Potential Business Models
| Model | Price Point | Target | Viability |
|---|---|---|---|
| Freemium CLI | Free + $5-10/mo for advanced features | Individual developers | Medium — hard to gate CLI features |
| Team dashboard SaaS | $10-20/user/mo | Engineering teams | High — teams need comparison, budgeting, alerting |
| Enterprise OTEL add-on | $50-100/user/mo | Large orgs | High — integrates with existing observability |
| API proxy with analytics | Pay-per-use markup | API users | Low — adds latency and trust issues |
| Desktop app | $15-30 one-time | Mac/Windows users | Medium — DaisyDisk model, but small market |
The “Mint for AI” Opportunity
Mint (the personal finance app) succeeded by aggregating all financial accounts into one view and providing categorized spending analysis with optimization suggestions. The AI equivalent:
- Connect Claude Code, Cursor, Copilot, Gemini CLI
- Normalize all spending into one dashboard
- Categorize by activity type (coding, reading, planning, debugging)
- Provide weekly “budget review” with optimization suggestions
- Alert when approaching limits
- Suggest the most cost-efficient tool for each task type
Nobody has built this. The pieces exist (tokscale for multi-platform, ccusage for deep analysis, Usage Monitor for predictions), but the integrated product does not.
Why Open Source Might Win
The challenge for commercial products: the data is local. Users don’t want to upload their coding conversations to a SaaS. The macOS menu bar apps and CLI tools work precisely because they run locally. A commercial product would need to either:
- Run entirely locally (like DaisyDisk — one-time purchase, no cloud)
- Aggregate only metadata (token counts, not content)
- Provide team-level value that justifies cloud aggregation
The second option, aggregating only metadata, is the most likely path. Teams already share token usage data via OTEL to Grafana. A hosted version with better UX and built-in optimization recommendations could charge $10-20/user/month.
Based on the gap analysis, here’s what the ecosystem actually needs:
1. Activity Attribution Engine (The Missing Layer)
Every tool counts tokens. Nobody classifies them by activity type. Build a middleware that:
- Takes JSONL events as input
- Classifies each turn by activity type (reading, writing, executing, conversing, debugging)
- Outputs enriched JSONL or a summary API
- Works as a library, not a standalone tool
```typescript
// The API that should exist
import { AttributionEngine } from "@token-analytics/attribution";

const engine = new AttributionEngine();
const enriched = engine.analyze("~/.claude/projects/my-project/session.jsonl");

console.log(enriched.summary);
// {
//   file_reading:   { tokens: 45000, cost: 0.14, percentage: 35 },
//   code_writing:   { tokens: 32000, cost: 0.10, percentage: 25 },
//   code_execution: { tokens: 25000, cost: 0.08, percentage: 20 },
//   conversation:   { tokens: 18000, cost: 0.06, percentage: 14 },
//   git_operations: { tokens: 8000, cost: 0.03, percentage: 6 },
// }

console.log(enriched.recommendations);
// [
//   { action: "Use Grep before Read", savings: "~15%", confidence: "high" },
//   { action: "Split after exploration phase", savings: "~25%", confidence: "medium" },
// ]
```
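Until such a library exists, the classification step can be approximated with a lookup from tool name to activity type. This is a minimal sketch, assuming each turn has already been reduced to a tool name and a token count. The `Turn` shape and the tool-to-activity mapping are illustrative assumptions, not a documented log format.

```typescript
type Activity = "file_reading" | "code_writing" | "code_execution" | "conversation";

// Hypothetical, pre-parsed turn; real JSONL events carry far more fields.
interface Turn {
  tool: string | null; // null when the turn is plain conversation
  tokens: number;
}

// Assumed mapping from tool name to activity; extend per platform.
const TOOL_ACTIVITY: Record<string, Activity> = {
  Read: "file_reading",
  Grep: "file_reading",
  Glob: "file_reading",
  Edit: "code_writing",
  Write: "code_writing",
  Bash: "code_execution",
};

function classify(turn: Turn): Activity {
  if (turn.tool === null) return "conversation";
  // Unknown tools fall back to "conversation" rather than being dropped.
  return Object.prototype.hasOwnProperty.call(TOOL_ACTIVITY, turn.tool)
    ? TOOL_ACTIVITY[turn.tool]
    : "conversation";
}

interface ActivityTotal {
  tokens: number;
  percentage: number;
}

function summarize(turns: Turn[]): Record<Activity, ActivityTotal> {
  const summary: Record<Activity, ActivityTotal> = {
    file_reading: { tokens: 0, percentage: 0 },
    code_writing: { tokens: 0, percentage: 0 },
    code_execution: { tokens: 0, percentage: 0 },
    conversation: { tokens: 0, percentage: 0 },
  };
  let total = 0;
  for (const turn of turns) {
    summary[classify(turn)].tokens += turn.tokens;
    total += turn.tokens;
  }
  for (const activity of Object.keys(summary) as Activity[]) {
    const share = total === 0 ? 0 : summary[activity].tokens / total;
    summary[activity].percentage = Math.round(share * 100);
  }
  return summary;
}

const sampleTurns: Turn[] = [
  { tool: "Read", tokens: 450 },
  { tool: "Edit", tokens: 250 },
  { tool: "Bash", tokens: 200 },
  { tool: null, tokens: 100 },
];
const activitySummary = summarize(sampleTurns);
console.log(activitySummary.file_reading); // { tokens: 450, percentage: 45 }
```

A lookup table is deliberately crude: it cannot tell debugging from ordinary execution, which is the kind of distinction a real attribution engine would need heuristics or a model for.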
2. Cross-Platform Budget Tracker
Extend tokscale’s approach with:
- Unified weekly budget across all platforms
- Normalized cost comparison (“this task cost $0.12 in Claude Code vs. estimated $0.08 in Gemini”)
- Platform recommendation engine (“for large file reads, Gemini is 60% cheaper”)
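The normalized comparison reduces to pricing the same token counts against each platform's rate card. The sketch below uses placeholder platform names and placeholder prices; none of the numbers are real vendor pricing, and the `PriceCard` shape is an assumption.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// USD per million tokens.
interface PriceCard {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Placeholder prices for illustration only; not real vendor pricing.
const PRICES: Record<string, PriceCard> = {
  "platform-a": { inputPerMTok: 3, outputPerMTok: 15 },
  "platform-b": { inputPerMTok: 1, outputPerMTok: 10 },
};

function costUsd(usage: TokenUsage, price: PriceCard): number {
  const micros =
    usage.inputTokens * price.inputPerMTok +
    usage.outputTokens * price.outputPerMTok;
  return micros / 1e6;
}

interface Quote {
  platform: string;
  cost: number;
}

// Price the same usage on every platform and return the cheapest quote.
function cheapest(usage: TokenUsage, prices: Record<string, PriceCard>): Quote | null {
  let best: Quote | null = null;
  for (const platform of Object.keys(prices)) {
    const cost = costUsd(usage, prices[platform]);
    if (best === null || cost < best.cost) {
      best = { platform, cost };
    }
  }
  return best;
}

const task: TokenUsage = { inputTokens: 20000, outputTokens: 4000 };
console.log(costUsd(task, PRICES["platform-a"])); // 0.12
console.log(cheapest(task, PRICES)); // { platform: "platform-b", cost: 0.06 }
```

One caveat keeps this an estimate: different models use different tokenizers, so the same task does not produce identical token counts on two platforms. A real tracker would need a per-platform correction factor or a re-tokenization step.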
3. Proactive Warning System
A daemon (or shell integration) that:
- Monitors token consumption in real-time
- Fires alerts at 60%, 80%, 95% of quota
- Suggests specific actions based on current burn rate
- Predicts when the limit will be hit (like the Usage Monitor’s ML, but proactive)
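The alerting core of such a daemon is small. The sketch below fires each threshold once, when a reading crosses it, and uses a plain linear projection rather than the ML-based prediction mentioned above. The function names and the source of the quota figure are assumptions.

```typescript
const THRESHOLDS = [0.6, 0.8, 0.95];

// Returns the thresholds crossed between the previous and current reading,
// so each alert fires once instead of on every poll.
function crossedThresholds(prevUsed: number, currUsed: number, quota: number): number[] {
  const crossed: number[] = [];
  for (const t of THRESHOLDS) {
    if (prevUsed / quota < t && currUsed / quota >= t) {
      crossed.push(t);
    }
  }
  return crossed;
}

// Linear projection: minutes until the quota is exhausted at the recent
// burn rate. Returns null when nothing is being consumed.
function minutesUntilLimit(used: number, quota: number, tokensPerMinute: number): number | null {
  if (tokensPerMinute <= 0) return null;
  return Math.max(0, (quota - used) / tokensPerMinute);
}

const windowQuota = 100000; // assumed quota for the current window
console.log(crossedThresholds(55000, 82000, windowQuota)); // [ 0.6, 0.8 ]
console.log(minutesUntilLimit(82000, windowQuota, 600)); // 30
```

Comparing against the previous reading is what makes the system proactive without being noisy: a session sitting at 85% does not re-alert on every poll.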
4. Team Leaderboard with Context
tokscale’s global leaderboard is fun but meaningless. A team leaderboard with context would show:
- Token efficiency (output per token), not just total consumption
- Cache hit rates compared to team average
- Session length distribution
- Tool usage patterns that correlate with productivity
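Two of these metrics can be computed directly from per-session token counts. The sketch below assumes a `SessionUsage` record that separates uncached input, cache reads, and cache writes. The field names are illustrative, and "output per token" is only a rough proxy for efficiency.

```typescript
interface SessionUsage {
  member: string;
  inputTokens: number; // uncached input
  cacheReadTokens: number;
  cacheCreationTokens: number;
  outputTokens: number;
}

function totalInput(s: SessionUsage): number {
  return s.inputTokens + s.cacheReadTokens + s.cacheCreationTokens;
}

// Share of input tokens that were served from cache.
function cacheHitRate(s: SessionUsage): number {
  const input = totalInput(s);
  return input === 0 ? 0 : s.cacheReadTokens / input;
}

// Output tokens produced per token consumed overall.
function outputPerToken(s: SessionUsage): number {
  const total = totalInput(s) + s.outputTokens;
  return total === 0 ? 0 : s.outputTokens / total;
}

function teamAverage(sessions: SessionUsage[], metric: (s: SessionUsage) => number): number {
  if (sessions.length === 0) return 0;
  let sum = 0;
  for (const s of sessions) sum += metric(s);
  return sum / sessions.length;
}

const teamSessions: SessionUsage[] = [
  { member: "alice", inputTokens: 1000, cacheReadTokens: 8000, cacheCreationTokens: 1000, outputTokens: 2000 },
  { member: "bob", inputTokens: 5000, cacheReadTokens: 4000, cacheCreationTokens: 1000, outputTokens: 2000 },
];

const avgHitRate = teamAverage(teamSessions, cacheHitRate);
for (const s of teamSessions) {
  const delta = cacheHitRate(s) - avgHitRate;
  console.log(s.member, cacheHitRate(s).toFixed(2), delta >= 0 ? "above average" : "below average");
}
```

Here both members produce the same output, but one reaches it with a cache hit rate of 0.8 and the other with 0.4, a difference that a raw consumption leaderboard would hide.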
5. Historical Analysis Dashboard
A lightweight web app (like Claude Spend, but deeper) that provides:
- ncdu-style hierarchical drill-down
- Week-over-week trending
- “Budget review” reports (like a bank statement)
- Exportable reports for expense tracking
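The drill-down view rests on one data structure: a tree whose parent totals are the sum of their children, sorted largest first. The sketch below assumes a project, session, turn hierarchy. The `UsageNode` shape and the sample names are hypothetical.

```typescript
interface UsageNode {
  name: string;
  tokens: number; // own tokens for leaves; overwritten by rollUp for parents
  children: UsageNode[];
}

// Post-order traversal: each parent's total becomes the sum of its children.
function rollUp(node: UsageNode): number {
  if (node.children.length > 0) {
    let sum = 0;
    for (const child of node.children) sum += rollUp(child);
    node.tokens = sum;
    // Largest consumers first, in the spirit of ncdu's size-sorted listing.
    node.children.sort((a, b) => b.tokens - a.tokens);
  }
  return node.tokens;
}

// Flatten the tree into indented text lines, one per node.
function render(node: UsageNode, depth: number = 0, out: string[] = []): string[] {
  const indent = new Array(depth + 1).join("  ");
  out.push(indent + node.tokens + " " + node.name);
  for (const child of node.children) render(child, depth + 1, out);
  return out;
}

const usageTree: UsageNode = {
  name: "all projects",
  tokens: 0,
  children: [
    {
      name: "api",
      tokens: 0,
      children: [
        { name: "s1", tokens: 3000, children: [] },
        { name: "s2", tokens: 9000, children: [] },
      ],
    },
    {
      name: "web",
      tokens: 0,
      children: [{ name: "s3", tokens: 20000, children: [] }],
    },
  ],
};

rollUp(usageTree);
console.log(render(usageTree).join("\n"));
// 32000 all projects
//   20000 web
//     20000 s3
//   12000 api
//     9000 s2
//     3000 s1
```

Week-over-week trending can reuse the same structure by building one tree per week and diffing totals at matching paths.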
Official Documentation
- Claude Code Monitoring & Usage — OpenTelemetry configuration, metrics, events, and privacy controls
- Claude Code Usage Analytics — Official team/enterprise analytics dashboard
- Claude Code ROI Measurement Guide — Anthropic’s official Docker + Prometheus + Grafana setup
- Cursor Analytics Docs — Built-in team analytics for Cursor
- GitHub Copilot Usage Metrics — Official Copilot metrics API and dashboard
- Monitoring GitHub Copilot Usage — Entitlements and usage monitoring
- Gemini CLI — Official docs including /stats command
- Gemini CLI Monitoring Dashboard — Pre-configured GCP monitoring
- OpenAI Codex CLI — Official docs including /status command
- Codex CLI Cost Tracking Feature Request — Community discussion on usage tracking
CLI Tools (JSONL Parsers)
- ccusage — 11.6K stars. The definitive Claude Code usage analyzer. TypeScript, daily/monthly/session/blocks views, MCP server, multi-platform.
- tokscale — 1.2K stars. Cross-platform tracker (15+ platforms). Rust core, TUI, global leaderboard.
- Claude Spend — 236 stars. Web dashboard via npx claude-spend. Viral on Reddit.
- claude-code-usage — CLI tool for local cost analysis.
- claude-code-usage-analyzer — Comprehensive analyzer using ccusage + LiteLLM pricing.
- claude-usage-tracker (haasonsaas) — JSONL parser with rate limit awareness and daily/weekly breakdowns.
Real-Time Monitors
- Claude-Code-Usage-Monitor — 7K stars. Python, Rich UI, ML-based predictions, P90 percentile analysis.
- claude-code-limit-tracker — Shell statusline integration with per-model quota tracking.
- claude-code-token-tracker — Real-time monitoring with conversation-level tracking.
Platform-Native Apps
- Claude-Usage-Tracker (macOS) — 1.6K stars. Swift/SwiftUI menu bar app. 6-tier pace system, multi-profile.
- TokenMeter — macOS menu bar app reading OAuth token from Keychain.
- CodexBar — macOS stats for Codex + Claude Code without login.
- claude-monitor-gnome-extension — GNOME Shell 48 taskbar integration.
Observability Integrations
- claude-code-monitor (OTEL) — OpenTelemetry receiver with web dashboard and Prometheus export.
- claude_telemetry — Drop-in claudia wrapper exporting to Logfire, Sentry, Honeycomb, Datadog.
- claude-code-otel — Docker Compose + Prometheus + Grafana solution.
- Claude Code + Grafana Guide (Quesma) — Step-by-step Grafana Cloud setup.
- Claude Code + VictoriaMetrics Guide — Self-hosted monitoring stack.
- Claude Code Monitoring with SigNoz — SigNoz integration guide.
- Claude Code Grafana Dashboard (Sealos) — Pre-built Grafana dashboard configuration.
- Claude Code Monitoring on Bedrock — AWS-specific monitoring guide.
Cursor-Specific Tools
- Cursor Stats Chrome Extension — Dashboard visualization.
- Cursor Token Tracker Chrome Extension — Cost tracking with budget projection.
- Cursor Usage & Cost Tracker VS Code Extension — In-editor monitoring.
- Cursor Usage Widget — Subscription monitoring with model breakdown.
- Cursor Tokens Dashboard — Analytics dashboard for Pro/Ultra/Teams.
Copilot-Specific Tools
- Copilot Token Tracker VS Code Extension — Status bar token estimates.
- copilot-metrics-viewer — Copilot Business Metrics API visualizer.
Log Viewers and Utilities
- claude-code-log — JSONL to HTML converter.
- clog — Web-based log viewer with real-time monitoring.
- tps-viewer — Tokens-per-second visualization.
- ClaudeCode_Dashboard — Dashboard with USD/JPY conversion.
- claude-usage-analytics (Prism) — API proxy with FinOps analytics.
Analysis and Guides
- Analyzing Claude Code Logs with DuckDB — SQL-based JSONL analysis.
- How to Calculate Your Claude Code Context Usage — Technical deep-dive on context calculation from JSONL.
- How to Track Claude Code Usage (Shipyard) — Survey of tracking methods.
- How to Reduce Claude Code Token Usage by 60% — Practical optimization guide.
- Claude Code Tokens Explained (Shipyard) — Token types and counting.
- Best Ways to Monitor Claude Code Token Usage (DEV) — 2026 monitoring guide.
- Claude Code Token Limits Guide (Faros AI) — Engineering leader’s guide.
- Claude Code Rate Limits Explained (SitPoint) — Developer guide to rate limits.
- Measuring ROI for Claude Code (Tribe AI) — ROI measurement quickstart.
- Cursor Pricing Explained (Vantage) — Cursor pricing model analysis.
Community Discussions and Bug Reports
- Excessive Token Usage in Claude Code 2.1.1 (GitHub #16856) — Bug report: 4x faster consumption.
- Opus 4.5 Usage Limits Reduced (GitHub #17084) — Limit reduction complaints.
- Excessive Token Usage (Hacker News) — HN discussion on token consumption.
- Claude Devs Complain About Limits (The Register) — Press coverage of limit frustration.
- Cursor Budget Story (HackerNoon) — Developer’s 700% bill increase and tool building.
- Claude Spend LinkedIn Post — Viral launch story: 100 users from Reddit.
- Codex CLI Usage After Limit Reset — Codex limit complaints.
- Gemini CLI Usage Monitoring Discussion — Community discussion.
- Gemini CLI Token Usage Feature Request — Feature request for spending visibility.
Market and Adoption Data
- AI Coding Assistant Statistics 2026 (Panto) — 85% developer adoption, time savings data.
- AI Coding Tools Compared 2026 (TLDL) — Benchmarks and pricing comparison.
- Best AI Coding Agents 2026 (Faros) — Real-world developer reviews.
- AI Tooling for Software Engineers 2026 (Pragmatic Engineer) — Industry survey.
- Copilot vs Cursor vs Codeium 2026 — Market comparison.
Visualization Inspiration
- ncdu (NCurses Disk Usage) — Terminal disk usage analyzer: the UX model for token usage.
- DaisyDisk — macOS disk analyzer with sunburst visualization.
- The Birth of DaisyDisk — Design decisions behind sunburst vs. treemap.
- Treemap vs Sunburst (Klarity) — Visualization design comparison.
Published March 17, 2026. This survey covers tools and data available as of this date. The AI coding assistant ecosystem moves fast — tools may have added features or new entrants may have appeared since publication.