Gary Wu

Durable Objects as Capability Registries


A service registry does not need a consensus protocol, a cluster, or external infrastructure. A Durable Object is already the single source of truth for its capability: single-threaded writes eliminate registration races, in-memory state delivers sub-millisecond reads, and alarm-driven health checks make the registry self-managing. The architectural insight is simple — one DO per capability name.



The Problem with D1 as Registry

The obvious first instinct for a capability registry is D1. It is the structured-data store on Cloudflare Workers. It speaks SQL. You can write a providers table, query it on each invocation, and call it done.

This breaks under production load in three specific ways.

Throughput ceiling. D1 is built on SQLite running in a single-writer model. Cloudflare’s documentation puts the practical ceiling around 1,000 write queries per second for the global database. A registration storm — agents coming online after a cold deploy — can saturate this limit instantly. The result is not graceful degradation. It is 429s and lost registrations.

Latency floor. A D1 read from a Worker takes 500–1,500ms in the general case. This includes the round-trip to the nearest D1 replica, the SQLite query execution, and the response. A capability lookup happens on every invocation, so this is not an occasional cache miss; it is the critical path. In the worst case, your 200ms inference request becomes a 1,700ms inference request.

Race-condition registration. When multiple providers register the same capability at the same moment, nothing serializes their read-modify-write cycles, so they can overwrite each other. Provider A reads the provider list and writes it back with its own record. Provider B, which read the same list before A's write landed, writes its version 50ms later. Provider A’s entry is gone. This is the same class of problem that causes etcd bootstrap failures when multiple services race to register themselves at startup. etcd solves this with Raft consensus, a distributed coordination protocol whose correctness proofs span academic papers. The solution works, but it requires running a three-node or five-node cluster, managing leader elections, handling network partitions, and paying the operational cost of a stateful distributed system.

Neither D1 nor KV gives you what a service registry actually needs: sub-millisecond reads and race-condition-free writes at the same time.
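
The lost registration described above can be reproduced without any database. The sketch below is a deterministic simulation, not D1 code: the `Row` type, the provider ids, and both function names are made up for illustration. It replays the interleaving where two writers read before either writes, then the same two registrations applied one at a time.

```typescript
// Hypothetical shape of the shared record: provider_id -> url.
type Row = Record<string, string>;

// Two writers read the same snapshot, then write back in turn.
// The second write is built from a stale read and clobbers the first.
function interleavedRegistration(): string[] {
  let row: Row = {};
  const seenByA: Row = { ...row }; // A reads
  const seenByB: Row = { ...row }; // B reads before A has written
  row = { ...seenByA, 'provider-a': 'https://a.example' }; // A writes
  row = { ...seenByB, 'provider-b': 'https://b.example' }; // B writes, A is lost
  return Object.keys(row);
}

// A single writer applies each registration to the result of the previous one.
function serializedRegistration(): string[] {
  let row: Row = {};
  const registrations: Array<[string, string]> = [
    ['provider-a', 'https://a.example'],
    ['provider-b', 'https://b.example'],
  ];
  for (const [id, url] of registrations) {
    const seen: Row = { ...row }; // always reads the latest committed state
    row = { ...seen, [id]: url };
  }
  return Object.keys(row);
}
```

The second function is the behavior a registry needs: every registration sees the effect of the one before it.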


The Insight: A DO Is a Registry Entry

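Durable Objects were built to solve coordination problems. The properties that make them useful for rate limiting and session state are exactly the properties a registry needs:

Single-threaded writes. Requests to one DO instance never run in parallel, which eliminates registration races.

In-memory state. The provider list stays resident in the instance, so reads take well under a millisecond.

Alarms. The DO schedules its own health checks, so the registry manages itself.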

The architectural shift is a single sentence: the DO does not look up the registry entry — it is the registry entry.

                Old model (D1)                New model (DO)
              ┌──────────────┐              ┌──────────────────┐
 Worker ──→   │   D1 query   │  ──→ row     │ CapabilityDO     │
              └──────────────┘              │                  │
              500-1500ms                    │ providers[]      │
              eventual writes               │ metrics{}        │
              race conditions possible      │ budget{}         │
                                            │                  │
                                            │ fetch() → route  │
                                            │ alarm() → health │
                                            └──────────────────┘
                                            <1ms reads
                                            strong writes
                                            self-managing

One DO per capability name. code-generation, image-analysis, structured-extraction, web-search — each is its own DO instance with its own providers list, its own metrics, its own budget envelope. The DO for code-generation has no awareness of image-analysis and needs none.


CapabilityRegistryDO Implementation

This is the full production implementation. It handles five operations: registration, heartbeat, invocation, status reporting, and deregistration.

// capability-registry-do.ts

interface Provider {
  id: string;
  url: string;
  auth_header: string;
  registered_at: number;
  last_heartbeat: number;
  health: 'active' | 'stale' | 'dead';
  weight: number; // 0.0 - 1.0, adjusted by adaptive controller
  metadata: Record<string, string>; // tier, model, cost_per_token, etc.
}

interface ProviderMetrics {
  provider_id: string;
  latency_samples: number[];     // rolling 100 samples (ms)
  error_count: number;
  success_count: number;
  cost_usd_today: number;
  last_updated: number;
}

interface RegistryState {
  capability: string;
  providers: Record<string, Provider>;
  metrics: Record<string, ProviderMetrics>;
  budget: {
    daily_cap_usd: number;
    spent_today_usd: number;
    budget_day: string; // YYYY-MM-DD, resets at midnight UTC
  };
  routing: {
    strategy: 'weighted-random' | 'lowest-latency' | 'round-robin';
    round_robin_index: number;
  };
  created_at: number;
  last_health_check: number;
}

const STALE_THRESHOLD_MS = 2 * 60 * 1000;  // 2 minutes without heartbeat → stale
const DEAD_THRESHOLD_MS  = 5 * 60 * 1000;  // 5 minutes without heartbeat → dead
const HEALTH_CHECK_INTERVAL_MS = 60 * 1000; // alarm every 60 seconds

export class CapabilityRegistryDO implements DurableObject {
  private state: DurableObjectState;
  private env: Env;
  private registry: RegistryState | null = null;

  constructor(state: DurableObjectState, env: Env) {
    this.state = state;
    this.env = env;

    this.state.blockConcurrencyWhile(async () => {
      this.registry = await this.state.storage.get<RegistryState>('registry') ?? null;
    });
  }

  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const segments = url.pathname.split('/').filter(Boolean);
    const action = segments[0];

    switch (action) {
      case 'register':  return this.handleRegister(request);
      case 'heartbeat': return this.handleHeartbeat(request);
      case 'invoke':    return this.handleInvoke(request);
      case 'status':    return this.handleStatus();
      case 'deregister': return this.handleDeregister(request);
      default:
        return new Response('Not found', { status: 404 });
    }
  }

  // -- Registration -----------------------------------------------

  private async handleRegister(request: Request): Promise<Response> {
    const body = await request.json<{
      provider_id: string;
      url: string;
      auth_header: string;
      capability: string;
      metadata?: Record<string, string>;
    }>();

    const registry = this.ensureRegistry(body.capability);

    // Idempotent: re-registration updates the record, does not duplicate
    const existing = registry.providers[body.provider_id];
    registry.providers[body.provider_id] = {
      id: body.provider_id,
      url: body.url,
      auth_header: body.auth_header,
      registered_at: existing?.registered_at ?? Date.now(),
      last_heartbeat: Date.now(),
      health: 'active',
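      // alarm() zeroes the weight of dead providers; a re-registration starts fresh
      weight: existing && existing.health !== 'dead' ? existing.weight : 1.0,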
      metadata: body.metadata ?? {},
    };

    // Initialize metrics for new providers
    if (!registry.metrics[body.provider_id]) {
      registry.metrics[body.provider_id] = {
        provider_id: body.provider_id,
        latency_samples: [],
        error_count: 0,
        success_count: 0,
        cost_usd_today: 0,
        last_updated: Date.now(),
      };
    }

    // Ensure health check alarm is running
    const currentAlarm = await this.state.storage.getAlarm();
    if (!currentAlarm) {
      await this.state.storage.setAlarm(Date.now() + HEALTH_CHECK_INTERVAL_MS);
    }

    await this.save();
    return Response.json({ registered: true, provider_id: body.provider_id });
  }

  // -- Heartbeat --------------------------------------------------

  private async handleHeartbeat(request: Request): Promise<Response> {
    const body = await request.json<{ provider_id: string }>();
    const registry = this.requireRegistry();

    const provider = registry.providers[body.provider_id];
    if (!provider) {
      return new Response('Provider not registered', { status: 404 });
    }

    provider.last_heartbeat = Date.now();

    // Revive a stale provider — it checked in, mark active
    if (provider.health === 'stale') {
      provider.health = 'active';
    }

    await this.save();
    return Response.json({ ok: true, health: provider.health });
  }

  // -- Invocation -------------------------------------------------

  private async handleInvoke(request: Request): Promise<Response> {
    const registry = this.requireRegistry();

    // Check budget before routing
    const budgetResult = this.checkBudget(registry);
    const freeTierOnly = budgetResult.exhausted;

    // Select provider
    const provider = this.selectProvider(registry, freeTierOnly);
    if (!provider) {
      return new Response(
        JSON.stringify({ error: 'no_healthy_providers', free_tier_only: freeTierOnly }),
        { status: 503, headers: { 'Content-Type': 'application/json' } }
      );
    }

    // Proxy the request to the selected provider
    const start = Date.now();
    let proxyResponse: Response;

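    // Copy the caller's headers, then override Authorization. Header names are
    // case-insensitive, so spreading them into a plain object next to an
    // 'Authorization' key would send the caller's and the provider's
    // credentials combined in one header.
    const outboundHeaders = new Headers(request.headers);
    outboundHeaders.set('Authorization', provider.auth_header);
    outboundHeaders.set('X-Routed-By', 'capability-registry');
    outboundHeaders.set('X-Provider-Id', provider.id);

    try {
      proxyResponse = await fetch(provider.url, {
        method: request.method,
        headers: outboundHeaders,
        body: request.body,
      });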
    } catch (err) {
      this.recordResult(registry, provider.id, Date.now() - start, false, 0);
      await this.save();
      return new Response(
        JSON.stringify({ error: 'provider_unreachable', provider_id: provider.id }),
        { status: 502, headers: { 'Content-Type': 'application/json' } }
      );
    }

    const latencyMs = Date.now() - start;
    const success = proxyResponse.ok;

    // Parse cost from response headers if provider includes them
    const costUsd = parseFloat(proxyResponse.headers.get('X-Cost-USD') ?? '0');

    this.recordResult(registry, provider.id, latencyMs, success, costUsd);
    this.adaptWeights(registry);
    await this.save();

    // Forward the provider response with routing metadata headers
    const responseHeaders = new Headers(proxyResponse.headers);
    responseHeaders.set('X-Provider-Id', provider.id);
    responseHeaders.set('X-Routed-Latency-Ms', String(latencyMs));
    responseHeaders.set('X-Budget-Remaining-USD',
      String((registry.budget.daily_cap_usd - registry.budget.spent_today_usd).toFixed(4))
    );

    return new Response(proxyResponse.body, {
      status: proxyResponse.status,
      headers: responseHeaders,
    });
  }

  // -- Status -----------------------------------------------------

  private handleStatus(): Response {
    const registry = this.requireRegistry();
    const activeProviders = Object.values(registry.providers).filter(p => p.health === 'active');

    return Response.json({
      capability: registry.capability,
      providers: {
        total: Object.keys(registry.providers).length,
        active: activeProviders.length,
        stale: Object.values(registry.providers).filter(p => p.health === 'stale').length,
        dead: Object.values(registry.providers).filter(p => p.health === 'dead').length,
      },
      budget: {
        daily_cap_usd: registry.budget.daily_cap_usd,
        spent_today_usd: registry.budget.spent_today_usd,
        remaining_usd: registry.budget.daily_cap_usd - registry.budget.spent_today_usd,
        budget_day: registry.budget.budget_day,
      },
      metrics_summary: this.buildMetricsSummary(registry),
      last_health_check: registry.last_health_check,
    });
  }

  // -- Alarm: Health Check Cycle ----------------------------------

  async alarm(): Promise<void> {
    const registry = this.registry;
    if (!registry) return;

    const now = Date.now();
    registry.last_health_check = now;

    // Advance health state machine for each provider
    for (const provider of Object.values(registry.providers)) {
      const silenceDuration = now - provider.last_heartbeat;

      if (provider.health === 'active' && silenceDuration > STALE_THRESHOLD_MS) {
        provider.health = 'stale';
        console.log(`[${registry.capability}] Provider ${provider.id} → stale (${Math.round(silenceDuration / 1000)}s silent)`);
      }

      if (provider.health !== 'dead' && silenceDuration > DEAD_THRESHOLD_MS) {
        provider.health = 'dead';
        provider.weight = 0;
        console.log(`[${registry.capability}] Provider ${provider.id} → dead (${Math.round(silenceDuration / 1000)}s silent)`);
      }
    }

    // Reset daily budget if the day has rolled over
    const today = new Date().toISOString().slice(0, 10);
    if (registry.budget.budget_day !== today) {
      registry.budget.spent_today_usd = 0;
      registry.budget.budget_day = today;
      // Reset daily cost counters on metrics
      for (const m of Object.values(registry.metrics)) {
        m.cost_usd_today = 0;
      }
    }

    await this.save();

    // Reschedule self
    await this.state.storage.setAlarm(Date.now() + HEALTH_CHECK_INTERVAL_MS);
  }

  // -- Provider Selection -----------------------------------------

  private selectProvider(registry: RegistryState, freeTierOnly: boolean): Provider | null {
    let candidates = Object.values(registry.providers).filter(p => p.health === 'active');

    if (freeTierOnly) {
      candidates = candidates.filter(p => p.metadata['tier'] === 'free');
    }

    if (candidates.length === 0) return null;

    switch (registry.routing.strategy) {
      case 'weighted-random':
        return this.weightedRandom(candidates);
      case 'lowest-latency':
        return this.lowestLatency(candidates, registry.metrics);
      case 'round-robin':
        return this.roundRobin(candidates, registry);
      default:
        return this.weightedRandom(candidates);
    }
  }

  private weightedRandom(providers: Provider[]): Provider {
    const totalWeight = providers.reduce((sum, p) => sum + p.weight, 0);
    let r = Math.random() * totalWeight;
    for (const p of providers) {
      r -= p.weight;
      if (r <= 0) return p;
    }
    return providers[providers.length - 1];
  }

  private lowestLatency(providers: Provider[], metrics: Record<string, ProviderMetrics>): Provider {
    return providers.reduce((best, p) => {
      const bestP50 = this.p50(metrics[best.id]?.latency_samples ?? []);
      const thisP50 = this.p50(metrics[p.id]?.latency_samples ?? []);
      return thisP50 < bestP50 ? p : best;
    });
  }

  private roundRobin(providers: Provider[], registry: RegistryState): Provider {
    const i = registry.routing.round_robin_index % providers.length;
    registry.routing.round_robin_index = i + 1;
    return providers[i];
  }

  // -- Helpers ----------------------------------------------------

  private recordResult(
    registry: RegistryState,
    providerId: string,
    latencyMs: number,
    success: boolean,
    costUsd: number,
  ): void {
    const m = registry.metrics[providerId];
    if (!m) return;

    m.latency_samples.push(latencyMs);
    if (m.latency_samples.length > 100) m.latency_samples.shift();

    if (success) m.success_count++;
    else m.error_count++;

    m.cost_usd_today += costUsd;
    m.last_updated = Date.now();

    registry.budget.spent_today_usd += costUsd;
  }

  private p50(samples: number[]): number {
    if (samples.length === 0) return Infinity;
    const sorted = [...samples].sort((a, b) => a - b);
    return sorted[Math.floor(sorted.length * 0.50)];
  }

  private p95(samples: number[]): number {
    if (samples.length === 0) return Infinity;
    const sorted = [...samples].sort((a, b) => a - b);
    return sorted[Math.floor(sorted.length * 0.95)];
  }

  private p99(samples: number[]): number {
    if (samples.length === 0) return Infinity;
    const sorted = [...samples].sort((a, b) => a - b);
    return sorted[Math.floor(sorted.length * 0.99)];
  }

  private checkBudget(registry: RegistryState): { exhausted: boolean } {
    const today = new Date().toISOString().slice(0, 10);
    if (registry.budget.budget_day !== today) {
      registry.budget.spent_today_usd = 0;
      registry.budget.budget_day = today;
    }
    return { exhausted: registry.budget.spent_today_usd >= registry.budget.daily_cap_usd };
  }

  private ensureRegistry(capability: string): RegistryState {
    if (!this.registry) {
      this.registry = {
        capability,
        providers: {},
        metrics: {},
        budget: {
          daily_cap_usd: 10.0,
          spent_today_usd: 0,
          budget_day: new Date().toISOString().slice(0, 10),
        },
        routing: {
          strategy: 'weighted-random',
          round_robin_index: 0,
        },
        created_at: Date.now(),
        last_health_check: 0,
      };
    }
    return this.registry;
  }

  private requireRegistry(): RegistryState {
    if (!this.registry) throw new Error('Registry not initialized');
    return this.registry;
  }

  private buildMetricsSummary(registry: RegistryState) {
    return Object.values(registry.metrics).map(m => ({
      provider_id: m.provider_id,
      p50_ms: this.p50(m.latency_samples),
      p95_ms: this.p95(m.latency_samples),
      p99_ms: this.p99(m.latency_samples),
      error_rate: m.error_count / Math.max(1, m.error_count + m.success_count),
      cost_usd_today: m.cost_usd_today,
    }));
  }

  private async save(): Promise<void> {
    await this.state.storage.put('registry', this.registry);
  }

  // -- Deregistration ---------------------------------------------

  private async handleDeregister(request: Request): Promise<Response> {
    const body = await request.json<{ provider_id: string }>();
    const registry = this.requireRegistry();

    delete registry.providers[body.provider_id];
    delete registry.metrics[body.provider_id];

    await this.save();
    return Response.json({ deregistered: true });
  }
}

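The critical invariant: a DO instance runs on a single thread, so handleRegister and handleInvoke never execute in parallel, and every synchronous read-modify-write of the registry is atomic. There is no locking code because none is needed for those sections. One caveat: handleInvoke awaits an outbound fetch, and the runtime may deliver other requests while that call is pending, so state read before the fetch may have changed by the time the response arrives. recordResult tolerates this by skipping providers that were deregistered in the meantime.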


Adaptive Routing: Per-Provider Metrics

The adaptWeights method is where the Adaptive Controller pattern — described in the DO patterns article — is applied to provider selection. After every invocation, the DO updates routing weights based on observed performance.

private adaptWeights(registry: RegistryState): void {
  const activeProviders = Object.values(registry.providers)
    .filter(p => p.health === 'active');

  if (activeProviders.length === 0) return;

  for (const provider of activeProviders) {
    const m = registry.metrics[provider.id];
    if (!m || m.latency_samples.length < 10) continue; // need enough data

    const errorRate = m.error_count / Math.max(1, m.error_count + m.success_count);
    const p99Ms = this.p99(m.latency_samples);

    // Raw weight falls linearly as error rate and tail latency rise.
    // Despite the names, both terms are scores in [0, 1] where 1 is best.
    const errorPenalty  = 1 - Math.min(errorRate * 2, 1.0); // 50% error rate → score 0
    const latencyPenalty = Math.max(0, 1 - p99Ms / 5000);    // 5s P99 → score 0

    const rawWeight = errorPenalty * 0.6 + latencyPenalty * 0.4;

    // Smooth: blend 80% old weight, 20% new signal — avoid thrashing
    provider.weight = provider.weight * 0.8 + rawWeight * 0.2;

    // Floor: never drop below 0.05 unless health state has degraded
    // This keeps underperforming providers in the pool at low weight
    // rather than removing them, so metrics continue to accumulate
    provider.weight = Math.max(0.05, provider.weight);

    console.log(
      `[adapt] ${provider.id}: p99=${p99Ms}ms err=${(errorRate*100).toFixed(1)}% → weight=${provider.weight.toFixed(3)}`
    );
  }
}

The exponential smoothing (weight * 0.8 + signal * 0.2) prevents a single bad request from swinging the routing table. A provider that degrades gradually, with P99 creeping from 200ms to 2000ms over an hour, loses weight incrementally rather than in one step. With the constants above, an error-free provider at a 2000ms P99 settles toward a weight of 0.84 (0.6 × 1 + 0.4 × 0.6), and a 5s P99 alone takes it no lower than 0.6. A weight near the 0.05 floor requires a high error rate as well. Traffic shifts toward the faster providers in proportion, without an abrupt cutover.
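
To check the arithmetic behind the weight formula, the two steps of adaptWeights can be pulled out as pure functions. This is a sketch for experimentation, using the same constants as the method above; the function names `rawWeight`, `smooth`, and `stepsToSettle` are introduced here and are not part of the DO.

```typescript
const WEIGHT_FLOOR = 0.05;

// Raw signal: 60% error-rate score, 40% P99 latency score, both in [0, 1].
function rawWeight(errorRate: number, p99Ms: number): number {
  const errorScore = 1 - Math.min(errorRate * 2, 1.0);
  const latencyScore = Math.max(0, 1 - p99Ms / 5000);
  return errorScore * 0.6 + latencyScore * 0.4;
}

// One smoothing step: keep 80% of the old weight, take 20% of the new signal.
function smooth(oldWeight: number, signal: number): number {
  return Math.max(WEIGHT_FLOOR, oldWeight * 0.8 + signal * 0.2);
}

// Number of invocations until the weight is within `tolerance` of a steady signal.
function stepsToSettle(start: number, signal: number, tolerance: number): number {
  let weight = start;
  let steps = 0;
  while (Math.abs(weight - signal) > tolerance && steps < 1000) {
    weight = smooth(weight, signal);
    steps++;
  }
  return steps;
}
```

Each step closes 20% of the remaining gap, so a single outlier moves the weight by at most a fifth of the distance to the new signal.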

This matters for the P99 case in particular. P50 and P95 can look healthy while P99 reveals a tail problem: a provider hitting a cold start every 50th request, or a model that occasionally emits a 4,000-token response when the task expected 100. The DO tracks all three and uses P99 in the weight calculation.
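
The three percentile helpers in the DO differ only in one constant, so the tail effect is easy to reproduce with a single function. The sketch below uses the same nearest-rank indexing as p50, p95, and p99 above; the `percentile` name and the sample data are illustrative.

```typescript
// Nearest-rank percentile over a sample window, matching the DO's helpers.
function percentile(samples: number[], q: number): number {
  if (samples.length === 0) return Infinity;
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor(sorted.length * q));
  return sorted[index];
}

// A full 100-sample window where 2 requests hit a slow path.
const windowSamples: number[] = [
  ...new Array<number>(98).fill(100), // typical requests: 100ms
  3000,
  3000, // two tail requests: 3000ms
];

const summary = {
  p50: percentile(windowSamples, 0.5),
  p95: percentile(windowSamples, 0.95),
  p99: percentile(windowSamples, 0.99),
};
```

With this window, the P50 and P95 both land on a 100ms sample while the P99 lands on a 3000ms one, which is why the weight calculation reads the P99.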


Budget Governor Integration

The DO doubles as a budget enforcement point. This is Pattern 3 from the DO patterns article — Budget Governor — applied at the capability layer rather than the global layer.

When the daily budget for a capability is exhausted, the selectProvider method restricts candidates to providers tagged tier: free. Paid providers are excluded for the rest of the budget day. Free providers — Workers AI, OpenRouter free tier — continue to serve requests at degraded quality.

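// Configure budget when registering the capability.
// Note: this assumes a configure route and DO action exist; neither is
// part of the implementation shown above.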
await fetch('/registry/configure', {
  method: 'POST',
  body: JSON.stringify({
    capability: 'code-generation',
    daily_cap_usd: 5.00,          // $5/day cap for this capability
    routing_strategy: 'weighted-random',
  }),
});

// Provider registration includes tier metadata
await fetch('/v1/registry/code-generation/register', {
  method: 'POST',
  body: JSON.stringify({
    provider_id: 'claude-sonnet-primary',
    url: 'https://api.anthropic.com/v1/messages',
    auth_header: `Bearer ${ANTHROPIC_KEY}`,
    capability: 'code-generation',
    metadata: {
      tier: 'paid',
      model: 'claude-sonnet-4-6',
      cost_per_m_tokens: '3.00',
    },
  }),
});

await fetch('/v1/registry/code-generation/register', {
  method: 'POST',
  body: JSON.stringify({
    provider_id: 'llama-free-fallback',
    url: 'https://openrouter.ai/api/v1/chat/completions',
    auth_header: `Bearer ${OPENROUTER_KEY}`,
    capability: 'code-generation',
    metadata: {
      tier: 'free',
      model: 'meta-llama/llama-3.3-70b-instruct:free',
      cost_per_m_tokens: '0',
    },
  }),
});

The budget governor is local to the capability. code-generation burning its $5 daily budget does not affect image-analysis. Each DO manages its own envelope. This avoids the global budget starvation problem where one expensive capability consumes the shared pool and blocks everything else.

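The alarm cycle handles the daily reset: on the first alarm after midnight UTC, budget_day no longer matches today’s date string, and spent_today_usd resets to zero. checkBudget makes the same comparison on every invocation, so a request that arrives before that alarm still sees a fresh budget. No external scheduler is needed.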
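
The rollover rule is small enough to isolate and test on its own. The following sketch restates it as pure functions over the same budget shape as RegistryState; `utcDay`, `rollover`, and `isExhausted` are names introduced here, and passing the clock in as a parameter is a choice made for testability rather than something the DO does.

```typescript
interface Budget {
  daily_cap_usd: number;
  spent_today_usd: number;
  budget_day: string; // YYYY-MM-DD in UTC
}

// toISOString() is always UTC, so the first 10 characters are the UTC date.
function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Returns the budget unchanged within the same UTC day, or a reset copy after it.
function rollover(budget: Budget, now: Date): Budget {
  const today = utcDay(now);
  if (budget.budget_day === today) return budget;
  return { ...budget, spent_today_usd: 0, budget_day: today };
}

// Same comparison as checkBudget: spending at or above the cap is exhausted.
function isExhausted(budget: Budget): boolean {
  return budget.spent_today_usd >= budget.daily_cap_usd;
}
```

Because the day string is derived from UTC, providers and callers in any timezone see the reset at the same instant.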


The Worker Entry Point

The Workers code that routes incoming requests to the right capability DO is minimal. The routing key is the capability name in the URL path.

// worker.ts — API Mom entry point

import { Hono } from 'hono';

type Bindings = {
  CAPABILITY_REGISTRY: DurableObjectNamespace;
};

const app = new Hono<{ Bindings: Bindings }>();

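// Route invocation to the correct capability DO
app.post('/v1/invoke/:name', async (c) => {
  const capabilityName = c.req.param('name');
  const stub = c.env.CAPABILITY_REGISTRY.get(
    c.env.CAPABILITY_REGISTRY.idFromName(capabilityName)
  );

  // The DO dispatches on the first path segment, so the path must be
  // rewritten to /invoke; forwarding /v1/invoke/:name as-is would hit the 404 branch.
  const url = new URL(c.req.url);
  url.pathname = '/invoke';
  return stub.fetch(new Request(url.toString(), c.req.raw));
});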

// Registration endpoint
app.post('/v1/registry/:name/register', async (c) => {
  const capabilityName = c.req.param('name');
  const stub = c.env.CAPABILITY_REGISTRY.get(
    c.env.CAPABILITY_REGISTRY.idFromName(capabilityName)
  );

  const url = new URL(c.req.url);
  url.pathname = '/register';
  return stub.fetch(new Request(url.toString(), c.req.raw));
});

// Heartbeat endpoint
app.post('/v1/registry/:name/heartbeat', async (c) => {
  const capabilityName = c.req.param('name');
  const stub = c.env.CAPABILITY_REGISTRY.get(
    c.env.CAPABILITY_REGISTRY.idFromName(capabilityName)
  );

  const url = new URL(c.req.url);
  url.pathname = '/heartbeat';
  return stub.fetch(new Request(url.toString(), c.req.raw));
});

// Status endpoint — useful for dashboards and debugging
app.get('/v1/registry/:name/status', async (c) => {
  const capabilityName = c.req.param('name');
  const stub = c.env.CAPABILITY_REGISTRY.get(
    c.env.CAPABILITY_REGISTRY.idFromName(capabilityName)
  );

  const url = new URL(c.req.url);
  url.pathname = '/status';
  return stub.fetch(new Request(url.toString(), { method: 'GET' }));
});

export default app;

// Wrangler requires the DO class to be exported from the entry point module
export { CapabilityRegistryDO } from './capability-registry-do.js';

idFromName is the critical call. It deterministically maps a string to a DO ID. Every Worker instance that calls idFromName('code-generation') gets the same ID, which routes to the same DO instance, which holds the single authoritative provider list for that capability. There is no coordination between Workers required — the name is the key, and Cloudflare handles global routing.
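
Every route above repeats the same two steps: resolve the stub by name, then rewrite the path to the action the DO dispatches on. A possible way to factor that out is sketched below. `NamespaceLike`, `capabilityStub`, and `doUrl` are hypothetical names, and the interface is a minimal structural stand-in so the snippet compiles without the Workers type definitions; a real DurableObjectNamespace binding exposes idFromName and get in the same shape.

```typescript
// Minimal structural stand-in for a Durable Object namespace binding.
interface NamespaceLike<Id, Stub> {
  idFromName(name: string): Id;
  get(id: Id): Stub;
}

// Same name in, same stub out: the capability name is the routing key.
function capabilityStub<Id, Stub>(ns: NamespaceLike<Id, Stub>, name: string): Stub {
  return ns.get(ns.idFromName(name));
}

// Rewrite an incoming public URL to the single-segment path the DO switches on,
// keeping the origin and query string intact.
function doUrl(incomingUrl: string, action: string): string {
  const url = new URL(incomingUrl);
  url.pathname = `/${action}`;
  return url.toString();
}
```

A route handler would then call something like `capabilityStub(c.env.CAPABILITY_REGISTRY, name)` and forward a request built from `doUrl(c.req.url, 'heartbeat')`.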


Why This Beats Consul and etcd

Consul and etcd are built to solve distributed consensus: how do you agree on a value when multiple machines can fail and network partitions can split your cluster? They solve this with real complexity — Raft logs, leader elections, quorum writes, anti-entropy gossip. etcd’s documentation recommends a five-node cluster for production. Consul’s getting-started guide begins with server and client agent configuration.

This complexity is load-bearing when you have genuinely distributed infrastructure that can fail independently. It is overhead when your “infrastructure” is a serverless platform that manages durability for you.

A Durable Object gives you what Consul gives you for a single entity, with none of the cluster management:

Consensus without consensus protocol. The DO is the single source of truth by construction. There is no replication to coordinate because there is only one writer. Cloudflare handles durability and persistence at the storage layer.

No cluster to operate. There is no etcd cluster to provision, patch, backup, and monitor. The registry exists as long as you deploy the Worker. Cloudflare manages the underlying infrastructure.

No leader election. The DO is always the leader. It does not need an election because it never shares authority with another instance.

No registration races. etcd provides atomic compare-and-swap operations precisely because multiple clients racing to register the same key create undefined state. In the DO model, the second registration request cannot execute until the first has fully committed. The race condition is structurally impossible.

Global by default. etcd clusters are regional. Cross-region etcd requires replication topologies and careful latency management. A DO is globally addressable by default — a Worker in Tokyo and a Worker in London reach the same DO instance.

The tradeoff: Consul and etcd support cross-entity queries natively. You can ask them questions that span many keys or services, such as “which capabilities currently have at least one healthy provider?” The DO model does not support this, because a capability DO knows only its own providers. For cross-capability queries, you need a separate catalog index (KV or D1 works fine for this; the latency requirements are looser because catalog queries happen during bootstrapping, not on every invocation).


Multi-Dimensional Health: Not a Boolean

Binary health — healthy or unhealthy — misses the most useful information. A provider that is technically alive but degrading is more dangerous than a dead one, because it accepts traffic while delivering poor results.

The DO implements a three-state health machine per provider:

active ──→ stale ──→ dead
  ▲          │
  └──────────┘  heartbeat received (revives from stale)

Two signals act on a provider: heartbeat silence, which moves it between health states on the alarm cycle, and invocation metrics, which lower its routing weight while it remains active.

Provider State Machine
─────────────────────

active      healthy: receiving heartbeats, weight adapts (floored at 0.05)
  │
  │ silence > 2 min (no heartbeat)
  ▼
stale       suspect: skipped by selectProvider until a heartbeat arrives
  │         (weight frozen at last value before going stale)
  │
  ├── heartbeat received ──→ active   (revived: weight resumes adapting)
  │
  │ silence > 5 min
  ▼
dead        excluded: weight = 0, removed from selection pool
            (stays in registry for auditing, not for routing)
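
The transitions can also be written as two pure functions, which makes the rules easy to unit test. This sketch mirrors the thresholds and the behavior of alarm() and handleHeartbeat in the implementation; `afterSilence` and `afterHeartbeat` are names introduced here for illustration.

```typescript
type Health = 'active' | 'stale' | 'dead';

const STALE_THRESHOLD_MS = 2 * 60 * 1000; // 2 minutes without heartbeat
const DEAD_THRESHOLD_MS = 5 * 60 * 1000;  // 5 minutes without heartbeat

// Applied on each alarm tick with the time since the last heartbeat.
function afterSilence(current: Health, silenceMs: number): Health {
  if (current !== 'dead' && silenceMs > DEAD_THRESHOLD_MS) return 'dead';
  if (current === 'active' && silenceMs > STALE_THRESHOLD_MS) return 'stale';
  return current;
}

// Applied when a heartbeat arrives. Only stale providers are revived;
// in the implementation above, a dead provider must register again.
function afterHeartbeat(current: Health): Health {
  return current === 'stale' ? 'active' : current;
}
```

Keeping dead as a terminal state for heartbeats means a provider that vanished for more than five minutes has to go through registration again before it receives traffic.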

The latency dimension adds a second path to degradation that does not require a missing heartbeat. A provider that is alive and responding, but whose P99 is increasing over a rolling window, receives progressively lower weight. By the time P99 reaches the 5s threshold used in adaptWeights, the latency term contributes nothing to its weight. This is proactive degradation: the DO responds to the trend, not the failure event.

// Detect P99 trend. Intended to be called from alarm() for proactive
// degradation; the alarm() shown earlier does not call it yet.
private detectLatencyTrend(m: ProviderMetrics): 'stable' | 'degrading' | 'critical' {
  if (m.latency_samples.length < 20) return 'stable'; // insufficient data

  // Compare P99 of first half vs second half of samples
  const half = Math.floor(m.latency_samples.length / 2);
  const older = m.latency_samples.slice(0, half);
  const newer = m.latency_samples.slice(half);

  const p99Older = this.p99(older);
  const p99Newer = this.p99(newer);

  const ratio = p99Newer / Math.max(1, p99Older);

  if (ratio > 3.0) return 'critical';   // P99 tripled — imminent failure
  if (ratio > 1.5) return 'degrading';  // P99 up 50% — watch closely
  return 'stable';
}

A critical trend triggers an immediate weight reduction to the floor, even before a heartbeat is missed. This is the difference between the system noticing a problem at the 5-minute dead threshold versus noticing it while the provider is still responsive but clearly struggling.
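
The alarm() method shown earlier does not yet act on the trend, so the following is one way the "drop to the floor" step might look, written as standalone functions under a few assumptions: `applyTrend` and `WEIGHT_FLOOR` are names introduced here, a degrading trend is left to adaptWeights, and the trend detector is restated over a plain sample array so the snippet runs by itself.

```typescript
type Trend = 'stable' | 'degrading' | 'critical';

const WEIGHT_FLOOR = 0.05; // same floor adaptWeights enforces

function p99(samples: number[]): number {
  if (samples.length === 0) return Infinity;
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length * 0.99)];
}

// Compare the P99 of the older half of the window with the newer half.
function detectLatencyTrend(samples: number[]): Trend {
  if (samples.length < 20) return 'stable'; // insufficient data
  const half = Math.floor(samples.length / 2);
  const ratio = p99(samples.slice(half)) / Math.max(1, p99(samples.slice(0, half)));
  if (ratio > 3.0) return 'critical';
  if (ratio > 1.5) return 'degrading';
  return 'stable';
}

// Critical trend: cap the weight at the floor right away.
// Math.min keeps an already lower weight (such as 0 for dead) untouched.
function applyTrend(weight: number, trend: Trend): number {
  return trend === 'critical' ? Math.min(weight, WEIGHT_FLOOR) : weight;
}
```

Inside alarm(), this would amount to looping over active providers and assigning `provider.weight = applyTrend(provider.weight, detectLatencyTrend(samples))` before saving.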


Scaling Limits and Sharding

The DO model has honest limits. Understanding them prevents scaling surprises.

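Provider count per DO. In practice, 1–50 providers per capability DO is the comfortable range. The full provider list, metrics, and budget state live in memory and are written back to storage as a single value on every invocation. At 100 providers, the serialized state is a few hundred kilobytes, which is workable but worth checking against the per-value size limit of the DO storage backend in use. At 1,000 providers, serializing that state on every write becomes measurable overhead, and the adaptWeights loop, which runs on every invocation and sorts each provider's samples, costs O(n) in provider count.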

If a capability genuinely has 1,000+ providers, shard by provider group. code-generation/us-east, code-generation/eu-west, code-generation/asia-pacific — three DOs, each managing a regional subset. A parent “shard coordinator” DO distributes registrations and aggregates status queries.

Request throughput per DO. Single-threaded execution is both the safety property and the throughput ceiling. Cloudflare’s documentation puts DO throughput at approximately 1,000 requests/second in practice, bounded by the processing time of each request. If each invocation held the DO for the full 200ms it takes a provider to respond, a strictly serialized DO would top out at roughly 5 invocations per second (1,000ms / 200ms). The runtime can interleave other requests while an outbound fetch is pending, which softens this, but every proxied request still passes through the one instance for its full duration. For high-throughput capabilities, the DO becomes a bottleneck.

The fix: the DO handles routing decisions, not the actual proxying. Instead of stub.fetch(request) that waits for the provider response, the Worker fetches the routing decision from the DO, then makes the provider call directly.

// High-throughput pattern: separate routing from execution
app.post('/v1/invoke/:name', async (c) => {
  const capabilityName = c.req.param('name');
  const stub = c.env.CAPABILITY_REGISTRY.get(
    c.env.CAPABILITY_REGISTRY.idFromName(capabilityName)
  );

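  // Step 1: get routing decision from DO (fast: in-memory, no external calls).
  // This assumes the DO's fetch() switch has been extended with `route` and
  // `record` actions; they are not part of the implementation shown earlier.
  const routeUrl = new URL(c.req.url);
  routeUrl.pathname = '/route';
  const routeResp = await stub.fetch(new Request(routeUrl.toString(), { method: 'POST' }));
  if (!routeResp.ok) return routeResp; // e.g. 503 when no healthy provider exists
  const { provider_url, auth_header, provider_id } = await routeResp.json<{
    provider_url: string;
    auth_header: string;
    provider_id: string;
  }>();

  // Step 2: call provider directly from the Worker (parallel across many Workers).
  // Headers.set replaces any Authorization header the caller sent.
  const headers = new Headers(c.req.raw.headers);
  headers.set('Authorization', auth_header);
  const start = Date.now();
  const providerResp = await fetch(provider_url, {
    method: c.req.method,
    headers,
    body: c.req.raw.body,
  });
  const latencyMs = Date.now() - start;

  // Step 3: report result back to DO asynchronously (non-blocking).
  // Request requires an absolute URL, so derive it from the incoming one.
  const recordUrl = new URL(c.req.url);
  recordUrl.pathname = '/record';
  c.executionCtx.waitUntil(
    stub.fetch(new Request(recordUrl.toString(), {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        provider_id,
        latency_ms: latencyMs,
        success: providerResp.ok,
        cost_usd: parseFloat(providerResp.headers.get('X-Cost-USD') ?? '0'),
      }),
    }))
  );

  return providerResp;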
});

This pattern removes the provider’s response latency from the DO’s critical path. The DO spends microseconds on a routing decision and returns. The Worker spends 200ms waiting for the provider. Metrics flow back asynchronously via waitUntil. The DO throughput ceiling applies only to routing decisions and metric ingestion — both sub-millisecond operations.
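
The DO side of the `/record` call is not shown in the article. A minimal sketch of what it might do is below, assuming the report body has the four fields the Worker sends; `parseReport`, `applyReport`, and the `MetricsLike` type are hypothetical names, and the update rule is copied from recordResult.

```typescript
interface ResultReport {
  provider_id: string;
  latency_ms: number;
  success: boolean;
  cost_usd: number;
}

interface MetricsLike {
  latency_samples: number[];
  error_count: number;
  success_count: number;
  cost_usd_today: number;
}

// Validate an untrusted JSON body. A malformed cost header becomes NaN in the
// Worker and null after JSON.stringify, so a non-numeric cost is treated as 0.
function parseReport(raw: unknown): ResultReport | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const r = raw as Record<string, unknown>;
  if (typeof r.provider_id !== 'string' || r.provider_id === '') return null;
  if (typeof r.latency_ms !== 'number' || !Number.isFinite(r.latency_ms) || r.latency_ms < 0) return null;
  if (typeof r.success !== 'boolean') return null;
  const cost = typeof r.cost_usd === 'number' && Number.isFinite(r.cost_usd) ? r.cost_usd : 0;
  return {
    provider_id: r.provider_id,
    latency_ms: r.latency_ms,
    success: r.success,
    cost_usd: Math.max(0, cost),
  };
}

// Same bookkeeping as recordResult: rolling window of 100 samples plus counters.
function applyReport(metrics: MetricsLike, report: ResultReport): void {
  metrics.latency_samples.push(report.latency_ms);
  if (metrics.latency_samples.length > 100) metrics.latency_samples.shift();
  if (report.success) metrics.success_count++;
  else metrics.error_count++;
  metrics.cost_usd_today += report.cost_usd;
}
```

Validating the body matters more here than in the proxying design, because metrics now arrive from outside the DO instead of being measured inside it.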

Cross-capability queries. The hardest limit: there is no way to ask all capability DOs a question simultaneously. If you need “list all capabilities with at least one healthy provider,” you need a separate index. KV with a capabilities key holding a JSON array of names works well here. Each DO writes its name to the index on first registration. The catalog lookup is a single KV read.
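
A sketch of such an index follows. It deliberately swaps the single JSON-array key for one key per capability, on the assumption that several DOs may announce themselves around the same time and a shared array would reintroduce a read-modify-write on one KV key. `KVLike` is a minimal interface modeled on the put and list methods of a Workers KV binding, the `capability:` prefix is an arbitrary choice, and pagination of list results is ignored.

```typescript
const INDEX_PREFIX = 'capability:';

// Minimal stand-in for the parts of a KV binding this sketch uses.
interface KVLike {
  put(key: string, value: string): Promise<void>;
  list(options: { prefix: string }): Promise<{ keys: { name: string }[] }>;
}

function indexKey(capability: string): string {
  return INDEX_PREFIX + capability;
}

// Turn listed key names back into sorted capability names.
function namesFromKeys(keys: { name: string }[]): string[] {
  return keys
    .filter((k) => k.name.startsWith(INDEX_PREFIX))
    .map((k) => k.name.slice(INDEX_PREFIX.length))
    .sort();
}

// Called by a capability DO on first registration. Writing its own key is
// idempotent and never touches another capability's entry.
async function announceCapability(kv: KVLike, capability: string): Promise<void> {
  await kv.put(indexKey(capability), new Date().toISOString());
}

// Catalog lookup: one prefix listing instead of asking every DO.
async function listCapabilities(kv: KVLike): Promise<string[]> {
  const { keys } = await kv.list({ prefix: INDEX_PREFIX });
  return namesFromKeys(keys);
}
```

The index only says which capabilities exist. Whether each one has a healthy provider is still a question for that capability's DO via its status endpoint.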


Comparison Table

Feature | D1 Registry | KV Registry | DO Registry
--- | --- | --- | ---
Read latency | 500–1,500ms | <10ms | <1ms (in-memory)
Write consistency | Eventual | Eventual | Strong (single-writer)
Registration races | Possible | Possible | Impossible
Adaptive routing | Manual (external) | Manual (external) | Built-in
Health management | External cron | External cron | Alarm-driven
Budget enforcement | Manual | Manual | Built-in
Per-provider metrics | Manual | Manual | Built-in
Cross-entity queries | Native SQL | Manual scan | Requires separate index
Provider limit | Unlimited rows | Unlimited keys | ~100 comfortable, ~1K with care
Operational overhead | Low | Low | Very low
External infrastructure | None | None | None
Consensus guarantees | None | None | Single-writer

The KV column deserves attention. KV reads are fast (<10ms), but KV writes are eventually consistent with up to 60 seconds of propagation delay. Two Workers registering the same capability key within 60 seconds of each other can overwrite each other’s record. This is the same race condition as D1 — slower to manifest, but structurally identical. The DO eliminates it entirely.

