Skip to content
Gary Wu
Go back

Durable Object Rate Limiting

Edit page

Org Status: 🟡 Dormant Cloudflare: N/A Last Audited: 2026-04-28


You need to crawl an external API — thousands of records, no official rate limit documented, and a 403 that arrives without warning. Fixed-interval crons either waste capacity or get you blocked. Token buckets don’t learn. Queue-based backpressure has no memory of what happened ten minutes ago.

Durable Objects solve this with a pattern I call the adaptive controller: a stateful singleton that schedules its own work, measures its own success rate, and tunes its own parameters in a closed feedback loop. No external scheduler. No static timers. The system finds its own cruising speed.

What you’ll learn:



Every developer who has built a crawler for an external API has faced the same situation. You need data from a service — the Apple App Store, a social media API, a competitor’s product catalog — and that service has rate limits that are either undocumented, inconsistently enforced, or both.

The Apple App Store is a particularly instructive example. There is no official “App Store Scraping API.” The iTunes Search API is public but limited. For rich data — subtitles, rating histograms, in-app purchase details, similar apps — you need to scrape the store pages directly. Apple’s response to automated requests varies by:

The naive approach is a cron job that runs every N minutes and processes a fixed batch:

// The naive approach — don't do this
export default {
  async scheduled(event: ScheduledEvent, env: Env) {
    const apps = await env.DB.prepare(
      `SELECT track_id FROM app_registry
       WHERE enriched_at IS NULL LIMIT 50`
    ).all();

    for (const app of apps.results) {
      try {
        await enrichApp(env.DB, app.track_id);
      } catch (err) {
        console.error(`Failed: ${app.track_id}`, err);
      }
    }
  },
};

This approach has several fatal flaws:

  1. Fixed batch size: 50 apps per tick regardless of whether Apple is responding or blocking. When Apple is happy, you waste capacity. When Apple is angry, you burn through failed requests.

  2. Fixed interval: Cron triggers on Cloudflare Workers have a minimum granularity of 1 minute. You can only have 3 cron triggers per Worker. You cannot adjust the interval programmatically.

  3. No memory: Each cron invocation is stateless. It does not know that the last 10 invocations all returned 403s. It does not know that it successfully processed 200 apps in the last hour without a single failure.

  4. No cooldown: When you hit a rate limit wall, the cron keeps hammering. The 403s keep coming. You are actively antagonizing the target API.

  5. No parallelism control: Serial processing wastes time. Fully parallel processing overloads the target. There is no middle ground.

What changes if you get this right:


The Adaptive Controller Pattern

The adaptive controller is a Durable Object that acts as both scheduler and executor. It owns its own timing, measures its own results, and adjusts its own parameters. This is fundamentally different from systems where scheduling and execution are separate concerns.

interface AdaptiveController {
  // Scheduling — the DO schedules itself
  alarm(): Promise<void>;

  // Execution — the DO does the work
  processBatch(items: string[]): Promise<BatchResult>;

  // Adaptation — the DO adjusts its own parameters
  adapt(result: BatchResult): void;

  // Persistence — the DO remembers across restarts
  persist(): Promise<void>;
}

interface BatchResult {
  dispatched: number;
  succeeded: number;
  failed: number;
  errors: ErrorSignal[];
}

type ErrorSignal =
  | { type: "rate_limit"; status: 403 | 429 }
  | { type: "server_error"; status: 500 | 502 | 503 }
  | { type: "timeout" }
  | { type: "parse_error" }
  | { type: "not_found"; status: 404 };

The key insight is that all three responsibilities — scheduling, execution, and adaptation — live in the same object, sharing the same state, running in the same process. There is no coordination overhead, no race condition between the scheduler deciding “run now” and the executor deciding “I’m overwhelmed.”

Key insight: The adaptive controller is a control system in the classical sense. It has a process variable (success rate), a setpoint (target throughput), and a controller (the adapt() function) that adjusts the manipulated variables (batch size, interval) to keep the process variable near the setpoint.

Durable Object Alarms as a Scheduling Primitive

A Durable Object alarm is a single scheduled wake-up call. You set it with ctx.storage.setAlarm(timestamp), and at that time, the runtime calls your alarm() method. Each DO can have exactly one alarm at a time.

interface AlarmCapabilities {
  // Set a single alarm — millisecond granularity
  setAlarm(scheduledTime: number | Date): Promise<void>;

  // Get the currently scheduled alarm time (or null)
  getAlarm(): Promise<number | null>;

  // Cancel the current alarm
  deleteAlarm(): Promise<void>;
}

Why alarms are superior to crons for adaptive scheduling:

PropertyCron TriggersDO Alarms
Granularity1 minute minimumMillisecond precision
Per-Worker limit3 triggers maxUnlimited DOs, each with 1 alarm
Programmable intervalNo — configured in dashboard/wranglerYes — set dynamically in code
State between invocationsNone — statelessFull — DO storage persists
Retry on failureNo built-in retryAutomatic exponential backoff retry
Per-entity isolationShared across all entitiesOne DO per entity, each with own alarm
Location controlRuns wherever Worker is deployedLocation hints pin to specific colo

The alarm’s at-least-once guarantee means your controller will always wake up, even after infrastructure maintenance or failures. The automatic retry with exponential backoff (starting at 2 seconds, up to 6 retries) means transient failures in your alarm() handler don’t break the scheduling loop.

Key insight: A DO alarm is not a timer. It is a durable scheduling primitive with persistence guarantees. Setting an alarm is making a contract with the runtime: “Wake me up at this time, and keep trying until my handler succeeds.” This is fundamentally different from setTimeout or cron, which are fire-and-forget.

The Feedback Loop

The feedback loop is the core mechanism that makes the controller adaptive. Every alarm tick follows the same cycle:

+------------------------------------------------------+
|                                                      |
|  +---------+    +---------+    +---------+           |
|  |  WAKE   |--->| EXECUTE |--->| MEASURE |           |
|  | (alarm) |    | (batch) |    | (stats) |           |
|  +---------+    +---------+    +----v----+           |
|       ^                             |                |
|       |         +---------+    +----v----+           |
|       |         |SCHEDULE |<---|  ADAPT  |           |
|       +---------| (alarm) |    | (tune)  |           |
|                 +---------+    +---------+           |
|                                                      |
+------------------------------------------------------+

Each phase:

  1. WAKE: The alarm fires. The DO wakes from hibernation (zero cost while sleeping).
  2. EXECUTE: Query the database for work items. Process them in parallel batches.
  3. MEASURE: Record successes and failures in a rolling metrics window.
  4. ADAPT: Calculate the success rate over the window. Adjust batch size and interval.
  5. SCHEDULE: Set the next alarm at the adapted interval.

The loop is self-perpetuating. Once started, it runs indefinitely until explicitly stopped. Each iteration learns from the previous one.

// The complete feedback loop in pseudocode
async alarm(): Promise<void> {
  if (!this.state.running) return;

  // 1. Check cooldown
  if (Date.now() < this.state.cooldown_until) {
    await this.scheduleNext();
    return;
  }

  // 2. Execute
  const items = await this.fetchWorkItems(this.state.current_batch_size);
  const result = await this.processBatch(items);

  // 3. Measure
  this.recordMetrics(result);

  // 4. Adapt
  this.adapt();

  // 5. Schedule
  await this.persist();
  await this.scheduleNext();
}

State Shape

The controller’s state is a single JSON object persisted to DO storage. It captures everything needed to resume after hibernation, restart, or migration.

interface ControllerState {
  // Lifetime counters
  total_dispatched: number;
  total_completed: number;
  total_failed: number;
  last_tick_at: number;

  // Control flag
  running: boolean;

  // Adaptive parameters — these are the tuning knobs
  current_batch_size: number;   // How many items per tick (2–50)
  current_interval_ms: number;  // Time between ticks (10s–120s)
  cooldown_until: number;       // Timestamp: skip execution until this time

  // Rolling metrics for success rate calculation
  // Keyed by minute-aligned timestamp for easy pruning
  metrics: Record<number, {
    dispatched: number;
    completed: number;
    failed: number;
  }>;
}

The default state starts conservative — small batch, moderate interval — and lets the controller ramp up as it proves the API is responsive:

const DEFAULT_STATE: ControllerState = {
  total_dispatched: 0,
  total_completed: 0,
  total_failed: 0,
  last_tick_at: 0,
  running: true,
  current_batch_size: 5,        // Start small
  current_interval_ms: 30_000,  // Every 30 seconds
  cooldown_until: 0,
  metrics: {},
};

Key insight: The state is a single put/get on DO storage, not a set of scattered keys. This makes persistence atomic — you never have a partially-updated state. The metrics window is embedded in the state object, pruned on every persist, keeping the storage footprint bounded.


Pattern 1: The Self-Driving Singleton

The controller is a singleton Durable Object — one instance per crawl target, addressed by a stable name. It drives itself: the constructor starts the alarm loop, and each alarm schedules the next one. No external trigger needed after the initial creation.

When to use it: Any scenario where you need continuous, adaptive processing of a work queue against a rate-limited external API.

import { DurableObject } from "cloudflare:workers";

// Safety rails — bounds that the adaptive logic cannot exceed
const ABSOLUTE_MIN_BATCH = 2;
const ABSOLUTE_MAX_BATCH = 50;
const MIN_INTERVAL_MS = 10_000;   // Fastest: every 10 seconds
const MAX_INTERVAL_MS = 120_000;  // Slowest: every 2 minutes
const COOLDOWN_MS = 300_000;      // 5 minutes on critical failure
const METRICS_WINDOW_MINUTES = 5;
const PARALLEL_LIMIT = 8;

interface ControllerState {
  total_dispatched: number;
  total_completed: number;
  total_failed: number;
  last_tick_at: number;
  running: boolean;
  current_batch_size: number;
  current_interval_ms: number;
  cooldown_until: number;
  metrics: Record<
    number,
    { dispatched: number; completed: number; failed: number }
  >;
}

const DEFAULT_STATE: ControllerState = {
  total_dispatched: 0,
  total_completed: 0,
  total_failed: 0,
  last_tick_at: 0,
  running: true,
  current_batch_size: 5,
  current_interval_ms: 30_000,
  cooldown_until: 0,
  metrics: {},
};

type Env = {
  DB: D1Database;
};

export class EnrichmentController extends DurableObject<Env> {
  private state: ControllerState = { ...DEFAULT_STATE };

  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);

    // blockConcurrencyWhile ensures no requests arrive
    // until we finish loading state from storage
    ctx.blockConcurrencyWhile(async () => {
      const saved = await ctx.storage.get<ControllerState>("state");
      if (saved) {
        this.state = { ...DEFAULT_STATE, ...saved };
      }
      this.pruneMetrics();

      // Self-start: if the controller should be running
      // and there is no alarm set, set one
      if (this.state.running) {
        const existing = await ctx.storage.getAlarm();
        if (!existing) {
          await ctx.storage.setAlarm(
            Date.now() + this.state.current_interval_ms
          );
        }
      }
    });
  }

  override async alarm(): Promise<void> {
    if (!this.state.running) return;

    try {
      // Cooldown check — skip execution, just reschedule
      if (Date.now() < this.state.cooldown_until) {
        const remaining = Math.round(
          (this.state.cooldown_until - Date.now()) / 1000
        );
        console.log(
          `[controller] In cooldown, ${remaining}s remaining`
        );
        await this.scheduleNext();
        return;
      }

      // Fetch work items from the database
      const batchSize = this.state.current_batch_size;
      const toProcess = await this.env.DB.prepare(
        `SELECT id FROM work_queue
         WHERE processed_at IS NULL
         ORDER BY priority DESC, created_at ASC
         LIMIT ?`
      )
        .bind(batchSize)
        .all<{ id: string }>();

      if (toProcess.results.length === 0) {
        console.log(`[controller] No items to process`);
        await this.scheduleNext();
        return;
      }

      const ids = toProcess.results.map((r) => r.id);
      this.state.total_dispatched += ids.length;
      this.recordMetric("dispatched", ids.length);

      // Process in parallel batches
      const { succeeded, failed } = await this.processBatch(ids);

      // Record results
      this.state.total_completed += succeeded;
      this.state.total_failed += failed;
      this.state.last_tick_at = Date.now();
      this.recordMetric("completed", succeeded);
      this.recordMetric("failed", failed);

      // Adapt based on results
      this.adapt();

      console.log(
        `[controller] ${succeeded}/${ids.length} ok, ` +
          `${failed} failed | ` +
          `batch=${this.state.current_batch_size} ` +
          `interval=${(this.state.current_interval_ms / 1000).toFixed(0)}s`
      );
    } catch (err) {
      console.error(`[controller] Tick failed:`, err);
    }

    await this.persist();
    await this.scheduleNext();
  }

  private async processBatch(
    ids: string[]
  ): Promise<{ succeeded: number; failed: number }> {
    let succeeded = 0;
    let failed = 0;

    // Process in chunks of PARALLEL_LIMIT
    for (let i = 0; i < ids.length; i += PARALLEL_LIMIT) {
      const chunk = ids.slice(i, i + PARALLEL_LIMIT);
      await Promise.allSettled(
        chunk.map(async (id) => {
          try {
            await this.processItem(id);
            succeeded++;
          } catch {
            failed++;
          }
        })
      );
      // Brief pause between parallel chunks
      if (i + PARALLEL_LIMIT < ids.length) {
        await new Promise((r) => setTimeout(r, 200));
      }
    }

    return { succeeded, failed };
  }

  private async processItem(id: string): Promise<void> {
    // Your actual processing logic here
    // This is where you call the external API
    throw new Error("Implement me");
  }

  private adapt(): void {
    // See Pattern 3 for the full adaptive logic
  }

  private async scheduleNext(): Promise<void> {
    if (this.state.running) {
      await this.ctx.storage.setAlarm(
        Date.now() + this.state.current_interval_ms
      );
    }
  }

  private recordMetric(
    key: "dispatched" | "completed" | "failed",
    count: number
  ): void {
    const minuteTs = Math.floor(Date.now() / 60_000) * 60_000;
    if (!this.state.metrics[minuteTs]) {
      this.state.metrics[minuteTs] = {
        dispatched: 0,
        completed: 0,
        failed: 0,
      };
    }
    this.state.metrics[minuteTs][key] += count;
  }

  private pruneMetrics(): void {
    const cutoff =
      Date.now() - (METRICS_WINDOW_MINUTES + 1) * 60_000;
    for (const ts of Object.keys(this.state.metrics)) {
      if (Number(ts) < cutoff) {
        delete this.state.metrics[Number(ts)];
      }
    }
  }

  private async persist(): Promise<void> {
    this.pruneMetrics();
    await this.ctx.storage.put("state", this.state);
  }
}

Gotchas:

Connection to other patterns: This is the skeleton. Patterns 2-7 flesh out each method.


Pattern 2: Rolling Window Metrics

The controller needs to know its recent success rate to make adaptation decisions. A lifetime success rate is too slow to react — if you processed 10,000 items successfully and then hit a rate limit wall, your lifetime rate barely dips. You need a rolling window.

When to use it: Any adaptive system that needs to respond to recent conditions, not historical averages.

const METRICS_WINDOW_MINUTES = 5;

// Metrics are bucketed by minute for easy pruning.
// Each minute bucket accumulates dispatched/completed/failed counts.
type MetricsBucket = {
  dispatched: number;
  completed: number;
  failed: number;
};

// Stored in the controller state, keyed by minute-aligned timestamp
type MetricsStore = Record<number, MetricsBucket>;

function recordMetric(
  metrics: MetricsStore,
  key: keyof MetricsBucket,
  count: number
): void {
  // Align to minute boundary
  const minuteTs = Math.floor(Date.now() / 60_000) * 60_000;
  if (!metrics[minuteTs]) {
    metrics[minuteTs] = { dispatched: 0, completed: 0, failed: 0 };
  }
  metrics[minuteTs][key] += count;
}

function successRate(metrics: MetricsStore): number {
  const windowStart =
    Date.now() - METRICS_WINDOW_MINUTES * 60_000;
  let completed = 0;
  let failed = 0;

  for (const [ts, bucket] of Object.entries(metrics)) {
    if (Number(ts) >= windowStart) {
      completed += bucket.completed;
      failed += bucket.failed;
    }
  }

  const total = completed + failed;
  // No data yet → assume success (optimistic start)
  return total === 0 ? 1 : completed / total;
}

function recentTotal(metrics: MetricsStore): number {
  const windowStart =
    Date.now() - METRICS_WINDOW_MINUTES * 60_000;
  let total = 0;

  for (const [ts, bucket] of Object.entries(metrics)) {
    if (Number(ts) >= windowStart) {
      total += bucket.completed + bucket.failed;
    }
  }

  return total;
}

function completionsPerMinute(metrics: MetricsStore): number {
  const windowStart =
    Date.now() - METRICS_WINDOW_MINUTES * 60_000;
  let total = 0;

  for (const [ts, bucket] of Object.entries(metrics)) {
    if (Number(ts) >= windowStart) {
      total += bucket.completed;
    }
  }

  return Math.round(total / METRICS_WINDOW_MINUTES);
}

function pruneMetrics(metrics: MetricsStore): void {
  const cutoff =
    Date.now() - (METRICS_WINDOW_MINUTES + 1) * 60_000;
  for (const ts of Object.keys(metrics)) {
    if (Number(ts) < cutoff) {
      delete metrics[Number(ts)];
    }
  }
}

Why minute-aligned buckets?

The alternative is per-tick metrics (one entry per alarm invocation). The problem: alarm intervals vary from 10 seconds to 2 minutes. A 5-minute window at 10-second intervals is 30 entries. At 2-minute intervals, it is 2-3 entries. Minute-aligned buckets give a consistent window size (always 5 buckets for a 5-minute window) regardless of the alarm interval.

Why prune on every persist?

DO storage has no automatic TTL. Without pruning, the metrics object grows indefinitely. Since the entire state is a single put call, keeping old buckets inflates every write. Pruning to WINDOW + 1 minutes keeps the storage footprint constant.

Key insight: The optimistic default (return total === 0 ? 1 : ...) matters. When the controller first starts or resumes after a long pause, it has no metrics. Returning 1.0 (100% success) means it uses its current batch size and interval without unnecessarily throttling. The first real failures will immediately adjust the parameters.

Edge cases:


Pattern 3: Tiered Adaptive Response

The adaptation logic uses the rolling success rate to make proportional adjustments. Not all failures are equal — a temporary blip should cause a gentle slowdown, while a sustained block demands aggressive retreat.

When to use it: When you need more nuance than binary “back off / speed up.”

// Thresholds define four zones of operation
const SUCCESS_RATE_CRITICAL = 0.2;  // <20% → emergency cooldown
const SUCCESS_RATE_LOW = 0.5;       // <50% → significant backoff
const SUCCESS_RATE_GOOD = 0.8;      // >80% → gentle ramp up
const SUCCESS_RATE_GREAT = 0.95;    // >95% → aggressive ramp up

// Adjustment factors
const RAMP_UP_AGGRESSIVE = 1.25;    // +25% batch size
const RAMP_UP_GENTLE = 1.1;        // +10% batch size
const RAMP_DOWN_FACTOR = 0.5;      // -50% batch size
const INTERVAL_SPEEDUP_FAST = 0.8; // 20% faster
const INTERVAL_SPEEDUP_SLOW = 0.95; // 5% faster
const INTERVAL_SLOWDOWN = 1.5;     // 50% slower

function adapt(state: ControllerState): void {
  const sr = successRate(state.metrics);
  const recent = recentTotal(state.metrics);

  // Don't adapt until we have enough data points
  // (prevents overreaction on the first few ticks)
  if (recent < 5) return;

  if (sr < SUCCESS_RATE_CRITICAL) {
    // ZONE 1: CRITICAL — enter cooldown
    // Success rate below 20% means the API is actively blocking us.
    // Stop for COOLDOWN_MS, reset to minimum parameters.
    console.log(
      `[adapt] CRITICAL: ${(sr * 100).toFixed(0)}% success → ` +
        `cooldown ${COOLDOWN_MS / 1000}s`
    );
    state.cooldown_until = Date.now() + COOLDOWN_MS;
    state.current_batch_size = ABSOLUTE_MIN_BATCH;
    state.current_interval_ms = MAX_INTERVAL_MS;

  } else if (sr < SUCCESS_RATE_LOW) {
    // ZONE 2: LOW — significant backoff
    // Success rate 20-50%. Halve the batch, slow the interval.
    // Aggressive but not panicked.
    state.current_batch_size = Math.max(
      ABSOLUTE_MIN_BATCH,
      Math.floor(state.current_batch_size * RAMP_DOWN_FACTOR)
    );
    state.current_interval_ms = Math.min(
      MAX_INTERVAL_MS,
      Math.floor(state.current_interval_ms * INTERVAL_SLOWDOWN)
    );

  } else if (sr > SUCCESS_RATE_GREAT) {
    // ZONE 4: GREAT — aggressive ramp up
    // Success rate above 95%. The API is happy. Push harder.
    state.current_batch_size = Math.min(
      ABSOLUTE_MAX_BATCH,
      Math.ceil(state.current_batch_size * RAMP_UP_AGGRESSIVE)
    );
    state.current_interval_ms = Math.max(
      MIN_INTERVAL_MS,
      Math.floor(state.current_interval_ms * INTERVAL_SPEEDUP_FAST)
    );

  } else if (sr > SUCCESS_RATE_GOOD) {
    // ZONE 3: GOOD — gentle ramp up
    // Success rate 80-95%. Cautiously increase.
    state.current_batch_size = Math.min(
      ABSOLUTE_MAX_BATCH,
      Math.ceil(state.current_batch_size * RAMP_UP_GENTLE)
    );
    state.current_interval_ms = Math.max(
      MIN_INTERVAL_MS,
      Math.floor(state.current_interval_ms * INTERVAL_SPEEDUP_SLOW)
    );
  }
  // DEAD ZONE: 50-80% success rate → no change
  // This provides hysteresis — the system needs to clearly
  // improve or worsen before parameters change.
}

The four zones visualized:

Success Rate:
  0%        20%         50%         80%        95%       100%
  |━━━━━━━━━|━━━━━━━━━━━|━━━━━━━━━━━|━━━━━━━━━━|━━━━━━━━━|
  │CRITICAL │   LOW     │ DEAD ZONE │  GOOD    │  GREAT  │
  │cooldown │ -50% batch│ no change │+10% batch│+25% batch│
  │min batch│ +50% intv │           │-5% intv  │-20% intv │
  │max intv │           │           │          │          │

Why the dead zone matters:

Without a dead zone (50-80%), the controller oscillates. Imagine a success rate of 78%: it ramps up, which pushes it to 82%, which ramps up more, pushing it back to 75%, which backs off, pushing it to 85%, etc. The dead zone provides hysteresis — the system must clearly enter a new zone before parameters change.

Why Math.ceil for ramp-up and Math.floor for ramp-down?

Ceiling ensures forward progress. Math.ceil(5 * 1.1) = 6 — you actually increase by 1. Math.floor(5 * 1.1) = 5 — you stay flat. For ramp-down, flooring ensures you actually decrease: Math.floor(6 * 0.5) = 3.

The minimum data check (recent < 5):

This prevents a single failed request from triggering a critical cooldown. The controller waits until it has at least 5 data points in the rolling window before making adaptation decisions. This is especially important at startup and after cooldown recovery.

Key insight: The adaptation is asymmetric by design. Ramp-down is faster than ramp-up (50% reduction vs 25% increase). This is intentional. Getting blocked is expensive — it poisons your IP reputation, wastes compute, and may trigger longer-term bans. Getting to maximum throughput a few ticks later is cheap. Always be faster to retreat than to advance.


Pattern 4: Parallel Batch Processing

Within a single alarm tick, the controller processes its batch items in parallel — but with controlled concurrency. You do not want to fire 50 concurrent requests at Apple simultaneously. You also do not want to process them one at a time.

When to use it: When each work item involves one or more external API calls and you need to maximize throughput within CPU time limits.

const ENRICHMENT_PARALLEL = 8; // Max concurrent items per chunk

async function enrichBatch(
  db: D1Database,
  trackIds: number[],
  htmlCache?: R2Bucket
): Promise<{ enriched: number; failed: number }> {
  // Dynamic import keeps the DO class lightweight.
  // The enrichment module is only loaded when work exists.
  const { default: enrichSingleApp } = await import(
    "./enrich-single"
  );

  let enriched = 0;
  let failed = 0;

  // Process in chunks of ENRICHMENT_PARALLEL
  for (let i = 0; i < trackIds.length; i += ENRICHMENT_PARALLEL) {
    const chunk = trackIds.slice(i, i + ENRICHMENT_PARALLEL);

    // Promise.allSettled — never throws, always returns all results.
    // A single failed enrichment must not abort the entire batch.
    await Promise.allSettled(
      chunk.map(async (trackId) => {
        try {
          await enrichSingleApp(db, trackId, htmlCache);
          enriched++;
        } catch (err: unknown) {
          failed++;
          const msg =
            err instanceof Error ? err.message : String(err);
          const isRateLimit =
            msg.includes("403") || msg.includes("429");

          if (!isRateLimit) {
            // Permanent failure — mark as failed so we don't retry
            await db
              .prepare(
                `UPDATE app_registry
                 SET enriched_at = datetime('now'),
                     enrichment_source = 'failed'
                 WHERE track_id = ?`
              )
              .bind(trackId)
              .run()
              .catch(() => {});
          }
          // Rate limit failures: DON'T mark as failed.
          // The item stays in the queue for retry on a future tick,
          // after the controller has backed off.
        }
      })
    );

    // Brief pause between chunks — gives the target API
    // a micro-breather and prevents request bunching
    if (i + ENRICHMENT_PARALLEL < trackIds.length) {
      await new Promise((r) => setTimeout(r, 200));
    }
  }

  return { enriched, failed };
}

Why Promise.allSettled instead of Promise.all?

Promise.all rejects on the first failure, abandoning all in-flight work. In a batch of 8 enrichments, if item 3 fails, you lose the results of items 4-8 that might have succeeded. Promise.allSettled always resolves with all results, letting you count successes and failures accurately.

Why the 200ms inter-chunk pause?

Without it, chunks fire back-to-back. If PARALLEL_LIMIT is 8 and the batch size is 50, that is 7 chunks in rapid succession — essentially 50 near-simultaneous requests. The 200ms pause spreads the load and gives the target API time to process each chunk before the next arrives.

Why dynamic import?

The enrichment module imports cheerio (for HTML parsing), zod (for validation), and other dependencies. Loading these on every alarm tick — even when there is no work — wastes CPU time. The dynamic import means the DO class itself is lightweight, loading the heavy modules only when actual work exists.

Error classification matters:

The code distinguishes between rate-limit errors (403/429) and other errors. This distinction drives two different behaviors:

Key insight: Each enrichment call to the Apple App Store actually makes 4 concurrent sub-requests: iTunes API lookup, ratings scrape, page scrape, and IAP scrape. With 8 items in a chunk, that is 32 concurrent outbound fetches. This is why the parallel limit matters — it is not 8 requests, it is 32. Know your multiplication factor.


Pattern 5: Cooldown and Recovery

When the success rate drops below the critical threshold (20%), the controller enters cooldown — a period of complete inactivity that lets the target API’s rate limiting state reset.

When to use it: When sustained failure indicates you have been blocked, not just rate-limited. The difference: a 429 means “slow down.” Sustained 403s mean “go away for a while.”

const COOLDOWN_MS = 300_000; // 5 minutes

function enterCooldown(state: ControllerState): void {
  state.cooldown_until = Date.now() + COOLDOWN_MS;
  state.current_batch_size = ABSOLUTE_MIN_BATCH;
  state.current_interval_ms = MAX_INTERVAL_MS;
}

function handleCooldown(state: ControllerState): boolean {
  if (Date.now() < state.cooldown_until) {
    const remaining = Math.round(
      (state.cooldown_until - Date.now()) / 1000
    );
    console.log(`[controller] In cooldown, ${remaining}s remaining`);
    return true; // Skip this tick
  }
  return false; // Cooldown expired, proceed
}

// Recovery after cooldown is gentle.
// The controller resumes with:
//   - ABSOLUTE_MIN_BATCH (2 items)
//   - MAX_INTERVAL_MS (120 seconds)
//   - Empty metrics window (all old metrics expired during cooldown)
//
// This means the first few ticks process 2 items every 2 minutes.
// If those succeed, the adapter gradually ramps back up.
// Full recovery from cooldown to maximum throughput typically takes
// 5-10 ticks (5-20 minutes depending on how fast the adapter ramps).

The recovery trajectory:

Tick 0: cooldown ends → batch=2, interval=120s
Tick 1: 2/2 succeed (100%) → adapt: batch=3, interval=96s
Tick 2: 3/3 succeed (100%) → adapt: batch=4, interval=77s
Tick 3: 4/4 succeed (100%) → adapt: batch=5, interval=62s
Tick 4: 5/5 succeed (100%) → adapt: batch=7, interval=49s
Tick 5: 7/7 succeed (100%) → adapt: batch=9, interval=39s
Tick 6: 9/9 succeed (100%) → adapt: batch=12, interval=32s
...
Tick 12: ~40 items, ~12s interval → near maximum throughput

This trajectory shows exponential recovery. After 12 successful ticks (about 6 minutes of processing time plus cooldown), the controller is back near full capacity. But if any tick during recovery shows failures, the adaptation immediately responds, preventing a second cooldown.

Multiple consecutive cooldowns:

If the controller exits cooldown, processes 2 items, and both fail, the success rate immediately drops below 20% and triggers another cooldown. This is correct behavior — the API has not recovered. The controller will keep cycling through cooldowns until the API becomes responsive again.

Edge case — clock skew:

cooldown_until is an absolute timestamp. If the DO migrates to a different machine with a slightly different clock, or if Date.now() drifts, the cooldown duration could be slightly longer or shorter. A few seconds of drift is irrelevant for a 5-minute cooldown. If you need precision, store cooldown_started_at and cooldown_duration_ms separately and compute the end time fresh each tick.

Key insight: Cooldown is not the same as stopping. During cooldown, the alarm loop continues running — it just skips the execution phase. This is important because the DO stays warm (fast response to status queries) and the alarm chain is never broken (no need for an external “wake-up” after cooldown).


Pattern 6: Location-Optimized Singleton

Cloudflare Workers run in 300+ data centers worldwide. When your DO makes requests to a specific API, latency depends on which data center the DO is running in. Location hints let you pin a DO to a colo close to the target API.

When to use it: When your DO primarily communicates with a specific external service rather than with end users.

type Env = {
  DB: D1Database;
  HTML_CACHE: R2Bucket;
  ENRICHMENT_CONTROLLER: DurableObjectNamespace<EnrichmentController>;
};

export default {
  async fetch(
    request: Request,
    env: Env
  ): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname.startsWith("/enrichment/")) {
      // Get the singleton with a location hint.
      // 'enam' = Eastern North America — close to Apple's
      // infrastructure in Cupertino/Reno.
      const id = env.ENRICHMENT_CONTROLLER.idFromName("main");
      const stub = env.ENRICHMENT_CONTROLLER.get(id, {
        locationHint: "enam",
      });

      // Strip the /enrichment prefix and forward
      const doUrl = new URL(request.url);
      doUrl.pathname = doUrl.pathname.replace(
        "/enrichment",
        ""
      );
      return stub.fetch(doUrl.toString(), request);
    }

    return new Response("Not found", { status: 404 });
  },
};

Available location hints:

HintRegionGood for
wnamWestern North AmericaUS West APIs (Google, Meta)
enamEastern North AmericaUS East APIs (Apple, AWS)
samSouth AmericaNote: DOs actually spawn in enam
weurWestern EuropeEU APIs
eeurEastern EuropeEU East APIs
apacAsia PacificAsian APIs
ocOceaniaAU/NZ APIs
afrAfricaNote: may spawn in weur
meMiddle EastNote: may spawn in weur or apac

Important caveats from the Cloudflare docs:

  1. Location hints are best effort, not guaranteed. The DO will spawn in a nearby data center that supports DOs, which may not be the exact colo you hinted.

  2. Only the first get() call for a particular DO respects the hint. Once created, the DO stays where it is. Subsequent calls with different hints are ignored.

  3. Some regions (South America, Africa, Middle East) do not have DO-capable data centers. DOs hinted to these regions spawn in the nearest supported region.

Why enam for Apple:

Apple’s infrastructure is concentrated in the United States. The iTunes API endpoints (itunes.apple.com) and App Store web pages (apps.apple.com) resolve to US-based servers. A DO running in enam has 10-30ms latency to Apple. A DO running in apac has 150-200ms latency. When you are making 4 API calls per item across 8 parallel items, that latency difference adds up to seconds per tick.

Our production experience confirmed this: requests from certain European and Asian Cloudflare colos triggered 403s more frequently than US East colos. Apple’s bot detection appears to have different thresholds based on source geography.

Key insight: For a crawler DO, location hints should optimize for the target API, not for the user querying the status endpoint. A status query from Europe hitting a DO in enam adds 100ms of latency — negligible for an admin dashboard. But the DO’s latency to Apple affects every single enrichment request, thousands per day.


Pattern 7: Operational Control Plane

The controller needs a management API for operators: check status, start/stop the loop, reset parameters, view metrics. This is exposed via the DO’s fetch() handler.

When to use it: Any long-running DO that humans need to inspect and control.

export class EnrichmentController extends DurableObject<Env> {
  // ... (state, constructor, alarm from Pattern 1)

  override async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const path = url.pathname;

    // ── Status endpoint ─────────────────────────────
    if (request.method === "GET" && path === "/status") {
      const sr = this.successRate();
      const rate = this.completionsPerMinute();
      return Response.json({
        running: this.state.running,
        total_dispatched: this.state.total_dispatched,
        total_completed: this.state.total_completed,
        total_failed: this.state.total_failed,
        completions_per_minute: rate,
        success_rate_pct: Math.round(sr * 100),
        current_batch_size: this.state.current_batch_size,
        current_interval_ms: this.state.current_interval_ms,
        in_cooldown: Date.now() < this.state.cooldown_until,
        cooldown_remaining_s: Math.max(
          0,
          Math.round(
            (this.state.cooldown_until - Date.now()) / 1000
          )
        ),
        last_tick_at: this.state.last_tick_at
          ? new Date(this.state.last_tick_at).toISOString()
          : null,
        projected_per_day: rate * 60 * 24,
      });
    }

    // ── Start the loop ──────────────────────────────
    if (request.method === "POST" && path === "/start") {
      this.state.running = true;
      this.state.cooldown_until = 0;
      await this.persist();
      await this.ctx.storage.setAlarm(Date.now() + 1000);
      return Response.json({ ok: true, running: true });
    }

    // ── Stop the loop ───────────────────────────────
    if (request.method === "POST" && path === "/stop") {
      this.state.running = false;
      await this.persist();
      // Don't delete the alarm — it will fire and see
      // running=false, then not reschedule. Cleaner than
      // trying to cancel mid-flight.
      return Response.json({ ok: true, running: false });
    }

    // ── Reset to defaults ───────────────────────────
    if (request.method === "POST" && path === "/reset") {
      this.state = { ...DEFAULT_STATE };
      await this.persist();
      await this.ctx.storage.setAlarm(Date.now() + 1000);
      return Response.json({ ok: true, message: "Reset to defaults" });
    }

    // ── Manual parameter override ───────────────────
    if (request.method === "POST" && path === "/tune") {
      const body = (await request.json()) as Partial<{
        batch_size: number;
        interval_ms: number;
      }>;

      if (body.batch_size !== undefined) {
        this.state.current_batch_size = Math.max(
          ABSOLUTE_MIN_BATCH,
          Math.min(ABSOLUTE_MAX_BATCH, body.batch_size)
        );
      }
      if (body.interval_ms !== undefined) {
        this.state.current_interval_ms = Math.max(
          MIN_INTERVAL_MS,
          Math.min(MAX_INTERVAL_MS, body.interval_ms)
        );
      }

      await this.persist();
      return Response.json({
        ok: true,
        current_batch_size: this.state.current_batch_size,
        current_interval_ms: this.state.current_interval_ms,
      });
    }

    return new Response("Not found", { status: 404 });
  }
}

Usage from curl:

curl https://my-worker.workers.dev/enrichment/status | jq


curl -X POST https://my-worker.workers.dev/enrichment/stop

curl -X POST https://my-worker.workers.dev/enrichment/start

curl -X POST https://my-worker.workers.dev/enrichment/tune \
  -H "Content-Type: application/json" \
  -d '{"batch_size": 5, "interval_ms": 60000}'

curl -X POST https://my-worker.workers.dev/enrichment/reset

The projected_per_day metric:

This is the most useful number for operators. It takes the current completions-per-minute rate and extrapolates to a daily total. When the controller is running well, you might see projected_per_day: 25920 (18/min * 60 * 24). When it is in cooldown, this drops to 0. This single number tells you whether the system is healthy.

Key insight: The /stop endpoint does not delete the alarm. It sets running = false and persists. The next alarm fires, sees running = false, and exits without rescheduling. This is cleaner and safer than deleteAlarm() because it handles the race condition where the alarm fires between your stop and deleteAlarm calls.


Example 1: Minimal Adaptive Alarm

The simplest possible adaptive alarm loop — just the scheduling mechanics with no business logic.

import { DurableObject } from "cloudflare:workers";

export class MinimalAdaptiveLoop extends DurableObject {
  private interval = 30_000; // Start at 30s
  private successCount = 0;
  private failCount = 0;

  constructor(ctx: DurableObjectState, env: unknown) {
    super(ctx, env);
    ctx.blockConcurrencyWhile(async () => {
      this.interval =
        (await ctx.storage.get<number>("interval")) ?? 30_000;
      const alarm = await ctx.storage.getAlarm();
      if (!alarm) {
        await ctx.storage.setAlarm(Date.now() + this.interval);
      }
    });
  }

  override async alarm(): Promise<void> {
    const ok = Math.random() > 0.3; // Simulated work
    if (ok) {
      this.successCount++;
      this.interval = Math.max(10_000, this.interval * 0.9);
    } else {
      this.failCount++;
      this.interval = Math.min(120_000, this.interval * 1.5);
    }
    await this.ctx.storage.put("interval", this.interval);
    await this.ctx.storage.setAlarm(Date.now() + this.interval);
  }
}

Example 2: Success Rate Calculation

Isolated success rate function with proper edge case handling.

type Metrics = Record<
  number,
  { completed: number; failed: number }
>;

function calculateSuccessRate(
  metrics: Metrics,
  windowMinutes: number
): { rate: number; sampleSize: number; confidence: string } {
  const windowStart = Date.now() - windowMinutes * 60_000;
  let completed = 0;
  let failed = 0;

  for (const [ts, m] of Object.entries(metrics)) {
    if (Number(ts) >= windowStart) {
      completed += m.completed;
      failed += m.failed;
    }
  }

  const total = completed + failed;
  const rate = total === 0 ? 1 : completed / total;
  const confidence =
    total === 0
      ? "none"
      : total < 5
        ? "low"
        : total < 20
          ? "medium"
          : "high";

  return { rate, sampleSize: total, confidence };
}

Example 3: Batch Size Adaptation

Pure function that computes the new batch size given current state.

function adaptBatchSize(
  currentBatch: number,
  successRate: number,
  min: number,
  max: number
): { newBatch: number; action: string } {
  if (successRate < 0.2) {
    return { newBatch: min, action: "critical_reset" };
  }
  if (successRate < 0.5) {
    return {
      newBatch: Math.max(min, Math.floor(currentBatch * 0.5)),
      action: "halve",
    };
  }
  if (successRate > 0.95) {
    return {
      newBatch: Math.min(max, Math.ceil(currentBatch * 1.25)),
      action: "aggressive_ramp",
    };
  }
  if (successRate > 0.8) {
    return {
      newBatch: Math.min(max, Math.ceil(currentBatch * 1.1)),
      action: "gentle_ramp",
    };
  }
  return { newBatch: currentBatch, action: "hold" };
}

Example 4: Cooldown Gate

A reusable cooldown checker that can be composed into any alarm handler.

class CooldownGate {
  private cooldownUntil = 0;
  private readonly duration: number;

  constructor(durationMs: number) {
    this.duration = durationMs;
  }

  trigger(): void {
    this.cooldownUntil = Date.now() + this.duration;
  }

  isActive(): boolean {
    return Date.now() < this.cooldownUntil;
  }

  remainingMs(): number {
    return Math.max(0, this.cooldownUntil - Date.now());
  }

  remainingSeconds(): number {
    return Math.round(this.remainingMs() / 1000);
  }

  toJSON(): { cooldown_until: number } {
    return { cooldown_until: this.cooldownUntil };
  }

  static fromJSON(data: { cooldown_until: number }, durationMs: number): CooldownGate {
    const gate = new CooldownGate(durationMs);
    gate.cooldownUntil = data.cooldown_until;
    return gate;
  }
}

Example 5: Parallel Chunked Execution

Generic parallel execution with controlled concurrency and inter-chunk delays.

async function processInChunks<T, R>(
  items: T[],
  processor: (item: T) => Promise<R>,
  options: {
    concurrency: number;
    delayBetweenChunksMs: number;
  }
): Promise<{ results: PromiseSettledResult<R>[] }> {
  const allResults: PromiseSettledResult<R>[] = [];

  for (let i = 0; i < items.length; i += options.concurrency) {
    const chunk = items.slice(i, i + options.concurrency);
    const chunkResults = await Promise.allSettled(
      chunk.map(processor)
    );
    allResults.push(...chunkResults);

    if (i + options.concurrency < items.length) {
      await new Promise((r) =>
        setTimeout(r, options.delayBetweenChunksMs)
      );
    }
  }

  return { results: allResults };
}

// Usage
const { results } = await processInChunks(
  trackIds,
  (id) => enrichSingleApp(db, id),
  { concurrency: 8, delayBetweenChunksMs: 200 }
);

const succeeded = results.filter(
  (r) => r.status === "fulfilled"
).length;
const failed = results.filter(
  (r) => r.status === "rejected"
).length;

Example 6: Status Dashboard Endpoint

A comprehensive status response that gives operators everything they need.

function buildStatus(state: ControllerState): object {
  const sr = calculateSuccessRate(
    state.metrics,
    METRICS_WINDOW_MINUTES
  );
  const cpm = completionsPerMinute(state.metrics);

  return {
    // Control state
    running: state.running,
    in_cooldown: Date.now() < state.cooldown_until,
    cooldown_remaining_s: Math.max(
      0,
      Math.round((state.cooldown_until - Date.now()) / 1000)
    ),

    // Current parameters
    params: {
      batch_size: state.current_batch_size,
      interval_ms: state.current_interval_ms,
      interval_human: `${(state.current_interval_ms / 1000).toFixed(0)}s`,
    },

    // Metrics
    lifetime: {
      dispatched: state.total_dispatched,
      completed: state.total_completed,
      failed: state.total_failed,
      success_rate_pct: state.total_dispatched > 0
        ? Math.round(
            (state.total_completed / state.total_dispatched) * 100
          )
        : 100,
    },
    rolling: {
      success_rate_pct: Math.round(sr.rate * 100),
      sample_size: sr.sampleSize,
      confidence: sr.confidence,
      completions_per_minute: cpm,
    },

    // Projections
    projected_per_hour: cpm * 60,
    projected_per_day: cpm * 60 * 24,

    // Timing
    last_tick_at: state.last_tick_at
      ? new Date(state.last_tick_at).toISOString()
      : null,
    uptime_s: state.last_tick_at
      ? Math.round((Date.now() - state.last_tick_at) / 1000)
      : 0,
  };
}

Example 7: Constructor State Recovery

Safe state recovery that handles schema evolution.

constructor(ctx: DurableObjectState, env: Env) {
  super(ctx, env);
  ctx.blockConcurrencyWhile(async () => {
    const saved = await ctx.storage.get<ControllerState>("state");

    if (saved) {
      // Merge with defaults — new fields get default values,
      // saved fields override defaults
      this.state = { ...DEFAULT_STATE, ...saved };

      // Migrate: if saved state has fields that changed type,
      // fix them here
      if (typeof this.state.metrics !== "object") {
        this.state.metrics = {};
      }

      // Clamp: ensure saved values are within current bounds
      // (bounds may have changed between deploys)
      this.state.current_batch_size = Math.max(
        ABSOLUTE_MIN_BATCH,
        Math.min(ABSOLUTE_MAX_BATCH, this.state.current_batch_size)
      );
      this.state.current_interval_ms = Math.max(
        MIN_INTERVAL_MS,
        Math.min(MAX_INTERVAL_MS, this.state.current_interval_ms)
      );
    }

    this.pruneMetrics();

    // Self-start if running and no alarm is set
    if (this.state.running) {
      const existing = await ctx.storage.getAlarm();
      if (!existing) {
        await ctx.storage.setAlarm(
          Date.now() + this.state.current_interval_ms
        );
      }
    }
  });
}

Example 8: Metric Pruning

Keeping the metrics store bounded to prevent unbounded storage growth.

function pruneMetrics(
  metrics: Record<number, unknown>,
  windowMinutes: number
): number {
  const cutoff = Date.now() - (windowMinutes + 1) * 60_000;
  let pruned = 0;

  for (const ts of Object.keys(metrics)) {
    if (Number(ts) < cutoff) {
      delete metrics[Number(ts)];
      pruned++;
    }
  }

  return pruned;
}

// Call on every persist() to keep storage bounded:
// With a 5-minute window, you never have more than 6-7 buckets.
// Each bucket is ~50 bytes of JSON. Total: ~350 bytes.

Example 9: Graceful Start After Deploy

When a Worker is deployed, all DOs are restarted. The constructor must handle this gracefully.

// The self-start pattern in the constructor handles deploys
// automatically. But you might want a warm-up period:

constructor(ctx: DurableObjectState, env: Env) {
  super(ctx, env);
  ctx.blockConcurrencyWhile(async () => {
    const saved = await ctx.storage.get<ControllerState>("state");
    if (saved) {
      this.state = { ...DEFAULT_STATE, ...saved };
    }

    // After a deploy, start conservatively even if the saved
    // state has aggressive parameters. The deploy may have
    // changed the enrichment logic, and the old parameters
    // might not be appropriate.
    const deployGracePeriod = 60_000; // 1 minute
    if (this.state.running) {
      const existing = await ctx.storage.getAlarm();
      if (!existing) {
        // Delay the first tick to let the deploy settle
        await ctx.storage.setAlarm(
          Date.now() + deployGracePeriod
        );
      }
    }
  });
}

Example 10: Error Classification

Classifying errors to drive different adaptation behaviors.

type ErrorClass =
  | "rate_limit"    // 403, 429 → back off
  | "server_error"  // 500, 502, 503 → back off gently
  | "not_found"     // 404 → skip item, don't count as failure
  | "parse_error"   // Invalid response → skip item
  | "timeout"       // Network timeout → back off
  | "unknown";      // Unexpected → count as failure

function classifyError(err: unknown): ErrorClass {
  if (!(err instanceof Error)) return "unknown";
  const msg = err.message.toLowerCase();

  if (msg.includes("403") || msg.includes("429")) {
    return "rate_limit";
  }
  if (msg.includes("500") || msg.includes("502") || msg.includes("503")) {
    return "server_error";
  }
  if (msg.includes("404") || msg.includes("not found")) {
    return "not_found";
  }
  if (msg.includes("timeout") || msg.includes("timed out")) {
    return "timeout";
  }
  if (msg.includes("parse") || msg.includes("json") || msg.includes("html")) {
    return "parse_error";
  }
  return "unknown";
}

// Usage in the batch processor:
// Only rate_limit and server_error count toward the failure metric.
// not_found and parse_error are item-level problems, not API problems.
function shouldCountAsApiFailure(errorClass: ErrorClass): boolean {
  return errorClass === "rate_limit" || errorClass === "server_error" || errorClass === "timeout";
}

This is the complete, production-tested EnrichmentController from a real Apple App Store crawler. It processes thousands of apps per day, adapting to Apple’s varying tolerance levels across different times of day and Cloudflare colos.

/**
 * EnrichmentController — Durable Object that owns the
 * entire enrichment pipeline.
 *
 * Self-driving, self-tuning closed loop:
 *   alarm fires → pick apps from DB → enrich in parallel
 *   → measure success → adapt batch_size + interval
 *   → schedule next alarm at adapted interval
 *
 * No cron dependency. No queue. No static timers.
 * Everything adapts to Apple's response: ramps up when
 * succeeding, backs off when blocked, enters cooldown
 * on sustained failure.
 *
 * Identity: Singleton — addressed as idFromName('main')
 * Location: Created with locationHint 'enam' to pin to
 *           US East (Apple-friendly colo)
 */

import { DurableObject } from "cloudflare:workers";

// ── Bounds (safety rails, not targets) ──────────────
const ABSOLUTE_MIN_BATCH = 2;
const ABSOLUTE_MAX_BATCH = 50;
const MIN_INTERVAL_MS = 10_000;
const MAX_INTERVAL_MS = 120_000;
const COOLDOWN_MS = 300_000;
const METRICS_WINDOW_MINUTES = 5;
const ENRICHMENT_PARALLEL = 8;

// ── Adaptive thresholds ─────────────────────────────
const SUCCESS_RATE_CRITICAL = 0.2;
const SUCCESS_RATE_LOW = 0.5;
const SUCCESS_RATE_GOOD = 0.8;
const SUCCESS_RATE_GREAT = 0.95;
const RAMP_UP_FACTOR = 1.25;
const RAMP_DOWN_FACTOR = 0.5;

// ── State ───────────────────────────────────────────
interface ControllerState {
  total_dispatched: number;
  total_completed: number;
  total_failed: number;
  last_tick_at: number;
  running: boolean;
  current_batch_size: number;
  current_interval_ms: number;
  cooldown_until: number;
  metrics: Record<
    number,
    { dispatched: number; completed: number; failed: number }
  >;
}

const DEFAULT_STATE: ControllerState = {
  total_dispatched: 0,
  total_completed: 0,
  total_failed: 0,
  last_tick_at: 0,
  running: true,
  current_batch_size: 5,
  current_interval_ms: 30_000,
  cooldown_until: 0,
  metrics: {},
};

type Env = {
  DB: D1Database;
  HTML_CACHE: R2Bucket;
};

export class EnrichmentController extends DurableObject<Env> {
  private state: ControllerState = DEFAULT_STATE;

  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env);
    ctx.blockConcurrencyWhile(async () => {
      const saved =
        await ctx.storage.get<ControllerState>("state");
      if (saved) {
        this.state = { ...DEFAULT_STATE, ...saved };
      }
      this.pruneMetrics();
      if (this.state.running) {
        const existing = await ctx.storage.getAlarm();
        if (!existing) {
          await ctx.storage.setAlarm(
            Date.now() + this.state.current_interval_ms
          );
        }
      }
    });
  }

  // ── HTTP API (status + control) ───────────────────

  override async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const path = url.pathname;

    if (request.method === "GET" && path === "/status") {
      return Response.json(this.getStatus());
    }

    if (request.method === "POST" && path === "/start") {
      this.state.running = true;
      this.state.cooldown_until = 0;
      await this.persist();
      await this.ctx.storage.setAlarm(Date.now() + 1000);
      return Response.json({ ok: true, running: true });
    }

    if (request.method === "POST" && path === "/stop") {
      this.state.running = false;
      await this.persist();
      return Response.json({ ok: true, running: false });
    }

    if (request.method === "POST" && path === "/reset") {
      this.state = { ...DEFAULT_STATE };
      await this.persist();
      await this.ctx.storage.setAlarm(Date.now() + 1000);
      return Response.json({ ok: true });
    }

    return new Response("Not found", { status: 404 });
  }

  // ── Alarm: the heartbeat ──────────────────────────

  override async alarm(): Promise<void> {
    if (!this.state.running) return;

    try {
      if (Date.now() < this.state.cooldown_until) {
        const remaining = Math.round(
          (this.state.cooldown_until - Date.now()) / 1000
        );
        console.log(
          `[enrichment-do] In cooldown, ${remaining}s remaining`
        );
        await this.scheduleNext();
        return;
      }

      const batchSize = this.state.current_batch_size;
      const toEnrich = await this.env.DB.prepare(
        `SELECT track_id FROM app_registry
         WHERE enriched_at IS NULL
           OR (enrichment_source = 'itunes'
               AND enriched_at < datetime('now', '-14 days'))
         ORDER BY enrich_priority DESC, rating_count DESC
         LIMIT ?`
      )
        .bind(batchSize)
        .all<{ track_id: number }>();

      if (toEnrich.results.length === 0) {
        console.log(`[enrichment-do] No apps to enrich`);
        await this.scheduleNext();
        return;
      }

      const trackIds = toEnrich.results.map((r) => r.track_id);
      this.state.total_dispatched += trackIds.length;
      this.recordMetric("dispatched", trackIds.length);

      const { enriched, failed } =
        await this.enrichBatch(trackIds);

      this.state.total_completed += enriched;
      this.state.total_failed += failed;
      this.state.last_tick_at = Date.now();
      this.recordMetric("completed", enriched);
      this.recordMetric("failed", failed);

      this.adapt();

      console.log(
        `[enrichment-do] ${enriched}/${trackIds.length} enriched, ` +
          `${failed} failed | ` +
          `batch=${this.state.current_batch_size} ` +
          `interval=${(this.state.current_interval_ms / 1000).toFixed(0)}s`
      );
    } catch (err) {
      console.error(`[enrichment-do] Tick failed:`, err);
    }

    await this.persist();
    await this.scheduleNext();
  }

  // ── Enrichment engine ─────────────────────────────

  private async enrichBatch(
    trackIds: number[]
  ): Promise<{ enriched: number; failed: number }> {
    const { default: enrichSingleApp } = await import(
      "./enrich-single"
    );
    let enriched = 0;
    let failed = 0;

    for (
      let i = 0;
      i < trackIds.length;
      i += ENRICHMENT_PARALLEL
    ) {
      const batch = trackIds.slice(
        i,
        i + ENRICHMENT_PARALLEL
      );
      await Promise.allSettled(
        batch.map(async (trackId) => {
          try {
            await enrichSingleApp(
              this.env.DB,
              trackId,
              this.env.HTML_CACHE
            );
            enriched++;
          } catch (err: unknown) {
            failed++;
            const msg =
              err instanceof Error ? err.message : "";
            const isRateLimit =
              msg.includes("403") || msg.includes("429");
            if (!isRateLimit) {
              await this.env.DB.prepare(
                `UPDATE app_registry
                 SET enriched_at = datetime('now'),
                     enrichment_source = 'failed'
                 WHERE track_id = ?`
              )
                .bind(trackId)
                .run()
                .catch(() => {});
            }
          }
        })
      );
      if (i + ENRICHMENT_PARALLEL < trackIds.length) {
        await new Promise((r) => setTimeout(r, 200));
      }
    }

    return { enriched, failed };
  }

  // ── Adaptive logic ────────────────────────────────

  private adapt(): void {
    const sr = this.successRate();
    const recentTotal = this.recentTotal();

    if (recentTotal < 5) return;

    if (sr < SUCCESS_RATE_CRITICAL) {
      console.log(
        `[enrichment-do] CRITICAL: ` +
          `${(sr * 100).toFixed(0)}% success → ` +
          `cooldown ${COOLDOWN_MS / 1000}s`
      );
      this.state.cooldown_until = Date.now() + COOLDOWN_MS;
      this.state.current_batch_size = ABSOLUTE_MIN_BATCH;
      this.state.current_interval_ms = MAX_INTERVAL_MS;
    } else if (sr < SUCCESS_RATE_LOW) {
      this.state.current_batch_size = Math.max(
        ABSOLUTE_MIN_BATCH,
        Math.floor(
          this.state.current_batch_size * RAMP_DOWN_FACTOR
        )
      );
      this.state.current_interval_ms = Math.min(
        MAX_INTERVAL_MS,
        Math.floor(this.state.current_interval_ms * 1.5)
      );
    } else if (sr > SUCCESS_RATE_GREAT) {
      this.state.current_batch_size = Math.min(
        ABSOLUTE_MAX_BATCH,
        Math.ceil(
          this.state.current_batch_size * RAMP_UP_FACTOR
        )
      );
      this.state.current_interval_ms = Math.max(
        MIN_INTERVAL_MS,
        Math.floor(this.state.current_interval_ms * 0.8)
      );
    } else if (sr > SUCCESS_RATE_GOOD) {
      this.state.current_batch_size = Math.min(
        ABSOLUTE_MAX_BATCH,
        Math.ceil(this.state.current_batch_size * 1.1)
      );
      this.state.current_interval_ms = Math.max(
        MIN_INTERVAL_MS,
        Math.floor(this.state.current_interval_ms * 0.95)
      );
    }
  }

  private async scheduleNext(): Promise<void> {
    if (this.state.running) {
      await this.ctx.storage.setAlarm(
        Date.now() + this.state.current_interval_ms
      );
    }
  }

  // ── Metrics ───────────────────────────────────────

  private getStatus(): object {
    const sr = this.successRate();
    const rate = this.recentCompletionsPerMinute();
    return {
      running: this.state.running,
      total_dispatched: this.state.total_dispatched,
      total_completed: this.state.total_completed,
      total_failed: this.state.total_failed,
      completions_per_minute: rate,
      success_rate: Math.round(sr * 100),
      current_batch_size: this.state.current_batch_size,
      current_interval_ms: this.state.current_interval_ms,
      in_cooldown:
        Date.now() < this.state.cooldown_until,
      cooldown_remaining_s: Math.max(
        0,
        Math.round(
          (this.state.cooldown_until - Date.now()) / 1000
        )
      ),
      last_tick_at: this.state.last_tick_at || null,
      projected_per_day: rate * 60 * 24,
    };
  }

  private successRate(): number {
    const windowStart =
      Date.now() - METRICS_WINDOW_MINUTES * 60_000;
    let completed = 0;
    let failed = 0;
    for (const [ts, m] of Object.entries(
      this.state.metrics
    )) {
      if (Number(ts) >= windowStart) {
        completed += m.completed;
        failed += m.failed;
      }
    }
    const total = completed + failed;
    return total === 0 ? 1 : completed / total;
  }

  private recentTotal(): number {
    const windowStart =
      Date.now() - METRICS_WINDOW_MINUTES * 60_000;
    let total = 0;
    for (const [ts, m] of Object.entries(
      this.state.metrics
    )) {
      if (Number(ts) >= windowStart) {
        total += m.completed + m.failed;
      }
    }
    return total;
  }

  private recentCompletionsPerMinute(): number {
    const windowStart =
      Date.now() - METRICS_WINDOW_MINUTES * 60_000;
    let total = 0;
    for (const [ts, m] of Object.entries(
      this.state.metrics
    )) {
      if (Number(ts) >= windowStart) {
        total += m.completed;
      }
    }
    return Math.round(total / METRICS_WINDOW_MINUTES);
  }

  private recordMetric(
    key: "dispatched" | "completed" | "failed",
    count: number
  ): void {
    const minuteTs =
      Math.floor(Date.now() / 60_000) * 60_000;
    if (!this.state.metrics[minuteTs]) {
      this.state.metrics[minuteTs] = {
        dispatched: 0,
        completed: 0,
        failed: 0,
      };
    }
    this.state.metrics[minuteTs][key] += count;
  }

  private pruneMetrics(): void {
    const cutoff =
      Date.now() - (METRICS_WINDOW_MINUTES + 1) * 60_000;
    for (const ts of Object.keys(this.state.metrics)) {
      if (Number(ts) < cutoff) {
        delete this.state.metrics[Number(ts)];
      }
    }
  }

  private async persist(): Promise<void> {
    this.pruneMetrics();
    await this.ctx.storage.put("state", this.state);
  }
}

The Worker entrypoint that creates the singleton with location hint:

type Env = {
  DB: D1Database;
  HTML_CACHE: R2Bucket;
  ENRICHMENT_CONTROLLER: DurableObjectNamespace<EnrichmentController>;
};

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname.startsWith("/v1/enrichment")) {
      const id = env.ENRICHMENT_CONTROLLER.idFromName("main");
      const stub = env.ENRICHMENT_CONTROLLER.get(id, {
        locationHint: "enam",
      });
      const doUrl = new URL(request.url);
      doUrl.pathname = doUrl.pathname.replace(
        "/v1/enrichment",
        ""
      );
      return stub.fetch(doUrl.toString(), request);
    }

    return new Response("Not found", { status: 404 });
  },
};

export { EnrichmentController };

DO Adaptive Controller vs. Alternative Approaches

DimensionFixed CronToken BucketQueue BackpressureDO Adaptive Controller
What it isCron trigger fires every N minutes, processes a fixed batchIn-memory counter that refills at a fixed rate, gates requestsQueue with max_concurrency and DLQ, retries with backoffSingleton DO with alarm loop, rolling metrics, adaptive parameters
Interval controlFixed (1-min minimum on CF)N/A (gates incoming, doesn’t schedule)Queue consumer pulls when readyDynamic: 10s–120s, adjusted per tick
Batch size controlFixed at deploy timeN/A (per-request gating)Controlled by max_concurrency + batch sizeDynamic: 2–50, adjusted per tick based on success rate
State persistenceNone — statelessIn-memory only (lost on restart)Queue messages persist, but consumer is statelessFull state in DO storage, survives restarts and migrations
Learning from failuresNo — same batch/interval regardlessNo — fixed refill rateLimited — retries with backoff per messageYes — rolling window success rate drives all parameters
Cooldown on sustained failureNo — keeps hammeringNo — keeps gating at same rateDLQ catches failed messages after max retriesYes — 5-min cooldown on <20% success rate
Per-entity isolationOne cron for everythingPer-API-key typicallyPer-queue, but queues are sharedPer-DO — each entity gets its own adaptive loop
Location optimizationRuns in any coloN/AConsumer runs in any coloLocation hints pin to target-API-friendly colo
ObservabilityLogs onlyMetrics if you build themQueue depth, DLQ depthFull status API: rate, batch, interval, projections
Cost while idleCron fires even with no workMemory cost for idle bucketsQueue polling costZero — DO hibernates, alarm is free until it fires
Implementation effortLowMediumMediumHigh (but reusable)
Best forSimple periodic jobsProtecting your own API from callersDecoupling producers from consumersContinuous adaptive crawling of external APIs

When to Use Each

Fixed Cron is right when:

Token Bucket is right when:

Queue Backpressure is right when:

DO Adaptive Controller is right when:

The DO Pattern as “Stateful Scheduler”

The key differentiator is that the DO adaptive controller is a stateful scheduler — it combines scheduling, execution, state management, and adaptation into a single primitive. Every other approach separates at least two of these concerns:

The DO collapses all four into one object. The same code that decides “when to run” also knows “how the last run went” and uses that knowledge to decide “what to do differently next time.” This is why it can adapt in ways that separated systems cannot.


Don’tDo InsteadWhy
Use Promise.all for parallel batch processingUse Promise.allSettledPromise.all rejects on first failure, losing in-flight results and giving inaccurate success/failure counts
Store metrics as a flat array of eventsStore as minute-bucketed aggregatesFlat arrays grow unboundedly; minute buckets are constant-size and easy to prune
React to individual request failuresReact to rolling window success rateIndividual failures cause oscillation; rolling windows provide stable signal
Set a long cooldown (30+ minutes)Keep cooldown short (3-5 minutes) and let the controller re-enter cooldown if neededLong cooldowns waste time if the API recovers quickly; short + repeated is more responsive
Use deleteAlarm() to stop the loopSet running = false and let the alarm see itdeleteAlarm() races with alarm delivery; running = false is always safe
Hard-code the User-Agent stringUse bare fetch() with no custom headers for JSON APIsCustom UAs trigger TLS fingerprint mismatches on certain CF colos, causing false 403s (observed with Apple APIs)
Store state as multiple separate storage keysStore as a single JSON objectSeparate keys mean non-atomic updates — you can have batch_size from tick N and interval from tick N-1 after a crash
Skip the recent < 5 check in adapt()Always require minimum sample size before adaptingWithout it, a single failed request triggers aggressive backoff, potentially entering cooldown unnecessarily
Use lifetime success rate for adaptationUse a rolling window (5 minutes)Lifetime rates react too slowly — 10,000 successes followed by 50 failures barely moves the needle
Process items serially in the alarm handlerProcess in parallel chunks with inter-chunk delaysSerial processing wastes the alarm’s CPU time budget; parallel chunks maximize throughput within limits
Create the DO without a location hintUse locationHint on the first get() callWithout a hint, the DO spawns near the first request, which may be an admin in Europe while the target API is in the US
Retry rate-limited items immediately in the same tickLet them stay in the queue for the next tickRetrying immediately just burns more rate limit budget; the next tick will have adapted parameters
Use setTimeout loops inside the alarm handlerUse the alarm loop itself for recurring worksetTimeout inside a Worker has no persistence guarantee; alarms are durable and survive restarts
Store raw error strings in metricsClassify errors and only count API-level failuresParse errors and 404s are item-level problems, not rate-limit signals; counting them pollutes the success rate

The Durable Object must be declared in wrangler.jsonc:

{
  "$schema": "node_modules/wrangler/config-schema.json",
  "name": "my-crawler",
  "main": "src/index.ts",
  "compatibility_date": "2024-12-01",

  "durable_objects": {
    "bindings": [
      {
        "name": "ENRICHMENT_CONTROLLER",
        "class_name": "EnrichmentController"
      }
    ]
  },

  "migrations": [
    {
      "tag": "v1",
      "new_classes": ["EnrichmentController"]
    }
  ],

  // D1 database for the work queue
  "d1_databases": [
    {
      "binding": "DB",
      "database_name": "my-crawler-db",
      "database_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
  ],

  // R2 bucket for caching scraped HTML
  "r2_buckets": [
    {
      "binding": "HTML_CACHE",
      "bucket_name": "my-crawler-html-cache"
    }
  ]
}

Key points:


Official Cloudflare Documentation

Cloudflare Blog Posts

Rate Limiting and Backoff Theory

Community Resources


Built from production code running on Cloudflare Workers, processing thousands of Apple App Store records daily with zero manual intervention. The adaptive controller pattern has maintained a 97%+ success rate across 6 months of continuous operation, automatically navigating Apple’s varying rate limit enforcement.


Edit page
Share this post on:

Previous Post
Composable Processor Architecture for AI Content Pipelines
Next Post
Content Rewards Clipping Guide