Org Status: 🟡 Dormant Cloudflare: N/A Last Audited: 2026-04-28
You need to crawl an external API — thousands of records, no official rate limit documented, and a 403 that arrives without warning. Fixed-interval crons either waste capacity or get you blocked. Token buckets don’t learn. Queue-based backpressure has no memory of what happened ten minutes ago.
Durable Objects solve this with a pattern I call the adaptive controller: a stateful singleton that schedules its own work, measures its own success rate, and tunes its own parameters in a closed feedback loop. No external scheduler. No static timers. The system finds its own cruising speed.
What you’ll learn:
- How to build a self-tuning crawler that adapts batch size and interval based on API response signals
- Why Durable Object alarms are a superior scheduling primitive compared to crons, queues, and external schedulers
- The complete feedback loop: success ramp-up, failure backoff, sustained-failure cooldown
- State management patterns for adaptive parameters in DO storage
- Location hints for latency optimization when targeting specific API endpoints
- Parallel batch processing within a single alarm tick
- Error handling and recovery patterns that prevent both over-crawling and under-crawling
- Production-tested code from a real Apple App Store crawler processing thousands of apps
- The Problem: Crawling Without Getting Blocked
- Core Concepts
- Patterns
- Small Examples
- Example 1: Minimal Adaptive Alarm
- Example 2: Success Rate Calculation
- Example 3: Batch Size Adaptation
- Example 4: Cooldown Gate
- Example 5: Parallel Chunked Execution
- Example 6: Status Dashboard Endpoint
- Example 7: Constructor State Recovery
- Example 8: Metric Pruning
- Example 9: Graceful Start After Deploy
- Example 10: Error Classification
- The Full Implementation
- Comparisons
- Anti-Patterns
- Wrangler Configuration
- References
Every developer who has built a crawler for an external API has faced the same situation. You need data from a service — the Apple App Store, a social media API, a competitor’s product catalog — and that service has rate limits that are either undocumented, inconsistently enforced, or both.
The Apple App Store is a particularly instructive example. There is no official “App Store Scraping API.” The iTunes Search API is public but limited. For rich data — subtitles, rating histograms, in-app purchase details, similar apps — you need to scrape the store pages directly. Apple’s response to automated requests varies by:
- Cloudflare colo: requests from US East data centers get through more reliably than from Asian or European colos
- Request volume: no published rate limit, but sustained high volume triggers 403s
- Time of day: infrastructure load patterns affect tolerance
- Request characteristics: certain User-Agent strings or query parameters trigger bot detection faster
The naive approach is a cron job that runs every N minutes and processes a fixed batch:
// The naive approach — don't do this
export default {
async scheduled(event: ScheduledEvent, env: Env) {
const apps = await env.DB.prepare(
`SELECT track_id FROM app_registry
WHERE enriched_at IS NULL LIMIT 50`
).all();
for (const app of apps.results) {
try {
await enrichApp(env.DB, app.track_id);
} catch (err) {
console.error(`Failed: ${app.track_id}`, err);
}
}
},
};
This approach has several fatal flaws:
-
Fixed batch size: 50 apps per tick regardless of whether Apple is responding or blocking. When Apple is happy, you waste capacity. When Apple is angry, you burn through failed requests.
-
Fixed interval: Cron triggers on Cloudflare Workers have a minimum granularity of 1 minute. You can only have 3 cron triggers per Worker. You cannot adjust the interval programmatically.
-
No memory: Each cron invocation is stateless. It does not know that the last 10 invocations all returned 403s. It does not know that it successfully processed 200 apps in the last hour without a single failure.
-
No cooldown: When you hit a rate limit wall, the cron keeps hammering. The 403s keep coming. You are actively antagonizing the target API.
-
No parallelism control: Serial processing wastes time. Fully parallel processing overloads the target. There is no middle ground.
What changes if you get this right:
- Throughput maximization: when the API is responsive, you process at maximum safe speed
- Automatic protection: when the API signals stress, you back off before getting blocked
- Zero manual intervention: the system finds its own cruising speed after deployment
- Observability: you can see exactly what the system is doing and why at any moment
- Cost efficiency: no wasted requests, no unnecessary retries, no blocked-then-retry cycles
The Adaptive Controller Pattern
The adaptive controller is a Durable Object that acts as both scheduler and executor. It owns its own timing, measures its own results, and adjusts its own parameters. This is fundamentally different from systems where scheduling and execution are separate concerns.
interface AdaptiveController {
// Scheduling — the DO schedules itself
alarm(): Promise<void>;
// Execution — the DO does the work
processBatch(items: string[]): Promise<BatchResult>;
// Adaptation — the DO adjusts its own parameters
adapt(result: BatchResult): void;
// Persistence — the DO remembers across restarts
persist(): Promise<void>;
}
interface BatchResult {
dispatched: number;
succeeded: number;
failed: number;
errors: ErrorSignal[];
}
type ErrorSignal =
| { type: "rate_limit"; status: 403 | 429 }
| { type: "server_error"; status: 500 | 502 | 503 }
| { type: "timeout" }
| { type: "parse_error" }
| { type: "not_found"; status: 404 };
The key insight is that all three responsibilities — scheduling, execution, and adaptation — live in the same object, sharing the same state, running in the same process. There is no coordination overhead, no race condition between the scheduler deciding “run now” and the executor deciding “I’m overwhelmed.”
Key insight: The adaptive controller is a control system in the classical sense. It has a process variable (success rate), a setpoint (target throughput), and a controller (the
adapt()function) that adjusts the manipulated variables (batch size, interval) to keep the process variable near the setpoint.
Durable Object Alarms as a Scheduling Primitive
A Durable Object alarm is a single scheduled wake-up call. You set it with ctx.storage.setAlarm(timestamp), and at that time, the runtime calls your alarm() method. Each DO can have exactly one alarm at a time.
interface AlarmCapabilities {
// Set a single alarm — millisecond granularity
setAlarm(scheduledTime: number | Date): Promise<void>;
// Get the currently scheduled alarm time (or null)
getAlarm(): Promise<number | null>;
// Cancel the current alarm
deleteAlarm(): Promise<void>;
}
Why alarms are superior to crons for adaptive scheduling:
| Property | Cron Triggers | DO Alarms |
|---|---|---|
| Granularity | 1 minute minimum | Millisecond precision |
| Per-Worker limit | 3 triggers max | Unlimited DOs, each with 1 alarm |
| Programmable interval | No — configured in dashboard/wrangler | Yes — set dynamically in code |
| State between invocations | None — stateless | Full — DO storage persists |
| Retry on failure | No built-in retry | Automatic exponential backoff retry |
| Per-entity isolation | Shared across all entities | One DO per entity, each with own alarm |
| Location control | Runs wherever Worker is deployed | Location hints pin to specific colo |
The alarm’s at-least-once guarantee means your controller will always wake up, even after infrastructure maintenance or failures. The automatic retry with exponential backoff (starting at 2 seconds, up to 6 retries) means transient failures in your alarm() handler don’t break the scheduling loop.
Key insight: A DO alarm is not a timer. It is a durable scheduling primitive with persistence guarantees. Setting an alarm is making a contract with the runtime: “Wake me up at this time, and keep trying until my handler succeeds.” This is fundamentally different from
setTimeoutor cron, which are fire-and-forget.
The Feedback Loop
The feedback loop is the core mechanism that makes the controller adaptive. Every alarm tick follows the same cycle:
+------------------------------------------------------+
| |
| +---------+ +---------+ +---------+ |
| | WAKE |--->| EXECUTE |--->| MEASURE | |
| | (alarm) | | (batch) | | (stats) | |
| +---------+ +---------+ +----v----+ |
| ^ | |
| | +---------+ +----v----+ |
| | |SCHEDULE |<---| ADAPT | |
| +---------| (alarm) | | (tune) | |
| +---------+ +---------+ |
| |
+------------------------------------------------------+
Each phase:
- WAKE: The alarm fires. The DO wakes from hibernation (zero cost while sleeping).
- EXECUTE: Query the database for work items. Process them in parallel batches.
- MEASURE: Record successes and failures in a rolling metrics window.
- ADAPT: Calculate the success rate over the window. Adjust batch size and interval.
- SCHEDULE: Set the next alarm at the adapted interval.
The loop is self-perpetuating. Once started, it runs indefinitely until explicitly stopped. Each iteration learns from the previous one.
// The complete feedback loop in pseudocode
async alarm(): Promise<void> {
if (!this.state.running) return;
// 1. Check cooldown
if (Date.now() < this.state.cooldown_until) {
await this.scheduleNext();
return;
}
// 2. Execute
const items = await this.fetchWorkItems(this.state.current_batch_size);
const result = await this.processBatch(items);
// 3. Measure
this.recordMetrics(result);
// 4. Adapt
this.adapt();
// 5. Schedule
await this.persist();
await this.scheduleNext();
}
State Shape
The controller’s state is a single JSON object persisted to DO storage. It captures everything needed to resume after hibernation, restart, or migration.
interface ControllerState {
// Lifetime counters
total_dispatched: number;
total_completed: number;
total_failed: number;
last_tick_at: number;
// Control flag
running: boolean;
// Adaptive parameters — these are the tuning knobs
current_batch_size: number; // How many items per tick (2–50)
current_interval_ms: number; // Time between ticks (10s–120s)
cooldown_until: number; // Timestamp: skip execution until this time
// Rolling metrics for success rate calculation
// Keyed by minute-aligned timestamp for easy pruning
metrics: Record<number, {
dispatched: number;
completed: number;
failed: number;
}>;
}
The default state starts conservative — small batch, moderate interval — and lets the controller ramp up as it proves the API is responsive:
const DEFAULT_STATE: ControllerState = {
total_dispatched: 0,
total_completed: 0,
total_failed: 0,
last_tick_at: 0,
running: true,
current_batch_size: 5, // Start small
current_interval_ms: 30_000, // Every 30 seconds
cooldown_until: 0,
metrics: {},
};
Key insight: The state is a single
put/geton DO storage, not a set of scattered keys. This makes persistence atomic — you never have a partially-updated state. The metrics window is embedded in the state object, pruned on every persist, keeping the storage footprint bounded.
Pattern 1: The Self-Driving Singleton
The controller is a singleton Durable Object — one instance per crawl target, addressed by a stable name. It drives itself: the constructor starts the alarm loop, and each alarm schedules the next one. No external trigger needed after the initial creation.
When to use it: Any scenario where you need continuous, adaptive processing of a work queue against a rate-limited external API.
import { DurableObject } from "cloudflare:workers";
// Safety rails — bounds that the adaptive logic cannot exceed
const ABSOLUTE_MIN_BATCH = 2;
const ABSOLUTE_MAX_BATCH = 50;
const MIN_INTERVAL_MS = 10_000; // Fastest: every 10 seconds
const MAX_INTERVAL_MS = 120_000; // Slowest: every 2 minutes
const COOLDOWN_MS = 300_000; // 5 minutes on critical failure
const METRICS_WINDOW_MINUTES = 5;
const PARALLEL_LIMIT = 8;
interface ControllerState {
total_dispatched: number;
total_completed: number;
total_failed: number;
last_tick_at: number;
running: boolean;
current_batch_size: number;
current_interval_ms: number;
cooldown_until: number;
metrics: Record<
number,
{ dispatched: number; completed: number; failed: number }
>;
}
const DEFAULT_STATE: ControllerState = {
total_dispatched: 0,
total_completed: 0,
total_failed: 0,
last_tick_at: 0,
running: true,
current_batch_size: 5,
current_interval_ms: 30_000,
cooldown_until: 0,
metrics: {},
};
type Env = {
DB: D1Database;
};
export class EnrichmentController extends DurableObject<Env> {
private state: ControllerState = { ...DEFAULT_STATE };
constructor(ctx: DurableObjectState, env: Env) {
super(ctx, env);
// blockConcurrencyWhile ensures no requests arrive
// until we finish loading state from storage
ctx.blockConcurrencyWhile(async () => {
const saved = await ctx.storage.get<ControllerState>("state");
if (saved) {
this.state = { ...DEFAULT_STATE, ...saved };
}
this.pruneMetrics();
// Self-start: if the controller should be running
// and there is no alarm set, set one
if (this.state.running) {
const existing = await ctx.storage.getAlarm();
if (!existing) {
await ctx.storage.setAlarm(
Date.now() + this.state.current_interval_ms
);
}
}
});
}
override async alarm(): Promise<void> {
if (!this.state.running) return;
try {
// Cooldown check — skip execution, just reschedule
if (Date.now() < this.state.cooldown_until) {
const remaining = Math.round(
(this.state.cooldown_until - Date.now()) / 1000
);
console.log(
`[controller] In cooldown, ${remaining}s remaining`
);
await this.scheduleNext();
return;
}
// Fetch work items from the database
const batchSize = this.state.current_batch_size;
const toProcess = await this.env.DB.prepare(
`SELECT id FROM work_queue
WHERE processed_at IS NULL
ORDER BY priority DESC, created_at ASC
LIMIT ?`
)
.bind(batchSize)
.all<{ id: string }>();
if (toProcess.results.length === 0) {
console.log(`[controller] No items to process`);
await this.scheduleNext();
return;
}
const ids = toProcess.results.map((r) => r.id);
this.state.total_dispatched += ids.length;
this.recordMetric("dispatched", ids.length);
// Process in parallel batches
const { succeeded, failed } = await this.processBatch(ids);
// Record results
this.state.total_completed += succeeded;
this.state.total_failed += failed;
this.state.last_tick_at = Date.now();
this.recordMetric("completed", succeeded);
this.recordMetric("failed", failed);
// Adapt based on results
this.adapt();
console.log(
`[controller] ${succeeded}/${ids.length} ok, ` +
`${failed} failed | ` +
`batch=${this.state.current_batch_size} ` +
`interval=${(this.state.current_interval_ms / 1000).toFixed(0)}s`
);
} catch (err) {
console.error(`[controller] Tick failed:`, err);
}
await this.persist();
await this.scheduleNext();
}
private async processBatch(
ids: string[]
): Promise<{ succeeded: number; failed: number }> {
let succeeded = 0;
let failed = 0;
// Process in chunks of PARALLEL_LIMIT
for (let i = 0; i < ids.length; i += PARALLEL_LIMIT) {
const chunk = ids.slice(i, i + PARALLEL_LIMIT);
await Promise.allSettled(
chunk.map(async (id) => {
try {
await this.processItem(id);
succeeded++;
} catch {
failed++;
}
})
);
// Brief pause between parallel chunks
if (i + PARALLEL_LIMIT < ids.length) {
await new Promise((r) => setTimeout(r, 200));
}
}
return { succeeded, failed };
}
private async processItem(id: string): Promise<void> {
// Your actual processing logic here
// This is where you call the external API
throw new Error("Implement me");
}
private adapt(): void {
// See Pattern 3 for the full adaptive logic
}
private async scheduleNext(): Promise<void> {
if (this.state.running) {
await this.ctx.storage.setAlarm(
Date.now() + this.state.current_interval_ms
);
}
}
private recordMetric(
key: "dispatched" | "completed" | "failed",
count: number
): void {
const minuteTs = Math.floor(Date.now() / 60_000) * 60_000;
if (!this.state.metrics[minuteTs]) {
this.state.metrics[minuteTs] = {
dispatched: 0,
completed: 0,
failed: 0,
};
}
this.state.metrics[minuteTs][key] += count;
}
private pruneMetrics(): void {
const cutoff =
Date.now() - (METRICS_WINDOW_MINUTES + 1) * 60_000;
for (const ts of Object.keys(this.state.metrics)) {
if (Number(ts) < cutoff) {
delete this.state.metrics[Number(ts)];
}
}
}
private async persist(): Promise<void> {
this.pruneMetrics();
await this.ctx.storage.put("state", this.state);
}
}
Gotchas:
blockConcurrencyWhilein the constructor is essential. Without it, an alarm could fire before the state is loaded from storage, causing the controller to operate on default state and potentially overwrite saved adaptive parameters.- The spread
{ ...DEFAULT_STATE, ...saved }ensures forward compatibility. When you add new fields toControllerState, existing DOs get the default values for new fields while preserving their saved values. setAlarmoverwrites any existing alarm. There is noaddAlarm. If you callsetAlarmwhile an alarm is pending, the old alarm is replaced.
Connection to other patterns: This is the skeleton. Patterns 2-7 flesh out each method.
Pattern 2: Rolling Window Metrics
The controller needs to know its recent success rate to make adaptation decisions. A lifetime success rate is too slow to react — if you processed 10,000 items successfully and then hit a rate limit wall, your lifetime rate barely dips. You need a rolling window.
When to use it: Any adaptive system that needs to respond to recent conditions, not historical averages.
const METRICS_WINDOW_MINUTES = 5;
// Metrics are bucketed by minute for easy pruning.
// Each minute bucket accumulates dispatched/completed/failed counts.
type MetricsBucket = {
dispatched: number;
completed: number;
failed: number;
};
// Stored in the controller state, keyed by minute-aligned timestamp
type MetricsStore = Record<number, MetricsBucket>;
function recordMetric(
metrics: MetricsStore,
key: keyof MetricsBucket,
count: number
): void {
// Align to minute boundary
const minuteTs = Math.floor(Date.now() / 60_000) * 60_000;
if (!metrics[minuteTs]) {
metrics[minuteTs] = { dispatched: 0, completed: 0, failed: 0 };
}
metrics[minuteTs][key] += count;
}
function successRate(metrics: MetricsStore): number {
const windowStart =
Date.now() - METRICS_WINDOW_MINUTES * 60_000;
let completed = 0;
let failed = 0;
for (const [ts, bucket] of Object.entries(metrics)) {
if (Number(ts) >= windowStart) {
completed += bucket.completed;
failed += bucket.failed;
}
}
const total = completed + failed;
// No data yet → assume success (optimistic start)
return total === 0 ? 1 : completed / total;
}
function recentTotal(metrics: MetricsStore): number {
const windowStart =
Date.now() - METRICS_WINDOW_MINUTES * 60_000;
let total = 0;
for (const [ts, bucket] of Object.entries(metrics)) {
if (Number(ts) >= windowStart) {
total += bucket.completed + bucket.failed;
}
}
return total;
}
function completionsPerMinute(metrics: MetricsStore): number {
const windowStart =
Date.now() - METRICS_WINDOW_MINUTES * 60_000;
let total = 0;
for (const [ts, bucket] of Object.entries(metrics)) {
if (Number(ts) >= windowStart) {
total += bucket.completed;
}
}
return Math.round(total / METRICS_WINDOW_MINUTES);
}
function pruneMetrics(metrics: MetricsStore): void {
const cutoff =
Date.now() - (METRICS_WINDOW_MINUTES + 1) * 60_000;
for (const ts of Object.keys(metrics)) {
if (Number(ts) < cutoff) {
delete metrics[Number(ts)];
}
}
}
Why minute-aligned buckets?
The alternative is per-tick metrics (one entry per alarm invocation). The problem: alarm intervals vary from 10 seconds to 2 minutes. A 5-minute window at 10-second intervals is 30 entries. At 2-minute intervals, it is 2-3 entries. Minute-aligned buckets give a consistent window size (always 5 buckets for a 5-minute window) regardless of the alarm interval.
Why prune on every persist?
DO storage has no automatic TTL. Without pruning, the metrics object grows indefinitely. Since the entire state is a single put call, keeping old buckets inflates every write. Pruning to WINDOW + 1 minutes keeps the storage footprint constant.
Key insight: The optimistic default (
return total === 0 ? 1 : ...) matters. When the controller first starts or resumes after a long pause, it has no metrics. Returning 1.0 (100% success) means it uses its current batch size and interval without unnecessarily throttling. The first real failures will immediately adjust the parameters.
Edge cases:
- If the controller sleeps through its entire metrics window (e.g., a 5-minute cooldown with a 5-minute window), all metrics expire. The controller restarts from its current parameters with an optimistic 100% success rate. This is correct — the cooldown gave the API time to recover.
- Metric keys are timestamps as numbers.
Object.keysreturns strings. TheNumber(ts)cast in the loop is necessary.
Pattern 3: Tiered Adaptive Response
The adaptation logic uses the rolling success rate to make proportional adjustments. Not all failures are equal — a temporary blip should cause a gentle slowdown, while a sustained block demands aggressive retreat.
When to use it: When you need more nuance than binary “back off / speed up.”
// Thresholds define four zones of operation
const SUCCESS_RATE_CRITICAL = 0.2; // <20% → emergency cooldown
const SUCCESS_RATE_LOW = 0.5; // <50% → significant backoff
const SUCCESS_RATE_GOOD = 0.8; // >80% → gentle ramp up
const SUCCESS_RATE_GREAT = 0.95; // >95% → aggressive ramp up
// Adjustment factors
const RAMP_UP_AGGRESSIVE = 1.25; // +25% batch size
const RAMP_UP_GENTLE = 1.1; // +10% batch size
const RAMP_DOWN_FACTOR = 0.5; // -50% batch size
const INTERVAL_SPEEDUP_FAST = 0.8; // 20% faster
const INTERVAL_SPEEDUP_SLOW = 0.95; // 5% faster
const INTERVAL_SLOWDOWN = 1.5; // 50% slower
function adapt(state: ControllerState): void {
const sr = successRate(state.metrics);
const recent = recentTotal(state.metrics);
// Don't adapt until we have enough data points
// (prevents overreaction on the first few ticks)
if (recent < 5) return;
if (sr < SUCCESS_RATE_CRITICAL) {
// ZONE 1: CRITICAL — enter cooldown
// Success rate below 20% means the API is actively blocking us.
// Stop for COOLDOWN_MS, reset to minimum parameters.
console.log(
`[adapt] CRITICAL: ${(sr * 100).toFixed(0)}% success → ` +
`cooldown ${COOLDOWN_MS / 1000}s`
);
state.cooldown_until = Date.now() + COOLDOWN_MS;
state.current_batch_size = ABSOLUTE_MIN_BATCH;
state.current_interval_ms = MAX_INTERVAL_MS;
} else if (sr < SUCCESS_RATE_LOW) {
// ZONE 2: LOW — significant backoff
// Success rate 20-50%. Halve the batch, slow the interval.
// Aggressive but not panicked.
state.current_batch_size = Math.max(
ABSOLUTE_MIN_BATCH,
Math.floor(state.current_batch_size * RAMP_DOWN_FACTOR)
);
state.current_interval_ms = Math.min(
MAX_INTERVAL_MS,
Math.floor(state.current_interval_ms * INTERVAL_SLOWDOWN)
);
} else if (sr > SUCCESS_RATE_GREAT) {
// ZONE 4: GREAT — aggressive ramp up
// Success rate above 95%. The API is happy. Push harder.
state.current_batch_size = Math.min(
ABSOLUTE_MAX_BATCH,
Math.ceil(state.current_batch_size * RAMP_UP_AGGRESSIVE)
);
state.current_interval_ms = Math.max(
MIN_INTERVAL_MS,
Math.floor(state.current_interval_ms * INTERVAL_SPEEDUP_FAST)
);
} else if (sr > SUCCESS_RATE_GOOD) {
// ZONE 3: GOOD — gentle ramp up
// Success rate 80-95%. Cautiously increase.
state.current_batch_size = Math.min(
ABSOLUTE_MAX_BATCH,
Math.ceil(state.current_batch_size * RAMP_UP_GENTLE)
);
state.current_interval_ms = Math.max(
MIN_INTERVAL_MS,
Math.floor(state.current_interval_ms * INTERVAL_SPEEDUP_SLOW)
);
}
// DEAD ZONE: 50-80% success rate → no change
// This provides hysteresis — the system needs to clearly
// improve or worsen before parameters change.
}
The four zones visualized:
Success Rate:
0% 20% 50% 80% 95% 100%
|━━━━━━━━━|━━━━━━━━━━━|━━━━━━━━━━━|━━━━━━━━━━|━━━━━━━━━|
│CRITICAL │ LOW │ DEAD ZONE │ GOOD │ GREAT │
│cooldown │ -50% batch│ no change │+10% batch│+25% batch│
│min batch│ +50% intv │ │-5% intv │-20% intv │
│max intv │ │ │ │ │
Why the dead zone matters:
Without a dead zone (50-80%), the controller oscillates. Imagine a success rate of 78%: it ramps up, which pushes it to 82%, which ramps up more, pushing it back to 75%, which backs off, pushing it to 85%, etc. The dead zone provides hysteresis — the system must clearly enter a new zone before parameters change.
Why Math.ceil for ramp-up and Math.floor for ramp-down?
Ceiling ensures forward progress. Math.ceil(5 * 1.1) = 6 — you actually increase by 1. Math.floor(5 * 1.1) = 5 — you stay flat. For ramp-down, flooring ensures you actually decrease: Math.floor(6 * 0.5) = 3.
The minimum data check (recent < 5):
This prevents a single failed request from triggering a critical cooldown. The controller waits until it has at least 5 data points in the rolling window before making adaptation decisions. This is especially important at startup and after cooldown recovery.
Key insight: The adaptation is asymmetric by design. Ramp-down is faster than ramp-up (50% reduction vs 25% increase). This is intentional. Getting blocked is expensive — it poisons your IP reputation, wastes compute, and may trigger longer-term bans. Getting to maximum throughput a few ticks later is cheap. Always be faster to retreat than to advance.
Pattern 4: Parallel Batch Processing
Within a single alarm tick, the controller processes its batch items in parallel — but with controlled concurrency. You do not want to fire 50 concurrent requests at Apple simultaneously. You also do not want to process them one at a time.
When to use it: When each work item involves one or more external API calls and you need to maximize throughput within CPU time limits.
const ENRICHMENT_PARALLEL = 8; // Max concurrent items per chunk
async function enrichBatch(
db: D1Database,
trackIds: number[],
htmlCache?: R2Bucket
): Promise<{ enriched: number; failed: number }> {
// Dynamic import keeps the DO class lightweight.
// The enrichment module is only loaded when work exists.
const { default: enrichSingleApp } = await import(
"./enrich-single"
);
let enriched = 0;
let failed = 0;
// Process in chunks of ENRICHMENT_PARALLEL
for (let i = 0; i < trackIds.length; i += ENRICHMENT_PARALLEL) {
const chunk = trackIds.slice(i, i + ENRICHMENT_PARALLEL);
// Promise.allSettled — never throws, always returns all results.
// A single failed enrichment must not abort the entire batch.
await Promise.allSettled(
chunk.map(async (trackId) => {
try {
await enrichSingleApp(db, trackId, htmlCache);
enriched++;
} catch (err: unknown) {
failed++;
const msg =
err instanceof Error ? err.message : String(err);
const isRateLimit =
msg.includes("403") || msg.includes("429");
if (!isRateLimit) {
// Permanent failure — mark as failed so we don't retry
await db
.prepare(
`UPDATE app_registry
SET enriched_at = datetime('now'),
enrichment_source = 'failed'
WHERE track_id = ?`
)
.bind(trackId)
.run()
.catch(() => {});
}
// Rate limit failures: DON'T mark as failed.
// The item stays in the queue for retry on a future tick,
// after the controller has backed off.
}
})
);
// Brief pause between chunks — gives the target API
// a micro-breather and prevents request bunching
if (i + ENRICHMENT_PARALLEL < trackIds.length) {
await new Promise((r) => setTimeout(r, 200));
}
}
return { enriched, failed };
}
Why Promise.allSettled instead of Promise.all?
Promise.all rejects on the first failure, abandoning all in-flight work. In a batch of 8 enrichments, if item 3 fails, you lose the results of items 4-8 that might have succeeded. Promise.allSettled always resolves with all results, letting you count successes and failures accurately.
Why the 200ms inter-chunk pause?
Without it, chunks fire back-to-back. If PARALLEL_LIMIT is 8 and the batch size is 50, that is 7 chunks in rapid succession — essentially 50 near-simultaneous requests. The 200ms pause spreads the load and gives the target API time to process each chunk before the next arrives.
Why dynamic import?
The enrichment module imports cheerio (for HTML parsing), zod (for validation), and other dependencies. Loading these on every alarm tick — even when there is no work — wastes CPU time. The dynamic import means the DO class itself is lightweight, loading the heavy modules only when actual work exists.
Error classification matters:
The code distinguishes between rate-limit errors (403/429) and other errors. This distinction drives two different behaviors:
- Rate limit errors: Leave the item in the queue. The controller will back off and retry later.
- Other errors (parsing failures, missing data, 404s): Mark the item as failed. Don’t waste future ticks retrying something that will never succeed.
Key insight: Each enrichment call to the Apple App Store actually makes 4 concurrent sub-requests: iTunes API lookup, ratings scrape, page scrape, and IAP scrape. With 8 items in a chunk, that is 32 concurrent outbound fetches. This is why the parallel limit matters — it is not 8 requests, it is 32. Know your multiplication factor.
Pattern 5: Cooldown and Recovery
When the success rate drops below the critical threshold (20%), the controller enters cooldown — a period of complete inactivity that lets the target API’s rate limiting state reset.
When to use it: When sustained failure indicates you have been blocked, not just rate-limited. The difference: a 429 means “slow down.” Sustained 403s mean “go away for a while.”
const COOLDOWN_MS = 300_000; // 5 minutes
function enterCooldown(state: ControllerState): void {
state.cooldown_until = Date.now() + COOLDOWN_MS;
state.current_batch_size = ABSOLUTE_MIN_BATCH;
state.current_interval_ms = MAX_INTERVAL_MS;
}
function handleCooldown(state: ControllerState): boolean {
if (Date.now() < state.cooldown_until) {
const remaining = Math.round(
(state.cooldown_until - Date.now()) / 1000
);
console.log(`[controller] In cooldown, ${remaining}s remaining`);
return true; // Skip this tick
}
return false; // Cooldown expired, proceed
}
// Recovery after cooldown is gentle.
// The controller resumes with:
// - ABSOLUTE_MIN_BATCH (2 items)
// - MAX_INTERVAL_MS (120 seconds)
// - Empty metrics window (all old metrics expired during cooldown)
//
// This means the first few ticks process 2 items every 2 minutes.
// If those succeed, the adapter gradually ramps back up.
// Full recovery from cooldown to maximum throughput typically takes
// 5-10 ticks (5-20 minutes depending on how fast the adapter ramps).
The recovery trajectory:
Tick 0: cooldown ends → batch=2, interval=120s
Tick 1: 2/2 succeed (100%) → adapt: batch=3, interval=96s
Tick 2: 3/3 succeed (100%) → adapt: batch=4, interval=77s
Tick 3: 4/4 succeed (100%) → adapt: batch=5, interval=62s
Tick 4: 5/5 succeed (100%) → adapt: batch=7, interval=49s
Tick 5: 7/7 succeed (100%) → adapt: batch=9, interval=39s
Tick 6: 9/9 succeed (100%) → adapt: batch=12, interval=32s
...
Tick 12: ~40 items, ~12s interval → near maximum throughput
This trajectory shows exponential recovery. After 12 successful ticks (about 6 minutes of processing time plus cooldown), the controller is back near full capacity. But if any tick during recovery shows failures, the adaptation immediately responds, preventing a second cooldown.
Multiple consecutive cooldowns:
If the controller exits cooldown, processes 2 items, and both fail, the success rate immediately drops below 20% and triggers another cooldown. This is correct behavior — the API has not recovered. The controller will keep cycling through cooldowns until the API becomes responsive again.
Edge case — clock skew:
cooldown_until is an absolute timestamp. If the DO migrates to a different machine with a slightly different clock, or if Date.now() drifts, the cooldown duration could be slightly longer or shorter. A few seconds of drift is irrelevant for a 5-minute cooldown. If you need precision, store cooldown_started_at and cooldown_duration_ms separately and compute the end time fresh each tick.
Key insight: Cooldown is not the same as stopping. During cooldown, the alarm loop continues running — it just skips the execution phase. This is important because the DO stays warm (fast response to status queries) and the alarm chain is never broken (no need for an external “wake-up” after cooldown).
Pattern 6: Location-Optimized Singleton
Cloudflare Workers run in 300+ data centers worldwide. When your DO makes requests to a specific API, latency depends on which data center the DO is running in. Location hints let you pin a DO to a colo close to the target API.
When to use it: When your DO primarily communicates with a specific external service rather than with end users.
type Env = {
DB: D1Database;
HTML_CACHE: R2Bucket;
ENRICHMENT_CONTROLLER: DurableObjectNamespace<EnrichmentController>;
};
export default {
async fetch(
request: Request,
env: Env
): Promise<Response> {
const url = new URL(request.url);
if (url.pathname.startsWith("/enrichment/")) {
// Get the singleton with a location hint.
// 'enam' = Eastern North America — close to Apple's
// infrastructure in Cupertino/Reno.
const id = env.ENRICHMENT_CONTROLLER.idFromName("main");
const stub = env.ENRICHMENT_CONTROLLER.get(id, {
locationHint: "enam",
});
// Strip the /enrichment prefix and forward
const doUrl = new URL(request.url);
doUrl.pathname = doUrl.pathname.replace(
"/enrichment",
""
);
return stub.fetch(doUrl.toString(), request);
}
return new Response("Not found", { status: 404 });
},
};
Available location hints:
| Hint | Region | Good for |
|---|---|---|
wnam | Western North America | US West APIs (Google, Meta) |
enam | Eastern North America | US East APIs (Apple, AWS) |
sam | South America | Note: DOs actually spawn in enam |
weur | Western Europe | EU APIs |
eeur | Eastern Europe | EU East APIs |
apac | Asia Pacific | Asian APIs |
oc | Oceania | AU/NZ APIs |
afr | Africa | Note: may spawn in weur |
me | Middle East | Note: may spawn in weur or apac |
Important caveats from the Cloudflare docs:
-
Location hints are best effort, not guaranteed. The DO will spawn in a nearby data center that supports DOs, which may not be the exact colo you hinted.
-
Only the first
get()call for a particular DO respects the hint. Once created, the DO stays where it is. Subsequent calls with different hints are ignored. -
Some regions (South America, Africa, Middle East) do not have DO-capable data centers. DOs hinted to these regions spawn in the nearest supported region.
Why enam for Apple:
Apple’s infrastructure is concentrated in the United States. The iTunes API endpoints (itunes.apple.com) and App Store web pages (apps.apple.com) resolve to US-based servers. A DO running in enam has 10-30ms latency to Apple. A DO running in apac has 150-200ms latency. When you are making 4 API calls per item across 8 parallel items, that latency difference adds up to seconds per tick.
Our production experience confirmed this: requests from certain European and Asian Cloudflare colos triggered 403s more frequently than US East colos. Apple’s bot detection appears to have different thresholds based on source geography.
Key insight: For a crawler DO, location hints should optimize for the target API, not for the user querying the status endpoint. A status query from Europe hitting a DO in
enamadds 100ms of latency — negligible for an admin dashboard. But the DO’s latency to Apple affects every single enrichment request, thousands per day.
Pattern 7: Operational Control Plane
The controller needs a management API for operators: check status, start/stop the loop, reset parameters, view metrics. This is exposed via the DO’s fetch() handler.
When to use it: Any long-running DO that humans need to inspect and control.
export class EnrichmentController extends DurableObject<Env> {
// ... (state, constructor, alarm from Pattern 1)
override async fetch(request: Request): Promise<Response> {
const url = new URL(request.url);
const path = url.pathname;
// ── Status endpoint ─────────────────────────────
if (request.method === "GET" && path === "/status") {
const sr = this.successRate();
const rate = this.completionsPerMinute();
return Response.json({
running: this.state.running,
total_dispatched: this.state.total_dispatched,
total_completed: this.state.total_completed,
total_failed: this.state.total_failed,
completions_per_minute: rate,
success_rate_pct: Math.round(sr * 100),
current_batch_size: this.state.current_batch_size,
current_interval_ms: this.state.current_interval_ms,
in_cooldown: Date.now() < this.state.cooldown_until,
cooldown_remaining_s: Math.max(
0,
Math.round(
(this.state.cooldown_until - Date.now()) / 1000
)
),
last_tick_at: this.state.last_tick_at
? new Date(this.state.last_tick_at).toISOString()
: null,
projected_per_day: rate * 60 * 24,
});
}
// ── Start the loop ──────────────────────────────
if (request.method === "POST" && path === "/start") {
this.state.running = true;
this.state.cooldown_until = 0;
await this.persist();
await this.ctx.storage.setAlarm(Date.now() + 1000);
return Response.json({ ok: true, running: true });
}
// ── Stop the loop ───────────────────────────────
if (request.method === "POST" && path === "/stop") {
this.state.running = false;
await this.persist();
// Don't delete the alarm — it will fire and see
// running=false, then not reschedule. Cleaner than
// trying to cancel mid-flight.
return Response.json({ ok: true, running: false });
}
// ── Reset to defaults ───────────────────────────
if (request.method === "POST" && path === "/reset") {
this.state = { ...DEFAULT_STATE };
await this.persist();
await this.ctx.storage.setAlarm(Date.now() + 1000);
return Response.json({ ok: true, message: "Reset to defaults" });
}
// ── Manual parameter override ───────────────────
if (request.method === "POST" && path === "/tune") {
const body = (await request.json()) as Partial<{
batch_size: number;
interval_ms: number;
}>;
if (body.batch_size !== undefined) {
this.state.current_batch_size = Math.max(
ABSOLUTE_MIN_BATCH,
Math.min(ABSOLUTE_MAX_BATCH, body.batch_size)
);
}
if (body.interval_ms !== undefined) {
this.state.current_interval_ms = Math.max(
MIN_INTERVAL_MS,
Math.min(MAX_INTERVAL_MS, body.interval_ms)
);
}
await this.persist();
return Response.json({
ok: true,
current_batch_size: this.state.current_batch_size,
current_interval_ms: this.state.current_interval_ms,
});
}
return new Response("Not found", { status: 404 });
}
}
Usage from curl:
curl https://my-worker.workers.dev/enrichment/status | jq
curl -X POST https://my-worker.workers.dev/enrichment/stop
curl -X POST https://my-worker.workers.dev/enrichment/start
curl -X POST https://my-worker.workers.dev/enrichment/tune \
-H "Content-Type: application/json" \
-d '{"batch_size": 5, "interval_ms": 60000}'
curl -X POST https://my-worker.workers.dev/enrichment/reset
The projected_per_day metric:
This is the most useful number for operators. It takes the current completions-per-minute rate and extrapolates to a daily total. When the controller is running well, you might see projected_per_day: 25920 (18/min * 60 * 24). When it is in cooldown, this drops to 0. This single number tells you whether the system is healthy.
Key insight: The
/stopendpoint does not delete the alarm. It setsrunning = falseand persists. The next alarm fires, seesrunning = false, and exits without rescheduling. This is cleaner and safer thandeleteAlarm()because it handles the race condition where the alarm fires between yourstopanddeleteAlarmcalls.
Example 1: Minimal Adaptive Alarm
The simplest possible adaptive alarm loop — just the scheduling mechanics with no business logic.
import { DurableObject } from "cloudflare:workers";
export class MinimalAdaptiveLoop extends DurableObject {
private interval = 30_000; // Start at 30s
private successCount = 0;
private failCount = 0;
constructor(ctx: DurableObjectState, env: unknown) {
super(ctx, env);
ctx.blockConcurrencyWhile(async () => {
this.interval =
(await ctx.storage.get<number>("interval")) ?? 30_000;
const alarm = await ctx.storage.getAlarm();
if (!alarm) {
await ctx.storage.setAlarm(Date.now() + this.interval);
}
});
}
override async alarm(): Promise<void> {
const ok = Math.random() > 0.3; // Simulated work
if (ok) {
this.successCount++;
this.interval = Math.max(10_000, this.interval * 0.9);
} else {
this.failCount++;
this.interval = Math.min(120_000, this.interval * 1.5);
}
await this.ctx.storage.put("interval", this.interval);
await this.ctx.storage.setAlarm(Date.now() + this.interval);
}
}
Example 2: Success Rate Calculation
Isolated success rate function with proper edge case handling.
type Metrics = Record<
number,
{ completed: number; failed: number }
>;
function calculateSuccessRate(
metrics: Metrics,
windowMinutes: number
): { rate: number; sampleSize: number; confidence: string } {
const windowStart = Date.now() - windowMinutes * 60_000;
let completed = 0;
let failed = 0;
for (const [ts, m] of Object.entries(metrics)) {
if (Number(ts) >= windowStart) {
completed += m.completed;
failed += m.failed;
}
}
const total = completed + failed;
const rate = total === 0 ? 1 : completed / total;
const confidence =
total === 0
? "none"
: total < 5
? "low"
: total < 20
? "medium"
: "high";
return { rate, sampleSize: total, confidence };
}
Example 3: Batch Size Adaptation
Pure function that computes the new batch size given current state.
function adaptBatchSize(
currentBatch: number,
successRate: number,
min: number,
max: number
): { newBatch: number; action: string } {
if (successRate < 0.2) {
return { newBatch: min, action: "critical_reset" };
}
if (successRate < 0.5) {
return {
newBatch: Math.max(min, Math.floor(currentBatch * 0.5)),
action: "halve",
};
}
if (successRate > 0.95) {
return {
newBatch: Math.min(max, Math.ceil(currentBatch * 1.25)),
action: "aggressive_ramp",
};
}
if (successRate > 0.8) {
return {
newBatch: Math.min(max, Math.ceil(currentBatch * 1.1)),
action: "gentle_ramp",
};
}
return { newBatch: currentBatch, action: "hold" };
}
Example 4: Cooldown Gate
A reusable cooldown checker that can be composed into any alarm handler.
class CooldownGate {
private cooldownUntil = 0;
private readonly duration: number;
constructor(durationMs: number) {
this.duration = durationMs;
}
trigger(): void {
this.cooldownUntil = Date.now() + this.duration;
}
isActive(): boolean {
return Date.now() < this.cooldownUntil;
}
remainingMs(): number {
return Math.max(0, this.cooldownUntil - Date.now());
}
remainingSeconds(): number {
return Math.round(this.remainingMs() / 1000);
}
toJSON(): { cooldown_until: number } {
return { cooldown_until: this.cooldownUntil };
}
static fromJSON(data: { cooldown_until: number }, durationMs: number): CooldownGate {
const gate = new CooldownGate(durationMs);
gate.cooldownUntil = data.cooldown_until;
return gate;
}
}
Example 5: Parallel Chunked Execution
Generic parallel execution with controlled concurrency and inter-chunk delays.
async function processInChunks<T, R>(
items: T[],
processor: (item: T) => Promise<R>,
options: {
concurrency: number;
delayBetweenChunksMs: number;
}
): Promise<{ results: PromiseSettledResult<R>[] }> {
const allResults: PromiseSettledResult<R>[] = [];
for (let i = 0; i < items.length; i += options.concurrency) {
const chunk = items.slice(i, i + options.concurrency);
const chunkResults = await Promise.allSettled(
chunk.map(processor)
);
allResults.push(...chunkResults);
if (i + options.concurrency < items.length) {
await new Promise((r) =>
setTimeout(r, options.delayBetweenChunksMs)
);
}
}
return { results: allResults };
}
// Usage
const { results } = await processInChunks(
trackIds,
(id) => enrichSingleApp(db, id),
{ concurrency: 8, delayBetweenChunksMs: 200 }
);
const succeeded = results.filter(
(r) => r.status === "fulfilled"
).length;
const failed = results.filter(
(r) => r.status === "rejected"
).length;
Example 6: Status Dashboard Endpoint
A comprehensive status response that gives operators everything they need.
function buildStatus(state: ControllerState): object {
const sr = calculateSuccessRate(
state.metrics,
METRICS_WINDOW_MINUTES
);
const cpm = completionsPerMinute(state.metrics);
return {
// Control state
running: state.running,
in_cooldown: Date.now() < state.cooldown_until,
cooldown_remaining_s: Math.max(
0,
Math.round((state.cooldown_until - Date.now()) / 1000)
),
// Current parameters
params: {
batch_size: state.current_batch_size,
interval_ms: state.current_interval_ms,
interval_human: `${(state.current_interval_ms / 1000).toFixed(0)}s`,
},
// Metrics
lifetime: {
dispatched: state.total_dispatched,
completed: state.total_completed,
failed: state.total_failed,
success_rate_pct: state.total_dispatched > 0
? Math.round(
(state.total_completed / state.total_dispatched) * 100
)
: 100,
},
rolling: {
success_rate_pct: Math.round(sr.rate * 100),
sample_size: sr.sampleSize,
confidence: sr.confidence,
completions_per_minute: cpm,
},
// Projections
projected_per_hour: cpm * 60,
projected_per_day: cpm * 60 * 24,
// Timing
last_tick_at: state.last_tick_at
? new Date(state.last_tick_at).toISOString()
: null,
uptime_s: state.last_tick_at
? Math.round((Date.now() - state.last_tick_at) / 1000)
: 0,
};
}
Example 7: Constructor State Recovery
Safe state recovery that handles schema evolution.
constructor(ctx: DurableObjectState, env: Env) {
super(ctx, env);
ctx.blockConcurrencyWhile(async () => {
const saved = await ctx.storage.get<ControllerState>("state");
if (saved) {
// Merge with defaults — new fields get default values,
// saved fields override defaults
this.state = { ...DEFAULT_STATE, ...saved };
// Migrate: if saved state has fields that changed type,
// fix them here
if (typeof this.state.metrics !== "object") {
this.state.metrics = {};
}
// Clamp: ensure saved values are within current bounds
// (bounds may have changed between deploys)
this.state.current_batch_size = Math.max(
ABSOLUTE_MIN_BATCH,
Math.min(ABSOLUTE_MAX_BATCH, this.state.current_batch_size)
);
this.state.current_interval_ms = Math.max(
MIN_INTERVAL_MS,
Math.min(MAX_INTERVAL_MS, this.state.current_interval_ms)
);
}
this.pruneMetrics();
// Self-start if running and no alarm is set
if (this.state.running) {
const existing = await ctx.storage.getAlarm();
if (!existing) {
await ctx.storage.setAlarm(
Date.now() + this.state.current_interval_ms
);
}
}
});
}
Example 8: Metric Pruning
Keeping the metrics store bounded to prevent unbounded storage growth.
function pruneMetrics(
metrics: Record<number, unknown>,
windowMinutes: number
): number {
const cutoff = Date.now() - (windowMinutes + 1) * 60_000;
let pruned = 0;
for (const ts of Object.keys(metrics)) {
if (Number(ts) < cutoff) {
delete metrics[Number(ts)];
pruned++;
}
}
return pruned;
}
// Call on every persist() to keep storage bounded:
// With a 5-minute window, you never have more than 6-7 buckets.
// Each bucket is ~50 bytes of JSON. Total: ~350 bytes.
Example 9: Graceful Start After Deploy
When a Worker is deployed, all DOs are restarted. The constructor must handle this gracefully.
// The self-start pattern in the constructor handles deploys
// automatically. But you might want a warm-up period:
constructor(ctx: DurableObjectState, env: Env) {
super(ctx, env);
ctx.blockConcurrencyWhile(async () => {
const saved = await ctx.storage.get<ControllerState>("state");
if (saved) {
this.state = { ...DEFAULT_STATE, ...saved };
}
// After a deploy, start conservatively even if the saved
// state has aggressive parameters. The deploy may have
// changed the enrichment logic, and the old parameters
// might not be appropriate.
const deployGracePeriod = 60_000; // 1 minute
if (this.state.running) {
const existing = await ctx.storage.getAlarm();
if (!existing) {
// Delay the first tick to let the deploy settle
await ctx.storage.setAlarm(
Date.now() + deployGracePeriod
);
}
}
});
}
Example 10: Error Classification
Classifying errors to drive different adaptation behaviors.
type ErrorClass =
| "rate_limit" // 403, 429 → back off
| "server_error" // 500, 502, 503 → back off gently
| "not_found" // 404 → skip item, don't count as failure
| "parse_error" // Invalid response → skip item
| "timeout" // Network timeout → back off
| "unknown"; // Unexpected → count as failure
function classifyError(err: unknown): ErrorClass {
if (!(err instanceof Error)) return "unknown";
const msg = err.message.toLowerCase();
if (msg.includes("403") || msg.includes("429")) {
return "rate_limit";
}
if (msg.includes("500") || msg.includes("502") || msg.includes("503")) {
return "server_error";
}
if (msg.includes("404") || msg.includes("not found")) {
return "not_found";
}
if (msg.includes("timeout") || msg.includes("timed out")) {
return "timeout";
}
if (msg.includes("parse") || msg.includes("json") || msg.includes("html")) {
return "parse_error";
}
return "unknown";
}
// Usage in the batch processor:
// Only rate_limit and server_error count toward the failure metric.
// not_found and parse_error are item-level problems, not API problems.
function shouldCountAsApiFailure(errorClass: ErrorClass): boolean {
return errorClass === "rate_limit" || errorClass === "server_error" || errorClass === "timeout";
}
This is the complete, production-tested EnrichmentController from a real Apple App Store crawler. It processes thousands of apps per day, adapting to Apple’s varying tolerance levels across different times of day and Cloudflare colos.
/**
* EnrichmentController — Durable Object that owns the
* entire enrichment pipeline.
*
* Self-driving, self-tuning closed loop:
* alarm fires → pick apps from DB → enrich in parallel
* → measure success → adapt batch_size + interval
* → schedule next alarm at adapted interval
*
* No cron dependency. No queue. No static timers.
* Everything adapts to Apple's response: ramps up when
* succeeding, backs off when blocked, enters cooldown
* on sustained failure.
*
* Identity: Singleton — addressed as idFromName('main')
* Location: Created with locationHint 'enam' to pin to
* US East (Apple-friendly colo)
*/
import { DurableObject } from "cloudflare:workers";
// ── Bounds (safety rails, not targets) ──────────────
const ABSOLUTE_MIN_BATCH = 2;
const ABSOLUTE_MAX_BATCH = 50;
const MIN_INTERVAL_MS = 10_000;
const MAX_INTERVAL_MS = 120_000;
const COOLDOWN_MS = 300_000;
const METRICS_WINDOW_MINUTES = 5;
const ENRICHMENT_PARALLEL = 8;
// ── Adaptive thresholds ─────────────────────────────
const SUCCESS_RATE_CRITICAL = 0.2;
const SUCCESS_RATE_LOW = 0.5;
const SUCCESS_RATE_GOOD = 0.8;
const SUCCESS_RATE_GREAT = 0.95;
const RAMP_UP_FACTOR = 1.25;
const RAMP_DOWN_FACTOR = 0.5;
// ── State ───────────────────────────────────────────
interface ControllerState {
total_dispatched: number;
total_completed: number;
total_failed: number;
last_tick_at: number;
running: boolean;
current_batch_size: number;
current_interval_ms: number;
cooldown_until: number;
metrics: Record<
number,
{ dispatched: number; completed: number; failed: number }
>;
}
const DEFAULT_STATE: ControllerState = {
total_dispatched: 0,
total_completed: 0,
total_failed: 0,
last_tick_at: 0,
running: true,
current_batch_size: 5,
current_interval_ms: 30_000,
cooldown_until: 0,
metrics: {},
};
type Env = {
DB: D1Database;
HTML_CACHE: R2Bucket;
};
export class EnrichmentController extends DurableObject<Env> {
private state: ControllerState = DEFAULT_STATE;
constructor(ctx: DurableObjectState, env: Env) {
super(ctx, env);
ctx.blockConcurrencyWhile(async () => {
const saved =
await ctx.storage.get<ControllerState>("state");
if (saved) {
this.state = { ...DEFAULT_STATE, ...saved };
}
this.pruneMetrics();
if (this.state.running) {
const existing = await ctx.storage.getAlarm();
if (!existing) {
await ctx.storage.setAlarm(
Date.now() + this.state.current_interval_ms
);
}
}
});
}
// ── HTTP API (status + control) ───────────────────
override async fetch(request: Request): Promise<Response> {
const url = new URL(request.url);
const path = url.pathname;
if (request.method === "GET" && path === "/status") {
return Response.json(this.getStatus());
}
if (request.method === "POST" && path === "/start") {
this.state.running = true;
this.state.cooldown_until = 0;
await this.persist();
await this.ctx.storage.setAlarm(Date.now() + 1000);
return Response.json({ ok: true, running: true });
}
if (request.method === "POST" && path === "/stop") {
this.state.running = false;
await this.persist();
return Response.json({ ok: true, running: false });
}
if (request.method === "POST" && path === "/reset") {
this.state = { ...DEFAULT_STATE };
await this.persist();
await this.ctx.storage.setAlarm(Date.now() + 1000);
return Response.json({ ok: true });
}
return new Response("Not found", { status: 404 });
}
// ── Alarm: the heartbeat ──────────────────────────
override async alarm(): Promise<void> {
if (!this.state.running) return;
try {
if (Date.now() < this.state.cooldown_until) {
const remaining = Math.round(
(this.state.cooldown_until - Date.now()) / 1000
);
console.log(
`[enrichment-do] In cooldown, ${remaining}s remaining`
);
await this.scheduleNext();
return;
}
const batchSize = this.state.current_batch_size;
const toEnrich = await this.env.DB.prepare(
`SELECT track_id FROM app_registry
WHERE enriched_at IS NULL
OR (enrichment_source = 'itunes'
AND enriched_at < datetime('now', '-14 days'))
ORDER BY enrich_priority DESC, rating_count DESC
LIMIT ?`
)
.bind(batchSize)
.all<{ track_id: number }>();
if (toEnrich.results.length === 0) {
console.log(`[enrichment-do] No apps to enrich`);
await this.scheduleNext();
return;
}
const trackIds = toEnrich.results.map((r) => r.track_id);
this.state.total_dispatched += trackIds.length;
this.recordMetric("dispatched", trackIds.length);
const { enriched, failed } =
await this.enrichBatch(trackIds);
this.state.total_completed += enriched;
this.state.total_failed += failed;
this.state.last_tick_at = Date.now();
this.recordMetric("completed", enriched);
this.recordMetric("failed", failed);
this.adapt();
console.log(
`[enrichment-do] ${enriched}/${trackIds.length} enriched, ` +
`${failed} failed | ` +
`batch=${this.state.current_batch_size} ` +
`interval=${(this.state.current_interval_ms / 1000).toFixed(0)}s`
);
} catch (err) {
console.error(`[enrichment-do] Tick failed:`, err);
}
await this.persist();
await this.scheduleNext();
}
// ── Enrichment engine ─────────────────────────────
private async enrichBatch(
trackIds: number[]
): Promise<{ enriched: number; failed: number }> {
const { default: enrichSingleApp } = await import(
"./enrich-single"
);
let enriched = 0;
let failed = 0;
for (
let i = 0;
i < trackIds.length;
i += ENRICHMENT_PARALLEL
) {
const batch = trackIds.slice(
i,
i + ENRICHMENT_PARALLEL
);
await Promise.allSettled(
batch.map(async (trackId) => {
try {
await enrichSingleApp(
this.env.DB,
trackId,
this.env.HTML_CACHE
);
enriched++;
} catch (err: unknown) {
failed++;
const msg =
err instanceof Error ? err.message : "";
const isRateLimit =
msg.includes("403") || msg.includes("429");
if (!isRateLimit) {
await this.env.DB.prepare(
`UPDATE app_registry
SET enriched_at = datetime('now'),
enrichment_source = 'failed'
WHERE track_id = ?`
)
.bind(trackId)
.run()
.catch(() => {});
}
}
})
);
if (i + ENRICHMENT_PARALLEL < trackIds.length) {
await new Promise((r) => setTimeout(r, 200));
}
}
return { enriched, failed };
}
// ── Adaptive logic ────────────────────────────────
private adapt(): void {
const sr = this.successRate();
const recentTotal = this.recentTotal();
if (recentTotal < 5) return;
if (sr < SUCCESS_RATE_CRITICAL) {
console.log(
`[enrichment-do] CRITICAL: ` +
`${(sr * 100).toFixed(0)}% success → ` +
`cooldown ${COOLDOWN_MS / 1000}s`
);
this.state.cooldown_until = Date.now() + COOLDOWN_MS;
this.state.current_batch_size = ABSOLUTE_MIN_BATCH;
this.state.current_interval_ms = MAX_INTERVAL_MS;
} else if (sr < SUCCESS_RATE_LOW) {
this.state.current_batch_size = Math.max(
ABSOLUTE_MIN_BATCH,
Math.floor(
this.state.current_batch_size * RAMP_DOWN_FACTOR
)
);
this.state.current_interval_ms = Math.min(
MAX_INTERVAL_MS,
Math.floor(this.state.current_interval_ms * 1.5)
);
} else if (sr > SUCCESS_RATE_GREAT) {
this.state.current_batch_size = Math.min(
ABSOLUTE_MAX_BATCH,
Math.ceil(
this.state.current_batch_size * RAMP_UP_FACTOR
)
);
this.state.current_interval_ms = Math.max(
MIN_INTERVAL_MS,
Math.floor(this.state.current_interval_ms * 0.8)
);
} else if (sr > SUCCESS_RATE_GOOD) {
this.state.current_batch_size = Math.min(
ABSOLUTE_MAX_BATCH,
Math.ceil(this.state.current_batch_size * 1.1)
);
this.state.current_interval_ms = Math.max(
MIN_INTERVAL_MS,
Math.floor(this.state.current_interval_ms * 0.95)
);
}
}
private async scheduleNext(): Promise<void> {
if (this.state.running) {
await this.ctx.storage.setAlarm(
Date.now() + this.state.current_interval_ms
);
}
}
// ── Metrics ───────────────────────────────────────
private getStatus(): object {
const sr = this.successRate();
const rate = this.recentCompletionsPerMinute();
return {
running: this.state.running,
total_dispatched: this.state.total_dispatched,
total_completed: this.state.total_completed,
total_failed: this.state.total_failed,
completions_per_minute: rate,
success_rate: Math.round(sr * 100),
current_batch_size: this.state.current_batch_size,
current_interval_ms: this.state.current_interval_ms,
in_cooldown:
Date.now() < this.state.cooldown_until,
cooldown_remaining_s: Math.max(
0,
Math.round(
(this.state.cooldown_until - Date.now()) / 1000
)
),
last_tick_at: this.state.last_tick_at || null,
projected_per_day: rate * 60 * 24,
};
}
private successRate(): number {
const windowStart =
Date.now() - METRICS_WINDOW_MINUTES * 60_000;
let completed = 0;
let failed = 0;
for (const [ts, m] of Object.entries(
this.state.metrics
)) {
if (Number(ts) >= windowStart) {
completed += m.completed;
failed += m.failed;
}
}
const total = completed + failed;
return total === 0 ? 1 : completed / total;
}
private recentTotal(): number {
const windowStart =
Date.now() - METRICS_WINDOW_MINUTES * 60_000;
let total = 0;
for (const [ts, m] of Object.entries(
this.state.metrics
)) {
if (Number(ts) >= windowStart) {
total += m.completed + m.failed;
}
}
return total;
}
private recentCompletionsPerMinute(): number {
const windowStart =
Date.now() - METRICS_WINDOW_MINUTES * 60_000;
let total = 0;
for (const [ts, m] of Object.entries(
this.state.metrics
)) {
if (Number(ts) >= windowStart) {
total += m.completed;
}
}
return Math.round(total / METRICS_WINDOW_MINUTES);
}
private recordMetric(
key: "dispatched" | "completed" | "failed",
count: number
): void {
const minuteTs =
Math.floor(Date.now() / 60_000) * 60_000;
if (!this.state.metrics[minuteTs]) {
this.state.metrics[minuteTs] = {
dispatched: 0,
completed: 0,
failed: 0,
};
}
this.state.metrics[minuteTs][key] += count;
}
private pruneMetrics(): void {
const cutoff =
Date.now() - (METRICS_WINDOW_MINUTES + 1) * 60_000;
for (const ts of Object.keys(this.state.metrics)) {
if (Number(ts) < cutoff) {
delete this.state.metrics[Number(ts)];
}
}
}
private async persist(): Promise<void> {
this.pruneMetrics();
await this.ctx.storage.put("state", this.state);
}
}
The Worker entrypoint that creates the singleton with location hint:
type Env = {
DB: D1Database;
HTML_CACHE: R2Bucket;
ENRICHMENT_CONTROLLER: DurableObjectNamespace<EnrichmentController>;
};
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const url = new URL(request.url);
if (url.pathname.startsWith("/v1/enrichment")) {
const id = env.ENRICHMENT_CONTROLLER.idFromName("main");
const stub = env.ENRICHMENT_CONTROLLER.get(id, {
locationHint: "enam",
});
const doUrl = new URL(request.url);
doUrl.pathname = doUrl.pathname.replace(
"/v1/enrichment",
""
);
return stub.fetch(doUrl.toString(), request);
}
return new Response("Not found", { status: 404 });
},
};
export { EnrichmentController };
DO Adaptive Controller vs. Alternative Approaches
| Dimension | Fixed Cron | Token Bucket | Queue Backpressure | DO Adaptive Controller |
|---|---|---|---|---|
| What it is | Cron trigger fires every N minutes, processes a fixed batch | In-memory counter that refills at a fixed rate, gates requests | Queue with max_concurrency and DLQ, retries with backoff | Singleton DO with alarm loop, rolling metrics, adaptive parameters |
| Interval control | Fixed (1-min minimum on CF) | N/A (gates incoming, doesn’t schedule) | Queue consumer pulls when ready | Dynamic: 10s–120s, adjusted per tick |
| Batch size control | Fixed at deploy time | N/A (per-request gating) | Controlled by max_concurrency + batch size | Dynamic: 2–50, adjusted per tick based on success rate |
| State persistence | None — stateless | In-memory only (lost on restart) | Queue messages persist, but consumer is stateless | Full state in DO storage, survives restarts and migrations |
| Learning from failures | No — same batch/interval regardless | No — fixed refill rate | Limited — retries with backoff per message | Yes — rolling window success rate drives all parameters |
| Cooldown on sustained failure | No — keeps hammering | No — keeps gating at same rate | DLQ catches failed messages after max retries | Yes — 5-min cooldown on <20% success rate |
| Per-entity isolation | One cron for everything | Per-API-key typically | Per-queue, but queues are shared | Per-DO — each entity gets its own adaptive loop |
| Location optimization | Runs in any colo | N/A | Consumer runs in any colo | Location hints pin to target-API-friendly colo |
| Observability | Logs only | Metrics if you build them | Queue depth, DLQ depth | Full status API: rate, batch, interval, projections |
| Cost while idle | Cron fires even with no work | Memory cost for idle buckets | Queue polling cost | Zero — DO hibernates, alarm is free until it fires |
| Implementation effort | Low | Medium | Medium | High (but reusable) |
| Best for | Simple periodic jobs | Protecting your own API from callers | Decoupling producers from consumers | Continuous adaptive crawling of external APIs |
When to Use Each
Fixed Cron is right when:
- You have a simple periodic job (send daily report, clean up old records)
- The work is fast and failure is not catastrophic
- You need fewer than 3 schedules per Worker
Token Bucket is right when:
- You are the API provider protecting yourself from callers
- You need per-caller rate limiting with burst allowance
- The rate limit is known and fixed
Queue Backpressure is right when:
- You need to decouple producers from consumers
- Work items arrive unpredictably and must be buffered
- Each message can be processed independently
- You want automatic retries with exponential backoff per message
DO Adaptive Controller is right when:
- You are crawling an external API with unknown/variable rate limits
- You need the system to find its own sustainable throughput
- You want zero manual intervention after deployment
- Per-entity isolation matters (different crawl targets, different behavior)
- You need continuous processing, not batch-and-wait
The DO Pattern as “Stateful Scheduler”
The key differentiator is that the DO adaptive controller is a stateful scheduler — it combines scheduling, execution, state management, and adaptation into a single primitive. Every other approach separates at least two of these concerns:
- Cron separates scheduling (cron config) from execution (handler) — no shared state
- Token bucket separates rate limiting (bucket) from execution (caller) — bucket has no knowledge of execution results
- Queue separates buffering (queue) from execution (consumer) — consumer has no memory across invocations
The DO collapses all four into one object. The same code that decides “when to run” also knows “how the last run went” and uses that knowledge to decide “what to do differently next time.” This is why it can adapt in ways that separated systems cannot.
| Don’t | Do Instead | Why |
|---|---|---|
Use Promise.all for parallel batch processing | Use Promise.allSettled | Promise.all rejects on first failure, losing in-flight results and giving inaccurate success/failure counts |
| Store metrics as a flat array of events | Store as minute-bucketed aggregates | Flat arrays grow unboundedly; minute buckets are constant-size and easy to prune |
| React to individual request failures | React to rolling window success rate | Individual failures cause oscillation; rolling windows provide stable signal |
| Set a long cooldown (30+ minutes) | Keep cooldown short (3-5 minutes) and let the controller re-enter cooldown if needed | Long cooldowns waste time if the API recovers quickly; short + repeated is more responsive |
Use deleteAlarm() to stop the loop | Set running = false and let the alarm see it | deleteAlarm() races with alarm delivery; running = false is always safe |
| Hard-code the User-Agent string | Use bare fetch() with no custom headers for JSON APIs | Custom UAs trigger TLS fingerprint mismatches on certain CF colos, causing false 403s (observed with Apple APIs) |
| Store state as multiple separate storage keys | Store as a single JSON object | Separate keys mean non-atomic updates — you can have batch_size from tick N and interval from tick N-1 after a crash |
Skip the recent < 5 check in adapt() | Always require minimum sample size before adapting | Without it, a single failed request triggers aggressive backoff, potentially entering cooldown unnecessarily |
| Use lifetime success rate for adaptation | Use a rolling window (5 minutes) | Lifetime rates react too slowly — 10,000 successes followed by 50 failures barely moves the needle |
| Process items serially in the alarm handler | Process in parallel chunks with inter-chunk delays | Serial processing wastes the alarm’s CPU time budget; parallel chunks maximize throughput within limits |
| Create the DO without a location hint | Use locationHint on the first get() call | Without a hint, the DO spawns near the first request, which may be an admin in Europe while the target API is in the US |
| Retry rate-limited items immediately in the same tick | Let them stay in the queue for the next tick | Retrying immediately just burns more rate limit budget; the next tick will have adapted parameters |
Use setTimeout loops inside the alarm handler | Use the alarm loop itself for recurring work | setTimeout inside a Worker has no persistence guarantee; alarms are durable and survive restarts |
| Store raw error strings in metrics | Classify errors and only count API-level failures | Parse errors and 404s are item-level problems, not rate-limit signals; counting them pollutes the success rate |
The Durable Object must be declared in wrangler.jsonc:
{
"$schema": "node_modules/wrangler/config-schema.json",
"name": "my-crawler",
"main": "src/index.ts",
"compatibility_date": "2024-12-01",
"durable_objects": {
"bindings": [
{
"name": "ENRICHMENT_CONTROLLER",
"class_name": "EnrichmentController"
}
]
},
"migrations": [
{
"tag": "v1",
"new_classes": ["EnrichmentController"]
}
],
// D1 database for the work queue
"d1_databases": [
{
"binding": "DB",
"database_name": "my-crawler-db",
"database_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
],
// R2 bucket for caching scraped HTML
"r2_buckets": [
{
"binding": "HTML_CACHE",
"bucket_name": "my-crawler-html-cache"
}
]
}
Key points:
migrationsare required when first deploying a DO class. Thenew_classesarray tells Cloudflare to create the DO namespace.- If you later rename the class, use
renamed_classesin a new migration tag. - The DO’s storage backend defaults to the SQLite storage backend on new classes.
- There is no need to declare the alarm handler — it is automatically available on any DO class that extends
DurableObject.
Official Cloudflare Documentation
- Durable Objects Overview — Introduction to Durable Objects concepts and capabilities
- Durable Objects Alarms API — Complete reference for
setAlarm,getAlarm,deleteAlarm, and thealarm()handler - Durable Objects Data Location — Location hints, jurisdictions, and how DO placement works
- Rules of Durable Objects — Best practices guide covering
blockConcurrencyWhile, state management, and common pitfalls - Access Durable Objects Storage — Patterns for reading/writing DO storage efficiently
- Durable Object State API — Reference for
DurableObjectStateincluding storage, alarm, and concurrency methods - Durable Object Namespace API — How to create and address DO instances with
idFromName,get, and location hints - Durable Object Base Class — The
DurableObjectbase class that all DOs extend - SQLite-backed Durable Object Storage — SQLite storage backend for DOs
- Durable Objects Pricing — Billing model for compute duration and storage
- Workers Cron Triggers — Cron trigger configuration and limitations (3 per Worker, 1-minute minimum)
- Workers Platform Limits — CPU time limits, memory limits, and other constraints
Cloudflare Blog Posts
- Durable Objects Alarms — a wake-up call for your applications — Introduction of the Alarms API with use cases and examples
- Building a scheduling system with Workers and Durable Objects — Reference architecture for using DOs as a scheduling system
- Control and Data Plane Pattern for Durable Objects — Reference architecture showing the control plane / data plane separation
Rate Limiting and Backoff Theory
- Exponential Backoff — Wikipedia — Formal description of exponential backoff, binary exponential backoff, and jitter strategies
- Token Bucket Algorithm — Wikipedia — The token bucket rate limiting algorithm with formal analysis
- API Rate Limiting at Scale: Patterns, Failures, and Control Strategies — Comprehensive overview of rate limiting strategies for API management
- From Token Bucket to Sliding Window: Rate Limiting Algorithms — Comparison of rate limiting algorithms with implementation guidance
- Rate Limiting Algorithms Explained with Code — Practical implementations of fixed window, sliding window, token bucket, and leaky bucket
- Dealing with Rate Limiting Using Exponential Backoff — Practical guide to handling rate limits in web scraping
Community Resources
- The Ultimate Guide to Cloudflare’s Durable Objects — Comprehensive tutorial covering DO concepts, patterns, and production tips
- Cloudflare Durable Objects Reference Sheet — Quick reference for DO APIs and patterns
- Respect API Rate Limits With a Backoff — Practical guide to implementing respectful API consumption
Related Concepts
- Hysteresis — Wikipedia — The principle behind the “dead zone” in adaptive thresholds, preventing oscillation
- Cloudflare Actors Library (Beta) — New SDK for Durable Objects with alarm helpers for scheduling
Built from production code running on Cloudflare Workers, processing thousands of Apple App Store records daily with zero manual intervention. The adaptive controller pattern has maintained a 97%+ success rate across 6 months of continuous operation, automatically navigating Apple’s varying rate limit enforcement.