Org Status: π‘ Dormant Cloudflare: N/A Last Audited: 2026-04-28
Event-Driven Architecture on Cloudflare Workers
βββββββββββββββββββββββ
β Brand Agent β
β (Durable Object) β
β β
β state + strategy β
β decides WHAT + WHY β
ββββ¬βββββββββββββββ¬βββββ
β β
command β β command
βΌ βΌ
ββββββββββββββββββ ββββββββββββββββββ
β RESEARCH_QUEUEβ β PUBLISH_QUEUE β
βββββββββ¬βββββββββ βββββββββ¬βββββββββ
β β
βΌ βΌ
ββββββββββββββββ ββββββββββββββββ
β GatherFeed β β Pages-plus β
β (research) β β (publisher) β
ββββββββ¬ββββββββ ββββββββ¬ββββββββ
β β
event β β event
βΌ βΌ
ββββββββββββββββββββββββββββββββββ
β EVENTS_QUEUE β
β (fan-out consumer routes to β
β all interested services) β
ββββββββββββββββββββββββββββββββββ
Stop calling fetch() between your Workers. Use Queues.
This is a reference architecture for building event-driven systems on Cloudflareβs developer platform β Workers, Queues, Durable Objects, and the Agents SDK. It covers the patterns that work, the anti-patterns that donβt, and the Cloudflare-specific constraints you need to design around.
Table of Contents
Open Table of Contents
- The Problem with Imperative Architectures
- Two Patterns, Not One
- Cloudflare Primitives
- The Message Envelope
- Queue Topology
- Fan-Out Pattern
- Idempotency
- The Outbox Pattern
- Consumer Middleware
- Agents as Event Reactors
- Durable Workflows
- The Event Catalog
- Small Patterns That Add Up
- Delayed Retry with Backpressure
- Batch Sending
- Event Deduplication in Agents
- One Consumer, Multiple Queues
- Correlation ID Propagation
- Scheduled Agent Cycle
- Dead Letter Queue Monitor
- Read-Only Cross-Service Data Access
- Schema Migration for Event Tables
- Testing Queues Locally with Miniflare
- Workflow Progress to WebSocket Clients
- Conditional Fan-Out
- Graceful Queue Consumer with Batch Acknowledgment
- What Not to Do
- What You Donβt Need
- Cloudflare Constraints
- Compared: Message Queues
- Compared: Durable Execution
- Compared: Stateful Compute
- Compared: Full Stack Architectures
- Full Example: Content Pipeline
- References
- License
The Problem with Imperative Architectures
This is how most people wire up Cloudflare Workers:
// Service A calls Service B, waits, then calls Service C
const research = await fetch("https://gatherfeed.workers.dev/api/v1/research", {
method: "POST",
body: JSON.stringify({ keyword: "best budgeting apps" }),
});
const data = await research.json();
const article = await fetch("https://content-engine.workers.dev/v1/generate", {
method: "POST",
body: JSON.stringify({ research: data }),
});
const content = await article.json();
await fetch("https://publisher.workers.dev/v1/publish", {
method: "POST",
body: JSON.stringify({ content }),
});
This is synchronous, imperative, and fragile. Every problem with microservices shows up:
- Temporal coupling. If GatherFeed is slow, the entire chain blocks. If itβs down, everything fails.
- Tight coupling. Service A knows the URL, method, and payload shape of every downstream service.
- No retry isolation. A failure in step 3 means re-running steps 1 and 2.
- No partial progress. If the Worker hits the CPU time limit mid-chain, all work is lost.
- Cost amplification. One slow downstream service holds your Worker awake (and billable) while it waits.
The fix isnβt better error handling. Itβs a different architecture.
Two Patterns, Not One
People conflate βevent-drivenβ and βmessage-driven.β Theyβre different, and you usually want both.
Message-Driven (Commands)
A producer sends a message to a specific consumer. The producer knows whoβs receiving it.
"Hey GatherFeed, research these keywords."
This is a command. Itβs directed. It tells a service what to do. The producer is coupled to the consumer β it knows the destination queue exists and what the consumer expects.
Event-Driven (Facts)
A producer emits a fact about something that happened. The producer doesnβt know or care whoβs listening.
"Research completed for brand X. 50 keywords ready."
This is an event. Itβs broadcast. It describes what happened, not what should happen next. Consumers subscribe. The producer is decoupled β add or remove consumers without touching the producer.
When to Use Which
| Pattern | Use When | Cloudflare Primitive |
|---|---|---|
| Command (message) | You need a specific service to do a specific thing | Queue with dedicated consumer |
| Event (fact) | Multiple services might care about what happened | Shared events queue with fan-out consumer |
In practice, the flow is: commands trigger work β work produces events β events trigger decisions β decisions produce commands. Thatβs the loop.
Cloudflare Primitives
Cloudflare provides four building blocks for event-driven systems. Each has a specific role:
Cloudflare Queues
The message backbone between Workers. Producer Workers write messages; consumer Workers process them.
| Property | Value |
|---|---|
| Delivery guarantee | At-least-once |
| Ordering | No guarantee |
| Max message size | 128 KB |
| Throughput | 5,000 messages/sec per queue |
| Consumer concurrency | Up to 250 auto-scaling consumers |
| Max retry | Configurable (default 3) |
| Dead letter queue | Supported |
| Message delay | 0β86,400 seconds (24 hours) |
The critical constraint: one consumer Worker per queue. This isnβt a limitation if you design for it β it actually simplifies reasoning about message ownership.
Cloudflare Agents SDK
Persistent, stateful agents built on Durable Objects. Each agent has:
- SQLite database (
this.sql) β queryable, durable storage - Key-value state (
this.state) β syncs to connected WebSocket clients in real-time - Scheduling (
this.schedule()) β cron expressions or one-time delays - Internal queue (
this.queue()) β sequential background task processing - Workflow integration (
this.runWorkflow()) β trigger durable multi-step processes
Agents are the decision-makers. They react to events, maintain strategy, and emit commands.
Cloudflare Workflows (AgentWorkflow)
Durable multi-step execution integrated with Agents:
- Steps are checkpointed β a crash resumes from the last completed step
- Each step retries independently with configurable backoff
- Can wait for external events for up to one year (
waitForApproval) - Steps can update agent state, broadcast to WebSocket clients, call agent methods via RPC
Workflows are for processes that must complete: research β generate β publish. Not for fire-and-forget.
Event Subscriptions
Native Cloudflare service events (R2, KV, Workers AI, Workflows) published to Queues automatically:
npx wrangler queues subscription create my-queue --source r2 --events bucket.created
Events follow a standard structure:
{
"type": "cf.r2.bucket.created",
"source": { "type": "r2" },
"payload": { "name": "my-bucket", "location": "WNAM" },
"metadata": { "accountId": "...", "eventTimestamp": "2026-03-11T10:00:00Z" }
}
This is true event-driven β the emitter doesnβt know whoβs subscribed.
The Message Envelope
Every message through any queue follows this shape:
interface DomainMessage<T = unknown> {
event_id: string; // UUID v4 β deduplication key
type: string; // dot-notation: "research.requested", "content.published"
source: string; // emitting service: "scalable-media", "gatherfeed"
timestamp: string; // ISO 8601
correlation_id?: string; // traces a chain of related messages
payload: T; // reference data β IDs, not objects
}
Rules:
event_idis mandatory. At-least-once delivery means duplicates happen. This is your deduplication key.typeuses dot-notation. Domain-first:research.requested, notrequested_research. The domain is the noun, the action is the verb.- Payloads carry references, not data.
{ research_id: "abc" }, not{ full_research_object: {...} }. The data lives in D1 or R2. Messages are signals, databases are state. correlation_idtraces causality. A brand cycle generates research commands, which produce completion events, which trigger generation commands. The correlation ID ties them together for debugging.
Queue Topology
Each service owns its inbound command queue. There is one shared events queue with a fan-out consumer.
COMMAND QUEUES (directed, one consumer each)
βββββββββββββββββββββββββββββββββββββββββββ
β β
ββββββββββββββββββββ€ gatherfeed-commands β GatherFeed β
β β sm-commands β Scalable Media β
β Producers β publish-commands β Pages-plus β
β (any service) β social-commands β Social-good β
β β β
β βββββββββββββββββββββββββββββββββββββββββββ€
β β β
β β EVENT QUEUE (broadcast, fan-out) β
ββββββββββββββββββββ€ brand-events β Fan-out consumer β
β β routes to all β
β interested queues β
βββββββββββββββββββββββββββββββββββββββββββ
Wrangler Configuration
Producer side (e.g., Scalable Media):
{
"queues": {
"producers": [
{ "queue": "gatherfeed-commands", "binding": "RESEARCH_QUEUE" },
{ "queue": "publish-commands", "binding": "PUBLISH_QUEUE" },
{ "queue": "brand-events", "binding": "EVENTS_QUEUE" }
],
"consumers": [
{
"queue": "sm-commands",
"max_batch_size": 10,
"max_batch_timeout": 5,
"dead_letter_queue": "sm-commands-dlq"
}
]
}
}
Consumer side (e.g., GatherFeed):
{
"queues": {
"producers": [
{ "queue": "brand-events", "binding": "EVENTS_QUEUE" }
],
"consumers": [
{
"queue": "gatherfeed-commands",
"max_batch_size": 5,
"max_batch_timeout": 10,
"dead_letter_queue": "gatherfeed-commands-dlq"
}
]
}
}
Dead letter queues are mandatory. A message that fails max_retries times goes to the DLQ instead of being silently dropped. Monitor the DLQ β itβs your system telling you something is broken.
Fan-Out Pattern
Cloudflare Queues supports one consumer per queue. To deliver the same event to multiple services, use a fan-out consumer β a Worker that reads from the shared events queue and re-publishes to destination queues:
// fan-out-consumer/src/index.ts
export default {
async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
for (const msg of batch.messages) {
const routes = getRoutes(msg.body.type);
await Promise.all(
routes.map((queue) => queue.send(msg.body))
);
msg.ack();
}
},
};
function getRoutes(eventType: string): Queue[] {
// Route table β add new subscribers here, not in the producer
const routes: Record<string, Queue[]> = {
"research.completed": [env.SM_COMMANDS, env.ANALYTICS_QUEUE],
"content.published": [env.SM_COMMANDS, env.SOCIAL_COMMANDS],
"content.performed": [env.SM_COMMANDS],
};
return routes[eventType] ?? [];
}
The routing logic is JavaScript, not config. Add a subscriber by adding a line to the route table. Remove one by deleting the line. No producer changes needed.
This is the key advantage over static topic-based routing (Kafka, RabbitMQ): the routing is programmable. You can route based on payload fields, time of day, feature flags β anything.
Idempotency
Cloudflare Queues delivers at-least-once. This means your consumer will see duplicate messages. Not might. Will.
Every queue consumer must be safe to run twice with the same input. There are three strategies:
Strategy 1: Processed Events Table
The general-purpose approach. Check before acting:
async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
for (const msg of batch.messages) {
const { event_id } = msg.body;
// Already processed?
const existing = await env.DB.prepare(
"SELECT 1 FROM processed_events WHERE event_id = ?"
).bind(event_id).first();
if (existing) {
msg.ack();
continue;
}
try {
await handleMessage(msg.body, env);
await env.DB.prepare(
"INSERT INTO processed_events (event_id, type, processed_at) VALUES (?, ?, ?)"
).bind(event_id, msg.body.type, new Date().toISOString()).run();
msg.ack();
} catch (err) {
msg.retry({ delaySeconds: 30 });
}
}
}
Strategy 2: Natural Business Key
For writes where a natural key exists:
-- If we already have research for this keyword, skip
INSERT OR IGNORE INTO research (keyword, brand_slug, data, created_at)
VALUES (?, ?, ?, ?);
The INSERT OR IGNORE (or ON CONFLICT DO NOTHING) makes the write idempotent at the database level. No separate tracking table needed.
Strategy 3: Upstream Deduplication
For external API calls, check if the result already exists before calling:
async function researchKeyword(keyword: string, env: Env): Promise<void> {
// Already have fresh research?
const existing = await env.DB.prepare(
"SELECT 1 FROM research WHERE keyword = ? AND created_at > datetime('now', '-7 days')"
).bind(keyword).first();
if (existing) return; // Skip β don't burn an API call
const result = await callPerplexityViaApiMom(keyword, env);
await env.DB.prepare("INSERT INTO research ...").bind(...).run();
}
This saves money (no redundant API calls) and is naturally idempotent.
Pruning
Processed event records donβt need to live forever. Prune after 7 days:
DELETE FROM processed_events WHERE processed_at < datetime('now', '-7 days');
Run this on a cron schedule or inside a scheduled Agent task.
The Outbox Pattern
Thereβs a subtle failure mode: your D1 write succeeds, but the Queue publish fails (network issue, Worker CPU limit). Now your database says βresearch completeβ but no event was emitted. Downstream services never find out.
The fix is the outbox pattern: write the event to an outbox table in the same D1 transaction as your business data. A separate process polls the outbox and publishes to Queues.
// In your queue consumer β write data AND outbox event in one transaction
await env.DB.batch([
env.DB.prepare(
"INSERT INTO research (id, keyword, data) VALUES (?, ?, ?)"
).bind(id, keyword, JSON.stringify(data)),
env.DB.prepare(
"INSERT INTO outbox (event_id, type, payload, created_at) VALUES (?, ?, ?, ?)"
).bind(
crypto.randomUUID(),
"research.completed",
JSON.stringify({ brand_slug, research_ids: [id] }),
new Date().toISOString()
),
]);
A scheduled task (cron or Agent schedule) publishes outbox events:
async function flushOutbox(env: Env): Promise<void> {
const pending = await env.DB.prepare(
"SELECT * FROM outbox WHERE published_at IS NULL ORDER BY created_at LIMIT 50"
).all();
for (const row of pending.results) {
await env.EVENTS_QUEUE.send({
event_id: row.event_id,
type: row.type,
source: "gatherfeed",
timestamp: row.created_at,
payload: JSON.parse(row.payload),
});
await env.DB.prepare(
"UPDATE outbox SET published_at = ? WHERE event_id = ?"
).bind(new Date().toISOString(), row.event_id).run();
}
}
This guarantees that if the data was written, the event will eventually be published. The consumerβs idempotency handling covers the case where the event publishes twice (outbox flushed, but the published_at update failed).
Consumer Middleware
Borrowed from Watermillβs middleware pattern. Every queue handler should be wrapped in composable middleware:
type MessageHandler = (msg: DomainMessage, env: Env) => Promise<void>;
type Middleware = (next: MessageHandler) => MessageHandler;
// Deduplication middleware
function withDedup(db: D1Database): Middleware {
return (next) => async (msg, env) => {
const exists = await db.prepare(
"SELECT 1 FROM processed_events WHERE event_id = ?"
).bind(msg.event_id).first();
if (exists) return;
await next(msg, env);
await db.prepare(
"INSERT INTO processed_events (event_id, type, processed_at) VALUES (?, ?, ?)"
).bind(msg.event_id, msg.body.type, new Date().toISOString()).run();
};
}
// Logging middleware
function withLogging(logger: Logger): Middleware {
return (next) => async (msg, env) => {
logger.info({ event_id: msg.event_id, type: msg.type }, "processing");
try {
await next(msg, env);
logger.info({ event_id: msg.event_id }, "completed");
} catch (err) {
logger.error({ event_id: msg.event_id, error: err.message }, "failed");
throw err;
}
};
}
// Compose middleware
function pipe(...middlewares: Middleware[]): (handler: MessageHandler) => MessageHandler {
return (handler) => middlewares.reduceRight((next, mw) => mw(next), handler);
}
// Usage
const processResearch = pipe(
withDedup(env.DB),
withLogging(logger),
)(async (msg, env) => {
// Pure business logic β no boilerplate
const { keywords, brand_slug } = msg.payload;
for (const kw of keywords) {
await researchKeyword(kw, brand_slug, env);
}
});
Every handler gets deduplication, logging, and error tracking for free. Add metrics, rate limiting, or correlation tracking by adding a middleware. The business logic stays clean.
Agents as Event Reactors
The Cloudflare Agents SDK provides persistent, stateful Durable Objects that are perfect for the decision-maker role in an event-driven system.
A BrandAgent doesnβt do the work. It decides what work needs doing, based on its accumulated state.
import { Agent } from "agents";
interface BrandState {
brand_slug: string;
keywords_researched: number;
articles_published: number;
last_research_cycle: string | null;
last_publish_cycle: string | null;
pending_generation: number;
}
export class BrandAgent extends Agent<Env, BrandState> {
initialState: BrandState = {
brand_slug: "",
keywords_researched: 0,
articles_published: 0,
last_research_cycle: null,
last_publish_cycle: null,
pending_generation: 0,
};
// Scheduled: wake up and assess what needs doing
async discoveryCheck() {
const daysSinceResearch = this.daysSince(this.state.last_research_cycle);
if (daysSinceResearch > 7) {
// Emit command β don't do the research here
await this.env.RESEARCH_QUEUE.send({
event_id: crypto.randomUUID(),
type: "research.requested",
source: "scalable-media",
timestamp: new Date().toISOString(),
correlation_id: `cycle-${this.state.brand_slug}-${Date.now()}`,
payload: {
brand_slug: this.state.brand_slug,
keywords: await this.getTargetKeywords(),
priority: "normal",
},
});
this.setState({
...this.state,
last_research_cycle: new Date().toISOString(),
});
}
}
// React to event: research is done
async onResearchCompleted(event: DomainMessage<ResearchCompletedPayload>) {
// Read research from GatherFeed's database
const research = await this.fetchResearchByIds(event.payload.research_ids);
// Apply strategy: filter, score, rank
const candidates = this.evaluateKeywords(research);
// Emit generation commands for the best candidates
for (const candidate of candidates) {
await this.env.CONTENT_QUEUE.send({
event_id: crypto.randomUUID(),
type: "content.generate",
source: "scalable-media",
timestamp: new Date().toISOString(),
correlation_id: event.correlation_id,
payload: {
brand_slug: this.state.brand_slug,
keyword: candidate.keyword,
research_id: candidate.research_id,
template: "pseo-article",
},
});
}
this.setState({
...this.state,
keywords_researched: this.state.keywords_researched + research.length,
pending_generation: candidates.length,
});
}
// React to event: content was published
async onContentPublished(event: DomainMessage<ContentPublishedPayload>) {
this.setState({
...this.state,
articles_published: this.state.articles_published + 1,
pending_generation: Math.max(0, this.state.pending_generation - 1),
last_publish_cycle: new Date().toISOString(),
});
// Schedule a performance check in 7 days
await this.schedule(
7 * 24 * 60 * 60, // seconds
"checkPerformance",
{ urls: event.payload.urls }
);
}
private evaluateKeywords(research: Research[]): Candidate[] {
// Strategy logic: difficulty < 40, volume > 100, not already published
return research
.filter((r) => r.difficulty < 40 && r.volume > 100)
.filter((r) => !this.isAlreadyPublished(r.keyword))
.sort((a, b) => b.volume / b.difficulty - a.volume / a.difficulty)
.slice(0, 10);
}
}
The agentβs core loop is Decide β Evolve β Project (borrowed from Emmettβs event sourcing patterns):
- Decide: Given current state + incoming event β what commands to emit?
- Evolve: Update state based on what happened.
- Project: Build read models (published count, keyword performance, content calendar).
The agent never calls Perplexity. Never calls Gemini. Never publishes HTML. It decides, records, and commands. Everything else is someone elseβs job.
Durable Workflows
For multi-step processes that must complete, use AgentWorkflow. Each step is checkpointed β a crash resumes from the last completed step, not from the beginning.
import { AgentWorkflow } from "agents/workflows";
import type { AgentWorkflowEvent, AgentWorkflowStep } from "agents/workflows";
type GenerateParams = {
brand_slug: string;
keyword: string;
research_id: string;
template: string;
};
export class ContentWorkflow extends AgentWorkflow<BrandAgent, GenerateParams> {
async run(event: AgentWorkflowEvent<GenerateParams>, step: AgentWorkflowStep) {
const { brand_slug, keyword, research_id, template } = event.payload;
// Step 1: Fetch research (durable β won't re-run on retry)
const research = await step.do("fetch-research", {
retries: { limit: 3, delay: "5 seconds", backoff: "exponential" },
timeout: "30 seconds",
}, async () => {
return await fetchResearchFromGatherFeed(research_id, this.env);
});
this.reportProgress({ step: "research", status: "complete", percent: 0.2 });
// Step 2: Generate outline
const outline = await step.do("generate-outline", {
retries: { limit: 3, delay: "10 seconds", backoff: "exponential" },
timeout: "2 minutes",
}, async () => {
return await callGeminiViaApiMom("outline", { keyword, research, template }, this.env);
});
this.reportProgress({ step: "outline", status: "complete", percent: 0.4 });
// Step 3: Draft article
const draft = await step.do("draft-article", {
retries: { limit: 3, delay: "10 seconds", backoff: "exponential" },
timeout: "5 minutes",
}, async () => {
return await callGeminiViaApiMom("draft", { outline, research }, this.env);
});
this.reportProgress({ step: "draft", status: "complete", percent: 0.7 });
// Step 4: Editorial pass
const article = await step.do("editorial-pass", {
retries: { limit: 2, delay: "10 seconds", backoff: "exponential" },
timeout: "3 minutes",
}, async () => {
return await callGeminiViaApiMom("editorial", { draft, keyword }, this.env);
});
this.reportProgress({ step: "editorial", status: "complete", percent: 0.9 });
// Step 5: Store and emit publish command
await step.do("store-and-publish", {
retries: { limit: 3, delay: "5 seconds", backoff: "exponential" },
}, async () => {
const contentId = await storeArticle(article, brand_slug, this.env);
await this.env.PUBLISH_QUEUE.send({
event_id: crypto.randomUUID(),
type: "content.publish",
source: "scalable-media",
timestamp: new Date().toISOString(),
payload: { brand_slug, content_id: contentId, keyword },
});
});
await step.reportComplete({ keyword, brand_slug });
}
}
Wrangler configuration for the workflow:
{
"durable_objects": {
"bindings": [
{ "name": "BRAND_AGENT", "class_name": "BrandAgent" }
]
},
"workflows": [
{
"name": "content-workflow",
"binding": "CONTENT_WORKFLOW",
"class_name": "ContentWorkflow"
}
],
"migrations": [
{ "tag": "v1", "new_sqlite_classes": ["BrandAgent"] }
]
}
If step 3 (drafting) crashes because Gemini is down, the workflow retries step 3 β it doesnβt re-fetch research or re-generate the outline. Steps 1 and 2 are already checkpointed.
The Event Catalog
A shared, typed registry of every event in the system. Every service imports from it. This prevents message schema drift.
// packages/events/src/index.ts
// βββ Base ββββββββββββββββββββββββββββββββββββββββββββ
export interface DomainMessage<T = unknown> {
event_id: string;
type: string;
source: string;
timestamp: string;
correlation_id?: string;
payload: T;
}
// βββ Research Domain βββββββββββββββββββββββββββββββββ
export interface ResearchRequestedPayload {
brand_slug: string;
keywords: string[];
priority: "low" | "normal" | "high";
}
export interface ResearchCompletedPayload {
brand_slug: string;
research_ids: string[];
keywords_researched: number;
}
// βββ Content Domain ββββββββββββββββββββββββββββββββββ
export interface ContentGeneratePayload {
brand_slug: string;
keyword: string;
research_id: string;
template: string;
}
export interface ContentReadyPayload {
brand_slug: string;
content_id: string;
keyword: string;
word_count: number;
}
export interface ContentPublishPayload {
brand_slug: string;
content_id: string;
keyword: string;
}
export interface ContentPublishedPayload {
brand_slug: string;
urls: string[];
published_at: string;
}
// βββ Performance Domain ββββββββββββββββββββββββββββββ
export interface PerformanceReportPayload {
brand_slug: string;
url: string;
impressions: number;
clicks: number;
position: number;
period: string;
}
// βββ Type Map ββββββββββββββββββββββββββββββββββββββββ
export type EventTypeMap = {
"research.requested": ResearchRequestedPayload;
"research.completed": ResearchCompletedPayload;
"content.generate": ContentGeneratePayload;
"content.ready": ContentReadyPayload;
"content.publish": ContentPublishPayload;
"content.published": ContentPublishedPayload;
"performance.report": PerformanceReportPayload;
};
// βββ Helper ββββββββββββββββββββββββββββββββββββββββββ
export function createMessage<K extends keyof EventTypeMap>(
type: K,
source: string,
payload: EventTypeMap[K],
correlationId?: string
): DomainMessage<EventTypeMap[K]> {
return {
event_id: crypto.randomUUID(),
type,
source,
timestamp: new Date().toISOString(),
correlation_id: correlationId,
payload,
};
}
Usage in any service:
import { createMessage } from "@brand-engine/events";
await env.RESEARCH_QUEUE.send(
createMessage("research.requested", "scalable-media", {
brand_slug: "niche-fi",
keywords: ["best budgeting apps", "compound interest calculator"],
priority: "normal",
}, correlationId)
);
Type-safe. Auto-completed. If the schema changes, every consumer gets a compile error.
Small Patterns That Add Up
These are the building blocks. Each is small enough to drop into any Worker.
Delayed Retry with Backpressure
When a downstream service is overloaded, donβt hammer it. Delay the retry:
async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
for (const msg of batch.messages) {
try {
await handleMessage(msg.body, env);
msg.ack();
} catch (err) {
if (err instanceof RateLimitError) {
// Back off β re-deliver in 60 seconds
msg.retry({ delaySeconds: 60 });
} else if (err instanceof TransientError) {
// Standard retry β re-deliver in 10 seconds
msg.retry({ delaySeconds: 10 });
} else {
// Permanent failure β let it DLQ after max retries
msg.retry();
}
}
}
}
Batch Sending
Queues support sendBatch β send up to 100 messages in one call. Use this when generating many commands at once:
// BrandAgent discovers 50 keywords, sends them all at once
const messages = keywords.map((kw) => ({
body: createMessage("research.requested", "scalable-media", {
brand_slug: "niche-fi",
keywords: [kw],
priority: "normal",
}, correlationId),
}));
await env.RESEARCH_QUEUE.sendBatch(messages);
Event Deduplication in Agents
The Agent SDKβs webhook guide recommends this pattern β track seen event IDs in the agentβs SQLite:
class BrandAgent extends Agent<Env, BrandState> {
private async isProcessed(eventId: string): Promise<boolean> {
const row = this.sql`
SELECT 1 FROM processed_events WHERE event_id = ${eventId}
`.toArray();
return row.length > 0;
}
private async markProcessed(eventId: string, type: string): Promise<void> {
this.sql`
INSERT INTO processed_events (event_id, type, processed_at)
VALUES (${eventId}, ${type}, ${new Date().toISOString()})
`;
}
async handleEvent(event: DomainMessage) {
if (await this.isProcessed(event.event_id)) return;
// ... handle the event ...
await this.markProcessed(event.event_id, event.type);
}
}
One Consumer, Multiple Queues
A single Worker can consume from multiple queues. Use batch.queue to route:
export default {
async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
switch (batch.queue) {
case "sm-commands":
await handleCommands(batch, env);
break;
case "sm-events":
await handleEvents(batch, env);
break;
case "sm-commands-dlq":
await handleDeadLetters(batch, env);
break;
}
},
};
Correlation ID Propagation
Every command inherits the correlation ID from its triggering event. This lets you trace an entire cycle:
// Event arrives: research.completed with correlation_id "cycle-niche-fi-1710..."
async onResearchCompleted(event: DomainMessage<ResearchCompletedPayload>) {
// Commands inherit the correlation_id
await env.CONTENT_QUEUE.send(
createMessage("content.generate", "scalable-media", {
brand_slug: event.payload.brand_slug,
keyword: "best budgeting apps",
research_id: "r_001",
template: "pseo-article",
}, event.correlation_id) // β passed through
);
}
// Later, in the DLQ handler, you can find every message in the chain:
// SELECT * FROM processed_events WHERE correlation_id = 'cycle-niche-fi-1710...'
Scheduled Agent Cycle
An agent that wakes up on a cron, checks what needs doing, and emits commands:
class BrandAgent extends Agent<Env, BrandState> {
// Called by this.schedule("0 */6 * * *", "cycle")
async cycle() {
const state = this.state;
// Check: do we need research?
if (this.daysSince(state.last_research_cycle) > 7) {
await this.requestResearch();
}
// Check: do we have unpublished content?
const unpublished = this.sql`
SELECT COUNT(*) as count FROM content WHERE published_at IS NULL
`.toArray();
if (unpublished[0].count > 0) {
await this.publishPending();
}
// Check: do we need performance review?
if (this.daysSince(state.last_performance_check) > 14) {
await this.requestPerformanceData();
}
}
}
Dead Letter Queue Monitor
A simple DLQ consumer that logs failures and alerts:
// dlq-monitor/src/index.ts
export default {
async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
for (const msg of batch.messages) {
// Log the failure with full context
console.error(JSON.stringify({
level: "error",
message: "DLQ message received",
event_id: msg.body.event_id,
type: msg.body.type,
source: msg.body.source,
correlation_id: msg.body.correlation_id,
timestamp: msg.body.timestamp,
attempts: msg.attempts,
}));
// Store for investigation
await env.DB.prepare(
"INSERT INTO dlq_messages (event_id, type, source, payload, received_at) VALUES (?, ?, ?, ?, ?)"
).bind(
msg.body.event_id,
msg.body.type,
msg.body.source,
JSON.stringify(msg.body.payload),
new Date().toISOString()
).run();
msg.ack(); // Acknowledge so it doesn't loop
}
},
};
Read-Only Cross-Service Data Access
Services can expose read-only HTTP APIs for data retrieval. This is the only permitted synchronous cross-service call:
// GatherFeed exposes a read API β no auth needed for internal reads,
// or use service binding for zero-network-hop access
app.get("/api/v1/research/:id", async (c) => {
const research = await c.env.DB.prepare(
"SELECT * FROM research WHERE id = ?"
).bind(c.req.param("id")).first();
if (!research) return c.json({ error: "not found" }, 404);
return c.json(research);
});
// BrandAgent reads GatherFeed's data when deciding what to generate
// This is a query, not a command β it's fine to be synchronous
async fetchResearchByIds(ids: string[]): Promise<Research[]> {
const results = await Promise.all(
ids.map((id) =>
fetch(`${this.env.GATHERFEED_URL}/api/v1/research/${id}`)
.then((r) => r.json())
)
);
return results;
}
Schema Migration for Event Tables
Every service that consumes from queues needs these tables:
-- processed_events: idempotency tracking
CREATE TABLE IF NOT EXISTS processed_events (
event_id TEXT PRIMARY KEY,
type TEXT NOT NULL,
processed_at TEXT NOT NULL
);
CREATE INDEX idx_processed_events_at ON processed_events(processed_at);
-- outbox: guaranteed event publication
CREATE TABLE IF NOT EXISTS outbox (
event_id TEXT PRIMARY KEY,
type TEXT NOT NULL,
source TEXT NOT NULL,
payload TEXT NOT NULL,
correlation_id TEXT,
created_at TEXT NOT NULL,
published_at TEXT
);
CREATE INDEX idx_outbox_unpublished ON outbox(published_at) WHERE published_at IS NULL;
-- dlq_messages: dead letter queue investigation
CREATE TABLE IF NOT EXISTS dlq_messages (
event_id TEXT PRIMARY KEY,
type TEXT NOT NULL,
source TEXT NOT NULL,
payload TEXT NOT NULL,
received_at TEXT NOT NULL,
investigated_at TEXT
);
Testing Queues Locally with Miniflare
You can test queue producers and consumers locally without deploying:
import { Miniflare } from "miniflare";
const mf = new Miniflare({
workers: [
{
name: "producer",
modules: true,
script: `
export default {
async fetch(request, env) {
await env.QUEUE.send({ event_id: "test-1", type: "research.requested" });
return new Response("sent");
}
}
`,
queueProducers: { QUEUE: "research-commands" },
},
{
name: "consumer",
modules: true,
script: `
export default {
async queue(batch) {
for (const msg of batch.messages) {
console.log("received:", msg.body.type);
msg.ack();
}
}
}
`,
queueConsumers: { "research-commands": { maxBatchTimeout: 1 } },
},
],
});
// Trigger the producer
const resp = await mf.dispatchFetch("http://localhost");
console.log(await resp.text()); // "sent"
// Consumer logs: "received: research.requested"
Workflow Progress to WebSocket Clients
AgentWorkflow can broadcast progress to connected dashboard clients in real-time:
class ContentWorkflow extends AgentWorkflow<BrandAgent, GenerateParams> {
async run(event: AgentWorkflowEvent<GenerateParams>, step: AgentWorkflowStep) {
// Non-durable: broadcasts to all WebSocket clients
this.broadcastToClients({
type: "workflow-started",
keyword: event.payload.keyword,
});
const outline = await step.do("outline", { /* ... */ }, async () => {
return await generateOutline(event.payload, this.env);
});
// Progress update β clients see this in real-time
this.reportProgress({ step: "outline", status: "complete", percent: 0.3 });
const draft = await step.do("draft", { /* ... */ }, async () => {
return await generateDraft(outline, this.env);
});
// Durable state update β persists AND broadcasts
await step.mergeAgentState({
currentWorkflow: { keyword: event.payload.keyword, step: "editorial", percent: 0.7 },
});
// ... continue ...
}
}
// Client-side React hook receives all updates automatically:
// const agent = useAgent({ agent: "brand-agent", name: "niche-fi",
// onStateUpdate: (s) => setProgress(s.currentWorkflow)
// });
Conditional Fan-Out
Route events based on payload content, not just event type:
function getRoutes(event: DomainMessage): Queue[] {
const routes: Queue[] = [];
// Type-based routing
if (event.type === "research.completed") {
routes.push(env.SM_COMMANDS);
}
if (event.type === "content.published") {
routes.push(env.SM_COMMANDS);
// Only fan out to social if the brand has social enabled
if (event.payload.brand_slug !== "internal-tools") {
routes.push(env.SOCIAL_COMMANDS);
}
// Only fan out to analytics in production
if (env.ENVIRONMENT === "production") {
routes.push(env.ANALYTICS_QUEUE);
}
}
// Priority-based routing
if (event.payload.priority === "high") {
routes.push(env.ALERTS_QUEUE);
}
return routes;
}
Graceful Queue Consumer with Batch Acknowledgment
Process a batch, acknowledge the good messages, retry the bad ones:
async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
// Don't use batch.ackAll() or batch.retryAll()
// Handle each message individually for granularity
for (const msg of batch.messages) {
try {
await processMessage(msg.body, env);
msg.ack(); // This one is done
} catch (err) {
if (isRetryable(err)) {
msg.retry({ delaySeconds: computeBackoff(msg.attempts) });
} else {
// Log the permanent failure, ack so it goes to DLQ
console.error(`Permanent failure for ${msg.body.event_id}:`, err);
msg.ack(); // Will hit DLQ after max retries anyway
}
}
}
}
function computeBackoff(attempts: number): number {
// Exponential backoff: 5s, 10s, 20s, 40s, capped at 300s
return Math.min(5 * Math.pow(2, attempts), 300);
}
function isRetryable(err: unknown): boolean {
return err instanceof TransientError ||
(err instanceof Response && err.status >= 500);
}
What Not to Do
| Anti-Pattern | Why It Fails | Do This Instead |
|---|---|---|
| Large payloads in messages | 128KB limit. Queues are for signals, not data transfer. | Put data in D1/R2. Put a reference ID in the message. |
| Assuming message order | Queues donβt guarantee order. content.published can arrive before content.ready. | Design every handler to work regardless of arrival order. Use state to reconcile. |
| Sync disguised as async | Sending a command then polling for the result is just HTTP with extra latency. | Let the completion event come to you. React, donβt poll. |
| Processing without dedup | At-least-once will bite you. Duplicate research calls, duplicate articles, duplicate costs. | Check event_id before every action. Use INSERT OR IGNORE for writes. |
| One giant queue | Everything competes. A slow handler blocks fast ones. No isolation. | Separate command queues per service. Shared events queue with fan-out. |
| Direct API calls between Workers | Temporal coupling. CF error 1042 on same-account Workers. Cost amplification. | Queues for actions. Read-only HTTP only for data retrieval. |
| Agent does the work | The agent calls Perplexity, generates content, publishes HTML. Now itβs a monolith in a Durable Object. | Agent decides and commands. Workflows and services do the work. |
| Ignoring the DLQ | Messages fail silently. You donβt know something is broken until a customer complains. | Monitor DLQ. Alert on messages. Every DLQ message is a bug report. |
| No correlation ID | A brand cycle triggers 50 research commands, 30 generations, 20 publications. You canβt trace the chain. | Every command inherits the correlation_id from the triggering event. |
| Storing state in messages | Messages are ephemeral. If you lose the message, you lose the state. | Store state in Agent SQLite or D1. Messages are notifications about state changes. |
What You Donβt Need
- Kafka or RabbitMQ. CF Queues handles 5,000 msg/sec per queue with auto-scaling consumers. Unless youβre doing millions/sec, itβs enough.
- Full event sourcing. Rebuilding state from event streams is powerful but complex. D1 is your source of truth. Events are signals, not the ledger.
- A saga orchestrator library. AgentWorkflow with durable steps IS the saga pattern. Each step is a compensatable transaction.
- External pub/sub. CF Queues + fan-out consumer covers fan-out. Event subscriptions cover native CF events.
- Service mesh / service discovery. Queue bindings in
wrangler.jsoncARE your service discovery. No runtime lookup needed.
Cloudflare Constraints
Design around these. Donβt fight them.
| Constraint | Impact | Design Response |
|---|---|---|
| One consumer per queue | No consumer groups like Kafka | One queue per consuming service. Fan-out consumer for broadcast. |
| At-least-once delivery | Duplicates will happen | Idempotency everywhere. processed_events table. Natural business keys. |
| No message ordering | Canβt rely on sequence | Handlers must be order-independent. Use timestamps + state for reconciliation. |
| 128KB message limit | Canβt send large payloads | Messages carry IDs. Data lives in D1/R2. |
| Worker CPU time limits | Long chains can timeout | Break chains into separate queue hops. Each hop is a fresh Worker invocation. |
| CF error 1042 | Same-account Workers canβt fetch() each other via workers.dev | Donβt use HTTP between Workers. Use Queues. Service bindings only for infrastructure. |
| No event replay | Canβt rewind and replay like Kafka | Outbox pattern for guaranteed publication. DLQ for failed messages. Careful idempotency. |
Queue send() can fail | Write succeeds, publish doesnβt | Outbox pattern β write event to D1 in same transaction, publish from outbox. |
Compared: Message Queues
Cloudflare Queues is one option among many. Hereβs how it stacks up against the alternatives β and when youβd pick each.
Cloudflare Queues vs AWS SQS
SQS is the closest equivalent. Both are managed, serverless, point-to-point queues.
| Dimension | Cloudflare Queues | AWS SQS |
|---|---|---|
| Delivery | At-least-once | At-least-once (standard) or exactly-once (FIFO) |
| Ordering | No guarantee | No guarantee (standard) or strict FIFO |
| Max message size | 128 KB | 256 KB (up to 2 GB with S3 pointer) |
| Throughput | 5,000 msg/sec/queue | Nearly unlimited (standard), 300 msg/sec (FIFO) |
| Consumers | 1 per queue (up to 250 concurrent invocations) | Unlimited consumers polling |
| Dead letter queue | Yes | Yes |
| Message delay | Up to 24 hours | Up to 15 minutes |
| Retention | Until consumed | Up to 14 days |
| Compute coupling | Tightly coupled to Workers | Loosely coupled (Lambda, EC2, ECS, anything) |
| Egress fees | None | Standard AWS egress |
| Fan-out | Manual (JS routing in consumer) | Use SNS + SQS fan-out pattern |
| Pricing model | Per message (simple) | Per request + data transfer |
When to pick SQS: You need FIFO ordering, exactly-once delivery, longer retention, or your compute already lives on AWS.
When to pick CF Queues: Your compute is on Cloudflare Workers. Zero egress fees. Simpler pricing. No separate fan-out service needed β JavaScript routing in the consumer handles it.
Cloudflare Queues vs AWS SNS + SQS
AWS solves fan-out by pairing SNS (pub/sub) with SQS (queues). A producer publishes to an SNS topic, and multiple SQS queues subscribe.
AWS: Producer β SNS Topic β SQS Queue A (Service A)
β SQS Queue B (Service B)
β SQS Queue C (Service C)
CF: Producer β Events Queue β Fan-out Consumer β Queue A (Service A)
β Queue B (Service B)
β Queue C (Service C)
| Dimension | SNS + SQS | CF Queues + Fan-out Consumer |
|---|---|---|
| Fan-out mechanism | Declarative: subscribe queues to topics | Programmable: JS code routes messages |
| Filtering | SNS subscription filter policies (JSON) | Any JS logic (payload inspection, env vars, time-based) |
| Ordering | SNS FIFO + SQS FIFO (same message group) | No ordering guarantee |
| Complexity | Two services to configure, IAM policies | One extra Worker (the fan-out consumer) |
| Scalability | SNS handles millions of subscribers | You manage the fan-out Workerβs queue bindings |
| Content-based routing | SNS filter policies (limited to message attributes) | Full JS β route on any field, any logic |
Pros of SNS+SQS: Battle-tested at massive scale. Declarative subscriptions. FIFO support.
Pros of CF fan-out: Programmable routing (not just attribute matching). No extra service to manage β itβs just a Worker. No cross-service IAM configuration.
Cloudflare Queues vs AWS EventBridge
EventBridge is AWSβs serverless event bus β the closest analog to our βevents queue + fan-out consumerβ pattern, but fully managed.
| Dimension | AWS EventBridge | CF Queues + Fan-out |
|---|---|---|
| Event routing | Rules with 28+ filtering patterns | JavaScript (unlimited flexibility) |
| Schema registry | Built-in (auto-discovers schemas) | Manual (TypeScript event catalog) |
| Targets | 25+ AWS services directly | Only CF Queues (you wire the rest) |
| Cross-account | Native cross-account event bus | Not applicable (single CF account) |
| Archive & replay | Yes (replay events from archive) | No (consumed = gone) |
| Throughput | Soft limits, request increases | 5,000 msg/sec per queue |
| Latency | ~500ms typical | Sub-100ms (same CF network) |
| Pricing | Per event ingested | Per message sent/received |
When EventBridge wins: You need event archive/replay, schema discovery, or deep AWS service integration.
When CF wins: You want sub-100ms latency between services on the same edge network. Your routing logic is complex enough that JSON filter patterns donβt cut it.
Cloudflare Queues vs Apache Kafka
Kafka is a different animal β a distributed commit log, not a message queue.
| Dimension | Apache Kafka | Cloudflare Queues |
|---|---|---|
| Model | Distributed commit log | Message queue |
| Ordering | Guaranteed per partition | No guarantee |
| Delivery | Exactly-once (with transactions) | At-least-once |
| Retention | Configurable (days, weeks, forever) | Until consumed |
| Replay | Yes β consumers can rewind to any offset | No |
| Consumer groups | Multiple consumers per topic (consumer groups) | One consumer per queue |
| Throughput | Millions of messages/sec | 5,000 msg/sec per queue |
| Operations | Self-managed or Confluent Cloud ($$$) | Fully managed, zero ops |
| Schema | Schema Registry (Avro, Protobuf, JSON Schema) | Manual (TypeScript types) |
| Latency | Low (ms) | Low (ms, same network) |
| Cost | Clusters + storage + egress | Per-message, no base cost |
When Kafka wins: You need event replay (audit logs, rebuilding state), strict ordering, multiple independent consumers reading the same stream, or million-msg/sec throughput.
When CF Queues wins: You donβt need any of that. Most applications donβt. If your throughput is under 50K msg/sec and you donβt need replay, Kafka is massive overkill. CF Queues is zero-ops and costs pennies.
Cloudflare Queues vs Google Cloud Pub/Sub
| Dimension | Google Cloud Pub/Sub | Cloudflare Queues |
|---|---|---|
| Model | Pub/Sub with subscriptions | Point-to-point queue |
| Fan-out | Native: multiple subscriptions per topic | Manual: fan-out consumer |
| Ordering | Supported (ordering keys) | No guarantee |
| Delivery | At-least-once (exactly-once with ordering) | At-least-once |
| Dead letter | Yes | Yes |
| Message size | 10 MB | 128 KB |
| Retention | Up to 31 days | Until consumed |
| Push/Pull | Both | Both (Worker consumer or HTTP pull) |
| Global | Multi-region by default | Global (Cloudflare network) |
When Pub/Sub wins: Native fan-out without custom code. Larger messages. Longer retention. Deep GCP integration.
When CF Queues wins: Tighter integration with Workers. No egress fees. Simpler pricing. Lower latency within the CF network.
Cloudflare Queues vs RabbitMQ
| Dimension | RabbitMQ | Cloudflare Queues |
|---|---|---|
| Model | AMQP message broker | Managed queue |
| Routing | Exchanges: direct, topic, fanout, headers | JavaScript consumer logic |
| Ordering | Per-queue FIFO | No guarantee |
| Delivery | At-least-once or at-most-once (configurable) | At-least-once |
| Operations | Self-managed or CloudAMQP | Fully managed |
| Protocol | AMQP, MQTT, STOMP | HTTP (producer), push (consumer) |
| Consumer model | Pull (competing consumers) | Push (one consumer Worker, auto-scaled) |
When RabbitMQ wins: You need complex routing topologies (exchanges + binding keys), multiple protocols, or priority queues.
When CF Queues wins: You donβt want to manage infrastructure. Your services are Workers. Simple is better.
Compared: Durable Execution
Multi-step workflows that survive failures. This is the domain of βdurable executionβ engines.
Cloudflare Workflows vs AWS Step Functions
| Dimension | AWS Step Functions | Cloudflare Workflows |
|---|---|---|
| Definition | JSON (Amazon States Language) or visual designer | TypeScript code (step.do()) |
| Execution model | State machine with transitions | Sequential code with durable checkpoints |
| Max duration | 1 year (standard), 5 minutes (express) | Up to 1 year (waitForEvent) |
| Retry | Per-state retry/catch policies | Per-step retry with backoff |
| Parallelism | Parallel and Map states | Promise.all() across multiple step.do() |
| Human-in-the-loop | Task tokens (callback pattern) | waitForApproval() |
| Observability | Visual execution history, X-Ray | reportProgress(), lifecycle callbacks |
| Pricing | Per state transition ($0.025/1K) | Per step (included in Workers pricing) |
| Compute coupling | Lambda, ECS, any AWS service | Workers only |
| Agent integration | No native equivalent | Native: AgentWorkflow β Agent RPC |
Pros of Step Functions: Visual designer. Deep AWS integration (invoke any service as a step). Mature ecosystem. Express Workflows for high-throughput, short-duration.
Pros of CF Workflows: Code, not config. TypeScript all the way β no ASL to learn. Native agent integration (workflows can call agent methods, update agent state). Simpler pricing.
Key difference: Step Functions are state machines defined in JSON. CF Workflows are sequential TypeScript with checkpoints. If you think in code, CF Workflows feel natural. If you think in diagrams, Step Functions have a visual editor.
Cloudflare Workflows vs Temporal
| Dimension | Temporal | Cloudflare Workflows |
|---|---|---|
| Language | Go, Java, TypeScript, Python, .NET | TypeScript (Workers) |
| Infrastructure | Self-hosted cluster or Temporal Cloud | Fully managed (zero ops) |
| Execution model | Replay-based (deterministic re-execution) | Checkpoint-based (step results persisted) |
| Durability | Infinite (workflow history is persistent) | Up to 1 year |
| Signals & queries | First-class (signal a running workflow, query its state) | sendEvent(), waitForApproval() |
| Versioning | Workflow versioning with getVersion() | No native versioning |
| Observability | Temporal Web UI, workflow history explorer | Agent callbacks, reportProgress() |
| Community | Large (Uber, Netflix, Snap, Stripe) | Growing (Cloudflare ecosystem) |
| Learning curve | Steep (deterministic constraints, replay model) | Moderate (just step.do() in sequence) |
Pros of Temporal: Battle-tested at enormous scale. Rich workflow primitives (signals, queries, child workflows, continue-as-new). Multi-language. Self-hostable. Large community.
Pros of CF Workflows: Zero infrastructure. No cluster to manage. No replay model to understand β just write sequential code. Tight integration with Agents SDK for real-time state sync and WebSocket broadcasting.
Key difference: Temporal uses replay-based durability β your workflow function is re-executed from the start, but completed activities return cached results. This is powerful but requires understanding deterministic constraints (no random, no Date.now() outside activities). CF Workflows uses checkpoint-based durability β step results are stored, and execution resumes from the last checkpoint. Simpler mental model, fewer gotchas.
Cloudflare Workflows vs Inngest
| Dimension | Inngest | Cloudflare Workflows |
|---|---|---|
| Model | Event-driven step functions | Agent-integrated durable workflows |
| Trigger | Events (any source via HTTP) | Agent runWorkflow() or manual |
| Steps | step.run(), step.sleep(), step.waitForEvent() | step.do(), step.sleep(), waitForApproval() |
| Infrastructure | Managed (SaaS) or self-hosted | Managed (Cloudflare) |
| Compute | Your serverless functions (Lambda, Vercel, CF Workers) | Workers only |
| Fan-out | Built-in (step.sendEvent()) | Manual (queue publish in step) |
| Concurrency control | Built-in (per-function, per-key) | Manual |
| Pricing | Per step execution | Included in Workers |
Pros of Inngest: Platform-agnostic. Built-in concurrency control. Event-driven triggers from any source. Can invoke functions on any serverless platform.
Pros of CF Workflows: No external dependency. Same-network execution (no HTTP hop to invoke). Agent state integration. Simpler if youβre already on Cloudflare.
Cloudflare Workflows vs Restate
Restate is a newer durable execution engine that sits between Temporalβs power and Inngestβs simplicity.
| Dimension | Restate | Cloudflare Workflows |
|---|---|---|
| Model | Durable async/await with journaling | Durable steps with checkpoints |
| Execution | Proxied function calls (like Temporal, but lighter) | Sequential step.do() calls |
| State | Built-in key-value state per handler | Agent SQLite + key-value state |
| Versioning | First-class handler versioning | No native versioning |
| Infrastructure | Restate Server (self-hosted or cloud) | Fully managed |
| Language | TypeScript, Java, Kotlin, Go | TypeScript |
Pros of Restate: Solves the immutability problem (handler versioning). Lighter than Temporal. Virtual objects (similar to Durable Objects).
Pros of CF Workflows: No external server. Deeply integrated with the rest of the Cloudflare stack (Queues, D1, R2, Agents).
Compared: Stateful Compute
The BrandAgent pattern (persistent, stateful, addressable compute) has equivalents on other platforms.
Cloudflare Durable Objects / Agents SDK vs Azure Durable Entities
| Dimension | Azure Durable Entities | CF Durable Objects / Agents SDK |
|---|---|---|
| Model | Entity functions (actor-like) | Durable Objects (actor model) |
| Storage | Azure Table Storage (managed) | SQLite per object (co-located) |
| Addressing | Entity ID | Object ID (name-based) |
| Communication | Signals (one-way) or calls (request/response) | RPC, WebSocket, HTTP |
| Scheduling | Durable timers | this.schedule() (cron or delay) |
| Real-time | SignalR integration | Native WebSocket, state sync to clients |
| Colocation | Compute and storage may be separate | Compute and storage in the same thread |
Key advantage of CF: Storage is co-located with compute in the same thread. No network hop to read state. This is unique β on Azure, entity state is in Table Storage, which means a network round-trip on every read.
Cloudflare Durable Objects vs AWS DynamoDB + Lambda
AWS doesnβt have a direct equivalent to Durable Objects. The closest pattern is Lambda functions triggered by DynamoDB Streams:
CF: Request β Worker β Durable Object (state + compute in one place)
AWS: Request β Lambda β DynamoDB (state)
β DynamoDB Streams β Lambda (react to changes)
| Dimension | DynamoDB + Lambda | Cloudflare Durable Objects |
|---|---|---|
| State access | Network hop to DynamoDB | Same-thread SQLite (0ms) |
| Change events | DynamoDB Streams β Lambda | Not built-in (use Queues or outbox) |
| Consistency | Eventually consistent reads (strong optional) | Strongly consistent (single-threaded) |
| Addressing | Partition key | Object name |
| Cost | DynamoDB RCU/WCU + Lambda invocations | Durable Object requests + duration |
| Actor model | No (you build it) | Yes (one instance per ID, serialized access) |
Pros of DynamoDB+Lambda: Mature. Global tables. DynamoDB Streams for change data capture (CF has no equivalent of change feeds).
Pros of CF Durable Objects: Single-threaded actor model eliminates concurrency bugs. Storage is in the same thread β no network hop. WebSocket support built-in. Agents SDK adds scheduling, workflows, and state sync for free.
Cloudflare Agents SDK vs Microsoft Orleans (Virtual Actors)
The Agents SDKβs actor-per-entity pattern is directly inspired by the virtual actor model that Orleans pioneered and deco-cx/actors adapted for edge:
| Dimension | Orleans | Cloudflare Agents SDK |
|---|---|---|
| Language | C# (.NET) | TypeScript |
| Runtime | .NET cluster (self-hosted or Azure) | Cloudflare Workers (managed) |
| Actor lifecycle | Virtual: activated on demand, deactivated on idle | Same: created on first access, hibernates |
| Persistence | Pluggable (Azure Storage, SQL, etc.) | SQLite (co-located) |
| Streams | Orleans Streams (pub/sub between actors) | No built-in inter-agent pub/sub (use Queues) |
| Timers/Reminders | Built-in (survive restarts) | this.schedule() (survives restarts) |
| Reentrancy | Configurable | Single-threaded (no reentrancy) |
Key insight: Orleans invented the pattern. CF Agents SDK implements it on edge infrastructure. The mental model is the same β one actor per entity (one BrandAgent per brand), activated on demand, sleeps when idle, state persists across activations.
Compared: Full Stack Architectures
How does a full event-driven system on Cloudflare compare to equivalent architectures on other platforms?
Cloudflare Stack vs AWS Serverless Stack
| Layer | Cloudflare | AWS |
|---|---|---|
| Compute | Workers | Lambda |
| Message queue | Queues | SQS |
| Event bus | Queues + fan-out consumer | EventBridge or SNS |
| Stateful compute | Durable Objects / Agents SDK | Step Functions + DynamoDB (no direct equivalent) |
| Durable workflows | Workflows / AgentWorkflow | Step Functions |
| Database | D1 (SQLite) | DynamoDB or Aurora Serverless |
| Object storage | R2 | S3 |
| Real-time | WebSockets on Durable Objects | API Gateway WebSocket + Lambda |
| Scheduling | Agent this.schedule() or Cron Triggers | EventBridge Scheduler |
| AI inference | Workers AI | Bedrock |
| Observability | Logpush, Workers Analytics | CloudWatch, X-Ray |
Pros of AWS: Broader service catalog. More mature. Larger community. More third-party integrations. FIFO queues. Event replay (EventBridge Archive).
Pros of Cloudflare: Radically simpler. Fewer services to wire together. Co-located compute+storage (Durable Objects). Zero egress fees. Sub-100ms global latency. One language (TypeScript) for everything. Agents SDK is a uniquely integrated offering β stateful compute + scheduling + workflows + WebSocket + SQLite in one class.
The honest trade-off: AWS gives you more Lego bricks. Cloudflare gives you fewer bricks that fit together more tightly. If you need exactly the right brick (FIFO queues, event replay, cross-account event bus), AWS has it. If you want to build fast with less operational overhead, Cloudflareβs integrated stack is hard to beat.
When to NOT Use the Cloudflare Stack
Be honest about what it canβt do:
- You need FIFO ordering. CF Queues donβt guarantee order. Period. If your domain requires strict ordering (financial transactions, ledger entries), use SQS FIFO or Kafka.
- You need event replay. CF Queues are consume-and-forget. Kafkaβs commit log or EventBridge Archive let you rewind. If audit/compliance requires replaying the event stream, Cloudflare canβt do it natively.
- You need multiple independent consumers per topic. Kafka consumer groups let N consumers read the same topic independently. CF Queues requires the fan-out consumer pattern, which adds a hop and is less elegant at scale.
- Your team knows AWS. Organizational momentum matters. If your team has years of AWS muscle memory, the productivity gain of Cloudflareβs simpler model might not offset the learning curve.
- You need > 50K msg/sec sustained. CF Queues does 5K/queue. You can shard across queues, but at some point youβre fighting the platform. Kafka is built for this.
- You need global database transactions. D1 is regional. DynamoDB Global Tables or CockroachDB handle multi-region writes. Durable Objects are globally addressable but single-homed.
When the Cloudflare Stack Shines
- Content platforms. Read-heavy, write-light. D1 per-service. Workers at the edge. Durable Objects for personalization. This is the sweet spot.
- AI agent systems. Agents SDK is purpose-built. Persistent state + scheduling + workflows + real-time WebSocket. No equivalent exists on AWS without stitching 4-5 services together.
- Multi-tenant SaaS. Durable Objects as per-tenant state. D1 per-tenant databases. Workers for API layer. Natural isolation boundaries.
- Side projects and startups. Zero base cost. Pay-per-use. No clusters to manage. Ship in a weekend.
Full Example: Content Pipeline
Hereβs the complete flow β a BrandAgent wakes up, triggers research, reacts to results, generates content, and publishes. Every arrow is a queue message:
1. BrandAgent scheduled wake-up fires (this.schedule, cron)
2. Agent checks state: "Last research was 8 days ago. Need keywords."
3. Agent sends β RESEARCH_QUEUE:
{ type: "research.requested", payload: { brand_slug: "niche-fi", keywords: [...] } }
4. GatherFeed consumer picks up message
5. GatherFeed calls Perplexity via API Mom β keyword research
6. GatherFeed stores results in its D1
7. GatherFeed writes to outbox in same transaction
8. GatherFeed outbox flush β EVENTS_QUEUE:
{ type: "research.completed", payload: { brand_slug: "niche-fi", research_ids: [...] } }
9. Fan-out consumer reads event, routes to SM_COMMANDS queue
10. BrandAgent receives event, reads GatherFeed's DB via read API
11. Agent applies strategy: difficulty < 40, volume > 100, not published
12. Agent selects 10 keywords, starts ContentWorkflow for each
13. ContentWorkflow step.do("outline") β Gemini via API Mom
14. ContentWorkflow step.do("draft") β Gemini
15. ContentWorkflow step.do("editorial") β Gemini
16. ContentWorkflow step.do("store-and-publish") β D1 write + PUBLISH_QUEUE:
{ type: "content.publish", payload: { brand_slug: "niche-fi", content_id: "..." } }
17. Pages-plus consumer picks up message, reads content from SM's read API
18. Pages-plus writes to its D1, articles go live
19. Pages-plus writes to outbox β EVENTS_QUEUE:
{ type: "content.published", payload: { brand_slug: "niche-fi", urls: [...] } }
20. Fan-out consumer routes to SM_COMMANDS and SOCIAL_COMMANDS
21. BrandAgent records: articles live. Schedules performance check in 7 days.
22. Social-good receives event, creates social posts promoting the content.
23. Seven days later: BrandAgent wakes up, reads analytics, adjusts strategy.
24. Cycle continues.
No service waited for another. GatherFeed could take 5 minutes or 5 hours. If step 14 crashed, the workflow resumed at step 14, not step 1. Every service only knows about its own queue and the events queue. Add a new service by adding a line to the fan-out consumerβs route table.
References
Cloudflare Documentation
- Cloudflare Queues β message queues between Workers
- How Queues Works β producers, consumers, batching, delivery guarantees
- Queue Configuration β wrangler setup, DLQ, concurrency
- Delivery Guarantees β at-least-once semantics
- Batching and Retries β retry config, message delays
- Consumer Concurrency β auto-scaling consumers
- Event Subscriptions β native CF service events (R2, KV, Workers AI, Workflows)
- Cloudflare Agents SDK β persistent stateful agents on Durable Objects
- Agents API Reference β scheduling, queues, retries, state, workflows
- Agent Workflows β durable multi-step execution
- Build a Durable AI Agent β tutorial: Agents SDK + Workflows
- Workers Best Practices β configuration, architecture, observability
- Durable Objects β stateful serverless with SQLite
Libraries and Frameworks
- @cloudflare/actors β Cloudflareβs framework for easier Durable Objects, pub/sub, broadcast
- Cloudflare Agents SDK β build and deploy AI agents on Workers
- Durable Object Groups (DOG) β Cloudflareβs library for DO replica clusters
- Emmett β TypeScript event sourcing: decide/evolve/project pattern
- Pongo β event sourcing on PostgreSQL (has Cloudflare Workers sample)
- Watermill β Go event-driven library: middleware chains, pub/sub abstraction, CQRS
- Dapr + Cloudflare Queues β event-driven reference with Dapr sidecar pattern
- durable-objects-channel β pub/sub module built on Durable Objects
- deco-cx/actors β Orleans-inspired virtual actors on Durable Objects and Deno
Comparisons and Analysis
- Durable Execution Showdown: AWS vs Temporal vs Cloudflare Workflows β side-by-side comparison of durable execution engines
- The Rise of the Durable Execution Engine β Temporal and Restate in event-driven architectures
- The Emerging Landscape of Durable Computing β survey of durable execution platforms
- The Ultimate Guide to TypeScript Orchestration β Temporal vs Trigger.dev vs Inngest
- Choosing Between SNS, SQS, and EventBridge β AWS messaging service comparison
- AWS Decision Guide: SNS vs SQS vs EventBridge β official AWS guidance
- Edge Databases Compared β D1/KV/Durable Objects vs DynamoDB vs Cosmos DB vs Firestore
- Cloud Provider Comparison: Cloudflare vs AWS vs Azure vs GCP β full-stack platform comparison
- Restate: Solving Durable Executionβs Immutability Problem β handler versioning approach
- Inngest vs Temporal β detailed feature comparison
Architecture Patterns
- Event-Driven.io β Oskar Dudyczβs resources on event-driven architecture and event sourcing
- CQRS Pattern β IBMβs reference architecture for CQRS in event-driven systems
- Awesome CQRS & Event Sourcing β curated list of CQRS and event sourcing resources
- Awesome Software Design β patterns, decisions, and design rules
License
MIT