Gary Wu

Event-Driven Architecture on Cloudflare Workers


Org Status: 🟡 Dormant | Cloudflare: N/A | Last Audited: 2026-04-28



                        ┌──────────────────────┐
                        │    Brand Agent       │
                        │  (Durable Object)    │
                        │                      │
                        │  state + strategy    │
                        │  decides WHAT + WHY  │
                        └──┬──────────────┬────┘
                           │              │
                   command │              │ command
                           ▼              ▼
              ┌────────────────┐  ┌────────────────┐
              │ RESEARCH_QUEUE │  │ PUBLISH_QUEUE  │
              └───────┬────────┘  └───────┬────────┘
                      │                   │
                      ▼                   ▼
              ┌──────────────┐    ┌──────────────┐
              │  GatherFeed  │    │  Pages-plus  │
              │  (research)  │    │  (publisher) │
              └──────┬───────┘    └──────┬───────┘
                     │                   │
               event │                   │ event
                     ▼                   ▼
              ┌────────────────────────────────┐
              │          EVENTS_QUEUE          │
              │  (fan-out consumer routes to   │
              │   all interested services)     │
              └────────────────────────────────┘

Stop calling fetch() between your Workers. Use Queues.

This is a reference architecture for building event-driven systems on Cloudflare's developer platform: Workers, Queues, Durable Objects, and the Agents SDK. It covers the patterns that work, the anti-patterns that don't, and the Cloudflare-specific constraints you need to design around.



The Problem with Imperative Architectures

This is how most people wire up Cloudflare Workers:

// Service A calls Service B, waits, then calls Service C
const research = await fetch("https://gatherfeed.workers.dev/api/v1/research", {
  method: "POST",
  body: JSON.stringify({ keyword: "best budgeting apps" }),
});
const data = await research.json();

const article = await fetch("https://content-engine.workers.dev/v1/generate", {
  method: "POST",
  body: JSON.stringify({ research: data }),
});
const content = await article.json();

await fetch("https://publisher.workers.dev/v1/publish", {
  method: "POST",
  body: JSON.stringify({ content }),
});

This is synchronous, imperative, and fragile. Every problem with microservices shows up:

The fix isn't better error handling. It's a different architecture.


Two Patterns, Not One

People conflate "event-driven" and "message-driven." They're different, and you usually want both.

Message-Driven (Commands)

A producer sends a message to a specific consumer. The producer knows who's receiving it.

"Hey GatherFeed, research these keywords."

This is a command. It's directed. It tells a service what to do. The producer is coupled to the consumer: it knows the destination queue exists and what the consumer expects.

Event-Driven (Facts)

A producer emits a fact about something that happened. The producer doesn't know or care who's listening.

"Research completed for brand X. 50 keywords ready."

This is an event. It's broadcast. It describes what happened, not what should happen next. Consumers subscribe. The producer is decoupled: add or remove consumers without touching the producer.

When to Use Which

Pattern            Use When                                            Cloudflare Primitive
Command (message)  You need a specific service to do a specific thing  Queue with dedicated consumer
Event (fact)       Multiple services might care about what happened    Shared events queue with fan-out consumer

In practice, the flow is: commands trigger work → work produces events → events trigger decisions → decisions produce commands. That's the loop.


Cloudflare Primitives

Cloudflare provides four building blocks for event-driven systems. Each has a specific role:

Cloudflare Queues

The message backbone between Workers. Producer Workers write messages; consumer Workers process them.

Property              Value
Delivery guarantee    At-least-once
Ordering              No guarantee
Max message size      128 KB
Throughput            5,000 messages/sec per queue
Consumer concurrency  Up to 250 auto-scaling consumers
Max retry             Configurable (default 3)
Dead letter queue     Supported
Message delay         0 to 86,400 seconds (24 hours)

The critical constraint: one consumer Worker per queue. This isn't a limitation if you design for it; it actually simplifies reasoning about message ownership.

Cloudflare Agents SDK

Persistent, stateful agents built on Durable Objects. Each agent has:

Agents are the decision-makers. They react to events, maintain strategy, and emit commands.

Cloudflare Workflows (AgentWorkflow)

Durable multi-step execution integrated with Agents:

Workflows are for processes that must complete: research β†’ generate β†’ publish. Not for fire-and-forget.

Event Subscriptions

Native Cloudflare service events (R2, KV, Workers AI, Workflows) published to Queues automatically:

npx wrangler queues subscription create my-queue --source r2 --events bucket.created

Events follow a standard structure:

{
  "type": "cf.r2.bucket.created",
  "source": { "type": "r2" },
  "payload": { "name": "my-bucket", "location": "WNAM" },
  "metadata": { "accountId": "...", "eventTimestamp": "2026-03-11T10:00:00Z" }
}

This is true event-driven: the emitter doesn't know who's subscribed.


The Message Envelope

Every message through any queue follows this shape:

interface DomainMessage<T = unknown> {
  event_id: string;        // UUID v4: deduplication key
  type: string;            // dot-notation: "research.requested", "content.published"
  source: string;          // emitting service: "scalable-media", "gatherfeed"
  timestamp: string;       // ISO 8601
  correlation_id?: string; // traces a chain of related messages
  payload: T;              // reference data: IDs, not objects
}

Rules:

  1. event_id is mandatory. At-least-once delivery means duplicates happen. This is your deduplication key.
  2. type uses dot-notation. Domain-first: research.requested, not requested_research. The domain is the noun, the action is the verb.
  3. Payloads carry references, not data. { research_id: "abc" }, not { full_research_object: {...} }. The data lives in D1 or R2. Messages are signals, databases are state.
  4. correlation_id traces causality. A brand cycle generates research commands, which produce completion events, which trigger generation commands. The correlation ID ties them together for debugging.
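
The rules above can also be checked at runtime before a message is sent or handled. The following is a minimal sketch, not part of the original design: `validateEnvelope` and `TYPE_PATTERN` are hypothetical names, and the regular expression is only one possible reading of "dot-notation".

```typescript
// Local copy of the envelope shape described above.
interface DomainMessage<T = unknown> {
  event_id: string;
  type: string;
  source: string;
  timestamp: string;
  correlation_id?: string;
  payload: T;
}

// Assumption: lowercase segments separated by dots, e.g. "research.requested".
const TYPE_PATTERN = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$/;

// Hypothetical helper: returns the list of rule violations (empty means valid).
function validateEnvelope(input: unknown): string[] {
  if (typeof input !== "object" || input === null) return ["not an object"];
  const msg = input as Partial<DomainMessage>;
  const problems: string[] = [];
  if (typeof msg.event_id !== "string" || msg.event_id.length === 0) {
    problems.push("event_id is mandatory");
  }
  if (typeof msg.type !== "string" || !TYPE_PATTERN.test(msg.type)) {
    problems.push("type must use dot-notation");
  }
  if (typeof msg.source !== "string" || msg.source.length === 0) {
    problems.push("source is required");
  }
  if (typeof msg.timestamp !== "string" || Number.isNaN(Date.parse(msg.timestamp))) {
    problems.push("timestamp must be a parseable date string");
  }
  if (!("payload" in msg)) problems.push("payload is required");
  return problems;
}
```

A consumer could call this first and send invalid messages straight to a dead letter table instead of retrying them, since a malformed envelope will not fix itself on redelivery.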

Queue Topology

Each service owns its inbound command queue. There is one shared events queue with a fan-out consumer.

                     COMMAND QUEUES (directed, one consumer each)
                     ┌─────────────────────────────────────────┐
                     │                                         │
  ┌──────────────────┤  gatherfeed-commands  → GatherFeed      │
  │                  │  sm-commands          → Scalable Media  │
  │  Producers       │  publish-commands     → Pages-plus      │
  │  (any service)   │  social-commands      → Social-good     │
  │                  │                                         │
  │                  ├─────────────────────────────────────────┤
  │                  │                                         │
  │                  │  EVENT QUEUE (broadcast, fan-out)       │
  └──────────────────┤  brand-events  → Fan-out consumer       │
                     │                → routes to all          │
                     │                  interested queues      │
                     └─────────────────────────────────────────┘

Wrangler Configuration

Producer side (e.g., Scalable Media):

{
  "queues": {
    "producers": [
      { "queue": "gatherfeed-commands", "binding": "RESEARCH_QUEUE" },
      { "queue": "publish-commands", "binding": "PUBLISH_QUEUE" },
      { "queue": "brand-events", "binding": "EVENTS_QUEUE" }
    ],
    "consumers": [
      {
        "queue": "sm-commands",
        "max_batch_size": 10,
        "max_batch_timeout": 5,
        "dead_letter_queue": "sm-commands-dlq"
      }
    ]
  }
}

Consumer side (e.g., GatherFeed):

{
  "queues": {
    "producers": [
      { "queue": "brand-events", "binding": "EVENTS_QUEUE" }
    ],
    "consumers": [
      {
        "queue": "gatherfeed-commands",
        "max_batch_size": 5,
        "max_batch_timeout": 10,
        "dead_letter_queue": "gatherfeed-commands-dlq"
      }
    ]
  }
}

Dead letter queues are mandatory. A message that fails max_retries times goes to the DLQ instead of being silently dropped. Monitor the DLQ: it's your system telling you something is broken.


Fan-Out Pattern

Cloudflare Queues supports one consumer per queue. To deliver the same event to multiple services, use a fan-out consumer: a Worker that reads from the shared events queue and re-publishes to destination queues:

// fan-out-consumer/src/index.ts
export default {
  async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      // Pass env explicitly: queue bindings are only reachable through it
      const routes = getRoutes(msg.body.type, env);

      await Promise.all(
        routes.map((queue) => queue.send(msg.body))
      );

      // Ack only after every destination queue accepted the message
      msg.ack();
    }
  },
};

function getRoutes(eventType: string, env: Env): Queue[] {
  // Route table: add new subscribers here, not in the producer
  const routes: Record<string, Queue[]> = {
    "research.completed": [env.SM_COMMANDS, env.ANALYTICS_QUEUE],
    "content.published":  [env.SM_COMMANDS, env.SOCIAL_COMMANDS],
    "content.performed":  [env.SM_COMMANDS],
  };
  return routes[eventType] ?? [];
}

The routing logic is JavaScript, not config. Add a subscriber by adding a line to the route table. Remove one by deleting the line. No producer changes needed.

This is the key advantage over static topic-based routing (Kafka, RabbitMQ): the routing is programmable. You can route based on payload fields, time of day, feature flags, or anything else your code can evaluate.
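
To make "programmable routing" concrete, here is a small sketch in which each route is a predicate over the whole message rather than a lookup on the type string. The names `Route`, `QueueLike`, `buildRoutes`, and `resolveTargets` are illustrative assumptions, and `QueueLike` is a structural stand-in for a queue producer binding so the example runs without the Workers runtime.

```typescript
interface DomainMessage<T = unknown> {
  event_id: string;
  type: string;
  source: string;
  timestamp: string;
  payload: T;
}

// Stand-in for a queue producer binding; only send() is needed here.
interface QueueLike {
  send(body: unknown): Promise<void>;
}

// A route pairs a predicate with a destination queue.
interface Route {
  matches: (msg: DomainMessage) => boolean;
  target: QueueLike;
}

// Hypothetical route table mixing type, feature-flag, and payload-field checks.
function buildRoutes(
  queues: { sm: QueueLike; social: QueueLike; analytics: QueueLike },
  flags: { socialEnabled: boolean },
): Route[] {
  return [
    { matches: (m) => m.type === "content.published", target: queues.sm },
    {
      matches: (m) => m.type === "content.published" && flags.socialEnabled,
      target: queues.social,
    },
    {
      // Payload-based rule: only one brand feeds analytics in this sketch.
      matches: (m) => (m.payload as { brand_slug?: string }).brand_slug === "niche-fi",
      target: queues.analytics,
    },
  ];
}

// Returns every destination whose predicate accepts the message.
function resolveTargets(msg: DomainMessage, routes: Route[]): QueueLike[] {
  return routes.filter((r) => r.matches(msg)).map((r) => r.target);
}
```

Inside the fan-out consumer, `resolveTargets` would take the place of the static table lookup; the producer still knows nothing about who receives the event.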


Idempotency

Cloudflare Queues delivers at-least-once. This means your consumer will see duplicate messages. Not might. Will.

Every queue consumer must be safe to run twice with the same input. There are three strategies:

Strategy 1: Processed Events Table

The general-purpose approach. Check before acting:

async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
  for (const msg of batch.messages) {
    const { event_id } = msg.body;

    // Already processed?
    const existing = await env.DB.prepare(
      "SELECT 1 FROM processed_events WHERE event_id = ?"
    ).bind(event_id).first();

    if (existing) {
      msg.ack();
      continue;
    }

    try {
      await handleMessage(msg.body, env);

      await env.DB.prepare(
        "INSERT INTO processed_events (event_id, type, processed_at) VALUES (?, ?, ?)"
      ).bind(event_id, msg.body.type, new Date().toISOString()).run();

      msg.ack();
    } catch (err) {
      msg.retry({ delaySeconds: 30 });
    }
  }
}

Strategy 2: Natural Business Key

For writes where a natural key exists:

-- If we already have research for this keyword, skip
INSERT OR IGNORE INTO research (keyword, brand_slug, data, created_at)
VALUES (?, ?, ?, ?);

The INSERT OR IGNORE (or ON CONFLICT DO NOTHING) makes the write idempotent at the database level. No separate tracking table needed.

Strategy 3: Upstream Deduplication

For external API calls, check if the result already exists before calling:

async function researchKeyword(keyword: string, env: Env): Promise<void> {
  // Already have fresh research?
  const existing = await env.DB.prepare(
    "SELECT 1 FROM research WHERE keyword = ? AND created_at > datetime('now', '-7 days')"
  ).bind(keyword).first();

  if (existing) return; // Skip: don't burn an API call

  const result = await callPerplexityViaApiMom(keyword, env);
  await env.DB.prepare("INSERT INTO research ...").bind(...).run();
}

This saves money (no redundant API calls) and is naturally idempotent.

Pruning

Processed event records don’t need to live forever. Prune after 7 days:

DELETE FROM processed_events WHERE processed_at < datetime('now', '-7 days');

Run this on a cron schedule or inside a scheduled Agent task.
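
A scheduled handler for that pruning job might look like the sketch below. `pruneProcessedEvents` and `pruneWorker` are hypothetical names, and `DatabaseLike` is a structural stand-in for the D1 binding (only prepare, bind, and run are used) so the example runs outside the Workers runtime. It binds an ISO cutoff instead of calling SQLite's datetime(), so the comparison is between strings in the same format that `new Date().toISOString()` writes.

```typescript
// Structural stand-ins for the D1 calls used here.
interface StatementLike {
  bind(...values: unknown[]): StatementLike;
  run(): Promise<unknown>;
}
interface DatabaseLike {
  prepare(sql: string): StatementLike;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: delete dedup records older than the retention window.
// Returns the cutoff it used, which is handy for logging.
async function pruneProcessedEvents(
  db: DatabaseLike,
  now: Date = new Date(),
  retentionDays = 7,
): Promise<string> {
  const cutoff = new Date(now.getTime() - retentionDays * DAY_MS).toISOString();
  await db
    .prepare("DELETE FROM processed_events WHERE processed_at < ?")
    .bind(cutoff)
    .run();
  return cutoff;
}

// Hypothetical Worker shape: a Cron Trigger invokes scheduled().
// In a real Worker this object would be the default export.
const pruneWorker = {
  async scheduled(_controller: unknown, env: { DB: DatabaseLike }): Promise<void> {
    await pruneProcessedEvents(env.DB);
  },
};
```

The retention window should be at least as long as the longest period over which a duplicate can still arrive, including retry delays and time spent in the outbox.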


The Outbox Pattern

There's a subtle failure mode: your D1 write succeeds, but the Queue publish fails (network issue, Worker CPU limit). Now your database says "research complete" but no event was emitted. Downstream services never find out.

The fix is the outbox pattern: write the event to an outbox table in the same D1 transaction as your business data. A separate process polls the outbox and publishes to Queues.

// In your queue consumer: write data AND outbox event in one transaction
await env.DB.batch([
  env.DB.prepare(
    "INSERT INTO research (id, keyword, data) VALUES (?, ?, ?)"
  ).bind(id, keyword, JSON.stringify(data)),

  // source is included because the outbox schema declares it NOT NULL
  env.DB.prepare(
    "INSERT INTO outbox (event_id, type, source, payload, created_at) VALUES (?, ?, ?, ?, ?)"
  ).bind(
    crypto.randomUUID(),
    "research.completed",
    "gatherfeed",
    JSON.stringify({ brand_slug, research_ids: [id] }),
    new Date().toISOString()
  ),
]);

A scheduled task (cron or Agent schedule) publishes outbox events:

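// Columns of the outbox table that this function reads
type OutboxRow = { event_id: string; type: string; payload: string; created_at: string };

async function flushOutbox(env: Env): Promise<void> {
  const pending = await env.DB.prepare(
    "SELECT * FROM outbox WHERE published_at IS NULL ORDER BY created_at LIMIT 50"
  ).all<OutboxRow>();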

  for (const row of pending.results) {
    await env.EVENTS_QUEUE.send({
      event_id: row.event_id,
      type: row.type,
      source: "gatherfeed",
      timestamp: row.created_at,
      payload: JSON.parse(row.payload),
    });

    await env.DB.prepare(
      "UPDATE outbox SET published_at = ? WHERE event_id = ?"
    ).bind(new Date().toISOString(), row.event_id).run();
  }
}

This guarantees that if the data was written, the event will eventually be published. The consumer's idempotency handling covers the case where the event publishes twice (outbox flushed, but the published_at update failed).
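
The double-publish case can be simulated in memory. This is a sketch under simplifying assumptions: `flush` and `makeConsumer` are hypothetical names, the outbox is a plain array, and "publishing" is a callback. It shows that a failed published_at update leads to a second send, and that a consumer keyed on event_id still handles the event once.

```typescript
interface OutboxRow {
  event_id: string;
  type: string;
  published: boolean;
}

// Sends every unpublished row, then tries to mark it as published.
// If marking fails, the row stays unpublished and is re-sent on the next flush.
async function flush(
  rows: OutboxRow[],
  send: (row: OutboxRow) => Promise<void>,
  markPublished: (row: OutboxRow) => Promise<void>,
): Promise<void> {
  for (const row of rows.filter((r) => !r.published)) {
    await send(row);
    try {
      await markPublished(row);
    } catch {
      // Swallowed on purpose: the next flush retries this row.
    }
  }
}

// Consumer-side dedup keyed on event_id, as in the idempotency strategies.
function makeConsumer() {
  const seen = new Set<string>();
  let handled = 0;
  return {
    receive(eventId: string): void {
      if (seen.has(eventId)) return; // duplicate delivery
      seen.add(eventId);
      handled += 1;
    },
    handledCount(): number {
      return handled;
    },
  };
}
```

Running two flushes where the first `markPublished` call throws produces two sends but a single handled event, which is the at-least-once behaviour the paragraph above describes.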


Consumer Middleware

Borrowed from Watermill's middleware pattern. Every queue handler should be wrapped in composable middleware:

type MessageHandler = (msg: DomainMessage, env: Env) => Promise<void>;
type Middleware = (next: MessageHandler) => MessageHandler;

// Deduplication middleware
function withDedup(db: D1Database): Middleware {
  return (next) => async (msg, env) => {
    const exists = await db.prepare(
      "SELECT 1 FROM processed_events WHERE event_id = ?"
    ).bind(msg.event_id).first();
    if (exists) return;

    await next(msg, env);

    // msg is already the unwrapped DomainMessage, so read msg.type directly
    await db.prepare(
      "INSERT INTO processed_events (event_id, type, processed_at) VALUES (?, ?, ?)"
    ).bind(msg.event_id, msg.type, new Date().toISOString()).run();
  };
}

// Logging middleware
function withLogging(logger: Logger): Middleware {
  return (next) => async (msg, env) => {
    logger.info({ event_id: msg.event_id, type: msg.type }, "processing");
    try {
      await next(msg, env);
      logger.info({ event_id: msg.event_id }, "completed");
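    } catch (err) {
      // err is unknown in strict TypeScript, so narrow before reading .message
      const error = err instanceof Error ? err.message : String(err);
      logger.error({ event_id: msg.event_id, error }, "failed");
      throw err;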
    }
  };
}

// Compose middleware
function pipe(...middlewares: Middleware[]): (handler: MessageHandler) => MessageHandler {
  return (handler) => middlewares.reduceRight((next, mw) => mw(next), handler);
}

// Usage: build the handler where env and logger are in scope (e.g. inside queue())
const processResearch = pipe(
  withDedup(env.DB),
  withLogging(logger),
)(async (msg, env) => {
  // Pure business logic, no boilerplate
  const { keywords, brand_slug } = msg.payload as ResearchRequestedPayload;
  for (const kw of keywords) {
    await researchKeyword(kw, brand_slug, env);
  }
});

Every handler gets deduplication, logging, and error tracking for free. Add metrics, rate limiting, or correlation tracking by adding a middleware. The business logic stays clean.
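
As an illustration of "adding a middleware", here is a sketch of two extra layers written against the same `Middleware` type. `withMetrics` and `withCorrelation` are hypothetical names that do not come from the article, `Env` is reduced to a placeholder so the block is self-contained, and the fallback from correlation_id to event_id is an assumption of this sketch.

```typescript
interface DomainMessage<T = unknown> {
  event_id: string;
  type: string;
  source: string;
  timestamp: string;
  correlation_id?: string;
  payload: T;
}
type Env = Record<string, unknown>; // placeholder for the Worker's bindings
type MessageHandler = (msg: DomainMessage, env: Env) => Promise<void>;
type Middleware = (next: MessageHandler) => MessageHandler;

// Same composition helper as above: the first middleware is the outermost.
function pipe(...middlewares: Middleware[]): (handler: MessageHandler) => MessageHandler {
  return (handler) => middlewares.reduceRight((next, mw) => mw(next), handler);
}

type Metric = { type: string; ms: number; ok: boolean };

// Records duration and outcome per message type, then rethrows failures.
function withMetrics(record: (m: Metric) => void, now: () => number = Date.now): Middleware {
  return (next) => async (msg, env) => {
    const start = now();
    try {
      await next(msg, env);
      record({ type: msg.type, ms: now() - start, ok: true });
    } catch (err) {
      record({ type: msg.type, ms: now() - start, ok: false });
      throw err;
    }
  };
}

// Ensures downstream code always sees a correlation_id (falls back to event_id).
function withCorrelation(): Middleware {
  return (next) => (msg, env) =>
    next({ ...msg, correlation_id: msg.correlation_id ?? msg.event_id }, env);
}
```

Because `pipe` applies middleware from left to right, placing `withMetrics` first means the recorded duration includes every later layer as well as the handler itself.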


Agents as Event Reactors

The Cloudflare Agents SDK provides persistent, stateful Durable Objects that are perfect for the decision-maker role in an event-driven system.

A BrandAgent doesn't do the work. It decides what work needs doing, based on its accumulated state.

import { Agent } from "agents";

interface BrandState {
  brand_slug: string;
  keywords_researched: number;
  articles_published: number;
  last_research_cycle: string | null;
  last_publish_cycle: string | null;
  pending_generation: number;
}

export class BrandAgent extends Agent<Env, BrandState> {
  initialState: BrandState = {
    brand_slug: "",
    keywords_researched: 0,
    articles_published: 0,
    last_research_cycle: null,
    last_publish_cycle: null,
    pending_generation: 0,
  };

  // Scheduled: wake up and assess what needs doing
  async discoveryCheck() {
    const daysSinceResearch = this.daysSince(this.state.last_research_cycle);

    if (daysSinceResearch > 7) {
      // Emit command: don't do the research here
      await this.env.RESEARCH_QUEUE.send({
        event_id: crypto.randomUUID(),
        type: "research.requested",
        source: "scalable-media",
        timestamp: new Date().toISOString(),
        correlation_id: `cycle-${this.state.brand_slug}-${Date.now()}`,
        payload: {
          brand_slug: this.state.brand_slug,
          keywords: await this.getTargetKeywords(),
          priority: "normal",
        },
      });

      this.setState({
        ...this.state,
        last_research_cycle: new Date().toISOString(),
      });
    }
  }

  // React to event: research is done
  async onResearchCompleted(event: DomainMessage<ResearchCompletedPayload>) {
    // Read research from GatherFeed's database
    const research = await this.fetchResearchByIds(event.payload.research_ids);

    // Apply strategy: filter, score, rank
    const candidates = this.evaluateKeywords(research);

    // Emit generation commands for the best candidates
    for (const candidate of candidates) {
      await this.env.CONTENT_QUEUE.send({
        event_id: crypto.randomUUID(),
        type: "content.generate",
        source: "scalable-media",
        timestamp: new Date().toISOString(),
        correlation_id: event.correlation_id,
        payload: {
          brand_slug: this.state.brand_slug,
          keyword: candidate.keyword,
          research_id: candidate.research_id,
          template: "pseo-article",
        },
      });
    }

    this.setState({
      ...this.state,
      keywords_researched: this.state.keywords_researched + research.length,
      pending_generation: candidates.length,
    });
  }

  // React to event: content was published
  async onContentPublished(event: DomainMessage<ContentPublishedPayload>) {
    this.setState({
      ...this.state,
      articles_published: this.state.articles_published + 1,
      pending_generation: Math.max(0, this.state.pending_generation - 1),
      last_publish_cycle: new Date().toISOString(),
    });

    // Schedule a performance check in 7 days
    await this.schedule(
      7 * 24 * 60 * 60, // seconds
      "checkPerformance",
      { urls: event.payload.urls }
    );
  }

  private evaluateKeywords(research: Research[]): Candidate[] {
    // Strategy logic: difficulty < 40, volume > 100, not already published
    return research
      .filter((r) => r.difficulty < 40 && r.volume > 100)
      .filter((r) => !this.isAlreadyPublished(r.keyword))
      .sort((a, b) => b.volume / b.difficulty - a.volume / a.difficulty)
      .slice(0, 10);
  }
}

The agent's core loop is Decide → Evolve → Project (borrowed from Emmett's event sourcing patterns).

The agent never calls Perplexity. Never calls Gemini. Never publishes HTML. It decides, records, and commands. Everything else is someone else's job.
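
One possible reading of that Decide, Evolve, Project loop, written as pure functions, is sketched below. The function names, the reduced `BrandState`, and the `Fact` and `Command` shapes are assumptions made for illustration; they are not taken from Emmett or from the agent code above.

```typescript
interface BrandState {
  brand_slug: string;
  last_research_cycle: string | null;
  articles_published: number;
}

// Hypothetical fact and command shapes for this sketch only.
type Fact =
  | { type: "research.requested"; at: string }
  | { type: "content.published"; at: string };
type Command = { type: "research.requested"; brand_slug: string };

const DAY_MS = 24 * 60 * 60 * 1000;

// Decide: inspect the current state and return commands. No I/O, no mutation.
function decide(state: BrandState, now: Date): Command[] {
  const last = state.last_research_cycle;
  const stale = last === null || now.getTime() - Date.parse(last) > 7 * DAY_MS;
  return stale ? [{ type: "research.requested", brand_slug: state.brand_slug }] : [];
}

// Evolve: fold one fact into the next state, returning a new object.
function evolve(state: BrandState, fact: Fact): BrandState {
  switch (fact.type) {
    case "research.requested":
      return { ...state, last_research_cycle: fact.at };
    case "content.published":
      return { ...state, articles_published: state.articles_published + 1 };
  }
}

// Project: derive a read-only view for dashboards or logs.
function project(state: BrandState): string {
  return `${state.brand_slug}: ${state.articles_published} published`;
}
```

In the agent, the commands returned by `decide` would be sent to queues and the result of `evolve` passed to `this.setState`; keeping both functions pure makes the strategy testable without a Durable Object.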


Durable Workflows

For multi-step processes that must complete, use AgentWorkflow. Each step is checkpointed: a crash resumes from the last completed step, not from the beginning.

import { AgentWorkflow } from "agents/workflows";
import type { AgentWorkflowEvent, AgentWorkflowStep } from "agents/workflows";

type GenerateParams = {
  brand_slug: string;
  keyword: string;
  research_id: string;
  template: string;
};

export class ContentWorkflow extends AgentWorkflow<BrandAgent, GenerateParams> {
  async run(event: AgentWorkflowEvent<GenerateParams>, step: AgentWorkflowStep) {
    const { brand_slug, keyword, research_id, template } = event.payload;

    // Step 1: Fetch research (durable: won't re-run on retry)
    const research = await step.do("fetch-research", {
      retries: { limit: 3, delay: "5 seconds", backoff: "exponential" },
      timeout: "30 seconds",
    }, async () => {
      return await fetchResearchFromGatherFeed(research_id, this.env);
    });

    this.reportProgress({ step: "research", status: "complete", percent: 0.2 });

    // Step 2: Generate outline
    const outline = await step.do("generate-outline", {
      retries: { limit: 3, delay: "10 seconds", backoff: "exponential" },
      timeout: "2 minutes",
    }, async () => {
      return await callGeminiViaApiMom("outline", { keyword, research, template }, this.env);
    });

    this.reportProgress({ step: "outline", status: "complete", percent: 0.4 });

    // Step 3: Draft article
    const draft = await step.do("draft-article", {
      retries: { limit: 3, delay: "10 seconds", backoff: "exponential" },
      timeout: "5 minutes",
    }, async () => {
      return await callGeminiViaApiMom("draft", { outline, research }, this.env);
    });

    this.reportProgress({ step: "draft", status: "complete", percent: 0.7 });

    // Step 4: Editorial pass
    const article = await step.do("editorial-pass", {
      retries: { limit: 2, delay: "10 seconds", backoff: "exponential" },
      timeout: "3 minutes",
    }, async () => {
      return await callGeminiViaApiMom("editorial", { draft, keyword }, this.env);
    });

    this.reportProgress({ step: "editorial", status: "complete", percent: 0.9 });

    // Step 5: Store and emit publish command
    await step.do("store-and-publish", {
      retries: { limit: 3, delay: "5 seconds", backoff: "exponential" },
    }, async () => {
      const contentId = await storeArticle(article, brand_slug, this.env);

      await this.env.PUBLISH_QUEUE.send({
        event_id: crypto.randomUUID(),
        type: "content.publish",
        source: "scalable-media",
        timestamp: new Date().toISOString(),
        payload: { brand_slug, content_id: contentId, keyword },
      });
    });

    await step.reportComplete({ keyword, brand_slug });
  }
}

Wrangler configuration for the workflow:

{
  "durable_objects": {
    "bindings": [
      { "name": "BRAND_AGENT", "class_name": "BrandAgent" }
    ]
  },
  "workflows": [
    {
      "name": "content-workflow",
      "binding": "CONTENT_WORKFLOW",
      "class_name": "ContentWorkflow"
    }
  ],
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["BrandAgent"] }
  ]
}

If step 3 (drafting) crashes because Gemini is down, the workflow retries step 3. It doesn't re-fetch research or re-generate the outline, because steps 1 and 2 are already checkpointed.
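
The checkpointing behaviour can be illustrated with a toy, in-memory step runner. This is not how the Workflows engine is implemented; `CheckpointedRun`, `doStep`, and `pipeline` are hypothetical names, and the only property being modelled is that a step whose result is already stored is not executed again when the run is replayed.

```typescript
// Toy stand-in for a durable step runner: results are stored by step name.
class CheckpointedRun {
  private readonly checkpoints = new Map<string, unknown>();

  async doStep<T>(name: string, fn: () => Promise<T>): Promise<T> {
    if (this.checkpoints.has(name)) {
      return this.checkpoints.get(name) as T; // replay: reuse the stored result
    }
    const result = await fn(); // a throw here leaves no checkpoint behind
    this.checkpoints.set(name, result);
    return result;
  }
}

interface CallCounts {
  research: number;
  outline: number;
  draft: number;
}

// Three-step pipeline; the draft step fails once to simulate an outage.
async function pipeline(
  run: CheckpointedRun,
  calls: CallCounts,
  outage: { active: boolean },
): Promise<string> {
  const research = await run.doStep("fetch-research", async () => {
    calls.research += 1;
    return "research";
  });
  const outline = await run.doStep("generate-outline", async () => {
    calls.outline += 1;
    return `${research}>outline`;
  });
  return run.doStep("draft-article", async () => {
    calls.draft += 1;
    if (outage.active) {
      outage.active = false;
      throw new Error("model unavailable");
    }
    return `${outline}>draft`;
  });
}
```

Calling `pipeline` twice on the same `CheckpointedRun`, with the outage active for the first call, runs the research and outline steps once each and the draft step twice.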


The Event Catalog

A shared, typed registry of every event in the system. Every service imports from it. This prevents message schema drift.

// packages/events/src/index.ts

// ─── Base ────────────────────────────────────────────
export interface DomainMessage<T = unknown> {
  event_id: string;
  type: string;
  source: string;
  timestamp: string;
  correlation_id?: string;
  payload: T;
}

// ─── Research Domain ─────────────────────────────────
export interface ResearchRequestedPayload {
  brand_slug: string;
  keywords: string[];
  priority: "low" | "normal" | "high";
}

export interface ResearchCompletedPayload {
  brand_slug: string;
  research_ids: string[];
  keywords_researched: number;
}

// ─── Content Domain ──────────────────────────────────
export interface ContentGeneratePayload {
  brand_slug: string;
  keyword: string;
  research_id: string;
  template: string;
}

export interface ContentReadyPayload {
  brand_slug: string;
  content_id: string;
  keyword: string;
  word_count: number;
}

export interface ContentPublishPayload {
  brand_slug: string;
  content_id: string;
  keyword: string;
}

export interface ContentPublishedPayload {
  brand_slug: string;
  urls: string[];
  published_at: string;
}

// ─── Performance Domain ──────────────────────────────
export interface PerformanceReportPayload {
  brand_slug: string;
  url: string;
  impressions: number;
  clicks: number;
  position: number;
  period: string;
}

// ─── Type Map ────────────────────────────────────────
export type EventTypeMap = {
  "research.requested": ResearchRequestedPayload;
  "research.completed": ResearchCompletedPayload;
  "content.generate": ContentGeneratePayload;
  "content.ready": ContentReadyPayload;
  "content.publish": ContentPublishPayload;
  "content.published": ContentPublishedPayload;
  "performance.report": PerformanceReportPayload;
};

// ─── Helper ──────────────────────────────────────────
export function createMessage<K extends keyof EventTypeMap>(
  type: K,
  source: string,
  payload: EventTypeMap[K],
  correlationId?: string
): DomainMessage<EventTypeMap[K]> {
  return {
    event_id: crypto.randomUUID(),
    type,
    source,
    timestamp: new Date().toISOString(),
    correlation_id: correlationId,
    payload,
  };
}

Usage in any service:

import { createMessage } from "@brand-engine/events";

await env.RESEARCH_QUEUE.send(
  createMessage("research.requested", "scalable-media", {
    brand_slug: "niche-fi",
    keywords: ["best budgeting apps", "compound interest calculator"],
    priority: "normal",
  }, correlationId)
);

Type-safe. Auto-completed. If the schema changes, every consumer gets a compile error.
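
The same type map can drive the consumer side. The sketch below uses a reduced copy of `EventTypeMap` with two entries so that it stays self-contained; `Handlers` and `dispatch` are hypothetical names, and handlers return a string only to keep the example easy to check.

```typescript
interface ResearchCompletedPayload {
  brand_slug: string;
  research_ids: string[];
  keywords_researched: number;
}
interface ContentPublishedPayload {
  brand_slug: string;
  urls: string[];
  published_at: string;
}

// Reduced copy of the catalog's type map.
type EventTypeMap = {
  "research.completed": ResearchCompletedPayload;
  "content.published": ContentPublishedPayload;
};

// One optional handler per event type, each typed with its own payload.
type Handlers = {
  [K in keyof EventTypeMap]?: (payload: EventTypeMap[K]) => string;
};

// Looks up the handler for a type; returns undefined when nobody handles it.
function dispatch<K extends keyof EventTypeMap>(
  handlers: Handlers,
  type: K,
  payload: EventTypeMap[K],
): string | undefined {
  const handler = handlers[type] as ((p: EventTypeMap[K]) => string) | undefined;
  return handler ? handler(payload) : undefined;
}

const exampleHandlers: Handlers = {
  // payload is inferred as ResearchCompletedPayload here
  "research.completed": (payload) => `${payload.brand_slug}:${payload.research_ids.length}`,
};
```

Adding a key to `Handlers` that is not in `EventTypeMap`, or reading a field the payload does not have, fails at compile time, which is the same guarantee `createMessage` gives producers.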


Small Patterns That Add Up

These are the building blocks. Each is small enough to drop into any Worker.

Delayed Retry with Backpressure

When a downstream service is overloaded, don't hammer it. Delay the retry:

async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
  for (const msg of batch.messages) {
    try {
      await handleMessage(msg.body, env);
      msg.ack();
    } catch (err) {
      if (err instanceof RateLimitError) {
        // Back off: re-deliver in 60 seconds
        msg.retry({ delaySeconds: 60 });
      } else if (err instanceof TransientError) {
        // Standard retry: re-deliver in 10 seconds
        msg.retry({ delaySeconds: 10 });
      } else {
        // Permanent failure: let it DLQ after max retries
        msg.retry();
      }
    }
  }
}
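
A variation on the fixed delays above is to grow the delay with each delivery attempt. The helper below is a sketch: `backoffSeconds` is a hypothetical name, the base and cap values are arbitrary defaults, and it assumes the attempt counter starts at 1 for the first delivery.

```typescript
// Exponential backoff: base, 2x base, 4x base, ... capped at capSeconds.
function backoffSeconds(attempts: number, baseSeconds = 10, capSeconds = 3600): number {
  const exponent = Math.max(0, attempts - 1); // first attempt gets the base delay
  return Math.min(capSeconds, baseSeconds * 2 ** exponent);
}
```

In a consumer this would be used as `msg.retry({ delaySeconds: backoffSeconds(msg.attempts) })`, so a struggling downstream service sees progressively less traffic until the message either succeeds or reaches the dead letter queue.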

Batch Sending

Queues support sendBatch, which sends up to 100 messages in one call. Use this when generating many commands at once:

// BrandAgent discovers 50 keywords, sends them all at once
const messages = keywords.map((kw) => ({
  body: createMessage("research.requested", "scalable-media", {
    brand_slug: "niche-fi",
    keywords: [kw],
    priority: "normal",
  }, correlationId),
}));

await env.RESEARCH_QUEUE.sendBatch(messages);
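
When there may be more than 100 messages, the list has to be split before calling sendBatch. The following sketch shows one way to do that; `chunk` and `sendInBatches` are hypothetical helpers, and `BatchQueueLike` is a structural stand-in for the queue binding so the example runs without the Workers runtime.

```typescript
// Splits a list into consecutive slices of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Stand-in for a queue binding; only sendBatch() is needed here.
interface BatchQueueLike<T> {
  sendBatch(messages: { body: T }[]): Promise<void>;
}

// Sends bodies in order, one sendBatch call per chunk; returns the call count.
async function sendInBatches<T>(
  queue: BatchQueueLike<T>,
  bodies: readonly T[],
  maxPerCall = 100,
): Promise<number> {
  const batches = chunk(bodies, maxPerCall);
  for (const batch of batches) {
    await queue.sendBatch(batch.map((body) => ({ body })));
  }
  return batches.length;
}
```

Batches are sent sequentially here so that a failure stops at a known point; whether to parallelise them is a trade-off between throughput and how much partial progress you are willing to reason about.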

Event Deduplication in Agents

The Agent SDK's webhook guide recommends this pattern: track seen event IDs in the agent's SQLite:

class BrandAgent extends Agent<Env, BrandState> {
  private async isProcessed(eventId: string): Promise<boolean> {
    // this.sql already returns the result rows as an array
    const rows = this.sql`
      SELECT 1 FROM processed_events WHERE event_id = ${eventId}
    `;
    return rows.length > 0;
  }

  private async markProcessed(eventId: string, type: string): Promise<void> {
    this.sql`
      INSERT INTO processed_events (event_id, type, processed_at)
      VALUES (${eventId}, ${type}, ${new Date().toISOString()})
    `;
  }

  async handleEvent(event: DomainMessage) {
    if (await this.isProcessed(event.event_id)) return;
    // ... handle the event ...
    await this.markProcessed(event.event_id, event.type);
  }
}

One Consumer, Multiple Queues

A single Worker can consume from multiple queues. Use batch.queue to route:

export default {
  async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
    switch (batch.queue) {
      case "sm-commands":
        await handleCommands(batch, env);
        break;
      case "sm-events":
        await handleEvents(batch, env);
        break;
      case "sm-commands-dlq":
        await handleDeadLetters(batch, env);
        break;
    }
  },
};

Correlation ID Propagation

Every command inherits the correlation ID from its triggering event. This lets you trace an entire cycle:

// Event arrives: research.completed with correlation_id "cycle-niche-fi-1710..."
async onResearchCompleted(event: DomainMessage<ResearchCompletedPayload>) {
  // Commands inherit the correlation_id
  await this.env.CONTENT_QUEUE.send(
    createMessage("content.generate", "scalable-media", {
      brand_slug: event.payload.brand_slug,
      keyword: "best budgeting apps",
      research_id: "r_001",
      template: "pseo-article",
    }, event.correlation_id) // passed through unchanged
  );
}

// Later, in the DLQ handler, you can find every message in the chain,
// provided the table you query also stores correlation_id
// (the processed_events schema in this article does not, so add the column first):
// SELECT * FROM processed_events WHERE correlation_id = 'cycle-niche-fi-1710...'

Scheduled Agent Cycle

An agent that wakes up on a cron, checks what needs doing, and emits commands:

class BrandAgent extends Agent<Env, BrandState> {
  // Called by this.schedule("0 */6 * * *", "cycle")
  async cycle() {
    const state = this.state;

    // Check: do we need research?
    if (this.daysSince(state.last_research_cycle) > 7) {
      await this.requestResearch();
    }

    // Check: do we have unpublished content?
    // this.sql returns the rows as an array, so index it directly
    const unpublished = this.sql<{ count: number }>`
      SELECT COUNT(*) as count FROM content WHERE published_at IS NULL
    `;

    if (unpublished[0].count > 0) {
      await this.publishPending();
    }

    // Check: do we need performance review?
    if (this.daysSince(state.last_performance_check) > 14) {
      await this.requestPerformanceData();
    }
  }
}

Dead Letter Queue Monitor

A simple DLQ consumer that logs failures and alerts:

// dlq-monitor/src/index.ts
export default {
  async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      // Log the failure with full context
      console.error(JSON.stringify({
        level: "error",
        message: "DLQ message received",
        event_id: msg.body.event_id,
        type: msg.body.type,
        source: msg.body.source,
        correlation_id: msg.body.correlation_id,
        timestamp: msg.body.timestamp,
        attempts: msg.attempts,
      }));

      // Store for investigation; OR IGNORE keeps a redelivered DLQ message
      // from failing on the event_id primary key
      await env.DB.prepare(
        "INSERT OR IGNORE INTO dlq_messages (event_id, type, source, payload, received_at) VALUES (?, ?, ?, ?, ?)"
      ).bind(
        msg.body.event_id,
        msg.body.type,
        msg.body.source,
        JSON.stringify(msg.body.payload),
        new Date().toISOString()
      ).run();

      msg.ack(); // Acknowledge so it doesn't loop
    }
  },
};
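
The monitor above logs and stores failures but does not show the alerting step. One way to keep alerts readable is to send a single summary per batch rather than one alert per message. A rough sketch, where `DlqEntry` and `summarizeDlqBatch` are hypothetical names and the delivery channel (webhook, email, pager) is left out:

```typescript
// Hypothetical shape: the subset of DomainMessage fields the summary needs.
interface DlqEntry {
  event_id: string;
  type: string;
  source: string;
}

// Group dead-lettered messages by event type so one alert covers the batch.
function summarizeDlqBatch(entries: DlqEntry[]): string {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    counts.set(entry.type, (counts.get(entry.type) ?? 0) + 1);
  }
  const parts = Array.from(counts.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([type, n]) => `${type} x${n}`);
  return `DLQ: ${entries.length} message(s) failed permanently (${parts.join(", ")})`;
}
```

Inside `queue()`, you could build the entries from `batch.messages.map((m) => m.body)` and post the resulting string to whatever alert channel you use.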

Read-Only Cross-Service Data Access

Services can expose read-only HTTP APIs for data retrieval. This is the only permitted synchronous cross-service call:

// GatherFeed exposes a read API: no auth needed for internal reads,
// or use service binding for zero-network-hop access
app.get("/api/v1/research/:id", async (c) => {
  const research = await c.env.DB.prepare(
    "SELECT * FROM research WHERE id = ?"
  ).bind(c.req.param("id")).first();

  if (!research) return c.json({ error: "not found" }, 404);
  return c.json(research);
});

// BrandAgent reads GatherFeed's data when deciding what to generate
// This is a query, not a command, so it's fine to be synchronous
async fetchResearchByIds(ids: string[]): Promise<Research[]> {
  return Promise.all(
    ids.map(async (id) => {
      const r = await fetch(
        `${this.env.GATHERFEED_URL}/api/v1/research/${encodeURIComponent(id)}`
      );
      // A 404 body is { error: "not found" }, not a Research row, so fail loudly
      if (!r.ok) throw new Error(`research ${id}: HTTP ${r.status}`);
      return (await r.json()) as Research;
    })
  );
}

Schema Migration for Event Tables

Every service that consumes from queues needs these tables:

-- processed_events: idempotency tracking
CREATE TABLE IF NOT EXISTS processed_events (
  event_id TEXT PRIMARY KEY,
  type TEXT NOT NULL,
  processed_at TEXT NOT NULL
);

CREATE INDEX IF NOT EXISTS idx_processed_events_at ON processed_events(processed_at);

-- outbox: guaranteed event publication
CREATE TABLE IF NOT EXISTS outbox (
  event_id TEXT PRIMARY KEY,
  type TEXT NOT NULL,
  source TEXT NOT NULL,
  payload TEXT NOT NULL,
  correlation_id TEXT,
  created_at TEXT NOT NULL,
  published_at TEXT
);

CREATE INDEX IF NOT EXISTS idx_outbox_unpublished ON outbox(published_at) WHERE published_at IS NULL;

-- dlq_messages: dead letter queue investigation
CREATE TABLE IF NOT EXISTS dlq_messages (
  event_id TEXT PRIMARY KEY,
  type TEXT NOT NULL,
  source TEXT NOT NULL,
  payload TEXT NOT NULL,
  received_at TEXT NOT NULL,
  investigated_at TEXT
);
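
The index on `processed_at` suggests that `processed_events` is meant to be pruned periodically. A small sketch of how the cutoff could be computed, assuming `processed_at` holds ISO 8601 UTC strings as produced by `toISOString()`; the 30-day window and the names `RETENTION_DAYS`, `pruneCutoff`, and `PRUNE_SQL` are illustrative:

```typescript
// Hypothetical retention window. Keep it longer than the longest time a
// message can be redelivered, or pruned IDs could be processed twice.
const RETENTION_DAYS = 30;

// ISO 8601 UTC strings sort lexicographically in time order, so a plain TEXT
// comparison against this cutoff is enough for the DELETE below.
function pruneCutoff(now: Date, retentionDays: number = RETENTION_DAYS): string {
  const cutoffMs = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return new Date(cutoffMs).toISOString();
}

const PRUNE_SQL = "DELETE FROM processed_events WHERE processed_at < ?";

// Usage inside a scheduled handler (assumes the D1 binding is env.DB):
//   await env.DB.prepare(PRUNE_SQL).bind(pruneCutoff(new Date())).run();
```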

Testing Queues Locally with Miniflare

You can test queue producers and consumers locally without deploying:

import { Miniflare } from "miniflare";

const mf = new Miniflare({
  workers: [
    {
      name: "producer",
      modules: true,
      script: `
        export default {
          async fetch(request, env) {
            await env.QUEUE.send({ event_id: "test-1", type: "research.requested" });
            return new Response("sent");
          }
        }
      `,
      queueProducers: { QUEUE: "research-commands" },
    },
    {
      name: "consumer",
      modules: true,
      script: `
        export default {
          async queue(batch) {
            for (const msg of batch.messages) {
              console.log("received:", msg.body.type);
              msg.ack();
            }
          }
        }
      `,
      queueConsumers: { "research-commands": { maxBatchTimeout: 1 } },
    },
  ],
});

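// Trigger the producer
const resp = await mf.dispatchFetch("http://localhost");
console.log(await resp.text()); // "sent"

// The consumer runs once the 1s batch timeout elapses and logs
// "received: research.requested". Wait for it, then shut Miniflare down.
await new Promise((resolve) => setTimeout(resolve, 1500));
await mf.dispose();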

Workflow Progress to WebSocket Clients

AgentWorkflow can broadcast progress to connected dashboard clients in real time:

class ContentWorkflow extends AgentWorkflow<BrandAgent, GenerateParams> {
  async run(event: AgentWorkflowEvent<GenerateParams>, step: AgentWorkflowStep) {
    // Non-durable: broadcasts to all WebSocket clients
    this.broadcastToClients({
      type: "workflow-started",
      keyword: event.payload.keyword,
    });

    const outline = await step.do("outline", { /* ... */ }, async () => {
      return await generateOutline(event.payload, this.env);
    });

    // Progress update: clients see this in real time
    this.reportProgress({ step: "outline", status: "complete", percent: 0.3 });

    const draft = await step.do("draft", { /* ... */ }, async () => {
      return await generateDraft(outline, this.env);
    });

    // Durable state update: persists AND broadcasts
    await step.mergeAgentState({
      currentWorkflow: { keyword: event.payload.keyword, step: "editorial", percent: 0.7 },
    });

    // ... continue ...
  }
}

// Client-side React hook receives all updates automatically:
// const agent = useAgent({ agent: "brand-agent", name: "niche-fi",
//   onStateUpdate: (s) => setProgress(s.currentWorkflow)
// });

Conditional Fan-Out

Route events based on payload content, not just event type:

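// env is passed in explicitly; a free function has no other way to reach the queue bindings
function getRoutes(event: DomainMessage, env: Env): Queue[] {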
  const routes: Queue[] = [];

  // Type-based routing
  if (event.type === "research.completed") {
    routes.push(env.SM_COMMANDS);
  }

  if (event.type === "content.published") {
    routes.push(env.SM_COMMANDS);

    // Only fan out to social if the brand has social enabled
    if (event.payload.brand_slug !== "internal-tools") {
      routes.push(env.SOCIAL_COMMANDS);
    }

    // Only fan out to analytics in production
    if (env.ENVIRONMENT === "production") {
      routes.push(env.ANALYTICS_QUEUE);
    }
  }

  // Priority-based routing
  if (event.payload.priority === "high") {
    routes.push(env.ALERTS_QUEUE);
  }

  return routes;
}

Graceful Queue Consumer with Batch Acknowledgment

Process a batch, acknowledge the good messages, retry the bad ones:

async queue(batch: MessageBatch<DomainMessage>, env: Env): Promise<void> {
  // Don't use batch.ackAll() or batch.retryAll()
  // Handle each message individually for granularity

  for (const msg of batch.messages) {
    try {
      await processMessage(msg.body, env);
      msg.ack(); // This one is done
    } catch (err) {
      if (isRetryable(err)) {
        msg.retry({ delaySeconds: computeBackoff(msg.attempts) });
      } else {
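        // Permanent failure: retrying will not help. ack() removes the message
        // for good (acked messages never reach the DLQ), so record it here.
        // To route it to the DLQ instead, call msg.retry() until max_retries is spent.
        console.error(`Permanent failure for ${msg.body.event_id}:`, err);
        msg.ack();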
      }
    }
  }
}

function computeBackoff(attempts: number): number {
  // Exponential backoff: 5s, 10s, 20s, 40s, capped at 300s.
  // msg.attempts starts at 1, so subtract 1 to make the first retry wait 5s.
  return Math.min(5 * Math.pow(2, Math.max(attempts - 1, 0)), 300);
}

function isRetryable(err: unknown): boolean {
  return err instanceof TransientError ||
    (err instanceof Response && err.status >= 500);
}
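
`isRetryable` above depends on a `TransientError` class that the snippet does not define. A possible sketch, assuming you want timeouts, rate limits (429), and 5xx responses from upstream APIs to be retried; `assertUpstreamOk` is a hypothetical helper name:

```typescript
// Hypothetical error type: marks failures that are worth retrying.
class TransientError extends Error {
  readonly status?: number;

  constructor(message: string, status?: number) {
    super(message);
    this.name = "TransientError";
    this.status = status;
    // Keeps instanceof working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Classify an upstream HTTP status: return normally on success, throw
// TransientError for retryable statuses, and a plain Error otherwise.
function assertUpstreamOk(status: number): void {
  if (status >= 200 && status < 300) return;
  if (status === 429 || status >= 500) {
    throw new TransientError(`upstream returned ${status}`, status);
  }
  throw new Error(`upstream rejected request with ${status}`);
}
```

Calling `assertUpstreamOk(response.status)` inside `processMessage` lets the consumer's `catch` block decide between `msg.retry()` and a permanent failure without inspecting responses itself.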

What Not to Do

| Anti-Pattern | Why It Fails | Do This Instead |
| --- | --- | --- |
| Large payloads in messages | 128KB limit. Queues are for signals, not data transfer. | Put data in D1/R2. Put a reference ID in the message. |
| Assuming message order | Queues don’t guarantee order. content.published can arrive before content.ready. | Design every handler to work regardless of arrival order. Use state to reconcile. |
| Sync disguised as async | Sending a command then polling for the result is just HTTP with extra latency. | Let the completion event come to you. React, don’t poll. |
| Processing without dedup | At-least-once will bite you. Duplicate research calls, duplicate articles, duplicate costs. | Check event_id before every action. Use INSERT OR IGNORE for writes. |
| One giant queue | Everything competes. A slow handler blocks fast ones. No isolation. | Separate command queues per service. Shared events queue with fan-out. |
| Direct API calls between Workers | Temporal coupling. CF error 1042 on same-account Workers. Cost amplification. | Queues for actions. Read-only HTTP only for data retrieval. |
| Agent does the work | The agent calls Perplexity, generates content, publishes HTML. Now it’s a monolith in a Durable Object. | Agent decides and commands. Workflows and services do the work. |
| Ignoring the DLQ | Messages fail silently. You don’t know something is broken until a customer complains. | Monitor DLQ. Alert on messages. Every DLQ message is a bug report. |
| No correlation ID | A brand cycle triggers 50 research commands, 30 generations, 20 publications. You can’t trace the chain. | Every command inherits the correlation_id from the triggering event. |
| Storing state in messages | Messages are ephemeral. If you lose the message, you lose the state. | Store state in Agent SQLite or D1. Messages are notifications about state changes. |
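
To make the "use state to reconcile" advice concrete, here is one possible sketch of an order-independent handler core. It assumes a content record moves through three statuses in a fixed order; the status names, `ContentRecord`, and `reconcile` are illustrative rather than taken from the system described here:

```typescript
type ContentStatus = "requested" | "ready" | "published";

// Higher rank means further along the lifecycle.
const RANK: Record<ContentStatus, number> = { requested: 0, ready: 1, published: 2 };

interface ContentRecord {
  id: string;
  status: ContentStatus;
  updated_at: string;
}

// Apply an incoming status only if it moves the record forward. A late
// "ready" that arrives after "published" becomes a no-op instead of a regression.
function reconcile(
  current: ContentRecord | undefined,
  id: string,
  incoming: ContentStatus,
  timestamp: string
): ContentRecord {
  if (current && RANK[current.status] >= RANK[incoming]) return current;
  return { id, status: incoming, updated_at: timestamp };
}
```

Because the function is a pure "merge" of stored state and incoming event, replaying the same event twice or receiving events out of order produces the same final record.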

What You Don’t Need


Cloudflare Constraints

Design around these. Don’t fight them.

| Constraint | Impact | Design Response |
| --- | --- | --- |
| One consumer per queue | No consumer groups like Kafka | One queue per consuming service. Fan-out consumer for broadcast. |
| At-least-once delivery | Duplicates will happen | Idempotency everywhere. processed_events table. Natural business keys. |
| No message ordering | Can’t rely on sequence | Handlers must be order-independent. Use timestamps + state for reconciliation. |
| 128KB message limit | Can’t send large payloads | Messages carry IDs. Data lives in D1/R2. |
| Worker CPU time limits | Long chains can time out | Break chains into separate queue hops. Each hop is a fresh Worker invocation. |
| CF error 1042 | Same-account Workers can’t fetch() each other via workers.dev | Don’t use HTTP between Workers. Use Queues. Service bindings only for infrastructure. |
| No event replay | Can’t rewind and replay like Kafka | Outbox pattern for guaranteed publication. DLQ for failed messages. Careful idempotency. |
| Queue send() can fail | Write succeeds, publish doesn’t | Outbox pattern: write event to D1 in same transaction, publish from outbox. |
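
A producer can guard against the 128KB limit before calling `send()`. The following is a sketch only: the limit comes from the table above, while the headroom value and the function names are assumptions:

```typescript
const MAX_QUEUE_MESSAGE_BYTES = 128 * 1024; // limit cited in the table above

// Size of the JSON-serialized body in UTF-8 bytes (not string length,
// which undercounts non-ASCII characters).
function serializedBytes(body: unknown): number {
  return new TextEncoder().encode(JSON.stringify(body)).length;
}

// Leaves some headroom for platform metadata; the 1 KB default is a guess.
function fitsInQueueMessage(body: unknown, headroomBytes: number = 1024): boolean {
  return serializedBytes(body) + headroomBytes <= MAX_QUEUE_MESSAGE_BYTES;
}
```

When `fitsInQueueMessage` returns `false`, the producer would store the payload in D1 or R2 and send only its ID, as the table recommends.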

Compared: Message Queues

Cloudflare Queues is one option among many. Here’s how it stacks up against the alternatives, and when you’d pick each.

Cloudflare Queues vs AWS SQS

SQS is the closest equivalent. Both are managed, serverless, point-to-point queues.

| Dimension | Cloudflare Queues | AWS SQS |
| --- | --- | --- |
| Delivery | At-least-once | At-least-once (standard) or exactly-once (FIFO) |
| Ordering | No guarantee | No guarantee (standard) or strict FIFO |
| Max message size | 128 KB | 256 KB (up to 2 GB with S3 pointer) |
| Throughput | 5,000 msg/sec/queue | Nearly unlimited (standard), 300 msg/sec (FIFO) |
| Consumers | 1 per queue (up to 250 concurrent invocations) | Unlimited consumers polling |
| Dead letter queue | Yes | Yes |
| Message delay | Up to 24 hours | Up to 15 minutes |
| Retention | Until consumed | Up to 14 days |
| Compute coupling | Tightly coupled to Workers | Loosely coupled (Lambda, EC2, ECS, anything) |
| Egress fees | None | Standard AWS egress |
| Fan-out | Manual (JS routing in consumer) | Use SNS + SQS fan-out pattern |
| Pricing model | Per message (simple) | Per request + data transfer |

When to pick SQS: You need FIFO ordering, exactly-once delivery, longer retention, or your compute already lives on AWS.

When to pick CF Queues: Your compute is on Cloudflare Workers. Zero egress fees. Simpler pricing. No separate fan-out service needed: JavaScript routing in the consumer handles it.

Cloudflare Queues vs AWS SNS + SQS

AWS solves fan-out by pairing SNS (pub/sub) with SQS (queues). A producer publishes to an SNS topic, and multiple SQS queues subscribe.

AWS:    Producer → SNS Topic → SQS Queue A (Service A)
                             → SQS Queue B (Service B)
                             → SQS Queue C (Service C)

CF:     Producer → Events Queue → Fan-out Consumer → Queue A (Service A)
                                                   → Queue B (Service B)
                                                   → Queue C (Service C)

| Dimension | SNS + SQS | CF Queues + Fan-out Consumer |
| --- | --- | --- |
| Fan-out mechanism | Declarative: subscribe queues to topics | Programmable: JS code routes messages |
| Filtering | SNS subscription filter policies (JSON) | Any JS logic (payload inspection, env vars, time-based) |
| Ordering | SNS FIFO + SQS FIFO (same message group) | No ordering guarantee |
| Complexity | Two services to configure, IAM policies | One extra Worker (the fan-out consumer) |
| Scalability | SNS handles millions of subscribers | You manage the fan-out Worker’s queue bindings |
| Content-based routing | SNS filter policies (limited to message attributes) | Full JS: route on any field, any logic |

Pros of SNS+SQS: Battle-tested at massive scale. Declarative subscriptions. FIFO support.

Pros of CF fan-out: Programmable routing (not just attribute matching). No extra service to manage; it’s just a Worker. No cross-service IAM configuration.

Cloudflare Queues vs AWS EventBridge

EventBridge is AWS’s serverless event bus, the closest analog to our “events queue + fan-out consumer” pattern, but fully managed.

| Dimension | AWS EventBridge | CF Queues + Fan-out |
| --- | --- | --- |
| Event routing | Rules with 28+ filtering patterns | JavaScript (unlimited flexibility) |
| Schema registry | Built-in (auto-discovers schemas) | Manual (TypeScript event catalog) |
| Targets | 25+ AWS services directly | Only CF Queues (you wire the rest) |
| Cross-account | Native cross-account event bus | Not applicable (single CF account) |
| Archive & replay | Yes (replay events from archive) | No (consumed = gone) |
| Throughput | Soft limits, request increases | 5,000 msg/sec per queue |
| Latency | ~500ms typical | Sub-100ms (same CF network) |
| Pricing | Per event ingested | Per message sent/received |

When EventBridge wins: You need event archive/replay, schema discovery, or deep AWS service integration.

When CF wins: You want sub-100ms latency between services on the same edge network. Your routing logic is complex enough that JSON filter patterns don’t cut it.
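
The "manual TypeScript event catalog" that stands in for a schema registry could look roughly like this. The event names appear elsewhere in this article; the payload fields and the type names `EventCatalog`, `CatalogEvent`, and `summarize` are illustrative assumptions:

```typescript
// Hypothetical catalog: one entry per event type, mapping to its payload shape.
interface EventCatalog {
  "research.completed": { brand_slug: string; research_ids: string[] };
  "content.published": { brand_slug: string; urls: string[] };
}

type EventType = keyof EventCatalog;

// Discriminated union derived from the catalog, so `type` narrows `payload`.
type CatalogEvent = {
  [K in EventType]: { event_id: string; type: K; payload: EventCatalog[K] };
}[EventType];

// The compiler checks that each case only touches fields its payload has.
function summarize(event: CatalogEvent): string {
  switch (event.type) {
    case "research.completed":
      return `${event.payload.research_ids.length} research item(s) for ${event.payload.brand_slug}`;
    case "content.published":
      return `${event.payload.urls.length} URL(s) live for ${event.payload.brand_slug}`;
  }
}
```

Unlike a registry, this catalog is only enforced at compile time, so consumers should still validate payloads that cross service boundaries.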

Cloudflare Queues vs Apache Kafka

Kafka is a different animal: a distributed commit log, not a message queue.

| Dimension | Apache Kafka | Cloudflare Queues |
| --- | --- | --- |
| Model | Distributed commit log | Message queue |
| Ordering | Guaranteed per partition | No guarantee |
| Delivery | Exactly-once (with transactions) | At-least-once |
| Retention | Configurable (days, weeks, forever) | Until consumed |
| Replay | Yes: consumers can rewind to any offset | No |
| Consumer groups | Multiple consumers per topic (consumer groups) | One consumer per queue |
| Throughput | Millions of messages/sec | 5,000 msg/sec per queue |
| Operations | Self-managed or Confluent Cloud ($$$) | Fully managed, zero ops |
| Schema | Schema Registry (Avro, Protobuf, JSON Schema) | Manual (TypeScript types) |
| Latency | Low (ms) | Low (ms, same network) |
| Cost | Clusters + storage + egress | Per-message, no base cost |

When Kafka wins: You need event replay (audit logs, rebuilding state), strict ordering, multiple independent consumers reading the same stream, or million-msg/sec throughput.

When CF Queues wins: You don’t need any of that. Most applications don’t. If your throughput fits within the 5,000 msg/sec per-queue limit and you don’t need replay, Kafka is massive overkill. CF Queues is zero-ops and costs pennies.

Cloudflare Queues vs Google Cloud Pub/Sub

| Dimension | Google Cloud Pub/Sub | Cloudflare Queues |
| --- | --- | --- |
| Model | Pub/Sub with subscriptions | Point-to-point queue |
| Fan-out | Native: multiple subscriptions per topic | Manual: fan-out consumer |
| Ordering | Supported (ordering keys) | No guarantee |
| Delivery | At-least-once (exactly-once with ordering) | At-least-once |
| Dead letter | Yes | Yes |
| Message size | 10 MB | 128 KB |
| Retention | Up to 31 days | Until consumed |
| Push/Pull | Both | Both (Worker consumer or HTTP pull) |
| Global | Multi-region by default | Global (Cloudflare network) |

When Pub/Sub wins: Native fan-out without custom code. Larger messages. Longer retention. Deep GCP integration.

When CF Queues wins: Tighter integration with Workers. No egress fees. Simpler pricing. Lower latency within the CF network.

Cloudflare Queues vs RabbitMQ

| Dimension | RabbitMQ | Cloudflare Queues |
| --- | --- | --- |
| Model | AMQP message broker | Managed queue |
| Routing | Exchanges: direct, topic, fanout, headers | JavaScript consumer logic |
| Ordering | Per-queue FIFO | No guarantee |
| Delivery | At-least-once or at-most-once (configurable) | At-least-once |
| Operations | Self-managed or CloudAMQP | Fully managed |
| Protocol | AMQP, MQTT, STOMP | HTTP (producer), push (consumer) |
| Consumer model | Pull (competing consumers) | Push (one consumer Worker, auto-scaled) |

When RabbitMQ wins: You need complex routing topologies (exchanges + binding keys), multiple protocols, or priority queues.

When CF Queues wins: You don’t want to manage infrastructure. Your services are Workers. Simple is better.


Compared: Durable Execution

Multi-step workflows that survive failures. This is the domain of “durable execution” engines.

Cloudflare Workflows vs AWS Step Functions

| Dimension | AWS Step Functions | Cloudflare Workflows |
| --- | --- | --- |
| Definition | JSON (Amazon States Language) or visual designer | TypeScript code (step.do()) |
| Execution model | State machine with transitions | Sequential code with durable checkpoints |
| Max duration | 1 year (standard), 5 minutes (express) | Up to 1 year (waitForEvent) |
| Retry | Per-state retry/catch policies | Per-step retry with backoff |
| Parallelism | Parallel and Map states | Promise.all() across multiple step.do() |
| Human-in-the-loop | Task tokens (callback pattern) | waitForApproval() |
| Observability | Visual execution history, X-Ray | reportProgress(), lifecycle callbacks |
| Pricing | Per state transition ($0.025/1K) | Per step (included in Workers pricing) |
| Compute coupling | Lambda, ECS, any AWS service | Workers only |
| Agent integration | No native equivalent | Native: AgentWorkflow ↔ Agent RPC |

Pros of Step Functions: Visual designer. Deep AWS integration (invoke any service as a step). Mature ecosystem. Express Workflows for high-throughput, short-duration.

Pros of CF Workflows: Code, not config. TypeScript all the way, with no ASL to learn. Native agent integration (workflows can call agent methods, update agent state). Simpler pricing.

Key difference: Step Functions are state machines defined in JSON. CF Workflows are sequential TypeScript with checkpoints. If you think in code, CF Workflows feel natural. If you think in diagrams, Step Functions have a visual editor.

Cloudflare Workflows vs Temporal

| Dimension | Temporal | Cloudflare Workflows |
| --- | --- | --- |
| Language | Go, Java, TypeScript, Python, .NET | TypeScript (Workers) |
| Infrastructure | Self-hosted cluster or Temporal Cloud | Fully managed (zero ops) |
| Execution model | Replay-based (deterministic re-execution) | Checkpoint-based (step results persisted) |
| Durability | Infinite (workflow history is persistent) | Up to 1 year |
| Signals & queries | First-class (signal a running workflow, query its state) | sendEvent(), waitForApproval() |
| Versioning | Workflow versioning with getVersion() | No native versioning |
| Observability | Temporal Web UI, workflow history explorer | Agent callbacks, reportProgress() |
| Community | Large (Uber, Netflix, Snap, Stripe) | Growing (Cloudflare ecosystem) |
| Learning curve | Steep (deterministic constraints, replay model) | Moderate (just step.do() in sequence) |

Pros of Temporal: Battle-tested at enormous scale. Rich workflow primitives (signals, queries, child workflows, continue-as-new). Multi-language. Self-hostable. Large community.

Pros of CF Workflows: Zero infrastructure. No cluster to manage. No replay model to understand; just write sequential code. Tight integration with Agents SDK for real-time state sync and WebSocket broadcasting.

Key difference: Temporal uses replay-based durability: your workflow function is re-executed from the start, but completed activities return cached results. This is powerful but requires understanding deterministic constraints (no random, no Date.now() outside activities). CF Workflows uses checkpoint-based durability: step results are stored, and execution resumes from the last checkpoint. Simpler mental model, fewer gotchas.
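
The checkpoint idea can be illustrated with a toy model. This is not how Cloudflare Workflows is implemented; it is a synchronous, in-memory sketch (real steps are async and persisted) showing why a step that already succeeded is not executed again after a crash. `CheckpointStore` and `runPipeline` are made-up names:

```typescript
// Toy checkpoint store: results of completed steps survive across runs.
class CheckpointStore {
  private results = new Map<string, unknown>();
  attempts: string[] = []; // every time a step body actually starts executing

  step<T>(name: string, fn: () => T): T {
    if (this.results.has(name)) return this.results.get(name) as T;
    this.attempts.push(name);
    const value = fn(); // if this throws, nothing is checkpointed
    this.results.set(name, value);
    return value;
  }
}

// A two-step pipeline; `failAtDraft` simulates a crash in the second step.
function runPipeline(store: CheckpointStore, failAtDraft: boolean): string {
  const outline = store.step("outline", () => "outline-v1");
  return store.step("draft", () => {
    if (failAtDraft) throw new Error("model timeout");
    return `${outline} -> draft`;
  });
}
```

Running `runPipeline` once with a failure and once without, against the same store, executes the outline body a single time and the draft body twice, which is the "resume from the last checkpoint" behavior described above.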

Cloudflare Workflows vs Inngest

| Dimension | Inngest | Cloudflare Workflows |
| --- | --- | --- |
| Model | Event-driven step functions | Agent-integrated durable workflows |
| Trigger | Events (any source via HTTP) | Agent runWorkflow() or manual |
| Steps | step.run(), step.sleep(), step.waitForEvent() | step.do(), step.sleep(), waitForApproval() |
| Infrastructure | Managed (SaaS) or self-hosted | Managed (Cloudflare) |
| Compute | Your serverless functions (Lambda, Vercel, CF Workers) | Workers only |
| Fan-out | Built-in (step.sendEvent()) | Manual (queue publish in step) |
| Concurrency control | Built-in (per-function, per-key) | Manual |
| Pricing | Per step execution | Included in Workers |

Pros of Inngest: Platform-agnostic. Built-in concurrency control. Event-driven triggers from any source. Can invoke functions on any serverless platform.

Pros of CF Workflows: No external dependency. Same-network execution (no HTTP hop to invoke). Agent state integration. Simpler if you’re already on Cloudflare.

Cloudflare Workflows vs Restate

Restate is a newer durable execution engine that sits between Temporal’s power and Inngest’s simplicity.

| Dimension | Restate | Cloudflare Workflows |
| --- | --- | --- |
| Model | Durable async/await with journaling | Durable steps with checkpoints |
| Execution | Proxied function calls (like Temporal, but lighter) | Sequential step.do() calls |
| State | Built-in key-value state per handler | Agent SQLite + key-value state |
| Versioning | First-class handler versioning | No native versioning |
| Infrastructure | Restate Server (self-hosted or cloud) | Fully managed |
| Language | TypeScript, Java, Kotlin, Go | TypeScript |

Pros of Restate: Solves the immutability problem (handler versioning). Lighter than Temporal. Virtual objects (similar to Durable Objects).

Pros of CF Workflows: No external server. Deeply integrated with the rest of the Cloudflare stack (Queues, D1, R2, Agents).


Compared: Stateful Compute

The BrandAgent pattern (persistent, stateful, addressable compute) has equivalents on other platforms.

Cloudflare Durable Objects / Agents SDK vs Azure Durable Entities

| Dimension | Azure Durable Entities | CF Durable Objects / Agents SDK |
| --- | --- | --- |
| Model | Entity functions (actor-like) | Durable Objects (actor model) |
| Storage | Azure Table Storage (managed) | SQLite per object (co-located) |
| Addressing | Entity ID | Object ID (name-based) |
| Communication | Signals (one-way) or calls (request/response) | RPC, WebSocket, HTTP |
| Scheduling | Durable timers | this.schedule() (cron or delay) |
| Real-time | SignalR integration | Native WebSocket, state sync to clients |
| Colocation | Compute and storage may be separate | Compute and storage in the same thread |

Key advantage of CF: Storage is co-located with compute in the same thread. No network hop to read state. This is unique: on Azure, entity state is in Table Storage, which means a network round-trip on every read.

Cloudflare Durable Objects vs AWS DynamoDB + Lambda

AWS doesn’t have a direct equivalent to Durable Objects. The closest pattern is Lambda functions triggered by DynamoDB Streams:

CF:     Request → Worker → Durable Object (state + compute in one place)

AWS:    Request → Lambda → DynamoDB (state)
                         → DynamoDB Streams → Lambda (react to changes)

| Dimension | DynamoDB + Lambda | Cloudflare Durable Objects |
| --- | --- | --- |
| State access | Network hop to DynamoDB | Same-thread SQLite (0ms) |
| Change events | DynamoDB Streams → Lambda | Not built-in (use Queues or outbox) |
| Consistency | Eventually consistent reads (strong optional) | Strongly consistent (single-threaded) |
| Addressing | Partition key | Object name |
| Cost | DynamoDB RCU/WCU + Lambda invocations | Durable Object requests + duration |
| Actor model | No (you build it) | Yes (one instance per ID, serialized access) |

Pros of DynamoDB+Lambda: Mature. Global tables. DynamoDB Streams for change data capture (CF has no equivalent of change feeds).

Pros of CF Durable Objects: Single-threaded actor model eliminates concurrency bugs. Storage is in the same thread, so there is no network hop. WebSocket support built-in. Agents SDK adds scheduling, workflows, and state sync for free.

Cloudflare Agents SDK vs Microsoft Orleans (Virtual Actors)

The Agents SDK’s actor-per-entity pattern is directly inspired by the virtual actor model that Orleans pioneered and deco-cx/actors adapted for edge:

| Dimension | Orleans | Cloudflare Agents SDK |
| --- | --- | --- |
| Language | C# (.NET) | TypeScript |
| Runtime | .NET cluster (self-hosted or Azure) | Cloudflare Workers (managed) |
| Actor lifecycle | Virtual: activated on demand, deactivated on idle | Same: created on first access, hibernates |
| Persistence | Pluggable (Azure Storage, SQL, etc.) | SQLite (co-located) |
| Streams | Orleans Streams (pub/sub between actors) | No built-in inter-agent pub/sub (use Queues) |
| Timers/Reminders | Built-in (survive restarts) | this.schedule() (survives restarts) |
| Reentrancy | Configurable | Single-threaded (no reentrancy) |

Key insight: Orleans invented the pattern. CF Agents SDK implements it on edge infrastructure. The mental model is the same: one actor per entity (one BrandAgent per brand), activated on demand, sleeps when idle, state persists across activations.
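
The virtual actor lifecycle can be sketched as a toy in-memory runtime. Neither Orleans nor the Agents SDK works exactly like this; `BrandActor` and `ActorRuntime` are hypothetical names used only to show "activate on demand, hibernate when idle, state survives":

```typescript
class BrandActor {
  constructor(readonly brand: string, public articles: number = 0) {}
}

// Toy runtime: at most one live instance per ID, with state that outlives it.
class ActorRuntime {
  private active = new Map<string, BrandActor>();
  private persisted = new Map<string, number>();
  activations = 0;

  // Returns the live actor, activating it from persisted state if needed.
  get(brand: string): BrandActor {
    let actor = this.active.get(brand);
    if (!actor) {
      actor = new BrandActor(brand, this.persisted.get(brand) ?? 0);
      this.active.set(brand, actor);
      this.activations++;
    }
    return actor;
  }

  // Simulates idle eviction: save state, then drop the in-memory instance.
  hibernate(brand: string): void {
    const actor = this.active.get(brand);
    if (!actor) return;
    this.persisted.set(brand, actor.articles);
    this.active.delete(brand);
  }
}
```

Callers never construct or destroy actors themselves; they only ask the runtime for an ID, which is what makes the actor "virtual".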


Compared: Full Stack Architectures

How does a full event-driven system on Cloudflare compare to equivalent architectures on other platforms?

Cloudflare Stack vs AWS Serverless Stack

| Layer | Cloudflare | AWS |
| --- | --- | --- |
| Compute | Workers | Lambda |
| Message queue | Queues | SQS |
| Event bus | Queues + fan-out consumer | EventBridge or SNS |
| Stateful compute | Durable Objects / Agents SDK | Step Functions + DynamoDB (no direct equivalent) |
| Durable workflows | Workflows / AgentWorkflow | Step Functions |
| Database | D1 (SQLite) | DynamoDB or Aurora Serverless |
| Object storage | R2 | S3 |
| Real-time | WebSockets on Durable Objects | API Gateway WebSocket + Lambda |
| Scheduling | Agent this.schedule() or Cron Triggers | EventBridge Scheduler |
| AI inference | Workers AI | Bedrock |
| Observability | Logpush, Workers Analytics | CloudWatch, X-Ray |

Pros of AWS: Broader service catalog. More mature. Larger community. More third-party integrations. FIFO queues. Event replay (EventBridge Archive).

Pros of Cloudflare: Radically simpler. Fewer services to wire together. Co-located compute+storage (Durable Objects). Zero egress fees. Sub-100ms global latency. One language (TypeScript) for everything. Agents SDK is a uniquely integrated offering: stateful compute + scheduling + workflows + WebSocket + SQLite in one class.

The honest trade-off: AWS gives you more Lego bricks. Cloudflare gives you fewer bricks that fit together more tightly. If you need exactly the right brick (FIFO queues, event replay, cross-account event bus), AWS has it. If you want to build fast with less operational overhead, Cloudflare’s integrated stack is hard to beat.

When to NOT Use the Cloudflare Stack

Be honest about what it can’t do:

When the Cloudflare Stack Shines


Full Example: Content Pipeline

Here’s the complete flow: a BrandAgent wakes up, triggers research, reacts to results, generates content, and publishes. Every arrow is a queue message:

1. BrandAgent scheduled wake-up fires (this.schedule, cron)
2. Agent checks state: "Last research was 8 days ago. Need keywords."
3. Agent sends → RESEARCH_QUEUE:
     { type: "research.requested", payload: { brand_slug: "niche-fi", keywords: [...] } }

4. GatherFeed consumer picks up message
5. GatherFeed calls Perplexity via API Mom → keyword research
6. GatherFeed stores results in its D1
7. GatherFeed writes to outbox in same transaction
8. GatherFeed outbox flush → EVENTS_QUEUE:
     { type: "research.completed", payload: { brand_slug: "niche-fi", research_ids: [...] } }

9. Fan-out consumer reads event, routes to SM_COMMANDS queue

10. BrandAgent receives event, reads GatherFeed's DB via read API
11. Agent applies strategy: difficulty < 40, volume > 100, not published
12. Agent selects 10 keywords, starts ContentWorkflow for each

13. ContentWorkflow step.do("outline") → Gemini via API Mom
14. ContentWorkflow step.do("draft") → Gemini
15. ContentWorkflow step.do("editorial") → Gemini
16. ContentWorkflow step.do("store-and-publish") → D1 write + PUBLISH_QUEUE:
     { type: "content.publish", payload: { brand_slug: "niche-fi", content_id: "..." } }

17. Pages-plus consumer picks up message, reads content from SM's read API
18. Pages-plus writes to its D1, articles go live
19. Pages-plus writes to outbox → EVENTS_QUEUE:
     { type: "content.published", payload: { brand_slug: "niche-fi", urls: [...] } }

20. Fan-out consumer routes to SM_COMMANDS and SOCIAL_COMMANDS

21. BrandAgent records: articles live. Schedules performance check in 7 days.
22. Social-good receives event, creates social posts promoting the content.

23. Seven days later: BrandAgent wakes up, reads analytics, adjusts strategy.
24. Cycle continues.

No service waited for another. GatherFeed could take 5 minutes or 5 hours. If step 14 crashed, the workflow resumed at step 14, not step 1. Every service only knows about its own queue and the events queue. Add a new service by adding a line to the fan-out consumer’s route table.
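
The "add a line to the route table" claim can be pictured with a declarative table like the one below. It mirrors steps 9 and 20 of the flow; representing targets as binding names (rather than `Queue` objects) and the names `ROUTES` and `routesFor` are assumptions made to keep the sketch self-contained:

```typescript
// Hypothetical route table: event type -> names of queue bindings to forward to.
const ROUTES: Record<string, readonly string[]> = {
  "research.completed": ["SM_COMMANDS"],
  "content.published": ["SM_COMMANDS", "SOCIAL_COMMANDS"],
  // Onboarding a new consumer is one more binding name in the relevant list.
};

// Unknown event types route nowhere instead of throwing.
function routesFor(eventType: string): readonly string[] {
  return ROUTES[eventType] ?? [];
}
```

In the fan-out consumer, each returned name would be looked up on `env` and the event forwarded with that queue's `send()`.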



License

MIT

