No media processing service is reliable enough to call once and trust. The architecture that survives production is not the one that expects everything to work — it is the one that expects everything to fail and handles it gracefully. Build for 80% reliability, instrument everything, retry selectively, and keep a fallback chain for every capability that matters.
Table of Contents
- 1. The Uncomfortable Truth
- 2. The Design Principle
- 3. Validation at Every Boundary
- 4. Retry Strategies Per Capability Type
- 5. Fallback Chains
- 6. The Sora Lesson
- 7. Memory as a Weapon
- 8. ElevenLabs Horror Stories
- 9. The Compound Effect of Unreliability
- 10. Monitoring What Matters
- The Architecture That Survives
- Related Articles
- Reference Implementation
1. The Uncomfortable Truth
Here is the data, collected from real production pipelines between 2024 and early 2026:
| Capability | Observed failure mode | Effective reliability |
|---|---|---|
| OpenAI TTS (tts-1) | Truncates output to 50–60% of intended content on long inputs | ~60% |
| ElevenLabs | Non-deterministic output; unauthorized billing charges exceeding $2K; account lockouts without warning | ~70% |
| Edge TTS | Long text truncation; audio degradation after extended continuous use | ~80% |
| FFmpeg | 8 GB memory consumed in under 60 seconds on corrupted input streams | Dependent on input validation |
| fal.ai (Wan2) | Internal server errors; generation timeouts with no partial output | ~75% |
| Whisper | Timing drift begins at 20+ minute audio; transcripts desync from media | ~85% for short audio |
| Pexels API | Hard 200 req/hr limit; burst requests trigger 429 with no header warning | Rate-dependent |
| Cloudflare R2 | March 2025 incident: 100% write failure for over one hour | 99.9%+ (but incidents happen) |
| Sora | Entire service shut down March 24, 2026 | 0% (service terminated) |
These are not edge cases. They are the normal operating conditions of production media pipelines. Every capability in that table was relied upon by pipelines that expected it to work. Every one of them failed in ways that were not recoverable without deliberate architectural preparation.
The OpenAI TTS truncation issue is particularly instructive. The API returns HTTP 200. The response contains audio. A naive validator sees success. A duration-aware validator catches that the narration runs 23 seconds when the script length implies about 45 seconds, and the silent truncation is exposed. Without that check, the pipeline ships a video where the narration stops halfway through and the remaining footage plays in silence.
No one filed a support ticket. The model just stopped generating audio after hitting an internal length boundary, returned what it had, and called it done.
This is the shape of production failure in media pipelines: not crashes, not 500 errors, not connection timeouts — silent partial success. The call completes. The output is wrong. Only validation catches it.
2. The Design Principle
There is a simple asymmetry in how architectural assumptions fail:
If you build for 100% reliability and get 80%, your system crashes. Pipelines stall. Steps that depend on the failed capability time out. Without retry logic, a single transient error kills the entire job. Without fallbacks, a provider outage kills every job until it recovers.
If you build for 80% reliability and get 95%, your system is graceful. Retries fire and succeed on the first or second attempt. Fallbacks are available but unused. Monitoring shows a healthy retry rate. The pipeline operator sees nothing alarming.
The two approaches are not symmetrical in cost. Building for 80% reliability costs engineering time upfront: validation logic, retry policies, fallback chains, monitoring dashboards. Building for 100% reliability costs nothing upfront and costs catastrophically later, at a moment you cannot control.
The principle: always design for the worst case you can plausibly expect, not the average case you would prefer.
For a capability like TTS, that means assuming any given call may produce truncated, silent, or distorted output. The architecture that survives this assumption is not complex — it is just validation, retry, and fallback applied consistently. The architecture that ignores this assumption looks simpler right up until the moment it fails in a way that cannot be patched without a redesign.
The tight loop article makes a related point about systems: “Don’t fix the bug. Fix the system that let the bug live.” In capability reliability terms, this translates directly: don’t fix the truncated audio file. Fix the system that allowed a truncated audio file to make it past the TTS step without detection.
3. Validation at Every Boundary
Every capability output must be validated before the pipeline advances to the next step. The validation is not optional and is not delegated to downstream steps — it happens at the boundary, immediately after the capability completes, before anything else runs.
The pattern:
import { promises as fs } from "node:fs";

interface ValidationResult {
  valid: boolean;
  reason?: string;
  measured?: Record<string, number | string>;
}

// getAudioDuration (an ffprobe wrapper) and estimateTTSDuration are helpers defined elsewhere.
async function validateTTSOutput(
  outputPath: string,
  inputText: string
): Promise<ValidationResult> {
  const exists = await fs.stat(outputPath).catch(() => null);
  if (!exists) return { valid: false, reason: "output file missing" };
  if (exists.size === 0) return { valid: false, reason: "output file is empty" };

  const duration = await getAudioDuration(outputPath); // ffprobe
  const expectedDuration = estimateTTSDuration(inputText); // ~150 words/min
  const ratio = duration / expectedDuration;
  const measured = { duration, expectedDuration, ratio };

  if (ratio < 0.75) {
    return { valid: false, reason: "audio duration too short, likely truncated", measured };
  }
  if (ratio > 1.3) {
    // Upper bound of the 75–130% band listed in the requirements
    return { valid: false, reason: "audio duration too long", measured };
  }
  return { valid: true, measured };
}
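The validator above relies on a duration estimate. A minimal sketch of that estimate and of the ratio check follows, assuming an average speaking rate of 150 words per minute; the function `checkDurationRatio` and the default thresholds are illustrative names and values, not part of any library.

```typescript
// Hypothetical helper: estimate narration length in seconds from word count.
// The 150 words/min default is an assumed average speaking rate.
function estimateTTSDuration(text: string, wordsPerMinute = 150): number {
  const words = text.trim().split(/\s+/).filter(w => w.length > 0).length;
  return (words / wordsPerMinute) * 60;
}

// Classify a measured duration against the estimate using a min/max ratio band.
function checkDurationRatio(
  measuredSec: number,
  expectedSec: number,
  minRatio = 0.75,
  maxRatio = 1.3
): { valid: boolean; ratio: number; reason?: string } {
  if (expectedSec <= 0) {
    return { valid: false, ratio: 0, reason: "no expected duration" };
  }
  const ratio = measuredSec / expectedSec;
  if (ratio < minRatio) {
    return { valid: false, ratio, reason: "too short, likely truncated" };
  }
  if (ratio > maxRatio) {
    return { valid: false, ratio, reason: "too long" };
  }
  return { valid: true, ratio };
}
```

A 150-word script is estimated at 60 seconds, so a 30-second file yields a ratio of 0.5 and is rejected as truncated. Real speaking rates vary by voice and language, so the band is deliberately wide.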
Validation requirements by capability type:
TTS (text-to-speech)
- Output file exists and size > 0
- Audio duration is within 75–130% of expected duration based on word count
- Audio is decodable (not a corrupt container)
- Sample rate matches what the pipeline expects
FFmpeg render
- Output file exists and size > threshold (not a near-empty file)
- Duration > 0 and within expected range
- Video stream is present and decodable
- Resolution matches the requested dimensions
async function validateFFmpegOutput(
  outputPath: string,
  expectedDurationSec: number,
  expectedResolution: { width: number; height: number }
): Promise<ValidationResult> {
  const probe = await ffprobe(outputPath);
  if (!probe) return { valid: false, reason: "ffprobe failed: file unreadable" };

  const video = probe.streams.find(s => s.codec_type === "video");
  if (!video) return { valid: false, reason: "no video stream in output" };

  const duration = parseFloat(probe.format.duration ?? "0");
  if (duration < 0.5) return { valid: false, reason: "duration near zero" };

  const durationRatio = duration / expectedDurationSec;
  if (durationRatio < 0.9 || durationRatio > 1.1) {
    return { valid: false, reason: "duration out of expected range", measured: { duration, expectedDurationSec } };
  }
  if (video.width !== expectedResolution.width || video.height !== expectedResolution.height) {
    return { valid: false, reason: "resolution mismatch", measured: { width: video.width, height: video.height } };
  }
  return { valid: true };
}
AI video generation
- File exists and is a valid MP4 (not a partial download or corrupt container)
- Duration > 0
- Resolution matches the generation request
- File size is above a minimum threshold (a 1-second 720p video should not be 12 KB)
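The checks in this list can be sketched as a pure function over already-probed metadata. The `VideoProbe` shape and the 100 kbit/s floor are assumptions chosen for illustration; a real pipeline would fill the probe from ffprobe output and tune the floor per resolution.

```typescript
// Hypothetical summary of ffprobe output for a generated clip.
interface VideoProbe {
  container: string; // e.g. "mp4"
  durationSec: number;
  width: number;
  height: number;
  sizeBytes: number;
}

// Assumed floor: below ~100 kbit/s a clip is almost certainly broken.
const MIN_BITS_PER_SECOND = 100000;

function validateGeneratedVideo(
  probe: VideoProbe,
  requested: { width: number; height: number }
): { valid: boolean; reason?: string } {
  if (probe.container !== "mp4") {
    return { valid: false, reason: "not an MP4 container" };
  }
  if (!(probe.durationSec > 0)) {
    return { valid: false, reason: "duration is zero or unknown" };
  }
  if (probe.width !== requested.width || probe.height !== requested.height) {
    return { valid: false, reason: "resolution mismatch" };
  }
  // Minimum plausible size scales with duration.
  const minBytes = (probe.durationSec * MIN_BITS_PER_SECOND) / 8;
  if (probe.sizeBytes < minBytes) {
    return { valid: false, reason: "file too small for its duration" };
  }
  return { valid: true };
}
```

With this floor, a 1-second 720p clip must be at least 12,500 bytes, so the 12 KB file from the list above is rejected.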
Stock search
- Response is parseable JSON
- At least N results returned (N is a pipeline parameter, typically 5–10)
- At least one result has a usable media URL
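A stock-search validator along these lines might look like the sketch below. It assumes a response shaped like `{ "results": [{ "url": "..." }] }`; real stock APIs name these fields differently, so the field names here are placeholders.

```typescript
interface StockCheck {
  valid: boolean;
  reason?: string;
  usable?: number; // count of results with a usable media URL
}

// Assumed response shape: { results: [{ url?: string }, ...] }.
function validateStockResponse(body: string, minResults = 5): StockCheck {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return { valid: false, reason: "response is not parseable JSON" };
  }
  const results = (parsed as { results?: unknown } | null)?.results;
  if (!Array.isArray(results)) {
    return { valid: false, reason: "no results array" };
  }
  if (results.length < minResults) {
    return { valid: false, reason: `only ${results.length} results, need ${minResults}` };
  }
  const usable = results.filter(
    r => typeof r?.url === "string" && /^https?:\/\//.test(r.url)
  ).length;
  if (usable === 0) {
    return { valid: false, reason: "no result has a usable media URL" };
  }
  return { valid: true, usable };
}
```

An HTML error page returned with status 200 fails the first check, which is the silent-partial-success case this layer exists to catch.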
The validation layer is cheap. A full validation pass on a rendered video via ffprobe takes under 100 milliseconds. The cost of skipping it is a downstream step that silently processes corrupted input and produces corrupted output — compounding the damage across every step that follows.
4. Retry Strategies Per Capability Type
Not all capabilities are safe to retry in the same way. Treating them uniformly leads to either missed recovery opportunities or unintended side effects.
Idempotent capabilities — TTS, render, thumbnail generation
Same input always produces equivalent output (or close enough). Safe to retry without concern. The risk is cost and latency, not correctness.
async function retryIdempotent<T>(
  fn: () => Promise<T>,
  validate: (result: T) => ValidationResult | Promise<ValidationResult>,
  maxAttempts = 3
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fn();
    const validation = await validate(result); // accepts sync or async validators
    if (validation.valid) return result;
    if (attempt < maxAttempts) {
      // Exponential backoff between attempts: 2s, then 4s (with maxAttempts = 3)
      const backoffMs = Math.pow(2, attempt) * 1000;
      await sleep(backoffMs);
    } else {
      throw new Error(`Failed after ${maxAttempts} attempts: ${validation.reason}`);
    }
  }
  throw new Error("unreachable");
}
Non-idempotent capabilities — AI video generation, image generation
Same input does not produce the same output. Each retry generates different content. This is sometimes acceptable (any valid video is fine) and sometimes not (the pipeline is expecting a specific shot that matched a script segment). The pipeline author must specify the policy explicitly — do not inherit a default.
type NonIdempotentRetryPolicy =
| { mode: "any-valid"; maxAttempts: number } // retry freely, accept any valid output
| { mode: "fail-fast" } // one attempt only
| { mode: "budget-gated"; budgetUsd: number }; // retry only if budget allows
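One way to make the policy actionable is to resolve it into a concrete attempt count before the first call. This is a sketch under the assumption that each attempt has a known, fixed cost; `allowedAttempts` and `costPerAttemptUsd` are hypothetical names.

```typescript
type NonIdempotentRetryPolicy =
  | { mode: "any-valid"; maxAttempts: number }
  | { mode: "fail-fast" }
  | { mode: "budget-gated"; budgetUsd: number };

// Resolve a policy into the number of attempts it permits.
function allowedAttempts(
  policy: NonIdempotentRetryPolicy,
  costPerAttemptUsd: number
): number {
  if (costPerAttemptUsd <= 0) {
    throw new Error("costPerAttemptUsd must be positive");
  }
  switch (policy.mode) {
    case "any-valid":
      return Math.max(1, policy.maxAttempts);
    case "fail-fast":
      return 1;
    case "budget-gated":
      // As many whole attempts as the budget covers; may be zero.
      return Math.max(0, Math.floor(policy.budgetUsd / costPerAttemptUsd));
  }
}
```

Because the union is exhaustive, adding a new mode forces a compile error here, which keeps the "no inherited default" rule enforced by the type checker.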
Rate-limited capabilities — Pexels, fal.ai, stock APIs
Retrying immediately after a 429 makes things worse. The correct behavior: read the Retry-After header or X-RateLimit-Reset header, wait until the rate limit window resets, then retry.
async function retryRateLimited<T>(
  fn: () => Promise<{ result: T; headers: Headers }>,
  maxAttempts = 3
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const { result } = await fn();
      return result;
    } catch (e: any) {
      // Assumes fn() throws an error carrying the HTTP status and response headers.
      if (e?.status !== 429) throw e;
      if (attempt === maxAttempts) break;
      const headers: Headers = e.headers ?? new Headers();
      const retryAfterSec = parseInt(headers.get("Retry-After") ?? "", 10);
      const waitMs = Number.isFinite(retryAfterSec)
        ? retryAfterSec * 1000
        : getRateLimitResetMs(headers); // parse X-RateLimit-Reset
      await sleep(waitMs);
    }
  }
  throw new Error("Rate limit retry exhausted");
}
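The header parsing deserves its own function, because `Retry-After` can carry either a number of seconds or an HTTP date. The sketch below handles both; it assumes `X-RateLimit-Reset` holds Unix epoch seconds, which is a common convention but not a standard, so check the provider's documentation.

```typescript
// Convert rate-limit headers into a wait time in milliseconds.
// Retry-After may be delta-seconds or an HTTP-date.
// X-RateLimit-Reset is assumed here to be Unix epoch seconds.
function waitMsFromHeaders(
  retryAfter: string | null,
  rateLimitReset: string | null,
  nowMs: number,
  defaultMs = 60000
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    if (/^\d+$/.test(trimmed)) {
      return parseInt(trimmed, 10) * 1000;
    }
    const dateMs = Date.parse(trimmed);
    if (!Number.isNaN(dateMs)) {
      return Math.max(0, dateMs - nowMs);
    }
  }
  if (rateLimitReset !== null && /^\d+$/.test(rateLimitReset.trim())) {
    const resetMs = parseInt(rateLimitReset.trim(), 10) * 1000;
    return Math.max(0, resetMs - nowMs);
  }
  // Neither header is usable: fall back to a conservative default.
  return defaultMs;
}
```

Passing `nowMs` in, rather than calling `Date.now()` inside, keeps the function deterministic and easy to test.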
Expensive capabilities — Runway, Kling, high-cost video generation
These cost real money per attempt. Auto-retry is not the right default. The pipeline must explicitly authorize a retry — either via a budget parameter or via a human-in-the-loop escalation.
async function runExpensiveCapability<T>(
fn: () => Promise<T>,
validate: (result: T) => ValidationResult,
options: { maxCostUsd: number; currentCostUsd: number }
): Promise<T> {
const result = await fn();
const validation = validate(result);
if (validation.valid) return result;
const retryWouldExceedBudget = options.currentCostUsd + COST_PER_ATTEMPT > options.maxCostUsd;
if (retryWouldExceedBudget) {
throw new Error(`Validation failed and retry would exceed budget. Failing step. Reason: ${validation.reason}`);
}
// budget allows — escalate for human decision before retrying
await escalateToHuman({ reason: validation.reason, costIfRetried: COST_PER_ATTEMPT });
throw new Error("Escalated to human — pipeline paused");
}
5. Fallback Chains
When a primary capability fails validation after all retries are exhausted, the next option is not failure — it is the next provider in the fallback chain. A fallback chain is a prioritized list of providers that can satisfy the same capability, ordered from preferred (highest quality, potentially highest cost) to most available (lowest quality, always available).
TTS fallback chain:
text-to-speech:
Try: ElevenLabs (best quality, highest cost, non-idempotent)
Fallback 1: OpenAI TTS (good quality, reliable, idempotent)
Fallback 2: Edge TTS (acceptable quality, free, always available)
Terminal: fail the step with structured error
AI video fallback chain:
ai-video-generation:
Try: fal.ai Wan2 (fast, affordable)
Fallback 1: Kling (slower, higher quality)
Fallback 2: Runway (premium, expensive)
Terminal: fail with escalation to human
Stock footage fallback chain:
stock-search:
Try: Pexels (free tier, 200/hr)
Fallback 1: Pixabay (free, different catalog)
Fallback 2: Storyblocks (subscription, large catalog)
Terminal: return empty results, pipeline handles gracefully
API Mom’s CapabilityRegistryDO manages these chains automatically. The pipeline YAML declares text-to-speech as a capability dependency — not elevenlabs. API Mom resolves the capability to the highest-priority available provider, attempts the call, validates the result using the declared validation schema, and cascades to the next provider on failure. The pipeline code never sees provider names.
// Pipeline code — capability-aware, not provider-aware
const audio = await capabilityRegistry.invoke("text-to-speech", {
text: scriptSegment,
voice: "narrator",
validate: {
minDurationRatio: 0.75,
maxDurationRatio: 1.30
}
});
// API Mom internals — provider resolution, retry, fallback
// The pipeline author does not write this
The separation matters. When ElevenLabs locks out your account or charges unauthorized fees, the pipeline does not need to change. API Mom marks ElevenLabs as unhealthy, and all future text-to-speech requests route to OpenAI TTS until ElevenLabs is restored or removed from the chain.
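The cascade itself is small. The following is an illustrative sketch of walking a chain in priority order, not API Mom's actual implementation; `Provider`, `ChainResult`, and `invokeWithFallback` are hypothetical names, and per-provider retries are omitted for brevity.

```typescript
interface Provider<T> {
  name: string;
  call: () => Promise<T>;
}

interface ChainResult<T> {
  output: T;
  provider: string;
  usedFallback: boolean;
  failures: string[]; // one entry per provider that was skipped past
}

// Walk the chain in priority order; cascade on thrown errors or failed validation.
async function invokeWithFallback<T>(
  chain: Provider<T>[],
  validate: (output: T) => { valid: boolean; reason?: string }
): Promise<ChainResult<T>> {
  const failures: string[] = [];
  let index = 0;
  for (const provider of chain) {
    try {
      const output = await provider.call();
      const check = validate(output);
      if (check.valid) {
        return { output, provider: provider.name, usedFallback: index > 0, failures };
      }
      failures.push(`${provider.name}: ${check.reason ?? "validation failed"}`);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${message}`);
    }
    index++;
  }
  // Terminal: structured error listing why each provider was rejected.
  throw new Error(`all providers failed: ${failures.join("; ")}`);
}
```

The `failures` list is what makes the terminal error structured: the operator sees every provider that was tried and why it was rejected.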
6. The Sora Lesson
On March 24, 2026, OpenAI shut down the Sora service.
Not a degradation. Not a rate limit. Not a temporary outage. The entire service ceased to exist.
Any pipeline that had hardcoded sora as its video generation provider stopped working permanently and silently. The job would submit. The provider call would fail. Without a fallback chain, the pipeline would fail. Even with a fallback chain, a pipeline whose fallback logic cascades only on transient errors would never reach the next provider, because the top-level error was not a transient failure. It was a permanent one.
The lesson is not specific to Sora. It applies to any third-party capability:
Platforms can disappear overnight. A startup goes bankrupt. An API is deprecated in favor of a newer model. A service pivots away from the market you depend on. The gap between “working yesterday” and “gone today” can be a single business decision by a company whose priorities are not yours.
The capability architecture is the protection. The pipeline YAML references ai-video-generation. API Mom routes to whatever provider is alive. When Sora disappears from the registry, all pipelines that referenced ai-video-generation continue working — routed automatically to the next available provider. The only change is a single line in the provider registry: marking Sora as permanently unavailable.
# pipeline.yaml: what you write
steps:
  - capability: ai-video-generation
    params:
      duration: 5
      style: cinematic

# What you do NOT write
steps:
  - provider: sora  # <-- this breaks permanently on March 24, 2026
    endpoint: https://sora.openai.com/...
The capability contract also protects against the inverse: a new, better provider becomes available. The pipeline automatically benefits without any changes. The abstraction layer that feels like overhead during early development becomes a compounding structural advantage over the life of the pipeline.
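A registry that distinguishes "temporarily unhealthy" from "permanently gone" can be sketched as follows. The status values, the `resolveProviders` function, and the choice to try unhealthy providers last rather than skip them are all assumptions made for illustration.

```typescript
type ProviderStatus = "healthy" | "unhealthy" | "retired";

interface RegistryEntry {
  provider: string;
  priority: number; // lower number = preferred
  status: ProviderStatus;
}

type Registry = Record<string, RegistryEntry[]>;

// Return candidate providers for a capability, in the order they should be tried.
// Retired providers are never returned; unhealthy ones are tried last.
function resolveProviders(registry: Registry, capability: string): string[] {
  const entries = registry[capability] ?? [];
  const rank = (e: RegistryEntry) => (e.status === "healthy" ? 0 : 1);
  return entries
    .filter(e => e.status !== "retired")
    .sort((a, b) => rank(a) - rank(b) || a.priority - b.priority)
    .map(e => e.provider);
}

// Hypothetical registry contents: retiring a provider is a one-field change.
const exampleRegistry: Registry = {
  "ai-video-generation": [
    { provider: "sora", priority: 1, status: "retired" },
    { provider: "fal-wan2", priority: 2, status: "healthy" },
    { provider: "kling", priority: 3, status: "healthy" },
  ],
};
```

Resolving `ai-video-generation` against `exampleRegistry` skips the retired entry and returns the two live providers in priority order, with no change to any pipeline definition.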
7. Memory as a Weapon
FFmpeg processing a corrupted input stream can consume 8 GB of memory in under 60 seconds. This is not a theoretical case — it is a documented behavior when certain container formats have corrupted or missing duration metadata, causing FFmpeg to buffer indefinitely while trying to determine stream length before beginning processing.
On a machine running multiple concurrent pipeline jobs, this behavior can exhaust all available memory, trigger the OOM killer, and terminate unrelated processes. The affected pipeline fails. Every other pipeline on the machine may also fail or be corrupted.
Memory, in this context, is a weapon held by the input. Any input from an external source — a user upload, a downloaded file from a third-party API, a generated file from an AI video service — is potentially corrupted. The capability script must treat it as such.
The required defensive protocol for any FFmpeg capability:
#!/bin/bash
set -euo pipefail

INPUT="$1"
OUTPUT="$2"

# 1. Validate input before any processing.
#    The probe output is captured in a variable so concurrent jobs never share a temp file.
if ! PROBE=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=codec_type,duration \
    -of json "$INPUT" 2>/dev/null); then
  echo '{"error": "input_unreadable"}' >&2
  exit 1
fi

DURATION=$(jq -r '.streams[0].duration // "unknown"' <<<"$PROBE")
if [ "$DURATION" = "unknown" ] || [ "$DURATION" = "N/A" ]; then
  echo '{"error": "missing_duration_metadata"}' >&2
  exit 1
fi

# 2. Set memory limit (ulimit -v takes KB, so this is 500 MB of virtual memory)
ulimit -v $((500 * 1024))

# 3. Set hard timeout: kill after 5 minutes regardless
timeout 300 ffmpeg \
  -i "$INPUT" \
  -c:v libx264 -preset fast \
  -c:a aac \
  -movflags +faststart \
  "$OUTPUT" \
  || { echo '{"error": "ffmpeg_failed_or_timeout"}' >&2; exit 1; }

# 4. Validate output
if [ ! -s "$OUTPUT" ]; then
  echo '{"error": "output_empty"}' >&2
  exit 1
fi
The SIGTERM handler for temp file cleanup:
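TEMP_DIR=$(mktemp -d)
cleanup() {
  rm -rf "$TEMP_DIR"
}
# Clean up on every exit path. The signal traps call exit so the script
# stops instead of resuming after the handler; exit then fires the EXIT trap.
trap cleanup EXIT
trap 'exit 143' TERM
trap 'exit 130' INT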
This is not defensive programming. It is correct programming for an environment where inputs are not trusted. The validation before processing is the most important step — it avoids the corrupted-stream memory consumption entirely by refusing to process inputs that lack valid metadata.
The timeout is the backstop. If validation passes but processing encounters a pathological case, the hard 5-minute kill prevents runaway resource consumption. Five minutes is generous for any media processing job that should complete in under 90 seconds. A job that takes longer is either processing input too large for this capability, or it is stuck.
8. ElevenLabs Horror Stories
ElevenLabs is a capable TTS provider with demonstrably superior voice quality compared to alternatives. It is also the provider that has produced the most severe production incidents in real pipelines.
Incident 1: Unauthorized billing. A pipeline running overnight generated significantly more API calls than planned due to a retry loop bug. ElevenLabs has no rate limit on the free tier for the operations that were called. The billing exceeded $2,000 in a single night. The account was charged before any alert could fire.
Incident 2: Account lockout. Following unusual billing patterns, ElevenLabs locked the account without prior notice. All pipelines using ElevenLabs as their primary TTS provider stopped working. No programmatic notification was sent; the failure was discovered when pipeline operators noticed audio missing from generated content.
Incident 3: Non-deterministic output. The same script, sent twice with identical parameters including voice ID and stability settings, produced measurably different output files. Duration differed by 8%. Pronunciation of specific words differed. This is expected behavior for neural TTS, but pipelines that relied on consistent output for downstream processing (audio-video synchronization, transcription verification) failed in ways that were hard to trace.
The mitigations, applied now:
Never depend on a single TTS provider. ElevenLabs sits at position 1 in the fallback chain. OpenAI TTS is position 2. Edge TTS is position 3. If ElevenLabs fails validation after retries, the chain advances automatically.
Monitor billing programmatically. ElevenLabs provides a usage API. Poll it on every pipeline run, or on a schedule. Alert if the per-hour spend rate exceeds a threshold.
async function checkElevenLabsBilling(): Promise<void> {
const usage = await elevenlabs.getUsageStats({ period: "1h" });
if (usage.charactersUsed > CHAR_LIMIT_PER_HOUR) {
await alert({
severity: "critical",
message: `ElevenLabs usage exceeding threshold: ${usage.charactersUsed} chars/hr`,
action: "pause_pipelines_using_elevenlabs"
});
}
}
Validate output consistency. For idempotent use cases where the same voice and text are expected to produce comparable output, validate that successive calls produce audio within an acceptable duration range of each other. A ratio outside 0.85–1.15 between two calls for the same input indicates unexpected non-determinism.
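The consistency rule reduces to a single ratio check. This is a minimal sketch using the 0.85–1.15 band from the paragraph above; `durationsConsistent` is a hypothetical helper name.

```typescript
// Compare the durations of two generations of the same input.
// Returns false when either duration is missing or the ratio leaves the band.
function durationsConsistent(
  firstSec: number,
  secondSec: number,
  minRatio = 0.85,
  maxRatio = 1.15
): boolean {
  if (firstSec <= 0 || secondSec <= 0) return false;
  const ratio = firstSec / secondSec;
  return ratio >= minRatio && ratio <= maxRatio;
}
```

An 8% difference, as in Incident 3, stays inside this band, so the check flags only larger divergence; tighten the band if downstream synchronization needs it.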
Keep Edge TTS as a free, always-available fallback. Edge TTS (Microsoft Edge's online text-to-speech service, accessed via the edge-tts CLI) has no cost per character, no account lockout risk, and no billing surprises. Its quality is acceptable for most production use cases. It is not the first choice, but the first choice does not matter if the fallback is not there when needed.
9. The Compound Effect of Unreliability
Consider a pipeline with five sequential steps. Each step depends on the previous. No retry logic, no fallback chains.
If each step is 90% reliable, the pipeline success rate is:
0.9 × 0.9 × 0.9 × 0.9 × 0.9 = 0.59
A pipeline where each step works nine times out of ten will succeed less than six times out of ten. Forty-one percent failure rate, with no intervention, on a system where every individual component looks healthy.
The math is unforgiving at scale. A pipeline that runs 100 times per day generates 41 failures per day — each requiring investigation, resubmission, or manual remediation.
With retry (2 attempts per step, same provider):
Each step’s effective success rate rises from 90% to approximately:
P(step succeeds) = 1 - P(both attempts fail)
= 1 - (0.1 × 0.1)
= 1 - 0.01
= 0.99
Pipeline success rate:
0.99^5 = 0.95
From 59% to 95% — a 36-point improvement from a single retry per step.
With retry plus fallback chain:
A fallback chain catches failures that retries cannot — provider outages, rate limits, validation failures that indicate a systematic issue with the primary provider rather than a transient error.
If the fallback provider has 90% reliability for cases that the primary failed, the effective per-step success rate rises further:
P(step succeeds) = P(primary succeeds) + P(primary fails) × P(fallback succeeds)
= 0.99 + 0.01 × 0.90
= 0.999
Pipeline success rate:
0.999^5 ≈ 0.995
From 59% to 99.5%. The same five-step pipeline, no changes to the underlying capability providers, no improvements to their individual reliability — just retry logic and fallback chains applied consistently.
| Configuration | Per-step reliability | 5-step pipeline success |
|---|---|---|
| No retry, no fallback | 90% | 59% |
| 2 attempts (1 retry), no fallback | 99% | 95% |
| 2 attempts + fallback chain | 99.9% | 99.5% |
This is why retry plus fallback at every step is non-negotiable — not as a best practice, but as a mathematical requirement for operating a multi-step pipeline with acceptable success rates.
The same math is harsher at lower per-step reliability. A pipeline where each step has an 80% individual success rate (realistic for AI video generation + TTS + stock search combined) has a baseline success rate of 0.8^5 ≈ 33%. Without retry and fallback, one in three pipeline runs completes. With two attempts per step and a fallback that recovers 90% of the remaining failures, per-step reliability rises to 99.6% and the same pipeline completes about 98% of the time.
The engineering cost of retry and fallback chains is fixed. The operational cost of not having them scales linearly with pipeline volume.
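The arithmetic in this section can be reproduced with two small functions. The sketch assumes attempts fail independently of each other, which is optimistic during a provider outage; that correlated-failure case is what the fallback term models.

```typescript
// Per-step success with `attempts` independent tries on the primary provider,
// then an optional fallback that recovers a fraction of the remaining failures.
function stepSuccess(p: number, attempts = 1, fallbackRecovery = 0): number {
  const primary = 1 - Math.pow(1 - p, attempts);
  return primary + (1 - primary) * fallbackRecovery;
}

// Success rate of a pipeline of sequential steps with equal per-step reliability.
function pipelineSuccess(perStep: number, steps: number): number {
  return Math.pow(perStep, steps);
}

// The three configurations from the table, for a 5-step pipeline at 90% per step:
const noRetry = pipelineSuccess(stepSuccess(0.9), 5);            // about 0.59
const withRetry = pipelineSuccess(stepSuccess(0.9, 2), 5);       // about 0.95
const withFallback = pipelineSuccess(stepSuccess(0.9, 2, 0.9), 5); // about 0.995
```

Changing the first argument to 0.8 shows how quickly the baseline collapses: `pipelineSuccess(stepSuccess(0.8), 5)` is about 0.33.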
10. Monitoring What Matters
The monitoring layer must be specific enough to trigger on real problems and quiet enough to avoid alert fatigue. Four metrics cover the essential signal for capability reliability:
Per-capability success rate
Track success and failure for every capability independently. Do not roll up into a single pipeline success metric — that obscures which capability is degrading.
Alert threshold: success rate drops below 80% over a 15-minute window.
interface CapabilityMetric {
capabilityName: string;
provider: string;
timestamp: number;
success: boolean;
latencyMs: number;
validationResult?: ValidationResult;
retryCount: number;
usedFallback: boolean;
}
// Emit after every capability invocation
await metrics.emit(capabilityMetric);
Per-capability latency P99
Mean latency is misleading. P99 is the latency that 99% of calls meet or beat, so it captures the long tail that burns pipeline throughput. A TTS call that averages 800ms but has a P99 of 45 seconds indicates a systematic problem (probably timeouts hitting, being retried, succeeding on retry) that the mean hides.
Alert threshold: P99 doubles compared to 24-hour baseline.
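To see how far the mean and the tail can diverge, here is a small sketch using the nearest-rank percentile definition. Monitoring backends often use interpolated or approximate percentiles instead, so treat this as an illustration of the concept, not of any particular system's calculation.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// `pct` percent of all samples are less than or equal to it.
function percentile(samples: number[], pct: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((pct * sorted.length) / 100));
  return sorted[rank - 1] as number;
}

function mean(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// 98 calls at 800 ms plus 2 calls that hit a 45-second timeout path.
const latenciesMs: number[] = [
  ...Array(98).fill(800),
  45000,
  45000,
];
const meanMs = mean(latenciesMs);          // 1684
const p50Ms = percentile(latenciesMs, 50); // 800
const p99Ms = percentile(latenciesMs, 99); // 45000
```

The mean moves from 800 ms to 1,684 ms, which is easy to dismiss as noise; the P99 of 45 seconds is not.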
Retry rate
A healthy system has a low retry rate. An elevated retry rate means the primary provider is degrading and retries are masking it. This is exactly the scenario where monitoring needs to surface the problem before it gets worse.
Alert threshold: retry rate exceeds 20% of total calls for a given capability over any 30-minute window.
Fallback rate
A fallback being triggered means the primary provider has failed validation after all retries. This is a serious signal — not a transient error but a provider that is systematically failing.
Alert threshold: primary provider fails and fallback is triggered in more than 10% of calls for a given capability over any 60-minute window.
-- Dashboard query example
SELECT
  capability_name,
  provider,
  COUNT(*) AS total_calls,
  SUM(CASE WHEN success THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS success_rate,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99_latency_ms,
  -- share of calls that needed at least one retry
  SUM(CASE WHEN retry_count > 0 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS retry_rate,
  SUM(CASE WHEN used_fallback THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS fallback_rate
FROM capability_metrics
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY capability_name, provider
ORDER BY success_rate ASC;
The alert system is the tight loop operating at the capability level. The autonomous entity pattern describes escalation up through human levels when automated recovery fails. Monitoring these four metrics provides the signal that triggers escalation before a degraded capability causes widespread pipeline failure.
What not to monitor:
Do not alert on individual retries. Do not alert on individual fallback uses. Do not alert on single-call failures. These are noise at the call level and signal only at the rate level. The difference between a system that is healthy and a system that is degrading is not any single event — it is the pattern over time.
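Alerting on rates rather than events can be sketched as a function over a window of call records. For brevity this version applies one window to all three rates, whereas the thresholds above use 15, 30, and 60 minutes; `CallRecord` and `evaluateRates` are hypothetical names.

```typescript
interface CallRecord {
  timestamp: number; // ms since epoch
  success: boolean;
  retryCount: number;
  usedFallback: boolean;
}

interface RateAlerts {
  lowSuccess: boolean;   // success rate below 80%
  highRetry: boolean;    // more than 20% of calls retried
  highFallback: boolean; // more than 10% of calls used a fallback
}

// Evaluate rate thresholds over the records inside (nowMs - windowMs, nowMs].
function evaluateRates(records: CallRecord[], nowMs: number, windowMs: number): RateAlerts {
  const recent = records.filter(
    r => r.timestamp > nowMs - windowMs && r.timestamp <= nowMs
  );
  const total = recent.length;
  if (total === 0) {
    // No traffic is not a rate problem; absence of calls needs its own check.
    return { lowSuccess: false, highRetry: false, highFallback: false };
  }
  const share = (pred: (r: CallRecord) => boolean) => recent.filter(pred).length / total;
  return {
    lowSuccess: share(r => r.success) < 0.8,
    highRetry: share(r => r.retryCount > 0) > 0.2,
    highFallback: share(r => r.usedFallback) > 0.1,
  };
}
```

A single failed call among ten raises no alert here, while three retried calls among ten does, which is the distinction between event noise and rate signal.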
The Architecture That Survives
The pattern across all ten sections is consistent:
Assume failure. Validate at every boundary. Retry selectively. Fall back gracefully. Monitor rates, not events.
A pipeline built on these principles does not become more complex as it encounters more providers, more failure modes, more edge cases. It becomes more capable — because each new failure mode produces a validation rule, a retry policy, or a fallback entry that prevents the same failure from affecting any future run.
The autonomous entity pattern describes how accumulated fixes become skills. The tight loop principle describes how the system that catches bugs is more valuable than the fix for any individual bug. The compound math of unreliability shows why these properties are not optional at scale.
The 80% assumption is not pessimism. It is calibration. A system designed for the real world, not the ideal one.
Related Articles
- The Autonomous Entity Pattern — escalation ladder, skill crystallization, accumulated failure context
- The Tight Loop: How High-Reliability Systems Stay Alive — fix the system, not the bug
- API MOM as Intelligent Router — capability registry, provider routing, fallback chain management
- The Capability Primitive — decomposing monoliths into composable capabilities
- Cloudflare Autonomous Pipeline — end-to-end pipeline architecture
Reference Implementation
- garywu/api-mom — CapabilityRegistryDO: provider resolution, fallback chain, retry policies, validation schemas
- garywu/video-factory — capability scripts with validation, ulimit guards, SIGTERM cleanup handlers
- garywu/mulan — pipeline YAML format, capability invocation interface, escalation on validation failure