Failover & Fallbacks
Routero AI treats provider outages as routing problems, not application errors. Configure a fallback chain; the Router handles failure transparently — including during active streaming responses.
P99 failover decision + retry: <280 ms.
Configuring a fallback chain
# In your router config or policy YAML
router_settings:
fallbacks:
- openai/gpt-4o:
- anthropic/claude-sonnet-4-6-20250514
- bedrock/meta.llama4-maverick-17b-instruct-v1:0
num_retries: 3
retry_after: 0.08 # 80 ms base backoff
timeout: 30 # per-attempt timeout (seconds)
retry_on:
- 5xx
- timeout
- content_filter
When openai/gpt-4o returns a 5xx or times out, Routero retries on claude-sonnet-4-6, then on llama-4-maverick, before surfacing an error to the caller.
Error classification and retry behaviour
Routero classifies provider errors and chooses the retry strategy accordingly:
| Error type | Default behaviour |
|---|---|
5xx (server error) |
Retry on next deployment in fallback chain |
429 (rate limit) |
Retry on the same deployment after backoff (respects Retry-After header) |
content_filter |
Jump to next deployment (different model may not trip the filter) |
context_window |
Next deployment only if it has a larger context window |
auth_error |
Do not retry; surface error immediately |
timeout |
Retry on next deployment |
Streaming-aware failover
If a provider fails mid-stream, Routero replays only the undelivered tail on the fallback provider. The client receives one uninterrupted SSE stream — no dropped connection, no duplicate tokens, no client-side retry logic required.
Budget-aware fallback
Fallback respects your workspace’s spend policies. If the primary deployment would exceed a budget ceiling, the Router selects the next deployment in the chain before making the call — the budget check runs as part of the policy gate, before the provider call.
Region-pinned fallback
Fallback chains can be constrained to a data residency region. If your policy specifies residency: eu-only, the Router only considers deployments in EU regions for both primary and fallback selection.
→ Policy Routing · Data Residency & Regions
Per-request audit
Every retry and fallback decision is logged in the audit trail:
- Which provider was tried
- The error type and retry reason
- The fallback provider selected
- Total latency including retry overhead
The response includes headers with the chosen provider and retry count for debugging.