Failover & Fallbacks

Routero AI treats provider outages as routing problems, not application errors. Configure a fallback chain; the Router handles failure transparently — including during active streaming responses.

P99 latency for the failover decision plus retry: under 280 ms.

Configuring a fallback chain

# In your router config or policy YAML
router_settings:
  fallbacks:
    - openai/gpt-4o:
        - anthropic/claude-sonnet-4-6-20250514
        - bedrock/meta.llama4-maverick-17b-instruct-v1:0
  num_retries: 3
  retry_after: 0.08          # 80 ms base backoff
  timeout: 30                # per-attempt timeout (seconds)
  retry_on:
    - 5xx
    - timeout
    - content_filter

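When openai/gpt-4o returns a 5xx, times out, or trips a content filter (the three conditions listed under retry_on), Routero retries on anthropic/claude-sonnet-4-6-20250514, then on bedrock/meta.llama4-maverick-17b-instruct-v1:0, before surfacing an error to the caller.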
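As a rough illustration of the chain traversal described above, here is a minimal Python sketch. It is not Routero's implementation: `ProviderError`, `call_with_fallback`, and `fake_call` are hypothetical names, and the sketch ignores `num_retries`, backoff, and timeouts to keep the ordering logic visible.

```python
class ProviderError(Exception):
    """Hypothetical error carrying an error type such as '5xx' or 'timeout'."""

    def __init__(self, error_type):
        super().__init__(error_type)
        self.error_type = error_type


# Mirrors the YAML above: primary first, then fallbacks in order.
CHAIN = [
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-6-20250514",
    "bedrock/meta.llama4-maverick-17b-instruct-v1:0",
]
RETRY_ON = {"5xx", "timeout", "content_filter"}


def call_with_fallback(call, chain=CHAIN, retry_on=RETRY_ON):
    """Try each deployment in order; return (deployment, result) on first success."""
    last_error = None
    for deployment in chain:
        try:
            return deployment, call(deployment)
        except ProviderError as err:
            if err.error_type not in retry_on:
                raise  # not listed under retry_on: surface immediately
            last_error = err
    if last_error is None:
        raise ValueError("fallback chain is empty")
    raise last_error  # chain exhausted: surface the last provider error


def fake_call(deployment):
    """Stand-in for a provider call: the primary fails, the fallbacks succeed."""
    if deployment == "openai/gpt-4o":
        raise ProviderError("5xx")
    return f"ok from {deployment}"


used, result = call_with_fallback(fake_call)
print(used, result)
```

In this sketch the caller only sees an error once every deployment in the chain has failed, or when the error type is not one the config marks as retryable.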

Error classification and retry behaviour

Routero classifies provider errors and chooses the retry strategy accordingly:

Error type	Default behaviour
`5xx` (server error)	Retry on next deployment in fallback chain
`429` (rate limit)	Retry on the same deployment after backoff (respects `Retry-After` header)
`content_filter`	Retry on next deployment (a different model may not trip the filter)
`context_window`	Retry on next deployment only if it has a larger context window
`auth_error`	Do not retry; surface error immediately
`timeout`	Retry on next deployment
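The table above can be read as a lookup from error type to action. The sketch below encodes it that way in Python. The `Action` enum, the function names, and the choice to treat unknown error types as non-retryable are assumptions made for illustration; only the error types and their default behaviours come from the table.

```python
from enum import Enum


class Action(Enum):
    NEXT_DEPLOYMENT = "next_deployment"
    SAME_DEPLOYMENT_AFTER_BACKOFF = "same_deployment_after_backoff"
    NEXT_IF_LARGER_CONTEXT = "next_if_larger_context"
    FAIL = "fail"


# One entry per row of the table above.
DEFAULT_ACTIONS = {
    "5xx": Action.NEXT_DEPLOYMENT,
    "429": Action.SAME_DEPLOYMENT_AFTER_BACKOFF,
    "content_filter": Action.NEXT_DEPLOYMENT,
    "context_window": Action.NEXT_IF_LARGER_CONTEXT,
    "auth_error": Action.FAIL,
    "timeout": Action.NEXT_DEPLOYMENT,
}


def should_move_to_next(error_type, current_context=0, next_context=0):
    """Return True when the request should move to the next deployment."""
    # Assumption: error types missing from the table are not retried.
    action = DEFAULT_ACTIONS.get(error_type, Action.FAIL)
    if action is Action.NEXT_DEPLOYMENT:
        return True
    if action is Action.NEXT_IF_LARGER_CONTEXT:
        return next_context > current_context
    return False


def backoff_seconds(base, retry_after_header=None):
    """Prefer the provider's Retry-After value when present.

    Only the delta-seconds form of Retry-After is handled here; the
    HTTP-date form would need separate parsing.
    """
    if retry_after_header is not None:
        return float(retry_after_header)
    return base


print(should_move_to_next("context_window", 128_000, 200_000))
print(backoff_seconds(0.08, "2"))
```

Keeping the mapping in one table-like structure makes it straightforward to compare against the documented defaults row by row.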

Streaming-aware failover

If a provider fails mid-stream, Routero replays only the undelivered tail on the fallback provider. The client receives one uninterrupted SSE stream — no dropped connection, no duplicate tokens, no client-side retry logic required.
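To make the mid-stream handoff concrete, here is a toy Python sketch of the idea. How Routero actually resumes generation on the fallback provider is not described here, so everything in the sketch is an assumption: providers are modelled as generator functions that receive the already-delivered tokens and continue after them, and a mid-stream failure is modelled as a `ConnectionError`.

```python
def stream_with_failover(providers, prompt):
    """Yield tokens as one continuous stream, switching provider on failure."""
    delivered = []  # tokens already sent to the client
    for provider in providers:
        try:
            # Pass a snapshot of what the client has already received so the
            # provider can continue after it instead of starting over.
            for token in provider(prompt, tuple(delivered)):
                delivered.append(token)
                yield token
            return  # stream finished normally
        except ConnectionError:
            continue  # mid-stream failure: hand off to the next provider
    raise RuntimeError("all providers failed mid-stream")


def flaky(prompt, prefix):
    """Hypothetical primary that drops after two tokens."""
    yield "Hello"
    yield ","
    raise ConnectionError("provider dropped mid-stream")


def steady(prompt, prefix):
    """Hypothetical fallback that produces only the undelivered tail."""
    full = ["Hello", ",", " world", "!"]
    yield from full[len(prefix):]


tokens = list(stream_with_failover([flaky, steady], "greet"))
print(tokens)  # ['Hello', ',', ' world', '!']
```

The consumer iterates a single generator throughout, which is the property the paragraph above describes: no duplicate tokens and no client-side retry.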

Budget-aware fallback

Fallback respects your workspace’s spend policies. If a call to the primary deployment would exceed a budget ceiling, the Router skips it and selects the next deployment in the chain. The budget check runs as part of the policy gate, before any provider call is made.
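A minimal sketch of a pre-call budget check might look like the following. The function name, the per-deployment cost estimates, and the use of integer cents are all illustrative assumptions, not Routero's data model.

```python
def select_within_budget(chain, estimated_cost, spent, ceiling):
    """Return the first deployment whose estimated cost fits under the ceiling.

    Runs before any provider call, so an over-budget deployment is skipped
    rather than called and then rejected.
    """
    for deployment in chain:
        if spent + estimated_cost[deployment] <= ceiling:
            return deployment
    return None  # nothing in the chain fits the remaining budget


chain = [
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-6-20250514",
    "bedrock/meta.llama4-maverick-17b-instruct-v1:0",
]
# Hypothetical per-request cost estimates, in cents.
estimated_cost = {
    "openai/gpt-4o": 50,
    "anthropic/claude-sonnet-4-6-20250514": 30,
    "bedrock/meta.llama4-maverick-17b-instruct-v1:0": 10,
}

choice = select_within_budget(chain, estimated_cost, spent=960, ceiling=1000)
print(choice)  # anthropic/claude-sonnet-4-6-20250514
```

With 960 of 1000 cents already spent, the primary's estimated 50 cents would cross the ceiling, so the sketch moves to the first deployment that fits.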

→ Budgets & Spend Guards

Region-pinned fallback

Fallback chains can be constrained to a data residency region. If your policy specifies `residency: eu-only`, the Router only considers deployments in EU regions for both primary and fallback selection.
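One way to picture region pinning is as a filter applied to the whole chain before any selection happens. In the Python sketch below, the deployment-to-region mapping, the region names, and the prefix rule for `eu-only` are hypothetical; only the `eu-only` policy value comes from the text above.

```python
# Hypothetical hosting regions; real values would come from deployment config.
DEPLOYMENT_REGION = {
    "openai/gpt-4o": "us-east",
    "anthropic/claude-sonnet-4-6-20250514": "eu-west",
    "bedrock/meta.llama4-maverick-17b-instruct-v1:0": "eu-central",
}

# Assumed rule: a residency policy maps to a set of allowed region prefixes.
RESIDENCY_PREFIXES = {"eu-only": ("eu-",)}


def pin_chain(chain, residency, regions=DEPLOYMENT_REGION):
    """Keep only deployments allowed by the residency policy, in chain order."""
    if residency is None:
        return list(chain)
    allowed = RESIDENCY_PREFIXES[residency]
    return [d for d in chain if regions[d].startswith(allowed)]


chain = [
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-6-20250514",
    "bedrock/meta.llama4-maverick-17b-instruct-v1:0",
]
pinned = pin_chain(chain, "eu-only")
print(pinned)
```

Because the filter covers the primary as well as the fallbacks, the first EU deployment in the pinned chain effectively becomes the primary for that request.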

→ Policy Routing · Data Residency & Regions

Per-request audit

Every retry and fallback decision is logged in the audit trail:

- Which provider was tried
- The error type and retry reason
- The fallback provider selected
- Total latency including retry overhead
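For illustration, the fields listed above could be modelled as a per-request record like the one below. The class names, field names, and summary keys are hypothetical and do not reflect Routero's actual audit schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    provider: str
    error_type: Optional[str]  # None means the attempt succeeded
    latency_ms: int


@dataclass
class AuditRecord:
    request_id: str
    attempts: List[Attempt] = field(default_factory=list)

    def summary(self):
        """Collapse the attempts into the fields an audit entry would carry."""
        failed = [a for a in self.attempts if a.error_type is not None]
        last = self.attempts[-1] if self.attempts else None
        succeeded = last is not None and last.error_type is None
        return {
            "providers_tried": [a.provider for a in self.attempts],
            "errors": [a.error_type for a in failed],
            "selected_provider": last.provider if succeeded else None,
            "retry_count": len(failed),
            # Total latency includes the time spent on failed attempts.
            "total_latency_ms": sum(a.latency_ms for a in self.attempts),
        }


record = AuditRecord("req-123")
record.attempts.append(Attempt("openai/gpt-4o", "timeout", 30000))
record.attempts.append(Attempt("anthropic/claude-sonnet-4-6-20250514", None, 412))
print(record.summary())
```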

The response includes headers with the chosen provider and retry count for debugging.