3-Provider Failover Chain
Configure a three-provider fallback so that a single provider outage or rate limit is transparent to your application. This is the most common production Routero configuration.
What you’re building
Request → smart/balanced
→ try: openai/gpt-4o (primary — lowest latency)
→ if 5xx or 429 or timeout:
→ try: anthropic/claude-sonnet-4-6 (fallback 1 — different provider)
→ if 5xx or 429 or timeout:
→ try: bedrock/anthropic.claude-sonnet-4-6 (fallback 2 — different API + region)
→ if all fail: return 503 to caller with retry guidance
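The flow above can be sketched in a few lines of Python. This is an illustration of the fall-through logic only, not Routero's implementation: `ProviderError`, `is_retryable`, and `call_with_failover` are hypothetical names, and each provider is modelled as a plain callable.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error type: carries an HTTP status, or None for a timeout."""

    def __init__(self, status: Optional[int] = None):
        super().__init__(f"provider failed (status={status})")
        self.status = status


def is_retryable(err: ProviderError) -> bool:
    # A timeout, a rate limit (429), or any 5xx moves on to the next provider.
    return err.status is None or err.status == 429 or 500 <= err.status <= 599


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[str], str]]], prompt: str
) -> Tuple[int, str, Optional[str]]:
    """Try each (name, call) pair in order; return (status, body, serving provider)."""
    for name, call in providers:
        try:
            return 200, call(prompt), name
        except ProviderError as err:
            if not is_retryable(err):
                raise  # e.g. a 400 is a caller error, not an outage
    # Every provider failed with a retryable error.
    return 503, "all providers failed; retry later", None
```

Non-retryable errors are re-raised rather than swallowed, so a malformed request does not burn through the whole chain.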
Step 1 — Register the three deployments
# Primary
curl -X POST https://api.routero.ai/model/new \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_name": "smart/balanced", "litellm_params": {"model": "openai/gpt-4o", "api_key": "sk-openai-..."}}'
# Fallback 1
curl -X POST https://api.routero.ai/model/new \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_name": "smart/balanced", "litellm_params": {"model": "anthropic/claude-sonnet-4-6-20250514", "api_key": "sk-ant-..."}}'
# Fallback 2: Bedrock authenticates with AWS IAM credentials, not an API key
curl -X POST https://api.routero.ai/model/new \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "smart/balanced",
    "litellm_params": {
      "model": "bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0",
      "aws_access_key_id": "...",
      "aws_secret_access_key": "...",
      "aws_region_name": "us-east-1"
    }
  }'
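If you would rather script the registration than paste keys into a shell, the same three calls might look like the sketch below. The endpoint and JSON fields are taken from the curl commands above; the environment variable names (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) and the helper names are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Dict, List, Mapping

API_URL = "https://api.routero.ai/model/new"  # endpoint from the curl calls
MODEL_NAME = "smart/balanced"


def deployment_payloads(env: Mapping[str, str]) -> List[Dict]:
    """Build the three request bodies; env is a mapping such as os.environ."""
    return [
        {"model_name": MODEL_NAME,
         "litellm_params": {"model": "openai/gpt-4o",
                            "api_key": env["OPENAI_API_KEY"]}},
        {"model_name": MODEL_NAME,
         "litellm_params": {"model": "anthropic/claude-sonnet-4-6-20250514",
                            "api_key": env["ANTHROPIC_API_KEY"]}},
        {"model_name": MODEL_NAME,
         "litellm_params": {"model": "bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0",
                            "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
                            "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
                            "aws_region_name": "us-east-1"}},
    ]


def build_request(payload: Dict, admin_key: str) -> urllib.request.Request:
    """Wrap one payload in an authenticated JSON POST request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {admin_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# To send the requests (performs network calls):
#   import os
#   for payload in deployment_payloads(os.environ):
#       urllib.request.urlopen(build_request(payload, os.environ["ADMIN_KEY"]))
```

Keeping payload construction separate from sending makes it possible to inspect the bodies before any request leaves the machine.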
Step 2 — Configure fallback order and retry behaviour
In the proxy config YAML or via the dashboard:
router_settings:
  routing_strategy: least_busy
  num_retries: 2
  retry_after: 0.08   # 80 ms base backoff
  timeout: 30         # per-attempt timeout (s)
  retry_on:
    - 5xx
    - 429
    - timeout
    - content_filter
  fallbacks:
    - "openai/gpt-4o":
        - "anthropic/claude-sonnet-4-6-20250514"
        - "bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0"
  cooldown_time: 60   # failed provider is cooled down for 60 s
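To get a feel for what these numbers imply for the caller, the sketch below estimates a worst-case latency budget. It rests on two assumptions the configuration does not confirm: that the 80 ms base backoff doubles on each retry, and that `num_retries` applies to each provider separately. The function names are illustrative.

```python
from typing import List


def backoff_delays(base: float, num_retries: int) -> List[float]:
    """Delay in seconds before each retry, doubling from `base` (assumed schedule)."""
    return [base * (2 ** attempt) for attempt in range(num_retries)]


def worst_case_seconds(timeout: float, base: float, num_retries: int, providers: int) -> float:
    """Upper bound if every attempt on every provider runs to the full timeout."""
    attempts = num_retries + 1  # first try plus retries
    per_provider = attempts * timeout + sum(backoff_delays(base, num_retries))
    return providers * per_provider


delays = backoff_delays(0.08, 2)
budget = worst_case_seconds(timeout=30, base=0.08, num_retries=2, providers=3)
print(delays, round(budget, 2))
```

Under these assumptions the bound is dominated by the 30-second per-attempt timeout rather than the backoff, so a client-side timeout shorter than the full chain may cut a request off before the last fallback is tried. If `num_retries` is instead a total budget across the chain, the bound is lower.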
Step 3 — Test the failover
Temporarily set an invalid key for OpenAI and verify that requests fall through to Anthropic:
import openai

client = openai.OpenAI(
    api_key="YOUR_ROUTERO_KEY",
    base_url="https://api.routero.ai/v1",
)

# with_raw_response exposes the HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="smart/balanced",
    messages=[{"role": "user", "content": "Which provider am I on?"}],
)
response = raw.parse()

# x-litellm-model-id should identify the fallback deployment
print(raw.headers.get("x-litellm-model-id"))
print(response.model)
What to check in the audit log
The audit log entry for each request includes:
fallback_count: number of retries before success
model: the provider that ultimately served the response
latency_ms: total latency including retry overhead
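If fallback_count is frequently above zero, the providers earlier in the chain are failing often; consider a longer cooldown_time for the affected provider or moving it lower in the fallback order.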
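One way to spot this pattern is to aggregate the three fields per serving model. The sketch below assumes the audit log has already been loaded as a list of dictionaries with exactly those keys; the export format and the sample values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarise(entries: Iterable[Mapping]) -> Dict[str, Dict[str, float]]:
    """Per serving model: request count, share that needed a fallback, mean latency."""
    buckets = defaultdict(list)
    for entry in entries:
        buckets[entry["model"]].append(entry)

    summary = {}
    for model, rows in buckets.items():
        n = len(rows)
        summary[model] = {
            "requests": n,
            "fallback_rate": sum(1 for r in rows if r["fallback_count"] > 0) / n,
            "mean_latency_ms": sum(r["latency_ms"] for r in rows) / n,
        }
    return summary


# Made-up sample entries, not real audit data.
sample = [
    {"model": "openai/gpt-4o", "fallback_count": 0, "latency_ms": 400},
    {"model": "openai/gpt-4o", "fallback_count": 0, "latency_ms": 600},
    {"model": "anthropic/claude-sonnet-4-6-20250514", "fallback_count": 1, "latency_ms": 1200},
]
for model, stats in summarise(sample).items():
    print(model, stats)
```

A growing request count on a fallback model, paired with a fallback rate near 1.0, suggests the primary is being skipped regularly.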