Unified API

Routero AI exposes a single OpenAI-compatible API surface in front of 100+ LLM providers. Change base_url — everything else stays the same.

Base URL: https://api.routero.ai/v1

Inference endpoints (data plane)

All endpoints accept standard OpenAI request shapes and return standard OpenAI response shapes.

Endpoint	Method	Description
`/chat/completions`	POST	Chat completions — primary endpoint
`/completions`	POST	Legacy text completions
`/embeddings`	POST	Text embeddings
`/images/generations`	POST	Image generation
`/images/edits`	POST	Image editing
`/audio/speech`	POST	Text-to-speech
`/audio/transcriptions`	POST	Speech-to-text
`/moderations`	POST	Content moderation
`/rerank`	POST	Document reranking (Cohere-compatible)
`/batches`	POST	Async batch processing
`/files`	POST/GET	File upload/retrieval
`/models`	GET	List available models
`/responses`	POST	OpenAI Responses API
`/threads`, `/assistants`	POST/GET	Assistants API

Provider-native aliases:

Anthropic: /v1/messages, /v1/messages/count_tokens
Google: /v1beta/models/{model}:generateContent, :streamGenerateContent, :countTokens
Azure OpenAI: /openai/deployments/{model}/chat/completions

Supported providers

Routero ships per-provider transformation for 100+ providers across 139 configurations. A sample:

Provider	Notes
OpenAI	All models, all endpoints
Anthropic	Claude family, including Messages API
AWS Bedrock	All Bedrock models incl. Claude, Llama, Mistral
Google Vertex AI	Gemini, PaLM, embedding models
Google Gemini (direct)	`gemini-*` models
Azure OpenAI	All deployments, Azure AI Studio
Groq	Ultra-fast inference
Ollama	Local/self-hosted models
Cohere	Completions, embeddings, rerank
Mistral	All Mistral models
DeepSeek	DeepSeek-R1, V3
Together AI	Open models
Fireworks AI	Fast open-model inference
xAI (Grok)	Grok family
Perplexity	Online LLMs
Databricks	DBRX, Llama on Databricks
AWS SageMaker	Custom endpoint support
IBM Watsonx	Enterprise LLMs
Snowflake	Cortex LLMs
vLLM / hosted_vllm	Self-hosted vLLM instances
Regional providers	Volcengine, DashScope, MiniMax, Moonshot, ZhipuAI, and more

See /models for the full live list of models configured in your workspace.

Model string formats

# Smart alias — resolves to your configured policy
"smart/balanced"

# Provider-scoped
"openai/gpt-4o"
"anthropic/claude-sonnet-4-6-20250514"
"bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0"
"azure/my-deployment-name"
"ollama/llama3.2"

# Bare name — Routero infers the provider
"gpt-4o"
"claude-sonnet-4-6"

Streaming

All streaming endpoints use standard Server-Sent Events (SSE). Routero passes chunks through with zero buffering. Failover during a streaming response replays only the tail — the client receives one uninterrupted stream even if the primary provider fails mid-response.

stream = client.chat.completions.create(
    model="smart/balanced",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")