Unified API
Routero AI exposes a single OpenAI-compatible API surface in front of 100+ LLM providers. To migrate an existing OpenAI client, change base_url; everything else stays the same.
Base URL: https://api.routero.ai/v1
Inference endpoints (data plane)
All endpoints accept standard OpenAI request shapes and return standard OpenAI response shapes.
| Endpoint | Method | Description |
|---|---|---|
| /chat/completions | POST | Chat completions (primary endpoint) |
| /completions | POST | Legacy text completions |
| /embeddings | POST | Text embeddings |
| /images/generations | POST | Image generation |
| /images/edits | POST | Image editing |
| /audio/speech | POST | Text-to-speech |
| /audio/transcriptions | POST | Speech-to-text |
| /moderations | POST | Content moderation |
| /rerank | POST | Document reranking (Cohere-compatible) |
| /batches | POST | Async batch processing |
| /files | POST/GET | File upload/retrieval |
| /models | GET | List available models |
| /responses | POST | OpenAI Responses API |
| /threads, /assistants | POST/GET | Assistants API |
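As a rough illustration of how the endpoint paths in the table combine with the base URL, here is a minimal sketch that builds (but does not send) a chat completion request using only the Python standard library. The `build_request` helper and the placeholder key are hypothetical, and the `Authorization: Bearer` header is an assumption based on the OpenAI-compatible convention rather than something this page states.

```python
import json
import urllib.request

BASE_URL = "https://api.routero.ai/v1"  # base URL from this page


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request for an inference endpoint (hypothetical helper)."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth, as in the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "/chat/completions",
    {"model": "smart/balanced", "messages": [{"role": "user", "content": "Hello!"}]},
    api_key="YOUR_ROUTERO_API_KEY",  # placeholder, not a real key
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Swapping the path (for example `/embeddings` or `/rerank`) and the payload should be the only change needed to target another row of the table, assuming the request body follows the corresponding OpenAI shape.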
Provider-native aliases:
- Anthropic: /v1/messages, /v1/messages/count_tokens
- Google: /v1beta/models/{model}:generateContent, :streamGenerateContent, :countTokens
- Azure OpenAI: /openai/deployments/{model}/chat/completions
Supported providers
Routero ships per-provider transformation for 100+ providers across 139 configurations. A sample:
| Provider | Notes |
|---|---|
| OpenAI | All models, all endpoints |
| Anthropic | Claude family, including Messages API |
| AWS Bedrock | All Bedrock models incl. Claude, Llama, Mistral |
| Google Vertex AI | Gemini, PaLM, embedding models |
| Google Gemini (direct) | gemini-* models |
| Azure OpenAI | All deployments, Azure AI Studio |
| Groq | Ultra-fast inference |
| Ollama | Local/self-hosted models |
| Cohere | Completions, embeddings, rerank |
| Mistral | All Mistral models |
| DeepSeek | DeepSeek-R1, V3 |
| Together AI | Open models |
| Fireworks AI | Fast open-model inference |
| xAI (Grok) | Grok family |
| Perplexity | Online LLMs |
| Databricks | DBRX, Llama on Databricks |
| AWS SageMaker | Custom endpoint support |
| IBM Watsonx | Enterprise LLMs |
| Snowflake | Cortex LLMs |
| vLLM / hosted_vllm | Self-hosted vLLM instances |
| Regional providers | Volcengine, DashScope, MiniMax, Moonshot, ZhipuAI, and more |
See /models for the full live list of models configured in your workspace.
Model string formats
# Smart alias — resolves to your configured policy
"smart/balanced"
# Provider-scoped
"openai/gpt-4o"
"anthropic/claude-sonnet-4-6-20250514"
"bedrock/anthropic.claude-sonnet-4-6-20250514-v1:0"
"azure/my-deployment-name"
"ollama/llama3.2"
# Bare name — Routero infers the provider
"gpt-4o"
"claude-sonnet-4-6"
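The three formats above differ only in whether a prefix appears before the first slash. The sketch below is a client-side illustration of that naming convention, for example for logging or validating model strings before sending a request. It is not Routero's actual resolution logic, and `ModelRef` and `split_model` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    prefix: Optional[str]  # "openai", "bedrock", "smart", ... or None for bare names
    name: str              # everything after the first "/"


def split_model(model: str) -> ModelRef:
    """Split a model string on its first "/" (illustrative only)."""
    prefix, sep, name = model.partition("/")
    if not sep:
        # Bare name: the provider is left for the router to infer.
        return ModelRef(None, model)
    return ModelRef(prefix, name)


for m in ("smart/balanced", "openai/gpt-4o", "azure/my-deployment-name", "gpt-4o"):
    print(m, "->", split_model(m))
```

Splitting on the first slash only keeps provider-specific identifiers intact, such as the dots and colon in the Bedrock model ID.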
Streaming
All streaming endpoints use standard Server-Sent Events (SSE). Routero passes chunks through with zero buffering. If the primary provider fails mid-response, failover replays only the tail of the response, so the client still receives one uninterrupted stream.
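from openai import OpenAI

# Point the standard OpenAI SDK at Routero; the API key below is a placeholder.
client = OpenAI(base_url="https://api.routero.ai/v1", api_key="YOUR_ROUTERO_API_KEY")

stream = client.chat.completions.create(
    model="smart/balanced",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; its content may be None.
    print(chunk.choices[0].delta.content or "", end="")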
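For clients that do not use the OpenAI SDK, the SSE stream can be consumed by hand. The following is a minimal sketch that extracts text deltas from already-received SSE lines. It assumes the stream follows the OpenAI chat-completions chunk shape and ends with a `data: [DONE]` sentinel, as the OpenAI API does; the `iter_content` helper and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines in OpenAI chunk format (illustrative)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel, per the OpenAI convention
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample events, not captured from a real response.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))
```

In a real client, `lines` would come from iterating over the HTTP response body line by line instead of a hard-coded list.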