Introduction to Routero AI
Every AI model. One router you can trust.
Routero AI is an enterprise AI control plane — a unified gateway that sits between your applications and every AI provider. It gives platform, security, and FinOps teams the governance layer they need to ship AI features with confidence, while letting developers use the OpenAI SDK they already know.
Change base_url in one line of code. Get 100+ models, built-in failover, declarative policy, spend controls, and an immutable audit trail — with data exactly where your security team requires.
“We replaced four gateways and a 600-line failover hack with one Routero AI config.”
The enterprise problem
Shipping AI in production means clearing three hurdles before a single prompt touches a user:
- Security & compliance — Which models can touch sensitive data? Who approved this? What went through the system last Tuesday?
- Cost accountability — Which team is spending what? Who gets the bill when the model runs overnight by mistake?
- Operational reliability — What happens when GPT-4o rate-limits at 2 am? Can we swap providers without a deploy?
Routero AI is purpose-built for all three — with a control plane your security team can review, your FinOps team can report on, and your platform team can operate without building it from scratch.
Routero charges for the control plane, not for tokens. Provider costs are passed through at list price with zero markup. Every charge is accountable and tied to an operational purpose.
One request, four decisions
Every request runs a deterministic, auditable pipeline:
Your app
→ [Policy gate] identity · content check · model allowlist · budget
→ [Provider selection] health + latency · price · residency · fallback chain
→ [Account & audit] token/$ debited atomically · decision logged immutably
→ Provider
P50 routing overhead: ~8–12 ms. P99: <50 ms. Every decision is logged within milliseconds and reproducible months later.
Four building blocks
Routero is composed of four composable primitives. Use one or all — they are independent.
Routes & Failover
Named model groups with ordered provider fallbacks. Automatically retries on 5xx, rate limits, or content-filter trips. Streaming-aware — no dropped chunks. Three providers in a chain is the new 99.99% uptime.
→ Routing & Load Balancing · Failover & Fallbacks
Policy Routing
Declarative YAML rules that decide which model serves which request — evaluated on identity, content classification, region, budget state, schedule, and custom app signals. Version-controlled, human-reviewable, and hot-reloaded in under 5 seconds with no application redeploy.
Budgets & Spend Guards
Hard ceilings, soft alerts, and per-team chargeback for every dollar of AI spend. Warn at 80 %, auto-throttle at 100 %, block if you mean it. Finance gets one consolidated invoice; each team gets attributed line items.
SSO, RBAC & Audit
SAML 2.0 · SCIM 2.0 auto-provisioning · Cerbos fine-grained authorization · short-lived scoped virtual keys · an immutable, cryptographically signed audit log for every request, policy change, and key rotation.
Deployment: pick your trust boundary
The same control plane runs in three configurations. Your security team picks where data lives.
| Deployment | Best for | Where data lives |
|---|---|---|
| Routero Cloud | Fastest onboarding, elastic scale | Routero’s AWS (Singapore), SOC 2 |
| Single-Tenant Cloud | Dedicated region, physical isolation, data residency | Your chosen region, Routero-managed |
| Self-Hosted (AWS / Docker) | VPC isolation, full key control, air-gap-ready | Entirely your infrastructure |
Advanced Features — the production AI layer
Beyond routing and governance, Routero ships four opt-in capabilities that production AI systems typically build themselves. Activate each by passing a single ID on any request — no payload restructuring, no new endpoints.
| Feature | What it does |
|---|---|
| Token Saving | Prompt compression + exact & semantic response caching — reduce compute cost without changing application code |
| Guardrails | Content filtering · PII redaction (Presidio) · secret detection · tool allow/deny lists — centrally managed, per-org enforced |
| Prompt Management | Central prompt registry with immutable versioning, Jinja2 templates, two-layer caching, and instant rollback |
| Memory-as-a-Service | Long-term memory via Mem0 (vector) and Cognee (knowledge graph) — automatically retrieved and injected per request |
Who this documentation is for
Platform & infrastructure engineers — building the AI plumbing. Start with Quickstart then Deployment Options.
Security & compliance — reviewing and approving. Start with SSO, RBAC & Audit, Compliance, and Deployment Options.
FinOps & engineering managers — owning the bill. Start with Budgets & Spend Guards and Cost Tracking & Billing.
Developers — calling the API. Start with Quickstart and Unified API.