Guardrails
Guardrails are org-scoped named configurations that apply one or more safety engines to requests and responses. They run inside the gateway — before the LLM sees the prompt and after it responds — without changing a line of application code.
Guardrails answer legal’s question: “What did the model see?” Content-filter violations, PII redactions, and secret detections are written to your audit log with their category and message — not the raw blocked content.
Activation
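Attach a guardrail to a request by passing `guardrail_id` in `extra_body`:
```python
# Assumes `client` is an OpenAI-compatible SDK client already pointed at the gateway.
response = client.chat.completions.create(
    model="smart/balanced",
    messages=[{"role": "user", "content": user_input}],
    extra_body={"guardrail_id": "my-pii-guardrail"},
)
```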
On a violation that is configured to block, the gateway returns HTTP 400 with a structured error:
```json
{
  "error": {
    "message": "Request blocked by guardrail: PII detected (EMAIL_ADDRESS)",
    "type": "guardrail_violation",
    "code": "guardrail_blocked"
  }
}
```
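
Client code may want to tell a guardrail block apart from other 400 responses. The sketch below does that by checking the `type` field shown in the error above. The helper name `guardrail_block_reason` is hypothetical, and the code assumes the error body always has the shape shown.

```python
import json
from typing import Optional


def guardrail_block_reason(status_code: int, body: str) -> Optional[str]:
    """Return the guardrail message if the response is a guardrail block, else None."""
    if status_code != 400:
        return None
    try:
        error = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        return None  # body is not JSON, or not a JSON object
    if not isinstance(error, dict):
        return None
    if error.get("type") == "guardrail_violation":
        return error.get("message")
    return None


# Sample body copied from the error shown above.
sample = json.dumps({
    "error": {
        "message": "Request blocked by guardrail: PII detected (EMAIL_ADDRESS)",
        "type": "guardrail_violation",
        "code": "guardrail_blocked",
    }
})
print(guardrail_block_reason(400, sample))
```

With the OpenAI Python SDK, a 400 is typically raised as `openai.BadRequestError`; its status code and response body can be passed to a helper like this one.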
Built-in engines
Four built-in engines can be combined within a single guardrail. They run sequentially; each receives the (possibly modified) output of the previous one.
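
Sequential composition means engine order can change the outcome. The following is a minimal sketch of that idea, not the gateway's implementation: the `Engine` signature, `GuardrailBlocked`, and both toy engines are hypothetical.

```python
from typing import Callable, List

# Hypothetical engine signature: takes text and returns (possibly modified) text,
# or raises GuardrailBlocked to reject the request.
Engine = Callable[[str], str]


class GuardrailBlocked(Exception):
    """Raised by an engine that rejects the content."""


def run_engines(text: str, engines: List[Engine]) -> str:
    """Feed the output of each engine into the next one, in order."""
    for engine in engines:
        text = engine(text)
    return text


def redact_email(text: str) -> str:
    # Toy redactor: replaces one known address with a placeholder.
    return text.replace("alice@example.com", "<EMAIL_ADDRESS>")


def block_on_email(text: str) -> str:
    # Toy blocker: rejects the text if the address is still present.
    if "alice@example.com" in text:
        raise GuardrailBlocked("address present")
    return text


# The redactor runs first, so the blocker never sees the address.
result = run_engines("Mail alice@example.com today", [redact_email, block_on_email])
print(result)  # Mail <EMAIL_ADDRESS> today
```

Reversing the list would raise `GuardrailBlocked`, because the blocker would then see the unmodified text.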
Content Filter
Blocks or flags requests and responses matching keyword or regex patterns.

| Config | Description |
|---|---|
| `banned_keywords` | Case-insensitive substring match list |
| `banned_patterns` | Regex list, matched with IGNORECASE |
| `event_hooks` | `pre_call`, `post_call`, or both |

No extra dependencies, and matching adds negligible latency.
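
The two matching modes can be sketched as follows, using the standard `re` module. The function name `find_violation` and its return format are hypothetical, and the gateway's actual matching code may differ.

```python
import re
from typing import List, Optional


def find_violation(
    text: str,
    banned_keywords: List[str],
    banned_patterns: List[str],
) -> Optional[str]:
    """Return a label for the first matching rule, or None if the text is clean."""
    lowered = text.lower()
    for keyword in banned_keywords:
        # Case-insensitive substring match.
        if keyword.lower() in lowered:
            return f"keyword:{keyword}"
    for pattern in banned_patterns:
        # Regex match with the IGNORECASE flag.
        if re.search(pattern, text, re.IGNORECASE):
            return f"pattern:{pattern}"
    return None


print(find_violation("Tell me about Project Zeus", ["project zeus"], []))
print(find_violation("Order id ABC-1234", [], [r"abc-\d{4}"]))
print(find_violation("Nothing to see here", ["project zeus"], [r"abc-\d{4}"]))
```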
Tool Permission
Enforces an allow-list or deny-list on function/tool names before the LLM call.

| Config | Description |
|---|---|
| `allowed_tools` | Allow-list: only these tool names are permitted |
| `blocked_tools` | Deny-list: these tool names are not permitted |
| `on_violation` | `block` (reject the request) or `remove` (strip the tool silently) |

Runs pre-call only (tools are in the request, not the response).
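
A minimal sketch of the allow/deny check, assuming OpenAI-style tool entries (`{"type": "function", "function": {"name": ...}}`). The names `enforce_tools` and `ToolNotPermitted` are hypothetical, and how the two lists interact when both are set is an assumption made here for illustration.

```python
from typing import Dict, List, Optional


class ToolNotPermitted(Exception):
    """Raised when on_violation is 'block' and a tool is not permitted."""


def make_tool(name: str) -> Dict:
    # Helper that builds an OpenAI-style tool entry for the example.
    return {"type": "function", "function": {"name": name}}


def enforce_tools(
    tools: List[Dict],
    allowed_tools: Optional[List[str]] = None,
    blocked_tools: Optional[List[str]] = None,
    on_violation: str = "block",
) -> List[Dict]:
    """Return the tools that pass the lists, or raise if configured to block."""
    kept = []
    for tool in tools:
        name = tool["function"]["name"]
        # Assumption: a tool is denied if it is missing from the allow-list
        # or present in the deny-list.
        denied = (allowed_tools is not None and name not in allowed_tools) or (
            blocked_tools is not None and name in blocked_tools
        )
        if not denied:
            kept.append(tool)
        elif on_violation == "block":
            raise ToolNotPermitted(name)
        # on_violation == "remove": drop the tool and continue.
    return kept


tools = [make_tool("search"), make_tool("delete_user")]
remaining = enforce_tools(tools, blocked_tools=["delete_user"], on_violation="remove")
print([t["function"]["name"] for t in remaining])  # ['search']
```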
PII Detection (Presidio)
Detects and anonymises personally identifiable information in prompts and responses using Microsoft Presidio.

| Config | Description |
|---|---|
| `entities` | List of entity types: `PERSON`, `EMAIL_ADDRESS`, `PHONE_NUMBER`, `CREDIT_CARD`, `US_SSN`, `IBAN_CODE`, `IP_ADDRESS`, `LOCATION`, … |
| `action` | `anonymize` (replace with `<ENTITY_TYPE>`) or `block` (reject if PII found) |
| `score_threshold` | Minimum Presidio confidence score (default 0.5) |
| `event_hooks` | `pre_call`, `post_call`, or both |

Dependencies: presidio-analyzer, presidio-anonymizer
Presidio runs locally in the gateway, so PII never leaves your infrastructure to reach an external moderation vendor.
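
To illustrate how `score_threshold` and the `anonymize` action interact, here is a dependency-free sketch that takes detection results as input instead of calling Presidio. The `Finding` class is a hypothetical stand-in whose fields mirror Presidio's `RecognizerResult` (`entity_type`, `start`, `end`, `score`); the findings themselves are hand-written for the example, and treating a score equal to the threshold as a match is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    score: float


def anonymize(text: str, findings: List[Finding], score_threshold: float = 0.5) -> str:
    """Replace each sufficiently confident finding with <ENTITY_TYPE>."""
    kept = [f for f in findings if f.score >= score_threshold]
    # Replace from the end of the string so earlier offsets stay valid.
    # Overlapping findings are not handled in this sketch.
    for f in sorted(kept, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text


text = "Contact Jane at jane@example.com"
findings = [
    Finding("PERSON", 8, 12, 0.85),
    Finding("EMAIL_ADDRESS", 16, 32, 1.0),
    Finding("LOCATION", 0, 7, 0.2),  # below the threshold, so left untouched
]
print(anonymize(text, findings))  # Contact <PERSON> at <EMAIL_ADDRESS>
```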
Secret Detection (detect-secrets)
Detects leaked credentials and secrets in prompts using Yelp detect-secrets.

| Config | Description |
|---|---|
| `action` | `redact` (replace with `[REDACTED]`) or `block` (reject) |
| `detectors` | Subset of ~21 built-in detectors: `aws`, `github`, `slack`, `stripe`, `jwt`, `private_key`, `azure`, `twilio`, `base64_high_entropy`, … |

Runs pre-call only (secrets are in the prompt, not the response).
Dependencies: detect-secrets
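
The difference between `redact` and `block` can be sketched as below. The regexes are simplified stand-ins written for this example, not the detect-secrets detectors, and `scan` and `SecretBlocked` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Simplified illustrative patterns, NOT the real detect-secrets plugins.
PATTERNS: Dict[str, "re.Pattern"] = {
    "aws": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


class SecretBlocked(Exception):
    """Raised when action is 'block' and a secret is found."""


def scan(text: str, detectors: List[str], action: str = "redact") -> Tuple[str, List[str]]:
    """Apply the selected detectors; redact matches or raise, depending on action."""
    hits: List[str] = []
    for name in detectors:
        pattern = PATTERNS[name]
        if pattern.search(text):
            hits.append(name)
            if action == "block":
                raise SecretBlocked(name)
            text = pattern.sub("[REDACTED]", text)
    return text, hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
prompt = "Use key AKIAIOSFODNN7EXAMPLE for the upload"
clean, hits = scan(prompt, ["aws", "github"])
print(clean)  # Use key [REDACTED] for the upload
print(hits)   # ['aws']
```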
Creating a guardrail
```bash
curl -X POST https://api.routero.ai/guardrail \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "guardrail_name": "pii-redact-prod",
    "engines": [
      {
        "engine_name": "presidio",
        "config": {
          "entities": ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD", "US_SSN"],
          "action": "anonymize",
          "score_threshold": 0.5
        },
        "event_hooks": ["pre_call", "post_call"]
      },
      {
        "engine_name": "detect_secret",
        "config": {
          "action": "redact",
          "detectors": ["aws", "github", "stripe", "jwt"]
        },
        "event_hooks": ["pre_call"]
      }
    ]
  }'
```
Management API

| Endpoint | Description |
|---|---|
| `GET /guardrail/engines` | List available engine types |
| `POST /guardrail` | Create a guardrail |
| `GET /guardrail/list` | List guardrails in workspace (paginated) |
| `GET /guardrail/{id}` | Get guardrail details |
| `PATCH /guardrail/{id}` | Update a guardrail |
| `DELETE /guardrail/{id}` | Delete a guardrail |
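
For scripting against these endpoints without curl, the sketch below builds requests with the standard library. It assumes the same host and bearer-token header as the curl example; `guardrail_request` is a hypothetical helper, the key is a placeholder, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.routero.ai"  # host taken from the curl example


def guardrail_request(
    method: str,
    path: str,
    admin_key: str,
    payload: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request to a management endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {admin_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


ADMIN_KEY = "sk-admin-placeholder"  # placeholder, not a real key

list_engines = guardrail_request("GET", "/guardrail/engines", ADMIN_KEY)

# Payload shape follows the curl example; the guardrail name is made up.
create = guardrail_request("POST", "/guardrail", ADMIN_KEY, {
    "guardrail_name": "secrets-only",
    "engines": [
        {
            "engine_name": "detect_secret",
            "config": {"action": "redact", "detectors": ["aws", "github"]},
            "event_hooks": ["pre_call"],
        }
    ],
})

print(list_engines.get_method(), list_engines.full_url)
print(create.get_method(), create.full_url)
```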