Guardrails

Guardrails are org-scoped named configurations that apply one or more safety engines to requests and responses. They run inside the gateway — before the LLM sees the prompt and after it responds — without changing a line of application code.

Guardrails answer legal’s question: “What did the model see?” Content-filter violations, PII redactions, and secret detections are written to your audit log with their category and message — not the raw blocked content.


Activation

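Attach a guardrail to a request by passing its ID as guardrail_id in the request body:

import os
from openai import OpenAI
# Assumed setup: an OpenAI-compatible client pointed at the gateway (env var names are placeholders).
client = OpenAI(base_url=os.environ["GATEWAY_BASE_URL"], api_key=os.environ["GATEWAY_API_KEY"])
response = client.chat.completions.create(
    model="smart/balanced",
    messages=[{"role": "user", "content": user_input}],  # user_input: the end user's text
    extra_body={"guardrail_id": "my-pii-guardrail"},  # guardrail applied to this call
)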

On a violation that is configured to block, the gateway returns HTTP 400 with a structured error:

{
  "error": {
    "message": "Request blocked by guardrail: PII detected (EMAIL_ADDRESS)",
    "type": "guardrail_violation",
    "code": "guardrail_blocked"
  }
}
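
A caller will usually want to tell a guardrail block apart from other 400 errors. The sketch below relies only on the status code and the `type` and `code` fields shown above. The helper name `guardrail_block_message` is hypothetical, and how you obtain the status code and decoded body depends on your HTTP client or SDK.

```python
from typing import Optional


def guardrail_block_message(status_code: int, body: dict) -> Optional[str]:
    """Return the guardrail message if this response is a guardrail block, else None.

    Hypothetical helper: it relies only on the error shape documented above.
    """
    if status_code != 400:
        return None
    error = body.get("error")
    if not isinstance(error, dict):
        return None
    if error.get("type") == "guardrail_violation" and error.get("code") == "guardrail_blocked":
        return error.get("message", "")
    return None


blocked = {
    "error": {
        "message": "Request blocked by guardrail: PII detected (EMAIL_ADDRESS)",
        "type": "guardrail_violation",
        "code": "guardrail_blocked",
    }
}
print(guardrail_block_message(400, blocked))
```

Returning `None` for everything else lets the caller fall through to its normal error handling.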

Built-in engines

Four engines compose within a single guardrail. They run sequentially, and each receives the (possibly modified) output of the previous one.
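
Sequential composition can be pictured as a simple fold over the engine list. This is a minimal sketch, not the gateway's implementation: `run_engines`, `GuardrailBlocked`, and the two toy engines are hypothetical names used only for illustration.

```python
from typing import Callable, List

# An engine here is any function that takes text and returns (possibly modified) text.
Engine = Callable[[str], str]


class GuardrailBlocked(Exception):
    """Raised by an engine that rejects the text outright."""


def run_engines(text: str, engines: List[Engine]) -> str:
    """Apply engines in order, feeding each one the previous engine's output."""
    for engine in engines:
        text = engine(text)
    return text


def mask_email(text: str) -> str:
    # Toy stand-in for an anonymizing engine.
    return text.replace("alice@example.com", "<EMAIL_ADDRESS>")


def reject_banned(text: str) -> str:
    # Toy stand-in for a blocking engine.
    if "forbidden" in text.lower():
        raise GuardrailBlocked("banned keyword")
    return text


result = run_engines("Contact alice@example.com", [mask_email, reject_banned])
print(result)  # Contact <EMAIL_ADDRESS>
```

Because each engine sees the previous engine's output, order matters: in this sketch the second engine inspects the already-masked text.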

Content Filter

Blocks or flags requests and responses matching keyword or regex patterns.

Config           Description
banned_keywords  Case-insensitive substring match list
banned_patterns  Regex list, matched with IGNORECASE
event_hooks      pre_call, post_call, or both

No extra dependencies. Matching runs in-process, so added latency is negligible.
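
The two matching rules can be sketched with the standard library alone. This is an illustration of case-insensitive substring and `IGNORECASE` regex matching, not the gateway's code. The function name `find_violation` and its return format are assumptions.

```python
import re
from typing import List, Optional


def find_violation(text: str, banned_keywords: List[str], banned_patterns: List[str]) -> Optional[str]:
    """Return a description of the first match, or None if the text is clean."""
    lowered = text.lower()
    for keyword in banned_keywords:
        # Case-insensitive substring match.
        if keyword.lower() in lowered:
            return f"keyword:{keyword}"
    for pattern in banned_patterns:
        # Regex match with IGNORECASE.
        if re.search(pattern, text, flags=re.IGNORECASE):
            return f"pattern:{pattern}"
    return None


first = find_violation("Tell me about Project Falcon", ["project falcon"], [])
second = find_violation("order id ORD-12345", [], [r"ord-\d{5}"])
third = find_violation("hello world", ["secret"], [r"ord-\d{5}"])
print(first, second, third)
```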


Tool Permission

Enforces an allow-list or deny-list on function/tool names before the LLM call.

Config         Description
allowed_tools  Allow-list: only these tool names are permitted
blocked_tools  Deny-list: these tool names are not permitted
on_violation   block (reject the request) or remove (strip the tool silently)

Runs pre-call only (tools are in the request, not the response).
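
The interaction between the lists and on_violation can be sketched as follows. This assumes OpenAI-style tool entries in the request. `enforce_tools`, `tool_name`, and `ToolBlocked` are hypothetical names, and the gateway's actual behavior may differ in detail.

```python
from typing import Dict, List, Optional


class ToolBlocked(Exception):
    """Raised when on_violation is "block" and a disallowed tool is present."""


def tool_name(tool: Dict) -> str:
    # Assumes OpenAI-style tool entries: {"type": "function", "function": {"name": ...}}.
    return tool["function"]["name"]


def enforce_tools(
    tools: List[Dict],
    allowed_tools: Optional[List[str]] = None,
    blocked_tools: Optional[List[str]] = None,
    on_violation: str = "block",
) -> List[Dict]:
    """Return the tools that may be forwarded, or raise if the request must be rejected."""
    kept = []
    for tool in tools:
        name = tool_name(tool)
        disallowed = (allowed_tools is not None and name not in allowed_tools) or (
            blocked_tools is not None and name in blocked_tools
        )
        if not disallowed:
            kept.append(tool)
        elif on_violation == "block":
            raise ToolBlocked(f"tool not permitted: {name}")
        # Any other on_violation value is treated as "remove": drop the tool and continue.
    return kept


tools = [
    {"type": "function", "function": {"name": "search"}},
    {"type": "function", "function": {"name": "delete_user"}},
]
kept = enforce_tools(tools, blocked_tools=["delete_user"], on_violation="remove")
print([tool_name(t) for t in kept])  # ['search']
```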


PII Detection (Presidio)

Detects and anonymises personally identifiable information in prompts and responses using Microsoft Presidio.

Config           Description
entities         List of entity types: PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, US_SSN, IBAN_CODE, IP_ADDRESS, LOCATION, …
action           anonymize (replace with <ENTITY_TYPE>) or block (reject if PII found)
score_threshold  Minimum Presidio confidence score (default 0.5)
event_hooks      pre_call, post_call, or both

Dependencies: presidio-analyzer, presidio-anonymizer

Presidio runs locally in the gateway — PII never leaves your infrastructure to reach an external moderation vendor.
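
To make the action and score_threshold settings concrete, the sketch below applies them to a list of detections. It does not call Presidio: `Detection` is a hypothetical stand-in carrying the same fields an analyzer result has (entity type, start, end, score), and treating scores at or above the threshold as hits is an assumption.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    entity_type: str
    start: int
    end: int
    score: float


class PIIBlocked(Exception):
    """Raised when action is "block" and PII is found."""


def apply_pii_action(
    text: str,
    detections: List[Detection],
    action: str = "anonymize",
    score_threshold: float = 0.5,
) -> str:
    # Keep only detections that meet the confidence threshold.
    hits = [d for d in detections if d.score >= score_threshold]
    if not hits:
        return text
    if action == "block":
        raise PIIBlocked(f"PII detected ({hits[0].entity_type})")
    # Replace from the end of the string so earlier offsets stay valid.
    for d in sorted(hits, key=lambda d: d.start, reverse=True):
        text = text[: d.start] + f"<{d.entity_type}>" + text[d.end :]
    return text


text = "Mail jane@example.com or call Jane"
email = "jane@example.com"
start = text.index(email)
detections = [
    Detection("EMAIL_ADDRESS", start, start + len(email), 1.0),
    Detection("PERSON", text.rindex("Jane"), len(text), 0.3),  # below threshold, left as is
]
anonymized = apply_pii_action(text, detections)
print(anonymized)  # Mail <EMAIL_ADDRESS> or call Jane
```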


Secret Detection (detect-secrets)

Detects leaked credentials and secrets in prompts using Yelp detect-secrets.

Config     Description
action     redact (replace with [REDACTED]) or block (reject)
detectors  Subset of ~21 built-in detectors: aws, github, slack, stripe, jwt, private_key, azure, twilio, base64_high_entropy, …

Runs pre-call only (secrets are in the prompt, not the response).

Dependencies: detect-secrets
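
The difference between redact and block can be illustrated with a single hand-written pattern. This sketch does not use detect-secrets or its detectors: the regex is an illustrative stand-in for an AWS access key ID check, and `apply_secret_action` and `SecretBlocked` are hypothetical names.

```python
import re

# Illustrative pattern only: "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


class SecretBlocked(Exception):
    """Raised when action is "block" and a secret is found."""


def apply_secret_action(prompt: str, action: str = "redact") -> str:
    if action == "block":
        if AWS_KEY_ID.search(prompt):
            raise SecretBlocked("secret detected: aws")
        return prompt
    # action == "redact": replace every match and let the request continue.
    return AWS_KEY_ID.sub("[REDACTED]", prompt)


redacted = apply_secret_action("key=AKIAIOSFODNN7EXAMPLE please debug")
print(redacted)  # key=[REDACTED] please debug
```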


Creating a guardrail

curl -X POST https://api.routero.ai/guardrail \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "guardrail_name": "pii-redact-prod",
    "engines": [
      {
        "engine_name": "presidio",
        "config": {
          "entities": ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD", "US_SSN"],
          "action": "anonymize",
          "score_threshold": 0.5
        },
        "event_hooks": ["pre_call", "post_call"]
      },
      {
        "engine_name": "detect_secret",
        "config": {
          "action": "redact",
          "detectors": ["aws", "github", "stripe", "jwt"]
        },
        "event_hooks": ["pre_call"]
      }
    ]
  }'
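
The same request can be built from Python with the standard library. This is a sketch mirroring the curl call above: the URL, headers, and payload fields come from that example, while `build_create_request` and the placeholder key are hypothetical. The request is built but not sent, since the response format is not described here.

```python
import json
import urllib.request

GUARDRAIL_URL = "https://api.routero.ai/guardrail"  # endpoint from the curl example above


def build_create_request(admin_key: str, guardrail: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a guardrail."""
    return urllib.request.Request(
        GUARDRAIL_URL,
        data=json.dumps(guardrail).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


guardrail = {
    "guardrail_name": "pii-redact-prod",
    "engines": [
        {
            "engine_name": "presidio",
            "config": {
                "entities": ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD", "US_SSN"],
                "action": "anonymize",
                "score_threshold": 0.5,
            },
            "event_hooks": ["pre_call", "post_call"],
        },
        {
            "engine_name": "detect_secret",
            "config": {"action": "redact", "detectors": ["aws", "github", "stripe", "jwt"]},
            "event_hooks": ["pre_call"],
        },
    ],
}
request = build_create_request("ADMIN_KEY_PLACEHOLDER", guardrail)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```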

Management API

Endpoint                 Description
GET /guardrail/engines   List available engine types
POST /guardrail          Create a guardrail
GET /guardrail/list      List guardrails in workspace (paginated)
GET /guardrail/{id}      Get guardrail details
PATCH /guardrail/{id}    Update a guardrail
DELETE /guardrail/{id}   Delete a guardrail