Memory-as-a-Service

Memory-as-a-Service (MaaS) turns the gateway into a memory provider. Applications get personalization and long-term context without operating their own vector store or graph database — just pass a memory_id on the request.


How it works

Pre-call (retrieval): The gateway takes the latest user message, searches the memory session for the top-3 relevant facts, and injects them into the system message as:

[Past Context for ID: user-alice]
- Prefers summaries under 200 words
- Working on Q3 APAC analysis
- Last session: discussed Bedrock pricing

Post-call (storage): After the LLM responds, the gateway asynchronously stores the (user message, assistant response) turn in the memory backend. Pass store_memory: false to skip storage on a specific request.


Activation

response = client.chat.completions.create(
    model="smart/balanced",
    messages=[{"role": "user", "content": "Remind me where we left off."}],
    extra_body={
        "memory_id": "user-alice",
        "store_memory": True,          # default — omit to use default
    },
)

Memory engines

Engine Backend Best for
Mem0 Postgres + pgvector User preferences, recent facts, short-to-medium semantic recall
Cognee Neo4j + pgvector + Postgres Entity/relationship knowledge, long-horizon reasoning, knowledge graph queries

Choose the engine when creating the memory session. Sessions cannot change engine after creation.

Mem0 queries use keyword vs. question heuristics and fact deduplication to reduce redundant storage.

Cognee supports SearchType.GRAPH_COMPLETION, CHUNKS, and SUMMARIES — with graph→vector search fallback for robustness. Deletion cleans up Neo4j, PGVector, and Postgres atomically.


Creating a memory session

curl -X POST https://api.routero.ai/memory/session/create \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_name": "user-alice",
    "engine_name": "mem0"
  }'

The returned memory_id is what callers pass on requests.


Manual fact management

You can ingest facts directly (without going through a chat turn) and query the session programmatically:

# Ingest a fact manually
curl -X POST https://api.routero.ai/memory/session/add \
  -d '{"memory_id": "user-alice", "messages": [{"role": "user", "content": "My team is in Singapore."}]}'

# Search the session
curl -X POST https://api.routero.ai/memory/session/search \
  -d '{"memory_id": "user-alice", "query": "location preferences"}'

# List all stored facts
curl "https://api.routero.ai/memory/session/user-alice/facts"

Org scoping and isolation

Memory sessions belong to the organization of the creating key. Sessions are IDOR-protected: a key from org A cannot access or inject a session from org B. The memory_id is opaque to the upstream provider — it is stripped before the request is forwarded.


Management API

Endpoint Description
GET /memory/engines List available memory engine types
POST /memory/session/create Create a memory session
GET /memory/sessions List all sessions in workspace
GET /memory/session/{id} Get session details
PATCH /memory/session/{id} Update session config
DELETE /memory/session/{id} Delete session and all stored facts
POST /memory/session/add Manually ingest facts
POST /memory/session/search Query the session
GET /memory/session/{id}/facts List all stored facts

Dependencies

Engine Required packages Required infrastructure
Mem0 mem0ai Postgres + pgvector
Cognee cognee Neo4j + Postgres + pgvector

Both are available in Private Deployments — see Reference Architecture for infrastructure requirements.


Internal cost accounting

Embedding and extraction calls made by the memory subsystem (for storage and retrieval) route back through the proxy under an internal service-account key. These costs are tracked as platform spend — not charged to the calling user’s key — and are visible in the billing dashboard under Internal / Platform.