Memory-as-a-Service

Memory-as-a-Service (MaaS) turns the gateway into a memory provider. Applications get personalization and long-term context without operating their own vector store or graph database: they only need to pass a memory_id on each request.

How it works

Pre-call (retrieval): The gateway takes the latest user message, searches the memory session for the top-3 relevant facts, and injects them into the system message as:

[Past Context for ID: user-alice]
- Prefers summaries under 200 words
- Working on Q3 APAC analysis
- Last session: discussed Bedrock pricing
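
The retrieval step can be pictured as a small formatting function. The sketch below is illustrative, not the gateway's implementation: `format_past_context` and `inject_into_system_message` are hypothetical names, and the facts are assumed to arrive already ranked by the memory backend.

```python
from typing import Dict, List

TOP_K = 3  # the gateway injects the top-3 relevant facts


def format_past_context(memory_id: str, facts: List[str], top_k: int = TOP_K) -> str:
    """Render ranked facts in the '[Past Context for ID: ...]' layout shown above."""
    lines = [f"[Past Context for ID: {memory_id}]"]
    lines += [f"- {fact}" for fact in facts[:top_k]]
    return "\n".join(lines)


def inject_into_system_message(
    messages: List[Dict[str, str]], context: str
) -> List[Dict[str, str]]:
    """Return a copy of messages with the context appended to the system message.

    Assumption: if the request has no system message, one is created at the front.
    """
    out = [dict(m) for m in messages]
    for m in out:
        if m["role"] == "system":
            m["content"] = f"{m['content']}\n\n{context}"
            return out
    return [{"role": "system", "content": context}] + out
```

The input list is copied rather than mutated, so the caller's original messages stay untouched if retrieval has to be retried.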

Post-call (storage): After the LLM responds, the gateway asynchronously stores the (user message, assistant response) turn in the memory backend. Pass store_memory: false to skip storage on a specific request.
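
As a rough illustration of "respond first, store in the background", the sketch below schedules the write as an asyncio task. Everything here is a stand-in: `store_turn`, `handle_request`, and the in-memory `STORED_TURNS` list are hypothetical, and the LLM call is replaced by an echo.

```python
import asyncio
from typing import List, Set, Tuple

STORED_TURNS: List[Tuple[str, str, str]] = []  # stand-in for the memory backend
BACKGROUND: Set["asyncio.Task[None]"] = set()  # keeps pending tasks referenced


async def store_turn(memory_id: str, user_msg: str, assistant_msg: str) -> None:
    """Pretend to write one (user message, assistant response) turn."""
    await asyncio.sleep(0)  # yield control, as a real network write would
    STORED_TURNS.append((memory_id, user_msg, assistant_msg))


async def handle_request(memory_id: str, user_msg: str, store_memory: bool = True) -> str:
    """Produce the response, then schedule storage without waiting for it."""
    assistant_msg = f"echo: {user_msg}"  # placeholder for the LLM call
    if store_memory:
        task = asyncio.create_task(store_turn(memory_id, user_msg, assistant_msg))
        BACKGROUND.add(task)
        task.add_done_callback(BACKGROUND.discard)
    return assistant_msg


async def demo() -> List[Tuple[str, str, str]]:
    await handle_request("user-alice", "hello")
    await handle_request("user-alice", "do not keep this", store_memory=False)
    await asyncio.gather(*BACKGROUND)  # drain pending writes before exiting
    return list(STORED_TURNS)
```

Running `asyncio.run(demo())` stores only the first turn; the second request opts out with `store_memory=False`.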

Activation

from openai import OpenAI

# Assumption: the gateway is OpenAI-compatible at this base URL; adjust both values for your deployment.
client = OpenAI(base_url="https://api.routero.ai/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="smart/balanced",
    messages=[{"role": "user", "content": "Remind me where we left off."}],
    extra_body={
        "memory_id": "user-alice",
        "store_memory": True,          # True is the default, so this key can be omitted
    },
)

Memory engines

Engine	Backend	Best for
Mem0	Postgres + pgvector	User preferences, recent facts, short-to-medium semantic recall
Cognee	Neo4j + pgvector + Postgres	Entity/relationship knowledge, long-horizon reasoning, knowledge graph queries

Choose the engine when creating the memory session. Sessions cannot change engine after creation.

Mem0 applies keyword-versus-question heuristics to queries and deduplicates facts to reduce redundant storage.
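
To make the two ideas concrete, here is a deliberately simple sketch of a question-versus-keyword check and of exact-match deduplication after normalization. It does not reproduce Mem0's or the gateway's actual logic; the function names and the normalization rule are assumptions chosen for illustration.

```python
import re
from typing import List

QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def looks_like_question(query: str) -> bool:
    """Heuristic: ends with '?' or starts with a question word."""
    q = query.strip().lower()
    if q.endswith("?"):
        return True
    words = q.split()
    return bool(words) and words[0] in QUESTION_WORDS


def normalize(fact: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", fact.lower()).split())


def add_fact(store: List[str], fact: str) -> bool:
    """Append the fact unless a normalized duplicate exists. Returns True if stored."""
    key = normalize(fact)
    if any(normalize(existing) == key for existing in store):
        return False
    store.append(fact)
    return True
```

A production system would more likely compare embeddings than normalized strings; exact matching is used here only to keep the example self-contained.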

Cognee supports the search types SearchType.GRAPH_COMPLETION, SearchType.CHUNKS, and SearchType.SUMMARIES, with a fallback from graph search to vector search for robustness. Deleting a session cleans up Neo4j, pgvector, and Postgres atomically.
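
The graph-to-vector fallback follows a common pattern, sketched below. This example does not call Cognee; the two search functions are passed in as plain callables, and treating a graph error the same as an empty result is an assumption.

```python
from typing import Callable, List

SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str, graph_search: SearchFn, vector_search: SearchFn
) -> List[str]:
    """Try graph search first; fall back to vector search on error or empty result."""
    try:
        results = graph_search(query)
    except Exception:
        results = []  # assumption: a graph failure is handled like an empty result
    if results:
        return results
    return vector_search(query)
```

Injecting the search functions keeps the fallback logic testable without a running Neo4j or Postgres instance.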

Creating a memory session

curl -X POST https://api.routero.ai/memory/session/create \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_name": "user-alice",
    "engine_name": "mem0"
  }'

The returned memory_id is what callers pass on requests.

Manual fact management

You can ingest facts directly (without going through a chat turn) and query the session programmatically:

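# Requests must be authenticated with a key from the session's organization.
# Ingest a fact manually
curl -X POST https://api.routero.ai/memory/session/add \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"memory_id": "user-alice", "messages": [{"role": "user", "content": "My team is in Singapore."}]}'

# Search the session
curl -X POST https://api.routero.ai/memory/session/search \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"memory_id": "user-alice", "query": "location preferences"}'

# List all stored facts
curl "https://api.routero.ai/memory/session/user-alice/facts" \
  -H "Authorization: Bearer $ADMIN_KEY"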
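
The same three calls can be built from Python with the standard library. This is a sketch under assumptions: the helper names are hypothetical, Bearer authentication is assumed for every endpoint, and the functions only construct the requests. Sending them with `urllib.request.urlopen(req)` requires a valid key.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.routero.ai"


def build_request(path: str, api_key: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST with a JSON body when payload is given, otherwise a GET."""
    headers = {"Authorization": f"Bearer {api_key}"}  # assumption: Bearer auth
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def add_fact(memory_id: str, content: str, api_key: str) -> urllib.request.Request:
    body = {"memory_id": memory_id, "messages": [{"role": "user", "content": content}]}
    return build_request("/memory/session/add", api_key, body)


def search(memory_id: str, query: str, api_key: str) -> urllib.request.Request:
    return build_request("/memory/session/search", api_key, {"memory_id": memory_id, "query": query})


def list_facts(memory_id: str, api_key: str) -> urllib.request.Request:
    return build_request(f"/memory/session/{memory_id}/facts", api_key)
```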

Org scoping and isolation

Memory sessions belong to the organization of the key that created them. Sessions are protected against insecure direct object reference (IDOR): a key from org A cannot access a session that belongs to org B, nor have its contents injected into a request. The memory_id is never exposed to the upstream provider, because the gateway strips it before forwarding the request.
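
The two guarantees above, an ownership check and stripping gateway-only fields, might look roughly like this. The names are hypothetical, and stripping `store_memory` alongside `memory_id` is an assumption (the text above only states that `memory_id` is removed).

```python
from typing import Any, Dict, Tuple

# Assumption: both memory fields are gateway-only and never forwarded upstream.
GATEWAY_ONLY_FIELDS = ("memory_id", "store_memory")


class SessionAccessError(PermissionError):
    """Raised when a key tries to use a session outside its organization."""


def authorize_session(key_org: str, session_org: str) -> None:
    """Reject cross-organization access to a memory session."""
    if key_org != session_org:
        # Same message as for a missing session, so existence is not leaked.
        raise SessionAccessError("session not found")


def split_gateway_fields(body: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (upstream_body, gateway_options) without mutating the input."""
    upstream = {k: v for k, v in body.items() if k not in GATEWAY_ONLY_FIELDS}
    options = {k: body[k] for k in GATEWAY_ONLY_FIELDS if k in body}
    return upstream, options
```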

Management API

Endpoint	Description
`GET /memory/engines`	List available memory engine types
`POST /memory/session/create`	Create a memory session
`GET /memory/sessions`	List all sessions in workspace
`GET /memory/session/{id}`	Get session details
`PATCH /memory/session/{id}`	Update session config
`DELETE /memory/session/{id}`	Delete session and all stored facts
`POST /memory/session/add`	Manually ingest facts
`POST /memory/session/search`	Query the session
`GET /memory/session/{id}/facts`	List all stored facts

Dependencies

Engine	Required packages	Required infrastructure
Mem0	`mem0ai`	Postgres + pgvector
Cognee	`cognee`	Neo4j + Postgres + pgvector

Both engines are available in Private Deployments; see Reference Architecture for infrastructure requirements.

Internal cost accounting

Embedding and extraction calls made by the memory subsystem (for storage and retrieval) route back through the proxy under an internal service-account key. These costs are tracked as platform spend — not charged to the calling user’s key — and are visible in the billing dashboard under Internal / Platform.
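
The attribution rule can be summarized in a few lines. `SpendLedger` is a hypothetical illustration of the bookkeeping, not the gateway's billing code; only the bucket label comes from the dashboard name above.

```python
from collections import defaultdict
from typing import DefaultDict

PLATFORM_BUCKET = "Internal / Platform"


class SpendLedger:
    """Accumulates cost per bucket: the caller's key or the platform bucket."""

    def __init__(self) -> None:
        self.totals: DefaultDict[str, float] = defaultdict(float)

    def record(self, caller_key: str, cost_usd: float, internal: bool = False) -> None:
        # Memory-subsystem calls are booked to the platform, not the caller.
        bucket = PLATFORM_BUCKET if internal else caller_key
        self.totals[bucket] += cost_usd
```

For example, a chat completion recorded with `internal=False` lands on the caller's key, while the embedding call that stored the turn is recorded with `internal=True` and lands under Internal / Platform.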