Gateway Security#
Engineering whitepaper · Scutum · 2026
What this is#
A description of how Scutum is built to be operated safely as a piece of production AI infrastructure. It documents the threat model we design against, the boundaries we trust, the boundaries we don't, and the operator-side controls available at each layer. Customer-facing security review uses this plus our SOC 2 / DPA documentation. We update it on changes; the dated header tracks the revision.
Threat model#
Scutum sits between three classes of caller and three classes of callee:
┌─ Application traffic ──────────────────┐                      ┌─ External providers ────────────┐
│ Browser / mobile / server-side apps    │ ──► Scutum proxy ──► │ OpenAI / Anthropic / Bedrock /  │
│ (untrusted code, untrusted PII)        │     (LiteLLM)        │ Google / xAI / DeepSeek / Azure │
└────────────────────────────────────────┘                      └─────────────────────────────────┘
                                                    │   ▲
                                                    ▼   │
┌─ Operator traffic ─────────────────────┐
│ Admin Console (browser session)        │ ──► Admin-API ──► PostgreSQL (audit, config,
│ Operator CLIs (curl / scutum CLI)      │     (FastAPI)                 leads, licenses, etc.)
└────────────────────────────────────────┘         │
                                                   └──► Redis (cache, rate limit)

┌─ Internal services ────────────────────┐
│ Cost predictor, budget webhook, SRE    │ ──► Internal HTTP, X-Service-Key gated
│ agent, workflow engine, A2A runtime    │     (a separate auth boundary from the user-facing API)
└────────────────────────────────────────┘
We assume:
- Adversarial application traffic. Anything in a request body might be a prompt-injection payload, a credential-exfiltration attempt, an SSRF probe, or trivially malformed.
- Trusted operators, with the operator's session cookie or admin JWT as the single-factor binding. Multi-factor lives in the SSO IdP, not Scutum.
- External providers are honest but not infallible. They occasionally return wrong responses, leak prompts via shared infrastructure, or make undocumented breaking changes.
- Internal services are trusted because of the X-Service-Key boundary — they cannot reach Scutum from the public network without that key. Compromise of any internal service is treated as compromise of the platform.
- The host is not a hostile multi-tenant environment. Self-hosted Scutum runs on infrastructure you control. We do not defend against a co-tenant root user reading process memory; that's the host platform's job.
The non-goals: defending against compromised hardware, the provider intentionally lying about completions, or operator collusion (multiple admins working together to bypass audit). Those are out of scope for what software at the gateway layer can do.
Trust boundaries#
Five places a request crosses a boundary that needs explicit handling:
Boundary 1 — public TLS terminator#
In production, requests arrive at a TLS terminator — Cloudflare, NLB, your nginx — before reaching any Scutum container. We require TLS 1.2+ with modern ciphers. The terminator is responsible for the certificate; Scutum doesn't manage cert rotation. If you self-terminate (e.g., an nginx sidecar inside the same host), use a real cert from Let's Encrypt or your CA — don't ship with localhost.pem.
The terminator also passes the real client IP via X-Forwarded-For. Admin-api respects it for rate-limit accounting and for the audit log's actor_ip field. Do not expose admin-api directly without a terminator that injects this header; the rate limiter will key on the load-balancer IP and a single misbehaving client takes everyone down.
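A minimal sketch of what that rule looks like in a FastAPI service, assuming the terminator's egress ranges are known; TRUSTED_PROXIES and client_ip are illustrative names, not Scutum's actual implementation:

```python
import ipaddress

from fastapi import Request

# Illustrative: the CIDR ranges your TLS terminator / load balancer egresses from.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]

def client_ip(request: Request) -> str:
    """Resolve the real client IP for rate-limit accounting and actor_ip.

    X-Forwarded-For is honoured only when the TCP peer is a known
    terminator; otherwise the socket address wins, so a spoofed header
    cannot poison rate-limit keys or the audit log.
    """
    peer = ipaddress.ip_address(request.client.host if request.client else "0.0.0.0")
    if any(peer in net for net in TRUSTED_PROXIES):
        forwarded = request.headers.get("x-forwarded-for", "")
        if forwarded:
            # The left-most hop is the original client when the chain is trusted.
            return forwarded.split(",")[0].strip()
    return str(peer)
```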
Boundary 2 — application authentication (Bearer token)#
Every /v1/* request to the proxy and every /api/v1/* request to admin-api carries a Bearer token. There are three classes:
- Master SCUTUM_API_KEY. Generated by the installer, stored in config/.env. Compromise is total — every endpoint, every team. Rotate via the Admin Console (creates a new master key, revokes the old) on a quarterly cadence; immediately if leaked.
- Per-team / per-user keys. Created from the Admin Console; scoped to a team's allowed models, budget, and rate limit. Stored hashed in Postgres; the plaintext is shown once on creation. Compromise is bounded by the scope.
- Admin JWT. Issued by /auth/login after SSO or master-key authentication. Short-lived (15-minute idle, 8-hour absolute by default; configurable). Used for the Admin Console + privileged API calls.
The master key and per-team keys are validated by LiteLLM directly (it owns the LiteLLM_VerificationToken table). Admin JWTs are validated by admin-api against JWT_SECRET_KEY. No key class is ever logged; redaction is on by default in OpenTelemetry export.
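For illustration, the admin-JWT half of that check amounts to roughly the following; the HS256 algorithm and the required-claims list are assumptions, not a transcript of admin-api:

```python
import os

import jwt  # pyjwt

def validate_admin_jwt(token: str) -> dict:
    """Verify an admin JWT against JWT_SECRET_KEY and return its claims.

    pyjwt enforces the exp claim (the 8-hour absolute lifetime) during
    decode; the 15-minute idle window is server-side session state and
    is omitted here.
    """
    try:
        return jwt.decode(
            token,
            os.environ["JWT_SECRET_KEY"],
            algorithms=["HS256"],  # assumption: HMAC-signed admin JWTs
            options={"require": ["exp", "sub"]},
        )
    except jwt.ExpiredSignatureError:
        raise PermissionError("admin session expired")
    except jwt.InvalidTokenError as exc:
        raise PermissionError(f"invalid admin token: {exc}")
```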
Boundary 3 — internal-service auth (X-Service-Key)#
Service-to-service calls (cost predictor → admin-api, budget webhook → cost predictor, SRE agent → admin-api, …) authenticate via a single INTERNAL_SERVICE_KEY in config/.env. The header name is X-Service-Key.
We treat this as a coarse ring-zero credential. If it leaks, every internal service is suspect. Two design decisions follow:
- Internal services don't accept user traffic. Outside development they listen only on the Compose network, never on host ports. Production Compose binds them to the internal Docker network; production Kubernetes binds them to ClusterIP services. The internal key is therefore not a key the public internet can present.
- The key is not used for fine-grained authz. It only crosses the "is this a Scutum component" boundary. Authz inside the platform uses the user's identity from the upstream Bearer token, which the calling service forwards.
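The shape this boundary check takes in a FastAPI service is a small dependency; require_service_key below is an illustrative sketch, not Scutum's actual code:

```python
import os
import secrets

from fastapi import Header, HTTPException

def require_service_key(x_service_key: str = Header(default="")) -> None:
    """Gate an internal endpoint on X-Service-Key.

    secrets.compare_digest keeps the comparison constant-time, and the
    bare 403 avoids telling callers whether the key was wrong or absent.
    Attach with dependencies=[Depends(require_service_key)] on internal routers.
    """
    if not secrets.compare_digest(x_service_key, os.environ["INTERNAL_SERVICE_KEY"]):
        raise HTTPException(status_code=403)
```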
Boundary 4 — provider API keys#
Provider keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, …) live in config/.env and are read by LiteLLM at call time. They never enter the request path your application sees, never appear in audit logs, never appear in OpenTelemetry traces (LiteLLM strips them; we have a separate sanitiser in our OTel pipeline as defence-in-depth).
If you need per-team provider keys (BYO keys — a customer's OPENAI_API_KEY for their own usage), use LiteLLM's per-key model overrides in the Admin Console rather than putting customer keys in your central .env. Per-key overrides are stored encrypted at rest (AES-GCM via cryptography); rotation is via the same UI.
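For reference, AES-GCM with the cryptography package looks like the sketch below; the helper names and the nonce-prefix storage layout are our illustration, not LiteLLM's actual storage format:

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_provider_key(plaintext: str, kek: bytes) -> bytes:
    """Encrypt a BYO provider key with AES-256-GCM for at-rest storage.

    GCM requires a fresh nonce per encryption; prefixing it to the
    ciphertext (which already carries the auth tag) keeps the blob
    self-contained.
    """
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, plaintext.encode(), None)

def decrypt_provider_key(blob: bytes, kek: bytes) -> str:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(kek).decrypt(nonce, ciphertext, None).decode()

# The key-encryption key lives in a secrets manager, never in Postgres:
# kek = AESGCM.generate_key(bit_length=256)
```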
Boundary 5 — license validation#
Customer-side license JWTs are validated offline against a public key bundled in the admin-api image (config/license-public.pem). The matching private key lives only on the issuer's laptop. Two failure modes we explicitly defend against:
- Tampered JWT. Caught by the Ed25519 signature check in pyjwt; rejected with a structured error.
- Substituted public key (an attacker rebuilds the admin-api image with their own public key, then mints a license against it). The bundled key ships as part of the published image; substituting it means rebuilding admin-api from source, which voids your support and is detectable in audit logs (the image digest changes).
The license validator is soft-fail: an expired or missing license never crashes admin-api, only triggers the renewal banner and operator-visible state. The threat model treats license bypass as a commercial concern, not an availability concern — the platform stays up regardless.
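A sketch of what offline, soft-fail validation amounts to, assuming pyjwt with the cryptography backend; the state-dict shape is our illustration:

```python
import jwt  # pyjwt, with the cryptography extra for EdDSA support

PUBLIC_KEY_PATH = "config/license-public.pem"

def license_state(token: str | None) -> dict:
    """Validate a license JWT offline and never raise.

    Every failure mode maps to a renderable state (the renewal banner)
    rather than an exception that could take admin-api down.
    """
    if not token:
        return {"valid": False, "reason": "missing"}
    with open(PUBLIC_KEY_PATH) as f:
        public_key = f.read()
    try:
        claims = jwt.decode(token, public_key, algorithms=["EdDSA"])
        return {"valid": True, "claims": claims}
    except jwt.ExpiredSignatureError:
        return {"valid": False, "reason": "expired"}
    except jwt.InvalidTokenError:
        return {"valid": False, "reason": "tampered"}
```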
Audit-log integrity#
Every admin mutation writes to audit_logs with actor_id, actor_email, actor_ip, action, resource_type, resource_id, changes (JSONB), and timestamp. The table is append-only by convention; we do not provide a UI or API path that deletes or edits audit rows.
A privileged Postgres user could directly mutate the table from psql. To defend against this we recommend:
- A separate Postgres role that admin-api connects as, with INSERT and SELECT on audit_logs but no UPDATE or DELETE (a sketch follows this list).
- An audit-log archiver job that periodically copies rows older than 30 days to immutable object storage with a retention lock (S3 Object Lock in compliance mode, GCS Bucket Lock).
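A minimal sketch of the first recommendation, run once as a Postgres superuser; the role name, password placeholder, and DSN are ours, not shipped defaults:

```python
import asyncio

import asyncpg

SETUP_SQL = """
CREATE ROLE scutum_adminapi LOGIN PASSWORD 'change-me';
GRANT INSERT, SELECT ON audit_logs TO scutum_adminapi;
REVOKE UPDATE, DELETE, TRUNCATE ON audit_logs FROM scutum_adminapi;
"""

async def lock_down_audit_table(dsn: str) -> None:
    """Create an append-only role for admin-api's audit writes.

    With no UPDATE/DELETE grant, even a compromised admin-api process can
    add rows but cannot rewrite history; only a superuser can.
    """
    conn = await asyncpg.connect(dsn)
    try:
        for statement in (s.strip() for s in SETUP_SQL.split(";")):
            if statement:
                await conn.execute(statement)
    finally:
        await conn.close()

# asyncio.run(lock_down_audit_table("postgresql://postgres@db/scutum"))
```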
Tamper-evidence beyond that — Merkle-chained audit hashes, third-party witnesses — is on our roadmap but not shipped in v0.1. If you need cryptographic non-repudiation today, write to us.
Data minimisation#
What Scutum stores in Postgres:
- Configuration: keys (hashed), teams, budgets, model access, prompts, guardrails, MCP servers, A2A agents, routing policies. None contains user content beyond what the operator typed.
- Audit logs: who-did-what at the configuration layer. We do not log request bodies or response bodies into audit_logs; they route through OpenTelemetry traces (whether to retain those, and for how long, is your decision, made in your OTel collector config).
- LiteLLM SpendLogs: per-request cost, latency, model, team, and truncated prompt/response previews (configurable, off by default for prompt content). The full request body is never stored if LITELLM_LOG_REQUEST_RESPONSE is unset, which is the default we ship.
- Demo requests: only on the scutum.dev marketing-profile deployment. The customer-safe release compose excludes the landing-backend service entirely; customers' deployments don't have a demo-request endpoint to attack. (This is enforced by the marketing-vs-customer profile gate, the highest-impact security boundary in the codebase.)
What Scutum never stores:
- Provider API keys in any database (only in env / secrets manager).
- Request payloads to /v1/chat/completions in audit-log structured columns.
- Browser cookies, or IP addresses beyond actor_ip on admin actions.
Common attacks and the platform's response#
| Attack | Where it hits | Defence |
|---|---|---|
| Prompt injection in a user-supplied document | LLM proxy | Out of scope at the platform layer (it's a model-side problem). Mitigations: ship guardrails (regex, DLP, semantic-similarity, model-based) on inbound prompts; the application is responsible for not blindly executing model output. |
| API key stuffing (brute-forcing valid Bearer tokens) | LiteLLM proxy | Per-IP rate limiting (Redis-backed) on /v1/*. 401 responses don't leak whether the key was malformed vs. revoked vs. unknown. |
| DoS via expensive prompts (huge max_tokens, deep agent loops) | LiteLLM proxy | Pre-call cost prediction (cost-predictor) gates request size against per-team budgets. MAX_REQUEST_SIZE_BYTES middleware (default 1 MiB) caps pathological bodies before parsing. |
| SSRF via a request URL field | Admin API (MCP/A2A configs) | Cedar policies on mcp.url reject loopback, link-local, and metadata-service IPs. Validation runs at config-write time, not call time, so misconfiguration fails early. |
| Audit log poisoning (forging actor_ip, actor_id) | Admin API | actor_id reads from JWT claims (signed); actor_ip reads from X-Forwarded-For only when a trusted proxy is configured. Untrusted ingress is rejected at the load balancer. |
| License signature substitution | Admin API license validator | Public key in image; substitution requires rebuilding from source. |
| PII exfiltration via logs | OpenTelemetry pipeline | redact_keys config in otel-collector strips Authorization headers, X-Service-Key, and a regex-matched set of likely PII fields (emails, SSN-shape, credit-card-shape) before export. Operator-extensible. |
| Per-team budget bypass by switching team_id mid-request | LiteLLM proxy | Team identity is bound to the API key, not the request body. Switching teams requires switching keys, which requires admin action, which is audit-logged. |
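Scutum expresses the SSRF rule above as Cedar policy; the Python sketch below shows the equivalent resolve-and-reject logic for readers who want to test a URL by hand (the blocklist and function name are illustrative):

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Ranges an operator-supplied MCP/A2A URL must never resolve into.
BLOCKED_NETS = [
    ipaddress.ip_network("127.0.0.0/8"),     # loopback
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("169.254.0.0/16"),  # link-local, incl. cloud metadata
    ipaddress.ip_network("::1/128"),
]

def validate_outbound_url(url: str) -> None:
    """Reject URLs that resolve into private, loopback, or metadata ranges.

    Resolving at config-write time fails misconfiguration early; a
    production validator would re-check at call time as well, to defend
    against DNS rebinding between write and use.
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError("URL has no host")
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(host, None):
        addr = ipaddress.ip_address(sockaddr[0])
        if any(addr in net for net in BLOCKED_NETS):
            raise ValueError(f"{host} resolves into a blocked range: {addr}")
```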
What an enterprise security review usually asks, and where to find the answer#
When a buyer's CISO sends us their security questionnaire, the answers cluster:
- "Where does our data go?" Self-hosted on your infrastructure. Provider API calls go to whichever providers you configure. We — Scutum the company — never see your data.
- "What's your SOC 2 status?" SOC 2 Type I planned for Q3 2026; Type II Q1 2027. We document compensating controls in the meantime.
- "How do you handle vulnerabilities in upstream components?" We track LiteLLM, FastAPI, asyncpg, the Python and Node base images. Critical CVEs trigger an out-of-band patch release within 7 days; high-severity within 30. Our changelog (CHANGELOG.md) lists CVE references on relevant releases.
- "Can we audit a build?" Image labels include
org.opencontainers.image.source(this repo) andorg.opencontainers.image.version. We sign release images with cosign on the roadmap; not yet shipped. - "Do you have a bug bounty?" Not yet. Responsible disclosures to
[email protected]get a response within 48h and a coordinated patch within the windows above.
What you should rotate, and how often#
| Credential | Recommended rotation | Out-of-cycle trigger |
|---|---|---|
| JWT_SECRET_KEY | Quarterly | Suspected admin-account compromise |
| INTERNAL_SERVICE_KEY | Quarterly | Suspected internal-service compromise |
| POSTGRES_PASSWORD | Annually | Database access change |
| Per-team API keys | When a team member departs | A holder of the key leaves |
| Master SCUTUM_API_KEY | Only when leaked; otherwise leave it alone (rotation breaks every customer integration) | Confirmed or suspected leak |
| License JWT | At renewal (we email 14 days before) | We email; you POST /api/v1/license/activate |
The Admin Console has a "Rotate" action for the first three. Postgres password rotation requires a brief restart; the others are zero-downtime.
Reporting issues#
[email protected] for vulnerabilities. PGP key on the website (planned — currently use email + Signal at our published number for sensitive disclosures). We respond within 48 hours, coordinate disclosure timing, and credit reporters in release notes if they consent.
For non-security operational questions: [email protected].