API Integration Guide#
Scutum exposes an OpenAI-compatible LLM endpoint on port 4000. Any application that works with the OpenAI API can connect by changing the base URL and API key — no other code changes required.
Authentication#
All requests to the Scutum proxy require a Bearer token in the Authorization header. Use SCUTUM_API_KEY from your config/.env file, or a per-team / per-user scoped key created from the Admin Console.
For production, generate per-user or per-team API keys through the Admin UI or Admin API at http://localhost:8086.
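If you are calling the proxy without an SDK, set the header yourself. A minimal sketch using Python's requests library (host, port, and key as described above):
import os
import requests

# Any /v1/ endpoint works as an auth check; /v1/models is the simplest.
resp = requests.get(
    "http://localhost:4000/v1/models",
    headers={"Authorization": f"Bearer {os.environ['SCUTUM_API_KEY']}"},
)
print(resp.status_code)  # 200 with a valid key, 401 otherwise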
Base URL#
The Scutum proxy listens on port 4000 of whichever host you deployed to. Examples:
| Environment | Base URL |
|---|---|
| Local | http://localhost:4000 |
| Your VM | http://<your-host>:4000 (or behind your TLS terminator) |
| Admin UI | http://<your-host>:5173 |
All OpenAI-compatible endpoints live under /v1/:
- POST /v1/chat/completions -- chat completions (streaming and non-streaming)
- POST /v1/embeddings -- text embeddings
- GET /v1/models -- list available models
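A quick connectivity check is to list the available models through the OpenAI SDK (a minimal sketch; base URL and key as described above):
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key=os.environ["SCUTUM_API_KEY"],
)

# GET /v1/models -- print the ID of every model the gateway exposes
for model in client.models.list():
    print(model.id)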
Code Examples#
curl#
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SCUTUM_API_KEY" \
  -d '{
    "model": "claude-sonnet-4.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7
  }'
Python (OpenAI SDK)#
import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key=os.environ["SCUTUM_API_KEY"],
)

response = client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
TypeScript (OpenAI SDK)#
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: process.env.SCUTUM_API_KEY,
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4.5",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  max_tokens: 256,
  temperature: 0.7,
});
console.log(response.choices[0].message.content);
Go (OpenAI SDK)#
package main

import (
    "context"
    "fmt"
    "os"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    config := openai.DefaultConfig(os.Getenv("SCUTUM_API_KEY"))
    config.BaseURL = "http://localhost:4000/v1"
    client := openai.NewClientWithConfig(config)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "claude-sonnet-4.5",
            Messages: []openai.ChatCompletionMessage{
                {Role: "system", Content: "You are a helpful assistant."},
                {Role: "user", Content: "What is the capital of France?"},
            },
            MaxTokens:   256,
            Temperature: 0.7,
        },
    )
    if err != nil {
        panic(err)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}
Model Aliases#
Instead of specifying a provider-specific model, you can use group aliases that route across multiple providers automatically:
| Alias | Routes to | Best for |
|---|---|---|
| fast | gpt-5-mini, claude-haiku-4.5, gemini-3-flash, grok-3-mini | Low-latency responses |
| smart | gpt-5, claude-sonnet-4.5, gemini-3-pro, grok-4 | Balanced quality & speed |
| powerful | gpt-5.2, claude-opus-4.5, o3-pro, grok-4-heavy | Maximum capability |
| reasoning | o3, o3-pro, deepseek-r1 | Complex reasoning tasks |
| coding | claude-sonnet-4.5, deepseek-coder, codellama | Code generation & review |
| cost-effective | gpt-5-mini, claude-haiku-4.5, gemini-2.5-flash-lite, deepseek-v3 | Budget-friendly |
# Use a group alias -- the gateway picks the best available model
response = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "Summarize this text..."}],
)
Provider-specific aliases are also available: openai, anthropic, google, xai, deepseek, bedrock, vertex, azure.
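For example, to pin a request to a provider while letting the gateway choose the concrete model (a sketch; which model the alias resolves to depends on your gateway configuration):
# Route to whichever Anthropic model the gateway selects for this alias
response = client.chat.completions.create(
    model="anthropic",
    messages=[{"role": "user", "content": "Hello"}],
)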
Streaming#
Enable streaming by setting stream: true. The gateway returns Server-Sent Events (SSE) in the standard OpenAI format:
Python Streaming#
stream = client.chat.completions.create(
    model="claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Write a short poem about APIs."}],
    stream=True,
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
TypeScript Streaming#
const stream = await client.chat.completions.create({
  model: "claude-sonnet-4.5",
  messages: [{ role: "user", content: "Write a short poem." }],
  stream: true,
});
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
Embeddings#
Generate text embeddings through the same gateway:
response = client.embeddings.create(
    model="azure-text-embedding-3-large",
    input="The quick brown fox jumps over the lazy dog",
)
vector = response.data[0].embedding
print(f"Embedding dimension: {len(vector)}")
Available embedding models include azure-text-embedding-3-large and azure-text-embedding-3-small. These require AZURE_API_KEY and AZURE_API_BASE to be set in config/.env.
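Embedding vectors are typically compared with cosine similarity. A minimal sketch in plain Python (reusing the client from the examples above; the batched list input is standard OpenAI API behavior):
import math

def cosine(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

resp = client.embeddings.create(
    model="azure-text-embedding-3-small",
    input=["The quick brown fox", "A fast auburn fox"],
)
print(f"similarity: {cosine(resp.data[0].embedding, resp.data[1].embedding):.3f}")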
Error Handling#
The gateway returns standard HTTP error codes with JSON bodies:
| Status | Meaning | Common cause |
|---|---|---|
| 400 | Bad Request | Invalid JSON or missing required fields |
| 401 | Unauthorized | Missing or invalid API key |
| 404 | Not Found | Unknown model name |
| 429 | Rate Limited | Too many requests (RPM/TPM exceeded) |
| 500 | Internal Server Error | Upstream provider failure |
| 503 | Service Unavailable | All providers in fallback chain failed |
Python Error Handling#
import os

from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError

client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key=os.environ["SCUTUM_API_KEY"],
)

try:
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[{"role": "user", "content": "Hello"}],
    )
except RateLimitError:
    print("Rate limited -- back off and retry")
except APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")
except APIConnectionError:
    print("Could not connect to the gateway")
Rate Limits#
The gateway enforces rate limits at multiple levels:
- Global: 1000 requests/minute (configurable in Admin UI > Settings)
- Per key: 100 RPM and 100,000 TPM by default (configurable per API key)
- Per model: Provider-specific limits are respected automatically
When rate limited, the response includes a Retry-After header. The OpenAI SDKs handle this automatically with exponential backoff.
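SDK users get this for free; if you call the gateway over raw HTTP, you can honor Retry-After yourself. A minimal sketch with Python's requests library (the retry count and fallback delay are illustrative choices):
import os
import time
import requests

url = "http://localhost:4000/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['SCUTUM_API_KEY']}"}
payload = {"model": "gpt-5", "messages": [{"role": "user", "content": "Hello"}]}

for attempt in range(5):
    resp = requests.post(url, headers=headers, json=payload)
    if resp.status_code != 429:
        break
    # Prefer the gateway's Retry-After header; fall back to exponential backoff
    time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
resp.raise_for_status()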
Fallback Chains#
When a primary model is unavailable or returns an error, the gateway automatically tries fallback models. For example, a request to gpt-5 will fall back to gpt-5.2, then claude-opus-4.5, then grok-4.
Fallback happens transparently -- your application receives a response without needing to implement retry logic. The response metadata includes the actual model used:
response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}],
)
# Check which model actually served the request
print(response.model)  # Could be "gpt-5" or a fallback like "claude-opus-4.5"
Key fallback chains configured by default:
| Primary Model | Fallback Sequence |
|---|---|
| gpt-5 | gpt-5.2, claude-opus-4.5, grok-4 |
| claude-opus-4.5 | claude-sonnet-4.5, gpt-5, grok-4 |
| claude-sonnet-4.5 | claude-opus-4.5, gpt-5, gemini-3-pro |
| gemini-3-pro | gemini-2.5-pro, claude-sonnet-4.5, gpt-5 |
| deepseek-r1 | o3, deepseek-v3 |
Admin API#
The Admin API at http://localhost:8086 (production: https://api.aicontrolplane.dev/admin/api/v1) provides management endpoints for programmatic configuration. It uses JWT authentication:
# Get a JWT token
TOKEN=$(curl -s http://localhost:8086/auth/login \
  -H "Content-Type: application/json" \
  -d "{\"api_key\": \"$SCUTUM_API_KEY\"}" | jq -r '.access_token')
Available Endpoints#
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/models | List all model configurations |
| PUT | /api/v1/models/{model_id} | Update a model configuration |
| GET | /api/v1/budgets | List all budgets |
| POST | /api/v1/budgets | Create a budget |
| PUT | /api/v1/budgets/{id} | Update a budget |
| GET | /api/v1/teams | List all teams |
| POST | /api/v1/teams | Create a team |
| PUT | /api/v1/teams/{team_id} | Update a team |
| DELETE | /api/v1/teams/{team_id} | Delete a team |
| GET | /api/v1/guardrail-assignments | List guardrail-to-team assignments |
| GET | /api/v1/keys | List all API keys |
| POST | /api/v1/keys/generate | Generate a new API key |
| POST | /api/v1/keys/update | Update an API key |
| POST | /api/v1/keys/delete | Delete API keys |
| GET | /api/v1/mcp-servers | List MCP server configs |
| POST | /api/v1/mcp-servers | Create an MCP server config |
| PUT | /api/v1/mcp-servers/{id} | Update an MCP server config |
| POST | /api/v1/mcp-servers/{id}/test | Test MCP server connectivity |
| GET | /api/v1/mcp-servers/sync/preview | Preview Agent Gateway config |
| POST | /api/v1/mcp-servers/sync | Deploy MCP configs to Agent Gateway |
| GET | /api/v1/workflows | List workflow definitions |
| POST | /api/v1/workflows | Create a workflow |
| POST | /api/v1/workflow-executions | Execute a workflow |
| GET | /api/v1/workflow-executions | List workflow executions |
| GET | /api/v1/workflow-executions/{id} | Get execution details |
| GET | /api/v1/routing-policies | List routing policies |
| POST | /api/v1/routing-policies | Create a routing policy |
| GET | /api/v1/metrics/realtime | Get real-time platform metrics |
| GET | /api/v1/settings | Get platform settings |
| PUT | /api/v1/settings | Update platform settings |
# Example: List models
curl http://localhost:8086/api/v1/models \
  -H "Authorization: Bearer $TOKEN"

# Example: Generate an API key
curl -X POST http://localhost:8086/api/v1/keys/generate \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"key_alias": "my-service", "max_budget": 100, "duration": "90d"}'

# Example: Preview Agent Gateway config
curl http://localhost:8086/api/v1/mcp-servers/sync/preview \
  -H "Authorization: Bearer $TOKEN"
See the Admin UI Guide for the full web console walkthrough.