# LiteLLM Integration
How the AI Control Plane uses LiteLLM, which features are pre-configured and exposed through the Admin UI, and which features are available for adoption.
## Our Approach
LiteLLM v1.80+ has dozens of powerful features. Most deployments use less than 20% of them because configuration is complex, scattered across YAML files, and undocumented for specific use cases. This platform pre-wires the most valuable features with sensible defaults and exposes them through a unified UI.
## What We Pre-Configure

| Feature | LiteLLM Config | Platform Value-Add |
|---|---|---|
| 85+ models across 9 providers | `model_list` in `config.yaml` | Pre-configured with pricing, RPM/TPM limits, and cross-provider fallback chains |
| Usage-based routing | `router_settings.routing_strategy` | Pre-set with RPM/TPM limit checking and pre-call validation |
| Fallback chains | `fallbacks` in `config.yaml` | Pre-built cross-provider chains (GPT-5 → Claude → Grok, etc.) |
| Model group aliases | Model groups in `config.yaml` | 15 semantic aliases: `fast`, `smart`, `powerful`, `reasoning`, `coding`, `cost-effective`, plus per-provider (`openai`, `anthropic`, `google`, `xai`, `deepseek`, `bedrock`, `vertex`, `azure`, `local`); see the example after this table |
| Redis semantic caching | `cache_params` | Pre-configured with `redis-semantic` type, 0.92 similarity threshold, 1-hour TTL |
| OpenTelemetry + Prometheus | `success_callback`, `failure_callback` | Pre-wired to the OTEL Collector and Prometheus, feeding pre-built Grafana dashboards |
| Guardrails | `guardrails` in `config.yaml` | Custom pre-call guardrail handler, manageable via the Admin UI |
| Budget defaults | `budget_config` | Global soft/hard limits plus per-key defaults, with a budget webhook for alerts |
| Health checks | `background_health_checks` | Every 2 hours; unhealthy models are automatically excluded from routing |
| Retry with backoff | `retry_policy` | 3 retries with exponential backoff before the fallback chain triggers |
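The semantic aliases work as drop-in model names on the proxy's OpenAI-compatible endpoint. A minimal sketch, assuming the bundled proxy listens on its default port 4000 and you hold a platform-issued virtual key (both values are illustrative):

```bash
# The "fast" alias resolves to its model group; the router picks a deployment
# using the configured usage-based strategy and falls back if needed.
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "fast", "messages": [{"role": "user", "content": "Summarize this ticket in one sentence."}]}'
```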
## What the Admin UI Exposes

The React dashboard exposes 20 pages; some proxy to LiteLLM, others are powered entirely by the Admin API:

| Admin UI Page | Backend | What You Can Do |
|---|---|---|
| Dashboard | LiteLLM `/spend/report` | Today/week/month spend, top models, request counts |
| Models | LiteLLM `/model/*` | View all models, add new ones, delete unused |
| API Keys | LiteLLM `/key/*` | Create keys with budgets, rate limits, model restrictions, expiry |
| Teams | LiteLLM `/team/*` | Create teams with isolated budgets and model access |
| Budgets | LiteLLM `/budget/*` | Create reusable budget profiles |
| Organizations | Admin API | Multi-tenant org hierarchy with business units, SSO, member roles |
| Audit Log | Admin API | Filterable activity logs with CSV/JSON export |
| Prompts | Admin API | Versioned prompt templates with rendering, approval workflows |
| Rate Limits | Admin API | Per-user/team/model rate policies with burst control |
| Model Access | Admin API | Tiered access with approval workflows and grant durations |
| Chargeback | Admin API | Cost allocation rules, chargeback reports, budget forecasting |
| SLA Monitor | Admin API | Provider health, SLA definitions, violations, failover rules |
| A/B Tests | Admin API | Model comparison with traffic splitting and metric collection |
| Events | Admin API | Event subscriptions with webhook/Slack/email delivery |
| Routing | Admin API + LiteLLM sync | Fallback chains, model groups, routing strategies |
| MCP Servers | Admin API + Agent Gateway | MCP server config, connectivity testing, deployment |
| A2A Agents | Admin API | Agent-to-Agent endpoint configuration |
| Guardrails | Admin API | PII detection, toxicity, prompt injection, DLP detectors |
| Workflows | Admin API + Workflow Engine | LangGraph workflow templates and executions |
| Settings | Admin API | Default model, global rate limit, caching, maintenance mode |
## LiteLLM Features We Surface

### Cost Tracking & FinOps
LiteLLM tracks spend per request automatically. We build on this with:
- Dashboard page — real-time cost/request/token charts (see the spend query example below)
- Budget webhook — soft limits trigger Slack/PagerDuty/email alerts, hard limits block requests
- Cost predictor — per-request cost estimates using tiktoken + model pricing tables
- Grafana dashboards — FinOps cost tracking dashboard with trend analysis
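The spend data the Dashboard page renders can also be pulled directly from LiteLLM. A hedged sketch, assuming the proxy's default port and that `/spend/report` accepts a date range (check your LiteLLM version for the exact query parameters):

```bash
# Monthly spend report, read from the same endpoint the Dashboard page uses
curl "http://localhost:4000/spend/report?start_date=2025-01-01&end_date=2025-01-31" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY"
```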
### Guardrails
LiteLLM v1.79+ has built-in content filtering (PII, bias, toxicity). We expose this through:
- Admin UI guardrails page — create named configurations with toggles for each scanner
- Per-team assignment — assign different guardrail configs to different teams (see the example below)
- Event logging — guardrail violations logged with risk scores and actions taken
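As a sketch of the per-team assignment flow, the Admin API exposes the assignment endpoint listed later on this page. The IDs below are placeholders and the HTTP method is an assumption:

```bash
# Assign an existing guardrail config to a team (IDs are placeholders)
curl -X POST "http://localhost:8086/api/v1/guardrails/<guardrail-id>/assign/<team-id>" \
  -H "Authorization: Bearer $TOKEN"
```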
### Observability
LiteLLM emits OTEL traces and Prometheus metrics. We pre-wire:
- OTEL Collector — receives traces from LiteLLM + Agent Gateway + Admin API
- Prometheus — scrapes metrics from all services
- Grafana — pre-built dashboards for platform overview, FinOps, infrastructure
- Jaeger — distributed tracing UI for debugging request flows
### Caching
LiteLLM supports multiple cache types; the platform ships with semantic caching pre-configured:
```yaml
cache: true
cache_params:
  type: "redis-semantic"
  host: "redis"
  port: 6379
  ttl: 3600
  namespace: "litellm"
  similarity_threshold: 0.92
  redis_semantic_cache_embedding_model: "text-embedding-3-small"
```
This caches responses by embedding similarity, so paraphrased prompts return cached results. Caching can be toggled on or off from the Admin UI Settings page (`enable_caching`). See the Semantic Caching Guide for details.
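As a rough illustration (the prompts and the presence of any cache metadata in the response depend on your configuration), two semantically similar requests should produce one upstream call and one cache hit:

```bash
# First request goes to the provider and populates the semantic cache
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-virtual-key" -H "Content-Type: application/json" \
  -d '{"model": "fast", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'

# A paraphrase of the same question; at similarity >= 0.92 it is served from cache
curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-your-virtual-key" -H "Content-Type: application/json" \
  -d '{"model": "fast", "messages": [{"role": "user", "content": "Which city is the capital of France?"}]}'
```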
## Additional LiteLLM Features
LiteLLM has additional native features beyond what we pre-configure. Some we already cover through the Admin API; others are available to enable directly.
### Already Covered by the Admin API
These LiteLLM features have equivalents built into the platform — no additional configuration or Enterprise license needed:
| LiteLLM Feature | Our Equivalent (Admin API) |
|---|---|
| Prompt Studio | Prompt registry with versioning, rendering, and approval workflows (/api/v1/prompts) |
| MCP permission management | MCP server CRUD with connectivity testing and Agent Gateway deployment (/api/v1/mcp-servers) |
| Granular RBAC (Enterprise) | Cedar policies + org member roles + model access tiers with approval workflows (/api/v1/model-access/tiers) |
| SSO (Enterprise) | Full OIDC SSO per organization (/api/v1/organizations/{org_id}/sso) — Okta, Google, Azure AD |
| Per-team guardrails (Enterprise) | Guardrail configs assigned per team (/api/v1/guardrails/{id}/assign/{team_id}) |
| Tag budgets (Enterprise) | Team budgets + cost allocation rules with cost centers (/api/v1/cost-allocation/rules) |
| Audit logs (Enterprise) | Filterable audit logs with CSV/JSON export (/api/v1/audit-logs; see the example after this table) |
| Dynamic rate limiter (Enterprise) | Per-user/team/model rate policies with burst multipliers and pre-flight checks (/api/v1/rate-limits) |
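For example, the audit log equivalent can be queried and exported through the Admin API. The filter and export parameters below are illustrative placeholders, not a documented contract:

```bash
# List recent audit entries (filter parameters are illustrative)
curl "http://localhost:8086/api/v1/audit-logs?action=key.create&limit=50" \
  -H "Authorization: Bearer $TOKEN"

# Export the same view as CSV (the format parameter is an assumption)
curl "http://localhost:8086/api/v1/audit-logs?format=csv" \
  -H "Authorization: Bearer $TOKEN" -o audit-logs.csv
```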
### Available to Enable (LiteLLM native)
These LiteLLM-native features are not yet exposed in the platform but can be enabled with minimal effort:
| Feature | Effort | Value |
|---|---|---|
| Semantic caching (Qdrant) | Config change | Alternative to Redis semantic caching using Qdrant vector DB |
| Slack/Discord alerting | Config change | Real-time alerts for slow responses, error spikes, budget thresholds (see the sketch after this table) |
| Tag-based routing | Config change | Route requests by metadata (production vs dev, priority tiers) |
| Pass-through endpoints | Config change | Direct provider API access with cost tracking |
| Langfuse integration | Add callback + deploy Langfuse | Prompt tracing, evaluation, and analytics |
| Batch API | Enable endpoint | 50% cost reduction for bulk processing |
| Traffic mirroring | Config change | Shadow production traffic to evaluate new models |
| Key rotation | Config + secret manager | Automatic credential rotation |
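As an example of a config-change item, Slack alerting is switched on in LiteLLM's own settings. A minimal sketch, assuming a `SLACK_WEBHOOK_URL` environment variable is available to the proxy; verify the exact keys against your LiteLLM version:

```yaml
general_settings:
  alerting: ["slack"]          # send alerts to the webhook in SLACK_WEBHOOK_URL
  alerting_threshold: 300      # seconds before a request is reported as slow
```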
## Configuration Reference

### Model Routing

From `config/litellm/config.yaml`:
```yaml
router_settings:
  routing_strategy: "usage-based-routing"
  routing_strategy_args:
    ttl: 60
  rpm_limit_check: true
  tpm_limit_check: true
  enable_pre_call_checks: true
```
### Fallback Chains
```yaml
fallbacks:
  - "gpt-5": ["gpt-5.2", "claude-opus-4.5", "grok-4"]
  - "claude-opus-4.5": ["claude-sonnet-4.5", "gpt-5", "grok-4"]
  - "gemini-3-pro": ["gemini-2.5-pro", "claude-sonnet-4.5", "gpt-5"]
```
### Budget Configuration
```yaml
budget_config:
  global_budget:
    soft_budget: 1000.00
    max_budget: 1500.00
    budget_duration: "monthly"
  default_key_config:
    max_budget: 100.00
    budget_duration: "monthly"
    rpm_limit: 100
    tpm_limit: 100000
```
### Observability Callbacks
```yaml
litellm_settings:
  success_callback: ["otel", "prometheus"]
  failure_callback: ["otel", "prometheus"]
  service_callback: ["prometheus"]
```
## Migration from Standalone LiteLLM
If you're already running LiteLLM standalone:
### Step 1: Merge Your Config

The platform's `config/litellm/config.yaml` uses the same format. Add your custom models alongside the 85+ pre-configured ones.
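A minimal sketch of adding a custom model in LiteLLM's standard `model_list` format; the model name, endpoint, and environment variable are placeholders:

```yaml
model_list:
  - model_name: my-internal-model            # name clients will request
    litellm_params:
      model: openai/my-internal-model        # provider/model the proxy actually calls
      api_base: https://llm.internal.example.com/v1
      api_key: os.environ/MY_INTERNAL_API_KEY
```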
### Step 2: Set Environment Variables

Move API keys to `config/.env`:
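For example (only the providers you actually use need keys; the variable names follow LiteLLM's standard provider conventions):

```bash
# config/.env -- provider credentials read by the LiteLLM container
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
XAI_API_KEY=...
```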
### Step 3: Start the Platform
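A minimal start command, assuming the bundled compose file and `config/.env` (append the profiles from Step 4 as needed):

```bash
docker compose --env-file config/.env up -d
```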
Your existing OpenAI-compatible client code doesn't change:
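For instance, any OpenAI-compatible SDK or plain curl only needs to point at the proxy; the URL and key below are illustrative, and most recent OpenAI SDKs honor `OPENAI_BASE_URL` without code changes:

```bash
# Point existing clients at the platform's LiteLLM endpoint
export OPENAI_BASE_URL=http://localhost:4000/v1
export OPENAI_API_KEY=sk-your-litellm-virtual-key

# The same request shape your code already sends keeps working
curl "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-5", "messages": [{"role": "user", "content": "hello"}]}'
```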
### Step 4: Enable Additional Services
```bash
docker compose --profile observability up -d   # Grafana, Prometheus, Jaeger
docker compose --profile finops up -d          # Cost predictor, budget webhook
docker compose --profile workflows up -d       # Temporal, LangGraph
```
## Integrating with an Existing LiteLLM Instance
If you already run LiteLLM in production and want to add the platform's governance, FinOps, and UI features without replacing your existing proxy, you can point the platform at your running instance instead of using the bundled one.
### What Changes
The bundled litellm container is replaced by your existing deployment. All platform services that talk to LiteLLM (Admin API, workflow engine, cost predictor, budget webhook, A2A runtime) are redirected to your instance via environment variables.
### Step 1: Set Environment Variables

In `config/.env`, point to your existing LiteLLM:
```bash
# Your existing LiteLLM instance
LITELLM_URL=https://litellm.internal.example.com:4000
LITELLM_MASTER_KEY=sk-your-existing-master-key
```
### Step 2: Disable the Bundled LiteLLM Service

Create a `docker-compose.override.yaml` in the project root:
```yaml
services:
  litellm:
    profiles: ["disabled"]  # Prevents this service from starting

  admin-api:
    environment:
      LITELLM_URL: ${LITELLM_URL}
      LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      # No litellm dependency: the override replaces the entire depends_on block

  workflow-engine:
    environment:
      LITELLM_URL: ${LITELLM_URL}
      LITELLM_API_KEY: ${LITELLM_MASTER_KEY}
    depends_on:
      postgres:
        condition: service_healthy
      # No litellm dependency here either

  a2a-runtime:
    environment:
      LITELLM_URL: ${LITELLM_URL}

  cost-predictor:
    environment:
      LITELLM_URL: ${LITELLM_URL}

  budget-webhook:
    environment:
      LITELLM_URL: ${LITELLM_URL}
```
### Step 3: Ensure Network Connectivity
Your existing LiteLLM must be reachable from the Docker network. Options:
- Same Docker network: add `external: true` to `gateway-network` and connect your LiteLLM container to it (see the sketch below).
- Host network: use `host.docker.internal` (macOS/Windows) or `172.17.0.1` (Linux) if LiteLLM runs on the host.
- Remote: use the full URL (e.g., `https://litellm.internal.example.com:4000`) and ensure the Docker containers can reach it.
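A sketch of the same-network option, added to the `docker-compose.override.yaml` from Step 2; the network name matches the platform's default, so adjust it if your setup differs:

```yaml
# Same-network option: treat the platform network as pre-existing, then attach
# your LiteLLM container with: docker network connect gateway-network <your-litellm-container>
networks:
  gateway-network:
    external: true
```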
### Step 4: Verify
```bash
docker compose --env-file config/.env up -d

# Confirm the Admin API can reach your LiteLLM
curl http://localhost:8086/health
# Should return {"status": "ok"}

# Confirm models are visible through the Admin API
TOKEN=$(curl -s http://localhost:8086/auth/login \
  -d '{"api_key":"sk-your-existing-master-key"}' \
  -H 'Content-Type: application/json' | jq -r .access_token)

curl http://localhost:8086/api/v1/models \
  -H "Authorization: Bearer $TOKEN" | jq length
```
### What Works
All platform features work with an external LiteLLM instance:
| Feature | Status | Notes |
|---|---|---|
| Admin UI dashboard | Works | Reads spend/metrics from LiteLLM's API |
| Model/key/team management | Works | Proxies to your LiteLLM's /model/*, /key/*, /team/* |
| Guardrails | Works | Configured via Admin API, applied via LiteLLM's guardrail hooks |
| FinOps (cost prediction, budgets) | Works | Cost predictor calls your LiteLLM for model info |
| Workflows | Works | Workflow engine sends LLM calls to your LiteLLM |
| Semantic caching | Depends | Uses your LiteLLM's cache config — ensure redis-semantic is configured |
| Observability | Partial | Platform dashboards work if your LiteLLM emits Prometheus metrics to the same endpoint |
### What Does Not Work
- Config file management: the platform cannot edit your LiteLLM's `config.yaml`. Model definitions, fallback chains, and router settings must be managed in your existing config.
- Health-gated startup: the bundled setup waits for LiteLLM to be healthy before starting the Admin API. With an external instance, services start immediately, so make sure your LiteLLM is already running.
## Related Guides
- Comparison — how the platform compares to alternatives
- Agent Gateway Deep Dive — Agent Gateway integration
- Cost Management — budgets and FinOps