AI Control Plane Demo Runbook

Quick Status Check

# Check all services
docker compose --profile full ps

# Expected: 22 services, all healthy (the full stack is started with --profile full)
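When some services are still starting, it helps to list only the stragglers. A minimal sketch, assuming a Compose v2 release whose `ps --format` template exposes `.Service` and `.Health`, and that every service defines a healthcheck (a service without one reports an empty health value and is listed too); `not_healthy` is a hypothetical helper name.

```shell
# not_healthy: reads "<service> <health>" lines on stdin and prints every
# service whose health column is not "healthy" (including an empty column,
# which is what a service without a healthcheck reports).
not_healthy() {
  awk '$2 != "healthy" { print $1 }'
}

# Usage:
#   docker compose --profile full ps --format '{{.Service}} {{.Health}}' | not_healthy
```

Empty output means every listed service reports healthy.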

Service Endpoints

Service	URL	Credentials
Landing Page	http://localhost:9999	-
Admin UI	http://localhost:5173	API Key: `$LITELLM_KEY`
Admin API (Swagger)	http://localhost:8086/docs	-
LiteLLM API	http://localhost:4000	`$LITELLM_KEY`
LiteLLM UI	http://localhost:4000/ui	`$LITELLM_KEY`
Playground	http://localhost:6001	-
Deck	http://localhost:6002	-
Docs Site	http://localhost:8089	-
Agent Gateway	http://localhost:9000	-
Grafana	http://localhost:3030	admin / admin@123
Prometheus	http://localhost:9090	-
Jaeger	http://localhost:16686	-
Vault	http://localhost:8200	Token: `root-token-for-dev`
Temporal UI	http://localhost:8088	-
Cost Predictor	http://localhost:8080	-
Budget Webhook	http://localhost:8081	-
Workflow Engine	http://localhost:8085	-
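To confirm that the endpoints in the table answer at all, a quick reachability probe can loop over their URLs. A sketch assuming `curl` is installed; `probe_endpoints` is a hypothetical helper, and any HTTP status (even 401 or 404 from an API without a root page) is treated as reachable, while `000` means no connection.

```shell
# probe_endpoints: reads "<url> <name...>" lines on stdin and prints
# "<name> <http_code> UP|DOWN" for each. curl reports 000 when it cannot
# connect; any other status means something answered on that port.
probe_endpoints() {
  local url name code
  while read -r url name; do
    [ -n "$url" ] || continue
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$url" || true)
    code=${code:-000}
    if [ "$code" = "000" ]; then
      echo "$name $code DOWN"
    else
      echo "$name $code UP"
    fi
  done
}

# Usage:
#   printf '%s\n' 'http://localhost:4000 LiteLLM API' \
#     'http://localhost:8086/docs Admin API' \
#     'http://localhost:3030 Grafana' | probe_endpoints
```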

Admin UI Features (http://localhost:5173)

Login: use the API key stored in $LITELLM_KEY

What you can do:

Feature	Description
Dashboard	Real-time metrics, cost charts, provider health
Models	View/configure model routing, access tiers, deprecation badges
API Keys	Generate and manage keys with budgets and model restrictions
Teams	Manage teams, members, and content policies
Budgets	Set spending limits per user/team/global
Organizations	Org hierarchy, business units, SSO/OIDC config, members
Audit Log	Filterable trail of every config change with export
Prompts	Versioned prompt templates with approval workflows
Rate Limits	Granular rate limiting per user/team/model
Model Access	Access tiers (standard/premium/experimental) with approvals
Chargeback	Cost allocation rules, reports, budget forecasts
SLA Monitor	Provider health, latency tracking, SLA violations, failover
A/B Tests	Model comparison with traffic splitting and metrics
Events	Event subscriptions (Slack, PagerDuty, email, webhooks)
MCP Servers	Configure MCP tool servers with deploy-to-gateway
A2A Agents	Agent-to-Agent protocol management
Guardrails	Content filtering, DLP detectors, safety rules
Workflows	LangGraph workflow templates and execution history
Settings	Platform-wide toggles, caching, maintenance mode

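Admin API Endpoints (http://localhost:8086/docs)

Selected endpoints; the Swagger UI lists the full API, including the routing-policy, business-unit, and test-event endpoints used in the scenarios below.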

POST   /auth/login                          - Login with API key
GET    /auth/me                             - Get current user info
GET    /api/v1/models                       - List models
PUT    /api/v1/models/{id}                  - Update model config
GET    /api/v1/teams                        - List teams
POST   /api/v1/teams                        - Create team
GET    /api/v1/budgets                      - List budgets
POST   /api/v1/budgets                      - Create budget
GET    /api/v1/organizations                - List organizations
POST   /api/v1/organizations                - Create organization
GET    /api/v1/audit-logs                   - List audit events
GET    /api/v1/prompts                      - List prompt templates
POST   /api/v1/prompts                      - Create prompt template
GET    /api/v1/rate-limits                  - List rate limit policies
POST   /api/v1/rate-limits                  - Create rate limit policy
GET    /api/v1/model-access/tiers           - List access tiers
POST   /api/v1/model-access/requests        - Submit access request
GET    /api/v1/cost-allocation/rules        - List allocation rules
POST   /api/v1/chargeback/reports/generate  - Generate chargeback report
GET    /api/v1/sla/definitions              - List SLA definitions
GET    /api/v1/sla/health                   - Provider health status
GET    /api/v1/sla/violations               - List SLA violations
GET    /api/v1/ab-tests                     - List A/B tests
POST   /api/v1/ab-tests                     - Create A/B test
GET    /api/v1/events/subscriptions         - List event subscriptions
POST   /api/v1/events/subscriptions         - Create event subscription
GET    /api/v1/detectors                    - List DLP detectors
GET    /api/v1/cache/stats                  - Semantic cache stats
GET    /api/v1/guardrails                   - List guardrails
GET    /api/v1/mcp-servers                  - List MCP servers
GET    /api/v1/agents                       - List A2A agents
GET    /api/v1/workflows                    - List workflows
GET    /api/v1/deprecations                 - List model deprecations
GET    /api/v1/settings                     - Platform settings
GET    /api/v1/metrics/realtime             - Real-time metrics
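Most Admin API calls share the same base URL and bearer token, so a small wrapper keeps them short. A sketch, not part of the platform: `admin_api` and the `ADMIN_URL` variable are hypothetical, and `TOKEN` is assumed to hold the `access_token` returned by `POST /auth/login`.

```shell
# admin_api METHOD PATH [JSON_BODY]
# Calls the Admin API at $ADMIN_URL (default http://localhost:8086) with the
# bearer token in $TOKEN. -f makes curl exit non-zero on HTTP 4xx/5xx, so
# failures can be detected with `if` or `||`.
admin_api() {
  local method=$1 path=$2 body=${3:-}
  local url="${ADMIN_URL:-http://localhost:8086}$path"
  if [ -n "$body" ]; then
    curl -sf -X "$method" "$url" \
      -H "Authorization: Bearer ${TOKEN:-}" \
      -H 'Content-Type: application/json' \
      -d "$body"
  else
    curl -sf -X "$method" "$url" -H "Authorization: Bearer ${TOKEN:-}"
  fi
}

# Usage:
#   admin_api GET /api/v1/models | jq
#   admin_api POST /api/v1/organizations '{"name": "Demo Corp", "slug": "demo-corp", "max_budget": 1000}' | jq
```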

Grafana Dashboards (http://localhost:3030)

Login: admin / admin@123

Dashboard	URL	Data Source
AI Control Plane Overview	/d/ai-gateway-overview	Prometheus
FinOps Cost Tracking	/d/finops-cost-tracking	PostgreSQL

AI Control Plane Overview shows:

Total Requests, Error Rate, P95 Latency
Total Spend, Total Tokens
Request Rate by Model (time series)
Latency Percentiles by Model
Request/Spend/Token Distribution (pie charts)

FinOps Cost Tracking shows:

Today's/Week's/Total Spend
Token Usage (Input/Output)
Spend by Model (pie chart)
Requests by Model (pie chart)
Tokens by Model (pie chart)
Recent Requests Log (table)

Demo Scenario 1: Basic AI Chat

Tests: LiteLLM proxy, model routing

# 1. List available models
curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITELLM_KEY" | jq '.data[].id'

# 2. Chat completion (OpenAI format)
curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, what is 2+2?"}]
  }' | jq '.choices[0].message'

# 3. Try a Claude model (the model name must match an ID listed in step 1)
curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "claude-3-5-sonnet",
    "messages": [{"role": "user", "content": "Explain quantum computing in one sentence"}]
  }' | jq '.choices[0].message'
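Before a demo, the same request can be sent to every model you plan to show, to check that each route actually works. A sketch assuming the OpenAI-format endpoint used above; `smoke_models` and `LITELLM_URL` are hypothetical names, and `max_tokens` is kept tiny so the check costs almost nothing.

```shell
# smoke_models MODEL...: sends a minimal chat completion to each model and
# prints "<model> <http_code>"; 200 means the route works, 000 means the
# proxy could not be reached. LITELLM_URL defaults to http://localhost:4000.
smoke_models() {
  local model body code
  for model in "$@"; do
    body=$(printf '{"model":"%s","messages":[{"role":"user","content":"ping"}],"max_tokens":5}' "$model")
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 60 \
      "${LITELLM_URL:-http://localhost:4000}/v1/chat/completions" \
      -H 'Content-Type: application/json' \
      -H "Authorization: Bearer ${LITELLM_KEY:-}" \
      -d "$body" || true)
    echo "$model ${code:-000}"
  done
}

# Usage (model names must match IDs returned by /v1/models):
#   smoke_models gpt-4o-mini claude-3-5-sonnet
```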

Demo Scenario 2: Cost Prediction

Tests: Cost predictor service, budget awareness

# 1. Predict cost for a request
curl -s http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a 500 word essay about AI"}],
    "max_tokens": 1000
  }' | jq

# 2. Check budget status
curl -s http://localhost:8081/budgets | jq
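To sanity-check a prediction by hand, per-token pricing gives cost = (input_tokens * input_price + output_tokens * output_price) / 1,000,000 when prices are quoted per million tokens. A sketch under that assumption; the prices are parameters you supply from your provider's price list, not values taken from the cost predictor, and `estimate_cost` is a hypothetical helper.

```shell
# estimate_cost IN_TOKENS OUT_TOKENS IN_PRICE_PER_1M OUT_PRICE_PER_1M
# Prints the estimated cost in dollars with six decimals; prices are
# per million tokens.
estimate_cost() {
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.6f\n", (i * pi + o * po) / 1000000 }'
}

# Example with placeholder prices of $2.50 (input) and $10.00 (output) per 1M tokens:
#   estimate_cost 1000 1000 2.50 10.00   # prints 0.012500
```

For the request above, `max_tokens: 1000` caps the output, so plugging in 1000 output tokens gives a worst-case figure to compare with the predictor's response.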

Demo Scenario 3: Policy-Based Routing

Tests: Routing policies via Admin API

# 1. Log in to the Admin API (double quotes let the shell expand $LITELLM_KEY)
TOKEN=$(curl -s http://localhost:8086/auth/login \
  -H 'Content-Type: application/json' \
  -d "{\"api_key\":\"$LITELLM_KEY\"}" | jq -r .access_token)

# 2. List routing policies
curl -s http://localhost:8086/api/v1/routing-policies \
  -H "Authorization: Bearer $TOKEN" | jq

# 3. Create a routing policy (restrict to cost-effective models)
curl -s -X POST http://localhost:8086/api/v1/routing-policies \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "name": "Budget routing demo",
    "description": "Restrict to cost-effective models",
    "priority": 10,
    "condition": "team == demo",
    "action": "permit",
    "target_models": ["gpt-5-mini", "claude-haiku-4.5", "gemini-2.5-flash-lite"]
  }' | jq
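The login call must send the key's value, not the literal text `$LITELLM_KEY`; inside single quotes the shell does not expand variables. A reusable login helper, sketched under the assumption that the key contains no characters needing JSON escaping (double quotes, backslashes); `login_payload`, `admin_login`, and `ADMIN_URL` are hypothetical names.

```shell
# login_payload KEY: prints the JSON body for POST /auth/login.
# printf substitutes the value, so no shell-quoting tricks are needed.
login_payload() {
  printf '{"api_key":"%s"}' "$1"
}

# admin_login: exchanges $LITELLM_KEY for a JWT and prints access_token.
admin_login() {
  curl -s "${ADMIN_URL:-http://localhost:8086}/auth/login" \
    -H 'Content-Type: application/json' \
    -d "$(login_payload "${LITELLM_KEY:?set LITELLM_KEY first}")" |
    jq -r .access_token
}

# Usage:
#   TOKEN=$(admin_login)
```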

Demo Scenario 4: Workflow Engine

Tests: LangGraph workflows, templates

# 1. Check workflow engine health
curl -s http://localhost:8085/health | jq

# 2. List available workflow templates
curl -s http://localhost:8085/api/v1/templates | jq

# 3. Start a research workflow
curl -s http://localhost:8085/api/v1/executions \
  -H "Content-Type: application/json" \
  -d '{
    "template": "research",
    "input": {
      "query": "What are the latest trends in AI agents?"
    }
  }' | jq

# 4. Check execution status (set EXEC_ID to the execution ID returned in step 3)
curl -s "http://localhost:8085/api/v1/executions/$EXEC_ID" | jq
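Workflows run asynchronously, so rather than re-running the status check by hand, a polling loop can wait for a final state. A sketch only: the top-level `status` field and the terminal values `completed` and `failed` are assumptions about the engine's response schema; `wait_for_execution` and `WORKFLOW_URL` are hypothetical names. Requires `jq`.

```shell
# wait_for_execution ID [MAX_TRIES] [INTERVAL_SECONDS]
# Polls GET /api/v1/executions/ID and prints the status once it is terminal.
# ASSUMPTION: the response has a top-level "status" field whose terminal
# values are "completed" and "failed"; adjust to the engine's real schema.
# Returns 1 if no terminal status is seen within MAX_TRIES polls.
wait_for_execution() {
  local id=$1 tries=${2:-30} interval=${3:-2} i=0 status
  while [ "$i" -lt "$tries" ]; do
    status=$(curl -s --max-time 5 \
      "${WORKFLOW_URL:-http://localhost:8085}/api/v1/executions/$id" |
      jq -r '.status // empty' 2>/dev/null || true)
    case "$status" in
      completed|failed) echo "$status"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage:
#   wait_for_execution <execution-id> 60 5
```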

Demo Scenario 5: Observability Stack

Grafana (http://localhost:3030)

Login: admin / admin@123
Dashboards: AI Control Plane → AI Control Plane Overview, FinOps Cost Tracking

Prometheus (http://localhost:9090)

# Example queries (enter in the Prometheus expression browser):
litellm_requests_metric_total
litellm_spend_metric_total
litellm_llm_api_latency_metric_bucket
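Raw `_bucket` series are histogram buckets; a latency percentile comes from `histogram_quantile` over their per-second rate. The same queries can also be run outside the UI through the Prometheus HTTP API (`/api/v1/query`). A sketch; `p95_latency_query`, `prom_query`, and `PROM_URL` are hypothetical helper names.

```shell
# p95_latency_query WINDOW: PromQL for the 95th-percentile LiteLLM API latency
# across all models, from the histogram bucket rates over WINDOW (e.g. 5m).
p95_latency_query() {
  printf 'histogram_quantile(0.95, sum by (le) (rate(litellm_llm_api_latency_metric_bucket[%s])))' "$1"
}

# prom_query EXPR: instant query via the Prometheus HTTP API. -G turns the
# URL-encoded data into a GET query string.
prom_query() {
  curl -s -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Usage:
#   prom_query "$(p95_latency_query 5m)" | jq '.data.result'
#   prom_query 'sum(litellm_spend_metric_total)' | jq '.data.result'
```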

Jaeger (http://localhost:16686)

Search for traces by service
View request flow across services
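Trace searches can also be scripted against the JSON endpoints the Jaeger UI itself calls (`/api/services` and `/api/traces`); these are internal to the query service rather than a stable public API, so treat this as a convenience sketch. `jaeger_traces_url` and `JAEGER_URL` are hypothetical, and the service name must be one returned by `/api/services`.

```shell
# jaeger_traces_url SERVICE [LIMIT]: URL that searches recent traces for one
# service (LIMIT defaults to 20). Assumes SERVICE needs no URL encoding.
jaeger_traces_url() {
  printf '%s/api/traces?service=%s&limit=%s' \
    "${JAEGER_URL:-http://localhost:16686}" "$1" "${2:-20}"
}

# Usage:
#   curl -s http://localhost:16686/api/services | jq '.data'
#   curl -s "$(jaeger_traces_url <service-name> 5)" | jq '.data | length'
```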

Demo Scenario 6: Vault Secrets

Tests: Secrets management, access control

# 1. Check Vault status (this endpoint needs no token)
curl -s http://localhost:8200/v1/sys/health | jq

# 2. List secrets (requires the vault CLI and a token)
export VAULT_ADDR=http://localhost:8200
export VAULT_TOKEN=root-token-for-dev

vault kv list secret/ai-gateway/dev/

# 3. Read a secret
vault kv get secret/ai-gateway/dev/providers/openai
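Services that read secrets over HTTP cannot use `vault kv`, and the KV version 2 engine stores data under an extra `data/` path segment that the CLI adds for you. A sketch assuming `secret/` is a KV v2 mount (the default in Vault's dev server); `vault_kv2_url` and `vault_read` are hypothetical helpers.

```shell
# vault_kv2_url MOUNT PATH: HTTP API URL for a KV v2 secret, i.e.
# $VAULT_ADDR/v1/MOUNT/data/PATH.
vault_kv2_url() {
  printf '%s/v1/%s/data/%s' "${VAULT_ADDR:-http://localhost:8200}" "$1" "$2"
}

# vault_read MOUNT PATH: fetch the secret with the token in $VAULT_TOKEN.
# The key/value pairs are under .data.data in the JSON response.
vault_read() {
  curl -s -H "X-Vault-Token: ${VAULT_TOKEN:?set VAULT_TOKEN first}" \
    "$(vault_kv2_url "$1" "$2")"
}

# Usage (prints only the key names, not the secret values):
#   vault_read secret ai-gateway/dev/providers/openai | jq '.data.data | keys'
```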

Demo Scenario 7: Enterprise Organization Setup

Tests: Multi-tenancy, SSO, team hierarchy

# 1. Log in to the Admin API (double quotes let the shell expand $LITELLM_KEY)
TOKEN=$(curl -s http://localhost:8086/auth/login \
  -H 'Content-Type: application/json' \
  -d "{\"api_key\":\"$LITELLM_KEY\"}" | jq -r .access_token)

# 2. Create organization
curl -s -X POST http://localhost:8086/api/v1/organizations \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Demo Corp", "slug": "demo-corp", "max_budget": 1000}' | jq

# 3. Look up the new org's ID by slug, then create a business unit under it
ORG_ID=$(curl -s http://localhost:8086/api/v1/organizations \
  -H "Authorization: Bearer $TOKEN" | jq -r '.[] | select(.slug == "demo-corp") | .id')

curl -s -X POST "http://localhost:8086/api/v1/organizations/$ORG_ID/business-units" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Engineering", "slug": "engineering", "max_budget": 500}' | jq
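The organization and business-unit bodies share the same `name`/`slug`/`max_budget` shape, so a small payload builder avoids hand-editing JSON when creating several units. A sketch; `unit_payload` and `create_business_unit` are hypothetical, and names are assumed to need no JSON escaping.

```shell
# unit_payload NAME SLUG MAX_BUDGET: JSON body for the organization and
# business-unit create calls; MAX_BUDGET is emitted as a JSON number.
unit_payload() {
  printf '{"name":"%s","slug":"%s","max_budget":%s}' "$1" "$2" "$3"
}

# create_business_unit ORG_ID NAME SLUG MAX_BUDGET
create_business_unit() {
  curl -s -X POST "${ADMIN_URL:-http://localhost:8086}/api/v1/organizations/$1/business-units" \
    -H "Authorization: Bearer ${TOKEN:?log in first}" \
    -H 'Content-Type: application/json' \
    -d "$(unit_payload "$2" "$3" "$4")"
}

# Usage:
#   create_business_unit "$ORG_ID" Engineering engineering 500 | jq
```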

Demo Scenario 8: Audit & Compliance

Tests: Audit logging, change tracking

# Uses $TOKEN from the Admin API login in Scenario 3 or 7 (Scenarios 9 and 10 reuse it)

# View recent audit events
curl -s "http://localhost:8086/api/v1/audit-logs?limit=10" \
  -H "Authorization: Bearer $TOKEN" | jq

# Filter by resource type
curl -s "http://localhost:8086/api/v1/audit-logs?resource_type=organization" \
  -H "Authorization: Bearer $TOKEN" | jq
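To combine filters in one request, for example a resource type plus a result limit, curl can build and URL-encode the query string itself with `-G` and `--data-urlencode`. Whether the endpoint honors several filters together is an assumption; `audit_logs` is a hypothetical helper.

```shell
# audit_logs [KEY=VALUE ...]: GET /api/v1/audit-logs with each KEY=VALUE
# URL-encoded into the query string (curl -G appends the --data-urlencode
# pairs to the URL instead of sending a body).
audit_logs() {
  local n=$# kv
  # Append "--data-urlencode KEY=VALUE" for each argument, then drop the
  # original arguments so only the curl options remain in "$@".
  for kv in "$@"; do
    set -- "$@" --data-urlencode "$kv"
  done
  shift "$n"
  curl -s -G "${ADMIN_URL:-http://localhost:8086}/api/v1/audit-logs" \
    -H "Authorization: Bearer ${TOKEN:?log in first}" "$@"
}

# Usage:
#   audit_logs resource_type=organization limit=10 | jq
```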

Demo Scenario 9: Chargeback & SLA

Tests: Cost allocation, provider health

# 1. Create cost allocation rule
curl -s -X POST http://localhost:8086/api/v1/cost-allocation/rules \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Eng to CC-1234", "allocation_type": "cost_center", "allocation_target": "CC-1234"}' | jq

# 2. Check provider health
curl -s http://localhost:8086/api/v1/sla/health \
  -H "Authorization: Bearer $TOKEN" | jq

# 3. Create SLA definition
curl -s -X POST http://localhost:8086/api/v1/sla/definitions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "GPT-4o SLA", "provider": "openai", "model_pattern": "gpt-4o*", "target_p95_ms": 3000, "target_error_rate": 0.01}' | jq
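The two targets in the SLA definition read as thresholds: 95th-percentile latency at or below `target_p95_ms`, and the fraction of failed requests at or below `target_error_rate` (0.01 = 1%). A sketch of that check, assuming a violation means either measurement strictly exceeds its target; how the SLA monitor actually evaluates measurement windows is not specified here, and `sla_check` is hypothetical.

```shell
# sla_check P95_MS ERROR_RATE TARGET_P95_MS TARGET_ERROR_RATE
# Prints OK when both measurements are within target, otherwise VIOLATION
# followed by the metric(s) that exceeded their target.
sla_check() {
  awk -v p="$1" -v e="$2" -v tp="$3" -v te="$4" 'BEGIN {
    missed = ""
    if (p + 0 > tp + 0) missed = missed " p95"
    if (e + 0 > te + 0) missed = missed " error_rate"
    print (missed == "" ? "OK" : ("VIOLATION" missed))
  }'
}

# Against the GPT-4o SLA above (3000 ms, 1%):
#   sla_check 2800 0.004 3000 0.01   # prints OK
#   sla_check 3400 0.004 3000 0.01   # prints VIOLATION p95
```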

Demo Scenario 10: A/B Testing & Events

Tests: Model comparison, event system

# 1. Create A/B test
curl -s -X POST http://localhost:8086/api/v1/ab-tests \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "GPT vs Claude", "base_model": "gpt-4o", "variant_model": "claude-sonnet-4.5", "traffic_split_percent": 20}' | jq

# 2. Create event subscription
curl -s -X POST http://localhost:8086/api/v1/events/subscriptions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "Budget Alerts", "event_types": ["budget.exceeded", "sla.violation"], "channel": "webhook", "config": {"url": "https://httpbin.org/post"}}' | jq

# 3. Send test event
curl -s -X POST http://localhost:8086/api/v1/events/test \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"event_type": "budget.exceeded", "payload": {"team": "engineering", "amount": 500}}' | jq
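With `traffic_split_percent: 20`, presumably about 20% of traffic goes to `variant_model`. How the gateway actually picks requests is not documented here; a common technique is deterministic hash bucketing, which keeps each user on the same arm. A sketch of that technique only; `ab_bucket` is hypothetical and uses POSIX `cksum` as the hash.

```shell
# ab_bucket KEY SPLIT_PERCENT: maps KEY (e.g. a user or team ID) to a bucket
# 0-99 via its CRC checksum and prints "variant" when the bucket is below
# SPLIT_PERCENT, otherwise "base". The same KEY always gets the same arm.
ab_bucket() {
  local h
  h=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  if [ $((h % 100)) -lt "$2" ]; then
    echo variant
  else
    echo base
  fi
}

# Usage:
#   for user in alice bob carol dave; do echo "$user $(ab_bucket "$user" 20)"; done
```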

Troubleshooting

Service not healthy

# Check logs
docker compose logs --tail 100 <service-name>

# Restart specific service
docker compose restart <service-name>

Database connection issues

# Check postgres
docker compose exec postgres psql -U litellm -c "SELECT 1"

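Reset everything

# Warning: -v also deletes the stack's volumes, so all persisted demo data is lost
docker compose --profile full down -v
docker compose --profile full up -d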
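After a reset, services take a while to become ready, and running demo commands too early produces misleading errors. A small retry loop, sketched as a hypothetical `retry` helper, can block until a readiness command succeeds.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0; gives up with status 1 after MAX_TRIES
# failed attempts, sleeping DELAY_SECONDS between attempts.
retry() {
  local tries=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$tries" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (-T disables the TTY so exec works from a script):
#   retry 30 2 docker compose exec -T postgres psql -U litellm -c "SELECT 1"
#   retry 30 2 curl -sf -o /dev/null http://localhost:4000/v1/models \
#     -H "Authorization: Bearer $LITELLM_KEY"
```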

Demo Readiness Checklist

[ ] All services running (docker compose ps)
[ ] Vault initialized with secrets
[ ] LiteLLM responding to /v1/models
[ ] Admin UI accessible at :5173
[ ] Grafana dashboards showing data at :3030
[ ] At least one chat completion working
[ ] Organizations page shows at least one org
[ ] Audit log has entries from setup operations
[ ] Chargeback allocation rules configured
[ ] SLA definitions created
[ ] Event subscriptions active
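Several checklist items can be verified automatically. A sketch with a hypothetical `check` helper that ticks an item when its command succeeds; items that need judgment (dashboards showing data, orgs and audit entries present) still need a look in the UI.

```shell
# check DESCRIPTION COMMAND [ARGS...]: runs COMMAND with its output hidden and
# prints the checklist line as "[x] ..." on success or "[ ] ..." on failure
# (returning 1 in that case so failures can be counted).
check() {
  local desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "[x] $desc"
  else
    echo "[ ] $desc"
    return 1
  fi
}

# Usage:
#   check 'LiteLLM responding to /v1/models' \
#     curl -sf http://localhost:4000/v1/models -H "Authorization: Bearer $LITELLM_KEY"
#   check 'Admin UI accessible at :5173' curl -sf -o /dev/null http://localhost:5173
#   check 'Grafana reachable at :3030' curl -sf http://localhost:3030/api/health
```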