Skip to content

FinOps KPI Definitions#

Document Information#

Field Value
Version 1.0
Last Updated 2026-02
Owner FinOps Team
Review Cycle Monthly

Overview#

This document defines the Key Performance Indicators (KPIs) for AI Control Plane Platform cost management. These metrics enable teams to understand, optimize, and govern LLM spending.


1. Cost Efficiency KPIs#

1.1 Cost Per Request (CPR)#

Definition: Average cost incurred per API request to the gateway.

CPR = Total Spend / Total Requests

Prometheus Query:

sum(increase(litellm_spend_total[24h])) / sum(increase(litellm_requests_total[24h]))

Target Good Warning Critical
< $0.01 < $0.005 $0.01-0.05 > $0.05

Use Cases: - Identify expensive usage patterns - Compare efficiency across teams - Track optimization improvements


1.2 Cost Per 1K Tokens (CPT)#

Definition: Average cost per 1,000 tokens processed (input + output combined).

CPT = Total Spend / (Total Tokens / 1000)

Prometheus Query:

sum(increase(litellm_spend_total[24h])) / (sum(increase(litellm_tokens_total[24h])) / 1000)

Model Tier Expected CPT
Budget (Haiku, GPT-4o-mini) $0.0002 - $0.001
Standard (Sonnet, GPT-4o) $0.002 - $0.01
Premium (Opus, GPT-4-turbo) $0.01 - $0.03
Self-hosted (Llama) $0.00005 - $0.0003

Use Cases: - Model cost comparison - Vendor negotiation benchmarks - Self-hosted vs API cost analysis


1.3 Output/Input Token Ratio#

Definition: Ratio of output tokens to input tokens, indicating response verbosity.

Token Ratio = Output Tokens / Input Tokens

Prometheus Query:

sum(increase(litellm_tokens_total{type="output"}[24h]))
/ sum(increase(litellm_tokens_total{type="input"}[24h]))

Use Case Expected Ratio
Chat/Q&A 0.5 - 2.0
Code Generation 2.0 - 5.0
Summarization 0.1 - 0.3
Translation 0.8 - 1.2

Use Cases: - Identify verbose prompts/responses - Optimize prompt engineering - Detect potential misuse


1.4 Cache Hit Rate#

Definition: Percentage of requests served from cache vs. calling the LLM provider.

Cache Hit Rate = Cached Requests / Total Requests Γ— 100%

Prometheus Query:

sum(rate(litellm_cache_hits_total[1h])) / sum(rate(litellm_requests_total[1h]))

Target Good Needs Improvement
> 20% > 30% < 10%

Use Cases: - Validate caching effectiveness - Identify cacheable workloads - Reduce redundant API calls


2. Budget Management KPIs#

2.1 Budget Utilization Rate#

Definition: Percentage of allocated budget consumed within the period.

Budget Utilization = Current Spend / Budget Limit Γ— 100%

Prometheus Query:

# By team
sum(litellm_team_spend_total) by (team) / sum(litellm_team_budget_total) by (team)

# Global
sum(litellm_spend_total) / sum(litellm_global_budget_total)

Status Utilization
🟒 Healthy < 70%
🟑 Warning 70-90%
πŸ”΄ Critical > 90%

Alerting Thresholds: - Warning at 80% - Critical at 95%


2.2 Budget Burn Rate#

Definition: Rate at which budget is being consumed, used to project exhaustion date.

Burn Rate = Current Period Spend / Days Elapsed
Projected Exhaustion = Budget Remaining / Burn Rate

Prometheus Query:

# Daily burn rate
sum(increase(litellm_spend_total[24h]))

# Days until budget exhaustion
(sum(litellm_team_budget_total) - sum(litellm_team_spend_total))
/ sum(increase(litellm_spend_total[24h]))

Projection Action
> 30 days remaining Normal
15-30 days Review usage
< 15 days Urgent optimization needed

2.3 Budget Variance#

Definition: Difference between planned (forecasted) and actual spend.

Variance = (Actual Spend - Forecasted Spend) / Forecasted Spend Γ— 100%
Variance Assessment
Β±10% On track
+10-25% Over budget
+25%+ Significant overrun
-25%+ Under-utilized

Use Cases: - Monthly/quarterly financial reporting - Forecast accuracy improvement - Resource allocation decisions


3. Usage Distribution KPIs#

3.1 Model Mix Distribution#

Definition: Percentage breakdown of requests/spend by model.

Prometheus Query:

# Spend distribution
sum(increase(litellm_spend_total[24h])) by (model)
/ sum(increase(litellm_spend_total[24h]))

# Request distribution
sum(increase(litellm_requests_total[24h])) by (model)
/ sum(increase(litellm_requests_total[24h]))

Healthy Distribution Example: | Model Tier | Request % | Spend % | |------------|-----------|---------| | Budget | 60-70% | 10-20% | | Standard | 25-35% | 40-50% | | Premium | 5-10% | 30-40% |


3.2 Team Cost Allocation#

Definition: Cost distribution across teams/departments.

Prometheus Query:

sum(increase(litellm_spend_total[30d])) by (team)

Reporting Format: | Team | Monthly Spend | % of Total | Budget | Utilization | |------|---------------|------------|--------|-------------| | Engineering | $X,XXX | XX% | $X,XXX | XX% | | Data Science | $X,XXX | XX% | $X,XXX | XX% | | Product | $X,XXX | XX% | $X,XXX | XX% |


3.3 Provider Distribution#

Definition: Spend breakdown by LLM provider (OpenAI, Anthropic, XAI, Self-hosted).

Prometheus Query:

sum(increase(litellm_spend_total[24h])) by (provider)
/ sum(increase(litellm_spend_total[24h]))

Strategic Targets: | Provider | Target % | Rationale | |----------|----------|-----------| | Self-hosted | 40-60% | Cost control, data privacy | | OpenAI | 20-30% | Specific capabilities | | Anthropic | 15-25% | Quality, safety | | XAI | 5-10% | Diversity, fallback |


4. Operational Efficiency KPIs#

4.1 Request Success Rate#

Definition: Percentage of requests that complete successfully (non-error).

Success Rate = Successful Requests / Total Requests Γ— 100%

Prometheus Query:

sum(rate(litellm_requests_total{status="success"}[5m]))
/ sum(rate(litellm_requests_total[5m]))

SLO Target Acceptable Degraded
99.9% > 99.5% < 99%

4.2 Retry Rate#

Definition: Percentage of requests that required retries.

Prometheus Query:

sum(rate(litellm_retries_total[1h])) / sum(rate(litellm_requests_total[1h]))

Status Retry Rate
🟒 Healthy < 5%
🟑 Elevated 5-15%
πŸ”΄ High > 15%

Cost Impact: Each retry adds ~100% cost for that request.


4.3 Fallback Rate#

Definition: Percentage of requests routed to fallback models.

Prometheus Query:

sum(rate(litellm_fallback_requests_total[1h])) / sum(rate(litellm_requests_total[1h]))

Healthy Warning
< 2% > 5%

Use Cases: - Provider reliability assessment - Capacity planning - Cost impact analysis (fallbacks may cost more/less)


5. Unit Economics KPIs#

5.1 Cost Per User#

Definition: Average cost per active user per period.

Cost Per User = Total Spend / Active Users

Prometheus Query:

sum(increase(litellm_spend_total[30d])) / count(count by (user) (litellm_requests_total))

Benchmarks by User Type: | User Type | Monthly Cost | |-----------|--------------| | Light (< 100 req/day) | $5-20 | | Moderate (100-1000 req/day) | $20-100 | | Heavy (> 1000 req/day) | $100-500 | | Power (automated/batch) | $500+ |


5.2 Self-Hosted Cost Ratio#

Definition: Cost savings achieved by using self-hosted models vs. equivalent API calls.

Savings Ratio = 1 - (Self-hosted Cost / Equivalent API Cost)

Calculation Components: - Infrastructure cost (GPU, compute, storage) - Operational cost (team time, maintenance) - Equivalent API cost at same token volume

Target: > 50% savings at scale


5.3 Cost Per Business Outcome#

Definition: Cost attributed to specific business outcomes (varies by use case).

Use Case Metric Target CPO
Customer Support Cost per ticket resolved < $0.50
Code Generation Cost per PR assisted < $2.00
Content Creation Cost per article < $1.00
Data Analysis Cost per report < $5.00

6. Reporting Schedule#

6.1 Real-time Dashboard#

  • All KPIs updated every 30 seconds
  • Alert thresholds monitored continuously

6.2 Daily Report#

  • Cost Per Request trend
  • Budget utilization by team
  • Anomaly highlights
  • Delivery: Slack #finops-daily at 9 AM IST

6.3 Weekly Report#

  • Week-over-week cost comparison
  • Model mix analysis
  • Top 10 spenders
  • Optimization recommendations
  • Delivery: Email to stakeholders, Monday 10 AM IST

6.4 Monthly Report#

  • Budget variance analysis
  • Provider distribution trends
  • Unit economics deep dive
  • Forecast vs actual
  • Chargeback data for finance
  • Delivery: Confluence + email, 3rd business day

7. Grafana Dashboard Panels#

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                          AI Control Plane FinOps Dashboard                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Total Spend β”‚ Cost/Requestβ”‚ Budget Used β”‚ Cache Hit % β”‚    Burn Rate        β”‚
β”‚   (24h)     β”‚    (avg)    β”‚   (month)   β”‚             β”‚  (days remaining)   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                        Hourly Spend by Model (stacked bar)                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚      Cost Distribution by Team (pie)    β”‚   Budget Utilization (gauges)    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   Provider Spend Breakdown (pie)        β”‚   Cost Per 1K Tokens (line)      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                      Token Usage: Input vs Output (timeseries)              β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                          Top 10 Spenders Table                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

8. Optimization Recommendations#

Based on KPI Analysis#

KPI Signal Recommendation
High CPR Review prompt length, enable caching
Low cache hit rate Identify repeated queries, tune TTL
High token ratio Optimize prompts, set max_tokens
Budget > 80% Alert team, review heavy users
High retry rate Check provider health, adjust timeouts
Premium model > 30% spend Evaluate if premium needed, try cheaper alternatives
Self-hosted < 40% Migrate suitable workloads to vLLM

Appendix A: Prometheus Recording Rules#

groups:
  - name: finops-recording-rules
    interval: 1m
    rules:
      - record: aigateway:cost_per_request:1h
        expr: sum(increase(litellm_spend_total[1h])) / sum(increase(litellm_requests_total[1h]))

      - record: aigateway:cost_per_1k_tokens:1h
        expr: sum(increase(litellm_spend_total[1h])) / (sum(increase(litellm_tokens_total[1h])) / 1000)

      - record: aigateway:budget_utilization:team
        expr: sum(litellm_team_spend_total) by (team) / sum(litellm_team_budget_total) by (team)

      - record: aigateway:daily_spend:model
        expr: sum(increase(litellm_spend_total[24h])) by (model)

      - record: aigateway:cache_hit_rate:1h
        expr: sum(rate(litellm_cache_hits_total[1h])) / sum(rate(litellm_requests_total[1h]))

      - record: aigateway:token_ratio:1h
        expr: sum(rate(litellm_tokens_total{type="output"}[1h])) / sum(rate(litellm_tokens_total{type="input"}[1h]))

Appendix B: SQL Queries for FinOps Reporter#

-- Daily cost by team
SELECT
    date,
    team_id,
    SUM(total_cost) as daily_cost,
    SUM(request_count) as requests,
    SUM(total_cost) / NULLIF(SUM(request_count), 0) as cost_per_request
FROM cost_tracking_daily
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date, team_id
ORDER BY date DESC, daily_cost DESC;

-- Model efficiency comparison
SELECT
    model,
    SUM(total_cost) as total_cost,
    SUM(input_tokens + output_tokens) as total_tokens,
    SUM(total_cost) / NULLIF(SUM(input_tokens + output_tokens), 0) * 1000 as cost_per_1k_tokens
FROM cost_tracking_daily
WHERE date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY model
ORDER BY cost_per_1k_tokens;

-- Budget burn rate projection
WITH daily_spend AS (
    SELECT
        team_id,
        AVG(total_cost) as avg_daily_spend
    FROM cost_tracking_daily
    WHERE date >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY team_id
)
SELECT
    team_id,
    avg_daily_spend,
    budget_limit,
    (budget_limit - current_spend) / NULLIF(avg_daily_spend, 0) as days_remaining
FROM daily_spend
JOIN team_budgets USING (team_id);