Sovereign AI: Deployment Playbooks for EU and India
“Sovereign AI” is now a hard requirement in several markets. A deployment playbook for running AI workloads inside national borders: infrastructure options, model choices, and reference architectures for the EU and India.
“Sovereign AI” went from buzzword to requirement in 2025. The EU AI Act is in full effect. India’s DPDP law has teeth. Saudi Arabia, UAE, Indonesia, and a handful of other jurisdictions now require certain AI workloads to run inside national borders with local oversight.
If you serve users in these regions, “just route to OpenAI” is no longer a viable answer. This post covers the deployment playbooks we use with clients building sovereign AI stacks for EU and India specifically — the two most mature and most regulated markets as of early 2026.
“Sovereignty” is not a single thing. At least four distinct requirements travel under the banner:
Different laws and contracts emphasize different ones. The EU AI Act cares most about (3) and (4) for high-risk systems. GDPR cares about (1). Some public-sector procurement requires all four.
Deploy GPU compute inside EU regions. Options:
For most commercial SaaS, EU-region hyperscaler + EU neocloud for GPU compute is the pragmatic default. For public sector and regulated industries, prefer sovereign cloud (OVHcloud, Scaleway) or self-hosted.
The models you can run in the EU, ranked by sovereignty:
For high-risk AI Act systems, self-hosted or EU-hosted is the only safe default. For low-risk, US-hosted with EU residency is usually fine.
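The "safe default" guidance above can be encoded as policy rather than tribal knowledge. A minimal sketch, assuming a simple (risk class, region) lookup; the backend names are illustrative, not a statement about any specific provider contract:

```python
# Hypothetical backend policy: map AI Act risk class + region to an
# ordered list of approved inference backends. Names are illustrative.
ALLOWED_BACKENDS = {
    ("high", "eu"): ["self-hosted-vllm-eu", "mistral-la-plateforme"],
    ("low", "eu"): ["azure-openai-eu", "self-hosted-vllm-eu"],
}

def pick_backend(risk: str, region: str) -> str:
    """Return the preferred approved backend; fail closed on unknown combos."""
    backends = ALLOWED_BACKENDS.get((risk, region))
    if not backends:
        # No silent fallback to a US endpoint for an unclassified workload.
        raise ValueError(f"no approved backend for risk={risk!r}, region={region!r}")
    return backends[0]
```

Failing closed matters here: an unmapped combination should block deployment review, not quietly route to the cheapest API.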
For systems classified high-risk under the AI Act:
This is a real engineering and legal investment. Budget for it. Tools that help:
[ User in EU ]
│
▼
[ CDN in EU (Cloudflare EU) ]
│
▼
[ API in EU region (AWS / Scaleway) ]
│
▼
[ LiteLLM gateway — routes to EU backends only ]
│
├── [ Mistral La Plateforme (France) ]
├── [ OVH AI Endpoints (France) ]
├── [ Self-hosted vLLM on OVHcloud GPU (Llama-3-70B) ]
└── [ Azure OpenAI (EU data residency) ]
│
▼
[ Vector DB in EU (Qdrant Cloud EU region) ]
│
▼
[ Object storage in EU (S3 EU / OVH S3) ]
│
▼
[ Observability (Langfuse EU, Datadog EU) ]
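The "routes to EU backends only" box in the diagram above is worth enforcing in code, not just in config review. A hedged sketch of a startup check, assuming a host allow-list; the endpoint URLs are placeholders for your actual EU-region deployments:

```python
from urllib.parse import urlparse

# Hypothetical gateway backend list mirroring the EU diagram above.
# URLs are placeholders; substitute your real EU-region endpoints.
EU_BACKENDS = [
    {"name": "mistral-la-plateforme", "api_base": "https://api.mistral.ai/v1"},
    {"name": "ovh-ai-endpoints", "api_base": "https://endpoints.ai.cloud.ovh.net/v1"},
    {"name": "self-hosted-vllm", "api_base": "https://vllm.internal.eu-west/v1"},
]

EU_HOST_ALLOWLIST = {
    "api.mistral.ai",
    "endpoints.ai.cloud.ovh.net",
    "vllm.internal.eu-west",
}

def assert_eu_only(backends) -> bool:
    """Raise at startup if any configured backend host is outside the allow-list."""
    for b in backends:
        host = urlparse(b["api_base"]).hostname
        if host not in EU_HOST_ALLOWLIST:
            raise RuntimeError(f"non-EU backend configured: {b['name']} -> {host}")
    return True
```

Running this in CI and at service startup turns an accidental US endpoint in the gateway config into a failed deploy instead of a compliance incident.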
India’s DPDP Act requires data localization for specific categories of personal data. For public-sector workloads under MeitY and RBI guidelines, even stronger localization applies.
Options:
For defense, government, and highly regulated workloads, Indian-owned sovereign cloud is often required. For commercial SaaS, hyperscaler India regions are typically fine.
Model options inside India:
The multilingual story matters: Indian workloads often need Indic-language support (Hindi, Tamil, Bengali, etc.). Open models (Llama 3, Gemma, Qwen) have solid multilingual coverage, but Indian-built models (e.g. Sarvam) often outperform them on Indic-language specifics.
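In practice this often becomes a routing decision per request. A minimal sketch, assuming a crude Unicode-block heuristic for script detection and illustrative model names; a production system would use a proper language-identification model:

```python
# Illustrative routing: send Indic-script prompts to an Indic-tuned model,
# everything else to a general model. The Unicode-range heuristic and the
# model names are assumptions for the sketch, not a production detector.
INDIC_RANGES = [
    (0x0900, 0x097F),  # Devanagari (Hindi, Marathi)
    (0x0980, 0x09FF),  # Bengali
    (0x0B80, 0x0BFF),  # Tamil
    (0x0C00, 0x0C7F),  # Telugu
]

def looks_indic(text: str, threshold: float = 0.3) -> bool:
    """True if a meaningful share of alphabetic characters are in Indic blocks."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    indic = sum(
        1 for c in letters
        if any(lo <= ord(c) <= hi for lo, hi in INDIC_RANGES)
    )
    return indic / len(letters) >= threshold

def pick_model(prompt: str) -> str:
    return "sarvam-2b" if looks_indic(prompt) else "llama-3-70b"
```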
Critical areas under DPDP:
For RBI-regulated financial services, additional strictures:
Practical patterns:
[ User in India ]
│
▼
[ CDN in India (AWS CloudFront Mumbai / Cloudflare) ]
│
▼
[ API in Mumbai (AWS IN / Azure IN) ]
│
▼
[ LiteLLM gateway — India-region backends ]
│
├── [ Self-hosted Llama-3-70B on Tata AI Cloud ]
├── [ Sarvam-2B for Indic languages ]
└── [ Azure OpenAI India region ]
│
▼
[ pgvector on RDS Mumbai ]
│
▼
[ S3 Mumbai (with replication disabled outside IN) ]
│
▼
[ Observability in Mumbai region ]
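The "replication disabled outside IN" note in the diagram is the detail teams most often miss: a bucket created in Mumbai can still replicate objects out of India via a replication rule added later. A hedged sketch of a config audit, where the rule shape is a simplified assumption loosely modeled on S3 replication configuration:

```python
# Illustrative audit: reject any enabled replication rule whose destination
# region is outside India. Field names are simplified assumptions.
IN_REGIONS = {"ap-south-1", "ap-south-2"}  # AWS Mumbai and Hyderabad

def replication_stays_in_india(replication_rules) -> bool:
    """False if any enabled rule targets a region outside IN (fails closed
    on a missing destination region)."""
    for rule in replication_rules:
        if rule.get("status") != "Enabled":
            continue
        if rule.get("destination_region") not in IN_REGIONS:
            return False
    return True
```

Run the same audit against every stateful service in the diagram (RDS read replicas, log shipping), not just object storage.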
If you serve both EU and India from one product, you need multi-region architecture:
1. Data partitioning by region. A user’s data lives in the region where they’re based. No cross-region replication for PII.
2. Per-region inference. Route calls to the inference backend in the user’s region. Latency is secondary; data residency is primary.
3. Per-region vector stores. Each region has its own vector index.
4. Centralized code, decentralized data. Application code can deploy from one pipeline. State never leaves region.
5. Regional observability. Don’t pipe EU traces to US Datadog by default. Use a regional Datadog site or Langfuse.
6. Global metadata-only. Per-user settings without PII can live globally. Full user profiles live regionally.
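The "centralized code, decentralized data" pattern above can be sketched as one codebase with a per-region config map and a resolver that pins every stateful dependency to the user's home region. All endpoint names below are placeholders:

```python
# Sketch of region partitioning: one deployable codebase, per-region state.
# Endpoints are placeholders for your actual regional services.
REGION_CONFIG = {
    "eu": {
        "inference": "https://gateway.eu.internal/v1",
        "vector_db": "qdrant.eu.internal",
        "object_store": "s3://app-data-eu",
    },
    "in": {
        "inference": "https://gateway.in.internal/v1",
        "vector_db": "pgvector.in.internal",
        "object_store": "s3://app-data-in",
    },
}

def stack_for_user(home_region: str) -> dict:
    """Resolve every stateful dependency in the user's home region."""
    try:
        return REGION_CONFIG[home_region]
    except KeyError:
        # Fail closed: never fall back to a default region for PII.
        raise ValueError(f"unsupported region: {home_region!r}")
```

The point of the resolver is that there is no global default: a request for an unsupported region errors out instead of silently landing in a US bucket.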
This is more work than a single-region deployment. Budget 2–3 months of platform engineering to set up the first two regions properly. Additional regions become faster after the pattern is established.
Sovereign infrastructure costs more. Typical multipliers vs. single-region US deployment:
Budget accordingly. For regulated workloads, the cost is unavoidable. For workloads where sovereignty is a market-entry nice-to-have, weigh cost vs. the revenue it unlocks.
The regulatory picture keeps shifting in 2026. We maintain a regulatory tracking doc for clients; the space moves monthly.
Building sovereign AI infrastructure? Reach out — we’ve scoped EU and India deployments for regulated industries.