TURION.AI
/ developers · 04

AI agent running costs in production.

What an agent actually costs once it's live, broken down by model and by workload. Prices are sourced from OpenRouter and provider catalogues; we update this page when the catalogues move.

Models 15
Source OpenRouter
Updated May 2026
Currency USD

/ 01 · per-run cost

What does one agent run cost?

Three concrete examples, each picked to match the model it would actually run on in production. Cost = input tokens × input price + output tokens × output price. Tool-call round-trips count toward the input bill on the next turn — they're the most common cost surprise teams hit.
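The per-run formula can be sketched in a few lines of Python. This is a minimal sketch: `cost_per_run` is a hypothetical helper name, and prices are USD per million tokens as listed in section 02.

```python
def cost_per_run(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost = input tokens x input price + output tokens x output price.

    Prices are quoted per million tokens, so divide once at the end.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Code review agent on Claude Sonnet 4.6 ($3.00 in / $15.00 out per M tok):
print(cost_per_run(8_000, 2_500, 3.00, 15.00))  # 0.0615
```

Note that tool-call turns add to `input_tokens` on the next request, so a real agent loop feeds this function growing input counts as the run proceeds.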

/ small

Simple support triage

Claude Haiku 4.5

Reads a ticket, classifies it, drafts the canned reply, opens a Jira if escalation is needed.

input tokens 1,200
output tokens 400
tool calls 1
Input spend <$0.01
Output spend <$0.01
AI cost / run <$0.01
Same task by a support analyst: 4 min · $1.87
You save per run: $1.86 · 583× cheaper
/ medium

Code review agent

Claude Sonnet 4.6

Reads a PR diff (8k tokens of code), runs three lint/test tool calls, writes a structured review.

input tokens 8,000
output tokens 2,500
tool calls 3
Input spend $0.0240
Output spend $0.0375
AI cost / run $0.0615
Same task by a senior engineer: 25 min · $39.58
You save per run: $39.52 · 644× cheaper
/ large

Research / longform agent

Claude Opus 4.7

Multi-step research run: fan-out to 4 sub-agents, web search + doc retrieval, synthesised report with citations.

input tokens 60,000
output tokens 6,000
tool calls 12
Input spend $0.9000
Output spend $0.4500
AI cost / run $1.3500
Same task by a research analyst: 95 min · $114
You save per run: $113 · 84× cheaper
/ 02 · model prices

Top 15 models, grouped by tier.

Tier | Provider | Model | Input $/M tok | Output $/M tok | Context | Best for
Frontier | Anthropic | Claude Opus 4.7 | $15.00 | $75.00 | 200k | Hardest reasoning, long-context research, code
Frontier | OpenAI | GPT-5 | $1.25 | $10.00 | 400k | Deep reasoning + native tool use
Frontier | Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Long context (1M tokens), multimodal
Frontier | xAI | Grok 4 | $5.00 | $15.00 | 256k | Conversational reasoning, current events
Workhorse | Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200k | Default agent loop, code review, structured output
Workhorse | OpenAI | GPT-5 mini | $0.25 | $2.00 | 200k | Cheap-and-good agent default for high-volume work
Workhorse | OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M | Long-context retrieval, document Q&A
Workhorse | Google | Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Cheap long-context, real-time agents
Workhorse | Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200k | Tight loops, classifiers, voice agents
Workhorse | Mistral | Mistral Large 2 | $2.00 | $6.00 | 128k | EU-hosted, function calling
Workhorse | Cohere | Command R+ | $2.50 | $10.00 | 128k | RAG-tuned with strong citations
Open-weight | Meta | Llama 3.3 70B | $0.59 | $0.79 | 128k | Cheap general-purpose; self-host friendly
Open-weight | DeepSeek | DeepSeek V3.1 | $0.27 | $1.10 | 128k | Cheapest workhorse with reasoning chops
Open-weight | Qwen | Qwen 2.5 72B | $0.40 | $0.40 | 128k | Cheap, very fast, strong on multilingual
Open-weight | Mistral | Mixtral 8x22B | $0.65 | $0.65 | 64k | Cheap MoE, decent function calling

Snapshot from openrouter.ai and provider catalogues, May 2026. Provider-direct pricing is usually within ±5% of OpenRouter's published rate; volume discounts and prompt-cache reads can drop the effective input price by 50–90%.
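The prompt-cache discount mentioned above is easy to model. A rough sketch under stated assumptions: `cached_rate_fraction` is the cache-read price as a fraction of the standard input rate, and `hit_rate` is the share of input tokens served from cache; both function and parameter names are hypothetical.

```python
def effective_input_price(full_price: float, cached_rate_fraction: float,
                          hit_rate: float) -> float:
    """Blend the cache-read rate and the full input rate by hit rate."""
    return (hit_rate * full_price * cached_rate_fraction
            + (1 - hit_rate) * full_price)

# Claude Sonnet 4.6 at $3.00/M input, cache reads at 10% of the standard
# rate, 90% of input tokens cached (a long, stable system prompt):
print(round(effective_input_price(3.00, 0.10, 0.90), 2))  # 0.57
```

That works out to an 81% reduction in effective input price, inside the 50–90% range quoted above; lower hit rates or a 25% cache-read rate land nearer the bottom of the range.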

/ 03 · monthly projection

What does that look like at scale?

Per-run cost × runs/day × 30. Log-scaled axes; expensive frontier workloads pull away fast. Most teams we work with sit between 1,000 and 10,000 runs/day per agent.

Profile | $ / run | 10/day | 100/day | 1k/day | 10k/day
Simple support triage | <$0.01 | $0.96 | $9.60 | $96.00 | $960
Code review agent | $0.0615 | $18.45 | $185 | $1.8k | $18.4k
Research / longform agent | $1.3500 | $405 | $4.0k | $40.5k | $405.0k
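The projection is a single multiplication, shown here as a minimal sketch (the `monthly_cost` name is hypothetical; 30 billing days assumed, as in the table):

```python
def monthly_cost(cost_per_run: float, runs_per_day: int, days: int = 30) -> float:
    """Per-run cost x runs/day x days."""
    return cost_per_run * runs_per_day * days

# Code review agent at 1,000 runs/day:
print(round(monthly_cost(0.0615, 1_000)))  # 1845, i.e. the $1.8k cell
```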
/ 04 · vs human work

What you save vs. paying a person to do the same work.

Same three workloads, side by side: AI cost vs. the loaded monthly cost of the role that would otherwise own the work. Numbers assume each profile runs at 1,000 events/day (the middle of the projection above) for 30 days.

Profile | AI / month | Human / month | Savings / month | Multiplier
Simple support triage (4 min · support analyst @ $28/hr) | $96.00 | $56.0k | $55.9k | 583×
Code review agent (25 min · senior engineer @ $95/hr) | $1.8k | $1187.5k | $1185.7k | 644×
Research / longform agent (95 min · research analyst @ $72/hr) | $40.5k | $3420.0k | $3379.5k | 84×
Totals (3 profiles, 1k events/day each) | $42.4k | $4663.5k | $4621.1k | 110×

Per-run costs come from the model prices above; per-run human-equivalents use loaded US/EU salaries for the role that owns the work today (a support analyst at $28/hr, a senior engineer at $95/hr, a research analyst at $72/hr). At 1,000 events/day for 30 days, the simple agent saves ~$56k/month against a single support analyst's hourly cost; the research agent saves over $3M/month against a research desk doing the same volume manually.
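The human-equivalent column follows the same shape: minutes per event converted to an hourly bill, scaled by volume. A sketch under the table's assumptions (loaded hourly rates, 30 days; the `human_monthly` name is hypothetical):

```python
def human_monthly(minutes_per_event: float, rate_per_hr: float,
                  events_per_day: int, days: int = 30) -> float:
    """Loaded hourly rate x hours of equivalent human work per month."""
    return minutes_per_event / 60 * rate_per_hr * events_per_day * days

# Code review: 25 min by a senior engineer @ $95/hr, 1,000 events/day:
print(round(human_monthly(25, 95, 1_000)))  # 1187500 -> the $1187.5k cell
```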

/ 05 · where the bill goes

Where the cost actually comes from.

  • Input tokens dominate. Most agents read more than they write — a code-review agent typically processes 5–10× more input than output. Trimming the system prompt and the retrieved context is the highest-leverage cost lever.
  • Tool calls inflate input. Each tool-call round-trip re-sends the prior conversation plus the tool result on the next turn. A 12-step agent on a 4k system prompt easily reads 48k tokens before the user notices.
  • Prompt caching is the cheapest fix. Anthropic, OpenAI, and Google all cache the static head of the prompt at 10–25% of the standard input rate. For long, stable system prompts this drops the effective bill 50–90%.
  • Match the model to the step. Use Haiku/Flash/mini for classifiers and routing, escalate to Sonnet/GPT-5/Gemini Pro only when the eval set demands it. A two-tier router cuts spend on most production agents in half.
  • Reasoning models bill on hidden tokens. Frontier reasoning modes ("thinking", "deliberation") count toward output. A single Opus 4.7 reasoning call can write 4–6k tokens you'll never see.
  • Per-tenant budgets in the runtime. Hard-cap each agent at a daily $ ceiling. We set this in CI and the workflow cancels promotion if the eval suite blows past it.
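The tool-call point above is worth working through: because each turn re-sends the system prompt plus all prior history, cumulative input grows roughly quadratically with turn count. A back-of-envelope sketch (names and the ~1k-tokens-per-turn history growth are illustrative assumptions):

```python
def total_input_tokens(system_prompt: int, per_turn_tokens: int, turns: int) -> int:
    """Sum the input bill across an agent loop's tool-call turns."""
    total = 0
    history = 0
    for _ in range(turns):
        total += system_prompt + history   # everything is re-sent this turn
        history += per_turn_tokens         # tool result + reply appended
    return total

# 12-step agent, 4k system prompt, ~1k of new history per turn:
print(total_input_tokens(4_000, 1_000, 12))  # 114000
```

Re-sending the 4k prompt alone accounts for 48k of that, matching the figure above; the accumulating tool results more than double it.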
/ 06 · next

Build the agent your team keeps promising.

Pair with one of our solutions architects. Two weeks from kickoff to a deployed, evaluated, observable agent in your stack.

We'll walk you through a system that's already running in production.