TURION.AI
/ developers · 04

AI agent running costs in production.

What an agent actually costs once it's live, broken down by model and by workload. Prices are sourced from OpenRouter and provider catalogues; we update this page when the catalogues move.

Models 15
Source OpenRouter
Updated May 2026
Currency USD

/ 01 · per-run cost

What does one agent run cost?

Three concrete examples, each picked to match the model it would actually run on in production. Cost = input tokens × input price + output tokens × output price. Tool-call round-trips count toward the input bill on the next turn — they're the most common cost surprise teams hit.
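The per-run formula can be sketched in a few lines of Python. This is a minimal sketch: `cost_per_run` is a hypothetical helper name, and prices are USD per million tokens as listed in section 02.

```python
def cost_per_run(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost = input tokens x input price + output tokens x output price.

    Prices are quoted per million tokens, so divide once at the end.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Code review agent on Claude Sonnet 4.6 ($3.00 in / $15.00 out per M tok):
print(cost_per_run(8_000, 2_500, 3.00, 15.00))  # 0.0615
```

Note that tool-call turns add to `input_tokens` on the next request, so a real agent loop feeds this function growing input counts as the run proceeds.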

/ small

Simple support triage

Claude Haiku 4.5

Reads a ticket, classifies it, drafts the canned reply, opens a Jira if escalation is needed.

input tokens 1,200
output tokens 400
tool calls 1
Input spend <$0.01
Output spend <$0.01
AI cost / run <$0.01
Same task by a support analyst: 4 min · $1.87
You save per run: $1.86 · 583× cheaper
/ medium

Code review agent

Claude Sonnet 4.6

Reads a PR diff (8k tokens of code), runs three lint/test tool calls, writes a structured review.

input tokens 8,000
output tokens 2,500
tool calls 3
Input spend $0.0240
Output spend $0.0375
AI cost / run $0.0615
Same task by a senior engineer: 25 min · $39.58
You save per run: $39.52 · 644× cheaper
/ large

Research / longform agent

Claude Opus 4.7

Multi-step research run: fan-out to 4 sub-agents, web search + doc retrieval, synthesised report with citations.

input tokens 60,000
output tokens 6,000
tool calls 12
Input spend $0.9000
Output spend $0.4500
AI cost / run $1.3500
Same task by a research analyst: 95 min · $114
You save per run: $113 · 84× cheaper
/ 02 · model prices

Top 15 models, grouped by tier.

Tier | Provider | Model | Input $/M tok | Output $/M tok | Context | Best for
Frontier | Anthropic | Claude Opus 4.7 | $15.00 | $75.00 | 200k | Hardest reasoning, long-context research, code
Frontier | OpenAI | GPT-5 | $1.25 | $10.00 | 400k | Deep reasoning + native tool use
Frontier | Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Long context (1M tokens), multimodal
Frontier | xAI | Grok 4 | $5.00 | $15.00 | 256k | Conversational reasoning, current events
Workhorse | Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200k | Default agent loop, code review, structured output
Workhorse | OpenAI | GPT-5 mini | $0.25 | $2.00 | 200k | Cheap-and-good agent default for high-volume work
Workhorse | OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M | Long-context retrieval, document Q&A
Workhorse | Google | Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Cheap long-context, real-time agents
Workhorse | Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200k | Tight loops, classifiers, voice agents
Workhorse | Mistral | Mistral Large 2 | $2.00 | $6.00 | 128k | EU-hosted, function calling
Workhorse | Cohere | Command R+ | $2.50 | $10.00 | 128k | RAG-tuned with strong citations
Open-weight | Meta | Llama 3.3 70B | $0.59 | $0.79 | 128k | Cheap general-purpose; self-host friendly
Open-weight | DeepSeek | DeepSeek V3.1 | $0.27 | $1.10 | 128k | Cheapest workhorse with reasoning chops
Open-weight | Qwen | Qwen 2.5 72B | $0.40 | $0.40 | 128k | Cheap, very fast, strong on multilingual
Open-weight | Mistral | Mixtral 8x22B | $0.65 | $0.65 | 64k | Cheap MoE, decent function calling

Snapshot from openrouter.ai and provider catalogues, May 2026. Provider-direct pricing is usually within ±5% of OpenRouter's published rate; volume discounts and prompt-cache reads can drop the effective input price by 50–90%.
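The prompt-cache discount mentioned above is easy to model. A rough sketch under stated assumptions: `cached_rate_fraction` is the cache-read price as a fraction of the standard input rate, and `hit_rate` is the share of input tokens served from cache; both function and parameter names are hypothetical.

```python
def effective_input_price(full_price: float, cached_rate_fraction: float,
                          hit_rate: float) -> float:
    """Blend the cache-read rate and the full input rate by hit rate."""
    return (hit_rate * full_price * cached_rate_fraction
            + (1 - hit_rate) * full_price)

# Claude Sonnet 4.6 at $3.00/M input, cache reads at 10% of the standard
# rate, 90% of input tokens cached (a long, stable system prompt):
print(round(effective_input_price(3.00, 0.10, 0.90), 2))  # 0.57
```

That works out to an 81% reduction in effective input price, inside the 50–90% range quoted above; lower hit rates or a 25% cache-read rate land nearer the bottom of the range.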

/ 03 · monthly projection

What does that look like at scale?

Per-run cost × runs/day × 30. Log-scaled axes; expensive frontier workloads pull away fast. Most teams we work with sit between 1,000 and 10,000 runs/day per agent.

Profile | $ / run | 10/day | 100/day | 1k/day | 10k/day
Simple support triage | <$0.01 | $0.96 | $9.60 | $96.00 | $960
Code review agent | $0.0615 | $18.45 | $185 | $1.8k | $18.4k
Research / longform agent | $1.3500 | $405 | $4.0k | $40.5k | $405.0k
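The projection is a single multiplication, shown here as a minimal sketch (the `monthly_cost` name is hypothetical; 30 billing days assumed, as in the table):

```python
def monthly_cost(cost_per_run: float, runs_per_day: int, days: int = 30) -> float:
    """Per-run cost x runs/day x days."""
    return cost_per_run * runs_per_day * days

# Code review agent at 1,000 runs/day:
print(round(monthly_cost(0.0615, 1_000)))  # 1845, i.e. the $1.8k cell
```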
/ 04 · vs human work

What you save vs. paying a person to do the same work.

Same three workloads, side by side: AI cost vs. the loaded monthly cost of the role that would otherwise own the work. Numbers assume each profile runs at 1,000 events/day (the middle of the projection above) for 30 days.

Profile | AI / month | Human / month | Savings / month | Multiplier
Simple support triage (4 min · support analyst @ $28/hr) | $96.00 | $56.0k | $55.9k | 583×
Code review agent (25 min · senior engineer @ $95/hr) | $1.8k | $1187.5k | $1185.7k | 644×
Research / longform agent (95 min · research analyst @ $72/hr) | $40.5k | $3420.0k | $3379.5k | 84×
Totals (3 profiles, 1k events/day each) | $42.4k | $4663.5k | $4621.1k | 110×

Per-run costs come from the model prices above; per-run human-equivalents use loaded US/EU salaries for the role that owns the work today (a support analyst at $28/hr, a senior engineer at $95/hr, a research analyst at $72/hr). At 1,000 events/day for 30 days, the simple agent saves ~$56k/month against a single support analyst's hourly cost; the research agent saves over $3M/month against a research desk doing the same volume manually.
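The human-equivalent column follows the same shape: minutes per event converted to an hourly bill, scaled by volume. A sketch under the table's assumptions (loaded hourly rates, 30 days; the `human_monthly` name is hypothetical):

```python
def human_monthly(minutes_per_event: float, rate_per_hr: float,
                  events_per_day: int, days: int = 30) -> float:
    """Loaded hourly rate x hours of equivalent human work per month."""
    return minutes_per_event / 60 * rate_per_hr * events_per_day * days

# Code review: 25 min by a senior engineer @ $95/hr, 1,000 events/day:
print(round(human_monthly(25, 95, 1_000)))  # 1187500 -> the $1187.5k cell
```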

/ 05 · where the bill goes

Where the cost actually comes from.

  • Input tokens dominate. Most agents read more than they write — a code-review agent typically processes 5–10× more input than output. Trimming the system prompt and the retrieved context is the highest-leverage cost lever.
  • Tool calls inflate input. Each tool-call round-trip re-sends the prior conversation plus the tool result on the next turn. A 12-step agent on a 4k system prompt easily reads 48k tokens before the user notices.
  • Prompt caching is the cheapest fix. Anthropic, OpenAI, and Google all cache the static head of the prompt at 10–25% of the standard input rate. For long, stable system prompts this drops the effective bill 50–90%.
  • Match the model to the step. Use Haiku/Flash/mini for classifiers and routing, escalate to Sonnet/GPT-5/Gemini Pro only when the eval set demands it. A two-tier router cuts spend on most production agents in half.
  • Reasoning models bill on hidden tokens. Frontier reasoning modes ("thinking", "deliberation") count toward output. A single Opus 4.7 reasoning call can write 4–6k tokens you'll never see.
  • Per-tenant budgets in the runtime. Hard-cap each agent at a daily $ ceiling. We set this in CI and the workflow cancels promotion if the eval suite blows past it.
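The tool-call point above is worth working through: because each turn re-sends the system prompt plus all prior history, cumulative input grows roughly quadratically with turn count. A back-of-envelope sketch (names and the ~1k-tokens-per-turn history growth are illustrative assumptions):

```python
def total_input_tokens(system_prompt: int, per_turn_tokens: int, turns: int) -> int:
    """Sum the input bill across an agent loop's tool-call turns."""
    total = 0
    history = 0
    for _ in range(turns):
        total += system_prompt + history   # everything is re-sent this turn
        history += per_turn_tokens         # tool result + reply appended
    return total

# 12-step agent, 4k system prompt, ~1k of new history per turn:
print(total_input_tokens(4_000, 1_000, 12))  # 114000
```

Re-sending the 4k prompt alone accounts for 48k of that, matching the figure above; the accumulating tool results more than double it.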
/ 06 · next

Build the agent your team keeps promising.

Pair with one of our solutions architects. Two weeks from kickoff to a deployed, evaluated, observable agent in your stack.

We'll walk you through a system that's already running in production.