**Simple support triage** (Claude Haiku 4.5). Reads a ticket, classifies it, drafts the canned reply, opens a Jira if escalation is needed.

- Input tokens: 1,200
- Output tokens: 400
- Tool calls: 1
What an agent actually costs once it's live, broken down by model and by workload. Prices are sourced from OpenRouter and provider catalogues; we update this page when the catalogues move.
Three concrete examples, each picked to match the model it would actually run on in production. Cost = input tokens × input price + output tokens × output price. Tool-call round-trips count toward the input bill on the next turn — they're the most common cost surprise teams hit.
- **Simple support triage:** reads a ticket, classifies it, drafts the canned reply, opens a Jira if escalation is needed.
- **Code review:** reads a PR diff (8k tokens of code), runs three lint/test tool calls, writes a structured review.
- **Research agent:** multi-step research run that fans out to 4 sub-agents, does web search and doc retrieval, and synthesises a report with citations.
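As a minimal sketch of the cost formula, here is the first profile worked through in Python, using the Claude Haiku 4.5 prices from the table below ($1.00/M input, $5.00/M output); the function name is ours, not from any provider SDK:

```python
def per_run_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost = input tokens x input price + output tokens x output price,
    with prices quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Simple support triage on Claude Haiku 4.5: 1,200 in / 400 out
triage = per_run_cost(1_200, 400, 1.00, 5.00)
print(f"${triage:.4f} per run")  # -> $0.0032 per run
```

Note this counts only a single turn; a tool call adds its result to the next turn's input, so a real loop re-bills the growing context each round-trip.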
| Tier | Provider | Model | Input $ / M tok | Output $ / M tok | Context | Best for |
|---|---|---|---|---|---|---|
| Frontier | Anthropic | Claude Opus 4.7 | $15.00 | $75.00 | 200k | Hardest reasoning, long-context research, code |
| Frontier | OpenAI | GPT-5 | $1.25 | $10.00 | 400k | Deep reasoning + native tool use |
| Frontier | Google | Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Long context (1M tokens), multimodal |
| Frontier | xAI | Grok 4 | $5.00 | $15.00 | 256k | Conversational reasoning, current events |
| Workhorse | Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200k | Default agent loop, code review, structured output |
| Workhorse | OpenAI | GPT-5 mini | $0.25 | $2.00 | 200k | Cheap-and-good agent default for high-volume work |
| Workhorse | OpenAI | GPT-4.1 | $2.00 | $8.00 | 1M | Long-context retrieval, document Q&A |
| Workhorse | Google | Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Cheap long-context, real-time agents |
| Workhorse | Anthropic | Claude Haiku 4.5 | $1.00 | $5.00 | 200k | Tight loops, classifiers, voice agents |
| Workhorse | Mistral | Mistral Large 2 | $2.00 | $6.00 | 128k | EU-hosted, function calling |
| Workhorse | Cohere | Command R+ | $2.50 | $10.00 | 128k | RAG-tuned with strong citations |
| Open-weight | Meta | Llama 3.3 70B | $0.59 | $0.79 | 128k | Cheap general-purpose; self-host friendly |
| Open-weight | DeepSeek | DeepSeek V3.1 | $0.27 | $1.10 | 128k | Cheapest workhorse with reasoning chops |
| Open-weight | Qwen | Qwen 2.5 72B | $0.40 | $0.40 | 128k | Cheap, very fast, strong on multilingual |
| Open-weight | Mistral | Mixtral 8x22B | $0.65 | $0.65 | 64k | Cheap MoE, decent function calling |
Snapshot from openrouter.ai and provider catalogues, May 2026. Provider-direct pricing is usually within ±5% of OpenRouter's published rate; volume discounts and prompt-cache reads can drop the effective input price by 50–90%.
Per-run cost × runs/day × 30. Log-scaled axes; expensive frontier workloads pull away fast. Most teams we work with sit between 1,000 and 10,000 runs/day per agent.
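The projection itself is plain arithmetic. A minimal sketch, using the support-triage profile as the per-run cost ($0.0032/run: 1,200 input tokens at $1/M plus 400 output tokens at $5/M on Claude Haiku 4.5):

```python
def monthly_cost(per_run: float, runs_per_day: int, days: int = 30) -> float:
    """Per-run cost x runs/day x days, as plotted above."""
    return per_run * runs_per_day * days

# Support triage at the middle of the projection range: 1,000 runs/day
print(f"${monthly_cost(0.0032, 1_000):,.2f}/month")  # -> $96.00/month
```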
Same three workloads, side by side: AI cost vs. the loaded monthly cost of the role that would otherwise own the work. Numbers assume each profile runs at 1,000 events/day (the middle of the projection above) for 30 days.
Per-run costs come from the model prices above; per-run human-equivalents use loaded US/EU salaries for the role that owns the work today (a support analyst at $28/hr, a senior engineer at $95/hr, a research analyst at $72/hr). At 1,000 events/day for 30 days, the simple agent saves ~$56k/month against a single support analyst's hourly cost; the research agent saves over $3M/month against a research desk doing the same volume manually.
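A sketch of how the triage comparison pencils out. The $28/hr rate and the 1,000 events/day × 30 days volume are from the text above; the 4-minutes-per-ticket handling time is our assumption, chosen because it reproduces the ~$56k figure, not a number the article states:

```python
ANALYST_RATE = 28.0        # loaded $/hr for a support analyst (from the text)
MINUTES_PER_TICKET = 4     # ASSUMED human handling time per ticket
EVENTS_PER_DAY, DAYS = 1_000, 30

human_cost = EVENTS_PER_DAY * DAYS * (MINUTES_PER_TICKET / 60) * ANALYST_RATE
ai_cost = 0.0032 * EVENTS_PER_DAY * DAYS  # $0.0032/run from the model prices
savings = human_cost - ai_cost            # roughly $55.9k/month

print(f"human ${human_cost:,.0f}  ai ${ai_cost:,.0f}  saves ${savings:,.0f}/mo")
```

The research-agent figure works the same way, just with a much higher per-event human cost (hours of analyst time at $72/hr) against a still-small per-run model bill.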
Pair with one of our solutions architects. Two weeks from kickoff to a deployed, evaluated, observable agent in your stack.