TURION .AI

Coding Agents Just Crossed an Economic Threshold — and Composer 2.5 Is the Proof Point

Balys Kriksciunas · 4 min read

Cursor's Composer 2.5 matches Opus 4.7 and comes within a few points of GPT-5.5 on SWE-Bench Multilingual at roughly one-tenth the per-token cost. Combined with Spotify's 99% AI coding adoption and Microsoft's computer-use GA, May 18–25 marks the week coding agents stopped being a productivity experiment and started being an infrastructure decision.

Three data points landed within the same 72-hour window last week, and together they mean more than any single announcement could.

Point one: Cursor shipped Composer 2.5 on May 18. The in-house agentic coding model scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1, which puts it level with Claude Opus 4.7 and within a few points of GPT-5.5 on SWE-Bench Multilingual, at roughly one-tenth the cost per token (source). It is built on Moonshot’s Kimi K2.5 checkpoint, trained on 25× more synthetic tasks than Composer 2, and then fine-tuned with targeted RL for effort calibration and for sustaining long-horizon tasks.

Point two: Spotify’s co-CEO confirmed on an earnings call that senior engineers “have not written a single line of code since December” — they now prompt and review AI output exclusively. The company’s internal AI coding platform, Honk (built on Claude Code and the Claude Agent SDK), has merged over 1,500 PRs and achieved 99% voluntary adoption among engineers (source).

Point three: Microsoft’s Copilot Studio computer-use agents hit general availability on May 13, with the press cycle peaking May 21–22 as the global rollout completed. At 5 Copilot Credits per step ($0.04/step on prepaid standard models), it’s the first enterprise-grade computer-use capability available to every Power Platform tenant (source).

Individually, each is a product milestone. Together, they signal that coding agents crossed an economic threshold last week — from “impressive demo” to “line item on the infrastructure budget.”

Why Composer 2.5 changes the math

The benchmark-to-price ratio is the story. On SWE-Bench Multilingual, Composer 2.5 (79.8%) trades blows with GPT-5.5 (82.2%) and Opus 4.7 (79.4%). On Terminal-Bench 2.0, it matches Opus 4.7 (69.3% vs. 69.4%) but trails GPT-5.5 (82.7%) by a wide margin. The pricing gap is what forces the conversation: the Composer 2.5 standard tier costs $0.50 per million input tokens and $2.50 per million output tokens, roughly 10× cheaper per token than Opus 4.7 (source).

Translation for teams running long agent sessions: you can now get frontier-competitive coding performance without frontier inference bills. For a 30-minute multi-file refactor with sustained tool calls, that’s the difference between spending $3 and $30 per session.
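To make the $3 versus $30 comparison concrete, here is a minimal back-of-the-envelope sketch. The Composer 2.5 prices come from the figures above; the frontier prices are simply those prices multiplied by ten (an assumption based on the "roughly 10×" ratio, not a published price list), and the token counts are hypothetical values chosen to illustrate a long session, not measurements. Caching discounts and tiered pricing are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-token pricing in USD per million tokens."""
    input_per_million: float
    output_per_million: float


def session_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one agent session at the given token counts."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Composer 2.5 standard tier, as quoted in the article.
COMPOSER_25 = Pricing(input_per_million=0.50, output_per_million=2.50)

# Assumption: a frontier model priced at 10x per token (the article's rough ratio).
FRONTIER_ASSUMED = Pricing(input_per_million=5.00, output_per_million=25.00)

# Hypothetical token usage for a long multi-file refactor with many tool calls.
INPUT_TOKENS = 5_000_000
OUTPUT_TOKENS = 200_000

cheap = session_cost(COMPOSER_25, INPUT_TOKENS, OUTPUT_TOKENS)
frontier = session_cost(FRONTIER_ASSUMED, INPUT_TOKENS, OUTPUT_TOKENS)

print(f"Composer 2.5:       ${cheap:.2f} per session")
print(f"Frontier (assumed): ${frontier:.2f} per session")
```

With these placeholder numbers the sketch lands on $3.00 and $30.00 per session. Swap in your own token counts from agent logs to see where your sessions actually fall.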

Cursor’s training approach is worth noting because it’s replicable. The team used a sharded Muon optimizer and dual-mesh HSDP parallelism on the Kimi K2.5 base, then ran large-scale synthetic task generation (25× more training tasks than Composer 2), followed by targeted RL with textual feedback on communication style and effort calibration. The result is a model that doesn’t just score well; it stays on task across long sessions without overcorrecting or abandoning context.

The Spotify signal

Spotify’s 99% internal adoption isn’t a press stunt — it’s a data point about what happens when coding agents are integrated into CI/CD infrastructure rather than treated as a personal productivity tool. Honk is wired into fleet management, uses MCP for tool access, and runs on Claude Code + Claude Agent SDK. The system has merged over 1,500 PRs. Senior engineers function as reviewers and prompters, not line-writers.

The implication for platform teams: if you’re still evaluating coding agents as “developer tools,” you’re framing the question wrong. This is becoming an infrastructure layer — and it needs the same governance, observability, and cost controls you’d apply to any production service. (We covered the organizational dimensions of this shift in our coding agent adoption analysis.)

What this means for your stack

Three things changed last week that affect build-vs-buy decisions for agent infrastructure:

  1. The cost floor dropped. Composer 2.5 proves you can get frontier-competitive coding performance from a non-frontier model. Expect more in-house agentic models from IDE vendors — and expect the pricing pressure to cascade upward to API providers.

  2. Enterprise computer-use is now GA. Microsoft’s Copilot Studio release means agent-driven UI automation is available to organizations that couldn’t touch it before. The $0.04/step pricing is aggressive enough to make legacy RPA budgets look expensive.

  3. The adoption evidence is now board-level material. Spotify’s 99% figure, Anthropic’s Code with Claude London event, and the mainstream press coverage of AI coding (three top-tier business outlets ran long-form pieces in a 36-hour span) mean the CFO has probably already heard about this. Your infrastructure plan needs an answer.
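To put the $0.04/step figure from item 2 into budget terms, here is a rough sketch of a monthly estimate for a computer-use workflow. The credit and step prices are the ones quoted earlier in this post; the per-credit price is derived from them, and the workload (steps per run, runs per month) is entirely hypothetical. It covers step charges only and assumes every step is billed at the standard-model rate.

```python
# Figures quoted in the article for prepaid standard models.
CREDITS_PER_STEP = 5
USD_PER_STEP = 0.04

# Derived from the two figures above, not taken from a price sheet.
USD_PER_CREDIT = USD_PER_STEP / CREDITS_PER_STEP


def monthly_estimate(steps_per_run: int, runs_per_month: int) -> tuple[int, float]:
    """Return (credits consumed, USD cost) for one month of step charges."""
    total_steps = steps_per_run * runs_per_month
    credits = total_steps * CREDITS_PER_STEP
    usd = total_steps * USD_PER_STEP
    return credits, usd


# Hypothetical workload: a 40-step UI workflow run 1,000 times a month.
credits, usd = monthly_estimate(steps_per_run=40, runs_per_month=1_000)
print(f"{credits:,} credits, about ${usd:,.2f} per month")
```

Comparing that number against what the same workflow costs on an existing RPA license is the quickest way to test the claim in item 2 against your own numbers.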

For a broader view of how the agent framework landscape is shifting, see our complete guide to AI agent frameworks in 2026. For the enterprise picture, our 2026 adoption analysis tracks the trends behind the headlines — including why 88% of agent pilots still never reach production.

The week of May 18–25 didn’t invent coding agents. It made them an economic inevitability.
