"Why does my Claude bill keep going up even when nobody is coding?"

"AI coding agents accumulate costs even during idle or low-activity periods because every new message re-reads the entire conversation history from scratch. A long session with 30 messages doesn't cost 30 equal units — message 30 alone costs roughly 31 times more than message 1. Unclosed sessions and background agent activity can quietly drain budgets without a single line of code being written."

"How much does a long agent session actually cost compared to a short one?"

"Costs scale non-linearly in LLM-based tools because the full conversation history is re-sent with every message. According to token optimization analysis, message 30 in a session costs approximately 31 times more than the first message. Short, focused sessions with clear context resets are significantly cheaper than leaving a single agent session running all day."

"What happened to companies that forgot to set AI usage limits?"

"One enterprise spent $500 million on AI in a single month after failing to cap consumption, and Uber's CTO exhausted their entire 2026 Claude Code budget in just four months. These incidents were driven largely by unmonitored agentic usage patterns rather than deliberate overuse. Without hard limits, costs compound faster than most quarterly budget reviews can catch."

"Is GitHub Copilot still priced per seat or did that change?"

"GitHub moved away from pure seat-based pricing in June 2026, shifting toward usage-and-output billing metrics. This means teams that budgeted based on headcount may now face unpredictable bills tied to how intensively agents are actually used. The change caught many engineering teams off guard because their 2026 budgets were set before agentic tool adoption accelerated."

"Can you actually reduce token costs without slowing down development?"

"Yes — the main lever is session hygiene, specifically starting fresh conversations more frequently and avoiding bloated context windows that force the model to re-read irrelevant history. Breaking work into smaller, scoped agent tasks rather than one long continuous session can dramatically cut costs without changing output quality. Monitoring tools that surface per-session token usage also help teams spot waste before it compounds."

Side Income

AI coding agent idle time: are you leaving money on the table?

June 16, 2026 7 min read English

Uber’s CTO burned through the company’s entire 2026 Claude Code budget in four months. One enterprise spent $500 million on AI in a single month after forgetting to set usage limits. These aren’t edge cases — they’re early warnings of a structural cost problem that most engineering teams haven’t priced into their workflows yet.

The question isn’t whether AI coding agents are useful. They clearly are. The question is whether you’re getting billed for time the agent isn’t actually working — and whether your team’s habits are quietly multiplying that cost.

Key Takeaways
According to SmarterX, some enterprises exhausted their entire annual AI budget within three months, driven primarily by unmonitored agentic usage patterns.
Every LLM message re-reads the full conversation history from scratch, meaning message 30 in a session costs roughly 31x more than message 1, according to DEV Community’s token optimization analysis.
GitHub’s June 2026 billing shift from seat-based to usage-and-output metrics confirms the industry is repricing AI around results, not hours — per Medium’s analysis of AI billing disruption.
Goldman Sachs projects token consumption will grow 24x — reaching 120 quadrillion tokens monthly — between 2026 and 2030, per SmarterX.

The Token Economy Nobody Explained to Engineering Teams

AI coding agents didn’t come with a cost manual. Most teams adopted Claude Code, GitHub Copilot, or similar tools under seat-based pricing — a familiar SaaS model that obscured what was really happening under the hood.

That model is collapsing fast. GitHub moved to usage-and-output billing in June 2026. According to Medium’s analysis of the billing shift, this mirrors a broader structural repricing across the industry — away from access fees and toward consumption metrics. The transition caught most 2026 AI budgets flat-footed. Those budgets were set in fall 2025, before agentic tools triggered exponential consumption increases.

Google processed 3.2 quadrillion tokens in May 2026 alone — a 7x year-over-year increase, per SmarterX. That number makes more sense when you understand the mechanics: every message sent to an LLM re-reads the entire conversation history from scratch. It’s not incremental. Message 30 doesn’t cost “message 30 worth” of tokens. It costs 31x message 1.

Most developers don’t know this. And their workflows show it.

The Hidden Cost Multipliers Inside Your Daily Workflow

Conversation Bloat Is Your Biggest Leak

According to DEV Community’s token optimization breakdown, quality degrades past roughly 60% of the context window — and cost keeps climbing regardless. Capping sessions at 15-20 messages isn’t just good hygiene. It’s the difference between a predictable bill and a surprise.

The correction habit is quietly expensive. When a developer sends “actually, ignore that last request” instead of editing the original message, they’ve just extended the conversation history and reloaded everything before it. Edits replace history. Corrections compound it.

Batch tasking matters more than most teams realize. Each separate message reloads the entire context. Five clarifying messages to a Claude Code session doesn’t cost 5x message 1 — it costs 5, 6, 7, 8, and 9 messages of context respectively, totaling 35 context loads instead of 5.

System Prompts and MCP Servers Are Silent Budget Drains

The CLAUDE.md file loads on every single message. At 2,000 tokens per load across a 30-message session, that’s 60,000 tokens spent entirely on system prompt — before a single line of code gets touched. According to DEV Community, stripping verbose instructions down to minimal, unambiguous directives achieves a 40-50% reduction in system prompt token spend.

MCP server schemas are worse. Playwright MCP alone injects roughly 15,000 tokens per message. That’s not idle time — that’s the agent burning through budget on tool definitions before it touches your actual problem.

Prompt caching theoretically reduces repeated static content to around 10% of original cost. The catch: any mid-session change to CLAUDE.md or tool definitions invalidates the cache entirely. Teams making incremental tweaks to their system prompts are paying full price, repeatedly.

The Thinking Budget Nobody Controls

Claude Code’s /effort command controls internal reasoning tokens — and they bill at the same rate as output tokens. Max effort can consume 10x more tokens than low effort on identical prompts, per DEV Community’s analysis. Running max effort on a variable rename isn’t thoroughness. It’s measurably expensive.

The right routing: /effort low for formatting and renaming, auto for standard development work, /effort high only for architecture decisions and complex debugging. Most teams run everything at auto or max by default.

Effort Level vs. Cost: What the Data Shows

Use Case	Recommended Effort	Relative Token Cost	When to Override
Variable renaming / formatting	`/effort low`	1x baseline	Never
Standard feature work	`auto`	2-4x baseline	Rarely
Integration debugging	`auto` or `high`	4-7x baseline	Complex edge cases
Architecture decisions	`/effort high`	8-10x baseline	Most of the time
Boilerplate generation	`/effort low`	1-2x baseline	Never

The pattern is clear. In most cases, it’s not idle time driving costs — it’s active misconfiguration. Teams burning through budgets are usually running high-effort reasoning on low-complexity tasks, not sitting idle.

What Engineering Teams and Finance Should Do Differently

For individual developers — the cheapest fix is behavioral. Edit instead of correct. Batch instead of fragment. Cap long sessions and start fresh rather than extending a 40-message context. Install ccusage for Claude or CodexBar for Codex to make consumption visible in real time, per DEV Community’s monitoring recommendations.

For engineering leads — audit your CLAUDE.md files and MCP configurations before anything else. A 2,000-token system prompt on a 100-developer team running 20-message sessions daily adds up to millions of tokens weekly on setup costs alone. Treat system prompt size the way you’d treat a slow database query — profile it, compress it, measure the difference.

For finance and procurement teams — the Uber and enterprise examples from SmarterX share a common failure: 2026 AI budgets were scoped against 2025 usage patterns, before agentic tools went mainstream. This approach fails when procurement controls don’t exist to match the new consumption reality. The question isn’t whether AI coding agents create value — according to Medium’s billing analysis, AI-assisted development compresses task completion times by 50-70%. The question is whether that efficiency gain gets captured, or quietly vanishes into unmonitored token consumption.

What to watch next: Microsoft canceled most internal Claude Code licenses six months post-deployment, citing cost. That’s a signal, not an anomaly. Watch for enterprise AI vendors shifting toward flat-fee unlimited models — SmarterX identifies this as one of the proposed structural fixes. If token costs drop 10-100x within the next 12 months as projected, the calculus changes. But the behavioral habits formed now will persist either way.

The Bottom Line

It’s not idle time that’s the primary leak. It’s conversation bloat, misconfigured system prompts, runaway effort settings, and MCP schema overhead — all of it invisible without active monitoring.

Four things to walk away with:

Message 30 costs 31x message 1. Cap sessions, don’t extend them.
System prompts load on every message. Compress aggressively — 40-50% reduction is achievable.
Effort routing matters. Max reasoning on a rename task isn’t thoroughness. It’s waste.
Budget models are lagging reality. Goldman Sachs projects 24x token growth by 2030. Your 2026 AI budget was probably scoped against 2025 patterns.

The teams that win aren’t the ones spending the most on AI tooling. They’re the ones who understand where the meter actually runs — and configure their workflows accordingly.

How are you currently tracking your AI coding agent spend, and do you know which part of your workflow is consuming the most tokens?

References

Photo by Igor Omilaev on Unsplash