A founder I know left a Claude Code agent running on a refactoring task at 11 PM. The task was a 22-file cross-cutting rename. Nothing exotic. He went to sleep.
At 7 AM he woke up to a $3,412 token bill. The agent had hit an integration test that was flaky against a third-party stub, looped on the test fix for 6 hours, and never asked for help.
The agent did not malfunction. It did what agents do: it tried to make the test green. The fix that should have taken 40 minutes ran for 360 minutes because nobody told the agent when to stop.
The math that would have predicted this is in the Agent Loop Cost Predictor. The hook that would have prevented it is in the Hooks Builder.
The math
A Claude Code session has four cost knobs, each multiplying the others:
- Iterations. How many times the agent submits a fresh request. Each one is a separate API call.
- Tool calls per iteration. Reads, edits, bash runs. Each one returns content that lands in the next iteration's context.
- Tokens per tool result. A file read returns the whole file. A grep returns matching lines. A bash run returns stdout.
- Output tokens. What the agent generates each turn: plan, code, response.
The cost per iteration is roughly:
cost_per_iter = (cumulative_context * input_rate) + (output_tokens * output_rate)
Where cumulative_context grows every iteration because each tool result and each output is appended to history.
Caching helps. Anthropic's prompt cache reduces the cost of already-seen context by up to 90 percent. So the real formula is:
effective_input = context * (1 - 0.9 * cache_hit_rate)
If you have caching on and a 70 percent hit rate, your effective input cost is 37 percent of nominal. If caching is off (and in my audits it is off in roughly half of production setups), you pay full freight.
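The two formulas above combine into a few lines. A minimal sketch; the rates and token counts here are illustrative, not taken from the incident:

```python
# Per-iteration cost model from the two formulas above.
# Rates and token counts are illustrative assumptions.

def cost_per_iter(context_tokens, output_tokens, input_rate, output_rate,
                  cache_hit_rate=0.0):
    """Dollar cost of one iteration: cached-adjusted input plus output."""
    # effective_input = context * (1 - 0.9 * cache_hit_rate)
    effective_input = context_tokens * (1 - 0.9 * cache_hit_rate)
    return (effective_input / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# 100K context, 600 output tokens, Opus-class rates ($15/M in, $75/M out):
no_cache = cost_per_iter(100_000, 600, 15, 75)        # caching off: full freight
cached   = cost_per_iter(100_000, 600, 15, 75, 0.7)   # 70 percent hit rate
print(round(no_cache, 3), round(cached, 3))
```

Note the input portion drops to 37 percent of nominal, but the total drops a little less, because output tokens are never cached.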
What happened in the $3,400 incident
I rebuilt the numbers from the chat history afterwards:
- Model: Claude Opus 4.7 (frontier, $15/M input, $75/M output)
- Iterations: 287
- Tool calls per iteration: 4.2 average
- Tokens per tool result: 1,800 average (mostly file reads)
- Output tokens per iteration: 600 average
- Caching: not configured
- Initial context: 14,000 tokens (system + CLAUDE.md + skills)
The cumulative context grew by roughly 8,200 tokens per iteration. By iteration 287, each call was carrying roughly 2.4 million tokens of context, and with caching off, every token of it was billed at the full input rate.
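The growth figures follow directly from the averages above; a quick check:

```python
# Context growth per iteration, from the incident's averages.
tool_calls = 4.2           # tool calls per iteration
tokens_per_result = 1_800  # average tokens per tool result
output_tokens = 600        # output tokens per iteration
initial_context = 14_000   # system + CLAUDE.md + skills

growth = tool_calls * tokens_per_result + output_tokens  # tokens added per iteration
final_context = initial_context + 287 * growth           # context at iteration 287

print(int(growth), int(final_context))  # ~8,200 per iteration, ~2.4M final
```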
Total cost: ~$3,400. The predictor agrees within a few percent.
The three things that stop this
Thing 1: a token budget hook
The simplest fix. Cap cumulative task tokens at 200,000. The hook tracks usage and refuses new tool calls past the cap. The agent stops. The human is told. The bill stops at roughly $20, not $3,400.
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": ".*",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/budget-guard.sh 200000" }
        ]
      }
    ]
  }
}
```
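The guard script itself is not shown here. As one sketch of the idea (the real `budget-guard.sh` is bash; this is a hypothetical Python equivalent), a PreToolUse hook receives JSON on stdin including a `transcript_path`, and exiting with code 2 blocks the tool call and surfaces stderr to the agent. Token usage is estimated crudely from the transcript size at roughly 4 characters per token:

```python
#!/usr/bin/env python3
# Hypothetical Python sketch of a budget guard (the article's version is bash).
# Assumes Claude Code's hook contract: JSON hook input on stdin (including
# transcript_path), and exit code 2 blocks the tool call, feeding stderr
# back to the agent.
import json
import os
import sys

CHARS_PER_TOKEN = 4  # rough heuristic for estimating tokens from text size

def estimate_tokens(transcript_bytes: int) -> int:
    """Crude token estimate from the transcript file size."""
    return transcript_bytes // CHARS_PER_TOKEN

def main(budget: int) -> int:
    event = json.load(sys.stdin)
    path = event.get("transcript_path", "")
    size = os.path.getsize(path) if path and os.path.exists(path) else 0
    used = estimate_tokens(size)
    if used > budget:
        print(f"Token budget exceeded: ~{used} used, cap {budget}. "
              "Stop and ask the human to slice the task smaller.",
              file=sys.stderr)
        return 2  # blocking exit: the tool call is refused
    return 0

if __name__ == "__main__":
    cap = int(sys.argv[1]) if len(sys.argv) > 1 else 200_000
    sys.exit(main(cap))
```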
200K is generous for most legitimate work. Sessions that approach it should stop and ask the human to slice the task smaller.
Thing 2: prompt caching on
Anthropic prompt caching is one config flag. It reduces context costs by up to 90 percent. If you are running agents and you are not caching, you are paying 3 to 10x what you need to.
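For direct API use (Claude Code manages its own caching), the opt-in is per content block via `cache_control`. A sketch of what the request payload looks like; the model id and prompt text are placeholders:

```python
# Illustrative Messages API payload with prompt caching opted in.
# cache_control marks the stable prefix (system prompt, skills, docs)
# so subsequent calls reuse it at the discounted cached-input rate.
payload = {
    "model": "claude-sonnet-4-5",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a refactoring agent. <large stable preamble>",
            "cache_control": {"type": "ephemeral"},  # cache up to this block
        }
    ],
    "messages": [
        {"role": "user", "content": "Rename FooService across the repo."}
    ],
}
```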
Thing 3: model routing
The $3,400 incident ran on Opus. Most of those iterations were the agent trying small edits and running tests. That is Sonnet work. Or Haiku work.
The rule I run: tasks under 5 files run on Sonnet. Tasks over 12 files escalate to Opus. Tasks involving infra or migrations confirm with the human first. This single change cuts agent spend by 35 to 50 percent on the median team.
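The rule fits in a few lines. One detail the text leaves open is the 5-to-12-file middle band; this sketch assumes Sonnet there too, and the names are hypothetical:

```python
# Hypothetical router for the rule above. The 5-to-12-file band is not
# specified in the text; this sketch assumes Sonnet for it as well.
def route_model(files_touched: int, touches_infra: bool = False) -> str:
    if touches_infra:
        return "ask-human"  # infra or migrations: confirm with the human first
    if files_touched > 12:
        return "opus"       # large cross-cutting tasks escalate
    return "sonnet"         # small and mid-size tasks

print(route_model(3), route_model(22), route_model(8, touches_infra=True))
```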
How to predict your own worst case
Open the predictor. Plug in your worst-case loop assumptions:
- Iterations: 300 (an agent that is genuinely stuck and not asking for help)
- Tool calls per iteration: your typical count from real sessions
- Tokens per tool result: depends on file sizes; 1,500 is a reasonable default
- Caching hit rate: be honest. If caching is off, use 0.
The number that comes out is your overnight exposure. If you would not be comfortable paying it, write the budget hook this week.
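If you want the same number without the predictor, the checklist above maps onto a short loop. A sketch under stated assumptions: initial context (14K) and Opus-class rates are carried over from the incident section, and context-window compaction is ignored for simplicity:

```python
# Worst-case exposure from the checklist above. Initial context and
# Opus-class rates are assumptions; compaction is ignored for simplicity.
def worst_case(iterations=300, tool_calls=4, tokens_per_result=1_500,
               output_tokens=600, cache_hit_rate=0.0,
               initial_context=14_000, input_rate=15, output_rate=75):
    total = 0.0
    context = initial_context
    for _ in range(iterations):
        effective = context * (1 - 0.9 * cache_hit_rate)  # cache discount
        total += (effective / 1e6) * input_rate + (output_tokens / 1e6) * output_rate
        context += tool_calls * tokens_per_result + output_tokens  # history grows
    return total

print(f"no cache:  ${worst_case():,.0f}")
print(f"70% cache: ${worst_case(cache_hit_rate=0.7):,.0f}")
```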
The honest part
Every team I have audited had a story like this. Most had not hit a $3,400 incident yet, but nearly all had something in the $200 to $400 range that nobody investigated because "we should probably look into that someday." Someday turns into the post-mortem.
The math is not hard. The hook is 70 lines of bash. The savings are real.
Receipts
- Confirmed runaway-loop incidents across audits in the last quarter: 4.
- Median wasted spend per incident: $830.
- Largest single incident: $3,412.
- Teams with token budget hooks before the audit: 0 of 12.
- Teams with hooks 30 days after the audit: 12 of 12.