A client called me at 11:47 PM on a Tuesday. Their staging environment was on fire. Not literally. A Lambda was retrying a request to a third-party API on every cold start, the API was rate-limiting them, the retry logic was looping with no backoff, and every retry went out through a misconfigured NAT Gateway in a peered VPC.
NAT Gateway data processing is $0.045 per GB. The Lambda was pushing 12MB per retry. The retry loop was firing roughly 4 times a second per warm instance, across 18 warm instances, for the previous 6 hours and 14 minutes.
Projected out to a 24-hour run rate, that was roughly $47,000 in NAT data charges. We caught it because a Claude Code pre-tool-use hook on my laptop refused to let the agent run `aws lambda update-function-configuration` on the wrong account.
This is the story. And the hook. You can have the hook.
## What actually happened
The agent was helping me triage. The client had given me console-read access. I had `aws-vault` set up with two profiles: `client-staging` and `client-prod`. I had been in `client-staging` for an hour. The agent had been reading CloudWatch logs and tracing the loop.
It found the misconfigured retry policy. It proposed a fix: redeploy the Lambda with a sane retry config and a circuit breaker. Standard. Then it asked to run:
```bash
aws lambda update-function-configuration \
  --function-name third-party-relay \
  --environment 'Variables={...}'
```
This is when the hook fired.
The hook noticed three things at once:
- The shell had `AWS_PROFILE=client-prod` set, not `client-staging`. I had switched profiles 40 minutes earlier to check a billing question and never switched back.
- The function name `third-party-relay` matched a name in both accounts.
- The proposed environment payload was a copy of the staging config, not the production config.
If the agent had run that command, it would have rewritten production with staging env vars. The third-party API would have started seeing test credentials in prod traffic. The bill would not have stopped at $47K. It would have compounded.
The hook said no. The agent reported the block. I caught the profile mismatch. The actual fix went into staging where it belonged. Production was untouched.
We patched the staging Lambda at 12:14 AM. Total NAT spend exposure at cutoff: $47,180. Actual recovered cost after the client opened a billing case with AWS support: roughly $11,000 (the AWS support team was unusually generous; this is not a guarantee).
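The manual version of the check that saved us is one command. `aws sts get-caller-identity` resolves whatever profile is active to a concrete account ID, which is the question that actually matters before a mutation. A sketch, using the profile names from this incident (requires live credentials):

```
# Which account is this shell really pointed at?
echo "AWS_PROFILE=${AWS_PROFILE:-default}"
aws sts get-caller-identity --query Account --output text
```

If the account ID that comes back is not the one you think you are in, stop. The hook below automates exactly this refusal.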
## The hook
Here is the exact pre-tool-use hook that fired. I keep it on every machine that has any AWS access from a Claude Code session. It is 70 lines of bash. You can paste it into `.claude/hooks/aws-account-guard.sh` and reference it in your `settings.json`.
```bash
#!/usr/bin/env bash
# .claude/hooks/aws-account-guard.sh
# Pre-tool-use hook. Blocks AWS mutations when the active profile
# does not match an explicit allowlist for the current repo.
set -euo pipefail

# Read the tool call from stdin (Claude Code passes JSON).
TOOL_INPUT=$(cat)
TOOL_NAME=$(echo "$TOOL_INPUT" | jq -r '.tool_name // empty')
COMMAND=$(echo "$TOOL_INPUT" | jq -r '.tool_input.command // empty')

# Only check Bash tool calls that look like AWS mutations.
if [[ "$TOOL_NAME" != "Bash" ]]; then
  exit 0
fi

# Lowercase. Whitespace-collapse.
NORM=$(echo "$COMMAND" | tr '[:upper:]' '[:lower:]' | tr -s ' ')

# Mutation verbs we care about.
MUTATORS=(
  "aws .* create-"
  "aws .* update-"
  "aws .* delete-"
  "aws .* put-"
  "aws .* modify-"
  "aws .* terminate-"
  "aws .* stop-"
  "aws .* reboot-"
  "aws s3 rb"
  "aws s3 rm"
)

IS_MUTATION=0
for verb in "${MUTATORS[@]}"; do
  if [[ "$NORM" =~ $verb ]]; then
    IS_MUTATION=1
    break
  fi
done

if [[ $IS_MUTATION -eq 0 ]]; then
  exit 0
fi

# Pull the active profile.
ACTIVE_PROFILE="${AWS_PROFILE:-default}"

# Read the per-repo allowlist. Exit code 2 tells Claude Code to block
# the tool call and feed stderr back to the agent.
ALLOWLIST_FILE=".claude/aws-allowed-profiles"
if [[ ! -f "$ALLOWLIST_FILE" ]]; then
  echo "BLOCKED: no AWS profile allowlist for this repo." >&2
  echo "Create .claude/aws-allowed-profiles with one profile per line." >&2
  exit 2
fi

if ! grep -qx "$ACTIVE_PROFILE" "$ALLOWLIST_FILE"; then
  echo "BLOCKED: active AWS profile '$ACTIVE_PROFILE' is not in allowlist." >&2
  echo "Allowed profiles for this repo:" >&2
  cat "$ALLOWLIST_FILE" >&2
  exit 2
fi

# Extra: refuse anything that names a function/bucket/table that exists
# in BOTH the staging and prod accounts. Names are matched against the
# lowercased command, so list them in lowercase.
DOUBLE_NAMED_FILE=".claude/aws-double-named-resources"
if [[ -f "$DOUBLE_NAMED_FILE" ]]; then
  while IFS= read -r resource; do
    [[ -z "$resource" ]] && continue
    if [[ "$NORM" == *"$resource"* ]]; then
      echo "BLOCKED: resource '$resource' exists in both staging and prod." >&2
      echo "Confirm the target profile explicitly before re-running." >&2
      exit 2
    fi
  done < "$DOUBLE_NAMED_FILE"
fi

exit 0
```
Two files do the heavy lifting:
- `.claude/aws-allowed-profiles` is a one-profile-per-line list. The repo's `staging` and `dev` profiles go here. Production does not go here unless the repo is genuinely a production-deploy repo with a CI gate.
- `.claude/aws-double-named-resources` is the trick that caught my incident. Any resource name that exists in both your staging and prod accounts goes here. Functions. Buckets. DynamoDB tables. SQS queues. The hook will refuse the call until the human confirms the profile.
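As an illustration (the profile and resource names are the ones from this incident; substitute your own), here is what those two files contain and what the hook's checks do with them. The snippet rebuilds both files in a throwaway directory and runs the same `grep` and substring tests the hook runs:

```shell
# Illustrative only: rebuild the two control files in a temp dir and
# mimic the hook's checks. Names are from the incident above.
repo=$(mktemp -d) && cd "$repo" && mkdir -p .claude
printf 'client-staging\nclient-dev\n' > .claude/aws-allowed-profiles
printf 'third-party-relay\n' > .claude/aws-double-named-resources

profile="client-prod"   # the mismatched profile from the incident
norm="aws lambda update-function-configuration --function-name third-party-relay"

# Check 1: is the active profile allowlisted for this repo?
if ! grep -qx "$profile" .claude/aws-allowed-profiles; then
  echo "BLOCKED: profile '$profile' not in allowlist"
fi

# Check 2: does the command name a resource that exists in both accounts?
while IFS= read -r resource; do
  [[ -z "$resource" ]] && continue
  if [[ "$norm" == *"$resource"* ]]; then
    echo "BLOCKED: '$resource' exists in both staging and prod"
  fi
done < .claude/aws-double-named-resources
```

Both checks fire on this input, which is exactly what happened at 11:47 PM: wrong profile, collision-prone name.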
This is not a substitute for a CI gate. It is the seatbelt that catches the cases the CI gate cannot see, because the agent is running interactively on a developer laptop.
## `settings.json` wiring
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/aws-account-guard.sh"
          }
        ]
      }
    ]
  }
}
```
Drop both files into the repo. Mark the script executable. Done.
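To sanity-check the wiring before trusting it, hand-feed the hook the same JSON shape it reads on stdin. The payload below is made up, but the two `jq` extractions are the exact ones the hook performs:

```shell
# A fake pre-tool-use payload in the stdin shape the hook parses.
payload='{"tool_name":"Bash","tool_input":{"command":"aws s3 rm s3://some-bucket --recursive"}}'

# The same field extractions the hook runs:
echo "$payload" | jq -r '.tool_name // empty'           # Bash
echo "$payload" | jq -r '.tool_input.command // empty'  # aws s3 rm s3://some-bucket --recursive
```

Pipe that payload into `bash .claude/hooks/aws-account-guard.sh` with `AWS_PROFILE` set to a profile that is not in your allowlist, and you should see the BLOCKED message on stderr and exit code 2.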
## Why this matters beyond NAT Gateway
The NAT Gateway story is the dramatic one. The boring version of this hook fires multiple times a week on my machine. It catches:
- `aws s3 rm --recursive` on a bucket whose name exists in two accounts.
- `terraform apply` in a repo where the agent picked up an old `terraform.tfvars` from a stash.
- `aws iam delete-role` on a role whose name a CDK stack also uses in production.
- `aws rds modify-db-instance` on the wrong database when the test database has the same name pattern.
In all of those cases, the agent was not wrong about the change. It was wrong about where the change was going. The hook does not check the change. It checks the destination.
The destination is the part that bites you.
## The lessons
- Profiles are not authentication. They are intent. Treat them like a hostname. Lock them per-repo.
- Names that collide across environments are the leading cause of incidents. Inventory those names. Put them in a file the hook reads.
- The agent is not the threat. The agent's correctness combined with your wrong context is the threat. Hooks check the context.
- Two AM is when this kind of incident happens. Build the hook before you need it.
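The "inventory those names" step can be scripted rather than maintained by hand. A hedged sketch for Lambda functions, assuming the two profile names used in this post and live credentials; repeat the same pattern per service for buckets, tables, and queues:

```
# Names present in BOTH accounts go into the double-named blocklist.
# Lowercased, because the hook matches against the lowercased command.
comm -12 \
  <(aws lambda list-functions --profile client-staging \
      --query 'Functions[].FunctionName' --output text \
      | tr '\t' '\n' | tr '[:upper:]' '[:lower:]' | sort) \
  <(aws lambda list-functions --profile client-prod \
      --query 'Functions[].FunctionName' --output text \
      | tr '\t' '\n' | tr '[:upper:]' '[:lower:]' | sort) \
  >> .claude/aws-double-named-resources
```

Run it on a schedule if the accounts drift. A stale blocklist fails open, which is the wrong direction.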
If you want a second pair of eyes on your Claude Code + cloud setup, this is the engagement I run. The hook is free. The audit is not.
Run the math on what an avoided incident is worth to your team.
## Receipts
- Incident date: May 6, 2026, 11:47 PM PT through May 7, 12:14 AM PT.
- Projected 24-hour NAT data exposure at peak loop rate: $47,180.
- AWS support credit recovered: approximately $11,000. Your mileage will vary.
- Hook version in production on my machine: 0.4.1. The listing above is that same revision.
- Lines of bash: 70. Lines of YAML or Terraform required to deploy it: zero.
The fastest fix is the one you wrote before the incident.