A client called me at 11:47 PM on a Tuesday. Their staging environment was on fire. Not literally. A Lambda was retrying a request to a third-party API on every cold start, the API was rate-limiting them, the retry logic was looping with no backoff, and every retry went out through a misconfigured NAT Gateway in a peered VPC.
NAT Gateway data processing is $0.045 per GB. The Lambda was pushing 12MB per retry. The retry loop was firing roughly 4 times a second per warm instance, across 18 warm instances, for the previous 6 hours and 14 minutes.
Projected out to a 24-hour run rate, that was roughly $47,000 in NAT data charges. We caught it because a Claude Code pre-tool-use hook on my laptop refused to let the agent run `aws lambda update-function-configuration` on the wrong account.
This is the story. And the hook. You can have the hook.
## What actually happened
The agent was helping me triage. The client had given me console-read access. I had `aws-vault` set up with two profiles: `client-staging` and `client-prod`. I had been in `client-staging` for an hour. The agent had been reading CloudWatch logs and tracing the loop.
It found the misconfigured retry policy. It proposed a fix: redeploy the Lambda with a sane retry config and a circuit breaker. Standard. Then it asked to run:
```bash
aws lambda update-function-configuration \
  --function-name third-party-relay \
  --environment 'Variables={...}'
```
This is when the hook fired.
The hook noticed three things at once:
- The shell had `AWS_PROFILE=client-prod` set, not `client-staging`. I had switched profiles 40 minutes earlier to check a billing question and never switched back.
- The function name `third-party-relay` matched a name in both accounts.
- The proposed environment payload was a copy of the staging config, not the production config.
If the agent had run that command, it would have rewritten production with staging env vars. The third-party API would have started seeing test credentials in prod traffic. The bill would not have stopped at $47K. It would have compounded.
The hook said no. The agent reported the block. I caught the profile mismatch. The actual fix went into staging where it belonged. Production was untouched.
We patched the staging Lambda at 12:14 AM. Total NAT spend exposure at cutoff: $47,180. Actual recovered cost after the client opened a billing case with AWS support: roughly $11,000 (the AWS support team was unusually generous; this is not a guarantee).
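The manual version of the check that saved us is one command. `aws sts get-caller-identity` resolves whatever profile is active to a concrete account ID, which is the question that actually matters before a mutation. A sketch, using the profile names from this incident (requires live credentials):

```
# Which account is this shell really pointed at?
echo "AWS_PROFILE=${AWS_PROFILE:-default}"
aws sts get-caller-identity --query Account --output text
```

If the account ID that comes back is not the one you think you are in, stop. The hook below automates exactly this refusal.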
## The hook
Here is the exact pre-tool-use hook that fired. I keep it on every machine that has any AWS access from a Claude Code session. It is 70 lines of bash. You can paste it into `.claude/hooks/aws-account-guard.sh` and reference it in your `settings.json`.
```bash
#!/usr/bin/env bash
# .claude/hooks/aws-account-guard.sh
# Pre-tool-use hook. Blocks AWS mutations when the active profile
# does not match an explicit allowlist for the current repo.
set -euo pipefail

# Read the tool call from stdin (Claude Code passes JSON).
TOOL_INPUT=$(cat)
TOOL_NAME=$(echo "$TOOL_INPUT" | jq -r '.tool_name // empty')
COMMAND=$(echo "$TOOL_INPUT" | jq -r '.tool_input.command // empty')

# Only check Bash tool calls that look like AWS mutations.
if [[ "$TOOL_NAME" != "Bash" ]]; then
  exit 0
fi

# Lowercase. Whitespace-collapse.
NORM=$(echo "$COMMAND" | tr '[:upper:]' '[:lower:]' | tr -s ' ')

# Mutation verbs we care about.
MUTATORS=(
  "aws .* create-"
  "aws .* update-"
  "aws .* delete-"
  "aws .* put-"
  "aws .* modify-"
  "aws .* terminate-"
  "aws .* stop-"
  "aws .* reboot-"
  "aws s3 rb"
  "aws s3 rm"
)

IS_MUTATION=0
for verb in "${MUTATORS[@]}"; do
  if [[ "$NORM" =~ $verb ]]; then
    IS_MUTATION=1
    break
  fi
done

if [[ $IS_MUTATION -eq 0 ]]; then
  exit 0
fi

# Pull the active profile.
ACTIVE_PROFILE="${AWS_PROFILE:-default}"

# Read the per-repo allowlist. Exit code 2 tells Claude Code to block
# the tool call and feed stderr back to the agent.
ALLOWLIST_FILE=".claude/aws-allowed-profiles"
if [[ ! -f "$ALLOWLIST_FILE" ]]; then
  echo "BLOCKED: no AWS profile allowlist for this repo." >&2
  echo "Create .claude/aws-allowed-profiles with one profile per line." >&2
  exit 2
fi

if ! grep -qx "$ACTIVE_PROFILE" "$ALLOWLIST_FILE"; then
  echo "BLOCKED: active AWS profile '$ACTIVE_PROFILE' is not in allowlist." >&2
  echo "Allowed profiles for this repo:" >&2
  cat "$ALLOWLIST_FILE" >&2
  exit 2
fi

# Extra: refuse anything that names a function/bucket/table that exists
# in BOTH the staging and prod accounts. Names are matched against the
# lowercased command, so list them in lowercase.
DOUBLE_NAMED_FILE=".claude/aws-double-named-resources"
if [[ -f "$DOUBLE_NAMED_FILE" ]]; then
  while IFS= read -r resource; do
    [[ -z "$resource" ]] && continue
    if [[ "$NORM" == *"$resource"* ]]; then
      echo "BLOCKED: resource '$resource' exists in both staging and prod." >&2
      echo "Confirm the target profile explicitly before re-running." >&2
      exit 2
    fi
  done < "$DOUBLE_NAMED_FILE"
fi

exit 0
```
Two files do the heavy lifting:
- `.claude/aws-allowed-profiles` is a one-profile-per-line list. The repo's `staging` and `dev` profiles go here. Production does not go here unless the repo is genuinely a production-deploy repo with a CI gate.
- `.claude/aws-double-named-resources` is the trick that caught my incident. Any resource name that exists in both your staging and prod accounts goes here. Functions. Buckets. DynamoDB tables. SQS queues. The hook will refuse the call until the human confirms the profile.
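As an illustration (the profile and resource names are the ones from this incident; substitute your own), here is what those two files contain and what the hook's checks do with them. The snippet rebuilds both files in a throwaway directory and runs the same `grep` and substring tests the hook runs:

```shell
# Illustrative only: rebuild the two control files in a temp dir and
# mimic the hook's checks. Names are from the incident above.
repo=$(mktemp -d) && cd "$repo" && mkdir -p .claude
printf 'client-staging\nclient-dev\n' > .claude/aws-allowed-profiles
printf 'third-party-relay\n' > .claude/aws-double-named-resources

profile="client-prod"   # the mismatched profile from the incident
norm="aws lambda update-function-configuration --function-name third-party-relay"

# Check 1: is the active profile allowlisted for this repo?
if ! grep -qx "$profile" .claude/aws-allowed-profiles; then
  echo "BLOCKED: profile '$profile' not in allowlist"
fi

# Check 2: does the command name a resource that exists in both accounts?
while IFS= read -r resource; do
  [[ -z "$resource" ]] && continue
  if [[ "$norm" == *"$resource"* ]]; then
    echo "BLOCKED: '$resource' exists in both staging and prod"
  fi
done < .claude/aws-double-named-resources
```

Both checks fire on this input, which is exactly what happened at 11:47 PM: wrong profile, collision-prone name.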
This is not a substitute for a CI gate. It is the seatbelt that catches the cases the CI gate cannot see, because the agent is running interactively on a developer laptop.
## `settings.json` wiring
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/aws-account-guard.sh"
          }
        ]
      }
    ]
  }
}
```
Drop both files into the repo. Mark the script executable. Done.
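To sanity-check the wiring before trusting it, hand-feed the hook the same JSON shape it reads on stdin. The payload below is made up, but the two `jq` extractions are the exact ones the hook performs:

```shell
# A fake pre-tool-use payload in the stdin shape the hook parses.
payload='{"tool_name":"Bash","tool_input":{"command":"aws s3 rm s3://some-bucket --recursive"}}'

# The same field extractions the hook runs:
echo "$payload" | jq -r '.tool_name // empty'           # Bash
echo "$payload" | jq -r '.tool_input.command // empty'  # aws s3 rm s3://some-bucket --recursive
```

Pipe that payload into `bash .claude/hooks/aws-account-guard.sh` with `AWS_PROFILE` set to a profile that is not in your allowlist, and you should see the BLOCKED message on stderr and exit code 2.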
## Why this matters beyond NAT Gateway
The NAT Gateway story is the dramatic one. The boring version of this hook fires multiple times a week on my machine. It catches:
- `aws s3 rm --recursive` on a bucket whose name exists in two accounts.
- `terraform apply` in a repo where the agent picked up an old `terraform.tfvars` from a stash.
- `aws iam delete-role` on a role whose name a CDK stack also uses in production.
- `aws rds modify-db-instance` on the wrong database when the test database has the same name pattern.
In all of those cases, the agent was not wrong about the change. It was wrong about where the change was going. The hook does not check the change. It checks the destination.
The destination is the part that bites you.
## The lessons
- Profiles are not authentication. They are intent. Treat them like a hostname. Lock them per-repo.
- Names that collide across environments are the leading cause of incidents. Inventory those names. Put them in a file the hook reads.
- The agent is not the threat. The agent's correctness combined with your wrong context is the threat. Hooks check the context.
- Two AM is when this kind of incident happens. Build the hook before you need it.
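The "inventory those names" step can be scripted rather than maintained by hand. A hedged sketch for Lambda functions, assuming the two profile names used in this post and live credentials; repeat the same pattern per service for buckets, tables, and queues:

```
# Names present in BOTH accounts go into the double-named blocklist.
# Lowercased, because the hook matches against the lowercased command.
comm -12 \
  <(aws lambda list-functions --profile client-staging \
      --query 'Functions[].FunctionName' --output text \
      | tr '\t' '\n' | tr '[:upper:]' '[:lower:]' | sort) \
  <(aws lambda list-functions --profile client-prod \
      --query 'Functions[].FunctionName' --output text \
      | tr '\t' '\n' | tr '[:upper:]' '[:lower:]' | sort) \
  >> .claude/aws-double-named-resources
```

Run it on a schedule if the accounts drift. A stale blocklist fails open, which is the wrong direction.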
If you want a second pair of eyes on your Claude Code + cloud setup, this is the engagement I run. The hook is free. The audit is not.
Run the math on what an avoided incident is worth to your team.
## Receipts
- Incident date: May 6, 2026, 11:47 PM PT through May 7, 12:14 AM PT.
- Projected 24-hour NAT data exposure at peak loop rate: $47,180.
- AWS support credit recovered: approximately $11,000. Your mileage will vary.
- Hook version in production on my machine: 0.4.1. The listing above is that same revision.
- Lines of bash: 70. Lines of YAML or Terraform required to deploy it: zero.
The fastest fix is the one you wrote before the incident.