Over the last 30 days I ran Claude Code as the primary pair-programmer across seven concurrent projects spanning consumer mobile, backend services, trading automation, personal web, networking, open-source docs, and a tax utility.
Here's the receipt: 426 commits, 347 co-authored by Claude (81%), 17 PRs merged, +144,484 / −33,762 lines across 7,082 file-touches. Estimated ~$1,100 in tokens for the month.
(Project names are redacted below. The scope and stack are real.)
TL;DR
- 7 projects in parallel, not one. Different stacks, different quality bars.
- 426 commits in 30 days. 347 co-authored by Claude (81%).
- 17 merged PRs, spanning mobile feature parity, backend services, web redesigns, infra work.
- +144,484 insertions / −33,762 deletions. Net +110,722 lines.
- 7,082 file-touches (same file can be edited many times; this is Claude's working surface area, not unique file count).
- ~$1,100 estimated in Claude tokens at published Sonnet pricing. Labeled estimate. Anthropic console is authoritative.
- ~173 hours reclaimed at a conservative 30-minute-per-commit average. Roughly $16K of labor for $1.1K in tokens. Call it a ~15x return for the month.
- 1 real near-miss on the consumer mobile app: the onboarding "fix" that was a regression in disguise. Details under Week 4 below.
The seven repos
| Project | Stack | Commits | Claude | PRs | Net lines |
|---|---|---|---|---|---|
| Consumer SaaS (mobile + web + infra) | iOS, Android, Next.js, Workers, AWS, Terraform | 222 | 209 | 4 | +70,002 |
| Trading automation platform | Python, backtesting, live execution | 130 | 68 | 11 | +18,589 |
| Personal brand site | Next.js 15, React 19 | 53 | 53 | 1 | +10,573 |
| Self-hosted VPN | Ansible, WireGuard, Raspberry Pi | 4 | 4 | 0 | +5,251 |
| Home network rebuild | UniFi, DNS, scripts | 9 | 9 | 0 | +5,339 |
| Open-source curated list | Markdown | 6 | 2 | 1 | +162 |
| Tax automation utility | Python | 2 | 2 | 0 | +806 |
| TOTALS |  | 426 | 347 | 17 | +110,722 |
Three of these were already live in production when the 30 days started. Four were either greenfield or got a major rebuild during the month.
The setup (same guardrails across every repo)
A single CLAUDE.md template per repo, tuned to the stack but enforcing the same rules:
- `settings.json` scoped to the repo only. No writes outside.
- Pre-tool-use hooks block destructive operations (`rm -rf`, `git push --force`, `terraform destroy`).
- No production secrets in context. Dev environments only. Real deploys gated by normal CI after merge.
- Manual review required on `terraform/`, `.github/workflows/`, and anything touching auth or data migrations.
- Trivial PRs (docs, formatting, dep bumps) auto-merge with green CI after the first 14 days.
- Per-task token budget in hooks so runaway sessions couldn't burn $50 silently.
Writing that template once and dropping it into every repo was the single highest-leverage hour of the month.
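For concreteness, here's roughly what the destructive-command hook looks like. This is a sketch: the payload shape (`tool_name`, `tool_input.command`) and the exit-code convention are assumptions from my setup, so verify them against the Claude Code hooks docs before copying.

```python
"""Sketch of a pre-tool-use hook that blocks destructive shell commands.

Assumed contract (verify against the Claude Code hooks docs): the pending
tool call arrives as JSON on stdin with tool_name / tool_input fields,
and a nonzero exit code denies the call.
"""
import re
import sys

BLOCKED_PATTERNS = [
    r"\brm\s+-[a-z]*r[a-z]*f",   # rm -rf and flag combos like -rfv
    r"\bgit\s+push\b.*--force",  # force pushes
    r"\bterraform\s+destroy\b",  # infra teardown
]

def is_destructive(command: str) -> bool:
    """True if the shell command matches any blocked pattern."""
    return any(re.search(p, command) for p in BLOCKED_PATTERNS)

def decide(call: dict) -> int:
    """Return the hook's exit code: 0 allows the tool call, 2 denies it."""
    command = call.get("tool_input", {}).get("command", "")
    if call.get("tool_name") == "Bash" and is_destructive(command):
        print(f"blocked destructive command: {command}", file=sys.stderr)
        return 2
    return 0

# Example payloads (field names assumed, not verbatim from the docs):
assert decide({"tool_name": "Bash", "tool_input": {"command": "rm -rf /tmp/x"}}) == 2
assert decide({"tool_name": "Bash", "tool_input": {"command": "git status"}}) == 0
```

A denylist regex is crude, and that's fine: the point is a cheap tripwire in front of the scary commands, not a shell parser.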
Week 1: Ramp-up (Mar 20 to 27)
0 Claude co-authored commits on the main SaaS project. All of week 1 was guardrails: the CLAUDE.md template, the hook config, the per-repo settings.json, a shared agents directory for repeatable reviewers.
Counter-intuitive but load-bearing. If I'd skipped this week and started with a fresh agent against the open repo, week 2 would have been the incident story instead of the productivity story.
The smaller side projects (home network, self-hosted VPN, tax utility) did get light Claude use this week. Low-stakes places to stress-test the hooks.
Week 2: Hitting stride (Mar 27 to Apr 3)
More than 150 commits on the main SaaS alone. This was the week the productivity claim stopped feeling like vendor marketing.
The big wins:
- Complete mobile feature parity push between iOS and Android on the consumer SaaS. Same nav, same state flows, same visual language. Thousands of file-touches on iOS, a few hundred on Android.
- The trading platform's backtesting engine gained a full risk-management layer, and the month's trading-pipeline work produced 11 merged PRs.
- The Next.js dashboard on the SaaS got a full redesign + dark mode polish.
- CI coverage went from "some tests" to green-required baseline across three of the seven repos.
Typical cost-per-commit stabilized mid-week. Claude had read enough of each repo to stop asking where things live.
Week 3: Polish and parity (Apr 3 to 10)
50+ commits across the active projects. Slope changes here because the high-leverage work was mostly done. What remained: UI consistency, platform parity bugs, error-state polish.
One surprise. Claude insisted on writing tests for boring display components I'd have skipped. That coverage paid for itself the first time I refactored shared layout.
Week 4: The incident (Apr 10 to 17)
Here's the one that almost shipped. On the consumer SaaS, Claude opened a PR that "fixed" an iOS onboarding bug: the flow advanced to the success screen even when the backend registration had failed. The bug was real. The fix looked clean.
What I caught on manual review: the fix silenced the error but didn't surface a retry flow. Users would have seen a spinner, then nothing, with no indication anything was wrong. The "fix" was a regression in disguise. The visible bug was gone, the user experience was silently worse.
We caught it before merging to main. Lessons baked into the ruleset immediately:
- Any commit that touches auth, onboarding, or registration flows requires a second review.
- Any `catch` block Claude adds needs an explicit user-facing error path in the same PR.
- "The screen no longer advances incorrectly" is step 1, not the fix.
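Stripped of the iOS specifics, the pattern the ruleset now bans is easy to show generically (hypothetical names, sketched in Python for brevity):

```python
class ApiError(Exception):
    """Stand-in for any backend registration failure."""

def register_bad(api, user) -> bool:
    """The near-miss pattern: the visible bug goes away, the failure doesn't."""
    try:
        api.register(user)
    except ApiError:
        pass  # swallowed: the flow "works", the user is silently stuck
    return True

def register_good(api, user, show_retry) -> bool:
    """The rule: every catch ships with a user-facing error path in the same PR."""
    try:
        api.register(user)
        return True
    except ApiError as err:
        show_retry(err)  # surface the failure and offer a retry
        return False
```

Both versions stop the screen from advancing incorrectly. Only one of them tells the user anything went wrong.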
Across 347 Claude-co-authored commits, one near-miss caught at review. That's a ~0.3% rate overall. Low, but not zero, and it landed on an auth-critical flow. Hence the guardrails.
What shipped that I wouldn't have
- Cross-platform UI parity at a quality level I'd never have shipped manually. Claude kept a running mental model of "if I change this in iOS, I should update the Android equivalent."
- Boilerplate tests for data models and view states. Probably 400+ assertions I'd have waived as YAGNI.
- Module-level README docs across every `services/` folder on every repo. Nobody writes these. Claude happily did.
- Self-hosted VPN rebuild from scratch with proper Ansible playbooks. 5,251 lines of infrastructure-as-code on a repo that had been "a bag of bash scripts" for three years.
- A tax automation utility I'd been meaning to write since January. Shipped in one session.
The math
At Sonnet blended pricing (~$8 per 1M tokens, weighted mix of input/output), 347 Claude-assisted commits at 400K tokens apiece roll up to **$1,100 for the month.** That's an estimate, not a bill. The real number is whatever your Anthropic console shows, and prompt-caching will drive it lower if you're doing multi-turn sessions on the same repo (my cache hit rate hovered near 99.99% when I spot-checked it).
Value reclaimed, conservatively: ~30 minutes saved per commit (typing, context-switching, boilerplate). 347 × 0.5h = ~173 hours reclaimed. At a senior engineer fully-loaded rate ($180K/yr → ~$93.75/hr), that's roughly $16,200 of labor for $1,100 in tokens. Call it a ~15x return for the month.
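The whole calculation fits in a few lines if you want to rerun it with your own constants:

```python
# Back-of-envelope with the post's constants; swap in your own.
claude_commits    = 347
tokens_per_commit = 400_000           # rough per-commit session size
blended_price     = 8 / 1_000_000     # ~$8 per 1M tokens, Sonnet blend

token_cost  = claude_commits * tokens_per_commit * blended_price  # ~$1,110
hours_saved = claude_commits * 0.5             # 30 min saved per commit
hourly_rate = 180_000 / 1_920                  # fully loaded, ~$93.75/h
labor_value = hours_saved * hourly_rate        # ~$16,200
multiple    = labor_value / token_cost         # ~15x
```

Every input here is debatable; the output stays double-digit even if you halve the minutes-per-commit and double the token estimate.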
Your numbers will differ. The point: the rate is high enough that "is this worth it" is the wrong question. The right question is "what are my guardrails."
Calculate your own team's version →
Every constant I used: the methodology page.
What I'd tell a CTO considering this
- Budget a ramp week. Week 1 was 0 commits on the main project, all guardrails. That week is non-negotiable. It's what separates "productivity story" from "incident story."
- One `CLAUDE.md` template, many repos. Spreading across seven projects was only possible because the rules were consistent. Write it once, tune per stack.
- Hooks are not optional. Every scary story about agent coding I've heard traces back to missing pre-tool-use hooks, not to the model.
- Budget per task, not per month. A stuck session burns $50/hr. Token budgets per task catch this before the bill does.
- Keep humans on auth, networking, secrets, data migrations. Always. The onboarding story is a reminder that "the visible bug went away" is not the same as "the bug is fixed." Agents write fluent suppression code faster than you can notice it.
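The per-task budget doesn't need to be fancy. Here's the shape of the idea as an illustrative sketch (hypothetical class, not a real Claude Code API):

```python
class TokenBudget:
    """Illustrative per-task budget guard (not a real Claude Code API)."""

    def __init__(self, limit_tokens: int):
        self.limit = limit_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        """Record usage; raise once the task crosses its budget."""
        self.used += tokens
        if self.used > self.limit:
            raise RuntimeError(
                f"task budget exceeded: {self.used:,} > {self.limit:,} tokens"
            )
```

Wire something like this into the same pre-tool-use hook and a stuck session fails fast instead of burning $50 silently.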
Caveats
- Sample size: one senior engineer, one month, seven concurrent projects. Directional, not gospel.
- I'm a senior engineer with 10+ years of cloud infrastructure experience. A junior would get a very different leverage ratio, especially on the "Claude wrote a bad fix" detection part.
- Token cost is estimated from published pricing. Your Anthropic console is the authoritative source.
- "Hours reclaimed" counts typing + context-switching. It doesn't quantify the interrupt avoidance, the tests I wouldn't have written, the cross-platform parity I wouldn't have shipped, or the VPN I wouldn't have rebuilt.
- 7 projects in 30 days isn't representative of a 9-to-5 workload. I was aggressively using Claude Code from ~6am personal-project time, during work, and evenings. Your cadence will differ.
- Project names are redacted intentionally. Happy to share more under NDA during a consulting call; not comfortable posting the GitHub URLs publicly.
If you're rolling this out
I help engineering teams dial in their Claude Code setup without breaking prod: audits, fractional support, leadership workshops. Details →
Or book a 30-min intro call if you'd rather talk first.