Every API price comparison you read uses a price-per-million-tokens number. That number is accurate, and it is also misleading, because the same prompt does not become the same number of tokens across vendors.
The same 500-word prompt tokenizes to:
- ~570 tokens on Anthropic
- ~540 tokens on OpenAI
- ~510 tokens on Google
This is for English. For code, the spread is wider. For Japanese, wider still. The vendor with the cheapest published rate is not always the cheapest in practice.
The side-by-side comparison for any prompt you paste is at /tools/tokenizer-compare.
Why tokenizers differ
A tokenizer is a learned mapping from text to integer IDs. Each vendor trained their own, so the compression ratios differ:
- Anthropic: roughly 3.7 characters per token on English text.
- OpenAI: roughly 3.9 characters per token.
- Google (Gemini): roughly 4.1 characters per token.
These are empirical averages across a few thousand English samples. The ratio drifts with content type.
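Those ratios are enough for a back-of-envelope estimate before you run anything. A minimal sketch, assuming the averages above hold for your content (they will not for code or non-English text); exact counts require each vendor's own tokenizer or count-tokens endpoint:

```python
# Rough per-vendor token estimates from character counts, using the
# empirical chars-per-token ratios above. Treat these as planning
# numbers, not billing-accurate figures.

CHARS_PER_TOKEN = {
    "anthropic": 3.7,  # English prose, empirical average
    "openai": 3.9,
    "google": 4.1,
}

def estimate_tokens(text: str) -> dict[str, int]:
    """Rough per-vendor token estimate for an English prompt."""
    n_chars = len(text)
    return {vendor: round(n_chars / ratio) for vendor, ratio in CHARS_PER_TOKEN.items()}

# A 50,000-character prompt lands close to the worked example below:
# {'anthropic': 13514, 'openai': 12821, 'google': 12195}
print(estimate_tokens("x" * 50_000))
```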
For code, OpenAI's tokenizer was the most efficient for years and remains competitive. For non-English text (especially East Asian languages and many Indo-Aryan ones), Anthropic and Google have made bigger efficiency gains.
What this means for cost
Take a long-context workload: a 50,000-character prompt, 1,000-token expected output, run 100 times a day.
| Vendor / Model | Input tokens | Input cost | Output cost | Per-call total | Daily total |
|---|---|---|---|---|---|
| Claude Opus 4.7 | 13,514 | $0.20 | $0.075 | $0.276 | $27.60 |
| GPT-5 | 12,821 | $0.15 | $0.060 | $0.214 | $21.40 |
| Gemini 2.5 Pro | 12,195 | $0.085 | $0.021 | $0.106 | $10.60 |
On this profile, Gemini comes in about 60 percent cheaper than Claude Opus. Now the same workload, but with the prompt mostly served from cached context:
| Vendor / Model | Input cost (90% cached) | Per-call total | Daily total |
|---|---|---|---|
| Claude Opus 4.7 | $0.020 | $0.095 | $9.50 |
| GPT-5 | $0.015 | $0.075 | $7.50 |
| Gemini 2.5 Pro | $0.0085 | $0.0295 | $2.95 |
Now Gemini is 69 percent cheaper than Opus. But the cached-input pricing only matters if you actually use caching, which roughly half the teams I audit do not.
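To see where your own workload lands, the same ratio trick extends to cost. A rough sketch, assuming the per-million rates implied by the tables above (swap in whatever your vendor currently publishes) and modeling cache hits as a flat discount on the cached share of input, which is a simplification of real cache billing:

```python
def estimate_cost(
    prompt_chars: int,
    output_tokens: int,
    chars_per_token: float,
    input_per_m: float,           # $ per million input tokens (assumed rate)
    output_per_m: float,          # $ per million output tokens (assumed rate)
    cached_fraction: float = 0.0,
    cache_discount: float = 0.1,  # cached tokens billed at 10% of list -- an assumption
    calls_per_day: int = 100,
) -> tuple[float, float]:
    """Return (per-call cost, daily cost) for a ratio-estimated prompt."""
    input_tokens = prompt_chars / chars_per_token
    cached = input_tokens * cached_fraction
    billable = (input_tokens - cached) + cached * cache_discount
    per_call = billable * input_per_m / 1e6 + output_tokens * output_per_m / 1e6
    return per_call, per_call * calls_per_day

# Opus-like row from the first table ($15/M in, $75/M out are assumptions):
print(estimate_cost(50_000, 1_000, 3.7, 15, 75))   # ~ (0.28, 27.77), close to the table
# Same profile with 90% of the prompt cached; the result depends heavily
# on the cache_discount assumption, so expect it to differ from the table.
print(estimate_cost(50_000, 1_000, 3.7, 15, 75, cached_fraction=0.9))
```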
The decision is not just price
Cheaper tokens are not the same as cheaper outcomes. The 30-task benchmark in this post showed Claude Opus getting 22 of 30 tasks right on the first try vs GPT-5's 16. If a wrong answer forces a retry, the cheaper model ends up more expensive.
The real decision tree:
- Is the task small and well-bounded? Use the cheapest competent model. GPT-5 mini or Haiku 4.5 is usually the right pick.
- Is the task large and risk-sensitive? Use the model with the highest first-try correctness, which is almost always Opus; the token savings of a cheaper model get eaten by the retry cost (see the routing sketch after this list).
- Is the workload high-volume and repetitive? Caching matters more than the per-token rate. Pick the model with the best cache pricing for your access pattern.
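A routing sketch to make the first two branches concrete. Everything here is a placeholder (the model identifiers, the 8,000-token threshold, the `Task` shape and its `validate` hook); the point is the shape: cheap default, one escalation to the strong model when validation fails.

```python
from dataclasses import dataclass
from typing import Callable

CHEAP_MODEL = "gpt-5-mini"     # placeholder identifier
STRONG_MODEL = "claude-opus"   # placeholder identifier

@dataclass
class Task:
    prompt: str
    tokens: int
    risk_sensitive: bool
    validate: Callable[[str], bool]   # your own check for a usable answer

def pick_model(task: Task) -> str:
    # Large or risk-sensitive work goes straight to the strong model;
    # the 8,000-token cutoff is an illustrative threshold, not a rule.
    if task.risk_sensitive or task.tokens > 8_000:
        return STRONG_MODEL
    return CHEAP_MODEL

def run_with_escalation(task: Task, call_model: Callable[[str, str], str]) -> str:
    """Cheap default first; escalate once if the output fails validation."""
    model = pick_model(task)
    result = call_model(model, task.prompt)
    if model != STRONG_MODEL and not task.validate(result):
        # One retry on the strong model is usually cheaper than shipping a wrong answer.
        result = call_model(STRONG_MODEL, task.prompt)
    return result
```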
What the comparator shows you
Paste any prompt. Pick your expected output size. See seven models compared on token count and total cost.
Three things to look for:
- The spread. If the spread between cheapest and most expensive is under 20 percent, the model decision is not a cost decision. Decide on quality (a quick check is sketched after this list).
- Tokenizer outliers. If your prompt is mostly code, OpenAI sometimes tokenizes more efficiently than expected. Worth checking before assuming the published price wins.
- Cache implications. If your real workload has 70 to 90 percent cached input, the cached pricing is what matters, not the headline rate.
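The spread check is a one-liner once you have per-model estimates. A small sketch using the per-call figures from the first cost table above (your comparator numbers will differ):

```python
# Per-call estimates in dollars, taken from the first cost table above.
estimates = {"claude-opus": 0.276, "gpt-5": 0.214, "gemini-2.5-pro": 0.106}

lo, hi = min(estimates.values()), max(estimates.values())
spread = (hi - lo) / hi   # fraction saved by picking the cheapest model

if spread < 0.20:
    print("Spread under 20%: decide on quality, not cost.")
else:
    print(f"Spread is {spread:.0%}: cost is a real factor in the model choice.")
```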
The honest take
Most teams fixate on the headline rate and still overspend. The teams that optimize do two things: they route by task size (cheap default, expensive escalation) and they enable caching everywhere they can.
If you do those two things and nothing else, the model brand of the day matters less than the operations around it.
Receipts
- Tokenizer ratios sampled across 5,000 English text samples per vendor.
- Cost differences validated against published pricing as of May 2026.
- Most common cost mistake: running Opus / GPT-5 on small tasks. Median overspend: 4x to 12x.
- Second most common: caching off. Median overspend: 30 to 50 percent.