Tinker AI

Token Counter

Count tokens for GPT-5.5, Claude 4.x, Gemini, DeepSeek, and 20+ models — see what each costs.

| Model | Provider | Input $/M | Output $/M |
|---|---|---|---|
| GPT-5.5 | OpenAI | $15.00 | $60.00 |
| GPT-5 | OpenAI | $5.00 | $25.00 |
| GPT-4o | OpenAI | $2.50 | $10.00 |
| GPT-4o mini | OpenAI | $0.15 | $0.60 |
| o3 | OpenAI | $2.00 | $8.00 |
| Claude Opus 4.7 | Anthropic | $15.00 | $75.00 |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 |
| Claude Haiku 4.5 | Anthropic | $0.25 | $1.25 |
| Gemini 2.5 Pro | Google | $1.25 | $5.00 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 |
| Llama 3.3 70B | Meta | $0.59 | $0.79 |
| DeepSeek V3 | DeepSeek | $0.27 | $1.10 |

When you actually need this

You’re building an agent loop and the prompts are getting long. You want to know if you’re approaching the context window before you ship the change. You wrote a system prompt that you suspect is wasteful and want to see how much you’d save by trimming.

The official tokenizer libraries each require installing an SDK. This page lets you paste text and see the count for 22 models at once, with a cost projection at the listed prices. No signup, no account, no SDK install.

Gotchas we keep hitting

Different models, different tokenizers. OpenAI’s o200k_base, Anthropic’s Claude tokenizer, and Cohere’s BPE all carve up the same string differently. For English prose the variance is small. For Chinese, Korean, or Japanese it’s significant — a paragraph that’s 200 tokens for GPT can be 250 for Claude. Code with lots of underscores and dots tokenizes differently from natural language.
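You can check this yourself. Here's a minimal sketch using js-tiktoken (the library this page uses for OpenAI counts, per the FAQ below) to compare two OpenAI encodings on the same strings; the samples are illustrative, and o200k_base support depends on your js-tiktoken version:

```ts
import { getEncoding } from "js-tiktoken";

const samples = [
  "The quick brown fox jumps over the lazy dog.", // English prose
  "빠른 갈색 여우가 게으른 개를 뛰어넘는다.", // Korean: expect a higher count
  "def snake_case_name(): return obj.attr.value", // code with underscores/dots
];

for (const name of ["o200k_base", "cl100k_base"] as const) {
  const enc = getEncoding(name);
  for (const text of samples) {
    // encode() returns the token ids; length is the count this page shows
    console.log(name, enc.encode(text).length, JSON.stringify(text.slice(0, 24)));
  }
}
```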

System prompt + tools schema count too. When debugging “why is my prompt expensive?”, remember that the system prompt and any tools/functions schema get tokenized on every call. A 2,000-token tools schema adds 2,000 input tokens to every single request. Cache it (Anthropic prompt caching, OpenAI prompt caching) and the marginal cost drops to near zero.
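As a sketch of what that looks like on the Anthropic side, using the Messages API's cache_control fields; the tool schema and model id here are made up for illustration:

```ts
// Mark the large, stable parts of the prompt (tools schema, system prompt)
// as cacheable so repeat calls bill them at the reduced cache-read rate.
const body = {
  model: "claude-sonnet-4-6", // hypothetical model id for this sketch
  max_tokens: 1024,
  tools: [
    {
      name: "search_catalog", // made-up example tool
      description: "…imagine a 2,000-token schema here…",
      input_schema: { type: "object", properties: { query: { type: "string" } } },
      cache_control: { type: "ephemeral" }, // caches everything up to this block
    },
  ],
  system: [
    { type: "text", text: "You are a helpful agent…", cache_control: { type: "ephemeral" } },
  ],
  messages: [{ role: "user", content: "find red shoes" }],
};
```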

Output tokens are usually the cost driver. Models charge 3–5x more per output token than per input token. A typical chat call has 500 input tokens and 1,500 output tokens, so output runs 9–15x the input cost: 3x the tokens at 3–5x the rate. When choosing models, look at the output rate, not the input rate.
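The arithmetic is simple enough to inline. A minimal helper, priced here at the Claude Sonnet rates from the table above:

```ts
// Per-million-token billing: cost = (tokens / 1M) × rate
const costUSD = (tokens: number, dollarsPerMillion: number) =>
  (tokens / 1_000_000) * dollarsPerMillion;

const input = costUSD(500, 3.0);    // $0.0015
const output = costUSD(1500, 15.0); // $0.0225, i.e. 15x the input cost
console.log({ input, output, ratio: output / input });
```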

HTML and Markdown eat tokens fast. A code block in fenced Markdown costs more than the same code in plain text: the fence backticks are extra tokens, and Markdown link syntax adds more. If you’re feeding the model HTML, strip it to text first; you’ll often cut input tokens by 30%.
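A rough sketch of the strip-to-text step; a production pipeline would use a real HTML parser, and these regexes are only illustrative:

```ts
// Strip HTML down to whitespace-collapsed text before tokenizing.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "") // drop non-content blocks
    .replace(/<[^>]+>/g, " ")                       // drop remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/\s+/g, " ")                           // collapse whitespace runs
    .trim();
}
```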

Whitespace and Unicode normalization matter. Trailing spaces, runs of consecutive newlines, and combining Unicode characters (e.g., emoji modifiers) all cost more tokens than they appear to, sometimes meaningfully more.
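A sketch of those normalizations; note that NFC folds some, but not all, combining sequences into single code points:

```ts
// Cheap pre-tokenization cleanup: normalize Unicode, trim trailing
// whitespace per line, and collapse runs of blank lines.
function normalizeForPrompt(text: string): string {
  return text
    .normalize("NFC")            // fold combining characters where possible
    .replace(/[ \t]+$/gm, "")    // strip trailing spaces/tabs on each line
    .replace(/\n{3,}/g, "\n\n"); // collapse 3+ newlines to one blank line
}
```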

The pricing snapshot is from a specific date. This tool’s prices were verified on the last_reviewed date. Providers change pricing without much notice; if your bill doesn’t match a displayed rate, check the provider’s current pricing page.

With AI in the loop

Before you scale up an automated workflow, run a representative prompt and input through this counter and project the cost across your expected volume. Many “AI is too expensive” complaints come from underestimating output tokens. If your prompt asks the model to “explain its reasoning step by step”, expect output to be 3–5x your typical chat call.
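A quick projection sketch, reusing the costUSD helper from the earlier example; the token counts and volume are placeholders, so measure your own with the counter first:

```ts
// Monthly spend = per-call cost × calls/day × 30
const callsPerDay = 50_000;
const perCall = costUSD(800, 3.0) + costUSD(2400, 15.0); // input + output
console.log(`~$${(perCall * callsPerDay * 30).toFixed(0)}/month`); // ≈ $57,600 here
```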

For cost-sensitive workflows, A/B test prompt variants against this counter before deploying. We’ve cut a production agent’s cost by 40% just by tightening the system prompt and removing a redundant tools schema entry.

When choosing between Claude Haiku and GPT-4o mini for a high-volume task, run the actual prompt through both and compare end-to-end cost: (input tokens × input $/M) + (output tokens × output $/M). The headline rates ($0.25/M vs $0.15/M) are misleading when the tokenizers count differently. Sometimes Haiku is cheaper end-to-end despite the higher per-token rate, because it tokenizes code-heavy text more compactly.
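A comparison sketch along those lines; countTokensFor is a hypothetical stand-in for whatever per-model tokenizer you use (exact for OpenAI, approximate elsewhere), and the rates come from the table above:

```ts
// Placeholder tokenizer: swap in the real per-model count.
// This stub just estimates ~4 characters per token.
const countTokensFor = (_model: string, text: string): number =>
  Math.ceil(text.length / 4);

// End-to-end cost: the same prompt counted and priced per model.
function endToEnd(model: string, inRate: number, outRate: number,
                  prompt: string, expectedOutputTokens: number): number {
  return costUSD(countTokensFor(model, prompt), inRate)
       + costUSD(expectedOutputTokens, outRate);
}

const prompt = "…your actual production prompt…";
console.log("haiku   ", endToEnd("claude-haiku-4-5", 0.25, 1.25, prompt, 600));
console.log("4o-mini ", endToEnd("gpt-4o-mini", 0.15, 0.60, prompt, 600));
```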

FAQ

How accurate are the counts?
For OpenAI models we use js-tiktoken with the official encoding tables — exact match to the API. For Anthropic we use their published BPE specification, so counts match the official tokenizer. For Gemini, Llama, DeepSeek, and others we approximate using cl100k_base, with measured accuracy noted per model (88% to 95%).
Where do the prices come from?
Each model's row links to the provider's pricing page where the number was last verified. The Pricing GitHub Action (Plan 4) refreshes them weekly; manual edits to models.yaml are also welcome.
Why is output priced higher than input?
Generation is more compute-intensive than reading. Output tokens go through the model autoregressively (one at a time); input tokens are processed in parallel. The cost ratio is typically 3-5x.
Does this count my system prompt?
No. The counter measures only what you paste into the input. For the full prompt cost, paste everything together: system message, tools schema, and user message.
Why does the same text count differently across models?
Each model family has its own tokenizer. Claude's BPE merges different sub-strings than OpenAI's o200k_base. For English prose the difference is usually under 5%; for code, CJK text, or unusual symbols it can be 15%+.

Related tools in your toolbox