Token Counter
Estimate token counts across GPT-4, Claude, Llama, Gemini, and other LLMs. Paste text, see counts per model side-by-side. Files never leave your browser.
| Model | Tokens | Input $/1M | Cost (this text) |
|---|---|---|---|
What is this for?
Every interaction with an LLM is metered in tokens — sub-word units that the model's tokenizer carves from your text. Tokens drive both context-window limits ("does this fit?") and pricing ("how much will this cost?"). The exact count depends on the model's tokenizer, which you usually don't have at hand. This tool gives you a fast estimate for every major model side-by-side, plus the dollar cost for the input text against each model's published per-token price.
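The estimate behind a tool like this can be as simple as a characters-per-token ratio. A minimal sketch, assuming the common ~4-characters-per-token rule of thumb for English prose (the function names and the ratio are illustrative, not the tool's actual code):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose (assumed ratio)."""
    return max(1, round(len(text) / 4))

def estimate_cost_usd(text: str, price_per_million: float) -> float:
    """Dollar cost of sending `text` as input at a given per-1M-token price."""
    return estimate_tokens(text) * price_per_million / 1e6
```

Real tokenizers vary per model, which is why the per-model estimates in the table above differ even for the same text.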
When to use it
- Sizing a system prompt to make sure it (plus expected user input and headroom for output) fits the context window.
- Estimating the API cost of a batch job before you run it — paste 100 representative inputs and multiply.
- Comparing prompt-engineering iterations: did the new prompt actually get shorter, or did you just feel like it did?
- Sanity-checking that a "this is too long" error wasn't caused by hidden whitespace, BOM markers, or copy-paste noise.
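The batch-job case above amounts to averaging a sample and multiplying out. A sketch under the same assumed 4-chars-per-token heuristic (names are hypothetical):

```python
def batch_cost_estimate(sample_inputs: list[str], total_count: int,
                        price_per_million: float, chars_per_token: int = 4) -> float:
    """Extrapolate the input cost of a batch job from representative samples."""
    avg_tokens = sum(len(s) for s in sample_inputs) / len(sample_inputs) / chars_per_token
    return avg_tokens * total_count * price_per_million / 1e6
```

Paste ~100 representative inputs, average them, and scale to the full batch size.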
Honesty about accuracy
- These are heuristics, not the real tokenizers. tiktoken (OpenAI), Anthropic's tokenizer, and SentencePiece (Llama, Gemini, Mistral) each carve text differently. For English prose, our estimates land within ±5%. Code, dense JSON, and CJK text drift to ±10% or worse.
- Why we don't ship the real tokenizers. tiktoken alone is ~1 MB of WASM + data files; loading it just to count tokens would bloat the page tenfold. If you need exact counts (e.g. you're hitting a hard 8k context limit), run `tiktoken` in Python locally or call the model's `/v1/tokenize` endpoint.
- What we get right. Relative ordering (which model uses more tokens for the same text) is reliable. Cost rankings are usually accurate. Order-of-magnitude estimates ("is this 500 or 5000 tokens?") are dead-on.
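Getting an exact count locally is a few lines of Python. A sketch that uses tiktoken's `cl100k_base` encoding (the GPT-4 family) when the package is installed, falling back to an assumed 4-chars-per-token heuristic when it isn't:

```python
def exact_token_count(text: str, encoding: str = "cl100k_base") -> int:
    """Exact count via tiktoken if available; otherwise a rough heuristic."""
    try:
        import tiktoken  # pip install tiktoken
        return len(tiktoken.get_encoding(encoding).encode(text))
    except ImportError:
        # Fallback heuristic (assumption): ~4 characters per token.
        return max(1, len(text) // 4)
```

Note the exact count is only exact for models that actually use that encoding; other providers need their own tokenizer or tokenize endpoint.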
Pricing notes
- Costs are input prices per 1M tokens as of 2025. Output tokens are typically more expensive — multiply by 3–5× for a worst-case output estimate.
- Llama 3 shows zero cost because the typical deployment is self-hosted. Hosted offerings (Together, Groq, Fireworks) charge $0.20–$1 per 1M depending on size.
- Prices change. Check the provider's pricing page before relying on these numbers for a real budget.
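Putting the notes above together, a worst-case budget multiplies expected output tokens by the 3–5× markup. A sketch using 4× as an assumed midpoint (the function and default are illustrative):

```python
def worst_case_cost(input_tokens: int, output_tokens: int,
                    input_price_per_m: float, output_multiplier: float = 4.0) -> float:
    """Total cost assuming output tokens cost ~4x the input price (assumption)."""
    input_cost = input_tokens * input_price_per_m / 1e6
    output_cost = output_tokens * input_price_per_m * output_multiplier / 1e6
    return input_cost + output_cost
```

Always substitute the provider's actual output price when it's published; the multiplier is only a fallback.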
Common gotchas
- Tokens ≠ words. An English word averages ~1.3 tokens; "antidisestablishmentarianism" alone is roughly 7. Code and structured text produce more tokens per character than prose.
- CJK text is dense. Each Chinese / Japanese / Korean character can be its own token, so 1000 chars ≈ 1000 tokens — much more expensive per "character" than English.
- Hidden characters add up. Pasted text with zero-width joiners, NBSPs, or BOMs gets counted too. Use the Unicode Inspector tool if your token count looks suspiciously high.
- System + user + assistant tokens compound. The "context window" budget includes every message in the conversation. Don't size your input against the raw limit; leave 30–50% headroom for replies.
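The hidden-character gotcha above is easy to check for yourself. A minimal scanner using Python's standard `unicodedata` module (the `SUSPECTS` list is a small, non-exhaustive sample):

```python
import unicodedata

# Common invisible characters that silently inflate token counts (non-exhaustive).
SUSPECTS = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200d": "ZERO WIDTH JOINER",
    "\u00a0": "NO-BREAK SPACE",
    "\ufeff": "BOM",
}

def find_hidden_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, name) pairs for invisible or format characters in text."""
    hits = []
    for i, ch in enumerate(text):
        if ch in SUSPECTS:
            hits.append((i, SUSPECTS[ch]))
        elif unicodedata.category(ch) == "Cf":  # other Unicode format characters
            hits.append((i, unicodedata.name(ch, "FORMAT CHAR")))
    return hits
```

An empty result means the pasted text is clean; a long list explains a suspiciously high count.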