If you are a serious Claude user (paying for Claude.ai Pro or Max, hitting the Anthropic API from a script, or running Claude Code on a daily basis), there comes a moment when "the bill was bigger this month" stops being acceptable as a diagnosis. You want to know which project drove it, which model ate the budget, and whether you are actually getting more done.
Anthropic gives you some of this information. Not all of it. This guide walks through what's available at each surface, where the blind spots are, and how to wire up a complete picture.
The three surfaces, the three views.
Most people meet Claude through one of three doors, each with a very different relationship to "tokens":
- claude.ai: the web app and mobile apps. Flat-rate subscription (Free, Pro, Max). No token counts shown by default. Limits enforced as "messages per X hours."
- The Anthropic API: pay-per-token. Every response from the API includes a usage block with input_tokens, output_tokens, and cache counters.
- Claude Code: the CLI / IDE assistant. Backed by the API under the hood; some sessions are subscription-bundled, others bill per token.
These three are billed differently, gated differently, and instrumented differently. A complete picture of "my Claude usage" has to combine all three. Most users only see one.
What Anthropic shows you out of the box.
1. The Claude.ai web app.
On Claude.ai itself, token counts are essentially hidden. You get:
- A rolling limit indicator when you bump up against your quota ("You've reached your limit. Try again at 4:00 PM").
- A model picker (Sonnet, Opus, Haiku), but no indicator of cost-per-message for each.
- A retention setting and a list of conversations. No usage history view.
In other words: you can use Claude all day and have zero visibility into how close you are to the next limit, which conversations were expensive, or whether Opus would have been overkill for any given thread. The product is intentionally simple. Useful, until you want to optimize.
2. The Anthropic console.
If you have an API key, console.anthropic.com gives you a usage dashboard with daily token volume per model and a cost total. This is genuinely useful, but only for API traffic. It does not show your Claude.ai web usage or your Claude Code sessions in any meaningful breakdown.
You can also see invoices and set workspace-level spend limits. What you cannot see: per-prompt cost, per-script attribution, or which one of your shell pipelines accidentally hit Opus 4,000 times last Tuesday.
3. Claude Code and the API response.
Every API response includes a usage object that looks something like:

    {
      "usage": {
        "input_tokens": 1842,
        "output_tokens": 731,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
      }
    }

This is the raw ground truth. If you control the code calling the API, you can log this to your own store and reconstruct usage at whatever granularity you want: per request, per project, per user. For Claude Code specifically, the CLI emits its own session logs, but they live on your machine and nobody is rolling them up for you.
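As a minimal sketch of that logging, assume you already hold the parsed response body as a Python dict. The helper below copies the four counters out of usage and appends one JSON line per request. The function names, the project tag, and the usage_log.jsonl path are illustrative choices of ours, not part of any Anthropic tooling.

```python
import json
import time
from pathlib import Path

# Counter names as they appear in the usage object shown above.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def usage_record(response: dict, project: str) -> dict:
    """Build one log row from a parsed API response (hypothetical helper)."""
    usage = response.get("usage", {})
    row = {
        "ts": time.time(),  # when we logged it, as a Unix timestamp
        "project": project,  # our own attribution tag, not an API field
        "model": response.get("model", "unknown"),
    }
    for field in USAGE_FIELDS:
        # Missing or null counters are treated as zero.
        row[field] = usage.get(field) or 0
    return row


def append_usage(response: dict, project: str, path: str = "usage_log.jsonl") -> dict:
    """Append one row per request to a local JSON Lines file."""
    row = usage_record(response, project)
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row) + "\n")
    return row
```

Only counts, the model name, a timestamp, and your own tag are written; the prompt and response text never touch the log.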
The three blind spots almost everyone has.
Blind spot 1: claude.ai usage is invisible.
Measured in hours of the day, most Claude usage happens in the web app. The web app shows you nothing about tokens. So the "headline" surface, where the actual conversations happen, is the one with the worst instrumentation by default.
The only way to fix this is to count tokens client-side, in the browser, while you chat. That is exactly what a browser extension can do: read the prompt and response from the DOM, run a tokenizer locally, and surface a count. The prompt text never has to leave your machine.
Blind spot 2: cross-platform totals don't exist anywhere.
Your real AI bill is "Claude + ChatGPT + Cursor + whatever your team added last week." Anthropic only shows you Anthropic. OpenAI only shows you OpenAI. There is no native dashboard that adds them up, and there will not be: none of the providers has an incentive to build it.
We wrote more about why this matters in "Why tracking tokens matters." The short version: until you can see total spend across all your AI tools in one place, you cannot make sensible decisions about which one to lean on for which kind of work.
Blind spot 3: which conversations were expensive.
Even when you have monthly totals, you usually cannot answer "which five conversations cost the most?" That is the question that actually drives optimization, because nine times out of ten it's the same one or two long-running threads doing 60% of the damage. Once you see them, you can split them or move them to a cheaper model without changing anything else about how you work.
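Once per-request rows exist, the "which five conversations" question is a small aggregation. The sketch below assumes each row carries a conversation_id and a cost_usd field; both names, and the sample rows, are invented for illustration.

```python
from collections import defaultdict


def top_conversations(rows, n=5):
    """Return the n costliest conversations as (conversation_id, cost, share)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["conversation_id"]] += row["cost_usd"]
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [
        (conv_id, cost, cost / grand_total if grand_total else 0.0)
        for conv_id, cost in ranked
    ]


# Made-up rows, purely for illustration.
rows = [
    {"conversation_id": "refactor-thread", "cost_usd": 6.0},
    {"conversation_id": "quick-question", "cost_usd": 1.0},
    {"conversation_id": "refactor-thread", "cost_usd": 2.0},
    {"conversation_id": "email-draft", "cost_usd": 1.0},
]
for conv_id, cost, share in top_conversations(rows, n=2):
    print(f"{conv_id}: ${cost:.2f} ({share:.0%} of total)")
```

Reporting each conversation's share of the total alongside its cost is what makes the one or two dominant threads obvious at a glance.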
In our data, a share of typical Claude spend comes from repeated or rephrased prompts within a single session. The retry tax is real, and you cannot see it without per-prompt counts.
How to actually get a complete picture.
Here is the stack we recommend if you want full visibility, in the order you would build it:
Step 1: install a browser extension that counts tokens locally.
For claude.ai (and the other web chat surfaces), the only sane answer is a browser extension that reads the page DOM and runs the tokenizer in JavaScript on your machine. Look for one that explicitly does not transmit prompt content. TokenEyez does this; so do a couple of competitors. Whichever you pick, check the privacy policy.
What this gives you: per-message input and output token counts in the Claude UI as you type, a daily total, a per-conversation breakdown, and a running API-equivalent cost, meaning what the same tokens would have cost at pay-per-token rates. That last number lets you compare the implied per-token value of your Claude.ai subscription against the API.
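The cost-equivalent figure is plain arithmetic over token counts. Here is a rough sketch: the model names and prices in the table are placeholders we made up, so swap in the current figures from Anthropic's pricing page, and note that cache tokens are ignored for simplicity.

```python
# Placeholder prices in USD per million tokens. These are NOT real rates;
# replace them with the numbers from Anthropic's current pricing page.
PRICES_PER_MTOK = {
    "example-large": {"input": 10.0, "output": 50.0},
    "example-small": {"input": 1.0, "output": 5.0},
}


def cost_equivalent(model: str, input_tokens: int, output_tokens: int) -> float:
    """API-equivalent cost in USD for one message, under the table above."""
    price = PRICES_PER_MTOK[model]
    return (
        input_tokens * price["input"] + output_tokens * price["output"]
    ) / 1_000_000


# Token counts reused from the usage example earlier in this guide.
print(f"${cost_equivalent('example-large', 1842, 731):.5f}")
```

Input and output tokens are priced separately, so a single blended per-token rate will usually misstate the cost of long responses.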
Step 2: connect API usage if you have a key.
If you call the Anthropic API directly (from scripts, agents, or Claude Code), wire those responses into the same store. The usage field is right there in every response; you just need a script or a logging hook to capture it.
Once you have both web and API in one place, you can answer questions like: "Am I getting better leverage out of my $20 Pro plan than out of my $80 of API spend?" That is exactly the kind of question a finance-minded person wants answered, and neither Anthropic surface can answer it alone.
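One way to frame that comparison is the effective price per million tokens on each surface. In the sketch below the dollar amounts come from the question above, while the token totals are invented, so the printed rates are only an illustration of the arithmetic.

```python
def usd_per_million_tokens(spend_usd: float, tokens: int) -> float:
    """Effective price paid per million tokens on one surface."""
    if tokens <= 0:
        raise ValueError("need a positive token count to compute a rate")
    return spend_usd * 1_000_000 / tokens


# Dollar amounts come from the question above; token totals are invented.
pro_rate = usd_per_million_tokens(20.0, 4_000_000)
api_rate = usd_per_million_tokens(80.0, 8_000_000)
print(f"Pro plan: ${pro_rate:.2f} per million tokens")
print(f"API:      ${api_rate:.2f} per million tokens")
```

A lower effective rate is not automatically better leverage, since the model mix on each surface may differ, but it gives the two bills a common unit.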
Step 3: pull it into Claude itself via MCP.
Here's the move that surprises most people: once you have usage data structured, you can let Claude query it for you. An MCP server exposes your usage history as a tool Claude can call. So instead of opening a dashboard, you ask:
"How much did I spend on Opus this week, and how does that compare to last week?"
Claude calls the tool, reads back your numbers, and answers in plain English. TokenEyez ships an MCP server for this (the install instructions live here), and the experience of having Claude actually know how much it has cost you is the kind of thing you do not realize you wanted until you have it.
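To make that concrete, here is a sketch of the kind of query function such a tool might wrap. It is not how TokenEyez or any particular MCP server is implemented; registering the function with an MCP server is left out, and the row fields (model, day, cost_usd) are assumed names.

```python
from datetime import date, timedelta


def spend_for_model(rows, model, start, end):
    """Sum cost_usd for one model over the half-open date range [start, end)."""
    return sum(
        r["cost_usd"]
        for r in rows
        if r["model"] == model and start <= r["day"] < end
    )


def week_over_week(rows, model, today):
    """Compare the trailing 7 days (including today) with the 7 days before."""
    this_start = today - timedelta(days=6)
    last_start = this_start - timedelta(days=7)
    this_week = spend_for_model(rows, model, this_start, today + timedelta(days=1))
    last_week = spend_for_model(rows, model, last_start, this_start)
    return {
        "model": model,
        "this_week": this_week,
        "last_week": last_week,
        "change": this_week - last_week,
    }
```

Returning a small dict of numbers, rather than prose, leaves the wording of the answer to Claude and keeps the tool easy to test.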
What "good" looks like.
You'll know your tracking is good enough when these five questions have one-screen answers:
- How much have I spent this month, across all my Claude surfaces combined?
- What's my mix of Opus, Sonnet, and Haiku, by token volume and by cost?
- Which five conversations or projects cost the most?
- Am I trending up, flat, or down compared to last month?
- If I keep going at today's pace, what will the month-end bill be?
If you cannot answer those in under thirty seconds, you don't have visibility; you have invoices. Those are different things, and the difference is exactly the thing that costs you money.
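The last two questions in that list are simple enough to sketch. The projection below is a straight-line run rate that counts today as a full day, and the 5% band for calling a month "flat" is an arbitrary threshold of ours; both are assumptions you may want to tune.

```python
import calendar
from datetime import date


def projected_month_end(spend_so_far: float, today: date) -> float:
    """Linear run-rate projection: average daily spend times days in month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_so_far / today.day * days_in_month


def trend(this_month: float, last_month: float, tolerance: float = 0.05) -> str:
    """Label the month-over-month change as 'up', 'flat', or 'down'."""
    if last_month == 0:
        return "up" if this_month > 0 else "flat"
    change = (this_month - last_month) / last_month
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "flat"
```

For a fair trend early in the month, compare the projected month-end figure, not the month-to-date figure, against last month's total.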
Privacy is non-negotiable.
One last note. Whatever you pick, the rule is the same: your prompts should never leave your machine. The token count is enough to give you everything in this article. The prompt body adds no value and adds real risk. If a tool wants the full content of your conversations to "give you better insights," walk away. Token counts plus model plus timestamp is all you actually need.
That is the bar TokenEyez was designed to clear. It is also the bar anything else you choose should clear, whether it has our name on it or not.