Cost Tracking
Every token has a price. Claude Code tracks costs per session, persists them across CLI restarts, and surfaces configurable threshold dialogs before your bill surprises you.
What’s Tracked
Cost tracking is per-token, per-model, and per-call-type. Not a single running total — a detailed breakdown.
| Token Type | What It Counts | Pricing |
|---|---|---|
| Input tokens | Prompt + full conversation history sent to the API | Cheapest per-token |
| Output tokens | Response text generated by the model | ~3–5× input rate (model-dependent) |
| Thinking tokens | Extended thinking reasoning blocks, if enabled | Same or higher than output rate |
| Cache write tokens | Prompt caching: writing to the prompt cache | ~1.25× input rate |
| Cache read tokens | Prompt caching: reading from the cache | ~0.1× input rate (significant discount) |
If you switch to a fallback model mid-session (due to rate limiting or API errors — see The Agent Loop), costs are tracked separately per model with their respective rates applied.
Deep Dive: Multi-Model Cost Breakdown
When a session uses more than one model, the cost display shows a per-model breakdown:
FUNCTION formatSessionCost(session): breakdown = {}
FOR EACH call IN session.apiCalls: model = call.model IF model NOT IN breakdown: breakdown[model] = { input: 0, output: 0, thinking: 0, total: 0 }
rates = getPricingRates(model) // fetched from config or API callCost = { input: call.inputTokens * rates.input, output: call.outputTokens * rates.output, thinking: call.thinkingTokens * rates.thinking }
breakdown[model].input += callCost.input breakdown[model].output += callCost.output breakdown[model].thinking += callCost.thinking breakdown[model].total += sum(callCost)
RETURN breakdownThe session status bar shows the total. The threshold dialog (when triggered) shows the full per-model breakdown so you can see whether the cost came from extended thinking, heavy output, or a model switch.
Cost Persistence Across Restarts
Session costs survive CLI process exits. They are tied to the session ID, not the process.
FUNCTION resumeSession(sessionId): session = loadFromDisk(sessionId)
// Cost is restored from stored state, not recalculated state.cumulativeCost = session.cumulativeCost state.tokenBreakdown = session.tokenBreakdown
// Threshold is re-evaluated against resumed cost // So a threshold dialog can fire on the very first call // of a resumed session if the previous run was already near the limit checkCostThreshold(state.cumulativeCost)
RETURN stateThis matters in practice: if a 3-hour session costs $1.80 and you resume it after lunch, the cost display starts at $1.80 — not $0.00. The threshold system works against the true total.
Threshold Dialogs
When cumulative session cost crosses a configurable threshold, a dialog pauses the agent loop.
Thresholds are saved per project config — a $5 threshold on a small CLI project, a $50 threshold on a large codebase refactor. Different projects can have different budgets.
// Project-level threshold config (in .claude/settings.json){ "costThreshold": 5.00, // USD — dialog fires when exceeded "costThresholdCurrency": "USD"}Choosing Continue auto-raises the threshold for the current session only (typically doubles it). It does not persist — the next session starts fresh against the original configured threshold.
Cost Optimization
| Technique | How It Helps | When to Use |
|---|---|---|
/compact | Summarizes history before next API call — shrinks input tokens on every subsequent call in the session | Any session over ~30 minutes |
| Use Haiku for simple tasks | Significantly cheaper per token than Sonnet; Sonnet much cheaper than Opus | Mechanical tasks: formatting, renaming, boilerplate |
| Subagents with isolated context | Each subagent starts with a fresh context — sends less history per call than the parent session | Long multi-step tasks that can be decomposed |
| Avoid re-sending large files | If a file is read once and not modified, referencing it by path is cheaper than re-reading it on every turn | Large config files, full source trees |
| Disable extended thinking for routine tasks | Thinking tokens add cost without improving simple tasks | Code formatting, documentation generation, renames |
| Prompt caching | Repeated system prompts and static context are cached — cache reads cost ~10× less than fresh input | Consistent CLAUDE.md, large boilerplate context |
Deep Dive: Where Tokens Actually Go in a Typical Session
A 60-minute refactoring session broken down:
Token sources (approximate, 200K-token session):
System prompt + CLAUDE.md ~8,000 input (sent every turn) Conversation history ~40,000 input (grows each turn) Tool results (files read, etc.) ~60,000 input (accumulates) Extended thinking ~20,000 thinking (if enabled) Output (responses + code) ~15,000 output
Total input sent to API: ~108,000 tokens (across ~20 turns) But conversation grows → later turns send MORE than earlier turns Turn 1: ~10K input Turn 10: ~35K input Turn 20: ~75K input ← this is where cost acceleratesThis is why /compact mid-session has compounding benefits: it resets the accumulation curve. A compact at turn 10 means turns 11–20 start from a smaller base instead of the full accumulated history.
Why This Matters to You
- Set cost thresholds per project in
.claude/settings.json— prevents surprise bills on large automated tasks - Resume sessions with
claude -rand cost tracking continues from the true cumulative total, not zero /compactis the cheapest intervention — it reduces input tokens on every subsequent API call in the session- Model choice is the highest-leverage cost decision: Haiku is 10–20× cheaper than Opus per token for tasks that don’t need deep reasoning
- Extended thinking adds real cost — disable it for routine tasks and reserve it for genuinely complex reasoning
- Subagent isolation is underrated: spawning a subagent for a bounded task means it sends only its own short context, not your full 80K-token session history
See also: Architecture Overview — The Agent Loop