Skip to content

Cost Tracking

Every token has a price. Claude Code tracks costs per session, persists them across CLI restarts, and surfaces configurable threshold dialogs before your bill surprises you.


What’s Tracked

Cost tracking is per-token, per-model, and per-call-type. Not a single running total — a detailed breakdown.

Token TypeWhat It CountsPricing
Input tokensPrompt + full conversation history sent to the APICheapest per-token
Output tokensResponse text generated by the model~3–5× input rate (model-dependent)
Thinking tokensExtended thinking reasoning blocks, if enabledSame or higher than output rate
Cache write tokensPrompt caching: writing to the prompt cache~1.25× input rate
Cache read tokensPrompt caching: reading from the cache~0.1× input rate (significant discount)

If you switch to a fallback model mid-session (due to rate limiting or API errors — see The Agent Loop), costs are tracked separately per model with their respective rates applied.

Deep Dive: Multi-Model Cost Breakdown

When a session uses more than one model, the cost display shows a per-model breakdown:

FUNCTION formatSessionCost(session):
breakdown = {}
FOR EACH call IN session.apiCalls:
model = call.model
IF model NOT IN breakdown:
breakdown[model] = { input: 0, output: 0, thinking: 0, total: 0 }
rates = getPricingRates(model) // fetched from config or API
callCost = {
input: call.inputTokens * rates.input,
output: call.outputTokens * rates.output,
thinking: call.thinkingTokens * rates.thinking
}
breakdown[model].input += callCost.input
breakdown[model].output += callCost.output
breakdown[model].thinking += callCost.thinking
breakdown[model].total += sum(callCost)
RETURN breakdown

The session status bar shows the total. The threshold dialog (when triggered) shows the full per-model breakdown so you can see whether the cost came from extended thinking, heavy output, or a model switch.


Cost Persistence Across Restarts

Session costs survive CLI process exits. They are tied to the session ID, not the process.

sequenceDiagram participant C1 as Claude CLI (Run 1) participant D as Session Store (disk) participant C2 as Claude CLI (Run 2) C1->>D: Append cost event after each API call C1->>D: Write session metadata on exit Note over C1: Process exits (crash or normal) C2->>D: claude -r SESSION_ID (resume) D->>C2: Load conversation history + cumulative cost C2->>C2: Continue tracking from persisted total Note over C2: Status bar shows full session cost,\nnot just this run's cost
FUNCTION resumeSession(sessionId):
session = loadFromDisk(sessionId)
// Cost is restored from stored state, not recalculated
state.cumulativeCost = session.cumulativeCost
state.tokenBreakdown = session.tokenBreakdown
// Threshold is re-evaluated against resumed cost
// So a threshold dialog can fire on the very first call
// of a resumed session if the previous run was already near the limit
checkCostThreshold(state.cumulativeCost)
RETURN state

This matters in practice: if a 3-hour session costs $1.80 and you resume it after lunch, the cost display starts at $1.80 — not $0.00. The threshold system works against the true total.


Threshold Dialogs

When cumulative session cost crosses a configurable threshold, a dialog pauses the agent loop.

flowchart TD CALL["API call completes\n(tokens + cost recorded)"] CHECK{Session cost\n≥ threshold?} DIALOG["CostThresholdDialog\n─────────────────\nCurrent cost: $X.XX\nTokens: input / output / thinking\nBreakdown by model\n─────────────────\nContinue | Pause | Abort"] CONTINUE["Continue\n→ Raise threshold for this session\n→ Won't ask again until new threshold"] PAUSE["Pause\n→ Save session to disk\n→ Exit CLI cleanly\n→ Resume later with claude -r"] ABORT["Abort\n→ Discard current agent turn\n→ Session stays open\n→ User decides next action"] CALL --> CHECK CHECK -->|No| PROCEED["Agent loop continues"] CHECK -->|Yes| DIALOG DIALOG --> CONTINUE DIALOG --> PAUSE DIALOG --> ABORT style CALL fill:#1e293b,color:#94a3b8,stroke:#334155 style CHECK fill:#1e293b,color:#fcd34d,stroke:#334155 style DIALOG fill:#1e293b,color:#7dd3fc,stroke:#334155 style CONTINUE fill:#1e293b,color:#86efac,stroke:#334155 style PAUSE fill:#1e293b,color:#fcd34d,stroke:#334155 style ABORT fill:#1e293b,color:#fda4af,stroke:#334155 style PROCEED fill:#1e293b,color:#86efac,stroke:#334155

Thresholds are saved per project config — a $5 threshold on a small CLI project, a $50 threshold on a large codebase refactor. Different projects can have different budgets.

// Project-level threshold config (in .claude/settings.json)
{
"costThreshold": 5.00, // USD — dialog fires when exceeded
"costThresholdCurrency": "USD"
}

Choosing Continue auto-raises the threshold for the current session only (typically doubles it). It does not persist — the next session starts fresh against the original configured threshold.


Cost Optimization

TechniqueHow It HelpsWhen to Use
/compactSummarizes history before next API call — shrinks input tokens on every subsequent call in the sessionAny session over ~30 minutes
Use Haiku for simple tasksSignificantly cheaper per token than Sonnet; Sonnet much cheaper than OpusMechanical tasks: formatting, renaming, boilerplate
Subagents with isolated contextEach subagent starts with a fresh context — sends less history per call than the parent sessionLong multi-step tasks that can be decomposed
Avoid re-sending large filesIf a file is read once and not modified, referencing it by path is cheaper than re-reading it on every turnLarge config files, full source trees
Disable extended thinking for routine tasksThinking tokens add cost without improving simple tasksCode formatting, documentation generation, renames
Prompt cachingRepeated system prompts and static context are cached — cache reads cost ~10× less than fresh inputConsistent CLAUDE.md, large boilerplate context
Deep Dive: Where Tokens Actually Go in a Typical Session

A 60-minute refactoring session broken down:

Token sources (approximate, 200K-token session):
System prompt + CLAUDE.md ~8,000 input (sent every turn)
Conversation history ~40,000 input (grows each turn)
Tool results (files read, etc.) ~60,000 input (accumulates)
Extended thinking ~20,000 thinking (if enabled)
Output (responses + code) ~15,000 output
Total input sent to API: ~108,000 tokens (across ~20 turns)
But conversation grows → later turns send MORE than earlier turns
Turn 1: ~10K input
Turn 10: ~35K input
Turn 20: ~75K input ← this is where cost accelerates

This is why /compact mid-session has compounding benefits: it resets the accumulation curve. A compact at turn 10 means turns 11–20 start from a smaller base instead of the full accumulated history.


Why This Matters to You

  • Set cost thresholds per project in .claude/settings.json — prevents surprise bills on large automated tasks
  • Resume sessions with claude -r and cost tracking continues from the true cumulative total, not zero
  • /compact is the cheapest intervention — it reduces input tokens on every subsequent API call in the session
  • Model choice is the highest-leverage cost decision: Haiku is 10–20× cheaper than Opus per token for tasks that don’t need deep reasoning
  • Extended thinking adds real cost — disable it for routine tasks and reserve it for genuinely complex reasoning
  • Subagent isolation is underrated: spawning a subagent for a bounded task means it sends only its own short context, not your full 80K-token session history

See also: Architecture OverviewThe Agent Loop