Cost Tracking

Every token has a price. Claude Code tracks costs per session, persists them across CLI restarts, and surfaces configurable threshold dialogs before your bill surprises you.

What’s Tracked

Cost tracking is per-token, per-model, and per-call-type. Not a single running total — a detailed breakdown.

Token Type	What It Counts	Pricing
Input tokens	Prompt + full conversation history sent to the API	Cheapest per-token
Output tokens	Response text generated by the model	~3–5× input rate (model-dependent)
Thinking tokens	Extended thinking reasoning blocks, if enabled	Same or higher than output rate
Cache write tokens	Prompt caching: writing to the prompt cache	~1.25× input rate
Cache read tokens	Prompt caching: reading from the cache	~0.1× input rate (significant discount)

If you switch to a fallback model mid-session (due to rate limiting or API errors — see The Agent Loop), costs are tracked separately per model with their respective rates applied.

Deep Dive: Multi-Model Cost Breakdown

When a session uses more than one model, the cost display shows a per-model breakdown:

FUNCTION formatSessionCost(session):
  breakdown = {}

  FOR EACH call IN session.apiCalls:
    model = call.model
    IF model NOT IN breakdown:
      breakdown[model] = { input: 0, output: 0, thinking: 0, total: 0 }

    rates = getPricingRates(model)  // fetched from config or API
    callCost = {
      input:    call.inputTokens    * rates.input,
      output:   call.outputTokens   * rates.output,
      thinking: call.thinkingTokens * rates.thinking
    }

    breakdown[model].input    += callCost.input
    breakdown[model].output   += callCost.output
    breakdown[model].thinking += callCost.thinking
    breakdown[model].total    += sum(callCost)

  RETURN breakdown

The session status bar shows the total. The threshold dialog (when triggered) shows the full per-model breakdown so you can see whether the cost came from extended thinking, heavy output, or a model switch.

Cost Persistence Across Restarts

Session costs survive CLI process exits. They are tied to the session ID, not the process.

sequenceDiagram participant C1 as Claude CLI (Run 1) participant D as Session Store (disk) participant C2 as Claude CLI (Run 2) C1->>D: Append cost event after each API call C1->>D: Write session metadata on exit Note over C1: Process exits (crash or normal) C2->>D: claude -r SESSION_ID (resume) D->>C2: Load conversation history + cumulative cost C2->>C2: Continue tracking from persisted total Note over C2: Status bar shows full session cost,\nnot just this run's cost

FUNCTION resumeSession(sessionId):
  session = loadFromDisk(sessionId)

  // Cost is restored from stored state, not recalculated
  state.cumulativeCost = session.cumulativeCost
  state.tokenBreakdown = session.tokenBreakdown

  // Threshold is re-evaluated against resumed cost
  // So a threshold dialog can fire on the very first call
  // of a resumed session if the previous run was already near the limit
  checkCostThreshold(state.cumulativeCost)

  RETURN state

This matters in practice: if a 3-hour session costs $1.80 and you resume it after lunch, the cost display starts at $1.80 — not $0.00. The threshold system works against the true total.

Threshold Dialogs

When cumulative session cost crosses a configurable threshold, a dialog pauses the agent loop.

flowchart TD CALL["API call completes\n(tokens + cost recorded)"] CHECK{Session cost\n≥ threshold?} DIALOG["CostThresholdDialog\n─────────────────\nCurrent cost: $X.XX\nTokens: input / output / thinking\nBreakdown by model\n─────────────────\nContinue | Pause | Abort"] CONTINUE["Continue\n→ Raise threshold for this session\n→ Won't ask again until new threshold"] PAUSE["Pause\n→ Save session to disk\n→ Exit CLI cleanly\n→ Resume later with claude -r"] ABORT["Abort\n→ Discard current agent turn\n→ Session stays open\n→ User decides next action"] CALL --> CHECK CHECK -->|No| PROCEED["Agent loop continues"] CHECK -->|Yes| DIALOG DIALOG --> CONTINUE DIALOG --> PAUSE DIALOG --> ABORT style CALL fill:#1e293b,color:#94a3b8,stroke:#334155 style CHECK fill:#1e293b,color:#fcd34d,stroke:#334155 style DIALOG fill:#1e293b,color:#7dd3fc,stroke:#334155 style CONTINUE fill:#1e293b,color:#86efac,stroke:#334155 style PAUSE fill:#1e293b,color:#fcd34d,stroke:#334155 style ABORT fill:#1e293b,color:#fda4af,stroke:#334155 style PROCEED fill:#1e293b,color:#86efac,stroke:#334155

Thresholds are saved per project config — a $5 threshold on a small CLI project, a $50 threshold on a large codebase refactor. Different projects can have different budgets.

// Project-level threshold config (in .claude/settings.json)
{
  "costThreshold": 5.00,       // USD — dialog fires when exceeded
  "costThresholdCurrency": "USD"
}

Choosing Continue auto-raises the threshold for the current session only (typically doubles it). It does not persist — the next session starts fresh against the original configured threshold.

Cost Optimization

Technique	How It Helps	When to Use
`/compact`	Summarizes history before next API call — shrinks input tokens on every subsequent call in the session	Any session over ~30 minutes
Use Haiku for simple tasks	Significantly cheaper per token than Sonnet; Sonnet much cheaper than Opus	Mechanical tasks: formatting, renaming, boilerplate
Subagents with isolated context	Each subagent starts with a fresh context — sends less history per call than the parent session	Long multi-step tasks that can be decomposed
Avoid re-sending large files	If a file is read once and not modified, referencing it by path is cheaper than re-reading it on every turn	Large config files, full source trees
Disable extended thinking for routine tasks	Thinking tokens add cost without improving simple tasks	Code formatting, documentation generation, renames
Prompt caching	Repeated system prompts and static context are cached — cache reads cost ~10× less than fresh input	Consistent CLAUDE.md, large boilerplate context

Deep Dive: Where Tokens Actually Go in a Typical Session

A 60-minute refactoring session broken down:

Token sources (approximate, 200K-token session):

  System prompt + CLAUDE.md       ~8,000   input    (sent every turn)
  Conversation history            ~40,000  input    (grows each turn)
  Tool results (files read, etc.) ~60,000  input    (accumulates)
  Extended thinking               ~20,000  thinking (if enabled)
  Output (responses + code)       ~15,000  output

  Total input sent to API:   ~108,000 tokens (across ~20 turns)
  But conversation grows → later turns send MORE than earlier turns
  Turn 1:  ~10K input
  Turn 10: ~35K input
  Turn 20: ~75K input     ← this is where cost accelerates

This is why /compact mid-session has compounding benefits: it resets the accumulation curve. A compact at turn 10 means turns 11–20 start from a smaller base instead of the full accumulated history.

Why This Matters to You

Set cost thresholds per project in .claude/settings.json — prevents surprise bills on large automated tasks
Resume sessions with claude -r and cost tracking continues from the true cumulative total, not zero
/compact is the cheapest intervention — it reduces input tokens on every subsequent API call in the session
Model choice is the highest-leverage cost decision: Haiku is 10–20× cheaper than Opus per token for tasks that don’t need deep reasoning
Extended thinking adds real cost — disable it for routine tasks and reserve it for genuinely complex reasoning
Subagent isolation is underrated: spawning a subagent for a bounded task means it sends only its own short context, not your full 80K-token session history

See also: Architecture Overview — The Agent Loop