Rate Limiting

Claude Code tracks your usage across two independent time windows. Understanding both prevents unexpected blocks mid-task — especially during long agentic sessions that consume tokens quickly.


Dual-Window System

Two counters run simultaneously at all times. Either one can trigger rate limiting on its own — hitting the 5-hour limit does not reset or affect the 7-day counter, and vice versa.

flowchart LR
    subgraph W1["5-Hour Rolling Window"]
        T1["Tokens consumed\nin last 5 hrs"]
        TH1["Burst threshold\n(e.g. 1M tokens)"]
        T1 -->|compare| TH1
    end
    subgraph W2["7-Day Rolling Window"]
        T2["Tokens consumed\nin last 7 days"]
        TH2["Sustained-use threshold\n(e.g. 5M tokens)"]
        T2 -->|compare| TH2
    end
    TH1 -->|exceeds| RATE["Rate Limited"]
    TH2 -->|exceeds| RATE
    style W1 fill:#1e293b,color:#7dd3fc,stroke:#334155
    style W2 fill:#1e293b,color:#c4b5fd,stroke:#334155
    style RATE fill:#1e293b,color:#f87171,stroke:#334155
    style T1 fill:#1e293b,color:#94a3b8,stroke:#334155
    style T2 fill:#1e293b,color:#94a3b8,stroke:#334155
    style TH1 fill:#1e293b,color:#fcd34d,stroke:#334155
    style TH2 fill:#1e293b,color:#fcd34d,stroke:#334155

Window | Purpose | Reset Behavior
5-hour rolling | Burst protection: prevents single heavy sessions from monopolizing capacity | Slides continuously; not a fixed clock reset
7-day rolling | Sustained-use cap: prevents chronic heavy users from exceeding weekly quotas | Also slides; based on rolling wall time, not calendar week

Both windows track input tokens, output tokens, and thinking tokens independently.
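
The dual-window behavior described above can be sketched in a few lines of Python. This is an illustration only, not Claude Code's implementation: the `RollingWindow` class and `is_rate_limited` function are hypothetical names, and the 1M and 5M limits are the example figures from the diagram, not real plan caps.

```python
from collections import deque
from dataclasses import dataclass, field

FIVE_HOURS = 5 * 60 * 60
SEVEN_DAYS = 7 * 24 * 60 * 60


@dataclass
class RollingWindow:
    """Sums token usage over the trailing `span_secs` seconds."""
    span_secs: int
    limit: int  # hypothetical threshold; real caps depend on your plan
    events: deque = field(default_factory=deque)  # (timestamp, tokens) pairs

    def record(self, timestamp: float, tokens: int) -> None:
        self.events.append((timestamp, tokens))

    def used(self, now: float) -> int:
        # Drop events that have slid out of the window, then sum the rest.
        while self.events and self.events[0][0] <= now - self.span_secs:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def exceeded(self, now: float) -> bool:
        return self.used(now) >= self.limit


def is_rate_limited(windows: list[RollingWindow], now: float) -> bool:
    # Either window can trigger the limit on its own.
    return any(w.exceeded(now) for w in windows)


burst = RollingWindow(FIVE_HOURS, limit=1_000_000)
weekly = RollingWindow(SEVEN_DAYS, limit=5_000_000)

# 1.2M tokens consumed at t=0 count toward both windows.
for w in (burst, weekly):
    w.record(0.0, 1_200_000)

limited_at_1h = is_rate_limited([burst, weekly], now=3600.0)
limited_at_6h = is_rate_limited([burst, weekly], now=6 * 3600.0)
```

In this sketch the burst at t=0 blocks requests one hour later, but by hour six it has slid out of the 5-hour window while still counting toward the 7-day total.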


3-Tier Early Warning System

The system warns you before blocking — with escalating urgency based on how close you are to either window’s threshold.

flowchart TD
    CHECK{Usage in\neither window?}
    CHECK -->|~60% of threshold| T1
    CHECK -->|~80% of threshold| T2
    CHECK -->|~95% of threshold| T3
    CHECK -->|100%: limit hit| BLOCK
    T1["Tier 1: Subtle indicator\nStatus bar shows current usage %\nNo interruption to workflow"]
    T2["Tier 2: Warning message\nConversation output includes\nyellow warning with usage numbers"]
    T3["Tier 3: Urgent dialog\nFull-screen breakdown:\n• tokens used / remaining\n• which window is near limit\n• time until window resets\n• suggestion to reduce usage"]
    BLOCK["Rate Limited\n429 returned from API\nAll calls blocked until window rolls"]
    style CHECK fill:#1e293b,color:#fcd34d,stroke:#334155
    style T1 fill:#1e293b,color:#86efac,stroke:#334155
    style T2 fill:#1e293b,color:#fcd34d,stroke:#334155
    style T3 fill:#1e293b,color:#fda4af,stroke:#334155
    style BLOCK fill:#1e293b,color:#f87171,stroke:#334155

Tier 3 is the last actionable warning before the block. The dialog prompts you to run /compact or reduce scope; at that point, spending a few hundred tokens on corrective action can save you from an hours-long wait.
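
As a rough illustration of the tier logic, the mapping from usage to warning tier might look like the following Python sketch. The `Tier` enum and function names are hypothetical, and the cut-offs are the approximate percentages from the diagram; the real thresholds may differ.

```python
from enum import IntEnum


class Tier(IntEnum):
    NONE = 0
    SUBTLE = 1    # status bar indicator only
    WARNING = 2   # yellow warning in conversation output
    URGENT = 3    # full-screen dialog
    BLOCKED = 4   # limit hit, calls are rejected


# Approximate cut-offs (percent of threshold), checked from highest to lowest.
CUTOFFS = [(100, Tier.BLOCKED), (95, Tier.URGENT), (80, Tier.WARNING), (60, Tier.SUBTLE)]


def tier_for(used: int, limit: int) -> Tier:
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    for percent, tier in CUTOFFS:
        if used * 100 >= percent * limit:
            return tier
    return Tier.NONE


def overall_tier(used_5h: int, limit_5h: int, used_7d: int, limit_7d: int) -> Tier:
    # The window closest to its threshold decides which warning is shown.
    return max(tier_for(used_5h, limit_5h), tier_for(used_7d, limit_7d))


example = overall_tier(100_000, 1_000_000, 4_100_000, 5_000_000)
```

Here the 5-hour window is at 10% and the 7-day window at 82%, so `example` is `Tier.WARNING`: the more constrained window wins.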


How Limits Are Detected

The system uses two sources of usage data, with the more accurate one taking precedence:

FUNCTION checkRateLimit(response):
    IF response.headers.has("anthropic-ratelimit-*"):
        // Header-based: most accurate
        // API returns remaining tokens/requests per window directly
        remaining5h = header("anthropic-ratelimit-tokens-remaining")
        remaining7d = header("anthropic-ratelimit-tokens-remaining-7d")
        resetTime = header("anthropic-ratelimit-tokens-reset")
        updateWarningTier(remaining5h, remaining7d)
    ELSE:
        // Local fallback: less accurate but prevents over-quota calls
        // Happens after crash/restart when headers unavailable
        localCount = readLocalUsageCache()
        estimateRemainingFromLocal(localCount)

The local fallback deliberately errs on the side of caution — it may warn you sooner than necessary, but it will never silently allow calls that would return 429.
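
The precedence rule can be illustrated with a small Python sketch. It is written under stated assumptions: the header names are copied from the pseudocode above rather than from verified API documentation, `remaining_tokens` is a hypothetical helper, and the 10% safety margin is just one way to model a fallback that "errs on the side of caution".

```python
from typing import Mapping

# Header names as they appear in the pseudocode above (not independently verified).
H_5H = "anthropic-ratelimit-tokens-remaining"
H_7D = "anthropic-ratelimit-tokens-remaining-7d"


def remaining_tokens(
    headers: Mapping[str, str],
    local_used: tuple[int, int],   # (used in last 5h, used in last 7d) from the local cache
    limits: tuple[int, int],       # (5h limit, 7d limit); hypothetical values
    safety_margin_pct: int = 10,   # assumed padding for the local estimate
) -> dict:
    lowered = {name.lower(): value for name, value in headers.items()}
    if H_5H in lowered and H_7D in lowered:
        # Header-based values take precedence because the server is authoritative.
        return {
            "source": "headers",
            "remaining_5h": int(lowered[H_5H]),
            "remaining_7d": int(lowered[H_7D]),
        }
    # Local fallback: pad recorded usage so the estimate is pessimistic.
    padded = [used + used * safety_margin_pct // 100 for used in local_used]
    return {
        "source": "local",
        "remaining_5h": max(0, limits[0] - padded[0]),
        "remaining_7d": max(0, limits[1] - padded[1]),
    }


limits = (1_000_000, 5_000_000)
from_headers = remaining_tokens(
    {"Anthropic-RateLimit-Tokens-Remaining": "250000", H_7D: "3000000"},
    local_used=(0, 0),
    limits=limits,
)
from_local = remaining_tokens({}, local_used=(500_000, 2_000_000), limits=limits)
```

With no headers available, the fallback reports 450,000 tokens left in the 5-hour window instead of the 500,000 a naive subtraction would give, so warnings fire slightly early rather than late.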

Deep Dive: Local Usage Cache

The local cache persists usage data to disk so estimates survive CLI restarts:

FUNCTION updateLocalCache(tokensUsed):
    cache = readDisk(".claude/usage-cache.json")
    // Purge entries older than the longest window (7 days)
    cache.entries = cache.entries.filter(e =>
        now() - e.timestamp < SEVEN_DAYS_MS
    )
    // Append new usage event
    cache.entries.push({
        timestamp: now(),
        inputTokens: tokensUsed.input,
        outputTokens: tokensUsed.output,
        thinkingTokens: tokensUsed.thinking,
        model: tokensUsed.model
    })
    writeDisk(".claude/usage-cache.json", cache)

FUNCTION estimateFromLocal():
    cache = readDisk(".claude/usage-cache.json")
    cutoff5h = now() - FIVE_HOURS_MS
    cutoff7d = now() - SEVEN_DAYS_MS
    // Sum token counts (not the entries themselves) inside each window
    tokens = e => e.inputTokens + e.outputTokens + e.thinkingTokens
    total5h = sum(cache.entries.filter(e => e.timestamp > cutoff5h).map(tokens))
    total7d = sum(cache.entries.filter(e => e.timestamp > cutoff7d).map(tokens))
    RETURN { total5h, total7d }

This means usage tracking works even when the API is unreachable — the system can still warn you before sending a request that will certainly fail.
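
For readers who want something runnable, here is a minimal Python sketch of the same cache idea. The function names and the choice to treat a missing or corrupt file as an empty cache are assumptions made for illustration; the field names mirror the pseudocode above, and timestamps are in seconds rather than milliseconds.

```python
import json
import tempfile
import time
from pathlib import Path

FIVE_HOURS = 5 * 60 * 60
SEVEN_DAYS = 7 * 24 * 60 * 60


def load_entries(path: Path) -> list:
    # Assumption: a missing or unreadable cache is treated as empty.
    try:
        return json.loads(path.read_text())["entries"]
    except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError):
        return []


def record_usage(path: Path, input_tokens: int, output_tokens: int,
                 thinking_tokens: int, now: float = None) -> None:
    now = time.time() if now is None else now
    # Purge entries older than the longest window, then append the new event.
    entries = [e for e in load_entries(path) if now - e["timestamp"] < SEVEN_DAYS]
    entries.append({
        "timestamp": now,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "thinkingTokens": thinking_tokens,
    })
    path.write_text(json.dumps({"entries": entries}))


def estimate_usage(path: Path, now: float = None) -> dict:
    now = time.time() if now is None else now
    entries = load_entries(path)

    def total(span: int) -> int:
        return sum(
            e["inputTokens"] + e["outputTokens"] + e["thinkingTokens"]
            for e in entries
            if now - e["timestamp"] < span
        )

    return {"total5h": total(FIVE_HOURS), "total7d": total(SEVEN_DAYS)}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "usage-cache.json"
    record_usage(cache_file, 1000, 200, 50, now=0)            # 1,250 tokens at t=0
    record_usage(cache_file, 3000, 400, 100, now=6 * 3600)    # 3,500 tokens at t=6h
    estimate = estimate_usage(cache_file, now=7 * 3600)
```

At the seven-hour mark the first event has left the 5-hour window but still counts toward the 7-day total, so the two sums diverge.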


What Happens When Rate Limited

FUNCTION handleRateLimitError(error):   // error.status == 429
    // retry-after is a delay in seconds; reset_at is assumed to be an absolute time
    IF error.headers.has("retry-after"):
        waitSecs = error.headers["retry-after"]
        resetTime = now() + waitSecs
    ELSE:
        resetTime = error.body.reset_at
        waitSecs = resetTime - now()
    // Show user: which window triggered, time until reset
    displayRateLimitDialog({
        window: error.body.window,   // "5h" or "7d"
        resetAt: resetTime,
        waitSecs: waitSecs
    })
    // Escalate through recovery options (see: The Agent Loop)
    IF fallbackModel.available AND fallbackModel.hasCapacity():
        switchToFallbackModel()      // continue with alternate model
        retryRequest()
    ELSE:
        pauseAgentLoop()             // suspend until window resets
        notifyUser("Session paused. Resumable at " + resetTime)

If a fallback model is available and has remaining capacity under its own rate limits, the system switches automatically. You may notice a model name change in the status bar — this is intentional recovery behavior, not a bug. See The Agent Loop for how the escalation ladder works.
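
The switch-or-pause decision can be sketched as follows. This is a simplified illustration, not the actual escalation ladder: `RateLimited`, `send_with_fallback`, and the model names are hypothetical, and the `send` callable stands in for whatever transport issues the request.

```python
from typing import Callable, Optional


class RateLimited(Exception):
    """Raised by the (hypothetical) transport when the API answers 429."""

    def __init__(self, window: str, retry_after_secs: float):
        super().__init__(f"rate limited on {window} window")
        self.window = window                    # "5h" or "7d"
        self.retry_after_secs = retry_after_secs


def send_with_fallback(send: Callable[[str], str], primary: str,
                       fallback: Optional[str] = None) -> dict:
    try:
        return {"status": "ok", "model": primary, "result": send(primary)}
    except RateLimited as err:
        primary_err = err
    if fallback is not None:
        try:
            # The fallback model has its own limits, so this call may fail too.
            return {"status": "ok", "model": fallback, "result": send(fallback)}
        except RateLimited:
            pass
    # No usable fallback: report how long the session must pause.
    return {
        "status": "paused",
        "window": primary_err.window,
        "resume_in_secs": primary_err.retry_after_secs,
    }


def fake_send(model: str) -> str:
    # Stand-in transport: the primary model is out of quota.
    if model == "primary-model":
        raise RateLimited(window="5h", retry_after_secs=1800)
    return f"reply from {model}"


switched = send_with_fallback(fake_send, "primary-model", "fallback-model")
paused = send_with_fallback(fake_send, "primary-model")
```

Returning the model name alongside the result is what would let a caller surface the model change in the status bar.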


Practical Guidance

Question | Answer
How do I check current usage? | The status bar shows the active window’s usage %. Tier 1 warning (60%) is the earliest visible signal.
What reduces token consumption? | /compact shrinks conversation history before the next API call. Subagents with isolated contexts send less history per call.
Can I increase limits? | Pro and Enterprise plans have higher per-window caps. Max Tier (via Claude.ai) significantly raises both windows.
Why did my limit reset unexpectedly? | The 5-hour window is rolling: it slides continuously. If your heavy usage was 5+ hours ago, that portion has rolled out of the window.
Does /compact count against my limit? | Yes. Compaction is itself an API call, but it is far cheaper than the context it eliminates from future calls.
Do subagents share my limit? | Yes. All API calls (yours, your subagents’, and background tasks) count against the same user-level windows.
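
To see why compaction tends to pay for itself, consider a back-of-the-envelope calculation in Python. All numbers are invented for illustration, and the model is deliberately crude: it assumes every call resends the full history, that history grows by a fixed amount per call, and that all tokens count equally toward the limit.

```python
def input_tokens_sent(start_history: int, growth_per_call: int, calls: int) -> int:
    # Each call resends the whole history, which grows after every call.
    return sum(start_history + i * growth_per_call for i in range(calls))


HISTORY = 150_000   # tokens of conversation history before compaction (illustrative)
SUMMARY = 5_000     # tokens left after compaction (illustrative)
GROWTH = 2_000      # tokens added to the history per call (illustrative)
CALLS = 20

without_compact = input_tokens_sent(HISTORY, GROWTH, CALLS)

# Compaction reads the full history once and writes the summary.
compaction_cost = HISTORY + SUMMARY
with_compact = compaction_cost + input_tokens_sent(SUMMARY, GROWTH, CALLS)

saved = without_compact - with_compact
```

Under these assumptions the next 20 calls cost 3,380,000 tokens without compaction versus 635,000 with it, even after paying for the compaction call itself.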

Why This Matters to You

  • The warning tiers alert you before you’re blocked; treat Tier 2 at 80% as your window to act rather than waiting for Tier 3 at 95%
  • /compact is the fastest intervention mid-session — fewer input tokens on every subsequent call
  • Subagents with isolated context reduce per-call cost, but they still consume your shared quota; plan accordingly
  • If you hit limits often, check whether you’re sending large file contents or long histories on every call — context accumulates fast
  • The local fallback cache means estimates persist across restarts — you won’t accidentally blow past a limit just because you restarted the CLI

See also: Architecture Overview, The Agent Loop