Rate Limiting
Claude Code tracks your usage across two independent time windows. Understanding both prevents unexpected blocks mid-task — especially during long agentic sessions that consume tokens quickly.
Dual-Window System
Two counters run simultaneously at all times. Either one can trigger rate limiting on its own — hitting the 5-hour limit does not reset or affect the 7-day counter, and vice versa.
| Window | Purpose | Reset Behavior |
|---|---|---|
| 5-hour rolling | Burst protection — prevents single heavy sessions from monopolizing capacity | Slides continuously; not a fixed clock reset |
| 7-day rolling | Sustained-use cap — prevents chronic heavy users from exceeding weekly quotas | Also slides — based on rolling wall time, not calendar week |
Both windows track input tokens, output tokens, and thinking tokens independently.
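To make the rolling behavior concrete, here is a minimal sketch of two independent sliding windows fed by the same usage events. The class name, limits, and token counts are hypothetical and chosen only for illustration; actual caps depend on your plan and are not stated in this document.

```python
from collections import deque

FIVE_HOURS = 5 * 60 * 60
SEVEN_DAYS = 7 * 24 * 60 * 60


class RollingWindow:
    """Sums token usage over the trailing `span_secs` seconds (hypothetical helper)."""

    def __init__(self, span_secs: int, limit: int):
        self.span_secs = span_secs
        self.limit = limit
        self.events = deque()  # (timestamp, tokens), oldest first

    def record(self, timestamp: float, tokens: int) -> None:
        self.events.append((timestamp, tokens))

    def used(self, now: float) -> int:
        # Drop events that have slid out of the window; there is no fixed reset time.
        while self.events and now - self.events[0][0] >= self.span_secs:
            self.events.popleft()
        return sum(tokens for _, tokens in self.events)

    def exceeded(self, now: float) -> bool:
        return self.used(now) >= self.limit


def is_rate_limited(windows, now: float) -> bool:
    # Either window can trigger a block on its own.
    return any(w.exceeded(now) for w in windows)


# Illustrative limits only.
burst = RollingWindow(FIVE_HOURS, limit=1_000)
weekly = RollingWindow(SEVEN_DAYS, limit=5_000)

for w in (burst, weekly):
    w.record(timestamp=0, tokens=1_000)

blocked_at_1h = is_rate_limited([burst, weekly], now=1 * 3600)
blocked_at_6h = is_rate_limited([burst, weekly], now=6 * 3600)
weekly_used_at_6h = weekly.used(now=6 * 3600)
```

In this sketch the heavy usage at time 0 blocks requests one hour later, stops blocking once it is more than five hours old, and still counts toward the 7-day total.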
3-Tier Early Warning System
The system warns you before blocking, with escalating urgency based on how close you are to either window’s threshold. The tiers referenced elsewhere on this page are Tier 1 at 60% usage, Tier 2 at 80%, and Tier 3 at 95%.
Tier 3 is the last actionable warning before the block. The dialog prompts you to run /compact or reduce scope; at that point, a few hundred tokens of action can save you from an hours-long wait.
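The tier selection can be sketched as a small lookup. The 60%, 80%, and 95% thresholds are the ones cited in the Practical Guidance and summary sections of this page; the function name, the inclusive comparison, and the choice to let the fuller window drive the tier are assumptions made for illustration.

```python
from typing import Optional

# Thresholds as cited elsewhere in this document: 60%, 80%, 95%.
TIER_THRESHOLDS = [(3, 0.95), (2, 0.80), (1, 0.60)]


def warning_tier(used_5h: float, used_7d: float) -> Optional[int]:
    """Return the warning tier (1 to 3), or None below the first threshold.

    Arguments are usage fractions in [0, 1] for each window.
    """
    # Assumption: whichever window is closer to its cap decides the tier.
    worst = max(used_5h, used_7d)
    for tier, threshold in TIER_THRESHOLDS:
        if worst >= threshold:
            return tier
    return None
```

For example, `warning_tier(0.30, 0.85)` yields tier 2 under these assumptions, because the 7-day window is the one nearing its cap.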
How Limits Are Detected
The system uses two sources of usage data, with the more accurate one taking precedence:
    FUNCTION checkRateLimit(response):
        IF response.headers.has("anthropic-ratelimit-*"):
            // Header-based: most accurate
            // API returns remaining tokens/requests per window directly
            remaining5h = header("anthropic-ratelimit-tokens-remaining")
            remaining7d = header("anthropic-ratelimit-tokens-remaining-7d")
            resetTime = header("anthropic-ratelimit-tokens-reset")
            updateWarningTier(remaining5h, remaining7d)
        ELSE:
            // Local fallback: less accurate but prevents over-quota calls
            // Happens after crash/restart when headers unavailable
            localCount = readLocalUsageCache()
            estimateRemainingFromLocal(localCount)

The local fallback deliberately errs on the side of caution: it may warn you sooner than necessary, but it will never silently allow calls that would return 429.
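A minimal Python sketch of the header-first, local-fallback selection follows. The header names are copied from the pseudocode above and have not been verified independently here; the function name, the return shape, and the assumption that header keys arrive lowercased are all hypothetical.

```python
from typing import Mapping

# Header names taken from the pseudocode in this document (not independently verified).
HEADER_BY_WINDOW = {
    "5h": "anthropic-ratelimit-tokens-remaining",
    "7d": "anthropic-ratelimit-tokens-remaining-7d",
}


def remaining_tokens(headers: Mapping[str, str], local_estimate: Mapping[str, int]) -> dict:
    """Use header values when every window's header parses; otherwise fall back."""
    parsed = {}
    for window, name in HEADER_BY_WINDOW.items():
        try:
            parsed[window] = int(headers[name])
        except (KeyError, ValueError):
            # Missing or malformed header: trust the local estimate instead.
            return {"source": "local", **local_estimate}
    return {"source": "headers", **parsed}
```

Returning the source alongside the numbers lets a caller show that an estimate, rather than a server-reported value, is driving the warning.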
Deep Dive: Local Usage Cache
The local cache persists usage data to disk so estimates survive CLI restarts:
    FUNCTION updateLocalCache(tokensUsed):
        cache = readDisk(".claude/usage-cache.json")

        // Purge entries older than the longest window (7 days)
        cache.entries = cache.entries.filter(e =>
            now() - e.timestamp < SEVEN_DAYS_MS
        )

        // Append new usage event
        cache.entries.push({
            timestamp: now(),
            inputTokens: tokensUsed.input,
            outputTokens: tokensUsed.output,
            thinkingTokens: tokensUsed.thinking,
            model: tokensUsed.model
        })

        writeDisk(".claude/usage-cache.json", cache)

    FUNCTION estimateFromLocal():
        cache = readDisk(".claude/usage-cache.json")
        cutoff5h = now() - FIVE_HOURS_MS
        cutoff7d = now() - SEVEN_DAYS_MS

        total5h = sum(cache.entries.filter(e => e.timestamp > cutoff5h))
        total7d = sum(cache.entries.filter(e => e.timestamp > cutoff7d))

        RETURN { total5h, total7d }

This means usage tracking works even when the API is unreachable: the system can still warn you before sending a request that will certainly fail.
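The same purge, append, and sum steps can be written as runnable Python. This is a sketch under stated assumptions, not the CLI's implementation: the cache path is passed in by the caller, the entry keys (`input`, `output`, `thinking`) are shortened for the example, timestamps are in seconds, and a window total is taken to be the sum of all three token types, which the pseudocode leaves unspecified.

```python
import json
from pathlib import Path

FIVE_HOURS = 5 * 60 * 60
SEVEN_DAYS = 7 * 24 * 60 * 60


def load_entries(path: Path) -> list:
    # A missing or corrupt cache is treated as empty rather than fatal.
    try:
        return json.loads(path.read_text())["entries"]
    except (FileNotFoundError, json.JSONDecodeError, KeyError, TypeError):
        return []


def record_usage(path: Path, tokens: dict, now: float) -> None:
    """Purge entries older than 7 days, append the new event, write back."""
    entries = [e for e in load_entries(path) if now - e["timestamp"] < SEVEN_DAYS]
    entries.append({"timestamp": now, **tokens})
    path.write_text(json.dumps({"entries": entries}))


def estimate_usage(path: Path, now: float) -> dict:
    """Total tokens (input + output + thinking) inside each rolling window."""
    entries = load_entries(path)

    def total(span: int) -> int:
        return sum(
            e["input"] + e["output"] + e["thinking"]
            for e in entries
            if now - e["timestamp"] < span
        )

    return {"5h": total(FIVE_HOURS), "7d": total(SEVEN_DAYS)}
```

Passing `now` explicitly keeps the functions deterministic and easy to test; a caller would normally supply `time.time()`.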
What Happens When Rate Limited
    FUNCTION handleRateLimitError(error):
        // error.status == 429
        resetTime = error.headers["retry-after"] OR error.body.reset_at

        // Show user: which window triggered, time until reset
        displayRateLimitDialog({
            window: error.body.window,  // "5h" or "7d"
            resetAt: resetTime,
            waitSecs: resetTime - now()
        })

        // Escalate through recovery options (see: The Agent Loop)
        IF fallbackModel.available AND fallbackModel.hasCapacity():
            switchToFallbackModel()  // continue with alternate model
            retryRequest()
        ELSE:
            pauseAgentLoop()  // suspend until window resets
            notifyUser("Session paused. Resumable at " + resetTime)

If a fallback model is available and has remaining capacity under its own rate limits, the system switches automatically. You may notice a model name change in the status bar; this is intentional recovery behavior, not a bug. See The Agent Loop for how the escalation ladder works.
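One detail the pseudocode glosses over: under HTTP semantics, a `Retry-After` value is either a number of seconds or an HTTP-date, so `resetTime - now()` only makes sense once the value has been normalized. The sketch below shows one way to compute the wait in Python; the function name is hypothetical, and whether the service sends seconds or a date is not stated in this document.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def wait_seconds(retry_after: str, now: datetime) -> float:
    """Seconds to wait, given a Retry-After header value.

    `now` must be timezone-aware so it can be compared with a parsed HTTP-date.
    """
    value = retry_after.strip()
    if value.isdigit():
        # delay-seconds form: already a duration
        return float(value)
    # HTTP-date form: convert to a duration, never negative
    reset_at = parsedate_to_datetime(value)
    return max(0.0, (reset_at - now).total_seconds())


now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
from_seconds = wait_seconds("120", now)
from_date = wait_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now)
```

Clamping at zero avoids showing a negative wait when the reset time has already passed by the time the error is handled.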
Practical Guidance
| Question | Answer |
|---|---|
| How do I check current usage? | The status bar shows the active window’s usage %. Tier 1 warning (60%) is the earliest visible signal. |
| What reduces token consumption? | /compact shrinks conversation history before the next API call. Subagents with isolated contexts send less history per call. |
| Can I increase limits? | Pro and Enterprise plans have higher per-window caps. Max Tier (via Claude.ai) significantly raises both windows. |
| Why did my limit reset unexpectedly? | The 5-hour window is rolling — it slides continuously. If your heavy usage was 5+ hours ago, that portion has rolled out of the window. |
| Does /compact count against my limit? | Yes. Compaction is itself an API call, but it’s far cheaper than the context it eliminates from future calls. |
| Do subagents share my limit? | Yes. All API calls — yours, your subagents’, background tasks — count against the same user-level windows. |
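A rough back-of-the-envelope model shows why compaction usually pays for itself. All numbers are hypothetical, and the model is deliberately simplified: it assumes the full history is resent as input on every call, that compaction costs one read of that history, and it ignores output tokens and history growth.

```python
def compaction_savings(history_tokens: int, summary_tokens: int, remaining_calls: int) -> int:
    """Input tokens saved by compacting now, under the simplified model above.

    A negative result means compaction costs more than it saves.
    """
    without_compact = history_tokens * remaining_calls
    # One call reads the full history to produce the summary,
    # then each later call sends only the summary.
    with_compact = history_tokens + summary_tokens * remaining_calls
    return without_compact - with_compact


# Hypothetical session: 80k-token history, 8k-token summary.
saved_over_10_calls = compaction_savings(80_000, 8_000, remaining_calls=10)
saved_over_1_call = compaction_savings(80_000, 8_000, remaining_calls=1)
```

Under these assumptions, compacting before ten more calls saves 640,000 input tokens, while compacting right before a single final call costs 8,000 more than it saves.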
Why This Matters to You
- The warning tiers tell you before you’re blocked — Tier 2 at 80% is your action window, not Tier 3 at 95%
- /compact is the fastest intervention mid-session: fewer input tokens on every subsequent call
- Subagents with isolated context reduce per-call cost, but they still consume your shared quota; plan accordingly
- If you hit limits often, check whether you’re sending large file contents or long histories on every call — context accumulates fast
- The local fallback cache means estimates persist across restarts — you won’t accidentally blow past a limit just because you restarted the CLI
See also: Architecture Overview — The Agent Loop