The Agent Loop
Every interaction with Claude Code runs through a single async generator loop. Not a request-response pair. Not a pipeline. A while(true) loop that keeps running until Claude decides it is done.
Understanding this loop explains most of the behavior you observe: why Claude sometimes takes many steps, why it recovers from errors automatically, and why /compact exists.
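To make the shape of that loop concrete, here is a minimal sketch of an async generator that keeps iterating until a model turn contains no tool calls. Every name in it (`agentLoop`, `CallModel`, `RunTool`, `LoopEvent`, `ModelTurn`) is hypothetical and chosen for illustration; it is not Claude Code's actual implementation.

```typescript
// Hypothetical types: a simplified event stream and model turn.
type LoopEvent =
  | { kind: "text"; text: string }
  | { kind: "tool_result"; name: string; output: string };

interface ModelTurn {
  text: string;
  toolCalls: { name: string; input: string }[];
}

// Stand-ins for the model call and the tool runner (assumed signatures).
type CallModel = (history: string[]) => Promise<ModelTurn>;
type RunTool = (name: string, input: string) => Promise<string>;

async function* agentLoop(
  prompt: string,
  callModel: CallModel,
  runTool: RunTool,
): AsyncGenerator<LoopEvent, void, void> {
  const history: string[] = [prompt];
  while (true) {
    const turn = await callModel(history);
    history.push(turn.text);
    yield { kind: "text", text: turn.text };

    // A turn without tool calls is the only exit from the loop.
    if (turn.toolCalls.length === 0) return;

    // Otherwise run each tool and feed its output into the next turn.
    for (const call of turn.toolCalls) {
      const output = await runTool(call.name, call.input);
      history.push(`${call.name} -> ${output}`);
      yield { kind: "tool_result", name: call.name, output };
    }
  }
}
```

A caller consumes the generator with `for await`, so text and tool results can be rendered as they are produced rather than after the whole task finishes.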
The 4-Phase Loop
Phase 1: Context Assembly
Before calling the LLM, the system assembles what it knows:

    FUNCTION assembleContext(state):
        IF tokenCount > compactThreshold:
            triggerProactiveCompact()

        systemPrompt = buildSystemPrompt(rules, memories, projectContext)
        tokenBudget = calculateBudget(maxContext, currentUsage)
        messages = attachMemories(state.messages, prefetchedMemory)

        RETURN { systemPrompt, messages, tokenBudget }

Phase 2: Stream API Call
The LLM response arrives as a stream, not a single blob. The system watches the stream for tool_use blocks and begins executing them immediately — without waiting for the response to finish.
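The ordering described above can be illustrated with a small synchronous sketch. It replaces the real stream with an array of chunks and records events in order, so you can see that a tool is started as soon as its `tool_use` chunk is seen, before later text is rendered. `StreamChunk`, `StreamOutcome`, and `consumeStream` are hypothetical names, and a real stream would be asynchronous.

```typescript
// Hypothetical chunk shape, reduced to the two cases the text mentions.
type StreamChunk =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string };

interface StreamOutcome {
  needsFollowUp: boolean;
  events: string[]; // ordered log of what happened while reading the stream
}

function consumeStream(
  chunks: StreamChunk[],
  startTool: (id: string, name: string) => void,
): StreamOutcome {
  const outcome: StreamOutcome = { needsFollowUp: false, events: [] };
  for (const chunk of chunks) {
    if (chunk.type === "tool_use") {
      // Start the tool right away instead of waiting for the stream to end.
      startTool(chunk.id, chunk.name);
      outcome.events.push(`start:${chunk.name}`);
      outcome.needsFollowUp = true;
    } else {
      outcome.events.push(`render:${chunk.text}`);
    }
  }
  return outcome;
}
```

Feeding it a text chunk, a `tool_use` chunk, and another text chunk yields a log in which the tool start sits between the two render events.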

    FUNCTION streamAPICall(context):
        FOR chunk IN llmStream(context):
            IF chunk.type == "tool_use":
                streamingExecutor.start(chunk)   // start immediately
                state.needsFollowUp = true
            ELSE IF chunk.type == "text":
                terminalUI.render(chunk.text)

        RETURN state

Phase 3: Tool Execution
After the stream ends, the streaming executor has already started some tools. The remaining tools are executed in batches — concurrent where safe, serial where not. See Tool Orchestration for how batching decisions are made.
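The actual batching rules live in Tool Orchestration and are not described here. As a sketch of what "concurrent where safe, serial where not" could mean, the example below assumes each tool carries a hypothetical `isReadOnly` flag: consecutive read-only tools share one concurrent batch, and any other tool gets a serial batch of its own. Both the flag and the rule are assumptions made for illustration.

```typescript
// Hypothetical tool descriptor; `isReadOnly` is an assumed safety signal.
interface PendingTool {
  name: string;
  isReadOnly: boolean;
}

interface ToolBatch {
  isConcurrent: boolean;
  tools: PendingTool[];
}

function partitionIntoBatches(pending: PendingTool[]): ToolBatch[] {
  const batches: ToolBatch[] = [];
  for (const tool of pending) {
    const last = batches[batches.length - 1];
    if (tool.isReadOnly && last !== undefined && last.isConcurrent) {
      // Extend the current run of read-only tools.
      last.tools.push(tool);
    } else {
      // Start a new batch: concurrent for read-only, serial otherwise.
      batches.push({ isConcurrent: tool.isReadOnly, tools: [tool] });
    }
  }
  return batches;
}
```

Under this assumed rule the original call order is preserved, so a write never runs before a read that was requested ahead of it.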

    FUNCTION executeTools(pendingTools):
        batches = partitionIntoBatches(pendingTools)
        FOR batch IN batches:
            IF batch.isConcurrent:
                results = await Promise.all(batch.tools.map(run))
            ELSE:
                results = await runSerially(batch.tools)

            applyContextModifiers(results)
            injectFileChangeNotifications(results)

        RETURN collectResults()

Phase 4: Stop or Continue
The loop asks one question: does the response contain unresolved tool_use blocks?

    IF state.needsFollowUp == false:
        RETURN finalResponse   // done
    ELSE:
        nextState = buildNextState(state, toolResults)
        CONTINUE               // loop again

Deep Dive: Loop State Object
The state object passed between iterations carries everything the loop needs:

    interface LoopState {
      messages: Message[]                      // Full conversation history
      toolUseContext: {
        tools: Tool[]                          // Available tools for this agent
        permissions: PermissionConfig          // Current permission settings
        agentConfig: AgentDefinition | null    // Agent-specific config if subagent
      }
      autoCompactTracking: {
        hasCompacted: boolean                  // True after auto-compact fires
        compactedAtTurn: number                // Turn number when compact happened
      }
      maxOutputTokensRecoveryCount: number     // 0-3, increments per recovery attempt
      hasAttemptedReactiveCompact: boolean     // Circuit breaker: true after first 413 recovery
      turnCount: number                        // Current iteration count
      transition: {
        reason: TransitionReason               // Why this iteration started
        metadata: Record<string, unknown>      // Per-reason data (see table below)
      }
    }

Each transition reason carries specific metadata:
| Reason | Metadata |
|---|---|
| `next_turn` | `{ toolResults: ToolResult[] }` (results to inject) |
| `max_output_tokens_escalate` | `{ newBudget: 64000 }` (one-time budget increase) |
| `max_output_tokens_recovery` | `{ attempt: number, maxAttempts: 3 }` (recovery counter) |
| `reactive_compact_retry` | `{ summary: string }` (compact summary to use) |
| `collapse_drain_retry` | `{ committedCount: number }` (collapses applied) |
| `token_budget_continuation` | `{ remainingBudget: number }` (auto-continue with budget) |
| `stop_hook_retry` | `{ hookErrors: string[] }` (errors from hooks to inject) |
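The reason-to-metadata pairing in this table maps naturally onto a TypeScript discriminated union. The sketch below is one possible stricter typing, not the source's own: the source types `metadata` as `Record<string, unknown>`. The `ToolResult` shape and the `describeTransition` helper are hypothetical.

```typescript
// Assumed shape; the document does not define ToolResult.
interface ToolResult {
  toolUseId: string;
  content: string;
}

// One union member per row of the table above.
type Transition =
  | { reason: "next_turn"; metadata: { toolResults: ToolResult[] } }
  | { reason: "max_output_tokens_escalate"; metadata: { newBudget: number } }
  | { reason: "max_output_tokens_recovery"; metadata: { attempt: number; maxAttempts: number } }
  | { reason: "reactive_compact_retry"; metadata: { summary: string } }
  | { reason: "collapse_drain_retry"; metadata: { committedCount: number } }
  | { reason: "token_budget_continuation"; metadata: { remainingBudget: number } }
  | { reason: "stop_hook_retry"; metadata: { hookErrors: string[] } };

// Switching on `reason` narrows `metadata` to the matching shape.
function describeTransition(t: Transition): string {
  switch (t.reason) {
    case "next_turn":
      return `inject ${t.metadata.toolResults.length} tool result(s)`;
    case "max_output_tokens_escalate":
      return `raise output budget to ${t.metadata.newBudget}`;
    case "max_output_tokens_recovery":
      return `recovery attempt ${t.metadata.attempt} of ${t.metadata.maxAttempts}`;
    case "reactive_compact_retry":
      return `retry with a ${t.metadata.summary.length}-character summary`;
    case "collapse_drain_retry":
      return `retry after ${t.metadata.committedCount} collapse(s)`;
    case "token_budget_continuation":
      return `continue with ${t.metadata.remainingBudget} tokens left`;
    case "stop_hook_retry":
      return `retry with ${t.metadata.hookErrors.length} hook error(s)`;
  }
}
```

The benefit of the union over a loose record is that the compiler rejects, for example, reading `summary` on a `next_turn` transition.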
The Key Decision: needsFollowUp
Claude Code does not trust the API’s stop_reason metadata. Instead it observes behavior: are there tool_use blocks in the response?
- If yes → `needsFollowUp = true` → continue the loop
- If no → `needsFollowUp = false` → return to user
Principle: observe behavior, not labels. stop_reason: "end_turn" can appear alongside tool calls. The system ignores the label and checks the actual content. This makes the loop robust to unexpected API behavior.
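A minimal sketch of that check follows. The response shape is a simplified assumption modelled on content blocks with a `type` field, and `responseNeedsFollowUp` is a hypothetical helper name. The function never reads `stop_reason`.

```typescript
// Simplified, assumed response shape for illustration.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface AssistantResponse {
  stop_reason: string | null;
  content: ContentBlock[];
}

function responseNeedsFollowUp(response: AssistantResponse): boolean {
  // Look only at the content; the stop_reason label is deliberately ignored.
  return response.content.some((block) => block.type === "tool_use");
}
```

With this check, a response labelled `end_turn` that still contains a `tool_use` block keeps the loop running, which is the robustness property the principle describes.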
Escalating Recovery
When things go wrong, the system does not retry blindly. It escalates through levels, each with a circuit breaker to prevent infinite loops.
Death spiral prevention: when an API error occurs, Stop hooks are skipped. Why? Stop hooks can inject additional context into the conversation — which would add more tokens to an already-overloaded context, making the problem worse. The system deliberately suppresses them during error recovery.
The Recovery Message
When the output is still too long after the budget escalation, the system injects a recovery prompt to continue:

    "Resume directly — no apology, no recap of what you were doing.
    Pick up mid-thought. Break remaining work into smaller pieces."

Every word is engineered to save tokens:
| Phrase | Token-saving purpose |
|---|---|
| “Resume directly” | Prevents a preamble paragraph |
| “no apology” | Prevents “I’m sorry for the confusion…” (typical LLM opening) |
| “no recap” | Prevents summarizing what was done (wastes 50–200 tokens) |
| “Pick up mid-thought” | Allows incomplete sentences, saves polite framing |
| “Break into smaller pieces” | Prevents the same overrun on the next turn |
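The output-limit ladder around this message (one budget escalation, then up to three recovery prompts) can be sketched as a small state function. The `hasEscalated` flag, the `give_up` outcome, and the function name are hypothetical; the document does not say what happens after the third recovery attempt, so the sketch simply stops there.

```typescript
interface OutputRecoveryState {
  maxOutputTokens: number;              // current output budget
  hasEscalated: boolean;                // hypothetical flag: escalation used?
  maxOutputTokensRecoveryCount: number; // 0-3, as in the loop state
}

type OutputRecoveryStep =
  | { reason: "max_output_tokens_escalate"; newBudget: number }
  | { reason: "max_output_tokens_recovery"; attempt: number; inject: string }
  | { reason: "give_up" }; // assumed terminal outcome

const ESCALATED_BUDGET = 64000;
const MAX_RECOVERY_ATTEMPTS = 3;

function onOutputOverrun(
  state: OutputRecoveryState,
  recoveryMessage: string,
): OutputRecoveryStep {
  // First overrun: raise the budget once.
  if (!state.hasEscalated) {
    state.hasEscalated = true;
    state.maxOutputTokens = ESCALATED_BUDGET;
    return { reason: "max_output_tokens_escalate", newBudget: ESCALATED_BUDGET };
  }
  // Later overruns: inject the recovery message, at most three times.
  if (state.maxOutputTokensRecoveryCount < MAX_RECOVERY_ATTEMPTS) {
    state.maxOutputTokensRecoveryCount += 1;
    return {
      reason: "max_output_tokens_recovery",
      attempt: state.maxOutputTokensRecoveryCount,
      inject: recoveryMessage,
    };
  }
  // Counter exhausted: stop retrying.
  return { reason: "give_up" };
}
```

Calling it five times in a row on a fresh state walks through one escalation, three recovery attempts, and then the terminal outcome, so the counter acts as the circuit breaker for this path.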
Deep Dive: Death Spiral Prevention
When the API returns an error, the loop has a subtle risk: stop hooks might inject additional tokens into the conversation. If the error was prompt_too_long, those extra tokens make the next request even larger — creating an infinite loop.

    FUNCTION handleError(error, state):
        // CRITICAL: Skip stop hooks on API errors
        // Stop hooks inject context (error summaries, status messages)
        // If error is prompt_too_long, more context = worse
        // "error → hook → retry → error → hook → retry → ..."

        IF error.type == "api_error":
            skipStopHooks = true   // Break the spiral

        IF error.type == "prompt_too_long" AND NOT state.hasAttemptedReactiveCompact:
            // Try context collapse drain first (cheap)
            collapsed = tryCollapseDrain(state)
            IF collapsed:
                RETURN { transition: "collapse_drain_retry" }

            // Then reactive compact (expensive but effective)
            summary = await compactConversation(state.messages)
            state.hasAttemptedReactiveCompact = true   // Circuit breaker: never retry
            RETURN { transition: "reactive_compact_retry", summary }

        // If reactive compact already tried and still failing → give up
        surfaceErrorToUser(error)

The circuit breaker pattern: `hasAttemptedReactiveCompact` is set to true after the first reactive compact attempt. If the next API call still returns 413, the system concludes the conversation is fundamentally too large for recovery and surfaces the error to the user rather than looping.
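To see the circuit breaker trip, the sketch below models only the `prompt_too_long` branch as a synchronous function with injected helpers. The `surface_error` outcome, the callback signatures, and the type names are hypothetical, and the real compaction step is asynchronous.

```typescript
interface CompactState {
  messages: string[];
  hasAttemptedReactiveCompact: boolean;
}

type PromptTooLongOutcome =
  | { transition: "collapse_drain_retry"; committedCount: number }
  | { transition: "reactive_compact_retry"; summary: string }
  | { transition: "surface_error" }; // assumed name for "give up"

function handlePromptTooLong(
  state: CompactState,
  tryCollapseDrain: (s: CompactState) => number, // number of collapses applied
  compact: (messages: string[]) => string,       // returns a summary
): PromptTooLongOutcome {
  // Breaker already tripped: do not attempt recovery again.
  if (state.hasAttemptedReactiveCompact) {
    return { transition: "surface_error" };
  }
  // Cheap path first.
  const committedCount = tryCollapseDrain(state);
  if (committedCount > 0) {
    return { transition: "collapse_drain_retry", committedCount };
  }
  // Expensive path, allowed exactly once.
  const summary = compact(state.messages);
  state.hasAttemptedReactiveCompact = true;
  return { transition: "reactive_compact_retry", summary };
}
```

If the collapse drain frees nothing, the first call compacts and the second call surfaces the error, so the expensive summarization can run at most once per conversation state.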
State Transitions
Inside the while(true) loop there is a hidden state machine. Each state name corresponds to a recovery path:
| State | Trigger | What Happens |
|---|---|---|
| `next_turn` | Normal tool results | Continue with injected tool results |
| `max_output_tokens_escalate` | First overrun | Token budget 8K → 64K (once only) |
| `max_output_tokens_recovery` | Continued overrun | Inject recovery message (max ×3) |
| `token_budget_continuation` | Budget exhausted mid-task | Auto-continue with budget warning |
| `collapse_drain_retry` | 413 context too large (tried first) | Cheaper collapse drain, retry |
| `reactive_compact_retry` | 413 and collapse drain freed nothing | LLM summarizes context, retry (once only) |
| `stop_hook_retry` | Stop hook blocked output | Re-enter loop after hook resolution |
These states are never shown to you directly — they are internal transitions that keep the loop alive through failure modes.
Why This Matters to You
- Why Claude “thinks forever”: It is in a tool execution loop with many iterations. Each `tool_use` → `tool_result` cycle is one loop turn. A task that reads 10 files, runs tests, and patches 3 bugs may take 15+ turns.
- Why `/compact` exists: It is proactive defense. The compaction threshold fires before the 413 emergency. Running `/compact` manually during a long session brings the token count back down before the escalation ladder kicks in.
- Why Claude says “resuming” after long outputs: The `max_output_tokens_recovery` state injected the recovery message. Claude is picking up mid-task after hitting the output limit.
- Why errors sometimes switch to a different model: Repeated API errors escalate to a fallback model. If the primary model keeps returning errors, the system retries on a backup model automatically.
- Why the recovery message sounds terse: That terseness is intentional engineering. Every word Claude doesn’t write is tokens saved for the actual task.
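The `/compact` bullet above relies on the proactive threshold sitting below the hard context limit. A minimal sketch of that relationship, assuming the threshold is expressed as a fraction of the context window: `compactRatio`, `classifyContextPressure`, and the numbers in the usage note are hypothetical, since the document does not state the real threshold.

```typescript
type ContextPressure = "ok" | "proactive_compact" | "over_limit";

function classifyContextPressure(
  tokenCount: number,
  maxContext: number,
  compactRatio: number, // assumed: fraction of maxContext that triggers compaction
): ContextPressure {
  // Past the hard limit: the API would reject the request (the 413 case).
  if (tokenCount > maxContext) return "over_limit";
  // Past the soft threshold: compact now, before the emergency path is needed.
  if (tokenCount > maxContext * compactRatio) return "proactive_compact";
  return "ok";
}
```

With an illustrative 200,000-token window and a ratio of 0.8, a conversation at 170,000 tokens would be classified as `proactive_compact`, well before the request would be rejected.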