The Agent Loop

Every interaction with Claude Code runs through a single async generator loop. Not a request-response pair. Not a pipeline. A while(true) loop that keeps running until Claude decides it is done.

Understanding this loop explains most of the behavior you observe: why Claude sometimes takes many steps, why it recovers from errors automatically, and why /compact exists.

The 4-Phase Loop

Figure: The 4-phase loop
New user prompt → Phase 1: Context Assembly (check whether compaction is needed, build the system prompt, attach memories, calculate the token budget)
→ Phase 2: Stream API Call (call the LLM with streaming, detect tool_use blocks, set needsFollowUp = true, start the streaming executor)
→ Phase 3: Tool Execution (run the remaining tools in concurrent or serial batches, collect results, apply context modifiers, inject file/memory notifications)
→ Phase 4: Stop or Continue? (is needsFollowUp false?)
  Yes, done → return to the user
  No, continue → build the next state (inject tool_results) → back to Phase 1
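The loop in the figure can be sketched as a TypeScript async generator. This is a minimal sketch under stated assumptions, not Claude Code's source: the block shapes loosely follow the tool_use/tool_result convention, and `callModel`, `runTool`, and `AgentEvent` are hypothetical stand-ins. Phase 1 is reduced to passing the message history, and tools run one at a time.

```typescript
// Hypothetical message shapes, loosely modeled on tool_use / tool_result blocks.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type Block = TextBlock | ToolUseBlock | ToolResultBlock;
type Message = { role: "user" | "assistant"; content: Block[] };

// Events the generator yields to its consumer (e.g. a terminal UI).
type AgentEvent =
  | { kind: "text"; text: string }
  | { kind: "tool_result"; result: ToolResultBlock }
  | { kind: "done"; turns: number };

async function* agentLoop(
  prompt: string,
  callModel: (messages: Message[]) => Promise<Block[]>, // hypothetical model client
  runTool: (call: ToolUseBlock) => Promise<string>,     // hypothetical tool runner
): AsyncGenerator<AgentEvent> {
  const messages: Message[] = [{ role: "user", content: [{ type: "text", text: prompt }] }];
  let turns = 0;

  while (true) {
    turns++;
    // Phase 1 (simplified): the context is just the message history.
    // Phase 2: call the model and surface any text immediately.
    const blocks = await callModel(messages);
    messages.push({ role: "assistant", content: blocks });
    for (const block of blocks) {
      if (block.type === "text") yield { kind: "text", text: block.text };
    }

    // Phase 3: run every requested tool (serially, for simplicity).
    const toolUses = blocks.filter((b): b is ToolUseBlock => b.type === "tool_use");
    const results: ToolResultBlock[] = [];
    for (const call of toolUses) {
      const result: ToolResultBlock = {
        type: "tool_result",
        tool_use_id: call.id,
        content: await runTool(call),
      };
      results.push(result);
      yield { kind: "tool_result", result };
    }

    // Phase 4: no tool_use blocks means the model is done.
    if (toolUses.length === 0) {
      yield { kind: "done", turns };
      return;
    }
    // Otherwise feed the results back and loop again.
    messages.push({ role: "user", content: results });
  }
}
```

Because it is a generator, the caller consumes events with `for await` and can stop early simply by breaking out of its loop.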

Phase 1: Context Assembly

Before calling the LLM, the system assembles what it knows:

FUNCTION assembleContext(state):
  IF countTokens(state.messages) > compactThreshold:
    state.messages = triggerProactiveCompact(state.messages)

  systemPrompt = buildSystemPrompt(rules, memories, projectContext)
  messages     = attachMemories(state.messages, prefetchedMemory)
  tokenBudget  = calculateBudget(maxContext, countTokens(systemPrompt, messages))

  RETURN { systemPrompt, messages, tokenBudget }
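A runnable version of these steps, as a sketch only: the 4-characters-per-token estimate, the `compact` callback, and modeling memories as extra user messages are assumptions for illustration, not details of Claude Code. The budget is computed last, after compaction and memory attachment, so it reflects everything that will actually be sent.

```typescript
interface Msg { role: "user" | "assistant"; text: string }

interface AssembleOptions {
  rules: string[];                      // hypothetical: rules that go into the system prompt
  maxContext: number;                   // context window size, in tokens
  compactThreshold: number;             // proactive-compact trigger, in tokens
  compact: (messages: Msg[]) => Msg[];  // hypothetical summarizer
}

// Assumption: roughly 4 characters per token. A real system would use a tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
const sumTokens = (messages: Msg[]): number =>
  messages.reduce((total, m) => total + estimateTokens(m.text), 0);

function assembleContext(history: Msg[], memories: string[], opts: AssembleOptions) {
  // 1. Compact first, so every later step works on the smaller history.
  const compacted = sumTokens(history) > opts.compactThreshold;
  const base = compacted ? opts.compact(history) : history;

  // 2. Build the system prompt.
  const systemPrompt = opts.rules.join("\n");

  // 3. Attach prefetched memories (modeled here as extra user messages).
  const messages: Msg[] = [...memories.map((text): Msg => ({ role: "user", text })), ...base];

  // 4. Budget last, once everything that consumes tokens is in place.
  const used = estimateTokens(systemPrompt) + sumTokens(messages);
  return { systemPrompt, messages, compacted, tokenBudget: Math.max(0, opts.maxContext - used) };
}
```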

Phase 2: Stream API Call

The LLM response arrives as a stream, not a single blob. The system watches the stream for tool_use blocks and begins executing them immediately — without waiting for the response to finish.

FUNCTION streamAPICall(context, state):
  FOR chunk IN llmStream(context):
    IF chunk.type == "tool_use":
      streamingExecutor.start(chunk)   // start immediately
      state.needsFollowUp = true
    ELSE IF chunk.type == "text":
      terminalUI.render(chunk.text)

  RETURN state
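The key property, starting a tool the moment its tool_use block arrives, can be shown with an async iterable of chunks. In this sketch the `Chunk` shape, `startTool`, and `render` are hypothetical; a real stream also delivers tool input incrementally, which is omitted here.

```typescript
type Chunk =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string };

interface ToolCall { id: string; name: string }

interface StreamOutcome {
  needsFollowUp: boolean;
  running: Promise<string>[]; // tools started while the stream was still arriving
}

async function streamAPICall(
  stream: AsyncIterable<Chunk>,
  startTool: (call: ToolCall) => Promise<string>, // hypothetical streaming executor
  render: (text: string) => void,                 // hypothetical terminal renderer
): Promise<StreamOutcome> {
  const running: Promise<string>[] = [];
  let needsFollowUp = false;
  for await (const chunk of stream) {
    if (chunk.type === "tool_use") {
      const started = startTool(chunk); // begins executing now, before the stream ends
      started.catch(() => {});          // avoid an unhandled-rejection crash; awaiting `running` still surfaces errors
      running.push(started);
      needsFollowUp = true;
    } else {
      render(chunk.text);
    }
  }
  return { needsFollowUp, running };
}
```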

Phase 3: Tool Execution

After the stream ends, the streaming executor has already started some tools. The remaining tools are executed in batches — concurrent where safe, serial where not. See Tool Orchestration for how batching decisions are made.

FUNCTION executeTools(pendingTools):
  allResults = []
  batches = partitionIntoBatches(pendingTools)
  FOR batch IN batches:
    IF batch.isConcurrent:
      results = await Promise.all(batch.tools.map(run))
    ELSE:
      results = await runSerially(batch.tools)

    applyContextModifiers(results)
    injectFileChangeNotifications(results)
    allResults.extend(results)    // keep results in batch order

  RETURN allResults
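One plausible batching rule, assumed here purely for illustration (the actual rules are covered in Tool Orchestration): consecutive concurrency-safe calls share a batch, and each unsafe call runs alone. Results are appended batch by batch, so their order matches the order of the calls. The `concurrencySafe` flag and both function names are hypothetical.

```typescript
interface ToolCall { id: string; name: string; concurrencySafe: boolean }
interface Batch { isConcurrent: boolean; tools: ToolCall[] }

// Assumed rule: group runs of concurrency-safe calls; give every unsafe call its own batch.
function partitionIntoBatches(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.concurrencySafe && last?.isConcurrent) {
      last.tools.push(call);
    } else {
      batches.push({ isConcurrent: call.concurrencySafe, tools: [call] });
    }
  }
  return batches;
}

async function executeTools(
  calls: ToolCall[],
  run: (call: ToolCall) => Promise<string>, // hypothetical tool runner
): Promise<string[]> {
  const allResults: string[] = [];
  for (const batch of partitionIntoBatches(calls)) {
    if (batch.isConcurrent) {
      // Safe tools (e.g. reads) run in parallel; Promise.all preserves input order.
      allResults.push(...(await Promise.all(batch.tools.map(run))));
    } else {
      // Unsafe tools (e.g. edits) run one at a time.
      for (const tool of batch.tools) allResults.push(await run(tool));
    }
  }
  return allResults;
}
```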

Phase 4: Stop or Continue

The loop asks one question: does the response contain unresolved tool_use blocks?

IF state.needsFollowUp == false:
  RETURN finalResponse         // done
ELSE:
  state = buildNextState(state, toolResults)
  CONTINUE                     // loop again
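A sketch of both branches, assuming a simplified state with only `messages`, `turnCount`, and `transition` (a subset of the LoopState in the deep dive below). `stopOrContinue` and its types are hypothetical. It returns a new state object instead of mutating the old one, which keeps each iteration's input intact for inspection.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };
interface Message { role: "user" | "assistant"; content: Block[] }
type ToolResult = Extract<Block, { type: "tool_result" }>;

interface LoopStateLite {
  messages: Message[];
  turnCount: number;
  transition: { reason: "next_turn"; metadata: { toolResults: ToolResult[] } } | null;
}

type Phase4Outcome =
  | { done: true; finalResponse: Message }
  | { done: false; nextState: LoopStateLite };

function stopOrContinue(
  state: LoopStateLite,
  assistant: Message,
  needsFollowUp: boolean,
  toolResults: ToolResult[],
): Phase4Outcome {
  if (!needsFollowUp) {
    return { done: true, finalResponse: assistant };
  }
  // Fresh state: history + assistant turn + one user turn carrying the tool results.
  const nextState: LoopStateLite = {
    messages: [...state.messages, assistant, { role: "user", content: toolResults }],
    turnCount: state.turnCount + 1,
    transition: { reason: "next_turn", metadata: { toolResults } },
  };
  return { done: false, nextState };
}
```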

Deep Dive: Loop State Object

The state object passed between iterations carries everything the loop needs:

interface LoopState {
  messages: Message[]                    // Full conversation history
  toolUseContext: {
    tools: Tool[]                        // Available tools for this agent
    permissions: PermissionConfig        // Current permission settings
    agentConfig: AgentDefinition | null  // Agent-specific config if subagent
  }
  autoCompactTracking: {
    hasCompacted: boolean                // True after auto-compact fires
    compactedAtTurn: number              // Turn number when compact happened
  }
  maxOutputTokensRecoveryCount: number   // 0-3, increments per recovery attempt
  hasAttemptedReactiveCompact: boolean   // Circuit breaker: true after first 413 recovery
  turnCount: number                      // Current iteration count
  transition: {
    reason: TransitionReason             // Why this iteration started
    metadata: Record<string, unknown>    // Per-reason data (see table below)
  }
}

Each transition reason carries specific metadata:

Reason	Metadata
`next_turn`	`{ toolResults: ToolResult[] }` — results to inject
`max_output_tokens_escalate`	`{ newBudget: 64000 }` — one-time budget increase
`max_output_tokens_recovery`	`{ attempt: number, maxAttempts: 3 }` — recovery counter
`reactive_compact_retry`	`{ summary: string }` — compact summary to use
`collapse_drain_retry`	`{ committedCount: number }` — collapses applied
`token_budget_continuation`	`{ remainingBudget: number }` — auto-continue with budget
`stop_hook_retry`	`{ hookErrors: string[] }` — errors from hooks to inject
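In TypeScript, the reason/metadata pairing in this table can be encoded as a discriminated union, so the compiler checks that each reason carries the right metadata. This is an alternative typing offered as a sketch, not necessarily how the LoopState above is declared; `ToolResult` and `describeTransition` are hypothetical.

```typescript
interface ToolResult { toolUseId: string; content: string } // hypothetical shape

// Each reason is paired with exactly the metadata listed in the table.
type Transition =
  | { reason: "next_turn"; metadata: { toolResults: ToolResult[] } }
  | { reason: "max_output_tokens_escalate"; metadata: { newBudget: 64000 } }
  | { reason: "max_output_tokens_recovery"; metadata: { attempt: number; maxAttempts: 3 } }
  | { reason: "reactive_compact_retry"; metadata: { summary: string } }
  | { reason: "collapse_drain_retry"; metadata: { committedCount: number } }
  | { reason: "token_budget_continuation"; metadata: { remainingBudget: number } }
  | { reason: "stop_hook_retry"; metadata: { hookErrors: string[] } };

// The string union of all reasons, derived from the union above.
type TransitionReason = Transition["reason"];

function describeTransition(t: Transition): string {
  switch (t.reason) {
    case "next_turn":
      return `inject ${t.metadata.toolResults.length} tool result(s)`;
    case "max_output_tokens_escalate":
      return `raise output budget to ${t.metadata.newBudget}`;
    case "max_output_tokens_recovery":
      return `recovery attempt ${t.metadata.attempt}/${t.metadata.maxAttempts}`;
    case "reactive_compact_retry":
      return `retry with a ${t.metadata.summary.length}-char summary`;
    case "collapse_drain_retry":
      return `retry after ${t.metadata.committedCount} collapse(s)`;
    case "token_budget_continuation":
      return `continue with ${t.metadata.remainingBudget} tokens left`;
    case "stop_hook_retry":
      return `inject ${t.metadata.hookErrors.length} hook error(s)`;
    default: {
      // Exhaustiveness check: adding a reason without handling it fails to compile.
      const unhandled: never = t;
      return unhandled;
    }
  }
}
```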

The Key Decision: needsFollowUp

Claude Code does not trust the API’s stop_reason metadata. Instead it observes behavior: are there tool_use blocks in the response?

If yes → needsFollowUp = true → continue the loop
If no → needsFollowUp = false → return to user

Principle: observe behavior, not labels. stop_reason: "end_turn" can appear alongside tool calls. The system ignores the label and checks the actual content. This makes the loop robust to unexpected API behavior.
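The check itself is small. A sketch, with a hypothetical `AssistantResponse` shape: `needsFollowUp` looks only at the content blocks and never reads `stop_reason`.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string };

interface AssistantResponse {
  stop_reason: string | null; // reported by the API, deliberately ignored below
  content: ContentBlock[];
}

// Observe behavior: follow up if and only if the model actually asked for a tool.
function needsFollowUp(response: AssistantResponse): boolean {
  return response.content.some((block) => block.type === "tool_use");
}
```

A label-based check such as `response.stop_reason === "tool_use"` would end the loop early on a response labeled `end_turn` that still contains a tool call.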

Escalating Recovery

When things go wrong, the system does not retry blindly. It escalates through levels, each with a circuit breaker to prevent infinite loops.

Figure: The escalation ladder
Error or overrun detected
→ Output too long: increase the budget 8K → 64K (once only)
→ Still too long: inject a recovery message (up to 3 times)
→ Context too large (413): try a context collapse drain, then a reactive compact
→ API error: try a fallback model
→ All else fails: surface the error to the user
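The first two rungs, which handle output overruns, form a small counter-driven state machine. A sketch: the 64K target and the limit of 3 come from the ladder above, while the field names and the `give_up` action (handing off to the later rungs) are assumptions.

```typescript
interface OutputRecoveryState {
  outputBudget: number;  // current max output tokens
  escalated: boolean;    // the budget bump happens at most once
  recoveryCount: number; // recovery messages injected so far
}

type OutputAction =
  | { kind: "escalate"; newBudget: number }
  | { kind: "recover"; attempt: number }
  | { kind: "give_up" }; // hypothetical: hand off to the later rungs

const ESCALATED_BUDGET = 64000;
const MAX_RECOVERY_ATTEMPTS = 3;

// Called each time a response hits the output limit; returns the action and the updated state.
function onOutputOverrun(s: OutputRecoveryState): [OutputAction, OutputRecoveryState] {
  if (!s.escalated) {
    return [
      { kind: "escalate", newBudget: ESCALATED_BUDGET },
      { ...s, outputBudget: ESCALATED_BUDGET, escalated: true },
    ];
  }
  if (s.recoveryCount < MAX_RECOVERY_ATTEMPTS) {
    const attempt = s.recoveryCount + 1;
    return [{ kind: "recover", attempt }, { ...s, recoveryCount: attempt }];
  }
  // Circuit breaker: both rungs exhausted, stop retrying this failure.
  return [{ kind: "give_up" }, s];
}
```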

Death spiral prevention: when an API error occurs, Stop hooks are skipped. Why? Stop hooks can inject additional context into the conversation — which would add more tokens to an already-overloaded context, making the problem worse. The system deliberately suppresses them during error recovery.

The Recovery Message

When the output is still too long after the budget escalation, the system injects a recovery prompt to continue:

"Resume directly — no apology, no recap of what you were doing.
Pick up mid-thought. Break remaining work into smaller pieces."

Every word is engineered to save tokens:

Phrase	Token-saving purpose
“Resume directly”	Prevents a preamble paragraph
“no apology”	Prevents “I’m sorry for the confusion…” (a typical LLM opening)
“no recap”	Prevents a summary of what was already done (wastes 50–200 tokens)
“Pick up mid-thought”	Allows continuing an incomplete sentence, skipping polite framing
“Break remaining work into smaller pieces”	Prevents the same overrun on the next turn

Deep Dive: Death Spiral Prevention

When the API returns an error, the loop has a subtle risk: stop hooks might inject additional tokens into the conversation. If the error was prompt_too_long, those extra tokens make the next request even larger — creating an infinite loop.

FUNCTION handleError(error, state):
  // CRITICAL: Skip stop hooks on API errors
  // Stop hooks inject context (error summaries, status messages)
  // If error is prompt_too_long, more context = worse
  // "error → hook → retry → error → hook → retry → ..."

  IF isAPIError(error):     // any API error, including prompt_too_long
    skipStopHooks = true    // Break the spiral

  IF error.type == "prompt_too_long" AND NOT state.hasAttemptedReactiveCompact:
    // Try context collapse drain first (cheap)
    collapsed = tryCollapseDrain(state)
    IF collapsed:
      RETURN { transition: "collapse_drain_retry" }

    // Then reactive compact (expensive but effective)
    summary = await compactConversation(state.messages)
    state.hasAttemptedReactiveCompact = true  // Circuit breaker: never retry
    RETURN { transition: "reactive_compact_retry", summary }

  // If reactive compact already tried and still failing → give up
  surfaceErrorToUser(error)

The circuit breaker pattern: hasAttemptedReactiveCompact is set to true after the first reactive compact attempt. If the next API call still returns 413, the system concludes the conversation is fundamentally too large for recovery and surfaces the error to the user rather than looping.
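The 413 path of the pseudocode can be sketched as runnable TypeScript. `pendingCollapses` (staged collapses that a drain can commit) and the synchronous `summarize` callback are assumptions made to keep the example self-contained; in the pseudocode above, compaction is asynchronous. Every outcome here stems from an API error, so the caller skips stop hooks for all of them.

```typescript
interface ContextState {
  messages: string[];
  pendingCollapses: number;            // hypothetical: collapses staged but not yet committed
  hasAttemptedReactiveCompact: boolean;
}

type PromptTooLongRecovery =
  | { transition: "collapse_drain_retry"; committedCount: number }
  | { transition: "reactive_compact_retry"; summary: string }
  | { transition: "surface_error" };

// Handles one prompt_too_long (413) response. Stop hooks are skipped on every path.
function handlePromptTooLong(
  state: ContextState,
  summarize: (messages: string[]) => string, // hypothetical summarizer
): PromptTooLongRecovery {
  if (state.hasAttemptedReactiveCompact) {
    // Circuit breaker: compaction already failed to fix it, so stop looping.
    return { transition: "surface_error" };
  }
  if (state.pendingCollapses > 0) {
    // Cheap path first: commit the staged collapses and retry.
    const committedCount = state.pendingCollapses;
    state.pendingCollapses = 0;
    return { transition: "collapse_drain_retry", committedCount };
  }
  // Expensive path: replace history with a summary, exactly once.
  const summary = summarize(state.messages);
  state.messages = [summary];
  state.hasAttemptedReactiveCompact = true;
  return { transition: "reactive_compact_retry", summary };
}
```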

State Transitions

Inside the while(true) loop there is a hidden state machine. Each state name records why the next iteration started; every state except next_turn corresponds to a recovery path:

State	Trigger	What Happens
`next_turn`	Normal tool results	Continue with injected tool results
`max_output_tokens_escalate`	First overrun	Token budget 8K → 64K (once only)
`max_output_tokens_recovery`	Continued overrun	Inject recovery message (max ×3)
`token_budget_continuation`	Budget exhausted mid-task	Auto-continue with budget warning
`collapse_drain_retry`	413 context too large (tried first)	Cheaper collapse drain, retry
`reactive_compact_retry`	413 that collapse drain cannot fix	LLM summarizes context, retry (once only)
`stop_hook_retry`	Stop hook blocked output	Re-enter loop after hook resolution

These states are never shown to you directly — they are internal transitions that keep the loop alive through failure modes.

Why This Matters to You

Why Claude “thinks forever”: It is in a tool execution loop with many iterations. Each tool_use → tool_result cycle is one loop turn. A task that reads 10 files, runs tests, and patches 3 bugs may take 15+ turns.
Why /compact exists: It is a proactive defense. Automatic compaction fires at a threshold below the context limit, before the 413 emergency path. Running /compact manually during a long session shrinks the context, and with it the token count, before the escalation ladder kicks in.
Why Claude says “resuming” after long outputs: The max_output_tokens_recovery state injected the recovery message. Claude is picking up mid-task after hitting the output limit.
Why errors sometimes switch to a different model: This is the "API error → fallback model" rung of the escalation ladder. If the primary model returns repeated errors, the system automatically retries on a backup model.
Why the recovery message sounds terse: The terseness is intentional engineering. Every word Claude does not write saves tokens for the actual task.