Interactive deep-dive into Anthropic's CLI for Claude — a production-grade AI engineering system
Anthropic's official command-line interface for interacting with Claude AI directly from the terminal.
Claude Code is a textbook example of how to build a production AI agent system. It demonstrates tool-augmented LLMs, streaming architecture, permission systems, multi-agent orchestration, and extensible plugin architectures — all concepts you'll need as an AI engineer.
Claude Code is not just a chat CLI. It's a full agentic system that can read/write files, execute shell commands, search the web, manage tasks, spawn sub-agents, integrate with external services via MCP, and orchestrate multi-agent "swarms" — all while enforcing a sophisticated permission model and streaming results in real time through a React-powered terminal UI.
Click any layer to see details. The system is organized in clean layers from entry points down to external integrations.
tools/. Tools implement a common Tool interface defined in Tool.ts. The tools.ts file registers and assembles all tools. StreamingToolExecutor handles concurrent execution, ordering results, and managing exclusive vs concurrent-safe tools. Tools are the primary way Claude interacts with the outside world.
.claude/skills/. Examples: batch processing, code simplification, scheduling.
Click any card to expand details. Each module represents a major system within Claude Code.
Files: query.ts (68KB), QueryEngine.ts (46KB)
The query engine is an async generator — a powerful pattern for streaming AI systems. It yields events as they happen:
Using an async generator for the main agent loop is elegant: each yield is a checkpoint where the UI can render, the user can interrupt, and state can be saved. This is a pattern worth adopting in your own AI systems.
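The pattern can be sketched in a few lines. This is a minimal illustration with hypothetical names, not the real `query.ts`: each `yield` is a point where a consumer can render, persist state, or stop iterating.

```typescript
// Minimal sketch of an async-generator agent loop (names are illustrative).
type AgentEvent =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; name: string }
  | { kind: "done" };

async function* agentLoop(prompt: string): AsyncGenerator<AgentEvent> {
  // Stand-in for a real model call; a production loop would stream from an API.
  yield { kind: "text", text: `Thinking about: ${prompt}` };
  yield { kind: "tool_call", name: "FileReadTool" };
  yield { kind: "done" };
}

async function run(): Promise<AgentEvent[]> {
  const events: AgentEvent[] = [];
  for await (const ev of agentLoop("list files")) {
    events.push(ev); // the consumer decides when to render, save, or interrupt
  }
  return events;
}
```

Because the generator suspends at every `yield`, interruption is just "stop calling `next()`" — no cancellation plumbing required.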
Interface: Tool.ts (29KB) — defines the common tool contract
Registry: tools.ts (17KB) — assembles and registers all tools
Executor: StreamingToolExecutor.ts — concurrent execution engine
Each tool provides:
This is the ReAct pattern (Reason + Act). The LLM decides which tool to call, the system executes it, and the result goes back to the LLM. Claude Code's implementation shows production-grade patterns: concurrent execution, permission checks, streaming results, and graceful error handling.
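A hedged sketch of what such a tool contract and one "Act" step might look like — the interface fields mirror the description above (schema, description, execution function, read-only flag), not the actual `Tool.ts` source:

```typescript
// Illustrative tool contract; field names are assumptions, not the real source.
interface Tool {
  name: string;
  description: string;
  inputSchema: object;   // JSON Schema the model uses to construct valid calls
  isReadOnly: boolean;   // lets a permission layer treat reads more leniently
  execute(input: Record<string, unknown>): Promise<string>;
}

const echoTool: Tool = {
  name: "Echo",
  description: "Returns its input unchanged",
  inputSchema: { type: "object", properties: { text: { type: "string" } } },
  isReadOnly: true,
  execute: async (input) => String(input.text),
};

// One Act step: the model chose a tool; execute it and return the observation.
async function act(tools: Tool[], name: string, input: Record<string, unknown>) {
  const tool = tools.find((t) => t.name === name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.execute(input); // the result is fed back to the LLM as an observation
}
```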
24+ files in utils/permissions/
Permission checking flows through multiple layers:
Key insight: permissions are tool-specific. A BashTool command gets different scrutiny than a FileReadTool call. The system knows which tools are read-only vs. state-modifying.
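The layering can be illustrated with a toy decision function — a simplification under the assumptions above (read-only fast path, then rules, then an interactive prompt), not the real 24-file implementation:

```typescript
// Toy layered permission check; the real system has many more layers.
type Decision = "allow" | "deny" | "ask";
type Rule = { tool: string; decision: Decision };

function checkPermission(
  toolName: string,
  isReadOnly: boolean,
  rules: Rule[],
): Decision {
  if (isReadOnly) return "allow";                 // layer 1: read-only fast path
  const rule = rules.find((r) => r.tool === toolName);
  if (rule) return rule.decision;                 // layer 2: user/project rules
  return "ask";                                   // layer 3: interactive prompt
}
```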
File: services/api/claude.ts (~125KB)
The API layer handles:
Streaming via the @anthropic-ai/sdk streaming API with delta events
Retries via withRetry.ts
Note the separation: the API client handles transport, retries, and format conversion. Business logic stays in the query engine. This separation of concerns is critical for maintainability in AI applications.
Claude Code renders using Ink, a React renderer for the terminal. This means:
React's declarative model makes complex TUI state management tractable. As AI tools get more interactive (progress bars, multi-step confirmations, live streaming), a component model pays for itself.
AgentTool (234KB) spawns independent sub-agents that:
Agents communicate via SendMessageTool. Coordinator mode (coordinatorMode.ts) orchestrates "swarms" of agents working together on complex tasks. TeamCreate/TeamDelete tools manage agent teams.
This is a hierarchical multi-agent system: a coordinator delegates to workers, each with specialized capabilities and isolation. Key challenges: communication, state sharing, and permission inheritance. Claude Code solves these with message passing, worktree isolation, and cascading permissions.
25+ files in services/mcp/
MCP is an open protocol that lets Claude connect to external tool servers:
MCP decouples the AI system from specific tool implementations. Instead of hardcoding integrations, you connect to standardized servers. This is the future of AI tool ecosystems — learn this protocol well.
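At the wire level, MCP is JSON-RPC 2.0. The sketch below shows the rough shape of a `tools/list` and a `tools/call` request — simplified, and omitting the `initialize` handshake and transport framing; consult the MCP specification for the authoritative message shapes:

```typescript
// Rough shape of MCP requests (JSON-RPC 2.0); simplified for illustration.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
};

function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

function callToolRequest(id: number, name: string, args: object): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}
```

The client never needs to know whether the server behind these messages is a local script or a remote API — that is the decoupling described above.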
Files: memdir/ directory
The memory system gives Claude persistent context:
findRelevantMemories.ts selects which memories to load.
This is a file-based RAG system without the vector DB. Instead of embedding similarity, it uses structured metadata and relevance heuristics. For many use cases, this simpler approach works just as well.
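A toy version of metadata-based relevance scoring shows the idea — keyword overlap plus recency, no embeddings. The field names and weights here are illustrative, not those of `findRelevantMemories.ts`:

```typescript
// Toy relevance heuristic: keyword overlap minus age, no vector DB needed.
type Memory = { text: string; tags: string[]; lastUsed: number };

function scoreMemory(memory: Memory, queryTerms: string[], now: number): number {
  const overlap = queryTerms.filter(
    (t) => memory.tags.includes(t) || memory.text.toLowerCase().includes(t),
  ).length;
  const ageDays = (now - memory.lastUsed) / 86_400_000;
  return overlap * 10 - ageDays; // recent, on-topic memories win
}

function selectMemories(memories: Memory[], query: string, now: number, k: number) {
  const terms = query.toLowerCase().split(/\s+/);
  return [...memories]
    .sort((a, b) => scoreMemory(b, terms, now) - scoreMemory(a, terms, now))
    .slice(0, k);
}
```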
From `claude` command to interactive session — click each step for details.
COORDINATOR_MODE, DAEMON, BRIDGE_MODE route to specialized entry points. This is a common startup optimization in large CLI apps.
What happens when you type a message and press Enter. This is the core agent loop.
This is the Reason-Act-Observe loop that powers all modern AI agents. Claude reasons about what to do, acts by calling tools, observes the results, and repeats. The loop continues until Claude decides no more tools are needed. Understanding this loop is fundamental to AI engineering.
All 44+ tools organized by category. Click a category to expand.
Notice how each tool has a clear single responsibility, a declarative JSON Schema for inputs, and explicit read-only vs. mutating classification. This enables the permission system to make fine-grained decisions. When building your own AI tools, follow this pattern: small, well-defined, self-describing tools.
Multi-layered security model. Click each mode for details.
In default mode, every tool invocation triggers an interactive prompt in the terminal. The user can approve, deny, or set a rule for future calls. This is the safest mode and helps you understand exactly what Claude is doing.
Best for: New users, sensitive codebases, learning how Claude Code works.
The YOLO classifier (yoloClassifier.ts, 52KB) uses pattern matching to classify tool calls as safe or dangerous. Safe operations (reads, searches) are auto-approved. Dangerous operations (deletes, force pushes, writes to sensitive paths) still require user approval.
Classification pipeline:
Best for: Experienced users who want speed but still want guardrails.
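A much-simplified stand-in for pattern-based classification illustrates the flow: match deny patterns first, then allow patterns, and escalate anything unrecognized. These regexes are examples, not the real classifier's rules:

```typescript
// Simplified command classifier: deny patterns win, unknowns escalate.
type Verdict = "safe" | "dangerous" | "needs_review";

const DANGEROUS = [/\brm\s+-rf\b/, /\bgit\s+push\s+--force\b/, /\bsudo\b/];
const SAFE = [/^git\s+(status|log|diff)\b/, /^ls\b/, /^cat\b/, /^grep\b/];

function classifyCommand(cmd: string): Verdict {
  if (DANGEROUS.some((re) => re.test(cmd))) return "dangerous";
  if (SAFE.some((re) => re.test(cmd))) return "safe";
  return "needs_review"; // unknown commands fall back to user approval
}
```

Checking dangerous patterns before safe ones matters: `git status && rm -rf /` must not slip through on its harmless prefix.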
In plan mode, Claude can read and analyze but cannot make changes. This is useful for:
Use EnterPlanModeTool / ExitPlanModeTool to toggle.
Every tool call is automatically approved without any checks. This is the fastest mode but provides no safety net.
Warning: Only use in sandboxed environments, CI pipelines, or when you fully trust the operation. Claude can delete files, run arbitrary commands, and push code without asking.
How Claude Code spawns, manages, and coordinates multiple AI agents.
Built-in agent types: general-purpose, Explore (fast codebase search), Plan (architecture), statusline-setup, claude-code-guide. Each type has a curated tool set optimized for its task.
Each agent gets its own git worktree — a separate working directory with its own branch. Changes are isolated. If the agent fails, the worktree is cleaned up. If changes are good, they can be merged back.
Each sub-agent has its own context window. It doesn't see the parent's full conversation. This prevents context pollution and allows agents to focus on their specific task with maximum context budget.
Sub-agents inherit permission settings from their parent. A restricted parent cannot spawn an unrestricted child. This ensures the security model is maintained throughout the agent hierarchy.
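The invariant "a restricted parent cannot spawn an unrestricted child" amounts to a set intersection. A minimal sketch (the function name is hypothetical):

```typescript
// Cascading permissions: a child may use only tools its parent also allows.
function childPermissions(
  parentAllowed: Set<string>,
  requested: Set<string>,
): Set<string> {
  return new Set([...requested].filter((t) => parentAllowed.has(t)));
}
```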
Agents receive a snapshot of relevant memories from the parent session. They can read but not write to the parent's memory. This gives context without risking memory corruption.
Agents can run asynchronously in the background:
The open protocol for connecting Claude to external tool servers. Click components for details.
services/mcp/client.ts implements the full MCP client:
services/mcp/auth.ts handles:
Elicitation is a unique MCP feature: servers can ask the user for additional information during tool execution. The ElicitationDialog.tsx (179KB!) renders rich interactive forms in the terminal for:
MCP separates tool definition from tool implementation. The AI sees a standard tool interface; the implementation can be a local script, a remote API, or a complex distributed system. This is the adapter pattern applied to AI tooling — a powerful abstraction for building extensible AI systems.
Extensibility through lifecycle hooks. Hooks let you inject custom behavior at key moments.
Run before the API call. Can modify the messages, add context, or block the request. Use for prompt engineering, guardrails, or dynamic context injection.
Run after Claude's response. Can modify the response, log it, trigger side effects. Use for output filtering, analytics, or automated follow-ups.
Run before/after each tool execution. Can approve, deny, modify, or log tool calls. Use for auditing, custom permission logic, or tool result caching.
Lifecycle events: session start, end, pause, resume. Use for environment setup/teardown, logging, or automated git operations.
Triggered when files change on disk. Use for auto-formatting, linting, test running, or notifying other tools of changes.
Send webhooks to external services. Use for Slack notifications, CI triggers, audit logs, or integration with other systems.
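The hook types above share one mechanical core: hooks are registered against named events, fire in order, and can veto the action. This is an illustrative registry sketch — the event names and result shape are assumptions, not the real `settings.json` config keys:

```typescript
// Illustrative hook registry: hooks fire per event; the first veto wins.
type HookResult = { allow: boolean; note?: string };
type Hook = (payload: Record<string, unknown>) => HookResult;

const hooks = new Map<string, Hook[]>();

function registerHook(event: string, hook: Hook) {
  const list = hooks.get(event) ?? [];
  list.push(hook);
  hooks.set(event, list);
}

function fireHooks(event: string, payload: Record<string, unknown>): HookResult {
  for (const hook of hooks.get(event) ?? []) {
    const result = hook(payload);
    if (!result.allow) return result; // first veto short-circuits
  }
  return { allow: true };
}
```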
Persistent cross-conversation memory with structured types and relevance-based retrieval.
Who is the user? Role, expertise, preferences. "Senior Go dev, new to React." Shapes how Claude communicates and what it explains.
User corrections and validations. "Don't mock the DB." "Single PR was right." Prevents repeating mistakes and reinforces good patterns.
Ongoing work context. "Merge freeze after March 5." "Auth rewrite for compliance." Non-obvious project state that can't be derived from code.
Pointers to external resources. "Bugs tracked in Linear INGEST project." "Latency dashboard at grafana.internal/d/api-latency."
How real-time streaming works from API to terminal.
The concurrent-safe vs. exclusive tool execution model is brilliant: read-only tools (Glob, Grep, Read) can run in parallel for speed, while mutating tools (Edit, Write, Bash) run exclusively to prevent conflicts. This is the same principle as a read-write lock in concurrent programming, applied to AI tool execution.
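A stripped-down scheduler shows the idea: readers start immediately and run together, while a writer first drains in-flight work. This is a simplification (a real read-write lock would also hold back new readers while a writer runs), and the class name is hypothetical:

```typescript
// Simplified read-write-lock idea for tools: readers overlap, writers drain first.
class ToolScheduler {
  private inFlight: Promise<unknown>[] = [];

  async run<T>(task: () => Promise<T>, readOnly: boolean): Promise<T> {
    if (readOnly) {
      const p = task();               // concurrent-safe: start immediately
      this.inFlight.push(p);
      return p;
    }
    await Promise.all(this.inFlight); // exclusive: wait for in-flight readers
    this.inFlight = [];
    return task();
  }
}
```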
104 commands organized by function. These are slash commands you type in the CLI.
/commit /commit-push-pr /review /diff /branch /security-review
/session /resume /rewind /history /clear /export
/plan /tasks /effort /status /ultraplan
/config /keybindings /permissions /privacy-settings /theme /model
/mcp /skills /hooks /plugin /reload-plugins
/login /logout /install /desktop /ide /mobile
/doctor /help /debug-tool-call /insights /cost /usage
/fast /brief /copy /voice /memory /bughunter /advisor
Settings cascade from 6 sources with increasing priority.
This 6-level cascade (local → project → user → enterprise → MDM → dynamic) is a production pattern you'll see in enterprise software. Each level can override the previous, with dynamic flags having the final say. This enables progressive rollouts, A/B testing, and emergency kill switches.
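Mechanically, the cascade is just a fold where later (higher-priority) layers override earlier ones. A minimal sketch, with source ordering following the list above:

```typescript
// Priority cascade: merge layers low → high; later layers override earlier ones.
type Settings = Record<string, unknown>;

function resolveSettings(sources: Settings[]): Settings {
  // sources ordered local → project → user → enterprise → MDM → dynamic flags
  return sources.reduce((acc, layer) => ({ ...acc, ...layer }), {});
}
```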
Click folders to expand. Key files are annotated.
Patterns and concepts demonstrated in this codebase that every AI engineer should know.
The fundamental pattern: LLM reasons about the task, decides on a tool action, observes the result, and repeats. Implemented in query.ts as an async generator.
Extending LLM capabilities with external tools. Each tool has a JSON Schema, description, and execution function. The LLM uses the schema to construct valid calls.
Real-time token streaming from API to UI. Enables responsive UX and early tool detection. Uses SSE (Server-Sent Events) under the hood.
Hierarchical agent system: coordinator delegates to workers. Each agent has isolated context, permissions, and tools. Communication via message passing.
Automatic compaction when context gets too long. History compression, message summarization, and selective memory loading keep conversations within limits.
Multi-layered security: mode selection, rule matching, ML classification, dangerous pattern detection, and interactive approval. Essential for production AI agents.
The system prompt is dynamically assembled from base instructions, tool descriptions, memory, git context, and more. This is production prompt engineering.
Open protocol for connecting AI to external tools. Decouples tool definition from implementation. Transport-agnostic (stdio, HTTP, WebSocket).
50+ feature flags gate functionality. GrowthBook integration enables A/B testing and gradual rollout. Dead code elimination removes unused features from builds.
File-based memory system with structured types, relevance scoring, and cross-session persistence. Simpler than vector DBs but effective for the use case.
Using JavaScript async generators for the agent loop. Each yield is a checkpoint for UI updates, user interruption, and state persistence.
Lifecycle hooks (pre/post-sampling, tool execution, session) enable extensibility without modifying core code. Classic plugin architecture pattern.
A structured approach to understanding this codebase and growing as an AI engineer.
Build exercise: Write a minimal ReAct loop in TypeScript that calls a single tool.
Build exercise: Implement a custom tool (e.g., a calculator) following the Tool interface pattern.
Build exercise: Build a streaming chat client using the Anthropic SDK with tool support.
Build exercise: Design a permission system for a simple AI agent that can read/write files.
Build exercise: Build a mini multi-agent system where a coordinator delegates tasks to specialized workers.
Build exercise: Build an MCP server that exposes a custom tool (e.g., a database query tool) and connect it to Claude Code.
Trace exactly how code executes through the system. Select a workflow, then click any node to expand its details and decision branches.
When you run claude in the terminal, the CLI has to decide which mode to launch. There are 12+ fast paths that exit before loading the full app, plus the main interactive path. This decision tree runs in under 50ms for fast paths.
A while(true) loop that processes commands until the session ends.
This is the heart of the system. An async generator that drives the entire Reason-Act-Observe cycle. Each iteration: call the API, stream the response, execute any tools, then decide whether to continue. Click nodes to see decision branches and error recovery paths.
A simple question might complete in 1 turn (text only). A coding task might take 5–20 turns (read files, edit, run tests, fix errors). Complex tasks can hit 50+ turns. The loop continues until Claude stops calling tools, hits the max turn limit, or the context runs out.
When Claude decides to call a tool, it goes through validation, permission checking, execution, and result collection. The StreamingToolExecutor manages concurrency — read-only tools run in parallel, mutating tools run exclusively.
Every tool call passes through this multi-layer security cascade. The system balances safety with usability — read-only operations need less scrutiny than rm -rf.
How stream events from the Anthropic API are processed, rendered, and fed into the tool executor in real-time.
@anthropic-ai/sdk emits Stream<BetaRawMessageStreamEvent> objects. Connection uses HTTP/2 with SSE (Server-Sent Events). Keep-alive pings maintain the connection. Automatic reconnection on transient failures.
message_start → Extract model, usage metadata, create AssistantMessage shell
content_block_start → Allocate new block: text | thinking | tool_use | redacted_thinking
content_block_delta → Append chunk to current block. Text deltas render immediately in terminal.
content_block_stop → Finalize block. If tool_use: send to StreamingToolExecutor
message_delta → Update token counts (input, output, cache read/write)
message_stop → Complete message. Trigger post-processing.
Tool execution can begin as soon as content_block_stop fires. For concurrent-safe tools, execution begins while other blocks are still streaming. This overlap is a key performance optimization.
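The event handling reduces to a switch over event types. The event and field names below follow the Anthropic streaming API's event vocabulary, but this reducer is an illustrative sketch, not the real handler:

```typescript
// Illustrative reducer over streaming events; real handlers also track
// usage metadata and dispatch tool_use blocks as they close.
type StreamEvent =
  | { type: "content_block_start"; index: number }
  | { type: "content_block_delta"; index: number; text: string }
  | { type: "content_block_stop"; index: number }
  | { type: "message_stop" };

function reduceStream(events: StreamEvent[]): { blocks: string[]; done: boolean } {
  const blocks: string[] = [];
  let done = false;
  for (const ev of events) {
    switch (ev.type) {
      case "content_block_start": blocks[ev.index] = ""; break;
      case "content_block_delta": blocks[ev.index] += ev.text; break; // render incrementally
      case "content_block_stop": break;  // a tool_use block would be executed here
      case "message_stop": done = true; break;
    }
  }
  return { blocks, done };
}
```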
withRetry.ts retries with exponential backoff + jitter.
As conversations grow, the context window fills up. Claude Code uses 4 layers of compaction to keep conversations going — each more aggressive than the last. This is one of the most sophisticated parts of the system.
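Exponential backoff with full jitter can be sketched in a few lines — this is in the spirit of withRetry.ts, not its actual signature:

```typescript
// Sketch of retry with exponential backoff + full jitter (illustrative signature).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseMs = 200,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      const cap = baseMs * 2 ** attempt;   // exponential growth per attempt
      const delay = Math.random() * cap;   // full jitter avoids thundering herds
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Full jitter (random delay in [0, cap) rather than exactly cap) spreads retries out so many clients recovering from the same outage don't hammer the API in lockstep.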
When Claude decides it needs a sub-agent, the AgentTool creates an isolated child process with its own context window, tools, and permissions.
When input starts with /, it's routed to the command system instead of the query engine. Commands are a different execution path from tools.
How Claude Code discovers, connects to, authenticates with, and registers tools from MCP servers.
Hooks fire at specific moments in the execution lifecycle. They enable custom behavior without modifying core code. Here's exactly when each hook type fires.
Hooks can be shell commands (run in a subprocess) or HTTP webhooks (POST to a URL). Shell hooks receive context via environment variables and stdin. HTTP hooks receive a JSON payload. Both can return structured responses that influence the system's behavior. Hooks are configured in settings.json and managed via /hooks command.