Interactive deep-dive into Anthropic's CLI for Claude — a production-grade AI engineering system
Anthropic's official command-line interface for interacting with Claude AI directly from the terminal.
Claude Code is a textbook example of how to build a production AI agent system. It demonstrates tool-augmented LLMs, streaming architecture, permission systems, multi-agent orchestration, and extensible plugin architectures — all concepts you'll need as an AI engineer.
Claude Code is not just a chat CLI. It's a full agentic system that can read/write files, execute shell commands, search the web, manage tasks, spawn sub-agents, integrate with external services via MCP, and orchestrate multi-agent "swarms" — all while enforcing a sophisticated permission model and streaming results in real time through a React-powered terminal UI.
Click any layer to see details. The system is organized in clean layers from entry points down to external integrations.
tools/. Tools implement a common Tool interface defined in Tool.ts. The tools.ts file registers and assembles all tools. StreamingToolExecutor handles concurrent execution, ordering results, and managing exclusive vs concurrent-safe tools. Tools are the primary way Claude interacts with the outside world.
.claude/skills/. Examples: batch processing, code simplification, scheduling.
Click any card to expand details. Each module represents a major system within Claude Code.
Files: query.ts (68KB), QueryEngine.ts (46KB)
The query engine is an async generator — a powerful pattern for streaming AI systems. It yields events as they happen:
Using an async generator for the main agent loop is elegant: each yield is a checkpoint where the UI can render, the user can interrupt, and state can be saved. This is a pattern worth adopting in your own AI systems.
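The pattern can be sketched in a few lines. This is a minimal illustration with hypothetical names, not the real `query.ts`: each `yield` is a point where a consumer can render, persist state, or stop iterating.

```typescript
// Minimal sketch of an async-generator agent loop (names are illustrative).
type AgentEvent =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; name: string }
  | { kind: "done" };

async function* agentLoop(prompt: string): AsyncGenerator<AgentEvent> {
  // Stand-in for a real model call; a production loop would stream from an API.
  yield { kind: "text", text: `Thinking about: ${prompt}` };
  yield { kind: "tool_call", name: "FileReadTool" };
  yield { kind: "done" };
}

async function run(): Promise<AgentEvent[]> {
  const events: AgentEvent[] = [];
  for await (const ev of agentLoop("list files")) {
    events.push(ev); // the consumer decides when to render, save, or interrupt
  }
  return events;
}
```

Because the generator suspends at every `yield`, interruption is just "stop calling `next()`" — no cancellation plumbing required.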
Interface: Tool.ts (29KB) — defines the common tool contract
Registry: tools.ts (17KB) — assembles and registers all tools
Executor: StreamingToolExecutor.ts — concurrent execution engine
Each tool provides:
This is the ReAct pattern (Reason + Act). The LLM decides which tool to call, the system executes it, and the result goes back to the LLM. Claude Code's implementation shows production-grade patterns: concurrent execution, permission checks, streaming results, and graceful error handling.
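A hedged sketch of what such a tool contract and one "Act" step might look like — the interface fields mirror the description above (schema, description, execution function, read-only flag), not the actual `Tool.ts` source:

```typescript
// Illustrative tool contract; field names are assumptions, not the real source.
interface Tool {
  name: string;
  description: string;
  inputSchema: object;   // JSON Schema the model uses to construct valid calls
  isReadOnly: boolean;   // lets a permission layer treat reads more leniently
  execute(input: Record<string, unknown>): Promise<string>;
}

const echoTool: Tool = {
  name: "Echo",
  description: "Returns its input unchanged",
  inputSchema: { type: "object", properties: { text: { type: "string" } } },
  isReadOnly: true,
  execute: async (input) => String(input.text),
};

// One Act step: the model chose a tool; execute it and return the observation.
async function act(tools: Tool[], name: string, input: Record<string, unknown>) {
  const tool = tools.find((t) => t.name === name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.execute(input); // the result is fed back to the LLM as an observation
}
```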
24+ files in utils/permissions/
Permission checking flows through multiple layers:
Key insight: permissions are tool-specific. A BashTool command gets different scrutiny than a FileReadTool call. The system knows which tools are read-only vs. state-modifying.
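The layering can be illustrated with a toy decision function — a simplification under the assumptions above (read-only fast path, then rules, then an interactive prompt), not the real 24-file implementation:

```typescript
// Toy layered permission check; the real system has many more layers.
type Decision = "allow" | "deny" | "ask";
type Rule = { tool: string; decision: Decision };

function checkPermission(
  toolName: string,
  isReadOnly: boolean,
  rules: Rule[],
): Decision {
  if (isReadOnly) return "allow";                 // layer 1: read-only fast path
  const rule = rules.find((r) => r.tool === toolName);
  if (rule) return rule.decision;                 // layer 2: user/project rules
  return "ask";                                   // layer 3: interactive prompt
}
```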
File: services/api/claude.ts (~125KB)
The API layer handles:
Streaming via the @anthropic-ai/sdk streaming API with delta events
Retries via withRetry.ts
Note the separation: the API client handles transport, retries, and format conversion. Business logic stays in the query engine. This separation of concerns is critical for maintainability in AI applications.
Claude Code renders using Ink, a React renderer for the terminal. This means:
React's declarative model makes complex TUI state management tractable. As AI tools get more interactive (progress bars, multi-step confirmations, live streaming), a component model pays for itself.
AgentTool (234KB) spawns independent sub-agents that:
Agents communicate via SendMessageTool. Coordinator mode (coordinatorMode.ts) orchestrates "swarms" of agents working together on complex tasks. TeamCreate/TeamDelete tools manage agent teams.
This is a hierarchical multi-agent system: a coordinator delegates to workers, each with specialized capabilities and isolation. Key challenges: communication, state sharing, and permission inheritance. Claude Code solves these with message passing, worktree isolation, and cascading permissions.
25+ files in services/mcp/
MCP is an open protocol that lets Claude connect to external tool servers:
MCP decouples the AI system from specific tool implementations. Instead of hardcoding integrations, you connect to standardized servers. This is the future of AI tool ecosystems — learn this protocol well.
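At the wire level, MCP is JSON-RPC 2.0. The sketch below shows the rough shape of a `tools/list` and a `tools/call` request — simplified, and omitting the `initialize` handshake and transport framing; consult the MCP specification for the authoritative message shapes:

```typescript
// Rough shape of MCP requests (JSON-RPC 2.0); simplified for illustration.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
};

function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

function callToolRequest(id: number, name: string, args: object): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}
```

The client never needs to know whether the server behind these messages is a local script or a remote API — that is the decoupling described above.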
Files: memdir/ directory
The memory system gives Claude persistent context:
findRelevantMemories.ts selects which memories to load.
This is a file-based RAG system without the vector DB. Instead of embedding similarity, it uses structured metadata and relevance heuristics. For many use cases, this simpler approach works just as well.
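A toy version of metadata-based relevance scoring shows the idea — keyword overlap plus recency, no embeddings. The field names and weights here are illustrative, not those of `findRelevantMemories.ts`:

```typescript
// Toy relevance heuristic: keyword overlap minus age, no vector DB needed.
type Memory = { text: string; tags: string[]; lastUsed: number };

function scoreMemory(memory: Memory, queryTerms: string[], now: number): number {
  const overlap = queryTerms.filter(
    (t) => memory.tags.includes(t) || memory.text.toLowerCase().includes(t),
  ).length;
  const ageDays = (now - memory.lastUsed) / 86_400_000;
  return overlap * 10 - ageDays; // recent, on-topic memories win
}

function selectMemories(memories: Memory[], query: string, now: number, k: number) {
  const terms = query.toLowerCase().split(/\s+/);
  return [...memories]
    .sort((a, b) => scoreMemory(b, terms, now) - scoreMemory(a, terms, now))
    .slice(0, k);
}
```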
From `claude` command to interactive session — click each step for details.
COORDINATOR_MODE, DAEMON, BRIDGE_MODE route to specialized entry points. This is a common startup optimization in large CLI apps.
What happens when you type a message and press Enter. This is the core agent loop.
This is the Reason-Act-Observe loop that powers all modern AI agents. Claude reasons about what to do, acts by calling tools, observes the results, and repeats. The loop continues until Claude decides no more tools are needed. Understanding this loop is fundamental to AI engineering.
All 44+ tools organized by category. Click a category to expand.
Notice how each tool has a clear single responsibility, a declarative JSON Schema for inputs, and explicit read-only vs. mutating classification. This enables the permission system to make fine-grained decisions. When building your own AI tools, follow this pattern: small, well-defined, self-describing tools.
Multi-layered security model. Click each mode for details.
In default mode, every tool invocation triggers an interactive prompt in the terminal. The user can approve, deny, or set a rule for future calls. This is the safest mode and helps you understand exactly what Claude is doing.
Best for: New users, sensitive codebases, learning how Claude Code works.
The YOLO classifier (yoloClassifier.ts, 52KB) uses pattern matching to classify tool calls as safe or dangerous. Safe operations (reads, searches) are auto-approved. Dangerous operations (deletes, force pushes, writes to sensitive paths) still require user approval.
Classification pipeline:
Best for: Experienced users who want speed but still want guardrails.
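A much-simplified stand-in for pattern-based classification illustrates the flow: match deny patterns first, then allow patterns, and escalate anything unrecognized. These regexes are examples, not the real classifier's rules:

```typescript
// Simplified command classifier: deny patterns win, unknowns escalate.
type Verdict = "safe" | "dangerous" | "needs_review";

const DANGEROUS = [/\brm\s+-rf\b/, /\bgit\s+push\s+--force\b/, /\bsudo\b/];
const SAFE = [/^git\s+(status|log|diff)\b/, /^ls\b/, /^cat\b/, /^grep\b/];

function classifyCommand(cmd: string): Verdict {
  if (DANGEROUS.some((re) => re.test(cmd))) return "dangerous";
  if (SAFE.some((re) => re.test(cmd))) return "safe";
  return "needs_review"; // unknown commands fall back to user approval
}
```

Checking dangerous patterns before safe ones matters: `git status && rm -rf /` must not slip through on its harmless prefix.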
In plan mode, Claude can read and analyze but cannot make changes. This is useful for:
Use EnterPlanModeTool / ExitPlanModeTool to toggle.
Every tool call is automatically approved without any checks. This is the fastest mode but provides no safety net.
Warning: Only use in sandboxed environments, CI pipelines, or when you fully trust the operation. Claude can delete files, run arbitrary commands, and push code without asking.
How Claude Code spawns, manages, and coordinates multiple AI agents.
Built-in agent types: general-purpose, Explore (fast codebase search), Plan (architecture), statusline-setup, claude-code-guide. Each type has a curated tool set optimized for its task.
Each agent gets its own git worktree — a separate working directory with its own branch. Changes are isolated. If the agent fails, the worktree is cleaned up. If changes are good, they can be merged back.
Each sub-agent has its own context window. It doesn't see the parent's full conversation. This prevents context pollution and allows agents to focus on their specific task with maximum context budget.
Sub-agents inherit permission settings from their parent. A restricted parent cannot spawn an unrestricted child. This ensures the security model is maintained throughout the agent hierarchy.
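The invariant "a restricted parent cannot spawn an unrestricted child" amounts to a set intersection. A minimal sketch (the function name is hypothetical):

```typescript
// Cascading permissions: a child may use only tools its parent also allows.
function childPermissions(
  parentAllowed: Set<string>,
  requested: Set<string>,
): Set<string> {
  return new Set([...requested].filter((t) => parentAllowed.has(t)));
}
```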
Agents receive a snapshot of relevant memories from the parent session. They can read but not write to the parent's memory. This gives context without risking memory corruption.
Agents can run asynchronously in the background:
The open protocol for connecting Claude to external tool servers. Click components for details.
services/mcp/client.ts implements the full MCP client:
services/mcp/auth.ts handles:
Elicitation is a unique MCP feature: servers can ask the user for additional information during tool execution. The ElicitationDialog.tsx (179KB!) renders rich interactive forms in the terminal for:
MCP separates tool definition from tool implementation. The AI sees a standard tool interface; the implementation can be a local script, a remote API, or a complex distributed system. This is the adapter pattern applied to AI tooling — a powerful abstraction for building extensible AI systems.
Extensibility through lifecycle hooks. Hooks let you inject custom behavior at key moments.
Run before the API call. Can modify the messages, add context, or block the request. Use for prompt engineering, guardrails, or dynamic context injection.
Run after Claude's response. Can modify the response, log it, trigger side effects. Use for output filtering, analytics, or automated follow-ups.
Run before/after each tool execution. Can approve, deny, modify, or log tool calls. Use for auditing, custom permission logic, or tool result caching.
Lifecycle events: session start, end, pause, resume. Use for environment setup/teardown, logging, or automated git operations.
Triggered when files change on disk. Use for auto-formatting, linting, test running, or notifying other tools of changes.
Send webhooks to external services. Use for Slack notifications, CI triggers, audit logs, or integration with other systems.
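The hook types above share one mechanical core: hooks are registered against named events, fire in order, and can veto the action. This is an illustrative registry sketch — the event names and result shape are assumptions, not the real `settings.json` config keys:

```typescript
// Illustrative hook registry: hooks fire per event; the first veto wins.
type HookResult = { allow: boolean; note?: string };
type Hook = (payload: Record<string, unknown>) => HookResult;

const hooks = new Map<string, Hook[]>();

function registerHook(event: string, hook: Hook) {
  const list = hooks.get(event) ?? [];
  list.push(hook);
  hooks.set(event, list);
}

function fireHooks(event: string, payload: Record<string, unknown>): HookResult {
  for (const hook of hooks.get(event) ?? []) {
    const result = hook(payload);
    if (!result.allow) return result; // first veto short-circuits
  }
  return { allow: true };
}
```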
Persistent cross-conversation memory with structured types and relevance-based retrieval.
Who is the user? Role, expertise, preferences. "Senior Go dev, new to React." Shapes how Claude communicates and what it explains.
User corrections and validations. "Don't mock the DB." "Single PR was right." Prevents repeating mistakes and reinforces good patterns.
Ongoing work context. "Merge freeze after March 5." "Auth rewrite for compliance." Non-obvious project state that can't be derived from code.
Pointers to external resources. "Bugs tracked in Linear INGEST project." "Latency dashboard at grafana.internal/d/api-latency."
How real-time streaming works from API to terminal.
The concurrent-safe vs. exclusive tool execution model is brilliant: read-only tools (Glob, Grep, Read) can run in parallel for speed, while mutating tools (Edit, Write, Bash) run exclusively to prevent conflicts. This is the same principle as a read-write lock in concurrent programming, applied to AI tool execution.
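A stripped-down scheduler shows the idea: readers start immediately and run together, while a writer first drains in-flight work. This is a simplification (a real read-write lock would also hold back new readers while a writer runs), and the class name is hypothetical:

```typescript
// Simplified read-write-lock idea for tools: readers overlap, writers drain first.
class ToolScheduler {
  private inFlight: Promise<unknown>[] = [];

  async run<T>(task: () => Promise<T>, readOnly: boolean): Promise<T> {
    if (readOnly) {
      const p = task();               // concurrent-safe: start immediately
      this.inFlight.push(p);
      return p;
    }
    await Promise.all(this.inFlight); // exclusive: wait for in-flight readers
    this.inFlight = [];
    return task();
  }
}
```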
104 commands organized by function. These are slash commands you type in the CLI.
/commit /commit-push-pr /review /diff /branch /security-review
/session /resume /rewind /history /clear /export
/plan /tasks /effort /status /ultraplan
/config /keybindings /permissions /privacy-settings /theme /model
/mcp /skills /hooks /plugin /reload-plugins
/login /logout /install /desktop /ide /mobile
/doctor /help /debug-tool-call /insights /cost /usage
/fast /brief /copy /voice /memory /bughunter /advisor
Settings cascade from 6 sources with increasing priority.
This 6-level cascade (local → project → user → enterprise → MDM → dynamic) is a production pattern you'll see in enterprise software. Each level can override the previous, with dynamic flags having the final say. This enables progressive rollouts, A/B testing, and emergency kill switches.
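Mechanically, the cascade is just a fold where later (higher-priority) layers override earlier ones. A minimal sketch, with source ordering following the list above:

```typescript
// Priority cascade: merge layers low → high; later layers override earlier ones.
type Settings = Record<string, unknown>;

function resolveSettings(sources: Settings[]): Settings {
  // sources ordered local → project → user → enterprise → MDM → dynamic flags
  return sources.reduce((acc, layer) => ({ ...acc, ...layer }), {});
}
```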
Click folders to expand. Key files are annotated.
Patterns and concepts demonstrated in this codebase that every AI engineer should know.
The fundamental pattern: LLM reasons about the task, decides on a tool action, observes the result, and repeats. Implemented in query.ts as an async generator.
Extending LLM capabilities with external tools. Each tool has a JSON Schema, description, and execution function. The LLM uses the schema to construct valid calls.
Real-time token streaming from API to UI. Enables responsive UX and early tool detection. Uses SSE (Server-Sent Events) under the hood.
Hierarchical agent system: coordinator delegates to workers. Each agent has isolated context, permissions, and tools. Communication via message passing.
Automatic compaction when context gets too long. History compression, message summarization, and selective memory loading keep conversations within limits.
Multi-layered security: mode selection, rule matching, ML classification, dangerous pattern detection, and interactive approval. Essential for production AI agents.
The system prompt is dynamically assembled from base instructions, tool descriptions, memory, git context, and more. This is production prompt engineering.
Open protocol for connecting AI to external tools. Decouples tool definition from implementation. Transport-agnostic (stdio, HTTP, WebSocket).
50+ feature flags gate functionality. GrowthBook integration enables A/B testing and gradual rollout. Dead code elimination removes unused features from builds.
File-based memory system with structured types, relevance scoring, and cross-session persistence. Simpler than vector DBs but effective for the use case.
Using JavaScript async generators for the agent loop. Each yield is a checkpoint for UI updates, user interruption, and state persistence.
Lifecycle hooks (pre/post-sampling, tool execution, session) enable extensibility without modifying core code. Classic plugin architecture pattern.
A structured approach to understanding this codebase and growing as an AI engineer.
Build exercise: Write a minimal ReAct loop in TypeScript that calls a single tool.
Build exercise: Implement a custom tool (e.g., a calculator) following the Tool interface pattern.
Build exercise: Build a streaming chat client using the Anthropic SDK with tool support.
Build exercise: Design a permission system for a simple AI agent that can read/write files.
Build exercise: Build a mini multi-agent system where a coordinator delegates tasks to specialized workers.
Build exercise: Build an MCP server that exposes a custom tool (e.g., a database query tool) and connect it to Claude Code.
Trace exactly how code executes through the system. Select a workflow, then click any node to expand its details and decision branches.
When you run claude in the terminal, the CLI has to decide which mode to launch. There are 12+ fast paths that exit before loading the full app, plus the main interactive path. This decision tree runs in under 50ms for fast paths.
A while(true) loop that processes commands until the session ends.
This is the heart of the system. An async generator that drives the entire Reason-Act-Observe cycle. Each iteration: call the API, stream the response, execute any tools, then decide whether to continue. Click nodes to see decision branches and error recovery paths.
A simple question might complete in 1 turn (text only). A coding task might take 5–20 turns (read files, edit, run tests, fix errors). Complex tasks can hit 50+ turns. The loop continues until Claude stops calling tools, hits the max turn limit, or the context runs out.
When Claude decides to call a tool, it goes through validation, permission checking, execution, and result collection. The StreamingToolExecutor manages concurrency — read-only tools run in parallel, mutating tools run exclusively.
Every tool call passes through this multi-layer security cascade. The system balances safety with usability — read-only operations need less scrutiny than rm -rf.
How stream events from the Anthropic API are processed, rendered, and fed into the tool executor in real-time.
@anthropic-ai/sdk emits Stream<BetaRawMessageStreamEvent> objects. Connection uses HTTP/2 with SSE (Server-Sent Events). Keep-alive pings maintain the connection. Automatic reconnection on transient failures.
message_start → Extract model, usage metadata, create AssistantMessage shell
content_block_start → Allocate new block: text | thinking | tool_use | redacted_thinking
content_block_delta → Append chunk to current block. Text deltas render immediately in terminal.
content_block_stop → Finalize block. If tool_use: send to StreamingToolExecutor
message_delta → Update token counts (input, output, cache read/write)
message_stop → Complete message. Trigger post-processing.
Tool execution can begin as soon as content_block_stop fires. For concurrent-safe tools, execution begins while other blocks are still streaming. This overlap is a key performance optimization.
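The event handling reduces to a switch over event types. The event and field names below follow the Anthropic streaming API's event vocabulary, but this reducer is an illustrative sketch, not the real handler:

```typescript
// Illustrative reducer over streaming events; real handlers also track
// usage metadata and dispatch tool_use blocks as they close.
type StreamEvent =
  | { type: "content_block_start"; index: number }
  | { type: "content_block_delta"; index: number; text: string }
  | { type: "content_block_stop"; index: number }
  | { type: "message_stop" };

function reduceStream(events: StreamEvent[]): { blocks: string[]; done: boolean } {
  const blocks: string[] = [];
  let done = false;
  for (const ev of events) {
    switch (ev.type) {
      case "content_block_start": blocks[ev.index] = ""; break;
      case "content_block_delta": blocks[ev.index] += ev.text; break; // render incrementally
      case "content_block_stop": break;  // a tool_use block would be executed here
      case "message_stop": done = true; break;
    }
  }
  return { blocks, done };
}
```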
withRetry.ts retries with exponential backoff + jitter.
As conversations grow, the context window fills up. Claude Code uses 4 layers of compaction to keep conversations going — each more aggressive than the last. This is one of the most sophisticated parts of the system.
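Exponential backoff with full jitter can be sketched in a few lines — this is in the spirit of withRetry.ts, not its actual signature:

```typescript
// Sketch of retry with exponential backoff + full jitter (illustrative signature).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseMs = 200,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      const cap = baseMs * 2 ** attempt;   // exponential growth per attempt
      const delay = Math.random() * cap;   // full jitter avoids thundering herds
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Full jitter (random delay in [0, cap) rather than exactly cap) spreads retries out so many clients recovering from the same outage don't hammer the API in lockstep.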
When Claude decides it needs a sub-agent, the AgentTool creates an isolated child process with its own context window, tools, and permissions.
When input starts with /, it's routed to the command system instead of the query engine. Commands are a different execution path from tools.
How Claude Code discovers, connects to, authenticates with, and registers tools from MCP servers.
Hooks fire at specific moments in the execution lifecycle. They enable custom behavior without modifying core code. Here's exactly when each hook type fires.
Hooks can be shell commands (run in a subprocess) or HTTP webhooks (POST to a URL). Shell hooks receive context via environment variables and stdin. HTTP hooks receive a JSON payload. Both can return structured responses that influence the system's behavior. Hooks are configured in settings.json and managed via /hooks command.