Why Single-Agent Claude Code Breaks Past Three Files

Claude Code is Anthropic's command-line AI agent that operates at the project level, not line by line. It reads your codebase, plans a sequence of actions, executes them with real development tools — git, package managers, test runners — evaluates the result, and adjusts. For focused tasks on a handful of files, this single-agent loop is remarkably capable.

The problem emerges when the task grows. Most agents "fall apart past three files" because they are trying to hold coherent state across an ever-growing context: the original instructions, every file read, every tool result, every intermediate edit. The context window fills; earlier edits slip out of the model's effective attention; contradictions creep in.

The instinct to reach for multiple agents at this point is correct — but the execution matters enormously. Google Research reported a 39–70% performance degradation for multi-agent variants on strict sequential reasoning tasks. Fan out incorrectly and you have compounded the problem: you now have several agents each making locally coherent but globally contradictory changes.

The core tradeoff is this:

| Scenario | Single agent | Multi-agent |
|---|---|---|
| Sequential steps with shared state | Wins: full context coherence | Loses: state-sync overhead dominates |
| Independent subtasks, separate files | Loses: context bloat, slow | Wins: parallelism pays off |
| Mixed: plan once, execute in parallel | Partial win (planning only) | Wins if the orchestrator owns planning |
| Strict sequential reasoning (maths, logic proofs) | Wins: no cross-agent drift | Loses: 39–70% degradation (Google Research) |

The practical rule: single-agent wins when state must stay coherent across steps; multi-agent wins when subtasks can run independently. The rest of this guide builds on the planning-first discipline covered in our guide to plan-first Claude Code workflows. That article is the prerequisite, and this one is the sequel.

Claude Code's Subagent Primitives: Task Tool, Worktrees, Batch

Before choosing a pattern, understand what Claude Code gives you out of the box as of June 2026.

The Task Tool (Agent spawning)

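The Task tool is how an orchestrator agent spawns a subagent: the orchestrator passes a short description and a self-contained prompt. The Python examples in this guide write that call as Agent(). Treat Agent() as shorthand for whatever subagent entry point your Claude Agent SDK version exposes, or for a thin wrapper of your own, and check the SDK reference for the exact import.

# Illustrative import: confirm the entry point against your SDK version
from claude_agent_sdk import Agent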

result = Agent(
    description="Migrate auth module to new token format",
    prompt="""
    Refactor /src/auth/token.py to use the new TokenV2 format.
    - Replace all TokenV1 references
    - Update the decode() method signature
    - Run: python -m pytest tests/test_auth.py
    Report: files changed, tests passed/failed.
    """,
)
print(result)

The subagent runs with its own context window. It executes real tools — reads files, writes files, runs shell commands — and returns a result string. The orchestrator never sees the subagent's internal reasoning; it only sees the final return value.
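If your SDK version does not expose a synchronous Agent() call, a minimal stand-in can be built on the non-interactive claude -p mode this guide uses in Pattern 1. The sketch below is hypothetical: agent() and build_agent_command() are our own names, the wrapper passes only the -p, --output-format and --allowedTools flags, and it does not implement worktree isolation. Each call starts a fresh process, so the subagent gets a fresh context and the caller receives only the final text, mirroring the contract described above.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def build_agent_command(prompt: str, allowed_tools: Sequence[str] = ()) -> List[str]:
    """Assemble a non-interactive `claude -p` invocation (hypothetical helper)."""
    cmd = ["claude", "-p", prompt, "--output-format", "text"]
    if allowed_tools:
        # Pre-approve tools: print mode has no interactive permission prompt
        cmd += ["--allowedTools", ",".join(allowed_tools)]
    return cmd


def agent(
    description: str,
    prompt: str,
    allowed_tools: Sequence[str] = (),
    cwd: Optional[str] = None,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Hypothetical Agent() stand-in: one subagent per process, final text only."""
    print(f"[subagent] {description}")
    completed = run(
        build_agent_command(prompt, allowed_tools),
        cwd=cwd,  # e.g. a worktree directory, so edits land on that checkout
        capture_output=True,
        text=True,
        check=True,  # with subprocess.run, a non-zero exit raises CalledProcessError
    )
    # Only stdout (the agent's final answer) returns to the orchestrator
    return completed.stdout.strip()
```

Usage: agent("Migrate auth", "Refactor /src/auth/token.py ...", ["Read", "Edit", "Bash"]). Injecting run keeps the wrapper testable without invoking the CLI.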

Worktree Isolation

The isolation: 'worktree' flag (or --isolation worktree in the CLI) instructs Claude Code to run the subagent in an isolated git worktree — a separate checkout of the repository on a new branch. Agents cannot touch each other's working directories. If the agent makes no changes, the worktree is automatically cleaned up.

result = Agent(
    description="Refactor database layer",
    prompt="...",
    isolation="worktree",   # Each agent gets its own branch
)

Batch Mode

Claude Code supports batch mode for running the same prompt against a list of inputs — useful for processing a directory of files, a list of API endpoints, or a set of test cases. Batch mode is a thin wrapper around sequential agent calls, not true parallelism, but it keeps the orchestrator prompt clean.
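Batch mode's internals are not shown here, so the following only sketches the behaviour described: one prompt template, a list of inputs, and one agent call at a time. run_batch() and its run_agent parameter are hypothetical; pass any callable that takes a prompt and returns the agent's final text.

```python
from typing import Callable, Dict, Iterable


def run_batch(
    template: str,
    inputs: Iterable[str],
    run_agent: Callable[[str], str],
) -> Dict[str, str]:
    """Apply one prompt template to each input, sequentially (no parallelism)."""
    results: Dict[str, str] = {}
    for item in inputs:
        prompt = template.format(item=item)  # template uses an {item} placeholder
        try:
            results[item] = run_agent(prompt)
        except Exception as exc:  # record the failure and keep processing the batch
            results[item] = f"ERROR: {exc}"
    return results
```

Recording failures instead of raising keeps one bad input from discarding the results of the rest of the batch.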

Pro tip

Concurrent agent calls are capped internally; excess calls queue automatically. If you dispatch 20 subagents at once, the SDK will serialise the overflow. Design your fan-out to dispatch in groups matching your concurrency budget — typically 3–5 for interactive sessions, up to 10 for unattended pipelines.
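Dispatching in groups can be done client-side so you never rely on the SDK's internal queue. A minimal sketch, assuming worker is any function that runs one subagent; dispatch_in_groups() is a hypothetical helper built only on the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def dispatch_in_groups(
    tasks: Iterable[T],
    worker: Callable[[T], R],
    group_size: int = 4,
) -> List[R]:
    """Run worker over tasks, at most group_size at once, group by group."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    items = list(tasks)
    results: List[R] = []
    for start in range(0, len(items), group_size):
        group = items[start:start + group_size]
        # Each group finishes completely before the next one is dispatched
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            # map() keeps input order and re-raises a worker's exception
            # when that worker's result is reached
            results.extend(pool.map(worker, group))
    return results
```

With the Pattern 2 fan-out, dispatch_in_groups(SERVICES, migrate_service, group_size=3) would keep at most three agents running. A single ThreadPoolExecutor(max_workers=3) enforces the same cap without waiting for each group's slowest agent; the grouped form trades some throughput for a clean checkpoint between batches.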

Pattern 1: Sequential Pipeline (Plan → Implement → Test)

The sequential pipeline is the gateway pattern. It keeps a single agent in control at each stage but hands off context explicitly between stages. This is the right first step when you are moving from a monolithic prompt to a structured workflow.

When to use it

  • The output of each step is the input to the next
  • You want a human review gate between plan and implementation
  • The task touches a single coherent module (not independent services)

Shell invocation pattern

# Step 1: plan (read-only); stdout is redirected into the plan file
claude -p "Read /src/payments/ and output a migration plan in Markdown.
List every file to change, every function to rename, every test to update.
Do not edit or create any files." \
  --output-format text > /tmp/plan.md

# Human review gate (optional but recommended for production)
echo "Review /tmp/plan.md. Press Enter to continue or Ctrl-C to abort."
read

# Step 2: implement — agent reads the plan, not the whole codebase
claude -p "Execute the migration plan at /tmp/plan.md.
Edit only the files listed. Run: python -m pytest tests/test_payments.py
After tests pass, write a summary to /tmp/impl-summary.md." \
  --allowedTools "Read,Edit,Write,Bash"

# Step 3: test — separate agent with fresh context
claude -p "Read /tmp/impl-summary.md. Run the full test suite.
Check for regressions. Write a test report to /tmp/test-report.md." \
  --allowedTools "Read,Write,Bash"

Python SDK variant

import subprocess, pathlib

PLAN_FILE   = pathlib.Path("/tmp/plan.md")
IMPL_FILE   = pathlib.Path("/tmp/impl-summary.md")
REPORT_FILE = pathlib.Path("/tmp/test-report.md")

def run_agent(prompt: str, allowed_tools: str = "Read") -> str:
    # Print mode cannot ask for permission, so pre-approve each stage's tools
    result = subprocess.run(
        ["claude", "-p", prompt, "--output-format", "text",
         "--allowedTools", allowed_tools],
        capture_output=True, text=True, check=True
    )
    return result.stdout

# Stage 1: plan (read-only; stdout becomes the plan file)
PLAN_FILE.write_text(run_agent(
    "Read /src/payments/ and output a migration plan in Markdown. "
    "List every file to change. Do not edit or create any files."))

# Stage 2: implement (reads the plan, not the full codebase)
run_agent(f"Execute the plan at {PLAN_FILE}. Run: python -m pytest tests/test_payments.py. "
          f"Write a summary to {IMPL_FILE}.",
          allowed_tools="Read,Edit,Write,Bash")

# Stage 3: test
run_agent(f"Read {IMPL_FILE}. Run the full test suite. "
          f"Write a test report to {REPORT_FILE}.",
          allowed_tools="Read,Write,Bash")

print(REPORT_FILE.read_text())

Watch out

Passing entire file contents between stages balloons token costs and risks drift. Pass structured artefacts instead: a plan file listing paths and function names, a summary file listing what changed. Each stage reads its inputs explicitly rather than inheriting a giant context blob.
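One way to make the plan a structured artefact is to give it a fixed schema. A minimal sketch, assuming a JSON plan rather than the Markdown plan used above; MigrationPlan, its fields, and implement_prompt() are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class MigrationPlan:
    """Hypothetical plan schema: paths and names only, never file contents."""
    files: List[str]
    renames: Dict[str, str] = field(default_factory=dict)  # old name -> new name
    tests: List[str] = field(default_factory=list)

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "MigrationPlan":
        return cls(**json.loads(path.read_text()))


def implement_prompt(plan_path: Path) -> str:
    """Build the stage-2 prompt from the plan, citing paths rather than contents."""
    plan = MigrationPlan.load(plan_path)
    return (
        f"Execute the plan at {plan_path}. "
        f"Edit only: {', '.join(plan.files)}. "
        f"Then run: python -m pytest {' '.join(plan.tests)}"
    )
```

The implement stage receives a few hundred tokens of paths and names and reads only the files it is told to edit.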

Pattern 2: Fan-Out (Parallel Subagents on Independent Files)

Fan-out is the pattern most often demoed and most often misapplied. It works when you have a list of genuinely independent units — separate microservices, separate documentation pages, separate data-processing scripts — where no unit's output affects another's input.

When to use it

  • You have N similar tasks on N independent files or services
  • Each task has a clear, bounded scope that does not reference shared mutable state
  • Speed matters more than the cost of running N context windows

Python SDK fan-out

import concurrent.futures
from claude_agent_sdk import Agent

SERVICES = [
    "/src/services/auth",
    "/src/services/billing",
    "/src/services/notifications",
    "/src/services/search",
]

def migrate_service(service_path: str) -> dict:
    result = Agent(
        description=f"Add OpenTelemetry tracing to {service_path}",
        prompt=f"""
        Add OpenTelemetry tracing to the service at {service_path}.
        - Add the opentelemetry-sdk dependency
        - Wrap every public function with a span
        - Do NOT touch any file outside {service_path}
        - Run: python -m pytest {service_path}/tests/
        Return a JSON summary: {{"service": ..., "files_changed": [...], "tests": "passed|failed"}}
        """,
        isolation="worktree",   # Each agent in its own branch
    )
    return {"service": service_path, "result": result}

# Dispatch up to 4 agents in parallel
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(migrate_service, svc): svc for svc in SERVICES}
    results = []
    for f in concurrent.futures.as_completed(futures):
        results.append(f.result())

for r in results:
    print(r["service"], "→", r["result"][:120])
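Because each subagent returns free text, the JSON summary requested in the prompt may arrive wrapped in prose. A minimal sketch that recovers the last JSON object with the standard library's json.JSONDecoder.raw_decode; extract_json_summary() is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Optional


def extract_json_summary(text: str) -> Optional[dict]:
    """Return the last top-level JSON object found in an agent's reply, or None."""
    decoder = json.JSONDecoder()
    found: Optional[dict] = None
    idx = text.find("{")
    while idx != -1:
        try:
            obj, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            # Not valid JSON starting here; try the next brace
            idx = text.find("{", idx + 1)
            continue
        if isinstance(obj, dict):
            found = obj
        # Skip past the decoded object so nested braces are not re-parsed
        idx = text.find("{", end)
    return found
```

In the loop above, extract_json_summary(str(r["result"])) gives the orchestrator a dict it can check for "tests": "failed" before any merge.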

Conflict avoidance via JSONL issue tracking

When multiple agents run in parallel, shared state is the enemy. One robust pattern is to store inter-agent communication as JSONL in git with hash-based IDs. Each agent appends to a shared log; it never overwrites existing entries. The orchestrator reads the log after all agents complete.

import json, hashlib, time, pathlib

LOG = pathlib.Path(".agent-log.jsonl")

def append_log(agent_id: str, event: str, data: dict):
    entry = {
        "id": hashlib.sha256(f"{agent_id}{time.time()}".encode()).hexdigest()[:12],
        "agent": agent_id,
        "event": event,
        "ts": time.time(),
        **data,
    }
    # Append-only — no read-modify-write, no merge conflicts
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

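Agents append to the log but never read-modify-write it. Git has no special handling for appends, though: if two branches each add lines to the end of the same file, merging them produces a conflict. Declaring Git's built-in union merge driver in .gitattributes (the line .agent-log.jsonl merge=union) tells Git to keep the lines from both sides, which is safe here because every entry is a self-contained line with its own hash-based ID. Order the merged entries by their ts field when reading.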
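The orchestrator can also read each worktree's copy of the log directly, before any merge. A minimal sketch, assuming every entry carries the id and ts fields written by append_log(); merge_logs() is a hypothetical helper, and collecting the per-worktree paths is left to the caller.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def merge_logs(paths: Iterable[Path]) -> List[Dict]:
    """Combine several JSONL agent logs into one list ordered by timestamp."""
    by_id: Dict[str, Dict] = {}
    for path in paths:
        if not path.exists():
            continue  # an agent that never logged leaves no file
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            entry = json.loads(line)
            # History shared by several branches appears in each copy; keep one per ID
            by_id.setdefault(entry["id"], entry)
    return sorted(by_id.values(), key=lambda e: e["ts"])
```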

Pattern 3: Worktree Isolation (Concurrent Edits Without Conflicts)

Worktree isolation takes fan-out one step further by giving each agent a fully isolated git worktree — its own branch, its own working directory, no shared filesystem state. This is the pattern for any task that involves edits across multiple files where two agents might otherwise touch the same path.

How worktrees work

A git worktree is a second (or third, or tenth) checkout of the same repository in a different directory, each on its own branch. Claude Code's --isolation worktree flag creates one per agent, runs the agent inside it, and automatically removes it if the agent makes no changes. If changes were made, the path and branch name are returned for the orchestrator to review.
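To see what the isolation flag handles for you, here is a hedged sketch of the same lifecycle done by hand with plain git commands (git worktree add, git status --porcelain, git rev-list --count, git worktree remove). The function names, the agent/<name> branch prefix and the ../worktrees directory are hypothetical choices, not what Claude Code uses internally.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], str]


def git(args: List[str]) -> str:
    """Run a git command and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout


def create_worktree(name: str, base: str = "main", root: str = "../worktrees",
                    run: Runner = git) -> str:
    """Check out a new branch agent/<name> from base into its own directory."""
    path = f"{root}/{name}"
    run(["worktree", "add", "-b", f"agent/{name}", path, base])
    return path


def finish_worktree(name: str, path: str, base: str = "main",
                    run: Runner = git) -> Optional[dict]:
    """Remove the worktree if the agent changed nothing; otherwise report it."""
    uncommitted = run(["-C", path, "status", "--porcelain"]).strip()
    new_commits = int(run(["-C", path, "rev-list", "--count", f"{base}..HEAD"]).strip() or "0")
    if not uncommitted and new_commits == 0:
        run(["worktree", "remove", path])
        run(["branch", "-D", f"agent/{name}"])
        return None
    # Changes exist: hand the path and branch back for diff review
    return {"path": path, "branch": f"agent/{name}",
            "commits": new_commits, "dirty": bool(uncommitted)}
```

A subagent would run with its working directory set to the returned path; finish_worktree() then mirrors the "clean up if unchanged, otherwise return path and branch" behaviour described above.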

from claude_agent_sdk import Agent

def run_isolated_agent(name: str, prompt: str) -> dict:
    """Run an agent in an isolated worktree. Returns branch info if changes were made."""
    result = Agent(
        description=name,
        prompt=prompt,
        isolation="worktree",
    )
    # result includes worktree path and branch if changes were made,
    # or a note that the worktree was cleaned up if unchanged
    return {"name": name, "result": result}

agents = [
    run_isolated_agent(
        "api-layer-refactor",
        "Refactor /src/api/ to use the new request validation library. Run tests."
    ),
    run_isolated_agent(
        "docs-update",
        "Update all docstrings in /src/api/ to match the new validation signatures."
    ),
]

Pro tip

After all worktree agents complete, treat their diffs as pull requests. Review each branch with git diff main...<branch> before merging. The orchestrator's job at this stage is integration: checking that two agents' changes do not contradict each other at the boundaries.

The "land the plane" pattern

Every subagent operating in a worktree should clean up its state at session end — committing any partial work to its branch, writing a status summary to the shared JSONL log, and explicitly closing tool calls. This is the "land the plane" pattern: you do not leave agents in a half-edited state where the orchestrator cannot tell whether work is in progress or abandoned.

LAND_THE_PLANE_SUFFIX = """
Before you finish:
1. Commit all changes to the current branch with a clear message.
2. Write a status entry to .agent-log.jsonl: {"agent": "...", "status": "complete|partial|failed", "summary": "..."}
3. If any tests failed, note them in the summary — do not silently skip them.
"""

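MAIN_PROMPT = (
    "Refactor /src/auth/token.py to use the new TokenV2 format. "
    "Run: python -m pytest tests/test_auth.py"
)

result = Agent(
    description="Refactor auth module",
    prompt=MAIN_PROMPT + LAND_THE_PLANE_SUFFIX,
    isolation="worktree",
)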

Pattern 4: Orchestrator + Specialist Subagents

The orchestrator pattern is the most powerful and the most complex. A top-level orchestrator agent receives the original task, breaks it into typed subtasks, dispatches each to a specialist subagent, collects results, and synthesises them. The key word is "typed": each specialist has a narrow, well-defined scope and a clear output contract.

When to use it

  • The task spans multiple domains: code, documentation, tests, infrastructure
  • You want different model behaviours for different subtasks (e.g., a fast model for docs, a careful model for security review)
  • You need a coordination layer that humans can inspect and intervene in

Orchestrator script

from claude_agent_sdk import Agent
import json

TASK = "Add rate limiting to the /api/v2/search endpoint."

# Stage 1: Orchestrator plans and assigns typed subtasks
plan_result = Agent(
    description="Planner — decompose task into specialist assignments",
    prompt=f"""
    Task: {TASK}

    Read /src/api/v2/search.py and /src/middleware/.

    Produce a JSON plan with this schema:
    {{
      "subtasks": [
        {{"id": "...", "specialist": "code|test|docs|infra", "prompt": "...", "files": [...]}}
      ]
    }}

    Rules:
    - Each subtask must touch only its listed files
    - No subtask prompt may reference another subtask's output
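    - Infra subtask: only /terraform/ or /k8s/ files
    - Output the JSON object only, with no prose or code fences
    """,
)

plan = json.loads(plan_result)  # raises ValueError if the reply is not pure JSON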

# Stage 2: Dispatch each specialist in isolation
specialist_results = {}

for subtask in plan["subtasks"]:
    specialist = subtask["specialist"]
    result = Agent(
        description=f"{specialist} specialist — {subtask['id']}",
        prompt=subtask["prompt"]
        + f"\n\nTouch only these files: {', '.join(subtask['files'])}"
        + """

        Land the plane:
        - Commit all edits to your branch
        - Write {"agent": "...", "status": "complete|failed", "summary": "..."} to .agent-log.jsonl
        """,
        isolation="worktree",
    )
    specialist_results[subtask["id"]] = result
    print(f"[{specialist}] {subtask['id']} done")

# Stage 3: Orchestrator synthesises
synthesis = Agent(
    description="Synthesiser — merge specialist output into a single PR summary",
    prompt=f"""
    Original task: {TASK}
    Specialist results: {json.dumps(specialist_results, indent=2)}

    Read .agent-log.jsonl. Check for any failures.
    Write a PR description to /tmp/pr-description.md.
    List: files changed, tests passed, any open issues.
    """,
)

print(synthesis)
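The planner's rules are only instructions to a model, so it is worth checking them in code before dispatching anything. A minimal sketch, assuming the JSON schema from the planner prompt above; validate_plan() is hypothetical, and it adds one rule the prompt only implies: no two subtasks may claim the same file.

```python
from typing import Dict, List

SPECIALISTS = {"code", "test", "docs", "infra"}
INFRA_PREFIXES = ("/terraform/", "/k8s/")  # assumes absolute paths, as in the prompt


def validate_plan(plan: Dict) -> List[str]:
    """Return rule violations; an empty list means the plan can be dispatched."""
    errors: List[str] = []
    owner: Dict[str, str] = {}  # file path -> id of the subtask that claimed it
    for sub in plan.get("subtasks", []):
        sid = sub.get("id", "?")
        kind = sub.get("specialist")
        if kind not in SPECIALISTS:
            errors.append(f"{sid}: unknown specialist {kind!r}")
        files = sub.get("files") or []
        if not files:
            errors.append(f"{sid}: no files listed")
        for path in files:
            if kind == "infra" and not path.startswith(INFRA_PREFIXES):
                errors.append(f"{sid}: infra subtask lists {path}")
            if path in owner and owner[path] != sid:
                # Two worktrees editing one file is a merge conflict waiting to happen
                errors.append(f"{sid}: {path} already assigned to {owner[path]}")
            owner.setdefault(path, sid)
    return errors
```

In the script above, you would call validate_plan(plan) right after json.loads() and stop, or re-prompt the planner, if the list is non-empty.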
From a Verified Builder

"We run a four-specialist setup for every new feature: a code agent, a test agent, a docs agent, and a security-review agent. The orchestrator writes the plan and the synthesiser writes the PR. The only human step is approving the merge. We ship about 40% more features per sprint than we did with single-agent Claude Code, but we invested two weeks building the orchestration harness first. Do not skip that investment."

— A Verified Builder · Bengaluru, IN

Managing Context and Token Budgets Across Subagents

Multi-agent systems multiply your token spend. Each subagent runs a full context window; the orchestrator runs its own. Without deliberate budget management, a workflow that costs £0.50 as a single agent can cost £5 or more as a naive fan-out.

The levers are scope, caching, and model selection.

Context budget guide by project size

| Project size | Recommended tier | Agents | Isolation | Notes |
|---|---|---|---|---|
| Small (<10k LoC) | Single terminal | 1 | None | A single agent covers the whole repo; no fan-out needed |
| Medium (10k–100k LoC) | Worktree agents | 3–5 | Worktree per agent | Orchestrator + specialists; diff review before merge |
| Large (100k–500k LoC) | Worktree + pipeline | 5–10 | Worktree + batch | Add a CI integration step; a human reviews the orchestrator's plan |
| Very large (>500k LoC) | Fully autonomous pipeline | 10+ | Worktree per service | Persistent orchestrator; agents scoped to service boundaries |

Practical budget rules

  • Pass file paths, not file contents to subagents wherever possible. Let the agent read only what it needs.
  • Use prompt caching for shared artefacts: system prompts, coding standards, project-wide context. A cached 50k-token system prompt costs a fraction of a fresh read on every agent call.
  • Use a fast model for orchestration. The orchestrator's job is planning and routing, not deep reasoning. A fast model is adequate and materially cheaper.
  • Cap subtask scope explicitly. Tell each specialist the exact file paths it is permitted to touch. Agents that wander into adjacent modules cost tokens and introduce merge conflicts.
  • Monitor queue depth. If you dispatch more agents than the concurrency cap, the SDK queues the overflow. Track queue depth to avoid unbounded waits in pipelines with strict SLAs.
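The caching lever is easiest to see with rough arithmetic. The sketch below assumes hypothetical prices of 3 and 15 currency units per million input and output tokens, cache writes billed at 1.25 times the input price and cache reads at 0.1 times, and one request per agent. Real agent loops resend context on every turn, so the gap usually widens; check your provider's current pricing before relying on the numbers. fan_out_cost() is a hypothetical helper.

```python
def fan_out_cost(
    agents: int,
    shared_tokens: int,   # prefix every agent sends: system prompt, coding standards
    task_tokens: int,     # per-agent prompt plus the files it reads
    output_tokens: int,   # per-agent output
    input_price: float,   # per million input tokens (hypothetical)
    output_price: float,  # per million output tokens (hypothetical)
    cached: bool = True,
    cache_write_mult: float = 1.25,
    cache_read_mult: float = 0.1,
) -> float:
    """Rough one-request-per-agent cost model for a fan-out."""
    per_in = input_price / 1_000_000
    per_out = output_price / 1_000_000
    if cached and agents > 0:
        # The first agent writes the cache; the rest read it at a discount
        shared = shared_tokens * per_in * (cache_write_mult + cache_read_mult * (agents - 1))
    else:
        shared = shared_tokens * per_in * agents
    fresh = task_tokens * per_in * agents
    out = output_tokens * per_out * agents
    return shared + fresh + out


uncached = fan_out_cost(5, 50_000, 10_000, 5_000, 3.0, 15.0, cached=False)
cached = fan_out_cost(5, 50_000, 10_000, 5_000, 3.0, 15.0)
print(f"uncached {uncached:.4f}, cached {cached:.4f}")  # prints: uncached 1.2750, cached 0.7725
```

Under these assumptions caching the 50k-token prefix cuts a five-agent fan-out by roughly 40%; output tokens, which caching does not touch, become the largest remaining line item.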
Watch out

Nearly three in ten professional developers use Claude Code, GitHub Copilot, and Cursor simultaneously (UC San Diego/Cornell survey, 29 of 99 developers). Each tool has its own context window and its own token spend. If your team runs Claude Code multi-agent workflows alongside Copilot, the aggregate spend can surprise you. Set a per-session budget and a per-day alert.

Production Checklist Before Merging Multi-Agent Output

Multi-agent output requires more rigorous review than single-agent output because conflicts surface at integration time, not during execution. Use this checklist before every merge.

Pre-merge checklist
  • Read .agent-log.jsonl — confirm all agents reported "status": "complete"
  • Run git diff main...<branch> for every worktree branch — no surprise files
  • Confirm no agent touched files outside its declared scope
  • Run the full test suite from a clean checkout, not from inside a worktree
  • Check for duplicate symbol definitions across merged branches
  • Review import changes — parallel agents may add conflicting dependency versions
  • Verify no agent committed credentials, environment variables, or generated artefacts to git
  • Read the synthesiser's PR description — it surfaces any failures the specialists noted

Production-worthy agents hold context across multiple files, edit real files, run real tests, and ask before touching the main branch. The checklist enforces the last condition as a process gate: nothing merges to main until a human has verified the agent log, the diffs, and the test report.

For evaluation patterns beyond this checklist — including golden sets and LLM judges for agentic output — see our guide on building an LLM evaluation suite. The agentic AI paradigm these workflows operate within is discussed in depth in our companion news piece on how agentic software is restructuring development.

Shipping multi-agent systems in production?

AI Tech Connect is the directory where builders working on agentic AI get found by the teams that need them. Browse verified profiles or add your own.


When to Scale from Solo to Multi-Agent

As of June 2026, a three-tier framework gives the clearest mental model for deciding when to scale:

  • Tier 1 — Single terminal agents: Claude Code subagents with no extra tooling. Suitable for tasks that fit in one context window, sequential pipelines, and projects under 10k LoC. No dashboard needed; the terminal is the interface.
  • Tier 2 — Isolated worktree agents: Three to ten agents, each in its own git worktree, with a lightweight diff-review step before merge. Suitable for medium codebases, feature development across multiple services, and any workflow where merge conflicts are a realistic risk.
  • Tier 3 — Fully autonomous pipelines: More than ten agents, CI integration, persistent orchestrator state. Suitable for large codebases, continuous refactoring pipelines, and workflows that run unattended overnight.

Decision table: scale up when...

| Signal | Action |
|---|---|
| Agent context fills before the task completes | Split into a sequential pipeline (Pattern 1) |
| Task has 3+ independent modules | Fan out with worktree isolation (Pattern 3) |
| Different subtasks need different expertise | Orchestrator + specialists (Pattern 4) |
| Merge conflicts appear in single-agent output | Add worktree isolation immediately |
| Task runs correctly but takes too long | Fan out parallel agents (Pattern 2) |
| Multi-agent output is less coherent than single-agent output | Collapse to a single agent; the task has hidden state dependencies |
| Sequential reasoning required (maths, data pipelines) | Stay on a single agent; do not fan out |

The last two rows are important. If you observe quality regression after adding subagents (the Google Research finding of 39–70% degradation on sequential reasoning), the correct response is to collapse back to a single agent, not to add more coordination overhead. Multi-agent is not inherently better; it is better for the specific class of problems where independent parallelism is the bottleneck.

Builders across India and the UK who are shipping agentic systems at scale consistently report the same pattern: they started with single-agent Claude Code, hit the three-file coherence wall, added orchestration incrementally, and settled on a workflow that looks like Pattern 4 for feature work and Pattern 1 for targeted refactors. The tips hub at /tips/ has further guides on each component of that stack. If you are building these systems professionally, your profile belongs in the Builders directory — teams hiring for agentic AI engineering roles look there first.