The two-headline summary
At the Code with Claude conference — the livestream was announced on 2 May 2026 and MIT Technology Review's write-up landed on 21 May 2026 — Anthropic shipped two changes that quietly redraw the operating model for production agents. The first is Dreaming: a scheduled process that reviews past agent sessions, surfaces patterns, and curates memory between runs. The second is Multi-Agent Orchestration, now generally available: a lead agent that delegates to specialist subagents working in parallel on a shared filesystem, each with its own model, prompt, and tool allow-list.
On their own each is a worthwhile primitive. Together they change how you architect anything that runs for more than a single user turn. This piece walks through what the two features are, how they likely work under the bonnet, what the failure modes are, and how a builder in Mumbai, Bengaluru, London or Manchester should start planning their next orchestration loop.
- Dreaming is durable memory — a curated store that survives between sessions, not a longer context window.
- Multi-agent orchestration is parallelism with a coordinator — lead agent decomposes, subagents specialise, shared filesystem is the substrate.
- Rate limits are now generous — Anthropic doubled Claude Code limits and raised API limits for Opus, so fan-out is economically viable.
- Sandboxed deployment is unblocked — a separate 19 May update lets Managed Agents run in customer-controlled sandboxes with private MCP servers.
Treat Dreaming as a write-behind cache for "what we learned this week" — not a replacement for retrieval. Keep your RAG index for facts and your Dreaming memory for habits, preferences and recurring failure modes. Mixing the two is the single biggest mistake we expect to see in the first quarter of adoption.
How Dreaming almost certainly works
Anthropic has not published the full architecture, but the shape is easy to infer from the announcement language and from how the Claude Agent SDK already structures sessions. The most plausible implementation is an overnight cron — or any scheduled trigger you control — that streams the session transcripts from a defined window into a reviewer agent. The reviewer extracts patterns, decisions, and recurring tool-call sequences, then writes them as compact entries into a memory store the next run reads at start-up.
The interesting design choice is curation rather than accumulation. A naive memory system grows linearly with session count and eventually drowns the agent in stale entries. A curated memory store has the reviewer agent decide what to keep, what to consolidate, and what to drop — a process much closer to how human sleep is thought to consolidate experience into long-term traces. Hence the name.
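To make the keep, consolidate and drop distinction concrete, here is a minimal sketch of a curation pass. It is not Anthropic's implementation: `MemoryEntry`, `curate`, and the `max_age_days` and `min_hits` thresholds are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    pattern: str        # short description of the learned habit or failure mode
    hits: int           # how many reviewed sessions exhibited it
    last_seen_day: int  # day index of the most recent sighting


def curate(store, observed, today, max_age_days=30, min_hits=2):
    """Merge newly observed patterns into the store, then drop stale or weak ones."""
    # Copy existing entries so the caller's store is never mutated.
    merged = {e.pattern: MemoryEntry(e.pattern, e.hits, e.last_seen_day) for e in store}
    for obs in observed:
        if obs.pattern in merged:
            # Consolidate: one entry per pattern, with accumulated evidence.
            kept = merged[obs.pattern]
            kept.hits += obs.hits
            kept.last_seen_day = max(kept.last_seen_day, obs.last_seen_day)
        else:
            merged[obs.pattern] = MemoryEntry(obs.pattern, obs.hits, obs.last_seen_day)
    # Drop: anything too old or seen too rarely to count as a habit.
    return [
        e for e in merged.values()
        if today - e.last_seen_day <= max_age_days and e.hits >= min_hits
    ]
```

The point of the sketch is that the store stays bounded by the number of live patterns rather than by session count. A real reviewer would presumably use a model to judge relevance instead of two numeric thresholds.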
For a production builder, the practical questions are:
- What is the review window? Last 24 hours, last 100 sessions, or last N successful task completions? Each implies different memory shapes.
- Who is the reviewer? A separate model run (probably a smaller, cheaper Claude variant), or the same agent reflecting on its own logs? The cost profile can differ by as much as an order of magnitude.
- How is the memory store keyed? If it is global, every session benefits; if it is per-user or per-tenant, you preserve isolation but multiply storage.
- What is the read pattern? Whole-store at session start, or retrieval-on-demand mid-session? The latter is far more scalable but harder to reason about.
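The keying and read-pattern questions above can be explored with a toy store before committing to a design. The sketch below assumes per-tenant keying and shows both read patterns side by side. `TenantMemoryStore` and its methods are hypothetical, and the keyword overlap in `search` is only a stand-in for whatever retrieval a real store would use.

```python
from collections import defaultdict


class TenantMemoryStore:
    """In-memory stand-in for a curated store, keyed per tenant for isolation."""

    def __init__(self):
        self._entries = defaultdict(list)  # tenant_id -> list of entry strings

    def write(self, tenant_id, entry):
        self._entries[tenant_id].append(entry)

    def read_all(self, tenant_id):
        """Whole-store read at session start: simple, but grows with store size."""
        return list(self._entries.get(tenant_id, []))

    def search(self, tenant_id, query, limit=3):
        """Retrieval-on-demand: naive keyword overlap, scoped to one tenant."""
        words = set(query.lower().split())
        scored = []
        for entry in self._entries.get(tenant_id, []):
            overlap = len(words & set(entry.lower().split()))
            if overlap:
                scored.append((overlap, entry))
        # Highest overlap first; ties keep insertion order because sort is stable.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:limit]]
```

Because every read takes a `tenant_id`, there is no code path that returns another tenant's entries, which is the property worth preserving whatever the real storage backend turns out to be.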
How multi-agent orchestration changes your loop
The pattern Anthropic is endorsing is one many of us already hand-rolled with the Claude Agent SDK and brittle subprocess plumbing: a lead agent reads the task, decomposes it, spawns specialist subagents in parallel, and synthesises their output. What is new is that this is now first-class, with proper shared filesystem semantics, model-per-subagent configuration, and tool allow-lists that are scoped per subagent rather than globally.
The shared filesystem matters more than it sounds. In every hand-rolled orchestration loop we have seen, the hardest bug is two subagents writing to overlapping files and silently corrupting each other's work. Anthropic's runtime treats the filesystem as the shared workspace and handles the contention — which is a meaningful step up from "pass JSON between processes and hope".
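For readers who have not hand-rolled the pattern, the lead-and-subagent loop described above reduces to decompose, fan out, synthesise. The sketch below uses only the Python standard library and is not the GA runtime's API: `decompose`, `run_subagent` and `orchestrate` are hypothetical placeholders, and the subagent call returns a canned string where a real system would call a model.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(task):
    """Hypothetical decomposition: one subtask per target listed in the task."""
    return [{"role": "researcher", "target": t} for t in task["targets"]]


def run_subagent(subtask):
    """Placeholder for a real subagent call; returns a canned finding."""
    return f"{subtask['role']} reviewed {subtask['target']}"


def orchestrate(task, max_workers=4):
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order even though the work runs concurrently.
        findings = list(pool.map(run_subagent, subtasks))
    # Synthesis step: here just a joined report.
    return "\n".join(findings)
```

Calling `orchestrate({"targets": ["api/", "web/"]})` returns one line per target, in the order the targets were listed.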
| Dimension | Single long-running agent | Multi-agent orchestration |
|---|---|---|
| Context per role | One prompt, one allow-list — everything competes for attention | Each subagent has its own prompt, tools and model — focused attention |
| Parallelism | Sequential by definition | True parallel execution on shared filesystem |
| Cost shape | Long single turn, one cache hit, predictable | Many short turns, multiple cache primings, harder to predict |
| Failure isolation | One bad tool call can poison the entire loop | Subagent failure is contained; lead can retry or reroute |
| Debuggability | One transcript to read | Multiple transcripts plus coordinator — needs better tooling |
| Best fit | Coherent, sequential reasoning (compliance review, single repo refactor) | Embarrassingly parallel work with synthesis (codebase audit, multi-doc analysis) |
Recommended subagent specialisations to start with
If you are designing a fresh orchestration loop on top of the GA runtime, three subagent shapes are worth borrowing from how Anthropic itself appears to use the pattern internally:
- The researcher — read-only tool allow-list, large context budget, paired with retrieval. Its job is to gather, not to act.
- The implementer — write access to a scoped directory, smaller context budget, deterministic prompt. Its job is to ship code or edits in a tight loop.
- The auditor — read-only, runs after implementers, validates against acceptance criteria. Cheaper model, run-on-demand, can short-circuit the loop if everything is green.
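One way to pin these three shapes down before touching any runtime is to write them as plain data. The sketch below is an assumption-laden illustration, not the runtime's config format: `SubagentSpec`, the tool names, the model labels and the `services/billing` directory are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentSpec:
    name: str
    model: str                 # placeholder label, not a real model ID
    allowed_tools: frozenset   # tool allow-list scoped to this subagent
    write_dirs: tuple = ()     # directories the subagent may write to

    def can_use(self, tool):
        return tool in self.allowed_tools

    @property
    def read_only(self):
        return not self.write_dirs


RESEARCHER = SubagentSpec("researcher", "large-model", frozenset({"read_file", "search"}))
IMPLEMENTER = SubagentSpec(
    "implementer", "mid-model",
    frozenset({"read_file", "write_file", "run_tests"}),
    write_dirs=("services/billing",),
)
AUDITOR = SubagentSpec("auditor", "small-model", frozenset({"read_file", "run_tests"}))
```

Keeping the specs frozen means an allow-list cannot be widened at runtime by accident; any change has to go through the config, where it can be reviewed.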
"We had been faking this with a lead orchestrator written in Python that shelled out to Claude Code subprocesses. The new runtime collapses 600 lines of glue code into a single config block. The bigger win is the shared filesystem — we used to lose half a day a week to two subagents stepping on each other in a monorepo. That class of bug is just gone."
Arjun, Verified Builder · Bengaluru, IN
The rate-limit context, and why it matters now
Multi-agent orchestration only makes economic sense if you can actually fan out without immediately hitting a 429. Anthropic doubled rate limits on Claude Code and raised API limits for Claude Opus alongside the orchestration GA, which is the second-most-important sentence in the whole announcement. Under the previous limits, non-enterprise tiers were throttled on anything beyond two or three concurrent specialists; with the new capacity, a lead agent spawning six parallel subagents becomes a realistic design.
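Higher limits reduce 429s but do not eliminate them, so each subagent call is still worth wrapping in a retry. The following is a generic exponential-backoff sketch, not part of any Anthropic SDK: `RateLimited` is a hypothetical exception that your own client wrapper would raise on an HTTP 429, and the delay values are arbitrary.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a client wrapper raises on an HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the lead agent
            # Delay doubles each attempt; jitter spreads out parallel retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

The jitter matters more under fan-out than in a single-agent loop: without it, six subagents throttled at the same moment would all retry at the same moment too.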
For Indian teams running on rupee budgets and UK teams optimising for predictable sterling spend, the practical translation is that the upper bound on subagent fan-out is now governed more by your own cost discipline than by the platform. That is a meaningful shift.
Set a per-loop spend ceiling at the lead-agent layer before you optimise anything else. With Dreaming surfacing patterns and orchestration multiplying turn count, the cost-per-task surface area gets large fast. Budget first, optimise second.
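A spend ceiling at the lead layer can be as small as the sketch below. It assumes you can estimate the cost of a subagent call before making it; `LoopBudget` and `BudgetExceeded` are hypothetical names, and the amounts are in whatever currency unit you track.

```python
class BudgetExceeded(RuntimeError):
    pass


class LoopBudget:
    """Tracks spend for one orchestration loop against a hard ceiling."""

    def __init__(self, ceiling):
        self.ceiling = ceiling
        self.spent = 0.0

    def charge(self, amount):
        """Record a cost; refuse it if it would push the loop over the ceiling."""
        if self.spent + amount > self.ceiling:
            raise BudgetExceeded(
                f"charge {amount:.2f} would exceed ceiling {self.ceiling:.2f}"
            )
        self.spent += amount

    @property
    def remaining(self):
        return self.ceiling - self.spent
```

The lead agent would call `charge` before dispatching each subagent and treat `BudgetExceeded` as a signal to stop fanning out and synthesise from what it already has.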
Where Dreaming will bite you
The failure modes worth watching for, in roughly the order we expect to see them in production:
- Memory poisoning — a single bad session pattern gets canonised into the memory store and influences every subsequent run. Build an explicit "demote" path that the agent can invoke when a memory entry is contradicted by evidence.
- Cross-tenant leakage — if your memory store is keyed at the wrong level of granularity, one tenant's patterns leak into another's reasoning. Default to per-tenant isolation unless you have a clear reason not to.
- Drift you cannot audit — agents start behaving differently over weeks in ways your eval set does not catch. Snapshot the memory store on a schedule and diff it; treat the diff as a release artefact.
- Cost creep — the reviewer agent runs whether or not it had anything useful to consolidate. If your traffic is bursty, gate Dreaming on a minimum session-volume threshold rather than running it on every cron tick.
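Two of the mitigations above, diffing snapshots and gating on session volume, are simple enough to sketch. The code assumes a snapshot can be exported as a dictionary of entry ID to text, which is an assumption about the store rather than a documented feature; `diff_snapshots`, `should_run_review` and the threshold of 20 sessions are hypothetical.

```python
def diff_snapshots(before, after):
    """Compare two memory snapshots (dicts of entry_id -> text)."""
    return {
        "added": sorted(k for k in after if k not in before),
        "removed": sorted(k for k in before if k not in after),
        "changed": sorted(k for k in after if k in before and before[k] != after[k]),
    }


def should_run_review(session_count, min_sessions=20):
    """Skip the reviewer on quiet days to avoid paying for empty consolidation."""
    return session_count >= min_sessions
```

Checking the output of `diff_snapshots` into version control alongside your eval results gives you a dated record of when a behaviour change entered the memory store.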
Where multi-agent orchestration will bite you
- Coordinator overload — the lead agent's prompt grows to handle every subagent's output schema and eventually exceeds its useful context. Push synthesis logic down into a dedicated "synthesiser" subagent rather than letting the lead do everything.
- Shared-filesystem races — even with the runtime's safety net, two subagents editing the same file with different intents will produce mush. Partition write access by directory at config time, not at runtime.
- Cache fragmentation — each subagent primes its own cache. If your context base is large, you pay the cold-cache penalty per subagent. Pre-warm with a shared base prompt the lead injects.
- Observability gap — a single transcript is easy to read; six parallel transcripts plus a coordinator log are not. Invest in a tracer that correlates subagent runs to the parent task before you scale loops past three specialists.
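Partitioning write access at config time, as suggested for shared-filesystem races, is easy to enforce with a check that runs before any subagent starts. The sketch below is illustrative only: `overlapping_write_dirs` and the subagent names are hypothetical, and it treats paths as POSIX-style strings without touching the real filesystem.

```python
from itertools import combinations
from pathlib import PurePosixPath


def overlapping_write_dirs(assignments):
    """Return pairs of subagents whose write directories overlap or nest.

    assignments maps a subagent name to a list of directory paths.
    """
    conflicts = []
    for (a, dirs_a), (b, dirs_b) in combinations(sorted(assignments.items()), 2):
        for da in map(PurePosixPath, dirs_a):
            for db in map(PurePosixPath, dirs_b):
                # Two directories conflict if they are equal or one contains the other.
                if da == db or da in db.parents or db in da.parents:
                    conflicts.append((a, b, str(da), str(db)))
    return conflicts
```

An empty result means no two subagents can write under the same directory tree; anything else is worth failing the loop on before it runs.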
Self-hosted sandboxes and private MCP: the deployment angle
The 19 May enhancement that 9to5Mac flagged is worth pulling forward. Claude Managed Agents can now run inside a sandbox the customer controls and connect outwards to private MCP servers. For regulated workloads in India — DPDP-bound personal data — and the UK — anything touching FCA or NHS systems — this is the difference between "interesting demo" and "deployable architecture". Pair that with the self-hosted sandbox + MCP tunnel deployment pattern we covered earlier, and you have a credible production posture without sending sensitive data to a public managed runtime.
The orchestration runtime appears to honour the same sandbox boundary, which means a lead agent spawning specialists can keep the whole fan-out inside your VPC. That is not yet documented exhaustively, so verify with Anthropic support before you commit a compliance roadmap to it — but the pieces line up.
What to do this fortnight
Concretely, if you are running anything that resembles a production agent today:
- Audit your current orchestration glue. If you hand-rolled a lead-plus-subagent pattern, prototype the same loop on the GA runtime and benchmark cost-per-task.
- Decide whether you need Dreaming or RAG. If your agent's failures are "it forgot what we agreed last week", Dreaming is your tool. If they are "it does not know our policy", build retrieval first.
- Set per-loop budgets. Multi-agent fan-out compounds cost faster than people expect; a hard ceiling at the lead layer is non-negotiable.
- Plan your observability. A correlator that ties subagent transcripts to parent tasks is a precondition for scaling, not a nice-to-have.
- Verify sandbox compatibility if you are in a regulated sector — DPDP in India, FCA or NHS in the UK — before you commit to a managed-runtime roadmap.
The Code with Claude announcements are the strongest signal yet that Anthropic is treating production agents as the centre of gravity rather than chat. Builders who get the orchestration shape right in the next quarter will compound that advantage; the rest will spend the year reinventing the wheel against a moving target.
Anthropic's official changelog and announcement coverage lives at anthropic.com/news. For the broader stack context, see our recent coverage of Claude Opus 4.7's 1M-token window and the Managed Agents public beta guide.