What you need to know

Agent frameworks come and go. The patterns underneath them are durable. Whether you are wiring a planner with the OpenAI Agents SDK in Bengaluru, building a stateful graph in LangGraph for a fintech in London, or rolling your own loop with nothing but an HTTP client, you are almost always assembling the same small set of moves. As of 2026, the working vocabulary that the field has settled on — across practitioner write-ups, vendor documentation and the wider community — names six core agent design patterns.

  • Reflection — the agent critiques and revises its own output before returning it.
  • Tool Use — the agent calls external functions, APIs and data sources, usually in a ReAct-style reason-act-observe loop.
  • Planning — the agent decomposes a goal into ordered sub-tasks before executing.
  • Multi-Agent Collaboration — several specialised agents work together on one problem.
  • Orchestrator-Worker — a central agent dispatches sub-tasks to interchangeable worker agents and gathers the results.
  • Evaluator-Optimiser — one agent generates, a second scores against criteria, and the loop iterates until the bar is met.

The rest of this guide takes them one at a time. For each you get a plain-English definition, a rule for when to use it, a minimal code sketch, and the failure mode that bites people in production. First, though, the most important decision sits one level up — whether you need an agent at all.

Pro tip

Treat these six as building blocks, not menu items. A capable production system is usually a planner that uses tools, with a reflection step on the risky outputs, sitting under a thin orchestrator. You are composing patterns, not choosing one.

When you need an agent versus a workflow — and why to start simple

An agent is not a synonym for "anything that uses an LLM". The useful distinction is about who decides the control flow. In a workflow, you decide the steps in advance and the code runs them in order; the model fills in the blanks. In an agent, the model decides the next step at runtime — which tool to call, whether the answer is good enough, when to stop. Agency buys you flexibility on genuinely open-ended tasks. It costs you predictability, latency, money and a much larger surface area for things to go wrong.

The practical test is simple. If you can draw the flowchart before you see the input, build the workflow. A nightly pipeline that pulls invoices, extracts fields and writes to a ledger does not need agency — it needs reliable extraction and good error handling. Reach for an agent only when the path genuinely depends on what the model discovers as it goes: a support assistant that may need to look up an order, check a returns policy, or escalate, and cannot know which until it reads the message.
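The difference is easier to see side by side. The sketch below is illustrative only: `invoice_workflow`, `support_agent`, and the callables passed into them (`extract`, `validate`, `write_ledger`, `decide`, `actions`) are hypothetical stand-ins for your own functions and model calls.

```python
# Workflow: the code fixes the order of steps; a model only fills in one of them.
def invoice_workflow(raw_invoice, extract, validate, write_ledger):
    fields = extract(raw_invoice)      # e.g. an LLM extraction call
    validate(fields)                   # deterministic check, raises on bad data
    return write_ledger(fields)


# Agent: a decision function (the model) picks the next step at runtime.
def support_agent(message, decide, actions, max_steps=5):
    notes = [message]
    for _ in range(max_steps):
        choice = decide(notes)         # returns an action name, or "done"
        if choice == "done":
            return notes
        notes.append(actions[choice](notes))
    return notes                       # step cap reached
```

In the first function you could draw the flowchart before seeing any input. In the second, the path exists only once `decide` has read the message.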

When you do need an agent, start with the simplest design that could possibly work: a planner-executor with explicit Tool Use and a clear stopping condition. Add nothing else until a measured failure forces your hand. This is not timidity; it reflects what the field actually learned the hard way. The majority of production agent failures from 2024 through 2026 were architectural, not model-quality problems — runaway loops with no stop condition, context windows stuffed with irrelevant history, tools with ambiguous descriptions, and silent failures with no way to trace what the agent did. A stronger model rarely fixes any of those.

Watch out

The single most common cause of an agent burning money in production is a missing or weak stopping condition. Always cap the number of steps, set a token or cost budget, and define what "done" looks like before you let the loop run unattended.
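One way to make that concrete is a small budget object that the loop must charge on every step. This is a minimal sketch under stated assumptions: `Budget`, `BudgetExceeded`, `run_loop` and the default limits are hypothetical names and numbers chosen for illustration, not part of any framework.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a loop spends past its step or token cap."""


@dataclass
class Budget:
    max_steps: int = 8          # hard cap on loop iterations (illustrative default)
    max_tokens: int = 20_000    # hard cap on total tokens (illustrative default)
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens: int) -> None:
        """Record one step and its token usage; raise once a cap is crossed."""
        self.steps += 1
        self.tokens += tokens
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step cap {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token cap {self.max_tokens} exceeded")


def run_loop(step_fn, is_done, budget: Budget):
    """Call step_fn until is_done(state) is true or the budget runs out.

    step_fn(state) must return (new_state, tokens_used). Both callables are
    placeholders for your own model call and your own definition of done.
    """
    state = None
    while not is_done(state):
        state, used = step_fn(state)
        budget.charge(used)
    return state
```

Raising an exception rather than returning quietly is a deliberate choice here: an exhausted budget is a failure you want to see in your logs, not a normal answer.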

For a fuller treatment of the core loop, scope and evaluation, see our companion piece on how to build a production AI agent. With the framing settled, here are the six patterns.

Pattern 1 — Reflection

What it is. The agent does not return its first attempt. It generates a candidate output, evaluates that output against explicit criteria, and then either accepts it or revises and tries again. The evaluation can be the same model prompted as a critic, a separate model, or a deterministic check such as "does this code compile". The cycle is simply: generate, evaluate, accept or revise.

When to use it. Reflection is, in the words of practitioners, "not for intelligence — for risk reduction". It earns its keep wherever a wrong answer is expensive: code destined to be merged, a financial summary that informs a decision, a legal or compliance draft, a customer-facing email that cannot be retracted. It does little for low-stakes, throwaway outputs, where the extra pass is pure cost.

def reflect(task, max_rounds=2):
    draft = generate(task)
    for _ in range(max_rounds):
        critique = evaluate(draft, criteria=task.criteria)
        if critique.passes:
            return draft
        draft = revise(draft, critique)
    return draft  # return best effort; never loop forever

Failure mode. Two are common. The first is the critic that always agrees: if the evaluator shares the generator's blind spots, reflection adds cost without adding quality. Give it concrete, checkable criteria rather than "is this good?". The second is the infinite-polish loop: without a hard round cap, an over-eager critic is never satisfied and keeps asking for revisions. Cap the rounds and return the best effort.

Recommended

Wherever you can, make the evaluator deterministic. "Run the test suite", "validate against the JSON schema", "check the figures sum correctly" are far stronger critics than a second LLM opinion, and they cost almost nothing.
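As an illustration, here is what a deterministic critic for a financial summary might look like. It is a sketch, not a prescribed implementation: `check_summary`, the `Critique` shape and the field names `line_items`, `amount` and `total` are all assumptions made for the example. `Critique.passes` mirrors the attribute the `reflect` sketch above reads.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Critique:
    passes: bool
    problems: list = field(default_factory=list)


def check_summary(draft: str, tolerance: float = 0.005) -> Critique:
    """Deterministic critic: parse the draft, check required fields, check the sum."""
    try:
        data = json.loads(draft)
    except json.JSONDecodeError as exc:
        return Critique(False, [f"not valid JSON: {exc.msg}"])
    if not isinstance(data, dict):
        return Critique(False, ["top level must be an object"])

    problems = []
    items = data.get("line_items")
    total = data.get("total")
    if not isinstance(items, list) or not items:
        problems.append("line_items must be a non-empty list")
    if not isinstance(total, (int, float)):
        problems.append("total must be a number")
    if not problems:
        amounts = [i.get("amount") if isinstance(i, dict) else None for i in items]
        if not all(isinstance(a, (int, float)) for a in amounts):
            problems.append("every line item needs a numeric amount")
        elif abs(sum(amounts) - total) > tolerance:
            problems.append(f"line items sum to {sum(amounts)}, total says {total}")
    return Critique(passes=not problems, problems=problems)
```

The `problems` list doubles as revision feedback: pass it to the reviser verbatim and the next draft has something specific to fix.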

Pattern 2 — Tool Use and the ReAct loop

What it is. Tool Use lets the agent reach outside the model — to call a function, query a database, hit an API, run code, or search the web. The dominant structure for this is ReAct, which alternates thought, action and observation in a loop: the model reasons about what to do, takes one action such as a tool call, reads the observation that comes back, and repeats until it can answer. Two properties make this pattern foundational. Because every step is written down, the trace is auditable — you can see exactly why the agent did what it did. And because each decision is grounded in a real observation rather than the model's prior, ReAct reduces hallucination relative to answering in one shot.

When to use it. Almost always. Any agent that needs current data, must act on the world, or has to ground its answers in a source of truth uses tools. The quality of a tool-using agent depends far more on the clarity of the tool descriptions and schemas than on raw model strength.

SYSTEM = "Reason step by step. Use a tool when you need facts. Stop when you can answer."

def react(question, tools, max_steps=8):
    history = [question]
    for _ in range(max_steps):
        step = model(SYSTEM, history)          # thought + optional action
        if step.is_final:
            return step.answer
        tool = tools.get(step.action.name)
        if tool is None:                       # unknown tool name: tell the model, don't crash
            observation = f"Unknown tool: {step.action.name}"
        else:
            observation = tool(**step.action.args)
        history += [step.thought, step.action, observation]
    return "Step budget exhausted."            # explicit stop condition

Failure mode. Vague tool descriptions are the killer. If two tools sound similar, or arguments are under-specified, the agent picks the wrong one or fills arguments badly — and then loops trying to recover. Write tool descriptions as if for a new colleague: say what the tool does, what each argument means, and when not to use it. Our deep dive on reliable function schemas for agents covers the schema design that makes or breaks this pattern.
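To make "write it for a new colleague" concrete, here is a hypothetical tool description plus a small argument check that runs before the tool is called. Everything in it is an assumption for illustration: the `lookup_order` tool, its fields and the `validate_args` helper. The JSON-Schema-style `parameters` layout is a common convention, but check your own SDK for the exact shape it expects.

```python
# Hypothetical tool spec: says what the tool does, what each argument means,
# and when NOT to use it.
LOOKUP_ORDER = {
    "name": "lookup_order",
    "description": (
        "Fetch one order's status and line items by order ID. "
        "Use when the customer quotes an order ID. "
        "Do NOT use to search by customer name or email."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID exactly as printed on the receipt.",
            },
            "include_items": {
                "type": "boolean",
                "description": "Set true only if line items are needed.",
            },
        },
        "required": ["order_id"],
    },
}

# Covers only the types this spec uses; extend it for your own tools.
_PY_TYPES = {"string": str, "boolean": bool}


def validate_args(spec: dict, args: dict) -> list:
    """Return a list of problems with args; an empty list means they are usable."""
    props = spec["parameters"]["properties"]
    problems = [f"missing required argument: {name}"
                for name in spec["parameters"].get("required", []) if name not in args]
    for name, value in args.items():
        if name not in props:
            problems.append(f"unknown argument: {name}")
        elif not isinstance(value, _PY_TYPES[props[name]["type"]]):
            problems.append(f"{name} should be of type {props[name]['type']}")
    return problems
```

Returning the problems as the observation, instead of calling the tool with bad arguments, gives the model a precise correction to act on and shortens the recovery loop.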

Pattern 3 — Planning

What it is. Rather than improvising step by step, the agent first decomposes the goal into an ordered set of sub-tasks, then executes them — often revising the plan as it learns. This is the classic planner-executor split: one phase produces a plan, another carries it out. The planner can be the same model in a separate prompt, or a dedicated planning component.

When to use it. Planning helps when a task has several dependent steps that benefit from being thought through up front — "research three vendors, compare on five criteria, and write a recommendation" goes better with a plan than with pure improvisation. For short, single-hop tasks, an explicit planning phase is overhead; a plain ReAct loop is leaner.

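def plan_and_execute(goal, tools, max_replans=2):
    remaining = list(planner(goal))      # ordered list of sub-tasks
    results, replans = [], 0
    while remaining:
        step = remaining.pop(0)
        result = react(step, tools)      # each sub-task is its own loop
        results.append(result)
        if step.may_replan and replans < max_replans and off_track(result, goal):
            # Rebinding the plan inside a for-loop would not change what is iterated,
            # so swap the queue of remaining steps explicitly. Assumes the planner
            # returns only the steps still to do when it is given context.
            remaining = list(planner(goal, context=results))   # adapt, don't barrel on
            replans += 1
    return synthesise(goal, results)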

Failure mode. Rigid plans that never adapt. If the agent commits to a plan written before it had any information and then refuses to revise when reality diverges, it marches confidently off a cliff. Build in a replanning checkpoint. The opposite failure — replanning on every step — throws away the benefit and burns tokens, so gate replanning behind a real "are we off track?" signal.

Pro tip

Surface the plan to the user before executing the expensive steps. A visible plan is a cheap human-in-the-loop checkpoint: the user catches a misread goal in seconds, before the agent spends ten tool calls solving the wrong problem.
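A checkpoint like that can be very small. The sketch below assumes a console setting and a plan that is a list of strings; `approve_plan` and its `ask` parameter are hypothetical names, and `ask` defaults to the built-in `input` so that a UI or a test can swap in its own prompt function.

```python
def approve_plan(plan, ask=input):
    """Show the numbered plan and return True only on an explicit yes."""
    listing = "\n".join(f"{i}. {step}" for i, step in enumerate(plan, start=1))
    answer = ask(f"Proposed plan:\n{listing}\nProceed? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" on anything other than an explicit yes keeps an empty or mistyped reply from launching the expensive steps.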

Pattern 4 — Multi-Agent Collaboration

What it is. Instead of one generalist agent, several specialised agents work on one problem and exchange results — a researcher hands findings to a writer who hands a draft to a reviewer, for example. Each agent has a narrower role, its own prompt, and often its own tools and isolated context window.

When to use it. Multi-agent designs pay off in two situations: when sub-tasks are genuinely parallel (three researchers each take a vendor at the same time), or when a sub-task needs an isolated context so it is not distracted by the rest of the conversation. They do not, on their own, make an agent smarter. The honest default is that a single capable agent with good tools and a tidy context window beats a swarm for most tasks.

# Specialists with isolated context, composed by a coordinator
researcher = Agent(role="research", tools=[search, fetch])
writer     = Agent(role="draft",    tools=[])
reviewer   = Agent(role="review",   tools=[fact_check])

def collaborate(brief):
    findings = researcher.run(brief)
    draft    = writer.run(brief, context=findings)
    review   = reviewer.run(draft)
    return draft if review.passes else writer.run(brief, context=findings, notes=review)

Failure mode. Coordination cost and context loss at the seams. Every hand-off is a chance to drop information, contradict an earlier agent, or spawn a back-and-forth that never converges. Multi-agent systems also multiply token spend and introduce new ways to fail silently. Treat extra agents as a cost you must justify with a measured win, not a default. Our guide to agent memory management patterns covers how to share state across agents without losing the thread, and the comparison of multi-agent orchestration with LangGraph shows one concrete implementation.

Pattern 5 — Orchestrator-Worker

What it is. A structured form of multi-agent collaboration. A central orchestrator receives the task, breaks it into sub-tasks, dispatches each to a worker agent, and gathers the results into a final answer. The distinguishing feature is that workers are interchangeable and stateless with respect to one another — the orchestrator owns the coordination, and workers just do their slice. This is the pattern behind most "deep research" and large-scale summarisation systems.

When to use it. When the work fans out cleanly into independent, similar sub-tasks: summarise forty documents, analyse twenty pull requests, enrich a list of companies. Because workers do not depend on each other, they parallelise well, which is where the latency win comes from. As of 2026, stateful graph frameworks have matured to support this shape at scale; LangGraph, for instance, now ships stable, semantically versioned releases and is reported to run dozens of concurrent agent instances in production. The pattern is framework-agnostic, but mature tooling makes it far less painful.

def orchestrate(task, worker, max_parallel=8):
    subtasks = orchestrator_plan(task)              # fan out
    results = run_in_parallel(                        # workers are independent
        [lambda s=s: worker(s) for s in subtasks],
        limit=max_parallel,
    )
    return orchestrator_synthesise(task, results)    # fan in

Failure mode. The orchestrator becomes a bottleneck and a single point of failure. If it mis-decomposes the task, every worker does the wrong thing in parallel — fast, expensive and wrong. And the synthesis step quietly hits its own context limit when forty worker outputs all have to be read back in. Cap the fan-out, and summarise worker results before the final synthesis rather than concatenating them raw.

Watch out

Unbounded fan-out is the orchestrator's version of the runaway loop. A task that decomposes into "one worker per row" against a large input can launch thousands of calls in seconds. Always set a hard parallelism limit and a total-call budget.
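Both limits fit in a few lines of standard-library Python. This is a sketch under assumptions: `fan_out`, `max_calls` and `summarise` are hypothetical names, and a thread pool is used because model and API calls are typically I/O-bound. Your framework may offer its own concurrency primitive.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(subtasks, worker, max_parallel=8, max_calls=50, summarise=None):
    """Run worker over subtasks with a concurrency limit and a total-call cap.

    Refuses to start if the decomposition is larger than max_calls, so a bad
    plan fails before any money is spent. If summarise is given, each worker
    output is condensed before it is handed back for synthesis.
    """
    subtasks = list(subtasks)
    if len(subtasks) > max_calls:
        raise ValueError(
            f"{len(subtasks)} sub-tasks exceeds the call budget of {max_calls}"
        )
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(worker, subtasks))   # map preserves input order
    if summarise is not None:
        results = [summarise(r) for r in results]
    return results
```

Checking the count before dispatch matters more than the concurrency limit: the limit only slows a runaway fan-out down, while the call budget stops it.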

Pattern 6 — Evaluator-Optimiser

What it is. A close cousin of Reflection, but with the roles formalised into two distinct agents and a tighter loop. One agent — the optimiser — generates a candidate. A second agent — the evaluator — scores it against explicit criteria and returns structured feedback. The optimiser uses that feedback to produce a better candidate, and the loop repeats until the evaluator's bar is met or a round cap is hit. Where Reflection is often a single model talking to itself, Evaluator-Optimiser keeps generation and judgement cleanly separated.

When to use it. When you have a clear, scoreable quality target and iteration genuinely improves the result: drafting copy to a brand voice, generating code against a test suite, producing a query that returns the right rows, tuning a summary to a length and tone. The separation is the point — an independent evaluator catches what a self-critiquing generator misses.

def evaluator_optimiser(task, threshold=0.9, max_rounds=4):
    best, best_score, feedback = None, float("-inf"), None
    for _ in range(max_rounds):
        candidate = optimiser(task, feedback=feedback)   # first round has no feedback
        score, feedback = evaluator(candidate, rubric=task.rubric)
        if score > best_score:                           # a later round can score lower
            best, best_score = candidate, score
        if score >= threshold:
            break
    return best  # highest-scoring candidate seen within budget

Failure mode. A weak or mis-calibrated evaluator. If the rubric is vague, the evaluator's scores wobble and the optimiser chases noise; if the evaluator is too lenient, the loop accepts mediocre work; too strict, and it never converges. The fix is the same discipline you would apply to any judge: write a concrete rubric, and validate the evaluator against a small golden set of human-scored examples. Our walkthrough on building an LLM evaluation suite with golden sets and judges is the natural next step here.
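A calibration check against a golden set can start as simply as the sketch below. The function name and the return shape are hypothetical, and for brevity the `evaluator` here is assumed to return a bare score; if yours returns `(score, feedback)` as in the loop above, wrap it to take the first element.

```python
def agreement_with_humans(evaluator, golden_set, threshold=0.9):
    """Compare the evaluator's pass/fail decisions with human labels.

    golden_set is a list of (candidate, human_score) pairs on the same scale
    as the evaluator. Returns the agreement rate plus the two kinds of
    disagreement, so you can tell a lenient judge from a strict one.
    """
    if not golden_set:
        raise ValueError("golden set is empty")
    agree = too_lenient = too_strict = 0
    for candidate, human_score in golden_set:
        machine_pass = evaluator(candidate) >= threshold
        human_pass = human_score >= threshold
        if machine_pass == human_pass:
            agree += 1
        elif machine_pass:
            too_lenient += 1     # evaluator accepted what humans rejected
        else:
            too_strict += 1      # evaluator rejected what humans accepted
    n = len(golden_set)
    return {"agreement": agree / n, "too_lenient": too_lenient, "too_strict": too_strict}
```

Splitting the disagreements tells you which way to adjust: a high `too_lenient` count points to a rubric that is too loose, a high `too_strict` count to a loop that will struggle to converge.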

Composing patterns in production

No real system uses one pattern in isolation. The patterns nest. A production research assistant might look like this: an orchestrator receives the brief and writes a plan; each plan step is a worker running a ReAct tool-use loop; the worker outputs pass through an evaluator-optimiser loop before the orchestrator synthesises them; and a final reflection step checks the synthesis against the original brief. Five of the six patterns, one system.

The table below is the quick-reference you actually use when designing — what each pattern is for, and what it costs you. Cost here means tokens and money; latency means wall-clock time to a final answer. Both rise as you add patterns, which is exactly why you add them deliberately.

| Pattern | Reach for it when… | Cost / latency trade-off |
| --- | --- | --- |
| Reflection | A wrong answer is expensive and you can state the quality criteria | Roughly 2× cost and latency per accepted output; cheap if the critic is deterministic |
| Tool Use (ReAct) | The agent needs live data, must act, or must ground answers in a source | Latency scales with number of steps; the foundation, so the baseline cost |
| Planning | Multi-step, dependent tasks that benefit from being thought through up front | One extra planning call up front; overhead for short, single-hop tasks |
| Multi-Agent Collaboration | Sub-tasks are genuinely parallel or need isolated context | Multiplies token spend; coordination overhead; new silent-failure modes |
| Orchestrator-Worker | Work fans out into many independent, similar sub-tasks | Lowest latency via parallelism; cost scales with fan-out, so cap it |
| Evaluator-Optimiser | There is a clear, scoreable target and iteration improves the result | Cost scales with rounds; only as good as the evaluator's calibration |

One more composition note: keep the patterns legible to each other. When a planner hands a sub-task to a worker, pass a clean, scoped instruction — not the entire conversation so far. When an evaluator returns feedback, return it as structured fields, not prose the optimiser has to re-parse. Most composition bugs are context-hygiene bugs in disguise.
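Two small data shapes are enough to illustrate both points. They are assumptions for the sake of example, not a standard: `SubTask`, `Feedback`, their fields and `render_feedback` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    """What a planner hands a worker: the instruction and only the context it needs."""
    instruction: str
    context: tuple = ()          # a few relevant facts, not the whole transcript
    output_format: str = "plain text"


@dataclass(frozen=True)
class Feedback:
    """What an evaluator hands back: fields the optimiser can act on directly."""
    score: float                 # 0.0 to 1.0 against the rubric
    failed_criteria: tuple = ()  # rubric items that were not met
    suggestions: tuple = ()      # concrete changes to make


def render_feedback(fb: Feedback) -> str:
    """Turn structured feedback into a short, predictable prompt fragment."""
    lines = [f"Score: {fb.score:.2f}"]
    lines += [f"Failed: {c}" for c in fb.failed_criteria]
    lines += [f"Change: {s}" for s in fb.suggestions]
    return "\n".join(lines)
```

Because the fields are explicit, a trace shows exactly what crossed each seam, which makes a dropped fact or a contradictory instruction much easier to spot.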

From a verified Builder

"We rebuilt a 'five-agent' pipeline as a single planner with three tools and a reflection pass. It was faster, a third of the cost, and we could finally read the trace when it went wrong. The patterns are real — but fewer, better-composed ones beat a crowd."

— Anjali, Verified Builder · Pune, India

Common pitfalls — the failures are architectural

It bears repeating because it is the single most useful thing to internalise: from 2024 through 2026, the agents that failed in production overwhelmingly failed on architecture, not on model quality. The same handful of mistakes recur regardless of which framework or model is underneath.

  • No stopping condition. Loops that can run forever, fan-outs with no cap, reflection that polishes endlessly. Every loop needs a step budget, a cost budget and a definition of done.
  • Context bloat. Stuffing the window with the full history every turn degrades reasoning and inflates cost. Pass scoped, relevant context; summarise old turns.
  • Ambiguous tools. Vague descriptions and under-specified arguments cause wrong tool choices and recovery loops. Write tool docs for a new colleague.
  • Silent failures. An agent that swallows a tool error and carries on produces confident nonsense. Make failures loud, and instrument everything.
  • Patterns added without measurement. Reaching for multi-agent or reflection because it sounds sophisticated, not because a metric demanded it. Add a pattern, measure, keep it only if it earned its place.

The thread running through all five is observability. You cannot fix an architectural failure you cannot see. Before you scale an agent, you need traces, cost attribution and the ability to replay a bad run. Our guide on instrumenting agents with OpenTelemetry is the right companion to this section.
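Even before you adopt a tracing stack, a thin wrapper around every tool call gets you most of the way. The sketch below uses only the standard library and an in-memory list; `traced_call`, `ToolError` and `TRACE` are hypothetical names, and in a real system the record would go to your tracing backend rather than a list.

```python
import time

TRACE = []   # stand-in for a real tracing backend


class ToolError(RuntimeError):
    """A tool failed; carries the tool name so the failure is attributable."""


def traced_call(name, fn, **args):
    """Call a tool, record what happened, and re-raise failures loudly."""
    started = time.perf_counter()
    record = {"tool": name, "args": args, "ok": False, "error": None}
    try:
        result = fn(**args)
        record["ok"] = True
        return result
    except Exception as exc:
        record["error"] = f"{type(exc).__name__}: {exc}"
        raise ToolError(f"{name} failed: {exc}") from exc
    finally:
        record["seconds"] = time.perf_counter() - started
        TRACE.append(record)     # recorded on success and on failure
```

The caller can still choose to catch `ToolError` and feed it back to the model as an observation. What it cannot do is miss it, because every call, good or bad, leaves a record.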

Avoid

Reaching for a bigger or newer model as the first fix for a misbehaving agent. If the problem is a runaway loop, a bloated context, or a badly described tool, a stronger model burns more money making the same architectural mistake faster.

Next steps

You now have the full vocabulary. The most valuable thing you can do with it is restraint: design the simplest system that solves your task — a planner that uses tools, with a stop condition — and add Reflection, Multi-Agent Collaboration, Orchestrator-Worker or Evaluator-Optimiser only when a measured failure justifies the cost. Re-read any agent codebase through this lens and it stops being a tangle and becomes a composition of named moves you can reason about.

From here, three concrete directions. To choose the tooling that will host these patterns, our comparison of agent frameworks for 2026 and the agent-SDK landscape map the options across OpenAI, Google and Anthropic. To go deeper on the build loop itself, return to how to build a production AI agent. And to make the whole thing measurable, start the evaluation suite before you scale, not after.

The patterns are the durable part. Frameworks will keep churning; the six moves will still be how agents are built.

Pattern naming and definitions draw on widely used practitioner taxonomies, including SitePoint's 2026 write-up on agentic design patterns, and field guidance from vdf.ai and innovatrixinfotech.com.