What changed, and what you now have to decide
Two years ago, choosing how to build an agent meant gluing together a model API, a vector store and a pile of your own loop code. By the middle of 2026 the picture has narrowed sharply. The agent-framework field has consolidated to roughly six major harnesses, each backed by a hyperscaler, a frontier lab, or a strong open-source community. That consolidation is good news: there are now defaults you can reach for. It is also a trap, because each default embeds a different philosophy about how an agent should be controlled, where its memory lives, and how much of your stack you hand to one vendor.
The six that matter are the Claude Agent SDK from Anthropic, the OpenAI Agents SDK, AWS Strands Agents, LangGraph from the LangChain team, and the two community-led options, CrewAI and AG2. There is also a seventh path worth naming: the IDE-native harness, where Cursor bakes the agent loop straight into the editor with rules files, loop detection and model-specific prompt adaptation. That is a different shape of decision, but for many small teams it is the one they actually make first.
This guide will not crown a winner. The honest answer to "which framework is best" is "best for what, on whose infrastructure, in which language". So we will frame the choice around six axes, give you a comparison table, and then walk through a concrete way to decide. The lens throughout is a builder in Bengaluru or Bristol who has to put a working agent in front of real users next quarter, not a researcher chasing a benchmark.
Axis 1 — The control model
This is the axis that decides everything else, so start here. Agent frameworks fall into three camps by how much control you hand the model over its own flow.
The first camp is the autonomous tool-loop harness: the Claude Agent SDK and the OpenAI Agents SDK. You give the model a set of tools, a goal and some guardrails, and it decides what to call and when. The Claude Agent SDK leans into this with a built-in permission model, a hooks system that fires at defined points in the agent loop, and explicit support for long-running, multi-session agents. You trade fine-grained orchestration for speed of build — you are not drawing a flowchart, you are setting a budget and a boundary and letting the loop run.
The second camp is the explicit graph or state machine, which is LangGraph's whole identity. You model the agent's workflow as a directed graph with nodes, edges and explicit state. Nothing happens that you did not draw. That is more upfront work, but it buys you deterministic branching, clean human-in-the-loop checkpoints and a state object you can inspect and replay. If your workflow has hard approval gates — a loan decision in fintech, a clinical triage step, a compliance sign-off — the graph model is usually the right one.
The third camp is role-based crews, which is where CrewAI and AG2 sit. You describe agents as roles — a researcher, a writer, a reviewer — and let them collaborate, often by passing messages. CrewAI optimises for accessibility and quick iteration; AG2, the community successor to AutoGen, leans into conversational richness and code execution between agents. This shape is intuitive and demos beautifully. It is also the hardest to make boringly reliable in production, because the coordination is emergent rather than drawn.
Write your control model down in one sentence before you pick a framework. If the sentence is "the model figures out the steps", you want a tool-loop harness. If it is "these exact steps run in this order with an approval here", you want a graph. If you cannot finish the sentence, you are not ready to choose — and that is fine, prototype first.
Axis 2 — Lock-in versus portability
Every framework sits somewhere on a line between single-vendor convenience and model-agnostic portability. AWS Strands is model-driven and integrates deeply with Amazon Bedrock, but it will also drive Claude through the Anthropic API, the Llama family, Ollama for local development, and other providers via LiteLLM — so the framework is AWS-shaped without being AWS-only. LangGraph, CrewAI and AG2 are model-agnostic by design; you wire in whichever provider you like.
The vendor SDKs are the sharper end of the trade. The Claude Agent SDK and OpenAI Agents SDK are easiest when you stay inside their respective ecosystems, and that is the point — the convenience is real. The question to ask is not "is there lock-in" but "what specifically is locked". Usually it is not the model call, which you can swap; it is the surrounding state, memory and session layer. Which is the next axis.
Axis 3 — Hosted memory and state, or do it yourself
An agent without memory restarts cold every session. You can build persistence yourself — a database, a checkpointer, a retrieval layer — or you can rent it. Anthropic's Claude Managed Agents Memory reached public beta on 23 April 2026, giving you persistent cross-session memory inside Anthropic's hosted agent runtime. That is a genuine accelerant: you stop writing plumbing and start shipping behaviour.
The catch is that hosted memory is the stickiest form of lock-in, because state is far harder to migrate than a model call. LangGraph takes the opposite stance, shipping its own persistence and checkpointing so your state lives wherever you run it. For a team in India or the UK weighing data-residency obligations under DPDP or UK GDPR, where the memory physically lives is not a footnote — it is a compliance line item. Decide who owns the agent's memory before you write the first tool.
Hosted memory and managed runtimes are wonderful until an audit, a data-residency rule, or a vendor price change forces a move. The model call ports in an afternoon; a year of accumulated agent state does not. If you are in a regulated sector, keep the persistence layer under your own control even if it costs you a fortnight of build time.
Axis 4 — Language ecosystem
Most of these frameworks lead with Python, and LangGraph, CrewAI and AG2 are firmly Python-first. If your data and ML stack already lives in Python, that alignment removes friction. The Claude Agent SDK and the OpenAI Agents SDK both ship first-class TypeScript alongside Python, which matters more than it sounds: if your product is a Node service or an edge worker, a TypeScript-native harness keeps the agent in the same runtime as the rest of your application instead of bolting on a Python sidecar.
The pragmatic rule is unglamorous — pick the language your production service already speaks. A two-person team in Manchester shipping a TypeScript SaaS should not stand up a Python microservice just to satisfy a framework, and a Pune team whose backend is FastAPI should not contort itself into Node. The language your on-call engineer can debug at 2 a.m. is the right one.
Axis 5 — Production concerns that bite after launch
Demos hide the things that decide whether an agent survives contact with real traffic. Before you commit, score each framework on the unglamorous list:
- Observability — can you trace every tool call, token and decision? Strands ships with observability hooks; LangGraph's inspectable state and the wider tracing ecosystem make replay straightforward. Treat anything you cannot trace as a future incident.
- Evaluation — you need a way to run the agent against a fixed test set and catch regressions. Graph frameworks make this easier because the steps are explicit; autonomous loops need you to build the harness around them.
- Cost control — autonomous loops can run long and spend hard. The Claude Agent SDK's permission and hooks model gives you natural choke points to cap spend; in a free-running crew, cost is emergent and harder to bound.
- Permissions and safety — who decides whether the agent may delete a file, send an email, or move money? A built-in permission model, as in the Claude Agent SDK, is a head start over rolling your own.
- Multi-session durability — long-running agents must survive restarts and resume cleanly. This is where managed memory and explicit checkpointing earn their keep.
None of this shows up in a weekend prototype. All of it shows up in week three of production. Weight these axes accordingly.
Axis 6 — IDE-native, library, or SDK
There is a shape of agent that never touches a framework SDK at all: the one living inside your editor. Cursor integrates its harness into the IDE — rules files that steer behaviour, loop detection to stop runaway sessions, and model-specific prompt adaptation under the hood. For a solo builder or a small team whose "agent" is really an aggressive coding assistant, that is often the whole answer, and a far lighter commitment than standing up the Agents SDK. The distinction is between an agent you embed in a product (library or SDK) and an agent you use to build the product (IDE-native). Be honest about which one you actually need before you reach for the heavier tool.
The comparison at a glance
One table, six frameworks, the axes that decide the call. Treat best-fit as a starting bias, not a rule.
| Framework | Backer | Control model | Language | Hosted memory | Best-fit use case |
|---|---|---|---|---|---|
| Claude Agent SDK | Anthropic (frontier lab) | Autonomous tool-loop + permissions/hooks | Python & TypeScript | Yes — Managed Agents Memory (beta) | Long-running, multi-session agents with safety gates |
| OpenAI Agents SDK | OpenAI (frontier lab) | Autonomous tool-loop + sandbox harness | Python & TypeScript | Via OpenAI platform | Teams already on the OpenAI stack |
| AWS Strands | AWS (hyperscaler) | Model-driven, light orchestration | Python & TypeScript | Via AWS services | Bedrock-centric, AWS-native builds |
| LangGraph | LangChain (OSS) | Explicit directed graph + state | Python-first | DIY — own checkpointing/persistence | Branching workflows with approval gates |
| CrewAI | Community / OSS | Role-based crews | Python-first | DIY | Fast iteration on multi-role tasks |
| AG2 | Community / OSS (AutoGen successor) | Conversational, event-driven crews | Python-first | DIY | Code-executing, conversational multi-agent |
Every article here is written for Builders. Want your name on the next one?
AI Tech Connect lists AI engineers, founders and researchers across India and the UK — and the people hiring browse it to find them. Adding your profile is free.
Become a Verified Builder →How to actually choose
Enough axes. Here is a sequence that gets a first production agent shipped without a six-month framework debate.
- Start simple. Your first agent rarely needs a multi-agent crew or a forty-node graph. A single autonomous loop with three or four well-described tools handles more than you would expect. Reach for the simplest harness that fits, and add structure only when a real failure forces it.
- Match the framework to your control needs, not the hype. Re-read your one-sentence control model from Axis 1. Autonomous goal? A tool-loop SDK. Hard branching with gates? LangGraph. Several cooperating roles? CrewAI or AG2. The sentence picks the camp; the camp narrows the shortlist to two.
- Let your infrastructure break the tie. If you live on AWS, start with Strands. If your product is TypeScript on the edge, the TypeScript-native SDKs win. If your team already runs the LangChain ecosystem, LangGraph is the path of least resistance. Where you already are is a stronger signal than any benchmark.
- Avoid premature framework lock-in — prototype on two. Build the same small agent on your top two shortlisted frameworks in a single week. Keep your tool implementations and prompt logic in plain functions that either framework can call, so the framework stays a thin shell around portable code. The one that fights you less in week one is usually the one that fights you less in production.
- Decide who owns memory and observability before you scale. Hosted memory and managed runtimes accelerate the prototype; for anything regulated or long-lived, settle the persistence and tracing question deliberately rather than inheriting whatever the framework defaulted to.
For an Indian or UK team shipping its first production agent, the safe default in 2026 looks like this: a single-vendor tool-loop SDK in the language your service already speaks, with your tools and prompts kept framework-agnostic so a later move stays cheap. Graduate to an explicit graph the moment a hard approval gate appears, and reach for a crew only when you genuinely have distinct roles that must collaborate. Most teams over-engineer the framework and under-engineer the evals. Do the opposite.
Primary references worth reading before you commit: Anthropic's guide to building agents with the Claude Agent SDK, AWS's introduction to Strands Agents, and the LangGraph overview from the LangChain team. For the in-IDE comparison point, our look at Cursor Composer 2.5 and the Cursor SDK covers the editor-native harness in detail. If your interest is the model side of the loop, see Claude Opus 4.8's dynamic workflows and parallel subagents, and for a self-hosted, fully open stack, OpenClaw and the self-hosted agent stack. Teams wanting a smarter orchestration layer should read our piece on whether your agent's control layer should be Bayesian.
The bottom line for builders
The consolidation to six harnesses means you no longer have to invent the wheel — but it also means the wheel comes pre-shaped with someone's opinion about control, memory and lock-in baked in. Pick on axes, not headlines. Write your control model in a sentence, let your existing infrastructure and language break the tie, prototype on two before you commit, and keep your tools and prompts portable so today's choice is not a five-year sentence. The framework is the easy part to change later; the evals, the permissions and the memory are the hard parts. Spend your attention there.