Which agent SDK should we pick for a single-vendor stack?

Pick the first-party SDK that matches your existing model commitment. If you are already on OpenAI for chat workloads, the Agents SDK with the Responses API is the path of least resistance. On Google Cloud or Vertex, ADK is the better fit. On Anthropic, the Agent SDK plus Managed Agents covers long-horizon work. The price you pay is harder model-switching later.

Is LangGraph still relevant now that the labs ship their own SDKs?

Yes. LangGraph 1.0, generally available since 22 October 2025, is the only one of the five with a model-agnostic graph runtime, durable state and a mature human-in-the-loop API. For multi-model orchestration and any workflow that needs to pause for days, it is still the strongest choice.

Do the SDKs support MCP and A2A?

All five support the Model Context Protocol for tool access. Anthropic, OpenAI, Google, Microsoft and AWS co-govern MCP through the Linux Foundation Agentic AI Foundation. A2A, the agent-to-agent protocol, has native support in Google ADK, LangGraph, CrewAI, LlamaIndex Agents, Semantic Kernel and AutoGen. OpenAI and Anthropic support A2A via community adapters today.

Can we mix LangGraph with Anthropic Agent SDK and Google ADK in the same stack?

Yes, and a growing number of teams do exactly this. LangGraph handles the outer graph, individual nodes wrap first-party SDKs as tools, and MCP servers expose shared resources. The trade-off is two layers of observability and two or three SDK upgrade paths to track.

What about CrewAI versus LangGraph for multi-agent debate?

CrewAI, with roughly 45.9k GitHub stars and the new NVIDIA NeMo Agent Toolkit integration, is purpose-built for crew-style role play and reaches a working multi-agent debate faster. LangGraph wins when you need explicit control over the message graph, deterministic checkpoints and replay. Pick CrewAI for prototypes and content-style debates, LangGraph for anything that has to be auditable.

Agent-SDK wars: OpenAI vs Google ADK vs Anthropic

What changed in the last sixty days

If you stepped away from agent tooling for the first quarter of 2026, you came back to a different stack. OpenAI used its March platform update to ship the open-source Agents SDK and the new Responses API — a deliberate consolidation that folds in the tool-use ergonomics of the soon-to-be-deprecated Assistants API. Google countered in April with Agent Development Kit (ADK), including the Java 1.0 release that introduced an app and plugin architecture, event compaction, and a long roster of first-party tools (GoogleMapsTool, UrlContextTool, ContainerCodeExecutor, ComputerUseTool). Anthropic published the Claude Agent SDK alongside Claude 4.6, then layered Managed Agents on top — a hosted, long-horizon execution surface with durable sessions and stable harnesses.

That is just the labs. The independent ecosystem moved at the same pace. LangGraph 1.0 went generally available on 22 October 2025, with zero breaking changes, durable state and a first-class human-in-the-loop API that Uber, LinkedIn and Klarna had already been depending on in production. CrewAI sits at roughly 45.9k GitHub stars and is now a first-class target inside NVIDIA's NeMo Agent Toolkit, which adds parallel execution, speculative branching and node-level priority routing on top of CrewAI's crew-style orchestration. The toolkit also targets LangChain, LlamaIndex, Semantic Kernel and Google ADK — a quiet signal that the runtime layer underneath the SDKs is starting to consolidate.

For Indian Global Capability Centre (GCC) teams and UK public-sector builders, the question is no longer "which framework exists?" but "which of the five do we standardise on, and what do we lose by choosing one?" This piece works through five real workloads, scores each SDK against them, and finishes with the trade-offs nobody is putting on the slide deck.

Pro tip

Before the architecture call, agree on the workload mix for the next four quarters. The right SDK for a team that ships one long-horizon agent per quarter is not the right SDK for a team running fifty short-turn extractions an hour. Most disagreements in agent-framework selection are actually disagreements about workload, not about technology.

The five workloads we scored

We picked workloads that map to what we hear from the Verified Builder network — not benchmarks designed for press releases. Each one is something a team in Bengaluru, Pune, London or Manchester is shipping this quarter.

RAG over docs

Retrieval-augmented question answering over a private corpus. Could be a 200-page UK Financial Conduct Authority handbook, an Indian bank's product brochures, or a GCC's internal Confluence. Latency target sub-second; citation fidelity non-negotiable.

Tool-using browser agent

Agent that drives a real browser to complete a task — submitting a vendor onboarding form on the GeM portal, scraping pricing from a competitor's site, or executing a workflow inside a legacy supplier portal that has no API. Computer-use primitives required.

Multi-agent debate

Two or more agents argue different positions to surface assumptions before a human ratifies. Common in policy review at Whitehall, in pricing-committee work at FMCG GCCs, and increasingly in AI safety internal reviews.

Long-horizon code patch

Agent that reads a repository, plans a multi-file change, drafts a pull request, runs tests, fixes failures and waits for code-review feedback. The session lives for hours or days. Durable state and human-in-the-loop are mandatory.
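
To make "durable state and human-in-the-loop" concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical `PatchSession` record checkpointed to a JSON file; none of these names come from LangGraph, Managed Agents or any other framework discussed in this piece, and a production system would use the framework's own persistence layer instead.

```python
import json
import tempfile
from dataclasses import dataclass, asdict, field
from pathlib import Path

# Hypothetical session record; the field names are illustrative only.
@dataclass
class PatchSession:
    repo: str
    step: str = "plan"                      # plan -> draft_pr -> await_review -> done
    notes: list = field(default_factory=list)

STEPS = ["plan", "draft_pr", "await_review", "done"]

def save(session: PatchSession, path: Path) -> None:
    """Persist the whole session so a restart loses nothing."""
    path.write_text(json.dumps(asdict(session)))

def load(path: Path) -> PatchSession:
    """Rebuild the session from its last checkpoint."""
    return PatchSession(**json.loads(path.read_text()))

def advance(session: PatchSession, review_approved: bool = False) -> PatchSession:
    """Move one step forward, but never past review without a human decision."""
    if session.step == "done":
        return session
    if session.step == "await_review" and not review_approved:
        session.notes.append("paused: waiting for code review")
        return session
    session.notes.append(f"finished {session.step}")
    session.step = STEPS[STEPS.index(session.step) + 1]
    return session

def demo() -> PatchSession:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "session.json"
        session = PatchSession(repo="example/repo")
        for _ in range(3):                  # the third call hits the review gate
            session = advance(session)
            save(session, path)
        # Simulate a process restart hours or days later.
        resumed = advance(load(path), review_approved=True)
        save(resumed, path)
        return load(path)
```

The point of the sketch is the shape: every transition is checkpointed, and the review gate is a state the session can sit in indefinitely rather than a blocking call.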

Structured-output extraction

High-throughput, schema-conformant extraction from messy inputs — invoices, contracts, customer-service transcripts. Tens of thousands of calls a day; cost per call matters more than peak quality.
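
Schema conformance is the property that matters most for this workload, so it helps to see what a conformance gate looks like. The following is a minimal sketch using only the Python standard library. The `Invoice` schema and its three fields are hypothetical, and the model call that produces the raw string is out of scope.

```python
import json
from dataclasses import dataclass

# Hypothetical target schema for an invoice; not taken from any SDK.
@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    currency: str
    total: float

REQUIRED = {"invoice_id": str, "currency": str, "total": (int, float)}

def parse_invoice(raw: str) -> Invoice:
    """Parse model output and reject anything that is not schema-conformant."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    extra = set(data) - set(REQUIRED)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    for name, expected in REQUIRED.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for {name}")
    return Invoice(data["invoice_id"], data["currency"], float(data["total"]))
```

At tens of thousands of calls a day, a gate like this is where rejected outputs get counted and retried, which feeds directly into the cost-per-call figure.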

The decision matrix

Five SDKs, five workloads. Each cell scored as ✓ (strong fit), ◐ (workable with caveats) or ✗ (poor fit) with a one-line note. This is the table to bring to your architecture review.

SDK	RAG over docs	Browser agent	Multi-agent debate	Long-horizon code	Structured extraction
OpenAI Agents SDK	✓ file_search built in	✓ computer_use tool first-party	◐ handoffs work, no graph	◐ no durable state out of the box	✓ Responses API JSON mode
Google ADK	✓ Vertex grounding native	✓ ComputerUseTool, UrlContextTool	✓ A2A native, multi-agent first-class	◐ event compaction helps, no checkpointing	✓ strong schema enforcement
Anthropic Agent SDK	✓ long context plus prompt cache	◐ computer-use beta, smaller tool catalogue	◐ sub-agent pattern works, no debate primitive	✓ Managed Agents = durable sessions	◐ strong quality, higher cost per call
LangGraph 1.0	✓ model-agnostic, plug any retriever	◐ depends on chosen model SDK	✓ explicit graph = auditable debate	✓ durable state was built for this	◐ overhead per call adds up at high QPS
CrewAI	◐ crew metaphor awkward for plain RAG	◐ via tool wrappers, not first-party	✓ role-play debate is its sweet spot	◐ no native checkpointing, NeMo helps	✗ crew overhead per extraction is wasteful
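
One way to use the matrix above is to weight it by your own workload mix. The sketch below transcribes the symbols from the table and ranks the SDKs. The numeric points per symbol (2, 1, 0) and the example mixes are our assumptions for illustration; the matrix itself does not define a scoring scale.

```python
# Symbols transcribed from the matrix: one character per workload column.
WORKLOADS = ["rag", "browser", "debate", "long_horizon", "extraction"]
MATRIX = {
    "OpenAI Agents SDK":   "✓✓◐◐✓",
    "Google ADK":          "✓✓✓◐✓",
    "Anthropic Agent SDK": "✓◐◐✓◐",
    "LangGraph 1.0":       "✓◐✓✓◐",
    "CrewAI":              "◐◐✓◐✗",
}
# Assumed numeric weights for the symbols; adjust to taste.
POINTS = {"✓": 2, "◐": 1, "✗": 0}

def shortlist(mix: dict, keep: int = 2) -> list:
    """Rank SDKs by mix-weighted score and keep the top `keep`."""
    def score(row: str) -> int:
        return sum(mix.get(w, 0) * POINTS[c] for w, c in zip(WORKLOADS, row))
    # sorted() is stable, so ties keep the table's row order.
    ranked = sorted(MATRIX, key=lambda sdk: score(MATRIX[sdk]), reverse=True)
    return ranked[:keep]

# Example mixes (hypothetical): relative share of each workload.
extraction_heavy = shortlist({"extraction": 4, "rag": 1})
long_horizon_heavy = shortlist({"long_horizon": 3, "debate": 1})
```

Under these assumed weights, an extraction-heavy mix shortlists the OpenAI Agents SDK and Google ADK, while a long-horizon mix shortlists LangGraph 1.0 and the Anthropic Agent SDK.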

Feature comparison: licence, models, MCP, A2A, observability

The decision matrix tells you what each SDK is good at. The feature table tells you what you get when you adopt one — including the things that bite later.

SDK	Licence	Multi-model	MCP support	A2A support	Observability	GA status
OpenAI Agents SDK	MIT	Limited (LiteLLM bridge)	Native via Responses API	Community adapters	OpenAI Traces dashboard	GA, March 2026
Google ADK	Apache 2.0	Vertex Model Garden	Native (managed servers)	Native, first-class	Cloud Trace, OTel	GA Java 1.0, April 2026
Anthropic Agent SDK	MIT	Claude family only	Native, co-governed	Adapter required	Managed Agents traces	GA with Claude 4.6
LangGraph 1.0	MIT	Any model with provider	Native	Native	LangSmith	GA, 22 October 2025
CrewAI	MIT	Any model via LiteLLM	Via NeMo Agent Toolkit	Native	OTel, Crew Studio	Stable, ~45.9k stars

Watch out

First-party SDK lock-in is real. The OpenAI Agents SDK assumes the Responses API; the Anthropic Agent SDK assumes Messages and Managed Agents; ADK assumes Vertex tooling. Migrating off any of them later is rarely a one-line change — it is usually a re-architecture, because the framework's primitives leak into your domain code. If you anticipate switching models within twelve to eighteen months, put LangGraph in front and treat the lab SDKs as adapters.

The Indian GCC angle

For the Global Capability Centres in Bengaluru, Hyderabad, Pune and Gurugram, the cost ledger is unambiguous. A boutique consulting practice or GCC platform team will typically lose two to four weeks per SDK evaluation cycle once engineering time, security review and procurement are counted. With five candidates evaluated in turn, that is ten to twenty weeks: a quarter or more spent comparing instead of shipping. The pragmatic move is to short-list two SDKs, not five, and use the workload mix to drop three of them on day one.

One concrete example we are seeing repeatedly: a 400-engineer GCC running structured-output extraction at high QPS for a UK or US parent company. CrewAI is the wrong tool here — the crew metaphor adds cost and latency to every call, and at fifty thousand extractions a day that compounds. The right shortlist is OpenAI Agents SDK or ADK, scored on price per million tokens against the parent's existing cloud commitment. Most teams arrive at OpenAI by default; the GCCs that took twenty minutes to map their workload to ADK saved roughly forty percent on per-call cost because their parent was already on Google Cloud.
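
Scoring on price per million tokens is simple arithmetic, and it is worth writing down before the procurement conversation. The sketch below is illustrative only: the 50,000 calls a day comes from the example above, while the tokens per call and both prices are placeholder values we made up so the arithmetic is easy to follow. They are not real list prices, and real pricing usually separates input and output tokens.

```python
def daily_cost(calls_per_day: int, tokens_per_call: int, price_per_million: float) -> float:
    """Cost per day, in whatever currency the price is quoted in."""
    return calls_per_day * tokens_per_call / 1_000_000 * price_per_million

def saving(baseline: float, candidate: float) -> float:
    """Fractional saving of the candidate relative to the baseline."""
    return 1 - candidate / baseline

# Placeholder inputs: only the calls-per-day figure comes from the article.
CALLS = 50_000
TOKENS_PER_CALL = 2_000   # assumed average, prompt plus completion
PRICE_A = 1.00            # hypothetical price per million tokens
PRICE_B = 0.60            # hypothetical price per million tokens

cost_a = daily_cost(CALLS, TOKENS_PER_CALL, PRICE_A)
cost_b = daily_cost(CALLS, TOKENS_PER_CALL, PRICE_B)
print(f"A: {cost_a:.2f}/day, B: {cost_b:.2f}/day, saving: {saving(cost_a, cost_b):.0%}")
```

Swap in your own negotiated rates and measured token counts; the per-call difference only matters once it is multiplied by real volume.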

The second pattern: data-residency-sensitive workloads for Indian banks and insurers. RBI's outsourcing guidance and the DPDP Act push towards in-country inference. ADK on Vertex (with Mumbai region) and Anthropic Agent SDK via Bedrock (with Mumbai region) both qualify; OpenAI's footprint in India is thinner. This is not a quality call — it is a procurement call, and the SDK picks itself.

The UK public-sector angle

In the UK, the constraints are different and equally specific. NHS Trusts, the Home Office and local authorities face layered compliance requirements: the UK GDPR, the draft UK Frontier AI Bill reporting expectations, and the national procurement frameworks (G-Cloud 14, Crown Commercial Service AI DPS). The SDK choice has to fit a supplier model the framework already recognises.

For most NHS Trust pilots, the real question is whether the agent can be hosted under the Trust's existing Microsoft or Google cloud agreement, with full audit and a recognised data processing footprint. ADK on Vertex (UK regions) and OpenAI Agents SDK on Azure (UK South) both clear this bar. Anthropic via Bedrock (London) is increasingly viable. LangGraph is attractive for procurement — it is cloud-agnostic and the Trust can describe it as "infrastructure" rather than a model dependency, which simplifies the compliance narrative considerably.

Public-sector teams should also pay attention to A2A. Whitehall is moving towards a model where one department's agent can be discovered and invoked by another department's agent under a known protocol. ADK and LangGraph have native A2A; OpenAI and Anthropic require adapters. If your two-year roadmap includes inter-departmental agents, that is a real constraint, not a theoretical one.

Multi-model orchestration: is the polyglot stack viable?

The interesting answer is yes, with a specific shape. The pattern that works in production looks like this: LangGraph as the outer graph, with individual nodes wrapping a first-party SDK as an internal tool. One node calls Anthropic Agent SDK for long-context analysis; another calls Google ADK for a structured extraction backed by Gemini's grounding; a third calls OpenAI's Responses API for a fast tool-use sub-task. All three share an MCP server for the corpus and a second MCP server for the tool catalogue.
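
The shape of that pattern can be sketched in plain Python. This is deliberately not LangGraph's API: the tiny graph runner, the node names and the stub functions are all hypothetical stand-ins, and each stub only records what it did rather than calling a vendor SDK.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]
Node = Callable[[State], State]

# Stub nodes standing in for calls into each vendor SDK. In a real stack each
# body would call that vendor's client; here they only transform the state.
def analyse_long_context(state: State) -> State:
    return {**state, "analysis": f"summary of {state['document']}"}

def extract_fields(state: State) -> State:
    return {**state, "fields": f"fields from {state['analysis']}"}

def run_tool_subtask(state: State) -> State:
    return {**state, "result": f"action on {state['fields']}"}

# The outer graph: node names, the callable behind each, and the edges.
NODES: Dict[str, Node] = {
    "analyse": analyse_long_context,   # e.g. a long-context model
    "extract": extract_fields,         # e.g. a grounded extraction model
    "act": run_tool_subtask,           # e.g. a fast tool-use model
}
EDGES: Dict[str, str] = {"analyse": "extract", "extract": "act"}

def run(entry: str, state: State) -> Tuple[State, List[str]]:
    """Walk the graph from `entry`; return the final state and the path taken."""
    path: List[str] = []
    current: Optional[str] = entry
    while current is not None:
        state = NODES[current](state)
        path.append(current)
        current = EDGES.get(current)   # None once a node has no outgoing edge
    return state, path
```

Because the outer graph only sees named nodes and a shared state, swapping the model behind one node does not touch the others.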

This works because MCP is now a genuinely shared protocol. By February 2026 the Model Context Protocol crossed 97 million monthly SDK downloads, and the Linux Foundation's Agentic AI Foundation — co-founded by OpenAI, Anthropic, Google, Microsoft, AWS and Block — now governs both MCP and A2A across nearly 150 member organisations. We covered the broader interop story in our AGNTCY interoperability piece; the practical upshot is that you can hot-swap a tool server without re-deploying the agent.

The polyglot stack has real costs: two or three SDK upgrade paths to track, two layers of observability (LangSmith plus the lab's native tracing), and a non-trivial DevOps burden. For most teams that is the wrong trade-off. For the teams that explicitly need model diversity, typically platform teams serving multiple internal customers or any team that wants to keep switching costs low, it is exactly right.

What about the skills layer above SDKs?

One thing we did not score: the emerging "skills" layer that sits above all five SDKs. GitHub's gh skill is the most visible — a package manager for agent skills. ADK has its own SkillToolset that lets agents load domain expertise on demand. Anthropic ships skills bundled with Managed Agents. The pattern is the same in each case: small, composable, version-controlled units of capability that any SDK can load.

If your shortlist is between two near-equivalent SDKs, look at the skills ecosystem before committing. The framework that lets your team ship one well-tested skill and reuse it across three agents is the framework that compounds.

The IDE layer is changing the shape of the question

Worth noting: the SDK is no longer the only place agent loops live. Cursor 3's parallel agents push much of the orchestration into the IDE itself — for code workloads, the relevant question is increasingly "which model does my IDE call" rather than "which SDK do I write". For enterprise workloads beyond code — for instance, the long-running operational agents in ServiceNow's autonomous workforce — the SDK still matters because the runtime is yours.

So, which one do you pick?

There is no winner; there is a workload-shaped answer. To compress the matrix into a four-line heuristic:

Single-vendor stack with short to medium-horizon agents — pick the first-party SDK matching your model commitment. OpenAI for Azure shops, ADK for Google Cloud, Anthropic for Bedrock-heavy estates.
Long-horizon, durable, audit-heavy — LangGraph 1.0. The durable-state primitive is genuinely better than what the labs ship.
Multi-agent debate or role-play — CrewAI for prototypes; LangGraph if it has to be auditable.
High-throughput structured extraction — OpenAI Agents SDK or ADK. Skip the orchestration frameworks; the overhead per call is the killer.

Whichever you pick, write your domain code against your own interface, not the SDK's. The labs are shipping fast and they will keep changing primitives. Treat them as adapters, not as foundations, and the next sixty days of releases will not require another quarter of evaluations.
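
Writing against your own interface can be as small as this. The sketch uses Python's standard `typing.Protocol`; the `AgentBackend` interface, the two adapter classes and `summarise_contract` are hypothetical names for illustration, and the adapter bodies are stubs where a real one would translate to the vendor's client.

```python
from typing import Protocol

class AgentBackend(Protocol):
    """The only surface domain code is allowed to depend on (hypothetical)."""
    def complete(self, prompt: str) -> str: ...

# Adapters: one per SDK. These bodies are stubs for illustration.
class VendorAAdapter:
    def complete(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"

class VendorBAdapter:
    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"

def summarise_contract(backend: AgentBackend, text: str) -> str:
    """Domain code: knows about contracts, knows nothing about any SDK."""
    return backend.complete(f"Summarise: {text}")

# Switching vendors is a change at the call site, not in the domain function.
summary_a = summarise_contract(VendorAAdapter(), "NDA")
summary_b = summarise_contract(VendorBAdapter(), "NDA")
```

When a lab changes a primitive, the change lands in one adapter class instead of spreading through the domain code.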

Primary sources for this piece: platform.openai.com/docs, google.github.io/adk-docs, docs.anthropic.com, langchain-ai.github.io/langgraph, and github.com/crewAIInc/crewAI.
