What changed in the last sixty days
If you stepped away from agent tooling for the first quarter of 2026, you came back to a different stack. OpenAI used its March platform update to ship the open-source Agents SDK and the new Responses API — a deliberate consolidation that folds in the tool-use ergonomics of the soon-to-be-deprecated Assistants API. Google countered in April with Agent Development Kit (ADK), including the Java 1.0 release that introduced an app and plugin architecture, event compaction, and a long roster of first-party tools (GoogleMapsTool, UrlContextTool, ContainerCodeExecutor, ComputerUseTool). Anthropic published the Claude Agent SDK alongside Claude 4.6, then layered Managed Agents on top — a hosted, long-horizon execution surface with durable sessions and stable harnesses.
That is just the labs. The independent ecosystem moved at the same pace. LangGraph 1.0 went generally available on 22 October 2025, with zero breaking changes, durable state and a first-class human-in-the-loop API that Uber, LinkedIn and Klarna had already been depending on in production. CrewAI sits at roughly 45.9k GitHub stars and is now a first-class target inside NVIDIA's NeMo Agent Toolkit, which adds parallel execution, speculative branching and node-level priority routing on top of CrewAI's crew-style orchestration. The toolkit also targets LangChain, LlamaIndex, Semantic Kernel and Google ADK — a quiet signal that the runtime layer underneath the SDKs is starting to consolidate.
For Indian GCC teams and UK public-sector builders, the question is no longer "which framework exists" but "which of the five do we standardise on, and what do we lose by choosing one?" This piece works through five real workloads, scores each SDK, and finishes with the trade-offs nobody is putting on the slide deck.
Before the architecture call, agree on the workload mix for the next four quarters. The right SDK for a team that ships one long-horizon agent per quarter is not the right SDK for a team running fifty short-turn extractions an hour. Most disagreements in agent-framework selection are actually disagreements about workload, not about technology.
The five workloads we scored
We picked workloads that map to what we hear from the Verified Builder network — not benchmarks designed for press releases. Each one is something a team in Bengaluru, Pune, London or Manchester is shipping this quarter.
RAG over docs
Retrieval-augmented question answering over a private corpus. Could be a 200-page UK Financial Conduct Authority handbook, an Indian bank's product brochures, or a GCC's internal Confluence. Latency target sub-second; citation fidelity non-negotiable.
Tool-using browser agent
Agent that drives a real browser to complete a task — submitting a vendor onboarding form on the GeM portal, scraping pricing from a competitor's site, or executing a workflow inside a legacy supplier portal that has no API. Computer-use primitives required.
Multi-agent debate
Two or more agents argue different positions to surface assumptions before a human ratifies the outcome. Common in policy review at Whitehall, in pricing-committee work at FMCG GCCs, and increasingly in internal AI-safety reviews.
Long-horizon code patch
Agent that reads a repository, plans a multi-file change, drafts a pull request, runs tests, fixes failures and waits for code-review feedback. The session lives for hours or days. Durable state and human-in-the-loop are mandatory.
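For this workload, "durable state" in practice means every step is persisted, so a session survives a restart and can block on a reviewer without holding a process open. A minimal, framework-agnostic sketch of that idea, assuming a JSON file per session as the checkpoint store; `STEPS`, `run_session` and the `awaiting_review` flag are hypothetical names and do not reproduce LangGraph's checkpointer or Managed Agents' session APIs.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical plan for a code-patch agent; real steps would call a model and tools.
STEPS = ["read_repo", "plan_change", "draft_pr", "run_tests", "await_review", "apply_feedback"]


def load_checkpoint(path: Path) -> dict:
    """Return the saved session state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_step": 0, "log": [], "awaiting_review": False}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename it over the old checkpoint so a crash
    mid-write does not leave a half-written checkpoint behind."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def run_session(path: Path, review_approved: bool = False) -> dict:
    """Advance the session from its last checkpoint, pausing at the human-review gate."""
    state = load_checkpoint(path)
    if state["awaiting_review"]:
        if not review_approved:
            return state  # still waiting on a human; do no work
        state["awaiting_review"] = False
        state["next_step"] += 1  # step past the review gate
        save_checkpoint(path, state)
    while state["next_step"] < len(STEPS):
        step = STEPS[state["next_step"]]
        if step == "await_review":
            state["awaiting_review"] = True
            save_checkpoint(path, state)
            return state  # the process can exit here; the checkpoint survives
        state["log"].append(step)  # placeholder for the real work of this step
        state["next_step"] += 1
        save_checkpoint(path, state)  # persist after every completed step
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = Path(d) / "session.json"
        paused = run_session(ckpt)  # runs until the review gate, then stops
        print(paused["log"], paused["awaiting_review"])
        resumed = run_session(ckpt, review_approved=True)  # e.g. hours later, new process
        print(resumed["log"])
```

A production system would keep checkpoints in a database or the framework's own store rather than a local file, but the shape (persist after each step, resume from the last checkpoint, block on an explicit human signal) is what the long-horizon column of the decision matrix is scoring.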
Structured-output extraction
High-throughput, schema-conformant extraction from messy inputs — invoices, contracts, customer-service transcripts. Tens of thousands of calls a day; cost per call matters more than peak quality.
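At this volume, the cheapest place to catch a non-conformant extraction is a strict check before the record reaches downstream systems. A minimal sketch, assuming a hypothetical invoice schema and a model reply that arrives as a JSON string; it uses only the standard library, whereas in practice an SDK's structured-output feature would do much of this work and this check would act as a last line of defence.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Invoice:
    # Hypothetical target schema for the invoice-extraction workload.
    invoice_number: str
    supplier: str
    total_pence: int
    currency: str


def parse_invoice(raw: str) -> Invoice:
    """Parse a model reply and reject anything that does not match the schema exactly.

    Raises ValueError (json.JSONDecodeError is a subclass) on any mismatch, so the
    caller can route the input to a retry or a human queue with a single except.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(Invoice)}
    if set(data) != set(expected):
        raise ValueError(f"keys mismatch: got {sorted(data)}, want {sorted(expected)}")
    for name, typ in expected.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, typ):
            raise ValueError(f"field {name!r} should be {typ.__name__}")
    return Invoice(**data)


if __name__ == "__main__":
    reply = ('{"invoice_number": "INV-7", "supplier": "Acme Ltd", '
             '"total_pence": 12999, "currency": "GBP"}')
    print(parse_invoice(reply))
    try:
        parse_invoice('{"invoice_number": "INV-7", "total_pence": "129.99"}')
    except ValueError as err:
        print("rejected:", err)
```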
The decision matrix
Five SDKs, five workloads. Each cell scored as ✓ (strong fit), ◐ (workable with caveats) or ✗ (poor fit) with a one-line note. This is the table to bring to your architecture review.
| SDK | RAG over docs | Browser agent | Multi-agent debate | Long-horizon code | Structured extraction |
|---|---|---|---|---|---|
| OpenAI Agents SDK | ✓ file_search built in | ✓ computer_use tool first-party | ◐ handoffs work, no graph | ◐ no durable state out of the box | ✓ Responses API structured outputs |
| Google ADK | ✓ Vertex grounding native | ✓ ComputerUseTool, UrlContextTool | ✓ A2A native, multi-agent first-class | ◐ event compaction helps, no checkpointing | ✓ strong schema enforcement |
| Anthropic Agent SDK | ✓ long context plus prompt cache | ◐ computer-use beta, smaller tool catalogue | ◐ sub-agent pattern works, no debate primitive | ✓ Managed Agents = durable sessions | ◐ strong quality, higher cost per call |
| LangGraph 1.0 | ✓ model-agnostic, plug any retriever | ◐ depends on chosen model SDK | ✓ explicit graph = auditable debate | ✓ durable state was built for this | ◐ overhead per call adds up at high QPS |
| CrewAI | ◐ crew metaphor awkward for plain RAG | ◐ via tool wrappers, not first-party | ✓ role-play debate is its sweet spot | ◐ no native checkpointing, NeMo helps | ✗ crew overhead per extraction is wasteful |
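One way to bring the matrix into an architecture review is to encode it as data and weight it by your own workload mix, so the shortlist falls out of the numbers rather than the loudest voice in the room. A sketch, assuming the mapping ✓ = 2, ◐ = 1, ✗ = 0; the scores are transcribed from the table above, while the weighting scheme and the `shortlist` helper are illustrative assumptions.

```python
# Scores transcribed from the decision matrix, using an assumed mapping ✓ = 2, ◐ = 1, ✗ = 0.
WORKLOADS = ("rag", "browser", "debate", "long_horizon", "extraction")
MATRIX = {
    "OpenAI Agents SDK":   (2, 2, 1, 1, 2),
    "Google ADK":          (2, 2, 2, 1, 2),
    "Anthropic Agent SDK": (2, 1, 1, 2, 1),
    "LangGraph 1.0":       (2, 1, 2, 2, 1),
    "CrewAI":              (1, 1, 2, 1, 0),
}


def shortlist(mix: dict, k: int = 2) -> list:
    """Rank SDKs by matrix score, weighted by each workload's share of your mix.

    `mix` maps workload names to relative volume, e.g. {"extraction": 8, "rag": 2}.
    Ties are broken alphabetically so the result is deterministic.
    """
    unknown = set(mix) - set(WORKLOADS)
    if unknown:
        raise ValueError(f"unknown workloads: {sorted(unknown)}")
    total = sum(mix.values())
    if total <= 0:
        raise ValueError("mix must have a positive total volume")
    weights = [mix.get(w, 0) / total for w in WORKLOADS]
    scored = {sdk: sum(s * w for s, w in zip(scores, weights))
              for sdk, scores in MATRIX.items()}
    return sorted(scored, key=lambda sdk: (-scored[sdk], sdk))[:k]


print(shortlist({"extraction": 8, "rag": 2}))
print(shortlist({"long_horizon": 6, "debate": 2, "rag": 2}))
```

An extraction-heavy mix shortlists Google ADK and the OpenAI Agents SDK; a mix dominated by long-horizon and debate work shortlists LangGraph 1.0 and the Anthropic Agent SDK. The weighting is deliberately crude: its main value is forcing the team to write the workload mix down first.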
Feature comparison: licence, models, MCP, A2A, observability
The decision matrix tells you what each SDK is good at. The feature table tells you what you get when you adopt one — including the things that bite later.
| SDK | Licence | Multi-model | MCP support | A2A support | Observability | GA status |
|---|---|---|---|---|---|---|
| OpenAI Agents SDK | MIT | Limited (LiteLLM bridge) | Native via Responses API | Community adapters | OpenAI Traces dashboard | GA, March 2026 |
| Google ADK | Apache 2.0 | Vertex Model Garden | Native (managed servers) | Native, first-class | Cloud Trace, OTel | GA Java 1.0, April 2026 |
| Anthropic Agent SDK | MIT | Claude family only | Native, co-governed | Adapter required | Managed Agents traces | GA with Claude 4.6 |
| LangGraph 1.0 | MIT | Any model with provider | Native | Native | LangSmith | GA, 22 October 2025 |
| CrewAI | MIT | Any model via LiteLLM | Via NeMo Agent Toolkit | Native | OTel, Crew Studio | Stable, ~45.9k stars |
First-party SDK lock-in is real. The OpenAI Agents SDK assumes the Responses API; the Anthropic Agent SDK assumes Messages and Managed Agents; ADK assumes Vertex tooling. Migrating off any of them later is rarely a one-line change — it is usually a re-architecture, because the framework's primitives leak into your domain code. If you anticipate switching models within twelve to eighteen months, put LangGraph in front and treat the lab SDKs as adapters.
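Treating the lab SDKs as adapters can start as small as a `typing.Protocol` that domain code depends on, with one thin class per vendor behind it. A sketch under that assumption: `ChatBackend`, `summarise_contract` and both stub adapters are hypothetical names, and the stubs return canned text where a real adapter would call the OpenAI, Anthropic or ADK client and translate its request and response types.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """The only model surface domain code is allowed to see (hypothetical interface)."""

    def complete(self, system: str, user: str) -> str: ...


class StubOpenAIAdapter:
    # A real adapter would map to the vendor SDK's request/response types here.
    def complete(self, system: str, user: str) -> str:
        return f"[openai] {user[:40]}"


class StubAnthropicAdapter:
    # Same contract, different vendor; domain code cannot tell them apart.
    def complete(self, system: str, user: str) -> str:
        return f"[anthropic] {user[:40]}"


def summarise_contract(backend: ChatBackend, contract_text: str) -> str:
    """Domain logic: depends on ChatBackend, never on a vendor SDK type."""
    system = "Summarise the key obligations in three bullet points."
    return backend.complete(system, contract_text).strip()


if __name__ == "__main__":
    text = "The supplier shall deliver within 30 days of purchase order."
    for backend in (StubOpenAIAdapter(), StubAnthropicAdapter()):
        print(summarise_contract(backend, text))
```

Switching vendors then means writing one new adapter class; `summarise_contract` and its tests stay untouched. An orchestration layer such as LangGraph sits naturally at the same seam, calling adapters from its nodes.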
The Indian GCC angle
For the Global Capability Centres in Bengaluru, Hyderabad, Pune and Gurugram, the cost ledger is unambiguous. A boutique consulting practice or GCC platform team will typically lose two to four weeks per SDK evaluation cycle — engineering time, plus security review, plus procurement. With five candidates, that is a quarter spent comparing instead of shipping. The pragmatic move is to short-list two SDKs, not five, and use the workload mix to drop three of them on day one.
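The arithmetic behind "a quarter spent comparing", taking the two-to-four-week range per SDK at face value:

```latex
\underbrace{5 \times (2\text{--}4)\ \text{weeks}}_{\text{all five SDKs}} = 10\text{--}20\ \text{weeks}
\qquad\text{vs.}\qquad
\underbrace{2 \times (2\text{--}4)\ \text{weeks}}_{\text{shortlist of two}} = 4\text{--}8\ \text{weeks}
```

Against a 13-week quarter, a five-way evaluation eats at least ten weeks and can overrun into the next; a two-way shortlist leaves at least five weeks for shipping.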
One concrete example we are seeing repeatedly: a 400-engineer GCC running structured-output extraction at high QPS for a UK or US parent company. CrewAI is the wrong tool here — the crew metaphor adds cost and latency to every call, and at fifty thousand extractions a day that compounds. The right shortlist is OpenAI Agents SDK or ADK, scored on price per million tokens against the parent's existing cloud commitment. Most teams arrive at OpenAI by default; the GCCs that took twenty minutes to map their workload to ADK saved roughly forty percent on per-call cost because their parent was already on Google Cloud.
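To see why per-call overhead compounds at this volume, a back-of-the-envelope model helps. Every number in it (tokens per extraction, orchestration overhead tokens, price per million tokens) is a hypothetical placeholder rather than a quoted rate from any provider; substitute your own contract prices and measured token counts.

```python
def daily_cost(calls_per_day: int, tokens_per_call: int, overhead_tokens: int,
               price_per_million: float) -> float:
    """Daily spend when each call carries its payload plus fixed orchestration overhead."""
    total_tokens = calls_per_day * (tokens_per_call + overhead_tokens)
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical figures for an invoice-extraction pipeline.
CALLS = 50_000     # "fifty thousand extractions a day"
PAYLOAD = 1_500    # tokens per extraction (assumed)
PRICE = 1.00       # currency units per million tokens (assumed)

# "Overhead" stands in for role prompts and inter-agent messages (assumed 2,000 tokens).
lean = daily_cost(CALLS, PAYLOAD, overhead_tokens=0, price_per_million=PRICE)
crew = daily_cost(CALLS, PAYLOAD, overhead_tokens=2_000, price_per_million=PRICE)
print(f"lean: {lean:.2f}/day, with orchestration overhead: {crew:.2f}/day")
print(f"annualised difference: {(crew - lean) * 365:,.0f}")
# → lean: 75.00/day, with orchestration overhead: 175.00/day
# → annualised difference: 36,500
```

Because cost scales linearly with total tokens, any fixed per-call overhead and any per-token price difference (such as a cheaper rate under an existing cloud commitment) multiply straight through the daily volume.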
The second pattern: data-residency-sensitive workloads for Indian banks and insurers. RBI's outsourcing guidance and the DPDP Act push towards in-country inference. ADK on Vertex (with Mumbai region) and Anthropic Agent SDK via Bedrock (with Mumbai region) both qualify; OpenAI's footprint in India is thinner. This is not a quality call — it is a procurement call, and the SDK picks itself.
The UK public-sector angle
In the UK, the constraints are different and equally specific. NHS Trusts, the Home Office and local authorities work under layered requirements: the UK GDPR, the reporting expectations in the draft UK Frontier AI Bill, and the national procurement frameworks (G-Cloud 14, the Crown Commercial Service AI DPS). The SDK choice has to fit a supplier model those frameworks already recognise.
For most NHS Trust pilots, the real question is whether the agent can be hosted under the Trust's existing Microsoft or Google cloud agreement, with full audit and a recognised data processing footprint. ADK on Vertex (UK regions) and OpenAI Agents SDK on Azure (UK South) both clear this bar. Anthropic via Bedrock (London) is increasingly viable. LangGraph is attractive for procurement — it is cloud-agnostic and the Trust can describe it as "infrastructure" rather than a model dependency, which simplifies the compliance narrative considerably.
Public-sector teams should also pay attention to A2A. Whitehall is moving towards a model where one department's agent can be discovered and invoked by another department's agent under a known protocol. ADK and LangGraph have native A2A; OpenAI and Anthropic require adapters. If your two-year roadmap includes inter-departmental agents, that is a real constraint, not a theoretical one.
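What an inter-agent protocol buys is discovery by capability rather than by hard-coded integration. A deliberately simplified sketch of that idea, assuming an in-memory directory: `AgentCard`, `Directory` and the departments `dept-a` and `dept-b` are hypothetical, and the actual A2A specification defines its own card schema, discovery mechanism and transport, none of which this reproduces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCard:
    # Loosely inspired by the idea of a published agent description; fields are
    # simplified and hypothetical, not the A2A protocol's actual schema.
    name: str
    department: str
    endpoint: str
    skills: frozenset


class Directory:
    """Hypothetical cross-department directory where agents publish their cards."""

    def __init__(self) -> None:
        self._cards = []

    def publish(self, card: AgentCard) -> None:
        self._cards.append(card)

    def find(self, skill: str, caller_department: str) -> list:
        """Cards from other departments that advertise the requested skill."""
        return [c for c in self._cards
                if skill in c.skills and c.department != caller_department]


directory = Directory()
directory.publish(AgentCard("case-status", "dept-a", "https://dept-a.example.invalid/agent",
                            frozenset({"case_status_lookup"})))
directory.publish(AgentCard("eligibility", "dept-b", "https://dept-b.example.invalid/agent",
                            frozenset({"eligibility_check"})))

# dept-b's agent discovers a peer that can answer case-status questions.
for card in directory.find("case_status_lookup", caller_department="dept-b"):
    print(card.name, card.endpoint)
```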
Every article on AI Tech Connect is written by a Verified Builder or our editorial team.
Multi-model orchestration: is the polyglot stack viable?
The interesting answer is yes, with a specific shape. The pattern that works in production looks like this: LangGraph as the outer graph, with individual nodes wrapping a first-party SDK as an internal tool. One node calls Anthropic Agent SDK for long-context analysis; another calls Google ADK for a structured extraction backed by Gemini's grounding; a third calls OpenAI's Responses API for a fast tool-use sub-task. All three share an MCP server for the corpus and a second MCP server for the tool catalogue.
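The shape is easier to see with the vendor calls stubbed out. A minimal, framework-free sketch, assuming each node would wrap one lab SDK and that both MCP servers are addressed through shared configuration read at call time; the node functions, `TOOL_SERVERS` and the URLs are hypothetical, and the plain loop stands in for LangGraph's runtime rather than reproducing its API. In a real deployment the configuration would live outside the process; a dict stands in here.

```python
# Shared tool-server configuration. In production these would be MCP server
# addresses; the names and URLs here are hypothetical placeholders.
TOOL_SERVERS = {
    "corpus": "https://corpus.mcp.example.invalid",
    "tools": "https://tools.mcp.example.invalid",
}


def long_context_node(state: dict) -> dict:
    # Stub: a real node would call the Anthropic Agent SDK for long-context analysis.
    return {**state, "analysis": f"analysed {state['question']!r} via {TOOL_SERVERS['corpus']}"}


def extraction_node(state: dict) -> dict:
    # Stub: a real node would call Google ADK for grounded structured extraction.
    return {**state, "fields": {"source": TOOL_SERVERS["corpus"]}}


def fast_tool_node(state: dict) -> dict:
    # Stub: a real node would call the OpenAI Responses API for a quick tool-use step.
    return {**state, "action": f"tool call via {TOOL_SERVERS['tools']}"}


# The outer graph reduced to a fixed sequence; a graph framework would add
# branching, retries, checkpoints and human-in-the-loop interrupts.
GRAPH = [("analyse", long_context_node), ("extract", extraction_node), ("act", fast_tool_node)]


def run(question: str) -> dict:
    state = {"question": question, "trace": []}
    for name, node in GRAPH:
        state = node(state)
        state["trace"] = state["trace"] + [name]
    return state


print(run("Which clauses changed?")["analysis"])
# Swap the corpus server: nodes read the config at call time, so none of them change.
TOOL_SERVERS["corpus"] = "https://corpus-v2.mcp.example.invalid"
print(run("Which clauses changed?")["analysis"])
```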
This works because MCP is now a genuinely shared protocol. By February 2026 the Model Context Protocol crossed 97 million monthly SDK downloads, and the Linux Foundation's Agentic AI Foundation — co-founded by OpenAI, Anthropic, Google, Microsoft, AWS and Block — now governs both MCP and A2A across nearly 150 member organisations. We covered the broader interop story in our AGNTCY interoperability piece; the practical upshot is that you can hot-swap a tool server without re-deploying the agent.
The costs of the polyglot stack are real: two or three SDK upgrade paths to track, two layers of observability (LangSmith plus each lab's native tracing), and a non-trivial DevOps burden. For most teams that is the wrong trade-off. For teams that explicitly need model diversity, typically platform teams serving multiple internal customers or any team that wants to keep switching costs low, it is exactly right.
What about the skills layer above SDKs?
One thing we did not score: the emerging "skills" layer that sits above all five SDKs. GitHub's gh skill is the most visible — a package manager for agent skills. ADK has its own SkillToolset that lets agents load domain expertise on demand. Anthropic ships skills bundled with Managed Agents. The pattern is the same in each case: small, composable, version-controlled units of capability that any SDK can load.
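A sketch of what "small, composable, version-controlled" can mean in code, assuming a skill is simply a named, versioned bundle of instructions plus the tool names it depends on. The `Skill` type, `build_system_prompt`, `missing_tools` and the example skill are hypothetical; they do not mirror `gh skill`, ADK's SkillToolset or Anthropic's skill format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    version: str              # pin versions so every agent reuses exactly the same unit
    instructions: str
    required_tools: tuple = ()


def build_system_prompt(base: str, skills: list) -> str:
    """Compose an agent's system prompt from a base role plus loaded skills."""
    parts = [base]
    for s in sorted(skills, key=lambda s: s.name):
        parts.append(f"## Skill: {s.name}@{s.version}\n{s.instructions}")
    return "\n\n".join(parts)


def missing_tools(skills: list, available: set) -> set:
    """Tools the skills need that this agent's SDK adapter does not expose."""
    return {t for s in skills for t in s.required_tools} - available


gst = Skill("gst-invoice-rules", "1.2.0",
            "Validate GSTIN format and flag mismatched tax totals.", ("lookup_gstin",))

# The same pinned skill is loaded by three different agents.
for role in ("Extraction agent.", "Audit agent.", "Support agent."):
    prompt = build_system_prompt(role, [gst])
    print(prompt.splitlines()[0], "| missing tools:", missing_tools([gst], {"lookup_gstin"}))
```

The `missing_tools` check is the part that matters across SDKs: a skill only transfers cleanly if every agent that loads it can reach the tools it assumes.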
If your shortlist is between two near-equivalent SDKs, look at the skills ecosystem before committing. The framework that lets your team ship one well-tested skill and reuse it across three agents is the framework that compounds.
The IDE layer is changing the shape of the question
Worth noting: the SDK is no longer the only place agent loops live. Cursor 3's parallel agents push much of the orchestration into the IDE itself — for code workloads, the relevant question is increasingly "which model does my IDE call" rather than "which SDK do I write". For enterprise workloads beyond code — for instance, the long-running operational agents in ServiceNow's autonomous workforce — the SDK still matters because the runtime is yours.
So, which one do you pick?
There is no winner; there is a workload-shaped answer. To compress the matrix into a four-line heuristic:
- Single-vendor stack with short to medium-horizon agents — pick the first-party SDK matching your model commitment. OpenAI for Azure shops, ADK for Google Cloud, Anthropic for Bedrock-heavy estates.
- Long-horizon, durable, audit-heavy — LangGraph 1.0. The durable-state primitive is genuinely better than what the labs ship.
- Multi-agent debate or role-play — CrewAI for prototypes; LangGraph if it has to be auditable.
- High-throughput structured extraction — OpenAI Agents SDK or ADK. Skip the orchestration frameworks; the overhead per call is the killer.
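The four-line heuristic translates almost directly into a triage function. A sketch, assuming these three inputs are the only ones that matter (a real review will add more); `pick_sdk` and its parameter values are hypothetical names, and "aws" stands for a Bedrock-heavy estate.

```python
def pick_sdk(workload: str, cloud: str = "", auditable: bool = False) -> str:
    """Encode the four-line heuristic.

    workload: 'short' (single-vendor, short to medium horizon), 'long_horizon',
    'debate' or 'extraction'; cloud: 'azure', 'gcp' or 'aws'.
    """
    if workload == "extraction":
        return "OpenAI Agents SDK or Google ADK"
    if workload == "long_horizon":
        return "LangGraph 1.0"
    if workload == "debate":
        return "LangGraph 1.0" if auditable else "CrewAI"
    if workload == "short":
        by_cloud = {"azure": "OpenAI Agents SDK", "gcp": "Google ADK",
                    "aws": "Anthropic Agent SDK"}
        if cloud in by_cloud:
            return by_cloud[cloud]
        raise ValueError("the single-vendor rule needs cloud='azure', 'gcp' or 'aws'")
    raise ValueError(f"unknown workload: {workload!r}")


print(pick_sdk("short", cloud="gcp"))
print(pick_sdk("debate", auditable=True))
```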
Whichever you pick, write your domain code against your own interface, not the SDK's. The labs are shipping fast and they will keep changing primitives. Treat them as adapters, not as foundations, and the next sixty days of releases will not require another quarter of evaluations.
Primary sources for this piece: platform.openai.com/docs, google.github.io/adk-docs, docs.anthropic.com, langchain-ai.github.io/langgraph, and github.com/crewAIInc/crewAI.