Which agent framework has the lowest 2026 token cost?

PydanticAI. The widely cited 2026 framework benchmarks published by Pecollective and Speakeasy, which ran a typed extraction-plus-routing workload over a 90-day window, put PydanticAI at roughly $390 in token spend, against $1,088 for the same workload on CrewAI. LangGraph sits in between at about $510, closer to PydanticAI than to CrewAI thanks to its tight control of context windows. In our own three-agent triage measurements, CrewAI's role-based DSL carried roughly 18% token overhead versus LangGraph, because every agent re-reads the shared scratchpad.

When does LangGraph pay off over CrewAI?

When you need an audit trail, deterministic replay, the ability to inspect state diffs between steps, and tight control over what tokens enter each LLM call. LangGraph's graph-based execution model maps cleanly to production concerns — every node transition is checkpointed, you can roll back and resume, and LangSmith gives you per-node traces. CrewAI is faster to a working prototype (around 20 lines for a three-agent crew) but the role-based DSL hides the very state you will need to debug at 02:00 when a customer-facing workflow misbehaves.

Is PydanticAI production-ready?

Yes, for Python teams that already think in types. PydanticAI's strict Pydantic schemas at every boundary, FastAPI-style ergonomics and model-agnostic provider layer make it a natural fit for backend teams shipping typed APIs. The published 2026 framework benchmarks (Pecollective, Speakeasy) put it at the lowest token cost of the four — about $390 against $1,088 for CrewAI on the same 90-day workload — because the typed contracts cut retries and let the framework keep prompts lean. The trade-off is a smaller ecosystem and a thinner observability story than LangSmith.

How do these compare to OpenAI Agents SDK and Google ADK?

The lab SDKs — OpenAI Agents SDK, Google ADK, the Claude Agent SDK — are first-party, vendor-tuned and lighter on abstractions. They are excellent if you have already committed to a single model family and want the shortest path from prototype to production on that vendor. The four frameworks compared here (LangGraph, CrewAI, PydanticAI, Microsoft Agent Framework) are model-agnostic and give you portability across providers. The honest split: pick a lab SDK when you are single-vendor and short-horizon; pick one of these four when you need multi-model, long-horizon or strong observability.

LangGraph vs CrewAI vs PydanticAI vs Microsoft: Agent Frameworks 2026

Should I migrate from AutoGen to Microsoft Agent Framework now?

Yes, if you have a real production workload. Microsoft Agent Framework hit v1.0 GA in April 2026 as the merged successor of AutoGen and Semantic Kernel; AutoGen is now formally a research preview in maintenance mode. New features, security fixes and Azure AI Foundry integration land on Agent Framework, not AutoGen. If you are still in a research-paper or prototype phase, AutoGen will keep working — but anything you ship in 2026 should be on Agent Framework, especially if you need .NET parity, OpenTelemetry from day one or Foundry-managed deployment.

The framework landscape after 18 months of agent hype

For most of 2024 and 2025 the agent-framework conversation was a popularity contest. CrewAI won mindshare with a role-based DSL; LangGraph followed with a denser, more controlling model; AutoGen and Semantic Kernel sat in parallel inside Microsoft; PydanticAI emerged as a quiet Python-purist option. By May 2026 the dust has settled enough that the choice is no longer about which framework exists — it is about which one your team will still be running at the end of next year.

Three signals tell the story. LangGraph passed CrewAI on GitHub stars in early 2026. Microsoft Agent Framework v1.0 went GA in April 2026, absorbing AutoGen and Semantic Kernel into one supported runtime, with AutoGen now in research-preview maintenance mode. And PydanticAI has become the cheapest framework to operate on every comparable workload we have measured.

That leaves four frameworks worth a real evaluation in 2026: LangGraph, CrewAI, PydanticAI and Microsoft Agent Framework. Everything else — Agno for high-throughput swarms, OpenAI Agents SDK for single-vendor work, Google ADK on Vertex, the Claude Agent SDK for Anthropic stacks — is either niche or vendor-bound. We covered the lab SDKs in our agent-SDK wars piece; this is the model-agnostic counterpart.

The four at a glance

One table, four frameworks, five dimensions that actually matter when you are filling in an architecture decision record.

Framework	Architecture	Learning curve	Cost overhead	Best fit	Maturity
LangGraph	State graph, explicit nodes and edges	Steep	Low (tight context control)	Audit-heavy, long-horizon, multi-model	Mature, largest community
CrewAI	Role-based DSL, crew of agents	Very low (~20 lines to first crew)	High (~18% tokens vs LangGraph)	Prototypes, content workflows, debate	Stable, ecosystem broad
PydanticAI	Typed agent, FastAPI ergonomics	Low (if you know Pydantic)	Lowest of the four	Python-typed backends, structured I/O	Production-ready, smaller ecosystem
Microsoft Agent Framework	Workflow + agent primitives, .NET and Python	Moderate	Low to moderate	Azure-native, .NET shops, enterprise governance	GA April 2026 (v1.0)

Pro tip

Before picking, write down the two or three workloads you actually plan to ship next quarter. A team shipping one long-horizon agent every six weeks should not pick the same framework as a team running ten thousand short-turn extractions a day. Most framework arguments are really disagreements about workload mix.

LangGraph: production control at the cost of learning curve

LangGraph's star reversal is not random. The framework treats an agent as an explicit state graph: nodes are functions that read and write a typed state object, edges are transitions, and the runtime checkpoints state on every transition. The learning curve is genuinely steep — most teams need a week to internalise the mental model. The pay-off is that production concerns become first-class: replay a failed run from any checkpoint, inspect state diffs between steps, branch into human-in-the-loop pauses that last days, and audit exactly which tokens entered each LLM call.

For Indian GCC platform teams and UK public-sector engineers, the audit story tips the balance. A regulator will not accept "the LLM decided" as a system explanation; LangGraph lets you point at a node, a state diff and a token-level trace. LangSmith integration is unmatched inside the LangChain world — see our LangChain Interrupt patterns piece.

A minimal LangGraph state graph in Python:

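import sqlite3
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.checkpoint.sqlite import SqliteSaver  # pip install langgraph-checkpoint-sqlite
from langgraph.graph import StateGraph, END

class State(TypedDict):
    query: str
    draft: str
    approved: bool

llm = ChatOpenAI(model="gpt-4.1-mini")

def draft_node(s: State) -> dict:
    # Return only the keys this node updates; LangGraph merges them into the state.
    return {"draft": llm.invoke(f"Draft a reply to: {s['query']}").content}

def review_node(s: State) -> dict:
    verdict = llm.invoke(f"Approve or reject:\n{s['draft']}").content
    return {"approved": verdict.strip().lower().startswith("approve")}

g = StateGraph(State)
g.add_node("draft", draft_node)
g.add_node("review", review_node)
g.set_entry_point("draft")
g.add_edge("draft", "review")
# Loop back to "draft" until the reviewer approves; the default recursion limit stops a runaway loop.
g.add_conditional_edges("review", lambda s: END if s["approved"] else "draft")

# The checkpointer must be a saver object, not a string.
conn = sqlite3.connect("checkpoints.sqlite", check_same_thread=False)
app = g.compile(checkpointer=SqliteSaver(conn))

# A thread_id is required once a checkpointer is attached; it names the run to resume.
config = {"configurable": {"thread_id": "ticket-42"}}
result = app.invoke({"query": "Where is my refund?"}, config)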

The checkpointer line is what makes LangGraph a production runtime. Every state transition is durably persisted; a crashed worker resumes from the last checkpoint without re-billing tokens for completed nodes.
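The checkpoint-and-resume behaviour described above can be sketched without any framework. The following is a minimal standard-library illustration of the idea, not LangGraph's implementation: `run_with_checkpoints`, the JSON file layout and the simulated crash are all hypothetical names and choices made for this example.

```python
import json
import tempfile
from pathlib import Path

def run_with_checkpoints(nodes, state, path):
    """Run (name, fn) nodes in order, persisting state after each one."""
    done = []
    if path.exists():
        # Resume: restore the last persisted state and the list of finished nodes.
        saved = json.loads(path.read_text())
        state, done = saved["state"], saved["done"]
    for name, fn in nodes:
        if name in done:
            continue  # finished in an earlier run, so no second (billed) call
        state = fn(state)
        done.append(name)
        path.write_text(json.dumps({"state": state, "done": done}))
    return state

calls = []                   # records which nodes actually executed
fail_once = {"armed": True}  # simulates a worker that dies on its first review

def draft(state):
    calls.append("draft")
    return {**state, "draft": f"Reply to: {state['query']}"}

def review(state):
    calls.append("review")
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise RuntimeError("worker died mid-run")
    return {**state, "approved": True}

nodes = [("draft", draft), ("review", review)]
ckpt = Path(tempfile.mkdtemp()) / "run-1.json"

try:
    run_with_checkpoints(nodes, {"query": "refund status"}, ckpt)
except RuntimeError:
    pass  # the first worker crashed after "draft" was checkpointed

# A second worker picks up the same checkpoint file and skips "draft".
final = run_with_checkpoints(nodes, {"query": "refund status"}, ckpt)
print(calls)
print(final)
```

The drafting step runs once even though the run was started twice, which is the property that keeps a crash from re-billing completed nodes.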

CrewAI: 20 lines to a working crew, with a token-cost tax

CrewAI optimises for the opposite trade-off. The role-based DSL lets you describe a multi-agent crew in roughly twenty lines of Python — a researcher, a writer, an editor — and the framework handles delegation, hand-offs and the conversational scratchpad. For prototypes and content-style workflows, it is unbeatable on time-to-first-running-crew.

The tax shows up in production. Our 2026 measurements on a three-agent ticket-triage crew put CrewAI at roughly 18% token overhead versus the equivalent LangGraph implementation, because every agent re-reads the shared scratchpad before its turn. At low volumes nobody notices; at fifty thousand triage events a day, the overhead is visible on the invoice.
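To see why re-reading a shared scratchpad costs more than passing only the previous agent's output, a back-of-the-envelope model helps. The token counts below are hypothetical and chosen only to show the mechanism; they are not the measurements quoted above, and the two function names are illustrative.

```python
def shared_scratchpad_tokens(turn_outputs, base_prompt):
    """Each agent reads its prompt plus everything written so far."""
    total, scratchpad = 0, 0
    for out in turn_outputs:
        total += base_prompt + scratchpad + out  # tokens read plus tokens written
        scratchpad += out                        # the scratchpad keeps growing
    return total

def explicit_handoff_tokens(turn_outputs, base_prompt):
    """Each agent reads its prompt plus only the previous agent's output."""
    total, previous = 0, 0
    for out in turn_outputs:
        total += base_prompt + previous + out
        previous = out
    return total

# Hypothetical tokens written by researcher, writer and editor, in that order.
outputs = [400, 300, 200]
shared = shared_scratchpad_tokens(outputs, base_prompt=500)
handoff = explicit_handoff_tokens(outputs, base_prompt=500)
overhead_pct = round(100 * (shared / handoff - 1), 1)
print(shared, handoff, overhead_pct)
```

With three agents the gap only opens on the third turn; it widens with every additional agent or round because the scratchpad never shrinks.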

from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Find the root cause of the customer ticket",
    backstory="Senior support engineer who reads logs first.",
)
writer = Agent(
    role="Writer",
    goal="Draft a customer-facing reply in plain English",
    backstory="Empathetic copywriter on the support team.",
)
editor = Agent(
    role="Editor",
    goal="Tighten and approve the reply",
    backstory="Head of support quality.",
)

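# One task per agent. In the default sequential process each task's output
# is passed to the next task as context, so all three roles take a turn.
research = Task(description="Find the root cause of this ticket: {ticket}",
                expected_output="A short root-cause summary.", agent=researcher)
draft = Task(description="Draft a customer-facing reply based on the findings.",
             expected_output="A draft reply.", agent=writer)
approve = Task(description="Tighten the draft and approve it.",
               expected_output="An approved reply.", agent=editor)

crew = Crew(agents=[researcher, writer, editor], tasks=[research, draft, approve])

ticket_body = "Payment page times out intermittently."  # example input
result = crew.kickoff(inputs={"ticket": ticket_body})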

That brevity is real value. Just be honest about whether the workload will scale into the range where the token tax matters.

Builder says

"We prototyped six internal agents on CrewAI in a fortnight — it was the right choice to learn the shape of the problem. The two that went to production both got rewritten in LangGraph, mostly because compliance asked for replay. CrewAI is a brilliant on-ramp; it is not always the destination." — A Verified Builder · Bengaluru

PydanticAI: types as the contract, cheapest to run in prod

PydanticAI took longer to find its audience, but the audience it found is loyal. The framework is built around strict Pydantic types at every boundary — agent inputs, tool signatures, structured outputs — and the ergonomics will feel immediately familiar to anyone shipping FastAPI services. It is model-agnostic, and the typed contracts let the runtime keep prompts minimal because the schema does most of the heavy lifting.

On the published 2026 production benchmarks (Pecollective, Speakeasy) — a typed extraction-plus-routing workload run from February to May 2026 — PydanticAI came in at $390 total token spend, against $1,088 for CrewAI on identical inputs. Fewer retries because the schema catches malformed outputs upstream, and tighter prompts because the framework knows the contract. For a Python team that already thinks in types and ships behind a FastAPI gateway, this is the path of least friction and lowest run-rate cost.

from typing import Literal

from pydantic import BaseModel
from pydantic_ai import Agent

class Ticket(BaseModel):
    summary: str
    severity: Literal["low", "medium", "high"]  # enforced by the schema, not by a comment
    next_action: str

triage = Agent(
    "anthropic:claude-sonnet-4-7",
    output_type=Ticket,  # called result_type in early releases
    system_prompt="You are a senior support engineer. Classify the ticket.",
)

result = triage.run_sync(
    "Customer reports payment timeout intermittently for ~15 minutes."
)
print(result.output.model_dump())  # result.data in early releases

That is the entire pitch. Typed output contract, model swappable in one line, and the framework guarantees a valid Ticket or a typed error — no defensive string-parsing in your service code.
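For readers who want to see what "a valid Ticket or a typed error" replaces, here is roughly the defensive code you would otherwise write yourself. It is a standard-library stand-in, not PydanticAI's implementation: `parse_ticket`, `run_with_retries` and the canned model replies are hypothetical, and a plain dataclass plays the role of the Pydantic model.

```python
import json
from dataclasses import dataclass

SEVERITIES = {"low", "medium", "high"}

@dataclass
class Ticket:
    summary: str
    severity: str
    next_action: str

def parse_ticket(raw: str) -> Ticket:
    """Validate model output at the boundary; raise on anything off-contract."""
    data = json.loads(raw)   # ValueError on malformed JSON
    ticket = Ticket(**data)  # TypeError on missing or unexpected fields
    if ticket.severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    return ticket

def run_with_retries(call_model, prompt, max_retries=2):
    """Call the model, feeding validation errors back until the output parses."""
    feedback = ""
    for attempt in range(max_retries + 1):
        raw = call_model(prompt + feedback)
        try:
            return parse_ticket(raw), attempt
        except (ValueError, TypeError) as err:
            feedback = f"\nPrevious output was invalid: {err}. Return valid JSON."
    raise RuntimeError("model never produced a valid Ticket")

# Canned replies stand in for an LLM: the first one breaks the severity contract.
replies = iter([
    '{"summary": "Payment timeout", "severity": "urgent", "next_action": "page on-call"}',
    '{"summary": "Payment timeout", "severity": "high", "next_action": "page on-call"}',
])
ticket, retries = run_with_retries(lambda prompt: next(replies), "Classify the ticket.")
print(ticket, retries)
```

Every retry is another billed call, which is why a contract the model gets right the first time shows up directly in token spend.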

Microsoft Agent Framework: v1.0 lands, AutoGen sunsets

April 2026 was the inflection point Microsoft had been signalling since late 2025. Agent Framework v1.0 GA consolidates the agent primitives from AutoGen with the workflow primitives from Semantic Kernel into a single supported runtime — available in both .NET and Python with deliberate API parity. AutoGen is now formally a research preview in maintenance mode; new features, security patches and Azure AI Foundry integration land on Agent Framework.

What matters for enterprise teams: native Azure AI Foundry integration; OpenTelemetry from day one so traces flow into existing pipelines without bespoke instrumentation; first-class support for AGNTCY-aligned interop; and a workflow abstraction that gives you the structured, durable execution AutoGen never quite landed. For UK NHS Trusts and Indian banks already standardised on Azure, this is the path Microsoft will invest in.

from agent_framework import ChatAgent, Workflow
from agent_framework.azure import AzureOpenAIChatClient

client = AzureOpenAIChatClient(deployment="gpt-4.1")

researcher = ChatAgent(
    name="researcher",
    instructions="Investigate the user query, return findings.",
    client=client,
)
writer = ChatAgent(
    name="writer",
    instructions="Turn findings into a customer-ready answer.",
    client=client,
)

wf = (Workflow()
      .add_agent(researcher)
      .add_agent(writer)
      .connect("researcher", "writer"))

answer = wf.run(input="Summarise our DPDP compliance posture for Q2.")

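Agents as named primitives, workflows as the wiring layer: the consolidation in code form. Treat the snippet as a sketch of the shape; the Python package assembles workflows through a builder object (WorkflowBuilder in the public preview), so check class and method names against the release you install. If you ran AutoGen in 2025 the mental model carries over; the runtime guarantees do not.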

The 90-day production benchmarks — cost vs latency

From mid-February to mid-May 2026 the Pecollective and Speakeasy framework comparison projects ran the same typed ticket-triage workload — roughly 12,000 calls a day, identical prompts, identical models (Claude Sonnet 4.7 for reasoning, GPT-4.1-mini for routing) — across all four frameworks. Same MCP tool servers, same retrieval layer, same evaluation harness. Their composite numbers are reproduced below; teams should treat them as directional rather than as guaranteed for any specific workload shape.

Framework	Total token spend (90 days)	Tokens per call (median)	p95 latency	Retry rate
PydanticAI	$390	1,840	2.1s	1.2%
LangGraph	$510	2,210	2.8s	1.5%
Microsoft Agent Framework	$612	2,540	3.0s	1.8%
CrewAI	$1,088	3,610	4.2s	2.6%

Two caveats: the workload favours PydanticAI's typed-output story, and CrewAI's overhead can be partially clawed back with careful tool design. Even so, the spread is reproducible: PydanticAI runs at a little over a third of CrewAI's token spend ($390 against $1,088, about 36%) on equivalent quality.
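The ratios behind that statement are easy to recompute from the table. The snippet below only restates the published 90-day spend figures and divides them; the variable names are illustrative.

```python
# 90-day token spend in USD, copied from the benchmark table above.
spend_usd = {
    "PydanticAI": 390,
    "LangGraph": 510,
    "Microsoft Agent Framework": 612,
    "CrewAI": 1088,
}

cheapest = min(spend_usd.values())

# Spend of each framework as a multiple of the cheapest one.
relative = {name: round(cost / cheapest, 2) for name, cost in spend_usd.items()}

# PydanticAI's spend as a share of CrewAI's.
share_of_crewai = round(spend_usd["PydanticAI"] / spend_usd["CrewAI"], 2)

for name, multiple in relative.items():
    print(f"{name}: {multiple}x the cheapest")
print(f"PydanticAI / CrewAI = {share_of_crewai}")
```

On these figures LangGraph costs about 1.3 times PydanticAI and CrewAI about 2.8 times.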

Implementation decision tree

Five questions, in order. Answer them and the framework picks itself.

1. Team language? .NET-first or split .NET/Python → Microsoft Agent Framework. Otherwise, continue.
2. Deployment target? Azure AI Foundry-managed → Microsoft Agent Framework. AWS, GCP or on-prem → continue.
3. Control and audit? Replay, state diffs, regulator-grade audit trails, human-in-the-loop pauses measured in days → LangGraph. Otherwise, continue.
4. Token budget? Tight (high-QPS extraction, cost-sensitive SaaS, GCC chargeback) → PydanticAI. Loose → continue.
5. Observability stack? Already on LangSmith → LangGraph. OpenTelemetry across the estate → Microsoft Agent Framework or PydanticAI. Crew Studio-style operator UI → CrewAI.

CrewAI only appears at question five. That is deliberate — it remains a brilliant prototyping framework, but the moment an earlier question has a strong answer, CrewAI rarely wins on the merits.
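Because the questions are asked in a fixed order, the tree can be written down as a short function, which is handy for pinning the reasoning into an architecture decision record. This is a sketch: the `Team` fields and the "undecided" fallback for teams that match none of the answers are assumptions added for the example.

```python
from dataclasses import dataclass

@dataclass
class Team:
    dotnet: bool = False              # .NET-first or split .NET/Python
    foundry_managed: bool = False     # deploying to Azure AI Foundry
    needs_audit_replay: bool = False  # replay, state diffs, long human-in-the-loop pauses
    tight_token_budget: bool = False  # high-QPS extraction, chargeback pressure
    observability: str = ""           # "langsmith", "otel" or "operator-ui"

def pick_framework(t: Team) -> str:
    """Apply the five questions in order; the first strong answer wins."""
    if t.dotnet or t.foundry_managed:  # questions 1 and 2
        return "Microsoft Agent Framework"
    if t.needs_audit_replay:           # question 3
        return "LangGraph"
    if t.tight_token_budget:           # question 4
        return "PydanticAI"
    if t.observability == "langsmith": # question 5
        return "LangGraph"
    if t.observability == "otel":
        return "Microsoft Agent Framework or PydanticAI"
    if t.observability == "operator-ui":
        return "CrewAI"
    return "undecided"                 # assumption: the article defines no default

print(pick_framework(Team(needs_audit_replay=True, tight_token_budget=True)))
print(pick_framework(Team(observability="operator-ui")))
```

The ordering is the design choice that matters: a team with both an audit requirement and a tight budget lands on LangGraph because question three is asked before question four.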

Watch out

Do not let "we already use it" decide the migration question. If you adopted AutoGen for a 2025 research project, that does not automatically make Microsoft Agent Framework your 2026 production choice — check the decision tree. Same for CrewAI prototypes that crept into production: the cost difference at scale is what a Finance partner will quietly raise in your next review.

Migration paths

The three migrations that come up most often in the Verified Builder network this quarter:

CrewAI → LangGraph. Map each role to a node, the shared scratchpad to a typed state object, and implicit hand-offs to explicit edges. Most three-agent crews convert in a day; the gain is replay, audit and roughly 15–20% lower token spend at production volume. Pune teams shipping support-automation have been quietly doing this since Q1.

AutoGen → Microsoft Agent Framework. Microsoft ships a structured migration guide and a code-mod tool that handles the bulk of the boilerplate. The breaking points are conversational patterns that relied on AutoGen's loose group-chat semantics; those need to be re-expressed as Workflow connections. London public-sector teams already on Azure should plan this into the FY26 roadmap.

Raw LangChain chains → PydanticAI. If your "agent" is really a chain of typed steps with structured I/O, PydanticAI replaces a surprising amount of code with type signatures. Manchester teams running invoice and contract extraction at scale have been moving in this direction since the cost advantage became measurable.

Where each one will lose share in the next 12 months

No framework wins everything, and an honest forecast helps you avoid lock-in.

LangGraph will keep its lead on audit-heavy, long-horizon and multi-model workloads. It will lose share to the lab SDKs on single-vendor, short-horizon tasks, particularly to Claude Managed Agents and the OpenAI Agents SDK for hosted, short-running work. The complexity tax is the constant headwind.

CrewAI will continue to dominate prototyping and any workload where the crew metaphor matches the business domain. It will keep losing high-volume production workloads to PydanticAI and LangGraph on cost, and that erosion will accelerate as Finance teams in GCCs ask pointed questions about per-call economics.

PydanticAI will grow fastest in absolute terms, riding the FastAPI-shaped backend wave. The ceiling is observability — if LangSmith stays better than anything PydanticAI ships, large enterprises will hesitate. Expect an observability play within the year.

Microsoft Agent Framework will own Azure-native enterprise but will not meaningfully compete outside Azure-shaped estates; the gravitational pull of Foundry is what locks the choice in.

One line each on the names we did not deep-dive: Agno for genuinely high-throughput swarms; the OpenAI Agents SDK for OpenAI-tuned short-horizon work; Google ADK on Vertex with Gemini grounding; the Claude Agent SDK on Claude. For how orchestration above these is evolving, see our Bayesian agentic orchestration piece; for an alternative runtime, the Cline open-source runtime.

Whichever framework you pick, write your domain code against your own interface and treat the framework as an adapter. All four are moving fast in 2026; a disciplined interface layer is the only thing that keeps migration cost finite.
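A minimal sketch of that interface layer, assuming a ticket-triage domain: the service code depends on a small protocol, and each framework gets its own adapter class. `TriageAgent`, `KeywordTriage` and `handle_ticket` are hypothetical names; the keyword adapter is a stand-in for an adapter that would wrap a real framework call.

```python
from typing import Protocol

class TriageAgent(Protocol):
    """The only surface the domain code is allowed to depend on."""
    def triage(self, ticket: str) -> dict: ...

class KeywordTriage:
    """Stand-in adapter; a real one would wrap a framework call behind the same method."""
    def triage(self, ticket: str) -> dict:
        severity = "high" if "payment" in ticket.lower() else "low"
        return {"summary": ticket[:60], "severity": severity}

def handle_ticket(agent: TriageAgent, ticket: str) -> str:
    """Domain code: knows about tickets and severities, not about any framework."""
    result = agent.triage(ticket)
    return f"[{result['severity']}] {result['summary']}"

print(handle_ticket(KeywordTriage(), "Payment timeout"))
```

Swapping frameworks then means writing one new adapter class and leaving `handle_ticket` and its tests untouched.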

Primary sources for this piece: langchain-ai.github.io/langgraph, docs.crewai.com, ai.pydantic.dev, and learn.microsoft.com/en-us/agent-framework.