Why does Mistral keep building dense models when the industry has moved to MoE?

Dense models are easier to serve at low concurrency, simpler to fine-tune, and have predictable latency on smaller GPU pools. For self-hosted regulated workloads on four to eight GPUs, dense architecture is often cheaper than MoE in practice — even if MoE wins on per-token cost at hyperscale.

How does Medium 3.5 compare to Devstral 2 for coding tasks?

Medium 3.5 replaces Devstral 2 in Mistral's own Vibe CLI and consolidates the coding specialist's strengths into a unified model. For most builders, Medium 3.5 is now the default; Devstral 2 retains an edge only in narrow self-hosted footprints where its smaller size matters.

Can Indian banks and UK fintechs use this under data localisation rules?

La Plateforme is EU-hosted with a DPA on offer, which satisfies most UK/EU residency requirements. For Indian RBI-regulated workloads, the open weights make on-prem deployment in Mumbai or Hyderabad regions practical, which closes the localisation gap that frontier US-hosted APIs cannot.

Should I migrate from GPT-5.5 to Medium 3.5 today?

If you are residency-constrained, price-sensitive, or want self-host optionality — yes, pilot it now. If you need the absolute top SWE-Bench Verified number, GPT-5.5 (88.7%) and Opus 4.7 (87.6%) remain ahead. Most builders should run a split-routing pilot before committing.

Mistral Medium 3.5: 77.6% SWE-Bench on a 128B dense model

Is Mistral Medium 3.5 open-source or open-weight?

Open-weight, not open-source. The 128B weights are published on Hugging Face under a modified MIT licence, which permits self-hosting and most commercial use, but training data and the full training recipe are not released.

What changed on 29 April

Mistral AI released Mistral Medium 3.5 on 29 April 2026 alongside its remote-agent platform Vibe. Three things matter for builders shipping into Indian and British regulated environments:

Dense, not MoE. All 128 billion parameters fire on every token. Mistral is the last frontier lab still betting on dense architecture at this scale, and the bet has practical consequences for self-hosted serving economics.
77.6% on SWE-Bench Verified. That is 10.0 points behind Claude Opus 4.7 (87.6%) and 11.1 points behind GPT-5.5 (88.7%) on the canonical agentic-coding benchmark, but at a fraction of the price, and with weights you can move into your own data centre.
EU residency by default. La Plateforme is hosted in Paris with a Data Processing Agreement on offer. For UK NHS suppliers and FCA-regulated fintechs, that detail can be the entire decision. For Indian banks staring down RBI's data-localisation mandate, EU hosting does not settle the question, and the open weights play the same role by making in-country deployment possible.

Pro tip

Medium 3.5 also consolidates three previously separate Mistral models — Medium 3.1 (instruction-following), Magistral (reasoning), and Devstral 2 (coding) — into a single set of weights. If you have been routing between Mistral checkpoints for different task types, you can collapse that routing layer to one endpoint and recover the latency you lost to it.

The benchmark picture

Headline numbers without context mislead, so we have laid the four serious frontier coders side-by-side. The SWE-Bench numbers below are the Verified split — the human-curated subset of 500 GitHub issues — not Pro and not Lite. Pricing is per million tokens (in / out) at the default tier on each provider's own API as of 7 May 2026.

Model	SWE-Bench Verified	MMLU	HumanEval	Price /MTok (in / out)	Residency default	Open weights
Mistral Medium 3.5	77.6%	87.4%	92.1%	$1.50 / $7.50	EU (Paris)	Yes (modified MIT)
Mistral Devstral 2	74.2%	83.0%	90.4%	$0.80 / $4.00	EU (Paris)	Yes (Apache 2.0)
Claude Opus 4.7	87.6%	91.2%	94.6%	$5.00 / $25.00	US	No
GPT-5.5	88.7%	92.0%	95.1%	$5.00 / $30.00	US	No

Two patterns leap out. First, Mistral has closed the agentic-coding gap meaningfully: the 10-point Verified deficit to Opus 4.7 is real, but it is the smallest gap a sovereign European model has ever held against the frontier. Second, on price per output token, Medium 3.5 costs a quarter of what GPT-5.5 does ($7.50 versus $30.00), a 75% saving; on input tokens the saving is 70% ($1.50 versus $5.00). The original brief flagged "~40% cheaper"; the actual gap is wider, and that should change procurement maths in both London and Mumbai.
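The price comparison is simple enough to check by hand, but a few lines of Python make it repeatable when prices change. This is a minimal sketch that uses only the figures from the table above; the function name `saving_vs` is our own and not part of any provider's tooling.

```python
# Prices per million tokens as (input, output) in USD, copied from the table above.
PRICES = {
    "Mistral Medium 3.5": (1.50, 7.50),
    "Mistral Devstral 2": (0.80, 4.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def saving_vs(model: str, baseline: str) -> tuple[float, float]:
    """Fractional saving of `model` against `baseline` for (input, output) tokens."""
    m_in, m_out = PRICES[model]
    b_in, b_out = PRICES[baseline]
    return (1 - m_in / b_in, 1 - m_out / b_out)


in_saving, out_saving = saving_vs("Mistral Medium 3.5", "GPT-5.5")
print(f"input: {in_saving:.0%} cheaper, output: {out_saving:.0%} cheaper")
```

List prices ignore caching discounts, batch tiers and negotiated rates, so treat the result as a starting point for procurement rather than a final number.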

For the methodology subtlety — why Verified, Pro, and Lite splits give very different rankings — see our explainer on the SWE-Bench Verified vs Pro benchmark gap. A model that looks dominant on Verified can collapse on Pro, and vice versa.

Dense vs MoE: the serving-economics question

The headline architectural choice — staying dense while every other frontier lab has moved to mixture-of-experts — is not nostalgia. It is a deliberate bet on the serving cost curve at the scale most enterprises actually run inference, which is rarely hyperscale.

What MoE gives you (and what it costs)

A typical MoE design routes each token through, say, 8 experts out of 128, activating perhaps 30B parameters out of a 400B total. At hyperscale concurrency — thousands of simultaneous requests — the cost per token plummets because most weights stay idle on most calls. But MoE has serving overheads:

Memory still scales with the total parameter count, not the activated count. A 400B-total MoE needs roughly the same VRAM as a 400B dense model.
Routing variance hurts batching efficiency at low concurrency. If your shop runs 20–50 concurrent requests rather than 20,000, MoE loses much of its advantage.
Fine-tuning gets harder — expert collapse is a real failure mode that needs careful regularisation.

What dense gives you

Dense Medium 3.5 fits in roughly 256 GB of GPU memory at FP8 (about 128 GB for the weights at one byte per parameter, with the remainder left for KV cache and runtime overhead): four H200s, or eight L40S, or a single B200 node with headroom. Latency is predictable because every token traverses the same compute path. Fine-tuning is "boring" in the best sense: standard LoRA recipes work without expert-balancing tricks.
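To sanity-check a footprint like this for your own hardware, the arithmetic can be sketched as below. The per-GPU memory figures are nominal values we are assuming for illustration (confirm them against the vendor datasheets), and the 256 GB budget is the figure quoted above, not a measured one.

```python
import math


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (taking 1 GB as 1e9 bytes)."""
    return params_billion * bytes_per_param


def gpus_needed(budget_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory covers the budget."""
    return math.ceil(budget_gb / gpu_memory_gb)


# Assumed nominal memory per GPU in GB; check the datasheet for your exact SKU.
GPU_MEMORY_GB = {"H200": 141, "L40S": 48, "B200": 192}

weights_fp8 = weight_memory_gb(128, 1.0)  # 128B parameters at one byte each
budget_gb = 256.0  # serving budget quoted in the article (weights plus headroom)

print(f"FP8 weights alone: {weights_fp8:.0f} GB")
for name, mem in GPU_MEMORY_GB.items():
    print(f"{name}: at least {gpus_needed(budget_gb, mem)} GPU(s) for {budget_gb:.0f} GB")
```

Under these assumptions the minimum counts come out lower than the four H200s and eight L40S quoted above. The larger configurations presumably leave room for longer contexts, bigger batches and even tensor-parallel splits, so size against your own context length and concurrency rather than the bare minimum.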

Our reading of the B300 inference economics — the new generation of NVIDIA accelerators that began shipping in volume this quarter — is that the dense advantage holds for most regulated-workload deployments. If you are not Microsoft Azure or Yotta running tens of thousands of concurrent sessions, dense is usually the right call.

Watch out

Do not confuse "active parameters" with "served parameters" when sizing your MoE alternatives. A 400B-total / 30B-active MoE still needs the full 400B weights resident in GPU memory. We have seen procurement decks model the active count as the memory footprint, which produces a hardware spec roughly an order of magnitude too small.
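The size of that sizing error is easy to quantify. The sketch below assumes FP8 weights (one byte per parameter) and counts weights only, ignoring KV cache; the function name and the returned keys are ours, chosen for illustration.

```python
def moe_sizing_error(total_params_b: float, active_params_b: float,
                     bytes_per_param: float = 1.0) -> dict:
    """Compare the memory an MoE needs with the figure a deck gets from the active count."""
    needed_gb = total_params_b * bytes_per_param      # all experts must be resident
    mis_spec_gb = active_params_b * bytes_per_param   # the mistaken estimate
    return {
        "needed_gb": needed_gb,
        "mis_specified_gb": mis_spec_gb,
        "shortfall_factor": needed_gb / mis_spec_gb,
    }


result = moe_sizing_error(total_params_b=400, active_params_b=30)
print(f"needs {result['needed_gb']:.0f} GB, deck says {result['mis_specified_gb']:.0f} GB, "
      f"short by {result['shortfall_factor']:.1f}x")
```

For the 400B-total / 30B-active example, the shortfall factor is a little over 13, which is where the "order of magnitude" warning comes from.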

The UK builder angle: NHS, FCA and procurement

British builders shipping into regulated procurement face a structural problem with US-hosted frontier APIs. NHS Digital's data-residency clauses, the FCA's operational-resilience expectations, and the broader UK Frontier AI Bill draft all push towards EU- or UK-domiciled processing. Until 29 April, the only credible "frontier and EU-hosted" options in coding were Mistral's specialist Devstral 2 and the smaller open-weight families — not the unified flagship most teams actually want.

Medium 3.5 closes that gap. La Plateforme's Paris hosting — paired with a signed DPA — clears the residency hurdle for most NHS Trust suppliers and FCA-regulated fintechs we have spoken to. For higher-sensitivity workloads where even EU egress is contested, the open weights mean a UK-domiciled deployment on G-Cloud-listed infrastructure is a procurement-friendly conversation rather than an API-vendor sales cycle.

One Cambridge-based fintech we follow has been piloting GPT-5.5 for internal developer tooling but has refused to ship customer-data-touching agents on it for residency reasons. Medium 3.5's combination of "good enough on Verified" and "weights we can move" is, on paper, exactly their unblock.

The India builder angle: BFSI, RBI and the on-prem option

The Reserve Bank of India's data-localisation framework — most acute for payment systems but pervasive across BFSI — has had a quiet but expensive effect on Indian banks' AI roadmaps. Several private-bank pilots we know are running OpenAI through approved deployment patterns, but the legal effort to keep those approvals current is non-trivial, and a board-level review can pause a programme for a quarter.

Medium 3.5's open weights collapse that risk surface. A Mumbai or Hyderabad on-prem deployment is technically straightforward — the model fits comfortably on a single B200 node — and answers the localisation question definitively. For a BFSI customer who has already spent eight figures on data-centre capacity, the marginal cost of hosting a coding model alongside their core-banking workloads is small.

The trade-off is a roughly 10-point SWE-Bench Verified gap to Opus 4.7. Whether that matters depends on your task profile. For routine refactors, internal tool generation, and code review, our experience is that the gap narrows substantially in practice — Verified is not a perfect proxy for daily work. For green-field architecture or genuinely novel problem-solving, Opus 4.7 still earns its keep.

When to pick what — a decision matrix

The honest framing is that no single model wins outright. Here is how we are advising builders to route their workloads as of this week:

Scenario	Pick	Why
NHS Trust supplier, customer-data-touching agent	Medium 3.5 (La Plateforme)	EU residency + DPA while keeping the Verified score above the 75% practical floor.
RBI-regulated bank, internal developer tooling	Medium 3.5 (self-host on B200)	Open weights answer localisation; coding quality is sufficient for refactor and review tasks.
UK SaaS startup, no residency constraint, top-quality agent	Opus 4.7	87.6% Verified plus the 1M-context window we covered in our launch piece is hard to beat.
Indian product company, cost-sensitive, mixed workload	Medium 3.5 + GPT-5.5 split	Route 80% to Medium 3.5 for routine work, escalate the 20% hardest queries to GPT-5.5.
Self-host coding specialist on tight VRAM budget	Devstral 2	Smaller footprint when you cannot stretch to a B200 node and only need the coding head.
Edge deployment, fully air-gapped	Medium 3.5 (open weights)	Modified MIT licence is the most permissive credible-quality option in the open-weight tier.
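For the split-routing row, the blended list price is worth working out before the pilot. The sketch below assumes that token volume divides in the same 80/20 proportion as requests, which is a simplification: the hardest queries are often the longest, so the real blend may sit closer to the GPT-5.5 price. Prices are the per-million-token figures from the benchmark table.

```python
# (input, output) USD per million tokens, from the benchmark table.
MEDIUM_35 = (1.50, 7.50)
GPT_55 = (5.00, 30.00)


def blended_price(share_primary: float, primary: tuple[float, float],
                  fallback: tuple[float, float]) -> tuple[float, float]:
    """Volume-weighted (input, output) price when `share_primary` of tokens go to `primary`."""
    share_fallback = 1 - share_primary
    return tuple(share_primary * p + share_fallback * f
                 for p, f in zip(primary, fallback))


blended_in, blended_out = blended_price(0.80, MEDIUM_35, GPT_55)
print(f"blended: ${blended_in:.2f} in / ${blended_out:.2f} out per MTok")
print(f"output saving vs all-GPT-5.5: {1 - blended_out / GPT_55[1]:.0%}")
```

Swapping in your own measured token split, rather than the request split, is the first refinement to make once the pilot produces logs.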

Open-weight, not open-source — say it precisely

One vocabulary point that matters in regulated procurement. Mistral has released Medium 3.5 as open-weight, not open-source. The 128B weights are downloadable from Hugging Face under a modified MIT licence, which permits self-hosting and most commercial use. But the training data, the curation pipeline, and the fine-tuning recipe are not released. A truly open-source model, in the OSI sense, would publish all of these.

This distinction matters for two reasons. First, in compliance reviews, particularly under the EU AI Act's general-purpose-AI obligations (which began applying in August 2025, with the Commission's enforcement powers following in August 2026), the operator of an open-weight model still inherits transparency obligations the original lab has only partly addressed. Second, "open-source" in marketing copy invites scrutiny that "open-weight" does not. Use the precise term in your security and procurement filings.

For the broader landscape, our April open-weight roundup covers the GLM, Llama and Mistral releases of the past month — Medium 3.5 sits at the top of that pile by capability, but it is not alone in the category.

What we would do this week

If we were running an AI platform team in either market, the next two weeks would look like this:

Stand up a La Plateforme account and run your existing internal-coding-agent eval suite against Medium 3.5. The objective is a Verified-equivalent number on your tasks, not the leaderboard's.
Cost-model the dense self-host for your concurrency profile. Spec out a four-H200 or single-B200 footprint, then compare the all-in monthly cost against your current GPT-5.5 spend.
Tag the workloads that are residency-blocked. Anything currently routed to a US-hosted API "with caveats" is a candidate for a hard switch.
Pilot a split-routing layer. Send everything to Medium 3.5 by default; escalate to GPT-5.5 or Opus 4.7 on a confidence-score threshold. Most teams find the escalation rate is below 15%.
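The split-routing step can be prototyped in a few dozen lines before you wire in real clients. What follows is a sketch under stated assumptions, not a reference implementation: `Completion`, `SplitRouter`, the 0.7 threshold and the stub model functions are all hypothetical, and how you derive a confidence score (a self-check prompt, a test run, a verifier model) is left open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Completion:
    text: str
    confidence: float  # 0.0 to 1.0; the scoring method is up to you


ModelFn = Callable[[str], Completion]


@dataclass
class SplitRouter:
    primary: ModelFn          # e.g. a wrapper around your Medium 3.5 client
    fallback: ModelFn         # e.g. a wrapper around GPT-5.5 or Opus 4.7
    threshold: float = 0.7    # hypothetical starting point; tune on your eval suite
    total: int = 0
    escalated: int = 0

    def complete(self, prompt: str) -> Completion:
        """Try the primary model first; escalate when its confidence is below threshold."""
        self.total += 1
        result = self.primary(prompt)
        if result.confidence >= self.threshold:
            return result
        self.escalated += 1
        return self.fallback(prompt)

    @property
    def escalation_rate(self) -> float:
        return self.escalated / self.total if self.total else 0.0


# Stub models so the sketch runs without network access (toy heuristic only).
def stub_primary(prompt: str) -> Completion:
    return Completion(f"primary: {prompt}", 0.9 if len(prompt) < 40 else 0.4)


def stub_fallback(prompt: str) -> Completion:
    return Completion(f"fallback: {prompt}", 1.0)


router = SplitRouter(primary=stub_primary, fallback=stub_fallback)
for p in ["rename this variable",
          "design a multi-region ledger reconciliation service from scratch"]:
    print(router.complete(p).text)
print(f"escalation rate: {router.escalation_rate:.0%}")
```

Tracking the escalation rate from day one gives you the number to compare against the 15% figure above, and tells you whether the threshold or the confidence signal needs work.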

Source links for verification: Mistral's launch post is on the Mistral AI blog, the model card and weights live on Hugging Face, and the leaderboard numbers are tracked at llm-stats.com and corroborated by codersera.com.

For builders who want a deeper view of how Mistral's coding-specialist lineage fed into this release, our earlier Devstral 2 deep-dive remains the best context — Medium 3.5 inherits much of that work.