What changed this month
- DeepSeek V4 Pro is the top open-weight model on the aggregate Artificial Analysis Intelligence Index, with a score of around 52 (May 2026). That is the broad cross-task index, not a single coding benchmark.
- The open-weight field has filled out fast — Qwen 3.6, Kimi K2.6, GLM-5.1, Gemma 4, MiniMax-M2.7 and Ring-2.6-1T all shipped inside roughly six weeks.
- The capability gap to closed frontier models is now about six to nine months — narrow enough that, for many workloads, cost and control decide the call rather than raw capability.
- Licence matters as much as the score — DeepSeek ships MIT, Qwen and Gemma ship Apache-2.0, and Meta keeps the Llama community licence. The permissive camp has effectively won the developer mindshare race.
If you only want the benchmark depth or the self-hosting maths, we have already written those. This piece is the level above: the aggregate leaderboard picture, and the decision you actually face when a free, openly licensed model is genuinely competitive with what you are paying an API for.
Treat the Intelligence Index as a shortlisting tool, not a verdict. A score of ~52 tells you DeepSeek V4 Pro belongs in your evaluation set; it does not tell you it will win on your tickets. Always run your own task-representative evals before you switch a production route.
The May 2026 open-weight leaderboard, in one table
Here are the leading open-weight models as the field stands this month. We have deliberately not invented per-model index numbers — only the figures we can stand behind appear here. Use this to scope a shortlist, then benchmark on your own workload.
| Model | Lab | Licence | Context | Notable strength |
|---|---|---|---|---|
| DeepSeek V4 Pro | DeepSeek | MIT | 1M tokens | #1 open-weight on the Artificial Analysis Intelligence Index (~52); 1.6T-total / 49B-active MoE |
| DeepSeek V4 Flash | DeepSeek | MIT | 1M tokens | 284B-total / 13B-active MoE; the cheaper-to-serve sibling for high-volume routes |
| Qwen 3.6 | Alibaba | Apache-2.0 | Long context | Strong coding line; Qwen3.6-27B reports ~77.2% on SWE-Bench Verified |
| Kimi K2.6 | Moonshot | Open weight | Long context | Competitive agentic and long-document reasoning |
| GLM-5.1 | Z.ai | Open weight | Long context | Well-rounded general assistant performance |
| Gemma 4 | Google | Apache-2.0 | Long context | Permissively licensed; strong on smaller, edge-friendly footprints |
| MiniMax-M2.7 | MiniMax | Open weight | Long context | Efficiency-focused general model |
| Ring-2.6-1T | inclusionAI | Open weight | Long context | Trillion-scale entrant pushing the open frontier upward |
DeepSeek V4 Pro and V4 Flash were both released on 24 April 2026, both under MIT and both with a 1M-token context window. The wider point is the pace: a frontier-adjacent open model now lands every few weeks, and three of the most-used families (DeepSeek, Qwen and Gemma) carry the two most permissive licences in common use. For builders, that combination of speed and permissiveness is the real story behind the leaderboard.
Note the two-tier shape DeepSeek itself has adopted. V4 Pro is the capability play — a 1.6T-total mixture-of-experts with 49B parameters active per token, which is what earns the top index slot but also demands the heaviest serving footprint. V4 Flash, at 284B-total and 13B-active, is the volume play: materially cheaper to run, fast enough for interactive workloads, and still openly licensed. Most teams that adopt open weights end up running both — Pro for the hard tickets, Flash for the long tail — rather than betting everything on a single model. That same Pro-plus-Flash pattern now recurs across the field, and it is the closest thing the open ecosystem has to the "frontier model plus mini model" tiering that the closed labs popularised.
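To make the Pro-versus-Flash trade concrete, here is a minimal sketch that works out what share of each model's weights is active per token, using only the parameter counts quoted above. The function name `active_fraction` is ours, purely for illustration. Keep in mind that in a mixture-of-experts model all weights still have to be held in memory, so the active share is a rough guide to per-token compute, not to serving footprint.

```python
# Parameter counts as quoted in this article, in billions: (total, active per token).
MODELS = {
    "DeepSeek V4 Pro": (1600, 49),
    "DeepSeek V4 Flash": (284, 13),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used for each token."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {share:.1%} of weights active per token")
```

On these figures Pro activates roughly 3.1% of its weights per token and Flash roughly 4.6%. Pro carries about 5.6 times the total weights of Flash but only about 3.8 times the active parameters, which is why the memory bill grows faster than the per-token compute bill when you step up to Pro.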
When open weights beat a closed frontier model
The honest answer is "it depends on what you are optimising for". Capability alone rarely settles it now that the gap has closed to six to nine months. Here are the four situations where an open-weight #1 like DeepSeek V4 Pro is the better call for an Indian or UK team.
1. Data residency and sovereignty are hard requirements
If your data cannot leave a jurisdiction, a model you host inside it removes the question entirely. Under India's Digital Personal Data Protection Act, Data Fiduciaries carry real accountability for where personal data flows and who processes it; running an open-weight model on infrastructure you control keeps that boundary clean. UK teams handling regulated or public-sector data face the same pressure under UK GDPR and sector rules: self-hosting an MIT-licensed model in a UK region sidesteps the cross-border processing analysis that a closed API forces you into. With a closed frontier model you are trusting a contractual data-handling commitment; with an open-weight model hosted on your own infrastructure, the data never leaves it.
2. Cost control at volume on subsidised compute
Per-token API pricing is convenient until your volume makes it the largest line in your bill. Self-hosting converts a variable per-call cost into a fixed compute cost, and that maths can flip hard in open weights' favour once utilisation is high. The economics are sharper still where compute is subsidised: India's IndiaAI Mission is putting tens of thousands of GPUs into the market at heavily discounted hourly rates, and the UK's AI Research Resource (AIRR) gives accredited teams access to national accelerator capacity. We have run the self-hosting break-even in detail in a separate piece — see our 8×H100 break-even analysis and the IndiaAI subsidised-GPU breakdown for the numbers.
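The fixed-versus-variable trade described above reduces to a one-line break-even. The sketch below is a simplification under stated assumptions: a flat monthly self-hosting cost (compute plus engineering time), a flat API price per million tokens, and an optional marginal self-hosting cost per million tokens. The function name and every number in the example are placeholders, not real prices; substitute your own quotes.

```python
import math


def breakeven_million_tokens(
    fixed_monthly_cost: float,
    api_price_per_m_tokens: float,
    selfhost_marginal_per_m_tokens: float = 0.0,
) -> float:
    """Monthly volume (in millions of tokens) above which self-hosting is cheaper.

    Returns math.inf when the API is never more expensive per token than the
    marginal self-hosting cost, i.e. self-hosting can never pay back.
    """
    saving_per_m = api_price_per_m_tokens - selfhost_marginal_per_m_tokens
    if saving_per_m <= 0:
        return math.inf
    return fixed_monthly_cost / saving_per_m


if __name__ == "__main__":
    # Placeholder figures for illustration only, in one currency unit.
    volume = breakeven_million_tokens(
        fixed_monthly_cost=40_000.0,
        api_price_per_m_tokens=4.0,
    )
    print(f"Break-even at about {volume:,.0f} million tokens per month")
```

Below the break-even volume the API is the cheaper route; above it the fixed cost is amortised and self-hosting wins. Subsidised compute lowers `fixed_monthly_cost` and therefore pulls the break-even point down.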
"Free weights" is not "free to run". A 1.6T-total MoE like DeepSeek V4 Pro needs a serious multi-GPU serving stack, a paged-attention inference engine, autoscaling, monitoring and on-call. Below a certain steady volume, a closed API is genuinely cheaper once you price in the engineering time. Do the break-even before you commit, not after.
3. Deep fine-tuning and full control of the stack
Open weights let you fine-tune, quantise, prune and distil without asking anyone. If your edge is a domain-adapted model — Indian legal language, UK financial-conduct rules, a specialist medical vocabulary — you need weights you can train on, not an endpoint. You also get reproducibility: the model does not change underneath you on someone else's release schedule, which matters for evaluation, audit and any regulated deployment that has to be re-certified when the model changes.
4. Coding and agentic work where open is already close
Coding is the area where open weights have caught up most visibly. DeepSeek V4 Pro and Qwen 3.6 both post coding numbers that put them squarely in production contention, not just on benchmark leaderboards — Qwen3.6-27B alone reports around 77.2% on SWE-Bench Verified, a figure that would have been frontier-only a year ago. For the full coding-benchmark depth see our SWE-Bench deep dive, and for the broader competitive picture among Chinese labs see the open-weight coding landscape.
The agentic angle is where the 1M-token window earns its keep. Coding agents spend their context budget on repository files, tool output and prior turns, and a model that can hold a large working set without aggressive truncation makes for noticeably steadier multi-step runs. Combined with a permissive licence, that means an Indian or UK team can wire a fully self-hosted coding agent — weights, inference, and the orchestration around it — without a single external API call leaving their network. For regulated codebases and internal tooling, that is a genuinely different proposition from routing every keystroke of context to a third party.
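As a rough illustration of why the window size matters for agents, the sketch below keeps the most recent context items that fit a token budget and drops everything older. It is a deliberately naive truncation policy, not how any particular agent framework works, and the item labels and token counts are hypothetical.

```python
from typing import List, Tuple

ContextItem = Tuple[str, int]  # (label, size in tokens), listed oldest first


def fit_to_budget(items: List[ContextItem], budget_tokens: int) -> Tuple[List[str], int]:
    """Keep the newest items whose combined size fits the budget.

    Walks from newest to oldest and stops at the first item that would
    overflow, so everything older than that item is dropped as well.
    """
    kept: List[str] = []
    used = 0
    for label, tokens in reversed(items):
        if used + tokens > budget_tokens:
            break
        kept.append(label)
        used += tokens
    kept.reverse()  # restore oldest-first order
    return kept, used


if __name__ == "__main__":
    # Hypothetical working set for one agent run.
    working_set = [
        ("repository files", 600_000),
        ("tool output", 150_000),
        ("prior turns", 100_000),
    ]
    for budget in (128_000, 1_000_000):
        kept, used = fit_to_budget(working_set, budget)
        print(f"budget={budget:,}: kept {kept} using {used:,} tokens")
```

With this made-up working set, a 128k budget keeps only the prior turns, while a 1M budget holds all three items with room to spare. That difference is the "steadier multi-step runs" effect in miniature.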
The licence question: MIT vs Apache vs Llama
For anyone shipping a commercial product, the licence is not a footnote — it is a gate. The three you will meet most often behave quite differently.
- MIT (DeepSeek V4 Pro and Flash) — about as permissive as it gets. Use it commercially, modify it, redistribute it; keep the copyright notice. Minimal friction for a product team.
- Apache-2.0 (Qwen 3.6, Gemma 4) — permissive and commercially friendly, with an explicit patent grant that gives legal teams comfort. This is the licence that has effectively won the permissive race for open-weight models in 2026.
- Llama community licence (Meta) — usable commercially but with an acceptable-use policy and a clause that triggers extra terms above a very large monthly-active-user threshold. Fine for most, but read it if you operate at consumer scale or in a sensitive domain.
The practical guidance: if licence simplicity is a priority, MIT and Apache-2.0 models give you the cleanest path to a commercial deployment. Whatever you choose, read the actual LICENSE file in the model repository before you ship — not a summary, and not a forum post.
Standardise on one permissive licence family across your model estate. Picking MIT or Apache-2.0 models by default means your legal review is a one-time exercise rather than a per-model fire drill every time a new open-weight release tempts you.
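One lightweight way to enforce that default is an allow-list check over your model inventory. The sketch below is illustrative only: the `ESTATE` mapping is a hypothetical inventory built from the table in this article, and models listed there simply as "Open weight" are marked as unspecified because the article does not state their exact licence. It is a triage aid, not a substitute for reading the LICENSE file.

```python
from typing import Dict, List, Set

# Licence families pre-approved by legal review (an assumption for this sketch).
ALLOWED: Set[str] = {"MIT", "Apache-2.0"}

# Hypothetical inventory; licence labels follow the table in this article.
ESTATE: Dict[str, str] = {
    "DeepSeek V4 Pro": "MIT",
    "DeepSeek V4 Flash": "MIT",
    "Qwen 3.6": "Apache-2.0",
    "Gemma 4": "Apache-2.0",
    "Kimi K2.6": "unspecified",  # listed only as "Open weight" in the table
}


def needs_review(estate: Dict[str, str], allowed: Set[str] = ALLOWED) -> List[str]:
    """Return the models whose licence is outside the pre-approved set."""
    return sorted(name for name, licence in estate.items() if licence not in allowed)


if __name__ == "__main__":
    flagged = needs_review(ESTATE)
    if flagged:
        print("Needs a per-model licence review:", ", ".join(flagged))
    else:
        print("All models fall inside the approved licence families.")
```

Run as part of a deployment check, this keeps new models from slipping into production before someone has confirmed which licence family they belong to.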
The part nobody puts on the leaderboard
An aggregate index score is the start of the work, not the end. Choosing an open-weight model commits you to owning the serving layer, and that layer is where projects quietly stall. You need an inference engine and the operational maturity around it: throughput tuning, batching, KV-cache management, autoscaling for spiky traffic, observability, and a rollback plan when a fine-tune regresses.
You also need your own evaluation harness. The reason to run task-representative evals is simple — a model that tops a public index can still lose on your specific distribution of tickets, prompts or documents. Build a small golden set from real production cases, score every candidate on it, and let that decide the route. That discipline is what separates a switch that saves money from one that quietly degrades quality.
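A golden-set harness does not need to be elaborate to be useful. The sketch below is a minimal version under simplifying assumptions: each candidate is any callable that maps a prompt to a string, and scoring is a normalised exact match, which real workloads will usually replace with a task-specific grader. The stub models and golden cases are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

GoldenCase = Tuple[str, str]   # (prompt, expected answer)
Model = Callable[[str], str]   # any function that answers a prompt


def score(model: Model, golden: List[GoldenCase]) -> float:
    """Fraction of golden cases answered correctly (case-insensitive exact match)."""
    if not golden:
        raise ValueError("golden set is empty")
    hits = sum(
        1
        for prompt, expected in golden
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return hits / len(golden)


def rank(candidates: Dict[str, Model], golden: List[GoldenCase]) -> List[Tuple[str, float]]:
    """Score every candidate on the same golden set, best first."""
    results = [(name, score(model, golden)) for name, model in candidates.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented golden cases and stub models standing in for real endpoints.
    golden_set = [("2 + 2 =", "4"), ("Capital of France?", "Paris")]
    stubs: Dict[str, Model] = {
        "candidate-a": lambda p: {"2 + 2 =": "4", "Capital of France?": "paris"}[p],
        "candidate-b": lambda p: {"2 + 2 =": "4", "Capital of France?": "Lyon"}[p],
    }
    for name, accuracy in rank(stubs, golden_set):
        print(f"{name}: {accuracy:.0%}")
```

In practice each stub would wrap a call to a self-hosted endpoint or a closed API, so every candidate is scored on exactly the same production-derived cases.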
So — open or closed?
For most Indian and UK teams the answer is not binary; it is a split. Route the workloads that demand residency, deep customisation or high steady volume to a self-hosted open-weight model — DeepSeek V4 Pro is a strong default for that lane today, with V4 Flash or Qwen 3.6 covering the cheaper, higher-throughput routes. Keep the spiky, low-volume or capability-maximal work on a closed frontier API where you do not want to own the stack. The leaderboard tells you open weights have earned a seat at that table. Your evals, your compliance map and your traffic shape tell you which lane each workload belongs in.
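The split above can be written down as a small routing rule. This is a sketch of the decision logic only, with hypothetical field and lane names; how you detect "steady high volume" or "hard task" for a given workload is left to your own traffic data and evals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    needs_residency: bool = False      # data must stay in a jurisdiction
    needs_fine_tuning: bool = False    # deep customisation of the weights
    steady_high_volume: bool = False   # high, predictable utilisation
    hard_task: bool = False            # needs the most capable self-hosted tier


def choose_lane(w: Workload) -> str:
    """Map a workload to a serving lane, following the split described in the text."""
    if w.needs_residency or w.needs_fine_tuning or w.steady_high_volume:
        # Self-hosted lane: capability tier for hard tasks, cheaper tier otherwise.
        return "self-hosted-pro" if w.hard_task else "self-hosted-flash"
    # Spiky, low-volume or capability-maximal work stays on a closed API.
    return "closed-api"


if __name__ == "__main__":
    examples = [
        Workload("claims triage", needs_residency=True, hard_task=True),
        Workload("bulk summarisation", steady_high_volume=True),
        Workload("ad hoc research assistant"),
    ]
    for w in examples:
        print(f"{w.name} -> {choose_lane(w)}")
```

Encoding the rule this way makes the routing policy reviewable: compliance can check the residency branch, and finance can check the volume branch, without reading the serving code.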
One last word of perspective. A year ago, the gap between the best open model and the closed frontier was wide enough that the choice mostly made itself for any serious workload. That is no longer true. With the gap down to six to nine months and the top open model carrying an MIT licence, the decision has moved from "can open weights do this?" to "do we want to own the operational and compliance trade-offs that come with running them?". That is a healthier question to be asking, and for a growing share of Indian and UK builders the answer is increasingly yes — at least for the workloads where control is worth more than the last few points of capability.
Leaderboard source: Artificial Analysis. Always confirm the current index and per-model figures at source, as the open-weight field is moving quickly.