What builders shipping next quarter need to know
- Two models, one drop — V4 Pro is a 1.6T-parameter MoE with 49B active; V4 Flash is 284B total / 13B active and aimed at cost-sensitive paths.
- 88.5% MMLU on V4 Pro per early third-party evaluations on llm-stats.com — three points above V3 and within touching distance of the closed frontier.
- Open-weight, not Apache — the DeepSeek licence permits commercial use but carries clauses worth reading before redistribution. It is not a drop-in OSI-approved licence.
- Cost is the real headline — DeepSeek's pitch, as MIT Technology Review described in its launch note, is parity at a fraction of frontier-lab pricing. For Indian and UK teams running high-volume inference, that maths is hard to ignore.
- Rough edges remain — early reports from news9live.com and community evaluators flag uneven tool-use behaviour and patchy safety alignment compared with the closed-frontier incumbents.
Before you migrate a single endpoint, model the prompt-cache economics carefully. DeepSeek's hosted API and the closed-frontier APIs price cached reads very differently — a workload that looks cheaper on paper at the headline rate can be neutral or worse once you account for cache hit rates. Run a 48-hour shadow comparison on production traffic before you commit.
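To make the cache point concrete, here is a minimal sketch of the blended input price at different cache hit rates. The two providers and every price in it are placeholders we made up for illustration, not figures from any price list, and the calculation ignores output tokens and any cache-write surcharge.

```python
def blended_input_cost(base_price: float, cached_price: float, hit_rate: float) -> float:
    """Effective price per million input tokens at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Cached tokens are billed at the cached rate, the rest at the headline rate.
    return hit_rate * cached_price + (1.0 - hit_rate) * base_price


# Placeholder prices in arbitrary currency units per million input tokens.
# These are NOT real figures; substitute the providers' current rates.
PROVIDER_A = {"base": 1.00, "cached": 0.50}  # lower headline rate, modest cache discount
PROVIDER_B = {"base": 3.00, "cached": 0.30}  # higher headline rate, deep cache discount

for hit_rate in (0.0, 0.5, 0.8, 0.95):
    a = blended_input_cost(PROVIDER_A["base"], PROVIDER_A["cached"], hit_rate)
    b = blended_input_cost(PROVIDER_B["base"], PROVIDER_B["cached"], hit_rate)
    print(f"hit rate {hit_rate:.0%}: A={a:.3f}  B={b:.3f}")
```

With these made-up numbers the provider with the cheaper headline rate wins at low hit rates and loses once the hit rate passes roughly 90%. Measure your own hit rate before drawing any conclusion.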
What V4 actually is
DeepSeek released V4 on 24 April 2026 in two flavours. The flagship, V4 Pro, is a Mixture-of-Experts model reported at 1.6 trillion total parameters with 49 billion active per token. The smaller sibling, V4 Flash, is a 284-billion-parameter MoE with 13 billion active. Both are released as open-weight models under DeepSeek's own licence: a non-standard licence that allows commercial use with conditions and is materially different from Apache 2.0 or MIT.
Pro is positioned as the reasoning and long-context flagship, with strong scores on MMLU (88.5% in third-party evaluation tracked by llm-stats.com), GSM8K-style maths, and code benchmarks. Flash is positioned for high-throughput, latency-sensitive workloads — chat-style assistants, ranking, classification, lightweight agents. The split mirrors the Sonnet/Haiku and Pro/Flash patterns that other labs have settled on, which is sensible: most teams want a thinking model and a fast model from the same family.
Architecturally, both rely on aggressive MoE sparsity. That has knock-on effects for self-hosted deployments: you need enough VRAM to hold the full expert set, even though only a fraction is active per token. On AWS Mumbai or AWS London, the practical home for V4 Pro is a multi-node setup using ml.p5e or equivalent; the inference economics improve sharply with utilisation, so V4 Pro is rarely the right choice for low-volume internal tooling. For that, V4 Flash, or hosted V4 via OpenRouter or DeepSeek's own API, is the cleaner answer.
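A back-of-envelope sketch of why the full expert set drives the hardware bill. It uses the total parameter counts reported above. The storage precisions are our assumption, since the launch coverage we read does not say which formats the released checkpoints support. The figures cover weights only, not KV cache, activations or framework overhead.

```python
def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9


# Total (not active) parameter counts as reported in this article.
MODELS = {"V4 Pro": 1.6e12, "V4 Flash": 284e9}

# Assumed storage formats and their bytes per parameter.
PRECISIONS = {"FP16/BF16": 2.0, "FP8/INT8": 1.0, "4-bit": 0.5}

for name, params in MODELS.items():
    for label, nbytes in PRECISIONS.items():
        gb = weight_memory_gb(params, nbytes)
        print(f"{name} @ {label}: ~{gb:,.0f} GB of weights")
```

Memory follows the total parameter count while per-token compute follows the active count, which is why a sparse model can be cheap to run per token yet still expensive to host.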
One licensing nuance is worth pulling out. DeepSeek's licence is open-weight, but the use-case restrictions and the redistribution clauses are stricter than the OSI-approved permissive licences most engineering teams treat as "safe". If your organisation has a compliance review board that signs off on third-party licences, do not assume V4 will pass on the basis that "it is open source on Hugging Face". Have your licensing team read the full text. We have flagged this for our own counsel and will revisit once the picture stabilises.
The cost angle
The reason DeepSeek V4 will dominate procurement conversations through the summer is price. Even at parity-minus on raw quality, the cost-per-million-tokens delta is large enough that finance functions in Indian and UK enterprises will start asking pointed questions. We have organised the comparison below using publicly listed figures at time of writing; verify against the providers' own pricing pages before you sign anything.
| Model | Quality (MMLU) | Context | Relative price | Licence |
|---|---|---|---|---|
| DeepSeek V4 Pro | ~88.5% | 128k | Lowest tier (frontier-class quality, fraction of incumbent price) | Open-weight (DeepSeek licence) |
| DeepSeek V4 Flash | Mid-80s (early reports) | 128k | Lowest tier — cost-optimised variant | Open-weight (DeepSeek licence) |
| Claude Opus 4.7 | High frontier band | 1M | Premium tier; prompt-cache reads materially cheaper | Closed, hosted only |
| GPT-4-class flagship | High frontier band | 128k+ | Premium tier | Closed, hosted only |
We are deliberately not quoting precise dollar-per-million figures in this table — the closed-frontier providers reprice quarterly and the DeepSeek hosted endpoint has run promotional pricing in past launches. The shape of the comparison is what matters. V4 Pro lands a meaningful step below the incumbents at the headline rate, and a self-hosted deployment on AWS Mumbai or London — once you amortise GPU cost across a steady workload — typically pulls the unit economics down further. For a fintech in Bengaluru running classification at tens of millions of tokens a day, or a UK insurer doing claims-document summarisation, the savings compound quickly.
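The amortisation argument can be sketched in a few lines. The hourly cluster cost and peak throughput below are placeholders we invented for illustration. They are not AWS prices or measured V4 numbers, so substitute your own quotes and benchmarks.

```python
def self_hosted_cost_per_million(hourly_cluster_cost: float,
                                 peak_tokens_per_second: float,
                                 utilisation: float) -> float:
    """Cost per million tokens for a cluster that is billed whether busy or idle."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    # Tokens actually served in one billed hour at this utilisation.
    tokens_per_hour = peak_tokens_per_second * 3600 * utilisation
    return hourly_cluster_cost / tokens_per_hour * 1_000_000


HOURLY_CLUSTER_COST = 100.0     # placeholder currency units per hour
PEAK_TOKENS_PER_SECOND = 5_000  # placeholder throughput at full load

for utilisation in (0.1, 0.5, 0.9):
    cost = self_hosted_cost_per_million(
        HOURLY_CLUSTER_COST, PEAK_TOKENS_PER_SECOND, utilisation
    )
    print(f"utilisation {utilisation:.0%}: {cost:.2f} per million tokens")
```

Unit cost scales inversely with utilisation, so the same cluster is nine times more expensive per token at 10% load than at 90%. That is the whole case for reserving self-hosting for steady, high-volume workloads.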
Here is a short example of calling V4 Pro through OpenRouter, which most teams will use during the evaluation phase before committing to a self-hosted deployment:
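```python
import os

import requests

# Model slug as given in this article; confirm it against OpenRouter's model list.
MODEL = "deepseek/deepseek-v4-pro"

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        # The API key is read from the environment rather than hard-coded.
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a careful policy analyst."},
            {"role": "user", "content": "Summarise the EU AI Act Article 6 obligations."},
        ],
        "temperature": 0.2,
    },
    timeout=60,
)
# Fail loudly on HTTP errors instead of hitting a KeyError on the next line.
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```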
Where V4 is good and where it is rough
From the early third-party evaluations we have read — including llm-stats.com's tracker, the news9live.com launch coverage, and MIT Technology Review's "Three reasons why DeepSeek's new model matters" piece dated 24 April — a clear picture is emerging. V4 Pro is genuinely strong on long-context reasoning, multi-step maths, and structured code generation. It holds its own on agentic tasks where the loop is well-defined and the tools are well-typed.
The rough edges, as best we can summarise them at the time of writing:
- Tool use is uneven. Community evaluators report that V4 Pro is less reliable than the closed-frontier incumbents at strict JSON-schema tool calls under adversarial inputs. If your agent stack relies on aggressive tool-call validation, plan for a defensive retry layer.
- Safety alignment is patchier. Several launch-week red-teams have flagged that V4 will follow harmful instructions in scenarios where Claude Opus or the GPT-4 class refuse. This is a known trade-off of open-weight releases without the same RLHF/post-training investment, and it is the reason a deploying organisation under the EU AI Act needs its own evaluation pass.
- Multilingual quality is mixed. Strong on English and Chinese; less consistent on Indian languages and on specialised UK regulatory English. If you need Tamil, Bengali or Welsh in production, run your own benchmarks before relying on the launch-day numbers.
- Long-context behaviour is good but not best-in-class. 128k context is plenty for most workloads; teams who have moved to the 1M-token closed-frontier window for repo-wide refactors will not find V4 a clean substitute on those workloads.
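For the tool-use point above, a defensive retry layer might look like the sketch below. It is illustrative only: the `REQUIRED_ARGS` tool signature and the `generate` callable are hypothetical stand-ins for your own tool definition and model client, and the type check is a deliberately simple substitute for a full JSON Schema validator.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical tool signature: argument name -> expected Python type.
REQUIRED_ARGS: Dict[str, type] = {"order_id": str, "quantity": int}


def parse_tool_args(raw: str, required: Dict[str, type]) -> Dict[str, Any]:
    """Parse and check a tool-call payload; raise ValueError on any problem."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("payload must be a JSON object")
    for name, expected in required.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], expected):
            raise ValueError(f"argument {name} must be {expected.__name__}")
    return args


def call_with_retries(generate: Callable[[Optional[str]], str],
                      required: Dict[str, type],
                      max_attempts: int = 3) -> Dict[str, Any]:
    """Ask for a tool call until it validates, feeding the last error back in.

    `generate(feedback)` is a hypothetical wrapper around your model client:
    it returns the raw tool-call JSON, and `feedback` is the previous
    validation error (or None on the first attempt).
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        raw = generate(feedback)
        try:
            return parse_tool_args(raw, required)
        except ValueError as exc:
            feedback = str(exc)
    raise RuntimeError(f"no valid tool call after {max_attempts} attempts: {feedback}")


# Demo with a fake model: the first reply is truncated JSON, the second is valid.
replies = iter(['{"order_id": "A17"', '{"order_id": "A17", "quantity": 2}'])
result = call_with_retries(lambda feedback: next(replies), REQUIRED_ARGS)
print(result)
```

Passing the validation error back to the model is a design choice rather than a requirement. A plain retry with the same prompt is simpler, but it gives the model nothing new to correct.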
The DeepSeek licence is open-weight, not OSI open-source. Use-case restrictions, redistribution conditions and indemnity clauses differ materially from Apache 2.0. If your procurement process maps "open source" to "Apache or MIT", brief your legal counsel before you ship V4 in a regulated workload — the EU AI Act, UK procurement rules and most large-enterprise vendor questionnaires now expect a precise answer to "what licence governs the model weights you are deploying?".
Production patterns — when V4 actually fits
Two scenarios where we think V4 has a clean answer today.
Scenario 1 — High-volume classification or extraction
A Bengaluru e-commerce team running entity extraction across millions of seller-product descriptions daily. Quality requirement is "good enough to feed a downstream ranker"; cost is the deciding constraint. V4 Flash, deployed self-hosted on AWS Mumbai with a steady workload to amortise GPU cost, can come in materially cheaper than the closed-frontier hosted alternatives at the same throughput. The job tolerates the patchier safety alignment because it is operating on benign inputs in a tight schema.
Scenario 2 — Long-document analytical summaries
A London compliance function running first-pass summaries on tender documents and policy responses. V4 Pro's 128k window covers the documents; the reasoning quality is good enough for a draft that a human reviews. Cost matters because the team runs hundreds of these a week. The catch — and it is a real one — is the EU AI Act and UK procurement questions: the team has to be confident that the model card, evaluation report and licence terms satisfy their legal review. A well-organised compliance team can clear this; a stretched one will struggle.
"We swapped V4 Flash in for our product-tagging pipeline last week. Same throughput, same precision in our spot-checks, almost a third of the previous monthly bill. We kept Claude on the customer-facing assistant because tool-use reliability matters more there. The honest answer is — different models for different layers."
— Aarav, Verified Builder · Mumbai, IN
The geopolitics question
We will not pretend this part is uncomplicated. DeepSeek is a Chinese lab, and an open-weight release at this capability level lands in the middle of an active policy debate in the EU, the UK and India. The EU AI Act places its own obligations on the deploying organisation, separate from anything that applies to the model author. A German enterprise running V4, or a UK one serving EU users, therefore takes on the transparency, evaluation and incident-reporting duties; the open-weight licence does not exempt it. India's emerging digital-public-infrastructure procurement guidance and the UK's frontier-AI regulator both treat country of origin as a relevant factor in procurement risk assessments.
Builders should assume that defending a V4 deployment in a regulated context will require a written evaluation, a model card, a data-flow diagram and a clear answer to "where are weights stored and where does inference run?". For most Indian and UK teams the practical answer is "self-hosted in AWS Mumbai or London, weights at rest in our VPC". That mitigates a lot, but not everything. We are deliberately not taking a position here; we are flagging that the conversation is real, will reach your procurement function, and is not solved by a one-line policy.
So — should you migrate?
Per workload type, our current read:
- High-volume classification, extraction, ranking — yes, evaluate V4 Flash now. The cost delta is large and the quality is sufficient for these tasks.
- Long-document summarisation and analysis — yes, evaluate V4 Pro. Confirm your compliance posture before going beyond a pilot.
- Customer-facing assistants with strict tool use — not yet. The closed-frontier incumbents still have the edge on reliability under adversarial inputs.
- Repo-wide engineering agents at long context — not yet. The 1M-token closed-frontier window remains differentiated for this pattern.
- Regulated, high-risk decisions (credit, healthcare, legal) — only with a full internal evaluation, model-card review and red-team pass. The patchier safety alignment is a real consideration, and the EU AI Act obligations sit on the deployer.
The honest summary is the one most engineering teams will arrive at independently. V4 is not a wholesale replacement for the closed-frontier incumbents — but it is a credible second model in a multi-model stack, and for cost-sensitive workloads it changes the maths substantially. Run the 48-hour shadow comparison, brief your legal counsel on the licence, model the prompt-cache economics, and decide one workload at a time.
Source notes — DeepSeek's launch announcement covered by news9live.com (24 April 2026), MIT Technology Review's "Three reasons why DeepSeek's new model matters" (24 April 2026), and the third-party benchmark tracker at llm-stats.com.