Does Gemini 3.5 Flash really beat Gemini 3.1 Pro?

On the benchmarks Google published — coding, agentic and multimodal — yes, 3.5 Flash scores above 3.1 Pro while serving output tokens far faster. Benchmark wins are not the same as a win on your workload, so run your own evals before switching.

When is Gemini 3.5 Pro coming out?

Google said Gemini 3.5 Pro will roll out in June 2026. It was not released at I/O 2026 on 19 May; only Gemini 3.5 Flash and the Gemini Omni video model shipped that day.

Can I use Gemini Spark in India or the UK today?

Not yet for general users. At launch Spark is in beta for trusted testers and US Google AI Ultra subscribers. A wider rollout, including India and the UK, has not been dated, so plan around the API rather than Spark for now.

Should I migrate my agents to Gemini 3.5 Flash?

Test it on long-horizon agentic and coding tasks where speed and cost matter most. If your evals confirm the quality, the faster output and lower price make it a strong default. Keep a router so you can fall back to other models per task.

Gemini 3.5 Flash and Spark: Google's I/O 2026 Play

What changed at Google I/O 2026

Gemini 3.5 Flash shipped, the Pro tier did not. Google released the Flash model on 19 May and said Gemini 3.5 Pro will follow in June 2026 — a reversal of the usual flagship-first launch order.
Flash claims to beat the previous Pro. On Google's published numbers, 3.5 Flash scores above Gemini 3.1 Pro on coding, agentic and multimodal benchmarks, while serving output tokens substantially faster.
Spark is the headline product. A 24/7 personal agent that runs on dedicated Google Cloud virtual machines, so it keeps working after you close the laptop. It is in beta for trusted testers and US Google AI Ultra subscribers.
Gemini Omni rounds out the slate. A video-generation and editing model unveiled alongside the others — a signal that Google now ships across text, agents and video in a single I/O cycle.

For builders, the interesting story is not the keynote choreography. It is that a Flash-class model — the tier you reach for when you care about latency and unit cost — is now claiming frontier-level capability. That changes the maths on a lot of production systems. We unpack what to test, what to budget for, and what Spark tells you about where consumer agents are heading. If you have been tracking the wider Android and Gemini story, our earlier piece on Google I/O 2026 and Gemini's Android intelligence push sets the broader context.

Pro tip

Do not migrate on the strength of a launch-day benchmark table. Pull your ten hardest production traces — the ones that already fail or cost too much — and replay them against Gemini 3.5 Flash before you change a single line of routing code. A vendor benchmark is a hypothesis; your traces are the test.
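The replay described above can be sketched in a few lines. This is a minimal harness under stated assumptions, not a full eval framework: `Trace`, `replay` and `fake_candidate` are hypothetical names, the per-trace `passes` check stands in for whatever grading you already trust, and the model is any callable that maps a prompt to text, so you would swap in your real client call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Trace:
    """One recorded production request plus a check for an acceptable answer."""
    trace_id: str
    prompt: str
    passes: Callable[[str], bool]  # hypothetical per-trace grader


def replay(traces: Iterable[Trace], model: Callable[[str], str]) -> dict:
    """Run every trace through `model` and report the pass rate and failures."""
    failed = []
    total = 0
    for trace in traces:
        total += 1
        try:
            ok = trace.passes(model(trace.prompt))
        except Exception:
            ok = False  # a crash or timeout counts as a failed trace
        if not ok:
            failed.append(trace.trace_id)
    pass_rate = (total - len(failed)) / total if total else 0.0
    return {"total": total, "pass_rate": pass_rate, "failed": failed}


# Stand-in for a real model call (assumption: replace with your own client).
def fake_candidate(prompt: str) -> str:
    return prompt.upper()


TRACES = [
    Trace("t1", "refund order 1234", lambda out: "REFUND" in out),
    Trace("t2", "cancel plan", lambda out: "cancelled" in out),
]

REPORT = replay(TRACES, fake_candidate)
print(REPORT)
```

Run the same trace set against your current model and the candidate, then compare the two reports rather than reading either in isolation.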

Gemini 3.5 Flash: a cheaper, faster frontier tier

The core claim Google made for Gemini 3.5 Flash is that it delivers frontier performance for agents and coding while sitting in the Flash tier on both speed and price. Google's own benchmark figures put 3.5 Flash ahead of Gemini 3.1 Pro on Terminal-Bench 2.1, on agentic tool-use evaluations and on multimodal understanding tasks such as chart reading. Google also describes the model as running roughly four times faster in output tokens per second than comparable frontier models, with a one-million-token context window.

On price, Google has positioned 3.5 Flash well below comparable frontier models, and described long-horizon agentic tasks as costing "often less than half" what comparable models charge. We are deliberately not quoting a single hard per-token figure as gospel here: published API pricing should be confirmed against Google's own developer pricing page before you build a budget on it, because launch-week aggregator numbers drift. Treat the cost story as directional — meaningfully cheaper than the previous Pro tier — and verify the exact rate yourself.
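To make "directional" concrete, the arithmetic is worth writing down once. The sketch below computes cost per task from token counts and per-million-token rates. Every number in it is a placeholder we made up for illustration, not a Google price: replace the rates with the figures from the official pricing page and the token counts with your own trace averages.

```python
def cost_per_task(
    input_tokens: int,
    output_tokens: int,
    price_in_per_m: float,
    price_out_per_m: float,
) -> float:
    """Cost of one task, given prices quoted per million tokens."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


# PLACEHOLDER inputs: invented for the example, not published rates.
INPUT_TOKENS = 40_000      # average prompt plus tool results per task
OUTPUT_TOKENS = 5_000      # average generated tokens per task
TASKS_PER_MONTH = 100_000

current_cost = cost_per_task(INPUT_TOKENS, OUTPUT_TOKENS, 2.00, 10.00)
candidate_cost = cost_per_task(INPUT_TOKENS, OUTPUT_TOKENS, 1.00, 4.00)
monthly_saving = (current_cost - candidate_cost) * TASKS_PER_MONTH

print(f"current:   {current_cost:.4f} per task")
print(f"candidate: {candidate_cost:.4f} per task")
print(f"saving:    {monthly_saving:.2f} per month")
```

Long-horizon agents tend to be input-heavy because each step re-reads context, so it is worth checking how the input and output rates each move rather than assuming one blended discount.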

Here is how the tiers line up in plain terms. The entries below are positioning, not a contract: read them as "what kind of workload each tier is for."

Tier	Status at I/O 2026	Relative speed	Relative cost	Best for
Gemini 3.1 Pro	Available (previous flagship)	Standard Pro	Higher	Deep reasoning where latency is secondary
Gemini 3.5 Flash	Shipped 19 May 2026	~4x faster output than comparable frontier models	Lower — well below comparable frontier tiers	High-volume agents, coding loops, multimodal apps
Gemini 3.5 Pro	Not released; announced for June 2026	To be confirmed	To be confirmed	The new top-of-stack reasoning tier

If you have followed the Gemini Flash lineage, this is a familiar trajectory. We covered the economics of the previous generation in our breakdown of Gemini 3.2 Flash and its sub-dollar token pricing, and the long-context direction in our look at Gemini 3.1 Ultra's two-million-token window and code execution. The 3.5 Flash release continues the pattern: Google keeps pushing capability down the price curve faster than most teams refresh their model choices.

What this means for your build

For an Indian SaaS team running a high-volume support agent, or a UK fintech batch-classifying documents overnight, the unit economics are the whole game. A model that is faster per token and cheaper per token, at similar quality, compounds: lower latency improves conversion on interactive features, and lower cost widens the set of features you can afford to ship at all. The honest caveat is that "beats 3.1 Pro on Google's benchmarks" is not the same as "beats your current model on your traffic." Benchmarks like Terminal-Bench measure a curated task distribution; your production distribution is messier.

Watch out

Flash-tier models are tuned for speed, and that tuning can show up as shallower reasoning on genuinely hard, multi-step problems. If your agent does long-horizon planning — chained tool calls, branching decisions, recovering from its own mistakes — test those exact paths. A model can win a coding benchmark and still drift on a forty-step workflow. Keep a model router so any single task can fall back to a heavier tier.
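A fallback router of the kind described above does not need to be elaborate. Here is a minimal sketch, assuming each tier is a plain callable and that you can express "good enough" as an `accept` check. The tier names and the toy `fast_tier` and `heavy_tier` functions are hypothetical stand-ins for real client calls.

```python
from typing import Callable, Optional, Sequence, Tuple

Model = Callable[[str], str]


def route_with_fallback(
    prompt: str,
    tiers: Sequence[Tuple[str, Model]],
    accept: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, output) for the first accepted answer."""
    last_error: Optional[Exception] = None
    for name, model in tiers:
        try:
            output = model(prompt)
        except Exception as exc:  # e.g. timeout or rate limit from a real client
            last_error = exc
            continue
        if accept(output):
            return name, output
    raise RuntimeError("no tier produced an acceptable answer") from last_error


# Toy stand-ins: the fast tier gives up on planning prompts by returning "".
def fast_tier(prompt: str) -> str:
    return "" if "plan" in prompt else f"fast:{prompt}"


def heavy_tier(prompt: str) -> str:
    return f"heavy:{prompt}"


TIERS = [("fast", fast_tier), ("heavy", heavy_tier)]

print(route_with_fallback("classify this ticket", TIERS, accept=bool))
print(route_with_fallback("plan a forty-step migration", TIERS, accept=bool))
```

The interesting design decision is the `accept` check. A cheap validator (schema check, non-empty tool call, test pass) keeps the fallback rate honest; without one, the router never leaves the fast tier.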

Gemini Spark: the 24/7 cloud agent

Spark is the more strategically interesting announcement. It is a personal AI agent that runs continuously on dedicated Google Cloud virtual machines, which means it keeps executing tasks after you close your laptop or put your phone away. It is powered by Gemini 3.5 Flash and integrates natively with Gmail, Docs, Sheets and Slides, with connections to third-party services arriving over the following months. At launch it is in beta for trusted testers and US Google AI Ultra subscribers — there is no general availability in India or the UK yet.

The architectural choice matters. Most consumer agents today are session-bound: they run while a tab is open and stop when it closes. Spark moves the agent onto a persistent cloud VM, which is the same shift production engineering teams have been making for their own agents — long-running workers, durable state, asynchronous task queues. Google packaging that as a consumer product is a signal about where the category is going.
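The "durable state" part of that shift is simpler than it sounds. The sketch below shows one way to checkpoint a multi-step task to disk so that a restart resumes where it stopped. It says nothing about how Spark itself is built; the file layout and the `load_checkpoint`, `save_checkpoint` and `run_task` names are our own assumptions, and a production system would more likely use a database or a queue than a JSON file.

```python
import json
import os
import tempfile
from typing import Callable, List


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_step": 0, "results": []}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_task(path: str, steps: List[Callable[[], str]]) -> dict:
    """Run steps in order, persisting progress after each one completes."""
    state = load_checkpoint(path)
    while state["next_step"] < len(steps):
        result = steps[state["next_step"]]()  # may raise; progress so far is saved
        state["results"].append(result)
        state["next_step"] += 1
        save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint = os.path.join(workdir, "task.json")
        final = run_task(checkpoint, [lambda: "fetched", lambda: "summarised"])
        print(final)
```

Calling `run_task` again with the same checkpoint path after a failure skips the completed steps, which is the property a session-bound agent lacks. Steps should be idempotent where possible, since a crash between running a step and saving its checkpoint means that step runs twice.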

From a verified Builder

"We have been running our own persistent-agent infrastructure for clients for a year — durable VMs, a task queue, an approval gate before anything spends money. Seeing Google ship the same shape to consumers tells me the pattern is now table stakes. The differentiator is no longer 'we have an always-on agent' — it is how well you handle the boring parts: approvals, audit logs, and graceful failure."

— Anjali, Verified Builder · Bengaluru, IN

What builders should do about Spark

Spark is not yet something most builders can use directly, but it is something to design around. Three concrete moves:

Decide whether you compete or complement. If your product is a thin agent wrapper over email and documents, Spark is a direct threat. If you serve a regulated workflow, a niche vertical, or a market Spark has not reached, you have room — but you should be explicit about what you do that a general consumer agent will not.
Adopt the persistent-agent pattern now. Session-bound agents will start to feel dated. Move long-running work onto durable infrastructure with checkpointed state, so a task survives a dropped connection or a restart.
Build the approval layer first, not last. Spark asks before high-stakes actions such as spending money or sending external email. That is the part users actually judge an agent on. A clear, auditable approval gate is a feature, not overhead, and for builders working under India's Digital Personal Data Protection (DPDP) Act or the UK GDPR, it is also a compliance asset.
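The approval layer from the third move can start very small. This is a minimal sketch, assuming you can name your high-stakes actions up front and that approval is a yes/no callback. The action names, the `ApprovalGate` class and the spending threshold in the example are all hypothetical, and a real gate would usually wait on a human rather than a lambda.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical action names; use whatever your agent's tools are called.
HIGH_STAKES = {"spend_money", "send_external_email"}


@dataclass
class ApprovalGate:
    approver: Callable[[str, dict], bool]  # returns True to allow the action
    audit_log: List[dict] = field(default_factory=list)

    def execute(self, action: str, params: dict, run: Callable[[dict], str]) -> str:
        """Ask for approval on high-stakes actions and record every decision."""
        needs_approval = action in HIGH_STAKES
        approved = self.approver(action, params) if needs_approval else True
        # Log before running, so a record exists even if the action crashes.
        self.audit_log.append({
            "ts": time.time(),
            "action": action,
            "params": params,
            "needs_approval": needs_approval,
            "approved": approved,
        })
        if not approved:
            return "blocked"
        return run(params)


# Example policy (assumption): auto-approve spends up to 100, block the rest.
gate = ApprovalGate(approver=lambda action, params: params.get("amount", 0) <= 100)

print(gate.execute("read_inbox", {}, lambda p: "ok"))
print(gate.execute("spend_money", {"amount": 500}, lambda p: "paid"))
print(gate.execute("spend_money", {"amount": 50}, lambda p: "paid"))
```

Keeping the log append inside the gate, rather than in each tool, means there is exactly one place an auditor needs to look.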

The dual-market read: cost, agents and timing

For builders in India and the UK alike, the practical takeaway from I/O 2026 is about timing and discipline rather than hype. A cheaper, faster frontier tier is genuinely useful — but only if you actually capture the saving. Teams that route every request to one model and never revisit the choice will leave the gain on the table. Teams that maintain a routing layer, run regular evals, and re-benchmark when a new tier ships will compound it.

For Indian teams, where margin pressure on AI features is acute and many products serve price-sensitive users, a model that meaningfully lowers cost per task can be the difference between a feature being viable and being cut. For UK teams, often selling into enterprise and public-sector buyers, the latency improvement and the maturing agent story matter for procurement conversations — buyers increasingly ask not just "what can it do" but "how fast, how cheaply, and with what audit trail." Globally, the message is the same: the frontier is moving down the price curve, and the winners are the teams set up to notice.

On Spark specifically, the honest position for builders outside the US is patience plus preparation. You cannot ship on it today, so do not plan a roadmap around it. But you can adopt its architecture — persistent VMs, durable state, an explicit approval gate — so that when comparable capability reaches your market, your product already works that way. The teams that treated last year's agent frameworks as a preview, and built accordingly, are the ones shipping confidently now.

Primary coverage of the keynote is worth reading in full: CNBC's report on Gemini 3.5 and Gemini Spark, the Google blog post on the Gemini 3.5 model family, and 9to5Google's I/O 2026 coverage. Confirm any per-token pricing against Google's own developer pricing page before you build a budget on it.

So — what should you do this week?

Three actions, in order:

Replay your hardest traces against Gemini 3.5 Flash. Use real production failures, not the vendor's benchmark set. If the quality holds and the cost drops, that is a confirmed win.
Audit your routing. If you have no model router, build one. A new frontier tier every quarter is now the norm, and you want switching to be a config change, not a rewrite.
Prototype the persistent-agent pattern. Even a small durable worker with checkpointed state and an approval gate teaches you what Spark-style products demand — well before that capability is generally available in your market.
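For the routing audit, "a config change, not a rewrite" can be taken literally. The sketch below keeps the task-to-model mapping in JSON and looks it up at call time. The tier labels and task types are invented for the example, and in practice the config would live in a file or a settings service rather than an inline string.

```python
import json

# Hypothetical routing table; the labels are placeholders, not real model IDs.
ROUTING_CONFIG = json.loads("""
{
  "default": "fast-tier",
  "routes": {
    "long_horizon_planning": "heavy-tier",
    "document_classification": "fast-tier"
  }
}
""")


def pick_model(task_type: str, config: dict) -> str:
    """Return the model label for a task type, falling back to the default."""
    return config["routes"].get(task_type, config["default"])


print(pick_model("long_horizon_planning", ROUTING_CONFIG))
print(pick_model("support_reply", ROUTING_CONFIG))

# Trialling a new tier on one task type is then a one-line config edit.
trial_config = {
    **ROUTING_CONFIG,
    "routes": {**ROUTING_CONFIG["routes"], "long_horizon_planning": "fast-tier"},
}
print(pick_model("long_horizon_planning", trial_config))
```

Because application code only ever calls `pick_model`, a trial that fails your evals is rolled back by reverting the config, with no deploy of application logic.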

The pattern across this I/O is clear enough: capability keeps getting cheaper and faster, and agents keep getting more autonomous. Builders who treat each release as a prompt to re-test and re-route — rather than a headline to react to — are the ones who keep their products both competitive and affordable.
