Is Gemini 3.2 Flash pricing confirmed?

Not officially. The $0.25/M input and $2.00/M output figures come from API metadata surfaced during the staged rollout on 5 May 2026. Google has not published an official pricing page for Gemini 3.2 Flash as of 12 May 2026. Expect confirmation at Google I/O on 19–20 May.

How does Gemini 3.2 Flash compare to Gemini 3.1 Flash?

Gemini 3.1 Flash is priced at $0.50/M input and $3.00/M output. If the 3.2 Flash figures hold, that is a 50% reduction in input cost and a 33% reduction in output cost, alongside speed improvements reported in community benchmarks. Google has not yet published official benchmark comparisons.

Where can I access Gemini 3.2 Flash today?

Gemini 3.2 Flash has been observed in the iOS Gemini app and Google AI Studio via a staged rollout. Vertex AI availability is expected but unconfirmed as of 12 May 2026. Check Google AI Studio's model selector for the latest availability.

What is expected at Google I/O on 19–20 May?

Google I/O 2026 (19–20 May) is widely expected to bring the full Gemini 4 announcement, with reported features including 10M+ token context and native multimodal output (image and video generation). Official pricing for Gemini 3.2 Flash and any Gemini 4 tiers will almost certainly be confirmed at the event.

Should I migrate from Gemini 3.1 Flash to 3.2 Flash now?

If the reported pricing holds, switching to Gemini 3.2 Flash offers a meaningful cost reduction with no reported capability regression. The practical advice is to run a small-scale A/B test in Google AI Studio before committing production workloads, and to wait for official pricing confirmation at Google I/O if your budget planning requires certainty.

Gemini 3.2 Flash: Reported $0.25/M Tokens, Apparently Faster Than 3.1 Pro

What we know: the 5 May staged rollout

On 5 May 2026, users of the iOS Gemini app began reporting a new model appearing in the model selector labelled "Gemini 3.2 Flash". The same model surfaced in Google AI Studio within hours. Google made no public announcement — no blog post, no changelog entry, no social media post. The rollout appears to have been a staged deployment rather than a formal launch.

Community testers on X and Reddit immediately began probing the model's capabilities. The headline observations:

Noticeably faster time-to-first-token than Gemini 3.1 Pro
Stronger performance on coding tasks — particularly SVG generation, ASCII art, and interactive HTML/JS generation — than 3.1 Pro in informal head-to-heads
Context window appears to be at least 1 million tokens (consistent with the Gemini 3.x family), though Google has not confirmed the exact limit
Multimodal input (images, documents) confirmed working; video and audio input status is unconfirmed

Caveat

No official MMLU, HumanEval, SWE-bench, or GPQA benchmarks have been published by Google for Gemini 3.2 Flash as of 12 May 2026. All performance claims above come from community testing, not controlled evaluation. Do not use these observations as the basis for production migration decisions without your own evals.
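One way to run your own evals is a small side-by-side harness. The sketch below is illustrative only: `evaluate`, `call_baseline`, and `call_candidate` are hypothetical names, the two call functions are stubs standing in for real API calls to each model, and exact-match scoring is an assumption that will not suit every task.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Tuple


def evaluate(call_model: Callable[[str], str],
             cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Run every (prompt, expected) case; report exact-match accuracy and mean latency."""
    hits = 0
    latencies = []
    for prompt, expected in cases:
        start = time.perf_counter()
        answer = call_model(prompt)
        latencies.append(time.perf_counter() - start)
        hits += int(answer.strip() == expected)
    return {"accuracy": hits / len(cases), "mean_latency_s": mean(latencies)}


# Hypothetical stand-ins: replace each body with a real API call to the model under test.
def call_baseline(prompt: str) -> str:
    return "4" if prompt == "2+2?" else "unknown"


def call_candidate(prompt: str) -> str:
    return {"2+2?": "4", "Capital of France?": "Paris"}.get(prompt, "unknown")


cases = [("2+2?", "4"), ("Capital of France?", "Paris")]
report = {
    name: evaluate(fn, cases)
    for name, fn in [("baseline", call_baseline), ("candidate", call_candidate)]
}
print(report)
```

Note that this measures end-to-end response time, not time-to-first-token; measuring the latter would need a streaming call and a timestamp on the first chunk.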

The pricing signal — and why it matters

The figure attracting the most attention is the reported input pricing of $0.25 per million tokens. This comes from API metadata surfaced during the staged rollout, not an official Google pricing page. The reported output price is $2.00 per million tokens.

If confirmed, these figures represent a 50% cut in input price and a 33% cut in output price from current Gemini 3.1 Flash pricing ($0.50/M input, $3.00/M output). They would place Gemini 3.2 Flash at the same input price as Gemini 3.1 Flash-Lite, though with a higher output price ($2.00/M versus $1.50/M) and substantially higher reported capability.

How Gemini 3.2 Flash compares on price

Model	Input ($/MTok)	Output ($/MTok)	Status
Gemini 3.2 Flash	$0.25 (reported)	$2.00 (reported)	Unconfirmed — Google I/O pending
Gemini 3.1 Flash-Lite	$0.25	$1.50	Published (Google AI pricing page)
Gemini 3.1 Flash	$0.50	$3.00	Published
Claude Haiku 4.5	$1.00	$5.00	Published (Anthropic pricing page)
GPT-5.4 mini	$0.75	$4.50	Published (OpenAI pricing page)

At the reported price, Gemini 3.2 Flash would be 75% cheaper per input token than Claude Haiku 4.5 and about 67% cheaper than GPT-5.4 mini. For high-volume production workloads such as document classification, structured extraction, and summarisation pipelines, this is a material cost difference that compounds quickly at scale.
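The input-price gaps follow directly from the table. A minimal sketch of the arithmetic, using the table's prices as given (the Gemini 3.2 Flash figure is the reported, unconfirmed one; `percent_cheaper` is an illustrative helper name):

```python
# Input prices in USD per million tokens, copied from the comparison table.
# The Gemini 3.2 Flash figure is reported, not officially confirmed.
INPUT_PRICE = {
    "Gemini 3.2 Flash": 0.25,
    "Gemini 3.1 Flash-Lite": 0.25,
    "Gemini 3.1 Flash": 0.50,
    "Claude Haiku 4.5": 1.00,
    "GPT-5.4 mini": 0.75,
}


def percent_cheaper(candidate: float, baseline: float) -> float:
    """Return how much cheaper `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100


flash = INPUT_PRICE["Gemini 3.2 Flash"]
for model, price in INPUT_PRICE.items():
    if model != "Gemini 3.2 Flash":
        print(f"{model}: {percent_cheaper(flash, price):.1f}% cheaper on input")
```

The same function works for output prices; swap in the output column to see how the gap narrows there.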

Builder angle

At a monthly input volume of 50 billion tokens, the difference between $0.25/M and $1.00/M is $37,500 per month, and it grows linearly above that. Do not wait for Google I/O to start your evaluation; run it now so you can act immediately when pricing is confirmed.
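To plug in your own volume, the calculation is a one-liner. A small sketch (the function name `monthly_input_saving` is illustrative, and the $0.25/M figure remains the reported, unconfirmed price):

```python
def monthly_input_saving(tokens_per_month: int, old_price: float, new_price: float) -> float:
    """Monthly USD saving on input tokens; prices are in USD per million tokens."""
    return tokens_per_month / 1_000_000 * (old_price - new_price)


# 50 billion input tokens per month, moving from $1.00/M to the reported $0.25/M.
saving = monthly_input_saving(50_000_000_000, old_price=1.00, new_price=0.25)
print(f"${saving:,.0f} per month")  # prints "$37,500 per month"
```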

Where it is available today

As of 12 May 2026, Gemini 3.2 Flash has been confirmed available in:

iOS Gemini app — accessible via the model selector for Gemini Advanced subscribers
Google AI Studio — available in the model dropdown; API key access works for developers

Vertex AI availability has not been confirmed. Builders who use Vertex AI for production deployments on Google Cloud can test in AI Studio now; Vertex availability is expected to follow within days to weeks of the I/O announcement. Check the Vertex AI Model Garden for availability updates.

What to expect at Google I/O: 19–20 May

Google I/O 2026 runs on 19–20 May. Based on pre-event signals and the Gemini 3.2 Flash staged rollout pattern, the most likely announcements include:

Official Gemini 3.2 Flash pricing and GA — the staged rollout almost certainly precedes an I/O stage announcement
Gemini 4 — widely reported to feature a 10 million+ token context window and native multimodal output (image and video generation built into the base model, not via separate APIs)
Gemini 4 Flash and Ultra tiers — the standard three-tier release pattern (Ultra/Pro/Flash) is expected to continue
Vertex AI model availability — enterprise GA for both Gemini 3.2 Flash and Gemini 4 tiers expected at or shortly after I/O

The Gemini 3.1 Ultra launch earlier this year already demonstrated Google's willingness to push context windows well beyond what competitors offer. Gemini 4 at 10M+ tokens would be a qualitatively different capability — enabling whole-codebase analysis, legal corpus review, and multi-month correspondence threading in a single context.

Implications for Indian and UK builders on Google Cloud

Google Cloud is the dominant cloud provider for a large segment of Indian AI startups, particularly those that grew up on Firebase and have deep Google Workspace (formerly G Suite) integrations. Gemini 3.2 Flash's cost position is directly relevant to this cohort.

For Indian builders, the cost arithmetic is amplified by rupee-denominated revenue. A product monetised in INR that pays USD inference costs absorbs the full pricing delta in its margin. Moving from a $1.00/M model to a $0.25/M model at the same output quality cuts input inference cost by 75%, which is significant for any product in the 10–50M token/month range that is not yet at the scale where self-hosting makes sense.

UK startups on Google Cloud and Vertex AI face a different calculus: GDPR and UK GDPR data-residency requirements constrain which Vertex AI regions can be used. Vertex AI is offered in EU regions such as Belgium (europe-west1) and Frankfurt (europe-west3) and in London (europe-west2); confirm that Gemini 3.2 Flash will be available in those regions before committing to a migration plan.

Migration checklist

Before switching from Gemini 3.1 Flash to 3.2 Flash in production: (1) Run your existing eval suite on 3.2 Flash in AI Studio. (2) Check your Vertex AI region for availability. (3) Confirm the output pricing — the reported $2.00/M output is higher relative to input than 3.1 Flash-Lite; output-heavy workloads may see a smaller overall saving than input-heavy ones. (4) Wait for the official pricing page before updating your cost model.
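How much the input/output mix matters can be estimated before any migration. The sketch below compares Gemini 3.1 Flash with the reported 3.2 Flash prices for three made-up workload shapes; the token volumes and the helper names `workload_cost` and `saving_percent` are assumptions for illustration only.

```python
# USD per million tokens as (input, output).
# The 3.2 Flash figures are reported, not officially confirmed.
FLASH_3_1 = (0.50, 3.00)
FLASH_3_2 = (0.25, 2.00)


def workload_cost(prices, input_mtok, output_mtok):
    """Total USD cost for a workload measured in millions of tokens."""
    in_price, out_price = prices
    return in_price * input_mtok + out_price * output_mtok


def saving_percent(input_mtok, output_mtok):
    """Percent saved by moving the same workload from 3.1 Flash to 3.2 Flash."""
    old = workload_cost(FLASH_3_1, input_mtok, output_mtok)
    new = workload_cost(FLASH_3_2, input_mtok, output_mtok)
    return (old - new) / old * 100


# Illustrative workload shapes, in millions of tokens (input, output).
shapes = [("input-heavy", 1000, 10), ("balanced", 100, 100), ("output-heavy", 10, 1000)]
for label, inp, out in shapes:
    print(f"{label}: {saving_percent(inp, out):.1f}% saving")
```

Under these prices the saving is bounded between 50% (input only) and roughly 33% (output only), so an output-heavy pipeline should budget for the lower end.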

The broader context: Google's Flash-class model cadence

The Gemini Flash line has become Google's most commercially important model tier. Flash-class models account for the majority of Google AI Studio API calls and Vertex AI token volume, precisely because they sit at the intersection of cost, speed, and capability that production workloads require.

The no-announcement staged rollout for Gemini 3.2 Flash follows the same pattern as Gemini 2.5 Flash-Lite in 2025. Google appears to be treating Flash-tier releases as continuous improvements rather than discrete product launches, reserving the stage time at I/O for the headline Gemini 4 announcement.

This matters for builders who track the model landscape via official changelogs: if you are not also monitoring community reports in Google AI Studio forums, the Gemini Discord, and developer X accounts, you are likely discovering new Google model releases 1–2 weeks late. For a model with this pricing profile, two weeks of delay on adoption is real money left on the table.

For a broader view of where inference costs are heading across the industry, see our analysis of AI inference economics in 2026. For a comparison of how Gemini Flash sits relative to the OpenAI model family, see our GPT-5.5 launch coverage.
