At a glance: what changed this month
- Crusoe MI300X is now $1.71/GPU-hr on published pricing — the lowest rate from any major provider as of May 2026.
- CoreWeave's effective H100 rate is about $6.16/GPU-hr once you normalise its unbundled 8-GPU HGX nodes (GPU + vCPU + RAM + storage all priced separately).
- Lambda H100 SXM5 sits at $2.99–$3.79/hr depending on configuration and is bundled — closer to what most builders expect to see on an invoice.
- The 192GB HBM3 on MI300X is the unsung hero — Llama-3.3-70B fits on a single device with KV cache headroom, so tensor-parallel splits and their NCCL/RCCL overhead can disappear from your serving topology.
- SemiAnalysis ClusterMAX 2.0 still gives only CoreWeave the Platinum tier; Crusoe holds Gold. Price advantage does not automatically buy you operational maturity.
Before you pivot anything, normalise every quote to cost-per-GPU-hour-all-in. Unbundled providers like CoreWeave look brutally cheap on the GPU line, then add CPU, RAM and NVMe back in. Lambda's all-in pricing is honest about this from the start. Always re-derive the cost on your own spreadsheet — don't trust the headline.
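A minimal sketch of that normalisation, assuming a quote arrives as hourly line items for one node. The `NodeQuote` type, its field names and every dollar figure below are hypothetical placeholders, not any provider's real price sheet.

```python
from dataclasses import dataclass


@dataclass
class NodeQuote:
    """Hourly line items for one node, in USD. All figures are placeholders."""
    gpus: int
    gpu_per_hr: float            # price per GPU per hour
    vcpu_per_hr: float = 0.0     # total vCPU charge for the node per hour
    ram_per_hr: float = 0.0      # total RAM charge for the node per hour
    storage_per_hr: float = 0.0  # local NVMe / attached storage per hour


def all_in_per_gpu_hour(q: NodeQuote) -> float:
    """Sum every line item for the node and divide by its GPU count."""
    node_total = q.gpus * q.gpu_per_hr + q.vcpu_per_hr + q.ram_per_hr + q.storage_per_hr
    return node_total / q.gpus


# Hypothetical unbundled 8-GPU quote: the GPU line alone looks cheap.
unbundled = NodeQuote(gpus=8, gpu_per_hr=2.00, vcpu_per_hr=4.0, ram_per_hr=2.5, storage_per_hr=1.5)
# Hypothetical bundled quote: everything is already inside the GPU line.
bundled = NodeQuote(gpus=8, gpu_per_hr=3.79)

for label, quote in (("unbundled", unbundled), ("bundled", bundled)):
    all_in = all_in_per_gpu_hour(quote)
    hidden = 1 - quote.gpu_per_hr / all_in  # share of cost outside the GPU line
    print(f"{label}: headline ${quote.gpu_per_hr:.2f}, all-in ${all_in:.2f}/GPU-hr, hidden {hidden:.0%}")
```

Egress and reserved-capacity discounts are deliberately left out; add them as extra fields once you have real numbers.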
The price table — May 2026, normalised per GPU-hour
The single most important thing to do this week is build your own version of this table for your specific workload. Below is the published-list comparison; your real number changes once egress, reserved-capacity discounts and regional surcharges land.
| Provider | GPU | $/GPU-hr (list) | Bundled? | ClusterMAX tier |
|---|---|---|---|---|
| Crusoe | AMD MI300X (192GB) | $1.71 | Bundled | Gold |
| Lambda | NVIDIA H100 SXM5 | $2.99–$3.79 | Bundled | — |
| CoreWeave | NVIDIA H100 HGX (8-GPU) | ~$6.16 (normalised) | Unbundled | Platinum |
| CoreWeave | NVIDIA A100 80GB | ~$2.70 | Unbundled | Platinum |
| AWS (London) | NVIDIA H100 (p5) | ~$10–$12 (eff.) | Bundled | — |
| IndiaAI Mission pool | H100-class (subsidised) | ~₹150/hr (~$1.78) | Bundled | — |
Two things jump off the page. One: at list, Crusoe MI300X is within a few cents of IndiaAI's subsidised H100-class pool, but available to anyone with a credit card and no eligibility paperwork. Two: CoreWeave is roughly 40–50% under AWS for comparable instances but well above Lambda and Crusoe. That premium is what its Platinum operational rating is meant to justify; it isn't trying to win on sticker price.
Why MI300X changes the cost model
The headline price gets the clicks, but the architectural story is the real shift. AMD's MI300X carries 192GB of HBM3 per device. The H100 SXM5 carries 80GB. That difference cascades through your serving topology in three ways.
One — fewer devices per replica. Llama 3.3 70B in FP8 weighs roughly 70GB. On H100 you generally need a 2-way tensor-parallel split with NCCL communication on every forward pass. On MI300X it fits on a single device with around 120GB of headroom for KV cache, batched prefill and the activation graph. You delete an entire dimension of distributed-serving complexity.
Two — bigger batches, better utilisation. With 120GB of free HBM, you can run a much larger paged-attention KV cache. That lets vLLM or SGLang schedule longer queues without thrashing, and your tokens-per-second per GPU climbs accordingly. VentureBeat's reporting on the $401B AI infrastructure spend problem keeps surfacing the same uncomfortable number: typical enterprise GPU utilisation sits around 5%. The MI300X memory profile is genuinely useful for closing that gap because it lets a single GPU absorb more concurrent requests before saturating.
Three — simpler ops, fewer failure modes. Fewer GPUs per replica means fewer collective comms, fewer NCCL hangs, and a cheaper blast radius when a node dies. For builders running on platforms with thinner SRE bench depth, that operational simplification is worth almost as much as the per-hour price cut.
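The memory argument can be put into rough numbers. The sketch below is a back-of-envelope estimate, not a benchmark: it assumes a Llama-3-70B-style shape (80 layers, 8 KV heads, head dimension 128), an FP16 KV cache and a hypothetical request of 1,450 tokens. Check all three against your own checkpoint and traffic.

```python
GB = 10**9  # decimal gigabytes, matching the article's round figures

WEIGHTS_GB = 70  # Llama 3.3 70B in FP8, roughly one byte per parameter

# Assumed model shape (verify against your checkpoint's config):
# 80 layers, 8 KV heads (grouped-query attention), head dimension 128.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
KV_BYTES_PER_ELEMENT = 2  # assumption: FP16 KV cache


def kv_bytes_per_token() -> int:
    """KV cache bytes per token: one K and one V vector per layer per KV head."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES_PER_ELEMENT


def kv_token_budget(hbm_gb: int) -> int:
    """Upper bound on cached tokens in the HBM left over after the weights.

    Ignores activations, runtime overhead and fragmentation, so real
    capacity will be lower.
    """
    headroom_bytes = (hbm_gb - WEIGHTS_GB) * GB
    return headroom_bytes // kv_bytes_per_token()


TOKENS_PER_REQUEST = 1_200 + 250  # hypothetical request: input plus output tokens

for name, hbm_gb in (("H100 SXM5", 80), ("MI300X", 192)):
    tokens = kv_token_budget(hbm_gb)
    print(f"{name}: {hbm_gb - WEIGHTS_GB} GB headroom, "
          f"~{tokens:,} KV tokens, ~{tokens // TOKENS_PER_REQUEST} concurrent requests")
```

Under these assumptions a single H100 has room for only a couple of dozen in-flight requests after loading the weights, which is why the tensor-parallel split becomes the default there.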
ROCm is the price you pay for the memory. vLLM and SGLang both support AMD now (TensorRT-LLM remains NVIDIA-only), but kernel maturity still trails CUDA. Expect to spend an engineer-week shaking out attention-kernel selection and quantisation paths. If your team has zero ROCm experience, budget for it before you commit to a price-pivot.
The honest comparison: H100 vs MI300X for inference
Pricing aside, throughput parity is real for most inference shapes — but the variance is wider on AMD than on NVIDIA. Here is the practical framing builders should carry into their RFQs.
- Decode-bound, memory-bound workloads (long contexts, batched chat, 70B-class models) — MI300X is competitive or better, mostly thanks to memory bandwidth and capacity.
- Prefill-heavy workloads with short outputs (RAG, embedding, classification) — H100 is still the safer pick; FP8 kernels are mature, and CoreWeave's Platinum operations matter when prefill spikes cause queue blow-ups.
- Sub-7B models in production — neither device is interesting. Use an A100 80GB at $2.70/hr on CoreWeave, or a B200 spot. MI300X memory is wasted here.
- Speculative decoding stacks (draft + target) — MI300X simplifies hosting both models on one device. Real gain for teams running their own EAGLE / Medusa setups.
For deeper context on which serving engine to pick once you've chosen the hardware, our vLLM vs SGLang vs TensorRT-LLM piece walks through the maturity matrix end-to-end.
Worked example: Llama 3.3 70B at 200 RPS
This is the maths most builders should be running this week. Assume a chatbot or agent-tier workload — 200 requests per second, mean output 250 tokens, mean input 1,200 tokens, FP8 quantised weights.
Throughput target. 200 RPS × 250 output tokens = 50,000 output tokens/second. On a single MI300X with vLLM + paged attention + continuous batching, FP8 Llama-3.3-70B realistically holds ~2,000 output tokens/sec. That is 25 GPUs at full load; add 20% headroom for safety and you get 30 GPUs to serve the load.
Crusoe MI300X cost. 30 GPUs × $1.71/hr × 24 × 30 = $36,936/month. About £29,400 or ₹30.7 lakh.
CoreWeave H100 cost (normalised). Llama-3.3-70B in FP8 requires 2× H100 per replica (tensor-parallel-2). Per-GPU throughput drops to ~1,400 output tok/sec after NCCL overhead, or ~2,800 per two-GPU replica. 50,000 ÷ 1,400 ≈ 35.7, so you need 36 H100s: 18 replicas, each pair-bonded (with no extra headroom, unlike the MI300X count). 36 × $6.16/hr × 24 × 30 = $159,667/month. About £127,000 or ₹1.33 crore.
Lambda H100 SXM5 cost. Same 36 H100s at $3.79/hr (the higher-config rate, because you'll want NVMe + high-CPU): 36 × $3.79/hr × 24 × 30 = $98,237/month. About £78,000 or ₹81.7 lakh.
The delta. Crusoe MI300X is roughly 4.3× cheaper than CoreWeave and 2.7× cheaper than Lambda at this workload shape. Even after you reserve a one-time engineering investment of, say, $15,000 to port your stack to ROCm and prove out the kernels, you break even inside the first month and bank $120,000+ every month thereafter against the CoreWeave alternative.
That is not a small saving. That is the difference between an AI feature being a margin-killer and being profitable. For a longer treatment of how to make those margins work, see our piece on AI inference costs in 2026 and building profitable products and the broader strategic context in Cerebras IPO at $56B: why OpenAI bet 750MW on wafer-scale.
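The worked example above can be re-run with your own inputs. This is a sketch using the article's figures; it reads the ~1,400 tok/sec H100 number as per GPU, which is the reading that reproduces the 36-GPU count, and the function names are illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # the article's 30-day month


def min_gpus(target_tok_s: int, tok_s_per_gpu: int) -> int:
    """Ceiling division: fewest GPUs that cover the target at full load."""
    return -(-target_tok_s // tok_s_per_gpu)


def monthly_cost(gpus: int, usd_per_gpu_hr: float) -> float:
    return gpus * usd_per_gpu_hr * HOURS_PER_MONTH


target = 200 * 250  # 200 RPS x 250 output tokens = 50,000 tok/s

# name -> (provisioned GPUs, $/GPU-hr). GPU counts are the article's own.
fleets = {
    "Crusoe MI300X": (30, 1.71),   # min_gpus(target, 2000) is 25, plus headroom
    "CoreWeave H100": (36, 6.16),  # min_gpus(target, 1400) is 36
    "Lambda H100": (36, 3.79),
}

costs = {name: monthly_cost(n, rate) for name, (n, rate) in fleets.items()}
crusoe = costs["Crusoe MI300X"]
for name, usd in costs.items():
    print(f"{name:15s} ${usd:>10,.0f}/month  {usd / crusoe:.1f}x Crusoe")

port_cost = 15_000  # one-time ROCm port, the article's illustrative figure
saving = costs["CoreWeave H100"] - crusoe
print(f"saving vs CoreWeave: ${saving:,.0f}/month, "
      f"port paid back in {port_cost / saving * 30:.1f} days")
```

Swap in your measured tokens-per-second before trusting any ratio; the throughput figures drive the result far more than the hourly rates do.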
Don't blindly migrate. Run a one-week shadow deployment: mirror 10% of production traffic to Crusoe MI300X, log p50/p95/p99 latency and tokens-per-second, and only cut over once your evals match. The hyperscalers and inference-cloud players we cover in our piece on DeepInfra's $107M Series B all use this pattern internally, so there is no reason for a 12-person startup not to.
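One way to wire up the measurement side of a shadow deployment is sketched below. It assumes each request carries a string id, uses a nearest-rank percentile, and applies a hypothetical 10% latency tolerance as the cut-over gate; none of these choices come from a specific provider or serving engine.

```python
import zlib


def mirror_to_shadow(request_id: str, percent: int = 10) -> bool:
    """Deterministically select roughly `percent`% of requests by hashing the id."""
    return zlib.crc32(request_id.encode("utf-8")) % 100 < percent


def percentile(samples, p: int):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


def summarise(latencies_ms) -> dict:
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


def ready_to_cut_over(baseline: dict, shadow: dict, tolerance: float = 0.10) -> bool:
    """True when every shadow percentile is within `tolerance` of the baseline."""
    return all(shadow[k] <= baseline[k] * (1 + tolerance) for k in baseline)


# Synthetic latencies purely for illustration.
baseline = summarise(range(1, 101))
shadow = summarise(range(3, 103))
print(baseline, shadow, ready_to_cut_over(baseline, shadow))
```

Hashing the request id rather than sampling randomly keeps the same requests in the mirrored set across retries, which makes side-by-side eval comparisons easier.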
The India angle: how this stacks against IndiaAI and Neysa
The IndiaAI Mission's subsidised pool — roughly 34,000 GPUs at around ₹150/hour for eligible Indian startups — was meant to be the floor for compute prices in India. Crusoe's MI300X at $1.71/hr lands at approximately ₹142/hr on current FX, slightly under that floor and with 192GB HBM3 versus 80GB. The IndiaAI route still wins on data-sovereignty and grant economics; Crusoe wins on raw price and zero paperwork.
For Indian builders, the practical RFQ now needs five quotes, not three:
- IndiaAI Mission pool (if you qualify) — subsidised, on-shore, paperwork-heavy.
- E2E Networks, Yotta or Tata Cloud — on-shore, predictable, mature support, premium price.
- Neysa or another Series-B-funded Indian GPU cloud — competitive pricing, on-shore, growing operational maturity.
- Crusoe MI300X — lowest absolute price, off-shore, ROCm software burden.
- Lambda or CoreWeave — for any workload that is genuinely H100/CUDA-bound.
The UK angle: Isambard-AI, sovereign infra and the AWS London anchor
For UK builders, the picture is different. Isambard-AI in Bristol is the sovereign anchor, and the UK Sovereign AI Fund continues to commit to domestic capacity. Neither prices like a commercial cloud — both are oriented around research grants and strategic public-interest workloads. That leaves AWS London as the default for anything that needs UK-region serving with familiar enterprise SLAs, typically at $10–$12 effective per H100-hour.
Crusoe's MI300X price is roughly 6–7× cheaper than AWS London H100, but the workload has to tolerate cross-Atlantic latency for the data plane. For batch, async, and most B2B inference where p95 budgets are above ~300ms end-to-end, the maths still works. For real-time consumer apps on UK soil, AWS London or an Isambard-AI partnership is still the play.
One more option worth pricing into the RFQ: TPU. Both Anthropic and Meta have moved meaningful inference traffic to Google TPU; our TPU migration analysis shows the savings can run to 65% against H100, with the trade-off being JAX/XLA tooling instead of CUDA or ROCm.
What IN + UK builders should do this month
This is the vendor RFQ checklist. Walk through it before you commit.
- Re-normalise every quote to cost-per-GPU-hour-all-in. Insist providers itemise GPU, vCPU, RAM, local NVMe, network egress and storage egress separately, then add them back together. Unbundled quotes hide 30–50% of the real cost on the line items you didn't read.
- Ask for three commit horizons — on-demand, 6-month, 12-month — and the discount curve. The headline list price is rarely what you'll pay; the discount curve is.
- Demand a 7-day evaluation window on real workload. Any provider serious about your business will hand you free or near-free GPU-hours for a shadow deployment. If they won't, that tells you everything.
- Quote ROCm engineering time honestly. A week of one senior engineer is roughly $5,000–$15,000 fully loaded. Amortise that across your projected 12-month MI300X bill before you pitch your CFO.
- Confirm region. UK GDPR-bound workloads cannot just leak to a US data centre. Crusoe has multiple regions; verify yours.
- Verify the operational tier. ClusterMAX is one signal; ask for their incident history and MTTR on the specific GPU class you want.
- Confirm spot/preemption policy. AMD spot availability is thinner than NVIDIA — if you're architecting around interruption tolerance, get the actual reclamation SLA in writing.
- Build a cost dashboard before you cut over. A Grafana panel that compares cost-per-1000-tokens across providers in real time is a 2-day build and saves arguments for years.
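Two items on the checklist reduce to small formulas: the cost-per-1000-tokens metric for the dashboard and the amortised ROCm engineering cost. The sketch below uses the article's list rates and throughput estimates; the function names and the `utilisation` parameter are illustrative assumptions.

```python
def usd_per_1k_tokens(usd_per_gpu_hr: float, tok_s_per_gpu: float, utilisation: float = 1.0) -> float:
    """Cost of 1,000 output tokens on one GPU running at the given utilisation."""
    tokens_per_hour = tok_s_per_gpu * 3600 * utilisation
    return usd_per_gpu_hr / tokens_per_hour * 1000


def amortised_rate(usd_per_gpu_hr: float, one_time_usd: float, gpus: int, months: int = 12) -> float:
    """List rate plus a one-time cost spread over every GPU-hour in the window."""
    gpu_hours = gpus * months * 30 * 24
    return usd_per_gpu_hr + one_time_usd / gpu_hours


# name -> ($/GPU-hr, output tok/s per GPU), taken from the worked example.
providers = {
    "Crusoe MI300X": (1.71, 2000),
    "Lambda H100": (3.79, 1400),
    "CoreWeave H100": (6.16, 1400),
}

for name, (rate, tok_s) in providers.items():
    print(f"{name:15s} ${usd_per_1k_tokens(rate, tok_s):.6f} per 1k output tokens")

# $15,000 ROCm port spread over 30 MI300X GPUs for 12 months.
print(f"Crusoe rate with port amortised: ${amortised_rate(1.71, 15_000, 30):.3f}/GPU-hr")
```

In a live dashboard the throughput input would come from measured tokens-per-second rather than an estimate, and `utilisation` shows how quickly idle capacity inflates the per-token figure.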
The strategic read
For two years the GPU market has been a one-vendor story, priced like one. Crusoe putting MI300X at $1.71/hr is the most visible sign yet that the monopoly is giving way to a real duopoly: AMD has the silicon, the open-source serving stacks are catching up, and at least one credible operator is willing to pass the savings to builders rather than pocket them. That changes the unit economics of every AI product that survives the cycle.
The defensive posture is also worth naming. CoreWeave's Platinum tier and AWS London's bundled SLAs aren't going anywhere — they exist for real reasons and will keep mattering for regulated industries and consumer-real-time workloads. But the centre of gravity for the next twelve months of inference-cost optimisation will be on memory-rich, AMD-friendly, off-hyperscaler capacity. Indian and UK builders who price that in now will ship products at margins their competitors cannot match.
For deeper context on where this market is going, see our coverage of Cerebras's IPO and the 750MW OpenAI bet and DeepInfra's $107M Series B. Source pricing data from ThunderCompute's cheapest cloud GPU providers roundup and their CoreWeave pricing review; cross-checked against gpu.fm's 2026 comparison, SemiAnalysis's ClusterMAX 2.0 tiering, and VentureBeat's 5% GPU utilisation report.