
Nvidia B100, B200, GB200 – COGS, Pricing, Margins, Ramp – Oberon, Umbriel, Miranda

The B Stands For Jensen’s Benevolence

Nvidia announced their new generation of Blackwell GPUs at GTC. We eagerly await the release of the full architecture white paper detailing the much-needed improvements to the tensor memory accelerator and the exact implementation of the new MX number formats, discussed here.
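While we wait, here is a minimal sketch of how OCP-style microscaling (MX) block quantization works in principle: every block of 32 elements shares a single power-of-two (E8M0) scale, while each element is stored in a narrow format such as FP8 E4M3. The block size and element range follow the OCP MX spec; the rounding shortcut and everything else below are our own illustration, not Nvidia's confirmed Blackwell implementation.

```python
import numpy as np

BLOCK = 32            # OCP MX spec: 32 elements share one scale
ELEM_MAX = 448.0      # largest normal FP8 E4M3 value

def mx_quantize(x: np.ndarray):
    """Quantize a 1-D array in blocks of 32 with a shared power-of-two scale."""
    blocks = x.reshape(-1, BLOCK)
    absmax = np.abs(blocks).max(axis=1, keepdims=True)
    # Shared E8M0-style scale: smallest 2^k such that absmax / 2^k <= ELEM_MAX.
    k = np.ceil(np.log2(np.maximum(absmax, 1e-38) / ELEM_MAX))
    scale = np.exp2(k)
    # Element storage: a real MXFP8 kernel would round to the E4M3 grid here;
    # we only clip, to keep the sketch short.
    elems = np.clip(blocks / scale, -ELEM_MAX, ELEM_MAX)
    return elems, scale

def mx_dequantize(elems: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (elems * scale).reshape(-1)

x = np.random.randn(4 * BLOCK).astype(np.float32)
elems, scale = mx_quantize(x)
print("max abs error:", np.abs(mx_dequantize(elems, scale) - x).max())
```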

We discussed many of the high-level features of the architecture, such as process node, package design, HBM capacity, and SerDes speeds, here, but let's dive a bit deeper into the systems, ramp, pricing, margins, and Jensen's Benevolence.

Nvidia is on top of the world. They have supreme pricing power right now, despite hyperscaler silicon ramping. Everyone simply has to take what Nvidia is feeding them with a silver spoon. The number one example of this is the H100, which carries a gross margin exceeding 85%. The advantage in performance and TCO continues to hold true because the B100 curb-stomps the MI300X, Gaudi 3, and internal hyperscaler chips (besides the Google TPU).

For subscribers, we detail the ramp, pricing, and margins of the new B100, B200, and GB200. The pricing will come as a surprise to many, and as such we like to say the B stands for Benevolence, not Blackwell, because the graciousness of our lord and savior Jensen Huang is smiling upon the world, particularly the GPU-poor.

B100 / B200 Configuration

As previously discussed, Blackwell has 2 reticle-sized GPU dies. The GPU compute dies remain on 4nm like Hopper, making this the first time Nvidia has not opted for a node transition for their datacenter GPUs. This is quite noteworthy, as Nvidia has shipped ~800mm² dies for the V100, A100, and H100. Now, instead of shrinking process nodes for a larger transistor budget, they have to double the amount of silicon. This is due to issues with TSMC's original 3nm process, N3B.

In addition, there are up to 8 stacks of 8-Hi HBM3E with up to 192GB of capacity. Both SK Hynix and Micron are suppliers, with the vast majority coming from SK Hynix. This is a change from SK Hynix being the sole supplier for the H100 ramp. Samsung continues to be the laggard despite their announcements of developing “the world’s fastest” HBM3E. Samsung loves press releases, but they are currently facing major challenges in qualification.

The trend in GPU roadmaps is more silicon (for both logic and memory) in a bigger package, and silicon interposers are hitting their limit in terms of size. The increased size makes the silicon much harder to handle, which kills yields. The B100 package is much larger, and as such it will be the first major high-volume product utilizing CoWoS-L, which swaps the monolithic silicon interposer for an organic RDL interposer with embedded passive silicon bridges.

The first version of Blackwell, the B100 (codename Umbriel), is built for time to market: it keeps PCIe Gen 5, 400G networking, etc. In fact, the air-cooled 700W B100 can even slide into existing servers that accept the H100 and H200 baseboards with nearly no modifications. Despite this, NVLink speeds inside the box are doubled.

The B200 will follow shortly after, with a higher power limit of 1,000W. This version requires a redesign of servers. Based on checks from our newest analyst in Taiwan, Chaolien, the 1,000W version can still be air-cooled, which will come as a surprise to many. Both versions keep PCIe Gen 5 and 3.2T of networking per server.

For the standard GPU-only product, Miranda comes after Umbriel. Miranda enables PCIe Gen 6 and up to 800G networking, with 6.4T per server. The roadmap shows up to 192GB of HBM. However, Nvidia has already bought up all the supply of 36GB HBM stacks that SK Hynix and Micron are ramping early next year, which means there can be a refresh that actually goes up to 288GB per GPU (8 stacks × 36GB).
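As a quick sanity check of that capacity arithmetic, the sketch below assumes the standard 24Gb (3GB) HBM3E DRAM die, so an 8-Hi stack is 24GB and a 12-Hi stack is 36GB; eight stacks per GPU then give 192GB and 288GB respectively.

```python
# HBM capacity arithmetic, assuming the standard 24Gb (3GB) HBM3E DRAM die.
GB_PER_DRAM_DIE = 3
STACKS_PER_GPU = 8

for stack_height in (8, 12):      # 8-Hi today, 12-Hi for a potential refresh
    per_stack = stack_height * GB_PER_DRAM_DIE
    per_gpu = STACKS_PER_GPU * per_stack
    print(f"{stack_height}-Hi: {per_stack}GB/stack -> {per_gpu}GB/GPU")
# 8-Hi:  24GB/stack -> 192GB/GPU
# 12-Hi: 36GB/stack -> 288GB/GPU
```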

The product everyone in the supply chain is buzzing for is the Oberon GB200 platform. We will discuss that pricing, COGS, margins, ramp, and Jensen’s Benevolence for subscribers.

The GB200, codenamed Oberon and Bianca, solves many of the issues of the GH200.

First off, it halves the number of CPUs required, which greatly improves TCO. Currently, the GH200 is simply far too expensive, and most large-scale AI workloads do not justify the big Nvidia tax for Grace. After all, for most models, Grace in the GH200 simply acts as the most expensive memory controller in the world. Basically, every major firm involved with AI has run the TCO calculation and chosen to purchase more GPUs rather than buy the CPUs from Nvidia.

This potentially changes with the GB200. Nvidia is halving the ratio of CPUs to GPUs, which is critical for TCO. On the flip side, Nvidia is trying to sell integrated racks with liquid cooling; they do not want to sell the GPU compute trays and NVSwitch trays separately. This is something the hyperscalers are probably not happy with, but it is necessary given the level of integration required across the switch backplane, power busbar, and cooling. The CPUs, GPUs, NVSwitches, and ConnectX-7 NICs are all liquid-cooled.
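To see why the ratio matters for TCO, consider the toy calculation below. The dollar figures are hypothetical placeholders, not our BOM numbers; the point is simply that moving from 1 CPU per GPU (GH200) to 1 CPU per 2 GPUs (GB200) halves the Grace overhead per GPU.

```python
# Toy sketch: CPU spend per GPU at different CPU:GPU ratios.
# Dollar figures are hypothetical placeholders, not SemiAnalysis BOM numbers.
GRACE_COST = 5_000     # assumed cost attributed to one Grace CPU
GPU_COST = 30_000      # assumed per-GPU price

def cpu_overhead(cpus_per_gpu: float) -> float:
    """Fraction of per-GPU spend consumed by attached Grace CPUs."""
    return cpus_per_gpu * GRACE_COST / GPU_COST

print(f"GH200 (1 CPU : 1 GPU): {cpu_overhead(1.0):.1%}")   # 16.7%
print(f"GB200 (1 CPU : 2 GPU): {cpu_overhead(0.5):.1%}")   # 8.3%
```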

There are two versions of the rack here. One is 120kW, which is an incredible amount of power for 72 GPUs. It has 10 compute trays on top, 9 NVSwitch trays in the middle, and 8 compute trays on the bottom. Each compute tray contains 4 GPUs, 4 NICs, 2 CPUs, 768GB of HBM, and 1,024GB of LPDDR5X. This version actually comes later, though.

Nvidia has also opted for a lower-power rack with 36 GPUs and 9 compute trays, which can have NVLink scale-up connectivity across two racks. This is likely much easier for most firms to deploy. Nvidia has told the primary ODM for these racks to prepare capacity for 50,000+ racks next year. That is a gargantuan volume of the CPU+GPU hybrid versus the minuscule <1% of GPUs that shipped as GH200.
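Tallying up the tray counts given above (a sketch using only numbers from the text; the per-GPU power figure is rack-level, so it includes CPUs, NVSwitches, and NICs):

```python
from dataclasses import dataclass

@dataclass
class OberonRack:
    name: str
    compute_trays: int
    rack_power_kw: float = 0.0   # 0 = not stated in the text
    gpus_per_tray: int = 4
    cpus_per_tray: int = 2
    hbm_gb_per_tray: int = 768

    def summary(self) -> str:
        gpus = self.compute_trays * self.gpus_per_tray
        cpus = self.compute_trays * self.cpus_per_tray
        hbm_tb = self.compute_trays * self.hbm_gb_per_tray / 1024
        s = f"{self.name}: {gpus} GPUs, {cpus} CPUs, {hbm_tb:.1f}TB HBM"
        if self.rack_power_kw:
            s += f", {self.rack_power_kw / gpus:.2f}kW/GPU all-in"
        return s

# 10 + 8 = 18 compute trays in the 120kW version; 9 trays in the 36-GPU version.
print(OberonRack("GB200 72-GPU rack", compute_trays=18, rack_power_kw=120).summary())
print(OberonRack("GB200 36-GPU rack", compute_trays=9).summary())
```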

COGS & Pricing

Despite the huge improvements in TCO the B100 offers to end users, the cost to manufacture more than doubles. Contact us for our full BOM model, which breaks out COGS for every GPU SKU Nvidia will offer. This includes wafer pricing, a Bose-Einstein yield model, recoverable/repairable area, die cost, CoWoS-L packaging, ABF substrate, assembly, test, HBM, SLT, and much more.
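For a flavor of what such a yield model looks like, here is a minimal Bose-Einstein die-cost sketch. Every input below is an illustrative assumption, not a value from our model, and we ignore the recoverable/repairable area that would raise effective yields:

```python
import math

WAFER_DIAMETER_MM = 300
WAFER_COST_USD = 16_000          # assumed 4nm-class wafer price
DIE_AREA_MM2 = 800               # one reticle-sized Blackwell compute die
D0_PER_LAYER_CM2 = 0.005         # assumed defect density per critical layer
CRITICAL_LAYERS = 12             # Bose-Einstein complexity exponent

def gross_dies_per_wafer(area_mm2: float, wafer_mm: float = WAFER_DIAMETER_MM) -> int:
    """Common dies-per-wafer approximation with edge loss."""
    r = wafer_mm / 2
    return int(math.pi * r * r / area_mm2 - math.pi * wafer_mm / math.sqrt(2 * area_mm2))

def bose_einstein_yield(area_mm2: float, d0: float, n: int) -> float:
    """Y = (1 + A * D0)^-n, with die area A in cm^2."""
    return (1 + (area_mm2 / 100) * d0) ** -n

gross = gross_dies_per_wafer(DIE_AREA_MM2)
y = bose_einstein_yield(DIE_AREA_MM2, D0_PER_LAYER_CM2, CRITICAL_LAYERS)
per_good_die = WAFER_COST_USD / (gross * y)
# Blackwell needs two such dies per GPU before packaging, HBM, etc.
print(f"{gross} gross dies, {y:.0%} yield -> ${per_good_die:,.0f}/good die, "
      f"${2 * per_good_die:,.0f} of logic silicon per GPU")
```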

Our understanding is that the ASP for the B100 baseboard will come in at only ~$240,000 per 8-GPU baseboard for high-volume, hyperscaler-level pricing. This will surprise many, as it is far below the ~50% generation-on-generation ASP increase that was widely expected.

With the B100 still offering a tremendous increase in performance and TCO, this raises the question: why? The simple answer is that competition is emerging. The AMD MI300 has emerged as a credible competitor for inference use cases, although AMD had to pull out all the stops, taking big technology risks and pricing aggressively. AMD secured major orders from Nvidia's two largest customers, Meta and Microsoft.

On the custom silicon front, all of Nvidia's major customers are designing their own chips. Only Google has been successful to date, but Amazon continues to ramp Inferentia and Trainium even though the current generation is not great, Meta is betting big on MTIA long term, and Microsoft is starting their silicon journey as well. As the hyperscalers dramatically increase their capital expenditure to defend their moats in the Gen AI world, more and more of those dollars end up as Nvidia's gross profit. There is an extreme sense of urgency to find alternatives.

Our accelerator model shares units, volumes, and ASP for all 8 SKUs of the Blackwell family. The initial B100 is of course the lowest of the bunch.

Margin Impact – Nvidia’s Benevolence

With production costs doubling but ASPs only increasing a fraction of that, it is clear that Nvidia's margins on the B100 won't be as good as on the H100.
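Back-of-envelope, using the figures discussed in this piece (an assumed H100 hyperscaler ASP, the >85% H100 margin, the ~$240k baseboard across 8 GPUs, and COGS roughly doubling):

```python
# Back-of-envelope margin math. The H100 ASP is an assumption for
# illustration; the other inputs come from the figures cited in this piece.
H100_ASP = 21_000                          # assumed per-GPU hyperscaler pricing
H100_MARGIN = 0.85                         # ">85% gross margin"
h100_cogs = H100_ASP * (1 - H100_MARGIN)   # ~$3,150 implied COGS

B100_ASP = 240_000 / 8                     # ~$30,000 per GPU from the baseboard ASP
b100_cogs = 2 * h100_cogs                  # "cost to manufacture more than doubles"
b100_margin = 1 - b100_cogs / B100_ASP

print(f"H100: ~{H100_MARGIN:.0%} margin at ${H100_ASP:,} ASP")
print(f"B100: ~{b100_margin:.0%} margin at ${B100_ASP:,.0f} ASP")   # ~79%
```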

Therefore, we believe Nvidia's margins have peaked. We expect the B100 and future families to have slightly lower margins; furthermore, over the next few quarters, H100 margins will also come down due to the H200 and H20.

The H200 will carry the same ASP as the H100 but with significantly more HBM, adding to the BOM. The H20 is even worse from a financial perspective: Nvidia is offering it at significantly lower prices because it ships with far fewer FLOPS, even though it uses the same GPU silicon as the H100. Furthermore, its HBM capacity is increased from 80GB to 96GB, so the overall BOM cost actually increases.

Nvidia decided it would be difficult to justify asking Chinese customers to pay more for significantly fewer FLOPS, especially as the H20 also has to compete with massive reimportation schemes of H100/H200 amounting to hundreds of thousands of GPUs. Nvidia would prefer China buy the China-specific GPUs, as they have no way of tracking or preventing the reimportation schemes. If they can sell more H20s and reduce reimportation volumes, they will probably draw less ire from the US Government.

While these margin decreases sound dramatic, those who care about the finances can breathe a sigh of relief, as the financial impact is far from disastrous: gross margins are only going down a few points from here, because product markups and gross margins were already so high in the first place.

Source: SemiAnalysis, Nvidia company filings

Our take is simple: Nvidia cares more about gross profit and market share than gross margin. Obsessively worrying about a few percentage points is what bean counters do, not visionaries like Jensen Huang, who want to rule the world as benevolent compute dictators.

Some of this margin deterioration is blunted, though, by the revenue mix continuing to shift toward datacenter. Furthermore, Nvidia is diversifying their supply chain, from 800G transceivers to power delivery components, which also helps blunt the margin decrease.

Despite this, management guided to lower margins for the fiscal year compared to the previous two quarters.

GAAP and non-GAAP gross margins are expected to be 76.3% and 77%, respectively, plus or minus 50 basis points. Similar to Q4, Q1 gross margins are benefiting from favourable component costs. Beyond Q1, for the remainder of the year, we expect gross margins to return to the mid-70s percent range…

We highlighted in our opening remarks, really about our Q4 results and our outlook for Q1. Both of those quarters are unique. Those 2 quarters are unique in their gross margin as they include some benefit from favourable component cost in the supply chain kind of across both our compute and networking, and also in several different stages of our manufacturing process. So looking forward, we have visibility into a mid-70s gross margin for the rest of the fiscal year, taking us back to where we were before this Q4 and Q1 peak that we’ve had here. So we’re really looking at just a balance of our mix. Mix is always going to be our largest driver of what we will be shipping for the rest of the year and those are really just the drivers.

Colette Kress (Q4 FY2024 earnings call)

On the other hand, there could be a larger qualitative impact on the narrative around Nvidia. Nvidia is perceived as the dominant player in accelerated AI compute, and rightly so. One of the bear cases for Nvidia is competition: that other players will chip away at their software moat, and that Nvidia will then no longer be able to enjoy its current levels of profitability, which have never been seen before from physical hardware.

Nvidia starting to take a margin hit is the right defensive move, as it takes the oxygen away from some initially promising attempts by AMD and hyperscaler in-house silicon to break into the market. However, some may find it worrying that Nvidia feels the need to play defense at all, a sign that the moat is no longer impenetrable.