AMD's Ryzen 9000 won't beat the previous-gen X3D models in gaming, but they'll be close — improved 3D V-Cache coming, too

Donny Woligroski
(Image credit: LinkedIn)

We had a chance to catch up with AMD's Senior Technical Marketing Manager of Consumer Processors, Donny Woligroski, during Computex 2024 to discuss the latest news surrounding the company's Zen 5 Ryzen 9000 announcements at the show. Woligroski told us that while the Ryzen 9000 chips won't beat the company's existing specialized Ryzen 7000X3D chips in gaming — they're currently among the best CPUs for gaming — the difference between the chips will be closer than we've seen in the past. AMD is also working on an improved version of its 3D V-Cache tech, and Woligroski explained why the new Ryzen 9000 chips mark a big step forward even though they have the same core counts and largely the same boost frequencies as their predecessors.

In its press release, AMD billed the flagship Ryzen 9 9950X as delivering the "fastest consumer desktop performance in the world," but notably didn't claim it's the fastest gaming chip on the market, despite its own benchmarks showing the 9950X beating Intel's flagship Core i9-14900K by ~11% on average in gaming. However, the company didn't compare it with its own Ryzen 7 7800X3D, which leads our CPU benchmark hierarchy in gaming performance.

I asked Woligroski if the 9950X would take the crown of the fastest gaming chip on the market. "Is it the fastest in gaming? It's faster than the competition in our tests. X3D is still the king of the hill, but by a much smaller margin than typically between X3D and non-X3D," Woligroski responded. "So a 7800X3D would, yes, be faster than 9700X, but maybe not by as much as you would expect."

This isn't the first time we've seen AMD's specialized, ultra-powerful X3D gaming chips retain their lead over a newer generation of AMD's chips: In our testing, despite having the newer Zen 4 architecture, the fastest Ryzen 7000 chip, the Ryzen 9 7950X, trailed the previous-gen Zen 3 Ryzen 7 5800X3D by about 8%. AMD didn't beat its own high-water mark until the new Ryzen 7 7800X3D launched a year later.

Woligroski points to a slimmer margin between the X3D and non-X3D chips this time around, an improvement likely born of Zen 5's impressive 16% IPC gain, faster L1 and L2 caches, and better boost frequencies. Ryzen 9000 should also be much faster than the X3D chips in productivity workloads.

The Ryzen 7 7800X3D's second-gen 3D V-Cache took gaming performance to a whole new level — it's ~30% faster in gaming than the fastest standard Ryzen 7000 processor. We'll have to wait for our own testing to see how that pans out with newer models, but AMD clearly has plans for its new 3D V-Cache engine that powers the X3D chips.

AMD revamps its game-boosting 3D V-Cache tech

AMD 3D V-Cache illustration for Ryzen 7000-series. (Image credit: AMD)

Even though the standard Ryzen models will be closer than ever to the previous-gen X3D chips, Woligroski teased that the next-gen 3D V-Cache implementation on AMD's X3D chips will also see marked improvements.

"And then when it comes to X3D, and I'll just get around that now, we're super committed to X3D. In fact, we have some really, really cool updates to X3D coming. So we're working on iterating and not just rehashing it," said Woligroski.

We don't know the details yet, but there are multiple steps that AMD could take to improve 3D V-Cache. For instance, for two generations, AMD's L3 cache chiplet has used a density-optimized version of the 7nm node. Moving to a newer process node, like 6nm or maybe even 5nm, could enable AMD to cram in even more L3 cache capacity.

The L3 chiplet also rides on top of the CPU die, which presents thermal challenges that lead to reduced performance in some standard productivity workloads. A newer, thinner die design could allow the company to reduce the thermal overhead of the L3 chiplet, thus allowing an X3D chip to perform more like a regular non-X3D model in standard work.

If AMD addresses the thermal constraints, it could also put L3 dies on both CPU dies. The current 12-core 7900X3D and 16-core 7950X3D only have cache stacked on one of the two CCD chiplets. That allows the other chiplet to reach higher boost clocks, but doubling the added cache while keeping higher clocks would potentially be even better. Naturally, cost would be the deciding factor, as the X3D tech does come at a premium.

AMD's second-gen 3D V-Cache improved on the first generation by increasing L3 chiplet data throughput from 2 TB/s to 2.5 TB/s, but it still used the same hybrid bonding approach with the same 9-micron TSV pitch as the previous-gen. The pitch is exceedingly important as it measures the density of the TSVs that connect the L3 chiplet to the CPU die, and moving to a smaller pitch (TSMC currently offers 6-micron SoIC-X pitches) could allow AMD to cram more connections into the same area, thus improving bandwidth and performance by a much larger factor.
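To put rough numbers on the pitch argument, here is a minimal back-of-the-envelope sketch. It assumes the TSVs sit on a uniform square grid, so connections per unit area scale with the inverse square of the pitch; that grid model and the function names are illustrative assumptions, not AMD or TSMC figures, and real bandwidth also depends on signaling rate, power delivery, and how many TSVs actually carry data.

```python
def tsv_density_gain(old_pitch_um: float, new_pitch_um: float) -> float:
    """Relative connections per unit area, assuming a uniform square TSV grid.

    On a square grid, one connection occupies pitch x pitch of area, so
    density scales with (old_pitch / new_pitch) squared. This is a
    simplifying assumption, not a vendor specification.
    """
    if old_pitch_um <= 0 or new_pitch_um <= 0:
        raise ValueError("pitches must be positive")
    return (old_pitch_um / new_pitch_um) ** 2


def relative_gain(old: float, new: float) -> float:
    """Fractional improvement of new over old (0.25 means 25% higher)."""
    if old <= 0:
        raise ValueError("old value must be positive")
    return (new - old) / old


# Figures quoted in the article: 9-micron pitch today, 6-micron SoIC-X offered
# by TSMC, and a 2 TB/s to 2.5 TB/s throughput step between V-Cache generations.
density = tsv_density_gain(9.0, 6.0)
throughput = relative_gain(2.0, 2.5)

print(f"9 -> 6 micron pitch: {density:.2f}x connections per unit area")
print(f"2.0 -> 2.5 TB/s: {throughput:.0%} more throughput")
```

Under that assumption, a 6-micron pitch fits 2.25 times as many connections into the same area, compared with the 25% throughput step between the first two generations. How much of that extra connectivity would translate into usable bandwidth is not something the article's figures can answer.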

AMD could also include an additional L2 cache on the chiplet, but given the state of today's technology, it's unclear if that is possible. I recently spoke with Sam Naffziger, Senior Vice President, AMD Corporate Fellow, and Product Technology Architect at AMD, and asked if AMD was considering stacking L1 and L2 caches.

"Absolutely, if you get to finer-grain 3D interconnect. So we're at 9-micron through silicon via (TSV) pitches today. As you go down to, you know, 6-, 3-, 2- micron and even lower, the level of partitioning can become much finer-grained," Naffziger said. (It's noteworthy that he didn't define the specific pitch required for an L2 cache, so it isn't clear if this is possible yet.) AMD is even considering adding larger CPU register files to chiplets, but Naffziger said today's hybrid bond pitches can't support the needed bandwidth (the tech is on imec and foundry roadmaps, though).

Don't sleep on Ryzen 9000

Many were surprised that AMD's new Ryzen 9000 chips have the same core counts and cache capacities as the prior-gen models. Boost frequencies also remain the same on a few models, while others only see a slight 100 MHz improvement. AMD has also significantly reduced the base clocks by up to 700 MHz, contributing to a 40% reduction in TDP. However, the 16% IPC gain and doubled L1 and L2 data bandwidth, among other refinements, deliver big gen-on-gen gains that AMD says make it well worth the upgrade.

Some of the improvements aren't as readily apparent on the spec sheet. David McAfee, the Corporate VP and General Manager of the Client Channel Business at AMD, told the press during a pre-Computex keynote briefing that Ryzen 9000 has better boost residency, meaning that it stays at its boost frequency for longer than the prior-gen models:

"I think the other thing that we'll get into is the frequency residency in spite of the fact that the Fmax (maximum frequency) hasn't really changed," McAfee said. "As far as what's on the box, the frequency residency, the efficiency of the lid and thermal design in the 9000 generation gives your effective frequency a lift over the prior generation. So you actually do see a net, overall positive there with just the processor running faster with the Zen 5 architecture versus Zen 4."

"At the end of the day, we give you more performance without increasing power, and at the end of the day, we give you more performance without increasing the heat. At the end of the day, we brought a non-X3D chip very close to an X3D chip when it comes to gaming," said Woligroski.

"All of those things are pretty big differences compared to previous small steps and launches. We're going back on TDP because it turns out our eight cores are so good we don't need higher TDP, so I think it's a pretty stark comparison," he concluded.

Of course, the proof is in the shipping silicon. The Ryzen 9000 chips ship in July, and we'll put the chips through our benchmarking wringer then.  

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • redgarl
    It is easier to try to extrapolate the performance metrics with AMD IPC gaming chart.

    From the Horizon benchmarks, the 9950x would be 14% faster than the 7800x3D...
    From the F1 2023 benchmarks, the 9950x would be 16% faster than the 7800x3D...
    Surprising results, while for DOTA 2 the 9950x would be 11% slower than the 7800x3D...
  • Cooe
    "Woligroski points to a slimmer margin between the X3D and non-X3D chips this time around"

    🤦 He's referring to next-gen non-X3D vs last-gen X3D CPU's ala R7 9700X vs R7 7800X3D here, NOT current gen non-X3D vs current gen X3D ala R7 9700X vs R7 9700X3D! 😑

    He's basically just saying that the gaming performance gap between the R7 9700X will be smaller vs the R7 7800X3D than the R7 7800X was vs the R7 5800X3D. Aka instead of an -8% gap like you saw last time, maybe it's more like ≈-0-5%. In fact, I totally expect standard Zen 5 to come out on top in many people's reviews simply based on differences in game selection.

    As far as the gaming performance gains with Zen 5 X3D vs regular Zen 5 goes, I'm expecting even LARGER performance gains than what we saw last time thanks to the upcoming new 3rd Gen 3D V Cache along with whatever "new surprises" they've been very loudly hinting at. Each generation of 3D V Cache has been more performant than the last, and I fully expect this generation will be no different. 🤷

    I also expect the frequency regression gap between X3D vs non-X3D this generation to be the smallest we've yet seen. Small enough to put 3D V Cache on BOTH CCD's for the Ryzen 9 parts? Probably not, but it should still close the gap by a notable amount.
  • Cooe
    "Moving to a newer process node, like 6nm or maybe even 5nm, could enable AMD to cram in even more L3 cache capacity."

    🤦 Come on Paul!!! You've been in the game WAAAAAAY too long to not be better than this semiconductor technology illiterate garbage! SRAM (aka cache) transistor density scaling absolutely hit a BRICK FREAKING WALL at 7nm! Only logic transistors are still shrinking anymore past that point!!!

    Aka, going to 5nm would do practically NOTHING to increase cache capacity OR reduce die size, but would still make the chip about 2x more expensive! 6nm (aka refined 7nm w/ more EUV layers) might make sense strictly for power efficiency gains, but that's just about the limit of what you can manufacture a die of pure SRAM cache on without EXTREME levels of waste.

    AMD's X3D dies are likely to stay 7nm or at best 6nm for basically the foreseeable future unless TSMC or Samsung have a MASSIVE breakthrough on pushing SRAM transistor scaling forward again that as of right now looks to be nothing but a wishful pipe dream... 🤷
  • PaulAlcorn
    Cooe said:
    "Moving to a newer process node, like 6nm or maybe even 5nm, could enable AMD to cram in even more L3 cache capacity."

    🤦 Come on Paul!!! You've been in the game WAAAAAAY too long to not be better than this semiconductor technology illiterate garbage! SRAM (aka cache) transistor density scaling absolutely hit a BRICK FREAKING WALL at 7nm! Only logic transistors are still shrinking anymore past that point!!!

    Aka, going to 5nm would do practically NOTHING to increase cache capacity OR reduce die size, but would still make the chip about 2x more expensive! 6nm (aka refined 7nm w/ more EUV layers) might make sense strictly for power efficiency gains, but that's just about the limit of what you can manufacture a die of pure SRAM cache on without EXTREME levels of waste.

    AMD's X3D dies are likely to stay 7nm or at best 6nm for basically the foreseeable future unless TSMC or Samsung have a MASSIVE breakthrough on pushing SRAM transistor scaling forward again that as of right now looks to be nothing but a wishful pipe dream... 🤷
    Incorrect. TSMC SRAM scaling hits the wall at the transition from 5nm to 3nm, with the caveat that this occurs with its standard libraries. That means moving from 7nm to 6nm or 5nm would still make sense.

    Besides, we aren't talking about a standard node here. AMD uses a specialized density-optimized 7nm TSMC node for the SRAM die, which, in fact, makes it significantly denser than the 5nm die that it's placed atop. I would expect AMD to take a similar density-optimized approach in future iterations. TSMC N6 has an 18% increase in logic density over DUV 7nm, so it's also logical to expect density gains for an optimized SRAM process.

    As an aside, this SRAM scaling wall only makes it all the more attractive to put SRAM on an older node and preserve die area on the smaller process node.
  • Cooe
    PaulAlcorn said:
    Incorrect. TSMC SRAM scaling hits the wall at the transition from 5nm to 3nm. That means moving from 7nm to 6nm or 5nm would make plenty of sense.

    Besides, we aren't talking about a standard node here. AMD uses a specialized density-optimized 7nm TSMC node for the SRAM die, which, in fact, makes it significantly denser than the 5nm die that it's placed atop. I would expect AMD to take a similar approach in future iterations. Additionally, TSMC N6 has an 18% increase in logic density over DUV 7nm, so it's also logical to expect density gains for an optimized SRAM process.

    As an aside, this SRAM scaling wall only makes it all the more attractive to put SRAM on an older node and preserve die area on the smaller process node.
    Lol this is just utterly irrelevant semantics. 5nm's SRAM scaling improvement over 7nm is literally like ≈+10%... For double the cost at a MINIMUM! (While 3nm vs 5nm has a mere ≈+5% improvement. 🤦). Aka the brick wall has already been hit, and that is simply not debatable. If AMD moves to 6nm this generation it'll be strictly for thermal/clock-speed reasons, NOT density or die size.

    5nm's MINUSCULE density gains for SRAM simply don't justify the HUMONGOUS cost increase for a die of pure SRAM, or at least not for a couple more years until 5nm gets non-crowded and relatively affordable.

    So sure, 5nm 3D V Cache chips might happen EVENTUALLY, but I wouldn't expect it until around Zen 7. Otherwise it just simply does not math. And even when 5nm DOES become relatively affordable it still only just BARELY maths! A mere +10% increase to either cache capacity or reduction in die size simply doesn't move the needle.

    AMD would essentially need to go from 7/6nm all the way down to 3nm to have an even SOMEWHAT notable improvement in SRAM cache density, but that would still only be a whopping..... +15% gain lol. 🤷 For like +4-5x the price... Compare that to pre-7nm nodes which were increasing SRAM density by at least >≈+30-50% every, single, generation.

    AMD's basically stuck with <=64MB L3 cache chips made on 7nm family nodes for the foreseeable future. 5nm might happen eventually when it's not so stinking expensive, but it won't even come CLOSE to being enough of a change to actually increase the cache capacity. Not even 3nm would make a >64MB chip possible. Due to how AMD's L3 is laid out you'd need to hit 92MB which is simply a bridge MUCH too far for even 3nm's mere ≈+15% SRAM density gain over 7nm without a SIGNIFICANT die size increase.
  • rluker5
    It is good AMD is clearing this up ahead of time.
    A lot probably suspected this was the case as it also was last time. The new Ryzens are definitely improvements and can be portrayed as such instead of falling short of expectations now.
  • thestryker
    Zen 5 not beating Zen 4 X3D shouldn't be particularly surprising given that their benchmark testing against the 14900K had been with the performance power profile which does have an impact on gaming performance in some games. Zen 4 X3D is around 16% faster than Zen 4 and around 5% faster than 14900K which means I wouldn't be surprised if Zen 5 was around 14900KS in gaming performance.

    Zen 5's multithreaded performance is what seems like will shine this generation. Between the higher IPC and efficiency improvements it's looking very good.
  • KraakBal
    Why not just put the cache die under the CPU die? Solve the heat problem that way?
  • PaulAlcorn
    KraakBal said:
    Why not just put the cache die under the CPU die? Solve the heat problem that way?
    Power and signals are routed from the PCB to the bottom of the chip, which would cause issues passing through the cache die. Backside power delivery (wherein they move this circuitry to the other side of the transistors) could fix this issue, but that isn't coming until future TSMC nodes (A16 is the first iirc).
  • usertests
    What does everyone think the 9000X3D improvements will look like?

    I hope they do a 24-core Zen 5 X3D + Zen 5c, or at least not make a single-CCD V-Cache model like 7950X3D/7900X3D.

    TSMC always had the ability to do multiple layers of cache. Maybe it's time to pull that out and differentiate the top chips from the 6/8-core X3D. Kind of like how Intel gives you progressively more L3 cache as you move up from the bottom to the flagship. So if they make a 9950X3D with cache only on 1 CCD, it could have 160 MiB instead of 96 MiB. Then keep the 9800X3D with 96 MiB. Suddenly the 8-core doesn't game as well as the 16-core, barring scheduling issues.

    Cooe said:
    Lol this is just utterly irrelevant semantics. 5nm's SRAM scaling improvement over 7nm is literally like ≈+10%... For double the cost at a MINIMUM! (While 3nm vs 5nm has a mere ≈+5% improvement. 🤦). Aka the brick wall has already been hit, and that is simply not debatable. If AMD moves to 6nm this generation it'll be strictly for thermal/clock-speed reasons, NOT density or die size.
    I think TSMC wants to transition most N7 production to N6, and it gets mild efficiency gains that weren't touted (they only talked about +18% density for logic), so I could see that node being used and resulting in some measurable benefit. 1st-gen to 2nd-gen V-Cache increased bandwidth by 25% (2.0 to 2.5 TB/s).

    Everyone made a big deal about the SRAM brick wall at 5nm, but I think it's possible that it starts scaling some more only with a post-FinFET technology. Obviously there's GAAFETs at TSMC N2, but maybe something beyond that would be better. But I agree that a mature node should be used. In fact, maybe ALL L3 cache should be moved off of core dies and onto cache chiplets. Could be part of the secret sauce for Zen 6 or later.