
 
biffzinker
Gerbil Jedi
Topic Author
Posts: 1998
Joined: Tue Mar 21, 2006 3:53 pm
Location: AK, USA

AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 11:32 am

We've been chasing AMD Zen for a long time now. Our older report from April 2015 uncovered an important detail about component organization on Zen processors - the clumping of four CPU cores into a highly-specialized, possibly indivisible subunit referred to then as the "Zen Quad-core Unit." Some of the latest presentations about the architecture, following AMD's "performance reveal" event earlier this month, shed more light on this quad-core unit.

AMD is referring to the Zen quad-core unit as the CPU-Complex (CCX). Each CCX is a combination of four independent CPU cores. Unlike on "Bulldozer," a "Zen" core does not share any of its number-crunching machinery with neighboring cores. Each "Zen" core has a dedicated 512 KB L2 cache, and the four Zen cores in a CCX share an 8 MB L3 cache. AMD will scale core counts by varying the number of CCX units. A "Summit Ridge" socket AM4 processor features two CCX units (eight cores in all), which share a dual-channel DDR4 memory controller and the platform core-logic (chipset), complete with an integrated PCI-Express root complex. Socket AM4 APUs will feature one CCX unit, with an integrated GPU in place of the second CCX. With this, AMD is able to bring its two distinct desktop platforms under one socket.

Source: TechPowerUp
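The per-CCX numbers above add up as follows. This is a minimal Python sketch that uses only the figures stated in the article (4 cores per CCX, 512 KB L2 per core, 8 MB L3 per CCX); the `ZenConfig` class is a hypothetical helper for illustration. It makes the point that the 16 MB of total L3 on an eight-core part is two separate 8 MB pools, not one.

```python
from dataclasses import dataclass

# Figures taken from the article above: 4 cores per CCX,
# a private 512 KB L2 per core, and one 8 MB L3 shared within each CCX.
CORES_PER_CCX = 4
L2_KB_PER_CORE = 512
L3_MB_PER_CCX = 8


@dataclass
class ZenConfig:
    """Hypothetical helper: a Zen part described only by its CCX count."""
    name: str
    ccx_count: int

    @property
    def cores(self) -> int:
        return self.ccx_count * CORES_PER_CCX

    @property
    def total_l2_mb(self) -> float:
        return self.cores * L2_KB_PER_CORE / 1024

    @property
    def total_l3_mb(self) -> int:
        # Sum over all CCXs; a given core only shares the 8 MB in its own CCX.
        return self.ccx_count * L3_MB_PER_CCX


if __name__ == "__main__":
    sr = ZenConfig("Summit Ridge", ccx_count=2)
    print(f"{sr.name}: {sr.cores} cores, {sr.total_l2_mb:g} MB L2, "
          f"{sr.total_l3_mb} MB L3 as {sr.ccx_count} x {L3_MB_PER_CCX} MB")
```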
It would take you 2,363 continuous hours or 98 days, 11 hours, and 35 minutes of gameplay to complete your Steam library.
In this time you could travel to Venus one time.
 
chuckula
Minister of Gerbil Affairs
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 11:46 am

Anand (which is having technical difficulty right now) had a pretty in-depth dive on Zen's chip configuration too.

What we have with Zen is effectively two independent 4-core CPUs that happen to be placed on a single die. Inter-complex communication goes through the on-chip hub in a manner that is logically similar to accessing DRAM, although it doesn't actually require a physical trip out to the DRAM device to move the bytes. I'll tell you right now that this is going to produce strong NUMA behavior, since there is going to be a noticeable penalty for moving data to the wrong core even inside a single chip.

Anand is back up:

It is worth noting that a single CCX has 8 MB of cache, and as a result the 8-core Zen being displayed by AMD at the current events involves two CPU Complexes. This affords a total of 16 MB of L3 cache, albeit in two distinct parts. This means that the true LLC for the entire chip is actually DRAM, although AMD states that the two CCXes can communicate with each other through the custom fabric which connects both the complexes, the memory controller, the IO, the PCIe lanes etc.


http://www.anandtech.com/show/10591/amd ... allelism/5
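On Linux, the split L3 described in that quote can be observed from sysfs: each cache level of each CPU exposes `level` and `shared_cpu_list` files under `/sys/devices/system/cpu/cpuN/cache/indexM/`. A minimal sketch, assuming a Linux system with that sysfs layout; on a two-CCX Zen part one would expect two distinct L3 groups, though the exact CPU numbering (including how SMT siblings are numbered) depends on the kernel and firmware. `parse_cpulist` and `l3_groups` are hypothetical helper names.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as '0-3,8-11' into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l3_groups(sysfs: Path = Path("/sys/devices/system/cpu")) -> set[tuple[int, ...]]:
    """Return the distinct sets of logical CPUs that share one L3 cache."""
    groups: set[tuple[int, ...]] = set()
    if not sysfs.is_dir():  # not Linux, or sysfs not mounted
        return groups
    for cache_dir in sysfs.glob("cpu[0-9]*/cache/index[0-9]*"):
        level = cache_dir / "level"
        shared = cache_dir / "shared_cpu_list"
        if level.exists() and shared.exists() and level.read_text().strip() == "3":
            groups.add(tuple(parse_cpulist(shared.read_text())))
    return groups


if __name__ == "__main__":
    for group in sorted(l3_groups()):
        print("L3 shared by CPUs:", group)
```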

The "custom fabric" is clearly the on-die equivalent of the north bridge. Some people have noted that Intel's big chips (and I mean *big* chips) implement a multi-loop mesh. First of all, a Haswell-era 8-core Xeon or HEDT part uses a single ring bus with some pretty insane bandwidth, and while Skylake hasn't gotten much love, Intel actually doubled the ring bus bandwidth again over Haswell/Broadwell. Even in a mesh where some data might have to hop between two rings on a chip with more than 8 cores, the total bandwidth will be much higher and, most importantly, latency is going to be massively lower than having to hop through an external controller hub. The L3 cache shared by 4 cores that AMD is advertising appears to be a crossbar-switched configuration (hence the claim of the same average access time for all 4 cores) that's very reminiscent of Nehalem-era L3 caches before Intel adopted the ring and mesh topologies starting with Sandy Bridge. Intel had trouble scaling the crossbar to higher core counts, and it looks like AMD also stopped at 4 cores for similar reasons.

You can see the single-ring topology for 8 core Haswell parts and the mesh topology for larger chips here: http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-haswell-ep-cores-/4
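A back-of-the-envelope way to see why a crossbar stops scaling: in an idealized full crossbar every core needs a switched path to every L3 slice, so the number of crosspoints grows with cores x slices, while a ring with one stop per core grows linearly. A minimal sketch under those simplifying assumptions (real designs add stops for memory controllers and I/O, and ring links are very wide); the function names are illustrative only.

```python
def crossbar_crosspoints(cores: int, slices: int) -> int:
    # Idealized full crossbar: one switch point per (core, L3 slice) pair.
    return cores * slices


def ring_links(stops: int) -> int:
    # A closed ring (3 or more stops) has one link between each pair of neighbours.
    return stops


if __name__ == "__main__":
    for n in (4, 8, 12, 18):
        print(f"{n:2d} cores: crossbar {crossbar_crosspoints(n, n):3d} crosspoints, "
              f"ring {ring_links(n):2d} links")
```

Doubling the core count quadruples the idealized crossbar but only doubles the ring; the ring pays for that in variable hop latency instead.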

I fully understand why AMD did this design: they could get away with designing a quad-core processor that can be scaled to 8 cores in a finished product without having to spend the time and money on a chip that actually has to integrate all 8 cores together at a low level. It also tells you that the GPU/CPU connection in next year's Zen APUs will continue to effectively be a separate component that talks to the 4-core Zen "complex" through the equivalent of the north bridge. That's very ironic, because Intel, which supposedly doesn't integrate graphics that well, simply puts the IGP on the same ring bus for ultra-fast communication with the CPU cores.
Last edited by chuckula on Wed Aug 24, 2016 1:13 pm, edited 4 times in total.
4770K @ 4.7 GHz; 32GB DDR3-2133; Officially RX-560... that's right AMD you shills!; 512GB 840 Pro (2x); Fractal Define XL-R2; NZXT Kraken-X60
--Many thanks to the TR Forum for advice in getting it built.
 
Waco
Maximum Gerbil
Posts: 4850
Joined: Tue Jan 20, 2009 4:14 pm
Location: Los Alamos, NM

Re: AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 11:56 am

chuckula wrote:
What we have with Zen is effectively two independent 4-core CPUs that happen to be placed on a single die. Inter-complex communication goes through the on-chip hub in a manner that is logically similar to accessing DRAM, although it doesn't actually require a physical trip out to the DRAM device to move the bytes. I'll tell you right now that this is going to produce strong NUMA behavior, since there is going to be a noticeable penalty for moving data to the wrong core even inside a single chip.

That totally depends on how they're linked. They could easily be joined closely enough that core-to-core within a complex and core-to-core across complexes is very similar. Keep in mind that Intel has this same issue, only in a different way, since they use a ring bus on their chips. The distance between two cores is variable between 1 and X/2 hops, where X is the number of cores on chip.

EDIT: It's more complicated than that on the largest chips, since there are multiple rings. My point, though, is that even in Intel land, the cores within a single NUMA domain aren't all equal.
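The 1-to-X/2 hop range follows from taking the shorter direction around a bidirectional ring. A minimal sketch, assuming one ring stop per core and ignoring the extra stops real chips have for memory controllers and I/O; `ring_hops` is a hypothetical helper.

```python
def ring_hops(a: int, b: int, stops: int) -> int:
    """Shortest hop count between stops a and b on a bidirectional ring."""
    if not (0 <= a < stops and 0 <= b < stops):
        raise ValueError("stop index out of range")
    d = abs(a - b)
    # Traffic can go either way around the ring, so take the shorter direction.
    return min(d, stops - d)


if __name__ == "__main__":
    X = 8
    print([ring_hops(0, j, X) for j in range(1, X)])  # [1, 2, 3, 4, 3, 2, 1]
```

For an odd number of stops the worst case is the integer part of X/2.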
Victory requires no explanation. Defeat allows none.
 
Bauxite
Gerbil Elite
Posts: 788
Joined: Sat Jan 28, 2006 12:10 pm
Location: electrolytic redox smelting plant

Re: AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 12:17 pm

Not really a big deal; as noted, something very similar has been going on since the MCC/HCC Xeons at least. Most OSes have at least some accounting for this when assigning threads, assuming you keep them up to date (Windows).

I have an 8-core Ivy Bridge (cut down from a 12-core die) that is definitely two rings of 4.
TR RIP 7/7/2019
 
biffzinker
Gerbil Jedi
Topic Author
Posts: 1998
Joined: Tue Mar 21, 2006 3:53 pm
Location: AK, USA

Re: AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 12:44 pm

So technically, Naples doesn't need to be two or more dies on one package; it could be one die with eight CCXs. How big would that die end up, though?
 
chuckula
Minister of Gerbil Affairs
Posts: 2109
Joined: Wed Jan 23, 2008 9:18 pm
Location: Probably where I don't belong.

Re: AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 1:00 pm

biffzinker wrote:
So technically, Naples doesn't need to be two or more dies on one package; it could be one die with eight CCXs. How big would that die end up, though?

If GloFo could fab that chip, then sure. But there's no way GloFo is going to economically fab a chip that big. Naples is all but certain to be four of these 8-core dies put together in one package, and AMD has even been advertising a "global memory interface" that would connect the dies together in a single package.
 
TheRazorsEdge
Gerbil Team Leader
Posts: 219
Joined: Tue Apr 03, 2007 1:10 pm

Re: AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 1:07 pm

Bauxite wrote:
Not really a big deal; as noted, something very similar has been going on since the MCC/HCC Xeons at least. Most OSes have at least some accounting for this when assigning threads, assuming you keep them up to date (Windows).

Windows, Linux, and BSD are all NUMA-aware, as are the major hypervisors.

But if one application thread needs more memory than a single NUMA node holds, you will see performance degradation regardless. In virtual environments, the division between NUMA nodes might not be where it is normally expected (on bare metal, memory is generally divided equally among the nodes).

The entire software stack needs to support NUMA for the impact of this design decision to be trivial. The complexities will mostly be mitigated by OS and hypervisor design, but this will likely make the edge cases all the more difficult to identify and address.
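To put a rough number on the "degradation regardless" point: if a thread's working set is larger than its local node and accesses are spread evenly, the spilled fraction pays remote latency. A minimal sketch, assuming a local-first allocation policy (roughly what Linux does by default) and uniformly random accesses; the 100 ns / 150 ns latencies and the 64 GB node size are made-up illustrative values, not measurements of any real system.

```python
def average_latency_ns(working_set_gb: float, local_capacity_gb: float,
                       local_ns: float, remote_ns: float) -> float:
    """Mean memory latency when a working set spills past its local NUMA node.

    Assumes pages fill the local node first and every page is accessed equally often.
    """
    if working_set_gb <= 0:
        raise ValueError("working set must be positive")
    remote_fraction = max(0.0, working_set_gb - local_capacity_gb) / working_set_gb
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for ws in (32, 64, 128, 256):
        print(f"{ws:3d} GB working set -> {average_latency_ns(ws, 64, 100, 150):.1f} ns")
```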
 
the
Gerbil Elite
Posts: 941
Joined: Tue Jun 29, 2010 2:26 am

Re: AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 3:24 pm

TheRazorsEdge wrote:
Bauxite wrote:
Not really a big deal; as noted, something very similar has been going on since the MCC/HCC Xeons at least. Most OSes have at least some accounting for this when assigning threads, assuming you keep them up to date (Windows).

Windows, Linux, and BSD are all NUMA-aware, as are the major hypervisors.


Generally speaking, they are more than NUMA-aware. The schedulers also know which two logical processors belong to the same physical core. Similar scheduling changes were made when Bulldozer arrived (threads of a multithreaded application would be placed on the same module, whereas independent applications would be spread across different modules). The main reason is the performance gain from a higher cache hit rate.

Schedulers will need to be updated for Zen's cluster arrangement but it is pretty much par for the course nowadays.
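The Bulldozer-style policy described above (keep one application's threads together so they share a cache, spread unrelated applications apart) can be sketched as a toy placement function over cache clusters such as Zen's two CCXs. This is a hypothetical illustration, not how the Windows or Linux schedulers are actually implemented; `place` and the process names are made up.

```python
def place(thread_counts: dict[str, int],
          clusters: list[list[int]]) -> dict[str, list[int]]:
    """Toy placement: each process goes to the cluster with the most idle CPUs
    and only spills into another cluster when that one fills up, so a process's
    threads stay together while separate processes spread apart."""
    free = [list(c) for c in clusters]
    placement: dict[str, list[int]] = {}
    # Biggest multithreaded jobs first, so they can claim a whole idle cluster.
    for name, n in sorted(thread_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        cpus: list[int] = []
        while len(cpus) < n:
            # Cluster with the most idle CPUs; ties go to the lowest-numbered one.
            best = max(range(len(free)), key=lambda i: (len(free[i]), -i))
            if not free[best]:
                raise RuntimeError(f"not enough CPUs for {name}")
            take = min(n - len(cpus), len(free[best]))
            cpus.extend(free[best][:take])
            del free[best][:take]
        placement[name] = cpus
    return placement


if __name__ == "__main__":
    ccxs = [[0, 1, 2, 3], [4, 5, 6, 7]]  # two hypothetical 4-core clusters
    print(place({"game": 4, "encoder": 2, "chat": 1}, ccxs))
```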
Dual Opteron 6376, 96 GB DDR3, Asus KGPE-D16, GTX 970
Mac Pro Dual Xeon E5645, 48 GB DDR3, GTX 770
Core i7 [email protected] Ghz, 32 GB DDR3, GA-X79-UP5-Wifi
Core i7 [email protected] Ghz, 16 GB DDR3, GTX 970, GA-X68XP-UD4
 
the
Gerbil Elite
Posts: 941
Joined: Tue Jun 29, 2010 2:26 am

Re: AMD ZEN Quad-Core Subunit Named CPU-Complex (CCX)

Wed Aug 24, 2016 3:36 pm

Bauxite wrote:
Not really a big deal; as noted, something very similar has been going on since the MCC/HCC Xeons at least. Most OSes have at least some accounting for this when assigning threads, assuming you keep them up to date (Windows).

I have an 8-core Ivy Bridge (cut down from a 12-core die) that is definitely two rings of 4.


The 8-core Ivy Bridge is actually cut down from a 10-core die that has two uni-directional rings.

The 12 core Ivy Bridge-EP and all Ivy Bridge-EX chips have three uni-directional rings but each core only connects to two of the three.

It wasn't until the 18 core Haswell-EP chips that two fully independent ring sets were used.

I've outlined a good chunk of Intel's topologies here.
