Anand (which is having technical difficulties right now) has a pretty in-depth dive into Zen's chip configuration too.
What we have with Zen is effectively two independent 4-core CPUs that happen to be placed on a single die. Inter-"complex" communication goes through the on-chip hub in a manner that is logically similar to accessing DRAM, although it doesn't actually require a physical trip out to the DRAM device to move the bytes. I'll tell you right now that this is going to produce strong NUMA behavior, since there is going to be a noticeable penalty for moving data to a core in the wrong complex, even inside a single chip.
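To put a rough number on how often the "wrong core" case can come up, here's a quick Python sketch that counts core pairs on an 8-core, two-complex part. The core numbering (0-3 on one complex, 4-7 on the other) and the function names are my own assumptions for illustration, and it says nothing about how big the penalty is, only how many pairings would have to cross the fabric.

```python
from itertools import combinations

CORES_PER_COMPLEX = 4  # each complex is a 4-core unit
NUM_COMPLEXES = 2      # the 8-core part uses two of them


def complex_of(core: int) -> int:
    """Complex index for a core. ASSUMED numbering: 0-3 -> 0, 4-7 -> 1."""
    return core // CORES_PER_COMPLEX


def crosses_fabric(a: int, b: int) -> bool:
    """True if data moving between cores a and b must leave the complex."""
    return complex_of(a) != complex_of(b)


def pair_counts() -> tuple[int, int]:
    """Return (same_complex_pairs, cross_complex_pairs) over all core pairs."""
    cores = range(CORES_PER_COMPLEX * NUM_COMPLEXES)
    pairs = list(combinations(cores, 2))
    cross = sum(1 for a, b in pairs if crosses_fabric(a, b))
    return len(pairs) - cross, cross


if __name__ == "__main__":
    local, cross = pair_counts()
    print(f"same-complex pairs: {local}, cross-complex pairs: {cross}")
```

Under those assumptions more than half of all possible core pairings sit on opposite complexes, so a scheduler that isn't aware of the split has plenty of chances to pick the slow path.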
Anand is back up:
It is worth noting that a single CCX has 8 MB of cache, and as a result the 8-core Zen being displayed by AMD at the current events involves two CPU Complexes. This affords a total of 16 MB of L3 cache, albeit in two distinct parts. This means that the true LLC for the entire chip is actually DRAM, although AMD states that the two CCXes can communicate with each other through the custom fabric which connects both the complexes, the memory controller, the IO, the PCIe lanes etc.
http://www.anandtech.com/show/10591/amd ... allelism/5
The "custom fabric" is clearly the on-die equivalent of the north bridge. Some people have noted that Intel's big chips (and I mean *big* chips) implement a multi-loop mesh. First of all, a Haswell-era 8-core Xeon or HEDT part uses a single ring bus with some pretty insane bandwidth, and while Skylake hasn't gotten much love, Intel actually doubled the ring bus bandwidth again over Haswell/Broadwell. Even in a mesh where some data might have to hop between two rings in a chip with more than 8 cores, the total bandwidth will be much higher and, most importantly, latency is going to be massively lower than having to hop through an external controller hub. The 4-way cache that AMD is advertising appears to be a crossbar-switched configuration (hence the claim of having the same average access time for all 4 cores) that's very reminiscent of Nehalem-era L3 caches, before Intel adopted the ring and mesh topologies starting with Sandy Bridge. Intel had trouble scaling the crossbar to higher core counts, and it looks like AMD also stopped at 4 cores for similar reasons.
You can see the single-ring topology for 8 core Haswell parts and the mesh topology for larger chips here: http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-haswell-ep-cores-/4
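For a feel of why a crossbar gets painful past a handful of cores while a ring keeps going, here's a back-of-the-envelope Python sketch. It is a deliberately simplistic textbook model and not a description of either vendor's actual implementation: I'm assuming a full crossbar needs one crosspoint per core/cache-slice pair, that a ring needs one link per stop, and that the ring is bidirectional. All function names are made up for the example.

```python
def crossbar_crosspoints(n: int) -> int:
    """Crosspoints in a full n-by-n crossbar (n cores x n cache slices)."""
    return n * n


def ring_links(n: int) -> int:
    """Links in a single ring with n stops (one link between neighbors)."""
    return n


def ring_max_hops(n: int) -> int:
    """Worst-case hop count on a bidirectional ring with n stops."""
    return n // 2


if __name__ == "__main__":
    for cores in (4, 8, 18):
        print(
            f"{cores:>2} cores: crossbar crosspoints={crossbar_crosspoints(cores):>3}, "
            f"ring links={ring_links(cores):>2}, ring worst-case hops={ring_max_hops(cores)}"
        )
```

The crossbar gives every core the same distance to every slice, which fits the "same average access time" claim, but its wiring grows with the square of the core count. The ring trades that uniformity for wiring that grows linearly.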
I fully understand why AMD did this design: they could get away with designing a quad-core processor that can be scaled to 8 cores in a finished product without having to spend the time and money designing a chip that actually integrates all 8 cores together at a low level. It also tells you that the GPU/CPU connection in next year's Zen APUs will continue to effectively be a separate component that talks to the 4-core Zen "complex" through the equivalent of the north bridge. That is actually very ironic, because Intel, which supposedly doesn't integrate graphics that well, simply puts the IGP on the same ring bus for ultra-fast communication with the CPU cores.