Many of us have been asking how AMD would respond to the Core microarchitecture, and to date, we haven't heard much encouraging news. However, in my talks with various AMD reps about these matters, they have always insisted to me that a proper response was coming, yet their hands were tied when it came to talking about it. Now we know AMD is not messing around.
0. Native quad core
1. Hypertransport up to 5.2GT/s
2. Better coherency
3. Private L2, shared L3 cache that scales up.
4. Separate power planes and pstates for north bridge and CPU
5. 128b FPUs - see 14,15
6. 48b virtual/physical addressing and 1GB pages
7. Support for DDR2, eventually DDR3
8. Support for FBD1 and 2 eventually
9. I/O virtualization and nested page tables
10. Memory mirroring, data poisoning, HT retry protocol support
11. 32B instead of 16B ifetch
12. Indirect branch predictors
13. OOO load execution - similar to memory disambiguation
14. 2x 128b SSE units
15. 2x 128b SSE LDs/cycle
16. Several new instructions
Some of these changes were somewhat obvious and necessary, especially for AMD's continued push in the server space. (Winning in that market is vitally important for a number of reasons, including the prestige it carries, the higher average selling prices per chip, and the fact that technology from the very high end tends to filter down into other market segments over time.) The server-oriented modifications in K8L include the move to quad cores, faster HyperTransport links, and shared L3 cache. These things should both allow and require better cache coherency mechanisms in order to scale up to four sockets and beyond. I'd expect coherency data to be summarized per socket based on the contents of the shared L3 cache, with arbitration over L2 caches happening internally on the chip. That's my guess, anyhow. Four cores and more cache should also bring higher performance, of course.
The other changes for servers include the improved memory addressing, I/O virtualization, and the cluster of reliability stuff like memory mirroring and HT retry. Those things will matter for servers but may take some time to matter elsewhere.
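To put the addressing changes in rough perspective, here is a quick back-of-the-envelope sketch. The 48-bit figure and the 1GB page size come from the list above; the 64GB region and the 4KB baseline page size are my own picks for illustration, not anything AMD has stated.

```python
# Rough arithmetic for 48-bit addressing and 1GB pages.
ADDRESS_BITS = 48                      # from the feature list above
KIB, GIB, TIB = 2**10, 2**30, 2**40

# Total bytes reachable with a 48-bit address, expressed in TiB.
address_space_tib = 2**ADDRESS_BITS // TIB


def pages_needed(region_bytes, page_bytes):
    """Pages (and so translations) required to map a region of memory."""
    return -(-region_bytes // page_bytes)  # ceiling division


region = 64 * GIB                      # hypothetical region, for illustration only
small_pages = pages_needed(region, 4 * KIB)   # standard x86 4KB pages
large_pages = pages_needed(region, 1 * GIB)   # the new 1GB pages

print(address_space_tib, small_pages, large_pages)
```

The point is simply that mapping the same region with 1GB pages takes far fewer translations than with 4KB pages, which is why big pages tend to matter for large-memory servers first.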
The changes to each K8L CPU core proper are the big news, though. We've known for a while that K8L would likely include better floating-point math performance, and the K8L looks to be able to match the Core with single-cycle 128-bit SSE ops. Also new here are an indirect branch predictor, 32B instruction fetch, and out-of-order loads. With these changes, the K8L could catch up to Intel's Core in integer performance as well as FP. Johan De Gelas has written about how the K8 isn't nearly as flexible with loads as Core, and AMD looks to be addressing that issue, which is a big opportunity for improvement. Taken alongside the ability to modulate voltage and clock speeds for the cores and the memory controller separately, these changes could lead to a big boost in power-efficient performance.
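For a sense of what the wider SSE units could mean on paper, here is a small sketch of peak per-cycle throughput. It assumes the K8 pushes each 128-bit SSE op through 64-bit-wide units in two passes, and that both designs have two such units. Those are simplifying assumptions for illustration, the function names are made up, and peak numbers say little about real-world performance.

```python
def passes_per_sse_op(op_bits, unit_bits):
    """How many trips through the unit one SSE op needs (ceiling division)."""
    return -(-op_bits // unit_bits)


def peak_sp_ops_per_cycle(units, unit_bits, op_bits=128, float_bits=32):
    """Peak single-precision element operations per clock, all units busy."""
    lanes = op_bits // float_bits          # 4 floats in a 128-bit register
    return units * lanes / passes_per_sse_op(op_bits, unit_bits)


# Assumed configuration: two 64-bit units (K8) vs. two 128-bit units (K8L).
k8_peak = peak_sp_ops_per_cycle(units=2, unit_bits=64)
k8l_peak = peak_sp_ops_per_cycle(units=2, unit_bits=128)

print(k8_peak, k8l_peak, k8l_peak / k8_peak)
```

Under those assumptions, the peak rate doubles at the same clock speed, which lines up with the "single-cycle 128-bit SSE ops" claim above.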
We can probably expect the K8L to scale down in terms of the size of the shared L3 cache and in the number of cores onboard. We might see a desktop variant with a 1MB shared L3 and two cores, for instance.
Kanter says the K8L is scheduled to arrive in early 2007, but given how these things tend to go, we'll have to wait and see about that. AMD has to complete the transition to 65 nm between now and then, and that move should help bolster the K8 against Core in the meantime. What seems clear from these revelations is that the CPU market will probably continue to be a two-horse race for some time yet, even if Intel’s Core is as strong as we think, so long as AMD can continue to execute on its plans.