Memory subsystem performance
Now that we've considered power efficiency, we'll move on to our performance results, beginning with some synthetic tests of the CPUs' memory subsystems. These results don't track directly with real-world performance, but they do give us some insights into the CPU and system architectures involved. For this first test, the graph is pretty crowded. We've tried to be selective, generally only choosing one representative from each architecture. This test is multithreaded, so more cores, with their associated L1 and L2 caches, can lead to higher throughput.

The additional cores grant the X6 a straightforward increase in L1 and L2 cache bandwidth. Interestingly, because AMD's caches are exclusive (that is, the lower-level caches don't replicate data in the higher-level caches), the total effective cache size on Thuban chips is rather considerable. The L1 data, L2, and L3 caches total up to about 9.4MB. That's effectively larger, and generally faster, than the Core i7-930's inclusive cache hierarchy with an 8MB L3. Still, Thuban falls short in cache throughput and total size compared to the 32-nm Core i7-980X, which has six cores of its own and a 12MB last-level cache.
This graph becomes almost impossible to read once we get to the larger block sizes, where we're really measuring main memory bandwidth. Stream is a better test of that particular attribute.
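For readers who want to check the arithmetic behind the roughly 9.4MB effective cache figure cited above, here's a minimal sketch. The per-core sizes (64KB of L1 data cache and 512KB of L2 per core, plus a 6MB shared L3) are our assumptions about Thuban's configuration rather than numbers quoted in the text, and the function name is purely illustrative.

```python
# Assumed Thuban cache configuration, in KB. The article only quotes the
# ~9.4MB total; these per-level sizes are assumptions used for illustration.
CORES = 6
L1_DATA_KB_PER_CORE = 64
L2_KB_PER_CORE = 512
L3_SHARED_KB = 6 * 1024


def effective_cache_kb(cores, l1d_kb, l2_kb, l3_kb):
    """Sum the capacities of an exclusive hierarchy.

    Because each level holds data the others don't, the per-core L1 data
    and L2 caches add to the shared L3 instead of being mirrored in it.
    """
    return cores * l1d_kb + cores * l2_kb + l3_kb


total_kb = effective_cache_kb(
    CORES, L1_DATA_KB_PER_CORE, L2_KB_PER_CORE, L3_SHARED_KB
)
total_mb = total_kb / 1024

print(f"Effective cache: {total_kb} KB ({total_mb:.1f} MB)")
```

Under those assumptions the sum works out to 9600KB, or about 9.4MB. In an inclusive design like the Core i7-930's, the smaller caches' contents are duplicated in the 8MB L3, so they don't add to the total in the same way.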

The X6 chips post a slight but consistent performance increase in the Add and Triad tests compared to older Phenom II processors. That's likely the result of some tweaks AMD made to the memory controller in Thuban. The gain is not the result of Thuban's additional cores, because the X6 chips performed best here with only four threads running; those are the results we've reported.
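If you haven't looked at Stream before, the Add and Triad tests are very simple loops over large arrays. The sketch below shows roughly what each kernel does and how a bandwidth figure falls out of it, assuming 8-byte doubles and Stream's usual convention of counting two reads and one write per element. It uses plain Python lists, so it illustrates the math only and isn't suitable for measuring real memory bandwidth. The function names are our own.

```python
BYTES_PER_DOUBLE = 8  # assumption: arrays of 64-bit floating-point values


def stream_add(a, b):
    """Add kernel: c[i] = a[i] + b[i]."""
    return [x + y for x, y in zip(a, b)]


def stream_triad(b, c, scalar):
    """Triad kernel: a[i] = b[i] + scalar * c[i]."""
    return [x + scalar * y for x, y in zip(b, c)]


def bytes_moved(n):
    """Add and Triad each read two arrays and write one: 3 doubles per element."""
    return 3 * BYTES_PER_DOUBLE * n


def bandwidth_gb_per_s(n, seconds):
    """Convert an element count and elapsed time into GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved(n) / seconds / 1e9


# Tiny example: three-element arrays and a scalar of 2.0.
print(stream_add([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(stream_triad([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 2.0))
```

Since both kernels touch three arrays per element, they lean on the memory controller harder than a straight copy, which may be why any controller tweaks would show up in these two tests in particular.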

We've included these results for the sake of completeness, but we'll admit up front that they may be iffy. This test produces results in CPU cycles, and we convert those numbers to nanoseconds based on clock speeds. Trouble is, clock speeds are no longer static, even though we disable SpeedStep and Cool'n'Quiet for this particular benchmark. Our assumption is that the CPUs will reach their respective turbo peaks during this simple, single-threaded test. However, they may not be doing so in every case. If you assume the X6 1090T is running at its base frequency, it would be at a more pedestrian 50 ns here, not 44. The X6 1055T would be at 57 ns. We're not sure which is the right answer, and we may have to start disabling turbo in order to conduct these tests.
Interestingly, we measured Thuban's L3 cache latency at 52 cycles, a little lower than the 57 cycles we saw with the Phenom II X4 965.
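To make the clock-speed caveat concrete, here's a short sketch of the cycles-to-nanoseconds conversion and how much the answer shifts depending on which frequency you assume. The clock speeds used here (3.2GHz base and 3.6GHz turbo for the X6 1090T) are assumptions that aren't stated in the text above, and the helper names are ours.

```python
def cycles_to_ns(cycles, clock_ghz):
    """One cycle at f GHz lasts 1/f nanoseconds."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz):
    """Inverse conversion: recover a cycle count from a reported latency."""
    return ns * clock_ghz


# Assumed X6 1090T clocks in GHz (not quoted in the article text).
X6_1090T_BASE_GHZ = 3.2
X6_1090T_TURBO_GHZ = 3.6

# The 44 ns figure presumes the chip ran at its turbo peak during the test.
reported_ns = 44
implied_cycles = ns_to_cycles(reported_ns, X6_1090T_TURBO_GHZ)

# Same cycle count, reinterpreted as if the chip had stayed at its base clock.
latency_at_base_ns = cycles_to_ns(implied_cycles, X6_1090T_BASE_GHZ)

print(f"Implied cycles: {implied_cycles:.1f}")
print(f"Latency at base clock: {latency_at_base_ns:.1f} ns")
```

With those assumed clocks, the same cycle count lands within half a nanosecond of the 50 ns figure mentioned above, so the uncertainty comes entirely from which frequency you plug in.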