When we asked AMD about the possibility of a six-core desktop processor last summer, shortly after the six-core Istanbul Opteron launch, the company inquired whether we would personally buy such a product. Perhaps not, we replied, but we know some folks would enjoy the option. AMD’s head of server and workstation marketing fired back, “If you have a million friends that need it…”
Barely a month later, AMD officially confirmed plans for a six-core desktop product code-named Thuban, which would be based on AMD’s second-generation six-core Opteron design, complete with DDR3 memory support, HyperTransport 3.0, and likely higher performance. Thuban would come out at some point in 2010. Between then and now, Intel has managed to beat AMD to the punch with the Core i7-980X Extreme. When it hit stores last month, the 980X simultaneously became the world’s first six-core desktop processor and the world’s first 32-nm six-core chip.
Today, AMD finally lifts the veil on the Phenom II X6, to undoubtedly high expectations. Is this the design that will let AMD re-enter battlefields long conceded to Intel’s Core i5 and i7 CPUs, or does it only serve to cement Intel’s leadership in high-end desktop processors?
What makes Thuban special
Hearing the product name, Phenom II X6, one might think Thuban is little more than a Phenom II X4 with a couple of extra cores glued on. After all, it is based on the same architecture, and it works in the same Socket AM2+ and Socket AM3 motherboards. Making that assumption would be unwise, however, because aside from having six cores on a single die instead of four, Thuban departs from today’s Phenom II X4s in two important ways.
The first of those is Turbo Core, whose basic premise should sound familiar to anyone acquainted with Intel’s Core i5 and i7 CPUs. From day one, those Intel chips have featured a technology called Turbo Boost, which dynamically raises the clock speeds of active cores depending on the workload and the available thermal headroom. Turbo Boost cleverly balances clock frequencies and thermals, increasing clock speeds with lightly multithreaded workloads and reducing them when more cores are fully occupied. So long as the headroom is available, Turbo Boost will even keep clock frequencies above the CPU’s rated speed when all cores are quite busy.
AMD’s Turbo Core is simply a different implementation of the same concept. Turbo Core boosts the clocks on up to three of Thuban’s cores when the others aren’t fully loaded, and it raises them substantially: by as much as 500MHz in the case of the Phenom II X6 1055T. AMD’s take on Turbo differs from Intel’s in many details, though.
For instance, Intel’s Turbo Boost employs a network of on-chip thermal sensors and a fairly sophisticated built-in microcontroller dedicated to power management, ensuring the best use of the available thermal headroom. Turbo Boost behavior may vary from chip to chip and system to system, depending on the thermal properties of the individual CPU and on the effectiveness of the cooling solution. By contrast, AMD processors with Turbo Core are screened to meet certain thermal conditions at the factory, and each processor with a given model number should behave the same as any other.
In fact, after a couple of probing conversations with various PR types at AMD, we’re fairly certain Thuban silicon doesn’t contain any substantial new logic dedicated to making Turbo Core work. Turbo Core is essentially an extension of the existing mechanism for management of power states, a la Cool’n’Quiet and SpeedStep. Not that there’s anything wrong with that. Although Intel’s way of doing things ought to allow it to squeeze more headroom out of each chip, Turbo Core should provide the advertised improvements in clock speed and performance with minimal drama. Never one to miss a trick, AMD marketing even touts Turbo Core’s consistent, deterministic behavior as a positive trait.
One question we raised when we first reported on Turbo Core was how this mechanism could possibly coexist with AMD’s decision to lock the P-states of all cores together in the Phenom II. We originally explained the rationale for that choice like so:
The firm found that the varying power states (or P-states) on the Phenom could prove to be confusing to the Windows Scheduler, which wouldn’t necessarily choose wisely when deciding whether to schedule a thread on a core with a low P-state or a high one. As a result, enabling the Cool’n’Quiet dynamic power saving feature could lead to unintended performance degradation. To work around this problem, AMD has decided to link together the P-states of the Phenom II’s cores, via some BIOS-level changes.
Locked P-states would mean all cores run at the same clock frequency, which doesn’t mix well with the dynamic symphony that Turbo Core ought to be. Having seen Turbo Core in action and pinged AMD on the matter, we can confirm that the Phenom II X6s don’t have linked P-states. The cores move up and down in frequency independently of one another, with up to three of them (any three of them, depending on the load at the moment) ranging north of the CPU’s rated clock speed. This behavior is easily observable using the monitoring tool in AMD’s Overdrive utility, and it’s quite the contrast to the cores-in-lock-step operation of a Phenom II X4.
One remaining question is how Thuban is able to avoid the effects of the scheduling problems that led AMD to link the P-states together on earlier Phenom II processors. When we posed this question to the folks in AMD PR at the eleventh hour before publication of this review, they didn’t have a definite answer. However, our casual observations suggest AMD may be using the CPU’s rated clock speed as a common baseline to ensure decent performance when threads jump between cores. For example, the Phenom II X6 1090T has a base clock of 3.2GHz and a Turbo Core peak of 3.6GHz. When a single-threaded application is running on it, no core drops below 3.2GHz. When that application stops and the CPU is essentially idle, all six cores drop down to 800MHz, the minimum speed allowed by Cool’n’Quiet.
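To make those observations concrete, here is a toy model of what we saw on the 1090T. It is only a sketch of our casual observations, not AMD's actual P-state logic: the function name, the load threshold that marks a core as "busy," and the rule that turbo applies only when three or fewer cores are busy are all assumptions made for illustration.

```python
# Toy model of Turbo Core behavior as observed on the Phenom II X6 1090T.
# Assumptions (not confirmed by AMD): a core counts as "busy" above a
# hypothetical load threshold, and turbo applies only when three or
# fewer cores are busy.

BASE_MHZ = 3200       # 1090T rated clock
TURBO_MHZ = 3600      # 1090T Turbo Core peak
IDLE_MHZ = 800        # Cool'n'Quiet minimum
MAX_TURBO_CORES = 3   # Turbo Core boosts up to three cores
BUSY_THRESHOLD = 0.5  # hypothetical cutoff, chosen for illustration


def core_clocks(loads):
    """Return a per-core clock (MHz) for a list of per-core loads in 0..1."""
    busy = [load >= BUSY_THRESHOLD for load in loads]
    n_busy = sum(busy)
    if n_busy == 0:
        # Essentially idle: every core drops to the Cool'n'Quiet floor.
        return [IDLE_MHZ] * len(loads)
    if n_busy <= MAX_TURBO_CORES:
        # Lightly threaded: busy cores boost, the rest hold the rated clock.
        return [TURBO_MHZ if is_busy else BASE_MHZ for is_busy in busy]
    # Heavily threaded: this sketch simply holds every core at the rated clock.
    return [BASE_MHZ] * len(loads)


# A single-threaded load: one core boosts, and no core drops below 3.2GHz.
print(core_clocks([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```

The point of the common 3.2GHz floor in this model is that a thread hopping from a boosted core to a neighbor never lands on a core running at 800MHz while any work is in flight.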
Turbo Core should give AMD a much better fighting chance against Intel’s latest wave of CPUs. The lack of such a technology partly explains why Phenom IIs have historically fared worse than Core i5s and i7s in benchmarks; they might be competitive when all four of their cores are busy, but in more lightly multithreaded apps, they’re stuck at their base clock speeds.
AMD had another ace up its sleeve when designing Thuban. The folks at GlobalFoundries have made tweaks to their 45-nm silicon-on-insulator process, adding a low-k dielectric layer to reduce leakage power. The result? Within a given thermal envelope, AMD can achieve nearly the same clock speeds with six cores as a Phenom II X4 did with four cores.
That’s a pretty big deal. The fastest quad-core Phenom II X4 AMD has managed to produce, the 965 Black Edition, runs at 3.4GHz. Meanwhile, the fastest Istanbul six-core Opteron based on the same process technology only does 2.8GHz. AMD might have been able to break 3GHz by taking the same design to the desktop, but a hypothetical Istanbul-derived Phenom II would still be at a disadvantage compared to higher-clocked quad-core products. Many desktop apps don’t take advantage of more than a couple of cores, so in those cases, clock speed becomes the determining factor, and a pair of 3GHz Phenom II cores just ain’t that fast.
Intel stepped down to a whole new process technology to give us a six-core processor, the Core i7-980X Extreme, with the same clock speeds and TDP as its previous flagship, the Core i7-975 Extreme. AMD has pulled off a similar feat of clock scaling and power efficiency while staying at the same 45-nm node. We’ve seen this kind of mid-stream refinement in process tech from AMD in the past, and it has continued through the spin-off of GlobalFoundries as a separate entity. Of course, Intel still has a considerable advantage from a manufacturing perspective, since Gulftown has a 28% smaller die area than Thuban despite having about 29% more transistors, going by the estimates in the table below. Strangely, AMD PR resisted giving us an estimated transistor count for Thuban, but they did point us to the Istanbul Opteron as a point of reference. We suspect the two are essentially identical in this regard, with a few rather minor changes between steppings.
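|Code name||Key products||Cores||Threads||Last-level cache size||Process node (nm)||Estimated transistors (millions)||Die area (mm²)|
|Penryn||Core 2 Duo||2||2||6 MB||45||410||107|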
|Bloomfield||Core i7||4||8||8 MB||45||731||263|
|Lynnfield||Core i5, i7||4||8||8 MB||45||774||296|
|Westmere||Core i3, i5||2||4||4 MB||32||383||81|
|Gulftown||Core i7-980X||6||12||12 MB||32||1170||248|
|Deneb||Phenom II||4||4||6 MB||45||758||258|
|Propus/Rana||Athlon II X4/X3||4||4||512 KB x 4||45||300||169|
|Regor||Athlon II X2||2||2||1 MB x 2||45||234||118|
|Thuban||Phenom II X6||6||6||6 MB x 1||45||~904||346|
Gulftown is even smaller than the Deneb silicon inside quad-core Phenom IIs, a clear testament to Intel’s manufacturing superiority. AMD may be able to get away with larger chips and tighter margins than before, though, since it no longer needs to pay the tremendous research and development costs associated with silicon manufacturing. That responsibility now falls upon GlobalFoundries.
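For those who like to check the math, the short script below works out die-area savings and transistor density from the figures in the table above. The helper names are ours, and the Thuban transistor count is the estimate from the table, so treat the density numbers as rough.

```python
# Figures from the table above: (estimated transistors in millions, die area in mm^2).
chips = {
    "Gulftown": (1170, 248),
    "Deneb": (758, 258),
    "Thuban": (904, 346),  # transistor count is an estimate (~904M)
}


def density(name):
    """Transistors per square millimeter, in millions."""
    transistors, area = chips[name]
    return transistors / area


def area_reduction(smaller, larger):
    """Fractional die-area saving of `smaller` relative to `larger`."""
    return 1 - chips[smaller][1] / chips[larger][1]


for name in chips:
    print(f"{name}: {density(name):.2f}M transistors per mm^2")
print(f"Gulftown is {area_reduction('Gulftown', 'Thuban'):.0%} smaller than Thuban")
```

Density is a crude yardstick, since cache and logic pack very differently, but it does put a number on the gap between the 32-nm and 45-nm chips.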
As it is, Thuban looks well equipped to put AMD back into contention in the higher echelons of the desktop processor market. The big questions, of course, are how quick those Phenom II X6 CPUs actually are, and how much they cost.
The Phenom II X6 and the competition
We could be comparing AMD’s two Phenom II X6 processors to the Core i7-980X Extreme. We could tell you the AMD chips are based on older process technology and have larger dies yet slightly lower TDP ratings. We could say the 980X and the fastest Phenom II X6 have almost the same base and “turbo” clock speeds, although the Intel part has double the L3 cache, one more memory channel, and twice the thread count. We could go on.
Contrasting these two chips would be of only academic interest, however, because AMD’s pricing dictates an entirely different comparison. You see, instead of going toe-to-toe with the 980X at $999, AMD has opted to offer the faster of its two Phenom II X6 processors for $295. The slower one has an even lower bulk price: $199. As a result, we’ll largely be comparing the newcomers to Intel’s quad-core, 45-nm Core i5 and i7 products.
Before we do that, we should take a moment to introduce fully the two Phenom II X6 processors AMD has released today:
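|Model||Cores||Threads||Base core clock speed||Peak Turbo clock speed||L3 cache size||Memory channels||TDP||Price|
|Phenom II X6 1055T||6||6||2.8 GHz||3.3 GHz||6 MB||2||125W||$199|
|Phenom II X6 1090T||6||6||3.2 GHz||3.6 GHz||6 MB||2||125W||$295|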
Update 04/28: AMD originally told us the Phenom II X6 1090T would have a $285 price tag. Today, the company sent us an e-mail saying the 1090T is in fact priced at $295. We’ve updated this article, including our value section on page 15, to reflect the change.
These CPUs cap off the Phenom II family, whose quad-core models are listed below:
|Model||Cores||Threads||Base core clock speed||Peak Turbo clock speed||L3 cache size||Memory channels||TDP||Price|
|Phenom II X4 905e||4||4||2.5 GHz||N/A||6 MB||2||65W||$165|
|Phenom II X4 910e||4||4||2.6 GHz||N/A||6 MB||2||65W||$175|
|Phenom II X4 925||4||4||2.8 GHz||N/A||6 MB||2||95W||$145|
|Phenom II X4 945||4||4||3.0 GHz||N/A||6 MB||2||95W||$155|
|Phenom II X4 955||4||4||3.2 GHz||N/A||6 MB||2||125W||$165|
|Phenom II X4 965||4||4||3.4 GHz||N/A||6 MB||2||140W||$185|
As you can see, the Phenom II X6 sets a new high-water mark for both pricing and specifications in AMD’s desktop product line. Remember what we said about power envelopes earlier? The Phenom II X6 1090T can run six cores at the same speed as the quad-core Phenom II X4 955 Black Edition within the same 125W power envelope. When Turbo kicks in and pushes three of its cores to 3.6GHz, the 1090T should also deliver better performance in lightly multithreaded applications than the 965 Black Edition, which maxes out at 3.4GHz regardless of the load.
AMD has positioned both Phenom II X6 variants smack-dab in quad-core Core i5 and i7 territory, although once again, it doesn’t pretend to tackle the high end. Intel’s “Extreme” processors will probably remain unchallenged until at least the next AMD CPU generation.
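|Model||Cores||Threads||Base core clock speed||Peak Turbo clock speed||L3 cache size||Memory channels||TDP||Price|
|Core i5-750||4||4||2.66 GHz||3.20 GHz||8 MB||2||95W||$196|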
|Core i7-860||4||8||2.80 GHz||3.46 GHz||8 MB||2||95W||$284|
|Core i7-870||4||8||2.93 GHz||3.60 GHz||8 MB||2||95W||$562|
|Core i7-920||4||8||2.66 GHz||2.93 GHz||8 MB||3||130W||$284|
|Core i7-930||4||8||2.80 GHz||3.06 GHz||8 MB||3||130W||$294|
|Core i7-960||4||8||3.20 GHz||3.46 GHz||8 MB||3||130W||$562|
|Core i7-975 Extreme||4||8||3.33 GHz||3.60 GHz||8 MB||3||130W||$999|
|Core i7-980X Extreme||6||12||3.33 GHz||3.60 GHz||12 MB||3||130W||$999|
In terms of official, bulk pricing numbers, it would appear AMD has priced the Phenom II X6 1055T opposite the Core i5-750 and the 1090T against a pair of Intel offerings: the Core i7-860 and a relative newcomer, the Core i7-930, which essentially replaces the Core i7-920. Those are bold moves. The Core i5-750 in particular has displayed a unique mix of performance, power efficiency, and value, faring exceptionally well in our past value comparisons.
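The pairings fall straight out of the price lists above. As a quick illustration, the sketch below picks out the Intel parts priced within a few dollars of each X6. The $15 window is an arbitrary cutoff we chose for the example, not anything AMD or Intel has specified.

```python
# Bulk prices from the tables above, in US dollars.
intel_prices = {
    "Core i5-750": 196,
    "Core i7-860": 284,
    "Core i7-870": 562,
    "Core i7-920": 284,
    "Core i7-930": 294,
    "Core i7-960": 562,
    "Core i7-975 Extreme": 999,
    "Core i7-980X Extreme": 999,
}
amd_prices = {"Phenom II X6 1055T": 199, "Phenom II X6 1090T": 295}

PRICE_WINDOW = 15  # dollars; an arbitrary cutoff chosen for illustration


def price_rivals(price, candidates, window=PRICE_WINDOW):
    """Return the names of candidates priced within `window` dollars of `price`."""
    return sorted(name for name, p in candidates.items() if abs(p - price) <= window)


for model, price in amd_prices.items():
    print(model, "->", price_rivals(price, intel_prices))
```

The Core i7-920 also lands in the 1090T's window, which fits with the i7-930 essentially replacing it at nearly the same price.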
We have tested most of these processors in the following pages. One painful exception is the Core i7-860, which is unfortunately absent. We can still compare the Phenom II X6 1090T to the Core i7-930, and you might also want to keep in mind that the Core i7-860’s performance and power draw would likely be just a little lower than the i7-870’s.
The 890FX chipset and our motherboard
AMD’s new Phenoms arrive alongside a new chipset, the 890FX, which constitutes the new high end for AMD motherboards.
Just like the 790FX before it, the 890FX brings us generous amounts of connectivity options and oodles of PCI Express lanes. Both chipsets actually use pretty much the same north bridge: a slab of 65-nm silicon with 42 lanes of second-generation PCI Express connectivity hooked up to the processor via a 4 GT/s HyperTransport 3.0 link.
AMD pairs this north bridge with its new SB850 south bridge, which we recently reviewed as part of the 890GX chipset. The SB850 delivers six Serial ATA 6Gbps ports (all backward-compatible with previous versions of the standard, of course), 14 USB 2.0 ports, integrated Gigabit Ethernet, and 32-bit PCI connectivity. What differentiates the 890FX from the 890GX is the north-bridge component, then, which lacks integrated graphics but packs considerably more PCIe lanes than in the 890GX.
We’ll talk more about our testing setup shortly, but in case you’re wondering, we did test Thuban using an 890FX mobo: the MSI 890FXA-GD70, to be precise. As the image above attests, this board is part of a new wave of high-end Socket AM3 motherboards meant to match the Phenom II X6’s more upscale pedigree.
We’ve underclocked the Core i5-661 to 2.8GHz in order to simulate the Core i3-540. Although we did change the core clock to the proper speed, the processor’s uncore clock remained at the i5-661’s stock frequency. We believe shipping Core i3-540 processors have a 2.13GHz uncore clock, while the i5-661 has a 2.4GHz uncore clock, so our simulated processor may perform slightly better than the real item due to a higher L3 cache speed. The differences are likely to be very minor, based on our experience with Lynnfield parts (the L3 cache is incredibly fast, regardless), but we thought you should know about that possibility.
Additionally, two of our Core i7 processors, the i7-930 and i7-960, are actually an underclocked Core i7-975 Extreme, but in those cases, we’re fairly certain all of the clocks match what they should, since Bloomfield gives us a little more control over such things. In order to run the Core i7-930/960’s memory at 1333MHz, we raised the uncore clock to 2.66GHz. That comes with the territory, and I expect many Core i7-900-series owners have done the same.
Happily, we were able to simulate the Phenom II X6 1055T’s performance, including Turbo Core, quite precisely using AMD’s Overdrive utility, so we’ve included scores for it. One place where we couldn’t do so is in our Folding@Home results, since AMD’s Overdrive utility doesn’t work in Linux, where that benchmark runs.
As is our custom, we’ve omitted the simulated processor speed grades from our power consumption testing.
After consulting with our readers, we’ve decided to enable Windows’ “Balanced” power profile for the bulk of our desktop processor tests, which means power-saving features like SpeedStep and Cool’n’Quiet are operating. (In the past, we only enabled these features for power consumption testing.) Our spot checks demonstrated to us that, typically, there’s no performance penalty for enabling these features on today’s CPUs. If there is a real-world penalty to enabling these features, well, we think that’s worthy of inclusion in our measurements, since the vast majority of desktop processors these days will spend their lives with these features enabled. We did disable these power management features to measure cache latencies, but otherwise, it was unnecessary to do so.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and we reported the median of the scores produced.
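In code terms, the reporting rule is about as simple as it gets. The snippet below is a minimal sketch; the function name and the sample scores are made up for illustration.

```python
from statistics import median


def reported_score(runs):
    """Return the median of a benchmark's runs, requiring at least three."""
    if len(runs) < 3:
        raise ValueError("need at least three runs")
    return median(runs)


# Hypothetical scores from three runs of one test on one CPU.
print(reported_score([101.0, 98.5, 99.2]))
```

Using the median rather than the mean keeps a single outlier run, caused by a background task for instance, from skewing the reported number.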
Our test systems were configured like so:
|Processor||Athlon II X2 255 3.1GHz, Athlon II X3 440 3.0GHz, Athlon II X4 630 2.8GHz, Athlon II X4 635 2.9GHz, Phenom II X2 550 3.1GHz, Phenom II X4 910e 2.6GHz, Phenom II X4 965 3.4GHz||Phenom II X6 1055T 2.8GHz, Phenom II X6 1090T 3.2GHz||Pentium E6500 2.93GHz, Core 2 Duo E7600 3.06GHz, Core 2 Quad Q6600 2.4GHz||Pentium 4 670 3.8GHz||Core 2 Duo E8600 3.33GHz, Core 2 Quad Q9400 2.66GHz|
|Motherboard||Gigabyte MA785G-UD2H||MSI 890FXA-GD70||Asus P5G43T-M Pro||Asus P5G43T-M Pro||Asus P5G43T-M Pro|
|North bridge||785GX||890FX||G43 MCH||G43 MCH||G43 MCH|
|Memory size||4GB (2 DIMMs)||4GB (2 DIMMs)||4GB (2 DIMMs)||4GB (2 DIMMs)||4GB (2 DIMMs)|
|Memory speed||1333 MHz||1333 MHz||1066 MHz||800 MHz||1333 MHz|
|Memory timings||8-8-8-20 2T||8-8-8-20 2T||7-7-7-20 2T||7-7-7-20 2T||8-8-8-20 2T|
|Chipset drivers||–||AMD AHCI 126.96.36.199||INF update 188.8.131.520, Rapid Storage Technology 184.108.40.2067||INF update 220.127.116.110, Rapid Storage Technology 18.104.22.1687||INF update 22.214.171.1240, Rapid Storage Technology 126.96.36.1997|
|Audio||SB750/ALC889A with Realtek 188.8.131.5295 drivers||SB850/ALC892 with Microsoft 6.1.7600.16385 drivers||ICH10R/ALC887 with Realtek 184.108.40.20695 drivers||ICH10R/ALC887 with Realtek 220.127.116.1195 drivers||ICH10R/ALC887 with Realtek 18.104.22.16895 drivers|
|Processor||Core i5-750 2.66GHz, Core i7-870 2.93GHz||Core i3-530 2.93GHz, Core i3-540 3.06GHz, Core i5-661 3.33GHz||Core i7-920 2.66GHz||Core i7-930 2.8GHz, Core i7-960 3.2GHz, Core i7-975 Extreme 3.33GHz, Core i7-980X Extreme 3.33GHz|
|Motherboard||Gigabyte P55A-UD6||Asus P7H57D-V EVO||Gigabyte EX58-UD3R||Gigabyte X58A-UD5R|
|North bridge||P55 PCH||H57 PCH||X58 IOH||X58 IOH|
|Memory size||4GB (2 DIMMs)||4GB (2 DIMMs)||6GB (3 DIMMs)||6GB (3 DIMMs)|
|Memory speed||1333 MHz||1333 MHz||1066 MHz||1333 MHz|
|Memory timings||8-8-8-20 2T||8-8-8-20 2T||7-7-7-20 2T||8-8-8-20 2T|
|Chipset drivers||INF update 22.214.171.1240, Rapid Storage Technology 126.96.36.1997||INF update 188.8.131.520, Rapid Storage Technology 184.108.40.2067||INF update 220.127.116.110, Rapid Storage Technology 18.104.22.1687||INF update 22.214.171.1240, Rapid Storage Technology 126.96.36.1997|
|Audio||P55 PCH/ALC889 with Realtek 188.8.131.5295 drivers||H57 PCH/ALC889 with Realtek 184.108.40.20695 drivers||ICH10R/ALC888 with Realtek 220.127.116.1195 drivers||ICH10R/ALC889 with Realtek 18.104.22.16895 drivers|
They all shared the following common elements:
|Hard drive||WD RE3 WD1002FBYS 1TB SATA|
|Discrete graphics||Asus ENGTX260 TOP SP216 (GeForce GTX 260) with ForceWare 195.62 drivers|
|OS||Windows 7 Ultimate x64 Edition RTM|
|OS updates||DirectX August 2009 update|
|Power supply||PC Power & Cooling Silencer 610 Watt|
We’d like to thank Asus, Corsair, Gigabyte, OCZ, and WD for helping to outfit our test rigs with some of the finest hardware available. Thanks to Intel and AMD for providing the processors, as well, of course.
The test systems’ Windows desktops were set at 1600×1200 in 32-bit color. Vertical refresh sync (vsync) was disabled in the graphics driver control panel.
We used the following versions of our test applications:
- SiSoft Sandra 2010.1.16.11
- Stream 5.8 64-bit
- CPU-Z 1.52.2
- WorldBench 6 Gold
- Left 4 Dead 2 22.214.171.124
- Modern Warfare 2
- Valve VRAD map build benchmark
- Valve Source Engine particle simulation benchmark
- Cinebench R10 64-bit Edition
- POV-Ray for Windows 3.7 beta 34 64-bit
- picCOLOR 4.0 build 677 64-bit
- 7-Zip 4.65 64-bit
- TrueCrypt 6.3a
- notfred’s Folding benchmark CD generated 8/25/09
- The Panorama Factory 5.3 x64 Edition
- Windows Live Movie Maker 14
- x264 HD benchmark 3.0
- LAME MT 3.97a 64-bit
- ArcSoft Total Media Theatre 126.96.36.199
The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Power consumption and efficiency
For these tests, we used an Extech 380803 power meter to capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system: the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (The monitor was plugged into a separate outlet.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.
We’ll start with the show-your-work stuff, plots of the raw power consumption readings. We’ve broken things down by socket type in order to keep them manageable. Please note that, because our Asus H57 motherboard tends to draw more power than we’d like, we’ve tested power consumption for the Core i3-530 and the Core i5-661 on our P55 mobo, instead. And since we switched to an 890FX board for our Phenom II X6 testing, we went back and re-tested the Phenom II X4 965 on that same motherboard, to give us a direct comparison between the X4 and X6.
We can slice up these raw data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render. Next, we can look at peak power draw by taking an average from the ten-second span from 15 to 25 seconds into our test period, when the processors were rendering.
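Here is roughly how those two numbers fall out of a power trace. This is a sketch under stated assumptions: we assume one reading per second, the ten-sample idle window is our own choice for the example, and the trace itself is synthetic rather than measured.

```python
from statistics import mean


def peak_power(samples, start=15, end=25):
    """Average draw (watts) over the ten-second span from 15 to 25 seconds."""
    return mean(samples[start:end])


def idle_power(samples, tail=10):
    """Average draw (watts) over the trailing edge of the test period."""
    return mean(samples[-tail:])


# Synthetic trace, one reading per second: idle, a 40-second render, idle again.
trace = [80] * 5 + [180] * 40 + [80] * 15

print(peak_power(trace), idle_power(trace))
```

Averaging over a window rather than taking a single reading smooths out the second-to-second jitter that shows up in wall-socket measurements.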
The X6 1090T draws only a little more power under load than a Phenom II X4 965 on the same motherboard, and the X6’s power consumption at idle isn’t much higher than the X4’s, either, despite the presence of two more cores and a heckuva lot more transistors. Against Intel, the competitive situation is mixed. The 1090T’s power draw is comparable to that of the Core i7-920, the closest analog we have in our power tests that’s based on the X58 chipset. However, the P55-based Core i7-870 draws substantially less power than the 1090T.
We can highlight power efficiency by looking at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules. (In this case, to keep things manageable, we’re using kilojoules.)
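The conversion is simple enough to show in a few lines. The sketch below assumes one power reading per second, so each reading in watts counts as one watt-second, and it uses a synthetic trace rather than our measured data.

```python
def total_energy_kj(samples, interval_s=1.0):
    """Sum power readings (watts) into energy over the whole span, in kilojoules."""
    joules = sum(watts * interval_s for watts in samples)
    return joules / 1000


# Synthetic trace, one reading per second: idle, a 40-second render, idle again.
trace = [80] * 5 + [180] * 40 + [80] * 15

print(total_energy_kj(trace))  # 8800 watt-seconds, or 8.8 kJ
```

Because the span is fixed, a system that finishes the render sooner simply spends more of the period at idle, which is why idle draw weighs so heavily in this measure.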
Relatively high power draw at idle contributes to a shaky showing for the X6 1090T here. The Core i7-920 is in the same boat, though, as are the other processors based on Intel’s X58 platform. The lower-end Intel platforms, including the older Core 2 parts, fare much better.
We can pinpoint efficiency more effectively by considering the amount of energy used for the task. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
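A quick worked example shows why render time matters as much as wattage. The two systems below are hypothetical, with made-up power and time figures, and the function name is ours.

```python
def task_energy_kj(avg_watts, render_seconds):
    """Energy used during the render alone, in kilojoules."""
    return avg_watts * render_seconds / 1000


# Hypothetical systems: one draws more power but finishes sooner.
fast_and_hungry = task_energy_kj(avg_watts=200, render_seconds=30)
slow_and_frugal = task_energy_kj(avg_watts=150, render_seconds=50)

print(fast_and_hungry, slow_and_frugal)  # 6.0 kJ versus 7.5 kJ
```

In this made-up case, the system with the higher power draw uses less energy for the job because it spends 20 fewer seconds rendering.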
The 1090T’s six cores render this scene very efficiently, finishing quickly enough to put the 1090T near the top of the standings. We don’t have power data for the most directly comparable Intel parts, but the Core i7-920 is very close to the 1090T, and the i7-930 would likely be in the same ballpark.
Meanwhile, the Core i7-870 demonstrates the amazing efficiency of fully Hyper-Threaded Lynnfield processors on the P55 platform. The thing is, there’s a little room for platform-level improvement on the AMD side, as well, as demonstrated by the superior efficiency of the Phenom II X4 965 when tested on an 890GX motherboard rather than the 890FX.
Memory subsystem performance
Now that we’ve considered power efficiency, we’ll move on to our performance results, beginning with some synthetic tests of the CPUs’ memory subsystems. These results don’t track directly with real-world performance, but they do give us some insights into the CPU and system architectures involved. For this first test, the graph is pretty crowded. We’ve tried to be selective, generally only choosing one representative from each architecture. This test is multithreaded, so more cores, with associated L1 and L2 caches, can lead to higher throughput.
The additional cores grant the X6 a straightforward increase in L1 and L2 cache bandwidth. Interestingly, because AMD’s caches are exclusive (that is, the lower-level caches don’t replicate data in the higher-level caches), the total effective cache size on Thuban chips is rather considerable. The L1 data, L2, and L3 caches total up to about 9.4MB. That’s effectively larger, and generally faster, than the Core i7-930’s inclusive cache hierarchy with an 8MB L3. Still, Thuban falls short in cache throughput and total size compared to the 32-nm Core i7-980X, which has six cores of its own and a 12MB last-level cache.
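The 9.4MB figure is easy to reconstruct. The per-core cache sizes below (64KB of L1 data cache and 512KB of L2 per core) are the standard figures for this core design rather than numbers stated in this article, so consider them an assumption of the sketch.

```python
CORES = 6
L1D_KB = 64        # L1 data cache per core (assumed; not stated in this article)
L2_KB = 512        # L2 cache per core
L3_KB = 6 * 1024   # shared L3 cache, 6MB

# With an exclusive hierarchy, the levels hold distinct data, so capacities add up.
total_kb = CORES * (L1D_KB + L2_KB) + L3_KB
total_mb = total_kb / 1024

print(f"{total_kb} KB = {total_mb:.3f} MB")
```

That works out to 9600KB, or 9.375MB, which rounds to the 9.4MB cited above. An inclusive hierarchy can't claim the same sum, since its last-level cache duplicates the contents of the levels below it.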
This graph becomes almost impossible to read once we get to the larger block sizes, where we’re really measuring main memory bandwidth. Stream is a better test of that particular attribute.
The X6 chips post a slight but consistent performance increase in the Add and Triad tests compared to older Phenom II processors. That’s likely the result of some tweaks AMD made to the memory controller in Thuban. The gain is not the result of Thuban’s additional cores, because the X6 chips performed best here with only four threads running; those are the results we’ve reported.
We’ve included these results for the sake of completeness, but we’ll admit up front that they may be iffy. This test produces results in CPU cycles, and we convert those numbers to nanoseconds based on clock speeds. Trouble is, clock speeds are no longer static, even though we disable SpeedStep and Cool’n’Quiet for this particular benchmark. Our assumption is that the CPUs will reach their respective turbo peaks during this simple, single-threaded test. However, they may not be doing so in every case. If you assume the X6 1090T is running at its base frequency, it would be at a more pedestrian 50 ns here, not 44. The X6 1055T would be at 57 ns. We’re not sure which is the right answer, and we may have to start disabling turbo in order to conduct these tests.
Interestingly, we measured Thuban’s L3 cache latency at 52 cycles, a little lower than the 57 cycles we saw with the Phenom II X4 965.
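The cycles-to-nanoseconds conversion behind those memory latency figures is worth spelling out, since the clock speed we plug in changes the answer. The helper names below are ours, and the only inputs are the 1090T's 44 ns result and its 3.6GHz and 3.2GHz clocks.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in CPU cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz):
    """Convert a latency in nanoseconds back to CPU cycles."""
    return ns * clock_ghz


# The 44 ns figure assumes the 1090T sat at its 3.6GHz Turbo Core peak.
measured_cycles = ns_to_cycles(44, 3.6)

# The same cycle count, if the chip was really running at its 3.2GHz base clock.
at_base_ns = cycles_to_ns(measured_cycles, 3.2)

print(f"{measured_cycles:.1f} cycles -> {at_base_ns:.1f} ns at 3.2GHz")
```

The same measurement reads as roughly 158 cycles either way; it is only the translation into nanoseconds that moves from 44 ns to about 49.5 ns, which we rounded to 50.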
This is my favorite game in a long, long time, so I had to use it in our latest CPU test suite. Borderlands is based on Unreal Engine technology and includes a built-in speed test, which we used here. We tested with the game set to its highest quality settings at a range of resolutions. The results from the lowest resolutions will highlight the separation between the CPUs best, so I’d pay the most attention to them. The higher resolution results demonstrate what happens when the GeForce GTX 260 graphics card begins to restrict frame rates.
The addition of two more cores and Turbo Core’s higher frequencies don’t seem to offer much benefit here. The X6 1090T runs slightly behind the X4 965 at lower resolutions, and the closest competition from Intel is clearly faster.
Modern Warfare 2
With Modern Warfare 2, we used FRAPS to record frame rates over the course of a 60-second gameplay session. We conducted this gameplay session five times on each CPU and have reported the median score from each processor. We’ve also graphed the frame rates from a single, representative session for each. We tested this game at a relatively low 1024×768 resolution, with no AA, but otherwise using the highest in-game visual quality settings.
AMD’s new entrants don’t do any better in Modern Warfare 2, although here, Intel’s success matters little. Even sub-$100 dual-core processors can crank out a minimum frame rate of about 60 per second, which matches the maximum most LCD monitors can deliver. Processor performance should be the least of your worries in cross-platform titles like these.
Left 4 Dead 2
We tested Left 4 Dead 2 by playing back a custom demo using the game’s timedemo function. Again, we had all of the image quality options cranked, and we tested with 16X anisotropic filtering and 4X antialiasing. The game’s multi-core rendering option was, of course, enabled.
The 1090T consistently trails the X4 965 by a small amount. That suggests the 1090T is generally operating at its 3.2GHz base clock speed in this game, 200MHz shy of the X4 965’s speed. Why is that? For a clue, we can turn to the lower-end AMD processors. Note that the Athlon II X4 635, with four cores at 2.9GHz, outperforms the Phenom II X2 550, with two cores at 3.1GHz (and more cache). That suggests Left 4 Dead 2 is making good use of multithreading, and the presence of four robust threads may be keeping the Phenom II X6 chips from ranging into their Turbo Core frequencies.
In fact, the X6 1090T has yet to best the X4 965 in any of our gaming tests. That’s not a major upset, because newer games running in Windows 7 do seem to take advantage of threading pretty well. We’ll have to keep an eye on this question in other lightly multithreaded benchmarks as we go.
Source engine particle simulation
Next up is a test we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up some benchmarks to demonstrate the benefits of multithreading.
This test runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.
Our CPUs are finally put to good use by Valve’s particle simulation test, which shows a clear progression from our cheaper, slower test subjects to their quicker siblings. The Phenom II X6 1055T finds itself almost neck-and-neck with the Core i5-750, its intended target, while the 1090T doesn’t quite manage to catch up to the i7-870 and i7-930.
We have, for quite some time now, used WorldBench in our CPU tests. Over that time, we’ve found that some of WorldBench’s tests can be rather temperamental and may refuse to run periodically. We’ve also found that some of the same tests tend to have inconsistent results that aren’t always influenced much by processor performance. Other applications in WorldBench 6, like the Windows Media Encoder 9 test, make little or no use of multithreading, despite the fact that such applications are typically nicely multithreaded these days. As a result, we’ve decided to limit our use of WorldBench to a selection of its applications, rather than the full suite.
MS Office productivity
Firefox web browsing
Multitasking – Firefox and Windows Media Encoder
The two new Phenom IIs return to their middle-of-the-pack positions in these three benchmarks. The quad-core Phenom II X4 965 outruns its two successors in both Firefox and the multitasking test. Somehow, even the combination of two more cores and Turbo Core isn’t enough to put the 1090T ahead of its predecessor.
File compression and encryption
7-Zip file compression and decompression
We return to more widely multithreaded tasks here, and Thuban shines appropriately, either matching or soundly beating its intended rivals from the Intel camp.
WinZip file compression
The older version of WinZip embedded in WorldBench uses either one or two threads, clearly no more than that. Fortunately, this gives us a chance to see Turbo Core in action, as the 1090T solidly improves on the X4 965’s performance, though it can’t catch the competing Intel offerings.
TrueCrypt disk encryption
Here’s a new addition at our readers’ request. This full-disk encryption suite includes a performance test, for obvious reasons. We tested with a 50MB buffer size and, because the benchmark spits out a lot of data, averaged and summarized the results in a couple of different ways.
How about that. Six cores rule in TrueCrypt, where the Phenom II X6s are second only to the Core i7-980X Extreme… despite costing a fraction of the price. We suspect the picture will change here once TrueCrypt incorporates support for the encryption-specific acceleration instructions built into Intel’s 32-nm processors, but we’re still awaiting a newer revision of this software.
The Panorama Factory photo stitching
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs.
In the past, we’ve added up the time taken by all of the different elements of the panorama creation wizard and reported that number, along with detailed results for each operation. However, doing so is incredibly data-input-intensive, and the process tends to be dominated by a single, long operation: the stitch. So this time around, we’ve simply decided to report the stitch time, which saves us a lot of work and still gets at the heart of the matter.
The Phenom II X6 handles panorama stitching nicely, as the AMD chips pull well ahead of their most direct competitors.
picCOLOR image processing and analysis
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including SSE extensions, multiple cores, and Hyper-Threading. Many of its individual functions are multithreaded.
Recently, at our request, Dr. Müller graciously agreed to re-tool his picCOLOR benchmark to incorporate some real-world usage scenarios. As a result, we now have four new tests that employ picCOLOR for image analysis. I’ve included explanations of each test from Dr. Müller below.
Particle Image Velocimetry (PIV) is being used for flow measurement in air and water. The medium (air or water) is seeded with tiny particles (1..5um diameter, smoke or oil fog in air, titanium dioxide in water). The tiny particles will follow the flow more or less exactly, except may be in very strong sonic shocks or extremely strong vortices. Now, two images are taken within a very short time interval, for instance 1us. Illumination is a very thin laser light sheet. Image resolution is 1280×1024 pixels. The particles will have moved a little with the flow in the short time interval and the resulting displacement of each particle gives information on the local flow speed and direction. The calculation is done with cross-correlation in small sub-windows (32×32, or 64×64 pixel) with some overlap. Each sub-window will produce a displacement vector that tells us everything about flow speed and direction. The calculation can easily be done multithreaded and is implemented in picCOLOR with up to 8 threads and more on request.
Real Time 3D Object Tracking is used for tracking of airplane wing and helicopter blade deflection and deformation in wind tunnel tests. Especially for comparison with numerical simulations, the exact deformation of a wing has to be known. An important application for high speed tracking is the testing of wing flutter, a very dangerous phenomenon. Here, a measurement frequency of 1000Hz and more is required to solve the complex and possibly disastrous motion of an aircraft wing. The function first tracks the objects in 2 images using small recognizable markers on the wing and a stereo camera set-up. Then, a 3D-reconstruction follows in real time using matrix conversions. . . . This test is single threaded, but will be converted to 3 threads in the future.
Multi Barcodes: With this test, several different bar codes are searched on a large image (3200×4400 pixel). These codes are simple 2D codes, EAN13 (=UPC) and 2 of 5. They can be in any rotation and can be extremely fine (down to 1.5 pixel for the thinnest lines). To find the bar codes, the test uses several filters (some of them multithreaded). The bar code edge processing is single threaded, though.
Label Recognition/Rotation is being used as an important pre-processing step for character reading (OCR). For this test in the large bar code image all possible labels are detected and rotated to zero degree text rotation. In a real application, these rotated labels would now be transferred to an OCR-program – there are several good programs available on the market. But all these programs can only accept text in zero degree position. The test uses morphology and different filters (some of them multithreaded) to detect the labels and simple character detection functions to locate the text and to determine the rotational angle of the text. . . . This test uses Rotation in the last important step, which is fully multithreaded with up to 8 threads.
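For the curious, the cross-correlation step in Dr. Müller's PIV description can be sketched in a few lines. This is only a minimal illustration under our own assumptions, not picCOLOR's code: the function names, the tiny 8×8 windows, and the brute-force search over shifts are all hypothetical stand-ins (production PIV code typically works on 32×32 or 64×64 windows and uses faster, normalized correlation).

```python
def make_window(particles, size=8):
    """Build a small square sub-window with bright pixels at the given (y, x) spots."""
    window = [[0.0] * size for _ in range(size)]
    for y, x in particles:
        window[y][x] = 1.0
    return window


def displacement(window_a, window_b, max_shift=3):
    """Return the (dy, dx) shift that best maps window_a onto window_b.

    Brute-force cross-correlation: for every candidate shift, sum the products
    of overlapping pixels and keep the shift with the largest sum.
    """
    rows, cols = len(window_a), len(window_a[0])
    best_score, best_shift = float("-inf"), (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(rows):
                for x in range(cols):
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < rows and 0 <= x2 < cols:
                        score += window_a[y][x] * window_b[y2][x2]
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


# Three made-up particles that all move one pixel down and two pixels right.
first = make_window([(2, 2), (4, 3), (5, 5)])
second = make_window([(3, 4), (5, 5), (6, 7)])
print(displacement(first, second))  # prints (1, 2)
```

Each sub-window is independent of the others, which is why this sort of work spreads so easily across threads.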
The X6’s strong showings in the first two tests are tempered by relatively weak performance in the final two tasks, where even some low-end Intel processors are quicker.
picCOLOR’s synthetic tests put the 1090T ahead of the X4 965 by just a smidgen, but even the Core i5-750 is markedly faster.
Media encoding and editing
x264 HD benchmark
This benchmark tests one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark.
Thuban’s six cores excel here. The Phenom II X6 1055T matches the Core i5-750 in the first pass and beats it in the second pass, while the 1090T Black Edition keeps up with the much pricier Core i7-870.
Windows Live Movie Maker 14 video encoding
For this test, we used Windows Live Movie Maker to transcode a 30-minute TV show, recorded in 720p .wtv format on my Windows 7 Media Center system, into a 320×240 WMV-format video appropriate for mobile devices.
Surprisingly, Microsoft’s consumer video encoder for Windows doesn’t appear to take advantage of more than four threads. Without additional threading, the X6 1090T needs more time to encode this clip than the X4 965 does. The six-core Core i7-980X is in the same boat, unable to surpass the quad-core i7-975.
LAME MT audio encoding
LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors.
Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
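To make the pipelining idea concrete, here is a rough two-stage sketch. It is not LAME MT's code: `analyze` and `encode` are hypothetical stand-ins for the psycho-acoustic analysis and the rest of the encoder, and the one-frame buffer size is our reading of the description above. In standard CPython the global interpreter lock keeps these threads from truly running in parallel, so treat this as a picture of the structure rather than a speed demonstration.

```python
import queue
import threading


def analyze(frame):
    # Hypothetical stand-in for psycho-acoustic analysis: find the peak amplitude.
    return max(abs(sample) for sample in frame)


def encode(frame, analysis):
    # Hypothetical stand-in for the rest of the encoder: scale samples by the peak.
    return [round(sample / analysis, 3) if analysis else 0.0 for sample in frame]


def pipeline_encode(frames, lookahead=1):
    """Run analysis on one thread and encoding on another, linked by a bounded buffer."""
    handoff = queue.Queue(maxsize=lookahead)  # analysis may run this many frames ahead
    output = []

    def analysis_stage():
        for frame in frames:
            handoff.put((frame, analyze(frame)))
        handoff.put(None)  # sentinel: no more frames

    def encode_stage():
        while True:
            item = handoff.get()
            if item is None:
                break
            frame, analysis = item
            output.append(encode(frame, analysis))

    stages = [threading.Thread(target=analysis_stage),
              threading.Thread(target=encode_stage)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return output


print(pipeline_encode([[1, -2], [4, 2]]))  # prints [[0.5, -1.0], [1.0, 0.5]]
```

With only two stages in the pipe, there is never work for a third core, no matter how many the CPU offers.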
We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.
Not much to write home about there. The 1090T is one second quicker than the X4 965 in the single-threaded tests, giving us a small taste of a Turbo Core advantage.
3D modeling and rendering
The Cinebench benchmark is based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores (or threads, in CPUs with multiple hardware threads per core) are available.
Here, the X6’s six cores, each with formidable floating-point computing power, trump four Intel cores. Both X6 parts outperform their like-priced rivals, and the 1090T also scores higher than the X4 965 in the single-threaded test thanks to some help from Turbo Core.
We’re using the latest beta version of POV-Ray 3.7 that includes native multithreading and 64-bit support.
The X6 chips continue their strong showing in POV-Ray, especially in the chess2 scene, where all six cores are free to go to town. The situation with the benchmark scene is a little more complicated, because it includes a long, single-threaded operation followed by a multithreaded rendering stage. The processors with a turbo feature should adapt well to both stages.
3ds max rendering
Valve VRAD map compilation
This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to pre-compute lighting that goes into games like Half-Life 2.
In our final two rendering tests, the Core i7-930 just beats the Phenom II X6 1090T, while the Core i5-750 trails the 1055T.
Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.
The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.
notfred’s Folding Benchmark CD tests the most common work unit types and estimates the number of points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.
On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
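The arithmetic behind that estimate is simple enough to show directly. The sketch below is our own illustration, not notfred's script; the function name is hypothetical, and the points-per-day figures are invented purely to show the calculation.

```python
def estimate_total_ppd(ppd_by_wu_type, core_count):
    """Average the per-core points per day across WU types, then scale by core count."""
    average_ppd = sum(ppd_by_wu_type.values()) / len(ppd_by_wu_type)
    return average_ppd * core_count


# Made-up per-core results for four WU types (not measured data).
sample = {"Tinker": 100, "Amber": 120, "Gromacs": 300, "fourth WU type": 280}
print(estimate_total_ppd(sample, core_count=6))  # prints 1200.0
```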
This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested.
We have, in the past, included results for multiple WU types, but given the fact that per-core performance results are distorted when Hyper-Threading allows multiple threads to be run simultaneously, we’ve decided simply to report the overall score this time.
If you’re into Folding, the Phenom II X6 1090T looks like a very solid choice, with a small advantage over the Core i7-930 in overall points per day across the four types of work units. Our simulated 1055T couldn’t play here, since this benchmark runs in Linux and AMD’s Overdrive utility runs in Windows.
Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He has provided us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of protein. I’ll stop right here and let him explain what MyriMatch does:
In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.
In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database.
MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
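The job-queue scheme described above can be sketched roughly as follows. This is not MyriMatch's source: the function names are hypothetical, `handle_protein` is a placeholder for the real matching work, and rounding the job size up is our assumption for reproducing the 168-sequence figure.

```python
import math
import queue
import threading


def split_into_jobs(proteins, thread_count, jobs_per_thread=10):
    """Cut the database into roughly thread_count * jobs_per_thread equal jobs."""
    job_count = thread_count * jobs_per_thread
    job_size = math.ceil(len(proteins) / job_count)  # assumption: round up
    return [proteins[i:i + job_size] for i in range(0, len(proteins), job_size)]


def run_jobs(jobs, thread_count, handle_protein):
    """Each worker takes a whole job from the queue, finishes it, then asks for another."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this thread is done
            handled = [handle_protein(protein) for protein in job]
            with results_lock:  # one synchronization point per job, not per protein
                results.extend(handled)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


database = list(range(6714))  # stand-in for the 6714 yeast proteins
jobs = split_into_jobs(database, thread_count=4)
print(len(jobs), len(jobs[0]))  # prints 40 168
```

Handing out work in job-sized chunks means threads touch the shared queue only 40 times in total here, while having ten jobs per thread keeps any one thread from sitting idle for long at the end of the run.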
The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we’ve tested with one to eight threads.
I should mention that performance scaling in MyriMatch tends to be limited by several factors, including memory bandwidth, as David explains:
Inefficiencies in scaling occur from a variety of sources. First, each thread is comparing to a common collection of tandem mass spectra in memory. Although most peptides will be compared to different spectra within the collection, sometimes multiple threads attempt to compare to the same spectra simultaneously, necessitating a mutex mechanism for each spectrum. Second, the number of spectra in memory far exceeds the capacity of processor caches, and so the memory controller gets a fair workout during execution.
Here’s how the processors performed.
Two facets of Intel’s architecture, Hyper-Threading and an excellent memory subsystem, help grant the Core i7 processors the lead here. Something interesting to note: with six threads active, the Core i7-930 finishes in 78 seconds, two seconds behind the Phenom II X6 1090T. Thing is, the X6 is maxed out and doesn’t benefit from spinning off extra threads, while the i7-930 shaves off additional time when going to eight threads across its four cores.
STARS Euler3d computational fluid dynamics
Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here.
In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:
The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.
The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.
So the higher the score, the faster the computer. Charles tells me these CFD solvers are very floating-point intensive, but they’re oftentimes limited primarily by memory bandwidth. He has modified the benchmark for us in order to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.
Although this is a very different sort of application, these results play out similarly to our MyriMatch scores above. This time, however, the X6 1090T can’t even match the Core i5-750. Meanwhile, the six-core Gulftown chip is nearly twice as fast as Thuban. Sobering.
The X6 1090T is a Black Edition processor, so its multiplier is unlocked for easy overclocking. Combine that with Turbo Core and AMD’s Windows-based Overdrive tweaking utility, and you have utter, total control over the way this thing operates. AMD exposes all of the knobs and dials for Turbo Core right in Overdrive, so you can define how many cores will Turbo up and how far they’ll go.
The utility even lets the user choose the peak voltage used when cores range into Turbo territory. And yes, all of this overclocking goodness works seamlessly in conjunction with Cool’n’Quiet, for lower power draw at idle. As always, Overdrive offers control over the regular, non-Turbo clock multiplier and CPU voltage, too, so you can brew up your own cocktail of excess for both lightly and heavily multithreaded workloads.
Here’s a look at the Overdrive monitoring page in action during one of our overclocking attempts. The cores’ clock speeds range between 3.2GHz and 3.6GHz, and oddly enough, the CPU voltages appear to vary from one core to the next. (For what it’s worth, I believe the Phenom II has a single voltage plane for all cores, and I’d chalk up the variance here to monitoring lag in a very dynamic situation.)
Faced with the prospect of getting to play with all of these knobs and dials, we took the rare step of forsaking our beloved BIOS-based overclocking methods. Via Overdrive, we created a custom profile for our Phenom II X6 1090T that had a base clock of 3.9GHz at 1.4V and a Turbo Core clock of 4.3GHz at 1.525V. I’m not sure what to make of this, but that config seemed to be optimal. Although we were using a beefy tower cooler that generally kept CPU temperatures in check, raising the base clock for all six cores to 4GHz just didn’t work out well for us. One of our test apps would crash or the system would lock, despite raising the base CPU voltage as far as 1.525V. Similarly, taking the Turbo Core max to 4.4GHz wasn’t stable, even if we pushed the Turbo voltage up to 1.55V. Perhaps with a little more tweaking, we could have hit 4.3GHz stable across all six cores, or a larger subset than the Turbo Core default of three, but we devoted an awful lot of time just getting as far as we did.
Here’s how the X6 1090T performs at those considerable clock rates:
It’s faster than a Core i7-975 Extreme, which is pretty darned good. Still, that’s a fair distance from the monstrous Core i7-980X.
What about power consumption at this speed and voltage?
Since all of its power-saving mojo is intact and working properly, the overclocked 1090T draws no more power at idle than its stock-clocked self. Peak power draw is up considerably, well above the Core i7-975 Extreme, but not quite to the dizzying 322W zenith that the overclocked Core i7-980X system reaches. With good aftermarket cooling, you may find a decent amount of headroom in the Phenom II X6, as we did in our sample.
The value proposition
Now that we’ve buried you under mounds of information, what can we make of it all? One way to filter the information is to consider the value proposition for each CPU model. Exercises like this one are inherently fraught with various, scary dangers (giving the wrong impression, committing bad math, overemphasizing price, coming off as irredeemably cheesy), but our value comparisons have proven to be popular over time, so we’ve taken another crack at it.
What we’ve done is mash up all of our performance data in one, big summary value for each processor. The performance data for each benchmark was converted to a percentage using the Pentium 4 670 as the baseline. We’ve included nearly every benchmark we used in our overall index, with the exception of the purely synthetic tests like Stream. We excluded MyriMatch, Euler3D, and Folding@Home, since not all processors were tested in those benchmarks. In cases where the benchmarks had multiple components, we used an overall mean rather than including every component score individually. Each benchmark should thus be represented and weighted equally in the final tally. (The one case where we didn’t average together a single application’s output was WorldBench’s two 3ds max tests, since one measures 3D modeling performance and the other rendering.)
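For readers who want to see the mechanics, an index of this kind can be computed roughly like so. Take this as a sketch under stated assumptions rather than our actual spreadsheet: the function names are hypothetical, the arithmetic mean and the inversion of time-based results (where lower is better) are assumptions, and every number below is made up.

```python
from statistics import mean


def percent_of_baseline(score, baseline, higher_is_better=True):
    """Express a result as a percentage of the baseline CPU's result."""
    ratio = score / baseline if higher_is_better else baseline / score
    return 100.0 * ratio


def overall_index(results, baseline_results, higher_is_better):
    """Average components within each benchmark, then average across benchmarks."""
    per_benchmark = []
    for name, components in results.items():
        percents = [
            percent_of_baseline(value, baseline_results[name][part], higher_is_better[name])
            for part, value in components.items()
        ]
        per_benchmark.append(mean(percents))  # one equally weighted entry per benchmark
    return mean(per_benchmark)


# Invented numbers: a two-part benchmark in frames per second and a timed one in seconds.
chip = {"encode": {"pass 1": 60.0, "pass 2": 20.0}, "stitch": {"time": 25.0}}
baseline = {"encode": {"pass 1": 20.0, "pass 2": 5.0}, "stitch": {"time": 100.0}}
direction = {"encode": True, "stitch": False}
print(overall_index(chip, baseline, direction))  # prints 375.0
```

Averaging within a benchmark first is what keeps a test with many sub-scores from outweighing a test that reports a single number.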
This overall performance index makes us a little bit wary, because it’s simply a mash-up of results from various tests, rather than an index carefully weighted to express a certain set of priorities. Still, our test suite itself is intended to cover the general desktop PC’s usage model, so the index ought to suffice for this exercise.
We then took prices for each CPU from the official Intel and AMD price lists. For our historical comparison, we’ve also included the Core 2 Quad Q6600 and the Pentium 4 670 in a couple of places at their initial launch prices.
If we simply take overall performance and divide by price, we get results that look like this:
By this measure, you should almost always buy one of the cheapest CPUs on the market. This bar chart gives us a strong sense of value, but it may focus our attention a little too exclusively on CPU prices alone. For many of us, time is money, and faster computer hardware is relatively inexpensive. What we really want to know is where we can find the best combination of price and performance for our needs. To give us a better visual sense of that, we’ve devised our nefarious scatter plots.
The faster a processor is, the higher on the chart it will be. The cheaper it is, the closer to the left edge. The better values, then, tend to be closer to the top-left corner of the plot. If you wish, you can find your price range and look for the best performer in that area.
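One way to formalize the "closer to the top-left" idea is to set aside any chip that another chip beats on both axes at once. The sketch below is our own framing, not a method used to build the plots; the function names are hypothetical, and the three chips and their numbers are invented.

```python
def performance_per_dollar(index, price):
    """The simple value ratio: overall performance index divided by price."""
    return index / price


def undominated(chips):
    """Return the chips that no other chip matches or beats on both price and performance."""
    keep = []
    for name, (price, perf) in chips.items():
        dominated = any(
            other_price <= price and other_perf >= perf
            and (other_price < price or other_perf > perf)
            for other, (other_price, other_perf) in chips.items()
            if other != name
        )
        if not dominated:
            keep.append(name)
    return keep


# Invented (price in dollars, performance index) pairs.
chips = {"chip A": (100, 200.0), "chip B": (200, 300.0), "chip C": (250, 280.0)}
print(undominated(chips))  # prints ['chip A', 'chip B']
print(performance_per_dollar(*reversed(chips["chip B"])))  # prints 1.5
```

Here chip C drops out because chip B is both cheaper and faster, while A and B both survive: choosing between them depends on how much the extra performance is worth to you, which is exactly the judgment the scatter plot leaves to the reader.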
AMD’s new Phenom II X6 processors fare rather well here. The X6 1055T eclipses its closest rival, the Core i5-750. Meanwhile, the 1090T falls between the Core i7-930 and the much pricier Core i7-870 on both the performance and price axes.
That gets us closer to the heart of the matter, but in reality, the price of a processor is just one component of a PC’s total cost, and the various platforms do have some price disparities between them. Echoing our last CPU roundup, we fashioned some sample systems loosely based on the Utility Player build in our latest system guide for each platform type. Our goal was to achieve rough parity by selecting full-sized ATX motherboards with similar, enthusiast-friendly feature sets. Here are the components we picked for the different platforms, along with system prices:
|Platform|Total price|Motherboard|Memory|
|AMD 890GX|$656.94|Asus M4A89GTD Pro|4GB Kingston DDR3-1333|
|Intel P45|$656.94|Gigabyte GA-EP45T-USB3P|4GB Kingston DDR3-1333|
|Intel P55|$649.94|Gigabyte GA-P55-UD3|4GB Kingston DDR3-1333|
|Intel X58|$789.94|Gigabyte GA-X58A-UD3R|6GB OCZ DDR3-1600|
Common components (all platforms): XFX Radeon HD 5770 1GB graphics card ($159.99), Western Digital Caviar Black 1TB hard drive ($109.99), Samsung SH-S223L DVD burner ($26.99), Antec Sonata III case with 500W PSU ($114.99)
What happens when we factor these rather considerable system prices into our value equation?
Whoa. Not just one, but both of the Phenom II X6s dethrone the Core i5-750. We can probably attribute these rankings to the six-core AMD chips’ much higher performance in highly multithreaded tasks, since the Phenom II X6 1055T often falls close to (and sometimes below) the i5-750 in other apps.
The scatter plot gives us a little more context and highlights another interesting matchup: that of the Phenom II X6 1090T versus the Core i7-930. While both processors perform roughly in the same ballpark in our test suite overall, the Intel chip requires relatively expensive X58 motherboards and triple-channel memory kits, while the AMD chip works happily in more affordable 890GX mobos (and even cheaper Socket AM3 offerings) with dual-channel RAM. The 1090T ends up looking somewhat more attractive as a result.
Performance per dollar isn’t the whole story these days, though. The power efficiency of a processor increasingly helps determine its value proposition for a host of reasons, from total system costs to noise levels to the size of your electric bill. We measured full system power draw and considered efficiency earlier in this article; now, we can factor in system prices to give us a sense of power-efficient performance per dollar.
Despite making inroads on the performance-per-dollar front, AMD still hasn’t quite nabbed the power-efficiency value crown, which falls upon the Core i5-750 once again. The Phenom II X6 1090T still manages to outdo the Core i7-870, though.
One may look at the information we’ve presented in the preceding pages in two ways.
On the product front, the Phenom II X6 processors are unabashedly good news. AMD has managed to create a pair of new processors whose performance and value propositions are even better than some of Intel’s most attractive offerings, the Core i7-930 and the Core i5-750. That’s a considerable achievement, attributable to Thuban’s two major new features. Turbo Core offers a modest but measurable performance boost in lightly multithreaded workloads, while the addition of two cores brings more consequential improvements in heavily multithreaded tasks like video encoding, 3D rendering, and scientific computing. The fact that these things happen in the same 125W power envelope as the Phenom II X4 is cause for celebration. The Phenom II X6 chips match up pretty well against the Core i7-900 series in terms of power efficiency, although they have a ways to go to catch Intel’s Lynnfield processors on the P55 platform.
We absolutely love the degree of control over everything that AMD has exposed in the 1090T Black Edition via its Overdrive utility, too. Those who are prone to tinker will almost surely be able to extract some extra performance from the X6 1090T. If you’re considering building a new system, the Phenom II X6 should at least be part of the conversation. If you already own a Socket AM2+ or Socket AM3 system and are looking to upgrade, the X6 may be the way to go.
On the other hand, to be a real downer, AMD’s accomplishment here is essentially to match the Core i7-940, a product Intel first introduced in the fall of 2008. Against Bloomfield, Thuban is slower in lightly threaded applications and in cases where memory bandwidth is the primary constraint. The 45-nm Bloomfield chip is substantially smaller than Thuban, too. Now, Intel is building an even smaller 32-nm Gulftown part with six cores and, as you’ve seen in the preceding pages, otherworldly performance. AMD is once again contending in the middle range of the CPU market with a compelling product, but technology-wise, they still have a long, tough road ahead before catching Intel.
We’re not convinced that fact will matter to most folks, though. If you have between $200 and $300 in your budget for your next CPU purchase, the Phenom II X6 merits serious consideration, because it’s a solid value for the money.