AMD has been going through some difficult times lately, with management changes, layoffs, and a steady drip of talent drain, as a host of familiar faces have fled for greener pastures. These things have come against a backdrop of mounting financial losses and tough questions about the company’s future and direction.
Much of the turmoil can be traced back to one big, fateful event: the difficult, disappointing birth of the all-new CPU microarchitecture known as Bulldozer. As techie types, we’re perhaps overestimating the role technology plays in these matters. Still, Bulldozer was viewed by many as AMD’s next great hope, its first from-the-ground-up new x86 CPU architecture in over a decade. When the FX processors not only failed to catch up to the competition from Intel but also struggled to beat AMD’s own prior generation in performance and power efficiency, some unpleasant fallout was inevitable.
Once the first chips were out the door, AMD’s engineering task became clear: to do as much as possible to improve the Bulldozer microarchitecture as quickly as possible. Alongside the FX processors, the firm announced a plan that included a series of updates to its CPU cores over the next few years, with promised increases in performance and power efficiency. The first of those incremental updates was dubbed Piledriver, a modest refresh that first hit the market this past spring aboard the Trinity APU. Now, roughly a year after the first FX chips arrived, revamped FX processors based on Piledriver are making their debut, more or less on schedule. That is, all things considered, a very positive sign.
The question now is whether it’s enough. Are these CPUs good enough to carve out some space in the market against some tremendously formidable competition? You may be surprised by the answer.
Vishera ain’t just a river in Russia
The chip that’s the subject of our attention today is code-named Vishera, and it’s the direct successor to the silicon that powered the prior-gen FX processors, which was known as Orochi. Vishera and Orochi share almost everything—both are manufactured on GlobalFoundries’ 32-nm SOI fabrication process, both have 8MB of L3 cache, and both are essentially eight-core CPUs. The one big difference is the transition from Bulldozer to Piledriver cores—or, to put it more precisely, from Bulldozer to Piledriver modules. These “modules” are a fundamental structure in AMD’s latest architectures, and they house two “tightly coupled” integer cores that share certain resources, including a front-end, L2 cache, and floating-point unit. Thus, AMD bills a four-module FX processor as an eight-core CPU, and we can’t entirely object to that label.
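|Code name||Key products||Cores||Threads||Last-level cache size||Process node (nm)||Estimated transistors (millions)||Die area (mm²)|
|Lynnfield||Core i5, i7||4||8||8 MB||45||774||296|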
|Gulftown||Core i7-9xx||6||12||12 MB||32||1168||248|
|Sandy Bridge||Core i5, i7||4||8||8 MB||32||995||216|
|Sandy Bridge-E||Core-i7-39xx||8||16||20 MB||32||2270||435|
|Ivy Bridge||Core i5, i7||4||8||8 MB||22||1400||160|
|Deneb||Phenom II||4||4||6 MB||45||758||258|
|Thuban||Phenom II X6||6||6||6 MB||45||904||346|
|Llano||A8, A6, A4||4||4||1 MB x 4||32||1450||228|
|Trinity||A10, A8, A6||2||4||2 MB x 2||32||1303||246|
We covered the enhancements made to the Piledriver modules in more detail here, but the highlights are pretty straightforward. Piledriver includes a collection of small tweaks to individual parts of the module intended to increase instruction throughput. The changes range from the CPU’s front end through the cores and into the cache subsystem, and no single change contributes much more than a 1% increase in throughput. All together, the gains are maybe on the order of 6%, perhaps less, so we’re not looking at a vast improvement. Still, Piledriver includes other modifications. The FPU supports the three-operand version of the fused multiply-add instruction, a key part of the AVX specification that will also be supported in Intel’s upcoming Haswell chips. This change puts AMD and Intel on the same page going forward. (Support for the FMA4 instruction from Bulldozer is retained, at least for now.) More crucially, Piledriver has been optimized to reach higher clock speeds at lower voltages, a tweak that paid off nicely for the mobile Trinity chip. As you’ll see, it has benefited the desktop FX processors, as well.
|Model||Modules||Threads||Base clock||Peak Turbo clock||North bridge clock||L3 cache||TDP||Price|
|FX-8350||4||8||4.0 GHz||4.2 GHz||2.2 GHz||8 MB||125 W||$195|
|FX-8320||4||8||3.5 GHz||4.0 GHz||2.2 GHz||8 MB||125 W||$169|
|FX-6300||3||6||3.5 GHz||4.1 GHz||2.0 GHz||8 MB||95 W||$132|
|FX-4300||2||4||3.8 GHz||4.0 GHz||2.0 GHz||4 MB||95 W||$122|
The new lineup of FX chips is detailed above. Today’s headliner is the FX-8350, the only one of the four new Vishera-based parts AMD has supplied us for review. The FX-8350 shares the same power envelope (125W) and Turbo peak (4.2GHz) as the chip it supplants, the FX-8150. The most notable difference is the base clock; the FX-8350’s is a nosebleed-inducing 4GHz, up from its predecessor’s 3.6GHz.
The FX-8350’s higher base frequency should boost performance, especially in widely multithreaded workloads. Still, if you’re like me, you’re looking at the 200MHz gap between the base and Turbo peak clock speeds and wondering why it isn’t larger. The whole idea of these dynamic clocking schemes, after all, is to take advantage of the additional thermal headroom made available when not all cores are busy. Vishera can gate off power to inactive modules, granting more space for those that remain active. Higher voltages and frequencies are then usually possible within the same thermal envelope. There’s a 500MHz gap between the base and peak clocks for the FX-8320. Why doesn’t the FX-8350 offer a similar increase in peak clock frequency?
Our best guess is that too few of these chips will tolerate frequencies above 4.2GHz well enough, consistently enough, and at low enough voltages to allow AMD to ship a product in volume with a higher Turbo peak. If so, that’s a shame, because low performance in lightly threaded workloads is arguably this CPU architecture’s biggest weakness. Higher Turbo frequencies could do a lot to remedy that problem.
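To put the Turbo headroom question in numbers, here is a small Python sketch that computes the gap between base and peak Turbo clocks for each new FX model. The dictionary layout and the function name are our own illustration; the clock speeds are taken from the lineup table.

```python
# Base and peak Turbo clocks in GHz, copied from the FX lineup table.
FX_CLOCKS_GHZ = {
    "FX-8350": (4.0, 4.2),
    "FX-8320": (3.5, 4.0),
    "FX-6300": (3.5, 4.1),
    "FX-4300": (3.8, 4.0),
}


def turbo_headroom_mhz(base_ghz, turbo_ghz):
    """Return the gap between peak Turbo and base clocks in MHz."""
    return round((turbo_ghz - base_ghz) * 1000)


headroom = {model: turbo_headroom_mhz(*clocks)
            for model, clocks in FX_CLOCKS_GHZ.items()}

for model, gap in headroom.items():
    print(f"{model}: {gap} MHz of Turbo headroom")
```

The FX-8350 and FX-4300 come out with the smallest gaps, while the FX-6300 has the largest.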
That said, the FX-8350’s price is quite nice. The $195 sticker positions it between a couple of Intel’s Ivy Bridge-based offerings, the Core i5-3470 at $185 and the Core i5-3570K at $225. Both are true quad-core, four-threaded processors. Of those two, only the i5-3570K has an unlocked upper multiplier for easy overclocking, whereas all of the FX parts are unlocked. On the other hand, the Intel processors have peak power ratings of 77W, vastly lower than the FX-8350’s 125W TDP.
Speaking of smaller power envelopes, the two lower-end FX models take advantage of Piledriver’s power enhancements by dropping to a more modest 95W. The chips they replace, the FX-6200 and FX-4170, are both 125W parts. The new models even sacrifice a bit of clock speed to get there. For instance, the FX-6300 is clocked at 3.5/4.1GHz, while the older FX-6200 runs at 3.8/4.1GHz. AMD tells us it expects the performance of these two parts to be similar, since the per-clock performance gains in Piledriver should make up some of the difference.
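A quick back-of-the-envelope check shows how much per-clock improvement the FX-6300 would need to fully offset its lower base clock. The function name is hypothetical, and the calculation assumes performance scales linearly with clock speed, which is a simplification.

```python
def required_per_clock_gain(old_clock_ghz, new_clock_ghz):
    """Fractional per-clock gain the new chip needs to match the old one,
    assuming performance scales linearly with frequency."""
    return old_clock_ghz / new_clock_ghz - 1.0


# Base clocks from the text: FX-6200 at 3.8 GHz, FX-6300 at 3.5 GHz.
gain = required_per_clock_gain(3.8, 3.5)
print(f"Per-clock gain needed at base clocks: {gain:.1%}")
```

At base clocks, that works out to roughly 8.6%, a bit more than the 6% or so of added throughput we estimated for Piledriver. That fits with the expectation that the per-clock gains make up only some of the difference. The Turbo peaks of the two chips are identical at 4.1GHz, so no gain is needed there.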
The lowest-end FX processor, the FX-4300, overlaps almost entirely with the A10-5800K desktop Trinity that we reviewed earlier this month. Both list for $122. The 5800K has a 200MHz higher Turbo peak, five more watts of max power draw, and integrated graphics. The FX-4300 instead has 4MB of L3 cache, which Trinity lacks. Then again, the A-series APUs have integrated PCIe connectivity and drop into their own brand-new socket, while the new FX series uses the same Socket AM3+ infrastructure as the prior models, so they’re really aimed at different platforms.
Our testing methods
We ran every test at least three times and reported the median of the scores produced.
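For the curious, the reporting rule is simple enough to sketch in a few lines of Python. The function name and the sample scores are made up for illustration.

```python
from statistics import median


def reported_score(runs):
    """Return the median of at least three benchmark runs."""
    if len(runs) < 3:
        raise ValueError("expected at least three runs")
    return median(runs)


# Hypothetical scores from three runs of one benchmark.
example_runs = [101.2, 99.8, 100.5]
print(reported_score(example_runs))
```

Using the median rather than the mean keeps a single outlier run from skewing the reported number.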
The test systems were configured like so:
[Test system configuration table. Recoverable entries: Processors: Phenom II X4 850, Phenom II X4 980, Phenom II X6 1100T. Motherboard: Crosshair V Formula. Memory size: 8 GB (2 DIMMs) or 16 GB, depending on the system. Memory speed: 1600 MT/s or 1333 MT/s, depending on the system. Other entries: Realtek drivers, P55/VIA VT1828S.]
They all shared the following common elements:
- HyperX SH100S3B 120GB SSD
- Radeon HD 7950 Double Dissipation 3GB with Catalyst 12.3 drivers
- OS: Windows 7 Ultimate x64 Edition, Service Pack 1 (AMD systems only: KB2646060, KB2645594 hotfixes)
Thanks to Corsair, XFX, Kingston, MSI, Asus, Gigabyte, Intel, and AMD for helping to outfit our test rigs with some of the finest hardware available. Thanks to Intel and AMD for providing the processors, as well, of course.
We used the following versions of our test applications:
- AIDA64 2.30
- Stream 5.8 64-bit
- SiSoft Sandra 2012.SP3
- 7-Zip 9.20 64-bit
- TrueCrypt 7.1a
- Chromium 20.0.1096.0
- SunSpider 0.9.1
- The Panorama Factory 5.3 x64 Edition
- POV-Ray for Windows 3.7 RC5 64-bit
- Cinebench R11.5 64-bit Edition
- LuxMark 2.0
- AMD APP OpenCL ICD 898.1
- Intel OpenCL SDK 1.5
- x264 HD benchmark 4.0
- Windows Live Movie Maker 14
- picCOLOR 4.0 build 722 beta 64-bit
- Qtbench 0.2.2
- MyriMatch proteomics benchmark
- CASE Lab Euler3d CFD benchmark multithreaded edition
- Civilization V
- Battlefield 3
- The Elder Scrolls V: Skyrim
- Crysis 2 with DX11 tessellation and hi-res texture patches
- Batman: Arkham City
- FRAPS 3.4.7
Some further notes on our testing methods:
- The test systems’ Windows desktops were set at 1920×1080 in 32-bit color. Vertical refresh sync (vsync) was disabled in the graphics driver control panel.
- We used a Yokogawa WT210 digital power meter to capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (The monitor was plugged into a separate outlet.) We measured how each of our test systems used power across a set time period, during which time we encoded a video with x264.
- After consulting with our readers, we’ve decided to enable Windows’ “Balanced” power profile for the bulk of our desktop processor tests, which means power-saving features like SpeedStep and Cool’n’Quiet are operating. (In the past, we only enabled these features for power consumption testing.) Our spot checks demonstrated to us that, typically, there’s no performance penalty for enabling these features on today’s CPUs. If there is a real-world penalty to enabling these features, well, we think that’s worthy of inclusion in our measurements, since the vast majority of desktop processors these days will spend their lives with these features enabled. We did disable these power management features to measure cache latencies, but otherwise, it was unnecessary to do so.
The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Memory subsystem performance
These synthetic tests are intended to measure specific properties of the system and may not end up tracking all that closely with real-world application performance. Still, they can be enlightening.
One of Piledriver’s purported tweaks is an improved hardware prefetcher, which populates the L2 cache by examining access patterns and predicting what data will be needed next. Whatever changes AMD has made on that front don’t show up in our Stream results, where the FX-8350 matches the FX-8150 almost exactly. Many of the Intel chips extract more bandwidth from the same dual-channel DDR3 memory config. With four channels, the Core i7-3820 and 3960X achieve nearly double the transfer rates.
This test is multithreaded, so it captures the bandwidth of all caches on all cores concurrently. The different test block sizes step us down from the L1 and L2 caches into L3 and main memory.
Although the FX-8350 achieves somewhat higher cache throughput than the FX-8150, we can probably chalk up the differences to the 8350’s higher base clock frequency. We might be seeing the effects of Piledriver’s larger L1 cache TLB at the 32KB block size, but it’s tough to say for sure.
SiSoft has a nice write-up of this latency testing tool, for those who are interested. We used the “in-page random” access pattern to reduce the impact of prefetchers on our measurements. We’ve reported the results in terms of CPU cycles, which is how this tool returns them. The problem with translating these results into nanoseconds, as we’ve done in the past with latency measurements, is that we don’t always know the clock speed of the CPU, which can vary depending on Turbo responses. At any rate, knowing latency in clock cycles is helpful for understanding, say, the differences between Bulldozer and Piledriver. Imagine that.
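The conversion problem is easy to see with a quick sketch. One cycle at f GHz lasts 1/f nanoseconds, so the same cycle count maps to different times depending on the clock the chip happens to be running. The 20-cycle figure below is a made-up example, not one of our measurements, and the function name is our own.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in CPU cycles to nanoseconds at a given clock.

    One cycle at f GHz lasts 1/f nanoseconds.
    """
    return cycles / clock_ghz


# Hypothetical 20-cycle latency at the FX-8350's base and peak Turbo clocks.
for clock in (4.0, 4.2):
    print(f"{clock} GHz: {cycles_to_ns(20, clock):.2f} ns")
```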
Piledriver’s memory subsystem doesn’t appear to be any quicker, on a per-cycle basis, than Bulldozer’s. In fact, the FX-8350’s caches are a bit slower at each step up the ladder.
Some quick synthetic math tests
We don’t have a proper SPEC rate test in our suite (yet!), but I wanted to take a quick look at some synthetic computational benchmarks, to see how the different architectures compare, before we move on to more varied and robust application-based workloads. These simple tests in AIDA64 are nicely multithreaded and make use of the latest instructions, including Bulldozer’s XOP in the CPU Hash test and FMA4 in the FPU Julia and Mandel tests.
The FX-8350 takes the top spot in the CPU Hash test, not surprising given the relatively strong performance of the AMD processors in this integer-focused benchmark. The more FPU-intensive fractal tests are a very different story, with the Sandy and Ivy Bridge-based chips topping the charts. Although Vishera’s four FPUs should, in theory, be capable of the same number of peak FLOPS per clock as any Sandy or Ivy quad-core, the FX-8350’s throughput here is substantially lower, even with the advantage of a higher clock speed. With the aid of the FMA instruction and a 4GHz base clock, at least the FX-8350’s four FPUs are able to outperform the six older FPUs on the Phenom II X6 1100T, a feat the FX-8150 can’t duplicate.
Power consumption and efficiency
Our workload for this test is encoding a video with x264, based on a command ripped straight from the x264 benchmark you’ll see later. This encoding job is a two-pass process. The first pass is lightly multithreaded and will give us the chance to see how power consumption looks when mechanisms like Turbo and core power gating are in use. The second pass is more widely multithreaded.
We’ve tested all of the CPUs in our default configuration, which includes a discrete Radeon card. We’ve also popped out the discrete card to get a look at power consumption for the A10, Core i3, and A8-3850.
The raw plots above give us a good sense of several things, including the huge gap between the max power draw of the AMD and Intel solutions in the same price range.
Notice how the Core i5-3570K draws virtually the same amount of power during the lightly-threaded first stage of the encoding process and the more heavily multithreaded second stage. Presumably, that means the CPU is taking full advantage of its prescribed power envelope during both stages. The FX-8150 isn’t far from that ideal, either. The FX-8350, however, draws quite a bit more power during the second stage than the first. That suggests the FX-8350 is leaving some thermal headroom on the table with its relatively conservative 4.2GHz Turbo frequency.
The FX-8350 is a sizeable chip with a hefty thermal envelope, so these results are no surprise. The basic parameters haven’t changed since the FX-8150. The test system based on the closest competition, the Core i5-3470, draws over 20W less at idle and over 100W less under load than our FX-8350 test rig.
We can quantify efficiency by looking at the amount of energy used, in kilojoules, during the entirety of our test period, when the chips are busy and at idle. By that measure, the FX-8350 is an improvement over the FX-8150, since it finishes its work and drops to idle sooner.
Perhaps our best measure of CPU power efficiency is task energy: the amount of energy used while encoding our video. This measure rewards CPUs for finishing the job sooner, but it doesn’t account for power draw at idle.
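Both energy figures boil down to integrating power over time. Here is a minimal sketch, assuming the power meter log is available as a sorted list of (seconds, watts) pairs. The function name, the sample data, and the choice of the trapezoidal rule are our own assumptions for illustration, not a description of the exact tooling behind the charts.

```python
def energy_kj(samples, start_s=None, end_s=None):
    """Integrate (time_s, watts) samples with the trapezoidal rule.

    Returns kilojoules consumed between start_s and end_s. Samples outside
    that window are ignored. Samples must be sorted by time.
    """
    window = [
        (t, w) for t, w in samples
        if (start_s is None or t >= start_s) and (end_s is None or t <= end_s)
    ]
    joules = 0.0
    for (t0, w0), (t1, w1) in zip(window, window[1:]):
        joules += 0.5 * (w0 + w1) * (t1 - t0)
    return joules / 1000.0


# Hypothetical log: 200 W while encoding for two seconds, then a drop to idle.
log = [(0, 200.0), (1, 200.0), (2, 200.0), (3, 100.0), (4, 100.0)]

total = energy_kj(log)                       # whole test period, busy plus idle
task = energy_kj(log, start_s=0, end_s=2)    # only while the encode is running
print(f"total: {total:.2f} kJ, task: {task:.2f} kJ")
```

Passing the encode's start and end times gives task energy, while omitting them gives energy over the whole test period, idle time included.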
Although one wouldn’t necessarily think of a 125W processor as power-efficient, the FX-8350 requires less energy to complete this task than any AMD processor before it. This is a pretty solid step forward compared to the Bulldozer-based FX-8150, particularly since Vishera is nothing more than tweaked silicon based on the same basic architecture and built with the same 32-nm SOI fab process.
Again, the competition from Intel is vastly more efficient overall—not just the 22-nm Ivy Bridge parts, but also the 32-nm Sandy Bridge chips.
The Elder Scrolls V: Skyrim
For the gaming tests, we’re using our latency-focused testing methods. If you’re unfamiliar with what we’re doing, you might want to check out our recent CPU gaming performance article, which has a subset of the data here and explains our methods reasonably well.
You can see from the plots that the FX-8350 improves upon the FX-8150 and the Phenom II X6, with more frames generated during the test run and fewer, shorter latency spikes during its duration. (For frame time plots from all of the CPUs tested, go here.)
Although the FX-8350 has the highest FPS average of any AMD processor we’ve tested, the Phenom II X4 980 still edges it out in our latency-focused metric, the 99th percentile frame time. By either measure, the FX-8350 is one of AMD’s fastest gaming chips—but you can easily see the problem with that statement, compared to the recent Intel processors. Even the lowly Pentium G2120 is faster in this Skyrim test scenario.
We suspect the Bulldozer architecture’s trouble with gaming comes down to relatively low per-thread performance in lightly threaded workloads. In many games, a single, branchy control thread tends to be the performance limiter. The FX-8150’s frame latencies spike upward for the last 5% or so of frames, which prove to be difficult for it. The FX-8350 doesn’t really change that dynamic—the spike in the last 5% is still present—but its frame times are lower across the board. The improvement is enough to push the FX-8350 slightly ahead of the Phenom II X6 1100T in the last few percentage points. That’s progress. Unfortunately, AMD has a much longer way to go in order to catch Intel’s current processors.
Before anyone panics over the gap between Intel and AMD in this latency-sensitive gaming test, we’ll want to ground our analysis in reality by considering the amount of time spent on truly long-latency frames. Once we do so, some of the practical concerns about FX-8350 performance dissipate. Virtually none of the processors spend any time working on frames for more than 50 milliseconds, our usual threshold for “badness.” That means you’re looking at reasonably fluid animation with most of these CPUs, including the FX-8350. In fact, we have to ratchet the threshold past our customary next stop, 33 milliseconds or 30 FPS, and down to 16.7 milliseconds—equivalent to 60 FPS—to see meaningful differences between the CPUs.
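For readers who want to try these metrics on their own frame time logs, here is a rough Python sketch of the three numbers discussed in this section. It assumes a list of per-frame render times in milliseconds, such as a FRAPS frame time dump converted to per-frame deltas. The function names are our own, the percentile uses the nearest-rank method, and the "time beyond" figure counts only the portion of each long frame past the threshold. Treat those details as assumptions rather than a definitive account of how the charts were produced.

```python
import math


def fps_average(frame_times_ms):
    """Average frames per second over the whole run."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_frame_time(frame_times_ms, pct=99.0):
    """Frame time (ms) at or below which pct percent of frames complete,
    using the nearest-rank method."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def time_beyond_ms(frame_times_ms, threshold_ms=50.0):
    """Total milliseconds accumulated past the threshold, counting only
    the portion of each long frame that exceeds it."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)


# Hypothetical run: 98 quick frames plus two slow ones.
frames = [16.0] * 98 + [40.0, 80.0]

print(f"FPS average: {fps_average(frames):.1f}")
print(f"99th percentile frame time: {percentile_frame_time(frames)} ms")
print(f"Time beyond 50 ms: {time_beyond_ms(frames, 50.0)} ms")
```

In this made-up run, the FPS average looks healthy even though one frame took 80 milliseconds, which is the sort of hitch the latency-focused metrics are meant to expose.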
Batman: Arkham City
When we’re moving across the Arkham cityscape, the game engine has to stream in new areas every so often, and that difficult task is partially CPU-bound. You can see the spikes in all of the frame time plots that come at semi-regular intervals, and you’ll notice that the spikes tend to be shorter on the faster processors.
The FPS average and our 99th percentile frame time metric agree: in this tough test filled with slowdowns, the FX-8350 outperforms the prior champ from the green team, the Phenom II X4 980. The 99th percentile result tells the story: the FX-8350 delivers 99% of the frames in this test run in under 25 milliseconds, equivalent to 40 FPS.
The FX-8350’s latency curve looks quite decent, too, with a smooth and not-too-large ramp upward in the last few percentage points worth of frames.
The occasional spikes throughout the test run mean that all of the CPUs spend a little time beyond our 50-millisecond threshold, but the FX-8350 only burns 70 milliseconds working on long-latency frames during our entire 90-second test period. That’s just a few momentary hitches, and it’s less than half the time the FX-8150 spends beyond the same threshold. Still, competing solutions like the Core i5-3470 and i5-3570K reduce those hitches to nearly nothing.
Battlefield 3
Here’s a game that runs quite well on nearly all of the CPUs we tested, with one notable exception: the Pentium G2120, the only processor of the group with only two physical cores and two logical threads. The rest have at least four threads, whether from additional cores or from Hyper-Threading.
The FX-8350 performs very well with this nicely threaded game engine, particularly in our more latency-focused metrics. In fact, the FX-8350 spends the least time of any CPU beyond our ultra-tight 16.7 millisecond threshold.
Crysis 2
Notice the spike at the beginning of the test run; it happens on each and every CPU. You can feel the hitch while playing. Apparently, the game is loading some data for the area we’re about to enter. Faster CPUs tend to reduce the size of the spike.
Here are more signs of life from the AMD camp. The FX-8350 outright ties the Intel competition, the Core i5-3470, in the FPS average metric, and the FX-8350’s 99th percentile frame time is only a fraction longer.
The difference in the latency curves from the FX-8150 to the FX-8350 illustrates AMD’s progress. The FX-8150 struggles in roughly a quarter of the frames, with latencies rising to near 20 milliseconds, while the FX-8350 doesn’t reach the 20-millisecond mark until the final 4% or so of frames rendered. Once we reach the last 1% or so of really tough frames, the FX-8350 essentially matches the competition from Intel.
That single big spike at the beginning of the test run contributes virtually all of the time the faster CPUs spend beyond our 50-ms threshold, as you can tell from the plots. We burned about 50% more time waiting for that one frame to finish on the FX-8350 than on the competing Intel products.
Multitasking: Gaming while transcoding video
A number of readers over the years have suggested that some sort of real-time multitasking test would be a nice benchmark for multi-core CPUs. That goal has proven to be rather elusive, but we think our latency-oriented game testing methods may allow us to pull it off. What we did is play some Skyrim, with a 60-second tour around Whiterun, using the same settings as our earlier gaming test. In the background, we had Windows Live Movie Maker transcoding a video from MPEG2 to H.264. Here’s a look at the quality of our Skyrim experience while encoding.
Well, good news and bad news here, I suppose. On the positive front, the FX-8350 outperforms any prior AMD CPU in this test scenario and delivers a fairly smooth Skyrim experience while encoding video in the background. On the downside, the FX-8350’s eight cores do not deliver a superior multitasking experience compared to the quad-core competition from Intel. Even the two-generations-old Core i5-760 is faster.
Civilization V
Civ V will run this benchmark in two ways, either while using the graphics card to draw everything on the screen, just as it would during a game, or entirely in software, without bothering with rendering, as a pure CPU performance test.
Either way you cut it, the FX-8350 remains true to what we’ve seen in our other gaming tests: it’s pretty fast in absolute terms, easily improves on the performance of prior AMD chips, and still has a long way to go to catch Sandy Bridge, let alone Ivy.
Compiling code in GCC
Another persistent request from our readers has been the addition of some sort of code-compiling benchmark. With the help of our resident developer, Bruno Ferreira, we’ve finally put together just such a test. Qtbench tests the time required to compile the QT SDK using the GCC compiler. Here is Bruno’s note about how he put it together:
QT SDK 2010.05 – Windows, compiled via the included MinGW port of GCC 4.4.0.
Even though apparently at the time the Linux version had properly working and supported multithreaded compilation, the Windows version had to be somewhat hacked to achieve the same functionality, due to some batch file snafus.
After a working multithreaded compile was obtained (with the number of simultaneous jobs configurable), it was time to get the compile time down from 45m+ to a manageable level. This required severe hacking of the makefiles in order to strip the build down to a more streamlined version that preferably would still compile before hell froze over.
Then some more fiddling was required in order for the test to be flexible about the paths where it was located. Which led to yet more Makefile mangling (the poor thing).
The number of jobs dispatched by the Qtbench script is configurable, and the compiler does some multithreading of its own, so we did some calibration testing to determine the optimal number of jobs for each CPU.
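The calibration step amounts to timing the build at several job counts and keeping the fastest. Here is a minimal sketch of that idea. The function name is hypothetical, the timings are invented, and the stand-in lookup takes the place of a real harness that would launch the Qtbench script and time it.

```python
def best_job_count(time_build, candidates):
    """Return (jobs, seconds) for the fastest candidate job count.

    time_build is a callable that takes a job count and returns the
    measured build time in seconds.
    """
    timings = {jobs: time_build(jobs) for jobs in candidates}
    best = min(timings, key=timings.get)
    return best, timings[best]


# Made-up build times in seconds, keyed by job count, for one imaginary CPU.
fake_timings = {2: 310.0, 4: 190.0, 8: 150.0, 12: 155.0, 16: 162.0}

jobs, seconds = best_job_count(fake_timings.get, sorted(fake_timings))
print(f"best: {jobs} jobs at {seconds} s")
```

In practice, the best job count will not necessarily equal the number of hardware threads, since the compiler does some multithreading of its own.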
TrueCrypt disk encryption
TrueCrypt supports acceleration via Intel’s AES-NI instructions, so AES encryption, in particular, should be very fast on the CPUs that support those instructions. We’ve also included results for another algorithm, Twofish, that isn’t accelerated via dedicated instructions.
7-Zip file compression and decompression
Ah. Now that we’ve moved past the gaming tests, we’re on friendlier ground for the FX-8350. Those eight integer cores can all contribute heavily in most of the tests above, and as a result, the FX-8350 doesn’t just match the Core i5-3570K—it rivals the much pricier Core i7-3770K. SunSpider is the lone exception to that trend, likely because not all elements of it are widely multithreaded.
The Panorama Factory photo stitching
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs.
In the past, we’ve added up the time taken by all of the different elements of the panorama creation wizard and reported that number, along with detailed results for each operation. However, doing so is incredibly data-input-intensive, and the process tends to be dominated by a single, long operation: the stitch. Thus, we’ve simply decided to report the stitch time, which saves us a lot of work and still gets at the heart of the matter.
picCOLOR image processing and analysis
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including SSE extensions, multiple cores, and Hyper-Threading. Many of its individual functions are multithreaded.
At our request, Dr. Müller graciously agreed to re-tool his picCOLOR benchmark to incorporate some real-world usage scenarios. As a result, we now have four tests that employ picCOLOR for image analysis: particle image velocimetry, real-time object tracking, a bar-code search, and label recognition and rotation. For the sake of brevity, we’ve included a single overall score for those real-world tests.
x264 HD benchmark
This benchmark tests one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark.
Windows Live Movie Maker 14 video encoding
For this test, we used Windows Live Movie Maker to transcode a 30-minute TV show, recorded in 720p .wtv format on my Windows 7 Media Center system, into a 320×240 WMV-format video appropriate for mobile devices.
The FX-8350 continues to demonstrate solid advancements over the FX-8150 in every test above, but these image-centric applications are a bit more of a challenge. Only in the second pass of the x264 test does the FX-8350 manage to match or outperform its closest rivals from Intel.
LuxMark
Since LuxMark uses OpenCL, we can also use it to test both GPU and CPU performance, and even to compare performance across different processor types. Since OpenCL code is by nature parallelized and relies on a run-time compiler, it should adapt well to new instructions. For instance, Intel and AMD offer installable client drivers (ICDs) for OpenCL on x86 processors, and they both claim to support AVX. The AMD APP driver even supports Bulldozer’s and Piledriver’s distinctive instructions, FMA4 and XOP.
We’ll start with CPU-only results. These results come from the AMD APP driver for OpenCL, since it tends to be faster on both Intel and AMD CPUs, funnily enough.
Now we’ll see how a Radeon HD 7950 performs when driven by each of these CPUs.
Finally, we can combine CPU and GPU computing power to see whether we can extract more performance with the two processor types both working on the same problem at once.
The FX-8350 decidedly outperforms the Core i5-3570K when asked to tackle the problem entirely by itself via the AMD APP ICD. Only the recent Intel CPUs with Hyper-Threading and four (or more) cores are faster. However, the Radeon is clearly more proficient at this job than any of the CPUs, and, like most of the processors, the FX-8350 is better off just feeding the Radeon data than trying to help with the computation.
Cinebench rendering
The Cinebench benchmark is based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores (or threads, in CPUs with multiple hardware threads per core) are available.
Turns out the FX-8150 is no slouch in these rendering apps, and the FX-8350’s solid gains over its predecessor allow it to place near the top of the charts, rivaling the Hyper-Threaded Intel quad cores.
STARS Euler3d computational fluid dynamics
Euler3D tackles the difficult problem of simulating fluid dynamics. Like MyriMatch, it tends to be very memory-bandwidth intensive. You can read more about it right here.
Performance in these two scientific computing workloads used to track together pretty closely, believe it or not, and appeared to be primarily limited by memory bandwidth. Over time, the performance results in these two workloads have diverged as CPU architectures have diverged.
Overclocking
All of AMD’s FX processors are unlocked, so overclocking them is, in theory, as easy as turning up the multiplier. I generally prefer to overclock my CPUs using the BIOS (err, firmware) rather than the various Windows programs out there. However, more recently, I’ve taken a liking to the ease and quickness of AMD’s Overdrive utility, along with its ability to control Turbo Core behavior very precisely. So, when it came time to overclock the FX-8350, I decided to use Overdrive. I’m not sure it was the right call, but that’s what I used.
When you’re overclocking a CPU that starts out at 125W, you’re gonna need some decent cooling. AMD recommends the big-ass FX water cooler we used to overclock the FX-8150, but being incredibly lazy, I figured the Thermaltake Frio OCK pictured above, which was already mounted on the CPU, ought to suffice. After all, the radiator is just as large as the water cooler’s, and the thing is rated to dissipate up to 240W. Also, I swear to you, there is plenty of room (more than an inch of clearance) between the CPU fan and the video card, even though it doesn’t look like it in the picture above. Turns out the Frio OCK kept CPU temperatures in the mid-50° C range, even at full tilt, so I think it did its job well enough.
Trouble is, I didn’t quite get the results I’d hoped for. As usual, I logged my attempts at various settings as I went, and I’ve reproduced my notes below. I tested stability using a multithreaded Prime95 torture test. Notice that I took a very simple approach, only raising the voltage for the CPU itself, not for the VRMs or anything else. Perhaps that was the reason my attempts went like so:
4.8GHz, 1.475V – reboot
4.7GHz, 1.4875V – lock
4.6GHz, 1.525V – errors on multiple threads
4.6GHz, 1.5375V – errors with temps ~55C
4.6GHz, 1.5375V, Turbo fan – stable with temps ~53.5C, eventually locked
4.6GHz, 1.5375V, manual fan, 100% duty cycle at 50C – lock
4.6GHz, 1.55V, manual fan, 100% duty cycle at 50C – crashes, temps ~54.6C
4.4GHz, 1.55V – ok
4.5GHz, 1.55V – ok, ~57C, 305W
4.5GHz, 1.475V – errors
4.5GHz, 1.525V – errors
4.5GHz, 1.5375V – OK, ~56C
At the end of the process, I could only squeeze an additional 500MHz out of the FX-8350 at 1.5375V, one notch down from the max voltage exposed in the Overdrive utility. AMD told reviewers to expect something closer to 5GHz, so apparently either I’ve failed or this particular chip just isn’t very cooperative.
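For anyone who wants to tally the notes above rather than eyeball them, here is a minimal sketch in Python. The data structure and the function name are purely illustrative, and it assumes the "stable, eventually locked" run counts as a failure. The 4GHz figure is the FX-8350's stock base clock.

```python
# Overclocking attempts transcribed from the notes above.
# Each entry: (clock in GHz, CPU voltage in V, passed Prime95?)
# Assumption: "stable ... eventually locked" is counted as a failure.
ATTEMPTS = [
    (4.8, 1.475, False),   # reboot
    (4.7, 1.4875, False),  # lock
    (4.6, 1.525, False),   # errors on multiple threads
    (4.6, 1.5375, False),  # errors
    (4.6, 1.5375, False),  # Turbo fan: eventually locked
    (4.6, 1.5375, False),  # manual fan: lock
    (4.6, 1.55, False),    # crashes
    (4.4, 1.55, True),
    (4.5, 1.55, True),
    (4.5, 1.475, False),   # errors
    (4.5, 1.525, False),   # errors
    (4.5, 1.5375, True),
]

BASE_CLOCK_GHZ = 4.0  # FX-8350 stock base clock


def best_stable(attempts):
    """Return (clock, voltage): the highest passing clock and the
    lowest voltage that passed at that clock."""
    passing = [(clock, volts) for clock, volts, ok in attempts if ok]
    top_clock = max(clock for clock, _ in passing)
    min_volts = min(volts for clock, volts in passing if clock == top_clock)
    return top_clock, min_volts


clock, volts = best_stable(ATTEMPTS)
gain_mhz = round((clock - BASE_CLOCK_GHZ) * 1000)
print(f"{clock}GHz at {volts}V, +{gain_mhz}MHz over stock")
```

Run as written, this picks out 4.5GHz at 1.5375V, which is the 500MHz gain described above.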
I disabled Turbo Core for my initial overclocking attempts, but once I’d established a solid base clock, I was able to grab a little more speed by creating a Turbo Core profile that ranged up to 4.8GHz at 1.55V. Here’s how a pair of our benchmarks ran on the overclocked FX-8350.
A couple of other notes. First, remember that we measured peak power draw for the stock-clocked FX-8350 system at 196W in x264 encoding. The overclocked and overvolted config tested above peaked at about 262W, considerably more than the stock one. As you might imagine, when dealing with that sort of heat production, our Frio OCK was spun up like Joe Biden during the VP debate.
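To put the two power figures in proportion, the arithmetic can be sketched in a few lines of Python. The names here are illustrative only, and both inputs are the peak system power numbers quoted above.

```python
STOCK_PEAK_W = 196        # stock-clocked FX-8350 system, x264 encoding
OVERCLOCKED_PEAK_W = 262  # overclocked and overvolted config


def power_increase(stock_w, overclocked_w):
    """Return (extra watts, percent increase) in system power draw."""
    extra = overclocked_w - stock_w
    return extra, 100.0 * extra / stock_w


extra_w, pct = power_increase(STOCK_PEAK_W, OVERCLOCKED_PEAK_W)
print(f"+{extra_w}W, or {pct:.1f}% more than stock")
```

That works out to an extra 66W, roughly a third more system power at peak.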
Second, I had hoped to include a quick Skyrim test to see how the FX-8350’s gaming performance is improved by higher clock frequencies, but when I went to test it, our overclocked config wasn’t entirely stable. The game didn’t crash, but our character moved around erratically from time to time. (I’m straining to resist making a second Biden reference here.) We’ll have to spend more time with the FX-8350 in order to find an optimal overclocked config.
You’ve probably gathered that the FX-8350 improves on its Bulldozer-based precursor pretty handily for a chip that’s neither a die shrink nor an all-new architecture.
The final verdict on the FX-8350 isn’t terribly difficult to render, but it does have several moving parts. As usual, our value scatter plots will help us sort out the key issues. I’ve created a couple of them for your viewing pleasure. The first one shows overall performance from our entire CPU test suite (a geometric mean), with the exception of the synthetic benchmarks back on page three. Our gaming tests are a component of this overall performance metric. The second scatter plot isolates gaming performance by itself, with our latency-focused 99th percentile frame time results converted to FPS for easy readability. On both plots, the best values will be closer to the top left corner, where prices are low and performance is high.
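For the curious, the two calculations behind those plots can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual tooling: the function names are made up, the percentile uses the simple nearest-rank method (other methods give slightly different values), and the sample inputs are placeholders rather than results from this review.

```python
import math


def geometric_mean(scores):
    """Geometric mean of positive scores: the n-th root of their product."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def percentile_99(frame_times_ms):
    """99th percentile frame time, nearest-rank method (an assumption)."""
    ordered = sorted(frame_times_ms)
    rank = math.ceil(99 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def frame_time_to_fps(frame_time_ms):
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_time_ms


# Placeholder inputs for illustration only, not measurements.
relative_scores = [1.0, 2.0, 4.0]
frame_times_ms = [16.0] * 98 + [20.0, 25.0]

print(geometric_mean(relative_scores))
print(frame_time_to_fps(percentile_99(frame_times_ms)))
```

The geometric mean keeps any single benchmark with a large score range from dominating the overall number, and converting a frame time to FPS is just 1000 divided by the time in milliseconds.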
The overall performance scatter offers some good news for AMD fans: the FX-8350 outperforms both the Core i5-3470 and the 3570K in our nicely multithreaded test suite. As a result, the FX-8350 will give you more performance for your dollar than the Core i5-3570K, and it at least rivals our value favorite from Intel, the Core i5-3470.
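The "performance for your dollar" comparison boils down to a ratio of score to price. Here is a rough sketch in Python; the chip names, scores, and prices are hypothetical placeholders, not numbers from our tests.

```python
def performance_per_dollar(score, price_usd):
    """Overall performance score divided by price in US dollars."""
    return score / price_usd


def rank_by_value(chips):
    """Sort chip names from best to worst performance per dollar."""
    return sorted(
        chips,
        key=lambda name: performance_per_dollar(*chips[name]),
        reverse=True,
    )


# Hypothetical (score, price) pairs, placeholders only.
chips = {
    "chip_a": (100.0, 200.0),
    "chip_b": (110.0, 250.0),
    "chip_c": (80.0, 150.0),
}

print(rank_by_value(chips))
```

Note that the fastest chip in this made-up set, chip_b, ranks last on value, which is why a plot of price against performance tells you more than either number alone.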
Pop over to the gaming scatter, though, and the picture changes dramatically. There, the FX-8350 is the highest-performance AMD desktop processor to date for gaming, finally toppling the venerable Phenom II X4 980. Yet the FX-8350’s gaming performance almost exactly matches that of the Core i3-3225, a $134 Ivy Bridge-based processor. Meanwhile, the Core i5-3470 delivers markedly superior gaming performance for less money than the FX-8350. The FX-8350 isn’t exactly bad for video games—its performance was generally acceptable in our tests. But it is relatively weak compared to the competition.
This strange divergence between the two performance pictures isn’t just confined to gaming, of course. The FX-8350 is also relatively pokey in image processing applications, in SunSpider, and in the less widely multithreaded portions of our video encoding tests. Many of these scenarios rely on one or several threads, and the FX-8350 suffers compared to recent Intel chips in such cases. Still, the contrast between the FX-8350 and the Sandy/Ivy Bridge chips isn’t nearly as acute as it was with the older FX processors. Piledriver’s IPC gains and that 4GHz base clock have taken the edge off of our objections.
The other major consideration here is power consumption, and really, the FX-8350 isn’t even in the same class of product as the Ivy Bridge Core i5 processors on this front. There’s a 48W gap between the TDP ratings of the Core i5 parts and the FX-8350, but in our tests, the actual difference at the wall socket between two similarly configured systems under load was over 100W. That gap is large enough to force the potential buyer to think deeply about the class of power supply, case, and CPU cooler he needs for his build. One could definitely get away with less expensive components for a Core i5 system.
That’s likely why AMD has offered some inducements to buy the FX-8350, including a very generous $195 price tag and an unlocked multiplier. If you’re willing to tolerate more heat and noise from your system, if you’re not particularly concerned about the occasional hitch or slowdown while gaming, if what you really want is maximum multithreaded performance for your dollar… well, then the FX-8350 may just be your next CPU. I can’t say I would go there, personally. I’ve gotten too picky about heat and noise over time, and gaming performance matters a lot to me.
Still, with the FX-8350, AMD has returned to a formula that has endeared it to PC enthusiasts time and time again: offering more performance per dollar than you’d get with the other guys, right in that sub-$200 sweet spot. That’s the sort of progress we can endorse.