AMD’s quad-core Opteron 2300 processors

Somewhere around mid-morning this past Friday, a rather large package made its way into the depths of Damage Labs.

Inside was a server containing something very special: a pair of AMD’s new quad-core Opteron processors. The chip code-named “Barcelona” has been something of an enigma during its development, both because of questions about exactly when it would arrive and how it would perform when it did. After a long, hot weekend of non-stop testing, we have some answers to those questions. AMD is formally introducing its Barcelona-based Opteron 2300-series processors today, so the time is now. As for the performance, well, keep reading to see exactly how the new Opterons compare to Intel’s quad-core Xeons.


Introducing the Opteron 2300 series

As I said, we received AMD’s new Opterons just this past Friday. I’ve concentrated my efforts since then on testing the heck out of them, so you’re going to be spared my attempts to summarize this new CPU architecture in any kind of depth. If you’re unfamiliar with AMD’s K10 architecture and want an in-depth look at how it works, let me suggest reading David Kanter’s excellent overview of Barcelona. I will give you some basics, though.

Barcelona is a single-chip, native quad-core design. Each of those cores has been substantially revised to improve performance per clock cycle through a variety of tweaks, some big and some small. The cores now have a wider, 32-byte instruction fetch, and the floating-point units can execute 128-bit SSE operations in a single clock cycle (including the Supplemental SSE3 instructions Intel added to its Core-based Xeons). Accordingly, the Barcelona core has wider data paths throughout in order to accommodate the higher throughput—internally between units on the chip, between the L1 and L2 caches, and between the L2 cache and the north bridge/memory controller.

AMD has also added an L3 cache to the chip. That results in a cache hierarchy that includes 64KB of dedicated L1 cache and 512KB of dedicated L2 cache per core, bolstered by a 2MB L3 cache that’s shared dynamically between all four cores. The total effective cache size is still much smaller than that of Intel’s Xeons, but AMD claims its mix of dedicated and shared caches can avoid contention problems that Intel’s large, shared L2 might have.

Behind this L3 cache sits an improved memory controller, still integrated into the CPU as with previous Opterons. AMD claims this memory controller is better able to take advantage of the higher bandwidth offered by DDR2 memory thanks to a number of enhancements, including buffers that are between 2X and 4X the size of those in previous Opterons and an improved prefetch mechanism. Perhaps most notably, the new controller can access each 64-bit memory channel independently, reading from one while writing to another, instead of just treating dual memory channels as a single 128-bit device.

Throughout Barcelona, from this memory controller to the CPU cores, AMD has made revisions with power-efficiency in mind. That starts with clock gating, whereby portions of the chip not presently in use are temporarily deactivated. AMD says it has improved its clock gating on both coarse- and fine-grained scales, combining the ability to turn off, say, the entire floating-point unit when running integer-heavy code with the ability to put smaller logic blocks on the chip to sleep when they’re not needed. Even the memory controller will turn off its write logic during reads and vice-versa.

Clock gating is a commonly used technique these days, but some of Barcelona’s tricks are more novel. Unlike other x86 multicore processors, each of Barcelona’s CPU cores is clocked independently, so that each one can raise and lower its clock speed (via PowerNow) dynamically in response to demand. (In Intel’s current Xeons, one core at high utilization means the other core on that chip must run at a higher clock speed, as well.) Barcelona’s CPU voltage is still dependent on the power state of the core with the highest utilization, but AMD has separated the power plane for the chip’s CPU cores from the power plane for its memory controller. As a result, the memory controller and CPU cores can each draw only the power they need.

All told, these modifications led to a chip composed of approximately 463 million transistors. As manufactured on AMD’s 65nm SOI process, Barcelona measures 285mm².

The obvious goals for Barcelona included several key things: doubling the number of CPU cores per socket, raising the number of instructions each core can execute per clock, keeping power use relatively low by taking advantage of opportunities for dynamic scaling, and in doing so, achieving vastly improved performance per watt. AMD also sought to extend its excellent HyperTransport-based system architecture, although many of those improvements will have to wait for platform and chipset updates. The most urgent overarching goal, though, was undoubtedly restoring AMD’s competitive position compared to Intel’s Xeons based on the formidable Core microarchitecture.

The nuts and bolts of the quad-core Opterons

AMD continues its tradition with these new Opterons of making them drop-in replacements for existing infrastructure. In this case, that infrastructure involves Socket F-class servers and workstations. With only a BIOS update, these systems can move from dual-core to quad, without need for a change in motherboards, cooling solutions, or power supplies—not a bad proposition at all. That upgrade proposition does come with a caveat, though: older motherboards that don’t support Barcelona’s split power planes will suffer a performance hit with certain Opteron 2300 models. For example, the Opteron 2350’s default memory controller clock is 1.8GHz. Without separate voltage domains, though, the 2350’s memory controller drops to 1.6GHz. That matters quite a bit more than you might think, in part because the L3 cache uses the same clock.



A pair of Opteron 2350 processors

AMD is introducing another innovation of sorts with Barcelona in the form of a new power rating, dubbed ACP for “average CPU power.” Differences in how each company describes a processor’s maximum power and thermal envelope, known as Thermal Design Power (TDP), have long been a source of contention between Intel and AMD. For ages, AMD has argued that its TDP ratings are an absolute maximum while Intel’s are something less than that, and—hey, not fair! At the same time, AMD hasn’t had the same class of dynamic thermal throttling that Intel’s chips have, so it’s had to make do with more conservative estimates. The problem, according to AMD, is that its numbers were being compared directly to Intel’s, which could be misleading—particularly since its processors incorporate a north bridge, as well.

At long last, AMD is looking to sidestep this issue by creating a new power rating for its CPUs. Despite the name, ACP is not so much about “average” power use but about power use during high-utilization workloads. AMD has a methodology for defining a processor’s ACP that involves real-world testing with such workloads, and the company will apparently be using ACP as the primary way to talk about its CPUs’ power use going forward, though it will still disclose max power, as well. To give you a sense of things, standard Opterons with a 95W max power rating will have a 75W ACP. This move may be controversial, but personally, I think it’s probably justifiable given the power draw profiles we’ve seen from Opterons. I’m not especially excited about it one way or another since we spend hours measuring CPU power use around here. We’ll show you numbers shortly, and you can decide what to think about them.

Now that you know what ACP means, here’s a look at the initial Opteron 2300 lineup, complete with ACP and TDP numbers for each part.

Model            Clock speed  North bridge speed  ACP  TDP  Price
Opteron 2350     2.0GHz       1.8GHz              75W  95W  $389
Opteron 2347     1.9GHz       1.6GHz              75W  95W  $316
Opteron 2347 HE  1.9GHz       1.6GHz              55W  68W  $377
Opteron 2346 HE  1.8GHz       1.6GHz              55W  68W  $255
Opteron 2344 HE  1.7GHz       1.4GHz              55W  68W  $209

These chips fit into the same basic power envelopes as current Opterons, obviously, and AMD continues to offer HE models with higher power efficiency for a slight price premium. These first chips run at rather low clock frequencies, with even lower memory controller/L3 cache speeds. Fortunately, AMD does plan to ramp up clock speeds. To demonstrate that, they shipped us a pair of 2.5GHz Barcelona engineering samples at the eleventh hour, which they later christened as the Opteron 2360 SE. These higher-frequency products won’t be available until some time in the fourth quarter of this year, but we can give you a preview of their performance today. Naturally, we’ve run them through our full gamut of tests, along with a pair of 2GHz Opteron 2350s. We also have a pair of Opteron 2347 HEs, but we’ve had to defer that review to another day due to time constraints.

Test notes

You’ll want to watch several key matchups in the results, including:

  • Opteron 2350 vs. Xeon E5345 — This is the matchup AMD has identified as the most direct comparison on a price and performance basis. Looks to me like E5345s still cost more than the Opteron 2350’s list price, but this 2.33GHz Xeon will be a good foil for the 2GHz Barcelona.
  • Opteron 2350 vs. Xeon L5335 — Here’s your clock-for-clock comparison between Intel’s quad-core Xeons and AMD’s quad-core Opterons. The L5335 is a brand-new, low-power version of the Xeon whose most direct competitor might be the Opteron 2347 HE, but CPU microarchitecture geeks will appreciate comparing performance per clock between these two.
  • Opteron 2360 SE vs. Opteron 2218 HE — Watch this one in single-threaded (and up to four-threaded) tests to get a rough sense of Barcelona’s per-clock performance versus older Opterons. The 2218 HE runs at 2.6GHz and the 2360 SE runs at 2.5GHz, so it’s not an exact match, but it’s close.
  • Opteron 2360 SE vs. Xeon X5365 — The best of breed from AMD and Intel face off.

You can see our system configs and test applications below. We’ve tried to produce as direct a comparison as possible. We even tested the Xeons and Opteron 2300s in the same sort of chassis with the same cooling solution. The Opteron 2200s were in a slightly different enclosure, but I did some testing and believe the power consumption from its cooling fans is similar. All of the systems used the same model of power supply unit.

I think we have a good mix of tests, but they’re more heavily geared toward HPC and digital content creation than traditional server workloads. We have added SPECjbb2005 this time out, and we continue working on developing some other server-oriented tests to add to our suite. Unfortunately, our preparation time for this review was limited.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Common to all systems:

  • Memory size: 8GB (8 DIMMs)
  • Memory timings: CAS latency (CL) 5, RAS to CAS delay (tRCD) 5, RAS precharge (tRP) 5
  • Hard drive: WD Caviar WD1600YD 160GB
  • Graphics: integrated ATI ES1000 with 6.14.10.6553 drivers
  • OS: Windows Server 2003 R2 Enterprise x64 Edition with Service Pack 2
  • Power supply: Ablecom PWS-702A-1R 700W

Xeon systems:

  • Processors: dual Xeon L5335 2.0GHz, dual Xeon E5345 2.33GHz, dual Xeon X5365 3.0GHz
  • System bus: 1333MHz (333MHz quad-pumped)
  • Motherboard: SuperMicro X7DB8+ (BIOS revision 8/13/2007)
  • North bridge: Intel 5000P MCH; south bridge: Intel 6321 ESB ICH
  • Chipset drivers: INF Update 8.3.0.1013
  • Memory type: 1024MB DDR2-667 FB-DIMMs at 667MHz, cycle time (tRAS) 15
  • Storage controller: Intel 6321 ESB ICH with Intel Matrix Storage Manager 7.6

Opteron 2200 systems:

  • Processors: dual Opteron 2218 HE 2.6GHz, dual Opteron 2220 2.8GHz
  • System bus: 1GHz HyperTransport
  • Motherboard: Tyan Tiger K8SSA (S3992) (BIOS revision 5/29/2007)
  • North bridge: ServerWorks BCM 5780; south bridge: ServerWorks BCM 5785
  • Chipset drivers: SMBus driver 4.57
  • Memory type: 1024MB ECC reg. DDR2-667 DIMMs at 667MHz, cycle time (tRAS) 13
  • Storage controller: Broadcom RAIDCore with 1.1.7057.1 drivers

Opteron 2300 systems:

  • Processors: dual Opteron 2350 2.0GHz, dual Opteron 2360 SE 2.5GHz
  • System bus: 1GHz HyperTransport
  • Motherboard: SuperMicro H8DMU+ (BIOS revision 8/15/2007)
  • North bridge/south bridge: Nvidia nForce Pro 3600
  • Memory type: 1024MB ECC reg. DDR2-667 DIMMs at 667MHz, cycle time (tRAS) 15
  • Storage controller: Nvidia nForce Pro 3600 with 6.87 drivers

We used the following versions of our test applications:

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

With all of the talk about Barcelona’s increased throughput, I figured we should put that to the test. Here’s a quick synthetic benchmark of cache and memory bandwidth.

Barcelona delivers as advertised on this front, doubling the L1 and more than doubling the L2 cache bandwidth of the older Opteron 2200s, despite having lower clock speeds. Let’s take a closer look at the tail end of these results, where we’re primarily accessing main memory. I believe these results show memory bandwidth available to a single CPU core, not total system bandwidth, but they’re still enlightening.

The improvements to Barcelona’s memory controller appear to pay off nicely here. I’m a little dubious about the relatively low results for the Xeons, though. I expect we could see higher results with a different test.

Anyhow, that’s bandwidth, but its close cousin is memory access latency. Opterons have traditionally had very low latencies thanks to their integrated memory controllers. How does Barcelona look here?

Well, that’s not so good. Let’s look a little closer at the results with the aid of some fancy 3D graphs, and I think we can pinpoint a reason for the Opteron 2300s’ higher memory access latencies. In the graphs below, by the way, yellow represents L1 cache, light orange is L2 cache, red is L3 cache, and dark orange is main memory. Just because we can.

Ok, stop right there and have a look. The Opteron 2350’s L3 cache has a latency of about 23ns, and the 2360 SE’s L3 latency is about 19ns. Since latency in the memory hierarchy is a cumulative thing, that’s very likely the cause of our higher memory access latencies. I would give you the L3 cache latency in CPU clock cycles, but that’s kind of beside the point. Barcelona’s L3 cache runs at the speed of the north bridge—so 1.8GHz in the 2350 and 2.0GHz in the 2360 SE. The L3 cache may have some additional latency for other reasons: because cache access between the four cores is doled out in a round-robin fashion and because of the FIFO buffers that sit in front of this cache in order to deal with cores running at what may be vastly different clock speeds.
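For the curious, the conversion to cycles is simple arithmetic. This quick sketch of ours uses the approximate latencies read off our graphs, so treat the outputs as rough figures:

```python
# Convert the measured L3 latencies (read off our graphs, so approximate)
# from nanoseconds into clock cycles at a given clock frequency.
def ns_to_cycles(latency_ns, clock_ghz):
    # 1 ns at 1 GHz is exactly one cycle, so cycles = ns * GHz.
    return latency_ns * clock_ghz

# Opteron 2350: ~23 ns L3 latency, 2.0GHz cores, 1.8GHz north bridge/L3
print(ns_to_cycles(23, 2.0), ns_to_cycles(23, 1.8))  # ~46 core cycles, ~41 NB cycles
# Opteron 2360 SE: ~19 ns L3 latency, 2.5GHz cores, 2.0GHz north bridge/L3
print(ns_to_cycles(19, 2.5), ns_to_cycles(19, 2.0))  # ~48 core cycles, ~38 NB cycles
```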

Adding the L3 cache in this way was undoubtedly a tradeoff for AMD, but it certainly carries a hefty latency penalty. This penalty may become less pronounced when Barcelona reaches higher clock speeds. AMD says the memory controller’s speed can increase as clock frequencies do.

SPECjbb2005

SPECjbb2005 simulates the role a server would play executing the “business logic” in the middle of a three-tier system with clients at the front-end and a database server at the back-end. The logic executed by the test is written in Java and runs in a JVM. This benchmark tests scaling with one to many threads, although its main score is largely a measure of peak throughput.

SPECjbb2005 can be configured to run in many different ways, with different performance outcomes, depending on the tuning of the JVM, thread allocations, and all sorts of other things. I had no intention of producing a record score myself; I just wanted to test relative performance on equal footing. We’ll leave peak scores to the guys who spend their days optimizing for a single benchmark.

I used the Sun JVM for Windows x64, and I found that using two instances of the JVM produced the best scores on the Opteron-based systems. Scores with one or two instances were about the same on the Xeons, so I settled on two instances for my testing, with the following Java options:

-Xms2048m -Xmx4096m -XX:+AggressiveOpts

Those settings produced the following results:

In our first real performance test, Barcelona comes out looking very good. The Opteron 2350 outperforms the Xeon E5345, and the 2.5GHz Opteron 2360 SE beats out the 3GHz Xeon X5365—a promising start indeed.

Valve VRAD map compilation

This next test processes a map from Half-Life 2 using Valve Software’s VRAD lighting tool. Valve uses VRAD to precompute lighting that goes into games like Half-Life 2. This isn’t a real-time process, and it doesn’t reflect the performance one would experience while playing a game. Instead, it shows how multiple CPU cores can speed up game development.

I’ve included a quick Task Manager snapshot from the test below, and I’ll continue that on the following pages. That’s there simply to show how well the application makes use of eight CPU cores, when present. As you’ll see, some apps max out at four threads.

This is a disappointing way to follow up that SPECjbb performance. Barcelona can’t match the Xeons clock for clock here, which leaves the 2GHz 2350 trailing the rest of the quad-core processors and the 2.5GHz 2360 SE behind the 2.33GHz Xeon E5345. Obviously, even this performance is a huge improvement over the Opteron 2200 series, though, and at least puts AMD back in the game.

Cinebench

Graphics is a classic example of a computing problem that’s easily parallelizable, so it’s no surprise that we can exploit a multi-core processor with a 3D rendering app. Cinebench is the first of those we’ll try, a benchmark based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores are available.

I had high hopes for Barcelona’s purported improvements in floating-point math, but we’re just not seeing them here. Have a look at the single-threaded performance of the Opteron 2218 HE (at 2.6GHz) versus the Opteron 2360 SE (at 2.5GHz): performance per clock is nearly identical between K8 and Barcelona. The one saving grace for the new Opterons is strong multi-threaded scaling. The Xeon E5345 is faster than the 2360 SE with one thread but slower with eight. Put another way, the E5345 offers a 6.2X speedup with multithreading, while Barcelona’s is nearly 7X.

POV-Ray rendering

We caved in and moved to the beta version of POV-Ray 3.7 that includes native multithreading. The latest beta 64-bit executable is still quite a bit slower than the 3.6 release, but it should give us a decent look at comparative performance, regardless.

Again, we’re seeing strong performance scaling with Barcelona, but not dominance in floating-point math. The Xeon L5335 at 2GHz is just a few ticks behind the 2GHz Opteron 2350.

I decided to go ahead and report these results for the sake of completeness, but I don’t believe they’re telling us much about the new Opterons’ competence. This beta version of POV-Ray seems to have a problem with single-threaded tasks bouncing around from one CPU core to the next, and this causes especially acute problems on NUMA systems. Since the vast majority of the computation time for these scenes involves such single-threaded work, things turn out badly for the Opteron 2300s.

MyriMatch

Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He recently offered to provide us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of protein. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.

In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database.

MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
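The scheme Dr. Tabb describes maps naturally onto a pool of threads pulling batches from a shared queue. Here is a minimal sketch of that job-splitting approach in Python; the function names and batch logic are our own illustration, not MyriMatch's actual code:

```python
import queue
import threading

def make_jobs(num_sequences, num_threads, jobs_per_thread=10):
    """Split the database into (threads * 10) jobs of roughly equal size,
    as described above: 6714 sequences on 4 threads -> 40 jobs of ~167 each."""
    num_jobs = num_threads * jobs_per_thread
    size = num_sequences // num_jobs
    jobs = queue.Queue()
    for start in range(0, num_sequences, size):
        jobs.put(range(start, min(start + size, num_sequences)))
    return jobs

def worker(jobs, results):
    # Each thread grabs a whole job (a batch of sequences) at a time, which
    # keeps synchronization overhead low relative to the work done per job.
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        results.append(len(job))  # stand-in for scoring spectra vs. peptides

jobs = make_jobs(6714, 4)
results = []
threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(sum(results))  # every one of the 6714 sequences handled exactly once
```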

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we’ve tested with one to eight threads.

This application looks to be limited by memory bandwidth or some similar factor; scaling beyond four threads doesn’t work out well on any of these systems. That said, the new Opterons perform respectably here, with the 2350 matching the Xeon E5345 and the 2360 SE edging out the Xeon X5365.

STARS Euler3d computational fluid dynamics

Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here. (I believe the score you see there at almost 3Hz comes from our eight-core Clovertown test system.)

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.

The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. I understand the STARS Euler3D routines are both very floating-point intensive and oftentimes limited by memory bandwidth. Charles has updated the benchmark for us to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.

The Barcelona Opterons can’t even match the Xeons clock for clock here. In fact, even the 2360 SE trails the slowest 2GHz Xeon. Again, we’re seeing just under 2X the performance of the dual-core Opterons at similar clock speeds (2218 HE vs. 2360 SE), but even that’s not enough to catch Intel.

Folding@Home

Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.

The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.

notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.

On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
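The arithmetic behind that estimate is straightforward. Here's a tiny sketch of it as we understand the CD's method; the per-WU sample numbers are made up purely for illustration:

```python
def estimated_ppd(per_wu_ppd, num_cores):
    """Average the points-per-day scores across the tested WU types,
    then multiply by the core count, as the benchmark CD does."""
    return sum(per_wu_ppd) / len(per_wu_ppd) * num_cores

# Hypothetical per-core points/day for the four WU types tested
sample = {"Tinker": 120.0, "Amber": 150.0, "Gromacs": 200.0, "Gromacs 2": 170.0}
print(estimated_ppd(list(sample.values()), 8))  # (640 / 4) * 8 = 1280.0
```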

This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.

To catch the architectural improvements, follow the matchup between Opteron 2218 HE and Opteron 2360 SE at similar clock speeds once again. Per-clock performance is similar between K8 and Barcelona with the Tinker and Amber work units, where AMD already excelled, but Barcelona is much stronger with the two Gromacs WU types—which tend to dominate the WU assignment these days, as I understand it. Those improvements aren’t enough to catch the Xeons, though. The Intel processors remain faster clock-for-clock and also run at higher frequencies.

The Barcelona-based Opterons put in a respectable showing overall, however, on the strength of eight cores that handle the Amber and Tinker WU types relatively well.

The Panorama Factory
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs. The program’s timer function captures the amount of time needed to perform each stage of the panorama creation process. I’ve also added up the total operation time to give us an overall measure of performance.

Here’s another case where the new Opterons can’t match the Xeons clock for clock, and they have lower frequencies, as well. Things do seem to tighten up with the faster 2360 SE, its higher frequency core, and its faster memory controller clock. This design clearly wants to run at higher clock speeds than AMD is starting it out at.

picCOLOR

picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Eight of the 12 functions in the test are multithreaded, and in this latest revision, five of those eight functions use four threads.

Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1 GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.

This app uses a maximum of four threads, and again, the Barcelona Opterons perform similarly on a per-clock basis to the older dual-core Opterons. I’ll leave you to analyze the finer points of the individual sub-tests, if that’s your thing.

Windows Media Encoder x64 Edition

Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. Unfortunately, it doesn’t appear to use more than four threads, even on an eight-core system. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default “High definition quality audio” codec threw some errors in Windows Vista, I instead used the “Multichannel audio” codec. Both audio codecs have a variable bitrate peak of 192Kbps.

Here, the Barcelona and Xeon quad-core processors at 2GHz go head to head, and the Xeon comes away victorious. Fortunately, the 2.5GHz Opteron 2360 SE looks like it may be relatively stronger.

SiSoft Sandra Mandelbrot

Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX, SSE, and SSE2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:

This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.

The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.

We’re using the 64-bit version of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations in parallel.
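The column-interlacing scheme Sandra's FAQ describes is easy to picture in code. Below is our own simplified sketch of it, scaled down from the FAQ's 640×480 so it runs quickly in pure Python; with N threads, thread i computes columns i, i+N, i+2N, and so on, so no two threads ever touch the same column:

```python
import threading

# Scaled-down stand-ins for the FAQ's 640x480 image at 255 iterations
WIDTH, HEIGHT, MAX_ITER = 64, 48, 255

def mandelbrot_iters(cx, cy):
    """Iterations before z = z^2 + c escapes, capped at MAX_ITER."""
    z, c = complex(0, 0), complex(cx, cy)
    for i in range(MAX_ITER):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return MAX_ITER

def render_columns(thread_id, num_threads, image):
    # Interlacing: each thread strides through columns by the thread count.
    for x in range(thread_id, WIDTH, num_threads):
        cx = -2.0 + 3.0 * x / WIDTH          # map x onto [-2.0, 1.0]
        for y in range(HEIGHT):
            cy = -1.2 + 2.4 * y / HEIGHT     # map y onto [-1.2, 1.2]
            image[y][x] = mandelbrot_iters(cx, cy)

image = [[0] * WIDTH for _ in range(HEIGHT)]
threads = [threading.Thread(target=render_columns, args=(i, 4, image))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

Because each thread owns a disjoint set of columns, no locking is needed on the shared image, which is what makes this such an embarrassingly parallel workload.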

I’ve been looking forward to seeing the results of this little test, because it has the potential to demonstrate what Barcelona’s single-cycle 128-bit SSE enhancements can do when given a simple, parallelizable task, just as it did when the Core microarchitecture arrived. The story that it tells is intriguing. We see huge improvements between the Opteron 2218 HE and the 2360 SE—nearly 4X in the integer test, with only twice as many cores on the 2360 SE. The magnitude of the gain in the floating-point test is lower, but still well past the doubled score one might expect from twice the cores in ideal conditions. Overall, though, the Xeons’ per-clock throughput remains much higher than Barcelona’s.

POV-Ray power consumption and efficiency

Now that we’ve had a look at performance in various applications, let’s bring power efficiency into the picture. Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which we asked POV-Ray to render our “chess2.pov” scene at 1024×768 resolution with antialiasing set to 0.3.

Before testing, we enabled the CPU power management features for Opterons and Xeons—PowerNow! and Demand Based Switching, respectively—via Windows Server’s “Server Balanced Processor Power and Performance” power scheme.

Incidentally, the Xeons I’ve used here are brand-new G-step models that promise lower power use at idle than older ones. I used a beta BIOS for our SuperMicro X7DB8+ motherboard that supports the enhanced idle power management capabilities of G-step chips. Unfortunately, I’m unsure whether we’re seeing the full impact of those enhancements. Intel informs me that only newer revisions of its 5000-series chipset fully support G-step processors in this regard. Although this is a relatively new motherboard, I’m not certain it has the correct chipset revision.

Anyhow, here are the results:

We can slice up the data in various ways in order to better understand them, though. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

The new Opterons draw a little more power at idle than the old ones, as might be expected with so many more transistors on the chips. Still, the Barcelonas draw much less power at idle than the Xeons. Part of the Xeons’ problem is a platform issue: FB-DIMMs draw quite a bit more power per module than DDR2 DIMMs.

Next, we can look at peak power draw by taking an average from the ten-second span from 30 to 40 seconds into our test period, during which the processors were rendering.

True to its billing, the Opteron 2350 draws no more power under load than the Opteron 2220, its dual-core predecessor. Of course, AMD had to compromise on clock frequency in order to do it, but this is still an impressive result—especially since the 2350 draws less power under load than the low-power Xeon L5335, whose TDP rating is 50W. Then again, this is total system power draw, and we’ve already established that the Xeons have a handicap there—albeit one they’re stuck with.

Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.
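The watt-seconds arithmetic is simple enough to sketch. Here's a hypothetical Python snippet (the function name and the sample numbers are illustrative, not our actual logger output), assuming one power reading per second:

```python
def energy_joules(power_watts, interval_s=1.0):
    """Total energy in watt-seconds (joules) from evenly spaced power readings."""
    return sum(power_watts) * interval_s

# Illustrative log: 250 W for a 60 s render, then 150 W idle for 30 s.
samples = [250.0] * 60 + [150.0] * 30
total = energy_joules(samples)  # 250*60 + 150*30 = 19,500 J
```

A system that idles lower, or finishes its render sooner and drops to idle earlier, ends up with a smaller total over the same fixed window, which is exactly the effect discussed below.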

By using more cores to finish the rendering work sooner, the Opteron 2350 is able to use less power through the course of the test period than the Opteron 2220, despite having similar peak power consumption and higher idle power consumption. Sometimes, power efficiency is partially about finishing first. However, the Xeons’ strong performance alone can’t redeem them here.

We can quantify efficiency even better by considering the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve chosen to identify the end of the render as the point where power use begins to drop from its steady peak. We’ve sometimes seen disk paging going on after that, but we don’t want to include that more variable activity in our render period.

We’ve computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
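That cutoff rule can also be sketched in code. The hypothetical Python helpers below (the names and the 90% threshold are my own illustration, not our actual analysis script) isolate the steady-peak span from a power log and total the energy within it:

```python
def render_period(samples, frac=0.9):
    """Return (start, end) indices of the render: the span where power holds
    near its steady peak, ending where it first drops back below frac * peak."""
    peak = max(samples)
    thresh = frac * peak
    start = next(i for i, p in enumerate(samples) if p >= thresh)
    end = start
    while end < len(samples) and samples[end] >= thresh:
        end += 1
    return start, end

def render_energy(samples, interval_s=1.0):
    """Energy (joules) consumed during the detected render period only."""
    start, end = render_period(samples)
    return sum(samples[start:end]) * interval_s

# Illustrative log: 5 s of idle, a 20 s render at 250 W, then idle again.
samples = [150.0] * 5 + [250.0] * 20 + [150.0] * 10
```

Trimming the span at the first sustained drop is what keeps any post-render disk paging, which varies from run to run, out of the comparison.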

This may be our best measure of power-efficient performance under load, and in this measure, the Barcelona Opterons again excel. The Xeons are close here due to their short render times, but the new Opterons place first and second.

Power use at partial utilization with SPECjbb 2005

Before we close out our look at power efficiency, I’d like to consider another example. I’ve measured power use in SPECjbb2005 in order to show how it scales with incremental increases in load. I’ve used only a single instance of the JVM so that we can see a nice, gradual step up in load—two instances would take us to peak utilization much more quickly.

We’ve graphed the quad-core Opterons and Xeons together. Since the dual-core Opterons take much longer to finish, they get their own graph.

Well, that’s interesting to see. I’m not sure exactly what to make of it just yet. I’d like to correlate power and performance here, but as I’ve mentioned, our testing time has been limited. Perhaps next time.

Conclusions

The new Barcelona-based quad-core Opterons bring major performance gains over their dual-core predecessors while fitting comfortably into the same power and thermal envelopes. Doubling the number of CPU cores will take you a long way in the server/workstation space, where the usage models tend to involve explicitly parallel workloads. The new Opterons also bring improved clock-for-clock performance in some cases, most notably with SSE-intensive applications like the Folding@Home Gromacs core. However, Barcelona’s gains in performance per clock aren’t quite what we expected, especially in floating-point-intensive applications like 3D rendering, where it looks for all the world like a quad-core K8. As a result, Barcelona is sometimes faster, sometimes slower, and oftentimes the equal of Intel’s Core microarchitecture, MHz for MHz. Given the current clock speed situation, that’s a tough reality.

That said, new processor microarchitectures often scale quite well with clock speed, and our sneak peek at the 2.5GHz Opteron 2360 SE suggests Barcelona may follow suit. Still, one can’t help but wonder whether AMD did the right thing with its L3 cache. That cache’s roughly 20ns access latency erases the Opteron’s longstanding advantage in memory access latency, yet it nets an effective total cache size just over half that of the current Xeons’. Since the L3 cache is clocked at the same speed as the memory controller, raising the memory controller’s clock speed should be a priority for AMD. This particular issue may be more of a concern in desktops and workstations than in servers, however, given the usage models involved.

At its modest price and 2GHz clock speed, the Opteron 2350 is still a compelling product for the server space, especially as a drop-in upgrade for existing Opterons. AMD’s HyperTransport-based system architecture remains superior—a similar design is the way of the future for Intel now—and this architecture is one of the reasons why the Opteron 2350 scales relatively well in some applications, such as SPECjbb2005. Also, for the past couple of years, both Intel and AMD have been talking up a storm about how power-efficient performance is the new key to processors, especially in the server space. By that standard, AMD now has the lead. By any measure—and we have several, including idle power, peak power, and a couple of energy use metrics—the quad-core Opterons trump Intel’s quad-core Xeons. Even the early 2.5GHz chip we tested proved to have relatively low power draw, which bodes well. We’ll have to take the Opteron 2347 HE out for a test drive soon, to see how it fares, as well.

Nonetheless, AMD now faces some harsh realities. For one, it is not going to capture the overall performance lead from Intel soon, not even in “Q4,” which is when higher-clocked parts like the Opteron 2360 SE are expected to arrive. Given what we’ve seen, AMD will probably have to achieve something close to clock speed parity with Intel in order to compete for the performance crown. On top of that, Intel is preparing new 45nm “Harpertown” Xeons for launch some time soon, complete with a 6MB L2 cache, 1.6GHz front-side bus, clock speeds over 3GHz, and expected improvements in per-clock performance and power efficiency. These new Xeons could make life difficult for Barcelona. And although AMD should remain competitive in the server market on the strength of Opteron’s natural system architecture and power efficiency advantages, this CPU architecture may not translate well to the desktop, where it has to compete with a Core 2 processor freed from the power and memory latency penalties of FB-DIMMs. But that, I suppose, is a question for another day.

Comments closed
    • VILLAIN_xx
    • 12 years ago

    I apologize in advance for this long opinion. Both Intel and AMD are competing violently. Which is great for us consumers.

    I really enjoyed this review alot. Always loved Techreport to put things in layman’s terms for people to understand everything from Rendering to Power draw. I have high hopes for AMD to stay alive and continue to bring things to the table and compete with Intel and bring new things to ALL of us. I see theres alot of people rooting for the Blue team and the Green team. Show loyalty to your family and friends before Blue or Green.

    Things to remind from about 5+ years ago. AMD put out their X2’s and spanked the benchmarks across the board. Intel couldnt make anything close for all that time being until they came out with Conroe a odd few years later. About 1 year later AMD at least responded to a new product thats in the same ball park. They had to. Their wallets are no where near as deep to Intel’s to keep making crappy lines of CPUs for years and relying on the Intel name brand and marketing ploys to sell. The X2’s only made AMD sat on their first golden product like fat wealthy men wearing their monocles and drinking tea, (while that….. Intel FINALLY put their engineers to work and unleashed a beast as X2 once was).

    Conroe Definition=Lower Power draw, Clock to Clock better performance, and high overclock potential compared to X2s.

    Conroe has only been out for about a year and some months and its already obsolete. Intel had to make the q6X00 series because they knew AMD was already on the quad path before Conroe. The Barcelona was AMD’s only potential to make the comeback. Intel HAD to beat them to the Quad Core just as they were once beaten to Dual Cores..

    Back to todays Date, AMD rushed out some low Clock Barcelonas (premature baby Cpus I.M.O.) for their first Quad line.

    We all saw the benches and they are contending pretty decent with Intels teenaged Xeons, and with some damn low surprising power consumption as Conroe once surprised us. Im thinking about the 2.5ghz 2360 SE and the 3.0ghz 5365! AMD is .5 ghz behind and not getting as spanked in something that means alot to me as an Animator, and thats Rendering. Only 9sec. and 0.5ghz behind.

    I cant wait till Techreport starts overclocking or shows a stock 3.0ghz (when/If AMD delivers), and find out if its overclock roof is as high as the Conroes. Then maybe these Barcy’s might earn some more credit and respect. Remember the Benches for the X2s and Conroes? You have to up the clock on a X2 at least 0.2 ghz to compete with a Conroe on most benches…

    Today i saw a 2.0ghz Barcy compete really well with a 2.33ghz Xeon, and it had lower power consumption.

    Oh what about the Penryn’s and Phenoms? Thats when ill consider to buy a Quad Core. AND thats when we’ll see some REAL toe to toe fighting.. Both these releases will be the ironed out current server technologies on smaller cpus. Intel is bragging about Penryn being the real competition. From the benches of present day, its 45nm, 10% lower power usage and 5% better performance than Conroe. Last time i heard, thats pretty much how Phenom is being compared to Barcelona on 65nm and Barcelona is contending very well. But honestly WHO cares who gets smaller first, Really now did that make you guys consider size will be the deal breaker in a CPU purchase? Another thought, How many of us have waited for this Barcy before really considering buying a Xeon? Theyre both server chips and alot of money if you do not intend to use it for business purposes. Oh thats right Fan Boys would buy it (or pretend they did) for bragging rights on these forums..Its Your money not mine.. ill wait for Penryn and Phenom.

    IF AMD and INtel are just as equal in clock to clock and Overclock potential in the benches, then finally…. our CPU companies are closer to being honest and not fooling any one with Price and Performance.

    No more market gimmicks. No more personal lab paper benchmarks… just give us a great Price & raw Performance

    (oh & dont forget the motherboard prices lol)

    :o)

    • Damage
    • 12 years ago

    I’ve finally banned SVB. That’s enough trolling for one lifetime–or several. Apologies for not getting to it sooner; I was out at a funeral yesterday.

      • totoro
      • 12 years ago

      Wow, that was fast. Thanks Damage for making the world a better place 🙂

      • ew
      • 12 years ago

      It would be great if the comments section had some sort of self moderation.

        • indeego
        • 12 years ago

        “options”

        I think what you mean is more moderators to filter the chaff

          • D-Man
          • 12 years ago

          Now, stop a second… I am not SVB, nor do i have any affiliation to him in any way. this is also my first time on this website, Ive heard of it before, but never used it. I was mainly googling the phenom/penryn procs, and found this site. i read through it all and am glad that SVB did have posting rights. You see, take out all of SVB’s posts, and all of the replies to his posts, and you lose a lot of viable information. (Whether it be his or that of the repliers) All of which i am glad is there, because i did learn, or at least research into some of the benchmark qualities for myself. My conclusion which i will not display here, for I would rather not start anything more, and cause more problems. IMO removing SVB was a mistake, sometimes having someone like that on a forum can build its strength, because it calls for more research to be done, which enhances the depth, and knowledge of the story/ its readers.

            • VILLAIN_xx
            • 12 years ago

            ?????

            Through the posts i’ve been reading, i dont really see why SVB was banned. Seemed like someone just doing some hardcore reading of reviews about benchmarks. In fact, i think he had alot more rellevant interesting side notes to say than most people’s postings here.

            lol, watch me get banned.

      • cynan
      • 12 years ago

      Sorry to hear about the funeral, I hope it was no one close…

      Based on this thread alone, I as well, wouldn’t think that SVB’s emphatic “participation” warranted a banning. If he was trolling (which he well may have been) then it was a more intelligently constructed troll then 99.9% of the rest of the trolling that goes on – and that alone would be deserving of a more linient consequence, no?

      Yes, the posts were lengthy, repetative, vast in number and even perhaps a tad overly critical, but I would hope that a little over-enthusiasm is not a prerequisite for being banned on this forum.

        • ssidbroadcast
        • 12 years ago

        The funeral was *[

          • VILLAIN_xx
          • 12 years ago

          edit:

          i apologize…

          After witnessing the edicate (im sure i spelled this wrong) of this site almost a year later.

          SVD was a troll!

    • UberGerbil
    • 12 years ago

    RealWorldTech offers a perspective on the Barcelona launch. Nothing particularly new, but it’s a balanced viewpoint assessing it entirely (and appropriately) in the server context. In other words, it’s a long way away from the fanboy lanparty desktop viewpoint you see in some of the threads here.

    And TR gets a shout out for offering the best review of the chip so far.

    http://www.realworldtech.com/page.cfm?NewsID=375&date=09-13-2007#375

    • snowdog
    • 12 years ago

    You can’t reach conspiracy theorists (CTs) with reason.

    If someone believes the CIA brought down the buildings on 9/11 with a controlled demolition. No amount of scientific evidence or reasonable argument will convince them to the contrary.

    Likewise if someone thinks that a great conspiracy is taking place to tilt benchmarks in Intels favor by using the Intel compiler that exists seemingly for no other purpose they can not be persuaded to the contrary by facts or reasonable argument.

    For the reasonable people, who may be swayed by voluminous arguments from the CT’s. Here are some things to think about.

    The Intel compiler is chosen in many high performance situations because it yields the fastest code on both platforms. It is widely used by AMD for it’s own specCPU benchmarking entries.
    http://www.spec.org/

    Apparently AMD must be biased against AMD to use this Intel compiler? Testing on SpecCPU shows that AMD benefits as much as Intel from the Intel compiler:

    http://www.principledtechnologies.com/clients/reports/Intel/CompComp.pdf

    Next a CT might respond that this is because SpecCPU is simplistic 486 code and doesn't use SSE. Nothing can be further from the truth. Do a search on spec Cpu2000 and SSE and you find it takes advantage of SSE optimizations and code and it is not a simplistic benchmark either but a suite of applications/algorithms. As of Feburary 2007 it was retired for Cpu2006 which is much more complex suite make much more use of SSE. Still using Intel compilers again.

    Or the CT might latch onto some point in the past when Intel checked if it was one of their own CPUs before enabling SSE optimizations (not their job to find out otherwise) but they wouldn't dig deep enough to find that with SSE opt flags now it doesn't check at all and the code may actually just crash if SSE is not present.

    In short Intel makes a great compiler, it is used by people who want to squeeze the greatest performance out of any applications and produces SSE optimizations that work on both Intel and AMD chips. But hey you are probably a reasonable person and already knew that.

      • totoro
      • 12 years ago

      he also clearly understands the difference between a development environment and a compiler and is just kidding when he implies that Visual C++ is a programming language, right?
      ; )

      • shaoyu
      • 12 years ago

      Throwing cute labels around does not lend you more credibility. Only factual argument does. Nobody here is claiming that the reviewers conspired with intel to use icc compiled benchmarks. If you are saying intel compiler discrimination is no longer an issue, please provide evidence as other people claim the opposite.

        • snowdog
        • 12 years ago

        r[

        • SVB
        • 12 years ago

        It is very interesting that the Intellabees have not chosen to comment about the blog at http://www.swallowtail.org/naughty-intel.html where the offending lines of code are documented, an explanation of their effect is given, a corrective patch is given and conclude with comments about the effects on benchmarks. Religious zealots and fanboy's (Is there any difference) refuse to believe that their belief in their God can possibly be in error. I thank you and all of the others that stuck with my long and drawn out discussion. I really didn't know where I would finish when I started (and as I type this on my 3.2Ghz Pentium 4). If Damage will permit, I think that an understanding of benchmarks, benchmarketing and their abuses is a most important topic for anyone who is an extreme PC user. For the Intellabees that want to believe that the great God Intel would not alter a compiler for benchmarketing purposes, all I can say if I provided two very well documented references. There are hundreds more all over the internet. And for those who wonder if I think AMD is pure as the driven snow I have this comment: The only reason that I seem to concentrate more on Intel's sins is that a 1000 people can sin a hellova faster than 100.

          • snowdog
          • 12 years ago

          Those are Intel only options. Obviously facts or reason hold little sway with you (or your sock puppet) so it becomes increasingly obvious that answering your delusional rants is pointless. The link you quote contains the explanation if you would only read it.

          You are jumping on a problem in version 7 of Intels compiler, we are now on version 10. It is perfectly reasonably that upon the introduction of each new version of SSE, the Intel compiler will first support ONLY the Intel chips for a couple of what should be Obvious reasons.

          Intel Chips get SSE version first.
          Intel will get to develop the compiler with the SSE Intel chips before the chips are even released.

          *[

            • SVB
            • 12 years ago

            Sometimes unintended choices of words can tell a lot: ” You are jumping on a problem in version 7 of Intels compiler, we are now on version 10. ”

            I would have said, “You are jumping on a problem in version 7 of Intels compiler, Intel is now on version 10. ”

            Are you an Intel employee? Anyway, it shows a strong identification with Intel.

            I don’t know about version 10. My personal version is 8.1 and is so damn expensive for how little I use it that I really can’t justify going to version 10.

            Your characterization of the situation is somewhat wrong and missing some facts. There was no version 6 of the Intel Fortran compiler. Version 6 was the last Compaq version.

            Compaq Fortran version 6 was a serious thorn in Intel’s side. Nearly all of the benchmarks compiled with Compaq V6 rand benchmarks on AMD much faster than on Intel chips. After it was acquired by Intel, Intel began to win nearly all of the benchmarks compiled with V7. Many system administrators identified the change that caused the problem and complained to Intel. Tim’s blog details the runaround he was given in order to get a fix. At the time of the blog, he was still waiting.
            Was the problem ever fixed (ie. in version 10)? I don't know and it's not worth it to me to spend over $600 to find out. But given Dirk Meyers comments over at the Inquirer: http://www.chipzilla.com/?article=42257, and this comment on the Intel community forum dated 4/27/2007 http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30233789.aspx, I suspect not. Do you know if it was actually fixed or are you just positing a fact to win a discussion? Oh, and oh yes *[

            • snowdog
            • 12 years ago

            Not my Job to explain the internet to you. You can spend forever doing a search for something with someone whining about Intel and put the onus on me to answer it.

            All your are doing is finding a new complaint when your old one is exhausted.

            Your current complaint.

            Source: The Inquirer: Yipee!
            The complaint: Intels compiler is better than PGI. Umm So?
            “When SPECint_2006 was ran with the Intel vs. PGI compiler, Opteron 2350 has 5% performance deficit,”

            When they ran it on GCC for both opteron takes the lead. So Opteron takes the lead when run on crappy compiler.

            Is there anything in there anywhere that shows Intel doing wrong?

            Finally, for the record I don’t work for Intel and have no affiliation, never owned stock or anything. If damage requests it, I will verify my particulars with him, will you? Will “shaoyu”?

            • SVB
            • 12 years ago

            I’m not whining, I sitting here ROFLMAO at your attempt to one up but actually digging your self in deeper.

            The only semiconductor house I ever worked for was National Semi and that was a long time ago. I am unable to tell this forum just exactly who I work for, suffice it to say I am not currently nor have I been associated with any electronic parts manufacturer since NatSemi. I do work for a very large, independent R&D organization.

            Just because the source is the Inquirer, does that mean that Dirk Meyer didn’t say it? Just because its on the Intel software community development site does that mean its not true. Oh wait, that’s Intel. Feel that hole get a little deeper.

            Joseph E. Johnston wrote Robert E Lee a letter in March, 1865, about William T. Sherman “Sherman’s course can not be altered by the small force I have. I can do no more than annoy him.” Your not even annoying.

            • snowdog
            • 12 years ago

            Meritless Post. What is wrong with Intel building a better compiler than PGI?

            That is all the Inq link showed.

            • SVB
            • 12 years ago

            Nothing. But I really think I allocated the blame in post #176.

            Only because AMD is strapped for cash I feel the responsibility is:

            Most responsible > Benchmarkers > Intel > AMD > least responsible.

            If AMD had been making $100m per quarter, then

            Most responsible > Benchmarkers > AMD > Intel > least responsible.

            If AMD had had the cash, saw this coming, and they should have after BAPCO, then it is their fault for not having taken proactive action. That’s where the anti-trust comes in.

        • swaaye
        • 12 years ago

        Yes, but is it wrong for Intel to bias their compiler? They invest their money into it and it is a competitive advantage for them to make it perform best on their hardware.

        AMD apparently can’t be bothered to produce their own compiler.

        It’s like people complaining that NVIDIA’s devrel “TWIMTBP” program causes ATI hardware to perform sub-par. Well, maybe AMD/ATI should get some better devrel going? Is it so surprising that companies will use what works best for the majority, or is the easiest, or is a product from a company that is willing to assist or sponsor them somewhat?

        And ironically there have been TWIMTBP games that run better on ATI hardware, just as there is software built with the Intel compiler yet running better on AMD hardware.

          • Vaughn
          • 12 years ago

          wow this is still going getting hard to follow! I think you guys both need a beer! Hooters anyone =)

            • SVB
            • 12 years ago

            Absolutely. Just provide a bodyguard when I have to go home to the wife.

          • SVB
          • 12 years ago

          Really, that’s an excellent question. And maybe Intel has only a small problem as opposed to some others.

          More history. At one time, Intel developed and maintained their own compiler. They had continual problems with the code generation and compiler failures. In 2003, they scrapped the original compiler and acquired Compaq’s. I believe that no cash was involved but was a trade of assets, some of which involved EPIC and Itanium. Very shortly afterwards, the compiler stopped optimizing for AMD processors. As I said before, the Compaq V6 was a thorn in Intel’s side because the optimizations for AMD allowed AMD to win many benchmarks.

          1. Was it wrong for Intel to buy the compiler? Absolutely not.
          2. If they acquired the compiler to cripple AMD’s ability to win benchmarks, would that be wrong. Probably, given their monopoly position, that would be lessening AMD’s ability to make a profit. Since the compiler is part of AMD’s antitrust suit, their lawyers certainly hope the judge will think so. If the judge agrees, it could end up being a very expensive acquisition for Intel. I’ll let the judge decide.
          3. Should Intel have made a statement that the compiler no longer optimizes for AMD. Absolutely, as it caused considerable damage to some AMD customers.
          4. Does the major fault lie elsewhere? Absolutely. SiSoft, SPEC and other benchmark organizations should have known that using the Intel compilers would skew the benchmark results. They chose to remain silent and not use other compilers. There are many choices which could have been used which generate identical code for AMD and Intel. The degree of optimization is irrelevant when doing generalized benchmarks. For a specific installation, then the benchmarks should be compiled with the tools used in that installation. I really wouldn’t care about AMD vs Intel on a SiSoft benchmark. I want to know which one runs our MYSQL best and runs the code developed with our Absoft Fortran compiler most efficiently.

          As for your other assertions, I’m not into NVIDIA or ATI. Talk to my son, the gamer.

          Should AMD have been more proactive? I have to give a qualified yes. The have been a number of compiler developers that have gone under in recent years. AMD could have and should have brought the assets of, say Watcom, and underwrote their development. I don’t think the Compaq/Intel development team was over 30 people. Even with AMD’s financial problems, they could have found another $10 or $12 megabucks. A few more megabucks per year to have supported a neutral or AMD favorable benchmark may have substantially eliminated the problem. Sometimes it doesn’t pay to be too cheap.

          So while AMD and Intel were playing gun fighter at three paces, the average user was standing in between.

    • SVB
    • 12 years ago

    It’s just about done.
    Seriously, I want to thank snowdog and damage for their examples that raised objections to my arguments. Understanding them gave me an important clue to explain the problem with the benchmarks.

    I should have realized that on Spec CPU2000 that Intel and AMD would scale identically. A little history. For those that didn’t know, Gordon Moore, Bob Noyce (the founders of Intel) and Jerry Sanders (the founder of AMD) worked for Fairchild Semiconductor and were friendly. When IBM wanted to use the 8086 for their PC, the insisted that Intel have a second source so Bob and Gordon went to their bud, Jerry, as the second source. Intel supplied the masks and AMD fabbed. This worked well until Andy (Only the Paranoid Survive) Grove became chair of Intel and wanted the x86 market to himself. Law suit 1 followed and AMD won the right to continue to fab x86 clones from the Intel masks. Intel went to the 486 and AMD reverse engineered it and produced an exact copy. Lawsuit 2 followed and AMD was given access to Intel IP (I’m not sure I agree with the judge’s decision and I think other factors such as maintaining competition may have entered into the decision). There was some further negotiation and the result was that AMD and Intel have cross licensed each others IP. Bottom line was that the Intel 486 and the AMD 486 are identical, micro-op for micro-op. Run them at the same clock and the execute programs in identical times. The were also socket interchangable so that if you didn’t like Intel on your Intel board, you could always switch to an AMD processor. So AMD and Intel run base x86 code and MMX at the same IPC.
    This is the interesting point with the Intel compilers. If the cpu string is not “GENUINEINTEL” then the compiler defaults to 486 code. AMD and Intel run at exactly the same IPC. If the procesor is “GENUINEINTEL” then the SSE optimizations are included. So it isn’t really necessary to get an “AUTHENTICAMD” processor to run AMD vs. Intel comparisons with the Intel compilers. Run the benchmark normally for the Intel scores. Recompile the benchmark without the SSE optimizations, run it on the Intel porcessor and get the AMD scores. This will save Damage a lot of time and money in testing because he won’t have to get an AMD board for comparison. Even simpler, run any SSE benchmark once on an Intel 25Mhz 486, multiply by the clock frequency of the AMD chip to be tested and divide by 25 and obtain the score for the latest AMD chip. Try it. And if anyone now thinks benchmarks using Intel compiles are legitimate, I have swamp land in Florida to sell.


      • shaoyu
      • 12 years ago

      SVB, thanks for providing so much information here. I am surprised that the intel compiler issue wasn’t made more public-aware and it seems to me that most reviewers are not even aware of it. People seem to hold these corporations with ridiculously low moral standard. It is unfair competition and should be rightly considered cheating. Intel deserves some anti-trust investigation over these compiler tricks and having their developers questioned why they had to look at intel trademark rather than cpu capabilities.

      I think your concerns about icc compiled benchmarks are quite valid. The issue here is that icc is most likely the first compiler to support the latest ISA and optimizations, so people are tempted to use them to measure newly released cpus. That’s why I think anti-trust investigation is necessary to correct this situation.

      Given the current situation, I would lend much more importance to benchmarks such as mysql and linpack, both of which barcelona is doing pretty well I think.

      Also appreciate your patience with the detractors.

        • Anomymous Gerbil
        • 12 years ago

        Comedy gold 🙂

    • TravelMug
    • 12 years ago

    SVB’s struggling here is comedy gold :))

    /takes note to check back later

      • SVB
      • 12 years ago

      Quite frankly, based on your troll-like comment, I’m in a runaway win. If you can’t make a better argument or show it’s wrong, then poke fun at the author. It saves having to think.

        • TravelMug
        • 12 years ago

        The situation at the moment does not warrant any reply to any of your comments. Several people tried to explain to you why you’re wrong and also pointed out explicitly why you are a fanboy. No need to repeat those same arguments as you are seemingly unable to take any of them onboard.

        Don’t mind me by the way, by all means continue. It’s great entertainment.

          • SVB
          • 12 years ago

          You still don’t understand, do you?

          • Jigar
          • 12 years ago

          There is a difference … I don’t think he was acting like a fanboy. Someone getting confused and arguing because he thinks that’s the truth doesn’t make him a fanboy.

            • SVB
            • 12 years ago

            Thank you. Not a fanboy, just someone who believes that some benchmarks lie. And I’m not alone, by far. Anand apparently also believes it, which I think is why I can’t find SiSoft or SPEC benchmarks on his site. Many other sites have made similar comments, and AMD strongly believes it.
            But hey, if a site needs to run Intel C++, then get an Intel CPU. Using an AMD CPU would be foolish.
            But if the site uses Visual C++ and runs mainly MS products, there are more choices, assuming that the 4% loss of C++ code performance can be made up by a better deal on the AMD processor.

    • albundy
    • 12 years ago

    well, thanks for the review. My company is in need of a server upgrade and I was wondering who I should go with. Intel is just too hard to pass up performance wise and most importantly, cost wise.

      • derFunkenstein
      • 12 years ago

      You can’t possibly have a company…you’re like 14.

        • BoBzeBuilder
        • 12 years ago

        LMAO!!! Get a life kids!

    • StashTheVampede
    • 12 years ago

    A mixed bag of benches and workloads are shown here, but let’s look at what it really means for AMD:
    – Drop in replacement for Socket 1207 boards. OEMs can deliver solutions almost immediately with these chips.
    – A clock-for-clock performance increase with headroom to grow (vs. the current chip, which is hitting its limits).
    – A smaller gap against Intel’s chips at the same price.
    – Still low(er) power consumption compared to competition.

    These chips are obviously NOT the Core2 killers, but who *truly* believed they would be? I, for one, knew these chips would be faster than current Opty’s, but wouldn’t slaughter Xeons across the board.

    • johnny blaze
    • 12 years ago

    Hey, I dunno, but when I look at the picture in the Cinebench section, I see the xCPU number at 22472, but the highest you noted on your chart is 16546 for the Cinebench CPU score. Maybe I am reading it wrong or something; may I get some clarification?

    • UberGerbil
    • 12 years ago

    …………
    (I didn’t have this comment replying in the right place. So I moved it — you can read it in its entirety down below. I don’t remove my comments. Unlike some people).

      • gratuitous
      • 12 years ago
        • fishmahn
        • 12 years ago

        Awww, I chuckled at that, and you had to go delete it 🙁

        Mike.

    • leor
    • 12 years ago

    ima jump on that 2.5.

    AMD is back in the game, but they’re not winning any championships yet . . .

    • droopy1592
    • 12 years ago

    S939 for life! Until we get past Vista or some amazing gotta-have app comes out, I’m gonna 3800×2 it for life…

    AMD should have stuck with socket 939

    *ducks

    • snowdog
    • 12 years ago

    Amazing to me is that some people read these benchmarks and see it as a tie. I went through and highlighted the percent wins in the clock-for-clock comparison: 2350 vs. 5335 at 2GHz. I left out the cache memory access because it is not a benchmark that actually tests any application performance. I also left out Mandelbrot, which does seem very Intel-optimized.

    Here are the percent wins in benchmarks, clock for clock:

      • Mr Bill
      • 12 years ago

      Try clock for clock and energy per mflop.

      • shank15217
      • 12 years ago

      People are reading Tech Report benches and benches from other sites and discussing them here. It certainly isn’t as cut and dried as your color-coded % list.

      • maxxcool
      • 12 years ago

      A very disappointing release

      • SVB
      • 12 years ago

      Your numbers looked flakey in some cases, so I reran them on a spreadsheet on my Intel PC:

      Benchmark        Opt 2350   Xeon 5335   Better   AMD/Intel %
      Sandra Cache        11433        5179      H        120.8
      CPU Access             91          95      L          4.2
      SPECjbb             88949       87099      H          2.1
      VRAD                  121         107      L        -13.1
      CineBench (8)       12623       14129      H        -10.7
      CineBench (1)        1934        2310      H        -16.3
      POV 1024x768           77          80      H         -3.8
      Myri (1)              768         704      L         -9.1
      Myri (8)              372         379      L          1.8
      Stars (1)            0.46        0.65      H        -29.2
      Stars (8)            2.34        2.70      H        -13.3
      Folding              1099        1063      H          3.4
      Panorama            23.05       20.41      L        -12.9
      Pic                  8.09       10.11      H        -20.0
      WME                   543         510      L         -6.5
      SiSoft Int         228112      437006      H        -47.8
      SiSoft Flt         296230      335126      H        -11.6
      Power Idle            151         212      L         28.8
      Power POV             268         273      L          1.8

      So while my arithmetic differs from yours in a few places, the overall trend is clear.
      Now these results are really curious. If I study the SiSoft integer results, I would have to believe that the Xeon 5335 is nearly twice as fast as the Opteron 2350. Huh? If I look at the SiSoft floating-point results, I would have to believe that the Xeon was 11% better than the Opteron. Huh? The Opteron is known to have a better-designed floating-point unit.
      So I started to look into the benchmarks themselves. How was the SiSoft benchmark prepared? The answer, from xbitlabs:

      “The obtained results are very curious. While Pentium 4 processor with EM64T support benefits from the shift to 64-bit mode in almost every benchmark, the competitor from AMD is very often getting fewer points in 64-bit OS compared with its results in 32-bit operating system. However, I would like to assure you right away that this result is not indicating any overall performance drop of Athlon 64 based systems in 64-bit work mode. The problem actually lies with the SiSoft Sandra 2005 SP1 benchmark, which is better optimized for Intel EM64T architecture. Our analysis shows that SiSoftware uses an Intel compiler to form the execution code for its benchmarks. Moreover, 32-bit and 64-bit benchmark versions use different algorithms based on unequal instruction sets. Therefore, you shouldn’t base your verdict on the results of only this particular benchmark.” -xbitlabs.com

      And there’s the answer! For those of us who have been in this field for many years and have long memories, we remember when Intel bought the DEC Fortran compiler and changed the options. Rather than optimize based on the SSE, SSE2, SSE3, … flags, the revised Intel compiler checks for the string “GenuineIntel” and if it doesn’t find it, it shuts off the optimizations in the executable code. “AuthenticAMD” won’t do.
      There is an interesting comment by Dirk Meyer, that bad boy who ran away from Intel to join AMD, over on the Inquirer: “Dirk started to explain the difference, the guilty party supposedly being Intel’s compiler, quoted as being ‘nothing more than a benchmark compiler.’ When SPECint_2006 was run with the Intel vs. PGI compiler, the Opteron 2350 has a 5% performance deficit, but when SPECint_base2006 was run with both systems running the publicly available gcc compiler, AMD has a performance lead of 9%.”

        • snowdog
        • 12 years ago

        My calculations are less misleading. When Intel doubles AMD performance, you record -50% (SiSoft integer); when AMD doubles Intel, you record +100% (Sandra Cache). In fact the magnitude is the same in either case. Don’t you think that is just a tad misleading?

        To not tilt the results I used the same consistent formula for all: (highest - lowest)/lowest, giving a consistent percent increase from the slower to the faster, so as not to mislead by recording a percent increase on AMD and a percent decrease on Intel, which misrepresents the magnitude of change to the casual observer.

        By your calculation: AMD 50, Intel 100 is -50%; AMD 100, Intel 50 is +100%. By my calculation: AMD 50, Intel 100 is +100% (for Intel); AMD 100, Intel 50 is +100% (for AMD). I really think my method, which you called flakey, is more honest and less misleading.

        As far as your speculation about tilted Intel benchmarks, you will note that I excluded the SiSoft numbers. Along with the cache numbers, they don’t even reflect CPU/memory involvement and are just a hardware curiosity. I bent over backwards to be fair; you cried foul about a benchmark I didn’t include and derided my calculations while including more misleading ones.
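The two normalization formulas being argued over here can be compared with a short sketch (the scores are made up, and higher is assumed to be better):

```python
# Side-by-side comparison of the two percentage methods in this thread.

def symmetric_pct(a: float, b: float) -> float:
    """snowdog's method: (highest - lowest) / lowest, as a percent."""
    hi, lo = max(a, b), min(a, b)
    return 100.0 * (hi - lo) / lo

def amd_basis_pct(amd: float, intel: float) -> float:
    """SVB's method: Intel relative to an AMD basis, as a percent."""
    return 100.0 * (intel / amd - 1.0)

# AMD 50, Intel 100: Intel is twice as fast.
print(symmetric_pct(50, 100))   # 100.0 (win for Intel)
print(amd_basis_pct(50, 100))   # 100.0

# AMD 100, Intel 50: AMD is twice as fast.
print(symmetric_pct(100, 50))   # 100.0 (win for AMD)
print(amd_basis_pct(100, 50))   # -50.0
```

The sketch shows the asymmetry under dispute: a 2x gap reads as +100% in one direction but -50% in the other on an AMD basis, while the symmetric formula reports +100% either way.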

          • SVB
          • 12 years ago

          I used similar formulas, but I used AMD as the basis. That is, Intel/AMD - 1 when higher was better and 1 - Intel/AMD when lower was better. It really doesn’t make any difference which formulas are used; the results give the same trend.
          The problem is that the benchmarks are more skewed than the relative performance of the processors. I had to ask what would cause a consistent bias of roughly 20%. The answer is very simple: the lack of SSE optimizations. That’s when all of the pieces fell into place. I suggest you Google “Van Smith” and “BAPCO” to get a clearer picture.
          The Microsoft/DEC/Compaq/HP/Intel compiler has a long history. Before it was purchased by Intel from HP, AMD consistently got better benchmarks when programs were compiled using that compiler. After the Intel purchase, the AMD scores went south. About 3 years ago there was an investigation and it was discovered that all optimizations were turned off when the string “GenuineIntel” was missing from the CPU, no matter what settings the user requested.
          The SiSoft Sandra scores were the most extreme examples, so I just made the case on them. The problem is that Cinebench and POV and possibly others use the same compiler. Don’t know, didn’t check beyond those 3.
          The real solution is to recompile the benchmarks using gcc, but that’s not the goal of benchmarketing. I suspect that Anand, who has been around almost as long as I have, remembers the benchmark wars from 2002 and 2003 and avoids tainted benchmarks. That’s probably why his results favor AMD more. But I really don’t see much difference. A 5 to 10% difference is smaller than my head can measure, so WTH.

            • rxc6
            • 12 years ago


            • SVB
            • 12 years ago

            One method of trolling is character assassination, implying that the other party doesn’t know what they are doing or possibly implying that they are a marketing troll.
            Another trolling technique would be to make an issue over a small point, which might be insignificant, to imply that the whole method was invalid. Does the method of comparing benchmarks mean anything when the benchmarks are invalid? Aren’t you deflecting attention away from the real issue?
            The problem of making any comparison is to make the comparison on the same basis. Turning SSE optimizations off in brand A and on in brand B (I) clearly makes the comparisons invalid. It’s like drag racing and requiring the A team to pull a 10,000-pound trailer and not requiring the same of the I team.
            It really doesn’t make any difference what the benchmarks are, as long as they represent a meaningful workload for your environment and are identical, the same code. If one was optimized and the other wasn’t, doesn’t that invalidate the benchmarking? (Do you really have a problem with that statement?) Would it make any less sense to compare Barcelona results on Cinebench to Xeon results on POV?

            • snowdog
            • 12 years ago

            Fanboy rantings are not facts. First you claimed that the benchmarks you saw were basically a tie. That prompted me to check the clock-for-clock benches and report them fairly.

            Then you responded by suggesting my numbers were flakey and posted your tilted view of them, reporting AMD as percent improvements and Intel as percent losses, thus altering the scale of the percentage magnitudes in AMD’s favor.

            Next you paint all the benchmarks as Intel-biased. I excluded the one that looked highly Intel-optimized in my results.

            Sit down and try to look at this objectively. Your actions make you appear a rationalizing fanboy. Report the numbers in an even manner. Take a realistic look at your benchmark conspiracy theories.

            If you have a concern about a specific benchmark, make a solid case for its exclusion. Note that SiSoft was excluded by me from the beginning.

            Also realize that not every case of optimization is a nefarious plot. Software coded by individuals may be optimized for what happens to be the dominant processor. This is common in all software. Here is what SiSoft says:
            Q: Are the tests in the CPU Multi-Media Benchmark optimised for a specific CPU?
            A: Yes, the tests are optimised as far as possible but without introducing instructions that would generate large penalties on other processors.

            * ALU (Integer) Test – Optimised for Intel Pentium core.
            * FPU (Floating Point) Test – Optimised for Intel Pentium core.
            * MMX (Integer) Test – Optimised for Intel Pentium MMX core.
            * Enhanced MMX (Integer) Test – Optimised for AMD Athlon.
            * SSE (Integer & Floating Point) Test – Optimised for Intel Pentium III.
            * SSE2 (Integer & Floating Point) Test – Optimised for Intel Pentium 4.
            * SSSE3 (Integer) Test – Optimised for Intel Core 2.

            • SVB
            • 12 years ago

            Your attempt to assassinate my character does not change the facts. I have very carefully documented every assertion that I made with at least one reference. But I am going to list them to see which ones you disagree with:

            1. The Intel Fortran and C++ compilers add code to the execution path when the CPU identifier is anything other than “GenuineIntel”. https://techreport.com/discussions.x/8547

            2. Unoptimized code runs slower than optimized code.

            3. Three of the benchmarks (Cinebench, POV and SiSoft) were compiled with Intel compilers. (Reference the xbitlabs quote; also go to the sites for these benchmarks to see how they were prepared, as I did.) For a very interesting comment, read http://www.swallowtail.org/naughty-intel.html. The author, who very carefully documented the compiler problem, goes so far as to state his opinion that Intel attempted to cover up the problem. I quote: “It is a shame that the Intel compiler, which used to be almost the no-brainer choice if your primary concern was fast code, is now being coerced into being a marketing tool. Crippling the output for non-Intel chips may mean that some published benchmarks may end up bogusly favouring Intel over AMD, but the cost is that if you want to release fast production code I can’t recommend the (unpatched) compiler. There are an awful lot of AMD machines out there!”

            4. Intel has been caught in the past rigging benchmarks. (Google “Van Smith” and BAPCO.)

            5. New assertion: these benchmarks, http://www.anandtech.com/IT/showdoc.aspx?i=3091 and http://www.techwarelabs.com/reviews/processors/barcelona/, show Barcelona at least even to substantially ahead (20%), and they were based on Linux systems with open, known code. One of the reviewers, Johan De Gelas, is very well respected.

            6. Your assertion about percent gains and losses is patently false. My equation was (AMD - Intel)/AMD so that everything would be on the same basis. That is standard scientific method. Your method of (high - low)/low does not give consistent comparable results from test to test.

            7. I never said that anyone here was biased.

            8. My mistake was quoting Scott (the author) that the benchmarks were a tie without critically reviewing the benchmarks. I’m going to quote a message I received today:

            • Damage
            • 12 years ago

            SVB, if anyone’s character is being assassinated here, it’s mine. I would like to address some of your assertions briefly.

            First, you point to my article about Intel’s compiler shenanigans as proof that any software developed with the Intel compiler should be immediately off-limits for testing. That raises several red flags.

            You mention SiSoft Sandra explicitly in this context. Yet the issue my news item highlighted was about SSE/SSE2 code paths being deactivated, and that’s clearly not the case in Sandra. How else do you explain the Barcelona’s huge clock-for-clock gains in this SSE-intensive test, just as expected due to architectural enhancements for SSE? (Not that I am heavily invested in this Sandra Mandelbrot test; it is a synthetic test and an academic curiosity.)

            Also, we’ve seen multiple cases where Intel’s compiler produces performance gains on AMD processors. See the LAME MT VBR results here:

            https://techreport.com/articles.x/12091/8

            Or the Sphinx results here:

            https://techreport.com/articles.x/10508/11

            Even some of AMD’s own published benchmark results use software touched by the Intel compiler. I’m not sure what compiler Maxon uses for Cinebench, it may be Intel’s as you assert, but that benchmark has been a favorite of AMD’s for public demos of its multicore products. AMD has also published SPECjbb2005 results for its own processors using the Intel-optimized JRockit JRE, simply because it’s fastest.

            And that leads us to the crux of the problem. We’ve sourced the vast majority of our benchmarks (including Cinebench, in that it’s based on Cinema 4D) from real-world applications created by ISVs, academics, and open-source development efforts. Some of these folks may use one of Intel’s compilers, simply because these tools are somewhat popular. This is a dynamic you can’t avoid if you live in the x86 world as AMD does.

            I am not saying, incidentally, that AMD doesn’t need to redouble its own compiler and ISV assistance efforts as well. I wish they would. But to pretend one can or should avoid any apps touched by Intel’s compilers or ISV assistance efforts verges on tinfoil hattery. (Indeed, the great success of the K7 and K8 architectures was due in no small part to their ability to run practically any existing code well. Correspondingly, Intel’s Pentium 4-gen CPUs struggled comparatively at times because they were more in need of optimization help and didn’t fare well with legacy code.)

            Benchmarking is not easy to do well, and it’s vastly easier for you to sit and take potshots than it is for us to defend our methods. That is the nature of the beast, and we recognize that. But your assertions that the only fair tests must involve open-source code and/or gcc are woefully unrealistic and fraught with wishful thinking. We prefer to tether ourselves to actual applications in the majority of our tests in order to avoid this pitfall.

            AMD will have to fare well in this same arena in order to succeed.

            • SVB
            • 12 years ago

            I did not mean to or imply any character fault. I am just questioning the results of any benchmark which uses Intel compilers (both Fortran and C++). Your article from techreport which I quoted does only talk about SSEx instructions. But I have read many articles on the subject and it would seem that the problems go beyond SSEx instructions.

            Your comments about SiSoft Sandra are interesting and seem to contradict snowdog’s post #141. But you bring up a very interesting issue: to what extent do these benchmarks use SSEx instructions. I would assume that many of the graphical programs such as Cinebench, POV would use SSEx instructions. In looking at the benchmark sites, it was stated that both used Intel compilers.

            I don’t know if you consider it within the purview of Techreport but the issues of benchmarking and relevance to the general computer user are the basic issue here.

            The purpose of benchmarking is to investigate how the user’s typical load runs on the proposed processor. Unfortunately, benchmarketing is all too frequently substituted for benchmarking, at the behest of Intel AND AMD. It is just because of their smaller size that we see less of it from AMD.

            I almost never use applications which use either Intel Fortran or Intel C++ and I work in R&D at a large development center. Fortran has a very limited range of application to the general user and I’m not sure that benchmarks using Fortran are relevant even to me. I have never used the Intel C++ compiler, having always opted for the Microsoft Development System and Visual C++. Because of the pervasiveness of Microsoft products and the wide use of Visual C++, I would think that compilation with Visual C++ would be much preferred to Intel C++.

            Beyond the question of benchmark preparation, there is an issue of selection. Barcelona is intended to be a server chip. I realize that most of the visitors to your site are probably gamers, but I suggest that MySQL would have been a better choice. And games and ripping would be a good choice for the Phenom/Penryn comparison.

            AMD was very clever in their benchmarks to compare Barcelona only to the K8. I’m sure one reason was to avoid any comparison with C2D, but another was surely to avoid any problems that compiler “optimizations” would have on the published benchmarks. I think that Dirk Meyer’s comments in the Inquirer indicate that AMD is very much aware of the problem. The numbers that were quoted, 10 to 15%, for using Intel compilers are a tad above what I have seen elsewhere (10%).

            Your assertion that the Intel compiler produces better results for AMD processors also doesn’t seem to be borne out by your own data:

            LAME MT MP3 encoding

            • Damage
            • 12 years ago

            You’re abruptly changing the topic now to the test selections/usage models. I figured it was an obvious place to go when you’d largely lost the compiler issue, but it is amusing to see. Please go troll somewhere else, thanks.

            • snowdog
            • 12 years ago

            Intel is the fastest compiler even for AMD. Get with the times. Your complaints are based on 2005 information; compilers are updated constantly. What was pertinent then is not necessarily pertinent today. The version of SiSoft you rant about was the version where 64-bit extension code was in beta; I would bet it is much cleaner today. Intel can’t be expected to spend as much time getting competitor code optimized. Where is the AMD compiler?

            The truth is Intel has the best optimizing compiler even for AMD code. Even AMD uses the Intel compiler for its benchmark entries. Check who submitted this and what compiler they use.

            http://www.spec.org/cpu2000/results/res2003q2/cpu2000-20030505-02154.asc

            Your claims are nothing but conspiracy theories. Intel compilers tend to be used because they are the best. EVEN AMD uses Intel compilers. Though it is funny to see that at least you are a consistent fanboy, ranting against the evil Intel back in 2005 much like you do today.

            Say we get a benchmark that delivers iterations per second:
            GCC: AMD 4000, Intel 3800
            ICC: AMD 5000, Intel 6000

            You would insist we use an inferior compiler because it makes your favorite architecture look better. That is known as cutting off your nose to spite your face. Chances are most performance software uses ICC to squeeze out better performance, so you better get used to it.

            • SVB
            • 12 years ago

            “Intel is the fastest compiler even for AMD. Get with the times. Your complaints are based on 2005 information, compilers are updated constantly.”

            Interesting that you talk about getting with the times, but your example is from 2002, which is somewhat older than the 2005 benchmark.

            • totoro
            • 12 years ago

            ICC means Intel C++ Compiler.
            Perhaps you should not argue against things with which you are unfamiliar.

            • SVB
            • 12 years ago

            I know what icc is; I also know what gcc is. I was asking him where he got the numbers he was quoting. Please stick to useful comments and avoid insults.

            • totoro
            • 12 years ago

            You replied to:

            • SVB
            • 12 years ago

            Actually, it would depend on the letters and/or context, wouldn’t it? ICC: Intel Channel Conference. icc: Intel C++ compiler. gcc: GNU C++ compiler. In the context I assumed icc was the Intel C++ compiler, which is not that widely used. Almost all Linux users use the Linux version of gcc, although there is an Intel version. Just look at Linux kernel builds. The Intel Linux C++ compiler has the same AMD/Intel optimization problems as the Windows version. You should read this blog, http://www.swallowtail.org/naughty-intel.html, as to how it was discovered, how small a fix it took to correct, Intel’s reaction to the problem, and how there was a 10% improvement in AMD performance. I would like to hear your comments.

            Microsoft has built a Visual Development system that includes C++, Java, Basic, and maybe a few other things. I have a personal copy and have used it on several occasions. I also have a personal copy of Intel Fortran, V9 I believe. Both have a very good developer interface and I cannot prefer one to the other. gcc is an SOB to use, IMHO. At work, Visual C++ is used exclusively. Mostly, I believe, because of all the goodies that come with it.

            As far as performance s/w goes, I don’t think it is a real concern in the real world. Mostly I manage the fabrication of “remote sensing” systems that are put in places it is nearly impossible to get to. AMD has not, and probably never will, build a mil-spec CPU. Intel did build a few, which I used, but they lost interest. The only hardened CPUs available today are PowerPC, 1750A and ColdFire.

            The reason I don’t think there is much interest in optimizing compilers today is that it is much cheaper to get a bigger engine to power systems. Based on my experience, a custom-designed computer will cost about 10-15 megabucks. Twice the CPU power still costs 10-15 megabucks. The software will cost 30-50 megabucks (about $1000 for each fully debugged and documented line of code; it seems a bit much, but factoring in vacations, coffee breaks and overhead, that’s it). Highly optimized code can increase the cost by 50%. Using an optimizing compiler will gain typically 5-20% on most applications (not a whole lot of vector apps out there). Using assembler language will gain up to 50%. So to keep s/w costs down, a non-optimizing compiler is used. Visual C++ is the easiest to use.

            • totoro
            • 12 years ago

            wrong again.

            • SVB
            • 12 years ago

            wrong again.
            About what? Please enlighten me.

            • snowdog
            • 12 years ago

            AMD Late 2006 Spec entry, still using Intel compilers.

            http://www.spec.org/cpu2000/results/res2006q3/cpu2000-20060721-06608.asc

            In fact, search through a bunch of them. If they did the test on Windows, they used Intel. Note they use VC++ and replace the compiler with Intel’s.

            Now please explain why AMD would submit all its SPEC numbers with Intel compilers if they didn’t produce the best results with AMD chips.

            Look at this PDF: http://www.principledtechnologies.com/clients/reports/Intel/CompComp.pdf

            Significant performance gains for the Intel compiler with both the Xeon and the Opteron.

            • SVB
            • 12 years ago

            Sorry to have taken so long, but I needed to research a few things because you posed some interesting questions. I looked at the SPEC.org site, but so far I cannot tell just what the executables in SPEC CINT2000 do. But here are a couple of observations.

            1. SSE was introduced with the PIII in February of 1999 and it took considerable time to be implemented. A number of the applications listed (gzip, gcc, bzip2) are 386 code and would not have SSE code. So while there was time for SSE to be introduced into some of the apps, was it done? If it wasn’t, then the compiler would make no difference. That would also explain why AMD was using SPEC CINT2000 in 2006 and not SPEC CINT2006. As long as they use SPEC CINT2000, they have no problems with biased compilers.
            2. I believe this is confirmed by your second example. The date of the report was June 2006, but SPEC CPU2000 was used (SPEC CINT and SPEC CFP are its two components). It was very interesting that everything in the Principled Technologies report scaled so nicely (see figure 3). The ratio of gcc to icc was 1.22 for both the Opteron and the Xeon, and in both cases the Xeon was equivalent to 1.13 to 1.14 Opterons. Everything is linear and consistent. Now here is the really interesting part. The Opteron clock was 2.66GHz and the Xeon clock was 3.0GHz. That ratio of frequencies is 1.1278, which is virtually identical to the 1.13 to 1.14. So with scaling, the Opteron and the Xeon performed identically. That’s a very nice and interesting result. It tells me that for a given clock speed, AMD and Intel processors execute base x86 and MMX code at exactly the same rate! That’s astounding (at least to me). So the implication is that the only difference in performance IS the SSE instructions and their execution rate. (Scott, please weigh in if you have an opinion.) This gets us back to the SSE instructions, the differences in compilers, and why AMD always uses SPEC 2000 for benchmarks. A very nice proof of my issue with the compilers and benchmarks.
            3. SPECjbb2005 probably uses SSE, which is why AMD won’t use it.

            Here’s some research for you: does AMD ever use any SPEC suite other than CPU2000 in their benchmarks? I would appreciate comments on the swallowtail.org blog. Also, I was wondering if Intel fixed the problem after AMD filed its antitrust suit in May of 2005, naming the alterations to the compiler as one reason they couldn’t make more money. Anyone know?
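The scaling arithmetic in point 2 above can be checked directly. The figures are the ones quoted in the comment (from the Principled Technologies report); the "midpoint" value is an assumption for the comparison:

```python
# Verifying the clock-ratio vs. performance-ratio argument from point 2.

xeon_clock = 3.00      # GHz, as quoted
opteron_clock = 2.66   # GHz, as quoted

clock_ratio = xeon_clock / opteron_clock
print(f"clock ratio: {clock_ratio:.4f}")   # 1.1278

# The report's performance ratio (Xeon equivalent to 1.13-1.14 Opterons);
# take the midpoint of the quoted range for comparison.
perf_ratio = 1.135
print(f"difference: {abs(perf_ratio - clock_ratio) / clock_ratio:.1%}")
```

The performance ratio lands within about 1% of the clock ratio, which is the basis for the claim that the two chips executed this (non-SSE) code at the same per-clock rate.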

            • redpriest
            • 12 years ago

            No, they haven’t fixed the compiler. It still does the CPUID check and defaults non-Intel CPUs to the baseline path. Now, whether or not this makes a big difference in some apps is up for interpretation.

            However, it is their prerogative to do so. You can complain that it biases software, but that’s a competitive advantage of theirs.
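            For readers who haven't seen the mechanism being described: the dispatcher the compiler bakes into the binary checks the CPU vendor string at runtime and picks a code path. A toy Python sketch of the idea (the function name and path labels are made up for illustration; the real dispatch is native code inside the compiled program):

```python
def pick_code_path(vendor_string: str, supports_sse2: bool) -> str:
    """Toy model of vendor-gated dispatch: a non-Intel CPU gets the
    generic path even when it supports the faster instructions."""
    if vendor_string == "GenuineIntel" and supports_sse2:
        return "sse2-optimized"
    return "generic-x86"

# An AMD chip that supports SSE2 still lands on the slow path here.
print(pick_code_path("GenuineIntel", True))   # sse2-optimized
print(pick_code_path("AuthenticAMD", True))   # generic-x86
```

            The vendor strings ("GenuineIntel", "AuthenticAMD") are the real CPUID values; everything else is illustrative.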

        • flip-mode
        • 12 years ago

        That post could do with a little more formatting. Maybe some green/red text for the percentages. Also, snowdog has a point about your percentages giving a skewed impression. Your math is right, but it backs up the saying that there are lies, damn lies, and statistics.

          • SVB
          • 12 years ago

          Actually not my math but Excel’s, which is why the formatting is so bad. I copied and pasted from the Excel spreadsheet to the post and it all went to …. So then I straightened it up, and what you see is how it reformatted it again. Just love M$. I did not call snowdog’s calculations flakey; I said they looked flakey. What happened was I went to look at the Cinebench numbers and my mental calculations were somewhat different from snowdog’s. I got to wondering what he did, so I did my own on the Excel spreadsheet so there would be no doubt as to my algorithm. As for my methodology, I will point out that it is exactly the same as Anand’s: the AMD 2350 is set to 1 and the Intel 5335 is then done as a ratio of the AMD 2350. Blame me, blame Anand. But none of this changes the fact that the numbers are screwy. AMD uses a slightly bigger and better SSE engine, so it should run faster, not slower. The ratio of processing power should be 3 to 2. On the integer side, AMD was somewhat smaller and the expected ratio should have been 3 to 3.5. (I believe it’s 3 double-issue vs. 2 double-issue and 3 single-issue; could someone who remembers give me the correct ratios if I’m off?) I would have expected AMD to be 50% faster on array-type floating-point calculations and 17% slower on integer calculations. The result was not what was expected, and that presents a problem to be solved.
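          The normalization being described (set one chip to 1.0, express the other as a ratio of it) is easy to sketch. The scores below are made-up placeholders, not the actual Cinebench numbers:

```python
def normalize(scores: dict, baseline: str) -> dict:
    """Express every chip's score as a ratio of the chosen baseline chip."""
    base = scores[baseline]
    return {chip: score / base for chip, score in scores.items()}

# Placeholder scores purely for illustration of the method.
raw = {"Opteron 2350": 100.0, "Xeon 5335": 113.0}
print(normalize(raw, "Opteron 2350"))  # {'Opteron 2350': 1.0, 'Xeon 5335': 1.13}
```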
          And as for my “speculation” about benchmarks, I gave 3 references. I could have given dozens more.
          And as to the chicanery with the Intel Fortran compiler, I refer you to: https://techreport.com/discussions.x/8547. It’s a known fact that companies manipulate benchmarks. I’m just surprised that after the BAPCo fiasco and the known biasing of the Intel compilers anyone would build a benchmark using Intel compilers.

      • snowdog
      • 12 years ago

      l[

        • SVB
        • 12 years ago

        Easy: go to the benchmark results and read it off. For example, among the AMD submissions that you referenced, there were a total of 6: 3 for Windows and 3 for Linux. The Windows submissions clearly state that they were prepared with Intel C++ v9.

          • snowdog
          • 12 years ago

          I am talking about the benchmarks at this site.

          This list:
          SpecJBB: 2%, Valve: 13%, CineBench: 12%, PovRay: 4%, Myri: tie, Stars: 15%, Folding: 3%, PanFac: 13%, PicColor: 25%, WME: 6%.

          Which of these? Can we assume you don’t think WME was tainted by an Intel compiler?

            • SVB
            • 12 years ago

            That’s very difficult. I only looked up 3 before I got tired. What I did was to Google the benchmark and then read down through the items to find little bits and pieces of information, trying to get a picture of what was in the benchmark and who made it. Generally, after about 50 dead ends, I was able to find someone who told me about the benchmark. But better still, why don’t you email Damage? He used them, so he should have a copy and know where they came from.

            I’m glad to see you taking a more positive attitude toward this. I’ve got to go. But I could lend a hand. After all, it was one of your references that gave me the final clue.

    • snowdog
    • 12 years ago

    That was a launch? Did others who surf many tech sites notice a distinct lack of a splash or any excitement?

    Did AMD do something to sour relations with hardware sites? It barely warranted a mention on most of them.

    I would think that with months of secrecy, not letting anyone near a Barcelona, they would have had an NDA expire yesterday, releasing a horde of reviews across all the major hardware sites.

      • stdPikachu
      • 12 years ago

      I think review kits only went out on Friday, and only Anandtech and TR got benches out in time. Other sites have mentioned they’ve received kit, but I’ve not seen any more benches yet.

      • green
      • 12 years ago

      i don’t think that’s it
      you generally don’t see many sites doing server chip reviews
      there should be a flood of reviews when phenom comes out though

        • snowdog
        • 12 years ago

        Fair point. I don’t really care for server reviews, but it was the first look at Barcelona.

    • Conquerist
    • 12 years ago

    Good review.
    Regarding WME not being able to create more than 4 threads:
    Why don’t you use x264 to test video encoding performance? I see so many hardware review sites using QuickTime, Xvid, or Windows Media to test performance, but x264 scales the best with cores and is one of the most advanced and efficient video codecs out there (if not the most). And it’s FOSS. You can get the newest build at x264.nl, and if you like, there’s a good frontend (GUI) called MeGUI on the Doom9 forums. x264 should scale to more than 4 cores, but if it doesn’t, you can always leave a comment for the developers.

      • Prospero424
      • 12 years ago

      I’ll second this. AutoMKV is also a nice, user-friendly, free transcoding program that showcases x264.

      X264 is really all I use anymore.

      It’s not exactly a gaping hole in the review methodology or anything, but I think it would be nice.

      • stdPikachu
      • 12 years ago

      Thirded. x264 performance is now my primary consideration when evaluating a new CPU – since it never seems to fail to (almost) max out two or four cores, I think it’d be a nice core-scaling benchmark as well.

    • Smurfer2
    • 12 years ago

    Come spring will I be buying a Penryn or a Phenom? I dunno…

    • wof
    • 12 years ago

    Now if they just make a S939 version ….

    *hides*

      • flip-mode
      • 12 years ago

      /me slaps wof 939 times

    • UberGerbil
    • 12 years ago

    There’s a discussion of TR’s numbers on the SPECjbb test by some of the experts over at RWT:
    http://realworldtech.com/forums/index.cfm?action=detail&id=82686&threadid=82680&roomid=2

    Interestingly, they note that the Sun JVM has a 10-20% performance disadvantage relative to JRockit on 64-bit; many of the most recent top scores posted at spec.org use JRockit (which is generally faster all around), so keep that in mind when comparing scores. When it comes to comparing Xeon to Opteron, it appears disabling hardware prefetch on the Xeon gains 25% on this benchmark:
    http://www.spec.org/osg/jbb2005/results/res2007q2/jbb2005-20070326-00276.html
    vs.
    http://www.spec.org/osg/jbb2005/results/res2007q2/jbb2005-20070326-00275.html
    (Note those tests were done by AMD.)

      • redpriest
      • 12 years ago

      That’d be great except… hardware prefetching is already disabled for that review. It’s right there in the Xeon test bed notes.

        • UberGerbil
        • 12 years ago

        Ah, right you are. Missed that. (I have to say these days I skip over a lot of the test system details, but I should’ve gone back to look for that).

    • SVB
    • 12 years ago

    So much spin and so little information. I found the review at Anand’s very interesting. In fritzchess Intel was 10% better per clock. In the Intel *[

      • flip-mode
      • 12 years ago

      g[

        • SVB
        • 12 years ago

        That’s almost always true, but there are a number of interesting points with a great deal of historical context:
        1. It appears from Anand’s numbers that Barcelona is somewhat ahead on IPC but suffers badly (currently) on clock speed. For the last 15 or so years, Intel has clocked higher and AMD has had the higher IPC.
        2. AMD has almost always had a clocking problem with first spins of any silicon. Second spins are usually greatly improved.
        3. There has always been a perception that AMD is later than they really were. I had a long-running discussion with one of the now-banned Intel trolls about the release date of Barcelona. (The troll won!) Based on the June 2006 analyst review, it seemed that AMD would start production of Barcelona at the end of June ’07 and release at the end of August. That makes AMD about 3 weeks late, not 6 to 9 months!
        4. Shrinks are not getting the big gains of the past. Penryn will do well to get a 15% clock boost, just as the Barcelona shrink due in 1H08 will only net 15% or less.
        5. So adding up the shrinks, the respin of Barcelona, and CSI for Penryn, I expect a dead heat on July 1, ’08.

          • snowdog
          • 12 years ago

          So you are ready to buy AMD stock? 😉 2008 is Nehalem intro time. Judging by how far back Intel has been showing/benching Penryns, I assume they will have quite a stockpile whenever they decide to launch, and all indications are that they have had no clocking issues.

            • fishmahn
            • 12 years ago

            Considering that they’re up 0.44 to Intel’s 0.06, someone is. 🙂

            Mike.

            • snowdog
            • 12 years ago

            I don’t see Barcelona changing the stock’s downward trend.

            • Anomymous Gerbil
            • 12 years ago

            Oh dear. A single day’s stock price movements are almost entirely meaningless.

            • green
            • 12 years ago

            you don’t see much market change on release unless all the market analysts get it absolutely wrong
            given such little movement and amd’s general long drawn out downward trend i’d say some people figured it out a while back
            plus given that barcelona isn’t terrible and isn’t being left completely in the dust, there isn’t too much reason to go on a giant dumping spree

            [Edit] i should point out that we’ll likely see another two quarters of losses from amd for this one

            • SVB
            • 12 years ago

            I only bought AMD stock once, in ’03, when it was selling for $3+ a share and the details about Opteron were clear enough to indicate that it would mop the floor with Northwood. Sold out around $30. Right now I don’t see enough difference to indicate a significant AMD advantage (or Intel advantage) over the next year.
            Intel projects Nehalem production to begin 2H08, so I would expect release very late in ’08. I cut off my prediction at July 1, so Nehalem doesn’t yet exist. This is a game that can be played forever into the future: just extend the end date 3 or 6 months to include the next big thing from your guy. Personally, I don’t like to go over a year, because that’s about the time from first tape-out to production. Until tape-out, a processor can be anything.

            • Anonymous Gerbll
            • 12 years ago

            I dislike people who brag like that. But anyway, how many shares did you sell? =P

            • SVB
            • 12 years ago

            Bought and sold 2000. Wish I had done more, and wish I had waited when it went to $40+. I didn’t mean to brag; it just seemed an appropriate response as to whether or not I was into AMD stock. I have dabbled in Intel, but only made a few dollars a share, just barely enough to cover commissions.
            This is a condition that occurs rarely with AMD and never with Intel. AMD is pretty much speculation, and on those rare occasions when AMD does something good, their stock will shoot up by a factor of 10. It’s happened 2 or 3 times. After the run-up, the stock slowly collapses. That’s why I sold: I knew the collapse was coming, just didn’t know when, and didn’t want to be sitting on my hands.
            This happens with many stocks, and the problem for many investors is not to talk themselves into thinking a new product is going to change the industry. That happens all too often. It’s okay to like a company and its products; just don’t fall in love.

            • snowdog
            • 12 years ago

            My point: I expect the stock to be lower in 6 months rather than higher. From the benches we have seen, there are improvements, but on this site Barcelona lost most of the benchmarks. It fared better at Anand’s, but that was also against older-tech Intel parts. Anand’s gaming benchmarks show improvements, but looking at them, it doesn’t even look like they catch Conroe clock for clock.

            Once Intel starts pouring out the high-clocked Penryns (it doesn’t look like Intel is having any hiccups getting Penryn ready), I think AMD is essentially in the same place they were for most of this year: solidly in second place and not able to command high ASPs for their processors.

            I wouldn’t touch this stock anytime soon. There may be buying opportunities in the future.

            • SVB
            • 12 years ago

            Well, I would disagree on the clock-for-clock basis, because Anand’s data shows better IPC and Damage’s conclusion doesn’t dispute it. But on the stock, I agree. AMD would have to do something special, and this isn’t it. This is really more of a keep-the-pace.
            The only item that does concern me is the antitrust suit. If I had to bet, I would bet on AMD winning, probably in the low 10 digits. But betting on the outcome of a lawsuit is a worse gamble than stocks. If AMD gets low enough, then I might. A thousand shares at 3 is not much of a loss, and it probably won’t go much below that even if AMD lost. The biggest loss would be commissions.
            A 2 or 3 gigabuck win for AMD would send the company and the stock to the moon. That’s a new fab, no debt, and Intel’s worst nightmare.
            If I read Intel’s position between the lines, it seems they are not going to contest that they might have been bad boys, but will argue that the financial damage was due to AMD’s mismanagement. And there was some of that. Wish I could get an advance copy of the judge’s decision.

            • snowdog
            • 12 years ago

            TR shows an IPC advantage for Intel, so at this moment it is a wash. All in all, it is likely the server space won’t move much, and it seems this is the best market for AMD.

            Desktop: going by Anand’s numbers, it looks like AMD is still behind on IPC for games, which tends to drive the desktop race to a fair extent.

            It will probably be near the end of the year before we get some clear Phenom/Penryn desktop numbers, but from what I have seen so far, I would bet on Penryn being ahead.

            • SVB
            • 12 years ago

            You need to be careful in reading the charts. The only valid IPC comparison without doing some calculations would be Opteron 2350 vs. Xeon 5335, since both run at 2.0GHz. I checked the first 5 or 6 results and they were very even. That said, Opteron is likely to gain more from the respin than Penryn will gain from 45nm.

            • snowdog
            • 12 years ago

            l[

            • Anomymous Gerbil
            • 12 years ago

            Investing in shares is not gambling.

            • NIKOLAS
            • 12 years ago

            g[

            • SVB
            • 12 years ago

            Commissions generally run 2-4% for market trades and $50 for post-market trades. For small trades, commissions can run up to 8% for market trades. (Rule #1: don’t deal in small lots.) Assuming 4% each way, an 8% gain is needed to break even. Many investment-grade stocks in old industries won’t do that (e.g. public utilities). That’s why I like to keep to trades in the 1000s. If the stock doesn’t move as much as I thought it would, I can back out and still have a tiny profit.
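            The “8% to break even” figure can be checked exactly. Under the stated assumption of a 4% commission each way, the true break-even is slightly above 8%, because the sell-side commission is charged on the (higher) sale price. A one-function sketch:

```python
def break_even_gain(commission_rate: float) -> float:
    """Price gain needed to break even when the same commission rate
    is paid on both the buy and the sell."""
    return (1 + commission_rate) / (1 - commission_rate) - 1

print(f"{break_even_gain(0.04):.2%}")  # 8.33%
```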

            • orthogonal
            • 12 years ago

            You need to get a cheap online broker with <$9.99 per trade. You’ll save a lot of money that way 🙂

          • flip-mode
          • 12 years ago

          We might be talking around each other. I see waiting another six months as a big problem when we’ve been waiting since June 2006 for AMD to bring an actual competitor to Intel’s top end parts.

          It’s also troubling that apparently in some instances the K10 shows negligible performance improvement over K8.

            • shank15217
            • 12 years ago

            So what you do is compile the information from TechReport and AnandTech and come to the logical conclusion: Barcelona is behind in overall performance, but the architecture scales better with clock speed. AMD has one issue handled; now it has to focus on the other.

            • SVB
            • 12 years ago

            Funny, I didn’t read that. What I read was: /[

    • liar
    • 12 years ago

    Review mentioned at Ars: http://arstechnica.com/news.ars/post/20070910-barcelonas-out-and-the-reviews-are-out.html

    "...There aren't many reviews out this morning, and the few that are up aren't worth looking at (see the next section for why this is the case). The only bright spot in this picture is Scott's review at Tech Report..."

    Also (perhaps) of note: "AMD sells 15,000 Barcelona chips to one user as it launches new Intel fight"
    http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9035161&source=rss_news10

      • UberGerbil
      • 12 years ago

      Yeah, blade systems and customers who need compute density (over raw performance) and interconnect throughput, especially on a budget, should see a lot to like in Barcelona. And the cluster in the linked article is all of those.

    • nexxcat
    • 12 years ago

    Just a quick question: any reason why Barcelona is on a nForce Pro 3600-based SuperMicro board and the traditional Opterons are on a ServerWorks BCM 5780-based Tyan board?

    Thanks.

      • Damage
      • 12 years ago

      The nForce board has support for split power planes, and the older Tyan board does not. As a result, the Opteron 2200s would perform about the same on the nForce board, but the Opteron 2300 Barcelona chips would be slower since they’d run their memory controller at a lower clock speed.

      I couldn’t test the Opteron 2200s on the newer motherboard since I didn’t have it ahead of the Barcelona box’s arrival on Friday.

        • nexxcat
        • 12 years ago

        Aha, thank you. So AMD sent you a Barcelona review system with an nForce Pro chipset for comparison, when they have ATI chipsets at their disposal? Interesting 🙂

          • Prototyped
          • 12 years ago

          Actually, for multiprocessors, the RD790 won’t be out until late this year or early next year. Their intention is to sell nForce Professional 3600/3050 systems until the RD790 chipset is out; see the slide at https://techreport.com/discussions.x/12943. AMD and ATI currently don't have modern multiprocessor chipsets (though I'm not sure what exactly that entails in a HyperTransport-based system). The last Opteron chipset either of them made was the AMD 8111/8131/8151 combination, which supports only 800MHz HyperTransport, no SATA, and no PCI Express. Understandably they'd want to use something more modern. :)

    • Jeffery
    • 12 years ago

    Nice article. It will be painful to see a more elegant system architecture get trounced on the desktop front in terms of raw performance, so here’s hoping the gap won’t be too wide. 🙂

      • tfp
      • 12 years ago

      Itanium didn’t get trounced in this article! 😉

        • derFunkenstein
        • 12 years ago

        Yeah, or IBM’s POWER5. Talk about elegant…Opterons are just more of the same x86-64 that we’ve had since 2003. 😆

      • derFunkenstein
      • 12 years ago

      Who cares if it’s “elegant”…performance is where it’s at.

        • FubbHead
        • 12 years ago

        I like elegant… But I also like viable and cheap. 🙂

    • flip-mode
    • 12 years ago

    Gotta say, the way Anand’s articles / comparisons are set up is very very very helpful.

      • Mr Bill
      • 12 years ago

      Agreed, it makes the server potential look much brighter.

        • flip-mode
        • 12 years ago

        I especially like the graphs that set one or another CPU as the baseline and show the other CPUs as a percentage of its performance. In one instance they set the 2.0GHz K10 as the baseline, and in another they set a 2.6GHz K8 as the baseline and showed the K10’s percentage difference. It’s honestly much more informative than tossing a bunch of CPUs into a graph, methinks. Both methods have their place, though.

          • shank15217
          • 12 years ago

          AnandTech’s benches were hardly server benchmarks; they fall in the general category of the TechReport benches: HPC or general computation. The only server benchmark was MySQL, and it wasn’t complete. I’m not sure why they published it.

    • eclypse3demons
    • 12 years ago

    Any competition is good, even if it’s not the end-all-be-all launch many were hoping for. I like both technologies; it gives us options. I say good job to Intel and AMD for giving us some good stuff.

    • Archer
    • 12 years ago

    Excellent work on the review, as usual.

    • melvz90
    • 12 years ago

    Quote: As i see it: AMD has lost again 😉

    As expected… just like the R600…

    • lex-ington
    • 12 years ago

    While AMD will be bashed by most for not beating Intel across the board, let’s look at this from a business perspective.

    If all I have to do as a business owner is drop in a new processor and not change ANYTHING (no reloading any Windows or server software or chipset software . . . nothing) and get more power for the same (or maybe less) power bill, I call that a success any time.

      • Pax-UX
      • 12 years ago

      That’s assuming they’re running software that won’t require a major investment in new licenses. The CPU could be cheap, but a new 2-CPU license for Oracle = $$$ … It depends on the nature of the application. Even Windows had the 2/4-proc issue, I think; can’t remember if MS sorted that out.

        • Usacomp2k3
        • 12 years ago

        Yeah, they did. It’s by the socket, so 2 single-core CPUs work just the same as 2 dual-core CPUs, just the same as 2 quad-cores.

      • Mr Bill
      • 12 years ago

      Good point; I have a friend who will probably do just that. AMD can count on selling into that market of already-installed systems while staying inside the same power envelope, a sweet bonus for loyal customers.

      • green
      • 12 years ago

      that depends on where you start from of course
      (eg. prescott to core2 would have been ultimate given how bad prescott was. but if you were back on s478 it would have meant jack all)

    • Krogoth
    • 12 years ago

    Thank god, Shintel, Porkster and Progesterone are gone.

    They would have a field day with flame fests.

    It seems hellokitty is MIA.

    • Prototyped
    • 12 years ago

    I’m concerned about the SPECjbb2005 results.

    See the submitted results for dual X5365 systems from Fujitsu-Siemens as well as Dell for their PowerEdge 1950 here —

    http://www.spec.org/jbb2005/results/res2007q3/

    All have dual processors (8 cores) and are scoring nearly 90,000 bops (e.g. http://www.spec.org/jbb2005/results/res2007q2/jbb2005-20070411-00285.html) vs. the 52,438 bops that your testing revealed. Might there be something wrong?

      • totoro
      • 12 years ago

      They probably used a different JVM.

        • UberGerbil
        • 12 years ago

        Yep. TR used the Sun JVM; the Fujitsu and AMD results were with JRockit. Scott notes as much wrt that test. See also BEA’s own writeup of the results:
        http://dev2dev.bea.com/blog/hstahl/archive/2007/09/specjbb2005_res_1.html

    • stmok
    • 12 years ago

    A side question:

    What the heck is “Rapid Virtualization Indexing,” exactly?

    (I know it’s supposed to accelerate something that plays an important role in virtualization performance, but I’m not sure of the details.)

    It’s touted in AMD’s presentation slides, but it would be nice if we could see this feature benchmarked or compared in some way. (It’s not mentioned in this article, for some reason.)

      • Gungir
      • 12 years ago

      If it’s mentioned in AMD’s presentation slides, odds are it’s not ready for the market yet. Not many people are itching for new virtualization software, to my knowledge, unlike new CPU architectures, so I doubt we’ll see any pre-release showing of RVI like we did with Barcelona.

    • tfp
    • 12 years ago

    Right now AMD needs to get the L3 clock speed up to the core clock speed to improve performance; that would pull up the memory controller as well. It would improve system memory latency as a whole, and I expect it would provide a good performance improvement too.

    I think, at least for now, that it was a mistake to link the L3 speed to the memory controller, but that is probably what they needed to do to keep the shared cache running “full out” while they down-clock other parts/cores of the chip. It does make sense that it was linked to something that will still be running “all out” as long as one core is up: the memory controller. I would expect that it is cheaper to do than to give the L3 cache its own clock, but it looks to have impacted performance.

    • gratuitous
    • 12 years ago
      • Lucky Jack Aubrey
      • 12 years ago

      /[

      • flip-mode
      • 12 years ago

      I predict you’ll be wrong. OEMs want a complete package. Besides, all the cards besides the HD2900 are selling in very high volumes to OEMs and should provide some good revenues.

      Furthermore, your post has little to do with the subject at hand.

        • willyolio
        • 12 years ago

        perhaps he should delete it then.

        =P

      • Bauxite
      • 12 years ago

      I predict you’re yet another -[

        • flip-mode
        • 12 years ago

        Well the guy can’t win if he’s first criticized for deleting his posts and then once he starts leaving them alone he’s labeled as a Troll.

          • UberGerbil
          • 12 years ago

          He can indeed win by both not trolling and not deleting.

      • snowdog
      • 12 years ago

      There is no going back. AMD bet the farm on the ATI acquisition. Nothing about the acquisition caused R600/Barcelona issues. These would have happened if they remained stand alone companies. It is just unfortunate that they coincide after the ATI purchase.

      I don’t pay attention to the server market, but I wouldn’t be buying AMD stock anytime soon. It is going to be a tough slog at AMD as they put their Fusion/Bulldozer strategy into play.

      For now they have bigger, more expensive dies; and even if they tie Intel on the desktop, it seems more likely that Intel holds both the desktop IPC and clock speed crowns. Intel also has a lead on production volume and costs. Ouch, ouch, ouch…

      On graphics they also seem to be second on all fronts, with NVidia about to drop even faster parts. It must be tense at AMD.

        • FubbHead
        • 12 years ago

        You know… Being second on two fronts might not be that bad, as the number one in both markets isn’t the same player.

          • snowdog
          • 12 years ago

          Actually, it seems worse to me. They are competing against companies with focused missions: CPUs for Intel, graphics for Nvidia. The bulk of each rival’s resources goes into that focused division.

          AMD must make tough decisions about which division gets the smaller cut, and that division may fall even further behind. Also, AMD in particular barely ever made money when in second place, and ATI hurts to a similar extent when in second place.

    • crose
    • 12 years ago

    Great, but two things are missing: 45nm and clock parity with Xeon/Core 2.

    • flip-mode
    • 12 years ago

    Damn, this isn’t what I was hoping for. I was hoping for more or less per-clock equality. It ain’t all bad, but it ain’t good enough for me to pick a Phenom over a “Core,” all else equal.

    So the rumors prove true, everyone: when AMD or Intel go silent, there is reason to worry and little cause for hope.

    • Logan[TeamX]
    • 12 years ago

    Well, that’s pretty clear. If I want to build my 2S, 4-core-per-package box at home it’ll be Intel-based.

    I’ve had AMD for 6 years and I’ve rarely been disappointed with it. I may yet just upgrade the video card to something new and run it for another year or two. (Opty 170 @ 2.4GHz, 2GB RAM)

    Post up if anyone has a line on a S939 Opteron 185.

      • d0g_p00p
      • 12 years ago

      Newegg has the S939 Opty 185’s in stock.

    • Krogoth
    • 12 years ago

    Excellent review, Damage.

    • snowdog
    • 12 years ago

    No surprises. I was expecting less than home-run performance, since in past months AMD has been directing all their hype toward Bulldozer and other future developments, telegraphing that Barcelona was not going to be a huge step up on Intel.

    Looking at Anand’s gaming numbers, it looks like a 15% improvement clock for clock over K8, which looks pretty good: a bigger jump than Penryn offers. But Intel is starting ahead, so it looks like they will still be ahead with Penryn, though with a smaller gap.

    AMD is between the rock of Intel and the hard place of higher production costs. Barcelona is on 65nm and Penryn on 45nm, giving Intel a smaller die and lower production costs. Native quad cores will cost more to manufacture as well.

    AMD is really going to need to batten down the hatches to ride out the storm a while longer.

    • Jigar
    • 12 years ago

    The picture doesn’t look that good for Barcelona… but let’s see what happens when Barcelona scales above 2.6GHz.

      • tfp
      • 12 years ago

      Scaling will not change unless they change how part of the chip is running…

      • stmok
      • 12 years ago

      2.6GHz+… that’s what I hear people saying. But will it be true?

      I’m finding “Barcelona” like a hot girl that doesn’t know how to perform in bed. It’s a bit anticlimactic.

      Anandtech’s article paints a slightly rosier picture.

        • tfp
        • 12 years ago

        If this is true, then they are changing something; it could be as simple as being able to run the memory controller at the CPU clock speed. Performance per clock will not increase as MHz goes up if everything is increased at the same rate they are using now.

    • clone
    • 12 years ago

    while AMD is flowering up the low power consumption #’s as a reason why the clock speeds are so low, they do have a point….. whether it translates into increased sales is beyond me.

    after reading all of the tests it appears the chip will scale very well, including the cache latency, which appears to shrink as clock speeds increase. that lends credence to AMD’s comment that once above 2600mhz Barcelona scales very well….. future proofing I suspect will be a definite necessity once Intel responds, so work hard, AMD, you just might have something.

    while that’s great, the reality is no 2600mhz parts are available.

    I hope for AMD’s sake that power consumption is a serious concern in servers, so that they can at least slow the bloodletting until they can get the higher end parts on the market in volume.

    if things weren’t going so badly for AMD I suspect these CPUs would never have been released in the manner they’ve been; while they show promise, they won’t stop anyone in other segments from going Intel if they were sitting on the fence prior.

      • Severus
      • 12 years ago

It’s a pretty anemic launch, but I agree with the “wait and see how it scales” sentiment.

Power consumption in server space really is a big concern, however. It allows higher densities. Two of my last three clients are at the absolute limits of their server rooms’ power and cooling capacities, yet compute demand from the businesses continues to grow exponentially all the same. It is a significant consideration for medium-to-large enterprises.

      That said, we all want to see comprehensive gaming benchmarks 🙂

    • UberGerbil
    • 12 years ago

Rarely has so much anticipation been applied to something so ultimately anticlimactic.

      • insulin_junkie72
      • 12 years ago

“Rarely has so much anticipation been applied to something so ultimately anticlimactic.”

      The introduction of ATI’s R600s has to at least be in the running on that score, I’d think.

      • Ruiner
      • 12 years ago

      Who’s Jar Jar in that analogy?

    • shank15217
    • 12 years ago

In two different benchmarks, Phenom is faster than Core 2 clock for clock: WinRAR compression and 3D rendering with KribiBench. Penryn will not debut at 4GHz, and you haven’t seen any info about Phenom overclocking. K10 scales very well with clock speed; at 3GHz things will look better for Phenom than at 2GHz. This is obvious from the benches.

      • tombman
      • 12 years ago

Both tests you mentioned rely heavily on memory bandwidth and therefore NUMA. The K10 CORE itself is clearly weaker than Conroe (let alone Penryn). Just check the single-CPU, or better, single-core benchmarks. (Or just check benchmarks that don’t require massive memory bandwidth, like GAMES or Cinebench.)

In a gaming PC environment there will be no NUMA (for speedup), nor FB-DIMM (for slowdown). Both CPUs will use simple DDR2 or DDR3. And with only four cores in a single-CPU system (that’s what gamers use), FSB bandwidth limitation is no problem (I have a quad-core now).


        • shank15217
        • 12 years ago

If both tests relied on memory bandwidth, then Barcelona would have beaten the Core 2 on the other rendering tests as well. Also, if you actually ran the RAR benchmark, all it does is compress a stream of bits and report the throughput. That data stays in the caches, not in main memory.

    • ssidbroadcast
    • 12 years ago

Huh. Seems like the market is headed toward a near-parity stalemate. Are the laws of physics/diminishing returns perhaps responsible for this?

      • Krogoth
      • 12 years ago

Indeed, it has been like this ever since Prescott failed to meet its original goals.

      Silicon is just running out of breathing space. The other alternative semiconductor-grade materials can scale higher, but are far more expensive to fab with.

        • Gungir
        • 12 years ago

Far too expensive for now, that is. There was a time when a computer was far too expensive for the average potential buyer. Now? Well, things have changed a bit, to say the least. No matter how far out a technology may seem now, if cost is the greatest reason it’s inviable, I think we’ll eventually see it in use.

    • shank15217
    • 12 years ago

That clock scaling is something worth looking into, especially with the L3 cache in the picture. Performance in the desktop space suggests a 3.0GHz Phenom will compete with a 3.0GHz Core 2. I would buy a Phenom at 3.0GHz; it’s “fast enough”. The huge differences in desktop performance seem to be gone. Also, other reviews on the web with different applications paint a better picture of the Barcelona chips; consider the Kribi 3D benchmark done at Anandtech. Per-core comparisons aren’t exactly fair, because many apps benefit from the better bandwidth and data-access latencies available to the Barcelona architecture.

    • tombman
    • 12 years ago

    Good review.

As I see it: AMD has lost again 😉

    At best they can match the clock-for-clock performance of Conroe. Since Intel has almost 50% higher clock speeds AND experience, together with a 45nm process that lets them produce chips far more cheaply than AMD, I see no real light at the end of the tunnel that AMD has been in for a pretty long time now. IMO AMD has to reach 3GHz and get 45nm chips out soon, or they’ll be doomed 😉

      • shank15217
      • 12 years ago

I would suggest you read some of the other benches available on the web. It’s really dependent on the application benchmark you are using. It’s certainly not an “at best” scenario.

        • tombman
        • 12 years ago

I read them all, and K10 will be no match for Penryn…
        High-end CPUs are bought by freaks with cash, and freaks overclock. Penryn will run at 4GHz with standard air cooling; I want to see Phenom do that ;D

      • lyc
      • 12 years ago

you seem to take an awful lot of pleasure in stating that… I too am disappointed by these (preliminary) results, but…

        • tombman
        • 12 years ago

Of course I’m happy, because I will have no reason to change my system: Intel still rules 🙂

        Only thing I will do is swap my Kentsfield for a Yorkfield, that’s all.

        Imagine K10 had been faster: I would’ve had to change my mainboard, forget SLI, get a worse retention system (CPU pins suck hard, as do ZIF sockets), get lower overclocks, etc…

        (I now have a 4GHz Kentsfield + 8800 GTX SLI running on an eVGA 680i mainboard, which is compatible with Yorkfield….)

        I’m not an Intel fanboy; I don’t care who has the best performance, be it Intel or AMD, as long as I have the best CPU 😉
        Same for graphics, Nvidia or AMD…

          • eitje
          • 12 years ago

          psch, right – you’re not an Intel fanboy? You’re just a socket fanboy….?

      • Krogoth
      • 12 years ago

      Wow, you really have no clue how the CPU market works.

Barcelona can easily compete against Intel’s offerings in the server market. The older K8s are denying Intel some mainstream market sales. Both of those markets outweigh the tiny enthusiast market by a large margin.

      AMD will weather this rough storm fine. They just need to refine the fab process and binning. I think Penryn is going to be just as large of a disappointment in the eyes of the e-penis crowd due to its own massive hype. I think 4GHz will be Penryn’s practical ceiling, and most of its performance improvements lie in SSE. I am not saying that is a bad thing, but Intel’s next big thing is going to be Nehalem.

      In other words, if you are on an X2 or Conroe-based platform, there is no reason to get a Phenom or Penryn-based platform.

      BTW, I am on a Q6600 system. I want AMD to give Intel some pressure, because competition is ultimately good for the consumer. You may or may not remember what CPU prices were like back when Intel dominated the market. 😉

        • lyc
        • 12 years ago

        definitely not, if you look at his reply to mine — he completely missed the point…

    • provoko
    • 12 years ago

    Great bed time reading. 😉

    • Jive
    • 12 years ago

Very nice review, as always. At least AMD can stay competitive for a while longer, although it’s not looking too good in the long run. Now, where are those Phenom reviews? 😀
