AMD’s dual-core Opteron processors

MICROPROCESSORS ARE GETTING too hot, requiring too much power, and not delivering enough additional performance for it. That’s the basic problem. The engine that’s driven the microcomputer’s incredible rise in capability over the past 30 years, Moore’s Law, isn’t quite out of steam yet, but some of its offshoots are on the ropes. CPU designers have nearly exhausted their collective bag of tricks to get more performance out of additional transistors on a chip by increasing parallelism at the instruction level. Speculative execution and deep pipelining are by now very standard features, and CPU designs are getting increasingly complex and hard to manage. When Gordon Moore’s goose lays a golden egg and the number of transistors possible on a chip doubles, as it is supposed to do every 18 months, taking advantage of the windfall has proven increasingly difficult.

Cranking up clock speeds hasn’t helped much, either, because of transistor leakage problems. Chips are sucking up large amounts of power and expending much of it as heat, and the problem grows more acute as clock speeds ramp up. The most widely noted example of these problems, by far, has been at the company Moore co-founded. The power, heat, and speed problems of the “Prescott” core inside of recent Pentium 4 processors prompted Intel into an impressive and very public change of direction over the course of the past year. The company has sworn off the quest for 4GHz, shied away from clock speed as a measure of performance, and utterly rewritten its CPU roadmap.

AMD has not been entirely immune to these problems, but it has sidestepped their worst effects by keeping clock speeds down. The original Opteron processor debuted two years ago today at speeds up to 2GHz. Two years later, the same processors are available at 2.6GHz—only a 600MHz increase, not much in the grand scheme.

Fortunately, both AMD and Intel seem to have settled on an answer that should allow them to take advantage of ballooning transistor counts to gain additional performance: thread-level parallelism. By dialing back clock speeds and putting multiple CPU cores on a chip, the theory goes, processor performance can rise as transistor counts do. This sort of parallelism will, of course, be familiar to those who know a thing or two about Opteron processors, which have commonly been employed in pairs as part of server or workstation systems.

In fact, AMD says that the Opteron was designed from the outset with dual-core implementations in mind. The folks there are also quick to remind anyone who will listen that AMD was first to tape out an x86-compatible dual-core design and first to demonstrate such a beast in public. Today, they aim to be the first manufacturer to deliver dual-core x86 processors for workstations and servers, just days after Intel officially announced its first dual-core desktop processors.

We’ve had a pair of dual-core Opteron processors on the test bench for some time now, and we’re pleased to report some rather impressive results. AMD’s dual-core design is something more than just a pair of CPUs glued together on a single piece of silicon, and this design choice yields a performance dividend. Keep reading to see how the new Opteron 275 stacks up against its Opteron predecessors and against Intel’s latest “Nocona” Xeons. We also have a head-to-head battle of single-socket, dual-core workstation processors: the Opteron 175 versus the Pentium Extreme Edition 840.

The processors
On looks alone, one would be hard pressed to tell the difference between dual-core Opterons and their single-core counterparts.


A pair of Opteron 875 processors

They’re cosmetically identical, save for the slightly revamped model numbering scheme. The three-tiered processor series convention remains intact. 100 series processors are for single-socket systems, the 200 series for dual, and the 800 series is intended for 4-socket systems or better. However, instead of incrementing the tail end of the model number by two as clock speeds ramp up, as the Opterons 246, 248, and 250 did, the dual-core models will come in increments of five. The first dual-core Opterons will arrive at clock speeds of 1.8, 2.0, and 2.2GHz as models x65, x70, and x75, respectively.

Prices will vary according to whether the chips are part of the 100, 200, or 800 series and according to clock speeds, but the general plan for pricing is fairly straightforward: it’s almost as if AMD were introducing three new top-end speed grades at once. However, there is some overlap. For instance, the Opteron 252 is priced at $851, and the Opteron 265 will be priced the same. Consumers can choose whether they wish to purchase a dual-core processor at 1.8GHz or a single core at 2.6GHz for the same amount. The higher models will carry a premium, but AMD plans to bring the prices of dual-core Opterons down over time into the territory of the current single-core models.

The even better news for current owners of Opteron systems is that the dual-core Opterons will be pin-compatible with existing Socket 940 systems, capable of acting as drop-in replacements for current single-core models. The only requirement is that the motherboard must be able to support newer 90nm chips like the Opteron 252. If the board can do that, it should be able to handle the dual-core chips after a BIOS update, AMD claims. (Check with your motherboard maker to be sure.)

In order to pull off this impressive feat of backward compatibility, AMD had to make its dual-core parts fit into the same basic power and heat envelopes as its single-core processors. To do so, the company tweaked its fabrication process, using lower-leakage transistors that switch somewhat slower but waste less power, among other things. As a result, the Opteron 275 tops out at 2.2GHz, but it consumes no more power than the Opteron 252 at 2.6GHz.

This is one of the minor miracles of choosing thread-level parallelism over higher clock speeds. When we asked AMD CTO Fred Weber about how they managed to keep power and heat so low, he was coy about which specific optimizations AMD employed, but he offered some examples. When you’re not optimizing for the absolute best linear performance, he noted, many things are possible, including everything from changing the oxide thickness and transistor voltages to resizing buffers and more extensive clock gating.

To further manage heat and power, dual-core Opterons will support AMD’s PowerNow feature (also known as Cool’n’Quiet in the desktop world) that scales clock speeds and CPU voltages down at times of low CPU loads. This feature will function on a whole-chip basis; the CPU cores will not scale their clock speeds up and down independently.


A shot of the dual-core Opteron die. Source: AMD.

As for the chip itself, the dual-core Opteron will be manufactured on AMD’s 90nm process with silicon-on-insulator (SOI) technology. The chips will include all of the latest enhancements AMD has made to the K8 core, including SSE3 support and an improved memory controller with broader compatibility, improved memory loading, and more efficient memory mapping.

A dual-core Opteron chip packs in about 233 million transistors, and its die size is a very healthy 199 mm². The Intel Prescott/Nocona core on which the Xeon is based is 112 mm² with roughly 125 million transistors. (The newer version with 2MB of L2 cache has 133 million transistors.) So the dual-core Opteron is large, but it’s also a very close match for Intel’s “Smithfield” dual core, which weighs in at roughly 230 million transistors and 206 mm², although estimates and methods of counting transistors can vary.

 

A closer look at AMD’s dual-core architecture
Let’s start by looking at a very simplified diagram of a dual-core Opteron, which looks like so:


How two CPU cores are situated together on a chip. Source: AMD.

As you can see, AMD didn’t simply glue a pair of K8 cores together on a single piece of silicon. They’ve actually done some integration work at a very basic level, so that the two CPU cores can act together more effectively. Each of the K8 cores has its own, independent L2 cache onboard, but the two cores share a common system request queue. They also share a dual-channel DDR memory controller and a set of HyperTransport links to the outside world. Access to these I/O resources is adjudicated via a crossbar, or switch, so that each CPU can talk directly to memory or I/O as efficiently as possible. In some respects, the dual-core Opteron acts very much like a sort of SMP system on a chip, passing data back and forth between the two cores internally. To the rest of the system I/O infrastructure, though, the dual-core Opteron looks more or less like the single-core version.

The Opteron’s system architecture remains very different from that of its primary competitor, Intel’s Xeon. AMD says its so-called Direct Connect architecture was over-designed for single-core Opterons with an eye to the dual-core future. Each processor (whether dual core or single) has its own local dual-channel DDR memory controller, and the processors talk to one another and to I/O chips via point-to-point HyperTransport links running at 1GHz. This arrangement makes for a network-like system topology with gobs of bandwidth. The total possible bandwidth flowing through the 940 pins of an Opteron 875 is 30.4GB/s, which is technically enough to choke a horse. With one fewer HyperTransport link, the Opteron 275 can theoretically hit 22.4GB/s.

By contrast, current Xeons have a shared front-side bus on which the north bridge chip (with memory controller) and both processors reside. At 800MHz, its total bandwidth is 6.4GB/s—a possible bottleneck in certain situations.
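For the curious, those peak figures can be reproduced with some simple arithmetic. What follows is only a rough sketch: the link and channel widths are my assumptions (16-bit HyperTransport links counted in both directions, 64-bit DDR400 memory channels, a 64-bit front-side bus), not numbers spelled out above, and the function names are made up for illustration.

```python
# Back-of-the-envelope peak bandwidth, in MB/s (1GB/s = 1000MB/s here).
# All widths and transfer rates below are assumptions for illustration.

HT_MEGATRANSFERS = 2000    # 1GHz link clock, data on both clock edges (assumed)
HT_BYTES = 2               # 16-bit link in each direction (assumed)
HT_DIRECTIONS = 2          # count traffic in both directions

DDR_MEGATRANSFERS = 400    # DDR400 memory (assumed)
DDR_BYTES = 8              # 64-bit memory channel
DDR_CHANNELS = 2           # dual-channel controller

FSB_MEGATRANSFERS = 800    # 200MHz quad-pumped front-side bus
FSB_BYTES = 8              # 64-bit bus (assumed)


def ht_link_mb_s():
    """Peak bandwidth of one HyperTransport link, both directions combined."""
    return HT_MEGATRANSFERS * HT_BYTES * HT_DIRECTIONS


def memory_mb_s():
    """Peak bandwidth of the on-chip dual-channel memory controller."""
    return DDR_MEGATRANSFERS * DDR_BYTES * DDR_CHANNELS


def opteron_socket_mb_s(ht_links):
    """Total peak bandwidth through one Opteron socket."""
    return ht_links * ht_link_mb_s() + memory_mb_s()


def xeon_fsb_mb_s():
    """Peak bandwidth of the shared front-side bus."""
    return FSB_MEGATRANSFERS * FSB_BYTES


if __name__ == "__main__":
    print("Opteron, three links:", opteron_socket_mb_s(3) / 1000, "GB/s")
    print("Opteron, two links:", opteron_socket_mb_s(2) / 1000, "GB/s")
    print("Xeon shared bus:", xeon_fsb_mb_s() / 1000, "GB/s")
```

Under those assumptions, three links plus memory add up to the 30.4GB/s quoted for the 800 series and two links plus memory to the 22.4GB/s quoted for the 200 series. Keep in mind that the Xeon figure is shared by both processors and the north bridge, while each Opteron socket gets its own allotment.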

MESI-MESI-MOESI Banana-fana…
In order to understand the impact of AMD’s dual-core chip design and system architecture, we should briefly discuss cache coherency. This scary sounding term is actually one of the bigger challenges in a multiprocessor system. How do you handle the fact that one CPU may have a certain chunk of data in its cache and be modifying it while another CPU wants to read it from memory and operate on it, as well? Assuming you don’t run from the room screaming in fear at the complexity of it all, the answer is some sort of cache coherency protocol. Such a protocol would store information about the status of data in the cache and offer updates to other CPUs in the system when something changes.

Intel’s Xeons use a cache coherency protocol called MESI. MESI is an acronym that stands for the various states that data in the CPU’s cache can be flagged as: modified, exclusive, shared, or invalid. Let’s tackle them completely out of order, just to be difficult. If a CPU pulls a chunk of data into cache and has not modified it, the data will be flagged as Exclusive. Should another CPU pull that same chunk of data into its cache, the data would then be marked as Shared. Then let’s say that one of the processors were to modify that data; the data would be marked locally as Modified, and the same chunk on the other CPU would be flagged as Invalid.

Simple, no?

The processor with the Invalid data in its cache (CPU 0, let’s say) might then wish to modify that chunk of data, but it could not do so while the only valid copy of the data is in the cache of the other processor (CPU 1). Instead, CPU 0 would have to wait until CPU 1 wrote the modified data back to main memory before proceeding—and that takes time, bus bandwidth, and memory bandwidth. This is the great drawback of MESI.

AMD sought to address this problem by making use of a cache coherency protocol called MOESI, which adds a fifth possible state to its quiver: Owner. (MOESI is used by all Opterons and was even used by the Athlon MP and 760MP chipset back in the day.) A CPU that “owns” certain data has that data in its cache, has modified it, and yet makes it available to other CPUs. Data flagged as Owner in an Opteron cache can be delivered directly from the cache of CPU 0 into the cache of CPU 1 via a CPU-to-CPU HyperTransport link, without having to be written to main memory.

That alone is a nice enhancement over MESI, but the dual-core Opterons take things a step further. In the dual-core chip, cache coherency for the two local CPU cores is still managed via MOESI, but updates and data transfers happen through the system request interface (SRI) rather than via HyperTransport. This interface runs at the speed of the CPU, so transfers from the cache on core 0 into the cache on core 1 should happen very, very quickly. Externally, MOESI updates from a pair of cores in a socket are grouped in order to keep HyperTransport utilization low.

Again, this is quite the contrast with Intel’s dual-core implementation. Smithfield behaves almost exactly like a pair of Xeons in two sockets: MESI updates are communicated over the front-side bus, and there is no alternative internal on-chip data path.
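To make the MESI versus MOESI difference a little more concrete, here is a toy model of a single chunk of cached data shared by two CPUs. It is a deliberately simplified sketch that follows the description above, not a rendering of either company's actual protocol. The class and method names are hypothetical, and it only counts whether modified data reaches the other CPU through main memory or directly from cache to cache.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"      # the extra MOESI state (called "Owner" above)
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class ToyCoherentSystem:
    """Tracks one cache line across several CPUs (hypothetical toy model)."""

    def __init__(self, n_cpus, use_owned):
        self.states = [State.INVALID] * n_cpus
        self.use_owned = use_owned      # True = MOESI-style, False = MESI-style
        self.memory_writebacks = 0      # modified data pushed out to main memory
        self.cache_to_cache = 0         # modified data handed straight to a cache

    def _fetch_dirty(self, requester):
        """If another CPU holds modified data, account for how we obtain it."""
        for cpu, state in enumerate(self.states):
            if cpu != requester and state in (State.MODIFIED, State.OWNED):
                if self.use_owned:
                    self.cache_to_cache += 1
                else:
                    self.memory_writebacks += 1
                return cpu
        return None

    def read(self, cpu):
        if self.states[cpu] is not State.INVALID:
            return  # cache hit, nothing changes
        holder = self._fetch_dirty(cpu)
        if holder is not None:
            # MOESI keeps the modified copy as Owned; MESI demotes it to Shared.
            self.states[holder] = State.OWNED if self.use_owned else State.SHARED
            self.states[cpu] = State.SHARED
        elif any(s is not State.INVALID for s in self.states):
            # Someone else has a clean copy: everyone holding it becomes Shared.
            self.states = [
                State.SHARED if s is not State.INVALID else s for s in self.states
            ]
            self.states[cpu] = State.SHARED
        else:
            self.states[cpu] = State.EXCLUSIVE

    def write(self, cpu):
        if self.states[cpu] is State.INVALID:
            self._fetch_dirty(cpu)
        # The writer ends up with the only valid copy.
        self.states = [State.INVALID] * len(self.states)
        self.states[cpu] = State.MODIFIED


def run_scenario(use_owned):
    """Replay the sequence of events walked through in the text."""
    system = ToyCoherentSystem(n_cpus=2, use_owned=use_owned)
    system.read(1)    # CPU 1 loads the data: Exclusive
    system.read(0)    # CPU 0 loads it too: both Shared
    system.write(1)   # CPU 1 modifies it: Modified on CPU 1, Invalid on CPU 0
    system.write(0)   # CPU 0 now wants to modify the same data
    return system
```

Running `run_scenario(False)` leaves one trip through main memory on the books, while `run_scenario(True)` records a single cache-to-cache transfer instead. Real implementations have many more corner cases than this handles, so treat it as an illustration of the idea and nothing more.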

Interestingly, the ability of the two cores to pass data quickly to one another seems to offer a compelling enough performance benefit that, from what I gather, AMD’s guidance to OS vendors has been to give priority to scheduling threads on adjacent cores first before spinning off a thread on a CPU core on another socket. That’s despite the fact that there’s additional memory bandwidth available on the second socket.
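That placement preference is easy to picture in code. The sketch below is purely hypothetical: it is not how the Windows scheduler is written, and the `pick_core` function, the topology dictionary, and the notion of a "home socket" are all names I made up to illustrate the "neighboring core first, other socket second" idea.

```python
def pick_core(topology, busy_cores, home_socket):
    """Choose a core for a new thread, preferring the home socket.

    topology    -- dict mapping socket id to a list of core ids (hypothetical)
    busy_cores  -- set of core ids that are already occupied
    home_socket -- socket where the application's other threads are running
    """
    # First choice: an idle core sitting next to the existing threads.
    for core in topology[home_socket]:
        if core not in busy_cores:
            return core
    # Fallback: any idle core on another socket.
    for socket, cores in topology.items():
        if socket == home_socket:
            continue
        for core in cores:
            if core not in busy_cores:
                return core
    return None  # everything is busy


if __name__ == "__main__":
    dual_275 = {0: [0, 1], 1: [2, 3]}   # two sockets, two cores each
    print(pick_core(dual_275, busy_cores={0}, home_socket=0))
    print(pick_core(dual_275, busy_cores={0, 1}, home_socket=0))
```

With one thread already on core 0, the second thread lands on core 1 in the same socket. Only once both of those cores are occupied does work spill over to the second socket.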

 

Why I am a bad person
This is the part of the review where I explain why I benchmarked what I did, the way I did, and in such obvious violation of the Sacred Creed of Geeks Everywhere.

First, I tried to test the CPUs in such a way as to show their benefits and limitations. To do so, I used the brand-spanking-new Windows XP x64 Edition operating system and a number of 64-bit applications. WinXP x64 is NUMA aware—that is, it comprehends the need to put data into the memory attached to the CPU modifying that data. By nature, Opteron systems require a NUMA-aware OS in order to perform at their best. Also, AMD says that the WinXP x64 scheduler is especially well tuned for dual-core processors.

Obviously, 64-bit applications are the future of dual-core processors. All of the CPUs that I tested, including the moldy old Opterons, newer Xeons, and brand-new dual-core processors from AMD and Intel, are 64-bit capable. I also tried to make use of multithreaded applications where possible, although some programs aren’t threaded and some types of tasks simply don’t lend themselves to multithreading. The end result is that nine of our 15 test applications are multithreaded, and of those nine, five are 64-bit binaries.

I understand that multitasking has been cited as one of the key areas where dual-core processors will benefit the end user, and I certainly don’t disagree entirely. The full-fat, Atkins-approved creamy smoothness that comes with a multiprocessor system will be a boon in desktop systems and in low-end, single-processor workstations, and I have extolled its virtues at length in the past. However, most workstation-class systems are already multiprocessor boxes. Not only that, but we are living right now in what I’d call the Multitasking Moment, as we transition from one CPU core to two. Once dual-core processors become more common, multitasking smoothness will no longer be a big issue. The more relevant question as we move to two, four, eight, and more CPU cores per system will be about the benefits of thread-level parallelism to outright performance, and that is the question I’ve attempted to address in my testing.

I also have not made any attempt at server-class testing in this review. I would have loved to do it, but it would be a new enterprise for us around here, and we had our hands full in doing the testing we did. I would also like to apologize to the workstation purists for not ponying up the cash for high-end Quadro or FireGL graphics cards for our test systems. Truth be told, I would have loved to use them, but the dual Xeon rig soaked up our budget for this review. And that’s without us paying through the nose for registered DDR2-400 DIMMs for that rig. I am, as I said, a bad person.

I am also probably a bad person for focusing primarily on DCC and scientific computing instead of CAD/CAM applications that are generally not multithreaded. Even worse, I threw in a few game benchmarks at the end of the review. Please, whatever you do, don’t tell your boss about those.


I am also a bad person because I really like Tyan’s wicked S2895 motherboard that sports dual 940-pin sockets and dual 16-lane PCI-E x16 slots.

Cool stuff to watch for in the results
There are a number of intriguing matchups in our benchmark results. Let me outline a few of them, so you know what to watch for.

  • Opteron 175 versus dual Opteron 248 — This is a matchup at 2.2GHz between a dual-core Opteron and a pair of single-core Opterons. Is it better to have two cores situated closely together, or is having double the memory bandwidth, as the Opteron 248s do, preferable?
  • Opteron 175 versus Opteron 152 — So would you rather have: a single-core Opteron at 2.6GHz or a dual-core model at 2.2GHz?
  • Dual Opteron 275 versus dual Xeon 3.4GHz — Yes, Intel has newer Xeons out with 2MB of L2 cache, but we couldn’t find them to purchase when we were buying hardware for this review. We suspect that cache size doesn’t make much difference in most of the applications we’re testing, which don’t tend to use the extra meg of cache very well. (See our P4 600 series review for more on the 2MB cache’s performance impact.) The question is, how does a dual-CPU Hyper-Threaded Xeon compare to a quad-core Opteron?
  • Opteron 175 versus Pentium Extreme Edition 840 — Dual-core processors from AMD and Intel go head to head. Perhaps we’re crossing market segments a little bit here, but then Intel targets its high-end Pentium chipsets and processors at the low-end workstation market. The Opteron 175 and Extreme Edition 840 are arguably direct competitors. Who else is Intel targeting with a pair of 3.2GHz Prescott cores on one chip?

    Evil people who wish to observe possible desktop processor performance matchups should note that the Opteron 175 is essentially identical to the Athlon 64 X2 4400+.

  • Dual Xeon 3.2GHz versus Pentium Extreme Edition 840 — Both are dual Prescott cores with 1M of L2 cache running at 3.2GHz on a shared 800MHz front-side bus. The Xeon is saddled with a slower memory subsystem, while the Pentium XE 840 is one chip rather than two. Hmm.
  • Pentium Extreme Edition 840 versus Pentium D 840 — If you disable Hyper-Threading on the Pentium XE 840, you get a Pentium D 840. That’s exactly what we did, because we were curious to see the performance impact.
  • Redemption for Prescott? — We’ve used a mix of single-threaded and multithreaded applications in our past CPU reviews, but we’re using more threading now than ever. Does Intel’s Hyper-Threaded processor core regain some of its luster when more of the benchmarks go multithreaded?
  • Dual Opteron 275 versus the world — What can a dual-socket, quad-core system do better than any of the two-core systems we’re testing here? Not everything, because some applications only spin off one or two threads. In some cases, though, well, you’ll see.

There are some other interesting questions to be asked about the results, but you can find them for yourself. I’m just offering my top suggestions.

 

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.

Our test systems were configured like so:

| | Opteron systems | Xeon systems | Pentium systems |
|---|---|---|---|
| Processor | Opteron 148; Opteron 152; Opteron 175; Dual Opteron 248; Dual Opteron 252; Dual Opteron 275 | Xeon 3.2GHz (Nocona 1MB); Xeon 3.4GHz (Nocona 1MB); Dual Xeon 3.2GHz (Nocona 1MB); Dual Xeon 3.4GHz (Nocona 1MB) | Pentium 4 660 3.6GHz; Pentium D 840 3.2GHz; Pentium Extreme Edition 840 3.2GHz |
| System bus | 1GHz HyperTransport | 800MHz (200MHz quad-pumped) | 800MHz (200MHz quad-pumped) |
| Motherboard | Tyan Thunder K8WE S2895 | SuperMicro X6DAL-G | Intel D955XBK |
| BIOS revision | 2/21/2005 beta | 080010 | BK95510J.86A.1152 |
| North bridge | nForce Professional 2200; nForce Professional 2050; AMD 8131 PCI-X Tunnel | Intel E7525 | 955X MCH |
| South bridge | | 6300ESB | ICH7R |
| Chipset drivers | SMBus driver 4.45; IDE driver 4.75 | OS integrated | INF Update 7.0.0.1019 |
| Memory size | 2GB (4 DIMMs) | 2GB (4 DIMMs) | 1GB (2 DIMMs) |
| Memory type | OCZ PC3200 512MB registered ECC DDR SDRAM at 400MHz | Kingston PC3200 512MB registered ECC DDR SDRAM at 333MHz | Corsair XMS2 5400UL DDR2 SDRAM at 533MHz |
| CAS latency (CL) | 3 | 2.5 | 3 |
| RAS to CAS delay (tRCD) | 3 | 3 | 2 |
| RAS precharge (tRP) | 3 | 3 | 2 |
| Cycle time (tRAS) | 8 | 7 | 8 |
| Hard drive | Maxtor DiamondMax 10 250GB SATA 150 (all systems) | | |
| Audio | Integrated nForce/AD1981B with NVIDIA 4.60 drivers | Integrated 6300ESB/ALC650 with Realtek 5.10.0.5820 drivers | Integrated ICH7R/STAC9221D5 with SigmaTel 5.10.4456.0 drivers |
| Graphics | GeForce 6800 Ultra 256MB PCI-E with ForceWare 71.84 drivers | GeForce 6800 Ultra 256MB PCI-E with ForceWare 71.84 drivers | GeForce 6800 Ultra 256MB PCI-E with ForceWare 71.84 drivers |
| OS | Windows XP Professional x64 Edition | Windows XP Professional x64 Edition | Windows XP Professional x64 Edition |
| OS updates | | | |

Note that we have less total memory on the Pentium setups. I don’t believe any of our benchmarks are constrained by available RAM in a 1GB system, but you’ll still want to keep the difference in mind.

All tests on the Pentium systems were run with Hyper-Threading enabled, except where otherwise noted.

Thanks to Corsair, OCZ, and Kingston for providing us with memory for our testing. This matchup required lots of high-quality RAM, so we had to spread the love around. All three brands are far and away superior to generic, no-name memory.

Also, all of our test systems were powered by OCZ PowerStream power supply units. The PowerStream was one of our Editor’s Choice winners in our latest PSU round-up.

The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

Memory performance
We begin with a few synthetic memory performance tests because the memory subsystem’s performance affects our understanding of what’s going on in the rest of the benchmarks.

Sandra’s memory performance tests are multithreaded and take full advantage of the dual-socket Opterons’ NUMA memory subsystem. Yow.

The Xeon systems are at a distinct disadvantage due to their dual-channel DDR333 memory, but that’s about as fast as it gets for Xeons currently. Some Xeon motherboards offer dual-channel DDR2-400, but that’s not likely much faster than DDR memory at 333MHz. The Pentium XE 840 achieves much higher throughput thanks to its dual channels of low-latency DDR2-533 memory.

Linpack lets us get a quick visual on the size and speed of the L1 data and L2 caches on these processors, and there’s nothing unexpected here.

The Opteron’s copious memory bandwidth is tied to its low memory latencies, which come courtesy of its built-in memory controller. Because the memory controller runs at the speed of the CPU, the faster the processor, the lower the memory access latency. The multi-socket Opterons pay a small latency penalty here, but nothing major. The Xeons, on the other hand, have a rough time of it.

 

POV-Ray rendering
POV-Ray just recently made the move to 64-bit binaries, and thanks to the nifty SMPOV distributed rendering utility, we’ve been able to make it multithreaded, as well. SMPOV spins off any number of instances of the POV-Ray renderer, and it will bisect the scene in several different ways. For this scene, the best choice was to divide the screen up horizontally between the different threads, which provides a fairly even workload.
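For illustration, here is roughly what that kind of split looks like. This is only a sketch of the general idea, not SMPOV's actual code: I am taking "horizontally" to mean horizontal strips of rows, and the `split_into_bands` helper is a made-up name.

```python
def split_into_bands(height, n_workers):
    """Divide `height` rows into contiguous bands, one per worker.

    Returns a list of (first_row, end_row) pairs, where end_row is exclusive.
    Any leftover rows are handed out one at a time to the first workers.
    """
    base, extra = divmod(height, n_workers)
    bands = []
    start = 0
    for worker in range(n_workers):
        rows = base + (1 if worker < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


if __name__ == "__main__":
    # Four renderer instances sharing a 480-row image.
    for first, end in split_into_bands(480, 4):
        print(f"rows {first} to {end - 1}")
```

Equal-sized bands only make for an even workload when the scene's complexity is spread fairly evenly from top to bottom, which is presumably why the best way to slice things varies from scene to scene.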

Notice the Task Manager graph above. I’ve included these graphs to give some indication of how much an application occupies each CPU. In this case, SMPOV and POV-Ray show a near-perfect 100%, nailed-to-the-wall utilization across all four CPU cores in our dual Opteron 275 system.

Rendering is one of those cases where multithreading can bring huge performance increases. The dual Opteron 275 crushes everything else here, as one might expect, rendering the entire scene in 87 seconds. Note that the dual-core Opteron 175 slightly outperforms the pair of Opteron 248s, too.

The other big thing to notice here is how much faster the Prescott/Nocona core looks when additional POV-Ray threads take advantage of Hyper-Threading. The Xeon 3.4GHz shaves 50 seconds off of its render time with a second thread.

 

3ds max 7 rendering
We tested 3ds max performance by rendering 20 frames of a sample scene at 320×240 resolution. This particular scene makes use of a motion-blur effect that requires extensive multi-pass rendering. We tried two different renderers: 3ds max’s default scanline renderer and its built-in version of the mental ray renderer.

The default renderer performs very well on our quad-core Opteron 275 system, and once again, the Opteron 175 just edges out a pair of 248s.

Unfortunately, the mental ray renderer didn’t like our dual Opteron 275 system. It refused to use all four cores to the full because the license for 3ds max’s integrated version of mental ray will only use two processors, no more. AMD has been pushing software makers to do their licensing on a per-CPU rather than per-core basis, but some current workstation-class programs will probably present this sort of problem, at least until the licensing model for dual-core products is fully worked out and programs are updated.

With two threads, the dual Opteron 252s at 2.6GHz are fastest here. Once more, the Opteron 175 outdoes the dual 248s, as it does a pair of Xeons.

 

Cinema 4D rendering
Cinema 4D’s rendering engine does a very nice job of distributing the load across multiple processors, as the Task Manager graph shows.

With four threads active, the dual Opteron 275 system absolutely tears through this rendering task, busting the 1000 mark in a benchmark where our previous champ had scored 689. That champ, by the way, was a system with dual 3.2GHz Xeons based on the older “Prestonia” core. The newer Nocona Xeons at 3.4GHz are slower here, even at a higher clock speed.

The remainder of Cinebench’s tests are all single-threaded shading tests, and the Opterons all perform well here, though additional cores or processors are no help.

 

LAME audio encoding
LAME MT is, as you might have guessed, a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. You can even download a paper (in Word format) describing the programming effort.

Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. The author notes, “In general, this approach is highly recommended, for it is exponentially harder to debug a parallel application than a linear one.”
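The pipelining approach can be sketched in a few lines. To be clear, this is not LAME MT's code: the `analyze` and `encode` functions are trivial stand-ins I invented, and the one-slot queue is just my way of modeling the "one frame ahead" buffering described above.

```python
import queue
import threading


def analyze(frame):
    """Stand-in for psycho-acoustic analysis: just the average sample value."""
    return sum(frame) / len(frame)


def encode(frame, analysis):
    """Stand-in for the rest of the encoder: consumes a frame plus its analysis."""
    return f"{len(frame)} samples, level {analysis:.1f}"


def run_pipeline(frames):
    """Run analysis on one thread and encoding on another, in frame order."""
    handoff = queue.Queue(maxsize=1)   # small buffer: analysis stays just ahead

    def analysis_stage():
        for frame in frames:
            handoff.put((frame, analyze(frame)))
        handoff.put(None)              # sentinel: no more frames

    worker = threading.Thread(target=analysis_stage)
    worker.start()

    encoded = []
    while True:
        item = handoff.get()
        if item is None:
            break
        frame, analysis = item
        encoded.append(encode(frame, analysis))

    worker.join()
    return encoded


if __name__ == "__main__":
    for line in run_pipeline([[1, 2, 3], [4, 4]]):
        print(line)
```

Because each stage still processes frames strictly in order, the output is identical to what a single-threaded loop would produce. The flip side is that there are only ever two stages to keep busy, so a design like this cannot use more than two processors.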

We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here, as we have done in our previous CPU reviews.

Notice how the CPU affinity tends to ping-pong around as the encoder runs. That’s typical behavior in Windows for some applications.

As you can see, LAME MT’s two threads work well with Hyper-Threading or SMP, but they don’t take any extra advantage of the four cores on the dual Opteron 275.

 

Xmpeg/DivX video encoding
We used the Xmpeg/DivX combo to convert a DVD .VOB file of a movie trailer into DivX format. Like LAME MT, this application is only dual threaded.

Once more, the dual Opteron 252s wind up at the top of the heap, but the Opteron 175 isn’t far behind, nearly tied with that other dual-core CPU, the Pentium XE 840. The Xeons are quite likely hampered here by their lower memory bandwidth.

Windows Media Encoder video encoding
We asked Windows Media Encoder to convert a gorgeous 1080-line WMV HD video clip into a 640×480 streaming format using the Windows Media Video 8 Advanced Profile codec.

This video encoder makes better use of four cores, and the dual Opteron 275 system finishes before the rest. The Opteron 175, meanwhile, absolutely runs away from the Opteron 248s. Could this be the new K8 cores’ SSE3 support at work?

 

ScienceMark

We’re using the 64-bit beta version of ScienceMark for these tests, and several of its components are multithreaded. ScienceMark author Alexander Goodrich says this about the Molecular Dynamics simulation:

Molecular Dynamics is lightly multithreaded – one thread takes care of U/I aspects, and the other thread takes care of the computation. The computation itself is not multithreaded, though Tim and I were looking into ways of changing the algorithm to support multi-threading programming a couple years ago – it’s a lot of effort, unfortunately. When MD [is] running there [is] a total of 2 threads for the process.

Here are the results:

The Primordia test “calculates the Quantum Mechanical Hartree-Fock Orbitals for each electron in any element of the periodic table.” Alex says this about it:

Primordia is multithreaded. Two main tasks occur which allow this to happen. Essentially, we identified 2 parallel tasks that could be done. We could probably take this a step further and optimize it even more. There is an issue, however, with the Pentium Extreme Edition that we’ve identified. The second computation thread gets executed on the logical HT thread rather than the 2nd core, so performance isn’t as good as it could be. This will be fixed in the next revision. This doesn’t effect [sic] the regular Pentium D. A workaround could include disabling HT on Pentium EE. There are 3 threads for primordia – 2 threads for computation, 1 thread for U/I.

The next two tests are only single-threaded, and they don’t make as good use of any of the CPUs here as they could if they were better optimized. The ScienceMark team has plans to incorporate linear algebra libraries from Intel and AMD in order to boost performance.

 

SiSoft Sandra
Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX and SSE/2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:

This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.

The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.

We’re using the 64-bit port of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations at once.
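The interlacing scheme from the FAQ is simple enough to sketch. This is my own illustration rather than Sandra's code, and it makes a few assumptions: columns are dealt out with a fixed stride (worker 0 takes columns 0, 4, 8 and so on with four workers), the function names and coordinate ranges are made up, and no MMX or SSE2 is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def mandelbrot_iterations(c, max_iter=255):
    """Count iterations of z = z*z + c before |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render_columns(worker, n_workers, width, height, max_iter):
    """Compute every n_workers-th column, starting at this worker's index."""
    columns = {}
    for x in range(worker, width, n_workers):
        real = -2.0 + 3.0 * x / width
        columns[x] = [
            mandelbrot_iterations(complex(real, -1.0 + 2.0 * y / height), max_iter)
            for y in range(height)
        ]
    return columns


def render(width, height, n_workers, max_iter=255):
    """Render the whole image as a list of columns, one thread per worker."""
    image = [None] * width
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(render_columns, w, n_workers, width, height, max_iter)
            for w in range(n_workers)
        ]
        for future in futures:
            for x, column in future.result().items():
                image[x] = column
    return image


if __name__ == "__main__":
    picture = render(64, 48, n_workers=4)
    print(len(picture), "columns of", len(picture[0]), "pixels")
```

Interleaving columns spreads the expensive pixels near the middle of the fractal across all of the workers, which keeps the load fairly even. Note that threads in standard Python will not actually run this math in parallel, so the sketch shows how the work is divided, not the speedup a native multithreaded program would see.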

The Dual Opteron 275s scale very nicely in this test, showing off the incredible peak performance possible with four CPU cores working together. Among the single-socket dual-core processors, though, the Pentium XE 840 crushes the Opteron 175.

Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance. However, the versions of Sphinx we’re using are only single-threaded.

The dual-core Opterons perform quite a bit better here than the Opteron 148 and 248 do, possibly because of the enhancements AMD has made to its memory controller. Overall, however, CPUs geared more for linear performance triumph in this single-threaded test, as the Pentium 4 660 at 3.8GHz takes the top spot, followed by the Opteron 152.

 

picCOLOR
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA.

At our request, Dr. Müller added larger image sizes to this latest build of picCOLOR. We were concerned that the thread creation overhead on the test’s rather small default image size would overshadow the benefits of threading. Dr. Müller has also made picCOLOR’s multithreading more extensive. Eight of the 12 functions in the test are now multithreaded.

Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.

With a max of two threads, the Opteron 252 comes out on top, followed by the dual-core Opterons. This appears to be another case where Hyper-Threading confusion causes the Pentium XE 840 to stumble. Perhaps the latest version of picCOLOR, which incorporates four threads for some functions, could take better advantage of the XE 840 and the dual Opteron 275s.

 

Gaming performance
Purists, look away! We’re running games on the quad Opteron workstation box. Please, just flip ahead a couple of pages before you burst into flames.

Doom 3
We tested performance by playing back a custom-recorded demo that should be fairly representative of most of the single-player gameplay in Doom 3.

Far Cry
Our Far Cry demo takes place on the Pier level, in one of those massive, open outdoor areas so common in this game. Vegetation is dense, and view distances can be very long.

Unreal Tournament 2004
Our UT2004 demo shows yours truly putting the smack down on some bots in an Onslaught game.

Should you wish to run video games on an Opteron 175 or *cough* something like it, it would serve that purpose quite well, based on these results.

 

3DMark05

3DMark05’s overall score is utterly bottlenecked by the graphics card, but its CPU score is not. In fact, the CPU tests include an element of multithreading, and test two especially likes the dual Opteron 275s’ four cores.

 

Power consumption
We measured the power consumption of our entire test systems, except for the monitor, at the wall outlet using a Watts Up PRO watt meter. The test rigs were all equipped with OCZ PowerStream 520W power supply units. The idle results were measured at the Windows desktop, and we used SMPOV and the 64-bit version of the POV-Ray renderer to load up the CPUs. In all cases, we asked SMPOV to use the same number of threads as there were CPU front ends in Task Manager: four for the dual Opteron 275, four for the Pentium XE 840, two for the Opteron 175, and so on.

The graphs below have results for “power management” and “no power management.” That deserves some explanation. By “power management,” we mean SpeedStep or PowerNow. (In the case of the Pentium 4 600-series processors, the C1E halt state is always available, even in the “no power management” tests.) Sadly, the beta BIOS we used for our Tyan S2895 motherboard didn’t support AMD’s PowerNow, so we couldn’t report scores for the Opterons with power management enabled.

At idle and under load, AMD appears to have delivered on its promise: the dual-core Opteron 175 actually consumes less power than the Opteron 152. Since power consumption is pretty directly related to heat output, AMD appears to have hit both its power and heat targets for the dual-core parts. Even the quad-core Opteron 275 system consumes substantially less power under load than the dual Xeon rig or the Pentium XE 840.

Incidentally, simply by turning off Hyper-Threading, the Pentium XE’s power consumption under load drops from 313W to 292W.

Here’s something kind of interesting. Since we’re dealing with dual-socket systems, we can calculate the power consumption delta when going from single to dual CPUs. That lets us isolate CPU power consumption from overall system consumption—at least in theory, I think, and don’t sue me please.
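For anyone who wants to try the same arithmetic at home, here is a minimal Python sketch of the delta calculation. The function name, the optional PSU efficiency knob, and the wattage figures are all hypothetical placeholders of ours, not numbers from our test rigs.

```python
def per_cpu_delta(single_cpu_watts, dual_cpu_watts, psu_efficiency=1.0):
    """Estimate one CPU's draw from two wall-outlet readings.

    psu_efficiency is a hypothetical knob: 1.0 reports the delta as seen at
    the wall; a value such as 0.8 scales it down to a DC-side estimate.
    """
    if not 0.0 < psu_efficiency <= 1.0:
        raise ValueError("psu_efficiency must be in (0, 1]")
    if dual_cpu_watts < single_cpu_watts:
        raise ValueError("dual-CPU reading is below the single-CPU reading")
    return (dual_cpu_watts - single_cpu_watts) * psu_efficiency


# Placeholder wall readings in watts, NOT measurements from this review.
single_cpu_system = 180.0
dual_cpu_system = 250.0

print(per_cpu_delta(single_cpu_system, dual_cpu_system))
print(per_cpu_delta(single_cpu_system, dual_cpu_system, psu_efficiency=0.8))
```

Keep in mind that the delta picks up anything else that changes when the second socket is populated, so it is best read as an upper bound on the CPU's own draw.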

Again, the Opteron 175 is well within the power envelope established by its predecessors.

 
Conclusions
AMD’s dual-core Opteron processors are extremely well executed on all fronts, based on what we’ve seen. AMD’s dual-core design has a technical elegance that Intel’s can’t match, and that design brings superior performance. One Opteron 175 performs slightly better than a pair of Opteron 248s running at the same clock speed, and it does so while consuming less power than a single-core Opteron 152. All in all, very impressive.

Going to a dual-core Opteron does, however, involve some tradeoffs. Fundamentally, one is giving up single-threaded performance in order to gain multithreaded performance. Whether or not this tradeoff makes sense will depend on the kind of applications one plans to run on the system. Many of our benchmarks were multithreaded, but only made use of two threads, leaving the dual Opteron 275 system looking a little pointless. The Opteron 252 system outperformed it in many of these dual-threaded apps, like media encoding. Our other tests, however, showed the Opteron 275s to be an absolute rendering powerhouse. Which processor is the better buy will depend greatly on its intended use.
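A back-of-the-envelope way to see that tradeoff is to count how many gigahertz of core time an application can actually keep busy. The Python sketch below is a deliberately crude model of our own devising: it assumes perfect scaling up to the number of cores and ignores memory, cache, and per-clock differences. The 2.6GHz figure for the Opteron 252 comes from this review, and the 2.2GHz figure for the Opteron 275 is its rated clock speed.

```python
def ideal_throughput(clock_ghz, cores, threads):
    """Aggregate GHz kept busy, assuming perfect scaling up to `cores`."""
    return clock_ghz * min(cores, threads)


# (clock in GHz, total cores) for two dual-socket configurations.
systems = {
    "dual Opteron 252 (2 cores @ 2.6GHz)": (2.6, 2),
    "dual Opteron 275 (4 cores @ 2.2GHz)": (2.2, 4),
}

for threads in (1, 2, 4):
    for name, (clock, cores) in systems.items():
        busy = ideal_throughput(clock, cores, threads)
        print(f"{threads} thread(s): {name}: {busy:.1f} GHz busy")
```

With one or two threads, the higher-clocked 252s come out ahead in this model, and the 275s only pull away once an application can feed all four cores. That matches the general shape of our results, though the real numbers depend on far more than clock speed.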

The rough part of the story is that AMD isn’t asking customers to choose between an Opteron 252 and a 275, which could be a tough choice for many workstation users. They’ve priced the Opteron 265, which runs at only 1.8GHz, right on top of the Opteron 252 at 2.6GHz. That forces one to choose: are you really committed to the idea of dual-core processors or not? For systems that already have two CPU sockets, I’m not sure what I’d choose without knowing the specific types of applications involved. The move to dual-core CPUs effectively ups the ante on thread-level parallelism in workstations, and some classes of applications will benefit from that effect more dramatically, and immediately, than others.

I do think that the answer for single-socket workstations is probably rather straightforward: I’d pick the dual-core Opteron over the single for the same reasons that most workstations have traditionally had multiple processors. Not only will the dual-core CPU bring better multitasking responsiveness, but it will also work well in dual-threaded applications, which are fairly common. In the server space, the choice to opt for dual-core chips for web, database, and terminal servers will also likely be rather easy, given the highly threaded nature of such roles.

There are a number of other subplots in our benchmark results, as we discussed earlier, and I won’t attempt to address them all here. You’ve seen the results for yourself. We will probably be sorting through some of the more profound questions about the benefits and limitations of thread-level parallelism for years to come.

Comments closed
    • thecatman
    • 14 years ago

    Impressive!
    some questions:
    which power supply do you use in this test? (model)
    all the tests have been made with the sli activated?
    which cpu cooler do you use in this test? (model)
    where I can steal a pair of opteron dual core?
    thanks!

    • Hector
    • 15 years ago

    Proesterchen,

    I can’t help if you’re clinically retared and can’t understand it. Are you the fat rich kid..too much drugs in HS?

      • Proesterchen
      • 15 years ago

      Yeah, sure, I’m rich, fat and retarded beyond needing to be institutionalized. In fact, this is my overpaid personal nurse typing this post, interpreting my inconceivable babblings into what you probably regard as particularly useless words.

      Anyways, IIRC (alas, your post is gone), you compared AMD to a newcomer, someone starting out fresh. Please remember that AMD will be 36 years old this year; it’s just a few months younger than Intel. Fact also is, it went nowhere in that time from an investor’s POV.

      Mind you, AMD managed to make technically sound products for the past few years, but as a company, it’s successful only by the widest of definitions of the word. (i.e. it managed to survive the past 35 years)

    • Hector
    • 15 years ago

    Naktor,

    All those billions didn’t help Intel in R&D, did it? AMD is hungry, like those immigrants you hear come here with $90 in their pockets and five years later are millionaires. Intel OTOH is the fat rich kid, lazy, getting fatter, will never have to work a day in his life due to the fortune his parents built for him but whose fortune is getting watered down every year.. Who knows? In 35 years from now the immigrant could surpass Richy Rich. Like AMD could do to Intel.

      • Proesterchen
      • 15 years ago

      What a fucked-up analogy. Wow.

    • Hector
    • 15 years ago

    Scott,

    Why do you review these AMD Opterons with crippled 2T command and slow timings (3-3-3)? It wouldn’t be so bad if you didn’t use uber Corsair XMS2 with tight timings (3-2-2) on the Intel setup, but as it is it’s severe enough to warrant comment. A64s and Opterons suffer huge performance hits when running at the 2T command rate shown all over the net, about 4% slower. And running those loose timings is another, at least, 5% hit over 2-2-2.. Making a total loss of performance somewhere in the neighborhood of 9%. I’m really surprised AMD did as well as they did crippled like this. But for a more accurate representation of performance you may want to get some real ram when testing AMD’s 64 bit chips in the future.

    This is even more true when finally getting a real live X2 since using ECC is good for another 3% penalty on top of the 9% due to additional wait state.. I would estimate a real 4400+, with proper ram (2-2-2-7 1T), will be about 10-15% faster than the benchmarks shown for the Opteron 175 here.

      • Damage
      • 15 years ago

      The Tyan boards are pretty much never stable with tight memory timings. Tyan has only validated a few DIMMs for this board, and they all run at CAS 3/2T timings. That’s pretty much the way it goes with Opterons.

      As for the Athlon 64 X2, it should indeed be a little bit faster in memory-bound benchmarks and synthetic memory tests. We will review it properly, with non-registered DIMMs and the like, when the time comes.

    • endersdouble
    • 15 years ago

    OK, just a wee bit confused here–great article by the way, not confusing at all and a lot of info. Just want some info about purchasing…
    From what I understand, Real Soon Now the A-64x2s will come out. These are socket 939 dual core processors?
    I soon plan to upgrade to an A-64 system, probably a 3500. Now, when the x2’s come out, will I be able to just drop that into a random socket 939 board and get dual-core goodness?

    Also, Damage, an idea–have you ever considered, in your tables of benchmarking systems, throwing a price on each? That is, for each benching system, grab a Priceline/Newegg price for all the gear, add together. That way, we not only know that the dual Opteron 265s are 25% faster than the single 175, but also that it costs $2500 as opposed to $1650? (s/fake numbers/real numbers, of course.)

      • NegativeEntropy
      • 15 years ago

      Any S939 board that supports the newest 90nm A64s should support dual core with a bios update.

      I doubt we’ll see significant retail availability of A64 X2 processors before this fall. The large OEMs will likely consume the few available once they’re “launched” in June.

      AMD is low on fab space until the end of the year, when the new fab opens. That is one reason they chose to concentrate dual core (which lowers the number of processors per wafer) on servers/workstations: lower volume.

    • Prototyped
    • 15 years ago

    Oh my, excellent review.

    I doubt that most of the gamers who visit this site (except for the Folders) do much that steps into the realm of multiple computation-intensive threads, so I have grave doubts that even the desktop processors (Athlon 64 X2, Pentium D) will make a big difference to the performance most gamers care about. (Witness the gaming benchmarks — no performance advantage to dual-core over single-core or dual-processor over single-processor; the 152 and 252 win all.)

    Fortunately, not everyone who reads the site falls along the gamer/surfer/media-peruser-only stereotype. 😉

    • stmok
    • 15 years ago

    Athlon64 X2
    Dual 2.4GHz, 1MB cache per core
    US$1,001
    Dual 2.4GHz, 512KB cache per core
    US$803
    Dual 2.2GHz, 1MB cache per core
    US$581
    Dual 2.2GHz, 512KB cache per core
    US$537

    Pentium-D
    Dual 3.2GHz, 1MB cache per core
    US$528
    Dual 3.0GHz, 1MB cache per core
    US$314
    Dual 2.8GHz, 1MB cache per core
    US$240

    I wonder how well the 2.8GHz parts overclock?
    Maybe a “budget” dual-core setup using these CPUs?

      • Krogoth
      • 15 years ago

      I doubt Dual-Cores are going to OC well for either party. You are dealing with twice the variables (two cores, double the thermal output, voltage requirements etc.)

      • Zenith
      • 15 years ago

      Overclocking those is Suicide. They are hot. Real hot. HOT HOT.

    • Ragnar Dan
    • 15 years ago

    This link on page 4 doesn’t work: https://techreport.com/reviews/2005q2/opteron-x75/trdemo3.zip

    • Craig P.
    • 15 years ago

    There are multiprocessor MD codes floating around in academia — there are guys in my research group who have done 16-processor runs on BoB (though I only see one on right now…).

    http://bob.nd.edu

    • Stranger
    • 15 years ago

    so damage when are you going to jump on the TR UT server?

    • Zenith
    • 15 years ago

    …SSE3 talk all around. Have none of you seen the Venice Vs Winchester benchmarks at the same frequencies? Venice walks away with Winchester placed very flat under Venice’s foot.

    The reason the new dual cores chips beat their single core brethren is because the Venice core is faster than Newcastle/Winchester (Or Sledgehammer and Clawhammer.) Although, the name for the Opteron Venice dual cores are Denmark, Italy, and Egypt (Venice 90nm SSE3 1xx, 2xx, 8xx)

    And there’s Venus, Troy, and Athens (Venice 90nm SSE3 single core 1xx, 2xx, and 8xx)

    This is just collective information I’ve found from roadmaps and such, please correct me if I’m out of date on this stuff…I love to have my stuffs straight.

      • UberGerbil
      • 15 years ago

      Venice only offers a few percent improvement over Newcastle, and even less over Winchester:
      http://www.xbitlabs.com/articles/cpu/display/athlon64-venice_14.html The advantage is real, but I wouldn't say it walks all over Winchester.

        • Zenith
        • 15 years ago

        Considering there is no cache size or frequency improvement, and no big drawbacks, I’d call that significant. That and SSE3, when supported, helps too.

        Besides, you can technically get a higher % increase when you take the sum result of two venice cores over two winchester. Sorta. 🙂

      • Krogoth
      • 15 years ago

      SSE3 isn’t really the reason why the Venices and other E4 based K8s outperform their predecessors. AMD did further core tweaks to squeeze out a little more IPC.

        • Zenith
        • 15 years ago

        …That’s exactly my point…did you miss my entire post? The Venice core has a tweaked memory controller.

        The whole point of my post was: The lead in the 175/275 over single core duallies isn’t HT, isn’t SSE3, it’s the Venice core tweaks….

        That was the whole point of the post….

          • Krogoth
          • 15 years ago

          Technically, the dual-core Opterons are based off the Venus, Troy and Athens cores. The closest A64 counterpart is the San Diegos, which are basically a Venice with double the L2 cache.

          I just helped clarify to the others why the Venices have slightly better IPC performance than Winchesters and Newcastles.

            • Zenith
            • 15 years ago

            Well, your intentions are good. Yes, it’s because of the tweaked Memory controller, or so AMD says.

            As for core names…I covered all the core names in my post #82, except San Diego which is yes, a Venice with 1MB L2.

    • nihilistcanada
    • 15 years ago

    The real question is how creamy smooth it is. Are we talking Half and Half or Double Devon creme smooth?

    • ionpro
    • 15 years ago

    Re: The power readings… I know you said not to give you a hard time, Damage, but I do think that measuring the power deltas is a little unfair. The Opterons have a memory controller on board, and because of the NUMA architecture both have fully-powered banks of registered memory. I think that pretty vastly inflates the power deltas for the processors themselves, vs. the duallie Xeons where that isn’t an issue.

      • Damage
      • 15 years ago

      Yeah, and the PSU isn’t 100% efficient, either, so let’s just give up. Bust out the sackcloth and ashes!

    • indeego
    • 15 years ago

    I think Anand’s conclusions are a tad more poignant: "Despite AMD's lead in getting dual core server/workstation CPUs out to market, Intel has very little reason to worry from a market penetration standpoint. We've seen that even with a multi-year performance advantage, it is very tough for AMD to steal any significant business away from Intel, and we expect that the same will continue to be the case with the dual core Opteron. It's unfortunate for AMD that all of their hard work will amount to very little compared to what Intel is able to ship, but that has always been reality when it comes to the AMD/Intel competition." While the tech is fantastic.... I don't think we'll see much adoption outside companies aiming for the very best bang/buck. everyone else will stick with Dell/nobodyevergotfiredforbuyingintel.

      • Krogoth
      • 15 years ago

      I beg to differ for the Server/Workstation market. In that segment time is truly $$$$, and not even Intel’s marketing monster is going to save the obviously inferior Xeon MPs. Stability is a non-issue for either platform; both have proven themselves to be reliable and robust.

      Intel isn’t really worried, because they still have the mainstream market by its nuts. That’s where the big “$$$$” are really made. They have the fab capacity to handle the yield hits of the lower-binned Smithfield cores.

      • UberGerbil
      • 15 years ago

      Yes and no. AMD has an uphill fight, no question about it, and is at an enormous business/financial disadvantage despite its technical virtues. However, there has been some progress since they shifted their emphasis away from desktops and towards servers (and mobile, but there hasn’t been as much progress there). They have HP and Sun on board as server OEMs, and that’s not insignificant. HP, although they’re subject to executive suite turmoil and confusion, is a particularly impressive cheerleader because they also develop and sell Itanium and Xeon servers and yet recommend Opteron-based Proliant systems (in particular, they’re on record as suggesting the 4P Opteron Proliant offers better performance than a 4P Xeon — and this was before dual core or 64bit). A solid foothold in the (profitable) server market should eventually pay dividends in terms of OEM wins on the desktop, and keep AMD financially alive in the meantime; even if that never happens, the “small server” market is growing and already large enough to sustain AMD indefinitely. Also, when Fab36 comes on line towards the end of the year their unit cost should drop, enhancing profitability, and they should no longer be fab-constrained, which should also help with OEM wins for larger-volume products (such as desktops). If they’re not fab-constrained Dell loses one of their objections too, though I wouldn’t expect that to change minds in Round Rock in itself.

      Intel is huge, isn’t going away, and is dangerous even when it makes mistakes. They appear to have the train back on the track with the 65nm transition, so AMD has as big a task as it ever had. But AMD has a sensible plan and appears to be executing just about perfectly. You can’t ask for more than that.

      • FireGryphon
      • 15 years ago

      I relish TR reviews for their focus on the technology, not some wishy-washy industry commentary. The moment you start stepping into market analysis and other areas, you lose the focus on tech, and that’s what I’m here for.

      • WaltC
      • 15 years ago

      People often forget that Intel’s biggest (only?) advantage over AMD is mere FAB capacity. As AMD grows and brings its newer FAB facilities online this will be an ever-shrinking advantage for Intel. Even with its present FAB capacities, AMD already dominates several distinct market segments–such as the 3d-gaming/enthusiast market sector, for instance. It’s AMD’s technological advances that have kept the company growing and prospering, and like others have said any amateur “industry analysis” between AMD and Intel which assumes Intel’s current position is unassailable is flawed. Contrary to some popular opinion, Intel is not an elemental force of nature…;)

        • droopy1592
        • 15 years ago

        It’s a matter of a few years before Intel is the budget chip provider.

          • WaltC
          • 15 years ago

          Heh-Heh…;) IIRC, Intel has always been a “budget” chip supplier. Remember the original cacheless Celerons? They were the forerunner of the “modern Celeron” and were intended to drive AMD out of the business. Intel never saw AMD’s attack on the high-end of the x86 cpu market coming until it was far too late to do anything about it. IMO, today I see no difference at all between AMD and Intel in terms of which is a “budget” chip supplier. Those days ended forever for AMD with the K7 launch in 1999.

    • Division 3
    • 15 years ago

    *ponders*

    Power consumption is pretty high but…..all those things aside. Imagine dualcore in a laptop..?

      • Fx-Overlord
      • 15 years ago

      Really?

      I’m really impressed with the power consumption actually when you consider the 148 consumes more power than the 175 at load and is almost the same idling. And i can get 4 cores under load consuming less than 2 Intel cores 🙂

      • Illissius
      • 15 years ago

      Actually… considering that AMD has managed to make these consume *less* power than their single core brethren, which I wouldn’t have thought possible, their plans to put them in a notebook begin to make more sense. If they can get a single core A64 down to 30-ish watts, they can probably do the same for a dually.

      • Division 3
      • 15 years ago

      Actually, that was geared towards power consumption in the laptop with dualcore…

    • Flowboy
    • 15 years ago

    I was in love with the 170 and 175 as an upgrade for my 146 until I saw the pricing.

    Opteron 148 (2.2 GHz): $278
    Opteron 248 (2.2GHz): $455
    Opteron 175 (2.2GHz): $999

    The 4400 X2 at 2.2GHz with 1MB of cache costs $581, and I would imagine performs the same as the 175.

    That $418 price difference will more than pay for a 939 mobo.

      • Mr Bill
      • 15 years ago

      Do they have one-way dual core 2.0GHz 170 or 1.8GHz 165?

      • wierdo
      • 15 years ago

      for now.

    • Mr Bill
    • 15 years ago

    An excellent and thoughtful writeup! You have really done an outstanding job this time. I have two observations about the ScienceMark single precision Blas SGEMM contrasted with the double precision Blas DGEMM on page 11.

    I’m supposing the Opterons do so well in double precision versus single precision because of the difference between the efficiency of a true 64-bit core implementation versus a (very quick!) 32-bit core. The DGEMM/SGEMM ratio for the Pentium XE 840 is 0.819 (i.e. it’s slower in double precision). The Opteron 175 DGEMM/SGEMM ratio is 3.36. Wow! It’s over 3 times the mflops in double precision. I’d like to see some benchmarks that benefit from or require use of double precision math.

    A secondary speedup effect seems to be there due to the dualcore design. Judging from DGEMM results (148 versus 248) versus (175 versus 275), that must be the memory performance difference we saw on page 5. It’s really impressive, well done AMD.

    • d0g_p00p
    • 15 years ago

    Typo or graphic error on page 8. You write:

    ” where our previous champ had scored 689. That champ, by the way, was a system with dual 3.2GHz Xeons based on the older “Prestonia” core”

    The graphs show the Opt 252’s clocking that 689 score. So who is it, Xeons or Opts being the older champ?

      • Damage
      • 15 years ago

      Click the link in that passage. 😉

        • d0g_p00p
        • 15 years ago

        edit: Ahh… I see.

    • gubbar924
    • 15 years ago

    thought I’d share…

    http://www.theinquirer.net/?article=22711

    • apopilot
    • 15 years ago

    Should I wait to purchase an AMD 3500+. I have my eye on the X2 4400+. My question is will this work with games? Will games take advantage of the dual cores?

      • derFunkenstein
      • 15 years ago

      if you saw the article, the answer is in the UT2k4, Doom III, etc benchmarks…and the answer is right now, no. But if this catches on, expect that to change. No reason not to spin off different threads for sound, AI, etc.

        • UberGerbil
        • 15 years ago

        Yeah, no reason… except for the added complexity in the design and coding, the extra debugging time, and the resulting impact on schedule and ship dates. They’re going to do it, I agree, but let’s not trivialize the work involved. If just ‘spinning off threads’ was that easy they’d already be doing it.

        One thing none of these game benchmarks test is netplay: DirectPlay is multithreaded, and even games that don’t use that part of DirectX may see some benefit from having another core available to take on a bit of the networking overhead. Whether that will be perceptible to the gamer is an interesting question.

    • leor
    • 15 years ago

    now here’s the question:

    can you pair a 275 with a 248 and have a 3 core system?

      • derFunkenstein
      • 15 years ago

      you’d have to have a new acronym, as it’s no longer Symmetric as in SMP.

        • Vertigo
        • 15 years ago

        AMP has a nice ring to it…

        • UberGerbil
        • 15 years ago

        Yeah it is, at least enough to still call it SMP. The three cores all still have identical…

      • UberGerbil
      • 15 years ago

      Damage: I would like to know the answer to this question. Just out of pure primate curiosity.

        • Damage
        • 15 years ago

        I will let you know as soon as I’m ready to possibly destroy a $2649 processor. 🙂

          • UberGerbil
          • 15 years ago

          Heh. I didn’t pick your nick. I just expect you to live up to it. 🙂

          In any case, I have a word for you: warranty. At the very least, the next time you’re talking to somebody from AMD or one of the mobo vendors, tell them “So I’m thinking of mixing a single core and a dual core in my two-socket mobo…” and see if they spit out their coffee.

            • Damage
            • 15 years ago

            Haha. Well, damage is kind of entertaining until you hit the $2500 threshold.

            • UberGerbil
            • 15 years ago

            But even that could be educational: when you fry a dual core, do you get two independent little puffs of magic smoke, or only one? 😉

            Please have a camera running when you do this…

            Seriously, I doubt it will cause any permanent damage. Very possibly the bios will be…

    • Samlind
    • 15 years ago

    Now I wonder whether Dell, who wants to dominate the server market, is going to tolerate losing market share to everyone else. Supermicro has said they are going AMD, so now Dell is alone, and is going to be stuck with a decidedly inferior product mix.

    I know that no server apps were tested, but the results speak for themselves, and those results would indicate that in the highly threaded server usage AMD’s new baby is going to kick major butt.

    • Krogoth
    • 15 years ago

    AMD definitely has a winner in the Server/Workstation market. The amount of processing power that a dual-core, dual-chip Opteron has is downright wicked! Intel’s quad-chip Xeons and their semi-proprietary motherboards are more expensive, have inferior performance and consume far more power.

    Fellow enthusiasts, don’t despair: these monsters are priced and targeted for the Server/Workstation market. They can handle the $1,000+ price tag per chip.

    Wait for the upcoming A64-X2s to come out, which are still only going to be affordable to the power user, enthusiast segment for a while. (My estimate is around $500-800)

    • Steel
    • 15 years ago

    Nice. Very nice. 🙂
    Now to wait until the price drops enough to upgrade my dual Opteron system for less than a grand.

    • gavinjcd
    • 15 years ago

    Wow Intel really got raped there.

    AMD are really starting to make companies like DELL pretty stupid for not carrying the opteron/Athlon 64. With scores like that they have to carry it. It is “THE FASTEST”, there is no denying it anymore.

    You can see that AMD’s dual core design really shines when the 175 can beat out its dual processor counterpart, very very good.

    Gav

      • stmok
      • 15 years ago

      I hear Supermicro (an Intel focused mobo maker) is gonna make mobos for Opterons (dual-core).

    • KyleFL
    • 15 years ago

    …

      • Kurlon
      • 15 years ago

      Actually, I think the special HT control instructions are part of the SSE3 bundle.

      • Ryszard
      • 15 years ago

      The x75 Opterons advertise themselves as HT processors, but don’t implement HyperThreading. So there’s only two cores.

      SSE3 also has a pair of HyperThreading-related instructions.

        • Fx-Overlord
        • 15 years ago

        Yeah, I think the Hyper-Threading bit is set so that programs that are HT-enabled, and only check for the HT bit before spawning extra threads, will see HT enabled on the X2 and spawn extra threads, even though it doesn’t really have HT.

          • kyaku00x
          • 15 years ago

          isn’t there an E4 stepping of the 248 that has SSE3? which stepping was the 248 in the review?

    • wagsbags
    • 15 years ago

    Think the athlon version will drop to about $150 in a year so I can get one when I build a new comp?

    • Randy2
    • 15 years ago

    Another kick in the pants for Intel !

    Nicely done article – what does the device manager report for processors? 2 or 4?

    Edit: 2 or 4 on the dual proc board.

      • mattsteg
      • 15 years ago

      I seem to recall pictures of quad-cpu task manager graphs paired with every benchmark.

        • Steel
        • 15 years ago

        The *[

          • mattsteg
          • 15 years ago

          Well, XP does know that there are 2 physical processors. Not sure how it represents that in device manager, but I don’t really see it as a big deal either.

            • Randy2
            • 15 years ago

            It’s not really a big deal, and yes, I saw the 4 task manager graphs. I was just curious though.

            • Norphy
            • 15 years ago

            On Ye Olde P4C with HT enabled, Windows reports 2 CPUs in Device Manager. I’d be surprised if it didn’t report four CPUs for the quad core system.

    • highlandr
    • 15 years ago

    Posting in support of the article before reading!

    [Do Not Disturb, reading about dually cores]

    • Pettytheft
    • 15 years ago

    What’s the pricing on these? Did I miss it in the review?

      • tfp
      • 15 years ago


        • Ricardo Dawkins
        • 15 years ago

        yep.. not cheap… and I don’t have money for these CPUs. Give me a dual Sempron – Celeron D and count me in. At least the Smithfields aren’t so expensive… damn

          • tfp
          • 15 years ago

          I was thinking the same thing. I’m not sure the performance justifies the price, at least not for me. However, this is aimed at the workstation/server market, so in many cases it is probably worth it for them.

            • derFunkenstein
            • 15 years ago

            I expect the Opteron 175 to slide in substantially cheaper than the 275. I also expect it to be cheaper than a pair of Opteron 248’s. Also, single-socket 940 boards are cheaper than duals, so if you just want something for dual-threaded apps, the Opteron 175 could be a tasty treat. I would hope that a 939 version of a dual-core Athlon 64 would be cheaper still, and 939 boards are already pretty cheap. Now you’re looking at (guessing) maybe a price of $450 for a CPU and $90 for a board. That could be pretty nice.

    • eitje
    • 15 years ago

    i think i just lost a little bowel control.

    i wonder if i can convince my boss that i need to get my dual-Xeon workstation upgraded. 🙂

    ps – i doubt he’ll go with the “i need a whole new computer” argument. 😉

      • Randy2
      • 15 years ago

      He won’t if you keep crapping your pants and reading TR instead of working !

        • eitje
        • 15 years ago

        well, i can’t explain the first part, but i need to do the second to… stay on top of hardware trends. yeah.

    • BigMadDrongo
    • 15 years ago

    Great review as always… my Envy Meter has gone right up, again. I *must* have four processor cores in my home PC…

    Unless AMD’s model numbering has confused me (which is possible), isn’t there a typo on page 3? Rather than “175 vs 252” I think you mean “175 vs 152”.

      • wagsbags
      • 15 years ago

      Unless I’m mistaken 175 is a dual core and 252 is two single cores. These run at the same speed so it’s a direct comparison whether it’s better to have two separate procs or one dual core.

        • leor
        • 15 years ago

        the 175 runs at 2.2 ghz and the 252 runs at 2.6, so they are not at the same speed at all.

        the apples to apples speed comparison would be the 175 to the 248.

    • LicketySplit
    • 15 years ago

    Another nice review as usual Mr. Wasson…pretty impressive.

    • Ruiner
    • 15 years ago

    God those model numbers make reading the charts hell.

    Any overclocking, wink wink?

    • IntelMole
    • 15 years ago

    How long till I can get a quad core laptop to replace the single core one I’ve got at the moment? 😛

    That thing would be a beast, but don’t put it on your lap :-D,
    -Mole

      • wierdo
      • 15 years ago

      your lap should also be safe since it runs cooler now too lol.

      • UberGerbil
      • 15 years ago

      I don’t know that any laptops beyond weirdo niche designs will ever adopt more than one processor socket. It’s hard to cram everything needed for even one socket into that form factor. If you’re talking quad-core CPUs, we’re probably 18 months away from even talking about that, even for desktops, and laptops will trail further behind. Yonah (dual core Pentium M) is due around the end of the year depending on how the shrink to 65nm goes for Intel.

        • IntelMole
        • 15 years ago

        heh thought as much.

        It’s just that lugging a system to and from uni (even a SFF) will be a PITA.

        Even that will be behind on quad cores I guess,
        -Mole

    • stmok
    • 15 years ago

    I had contemplated whether to get a full-blown dual CPU Opteron mobo to update various PIII setups I have (I want to update 4 setups)…But forget that!

    If it’s a cheaper path overall with a single-socket, dual-core setup, I’m going for that! (since performance is about the same!)

    Opteron 175 seems to suit my requirements…Now to wait for the Athlon64 X2 variant. 🙂

    • FireGryphon
    • 15 years ago

    The amount of power in these systems is sick. I can’t wait for this to trickle down to a more affordable level.

    • Ryszard
    • 15 years ago

    Won’t a 2.2GHz 1MiB dual-core Athlon 64 X2 be the 4600+, rather than the 4400+? I base that on the 2.4GHz 1MiB X2 being the 4800+.

      • crose
      • 15 years ago

      Shouldn’t that be MB (MegaByte) and not MiB (Million Bytes)?

        • Ryszard
        • 15 years ago

        A MiB is a mebibyte, and the correct binary prefix for the number.

        http://physics.nist.gov/cuu/Units/binary.html
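For anyone who wants the arithmetic behind the two prefixes, here is a minimal sketch. The constant and function names are just illustrative; the values follow the SI definition (mega = 10^6) and the binary-prefix definition (mebi = 2^20).

```python
MB = 10**6    # megabyte, SI decimal prefix
MIB = 2**20   # mebibyte, binary prefix


def cache_bytes(size_mib: int) -> int:
    """Size in bytes of a cache quoted in MiB."""
    return size_mib * MIB


if __name__ == "__main__":
    one_mib = cache_bytes(1)
    print(f"1 MiB = {one_mib:,} bytes")
    print(f"1 MB  = {MB:,} bytes")
    print(f"difference = {one_mib - MB:,} bytes")
```

A 1MiB L2 cache is therefore 1,048,576 bytes, which is 48,576 bytes more than a decimal megabyte.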

          • crose
          • 15 years ago

          Cool. Haven’t seen that before, although it said something about how they started using it in 1998.

          • eitje
          • 15 years ago


          • IntelMole
          • 15 years ago

          I won’t be able to stop saying Gimli byte now.

          Ta very much :-P,
          -Mole

      • Fx-Overlord
      • 15 years ago

      4800+ is 2.4GHz and 1MB L2
      4600+ is 2.4GHz and 512KB L2
      4400+ is 2.2GHz and 1MB L2
      4200+ is 2.2GHz and 512KB L2

      Read the main page

        • Ryszard
        • 15 years ago

        Yeah, I noticed there were half-cache X2s after it was posted. I didn’t think there would be when I posted. Cheers.

    • Zenith
    • 15 years ago

    So THAT’S what you meant by Twins! BRAVO! *Reads*

    • Dposcorp
    • 15 years ago

    Nice seeing the normal single core dual Opterons doing well still.

    • dragmor
    • 15 years ago

    Impressive

    • droopy1592
    • 15 years ago

    AMD pwn3d themselves AND Intel. Damn, that dual core chip is nasty.

    • z00100
    • 15 years ago

    Good Lord!!!

    I need one of those desperately…………make it 2 of those……….

    mmmmmmmmmmm Folding…………mmmmmmmmmmmmm

      • drfish
      • 15 years ago

      Yes… Folding… Four clients on one system, each with its own core… *drool* How many PPD would that thing put out?

    • crose
    • 15 years ago

    Finally! Thank you. 😉
