AMD’s ‘Istanbul’ six-core Opteron processors

The recent advent of Intel’s “Nehalem” Xeons had a bit of an apocalyptic feeling to it, when one considered the implications for AMD. Despite strong showings from the past few generations of Xeons and some unfortunate problems for the first quad-core Opterons, Intel never really seemed to open up an insurmountable lead in the two-socket server and workstation spaces. The Opteron’s power efficiency was consistently strong, at least, and its outright performance wasn’t too far behind the curve. The Nehalem-based Xeons, though, reached dizzying new performance heights with comparatively modest power consumption. One was left to wonder how on earth AMD would respond.

Now we have an answer, and it’s an interesting one, to say the least. The newest Opteron, code-named Istanbul, packs not four but six cores on a single die, giving it a considerable boost in performance potential. Not only that, but it’s hitting the market early. AMD had originally planned to introduce this product in the October time frame, but the first spin of Istanbul silicon came back solid, so the firm pulled the launch forward into June. Even with the accelerated schedule, of course, Istanbul comes not a moment too soon, now that Nehalem Xeons are out in the wild. We’ve had a pair of Istanbul chips humming away in our labs for the past week. Let’s have a look at whether they can restore the Opteron’s competitiveness.

The hexapod cometh

In the wake of Intel’s introduction of a radically new platform, AMD is emphasizing buttoned-down continuity for its new Opterons. In fact, this continuity may be Istanbul’s defining feature. By and large, Istanbul is a quad-core “Shanghai” processor with two additional cores added to the die. Istanbul is compatible with the existing Socket F infrastructure, so it’s an easy drop-in upgrade for existing servers. So long as your Socket F motherboard supports dual power planes, all that’s required for an Istanbul upgrade is a quick BIOS flash and a chip swap. (In fact, that’s exactly how we prepared our test system for this review.) To get the six-core chips to fit into existing power envelopes, AMD has dialed back clock frequencies slightly, which is why the company cites a general performance boost of around 30% when going from a Shanghai Opteron to an Istanbul, depending, of course, on the workload.

Although AMD expresses hope of in-place server upgrades becoming a healthy portion of its business in a down economy, the more likely payoff for Istanbul is with AMD’s largest customers: system vendors, who ought to be able to refresh their Opteron-based product lineups with relatively minimal validation efforts. In fact, I’d expect to see quite a few vendors unveiling Istanbul-based systems in the coming weeks, starting today, even though they’ve just introduced new Xeon-based offerings, as well.

Istanbul looks like Shanghai plus two cores. Source: AMD.

Despite all of this sleepy talk about continuity, Istanbul does have a few new tricks up its sleeve. For one thing, the north bridge and HyperTransport clocks in Istanbul are decoupled, so higher HyperTransport frequencies are possible. The Opterons introduced today all have a HyperTransport clock of 2.4GHz, resulting in a 4.8 GT/s transaction rate. The north bridge clock, which also governs the speed of the L3 cache, runs at 2.2GHz.
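The jump from a 2.4GHz clock to a 4.8 GT/s rate follows from HyperTransport moving data on both edges of the clock, so the transfer rate is simply twice the link clock. Here is a minimal Python sketch of that arithmetic; the helper name is ours, purely for illustration.

```python
def ht_transfer_rate_gts(link_clock_ghz: float) -> float:
    """Return the transfer rate in GT/s for a double-data-rate link.

    Hypothetical helper for illustration: two transfers per clock cycle.
    """
    return 2 * link_clock_ghz


if __name__ == "__main__":
    # Link clocks that appear in this article's spec tables.
    for clock in (2.0, 2.2, 2.4):
        print(f"{clock:.1f}GHz link clock -> {ht_transfer_rate_gts(clock):.1f} GT/s")
```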

The most notable change, though, is probably the addition of a feature AMD calls HT Assist. HT Assist is essentially a probe filter intended to reduce the overhead required for the synchronization of cached data across CPUs in multiple sockets. HT Assist reserves space in each processor’s L3 cache, in which it stores an index of where that CPU’s cache lines are being used system-wide. The CPU then becomes “host” of the cache lines stored in its directory. If any CPU needs an update about a particular cache line, it will often know which CPU is the correct host to probe for that information. AMD says HT Assist can replace broadcast probe requests (sent to all sockets) with directed requests in 8 of 11 typical CPU-to-CPU transactions. This reduction in probe traffic can yield big gains in available system bandwidth, as we reported when we saw AMD demo a 4P system whose Stream bandwidth increased from roughly 25GB/s to 42GB/s with the addition of Istanbul processors with HT Assist.
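To make the probe-filter idea a bit more concrete, here is a toy Python model of a directory kept by a “host” socket. It is only a sketch of the general concept, not AMD’s implementation: the class and function names are hypothetical, the directory tracks a single holder per cache line, and no real coherence-protocol states are modeled. It simply counts how many probes a read would trigger with and without the filter.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeFilter:
    """Toy directory kept by one host socket (not AMD's actual design)."""
    num_sockets: int
    # Maps a cache-line address to the socket currently holding a copy.
    directory: dict = field(default_factory=dict)

    def record_fill(self, line: int, socket: int) -> None:
        """Note that `socket` now caches `line`."""
        self.directory[line] = socket

    def probes_needed(self, line: int, requester: int) -> int:
        """Count the probes the host must send for a read of `line`."""
        holder = self.directory.get(line)
        if holder is None or holder == requester:
            return 0  # no other socket caches the line
        return 1      # one directed probe to the known holder


def broadcast_probes(num_sockets: int) -> int:
    """Without a filter, every other socket gets probed."""
    return num_sockets - 1


if __name__ == "__main__":
    pf = ProbeFilter(num_sockets=4)
    pf.record_fill(line=0x40, socket=2)
    print("filtered, cached line:  ", pf.probes_needed(0x40, requester=0))
    print("filtered, uncached line:", pf.probes_needed(0x80, requester=0))
    print("broadcast, any line:    ", broadcast_probes(4))
```

In this simplified 4P model, the filtered case never sends more than one probe, while the broadcast case always sends three, which is the kind of traffic reduction the feature is after.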

Back then, AMD talked of user-configurable HT Assist index sizes that could be set in the BIOS. Since that time, the firm has instead settled on a static index size of 1MB, which it considers the best tradeoff between cache size and index granularity. To keep things simple, including Istanbul validation for system vendors, the index size will not be user-configurable. AMD has also decided not to enable HT Assist by default on 2P systems, because the reduction in probe traffic on a 2P box isn’t worth the loss of 1MB of L3 cache per processor. For what it’s worth, our 2P SuperMicro H8DMU+ motherboard does expose a BIOS option to enable this feature, and we found that enabling it produced no appreciable increase in Stream bandwidth.

The Istanbul Opteron die. Source: AMD.

Like Shanghai before it, Istanbul is produced by GlobalFoundries on its 45nm SOI fabrication process. Istanbul weighs in at 904 million transistors, and its six-core die is 346 mm². Compare that to Shanghai, which is 758 million transistors and 258 mm². Istanbul isn’t 50% larger by either count, although its core count is up from four to six, because a 6MB L3 cache occupies a large portion of both chips. Intel’s Nehalem Xeons, of course, are also 45nm chips, and have dimensions very similar to Shanghai, with roughly 751 million transistors in a 263 mm² die. In other words, even if AMD does match Nehalem with Istanbul, it will be doing so with a considerably larger chip.
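For those who want the exact proportions, a few lines of Python will work them out from the figures quoted above. By our arithmetic, the core count grows 50% while the transistor count grows about 19% and the die area about 34%. The dictionary keys and the helper name are just illustrative.

```python
# Figures quoted in the text above.
istanbul = {"cores": 6, "transistors_m": 904, "die_mm2": 346}
shanghai = {"cores": 4, "transistors_m": 758, "die_mm2": 258}


def growth(new: float, old: float) -> float:
    """Return the fractional increase from old to new."""
    return new / old - 1


for key in ("cores", "transistors_m", "die_mm2"):
    print(f"{key}: +{growth(istanbul[key], shanghai[key]):.0%}")
```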

The comparison to Nehalem is instructive for many reasons, not least of which is the very different approaches AMD and Intel have taken with their latest CPU architectures. From a certain way of looking at things, they reach similar destinations by different paths. Istanbul, of course, has six execution cores, each of which can issue three instructions per clock. Nehalem has four cores, but they are true four-issue cores, capable of issuing, executing, and retiring four instructions per clock. Chip wide, then, Istanbul can issue 18 instructions per clock, while Nehalem can issue 16, which is closer than one might think when just considering core counts. Also, thanks to simultaneous multithreading, Nehalem can track eight hardware threads, to Istanbul’s six, for greater thread-level parallelism. Perhaps most decisive for many of today’s workloads is the fact that Nehalem has three channels of DDR3 memory per socket, versus Istanbul’s two channels of DDR2. Despite its larger die size and higher core count, Istanbul isn’t necessarily far-and-away superior to Nehalem, even in theory.
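The chip-wide numbers are just products of the per-core figures. This small Python sketch lays out the comparison; the `Chip` class is a hypothetical container for the specs cited in the paragraph above, and peak issue width is of course a theoretical ceiling rather than a measure of real throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int
    issue_width: int       # instructions issued per core per clock
    threads_per_core: int  # 2 with simultaneous multithreading, else 1

    @property
    def chip_issue_width(self) -> int:
        """Peak instructions issued per clock across the whole chip."""
        return self.cores * self.issue_width

    @property
    def hardware_threads(self) -> int:
        """Hardware threads the chip can track at once."""
        return self.cores * self.threads_per_core


istanbul = Chip("Istanbul", cores=6, issue_width=3, threads_per_core=1)
nehalem = Chip("Nehalem", cores=4, issue_width=4, threads_per_core=2)

for chip in (istanbul, nehalem):
    print(f"{chip.name}: {chip.chip_issue_width} issue/clock, "
          f"{chip.hardware_threads} hardware threads")
```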

That’s the match-up in the 2P space, but 4P and better servers may be more hospitable ground for the time being. The Xeon 7400 series processors, better known as Dunnington, have six cores, but are based on Intel’s older microarchitecture. AMD expects Istanbul to give it a clear lead in this space, at least until Nehalem-EX arrives later this year with native octal cores and four memory channels per socket.

Pricing and availability

Istanbul Opterons will populate the new Opteron 2400 and 8400 series lineups, and their introduction brings with it some price reductions on existing Shanghai Opterons.

| Model | Cores | Clock speed | North bridge/L3 cache speed | HyperTransport speed | ACP | Price |
|---|---|---|---|---|---|---|
| Opteron 2435 | 6 | 2.6GHz | 2.2GHz | 2.4GHz | 75W | $989 |
| Opteron 2431 | 6 | 2.4GHz | 2.2GHz | 2.4GHz | 75W | $698 |
| Opteron 2427 | 6 | 2.2GHz | 2.2GHz | 2.4GHz | 75W | $455 |
| Opteron 2389 | 4 | 2.9GHz | 2.2GHz | 2.2GHz | 75W | $698 |
| Opteron 2387 | 4 | 2.8GHz | 2.2GHz | 2.2GHz | 75W | $523 |
| Opteron 2384 | 4 | 2.7GHz | 2.2GHz | 2.2GHz | 75W | $523 |
| Opteron 2382 | 4 | 2.6GHz | 2.2GHz | 2.2GHz | 75W | $316 |
| Opteron 2380 | 4 | 2.5GHz | 2.0GHz | 2.0GHz | 75W | $316 |
| Opteron 2378 | 4 | 2.4GHz | 2.0GHz | 2.0GHz | 75W | $174 |
| Opteron 2376 | 4 | 2.3GHz | 2.0GHz | 2.0GHz | 75W | $174 |

The three 2P versions of Istanbul run at 2.2, 2.4, and 2.6GHz, and all fit into AMD’s mainstream 75W ACP power envelope. AMD is quick to point out that its entire product lineup shares the same basic feature set—including cache sizes, memory speeds, and virtualization support—in contrast to the breathtaking variety of the Xeon 5500 series, which can be rather daunting to keep sorted.

One can see here how AMD intends for the quad- and six-core Opterons to coexist. The top Shanghai model, the 2389 at 2.9GHz, drops from $989 to $698 to make room for the 2.6GHz Istanbul. The other Shanghais tumble in reaction. At that same $698 mark is the Opteron 2431, a 2.4GHz Istanbul. So the customer is faced with a fairly straightforward choice between four cores at 2.9GHz or six cores at 2.4GHz for the same price. The 4P-and-greater Opteron 8000 series presents the same choice, with higher stakes.

| Model | Cores | Clock speed | North bridge/L3 cache speed | HyperTransport speed | ACP | Price |
|---|---|---|---|---|---|---|
| Opteron 8435 | 6 | 2.6GHz | 2.2GHz | 2.4GHz | 75W | $2,649 |
| Opteron 8431 | 6 | 2.4GHz | 2.2GHz | 2.4GHz | 75W | $2,149 |
| Opteron 8389 | 4 | 2.9GHz | 2.2GHz | 2.2GHz | 75W | $2,149 |
| Opteron 8387 | 4 | 2.8GHz | 2.2GHz | 2.2GHz | 75W | $1,865 |
| Opteron 8384 | 4 | 2.7GHz | 2.2GHz | 2.2GHz | 75W | $1,514 |
| Opteron 8382 | 4 | 2.6GHz | 2.2GHz | 2.2GHz | 75W | $1,165 |
| Opteron 8380 | 4 | 2.5GHz | 2.0GHz | 2.0GHz | 75W | $989 |
| Opteron 8378 | 4 | 2.4GHz | 2.0GHz | 2.0GHz | 75W | $873 |

The first wave of Istanbuls all occupy standard power envelopes, but the six-core chips will proliferate to the other Opteron power grades this summer. We expect to see an SE model (105W ACP) at 2.8GHz, an HE (55W ACP) at 2GHz, and an EE (40W ACP) at 1.9GHz.

We have in our labs a pair of Opteron 2435 processors, and we’ve selected as their most direct competition a pair of Xeon X5550s. These Nehalem CPUs have a core clock of 2.66GHz and a 6.4 GT/s QPI link, support DDR3 1333MHz memory, and list for $958.

The X5550 has a 95W TDP rating, but there is some dispute over whether AMD’s ACP and Intel’s TDP are truly comparable. AMD has its own TDP numbers for its processors—SE chips are 137W, standard ones are 115W, HE models are 79W, and EE models are 60W—but it claims those numbers are more of an absolute peak than Intel’s. Hence the development of its ACP metric. We’ll measure power ourselves shortly, so I wouldn’t get too hung up on that issue.

After this, there’s that and the other

AMD has already outlined its plans for the next little while, including the introduction of the Socket F-compatible Fiorano platform later this year, an all-AMD effort that will bring PCIe Gen2 and HyperTransport 3 support (for the chipset link, not just CPU-to-CPU links like now), along with hardware support for I/O virtualization. After that, in early 2010, will come the bifurcation of Opteron socket types into two classes, the higher-end G34 with four memory channels and the mid-range C32 socket with dual-channel memory. These new sockets will enable some features already present in 45nm Opteron silicon, including DDR3 memory support and a fourth HyperTransport link. The two socket types will overlap in the 2P space, while only the G34 will serve 4P and beyond.

Source: AMD.

For sheer power, the more interesting of the two is the G34 socket, which will play host to Magny-Cours, a 12-core monster that’s essentially composed of two Istanbuls in a single package, with an in-package HyperTransport interconnect between the two dies. AMD’s Mike Goddard told us this on-package HT connection isn’t anything special, just a pair of HT links (one x16 and one x8) running at regular frequencies. However, without the need to traverse a longer distance over a motherboard, Goddard said AMD should be able to tune the synchronizers on the HT links to achieve much lower latencies than a socket-to-socket connection.

Beyond that, mapping out the multi-chip-per-package future of the Opteron becomes rather tricky. Magny-Cours, for instance, will be fully connected on a per chip basis, not just per socket, in a 2P system. The routing on a 4P system becomes very daunting, very quickly, but the bottom line is that it’s fully connected per socket, not per die, with no more than two hops required in any scenario. Goddard said it was “a science experiment” getting that 4P routing topology done.

Source: AMD.

After a refresh with 32nm processors based on the next-generation “Bulldozer” microarchitecture on the G34 and C32 platforms in 2011, AMD plans to introduce a new platform again in 2012. Details about this one are sketchy, but Goddard told us that platform would include on-die PCI Express connectivity. I expect we’ll learn more about that as the time approaches.

Our testing methods

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

| Processors | Dual Xeon E5450 3.0GHz | Dual Xeon X5492 3.4GHz | Dual Xeon L5430 2.66GHz | Dual Xeon X5550 2.66GHz; Dual Xeon W5580 3.2GHz | Dual Opteron 2347 HE 1.9GHz; Dual Opteron 2356 2.3GHz | Dual Opteron 2384 2.7GHz; Dual Opteron 2389 2.9GHz; Dual Opteron 2435 2.6GHz |
|---|---|---|---|---|---|---|
| System bus | 1333 MT/s (333MHz) | 1600 MT/s (400MHz) | 1333 MT/s (333MHz) | QPI 6.4 GT/s (3.2GHz) | HT 2.0 GT/s (1.0GHz) | HT 2.0 GT/s (1.0GHz); HT 4.4 GT/s (2.2GHz); HT 4.8 GT/s (2.4GHz) |
| Motherboard | SuperMicro X7DB8+ | SuperMicro X7DWA | Asus RS160-E5 | SuperMicro X8DA3 | SuperMicro H8DMU+ | SuperMicro H8DMU+ |
| BIOS revision | 6/23/2008 | 8/04/2008 | 8/08/2008 | 2/20/2009 | 3/25/08 | 10/15/08; 10/15/08; 05/18/09 |
| North bridge | Intel 5000P MCH | Intel 5400 MCH | Intel 5100 MCH | Intel 5520 MCH | Nvidia nForce Pro 3600 | Nvidia nForce Pro 3600 |
| South bridge | Intel 6321 ESB ICH | Intel 6321 ESB ICH | Intel ICH9R | Intel ICH10R | Nvidia nForce Pro 3600 | Nvidia nForce Pro 3600 |
| Chipset drivers | INF Update 9.0.0.1008 | INF Update 9.0.0.1008 | INF Update 9.0.0.1008 | INF Update 8.9.0.1006 | | |
| Memory size | 16GB (8 DIMMs) | 16GB (8 DIMMs) | 6GB (6 DIMMs) | 24GB (6 DIMMs) | 16GB (8 DIMMs) | 16GB (8 DIMMs) |
| Memory type | 2048MB DDR2-800 FB-DIMMs | 2048MB DDR2-800 FB-DIMMs | 1024MB registered ECC DDR2-667 DIMMs | 4096MB registered ECC DDR3-1333 DIMMs | 2048MB registered ECC DDR2-800 DIMMs | 2048MB registered ECC DDR2-800 DIMMs |
| Memory speed (effective) | 667MHz | 800MHz | 667MHz | 1333MHz | 667MHz | 800MHz |
| CAS latency (CL) | 5 | 5 | 5 | 10 | 5 | 6 |
| RAS to CAS delay (tRCD) | 5 | 5 | 5 | 9 | 5 | 5 |
| RAS precharge (tRP) | 5 | 5 | 5 | 9 | 5 | 5 |
| Storage controller | Intel 6321 ESB ICH with Matrix Storage Manager 8.6 | Intel 6321 ESB ICH with Matrix Storage Manager 8.6 | Intel ICH9R with Matrix Storage Manager 8.6 | Intel ICH10R with Matrix Storage Manager 8.6 | Nvidia nForce Pro 3600 | LSI Logic Embedded MegaRAID with 8.9.518.2007 drivers |
| Power supply | Ablecom PWS-702A-1R 700W | Ablecom PWS-702A-1R 700W | FSP Group FSP460-701UG 460W | Ablecom PWS-702A-1R 700W | Ablecom PWS-702A-1R 700W | Ablecom PWS-702A-1R 700W |
| Graphics | Integrated ATI ES1000 with 8.240.50.3000 drivers | Integrated ATI ES1000 with 8.240.50.3000 drivers | Integrated XGI Volari Z9s with 1.09.10_ASUS drivers | Nvidia GeForce 8400 GS with ForceWare 182.08 drivers | Integrated ATI ES1000 with 8.240.50.3000 drivers | Integrated ATI ES1000 with 8.240.50.3000 drivers |

Hard drive (all systems): WD Caviar WD1600YD 160GB
OS (all systems): Windows Server 2008 Enterprise x64 Edition with Service Pack 1

We used the following versions of our test applications:

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

This bandwidth test gives a nice visual for the different levels of the cache and memory hierarchy. Because AMD’s lower-level caches don’t replicate all of the contents of the higher-level caches, Istanbul’s two additional 512KB L2 caches (associated with its two added cores) increase its total effective cache size—and bandwidth—compared to Shanghai.

One new addition we’ve made for this review is a proper Stream bandwidth test. This version of Stream is multithreaded and can be told how many threads to create. We’ve chosen the optimal number for each system. As you can see, the Nehalem Xeons have a clear lead in available bandwidth thanks primarily to their three channels of DDR3 1333MHz memory. With no real changes to the memory subsystem, Istanbul achieves no more throughput than Shanghai.

Memory access latencies haven’t really changed with Istanbul, either, even though six cores are now sharing the same two memory controllers.

We can get a closer look at access latencies throughout the memory hierarchy with the 3D graphs below. I’ve colored the block sizes that correspond to different cache levels, with yellow being L1 data cache and brown representing main memory.

Istanbul looks just like Shanghai here, too. The Xeon X5550 looks pretty similar, as well, but it has smaller L1 and L2 caches, a larger, quicker L3 cache (8MB) and much shorter access times to main memory.

SPECjbb2005

SPECjbb 2005 simulates the role a server would play executing the “business logic” in the middle of a three-tier system with clients at the front-end and a database server at the back-end. The logic executed by the test is written in Java and runs in a JVM. This benchmark tests scaling with one to many threads, although its main score is largely a measure of peak throughput.

As you may know, system vendors spend tremendous effort attempting to achieve peak scores in benchmarks like this one, which they then publish via SPEC. We did not intend to challenge the best published scores with our results, but we did hope to achieve reasonably optimal tuning for our test systems. We used a fast JVM—the 64-bit version of Oracle’s JRockIt JRE P28.0—and picked up some tweaks for tuning from recently published results. We used two JVM instances on all systems (one per socket), with the following command line options:

start /AFFINITY [FC0, 03F] java -Xms3900m -Xmx3900m -Xns3260m -XXaggressive -Xlargepages:exitOnFailure=true -Xgc:genpar -XXgcthreads:6 -XXcallprofiling -XXtlasize:min=4k,preferred=1024k

Those options are specifically the ones used with the Istanbul Opteron system. They varied for the other two systems in a couple of ways. Notice that we used the Windows “start” command to affinitize threads on a per-socket basis. For the Xeon X5550 system with 16 threads, we used masks [FF00, 00FF], and for the Shanghai Opterons, we used [F0,0F]. We also adjusted the number of garbage collector threads (-XXgcthreads) for each JVM to match the number of hardware threads per socket. In keeping with the SPECjbb run rules, we tested at up to twice the optimal number of warehouses per system, with the optimal count being the total number of hardware threads.
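The affinity masks above are easy to derive if you assume logical CPUs are numbered contiguously within each socket, which is how they appeared on our systems but isn’t guaranteed everywhere. Here is a short Python sketch that reproduces the three sets of masks and the warehouse ceiling; the function names are ours, not part of any SPEC or Windows tool.

```python
def socket_affinity_masks(sockets: int, threads_per_socket: int) -> list[str]:
    """Return one hex affinity mask per socket, highest socket first.

    Assumes logical CPUs are numbered contiguously within each socket,
    which is not guaranteed on every system.
    """
    total = sockets * threads_per_socket
    width = (total + 3) // 4  # hex digits needed to cover every logical CPU
    masks = []
    for socket in reversed(range(sockets)):
        bits = ((1 << threads_per_socket) - 1) << (socket * threads_per_socket)
        masks.append(f"{bits:0{width}X}")
    return masks


def max_warehouses(hardware_threads: int) -> int:
    """Highest warehouse count tested: twice the hardware thread count."""
    return 2 * hardware_threads


if __name__ == "__main__":
    print("Istanbul:", socket_affinity_masks(2, 6), max_warehouses(12))
    print("X5550:   ", socket_affinity_masks(2, 8), max_warehouses(16))
    print("Shanghai:", socket_affinity_masks(2, 4), max_warehouses(8))
```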

In all cases, Windows Server’s “lock pages in memory” setting was enabled for the benchmark user. In the X5550 system’s BIOS, we disabled the “hardware prefetch” and “adjacent cache line prefetch” options.

Since this is a new round of tests with an updated JVM, we’ve limited our scope to the three most relevant CPU types.

Even with six cores, the Opteron 2435 can’t match the Xeon X5550 in SPECjbb2005. Istanbul does bring substantial progress over Shanghai, however, closing the gap quite a bit. Things become more interesting when we bring power use into the picture, as we’re about to do.

SPECpower_ssj2008

Another new addition for this review is, at long last, SPECpower_ssj2008. Like SPECjbb2005, this benchmark is based on multithreaded Java workloads and uses similar tuning parameters, but its workloads are somewhat different. SPECpower is also distinctive in that it measures power use at different load levels, stepping up from active idle to 100% utilization in 10% increments. The benchmark then reports power-performance ratios at each load level.

SPEC’s run rules for this benchmark require the collection of ambient temperature, humidity, and altitude data, as well as power and performance, in order to prevent the gaming of the test. Per SPEC’s recommendations, we used a separate system to act as the data collector. Attached to it were a Digi WatchPort/H temperature and humidity sensor and an Extech 380803 power meter. I should note that the Extech is not officially approved by SPEC. Although it generally works well enough, the Extech occasionally produces a clearly wrong reading, which is either approximately one half or twice the prior reading—apparently a simple serial communications quirk. We’ve found that we can filter out these errors with a simple inspection of the data, and SPECpower appears to catch the errors, as well. Our results would not be accepted for publication by SPEC unless we used an approved (and much costlier) power meter. They should, however, be good enough for our purposes.
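We did our filtering by eye, but the same check is simple to automate. The Python sketch below flags any reading that is roughly half or roughly double the last trusted reading. The 5% tolerance and the function name are assumptions for illustration, and the sample data is made up.

```python
def flag_glitches(readings, tolerance=0.05):
    """Return indexes of readings that are ~half or ~double the last good one.

    `tolerance` is an assumed relative margin, not a value from the meter's
    documentation.
    """
    flagged = []
    last_good = None
    for i, watts in enumerate(readings):
        if last_good:
            ratio = watts / last_good
            if abs(ratio - 0.5) <= tolerance or abs(ratio - 2.0) <= tolerance:
                flagged.append(i)
                continue  # keep comparing against the last trusted reading
        last_good = watts
    return flagged


if __name__ == "__main__":
    # Hypothetical samples in watts, with two obviously bad readings.
    samples = [250.0, 251.0, 125.5, 252.0, 504.0, 253.0]
    print(flag_glitches(samples))
```

Comparing against the last trusted reading, rather than the immediately preceding one, keeps a single bad sample from causing the next good sample to be flagged as well.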

We used the same basic performance tuning and system setup parameters here that we did with SPECjbb2005, with the exception that we lowered the JVM heap size slightly to avoid a memory allocation error. Here’s an example of the Java options from our Istanbul system:

-Xms3700m -Xmx3700m -Xns3000m -XXaggressive -Xlargepages:exitOnFailure=true -Xgc:genpar -XXgcthreads:6 -XXcallprofiling -XXtlasize:min=4k,preferred=1024k

Like I said, the heap size is the only real change. Due to this benchmark’s long run times, we only ran it once on each system.

SPECpower_ssj results are a little more complicated to interpret than your average benchmark. We’ve plotted the output in several ways in order to help us understand it.

Here’s a look at ssj_ops, the benchmark’s measure of performance, and the power consumed in watts at each load level. The Istanbul Opteron 2435-based system looks awfully good here; its power consumption is similar to the Shanghai system at each load level, but with substantially higher performance. The Xeon X5550 system is a little different; at active idle, it draws 142W, versus 150W for the two Opteron boxes. Beyond that, the Xeon X5550 system draws more power but achieves higher performance at each step than the Opteron 2435.

A look at performance-to-power ratios should help clarify things.

Now we can see just how incredibly close a race this is. The performance-power curves for the Opteron 2435 and Xeon X5550 systems almost perfectly overlap, amazingly enough. The Nehalem Xeon is slightly superior at the lower load levels, but the Istanbul box takes a lead as utilization climbs to 40% and higher.

Obviously, Istanbul’s showing here represents a solid advance over Shanghai. Multi-core processors tend to offer very strong power efficiency propositions with highly parallel workloads. Adding two more cores and dialing back clock speeds in order to fit into the same power envelopes as Shanghai proves to be a very effective strategy in this case.

Surprisingly, the Xeon X5550 system manages to out-point the Opteron 2435 in SPECpower_ssj2008’s overall performance per watt summation, although only by an eyelash. The overall result takes power draw at active idle into account, which is probably what puts the Xeon over the top. Make no mistake, though: this Istanbul system is very much a match for the Xeon in terms of power-efficient performance.
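To see why active idle can tip the balance, it helps to look at how the overall figure is put together. As we understand SPEC’s definition, it is the sum of ssj_ops across all load levels divided by the sum of average power across those levels plus active idle. The Python sketch below uses made-up numbers, not our measured results, and the function name is ours.

```python
def overall_ssj_ops_per_watt(levels, active_idle_watts):
    """Sum of ssj_ops over all load levels divided by the sum of average
    power over those levels plus active idle.

    `levels` is a list of (ssj_ops, average_watts) pairs, one per load level.
    """
    total_ops = sum(ops for ops, _ in levels)
    total_watts = sum(watts for _, watts in levels) + active_idle_watts
    return total_ops / total_watts


if __name__ == "__main__":
    # Hypothetical two-level run, for illustration only.
    levels = [(100_000, 200.0), (50_000, 150.0)]
    print(overall_ssj_ops_per_watt(levels, active_idle_watts=100.0))
    print(overall_ssj_ops_per_watt(levels, active_idle_watts=150.0))
```

Since idle power adds to the denominator without contributing any ssj_ops, a few watts saved at idle lift the overall score even when the loaded results are a wash.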

Cinebench rendering

We can take another look at power consumption and energy-efficient performance by using a test whose time to completion varies with performance. In this case, we’re using Cinebench, a 3D rendering benchmark based on Maxon’s Cinema 4D rendering engine.

In this application, Istanbul’s two additional cores bring it even closer to the Xeon X5550. As the multithreaded version of this test ran, we measured power draw at the wall socket for each of our test systems across a set time period.

A quick look at the data tells us much of what we need to know. Still, we can quantify these things with more precision. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

Idle power draw here is similar to what we saw in SPECpower_ssj, but slightly higher, especially for the Xeon X5550 system. Only one watt separates it from the Istanbul box.

Next, we can look at peak power draw by taking an average from the ten-second span from 15 to 25 seconds into our test period, during which the processors were rendering.

Power draw under load here isn’t quite as high as it was in SPECpower_ssj, but the trend remains the same: the Xeon X5550 system draws considerably more power at peak than the Opteron systems.

One way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.

The Istanbul system consumes less energy over the course of the test period than the Xeon X5550.

We can quantify efficiency even better by considering specifically the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.
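The power metrics in this section all come from the same stream of power samples. This Python sketch shows the arithmetic, assuming one sample per second and using made-up readings; the function names and the sampling interval are our assumptions for illustration.

```python
def window_average(samples, start_s, end_s, interval_s=1.0):
    """Average power in watts for samples taken in [start_s, end_s)."""
    first = int(start_s / interval_s)
    last = int(end_s / interval_s)
    window = samples[first:last]
    return sum(window) / len(window)


def energy_joules(samples, interval_s=1.0):
    """Total energy in joules (watt-seconds), treating each sample as the
    average power over one sampling interval."""
    return sum(samples) * interval_s


if __name__ == "__main__":
    # Hypothetical trace: 5s idle, 10s rendering, 5s idle.
    trace = [150.0] * 5 + [300.0] * 10 + [150.0] * 5
    print("peak average (W):  ", window_average(trace, 5, 15))
    print("total energy (J):  ", energy_joules(trace))
    print("render energy (J): ", energy_joules(trace[5:15]))
```

A system that finishes the render sooner has a shorter render slice to sum, which is how the render-only figure rewards performance as well as low power draw.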

The energy efficiency picture comes into sharper focus with this final metric. The Istanbul Opteron-based system requires less energy to render the scene than anything we tested but a tailored low-power system based on the Xeon L5430 and Intel’s San Clemente platform. This is a more definitive result than we saw in SPECpower_ssj, and Istanbul comes out clearly ahead of the Xeon X5550.

MyriMatch proteomics

Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He has provided us with an intriguing benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of proteins. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.
In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database.

MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
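The job-queue scheme described above can be sketched in a few lines of Python. To be clear, this is not MyriMatch’s source code, just an illustration of the approach as described: the function names are hypothetical, and the per-protein work is a stand-in callback.

```python
import queue
import threading


def split_into_jobs(proteins, num_threads, jobs_per_thread=10):
    """Split the database into roughly num_threads * jobs_per_thread jobs."""
    num_jobs = num_threads * jobs_per_thread
    size = max(1, -(-len(proteins) // num_jobs))  # ceiling division
    return [proteins[i:i + size] for i in range(0, len(proteins), size)]


def run_jobs(jobs, num_threads, handle_protein):
    """Have each worker thread pull jobs from a shared queue until it is empty."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    def worker():
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            for protein in job:
                handle_protein(protein)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    database = list(range(6714))  # stand-in for the yeast protein sequences
    jobs = split_into_jobs(database, num_threads=4)
    print(len(jobs), "jobs, largest has", max(len(j) for j in jobs), "proteins")
    handled = []
    run_jobs(jobs, num_threads=4, handle_protein=handled.append)
    print(len(handled), "proteins handled")
```

Because a thread only touches the shared queue once per job rather than once per protein, synchronization stays cheap, and having ten jobs per thread means a thread that draws easy jobs can pick up the slack instead of sitting idle.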

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used.

I should mention that performance scaling in MyriMatch tends to be limited by several factors, including memory bandwidth, as David explains:

Inefficiencies in scaling occur from a variety of sources. First, each thread is comparing to a common collection of tandem mass spectra in memory. Although most peptides will be compared to different spectra within the collection, sometimes multiple threads attempt to compare to the same spectra simultaneously, necessitating a mutex mechanism for each spectrum. Second, the number of spectra in memory far exceeds the capacity of processor caches, and so the memory controller gets a fair workout during execution.

Here’s how the processors performed.

The Opterons aren’t entirely memory bandwidth bound in this test, because the Opteron 2435 shaves 20 seconds off of the execution time compared to the 2389. That’s a healthy improvement, but it’s not sufficient to catch the Xeon X5550, which completes the test 14 seconds ahead of the Istanbul Opteron.

STARS Euler3d computational fluid dynamics

Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here.

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.
The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. Charles tells me these CFD solvers are very floating-point intensive, but oftentimes limited primarily by memory bandwidth. He has modified the benchmark for us in order to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.

I thought it might be nice to plot the performance at different thread counts, which I did in the graph above. However, we’ve seen some pretty broad variance in the results of this test at lower thread counts, which suggests that it may be stumbling over these systems’ non-uniform memory architectures. Just for kicks, I decided to try running two instances of this benchmark concurrently, with each one affinitized to a socket, and adding the results into an aggregate compute rate. Doing so proved to offer a nice performance boost.

Both the Xeons and Opterons benefited from the change. However you run this test, though, the Nehalem Xeons are simply faster, probably due to their superior memory bandwidth.

Folding@Home

Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.

The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. Overall, Folding@Home should be a great example of real-world scientific computing.

notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.

On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the total number of cores (or threads, in the case of SMT) in the system in order to estimate the total number of points per day that CPU might achieve.
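
For those who like to see the arithmetic spelled out, here is a minimal Python sketch of that estimate. It is not notfred's code: the function name, the two placeholder WU type names, and every number in the sample are invented for illustration. Only the average-then-multiply logic comes from the description above.

```python
def estimate_total_ppd(ppd_by_wu_type, hardware_threads):
    """Average the per-thread points per day over all WU types,
    then scale by the number of cores (or SMT threads) in the system."""
    if not ppd_by_wu_type:
        raise ValueError("need at least one work-unit result")
    average_ppd = sum(ppd_by_wu_type.values()) / len(ppd_by_wu_type)
    return average_ppd * hardware_threads


# Hypothetical per-thread results, NOT measured data.
# "wu_type_3" and "wu_type_4" are placeholder names.
sample = {
    "Tinker": 100.0,
    "Amber": 120.0,
    "wu_type_3": 200.0,
    "wu_type_4": 180.0,
}

# A two-socket, six-core system exposes 12 hardware threads.
total = estimate_total_ppd(sample, hardware_threads=12)
print(total)
```

Because the multiplier is the thread count, a CPU running two threads per core can post modest per-WU numbers and still come out with a large projected total.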

This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.

The Xeon X5550’s per-thread performance here, in the individual work unit types, is relatively weak because it’s running two threads per core. Once we get to the final analysis, though, its total projected points per day look much stronger. Istanbul is once again a respectable improvement on Shanghai, but not quite fast enough to catch the new Xeons.

POV-Ray rendering

We’ve been using this chess2 POV-Ray scene as an example for ages, and here, it offers some drama, with the Opteron 2435 finishing one second after the Xeon X5550. The POV-Ray benchmark scene has a large single-threaded component, so it produces very different results. The Opteron 2389 is faster in this case, giving us a peek at the other side of the core-versus-frequency tradeoff.

Valve VRAD map lighting

This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to pre-compute lighting that goes into its games.

The Nehalem Xeons simply excel in this application. No contest, really.

x264 HD video encoding

This benchmark tests performance with one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark. These scores come from the newer, faster version 0.59.819 of the x264 executable.

One can surmise by looking at these results that x264’s second pass is more widely multithreaded than the first. True to form, the Shanghai Opterons are faster in pass one, while Istanbul is faster the second time around. Due to the flexibility of its “Turbo mode” mechanism, the Xeon X5550’s performance is excellent in both cases.

Sandra Mandelbrot

We’ve included this final test largely just to satisfy our own curiosity about how the different CPU architectures handle SSE extensions and the like. SiSoft Sandra’s “multimedia” benchmark is intended to show off the benefits of “multimedia” extensions like MMX, SSE, and SSE2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:

This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.

The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.

The benchmark contains many versions (ALU, MMX, (Wireless) MMX, SSE, SSE2, SSSE3) that use integers to simulate floating point numbers, as well as many versions that use floating point numbers (FPU, SSE, SSE2, SSSE3). This illustrates the difference between ALU and FPU power.

The SIMD versions compute 2/4/8 Mandelbrot point iterations at once – rather than one at a time – thus taking advantage of the SIMD instructions. Even so, 2/4/8x improvement cannot be expected (due to other overheads), generally a 2.5-3x improvement has been achieved. The ALU & FPU of 6/7 generation of processors are very advanced (e.g. 2+ execution units) thus bridging the gap as well.

We’re using the 64-bit version of the Sandra executable, as well.
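
To make the column-by-column work sharing in that FAQ excerpt a little more concrete, here is a rough Python sketch. It is emphatically not SiSoft's code. The function names, the coordinate window, and the use of a shared queue to hand out "the next column not being worked on" are all my assumptions, and there is no SIMD or colour mapping here. CPython's global interpreter lock also means these threads won't actually scale, so treat it as an illustration of the scheduling only.

```python
import queue
import threading


def escape_count(c, max_iter=255):
    """Return how many iterations of z = z*z + c it takes for |z| to
    exceed 2, or max_iter if the point never escapes."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render(width=640, height=480, max_iter=255, n_threads=4):
    """Render iteration counts, with each thread repeatedly grabbing the
    next column that no other thread has claimed."""
    image = [[0] * width for _ in range(height)]
    columns = queue.Queue()
    for x in range(width):
        columns.put(x)

    def worker():
        while True:
            try:
                x = columns.get_nowait()  # claim the next free column
            except queue.Empty:
                return  # no columns left, so this thread is done
            # Assumed view window: real axis -2..1, imaginary axis -1..1.
            re = -2.0 + 3.0 * x / width
            for y in range(height):
                im = -1.0 + 2.0 * y / height
                image[y][x] = escape_count(complex(re, im), max_iter)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image


# Small frame so the pure-Python version finishes quickly.
preview = render(width=64, height=48, n_threads=6)
```

Handing out columns on demand, rather than pre-assigning a fixed block to each thread, keeps every thread busy even though columns that cross the interior of the set take far longer than the ones that escape quickly.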

In our final benchmark, the Istanbul Opteron produces a bit of a surprise, besting the Xeon X5550 in all three tests—and even outperforming the $1600-a-pop Xeon W5580 in the integer x8 test.

Conclusions

We’ve now had a look at AMD’s first response to Nehalem, and well, it’s not bad. The six-core Opteron 2435 can’t quite match the Xeon X5550 in overall performance, and although these products are nearly the same price, the Xeon X5550 isn’t the highest Nehalem speed grade. That honor would fall to the Xeon W5580 processor that appeared in some of our benchmark results. In terms of raw performance in a 2P system, Nehalem still reigns supreme.

Yet Istanbul should be a clear improvement over Shanghai for many workstation-class workloads and most server-class workloads—i.e., those that are essentially parallel and widely multithreaded. The Opteron 2435 manages to deliver this higher performance not just within the same power envelopes, but quite empirically with almost the exact same measured power consumption as the Opteron 2389.

This combination yields a nice increase in power efficiency, which was enough to put our Istanbul-based test system in the same territory as our Xeon X5550 system. The competition between the two was remarkably close in SPECpower_ssj, and the Istanbul system required notably less energy to render the Cinema 4D sample scene in Cinebench. So despite the fact that Intel leads in outright performance, the Opteron 2435 is entirely competitive on the power-efficiency front, with lower peak power draw, to boot. Those who evaluate systems strictly on this basis would do well to keep Opterons in the mix.

And if you have existing, compatible Socket F servers, the Istanbul Opterons should be an excellent drop-in upgrade. They’re a no-brainer, really, when one considers energy costs and per-socket/per-server software licensing fees.

AMD has a tougher sell to make when it comes to brand-new systems. The Nehalem Xeons offer higher peak performance with a similar energy-efficiency proposition. Still, Istanbul at least keeps the Opteron in the conversation, which makes the outlook for AMD seem substantially less apocalyptic than it did several short months ago.

Comments closed
    • pluscard
    • 10 years ago

    Has anyone seen any 2345 Istanbul cpus in the wild at this point?

    • pogsnet
    • 10 years ago
    • Dposcorp
    • 10 years ago

    100th post FTW!

      • Mr Bill
      • 10 years ago

      101 a silly little millimeter longer. LOL that dates me.

        • Xylker
        • 10 years ago

        Makes you a smoker, too.

          • Mr Bill
          • 10 years ago

          No, never have. But I grew up in the era when cigarette ads were allowed on radio and TV. They hit you with that darned jingle constantly.

    • cocobongo_tm
    • 10 years ago

    Aaaaa…guys. In the x264 benchmark…you seem to speculate about x264’s second encoding pass being more multi-threaded than the first…I wouldn’t know…but did you specifically instruct x264 to run in multithreaded mode? Like “x264 -o benchmark.264 --threads 6 input.yuv 1280x720”? This’ll start x264 with 6 separate encoding threads in parallel.

    So for the Xeons this would be --threads 8 (4 x 2 Hyper Threading ones), for the Istanbul --threads 6 and so on….

      • swaaye
      • 10 years ago

      You should run x264 with --threads auto. By design it runs more threads than you have CPUs (read that somewhere).

    • Chrispy_
    • 10 years ago

    Whilst drop-in upgrades make the new Opteron a no-brainer for that particular market segment, the carrot for new sales is the exact same argument.

    “Why buy a marginally faster Xeon server when AMD has a history of allowing longer platform life and lower overall TCO?”

    Sure, you’re making an assumption that AMD will continue to support socket F, but then historic data would suggest that they will, and in a worst case scenario, the new Opterons aren’t inferior enough to make you second-guess your decision.

    • Thresher
    • 10 years ago

    Unfortunately, every time I see the word “Istanbul”, the song “Istanbul (not Constantinople)” starts playing in my head.

    🙁

    • Mr Bill
    • 10 years ago

    It’s about time that an AMD CPU got top scores on Sandra Mandelbrot. More cores finally did the trick? Or was it something more subtle?

      • Mr Bill
      • 10 years ago

      Answered my own question. The 2.6GHz 2435 scales very well compared to the similarly clocked 2.7GHz 2384. Multiplying the 2435 numbers by 4/6 gives the following results.

      Test             (4/6) x 2435 result    2384 result
      Int x8           356 x 4/6 = 237        252
      FP x8            221 x 4/6 = 147        153
      FP double x4     121 x 4/6 = 81         84

    • sergeant_skyes
    • 10 years ago

    ha as i thought. nothing much changed in the synthetic tests as istanbul has the same frequency of NB and HT which is why they show similar results. only in multi threaded apps they show their performance. and i dont think the competition is fair b/w the Intel and AMD processors. Firstly intel has triple channel DDR3 memory running at 1333MHz whereas AMD has dual channel DDR2 memory running at 800MHz. the difference in the clocks is way too much to compare. Intel has 8 virtual cores and AMD has 6 physical cores. I dont know which is a good idea in this case. And hey AMD has a low L3 cache speed so it isnt surprising to see Intel is leading in all synthetic benchmarks. AMD should probably increase their clock speeds and L3 cache speeds to get near the Intel processors. they are pretty close in real world performance so these two factors might help.

    • pluscard
    • 10 years ago

    I didn’t see any server workload tests, like sql, etc.

      • TravelMug
      • 10 years ago

      Check out Johan’s article over at AT for those.

      BTW – George Ou and tshen83 in one thread? Me thinks heads will assplode.

    • ssidbroadcast
    • 10 years ago

    Hey uh, has AMD made any mention perchance of making a home desktop variant of this?

      • ish718
      • 10 years ago

      How would it compete with core i7 920 and be profitable at the same time?

        • Meadows
        • 10 years ago

        By raising clock speeds and TDP.

          • ish718
          • 10 years ago

          Yeah, it would cost more than the core i7 and still be slower in some applications and have less OCing capabilities and will end up getting stomped by a moderately overclocked core i7. Not to mention core i5…

        • OneArmedScissor
        • 10 years ago

        I’d want one if there were a desktop version. But I’m someone with out of the ordinary requirements, not even remotely a significant part of the market, as would be most anyone else with a REAL use for such a thing. It doesn’t make sense for a few reasons, and that’s just one of them.

        It would have to have a higher clock speed. Assuming I could make linear use of either more speed or cores, which is possible:

        2.6 GHz x 6 = effective 15.6 GHz

        3.2 GHz x 4 = effective 12.8 GHz

        The best 6 core Opteron is not even quite what a 5 core version of the Phenom II 945 would be, and it’s quite possible you could match it by overclocking a real quad-core, as it is. If they ever sell a new Phenom II quad-core that’s much faster, it would be close enough.

        I wouldn’t really need to hope for anything new to come along, though, because I could always do this right now:

        2.4 GHz x 4 x 2 = effective 19.2 GHz

        Point being, I’d be better off just throwing together a cheap dual socket server with low end quad-cores, as those are only $173 each. I may even do that, once new AMD server boards are out.

    • echo_seven
    • 10 years ago

    It’s been a while since I saw AMD finish something *early*, nice surprise.

    • FireGryphon
    • 10 years ago

    Great article, as always.

    An engineer friend of mine went on a rant once about tri-core, sextal-core, etc. processors (basically any processor that doesn’t have a number of cores that is a multiple of two). He said something about losing overhead (or registers?) by having three, six, etc. cores. Does anyone here care to elaborate on this in more detail? I can’t seem to find any info on it anywhere.

      • Blackened
      • 10 years ago

      I always thought 2 was a multiple of 6.

        • Meadows
        • 10 years ago

        g{

          • Blackened
          • 10 years ago

          Sorry, 6 is a multiple of 2. Actually math was the only class I never had a problem in but AP Calculus was the last one I took and that was over 15 years ago.. I had to look it up, but I only had the numbers inverted and my point still stands because 6 *[

            • tfp
            • 10 years ago

            He means power of 2…

          • MadManOriginal
          • 10 years ago

          Not as hard as you failed your Insulting People on the Internet class apparently.

            • Meadows
            • 10 years ago

            Oh, shush you.

    • georgeou
    • 10 years ago

    Your power measurements on the X5550 server appear to be 100 watts too high.

    The Istanbul power measurements look like all other AMD Opteron servers measured on the official SPEC.org database. However, your measurement on the X5550 at 350+ watts is ~100 watts higher than every other server running the X5570 CPU on the official SPEC.org database. Clearly the AMD Istanbul system seems to be running an optimum low power configuration but the Intel Nehalem system is running far from what anyone would consider optimum in terms of energy efficiency.

    You can see for yourself here and see a significant power discrepancy on the Intel server but not the AMD server:

    http://www.spec.org/power_ssj2008/results/res2009q2/power_ssj2008-20090515-00160.html (X5570)
    http://www.spec.org/power_ssj2008/results/res2009q2/power_ssj2008-20090505-00153.html (X5570)
    http://www.spec.org/power_ssj2008/results/res2009q1/power_ssj2008-20081224-00107.html (2384)
    http://www.spec.org/power_ssj2008/results/res2009q2/power_ssj2008-20090504-00149.html (2377HE)

    Now I’m not suggesting foul play here, but this particular discrepancy especially in the context of your report’s claim that it will put AMD’s ACP power claims to test is very problematic. It invalidates the results that Istanbul servers are nearly as energy efficient when they’re actually significantly less efficient.

    Now I looked into your methodology and I may have found the culprit. The X5550 system is running an unusually powerful graphics chipset for a server with the Nvidia GeForce 8400 on a SuperMicro X8DA3. I have never seen anyone using a server board with a gaming graphics card before, and none of the other systems you tested used such an unusual graphics system. That particular motherboard might also have an inefficient VRM or something else. I’d be curious where you got this motherboard for you to get a power measurement result that is 100 watts higher than everyone else.

    Now I’m sorry if I sound critical of your work because I respect your work overall. But I think you need to look into this discrepancy and come up with an explanation.

    • eitje
    • 10 years ago

    I know this is an Opty review, but I noticed on page 7 this weird oscillation in the Xeon *55** procs, at idle. What’s up with that?

    edit: Ah, Damage/George might have IDed where the flux comes from – the 8400 GS might be doing some funny things with the idle power on those two systems.

      • Meadows
      • 10 years ago

      I’d say the oscillation probably comes from HT making the renderer go “d’oh” from time to time.

        • eitje
        • 10 years ago

        but the rendering is already done. It’s at idle.

          • Meadows
          • 10 years ago

          You’re probably right, I never /[

            • eitje
            • 10 years ago

            So, you don’t know what you’re even commenting on, but you’ll gladly share your hunches.

            Shows what you know, since the flux is only happening on the 8400GS-equipped systems.

            • Meadows
            • 10 years ago

            What does a videocard do that makes it cyclically work or not work?

            • eitje
            • 10 years ago

            I don’t know that – but the only systems on the configuration page that are showing cyclic power spikes on the rendering charts have an 8400 GS installed.

    • ssidbroadcast
    • 10 years ago

    You know uh, it’s hard for me to make my own interpretations of server-class benchmarks, so I’m afraid I only read the first and last pages… and meanwhile wait for the Podcast topic.

    • wingless
    • 10 years ago

    AMD can come out with a 6-core AM3 desktop chip, clock it at 3.2 core, 2.2 NB, 2.4 HT Link, and sell it for a pretty good amount with these results. Istanbul isn’t bad at all over standard 4-core systems.

    They have to get this out for desktop use ASAP.

      • armistitiu
      • 10 years ago

      Why? What desktop applications that are widely used can take advantage of 6 threads? I still don’t know many.
      I think the performance gap between I7 and Deneb should be filled with single core performance(/clock maybe when bulldozer comes). Until then they could use some kind of turbo mode. To let one core overclock a bit when only 1 thread is active.
      Don’t get me wrong. Threads are good in some cases. That’s why i7 shines in so many benchies and most of server workloads : Hyper threading.

        • shank15217
        • 10 years ago

        Funny you say that, if desktop apps can benefit from hyper threading, they would also benefit from 2 extra cores. Encoding, image processing, rendering and other workstation applications could benefit greatly from a 6-core Opteron.

      • ish718
      • 10 years ago

      Good idea, too bad AMD’s current financial position wouldn’t like such a move.

    • UberGerbil
    • 10 years ago

    g[

    • armistitiu
    • 10 years ago

    How is this benchmark relevant for a server CPU? We already knew that Istanbul has the same core architecture as Shanghai and the same amount of cache so really no surprises here. These benchmarks only showed us that multi-threaded tests scale well on Istanbul. What did you expect? Some magic-infused new ALU and some SMT baked into the cores that makes Istanbul core perform Nehalem like?
    Your tests should have been on 2P or 4P servers and with actual SERVER benchmarks (including virtualization). I don’t blame you for not having a 4P server but please try and not run Folding@Home on a server CPU.
    What’s to notice here is the power consumption of Istanbul which is quite impressive (less than X5550 and equal to Shanghai @ 2.9).
    I’m quite sure that 2.8 Ghz or even 2.9Ghz Istanbuls are on their way so AMD is not so dead as you all think (at least not this year).
    Also…the graphs are a bit misleading. At first i thought you’re testing a single processor.
    PS: Anandtech has a nice review too.

    • Zhaine
    • 10 years ago

    Yes, that is exactly why it is a /[

    • derFunkenstein
    • 10 years ago

    I demand this article be re-written with more anti-Indilinx/OCZ SSD bashing.

      • eitje
      • 10 years ago

      Well, they didn’t even MENTION OCZ in this article, which is plainly an oversight.

    • Hattig
    • 10 years ago

    Well as a Socket F server upgrade, you can’t beat Istanbul. Good power consumption as well – 6 cores in 75W with good performance! The power consumption benchmarks were very good. I think we can say that AMD’s 45nm process is pretty decent. In addition I think Istanbul is good enough for a company that is AMD based to continue using AMD processors and keep a stable platform, even for new server purchases.

    Of course AMD needs to get their next platform out, and enable DDR3 for Istanbul. Also if they had a way to overclock cores when others were idle (for low-threaded apps) then they would have improved results as well.

    Of course Istanbul will benefit a lot in the 4P and 8P markets because of the HT Assist, which doesn’t help on the 2P reviews that I have seen. With no 4P Nehalem CPU out yet, and when it does come out a full platform validation requirement for businesses, this is where AMD can regain a lot of ground and stand proud.

    It was also a shame to not see any virtualisation benchmarks, but Anandtech mostly did a good job here.

    • UberGerbil
    • 10 years ago

    Well, this is a tasty article with which to start the week.

    • Kurkotain
    • 10 years ago

    At least AMD can catch a breather…but the new xeons and nehalem EX are going to crush them, if they don’t do some serious magic…

    • Taddeusz
    • 10 years ago

    The review doesn’t mention the fact that the quad core Xeons do with 4 cores what the Opterons barely manage to do with 6.

      • Krogoth
      • 10 years ago

      Technically, it is 4-cores with SMT on each core.

        • tfp
        • 10 years ago

        Where is Logan-X with his 6 cores > then 4 cores + SMT?

        • Hattig
        • 10 years ago

        And SMT actually works on Nehalem – 30% improvement in many cases. Considering it’s a 5% increase in die size to add, that’s a definite win, although Nehalem is also 4-issue by design to aid with SMT so maybe a bit more in this case. Also Nehalem has a power-of-two core count, and it appears that some software prefers this.

      • Damage
      • 10 years ago

      O RLY?

        • Stranger
        • 10 years ago

        YA RLLY… but seriously, the review labors over the fact that Istanbul has 6 cores. Even the table of contents graphic for the review has six cores showing…

      • derFunkenstein
      • 10 years ago

      did you read the review? That’s all over the place.

      • flip-mode
      • 10 years ago

      Jebus, dummest komment evr.

        • BoBzeBuilder
        • 10 years ago

        flip-mode is on point.

    • esterhasz
    • 10 years ago

    While it’s great that AMD does some catching up here, the fact that Nehalem does what it does with not even two thirds of Istanbuls die size really does make me *gulp*

      • valrandir
      • 10 years ago

      This can’t be cheap for AMD. It’s not good business, anyway.

        • esterhasz
        • 10 years ago

        The hope is that the pricing structure in the server market reduces the relative significance of fabrication cost. Still…

    • ew
    • 10 years ago

    Istanbul was Constantinople
    Now it’s Istanbul, not Constantinople
    Been a long time gone, Constantinople
    Now it’s Turkish delight on a moonlit night

    Came here to say that. Now I’ll read the article.

      • UberGerbil
      • 10 years ago

      Why did Constantinople get the works?
      It’s nobody’s business but the Turks.

        • derFunkenstein
        • 10 years ago

        meh………..

      • SecretMaster
      • 10 years ago

      Gahhh! Every time an Istanbul related news/article is posted this inevitably gets mentioned.

        • ssidbroadcast
        • 10 years ago

        Yeah it’s getting real played out.

        • eitje
        • 10 years ago

        the worst part is, do you think the RIAA is getting any royalties from these reproductions?! NO.

          • Krogoth
          • 10 years ago

          CEASE AND DESIST!

          coming to Damage in 4….3…2…1…..

      • grimpr
      • 10 years ago

      All the world bows to turkey, its kebab delights and remembers fondly of its genocides.

    • Vasilyfav
    • 10 years ago

    But can it run Crysis?

      • Krogoth
      • 10 years ago

      “will it run Crysis” is a bad meme.

      Warning to the readers: Soapbox material.

      Crysis is a prime example of what not to do with a tech demo. It is poorly written “code-wise”. The art direction and storyline is lazy copy and paste from previous stuff (LOL Far Cry I). These things destroy any technical merit of engine. EA is full of butthurt when their product didn’t meet sales projections and blames the industry’s #1 scapegoat (take a guess?) for their product’s ills. Crytek went back realized how they can no longer rely on hopelessly overpowered GPU solutions to compensate for their unwillingness to make efficient code. I sincerely hope Crytek’s next project will be at least entertaining and can run smoothly on a good number of their customer base’s systems.

      • Meadows
      • 10 years ago

      Shirley.

      • tfp
      • 10 years ago

      but can it FOLD??!?!?!??!

        • derFunkenstein
        • 10 years ago

        no.

        • Convert
        • 10 years ago

        Page 9.

      • ew
      • 10 years ago

      No. It will only run on the Beowulf cluster of these I’m imagining…

    • Krogoth
    • 10 years ago

    Hmmmm, Istanbul is interesting. I am afraid that upcoming Westmere-based Xeons (also sextal-core) are going to take the lead by a good margin.

      • crazybus
      • 10 years ago

      Sextal-core? Hex-core, Sex-core, or plain old six-core perhaps.

        • armistitiu
        • 10 years ago

        Silicon-core! Or HighK-core that Intel’s been playing lately(or Nu-metal as i like to call it :P). Anyway…semi-metal-core ftw!

      • shank15217
      • 10 years ago

      Except I can put a 6 core Opteron in 2 year old servers and get massive performance gains. This processor is huge in the HPC or blade systems arena. Effectively increasing cpu performance in the same power envelope by 50% at a marginal cost of the system. In a 2P blade the cpu cost is 15-20% of the total blade cost. Drop in upgrades that increase performance by 50% for the price of 20% of a new system is a no brainer.

        • UberGerbil
        • 10 years ago

        Except that very few customers do this. Most servers in commercial use get swapped out in total, not upgraded in place. Accounting rules, licensing issues, and service contracts all tend to work against it. As the article says, the real win in that respect is for the OEMs and VARs, who should have validated systems using existing infrastructure very quickly.

          • Jypster
          • 10 years ago

          I am not really sure on that, UberGerbil. When I was working for IBM X-series (x86 servers) here in Australia, one of the most common requests, after hard drive upgrade options, was for processors. Surprisingly, we had more requests for processors than memory, which I could never understand. Maybe people could work out the upgrades for memory themselves; I am unsure.

          In this current economic environment I can see companies wishing to extend the life of their machines further as well, so there should be even more of a demand for drop-in replacements for both AMD and Intel.

            • Blackened
            • 10 years ago

            I can confirm UberGerbil’s assertion. We have over 10k servers in our datacenter and we hardly ever upgrade anything in our production environment. The only time we really upgrade is after we have bought new hardware to replace the older production servers; we will sometimes upgrade the memory and/or disks in the older servers to use for more QA and testing.

            And shank15217, we use thousands of blades in our datacenter but we only use them for the easy serviceability. The only thing we have ever upgraded in them is the disk capacity.

            • shank15217
            • 10 years ago

            In a HPC datacenter we upgrade everything. In LANL, we changed out every Opteron single core with a dual core variety when they came out because it was the fastest and cheapest upgrade we could get. That was over 8000 cpus. We did that over 4 months. It all depends on the flexibility of your company and its mission. If you are service oriented IT department, the most important feature is up time. If your DC has large scale clustered systems, upgrades happen often because they don’t cause service failures.

            • Usacomp2k3
            • 10 years ago

            I haven’t worked with many blades, but I also have never seen a processor upgraded in a server room. Hard drives have been swapped out both due to death or to upgrading, but never CPUs.

            Granted my hands-on knowledge was only with 1 company, but that was still about a dozen racks worth of equipment.

          • shank15217
          • 10 years ago

          Blades are designed for easy upgrades, that’s one of the reasons people buy them. It’s easier to pull out a half height blade and upgrade/add parts than a 2U server. Blade systems are getting more popular because of this very reason.
