Not only has this arrangement been a performance bottleneck, but it has also helped limit the number of processor cores Intel could beneficially run in a two-socket system. As a result, the quick turn toward dual cores Intel made across its product lines hasn't worked out so well for the Xeon. Before today, the only dual-core Xeon Intel sold was confined to low-volume, high-dollar server systems, and that processor, code-named Paxville, came at just one speed grade: a plodding (by Netburst standards) 2.8GHz. Single-core Xeons have been forced to contend with AMD's dual-core Opterons in mid-to-low-end servers and workstations, and that's not a winning proposition.
Intel is looking to reverse its fortunes in the server and workstation markets with a series of new products designed to take on the Opteron and win. The first wave of those products is debuting today in the form of a new server platform, code-named Bensley. This platform consists primarily of a new chipset that will be mated with a trio of new Xeon processors, also scheduled to arrive in stages. The star of that lineup, once it arrives, will be the Woodcrest microprocessor based on Intel's new Core microarchitecture, which promises higher performance and lower power consumption than current Netburst-derived Xeons. I recently attended a reviewer's workshop in Portland, Oregon, where I got to spend some quality time with a pair of Woodcrest processors on the Bensley platform, and I came away impressed. More importantly, I came away with some benchmark scores we can compare directly to AMD's Opteron 285.
The Bensley platform
The Bensley platform's central nervous system is tied together by the Blackford memory controller hub, or MCH. Blackford brings a few key innovations to Intel's dual-socket server hardware in an effort to put Xeon on better footing against the Opteron's combination of integrated memory controllers and HyperTransport interconnects. You can get a sense of Bensley's layout in the block diagram below.
Intel hasn't (yet) decided to move to high-speed, chip-to-chip interconnects in place of a traditional front-side bus, but it has added more dedicated bandwidth per processor by doubling up on front-side busses, with one bus dedicated to each of Bensley's two CPU sockets. This arrangement should give each Xeon plenty of dedicated bandwidth to talk to the rest of the system. The busses can run at speeds up to 1066 or 1333 MHz, depending on the CPU, for as much as 8.5 or roughly 10.7 GB/s of peak bus bandwidth per socket, respectively. A traditional bus may not be as elegant a solution as HyperTransport, but for most practical purposes, once implemented, two of them should almost surely perform well enough in a two-socket system.
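For the curious, the per-socket figures fall out of simple arithmetic: transfers per second times bytes per transfer. The sketch below assumes a 64-bit (8-byte) data path and treats the quoted "MHz" numbers as effective transfer rates; the function and constant names are mine, purely for illustration.

```python
# Peak front-side bus bandwidth = transfers per second * bytes per transfer.
# Assumption (not stated in the article): a 64-bit, i.e. 8-byte, data bus,
# and the quoted 1066/1333 "MHz" figures are effective transfer rates.
BUS_WIDTH_BYTES = 8


def peak_bus_bandwidth_gbs(megatransfers_per_s: float,
                           width_bytes: int = BUS_WIDTH_BYTES) -> float:
    """Return peak bandwidth in GB/s (decimal, 10**9 bytes per GB)."""
    return megatransfers_per_s * 1e6 * width_bytes / 1e9


for rate in (1066, 1333):
    per_socket = peak_bus_bandwidth_gbs(rate)
    # Bensley gives each of its two sockets its own bus, so the
    # platform-wide peak is simply twice the per-socket number.
    print(f"{rate} MT/s: {per_socket:.1f} GB/s per socket, "
          f"{2 * per_socket:.1f} GB/s across both busses")
```

These are theoretical peaks, of course; real-world throughput on a shared-protocol bus will land somewhere below them.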
Blackford does adopt a more serialized approach to transmitting some data, however (specifically, data to and from main memory), thanks to the addition of Fully Buffered DIMMs. Intel has helped shepherd this new memory standard to market, and Blackford takes advantage of it. The basic approach behind FB-DIMMs (replacing older, more parallel memory interfaces with a narrower interface that features higher clock speeds) may sound familiar to those who lived through the rise, sputter, and fall of RDRAM. FB-DIMMs do have some of the same advantages as RDRAM. The narrower links simplify routing of traces on the motherboard and offer the potential for more total memory capacity. That's why Intel has pushed this standard for servers, where higher memory capacities are needed. Yet FB-DIMMs sidestep some of RDRAM's problems, including the need for a custom memory IC. Instead, FB-DIMMs employ an array of bog-standard DDR2 memory chips that sit behind a so-called Advanced Memory Buffer (AMB) chip present on each module. The AMB talks to the memory controller over a serial link, transmits data to and from memory chips, and passes on signals to any other modules on the same channel.
The buffering and serialization involved in this scheme potentially introduce additional latency for memory accesses, but the FB-DIMM spec attempts to mitigate this problem in various ways. For instance, the presence of dedicated upstream and downstream links allows for simultaneous memory reads and writes. Intel seems confident these measures should suffice, especially since Blackford has four FB-DIMM channels to enable a host of concurrent memory accesses and over 20 GB/s of total bandwidth. Of course, as a server memory technology, FB-DIMM also has extensive provisions built in for error correction, retries, and the like. (For more on the nitty-gritty of the FB-DIMM scheme, I suggest reading Charlie Demerjian's three-part series at the Inquirer.)
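As a rough sanity check on that "over 20 GB/s" figure, here is a back-of-the-envelope sketch. It assumes, and this is my assumption rather than anything stated above, that each channel is fed by DDR2-667 chips delivering 8 bytes per transfer on the read path. Different memory speeds would scale the result accordingly.

```python
# Back-of-the-envelope peak read bandwidth for Blackford's memory subsystem.
# ASSUMPTIONS (hypothetical, for illustration only):
#   - DDR2-667 devices behind each AMB (667 million transfers per second)
#   - 8 bytes delivered per transfer on the read path
# The channel count (four) comes from the article.
DDR2_MEGATRANSFERS_PER_S = 667
BYTES_PER_TRANSFER = 8
FBDIMM_CHANNELS = 4


def channel_read_bandwidth_gbs(mts: float = DDR2_MEGATRANSFERS_PER_S,
                               width_bytes: int = BYTES_PER_TRANSFER) -> float:
    """Peak read bandwidth of one channel in GB/s (decimal)."""
    return mts * 1e6 * width_bytes / 1e9


def total_read_bandwidth_gbs(channels: int = FBDIMM_CHANNELS) -> float:
    """Peak read bandwidth with all channels streaming at once."""
    return channels * channel_read_bandwidth_gbs()


print(f"Per channel: {channel_read_bandwidth_gbs():.2f} GB/s")
print(f"All {FBDIMM_CHANNELS} channels: {total_read_bandwidth_gbs():.1f} GB/s")
```

Under those assumptions, the total comes out a little north of 20 GB/s, which squares with the number quoted for the platform.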
FB-DIMM's virtues come at a price, however. Intel estimates FB-DIMMs will consume about 5 W per module more than DDR2. In fact, FB-DIMMs appear to require some form of active cooling at this stage. Many of the open test systems we observed in Intel's labs had a 120 mm fan resting on top of them, running, and our Bensley test system has a cooling tunnel shroud that extends over the DIMMs as well as the processors. Since Blackford can support up to 16 memory modules, FB-DIMMs have the potential to add as much as 80 W to a system's total power draw. Intel seems willing to take this hit in the server space and expects to be able to deliver competitively power-efficient total platforms, regardless.
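The worst-case number is just the per-module premium multiplied out, but it's worth seeing how the penalty scales with how many slots you actually populate. A minimal sketch, using Intel's roughly 5 W per module estimate and Blackford's 16-module ceiling from above; the function name is made up for illustration.

```python
# Extra power FB-DIMMs add relative to plain DDR2, per Intel's estimate.
EXTRA_WATTS_PER_MODULE = 5   # approximate per-module premium quoted by Intel
MAX_MODULES = 16             # Blackford's maximum module count


def fbdimm_power_premium_watts(modules: int,
                               extra_watts: float = EXTRA_WATTS_PER_MODULE) -> float:
    """Return the added power draw, in watts, for a given module count."""
    if not 0 <= modules <= MAX_MODULES:
        raise ValueError(f"module count must be between 0 and {MAX_MODULES}")
    return modules * extra_watts


for count in (4, 8, 16):
    print(f"{count:2d} modules: about {fbdimm_power_premium_watts(count):.0f} W extra")
```

A lightly populated box pays a much smaller tax than the fully loaded 16-module case, which is the scenario behind the 80 W figure.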
In addition to its CPU and memory interfaces, Blackford has three different eight-lane PCI Express links, through which it communicates with various I/O chips, including the ESB-2 I/O bridge. The ESB-2 offers most of the functions of a traditional chipset's south bridge, including a PCI-X bus, dual Gigabit Ethernet MACs, and three PCI-E x4 links for peripherals such as storage controllers.
Incidentally, Intel has workstation versions of the Bensley platform and Blackford MCH in the works, as well, known as the Glidewell platform and the Green Creek MCH. These are distinguished from their server-class counterparts by Green Creek's use of its PCI-E connectivity for graphics rather than other sorts of I/O. Glidewell will launch after Bensley, but should be coming fairly soon.