Jon Stokes over at Ars Technica has done a fine job of explaining what makes the Cell processor so unique. The Cell, formally announced yesterday by partners IBM, Sony, and Toshiba, is the much-hyped wunderchip that will power the Playstation 3 (alongside an NVIDIA GPU). Mr. Stokes illuminates the Cell's unique attributes, which include eight specialized SIMD execution cores coupled to a single 64-bit PowerPC core:
To sum up, IBM has sort of reapplied the RISC approach of throwing control logic overboard in exchange for a wider execution core and a larger storage area that's situated closer to the execution core. The difference is that instead of the compiler taking up the slack (as in RISC), a combination of the compiler, the programmer, some very smart scheduling software, and a general-purpose CPU does the kind of scheduling and resource allocation work that the control logic used to do.

I don't want to make too much of this, because there's much to learn yet about Cell and there are some real differences here, but those eight SIMD execution units sure remind me of the guts of a modern GPU, which is also a parallel SIMD machine. (I've been on this GPU-CPU collision course kick for a while now.) Here's how Stokes describes a Cell SIMD execution core, or SPE:
The actual architecture of the Cell SPE is a dual-issue, statically scheduled SIMD processor with a large local storage (LS) area. In this respect, the individual SPUs are like very simple, PowerPC 601-era processors.

I believe NVIDIA's NV4x pixel shader unit is also a dual-issue SIMD machine that operates on 128-bit vectors of four 32-bit elements, also known as pixels (or fragments). Like I said, there are no doubt real differences between the two types of processing units, but the similarity is substantial, and GPUs are becoming more and more general in their computational capabilities.
The main differences between an individual SPE and an early RISC machine are twofold. First, and most obvious, is the fact that the Cell SPE is geared for single-precision SIMD computation. Most of its arithmetic instructions operate on 128-bit vectors of four 32-bit elements. So the execution core is packed with vector ALUs, instead of the traditional fixed-point ALUs.
Adding SIMD processing power a la Cell looks to me like a better way of capitalizing on growing transistor counts than the more conventional multi-core CPU approach that Intel and AMD are taking. At the very least, it seems clearly better once you get past two conventional CPU cores, when diminishing returns on additional general-purpose cores become a problem. That's the trouble with the approach outlined in the recent Microsoft patent app: the second, third, and fourth CPUs are being asked to do work that might be better handled by a parallel SIMD machine.