To sum up, IBM has sort of reapplied the RISC approach of throwing control logic overboard in exchange for a wider execution core and a larger storage area situated closer to that core. The difference is that instead of the compiler taking up the slack (as in RISC), the slack is taken up by a combination of the compiler, the programmer, some very smart scheduling software, and a general-purpose CPU doing the kind of scheduling and resource-allocation work that the control logic used to do.

I don't want to make too much of this, because there's much left to learn about Cell and there are some real differences here, but those eight SIMD execution units sure remind me of the guts of a modern GPU, which is also a parallel SIMD machine. (I've been on this GPU-CPU collision-course kick for a while now.) Here's how Stokes describes a Cell SIMD execution core, or SPE:
The actual architecture of the Cell SPE is a dual-issue, statically scheduled SIMD processor with a large local storage (LS) area. In this respect, the individual SPUs are like very simple, PowerPC 601-era processors.

I believe NVIDIA's NV4x pixel shader unit is also a dual-issue SIMD machine that operates on 128-bit vectors of four 32-bit elements, also known as pixels (or fragments). Like I said, there are no doubt real differences between the two types of processing units, but the similarity is substantial, and GPUs are becoming more and more general in their computational capabilities.
The main differences between an individual SPE and an early RISC machine are twofold. First, and most obvious, is the fact that the Cell SPE is geared for single-precision SIMD computation. Most of its arithmetic instructions operate on 128-bit vectors of four 32-bit elements, so the execution core is packed with vector ALUs instead of the traditional fixed-point ALUs.
Adding SIMD processing power a la Cell looks to me like a better way of capitalizing on growing transistor counts than the more conventional multi-core CPU approach that Intel and AMD are taking. At the very least, it seems obviously better once you get past two conventional CPU cores, when diminishing returns on additional general-purpose cores will become a problem. That's the trouble with the approach outlined in the recent Microsoft patent app: the second, third, and fourth CPUs are being asked to do work that might be better handled by a parallel SIMD machine.