If the P4 takes a "narrow and deep" approach to performance and the G4e takes a "wide and shallow" approach, the 970's approach could be characterized as "wide and deep." In other words, the 970 wants to have it both ways: an extremely wide execution core and a 16-stage (integer) pipeline that, while not as deep as the P4's, is nonetheless built for speed. Using a special technique, which we'll discuss shortly, the 970 can have a whopping 200 instructions on-chip in various stages of execution, a number that dwarfs not only the G4e's 16-instruction window but also the P4's 126-instruction one.
You can't have everything, though, and the 970 pays a price for its "more is better" design. When we discuss instruction dispatching and out-of-order execution on the 970, we'll see what tradeoffs IBM made in choosing this design.