The A8 die shots show an integrated GPU with four "cores," and each pair of GPU cores appears to share a major logic block between them. The most likely candidate for the new iPhone GPU, far and away, is the PowerVR GX6450 from Imagination Technologies. Apple has long relied on PowerVR GPUs for its iOS devices, and the GX6450 is a four-core product from the PowerVR Series6XT family based on the "Rogue XT" architecture.
| ||iPhone 5||iPhone 5S||iPhone 6||iPhone 6 Plus|
|SoC||Apple A6||Apple A7||Apple A8||Apple A8|
|GPU die area||20.7 mm²||22.1 mm²||19.1 mm²||19.1 mm²|
|GPU||PowerVR SGX 543MP||PowerVR G6430||PowerVR GX6450||PowerVR GX6450|
|Est. clock speed||~280 MHz||~430 MHz||~430 MHz||~475 MHz|
|Texture filtering||6 texels/clock||8 texels/clock||8 texels/clock||8 texels/clock|
|Pixel fill||6 pixels/clock||8 pixels/clock||8 pixels/clock||8 pixels/clock|
|System memory||1GB LPDDR2||1GB LPDDR3||1GB LPDDR3||1GB LPDDR3|
The Rogue XT architecture is a bit unusual compared to most conventional GPUs because it's built around a tile-based deferred rendering (TBDR) method. We've reviewed "tiler" GPUs of this sort before, but it's been a while (unless, you know, there's something Nvidia isn't telling us). TBDR rejiggers the order of the traditional graphics pipeline in order to ensure that the GPU only spends its cycles shading and texturing pixels that will appear visible in the final frame being produced. In theory, at least, these GPUs ought to be very efficient with their resources. That's probably why Imagination Technologies has been a big player in a mobile SoC world defined by strict constraints.
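To make that reordering concrete, here is a toy sketch in Python. It is not a model of the actual PowerVR hardware: it assumes opaque geometry only, and the function names and fragment tuples are invented for illustration. The immediate-mode path shades every fragment that passes the depth test at the moment it arrives, while the deferred path resolves visibility for the whole tile first and then shades each covered pixel once.

```python
# Toy comparison of shading work: immediate-mode vs. tile-based deferred.
# Hypothetical model: each fragment is (x, y, depth, color); a smaller depth
# is closer to the camera; all geometry is opaque.

def shade_immediate(fragments):
    """Depth-test each fragment as it arrives and shade it if it passes."""
    depth = {}
    color = {}
    shaded = 0
    for x, y, z, c in fragments:
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            color[(x, y)] = c  # the "shading" work happens here
            shaded += 1
    return color, shaded

def shade_deferred(fragments):
    """Resolve visibility for the tile first, then shade each pixel once."""
    nearest = {}
    for x, y, z, c in fragments:
        if (x, y) not in nearest or z < nearest[(x, y)][0]:
            nearest[(x, y)] = (z, c)
    color = {pos: c for pos, (z, c) in nearest.items()}  # one shade per pixel
    return color, len(color)

# Three opaque layers over a 2x2 tile, submitted back to front (worst case).
tile = [(x, y) for x in range(2) for y in range(2)]
fragments = [(x, y, z, f"layer{z}") for z in (3, 2, 1) for (x, y) in tile]

img_a, work_a = shade_immediate(fragments)
img_b, work_b = shade_deferred(fragments)
print(work_a, work_b)   # 12 shaded fragments vs. 4
print(img_a == img_b)   # True: both produce the same final image
```

Back-to-front submission is the worst case for the immediate path. With front-to-back ordering, the two approaches would do similar amounts of shading in this toy model.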
Imagination Technologies has been reasonably forthcoming about the guts of its graphics IP recently, so we have a fairly good idea how the various iPhone GPUs ought to stack up (although Apple may have modified some of that IP in ways we don't know). I think we can trust the basic per-clock GPU throughput numbers above. The GPU clock speeds are easy enough to estimate by testing delivered performance, as we've done below, and working backward.
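The working-backward step is just a division. Here is a minimal sketch, assuming a directed test that gets close to the GPU's peak rate. The measured figure below is a made-up input chosen for illustration, not one of our results, and the helper name is hypothetical.

```python
# Back out a clock estimate from a measured rate and a known per-clock rate.
def estimate_clock_mhz(measured_per_second, units_per_clock):
    """Return the clock speed in MHz implied by a measured throughput."""
    return measured_per_second / units_per_clock / 1e6

# Hypothetical measurement: 3.44 billion texels per second.
measured_texels_per_s = 3.44e9
texels_per_clock = 8  # per-clock texture rate from the table above

clock = estimate_clock_mhz(measured_texels_per_s, texels_per_clock)
print(f"~{clock:.0f} MHz")
```

Since real tests rarely hit the theoretical peak, an estimate like this is better read as a floor than as an exact frequency.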
Apple has touted a substantial graphics performance increase for the new iPhones, and such things are usually achieved by making the GPU wider. You'll notice in the table above that the area of the A8's die dedicated to the GPU has only dropped by a few square millimeters, in spite of the process shrink from 28 to 20 nm. Clearly, the amount of GPU logic present has grown. Curiously, though, the peak shader arithmetic, texturing, and pixel fill figures haven't increased from the A7.
Chalk up that anomaly to the direction Imagination Technologies took with its PowerVR Series6XT. The PowerVR GX6450 has many of the same theoretical peak rates as the G6430 before it, but there's more going on under the covers. Although the fp32 math rate is the same, the Rogue XT shader core can deliver 25% more fp16 flops than its predecessors. The on-chip SRAM pools for tile buffers, the register file, and the caches have grown in size so that the existing graphics units can be more fully utilized. Rogue XT also adds support for the ASTC texture compression algorithm first developed by ARM. Thanks to these changes, the GX6450 should improve delivered performance even without an increase in peak rates. (For those interested in the gory details of the Rogue shader units, I've written a little about them here.)
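The die-area argument can be roughed out with some back-of-the-envelope math. The sketch below assumes ideal area scaling with the square of the node name, which real processes don't achieve, so treat the result as a loose upper bound on how much the logic grew rather than a measurement.

```python
# Rough estimate of GPU logic growth from the A7 to the A8.
# Assumption: area scales with the square of the process node name.
old_nm, new_nm = 28, 20
a7_gpu_mm2, a8_gpu_mm2 = 22.1, 19.1  # GPU die areas from the table above

ideal_scale = (new_nm / old_nm) ** 2   # fraction of area left after the shrink
a7_on_20nm = a7_gpu_mm2 * ideal_scale  # A7 GPU area if ported unchanged
growth = a8_gpu_mm2 / a7_on_20nm       # implied increase in GPU logic

print(f"ideal scale factor: {ideal_scale:.2f}")
print(f"A7 GPU at 20 nm:    {a7_on_20nm:.1f} mm²")
print(f"implied growth:     {growth:.2f}x")
```

Under that idealized assumption, the A8's GPU occupies well over one and a half times the area a straight shrink of the A7's GPU would need, which fits with the larger SRAM pools described above.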
These first two tests stress key graphics rates for texturing and shading. The A8 more or less keeps pace with the Adreno 330 GPU in the LG G3 and OnePlus One in terms of texturing throughput, but the new iPhones fall behind in a directed test of shader arithmetic. (I get a kick out of writing about Qualcomm's GPUs, since "Adreno" is an anagram of Radeon, revealing its roots at ATI.)
The Tegra K1 in the Shield tablet is both figuratively and literally in another class.
Alpha blending is more of a classic graphics sort of thing to do, and in this workload, the new iPhones suddenly look to be more competitive.
As I understand it, this benchmark attempts to measure driver overhead by issuing a draw call, changing state, and doing it again, over and over. Performance in this test may end up being gated by CPU throughput as much as anything else. That fact could, at least in part, help explain the iPhones' big lead here. Driver overhead is a significant part of the overall performance picture in 3D gaming, so this result is relevant regardless of the primary constraint involved.
All three of these tests are rendered off-screen at a common resolution, so they're our best bet for cross-device GPU comparisons. They're also more complete benchmarks than the directed tests above, since they involve rendering a real scene that could plausibly come from a mobile 3D game. The older iPhones can't run GFXBench's "Manhattan" test because it requires OpenGL ES 3.0 compliance.
As soon as it gets its claws into this sort of workload, the A8's GPU looks quite a bit stronger than it does in synthetic tests of ALU and texturing rates. The delivered performance and efficiency of the GPU in the new iPhones is quite good—and according to GFXBench, at least, the GX6450 is indeed a substantial step up from the G6430 in the iPhone 5S.
The iPhone 4, uh, was good for its era, but I'm not waiting for those benchmarks to finish ever again.
Native device resolution gaming
Devices with higher-resolution displays will have to push more pixels in order to deliver the same frame rendering times as their lower-res competition. The tests above give us a look at how these systems fare when asked to light up all of their pixels. Although the 6 Plus's GPU appears to be clocked somewhat faster than the iPhone 6's, its extra juice isn't enough to make up entirely for the 6 Plus's higher display resolution. In one of the two tests, at least, the 6 Plus is faster than the iPhone 5S.
More importantly, my 6 Plus certainly runs Infinity Blade III smoothly.
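The resolution handicap is easy to put in numbers. The sketch below uses the panel resolutions from Apple's published specs, which aren't part of our test data, along with the estimated GPU clocks from the table earlier. It ignores any internal scaling the 6 Plus may apply, so take it as a rough comparison only.

```python
# Compare the pixel load of each display with the estimated GPU clock.
# Panel resolutions are taken from Apple's published specs (an outside input).
panels = {
    "iPhone 5S": (1136, 640),
    "iPhone 6": (1334, 750),
    "iPhone 6 Plus": (1920, 1080),
}
est_clock_mhz = {"iPhone 5S": 430, "iPhone 6": 430, "iPhone 6 Plus": 475}

pixels = {name: w * h for name, (w, h) in panels.items()}

pixel_ratio = pixels["iPhone 6 Plus"] / pixels["iPhone 6"]
clock_ratio = est_clock_mhz["iPhone 6 Plus"] / est_clock_mhz["iPhone 6"]

for name, count in pixels.items():
    print(f"{name}: {count:,} pixels at ~{est_clock_mhz[name]} MHz")
print(f"6 Plus vs. 6: {pixel_ratio:.2f}x the pixels, {clock_ratio:.2f}x the clock")
```

The 6 Plus has roughly twice as many pixels to fill as the iPhone 6 but only about a tenth more clock speed to do it with.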
Unfortunately, the iOS version of Basemark X runs on-screen and off-screen tests and then spits out only a single, composite score. I wish we could break out the component tests, especially since this benchmark walks through a nice-looking scene rendered using the popular Unity game engine.
One other feature of Basemark X is an intriguing quantitative test of graphics image quality.
Real-time graphics is strange because there's not always one "right" answer about the color of a rendered pixel. Filtering methods and degrees of mathematical precision vary, since GPU makers take different shortcuts to trade off image quality against performance.
Basemark X attempts to quantify the fidelity of a GPU's output by comparing it to some ideal. In this case, I believe the reference renderer is a desktop-class graphics chip. That's a fair standard, since desktop chips these days produce something close to ideal imagery. The higher the signal-to-noise ratio reported, the closer these mobile GPUs come to matching the reference image.
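I don't know the exact formula Basemark X uses, so the sketch below assumes peak signal-to-noise ratio (PSNR), a common way to score one image against a reference. The pixel values are made up, and the function is an illustration rather than the benchmark's own code.

```python
import math

def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of samples")
    # Mean squared error between the reference and the test image.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical 8-bit samples: one close match and one rougher match.
reference = [10, 50, 90, 130, 170, 210]
close = [10, 51, 90, 129, 170, 210]
rough = [14, 46, 95, 125, 174, 205]

print(f"close: {psnr(reference, close):.1f} dB")
print(f"rough: {psnr(reference, rough):.1f} dB")
```

As with the benchmark's reported score, a higher number here means the output sits closer to the reference.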
Frustratingly, a couple of the devices refused to run the quality test with "out of memory" errors. Among those that did run the test, the Tegra K1 in the Shield tablet comes out on top. The other mobile GPUs are pretty closely bunched together after that. I suspect the Tegra K1's score is the same in the regular and high-precision versions of the test because its GPU always renders everything using fp32 precision internally, even if the application doesn't request high precision.
The flip side of that coin is what happens with the PowerVR and Adreno GPUs in the high-precision test. They all hit a ceiling at about the same place, well below the Shield Tablet's score, even though their shader ALUs are capable of fp32 precision when requested. I suspect the limitation here isn't in the shader ALUs, but in other graphics-focused hardware, interpolators and such, whose internal precision may not be up to snuff.
This limitation isn't a problem for mobile graphics in its current state. Both the iPhones and the Qualcomm-based devices produce rich visuals without any obvious artifacts in today's games. But mobile GPUs may need to gain more consistent precision, like desktop GPUs did in the DX11 generation, going forward. Games may require added precision as developers layer on ever more complex effects, and GPU computing applications will probably require it, as well.
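For a sense of what reduced precision costs, here is a small Python sketch that rounds a value through fp16 and fp32 storage using the standard `struct` module. It says nothing about the actual internal precision of any of these GPUs, which we don't know. It only shows how much wider the rounding error gets at half precision.

```python
import struct

def to_fp16(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to IEEE single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

value = 0.1  # arbitrary example input, e.g. a texture coordinate
err16 = abs(to_fp16(value) - value)
err32 = abs(to_fp32(value) - value)

print(f"fp16 error: {err16:.2e}")
print(f"fp32 error: {err32:.2e}")
```

Errors of this size are invisible in a single operation, but they can accumulate as effects are layered on, which is the concern for future games and GPU computing workloads.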