I'll admit that I've not spent enough time looking into how some of the top mobile graphics cores actually work. That's true in part because we primarily cover desktop parts, and in part due to limited time. The biggest contributor, though, may be the simple fact that the mobile GPU guys don't tend to share many details of their GPU architectures.
That said, the folks at Imagination Technologies have shipped a ton of GPUs inside of, you know, every iPhone and iPad ever, and they have been somewhat forthcoming recently about how their GPUs look internally. That openness extends to today's announcement of the PowerVR Series7XT GPUs, which are the new high end of the PowerVR lineup and are based on a newly revised incarnation of the PowerVR Rogue architecture. The folks at Imagination have even provided a reasonably useful functional block diagram.
Yep, looks vaguely like a GPU. Some of the most notable features of the Series7XT family aren't readily apparent from a block diagram, though.
Many of those features involve support for Android 5.0 Lollipop and the Android Extension Pack for OpenGL ES 3.1. Most notable among them is new tessellation hardware that should allow for higher-detail worlds in mobile games and graphics. That addition should bring Rogue's feature set up to par with many desktop GPUs. In fact, Imagination says the Series7XT has "optional" support for DirectX 11.
The Series7XT also adds hardware support for GPU virtualization, so a single Series7XT GPU can be shared among multiple virtual machines running on a hypervisor.
In my view, though, the biggest change in the Series7XT is simply one of scale. To understand that, we need to take a closer look at the GPU's architecture, which brings us to the block diagram above. This diagram has amazing potential to confuse and confound, but I think the bottom line is fairly straightforward. Start with the fact that mobile GPUs like this one still tend to employ fp16 precision by default, rather than the fp32 default used by modern desktop graphics. That's a reasonable choice for real-time mobile graphics, I think.
What this diagram is attempting to show us is, in its most detailed form on the left, a single pipeline from a Rogue cluster. I don't think that pipeline really has six "ALU core" units and a special-function unit (SFU). Instead, I think it generally executes four fp16 operations per cycle, possibly in conjunction with an SFU op. If a program asks for 32-bit precision, then you get two fp32 operations instead. I doubt the hardware can co-issue and process a full slate of fp16 and fp32 instructions simultaneously.
Also, I don't think those "FLOP" boxes represent any sort of hardware. I think they're meant to convey the ability of each unit, at peak, to produce floating-point operations. Thanks to the magic of the fused multiply-add, each "ALU core" can process two flops per cycle, max.
Bottom line, then, each superscalar Series7XT pipeline can process the equivalent of a single pixel (plus maybe an SFU operation) in each clock cycle. Whether that's scheduled as four red components or a single RGBA pixel, I can't tell from here. Regardless, these pipelines are aggregated together in 16-wide groups, or clusters. Each of these clusters is pretty beefy, then, with a total of 64 "ALU cores," as Imagination Technologies calls them. Nvidia has taken to calling these same hardware resources "CUDA cores," and AMD still uses "stream processors" most of the time.
Now consider that the Series7XT can scale up to 16 clusters, or 1024 "ALU cores" for fp16 and the equivalent of 512 "ALU cores" for fp32 datatypes. Even the fp32 count, which I believe is the relevant comparison here, is substantially larger than the 192 "CUDA cores" in Nvidia's Tegra K1 SoC. That's simply a statement of scale, not efficiency or throughput, but it helps orient us to the landscape.
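If you want to check my math, here's a quick back-of-the-envelope sketch in Python. The constants are simply my reading of the block diagram as described above (four fp16 or two fp32 operations per pipeline per clock, 16 pipelines per cluster, two flops per fused multiply-add), not figures confirmed by Imagination, and the function names are made up for illustration. It ignores the SFU entirely and says nothing about clock speeds.

```python
# Assumptions taken from the reading of the block diagram in this article,
# not from any official Imagination Technologies specification.
FP16_OPS_PER_PIPELINE = 4   # fp16 operations per pipeline per clock
FP32_OPS_PER_PIPELINE = 2   # fp32 operations per pipeline per clock
PIPELINES_PER_CLUSTER = 16  # pipelines grouped into one cluster
FLOPS_PER_FMA = 2           # a fused multiply-add counts as two flops


def alu_cores(clusters, ops_per_pipeline):
    """Count 'ALU cores' for a GPU with the given number of clusters."""
    return clusters * PIPELINES_PER_CLUSTER * ops_per_pipeline


def peak_flops_per_clock(clusters, ops_per_pipeline):
    """Peak flops per clock, assuming every ALU core issues an FMA."""
    return alu_cores(clusters, ops_per_pipeline) * FLOPS_PER_FMA


if __name__ == "__main__":
    for clusters in (1, 16):
        fp16 = alu_cores(clusters, FP16_OPS_PER_PIPELINE)
        fp32 = alu_cores(clusters, FP32_OPS_PER_PIPELINE)
        print(f"{clusters:2d} cluster(s): {fp16} fp16 cores, {fp32} fp32 cores, "
              f"{peak_flops_per_clock(clusters, FP32_OPS_PER_PIPELINE)} "
              f"peak fp32 flops/clock")
```

Under those assumptions, a 16-cluster part works out to 1024 fp16 and 512 fp32 "ALU cores," and the fp32 figure is roughly 2.7 times the Tegra K1's 192 "CUDA cores." Again, that's a unit count, not a performance claim.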
Of course, smaller implementations are also possible. The entire Series7XT family is detailed here. Meanwhile, Imagination is also taking this architecture into smaller-scale deployments like wearables with the Series7XE family.
This product announcement simply means Series7XT GPUs are available for Imagination Technologies' customers to license, so we probably won't see any solutions based on this IP hitting the market for another six to 12 months. Still, based on what Imagination Technologies is offering its customers, we can probably expect that mobile graphics will continue to scale up at a breakneck pace in the coming years.