Gen11 graphics promise high-end features for baseline gaming
As Sunny Cove will be the next-generation building block of Intel's general-purpose compute resources, the Gen11 IGP will serve as the next pixel-pushing engine for Ice Lake processors. Intel gave us a high-level look at the GT2 configuration of its Gen11 architecture during its event. For the unfamiliar, GT2 is the middle child of Intel's integrated graphics processors and sits on the die of many of the company's mainstream CPUs.
Most prominently, Intel wants to establish a teraflop of single-precision floating-point throughput as the baseline level of performance users can expect from GT2 configurations of Gen11. The UHD 620 graphics processor found in a broad swath of basic systems on the market today delivers roughly 440 GFLOPS (and yes, that's giga with a G). More than doubling that figure on a platform with as much reach as Intel's integrated graphics processors could bring enjoyable gameplay to a far broader audience than ever before.
To get there, engineer David Blythe says his team set out to cram as much performance as it possibly could into the power envelope available to it. A Gen11 IGP in its GT2 configuration has 64 execution units, up from 24 in Gen9, and squeezing that much shader power into an IGP footprint and maximizing its efficiency was a battle of inches, according to Blythe. The Gen11 team apparently had to go after every small improvement it could in the pursuit of its power, performance and area goals, and that meant touching not just one or two parts of the integrated graphics processor, but every part of it.
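For a rough sense of how the EU counts above map to those throughput figures, here is a back-of-the-envelope sketch in Python. It assumes each EU can retire 16 single-precision FLOPs per clock (two SIMD-4 FPUs, each issuing a fused multiply-add per lane), which matches the roughly 440 GFLOPS quoted for the 24-EU UHD 620 at an assumed 1.15-GHz peak clock. Intel did not disclose Gen11 clocks at the event, so the 1.0-GHz figure and the helper names are purely illustrative.

```python
# Back-of-the-envelope peak FP32 throughput for an Intel IGP.
# Assumption: each EU has two SIMD-4 FPUs, and each lane can issue one
# fused multiply-add (counted as 2 FLOPs) per clock: 2 * 4 * 2 = 16.
FLOPS_PER_EU_PER_CLOCK = 2 * 4 * 2


def peak_gflops(eu_count: int, clock_ghz: float) -> float:
    """Peak single-precision GFLOPS for a given EU count and clock."""
    return eu_count * FLOPS_PER_EU_PER_CLOCK * clock_ghz


def min_clock_ghz(eu_count: int, target_gflops: float) -> float:
    """Lowest clock (GHz) at which eu_count EUs reach target_gflops."""
    return target_gflops / (eu_count * FLOPS_PER_EU_PER_CLOCK)


uhd620 = peak_gflops(24, 1.15)       # assumed 1.15 GHz peak clock
gen11_gt2 = peak_gflops(64, 1.0)     # hypothetical 1.0 GHz clock
teraflop_clock = min_clock_ghz(64, 1000.0)

print(f"UHD 620, 24 EUs at 1.15 GHz: {uhd620:.1f} GFLOPS")
print(f"Gen11 GT2, 64 EUs at 1.00 GHz: {gen11_gt2:.1f} GFLOPS")
print(f"Clock needed for 1 TFLOPS with 64 EUs: {teraflop_clock:.3f} GHz")
```

Under those assumptions, 64 EUs would cross the teraflop mark at a clock just under 1 GHz, which suggests the target depends more on EU count than on aggressive frequencies.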
The net result of that work was a significant reduction in the area of the basic execution unit. Blythe claimed that implementing a Gen9 EU and a Gen11 EU on the same process would put the Gen11 EU at 75% of the area of its predecessor, which partially explains how the team was able to pack so many more of those units into the undisclosed area allocated for GT2 configs of Gen11 on Ice Lake.
In pursuit of both power savings and higher performance, Gen11 supports a form of tile-based rendering in addition to its immediate-mode renderer. According to Blythe, certain pixel-limited workloads benefit greatly from the ability to keep their data local to the graphics processor, and by invoking the tile-based renderer, those applications can save 30% memory bandwidth and therefore power from the uncore of the processor. In turn, the Gen11 GPU can take the juice saved that way and turn it into higher frequency on the shader pipeline. The tile-based renderer can be dynamically invoked as needed during the course of shading pixels and left off when it's not needed.
To keep more data closer to those execution units, Gen11 has a much, much larger L3 cache than Gen9. Blythe says that the GT2 configuration of Gen11 has a 3-MB L3 cache, four times the size of the one in the GT2 implementation of Gen9 and larger in absolute terms than the 2.3-MB L3 in even the highest-performance GT4 implementation of Gen9.
Other improvements in the memory subsystem of the Gen11 IGP include better lossless memory compression, a common focus of improvement for making the most of available memory bandwidth in graphics processors both large and small. Blythe says the Gen11 compression scheme is up to 10% more effective at its best, but real-world gains are more likely to land around 4% on a geometric-mean measure.
The Gen11 team also separated the per-slice shared local memory in Gen11 from the L3 cache. That structure is now its own per-slice private allocation, and each of those blocks of memory has its own data path to allow the IGP to get better parallelism out of L3 cache accesses and inter-IGP memory accesses. Finally, the Graphics Technology Interface (GTI) that joins the integrated graphics processor with the rest of the CPU is now capable of performing reads and writes at 64 bytes per clock.
While Nvidia's Turing architecture might boast the first practical implementation of the ability to vary shading rates in a scene on a fine-grained basis, Intel points out that it invented the idea of what it calls coarse pixel shading. The company claims to have published a paper on the concept as far back as 2014. Now, that technique will be available to programmers on Gen11 graphics processors.
While Intel and Nvidia's implementations of variable-rate shading likely differ in granularity, the point of the technology remains the same on Gen11 as it is on Turing: to avoid performing shading work that doesn't result in appreciable increases in detail for parts of the scene that might not need it. Intel has so far implemented two techniques using CPS: a global coarse-pixel-shading setting and a radial falloff function that resembles foveated rendering. The company notes that the algorithm is also available on a draw-call-by-draw-call basis.
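To make the radial-falloff idea concrete, here is a minimal Python sketch of how a shading rate might be chosen from a pixel's distance to the center of the screen. It is not Intel's algorithm or API. The function names, the two radius thresholds, and the restriction to square 1x1, 2x2, and 4x4 coarse pixels are all assumptions made for illustration.

```python
import math


def coarse_pixel_size(x, y, width, height,
                      inner_radius=0.35, outer_radius=0.75):
    """Pick a coarse-pixel side length (1, 2 or 4) for screen position (x, y).

    Distance is normalized so the screen center is 0.0 and a corner is 1.0.
    The radius thresholds are hypothetical tuning knobs.
    """
    cx, cy = width / 2, height / 2
    r = math.hypot(x - cx, y - cy) / math.hypot(cx, cy)
    if r < inner_radius:
        return 1   # full-rate shading near the center
    if r < outer_radius:
        return 2   # one shader invocation per 2x2 pixels
    return 4       # one shader invocation per 4x4 pixels at the edges


def relative_shading_cost(width, height, step=8):
    """Estimate pixel-shader invocations relative to full-rate shading.

    Samples the screen on a coarse grid; each sample contributes
    1 / size^2 because one invocation covers size * size pixels.
    """
    total = 0.0
    count = 0
    for y in range(0, height, step):
        for x in range(0, width, step):
            size = coarse_pixel_size(x, y, width, height)
            total += 1.0 / (size * size)
            count += 1
    return total / count


cost = relative_shading_cost(1920, 1080)
print(f"Relative pixel-shading work with radial falloff: {cost:.2f}")
```

The printed fraction only counts pixel-shader invocations. Actual frame-rate gains depend on how pixel-bound a scene is, which is why real speedups tend to be smaller than the reduction in shading work.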
The company's demos of coarse pixel shading covered two potential ways the tech can be used. One was a synthetic, pixel-bound case where the software chose the shading rate based on distance from the camera, using level-of-detail characterizations on a per-object basis. In this demo, employing coarse pixel shading offered as much as a 2x boost in performance, but the company admitted that this was a best-case scenario.
Intel also showed an Unreal Engine demo with the radial falloff filter it had developed. In that case, CPS delivered a more modest 1.3x-1.4x improvement over the base case without CPS. Like Nvidia, Intel says its coarse pixel shading API is simple and easy to integrate, so we'll be curious to see how much adoption this technology gets and how developers might choose to use it in the real world.
Gen11 is the first Intel graphics processor with support for the long-promised and long-awaited VESA Adaptive Sync standard. Variable-refresh-rate displays are a mature technology at this point, but it's still welcome to see relatively modest graphics processors like GT2 driving compatible monitors in a tear-free fashion. Intel also claims that its Adaptive Sync-compatible IGPs will include desirable features like low framerate compensation from the get-go.
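For readers unfamiliar with low framerate compensation, the general idea can be sketched as follows: when the game's frame rate drops below the panel's minimum refresh rate, each frame is shown multiple times so the effective refresh rate lands back inside the supported range. The Python below illustrates that general concept, not Intel's driver logic. The function name and the example panel range are hypothetical, and the sketch assumes the panel's maximum refresh rate is at least twice its minimum.

```python
def lfc_refresh(frame_rate_hz, panel_min_hz, panel_max_hz):
    """Return (repeat_count, refresh_hz) for a variable-refresh panel.

    Assumes panel_max_hz >= 2 * panel_min_hz so that a whole-number
    repeat count can always bring the refresh rate back into range.
    """
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    if panel_max_hz < 2 * panel_min_hz:
        raise ValueError("panel range too narrow for this sketch")
    if frame_rate_hz >= panel_max_hz:
        # Faster than the panel can refresh: cap at the maximum rate.
        return 1, panel_max_hz
    repeat = 1
    # Show each frame more times until the refresh rate is in range.
    while frame_rate_hz * repeat < panel_min_hz:
        repeat += 1
    return repeat, frame_rate_hz * repeat


# Hypothetical 48-144 Hz panel.
for fps in (20, 30, 60, 200):
    repeat, hz = lfc_refresh(fps, 48, 144)
    print(f"{fps} FPS: show each frame {repeat}x, panel refreshes at {hz} Hz")
```

On this hypothetical panel, a 20-FPS game would have each frame shown three times for a 60-Hz refresh, keeping the display in its variable-refresh window instead of falling back to tearing or fixed-rate vsync.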
Overall, the GT2 implementation of Gen11 and its promise of usable gaming performance, combined with its modern display features and likely-to-be-egalitarian positioning, could introduce a broad audience to some features that only high-end graphics cards enjoy today.