Thanks to a paper Intel is presenting at this year's Siggraph conference, we now know more about Intel's in-development graphics processor and do-everything data-parallel computing engine known as Larrabee. Dr. Larry Seiler, Senior Principal Engineer on the Larrabee project, and a number of his co-authors provided an overview of the paper to the press late last week, and in the process, they revealed quite a bit about how Larrabee will work and what Intel's design philosophy looks like. We still don't have enough information to piece together any performance projections, but Intel has detailed quite a bit about the basic building blocks of Larrabee's architecture and about its approach to rendering.
Sad to say, I was working on a write-up about this new Larrabee info, but I just wasn't able to get it done last night. Sometimes in life, you just have to punt, and for various reasons, today is one of those times. But you should not miss out on the Larrabee goods if you're into this stuff. Let me suggest reading the Larrabee write-up by the guys over at AnandTech for more detail on the chip's inner workings. The highlights include:
- An x86-compatible basic processing core derived from the original Pentium. This core has been heavily modified for use in Larrabee to include a 16-ALU-wide vector unit. Each core has L1 instruction and data caches plus a 256KB L2 cache, all fully coherent.
- A 1024-bit ring bus (512 bits in each direction) for inter-core communication. This bus will carry data between the cores and other major units on the chip, including data being shared between the cores' individual L2 cache partitions and the cache coherency traffic needed to keep track of it all.
- Very little fixed-function logic. Most stages of the traditional graphics pipeline will run as programs on Larrabee's processing cores, including primitive setup, rasterization, and back-end frame buffer blending. The major exception here is texture sampling, where Intel has chosen to use custom logic for texture decompression and filtering. Intel expects this approach to yield efficiency benefits by allowing for very fine-grained load balancing; each stage of the graphics pipeline will occupy the shaders only as long as necessary, and no custom hardware will sit idle while other stages are processed.
- DirectX and OpenGL support via tile-based deferred rendering. With Larrabee's inherent programmability, support for traditional graphics APIs will run as software layers on Larrabee, and Intel has decided to implement those renderers using a tile-based deferred rendering approach similar to the one last seen on the PC in the Kyro II chip. My sense is that tile-based deferred rendering can be very bandwidth-efficient, but may present compatibility problems at first since it's not commonly used on the PC today. Should be interesting to see how it fares in the wild.
- A native C/C++ programming mode for all-software renderers. Graphics developers will have the option of bypassing APIs like OpenGL altogether and writing programs for Larrabee using Intel's C/C++-style programming model. They wouldn't get all of the built-in facilities of an existing graphics API, but they'd gain the ability to write their own custom rendering pipelines with whatever features they might wish to include at each and every stage.
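To make the tile-based idea a bit more concrete, here is a minimal C++ sketch of the "binning" step such a renderer typically starts with: the screen is cut into fixed-size tiles, and each triangle is filed under every tile its bounding box touches, so that each tile can later be shaded on its own. This is a generic illustration, not Intel's implementation. The names `Triangle`, `TileGrid`, and `bin_triangles` are hypothetical, and the conservative bounding-box test and the tile size are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; this is not Intel's API.
struct Triangle {
    float x[3];  // screen-space x of each vertex, in pixels
    float y[3];  // screen-space y of each vertex, in pixels
};

struct TileGrid {
    int tile_size = 0;  // tile edge length in pixels
    int tiles_x = 0;    // number of tile columns
    int tiles_y = 0;    // number of tile rows
    // bins[ty * tiles_x + tx] holds the indices of triangles touching that tile.
    std::vector<std::vector<std::size_t>> bins;
};

// Assign each triangle to every tile overlapped by its bounding box.
// This is conservative: a tile may list a triangle that misses it.
TileGrid bin_triangles(const std::vector<Triangle>& tris,
                       int width, int height, int tile_size) {
    assert(width > 0 && height > 0 && tile_size > 0);

    TileGrid grid;
    grid.tile_size = tile_size;
    grid.tiles_x = (width + tile_size - 1) / tile_size;   // round up
    grid.tiles_y = (height + tile_size - 1) / tile_size;  // round up
    grid.bins.resize(static_cast<std::size_t>(grid.tiles_x) *
                     static_cast<std::size_t>(grid.tiles_y));

    for (std::size_t i = 0; i < tris.size(); ++i) {
        const Triangle& t = tris[i];
        const float min_x = std::min({t.x[0], t.x[1], t.x[2]});
        const float max_x = std::max({t.x[0], t.x[1], t.x[2]});
        const float min_y = std::min({t.y[0], t.y[1], t.y[2]});
        const float max_y = std::max({t.y[0], t.y[1], t.y[2]});

        // Skip triangles whose bounding box lies entirely off screen.
        if (max_x < 0.0f || max_y < 0.0f ||
            min_x >= static_cast<float>(width) ||
            min_y >= static_cast<float>(height)) {
            continue;
        }

        // Clamp the bounding box to the screen, then convert to tile coordinates.
        const int tx0 = static_cast<int>(std::max(min_x, 0.0f)) / tile_size;
        const int ty0 = static_cast<int>(std::max(min_y, 0.0f)) / tile_size;
        const int tx1 =
            static_cast<int>(std::min(max_x, static_cast<float>(width - 1))) / tile_size;
        const int ty1 =
            static_cast<int>(std::min(max_y, static_cast<float>(height - 1))) / tile_size;

        for (int ty = ty0; ty <= ty1; ++ty) {
            for (int tx = tx0; tx <= tx1; ++tx) {
                grid.bins[static_cast<std::size_t>(ty * grid.tiles_x + tx)].push_back(i);
            }
        }
    }
    return grid;
}
```

Once the bins are built, a renderer can walk each tile's triangle list independently, which is where the bandwidth savings would presumably come from: a tile's color and depth data can stay in on-chip memory until the tile is finished.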
This Larrabee info is fascinating, because it makes for head-spinning potential in many areas and yet... one could easily see the first products falling well short of existing GPUs in terms of performance and area efficiency in most games and other graphics apps. Custom hardware is awfully difficult to beat.
One potential positive of Larrabee's fully coherent memory subsystem is the possibility of much more efficient multi-chip implementations. Nvidia and AMD are essentially managing the coherency problem manually via custom game profiles for multi-GPU setups right now. When I asked about this issue, Intel said it didn't expect to have the same pain as its competitors in this area.
The possible downsides of all-software implementations of things like the render back-end are also rather apparent. We saw this illustrated nicely when the Radeon HD 4800 series brought a vast improvement over the shader-based MSAA resolve used in the Radeon HD 2900 and 3800 series products.
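For a sense of the work involved, here is a rough C++ sketch of what an MSAA resolve does when it runs as ordinary code rather than in dedicated hardware: every final pixel is the average of its stored samples. This is a generic box-filter resolve written for illustration, not AMD's shader or anything Intel has described. The names `Rgba8` and `resolve_msaa` and the sample layout (all samples for a pixel stored consecutively) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit-per-channel color type for illustration.
struct Rgba8 {
    std::uint8_t r, g, b, a;
};

// Box-filter MSAA resolve: average each pixel's samples into one color.
// Assumes the samples for pixel p occupy indices
// [p * samples_per_pixel, (p + 1) * samples_per_pixel).
std::vector<Rgba8> resolve_msaa(const std::vector<Rgba8>& samples,
                                int samples_per_pixel) {
    assert(samples_per_pixel > 0);
    const std::size_t n = static_cast<std::size_t>(samples_per_pixel);
    assert(samples.size() % n == 0);

    std::vector<Rgba8> pixels(samples.size() / n);
    for (std::size_t p = 0; p < pixels.size(); ++p) {
        std::size_t r = 0, g = 0, b = 0, a = 0;
        for (std::size_t s = 0; s < n; ++s) {
            const Rgba8& c = samples[p * n + s];
            r += c.r;
            g += c.g;
            b += c.b;
            a += c.a;
        }
        // Adding n / 2 before dividing rounds to the nearest integer.
        pixels[p].r = static_cast<std::uint8_t>((r + n / 2) / n);
        pixels[p].g = static_cast<std::uint8_t>((g + n / 2) / n);
        pixels[p].b = static_cast<std::uint8_t>((b + n / 2) / n);
        pixels[p].a = static_cast<std::uint8_t>((a + n / 2) / n);
    }
    return pixels;
}
```

Every sample of every pixel gets read and summed by general-purpose ALUs here, which is time those units can't spend on shading. That is the sort of cost fixed-function back-end hardware avoids.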
Somewhat comically, the initial reactions to the Larrabee architecture disclosures from Intel are mixed along clear dividing lines. When I pinged David Kanter, the CPU guru at Real World Tech, about it, he was very much impressed with the choices Intel had made and generally positive about the prospects. Meanwhile, Rys over at Beyond3D expressed quite a bit of skepticism about the chip's likely performance and area efficiency for graphics versus more traditional architectures.
Handicapping Larrabee's performance is very difficult right now for many reasons, including the fact that we still don't know some key specifications: how many cores Larrabee chips will have, how the memory interfaces will look, the number and capacity of the custom texture units, and what sort of clock speeds to expect. We also don't yet know exactly how Intel will be extending the x86 instruction set for graphics. We should have more answers as Larrabee's release approaches. What we do know for sure is that things are about to get a lot more interesting.