Experimental MIT chip brings cache coherency to many cores

CPU core counts are growing. Four or eight cores are pedestrian nowadays, and double-digit core counts are increasingly common in servers. To make communication among large numbers of cores efficient and scalable, researchers, including a group at MIT, have been exploring a design called a "network on a chip," or NoC. At the International Symposium on Computer Architecture last week, the MIT group outlined its first silicon based on the design: an experimental 36-core chip called Scorpio.

A block diagram of the Scorpio CPU. Source: MIT

Why a NoC? According to an earlier press release from MIT and the group's paper, the bus, the most commonly used interconnect between cores, can only scale to about eight or 10 cores before running into power consumption and latency problems. The NoC design replaces the bus with short interconnects between adjacent cores. Each core has its own router, which moves packetized requests across the chip hop by hop, much as routers forward packets under the Internet Protocol.
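To make the hop-by-hop idea concrete, here is a minimal Python sketch of how a packet might travel between routers on a grid of cores. It assumes the 36 cores are laid out as a 6x6 mesh and uses simple dimension-ordered (X first, then Y) routing. Both are illustrative assumptions, and the function names are hypothetical; the article does not describe Scorpio's actual layout or routing algorithm.

```python
# Illustrative sketch only: a 6x6 mesh and X-then-Y routing are assumptions,
# not details taken from the article or confirmed for Scorpio.
MESH_WIDTH = 6  # assumed: 36 cores arranged as a 6x6 grid


def core_to_xy(core_id, width=MESH_WIDTH):
    """Map a core number to (column, row) coordinates on the mesh."""
    return core_id % width, core_id // width


def xy_to_core(x, y, width=MESH_WIDTH):
    """Map (column, row) coordinates back to a core number."""
    return y * width + x


def next_hop(current, dest, width=MESH_WIDTH):
    """Pick the adjacent core to forward to: fix the column first, then the row."""
    cx, cy = core_to_xy(current, width)
    dx, dy = core_to_xy(dest, width)
    if cx != dx:
        cx += 1 if dx > cx else -1
    elif cy != dy:
        cy += 1 if dy > cy else -1
    return xy_to_core(cx, cy, width)


def route(src, dest, width=MESH_WIDTH):
    """Return the list of cores a packet visits, including source and destination."""
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest, width))
    return path


# Corner to corner on the assumed 6x6 mesh takes 10 hops.
print(route(0, 35))
```

Every hop in this sketch crosses only one short link between neighbouring cores, which is the property that lets a NoC avoid the long shared wires of a bus.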

MIT says a major problem with NoCs has been maintaining cache coherency, the issue of "ensuring that cores' locally stored copies of globally accessible data remain up to date." The research group is said to have solved this problem in a novel, scalable fashion by adding a second, "shadow" network that allows an existing cache coherency protocol, "snoopy," to function properly on NoCs. The group's paper indicates that, while Scorpio isn't the first NoC-based CPU, it is the first to implement this cache coherency scheme.
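The article does not explain how the shadow network works. As a rough illustration of why a second network could help, the Python sketch below assumes that snoopy coherence needs every core to see requests in the same order, and that a side channel tells each core which sources sent requests in a given time window. The `Core` class, the window mechanism, and the sort-by-source-ID rule are all hypothetical, not the group's published design.

```python
# Hypothetical sketch: the ordering rule and the "window" concept are
# assumptions for illustration, not details from the article.
def global_order(announced_sources):
    """Order a window's requests by a rule every core can compute on its own."""
    return sorted(announced_sources)


class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.pending = {}    # source core -> request, in arbitrary arrival order
        self.processed = []  # requests in the agreed global order

    def receive_request(self, source, request):
        """A request arrives over the main network, possibly out of order."""
        self.pending[source] = request

    def end_of_window(self, announced_sources):
        """The side channel says who sent requests; process them in the agreed order."""
        for source in global_order(announced_sources):
            self.processed.append(self.pending.pop(source))


# Two cores receive the same two requests in opposite arrival orders.
a, b = Core(0), Core(1)
a.receive_request(20, "write X from core 20")
a.receive_request(3, "read X from core 3")
b.receive_request(3, "read X from core 3")
b.receive_request(20, "write X from core 20")

for core in (a, b):
    core.end_of_window({3, 20})

print(a.processed == b.processed)
```

Under these assumptions, arrival order on the main network no longer matters, because each core derives the same processing order from the side-channel announcement.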

The design appears promising: the group claims an "average application runtime reduction of 24.1% and 12.9% in comparison to distributed directory and AMD HyperTransport coherence protocols, respectively," when running the Splash-2 and Parsec benchmarks on a simulated version of the CPU.

According to MIT, once the prototype silicon is confirmed to be functional, the group will run benchmarks to measure the chip's actual performance and compare it with the simulated results. The architecture will then be released as open source in the Verilog hardware description language.
