Live blog: IDF 2011 Justin Rattner keynote

Time for one last live blog from the Intel Developer Forum in San Francisco.  Today Justin Rattner gives an update on the future: Intel’s R&D efforts.

We’re starting with a cheesy video, in which the military has discovered a 48-core chip and is deeply concerned.

They’ve found the chip’s programmer and are interrogating him about how the heck he parallel programmed the chip.  He explains there are lots of tools now, even JavaScript!


Ladies and gentlemen, please welcome Justin Rattner.

And he’s wearing a beret!  Nice.

He offers us his Mooly impersonation.  Which is… short.

Five years ago today, I stood on this stage to give the opening keynote. It’s usually the CEO thing, and I’m not vying for the CEO thing, so have no fear.  But I was onstage to introduce the Core architecture.  At that time, we talked about slowing down some cores to speed up others.  We’ve come a very long way in five years.

We now have heterogeneous processors, with GPUs onboard.  Soon, Intel will introduce Knights Corner, Intel’s first many-core general-purpose processor.

Programming the cores is easy.  Familiar memory model, familiar instruction set.

We also launched our terascale effort, Intel’s many-core research program.  We’ve built a few experimental processors in Intel Labs: the 80-core chip, and the 48-core single-chip cloud computer that tests out what a future server processor might look like.

 And we’ve been busy creating better tools for programming many-core architectures.

 We haven’t limited our scalability tests for many-core to HPC apps.  We’re testing lots of different types.  Across this very large range of applications, we’re seeing excellent scalability.  Don’t think there’s anything on this chart with less than a 30X speedup.  Gives us a lot of confidence that people are going to be able to put this architecture to work and take performance to higher levels.
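For a sense of what sustained 30X speedups imply, Amdahl's law bounds speedup by the serial fraction of the code: speedup = 1 / ((1 − p) + p/n) for parallel fraction p on n cores. A quick sketch (my own illustration, not a chart from the keynote):

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup when a fraction p of the runtime
    parallelizes perfectly across n cores (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# To clear 30X on a 48-core chip, the serial share must be ~1% or less:
for p in (0.90, 0.99, 0.999):
    print(f"p={p}: {amdahl_speedup(p, 48):.1f}X")
```

At p = 0.90 you top out under 9X no matter how many cores you add, which is why seeing 30X+ across a whole chart of apps is notable.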

Andrzej Nowak, a particle physicist from CERN openlab, is onstage to talk with us.  He works on optimizing performance for many-core.

Let’s talk about the large hadron collider.  It operates at a temp cooler than outer space, at 1.8 Kelvin.  And… we’re going to see a video about it.

Currently yearly data production from the LHC is over 15 petabytes. The real challenge is processing the data later.  We’ve built a grid with 250K Intel processing cores.  It’s spread across the world.

Processing takes place in four major domains.  We simulate the physics and see if the behavior in the collider matches with what we know.

We can use the same toolset we use for Xeon with the MIC, which is nice.  Here’s a look at some code that Intel helped us visualize.

We use a single MIC core here and can visualize the program running.  Now, we’ve engaged all 32 cores of the MIC on this other machine.  What would take minutes on a single core takes seconds on the MIC.
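The "same code, more cores" pattern in the demo is just a parallel map over independent work items; a generic Python sketch with `multiprocessing` (a stand-in workload of my own, not the CERN/MIC toolchain):

```python
from multiprocessing import Pool

def simulate_event(seed):
    # Stand-in for one physics-event simulation: a cheap,
    # deterministic integer recurrence so the example is self-contained.
    x = seed
    for _ in range(10_000):
        x = (x * 1103515245 + 12345) % 2**31
    return x

if __name__ == "__main__":
    with Pool() as pool:                  # one worker per core by default
        results = pool.map(simulate_event, range(256))
    print(len(results), "events processed")
```

Because each event is independent, the identical call scales from one core to all of them just by changing the pool size.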

We’re looking forward to further versions of the MIC, and we’ll take as many cores as we can get.

Rattner: Make me one promise.  Don’t make any little black holes that suck us all in.

Openlab dude: Ok, if you promise me more cores.


And the Openlab guy is finished.

Rattner: Do you have to be a ninja programmer to write many-core applications?

Loud gong sound echoes through the hall.  To no laughter whatsoever.

Don’t worry.  We’re not going to bring ninjas back onstage.

(Phew.  Can we banish additional ninja jokes, too?  Parallelism isn’t always good.)

Billy is here to talk about improving the access of large-scale content in the cloud.

Billy says what folks have done with legacy databases for the cloud is just moving the entire database into RAM.

Best transaction rates today are about 560K transactions/second.  With MIC, we can go over 800K queries/sec with lower latency.

Brendan Eich, CTO of Mozilla, is onstage to talk about parallel JavaScript.  He created JavaScript.

“I had 10 days in May 1995 to make it.” Has grown into a mature, widely used language.

JavaScript so far on the client is predominantly sequential, but we’d like to use multiple cores.  Tatiana is going to show us parallel JavaScript.  River Trail is the code name of the work.  It’s running a physical simulation at 3 FPS on one thread.  Pop to all of the cores, and it speeds up to 45 FPS.

She says using this should be “quite easy” because it’s just JavaScript extended to add parallelism in an easy way.  Is available to developers on

Brendan says he’s going to promote this at standards bodies.
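River Trail's core idea, as presented, is data-parallel operations over arrays whose per-element kernels are pure, so the runtime can safely spread them across cores. A rough Python analogy of that style (illustrative only, not the River Trail API):

```python
def step_particle(p, dt=0.016):
    """Pure per-element kernel: advance one (position, velocity)
    particle by one ~16 ms frame under gravity. It writes no shared
    state, which is what makes the map safe to parallelize."""
    x, v = p
    v = v - 9.8 * dt
    return (x + v * dt, v)

particles = [(float(i), 0.0) for i in range(1000)]

# A sequential map; a data-parallel runtime could run the same pure
# kernel on every core and get identical results.
particles = list(map(step_particle, particles))
```

The physics demo's 3 FPS to 45 FPS jump comes from exactly this: one pure kernel, mapped over many elements at once.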

Moving on, we’re asking whether an LTE base station can be built out of a multi-core PC.  Had an idea, along with our friends at China Mobile, that it might be possible to turn a standard PC into a base station.  Entered into an agreement with them two years ago in order to do this.  Architecture is interesting.  Key idea is that all that sits at the cell tower is the RF front end.  Radio signals are digitized, then moved over a fiber network to a data center.  It’s kind of like base stations in the cloud.

Dude from China Mobile is onstage to demo.  He’s armed with a quad-core Sandy Bridge desktop and a pretty thick accent.  Says they’re using AVX instructions to do signal processing.  With lots of optimization, can handle real-time requirements.  And the workload isn’t even using all of the computing power.

Rattner: Tell me how you dealt with the real-time issues.

Software is real-time patched Linux.  In about 3ms, is able to respond.  Can stream video over it.  Next year, will begin field trials with China Mobile.

Rattner: We’re not just looking at base stations, but high-volume network equipment and switches using standard components from computers.

Now, Dave is going to tell us how we can use the power of multi-core for security.

Folks have perked up, sitting on the edge of their seats.

(Kidding.  Kidding.)

Dave says “I love this demo, because you can actually see the cryptography.”  There are… pictures of people onscreen.  Not sure I see the crypto.

Oh, some pictures look like static.

They use a webcam as a biometric security gate to determine which pictures the user can access.  Changes depending on who’s on camera, using facial recognition.

Dave’s finished, and Justin reminds us you don’t need to be a ninja programmer for MIC programming.

What lies beyond multi-core computing?  Extreme scale computing.

Our 10-year goal is to achieve a 300X improvement in energy efficiency for computing.  Equal to 20 picojoules per FLOP at the system level.
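That target is easy to sanity-check: power is just operation rate times energy per operation, so an exaFLOP/s machine at 20 pJ/FLOP would draw 20 MW. Back-of-the-envelope (my arithmetic, not a keynote figure):

```python
def system_power_megawatts(flops_per_sec, pj_per_flop):
    """Power = rate * energy per operation; 1 pJ = 1e-12 J, 1 MW = 1e6 W."""
    return flops_per_sec * pj_per_flop * 1e-12 / 1e6

# An exascale machine (1e18 FLOP/s) at the stated 20 pJ/FLOP target:
print(round(system_power_megawatts(1e18, 20), 6), "MW")  # 20.0 MW
```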


Extreme scale guru whose name I missed is here to talk about… extreme scale computing.

Today, we operate a transistor at several times its threshold voltage.  One thing we can do is reduce the supply voltage and bring it closer to threshold.

Claremont: a Pentium-class processor running near threshold voltage.  This is the one from the solar-powered system demo on Tuesday.  We’re operating within a couple of hundred millivolts of threshold voltage. 

This is a 5X improvement in power efficiency, but could have gotten ~10X with a newer core.
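That 5X squares with first-order CMOS scaling: dynamic energy per operation goes as CV², so cutting supply voltage from a nominal ~1.0 V to a near-threshold ~0.45 V (illustrative numbers of mine, not Intel's) buys roughly a 5X energy win, at the cost of the lower clock speed mentioned here:

```python
def energy_per_op_ratio(v_nominal, v_scaled):
    """First-order CMOS dynamic energy per operation scales as V^2,
    so voltage scaling improves it by (v_nominal / v_scaled)^2."""
    return (v_nominal / v_scaled) ** 2

# Illustrative: nominal ~1.0 V down to near-threshold ~0.45 V
print(f"{energy_per_op_ratio(1.0, 0.45):.1f}X")  # 4.9X
```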

It’s so old, we went on eBay looking for a Pentium motherboard for it.

How do we turn this into a higher performance system?  Scales to over 10X the frequency when running at nominal supply voltage.

It’s running Quake!  Slowly.  Heh.

So could see future ultra-low-power devices with wide dynamic operating range.

New prototype: a hybrid DRAM stack.  About 4-8 dies can be stacked.  These are 4-high.  Stacked memory is very high efficiency.  A terabit-per-second demo, supposedly very energy efficient, but I don’t see any info on voltage or power draw.  Hrm.

And we’re finished with the cool power guru.

Rattner: What we’ve been talking about today is the future.  We have something called the Tomorrow Project.  Brian David Johnson, our futurist, is gonna talk about it… on video.

Voiceover with lots of graphics that look like Tron.  Although no light cycles. 🙁

We’re talking to dignitaries, thinkers, sci-fi authors.  Want to invite you all to join the conversation by visiting the website for it.

“If you can dream it, we can invent it together.  Thank you, and see you next year.”

Annnnnd, that’s that.  We’ll take one of those near-threshold-voltage computers for review, please.  Thanks.

Comments closed
    • ronch
    • 8 years ago

    This guy’s trying too hard to be a Mooly.

    • srg86
    • 8 years ago

    I thought this near threshold CPU must have been a Pentium 1 class chip, the previous keynote pics have the old Intel PCIset chipset in them! I’m guessing either a 430FX or VX.

    • NeelyCam
    • 8 years ago

    That memory cube thingy looks quite brilliant. DRAM processes are geared towards low cost and low leakage, so transistors on these are slow and can’t drive mobo traces between the memory and the memory controller fast or power-efficiently.

    A high-speed buffer chip can drive those traces much faster and more efficiently using serial approaches like PCIe or QPI/HT. Once you put it together with the memory stack with hundreds of TSVs, you can increase the parallelism of the DRAM->buffer link, running each (very very short) trace slower and more power efficiently while keeping the total throughput high (sort of like in WideIO).

    You can probably run the DRAM chips at much lower voltages because their I/O speed requirements are much relaxed… saving even more power. Power reductions and bandwidth increases all around. Pure win.

    [url<],13277.html[/url<] [url<][/url<] making them drive long traces on motherboards burns a lot of power and limits how fast they can go.

    • dpaus
    • 8 years ago

    What’s with the ninja fetish??

    EDIT: and if Dr. Evil can have sharks with frikkin’ lasers on their head, why doesn’t Intel have ninjas with berets?

    (trick question: the answer is that Intel doesn’t have as much money as Dr. Evil)
