MetricT
Gerbil In Training
Topic Author
Posts: 8
Joined: Mon Sep 23, 2013 6:21 am

Question on long-term CPU architecture

Wed Dec 18, 2013 2:03 pm

For years, CPU manufacturers used Moore's Law to scale up chip frequency. That drove single-core speeds upward for decades, until around 2005, when Intel hit a thermal wall near 4 GHz per core. Since then, single-core clock speeds haven't scaled nearly as fast with process shrinks. That's a big reason why per-core speed from Sandy Bridge -> Ivy Bridge -> Haswell -> Broadwell increased only ~10% each generation.

Next they started increasing the number of cores per CPU. That helped a bit, but they're starting to hit diminishing returns there too. Most consumer-level software doesn't benefit beyond a core or two; you have to substantially rewrite your code to take advantage of multiple cores, and even then most normal programs won't scale well beyond 4-8 cores due to Amdahl's Law.
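
To put a rough number on that ceiling, here's a quick Amdahl's Law sketch; the 90% parallel fraction is just an assumed figure for illustration:

```c
/* Amdahl's Law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
 * parallel fraction of the program and n the core count. p = 0.90 is
 * an assumed figure, purely for illustration. */
#include <stdio.h>

int main(void)
{
    const double p = 0.90;
    const int cores[] = {1, 2, 4, 8, 16, 64, 1024};

    for (size_t i = 0; i < sizeof cores / sizeof cores[0]; i++) {
        double speedup = 1.0 / ((1.0 - p) + p / cores[i]);
        printf("%5d cores -> %5.2fx speedup\n", cores[i], speedup);
    }
    return 0;
}
```

Even with 90% of the work parallelized, 1,024 cores buy only ~9.9x; the serial 10% caps the speedup at 10x no matter how many cores you add.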

So their next use of the die space from process shrinks appears to be a) adding massive amounts of on-die cache and b) adding new instructions and offloading some computations to dedicated silicon (video compression/decompression, encryption, speech recognition, audio processing). Broadwell is rumored to have 128 MB of cache and an audio DSP. Imagine Commonlake+1 packing 1 GB of on-die Z-RAM cache and lots of dedicated silicon goodies.

But there's only so much cache you can use effectively before diminishing returns kick in again. So, assuming CPU manufacturers are able to keep pulling off process shrinks in the post-silicon era, how would they effectively use the extra die space to increase computation?
 
Sargent Duck
Grand Gerbil Poohbah
Posts: 3220
Joined: Thu Mar 13, 2003 8:05 pm
Location: In my secret cave that has bats

Re: Question on long-term CPU architecture

Wed Dec 18, 2013 2:19 pm

MetricT wrote:
So assuming CPU manufacturers will be able to continue pulling off process shrinks in the post-silicon era, how would they effectively use the extra die space to increase computation?


Intel/AMD aren't necessarily that concerned about increasing raw computation with the extra space from die shrinks (don't get me wrong, it's still a priority); the big push right now is making desktop chips more energy efficient and increasing IPC (more work per clock). Those two things greatly increase the attractiveness of these chips: lower power requirements are better for laptops/tablets, and desktop users won't need a massive heatsink/fan combo. We're at a point where the software needs to play catch-up with the hardware (multiprocessor support), and the average desktop today is plenty fast for surfing the Internet and Facebook.

As well, cooler-running chips leave headroom to increase clock speeds, which is cheaper than adding more cache.
No matter how bad the new homepage sucks or how bungled the new management is...

To all the original writers/contributors and volunteers, please know that I have nothing but the deepest love for you and the work you've done.
 
Damage
Gerbil Jedi
Posts: 1787
Joined: Wed Dec 26, 2001 7:00 pm
Location: Lee's Summit, Missouri, USA

Re: Question on long-term CPU architecture

Wed Dec 18, 2013 3:20 pm

Some things:

Moore's Law is about transistor scaling, not clock speed scaling.

You can use extra transistors to enable clock speed scaling, but recently, architects have instead generally sought to improve:

-Instruction-level parallelism (IPC increases)

-Thread-level parallelism (SMT and core count increases, along with larger last-level caches for better sharing)

-Integration (Cost reduction, power reduction, some I/O latency reduction)

-Dynamic voltage and frequency scaling (to get the most clock speed and perf out of a given power budget)
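
As a concrete peek at that last item: on Linux, the kernel's cpufreq interface exposes the governor and the frequency range it scales within. A minimal sketch; the sysfs paths are the standard cpufreq ones, and cpu0 is just an example core:

```c
/* Peek at DVFS in action on Linux: the kernel's cpufreq interface
 * exposes the governor and the current/min/max frequencies it is
 * scaling between. Paths are the standard sysfs ones; cpu0 is just
 * an example, and availability depends on the kernel/driver. */
#include <stdio.h>

static void show(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");
    if (f && fgets(buf, sizeof buf, f))
        printf("%-60s %s", path, buf);
    if (f)
        fclose(f);
}

int main(void)
{
    show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor");
    show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq");
    show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq");
    show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
    return 0;
}
```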

The first category, ILP/IPC increases, has been the source of much of the incremental goodness in Sandy Bridge, Ivy, and Haswell. You can gain a lot of performance by dedicating more logic to improving branch prediction accuracy and such.
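
As a minimal illustration of why prediction accuracy matters (the array size and threshold are arbitrary choices, and an aggressively optimizing compiler may compile the branch away, hiding the effect):

```c
/* Sketch of branch predictability effects: the same loop over the same
 * values runs much faster once the data is sorted, because the branch
 * becomes predictable. Try a low optimization level (-O1) so the
 * compiler keeps the branch a real branch. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)

static int cmp_int(const void *a, const void *b)
{
    return (*(const int *)a > *(const int *)b) -
           (*(const int *)a < *(const int *)b);
}

static void timed_sum(const int *v, const char *label)
{
    long long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)
        if (v[i] >= 128)                /* ~50/50 branch on random bytes */
            sum += v[i];
    printf("%-9s %.3fs (sum=%lld)\n", label,
           (double)(clock() - t0) / CLOCKS_PER_SEC, sum);
}

int main(void)
{
    int *v = malloc(N * sizeof *v);
    for (int i = 0; i < N; i++)
        v[i] = rand() % 256;

    timed_sum(v, "shuffled:");          /* mispredicts roughly half the time */
    qsort(v, N, sizeof *v, cmp_int);
    timed_sum(v, "sorted:");            /* predicts almost perfectly */
    free(v);
    return 0;
}
```

On typical hardware, the sorted pass runs several times faster: the same work, but the predictor gets the >= 128 test right almost every time.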

But each of the others has contributed, too; no front has really been neglected in big x86 chips in recent years.

The interesting thing now is that two limitations are becoming more important, even as transistor scaling promises to continue for a while (probably).

One, power is the primary performance constraint for CPUs, bar none. That's why CPU architects moved away from clock speed scaling as a primary tool, and each decision about any of the other ways of improving performance is made with an eye toward power efficiency.

Two, the goodness of Moore's Law seems to be waning as the benefits of pushing into higher gate densities fade. Taking photolithography further now means using double-patterning (dual masks with an offset) on more metal layers on a chip, for example. Double-patterning costs more, and it means a die shrink won't necessarily give you more transistors for "free." Similarly, some process shrinks have had little to no (or negative) benefits for power consumption lately. FinFETs are helping, but things only get harder going forward.

Future CPU architectures will have to be built within these constraints. We could see new architectures coming out more slowly as a result, but I wouldn't count on it. Instead, we seem to be seeing an explosion of new designs custom-tailored for specific market segments and missions, both in the x86 world and, more notably, in ARM.
Scott Wasson - "Damage"
 
MetricT
Gerbil In Training
Topic Author
Posts: 8
Joined: Mon Sep 23, 2013 6:21 am

Re: Question on long-term CPU architecture

Wed Dec 18, 2013 4:43 pm

I work in academic HPC, so focusing on "cool" isn't cool for us :-) Our researchers expect their code to just magically run significantly faster every year, and I don't think they understand the brick wall their expectations are about to run into.

I know the IPC/SMT/power-management stuff helped a lot in the past, but its current contribution is marginal, and the marginal increases tend to get smaller, not bigger. A massive cache increase is the only thing I see on the horizon that could offer the big jumps we used to see, and even that would likely be a one-shot thing. And then we reach the "There Be Dragons" spot on the map.
 
Damage
Gerbil Jedi
Posts: 1787
Joined: Wed Dec 26, 2001 7:00 pm
Location: Lee's Summit, Missouri, USA

Re: Question on long-term CPU architecture

Wed Dec 18, 2013 4:48 pm

Yeah, although I have to say, the idea of large amounts of RAM on chip (or on the package with the SoC) still seems pretty sexy to me. Once you're rocking a gig of super-fast "local" RAM, a whole lot of common computing workloads simply become solved problems. Take serving this here website to lots of people at once, for instance, or PS4/Xbone-class real-time graphics. We can just fit that stuff (the whole of TR's ~14 years of content, save for podcasts, or the whole rendering data set) into cache and not take the latency and power hit from going outside.

That is a very nice thing on the horizon.
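
One crude way to see the penalty of "going outside" is a pointer-chasing loop, where every load depends on the one before it. A sketch; the working-set sizes are illustrative, not any particular chip's cache hierarchy:

```c
/* Pointer-chasing sketch of the cost of leaving cache: every load
 * depends on the previous one, so time per hop tracks the latency of
 * whatever level of the hierarchy the working set fits in. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double ns_per_hop(size_t n, long hops)
{
    size_t *ring = malloc(n * sizeof *ring);
    for (size_t i = 0; i < n; i++)
        ring[i] = i;
    /* Sattolo's algorithm: shuffle into one single random cycle, so the
     * chase visits the whole buffer in cache-hostile order. */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = ring[i]; ring[i] = ring[j]; ring[j] = t;
    }

    volatile size_t p = 0;              /* volatile keeps the chain live */
    clock_t t0 = clock();
    for (long h = 0; h < hops; h++)
        p = ring[p];
    double ns = 1e9 * (double)(clock() - t0) / CLOCKS_PER_SEC / (double)hops;
    free(ring);
    return ns;
}

int main(void)
{
    /* working sets from ~32 KB (L1-ish) up to ~128 MB (DRAM) */
    for (size_t kb = 32; kb <= 128 * 1024; kb *= 8)
        printf("%8zu KB: %6.1f ns/hop\n",
               kb, ns_per_hop(kb * 1024 / sizeof(size_t), 10 * 1000 * 1000L));
    return 0;
}
```

As the working set outgrows each cache level, the time per hop jumps; that's the cliff a gig of fast local RAM would flatten for workloads that fit.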

After we gain that superpower, how scary are the dragons, really? :)

Edit: I should say that academic HPC should see some nice gains from fast local RAM, provided the amounts get to be large enough. Yes, the data sets are large there, but at some point, we could have as much on-chip storage as we now have on a GPU or what have you.
Scott Wasson - "Damage"
 
MetricT
Gerbil In Training
Topic Author
Posts: 8
Joined: Mon Sep 23, 2013 6:21 am

Re: Question on long-term CPU architecture

Wed Dec 18, 2013 5:29 pm

Damage wrote:
I should say that academic HPC should see some nice gains from fast local RAM, provided the amounts get to be large enough. Yes, the data sets are large there, but at some point, we could have as much on-chip storage as we now have on a GPU or what have you.


They probably won't be packing enough local RAM on-die anytime soon for us. We have several users who require 96 GB per machine. A large cache + DDR4 is going to have to do for us for a while. Those apps are all hard-core serial apps in R, Matlab, and Perl, and are showcases for "Weinberg's Second Law"-style programming.

We do have users (physicists, mostly) who write parallel apps in C/Fortran. For them, we'll be getting a rack or two of Xeon Phis later this year to play with. They read 3-5 GB/sec off disk for months on end, and the filesystem on our storage servers can use thousands of concurrent threads. I'm looking forward to seeing what difference TSX instructions make.
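
For context on that last point, here's a minimal sketch of the lock-elision pattern TSX's RTM intrinsics (_xbegin/_xend) enable; the spinlock fallback and counter are illustrative assumptions, and real code needs a retry policy. Compile with -mrtm on a TSX-capable chip:

```c
/* Minimal sketch of Intel TSX/RTM usage: try to run a critical section
 * as a hardware transaction, and fall back to a conventional lock if
 * the transaction aborts. */
#include <immintrin.h>
#include <stdatomic.h>

static atomic_int lock_taken;   /* simple fallback spinlock */
static long counter;

static void lock_fallback(void)
{
    while (atomic_exchange(&lock_taken, 1))
        while (atomic_load(&lock_taken))
            _mm_pause();
}

static void unlock_fallback(void) { atomic_store(&lock_taken, 0); }

void increment(void)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Abort if another thread holds the fallback lock, so the
         * transaction never runs concurrently with a locked section. */
        if (atomic_load(&lock_taken))
            _xabort(0xff);
        counter++;                 /* transactional update, no lock taken */
        _xend();
    } else {
        lock_fallback();           /* aborted: conflict, capacity, etc. */
        counter++;
        unlock_fallback();
    }
}
```

Run transactionally, concurrent increments that don't actually conflict proceed without ever taking the lock; when a transaction aborts, the thread falls back to the real lock.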
 
confusedpenguin
Gerbil Team Leader
Posts: 228
Joined: Tue Nov 15, 2011 12:50 am
Location: Potato State

Re: Question on long-term CPU architecture

Sat Dec 21, 2013 2:11 pm

I'm not too worried about energy efficiency. In Idaho we have cheap cheap power. I think my electric bill for a two-story house last year during one of the coldest months was 80 bucks. I like my desktop PC to double as a space heater too, so the more power I'm drawing, the better. :) I also dump used motor oil in the wood stove. It helps make more heat, and it makes a huge black polluting cloud out the chimney. Gives me a warm fuzzy feeling. :)
