Why are CPUs so parsimonious with L1 cache

Posted on Tue Jan 25, 2011 8:58 pm

I came across this layman article about cache, but I still don't understand some aspects of it:

Level 1 cache actually resides in the processor core and runs at the processor speed, very fast compared to the other RAM. Due to physical space constraints the size of this cache is small; on the Intel Yonah dual core processor the L1 cache is 32KB while others can be up to 128 KB. Level 2 cache rests outside the CPU core and before the DRAM. This cache will typically run at speeds below the processor speed, but it is still faster than the DRAM and is far larger than L1 cache...One might ask, “if cache is so much faster than any other type of memory why not build a system that only uses cache?” The answer is money: the cost of SRAM (cache) ranges from $4,000 to $10,000 per gigabyte.


It mentions physical space constraints. What causes the constraint? Do L1 caches require more complex circuitry? And why is it so expensive? They don't use gold instead of silicon for cache :wink: Does it have more defects per area than, say, non-cache logic?

I keep thinking about how we are coming up with 22nm processes and transistor budgets in the billions. To my naive understanding, shouldn't that make L1 cache cheap and plentiful?
I dont think, therefore I am not.
WhatMeWorry
Gerbil
 
Posts: 84
Joined: Mon Apr 13, 2009 6:01 pm
Location: Illinois

Re: Why are CPUs so parsimonious with L1 cache

Posted on Tue Jan 25, 2011 9:36 pm

WhatMeWorry wrote: It mentions physical space constraints. What causes the constraint? Do L1 caches require more complex circuitry? And why is it so expensive? They don't use gold instead of silicon for cache :wink: Does it have more defects per area than, say, non-cache logic?

I keep thinking about how we are coming up with 22nm processes and transistor budgets in the billions. To my naive understanding, shouldn't that make L1 cache cheap and plentiful?

The L1 cache requires deep, deep integration with the CPU core so that it can run at the core's clock speed and stay clock-synchronized with it. Basically it has all the constraints of every other part of the CPU in terms of physical layout on the silicon. L2 is far enough separated that you can just shove transistors at it and make it bigger without worrying about such things so much -- which you might notice CPU makers have in fact been doing.
Core i7 920, 3x2GB Corsair DDR3 1600, 80GB X25-M, 1TB WD Caviar Black, MSI X58 Pro-E, Radeon 4890, Cooler Master iGreen 600, Antec P183, opticals
SNM
Emperor Gerbilius I
 
Posts: 6206
Joined: Fri Dec 30, 2005 10:37 am

Re: Why are CPUs so parsimonious with L1 cache

Posted on Tue Jan 25, 2011 10:16 pm

There have been chips with a large L1, but the only ones I can think of are the older HP PA-RISC chips. They were a bit faster than Sun chips of the same day, if I remember right, which had a small L1 and a large half-speed L2.

AMD has normally had a larger L1 than Intel, and their caches are/were exclusive vs. inclusive.

Though like you I am surprised the L1 sizes haven't increased more over the years. It must be easier/cheaper to get performance elsewhere.
tfp
Grand Gerbil Poohbah
 
Posts: 3076
Joined: Wed Sep 24, 2003 11:09 am

Re: Why are CPUs so parsimonious with L1 cache

Posted on Tue Jan 25, 2011 10:19 pm

At the clock speeds a modern CPU core runs at, the propagation time of electrical impulses -- even at near the speed of light -- starts to become an issue. The L1 cache needs to be physically small so that it can be located as close as possible to the execution units in the CPU core.
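
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 3.2 GHz clock and the "half the speed of light" figure are illustrative assumptions only; real on-chip wires are limited by RC delay and are considerably slower than this.

```python
# Back-of-the-envelope: how far can a signal travel in one clock cycle?
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_per_cycle_mm(clock_hz, velocity_fraction=1.0):
    """Distance in mm covered in one clock period at a given fraction of c."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * velocity_fraction * period_s * 1000.0


clock_hz = 3.2e9  # assumed clock speed, for illustration only
print(f"at c:     {distance_per_cycle_mm(clock_hz):.1f} mm per cycle")
print(f"at 0.5 c: {distance_per_cycle_mm(clock_hz, 0.5):.1f} mm per cycle")
```

Even the idealized figure is under 10 cm per cycle, and a cache access is a round trip through decoders, the storage array, and comparators, so the usable distance is a small fraction of that.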

Furthermore, the logic to manage a cache is not trivial. Any given physical memory address can be mapped to any one of a number of potential cache locations; there is also additional logic to track which locations in the cache are valid and which ones are "dirty" (need to be flushed back out to L2 because they have been modified). The larger the cache, the slower the cache management logic gets, since it needs to deal with more bookkeeping data.

This is why we have L2 (and now L3 as well). As the caches physically move farther away from the cores, they get progressively larger and slower. You could even view system RAM as your L4 cache, sitting between the CPU and your disk drives...
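
For anyone who wants to see the bookkeeping spelled out, here is a toy model in Python of a set-associative cache with valid and dirty bits. Everything in it is an assumption for illustration: the class names, the tiny sizes, and the least-recently-used replacement policy are made up, and real hardware does the tag comparison in parallel rather than in a loop.

```python
class Line:
    """One cache line's bookkeeping (the data itself is omitted)."""

    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = None


class SetAssociativeCache:
    def __init__(self, num_sets=4, ways=2, line_bytes=64):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        # Each set is a list ordered from least to most recently used.
        self.sets = [[Line() for _ in range(ways)] for _ in range(num_sets)]
        self.writebacks = 0

    def _locate(self, address):
        """Split an address into (set index, tag)."""
        block = address // self.line_bytes
        return block % self.num_sets, block // self.num_sets

    def access(self, address, write=False):
        """Return True on a hit, False on a miss (the line is filled on a miss)."""
        index, tag = self._locate(address)
        lines = self.sets[index]
        for line in lines:
            if line.valid and line.tag == tag:
                line.dirty = line.dirty or write
                lines.remove(line)
                lines.append(line)  # now the most recently used
                return True
        # Miss: evict the least recently used line in this set.
        victim = lines.pop(0)
        if victim.valid and victim.dirty:
            self.writebacks += 1  # modified data must be flushed to the next level
        victim.valid, victim.dirty, victim.tag = True, write, tag
        lines.append(victim)
        return False


cache = SetAssociativeCache()
results = [
    cache.access(0x000, write=True),  # cold miss, line becomes dirty
    cache.access(0x000),              # hit
    cache.access(0x100),              # same set, different tag: miss
    cache.access(0x200),              # set is full: evicts the dirty line
    cache.access(0x000),              # miss again, it was evicted
]
print(results, "writebacks:", cache.writebacks)
```

Even in this toy version, every access has to compute an index, compare tags across the ways, and update the ordering and dirty state. More sets and more ways mean more of that state to search and maintain, which is the cost described above.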
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38125
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Why are CPUs so parsimonious with L1 cache

Posted on Tue Jan 25, 2011 10:33 pm

Parsimonious...good word. :)
bdwilcox
Graphmaster Gerbil
 
Posts: 1261
Joined: Mon Apr 21, 2003 12:21 pm

Re: Why are CPUs so parsimonious with L1 cache

Posted on Tue Jan 25, 2011 10:35 pm

Yeah, who says the Internet is turning us all into a bunch of illiterate dummies? :lol:
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38125
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Why are CPUs so parsimonious with L1 cache

Posted on Tue Jan 25, 2011 10:53 pm

just brew it! wrote:Yeah, who says the Internet is turning us all into a bunch of illiterate dummies? :lol:

Good thing he didn't use the synonym niggardly. You can get fired for that, you know.
bdwilcox
Graphmaster Gerbil
 
Posts: 1261
Joined: Mon Apr 21, 2003 12:21 pm

Re: Why are CPUs so parsimonious with L1 cache

Posted on Tue Jan 25, 2011 11:18 pm

It all comes down to minimizing average memory access time, which depends on each cache's hit rate and latency.

A smaller cache will have a lower hit rate (the percentage of times a particular item is found in the cache as opposed to being fetched from main memory) but will also tend to have a lower latency. The opposite is true for a larger cache.

Thus (when you do the math), the best performance is usually obtained with several levels of cache, starting with smaller, faster caches (L1) and growing progressively bigger and slower (L2 and L3).

Other factors are involved, but you get the general idea.
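
Here is roughly what "doing the math" looks like, as a small Python sketch. The latencies and hit rates are invented for illustration and are not measurements of any real CPU; the function name is also just something I made up.

```python
def average_access_time(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (latency_cycles, hit_rate) pairs, ordered from L1 outward.
    memory_latency: cycles to reach main memory after the last cache misses.
    """
    total = 0.0
    reach_probability = 1.0  # chance that an access gets as far as this level
    for latency, hit_rate in levels:
        total += reach_probability * latency
        reach_probability *= 1.0 - hit_rate
    return total + reach_probability * memory_latency


# Assumed numbers: a small fast L1 backed by a bigger, slower L2 ...
hierarchy = average_access_time([(2, 0.90), (12, 0.80)], memory_latency=200)
# ... versus a single big cache with a better hit rate but L2-like latency.
one_big_cache = average_access_time([(12, 0.98)], memory_latency=200)

print(f"two-level hierarchy: {hierarchy:.1f} cycles")
print(f"one big slow cache:  {one_big_cache:.1f} cycles")
```

With these made-up numbers the two-level setup averages about 7 cycles against about 16 for the single big cache, because most accesses pay only the short L1 latency. Different hit rates would shift the figures, but this is the shape of the tradeoff.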
Intel Core i7 3770K / 16 GB Kingston HyperX DDR3-1600 / Intel 520 180GB SSD / Zotac GTX660 / Corsair TX650 V2 PSU / Audigy 2 ZS
Wajo
Gerbil Elite
Silver subscriber
 
 
Posts: 596
Joined: Fri Jun 18, 2004 2:08 am
Location: MX

Re: Why are CPUs so parsimonious with L1 cache

Posted on Wed Jan 26, 2011 12:37 am

Wajo wrote:Thus (when you do the math), the best performance is usually obtained with several levels of cache, starting with smaller, faster caches (L1) and growing progressively bigger and slower (L2 and L3)

I believe his question was more along the lines of, "Why don't we just make an L1 that is as big as the L1+L2+L3 all put together?"

If you don't understand that larger caches are necessarily slower, it is a reasonable question to ask.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38125
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Why are CPUs so parsimonious with L1 cache

Posted on Wed Jan 26, 2011 8:58 am

Good thing he didn't use the synonym niggardly. You can get fired for that, you know.


Naw, these days one simply can't use terms like "target", "Fire", or "Crosshairs". I guess that makes all discussion of AMD's "Crossfire" off limits. :)
mutarasector
Gerbil In Training
 
Posts: 8
Joined: Sun Nov 16, 2008 10:59 pm

Re: Why are CPUs so parsimonious with L1 cache

Posted on Wed Jan 26, 2011 10:28 am

bdwilcox wrote:
just brew it! wrote:Yeah, who says the Internet is turning us all into a bunch of illiterate dummies? :lol:

Good thing he didn't use the synonym niggardly. You can get fired for that, you know.

Dang I had forgotten about that. Crazy stuff... :roll:
"No I don't want the Ask toolbar! No I don't want Bing as my default search! No I don't want to make Chrome my default browser!"
"Good grief, man! WHAT are you trying to install on that poor computer?"
"Antivirus."
kvndoom
Minister of Gerbil Affairs
Silver subscriber
 
 
Posts: 2420
Joined: Sat Feb 28, 2004 11:47 pm
Location: Communistwealth of Virginia

Re: Why are CPUs so parsimonious with L1 cache

Posted on Wed Jan 26, 2011 12:52 pm

And with that, let's please end this excursion towards the border of R&P.

Thanks for listening.
Life is hard; but it's harder if you're stupid. Big Al.
Captain Ned
Global Moderator
Gold subscriber
 
 
Posts: 20647
Joined: Wed Jan 16, 2002 7:00 pm
Location: Vermont, USA

Re: Why are CPUs so parsimonious with L1 cache

Posted on Fri Jan 28, 2011 6:48 pm

The larger the cache, the higher the latency. If they made 256 KB of L1 cache, not only would it eat up more transistors, but it could more than double the latency, which would hurt performance.

It's a delicate balance between locality and latency. Prefetching and Hyper-Threading can help mask a lot of stalls caused by cache misses.

Also, with cache, latency is cumulative.

Example: a program requests data from a memory address, so it
checks L1 cache - 2 cycles, not there
checks L2 cache - 12 cycles, not there
checks L3 cache - 25 cycles, not there
goes out to main memory, where it has to wait 2 command cycles, 9 CAS cycles, 9 CAS-RAS cycles, and 9 RAS cycles @ 1600 MHz before it can finally read. (Memory latencies are counted at memory speed, so a CPU at 3.2 GHz would see latencies of 4-18-18-18 instead of 2-9-9-9.)

You spend 39 cycles just figuring out you need to get the data from main memory. The larger the cache, the less likely the data will have to be read from main memory, but with diminishing returns and increased latency added for each step, you optimize your sizes.
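
The same arithmetic as a short Python sketch. It uses the example figures from this post as-is, takes the 1600 MHz memory clock at face value, and assumes the worst case where every DRAM timing is paid in full, one after another; real accesses often skip some of those steps. The variable and function names are just for illustration.

```python
CPU_HZ = 3_200_000_000  # example CPU clock from the post
MEM_HZ = 1_600_000_000  # memory clock as stated in the post

# Cycles spent at each cache level before discovering a miss (CPU cycles).
cache_lookup_cycles = {"L1": 2, "L2": 12, "L3": 25}

# DRAM timings from the post, counted in memory clock cycles.
dram_timings = {"command": 2, "CAS": 9, "CAS-RAS": 9, "RAS": 9}


def to_cpu_cycles(mem_cycles):
    """Convert memory-clock cycles into CPU-clock cycles."""
    return mem_cycles * CPU_HZ // MEM_HZ


lookup_total = sum(cache_lookup_cycles.values())
dram_total = sum(to_cpu_cycles(c) for c in dram_timings.values())

print("cycles spent missing in L1+L2+L3:", lookup_total)
print("DRAM timings in CPU cycles:",
      "-".join(str(to_cpu_cycles(c)) for c in dram_timings.values()))
print("worst-case total CPU cycles:", lookup_total + dram_total)
```

Under those assumptions a full miss costs 39 cycles of cache lookups plus 58 cycles of DRAM timings, close to 100 CPU cycles in total, compared with 2 cycles for an L1 hit.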
bcronce
Gerbil
 
Posts: 18
Joined: Tue Jun 03, 2008 5:12 pm

Re: Why are CPUs so parsimonious with L1 cache

Posted on Fri Jan 28, 2011 7:48 pm

And there are secondary considerations as well. For example, the bigger the cache, the larger a target it is for soft-error-causing cosmic rays (process node reductions of course shrink the physical size, but make the bits easier to flip). The larger L2+ caches get around this by including a lot of error correction circuitry, but that increases latency and power usage (yet another reason why those caches are slower than L1). From the intro of a paper (PDF) from Carnegie Mellon:
Rising soft-error rates are a major concern for modern microprocessor designers. The reduction in charge stored in memory cells, a result of continued technology scaling, leaves on-chip SRAMs (e.g., caches, TLBs, register files) highly susceptible to soft errors. Coding techniques, such as SECDED ECC (single-error correct, double-error detect), are widely utilized for protecting on-chip SRAMs. For L1 data caches, however, where low access latencies are critical, the additional delay to correct ECC errors prohibits inline correction on a read. In the event an error is detected on a read, recent designs such as the AMD Opteron throw a machine check exception asynchronously, potentially halting the machine to prevent silent data corruption.

Further compounding problems, recent work suggests that spatial multi-bit errors, where a single cosmic particle strike upsets multiple neighboring memory cells, are increasingly likely at future technology nodes. Bit interleaving, also called column multiplexing, is the conventional approach used to protect memory arrays from spatial multi-bit errors. In bit interleaving, bits belonging to multiple ECC check words are physically interleaved so that a spatial multi-bit error does not affect adjacent bits from a single check word. For SRAMs in a high-performance processor, however, our results indicate that interleaving beyond two-way is prohibitively expensive from a power perspective as a result of the additional precharging of bitlines from the interleaved data.
The bigger the L1 cache, the more they have to worry about soft errors (and what to do about them). This may not be as important a factor in keeping L1 caches small as some of the other things already mentioned, but when you're designing the critical path parts of a processor everything has an effect that has to be considered.
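
To make the SECDED idea from the quote concrete, here is a minimal Python sketch of a single-error-correct, double-error-detect code on 4 data bits (an extended Hamming code). This is a textbook construction chosen for illustration, not the scheme any particular CPU uses; real caches protect much wider words in hardware, and the function names here are made up.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED word (list of bits).

    Index 0 holds the overall parity bit; indices 1, 2 and 4 hold the
    Hamming parity bits; indices 3, 5, 6 and 7 hold the data bits.
    """
    word = [0] * 8
    word[3], word[5], word[6], word[7] = [(nibble >> i) & 1 for i in range(4)]
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    word[0] = sum(word[1:]) % 2  # overall parity enables double-error detection
    return word


def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected' or 'double-error'."""
    word = list(word)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(word[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    overall_ok = sum(word) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd number of flips: treat as a single error at index `syndrome`
        # (index 0 means the overall parity bit itself was hit).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "double-error", None
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


stored = encode(0b1011)
stored[5] ^= 1            # a single upset: correctable
print(decode(stored))
stored[6] ^= 1            # a second upset in the same word: only detectable
print(decode(stored))
```

Note the extra work on every read: the syndrome has to be computed before the data can be trusted, and a correction adds a further step. That is the kind of added delay the quoted paper says L1 data caches cannot afford inline.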
UberGerbil
Gerbil Khan
 
Posts: 9999
Joined: Thu Jun 19, 2003 3:11 pm

Re: Why are CPUs so parsimonious with L1 cache

Posted on Sat Jan 29, 2011 4:32 pm

AMD apparently uses ECC at all levels for data cache: http://support.amd.com/us/Processor_TechDocs/46878.pdf

(Instruction cache only needs parity checking instead of full-blown ECC, since you can just reload the bad location from RAM if corruption is detected.)
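
A quick Python sketch of why parity is enough there. The class and its fields are hypothetical, and the backing store is assumed to always hold a good copy of the code: parity cannot say which bit flipped, but since instructions are never modified in the cache, detecting the problem and refetching is sufficient.

```python
def parity(value):
    """Return 1 if the value has an odd number of set bits, else 0."""
    return bin(value).count("1") % 2


class ParityProtectedICache:
    def __init__(self, backing_memory):
        self.memory = backing_memory  # assumed to hold a correct copy of the code
        self.lines = {}               # address -> (word, stored parity bit)
        self.reloads = 0

    def fill(self, address):
        word = self.memory[address]
        self.lines[address] = (word, parity(word))

    def fetch(self, address):
        if address not in self.lines:
            self.fill(address)
        word, stored = self.lines[address]
        if parity(word) != stored:
            # Corruption detected. Instruction lines are never dirty,
            # so the fix is simply to reload from the next level.
            self.reloads += 1
            self.fill(address)
            word, _ = self.lines[address]
        return word


icache = ParityProtectedICache({0x40: 0xDEADBEEF})
icache.fetch(0x40)
word, bit = icache.lines[0x40]
icache.lines[0x40] = (word ^ (1 << 7), bit)   # simulate a single bit flip
print(hex(icache.fetch(0x40)), "reloads:", icache.reloads)
```

The same trick does not work for a data cache line that has been written to, because the only up-to-date copy is the corrupted one; that is where full ECC earns its keep.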
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 38125
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

