Personal computing discussed
Moderators: renee, Flying Fox, morphine
memory bandwidth starved...
WhatMeWorry wrote:memory bandwidth starved...
Ignorant question here: why is the CPU bandwidth so much less than the GPU bandwidth (that's PCIe, right)? Is this a historical limitation going back to the original 8086 CPU or the PC bus architecture?
If we knew then what we know now (or if we knew where things stood currently), would the original PC architecture be engineered differently?
Guess it is too late to start over with a clean slate?
phileasfogg wrote:
I don't mean to nitpick, but that bandwidth is 25.6 GB/s, not 12.8. But yes, it is far, far below the 250+ GB/s available on the AMD Radeon HD 7970 GPU!
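For anyone wondering where numbers like these come from, here is a rough back-of-the-envelope sketch. It assumes the CPU figures refer to DDR3-1600 on 64-bit channels (the thread does not say so, but 12.8 vs. 25.6 matches single vs. dual channel at that speed), and it uses the Radeon HD 7970's published 384-bit GDDR5 interface at 5.5 GT/s. The function name `peak_bandwidth_gb_s` is made up for illustration, and these are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(transfers_per_sec, bus_width_bits, channels=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_sec * bytes_per_transfer * channels / 1e9

# Assumption: DDR3-1600 (1600 MT/s) with 64-bit channels.
single = peak_bandwidth_gb_s(1600e6, 64, channels=1)
dual = peak_bandwidth_gb_s(1600e6, 64, channels=2)

# Radeon HD 7970: 384-bit GDDR5 at an effective 5.5 GT/s.
gpu = peak_bandwidth_gb_s(5.5e9, 384)

print(f"single-channel DDR3-1600: {single:.1f} GB/s")
print(f"dual-channel DDR3-1600:   {dual:.1f} GB/s")
print(f"HD 7970 GDDR5:            {gpu:.1f} GB/s")
```

Under those assumptions, the gap is mostly bus width times transfer rate: the GPU's memory interface is three times as wide as a dual-channel CPU setup and runs several times faster per pin.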
JustAnEngineer wrote:Fredric Brown summed it up in a very short story written in 1954.
http://www.roma1.infn.it/~anzel/answer.html
just brew it! wrote:Texturing operations still require a fair bit of random access to read the source textures...
chuckula wrote:
Ahh... Texture Atlases are your friends. You pack lots of mini textures that have spatial locality into a single chunk of memory and then bind the whole atlas at once.
just brew it! wrote:
That avoids the (substantial) overhead of repeatedly binding the textures, but does not help with the random access issue. Unless the entire thing fits into an internal cache inside the GPU, you still need to pull the sub-textures out of the atlas in the order that they are requested by the texturing pipelines, and this is not going to result in the entire block of memory being streamed out sequentially!
Edit: In fact, it potentially causes the memory accesses made by the texturing pipelines to be *less* sequential, since you're pulling a small rectangular region of a large texture instead of an entire small texture. (It's still a net win because the overhead of changing the texture bindings is huge compared to the penalty for having to do what amounts to a gather operation on the rows of the source texture data.)
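To make the "gather on the rows" point concrete, here is a small sketch that compares the byte ranges touched when reading a 64x64 texture stored on its own against the same 64x64 region read out of a 2048-wide atlas. It assumes a plain row-major (linear) layout with 4 bytes per texel; real GPUs typically use tiled or swizzled texture layouts and caches, so treat this as an illustration of the idea rather than a model of actual hardware. The helper names `row_spans` and `count_contiguous_runs` are made up for the example.

```python
def row_spans(x, y, w, h, pitch_texels, bytes_per_texel=4):
    """Return (start_byte, length_bytes) for each row of a w x h region
    at (x, y) inside a row-major image that is pitch_texels wide."""
    return [
        (((y + r) * pitch_texels + x) * bytes_per_texel, w * bytes_per_texel)
        for r in range(h)
    ]

def count_contiguous_runs(spans):
    """Count how many separate contiguous byte runs the spans form."""
    runs = 0
    prev_end = None
    for start, length in spans:
        if start != prev_end:  # gap before this row: a new run begins
            runs += 1
        prev_end = start + length
    return runs

# A 64x64 texture stored on its own: the pitch equals the width.
standalone = row_spans(0, 0, 64, 64, pitch_texels=64)

# The same 64x64 region packed into a 2048-texel-wide atlas.
in_atlas = row_spans(128, 256, 64, 64, pitch_texels=2048)

print("standalone runs:", count_contiguous_runs(standalone))
print("atlas runs:     ", count_contiguous_runs(in_atlas))
print("atlas row stride (bytes):", in_atlas[1][0] - in_atlas[0][0])
```

In this simplified layout the standalone texture is one contiguous block, while the atlas region breaks into one short run per row, each separated by a full atlas row's worth of bytes.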