Opteron NUMA, latency + bandwidth boost questions

Discussion of all forms of processors, from AMD to Intel to VIA.

Posted on Mon Feb 14, 2005 2:02 pm

With the new NForce4 Pro boards, I am starting to appreciate the potential of NUMA, where each CPU has its own RAM and the CPUs share it via HyperTransport. This gives almost twice the RAM bandwidth.
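Here's a quick back-of-envelope sketch of where the "almost twice" comes from. It assumes (my assumption, not something stated in the thread) that each Opteron drives its own dual-channel DDR400 controller, and it only computes theoretical peaks. Real gains are smaller because some accesses still have to cross HyperTransport.

```python
# Theoretical peak-bandwidth arithmetic for a two-socket Opteron board.
# ASSUMPTION: each CPU has a dual-channel (2 x 64-bit) DDR400 controller.
# These are spec-sheet peaks, not measured numbers.

BUS_WIDTH_BYTES = 8          # one 64-bit DDR channel
TRANSFERS_PER_SEC = 400e6    # DDR400: 400 million transfers per second
CHANNELS_PER_CPU = 2         # dual-channel controller on each Opteron


def peak_bandwidth_gbs(populated_controllers: int) -> float:
    """Aggregate theoretical peak in GB/s (1 GB = 1e9 bytes)."""
    per_controller = BUS_WIDTH_BYTES * TRANSFERS_PER_SEC * CHANNELS_PER_CPU
    return populated_controllers * per_controller / 1e9


if __name__ == "__main__":
    print(f"RAM on one CPU:   {peak_bandwidth_gbs(1):.1f} GB/s")
    print(f"RAM on both CPUs: {peak_bandwidth_gbs(2):.1f} GB/s")
```

Doubling the number of populated controllers doubles the ceiling. Whether an app actually gets near it depends on how much of its traffic stays local.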

The overview on GamePC said that NUMA increases bandwidth, but also increases latency by 10%.
http://www.gamepc.com/labs/view_content ... pro&page=3

Why would latency increase? If all the RAM were hanging on one CPU, wouldn't there be just the same latency (over hypertransport) when cpu #2 asks for RAM from CPU #1?


If there IS a difference, could you tweak your setup: put, say, 2 DIMMs on each CPU and benchmark your app of interest, then put all 4 DIMMs on one CPU and see which setup works better for you? Or is there some restriction, like having to have RAM on both CPUs at all times?
spworley
Gerbil First Class
 
Posts: 159
Joined: Sun Jun 13, 2004 11:25 pm

Posted on Mon Feb 14, 2005 2:09 pm

Numa? That's the name of my new puppy.
VooBass
Gerbil XP
 
Posts: 497
Joined: Wed Jan 23, 2002 6:00 pm
Location: Canada

Posted on Mon Feb 14, 2005 2:34 pm

Damned good-lookin' dog... or should I say horse :lol:
Just an old sheepdog waiting for some nasty wolves to show... I've got more than enough teeth left.
LicketySplit
Gerbil God
 
Posts: 24507
Joined: Sat Jan 19, 2002 6:00 pm
Location: Soap Lake, Wa

Posted on Mon Feb 14, 2005 2:41 pm

LicketySplit wrote:Damned good lookin dog..or should i say horse :lol:


Lion, actually. Numa means a male lion, which seemed like an appropriate name for a Leonberger (lion hill) puppy. He's 4 months old and already 60 lbs.!
VooBass
Gerbil XP
 
Posts: 497
Joined: Wed Jan 23, 2002 6:00 pm
Location: Canada

Posted on Mon Feb 14, 2005 3:43 pm

spworley wrote:Why would latency increase? If all the RAM were hanging on one CPU, wouldn't there be just the same latency (over hypertransport) when cpu #2 asks for RAM from CPU #1?

That should be the latency they are talking about. Remember there is also inherent latency between the memory request arriving at the RAM stick and the data actually coming out on the data lines.

BTW, that small A8N-L looks like a good cheap board to get into SMP. Too bad there aren't more PCIe slots.
Flying Fox
Gerbil God
 
Posts: 21718
Joined: Mon May 24, 2004 1:19 am

Posted on Mon Feb 14, 2005 4:21 pm

Flying Fox wrote:
spworley wrote:Why would latency increase? If all the RAM were hanging on one CPU, wouldn't there be just the same latency (over hypertransport) when cpu #2 asks for RAM from CPU #1?

That should be the latency that they are talking about. Remember there is inherent latency from the memory request arriving at the RAM stick and the actual data coming out on the data lines.


Well, there's one hop of latency from RAM to the CPU. If the other CPU is the one that needs the data, there's a second hop from CPU 1 to CPU 2. A single-CPU machine never has that second hop, so its latency is lower.

But my question is why latency would increase with RAM hanging off both CPUs as opposed to just one. If the RAM is on just one CPU, then EVERY access by CPU #2 needs the HyperTransport hop. If RAM is on both CPUs (and accesses are spread evenly across the two banks), then half of CPU #1's requests need a hop and half of CPU #2's requests need a hop, so the net average latency should be the same, shouldn't it?
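The averaging argument above can be checked with exact fractions. This is a sketch under the same assumptions the post makes: both CPUs issue the same number of requests, and with RAM on both CPUs a fixed share of each CPU's requests hits its own bank. `local_fraction` is a name I made up for that share.

```python
# Average HyperTransport hops per memory access for the two layouts discussed.
# ASSUMPTION: both CPUs issue equally many requests; local_fraction is the
# (hypothetical) share of each CPU's requests that hit its own bank.

from fractions import Fraction


def avg_hops_single_bank() -> Fraction:
    # All RAM hangs off CPU 1: CPU 1 never hops, CPU 2 always hops once.
    cpu1_hops = Fraction(0)
    cpu2_hops = Fraction(1)
    return (cpu1_hops + cpu2_hops) / 2


def avg_hops_both_banks(local_fraction: Fraction) -> Fraction:
    # RAM on both CPUs: each CPU hops only for requests aimed at the other bank.
    per_cpu = 1 - local_fraction
    return (per_cpu + per_cpu) / 2


if __name__ == "__main__":
    print("one bank:             ", avg_hops_single_bank())
    print("both banks, 50% local:", avg_hops_both_banks(Fraction(1, 2)))
    print("both banks, 90% local:", avg_hops_both_banks(Fraction(9, 10)))
```

With data spread evenly (`local_fraction` = 1/2), both layouts average half a hop per access, which is exactly the point being made. The two only diverge when something pushes the local share above one half.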

The best way to tell is to actually try it with a NUMA motherboard I guess. :-)

VooBass, you get immediate cute puppy points. :)
spworley
Gerbil First Class
 
Posts: 159
Joined: Sun Jun 13, 2004 11:25 pm

Posted on Mon Feb 14, 2005 11:06 pm

It's because the data stored in one CPU's bank isn't completely replicated in the other CPU's RAM bank. Every time CPU0 asks for data that lives in CPU1's bank, the fetch request has to go over the HT bus, which adds an initial latency hit of about 17-20 ns. A NUMA-aware OS will be better at keeping these cross-bank transfers to a bare minimum, which results in a smaller overall latency increase.
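To put numbers on that, here is a rough weighted-average model: expected latency = local latency + (share of remote accesses) × (HT penalty). The 17-20 ns penalty is the figure from the post. `LOCAL_NS` is a made-up placeholder, not a measured Opteron number, and the model ignores queuing and coherency traffic.

```python
# Expected memory latency as a weighted mix of local and remote accesses.
# ASSUMPTIONS: LOCAL_NS is a hypothetical placeholder; the 17-20 ns remote
# penalty is the figure quoted in this thread. Queuing effects are ignored.

LOCAL_NS = 60.0                    # hypothetical local DRAM access latency
REMOTE_PENALTY_NS = (17.0, 20.0)   # extra cost of one HT hop, per the thread


def expected_latency_ns(remote_fraction: float, penalty_ns: float) -> float:
    """Average latency when remote_fraction of accesses pay penalty_ns extra."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return LOCAL_NS + remote_fraction * penalty_ns


if __name__ == "__main__":
    cases = [("NUMA-unaware, data spread evenly", 0.5),
             ("NUMA-aware, mostly local", 0.1)]
    for label, remote in cases:
        low, high = (expected_latency_ns(remote, p) for p in REMOTE_PENALTY_NS)
        print(f"{label}: {low:.1f}-{high:.1f} ns")
```

The only knob the OS controls here is `remote_fraction`. That is why keeping data next to the CPU that uses it matters more than the exact size of the HT penalty.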
Adamantine
Gerbil Team Leader
 
Posts: 230
Joined: Mon Feb 14, 2005 10:41 pm
Location: MI

Posted on Tue Feb 15, 2005 3:17 am

Adamantine wrote: A NUMA aware OS will be better at keeping NUMA transfers to a bare minimum which will result in a lower over all latency increase.


OK, that makes sense. A NUMA-aware OS will try to lay out memory so as to reduce the traffic needed over the HT link. In the ideal case, each CPU accesses ONLY the RAM in its own bank and never needs to fetch anything over HT at all.
Does the OS actively shuffle the actual RAM contents as needed? I.e., move stored bytes from one bank to the other just to make them easier to reach for the CPU that has "affinity" for that chunk? That would be an impressive proactive optimization.
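Here's a toy version of the "move the bytes to the CPU that uses them" idea. It is a hypothetical policy I made up for illustration, not how any particular OS does it. It counts which node touches each page, moves every page to the node that touched it most, and compares the share of remote accesses before and after.

```python
# Toy model of page migration on a 2-node NUMA box.
# HYPOTHETICAL policy (not any real OS's algorithm): move each page to the
# node that accessed it most often; untouched pages stay put.

from collections import Counter


def migrate_pages(placement: dict[str, int],
                  accesses: list[tuple[int, str]]) -> dict[str, int]:
    """placement maps page -> node; accesses is a list of (cpu_node, page)."""
    touches: dict[str, Counter] = {page: Counter() for page in placement}
    for node, page in accesses:
        touches[page][node] += 1
    new_placement = {}
    for page, counts in touches.items():
        # Pick the most frequent accessor; keep the old node if never touched.
        new_placement[page] = counts.most_common(1)[0][0] if counts else placement[page]
    return new_placement


def remote_fraction(placement: dict[str, int],
                    accesses: list[tuple[int, str]]) -> float:
    """Share of accesses whose page lives on the other node."""
    remote = sum(1 for node, page in accesses if placement[page] != node)
    return remote / len(accesses)


if __name__ == "__main__":
    placement = {"A": 0, "B": 0, "C": 1}      # where pages start out
    accesses = [(1, "A"), (1, "A"), (0, "A"),  # node 1 mostly uses page A
                (0, "B"), (1, "C"), (0, "C"), (0, "C")]
    before = remote_fraction(placement, accesses)
    placement = migrate_pages(placement, accesses)
    after = remote_fraction(placement, accesses)
    print(f"remote accesses: {before:.0%} before, {after:.0%} after")
```

The catch, which the toy ignores, is that moving a page means copying it across HT. Migration only pays off if the page stays hot on its new node long enough to recoup that copy.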

So this means that a NUMA-aware OS with RAM attached to each CPU would have LESS latency, on average, than a motherboard with RAM off just one CPU. This makes sense too. So the GamePC review was wrong about a 10% penalty: the average latency is actually LOWER, not higher, though that depends on the OS being smart enough to arrange the more efficient RAM distribution.

So it would make no sense at all to put all your RAM on one CPU; it never hurts to populate both CPUs' slots, and it may help.

Is this right? Seems logical now after thinking through it all.
spworley
Gerbil First Class
 
Posts: 159
Joined: Sun Jun 13, 2004 11:25 pm

Posted on Tue Feb 15, 2005 4:36 am

spworley wrote:So it would make no sense at all to put all your RAM on one CPU, it never hurts to use both CPU's slots, and it may help.


Except that a board with all the RAM on one CPU is easier to make, easier to design, and, I'd imagine, easier to troubleshoot as well.

All of which make it nice 'n' cheap, at the expense of a percentage or two here and there that might not even get noticed if the stuff being done has bottlenecks elsewhere.

Even if it does get noticed, the motherboard is cheaper, so you should know better anyways :lol:,
-Mole
Living proof of John Gabriel's theorem
IntelMole
Grand Gerbil Poohbah
 
Posts: 3526
Joined: Sat Dec 29, 2001 6:00 pm
Location: The nearest pub

Posted on Tue Feb 15, 2005 6:29 am

spworley wrote:So the GamePC review was wrong about a 10% penalty, it's actually LOWER latency on average, not higher, though it'd be dependent on the OS's smarts to arrange for that more efficient RAM distribution.
I think GamePC was comparing latencies between that NUMA system and a single Athlon 64 dual-channel system. In that comparison the NUMA system will be a bit behind. That's why you still want to game on a single-processor system until games take advantage of that 2nd CPU and the performance gain outstrips the latency loss.
Flying Fox
Gerbil God
 
Posts: 21718
Joined: Mon May 24, 2004 1:19 am

Posted on Tue Feb 15, 2005 12:49 pm

Flying Fox wrote:I think GamePC was comparing latencies between that NUMA system vs a single Athlon64 dual channel system.


No, it was comparing latencies on the same motherboard, measured with RAM off of both CPUs, versus measured with RAM off of just one CPU.

"In our tests, we found that with modules connected to each CPU, our latencies were roughly 10% higher compared to if all of our modules were connected through a single memory bus."
http://www.gamepc.com/labs/view_content ... pro&page=3

I am really bugged by this and am picking all you experts' brains to help me understand why. :-) My current theory is simply a bad BIOS. The GamePC review SAYS that NUMA wasn't working right on their board because of an early BIOS, but it only blames that for a poor bandwidth measurement, not for the latency difference.
spworley
Gerbil First Class
 
Posts: 159
Joined: Sun Jun 13, 2004 11:25 pm

Posted on Tue Feb 15, 2005 2:32 pm

spworley wrote:I am really bugged by this and am picking all of you experts's brains to help me understand why. :-) My current theory is just a bad BIOS.. the GamePC review SAYS that NUMA wasn't working right on their board because of an early BIOS, but doesn't blame the latency difference on that, just a poor bandwidth measurement.

If you look at the games comparison, the single-CPU A64 edged out the NUMA systems. That's because games love low latencies. Other than that, extra bandwidth generally outweighs a slight latency increase.
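A crude model shows why both things can be true at once. Run time is split into serial cache misses (each paying full latency) plus bulk streaming (limited by bandwidth). All numbers are hypothetical. The ~10% latency bump echoes the GamePC figure, and the both-CPU bandwidth assumes the workload can actually use both controllers, which a single-threaded game may not.

```python
# Latency-bound vs bandwidth-bound workloads on the two RAM layouts.
# ASSUMPTIONS: every number is made up for illustration; misses and streaming
# do not overlap; "RAM on both CPUs" gets its full theoretical bandwidth.

def run_time_ms(dependent_misses: int, latency_ns: float,
                bytes_streamed: float, bandwidth_gbs: float) -> float:
    """Serial pointer-chasing misses plus bulk streaming, no overlap."""
    chase_ns = dependent_misses * latency_ns
    stream_ns = bytes_streamed / bandwidth_gbs  # bytes / (GB/s) = ns when GB = 1e9 B
    return (chase_ns + stream_ns) / 1e6


if __name__ == "__main__":
    configs = {"RAM on one CPU": (60.0, 6.4),       # (latency ns, GB/s)
               "RAM on both CPUs": (66.0, 12.8)}    # ~10% more latency, 2x BW
    workloads = {"latency-bound (game-like)": (1_000_000, 10e6),
                 "bandwidth-bound (streaming)": (10_000, 1e9)}
    for wname, (misses, nbytes) in workloads.items():
        for cname, (lat, bw) in configs.items():
            ms = run_time_ms(misses, lat, nbytes, bw)
            print(f"{wname:28} {cname:17} {ms:8.2f} ms")
```

With these made-up inputs, the latency-bound case runs faster on the single-bank layout and the streaming case runs roughly twice as fast on the two-bank layout. That matches the pattern described for the games versus other benchmarks.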

And "single memory bus" can mean the 1-CPU system too? :roll:
Flying Fox
Gerbil God
 
Posts: 21718
Joined: Mon May 24, 2004 1:19 am
