With the new nForce4 Pro boards, I am starting to appreciate the potential of NUMA, where each CPU has its own RAM and the CPUs share it via HyperTransport. This gives almost twice the RAM bandwidth.
The overview on GamePC said that NUMA increases bandwidth, but also increases latency by 10%.
http://www.gamepc.com/labs/view_content ... pro&page=3
Why would latency increase? If all the RAM were hanging off one CPU, wouldn't there be just the same latency (over HyperTransport) when CPU #2 asks for RAM from CPU #1?
If there IS a difference, could you tweak your setup: put, say, 2 DIMMs on each CPU and run benchmarks for your app of interest, then put all 4 DIMMs on one CPU and see which setup works better for you? Or is there some restriction, such as always needing RAM on both CPUs?
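
To put rough numbers on the latency question, here is a back-of-envelope sketch in Python. The two latency figures and the 75% locality figure are made up for illustration, not measurements from any board or from the GamePC article, and the model ignores everything except whether an access is served by the CPU's own RAM or has to cross HyperTransport.

```python
def avg_latency(local_ns, remote_ns, local_fraction):
    """Average latency seen by one CPU when local_fraction of its
    accesses hit its own RAM and the rest go to the other CPU's RAM."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# Hypothetical figures, chosen only to make the arithmetic easy to follow.
LOCAL_NS = 80.0    # access to RAM attached to the requesting CPU
REMOTE_NS = 120.0  # access that has to hop over HyperTransport

# All RAM on CPU #1: CPU #1 is always local, CPU #2 is always remote.
# Average over the two CPUs, assuming both are equally busy.
all_on_one = (avg_latency(LOCAL_NS, REMOTE_NS, 1.0)
              + avg_latency(LOCAL_NS, REMOTE_NS, 0.0)) / 2

# RAM split across both CPUs, pages placed with no regard to locality:
# about half of each CPU's accesses end up local.
split_unaware = avg_latency(LOCAL_NS, REMOTE_NS, 0.5)

# RAM split across both CPUs, with an OS that keeps most of a process's
# pages next to the CPU running it (75% local is an assumption).
split_aware = avg_latency(LOCAL_NS, REMOTE_NS, 0.75)

print("all RAM on one CPU:   ", all_on_one, "ns")
print("split, locality-blind:", split_unaware, "ns")
print("split, locality-aware:", split_aware, "ns")
```

Under these assumptions the first two cases come out identical, so any real difference would have to come from something this simple model leaves out, which is why I am curious what the actual benchmarks would show.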

