Thanks to its general-purpose computing push with CUDA, Nvidia has made some headway in the high-performance computing market. The firm's CUDA Zone page is filled with sample applications ranging from academia to the oil-and-gas industry. David Kanter from Real World Technologies thinks something is missing, though: error resilience at the memory level.
As Kanter's article points out, pretty much all servers ship with ECC system memory as standard these days. However, even Nvidia's expensive, HPC-targeted Tesla cards carry plain GDDR3 memory just like their GeForce counterparts—there's just more of it. Without ECC, achieving reliably error-free output might involve running important calculations twice and comparing the results—not exactly good for performance.
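That software workaround—redundant execution—is straightforward but costly. A minimal sketch of the idea, using a hypothetical `checked` helper (not anything Nvidia ships), shows why it roughly halves throughput:

```python
def checked(fn, *args):
    """Run a computation twice and compare the results.

    A mismatch signals a transient error (e.g. a flipped bit in
    non-ECC memory) somewhere in one of the two runs. The obvious
    cost: every calculation is performed twice.
    """
    first = fn(*args)
    second = fn(*args)
    if first != second:
        raise RuntimeError("results disagree; a rerun is needed")
    return first

# usage: result = checked(sum, range(1000))
```

Note that this only *detects* errors; on a mismatch the work must be redone a third time to know which run was correct, whereas ECC hardware corrects single-bit errors transparently.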
Adding ECC support would involve "arrays of 9 DRAMs (instead of 8), extra pins to connect to the extra DRAMs and more logic in the memory controllers." Those changes would increase costs, but Nvidia should be able to absorb them within the relatively high prices it commands for workstation- and server-class GPUs.
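The ninth DRAM chip follows from the arithmetic of standard ECC: a SECDED (single-error-correct, double-error-detect) code pairs 8 check bits with every 64 data bits, so a 64-bit-wide array of eight chips needs exactly one more chip for the check bits. As a toy illustration of the principle on a single data byte—not the actual (72,64) code real memory controllers use—here is a Hamming(12,8) encoder/decoder:

```python
# Toy Hamming(12,8) code: 8 data bits protected by 4 parity bits
# (at positions 1, 2, 4, 8; 1-indexed), correcting any single-bit error.

DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]  # non-power-of-two slots

def hamming_encode(byte):
    """Encode an 8-bit value into a 12-bit codeword (list of bits)."""
    assert 0 <= byte < 256
    code = [0] * 13  # index 0 unused; positions 1..12
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (byte >> i) & 1
    for p in (1, 2, 4, 8):
        parity = 0
        for pos in range(1, 13):
            if pos & p and pos != p:  # positions covered by parity bit p
                parity ^= code[pos]
        code[p] = parity
    return code[1:]

def hamming_decode(bits):
    """Correct up to one flipped bit and recover the data byte."""
    code = [0] + list(bits)
    syndrome = 0
    for p in (1, 2, 4, 8):
        parity = 0
        for pos in range(1, 13):
            if pos & p:
                parity ^= code[pos]
        if parity:
            syndrome |= p
    if syndrome:  # syndrome names the erroneous position; flip it back
        code[syndrome] ^= 1
    byte = 0
    for i, pos in enumerate(DATA_POSITIONS):
        byte |= code[pos] << i
    return byte
```

Corrupting any single bit of the codeword still decodes to the original byte, which is exactly the guarantee ECC memory provides in hardware—at the cost of those extra check bits, pins, and controller logic.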
In fact, Kanter believes this design decision is so straightforward that we'll see ECC memory support in the company's next-generation graphics hardware. It's not a matter of if but when, he says.