From the beginning, both AMD and Nvidia have been heavily backing the OpenCL framework, which promises vendor-agnostic GPU computing goodness for all. If the contents of a thread in AMD's Developer Forums are any indication, however, owners of Radeon HD 4000-series graphics cards may end up with the short end of the stick when it comes to GPU computing.
In the thread, a developer who identifies himself as Matt Taylor wrote, "We're developing using openCL, and have one dev machine with an NVIDIA GTX 260, and another with an ATI 4870. . . I'm sorry to say we are getting approximately 5x the performance from the NVIDIA card, than from the ATI." Performance is so bad, Taylor adds, that his 2.4GHz Core 2 Quad processor outperforms the Radeon "by a factor of two."
AMD OpenCL Compiler Engineer Micah Villmow responded an hour later with the following:
"This is entirely dependent on how you coded the kernel and what OpenCL features you are using. There are known performance issues for HD4XXX series of cards on OpenCL and there is currently no plan to focus exclusively on improving performance for that family. The HD4XXX series was not designed for OpenCL whereas the HD5XXX series was. There will be performance improvements on this series because of improvements in the HD5XXX series, so it will get better, but it is not our focus."
Villmow later qualified that response by saying, "[the Radeon HD 4870] just has to be programmed differently than the 5XXX series to get performance because of the lack of proper hardware local support. It is possible to get good performance, just not with a direct port from Cuda [Nvidia's GPU compute architecture]." He also stressed that AMD's compiler stack will include more device-specific optimizations as it matures.
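To illustrate what "programmed differently" might mean in practice, here is a minimal sketch of a host-side check that picks a kernel variant based on how a device implements local memory. OpenCL exposes this through clGetDeviceInfo with the CL_DEVICE_LOCAL_MEM_TYPE query, which returns either CL_LOCAL (dedicated on-chip memory) or CL_GLOBAL (local memory backed by global memory). We are assuming that Villmow's "lack of proper hardware local support" corresponds to the CL_GLOBAL case; the thread does not say so explicitly. The function and the two kernel names are hypothetical and are not taken from Taylor's code or AMD's SDK.

```python
# Values mirror the cl_device_local_mem_type constants in the OpenCL headers.
CL_LOCAL = 0x1   # device has dedicated on-chip local memory
CL_GLOBAL = 0x2  # "local" memory is actually carved out of global memory


def pick_kernel_variant(local_mem_type):
    """Return the name of a (hypothetical) kernel variant to build.

    local_mem_type is the value a host program would obtain from
    clGetDeviceInfo(..., CL_DEVICE_LOCAL_MEM_TYPE, ...).
    """
    if local_mem_type == CL_LOCAL:
        # Staging tiles in __local memory, CUDA shared-memory style,
        # is likely to pay off on this hardware.
        return "reduce_local_tiles"
    if local_mem_type == CL_GLOBAL:
        # __local buffers would be no faster than global memory here,
        # so fall back to a variant that avoids them (assumption).
        return "reduce_global_only"
    raise ValueError("unexpected local memory type: %r" % (local_mem_type,))


if __name__ == "__main__":
    for mem_type in (CL_LOCAL, CL_GLOBAL):
        print(hex(mem_type), "->", pick_kernel_variant(mem_type))
```

A direct port from CUDA would typically ship only the first variant, which is roughly the situation Villmow describes.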
In any case, this example doesn't bode well for pre-DirectX 11 Radeons in the coming wave of OpenCL applications.
We should of course point out that not all Nvidia cards are based on the same GT200 architecture as the GeForce GTX 260 that purportedly performed so well. G92-based offerings like the GeForce GTS 250, GeForce 9800 GT, and GeForce GTX 200M series make up a big chunk of Nvidia's current lineup, and they're all derived from the older G80 design. The G80 was Nvidia's first DirectX 10 architecture, and it might have some of the same hardware limitations as the Radeon HD 4000 series when it comes to GPU computing. (Thanks to Expreview for the link.)