Shining Arcanine wrote:Outside of legacy scientific computing software where you can wait months or even years for computations to finish, I am not sure why anyone would need a hardware floating point unit in their CPU. Processors are fast enough that the things that hardware floating point units made computable per unit time 10 years ago are computable per unit time with compiler generated integer instructions today. Aside from legacy scientific computing software, there is no killer application that takes advantage of hardware floating point units in CPUs, because even if the CPU is as optimal as possible, it is still too slow. Having these calculations be done on GPUs is the way forward and it is not just me who thinks this.
Dude! Why weren't you around when Intel and AMD were spending all that time adding floating point hardware to their CPUs years ago? You could have saved them so much time and money! Obviously, they were misled as to this particular need.
... or maybe you need to get out more, and there are a helluva lot more applications that benefit greatly from hardware FP than you claim. Ever had to recompute a massive spreadsheet that took more than an hour with hardware FP? It would take days with software FP. And then there is stuff like simple gaming, and its close cousin simulation. AMD suffered big time in the K6 days because their hardware FP wasn't as good as Intel's; something they fixed with the Athlon. Not to mention all the scientific computing you just mentioned, which may or may not fit a CUDA-like model. Your view of the computing world appears to be exceedingly small.
Your approach reminds me of grid computing. You can push stuff into a grid and take advantage of massively parallel computational power. Something that might otherwise take days can be done in minutes, making very complex problems rather simple. That is, if it fits the grid paradigm. Of course, you have to re-architect the solution to this completely non-traditional paradigm. And random data access is very different -- you can't just query a SQL database. Grid data is distributed in chunked files around the grid for fast parallel access, but is extremely inefficient to access in a random access pattern. You could put an actual SQL database on the grid, but it would likely melt down as many thousands of processes try to access the data at the same time, since it's not designed for these sorts of access patterns.
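To make the access-pattern point concrete, here's a rough Python sketch. It is a toy model, not any real grid framework: the chunk size, the record layout, and the function names (chunk_sum, parallel_total, random_lookup) are all made up for illustration, and a thread pool stands in for grid nodes. A full streaming pass parallelizes cleanly across chunks, while a single keyed lookup with no index may have to touch every record.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data set: 1,000 (key, value) records split into fixed-size
# chunks, standing in for chunked files spread across grid nodes.
CHUNK_SIZE = 100
records = [(key, key * 2) for key in range(1000)]
chunks = [records[i:i + CHUNK_SIZE] for i in range(0, len(records), CHUNK_SIZE)]


def chunk_sum(chunk):
    """Streaming pass over one chunk: the grid-friendly access pattern."""
    return sum(value for _, value in chunk)


def parallel_total(chunks):
    """Each worker scans its own chunk; partial sums are combined at the end."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(chunk_sum, chunks))


def random_lookup(chunks, key):
    """Keyed lookup with no index: scan chunk by chunk until the key turns up.

    Returns (value, records_touched); value is None if the key is absent.
    """
    touched = 0
    for chunk in chunks:
        for k, v in chunk:
            touched += 1
            if k == key:
                return v, touched
    return None, touched


if __name__ == "__main__":
    print(parallel_total(chunks))      # one full pass, shared by 4 workers
    print(random_lookup(chunks, 999))  # one record wanted, many touched
```

Now imagine thousands of processes each doing that kind of lookup at once, and you can see why the random-access case is the one that hurts.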
The point is, as others have quite eloquently pointed out, not everything fits the GPU model, and even if it did, GPUs are not consistently available, consistently featureful, or even consistent in the API they expose. Maybe some day when GPUs are built into every processor, à la AMD Fusion, the streaming processors can be more closely integrated with the CPU. But that's a long way off. What we have now is clumsily integrated and must be explicitly accommodated.
Sorry, but you're completely off base in your analysis. Yes, for certain problems GPU-based solutions are awesome, just as for certain problems grid-based parallelism is awesome. But the problem must fit the solution space in this particular case, rather than the other way around.