So what you are saying is that 3 seconds doesn't make much difference, and you are correct. What you missed is that your "emulated" version on the Nexus One took two and a half times as long to run as the version on your desktop. Not a big deal when we are talking about less than ten seconds. But what about something that takes an hour to run on your desktop? Perhaps a POV-Ray render or recoding that movie rip. Now it will take two and a half hours. That's a big difference. I happen to work in "The Real World (tm)" supporting some folks who actually make chips for a living, and I can tell you that if Intel or AMD decided to drop the FPU and made a lot of what we do take 2.5x longer, that manufacturer would not sell another processor to our company. Despite what you may claim, a large category of problems do not lend themselves well to massive parallelization due to data dependencies in the calculation, algorithmic limitations, I/O requirements, etc.
Please name the problems you cite. If they are large enough to take a noticeable amount of time, then I am certain that you will find a way to parallelize them. Google Chromium is an excellent example of this: putting each page into its own process parallelized page rendering in a tabbed web browser, which was slow with the single-renderer-thread approach Firefox took. I doubt that everything you run is one massive problem that cannot be broken into separate threads, and once it is broken up, much of it likely maps onto a SIMD programming model (see the sketch below). Regardless, everyone, everywhere agrees that the single-threaded programming model is a dead end in terms of performance. Any business that cannot parallelize its critical software applications will be killed by those that can, and at that point the strength of a single processing unit does not matter so long as you have a sufficiently large number of them.
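To make that concrete, here is a minimal sketch of the kind of decomposition I mean, assuming the workload really is independent per item (process_item and process_all are hypothetical stand-ins, not anyone's actual code): each hardware thread gets a contiguous chunk of the input and the partial results are combined at the end.

[code]
#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical per-item kernel; stands in for whatever the real work is.
double process_item(double x) { return x * x; }

// Split an independent-per-item workload across all hardware threads.
double process_all(const std::vector<double>& items) {
    const std::size_t n_threads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (items.size() + n_threads - 1) / n_threads;

    std::vector<std::future<double>> parts;
    for (std::size_t start = 0; start < items.size(); start += chunk) {
        const std::size_t end = std::min(items.size(), start + chunk);
        parts.push_back(std::async(std::launch::async, [&items, start, end] {
            double sum = 0.0;
            for (std::size_t i = start; i < end; ++i)
                sum += process_item(items[i]);  // no data dependencies between items
            return sum;
        }));
    }

    double total = 0.0;
    for (auto& p : parts)
        total += p.get();
    return total;
}
[/code]

The same structure is what makes these workloads map to a SIMD or GPU model: the per-item loop becomes a kernel launched over all the items at once.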
wibeasley wrote:
Here are two more GPGPU people who believe that FPUs aren't unnecessary.

Shining Arcanine wrote:
I think you missed the bottom line, which is that floating point performance is not important in CPUs to the point where people should be arguing over how well AMD's floating point units in their new CPUs perform. That is why I asked why people care about it in the first place, and it is also why I explained why the units are unnecessary. The performance of unnecessary units is not really an area that merits people's attention.

http://www.sdtimes.com/content/article.aspx?ArticleID=34842&page=2

Could this obviate the need for extensive concurrency training for software developers? Can they simply offload parallel computation to the GPU, which, unlike the CPU, has the potential to scale performance linearly with the number of cores? Can you just “fire it and forget it,” as Sanford Russell, general manager of CUDA and GPU Computing at Nvidia, puts it? Sorry, no.
“The goal is not to offload the CPU. Use CPUs for the things they’re best at and GPUs for the things they’re best at,” said (Mark) Murphy. An example is a magnetic resonance imaging reconstruction program that he found worked best on the six-core Westmere CPU. “The per-task working set just happened to be 2MB, and the CPU had 12MB of cache per socket and six cores," said Murphy. "So if you have a problem with 2MB or less per task, then it maps very beautifully to the L3 cache. Two L3 caches can actually supply data at a higher rate than the GPU can."
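For what it's worth, the arithmetic behind Murphy's example is easy to check; the sketch below just restates it, using the 2 MB working set and 12 MB L3 figures from the quote (illustrative numbers, not measurements):

[code]
#include <cstdio>

int main() {
    // Figures quoted above for a six-core Westmere socket (illustrative only).
    const double l3_per_socket_mb        = 12.0; // shared L3 cache per socket
    const int    cores_per_socket        = 6;
    const double working_set_per_task_mb = 2.0;  // per-task working set

    // With one task per core, all working sets must fit in L3 at once
    // for the cores to be fed from cache rather than from DRAM.
    const double combined_mb = working_set_per_task_mb * cores_per_socket;
    if (combined_mb <= l3_per_socket_mb)
        std::printf("fits in L3: %.0f MB of %.0f MB used\n", combined_mb, l3_per_socket_mb);
    else
        std::printf("spills to DRAM: %.0f MB needed, %.0f MB of L3\n", combined_mb, l3_per_socket_mb);
    return 0;
}
[/code]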
That said, I think you are ignoring the point, which is that if I can make a decent argument for the FPUs being unnecessary, then their actual performance is not really something that should be a concern for people.