Crayon Shin Chan wrote: I use doubles to solve quadratic equations and for a fuzzy logic project. It's in hardware because a long time ago people realized this needed to be accelerated. Let's keep the FPU inside the CPU; it's more convenient for everybody that way.
I used double variables in a homework assignment I did yesterday for my Numerical Analysis class. I was running Newtonian mechanics simulations, and at the scale I was running them, the presence of a hardware floating point unit made no difference to whether they finished in an acceptable timeframe. If it had mattered, I could have used the Runge-Kutta method instead of Euler's method to solve the differential equations involved, since a higher-order method reaches the same accuracy with far fewer steps.
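To illustrate the Euler versus Runge-Kutta trade-off, here is a minimal sketch. It assumes a unit mass on a unit spring (x'' = -x, starting at x = 1, v = 0, so the exact position is cos(t)) as a hypothetical stand-in for the actual homework problem, which isn't given. Classical fourth-order Runge-Kutta (RK4) costs four derivative evaluations per step instead of one, but it can take far larger steps for the same accuracy.

```python
import math

# Hypothetical test problem: x'' = -x, state = (position, velocity).
# Exact solution from x=1, v=0 is x(t) = cos(t).
def deriv(state):
    x, v = state
    return (v, -x)

def euler_step(state, h):
    # Forward Euler: one slope evaluation, first-order accurate.
    dx, dv = deriv(state)
    return (state[0] + h * dx, state[1] + h * dv)

def rk4_step(state, h):
    # Classical fourth-order Runge-Kutta: four slope evaluations per step.
    def shifted(s, k, c):
        return (s[0] + c * k[0], s[1] + c * k[1])
    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, h / 2))
    k3 = deriv(shifted(state, k2, h / 2))
    k4 = deriv(shifted(state, k3, h))
    return (
        state[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        state[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
    )

def integrate(step, t_end, n_steps):
    state, h = (1.0, 0.0), t_end / n_steps
    for _ in range(n_steps):
        state = step(state, h)
    return state

def position_error(step, t_end, n_steps):
    # Absolute error in position against the exact solution cos(t_end).
    x, _ = integrate(step, t_end, n_steps)
    return abs(x - math.cos(t_end))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, position_error(euler_step, 10.0, n), position_error(rk4_step, 10.0, n))
```

On this toy problem, RK4 with 100 steps ends up more accurate than Euler with 10,000 steps, which is the sense in which switching methods would have cut the running time.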
Outside of legacy scientific computing software, where you can wait months or even years for computations to finish, I am not sure why anyone would need a hardware floating point unit in their CPU. Processors are now fast enough that the work hardware floating point units made computable in a given amount of time 10 years ago can be done in the same time today with compiler-generated integer instructions. Aside from legacy scientific computing software, there is no killer application that takes advantage of hardware floating point units in CPUs, because even if the CPU is as optimal as possible, it is still too slow. Having these calculations done on GPUs is the way forward, and I am not the only one who thinks this. The NCSA director recently made public comments on this that are identical to what I am saying:
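To make "compiler-generated integer instructions" concrete, here is a minimal sketch of what a software floating point routine does for a single operation: multiplying two IEEE 754 doubles using only integer operations on their bit patterns. It assumes both operands and the result are normal, finite, nonzero numbers (a real soft-float library must also handle zeros, subnormals, infinities, NaNs and overflow) and rounds to nearest, ties to even. The name `soft_mul` is hypothetical.

```python
import struct

MANT_BITS = 52
IMPLICIT = 1 << MANT_BITS   # hidden leading 1 of a normal double
BIAS = 1023

def to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def soft_mul(a: float, b: float) -> float:
    """Multiply two doubles with integer operations only.
    Assumes normal, finite, nonzero inputs and a normal result."""
    ba, bb = to_bits(a), to_bits(b)
    sign = (ba ^ bb) >> 63
    ea, eb = (ba >> 52) & 0x7FF, (bb >> 52) & 0x7FF
    # Restore the implicit bit to get 53-bit integer significands.
    ma = (ba & (IMPLICIT - 1)) | IMPLICIT
    mb = (bb & (IMPLICIT - 1)) | IMPLICIT
    p = ma * mb                         # 105- or 106-bit exact product
    shift = 53 if p >> 105 else 52      # keep exactly 53 significant bits
    m = p >> shift
    rem = p & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round to nearest, ties to even, using the discarded bits.
    if rem > half or (rem == half and (m & 1)):
        m += 1
    e = ea + eb - BIAS - MANT_BITS + shift
    if m == IMPLICIT << 1:              # rounding carried into a new bit
        m >>= 1
        e += 1
    if not 0 < e < 0x7FF:
        raise ArithmeticError("result is not a normal double (not handled)")
    return from_bits((sign << 63) | (e << 52) | (m - IMPLICIT))

if __name__ == "__main__":
    print(soft_mul(1.5, 2.25), soft_mul(0.1, 0.3), 0.1 * 0.3)
```

Each hardware multiply becomes a few dozen integer operations and branches here, which is the overhead the software path has to absorb.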
http://insidehpc.com/2010/11/02/ncsa-di ... computing/

General-purpose logic is always slower than dedicated logic. This somewhat contradicts historical experience, but historically, because clock speeds increased along with transistor budgets, economies of scale enabled companies like Intel to exploit the higher clock speeds and larger transistor budgets of more advanced process technology and perform well enough that dedicated hardware could not compete on performance per price. Today, since more advanced process technology no longer yields faster clock speeds, you must add constraints on how things are done to keep scaling. Specifically, the constraint is that the same function is applied to independent data in parallel, which is stream processing. If you go further back in history to the advent of the CPU, you would find that simply doing things on the CPU placed constraints on how things were done, so it only makes sense that moving beyond what the CPU enabled would require additional constraints.
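The stream processing constraint, applying the same function to independent data, is easy to see in code. This is a minimal sketch assuming a hypothetical per-element kernel (a projectile-height formula chosen only for illustration). Because no output depends on any other element, the data can be split into chunks and processed in any order on any number of workers, which is the property GPUs exploit. It uses Python's standard `concurrent.futures` thread pool only to show the structure; CPython threads will not actually speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(t):
    # Hypothetical per-element kernel: height (m) at time t (s) of a
    # projectile launched straight up at 20 m/s with g = 9.81 m/s^2.
    # It reads only its own input, so every element is independent.
    return 20.0 * t - 0.5 * 9.81 * t * t

def process_chunk(chunk):
    return [kernel(t) for t in chunk]

def stream_map(data, n_workers=4):
    """Apply kernel to every element by splitting data into independent chunks."""
    if not data:
        return []
    size = -(-len(data) // n_workers)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # No chunk needs another chunk's results, so the pool may run them in
    # any order; pool.map still yields results in input order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(process_chunk, chunks)
        return [y for part in parts for y in part]

if __name__ == "__main__":
    times = [i * 0.1 for i in range(41)]
    print(stream_map(times)[:5])
```

On a GPU, the same kernel would be launched once per element rather than once per chunk; the code that calls it would not need to change its logic, only its scheduling.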
Furthermore, it is difficult to scale floating-point-intensive calculations without performing the same functions independently of one another, and if you do perform them independently, you have an application that can exploit stream processing. Scaling such calculations is so difficult that, as far as I know, there is not a single floating-point-intensive application that both is not a stream processing application and can be accelerated by SMP CPUs. With that in mind, I do not see how the presence of a hardware floating point unit helped your project. It seems to me that you are crediting a very specific approximation of a deterministic Turing machine for what a much larger category of approximations of deterministic Turing machines gives you. How is that not the case?
bitvector wrote: dkanter wrote: Moreover, GPUs are very brittle tools and perform poorly on software with irregular control flow, data structures, or communication between threads. In any of those cases, CPUs tend to win. And the most efficient algorithms for FP workloads tend to use a lot of communication to reduce computation by many orders of magnitude.
Yeah, and to expand on the communication part: people often ignore the break-even point for using the GPU at all, which depends on the size of the problem. The overhead of getting your dataset into the GPU's accessible memory, arranging the data so the GPU can compute on it effectively, and getting the results back out again can be substantial, and it can make the overall gains much smaller than a comparison of raw computation alone would suggest.
A research group at my alma mater published a workshop paper called "On the Limits of GPU Acceleration" where they model these ratios and calculate the break-even points for different problem types. And of course, the sentiment you're expressing above about the limited applicability of GPGPU to a narrow set of domains is a given.
If it is slower below the break-even point, that is fine. As long as it is faster above it, it is likely that no one will care in 10 years. Accepting a slowdown on small inputs in exchange for a speedup on large ones is done all the time in computer programming.
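A break-even point of the kind discussed above can be illustrated with a deliberately simple linear cost model. This is not the model from the paper mentioned above; it is a hypothetical sketch assuming a fixed GPU setup cost, a per-element transfer cost for moving data in and out (data rearrangement could be folded into the same term), and a per-element compute cost on each device, all in arbitrary time units with made-up values. The GPU wins once `setup + n * (transfer + c_gpu) < n * c_cpu`, that is, for `n > setup / (c_cpu - transfer - c_gpu)`, and never wins if that denominator is not positive.

```python
import math

def cpu_time(n, c_cpu):
    # CPU: no setup, just per-element compute.
    return n * c_cpu

def gpu_time(n, setup, transfer, c_gpu):
    # GPU: fixed setup plus per-element transfer (in and out) and compute.
    return setup + n * (transfer + c_gpu)

def break_even(setup, transfer, c_cpu, c_gpu):
    """Smallest integer n for which the GPU is strictly faster, or None
    if the per-element GPU cost (transfer + compute) never beats the CPU."""
    gain = c_cpu - (transfer + c_gpu)
    if gain <= 0:
        return None
    return math.floor(setup / gain) + 1

if __name__ == "__main__":
    # Hypothetical costs: setup 64, transfer 0.25, CPU 1.0, GPU 0.25 per element.
    print(break_even(64.0, 0.25, 1.0, 0.25))
    # If transfer alone costs as much as CPU compute, there is no break-even.
    print(break_even(64.0, 1.0, 1.0, 0.25))
```

Below the returned size the CPU is at least as fast; above it the GPU wins, and its advantage grows linearly with `n`.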
bitvector wrote: Shining Arcanine wrote: Did you know that almost none of the commonly used server software requires a floating point unit?
Server is a hugely ambiguous and overloaded term; sometimes "server" just means managed machines in data centers, which could be running large-scale machine learning batch jobs, data mining, simulations, or any number of things. What do you mean by server?
A server is a machine in a standard ATX or blade case that is dedicated to handling multiple users. Perhaps I should have been clearer on that, as you are right that the term is too vague to discuss specifics without defining it.
Disclaimer: I over-analyze everything, so try not to be offended if I over-analyze something you wrote.