Floating-point units in server-grade CPUs

Discussion of all forms of processors, from AMD to Intel to VIA.

Moderators: Flying Fox, morphine

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 11:41 am

Flying Fox wrote:IIRC the FP units in modern day CPUs are doing double duty running SSE instructions too? So there is still a use for that. Simply put, the unit has been integrated, not taking up that much die space, and the vendors just leave it there. How much do you think Intel/AMD/SunOracle/etc can save if they take out the FPU from their cores? $5 from the chip price?


If they play their cards right, it could be worth another core, although that depends on how many cores the chip has in the first place.

Flying Fox wrote:
Shining Arcanine wrote:A server is a machine in a standard ATX or blade case that is dedicated to handling multiple users. Perhaps I should have been more clear on that, as you are right that the term is too abstract to discuss specific things about it.
Great, all those 1U-4U rack mounted computers are not servers anymore. :roll:


Pardon my lack of IT experience in making the distinction, but 1U-4U rack-mounted computers are all blades to me.


Buub wrote:
Shining Arcanine wrote:Outside of legacy scientific computing software where you can wait months or even years for computations to finish, I am not sure why anyone would need a hardware floating point unit in their CPU. Processors are fast enough that the things that hardware floating point units made computable per unit time 10 years ago are computable per unit time with compiler generated integer instructions today. Aside from legacy scientific computing software, there is no killer application that takes advantage of hardware floating point units in CPUs, because even if the CPU is as optimal as possible, it is still too slow. Having these calculations be done on GPUs is the way forward and it is not just me who thinks this.

Dude! Why weren't you around when Intel and AMD were spending all that time adding floating point hardware to their CPUs years ago? You could have saved them so much time and money! Obviously, they were misled as to this particular need.


Stream processing had not been invented at the time. CPUs with hardware floating point units needed to exist before people would see the need for GPUs and contribute to their eventual evolution into stream processors. It is like how you needed the vacuum tubes to exist before people would see the need for transistors and contribute to the creation of the integrated circuit.

Another way of thinking about it is that people could have told IBM about the utility of using electricity for doing calculations and saved them the time and money spent making the Automatic Sequence Controlled Calculator.

Buub wrote:... or maybe you need to get out more, and there are a helluva lot more applications that benefit greatly from hardware FP than you claim. Ever had to recompute a massive spreadsheet that took more than an hour with hardware FP? It would take days with software FP. And then there is stuff like simple gaming, and its close cousin simulation. AMD suffered big time in the K6 days because their hardware FP wasn't as good as Intel's; something they fixed with the Athlon. Not to mention all the scientific computing you just mentioned, which may or may not fit a CUDA-like model. Your view of the computing world appears to be exceedingly small.


AMD suffered a great deal during that time for a multitude of reasons, not the least of which were poor chipsets made by third party manufacturers. Poor floating point performance was not the singular cause of their fiscal performance.

By the way, scientific computing is designed to fit the hardware provided to it. If that were not the case, we would have 10THz CPUs being fabricated for scientific computing right now.

Buub wrote:Your approach reminds me of grid computing. You can push stuff into a grid and take advantage of massively parallel computational power. Something that might otherwise take days can be done in minutes, making very complex problems rather simple. That is, if it fits the grid paradigm. Of course, you have to re-architect the solution to this completely non-traditional paradigm. And random data access is very different -- you can't just query a SQL database. Grid data is distributed in chunked files around the grid for fast parallel access, but is extremely inefficient to access in a random access pattern. You could put an actual SQL database on the grid, but it would likely melt down as many thousands of processes try to access the data at the same time, since it's not designed for these sorts of access patterns.


That is a fair summary of the problems that have occurred repetitively across the entire history of computing. They are nothing new and they will never go away. Your computer has these issues internally right now, only it is not as obvious unless you are either writing an operating system or designing hardware for its replacement.

Buub wrote:The point is, as others have quite eloquently pointed out, not everything fits the GPU model, and even if it did, GPUs are not consistently available, consistently featureful, or even consistently of the same API. Maybe some day when GPU units are built into every processor, ala AMD Fusion, the streaming processors can be more closely integrated with the CPU. But that's a long way off. What we have now is clumsily integrated and must be explicitly accommodated.


The same is true for CPUs and human computers. Not everything fits the CPU model as it is currently done, and even if it did, CPUs are not consistently available, consistently featureful, or even consistently of the same API. Maybe some day CPUs will be ubiquitous, but that is a long way off. What CPUs are now are clumsily used and must be explicitly accommodated.

Buub wrote:Sorry, but you're completely off base in your analysis. Yes, for certain problems GPU-based solutions are awesome, just as for certain problems grid-based parallelism is awesome. But the problem must fit the solution space in this particular case, rather than the other way around.


Science has always required that you find a way to force problems into a solution space, rather than the other way around. That is nothing new.
Last edited by Shining Arcanine on Fri Nov 05, 2010 11:55 am, edited 1 time in total.
Disclaimer: I over-analyze everything, so try not to be offended if I over-analyze something you wrote.
Shining Arcanine
Gerbil Jedi
 
Posts: 1717
Joined: Wed Jun 11, 2003 11:30 am

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 11:44 am

Shining Arcanine wrote:Outside of legacy scientific computing software where you can wait months or even years for computations to finish, I am not sure why anyone would need a hardware floating point unit in their CPU. Processors are fast enough that the things that hardware floating point units made computable per unit time 10 years ago are computable per unit time with compiler generated integer instructions today. Aside from legacy scientific computing software, there is no killer application that takes advantage of hardware floating point units in CPUs, because even if the CPU is as optimal as possible, it is still too slow. Having these calculations be done on GPUs is the way forward and it is not just me who thinks this.


I think the issue here is that you are looking forward as to how future programming will be done. The issue is that there are more than a few legacy systems in regular use that will require the fpu on server chips, and these apps just aren't going anywhere. Could it be done efficiently on a gpu? Yes. Cost effective is an entirely different matter. The truth of the matter is that fpu performance on cpus matters because enormous applications exist and are not going anywhere (more's the pity). The other real-world issue is the Java performance of existing apps. I can think of my own company in this regard. I have access to 15 apps that are Java based and accessed through Citrix. 3 of these apps are critical to day-to-day work. As in, if they go down, we lose an enormous amount of money. And when we bring up a new server farm, were Intel to scrap the fpu and AMD to keep it, who do you think would get our business? I know when it comes time to look at server options, my CIO is going to say "No" to moving our floating point calculations to gpus. It isn't worth the cost or risk for systems that are used by 40,000 employees and 10,000 vendors/contractors.

sound business > efficient coding

On a smaller scale, Android developers have learned quite rapidly how necessary an fpu is with the ARM procs. Particularly in Java with 64-bit double precision. And the gpu just isn't an option. Emulating 64-bit double precision is terribly slow.
Sony a7
Sony Zeiss 55/1.8 SSM, 24-70/4 SSM
Minolta 17-35/2.8-4 D, 100-300 APO
TheEmrys
Minister of Gerbil Affairs
Silver subscriber
 
 
Posts: 2150
Joined: Wed May 29, 2002 8:22 pm
Location: Northern Colorado

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 12:08 pm

Shining Arcanine wrote:
Flying Fox wrote:Great, all those 1U-4U rack mounted computers are not servers anymore. :roll:
Pardon my lack of IT experience in making the distinction, but 1U-4U rack-mounted computers are all blades to me.
http://en.wikipedia.org/wiki/Blade_(disambiguation)

"Blade server, a self-contained computer server, designed for high density"

And besides the fact that blades are typically described as servers, no 3U or 4U rackmount machine would likely be described as a blade. Blades are thin (hence the name), usually 0.5-1U. You use this terminology in your argument but then you don't know what it means: don't you see how that could be problematic for your claims?
bitvector
Grand Gerbil Poohbah
 
Posts: 3234
Joined: Wed Jun 22, 2005 4:39 pm
Location: Mountain View, CA

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 12:11 pm

If they play their cards right, it could be worth another core, although that depends on how many cores the chip has in the first place.


Improbable if not bordering on impossible.

It is this precise action that led to the Pentium FDIV problem. They wanted to shrink the size of the FPU in order to reduce die size. The proof they used to illustrate that they could remove values from the lookup table was faulty and the validators somehow were asleep at the wheel and just accepted the change without testing.

The hilarity really kicks off when you realize that the FPU was more or less in the center of the Pentium die. If you remove Ohio from the United States, the overall size of the nation does not change. So the change was needless, as the original premise of die space savings was incorrect.

If the FPU at 0.35µm was only a small portion of the die, then it's very unlikely that at 45nm it would suddenly yield enough space to become an entire core.

Not to mention that neither Intel nor AMD work that way. The FPU will never go away. If Intel or AMD were big into ditching legacy then there wouldn't be such a huge focus on carrying forward the less desirable parts of x86. I'm sure nothing would make them happier than to be able to ditch all the corner cases they have to carry forward from the past, but that's not the reality.
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3521
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 12:11 pm

TheEmrys wrote:
Shining Arcanine wrote:Outside of legacy scientific computing software where you can wait months or even years for computations to finish, I am not sure why anyone would need a hardware floating point unit in their CPU. Processors are fast enough that the things that hardware floating point units made computable per unit time 10 years ago are computable per unit time with compiler generated integer instructions today. Aside from legacy scientific computing software, there is no killer application that takes advantage of hardware floating point units in CPUs, because even if the CPU is as optimal as possible, it is still too slow. Having these calculations be done on GPUs is the way forward and it is not just me who thinks this.


I think the issue here is that you are looking forward as to how future programming will be done. The issue is that there are more than a few legacy systems in regular use that will require the fpu on server chips, and these apps just aren't going anywhere. Could it be done efficiently on a gpu? Yes. Cost effective is an entirely different matter. The truth of the matter is that fpu performance on cpus matters because enormous applications exist and are not going anywhere (more's the pity). The other real-world issue is the Java performance of existing apps. I can think of my own company in this regard. I have access to 15 apps that are Java based and accessed through Citrix. 3 of these apps are critical to day-to-day work. As in, if they go down, we lose an enormous amount of money. And when we bring up a new server farm, were Intel to scrap the fpu and AMD to keep it, who do you think would get our business? I know when it comes time to look at server options, my CIO is going to say "No" to moving our floating point calculations to gpus. It isn't worth the cost or risk for systems that are used by 40,000 employees and 10,000 vendors/contractors.

sound business > efficient coding

On a smaller scale, Android developers have learned quite rapidly how necessary an fpu is with the ARM procs. Particularly in Java with 64-bit double precision. And the gpu just isn't an option. Emulating 64-bit double precision is terribly slow.


I do not think you understand your position. Java bytecode runs on the Java virtual machine, so dealing with processors that lack floating point units is a task that the JVM developers must handle and you are completely insulated from it. An alternative would be moving to an operating system whose kernel emulates floating point instructions, which is acceptable for legacy programs and does not require that your organization write any new code.

I think that the issue is that people think that floating point units are required for doing floating point operations, which is false. Both your CPU and GPU can lack floating point units and you will still be able to get things done. Unless your organization's software was written in assembly, it is not your problem. Instead, it is a problem for compiler writers, which, for the most part, would be Microsoft in the case of the Windows platform, open source developers in the case of the UNIX platform and Oracle in the case of the Java platform. The Java platform is an oddball, because there is an additional stage of compilation between the machine and the developer, which means that the JVM needs to be rewritten instead of the compiler that is provided to software developers.

As for Android, I recently ran Sunspider on a Google Nexus One, which did not take much longer to complete than Sunspider in Google Chromium on my laptop's Intel Core T2400 processor. I think that the difference was 5 seconds versus 2 seconds, which is negligible. As mentioned earlier, Javascript relies solely on floating point operations, so it would seem that the "emulation" is not as slow as you would think it is. After all, the emulation involved is simply the compiler emulating language features rather than an interpreter emulating a machine. The difference between those two concepts is several orders of magnitude. Note that I am assuming that you are correct in saying that Android phones lack floating point units, which is an assumption that I think is false, but it does not at all affect the importance of floating point hardware. I could specify "importance or lack of importance", but that would be redundant; it is unfortunate that I feel that there is a need to enumerate that.

Ryu Connor wrote:
If they play their cards right, it could be worth another core, although that depends on how many cores the chip has in the first place.


Improbable if not bordering on impossible.

It is this precise action that led to the Pentium FDIV problem. They wanted to shrink the size of the FPU in order to reduce die size. The proof they used to illustrate that they could remove values from the lookup table was faulty and the validators somehow were asleep at the wheel and just accepted the change without testing.

The hilarity really kicks off when you realize that the FPU was more or less in the center of the Pentium die. If you remove Ohio from the United States, the overall size of the nation does not change. So the change was needless, as the original premise of die space savings was incorrect.

If the FPU at 0.35µm was only a small portion of the die, then it's very unlikely that at 45nm it has become big enough to yield enough space for an entire core.

Not to mention that neither Intel nor AMD work that way. The FPU will never go away. If Intel or AMD were big into ditching legacy then there wouldn't be such a huge focus on carrying forward the less desirable parts of x86. I'm sure nothing would make them happier than to be able to ditch all the corner cases they have to carry forward from the past, but that's not the reality.


Most of the die is cache. The actual cores are small relative to that and the floating point units are large relative to the things that need to exist inside the cores.

The original Pentium die implemented floating point operations using x87, a stack-based approach used to minimize the die area required to implement floating point operations. The instructions used in x86 and x87 are fundamentally different in that respect, and while that approach was good for its time, it is slow, and it is a reason why programmers of the era tried to avoid using floating point operations. A good floating point implementation does not take a stack-based approach to computation, which is why SSE introduced new registers and instructions that enabled people to do floating point calculations the way that they would have been implemented in the Pentium processor had die space not been an issue.
Disclaimer: I over-analyze everything, so try not to be offended if I over-analyze something you wrote.
Shining Arcanine
Gerbil Jedi
 
Posts: 1717
Joined: Wed Jun 11, 2003 11:30 am

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 12:25 pm

Shining Arcanine wrote:I do not think you understand your position. Java bytecode runs on the Java virtual machine, so dealing with processors that lack floating point units is a task that the JVM developers must handle and you are completely insulated from it. An alternative would be moving to an operating system whose kernel emulates floating point instructions, which is acceptable for legacy programs and does not require that your organization write any new code.

As usual, you've completely missed the point. The problem he was referring to isn't that Java coders need to write their own floating point math routines (obviously they don't); the problem is that emulated floating point is godawful slow.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37632
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 12:30 pm

just brew it! wrote:
Shining Arcanine wrote:I do not think you understand your position. Java bytecode runs on the Java virtual machine, so dealing with processors that lack floating point units is a task that the JVM developers must handle and you are completely insulated from it. An alternative would be moving to an operating system whose kernel emulates floating point instructions, which is acceptable for legacy programs and does not require that your organization write any new code.

As usual, you've completely missed the point. The problem he was referring to isn't that Java coders need to write their own floating point math routines (obviously they don't); the problem is that emulated floating point is godawful slow.


Do you realize that there are different types of emulation and even different types of slowness? Your computer emulates a Turing machine and it is quite slow at it, but I have yet to see complaints about either of those things.

By the way, compilers are theoretically capable of inlining function calls to functions that emulate floating point instructions at the site of every floating point instruction in their abstract syntax tree, which can be done in a single pass before any optimizations are done. While I have not done it myself, I know that it is being done for certain types of ARM processors that lack floating point units, and it is the right way to port programs that use floating point numbers, like the Java virtual machine, to processors that lack floating point hardware. It is also orders of magnitude faster than having the kernel emulate each explicit instruction because it does not involve the use of system interrupts, yet it is still a form of emulation and it is not much slower than having a hardware floating point unit in the first place. In rare cases, it could even be faster. It depends on what is being calculated and how the compiler's optimization stage is designed.
Last edited by Shining Arcanine on Fri Nov 05, 2010 12:50 pm, edited 1 time in total.
Disclaimer: I over-analyze everything, so try not to be offended if I over-analyze something you wrote.
Shining Arcanine
Gerbil Jedi
 
Posts: 1717
Joined: Wed Jun 11, 2003 11:30 am

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 12:40 pm

Shining Arcanine wrote:Your computer emulates a Turing machine and it is quite slow at it, but I have yet to see complaints about either of those things.

:lol: :lol: :lol:
Can somebody explain the difference between emulating a Turing Machine and fitting within the class of machines which can be described by a Turing Machine? I would but he blocked me for proving him wrong too many times.
Core i7 920, 3x2GB Corsair DDR3 1600, 80GB X25-M, 1TB WD Caviar Black, MSI X58 Pro-E, Radeon 4890, Cooler Master iGreen 600, Antec P183, opticals
SNM
Emperor Gerbilius I
 
Posts: 6206
Joined: Fri Dec 30, 2005 10:37 am

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 1:10 pm

Shining Arcanine wrote:I do not think you understand your position. Java bytecode runs on the Java virtual machine, so dealing with processors that lack floating point units is a task that the JVM developers must handle and you are completely insulated from it. An alternative would be moving to an operating system whose kernel emulates floating point instructions, which is acceptable for legacy programs and does not require that your organization write any new code.


Umm.... sort of. The JVM could do it, but it's really the slowness of 64-bit double precision emulation where it becomes a problem. And I should be insulated, in theory. But this is where theory sort of runs out. If you think Oracle is going to come out with a revolutionary system that off-loads fp operations to a gpu or a cpu emulator and then guarantee it will work on every piece of software that uses Java.... it just isn't going to happen. And that is what is needed for 99% of existing businesses who would consider switching. It has to be guaranteed, and most wouldn't do it even if it were.

And changing OSes really isn't going to be an option. We can't even change web browsers (yes, we'll be in IE6 and WinXP on user stations for the next few years). Get an internship at a telco or the federal government. There will be a complete disconnect for you. Particularly when you are working with systems that pre-date computers and where up-time is vastly more important than efficiency.
Sony a7
Sony Zeiss 55/1.8 SSM, 24-70/4 SSM
Minolta 17-35/2.8-4 D, 100-300 APO
TheEmrys
Minister of Gerbil Affairs
Silver subscriber
 
 
Posts: 2150
Joined: Wed May 29, 2002 8:22 pm
Location: Northern Colorado

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 1:22 pm

Bottom line of all this misinformation: Yes, it is theoretically possible to remove FPUs from modern x86 processors with enough software support. No, you would never want to do such a foolish thing even if you could squeeze an extra INT-only core in place of the FPU. Unless your goal is to return to FP performance levels from 15+ years ago.

Just because something can be done, doesn't mean it should be done.
"I take sibling rivalry to the whole next level, if it doesn't require minor sugery or atleast a trip to the ER, you don't love her." - pete_roth
"Yeah, I see why you'd want a good gas whacker then." - VRock
dextrous
Gerbil Elite
 
Posts: 563
Joined: Mon Nov 22, 2004 1:49 pm
Location: Ooooooooooklahoma

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 1:31 pm

TheEmrys wrote:
Shining Arcanine wrote:I do not think you understand your position. Java bytecode runs on the Java virtual machine, so dealing with processors that lack floating point units is a task that the JVM developers must handle and you are completely insulated from it. An alternative would be moving to an operating system whose kernel emulates floating point instructions, which is acceptable for legacy programs and does not require that your organization write any new code.


Umm.... sort of. The JVM could do it, but it's really the slowness of 64-bit double precision emulation where it becomes a problem. And I should be insulated, in theory. But this is where theory sort of runs out. If you think Oracle is going to come out with a revolutionary system that off-loads fp operations to a gpu or a cpu emulator and then guarantee it will work on every piece of software that uses Java.... it just isn't going to happen. And that is what is needed for 99% of existing businesses who would consider switching. It has to be guaranteed, and most wouldn't do it even if it were.

And changing OSes really isn't going to be an option. We can't even change web browsers (yes, we'll be in IE6 and WinXP on user stations for the next few years). Get an internship at a telco or the federal government. There will be a complete disconnect for you. Particularly when you are working with systems that pre-date computers and where up-time is vastly more important than efficiency.


Why do you think that the 64-bit performance is an issue, but not the 32-bit performance?

By the way, Microsoft is capable of patching Windows XP to emulate floating point instructions on Intel-compatible processors that lack them. It is incredibly easy to do in comparison to writing a real emulator. One programmer could probably do it in a week assuming he is familiar with the design of the NT kernel.

dextrous wrote:Bottom line of all this misinformation: Yes, it is theoretically possible to remove FPUs from modern x86 processors with enough software support. No, you would never want to do such foolish thing even if you could squeeze an extra INT-only core in place of the FPU. Unless your goal is to return to FP performance levels from 15+ years ago.


I think you missed the bottom line, which is that floating point performance in CPUs is not important to the point where people should be arguing over how well the floating point units in AMD's new CPUs perform. That is why I asked why people care about it in the first place, and it is also why I explained why the units are unnecessary. The performance of unnecessary units is not really an area that merits people's attention.

dextrous wrote:Just because something can be done, doesn't mean it should be done.


Exactly. Why make the floating point units perform better when the calculations will be moving to the GPU anyway?
Disclaimer: I over-analyze everything, so try not to be offended if I over-analyze something you wrote.
Shining Arcanine
Gerbil Jedi
 
Posts: 1717
Joined: Wed Jun 11, 2003 11:30 am

Re: Floating-point units in server-grade CPUs

Posted on Fri Nov 05, 2010 1:39 pm

Shining Arcanine wrote:
Crayon Shin Chan wrote:I use doubles to calculate the quadratic equation, and for a fuzzy logic project. It's in the hardware because a long time ago people realized this needed to be accelerated. Let's keep the FPU inside the CPU, it's more convenient for everybody that way.


I used double variables in a homework assignment I did yesterday for my Numerical Analysis class. I was doing Newtonian mechanics simulations and at the scale I was doing them, the presence of hardware floating point units did not make a difference in whether or not it finished in an acceptable time-frame. If it did matter, I could have used the Runge-Kutta method instead of Euler's method to obtain solutions of the differential equations involved.

Outside of legacy scientific computing software where you can wait months or even years for computations to finish, I am not sure why anyone would need a hardware floating point unit in their CPU. Processors are fast enough that the things that hardware floating point units made computable per unit time 10 years ago are computable per unit time with compiler generated integer instructions today. Aside from legacy scientific computing software, there is no killer application that takes advantage of hardware floating point units in CPUs, because even if the CPU is as optimal as possible, it is still too slow. Having these calculations be done on GPUs is the way forward and it is not just me who thinks this. The NCSA director made public comments on this recently, which are identical to what I am saying:

http://insidehpc.com/2010/11/02/ncsa-di ... computing/

General purpose logic is always slower than dedicated logic. This somewhat contradicts historical experience, but historically, since clock speeds increased with transistor budgets, economies of scale enabled companies like Intel to take advantage of higher clock speeds and greater transistor budgets from more advanced process technology and perform well enough that the dedicated hardware could not compete from a performance/price perspective. Today, since you cannot get faster clock speeds from more advanced process technologies, you must add constraints on how things are done to continue scaling; specifically, those constraints are that the same function is done on independent data in parallel, which is stream processing. If you go further back in history to the advent of the CPU, you would find that simply doing things on the CPU placed constraints on how things are done, and it only makes sense that moving forward beyond what the CPU enabled would require additional constraints.

Furthermore, it is difficult to scale floating point computation intensive calculations without doing the same functions independently of one another, and if you do them independently of one another, you have an application that exploits stream processing. It is so difficult to scale such calculations that, as far as I know, there does not exist a single floating point computation intensive application that is not a stream processing application yet can still be accelerated by SMP CPUs. With that in mind, I do not see how the presence of a hardware floating point unit helped your project. It seems to me that you are crediting a very specific approximation of a deterministic Turing machine for what is given to you by a much larger category of approximations of deterministic Turing machines. How is that not the case?


The FPU helped because I used it. That's how it helped. Regardless of whether I needed it done within a timeframe, I used it. Ergo, it helped. And you are not one to talk, having admitted to using it as well. It doesn't matter whether emulation would've been fast enough for your needs, because you used it. If you go on with this argument, we can all draw similarities to the "I never use Linux on my PS3, therefore everyone else shouldn't have the option to" situation. Which I am still bitter about, Sony.
Mothership: Thuban 1055T@3.7GHz, 12GB DDR3, M5A99X EVO, GTX470+Icy Vision Rev.2@840/3800, Vertex 2E 60GB
Supply ship: Sargas@2.8GHz, 12GB DDR3, M4A88TD-V EVO/USB3
Corsair: Macbook Air Ivy Bridge
Crayon Shin Chan
Minister of Gerbil Affairs
 
Posts: 2237
Joined: Fri Sep 06, 2002 11:14 am
Location: Malaysia

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 1:43 pm

Crayon Shin Chan wrote:
Shining Arcanine wrote:
Crayon Shin Chan wrote:I use doubles to calculate the quadratic equation, and for a fuzzy logic project. It's in the hardware because a long time ago people realized this needed to be accelerated. Let's keep the FPU inside the CPU, it's more convenient for everybody that way.


I used double variables in a homework assignment I did yesterday for my Numerical Analysis class. I was doing Newtonian mechanics simulations and at the scale I was doing them, the presence of hardware floating point units did not make a difference in whether or not it finished in an acceptable time-frame. If it did matter, I could have used the Runge-Kutta method instead of Euler's method to obtain solutions of the differential equations involved.
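The Euler-versus-Runge-Kutta trade-off mentioned here can be sketched in a few lines of Python (the ODE dy/dt = y with exact solution e^t is a stand-in, not the actual homework problem): classical RK4 reaches far better accuracy with a fraction of the steps.

```python
import math

def euler_solve(f, y0, t_end, n):
    """Euler's method with n fixed steps."""
    h = t_end / n
    y = y0
    for _ in range(n):
        y += h * f(y)
    return y

def rk4_solve(f, y0, t_end, n):
    """Classical 4th-order Runge-Kutta with n fixed steps."""
    h = t_end / n
    y = y0
    for _ in range(n):
        k1 = f(y)
        k2 = f(y + h * k1 / 2)
        k3 = f(y + h * k2 / 2)
        k4 = f(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y

f = lambda y: y                      # dy/dt = y, exact solution e^t
exact = math.e                       # y(1) given y(0) = 1
euler_err = abs(euler_solve(f, 1.0, 1.0, 10) - exact)  # 10 steps
rk4_err = abs(rk4_solve(f, 1.0, 1.0, 2) - exact)       # only 2 steps
```

With a fifth of the steps, RK4 still lands several orders of magnitude closer to the true value than Euler.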

Outside of legacy scientific computing software where you can wait months or even years for computations to finish, I am not sure why anyone would need a hardware floating point unit in their CPU. Processors are fast enough that the things that hardware floating point units made computable per unit time 10 years ago are computable per unit time with compiler-generated integer instructions today. Aside from legacy scientific computing software, there is no killer application that takes advantage of hardware floating point units in CPUs, because even if the CPU is as optimal as possible, it is still too slow. Having these calculations be done on GPUs is the way forward and it is not just me who thinks this. The NCSA director made public comments on this recently, which are identical to what I am saying:

http://insidehpc.com/2010/11/02/ncsa-di ... computing/

General purpose logic is always slower than dedicated logic. This somewhat contradicts historical experience, but historically, since clock speeds increased alongside transistor budgets, economies of scale enabled companies like Intel to exploit the higher clock speeds and greater transistor budgets of more advanced process technology and perform well enough that dedicated hardware could not compete from a performance/price perspective. Today, since you cannot get faster clock speeds from more advanced process technologies, you must add constraints on how things are done to continue scaling; specifically, the constraint is that the same function is performed on independent data in parallel, which is stream processing. If you go further back in history to the advent of the CPU, you would find that simply doing things on the CPU placed constraints on how things were done, and it only makes sense that moving beyond what the CPU enabled would require additional constraints.

Furthermore, it is difficult to scale floating point computation intensive calculations without doing the same functions independently of one another, and if you do them independently of one another, you have an application that exploits stream processing. It is so difficult to scale such calculations that, as far as I know, there does not exist a single application doing floating point computation intensive calculations which is not a stream processing application yet can be accelerated by SMP CPUs. With that in mind, I do not see how the presence of a hardware floating point unit helped your project. It seems to me that you are crediting a very specific approximation of a deterministic Turing machine for what is given to you by a much larger category of approximations of deterministic Turing machines. How is that not the case?


The FPU helped because I used it. That's how it helped. Regardless of whether I needed it done within a timeframe, I used it. Ergo, it helped. And you are not one to talk, having admitted to using it as well. It doesn't matter whether emulation would've been fast enough for your needs, because you used it. If you go on with this argument, we can all draw similarities to the "I never use Linux on my PS3, therefore everyone else shouldn't have the option to" situation. Which I am still bitter about, Sony.


Your programming language helped you because you used it. Whether or not the code that your compiler generated used floating point instructions is irrelevant. You are not in a position to say that you are explicitly using your CPU's floating point unit unless you either write your programs in assembly or wrote the compiler you use, because your compiler divorces the underlying hardware from high level programming language constructs. That has been what it means to use a high level programming language since the inception of the concept with FORTRAN.
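The claim that integer instructions can stand in for FP hardware can be sketched concretely (editor's illustration; Q16.16 fixed point is just one possible soft-float scheme, and the helper names are invented):

```python
# Q16.16 fixed point: 16 integer bits, 16 fractional bits.
SCALE_BITS = 16
SCALE = 1 << SCALE_BITS

def to_fixed(x):
    """Encode a real number as a scaled integer."""
    return int(round(x * SCALE))

def fixed_mul(a, b):
    # Pure integer multiply and shift -- no FPU involvement.
    return (a * b) >> SCALE_BITS

def from_fixed(a):
    return a / SCALE  # conversion back for display only

hw = 3.25 * 2.5                                             # float path
sw = from_fixed(fixed_mul(to_fixed(3.25), to_fixed(2.5)))   # integer path
```

Both paths produce exactly 8.125 here; the high-level source never says which one the toolchain picked, which is the point about abstraction above.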
Last edited by Shining Arcanine on Fri Nov 05, 2010 1:48 pm, edited 1 time in total.
Disclaimer: I over-analyze everything, so try not to be offended if I over-analyze something you wrote.
Shining Arcanine
Gerbil Jedi
 
Posts: 1717
Joined: Wed Jun 11, 2003 11:30 am

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 1:48 pm

Shining Arcanine wrote:Exactly. Why make the floating point units perform better when the calculations will be moving to the GPU anyway?


Because nobody expects every line of code that uses floats or doubles to be rewritten in a 1-2 year old standard for a piece of silicon that may or may not be inside everybody's computers. Yes, there are people out there without an OpenCL-capable GPU. There are also people out there that don't have or want a GPU, like those who shell out money to buy servers. Hence, AMD's Fusion. And while AMD's slaving away trying to make sure that everybody gets higher floating point performance, you are here saying that we don't need floating point at all? The trend is obvious.
Mothership: Thuban 1055T@3.7GHz, 12GB DDR3, M5A99X EVO, GTX470+Icy Vision Rev.2@840/3800, Vertex 2E 60GB
Supply ship: Sargas@2.8GHz, 12GB DDR3, M4A88TD-V EVO/USB3
Corsair: Macbook Air Ivy Bridge
Crayon Shin Chan
Minister of Gerbil Affairs
 
Posts: 2237
Joined: Fri Sep 06, 2002 11:14 am
Location: Malaysia

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 1:50 pm

Crayon Shin Chan wrote:
Shining Arcanine wrote:Exactly. Why make the floating point units perform better when the calculations will be moving to the GPU anyway?


Because nobody expects every line of code that uses floats or doubles to be rewritten in a 1-2 year old standard for a piece of silicon that may or may not be inside everybody's computers. Yes, there are people out there without an OpenCL-capable GPU. There are also people out there that don't have or want a GPU, like those who shell out money to buy servers. Hence, AMD's Fusion. And while AMD's slaving away trying to make sure that everybody gets higher floating point performance, you are here saying that we don't need floating point at all? The trend is obvious.


The way forward is clear and there are ways of providing legacy support to people that stay behind. How does this differ from being an issue of legacy code and legacy hardware? How does the floating point unit design in AMD's new processors differ from a step in that direction? How does what you say differ from a strawman argument?
Disclaimer: I over-analyze everything, so try not to be offended if I over-analyze something you wrote.
Shining Arcanine
Gerbil Jedi
 
Posts: 1717
Joined: Wed Jun 11, 2003 11:30 am

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 2:02 pm

Shining Arcanine wrote:
Crayon Shin Chan wrote:
Shining Arcanine wrote:Exactly. Why make the floating point units perform better when the calculations will be moving to the GPU anyway?


Because nobody expects every line of code that uses floats or doubles to be rewritten in a 1-2 year old standard for a piece of silicon that may or may not be inside everybody's computers. Yes, there are people out there without an OpenCL-capable GPU. There are also people out there that don't have or want a GPU, like those who shell out money to buy servers. Hence, AMD's Fusion. And while AMD's slaving away trying to make sure that everybody gets higher floating point performance, you are here saying that we don't need floating point at all? The trend is obvious.


The way forward is clear and there are still ways of providing legacy support for these people. How does this differ from being an issue of legacy code and legacy hardware? How does the floating point unit design in AMD's new processors differ from a step in that direction?


The way forward is not clear. We know that OpenCL and CUDA will be around for a long time. For GPU based computing to really take off, the compiler should already be writing OpenCL for me when I declare/initialize a float. But no, you have to include a whole set of new libraries. Fusion is an initiative to wed GPU type FP hardware with the x86 world, so that such things aren't needed when running code on that particular type of floating point unit. In the end, everybody will call it an FPU, and you will dance around singing "move computation to the GPU!" and everybody will laugh at you, asking why on earth would they need to do that when on Fusion, their legacy x87 code is already running on an array of shaders thanks to a lot of hardware translation magic.

Like it or not, you will be using a hardware FPU in the future. And this GPU craze you're talking about will be integrated into the CPU anyway, where it will be called an FPU, rendering your point quite moot.
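The complaint above about needing "a whole set of new libraries" can be sketched as follows. The plain-Python SAXPY below is an editor's illustration of the CPU path, and the OpenCL steps listed in the comments are the usual host-side boilerplate, not code from this thread:

```python
def saxpy(a, x, y):
    """y = a*x + y in plain high-level code; the runtime maps the
    float math onto whatever FP hardware the CPU provides."""
    return [a * xi + yi for xi, yi in zip(x, y)]

result = saxpy(2.0, [1.0, 2.0], [10.0, 20.0])

# The OpenCL route to the same arithmetic would additionally need:
#   - kernel source written in OpenCL C,
#   - platform/device discovery, a context and a command queue,
#   - explicit buffer allocation and host<->device copies.
# None of that comes free just from declaring a float.
```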
Mothership: Thuban 1055T@3.7GHz, 12GB DDR3, M5A99X EVO, GTX470+Icy Vision Rev.2@840/3800, Vertex 2E 60GB
Supply ship: Sargas@2.8GHz, 12GB DDR3, M4A88TD-V EVO/USB3
Corsair: Macbook Air Ivy Bridge
Crayon Shin Chan
Minister of Gerbil Affairs
 
Posts: 2237
Joined: Fri Sep 06, 2002 11:14 am
Location: Malaysia

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 2:19 pm

SA, I don't think you're getting it: your idealism is based in a fantasy world, not the world the rest of us live in. You are simply wrong on this point. Period.

If your claim had any relevance, Intel and AMD would both be removing FPUs from processors to use for other things, or to reduce die size and power consumption. They aren't. They're not stupid. Even all but the most limited embedded processors have hardware FP support. Evidently they know something you fail to acknowledge: hardware FP is valuable in current software.
Buub
Maximum Gerbil
Silver subscriber
 
 
Posts: 4195
Joined: Sat Nov 09, 2002 11:59 pm
Location: Seattle, WA

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 2:46 pm

I give up. This is no longer a discussion (was it ever?) on the merits and reasons a CPU-based FPU matters. It's become a debate, and it doesn't work when some are more interested in winning than in meaningful discourse.
Sony a7
Sony Zeiss 55/1.8 SSM, 24-70/4 SSM
Minolta 17-35/2.8-4 D, 100-300 APO
TheEmrys
Minister of Gerbil Affairs
Silver subscriber
 
 
Posts: 2150
Joined: Wed May 29, 2002 8:22 pm
Location: Northern Colorado

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 2:50 pm

Crayon Shin Chan wrote:Like it or not, you will be using a hardware FPU in the future. And this GPU craze you're talking about will be integrated into the CPU anyway, where it will be called an FPU, rendering your point quite moot.


And this is, of course, a conversation we've all been trying to have with SA for quite some time.

Whether GPU or CPU, we're still talking hardware, right? So, really, what's the difference? There isn't any! What we call a "CPU" today used to be numerous different discrete chips. For a recent example, look at how the memory controller used to be part of the "chipset" but now is part of the CPU. Why is FPU code in x86 called x87? Because it used to be a different chip! The CPU ended in 86 and the FPU ended in 87. It's all transistors! The only question is how many you can fit on a single IC.

So why have GPUs remained unassimilated for so long? Because the workload they do is very specific, very useful, and so embarrassingly parallel that endless amounts of transistors can be thrown at it. Because of that workload, a GPU can fill up an entire IC all by itself. You can make one as big as economics and process tech allow. But, no matter what the size, it's still only going to be good at the one workload. That's the only reason why you can profitably use so many transistors in the first place! If those transistors could be used to accelerate general purpose workloads, you wouldn't see CPUs having 2, 4, 6, 8 cores on a single IC. That's because you can't rely on workloads being embarrassingly parallel.

There is no magic. It's all design choices in how you use the transistors you have. GPUs are monsters because they're doing something very specific that is INCREDIBLY parallelizable. But the less and less specific they get, the less and less monstrous they can be. If you use those transistors to design something that is truly general purpose, you'll have what we already have: a CPU. You're not going to get massive performance improvements just because you have the NVIDIA brand.
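The point that only embarrassingly parallel workloads repay that many transistors can be quantified with Amdahl's law (editor's sketch; the fractions and unit counts below are invented for illustration):

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Amdahl's law: best-case speedup when only parallel_fraction
    of the work can be spread over n_units execution units."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_units)

# A 95%-parallel, shader-style workload scales well on 512 units...
gpu_friendly = amdahl_speedup(0.95, 512)
# ...but a half-serial general purpose workload barely doubles,
# no matter how many transistors you throw at it.
general = amdahl_speedup(0.50, 512)
```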
Glorious
Darth Gerbil
Gold subscriber
 
 
Posts: 7839
Joined: Tue Aug 27, 2002 6:35 pm

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 3:26 pm

There is a truly stupendous amount of fail in this thread.

I'm curious why people are still biting.
Think for yourself, schmuck!
i5-2500K@4.3|Asus P8P67-LE|8GB DDR3-1600|Powercolor R7850 2G|1.5TB 7200.11|1988 Model M|Saitek X-45 & P880|Logitech MX 518|Dell 2209WA|Sennheiser PC151|Asus Xonar DX
bthylafh
Grand Gerbil Poohbah
 
Posts: 3148
Joined: Mon Dec 29, 2003 11:55 pm
Location: Southwest Missouri, USA

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 4:40 pm

bthylafh wrote:There is a truly stupendous amount of fail in this thread.

I'm curious why people are still biting.

Well, other than the brief response I posted around noon today (and now this), I've basically been lurking since yesterday AM. It is like watching a slow-motion train wreck. The engine went off the rails sometime on day 2 or 3, but the train kept plowing forward until around day 6 when the tanker cars started exploding. Yesterday it went off a cliff. Today (day 9) it is a flaming pile of twisted wreckage at the bottom of said cliff...
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37632
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 6:50 pm

just brew it! wrote:Well, other than the brief response I posted around noon today (and now this), I've basically been lurking since yesterday AM. It is like watching a slow-motion train wreck. The engine went off the rails sometime on day 2 or 3, but the train kept plowing forward until around day 6 when the tanker cars started exploding. Yesterday it went off a cliff. Today (day 9) it is a flaming pile of twisted wreckage at the bottom of said cliff...

The fireman screams and the engine just gleams.
It is one of the blessings of old friends that you can afford to be stupid with them. Ralph Waldo Emerson.
Captain Ned
Global Moderator
Gold subscriber
 
 
Posts: 20183
Joined: Wed Jan 16, 2002 7:00 pm
Location: Vermont, USA

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 6:58 pm

Crayon Shin Chan wrote:
Shining Arcanine wrote:The way forward is clear and there are still ways of providing legacy support for these people. How does this differ from being an issue of legacy code and legacy hardware? How does the floating point unit design in AMD's new processors differ from a step in that direction?


The way forward is not clear. We know that OpenCL and CUDA will be around for a long time. For GPU based computing to really take off, the compiler should already be writing OpenCL for me when I declare/initialize a float. But no, you have to include a whole set of new libraries. Fusion is an initiative to wed GPU type FP hardware with the x86 world, so that such things aren't needed when running code on that particular type of floating point unit. In the end, everybody will call it an FPU, and you will dance around singing "move computation to the GPU!" and everybody will laugh at you, asking why on earth would they need to do that when on Fusion, their legacy x87 code is already running on an array of shaders thanks to a lot of hardware translation magic.

Like it or not, you will be using a hardware FPU in the future. And this GPU craze you're talking about will be integrated into the CPU anyway, where it will be called an FPU, rendering your point quite moot.


Integrating GPUs into CPUs would require unifying the logic, which would mean that you would have a bigger CPU, not a GPU integrated into a CPU. The only thing that you could do is put them both onto the same piece of silicon, like what Microsoft did with the most recent iteration of the Xbox 360. Even then, they will still be separate logical units. You cannot say that one is a subunit of the other, so neither of them was integrated into the other. The situation today with GPUs is much different than the situation in the past when you had accessory chips whose only purpose was to make the CPU better. While those were like the male angler-fish, the GPU is not.

Also, GPUs contain hardware floating point units as logical subunits. You cannot call a GPU a floating point unit anymore than you could call a CPU an integer unit.

Buub wrote:SA, I don't think you're getting it: your idealism is based in a fantasy world, not the world the rest of us live in. You are simply wrong on this point. Period.

If your claim had any relevance, Intel and AMD would both be removing FPUs from processors to use for other things, or to reduce die size and power consumption. They aren't. They're not stupid. Even all but the most limited embedded processors have hardware FP support. Evidently they know something you fail to acknowledge: hardware FP is valuable in current software.


While we are likely in a transition period, which could result in floating point units being scaled back or even removed from CPUs, the point is that the performance of floating point units in CPUs is of little importance to the general public. Hence the question: why do you care about floating point performance?

By the way, AMD is removing FPUs from their processors. Bulldozer has half the FPUs that it should have. AMD says that doing that saves die area and reduces power consumption.
Disclaimer: I over-analyze everything, so try not to be offended if I over-analyze something you wrote.
Shining Arcanine
Gerbil Jedi
 
Posts: 1717
Joined: Wed Jun 11, 2003 11:30 am

Re: Floating-point units in server-grade CPUs

Postposted on Fri Nov 05, 2010 7:48 pm

It is like watching a slow-motion train wreck. The engine went off the rails sometime on day 2 or 3, but the train kept plowing forward until around day 6 when the tanker cars started exploding. Yesterday it went off a cliff. Today (day 9) it is a flaming pile of twisted wreckage at the bottom of said cliff...


That's what she said!
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3521
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Floating-point units in server-grade CPUs

Postposted on Sat Nov 06, 2010 1:01 pm

Ryu Connor wrote:If the FPU on .35nm was only a small portion within the die, then it's very unlikely that at .045 it would suddenly yield enough space to become an entire core.

Wow, 0.35 nm? I think you're breaking a couple laws of physics already.

Shining Arcanine wrote:Science has always required that you find a way to force problems into a solution space, rather than the other way around. That is nothing new.

Science doesn't force problems.

Shining Arcanine wrote:Exactly. Why make the floating point units perform better when the calculations will be moving to the GPU anyway?

No they won't.
The burden of proof is on you.
Meadows
Grand Gerbil Poohbah
Silver subscriber
 
 
Posts: 3152
Joined: Mon Oct 08, 2007 1:10 pm
Location: Location: Location

Re: Floating-point units in server-grade CPUs

Postposted on Sat Nov 06, 2010 2:50 pm

Wow, 0.35 nm? I think you're breaking a couple laws of physics already.


My dearest and most pedantic, but lovable bastard,

You knew what I meant, but I realize you can't resist exciting me with such teases. Just for you, I went and copied and pasted the mu.

DIAF

Love,

Ryu



P5 .8 µm | 800nm
P54C .6 µm | 600nm
P54CS .35 µm | 350nm

Penryn .045 µm | 45nm
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
Gold subscriber
 
 
Posts: 3521
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Floating-point units in server-grade CPUs

Postposted on Sat Nov 06, 2010 6:52 pm

Interesting that you just proved our point and interpreted it the wrong way.

Shining Arcanine wrote:As for Android, I recently ran SunSpider on a Google Nexus One, which did not take much longer to complete than SunSpider in Google Chromium on my laptop's Intel Core T2400 processor. I think that the difference was 5 seconds versus 2 seconds, which is negligible. As mentioned earlier, Javascript relies solely on floating point operations, so it would seem that the "emulation" is not as slow as you would think it is.


So what you are saying is that 3 seconds doesn't make much difference, and you are correct. What you missed is that your "emulated" version on the Nexus One took 2.5x as long to run as the version on your laptop. Not a big deal when we are talking about less than ten seconds. But what about something that takes an hour to run on your desktop? Perhaps a POVRAY render or recoding that movie rip. Now it will take two and a half hours. That's a big difference. I happen to work in "The Real World (tm)" supporting some folks who actually make chips for a living, and I can tell you that if Intel or AMD decided to drop the FPU and made a lot of what we do take 2.5x longer, that manufacturer would not have another processor in our company. Despite what you may claim, a large category of problems do not lend themselves well to massive parallelization due to data dependencies in the calculation, algorithmic limitations, IO requirements, etc.
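For the record, the arithmetic here: 5 s against a 2 s baseline is 2.5x as long, i.e. 150% longer. A tiny sketch of how that factor projects onto longer jobs (editor's illustration; the numbers are from the SunSpider comparison above):

```python
def times_as_long(baseline_s, measured_s):
    """How many times as long the measured run took."""
    return measured_s / baseline_s

def percent_longer(baseline_s, measured_s):
    """How much longer the measured run took, as a percentage."""
    return (measured_s - baseline_s) / baseline_s * 100.0

factor = times_as_long(2.0, 5.0)     # 2.5x as long
extra = percent_longer(2.0, 5.0)     # 150% longer
hour_job_minutes = 60 * factor       # a one-hour job becomes 150 minutes
```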

--SS
Last edited by morphine on Sun Nov 07, 2010 12:53 am, edited 1 time in total.
Reason: fixed quote
SecretSquirrel
Gerbil Jedi
Gold subscriber
 
 
Posts: 1697
Joined: Tue Jan 01, 2002 7:00 pm
Location: The Colony, TX (Dallas suburb)

Re: Floating-point units in server-grade CPUs

Postposted on Mon Nov 08, 2010 7:44 am

Shining Arcanine wrote:The situation today with GPUs is much different than the situation in the past when you had accessory chips whose only purpose was to make the CPU better. While those were like the male angler-fish, the GPU is not.


Interesting. Here on our planet, in the real world, the GPU is indeed an "Accessory chip" that makes "the CPU better."

You know, seeing how a CPU can do everything a GPU can do, albeit slower, and how a GPU is useless without a CPU.
Glorious
Darth Gerbil
Gold subscriber
 
 
Posts: 7839
Joined: Tue Aug 27, 2002 6:35 pm

Re: Floating-point units in server-grade CPUs

Postposted on Mon Nov 08, 2010 7:58 am

Glorious wrote:
Shining Arcanine wrote:The situation today with GPUs is much different than the situation in the past when you had accessory chips whose only purpose was to make the CPU better. While those were like the male angler-fish, the GPU is not.

Interesting. Here on our planet, in the real world, the GPU is indeed an "Accessory chip" that makes "the CPU better."

You know, seeing how a CPU can do everything a GPU can do, albeit slower, and how a GPU is useless without a CPU.

Yup. A GPU is quite literally a specialized, highly parallelized FPU... one that just happens to have a couple of video outputs hanging off of it.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37632
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Floating-point units in server-grade CPUs

Postposted on Mon Nov 08, 2010 10:05 am

Shining Arcanine wrote:I think you missed the bottom line, which is that floating point performance is not important in CPUs to the point where people should be arguing over how well AMD's floating point units in their new CPUs perform. That is why I asked why people care about it in the first place and it is also why I explained why the units are unnecessary. The performance of unnecessary units is not really an area that merits people's attention.
Here are two more GPGPU people who believe that FPUs aren't unnecessary.
Could this obviate the need for extensive concurrency training for software developers? Can they simply offload parallel computation to the GPU, which, unlike the CPU, has the potential to linearly scale performance the more cores it has? Can you just “fire it and forget it,” as Sanford Russell, general manager of CUDA and GPU Computing at Nvidia, puts it? Sorry, no.

“The goal is not to offload the CPU. Use CPUs for the things they’re best at and GPUs for the things they’re best at,” said (Mark) Murphy. An example is a magnetic resonance imaging reconstruction program that he found worked best on the six-core Westmere CPU. “The per-task working set just happened to be 2MB, and the CPU had 12MB of cache per socket and six cores," said Murphy. "So if you have a problem with 2MB or less per task, then it maps very beautifully to the L3 cache. Two L3 caches can actually supply data at a higher rate than the GPU can."
http://www.sdtimes.com/content/article.aspx?ArticleID=34842&page=2
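Murphy's cache argument checks out with simple arithmetic (editor's sketch; the sizes are taken from the quote above):

```python
# Figures from the quoted MRI reconstruction example.
l3_per_socket_mb = 12    # Westmere-class CPU: 12 MB shared L3 per socket
cores_per_socket = 6
working_set_mb = 2       # per-task working set

total_hot_data = cores_per_socket * working_set_mb   # 6 tasks * 2 MB
fits_in_l3 = total_hot_data <= l3_per_socket_mb      # 12 MB: just fits
```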
wibeasley
Gerbil Elite
Gold subscriber
 
 
Posts: 952
Joined: Sat Mar 29, 2008 3:19 pm
Location: Norman OK
