SA wrote: If they are large enough that they are taking a noticeable amount of time, then I am certain that you will find a way to parallelize them.
Dude, when you say things like that, we have to wonder if you even understand what the word "parallelize" means. Not every problem is parallelizable. The classic analogy is that bearing a child takes 9 months, no matter how many women you have. It is not impossible, or even all that unlikely, to have a chain of calculations where each one depends on the result of the one before it. Steps like that cannot be split up and run concurrently.
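To make "dependent calculations" concrete, here's a minimal Python sketch (my own illustration, nothing SA posted): iterated SHA-256, where step i cannot start until step i - 1 has produced its digest. The hash is just a convenient stand-in for any computation whose input is the previous step's output. No number of cores lets you get step 1000 without first doing steps 1 through 999.

```python
import hashlib


def hash_chain(seed: bytes, steps: int) -> bytes:
    """Apply SHA-256 `steps` times; each step consumes the previous digest."""
    digest = seed
    for _ in range(steps):
        # Loop-carried dependency: this iteration needs the result of the
        # last one, so no two iterations can run at the same time.
        digest = hashlib.sha256(digest).digest()
    return digest
```

The only way to "split" the chain is sequentially: `hash_chain(hash_chain(seed, 3), 2)` gives the same result as `hash_chain(seed, 5)`, but the outer call still has to wait for the inner one to finish.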
SA wrote: Google Chromium is an excellent example of this, where putting each page into its own separate process parallelized webpage rendering in a tabbed web browser, which was slow with the single renderer thread approach Firefox took.
That's a "problem" that's obviously parallelizable because it is composed of independent tasks. Each webpage doesn't have any dependency on any other webpage. That's not finding concurrency within a problem; that's just having a bunch of different problems to begin with.
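For contrast, here's the shape of the Chromium case as a minimal sketch. `render_page` is a hypothetical stand-in for rendering one page; because no call reads another call's output, the calls can be handed to a pool and run in any order. This uses standard-library threads purely to show the structure. In CPython, CPU-bound Python code won't actually run faster on threads because of the GIL, so don't read it as a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def render_page(url: str) -> str:
    """Hypothetical stand-in for rendering one page.

    It reads nothing produced by any other page, which is the whole point.
    """
    return f"<rendered {url}>"


def render_all(urls: list[str], workers: int = 4) -> list[str]:
    # Independent tasks: the pool may schedule them in any order, and
    # Executor.map still returns results in the same order as `urls`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_page, urls))
```

Note that nothing here "found" concurrency inside a problem; the tasks were separate from the start.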
And if it were strictly a performance question, they would have just threaded it; the primary reason behind the process-per-window/tab concept was the security model that comes with processes.
SA wrote: I doubt that everything you run is one massive problem that cannot be broken into separate threads and if it can, you can likely put it into a SIMD programming model.
You're the one speaking in absurd absolutes here, not us. NO ONE has claimed that nothing is parallelizable. We're just saying that some things aren't, and you can't handwave that away.
SA wrote: Regardless, everyone, everywhere agrees that the single threaded programming model is a dead-end in terms of performance.
The performance increases have slowed down, this is true, but single-thread performance still matters and will continue to matter. If it DIDN'T matter, you'd see ICs with 16 in-order cores taking the world by storm. I don't see that, do you?
This ridiculous faith you have in the notion that every problem can be parallelized is just, well, absurd. It plainly isn't true. It's also more complicated than that, because even if most of your problem is parallelizable, there is still a hard limit to how much performance you can gain by throwing parallel execution at it. Guess what the limit is? Oh, right, the amount of time your program spends in the parts that aren't parallelizable. Adding more and more parallel execution only approaches that limit asymptotically! In other words, even in a world with "free" parallelization hardware (instantaneously fast Teslas for everyone!), single-threaded performance will always matter. In fact, such a world would make single-threaded performance the DETERMINING factor!
It will still matter! It will always matter!
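For anyone who wants numbers, here's a small Python sketch of that ceiling. The formula is Amdahl's law; `parallel_fraction` (p) is the share of the original runtime that parallelizes perfectly and `workers` (n) is the number of parallel units, both idealized assumptions (no overhead, no communication cost).

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup from Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Ceiling as the number of workers goes to infinity: 1 / (1 - p)."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0.0 else 1.0 / serial
```

With p = 0.95, 16 workers give roughly a 9.1x speedup, and no number of workers can get you past 20x, because the 5% serial part alone costs 1/20 of the original runtime. Make that serial part faster and the ceiling moves; add cores and it doesn't.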
What I am referring to is known as Amdahl's law. Ubergerbil wrote a great post about this some years back:
viewtopic.php?f=2&t=44090&hilit=amdahl
SA wrote: Any business that cannot parallelize its critical software applications will be killed by those that can, in which case, the strength of a single processing unit does not matter so long as you have a sufficiently large number of them.
That may be true if performance is extremely important to your product and you're leaving possible concurrency on the table that your competitors are picking up, but it's not true if your product is designed to deal with problems that inherently cannot be parallelized well.
And, again, if the strength of a single "processing unit" didn't matter, why don't we see ICs with umpteen in-order cores dominating the market?
SA wrote: I think you are ignoring the point being that if I can make a decent argument for them being unnecessary, then their actual performance is not really something that should be a concern for people.
If anyone is "ignoring" your point, that's because your "point" is a fantasy. You're not making a decent argument that they're unnecessary; you're just waving your hand and saying they are.
It's like starting a mathematical proof with an a priori definition for division by zero and then "proving" a whole host of mathematical concepts. Yes, you can do some pretty groundbreaking things once you do that (1 can now equal 2, AWESOME!). It's just that, well, you know, we're not really impressed. Saying we should just ignore your first statement and concentrate on your later work because it's so incredible is missing the point.
SA wrote: In computer hardware, floating point units are logical units that take data inputs and an input and produce a data output according to those inputs, with a mapping from inputs to outputs that corresponds to the IEEE754 standard.
Not that I fully understand what the heck you even mean, but IEEE754 is a bit more than just "how do I perform operations on floats of like precision." There are subtle but incredibly important matters like "how do I do operations between floats of differing precisions" and "how do I handle exceptions." There are rounding modes, FMAs, subnormals, lions, tigers and bears! Not so incidentally, those kinds of things are actually the complex parts of the standard that take up the majority of its text.
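A few of those details are easy to poke at from Python, whose `float` is IEEE 754 binary64 on every mainstream platform. This sketch uses the standard `struct` module to round a double to binary32 (a conversion between differing precisions), checks gradual underflow into subnormals, and shows round-half-to-even at the very bottom of the range. `to_float32` and `is_subnormal` are just illustrative helper names; this is a taste, not a tour of the standard.

```python
import struct
import sys


def to_float32(x: float) -> float:
    """Round a binary64 value to IEEE 754 binary32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_subnormal(x: float) -> bool:
    """True for nonzero doubles smaller in magnitude than the smallest normal."""
    return x != 0.0 and abs(x) < sys.float_info.min


# Differing precisions: 0.1 rounds to a different value in single precision.
single_tenth = to_float32(0.1)

# Gradual underflow: halving the smallest normal double does not flush to zero.
half_of_min_normal = sys.float_info.min / 2

# Rounding: half the smallest subnormal is an exact tie between 0 and 5e-324;
# round-half-to-even picks 0.0.
tie_result = 5e-324 / 2
```

`single_tenth` comes back as 0.10000000149011612 rather than 0.1, `half_of_min_normal` is a nonzero subnormal, and `tie_result` is 0.0.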
SA wrote: If your statements are correct in saying that GPUs are floating point units, then block diagrams of GPUs contradict your statements by failing to adhere to the definition of a floating point unit. Here is a block diagram for a recent GPU:
Here's what Scott prefaced that diagram with:
Scott Wasson wrote: Images like the one below may not mean much divorced from context
And he's even more right when they're used in the WRONG context.
SA wrote: Since what you say contradicts the definition of a floating point unit, what do you consider a floating point unit to be?
Logic that is intended to deal with floats?
SA wrote: By the way, as a side note, page 106 of Nvidia's CUDA programming guide states that integer types are supported, which means that you can do integer operations on Nvidia's GPUs:
Do all of them handle them natively though, or just Fermi? Because the fact that a programming framework can use them doesn't exactly mean a whole lot by itself, you know?
And, with respect to Fermi, it's perhaps more of a vector processor than a straight FPU, which JBI covered by saying "specialized" and "highly parallel." So, what do you think you are showing?
SA wrote: Emulation is usually used in reference to simulating a full machine.
News to me. When people are talking about FPUs and embedded processors, they're usually talking about software emulation, kernel emulation, or how the processor can emulate having an FPU through microcode that just uses its ALU. In all of these cases, you're not simulating the "full machine," and in software emulation, you're not even simulating instructions at all.
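For a feel of what software float emulation actually works with, here's a minimal sketch (hypothetical helper names, finite values only) of the first step a soft-float routine typically performs: pulling the sign, biased exponent and fraction out of a binary32 encoding using nothing but integer shifts and masks. The rebuild function uses `math.ldexp` only to check the decoding; a real soft-float library would stay in integer arithmetic the whole way through.

```python
import math
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Split the binary32 encoding of x into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, bias 127
    fraction = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction


def float32_value(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild the value from its fields (checking aid; finite values only)."""
    if exponent == 0xFF:
        raise ValueError("infinity/NaN not handled in this sketch")
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -126.
        magnitude = math.ldexp(fraction, -126 - 23)
    else:
        # Normal: implicit leading 1 just above the 23 fraction bits.
        magnitude = math.ldexp((1 << 23) | fraction, exponent - 127 - 23)
    return -magnitude if sign else magnitude
```

None of this simulates a "full machine"; it's just integer code standing in for one piece of hardware.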
SA wrote: When I realized your misuse of terminology, I edited my post to compensate for it.

Just because he uses a word in a context you're unfamiliar with doesn't mean he's wrong. It's just your raging absolutism leading you into silliness again.
Just because you think you have a really cool, nicely defined and easily understood box doesn't mean you can suddenly stuff the entire world into it. And your box sucks anyway. Stop telling us what you *think* you've learned in class and actually pay more attention. This isn't just real world versus the academy, because you regularly get the theory wrong too.