SoulSlave wrote:I see. I was in fact wondering, because of the way Bulldozer decodes 256-bit SIMD instructions into two smaller instructions (did I get that right?), while it chews through 128-bit instructions directly...
Meaning it would have twice as much throughput as SB in 128-bit instructions and half as much in 256-bit ones...
As for discussing it, I always thought that forums were for that purpose... I can see the point in discussing things that are already out on the market, but I think it is way cooler to gain a little perspective by discussing new technologies with people who might know more than you...
SoulSlave wrote:I am no expert, but it seems a reasonable line of thought. It would be nice if someone could corroborate it (or explain why I am wrong), though...
JF-AMD wrote:Why would it have 1/2 the FLOPs?
Sandy Bridge has 8 FP units. In 256-bit mode it is 8 256-bit. In 128-bit mode it is 8 128-bit.
Bulldozer has 8 Flex FP units. Because it has 16 cores and 16 schedulers, in 256-bit mode it is 8 256-bit units. In 128-bit mode, the FP splits into 2 FMACs that can each operate simultaneously on 2 different threads. In 128-bit software you will have 16 FP units.
Most software will not be recompiled for AVX right off the bat, so assume that there is going to be a lot of legacy advantage for AMD.
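For anyone who wants to check the arithmetic in JF-AMD's post, here is a rough Python sketch that tallies how many 128-bit lanes each chip could work on per cycle. It uses only the unit counts quoted above (8 FP units for Sandy Bridge, 8 Flex FP units splitting into 16 for Bulldozer). The function and variable names are made up for illustration, and clock speed, issue limits and real instruction latencies are ignored, so treat it as a tally of the claim and not a performance model.

```python
# Tally of 128-bit lanes available per cycle, using only the unit counts
# quoted in JF-AMD's post. Names are illustrative; clocks, issue width and
# latencies are deliberately ignored (assumption: one operation per unit per cycle).

def lanes_per_cycle(units: int, unit_width_bits: int) -> int:
    """Number of 128-bit lanes the chip can operate on in one cycle."""
    return units * unit_width_bits // 128

# (number of FP units, width of each unit in bits) per mode, as stated in the post
sandy_bridge = {"128-bit": (8, 128), "256-bit": (8, 256)}
bulldozer = {"128-bit": (16, 128), "256-bit": (8, 256)}

def summarize(chip: dict) -> dict:
    """Map each mode to its lanes-per-cycle figure."""
    return {mode: lanes_per_cycle(units, width) for mode, (units, width) in chip.items()}

if __name__ == "__main__":
    print("Sandy Bridge:", summarize(sandy_bridge))
    print("Bulldozer:   ", summarize(bulldozer))
```

On these figures the two chips come out level in 256-bit mode, and Bulldozer has twice the lanes in 128-bit mode, which is the "legacy advantage" JF-AMD is describing.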
Reading through some articles about Bulldozer and Sandy Bridge, I can't help but wonder why people aren't giving more credit to the former's FP performance.
sweatshopking wrote:I would say that's a fair comparison, excepting of course that so far, intel has been faster, even with 50% of the cores.
Thorburn wrote:Errrrrm unless I'm missing something here that's just wrong all over the place.
The initial Bulldozer design is 8 'cores', made from 4 modules, so 8-threads, 4 FP units.
The SSE/AVX units are 128-bits in width, with single cycle SSE instructions, two cycles for AVX.
So for a 4 module Bulldozer and a 4-core Sandy Bridge, assuming equal clock speeds, SSE throughput should be equal, but Sandy Bridge will have 2x the AVX throughput.
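Thorburn's comparison can be written out the same way. The sketch below takes his figures at face value (4 FP units on a 4-module Bulldozer, one cycle per SSE instruction, two cycles per AVX instruction) and assumes, as his conclusion implies, that a 4-core Sandy Bridge has one FP unit per core and retires both SSE and AVX instructions in a single cycle. The helper name is hypothetical and equal clock speeds are assumed throughout.

```python
from fractions import Fraction

def ops_per_cycle(fp_units: int, cycles_per_op: int) -> Fraction:
    """Instructions completed per cycle across all FP units."""
    return Fraction(fp_units, cycles_per_op)

# Thorburn's figures: 4 modules -> 4 FP units, SSE in 1 cycle, AVX in 2 cycles.
bulldozer_sse = ops_per_cycle(4, 1)
bulldozer_avx = ops_per_cycle(4, 2)

# Assumption (implied by the post, not stated in it): a 4-core Sandy Bridge
# has one FP unit per core and handles SSE and AVX in 1 cycle each.
sandy_sse = ops_per_cycle(4, 1)
sandy_avx = ops_per_cycle(4, 1)

if __name__ == "__main__":
    print("SSE ratio (SB / BD):", sandy_sse / bulldozer_sse)
    print("AVX ratio (SB / BD):", sandy_avx / bulldozer_avx)
```

With those inputs the SSE ratio is 1 and the AVX ratio is 2, matching the "equal SSE, 2x AVX" conclusion in the post.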
JF-AMD wrote:sweatshopking wrote:I would say that's a fair comparison, excepting of course that so far, intel has been faster, even with 50% of the cores.
Actually a Xeon 5680 is 5-20% slower than an Opteron 6176. Both process 12 threads. Oh, and the AMD is ~20% less expensive.
JF-AMD wrote:Some additional data about Flex FP:
sweatshopking wrote:JF-AMD wrote:Some additional data about Flex FP:
JF, what exactly is your job at AMD?
sweatshopking wrote:omg really? That's hilarious!!
Ok JF, I get what you're doing. Marketing! Also, can I work for you? I am a good worker.