Are the Bulldozer FP capabilities being underestimated?

Posted on Fri Sep 03, 2010 3:28 pm

Reading through some articles about Bulldozer and Sandy Bridge, I can't help but wonder why people aren't giving more credit to the former's FP performance.

I mean, as far as I know, Sandy Bridge has four FP units, all of them AVX capable and therefore 256 bits wide. The articles I read about SB (Sandy Bridge) seem to point out that when processing 128-bit SSE instructions only the lower half of each register is used, meaning that for SSE there are in effect only four 128-bit units.

If I understand correctly, Bulldozer has 8 such units that can combine to form 4 256-bit ones. Although I understand that it might be a fair bit slower than SB when processing AVX instructions, everything else has the potential to be a lot faster...

I am no expert, but it seems a reasonable line of thought. It would be nice if someone could corroborate it (or explain why I am wrong), though...
Phenom II X3 720 (+ Unlocked Core & @ 3.3GHz, talk about value!)
SoulSlave
Gerbil
 
Posts: 23
Joined: Mon Nov 17, 2008 12:57 pm
Location: Brazil. No our capital is not Buenos Aires, and we speak PORTUGUESE!

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Fri Sep 03, 2010 7:41 pm

What is probably being underestimated is not the performance, but instead the flexibility of the FPU.

But, until both products are out on the street, there is too much speculation and not enough fact.
While I work for AMD, my posts are my own opinions.

http://blogs.amd.com/work/author/jfruehe/
JF-AMD
Gerbil
 
Posts: 33
Joined: Wed Dec 09, 2009 11:27 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Sun Sep 05, 2010 12:51 pm

I see. I was in fact wondering because of the way Bulldozer decodes 256-bit SIMD instructions into two smaller instructions (did I get it right?), while it chews through 128-bit instructions directly...

Meaning it would have twice the throughput of SB on 128-bit instructions and half of SB's on 256-bit ones...

As for discussing it, I always thought that forums were for that purpose... I can see the point in discussing things that are already out in the market, but I think it is way cooler to gain a little perspective by discussing new technologies with people who might know more than you...
Phenom II X3 720 (+ Unlocked Core & @ 3.3GHz, talk about value!)
SoulSlave
Gerbil
 
Posts: 23
Joined: Mon Nov 17, 2008 12:57 pm
Location: Brazil. No our capital is not Buenos Aires, and we speak PORTUGUESE!

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Sun Sep 05, 2010 9:02 pm

I heard Bulldozer has 8 pipelines in the FPU. I think this design will do for AMD what Core 2 did for Intel; however, since Sandy Bridge is coming out at around the same time, the gap won't be as big. In any case, if I had 1k to drop in stock, I'd still drop it in AMD.
countcristo
Gerbil
 
Posts: 14
Joined: Mon Jul 26, 2010 2:21 pm

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Sun Sep 12, 2010 6:03 pm

SoulSlave wrote: I see. I was in fact wondering because of the way Bulldozer decodes 256-bit SIMD instructions into two smaller instructions (did I get it right?), while it chews through 128-bit instructions directly...

Meaning it would have twice the throughput of SB on 128-bit instructions and half of SB's on 256-bit ones...

As for discussing it, I always thought that forums were for that purpose... I can see the point in discussing things that are already out in the market, but I think it is way cooler to gain a little perspective by discussing new technologies with people who might know more than you...


Why would it have twice the performance in 128-bit (SSE) instructions? Nehalem already has single-cycle SSE instructions, as did Conroe; previous generations performed them as two 64-bit ops.

The way these instructions work is that a single instruction is performed across multiple pieces of data, e.g. an identical SSE operation can be applied across 4 x 32-bit values, and AVX allows it across 8 x 32-bit values (for example; I don't know the specifics).
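To make the lane counts concrete, here is a minimal Python sketch. It only models the idea (one operation applied to every element of a fixed-width register); the helper names `lanes` and `simd_add` are made up for illustration and are not part of any real SSE or AVX API.

```python
def lanes(register_bits: int, element_bits: int = 32) -> int:
    """Number of elements of the given size that fit in one SIMD register."""
    return register_bits // element_bits


def simd_add(a, b):
    """Model a single SIMD add: the same operation is applied to every lane."""
    if len(a) != len(b):
        raise ValueError("both operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


sse_lanes = lanes(128)  # 128-bit SSE register, 32-bit floats
avx_lanes = lanes(256)  # 256-bit AVX register, 32-bit floats

# One modelled "instruction" per call, regardless of how many lanes it covers.
sse_result = simd_add([1.0] * sse_lanes, [2.0] * sse_lanes)
avx_result = simd_add([1.0] * avx_lanes, [2.0] * avx_lanes)
```

With 32-bit elements, a 128-bit register holds four values and a 256-bit register holds eight, so a full-width AVX instruction covers twice the data of an SSE one.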

So really the throughput in terms of FLOPS for AVX on Bulldozer will be no higher than SSE, as both are performing 128-bit ops per cycle - you might gain a bit from fewer instructions being decoded, but you also might lose a little as you don't always have enough data requiring that single instruction to fully pack the unit.

Sandy Bridge in theory can double its throughput with AVX, 256-bit single cycle vs. 128-bit.
Thorburn
Gerbil
 
Posts: 42
Joined: Tue Mar 13, 2007 7:12 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Sun Oct 10, 2010 7:49 am

Why would it have 1/2 the FLOPs?

Sandy Bridge has 8 FP units. In 256-bit mode it is 8 256-bit. In 128-bit mode it is 8 128-bit.

Bulldozer has 8 Flex FP units. Because it has 16 cores and 16 schedulers, in 256-bit mode it is 8 256-bit units. In 128-bit mode, the FP splits into 2 FMACs that can each operate simultaneously on 2 different threads. In 128-bit software you will have 16 FP units.

Most software will not be recompiled for AVX right off the bat, so assume that there is going to be a lot of legacy advantage for AMD.
While I work for AMD, my posts are my own opinions.

http://blogs.amd.com/work/author/jfruehe/
JF-AMD
Gerbil
 
Posts: 33
Joined: Wed Dec 09, 2009 11:27 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Sun Oct 10, 2010 8:30 am

SoulSlave wrote:I am no expert, but it seems a reasonable line of thought, would be nice if someone could corroborate it (or explain why I am wrong) though...


Your reasoning is correct, except I believe that the first run of BD CPUs will be 4-module units (with 2 cores on each module sharing 1 FPU each).

So you would have an effective 8 128-bit units, but only 4 256-bit executions per cycle. It will match SB on 128 bit operations but have half the throughput on 256 bit (which will admittedly be quite rare for a while).
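The arithmetic in this post can be sketched as follows. This is only a toy model built on the pre-launch figures assumed in the thread (8 units that are 128 bits wide for a 4-module Bulldozer, 8 units that are 256 bits wide for Sandy Bridge); the function name `ops_per_cycle` is hypothetical, and nothing here is a measured result.

```python
def ops_per_cycle(units: int, unit_bits: int, op_bits: int) -> float:
    """Peak vector operations started per cycle in a simple issue model.

    An operation wider than one unit occupies several units at once;
    a narrower operation still occupies a whole unit.
    """
    units_per_op = max(1, op_bits // unit_bits)
    return units / units_per_op


# Assumed figures from the thread, not confirmed specifications.
bulldozer = {"units": 8, "unit_bits": 128}     # 4 modules x 2 128-bit FMACs
sandy_bridge = {"units": 8, "unit_bits": 256}  # as assumed in this post

for width in (128, 256):
    bd = ops_per_cycle(op_bits=width, **bulldozer)
    sb = ops_per_cycle(op_bits=width, **sandy_bridge)
    print(f"{width}-bit ops per cycle: Bulldozer {bd:.0f}, Sandy Bridge {sb:.0f}")
```

Under those assumptions the two designs tie on 128-bit operations, and Bulldozer starts half as many 256-bit operations per cycle, which is the conclusion drawn above.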
Wind, Sand and Stars.
Voldenuit
Minister of Gerbil Affairs
 
Posts: 2454
Joined: Sat Sep 03, 2005 11:10 pm

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Sun Oct 10, 2010 10:47 am

@Voldenuit

That's correct, but AMD will likely be positioning a 16-core MCM Interlagos against the 8-core SB. On the desktop it will be an 8-core Zambezi against a 4-core SB (maybe 6-core; I think 8-core will be an Extreme Edition like Gulftown or not even a 1-socket part). Also remember that the 16-core from AMD and the 8-core from Intel will have similar TDPs.

So, at price parity and in the same power envelope, AMD will offer double the 128-bit throughput and the same 256-bit throughput. Not accounting for IPC-like differences and the fact that BD will clock much higher than SB.
Game_boy
Gerbil Elite
 
Posts: 564
Joined: Mon Aug 06, 2007 12:46 pm

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Tue Oct 12, 2010 8:01 am

MAYBE clock higher. Who knows? I don't. I can speculate, but all AMD has said is that they're getting good speeds. I would expect Intel to get good speeds with Sandy Bridge. SB sounds like it'll be more efficient on integer performance, and close on the FP. I would expect it to be faster core for core. The FP abilities are great, but for your average user, Sandy Bridge should be a better purchase, if it's priced competitively.
sweatshopking
Gerbil Elite
Silver subscriber
 
 
Posts: 685
Joined: Fri Aug 15, 2008 10:37 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Tue Oct 12, 2010 9:05 am

@sweatshopking

No no, Bulldozer is a high-clocking chip by design, like the Pentium 4 was. It has a pipeline designed for it; that's been in a number of official AMD presentations and statements. The tradeoff is of course lower IPC than otherwise and more power consumption, but we don't know enough to know if it'll pay off. It's also designed to require less voltage at 2GHz operation than K8, so it can reach those speeds. It should do around 4GHz on the desktop.

SB will be faster core for core. AMD will sell you double the integer cores though, at the same price (16 v 8, 8 v 4).
Game_boy
Gerbil Elite
 
Posts: 564
Joined: Mon Aug 06, 2007 12:46 pm

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Tue Oct 12, 2010 9:11 am

I want to see your link where AMD says that Bulldozer will be 4GHz. If Sandy Bridge hits 3.4 and is quicker clock for clock, with lower power consumption, then what does double the cores matter? (Athlon II X4 vs i3, anyone? Hint: the i3 is faster, with lower power consumption.)
sweatshopking
Gerbil Elite
Silver subscriber
 
 
Posts: 685
Joined: Fri Aug 15, 2008 10:37 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Tue Oct 12, 2010 11:02 am

AMD didn't say that.

Former AMD employee Mitch Alsup said:

"When I left, BD was supposed to be 20-25% faster frequency wise, and
lose a little architectural figure (5%-ish) of merit due to the
microarchitecture. The surprising thing was the lack of mention of
frequency in the market-droid-ing.

BD was to use a 12-gate pipeline, while Athlon used a 16-gate pipe and
Opteron used a 17-gate pipe. {Add 5 more gates for FLOP, jitter and
Skew to arrive at actual cycle time.} Process shrink is on top of this.
{K9 was to use an 8-gate pipeline.}

Most of what got cut was cut to enable the 12-gate pipe (if indeed
they did achieve that.) In Athlon/Opteron, one can forward a byte,
word, double, or quad from any of the 5 results to any operand of any
6 integer computation units {ALU, AGU}. If BD can't (or couldn't when
I left) forward anything to anywhere, and eats a little AFoM because
of this. This probably saved 2 real gate delays. Lopping off the extra
ALU, and a few other things saves another gate and we are then within
spitting distance (1-gate) of the desired 12-gate pipe in the integer
pipe. More lopping occurred in the L1cache pipe to reach the cycle time
goal. "


Dresdenboy (who follows AMD patents related to Bulldozer and had a fair idea of the uarch's layout before the official presentation a few months ago) said:

"This means a design which is aimed at a 20-30% higher clock frequency compared to K8 with 22 FO4 (same voltage and fab process)." (Note here K8 refers to Phenom II, not Athlon 64)

Note that the BD Mitch was referring to was the cancelled 45nm version, hence why AMD is now claiming higher IPC as opposed to the 5% lower mentioned. The patents (thousands of them) also support this design indirectly.
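As a rough cross-check of the gate counts quoted above, here is a small Python sketch. It assumes, purely for illustration, that clock frequency on the same process and voltage scales inversely with the number of gate delays per cycle; the names `OVERHEAD_GATES`, `cycle_gates` and `frequency_gain` are hypothetical.

```python
OVERHEAD_GATES = 5  # flop, jitter and skew, per the quoted post


def cycle_gates(logic_gates: int) -> int:
    """Total gate delays per clock cycle: pipeline logic plus fixed overhead."""
    return logic_gates + OVERHEAD_GATES


def frequency_gain(old_logic_gates: int, new_logic_gates: int) -> float:
    """Fractional frequency increase, assuming frequency ~ 1 / gate delays per cycle."""
    return cycle_gates(old_logic_gates) / cycle_gates(new_logic_gates) - 1.0


# Logic depths taken from the quote: Athlon 16, Opteron 17, Bulldozer 12.
vs_athlon = frequency_gain(16, 12)   # 21 / 17 - 1
vs_opteron = frequency_gain(17, 12)  # 22 / 17 - 1

print(f"vs Athlon:  {vs_athlon:.1%}")
print(f"vs Opteron: {vs_opteron:.1%}")
```

Under that assumption the 17-gate Opteron pipe plus 5 gates of overhead gives the 22 FO4 figure Dresdenboy mentions, and the two ratios come out at roughly 24% and 29%, which sits inside the 20-30% range both quotes give.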

And AMD is offering double the cores in the SAME power consumption. They said a 16-core Interlagos will have similar TDPs to Magny-Cours (137W). Intel's 4-core SB will be 95W on the server. I imagine 8 Intel cores will be 130W. So AMD is comparable. Actually AMD is very competitive today on the server with 45nm 12-cores against 32nm 6 on price, performance and power consumption. They lose out to 8-cores, but in an 8v16 battle they would be matched even if they used just Phenom II cores.

Remember also that while AMD loses in Athlon II v Core i3, that's because consumer apps use <4 threads and Intel will therefore be faster with HT. In an 8-core Zambezi v 4 core 8 thread SB then AMD would be closer because HT couldn't help consumer apps, and of course they won't be a process node behind next year either so power consumption will improve.
Game_boy
Gerbil Elite
 
Posts: 564
Joined: Mon Aug 06, 2007 12:46 pm

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Wed Oct 13, 2010 6:46 am

JF-AMD wrote:Why would it have 1/2 the FLOPs?

Sandy Bridge has 8 FP units. In 256-bit mode it is 8 256-bit. In 128-bit mode it is 8 128-bit.

Bulldozer has 8 Flex FP units. Because it has 16 cores and 16 schedulers, in 256-bit mode it is 8 256-bit units. In 128-bit mode, the FP splits into 2 FMACs that can each operate simultaneously on 2 different threads. In 128-bit software you will have 16 FP units.

Most software will not be recompiled for AVX right off the bat, so assume that there is going to be a lot of legacy advantage for AMD.


Errrrrm unless I'm missing something here that's just wrong all over the place.

The initial Bulldozer design is 8 'cores', made from 4 modules, so 8-threads, 4 FP units.

The SSE/AVX units are 128-bits in width, with single cycle SSE instructions, two cycles for AVX.

So for a 4 module Bulldozer and a 4-core Sandy Bridge, assuming equal clock speeds, SSE throughput should be equal, but Sandy Bridge will have 2x the AVX throughput.
Thorburn
Gerbil
 
Posts: 42
Joined: Tue Mar 13, 2007 7:12 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Wed Oct 13, 2010 8:59 am

Reading through some articles about Bulldozer and Sandy Bridge, I can't help but wonder why people aren't giving more credit to the former's FP performance.


Plus, the name "Bulldozer" is just so much cooler than "Sandy Bridge"...

Until both products are out on the street, it's all just speculation - some people are underestimating, some are overestimating. Either or both of these could be truly awesome - and either or both could utterly flop due to some unforeseen issue.
cphite
Gerbil Elite
 
Posts: 567
Joined: Thu Apr 29, 2010 9:28 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Wed Oct 13, 2010 11:27 am

SB will be fine. It's an evolution, more so than BD anyway. Anand has already tested early samples and seen that it was fast. BD, nobody knows.
sweatshopking
Gerbil Elite
Silver subscriber
 
 
Posts: 685
Joined: Fri Aug 15, 2008 10:37 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Wed Oct 13, 2010 1:02 pm

Bulldozer will be, like the K8 before it, more of a server CPU, so it will have to be very good at multitasking/multithreaded applications such as server workloads.

So basically it is a low-power, lower-performance type of architecture, but it will try to be fast on a single thread through clock speed (>3GHz).
In multithreaded applications it will do well by multiplying cores, adding more modules to the mix.

I like to compare ATI vs Nvidia with AMD vs Intel.

Intel is like Nvidia: a single monolithic design, very fast and not very efficient. AMD, on the other hand, is concentrating on more of the same units: each one slower, but many of them, and very fast together (like the 4870 X2).
It's going to be more like a Radeon x870 X2: it will have 4 modules (2 cores each), each very efficient, around 80% of a normal dual-core CPU per module. But it will compensate by using more modules where it must be fast.
Now, is AMD Bulldozer the 5870 X2 and Intel SB the GTX 480? Or is it more like 4870 X2 vs GTX 280? :P
bogbox
Gerbil
 
Posts: 47
Joined: Wed Nov 21, 2007 10:36 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Wed Oct 13, 2010 1:34 pm

I would say that's a fair comparison, excepting of course that so far, Intel has been faster, even with 50% of the cores.
sweatshopking
Gerbil Elite
Silver subscriber
 
 
Posts: 685
Joined: Fri Aug 15, 2008 10:37 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 14, 2010 2:43 pm

Winnar!
sweatshopking
Gerbil Elite
Silver subscriber
 
 
Posts: 685
Joined: Fri Aug 15, 2008 10:37 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 21, 2010 5:48 am

sweatshopking wrote: I would say that's a fair comparison, excepting of course that so far, Intel has been faster, even with 50% of the cores.


Actually a Xeon 5680 is 5-20% slower than an Opteron 6176. Both process 12 threads. Oh, and the AMD is ~20% less expensive.
While I work for AMD, my posts are my own opinions.

http://blogs.amd.com/work/author/jfruehe/
JF-AMD
Gerbil
 
Posts: 33
Joined: Wed Dec 09, 2009 11:27 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 21, 2010 5:53 am

Thorburn wrote:Errrrrm unless I'm missing something here that's just wrong all over the place.

The initial Bulldozer design is 8 'cores', made from 4 modules, so 8-threads, 4 FP units.

The SSE/AVX units are 128-bits in width, with single cycle SSE instructions, two cycles for AVX.

So for a 4 module Bulldozer and a 4-core Sandy Bridge, assuming equal clock speeds, SSE throughput should be equal, but Sandy Bridge will have 2x the AVX throughput.



Actually, it is correct. "Interlagos" is a 16-core Bulldozer design. 16 cores, 16 threads.

AVX is handled by combining the 2 128-bit FMACs so you can do a 256-bit dispatch in one cycle.

Also, we have FMACs which can do ADD or MUL. Intel has a dedicated FADD and a dedicated FMUL. So if you have 2 FMULs to do in a row, with AMD you could execute both in the same cycle (one per FMAC), but with Intel you would have to take 2 cycles.
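The claim about back-to-back FMULs can be illustrated with a toy issue model in Python. It assumes independent operations and ignores latency, decode and scheduling limits; the function names are hypothetical and this is not a description of either real pipeline.

```python
from collections import Counter


def cycles_dedicated(ops) -> int:
    """One FADD pipe plus one FMUL pipe: at most one add and one mul start per cycle."""
    counts = Counter(ops)
    return max(counts["add"], counts["mul"])


def cycles_flexible(ops, pipes: int = 2) -> int:
    """Pipes that each accept either an add or a mul: any `pipes` ops start per cycle."""
    return -(-len(ops) // pipes)  # ceiling division


for mix in (["mul", "mul"], ["add", "mul"], ["mul", "mul", "mul", "add"]):
    print(mix, "dedicated:", cycles_dedicated(mix), "flexible:", cycles_flexible(mix))
```

In this model two multiplies take 2 cycles on the dedicated layout and 1 on the flexible one, while a balanced add/mul mix takes 1 cycle on both, so any advantage depends on how lopsided the instruction mix is.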

Also, Intel utilizes integer pipelines to help execute 256-bit AVX instructions. AMD's are all dedicated out of FPU registers and pipelines.
While I work for AMD, my posts are my own opinions.

http://blogs.amd.com/work/author/jfruehe/
JF-AMD
Gerbil
 
Posts: 33
Joined: Wed Dec 09, 2009 11:27 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 21, 2010 11:48 am

JF-AMD wrote:
sweatshopking wrote: I would say that's a fair comparison, excepting of course that so far, Intel has been faster, even with 50% of the cores.


Actually a Xeon 5680 is 5-20% slower than an Opteron 6176. Both process 12 threads. Oh, and the AMD is ~20% less expensive.


Lol, in what benchmark? I'm not sure if those are the 2 newest CPUs that both companies make, but last time I checked, the Xeons trounced the Opterons.
sweatshopking
Gerbil Elite
Silver subscriber
 
 
Posts: 685
Joined: Fri Aug 15, 2008 10:37 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 21, 2010 12:36 pm

@sweatshopking

http://www.anandtech.com/show/2978/amd- ... -core-xeon

I think it's not quite as good as JF says, but the Opterons do not fare badly. They are competitive perf/$ and perf/watt with the Xeons. A much better position than on the desktop.

The exciting thing is that next year in the top-of-the-range servers, Intel does not have an SB part. Only a 10-core Westmere.

If a 12-core Opteron is performing similarly to a 6-core Xeon now, and AMD is promising 50% more average performance, that should mean a 16-core Interlagos will be competitive with Intel's best server chip next year (not SB).
Game_boy
Gerbil Elite
 
Posts: 564
Joined: Mon Aug 06, 2007 12:46 pm

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 21, 2010 2:52 pm

SPECint_rate2006 is ~5% faster for Opteron
SPECfp_rate2006 is ~20% faster for Opteron
While I work for AMD, my posts are my own opinions.

http://blogs.amd.com/work/author/jfruehe/
JF-AMD
Gerbil
 
Posts: 33
Joined: Wed Dec 09, 2009 11:27 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 21, 2010 4:38 pm

JF, while BD looks good for server, I'm worried about the desktop. I've been AMD-ing it since my Athlon 800. I liked the performance for the price of the P2-X4 955 BE that I've got now. But I'm wondering if the direction AMD is going will bring products that compete with SB on the desktop. Servers are great and all and if AMD has to focus there for business reasons then so be it, but it will pain me if I have to move over to an Intel platform. I just don't appreciate the way Intel has strong-armed the market, and would like not to give them any of my money.
flip-mode
Gerbil Khan
Silver subscriber
 
 
Posts: 9101
Joined: Thu May 08, 2003 12:42 pm
Location: Cincinnati, OH

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Fri Oct 22, 2010 5:56 am

I would really wait until both BD and SB are launched before you start worrying. The problem is that there are lots of fanboys running around making proclamations about things, but the reality is that until both products are out there, nobody will really know.

Let the truth guide you on what to buy; don't make decisions today, because the data is incomplete.
While I work for AMD, my posts are my own opinions.

http://blogs.amd.com/work/author/jfruehe/
JF-AMD
Gerbil
 
Posts: 33
Joined: Wed Dec 09, 2009 11:27 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Mon Oct 25, 2010 8:40 am

Some additional data about Flex FP:

http://bit.ly/c4XoRV
While I work for AMD, my posts are my own opinions.

http://blogs.amd.com/work/author/jfruehe/
JF-AMD
Gerbil
 
Posts: 33
Joined: Wed Dec 09, 2009 11:27 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 28, 2010 6:58 am

JF-AMD wrote:Some additional data about Flex FP:

http://bit.ly/c4XoRV


JF, what exactly is your job at AMD?
sweatshopking
Gerbil Elite
Silver subscriber
 
 
Posts: 685
Joined: Fri Aug 15, 2008 10:37 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 28, 2010 7:11 am

sweatshopking wrote:
JF-AMD wrote:Some additional data about Flex FP:

http://bit.ly/c4XoRV


JF, what exactly is your job at AMD?


John Fruehe is the Director of Product Marketing for Server/Workstation products at AMD.
Intel i5 4670K @ 4.8GHZ|ATI Radeon HD 7970 (Stock)| 12 GB RAM (Stock)| Xtreme Music with G500 5.1 | Panasonic "TH-L42E60" @ 120 HZ.
Jigar
Maximum Gerbil
Silver subscriber
 
 
Posts: 4612
Joined: Tue Mar 07, 2006 4:00 pm

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 28, 2010 9:56 am

omg really? That's hilarious!!

Ok JF, I get what you're doing. Marketing! also, can I work for you? I am a good worker.
sweatshopking
Gerbil Elite
Silver subscriber
 
 
Posts: 685
Joined: Fri Aug 15, 2008 10:37 am

Re: Are the Bulldozer FP capabilities being underestimated?

Posted on Thu Oct 28, 2010 10:08 am

sweatshopking wrote:omg really? That's hilarious!!

Ok JF, I get what you're doing. Marketing! also, can I work for you? I am a good worker.


Hey, I am a marketing manager as well, I can hire you :wink:

BTW, no offense to you JF, sweatshopking is just like that, please bear with him. :)
Intel i5 4670K @ 4.8GHZ|ATI Radeon HD 7970 (Stock)| 12 GB RAM (Stock)| Xtreme Music with G500 5.1 | Panasonic "TH-L42E60" @ 120 HZ.
Jigar
Maximum Gerbil
Silver subscriber
 
 
Posts: 4612
Joined: Tue Mar 07, 2006 4:00 pm
