Personal computing discussed
blastdoor wrote: I'm remembering, with the help of the google, that Itanium started out 6-wide and ultimately reached 12-wide.
Apple's latest big core is 8-wide.
I wonder -- how are they making such a wide architecture work? Certainly they have a huge instruction cache and other OOO capabilities. But I wonder -- is their compiler helping out at all?
Buub wrote: blastdoor wrote: I'm remembering, with the help of the google, that Itanium started out 6-wide and ultimately reached 12-wide.
Apple's latest big core is 8-wide.
I wonder -- how are they making such a wide architecture work? Certainly they have a huge instruction cache and other OOO capabilities. But I wonder -- is their compiler helping out at all?
"Apples" and Oranges (not sorry...)
Itanium was a complete departure from common architectures and, some argued, entirely the wrong approach to general-purpose computing. VLIW in Itanium required the compiler to do all the scheduling deduction up front, whereas modern superscalar, speculative-execution architectures with dynamic scheduling do a lot of optimization on the fly. While in some ways VLIW was simpler, in other ways it was substantially more complex. IMHO, Itanium failed because Intel was trying to force a square peg into a round hole just to be different (i.e. to break the x86 duopoly and create a brand-new monopoly which they controlled).
ARM, and Apple's adaptation of it, on the other hand, is an extension of modern architectures in ways that have been proven to work in general-purpose computing devices.
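(A toy sketch of the static-vs-dynamic scheduling contrast described above. This is a simplified model with made-up latencies, not any real ISA: the compiler-scheduled VLIW side has to budget for the worst-case load latency because it cannot see run-time cache behavior, while the hardware-scheduled OOO side issues each instruction as soon as its operands are actually ready.)

```python
# Toy model: each instruction is (name, dependencies, actual latency at run time).
program = [
    ("load_a", [], 1),                    # cache hit at run time: 1 cycle
    ("load_b", [], 1),
    ("add",    ["load_a", "load_b"], 1),
    ("store",  ["add"], 1),
]

WORST_CASE_LOAD = 10  # the compiler must assume a load might miss the cache

def static_schedule(prog):
    """VLIW-style: issue cycles fixed up front using worst-case latencies."""
    finish = {}
    for name, deps, _actual in prog:
        start = max((finish[d] for d in deps), default=0)
        lat = WORST_CASE_LOAD if name.startswith("load") else 1
        finish[name] = start + lat
    return max(finish.values())

def dynamic_schedule(prog):
    """OOO-style: each instruction starts when its operands are ready,
    using the latency that actually occurred."""
    finish = {}
    for name, deps, lat in prog:
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + lat
    return max(finish.values())

print("static (compiler, worst case):", static_schedule(program), "cycles")  # 12
print("dynamic (hardware, actual):", dynamic_schedule(program), "cycles")    # 3
```

The gap between the two numbers is the whole argument in miniature: dynamic scheduling recovers the cycles that a static schedule must pessimistically reserve.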
blastdoor wrote: Buub wrote: blastdoor wrote: I'm remembering, with the help of the google, that Itanium started out 6-wide and ultimately reached 12-wide.
Apple's latest big core is 8-wide.
I wonder -- how are they making such a wide architecture work? Certainly they have a huge instruction cache and other OOO capabilities. But I wonder -- is their compiler helping out at all?
"Apples" and Oranges (not sorry...)
Itanium was a complete departure from common architectures and, some argued, entirely the wrong approach to general-purpose computing. VLIW in Itanium required the compiler to do all the scheduling deduction up front, whereas modern superscalar, speculative-execution architectures with dynamic scheduling do a lot of optimization on the fly. While in some ways VLIW was simpler, in other ways it was substantially more complex. IMHO, Itanium failed because Intel was trying to force a square peg into a round hole just to be different (i.e. to break the x86 duopoly and create a brand-new monopoly which they controlled).
ARM, and Apple's adaptation of it, on the other hand, is an extension of modern architectures in ways that have been proven to work in general-purpose computing devices.
Yeah, I know. But still, Apple's design is super wide. I'm not in any way meaning to imply it's VLIW -- I know it's not. But compilers are important, and Apple writes their own (along with their own language). I'm just wondering if there's some portion of the performance of this super-wide design that depends on a particularly good compiler.
tfp wrote: I think he means the integer pipeline can issue 8 instructions at once, i.e. it is 8-wide, not the number of cores on the CPU.
Buub wrote: tfp wrote: I think he means the integer pipeline can issue 8 instructions at once, i.e. it is 8-wide, not the number of cores on the CPU.
My apologies if I misunderstood, because that is a completely different architectural discussion, agreed!
A single Firestorm core achieves memory reads of up to around 58 GB/s, with memory writes coming in at 33-36 GB/s. Most importantly, memory copies land at 60 to 62 GB/s, depending on whether you're using scalar or vector instructions. The fact that a single Firestorm core can almost saturate the memory controllers is astounding and something we've never seen in a design before.
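(For context on those figures, here is a back-of-the-envelope check. The memory specs are my assumption, not stated in the quote: reviews describe the M1's memory as LPDDR4X-4266 on a 128-bit bus.)

```python
# Theoretical peak bandwidth = transfer rate x bus width in bytes.
transfer_rate_mt_s = 4266               # LPDDR4X-4266: mega-transfers/second (assumed spec)
bus_width_bits = 128                    # assumed M1 bus width
peak_gb_s = transfer_rate_mt_s * (bus_width_bits / 8) / 1000
print(f"theoretical peak: {peak_gb_s:.2f} GB/s")        # ~68.26 GB/s

single_core_read_gb_s = 58              # single-core read figure quoted above
print(f"one core uses {single_core_read_gb_s / peak_gb_s:.0%} of peak")  # ~85%
```

Under those assumptions, a single core pulling 58 GB/s really is using roughly 85% of the whole chip's theoretical memory bandwidth, which is what makes the quoted result remarkable.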
blastdoor wrote: 4. The GPU is great for integrated graphics, but seriously lags behind discrete GPUs (no surprise, but still).
Flying Fox wrote: blastdoor wrote: 4. The GPU is great for integrated graphics, but seriously lags behind discrete GPUs (no surprise, but still).
They already beat older-gen discrete GPUs, albeit low-end ones?
tfp wrote: Isn't the M1 for Apple's low-end, or maybe lower-performance, laptops? They could increase the bus width to 256 bits for the higher end, assuming support in the processor. At some point cost is a factor, and a wider bus means more channels and more cost on the RAM side. That said, they have so much cache on chip/package that it has to mitigate some of the RAM needs.
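(A quick sketch of tfp's bus-width point, using the same assumed LPDDR4X-4266 memory: doubling the bus width doubles both the channel count and the theoretical peak bandwidth.)

```python
rate_mt_s = 4266                         # LPDDR4X-4266 transfer rate (assumed spec)
for bus_bits in (128, 256):
    channels = bus_bits // 16            # LPDDR4X channels are 16 bits wide
    peak_gb_s = rate_mt_s * bus_bits / 8 / 1000
    print(f"{bus_bits}-bit bus: {channels} channels, ~{peak_gb_s:.1f} GB/s peak")
```

That is where the cost trade-off comes from: the extra bandwidth isn't free, since a 256-bit bus means twice as many 16-bit channels and the RAM packages to feed them.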
Buub wrote: It seems to me that they're going to have to decouple the RAM from the CPU package to attain "Pro" levels of memory density. There is simply no way they're going to solder 256 GB of RAM onto a CPU package with current RAM densities.
That may not mean DIMMs, but I still can't see a Mac Pro buyer being OK with soldered-down RAM, especially a 1/4 TB of it.
It seems a phenomenal development for smaller machines with exceptional performance. But it just doesn't scale to bigger stuff.
blastdoor wrote: How about replacing a Mac Pro with 1 TB of RAM with an SoC that has 128 GB of HBM?
Waco wrote: blastdoor wrote: How about replacing a Mac Pro with 1 TB of RAM with an SoC that has 128 GB of HBM?
I think the size and spot price of HBM stacks will make that a pipe dream. Plus, you'd need at minimum (assuming the earlier bandwidth numbers are correct) 20 of the fat cores to drive it, assuming perfect scaling.
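(The "20 fat cores" figure can be reproduced with simple arithmetic from the per-core copy bandwidth quoted earlier. The 1,200 GB/s aggregate HBM figure below is my hypothetical configuration chosen to match the estimate, not anything Waco specified.)

```python
per_core_copy_gb_s = 60        # single-Firestorm copy figure quoted earlier
hbm_bandwidth_gb_s = 1200      # hypothetical multi-stack HBM aggregate bandwidth
cores_to_saturate = hbm_bandwidth_gb_s / per_core_copy_gb_s
print(cores_to_saturate, "cores, assuming perfect scaling")  # 20.0
```

In other words, feeding an HBM-class memory system would take an order of magnitude more big cores than the M1 has, which is Waco's scaling objection in numbers.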
Buub wrote: This will allow high densities and user-upgradability with industry-standard parts.
K-L-Waster wrote: Is there any evidence Apple wants users to be able to upgrade with industry-standard parts? Their business model these days seems to be to make everything proprietary and have enforced obsolescence.
K-L-Waster wrote: Buub wrote: This will allow high densities and user-upgradability with industry-standard parts.
Is there any evidence Apple wants users to be able to upgrade with industry-standard parts? Their business model these days seems to be to make everything proprietary and have enforced obsolescence.
Why allow users to upgrade using standard DIMMs when you can solder memory directly onto the motherboard or SoC, charge a 90% premium, then require the user to replace the entire system in 3 years when their existing system is "no longer compatible" with the latest version of the OS? (Why wouldn't it be compatible? Don't worry, I'm sure there will be a reason...)