Personal computing discussed
blastdoor wrote: I'm remembering, with the help of the google, that Itanium started out 6-wide and ultimately reached 12-wide.
Apple's latest big core is 8-wide.
I wonder -- how are they making such a wide architecture work? Certainly they have a huge instruction cache and other OOO capabilities. But I wonder -- is their compiler helping out at all?
Buub wrote: blastdoor wrote: I'm remembering, with the help of the google, that Itanium started out 6-wide and ultimately reached 12-wide.
Apple's latest big core is 8-wide.
I wonder -- how are they making such a wide architecture work? Certainly they have a huge instruction cache and other OOO capabilities. But I wonder -- is their compiler helping out at all?
"Apples" and Oranges (not sorry...)
Itanium was a complete departure from common architectures and, some argued, entirely the wrong approach to general-purpose computing. VLIW in Itanium required the compiler to do all the scheduling deduction up front, whereas modern superscalar, speculative-execution architectures with dynamic scheduling do a lot of optimization on the fly. While in some ways VLIW was simpler, in other ways it was substantially more complex. IMHO, Itanium failed because Intel was trying to force a square peg into a round hole just to be different (i.e. to break the x86 duopoly and create a brand-new monopoly which they controlled).
ARM, and Apple's adaptation of it, on the other hand, is an extension of modern architectures in ways that have been proven to work in general-purpose computing devices.
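(A toy sketch of the static-vs-dynamic scheduling contrast described above. This is a simplified model with made-up latencies, not any real ISA: the compiler-scheduled VLIW side has to budget for the worst-case load latency because it cannot see run-time cache behavior, while the hardware-scheduled OOO side issues each instruction as soon as its operands are actually ready.)

```python
# Toy model: each instruction is (name, dependencies, actual latency at run time).
program = [
    ("load_a", [], 1),                    # cache hit at run time: 1 cycle
    ("load_b", [], 1),
    ("add",    ["load_a", "load_b"], 1),
    ("store",  ["add"], 1),
]

WORST_CASE_LOAD = 10  # the compiler must assume a load might miss the cache

def static_schedule(prog):
    """VLIW-style: issue cycles fixed up front using worst-case latencies."""
    finish = {}
    for name, deps, _actual in prog:
        start = max((finish[d] for d in deps), default=0)
        lat = WORST_CASE_LOAD if name.startswith("load") else 1
        finish[name] = start + lat
    return max(finish.values())

def dynamic_schedule(prog):
    """OOO-style: each instruction starts when its operands are ready,
    using the latency that actually occurred."""
    finish = {}
    for name, deps, lat in prog:
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + lat
    return max(finish.values())

print("static (compiler, worst case):", static_schedule(program), "cycles")  # 12
print("dynamic (hardware, actual):", dynamic_schedule(program), "cycles")    # 3
```

The gap between the two numbers is the whole argument in miniature: dynamic scheduling recovers the cycles that a static schedule must pessimistically reserve.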
blastdoor wrote: Buub wrote: blastdoor wrote: I'm remembering, with the help of the google, that Itanium started out 6-wide and ultimately reached 12-wide.
Apple's latest big core is 8-wide.
I wonder -- how are they making such a wide architecture work? Certainly they have a huge instruction cache and other OOO capabilities. But I wonder -- is their compiler helping out at all?
"Apples" and Oranges (not sorry...)
Itanium was a complete departure from common architectures and, some argued, entirely the wrong approach to general-purpose computing. VLIW in Itanium required the compiler to do all the scheduling deduction up front, whereas modern superscalar, speculative-execution architectures with dynamic scheduling do a lot of optimization on the fly. While in some ways VLIW was simpler, in other ways it was substantially more complex. IMHO, Itanium failed because Intel was trying to force a square peg into a round hole just to be different (i.e. to break the x86 duopoly and create a brand-new monopoly which they controlled).
ARM, and Apple's adaptation of it, on the other hand, is an extension of modern architectures in ways that have been proven to work in general-purpose computing devices.
Yeah, I know. But still, Apple's design is super wide. I'm not in any way meaning to imply it's VLIW -- I know it's not. But compilers are important, and Apple writes their own (along with their own language). I'm just wondering if there's some portion of the performance of this super-wide design that depends on a particularly good compiler.
tfp wrote: I think he means the integer pipeline can issue 8 instructions at once, i.e. it is 8-wide, not the number of cores on the CPU.
Buub wrote: tfp wrote: I think he means the integer pipeline can issue 8 instructions at once, i.e. it is 8-wide, not the number of cores on the CPU.
My apologies if I misunderstood, because that is a completely different architectural discussion, agreed!
A single Firestorm core achieves memory reads of up to around 58 GB/s, with memory writes coming in at 33-36 GB/s. Most importantly, memory copies land at 60 to 62 GB/s, depending on whether you're using scalar or vector instructions. The fact that a single Firestorm core can almost saturate the memory controllers is astounding and something we've never seen in a design before.
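(For context on those figures, here is a back-of-the-envelope check. The memory specs are my assumption, not stated in the quote: reviews describe the M1's memory as LPDDR4X-4266 on a 128-bit bus.)

```python
# Theoretical peak bandwidth = transfer rate x bus width in bytes.
transfer_rate_mt_s = 4266               # LPDDR4X-4266: mega-transfers/second (assumed spec)
bus_width_bits = 128                    # assumed M1 bus width
peak_gb_s = transfer_rate_mt_s * (bus_width_bits / 8) / 1000
print(f"theoretical peak: {peak_gb_s:.2f} GB/s")        # ~68.26 GB/s

single_core_read_gb_s = 58              # single-core read figure quoted above
print(f"one core uses {single_core_read_gb_s / peak_gb_s:.0%} of peak")  # ~85%
```

Under those assumptions, a single core pulling 58 GB/s really is using roughly 85% of the whole chip's theoretical memory bandwidth, which is what makes the quoted result remarkable.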
blastdoor wrote: 4. The GPU is great for integrated graphics, but seriously lags behind discrete GPUs (no surprise, but still).
Flying Fox wrote: blastdoor wrote: 4. The GPU is great for integrated graphics, but seriously lags behind discrete GPUs (no surprise, but still).
They already beat older-gen discrete GPUs, albeit low-end ones?
tfp wrote: Isn't the M1 for Apple's low-end, or maybe lower-performance, laptops? They could increase the bus width to 256 bits for the higher end, assuming support in the processor. At some point cost is a factor, and a wider bus means more channels and more cost on the RAM side. That said, they have so much cache on chip/package that it has to mitigate some of the RAM needs.
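(A quick sketch of tfp's bus-width point, using the same assumed LPDDR4X-4266 memory: doubling the bus width doubles both the channel count and the theoretical peak bandwidth.)

```python
rate_mt_s = 4266                         # LPDDR4X-4266 transfer rate (assumed spec)
for bus_bits in (128, 256):
    channels = bus_bits // 16            # LPDDR4X channels are 16 bits wide
    peak_gb_s = rate_mt_s * bus_bits / 8 / 1000
    print(f"{bus_bits}-bit bus: {channels} channels, ~{peak_gb_s:.1f} GB/s peak")
```

That is where the cost trade-off comes from: the extra bandwidth isn't free, since a 256-bit bus means twice as many 16-bit channels and the RAM packages to feed them.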
Buub wrote: It seems to me that they're going to have to decouple the RAM from the CPU package to attain "Pro" levels of memory density. There is simply no way they're going to solder 256 GB of RAM onto a CPU package with current RAM densities.
That may not mean DIMMs, but I still can't see a Mac Pro buyer being OK with soldered-down RAM, especially a 1/4 TB of it.
It seems a phenomenal development for smaller machines with exceptional performance. But it just doesn't scale to bigger stuff.
blastdoor wrote: How about replacing a Mac Pro with 1 TB of RAM with an SoC that has 128 GB of HBM?
Waco wrote: blastdoor wrote: How about replacing a Mac Pro with 1 TB of RAM with an SoC that has 128 GB of HBM?
I think the size and spot price of HBM stacks will make that a pipe dream. Plus, you'd need at minimum (assuming the earlier bandwidth numbers are correct) 20 of the fat cores to drive it, assuming perfect scaling.
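(The "20 fat cores" figure can be reproduced with simple arithmetic from the per-core copy bandwidth quoted earlier. The 1,200 GB/s aggregate HBM figure below is my hypothetical configuration chosen to match the estimate, not anything Waco specified.)

```python
per_core_copy_gb_s = 60        # single-Firestorm copy figure quoted earlier
hbm_bandwidth_gb_s = 1200      # hypothetical multi-stack HBM aggregate bandwidth
cores_to_saturate = hbm_bandwidth_gb_s / per_core_copy_gb_s
print(cores_to_saturate, "cores, assuming perfect scaling")  # 20.0
```

In other words, feeding an HBM-class memory system would take an order of magnitude more big cores than the M1 has, which is Waco's scaling objection in numbers.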
Buub wrote: This will allow high densities and user-upgradability with industry-standard parts.
K-L-Waster wrote: Is there any evidence Apple wants users to be able to upgrade with industry-standard parts? Their business model these days seems to be to make everything proprietary and have enforced obsolescence.
K-L-Waster wrote: Buub wrote: This will allow high densities and user-upgradability with industry-standard parts.
Is there any evidence Apple wants users to be able to upgrade with industry-standard parts? Their business model these days seems to be to make everything proprietary and have enforced obsolescence.
Why allow users to upgrade using standard DIMMs when you can solder memory directly onto the motherboard or SoC, charge a 90% premium, then require the user to replace the entire system in 3 years when their existing system is "no longer compatible" with the latest version of the OS? (Why wouldn't it be compatible? Don't worry, I'm sure there will be a reason...)