the wrote: There are several reasons why Atom lost in the ultra low power market and this is indeed one of them. The comparison here is that there are ARM instructions that fly through their decoders: there doesn't need to be a difference between the micro-op and the incoming instruction. For ARM, decoders are still necessary to handle instructions that are implemented via microcode. ARM being RISC didn't inherently help on the performance side, but it certainly helped keep power consumption in check, which let it dominate in that area.
No, the biggest reason was that they were paired with chipsets (which were on ancient process nodes as normal, with no real power management, as normal, etc... all of which was typical and largely fine for desktops but a huge problem for mobile) and other things that literally drew more power than the processors.
Mobile and ultra-low power aren't just the CPU, they are -EVERYTHING-. You *HAVE* to focus on the entire design, from the display, to the support chips, to the support discrete electronics, to the software stack at ALL levels.
Apple does that, they do it extremely well, and they have been doing it for a very, very long time.
Intel, well, they didn't even do that for their own support chips, at least initially, and they never really could for the whole device, and they didn't have much to do with software either.
Apple, again, controls ALL of that.
^ That is the biggest reason, by FAR.
I mean, the first Atom desktops had like <5w CPUs and like ~20w northbridges and crazy stuff like that.
the wrote: The Core series does have one major difference from Atom in this area: a decoded instruction cache. This small cache permits the system to bypass the decoder logic to increase performance and decrease power consumption. The Core series also has the benefit of doing the reverse: it can combine two instructions into a fused micro-op.
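The quoted mechanism (skip the decoders on a uop-cache hit) can be sketched as a toy model. All costs and the cache behavior here are made-up illustrative assumptions, not any real microarchitecture:

```python
# Toy front-end model: decoding is expensive, reading pre-decoded uops is cheap.
# Cost units are arbitrary; a real uop cache is limited in size and organized
# by fetch window, which this sketch deliberately ignores.

DECODE_COST = 4      # assumed cost of a full decode
UOP_CACHE_COST = 1   # assumed cost of a uop-cache hit

def run(trace, use_uop_cache):
    """Total front-end cost for a stream of instruction addresses."""
    uop_cache = set()
    cost = 0
    for addr in trace:
        if use_uop_cache and addr in uop_cache:
            cost += UOP_CACHE_COST      # bypass the decoders entirely
        else:
            cost += DECODE_COST         # pay for a full decode
            if use_uop_cache:
                uop_cache.add(addr)     # cache the decoded uops
    return cost

# A tight loop re-executes the same few instructions over and over,
# so almost every decode becomes a uop-cache hit.
loop = [0x10, 0x14, 0x18, 0x1c] * 1000
print(run(loop, use_uop_cache=False))   # 16000
print(run(loop, use_uop_cache=True))    # 4012
```

The point of the sketch: for loopy code the decoders sit idle most of the time, which is exactly where the power win comes from.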
Uh, there are a lot of differences....? They are completely different implementations of the ISA.
Which is sort of the point I (and JBI) have been trying to pound into your skull for years: to a large extent, the ISA is just window dressing. What's actually going on in the shop is largely what matters.
*yes, yes, the window dressing has effects. But, as we also keep trying to tell you, the irregularity/complexity of the instruction format for x86 isn't just a drawback. CISC was designed like that to save memory, and by saving memory you save bandwidth and trips to memory.
That has significant power implications in the present day. You routinely ignore this to rail about how x86 MUST BE DECODED!!!!!!!
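The density argument above can be illustrated with a toy byte count. The per-instruction lengths below are made up for illustration; the only real constraint reflected is that x86 instructions are variable-length (1 to 15 bytes) while classic RISC encodings are a fixed 4 bytes:

```python
# Toy code-density comparison: same hypothetical instruction sequence,
# variable-length (CISC-style) encoding vs. fixed 4-byte (RISC-style) encoding.

cisc_lengths = [1, 2, 3, 2, 1, 5, 2, 3]   # assumed byte counts per instruction
risc_lengths = [4] * len(cisc_lengths)    # classic RISC: always 4 bytes

cisc_bytes = sum(cisc_lengths)
risc_bytes = sum(risc_lengths)

print(cisc_bytes, risc_bytes)   # 19 32
# Fewer code bytes fetched means more code fits in the I-cache and less
# memory bandwidth is burned on instruction fetch -- and DRAM/bus traffic
# costs power, not just time.
```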
Look, I know you're obsessed with *JUST* all this CPU blather of yours, but CPUs just aren't the only thing that matters when it comes to the power draw of a device. These days, they increasingly aren't even the most important part: when Linux finally mainlined the power "package" SATA <-> power-state stuff in 4.15, that was a 25% power savings on idle.
https://patchwork.kernel.org/patch/9952739/

NOT. JUST. THE. CPU. DECODE. LOGIC.
ARGGHHHHHHHHHHHHH
the wrote: I would differ in that the execution units in the backend are relatively simple compared to the OoO logic and dispatching that is done in the front end. The front end is all about keeping those execution units busy by figuring out the optimal order in which operations are performed. This also includes breaking up a single instruction into several micro-ops that are executed by the backend.
The point is that these things are all integrally wrapped together, and the decode logic is essentially noise as JBI discussed.
The decode logic in the Pentium Pro, a nearly QUARTER OF A CENTURY OLD PROCESSOR, was 40% of the die. That was on an 800nm process (check my zeros, I think that's right; it was expressed in microns[!!!] back then)!
https://arstechnica.com/features/2004/07/pentium-1/7/

By the time of the Pentium 4, it was "well under 10%":
https://arstechnica.com/features/2004/07/pentium-1/2/

^ That article, itself, is nearly 15 years old!
This is not an issue unless you are in very, very low power designs. Basically sub-watt. We've talked about this before, and you just insist you are right and I'm wrong, except that I continually cite all sorts of evidence and research and you do nothing but reiterate your own personal feelings.
Well, I don't really care about your personal feelings, and since I've caught you making significant factual mistakes in these sorts of claims before, I really just don't think you're the expert you seem to think you are.
the wrote: Intel's decoders are not static. One factor is necessity: Intel is continually adding new instructions, so the decoders have to adapt to those changes. However, Intel is still making more changes. Previously mentioned were instruction fusion and the decoded micro-op cache. Decoders also have to adapt to the execution hardware (e.g. one 256-bit AVX instruction is broken into two 128-bit wide micro-ops on the AMD side). There has at least been research into fusing multiple 128-bit operations into larger 256-bit or 512-bit operations for execution as well, though I haven't heard of this actually being implemented in hardware.
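As an aside, the "cracking" described in the quote reduces to a width calculation. A minimal sketch (widths only, ignoring scheduling and register renaming; the 128-bit datapath is the assumption, matching early AMD AVX implementations):

```python
# Toy sketch of cracking a wide vector instruction into narrower micro-ops
# when the execution datapath is narrower than the architectural operation.

def crack(op_width_bits, datapath_bits=128):
    """Number of micro-ops needed to execute one vector op on the datapath."""
    return max(1, op_width_bits // datapath_bits)

print(crack(256))   # 2  (one 256-bit AVX op -> two 128-bit uops)
print(crack(512))   # 4
print(crack(128))   # 1
```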
That's not even remotely what JBI said or implied.
He was pointing out that they've been doing it for well over 25 years (the original Pentium had decoding then; ARM has decoding now; this isn't as ridiculously Manichean as you propose). So, yes, it's a solved problem, because despite people saying that x86 was a dead-end in the late 80s and early 90s, it's now basically the ubiquitous architecture for non-mobile computers. Die? Dude, it murdered virtually everything else. All those competitors, plenty of which were RISC, are now either on life-support or hiding in the hills in extreme niche applications.
ARM was originally a desktop processor. Apple basically bought into it, and focused everything about that ecosystem towards mobile. It wasn't because ARM was magically better; the only part of "ARM" that really contributed, sui generis, was the simplicity (for cost reasons primarily) and its lack, at the time, of various cruft.
Now, of course, there are all sorts of cruft and other complexities. We have profiles for the architectures, semi-native bytecode execution, "Thumb" instructions, etc...