Early benches of recent ARM server CPUs show serious promise

There's been a lot of talk in recent months about ARM-based server CPUs in one form or another. While it's been a reasonable assumption that said CPUs would offer impressive performance-per-watt ratios, the big question mark in everyone's mind is probably "how fast are they in absolute terms?" Answers to that question have now started to form thanks to a handful of recently released benchmarks from academia and industry.

Cloudflare is a name that may be recognized by a handful of gerbils. The company is primarily known for its CDN (Content Delivery Network) services, and I'm willing to bet you've noticed at least once that you were downloading something from a domain ending in cloudflare.com. Vlad Krasnov, one of the company's engineers, recently shared a number of benchmarks comparing an engineering-sample server fitted with a 46-core Qualcomm Centriq SoC clocked at 2.5 GHz against a dual-socket Broadwell Xeon E5-2630 v4 system at 2.2 GHz (with a 3.1-GHz turbo) and a dual-socket Xeon Silver 4116 system at 2.1 GHz (with a 3-GHz turbo).

Maximum processor TDP for the Intel systems is 170W, while the Centriq system is content with 120W. It's worth pointing out that the Intel Skylake Server processors in the test weren't the highest-end Platinum models, but as Krasnov remarks, those chips have TDPs as high as 200W, and Cloudflare is primarily concerned with performance-per-watt, hence the Xeon Silver chips.

Source: Cloudflare blog

Cloudflare's software stack relies on cryptography and compression functionality in multiple languages to run, as well as good ol' web serving. The company relies on the Lua and Go languages for many of its needs. Krasnov goes on to point out that some of the software used isn't yet fully optimized (if at all) for the ARM architecture, but the results are impressive enough as-is.

Source: Cloudflare blog

In public-key cryptography (OpenSSL), Falkor scores a good win, although it falters in the symmetric-key cryptography tests, likely thanks to its narrower SIMD units compared to the Intel competition. When it comes to gzip and brotli compression, Falkor's single-core performance trails the Intel systems, but the chip comes into its own and proves itself superior in multi-core scenarios, especially considering that Cloudflare apparently doesn't use brotli's highest compression levels.
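Cloudflare's post doesn't reproduce its exact harness, but a minimal Go sketch of the kind of compression micro-benchmark involved might look like the following (the payload and compression level here are illustrative assumptions, not Cloudflare's actual test data):

```go
package bench

import (
	"bytes"
	"compress/gzip"
	"testing"
)

// payload stands in for a typical HTML response body; the real tests
// compress actual web assets at several compression levels.
var payload = bytes.Repeat([]byte("<div class=\"row\">hello, gerbils</div>\n"), 4096)

func BenchmarkGzipLevel6(b *testing.B) {
	b.SetBytes(int64(len(payload)))
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		w, err := gzip.NewWriterLevel(&buf, 6) // gzip's default level
		if err != nil {
			b.Fatal(err)
		}
		if _, err := w.Write(payload); err != nil {
			b.Fatal(err)
		}
		if err := w.Close(); err != nil {
			b.Fatal(err)
		}
	}
}
```

Running it with `go test -bench . -cpu 1,46` contrasts single-core throughput against a fully loaded socket, which is roughly the single-core-versus-all-cores split in Cloudflare's charts.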

Falkor doesn't do too well in Cloudflare's Go cryptography, regular-expression, compression, and string-handling tests. Krasnov believes this is simply because the language and its libraries aren't really optimized for ARM at all yet. In the Lua tests, however, Falkor proves itself "competitive," offering roughly similar performance to the Intel machines. That performance comes with impressive power efficiency. In the final Nginx web-server test, the single-socket Centriq system ends the party with a bang, serving an average of 214 requests per consumed watt, versus 99 for Skylake and 77 for Broadwell.
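For a sense of what those Go tests stress, consider ECDSA signing, one of the hot paths in a TLS stack. A hypothetical micro-benchmark like the one below leans entirely on Go's standard library, which at the time shipped hand-tuned amd64 assembly for P-256 but only generic code on arm64, so a gap like the one Krasnov measured can appear without any hardware deficit at all:

```go
package bench

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"testing"
)

func BenchmarkECDSASignP256(b *testing.B) {
	// On amd64, elliptic.P256() dispatches to optimized assembly;
	// other architectures fall back to portable big.Int math.
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		b.Fatal(err)
	}
	digest := sha256.Sum256([]byte("hello, gerbils"))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, _, err := ecdsa.Sign(rand.Reader, key, digest[:]); err != nil {
			b.Fatal(err)
		}
	}
}
```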

Cloudflare isn't the only outfit offering a sneak peek at benchmarks for ARM servers. The GW4 Alliance, an academic consortium of four British universities, is readying Isambard, an ARM-based XC50-series supercomputer manufactured by Cray, packing 10,000 ARM CPU cores from Cavium ThunderX2 processors, with two 32-core processors at over 2 GHz per node. Simon McIntosh-Smith from the University of Bristol compared a single-socket "early access" Cavium ThunderX2 system with 32 cores at 2.5 GHz to a 22-core, 2.1-GHz Xeon Gold 6152 system and an 18-core Xeon E5-2695 v4 server at 2.1 GHz. Neither Cavium nor the GW4 provides TDPs for the ThunderX2 CPU, so it's not possible to compare the systems on that metric.

Source: GW4 Alliance

The ThunderX2-based system scores wins nearly across the board, though it's worth pointing out that the Intel systems in question were running at a clock-speed disadvantage. What's especially impressive is that even with Intel's optimizing compiler used where possible on the Xeon systems, the ThunderX2 system was able to eke out an advantage. McIntosh-Smith believes that applications bound by memory bandwidth benefit from Cavium's greater number of memory channels (eight, versus Skylake's six and Broadwell's four), while applications that lean on raw compute may find the ThunderX2 CPU at a disadvantage. These results were shown at the SC17 conference, and you can check out more information in the slides here.
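Many of the mini-apps in those slides are bandwidth-bound, STREAM-style kernels. A minimal triad loop in Go (array sizes are illustrative, and real STREAM runs one loop per core) shows why memory channels rather than ALUs set the ceiling for this class of code:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// 512 MB per array: far too large for any cache, so every
	// iteration has to go out to DRAM.
	const n = 1 << 26
	a := make([]float64, n)
	b := make([]float64, n)
	c := make([]float64, n)
	for i := range b {
		b[i], c[i] = 1.0, 2.0
	}

	const scalar = 3.0
	start := time.Now()
	for i := 0; i < n; i++ {
		a[i] = b[i] + scalar*c[i] // STREAM "triad": two loads, one store
	}
	elapsed := time.Since(start)

	// 24 bytes of DRAM traffic per element (read b, read c, write a).
	// More memory channels raise this figure long before the FPU
	// becomes the limit.
	gbps := float64(24*n) / elapsed.Seconds() / 1e9
	fmt.Printf("triad: %.1f GB/s (a[0]=%v)\n", gbps, a[0])
}
```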

It's worth pointing out that all of these benchmarks cover specific use cases, albeit relatively common ones in the case of Cloudflare's wide range of benchmarks. Nevertheless, they seem to paint a fairly consistent picture: ARM CPUs should definitely be taken seriously in server rooms and data centers these days.

Comments closed
    • ronch
    • 3 years ago

    If it weren’t for Epyc I’d be more interested in seeing this take off. If there’s someone in the tech industry that needs to succeed, it’s AMD. People need to give Epyc more attention. Is it already shipping for revenue? Has it improved AMD’s bottom line yet?

    • ronch
    • 3 years ago

    If this somehow takes off, do you think AMD will dust off K12?

      • tipoo
      • 3 years ago

      You know AMD, skate to where the puck used to be and sell off the assets of where it’s going.

    • ronch
    • 3 years ago

    Promising eh? Prove it. Run Crysis.

    • Forge
    • 3 years ago

    Intel has spent the last 15-20 years keeping themselves just far enough ahead of AMD to be dominant, but not so far ahead that AMD went under. They did that for so long and so well that they completely forgot there are other parties playing the game.

    Round two! Intel vs. ARM! FIGHT!

    • Rza79
    • 3 years ago

    I would think anyone brave enough to try something other than an Intel CPU would try AMD’s EPYC first before they switch everything over to ARM.
    I doubt Centriq would win any of these benchmarks against a 32-core EPYC 7551P processor.
    Energy consumption tests on Anand showed a total system power usage of 327W for a dual EPYC 7601 server running POV-Ray. That’s for a system with two supposedly 180W-TDP CPUs (360W total for just the CPUs), including additional components. At least during POV-Ray, a single 32-core EPYC server might be around 150W or less.

    • mcarson09
    • 3 years ago

    If they had real contenders they’d emulate x86 and blow Intel out of the water. How many corporate databases run on x86? I honestly want ARM to take off, but there are still a lot of legacy apps out in the world. Transmeta proved it could be done, but they couldn’t do it fast enough. If they were to get emulation right, they could do it more cheaply and use a lot less power.

      • just brew it!
      • 3 years ago

      A lot of Cloud platforms are Linux-based. Legacy apps don’t matter for these people because it is easy to recompile everything for ARM.

      I doubt anyone else will ever beat Intel at the x86 game. AMD has kept (and it would appear, will keep) them honest by periodically poking at them with competitive products, but I don’t think anyone will unseat them as the #1 x86 vendor. We may all eventually transition to ARM (or something else) though.

        • cmrcmk
        • 3 years ago

        You’re right that most cloud platforms are Linux and thus could recompile for ARM. However, having access to all the source code does not equal “easy to recompile”. Plenty of applications, especially high-performance ones like databases, tend to include either code optimized for a specific processor family or just outright assembly language.

        That said, for someone like Cloudflare, Amazon, or Google, the development work to recompile and do some minimal reoptimization for ARM might well be worth it to expand their hardware choices. Google has already waded into the Power ISA, so they’ve likely done much of the work needed to branch to a third ISA.
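        In Go, that kind of per-ISA specialization typically shows up as build-tagged files. A hypothetical checksum package sketched below illustrates the pattern: the amd64 file (which in real libraries usually fronts hand-written assembly) is built only on x86-64, while every other architecture, arm64 included, silently gets the portable fallback:

        ```go
        // checksum_amd64.go
        //go:build amd64

        package checksum

        // In a real library this function would typically be backed by a
        // vectorized implementation in a .s assembly file; a plain loop
        // stands in for it here.
        func sum(p []byte) (s uint64) {
            for _, b := range p {
                s += uint64(b)
            }
            return s
        }
        ```

        ```go
        // checksum_generic.go
        //go:build !amd64

        package checksum

        // Portable fallback that arm64 gets until someone writes and
        // validates a tuned version.
        func sum(p []byte) (s uint64) {
            for _, b := range p {
                s += uint64(b)
            }
            return s
        }
        ```

        A cross-compile for arm64 “just works”, but it leaves performance on the table until the tuned file exists, which is essentially the situation Krasnov describes in Cloudflare’s Go results.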

          • just brew it!
          • 3 years ago

          The entire Linux kernel, and most of the core application stack tools and libraries are reasonably portable, and/or have already been tweaked to be buildable on ARM.

          Debian has ARM ports of their distro, including large swaths of their ecosystem.

          Heck, I saw LibreOffice running on a 1st gen Raspberry Pi years ago, shortly after the RPi came out. (It didn’t run [i<]well[/i<] mind you... but obviously the Raspbian folks got it to compile!)

        • Forge
        • 3 years ago

        I’m still waiting for my DEC 21164 desktop. We should all transition to MIPS instead. Microsoft already has NT4 for it!

        ARM isn’t going to replace x86. ARM and x86 together will create the Borg.

        • Anonymous Coward
        • 3 years ago

        I wouldn’t emphasize Linux specifically here. True, Linux is at work in many places. A lot of “the cloud” is software as a service, so the customer has little or no concern with either the ISA or OS. Or “serverless” code such as Lambda, which is generally going to be bytecode or interpreted rather than compiled. A business could go a [i<]long ways[/i<] without being concerned with the ISA and almost ignorant of any OS issues. Sometimes a cloud platform can feel a lot like a giant OS in itself. So it’s not that Linux or open source is saving the day, it’s that Linux and open source work hand-in-hand with massive in-house software development at massive cloud providers who have the resources to eliminate even very small examples of waste.

          • just brew it!
          • 3 years ago

          I was talking from the Cloud providers’ perspective. There’s been a massive shift to the Cloud. If the Cloud providers go ARM, end users will come along for the ride, whether they realize it or not.

            • Anonymous Coward
            • 3 years ago

            I agree that most businesses and users will just go along for the ride. However, I think it’s important to emphasize that the cloud providers are enabled not only because they use a lot of open source software, but also because they control both the software and hardware stacks, and have incentive to throw engineers at problems to shave even percentages off their global costs.

      • windwalker
      • 3 years ago

      Sure, if they were real contenders they would win even while carrying the other guy on their back across the finish line.

    • Sahrin
    • 3 years ago

    ARM is not x86, and it never will be. x86 math units are monsters – there’s literally nothing like them in the world. ARM will never catch up.

    What you are seeing in these benchmarks is specific optimizations built into the ARM core (like a dedicated AES unit) that x86 has too, but ARM’s much simpler core allows you to build more cores and thus achieve better multithreaded performance *for these specific, accelerated tasks.*

    Much like using a GPU for compute, it’s going to be specific cases where ARM is competitive. An x86 ALU still wipes the floor with it for general-purpose computing, and the average server workload is generalized enough that ARM just won’t cut it.

    If AMD is actually working on an IF-enabled massively-cored ARM chip, this could be an additional niche they can chuck at Intel. Imagine a K12 server with Epyc *and* Instinct MI accelerators. Big, hairy math workload? Throw it at Zen. Elegant, massively parallel workload? Throw it at Vega. Highly specialized, threaded workload with a dedicated offload unit on the ARM core? Throw it at K12. All connected to the same massive RAM pool and, maybe, the same Pro SSG NVMe pool.

    Like a Swiss Army knife server.

      • mcarson09
      • 3 years ago

      Everybody loves the boob so breast cancer gets more money than prostate cancer.

      If more money was dumped into ARM for the purpose of competing in the server and desktop segments to dethrone Intel, it would happen. Remember, people said Linux would not overtake Windows in the server segment, and it has. Its failure in the desktop market is more of a creative difference over a unified GUI. Let’s not forget Intel is having trouble breaking into the phone market.

      I really wish someone would take over AMD who actually wants to crush Intel and Nvidia instead of raping their customer base with “almost there” product lines.

      • Klimax
      • 3 years ago

      Most likely before that happens, Intel sends in their mega-core Atoms. (Although their big cores are not that large either.)

      • Pancake
      • 3 years ago

      I’ll just leave this here: Apple A10X.

      • Laykun
      • 3 years ago

      Wait, was someone arguing otherwise?

      The article is pretty clear that this architecture is very good at very specific types of work. In a large enough enterprise environment or in an AWS setup, work is generally piped through multiple servers per request, and these ARM chips make a pretty good case for taking over the work of some of those parts of the pipeline. No one is advocating the death of x86; I think this is more of a celebration of making a better tool for a specific job.

      • Anonymous Coward
      • 3 years ago

      Why should x86 have an endless advantage in computation? Certainly the current state of x86 is largely disconnected from where it started, and I am not aware of any reason something analogous can’t be done with the ARM ISA.

        • blastdoor
        • 3 years ago

        You have to think of it from the perspective of a SIMD unit. Sitting there on the same die with the legendary x86 ISA is simply awe-inspiring. We are talking a long lineage of tradition and accomplishment going back to the 1970s. There’s just no way that a SIMD lacking that inspirational example could possibly add or multiply matrices in quite the same way.

          • tipoo
          • 3 years ago

          (are the downdoots not getting the sarcasm?)

            • blastdoor
            • 3 years ago

            Humor — a difficult concept. It is not… logical.

          • Pancake
          • 3 years ago

          It’s the heritage. It’s the tradition.

        • derFunkenstein
        • 3 years ago

        clearly he’s telling us mere mortals that there’s only one right way to design a processor and Intel has figured it out.

        • Sahrin
        • 3 years ago

        It has absolutely nothing to do with the ISA and everything to do with the invested engineering resources.

        An x86 ALU isn’t intrinsically different than an ARM ALU. They’re both RISC register-based machines. What’s different is the relative efficiency and complexity of the x86 machine’s handling of code. The time and effort invested in branch prediction, data and cache handling, instruction mapping, etc – this is all done to extract the maximum amount of instruction level parallelism and efficiency possible. There’s a reason ARM only recently became an Out of Order execution system – because this is fucking hard to do.

        You think that AMD and Intel just arbitrarily decide to make their designs 8-wide (i.e., 2 wider than the widest ARM core)? No. The hardware has to be usefully utilized by the front end. If you can’t dispatch 6 instructions per cycle, then what’s the point of having a 6-wide core? Increasing cache sizes isn’t just done for funsies; when you boost a cache size it slows the cache down, so you have to figure out a way to engineer transistors and SRAM cells such that the cache gets faster to compensate for the latency penalties.

        Intel and AMD spend billions of dollars developing these techniques. They are techniques that are not implemented in ARM.

        So sure, for an arbitrarily embarrassingly parallel workload like…AES decryption, ARM comes out smelling like roses. If you retire 1 AES instruction per clock, and you have more cores…you’re going to win. Run some hairy C++ code or a real database sim through the front end of an ARM CPU and see what happens.

        The x86 front end isn’t just for fun…it adapts an incredibly diverse code base into a RISC code set that the CPU is incredibly efficient at managing. OoO, for example, lets the CPU look into the stack and pick out instructions it can execute now, but it has to be *good* at that; it’s not free. There’s no way to know in advance what an arbitrary piece of code will look like before it hits the front end. A change to improve execution of one type of instruction can dramatically slow another.

        It’s entirely possible ARM is great at this, too. But engineering is about tradeoffs. It’s *not* true that you can build an arbitrarily wide, arbitrarily fast CPU if you have unlimited resources. There are fundamental limits that depend on your ability to handle the instruction stream intelligently.

        It’s not just a bunch of calculators duct-taped together.

          • Pancake
          • 3 years ago

          Can you think of a fruity-themed company that has many times more resources than Intel and AMD put together, possibly better engineering talent and just happens to design the fastest most power efficient mobile CPU on the planet? And if they feel like moving that into laptops and servers they will completely mutilate x86 in raw performance and performance/W terms?

            • Sahrin
            • 3 years ago

            >and just happens to design the fastest most power efficient mobile CPU on the planet?

            Intel isn’t fruity-themed. (The most powerful and efficient mobile CPUs are Intel’s Skylake-U and -Y processors.)

            Intel spends $11B a year on R&D. This is exactly what Apple spends. Are you seriously suggesting that Apple is going to catch Intel … spending the exact same amount of money as Intel does?

            The A-series chips are very impressive, but just like ARM in general they are highly tuned for very specific workloads. It’s why the A10X can destroy a Java benchmark, but struggle to run a basic word-processing app or spreadsheet (which my four-generation-old Haswell chip laughs at).

            • Anonymous Coward
            • 3 years ago

            I have to object to the notion that Apple is designing a processor that excels at benchmarks but not… eh, Excel. The Axxx processors seem to be quite well rounded, although aimed at a lower-wattage segment than Intel aims at.

            Part of implementing an excellent product is of course correctly choosing objectives, which it seems Apple has. They rule handheld performance despite Intel’s desire to enter that market.

            • blastdoor
            • 3 years ago

            Yeah…. Sahrin vastly overstated his case regarding the unassailable awesomeness of x86, then Pancake overstated his case regarding Apple’s ability to assail x86, and so on. Such is life on the Internet.

            I think the truth is that very few companies (maybe just one) are really in a position to compete with Intel. This is almost purely for economic reasons, not anything magical about Intel. If anybody could do it, it would be Apple — again, for economic reasons, not magic. Apple has taken the time and money to build the capacity to design highly competitive CPUs.

            But even if Apple were to decide to make a push to replace Intel chips in Macs, I don’t think it would be due to an ability to completely blow Intel out of the water on traditional PC measures of single-threaded CPU performance. It would be because Apple is able to approximately equal Intel on those measures, but then through customization create a SOC that gives the Mac a competitive edge. That’s really the advantage of the A# series chips — it’s not just that they have an Intel-class CPU, they also have a good balance of CPU, GPU, ISP, and other ASICs to enable useful features in the iPhone.

            I’m really at a little bit of a loss as to why Apple hasn’t done this with the Mac. My best guess is that the threat to do so might have pushed Intel to give them a better deal, making it not quite worth Apple’s trouble.

            • Anonymous Coward
            • 3 years ago

            Yeah… I’m not so hot on the idea that they really could switch to ARM on the desktop (or laptop), or that they would gain a lot from doing so. It seems like the larger form factors are well served by what Intel, AMD and nVidia cook up.

            At some size, I guess they can just cut out Intel and go with “store brand” processors. 🙂

            I’m never going to forgive Apple for dropping PPC though. I think they would have done fine with the G5 lineage, IBM is alive and kicking with new cores. There was that PPC startup covering mobile, forget the name. It looked scary at the time, but this happened just as CPU scaling started to shut down. They could have made it, and been sitting pretty today with in-house PPC & ARM designs, based in part on IP purchased from IBM.

            • NoOne ButMe
            • 3 years ago

            How much of Intel’s R&D IS NOT fabrication facilities? :yawn: :rolleyes:
            Sure, not all of Apple’s 11 billion dollars is in CPUs, but if Apple wanted to, they could throw that kind of money around.

            For some detail on A11 performance for math, here is an interesting RWT thread:
            [url<]https://www.realworldtech.com/forum/?threadid=172324&curpostid=172324[/url<]

      • willg
      • 3 years ago

      Jim Keller disagrees with you:
      [url<]https://www.youtube.com/watch?v=SOTFE7sJY-Q&feature=youtu.be&t=226[/url<]

        • chuckula
        • 3 years ago

        So where’s K12?

          • willg
          • 3 years ago

          The point that Jim makes is that there’s nothing inherently disadvantageous about the ARM ISA vs. x86.

          He comments that the extra transistor budget that x86 ISA decode logic requires could be used for more execution resources in the same design using the ARM ISA.

          As for K12, I have no insight into AMD product planning, but I doubt “meh, x86 haz too much AVX, my giveup” was part of it.

            • chuckula
            • 3 years ago

            [quote<]The point that Jim makes is that there’s nothing inherently disadvantageous about the ARM ISA vs x86.[/quote<]

            Actually, yeah, there is. ARM’s vector instructions are probably 10 years behind modern x86. And saying “well you could just add them!” is equivalent to saying that you could make x86 a better ISA than ARM simply by dropping old instructions. In fact, it would be easier to drop stuff from x86 than to expand ARM.

            • willg
            • 3 years ago

            Do you have sources to support that?

            From what I can see ARM’s SVE looks similar to AVX (and more flexible for implementors)
            [url<]https://www.anandtech.com/show/10586/arm-announces-arm-v8a-with-scalable-vector-extensions-aiming-for-hpc-and-data-center[/url<]

            And wouldn't dropping old instructions from x86 break backwards compatibility? Happy to be wrong...just enjoying having the debate.

            • chuckula
            • 3 years ago

            Two points: 1. Nobody is using SVE yet and the first real implementation won’t be out until the early 2020s.

            2. SVE is an extremely x86 sort of idea: They are making complex scheduling hardware in silicon that attempts to munge different vector instructions together efficiently to allegedly make the lives of software developers easier. Which is exactly what x86 has been doing for decades instead of trying to make an ideologically “pure” instruction set. So basically, that’s ARM saying they think complex on-die instruction schedulers are a necessity for high-performance hardware. Welcome to the club.

            • Anonymous Coward
            • 3 years ago

            So here you’ve claimed that ARM is equivalent to x86, but that falls short of what it appeared you have been arguing for.

            • chuckula
            • 3 years ago

            No, what I’m saying and what I’ve said for years is that the OMG ARM IS MAGICAL propaganda machine pretends that the ARM ISA can just magically do anything when in reality if you want ARM to behave like a high performance x86 processor you’ll end up copying practically everything that makes the x86 processor interesting.

            • Anonymous Coward
            • 3 years ago

            I’m not sure anyone here has taken a position that you are arguing against.

            The part about “propaganda machine” seems misplaced.

            • Ummagumma
            • 3 years ago

            You won’t get anywhere with Chuckula. He found another stash of cooking sherry and started nipping again.

            • thx1138r
            • 3 years ago

            [quote<]SVE is an extremely x86 sort of idea[/quote<]

            I would have thought it's the complete opposite. x86 takes the approach of sprinkling new vector instructions onto its instruction set every year, resulting in a massive number of instructions for a wide variety of vector types and widths. Having a vector-width-agnostic instruction set will massively reduce the number of required instructions, i.e., ARM is adhering to their RISC principles.

      • just brew it!
      • 3 years ago

      There’s nothing stopping someone from grafting something like a full-blown AVX unit onto an ARM core; it just hasn’t made sense. Nobody’s done it because ARM has been relegated to low-power mobile, where the limited space and power budget has been focused on accelerating very specific use cases (like HD video), not general computation.

        • chuckula
        • 3 years ago

        Yeah, but here’s the trick. If an ARM licensee ever got around to actually doing all that, then all the magical “OMG SO EFFICIENT” fairy dust sprinkled on every ARM chip would evaporate, and you’d be left with yet another clone of modern high-performance x86 parts — one that’s not being fabbed at Intel and that doesn’t have anywhere near the sophisticated surrounding cache architecture to keep the chip running properly.

        If you don’t believe me, go look at Qualcomm’s diagrams for their Centriq chip and have déjà vu back to 2011, where you see Qualcomm copying Sandy Bridge’s ring bus (looks like Qualcomm thinks Intel got it right).

          • just brew it!
          • 3 years ago

          I’m not necessarily disagreeing with that. “Nothing’s stopping someone” isn’t equivalent to “makes business sense”. But if ARM starts to gain traction in the non-mobile space, then the business case starts to look more interesting.

            • End User
            • 3 years ago

            How about the laptop space?

            [url<]https://www.notebookcheck.net/The-HP-2US29AV-could-be-one-of-the-first-ARM-powered-Windows-10-laptops.263758.0.html?utm_source=dlvr.it&utm_medium=twitter[/url<]

            • just brew it!
            • 3 years ago

            Non-starter for me personally, as I need the ability to run x86 VMs. For general consumer use I could see it working, but for Windows they need to get the 3rd party devs on board, which is a question mark.

            • End User
            • 3 years ago

            Microsoft has x86 emulation ready to roll to cover the laggard 3rd-party consumer app devs.

            If you absolutely have to have laptop-based x86 VMs then ya, an ARM-based laptop is not for you.

            • just brew it!
            • 3 years ago

            How’s the emulation performance? I expect it’ll be “good enough” for productivity apps, but fall far short for computationally intensive stuff or FPS type games.

            • End User
            • 3 years ago

            I have no idea. I have no intention of buying one as I want x86 hardware under Windows.

      • tipoo
      • 3 years ago

      The ISA isn’t the limit to enhancing such things. With enough billions of dollars, someone can build an ARM core very similar to large x86 cores, with wider SIMD (the ISA now supports vector-length-agnostic SIMD) and heavy prefetchers, etc.

      x86 is performant because it’s had years and billions and billions of dollars thrown at it. It’s not performant BECAUSE it’s x86. What we have now with it is very different from when it started, yet still compatible; so too could future ARM cores be.

      • windwalker
      • 3 years ago

      [quote=”Sahrin”<]ARM will never catch up.[/quote<] Famous last words.

        • End User
        • 3 years ago

        +3

        • chuckula
        • 3 years ago

        It’s entirely true unless ARM effectively becomes a clone of x86, at which point insulting x86 becomes a moot point since you’ve only copied it.

          • tipoo
          • 3 years ago

          x86 is in the hands of Intel and AMD. ARM is open to everyone (for a slim fee). Even if large ARM cores only achieve the perf/watt of large x86 cores, there’s still a point to that. More companies in the game = more competition and hopefully further creative designs.

            • Anonymous Coward
            • 3 years ago

            Intel has pretty well demonstrated that monopolies are not creative. 🙂

            • Sahrin
            • 3 years ago

            Competition does not have an effect on complex tasks.

            • Anonymous Coward
            • 3 years ago

            You must be fishing for replies with that. In a gentler time we could call it trolling, but these days I would reserve that for a more serious effort on your part.

            • Sahrin
            • 3 years ago

            I like how something becomes writ because Ronald Reagan said it out loud on TV.

            They just awarded a Nobel Prize to the guy who realized this is true, by the way.

            • Anonymous Coward
            • 3 years ago

            If you say so.

          • NoOne ButMe
          • 3 years ago

          Clean slate.

          While I think people tend to far overstate the effect “baggage” has on x86, ARM has a reasonable chance of actually doing a fresh design, without wasted area/etc.

          The gains are small, but they are improvements.

          x86 could do it also. But inherently, x86’s advantage is its backwards compatibility.

            • tipoo
            • 3 years ago

            ARM already took the chance to deprecate a lot with the ARMv8 instruction set. That’s part of what’s nice about them: they deprecate stuff. x86-64, by contrast, just joined the new ISA to the old at the hip. That leaves ARMv8 as a comparably clean and slim ISA.

          • windwalker
          • 3 years ago

          iOS still hasn’t caught up to Symbian in terms of power-user features.
          You can be technically right but dead wrong. Symbian dead.

    • Mat3
    • 3 years ago

    What happened to AMD’s ARM stuff? First there was Seattle and then K12. Just more precious R&D money wasted it seems.

      • NoOne ButMe
      • 3 years ago

      money.
      Lisa canceled the ARM stuff because the ecosystem wasn’t ready, and because limited funds meant they pushed Zen out before K12. K12 probably exists on paper, given it sounded like it’s mostly Zen (or Zen is mostly K12).

        • mcarson09
        • 3 years ago

        Maybe they need to cut back on exec pay and actually fund R&D.

          • NoOne ButMe
          • 3 years ago

          My quick digging shows that Lisa Su got about 1.4 million in cash and 10 million in stock in 2016, with other executives, including Mark Papermaster, at about 700K/3 million.

          The last executive at AMD to make millions of dollars in salary was Rory Read.

          Most of the ~22-30 million dollars in compensation a year from AMD has been paid in stock, which I don’t believe could pay for K12.

          So you’re looking at maybe 20-30 million dollars from when development on K12 started until now. Not enough to get K12 out the door.

          • Anonymous Coward
          • 3 years ago

          Oh I know, maybe the engineers should take a pay cut so they can hire more of them!

      • Sahrin
      • 3 years ago

      AMD’s ARM chip (K12) was deprioritized in favor of Zen – Jim Keller was put in charge of the entire Zen/IF effort, and decided to promote Zen over K12.

      Even after getting deprioritized, K12 was still projected for a 2017 launch. Obviously that’s not going to happen, but it still very much exists.

        • Alexko
        • 3 years ago

        I’m not sure K12 still has much of a point now that Zen/Epyc is here.

          • NoOne ButMe
          • 3 years ago

          If ARM becomes viable, using the ARM ISA over x86 does save a bit of die area, which can go toward either cost savings or higher performance.

          If you’re shipping millions of units, that one extra die per wafer, or that bit of higher performance, could be a large cost saver or profit adder.

    • chuckula
    • 3 years ago

    I’d be much more interested in the Cloudflare results if they’d turn on QAT and then run a real SSL-enabled web service with the full Skylake platform.

    Xeon-D and Atom systems in a clustered configuration would also be an interesting test since these ARM chips are basically scaled-up versions of microservers.

    On top of that, if you read the “GW4 Alliance” slides they claim that their ARM chips are actually competitors to products like the GPU and Xeon Phi. I noticed that they didn’t actually publish comparative results that included GPU or Xeon Phi tests of those benchmarks.

      • MOSFET
      • 3 years ago

      I didn’t check to see if the Broadwell Xeon E5-2630 and the Xeon Silver 4116 occupy the same spot on the generational price ladder, but the Silver adds 2C/4T, drops everything 100 MHz, and keeps TDP the same. For that, the Intel-ARK-listed price has increased by 50%. That may win ARM some adoption – the Silvers are neutered, and then the price goes up. (Not really neutered; 12C/24T in 85W is damn impressive, so obviously something has to give, and apparently it’s frequency.)

      So yeah, QAT would be an interesting test. Maybe that’s where the extra cost is wrapped up.

    • tipoo
    • 3 years ago

    Not bad, not bad. And Falkor is only a 4-issue design? I wonder what a future 6-issue offspring could do.

    • Alexko
    • 3 years ago

    Very interesting stuff, but it would have been nice to have some Epyc benchmarks in there, as AMD’s approach is somewhat similar—lots of cores with relatively lower per-core performance.

      • xeridea
      • 3 years ago

      Zen IPC is pretty good; it just doesn’t clock as high.

      [url<]https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/14[/url<]

      [quote<]First off, let's gauge the IPC efficiency of the different architectures. Considering that the EPYC core runs at 12-16% lower clockspeeds (3.2 vs 3.6/3.8 GHz), getting 90+% of the performance of the Intel architectures can be considered a "strong" (IPC) showing for the AMD "Zen" architecture.[/quote<]

      Also interestingly, EPYC gets a bigger boost on average from SMT than Intel. It would still be a good comparison, since EPYC gives AMD a good showing in the server market now.

        • Alexko
        • 3 years ago

        I’m not saying Zen’s IPC is bad, just lower than Skylake’s. Plus, Zen doesn’t clock as high (usually), hence measurably lower per-core performance in most cases.

          • xeridea
          • 3 years ago

          According to the article, an average of 90% of the performance at an average 15% lower clock means slightly better IPC: 0.90 / 0.85 ≈ 1.06, or roughly 6% more work per cycle.

          Also a better average boost from SMT (28% vs 20%).

          So unless workload is highly latency sensitive, I wouldn’t consider speed a factor, and if doing highly threaded tasks, per thread perf will be a wash. If comparing vs ARM CPUs, you aren’t going to be doing latency sensitive tasks.

            • NoOne ButMe
            • 3 years ago

            IPC is typically measured single-threaded. For multi-threaded workloads, the extra SMT scaling can make Zen faster, but that’s not what IPC is normally used to discuss.

        • NoOne ButMe
        • 3 years ago

        If I remember right, Zen’s SMT has been a bit ahead of Intel’s since it launched.

        Also, EPYC isn’t really on the market yet, and AMD isn’t yet a known player.

        Plus, EPYC seems like it would be better than Intel for the comparisons that Qualcomm is making.

        • Klimax
        • 3 years ago

      You get more out of SMT only when resources are underutilized or the core is hitting bubbles/stalls in execution. (Neither is really a good thing.)

      That’s why Netburst never benefited from SMT (it lacked the resources to fill stalls), and why SMT was absent until Nehalem. And even then it took a fairly long time to see unambiguous wins for SMT (Haswell and later).
