Nvidia’s Tegra X1 SoC has 8 CPU cores, Maxwell graphics

Nvidia's next-gen Tegra processor debuted during the company's pre-CES press conference this evening. Dubbed Tegra X1, the chip formerly known as "Erista" brings Nvidia's Maxwell graphics architecture to what CEO Jen-Hsun Huang described as a "mobile superchip."

The Tegra X1 combines 256 shader processors with an eight-core CPU "in a 4+4 configuration." The 64-bit CPU uses a big.LITTLE combination of four ARM Cortex A57 cores and four A53 cores. Huang didn't disclose clock speeds, but he did reveal that the thermal envelope is about 10W. The chip is built on 20-nm fabrication tech, and it can play 4K video at 60 FPS using either the H.265 or VP9 codecs.

According to Huang, the Tegra X1 doubles the performance of its K1 predecessor within the same thermal envelope. It's unclear whether Huang was referring to the quad-core ARM Cortex-A15 version of the K1 or to the dual-core Denver variant, but either way, the X1 looks like a substantial upgrade over the current generation.

Nvidia also showed relative performance figures comparing the Tegra X1 to its predecessor and to the Apple A8X SoC from the iPad Air 2.

Nvidia claims the Tegra X1 is the first mobile chip to deliver a teraflop of floating-point throughput. Unlike previous Tegra chips, the X1 supports floating-point datatypes with 16 bits of precision, otherwise known as FP16. This format should require less power to process than the FP32 format used by the Tegra K1.
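The teraflop claim only works out with FP16 packing. Here is a back-of-the-envelope sketch of the arithmetic, assuming a roughly 1 GHz GPU clock (Nvidia did not disclose clock speeds, so that figure is a guess):

```python
# Back-of-the-envelope peak-throughput math for the Tegra X1 GPU.
# The shader count (256) and the FP16 "teraflop" claim come from the
# announcement; the ~1.0 GHz clock is an assumption, since Nvidia did
# not disclose GPU clock speeds on stage.

SHADER_ALUS = 256        # Maxwell shader processors in the X1
FLOPS_PER_FMA = 2        # one fused multiply-add counts as two FLOPs
CLOCK_GHZ = 1.0          # assumed clock; not confirmed by Nvidia

fp32_gflops = SHADER_ALUS * FLOPS_PER_FMA * CLOCK_GHZ   # plain FP32 rate
fp16_gflops = fp32_gflops * 2                           # 2x FP16 packing

print(f"FP32: {fp32_gflops:.0f} GFLOPS")   # 512 GFLOPS
print(f"FP16: {fp16_gflops:.0f} GFLOPS")   # 1024 GFLOPS, the "teraflop"
```

At the same assumed clock, the FP32 peak works out to half the headline figure, which is why the teraflop number is only reachable with half-precision math.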

Huang showed the Tegra X1 running the Unreal Engine 4 "Elemental" demo, whose HDR lighting requires floating-point processing. The demo system wasn't actually visible on stage, but the chip was apparently running within the X1's 10W thermal envelope.

Comments closed
    • MadManOriginal
    • 8 years ago

    Denver also required a lot of excess engineering in both hardware and software to pull off its instruction translation mojo, which doesn’t even work very well in general use. I hope they just dead-end the concept; even going with custom ARM cores that use standard concepts would take fewer resources.

    • MadManOriginal
    • 8 years ago

    AMD STARTED IT! …sort of

    • ronch
    • 8 years ago

    Give me your address. I’m gonna send you a joke book by courier.

    • NTMBK
    • 8 years ago

    Erm, no. I disagree with deceptive marketing, regardless of what source it comes from. AMD, Intel, Qualcomm, Samsung etc have all pulled plenty of ethically dubious tricks as well.

    • aestil
    • 8 years ago

    Fairly certain that it can only use 4 cores at once. Nvidia is implementing their own custom interconnect; it has cache coherence, and only 4 of the 8 cores can be active at any one time, from what I understand.

    • VincentHanna
    • 8 years ago

    If this was Intel, and we were talking about 4 logical cores which were only usable every other cycle or something, or if it was a high/low on/off configuration, you might have a point.

    I don’t see a problem with calling a 2×4 single-package CPU an 8-core processor if that is what they want to do. The real issue I can see is when you get into comparing a true 8-core 1.8GHz CPU with a 1.2GHz/1.8GHz configuration, which will obviously perform fewer total operations per second.

    From what I can tell, Nvidia is being honest in this arena… My question is, will the distributors who buy/use their products be? Based on experience, no.

    But then, companies like Samsung and Apple rarely advertise their CPU specs to the general public. Chances are, if you are willing to hunt for core specs, you are probably willing to hunt for benchmarks… Besides, even though Tegra is a mobile part, isn’t this chip targeted at automotive manufacturers? I personally dread the day when the processor driving my radio even enters my mind when I am car shopping. That’s the day I move to the Louisiana backwoods and spend a year living on snake and gator to try to get my priorities back in line.

    • torquer
    • 8 years ago

    Wait, this is the APU from the Xbox one?

    • torquer
    • 8 years ago

    It’s widely reported that some future GPU will actually feature an ARM CPU on the card itself. So, Nvidia’s work on mobile CPUs isn’t limited exclusively to phone/tablet markets forever.

    • torquer
    • 8 years ago

    Google translate: “Please don’t post something that has the potential to make not-my-chosen-brand look good.”

    • torquer
    • 8 years ago

    I can’t wait for the day when no one publishes any numbers about new hardware launches and no one discusses new tech until it’s already in consumers’ hands.

    /sarcasm

    • Beelzebubba9
    • 8 years ago

    It’s an 8 core CPU – all 8 cores are usable concurrently.

    Why shouldn’t they call it an 8 core SoC?

    • Pwnstar
    • 8 years ago

    I agree that they shouldn’t call it 8 cores.

    • Andrew Lauritzen
    • 8 years ago

    It’s pretty normal to reuse some of the hardware between the two paths – in fact despite the PVR picture, for each pair you can’t do two FP16 operations at the same time as an FP32 operation (I believe the TR article mentioned this as well) so there’s clearly some sharing there too. The same thing is done for narrower integer operations as well.

    Ultimately it doesn’t really matter about the hardware details though; the only relevant performance implication is whether or not you can do 2x FP16 in the same cycle as 1x FP32, which you usually can’t.

    • southrncomfortjm
    • 8 years ago

    Put this thing in the next Shield Tablet will ya? I really enjoy XCOM: Enemy Within on my Nexus 5, and I’m sure I’d love similar games running even better on a Shield Tablet with “Maxwell grade” graphics. (nevermind that the tablet would almost certainly melt).

    • the
    • 8 years ago

    Apple didn’t entirely develop their designs from scratch. They purchased PA-Semi which did low power, high performance PowerPC designs. Apple was in talks with PA-Semi to use them before they switched to Intel.

    • Benetanegia
    • 8 years ago

    I wouldn’t say that 2 architectures doing something means that it is ordinary to do so…

    That paper doesn’t make clear whether there are dedicated FP16 ALUs or whether it does the same as Nvidia is doing here. Everything is attributed to what they call the FPU, even though that FPU does FP, integer, and all kinds of instructions. It looks more like what Imagination does:

    [url<]http://images.anandtech.com/doci/7793/PowerVR%20Series6XT%20USC.png[/url<]

    But if you can shed some light on it, I would appreciate it a lot.

    • Benetanegia
    • 8 years ago

    I was not speaking about doing FP16 on dedicated FP16 ALUs (that’s what Imagination and ARM do). I was speaking about FP32 ALUs doing 2x FP16 operations. I could be wrong regarding Intel, but no other mobile GPU does that that I know of. They have dedicated FP16 ALUs alongside FP32 ALUs, much the same way that there are dedicated FP64 ALUs in Kepler/Maxwell. By out of the ordinary I meant they are doing it differently than the rest.

    • HisDivineOrder
    • 8 years ago

    So… I take it the Tegra K1 Denver-based chip is DOA.

    No shock there. At the rate these companies want to pop out these chips (and with nVidia’s own less than stellar reputation for getting product out on time), it’s probably going to start becoming wiser to just stick closer to the ARM-built designs for cores and let them do the legwork on designing chips.

    Worry more about the amenities like GPUs, numbers, and fabrication.

    • MathMan
    • 8 years ago

    I saw the presentation.

    JHH couldn’t have been more clear that he was talking about FP16.

    If somebody else copies that number without adding this caveat, then that’s not dishonest marketing, but flawed copying.

    JHH spent a lot of time on deep neural networks for image recognition. If there’s one application that has an insatiable need for flops but can live with FP16 only, it’s that one.

    • Andrew Lauritzen
    • 8 years ago

    4300U for the i5, 4650U w/ HD5000 for the i7 (same as the MacBook Air) according to the tech specs tab:
    [url<]http://www.microsoftstore.com/store/msusa/en_US/pdp/Surface-Pro-3/productID.300190600[/url<]

    Interestingly the i3 version does use a Y-series CPU though (with an ~11W TDP), which is maybe what you were thinking of? Not sure how common that version is though, as 64GB of SSD is really not enough for a machine of that caliber/cost...

    • Andrew Lauritzen
    • 8 years ago

    His initial post aside, it’s definitely not out of the ordinary in mobile. In fact it’s the other way around – NVIDIA and Intel GPUs have been the only ones that can’t (yet) do FP16 in that space… everyone else relies on it heavily for performance on mobile.

    On desktop, architectures went to FP32 for simplicity at a certain point, but on mobile the power benefits of FP16 are far too great, so in the future it’s pretty clear that everyone is going to support both.

    • Orb
    • 8 years ago

    It’s so mobile that it goes straight into CARS.

    • 3SR3010R
    • 8 years ago

    Is the 10 watts for the SoC or the entire platform?

    • chuckula
    • 8 years ago

    [quote<] I mean you acknowledge that Tegra X1 ALUs are doing something out of the ordinary,[/quote<]

    May be slightly less out of the ordinary than you think... Broadwell's GPU also handles half-floats.

    [url<]https://software.intel.com/sites/default/files/managed/71/a2/Compute%20Architecture%20of%20Intel%20Processor%20Graphics%20Gen8.pdf[/url<]

    • 3SR3010R
    • 8 years ago

    Agree with what? The X1 CAN use all 8 cores at the same time.

    [quote<]However, rather than a somewhat standard big.LITTLE configuration as one might expect, NVIDIA continues to use their own unique system. This includes a custom interconnect rather than ARM’s CCI-400, and cluster migration rather than global task scheduling which exposes all eight cores to userspace applications[/quote<]

    • 3SR3010R
    • 8 years ago

    Nvidia got ripped on the K1 because they promote FP16 to FP32 and run at only 365 GFLOPS, but now that they almost triple it to 1024 GFLOPS (2.8X), they get ripped again because it is FP16 GFLOPS.

    One nitpick:

    PS: FP32 went from 365 GFLOPS (K1) to 512 GFLOPS (X1).

    • 3SR3010R
    • 8 years ago

    DELETE

    • derFunkenstein
    • 8 years ago

    OK that’s my bad. I didn’t realize that, but I re-checked a couple articles online and it’s using a 4200U.

    • Benetanegia
    • 8 years ago

    Well, apologies. I woke up very early in the morning to watch the presentation, only to find out it was entirely about cars. Sleep deprivation makes me dense.

    That being said, it was a bad joke. I mean you acknowledge that Tegra X1 ALUs are doing something out of the ordinary, but your post looked like you are pretending that every GPU can do the same and it’s just marketing BS.

    • chuckula
    • 8 years ago

    That’s true, it’s not the number of posts that makes them wrong. Instead, it’s the fact that the Core-m I actually own and use doesn’t seem to suffer from these mysterious throttling ailments that all those 100 posters — who’ve probably never seen a Core-m system in real life much less used one — seem to know are occurring.

    • Pwnstar
    • 8 years ago

    Agreed.

    • Pwnstar
    • 8 years ago

    100 posts doesn’t somehow make them less wrong.

    • Pwnstar
    • 8 years ago

    But Denver is already done. nVidia already did the years of work; they just need to add two cores and port it to a new process, which is much easier than starting from scratch.

    • ronch
    • 8 years ago

    I don’t like how some SoC manufacturers call their chips 8-core when only 4 cores are actually available to software at any given instant. 8-core means 8 cores seen and usable by software ALL the time.

    • Andrew Lauritzen
    • 8 years ago

    The Surface Pro 3 uses a U-series CPU like ultrabooks (conventional TDP ~15W) so it’s definitely not 4.5W 🙂 They may config down the TDP slightly (I haven’t checked) but I’m 90% sure it’s still >10W.

    That said, 10W is indeed too high for fanless tablets by my understanding, excepting particularly exotic cooling.

    • Andrew Lauritzen
    • 8 years ago

    This! It took me 20 minutes yesterday to realize that was FP16… and it’s a 10W TDP. Still good, but not 2x as good as everyone else (which should always raise questions in this space). Sigh…

    • chuckula
    • 8 years ago

    OK Debbie Downer –> Yes, I was fully aware of all that, including how FP16 by itself requires tweaked hardware (for example, the carry-lookahead logic for 2 16-bit adds isn’t the same as for a single 32-bit add).

    Humor –> I employ it, and even more humorously, I’m very often accused of being an Nvidia fanboy.

    • albundy
    • 8 years ago

    Raising the bar. Synthetically. Again.

    • Benetanegia
    • 8 years ago

    False. Your GPU, whichever it might be, can NOT do 2x FP16 instructions on the ALUs. FP16 instructions are promoted to full-fledged FP32 instructions and run at 1x the rate, as opposed to 2x the rate in Tegra X1.

    JHH explained the difference between FP16 and FP64 in supercomputers abundantly, and on multiple occasions in his speech. And it was also made abundantly clear that it’s 1 TFLOPS in FP16 and FP16 only. FP16 is extremely useful in mobile platforms anyhow, and also (apparently) in the automotive industry, so the ability of the X1 to achieve such a high throughput is remarkable.

    Imagination and Apple ALSO market their chips using FP16 throughput quite often.

    • blastdoor
    • 8 years ago

    I think it’s standard procedure when you’re not in first place. Companies that legitimately have the best products can be honest. Companies that don’t can’t be honest. This applies to everyone. During the Netburst era, Intel lied its a$$ off all the time and AMD was pretty straightforward (at least about performance, not about shipping on time). From the Core era onward, Intel is pretty honest and AMD lies its a$$ off all the time. Back when Apple used the PPC G4, they lied their a$$ off all the time. But with their custom A# cores, they’re actually pretty honest.

    The bottom line is that nobody can be honest when the truth is that their product is second rate. Nobody.

    • flip-mode
    • 8 years ago

    It might seem dishonest but that is only because you are looking at it from an apples to apples perspective. JHH has a broader perspective than that, often using an apples to whatever-he-has-in-his-pocket perspective. To be fair to JHH, this is pretty standard procedure.

    • flip-mode
    • 8 years ago

    I think it is a historical reference. Ten years ago it was not an uncommon thing on hardware review sites. So saying the graphs are the worst thing since that long ago is a pretty hard slam.

    • jessterman21
    • 8 years ago

    900p 30fps on High? Definitely.

    • chuckula
    • 8 years ago

    [quote<]Does this indicate that A57 is just that good?[/quote<]

    Having now seen the A57 in action (finally), it's a pretty safe bet that Apple has nothing to be concerned about. Remember one thing: Nvidia is a GPU company first and foremost. Given that Denver isn't the hyped miracle it was supposed to be, Nvidia is more than happy to slap in adequate A57 cores while really focusing the SoC on the Maxwell GPU for both graphics and compute purposes.

    • blastdoor
    • 8 years ago

    My guess is that this is a bottleneck at the foundries. Every year Apple is pushing out a new custom SoC on the latest process at both TSMC and Samsung. It could be that the foundries are telling everyone else that they can’t support anything other than generic ARM cores, because their A-team engineers are all tied up on Apple products for the foreseeable future.

    • chuckula
    • 8 years ago

    Yeah, when I’ve read about 100 posts whining that the Core-m — at 4.5 watts — is a “power hog” and “thermally throttles” constantly, I’m not holding my breath for these chips to be showing up in tablets. At least not at 10 watts, maybe massively downclocked with some cores disabled though.

    • derFunkenstein
    • 8 years ago

    It’s beyond tablet territory, too. Remember that phones and tablets generally run the same SoCs.

    [s<]Even the Surface Pro 3 with its fan runs a 4.5W CPU.[/s<] edit: WRONG

    • derFunkenstein
    • 8 years ago

    No. Qualcomm is moving to A57 because they were caught with their pants down when Apple announced the A7. There hasn’t been enough time for Qualcomm to develop a custom 64-bit CPU core.

    • Deanjo
    • 8 years ago

    If cash-strapped “our CEO is our janitor” AMD has the resources to develop x86, then nVidia has far more than enough money to reach for the lower-hanging fruit of a low-power, high-performance ARM core.

    • sweatshopking
    • 8 years ago

    Nvidia certainly has the money. They’ve made massive profits for years; they have billions they could invest in doing it. It’s not just Apple that has cash, contrary to what you believe.
    They clearly decided it’s just not WORTH doing.

    • chuckula
    • 8 years ago

    Real Question: Can it drive your car well enough to Avoid a Cr[s<]y[/s<][u<]i[/u<]sis?

    • ronch
    • 8 years ago

    Can it run Crysis?

    • adisor19
    • 8 years ago

    It’s not that. It’s the time and money that Nvidia doesn’t currently have available to develop a low-power, high-performance custom ARM core.

    Adi

    • adisor19
    • 8 years ago

    Billions of $$$ and many years are needed to develop custom ARM cores. Apple dipped its toes with the A4 and only managed to pull it off with the A6 and later designs after lots and lots of R&D. Their A5 chip was just a standard Cortex-A9 ARM core. It took them years to come out with the A6. This is something that people have a hard time understanding: it’s not easy to create high-performance CPU cores, and only a few players in the market are able to pull it off.

    Nvidia doesn’t have that kind of resources. Even Qualcomm is struggling currently. The only players with enough resources to pull off custom cores are Intel and Apple and intel is probably very worried at this point seeing Apple’s success with custom ARM cores.

    Adi

    • chuckula
    • 8 years ago

    I prefer to look at it this way: Nvidia just upgraded my 18-month-old rig into a multi-teraflop monster using only PowerPoint slides!!

    • chuckula
    • 8 years ago

    [quote<]Nvidia claims the Tegra X1 is the first mobile chip to deliver a teraflop of floating-point throughput.[/quote<]

    Sounds moderately impressive UNTIL:

    [quote<]Unlike previous Tegra chips, the X1 supports floating-point datatypes with 16 bits of precision, otherwise known as FP16. This format should require less power to process than the FP32 format used by the Tegra K1.[/quote<]

    Pfft... it's not even single-precision teraflop performance, it's 1/2 precision. Whoopee doo.

    • tipoo
    • 8 years ago

    I kind of just expect disappointment from Tegra at this point. Each one has been announced early and sounded exciting and class-winning at the time, but shipped far later, after the competition had leapfrogged it. The K1 is OK, but power-hungry and in few shipping designs.

    There’s also the usual Nvidia Funky Business™ in this presentation. 1 teraflop, and comparing it to supercomputers? Except supercomputers would run at FP64, and their figure is for their double-speed FP16. The Elemental demo, which they bragged needed 100W systems the year before, is clearly a visual downgrade from what those other systems ran.

    • tipoo
    • 8 years ago

    Nvidia’s line on it was that they used stock cores just for time-to-market purposes, and are still interested in custom. Whether theirs are actually better than A57 remains to be seen.

    • Zizy
    • 8 years ago

    Well, NV claims they took the A57 because of time to market. Not sure if true, but it is somewhat plausible – the Denver K1 came after the A15 K1, not before it.
    My guess: a 4C Denver with companion cores isn’t ready, if it even exists. The current 2C part is meh vs. the upcoming SD 810, plus it is 28nm, so it would also need to be ported, probably taking about as much time as integrating the ARM-provided solution.

    • NTMBK
    • 8 years ago

    I’d expect it to at least turn up in a few Android TV devices, even if it’s a little too power hungry for a fanless tablet. If hardcore Android gaming ever takes off NVidia will be pretty well positioned.

    • NTMBK
    • 8 years ago

    NVidia’s 64-bit custom core was ready before A57… it just wasn’t that good.

    • NTMBK
    • 8 years ago

    Please, don’t report the TFLOP number without the caveat that this is for FP16 data. JHH then compared this number to a supercomputer… where TFLOPs are measured in FP64. Pretty dishonest marketing there.

    • Zizy
    • 8 years ago

    A57 is ready, custom cores aren’t 🙂 Being 64-bit with a cleaned-up ISA has a nice advantage over 32-bit stuff. Plus 64-bit is good for marketing.

    I don’t mind these “off the chart” bars too much. The thing is 50%+ faster, and that is all these NV graphs give you. We need to wait for TR, Anandtech and similar for real hard numbers anyway. Charts starting at 80% are way worse, as they are misleading – showing large gains where there are none.

    • MathMan
    • 8 years ago

    Where do you see charts starting at 80%?

    • renz496
    • 8 years ago

    Not saying they outright threw away Icera. But if you want connectivity for an Nvidia part, they can give you their modem; it’s just that the modem is not integrated into the SoC. To me that is the more logical thing for them to do, and Nvidia already mentioned they are not going to actively pursue the smartphone market anymore. With K1 they brought their GPGPU stuff into mobile. Imagine a board like the Jetson TK1: does an integrated modem really matter on such a board? Instead of wasting die space on an integrated modem, they can use that space to improve their compute performance.

    • jjj
    • 8 years ago

    Sure thing, that’s why they spent hundreds of millions on buying Icera: to be out of phones (a market some 5 times bigger than the tablet market) and, in the future, glasses. No connectivity equals no future in mobile.
    They just can’t do it right, same as Samsung and Intel so far.

    • renz496
    • 8 years ago

    When Nvidia came up with the TK1, it was clear that they had no interest in integrating a modem into their future Tegras.

    • LordVTP
    • 8 years ago

    It seems possible to me that the lack of tablet/gaming usage talked about for the X1 might be related to the A57 cores on it; this may be to the X1 what the A15-based K1 is to the Denver-version K1. For whatever reason (maybe this automotive stuff), the A57s make more sense than Denver cores, which offer larger but fewer threads for a given transistor budget.

    • Visigoth
    • 8 years ago

    10 watts is definitely past mobile phones and firmly into tablet territory…perhaps the next-gen NVIDIA Shield/Nexus tablet?

    • RdVi
    • 8 years ago

    It seems like everyone is moving away from custom CPU cores (bar Apple) for the time being. Does this indicate that A57 is just that good?

    As a side note, those “off the chart” graphs are the worst thing since setting the starting point at 80%.

    • jjj
    • 8 years ago

    The press release says 8 CPU cores (4x ARM Cortex A57 + 4x ARM Cortex A53) on 20nm.
    So no quad Denver for us 🙁, so far anyway. (The article here was edited to include this info, but TR apparently doesn’t mention edits.)

    Edit: [url<]http://www.hardwarezone.com.sg/feature-preview-nvidia-tegra-x1-benchmark-results[/url<]

    The way it went, I wonder if they are just out of mobile. Denver has got to be deeply flawed somehow if they don’t use it. No modem, it was all about cars, and a timeframe for devices was mentioned only in the press release:
    [url<]http://www.nvidia.com/object/tegra-x1-processor.html[/url<]

    TX1 whitepaper: [url<]http://international.download.nvidia.com/pdf/tegra/Tegra-X1-whitepaper-v1.0.pdf[/url<]
