Nvidia claims Haswell-class performance for Denver CPU core

Scott Wasson

Some of Nvidia's CPU architects gave a talk at the Hot Chips symposium today, and they revealed some long-awaited details about Nvidia's first custom CPU design. We weren't able to attend the talk, but the firm evidently pre-briefed some analysts about what it planned to say. There's a free-to-download whitepaper at Tirias Research on the Denver CPU core, and I've been scanning it eagerly to see what we can learn.

We already know Denver is a beefier CPU than ARM's Cortex-A15, since two Denver cores replace four A15 cores in the Denver-based variant of the Tegra K1. We also know Denver is, following Apple's Cyclone, the second custom ARM core to support the 64-bit ARMv8 instruction set architecture. We've long suspected other details, but Nvidia hasn't officially confirmed much—until now.

Here are some highlights of the Denver information revealed in the whitepaper and presumably also in the Hot Chips presentation:

  • Binary translation is for real. Yes, the Denver CPU runs its own native instruction set internally and converts ARMv8 instructions into its own internal ISA on the fly. The rationale behind doing so is the opportunity for dynamic code optimization. Denver can analyze ARM code just before execution and look for places where it can bundle together multiple instructions (that don't depend on one another) for execution in parallel. Binary translation has been used by some interesting CPU architectures in the past, including, famously, Transmeta's x86-compatible effort. It's also used for emulation of non-native code in a number of applications.

    Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128MB cache stored in main memory. Optimized code sequences can then be recalled and replayed when they are used again.

  • Execution is wide but in-order. Denver attempts to save power and reap the benefits of dynamic code optimization by eschewing power-hungry out-of-order execution hardware in favor of a simpler in-order engine. That execution engine is very wide: seven-way superscalar and thus capable of processing as many as seven operations per clock cycle. Denver's peak instruction throughput should be very high. The tougher question is what its typical throughput will be in end-user workloads, which can be variable enough and contain enough dependencies to challenge dynamic optimization routines. In other words, Denver's high peak throughput could be accompanied by some fragility when it encounters difficult instruction sequences.

  • Impressively, Nvidia is claiming instruction throughput rates comparable to Intel's Haswell-based Core processors. That's probably an optimistic claim based on the sort of situations Denver's dynamic optimization handles well. Nonetheless, Nvidia has provided a quick set of results from a handful of common synthetic benchmarks. These numbers are normalized against the performance of the 32-bit version of the Tegra K1 based on quad Cortex-A15 cores. They show Denver challenging a Haswell U-series processor in many cases and clearly outperforming a Bay Trail-based Celeron. Another word of warning, though: we don't know the clock speeds or thermal conditions of the Tegra K1 64 SoC that produced these results.

  • Nvidia has built the expected power-saving measures into the Denver core, with "low latency power-state transitions, in addition to extensive power-gating and dynamic voltage and clock scaling based on workloads," according to a blog entry Nvidia has just posted on the SoC. As a result, they claim, "Denver's performance will rival some mainstream PC-class CPUs at significantly reduced power consumption." That sounds like a bold claim, but one wonders if they're comparing to something like Kaveri rather than Broadwell.
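
To make the bundling and caching ideas in the list above a little more concrete, here's a toy Python sketch. It is not Nvidia's algorithm: the whitepaper doesn't describe the optimizer at this level, and every name here (Op, bundle_in_order, TranslationCache) is hypothetical. The sketch assumes a bundle holds at most seven operations, that an operation can't share a bundle with an earlier one whose result it reads or overwrites, and that optimized blocks are cached by their starting address.

```python
from dataclasses import dataclass

MAX_WIDTH = 7  # seven-way superscalar, per the article; everything else is assumed


@dataclass(frozen=True)
class Op:
    """A hypothetical register-to-register operation."""
    dest: str
    srcs: tuple


def bundle_in_order(ops, width=MAX_WIDTH):
    """Greedily pack ops into bundles without reordering them.

    A new bundle starts when the current one is full, or when the next op
    reads or overwrites a register written earlier in the current bundle.
    (Simplification: only those two hazards are checked.)
    """
    bundles, current, written = [], [], set()
    for op in ops:
        depends = op.dest in written or any(s in written for s in op.srcs)
        if current and (depends or len(current) >= width):
            bundles.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dest)
    if current:
        bundles.append(current)
    return bundles


class TranslationCache:
    """Toy stand-in for an optimization cache, keyed by block address."""

    def __init__(self):
        self._blocks = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, address, ops):
        """Return cached bundles for a block, optimizing on first use."""
        if address in self._blocks:
            self.hits += 1
        else:
            self.misses += 1
            self._blocks[address] = bundle_in_order(ops)
        return self._blocks[address]
```

Fed three operations that all read the same untouched register, this packer emits a single bundle. Fed a chain of three operations where each reads the previous result, it emits three one-op bundles, which is the kind of dependency-heavy sequence that would keep a wide in-order machine from approaching its peak throughput.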

We should know more soon. Nvidia says Tegra K1 64 devices should be available "later this year" and alludes to its new SoC as an Android L development platform. I can't wait to put one of these things through its paces.
