The warm-up: Memory subsystem performance
Since the TLB erratum workaround is likely to affect the performance of the memory subsystem most directly, we'll start by looking at some synthetic memory benchmark results. Note that performance results for these tests don't translate directly into real-world application performance. These more focused benchmarks are instead designed to stress the memory subsystem, including CPU caches and main memory.
This first test lets us whip out a fancy-looking graph. Bear with us as we show off briefly.
I hesitated to include these results because, well, the graph is really hard to read. The problem? All three of the Phenom CPU configs running at 2.3GHz overlap almost entirely. I do think that's enlightening, though, because it shows us that L2 and L3 cache bandwidth don't appear to be affected by the TLB workaround. We can isolate the larger 1GB block size in this test, though, and see a more tangible difference.
Here, the TLB patch's impact begins to be apparent. It's even more apparent in Sandra's Stream-like test of main memory bandwidth. Have a look.
Ow. Memory bandwidth drops from roughly 5.4 GB/s without the TLB patch to 3.65 GB/s with it.
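To put that drop in proportion, here's a quick back-of-the-envelope calculation using the two approximate figures quoted above. The function name is just an illustrative choice, and since the inputs are rounded, treat the result as a rough figure.

```python
def relative_drop(before, after):
    """Return the fractional decrease going from `before` to `after`."""
    return (before - after) / before


# Approximate Sandra main memory bandwidth figures quoted in the text (GB/s).
baseline_gbps = 5.4   # without the TLB patch
patched_gbps = 3.65   # with the TLB patch

drop = relative_drop(baseline_gbps, patched_gbps)
print(f"Bandwidth drop: {drop:.1%}")  # roughly 32%
```

In other words, the patched system loses about a third of its measured main memory bandwidth in this test.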
Of course, bandwidth is only part of the story. Let's look at memory access latencies, as well.
The TLB patch exacts an access latency penalty of 40 nanoseconds at our sample block size. We can look at the latency picture more fully with a terrifying-but-useful 3D graph.
In these graphs, yellow represents L1 cache, light orange is L2 cache, and dark orange is main memory. What you're seeing here is memory access latencies at various block and step sizes, in a way that exposes latency for the various stages in the memory hierarchy.
These latency graphs demonstrate the same basic principle that the bandwidth graphs did. The TLB erratum workaround exacts its performance penalty by slowing main memory performance. The actual impact on memory performance is much greater than the 10% number we've seen floating around, but the patch's effect on application performance will depend on whether the app's working data set can fit into the CPU's on-chip caches. The more often an application must reach into main memory, the greater the impact of the erratum workaround will be.
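That last point can be sketched with a toy model of average access latency. The 40 ns penalty is the figure measured above at our sample block size. The 5 ns cache latency and 100 ns main memory latency are placeholder assumptions rather than measurements, and the function names are hypothetical. Real applications overlap memory accesses with other work, so this only illustrates the trend and doesn't predict application slowdowns.

```python
def average_latency_ns(miss_fraction, cache_ns, memory_ns):
    """Average access latency when `miss_fraction` of accesses go to main memory."""
    return (1.0 - miss_fraction) * cache_ns + miss_fraction * memory_ns


def patch_slowdown(miss_fraction, cache_ns=5.0, memory_ns=100.0, penalty_ns=40.0):
    """Ratio of patched to unpatched average latency.

    cache_ns and memory_ns are assumed placeholder values; penalty_ns is the
    40 ns main memory latency penalty reported in the article. The penalty is
    applied only to accesses that reach main memory.
    """
    before = average_latency_ns(miss_fraction, cache_ns, memory_ns)
    after = average_latency_ns(miss_fraction, cache_ns, memory_ns + penalty_ns)
    return after / before


for miss in (0.0, 0.01, 0.1, 0.5):
    print(f"{miss:5.0%} of accesses reach main memory: "
          f"{patch_slowdown(miss):.2f}x average latency")
```

Under these assumptions, a workload that lives entirely in cache sees no change, and the penalty grows steadily as a larger share of accesses spills out to main memory.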