The warm-up: Memory subsystem performance
Since the TLB erratum workaround is likely to affect the performance of the memory subsystem most directly, we'll start by looking at some synthetic memory benchmark results. Note that performance results for these tests don't translate directly into real-world application performance. These more focused benchmarks are instead designed to stress the memory subsystem, including CPU caches and main memory.
This first test lets us whip out a fancy-looking graph. Bear with us as we show off briefly.
I hesitated to include these results because, well, the graph is really hard to read. The problem? All three of the Phenom CPU configs running at 2.3GHz overlap almost entirely. I do think that's enlightening, though, because it shows us that L2 and L3 cache bandwidth don't appear to be affected by the TLB workaround. We can isolate the larger 1GB block size in this test, though, and see a more tangible difference.
Here, the TLB patch's impact begins to be apparent. It's even more apparent in Sandra's Stream-like test of main memory bandwidth. Have a look.
Ow. Memory bandwidth drops from roughly 5.4 GB/s without the TLB patch to 3.65 GB/s with it.
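For a sense of scale, those two figures work out to a loss of roughly a third. Here's a quick check in Python that uses only the approximate bandwidth numbers quoted above; the function and variable names are ours, purely for illustration.

```python
def percent_drop(before: float, after: float) -> float:
    """Return the relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Approximate bandwidth figures quoted in the text, in GB/s.
unpatched_gbps = 5.4
patched_gbps = 3.65

drop = percent_drop(unpatched_gbps, patched_gbps)
print(f"Bandwidth drop with the TLB patch: {drop:.1f}%")
```

Since the unpatched figure is only a rough reading, treat the result as a ballpark rather than a precise measurement.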
Of course, bandwidth is only part of the story. Let's look at memory access latencies, as well.
The TLB patch exacts an access latency penalty of 40 nanoseconds at our sample block size. We can look at the latency picture more fully with a terrifying-but-useful 3D graph.
In these graphs, yellow represents L1 cache, light orange is L2 cache, and dark orange is main memory. What you're seeing here is memory access latencies at various block and step sizes, in a way that exposes latency for the various stages in the memory hierarchy.
These latency graphs demonstrate the same basic principle that the bandwidth graphs did. The TLB erratum workaround exacts its performance penalty by slowing main memory performance. The actual impact on memory performance is much greater than the 10% number we've seen floating around, but the patch's effect on application performance will depend on whether the app's working data set can fit into the CPU's on-chip caches. The more often an application must reach into main memory, the greater the impact of the erratum workaround will be.
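To put that last point in rough numbers, here's a back-of-the-envelope sketch in Python. It models average access latency as a weighted mix of cache hits and trips to main memory, and it assumes the patch adds a fixed penalty to every main memory access. The 40 ns penalty comes from our latency test above. The cache latency, the unpatched memory latency, and the miss rates are made-up illustrative values, not measurements, and the function names are ours.

```python
TLB_PENALTY_NS = 40.0  # penalty observed at our sample block size
CACHE_NS = 5.0         # hypothetical on-chip cache latency (assumption)
MEMORY_NS = 100.0      # hypothetical unpatched main memory latency (assumption)


def average_latency_ns(miss_rate: float, cache_ns: float, memory_ns: float) -> float:
    """Average access latency when `miss_rate` of accesses go to main memory."""
    return (1.0 - miss_rate) * cache_ns + miss_rate * memory_ns


def slowdown_percent(miss_rate: float) -> float:
    """Increase in average latency, in percent, once the penalty is applied."""
    base = average_latency_ns(miss_rate, CACHE_NS, MEMORY_NS)
    patched = average_latency_ns(miss_rate, CACHE_NS, MEMORY_NS + TLB_PENALTY_NS)
    return (patched - base) / base * 100.0


# Sweep a few illustrative miss rates, from "fits in cache" to "mostly in memory".
for miss_rate in (0.0, 0.01, 0.1, 0.5):
    print(f"miss rate {miss_rate:>4.0%}: +{slowdown_percent(miss_rate):.1f}% average latency")
```

In this toy model, a workload that never leaves the caches sees no slowdown at all, and the penalty climbs as the miss rate does. Real applications are messier, since latency isn't a fixed number and CPUs overlap memory accesses with other work, but the trend is the one described above.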