The warm-up: Memory subsystem performance
Since the TLB erratum workaround is likely to affect the performance of the memory subsystem most directly, we'll start by looking at some synthetic memory benchmark results. Note that performance results for these tests don't translate directly into real-world application performance. These more focused benchmarks are instead designed to stress the memory subsystem, including CPU caches and main memory.
This first test lets us whip out a fancy-looking graph. Bear with us as we show off briefly.
I hesitated to include these results because, well, the graph is really hard to read. The problem? All three of the Phenom CPU configs running at 2.3GHz overlap almost entirely. I do think that's enlightening, though, because it shows us that L2 and L3 cache bandwidth don't appear to be affected by the TLB workaround. We can isolate the larger 1GB block size in this test, though, and see a more tangible difference.
Here, the TLB patch's impact begins to be apparent. It's even more apparent in Sandra's Stream-like test of main memory bandwidth. Have a look.
Ow. Memory bandwidth drops from roughly 5.4 GB/s without the TLB patch to 3.65 GB/s with it.
Of course, bandwidth is only part of the story. Let's look at memory access latencies, as well.
The TLB patch exacts an access latency penalty of 40 nanoseconds at our sample block size. We can look at the latency picture more fully with a terrifying-but-useful 3D graph.
In these graphs, yellow represents L1 cache, light orange is L2 cache, and dark orange is main memory. What you're seeing here is memory access latencies at various block and step sizes, in a way that exposes latency for the various stages in the memory hierarchy.
These latency graphs demonstrate the same basic principle that the bandwidth graphs did. The TLB erratum workaround exacts its performance penalty by slowing main memory performance. The actual impact on memory performance is much greater than the 10% number we've seen floating around, but the patch's affect on application performance will depend on whether the app's working data set can fit into the CPU's on-chip caches. The more often an application must reach into main memory, the greater the impact of the erratum workaround will be.
|HP Pro x2 612 G2 is a convertible you can upgrade||1|
|PlayStation VR steadily approaches one million units sold||2|
|Panasonic Toughbook CF-33 will crack the floor you drop it on||7|
|Lenovo Yoga 720 and 520 convertibles check all the right boxes||13|
|Huawei P10 phones mash more data together for better pictures||4|
|LG goes long with its upcoming G6 smartphone||23|
|In the lab: Asus' Tinker Board SBC||16|
|Corsair Lighting Node Pro brings light strip control to every PC||8|
|In the lab: HyperX's Alloy FPS mechanical gaming keyboard||10|
|Best part of the article? We're flying home with Ryzen review samples as of this writing.||+46|