just brew it! wrote: That's a tough one, since to verify that a GPU is producing correct output for every frame, you need something that can compute the same output. In other words, a known good GPU. I suppose you could run a GPU-intensive workload, then in the middle of all that feed it a "reference" frame to render, and see if it comes up with the correct result.
But different GPUs (and different driver versions) may produce slightly different results on a pixel-by-pixel basis, so comparing the output frames is problematic. How much of a discrepancy is meaningful?
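One way to handle that kind of cross-GPU variance is to count only pixels that differ from the reference by more than a small tolerance, instead of demanding a bit-exact match. Here's a minimal sketch of that idea; the tolerance value and function name are my own, not anything a real scanner documents:

```python
import numpy as np

def frame_discrepancy(ref, test, tol=2):
    """Count pixels whose worst channel differs from the reference
    by more than `tol` steps (out of 255). `tol=2` is an arbitrary
    example threshold, not a value from any real artifact scanner."""
    diff = np.abs(ref.astype(np.int16) - test.astype(np.int16))
    return int(np.count_nonzero(diff.max(axis=-1) > tol))

# Two "frames": one pixel differs by rounding noise, one is a real glitch
ref = np.full((4, 4, 3), 128, dtype=np.uint8)
test = ref.copy()
test[0, 0, 0] += 1    # within tolerance: ignored
test[1, 1, :] = 200   # well outside tolerance: flagged
print(frame_discrepancy(ref, test))  # -> 1
```

Small driver-to-driver rounding differences pass, while a genuinely wrong pixel still gets flagged; the open question is where to set the threshold.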
I don't see the problem? Just compute the same frame twice on the same GPU: if the raw data isn't bit-identical, then obviously something isn't stable, since a stable GPU will produce the same image data every time. I suppose the artifact scanners could also simply use the CPU to check for out-of-bounds pixel values, as that would be less computationally intensive.
Now I'm curious; I'm not really sure how GPU artifact scanners work. I assumed it was just a checksum comparison between two frames. EVGA's OC Scanner X has about a dozen specific workloads (DX, OpenGL, Compute, Tessellation, Particles, and some others) and offers built-in artifact scanning for roughly half of them:
https://www.evga.com/ocscanner/ This program is pretty old and is an amalgamation of FurMark and other apps. Some of the older tests no longer work, but most of them do.
In my experience, if a single extremely rare artifact shows up even once every couple of minutes, then keeping the same test running lets the corruption keep growing until it becomes visible, and keep growing still until it eventually crashes the program or the GPU driver. So I kill any test the moment the app finds a single artifact, whether it's visible or not.
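That kill-on-first-artifact policy can be sketched as a simple monitoring loop. Everything here is hypothetical scaffolding: `render` stands in for whatever the GPU workload produces each frame, and the checksum check is the same idea as before:

```python
import hashlib

def run_until_artifact(render, reference_hash, max_frames=1000):
    """Render frames until one fails the checksum, then stop immediately.
    Returns the index of the first bad frame, or None for a clean run."""
    for i in range(max_frames):
        frame = render(i)
        if hashlib.sha256(frame).hexdigest() != reference_hash:
            return i  # first artifact found: kill the test here
    return None

# Simulated renderer that silently corrupts frame 5
good = b"\x80" * 1024
def render(i):
    return good if i != 5 else good[:-1] + b"\x81"

ref = hashlib.sha256(good).hexdigest()
print(run_until_artifact(render, ref))  # -> 5
```

Stopping at the first bad frame avoids letting the corruption snowball into a driver crash, which matches the policy above.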