Just after publishing my "Inside the second" article, I jetted off to the Intel Developer Forum last week in order to soak up the latest info on industry happenings. Little did I know that I would end up having a very interesting conversation with the folks at Lucid about one of the more intriguing concepts we'd discussed in that article. Yet that's exactly what happened, and we're now able to relay some news about a "better vsync" implementation with the potential to revolutionize the way graphics solutions deliver consistent and responsive in-game motion.
We met with Lucid's Founder and President, Offir Remez, to discuss Lucid's recent developments, and our excitement grew as Remez introduced a new technology his company has developed, dubbed (perhaps tragically) HyperFormance. With his help, we peeled back the different layers of the onion in order to understand exactly what HyperFormance does and why it matters.
To understand Lucid's new technology, we should start by establishing an understanding of Lucid's existing technologies. The firm's first product was the Hydra GPU load balancing chip, but the GPU virtualization technology it invented for Hydra became much more widely known thanks to a mistake Intel made: disabling the integrated graphics processor in its Sandy Bridge CPUs when a discrete GPU is installed in the system. Sandy Bridge's IGP has a couple of distinctive features, such as hardware-accelerated QuickSync video transcoding and Intel's WiDi display tech, that a power user might wish to enjoy, but most power users will want a discrete GPU for gaming, too. Lucid stepped in and bridged the gap with a software product called Virtu that enables simultaneous access to both QuickSync and the rendering power of a discrete GPU by virtualizing one of the two.
Remez told us Virtu has been a nice success for Lucid, since it's bundled with practically every Z68 motherboard sold and has produced "good feedback" with "happy end users." Remez also revealed Lucid operates a "silent activation server" that tracks user activations, and he described the attach rate for Virtu as "unbelievable." What's more, he said, preliminary data may indicate that almost half of the users are running Virtu in i-mode, where the display is connected to the Sandy Bridge IGP rather than the discrete video card. That is Lucid's preferred configuration, and widespread use of it would show faith in the firm's virtualization tech; i-mode configs virtualize the discrete GPU and use Lucid's abstraction layer for in-game rendering. We'd probably tend to prefer d-mode, since it gives access to GPU-specific driver control panel features and the GPU maker's game-specific tuning. Still, if users are happy with i-mode, it is a nice testament to the compatibility and performance of Lucid's solution.
So far, our story of Lucid's tech may be familiar, but it took a bit of a confounding turn at Computex this past year, when the company unveiled Virtu Universal (which works on more platforms, including AMD's Llano IGP) and a puzzling new feature known as Virtual Vsync. Our report from the show relayed Lucid's pitch for the product, which supposedly allows "frame rates well in excess of a monitor's refresh rate without subjecting users to unsightly tearing."
We were a little bit dubious about the product's merits at the time, so last week, I asked Remez to explain more clearly exactly how it works. Turns out the basics are fairly straightforward. In a dual-GPU Virtu i-mode setup, where the IGP is connected to the display and the discrete GPU is handling game rendering, there's a bit of a tricky question about how synchronization to the display's vertical refresh rate will be achieved. With Virtual Vsync, the IGP communicates with the display and flips to a new frame buffer in between screen painting sessions, which typically happen 60 times per second.
That's also pretty much how standard vsync works. Many users prefer to enable vsync to avoid the tearing artifacts that can be created by flipping to a new frame buffer in the middle of a screen refresh. Most GPUs these days combine vsync with a technique known as triple buffering, in which completed frames are queued up in buffers to ensure a new frame is always ready for display when the time comes. Triple buffering frees the GPU to render ahead and accumulate more frames in the buffers, but if the GPU produces frames at a rate faster than the display's refresh rate, some frames inevitably will be discarded. Superficially, from the perspective of tools and benchmarks, conventional vsync has the effect of capping frame rates at the display's refresh rate.
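To make the frame-rate cap concrete, here is a minimal Python sketch of vsync with a render-ahead queue. It is a simplified model under stated assumptions, not how any particular driver works: every frame takes a fixed `render_ms` to render, the queue holds at most `queue_depth` completed frames, the GPU stalls when the queue is full, and the display consumes the oldest queued frame at each vertical blank. It models the capping behavior only, not frame discarding. The function name and its parameters are hypothetical.

```python
from collections import deque
from fractions import Fraction

def simulate_vsync_queue(render_ms, queue_depth, refreshes, refresh_hz=60):
    """Hypothetical vsync + render-ahead queue model.

    Returns (frames_rendered, frames_displayed, measured_fps).
    """
    interval = Fraction(1000, refresh_hz)   # exact refresh period in ms
    queue = deque()                         # completed frames awaiting display
    finish = Fraction(render_ms)            # when the GPU finishes its current frame
    rendered = displayed = 0
    for k in range(1, refreshes + 1):
        vblank = k * interval
        # The GPU keeps completing frames while the queue has room.
        while finish <= vblank and len(queue) < queue_depth:
            queue.append(finish)
            rendered += 1
            finish += render_ms
        # At the vertical blank, the display flips to the oldest queued frame.
        if queue:
            queue.popleft()
            displayed += 1
        # A GPU stalled on a full queue hands off its finished frame and resumes now.
        if finish <= vblank and len(queue) < queue_depth:
            queue.append(finish)
            rendered += 1
            finish = vblank + render_ms
    seconds = refreshes * interval / 1000
    return rendered, displayed, float(rendered / seconds)

# A GPU capable of 100 FPS (10 ms per frame), two queued frames, one second at 60Hz.
print(simulate_vsync_queue(render_ms=10, queue_depth=2, refreshes=60))
# (62, 60, 62.0)
```

In this model the GPU could render 100 frames per second, but it spends most of its time waiting for a free queue slot, so a frame counter sees only about 60 FPS: the refresh rate plus the couple of frames still sitting in the queue.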
With Virtual Vsync, the GPU is free to render as many frames as possible. The IGP then only displays a subset of those frames, in conjunction with the display refresh, in order to prevent tearing. The elimination of tearing is Virtual Vsync's primary benefit. Still, the fact that the game engine and the discrete GPU are running at a faster rate is apparent to applications, and Lucid has promoted that property of Virtual Vsync. Any utilities like Fraps or benchmarks that measure performance in frames per second will report frame rates higher than 60 FPS, so long as the system's CPU and discrete GPU are up to the task of producing more frames.
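The Virtual Vsync arrangement can be sketched the same way. This hypothetical model assumes a discrete GPU that never waits on the display and completes a frame every `render_ms`, plus an IGP that, at each vertical blank, flips to the newest fully completed frame and skips any older ones. The function name and parameters are illustrative, not Lucid's.

```python
from bisect import bisect_right
from fractions import Fraction

def simulate_virtual_vsync(render_ms, refreshes, refresh_hz=60):
    """Hypothetical Virtual Vsync model.

    Returns (frames_rendered, distinct_frames_shown) over `refreshes` refreshes.
    """
    interval = Fraction(1000, refresh_hz)   # exact refresh period in ms
    end = refreshes * interval
    # Unthrottled discrete GPU: one frame completes every render_ms.
    completions = [i * render_ms for i in range(1, int(end // render_ms) + 1)]
    shown = set()
    for k in range(1, refreshes + 1):
        vblank = k * interval
        # Index of the newest frame completed at or before this vertical blank.
        newest = bisect_right(completions, vblank) - 1
        if newest >= 0:
            shown.add(newest)  # the IGP flips to it; older unshown frames never appear
    return len(completions), len(shown)

rendered, shown = simulate_virtual_vsync(render_ms=10, refreshes=60)
print(f"frame counter: {rendered} FPS, frames reaching the screen: {shown}")
# frame counter: 100 FPS, frames reaching the screen: 60
```

With a slower GPU (say `render_ms=25`, or 40 FPS), some refreshes simply repeat the previous frame, so no more than 40 distinct frames reach the screen in that second.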
That fact alone doesn't seem, you know, terribly useful in the grand scheme of things. One could always simply disable vsync when running benchmarks, as is our standard practice. Higher FPS counts by themselves have little meaning.
However, in the context of our newfound focus on frame latencies as a better measure of in-game performance, a virtualized version of vsync raises some tantalizing possibilities, if it were to have some smarter logic built into it. In fact, in our conversation with him about multi-GPU micro-stuttering, AMD's David Nalasco mentioned a "smarter" version of vsync as a potential means of reducing jitter or other forms of uneven frame delivery.
Turns out at least one smart engineer at Lucid has been thinking along the same lines. In fact, Lucid has built just such a technology, and that's where the unfortunate name "HyperFormance" comes into the picture. HyperFormance is an evolved version of Virtual Vsync with additional intelligence that attempts to ensure optimal delivery of frames to the user. According to Remez, the HyperFormance algorithm has many inputs, including the display timing, CPU and GPU utilization, and what's happening with user input devices. With all of that info at hand, the algorithm attempts to make sure frames are delivered in a timely fashion just as the display is ready to be refreshed.
One of the keys to making HyperFormance work is the fact that, in many cases, frames will have to be dropped or, without vsync, only partially displayed. After all, a GPU spitting out 100 FPS will overwhelm a display scanning at 60Hz, so something has to give. Lucid's algorithm attempts to predict which frames are likely candidates not to be displayed (or to be displayed only very partially, contributing to tearing). Those frames are considered "redundant rendering tasks" and are subject to special treatment. Lucid can't simply discard or refuse to handle those frames; doing so would cause problems. Instead, some work is still performed on redundant frames, such as texture loads, which are needed to keep caches warm and to preserve memory locality. Once those basics are performed, though, other work like pixel shader computations can be skipped, because that frame will never be shown to the user. Instead, the GPU finishes its work quickly, the game engine churns out another frame, and Lucid again estimates whether this next frame is a good candidate for display. By tightening that loop between the game engine, DirectX, and the rendering subsystem, the HyperFormance algorithm ensures that any frames fully rendered and sent to the display are very timely.
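Lucid hasn't published the HyperFormance algorithm, so what follows is only a toy illustration of the redundant-frame idea, not Lucid's method. It assumes perfect prediction and fixed costs: a fully rendered frame takes `full_ms`, a redundant frame that keeps only the basic work (texture loads and the like) takes `cheap_ms`, and a frame is marked redundant whenever, after its cheap work, a subsequent full frame could still finish before the vertical blank this frame would otherwise hit. The function `plan_frames` and its parameters are hypothetical.

```python
import math
from fractions import Fraction

def plan_frames(full_ms, cheap_ms, refreshes, refresh_hz=60):
    """Toy scheduler (hypothetical): returns a list of (start_ms, end_ms, kind)."""
    interval = Fraction(1000, refresh_hz)   # exact refresh period in ms
    end = refreshes * interval
    t = Fraction(0)
    plan = []
    while t + full_ms <= end:
        # The vertical blank at which this frame would be shown if fully rendered.
        target = math.ceil((t + full_ms) / interval) * interval
        if t + cheap_ms + full_ms <= target:
            # A later frame can still make that same refresh, so this one would be
            # superseded: do only the basic work and let the engine move on.
            plan.append((t, t + cheap_ms, "redundant"))
            t += cheap_ms
        else:
            # This is the last frame that can make the refresh: render it fully.
            plan.append((t, t + full_ms, "full"))
            t += full_ms
    return plan

plan = plan_frames(full_ms=10, cheap_ms=2, refreshes=60)
full = [(s, e) for s, e, kind in plan if kind == "full"]
print(len(full), "full frames,", len(plan) - len(full), "redundant frames")
```

As long as `cheap_ms < full_ms < 1000 / refresh_hz`, this toy scheduler fully renders exactly one frame per refresh, and each one finishes no more than `cheap_ms` before its vertical blank, which is the "just in time" property the article describes. A real implementation would have to estimate those costs on the fly from the inputs Remez listed.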
If it works well, HyperFormance ought to have several related benefits. Evened-out frame delivery in sync with the display should provide a smooth sense of motion, of course. That motion should be improved by the fact that the frames displayed would contain timely information, i.e., the latest updates to the underlying geometry in the game world. Furthermore, the immediacy of those frames should reduce the lag between user inputs and display updates. Remez contrasted the inherent delays in conventional triple buffering, where queuing up three frames at 60 FPS would add up to 48 milliseconds of lag, with Lucid's scheme, where frames are delivered more or less "just in time" for the display refresh.
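Remez's triple-buffering figure comes from simple arithmetic: a frame entering the back of a full three-deep queue waits roughly three refresh intervals before it reaches the screen, and at 60Hz one interval is 1000/60, or about 16.7 ms. That works out to roughly 50 ms, in line with the 48 ms he quoted. The "just in time" number below is a hypothetical 2 ms of slack chosen for illustration, not a figure Lucid has given.

```python
def queue_delay_ms(frames_queued, refresh_hz=60):
    """Approximate wait for a frame entering the back of a full display queue."""
    return frames_queued * 1000.0 / refresh_hz

triple_buffered = queue_delay_ms(3)   # 50.0 ms at 60Hz
just_in_time = 2.0                    # hypothetical slack between frame completion and refresh
print(f"queued: {triple_buffered:.1f} ms, just-in-time: {just_in_time:.1f} ms")
```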
Make no mistake: if HyperFormance works as advertised, then it should be far from a gimmicky software feature; it could bring a true and tangible improvement in perceived gaming performance, a cut above anything currently offered by AMD or Nvidia. Remez told us Lucid has a host of patents in the works related to this technology, as one might expect.