Just after publishing my “Inside the second” article, I jetted off to the Intel Developer Forum last week to soak up the latest info on industry happenings. Little did I know that I would end up having a very interesting conversation with the folks at Lucid about one of the more intriguing concepts we’d discussed in that article. Yet that’s exactly what happened, and we’re now able to relay some news about a “better vsync” implementation with the potential to revolutionize the way graphics solutions deliver consistent and responsive in-game motion.
We met with Lucid’s Founder and President, Offir Remez, to discuss Lucid’s recent developments, and our excitement grew as Remez introduced a new technology his company has developed, dubbed (perhaps tragically) HyperFormance. With his help, we peeled back the different layers of the onion in order to understand exactly what HyperFormance does and why it matters.
To understand Lucid’s new technology, we should start by establishing an understanding of Lucid’s existing technologies. The firm’s first product was the Hydra GPU load balancing chip, but the GPU virtualization technology it invented for Hydra became much more widely known thanks to a mistake Intel made: disabling the integrated graphics processor in its Sandy Bridge CPUs when a discrete GPU is installed in the system. Sandy Bridge’s IGP has a couple of distinctive features, such as hardware-accelerated QuickSync video transcoding and Intel’s WiDi display tech, that a power user might wish to enjoy, but most power users will want a discrete GPU for gaming, too. Lucid stepped in and bridged the gap with a software product called Virtu that enables simultaneous access to both QuickSync and the rendering power of a discrete GPU by virtualizing one of the two.
Remez told us Virtu has been a nice success for Lucid, since it’s bundled with practically every Z68 motherboard sold and has produced “good feedback” with “happy end users.” Remez also revealed Lucid operates a “silent activation server” that tracks user activations, and he described the attach rate for Virtu as “unbelievable.” What’s more, he said, preliminary data may indicate that almost half of the users are running Virtu in i-mode, where the display is connected to the Sandy Bridge IGP rather than the discrete video card. That is Lucid’s preferred configuration, and widespread use of it would show faith in the firm’s virtualization tech; i-mode configs virtualize the discrete GPU and use Lucid’s abstraction layer for in-game rendering. We’d probably tend to prefer d-mode, since it gives access to GPU-specific driver control panel features and the GPU maker’s game-specific tuning. Still, if users are happy with i-mode, it is a nice testament to the compatibility and performance of Lucid’s solution.
So far, our story of Lucid’s tech may be familiar, but it took a bit of a confounding turn at Computex this past year, when the company unveiled Virtu Universal (which works on more platforms, including AMD’s Llano IGP) and a puzzling new feature known as Virtual Vsync. Our report from the show relayed Lucid’s pitch for the product, which supposedly allows “frame rates well in excess of a monitor’s refresh rate without subjecting users to unsightly tearing.”
We were a little bit dubious about the product’s merits at the time, so last week, I asked Remez to explain more clearly exactly how it works. Turns out the basics are fairly straightforward. In a dual-GPU Virtu i-mode setup, where the IGP is connected to the display and the discrete GPU is handling game rendering, there’s a bit of a tricky question about how synchronization to the display’s vertical refresh rate will be achieved. With Virtual Vsync, the IGP communicates with the display and flips to a new frame buffer in between screen painting sessions, which typically happen 60 times per second.
That’s also pretty much how standard vsync works. Many users prefer to enable vsync to avoid the tearing artifacts that can be created by flipping to a new frame buffer in the middle of a screen refresh. Most GPUs these days combine vsync with a technique known as triple buffering, in which completed frames are queued up in buffers, in order to ensure a new frame is always ready for display when the time comes. Triple buffering frees the GPU to render ahead and accumulate more frames in the buffers, but if the GPU produces frames at a rate faster than the display’s refresh rate, some frames inevitably will be discarded. Superficially, from the perspective of tools and benchmarks, conventional vsync has the effect of capping frame rates at the display’s refresh rate.
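To make that frame-rate cap concrete, here is a toy Python model of vsync with a first-in, first-out render-ahead queue, one common way of arranging triple buffering. It’s a simplified sketch of our own, not how any particular driver works: the GPU is assumed to need a fixed `render_ms` per frame and to stall once both back buffers hold finished frames, while the display flips to the oldest queued frame at each 60Hz refresh. The function name and parameters are hypothetical.

```python
REFRESH_HZ = 60.0

def simulate_queued_vsync(render_ms, seconds=1.0, back_buffers=2, refresh_hz=REFRESH_HZ):
    """Toy model (hypothetical) of vsync plus a FIFO render-ahead queue.

    Returns (frames_rendered, frames_displayed) over `seconds` of wall time.
    """
    queued = 0          # finished frames waiting for a flip
    gpu_time = 0.0      # time (ms) at which the GPU finished its last frame
    rendered = displayed = 0
    for i in range(1, round(seconds * refresh_hz) + 1):
        vblank = i * 1000.0 / refresh_hz
        # Render until the next refresh or until every back buffer is full.
        while queued < back_buffers and gpu_time + render_ms <= vblank:
            gpu_time += render_ms
            queued += 1
            rendered += 1
        stalled = queued == back_buffers
        # At the refresh, flip the oldest queued frame onto the screen.
        if queued:
            queued -= 1
            displayed += 1
        # A GPU stalled on a full queue can only resume once a buffer frees up.
        if stalled:
            gpu_time = max(gpu_time, vblank)
    return rendered, displayed

print(simulate_queued_vsync(3.0))    # fast GPU, about 333 FPS unthrottled -> (61, 60)
print(simulate_queued_vsync(25.0))   # slower GPU, 40 FPS -> (40, 40)
```

With a GPU that could manage about 333 FPS on its own, the model renders only 61 frames in one second: two to fill the queue at the start, then one per refresh. A benchmark timing those frames would report roughly 60 FPS. In the other arrangement, where the GPU keeps rendering and older finished frames are thrown away, the GPU isn’t stalled, but the display still shows no more than 60 distinct frames per second.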
With Virtual Vsync, the GPU is free to render as many frames as possible. The IGP then only displays a subset of those frames, in conjunction with the display refresh, in order to prevent tearing. The elimination of tearing is Virtual Vsync’s primary benefit. Still, the fact that the game engine and the discrete GPU are running at a faster rate is apparent to applications, and Lucid has promoted that property of Virtual Vsync. Any utilities like Fraps or benchmarks that measure performance in frames per second will report frame rates higher than 60 FPS, so long as the system’s CPU and discrete GPU are up to the task of producing more frames.
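A similar toy model shows why a frame counter reads above 60 FPS under Virtual Vsync. As a simplifying assumption, the discrete GPU here renders back to back with no throttling, and at each refresh the display side flips to the newest completed frame; anything older that was never shown is simply dropped. This is our sketch of the behavior Lucid described, not Lucid’s code, and the function name is hypothetical.

```python
import bisect

def simulate_virtual_vsync(render_ms, seconds=1.0, refresh_hz=60.0):
    """Toy model (hypothetical): unthrottled rendering, newest frame shown per refresh.

    Returns (frames_rendered, frames_displayed) over `seconds` of wall time.
    """
    end_ms = seconds * 1000.0
    # Completion time of every frame the GPU finishes; a Fraps-style counter sees all of them.
    count = int(end_ms // render_ms)
    completions = [(k + 1) * render_ms for k in range(count)]
    last_shown = -1
    displayed = 0
    for i in range(1, round(seconds * refresh_hz) + 1):
        vblank = i * 1000.0 / refresh_hz
        # Index of the newest frame finished by this refresh (-1 if none yet).
        newest = bisect.bisect_right(completions, vblank) - 1
        if newest > last_shown:          # flip only when something new is ready
            last_shown = newest
            displayed += 1
    return len(completions), displayed

rendered, displayed = simulate_virtual_vsync(3.0)
print(rendered, displayed, rendered - displayed)   # 333 60 273
```

At roughly 333 FPS, the model renders 333 frames in one second but puts only 60 of them on screen, so 273 are dropped. The display never tears, yet any tool that counts frames as they’re rendered reports the higher figure.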
That fact alone doesn’t seem, you know, terribly useful in the grand scheme of things. One could always simply disable vsync when running benchmarks, as is our standard practice. Higher FPS counts by themselves have little meaning.
However, in the context of our newfound focus on frame latencies as a better measure of in-game performance, a virtualized version of vsync raises some tantalizing possibilities, if it were to have some smarter logic built into it. In fact, in our conversation with him about multi-GPU micro-stuttering, AMD’s David Nalasco mentioned a “smarter” version of vsync as a potential means of reducing jitter or other forms of uneven frame delivery.
Turns out at least one smart engineer at Lucid has been thinking along the same lines. In fact, Lucid has built just such a technology, and that’s where the unfortunate name “HyperFormance” comes into the picture. HyperFormance is an evolved version of Virtual Vsync with additional intelligence that attempts to ensure optimal delivery of frames to the user. According to Remez, the HyperFormance algorithm has many inputs, including the display timing, CPU and GPU utilization, and what’s happening with user input devices. With all of that info at hand, the algorithm attempts to make sure frames are delivered in a timely fashion just as the display is ready to be refreshed.
One of the keys to making HyperFormance work is the fact that, in many cases, frames will have to be dropped or, without vsync, only partially displayed. After all, a GPU spitting out 100 FPS will overwhelm a display scanning at 60Hz, so something has to give. Lucid’s algorithm attempts to predict which frames are likely candidates not to be displayed, or to be displayed only very partially, contributing to tearing. Those frames are considered “redundant rendering tasks” and are subject to special treatment. Lucid can’t simply discard or refuse to handle those frames; doing so would cause problems. Instead, some work is still performed on redundant frames, such as texture loads, which are needed to keep caches warm and to preserve memory locality. Once those basics are performed, though, other work like pixel shader computations can be skipped, because that frame will never be shown to the user. The GPU thus finishes its work quickly, the game engine churns out another frame, and Lucid again estimates whether this next frame is a good candidate for display. By tightening that loop between the game engine, DirectX, and the rendering subsystem, the HyperFormance algorithm aims to ensure that any frames fully rendered and sent to the display are very timely.
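Lucid didn’t share the details of its prediction logic, so what follows is only our guess at the general shape of the idea, written as a Python sketch with made-up numbers. It assumes the scheduler knows how long a full render takes (`full_ms`) and how long the stripped-down treatment of a redundant frame takes (`light_ms`, standing in for work like texture loads). A frame is marked redundant whenever there would still be time, after its cheap pass, to fully render a fresher frame before the next refresh. All names here are hypothetical.

```python
VBLANK_MS = 1000.0 / 60.0   # next 60Hz refresh, measured from time 0

def plan_refresh_interval(full_ms, light_ms, start_ms=0.0, next_vblank_ms=VBLANK_MS):
    """Hypothetical scheduler: decide which frames in one refresh interval get full rendering.

    Returns (decisions, finish_ms), where finish_ms is when the fully
    rendered frame completes.
    """
    t = start_ms
    decisions = []
    # Keep issuing cheap "redundant" passes while a full render would still fit afterward.
    while t + light_ms + full_ms <= next_vblank_ms:
        decisions.append("redundant")   # texture loads etc.; pixel shading skipped
        t += light_ms
    decisions.append("full")            # this is the frame meant for the display
    t += full_ms
    return decisions, t

decisions, finish = plan_refresh_interval(full_ms=3.0, light_ms=1.5)
print(decisions.count("redundant"), decisions[-1], finish)   # 9 full 16.5
```

With those invented figures, the model issues nine cheap passes and one full render that completes at 16.5 ms, just ahead of the refresh at about 16.7 ms. Each of those ten frames still passes through the game’s normal frame loop, which is one plausible reason a Fraps-style counter would read far higher than the display’s 60Hz.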
If it works well, HyperFormance ought to have several related benefits. Evened-out frame delivery in sync with the display should provide a smooth sense of motion, of course. That motion should be improved by the fact that the frames displayed would contain timely information, i.e., the latest updates to the underlying geometry in the game world. Furthermore, the immediacy of those frames should reduce the lag between user inputs and display updates. Remez contrasted the inherent delays in conventional triple buffering, where queuing up three frames at 60 FPS would add up to roughly 50 milliseconds of lag (three refresh intervals of about 16.7 ms each), with Lucid’s scheme, where frames are delivered more or less “just in time” for the display refresh.
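The lag arithmetic is easy to check. The Python sketch below uses a simplified model of our own: a frame stuck behind a full queue waits one refresh interval per queued frame, while a just-in-time frame waits only for its own render time plus whatever slack remains before the refresh. The function names and the 3 ms render time and 0.5 ms slack are illustrative assumptions, not figures from Lucid.

```python
REFRESH_MS = 1000.0 / 60.0   # about 16.7 ms per refresh at 60Hz

def queued_lag_ms(frames_queued, refresh_ms=REFRESH_MS):
    """Delay for a frame that must wait out `frames_queued` refreshes (simplified)."""
    return frames_queued * refresh_ms

def just_in_time_lag_ms(render_ms, slack_ms):
    """Delay from sampling input at render start to display, when the frame
    finishes `slack_ms` before the refresh (simplified)."""
    return render_ms + slack_ms

print(round(queued_lag_ms(3), 1))                # three queued frames at 60Hz -> 50.0
print(round(just_in_time_lag_ms(3.0, 0.5), 1))   # hypothetical 3 ms render -> 3.5
```

Real pipelines add other delays (input polling, game simulation, scan-out), so both numbers understate end-to-end lag; the point is the gap between waiting for a queue and waiting for a single timely frame.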
Make no mistake: if HyperFormance works as advertised, then it should be far from a gimmicky software feature; it could bring a true and tangible improvement in perceived gaming performance, a cut above anything currently offered by AMD or Nvidia. Remez told us Lucid has a host of patents in the works related to this technology, as one might expect.
HyperFormance in motion
We got a quick demo of HyperFormance in action on a gaming rig with a Core i5-2300 and a GeForce GTX 580 graphics card. The system was running Virtu in i-mode, with the display connected to the Sandy Bridge IGP’s video output. Lucid chose Modern Warfare 2 for this demo. Because it’s not especially CPU- or GPU-limited, MW2 presents a particular sort of challenge to the display subsystem. In the gun range area at the beginning of the game, frame rates in the on-screen Fraps readout averaged around 330 FPS, way faster than the display can handle. Without vsync enabled, one could see loads of tearing in each screen refresh, with multiple transition lines an inch or two apart onscreen. When Remez turned on HyperFormance, the tearing was banished, and yet, if anything, the on-screen motion appeared to be faster and smoother than before. I didn’t get to play a multiplayer match or spend enough time with the system to comment strongly on its responsiveness to user inputs, but the HyperFormance-enabled config did seem to be very quick, too.
Jarringly, the Fraps frame rate counter shot up to over 600 FPS with HyperFormance turned on (a consequence of the Lucid software choosing, behind the scenes, to render some frames only partially). Remez pointed out the Fraps count wasn’t really a correct number, but he asserted that the higher FPS reading was an indication of responsiveness. I can see where he’s coming from there, because the benefits of a technology like this one can be difficult to convey. Still, after all of our recent work in this area, I’m developing an allergy to gaming performance results expressed in frames per second, especially in cases like this one. The key to HyperFormance is delivering the right frame at the right time, not an increased frame rate.
If you’re salivating at the prospect of using this technology yourself, remember HyperFormance requires a Virtu setup with both an IGP and a discrete GPU. Although Z68 motherboards are pretty popular, many gamers don’t have an IGP in their systems. Also, most laptops have only an IGP and nothing else.
When we expressed our consternation about the IGP + GPU hardware requirement to Remez, he quickly steered us to another demo of an early, in-development software product Lucid calls Virtu XLR8. This program is essentially a stand-alone version of HyperFormance capable of running on a single GPU. Remez said using only one GPU for the whole enchilada adds some overhead (though he didn’t use the word “enchilada,” sadly), but Lucid is in the early stages of building a solution anyhow.
The Virtu XLR8 demo was running on a laptop with a Sandy Bridge processor and an Intel HD 3000 IGP, again in Modern Warfare 2. Without XLR8, the Fraps frame counter hovered around 30 FPS and, since vsync wasn’t enabled, we saw ample visible tearing even at this low frame rate. Worse, the entire setup was slow enough that the game’s animations didn’t appear fluid, a song heard many times in relation to integrated graphics solutions.
When Remez turned on XLR8 and restarted the game, the whole experience was unexpectedly transformed. Yes, the tearing was suppressed and the Fraps FPS counter shot up, as we’d seen with HyperFormance before. More shocking, though, was how much smoother the entire game seemed to run. When you only have 30 or so frames your GPU can deliver in a second, it apparently pays to deliver those frames at the correct intervals. The Sandy Bridge IGP went from offering a rather poor gaming experience to a borderline acceptable one, thanks to Lucid’s algorithm. Both the visual quality of the game and its fluidity were clearly improved.
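Why would evenly spaced frames matter so much at the same average frame rate? The Python sketch below compares two made-up one-second traces of 30 frames each: one paced evenly, the other alternating between one and three 60Hz refresh intervals. Both average 30 FPS, but a 99th-percentile frame time, in the spirit of the frame-latency measurements we’ve been using, tells them apart. The traces are illustrative, not measurements from Lucid’s demo.

```python
import math

def average_fps(frame_times_ms):
    """Frames per second implied by a list of frame-to-frame intervals."""
    return len(frame_times_ms) * 1000.0 / sum(frame_times_ms)

def percentile_ms(frame_times_ms, pct=99.0):
    """Nearest-rank percentile of frame times."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

even = [1000.0 / 30] * 30                 # a new frame every ~33.3 ms
uneven = [1000.0 / 60, 1000.0 / 20] * 15  # alternating ~16.7 ms and 50 ms gaps

for name, trace in (("even", even), ("uneven", uneven)):
    print(name, round(average_fps(trace), 1), round(percentile_ms(trace), 1))
```

Both traces work out to 30 FPS, but the uneven one has a 99th-percentile frame time of 50 ms versus about 33 ms for the even one, and it’s those long gaps that make motion look choppy.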
We want to spend more time with HyperFormance and a later, finished version of Virtu XLR8, but we’re compelled by the concept and by the two quick demonstrations of the tech we’ve seen. The next question many of us will be asking is how we can get our hands on the software. The first place HyperFormance will be available is in Virtu Universal MVP, Lucid’s new top-end product in its Virtu lineup.
Lucid issued a press release at IDF announcing, “Virtu Universal MVP is available to system platform manufacturers using Intel Sandy Bridge Z68/H67/H61, and other Intel integrated graphics, as well as many AMD processor-based PCs and notebooks.” We’re not yet aware of any motherboard or system makers who have licensed Virtu Universal MVP to ship with their products, but we’d expect adoption to take a little bit of time. For most of us, the larger issue is the fact Virtu is sold exclusively as a bundled software package, not as a stand-alone product end users can purchase for themselves.
Fortunately, Remez told us Lucid is considering other avenues for its software sales, given the potential appeal of the HyperFormance algorithm and especially of Virtu XLR8. He confided that his hope was to see XLR8 technology licensed by Intel and built right into its IGP graphics drivers. Such an arrangement would be quite the coup for, say, the gaming responsiveness of Ivy Bridge graphics. Intel Capital helped to fund Lucid, and that relationship was obviously instrumental when Intel pushed Virtu as a solution to the Z68 QuickSync dilemma. Still, the integration of XLR8 into Intel’s graphics drivers was just Remez’s ambition when we spoke with him, nothing like a done deal. Lucid’s other obvious option is to sell its Virtu products directly to end users, including PC gamers looking for an improved experience. As far as we know, nothing has been decided on that front yet, but Lucid is eyeing the possibility carefully.
We’d very much like to see a version of XLR8 that runs on a single, discrete GPU or perhaps even a mismatched team of discrete GPUs, if the overhead of doing it all on one chip is too great. Although we ran out of time before we could discuss this issue with Remez, we’re also curious about whether a HyperFormance-type algorithm could be the answer to the multi-GPU micro-stuttering problems we’ve recently encountered. Of course, Lucid has a different solution, in its Hydra chip, for multi-GPU load balancing. Still, we can’t help but wonder if a software-only option for multi-GPU load balancing, with HyperFormance smarts built in, might be feasible.
Regardless, Virtu Universal MVP is already a real product, and the question now is whether key decision-makers in the industry (and end users) will understand the benefits of a technology like HyperFormance. If they do, and if Lucid’s software can deliver on its theoretical promise, then we may have a minor revolution in the way we think about GPU and gaming performance on our hands.