With the megahertz era effectively over and processor makers adding cores rather than cranking up clock speeds, game developers looking to exploit the capabilities of current hardware are faced with a daunting challenge: “one of the most important issues to be solving as a game developer right now,” according to Valve Software’s Gabe Newell. Valve has invested significant resources into optimizing its Source engine for multi-core systems, and doing so has opened up a whole new world of possibilities for its game designers.
You won’t have to wait for Half-Life 3 to enjoy the benefits of Valve’s multi-core efforts, though. Multi-core optimizations for Source will be included in the next engine update, which is due to become available via Steam before Half-Life 2: Episode Two is released. Read on to see how Valve has implemented multithreading in its Source engine and developer tools, and how they perform on the latest dual- and quad-core processors from AMD and Intel.
Multiple approaches to multi-core
Unlike some types of applications, games strive for 100% CPU utilization to give players the best experience their hardware can provide. That’s easy enough with a single processor core, but more challenging when the number of cores is multiplied by two, and especially by four. Multithreading is needed to take advantage of extra processor cores, and Valve explored several approaches before settling on a strategy for the Source engine.
Perhaps the most obvious way to take advantage of multiple cores is to distribute in-game systems, such as physics, artificial intelligence, sound, and rendering, among available processors. This coarse threading approach plays well with existing game code, which is generally single-threaded, because it essentially just involves using multiple single threads.
Game code tends to be single-threaded because games are inherently serial applications: each in-game system depends on the output of other systems. Those dependencies create problems for coarse threading, though, because games tend to become bound by the slowest system. It may be possible to spread multiple systems across a number of processor cores, but performance often doesn’t scale in a linear fashion.
Valve initially experimented with coarse threading by splitting the Source engine’s client and server systems between a pair of processor cores. Client-side systems included the user interface, graphics simulation, and rendering, while server systems handled AI, physics, and game logic. Unfortunately, this approach didn’t yield anywhere close to a linear increase in performance. Valve found that its games spend 80% of their time rendering and only 20% simulating, resulting in an imbalance in the CPU utilization of each core. With standard single-player maps, coarse threading was only able to improve performance by about 20%. Doubling performance was possible, but only by using contrived maps designed to inflate physics and AI loads artificially.
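The 80/20 split goes a long way toward explaining that ceiling. As a back-of-the-envelope model (our simplification, not Valve's measurement), assume the client and server each get their own core and a frame can't finish until the slower of the two is done. The best-case speedup is then the total work divided by the largest share. The function name below is hypothetical, and the model ignores synchronization overhead entirely.

```python
def coarse_speedup(shares):
    """Best-case speedup when each system runs on its own core.

    `shares` holds the fraction of single-core frame time spent in each
    system. The frame takes as long as the slowest system, so the
    speedup is total work divided by the largest share.
    """
    total = sum(shares)
    return total / max(shares)


# Reported split: 80% rendering (client), 20% simulation (server).
print(round(coarse_speedup([0.8, 0.2]), 2))  # 1.25

# Only a perfectly balanced load, like the contrived maps, doubles performance.
print(coarse_speedup([0.5, 0.5]))  # 2.0
```

An idealized ceiling of 25% sits close to the roughly 20% gain Valve reported on standard single-player maps.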
In addition to failing to scale well, coarse threading also introduced an element of latency. Valve had to enable the networking component of the engine to keep the client and server systems synchronized, even with the single-player game. Looking forward, Valve also realized that coarse threading runs into problems when the number of cores exceeds the number of in-game systems. There are more than enough in-game systems to go around for today’s dual- and quad-core processors, of course, but with Intel’s 80-core “terascale” research processor hinting at things to come, coarse threading appears to have little long-term potential.
As an alternative to (and indeed the opposite of) coarse threading, Valve turned its attention to fine-grained threading. This approach breaks down problems into small, identical tasks that can be spread over multiple cores, making it considerably more complex than coarse threading. Operations executed in parallel must be completely orthogonal, and scaling gets tricky if the computational cost of each operation is variable.
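To see why variable cost makes scaling tricky, consider a toy model (an illustration of ours, with hypothetical function names and made-up task costs). It compares two ways of handing out the same tasks: dealing them out in equal-sized blocks up front, or letting whichever core frees up first take the next task.

```python
def static_makespan(costs, cores):
    """Finish time when tasks are dealt out in equal-sized contiguous blocks."""
    size = -(-len(costs) // cores)  # ceiling division
    blocks = [costs[i:i + size] for i in range(0, len(costs), size)]
    return max(sum(block) for block in blocks)


def dynamic_makespan(costs, cores):
    """Finish time when the next task always goes to the earliest-free core."""
    busy_until = [0] * cores
    for cost in costs:
        i = busy_until.index(min(busy_until))
        busy_until[i] += cost
    return max(busy_until)


uniform = [1] * 8
skewed = [5, 5, 1, 1, 1, 1, 1, 1]  # same task count, uneven cost

print(static_makespan(uniform, 4), dynamic_makespan(uniform, 4))  # 2 2
print(static_makespan(skewed, 4), dynamic_makespan(skewed, 4))    # 10 5
```

With uniform costs both schemes finish in the same time. With skewed costs the fixed split leaves three cores idle while one grinds through both expensive tasks, and even the dynamic scheme falls short of the ideal 16 / 4 = 4 time units.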
Interestingly, Valve has already implemented fine-grained threading in a couple of its in-house development tools. Valve uses proprietary VVIS and VRAD applications to distribute the calculation of visibility and lighting for game levels across all the systems in its Bellevue headquarters. These apps have long taken advantage of distributed computing, much like Folding@Home, but are also well suited to fine-grained threading. Valve has seen close to linear scaling after adapting the apps to take advantage of multiple cores, and has even delayed upgrading systems in its offices until it can order quad-core CPUs.
The Prius method of game programming
Fine-grained threading may work well when it comes to visibility and lighting calculations for game levels, but Valve decided that it wasn’t the right approach for multithreading in the Source engine, in part because fine-grained threading tends to be bound by available memory bandwidth. Instead, Valve chose to implement something it calls hybrid threading, which takes an “appropriate tool for the job” approach. With hybrid threading, Valve created a framework that allows multiple threading models depending on what’s appropriate for the task at hand. In-game systems can be sent to individual cores with coarse threading, and calculations that lend themselves to parallel processing can be spread over multiple cores using fine-grained threading. Work can even be queued for processing by idle cores if the results aren’t needed right away.
Of course, Valve didn’t want its game programmers to have to become threading experts just to take advantage of hybrid threading. Game programmers should be solving game problems rather than threading problems, so a work management system was designed to address gaming problems in a way that’s intuitive for game programmers. This system supports all the elements of hybrid threading and focuses on keeping multiple cores as busy as possible.
Valve’s work management system features a main thread that uses a pool of N-1 worker threads where N is the number of processor cores available. Of course, multiple threads create problems for data sharing if parallel threads want to read and write the same data. Locks are traditionally used to prevent corruption when a thread tries to read data that’s currently being written or modified. However, locks force the read thread to wait, leading to idle CPU cycles that clash with Valve’s desire to keep all cores occupied at all times.
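The N-1 arrangement can be sketched with a standard thread pool. This is a minimal illustration, not Valve's work management system: `make_worker_pool` and `square` are hypothetical names, and the pool counts logical cores as reported by the operating system.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def make_worker_pool():
    """Create a pool of N-1 workers, where N is the number of logical cores.

    The main thread accounts for the remaining core. At least one worker
    is created so the sketch still runs on a single-core machine.
    """
    cores = os.cpu_count() or 1
    workers = max(1, cores - 1)
    return ThreadPoolExecutor(max_workers=workers), workers


def square(x):
    return x * x


pool, worker_count = make_worker_pool()
with pool:
    # The main thread hands out work and gathers the results in order.
    results = list(pool.map(square, range(8)))

print(worker_count, results)
```

Note that CPython's global interpreter lock keeps pure-Python work from running truly in parallel, so this shows the structure of the pool rather than the speedup a native engine would see.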
In an attempt to avoid core idling due to thread locking, Valve made extensive use of “lock-free” algorithms. These algorithms allow threads to progress regardless of the state of other threads, and have been put under the hood of all of Valve’s developer tools.
To illustrate the application of its new programming framework, Valve explained how it handles multithreaded access to the spatial partition, a data structure that represents every object in the world. The spatial partition is used any time something dynamic happens in the world, from movement to shooting. Obviously, you want to allow multiple threads to access the partition, but that becomes tricky if multiple write threads try to access it at the same time. Through profiling, Valve discovered that 95% of the threads that wanted to access the spatial partition were just reading, while only 5% were writing. Valve now allows multiple threads to read the partition at the same time, but only one thread can access it to write.
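The many-readers, one-writer policy described above is commonly implemented as a read-write lock. Valve did not describe its implementation, so the sketch below is an assumption on our part: it uses an ordinary condition variable rather than a lock-free scheme, and the `ReadWriteLock` class, the `partition` dictionary, and the `move` and `lookup` helpers are all hypothetical.

```python
import threading


class ReadWriteLock:
    """Allow many concurrent readers, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:          # readers only wait for a writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:        # last reader out wakes any writer
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()


partition = {"crate": (0, 0)}   # stand-in for the spatial partition
lock = ReadWriteLock()


def move(name, pos):
    lock.acquire_write()
    try:
        partition[name] = pos
    finally:
        lock.release_write()


def lookup(name):
    lock.acquire_read()
    try:
        return partition.get(name)
    finally:
        lock.release_read()


threads = [threading.Thread(target=move, args=("crate", (i, i))) for i in range(1, 4)]
threads += [threading.Thread(target=lookup, args=("crate",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(lookup("crate"))  # whichever write happened to land last
```

A lock this simple can starve writers if readers keep arriving, which matters less when writes are as rare as the 5% Valve measured.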
Valve was also able to apply multithreading to the Source engine’s renderer. Game engines must perform numerous tasks before even issuing draw calls, including building world and object lists, performing graphical simulations, updating animations, and computing shadows. These tasks are all CPU-bound, and must be calculated for every “view”, be it the player camera, surface reflections, or in-game security camera monitors. With hybrid threading, Valve is able to construct world and object lists for multiple views in parallel. Graphics simulations can be overlapped, and things like shadows and bone transformations for all characters in all views can be processed across multiple cores. Multiple draw threads can even be executed in parallel, and Valve has rewritten the graphics library that sits between its engine and the DirectX API to take advantage of multiple cores.
Valve says hybrid threading is the most difficult approach to multithreading, but it scales well enough to be worth the investment. With dual-core processors, Valve sees an increase in frame rate as the main benefit to multithreading. However, there comes a point where increasing the frame rate begins to deliver diminishing returns. With quad-core systems, Valve is looking to provide gamers with new experiences rather than simply smoothing frame rates. Game elements like artificial intelligence, particle systems, and physics have traditionally been given fractions of a single CPU’s resources. Quad-core processors allow them to access considerably greater computational resources that programmers are more than eager to burn on smarter AI, richer visual simulations, and more realistic physics.
Artificial intelligence is a great candidate for hybrid threading because it’s tolerant in the sense that answers to questions aren’t necessarily needed right away. The game can wait a few fractions of a second for the answer to a “where’s cover?” question without adversely affecting gameplay, allowing some calculations to be queued to run on idle processor cores. There are also implications for what Valve calls out-of-band AI. This additional layer of artificial intelligence is separate from the core AI, but feeds information to it.
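The "ask now, use the answer later" pattern maps naturally onto futures. In this sketch, which is our own illustration rather than Source engine code, a hypothetical `find_cover` query is queued on a spare worker while the main loop keeps ticking, and the answer is picked up once it is ready.

```python
from concurrent.futures import ThreadPoolExecutor


def find_cover(npc_pos, cover_points):
    """Hypothetical AI query: the cover point nearest to the NPC."""
    return min(
        cover_points,
        key=lambda p: (p[0] - npc_pos[0]) ** 2 + (p[1] - npc_pos[1]) ** 2,
    )


cover_points = [(10, 0), (2, 3), (-4, 8)]
frames_waited = 0

with ThreadPoolExecutor(max_workers=1) as idle_core:
    # Ask the question now; the answer is not needed this frame.
    pending = idle_core.submit(find_cover, (0, 0), cover_points)

    # The main loop keeps running and checks for the answer each frame.
    while not pending.done():
        frames_waited += 1  # a real game would simulate and render here

    cover = pending.result()

print(cover)  # (2, 3)
```

The NPC simply keeps doing what it was doing for however many frames the query takes, which is the tolerance that makes AI a good fit for queued work.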
Particle systems also lend themselves well to hybrid threading. Although they’re mostly a visual effect, particle systems actually tend not to be GPU-bound. They also tend not to interact with each other, making it possible to run independent particle systems on individual cores. In situations where there is only one particle system in the scene, that system can also be distributed across multiple cores. Having those extra cores available for particle processing allows Valve to create much more complex particle systems, ones that can interact with the world and even have gameplay implications.
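Both cases, independent systems on separate cores and a single system spread across several, can be sketched with one thread pool. The particle representation here (one-dimensional position and velocity pairs) and the function names are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def step_particles(particles, dt):
    """Advance (position, velocity) pairs by one time step."""
    return [(pos + vel * dt, vel) for pos, vel in particles]


def step_system_in_chunks(pool, particles, dt, chunks):
    """Split one large system into contiguous slices and step them in parallel."""
    size = max(1, -(-len(particles) // chunks))  # ceiling division
    slices = [particles[i:i + size] for i in range(0, len(particles), size)]
    stepped = pool.map(step_particles, slices, [dt] * len(slices))
    return [p for chunk in stepped for p in chunk]


smoke = [(0.0, 1.0), (1.0, 2.0)]
sparks = [(5.0, -1.0)]
big_fire = [(float(i), 1.0) for i in range(8)]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Independent systems: one task per system.
    smoke_future = pool.submit(step_particles, smoke, 0.5)
    sparks_future = pool.submit(step_particles, sparks, 0.5)
    smoke, sparks = smoke_future.result(), sparks_future.result()

    # A single system: spread its particles over several tasks.
    big_fire = step_system_in_chunks(pool, big_fire, 0.5, chunks=3)

print(smoke)   # [(0.5, 1.0), (2.0, 2.0)]
print(sparks)  # [(4.5, -1.0)]
```

Chunking works here only because each particle is updated independently. Particles that interact with the world or with each other would need shared data to be protected or partitioned.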
Quantifying multi-core performance
To illustrate how multi-core processors can improve performance, Valve gave us a couple of benchmark applications. The first runs the VRAD lighting calculation tool on a Half-Life 2 map. This isn’t an end user application, but it shows how well multithreading can speed elements of the game development process, in this case a level build.
VRAD exhibits near-linear scaling, with the quad-core Core 2 Extreme QX6700 building the level nearly twice as quickly as the X6800, which runs at a higher clock speed. The Athlon 64 FX-62 is more than 35% slower than Intel’s fastest dual-core processor and not even close to the QX6700.
Valve also gave us a particle system benchmark that actually runs inside the Source engine. This test steps through a series of particle simulations, and according to Valve, it’s completely CPU-bound. Unlike VRAD, this test case is more typical of what an actual gamer might experience.
The QX6700 cleans up in the particle system benchmark, running nearly twice as fast as the dual-core X6800. Again, we see the FX-62 bring up the rear, although this time it’s only about 20% off the pace set by the X6800.
Valve makes a good case for its hybrid threading model, although it’s hard to argue against using the most appropriate threading approach for a given task. Creating a programming framework that allows that kind of flexibility was apparently very difficult, but in the end, Valve says it will enable games that competitors who don’t make the same investment in multithreading simply won’t be able to match. Hybrid threading has also proven to be an asset in the company’s work with Microsoft’s multi-core Xbox 360 console, and Valve says it sets the company up nicely for what it believes is a “post-GPU” era looming over the horizon. Interestingly, though, Valve noted that its model isn’t particularly applicable to the PlayStation 3’s Cell processor.
Valve intends to roll out hybrid threading enhancements in the next major Source engine update, which will be released before Half-Life 2: Episode Two ships. Those enhancements won’t include the richer visual simulations, smarter AI, or more complex physics that are possible with multi-core processors, but dual- and quad-core systems should see a performance boost with Valve’s existing Source-engine games.
Of course, the more intriguing potential of Valve’s approach to multi-core gaming won’t be realized until its game designers start developing titles explicitly with multiple cores in mind. Work has already begun on more complex particle systems, realistic physics, and smarter AI, and Valve may even release a short level, similar to Lost Coast, to showcase how the Source engine can exploit quad-core processors. That release may be the first glimpse we get of how multi-core processors can fundamentally change gaming. For years, we’ve enjoyed how the rapid pace of graphics hardware development has enabled ever more compelling visuals. Yet while developers have been able to create games that look real, their behavior has been anything but. Multi-core processors may finally give artificial intelligence, physics, and other game elements a chance to catch up.