Longstanding framebuffer question


Longstanding framebuffer question

Posted on Sat Oct 26, 2013 9:31 am

Airmantharp in another thread wrote:2GB of VRAM per GPU is going to go out of style very, very quickly.
This is a popular sentiment, and the reasoning always given is this:
Airmantharp wrote:With 2GB, you're not running AA on a modern game at ~4MP. You run out of RAM :).
And from my own testing it's true: MSAA and SSAA do use up VRAM very quickly. But why?

By my own calculations:
Code: Select all
2560 pixels wide ×
1440 pixels high ×
64 color bits per pixel (internal precision) =
235,929,600 bits, or 29,491,200 bytes, which is around 28Mbytes.

As I understand it, anyway. So each frame in memory is around 28Mbytes; double-buffered, that's still less than 60Mbytes. So if we look at that, and then apply 4x super-sampling:
Code: Select all
5120 pixels wide ×
2880 pixels high ×
64 color bits per pixel =
943,718,400 bits, or 117,964,800 bytes, which is around 113Mbytes.
Way, way bigger, but still not so large that we need gobs and gobs of RAM. You could easily fit ten of these in the video RAM of a 7970 and have almost two gigabytes left over for textures and other bits.
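
(For anyone who wants to check my arithmetic at other resolutions, here's the same math as a throwaway Python function; it just restates the numbers above:)
Code: Select all
def color_buffer_mib(width, height, bits_per_pixel=64):
    # One color buffer at the given size, in MiB.
    return width * height * bits_per_pixel / 8 / 1024.0 / 1024.0

print(color_buffer_mib(2560, 1440))  # 28.125
print(color_buffer_mib(5120, 2880))  # 112.5 (4x supersampled)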

So what gives? Someone mentioned the Z-buffer in IRC one time, so let's add that in.
Code: Select all
2560 pixels wide ×
1440 pixels high ×
64 color bits per pixel ×
32 bits z-buffer =
7,549,747,200 bits, which is 943,718,400 bytes, or around 900Mbytes.
And that's BEFORE any anti-aliasing! That can't be right, can it? That number seems unrealistically large and I don't really understand it at all. If it were accurate, it would mean 1GB cards literally cannot run games at 4MP resolution, and that's demonstrably false. (Looking at my formula again, I multiplied the color bits by the z-buffer bits; if the 32 depth bits instead just add to the 64 color bits, that's 96 bits per pixel, or around 42Mbytes per frame at 2560x1440, which seems far more plausible. Is that my mistake?)

Can someone make any sense of this for me? (」゚ペ)」
i5-3570K @ 4.4 (NH-C14), 4x8GB DDR3-1866, GA-Z68MA-D3H-B2, ASUS GTXTITAN-6GD5, 128GB Vertex 4 / 2x60GB Vertex Plus R2 / 2x2TB Barracuda 7200.14 RAID0 / ANS-9010 (4x4GB), SST-DA1000 (PSU), 2x VS229H-P, 1x VG248QE, 1x MIMO 720F, Corsair Vengeance K90+M95
auxy

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 9:38 am

Loading of higher-resolution textures to make the use of AA worthwhile in the first place? AA isn't particularly helpful if the textures are all blurry because they're being up-sampled.

(Caveat: I'm not a game developer, but I've done some OpenGL work, and also implemented what amounted to a custom fixed-function tile-based GPU in VHDL...)

Edit: Bottom line - the VRAM isn't just used for frame buffers!
(this space intentionally left blank)
just brew it!

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 9:55 am

Let's summarize the kinds of buffers a capable, modern graphics engine (forward or deferred) might use. We'll assume the fictional gamer in question has a 2560x1440 screen.

  • The buffer that actually makes it to your screen. It generally uses a half-precision A16B16G16R16F format, i.e. 64 bits per pixel (128 bits per pixel is supported too, but not of much use).
    Code: Select all
    2560 * 1440 * 64 / 8 / 1024 / 1024 = 28.125MiB

  • There are actually two of those buffers loaded at the same time, one that is currently visible on screen (the "Front Buffer"), and one that the GPU is storing its result in (the "Back Buffer"). When a frame is ready, these two buffers are swapped, and the process continues.
    Code: Select all
    2560 * 1440 * 64 / 8 / 1024 / 1024 = 28.125MiB

  • Then there's the Depth/Stencil buffer you talked about. Any game that lets you see further than about a thousand units will use D24X8 (24 bits of depth information). This adds 32 bits per pixel. Notice that eight of those bits are unused, but there are significant performance advantages to keeping pixels exactly 4 bytes in size.
    Code: Select all
    2560 * 1440 * 32 / 8 / 1024 / 1024 = 14.0625MiB

  • Most game engines use additional back buffers, for two reasons. The first is ping-pong buffering. Why is that needed? Say you want to sharpen frame X: the result cannot be stored directly back into X while X is being read. To fix this, engines use the pattern Y = AddEffect(X), which can be repeated to chain effects in any order, for example:
    Code: Select all
    X = BasicFrame();
    Y = AddEffect(X); // ping
    X = AddEffect(Y); // pong
    // repeat if necessary
    // send last one to display

    Note that this is only used for effects that are calculated at full resolution, like SSAO and sharpening.
    Code: Select all
    2560 * 1440 * 64 / 8 / 1024 / 1024 = 28.125MiB (ping pong)


    The other reason is called "Deferred Rendering". I won't explain the technique itself here, but it stores intermediate results in two to four additional full-resolution buffers (I'll assume four, which is the minimum needed for high-detail shading):
    Code: Select all
    2560 * 1440 * 64 / 8 / 1024 / 1024 * 4 = 112.5MiB (deferred)

  • For other special effects that do not need to be calculated per pixel, another set of two ping-pong buffers at half (or quarter) resolution is created. This is generally used for things like bloom. As you can imagine, calculating the bloom value for packs of 2x2 pixels generally yields the same result as doing it per pixel (the bloom effect is usually coarser than that). This saves 75% of the shader calculations, but adds memory usage:
    Code: Select all
    (2560/2 * 1440/2) * 64 / 8 / 1024 / 1024 * 2 = 14.0625MiB

  • Don't forget about shadows. I'm assuming the engine developer spent some time optimizing their algorithms, so I'll assume Cascaded Shadow Mapping. In outdoor scenes, a reasonable-quality shadow map will use three textures: 2048x2048, 1024x1024 and 512x512. Yes, that's right, three shadow maps (sometimes two, sometimes four, never seen 5+ used). Simply put, this is done to achieve reasonable shadow quality at every view distance: shadows for faraway objects are stored in the smallest map, midway objects in the middle map, and objects near the camera in the biggest map:
    [Image: diagram of cascaded shadow map levels]
    http://msdn.microsoft.com/en-us/library/windows/desktop/ee416307(v=vs.85).aspx
    When done properly, shadow jaggies should be about equally large at every 3D distance. Obviously, this is a much better technique than using the densest map for the whole scene; even a single 8192x8192 monster won't cut it for games like Skyrim. Anyway, these maps use a D3DFMT_R32F format (one 32-bit float per shadow pixel), which amounts to:
    Code: Select all
    (2048^2 + 1024^2 + 512^2) * 32 / 8 / 1024 / 1024 = 21MiB.

  • Multisampling basically multiplies the size of the back buffer, front buffer and Z-buffer by a given amount. Special-effect buffers and shadow maps are unaffected. Calculating that amount is a bit complicated, because NVIDIA and ATI/AMD do not agree on the multiplying factors behind labels like "8xAA" (the 8 does not always imply 8x). For now, I will assume four samples per pixel, which is usually called "8x" or "4xQ" by graphics card vendors.
  • Supersampling is costlier than multisampling because it also multiplies the special-effect buffers by the factor in front of "SSAA".

Summing things up, a GPU would require the following amounts of memory just to run a deferred-rendered game, without any assets:

Code: Select all
1366x768 = 85.0313MiB
1366x768 2xSSAA = 149.063MiB
1366x768 4xSSAA = 277.125MiB
1366x768 4xMSAA = 105.041MiB
1366x768 4xMSAA 2xSSAA = 189.082MiB
1366x768 4xMSAA 4xSSAA = 357.164MiB
1366x768 8xMSAA = 145.061MiB
1366x768 8xMSAA 2xSSAA = 269.121MiB
1366x768 8xMSAA 4xSSAA = 517.242MiB

1600x900 = 108.891MiB
1600x900 2xSSAA = 196.781MiB
1600x900 4xSSAA = 372.563MiB
1600x900 4xMSAA = 136.356MiB
1600x900 4xMSAA 2xSSAA = 251.713MiB
1600x900 4xMSAA 4xSSAA = 482.426MiB
1600x900 8xMSAA = 191.288MiB
1600x900 8xMSAA 2xSSAA = 361.576MiB
1600x900 8xMSAA 4xSSAA = 702.152MiB

1920x1080 = 147.563MiB
1920x1080 2xSSAA = 274.125MiB
1920x1080 4xSSAA = 527.25MiB
1920x1080 4xMSAA = 187.113MiB
1920x1080 4xMSAA 2xSSAA = 353.227MiB
1920x1080 4xMSAA 4xSSAA = 685.453MiB
1920x1080 8xMSAA = 266.215MiB
1920x1080 8xMSAA 2xSSAA = 511.43MiB
1920x1080 8xMSAA 4xSSAA = 1001.86MiB

2560x1440 = 246MiB
2560x1440 2xSSAA = 471MiB
2560x1440 4xSSAA = 921MiB
2560x1440 4xMSAA = 316.313MiB
2560x1440 4xMSAA 2xSSAA = 611.625MiB
2560x1440 4xMSAA 4xSSAA = 1202.25MiB
2560x1440 8xMSAA = 456.938MiB
2560x1440 8xMSAA 2xSSAA = 892.875MiB
2560x1440 8xMSAA 4xSSAA = 1764.75MiB

3840x2160 = 527.25MiB
3840x2160 2xSSAA = 1033.5MiB
3840x2160 4xSSAA = 2046MiB
3840x2160 4xMSAA = 685.453MiB
3840x2160 4xMSAA 2xSSAA = 1349.91MiB
3840x2160 4xMSAA 4xSSAA = 2678.81MiB
3840x2160 8xMSAA = 1001.86MiB
3840x2160 8xMSAA 2xSSAA = 1982.72MiB
3840x2160 8xMSAA 4xSSAA = 3944.44MiB


I repeat, that is what you need to run the graphics engine. Not the game and its textures and objects. You could call this the Curb Weight.

Why is this relevant when game assets like textures and objects can consume an almost unlimited amount of extra memory? Because it shows the differences between resolutions, so you know what to expect when buying a new monitor. (If you want to check the arithmetic, see the sketch below.)
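
If you'd rather not run MATLAB, here is the same calculation as a minimal Python sketch, with my assumptions from the list above baked in (64bpp color, 32bpp depth, one full-res ping-pong buffer, four G-buffers, two half-res effect buffers, three shadow cascades). Note that msaa is the real sample count, i.e. 2 for the vendor "4x" setting and 4 for "8x":
Code: Select all
def curb_weight_mib(width, height, msaa=1, ssaa=1):
    pixels   = width * height
    color    = pixels * 8 * 2                        # front + back buffer, A16B16G16R16F
    depth    = pixels * 4                            # D24X8 depth/stencil
    pingpong = pixels * 8                            # full-res ping-pong buffer
    gbuffers = pixels * 8 * 4                        # four deferred G-buffers
    halfres  = (width // 2) * (height // 2) * 8 * 2  # two half-res effect buffers
    shadows  = (2048**2 + 1024**2 + 512**2) * 4      # three R32F shadow cascades

    # MSAA scales only the color and depth buffers; SSAA scales every
    # resolution-dependent buffer. Shadow maps are unaffected by both.
    total = ((color + depth) * msaa + pingpong + gbuffers + halfres) * ssaa + shadows
    return total / 1024.0 / 1024.0

print(curb_weight_mib(2560, 1440))                  # 246.0
print(curb_weight_mib(2560, 1440, msaa=2))          # 316.3125 ("4xMSAA")
print(curb_weight_mib(2560, 1440, msaa=2, ssaa=2))  # 611.625
print(curb_weight_mib(3840, 2160, msaa=4, ssaa=4))  # 3944.4375 (8xMSAA 4xSSAA)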

Sources:
MATLAB script used to generate the table: http://wilcobrouwer.nl/bestanden/main3.m
Cascaded Shadow Maps (MSDN): http://msdn.microsoft.com/en-us/library/windows/desktop/ee416307(v=vs.85).aspx
Last edited by Orwell on Sun Oct 27, 2013 6:17 am, edited 52 times in total.
Phenom II X4 @ 3600/2400 | Radeon 5850 @ 850/1250 | Samsung 830 128GB | Samsung SyncMaster T260 | Boston Acoustics A26
Orwell

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 10:03 am

Orwell wrote:Let's create a summary of what kind of buffers a reasonable but not high tech game engine (a "forward renderer") might use. We'll assume the fictional gamer in question has a 2560x1440 screen.

This is fascinating, please continue! ☆*・゜゚・*\(^O^)/*・゜゚・*☆
auxy

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 10:05 am

As I already pointed out, frame (and Z) buffers aren't the only thing consuming VRAM.

To elaborate further: to achieve smooth framerates and avoid starving the CPU of memory bandwidth, textures need to be pre-loaded into the GPU's VRAM. If you want those textures to look good at high resolutions with AA enabled, they need to be high-resolution. This chews up a lot of space!
just brew it!

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 11:17 am

Done. Check previous post.
Orwell

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 11:30 am

just brew it! wrote:As I already pointed out, frame (and Z) buffers aren't the only thing consuming VRAM.
Yeah, I know that, as should be obvious from my original post, if you read it again ...

Most games have an option to use the highest-quality textures available all the time. This option generally isn't that big a deal because most games' textures just aren't that big. The overwhelming "conventional wisdom" is that increasing resolution and enabling AA uses up tons and tons of VRAM, and I've noticed myself that it's easy to get up into the >2GB range using 4x SSAA. Orwell's post is a lot to digest; given my weakness with maths, I'm gonna have to read it over a few times to get it. ∑(O_O;)

[edit] AHH! This link! http://mynameismjp.wordpress.com/2012/1 ... -overview/ <- I love it! Finally I have a good example image to demonstrate to people why FXAA and MLAA are garbage relative to proper MSAA! So amazing!
auxy

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 11:48 am

Done again. Added stuff about deferred rendering, MSAA and SSAA.
Orwell

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 11:51 am

As Orwell has explained rather well, not much of the VRAM is used by the framebuffers; the VRAM is used up by loads of other things.

The way I understand it, a modern graphics card is a self-contained board with its own processor, memory controller and RAM. It runs its own operating system (a GeForce or Radeon driver is much more than a simple driver these days) and requires dedicated power and cooling. It is far more like a complete computer today than it was in the fixed-function days, when the frame buffer and texture memory represented a much greater share of overall VRAM use.

Just as Windows programs use system RAM for a multitude of different tasks and services, game engines now use VRAM in a similar way. The more complex a game, the more tasks are running simultaneously on the GPU and the more memory is needed to store resources for all of them.
Performance nosedives when a Windows application runs out of available RAM because subsequent fetch/store operations then depend on the PCI-Express and SATA buses, which are orders of magnitude slower than RAM. The exact same thing happens when detail and game effects are turned up far enough to consume too much VRAM on a graphics card: some of the data spills over into system RAM via the PCI-Express bus, which is orders of magnitude slower to fetch from and store to than the onboard VRAM.
<insert large, flashing, epileptic-fit-inducing signature (based on the latest internet-meme) here>
Chrispy_

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 11:59 am

You want to have all of the textures you might need to apply at the ready, at multiple resolutions. Texture storage (and storage used to implement other effects) is the issue, not the frame buffer itself, even when you multiply the frame buffer size for AA.

Native screen resolutions have not increased by much, but visual quality has. This demands more storage for more high-resolution textures and supporting data/buffers for effects. VRAM size has more or less kept pace with system memory size and the increase in game complexity/visuals. I guess this doesn't seem at all odd to me?
just brew it!

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 12:04 pm

Chrispy_ wrote:As Orwell has explained rather well, not much of the VRAM is used by the framebuffers; the VRAM is used up by loads of other things.
Yeah, sure, I know all that already.

The thing is, the mantra repeated over and over in forums and on IRC these days is "running out of VRAM", usually in reference to high degrees of AA or high resolutions (like 4K, or Eyefinity 3x1, which is actually quite common among enthusiast gamers these days). The argument against things like the 690 and the 770 is that they don't have enough VRAM, but I very carefully monitor my CPU, GPU, and VRAM usage at all times, and, well, I almost NEVER go over 2GB of VRAM -- only when I'm using something really esoteric, like 32xS AA or 8x MSAA + 8x SSAA, and my GPU starts to struggle at those points anyway, so they're pretty unrealistic.

I have a very solid basic understanding of how GPUs work on a conceptual level; it was the math I wasn't certain about. I still have a lot of questions after reading Orwell's post, but I'm reading through his sources now. 「(゚ペ)

Thanks for replying anyway Chrispy_; I always appreciate attempts to educate and inform. (You too JBI! Sorry if I seemed unappreciative earlier.) ♡(*´・ω・)(・ω・`*)♡
just brew it! wrote:Texture storage (and storage used to implement other effects) is the issue, not the frame buffer itself, even when you multiply the frame buffer size for AA.
Sure; I always thought so! I'm just trying to confirm that, or gain some understanding of why this isn't common knowledge. The performance bottlenecks for Eyefinity and high levels of AA are usually the pixel shaders (old-school texture mapping units), the ROPs, or the memory bandwidth, right?
auxy

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 12:20 pm

It's usually textures (and things that use textures to do other things) that eat up the largest chunk of VRAM.

For stuff like AA, it's memory bandwidth and ROPs that have a greater importance IIRC.
ChronoReverse

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 12:25 pm

Yep, a lot of arguments against multiple weaker cards and the likes of the GTX690 and GTX770 are related to the resolutions used. Often, discussions revolve around the opinion that "one should not use graphics card X at resolution Y because it has, or will have, too little VRAM". If one checks the grand table of graphics-engine curb weights, one can see that moving from 1920x1080 to 2560x1440 only requires an extra 100MiB to 150MiB of VRAM. This isn't much compared to what most of these cards have.

As a comparison, a GTX690 at 1920x1080 with 4xAA uses 9.1% of its graphics memory for resolution-dependent stuff. That increases to 15.4% at 2560x1440 with 4xAA. Sure, it's measurable, but not much more than that.
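
(Using the curb_weight_mib sketch from my earlier post, and assuming 2048MiB per GPU and two real samples for the vendor "4x" setting, the same percentages fall out:)
Code: Select all
vram = 2048.0  # MiB per GPU on a GTX690
print(curb_weight_mib(1920, 1080, msaa=2) / vram)  # ~0.091 -> 9.1%
print(curb_weight_mib(2560, 1440, msaa=2) / vram)  # ~0.154 -> 15.4%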
Orwell

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 1:07 pm

I also suggest reading the first GCN review on Techreport from 2 years ago... and reading the upcoming DirectX 11.2 spec.

You might see something interesting finally being enabled on ALL GCN-based cards.

(Of course, Mantle and all the consoles also support this.)

edit: already available in OpenGL (DX11.2 is based on a subset)
http://www.opengl.org/registry/specs/AM ... exture.txt
sschaem

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 3:45 pm

sschaem wrote:You might see something interesting finally being enabled on ALL GCN-based cards.

This IS interesting!

Virtualized textures have interested me ever since I played RAGE. In that game almost no textures are repeated; virtually every surface uses a unique texture. The effect is subtle, but at the same time it makes for an enormous improvement in image quality. On the surface, I realize those two statements might seem contradictory, but that really is how it feels -- you don't immediately notice the megatexturing, but the whole time I was playing RAGE I kept thinking, "gosh, this just looks really nice!" in every scene, because every scene looks different. The effect is dramatically subtle. Can I say that? I don't think I can. I did anyway. Virtual textures~♥~(‘▽^人)

Orwell more or less validated my assumptions, even if I can't reproduce his math in the chart up there. Thanks guys!
auxy

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 4:19 pm

Auxy, you can reproduce everything I calculated using the script I provided in the sources. You can run it using MATLAB or have a good look at it without:
http://wilcobrouwer.nl/bestanden/main3.m
Orwell

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 5:32 pm

Some links:

First TR 7970 review talking about PRT... (this feature has been sitting idle for some time now)
http://techreport.com/review/22192/amd- ... rocessor/4

A little demo from Microsoft (DX11.2, Windows 8.1 only, but it should be available under Windows 7 with Mantle!)
http://www.youtube.com/watch?v=QB0VKmk5bmI

Hint that all GCN cards will have the feature enabled in upcoming drivers
http://devgurus.amd.com/message/1298518#1298518

Also, it seems AMD GCN fully supports the DX11.2 Tiled Resources feature (tiers 1 & 2)...

id Tech 6 would rock if it were coded using Mantle :)
sschaem

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 6:03 pm

Orwell wrote:Auxy, you can reproduce everything I calculated using the script I provided in the sources. You can run it using MATLAB or have a good look at it without:
http://wilcobrouwer.nl/bestanden/main3.m
Ahaha~ ... I have no idea what MATLAB is and I can't parse that script very well, visually, so I'll take your word for it; I was already doing that anyway.

This discussion brings to mind another topic very dear to me: the modern uselessness of driver-forced AA. Around the time DX9 (and thus deferred rendering) became mainstream, developers more or less stopped including MSAA in their games. At the same time, driver-forced AA "mysteriously" stopped working in most titles. These changes came along with vague remarks from developers that their engine was somehow "incompatible" with anti-aliasing (what?) and that AA was no longer necessary at HD resolutions (w-what the f-?!).

My presumption has always been that MSAA stopped being supported by both developers and drivers due to customer-support issues -- someone enables 8x MSAA on a Radeon 8600 and then makes a big scene when they can't run XYZ games playably. Worse still, someone enables 8x MSAA in the driver on the same card and then complains furiously that every game runs like garbage.

Maybe that's overly presumptuous of me, but I guess that's what I'm asking now -- why DON'T we see wider support for 'proper' anti-aliasing in games? And why can't people see how awful post-process AA is?
auxy

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 6:35 pm

Actually, around then, Deferred Rendering became popular and the existing "force AA" methods simply couldn't handle that in an efficient manner.

Driver-forced AA has always been a bit of a hack in the first place. The appropriate place to control and do AA is within the engine so that it's more efficient.
ChronoReverse

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 7:27 pm

ChronoReverse wrote:Actually, around then, Deferred Rendering became popular and the existing "force AA" methods simply couldn't handle that in an efficient manner.
Because they had a bunch of back buffers to oversample, right? Heh-heh. That's more or less exactly what I was saying: it wasn't that it didn't actually work, it was just really slow, so nobody -- except those people with high-end graphics cards -- could run it smoothly.
ChronoReverse wrote:Driver-forced AA has always been a bit of a hack in the first place. The appropriate place to control and do AA is within the engine so that it's more efficient.
Just how much of a hack it is becomes painfully clear if you start messing around with AA bitmasks in Nvidia Inspector. You can have three different bitmasks that provide exactly the same results, visually, and with dramatically different performance characteristics. It's pretty nutsy. I wish I had some idea what each bit meant, rather than blindly stabbing in the dark to make it work.
auxy

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 8:18 pm

No, deferred rendering is incompatible with MSAA because the G-buffers store material data you need for the lighting calculations. MSAA works by scaling a surface and its associated depth surface by some factor (vendor-controlled; as a developer you don't know by how much), then performing coverage sampling during fragment output to mask writes to pixels that aren't covered by a fragment. This works in normal forward-lighting scenarios, where the only MSAAed surface is the color buffer, because during the resolve all that's being filtered is final color information.

When the same effect is applied to the G-buffers, you're turning all the geometry information needed for lighting calculations into total garbage (an MSAA surface has to be resolved before it can be sampled). D3D10 or 11 (can't remember which) provides facilities to work around that by exposing coverage-mask results as pixel shader input, so you can do something intelligent about the MSAA-resolved G-buffers, but generally the expense isn't worth it because 1) there are usually 2-4 G-buffers, all full-res, so the additional memory cost of MSAA is pretty exorbitant, and 2) most of the time you end up wanting to undo the resolve anyway to properly calculate lighting.
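
A toy numeric illustration of that failure mode (hypothetical values, in Python rather than shader code): if a resolve averages G-buffer normals across an edge the way it averages colors, the result is no longer a meaningful surface normal.
Code: Select all
import math

# Two sub-samples at a silhouette edge, stored in a G-buffer:
# one fragment faces the camera, the other faces sideways.
n1 = (0.0, 0.0, 1.0)
n2 = (1.0, 0.0, 0.0)

# A color-style MSAA resolve simply averages the samples...
resolved = tuple((a + b) / 2.0 for a, b in zip(n1, n2))
length = math.sqrt(sum(c * c for c in resolved))

print(resolved)  # (0.5, 0.0, 0.5)
print(length)    # ~0.707: not unit length and pointing at neither
                 # surface, so lighting computed from it is wrong at
                 # exactly the edge pixels AA was meant to fix.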

Deferred light pre-pass rendering is still compatible with MSAA, so games that use it will usually offer MSAA support, but light pre-pass renderers require two geometry passes, which can be expensive depending on scene complexity.

Source: I program graphics for games for a living :)
Zoomastigophora

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 8:29 pm

Zoomastigophora wrote:No, deferred rendering is incompatible with MSAA because the G-buffers store material data you need for the lighting calculations. [...]

Unless a whole lot fewer games use deferred rendering than I thought -- which is to say, I thought it was basically everything in the last few years -- something is not working as stated. I mean, I can name several games where the developers said MSAA was "incompatible" with their game, but where it can be forced on for GeForce cards by hacking the game's profile with Nvidia Inspector, so *clearly* it's not "incompatible". Unless NVIDIA set up the game's profile such that it intelligently doesn't oversample the G-buffers -- but then why isn't MSAA allowed by default, if they went to the trouble to create a profile that intelligent?
auxy

Re: Longstanding framebuffer question

Posted on Sat Oct 26, 2013 9:22 pm

auxy wrote:I mean, I can name several games where the developers said MSAA was "incompatible" with their game, but where it can be forced on for GeForce cards by hacking the game's profile with Nvidia Inspector, so *clearly* it's not "incompatible". Unless NVIDIA set up the game's profile such that it intelligently doesn't oversample the G-buffers -- but then why isn't MSAA allowed by default, if they went to the trouble to create a profile that intelligent?

Even if it seems you were able to force AA on in a profile, it doesn't mean it was actually turned on or applied in any meaningful way, but I can't speak to what vendors do in their drivers. I can only offer my perspective from the developer side of things, and if a developer says MSAA is incompatible, I have little reason to doubt them. I also have no idea how prevalent deferred rendering is, but I wouldn't assume every game uses it.
Zoomastigophora

Re: Longstanding framebuffer question

Posted on Sun Oct 27, 2013 12:12 am

Zoomastigophora wrote:Even if it seems you were able to force AA on in a profile, it doesn't mean it was actually turned on or applied in any meaningful way [...]
I don't appreciate the implication that I am unable to determine whether anti-aliasing is being applied! I am much more astute than the average person in this regard, but even to the uninitiated, the difference between no AA and 4x MSAA or SSAA is blatant in motion at ~100DPI.

Regardless, while it probably is silly to assume every game uses deferred rendering, I have every reason to doubt developers when they say MSAA is "incompatible" with their game, because it doesn't really make sense. You can oversample any signal, and computer graphics are, in the end, a digital signal like anything else. (¬д¬;) It may require more intelligence on the part of the driver, and/or more bandwidth/fillrate/etc., but I haven't heard anything to indicate that it can't be done.

Maybe they mean "it's too much work and not worth it", or there are other reasons, but if it can be done -- and it can in every DX9 case, as far as I have seen -- it obviously isn't literally "incompatible."
auxy

Re: Longstanding framebuffer question

Posted on Sun Oct 27, 2013 1:05 am

If you're thinking of SSAA, then yes, it would apply to anything; but MSAA is usually performed at a stage where it'll be ineffective.

As for deferred rendering being everywhere, why would you believe it not to be? Unreal Engine 3 is deferred, and it's super popular. Frostbite 2, CryEngine 3 and Unity are also deferred (or deferred-capable). I'm no expert, but it just seems to be a popular way to do lighting.
ChronoReverse

Re: Longstanding framebuffer question

Posted on Sun Oct 27, 2013 3:19 am

auxy wrote:I don't appreciate the implication that I am unable to determine whether anti-aliasing is being applied! [...]

Sorry, I didn't mean to imply you aren't able to tell whether AA is applied or not, just that what a driver does when AA is forced on isn't well established. You're also not understanding my point: MSAA doesn't oversample a signal, it oversamples coverage information to interpolate a signal. In the case of G-buffers, you are literally interpolating geometry data like normals and positions, which will be total garbage at exactly the edges where you want AA to be effective. Whether the artifact is visible or not depends on the lighting and surface material, but don't misunderstand: you literally cannot MSAA G-buffers and expect correct results.

If the only thing forced driver AA does is MSAA the back buffer, then the end result will depend on what render passes hit the back buffer. SSAA doesn't suffer from this problem because it actually scales up all the surfaces and the pixel shader runs for each new pixel; perhaps that is what drivers actually do.

ChronoReverse wrote:As for deferred rendering being everywhere, why would you believe it not to be? Unreal Engine 3 is deferred, and it's super popular. Frostbite 2, CryEngine 3 and Unity are also deferred (or deferred-capable). I'm no expert, but it just seems to be a popular way to do lighting.

UE3 started its life as a forward renderer with a nice static-lightmap system. In fact, most of the engines you listed started that way, and that is often still the case today for modern console games (and by extension, PC ports of console games). A clever lightmapping scheme with a well-controlled dynamic light count can still produce visually impressive results on a forward renderer without the overhead of deferred, and it makes transparencies a lot easier to handle. Games like Halo, God of War, and Call of Duty were/are forward rendered, but then Killzone 2 was deferred, so there's no easy rule of thumb. As an engineer, though, our job is to determine what technique will satisfy the visual bar set by the art director(s) while still maintaining performance targets and resource limits; just because deferred rendering allows for large light counts doesn't necessarily mean it's the appropriate solution.

The next-gen consoles should see some nice advances in light rendering systems. Tile-based deferred rendering and Forward+(+) are interesting ways to handle large light counts, but clustered shading has my interest the most :)
Zoomastigophora

Re: Longstanding framebuffer question

Posted on Wed Nov 06, 2013 12:38 pm

Zoomastigophora wrote:Sorry, I didn't mean to imply you aren't able to tell whether AA is applied or not, just that what a driver does when AA is forced on isn't well established. [...]
Yeah, of course; I already understood from your previous post that you can't MSAA the G-buffers. You already said that, so why did you think I didn't understand it?
Driver-forced MSAA and SSAA have deterministic and empirically verifiable differences in image quality and performance, so the driver is definitely not just supersampling the whole scene when MSAA is selected.

Thanks for the reply anyway. I meant to reply sooner but I sorta forgot about this thread, eheh.
auxy

