Yet another Deferred Shading / Anti-aliasing discussion...


Hi. My apologies if this discussion has been played out already. This topic seems to come up a lot, but I did a quick search and did not quite find the information I was looking for. I'm interested in knowing what is considered the best practice these days, with respect to deferred rendering and anti-aliasing. These are the options, as I understand them:

Use some post-processed blur like FXAA.

I've tried enabling NVidia's built-in FXAA support, but the results were not nearly acceptable. Maybe there is another technique that can do a better job?

Use a multi-sampled MRT, and then handle your own MSAA resolve.

I've never done this before, and I'm anxious to try it for the sake of learning how, but it is difficult for me to understand how this is much better than super-sampling. If I understand MSAA correctly, the memory requirements are the same as for super-sampling. The only difference is that your shader is called fewer times. However, with deferred shading, this really only seems to help save a few material shader fragments, which don't seem very expensive in the first place. Unless I'm missing something, you still have to do your lighting calculations once per sample, even if all of the samples have the same exact data in them. Are the material shader savings (meager, I'm guessing) really worth all of the hassle?

Use Deferred Lighting instead of Deferred Shading.

You'll still have aliased lighting, though, and it comes at the expense of an extra pass (albeit depth-only, if I understand the technique correctly). Is anybody taking this option these days?

Use TXAA

NVidia is touting some TXAA technique on their website, although details seem slim. It seems to combine 2X MSAA with some sort of post-process technique. Judging from their videos, the results look quite acceptable, unlike FXAA. I'm guessing that the 2X MSAA would be handled using your own custom MSAA resolve, as described above, but I don't know what processing happens after that.

These all seem like valid options to try, although none of them seem to be from the proverbial Book. It seems to me, though, that forward rendering is a thing of the past, and I would love to be able to fill my scene with lights. I could try implementing all of these techniques as an experiment, but since they each come with a major time investment and learning curve, I was hoping that someone could help point a lost soul in the right direction.

Bonus questions: Is there a generally agreed-upon way to lay out your G-Buffer? I'd like to use this in conjunction with HDR, and so memory/bandwidth could start to become a problem, I would imagine. Is it still best practice to try to reconstruct position from depth? Are half-float textures typically used? Are any of the material properties packed or compressed in a particular way? Are normals stored as x/y coordinates, with the z-coord calculated in the shader?

I'm using OpenGL, if it matters.


I use FXAA, but I don't use what's built into the graphics driver. What you can do for better results is download the FXAA 3.9 shader (it used to be on Timothy Lottes' blog, but I can't find it there anymore); it has some conditional compilation set up in it which you can use to tweak the settings. This method is far better than using the graphics driver because you can apply it at a more optimal spot in your rendering pipeline (preventing some unwanted blur). Amazingly, the same shader works for both HLSL and GLSL, and it will work on Intel and AMD GPUs as well as consoles. It is important to note that you must have some method of generating luma before running the FXAA shader (this is pretty trivial).
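For reference, the luma step really is trivial: it's just a small pass (or a couple of lines at the end of your tonemapping shader) that writes a luminance value into the alpha channel, which is where the FXAA shader expects to find it by default, if I remember right. A rough GLSL sketch, where colourTex and uv are placeholder names for your tonemapped scene texture and its texcoord:

// Luma pass before FXAA (sketch): write a perceptual luminance into alpha.
#version 330 core
uniform sampler2D colourTex;   // the tonemapped (LDR) scene
in vec2 uv;
out vec4 fragColour;
void main()
{
    vec3 colour = texture(colourTex, uv).rgb;
    float luma  = dot(colour, vec3(0.299, 0.587, 0.114)); // Rec. 601-style weights
    fragColour  = vec4(colour, luma);
}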

FXAA is good in that it's really easy to implement and it's really cheap, but the quality is not great. It has limited information to work with, and it's completely incapable of handling temporal issues due to the lack of sub-pixel information. If you use it, you definitely want to do as ic0de recommends and grab the shader code and insert it into your post-processing chain, as opposed to letting the driver do it, so that you can avoid applying it to things like text and UI. There's also MLAA, which has similar benefits and problems.

You are correct that the "running the shader per-pixel" bit of MSAA only works for writing out your G-Buffer. The trick is to use some method of figuring out which pixels actually have different G-Buffer values in them, and then apply per-sample lighting only to those pixels while applying per-pixel lighting to the rest. For deferred renderers that use fragment shaders and lighting volumes, the typical way to do this is to generate a stencil mask and draw each light twice: once with a fragment shader that uses per-pixel lighting, and once with a fragment shader that uses per-sample lighting. For tiled compute shader deferred renderers you can instead "bucket" per-sample pixels into a list that you build in thread group shared memory, and handle them separately after shading the first sample of all pixels.
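To make the stencil approach a bit more concrete, here's a rough GLSL sketch of the edge-detection pass (gNormalDepthMS, NUM_SAMPLES and the thresholds are all placeholders you'd adapt to your own G-Buffer layout). It discards "simple" pixels so that, with the stencil state set up appropriately, only the pixels whose samples actually differ get marked for per-sample lighting:

// MSAA edge detection for deferred shading (sketch).
// Pixels whose G-Buffer samples all match are discarded; the remaining
// (edge) pixels get a stencil bit set via the stencil state bound for this pass.
#version 330 core
uniform sampler2DMS gNormalDepthMS;   // placeholder: xyz = normal, w = linear depth
const int NUM_SAMPLES = 4;
void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    vec4 first  = texelFetch(gNormalDepthMS, coord, 0);
    bool isEdge = false;
    for (int i = 1; i < NUM_SAMPLES; ++i)
    {
        vec4 s = texelFetch(gNormalDepthMS, coord, i);
        // Thresholds are scene-dependent; these values are only illustrative.
        if (abs(s.w - first.w) > 0.1 || dot(s.xyz, first.xyz) < 0.99)
            isEdge = true;
    }
    if (!isEdge)
        discard;   // leave the stencil untouched for non-edge pixels
}

With that mask in place, the per-pixel version of each light is drawn with the stencil test set to pass only on non-edge pixels, and the per-sample version with it set to pass only on edge pixels.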

Some links:

http://developer.amd.com/wordpress/media/2012/10/Deferred%20Shading%20Optimizations.pps

http://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines

I also wrote quite a bit about this in the deferred rendering chapter of the book that I worked on, and wrote some companion samples that you can find on CodePlex.

Deferred lighting, AKA light pre-pass, is basically dead at this point. It's only really useful if you want to avoid using multiple render targets, which was desirable on a particular current-gen console. If MRT isn't an issue then it will only make things worse for you, especially with regards to MSAA.

TXAA is just an extension of MSAA, so you need to get MSAA working before considering a similar approach. Same with SMAA, which basically combines MSAA and MLAA.

Forward rendering is actually making a big comeback in the form of "Forward+", which is essentially a modern variant of light indexed deferred rendering. Basically you use a compute shader to write out a list of lights that affect each screen-space tile (usually 16x16 pixels or so) and then during your forward rendering pass each pixel walks the list and applies each light. When you do this MSAA still works the way it's supposed to, at least for the main rendering pass. If you search around you'll find some info and some sample code.
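The shading side of it ends up looking something like the sketch below (GLSL 4.3; all the buffer and uniform names here are placeholders, and the compute shader that fills the per-tile lists isn't shown):

// Forward+ style lighting loop in the fragment shader (sketch).
// A compute shader has already filled tileLightIndices with the lights
// overlapping each 16x16 screen tile.
#version 430 core
struct Light
{
    vec4 positionRadius; // xyz = position, w = radius
    vec4 colour;
};
layout(std430, binding = 0) buffer LightBuffer      { Light lights[]; };
layout(std430, binding = 1) buffer TileLightCounts  { uint tileLightCount[]; };
layout(std430, binding = 2) buffer TileLightIndices { uint tileLightIndices[]; };

uniform int tilesX;                   // number of tiles across the screen
const int TILE_SIZE           = 16;
const int MAX_LIGHTS_PER_TILE = 256;

in vec3 worldPos;
in vec3 worldNormal;
out vec4 fragColour;

// A deliberately simple point-light term, just to make the sketch complete.
vec3 shadeLight(Light l, vec3 p, vec3 n)
{
    vec3 toLight = l.positionRadius.xyz - p;
    float dist   = length(toLight);
    float atten  = clamp(1.0 - dist / l.positionRadius.w, 0.0, 1.0);
    float ndotl  = max(dot(n, toLight / max(dist, 1e-5)), 0.0);
    return l.colour.rgb * ndotl * atten;
}

void main()
{
    ivec2 tile    = ivec2(gl_FragCoord.xy) / TILE_SIZE;
    int tileIndex = tile.y * tilesX + tile.x;

    vec3 result = vec3(0.0);
    uint count  = tileLightCount[tileIndex];
    for (uint i = 0u; i < count; ++i)
    {
        uint lightIndex = tileLightIndices[uint(tileIndex * MAX_LIGHTS_PER_TILE) + i];
        result += shadeLight(lights[lightIndex], worldPos, worldNormal);
    }
    fragColour = vec4(result, 1.0);
}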

As for the G-Buffer, as small as you can make it is still the rule of thumb. In general, a bit of packing/unpacking shader code is worth it if it lets you use a smaller texture format. Reconstructing position from depth is absolutely the way to go, since it lets you save 3 G-Buffer channels. Storing position in a G-Buffer can also give you precision problems, unless you go for full 32-bit floats.
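For example, the position reconstruction is only a few lines once you have the depth buffer bound in the lighting pass. A sketch, assuming the default OpenGL [0,1] depth range and a uniform holding inverse(projection * view):

// Reconstruct world-space position from the depth buffer (sketch).
// "uv" is the screen-space texcoord in [0,1].
vec3 reconstructWorldPos(vec2 uv, float depth, mat4 invViewProj)
{
    vec4 ndc   = vec4(uv * 2.0 - 1.0, depth * 2.0 - 1.0, 1.0);
    vec4 world = invViewProj * ndc;
    return world.xyz / world.w;
}

// Usage in the lighting pass:
//   float depth   = texture(depthTex, uv).r;
//   vec3 worldPos = reconstructWorldPos(uv, depth, invViewProj);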

Thanks so much for your replies.

If I download the FXAA 3.9 shader and integrate it in my pipeline, will it help beyond allowing me to avoid blurring HUD/text elements? The reason I ask is that I have a lot of objects in my scene with long, straight edges -- particularly buildings and chain link fences, but also some vehicles as well -- with which FXAA seems to work particularly poorly. At a distance, these objects create wild, shimmering jaggies that are very distracting. Will downloading the shader actually improve this? Here are a couple examples:

[Screenshots: distant buildings and chain-link fences with broken, aliased edges]

As you move the viewpoint around, those broken lines crawl and shimmer.

I've spent the last couple hours reading through the links that you provided, MJP, and it has given me a lot of food for thought.

I'm particularly intrigued by the Forward+ idea, because the idea of using an MRT with HDR and MSAA is starting to sound prohibitive. Let's say I use the G-Buffer layout that you mentioned in your March 2012 blog post on Light-Indexed Deferred rendering, except the albedo buffers need to be bumped up to 64bpp to accommodate HDR rendering (right?). Then, multiply the whole thing by 4 for 4x MSAA, and I have a seriously fat buffer. And what do I do about reflection textures? If I want to do planar reflections or refractions, for example. That seems like it'd be another big fat g-buffer. Am I thinking about this correctly? Plus, you have the lack of flexibility with material parameters that comes with deferred rendering.

Edit: On the other hand, isn't it somewhat expensive to loop through a runtime-determined number of lights inside a fragment shader? If it isn't, then why did the old forward renderers bother compiling different shaders for different numbers of lights? Why did they not, instead, just allow 8 lights per shader (say), and use a uniform (numLights) to determine how many to actually loop through? Sure, you only get per-object light lists that way, which is imprecise, but is it really slower than having a separate compute shader step that determines the light list on a per-pixel basis?

But if I do end up going the Deferred w/ MSAA route (which I'm kind of leaning toward), using edge detection to find the pixels I actually want to do per-sample lighting on, and doing per-pixel lighting for all others, sounds like it will be a huge time-saver, even if I have to eat a huge amount of memory.

And in any case, the information you provided helped me discover all sorts of inefficiencies with the way we're doing deferred shading, so it looks like I can probably gain some speed just with a few optimizations.

If I download the FXAA 3.9 shader and integrate it in my pipeline, will it help beyond allowing me to avoid blurring HUD/text elements?

There are a lot of options in the FXAA shader that you can tweak yourself.
Also, if you don't actually integrate it into your pipeline, then it's not actually in your game. It's pretty unkind to your users to say "to make my game look best, go into your nVidia driver panel (sorry ATI/intel users) and enable these hacks".

I have a lot of objects in my scene with long, straight edges -- particularly buildings and chain link fences, but also some vehicles as well -- with which FXAA seems to work particularly poorly. At a distance, these objects create wild, shimmering jaggies that are very distracting. Will downloading the shader actually improve this? Here are a couple examples:

In those example pictures, there's a lot of "information" missing -- e.g. lines that have pixels missing from them, so they've become dashed lines rather than solid lines.
Post-processing AA techniques (like FXAA) will not be able to repair these lines.
MSAA / super-sampling techniques will mitigate this issue, making the threshold for a solid line turning into a dashed line smaller.
Another solution is to fix the data going into the renderer -- if you've got tiny bits of geometry that are going to end up being thinner than a pixel, they should be replaced with some kind of low-LOD version of themselves. For example, if those fences were drawn as a textured plane with alpha blending, you'd get soft anti-aliased lines by default, even with no anti-aliasing technique used.

Edit: On the other hand, isn't it somewhat expensive to loop through a runtime-determined number of lights inside a fragment shader?

On D3D9-era GPUs, yes, the branch instructions are quite costly. The shader compiler will also turn the loop into something like:
for( int i=0; i!=255; ++i ) { if( i >= g_lightCount ) break; ..... }
On more modern cards, branching is less costly. The biggest worry is when nearby pixels take different branches (e.g. one pixel wants to loop through 8 lights, but its neighbour wants to loop through 32 lights) -- but most of these Forward+-ish techniques mitigate this issue by clustering/tiling pixels together, so that neighbouring pixels are likely to use the same loop counters and data.


Also, if you don't actually integrate it into your pipeline, then it's not actually in your game. It's pretty unkind to your users to say "to make my game look best, go into your nVidia driver panel (sorry ATI/intel users) and enable these hacks".

Hah, good point. This isn't a product that I plan on releasing into the wild, so that's not even something that I considered.


On more modern cards, branching is less costly. The biggest worry is when nearby pixels take different branches (e.g. one pixel wants to loop through 8 lights, but its neighbour wants to loop through 32 lights) -- but most of these Forward+-ish techniques mitigate this issue by clustering/tiling pixels together, so that neighbouring pixels are likely to use the same loop counters and data.

Could I trouble you for more information about this? Why is it that nearby pixels like to have the same number of lights, and what is meant by "nearby"? Would an 8x8 tile suffice? (That seems to be the common size, from what I've been reading.)

GPUs use SIMD to run your shader programs on many sets of data at once -- e.g. every time one of its "CPU cores" executes an instruction, it is actually executing that instruction on, say, 64 different pixels.
If just one of those pixels takes a particular branch, the "CPU core" has to execute the instructions on that branch. To get the correct results, it keeps the results for that one pixel, and ignores results for the other 63 pixels that didn't take the branch.

There are no real portable rules about this; every GPU is free to implement things differently... but I think an 8x8 tile (64 pixels) would be a decent rule of thumb for the kind of coherency you'd like to have.

Secondly, a different but related concept is that pixel shaders are generally executed on a "quad" of 2x2 pixels; this is so that mipmapping/gradient instructions (e.g. ddx/ddy) can be implemented. If a triangle only covers a single pixel, the shader will likely still be computed for a whole 2x2 quad, with 3 of the pixels being thrown away.

So, if a GPU-core shades 64 pixels at a time, this really means that it shades 16 2x2 quads at a time.

This is fairly portable behaviour AFAIK - so your "tiles" should always have even dimensions -- e.g. 8x8 not 9x9 or 7x7.

Secondly, a different but related concept is that pixel shaders are generally executed on a "quad" of 2x2 pixels; this is so that mipmapping/gradient instructions (e.g. ddx/ddy) can be implemented. If a triangle only covers a single pixel, the shader will likely still be computed for a whole 2x2 quad, with 3 of the pixels being thrown away.

Not relevant to OP, but that just answered a question I've had for a while. I assumed that was how derivative functions worked, but for some reason I'd always wondered how mipmapping worked in shaders. It never occurred to me that that would be all it took.

-~-The Cow of Darkness-~-

There is a document that deals with that issue of jagged lines on fences, power cables, and such.

Anti-aliasing alone can't help you since the lines off in the distance end up being much less than one pixel. Look for the section titled "Phone-wire Anti-Aliasing"

http://www.humus.name/Articles/Persson_GraphicsGemsForGames.pdf
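The gist of it, as far as I understand the slides, is to clamp the wire's radius in the vertex shader so it never projects to less than about a pixel, and then fade its alpha by the ratio to compensate. Very roughly, in GLSL (the vertex inputs and the pixelScale uniform here are assumptions about how you'd feed the wire geometry in):

// Phone-wire AA, roughly as described in Persson's slides (sketch).
#version 330 core
uniform mat4 viewProj;
uniform float pixelScale;   // e.g. 2.0 * tan(0.5 * fovY) / viewportHeight (an assumption)

in vec3 wireCentre;         // point on the wire's centre line (world space)
in vec3 wireOffsetDir;      // unit vector from the centre line towards this vertex
in float wireRadius;        // authored radius of the wire

out float wireFade;         // multiply this into the fragment's alpha

void main()
{
    // World-space size of a pixel at this vertex's depth (clip w is the view
    // distance for a standard perspective projection).
    vec4 centreClip   = viewProj * vec4(wireCentre, 1.0);
    float pixelRadius = 0.5 * centreClip.w * pixelScale;

    // Never let the wire get thinner than a pixel; fade alpha to compensate.
    float radius = max(wireRadius, pixelRadius);
    wireFade     = wireRadius / radius;

    gl_Position = viewProj * vec4(wireCentre + wireOffsetDir * radius, 1.0);
}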

Consider it pure joy, my brothers and sisters, whenever you face trials of many kinds, 3 because you know that the testing of your faith produces perseverance. 4 Let perseverance finish its work so that you may be mature and complete, not lacking anything.

The power cables in our scene are basically just rectangular strips, onto which a texture of a catenary-shaped wire is mapped. This was done by a modeler long before my time, so I don't know why he decided to do it that way, but one fortunate side-effect is that the texture filtering takes care of everything for me. The fence poles and light posts are a different story, but because they never end up being much smaller than a pixel, the 4x MSAA seems to do an okay job on them. Thanks for the PDF, though, it has a lot of useful information.

Hodgman, thanks again for your help. I'm much obliged to everyone here. Although I must say, you guys are only encouraging me to ask more questions. =)

