Yet another Deferred Shading / Anti-aliasing discussion...



#1 CDProp   Members   -  Reputation: 1047


Posted 03 July 2013 - 12:59 PM

Hi. My apologies if this discussion has been played out already. This topic seems to come up a lot, but I did a quick search and did not quite find the information I was looking for. I'm interested in knowing what is considered the best practice these days, with respect to deferred rendering and anti-aliasing. These are the options, as I understand them:

 

Use some post-processed blur like FXAA.

I've tried enabling NVidia's built-in FXAA support, but the results were not nearly acceptable. Maybe there is another technique that can do a better job?

 

Use a multi-sampled MRT, and then handle your own MSAA resolve.

I've never done this before, and I'm anxious to try it for the sake of learning how, but it is difficult for me to understand how this is much better than super-sampling. If I understand MSAA correctly, the memory requirements are the same as for super-sampling. The only difference is that your shader is called fewer times. However, with deferred shading, this really only seems to help save a few material shader fragments, which don't seem very expensive in the first place. Unless I'm missing something, you still have to do your lighting calculations once per sample, even if all of the samples have the same exact data in them. Are the material shader savings (meager, I'm guessing) really worth all of the hassle?

 

Use Deferred Lighting instead of Deferred Shading.

You'll still have aliased lighting, though, and it comes at the expense of an extra pass (albeit depth-only, if I understand the technique correctly). Is anybody taking this option these days?

 

Use TXAA

NVidia is touting some TXAA technique on their website, although details seem slim. It seems to combine 2X MSAA with some sort of post-process technique. Judging from their videos, the results look quite acceptable, unlike FXAA. I'm guessing that the 2X MSAA would be handled using your own custom MSAA resolve, as described above, but I don't know what processing happens after that.

 

These all seem like valid options to try, although none of them seem to be from the proverbial Book. It seems to me, though, that forward rendering is a thing of the past, and I would love to be able to fill my scene with lights. I could try implementing all of these techniques as an experiment, but since they each come with a major time investment and learning curve, I was hoping that someone could help point a lost soul in the right direction.

 

Bonus questions: Is there a generally agreed-upon way to lay out your G-Buffer? I'd like to use this in conjunction with HDR, and so memory/bandwidth could start to become a problem, I would imagine. Is it still best practice to try to reconstruct position from depth? Are half-float textures typically used? Are any of the material properties packed or compressed in a particular way? Are normals stored as x/y coordinates, with the z-coord calculated in the shader?

 

I'm using OpenGL, if it matters.



#2 ic0de   Members   -  Reputation: 909


Posted 03 July 2013 - 01:28 PM

I use FXAA, but I don't use what's built into the graphics driver. What you can do for better results is download the FXAA 3.9 shader (it used to be on Timothy Lottes' blog, but I can't find it anymore); it has some conditional compilation set up in it which you can use to tweak the settings. This method is far better than using the graphics driver because you can apply it at a more optimal spot in your rendering pipeline (preventing some unwanted blur). Amazingly, the same shader works for both HLSL and GLSL, and it will work on Intel and AMD GPUs as well as consoles. It is important to note that you must have some method of generating luminance before running the FXAA shader (this is pretty trivial).
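For what it's worth, that luminance step can be a single full-screen pass that stashes luma in the alpha channel before FXAA runs. A rough, untested GLSL sketch (names are mine; FXAA 3.x reads luma from alpha by default):

// Full-screen pass: copy the scene color and store a perceptual luma
// value in alpha, which the FXAA shader samples by default.
#version 330 core
uniform sampler2D sceneColor;
in vec2 texCoord;
out vec4 fragColor;

void main()
{
    vec3 color = texture(sceneColor, texCoord).rgb;
    // Rec. 601 weights; FXAA only needs a rough luminance estimate.
    float luma = dot(color, vec3(0.299, 0.587, 0.114));
    fragColor = vec4(color, luma);
}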


Edited by ic0de, 03 July 2013 - 01:28 PM.

you know you program too much when you start ending sentences with semicolons;


#3 MJP   Moderators   -  Reputation: 11774


Posted 03 July 2013 - 02:26 PM

FXAA is good in that it's really easy to implement and really cheap, but the quality is not great. It has limited information to work with, and it's completely incapable of handling temporal issues due to its lack of sub-pixel information. If you use it, you definitely want to do as ic0de recommends and grab the shader code and insert it into your post-processing chain, as opposed to letting the driver do it, so that you can avoid applying it to things like text and UI. There's also MLAA, which has similar benefits and problems.
 

You are correct that the "running the shader per-pixel" bit of MSAA only works for writing out your G-Buffer. The trick is to use some method of figuring out which pixels actually have different G-Buffer values in them, and then apply per-sample lighting only to those pixels while applying per-pixel lighting to the rest. For deferred renderers that use fragment shaders and lighting volumes, the typical way to do this is to generate a stencil mask and draw each light twice: once with a fragment shader that uses per-pixel lighting, and once with a fragment shader that uses per-sample lighting. For tiled compute shader deferred renderers you can instead "bucket" per-sample pixels into a list that you build in thread group shared memory, and handle them separately after shading the first sample of all pixels.
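The edge-detection part can be a full-screen pass over the multisampled G-Buffer that discards pixels whose samples all match, so that only the surviving (edge) fragments mark the stencil buffer. An untested GLSL sketch, assuming normal + linear depth packed in one target and made-up thresholds:

#version 330 core
// One G-Buffer target: normal.xyz in rgb, linear depth in a.
uniform sampler2DMS gNormalDepth;
uniform int numSamples; // e.g. 4 for 4x MSAA

out vec4 fragColor;

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    vec4 first = texelFetch(gNormalDepth, coord, 0);
    bool edge = false;
    for (int i = 1; i < numSamples; ++i)
    {
        vec4 s = texelFetch(gNormalDepth, coord, i);
        // Flag the pixel if any sample's normal or depth differs
        // noticeably from sample 0 (thresholds picked arbitrarily).
        if (dot(s.xyz, first.xyz) < 0.99 || abs(s.w - first.w) > 0.01)
        {
            edge = true;
            break;
        }
    }
    if (!edge)
        discard; // only edge pixels survive to write the stencil mask
    fragColor = vec4(0.0);
}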

Some links:

 

http://developer.amd.com/wordpress/media/2012/10/Deferred%20Shading%20Optimizations.pps

 

http://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines

 

I also wrote quite a bit about this in the deferred rendering chapter of the book that I worked on, and wrote some companion samples that you can find on CodePlex.

 

Deferred lighting, AKA light pre-pass, is basically dead at this point. It's only really useful if you want to avoid using multiple render targets, which was desirable on a particular current-gen console. If MRT isn't an issue then it will only make things worse for you, especially with regard to MSAA.

TXAA is just an extension of MSAA, so you need to get MSAA working before considering a similar approach. Same with SMAA, which basically combines MSAA and MLAA.

Forward rendering is actually making a big comeback in the form of "Forward+", which is essentially a modern variant of light indexed deferred rendering. Basically you use a compute shader to write out a list of lights that affect each screen-space tile (usually 16x16 pixels or so) and then during your forward rendering pass each pixel walks the list and applies each light. When you do this MSAA still works the way it's supposed to, at least for the main rendering pass. If you search around you'll find some info and some sample code.
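The shading side of that is pretty simple once the lists exist. A rough, untested GLSL 4.3 sketch (the buffer layout, 16x16 tiles, and plain diffuse lighting are all just assumptions for illustration):

#version 430 core
struct Light { vec4 positionRadius; vec4 color; };
layout(std430, binding = 0) buffer Lights { Light lights[]; };
layout(std430, binding = 1) buffer TileLists { uint tileLightIndices[]; };
layout(std430, binding = 2) buffer TileCounts { uint tileLightCounts[]; };

uniform uvec2 numTiles;         // screen size / 16, rounded up
uniform uint maxLightsPerTile;

in vec3 worldPos;
in vec3 worldNormal;
out vec4 fragColor;

void main()
{
    // Find this pixel's tile and walk its light list.
    uvec2 tile = uvec2(gl_FragCoord.xy) / 16u;
    uint tileIndex = tile.y * numTiles.x + tile.x;
    uint base = tileIndex * maxLightsPerTile;

    vec3 lighting = vec3(0.0);
    for (uint i = 0u; i < tileLightCounts[tileIndex]; ++i)
    {
        Light light = lights[tileLightIndices[base + i]];
        vec3 toLight = light.positionRadius.xyz - worldPos;
        float dist = length(toLight);
        // Simple diffuse term with a linear radius falloff.
        float atten = max(1.0 - dist / light.positionRadius.w, 0.0);
        float ndotl = max(dot(worldNormal, toLight / dist), 0.0);
        lighting += light.color.rgb * (ndotl * atten);
    }
    fragColor = vec4(lighting, 1.0); // albedo, specular, etc. omitted
}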

As for the G-Buffer, "as small as you can make it" is still the rule of thumb. In general, a bit of packing/unpacking shader code is worth it if it lets you use a smaller texture format. Reconstructing position from depth is absolutely the way to go, since it saves you three G-Buffer channels. Storing position in a G-Buffer can also give you precision problems, unless you go for full 32-bit floats.
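The depth reconstruction itself is only a few lines. An untested sketch for OpenGL, assuming a standard perspective projection and the hardware depth buffer:

// Reconstruct view-space position from the depth buffer.
// texCoord is the full-screen UV in [0,1]; invProjection is the
// inverse of the projection matrix the scene was rendered with.
vec3 reconstructViewPos(sampler2D depthTex, vec2 texCoord, mat4 invProjection)
{
    float depth = texture(depthTex, texCoord).r;
    // Back to normalized device coordinates ([-1,1] on all axes in GL).
    vec4 ndc = vec4(texCoord * 2.0 - 1.0, depth * 2.0 - 1.0, 1.0);
    vec4 viewPos = invProjection * ndc;
    return viewPos.xyz / viewPos.w; // undo the perspective divide
}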
 


Edited by MJP, 04 July 2013 - 02:06 PM.


#4 CDProp   Members   -  Reputation: 1047


Posted 03 July 2013 - 06:49 PM

Thanks so much for your replies.

 

If I download the FXAA 3.9 shader and integrate it in my pipeline, will it help beyond allowing me to avoid blurring HUD/text elements? The reason I ask is that I have a lot of objects in my scene with long, straight edges -- particularly buildings and chain link fences, but also some vehicles -- with which FXAA seems to work particularly poorly. At a distance, these objects create wild, shimmering jaggies that are very distracting. Will downloading the shader actually improve this? Here are a couple of examples:

 

[Screenshots: distant straight-edged geometry whose edges have broken up into dashed, stair-stepped lines]

 

As you move the viewpoint around, those broken lines crawl and shimmer.

 

I've spent the last couple hours reading through the links that you provided, MJP, and it has given me a lot of food for thought. 

 

I'm particularly intrigued by the Forward+ idea, because the idea of using an MRT with HDR and MSAA is starting to sound prohibitive. Let's say I use the G-Buffer layout that you mentioned in your March 2012 blog post on Light-Indexed Deferred rendering, except the albedo buffers need to be bumped up to 64bpp to accommodate HDR rendering (right?). Then, multiply the whole thing by 4 for 4x MSAA, and I have a seriously fat buffer. And what do I do about reflection textures? If I want to do planar reflections or refractions, for example. That seems like it'd be another big fat g-buffer. Am I thinking about this correctly? Plus, you have the lack of flexibility with material parameters that comes with deferred rendering. 

 

Edit: On the other hand, isn't it somewhat expensive to loop through a runtime-determined number of lights inside a fragment shader? If it isn't, then why did the old forward renderers bother compiling different shaders for different numbers of lights? Why did they not, instead, just allow 8 lights per shader (say) and use a uniform (numLights) to determine how many to actually loop through? Sure, you only get per-object light lists that way, which is imprecise, but is it really slower than having a separate compute shader step that determines the light list on a per-pixel basis?

 

But if I do end up going the Deferred w/ MSAA route (which I'm kind of leaning toward), using edge detection to find the pixels I actually want to do per-sample lighting on, and doing per-pixel lighting for all others, sounds like it will be a huge time-saver, even if I have to eat a huge amount of memory.

 

And in any case, the information you provided helped me discover all sorts of inefficiencies with the way we're doing deferred shading, so it looks like I can probably gain some speed just with a few optimizations.


Edited by CDProp, 03 July 2013 - 06:57 PM.


#5 Hodgman   Moderators   -  Reputation: 31851


Posted 03 July 2013 - 07:44 PM

If I download the FXAA 3.9 shader and integrate it in my pipeline, will it help beyond allowing me to avoid blurring HUD/text elements?

There are a lot of options in the FXAA shader that you can tweak yourself.
Also, if you don't actually integrate it into your pipeline, then it's not actually in your game. It's pretty unkind to your users to say "to make my game look best, go into your nVidia driver panel (sorry ATI/Intel users) and enable these hacks".

I have a lot of objects in my scene with long, straight edges -- particularly buildings and chain link fences, but also some vehicles -- with which FXAA seems to work particularly poorly. At a distance, these objects create wild, shimmering jaggies that are very distracting. Will downloading the shader actually improve this? Here are a couple of examples:

In those example pictures, there's a lot of "information" missing -- e.g. lines that have pixels missing from them, so they've become dashed lines rather than solid lines.
Post-processing AA techniques (like FXAA) will not be able to repair these lines.
MSAA / super-sampling techniques will mitigate this issue, pushing out the point at which a solid line breaks up into a dashed one.
Another solution is to fix the data going into the renderer -- if you've got tiny bits of geometry that are going to end up thinner than a pixel, they should be replaced with some kind of low-LOD version of themselves. e.g. if those fences were drawn as a textured plane with alpha blending, you'd get soft anti-aliased lines by default, even with no anti-aliasing technique used.

Edit: On the other hand, isn't it somewhat expensive to loop through a runtime-determined number of lights inside a fragment shader?

On D3D9-era GPUs, yes, the branch instructions are quite costly. The compiler will also turn the loop into something like:
for( int i=0; i!=255; ++i ) { if( i >= g_lightCount ) break; ..... }
On more modern cards, branching is less costly. The biggest worry is when nearby pixels take different branches (e.g. one pixel wants to loop through 8 lights, but its neighbour wants to loop through 32 lights) -- but most of these Forward+-ish techniques mitigate this issue by clustering/tiling pixels together, so that neighbouring pixels are likely to use the same loop counters and data.

Edited by Hodgman, 03 July 2013 - 07:47 PM.


#6 CDProp   Members   -  Reputation: 1047


Posted 03 July 2013 - 08:04 PM


Also, if you don't actually integrate it into your pipeline, then it's not actually in your game. It's pretty unkind to your users to say "to make my game look best, go into your nVidia driver panel (sorry ATI/intel users) and enable these hacks".

 

Hah, good point. This isn't a product that I plan on releasing into the wild, so that's not even something that I considered.

 


On more modern cards, branching is less costly. The biggest worry is when nearby pixels take different branches (e.g. one pixel wants to loop through 8 lights, but its neighbour wants to loop through 32 lights) -- but most of these Forward+-ish techniques mitigate this issue by clustering/tiling pixels together, so that neighbouring pixels are likely to use the same loop counters and data.

 

Could I trouble you for more information about this? Why is it that nearby pixels like to have the same number of lights, and what is meant by "nearby"? Would an 8x8 tile suffice? (That seems to be the common size, from what I've been reading.)



#7 Hodgman   Moderators   -  Reputation: 31851


Posted 03 July 2013 - 08:15 PM

GPUs use SIMD to run your shader programs on many sets of data at once -- e.g. every time one of its "CPU cores" executes an instruction, it is actually executing that instruction on, say, 64 different pixels.
If just one of those pixels takes a particular branch, the "CPU core" has to execute the instructions on that branch. To get the correct results, it keeps the results for that one pixel and ignores the results for the other 63 pixels that didn't take the branch.

 

There are no real portable rules about this; every GPU is free to implement things differently... but I think an 8x8 tile (64 pixels) would be a decent rule of thumb for the kind of coherency you'd like to have.

 

 

Secondly, a different but related concept is that pixel shaders are generally executed on a "quad" of 2x2 pixels; this is so that mipmapping/gradient instructions (e.g. ddx/ddy) can be implemented. If a triangle only covers a single pixel, the shader will likely still be computed for a whole 2x2 quad, with 3 of the pixels being thrown away.

So, if a GPU-core shades 64 pixels at a time, this really means that it shades 16 2x2 quads at a time.

This is fairly portable behaviour AFAIK - so your "tiles" should always have even dimensions -- e.g. 8x8 not 9x9 or 7x7.
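To make the mipmapping connection concrete: the gradient instructions are just differences between neighbouring pixels in the quad, and the mip level falls out of how far the UVs move per pixel. A sketch of the idea (not any particular GPU's exact formula):

// dFdx/dFdy are differences across the 2x2 quad; the mip level is
// roughly log2 of the texel-space footprint of one pixel.
float mipLevel(vec2 uv, vec2 textureSize)
{
    vec2 dx = dFdx(uv * textureSize);
    vec2 dy = dFdy(uv * textureSize);
    float maxSqr = max(dot(dx, dx), dot(dy, dy));
    return 0.5 * log2(maxSqr); // 0.5 * log2(x^2) == log2(x)
}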



#8 cowsarenotevil   Crossbones+   -  Reputation: 2103


Posted 03 July 2013 - 09:23 PM

Secondly, a different but related concept is that pixel shaders are generally executed on a "quad" of 2x2 pixels; this is so that mipmapping/gradient instructions (e.g. ddx/ddy) can be implemented. If a triangle only covers a single pixel, the shader will likely still be computed for a whole 2x2 quad, with 3 of the pixels being thrown away.

 

Not relevant to OP, but that just answered a question I've had for a while. I assumed that was how derivative functions worked, but for some reason I'd always wondered how mipmapping worked in shaders. It never occurred to me that that would be all it took.


-~-The Cow of Darkness-~-

#9 marcClintDion   Members   -  Reputation: 431


Posted 03 July 2013 - 09:28 PM

There is a document that deals with that issue of jagged lines on fences and power cables and such.

 

Anti-aliasing alone can't help you, since the lines off in the distance end up being much less than one pixel wide. Look for the section titled "Phone-wire Anti-Aliasing".

 

http://www.humus.name/Articles/Persson_GraphicsGemsForGames.pdf
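The gist of the technique, as I understand it: never let the wire get thinner than about a pixel on screen, and fade its alpha by the ratio instead, so distant wires dim out rather than sparkle. An untested vertex-shader sketch (attribute names and the pixel-size estimate are my own):

#version 330 core
uniform mat4 viewProjection;
uniform float pixelScale; // ~ 2 * tan(fovY/2) / viewportHeight

in vec3 wireCenter;  // position on the wire's center line
in vec3 offsetDir;   // unit vector from the center line to this vertex
in float wireRadius; // the wire's true radius

out float fade;      // multiplied into the fragment's alpha

void main()
{
    vec4 clipCenter = viewProjection * vec4(wireCenter, 1.0);
    // Approximate world-space size of half a pixel at this depth
    // (clipCenter.w is the view-space depth for a standard projection).
    float pixelRadius = 0.5 * pixelScale * clipCenter.w;
    float radius = max(wireRadius, pixelRadius);
    fade = wireRadius / radius; // < 1 when the wire had to be fattened
    gl_Position = viewProjection * vec4(wireCenter + offsetDir * radius, 1.0);
}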


Consider it pure joy, my brothers and sisters, whenever you face trials of many kinds, because you know that the testing of your faith produces perseverance. Let perseverance finish its work so that you may be mature and complete, not lacking anything.


#10 CDProp   Members   -  Reputation: 1047


Posted 03 July 2013 - 09:37 PM

The power cables in our scene are basically just rectangular strips, onto which a texture of a catenary-shaped wire is mapped. This was done by a modeler long before my time, so I don't know why he decided to do it that way, but one fortunate side-effect is that the texture filtering takes care of everything for me. The fence poles and light posts are a different story, but because they never end up being much smaller than a pixel, the 4x MSAA seems to do an okay job on them. Thanks for the PDF, though, it has a lot of useful information.

 

Hodgman, thanks again for your help. I'm much obliged to everyone here. Although I must say, you guys are only encouraging me to ask more questions. =)



#11 MJP   Moderators   -  Reputation: 11774


Posted 04 July 2013 - 12:00 AM

If I download the FXAA 3.9 shader and integrate it in my pipeline, will it help beyond allowing me to avoid blurring HUD/text elements? The reason I ask is that I have a lot of objects in my scene with long, straight edges -- particularly buildings and chain link fences, but also some vehicles -- with which FXAA seems to work particularly poorly. At a distance, these objects create wild, shimmering jaggies that are very distracting. Will downloading the shader actually improve this? Here are a couple of examples:

 

This is one of the cases that post-processing AA solutions like FXAA have a lot of difficulty with. You really need to rasterize at a higher resolution to make high-frequency geometry look better, and that's exactly what MSAA does. Something like FXAA is fundamentally limited in terms of the information it has available to it, which makes it unable to fix these sorts of situations. Some sort of temporal solution that looks at data from the previous frame can help, but is still usually less effective than MSAA.

 

I'm particularly intrigued by the Forward+ idea, because the idea of using an MRT with HDR and MSAA is starting to sound prohibitive. Let's say I use the G-Buffer layout that you mentioned in your March 2012 blog post on Light-Indexed Deferred rendering, except the albedo buffers need to be bumped up to 64bpp to accommodate HDR rendering (right?). Then, multiply the whole thing by 4 for 4x MSAA, and I have a seriously fat buffer. And what do I do about reflection textures? If I want to do planar reflections or refractions, for example. That seems like it'd be another big fat g-buffer. Am I thinking about this correctly? Plus, you have the lack of flexibility with material parameters that comes with deferred rendering. 

 

Albedo values should always be [0, 1], since they're essentially the ratio of light reflecting off a surface. With HDR the input lighting values are often > 1, and the same goes for the output lighting value, but albedo is always [0, 1]. Even so, it's true that a G-Buffer with 4x MSAA enabled can use up quite a bit of memory, which is definitely a disadvantage. Material parameters can also potentially be an issue: if you require a lot of input parameters to your lighting, then you need a lot of G-Buffer textures, which increases memory usage and bandwidth. With forward rendering you don't necessarily need to think about what parameters need to be packed into your G-Buffer, which can potentially make it easier to experiment with new lighting models.
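To put rough numbers on it (my own back-of-envelope arithmetic, not anything exact): at 1920x1080 with 4x MSAA, a single RGBA8 target is 1920 x 1080 x 4 samples x 4 bytes ≈ 32 MB, so four color targets plus a 32-bit depth-stencil buffer is already around 160 MB before counting any resolve or post-processing targets.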

 

Edit: On the other hand, isn't it somewhat expensive to loop through a runtime-determined number of lights inside a fragment shader? If it isn't, then why did the old forward renderers bother compiling different shaders for different numbers of lights? Why did they not, instead, just allow 8 lights per shader (say) and use a uniform (numLights) to determine how many to actually loop through? Sure, you only get per-object light lists that way, which is imprecise, but is it really slower than having a separate compute shader step that determines the light list on a per-pixel basis?

 

Sure, it can be expensive to loop through lights in a shader, but this is essentially what you do in any deferred renderer if you have multiple lights overlapping a given pixel. However, with traditional deferred rendering you end up sampling your G-Buffer and blending the fragment shader output for each light, which can consume quite a bit of bandwidth. With forward rendering or tiled deferred rendering you only need to sample your material parameters once, and the summing of light contributions happens in registers, which avoids excessive bandwidth usage. The main problem with older forward renderers is that older GPUs and shading languages lacked the flexibility needed to build per-tile lists and dynamically loop over them in a fragment shader: shaders did not have support for reading from generic buffers, and fragment shaders couldn't dynamically read indexed data from shader constants. You also didn't have compute shaders with shared memory, which is currently the best way to build per-tile lists of lights. But it's true that determining a set of lights per-object is fundamentally the same thing; the main difference is the level of granularity. Also, you typically do per-object association on the CPU, while with tiled forward or deferred you do the association on the GPU, using the depth buffer to determine whether a light affects a given tile.
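To illustrate the shared-memory part, here's a rough, untested GLSL compute sketch of building a per-tile list; the buffer layout matches the fragment sketch above, and the culling test is stubbed out (a real one tests the light's sphere against the tile's frustum, tightened by the tile's min/max depth):

#version 430
layout(local_size_x = 16, local_size_y = 16) in; // one work group per tile

struct Light { vec4 positionRadius; vec4 color; };
layout(std430, binding = 0) buffer Lights { Light lights[]; };
layout(std430, binding = 1) buffer TileLists { uint tileLightIndices[]; };
layout(std430, binding = 2) buffer TileCounts { uint tileLightCounts[]; };

uniform uint numLights;
uniform uint maxLightsPerTile;

shared uint tileCount; // built cooperatively by the whole group

// Placeholder for the real sphere-vs-tile-frustum test.
bool lightAffectsTile(Light light, uvec2 tile) { return true; }

void main()
{
    uint tileIndex = gl_WorkGroupID.y * gl_NumWorkGroups.x + gl_WorkGroupID.x;

    if (gl_LocalInvocationIndex == 0u)
        tileCount = 0u;
    barrier();

    // The 256 threads in the group stride through the light list together.
    for (uint i = gl_LocalInvocationIndex; i < numLights; i += 256u)
    {
        if (lightAffectsTile(lights[i], gl_WorkGroupID.xy))
        {
            uint slot = atomicAdd(tileCount, 1u);
            if (slot < maxLightsPerTile)
                tileLightIndices[tileIndex * maxLightsPerTile + slot] = i;
        }
    }
    barrier();

    if (gl_LocalInvocationIndex == 0u)
        tileLightCounts[tileIndex] = min(tileCount, maxLightsPerTile);
}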


Edited by MJP, 04 July 2013 - 12:01 AM.


#12 TiagoCosta   Crossbones+   -  Reputation: 2456


Posted 04 July 2013 - 04:17 AM

This goes a bit off-topic but I think it might be helpful:

 

Real-time Rendering Architectures (SIGGRAPH 2011) - Great presentation about how the GPUs work.

 

Intersecting Lights with Pixels (SIGGRAPH 2012) - Tiled Deferred vs Tiled Forward (and how to cull lights).



#13 johnchapman   Members   -  Reputation: 550


Posted 04 July 2013 - 09:06 AM

There's an adaptive supersampling approach outlined here which may also be of interest.





