If I download the FXAA 3.9 shader and integrate it in my pipeline, will it help beyond allowing me to avoid blurring HUD/text elements? The reason I ask is that I have a lot of objects in my scene with long, straight edges -- particularly buildings and chain link fences, but also some vehicles -- with which FXAA seems to work particularly poorly. At a distance, these objects create wild, shimmering jaggies that are very distracting. Will downloading the shader actually improve this? Here are a couple of examples:
This is one of the cases that post-processing AA solutions like FXAA have a lot of difficulty with. You really need to rasterize at a higher resolution to make high-frequency geometry look better, and that's exactly what MSAA does. Something like FXAA is fundamentally limited in terms of the information it has available to it, which makes it unable to fix these sorts of situations. Some sort of temporal solution that looks at data from the previous frame can help, but is still usually less effective than MSAA.
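To make the "limited information" point concrete, here is a rough one-dimensional sketch of a fence wire that is thinner than a pixel. It is only an illustration: the function names and the evenly spaced four-sample pattern are made up for this example, and real MSAA hardware uses 2D sample patterns and shades once per pixel. The idea is just that with one sample per pixel the wire is either fully hit or fully missed as it moves, while several samples per pixel give a stable partial coverage value.

```python
# Hypothetical 1D illustration: coverage of a thin "wire" along one scanline.
ONE_SAMPLE = [0.5]                            # single sample at the pixel center
FOUR_SAMPLES = [0.125, 0.375, 0.625, 0.875]   # assumed evenly spaced pattern

def coverage(left, width, pixel, sample_offsets):
    """Fraction of a pixel's sample points inside the wire [left, left + width)."""
    right = left + width
    hits = sum(1 for offset in sample_offsets if left <= pixel + offset < right)
    return hits / len(sample_offsets)

def scanline(left, width, num_pixels, sample_offsets):
    """Coverage value for every pixel on the scanline."""
    return [coverage(left, width, p, sample_offsets) for p in range(num_pixels)]

# A wire 0.3 pixels wide, drawn at two slightly different positions.
for left in (1.0, 1.3):
    print(left, scanline(left, 0.3, 4, ONE_SAMPLE), scanline(left, 0.3, 4, FOUR_SAMPLES))
```

With one sample, pixel 1 jumps between 0 and 1 when the wire shifts by 0.3 pixels, which is the shimmering. A post-process filter only ever sees that 0-or-1 result, so it can't recover the wire in frames where the sample missed it.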
I'm particularly intrigued by the Forward+ idea, because the idea of using an MRT with HDR and MSAA is starting to sound prohibitive. Let's say I use the G-Buffer layout that you mentioned in your March 2012 blog post on Light-Indexed Deferred rendering, except the albedo buffers need to be bumped up to 64bpp to accommodate HDR rendering (right?). Then, multiply the whole thing by 4 for 4x MSAA, and I have a seriously fat buffer. And what do I do about reflection textures? If I want to do planar reflections or refractions, for example. That seems like it'd be another big fat g-buffer. Am I thinking about this correctly? Plus, you have the lack of flexibility with material parameters that comes with deferred rendering.
Albedo values should always be in the range [0, 1], since they're essentially the ratio of light reflecting off a surface. With HDR the input lighting values are often > 1, and the same goes for the output lighting value, but albedo is always in [0, 1]. Even so, it's true that a G-Buffer with 4xMSAA enabled can use up quite a bit of memory, which is definitely a disadvantage. Material parameters can also potentially be an issue. If you require a lot of input parameters to your lighting, then you need a lot of G-Buffer textures, which increases memory usage and bandwidth. With forward rendering you don't necessarily need to think about which parameters have to be packed into your G-Buffer, which can make it easier to experiment with new lighting models.
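For a rough sense of scale, here is a back-of-the-envelope estimate. The four-target layout below is hypothetical (it is not the layout from the blog post), the 1920x1080 resolution is an assumption, and the numbers ignore any padding or compression the driver might apply.

```python
# Hypothetical G-Buffer layout: bytes per sample for each render target.
LAYOUT_8BIT_ALBEDO = {"depth_stencil": 4, "normals": 4, "albedo": 4, "specular": 4}
# Same layout with a 64bpp albedo target, as proposed in the question.
LAYOUT_64BPP_ALBEDO = dict(LAYOUT_8BIT_ALBEDO, albedo=8)

def gbuffer_bytes(width, height, layout, msaa_samples=1):
    """Total size: every target stores one value per sample per pixel."""
    bytes_per_sample = sum(layout.values())
    return width * height * bytes_per_sample * msaa_samples

def to_mib(num_bytes):
    return num_bytes / (1024 * 1024)

w, h = 1920, 1080  # assumed resolution
print(to_mib(gbuffer_bytes(w, h, LAYOUT_8BIT_ALBEDO)))      # 31.640625
print(to_mib(gbuffer_bytes(w, h, LAYOUT_8BIT_ALBEDO, 4)))   # 126.5625
print(to_mib(gbuffer_bytes(w, h, LAYOUT_64BPP_ALBEDO, 4)))  # 158.203125
```

So under these assumptions 4xMSAA takes the G-Buffer from about 32 MiB to about 127 MiB, and keeping albedo at 32bpp instead of 64bpp saves roughly another 32 MiB on top of that.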
Edit: On the other hand, isn't it somewhat expensive to loop through a runtime-determined number of lights inside a fragment shader? If it isn't, then why did the old forward renderers bother compiling different shaders for different numbers of lights? Why did they not, instead, just allow 8 lights per shader (say), and use a uniform (numLights) to determine how many to actually loop through? Sure, you only get per-object light lists that way, which is imprecise, but is it really slower than having a separate compute shader step that determines the light list on a per-pixel basis?
Sure, it can be expensive to loop through lights in a shader, but this is essentially what you do in any deferred renderer if you have multiple lights overlapping any given pixel. However, with traditional deferred rendering you end up sampling your G-Buffer and blending the fragment shader output for each light, which can consume quite a bit of bandwidth. With forward rendering or tiled deferred rendering you only need to sample your material parameters once, and the summing of light contributions happens in registers, which avoids excessive bandwidth usage. The main problem with older forward renderers is that older GPUs and shading languages lacked the flexibility needed to build per-tile lists and dynamically loop over them in a fragment shader. Shaders did not have support for reading from generic buffers, and fragment shaders couldn't dynamically read indexed data from shader constants. You also didn't have compute shaders with shared memory, which is currently the best way to build per-tile lists of lights. But it's true that determining a set of lights per object is fundamentally the same thing; the main difference is the level of granularity. Also, you typically do per-object association on the CPU, while with tiled forward or deferred you do the association on the GPU, using the depth buffer to determine if a light affects a given tile.
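Here is a minimal CPU-side sketch of the per-tile association, just to show the shape of the idea. Everything in it is an assumption made for illustration: the 16-pixel tile size, the `Light` fields, and the simplified test (a screen-space circle against the tile rectangle plus a depth-range overlap). A real implementation would typically run in a compute shader and test the light volume against the tile's frustum.

```python
from dataclasses import dataclass

TILE_SIZE = 16  # pixels per tile side (assumed)

@dataclass
class Light:            # hypothetical, simplified light description
    x: float            # screen-space center, in pixels
    y: float
    radius: float       # screen-space radius, in pixels
    z_min: float        # nearest depth the light volume reaches
    z_max: float        # farthest depth the light volume reaches

def build_tile_light_lists(lights, tile_depth_bounds, tiles_x, tiles_y):
    """tile_depth_bounds[ty][tx] is the (min, max) depth read from the depth buffer."""
    lists = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * TILE_SIZE, ty * TILE_SIZE
            x1, y1 = x0 + TILE_SIZE, y0 + TILE_SIZE
            z_near, z_far = tile_depth_bounds[ty][tx]
            for index, light in enumerate(lights):
                # Closest point on the tile rectangle to the light center.
                cx = min(max(light.x, x0), x1)
                cy = min(max(light.y, y0), y1)
                in_xy = (light.x - cx) ** 2 + (light.y - cy) ** 2 <= light.radius ** 2
                in_z = light.z_min <= z_far and light.z_max >= z_near
                if in_xy and in_z:
                    lists[ty][tx].append(index)
    return lists

def shade_pixel(albedo, light_indices, light_intensities):
    """Material is read once; light contributions are summed in a local variable."""
    total = 0.0
    for index in light_indices:
        total += light_intensities[index]
    return albedo * total
```

The per-object approach from the question is the same loop as `shade_pixel`, only with a list chosen per draw call on the CPU instead of per tile from the depth buffer.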