
#5075099 Yet another Deferred Shading / Anti-aliasing discussion...

Posted by on 03 July 2013 - 02:26 PM

FXAA is good in that it's really easy to implement and really cheap, but the quality is not great. It has limited information to work with, and it's completely incapable of handling temporal issues due to the lack of sub-pixel information. If you use it you definitely want to do as ic0de recommends and grab the shader code and insert it into your post-processing chain rather than letting the driver do it, so that you can avoid applying it to things like text and UI. There's also MLAA, which has similar benefits and problems.

You are correct that the "running the shader per-pixel" bit of MSAA only works for writing out your G-Buffer. The trick is to use some method of figuring out which pixels actually have different G-Buffer values in them, and then apply per-sample lighting only to those pixels while applying per-pixel lighting to the rest. For deferred renderers that use fragment shaders and lighting volumes, the typical way to do this is to generate a stencil mask and draw each light twice: once with a fragment shader that uses per-pixel lighting, and once with a fragment shader that uses per-sample lighting. For tiled compute shader deferred renderers you can instead "bucket" per-sample pixels into a list that you build in thread group shared memory, and handle them separately after shading the first sample of all pixels.
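
To make the idea concrete, here's a rough sketch of the kind of "edge detection" pass you could use to build that stencil mask. Everything here (GBufferNormals, NumSamples, the 0.99 threshold) is a made-up example, not code from an actual renderer:

// Flags pixels whose MSAA G-Buffer samples diverge, so that a stencil write
// state can route them to the per-sample lighting pass
Texture2DMS<float4> GBufferNormals : register(t0);

static const uint NumSamples = 4;   // assuming 4x MSAA for this sketch

float EdgeDetectPS(float4 pos : SV_Position) : SV_Target0
{
    int2 pixel = int2(pos.xy);
    float3 firstNormal = GBufferNormals.Load(pixel, 0).xyz;

    [unroll]
    for(uint i = 1; i < NumSamples; ++i)
    {
        float3 sampleNormal = GBufferNormals.Load(pixel, i).xyz;
        if(dot(firstNormal, sampleNormal) < 0.99f)
            return 1.0f;    // "edge" pixel -> needs per-sample lighting
    }

    discard;                // uniform pixel -> per-pixel lighting is enough
    return 0.0f;
}

You would run this as a full-screen pass with stencil writes enabled, so that only the surviving "edge" pixels end up marked in the stencil buffer. In practice you'd probably also compare depth or material IDs, not just normals.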

Some links:

I also wrote quite a bit about this in the deferred rendering chapter of the book that I worked on, and wrote some companion samples that you can find on CodePlex.


Deferred lighting, AKA light pre-pass, is basically dead at this point. It's only really useful if you want to avoid using multiple render targets, which was desirable on a particular current-gen console. If MRT isn't an issue then it will only make things worse for you, especially with regard to MSAA.

TXAA is just an extension of MSAA, so you need to get MSAA working before considering a similar approach. Same with SMAA, which basically combines MSAA and MLAA.

Forward rendering is actually making a big comeback in the form of "Forward+", which is essentially a modern variant of light indexed deferred rendering. Basically you use a compute shader to write out a list of lights that affect each screen-space tile (usually 16x16 pixels or so) and then during your forward rendering pass each pixel walks the list and applies each light. When you do this MSAA still works the way it's supposed to, at least for the main rendering pass. If you search around you'll find some info and some sample code.
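
For a rough picture of what the shading loop looks like, here's a sketch of the pixel-shader side, assuming a compute shader has already written a fixed-size, terminated light index list per tile (all of the names and the list layout are made up for illustration):

// Forward+ shading loop: each pixel walks its tile's light list
static const uint TileSize = 16;
static const uint MaxLightsPerTile = 256;

struct Light
{
    float3 Position;
    float Radius;
    float3 Color;
    float Padding;
};

StructuredBuffer<Light> Lights : register(t0);
Buffer<uint> LightIndices : register(t1);       // MaxLightsPerTile entries per tile, 0xFFFFFFFF-terminated
cbuffer PerFrame : register(b0)
{
    uint NumTilesX;
};

float3 ShadeForwardPlus(float4 screenPos, float3 position, float3 normal, float3 albedo)
{
    uint2 tile = uint2(screenPos.xy) / TileSize;
    uint listStart = (tile.y * NumTilesX + tile.x) * MaxLightsPerTile;

    float3 result = 0.0f;
    for(uint i = 0; i < MaxLightsPerTile; ++i)
    {
        uint lightIdx = LightIndices[listStart + i];
        if(lightIdx == 0xFFFFFFFF)
            break;                              // end of this tile's list

        Light light = Lights[lightIdx];
        float3 toLight = light.Position - position;
        float attenuation = saturate(1.0f - length(toLight) / light.Radius);
        float nDotL = saturate(dot(normal, normalize(toLight)));
        result += albedo * light.Color * attenuation * nDotL;
    }

    return result;
}

The nice part is that this runs as a normal forward pixel shader, so MSAA resolves your geometry edges the way it always has.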

As for the G-Buffer, the rule of thumb is still to make it as small as you can. In general, a bit of packing/unpacking shader code is worth it if it lets you use a smaller texture format. Reconstructing position from depth is absolutely the way to go, since it lets you save 3 G-Buffer channels. Storing position in a G-Buffer can also give you precision problems, unless you go for full 32-bit floats.
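
As an example of the depth reconstruction, here's a minimal sketch assuming a standard symmetric perspective projection (Projection is the same HLSL row-major matrix used to render the scene, uv is the screen-space texture coordinate, and zw is the depth buffer sample):

float3 ReconstructViewPos(float2 uv, float zw, float4x4 Projection)
{
    // Linearize the [0, 1] depth buffer value to view-space Z
    float linearZ = Projection._43 / (zw - Projection._33);

    // [0, 1] UV -> [-1, 1] clip space (with the Y flip), then undo the projection
    float2 clipXY = float2(uv.x * 2.0f - 1.0f, (1.0f - uv.y) * 2.0f - 1.0f);
    float viewX = clipXY.x * linearZ / Projection._11;
    float viewY = clipXY.y * linearZ / Projection._22;

    return float3(viewX, viewY, linearZ);
}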

#5075093 Bilateral Blur with linear depth?

Posted by on 03 July 2013 - 02:02 PM

You can linearize a sample from a depth buffer with just a tiny bit of math, using some values from your projection matrix:


// zw is the [0, 1] value sampled from the depth buffer, and Projection is the
// projection matrix that was used to render the scene (HLSL row-major layout)
float linearZ = Projection._43 / (zw - Projection._33);


That's using HLSL matrix syntax; you would have to convert it to the equivalent GLSL.
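
For reference, here's where those two matrix elements come from, assuming a standard left-handed D3DX-style perspective projection:

// Projection._33 = zFar / (zFar - zNear)
// Projection._43 = -zNear * zFar / (zFar - zNear)
//
// clip.z = viewZ * Projection._33 + Projection._43,   clip.w = viewZ
// zw = clip.z / clip.w = Projection._33 + Projection._43 / viewZ
// => viewZ = Projection._43 / (zw - Projection._33)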

#5074853 Shouldn't the vector that is multiplied with the projection matrix be 4D?

Posted by on 02 July 2013 - 03:16 PM


However, if I pass a 3D vector, will DirectX just add the 4th component to it or...? If there is no w, where is the z copied?


It depends on the math library that you're using, and which function you're using. Both the D3DX and DirectXMath libraries have 2 different vector/matrix transformation functions: one that uses 0 as the W component, and one that uses 1. The functions that end with "Coord" use 1 as the W component, and the functions that end with "Normal" use 0 as the W component.

EDIT: actually let me correct that, there are 3 functions:


D3DXVec3Transform/XMVector3Transform - this uses 1 as the W component, and returns a 4D vector containing the result of the multiplication

D3DXVec3TransformCoord/XMVector3TransformCoord - this uses 1 as the W component, and returns a 3D vector containing the XYZ result divided by the W result

D3DXVec3TransformNormal/XMVector3TransformNormal - this uses 0 as the W component, and returns a 3D vector containing the XYZ result
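
In terms of the raw math, the three behave roughly like this (an HLSL sketch for illustration, not the actual library code):

// D3DXVec3Transform / XMVector3Transform
float4 Transform(float3 v, float4x4 m)
{
    return mul(float4(v, 1.0f), m);             // W = 1, full 4D result
}

// D3DXVec3TransformCoord / XMVector3TransformCoord
float3 TransformCoord(float3 v, float4x4 m)
{
    float4 result = mul(float4(v, 1.0f), m);    // W = 1
    return result.xyz / result.w;               // homogeneous divide
}

// D3DXVec3TransformNormal / XMVector3TransformNormal
float3 TransformNormal(float3 v, float4x4 m)
{
    return mul(float4(v, 0.0f), m).xyz;         // W = 0, ignores translation
}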

#5074373 Changing LoD on instancing and alpha testing

Posted by on 01 July 2013 - 01:51 AM

What you basically want is the alpha to be a probability that a pixel is opaque. I would do it like this:

// alpha is the probability that this pixel survives
float random = tex2D(g_dissolvemap_sampler, a_texcoord0).r;
if(alpha < random)
    discard;

You can also use alpha-to-coverage, which will basically accomplish the same thing using a tiled screen-space dither pattern instead of a pure random pattern. You can also encode a set of dither patterns directly into a texture, and then look up into that texture based on screen position and alpha value.

#5074361 Global illumination techniques

Posted by on 01 July 2013 - 01:09 AM

I believe that voxel cone tracing is state of the art if you want to do real time GI. I think Crytek and Unreal 4 have it, though I'm not sure if anybody's shipped an actual game with it yet.


Epic has since moved away from it; they're using pre-baked lightmaps and specular probes now. Crytek was using an entirely different technique (Cascaded Light Propagation Volumes) which has a different set of tradeoffs. They shipped it for the PC version of Crysis 2 but not the console version, and I'm not sure if they used it in Crysis 3.

#5074359 Decals with deferred renderer

Posted by on 01 July 2013 - 01:04 AM

DX10 feature level requires full, independent blending support for multiple render targets. So any DX10-capable GPU should support blending and color write control for MRTs, assuming that the driver enables it for D3D9.

#5074208 Global illumination techniques

Posted by on 30 June 2013 - 12:45 PM

Just about all games with GI bake it to lightmaps, and this includes The Last of Us (although The Last of Us does have dynamic GI from your flashlight that's only enabled in a few indoor locations). Very few games have a runtime GI component, since existing techniques are typically expensive and don't end up looking as good as a static-baked result. Some games with deferred renderers try to get away with no GI at all, and just use some runtime or baked AO combined with lots of artist-placed "bounce lights" or "ambient lights" that try to fake GI.

#5073153 So, Direct3D 11.2 is coming :O

Posted by on 27 June 2013 - 12:19 AM

Input assembler moved completely into the vertex shader. You bind resources of pretty much any type to the vertex shader, and access them directly via texture look-ups. Would make things a lot simpler and more flexible IMHO. Granted you sort of can do this already, but it'd be nice if the GPUs/drivers were optimized for it.


GPUs already work this way. The driver generates a small bit of shader code that runs before the vertex shader (AMD calls it a fetch shader), and all it does is load data out of the vertex buffer and dump it into registers. If you did it all yourself in the vertex shader there's no real reason for it to be any slower.
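
As a rough illustration of "doing the IA's job yourself", here's a sketch of fetching vertex data manually with SV_VertexID (the struct and buffer names are made up for the example):

struct VertexData
{
    float3 Position;
    float3 Normal;
    float2 UV;
};

StructuredBuffer<VertexData> Vertices : register(t0);

cbuffer PerObject : register(b0)
{
    float4x4 WorldViewProjection;
};

struct VSOutput
{
    float4 Position : SV_Position;
    float3 Normal : NORMAL;
    float2 UV : TEXCOORD0;
};

VSOutput FetchVS(uint vertexID : SV_VertexID)
{
    // No input layout or vertex buffer bindings: just read the data directly
    VertexData vertex = Vertices[vertexID];

    VSOutput output;
    output.Position = mul(float4(vertex.Position, 1.0f), WorldViewProjection);
    output.Normal = vertex.Normal;
    output.UV = vertex.UV;
    return output;
}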


Depth/stencil/blend stage moved completely into the pixel shader. Sort of like UAVs but not necessarily with the ability to do 'scatter' operations. Could be exposed by allowing 'SV_Target0', 'SV_Target1' etc. to be read and written. So initially it's loaded with the value of the target, and it can be read, compared, operated on, and then if necessary written.


Programmable blending isn't happening without completely changing the way desktop GPUs handle pixel shader writes. TBDRs can do it since they work with an on-chip cache, but they can't really do arbitrary numbers of render targets.

Doing depth/stencil in the pixel shader deprives you of a major optimization opportunity. It would be like always writing to SV_Depth.

#5072884 DX11 - Multiple Render Targets

Posted by on 25 June 2013 - 06:09 PM

Use PIX or the VS 2012 Graphics Debugger to inspect device state at the time of the draw call; it will tell you which render targets are currently bound.

#5072305 about Shader Arrays, Reflection and BindCount

Posted by on 23 June 2013 - 02:14 PM

Arrays of textures in shaders are really just a syntactic convenience: the underlying shader assembly doesn't actually have any support for them, so the compiler just turns the array into N separate resource bindings. So it doesn't surprise me that the reflection interface would report it as N separate resource bindings, since that's essentially how you have to treat it on the C++ side of things.

It does seem weird that the BIND_DESC structure has a BindCount field that suggests it would be used for cases like these, but I suppose it doesn't actually work that way. I wonder if that field actually gets used for anything.
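
For example, a (hypothetical) declaration like this just gets flattened by the compiler into one register per element, which matches what the reflection data reports:

// Occupies t0, t1, t2 and t3, exactly as if four separate textures were declared
Texture2D MaterialTextures[4] : register(t0);
SamplerState LinearSampler : register(s0);

float4 SampleMaterialPS(float2 uv : TEXCOORD0) : SV_Target0
{
    float4 base = MaterialTextures[0].Sample(LinearSampler, uv);
    float4 detail = MaterialTextures[3].Sample(LinearSampler, uv);
    return base * detail;
}

On the C++ side you'd still bind four separate SRVs to slots 0 through 3.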

#5072302 Environment reflection & fresnel correct ?

Posted by on 23 June 2013 - 02:08 PM

These are open problems, so I don't really have any silver-bullet solutions to share with you. For the geometric term of the BRDF and the incorrect Fresnel, you can apply a curve that approximates those factors, which is basically what Sébastien Lagarde described in his blog post. Macro-scale occlusion is trickier, since it depends on the actual geometry of your scene. One idea might be to pre-compute directional occlusion per-vertex or in a texture, and use that as an occlusion factor for your cubemap reflections. Another idea might be to attempt to determine occlusion in screen space using the depth buffer.

#5072109 Environment reflection & fresnel correct ?

Posted by on 22 June 2013 - 07:08 PM

Okay, let's see if we can straighten all of this out.


First, let's start with the BRDF itself. A BRDF basically tells you how much light reflects off a surface towards the eye, given a lighting environment surrounding that surface. To apply it, you integrate BRDF * incidentLighting over the hemisphere surrounding the point's surface normal. A common way of approximating the result of these kinds of integrals is to use Monte Carlo sampling, where you basically evaluate the function being integrated at random points and sum the results (in reality it's more complex than this, but that's not important at the moment). So you can imagine that this is pretty simple to do in a ray tracer: you pick random rays surrounding the surface normal direction, trace the ray, evaluate the BRDF for that given ray direction and eye direction, multiply the BRDF with the ray result, and add that result to a running sum. It's also trivial to handle punctual light sources (point lights, directional lights, etc.) since these lights are infinitely small (they're basically a delta), so you can integrate them by just multiplying the BRDF with the lighting intensity.
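
As a schematic sketch of that estimator (HLSL-style code, where SampleHemisphere, TraceRay and EvaluateBRDF stand in for whatever your ray tracer provides; they're hypothetical helpers, not real API calls):

static const uint NumRays = 1024;

float3 IntegrateSpecular(float3 position, float3 normal, float3 viewDir)
{
    float3 sum = 0.0f;
    for(uint i = 0; i < NumRays; ++i)
    {
        // Pick a random direction about the normal, along with its PDF value
        float pdf;
        float3 lightDir = SampleHemisphere(normal, i, pdf);

        float3 incident = TraceRay(position, lightDir);         // incoming radiance
        float3 brdf = EvaluateBRDF(normal, lightDir, viewDir);

        // Standard Monte Carlo estimator: average f(x) / pdf(x) over the samples
        sum += incident * brdf * saturate(dot(normal, lightDir)) / pdf;
    }

    return sum / NumRays;
}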


Now let's talk about microfacet BRDFs. The main idea behind a microfacet BRDF is that you treat a surface as if it's made up of millions of little microscopic surfaces, where each one of those tiny microfacets is perfectly flat. Being perfectly flat lets you treat that microfacet as a Fresnel reflector, which means that as light hits it at a shallow angle more of the light is reflected instead of being refracted into the surface. It also means you can use basic geometry to determine what direction a ray of light will reflect off that microfacet. A microfacet BRDF will then assume that all of these little facets are oriented in random directions relative to the overall surface normal. For rougher surfaces, the facets will mostly point away from the normal. For less rough surfaces, the facets will mostly line up with the surface normal. This is modeled in a microfacet BRDF with a normal distribution function (NDF), which is essentially a probability density function that tells you the percentage of microfacets that will "line up" in such a way that light from a given direction will reflect towards the eye. For these facets the reflection intensity is assumed to respect Fresnel's laws, which is why you have the Fresnel term in a microfacet BRDF. Now there's one other important piece, which is the geometry term (also known as the shadowing term). This term accounts for light reflecting off a microfacet but then being blocked by other microfacets. In general this will balance out the Fresnel effect, particularly for rougher surfaces since they will have more shadowing.
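
Putting those three pieces together gives the usual microfacet (Cook-Torrance) form. Here's one concrete instance as a sketch; the GGX distribution and Smith-style geometry term are my own picks for illustration, and "alpha" here is the GGX roughness parameter:

static const float Pi = 3.14159265f;

// Normal distribution function (GGX): fraction of facets aligned with the half vector
float GGX_D(float nDotH, float alpha)
{
    float a2 = alpha * alpha;
    float d = nDotH * nDotH * (a2 - 1.0f) + 1.0f;
    return a2 / (Pi * d * d);
}

// One-direction Smith-Schlick visibility term
float Smith_G1(float nDotX, float k)
{
    return nDotX / (nDotX * (1.0f - k) + k);
}

float3 MicrofacetSpecular(float3 n, float3 l, float3 v, float3 f0, float alpha)
{
    float3 h = normalize(l + v);                // half vector = normal of the "active" facets
    float nDotL = saturate(dot(n, l));
    float nDotV = saturate(dot(n, v));
    float nDotH = saturate(dot(n, h));
    float lDotH = saturate(dot(l, h));

    float D = GGX_D(nDotH, alpha);                              // distribution term
    float3 F = f0 + (1.0f - f0) * pow(1.0f - lDotH, 5.0f);      // Schlick Fresnel
    float k = alpha * 0.5f;
    float G = Smith_G1(nDotL, k) * Smith_G1(nDotV, k);          // shadowing/masking term

    return D * F * G / max(4.0f * nDotL * nDotV, 0.0001f);
}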


So let's say we want to apply a microfacet BRDF to environment lighting in a real-time application. Doing this with Monte Carlo sampling is prohibitively expensive, since you often need thousands of samples to converge on a result. So instead a common technique is to approximate the integral using a pre-integrated environment map. The basic idea is to pre-integrate a portion of your BRDF with an environment map, using a different roughness for each mip level (integrating with a BRDF is essentially a convolution, so it basically amounts to a blur pass). However there's a major issue, which is that the function you're trying to compute has too high a dimensionality. The reflected light off a surface depends on the viewing angle and the surface normal, which means we can't use a single cubemap to store the integrated result for all possible viewing directions and surface orientations. So instead, we make a major approximation by only parameterizing on the view direction reflected off the surface normal. To do this, we can only pre-integrate the distribution term of the BRDF with the environment map. This leaves the geometric and Fresnel terms to be handled at runtime. The common approach for Fresnel is to apply it for the reflected view direction, which basically just means that it goes to 1 as the normal becomes perpendicular to the view direction. This produces incorrect results, since the Fresnel term should have been applied to all of the individual light directions instead of to one direction after convolving with the NDF. The same goes for the geometric term, which leaves you with simple approximations like what Sébastien suggests on his blog.
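
The runtime side of that approximation ends up looking something like this sketch: sample the pre-convolved cubemap along the reflected view direction at a roughness-based mip, then bolt Fresnel on afterwards using that single direction (which is exactly the part that's wrong). The resource names and the roughness-to-mip mapping are assumptions for the example:

TextureCube EnvironmentMap : register(t0);
SamplerState EnvSampler : register(s0);

float3 PrefilteredSpecular(float3 n, float3 v, float3 f0, float roughness, float numMipLevels)
{
    float3 r = reflect(-v, n);                  // reflected view direction

    // Rougher surfaces sample blurrier (higher) mips of the pre-convolved cubemap
    float mip = roughness * (numMipLevels - 1.0f);
    float3 envLighting = EnvironmentMap.SampleLevel(EnvSampler, r, mip).rgb;

    // Fresnel evaluated once, for the reflected direction, instead of per light direction
    float3 fresnel = f0 + (1.0f - f0) * pow(1.0f - saturate(dot(n, v)), 5.0f);

    return envLighting * fresnel;
}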


Now let's look at some pictures that illustrate how some of this works. This first picture shows an object being lit by an environment, with the material having low roughness (0.01) and low specular intensity (0.05). It was rendered by Monte Carlo integrating a microfacet BRDF, so it serves as our ground truth:


Skull Specular LowRoughness MC.png


As you can see there's a strong Fresnel effect along the top left portion of the skull.


Now we have an approximated version using a cubemap that was pre-convolved to match the NDF for the same roughness, and Fresnel applied to the reflected view direction:


Skull Specular LowRoughness EM.png


This approximation is actually pretty good, which makes sense since our approximation works best for low roughnesses. This is because for low roughnesses most of the microfacets will be active, and so our assumption of sampling at the reflected view direction is a good one.


Now we have the same skull but with a higher roughness of 0.2, rendered with Monte Carlo sampling:


Skull Specular HiRoughness MC.png


Now the Fresnel effect is much less pronounced due to the geometric term kicking in, and due to having more variance in the incoming light directions that reflect towards the eye.


Now we'll go back to your cubemap approximation:


Skull Specular HiRoughness EM.png


In this case our Fresnel term is making the reflections much too bright at glancing angles, which means our approximation is no longer a good match.


Now we'll add a simple curve to the Fresnel term to decrease its intensity as roughness increases, in an attempt to balance out the over-brightening of our Fresnel approximation:


Skull Specular HiRoughness EM GApprox.png


This is certainly better, but still wrong in a lot of ways. Ideally we would do a better job with regards to pre-computing the BRDF, and handling view dependence.


One other important thing I'll mention is that you'll also get poor results if you don't handle macro-scale shadowing. As Hodgman mentioned earlier, objects should occlude themselves, and if you don't account for this you will get light rays reflecting off surfaces that they should never reach. I don't actually handle this in my images, so you should keep that in mind when looking at them. I agree with Hodgman that this is probably the most offensive thing about the original rock image that was posted, since the lack of occlusion combined with incorrect Fresnel gives you that "X-Ray" look.

#5072099 Using D3D9 Functions and HLSL

Posted by on 22 June 2013 - 05:32 PM

Yes, you can absolutely do that. However the D3DX9 mesh loading functions require a D3D9 device, so you will need to create one in addition to your D3D11 device.

#5071611 Better to have separate shaders for each graphical option, or pass constants...

Posted by on 20 June 2013 - 04:39 PM

Like anything else, the correct choice depends on a few things. Generating separate shaders will *always* result in more efficient assembly being generated when compared to branching on a value from a constant buffer. Statically disabling a feature allows the compiler to optimize away any calculations and texture fetches that would be needed for that feature, which results in a more efficient shader. Branching, on the other hand, will allow the GPU to skip executing all of the code in the branch, but there will still be performance penalties from having the branch itself. Also it won't be able to optimize away the code inside the branch, which can increase register usage.

However there are downsides to using separate shaders. For instance, you have to compile and load more shaders. The number of shaders can explode once you add more than a few features that can all be turned on or off. Also you have to switch shaders more often, which can result in higher CPU overhead and can also impact GPU efficiency by causing pipeline flushes.


For your particular case, shadows are probably a good fit for having a separate shader. This is because shadows tend to be heavy in terms of GPU performance due to multiple texture fetches, so the performance gain is probably worth it.
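
As a tiny sketch of the permutation approach for that case (the names are made up): compile the shader twice, once with SHADOWS_ENABLED defined to 1 and once to 0, and pick the right one on the CPU side:

Texture2D ShadowMap : register(t0);
SamplerComparisonState ShadowSampler : register(s0);

float ShadowTerm(float3 shadowPos)
{
#if SHADOWS_ENABLED
    // Compiled in: the fetch and its registers only exist in this permutation
    return ShadowMap.SampleCmpLevelZero(ShadowSampler, shadowPos.xy, shadowPos.z);
#else
    // Compiled out: the optimizer removes everything shadow-related
    return 1.0f;
#endif
}

The branching alternative would replace the #if with a [branch] if on a constant buffer bool, which keeps you at a single shader but leaves the shadow code and its register usage in place.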

#5071338 GPU particles

Posted by on 19 June 2013 - 10:31 PM

Yeah, the point->quad expansion has special-case handling in GPUs because it's so common. If you really want to avoid the GS you can also use instancing to accomplish the same thing.
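
Here's a rough sketch of the instancing approach: draw 4 vertices per instance with a triangle strip, one instance per particle, and build the corners in the vertex shader (the buffer and constant names are made up for the example):

struct Particle
{
    float3 Position;
    float Size;
};

StructuredBuffer<Particle> Particles : register(t0);

cbuffer PerFrame : register(b0)
{
    float4x4 ViewProjection;
    float3 CameraRight;
    float3 CameraUp;
};

float4 ParticleVS(uint vertexID : SV_VertexID, uint instanceID : SV_InstanceID) : SV_Position
{
    Particle particle = Particles[instanceID];

    // Map vertexID 0..3 to the corners of a triangle-strip quad
    float2 corner = float2(vertexID & 1, vertexID >> 1) * 2.0f - 1.0f;

    float3 worldPos = particle.Position
                    + CameraRight * corner.x * particle.Size
                    + CameraUp * corner.y * particle.Size;

    return mul(float4(worldPos, 1.0f), ViewProjection);
}

You'd draw it with DrawInstanced(4, numParticles, 0, 0) and a triangle strip topology.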