
Member Since 29 Mar 2007

#5077757 Global illumination techniques

Posted by MJP on 14 July 2013 - 08:57 PM

Yeah, that's the gist of it: you store ambient diffuse lighting encoded in some basis at sample locations, and interpolate between the samples based on the position of the object sampling the probes as well as the structure of the probes themselves (grid, loose points, etc.). Some common bases used for encoding diffuse lighting:

  • Single value - store a single diffuse color, use it for all normal directions. Makes objects look very flat in ambient lighting, since the ambient diffuse doesn't change with normal direction. There's no way to actually compute diffuse lighting without a surface normal, so typically it will be the average of the computed diffuse in multiple directions.
  • Hemispherical - basically you compute diffuse lighting for a normal pointed straight up, and one for a normal pointed straight down. Then you interpolate between the two values using the Y component of the surface normal used for rendering.
  • Ambient cube - this is what Valve used in HL2. Similar to hemispherical, except diffuse lighting is computed and stored in 6 directions (usually aligned to world space axes). The contribution of each stored value is determined by taking the dot product of the surface normal with the direction for each axis of the cube.
  • Spherical harmonics - very commonly used in modern games. Basically you can store a low-frequency version of any spherical signal by projecting onto the SH basis functions up to a certain order, with the order determining how much detail will be retained as well as the number of coefficients that you need to store. SH has all kinds of useful properties, for instance they're essentially a frequency-domain representation of your signal which means you can perform convolutions with a simple multiplication. Typically this is used to convolve the lighting environment with a cosine lobe, which essentially gives you Lambertian diffuse. However you can also convolve with different kernels, which allows you to use SH with non-Lambertian BRDF's as well. You can even do specular BRDF's, however the low-frequency nature of SH typically limits you to very high roughnesses (low specular power, for Phong/Blinn-Phong BRDF's).
  • Spherical Radial Basis Functions - with these you basically approximate the full lighting environment surrounding a probe point with a set number of lobes (usually Gaussian) oriented at arbitrary directions about a sphere. These can be cool because they can let you potentially capture high-frequency lighting. However they're also difficult because you have to use a non-linear solver to "fit" a set of lobes to a lighting environment. You can also have issues with interpolation, since each probe can potentially have arbitrary lobe directions.
  • Cube Maps - this isn't common, but it's possible to integrate the irradiance for a set of cubemap texels where each texel represents a surface normal of a certain direction. This makes your shader code for evaluating lighting very simple: you just lookup into the cubemap based on the surface normal. However it's generally overkill, since something like SH or Ambient Cube can store diffuse lighting with relatively little error while having a very compact representation. Plus you don't have to mess around with binding an array of cube map textures, or sampling from them.

For all of these (with the exception of SRBF's) you can generate the probes by either ray-tracing and directly projecting onto the basis, or by rasterizing to a cubemap first and then projecting. This can potentially be very quick, in fact you could do it in real time for a limited number of probe locations. SRBF's are trickier, because of the non-linear solve which is typically an iterative process.
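To make two of the simpler bases above concrete, here's a minimal CPU-side sketch of evaluating hemispherical and ambient cube lighting for a given normal. The `Vec3` type, the function names, and the +X, -X, +Y, -Y, +Z, -Z ordering are all made up for illustration. The ambient cube weighting uses the squared normal components, as in Valve's published shader; treating that as what the "dot product" description refers to is an assumption on my part.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 scale(const Vec3& v, float s) { return {v.x * s, v.y * s, v.z * s}; }
Vec3 add(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// Hemispherical ambient: blend the "down" and "up" colors using the
// Y component of the surface normal.
Vec3 evalHemisphere(const Vec3& down, const Vec3& up, const Vec3& n) {
    const float t = n.y * 0.5f + 0.5f;  // remap [-1, 1] to [0, 1]
    return add(scale(down, 1.0f - t), scale(up, t));
}

// Ambient cube: six colors ordered +X, -X, +Y, -Y, +Z, -Z (assumed ordering).
// For each axis, pick the face the normal points toward and weight it by the
// squared normal component. The weights sum to 1 for a unit-length normal.
Vec3 evalAmbientCube(const std::array<Vec3, 6>& cube, const Vec3& n) {
    Vec3 result = scale(cube[n.x < 0.0f ? 1 : 0], n.x * n.x);
    result = add(result, scale(cube[n.y < 0.0f ? 3 : 2], n.y * n.y));
    result = add(result, scale(cube[n.z < 0.0f ? 5 : 4], n.z * n.z));
    return result;
}
```

A normal pointing halfway between +X and +Y picks up half of the +X color and half of the +Y color, so the result changes smoothly with direction instead of looking flat like the single-value case.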

EDIT: I looked at those links posted, and there are two things I'd like to point out. In that GPU Gems article they evaluate the diffuse from the SH lighting environment by pre-computing the set of "lookup" SH coefficients to a cube map lookup texture, but this is totally unnecessary since you can just directly compute these coefficients in the shader. Something like this should work:

// 'lightingSH' is the lighting environment projected onto SH (3rd order in this case),
// and 'n' is the surface normal
float3 ProjectOntoSH9(in float3 lightingSH[9], in float3 n)
{
    float3 result = 0.0f;

    // Cosine kernel
    const float A0 = 1.0f;
    const float A1 = 2.0f / 3.0f;
    const float A2 = 0.25f;

    // Band 0
    result += lightingSH[0] * 0.282095f * A0;

    // Band 1
    result += lightingSH[1] * 0.488603f * n.y * A1;
    result += lightingSH[2] * 0.488603f * n.z * A1;
    result += lightingSH[3] * 0.488603f * n.x * A1;

    // Band 2
    result += lightingSH[4] * 1.092548f * n.x * n.y * A2;
    result += lightingSH[5] * 1.092548f * n.y * n.z * A2;
    result += lightingSH[6] * 0.315392f * (3.0f * n.z * n.z - 1.0f) * A2;
    result += lightingSH[7] * 1.092548f * n.x * n.z * A2;
    result += lightingSH[8] * 0.546274f * (n.x * n.x - n.y * n.y) * A2;

    return result;
}

This brings me to my second point, which is that the WebGL article mentions the lookup texture as a disadvantage of SH which really isn't valid since you don't need it at all. This makes SH a much more attractive option for storing irradiance, especially if your goal is runtime generation of irradiance maps since with SH you don't need an expensive convolution step. Instead your projection onto SH is basically a repeated downsampling process, which can be done very quickly. This is especially true if you use compute shaders, since you can use a parallel reduction to perform integration using shared memory with fewer steps.
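If you want to sanity check the projection and the cosine-kernel evaluation on the CPU, something like the following C++ sketch works. It's only an illustration: it handles a single scalar channel instead of a float3, it integrates over a Fibonacci point set on the sphere rather than downsampling a cubemap, and the names (`shBasis9`, `projectOntoSH9`, `evalDiffuseSH9`) are made up here. The basis constants and ordering match the shader code above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// The nine real SH basis functions for bands 0-2.
std::array<double, 9> shBasis9(const Vec3& d) {
    return {{
        0.282095,
        0.488603 * d.y,
        0.488603 * d.z,
        0.488603 * d.x,
        1.092548 * d.x * d.y,
        1.092548 * d.y * d.z,
        0.315392 * (3.0 * d.z * d.z - 1.0),
        1.092548 * d.x * d.z,
        0.546274 * (d.x * d.x - d.y * d.y)
    }};
}

// Numerically integrates radiance(dir) * basis(dir) over the sphere using
// evenly distributed sample directions (a Fibonacci spiral).
template <typename Radiance>
std::array<double, 9> projectOntoSH9(Radiance radiance, int sampleCount) {
    assert(sampleCount > 0);
    const double pi = 3.14159265358979323846;
    const double goldenAngle = pi * (3.0 - std::sqrt(5.0));
    std::array<double, 9> coeffs{};
    for (int i = 0; i < sampleCount; ++i) {
        const double z = 1.0 - (2.0 * i + 1.0) / sampleCount;
        const double r = std::sqrt(1.0 - z * z);
        const double phi = goldenAngle * i;
        const Vec3 dir{r * std::cos(phi), r * std::sin(phi), z};
        const std::array<double, 9> basis = shBasis9(dir);
        const double value = radiance(dir);
        for (int k = 0; k < 9; ++k) coeffs[k] += value * basis[k];
    }
    const double weight = 4.0 * pi / sampleCount;  // solid angle per sample
    for (double& c : coeffs) c *= weight;
    return coeffs;
}

// Same math as the shader: scale each band by the cosine kernel, then evaluate
// the basis in the direction of the normal.
double evalDiffuseSH9(const std::array<double, 9>& sh, const Vec3& n) {
    const double A[9] = {1.0, 2.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0,
                         0.25, 0.25, 0.25, 0.25, 0.25};
    const std::array<double, 9> basis = shBasis9(n);
    double result = 0.0;
    for (int k = 0; k < 9; ++k) result += sh[k] * basis[k] * A[k];
    return result;
}
```

With these kernel constants, a lighting environment with constant radiance 1 in every direction should evaluate to roughly 1 for any unit normal, which is a quick way to catch a normalization mistake.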


For an introduction to using SH for this purpose, I would definitely recommend reading Ravi Ramamoorthi's 2001 paper on the subject. Robin Green's paper is also a good place to start.

#5077405 Fast exp2() function in shader

Posted by MJP on 13 July 2013 - 01:57 PM

Almost all GPU's have a native exp2 instruction in their ALU's, so you're not going to make a faster version on your own. Converting to integer does often have a performance cost, and on most GPU's integer instructions run at 1/2 or 1/4 rate, which means it's unlikely you'll get better performance with bit shifts. You'll have to check the available docs on various architectures to find out the specifics.

#5076815 Debugging the stencil buffer

Posted by MJP on 10 July 2013 - 11:07 PM

It depends on which format you used to create the depth buffer. If you used DXGI_FORMAT_R24G8_TYPELESS, then use DXGI_FORMAT_X24_TYPELESS_G8_UINT to create your SRV and then access the G channel in your shader. If you used DXGI_FORMAT_R32G8X24_TYPELESS, then do the same with DXGI_FORMAT_X32_TYPELESS_G8X24_UINT.

In your shader, make sure that you declare your texture as Texture2D<uint2>. You'll then get an integer that's [0, 255], which you can convert to a [0,1] float for displaying.

#5076706 Global illumination techniques

Posted by MJP on 10 July 2013 - 03:06 PM


Hmm, you're right. It looks like they're using cascaded shadow maps for both the static and dynamic geometry, which is interesting. I assume they bake only the indirect lighting and then just add in the direct lighting on the fly. If nothing else, it's probably easier to implement than storing the contribution of direct light onto static geometry.


Guys, I understand the part with shadows. It's not interesting if they are using static shadow maps for static level geometry. I don't think they just bake the indirect lighting and that's it. The actors and other objects moving through the level receive indirect lighting as well. I have a feeling they have some sort of lightmap on static levels and also have some "fill lights" placed here and there simulate bounced light and to illuminate dynamic objects, that move around.



It's fairly common to bake ambient lighting into probes located throughout the level, and then have dynamic objects sample from those probes as they move through the level.

#5076541 Are the D3DX Functions Faster ?

Posted by MJP on 10 July 2013 - 01:13 AM

If you're not experienced at writing a math library, then don't expect to write one that's going to be better than D3DX. It's more likely you'll end up with something slower and buggier.

I'll also point out that there's a newer replacement for D3DX math, called DirectXMath.

#5076283 C++ DX API, help me get it?

Posted by MJP on 08 July 2013 - 11:17 PM

1) A lot of the setup in the simple tutorials is just getting a window going. That's just the way it is in Win32: it's a crusty old C API that takes a lot of boilerplate just for basic functionality. There's no reason not to use a framework to handle this for you, whether it's an existing one like SDL or one of your own design.


2) It's inherited from the Windows API. Almost all API's do something like this, since C/C++ only makes loose guarantees about the size of types. Using typedefs can ensure that the API is always working with the same size type. This isn't really much of an issue for x86 Windows platforms, so you can usually ignore them and use native types or the types from stdint.h.


3) D3D uses a light-weight form of COM. Supporting COM requires C compatibility, which means you're mostly limited to functionality exposed in C (before C99). This is why you have things like pointers instead of references for function parameters, and structs with no methods. However there are actually "C++" versions of some of the structures that have constructors defined. They're named the same with a "C" in front of them, for instance CD3D11_TEXTURE2D_DESC. It's also possible to make your own extension structures if you want.

4) Mostly because the older samples are written in more of a "C with classes" style as opposed to modern C++. The newer samples written for the Windows 8 SDK actually make heavy use of exceptions, smart pointers, and C++11 features. In my home codebase I just made a wrapper macro for D3D calls that checks the HRESULT for failure, and if it fails converts it to a string error message and stuffs it in an exception.

5) This is indeed the same reasoning as #3. It can definitely be pretty ugly at times.

6) Also the same as #3

7) Yeah that stuff is rooted in the Win32 API, and it's seriously ugly. I never use the typedefs in my own code.

8) This comes from D3D being a COM API. DXGI actually happens to be even heavier on the COM compared to D3D, hence it taking the interface GUID as a function parameter. However I'm pretty sure you don't have to use __uuidof if you don't want, it's just a convenience.

The reason SharpDX doesn't "feel" the same is because they wrap all of this in types and functions that convert the COM/Win32 idioms into patterns that are typical to C#. You can certainly do the same except with modern C++ concepts, if that's how you'd like to do it.
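The wrapper macro from point 4 could look roughly like the sketch below. This is not the actual code from my codebase: `HResult`, `kEFail`, `DxException` and `DX_CALL` are hypothetical names, and `HResult` is a stand-in typedef so the example compiles without the Windows headers. A real version would use `HRESULT` from the Windows SDK and would probably produce a readable system error message rather than just the hex code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Stand-in for the Windows HRESULT type (a 32-bit signed integer where
// negative values indicate failure). Hypothetical name for this sketch.
using HResult = std::int32_t;
constexpr HResult kEFail = -2147467259;  // bit pattern 0x80004005, i.e. E_FAIL

// Exception that carries the failing call text and the raw result code.
class DxException : public std::runtime_error {
public:
    DxException(HResult hr, const std::string& call)
        : std::runtime_error(formatMessage(hr, call)), code(hr) {}

    HResult code;

private:
    static std::string formatMessage(HResult hr, const std::string& call) {
        char buffer[32];
        std::snprintf(buffer, sizeof(buffer), "0x%08X", static_cast<unsigned>(hr));
        return call + " failed with HRESULT " + buffer;
    }
};

// Evaluates the expression once and throws if the result signals failure.
#define DX_CALL(expr)                                    \
    do {                                                 \
        const HResult hr_ = (expr);                      \
        if (hr_ < 0) throw DxException(hr_, #expr);      \
    } while (0)
```

You would then wrap each API call, e.g. `DX_CALL(device->CreateBuffer(...))`, and catch `DxException` at whatever level makes sense for error reporting.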

#5075565 What is the difference between DXGI_SWAP_EFFECT_DISCARD and DXGI_SWAP_EFFECT_...

Posted by MJP on 05 July 2013 - 04:05 PM

It's pretty simple. If you use DISCARD, then as soon as you call Present the contents of the backbuffer are wiped away. If you use SEQUENTIAL, then the contents of the back buffer remain after calling Present. The order of what you see on the screen is the same in both modes, it's always the same order in which you call Present.

As for your refresh rate question, that depends on what you pass as the SyncInterval parameter of IDXGISwapChain::Present. If you pass 0, then the device never waits and always presents to the screen as soon as the GPU is ready to do so. If you happen to present outside of the VBLANK period then you will get tearing. If you pass 1, then the device waits until the next VBLANK period to flip buffers. So in your 90fps scenario, the device would then effectively be locked at 60Hz since that's the fastest that the display can output to the screen. If you pass 2, then the device waits for the 2nd VBLANK period which would cap you at 30Hz.
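The arithmetic for the cap can be written out as a tiny C++ helper. The function names are made up for illustration, and this only models the upper bound imposed by the sync interval. It says nothing about what happens when rendering is slower than the cap, since that depends on how the swap chain is buffered.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Highest present rate the display allows for a given SyncInterval.
// A SyncInterval of 0 means "don't wait for VBLANK", so there is no cap.
double presentRateCap(double refreshHz, unsigned syncInterval) {
    assert(refreshHz > 0.0);
    if (syncInterval == 0) return std::numeric_limits<double>::infinity();
    return refreshHz / syncInterval;
}

// Upper bound on the frame rate you will actually see on screen.
double frameRateUpperBound(double renderFps, double refreshHz, unsigned syncInterval) {
    return std::min(renderFps, presentRateCap(refreshHz, syncInterval));
}
```

For the 90fps case on a 60Hz display, `frameRateUpperBound(90.0, 60.0, 1)` gives 60 and `frameRateUpperBound(90.0, 60.0, 2)` gives 30, while a sync interval of 0 leaves it at 90 (with tearing).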

#5075108 HLSL Shader Library

Posted by MJP on 03 July 2013 - 02:42 PM

At this point shaders are generic programs that you run on the GPU. It's like asking for a C++ code library.

For any non-simple renderer the shader code will completely depend on the overall architecture of the renderer, and not just the visual appearance of whatever you're drawing. In fact a lot of modern shaders don't draw anything at all!

#5075099 Yet another Deferred Shading / Anti-aliasing discussion...

Posted by MJP on 03 July 2013 - 02:26 PM

FXAA is good in that it's really easy to implement and it's really cheap, but the quality is not great. It has limited information to work with, and is completely incapable of handling temporal issues due to lack of sub-pixel information. If you use it you definitely want to do as ic0de recommends and grab the shader code and insert it into your post-processing chain as opposed to letting the driver do it, so that you can avoid applying it to things like text and UI. There's also MLAA which has similar benefits and problems.

You are correct that the "running the shader per-pixel" bit of MSAA only works for writing out your G-Buffer. The trick is to use some method of figuring out which pixels actually have different G-Buffer values in them, and then apply per-sample lighting only to those pixels while applying per-pixel lighting to the rest. For deferred renderers that use fragment shaders and lighting volumes, the typical way to do this is to generate a stencil mask and draw each light twice: once with a fragment shader that uses per-pixel lighting, and once with a fragment shader that uses per-sample lighting. For tiled compute shader deferred renderers you can instead "bucket" per-sample pixels into a list that you build in thread group shared memory, and handle them separately after shading the first sample of all pixels.

Some links:






I also wrote quite a bit about this in the deferred rendering chapter of the book that I worked on, and wrote some companion samples that you can find on CodePlex.


Deferred lighting, AKA light pre-pass is basically dead at this point. It's only really useful if you want to avoid using multiple render targets, which was desirable on a particular current-gen console. If MRT isn't an issue then it will only make things worse for you, especially with regards to MSAA.

TXAA is just an extension of MSAA, so you need to get MSAA working before considering a similar approach. Same with SMAA, which basically combines MSAA and MLAA.

Forward rendering is actually making a big comeback in the form of "Forward+", which is essentially a modern variant of light indexed deferred rendering. Basically you use a compute shader to write out a list of lights that affect each screen-space tile (usually 16x16 pixels or so) and then during your forward rendering pass each pixel walks the list and applies each light. When you do this MSAA still works the way it's supposed to, at least for the main rendering pass. If you search around you'll find some info and some sample code.
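To give a rough idea of the per-tile light list, here's a CPU-side C++ sketch of the binning step. A real Forward+ implementation would do this in a compute shader and typically culls against per-tile frusta and depth bounds. Here each light is simplified to a screen-space circle in pixels, which is an assumption made purely to keep the example short, and all of the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Light bounds approximated as a circle in screen space, in pixels (assumption).
struct ScreenLight { float x, y, radius; };

// Returns one list of light indices per tile, with tiles stored row-major.
std::vector<std::vector<int>> buildTileLightLists(const std::vector<ScreenLight>& lights,
                                                  int width, int height, int tileSize = 16) {
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> lists(static_cast<std::size_t>(tilesX) * tilesY);

    for (int i = 0; i < static_cast<int>(lights.size()); ++i) {
        const ScreenLight& l = lights[i];
        // Skip lights whose bounds are entirely off screen.
        if (l.x + l.radius < 0.0f || l.y + l.radius < 0.0f ||
            l.x - l.radius >= width || l.y - l.radius >= height) {
            continue;
        }
        // Conservative test: add the light to every tile touched by its bounding square.
        const int minTx = std::max(0, static_cast<int>(std::floor((l.x - l.radius) / tileSize)));
        const int maxTx = std::min(tilesX - 1, static_cast<int>(std::floor((l.x + l.radius) / tileSize)));
        const int minTy = std::max(0, static_cast<int>(std::floor((l.y - l.radius) / tileSize)));
        const int maxTy = std::min(tilesY - 1, static_cast<int>(std::floor((l.y + l.radius) / tileSize)));
        for (int ty = minTy; ty <= maxTy; ++ty)
            for (int tx = minTx; tx <= maxTx; ++tx)
                lists[static_cast<std::size_t>(ty) * tilesX + tx].push_back(i);
    }
    return lists;
}

// During the forward pass, a pixel looks up the list for the tile it falls in.
std::size_t tileIndexForPixel(int px, int py, int width, int tileSize = 16) {
    const int tilesX = (width + tileSize - 1) / tileSize;
    return static_cast<std::size_t>(py / tileSize) * tilesX + px / tileSize;
}
```

Each pixel then only loops over the lights in its own tile's list instead of every light in the scene.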

As for the G-Buffer, as small as you can make it is still the rule of thumb. In general some packing/unpacking shader code is worth being able to use a smaller texture format. Reconstructing position from depth is absolutely the way to go, since it lets you save 3 G-Buffer channels. Storing position in a G-Buffer can also give you precision problems, unless you go for full 32-bit floats.

#5075093 Bilateral Blur with linear depth?

Posted by MJP on 03 July 2013 - 02:02 PM

You can linearize a sample from a depth buffer with just a tiny bit of math, using some values from your projection matrix:


float linearZ = Projection._43 / (zw - Projection._33);


That's using HLSL matrix syntax, you would have to convert that to the appropriate GLSL syntax.
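If you want to convince yourself that the formula inverts the projection, here's a small C++ round-trip sketch. It assumes a standard left-handed D3D-style perspective projection with row-vector conventions, where `_33 = far / (far - near)` and `_43 = -near * far / (far - near)`. The struct and function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Only the two projection matrix terms that affect depth.
struct ProjectionTerms { double m33, m43; };

// Assumes a left-handed perspective projection (row-vector convention).
ProjectionTerms makePerspectiveTerms(double nearZ, double farZ) {
    assert(nearZ > 0.0 && farZ > nearZ);
    const double range = farZ / (farZ - nearZ);
    return {range, -nearZ * range};
}

// What ends up in the depth buffer for a view-space depth:
// clip.z / clip.w, where clip.z = z * _33 + _43 and clip.w = z.
double projectDepth(const ProjectionTerms& p, double viewZ) {
    return (viewZ * p.m33 + p.m43) / viewZ;
}

// The linearization from the post: solve the expression above for viewZ.
double linearizeDepth(const ProjectionTerms& p, double zw) {
    return p.m43 / (zw - p.m33);
}
```

Feeding `projectDepth` into `linearizeDepth` should give back the original view-space Z anywhere between the near and far planes. The depth buffer value is 0 at the near plane and 1 at the far plane.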

#5074853 Shouldn't the vector that is multipled with the projection matrix be 4D?

Posted by MJP on 02 July 2013 - 03:16 PM


However,if I pass a 3D vector,will DirectX just add the 4th component to it or...? If there is no w,where is the z coppied?


It depends on the math library that you're using, and which function you're using. Both the D3DX and DirectXMath libraries have 2 different vector/matrix transformation functions: one that uses 0 as the W component, and one that uses 1. The functions that end with "Coord" use 1 as the W component, and the functions that end with "Normal" use 0 as the W component.

EDIT: actually let me correct that, there are 3 functions:


D3DXVec3Transform/XMVector3Transform - this uses 1 as the W component, and returns a 4D vector containing the result of the multiplication

D3DXVec3TransformCoord/XMVector3TransformCoord - this uses 1 as the W component, and returns a 3D vector containing the XYZ result divided by the W result

D3DXVec3TransformNormal/XMVector3TransformNormal - this uses 0 as the W component, and returns a 3D vector containing the XYZ result
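The difference between the three is easy to see if you write them out by hand. The following C++ sketch is not the library code, just illustrative re-implementations with made-up names, using a row-major matrix and the row-vector convention (v * M):

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major, used as v * M

Vec4 mul(const Vec4& v, const Mat4& m) {
    return {
        v.x * m[0][0] + v.y * m[1][0] + v.z * m[2][0] + v.w * m[3][0],
        v.x * m[0][1] + v.y * m[1][1] + v.z * m[2][1] + v.w * m[3][1],
        v.x * m[0][2] + v.y * m[1][2] + v.z * m[2][2] + v.w * m[3][2],
        v.x * m[0][3] + v.y * m[1][3] + v.z * m[2][3] + v.w * m[3][3]
    };
}

// W = 1, returns the full 4D result.
Vec4 transform(const Vec3& v, const Mat4& m) {
    return mul(Vec4{v.x, v.y, v.z, 1.0f}, m);
}

// W = 1, returns XYZ divided by the resulting W (useful for projections).
Vec3 transformCoord(const Vec3& v, const Mat4& m) {
    const Vec4 r = transform(v, m);
    return {r.x / r.w, r.y / r.w, r.z / r.w};
}

// W = 0, so the translation row of the matrix has no effect.
Vec3 transformNormal(const Vec3& v, const Mat4& m) {
    const Vec4 r = mul(Vec4{v.x, v.y, v.z, 0.0f}, m);
    return {r.x, r.y, r.z};
}
```

With a pure translation matrix, `transform` and `transformCoord` move the point while `transformNormal` leaves the vector unchanged, which is what you want for directions.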

#5074373 Changing LoD on instancing and alpha testing

Posted by MJP on 01 July 2013 - 01:51 AM

What you basically want is the alpha to be a probability that a pixel is opaque. I would do it like this:

float random = tex2D(g_dissolvemap_sampler, a_texcoord0).r;
if(alpha < random)
    discard;

You can also use alpha-to-coverage, which will basically accomplish the same thing using a tiled screen-space dither pattern instead of a pure random pattern. You can also encode a set of dither patterns directly into a texture, and then lookup into that texture based on screen position and alpha value.
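As an illustration of the dither pattern idea, here's a C++ sketch that uses a standard 4x4 Bayer matrix as the threshold source instead of a random texture. The choice of a Bayer matrix and the function names are assumptions for the example. In a shader the same comparison would discard the pixel when alpha is below the threshold.

```cpp
#include <cassert>

// Standard 4x4 ordered-dither (Bayer) matrix with values 0..15.
constexpr int kBayer4x4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

// Returns true if the pixel at non-negative screen position (px, py) should be
// kept for the given alpha. Thresholds sit at the center of each 1/16 step.
bool keepPixel(float alpha, int px, int py) {
    const float threshold = (kBayer4x4[py & 3][px & 3] + 0.5f) / 16.0f;
    return alpha >= threshold;
}

// Counts how many pixels of one 4x4 tile survive for a given alpha.
int coveredPixelsInTile(float alpha) {
    int count = 0;
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            if (keepPixel(alpha, x, y)) ++count;
    return count;
}
```

The fraction of surviving pixels in each tile tracks alpha in steps of 1/16, so an alpha of 0.5 keeps 8 of the 16 pixels, which is the "alpha as a probability" behavior without the noise of a purely random pattern.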

#5074361 Global illumination techniques

Posted by MJP on 01 July 2013 - 01:09 AM

I believe that voxel cone tracing is state of the art if you want to do real time GI. I think Crytek and Unreal 4 have it, though I'm not sure if anybody's shipped an actual game with it yet.


Epic has since moved away from it, they're using pre-baked lightmaps and specular probes now. Crytek was using an entirely different technique (Cascaded Light Propagation Volumes) which has a different set of tradeoffs. They shipped it for the PC version of Crysis 2 but not the console version, and I'm not sure if they used it in Crysis 3.

#5074359 Decals with deferred renderer

Posted by MJP on 01 July 2013 - 01:04 AM

DX10 feature level requires full, independent blending support for multiple render targets. So any DX10-capable GPU should support blending and color write control for MRT's, assuming that the driver enables it for D3D9.

#5074208 Global illumination techniques

Posted by MJP on 30 June 2013 - 12:45 PM

Just about all games with GI bake it to lightmaps, and this includes The Last of Us (although The Last of Us does have dynamic GI from your flashlight that's only enabled in a few indoor locations). Very few games have a runtime GI component, since existing techniques are typically expensive and don't end up looking as good as a static-baked result. Some games with deferred renderers try to get away with no GI at all, and just use some runtime or baked AO combined with lots of artist-placed "bounce lights" or "ambient lights" that try to fake GI.