optimizing deferred shading algorithm


Please check my deferred shading approach (D3D11) and see if it is possible to optimize it.

  1. Generate the G-Buffer, using a separate depth stencil buffer. Since this buffer is still required for the lighting pass, we can’t just create a shader resource view of it, so instead store view-space depth as a 16-bit float in one of the render targets.
  2. Switch to a single render target, still using the same depth stencil buffer we used for scene rendering.
  3. Draw a screen-aligned quad and use “0.25 * albedo” to simulate scene ambient lighting, where albedo is the diffuse color from the G-Buffer.
  4. For each light in the list (assuming the list is already culled/optimized), draw the light volume mesh (in two passes) in order to mark the pixels inside the light volume in the stencil buffer.
  5. Draw a screen-aligned quad again, but this time use a lighting algorithm appropriate for the current light type (use the G-Buffer to retrieve view-space depth). The stencil buffer restricts the calculations to the pixels where they should be applied, but this step requires switching between meshes, shaders and render states for every light (light volume <=> quad and their associated shaders). A rough sketch of the two depth-stencil states involved is shown after this list.
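For reference, here is a minimal sketch of one common way to set up the two depth-stencil states used in steps 4 and 5 (a two-sided “depth fail” marking pass followed by a stencil-tested shading pass). The exact stencil ops vary between engines, and the state names here are purely illustrative:

ID3D11DepthStencilState *MarkState = nullptr, *ShadeState = nullptr;

// Pass 1 (step 4): draw the light volume with culling disabled, depth and
// color writes off. Back faces that fail the depth test increment the
// stencil, front faces that fail it decrement, so only pixels whose scene
// geometry lies inside the volume end up with a non-zero stencil value.
D3D11_DEPTH_STENCIL_DESC MarkDescription = {};
MarkDescription.DepthEnable = TRUE;
MarkDescription.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ZERO;
MarkDescription.DepthFunc = D3D11_COMPARISON_LESS;
MarkDescription.StencilEnable = TRUE;
MarkDescription.StencilReadMask = 0xFF;
MarkDescription.StencilWriteMask = 0xFF;
MarkDescription.BackFace.StencilFunc = D3D11_COMPARISON_ALWAYS;
MarkDescription.BackFace.StencilDepthFailOp = D3D11_STENCIL_OP_INCR;
MarkDescription.BackFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;
MarkDescription.BackFace.StencilPassOp = D3D11_STENCIL_OP_KEEP;
MarkDescription.FrontFace.StencilFunc = D3D11_COMPARISON_ALWAYS;
MarkDescription.FrontFace.StencilDepthFailOp = D3D11_STENCIL_OP_DECR;
MarkDescription.FrontFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;
MarkDescription.FrontFace.StencilPassOp = D3D11_STENCIL_OP_KEEP;
Device->CreateDepthStencilState(&MarkDescription, &MarkState);

// Pass 2 (step 5): draw the screen-aligned quad only where the stencil is
// non-zero, with depth testing disabled and the stencil left untouched.
D3D11_DEPTH_STENCIL_DESC ShadeDescription = {};
ShadeDescription.DepthEnable = FALSE;
ShadeDescription.StencilEnable = TRUE;
ShadeDescription.StencilReadMask = 0xFF;
ShadeDescription.StencilWriteMask = 0x00;
ShadeDescription.FrontFace.StencilFunc = D3D11_COMPARISON_NOT_EQUAL; // reference value 0
ShadeDescription.FrontFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;
ShadeDescription.FrontFace.StencilDepthFailOp = D3D11_STENCIL_OP_KEEP;
ShadeDescription.FrontFace.StencilPassOp = D3D11_STENCIL_OP_KEEP;
ShadeDescription.BackFace = ShadeDescription.FrontFace;
Device->CreateDepthStencilState(&ShadeDescription, &ShadeState);

// Bind with OMSetDepthStencilState(MarkState, 0) / (ShadeState, 0) and clear
// only the stencil (D3D11_CLEAR_STENCIL) between lights.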

I also have a problem with step 4. If the camera is inside the light volume, the front faces of the mesh get clipped by the near plane and discarded, which ruins everything.

Thank you.


Instead of drawing the light volumes, I would recommend using a tiled approach. Here is the basic idea:

1. Subdivide the screen into tiles.

2. Determine which lights influence which tiles (use a simple bounding volume; a rough CPU-side binning sketch is shown below).

3. Render the tiles as quads.

No stencil buffer, and no switching between rendering bounding volumes and fullscreen quads.
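As a rough sketch of step 2 (binning lights into tiles on the CPU), assuming view-space point lights and a simple perspective projection, something like this works. TILE_SIZE, Light, ProjectBounds and BinLights are illustrative names, not from any particular engine:

#include <algorithm>
#include <vector>

const int TILE_SIZE = 32;

struct Light { float x, y, z, radius; };      // view-space position and radius
struct ScreenRect { int minX, minY, maxX, maxY; };

// Very rough conservative screen-space bound of a view-space sphere: project
// the centre and pad by the projected radius (assumes z > 0, i.e. in front
// of the camera; lights straddling the near plane need special handling).
ScreenRect ProjectBounds(const Light& l, int screenW, int screenH, float focalLength)
{
    float invZ = 1.0f / l.z;
    float cx = ( l.x * focalLength * invZ * 0.5f + 0.5f) * screenW;
    float cy = (-l.y * focalLength * invZ * 0.5f + 0.5f) * screenH;
    float r  = l.radius * focalLength * invZ * 0.5f * screenW;
    return { int(cx - r), int(cy - r), int(cx + r), int(cy + r) };
}

// For every tile, collect the indices of the lights whose bounds overlap it.
void BinLights(const std::vector<Light>& lights, int screenW, int screenH,
               float focalLength, std::vector<std::vector<int>>& tileLights)
{
    int tilesX = (screenW + TILE_SIZE - 1) / TILE_SIZE;
    int tilesY = (screenH + TILE_SIZE - 1) / TILE_SIZE;
    tileLights.assign(tilesX * tilesY, {});

    for (int i = 0; i < (int)lights.size(); ++i)
    {
        ScreenRect r = ProjectBounds(lights[i], screenW, screenH, focalLength);
        int tx0 = std::max(r.minX / TILE_SIZE, 0);
        int ty0 = std::max(r.minY / TILE_SIZE, 0);
        int tx1 = std::min(r.maxX / TILE_SIZE, tilesX - 1);
        int ty1 = std::min(r.maxY / TILE_SIZE, tilesY - 1);

        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                tileLights[ty * tilesX + tx].push_back(i);
    }
}

The per-tile light lists are then uploaded and each tile quad loops over only its own lights in the pixel shader.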

I have seen some examples of the tiled approach, but I'm not sure how to implement it. Let me check Google. Thanks.

Since this buffer is still required for the lighting pass, we can’t just create a shader resource view of it, so instead store view-space depth as a 16-bit float in one of the render targets.

I'm not sure I understand the problem with the depth stencil view - under D3D11 it is possible to read a depth stencil buffer (depth or stencil) via a shader resource view. For certain cases you can create the depth stencil view with a read-only flag so that you can still perform depth testing while reading it. A 16-bit float depth buffer isn't very good quality in general; it will produce artifacts in the position reconstruction.

Cheers!

Yes Kauna is correct, you can most definitely read from a depth buffer in D3D11. The only issue you might run into is that D3D11 doesn't allow reading from a depth buffer while you're simultaneously writing to it. So if you want to read from your buffer while still using it for depth testing, you have to make a special "read-only" depth stencil view and bind that to the context for rendering.

If your camera is inside a light volume, then you should just fall back to a single-pass method: draw with front-face culling, with GREATER_EQUAL depth testing.
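Something along these lines (the state and variable names here are just for illustration):

// Fallback when the camera is inside the light volume: draw only the
// volume's back faces and pass where the stored scene depth is at or in
// front of them, in a single pass without touching the stencil buffer.
ID3D11RasterizerState *CullFrontState = nullptr;
ID3D11DepthStencilState *InsideVolumeState = nullptr;

D3D11_RASTERIZER_DESC RasterizerDescription = {};
RasterizerDescription.FillMode = D3D11_FILL_SOLID;
RasterizerDescription.CullMode = D3D11_CULL_FRONT;   // keep only back faces
RasterizerDescription.DepthClipEnable = TRUE;
Device->CreateRasterizerState(&RasterizerDescription, &CullFrontState);

D3D11_DEPTH_STENCIL_DESC DepthStencilDescription = {};
DepthStencilDescription.DepthEnable = TRUE;
DepthStencilDescription.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ZERO;
DepthStencilDescription.DepthFunc = D3D11_COMPARISON_GREATER_EQUAL;
DepthStencilDescription.StencilEnable = FALSE;
Device->CreateDepthStencilState(&DepthStencilDescription, &InsideVolumeState);

// Context->RSSetState(CullFrontState);
// Context->OMSetDepthStencilState(InsideVolumeState, 0);
// ...then draw the light volume once with the lighting shader.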

Have you actually tested that the stencil method is faster than a simple conditional (is this pixel within range?) in the pixel shader?

I implemented the stencil stuff about 3-3.5 years ago when we were working on the prototype of our game. At the time it definitely gave us better performance than branching in the shader. That was mostly tested on a GTX 470, I think. Couldn't say for sure if it's still the same on newer GPUs, but the same situation still applies: using stencil allows the hardware to schedule threads somewhat better compared to branching, where you often end up with low-occupancy warps/wavefronts.

Did you test with different sized point lights? I would guess that there is some size threshold where branching becomes faster.

Good for me! I managed to remove one of my render targets. Now my engine extracts the view-space position from the same depth stencil buffer that is used for the stencil operations.
For anyone who doesn’t know how to create a read-only view of a depth stencil buffer, this is the gist:


ID3D11Texture2D *Texture;
ID3D11DepthStencilView *DepthStencilView;
ID3D11DepthStencilView *DepthStencilViewReadOnly;
ID3D11ShaderResourceView *ShaderResourceView;

// Depth stencil texture, created with a typeless format so that both the
// depth stencil views and the shader resource view can be created from it.
D3D11_TEXTURE2D_DESC TextureDescription = {};
TextureDescription.Width = Width;
TextureDescription.Height = Height;
TextureDescription.MipLevels = 1;
TextureDescription.ArraySize = 1;
TextureDescription.Format = DXGI_FORMAT_R24G8_TYPELESS;
TextureDescription.SampleDesc.Count = SampleCount;
TextureDescription.SampleDesc.Quality = SampleQuality;
TextureDescription.Usage = D3D11_USAGE_DEFAULT;
TextureDescription.BindFlags = D3D11_BIND_DEPTH_STENCIL | D3D11_BIND_SHADER_RESOURCE;
TextureDescription.CPUAccessFlags = 0;
TextureDescription.MiscFlags = 0;

// Create the actual resource.
Device->CreateTexture2D(&TextureDescription, nullptr, &Texture);

// Read-write view: the GPU can write depth and stencil through this view
// because no read-only flags are set. (The TEXTURE2DMS dimensions below
// assume an MSAA texture; use the TEXTURE2D dimensions when SampleCount == 1.)
D3D11_DEPTH_STENCIL_VIEW_DESC DepthStencilViewDescription = {};
DepthStencilViewDescription.Format = DXGI_FORMAT_D24_UNORM_S8_UINT;
DepthStencilViewDescription.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2DMS;
DepthStencilViewDescription.Flags = 0;
Device->CreateDepthStencilView(Texture, &DepthStencilViewDescription, &DepthStencilView);

// Read-only view: the non-zero flag makes the depth part read-only.
// (D3D11_DSV_READ_ONLY_STENCIL is the corresponding flag for the stencil part.)
D3D11_DEPTH_STENCIL_VIEW_DESC DepthStencilViewReadOnlyDescription = {};
DepthStencilViewReadOnlyDescription.Format = DXGI_FORMAT_D24_UNORM_S8_UINT;
DepthStencilViewReadOnlyDescription.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2DMS;
DepthStencilViewReadOnlyDescription.Flags = D3D11_DSV_READ_ONLY_DEPTH;
Device->CreateDepthStencilView(Texture, &DepthStencilViewReadOnlyDescription, &DepthStencilViewReadOnly);

// Shader resource view over the depth part of the texture.
D3D11_SHADER_RESOURCE_VIEW_DESC ShaderResourceViewDescription = {};
ShaderResourceViewDescription.Format = DXGI_FORMAT_R24_UNORM_X8_TYPELESS;
ShaderResourceViewDescription.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2DMS;
Device->CreateShaderResourceView(Texture, &ShaderResourceViewDescription, &ShaderResourceView);

Now we can use the shader resource view and the read-only depth stencil view simultaneously. I didn’t compile the code, so there may be some syntax errors etc…
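For example, binding both at the same time could look like this (LightAccumulationRTV stands in for whatever render target you accumulate lighting into):

// Bind the read-only depth stencil view for depth/stencil testing while the
// shader resource view of the same texture is read in the pixel shader.
Context->OMSetRenderTargets(1, &LightAccumulationRTV, DepthStencilViewReadOnly);
Context->PSSetShaderResources(0, 1, &ShaderResourceView);

// ... draw the lighting passes ...

// Unbind the SRV again before the depth buffer is written next frame.
ID3D11ShaderResourceView *NullShaderResourceView = nullptr;
Context->PSSetShaderResources(0, 1, &NullShaderResourceView);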
Thank you very much, that was awesome!

