
Deferred Shading & Early Z/Stencil



I've been reading a bit about optimisations for deferred shading using stencil tests and taking advantage of early Z rejection. I understand these concepts in isolation but am not quite sure how they integrate with the G-buffers. How do the lights access the Z buffer used by the G-buffers? Do you bind another colour attachment to the FBO for rendering the lighting passes, to get access to the depth buffer used by the FBO?



Yes, you have to use the same depth attachment in subsequent passes to benefit from early Z rejection, while the color attachment can change.

However, you may run into problems trying to both read the depth value (as a depth texture) for world-space reconstruction in the light shaders and use the hardware Z-test at the same time. This is especially true if you also want a stencil buffer, in which case you practically need to create your FBO depth attachment using the packed depth+stencil extension, as AFAIK a separate stencil buffer attachment was never supported by real hardware.

(More detail: on NVIDIA GPUs everything is fine; you can create a packed depth+stencil texture for your FBO depth attachment and use it both for reading depth and for the hardware Z-test. On AMD GPUs either a slowdown or visual artifacts will result, so you have to create it as a renderbuffer instead, but then you can't sample it as a texture.)
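As a concrete illustration (a minimal sketch, not from any particular engine), creating one packed depth+stencil attachment and sharing it between the G-buffer FBO and the lighting FBO could look roughly like this, assuming GL 3.0-style framebuffer objects; the function names are made up, and the texture/renderbuffer switch reflects the NVIDIA/AMD caveat above:

#include <GL/glew.h>

// Create a packed depth+stencil attachment. As a texture it can also be
// sampled in the light shaders (fine on NVIDIA per the note above); as a
// renderbuffer it is safe everywhere but cannot be sampled.
GLuint CreateSharedDepthStencil(int width, int height, bool asTexture)
{
    if (asTexture)
    {
        GLuint tex = 0;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH24_STENCIL8, width, height,
                     0, GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8, nullptr);
        return tex;
    }
    GLuint rbo = 0;
    glGenRenderbuffers(1, &rbo);
    glBindRenderbuffer(GL_RENDERBUFFER, rbo);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH24_STENCIL8, width, height);
    return rbo;
}

// Attach the *same* object to every FBO that needs the hardware Z/stencil
// test; only the colour attachments differ between the G-buffer pass and
// the lighting passes.
void AttachSharedDepthStencil(GLuint fbo, GLuint depthStencil, bool asTexture)
{
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    if (asTexture)
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT,
                               GL_TEXTURE_2D, depthStencil, 0);
    else
        glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT,
                                  GL_RENDERBUFFER, depthStencil);
}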

What you can do is simply forget about reading the hardware depth in your light shaders, and instead render depth into a separate R32F texture during the G-buffer pass. Meanwhile you keep the same depth attachment bound for all the passes to get the early-Z benefit for the actual hardware Z-test.
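For example, the G-buffer fragment shader could write that depth something like this (a hypothetical sketch; the uniform/varying names are made up, the outputs are assumed to be bound to colour attachments with glBindFragDataLocation, and depth is stored as linear view-space distance normalised by the far plane):

// GLSL embedded as a C++ string; colour attachment 2 is assumed to be R32F.
const char* kGBufferFragmentSrc = R"(
    #version 150
    uniform sampler2D uAlbedoMap;
    uniform float uFarPlane;      // distance to the far clip plane
    in vec3 vViewPos;             // view-space position from the vertex shader
    in vec3 vNormal;
    in vec2 vTexCoord;
    out vec4 outAlbedo;           // colour attachment 0
    out vec4 outNormal;           // colour attachment 1
    out float outDepth;           // colour attachment 2 (R32F)
    void main()
    {
        outAlbedo = texture(uAlbedoMap, vTexCoord);
        outNormal = vec4(normalize(vNormal) * 0.5 + 0.5, 0.0);
        // Linear depth in [0,1]; the light shader scales a view ray by this
        // (times uFarPlane) to reconstruct the view-space position.
        outDepth = -vViewPos.z / uFarPlane;
    }
)";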

Note that some engines like Horde3D manually copy the depth from the G-buffer depth attachment to the backbuffer depth for the light rendering pass (by sampling the depth texture and writing it back to gl_FragDepth in a full-screen pass) but doing this loses the hierarchical/early Z optimization.
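For reference, that copy pass is essentially just the following (a rough sketch, not Horde3D's actual code), and it is precisely the gl_FragDepth write that defeats the hierarchical/early-Z optimisation:

const char* kDepthCopyFragmentSrc = R"(
    #version 150
    uniform sampler2D uDepthTex;   // depth texture from the G-buffer pass
    in vec2 vTexCoord;             // from a fullscreen quad/triangle
    void main()
    {
        // Colour writes would be masked off; only the depth is replicated.
        gl_FragDepth = texture(uDepthTex, vTexCoord).r;
    }
)";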

D3D11 has the option for read-only depth-stencil views, which let you read from the depth buffer while also using it for depth testing. I'd imagine that the latest version of OpenGL has something similar.
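For example, a read-only view is created like this (a sketch assuming an existing ID3D11Device and a typeless depth texture created with both D3D11_BIND_DEPTH_STENCIL and D3D11_BIND_SHADER_RESOURCE):

#include <d3d11.h>

// Create a read-only depth-stencil view: the same texture can be bound for
// depth/stencil testing and sampled through a shader resource view at once.
HRESULT CreateReadOnlyDSV(ID3D11Device* device, ID3D11Texture2D* depthTex,
                          ID3D11DepthStencilView** outDsv)
{
    D3D11_DEPTH_STENCIL_VIEW_DESC desc = {};
    desc.Format        = DXGI_FORMAT_D24_UNORM_S8_UINT;
    desc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2D;
    desc.Flags         = D3D11_DSV_READ_ONLY_DEPTH | D3D11_DSV_READ_ONLY_STENCIL;
    return device->CreateDepthStencilView(depthTex, &desc, outDsv);
}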

Well, for basic lighting with deferred rendering, you set up the stencil buffer before rendering the light to mask where the light goes. Basically, you render a two-sided convex polyhedron, with front-facing depth-fail decrementing the stencil and back-facing depth-pass incrementing it. Then the light pass can use a stencil == 0 test with a fullscreen quad (ideally sized or scissored to the relevant screen area), which will use the early-stencil reject. After that you clear the stencil buffer and move on to the next light.

This approach is the most robust, as it handles every case (camera inside the volume, camera intersecting the polygons making up the volume, camera outside the volume, etc.). If you know you are outside the convex volume, you can render the convex volume geometry instead of a fullscreen quad, but you will still need to set up the stencil buffer to mask out the relevant pixels.
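In OpenGL terms, the state setup for the two passes looks roughly like this (a minimal sketch assuming GL 2.0+ for glStencilOpSeparate; the two draw helpers are placeholders for your own light-volume and fullscreen-quad rendering):

#include <GL/glew.h>

void DrawLightVolumeGeometry();            // placeholder: render the convex volume
void DrawFullscreenQuadWithLightShader();  // placeholder: run the light shader

void RenderLightWithStencilMask()
{
    // Pass 1: mark pixels inside the light volume. No colour or depth writes.
    glEnable(GL_STENCIL_TEST);
    glClear(GL_STENCIL_BUFFER_BIT);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    glEnable(GL_DEPTH_TEST);
    glDisable(GL_CULL_FACE);               // both faces of the volume are needed
    glStencilFunc(GL_ALWAYS, 0, 0xFF);
    // Front faces decrement on depth fail, back faces increment on depth pass.
    glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_DECR_WRAP, GL_KEEP);
    glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_KEEP, GL_INCR_WRAP);
    DrawLightVolumeGeometry();

    // Pass 2: shade only where the stencil stayed at zero.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDisable(GL_DEPTH_TEST);
    glStencilFunc(GL_EQUAL, 0, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    DrawFullscreenQuadWithLightShader();

    glDisable(GL_STENCIL_TEST);
}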


Most of the rules for early reject boil down to it being disabled while depth or stencil writing is enabled, so clearing the entire depth & stencil buffer once a frame wouldn't hurt. On NVIDIA, changing the depth test direction (say from less-equal to greater-equal) has historically caused early reject for that buffer to be disabled until the entire buffer is cleared again. Also, most hardware can only really treat the stencil buffer efficiently as a one-bit mask, so stick to == 0 and != 0 tests. Without early reject, the pixel shaders execute even for pixels that the depth/stencil tests would later cull.

Btw, I noticed an error in my advice regarding rendering depth to an R32F texture. In contrast to Direct3D, where only the bit depth of all the multiple render targets must match (and possibly not even that on recent hardware), it is in fact illegal in OpenGL to have different internal formats in the color attachments. Therefore, if your G-buffer normal & albedo render targets are RGBA, then the manually written depth would have to be as well, meaning a manual encode/decode of the depth value to the RGB channels, which of course sucks :(
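If you do end up going that route, the encode/decode is the usual split-across-channels trick, something like this (a sketch; the function names are made up, precision is limited to roughly 24 bits, and the exact scheme is up to you):

// GLSL helpers embedded as a C++ string.
const char* kDepthPackGLSL = R"(
    // Encode normalised depth in [0,1) into three 8-bit colour channels.
    vec3 PackDepth24(float depth)
    {
        vec3 enc = fract(depth * vec3(1.0, 256.0, 65536.0));
        enc -= enc.yzz * vec3(1.0 / 256.0, 1.0 / 256.0, 0.0);
        return enc;
    }

    // Decode it back to a single float in the light shader.
    float UnpackDepth24(vec3 enc)
    {
        return dot(enc, vec3(1.0, 1.0 / 256.0, 1.0 / 65536.0));
    }
)";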

Even worse is that *some* OpenGL drivers ignore that error.

Even worse is that *some* OpenGL drivers ignore that error.


Those implementations would be the ones supporting either OpenGL 3.0 or GL_EXTX_framebuffer_mixed_formats. The specification only says that there may be an implementation-dependent set of restrictions on combinations of internal formats (chapter 4.4, whole framebuffer completeness). Having different internal formats is not an error per se. This was "secretly allowed" by drivers exposing GL_EXTX_framebuffer_mixed_formats prior to version 3.0, when it was considered an error (it was an "error" mostly because ATI cards couldn't do it, same reason why the maximum number of multiple render targets was allowed to be 1 at one time, which was totally braindead).


Ah, thanks for the clarification. It's just interesting that on the Direct3D side, different same-bit-depth formats have been mixed happily for ages, even on rather ancient cards like the Radeon 9800 :)

Well, ATI at that time did not really have OpenGL on the menu, probably because supporting both DX and OpenGL costs twice as much money as only supporting one properly, and most games used DX anyway.

This has drastically changed after AMD bought them. Nowadays, it's AMD who's selling the cool cards with the cool drivers, and for less money too. They're no longer the ones with the cheap crap cards and totally broken OpenGL implementations that only give you problems.

One of the problems OpenGL always had is that the ICD vendors get to have a say in what's going to be standard and what's not. This is both a blessing and a bane.

It leads to one competitor having vertex texture fetch and multiple render targets, and the other one not having them, but of course nobody will admit having the inferior hardware or an inferior OpenGL implementation or driver. Nobody wants to continue selling OpenGL 2 cards if the guy next door sells OpenGL 3 cards. This ain't no good for business.
So you get a specification that allows for things like "MRT supported, maximum number of buffers = 1" and "vertex texture fetch supported, maximum number of fetches = 0". And you get something like "must have the same internal format", although the same, identical card certainly can, and indeed does, handle different formats just fine under a different API.

The dictatorship-style in which DX is maintained has its downsides too, but it sure makes some things much clearer and less ambiguous.


In contrast to Direct3D, where only the bitdepth of all multiple render targets must match (and possibly not even that on recent hardware),


Any D3D10 GPU can mix any kind of format when using MRT. IIRC there's a cap in D3D9 that lets you check for this functionality, but in D3D10+ it's required as part of the feature set.
