However, in some cases, this may actually increase performance. Drawing the scene twice, where the first time you only draw depth, and the second time you use your real pixel shaders, is known as a "z pre pass", "depth pre pass", "zpp", etc.
Doom 3 does this on purpose, because it lets the GPU take full advantage of the depth buffer to avoid overdraw.
Say you've got a camera looking through 3 walls:
Cam -> |A| |B| |C|
If you draw C, then B, then A, you run the pixel shader three times for the same pixel, even though only the last result (A) counts; each draw overwrites the previous one. That's "overdraw".
If you draw the whole scene into the depth buffer first, then in the second pass B and C are skipped, because they fail the depth test.
In scenes with a lot of overdraw, a ZPP can therefore improve performance.
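To make the A/B/C example concrete, here is a toy software model of one pixel that counts how often the expensive pixel shader runs. It is a sketch under stated assumptions, not how a GPU is implemented: surfaces are hypothetical (name, depth) pairs with smaller depth meaning closer, the normal pass uses a LESS test with depth writes, and the second pass of the pre-pass version uses LESS_EQUAL with depth writes off (an assumed but common choice; EQUAL is also used). Real hardware still rasterizes and depth-tests the hidden fragments; what the pre-pass saves is the shading.

```python
import math
from typing import List, Tuple

# Hypothetical surface covering a single pixel: (name, depth); smaller depth = closer.
Surface = Tuple[str, float]

def shade_without_prepass(draw_order: List[Surface]) -> int:
    """Count pixel-shader runs for one pixel with a LESS test and depth writes on."""
    depth_buffer = math.inf              # depth buffer after a clear
    runs = 0
    for _name, depth in draw_order:
        if depth < depth_buffer:         # depth test passes
            runs += 1                    # the full (expensive) pixel shader runs
            depth_buffer = depth         # depth write
    return runs

def shade_with_prepass(draw_order: List[Surface]) -> int:
    """Depth-only pass first, then a shading pass with LESS_EQUAL and no depth writes."""
    depth_buffer = math.inf
    for _name, depth in draw_order:      # pass 1: depth only, no shading
        depth_buffer = min(depth_buffer, depth)
    runs = 0
    for _name, depth in draw_order:      # pass 2: full shading
        if depth <= depth_buffer:        # only the closest surface passes
            runs += 1
    return runs

# Camera looks through A, B, C; the worst case is drawing them back to front.
back_to_front = [("C", 3.0), ("B", 2.0), ("A", 1.0)]
print("without prepass:", shade_without_prepass(back_to_front))  # 3
print("with prepass:", shade_with_prepass(back_to_front))        # 1
```

Drawing front to back (A, B, C) without a pre-pass would also give a single shader run here, which is why sorting opaque geometry front to back is the usual alternative when a full extra pass is too expensive.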
I've heard of Z-pass before, and I've actually read about it. To quote directly:
Double-Speed Z-Only and Stencil Rendering
All GeForce Series GPUs (FX and later) render at double speed when rendering
only depth or stencil values. To enable this special rendering mode, you must
follow the following rules:
- Color writes are disabled
- Texkill has not been applied to any fragments (clip, discard)
- Depth replace (oDepth, texm3x2depth, texdepth) has not been applied to any fragments
- Alpha test is disabled
- No color key is used in any of the active textures
See section 6.4.1 for information on NULL render targets with double speed Z.
3.6.2. Z-cull Optimization
Z-cull optimization improves performance by avoiding the rendering of
occluded surfaces. If the occluded surfaces have expensive shaders applied to
them, z-cull can save a large amount of computation time. See section 4.8 for a
discussion on Z-cull and how to best use it.
3.6.3. Lay Down Depth First (“Z-only rendering”)
The best way to take advantage of the two aforementioned performance
features is to “lay down depth first.” By this, we mean that you should use
double-speed depth rendering to draw your scene (without shading) as a first
pass. This then establishes the closest surfaces to the viewer. Now you can
render the scene again, but with full shading. Z-cull will automatically cull out
fragments that aren't visible, meaning that you save on shading computations.
Laying down depth first requires its own render pass, but can be a performance
win if many occluded surfaces have expensive shading applied to them. Double-speed
rendering is less efficient as triangles get small. And, small triangles can
reduce z-cull efficiency.
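One way to read the quoted rules together is as a per-pass checklist: the first pass must satisfy every double-speed rule, and the second pass only needs depth that is already final. The sketch below models that with a hypothetical PassState record; the field names are made up for illustration and are not from D3D, OpenGL or any NVIDIA API, and it makes no graphics calls. Choosing LESS_EQUAL for the shading pass is an assumption; it keeps the same test direction as the LESS pre-pass, which the guide's ZCULL notes ask for.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class PassState:
    """Hypothetical per-pass render state; field names are not from any real API."""
    name: str
    color_writes: bool
    depth_writes: bool
    depth_func: str                        # e.g. "LESS", "LESS_EQUAL"
    alpha_test: bool = False
    shader_kills_pixels: bool = False      # clip / discard / texkill
    shader_replaces_depth: bool = False    # oDepth, texm3x2depth, texdepth
    color_key: bool = False

def double_speed_z_blockers(p: PassState) -> List[str]:
    """Return the quoted double-speed rules this pass breaks (empty list = all met)."""
    checks = [
        (p.color_writes, "color writes must be disabled"),
        (p.shader_kills_pixels, "no texkill (clip, discard)"),
        (p.shader_replaces_depth, "no depth replace"),
        (p.alpha_test, "alpha test must be disabled"),
        (p.color_key, "no color key in active textures"),
    ]
    return [reason for broken, reason in checks if broken]

def lay_down_depth_first() -> List[PassState]:
    """The two passes described in 3.6.3, in draw order."""
    return [
        # Pass 1: depth only, no shading, so it can use double-speed Z.
        PassState("depth pre-pass", color_writes=False, depth_writes=True,
                  depth_func="LESS"),
        # Pass 2: full shading. Depth is already final, so depth writes are off;
        # LESS_EQUAL keeps the test direction of pass 1 so Z-cull can reject
        # fragments behind the laid-down depth.
        PassState("shading pass", color_writes=True, depth_writes=False,
                  depth_func="LESS_EQUAL"),
    ]

for p in lay_down_depth_first():
    print(p.name, "->", double_speed_z_blockers(p) or "double-speed eligible")
```

Only the first pass is expected to come back clean; the shading pass writes color by design. Alpha-tested geometry (anything using clip or discard) would break the double-speed rules in the pre-pass, so such objects either lose the double-speed path or are left out of the pre-pass.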
But I'm not sure exactly what I should do to get the full benefit of this technique and of "double speed" Z rendering, or how to satisfy all of those points.
Then there is this section that makes things even more confusing:
ZCULL and EarlyZ: Coarse and Fine-grained Z and Stencil Culling
NVIDIA GeForce 6 series and later GPUs can perform a coarse level Z and
Stencil culling. Thanks to this optimization large blocks of pixels will not be
scheduled for pixel shading if they are determined to be definitely occluded.
In addition, GeForce 8 series and later GPUs can also perform fine-grained Z
and Stencil culling, which allow the GPU to skip the shading of occluded pixels.
These hardware optimizations are automatically enabled when possible, so they
are mostly transparent to developers. However, it is good to know when they
cannot be enabled or when they can underperform to ensure that you are taking
advantage of them.
Coarse Z/Stencil culling (also known as ZCULL) will not be able to cull any
pixels in the following cases:
1. If you don’t use Clears (instead of fullscreen quads that write depth) to
clear the depth-stencil buffer.
2. If the pixel shader writes depth.
3. If you change the direction of the depth test while writing depth.
ZCULL will not cull any pixels until the next depth buffer Clear.
4. If stencil writes are enabled while doing stencil testing (no stencil
culling)
5. On GeForce 8 series, if the DepthStencilView has
Texture2D[MS]Array dimension
Also note that ZCULL will perform less efficiently in the following
circumstances
1. If the depth buffer was written using a different depth test direction
than that used for testing
2. If the depth of the scene contains a lot of high frequency information
(i.e.: the depth varies a lot within a few pixels)
3. If you allocate too many large depth buffers.
4. If using DXGI_FORMAT_D32_FLOAT format
Similarly, fine-grained Z/Stencil culling (also known as EarlyZ) is disabled in
the following cases:
1. If the pixel shader outputs depth
2. If the pixel shader uses the .z component of an input attribute with the
SV_Position semantic (only on GeForce 8 series in D3D10)
3. If Depth or Stencil writes are enabled, or Occlusion Queries are
enabled, and one of the following is true:
• Alpha-test is enabled
• Pixel Shader kills pixels (clip(), texkil, discard)
• Alpha To Coverage is enabled
• SampleMask is not 0xFFFFFFFF (SampleMask is set in
D3D10 using OMSetBlendState and in D3D9 setting the
D3DRS_MULTISAMPLEMASK renderstate)
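The two lists above are easier to compare when written as predicates over one state record. The sketch below encodes only the "cannot cull" ZCULL cases and the EarlyZ-disabled cases as quoted (not the "less efficient" ZCULL cases); CullState and its fields are hypothetical names, and nothing here queries a real driver or API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CullState:
    """Hypothetical summary of the state the quoted ZCULL/EarlyZ rules refer to."""
    depth_cleared_with_clear: bool = True       # not a fullscreen quad writing depth
    shader_writes_depth: bool = False           # pixel shader outputs depth
    depth_dir_changed_while_writing: bool = False
    stencil_writes_while_stencil_testing: bool = False
    dsv_is_texture2d_array: bool = False        # only matters on GeForce 8
    shader_reads_sv_position_z: bool = False    # only GeForce 8 in D3D10
    depth_or_stencil_writes: bool = True
    occlusion_queries: bool = False
    alpha_test: bool = False
    shader_kills_pixels: bool = False           # clip(), texkil, discard
    alpha_to_coverage: bool = False
    sample_mask: int = 0xFFFFFFFF

def zcull_blockers(s: CullState) -> List[str]:
    """Quoted cases where coarse Z/stencil culling (ZCULL) culls nothing."""
    checks = [
        (not s.depth_cleared_with_clear, "depth-stencil not cleared with a Clear"),
        (s.shader_writes_depth, "pixel shader writes depth"),
        (s.depth_dir_changed_while_writing, "depth test direction changed while writing depth"),
        (s.stencil_writes_while_stencil_testing, "stencil writes during stencil test (no stencil culling)"),
        (s.dsv_is_texture2d_array, "GeForce 8: Texture2D[MS]Array DepthStencilView"),
    ]
    return [reason for hit, reason in checks if hit]

def early_z_disabled(s: CullState) -> bool:
    """Quoted cases where fine-grained culling (EarlyZ) is disabled."""
    if s.shader_writes_depth or s.shader_reads_sv_position_z:
        return True
    writes_or_queries = s.depth_or_stencil_writes or s.occlusion_queries
    kills_or_masks = (s.alpha_test or s.shader_kills_pixels or s.alpha_to_coverage
                      or s.sample_mask != 0xFFFFFFFF)
    return writes_or_queries and kills_or_masks

# A discard in a pass that still writes depth loses EarlyZ ...
print(early_z_disabled(CullState(shader_kills_pixels=True)))  # True
# ... but the same shader with depth/stencil writes off (and no occlusion query) keeps it.
print(early_z_disabled(CullState(shader_kills_pixels=True, depth_or_stencil_writes=False)))  # False
```

Read this way, rule 3 of the EarlyZ list fits the depth pre-pass: the shading pass runs with depth writes off, so a shader there can use clip() or discard without disabling EarlyZ, as long as no occlusion query is active. Writing depth from the pixel shader is the one case that hurts both mechanisms.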