
I want to add a depth-only pass before my forward and deferred shading. To do this, I bind a nullptr as the PS and a nullptr as the RTV. But I get huge amounts of flickering and background leaking in a completely static scene. Is this z-fighting?

I use D3D11_COMPARISON_LESS_EQUAL. The VSs are slightly different in the sense that the depth VS uses an object_to_projection transform, whereas the other VSs use an object_to_view, view_to_projection transform chain. The object_to_projection matrix, however, is constructed by multiplying the same matrices on the CPU.
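For concreteness, here is a minimal sketch of the two vertex shaders as described (the shader and constant names are mine, purely illustrative):

```hlsl
cbuffer Transforms : register(b0)
{
    float4x4 object_to_projection; // composed on the CPU as object_to_view * view_to_projection
    float4x4 object_to_view;
    float4x4 view_to_projection;
};

// Depth-only pass: one precomposed matrix.
float4 VSDepth(float3 p : POSITION) : SV_Position
{
    return mul(float4(p, 1.0f), object_to_projection);
}

// Forward/deferred passes: two-step chain evaluated per vertex.
float4 VSMain(float3 p : POSITION) : SV_Position
{
    return mul(mul(float4(p, 1.0f), object_to_view), view_to_projection);
}
```

The two paths are mathematically equivalent but not bit-identical, which turns out to be the crux of the problem.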


For a Z prepass to work correctly, the part of your VS that computes the position needs to match exactly between your Z-only pass and your main pass. This means that they must use the same matrix or chain of matrices: what you described will result in small variations due to finite floating-point precision, which is why you're getting Z fighting. I would recommend using the object_to_view + view_to_projection chain in your depth prepass shader as well, as this has been shown to result in improved precision compared to a pre-composed object_to_projection matrix.
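Concretely, the prepass position math should be bit-for-bit the same as the main pass. Continuing the hypothetical names from the sketch above:

```hlsl
// Depth prepass VS: identical position math to VSMain, so SV_Position
// is bit-identical across passes and the LESS_EQUAL test passes
// exactly on the prepass depth values.
float4 VSDepthPrepass(float3 p : POSITION) : SV_Position
{
    return mul(mul(float4(p, 1.0f), object_to_view), view_to_projection);
}
```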


MJP explained it well: your VS math has to match exactly because of floating-point inconsistencies.

What I want to add is that a depth prepass provides no performance benefit to a deferred shader. A depth prepass has a cost (CPU-wise, all the commands are issued twice; GPU-wise, the vertex shader runs twice and the rasterizer works twice as hard), and it is only an optimization if this cost is lower than the cost of running expensive pixel shaders multiple times for the same pixel.

This can be true for forward shading, but for deferred shading it is rarely the case, as the G-buffer pixel shaders tend to be very cheap (sample the albedo, sample some other parameters, calculate normals, and finally export to the render targets).

57 minutes ago, Matias Goldberg said:

MJP explained it well: your VS math has to match exactly because of floating-point inconsistencies.

What I want to add is that a depth prepass provides no performance benefit to a deferred shader. A depth prepass has a cost (CPU-wise, all the commands are issued twice; GPU-wise, the vertex shader runs twice and the rasterizer works twice as hard), and it is only an optimization if this cost is lower than the cost of running expensive pixel shaders multiple times for the same pixel.

This can be true for forward shading, but for deferred shading it is rarely the case, as the G-buffer pixel shaders tend to be very cheap (sample the albedo, sample some other parameters, calculate normals, and finally export to the render targets).

Apparently, Frostbite uses a depth pre-pass before their GBuffer pass.

2 hours ago, matt77hias said:

Apparently, Frostbite uses a depth pre-pass before their GBuffer pass.

I doubt they do that anymore; Z prepasses have been phased out since the transition to modern consoles. It was worth it on the previous console generation because polycount could scale far better than memory bandwidth, but that is no longer the case today.


For deferred it totally depends on your content, your G-Buffer depth, your hardware, how well you can sort front-to-back, etc. It's certainly less of an obvious win than in the forward case, but by no means is it assured to be worthless (FWIW, UE4 still uses a Z prepass). You just have to be careful to profile, and judge whether or not it's worth the CPU cost of submitting your geometry twice. As a very talented programmer once said, "a Z prepass is a day-to-day decision, not a lifestyle choice." :)


Assuming it is not needed anymore for deferred, and assuming you can afford it, I am starting to wonder whether this idea of mine makes sense:

  1. GBuffer packing (opaque_models): write to multiple non-MS RTVs (GBuffer)
  2. GBuffer unpacking (opaque_models): write to one non-MS RTV (Color Buffer)
  3. Forward (opaque_models): fetch color from the Color Buffer and write to one MS RTV (back buffer)
  4. Remaining forward passes for emissive and transparent models and sprites

The vertex shader price of step 3 is equal to that of an early-z pass. What remains is one texture load per fragment, which enables MSAA and alpha-to-coverage in a deferred shading algorithm.
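As an illustration, the pixel shader for step 3 could be as trivial as the following sketch (resource names hypothetical). Depth testing and coverage are handled per sample by the rasterizer; the shader only fetches the already-lit color at the pixel center:

```hlsl
Texture2D<float4> g_ColorBuffer : register(t0); // non-MS output of step 2

// Step 3: forward pass into the MSAA back buffer with the full vertex
// cost, but a near-free pixel shader that reads back the deferred result.
float4 PSForwardResolve(float4 pos : SV_Position) : SV_Target
{
    return g_ColorBuffer.Load(int3(int2(pos.xy), 0));
}
```

(As discussed below, fetching purely at the pixel center is exactly where this scheme gets tricky at geometric edges.)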

10 hours ago, Infinisearch said:

Instead of a full z-only pass, why not just do one with the major occluders for a given view?

Yeah, I've worked on games where three specific objects were included in a depth pre-pass (prior to the G-buffer pass), and that's it! They happened to always be very close to the camera and to occlude a lot of pixels, and they also had high depth complexity / overlapping parts. It turned out that a generic ZPP was a loss, but doing a ZPP for just these three specific objects was a win :)

There's no generic, general-purpose real-time renderer that's a good fit for every game. Every single console game that I've worked on has used a different rendering pipeline. Do what works in your situation.

11 hours ago, matt77hias said:

Assuming it is not needed anymore for deferred, and assuming you can afford it, I am starting to wonder whether this idea of mine makes sense:

  1. GBuffer packing (opaque_models): write to multiple non-MS RTVs (GBuffer)
  2. GBuffer unpacking (opaque_models): write to one non-MS RTV (Color Buffer)
  3. Forward (opaque_models): fetch color from the Color Buffer and write to one MS RTV (back buffer)
  4. Remaining forward passes for emissive and transparent models and sprites

The vertex shader price of step 3 is equal to that of an early-z pass. What remains is one texture load per fragment, which enables MSAA and alpha-to-coverage in a deferred shading algorithm.

Any thoughts on this?

34 minutes ago, Infinisearch said:

When do you do lighting?

At the unpacking (step 2). It is basically deferred shading, but you then use the resulting image as an SRV in a forward step (which unfortunately starts from a cleared MS DSV). So you get all the MSAA/coverage/depth machinery while having a very cheap PS that just fetches the color: double deferred ;)

I don't know if I am missing something, but having MSAA and alpha-to-coverage without requiring edge fixing or post-process AA seems nice.


That's basically how Light Pre-Pass (aka deferred lighting) worked, and how it supports MSAA.

There are some tricky pitfalls, though. Your GBuffer sample locations are at pixel centers, but your later MSAA sample locations are not... So you can't simply sample the deferred lighting results in pass 3, or you might fetch data from a completely different triangle/object. Instead, you have to search the local neighbourhood (e.g. 3x3 pixels) and find which GBuffer pixel is the best normal/depth match, and then fetch the lighting result for that pixel.
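A rough sketch of such a search, assuming the GBuffer depth and normals remain available alongside the lighting buffer (the resource names and the similarity metric here are hypothetical):

```hlsl
Texture2D<float4> g_Lighting : register(t0); // non-MS lighting results
Texture2D<float>  g_Depth    : register(t1); // GBuffer depth
Texture2D<float3> g_Normal   : register(t2); // GBuffer normals

float4 FetchBestMatchingLighting(int2 pixel, float depth, float3 normal)
{
    float  bestScore = -1.0f;
    float4 bestColor = 0.0f;

    // Search the 3x3 neighbourhood for the GBuffer texel whose depth and
    // normal best match this sample, then take that texel's lighting.
    [unroll]
    for (int dy = -1; dy <= 1; ++dy)
    {
        [unroll]
        for (int dx = -1; dx <= 1; ++dx)
        {
            const int2   p = pixel + int2(dx, dy);
            const float  d = g_Depth.Load(int3(p, 0));  // out-of-bounds Loads return 0
            const float3 n = g_Normal.Load(int3(p, 0));

            // Crude similarity: normal agreement damped by depth difference.
            const float score = saturate(dot(n, normal)) / (1.0f + abs(d - depth));
            if (score > bestScore)
            {
                bestScore = score;
                bestColor = g_Lighting.Load(int3(p, 0));
            }
        }
    }
    return bestColor;
}
```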

See also Inferred Lighting, which implements this search algorithm with bilateral filtering.


The pixel center is used when interpolating all vertex attributes (including the position) unless you use the "centroid" or "sample" interpolation modifier. "centroid" causes the attributes to be interpolated at the centroid of the covered sub-samples, while "sample" causes them to be evaluated at the location of the sample being shaded (using "sample" will cause your pixel shaders to run at sample rate!). With the default pixel-center behavior, your attributes may be extrapolated off the triangle's surface at edge pixels, so you have to watch out for that.

There are also the EvaluateAttributeAtSample, EvaluateAttributeAtCentroid, and EvaluateAttributeSnapped intrinsics that you can use.
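A small sketch of the interpolation modifiers and two of the SM 5.0 intrinsics (the input layout is made up):

```hlsl
struct VSOut
{
    float4 pos                 : SV_Position;
    float2 uvCenter            : TEXCOORD0; // default: evaluated at the pixel center
    centroid float2 uvCentroid : TEXCOORD1; // evaluated at the centroid of the covered samples
    sample float2 uvSample     : TEXCOORD2; // beware: forces this PS to run at sample rate
};

float4 PS(VSOut i) : SV_Target
{
    // Re-evaluate a pixel-center attribute at explicit locations without
    // changing its declared interpolation mode.
    const float2 uvAtSample0  = EvaluateAttributeAtSample(i.uvCenter, 0);
    const float2 uvAtCentroid = EvaluateAttributeAtCentroid(i.uvCenter);
    return float4(uvAtCentroid, uvAtSample0);
}
```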

6 hours ago, MJP said:

The pixel center is used when interpolating all vertex attributes (including the position) unless you use the "centroid" or "sample" interpolation modifier. "centroid" causes the attributes to be interpolated at the centroid of the covered sub-samples, while "sample" causes them to be evaluated at the location of the sample being shaded (using "sample" will cause your pixel shaders to run at sample rate!). With the default pixel-center behavior, your attributes may be extrapolated off the triangle's surface at edge pixels, so you have to watch out for that.

There are also the EvaluateAttributeAtSample, EvaluateAttributeAtCentroid, and EvaluateAttributeSnapped intrinsics that you can use.

Thanks, really interesting stuff.


As far as I understand, if you are consistent between the GBuffer unpacking + lighting (pixel-center evaluation) and the non-MS SRV-to-MS RTV step (pixel-center evaluation), shouldn't you get the same results as for forward MSAA?

As stated above by @MJP, the pixel center is used by default in MSAA. Or is the issue about not using the pixel center for MSAA in the first place (in forward MSAA as well), since it is not necessarily representative of the majority of the subsamples?

7 hours ago, Hodgman said:

And to solve it, you need to search the lighting buffer in the local neighbourhood for a texel that best matches the geometry (either best-fit or a bilateral filter), in which case those edge pixels will realize that the lighting buffer contains invalid data for them, and will instead blend valid data that they find in their neighbourhood.

Does this assume operating at the subpixel level itself?

Could one store additional IDs for this in the GBuffer?

7 hours ago, Hodgman said:

...but if it's fetching colours from that lighting buffer, the bottom right edge of the purple triangle will receive brown lighting.

Ah, OK, I now see that forward MSAA does not suffer from this, since a fragment for both the purple and the brown triangle would be evaluated and resolved with regard to the subpixels. When using the light buffer, the same two fragments are associated with the brown triangle's material. So in the naive approach, there is only one dominant "material" for all fragments of the same pixel, independent of the actual material associated with the geometry.

Thanks for the visualizations! Really appreciated!


For what it's worth, MotorStorm: Pacific Rift and MotorStorm: Apocalypse (PS3) and Driveclub (PS4) all used varying degrees of depth-only passes.

Pacific Rift had lots of foliage and ground-rush (grass and the like), so we did a depth-only pass of just the world geometry after rendering up to 64 'occluders'. The occluders were just large polys, up to 8 verts I think, that were inside canyon walls, etc. These occluders were also used further up the pipe on the CPU side to do simple coarse-grained PVS.

On Apocalypse, our VSs were very heavy from lots of skinning and the like, so we ended up using conditional rendering. We did a full depth-only pass and used the RSX feature that wrote out a pixel count for each draw call; on the full G-Buffer pass, the RSX used that value to decide whether to skip the draw call. We did a large number of automated fly-throughs of our levels, looking forward and reverse, with this stuff on/off, and it was a win, something like 1 ms - 2 ms if I recall. Everyone told us that conditional rendering was way slower, but not for us.

On Driveclub, we again used occluders but also a full depth-only pass. We fired off a compute shader right after to build tile info for lighting, etc., which ran in parallel with our shadow-map passes. Overall, this was a nice win despite some very heavy vertex shaders.

