Full screen pass optimization on G70 hardware

25 comments, last by Krypt0n 13 years, 9 months ago
Hello,
Just tested our deferred shadows on G70 hardware (7800) and got terrible performance. Our shadows do several full-screen passes with screen-space quads placed at the correct post-perspective Z coordinates, so that each quad encloses a specific depth range of pixels to apply shadow to. So, G70 has no Early-Z / Early-Stencil, and Z-Cull is off since our scene contains a lot of alpha-tested geometry.
So, what can be done to optimize this case? We already have a depth buffer built and want to render several full-screen quads on top of it. We rely on the fact that newer hardware can kill pixel shader execution based on depth / stencil, and that is not available on G70.
Dynamic branching?
One option is to pack all shadow map full-screen passes into a single pass. But this is a bit limiting, and it will still lose performance wherever the shader touches the sky: the deferred shadow pass reconstructs distant positions, which in turn produce distant shadow map texture coordinates, and those slow down rendering even when they are clipped.
So, any ideas?
Anybody with successful implementation of deferred shadow maps on G70 hardware?

(cross-posted on DXDev mailing list)
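
For illustration, the "single pass" option mentioned above could pick the cascade per pixel from its view-space depth. This is only a sketch: the split distances in `SPLITS` and the function name `select_cascade` are hypothetical, and whether skipping sky pixels actually saves work on G70 depends on how well dynamic branching performs there.

```python
from bisect import bisect_right

# Hypothetical far distance of each of the 5 cascades, in view-space units.
SPLITS = [10.0, 30.0, 80.0, 200.0, 500.0]

def select_cascade(view_depth):
    """Return the cascade index covering view_depth, or None for pixels
    beyond the last split (e.g. sky), which need no shadow lookup."""
    if view_depth >= SPLITS[-1]:
        return None
    # Cascade i covers the half-open range [SPLITS[i-1], SPLITS[i]).
    return bisect_right(SPLITS, view_depth)

for d in (5.0, 10.0, 250.0, 1000.0):
    print(d, select_cascade(d))
```

With one lookup per pixel, the shadow map is sampled once instead of once per cascade, at the cost of a per-pixel branch.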
Could you be more specific? How terrible is the performance? How many post-processing shadow passes are we talking about? At what resolution? What kind of shadows (sun, point lights, spotlights)? What shadowing technique (CSM, VSM, ...)?

Are you sure that the G70 has no support for early-Z rejection etc.? I thought it has been supported since the GeForce3 (take a look at chapter 29.1.2).

Quote:Original post by Zemedelec
So, G70 has no Early-Z / Early-Stencil


What gave you that idea?



Quote:Original post by Ashaman73
Could you be more specific? How terrible is the performance? How many post-processing shadow passes are we talking about? At what resolution? What kind of shadows (sun, point lights, spotlights)? What shadowing technique (CSM, VSM, ...)?

Performance is in the single-digit range (1-2 fps), especially when looking at the sky (depth = 1.0), because then every screen pixel is processed for every cascade: we reconstruct a very distant world position and look it up in the shadow map.
5 cascades = 5 full-screen passes. The cost scales with resolution, of course.
Sun shadows, ortho-projection, PSSM/CSM.
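
To put rough numbers on this, here is a back-of-the-envelope cost model. The resolution and the sky coverage are illustrative assumptions (the thread does not state them); only the 5 cascades come from the post. The "ideal" case assumes the cascade depth ranges do not overlap, so each non-sky pixel is shaded by exactly one pass.

```python
# Assumed values, not taken from the thread.
WIDTH, HEIGHT = 1280, 720
SKY_PERCENT = 40          # assumed share of the screen covered by sky

CASCADES = 5              # from the post: 5 cascades = 5 full-screen passes

pixels = WIDTH * HEIGHT

def invocations_without_rejection(pixels, cascades):
    """No early rejection: every full-screen quad shades every pixel."""
    return pixels * cascades

def invocations_with_rejection(pixels, sky_percent):
    """Ideal early rejection: sky pixels are never shaded, and every
    other pixel is shaded by exactly one cascade pass."""
    return pixels * (100 - sky_percent) // 100

worst = invocations_without_rejection(pixels, CASCADES)
ideal = invocations_with_rejection(pixels, SKY_PERCENT)
print(worst, ideal, round(worst / ideal, 2))
```

Under these assumptions the gap is roughly a factor of eight in pixel shader invocations, before counting the cost of the shadow map fetches themselves.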

Quote:Original post by Ashaman73
Are you sure that the G70 has no support for early-Z rejection etc.? I thought it has been supported since the GeForce3 (take a look at chapter 29.1.2).

Yep, they have one thing that is early: Z-Cull, i.e. the coarse-grained test, the one that stops working from the moment you render some alpha-tested geometry to the depth-stencil surface.
Per-pixel Early-Z is G80+, as far as I know.
Quote:Original post by MJP
Quote:Original post by Zemedelec
So, G70 has no Early-Z / Early-Stencil


What gave you that idea?


Docs. Early-Z hardware, they say, appears for the first time in G80.
Early-Z has existed since the GeForce3, as mentioned before. It is disabled if you
- enable alpha test
- use kill/clip in the pixel shader
- change the compare func

In order to get speed again on G70, you need to work around your alpha testing.
This is critical; otherwise you pretty much run without the optimization, and then you're easily 10 to 30 times slower.
For the sky problem, you could use the stencil buffer.

Clear color and stencil to zero (depth to 1.0), and draw all scene geometry with a "replace" stencil operation. Now do your shadow passes with an "equal" stencil function. This will discard all fragments where nothing has been drawn, which corresponds to "sky".
Finally, draw the sky (which doesn't need lighting/shadowing, and which should be drawn last anyway) using normal depth testing. That'll discard all fragments which have been drawn to.
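
A tiny software model of the sky mask described above, just to show which pixels the shadow passes would touch. The depth values, `STENCIL_REF`, and the helper names are made up for the example. Note that it only models which fragments pass the stencil test; whether the pixel shader is skipped for the rejected ones depends on the hardware rejecting them before shading.

```python
STENCIL_REF = 1  # hypothetical reference value written by scene geometry

def build_stencil(depth):
    """Mimic 'clear stencil to 0, draw geometry with REPLACE': every pixel
    covered by geometry (depth < 1.0) ends up holding STENCIL_REF."""
    return [[STENCIL_REF if d < 1.0 else 0 for d in row] for row in depth]

def shadow_pass(stencil, shade):
    """Full-screen pass with an EQUAL stencil test.
    Calls shade(x, y) for passing pixels and returns how many passed."""
    shaded = 0
    for y, row in enumerate(stencil):
        for x, s in enumerate(row):
            if s == STENCIL_REF:
                shade(x, y)
                shaded += 1
    return shaded

# Toy depth buffer: 1.0 means nothing was drawn there (sky).
depth = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.9, 0.8, 1.0],
    [0.5, 0.4, 0.3, 0.2],
]
stencil = build_stencil(depth)
touched = []
count = shadow_pass(stencil, lambda x, y: touched.append((x, y)))
print(count, "of", sum(len(r) for r in depth), "pixels pass the stencil test")
```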

You could use the same trick (stencil) to optimize your full-screen quads on the scene geometry as well.
Render another shadow map first, one that does not have to be cascaded or super high resolution. Disable writes to color and depth. Do your shadow comparison in a "conservative" way, i.e. every pixel where as little as 0.01 of the PCF samples might possibly be in shadow gets its stencil written. The goal is not pretty, alias-free shadows, but to roughly rule out the areas that cannot possibly be in shadow: it is pointless to do the 5 shadow passes there.
Now do your 5 shadow passes with a stencil function that only lets through the fragments marked "possibly in shadow". This should cut down on fill rate considerably.
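
The conservative pre-pass can be modelled the same way. In this sketch `coarse_pcf` stands in for the result of the low-resolution shadow lookup (values invented for the example), and the 0.01 threshold is the one from the post. The count is an upper bound: each of the 5 passes shades at most the marked pixels.

```python
THRESHOLD = 0.01   # from the post: even 0.01 counts as "possibly in shadow"
PASSES = 5         # the 5 cascade passes

def conservative_mask(coarse_pcf):
    """Mark a pixel (stencil = 1) if the coarse PCF result reaches THRESHOLD."""
    return [[1 if v >= THRESHOLD else 0 for v in row] for row in coarse_pcf]

def max_shaded(mask, passes):
    """Upper bound on shaded fragments: every pass touches only marked pixels."""
    marked = sum(sum(row) for row in mask)
    return marked * passes

# Hypothetical coarse PCF results per pixel (0.0 = fully lit, 1.0 = fully shadowed).
coarse_pcf = [
    [0.0, 0.0,   0.02, 0.5],
    [0.0, 0.005, 1.0,  1.0],
    [0.0, 0.0,   0.0,  0.3],
]
mask = conservative_mask(coarse_pcf)
total_pixels = sum(len(row) for row in coarse_pcf)
print(max_shaded(mask, PASSES), "instead of", total_pixels * PASSES)
```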
Quote:Original post by Krypt0n
Early-Z has existed since the GeForce3, as mentioned before. It is disabled if you
- enable alpha test
- use kill/clip in the pixel shader
- change the compare func

And it is disabled only for the DIP (DrawIndexedPrimitive call) that violates these rules, not for later ones, right? My screen-space quads are solid and fine in that respect, but they are rendered after the alpha-tested vegetation.

Quote:Original post by Krypt0n
In order to get speed again on G70, you need to work around your alpha testing.

Well, do you mean rendering the scene without alpha testing? That's not really an option: the vegetation cannot be rendered without alpha testing...
Quote:Original post by samoth
That'll discard all fragments which have been drawn to.

This will discard the fragments *after* the pixel shader has been executed for them. And that is the only problem here; there are no visual artefacts or anything like that.
Or does G70 have early stencil, and I somehow disabled it too...?

Also, in its docs NVIDIA states that DBT (the depth bounds test) works *after* the pixel shader. Am I reading that correctly? If so, DBT is not much of an optimization feature.

