Thoughts on scissor/stencil test as optimizations

Started by
12 comments, last by Toji 18 years, 9 months ago
As some of you know, I'm currently working with omnidirectional shadow mapping. One optimization I've been tooling with in my head is the possibility of using scissor or stencil tests when generating the SMs. However, this image from the MSDN suggests that even using both would not be enormously practical.

If it helps any, here's the optimization I'm thinking of: on the CPU, calculate a rectangle that bounds the area the view frustum covers in image space for the current SM face, and run the scissor test on that. In addition to this, the view frustum would be rendered to the SM face before anything else, to mask any pixels that lie outside of the view frustum (i.e. a finer, secondary test). Also (just thought of this while writing), render the back parts of the view frustum to define the max Z value in that area as well (a tertiary test).

So, do you guys think this would be helpful to do in order to reduce fill? I haven't worked at all with scissoring/stencil, and I am not sure how big of a jump in speed the Z test would give either. My biggest concern is that it would be almost pointless, because the extra fill used by the view frustum rendering and the extra checks would offset the fill/pixel processing saved by doing these tests.
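The CPU-side rectangle described above could be computed roughly as follows. This is only a sketch under stated assumptions, not the poster's actual code: `Vec4`, `ScissorRect`, and `frustumScissorRect` are hypothetical names, the eight camera-frustum corners are assumed to be already transformed into the clip space of the current SM face, and a D3D-style image space (Y pointing down) is assumed. If any corner lies behind the light, the sketch conservatively falls back to the whole face rather than clipping the frustum.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <limits>

// Hypothetical helper types for illustration only.
struct Vec4 { float x, y, z, w; };
struct ScissorRect { int left, top, right, bottom; };  // right/bottom are exclusive

// clipCorners: the 8 camera-frustum corners in the clip space of one SM face.
// faceSize: width and height of the (square) SM face in pixels.
ScissorRect frustumScissorRect(const std::array<Vec4, 8>& clipCorners, int faceSize)
{
    const ScissorRect fullFace{0, 0, faceSize, faceSize};
    float minX = std::numeric_limits<float>::max();
    float minY = minX;
    float maxX = std::numeric_limits<float>::lowest();
    float maxY = maxX;

    for (const Vec4& c : clipCorners) {
        // A corner at or behind the light's eye plane cannot be projected
        // safely, so give up and scissor to the whole face (conservative).
        if (c.w <= 1e-6f) return fullFace;
        // Perspective divide, then map NDC [-1, 1] to pixels with Y flipped.
        const float px = (c.x / c.w * 0.5f + 0.5f) * faceSize;
        const float py = (0.5f - c.y / c.w * 0.5f) * faceSize;
        minX = std::min(minX, px);  maxX = std::max(maxX, px);
        minY = std::min(minY, py);  maxY = std::max(maxY, py);
    }

    // Clamp in float first so the int conversion below cannot overflow.
    const float size = static_cast<float>(faceSize);
    auto clampToFace = [size](float v) { return std::max(0.0f, std::min(size, v)); };

    const int left   = static_cast<int>(std::floor(clampToFace(minX)));
    const int top    = static_cast<int>(std::floor(clampToFace(minY)));
    const int right  = static_cast<int>(std::ceil(clampToFace(maxX)));
    const int bottom = static_cast<int>(std::ceil(clampToFace(maxY)));

    // Frustum entirely off this face: nothing needs to be rendered.
    if (left >= right || top >= bottom) return ScissorRect{0, 0, 0, 0};
    return ScissorRect{left, top, right, bottom};
}
```

With an omnidirectional light sitting inside the view frustum, some corners will be behind the light for most faces, so the fallback path may be hit often; clipping the frustum against the face's near plane before projecting would give a tighter rectangle.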
I have seen that pipeline diagram too and it never made sense to me...It seems like it's easy enough for the hardware to do the various tests before the pixel shader is executed - except for the alpha test, and the depth test in the case of a pixel shader which writes to oDepth. And, even if pixels are being processed in parallel, it will be common for all of them to fail the z test, for instance.

In practice it does seem like it is doing tests before invocation of the pixel shader, since I have implemented stencil tests and scissor tests in various situations to save per-pixel work and have noticed significant speedups - more than I would expect from just frame buffer bandwidth. I guess I can't know for sure though. I'd go ahead and try your optimization, I think there's a good chance it will speed things up.

The card manufacturers often advise using a Z Fill pass and I think they imply that it saves pixel shader invocations.

I'd be very interested in hearing some definitive information about this from someone who knows though...
Quote:Original post by ganchmaster
The card manufacturers often advise using a Z Fill pass and I think they imply that it saves pixel shader invocations.


Actually, because this is being applied to shadow maps, there is only a single pass being done, so a Z Fill pass is pretty much out of the question. It'd be absolutely redundant, since once I conduct all of these optimizations I'll be rendering things to the shadow map in front-to-back.

Yeah, I'm sure I've read in nVidia documentation you're meant to avoid changing depth values in pixel shaders if at all possible, exactly because it stops depth being tested before the pixel shader is run and stops hierarchical depth buffers working. As for that pipeline I think that's the theoretical one (with the exception of the scissor test, I can't see why that has to come after the pixel shader), i.e. it may not be possible to do Z/stencil test before pixel shading.
Quote:Original post by Cypher19
Quote:Original post by ganchmaster
The card manufacturers often advise using a Z Fill pass and I think they imply that it saves pixel shader invocations.


Actually, because this is being applied to shadow maps, there is only a single pass being done, so a Z Fill pass is pretty much out of the question. It'd be absolutely redundant, since once I conduct all of these optimizations I'll be rendering things to the shadow map in front-to-back.


Yeah. I meant that they wouldn't be as keen on everyone using a Z fill pass if it didn't help avoid spending cycles on pixel shading. I was arguing that it is likely that the Z test actually occurs before the pixel shader, but the order in which I wrote the statements was confusing. Certainly it seems pointless for you to use a Z fill pass when rendering shadow maps.

Let us know how your stencil and scissor optimizations come out...I bet the scissor will result in a net savings, but I'm not sure about the stencil.

Actually, I did more looking around regarding stencil/scissor/z test locations in pipelines, and here's the conclusion I came to:

1) Scissor is a definite yes
2) Stencil is a no, but requires more research. It seems like this varies from card to card...
3) But who cares, because I can do an early Z pass with the back of the camera's view frustum, which can act like a stencil! Basically, clear the Z buffer to a value of, say, 0; then, with no Z testing being done, render the back half of the view frustum at its normal depth; and then render the SM normally (with <= Z comparisons). It should be noted that basically EVERY card out there does early hierarchical Z culling, allowing large chunks to be quickly rejected or accepted before the shader. However, not every card does early stencil (why just boggles the mind, imo).
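To make the masking logic in point 3 concrete, here is a tiny software model of the depth buffer. It is a sketch only: `DepthMask` and its methods are hypothetical names, and no real graphics API is involved. It simply mimics the three steps (clear to 0, write the frustum's back depth with the Z test disabled, then draw casters with a <= comparison).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical software stand-in for one SM face's depth buffer.
struct DepthMask {
    std::vector<float> depth;

    // Step 1: clear Z to 0, so by default every fragment with z > 0 fails "<=".
    explicit DepthMask(std::size_t pixelCount) : depth(pixelCount, 0.0f) {}

    // Step 2: render the frustum's back faces with the Z test disabled,
    // which unconditionally stores the farthest depth the camera can see.
    void writeFrustumBack(std::size_t pixel, float z) { depth[pixel] = z; }

    // Step 3: render shadow casters with a <= comparison. Returns true if
    // the fragment passes (and is written), false if it is rejected.
    bool drawCaster(std::size_t pixel, float z)
    {
        if (z <= depth[pixel]) {
            depth[pixel] = z;
            return true;
        }
        return false;
    }
};
```

In this model, pixels the frustum never touched stay at 0 and reject casters, while pixels inside the frustum reject only casters farther away than the frustum's back. Whether real hardware turns those rejections into early-out savings depends on the card, as discussed in this thread.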
Scissor is logically done late, but in reality it's done early. I believe most modern cards won't even generate fragments outside the scissor region, so that's a definite win.

Often shadow maps are not bandwidth bound or fill bound, but setup, vertex or attribute bound, because there is very little for the rasterizer or shader to do, so I suspect that doing the Z-only frustum, while a cool idea, won't buy you much.

Early Z and early stencil optimizations often rely on limited on-chip resources to be allocated by the driver in order to function. These are typically allocated to the main back buffer's z buffer first, so you may get limited or no fast Z or stencil culling on shadow maps rendered to an off-screen texture.

Have you tried NVPerfHUD to verify that you are fillbound on this part of the scene?
No, I haven't, but I can tell you that I DID get a sweet performance boost by not rendering the convex object (big fat room) to the SM.

I just felt that the fill rate is going to be an issue either now or later.
Quote:Original post by SimmerD
Have you tried NVPerfHUD to verify that you are fillbound on this part of the scene?


Maybe I'm missing something here, but how would you use NVPerfHUD to verify that you are fillbound at a particular part of the scene? You can use it to look at the fill implications for your entire scene by turning on and off rasterization, but how for a specific part of the scene?

I could comment out all of the render calls to the main scene itself, leaving SM generation as the only part of the render function?
