Cypher19

Thoughts on scissor/stencil test as optimizations



As some of you know, I'm currently working with omnidirectional shadow mapping. One optimization I've been toying with in my head is using scissor or stencil tests when generating the shadow maps (SMs). However, this image from the MSDN suggests that even using both would not be enormously practical.

If it helps any, here's the optimization I'm thinking of: On the CPU, calculate a rectangle that bounds the area the view frustum covers in image space for the current SM face, and run the scissor test on that. In addition, the view frustum would be rendered to the SM face before anything else, to mask any pixels that lie outside of it (i.e. a finer, secondary test). Also (just thought of this while writing), render the back faces of the view frustum to define the max Z value in that area as well (a tertiary test).

So, do you guys think this would help reduce fill? I haven't worked with scissoring/stencil at all, and I'm not sure how big a speedup the Z test would give either. My biggest concern is that it would be almost pointless, because the extra fill used by rendering the view frustum and the extra checks would offset the fill/pixel processing saved by doing these tests.
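
For what it's worth, the CPU-side rectangle step could be sketched roughly like this. It is only an illustration under assumptions that are not from the thread: the names `Vec3`, `Mat4`, `Rect`, and `ScissorRectForFace` are made up, the matrix uses the D3D row-vector convention (v' = v * M), and if any frustum corner lies behind the face's projection plane the function simply falls back to the full face instead of clipping properly.

```cpp
#include <algorithm>
#include <array>
#include <cfloat>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Rect { int left, top, right, bottom; };  // right/bottom are exclusive

// Row-major 4x4 matrix, row-vector convention (v' = v * M).
using Mat4 = std::array<float, 16>;

// Hypothetical helper: bounds the 8 world-space corners of the camera's view
// frustum in the image space of one shadow-map face of size smSize x smSize.
Rect ScissorRectForFace(const std::array<Vec3, 8>& corners,
                        const Mat4& m, int smSize)
{
    const Rect fullFace = {0, 0, smSize, smSize};
    float minX = FLT_MAX, minY = FLT_MAX;
    float maxX = -FLT_MAX, maxY = -FLT_MAX;

    for (const Vec3& p : corners) {
        const float cx = p.x * m[0] + p.y * m[4] + p.z * m[8]  + m[12];
        const float cy = p.x * m[1] + p.y * m[5] + p.z * m[9]  + m[13];
        const float cw = p.x * m[3] + p.y * m[7] + p.z * m[11] + m[15];

        // Assumption: a corner at or behind the projection plane cannot be
        // divided safely, so be conservative and keep the whole face.
        if (cw <= 1e-4f) return fullFace;

        minX = std::min(minX, cx / cw);
        maxX = std::max(maxX, cx / cw);
        minY = std::min(minY, cy / cw);
        maxY = std::max(maxY, cy / cw);
    }

    // Every corner is in front of the plane and the bounds miss the face:
    // nothing of the view frustum is visible here, so return an empty rect.
    if (minX > 1.0f || maxX < -1.0f || minY > 1.0f || maxY < -1.0f)
        return Rect{0, 0, 0, 0};

    // Clamp to the face in normalized device coordinates.
    minX = std::max(minX, -1.0f);
    maxX = std::min(maxX, 1.0f);
    minY = std::max(minY, -1.0f);
    maxY = std::min(maxY, 1.0f);

    // NDC -> pixels; y is flipped because pixel row 0 is the top of the face.
    const float s = static_cast<float>(smSize);
    Rect r;
    r.left   = static_cast<int>(std::floor((minX * 0.5f + 0.5f) * s));
    r.right  = static_cast<int>(std::ceil((maxX * 0.5f + 0.5f) * s));
    r.top    = static_cast<int>(std::floor((0.5f - maxY * 0.5f) * s));
    r.bottom = static_cast<int>(std::ceil((0.5f - minY * 0.5f) * s));
    return r;
}
```

The resulting rectangle would then be handed to the API's scissor call (in D3D9, `IDirect3DDevice9::SetScissorRect` with `D3DRS_SCISSORTESTENABLE` turned on). An empty rectangle means the face could be skipped entirely.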

I have seen that pipeline diagram too and it never made sense to me...It seems like it's easy enough for the hardware to do the various tests before the pixel shader is executed - except for the alpha test, and the depth test in the case of a pixel shader which writes to oDepth. And, even if pixels are being processed in parallel, it will be common for all of them to fail the z test, for instance.

In practice it does seem like it is doing tests before invocation of the pixel shader, since I have implemented stencil tests and scissor tests in various situations to save per-pixel work and have noticed significant speedups - more than I would expect from just frame buffer bandwidth. I guess I can't know for sure though. I'd go ahead and try your optimization, I think there's a good chance it will speed things up.

The card manufacturers often advise using a Z Fill pass and I think they imply that it saves pixel shader invocations.

I'd be very interested in hearing some definitive information about this from someone who knows though...

Quote:
Original post by ganchmaster
The card manufacturers often advise using a Z Fill pass and I think they imply that it saves pixel shader invocations.


Actually, because this is being applied to shadow maps, there is only a single pass being done, so a Z Fill pass is pretty much out of the question. It'd be absolutely redundant, since once I conduct all of these optimizations I'll be rendering things to the shadow map in front-to-back.

Yeah, I'm sure I've read in nVidia documentation that you're meant to avoid changing depth values in pixel shaders if at all possible, exactly because it stops depth from being tested before the pixel shader runs and stops hierarchical depth buffers from working. As for that pipeline diagram, I think it shows the theoretical order (with the exception of the scissor test; I can't see why that has to come after the pixel shader), i.e. the case where it may not be possible to do the Z/stencil test before pixel shading.

Quote:
Original post by Cypher19
Quote:
Original post by ganchmaster
The card manufacturers often advise using a Z Fill pass and I think they imply that it saves pixel shader invocations.


Actually, because this is being applied to shadow maps, there is only a single pass being done, so a Z Fill pass is pretty much out of the question. It'd be absolutely redundant, since once I conduct all of these optimizations I'll be rendering things to the shadow map in front-to-back.


Yeah. I meant that they wouldn't be as keen on everyone using a Z fill pass if it didn't help avoid spending cycles on pixel shading. I was arguing that it is likely that the Z test actually occurs before the pixel shader, but the order in which I wrote the statements was confusing. Certainly it seems pointless for you to use a Z fill pass when rendering shadow maps.

Let us know how your stencil and scissor optimizations come out...I bet the scissor will result in a net savings, but I'm not sure about the stencil.

Actually, I did more looking around regarding stencil/scissor/z test locations in pipelines, and here's the conclusion I came to:

1) Scissor is a definite yes
2) Stencil is a no, but requires more research. It seems like this varies from card to card...
3) But who cares, because I can do an early Z pass with the back of the camera's view frustum, which can act like a stencil! Basically, clear the Z buffer to a value of, say, 0, then with Z testing disabled render the back half of the view frustum at its normal depth, and then render the SM normally (with a <= Z comparison). It should be noted that basically EVERY card out there does early hierarchical Z culling, allowing large chunks to be quickly rejected or accepted before the shader. However, not every card does early stencil (why just boggles the mind, imo).
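
To make the logic of point 3 concrete, here is a tiny CPU model of the depth buffer, not actual GPU or D3D code. The class `DepthMaskModel` and its method names are invented for illustration, and it assumes D3D-style depth where 0 is near and 1 is far. It only shows which fragments would survive the three steps. One edge case to be aware of: a caster fragment at exactly depth 0 would still pass on a cleared pixel.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of one shadow-map face's depth buffer.
class DepthMaskModel {
public:
    DepthMaskModel(int width, int height)
        : width_(width),
          depth_(static_cast<std::size_t>(width) * height, 0.0f) {}

    // Step 1: clear depth to 0, so by default any later fragment with
    // z > 0 fails a "<=" comparison.
    void ClearToZero() { std::fill(depth_.begin(), depth_.end(), 0.0f); }

    // Step 2: back faces of the view frustum, drawn with the depth test
    // disabled (always pass) and depth writes enabled.
    void WriteFrustumBack(int x, int y, float z) { depth_[Index(x, y)] = z; }

    // Step 3: a shadow-caster fragment tested with "<=". Returns true if
    // the fragment survives; a surviving fragment also writes its depth.
    bool DrawCasterFragment(int x, int y, float z) {
        float& stored = depth_[Index(x, y)];
        if (z <= stored) {
            stored = z;
            return true;
        }
        return false;
    }

private:
    std::size_t Index(int x, int y) const {
        return static_cast<std::size_t>(y) * width_ + x;
    }

    int width_;
    std::vector<float> depth_;
};
```

In this model, pixels the frustum never touched keep depth 0 and reject casters (the stencil-like mask), while pixels inside the frustum's footprint reject anything farther than the frustum's back (the max-Z test). Whether real hardware rejects those fragments before the pixel shader is the open question in this thread.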

Scissor is logically done late, but in reality it's done early. I believe most modern cards won't even generate fragments outside the scissor region, so that's a definite win.

Often shadow maps are not bandwidth bound or fill bound, but setup, vertex, or attribute bound, because there is very little for the rasterizer or shader to do, so I suspect that doing the Z-only frustum, while a cool idea, won't buy you much.

Early Z and early stencil optimizations often rely on limited on-chip resources to be allocated by the driver in order to function. These are typically allocated to the main back buffer's z buffer first, so you may get limited or no fast Z or stencil culling on shadow maps rendered to an off-screen texture.

Have you tried NVPerfHUD to verify that you are fillbound on this part of the scene?

No, I haven't, but I can tell you that I DID get a sweet performance boost by not rendering the convex object (big fat room) to the SM.

I just felt that the fill rate is going to be an issue either now or later.

Quote:
Original post by SimmerD
Have you tried NVPerfHUD to verify that you are fillbound on this part of the scene?


Maybe I'm missing something here, but how would you use NVPerfHUD to verify that you are fillbound at a particular part of the scene? You can use it to look at the fill implications for your entire scene by turning on and off rasterization, but how for a specific part of the scene?

I could comment out all of the render calls to the main scene itself, leaving SM generation as the only part of the render function?
