ddx, ddy reuse for better performance?


Background

So here's the scenario. I have a baseUV that I reuse to sample several base maps (e.g. base color, material, normal map, height map). When I apply Parallax Occlusion Mapping, I have to calculate ddx(baseUV) and ddy(baseUV) before sampling the height map in a while loop.

 

Questions

  1. Is it more efficient for me to reuse that ddx(baseUV) and ddy(baseUV) with tex2Dgrad() to sample all my base maps?
  2. Is it better to just use tex2D() at this point, especially after applying offsets from POM?

 

Thanks.

Edited by Josh Petrie


I think you should recalculate ddx/ddy after the offsets (from POM) have been added to the texcoords.

The offsets differ from pixel to pixel, so the UVs used to sample the base maps differ too, and so do their derivatives.

Since you have to calculate them anyway, you might as well use tex2D.

 

You can probably derive a scaling factor (from the offsets) to apply to the baseUV ddx/ddy, and that will likely be cheaper than recalculating.
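
To make the point about changing derivatives concrete, here is a minimal CPU-side sketch in C++. It is not shader code: it only models one 2x2 pixel quad and computes coarse derivatives as differences between neighbouring pixels. The names `Float2`, `Quad`, `quadDdx`, `quadDdy` and `addOffsets` are made up for this illustration, and the first-row/first-column difference is a simplification of what hardware does.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 2D vector type standing in for HLSL's float2.
struct Float2 { float x, y; };

Float2 operator+(Float2 a, Float2 b) { return {a.x + b.x, a.y + b.y}; }
Float2 operator-(Float2 a, Float2 b) { return {a.x - b.x, a.y - b.y}; }

// One 2x2 pixel quad: [top-left, top-right, bottom-left, bottom-right].
using Quad = std::array<Float2, 4>;

// Coarse derivatives (assumption: simple neighbour differences within the quad).
Float2 quadDdx(const Quad& q) { return q[1] - q[0]; }
Float2 quadDdy(const Quad& q) { return q[2] - q[0]; }

// Adds a per-pixel offset (such as a POM offset) to each pixel's UV.
Quad addOffsets(const Quad& uv, const Quad& offset) {
    Quad result{};
    for (std::size_t i = 0; i < uv.size(); ++i) {
        result[i] = uv[i] + offset[i];
    }
    return result;
}
```

If the offsets are identical across the quad, `quadDdx(addOffsets(uv, offset))` equals `quadDdx(uv)`. As soon as they differ per pixel, the derivative of the shifted UVs is the derivative of baseUV plus the derivative of the offsets, which is why gradients taken from baseUV alone no longer match.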

Calculating ddx/ddy is as cheap as a subtraction, so I would recalculate it. tex2Dgrad is mostly useful when you have branching paths in your shader. Inside a branch you cannot calculate ddx and ddy, because the neighbouring pixels in the quad may not have taken the same path. If you need to sample a texture inside an if statement, calculate ddx and ddy before the branch and pass them into tex2Dgrad.
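
As a rough illustration of why branching breaks the derivative, here is a small CPU-side C++ sketch (again a model, not shader code). It assumes a derivative is just a subtraction between two neighbouring pixels, and treats it as unavailable when either neighbour skipped the branch. `Lane`, `LaneQuad`, `ddxInsideBranch` and `ddxBeforeBranch` are hypothetical names for this example only; real hardware behaviour under divergent flow control is undefined rather than a clean failure.

```cpp
#include <array>

// One pixel of a 2x2 quad. A single scalar coordinate keeps the example short.
struct Lane {
    float u;      // texture coordinate for this pixel
    bool active;  // false if this pixel did not enter the branch
};

// [top-left, top-right, bottom-left, bottom-right]
using LaneQuad = std::array<Lane, 4>;

// Inside a branch: the subtraction is only meaningful when both
// neighbours actually executed the code that produced `u`.
// Returns false (and leaves `out` untouched) otherwise.
bool ddxInsideBranch(const LaneQuad& q, float& out) {
    if (!q[0].active || !q[1].active) {
        return false;
    }
    out = q[1].u - q[0].u;
    return true;
}

// Before the branch: every pixel is still running, so the
// subtraction is always valid and can be passed on explicitly.
float ddxBeforeBranch(const LaneQuad& q) {
    return q[1].u - q[0].u;
}
```

This mirrors the advice above: take the gradient while all pixels are still in lockstep, then hand it to the sampling call inside the branch.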


Historically, tex2Dgrad is slower than a normal tex2D even if they have equivalent results. Explicitly specifying gradients potentially requires that the shader core send quite a bit more per-thread data (6 floats vs 2 floats for the 2D case), and on some older GPUs this caused a performance penalty. I'm not sure if it's still slower on newer GPUs, but personally I would still avoid it in order to avoid unneeded register pressure.

Edited by MJP
