ddx, ddy reuse for better performance?

Started by
3 comments, last by MJP 7 years, 9 months ago

Background

So here's the scenario. I have a baseUV that I reuse to sample several base maps (e.g. base color, material, normal map, height map, etc.). When I apply Parallax Occlusion Mapping, I have to calculate ddx(baseUV) and ddy(baseUV) up front so that I can sample the height map inside a while loop.

Questions

  1. Is it more efficient for me to reuse that ddx(baseUV) and ddy(baseUV) with tex2Dgrad() to sample all my base maps?
  2. Is it better to just use tex2D() at this point, especially after applying offsets from POM?

Thanks.


[Hardware:] Falcon Northwest Tiki, Windows 7, Nvidia Geforce GTX 970

[Websites:] Development Blog | LinkedIn
[Unity3D :] Alloy Physical Shader Framework


I think you should recalculate ddx/ddy after the offsets (from POM) have been added to the texcoords.

Because the offsets differ between pixels, the UVs used to sample the base maps differ too, and so do their derivatives.

So as you have to calculate them anyway, you might as well use tex2D.

You can probably derive a scaling factor (from the offsets) to apply to the baseUV ddx/ddy, and that will likely be cheaper than recalculating.
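To spell out why the derivatives change, here is the relationship written as a short worked equation. The symbol \Delta is notation introduced here for the per-pixel POM offset; it does not appear elsewhere in the thread.

```latex
\frac{\partial\,(\mathrm{baseUV} + \Delta)}{\partial x}
  = \frac{\partial\,\mathrm{baseUV}}{\partial x} + \frac{\partial \Delta}{\partial x},
\qquad
\frac{\partial\,(\mathrm{baseUV} + \Delta)}{\partial y}
  = \frac{\partial\,\mathrm{baseUV}}{\partial y} + \frac{\partial \Delta}{\partial y}
```

So ddx(baseUV) and ddy(baseUV) only match the derivatives of the shifted UVs where the offset is constant across neighboring pixels. Anywhere the offset varies, the second term is non-zero and the reused gradients are off by that amount.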

Calculating ddx and ddy is as cheap as a subtraction, so I would recalculate them. tex2Dgrad is mostly useful when you have branching paths in your shader. Inside a divergent branch you cannot calculate ddx and ddy reliably, because neighboring pixels may not take the same path. If you need to sample a texture inside an if statement, calculate ddx and ddy before the branch and pass them into tex2Dgrad.
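As a rough sketch of that pattern applied to the POM case from the question: take the gradients once before the loop, use tex2Dgrad for the height map inside it, then go back to plain tex2D for the base maps once the loop has finished. Every name here (_HeightMap, _BaseColor, NUM_STEPS, heightScale, viewDirTS, and both function names) is made up for illustration and is not taken from anyone's actual shader. It also assumes viewDirTS is a tangent-space view vector with z > 0.

```hlsl
// Hypothetical resources and constants, for illustration only.
sampler2D _HeightMap;
sampler2D _BaseColor;

#define NUM_STEPS 16

// Marches a ray through the height field and returns the shifted UV.
// Assumes viewDirTS points from the surface toward the eye, with z > 0.
float2 ParallaxOffsetUV(float2 baseUV, float3 viewDirTS, float heightScale)
{
    // Take the derivatives once, before any divergent control flow.
    float2 dx = ddx(baseUV);
    float2 dy = ddy(baseUV);

    float  stepSize = 1.0 / NUM_STEPS;
    float2 uvStep   = (viewDirTS.xy / viewDirTS.z) * heightScale * stepSize;

    float2 uv            = baseUV;
    float  rayHeight     = 1.0;
    float  surfaceHeight = tex2Dgrad(_HeightMap, uv, dx, dy).r;

    // Pixels leave this loop at different iterations, so implicit
    // derivatives are not usable here; pass the saved gradients instead.
    [loop]
    for (int i = 0; i < NUM_STEPS && surfaceHeight < rayHeight; ++i)
    {
        rayHeight    -= stepSize;
        uv           -= uvStep;
        surfaceHeight = tex2Dgrad(_HeightMap, uv, dx, dy).r;
    }

    return uv;
}

float4 SampleBaseColor(float2 baseUV, float3 viewDirTS, float heightScale)
{
    float2 uv = ParallaxOffsetUV(baseUV, viewDirTS, heightScale);

    // Outside the loop, tex2D derives its gradients from the shifted UVs,
    // so nothing extra has to be passed along.
    return tex2D(_BaseColor, uv);
}
```

The other base maps (normal, material, and so on) would be sampled the same way as _BaseColor, using the shifted uv with tex2D.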

Okay. Thank you for your assistance. ;)



Historically, tex2Dgrad is slower than a normal tex2D even if they have equivalent results. Explicitly specifying gradients potentially requires the shader core to send quite a bit more per-thread data (6 floats vs. 2 floats for the 2D case), and on some older GPUs this caused a performance penalty. I'm not sure if it's still slower on newer GPUs, but personally I would still avoid it in order to avoid unneeded register pressure.
