Quat

Static Branching on DirectX 10/11 Hardware Optimal?

This topic is 2705 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Right now I am using a few uber shaders, and an effects framework type system to generate specialized shaders based on the options I need. However, even with such a system, there are still a lot of combinations I need to manually specify. I could maybe work on automating this, but some combinations don't make sense, so it still requires a human to put the combinations together.

I was thinking of doing a hybrid of generating specialized shaders along with static branching (branching based on a constant register value) to make this more manageable. From my understanding, with static branching the driver would do the optimization and generate a specialized shader at runtime. However, I believe there was concern with some DirectX 9 hardware that the driver might not optimize this well, so doing the specialization yourself was preferred. Does anyone know if this situation has improved with DirectX 10/11 hardware? If you have a static conditional:

float4 texColor = float4(1,1,1,1);
if( UseTexture )
{
    texColor = DiffuseTex.Sample(...);
}

will it always skip the texture sample instructions if UseTexture == false? I don't want to pay for what I'm not using.
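
For concreteness, here is a minimal sketch of the kind of shader being asked about, assuming SM4+ HLSL. Only UseTexture and DiffuseTex come from the snippet above; the cbuffer name PerMaterial, the sampler LinearSampler, the VSOut struct and the PSMain entry point are hypothetical names made up for illustration.

```hlsl
// Hypothetical names throughout, except UseTexture and DiffuseTex.
cbuffer PerMaterial : register(b0)
{
    bool UseTexture;   // set by the application; constant for a whole draw call
};

Texture2D    DiffuseTex    : register(t0);
SamplerState LinearSampler : register(s0);

struct VSOut
{
    float4 posH : SV_Position;
    float2 uv   : TEXCOORD0;
};

float4 PSMain(VSOut pin) : SV_Target
{
    float4 texColor = float4(1, 1, 1, 1);

    // The condition reads only a constant-buffer value, so every pixel
    // in the draw call takes the same side of the conditional.
    if (UseTexture)
    {
        texColor = DiffuseTex.Sample(LinearSampler, pin.uv);
    }
    return texColor;
}
```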

float4 texColor = float4(1,1,1,1);
if( UseTexture )
{
    texColor = DiffuseTex.Sample(...);
}


As in any other programming language, the statement won't execute if the condition isn't met. And if you make UseTexture a global, which is what you are likely going to do, then the conditional should be very efficient and shouldn't cost any performance, IIRC.

On the GPU, as far as I know, both paths will be executed. So whether or not UseTexture is true, the sample is still performed; it's just not used.

The performance difference is most likely negligible, even for modest-sized scenes. You should test it with a scene that uses this shader for the whole framebuffer.

If it does cause a noticeable difference, remove the if() and always initialise texColor from DiffuseTex, binding a plain white texture by default when no real texture is needed. That MAY help somewhat.
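
A rough sketch of that branch-free variant, assuming the application binds a small all-white texture whenever a material has no diffuse map. LinearSampler, the uv input and PSMain are hypothetical names, not taken from the original shader.

```hlsl
// Assumption: the application binds a 1x1 white texture to t0
// for materials that have no diffuse map.
Texture2D    DiffuseTex    : register(t0);
SamplerState LinearSampler : register(s0);   // hypothetical sampler name

float4 PSMain(float4 posH : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    // No conditional at all: untextured materials simply sample white,
    // which gives the same (1,1,1,1) the original code started from.
    float4 texColor = DiffuseTex.Sample(LinearSampler, uv);
    return texColor;
}
```

The trade-off is that the sample is always issued, but a tiny texture should stay in the texture cache, so the cost is likely small.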

"On the GPU, as far as I know, both paths will be executed. So whether or not UseTexture is true, the sample is still performed; it's just not used."

Well, you are right, but as I said, I think that if you define UseTexture as a global, the second branch should not be executed, since the shader should know that the condition won't change during execution. I recall having read that on MSDN, but I could be mistaken.

You must #define UseTexture as 1 (or 0), and then the branch will not be compiled into the code. This will be the same for DX9, DX10 and DX11, because this is an easy one for the compiler to figure out and discard.
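
As a sketch of how that might look, assuming the value is supplied per variant at compile time (for example with fxc's /D option): the file name shader.hlsl, the sampler LinearSampler and the entry point PSMain are hypothetical.

```hlsl
// One compiled variant per setting, e.g.:
//   fxc /T ps_4_0 /E PSMain /D UseTexture=1 shader.hlsl
// Fall back to the untextured variant if nothing was passed in.
#ifndef UseTexture
#define UseTexture 0
#endif

Texture2D    DiffuseTex    : register(t0);
SamplerState LinearSampler : register(s0);   // hypothetical sampler name

float4 PSMain(float4 posH : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 texColor = float4(1, 1, 1, 1);

    // After preprocessing this is if (0) or if (1), a compile-time
    // constant, so the compiler can drop the test and, for 0, the
    // texture sample as well.
    if (UseTexture)
    {
        texColor = DiffuseTex.Sample(LinearSampler, uv);
    }
    return texColor;
}
```

This is the specialization approach from the original question: it costs nothing at runtime, but every combination of defines is a separate shader to compile and manage.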


Don't worry about it until you can measure a performance impact from this method (which I doubt you will). On all sensible modern hardware the GPU should at least be smart enough not to perform the memory access associated with the texture instruction, which is what will matter most performance-wise most of the time.


You should perhaps check the disassembly to see if you actually have a jump instruction. If you do, I would be surprised if the driver does not optimize the shader so that it always (genuinely) jumps over the code, or just eliminates the code in the if() when it is significant. (With DX9 at least, having instructions dependent on derivatives could make it tricky for the compiler.)



David

There are 3 ways of conditional branching:
  • no branching (compile-time rewrite)
  • no branching (execution of both branches, conditional move)
  • dynamic branching

All generations of hardware perform the first two ways of conditional branching in the same way. The third way is only available on SM3 and newer hardware.

Shaders execute in "warps" (groups of threads running in a processor group, usually around 32 or 64 threads). On older hardware, every processor in the processor group would perform exactly the same instructions at exactly the same time. Newer hardware allows different processors in the same group to take individual code paths, but it still launches the entire group at the same time and requires the entire group to finish "as one". That means the entire warp will be as slow as its slowest-executing fragment.

If you use a constant, the shader compiler will almost certainly (you never have a guarantee, but this is as certain as it can get) produce a binary that does not contain a branch at all.


There are 3 ways of conditional branching:
  • no branching (compile-time rewrite)
  • no branching (execution of both branches, conditional move)
  • dynamic branching

All generations of hardware perform the first two ways of conditional branching in the same way. The third way is only available on SM3 and newer hardware.

You can split the 'dynamic branching' dot-point into 'static branches' and 'dynamic branches'.
A branch instruction that depends only on a uniform bool or int will become a static branch; the rest become dynamic branches. A static branch will take slightly fewer cycles, but you still pay that cost per-vertex/per-pixel. The cost doesn't go away completely.


Also, a lot of the time, when you write a branch, the compiler will actually spit out the assembly for option #2 above (execution of both branches, conditional move) unless you use the [branch] hint to force its hand. Use fxc's assembly output options if you're interested in how your ifs are being compiled.
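
A small sketch of the hint in use, assuming SM4+ HLSL. The cbuffer name PerMaterial, the sampler LinearSampler, the entry point PSMain and the file names in the comment are hypothetical.

```hlsl
cbuffer PerMaterial : register(b0)   // hypothetical cbuffer name
{
    bool UseTexture;
};

Texture2D    DiffuseTex    : register(t0);
SamplerState LinearSampler : register(s0);

float4 PSMain(float4 posH : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 texColor = float4(1, 1, 1, 1);

    // [branch] asks the compiler to emit a real branch instruction.
    // [flatten] in its place would ask for the opposite: evaluate the
    // body and select the result (option #2 in the list above).
    [branch]
    if (UseTexture)
    {
        texColor = DiffuseTex.Sample(LinearSampler, uv);
    }
    return texColor;
}

// To see what was actually generated, write an assembly listing:
//   fxc /T ps_4_0 /E PSMain /Fc shader.asm shader.hlsl
```

Comparing the listing with and without the attribute should show whether the compiler emitted a branch or a flattened select for this particular if.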
