Question about dynamic branching efficiency

6 comments, last by coderchris 15 years, 5 months ago
So, I'm thinking about switching my shader system to use an "ubershader" approach, and I'm curious how efficient dynamic branching is on modern cards. I know, I know: profile :) I will profile, but I don't actually have this implemented yet and want to do some research before I take the plunge.

I had heard that dynamic branching is particularly bad on NVIDIA cards (especially pre-G80), and that ATI has always had pretty efficient dynamic branching. From what I understand, dynamic branching is so slow on some cards because they process pixels in blocks; on early GeForce cards the block size was huge, and if any pixels in a block took different paths, the whole block had to evaluate both paths.

For what I want to do (ubershader), the dynamic branching will happen based on some shader parameters; however, they will be uniform for each different object (so object A will have the same parameters for every pixel of A drawn). In my case, I will be passing lighting parameters, shadow maps, etc. Does this mean that I won't have the problem I mentioned above, since all pixels for that object will take exactly the same path?

Also, how long do the actual branching instructions take relative to, say, a texture fetch or something?

I am targeting Shader Model 4 cards, but I'm curious about dynamic branching performance on SM3 cards as well. Here's a quick pseudocode example of the kind of ubershader I'm talking about. Do you guys think this type of thing is viable (and by viable, I mean really fast) on modern cards?

// Shader constants, set once per object
int numLights;
int lightType[8];
bool hasShadow[8];
sampler shadowMaps[8];

pixelShader(...) {
    for (int i = 0; i < numLights; i++) {
        if (lightType[i] == DIRECTIONAL) {
            if (hasShadow[i]) {
                // shadow stuff
            }
            // other complicated stuff
        }
        else if (lightType[i] == SPOT) {
            // some complicated stuff like above
        }
        else if (...) {
            // and so on
        }
    }

    // more stuff not involving branching
}
Thanks, Chris
Quote:Original post by coderchris
For what I want to do (ubershader), the dynamic branching will happen based on some shader parameters; however, they will be uniform for each different object (so object A will have the same parameters for every pixel of A drawn). In my case, I will be passing lighting parameters, shadow maps, etc.
Does this mean that I won't have the problem I mentioned above, since all pixels for that object will take exactly the same path?

Also, how long do the actual branching instructions take relative to, say, a texture fetch or something?

What you intend on doing is not dynamic branching but rather static branching. It's practically for free because the graphics driver will extract an optimized shader for each of your objects.

For true dynamic branching the instruction itself can take between zero and about four clock cycles, depending on the hardware and the specific instruction.
A graphics processor is a massive SIMD machine. As such, any uniform branching (where every pixel in a batch takes the same path) evaluates only the used branch, and any non-uniform branching requires the processor to evaluate both branches for the whole batch.
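A toy model of the SIMD behaviour described above, as a minimal Python sketch. It assumes pixels are shaded in fixed-size batches that run in lockstep, and that a divergent batch pays for both sides of an if/else on every pixel; the batch sizes and cycle costs are made-up parameters, not figures for any real GPU, and the function name `branch_cost` is hypothetical.

```python
def branch_cost(conditions, batch_size, cost_if, cost_else):
    """Total pixel-cycles spent on one if/else under a simplified SIMD model.

    conditions: one bool per pixel (True means the pixel wants the 'if' side).
    Pixels are grouped into consecutive batches of batch_size that execute
    in lockstep. A coherent batch runs only the side it needs; a divergent
    batch runs both sides, with inactive pixels masked off but still
    occupying their lanes.
    """
    total = 0
    for start in range(0, len(conditions), batch_size):
        batch = conditions[start:start + batch_size]
        if all(batch):
            per_pixel = cost_if
        elif not any(batch):
            per_pixel = cost_else
        else:
            per_pixel = cost_if + cost_else  # divergent: both sides execute
        total += per_pixel * len(batch)
    return total


# Per-object uniform condition (coderchris's case): every batch is coherent.
uniform = [True] * 1024
# Per-pixel condition: only the first 102 pixels take the 'if' side.
edge = [i < 102 for i in range(1024)]

for size in (4, 256):
    print(size, branch_cost(uniform, size, 20, 5), branch_cost(edge, size, 20, 5))
```

With the uniform condition the batch size makes no difference (20480 pixel-cycles either way). With the per-pixel condition, a 256-wide batch straddles the boundary and costs 10240 versus 6700 for 4-wide batches, which is the "huge block size" problem from the original question.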
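The critical thing to remember is that when sampling a texture inside a dynamic branch you must specify the mipmap level or the gradients explicitly, using tex2Dlod or tex2Dgrad for example. Otherwise the branch can't be dynamic: the implicit derivatives used to pick the mip level are undefined when neighbouring pixels take different paths, so the compiler will typically flatten the branch. Used correctly, dynamic branching can save a lot of cycles.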
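Why the explicit level matters: the hardware normally estimates the mip level from how fast the texture coordinates change between neighbouring pixels of a 2x2 quad (the same differences ddx/ddy return). Below is a minimal Python sketch of that estimate, using the common approximation from the OpenGL specification (largest screen-space footprint in texels, then log2). The helper names are hypothetical and real hardware may differ in detail. If a neighbour in the quad skipped the branch, its coordinates would simply not exist, which is why the derivatives have to be taken outside the branch and passed in (the tex2Dgrad route) or turned into a level up front (the tex2Dlod route).

```python
import math


def quad_derivatives(uv):
    """Finite differences across a 2x2 pixel quad, like ddx/ddy.

    uv maps (x, y) pixel offsets in {0, 1} to (u, v) texture coordinates.
    Every pixel of the quad must have a value; a pixel that did not run
    this code has no coordinates to difference against.
    """
    u00, v00 = uv[(0, 0)]
    u10, v10 = uv[(1, 0)]
    u01, v01 = uv[(0, 1)]
    return (u10 - u00, v10 - v00), (u01 - u00, v01 - v00)


def mip_level(ddx_uv, ddy_uv, width, height):
    """Approximate mip level from UV derivatives in normalized units.

    Derivatives are scaled to texels per pixel; the level is log2 of the
    larger footprint, clamped to 0 (the base level) when magnifying.
    """
    fx = math.hypot(ddx_uv[0] * width, ddx_uv[1] * height)
    fy = math.hypot(ddy_uv[0] * width, ddy_uv[1] * height)
    rho = max(fx, fy)
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0


# A 256x256 texture stepped 1/128 per pixel covers 2 texels per pixel.
step = 1.0 / 128
quad = {(0, 0): (0.0, 0.0), (1, 0): (step, 0.0),
        (0, 1): (0.0, step), (1, 1): (step, step)}
ddx_uv, ddy_uv = quad_derivatives(quad)     # taken while the whole quad is active
print(mip_level(ddx_uv, ddy_uv, 256, 256))  # 1.0
```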
Quote:
What you intend on doing is not dynamic branching but rather static branching. It's practically for free because the graphics driver will extract an optimized shader for each of your objects.


Static branching? I thought that was used when you're passing in uniform variables, like the kind of thing you would have different techniques for. The variables I'm talking about are actual shader constants set per object, so that I can have just one technique in my FX file. Is the compiler smart enough to actually compile the hundreds of variations of shader parameters and then choose the right one based on the values I set?
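For a sense of scale on "hundreds of variations", here is a quick Python count of how many distinct configurations the constants in the pseudocode above could take. It assumes (hypothetically) three light types, an independent shadow on/off flag per light, and zero to eight lights; the function names are made up for illustration. If the order of the light slots matters, the count runs into the millions; even if lights are sorted so that only the multiset of light kinds matters, it is still in the thousands. That is why the driver-side caching described in the reply below only builds the combinations actually used.

```python
from math import comb


def ordered_configs(max_lights, kinds):
    """Configurations when each light slot is independent and order matters."""
    return sum(kinds ** n for n in range(max_lights + 1))


def unordered_configs(max_lights, kinds):
    """Configurations when only the multiset of light kinds matters.

    The number of multisets of size n drawn from `kinds` options is
    comb(n + kinds - 1, n).
    """
    return sum(comb(n + kinds - 1, n) for n in range(max_lights + 1))


LIGHT_TYPES = 3          # hypothetical: directional, spot, point
KINDS = LIGHT_TYPES * 2  # each light type with or without a shadow map
print(ordered_configs(8, KINDS))    # 2015539
print(unordered_configs(8, KINDS))  # 3003
```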

Quote:
A graphics processor is a massive SIMD machine. As such, any uniform branching evaluates only the used branch, and any non-uniform branching requires the processor to evaluate both branches.


OK, so assuming it is in fact dynamic branching I'm looking at here, it should be safe to assume that only one of the many branches will actually be taken, since the condition is uniform.
What you're talking about is static branching, and it typically compiles down to static branch instructions in the D3D asm-level shaders. The driver may implement that behind the scenes by creating multiple cached copies of hardware-level shaders, one for each combination you use, or it may support it using actual static branching in hardware, which is generally 'free'. There are costs to using static branching and an uber-shader approach, though, which is why I put 'free' in quotes: the compiler may have a harder time optimizing a shader that uses static branching, and the shader may end up using more general-purpose registers (GPRs) than if you compiled several unique shaders. That can lead to performance problems, because the more registers each pixel needs, the fewer pixels the GPU can keep in flight to hide texture latency.
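The register-pressure point can be made concrete with a back-of-the-envelope Python sketch. It assumes a processor with a fixed register file shared by all resident pixels and a cap on how many pixels it can hold; the 16384-register file, the 1024-pixel cap and the per-shader register counts are invented numbers, not specs for any real GPU.

```python
def pixels_in_flight(register_file, regs_per_pixel, max_resident):
    """Pixels that fit on one processor when each needs regs_per_pixel registers."""
    return min(max_resident, register_file // regs_per_pixel)


REGISTER_FILE = 16384   # hypothetical registers per processor
MAX_RESIDENT = 1024     # hypothetical hardware cap on resident pixels

# A lean, single-purpose shader versus an ubershader whose register
# allocation has to cover its most demanding path.
specialized = pixels_in_flight(REGISTER_FILE, 12, MAX_RESIDENT)
uber = pixels_in_flight(REGISTER_FILE, 32, MAX_RESIDENT)
print(specialized, uber)  # 1024 512
```

Under these made-up numbers the ubershader keeps half as many pixels resident, so there are fewer candidates to switch to while a texture fetch is outstanding, even if its branches themselves are cheap.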

Dynamic branching is where you branch on values that may vary per pixel or per vertex, and it can be expensive when neighbouring pixels or vertices take different paths.

Using multiple techniques, each with a different shader compiled by passing literal values in for uniform parameters, isn't branching at all; it's just a way of using the FX system to generate a bunch of different hard-coded shaders without too much code duplication. If you look at what's generated in the asm shader output you won't see any branch instructions if you take that approach: from the API/hardware point of view there's no branching going on at all, only the HLSL compiler sees the branches.
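An application-side version of the same idea, as a hedged Python sketch: rather than hand-writing one technique per combination, build a key from each object's light setup and compile (or look up) one hard-coded variant per key actually encountered. `Light`, `ShaderCache` and `fake_compile` are hypothetical; `fake_compile` only builds a string of preprocessor defines standing in for a real compile step and does not call any real shader API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Light:
    kind: str         # hypothetical light type name, e.g. "directional"
    has_shadow: bool


class ShaderCache:
    """Compile each distinct light configuration once, then reuse it."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._variants = {}
        self.compile_count = 0

    def get(self, lights):
        # The key captures everything the variant depends on, in slot order.
        key = tuple((light.kind, light.has_shadow) for light in lights)
        if key not in self._variants:
            self._variants[key] = self._compile(key)
            self.compile_count += 1
        return self._variants[key]


def fake_compile(key):
    """Stand-in for a real compile: returns the defines that select a variant."""
    defines = [f"NUM_LIGHTS={len(key)}"]
    for i, (kind, has_shadow) in enumerate(key):
        defines.append(f"LIGHT{i}_{kind.upper()}")
        if has_shadow:
            defines.append(f"LIGHT{i}_SHADOW")
    return ";".join(defines)


cache = ShaderCache(fake_compile)
a = cache.get([Light("directional", True)])
b = cache.get([Light("spot", False), Light("directional", True)])
c = cache.get([Light("directional", True)])  # same setup as a: no new compile
print(a)                    # NUM_LIGHTS=1;LIGHT0_DIRECTIONAL;LIGHT0_SHADOW
print(cache.compile_count)  # 2
```

Each cached variant contains no branches on the light setup at all, which matches the "hard-coded shaders" behaviour described above; the trade-off is compile time and memory for every combination that actually shows up in a scene.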

Game Programming Blog: www.mattnewport.com/blog

Quote:Original post by coderchris
Static branching? I thought that was used when you're passing in uniform variables,
Quote:Original post by coderchris
dynamic branching will happen based on some shader parameters; however, they will be uniform for each different object

Ohh, I see now, that makes sense. Thank you for clearing that up :)

