Say I'm writing a renderer in CUDA.
I could enable light 0 and make it a point light, then disable lights 1 and 2.
I'd then have the entire program loaded and just run something like this:
    for (int light = 0; light < maxLights; light++) {
        if (gl_light[light].enabled) {
            switch (gl_light[light].type) {
                case LIGHT_SPOT:
                    // do spot light stuff
                    break;
                case LIGHT_POINT:
                    // do point light stuff
                    break;
                case LIGHT_DIRECTIONAL:
                    // do directional light stuff
                    break;
            }
        }
    }
There are branches, but the light state is the same for every thread, so all threads take the same path through them and there is no divergence.
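To show what I mean by "no divergence", here is a rough CPU-side C++ sketch (not real CUDA or shader code). It pretends a warp is 32 threads and counts how many different branch targets those threads pick. The names `distinctPaths` and `kWarpSize` are made up for illustration, and the warp width is an assumption.

```cpp
#include <set>

enum LightType { LIGHT_SPOT, LIGHT_POINT, LIGHT_DIRECTIONAL };

constexpr int kWarpSize = 32; // assumed warp width, for illustration only

// Counts how many distinct branch targets a simulated warp takes.
// key(thread) returns the value that thread would switch on.
// A result of 1 means the branch is uniform; more than 1 means divergence.
template <typename BranchKey>
int distinctPaths(BranchKey key) {
    std::set<int> paths;
    for (int thread = 0; thread < kWarpSize; ++thread) {
        paths.insert(key(thread));
    }
    return static_cast<int>(paths.size());
}

// Case from the question: every thread reads the same light type.
inline int pathsForSharedLight(LightType sharedType) {
    return distinctPaths([sharedType](int) { return static_cast<int>(sharedType); });
}

// Contrast case: the branch depends on per-thread data (hypothetical).
inline int pathsForPerThreadKey() {
    return distinctPaths([](int thread) { return thread % 3; });
}
```

With a shared light type, `pathsForSharedLight(LIGHT_POINT)` gives 1, which is the situation I'm asking about. `pathsForPerThreadKey()` gives 3, which is the divergent case that people usually warn against.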
In the world of shaders, I've seen people recommend things like uber shaders to handle the different permutations of render state and avoid branches, since branches are supposed to be slow. But is branching really slower in a shader when every invocation takes the same path through the code? The alternative is compiling many different shaders and switching which one is loaded depending on what you are rendering, which causes some slowdown of its own.
Is there a reason not to just use one much bigger shader with some if statements?
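To be concrete about the two options I'm comparing, here is a minimal C++ sketch. It is only an analogy for shader code, and every name in it (`shadeBranching`, `shadePermutation`, `selectPermutation`, the `*Term` helpers) is hypothetical. The lighting math is a placeholder.

```cpp
enum LightType { LIGHT_SPOT, LIGHT_POINT, LIGHT_DIRECTIONAL };

// Placeholder "lighting" terms; the real math is not the point here.
inline float spotTerm(float x)        { return x * 0.5f; }
inline float pointTerm(float x)       { return x * 0.25f; }
inline float directionalTerm(float x) { return x; }

// Option A: one big function with a runtime branch on the light type.
inline float shadeBranching(LightType type, float x) {
    switch (type) {
        case LIGHT_SPOT:        return spotTerm(x);
        case LIGHT_POINT:       return pointTerm(x);
        case LIGHT_DIRECTIONAL: return directionalTerm(x);
    }
    return 0.0f;
}

// Option B: one compiled variant per light type. Type is a compile-time
// constant, so an optimizing compiler can drop the cases that never run.
template <LightType Type>
float shadePermutation(float x) {
    switch (Type) {
        case LIGHT_SPOT:        return spotTerm(x);
        case LIGHT_POINT:       return pointTerm(x);
        case LIGHT_DIRECTIONAL: return directionalTerm(x);
    }
    return 0.0f;
}

// Host-side selection, analogous to binding a different shader per draw.
using ShadeFn = float (*)(float);
inline ShadeFn selectPermutation(LightType type) {
    switch (type) {
        case LIGHT_SPOT:        return &shadePermutation<LIGHT_SPOT>;
        case LIGHT_POINT:       return &shadePermutation<LIGHT_POINT>;
        case LIGHT_DIRECTIONAL: return &shadePermutation<LIGHT_DIRECTIONAL>;
    }
    return nullptr;
}
```

Both options compute the same thing: `shadeBranching(LIGHT_POINT, 4.0f)` and `selectPermutation(LIGHT_POINT)(4.0f)` each return 1.0f. The only difference is whether the choice is made inside the function on every call or once outside it, which is the trade-off I'm asking about.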