Multiple small shaders or larger shader with conditionals?

Started by
5 comments, last by MJP 8 years, 2 months ago

Hey all,

I was wondering if it is better to have multiple smaller shaders, for example a shader for tiles without lighting, one for self-illumination, another for normal mapping plus self-illumination, etc., or if it is fine to have one larger shader using IF/ELSE for the different functionality.

I read that conditionals in shaders should be minimized due to their cost, but I wonder if it is better to have the IF/ELSE on the CPU side or if a few IFs will keep the shaders fast enough.

I'm only making a 2D tiled game.

Thanks for any input!


You can use #ifdef, #else, #elif, #endif, and #define for that purpose, which will output different shaders depending on the defines you supply at compile time - I think that would be better than runtime conditionals :D

Like:


#ifdef NORMALMAPPING
.. do normalmapping stuff here
#endif

.. So you just write all combinations into one shader, and compile it with different defines to get all the combinations :)
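As a slightly fuller sketch of that pattern (the texture, sampler, and ComputeLighting names here are hypothetical, just for illustration):


```hlsl
// Hypothetical ubershader: each feature is gated behind a compile-time define.
// Compile one permutation per combination you need, e.g. by passing
// NORMALMAPPING and/or SELFILLUM as macros to your shader compiler.
Texture2D DiffuseTex : register(t0);
Texture2D NormalTex  : register(t1);
Texture2D GlowTex    : register(t2);
SamplerState LinearSampler : register(s0);

// Placeholder lighting helper so the sketch is self-contained.
float3 ComputeLighting(float3 n)
{
    return float3(1, 1, 1) * saturate(dot(n, float3(0, 0, 1)));
}

float4 TilePS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 color = DiffuseTex.Sample(LinearSampler, uv);

#ifdef NORMALMAPPING
    // Only present in permutations compiled with NORMALMAPPING defined.
    float3 n = normalize(NormalTex.Sample(LinearSampler, uv).xyz * 2.0f - 1.0f);
    color.rgb *= ComputeLighting(n);
#endif

#ifdef SELFILLUM
    // Only present in permutations compiled with SELFILLUM defined.
    color.rgb += GlowTex.Sample(LinearSampler, uv).rgb;
#endif

    return color;
}
```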

.:vinterberg:.

1) If your GPU is not your performance bottleneck, it may not matter what you do.


I wonder if it is better to have the IF/ELSE on the CPU side or if a few IFs will keep the shaders fast enough.

2) Consider that conditionals in a pixel shader may be executed millions of times per frame (covering the screen once at 1920x1080 is already about two million pixel-shader invocations), compared to a handful of times on the CPU to switch shaders.*

3) If you end up using different shaders (recommended; the way vinterberg described is a good way to approach it), then make sure you group your draw calls by shader so you're not switching shaders all the time (while still taking into account any draw order required for proper z-order rendering).

*If you're using the Effects framework on D3D9, preshaders could help alleviate some of the performance cost of conditionals in shaders, provided you're branching on shader constants.

Depends.

Branching in shaders has a cost.

Changing shaders has a cost.

Depending on what kind of workload you're doing, either cost could outweigh the other. In other words, there's no single one-size-fits-all answer to this.

A further factor, and one that may be more important if each approach already gives you adequate perf, is code complexity. It's perfectly valid to pick an approach that results in simpler, more maintainable code under this circumstance.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Any recent desktop GPU will be able to handle whatever you're doing in a tiled 2D scenario. Unless you grind out some ray marching or target a really low-end GPU, it probably doesn't matter. A tiled game is probably going to have close to zero overdraw anyway, right?

Using a single shader to do everything is likely going to be easiest to write so I'd do that. If you're branching on constants, you'll be fine.

Somebody correct me if I'm wrong, but if every thread in a shader takes the same branch, it is nearly (though not entirely) the same cost as if the choice were compiled in with defines. If you are rendering an object with a specific set of branches that every thread will take, it may not be a big deal. If threads take different branches, you will eat the cost of all branches.



Branching on constants is nice, because there's no chance of divergence. Divergence is when some threads within a warp/wavefront take one side of the branch, and some take another. It's bad because you end up paying the cost of executing both sides of the branch. When branching on a constant everybody takes the same path, so you only pay some (typically small) cost for checking the constant and executing the branch instructions.
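The difference can be sketched like this (all names here are hypothetical, purely for illustration):


```hlsl
// Hypothetical pixel shader contrasting a uniform branch with a
// potentially divergent one.
cbuffer PerDraw : register(b0)
{
    bool EnableLights;   // same value for every pixel in the draw call
};

Texture2D BaseTex : register(t0);
SamplerState LinearSampler : register(s0);

float4 PS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 color = BaseTex.Sample(LinearSampler, uv);

    // Uniform branch: every thread in the warp/wavefront agrees on
    // EnableLights, so there is no divergence - just the small cost
    // of testing the constant and executing the branch instructions.
    if (EnableLights)
        color.rgb *= 0.5f;

    // Data-dependent branch: neighbouring pixels can disagree on color.a,
    // so a warp/wavefront may end up executing both sides of the branch.
    if (color.a < 0.5f)
        color.rgb = float3(0, 0, 0);

    return color;
}
```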

As for whether it's the same cost as compiling an entirely different shader permutation, that may depend on what's in the branch as well as the particulars of the hardware.

One potential issue with branching on constants is register pressure. Many GPUs have static register allocation, which means that they compute the maximum number of registers needed by a particular shader program at compile time and then make sure that this maximum is always available when the shader is actually executed. Typically the register file has a fixed size and is shared among multiple warps/wavefronts, which means that if a shader needs lots of registers then fewer warps/wavefronts can be in flight simultaneously. GPUs like to hide latency by swapping out warps/wavefronts, so having fewer in flight limits their ability to hide the latency of memory accesses.

So let's say that you have something like this:


// EnableLights comes from a constant buffer
if(EnableLights)
{
   DoFancyLightingCodeThatUsesLotsOfRegisters();
}

By branching you can avoid the direct cost of computing the lighting, but you won't be able to avoid the indirect cost that may come from increased register pressure. However, if you were to use a preprocessor macro instead of a branch and compile two permutations, then the permutation with lighting disabled can potentially use fewer registers and achieve greater occupancy. But again, this depends quite a bit on the specifics of the hardware as well as your shader code, so you don't want to generalize about this too much. In many cases branching on a constant might have the exact same performance as creating a second shader permutation, or might even be faster by avoiding the CPU overhead of switching shaders.
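In sketch form, the permutation alternative replaces the runtime test above with a preprocessor check (reusing the same hypothetical lighting function):


```hlsl
// Same source compiled twice: once with ENABLE_LIGHTS defined, once without.
// The permutation without it never references the lighting code at all, so
// the compiler can allocate fewer registers for it.
#ifdef ENABLE_LIGHTS
    DoFancyLightingCodeThatUsesLotsOfRegisters();
#endif
```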

