Uber-shader approach

Started by
6 comments, last by Crowley99 8 years, 2 months ago

I code in OpenGL and I've been reading some posts about this uber-shader approach.

It sounded neat at first, but on second thought, having lots of shader programs will require lots of glUseProgram() calls at runtime, which is, afaik, the most expensive GL function to call at runtime (said by L. Spiro, as I recall).

Is this a known issue? Am I missing something? Are there better approaches for managing shaders in big engines?

Also, what about using uniforms to do dynamic branch selection? This may slow things down a little, but it avoids having to switch between many programs every frame.

Mitigate the cost of glUseProgram() by sorting objects with the same shaders such that they can be drawn without calling glUseProgram() (render queues).
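A minimal sketch of the render-queue idea above, written without real GL calls so it runs anywhere. `DrawItem`, `countProgramSwitches` and `sortByProgram` are hypothetical names invented for this illustration, and the sketch assumes program names are nonzero (0 meaning "nothing bound yet"). Each counted switch stands in for one glUseProgram() call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw item: the program it needs and the mesh to draw.
struct DrawItem {
    std::uint32_t program;  // program object name, as passed to glUseProgram()
    std::uint32_t mesh;     // hypothetical mesh handle
};

// Counts how often the bound program would change if the queue were drawn
// in order. Each change corresponds to one glUseProgram() call.
std::size_t countProgramSwitches(const std::vector<DrawItem>& queue) {
    std::size_t switches = 0;
    std::uint32_t current = 0;  // assumption: 0 means no program bound yet
    for (const DrawItem& item : queue) {
        if (item.program != current) {
            current = item.program;
            ++switches;
        }
    }
    return switches;
}

// Sorts the queue so that items sharing a program end up adjacent.
// stable_sort keeps the submission order within each program.
void sortByProgram(std::vector<DrawItem>& queue) {
    std::stable_sort(queue.begin(), queue.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.program < b.program;
                     });
}
```

With programs submitted in the order 2, 1, 2, 1, 3, the unsorted queue needs five switches and the sorted one needs three. A real queue would usually sort on more than the program (textures, depth, and so on); that part is left out here.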

Branches inside shaders are almost free as long as all pixels in a block take the same branch. Still, there is a small cost for branching, and it never makes sense to keep a branch if you have to swap shaders anyway (such that the branch is always taken in one shader and never in the other). For example, opaque and translucent objects should always be permutated and without a branch indicating "alpha or not".

A branch is costly if, within a block, some pixels take one path and others take the other. In that case each side waits for the other, meaning all pixels effectively run both branches.
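The two cases can be put into a toy cost model. This is only a sketch of the reasoning, not how any particular GPU schedules work: `blockBranchCost` and the per-side costs are hypothetical, and the only assumption is that a block of pixels executes in lockstep.

```cpp
#include <vector>

// Toy cost model for one block of pixels that executes in lockstep.
// takesIf[i] is true when pixel i takes the "if" side of the branch.
// costIf and costElse are hypothetical instruction counts for each side.
int blockBranchCost(const std::vector<bool>& takesIf, int costIf, int costElse) {
    bool anyIf = false;
    bool anyElse = false;
    for (bool t : takesIf) {
        if (t) anyIf = true;
        else   anyElse = true;
    }
    // The block pays for every side that at least one pixel needs.
    int cost = 0;
    if (anyIf)   cost += costIf;
    if (anyElse) cost += costElse;
    return cost;
}
```

If every pixel in the block agrees, the block pays for one side only. As soon as a single pixel disagrees, the whole block pays for both sides.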

When to permutate is decided based on the balance between these costs.


L. Spiro


Every gl* call takes a bit of CPU time to execute. As above, you should sort your objects and filter out redundant state changes to reduce the number of gl calls.
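Filtering redundant state could look roughly like this for the program binding alone. `ProgramCache` is a hypothetical helper, and the `bind` callback stands in for glUseProgram() so the sketch compiles without a GL context.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Skips program binds that would not change anything.
// The bind callback is a stand-in for glUseProgram().
class ProgramCache {
public:
    explicit ProgramCache(std::function<void(std::uint32_t)> bind)
        : bind_(std::move(bind)) {}

    // Forwards to the callback only when the program actually changes.
    void use(std::uint32_t program) {
        if (hasCurrent_ && program == current_) return;  // redundant, skip it
        bind_(program);
        current_ = program;
        hasCurrent_ = true;
    }

private:
    std::function<void(std::uint32_t)> bind_;
    std::uint32_t current_ = 0;
    bool hasCurrent_ = false;
};
```

A cache like this is only correct if every bind goes through it; any code that calls glUseProgram() directly leaves the cached value stale.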

Every instruction in a fragment shader takes a bit of GPU time for every pixel drawn. You should use optimal shaders to avoid unnecessary instructions.

Yes, those two goals can be contradictory: optimizing your GPU frame time may require hurting your CPU frame time, and vice versa. Optimizations must always be based on measurements from your specific situation.

If you're rendering a thousand objects and 10 million pixels, then it makes sense to try and optimise the per-pixel cost ;) but as always, measure first.

For example, opaque and translucent objects should always be permutated and without a branch indicating "alpha or not".

Why is that? Rendering translucent objects only differs from rendering opaque objects in the order of the draw calls, no? The shader can stay the same.

Edit: Maybe my English slows me down here but by "permutated" you mean having different shaders for translucent and opaque objects, right?

A side question: What could be a legit number of shader switches in each frame, in AAA games?

What could be a legit number of shader switches in each frame, in AAA games?

5184.

For real? I thought something like 100 would be expensive as hell.

Well then, precompiling all possible shader routines shouldn't be too wasteful, I guess.

Can you answer my first question?

Another question:

How in-depth do you go with uber-shaders? Do you also condition the vertex attributes, like so:


layout(location=0) in vec3 vertex;
#ifdef VERTEX_COLOR
layout(location=1) in vec3 color;
#endif
#ifdef VERTEX_NORMAL
layout(location=2) in vec3 normal;
#endif
//and so on
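For context, one common way to turn a single source with #ifdef blocks like these into separate permutations is to inject #define lines before compiling. The sketch below assumes that approach; `buildPermutation` is a hypothetical helper, not part of OpenGL. It places the defines after the #version line, since that directive has to come before any other code, and it assumes that #version, if present, sits at the very start of the source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds one permutation of an uber-shader source by inserting
// "#define NAME" lines right after the #version line (if present).
std::string buildPermutation(const std::string& source,
                             const std::vector<std::string>& defines) {
    std::string block;
    for (const std::string& d : defines) {
        block += "#define " + d + "\n";
    }

    std::size_t insertAt = 0;
    if (source.compare(0, 8, "#version") == 0) {
        std::size_t eol = source.find('\n');
        if (eol == std::string::npos) {
            // Source is only a #version line with no newline after it.
            return source + "\n" + block;
        }
        insertAt = eol + 1;  // first character after the #version line
    }
    return source.substr(0, insertAt) + block + source.substr(insertAt);
}
```

Calling it with {"VERTEX_COLOR", "VERTEX_NORMAL"} would enable both attribute declarations in the snippet above. Since glShaderSource() accepts an array of strings, passing the version line, the defines and the body as separate strings is another option that avoids the string splicing.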

For real? I thought something like 100 would be expensive as hell.


Remember: not all AAA games are well optimized.



BTW, if one block of the branch (or even branch chain) requires high register usage, it will reduce the number of warps in flight, which can have an overall negative impact on performance (your super simple "sub-shader" may be running with the same register allocation as your super complex one).

So as a rule of thumb, it is often better to group your complex shaders together in one uber shader and your simpler shaders in another - so slightly less uber ;)
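A rough back-of-the-envelope version of the register argument, as a sketch only. `warpsInFlight` is a hypothetical helper and every hardware number in it (register file size, warp size, warp cap) is an illustrative assumption, not a figure for any specific GPU. It also ignores allocation granularity and other limits such as shared memory.

```cpp
#include <algorithm>

// Estimates how many warps can be resident at once when registers are
// the only limit. All default values are illustrative assumptions.
int warpsInFlight(int regsPerThread,
                  int regFileSize = 65536,  // assumed registers per multiprocessor
                  int warpSize = 32,        // assumed threads per warp
                  int maxWarps = 64) {      // assumed hardware cap on resident warps
    if (regsPerThread <= 0) return maxWarps;
    return std::min(maxWarps, regFileSize / (regsPerThread * warpSize));
}
```

Under these assumptions a shader needing 32 registers per thread can keep 64 warps resident, while one needing 128 drops to 16. Because the allocation follows the most demanding path, a simple path living inside the same uber-shader as a complex one would run at the lower figure.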

