Uber-shader approach


Started by Pilpel January 21, 2016 03:24 PM

6 comments, last by Crowley99 8 years, 2 months ago

Pilpel

564

Author

January 21, 2016 03:24 PM

I code in OpenGL and I've been reading some posts about this uber-shader approach.

It first sounded neat, but on second thought, having lots of shader programs will require lots of glUseProgram() calls at runtime, which is afaik the most expensive gl function to call (said by L. Spiro, as I recall).

Is this a known issue? Am I missing something? Are there better approaches for managing shaders in big engines?

Also, what about using uniforms to do dynamic branch selection? This may slow things down a little, but it avoids having to switch between many programs every frame.

L. Spiro

25,818

January 21, 2016 09:23 PM

Mitigate the cost of glUseProgram() by sorting objects with the same shaders such that they can be drawn without calling glUseProgram() (render queues).
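A minimal sketch of the render-queue idea, assuming a hypothetical `DrawItem` record (not from any particular engine). It only counts the binds a queue would need; the place where a real renderer would call glUseProgram() is marked in a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw record: which program to bind and which mesh to draw.
struct DrawItem {
    std::uint32_t program;  // value that would be passed to glUseProgram()
    std::uint32_t mesh;     // placeholder for VAO, material, etc.
};

// Sort the queue so that items sharing a program end up adjacent.
// stable_sort keeps the submission order within each program.
inline void sortByProgram(std::vector<DrawItem>& queue) {
    auto byProgram = [](const DrawItem& a, const DrawItem& b) {
        return a.program < b.program;
    };
    std::stable_sort(queue.begin(), queue.end(), byProgram);
    assert(std::is_sorted(queue.begin(), queue.end(), byProgram));
}

// Count the program binds needed when we rebind only if the program
// differs from the one used by the previous item.
inline int countProgramBinds(const std::vector<DrawItem>& queue) {
    int binds = 0;
    bool hasBound = false;
    std::uint32_t bound = 0;
    for (const DrawItem& item : queue) {
        if (!hasBound || item.program != bound) {
            // A real renderer would call glUseProgram(item.program) here.
            bound = item.program;
            hasBound = true;
            ++binds;
        }
        // ...then issue the draw call for item.mesh.
    }
    return binds;
}
```

With programs submitted in the order 2, 1, 2, 1, 3 this needs five binds unsorted and three after sorting. Translucent objects usually have to be drawn back to front, so a sort like this mostly applies to the opaque queue.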

Branches inside shaders are almost free as long as all pixels in a block take the same branch. Still there is a small cost for branching, and it never makes sense to keep a branch if you have to swap shaders anyway (such that the branch is always taken in one and never in the other). For example, opaque and translucent objects should always be permutated and without a branch indicating "alpha or not".

A branch is costly if in a block some pixels will take one path and others the other. In that case each branch waits for the other, meaning all pixels ran both branches.
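To make the divergence cost concrete, here is a toy cost model (an assumption for illustration, not a description of any specific GPU): a block of pixels runs in lockstep, so the block pays for a path whenever at least one of its pixels takes it. `blockCost` and the cost numbers are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Toy model: takesIf[i] says whether pixel i takes the "if" path.
// The block pays costIf if any pixel takes the "if" path, and
// costElse if any pixel takes the "else" path.
inline int blockCost(const std::vector<bool>& takesIf, int costIf, int costElse) {
    assert(!takesIf.empty());
    bool anyIf = false;
    bool anyElse = false;
    for (bool t : takesIf) {
        if (t) {
            anyIf = true;
        } else {
            anyElse = true;
        }
    }
    return (anyIf ? costIf : 0) + (anyElse ? costElse : 0);
}
```

Under this model a coherent block costs `costIf` or `costElse` alone, while a single divergent pixel makes the whole block pay `costIf + costElse`.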

When to permutate is decided based on the balance between these costs.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Hodgman

52,717

January 21, 2016 10:29 PM

Every gl* call takes a bit of CPU time to execute. As above, you should sort your objects and filter out redundant state changes to reduce the number of gl calls.
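Filtering redundant state can be sketched as a small cache in front of the bind call. `ProgramCache` is a hypothetical helper; the `bindFn` callback stands in for glUseProgram() so that the sketch does not need a GL context.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Remembers the last program bound and skips the bind if it has not changed.
class ProgramCache {
public:
    explicit ProgramCache(std::function<void(std::uint32_t)> bindFn)
        : bind_(std::move(bindFn)) {
        assert(bind_ != nullptr);
    }

    // Returns true if the underlying bind was actually issued.
    bool use(std::uint32_t program) {
        if (hasCurrent_ && program == current_) {
            return false;  // redundant: this program is already bound
        }
        bind_(program);    // in a real renderer: glUseProgram(program)
        current_ = program;
        hasCurrent_ = true;
        return true;
    }

private:
    std::function<void(std::uint32_t)> bind_;
    std::uint32_t current_ = 0;
    bool hasCurrent_ = false;
};
```

This only works if every bind goes through the cache. Any code that calls glUseProgram() directly would leave `current_` stale.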

Every instruction in a fragment shader takes a bit of GPU time for every pixel drawn. You should use optimal shaders to avoid unnecessary instructions.

Yes, those two goals can be contradictory -- optimizing your GPU frametime may require hurting your CPU frametime, and vice versa. Optimizations must always be based on measurements from your specific situation.

If you're rendering a thousand objects and 10 million pixels, then it makes sense to try and optimise the per-pixel cost but as always, measure first.

. 22 Racing Series .

Pilpel

564

Author

January 22, 2016 06:48 AM

For example, opaque and translucent objects should always be permutated and without a branch indicating "alpha or not".

Why is that? Rendering translucent objects only differs from rendering opaque objects in the order of the draw calls, no? The shader can stay the same.

Edit: Maybe my English slows me down here but by "permutated" you mean having different shaders for translucent and opaque objects, right?

A side question: What could be a legit number of shader switches in each frame, in AAA games?

Hodgman

52,717

January 22, 2016 07:07 AM

What could be a legit number of shader switches in each frame, in AAA games?

5184.


Pilpel

564

Author

January 22, 2016 07:24 AM

For real? I thought something like 100 would be expensive as hell.

Well then precompiling all possible shader routines shouldn't be too wasteful I guess.

Can you answer my first question?

Another question:

How in-depth do you go with uber-shaders? Do you also make the vertex attributes conditional, like so:


layout(location=0) in vec3 vertex;
#ifdef VERTEX_COLOR
layout(location=1) in vec3 color;
#endif
#ifdef VERTEX_NORMAL
layout(location=2) in vec3 normal;
#endif
//and so on
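One common way to drive `#ifdef` blocks like these from the application side is to prepend `#define` lines to a shared source string before compiling. A rough sketch: the `Feature` enum and `buildPermutation` are hypothetical names, and `#version 330 core` is an assumption (chosen because `layout(location=...)` on vertex inputs is core from GLSL 3.30).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// One bit per optional feature of the uber-shader.
enum Feature : std::uint32_t {
    kVertexColor  = 1u << 0,  // enables the VERTEX_COLOR block
    kVertexNormal = 1u << 1,  // enables the VERTEX_NORMAL block
};

// Builds the source for one permutation: version line first, then the
// #defines, then the shared body. The body must not carry its own
// #version directive, because #version has to precede everything else.
inline std::string buildPermutation(const std::string& body,
                                    std::uint32_t features) {
    assert(body.find("#version") == std::string::npos);
    std::string src = "#version 330 core\n";
    if (features & kVertexColor) {
        src += "#define VERTEX_COLOR\n";
    }
    if (features & kVertexNormal) {
        src += "#define VERTEX_NORMAL\n";
    }
    src += body;
    return src;
}
```

The resulting string is what would be handed to glShaderSource(). Compiled programs can then be cached in a map keyed by the `features` mask, so each permutation is only built once.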

LHLaurini

675

January 22, 2016 06:37 PM

For real? I thought something like 100 would be expensive as hell.

Remember: not all AAA games are well optimized.

Crowley99

194

February 06, 2016 06:23 PM

Mitigate the cost of glUseProgram() by sorting objects with the same shaders such that they can be drawn without calling glUseProgram() (render queues).
Branches inside shaders are almost free as long as all pixels in a block take the same branch. Still there is a small cost for branching, and it never makes sense to keep a branch if you have to swap shaders anyway (such that the branch is always taken in one and never in the other). For example, opaque and translucent objects should always be permutated and without a branch indicating "alpha or not".
A branch is costly if in a block some pixels will take one path and others the other. In that case each branch waits for the other, meaning all pixels ran both branches.
When to permutate is decided based on the balance between these costs.
L. Spiro

BTW, if one block of the branch (or even branch chain) requires high register usage, it will reduce the number of warps in flight, which can have an overall negative impact on performance (your super simple "sub-shader" may be running with the same register allocation as your super complex one).

So as a rule of thumb, it is often better to group your complex shaders together in one uber shader and your simpler shaders in another - so slightly less uber ;)
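As a back-of-the-envelope illustration of the register point, here is a simplified model in which the register file is the only limit on warps in flight. Real hardware has other limits and allocation granularity, and every number used with it below is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Simplified occupancy estimate: how many warps fit in the register file,
// capped by a hardware maximum. Ignores shared memory and other limits.
inline int warpsInFlight(int registersPerThread, int registerFileSize,
                         int threadsPerWarp, int maxWarps) {
    assert(registersPerThread > 0 && threadsPerWarp > 0);
    int byRegisters = registerFileSize / (registersPerThread * threadsPerWarp);
    return std::min(byRegisters, maxWarps);
}
```

Assuming a 65536-entry register file, 32 threads per warp and a cap of 64 warps, a shader needing 32 registers per thread gets 64 warps, while one needing 128 gets 16. Since registers are allocated for the most demanding path, a cheap path inside a heavy uber-shader would run at the lower figure too.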
