Why Shader Permutations?


I read this wonderful article about shader permutation:
https://therealmjp.github.io/posts/shader-permutations-part1/
https://therealmjp.github.io/posts/shader-permutations-part2/

The article's definition of shader permutation:
“what I’m referring to is taking a pile of shader code and compiling it N times with different options. In most cases these permutations are tied directly to features that are supported by the shader, often by writing the code in an “uber-shader” style with many different features that can be turned on and off independently.”

My understanding is: features are enabled or disabled based on inputs during compilation. Since the developer never knows which combination of features will be used at runtime, it's safer to compile all permutations. This is where the problem comes from.

Please correct me if my understanding is wrong.
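To check that I've got it right, here's a rough sketch of what I imagine “compile all permutations” looks like. It assumes a D3D11-style D3DCompile call, and the feature names are made up by me:

```cpp
// Rough sketch of compiling permutations of one uber-shader source.
// Assumes D3D11's D3DCompile; the feature names are made up for illustration.
#include <d3dcompiler.h>
#include <string>
#include <vector>
#pragma comment(lib, "d3dcompiler.lib")

// Hypothetical feature bits that can be toggled at compile time.
enum FeatureBits : unsigned {
    USE_NORMAL_MAP   = 1 << 0,
    USE_SPECULAR_MAP = 1 << 1,
    USE_FOG          = 1 << 2,
    FEATURE_COUNT    = 3
};

// Compile one permutation: the same source, different #defines.
ID3DBlob* CompilePermutation(const std::string& source, unsigned features)
{
    std::vector<D3D_SHADER_MACRO> defines;
    if (features & USE_NORMAL_MAP)   defines.push_back({ "USE_NORMAL_MAP",   "1" });
    if (features & USE_SPECULAR_MAP) defines.push_back({ "USE_SPECULAR_MAP", "1" });
    if (features & USE_FOG)          defines.push_back({ "USE_FOG",          "1" });
    defines.push_back({ nullptr, nullptr });  // array terminator

    ID3DBlob* code = nullptr;
    ID3DBlob* errors = nullptr;
    D3DCompile(source.data(), source.size(), "uber.hlsl",
               defines.data(), nullptr, "PSMain", "ps_5_0",
               0, 0, &code, &errors);
    if (errors) errors->Release();
    return code;  // null on failure
}

// "Compile all permutations": 2^N combinations of N features.
std::vector<ID3DBlob*> CompileAllPermutations(const std::string& source)
{
    std::vector<ID3DBlob*> permutations;
    for (unsigned features = 0; features < (1u << FEATURE_COUNT); ++features)
        permutations.push_back(CompilePermutation(source, features));
    return permutations;
}
```

With 3 independent features that's already 8 shaders; with 10 it's 1024, which is where the explosion the article talks about comes from.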

What I don't understand is: why can't we put every piece of related code into one giant shader with many branches (an uber-shader), like what Doom Eternal did (https://advances.realtimerendering.com/s2020/RenderingDoomEternal.pdf)?
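By one giant shader with branches, I mean something roughly like this (HLSL embedded as a C++ string for illustration; the flags and constant buffer layout are made up by me):

```cpp
// The same features as above, but decided at runtime by branching on a
// constant buffer value instead of compiling 2^N permutations.
const char* kUberShaderSource = R"(
cbuffer MaterialFlags : register(b0)
{
    uint gFeatures;   // bitmask set by the CPU per material
};

#define USE_NORMAL_MAP   (1u << 0)
#define USE_SPECULAR_MAP (1u << 1)
#define USE_FOG          (1u << 2)

float4 PSMain(float4 pos : SV_Position) : SV_Target
{
    float3 color = float3(0.5, 0.5, 0.5);        // base shading (placeholder)

    if (gFeatures & USE_NORMAL_MAP)   { /* sample and apply normal map   */ }
    if (gFeatures & USE_SPECULAR_MAP) { /* sample and apply specular map */ }
    if (gFeatures & USE_FOG)          { /* blend towards fog color       */ }

    return float4(color, 1.0);
}
)";
```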

A couple of possible reasons I can think of:
1. Putting everything into one file creates a shader that's too large to maintain.
2. A node-based material system is hard to convert into an uber-shader (maybe?).
3. It's hard to optimize, considering its size?
4. Too many instructions to fit into the instruction cache.


hbr3ehreg said:
What I don't understand is: why can't we put every piece of related code into one giant shader with many branches (an uber-shader)?

Back when shader permutations were invented, (real/dynamic) branches (and loops) on a GPU were very costly. So costly that you'd usually try to avoid them altogether. So the question “why didn't we write giant shaders with 20 branches” originally could have been answered with: because that would have been way slower than with permutations.

Nowadays, the cost of branches is much smaller, so you have more choice. The PDF doesn't really go into detail on how many uber-shaders they have, though they mention they have “a few variants”. So that to me still says permutations. And it kind of makes sense: if 50% of your scene's geometry has one very specific combination of material properties, why not precompile everything and save those branches (plus get loop unrolling etc.)? Switching shaders also had/has a cost, so you don't want to do that too often either. So you have to find a middle ground.
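As a rough sketch of that middle ground (the names and structure here are mine, not something taken from the Doom Eternal slides): precompile the few hot feature combinations, and let everything else fall through to the branching uber-shader.

```cpp
// Sketch: precompiled permutations for the feature combinations that cover
// most of the scene, a branching uber-shader for the rest. Names are made up.
#include <cstdint>
#include <unordered_map>

struct ShaderProgram;  // whatever your renderer uses for a compiled shader

struct ShaderLibrary
{
    std::unordered_map<uint32_t, ShaderProgram*> precompiled; // hot feature masks
    ShaderProgram* uberShader = nullptr;                      // handles the rest

    ShaderProgram* Select(uint32_t featureMask) const
    {
        auto it = precompiled.find(featureMask);
        if (it != precompiled.end())
            return it->second;      // specialized permutation, no branches
        return uberShader;          // rare combination: pay for the branches
    }
};
```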

I don't think any of the other points you mention apply. How you organize your shaders and how you optimize doesn't make a difference between one big shader and permutations. Permutations are just precompiled branches, so you work with the shaders in the same way in both cases. I have my own node-based material system, and I can just have it generate whatever I want: if I need an uber-shader I get an uber-shader, and if it makes sense to have permutations I have permutations.
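Very roughly, the generator can just wrap each node's contribution differently depending on which output is asked for. This is a simplified sketch with placeholder types, not my actual code:

```cpp
// Simplified sketch of a node-based generator targeting either style.
// The node structure and snippet strings are placeholders.
#include <string>
#include <vector>

struct FeatureNode
{
    std::string name;     // e.g. "USE_NORMAL_MAP"
    std::string hlsl;     // the code this node contributes
};

std::string GenerateShader(const std::vector<FeatureNode>& nodes, bool asUberShader)
{
    std::string out;
    for (const auto& node : nodes)
    {
        if (asUberShader)
            // runtime branch on a flags bitmask -> one big shader
            out += "if (gFeatures & " + node.name + ") {\n" + node.hlsl + "\n}\n";
        else
            // compile-time switch -> one source, many permutations
            out += "#if defined(" + node.name + ")\n" + node.hlsl + "\n#endif\n";
    }
    return out;
}
```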

hbr3ehreg said:
What I don't understand is: why can't we put every piece of related code into one giant shader with many branches (an uber-shader)?

The primary technical reason is register and LDS memory allocation.

If we write a complex shader, the compiler eventually needs more registers. Even if we use branches and only a small part of the code actually runs, we still pay the register cost for the complete code.
That cost is reduced occupancy: because our big shader needs so many registers, the GPU can't keep many other waves in flight to hide VRAM latency. The GPU is underutilized and performance goes down.
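To make the occupancy point concrete, here is a back-of-the-envelope calculation. The numbers are assumptions on my side (a GCN-like GPU: 64 KB vector register file per SIMD, 64-wide waves, at most 10 waves per SIMD); real hardware and drivers differ, so treat it as an illustration only.

```cpp
// Back-of-the-envelope occupancy estimate under GCN-like assumptions.
#include <algorithm>
#include <cstdio>

int EstimateWavesInFlight(int vgprsPerThread)
{
    const int registerFileBytes = 64 * 1024;  // vector register file per SIMD
    const int waveWidth         = 64;         // threads per wave
    const int bytesPerVgpr      = 4;
    const int maxWavesPerSimd   = 10;

    const int vgprBytesPerWave = vgprsPerThread * waveWidth * bytesPerVgpr;
    const int wavesByRegisters = registerFileBytes / vgprBytesPerWave;
    return std::min(wavesByRegisters, maxWavesPerSimd);
}

int main()
{
    // A lean permutation vs. a register-hungry uber-shader:
    printf("24 VGPRs  -> %d waves in flight\n", EstimateWavesInFlight(24));   // 10
    printf("128 VGPRs -> %d waves in flight\n", EstimateWavesInFlight(128));  // 2
}
```

So a worst-case code path that pushes the shader from 24 to 128 registers per thread cuts the waves available to hide latency from 10 to 2, even if that path rarely executes.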

To counteract this, you can use profiling tools that show register allocation, so while writing a big uber-shader you can see how the code affects it.
Ideally, all your branched code sections reuse the same registers, and you pay no extra cost (ignoring branching and instruction costs). If so, the most demanding code section decides the register count, and all the others won't make it worse.
You may then still decide to use some permutations, if doing so helps to remove such worst-case code sections.

I guess that's what id has done; I did not check the link. It's also possible to automate such an optimization process, of course.

What applies to registers also applies to LDS memory, but I guess compute shaders don't usually cause that many permutations.

@Juliean
Thanks for your reply!
Do you think the uber-shader approach is limited to linear games like Doom? I guess they have fewer variants than open-world games.
Or is an uber-shader something that fits most games and can cut a significant amount of shader permutations?
Do you see other advantages/disadvantages of uber-shaders besides the shader permutation and performance issues you already mentioned?

Another factor is maintainability and how much is exposed to users. Can your artists make brand-new shaders, directly or indirectly via materials, or is it just the render programmers? Do you expect to have dozens, hundreds, or thousands of “shaders”? How are they composed? Hand-written?

Have you profiled the performance of what you have BEFORE trying to optimise anything? Have you assessed maintainability BEFORE trying to optimise anything?

If you have only a few shaders, nothing makes a difference and it can all be easily refactored as needed.

“Systemic” shaders, or systemic parts of composed shaders, versus user-composed shaders are also a distinction, and each might be handled differently.

I cannot answer what kinds of software benefit from which approach; it will become apparent only once there are “enough” shaders.

hbr3ehreg said:
Since the developer never knows which combination of features will be used at runtime, it's safer to compile all permutations. This is where the problem comes from.

I'd also like to add that this may or may not actually be a factor. Some engines really do compile all possible permutations all the time (I believe both Unity and Unreal did/do this, but that may have changed since I last used them many years ago). However, if your build system is smart enough, then you can know ahead of time exactly which permutations are required and ship your game with only those.
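A minimal sketch of what I mean, with a hypothetical material representation: scan the shipped materials at build time, keep only the feature masks that actually occur, and compile just those.

```cpp
// Sketch: collect the feature combinations actually used by the game's
// materials and compile only those permutations. The Material struct and
// feature mask are hypothetical.
#include <cstdint>
#include <set>
#include <vector>

struct Material { uint32_t featureMask; };  // set by the material editor/importer

std::set<uint32_t> CollectRequiredPermutations(const std::vector<Material>& materials)
{
    std::set<uint32_t> required;
    for (const auto& m : materials)
        required.insert(m.featureMask);   // duplicates collapse automatically
    return required;                      // usually far fewer than 2^N
}

// At build time:
//   for (uint32_t mask : CollectRequiredPermutations(allMaterials))
//       CompilePermutation(uberSource, mask);   // see the earlier sketch
```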

hbr3ehreg said:

@Juliean
Thanks for your reply!
Do you think the uber-shader approach is limited to linear games like Doom? I guess they have fewer variants than open-world games.

I don't think this really matters. I'd say the main difference there would be in spatial division, world streaming, etc., not in the actual rendering through vertex or pixel shaders.


