Why use uber shaders? Why not one giant shader with some ifs?

Started by
20 comments, last by Hodgman 11 years, 11 months ago

So the provided example may not issue branch instructions at all. It is a candidate for uniform branching, in which case the runtime or driver may choose to produce multiple compilations of the shader in which all of the branches have been resolved and the loops unrolled. This was really common before we had hardware branching, back in the Shader Model 2.x days. I'm not sure to what extent it's still used now, but you can hint the compiler to unroll loops and avoid branches.
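As a rough sketch of what those hints look like in HLSL (the `[branch]`, `[flatten]`, `[unroll]`, and `[loop]` attributes are real; the constant buffer layout and the `ComputeSpecular`/`ComputeLight` helpers are made up for illustration):

```hlsl
cbuffer PerDraw
{
    int  gLightCount;   // uniform: same value for the whole draw call
    bool gUseSpecular;
};

float4 Shade(float3 n, float3 v)
{
    float4 color = 0;

    // [branch] asks the compiler to emit a real branch instruction;
    // [flatten] would instead evaluate both sides and select the result.
    [branch]
    if (gUseSpecular)
        color += ComputeSpecular(n, v);   // hypothetical helper

    // [unroll(8)] hints that the loop runs at most 8 times and may be
    // unrolled; [loop] would force it to stay a loop.
    [unroll(8)]
    for (int i = 0; i < gLightCount; ++i)
        color += ComputeLight(i, n);      // hypothetical helper

    return color;
}
```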

Uber shaders give you much more precise control over compilation though.


That sounds really nice. So it basically knows which compiled version to use depending on what arguments I send it?

I would say, glUseProgram(someShader), and based on the parameters I set, it would actually select the real shader I want? In many cases, the branches are super obvious, like this material is not using normal maps, or this light is not casting specular reflections...

Right now I have a system that uses bit masks to figure out the permutations of the uber shader to load and it's cool and all, but if I can have something much simpler, that would be awesome.
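For reference, a bitmask-driven permutation scheme like the one described usually boils down to one preprocessor define per bit of the mask. This is a hypothetical sketch, not the poster's actual code; `gNormalMap` and `gSampler` are invented names:

```hlsl
// The host sets one define per mask bit before compiling, e.g.
//   bit 0 -> "#define USE_NORMAL_MAP 1"
//   bit 1 -> "#define USE_SPECULAR 1"
// so each mask value yields its own stripped-down compilation.

float3 GetNormal(float2 uv, float3 vertexNormal)
{
#if USE_NORMAL_MAP
    // Only permutations with bit 0 set pay for the fetch and unpack.
    float3 n = gNormalMap.Sample(gSampler, uv).xyz * 2.0 - 1.0;
    return normalize(n);   // tangent-space transform omitted for brevity
#else
    return normalize(vertexNormal);
#endif
}
```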



And yeah I can see the problem with using too many registers now. It's always good to know how something actually works.
Let me add one further note on the register allocation problem. If we can decide on a branch before we issue the draw call, DX has some sweet candy for us. Formerly we would have branched on some constant buffer value, and uniform branching would probably have kicked in. But DX11 brought us interfaces in HLSL. With those we can declare methods, which can be implemented by multiple classes. Before issuing a draw call we assign the particular class that should be used for an interface variable. The good news is that the driver inlines the hardware-native shader code of the methods - declared in the interface and implemented by the selected class - at bind time (!), thereby choosing the optimal register count.

This is supposed to be the solution to the dilemma: ubershaders vs. many specialized shader files. It has two upsides: We can stop worrying about the register allocation (since we’re not branching) and the code becomes cleaner (neither huge branch trees nor dozens of shader files for the permutations).
Of course on the downside it can only optimize the function bodies independently. :-/ But still, it's a very helpful tool.
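A minimal sketch of the interface mechanism described above (Shader Model 5.0; the interface syntax is real HLSL, but the light classes and method bodies are invented for illustration):

```hlsl
// SM 5.0 dynamic linkage: the interface variable is bound to one concrete
// class instance on the CPU (via ID3D11ClassLinkage) before the draw call,
// and the driver inlines that class's method bodies at bind time.
interface ILight
{
    float3 Shade(float3 n, float3 toLight);
};

class DiffuseLight : ILight
{
    float3 color;
    float3 Shade(float3 n, float3 toLight)
    {
        return color * saturate(dot(n, toLight));
    }
};

class DiffuseSpecularLight : ILight
{
    float3 color;
    float  power;
    float3 Shade(float3 n, float3 toLight)
    {
        float d = saturate(dot(n, toLight));
        return color * (d + pow(d, power));   // toy specular term
    }
};

ILight gLight;   // which class runs is chosen per draw call, not per pixel

float4 main(float3 n : NORMAL, float3 toLight : TEXCOORD0) : SV_Target
{
    return float4(gLight.Shade(normalize(n), normalize(toLight)), 1);
}
```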

Allison Klein (Gamefest 2008, slides and audio track are online on MSDN) and Nicolas Thibieroz (GDC 09) talked a little about this.
(Edit: In OpenGL the concept is called Subroutine Functions and is basically doing the same.)
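For the OpenGL side, here is a minimal GLSL 4.00 subroutine sketch of the same idea (the subroutine syntax is standard GLSL; the shading functions are invented, and the active one is selected with glUniformSubroutinesuiv before the draw):

```glsl
#version 400 core
// The subroutine uniform picks one implementation per draw call,
// without switching shader programs.

subroutine vec3 ShadeFn(vec3 n, vec3 toLight);

subroutine(ShadeFn) vec3 shadeDiffuse(vec3 n, vec3 toLight)
{
    return vec3(max(dot(n, toLight), 0.0));
}

subroutine(ShadeFn) vec3 shadeUnlit(vec3 n, vec3 toLight)
{
    return vec3(1.0);
}

subroutine uniform ShadeFn uShade;

in vec3 vNormal;
in vec3 vToLight;
out vec4 fragColor;

void main()
{
    fragColor = vec4(uShade(normalize(vNormal), normalize(vToLight)), 1.0);
}
```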



This is also available in nVidia's Cg library and works on a much wider array of hardware-- it was even working on the old GeForce 6800s way back when GPU Gems (1!) was the hot new thing.

Just FYI :)
clb: At the end of 2012, the positions of jupiter, saturn, mercury, and deimos are aligned so as to cause a denormalized flush-to-zero bug when computing earth's gravitational force, slinging it to the sun.



Nice, thanks a lot! That’s very good to know! :)
GPU Gems 1 is indeed quite antique. :) Kind of cool that those things were possible for so long.

How does Cg handle this? Is it compiling and optimizing the function bodies individually, too, and inlines them at bind time? Or does it compile all permutations completely? How can I – as a programmer – decide which permutation to pick for the execution?

Can you tell me how the whole thing is called in the Cg terminology, so I can find it easier?
I was curious and started browsing through the Cg specification to find out more. Do you mean “Overloading of functions by profile” (page 170)? Also a nice feature, but that’s not it, is it? This doesn’t seem to solve the permutation issue - or does it?

Thanks!
It's right smack in GPU gems, actually. The article doesn't go too much into implementation details, but I wager there's some sort of runtime inlining or additional precompilation going on.

EDIT: If they don't mention it in the new language manuals, I may stand corrected here. Wonder if it's been removed/deprecated somehow.

EDIT 2: Based on the API descriptions provided, it's probably the first approach, inlining/AST substitution.


Thanks again! :)

I found it in the Cg Users Manual. It is described in great detail in the section “Shared Parameters and Interfaces”.
I've been working with OpenGL for a while and recently got into GLSL, and I've never even touched DirectX yet. Does Cg run pretty well with OpenGL? I think I remember seeing an extension or something for Cg. Is it a good idea to switch over to Cg and try to make use of that feature?

Knowing OpenGL seems very useful since it pretty much runs on every device I've tried, Windows, Mac, Linux, iPhone, Android...



Yes, very much so. It can be pretty accurately described by the phrase 'HLSL for OpenGL,' in fact.
This reminds me of a blog post that I read a while ago about simulating closures in HLSL. As far as I remember, it works on SM 3.0 and up.
http://code4k.blogspot.com/2011/11/advanced-hlsl-using-closures-and.html
Wow, HLSL looks way better than GLSL. More like C++... What the hell... So Cg is similar to HLSL? I should switch to Cg...
