Shaders Infrastructure / Best Practices

10 comments, last by AndyTX 15 years, 5 months ago
I'd like to discuss the infrastructure commonly needed / used to make working with shaders as convenient as possible. Please add your advice, ideas, additional options, advantages and disadvantages or correct me if I'm wrong somewhere. Also if anyone has any good ideas to avoid the problems marked below I'd love to hear them.

1. Deal with Combinatorial Explosion of Shader Permutations.

a) Use one big Über-Shader with dynamic branching.
   - Bad for performance.
   - Not feasible for older graphics cards that don't support Shader Model 3.0.
b) Use a custom combination system that concatenates snippets of shader code together.
   - A lot of additional code to write and maintain.
c) Exploit the preprocessor mechanisms of the shader compiler (macros, preprocessor #ifdef tests).
   - Messy if you need precompiled shader files(?)
d) Exploit the "uniform" mechanisms of D3DX effects (uniform bool variables per technique).
   - Need to trust the system to do the right thing / inspect the output and tweak it.
   - Depends on D3DX.

I went with option d) for now and tried two variants that I read about:

d)-i:

float4 PS_DirectBoolParam(float4 inTexCoord : TEXCOORD0, uniform bool configEnabled) : COLOR0 {
    if (configEnabled) return pow(inTexCoord, pow(inTexCoord, 3));
    else return inTexCoord;
}
technique T_DirectBoolParam {
    pass Pass0 {
        PixelShader = compile ps_3_0 PS_DirectBoolParam(false);
    }
}

+ Works.
- Becomes quite messy when you have half a dozen or even more boolean options.

d)-ii:

struct Config { bool Enabled; };
uniform Config std = { false };
Config WithEnabled(Config cfg) { cfg.Enabled=true; return cfg; }

float4 PS_ViaStruct(float4 inTexCoord : TEXCOORD0, uniform Config config) : COLOR0 {
    if (config.Enabled) return pow(inTexCoord, pow(inTexCoord, 3));
    else return inTexCoord;
}

technique T_ViaStruct {
    pass Pass0 {
        PixelShader = compile ps_3_0 PS_ViaStruct(std);
    }
}

+ Leads to quite maintainable code even with a ton of boolean options.
- Doesn't seem to compile to the expected output. Uses dynamic Über-Shader-like branches.

Are there any good tricks to make this work well?

2. Integrate in Visual Studio: Syntax Highlighting and more

a) Simply register .fx files with the C++ syntax highlighting system
   + Easy to do
   - Doesn't know about most shader keywords
b) Use IntelliShade.Net plugin
   + Proper syntax highlighting
   + Even has IntelliSense support
   - After using it for some time, I find that I'm actually only annoyed by the IntelliSense. It has NEVER helped and gets in the way often.
c) Use the Cg SDK plugin?

3. Integrate in Visual Studio Build System

a) Write custom pre-/post-build steps
   - Messy
   - No dependency checking
b) Write a "custom build rules" file
   + Quite nice and clean
   + Dependency checking
   - Only available in C++ projects
   - Inter-project and include file dependency checking is broken/flawed

Are there any good tricks to make this work well?

4. Write Modular Shader Code

a) Use #include directives
   - Breaks 3.b) dependency checking
b) ?
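
One possible workaround for the include dependency problem, as a rough C++ sketch: scan each shader source for #include "..." lines yourself and hand the resulting file list to the build step. The helper name collectIncludes is made up for illustration, and the sketch assumes quoted, single-line directives only; it does not evaluate #ifdef blocks, follow nested includes, or resolve search paths.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the file names referenced by #include "..."
// lines in the given shader source text. Only quoted includes are recognised.
std::vector<std::string> collectIncludes(const std::string& source) {
    std::vector<std::string> result;
    std::istringstream lines(source);
    std::string line;
    auto skipSpace = [&line](std::size_t pos) {
        while (pos < line.size() &&
               std::isspace(static_cast<unsigned char>(line[pos]))) {
            ++pos;
        }
        return pos;
    };
    while (std::getline(lines, line)) {
        std::size_t pos = skipSpace(0);
        if (pos >= line.size() || line[pos] != '#') continue;  // not a directive
        pos = skipSpace(pos + 1);  // whitespace may follow the '#'
        if (line.compare(pos, 7, "include") != 0) continue;
        const std::size_t open = line.find('"', pos + 7);
        if (open == std::string::npos) continue;
        const std::size_t close = line.find('"', open + 1);
        if (close == std::string::npos) continue;
        result.push_back(line.substr(open + 1, close - open - 1));
    }
    return result;
}
```

Calling it recursively on each returned file would give the full dependency set, which a custom build step could compare against the timestamp of the compiled output.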

5. Write API independent code

a) Use Cg?
b) Use HLSL and an HLSL-to-GLSL converter?
___________________________Buggrit, millennium hand and shrimp!
6. Make it quick and easy to reload shaders on the fly.
___________________________Buggrit, millennium hand and shrimp!
I'm interested as well in how people handle these different cases. In a similar thread, I was saying that the "dynamic branching" idea seems like the best way to go, because

- in most cases it doesn't actually use dynamic branching (rather static branching), so there isn't any performance impact

- you don't have to manage a thousand techniques

- you only have to compile one file

However, I'm also hearing talk that these types of shaders take a looong time to compile (I can't verify this though; I guess I haven't tried it with a complicated enough shader).

Can anyone confirm whether or not an "ubershader" with lots of static branching actually takes longer to compile than one with, say, lots of techniques that use uniform booleans?
Quote: Original post by coderchris
- in most cases it doesn't actually use dynamic branching (rather static branching), so there isn't any performance impact

Can you give an example? Is it possible to ensure that dynamic branching is not used?

Quote: - you don't have to manage a thousand techniques

Declaring the technique using a macro takes a lot of the pain out of having to do that.

Quote: - you only have to compile one file

Why would you have to compile more than one file when using techniques?

Quote: Can anyone confirm whether or not an "ubershader" with lots of static branching actually takes longer to compile than one with, say, lots of techniques that use uniform booleans?

Maybe I don't quite understand what you mean by static branches, but why would it take longer to compile?
___________________________Buggrit, millennium hand and shrimp!
By static branching, I mean that the compiler realizes that the variable will be constant for all rendered pixels and vertices of an object, so it compiles several "versions" of the same shader for the different values of that variable (there is no need to branch dynamically if we know exactly which path will be taken). I'm not sure under exactly what conditions the compiler determines this (I think it may depend partly on the drivers), and I don't think it's possible to tell the compiler which variables this should happen with, but I would assume that the compiler is pretty smart about it.

Here is a short example of static branching:

int numLights;
bool hasShadow[MAX_LIGHTS];
int materialType;

float4 pixel_shader(vertexShaderOutput input) : COLOR0 {
    float4 finalColor = 0;
    for (int i = 0; i < numLights; i++) {
        float shadow;
        if (hasShadow[i]) { /* compute shadowing */ }
        if (materialType == 0) { /* do material 0 */ }
        else if (materialType == 1) { /* do material 1 */ }
        // ... and so on
    }
    return finalColor;
}


So basically, as you can see, the variables here are not passed through a technique, but because you can determine that they are constant for all rasterized pixels of each triangle, there is no reason for dynamic branching, so typically the compiler creates several versions of the same shader with different configurations of the states of these variables. Whenever you set one of these variables from your main application, the driver will pick which version to use.

Now, some things would be very hard to make work with static branching. For example, I'm not sure whether the for loop above would be static or not (if it's not, you can make it static by putting a bound on numLights).

The reason I ask about compile time is that there could potentially be a LOT of combinations of these variables. Even with those 3 variables above, there are already hundreds of potential combinations.
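
To put a rough number on that, here is a small C++ sketch that counts the configurations for the example above. The limits are assumptions, not something the shader fixes: the usage note plugs in MAX_LIGHTS = 4 and four material types, and only the first numLights shadow flags are assumed to matter.

```cpp
#include <cstdint>

// Counts distinct configurations of
// (numLights, hasShadow[0..numLights-1], materialType).
// Both parameters are assumed values chosen for illustration.
constexpr std::uint64_t countPermutations(unsigned maxLights,
                                          unsigned materialTypes) {
    std::uint64_t lightConfigs = 0;
    for (unsigned n = 0; n <= maxLights; ++n) {
        // With n active lights there are 2^n shadow on/off choices.
        lightConfigs += std::uint64_t{1} << n;
    }
    return lightConfigs * materialTypes;
}
```

Under those assumptions countPermutations(4, 4) gives 124 variants, and raising the light limit to 8 gives 2044, so the count roughly doubles with each additional light.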
Very interesting. How would you confirm that the compiler really does this? Just assuming because of the performance? Do you just have to hope the runtime compiler does it? Does it ever work when compiling offline using fxc.exe? Is this documented anywhere?
___________________________Buggrit, millennium hand and shrimp!
To be honest, I'm not really sure how to tell when it does it, or how to tell it to do so. I just remember reading on MSDN (try googling "msdn static branching") that shader model 3.0+ supports this, and I would assume that the fxc compiler will do this, though I haven't tested it.

I know that you can tell the compiler NOT to use static branching by writing [branch] (or whatever it is) above the if/loop.

More than that, I don't know, other than that I have personally seen (I guess I should say NOT seen) performance differences.

The only hit for "static branching" on MSDN:
Quote:Original post by MSDN
Static branching allows blocks of shader code to be switched on or off based on a Boolean shader constant. This is a convenient method for enabling or disabling code paths based on the type of object currently being rendered. Between draw calls, you can decide which features you want to support with the current shader and then set the Boolean flags required to get that behavior. Any statements that are disabled by a Boolean constant are skipped during shader execution.

The most familiar branching support is dynamic branching. With dynamic branching, the comparison condition resides in a variable, which means that the comparison is done for each vertex or each pixel at run time (as opposed to the comparison occurring at compile time, or between two draw calls). The performance hit is the cost of the branch plus the cost of the instructions on the side of the branch taken. Dynamic branching is implemented in shader model 3 or higher. Optimizing shaders that work with these models is similar to optimizing code that runs on a CPU.

I also found an old thread here on GD.net with some interesting comments:
Quote: The use of static branching reduces the driver's ability to optimize your shader. Most people stay away from static branching and generate their own shader for every permutation needed.

Quote:Nvidia and ATI have some tools that analyze your shader code in the same way the driver does and give you some results.

Quote:Some older cards don't truly support static branching - the driver actually recompiles the shader on the fly to include only the taken branch(es). In these cases using static branching can impose an extra CPU cost that can be significant.



I see no requirement for SM3 anywhere for static branching, but I seem to remember that it quickly causes instruction limits to be exceeded for SM2.

Is SM3 considered an acceptable baseline requirement nowadays? Besides compatibility with cards that don't support SM3, are there any downsides to using SM3 over SM2?
___________________________Buggrit, millennium hand and shrimp!
Hmm, interesting finds. I guess I can see how it would reduce the ability to optimize.

Now that I think about it, I suppose you could emulate static branching yourself by making your own techniques for each combination instead of letting the compiler make them for you, which would allow you to optimize each case yourself...
(I think you already proposed this)

In terms of shader model 3, there aren't really any downsides (if anything, there are plenty of upsides to using it).

In my opinion, it's perfectly reasonable to target SM 3+; if you look at a lot of the commercial games released in the past few months (even years), they require a minimum of shader model 3. Also, you probably won't be able to find a computer in any store (other than an antique store :P) which doesn't have a shader model 3 card.
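
A sketch of how those per-combination techniques could be generated instead of written by hand, in C++. It emits one technique per bit pattern and passes literal true/false values to a pixel shader that takes uniform bool parameters, like PS_DirectBoolParam in the first post. The function name generateTechniques, the T_ naming scheme, and the ps_3_0 target are all assumptions for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generator: builds effect-file text with one technique per
// combination of boolean options. Option i maps to bit i of the mask.
std::string generateTechniques(const std::string& shaderName,
                               const std::vector<std::string>& options) {
    std::string out;
    const unsigned count = 1u << options.size();  // assumes fewer than 32 options
    for (unsigned mask = 0; mask < count; ++mask) {
        std::string name = "T";  // enabled option names are appended below
        std::string args;
        for (std::size_t i = 0; i < options.size(); ++i) {
            const bool on = ((mask >> i) & 1u) != 0;
            if (on) name += "_" + options[i];
            if (i > 0) args += ", ";
            args += on ? "true" : "false";
        }
        out += "technique " + name + " {\n"
               "    pass Pass0 {\n"
               "        PixelShader = compile ps_3_0 " + shaderName + "(" + args + ");\n"
               "    }\n"
               "}\n";
    }
    return out;
}
```

For example, generateTechniques("PS_Main", {"Shadow", "Fog"}) produces four techniques named T, T_Shadow, T_Fog and T_Shadow_Fog. Since every argument is a compile-time literal, each technique should compile down to just the code path it needs, but it is still worth inspecting the compiler output to confirm that.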
Could I just point something out here: it's InteliShade.Net, not IntelliShade.Net. There's only one L. Just making sure people don't have any trouble finding it :)

cheers,
metal
