[HLSL] Can I improve this somehow?

10 comments, last by mind in a box 13 years, 9 months ago
Does it perform much faster than the switch version?

I'll try to explain how this works using an imaginary float6 type to simplify things, and replacing your 'default' with a 'case 5' as well. There are some notes below the code (marked #1 and #2) explaining what the lines do.
//These are all the return values from your switch statement - one for each case
float4 tex = Tex(TerLayers);
float6 returnValues = float6( 1.0, tex.r, tex.g, tex.b, tex.a, 0.0 );
//These are the case values from your switch - if Color.x equals that value, then that case is the one we want
float6 caseValues = float6( 0.0, 1.0, 2.0, 3.0, 4.0, 5.0 );
//evaluate if each case is true
float6 conditions;
conditions[0] = (caseValues[0] == Color.x) ? 1.0 : 0.0;//#1
conditions[1] = (caseValues[1] == Color.x) ? 1.0 : 0.0;
conditions[2] = (caseValues[2] == Color.x) ? 1.0 : 0.0;
conditions[3] = (caseValues[3] == Color.x) ? 1.0 : 0.0;
conditions[4] = (caseValues[4] == Color.x) ? 1.0 : 0.0;
conditions[5] = (caseValues[5] == Color.x) ? 1.0 : 0.0;
//now use the true/false values to select one element from returnValues
return dot( conditions, returnValues ); //#2
#1
When you perform an operation like + on two vectors (e.g. two float4s), that operation happens on each individual element, e.g.
float4 a = float4(1,2,3,4);
float4 b = float4(4,3,2,1);
float4 c = a + b;//c == float4(5,5,5,5)

In my last example, instead of using "a == b ? 1.0 : 0.0", I used the "step( a, b )" function, which is equivalent to the code "b >= a ? 1.0 : 0.0". Also, like the "+" example above, step operates on all of the elements in the vector, so it does 4 ">=" tests in one go, which is much more efficient than doing each comparison individually.
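A minimal sketch of that per-component behaviour of step, in HLSL. The function name CountPassedThresholds and the threshold values are made up for illustration; only step and dot are real intrinsics.

```hlsl
// Hypothetical helper: counts how many of four thresholds 'value' has reached.
float CountPassedThresholds(float value)
{
    float4 thresholds = float4(0.25, 0.5, 0.75, 1.0); // illustrative values only

    // step(y, x) returns (x >= y) ? 1.0 : 0.0 for each component,
    // so this line performs all four ">=" tests at once.
    float4 passed = step(thresholds, float4(value, value, value, value));

    // Sum the 0/1 flags with a dot product against all ones.
    return dot(passed, float4(1.0, 1.0, 1.0, 1.0));
}
```

With value = 0.6, for example, the flags are (1, 1, 0, 0) and the function returns 2.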

#2
A dot product (the "dot" function) looks like this:
return conditions.x * returnValues.x
     + conditions.y * returnValues.y
     + conditions.z * returnValues.z
     + ...;
It's basically a quick way of doing lots of multiplications and adding the results together.
In this case (the return statement at the end of the above code), conditions is full of 0's (false) and one 1 (true), so every value in returnValues except one is multiplied by 0, and that one is multiplied by 1. The results of these multiplications are added together, which essentially picks one particular element and then adds zero to it a bunch of times.
This is a quick way of selecting one of the floats from returnValues and throwing out the rest.
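Since float6 doesn't really exist, here is one way the same idea might look with real HLSL types, splitting the six cases across a float4 and a float2. This is only a sketch: the function name SelectByCase and its parameters are hypothetical, and it assumes the case index arrives as an exact whole number (0.0 to 5.0), because == on floats needs an exact match.

```hlsl
// Hypothetical helper: picks one of six values based on 'caseIndex' (expected 0..5).
// 'tex' stands in for the result of Tex(TerLayers) in the code above.
float SelectByCase(float caseIndex, float4 tex)
{
    // The six return values, split across a float4 and a float2.
    float4 valuesLo = float4(1.0, tex.r, tex.g, tex.b); // cases 0..3
    float2 valuesHi = float2(tex.a, 0.0);               // cases 4..5

    // Comparing a vector with == gives a bool per component;
    // casting that to float yields 1.0 for true and 0.0 for false.
    float4 conditionsLo = (float4)(float4(0.0, 1.0, 2.0, 3.0) == caseIndex);
    float2 conditionsHi = (float2)(float2(4.0, 5.0) == caseIndex);

    // At most one condition is 1.0, so the two dot products
    // together select a single value (or 0.0 if nothing matched).
    return dot(conditionsLo, valuesLo) + dot(conditionsHi, valuesHi);
}
```

Whether this beats a plain switch depends on the hardware and compiler, so it's worth profiling both.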

#3
You can see in #2 that when using floats for boolean logic like this, "*" kind of acts like "&&" does in traditional programming.
i.e. these are equivalent
float a = 1.0;          //  bool a = true;
float b = 0.0;          //  bool b = false;
float a_and_b = a * b;  //  bool a_and_b = a && b;
On the GPU it's better to use floats for most things, so it's good to get used to doing this kind of boolean logic using float-arithmetic.

In my first post I also demonstrated a NOT (traditionally "!")
float x = 1.0;        //  bool x = true;
float not_x = 1.0-x;  //  bool not_x = !x;
For completeness you can also do OR (traditionally "||") like:
float a = 1.0;                   //  bool a = true;
float b = 0.0;                   //  bool b = false;
float a_or_b = saturate(a + b);  //  bool a_or_b = a || b;
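Putting AND, NOT and OR together, a branch-free range test might look roughly like this. The helper names InRange and InEitherRange are invented for the example, and it assumes every flag involved is exactly 0.0 or 1.0 (which is what step returns).

```hlsl
// Hypothetical helper: returns 1.0 when x is in [lo, hi), otherwise 0.0.
float InRange(float x, float lo, float hi)
{
    float aboveLo = step(lo, x);        // x >= lo
    float belowHi = 1.0 - step(hi, x);  // NOT (x >= hi), i.e. x < hi
    return aboveLo * belowHi;           // AND
}

// Hypothetical helper: returns 1.0 when x lies in either range.
// Each range is packed as (min, max) in a float2.
float InEitherRange(float x, float2 rangeA, float2 rangeB)
{
    float inA = InRange(x, rangeA.x, rangeA.y);
    float inB = InRange(x, rangeB.x, rangeB.y);
    return saturate(inA + inB);         // OR
}
```

The saturate matters when both inputs are 1.0: the sum would be 2.0, and clamping it back to 1.0 keeps the result usable as a boolean in later multiplications.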

[EDIT]Oh, and if you're calling this 8 or 9 times in your shader -- don't assume that the compiler will optimise that for you. Call it once and store the result in a local variable, then re-use that variable 8 times ;)

[Edited by - Hodgman on July 17, 2010 11:35:42 AM]
Wow! Thank you for that explanation. This will help me in lots of situations I think. I never used branching before in my shaders, just because I've heard a lot about bad performance with it. But avoiding it caused many problems, although until yesterday I always found a solution. [smile]

I've done some profiling:
- Your solution: ~73 FPS, 6 calls
- Switch/case: ~69 FPS, 6 calls
- programci_84 solution (Yes, got it working): ~80 FPS, 6 calls

For now programci_84's is best.

- Your solution: ~45 FPS, 30 calls
- Switch/case: ~29 FPS, 30 calls
- programci_84 solution: ~30 FPS, 30 calls

And then I was surprised [smile]
However, even if I have to do some ugly hacking, I will put the result into a local variable before using it. The whole thing is part of a bunch of shader includes, but I think I should be able to get this working.

Also thank you for the Rating++, even if I don't understand why [wink]

