# [HLSL] Can I improve this somehow?


Here, I need to call this function many times (about 8 or 9) in my shader:

    float GetLayer(PixelShaderInput Input)
    {
        //return 1;
        switch(Color.x)
        {
            case 0:
                return 1;
            case 1:
                return Tex(TerLayers).r;
            case 2:
                return Tex(TerLayers).g;
            case 3:
                return Tex(TerLayers).b;
            case 4:
                return Tex(TerLayers).a;
            default:
                return 0;
        }
    }

And it kicks my FPS from 50 down to 30. I also compile my shaders at the highest optimization level, so am I really doing 8-9 texture lookups? "Color" is a global variable set by my application.

##### Share on other sites
make color a float4

Example:

    // Color is float4(0, 0, 0, 1)
    return Tex(TerLayers) * Color;
    // returns Tex(TerLayers).a

edit: sorry, just noticed you want to return a float only. but maybe you can somehow make it usable.

Or this quick fix:

    float4 result = Tex(TerLayers) * Color;
    return result.x + result.y + result.z + result.a; // maybe there's a function for this
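On the "function for this" question: HLSL's `dot` intrinsic multiplies two vectors component by component and sums the products, which is exactly the sum written out above. A minimal sketch, assuming `Color` has been turned into a float4 mask; the function name and parameters are made up for illustration and the texel is passed in so the snippet does not depend on the `Tex` macro:

```hlsl
// Hypothetical helper: 'layers' is the already-sampled TerLayers texel,
// 'mask' is the float4 version of Color, e.g. float4(0, 0, 0, 1).
float GetLayerMasked(float4 layers, float4 mask)
{
    // dot(a, b) = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w
    return dot(layers, mask);
}
```

With a mask of `float4(0, 0, 0, 1)` this returns the alpha channel as a single float.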

[Edited by - scope on July 16, 2010 6:54:38 PM]

##### Share on other sites
Hm... That would restrict me to 4 layers. Currently I am able to have 6, but I have only implemented 4 of them. Good idea though, I will try that, but not yet. It's 1 AM here and I'll go to bed now. [smile]

##### Share on other sites
maybe describe what you'd like to achieve. is it some sort of multipass terrain texturing thing?

##### Share on other sites

    //...
    float retColor[4] = Tex(TerLayers);
    if (Color.x) //check it's non-zero
        return retColor[Color.x - 1];
    else
        return 1;
    //...

##### Share on other sites
If you can change Color, then how about

    float4 mask = float4( 0, 1, 0, 0 );
    ...
    float4 layer = Tex(TerLayers);
    return any( mask ) ? layer.r * mask.r + layer.g * mask.g + layer.b * mask.b + layer.a * mask.a : 1;

This way you can also blend, for example float4( 0.5, 0, 0.5, 0 ) gives 50% layer.r and 50% layer.b.

##### Share on other sites
Currently I use this piece of code to blend between textures in my terrain. It allows me to use 5 layers: 1 base layer and 4 layers determined by the rgba channels of the weights texture.

    float4 weights = WeightsAt(IN.Pos3D.xz);
    float totalWeight = weights.x + weights.y + weights.z + weights.w;
    if(totalWeight > 1)
    {
        weights /= totalWeight;
        totalWeight = 1;
    }
    OUT.ColourSpec.rgb = col1 * weights.x + col2 * weights.y + col3 * weights.z + col4 * weights.w + col0 * (1 - totalWeight);
    OUT.NormalHard.rgb = norm1 * weights.x + norm2 * weights.y + norm3 * weights.z + norm4 * weights.w + norm0 * (1 - totalWeight);
    OUT.ColourSpec.a = specAmount1 * weights.x + specAmount2 * weights.y + specAmount3 * weights.z + specAmount4 * weights.w + specAmount0 * (1 - totalWeight);
    OUT.NormalHard.a = specHard1 * weights.x + specHard2 * weights.y + specHard3 * weights.z + specHard4 * weights.w + specHard0 * (1 - totalWeight);

##### Share on other sites
@scope: I wonder if that is still fast if I do all those adds
@programci_84: This is a nice thing, but it seems to fail at compilation time:
Quote:
 TerrainPixelShader.fxh(28,9): error X3017: cannot convert from 'float4' to 'float[4]'

I wonder if I can do this conversion somehow. That seems to fit my needs really well [smile]
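One way the conversion can be written is an explicit cast: HLSL rejects the implicit float4 to float[4] conversion (hence X3017) but accepts `(float[4])` on a four-component vector. The sketch below is only an illustration under assumptions: the thread does not say how the compile error was actually resolved, the function name and parameters are hypothetical, and the index is assumed to hold a whole number from 0 to 4.

```hlsl
// Hypothetical standalone version of the array idea.
// 'layers' is the already-sampled TerLayers texel; 'index' plays the
// role of Color.x and is assumed to be a whole number in 0..4.
float GetLayerIndexed(float4 layers, float index)
{
    // Explicit cast from float4 to float[4]; the implicit conversion
    // is what produces error X3017.
    float retColor[4] = (float[4])layers;

    if (index > 0.5)
    {
        // Round before truncating so 1.9999 still selects layer 2.
        return retColor[(int)(index + 0.5) - 1];
    }
    return 1.0;
}
```

Unlike the original switch, this sketch has no default case, so an index above 4 would read past the end of the array.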

@Promethium, Darg: I don't have all my layers in that one shader. So I can't do this [sad]

What I need is indeed a multi-pass terrain renderer. That all happens in screenspace if that helps...

##### Share on other sites
Generally if you're new to shaders, and you're using if or switch, then it can probably be optimized ;)
Rule 1 of shaders is don't branch unless it's really necessary, or you can prove it's an optimization.

I haven't tested this, but it should be equivalent to your original GetLayer function (assuming Color.x is a float and is never negative), but with the nasty branching replaced with 2 steps, 2 multiplies, 2 dots, some swizzling and an add.

    float GetLayer(PixelShaderInput Input)
    {
        //generate all the possible results
        float4 result1234 = Tex(TerLayers);
        float2 result05 = float2(1.0, 0.0);
        //generate a 0/1 value for each case to say whether it can be true
        float4 case1234 = step( float4(1.0, 2.0, 3.0, 4.0), Color.x );
        float2 case05   = step( float2(0.0, 4.004),         Color.x );
        //at this point more than one case may be true, because 'step' does >= instead of ==, so let's fix that.
        //1 can't be true if 2 is true
        //2 can't be true if 3 is true
        //3 can't be true if 4 is true
        //4 can't be true if 5 is true
        float4 not1234 = float4( case1234.yzw, case05.y );
        //0 can't be true if 1 is true
        //5 can always be true
        float2 not05   = float2( case1234.x, 0.0 );
        //AND the conditions with their 'nots' (component-wise multiply)
        case1234 = case1234 * (1.0-not1234);
        case05   = case05   * (1.0-not05);
        //Now, case1234 and case05 should be all 0's with a single 1 somewhere in them.
        //multiply the conditions (0/1's) with their associated return values
        return dot( case1234, result1234 ) + dot( case05, result05 );
    }

[Edited by - Hodgman on July 17, 2010 9:39:09 AM]

##### Share on other sites
I have no idea why this works, but it works, thank you! [smile]

##### Share on other sites
Does it perform much faster than the switch version?

I'll try to explain how this works using an imaginary float6 type to simplify things, and replacing your 'default' with a 'case 5' as well. There are some notes below the code explaining what these lines do.

    //These are all the return values from your switch statement - one for each case
    float4 tex = Tex(TerLayers);
    float6 returnValues = float6( 1.0, tex.r, tex.g, tex.b, tex.a, 0.0 );
    //These are the case values from your switch - if Color.x equals that value, then that case is the one we want
    float6 caseValues = float6( 0.0, 1.0, 2.0, 3.0, 4.0, 5.0 );
    //evaluate if each case is true
    float6 conditions;
    conditions[0] = (caseValues[0] == Color.x) ? 1.0 : 0.0;//#1
    conditions[1] = (caseValues[1] == Color.x) ? 1.0 : 0.0;
    conditions[2] = (caseValues[2] == Color.x) ? 1.0 : 0.0;
    conditions[3] = (caseValues[3] == Color.x) ? 1.0 : 0.0;
    conditions[4] = (caseValues[4] == Color.x) ? 1.0 : 0.0;
    conditions[5] = (caseValues[5] == Color.x) ? 1.0 : 0.0;
    //now use the true/false values to select one element from returnValues
    return dot( conditions, returnValues ); //#2

#1
When you perform an operation like + on two vectors (e.g. a float4), that operation happens on each individual element, e.g.

    float4 a = float4(1,2,3,4);
    float4 b = float4(4,3,2,1);
    float4 c = a + b;//c == float4(5,5,5,5)

In my last example, instead of using "a == b ? 1.0 : 0.0", I used the "step( a, b )" function, which is equivalent to the code "b >= a ? 1.0 : 0.0". Also, like the "+" example above, step operates on all of the elements in the vector, so it does 4 ">=" tests in one go, which is much more efficient than doing each comparison individually.

#2
A dot product (the "dot" function) looks like this:

    return conditions.x * returnValues.x
         + conditions.y * returnValues.y
         + conditions.z * returnValues.z
         + ...;

It's basically a quick way of doing lots of multiplications and adding the results together.
In this case (the return statement at the end of the above code), conditions is full of 0's (false) and one 1 (true), so every value in returnValues except one of them is multiplied by 0, and one of them is multiplied by 1. The results of these multiplications are added together, which essentially picks one particular element, and then adds zero to it a bunch of times.
This is a quick way of selecting one of the floats from returnValues and throwing out the rest.
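Since float6 does not exist, here is a rough sketch of how the same selection could look with real types. It is an illustration, not the code from the earlier post: the function name, the parameters and the 0.25 tolerance are assumptions, and the texel is passed in so the snippet does not depend on the `Tex` macro. Instead of `==` it treats a case as matched when the index is within 0.25 of the case value, which avoids relying on exact float equality.

```hlsl
// Hypothetical helper mirroring the original switch:
//   index 0 -> 1, index 1..4 -> layers.r/g/b/a, anything else -> 0.
float SelectLayer(float4 layers, float index)
{
    // Distance from the index to each of the case values 1..4.
    float4 d = abs(index - float4(1.0, 2.0, 3.0, 4.0));

    // step(y, x) is (x >= y) ? 1 : 0, so these are 1.0 where the
    // distance is at most 0.25 and 0.0 everywhere else.
    float4 is1234 = step(d, 0.25);
    float  is0    = step(abs(index), 0.25);

    // Case 0 returns 1, so it contributes is0 * 1.0.
    // The default case returns 0, so it contributes nothing.
    return is0 + dot(is1234, layers);
}
```

At most one of the five conditions is 1 for any index, so the sum picks a single value just like the float6 dot product.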

#3
You can see in #2, that when using floats for boolean logic like this, that "*" kind of acts like "&&" does in traditional programming.
i.e. these are equivalent

    float a = 1.0;          //  bool a = true;
    float b = 0.0;          //  bool b = false;
    float a_and_b = a * b;  //  bool a_and_b = a && b;

On the GPU it's better to use floats for most things, so it's good to get used to doing this kind of boolean logic using float-arithmetic.

In my first post I also demonstrated a NOT (traditionally "!")

    float x = 1.0;        //  bool x = true;
    float not_x = 1.0-x;  //  bool not_x = !x;

For completeness you can also do OR (traditionally "||") like:

    float a = 1.0;                   //  bool a = true;
    float b = 0.0;                   //  bool b = false;
    float a_or_b = saturate(a + b);  //  bool a_or_b = a || b;
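As a small worked combination of these pieces, a range test can be built from `step`, a NOT and an AND. This is only a sketch; the function name and parameters are hypothetical.

```hlsl
// Returns 1.0 when lo <= x < hi, otherwise 0.0, without branching.
float InRange(float x, float lo, float hi)
{
    float atLeastLo = step(lo, x);        // (x >= lo) ? 1 : 0
    float belowHi   = 1.0 - step(hi, x);  // NOT (x >= hi)
    return atLeastLo * belowHi;           // AND
}
```

For example, `InRange(Color.x, 2.0, 3.0)` would be 1.0 only for values from 2 up to, but not including, 3.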

[EDIT]Oh, and if you're calling this 8 or 9 times in your shader -- don't assume that the compiler will optimise that for you. Call it once and store the result in a local variable, then re-use that variable 8 times ;)
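A minimal sketch of that advice, with hypothetical names throughout; `GetLayerValue` is just a stand-in for the real GetLayer so the snippet is self-contained:

```hlsl
// Hypothetical stand-in for the thread's GetLayer: it simply reads one
// channel of an already-sampled texel.
float GetLayerValue(float4 layers)
{
    return layers.r;
}

// Hypothetical shading function that needs the layer value more than once.
float4 ShadeTerrain(float4 layers, float4 diffuse, float4 specular)
{
    // Evaluate once and keep the result in a local variable...
    float layer = GetLayerValue(layers);

    // ...then reuse the local instead of calling the function again.
    float4 color = diffuse * layer;
    color += specular * layer;
    return color;
}
```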

[Edited by - Hodgman on July 17, 2010 11:35:42 AM]

##### Share on other sites
Wow! Thank you for that explanation. This will help me in lots of situations, I think. I never used branching in my shaders before, just because I've heard a lot about bad performance with it. But not using it turned up many problems. Until yesterday I always found a solution. [smile]

I've done some profiling:
- Your solution: ~73 FPS, 6 calls
- Switch/case: ~69 FPS, 6 calls
- programci_84 solution (Yes, got it working): ~80 FPS, 6 calls

For now programci_84's is best.

- Your solution: ~45 FPS, 30 calls
- Switch/case: ~29 FPS, 30 calls
- programci_84 solution: ~30 FPS, 30 calls

And then I was surprised [smile]
However, even when I have to do some ugly hacking, I will put it into a local variable before using it. The whole thing is part of a bunch of shader includes, but I think I should be able to get this working.

Also thank you for the Rating++, even if I don't understand why [wink]