Too many instructions in pixel shader

10 comments, last by Sneftel 13 years, 2 months ago
I've been working on a pixel shader that accepts up to 16 colors: 8 "input" colors and 8 "output" colors. If a sampled color matches one of the input colors, the shader swaps it for the output color at the same index.
I'm getting an error stating that my shader has too many instructions, and that shader model 2.0 only supports 64. I'm pretty sure this is because my shader isn't optimized, as every other solution I've tried won't compile.

Shader:
uniform extern texture Sample;

sampler ScreenS = sampler_state
{
    Texture = <Sample>;
};

float4 input[8];
float4 output[8];

float4 Swap(float2 texCoord : TEXCOORD0) : COLOR
{
    float4 color = tex2D(ScreenS, texCoord.xy);

    if (color.r == input[0].r && color.g == input[0].g && color.b == input[0].b)
        color = output[0];
    else if (color.r == input[1].r && color.g == input[1].g && color.b == input[1].b)
        color = output[1];
    else if (color.r == input[2].r && color.g == input[2].g && color.b == input[2].b)
        color = output[2];
    else if (color.r == input[3].r && color.g == input[3].g && color.b == input[3].b)
        color = output[3];
    else if (color.r == input[4].r && color.g == input[4].g && color.b == input[4].b)
        color = output[4];
    else if (color.r == input[5].r && color.g == input[5].g && color.b == input[5].b)
        color = output[5];/*
    else if (color.r == input[6].r && color.g == input[6].g && color.b == input[6].b)
        color = output[6];
    else if (color.r == input[7].r && color.g == input[7].g && color.b == input[7].b)
        color = output[7];*/

    return color;
}
technique
{
    pass P0
    {
        PixelShader = compile ps_2_0 Swap();
    }
}



If I move the opening comment marker up by one else-if (so that only input[0] through input[4] are checked), it compiles fine, but then the last three colors aren't checked.

How can this code be improved so that it meets my criteria and uses 64 or fewer instructions?
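When trying rewrites of this shader, it helps to have a CPU reference for what the if/else-if chain computes, so that each variant can be checked against it. The following is a minimal C++ sketch that models `float4` with a hypothetical `Color` struct and uses exact float equality, as the shader does. `swapReference` is an illustrative name, not part of any API:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical CPU-side stand-in for an HLSL float4 (r, g, b, a).
struct Color { float r, g, b, a; };

constexpr std::size_t kPaletteSize = 8;  // 8 input and 8 output colors, as in the post

// Reference model of the if/else-if chain: the FIRST input whose RGB matches
// exactly selects the output at the same index; otherwise the sample is returned.
Color swapReference(const Color& sample,
                    const std::array<Color, kPaletteSize>& input,
                    const std::array<Color, kPaletteSize>& output) {
    for (std::size_t i = 0; i < kPaletteSize; ++i) {
        if (sample.r == input[i].r && sample.g == input[i].g && sample.b == input[i].b) {
            return output[i];  // like the shader, the whole float4 (alpha included) is replaced
        }
    }
    return sample;
}
```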
Can you do this in two or more passes?

Maybe run the shader once with only 4 swap colors, then take the result of that and pass it through again with your last 4 colors.
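One thing to watch with two passes is that pass 2 reads what pass 1 wrote. A color that pass 1 swapped to `output[i]` can therefore be swapped again if it equals one of the last four inputs. The following CPU sketch of the split uses hypothetical names and exact-equality matching, as in the original shader. It shows the condition directly: the two-pass result equals the single-pass result unless some output of pass 1 matches an input of pass 2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an HLSL float4.
struct Color { float r, g, b, a; };

bool sameRgb(const Color& x, const Color& y) {
    return x.r == y.r && x.g == y.g && x.b == y.b;
}

// One pass: first matching input among `count` pairs starting at index `first`.
Color swapPass(const Color& sample, const std::array<Color, 8>& input,
               const std::array<Color, 8>& output, std::size_t first, std::size_t count) {
    for (std::size_t i = first; i < first + count; ++i) {
        if (sameRgb(sample, input[i])) return output[i];
    }
    return sample;
}

// Two passes of four colors each; pass 2 consumes the result written by pass 1.
Color swapTwoPass(const Color& sample, const std::array<Color, 8>& input,
                  const std::array<Color, 8>& output) {
    const Color afterFirst = swapPass(sample, input, output, 0, 4);
    return swapPass(afterFirst, input, output, 4, 4);
}
```

If the palettes can overlap that way, the input and output sets have to be chosen (or ordered) so that pass 1 never produces a pass-2 input.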
My Projects:
Portfolio Map for Android - Free Visual Portfolio Tracker
Electron Flux for Android - Free Puzzle/Logic Game
Can you zero out (or at least control) the alpha channel, and then do a full float4-float4 compare instead of a component-wise compare? That should drop the number of instructions to a third, if my sleep-addled brain is reliable at this hour :-)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Still needs a "scalarization" to make it work with an if IIRC, something along the lines of

float3 one = float3(1,1,1);
if(dot(one, color.rgb == input[0].rgb) == 3) return output[0];
// ...

Should still save a couple of instructions IMO. Or have I misunderstood, ApochPiQ?
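The dot-with-ones trick turns a component-wise comparison into a count of matching channels. Below is a C++ model of that idea. It assumes each `true` in the bool3 produced by `color.rgb == input[0].rgb` becomes 1.0 when passed to `dot`. `Float3`, `equalMask` and `rgbMatches` are illustrative names:

```cpp
#include <cassert>

struct Float3 { float x, y, z; };

float dot(const Float3& a, const Float3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mimics HLSL's component-wise `a == b` (a bool3) after conversion to float:
// each matching channel becomes 1.0, each mismatching channel 0.0.
Float3 equalMask(const Float3& a, const Float3& b) {
    return Float3{a.x == b.x ? 1.0f : 0.0f, a.y == b.y ? 1.0f : 0.0f, a.z == b.z ? 1.0f : 0.0f};
}

// dot(one, mask) counts the matching channels, so the colors match exactly when it equals 3.
bool rgbMatches(const Float3& color, const Float3& input) {
    const Float3 one{1.0f, 1.0f, 1.0f};
    return dot(one, equalMask(color, input)) == 3.0f;
}
```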

Anyway: with SM 2 you hit the limit rather soon, unfortunately. What are you trying to do exactly? It looks like some (re)paletting. Could you preprocess your textures (read: during content creation) and store the indices instead of the colors? That would let you use an indirection through a second texture. SM 2 should allow this, and the instruction count would decrease considerably, even with many more colors, if the need arises.
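Here is a rough sketch of the preprocessing idea. It is written as CPU code because the conversion would happen offline. Each texel's RGB is replaced by its palette index, and at run time the index picks a color from a small palette table (the "second texture"). The names and the 8-bit index format are assumptions. The sketch also assumes every texel color already occurs in the palette, and it ignores alpha when matching:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGBA texel as stored in the source image.
struct Rgba8 { std::uint8_t r, g, b, a; };

// Offline step (content creation): replace each texel's color by its index in
// `palette`. Alpha is ignored when matching; every texel color is assumed to be
// in the palette, and the palette is assumed to have at most 256 entries.
std::vector<std::uint8_t> toIndices(const std::vector<Rgba8>& image,
                                    const std::vector<Rgba8>& palette) {
    assert(palette.size() <= 256);
    std::vector<std::uint8_t> indices;
    indices.reserve(image.size());
    for (const Rgba8& px : image) {
        std::size_t i = 0;
        while (i < palette.size() &&
               !(px.r == palette[i].r && px.g == palette[i].g && px.b == palette[i].b)) {
            ++i;
        }
        assert(i < palette.size() && "texel color missing from palette");
        indices.push_back(static_cast<std::uint8_t>(i));
    }
    return indices;
}

// Run-time step, i.e. what the shader would do with a second texture: the stored
// index selects a color from the current palette, so recoloring means swapping
// the palette instead of comparing colors per pixel.
Rgba8 lookup(std::uint8_t index, const std::vector<Rgba8>& palette) {
    return palette.at(index);
}
```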

Edit: Be careful when comparing floats for equality. E.g. I'm not sure if my example is sane in this regard.
Try comparing them like this. It compiled to 26 instructions for me. Instead of comparing to zero you can also test for the result being less than some threshold to catch colours that are close but not exactly equal:

Note that the for loop is equivalent to the eight individual tests; the compiler will unroll it for you.


uniform extern texture Sample;

sampler ScreenS = sampler_state
{
    Texture = <Sample>;
};

float4 input[8];
float4 output[8];

float4 Swap(float2 texCoord : TEXCOORD0) : COLOR
{
    float4 color = tex2D(ScreenS, texCoord.xy);

    for (int i = 0; i < 8; i++)
    {
        // Squared RGB distance; <= 0 means an exact match.
        // color is overwritten on a match, so later iterations test the new color.
        if (dot(color.rgb - input[i].rgb, color.rgb - input[i].rgb) <= 0.0f)
            color = output[i];
    }

    return color;
}
technique
{
    pass P0
    {
        PixelShader = compile ps_2_0 Swap();
    }
}
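The threshold variant mentioned above could look like this, modelled on the CPU. The squared distance is the same quantity the shader computes with `dot`. The tolerance is an assumption: half an 8-bit step (0.5/255) in Euclidean RGB distance. With that value, slightly perturbed colors still match, while any two distinct 8-bit colors never do, because they differ by at least 1/255 in some channel:

```cpp
#include <cassert>

struct Float3 { float r, g, b; };

// Squared RGB distance, the same quantity as dot(d, d) with d = color - input.
float distanceSquared(const Float3& a, const Float3& b) {
    const float dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

// Threshold version of the `<= 0.0f` test; the half-step tolerance is an assumption.
bool closeEnough(const Float3& a, const Float3& b) {
    const float halfStep = 0.5f / 255.0f;
    return distanceSquared(a, b) < halfStep * halfStep;
}
```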
OK, ifs can be really gnarly, so here are my guidelines 'from the trenches' :)

  • Prefer the ternary operator. Three or four ternary comparisons are better than one if, and they frequently don't cause real branches in the resulting shader asm.
  • Avoid if when possible
  • Avoid && and || when possible
  • else is ok
  • 'else if' is not
  • The more code inside the {} of if or else statements, the harder things get for the compiler. In a perfect world the only thing in them is a trivial assignment, like x=y on one side and x=z on the other.
  • dot, mad and lerp are your most powerful instructions (for the amount of work they do in one asm instruction). You can't write mads directly in HLSL, but the compiler recognizes expressions like (A*B+C) and emits them.
  • Swizzles are also powerful: they are free (so is saturate).
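Here is a CPU illustration of the lerp point. `lerp(a, b, t)` is `a + t * (b - a)`, so with a 0/1 mask it acts as a branch-free select, which is the pattern these guidelines favour over if. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>

// Same formula as HLSL's lerp: a + t * (b - a), i.e. a subtract followed by a multiply-add.
float lerp(float a, float b, float t) { return a + t * (b - a); }

// Branch-free select: mask == 1 gives whenTrue, mask == 0 gives whenFalse.
float selectByMask(float mask, float whenTrue, float whenFalse) {
    return lerp(whenFalse, whenTrue, mask);
}
```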


The compilers can get really messed up by all the code paths, since the real hardware doesn't actually branch much of the time. In many cases the code from each side of the branch gets unrolled, and sometimes even the code that follows the ifs gets duplicated for each path.

So I would write it like this:

Provide a copy of the input colors laid out as a pair of 4x4 matrices, transposed so that all the reds are in one row, all the greens in the next, and so on.
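Here is one possible host-side packing for that layout, sketched in C++. It assumes the eight input colors are available as RGBA floats on the CPU. `Float4` and `packTransposed` are made-up names, and uploading the result to the effect is left out:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Float4 { float x, y, z, w; };

// Colors 0-3 and 4-7 form two 4x4 matrices, each transposed:
// packed[0..3] = reds, greens, blues, alphas of colors 0-3,
// packed[4..7] = the same rows for colors 4-7.
std::array<Float4, 8> packTransposed(const std::array<Float4, 8>& colors) {
    std::array<Float4, 8> packed{};
    for (std::size_t block = 0; block < 2; ++block) {
        const Float4* c = &colors[block * 4];
        Float4* row = &packed[block * 4];
        row[0] = Float4{c[0].x, c[1].x, c[2].x, c[3].x};  // reds
        row[1] = Float4{c[0].y, c[1].y, c[2].y, c[3].y};  // greens
        row[2] = Float4{c[0].z, c[1].z, c[2].z, c[3].z};  // blues
        row[3] = Float4{c[0].w, c[1].w, c[2].w, c[3].w};  // alphas (not read by the shader)
    }
    return packed;
}
```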



float4 Swap(float2 texCoord : TEXCOORD0) : COLOR
{
    float4 color = tex2D(ScreenS, texCoord.xy);

    // input[] is assumed to hold the transposed copy: input[0..2] are the reds, greens
    // and blues of colors 0-3, input[4..6] the same for colors 4-7. output[] is not transposed.
    float4 testcolor0 = (color.rrrr == input[0]) ? float4(1,1,1,1) : float4(0,0,0,0);
    float4 testcolor1 = (color.gggg == input[1]) ? float4(1,1,1,1) : float4(0,0,0,0);
    float4 testcolor2 = (color.bbbb == input[2]) ? float4(1,1,1,1) : float4(0,0,0,0);

    // Lane k counts how many channels match color k (3 = exact match)
    float4 testABCD = testcolor0 + testcolor1 + testcolor2;

    float4 testcolor4 = (color.rrrr == input[4]) ? float4(1,1,1,1) : float4(0,0,0,0);
    float4 testcolor5 = (color.gggg == input[5]) ? float4(1,1,1,1) : float4(0,0,0,0);
    float4 testcolor6 = (color.bbbb == input[6]) ? float4(1,1,1,1) : float4(0,0,0,0);

    float4 testEFGH = testcolor4 + testcolor5 + testcolor6;

    float4 outcolor = color;

    float three = 3;
    // Reverse order from your if else block, this is to get the same results

    outcolor = (testEFGH.aaaa == three.xxxx) ? output[7] : outcolor;
    outcolor = (testEFGH.bbbb == three.xxxx) ? output[6] : outcolor;
    outcolor = (testEFGH.gggg == three.xxxx) ? output[5] : outcolor;
    outcolor = (testEFGH.rrrr == three.xxxx) ? output[4] : outcolor;
    outcolor = (testABCD.aaaa == three.xxxx) ? output[3] : outcolor;
    outcolor = (testABCD.bbbb == three.xxxx) ? output[2] : outcolor;
    outcolor = (testABCD.gggg == three.xxxx) ? output[1] : outcolor;
    outcolor = (testABCD.rrrr == three.xxxx) ? output[0] : outcolor;

    return outcolor;
}


This should come out to around 18 instructions
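To check that the ternary version really reproduces the first-match-wins behaviour of the original if/else-if chain, the same logic can be modelled on the CPU. This is a sketch, not the shader itself. `lane` and `swapBranchless` are hypothetical names, `packed` is assumed to use the transposed layout described above, and comparisons are exact, as in the shader:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Float4 { float x, y, z, w; };

// Read lane k (0..3) of a Float4, like .r/.g/.b/.a on a float4.
float lane(const Float4& v, std::size_t k) {
    return k == 0 ? v.x : k == 1 ? v.y : k == 2 ? v.z : v.w;
}

Float4 swapBranchless(const Float4& color, const std::array<Float4, 8>& packed,
                      const std::array<Float4, 8>& output) {
    Float4 outcolor = color;
    // Visit candidates 7..0 so the lowest matching index is written last and wins.
    for (std::size_t n = 8; n-- > 0;) {
        const std::size_t base = (n / 4) * 4;  // rows 0-2 for colors 0-3, rows 4-6 for colors 4-7
        const std::size_t k = n % 4;           // lane of color n inside those rows
        // Same as testcolor0 + testcolor1 + testcolor2 for this lane.
        const float matches = (color.x == lane(packed[base + 0], k) ? 1.0f : 0.0f) +
                              (color.y == lane(packed[base + 1], k) ? 1.0f : 0.0f) +
                              (color.z == lane(packed[base + 2], k) ? 1.0f : 0.0f);
        outcolor = (matches == 3.0f) ? output[n] : outcolor;  // select, no early exit
    }
    return outcolor;
}
```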
http://www.gearboxsoftware.com/
In the original post I meant to say 'arithmetic' instructions.
I resolved that by rewriting the shader; however, a few problems persist.

float4 Swap(float2 texCoord : TEXCOORD0) : COLOR
{
    float4 original = tex2D(ScreenS, texCoord.xy);
    if (original.a == 0)
        return original;
    float3 color = original.rgb;

    for (int i = 0; i < 8; i++)
        if (!all(color.rgb - input[i].rgb))
            color = output[i].rgb;

    return float4(color.rgb, original.a);
}



Here's the pseudocode:
Store pixel in 32bit original
If the pixels alpha is 0 Then return original
Store pixel in 24bit color
Loop 8 times using indexer i
If NOT color rgb values minus input rgb values are all non-zero Then Set color to the value of output
End Loop

Store color in 32bit returnValue
Return returnValue



The first return statement results in a compile error. It needs to return early so that fully transparent pixels are not processed. Also, it processes every pixel with the same color as input, setting it to (0, 0, 255), which strangely isn't a value within the input or output array.
Are shaders not allowed to return early, or am I doing it wrong?

Quote: !all(color.rgb - input[i].rgb)

That totally doesn't do what you think it does. Perhaps you're looking for !any()?
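The difference is easy to see with a CPU model of the two intrinsics that follows their documented behaviour: all is true when every component is non-zero, and any is true when at least one component is non-zero. `!all(d)` is therefore already true when a single channel difference is zero, so it fires for colors that merely share one channel value with the input. `!any(d)` is true only when every difference is zero. The function names below are hypothetical stand-ins:

```cpp
#include <cassert>

struct Float3 { float x, y, z; };

// HLSL all(v): true only if every component is non-zero.
bool hlslAll(const Float3& v) { return v.x != 0.0f && v.y != 0.0f && v.z != 0.0f; }
// HLSL any(v): true if at least one component is non-zero.
bool hlslAny(const Float3& v) { return v.x != 0.0f || v.y != 0.0f || v.z != 0.0f; }

Float3 sub(const Float3& a, const Float3& b) { return Float3{a.x - b.x, a.y - b.y, a.z - b.z}; }

// The posted test: true as soon as ANY channel difference is zero.
bool postedTest(const Float3& color, const Float3& input) { return !hlslAll(sub(color, input)); }

// Exact-match test: true only when EVERY channel difference is zero.
bool exactMatch(const Float3& color, const Float3& input) { return !hlslAny(sub(color, input)); }
```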


Quote: It needs to return early so that fully transparent pixels are not processed.
Why?
You should treat early outs in shaders like the mythical thing they are:

Several things all conspire to foil it:

  • Pixel shaders run in blocks; the slowest pixel in a block stalls them all. This ties into the any/all keyword modifiers the language provides for flow control.
  • Most branches compile down to executing both paths at all times, since true branches are typically extremely expensive on the hardware. It is probably a lot better on DX10/11 hardware, but I haven't had time to look into it, as I still work on the older stuff.
  • clip doesn't keep the rest of the shader from running; it just keeps the shader's output from 'counting'. It will (sometimes) avoid texture fetches, which reduces pressure on the texture fetch units and texture cache, but that's about it.
  • Aside from trivial examples, shaders are slower than the blend unit's throughput, so rendering a chain-link fence with a masked texture is going to be surprisingly expensive (often, removing the clip instruction is the last thing you can do to speed it up, as it takes an instruction slot on the hardware).
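Here is a CPU sketch of the second point, assuming the compiler flattens the branch. The swap is evaluated for every pixel and a per-pixel select picks the result, so an alpha-based early out does not save the arithmetic. `expensiveSwap` is a placeholder for the real palette work, and the counter exists only to make the cost visible:

```cpp
#include <cassert>

struct Float4 { float r, g, b, a; };

// Placeholder for the palette-swap work (here it just exchanges red and blue).
Float4 expensiveSwap(const Float4& c) { return Float4{c.b, c.g, c.r, c.a}; }

// Flattened form of "if (alpha == 0) return texel;": both paths are evaluated
// for every pixel, and the condition only chooses which result is kept.
Float4 shadeFlattened(const Float4& texel, int& swapCalls) {
    ++swapCalls;
    const Float4 swapped = expensiveSwap(texel);
    return texel.a == 0.0f ? texel : swapped;
}
```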


Quote: !all(color.rgb - input[i].rgb)
Quote: That totally doesn't do what you think it does. Perhaps you're looking for !any()?
The any function, according to the HLSL reference docs: "This function is similar to the all HLSL intrinsic function. The any function determines if any components of the specified value are non-zero, while the all function determines if all components of the specified value are non-zero."

The point of subtracting the colors is to see if there is a difference. If there's a difference, then it can be automatically assumed that they aren't the same color. !all means if they are all zero. The any function doesn't return true if all of them are non-zero. Using it would result in pixels with 0 in the red, green, or blue values changing, making it look as if random pixels had been modified.



Quote: It needs to return early so that fully transparent pixels are not processed.
Quote: Why?
Fully transparent pixels won't be visible, so processing them would waste a lot of GPU time. Over half the pixels in the images I'm rendering are fully transparent.
