batchprogram

Too many instructions in pixel shader

11 posts in this topic

I've been working on a pixel shader that accepts up to 16 colors, 8 for "input" and 8 for "output". If a color in a sample matches the color in the input data, it swaps it out with the output color of the same index.
I'm getting an error stating that my shader has too many instructions, and that shader model 2.0 only supports 64. I'm pretty sure this is because my shader isn't optimized, as every other solution I've tried won't compile.

Shader:
[code]uniform extern texture Sample;

sampler ScreenS = sampler_state
{
    Texture = <Sample>;
};

float4 input[8];
float4 output[8];

float4 Swap(float2 texCoord: TEXCOORD0) : COLOR
{
    float4 color = tex2D(ScreenS, texCoord.xy);

    if (color.r == input[0].r && color.g == input[0].g && color.b == input[0].b)
        color = output[0];
    else if (color.r == input[1].r && color.g == input[1].g && color.b == input[1].b)
        color = output[1];
    else if (color.r == input[2].r && color.g == input[2].g && color.b == input[2].b)
        color = output[2];
    else if (color.r == input[3].r && color.g == input[3].g && color.b == input[3].b)
        color = output[3];
    else if (color.r == input[4].r && color.g == input[4].g && color.b == input[4].b)
        color = output[4];
    else if (color.r == input[5].r && color.g == input[5].g && color.b == input[5].b)
        color = output[5];/*
    else if (color.r == input[6].r && color.g == input[6].g && color.b == input[6].b)
        color = output[6];
    else if (color.r == input[7].r && color.g == input[7].g && color.b == input[7].b)
        color = output[7];*/

    return color;
}
technique
{
    pass P0
    {
        PixelShader = compile ps_2_0 Swap();
    }
}[/code]


If the comment is placed at the end of the else-if statement right above its current position, it compiles fine, but it doesn't check the last 3 colors.

How can this code be improved so that my criteria are met and it uses <= 64 instructions?

Can you do this in two or more passes?

Maybe run the shader once with only 4 swap colors, then take the result of that and pass it through again with your last 4 colors.
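
To make the two-pass idea concrete, here is a rough CPU-side model in Python (not HLSL). The helper names `swap_pass` and `two_pass_swap` are made up for illustration, and colors are plain tuples. It is only a sketch of the logic, not of the render-target plumbing you would need on the GPU.

```python
# CPU-side model of the two-pass suggestion; both helpers are hypothetical.
def swap_pass(pixel, inputs, outputs):
    """Return the output paired with the first matching input, else the pixel itself."""
    for src, dst in zip(inputs, outputs):
        if pixel == src:
            return dst
    return pixel


def two_pass_swap(pixel, inputs, outputs):
    """Run the swap over colors 0-3, then feed the result through colors 4-7."""
    first = swap_pass(pixel, inputs[:4], outputs[:4])
    return swap_pass(first, inputs[4:], outputs[4:])


# Example data (assumed): eight input colors and eight output colors.
inputs = [(k, 0, 0) for k in range(8)]
outputs = [(0, k, 9) for k in range(8)]
result = two_pass_swap((5, 0, 0), inputs, outputs)  # matched in the second pass
```

One thing to watch: the second pass sees the result of the first, so if one of the first four output colors equals one of the last four input colors, that pixel gets swapped twice. A single else-if chain would not do that.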

Can you zero out (or at least control) the alpha channel, and then do a full float4-float4 compare instead of a component-wise compare? That should drop the number of instructions to a third, if my sleep-addled brain is reliable at this hour :-)

That still needs a "scalarization" to make it work with an if, IIRC, something along the lines of
[code]
float3 one = float3(1,1,1);
if(dot(one, color.rgb == input[0].rgb) == 3) return output[0];
// ...
[/code]
Should still save a couple of instructions IMO. Or have I misunderstood, ApochPiQ?

Anyway: with SM 2 you hit the limit rather soon, unfortunately. What exactly are you trying to do? It looks like some kind of (re)paletting. Could you preprocess your textures (read: at content creation) and store the indices instead of the colors? That would let you use an indirection through a second texture. SM 2 should allow this, and the instruction count would decrease considerably, even with many more colors, should the need arise.

Edit: Be careful when comparing floats for equality. E.g. I'm not sure if my example is sane in this regard.
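
The index-texture indirection can be sketched on the CPU like this. It is a Python model, not shader code: `repalette`, `index_image` and both palettes are hypothetical names and data. On the GPU, the inner lookup would be a second texture read using the stored index as the coordinate.

```python
# CPU-side model of palette indirection; all names and data here are assumptions.
def repalette(index_image, palette):
    """Replace every stored palette index with the color found in the palette."""
    return [[palette[index] for index in row] for row in index_image]


# A 2x2 "index texture": each texel stores a palette slot instead of a color.
index_image = [[0, 1],
               [1, 2]]

original_palette = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
swapped_palette = [(255, 255, 0), (0, 255, 0), (128, 0, 128)]

# Swapping colors means swapping the palette; the index image never changes.
original = repalette(index_image, original_palette)
swapped = repalette(index_image, swapped_palette)
```

The per-pixel cost is one lookup however many colors the palette holds, and no float equality test is involved.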

Try comparing them like this; it compiled to 26 instructions for me. Instead of comparing to zero, you can also test for the result being less than some threshold to catch colours that are close but not exactly equal.

Note that the for loop is equivalent to the eight individual tests; the compiler will unroll it for you:

[code]
uniform extern texture Sample;

sampler ScreenS = sampler_state
{
    Texture = <Sample>;
};

float4 input[8];
float4 output[8];

float4 Swap(float2 texCoord: TEXCOORD0) : COLOR
{
    float4 color = tex2D(ScreenS, texCoord.xy);

    for (int i=0; i < 8; i++)
    {
        if (dot(color.rgb - input[i].rgb, color.rgb - input[i].rgb) <= 0.0f)
            color = output[i];
    }

    return color;
}
technique
{
    pass P0
    {
        PixelShader = compile ps_2_0 Swap();
    }
}
[/code]
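
For anyone who wants to play with the threshold idea off the GPU, here is a small Python model of the same loop. `dist_sq`, `swap_loop` and the `threshold` parameter are names invented for this sketch; the squared distance is what `dot(d, d)` computes in the shader.

```python
# CPU-side model of the squared-distance test; helper names are hypothetical.
def dist_sq(a, b):
    """Squared distance between two colors, i.e. dot(a - b, a - b)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def swap_loop(color, inputs, outputs, threshold=0.0):
    """Mirror the shader loop: every iteration tests the current color."""
    for src, dst in zip(inputs, outputs):
        if dist_sq(color, src) <= threshold:
            color = dst
    return color


inputs = [(1.0, 0.0, 0.0)]
outputs = [(0.0, 0.0, 1.0)]
near_red = (0.999, 0.0, 0.0)

exact_only = swap_loop(near_red, inputs, outputs)           # stays near_red
with_slack = swap_loop(near_red, inputs, outputs, 1e-4)     # becomes outputs[0]
```

Because the loop keeps testing the updated color, an output that happens to equal a later input would be swapped again. That differs from the original else-if chain, which stops at the first match.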

OK, ifs can be really gnarly, so here are my guidelines 'from the trenches' :)

[list]
[*]Prefer to use the ternary operator. 3 or 4 comparisons are better than one, and frequently don't cause real branches in the resulting shader asm.
[*]Avoid if when possible.
[*]Avoid && and || when possible.
[*]else is OK.
[*]'else if' is not.
[*]The more code inside the {} of if or else statements, the harder things are on the compiler. In a perfect world the only thing in them is a trivial assignment like x=y on one side and x=z on the other.
[*]dot, mad and lerp are your most powerful instructions (for the amount of work they do in 1 asm statement). You can't write a mad directly in HLSL, but the compiler generates one when it sees an expression like (A*B+C).
[*]Swizzles are also powerful, and they are free (so is saturate).
[/list]

The compilers can get really messed up with all the code paths, since the real hardware doesn't really branch a lot of the time. In a lot of cases the resulting code from each side of the branch gets unrolled, and sometimes even the code after all the ifs are resolved gets duplicated for each path.

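So I would write it like this:

Provide a copy of the input colors as if they were a pair of 4x4 matrices, and transpose them so all the reds are in one row, etc. That is, input[0], input[1] and input[2] hold the red, green and blue channels of colors 0-3, and input[4], input[5] and input[6] hold those of colors 4-7.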


[code]
float4 Swap(float2 texCoord: TEXCOORD0) : COLOR
{
    float4 color = tex2D(ScreenS, texCoord.xy);

    // input[0..2]: red, green and blue rows of colors 0-3 (transposed layout)
    float4 testcolor0 = (color.rrrr == input[0]) ? float4(1,1,1,1) : float4(0,0,0,0);
    float4 testcolor1 = (color.gggg == input[1]) ? float4(1,1,1,1) : float4(0,0,0,0);
    float4 testcolor2 = (color.bbbb == input[2]) ? float4(1,1,1,1) : float4(0,0,0,0);

    float4 testABCD = testcolor0 + testcolor1 + testcolor2;

    // input[4..6]: red, green and blue rows of colors 4-7
    float4 testcolor4 = (color.rrrr == input[4]) ? float4(1,1,1,1) : float4(0,0,0,0);
    float4 testcolor5 = (color.gggg == input[5]) ? float4(1,1,1,1) : float4(0,0,0,0);
    float4 testcolor6 = (color.bbbb == input[6]) ? float4(1,1,1,1) : float4(0,0,0,0);

    float4 testEFGH = testcolor4 + testcolor5 + testcolor6;

    float4 outcolor = color;

    float three = 3;
    // Reverse order from your if else block, this is to get the same results

    outcolor = (testEFGH.aaaa == three.xxxx) ? output[7] : outcolor;
    outcolor = (testEFGH.bbbb == three.xxxx) ? output[6] : outcolor;
    outcolor = (testEFGH.gggg == three.xxxx) ? output[5] : outcolor;
    outcolor = (testEFGH.rrrr == three.xxxx) ? output[4] : outcolor;
    outcolor = (testABCD.aaaa == three.xxxx) ? output[3] : outcolor;
    outcolor = (testABCD.bbbb == three.xxxx) ? output[2] : outcolor;
    outcolor = (testABCD.gggg == three.xxxx) ? output[1] : outcolor;
    outcolor = (testABCD.rrrr == three.xxxx) ? output[0] : outcolor;

    return outcolor;
}
[/code]

This should come out to around 18 instructions
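
If the transposed layout is hard to picture, this Python model of the same logic may help. It runs on the CPU, and `transpose_block`, `match_counts` and `swap_transposed` are hypothetical helper names. It shows how the application side could build the rows, and why a per-slot sum of 3 means "all three channels matched".

```python
# CPU-side model of the transposed compare; helper names are assumptions.
def transpose_block(colors4):
    """Turn four (r, g, b) colors into three rows: all reds, all greens, all blues."""
    reds, greens, blues = zip(*colors4)
    return reds, greens, blues


def match_counts(color, rows):
    """Per-slot number of matching channels; 3 means the whole color matched."""
    r, g, b = color
    reds, greens, blues = rows
    return [int(r == reds[k]) + int(g == greens[k]) + int(b == blues[k])
            for k in range(4)]


def swap_transposed(color, inputs, outputs):
    """Swap using two transposed blocks of four colors each."""
    counts = (match_counts(color, transpose_block(inputs[:4])) +
              match_counts(color, transpose_block(inputs[4:])))
    result = color
    # Walk the slots in reverse so that the lowest matching index wins,
    # which is the result an if / else-if chain would give.
    for k in reversed(range(8)):
        if counts[k] == 3:
            result = outputs[k]
    return result


inputs = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
          (1, 0, 1), (0, 1, 1), (1, 1, 1), (2, 2, 2)]
outputs = [(10 + k, 0, 0) for k in range(8)]
swapped = swap_transposed((1, 1, 1), inputs, outputs)  # slot 6 matches on all channels
```

A color like (1, 1, 1) partially matches several slots (a count of 1 or 2), but only slot 6 reaches 3, so only that swap is applied.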

In the original post I meant to say 'arithmetic' instructions.
I resolved that by rewriting the shader; however, a few problems still persist.

[code]float4 Swap(float2 texCoord: TEXCOORD0) : COLOR
{
    float4 original = tex2D(ScreenS, texCoord.xy);
    if (original.a == 0)
        return original;
    float3 color = original.rgb;

    for (int i = 0; i < 8; i++)
        if (!all(color.rgb - input[i].rgb))
            color = output[i];

    return float4(color.rgb, original.a);
}[/code]


Here's the pseudocode:
[code]Store pixel in 32bit original
If the pixels alpha is 0 Then return original
Store pixel in 24bit color
Loop 8 times using indexer i
If NOT color rgb values minus input[i] rgb values are all non-zero Then Set color to the value of output[i]
End Loop

Store color in 32bit returnValue
Return returnValue
[/code]


The first return statement results in a compile error. It needs to return early so that fully transparent pixels are not processed. Also, it processes every pixel with the same color as input, setting it to (0, 0, 255), which strangely isn't a value within the input or output array.
Are shaders not allowed to return early, or am I doing it wrong?

[quote name='batchprogram' timestamp='1297197927' post='4771509']
!all(color.rgb - input[i].rgb)[/quote]
That totally doesn't do what you think it does. Perhaps you're looking for !any()?

[quote]
It needs to return early so that fully transparent pixels are not processed.
[/quote]
Why?

You should treat early outs in shaders like the mythical things they are. Several things conspire to foil them:

[list]
[*]Pixel shaders run in blocks, and the slowest pixel in the block stalls them all. This ties into the language features of the any/all keyword modifiers to flow control.
[*]Most branches compile down to executing both paths at all times, as true branches are typically extremely expensive on the hardware. It is probably a lot better on DX10-11 hardware, but I haven't had time to look into it as I still work on the older stuff.
[*]clip doesn't keep the rest of the shader from running; it just keeps the shader's output from 'counting'. It will (sometimes) avoid texture fetches, which reduces pressure on the texture fetch units and texture cache, but that's about it.
[*]Aside from trivial examples, shaders are slower than the blend unit's throughput, so rendering a chain link fence with a masked texture is going to be surprisingly expensive (many times, removing the clip instruction is the last thing you can do to speed it up, as it takes an instruction slot on the hardware).
[/list]

[quote]
[quote]
!all(color.rgb - input[i].rgb)
[/quote]
That totally doesn't do what you think it does. Perhaps you're looking for !any()?
[/quote]
The any function according to the hlsl reference docs: "This function is similar to the [url="http://msdn.microsoft.com/en-us/library/bb509564(v=vs.85).aspx"][b]all[/b][/url] HLSL intrinsic function. The [b]any[/b] function determines if any components of the specified value are non-zero, while the [b]all[/b] function determines if all components of the specified value are non-zero."

The point in subtracting the colors is to see if there is a difference. If there's a difference then it can be automatically assumed that they aren't the same color. !all means if they are all zero. The any function doesn't return true if all of them are non-zero. Using it would result in pixels with 0 in the red, green, or blue values changing, seeming as if random pixels were modified.

[quote]
[quote]
It needs to return early so that fully transparent pixels are not processed.
[/quote]
Why?
[/quote]
Fully transparent pixels won't be visible, so to process them would be a big waste of processing time. Over half the pixels in the images I'm rendering are fully transparent.

[quote name='batchprogram' timestamp='1297208035' post='4771605']
Fully transparent pixels won't be visible, so to process them would be a big waste of processing time. Over half the pixels in the images I'm rendering are fully transparent.
[/quote]

Early outs in a shader almost never help with this problem. You should assume fully transparent pixels (via clip or alpha test) cost the same as all the others, because most of the time, they do.

[quote name='batchprogram' timestamp='1297208035' post='4771605']
[quote]
[quote]
!all(color.rgb - input[i].rgb)
[/quote]
That totally doesn't do what you think it does. Perhaps you're looking for !any()?
[/quote]
The any function according to the hlsl reference docs: "This function is similar to the [url="http://msdn.microsoft.com/en-us/library/bb509564%28v=vs.85%29.aspx"][b]all[/b][/url] HLSL intrinsic function. The [b]any[/b] function determines if any components of the specified value are non-zero, while the [b]all[/b] function determines if all components of the specified value are non-zero."[/quote]
Let's walk through this. Let's suppose color.rgb is (4,5,6) and input[i].rgb is (1,2,6). That is, one of the channels is the same, but not all of them.

So color.rgb - input[i].rgb is (3,3,0).

So all(color.rgb - input[i].rgb) is false.

So !all(color.rgb - input[i].rgb) is true.

See the issue yet?

[quote]
!all means if they are all zero.
[/quote]
It does not. !all(a,b,c) is not the same as all(!a,!b,!c). However, !any(a,b,c) IS the same as all(!a,!b,!c).

For more info, read [url="http://en.wikipedia.org/wiki/De_Morgan%27s_laws"]this[/url].
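
The walkthrough above is easy to check on the CPU. This Python sketch models the two intrinsics as described in the quoted docs ("all components non-zero" and "any component non-zero"); `hlsl_all` and `hlsl_any` are stand-in names, not real HLSL bindings.

```python
# CPU-side stand-ins for the HLSL intrinsics; the function names are made up.
def hlsl_all(v):
    """True when every component is non-zero."""
    return all(c != 0 for c in v)


def hlsl_any(v):
    """True when at least one component is non-zero."""
    return any(c != 0 for c in v)


color = (4, 5, 6)
target = (1, 2, 6)  # only the blue channel matches
diff = tuple(c - t for c, t in zip(color, target))  # (3, 3, 0)

# !all(diff) fires as soon as ONE channel matches: a false positive here.
wrong_test = not hlsl_all(diff)   # True, although the colors differ

# !any(diff) fires only when EVERY channel matches.
right_test = not hlsl_any(diff)   # False, as it should be
```

With an exact match the difference is (0, 0, 0), and both tests come out true, which is why the bug only shows up on partial matches.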