Boolean operations in shader assembly

14 comments, last by Oogst 10 years, 8 months ago

So in some places I would like to avoid switching shaders, and I see that a static bool (constant during shader execution, but possibly different per draw call) costs 1 instruction (according to the docs). Is that cheaper than switching shaders, and is it cheaper than dynamic branching?

Everything to do with performance characteristics is implementation defined, so you'll have to profile to get answers ;) but...
There are two main implementation options the driver could use:
1) It internally performs a shader switch for you. Basically, it takes your supplied shader code, finds all the permutations based on the static branches, and internally creates one compiled shader program for each permutation. Before each draw call, it checks the values of the 16 booleans to pick the appropriate shader code.
In this case, it's the same as if you implemented your own shader permutation system. Switching shaders is basically free, as long as the previous draw-call covered a few hundred pixels.
2) It leaves the branch in there, performing it per-pixel.
In this case, you're probably going to burn a bunch of cycles per pixel in exchange for the convenience of not having to switch shaders. It will likely be faster than a dynamic branch (e.g. branching on the results of some float computations) by a good amount -- e.g. if a dynamic branch instruction takes a dozen cycles to complete, a static branch instruction might take half a dozen cycles...
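Option 1 can be pictured with a small CPU-side sketch in Python. This is not how any particular driver works: `pack_bools`, `compile_variant` and `VariantCache` are hypothetical names, and the "shader" is a toy function whose two features (halve, invert) are made up for illustration. The point is only that the booleans act as a lookup key that is checked once per draw call, so the per-pixel code contains no test at all.

```python
from typing import Callable, Dict, Tuple

NUM_BOOLS = 16  # the post mentions 16 booleans being checked per draw call


def pack_bools(bools: Tuple[bool, ...]) -> int:
    """Pack the boolean constants into one integer key (bit i = bools[i])."""
    if len(bools) != NUM_BOOLS:
        raise ValueError("expected exactly 16 boolean constants")
    key = 0
    for i, flag in enumerate(bools):
        if flag:
            key |= 1 << i
    return key


def compile_variant(key: int) -> Callable[[float], float]:
    """Hypothetical 'compile' step: resolve the static branches up front."""
    steps = []
    if key & 0b01:                      # toy feature on bit 0: halve the value
        steps.append(lambda c: c * 0.5)
    if key & 0b10:                      # toy feature on bit 1: invert the value
        steps.append(lambda c: 1.0 - c)

    def shader(c: float) -> float:
        # Per-pixel code: only the steps selected at compile time remain.
        for step in steps:
            c = step(c)
        return c

    return shader


class VariantCache:
    """Picks (and lazily builds) one variant per boolean combination."""

    def __init__(self) -> None:
        self.variants: Dict[int, Callable[[float], float]] = {}
        self.compiles = 0

    def get(self, bools: Tuple[bool, ...]) -> Callable[[float], float]:
        key = pack_bools(bools)          # done once per draw call
        if key not in self.variants:
            self.variants[key] = compile_variant(key)
            self.compiles += 1
        return self.variants[key]
```

Requesting the same combination of booleans twice returns the cached variant, which is the same bookkeeping a hand-written shader permutation system would do.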

I just checked, but saturate as a modifier is only available from shader model 4, while I am using shader model 3. According to documentation: http://msdn.microsoft.com/en-us/library/windows/desktop/hh447231(v=vs.85).aspx

The saturate function in HLSL has been around since shader model 1.

That's really weird, because if I look at the asm output that I'm getting from my compiled SM3 code, it does include instructions like mul_sat (which aren't listed on the MSDN instruction reference for SM3...).

The MSDN also shows that the _sat modifier did exist in SM1...

[edit] The ps_2/ps_3 modifiers are documented here (and for vs_3 here). Mystery solved. That page that says that the modifier is only available in SM4+ is just wrong :/

Ok, thanks. I just wanted to know on what to base my choices.

Now I have one more question, if you don't mind. Do some intrinsic functions, like min, max and saturate, result in branching? I don't see how they could be implemented without checking the input variable.

Things like min/max/saturate do not branch. They are extremely simple things that the hardware can just do directly, no need to jump to a different spot in code for that.

My dev blog
Ronimo Games (my game dev company)
Awesomenauts (2D MOBA for Steam/PS4/PS3/360)
Swords & Soldiers (2D RTS for Wii/PS3/Steam/mobile)

Swords & Soldiers 2 (WiiU)
Proun (abstract racing game for PC/iOS/3DS)
Cello Fortress (live performance game controlled by cello)

So in some places I would like to avoid switching shaders, and I see that a static bool (constant during shader execution, but possibly different per draw call) costs 1 instruction (according to the docs). Is that cheaper than switching shaders, and is it cheaper than dynamic branching?

Just a word of warning about this line of thinking.

On the surface it looks like "it's just one extra instruction, I'll eat it, it's no bother".

It's not that simple. Assuming that this shader is going to cover every pixel in your window, assuming that you have perfect overdraw elimination (hint: you don't), and assuming a 1600x900 resolution, it's actually just under 1.5 million extra instructions. That's what you should be comparing the cost of a shader switch against.
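The figure is plain multiplication. A quick check in Python, using the numbers from the post (one extra instruction per pixel, every pixel shaded exactly once); the `overdraw` parameter is an added assumption to show how the count scales when pixels are shaded more than once:

```python
def extra_instructions_per_frame(width: int, height: int,
                                 extra_per_pixel: int = 1,
                                 overdraw: float = 1.0) -> float:
    """Extra instructions executed per frame for a full-screen shader."""
    return width * height * extra_per_pixel * overdraw


# Numbers from the post: 1600x900, one extra instruction, no overdraw.
print(extra_instructions_per_frame(1600, 900))                # 1440000.0
# Assumed example: every pixel shaded twice on average.
print(extra_instructions_per_frame(1600, 900, overdraw=2.0))  # 2880000.0
```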

Do some intrinsic functions, like min, max and saturate, result in branching? I don't see how they could be implemented without checking the input variable.

No, the instruction set will contain instructions for performing those operations without branching... or in other words, any branching that is required internally by those algorithms is embedded into the silicon and doesn't count.
e.g. when you write a + b, the hardware might have to make a bunch of decisions based on the sign of a and b (i.e. to actually perform subtraction), but all of that logic is embedded right into the addition hardware, so it all gets done in a single clock cycle.
The logic for min/max/saturate/etc is also built right into the hardware, so no branching of the code is required.
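For reference, the arithmetic meaning of these operations can be written out in Python. This models only the results, not how the hardware computes them: the comparisons hidden inside Python's `min` and `max` stand in for the logic that lives in the silicon. `saturate` mirrors the HLSL intrinsic of the same name, and `mul_sat` is an illustrative helper modelling a multiply with the `_sat` modifier applied.

```python
def saturate(x: float) -> float:
    """Clamp x to the range [0, 1]."""
    return min(max(x, 0.0), 1.0)


def mul_sat(a: float, b: float) -> float:
    """Model of a multiply with the _sat modifier: clamp the product to [0, 1]."""
    return saturate(a * b)


print(saturate(-0.5))     # 0.0
print(saturate(0.25))     # 0.25
print(mul_sat(2.0, 0.75)) # 1.0
```

Written this way, the clamp is part of the instruction's result rather than a separate step in the program, so there is nothing to jump over.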

Also, note that there's a lot of other things that you can do in shaders without branching, which you'd traditionally use if statements for in CPU-side code.
e.g. instead of:
if( powerupAmount >= 0.5 ) color = yellow;
else color = white;
You can actually perform that kind of selection without a branch:
color = powerupAmount >= 0.5 ? yellow : white;
//in pseudo asm:
//sub temp powerupAmount 0.5
//cmp color temp yellow white
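The pseudo asm can be modelled on the CPU to check that it gives the same answer as the if/else. The sketch below assumes ps_2_0-style `cmp` semantics (per component: take the second operand where the first is >= 0, otherwise the third). `cmp` and `select_color` are illustrative Python helpers, and the RGBA values for yellow and white are assumed.

```python
from typing import Tuple

Vec4 = Tuple[float, float, float, float]

YELLOW: Vec4 = (1.0, 1.0, 0.0, 1.0)  # assumed RGBA value
WHITE: Vec4 = (1.0, 1.0, 1.0, 1.0)   # assumed RGBA value


def cmp(src0: Vec4, src1: Vec4, src2: Vec4) -> Vec4:
    """Per-component select: src1 where src0 >= 0, otherwise src2."""
    return tuple(b if a >= 0.0 else c for a, b, c in zip(src0, src1, src2))


def select_color(powerup_amount: float) -> Vec4:
    # sub temp, powerupAmount, 0.5  (the scalar is replicated to all components)
    temp = (powerup_amount - 0.5,) * 4
    # cmp color, temp, yellow, white
    return cmp(temp, YELLOW, WHITE)


print(select_color(0.75))  # (1.0, 1.0, 0.0, 1.0)
print(select_color(0.25))  # (1.0, 1.0, 1.0, 1.0)
```

Both candidate values already exist before the select runs, so the instruction only has to pick one per component. The Python conditional expression is just the CPU-side stand-in for that pick.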

It's actually just under 1.5 million extra instructions. That's what you should be comparing the cost of a shader switch against.

It is indeed a lot of instructions, but also keep in mind that a modern video card happily does much more. My 3-year-old video card easily runs a 1500-instruction post effect at 1920x1200. That means a whopping 3,456,000,000 instructions per frame for just that post effect, and the card does this well above 60fps. So in comparison to what modern video cards can do, 1.5 million instructions is peanuts...

Which is no reason to just throw away performance, of course. :)
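To put the two figures from this thread side by side, here is the arithmetic in Python. It uses only the numbers quoted in the posts and treats every instruction as equal in cost, which is a simplification.

```python
# Post effect from the post above: 1500 instructions per pixel at 1920x1200.
post_effect_per_frame = 1920 * 1200 * 1500

# Static bool from earlier in the thread: 1 extra instruction per pixel at 1600x900.
static_bool_per_frame = 1600 * 900 * 1

print(post_effect_per_frame)                           # 3456000000
print(static_bool_per_frame)                           # 1440000
print(post_effect_per_frame // static_bool_per_frame)  # 2400
```

By this count the extra instruction is 1/2400 of the work of that single post effect, roughly 0.04%.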
