Boolean operations in shader assembly

14 comments, last by Oogst 10 years, 8 months ago
What is the best way to do boolean operations in shader assembly for shader model 3.0 DirectX? So far the only way I have seen is smartly using floats to represent them, is there something easier/more efficient/more direct?

For example, if I want to do something like this:

if (a < b && (c < d || e < f))
{
    ...
}

I could break this down to roughly this (slightly simplified from assembly):

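if_lt a, b
{
    sub t, c, d        // t = c - d
    cmp t, t, 0, 1     // t = (t >= 0) ? 0 : 1, so t is 1 when c < d
    sub s, e, f        // s = e - f
    cmp s, s, 0, 1     // s is 1 when e < f
    add t, t, s        // t >= 1 when either comparison holds (logical OR)
    if_gt t, 0.5
    {
        ...
    }
}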

Is there some better way to do this, or is this just how I am supposed to implement this?

(By the way, the reason I have recently started learning shader assembly is that I have a shader for which both Cg and HLSL require too many temporaries to run, while I know it can be done with less.)

My dev blog
Ronimo Games (my game dev company)
Awesomenauts (2D MOBA for Steam/PS4/PS3/360)
Swords & Soldiers (2D RTS for Wii/PS3/Steam/mobile)

Swords & Soldiers 2 (WiiU)
Proun (abstract racing game for PC/iOS/3DS)
Cello Fortress (live performance game controlled by cello)

So far the only way I have seen is smartly using floats to represent them, is there something easier/more efficient/more direct?

If they're dynamic branches (decisions that have to be made per-pixel/vertex), then yeah, you have to use float because there are no integer/bool registers in SM3.

The best way to deal with branching is to avoid branching.

In your example, you end up using two different branches (one nested in the other). It would likely be much better to just use a single branch, due to how expensive they are on SM3 hardware...


By the way, the reason I have recently started learning shader assembly is that I have a shader for which both Cg and HLSL require too many temporaries to run, while I know it can be done with less

Does the compiler fail due to this, or the runtime?

Usually you can massage your HLSL code to produce better asm, rather than writing the asm yourself (e.g. by using the HLSL attributes such as branch, flatten, fastopt, forcecase, call, unroll, loop, isolate, or by writing more asm-like code).

e.g. to tell the compiler to use a series of cmp instructions and arithmetic for that branch, you could try something like:


[branch]
if( 0 < step(a,b) * (step(c,d) + step(e,f)) )
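
To spell that idea out a bit more, here is a minimal HLSL sketch of the float-as-boolean approach, assuming 1.0 stands for true and 0.0 for false. All function and parameter names (LessThan, Condition, ApplyTint, tint) are made up for illustration. Note that step(a, b) is 1 when a <= b, so the sketch inverts a swapped step to get a strict less-than.

```hlsl
// Float-encoded booleans: 1.0 means true, 0.0 means false.
// All function and parameter names here are hypothetical.

// 1.0 when x < y, else 0.0. step(y, x) is 1.0 when x >= y, so invert it.
float LessThan(float x, float y)
{
    return 1.0 - step(y, x);
}

// Evaluates a < b && (c < d || e < f) with arithmetic only.
float Condition(float a, float b, float c, float d, float e, float f)
{
    float andTerm = LessThan(a, b);
    // max acts as OR on 0/1 values; multiplication acts as AND.
    float orTerm = max(LessThan(c, d), LessThan(e, f));
    return andTerm * orTerm;
}

float4 ApplyTint(float4 color, float4 tint,
                 float a, float b, float c, float d, float e, float f)
{
    // A single dynamic branch instead of two nested ones.
    [branch]
    if (Condition(a, b, c, d, e, f) > 0.5)
    {
        color *= tint;
    }
    return color;
}
```

Whether the compiler keeps this as one branch or flattens it depends on the compiler version and flags, so it is worth checking the generated asm.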

...you have to use float because there are no integer/bool registers in SM3.

Then WTF are these used for?

As noted on the page that explains how to use those (http://msdn.microsoft.com/en-us/library/windows/desktop/bb174580(v=vs.85).aspx), they are for static flow control, i.e. flow control that is not determined at shader execution time.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Yeah I meant actual dynamic variables in the shader functions (i.e. non-uniform ones).
You can have 16 'constant' (uniform) bools and 16 ints, but these types don't really exist at runtime: there are no instructions to operate on them. If you want to work with them, you'll be copying the results into temporary float registers.

As Washu mentions above, the 16 bool constants are generally used as a 16-bit mask that controls static branching.
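
As a rough illustration of static branching on one of those bool constants, here is a small HLSL sketch. The names UseFog, FogColor and fogFactor are hypothetical, and it assumes the application sets the bool once per draw call (in D3D9, for example, through IDirect3DDevice9::SetPixelShaderConstantB).

```hlsl
// Hypothetical uniforms; assumed to be set per draw call by the application,
// for example through IDirect3DDevice9::SetPixelShaderConstantB for the bool.
bool UseFog;
float3 FogColor;

float4 main(float4 color : COLOR0, float fogFactor : TEXCOORD0) : COLOR0
{
    // The condition depends only on a uniform bool, so it has the same value
    // for every pixel in the draw call (static flow control).
    if (UseFog)
    {
        color.rgb = lerp(FogColor, color.rgb, fogFactor);
    }
    return color;
}
```

The compiler is free to flatten a branch this small instead of emitting a static if on a bool register, so the generated asm is the thing to check.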

OK. If you don't mind, I would like to reuse this topic for some questions.

1. In some places I would like to avoid switching shaders, and I see that a static bool (constant during shader execution but different per draw call) costs 1 instruction according to the docs. Is it cheaper than switching shaders, and is it cheaper than dynamic branching?

2. I have this situation for parallel split shadow maps. Which of these is cheaper:

this:

if(posVS.z > SplitDist.x && posVS.z <= SplitDist.y) // if in split range
{ ...lots_of_calculus() }

or this:

clip(posVS.z - SplitDist.x);
clip(SplitDist.y - posVS.z);
...lots_of_calculus()

Thanks folks, I'll just work with the floats then!

1. In some places I would like to avoid switching shaders, and I see that a static bool (constant during shader execution but different per draw call) costs 1 instruction according to the docs. Is it cheaper than switching shaders, and is it cheaper than dynamic branching?

This cannot be answered in the general case, because the shader operates per pixel (if it is a pixel shader), and the shader switching is per object. It is impossible to compare the two in anything but a specific situation, because the scene might be a few really large objects (tons of pixels, few shader switches) or a ton of small objects (few pixels, lots of shader switches).

Also, how long shader switching takes depends on the CPU, GPU, DirectX version and drivers...

While we are on the topic of shader assembly: is there a shader instruction equivalent to the Cg/HLSL/GLSL function saturate(x)? This function limits a value to the range [0, 1]. Since it is an intrinsic function in Cg, HLSL and GLSL, I expected to see it in shader assembly as well, but I couldn't find it...

Saturate is supported as a modifier for certain instructions. So if you do something like x = saturate(y * z), you'll end up with a mul_sat instruction in your assembly.

This actually maps to how GPUs implement saturate in their native microcode.
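
A tiny HLSL sketch of that case, with a hypothetical entry point and input semantics; the asm named in the comment is what I would expect for a pixel shader target, not verified output:

```hlsl
float4 main(float4 y : TEXCOORD0, float4 z : TEXCOORD1) : COLOR0
{
    // The multiply and the clamp to [0, 1] are written as one expression,
    // so the compiler can fold the clamp into the multiply as a _sat
    // modifier (mul_sat) rather than emitting separate clamp instructions.
    return saturate(y * z);
}
```

Compiling this and reading the disassembly is the quickest way to see whether the modifier shows up for a given target profile.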

I just checked, but saturate as a modifier is only available from shader model 4, while I am using shader model 3. According to documentation: http://msdn.microsoft.com/en-us/library/windows/desktop/hh447231(v=vs.85).aspx

The saturate function in HLSL has been around since shader model 1.