[HLSL] Dynamic branching performance

Graphics and GPU Programming

Started by Aqua Costa May 19, 2011 12:04 PM

7 comments, last by _the_phantom_ 12 years, 11 months ago

Aqua Costa (author)

May 19, 2011 12:04 PM

How can I know when dynamic branching ruins performance?

In my shaders I often use if() and for(). Are there any "rules" to make dynamic branching faster?

21st Century Moose

May 19, 2011 04:18 PM

If you can put it in your vertex shader rather than your pixel shader, it won't have as much overhead, since the vertex shader usually runs far fewer times than the pixel shader. That's a good general principle for any shader operation: where possible, move work back to the vertex shader.
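A minimal sketch of that idea, assuming a D3D10/11-style shader pair. Every name here (gWorldViewProj, gCameraPos, gFadeStart, gFadeEnd, VSMain, PSMain) is hypothetical, and vertex positions are assumed to already be in world space. The fade factor is computed once per vertex, and the pixel shader only reads the interpolated value:

```hlsl
// Hypothetical example: a distance-based fade computed per vertex instead of per pixel.
cbuffer PerFrame : register(b0)
{
    float4x4 gWorldViewProj; // hypothetical constant names
    float3   gCameraPos;
    float    gFadeStart;
    float    gFadeEnd;
};

struct VSIn  { float3 pos : POSITION; float2 uv : TEXCOORD0; };
struct VSOut { float4 pos : SV_Position; float2 uv : TEXCOORD0; float fade : TEXCOORD1; };

VSOut VSMain(VSIn v)
{
    VSOut o;
    o.pos = mul(float4(v.pos, 1.0f), gWorldViewProj);
    o.uv  = v.uv;
    // Runs once per vertex (assumes v.pos is already in world space);
    // the rasterizer interpolates the result for every pixel.
    float dist = distance(v.pos, gCameraPos);
    o.fade = saturate((gFadeEnd - dist) / (gFadeEnd - gFadeStart));
    return o;
}

Texture2D    gDiffuse : register(t0);
SamplerState gSampler : register(s0);

float4 PSMain(VSOut i) : SV_Target
{
    // The pixel shader just reads the interpolated fade value.
    return gDiffuse.Sample(gSampler, i.uv) * i.fade;
}
```

Per-vertex values are interpolated linearly across each triangle, so this only works for quantities that vary smoothly; anything that needs per-pixel precision (or a sharp per-pixel branch) still has to stay in the pixel shader.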

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

MJP

May 19, 2011 07:08 PM

How can I know when dynamic branching ruins performance?

Profile. Do it often. AMD's GPU PerfStudio can give you statistics about how many threads take a branch, or don't, and how much wasted work is done because of the branches. I'm not sure if Parallel Nsight can do the same, but I would assume it has some useful info.

In my shaders I often use if() and for(). Are there any "rules" to make dynamic branching faster?

Well, first of all, using an "if" or a "for" does not automatically mean you're getting a branch in your shader. The compiler can flatten branches and unroll loops when it's possible to do so, so check the shader assembly to verify what you actually got (fxc's /Fc option writes an assembly listing). You can also use attributes to force the compiler to flatten or unroll ([flatten], [unroll]), or to keep a real branch or loop ([branch], [loop]).
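A sketch of the four attributes in one hypothetical SM4/SM5 pixel shader (gTex, gSampler, gThreshold, gTapCount and PSMain are made-up names). SampleLevel is used throughout because it needs no screen-space derivatives, so it is allowed inside real dynamic branches and loops:

```hlsl
// Hypothetical pixel shader showing the HLSL flow-control attributes.
Texture2D    gTex     : register(t0);
SamplerState gSampler : register(s0);

cbuffer Params : register(b0)
{
    float gThreshold; // hypothetical constant
    int   gTapCount;  // hypothetical, only known at run time
};

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 c = gTex.SampleLevel(gSampler, uv, 0);

    // [flatten]: evaluate both sides and select the result; no branch instruction.
    float scale;
    [flatten] if (c.a > gThreshold) scale = 2.0f; else scale = 0.5f;

    // [branch]: request a real dynamic branch (if/endif in the assembly).
    [branch] if (c.a > gThreshold)
    {
        c.rgb *= scale;
    }

    // [unroll]: the trip count is a compile-time constant, so the body is replicated.
    float sum = 0.0f;
    [unroll] for (int i = 0; i < 4; ++i)
        sum += gTex.SampleLevel(gSampler, uv + float2(i, 0) * 0.001f, 0).r;

    // [loop]: keep a real loop; required here because gTapCount is not a constant.
    [loop] for (int j = 0; j < gTapCount; ++j)
        sum += gTex.SampleLevel(gSampler, uv - float2(j, 0) * 0.001f, 0).r;

    return c * sum;
}
```

To see what you actually got, compile with something like `fxc /T ps_5_0 /E PSMain /Fc shader.asm shader.hlsl` and look for if_nz/if_z, else, endif and loop/endloop in the listing.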

Anyway, the number 1 rule for dynamic branching is coherence. You need lots of adjacent pixels to take the same branch: the hardware runs pixels in groups (a 32-thread warp on NVIDIA, a 64-thread wavefront on AMD), and if the pixels in a group disagree, the whole group ends up executing both branches and doing wasted work. So if you're trying to use dynamic branching as an optimization, only do it for things where the branch will be the same over large portions of the screen. It also helps to use a branch only to skip large sections of code rather than small ones, since the branch itself adds some fixed overhead.
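A sketch of a branch that fits those rules, assuming a hypothetical mask texture (gMask) that is 0 over large regions of the screen; gColor, gPoint, gTexelSize and PSMain are also made-up names. The branch guards a whole 16-tap loop rather than a couple of instructions, so a group of pixels that all see a zero mask skips a lot of work:

```hlsl
// Hypothetical: skip an expensive 16-tap filter where a coherent mask says it is not needed.
Texture2D    gColor : register(t0);
Texture2D    gMask  : register(t1); // hypothetical, roughly constant over big screen regions
SamplerState gPoint : register(s0);

cbuffer Params : register(b0)
{
    float2 gTexelSize; // hypothetical: 1 / texture dimensions
};

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float4 color = gColor.SampleLevel(gPoint, uv, 0);
    float  mask  = gMask.SampleLevel(gPoint, uv, 0).r;

    // Large, coherent skip: a warp/wavefront where every mask is 0 jumps over the whole loop.
    [branch] if (mask > 0.0f)
    {
        float4 blurred = 0.0f;
        [unroll] for (int y = -2; y < 2; ++y)
        [unroll] for (int x = -2; x < 2; ++x)
            blurred += gColor.SampleLevel(gPoint, uv + float2(x, y) * gTexelSize, 0);
        color = lerp(color, blurred / 16.0f, mask);
    }
    return color;
}
```

If the mask flips from pixel to pixel, most groups contain both cases and pay for both sides, and the branch becomes pure overhead.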

The Blog | The Book

Aqua Costa (author)

May 19, 2011 08:34 PM

What does it mean to flatten a branch?

The assembly of one of my shaders that uses if() looks like this:



if_nz r2.y
  mov r3.x, r1.z
  mov r3.z, r1.w
  sample_l r4.xyzw, r3.xzxx, t0.xyzw, s0, l(0.000000)
  mov r4.x, r4.x
  add r3.y, r2.x, r3.z
  sample_l r5.xyzw, r3.xyxx, t0.xyzw, s0, l(0.000000)
  add r2.y, r4.x, r5.x
  mov r2.z, -r2.x
  add r3.w, r2.z, r3.z
  sample_l r4.xyzw, r3.xwxx, t0.xyzw, s0, l(0.000000)
  add r2.y, r2.y, r4.x
  add r3.y, r2.z, r3.x
  sample_l r4.xyzw, r3.yzyy, t0.xyzw, s0, l(0.000000)
  add r2.y, r2.y, r4.x
  add r3.x, r2.x, r3.x
  sample_l r3.xyzw, r3.xzxx, t0.xyzw, s0, l(0.000000)
  add r2.x, r2.y, r3.x
else
  mov r2.x, l(-100000.000000)
endif

I guess this code is branched, right? What would the flattened code look like?

Quat

May 19, 2011 08:51 PM

It would not have if/else instructions. It would probably have a lerp call.

-----Quat

MJP

May 19, 2011 09:43 PM

"Flattened" means there are no branch instructions, and some other means is used to calculate the correct value. Typically this is done with a cmp (compare) instruction, but it can be done in other ways. Your assembly has "if" and "else" instructions, which are the branching instructions.
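To make the difference concrete, here is a hypothetical HLSL reconstruction of the posted listing: the five sample_l lines look like a center tap plus four neighbors at an offset held in r2.x, with -100000 as the fallback. The names gHeight, SampleSmoothedHeight, HeightBranched, HeightFlattened and PSMain, and the in-range condition, are assumptions, not the original source:

```hlsl
// Hypothetical reconstruction of the posted assembly (not the original shader).
Texture2D    gHeight  : register(t0);
SamplerState gSampler : register(s0);

float SampleSmoothedHeight(float2 uv, float d)
{
    // Each SampleLevel(..., 0) corresponds to one "sample_l ... l(0.000000)" line.
    return gHeight.SampleLevel(gSampler, uv, 0).x
         + gHeight.SampleLevel(gSampler, uv + float2(0,  d), 0).x
         + gHeight.SampleLevel(gSampler, uv + float2(0, -d), 0).x
         + gHeight.SampleLevel(gSampler, uv + float2(-d, 0), 0).x
         + gHeight.SampleLevel(gSampler, uv + float2( d, 0), 0).x;
}

// Branched: compiles to if/else/endif, as in the posted listing.
float HeightBranched(bool inside, float2 uv, float d)
{
    float h;
    [branch] if (inside)
        h = SampleSmoothedHeight(uv, d);
    else
        h = -100000.0f;
    return h;
}

// Flattened: both sides are always evaluated and the result is selected.
// In SM4/SM5 assembly the select typically shows up as movc (cmp in SM3).
float HeightFlattened(bool inside, float2 uv, float d)
{
    float h;
    [flatten] if (inside)
        h = SampleSmoothedHeight(uv, d);
    else
        h = -100000.0f;
    return h;
}

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    bool  inside = all(saturate(uv) == uv); // hypothetical condition
    float h = HeightBranched(inside, uv, 1.0f / 512.0f); // swap in HeightFlattened to compare listings
    return float4(h, h, h, 1.0f);
}
```

With fxc, HLSL's ?: operator also evaluates both operands, so `h = inside ? SampleSmoothedHeight(uv, d) : -100000.0f;` behaves like the flattened version: all five fetches run for every pixel.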

_the_phantom_

May 19, 2011 10:40 PM

In the case of the posted code, I would just reverse the 'if' condition, as that would likely be a 'win'... Of course, what I'd be more concerned about, given that code, is what looks like a lot of stalling from using each texture sample result right away...

Aqua Costa (author)

May 19, 2011 10:46 PM

In the case of the posted code, I would just reverse the 'if' condition, as that would likely be a 'win'... Of course, what I'd be more concerned about, given that code, is what looks like a lot of stalling from using each texture sample result right away...

I have to sample the texture multiple times to smooth the terrain...

_the_phantom_

May 19, 2011 11:15 PM

I wasn't questioning your need to sample the texture; I was pointing out that, as it currently stands (unless my asm reading is very, very rusty), you are saying:

- sample texture
- use value right away
- sample texture
- use value right away

etc etc

If the data from the sample instruction isn't ready when you want to use it (texture fetches have high latency when the data isn't in cache), the thread will stall. If the GPU can't find other threads to run while yours are stalled, GPU cycles go to waste while it waits for the data to come back.
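One way to act on that, sketched under the assumption that this is the same five-tap filter as in the posted listing (gHeight, gSampler, SmoothedHeight and PSMain are hypothetical names): load every sample into its own variable first and only then do the adds, so no add sits directly behind the fetch it depends on.

```hlsl
// Hypothetical rewrite: issue all independent fetches first, then do the arithmetic.
Texture2D    gHeight  : register(t0);
SamplerState gSampler : register(s0);

float SmoothedHeight(float2 uv, float d)
{
    // None of these fetches depends on another, so they can all be in flight together.
    float c = gHeight.SampleLevel(gSampler, uv,                 0).x;
    float n = gHeight.SampleLevel(gSampler, uv + float2(0,  d), 0).x;
    float s = gHeight.SampleLevel(gSampler, uv + float2(0, -d), 0).x;
    float w = gHeight.SampleLevel(gSampler, uv + float2(-d, 0), 0).x;
    float e = gHeight.SampleLevel(gSampler, uv + float2( d, 0), 0).x;

    // The adds only start after every fetch has been issued.
    return (c + n) + (s + (w + e));
}

float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float h = SmoothedHeight(uv, 1.0f / 512.0f);
    return float4(h, h, h, 1.0f);
}
```

Treat the source order as a hint rather than a guarantee: the HLSL compiler is free to reorder independent instructions, and the driver recompiles the bytecode to the GPU's native instruction set and schedules it again, so the fxc listing is not the final schedule. A profiler is the way to check whether it actually helped.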

This topic is closed to new replies.
