# Dynamic Branching in HLSL



So I was reading the other day about using PCF on shadow maps to smooth the shadow edges. I tried it with 16 samples and, obviously, it's very slow. So I decided to try a method where I map out the edges of an object and only use PCF along those edges, so no filtering is wasted on areas that don't need it. Basically, it went something like this:

    (C++)
    render_depth_map
    render_edge_map

    (HLSL)
    if (on edge)
        perform PCF
    else
        do not perform PCF

This method works very well. The only problem is that it doesn't save me ANY time at run-time. I've tested it multiple times and in different ways, and it produces the image it should, but it runs as slowly as if it were doing PCF on the entire image. Adding and removing the 'if' statement makes a big difference to the visuals, but the speed remains the same.

Is there some special function or call I'm missing that allows the shader to skip over code it doesn't need? Keep in mind that these states need to vary per pixel. I noticed a function mentioned in GLSL called "KIL". This seems to relate to the topic. Any ideas?

##### Share on other sites
Pixels are rendered in batches on most hardware (the exact size varies - some as coarse as 1024 iirc) and for any significant performance gain all pixels in a batch must follow the same branch. Otherwise those pixels taking the quick early-out branch still have to wait for those taking the long route.

You'll find various bits of discussions on tech forums about the frequency of per-pixel values required for optimal performance. Big bold shadows will probably benefit quite easily but high-frequency detail shadows probably won't.
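As a back-of-the-envelope model of the point above (an editorial sketch, not something from the posts): assume a batch of $N$ pixels costs $t_{\text{fast}}$ only when every pixel in it takes the early-out path, and $t_{\text{slow}}$ otherwise. If, purely for illustration, each pixel took the slow path independently with probability $p$, the fraction of batches that stay cheap would be

$$
P(\text{batch stays fast}) = (1 - p)^{N}.
$$

With $p = 0.05$ this gives roughly $0.44$ for $N = 16$ but less than $10^{-22}$ for $N = 1024$. Real shadow edges are spatially coherent rather than independent, so the independence assumption is pessimistic, but the trend is the same: the larger the batch and the finer the shadow detail, the fewer batches can skip the expensive path.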

hth
Jack

##### Share on other sites
Oh, one other obvious thing I forgot to mention - have you considered that your bottleneck is NOT texture sampling?

I'd imagine it unlikely in a shadow-mapping situation, but it's not impossible. Digging in with a tool like NVPerfHUD should reveal details; failing that, the usual tricks for detecting bottlenecks might be worth playing with...

hth
Jack

##### Share on other sites
Quote:
 Original post by jollyjeffers: Pixels are rendered in batches on most hardware (the exact size varies - some as coarse as 1024 iirc) and for any significant performance gain all pixels in a batch must follow the same branch. Otherwise those pixels taking the quick early-out branch still have to wait for those taking the long route.

hmmm, what exactly do you mean by batches? Does this refer to areas on the screen or (in my case) the receiving surfaces of different objects?

If this is the case, how could I modify my code to take advantage of batching? It does make sense the way you describe it.

Oh, hey, here's an idea. What if there was a way to sense in the program (roughly or precisely) which objects were receiving shadows or shadow edges? Those objects that were not occluded would not have to perform such heavy PCF.

##### Share on other sites
Quote:
 Original post by StarStudded: hmmm, what exactly do you mean by batches? Does this refer to areas on the screen or (in my case) the receiving surfaces of different objects?
A batch, as I understand it, is a grouping of pixels that are in flight at the same time - their start/finish is synchronized to some degree. Exactly what size the batch is and how it is defined (e.g. a 2x3 area of screen space or a 16x16 area..) seems to vary across architectures and isn't something I know in great detail unfortunately. It would appear to have some optimal ratio to the number of pixel shading units, ROPs and TMUs a GPU has.

Quote:
 Original post by StarStudded: If this is the case, how could I modify my code to take advantage of batching?
I've not seen anything to suggest that you can except for using this knowledge to inform your shader design. That is, if you expect the conditional in a branch to vary on a per-pixel or per-every-other-pixel basis then it might not be worth putting the branch in. However, if the conditional might only change every 100 pixels then it becomes a more clear cut case.

Maybe some sort of thresholding would help - instead of "if( all_in_shadow )" try "if( most_in_shadow )"? e.g. using a lt/gt instead of eq operator.
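One possible reading of that thresholding suggestion, as a minimal sketch: compare the edge value against a tunable threshold instead of testing for an exact value, so that neighbouring pixels agree on the branch more often. `EdgeMapSampler`, `EdgeThreshold` and `NeedsFullPCF` are hypothetical names, and the sketch assumes the edge map stores a value in [0, 1] in its red channel.

```hlsl
// Hypothetical resources: not taken from the original posts.
sampler2D EdgeMapSampler;   // assumed: 0 = far from any shadow edge, 1 = on an edge
float     EdgeThreshold;    // assumed tunable constant, e.g. 0.25

// Returns true when this pixel should take the expensive filtering path.
bool NeedsFullPCF(float2 screenUV)
{
    // Sampled outside of any branch, so a plain tex2D is fine here.
    float edge = tex2D(EdgeMapSampler, screenUV).r;

    // An exact test such as (edge == 1.0) can flip from pixel to pixel.
    // A greater-than test against a threshold changes less often across
    // the screen, which gives a batch a better chance of staying uniform.
    return edge > EdgeThreshold;
}
```

Lowering the threshold sends more pixels down the expensive path but makes the decision more uniform; where the balance lies would have to be measured.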

Quote:
 Original post by StarStudded: Oh, hey, here's an idea. What if there was a way to sense in the program (roughly or precisely) which objects were receiving shadows or shadow edges? Those objects that were not occluded would not have to perform such heavy PCF.
Nice idea, but I would suspect (feel free to prove me wrong [wink]) that the amount of work to detect this situation would outweigh the advantage you'd get by reducing the workload on those few pixels...

hth
Jack

##### Share on other sites
StarStudded...

You may want to take a look at this presentation from Gamefest about GPU shader performance. It goes into some detail about how quads and vectors of pixels work (those "batches" jollyjeffers was talking about), and other low-level details of how threading and constants work in shaders. Be sure to check around slide 23 for material that pertains to your situation. It's not exactly beginner-level stuff, but it's valuable information IMO.

##### Share on other sites
Just because you have an 'if' in your shader, doesn't mean the compiler will actually generate a dynamic branch. You need to check the assembly output to verify whether the compiler really did generate a dynamic branch - it may do all the work for both branches and then just select the appropriate result. You can instruct the compiler to use a real dynamic branch if possible by putting the attribute [branch] immediately before the if statement if you're using a recent SDK.

You can't perform dependent texture reads inside a dynamic branch. If your code has dependent texture reads inside an if statement then they will force the compiler to eliminate the branch. You can work around this by using a texture fetch intrinsic that doesn't require derivatives of the texture coordinates - either tex2Dgrad or tex2Dlod will work inside a branch.
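A minimal sketch of how the two points above could be combined for the edge-only PCF case, assuming a Shader Model 3.0 pixel shader. Every name in it (`ShadowMapSampler`, `EdgeMapSampler`, `TexelSize`, `DepthBias`, `EdgeThreshold`, `ShadowTap`) is hypothetical, and it assumes `shadowPos` already holds shadow-map texture coordinates and light-space depth.

```hlsl
// Hypothetical resources and constants: not taken from the original posts.
sampler2D ShadowMapSampler; // depth as seen from the light
sampler2D EdgeMapSampler;   // assumed: larger values near shadow edges
float2    TexelSize;        // assumed: 1 / shadow map resolution
float     DepthBias;
float     EdgeThreshold;

// One shadow comparison. tex2Dlod takes an explicit mip level (the w
// component), so it needs no derivatives and can sit inside a branch.
float ShadowTap(float2 uv, float receiverDepth)
{
    float occluderDepth = tex2Dlod(ShadowMapSampler, float4(uv, 0, 0)).r;
    return (receiverDepth - DepthBias <= occluderDepth) ? 1.0 : 0.0;
}

float4 main(float2 screenUV  : TEXCOORD0,
            float4 shadowPos : TEXCOORD1) : COLOR0
{
    float2 uv    = shadowPos.xy / shadowPos.w;
    float  depth = shadowPos.z  / shadowPos.w;

    // Fetched before the branch, so an ordinary tex2D is used here.
    float edge = tex2D(EdgeMapSampler, screenUV).r;

    float lit;
    [branch] // ask the compiler for a real dynamic branch if it can emit one
    if (edge > EdgeThreshold)
    {
        // Expensive path: 4x4 PCF, 16 taps.
        lit = 0;
        [unroll]
        for (int y = 0; y < 4; y++)
        {
            [unroll]
            for (int x = 0; x < 4; x++)
            {
                lit += ShadowTap(uv + (float2(x, y) - 1.5) * TexelSize, depth);
            }
        }
        lit /= 16.0;
    }
    else
    {
        // Cheap path: a single tap.
        lit = ShadowTap(uv, depth);
    }

    return float4(lit, lit, lit, 1);
}
```

Whether the branch survives compilation still has to be confirmed in the assembly output, and whether it pays off depends on how coherent the edge map is across a batch.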

##### Share on other sites
to mattnewport:
"...will force the compiler to eliminate the branch..."
What does "eliminate" mean? Does it mean that when the shader program runs, it will always execute this branch?

[Edited by - sixwaters on December 5, 2007 1:51:23 AM]

##### Share on other sites
Quote:
 Original post by sixwaters: to mattnewport: "...will force the compiler to eliminate the branch..." What does "eliminate" mean? Does it mean that when the shader program runs, it will always execute this branch?

It means that the compiler will generate code that has no branches, and will instead generate code that computes both results and "chooses" based on the conditional in your HLSL. You'll see this happen quite a bit in trivial branches, such as this one:

    if (distance > 5)
        value = 1;
    else
        value = 0;

##### Share on other sites
To elaborate a bit, you'll often find it boils down to the cmp instruction once compiled.

    if( condition )
        result = some_complex_function();
    else
        result = some_other_complex_function();

Where you'd expect it to only execute one of the branches, you'll get:

    result_a = some_complex_function();
    result_b = some_other_complex_function();
    result = cmp( condition, result_a, result_b );

In this situation there is no dynamic branching and you execute BOTH branches.
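To see the two outcomes side by side, here is a small sketch that uses the `[flatten]` and `[branch]` attributes on the same `if`. `ComplexA`, `ComplexB`, `Flattened`, `Branched` and `Threshold` are hypothetical stand-ins, and the sketch assumes a Shader Model 3.0 target and a compiler recent enough to accept the attributes.

```hlsl
// Hypothetical stand-ins for the two expensive code paths.
float ComplexA(float x) { return sin(x) * exp(x); }
float ComplexB(float x) { return cos(x) * log(x + 1.0); }

// [flatten]: evaluate both sides, then select one result.
float Flattened(bool condition, float x)
{
    float result;
    [flatten]
    if (condition) result = ComplexA(x);
    else           result = ComplexB(x);
    return result;
}

// [branch]: request a real dynamic branch, so only one side executes.
float Branched(bool condition, float x)
{
    float result;
    [branch]
    if (condition) result = ComplexA(x);
    else           result = ComplexB(x);
    return result;
}

float Threshold; // hypothetical shader constant

float4 main(float2 uv : TEXCOORD0) : COLOR0
{
    bool  condition = uv.x > Threshold;      // varies per pixel
    float r = Branched(condition, uv.y);     // swap in Flattened to compare
    return float4(r, r, r, 1);
}
```

Compiling both variants with an assembly listing (for example `fxc /T ps_3_0 /E main /Fc listing.asm shader.hlsl`, where the file names are placeholders) should show the difference: the branched version is expected to contain `if` / `else` / `endif` instructions, the flattened one a select such as `cmp`.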

hth
Jack
