# How to speed up branch instructions?

This topic is 4088 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hi, all! I have a view-dependent constraint in my pixel shader program, written in GLSL. See below:

...
if(theta > PI*7/4 && theta < PI/4)
... do something1 ...
else if( abs(theta - PI_2) < PI_4)
... do something2 ...
else if( abs(theta - PI) < PI_4)
... do something3 ...
else
... do something4 ...

The shader program with these branch instructions lowers the FPS considerably, to only about 1/3 of the FPS of the original shader program without branches. Does anyone know a better way to encode this kind of branching, or have any suggestions regarding graphics hardware? I use an nVidia 6 series card, but I have heard that ATI's Shader Model 3.0 implementation is much better than nVidia's. Is that true? I hope someone can help me... thanks a lot!!! :)

cavatina

##### Share on other sites
It really depends. Given that GPU pipelines are very long, branching becomes really tricky. What can be said, however, is that ATI is better at branching than nVidia.

Regarding your code, it really depends on what kind of code is executed in each branch (for optimal performance the branches should be equally expensive) and on how coherent your fragments' execution paths are (from your code it seems that this should be OK).

##### Share on other sites
Well, what you are checking in the ifs is constant:
Quote:
if(theta > PI*7/4 && theta < PI/4)

if(theta > PI*1.75 && theta <

##### Share on other sites
Missing from the previous post:
So why not make them constants?

If you are going to calculate them more than once (why would you?), then change to
if(theta > PI*1.75 && theta < PI*0.25)...

##### Share on other sites
I can only suggest reducing the number of FP operations, e.g.:

// convert angle into quadrant number
int i = int((theta + PI_4) * RECIP_PI_2); // RECIP_PI_2 = 2 / PI, or 1 / (PI/2)
if (i == 1)
  ..
else if (i == 2)
  ..
else if (i == 3)
  ..
else // i == 0 || i == 4
  ..

Skizz
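Filled out as a complete fragment shader, the quadrant idea might look like the sketch below. It assumes `theta` lies in [0, 2*PI) and arrives through a hypothetical varying, and it uses placeholder colors for the four "do something" bodies. It also assumes the first test in the original post was meant to cover angles near 0 or 2*PI (an `||`), since `theta > PI*7/4 && theta < PI/4` can never be true as written. The constants are declared `const` so the compiler can fold them, as suggested earlier in the thread.

```glsl
#version 120
// Sketch only: names and colors are placeholders, not the original shader.
varying float theta;                  // assumed to lie in [0, 2*PI)

const float PI         = 3.14159265;
const float PI_4       = PI * 0.25;   // constant expression, folded at compile time
const float RECIP_PI_2 = 2.0 / PI;    // 1 / (PI/2)

void main()
{
    // Shift by PI/4 so each sector is centred on a multiple of PI/2,
    // then scale so that the integer part is the sector number 0..4.
    int i = int((theta + PI_4) * RECIP_PI_2);

    vec3 color;
    if (i == 1)      color = vec3(0.0, 1.0, 0.0); // around PI/2
    else if (i == 2) color = vec3(0.0, 0.0, 1.0); // around PI
    else if (i == 3) color = vec3(1.0, 1.0, 0.0); // around 3*PI/2
    else             color = vec3(1.0, 0.0, 0.0); // i == 0 || i == 4: around 0 / 2*PI

    gl_FragColor = vec4(color, 1.0);
}
```

This replaces the `abs` and range tests with one add, one multiply and one conversion, followed by cheap integer comparisons. It does not change how coherent the branches are.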

##### Share on other sites
The most important thing for performance of dynamic branching is coherency (as stated). If N fragments in a spatial region take different paths, you may end up running the program N times for *every* one of the pixels in the region, with 99% of the results simply discarded. Thus it is *very* important to make sure that spatially coherent regions all take the same path through the control flow.

As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64. It's easy to see where differences in performance will occur. Note that next generation cards from both vendors will probably perform similarly if not better than ATI's current cards with respect to dynamic branching.
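A rough cost model may make the effect of region size more concrete. It is a deliberate simplification: it assumes a region runs in lockstep and that every fragment in it pays for every path taken by at least one fragment there. The symbols R, c_k and S are introduced here for illustration only.

```latex
% R   : fragments per region (e.g. 16 for a 4x4 block)
% c_k : cost of control-flow path k, in instructions
% S   : set of paths taken by at least one fragment in the region
T_{\text{region}} \approx R \sum_{k \in S} c_k
% Coherent region, S = \{k\}:        T_{\text{region}} \approx R \, c_k
% All four paths taken in one region: T_{\text{region}} \approx R \, (c_1 + c_2 + c_3 + c_4)
```

Under this model a larger region is more likely to straddle a boundary between paths, so more fragments pay the summed cost.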

##### Share on other sites
Are your branches very big? If not, it is better to use static branching and not even use the "else" conditional.

It's simple. You are trying to perform conditionals in a sequential order. Have you thought about doing it in the reverse order? This method will remove the need to use "else" conditionals.

Secondly, sometimes you will find it better to move instructions out of the "if" conditionals, meaning try not to nest computations inside them. This saves quite a lot of instructions.

Thirdly, removing the "&&" operator by using small functions saves 1 instruction each. Naturally, as your code grows, you will realise you can save quite a number of instructions too.
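One way to read these three suggestions together is sketched below. It is an interpretation, not the poster's actual code: `theta` and `baseColor` are hypothetical inputs, the bodies are placeholders, and the first test assumes the original condition was meant to cover angles near 0 or 2*PI. It relies on the tests being mutually exclusive, so their order does not matter.

```glsl
#version 120
// Sketch only: inputs and per-path results are placeholders.
varying float theta;      // assumed to lie in [0, 2*PI)
varying vec3  baseColor;

const float PI   = 3.14159265;
const float PI_2 = PI * 0.5;
const float PI_4 = PI * 0.25;

void main()
{
    // Work shared by several paths is done once, outside any conditional.
    float s = abs(sin(theta));
    float c = abs(cos(theta));

    // Default result first: this is the old final "else" case (around 3*PI/2).
    vec3 color = baseColor * 0.25;

    // Independent tests then override the default; no "else" is needed.
    // Each range test is a single comparison on a distance, with no "&&" or "||":
    // being within PI/4 of 0 or 2*PI is the same as being more than 3*PI/4 from PI.
    if (abs(theta - PI)   > 3.0 * PI_4) color = baseColor;      // around 0 / 2*PI
    if (abs(theta - PI_2) < PI_4)       color = baseColor * s;  // around PI/2
    if (abs(theta - PI)   < PI_4)       color = baseColor * c;  // around PI

    gl_FragColor = vec4(color, 1.0);
}
```

With bodies this small, a compiler may be able to turn each `if` into a conditional move instead of a real branch, but that depends on the driver and hardware.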

##### Share on other sites
Quote:
 As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64.

Can you elaborate on what that means, please? What's a region size? And nVidia's region size is much bigger?

##### Share on other sites
Quote:
Original post by Anonymous Poster
Quote:
 As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64.

Can you elaborate on what that means, please? What's a region size? And nVidia's region size is much bigger?

It's the number of fragments that the GPU processes simultaneously.

##### Share on other sites
You also might consider moving branches up the pipeline. You mentioned that your shader is computing view dependent effects. You could implement a deferred shading scheme where instead of writing colors to the framebuffer, you write several parameters for each pixel to one of several textures. Then, for the final pass, you draw a screen aligned quad and shade it using your final shader that takes as input the parameters that were stored in textures from the earlier rendering pass. So what does this have to do with branching?

Well, you can write several shaders, one for each control flow condition. Then, you tile several quads across the screen, with different quads corresponding to the different regions where you wanted a certain control flow path to occur. This way, you can change control flow by changing shaders, which moves the branch up the pipeline, so to speak.

This isn't always going to be a good or graceful solution, but if you're having lots of performance problems caused by branching, this is a way to prevent pixels from taking a bunch of different control flow paths that they don't need to take.
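The shader side of this scheme might look roughly like the sketch below. All names are hypothetical, and the application side (framebuffer setup with two color attachments, classifying screen tiles, switching shaders per quad) is not shown. Compile it once with `GBUFFER_PASS` defined for the first pass and once without it for a final-pass shader.

```glsl
#version 120
// Sketch only: one source file holding both passes, selected by GBUFFER_PASS.

#ifdef GBUFFER_PASS

varying vec3  normal;   // hypothetical per-fragment parameters
varying float theta;

void main()
{
    // First pass: store parameters instead of a final color.
    gl_FragData[0] = vec4(normalize(normal) * 0.5 + 0.5, 1.0);
    gl_FragData[1] = vec4(theta, 0.0, 0.0, 1.0);   // assumes a floating-point target
}

#else

uniform sampler2D normalTex;   // attachment 0 written by the first pass
uniform sampler2D thetaTex;    // attachment 1 written by the first pass

void main()
{
    vec2  uv    = gl_TexCoord[0].xy;   // set by the quad's vertex shader
    vec3  n     = texture2D(normalTex, uv).xyz * 2.0 - 1.0;
    float theta = texture2D(thetaTex, uv).x;

    // This shader implements exactly one control-flow path and contains no
    // if/else: the application draws it only over the tiles meant for this path.
    gl_FragColor = vec4(n * abs(sin(theta)), 1.0);
}

#endif
```

One final-pass shader would be written per control-flow path, each differing only in its last few lines.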

##### Share on other sites
Quote:
 Original post by cwhite
 You also might consider moving branches up the pipeline.

Good point - theoretically any control flow can be emulated using multiple passes and predication (in particular, z-cull does a really efficient job of this). This may even be more efficient on some current cards, although those days are numbered.

##### Share on other sites
Thanks for all the kind words...
I will try to tune my shader program based on this advice and post the results as soon as possible. Thanks :)

cavatina

##### Share on other sites
Quote:

Thank you.
This seems a better solution that using different passes doing different shaders and keeps the branch instruction away. Hope I don't miss your meaning... :)

cavatina

##### Share on other sites
Quote:
 Original post by edwinnie
 Are your branches very big? If not, it is better to use static branching and not even use the "else" conditional.
 It's simple. You are trying to perform conditionals in a sequential order. Have you thought about doing it in the reverse order? This method will remove the need to use "else" conditionals.
 Secondly, sometimes you will find it better to move instructions out of the "if" conditionals, meaning try not to nest computations inside them. This saves quite a lot of instructions.
 Thirdly, removing the "&&" operator by using small functions saves 1 instruction each. Naturally, as your code grows, you will realise you can save quite a number of instructions too.

Hi, I tried moving the actual computations out of the "if" conditions, and the FPS rose from 30 to 50. Additionally, I removed the "else" conditions and rewrote the code as:

... do something4 ...
if(...) do something1 ...
if(...) do something2 ...
if(...) do something3 ...

The FPS rose from 50 to 55.
These guidelines helped me a lot. Thanks :)

cavatina
