# How to speed up branch instructions?


Hi, all! I have a view-dependent condition in my pixel shader program, written in GLSL. See below:

```glsl
if (theta > PI*7.0/4.0 || theta < PI/4.0) {  // wrap-around range near 0; with && this test can never be true
    // ... do something1 ...
} else if (abs(theta - PI_2) < PI_4) {
    // ... do something2 ...
} else if (abs(theta - PI) < PI_4) {
    // ... do something3 ...
} else {
    // ... do something4 ...
}
```

The shader program with these branch instructions lowers the FPS a lot: it runs at only about 1/3 of the FPS of the original shader program without branches. Does anyone know how to write this kind of branching more efficiently, or have any suggestions about graphics hardware? I use an nVidia 6 series card, but I have heard that ATI's Shader Model 3.0 implementation is much better than nVidia's. Is that true? I hope someone can help me... thanks a lot!!! :)

cavatina
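As a quick check of the first condition, here is a minimal C++ sketch (CPU code mirroring the GLSL logic so it can be tested outside a shader; the function name, the constant values and the assumption that theta lies in [0, 2*PI) are illustrative). The first test has to accept angles on either side of 0, above PI*7/4 or below PI/4, so it needs `||`; with `&&` no angle satisfies both comparisons and that branch is dead.

```cpp
#include <cassert>
#include <cmath>

// Constants mirroring the shader's PI, PI_2 and PI_4 (assumed values).
const float PI   = 3.14159265f;
const float PI_2 = PI * 0.5f;
const float PI_4 = PI * 0.25f;

// Returns which branch (1..4) the shader would take for angle theta.
// useAnd = true reproduces the first test written with "&&".
int branchFor(float theta, bool useAnd) {
    assert(theta >= 0.0f && theta < 2.0f * PI);  // assumed input range
    bool nearZero = useAnd ? (theta > PI * 7 / 4 && theta < PI / 4)   // never true
                           : (theta > PI * 7 / 4 || theta < PI / 4);  // wraps around 0
    if (nearZero)                           return 1;
    else if (std::abs(theta - PI_2) < PI_4) return 2;
    else if (std::abs(theta - PI) < PI_4)   return 3;
    else                                    return 4;
}
```

For example, `branchFor(0.1f, true)` falls through to the final else and returns 4, while `branchFor(0.1f, false)` returns 1.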

##### Reply
It really depends... since GPU pipelines are really long, branching becomes really tricky. What can be said, however, is that ATi is better at branching than nVidia.

Regarding your code, it really depends on what kind of code is executed in each branch (for optimal performance the branches should be equally expensive) and on how coherent your fragments' execution paths are (from your code it seems that this should be OK).
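A rough way to see both points (balanced branch costs and coherent execution paths) is a simple cost model. The C++ sketch below assumes, as a simplification, that a group of fragments runs in lockstep and must execute every branch that at least one of its fragments takes; real hardware details vary by vendor and generation, and the costs used are made-up instruction counts.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Approximate cost of one group of fragments executing in lockstep:
// every branch taken by at least one fragment is executed by the whole group.
// paths[i] is the branch (0..3) taken by fragment i; cost[b] is branch b's cost.
int groupCost(const std::vector<int>& paths, const std::array<int, 4>& cost) {
    std::array<bool, 4> taken{};  // value-initialized: all false
    for (int p : paths) {
        assert(p >= 0 && p < 4);
        taken[p] = true;
    }
    int total = 0;
    for (int b = 0; b < 4; ++b)
        if (taken[b]) total += cost[b];
    return total;
}
```

With four branches of cost 10, a coherent group costs 10, and a group whose fragments take all four paths costs 40. If one branch costs 100 instead, any group that takes it pays those 100 no matter how few of its fragments need it, which is why roughly equal branch costs help.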

##### Reply
Well, what you are checking in the ifs is constant:
`if(theta > PI*7/4 || theta < PI/4)` is the same as `if(theta > PI*1.75 || theta < PI*0.25)`

##### Reply
Missing from my post: so why not make them constants?

If you are going to calculate them more than once (why would you?), then change it to
if(theta > PI*1.75 || theta < PI*0.25)...
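A minimal sketch of this suggestion in C++ (mirroring the shader code; the constant names are hypothetical): compute the boundary angles once as named constants instead of repeating the arithmetic in every test. Compilers may fold such constant expressions anyway, so the speed gain can be small, but the intent becomes clearer. Note that PI*7/4 and PI*1.75 give the same float, because dividing by 4 and multiplying by 0.25 both only scale by a power of two.

```cpp
#include <cassert>
#include <cmath>

// Boundary angles computed once (hypothetical names).
constexpr float PI     = 3.14159265f;
constexpr float PI_1_4 = PI * 0.25f;  // 45 degrees
constexpr float PI_7_4 = PI * 1.75f;  // 315 degrees

// First test of the shader, using the precomputed constants.
bool nearZero(float theta) {
    assert(std::isfinite(theta));
    return theta > PI_7_4 || theta < PI_1_4;
}
```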

##### Reply
I can only suggest reducing the number of FP operations, e.g.:

```glsl
// convert angle into quadrant number
int i = int((theta + PI_4) * RECIP_PI_2); // RECIP_PI_2 = 2 / PI, or 1 / (PI/2)
if (i == 1) {
    // ...
} else if (i == 2) {
    // ...
} else if (i == 3) {
    // ...
} else { // i == 0 || i == 4
    // ...
}
```

Skizz
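The quadrant idea can be checked on the CPU with a small C++ sketch: one add, one multiply and a truncation replace the comparisons and `abs` calls. It assumes theta lies in [0, 2*PI); the mapping from quadrant number back to the four original branches (1 to 4) is an illustrative assumption, and results can differ from the comparison chain exactly on the 45-degree boundaries (for example at theta = PI/4).

```cpp
#include <cassert>

const float PI         = 3.14159265f;
const float PI_4       = PI * 0.25f;
const float RECIP_PI_2 = 2.0f / PI;  // 1 / (PI/2)

// Quadrant number 0..4 for theta in [0, 2*PI); 0 and 4 both mean "near 0".
int quadrant(float theta) {
    assert(theta >= 0.0f && theta < 2.0f * PI);
    return static_cast<int>((theta + PI_4) * RECIP_PI_2);
}

// Map the quadrant back to the original branches 1..4
// (1: near 0, 2: near PI/2, 3: near PI, 4: near 3*PI/2).
int branchFromQuadrant(int i) {
    switch (i) {
        case 1:  return 2;
        case 2:  return 3;
        case 3:  return 4;
        default: return 1;  // i == 0 || i == 4
    }
}
```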

##### Reply
The most important thing for performance of dynamic branching is coherency (as stated). If the fragments in a spatial region take N different paths, you may end up running the program N times for *every* one of the pixels in the region, with 99% of the results simply discarded. Thus it is *very* important to make sure that spatially coherent regions all take the same path through the control flow.

As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64. It's easy to see where differences in performance will occur. Note that next-generation cards from both vendors will probably perform as well as, if not better than, ATI's current cards with respect to dynamic branching.
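To get a feel for why region size matters, the C++ sketch below counts how many screen tiles contain fragments that take different branches. It assumes a hypothetical 256x256 screen where each pixel's angle is its direction from the screen centre (a stand-in for the real view-dependent theta) and square tiles, and it uses the 4x4 and 64x64 sizes quoted above purely as illustrative numbers.

```cpp
#include <cassert>
#include <cmath>
#include <set>

const float PI = 3.14159265f;

// Branch 1..4 of the original shader for an angle in [0, 2*PI).
int branchFor(float theta) {
    if (theta > PI * 1.75f || theta < PI * 0.25f) return 1;
    if (std::abs(theta - PI * 0.5f) < PI * 0.25f) return 2;
    if (std::abs(theta - PI) < PI * 0.25f)        return 3;
    return 4;
}

// Counts tile x tile blocks of a size x size screen whose pixels do not all
// take the same branch (the angle is measured around the screen centre).
int divergentTiles(int size, int tile) {
    assert(tile > 0 && size % tile == 0);
    int count = 0;
    for (int ty = 0; ty < size; ty += tile)
        for (int tx = 0; tx < size; tx += tile) {
            std::set<int> branches;  // distinct branches taken in this tile
            for (int y = ty; y < ty + tile; ++y)
                for (int x = tx; x < tx + tile; ++x) {
                    float dx = x + 0.5f - size / 2.0f;  // pixel centre
                    float dy = y + 0.5f - size / 2.0f;
                    float theta = std::atan2(dy, dx);
                    if (theta < 0.0f) theta += 2.0f * PI;  // map to [0, 2*PI)
                    branches.insert(branchFor(theta));
                }
            if (branches.size() > 1) ++count;
        }
    return count;
}
```

On this pattern, 128 of the 4096 4x4 tiles (about 3% of the pixels) are divergent, against 8 of the 16 64x64 tiles (50% of the pixels). The branch boundaries are the same lines in both cases, but each divergent large tile drags far more pixels through every path.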

##### Reply
If not, it is better to use static branching and to avoid the "else" conditional altogether.

It's simple. You are performing the conditionals in sequential order. Have you thought about doing them in reverse order? This would remove the need for "else" conditionals.

Secondly, you will sometimes find it better to move instructions out of the "if" conditionals; that is, try not to nest computations inside them. This saves quite a lot of instructions.

Thirdly, removing the "&&" operator by using small functions saves one instruction each time. As your code grows, you will find that these savings add up to quite a number of instructions.
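One reading of the second suggestion (moving computation out of the branches) is sketched below in C++. Here `sharedTerm` is a hypothetical stand-in for work that every branch needs; computing it once before the branch keeps the branch bodies short. This only pays off when every path really needs the value, since hoisted code runs for all fragments.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical term that every branch needs.
float sharedTerm(float theta) { return std::cos(theta) * 0.5f; }

// Before: the shared computation is repeated inside every branch.
float shadeNested(float theta, int branch) {
    assert(branch >= 1 && branch <= 4);
    if (branch == 1)      return sharedTerm(theta) + 1.0f;
    else if (branch == 2) return sharedTerm(theta) + 2.0f;
    else if (branch == 3) return sharedTerm(theta) + 3.0f;
    else                  return sharedTerm(theta) + 4.0f;
}

// After: compute it once, keep only the differing part in the branches.
float shadeHoisted(float theta, int branch) {
    assert(branch >= 1 && branch <= 4);
    float base = sharedTerm(theta);
    float offset;
    if (branch == 1)      offset = 1.0f;
    else if (branch == 2) offset = 2.0f;
    else if (branch == 3) offset = 3.0f;
    else                  offset = 4.0f;
    return base + offset;
}
```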

##### Reply
Quote:
 As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64.

Can you please elaborate on what that means? What's a region size? And is nVidia's region size much bigger?

##### Reply
Quote:
Original post by Anonymous Poster
Quote:
 As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64.

Can you please elaborate on what that means? What's a region size? And is nVidia's region size much bigger?

It's the number of fragments that the GPU processes simultaneously.
