How to speed up branch instructions?

13 comments, last by cavatina 17 years, 7 months ago
Hi, all! I have a view-dependent constraint in my pixel shader program, written in GLSL. See below:

    ...
    if (theta > PI*7/4 && theta < PI/4)
        ... do something1 ...
    else if (abs(theta - PI_2) < PI_4)
        ... do something2 ...
    else if (abs(theta - PI) < PI_4)
        ... do something3 ...
    else
        ... do something4 ...

The shader program with these branch instructions lowers the FPS a lot, to only about 1/3 of the FPS of the original non-branching shader program. Does anyone know how to encode such a branch, or have any suggestions about graphics hardware? I use an nVidia 6 series card, but I have heard that ATI's Shader Model 3.0 implementation is much better than nVidia's. Is that true? I hope someone can help... thanks a lot!!! :)

cavatina
It really depends... given that GPU pipelines are really long, branching becomes really tricky. However, what can be said is that ATI is better at branching than nVidia.

Regarding your code, it really depends on what kind of code is executed in each branch (for optimal performance the branches should be equally expensive) and on how coherent your fragments' execution paths are (from your code it seems that this should be OK).
Well, what you are checking in the ifs is constant:
if(theta > PI*7/4 && theta < PI/4)
so why not make them constants?

If you are going to calculate it more than once (why would you?) then change to
if(theta > PI*1.75 && theta < PI*0.25)...
I can only suggest reducing the number of FP operations, e.g.:

    // convert angle into quadrant number
    int i = int((theta + PI_4) * RECIP_PI_2); // RECIP_PI_2 = 2 / PI, or 1 / (PI/2)
    if (i == 1)
        ..
    else if (i == 2)
        ..
    else if (i == 3)
        ..
    else // i == 0 || i == 4
        ..


Skizz
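To sanity-check the quadrant mapping above, both versions can be run on the CPU. What follows is a minimal C sketch, not shader code. The function names are made up for illustration, theta is assumed to lie in [0, 2*PI], and the first test of the original chain is written with || on the assumption that the wrap-around quadrant was intended (as posted with &&, that condition can never be true). The two versions treat the exact boundaries differently (strict < versus truncation), so only interior angles are comparable.

```c
#include <assert.h>
#include <math.h>

#define PI         3.14159265358979323846f
#define PI_2       (PI * 0.5f)
#define PI_4       (PI * 0.25f)
#define RECIP_PI_2 (2.0f / PI)   /* 1 / (PI/2) */

/* Comparison chain from the question; returns N for "do something N".
 * Assumption: || instead of the posted && in the first test. */
int branch_by_compare(float theta) {
    if (theta > PI * 1.75f || theta < PI_4) return 1;
    else if (fabsf(theta - PI_2) < PI_4)    return 2;
    else if (fabsf(theta - PI) < PI_4)      return 3;
    else                                    return 4;
}

/* Quadrant-index version: one add, one multiply, one float-to-int cast. */
int branch_by_quadrant(float theta) {
    assert(theta >= 0.0f && theta <= 2.0f * PI);  /* assumed input range */
    int i = (int)((theta + PI_4) * RECIP_PI_2);
    if (i == 1)      return 2;
    else if (i == 2) return 3;
    else if (i == 3) return 4;
    else             return 1;   /* i == 0 || i == 4 */
}
```

The index version still branches, so it reduces arithmetic per fragment but does not by itself address divergence between neighbouring fragments.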
The most important thing for performance of dynamic branching is coherency (as stated). If N fragments in a spatial region take different paths, you may end up running the program N times for *every* one of the pixels in the region, having 99% of the results simply discarded. Thus it is *very* important to make sure that spatially coherent regions all take the same path through the control flow.
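The cost argument can be made concrete with a toy model. The C sketch below assumes a simplified lock-step scheme in which every fragment of a region pays for every distinct path taken anywhere in that region. Real hardware is more complicated, and the names region_cost and MAX_PATHS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PATHS 8   /* illustrative upper bound on distinct branch paths */

/* Cost of shading one region under the simplified lock-step model.
 * path_ids[k] is the path fragment k takes; path_cost must hold
 * MAX_PATHS entries giving the cost of each path. */
int region_cost(const int *path_ids, size_t count, const int *path_cost) {
    int taken[MAX_PATHS] = {0};
    for (size_t k = 0; k < count; ++k) {
        assert(path_ids[k] >= 0 && path_ids[k] < MAX_PATHS);
        taken[path_ids[k]] = 1;
    }
    /* Every fragment pays the summed cost of all paths seen in the region. */
    int per_fragment = 0;
    for (int p = 0; p < MAX_PATHS; ++p) {
        if (taken[p]) per_fragment += path_cost[p];
    }
    return per_fragment * (int)count;
}
```

With four equally expensive paths, a 16-fragment region where all four paths occur costs four times as much in this model as a region where every fragment takes the same path.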

As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64. It's easy to see where differences in performance will occur. Note that next generation cards from both vendors will probably perform similarly to, if not better than, ATI's current cards with respect to dynamic branching.
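To see why region size matters, here is a rough C sketch that counts how many pixels sit in regions that straddle a branch boundary. It assumes square, grid-aligned tiles and a single vertical boundary at column boundary_x (pixels left of it take one path, the rest take another). The function name and the grid layout are illustrative assumptions, and the tile sizes are simply the figures quoted in this thread.

```c
#include <assert.h>

/* Number of pixels that fall in tiles containing fragments from both
 * sides of a vertical branch boundary at x == boundary_x. */
int pixels_in_divergent_tiles(int width, int height, int tile, int boundary_x) {
    assert(tile > 0);
    int pixels = 0;
    for (int ty = 0; ty < height; ty += tile) {
        for (int tx = 0; tx < width; tx += tile) {
            int x_end = (tx + tile < width)  ? tx + tile : width;
            int y_end = (ty + tile < height) ? ty + tile : height;
            /* Divergent only if the tile has columns on both sides. */
            if (tx < boundary_x && boundary_x < x_end) {
                pixels += (x_end - tx) * (y_end - ty);
            }
        }
    }
    return pixels;
}
```

For a 64x64 image with the boundary at x = 30, 4x4 tiles leave 256 pixels in divergent tiles, while a single 64x64 tile makes all 4096 pixels divergent. A boundary that happens to line up with the tile grid (for example x = 32 with 4x4 tiles) produces no divergent tiles in this model.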
Are your branches very big?
If not, it is better to use static branching and to avoid the "else" conditional altogether.

It's simple. You are trying to perform the conditionals in sequential order. Have you thought about doing them in reverse order? This removes the need for "else" conditionals.

Secondly, sometimes you will find it better to move instructions out of the "if" conditionals, meaning try not to nest computations inside them. This saves quite a lot of instructions.

Thirdly, removing the "&&" operator by using small functions saves 1 instruction each. Naturally, as your code grows, you will realise you can save quite a number of instructions too.
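One possible reading of this advice, for small branches only, is to compute every result unconditionally and then pick one with 0/1 weights, so that no "if" or "else" remains. The C sketch below shows that selection step. It is a swapped-in illustration of branch flattening, not necessarily what the poster had in mind and not static branching in the uniform-driven sense. The names eq_mask and select_by_quadrant are hypothetical, and i is assumed to be a quadrant index in 0..4 where 0 and 4 both mean the wrap-around quadrant.

```c
#include <assert.h>

/* 1.0f when a == b, otherwise 0.0f. */
static float eq_mask(int a, int b) {
    return (float)(a == b);
}

/* Pick one of four precomputed results without any if/else.
 * Caveat: all four inputs must be finite, since 0 * inf is NaN. */
float select_by_quadrant(int i, float r_wrap, float r1, float r2, float r3) {
    assert(i >= 0 && i <= 4);            /* assumed index range */
    float w1 = eq_mask(i, 1);
    float w2 = eq_mask(i, 2);
    float w3 = eq_mask(i, 3);
    float w0 = 1.0f - w1 - w2 - w3;      /* covers i == 0 and i == 4 */
    return w0 * r_wrap + w1 * r1 + w2 * r2 + w3 * r3;
}
```

The trade-off is that all four results are always computed, so this only pays off when each "do something" is cheap compared with the cost of a divergent branch.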
Quote:
As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64.


Can you elaborate on what that means, please? What's a region size? And is nVidia's region size much bigger?
Quote:Original post by Anonymous Poster
Quote:
As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64.


Can you elaborate on what that means, please? What's a region size? And is nVidia's region size much bigger?


It's the number of fragments that are processed by the GPU simultaneously.
Quote:Original post by Anonymous Poster
Quote:
As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64.


Can you elaborate on what that means, please? What's a region size? And is nVidia's region size much bigger?


See this thread and this thread. Search might turn up more.


