cavatina

How to speed up branch instructions?


Hi, all! I have a view-dependent constraint in my pixel shader, written in GLSL. See below:

...
if(theta > PI*7/4 && theta < PI/4)
    ... do something1 ...
else if( abs(theta - PI_2) < PI_4)
    ... do something2 ...
else if( abs(theta - PI) < PI_4)
    ... do something3 ...
else
    ... do something4 ...

The shader with these branch instructions lowers the FPS a lot, to only about 1/3 of the FPS of the original non-branching shader. Does anyone know how to encode such branching more efficiently, or have any suggestions regarding graphics hardware? I use an nVidia 6 series card, but I have heard that ATI's Shader Model 3.0 implementation is much better than nVidia's. Is that so? I hope someone can help me... thanks a lot!!! :)

cavatina

It really depends. Given that GPU pipelines are really long, branching becomes tricky. What can be said, however, is that ATI is better at branching than nVidia.

Regarding your code, it depends on what kind of code is executed in each branch (for optimal performance the branches should be equally expensive) and on how coherent your fragments' execution paths are (from your code it seems that this should be OK).

Guest Anonymous Poster
Well, what you are checking in the ifs is constant:
if(theta > PI*7/4 && theta < PI/4)

Guest Anonymous Poster
Missing from my post:
So why not make them constants?

If you are going to calculate them more than once (why would you?), then change to


if(theta > PI*1.75 && theta < PI*0.25)
...

I can only suggest reducing the number of FP operations, e.g.:

// convert angle into quadrant number
int i = int((theta + PI_4) * RECIP_PI_2); // RECIP_PI_2 = 2 / PI, i.e. 1 / (PI/2); explicit conversion, since GLSL does not convert float to int implicitly
if (i == 1)
..
else if (i == 2)
..
else if (i == 3)
..
else // i == 0 || i == 4
..
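
As a rough CPU-side check of the quadrant idea above, here is a minimal C sketch (C rather than GLSL only so that it can be compiled and run anywhere). It assumes theta lies in [0, 2*PI). The function name `quadrant_of` is hypothetical, and the final `& 3` wrap, which folds i == 4 back onto 0, is an addition of this sketch and not part of the snippet above.

```c
#include <assert.h>

#define PI         3.14159265358979323846
#define PI_4       (PI / 4.0)
#define RECIP_PI_2 (2.0 / PI)   /* 1 / (PI/2) */

/* Hypothetical helper: maps an angle in [0, 2*PI) to a sector 0..3.
 * Sector 0 is centred on 0, sector 1 on PI/2, sector 2 on PI,
 * sector 3 on 3*PI/2. */
int quadrant_of(double theta)
{
    assert(theta >= 0.0);   /* truncation below only equals floor for theta >= 0 */
    int i = (int)((theta + PI_4) * RECIP_PI_2);
    return i & 3;           /* i == 4 (theta just below 2*PI) wraps to sector 0 */
}
```

With the wrap in place, the `else // i == 0 || i == 4` case reduces to a plain `i == 0` test, so the four cases could also be dispatched with a switch or a table lookup.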


Skizz

The most important thing for the performance of dynamic branching is coherency (as stated). If the fragments in a spatial region take N different paths, you may end up running the program N times for *every* one of the pixels in the region, with most of the results simply discarded. Thus it is *very* important to make sure that all fragments in a spatially coherent region take the same path through the control flow.
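
To make the cost argument concrete, here is a small C sketch of a deliberately simplified model. It is an assumption for illustration, not a description of any particular GPU: every fragment in a batch is charged for each path that at least one fragment in the batch takes. The names `lockstep_cost` and `NUM_PATHS` and the cost values are hypothetical.

```c
#include <assert.h>

#define NUM_PATHS 4

/* Simplified model (an assumption): all fragments in a batch step through
 * every path that at least one of them takes, so the per-fragment cost is
 * the sum of the costs of the distinct paths present in the batch. */
int lockstep_cost(const int *path_of, int n, const int cost[NUM_PATHS])
{
    int taken[NUM_PATHS] = {0};
    int total = 0;

    /* Mark which paths occur anywhere in the batch. */
    for (int i = 0; i < n; ++i) {
        assert(path_of[i] >= 0 && path_of[i] < NUM_PATHS);
        taken[path_of[i]] = 1;
    }
    /* Each path that occurs is paid for once per fragment. */
    for (int p = 0; p < NUM_PATHS; ++p) {
        if (taken[p])
            total += cost[p];
    }
    return total;
}
```

Under this model, with path costs of 10, 20, 30 and 40, a batch whose fragments all take path 2 costs 30 per fragment, while a batch that touches all four paths costs 100 per fragment.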

As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on the 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64. It's easy to see where differences in performance will occur. Note that next-generation cards from both vendors will probably perform similarly to, if not better than, ATI's current cards with respect to dynamic branching.
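
The effect of region size can be sketched with a toy model in C. Everything here is an assumption for illustration: the screen is cut into square regions, a single vertical boundary separates two paths, and the function name `pixels_in_mixed_regions` is hypothetical.

```c
#include <assert.h>

/* Toy model: the screen is split into tile x tile regions, and each pixel
 * takes path 0 if x < boundary_x, otherwise path 1. Returns how many pixels
 * sit in regions that contain both paths (and so, in this model, pay for
 * both). */
long pixels_in_mixed_regions(int width, int height, int tile, int boundary_x)
{
    long count = 0;

    assert(tile > 0 && width > 0 && height > 0);
    for (int ty = 0; ty < height; ty += tile) {
        for (int tx = 0; tx < width; tx += tile) {
            /* Exclusive end coordinates, clipped to the screen. */
            int x_end = tx + tile < width ? tx + tile : width;
            int y_end = ty + tile < height ? ty + tile : height;
            /* Mixed when the boundary falls strictly inside the region. */
            if (tx < boundary_x && boundary_x < x_end)
                count += (long)(x_end - tx) * (y_end - ty);
        }
    }
    return count;
}
```

For a 256x256 target with the boundary at x = 130, this model gives 1024 affected pixels for 4x4 regions and 16384 for 64x64 regions, which is the kind of gap the region sizes above would suggest.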

Are your branches very big?
If not, it is better to use static branching and to avoid the "else" conditional altogether.

It's simple. You are performing the conditionals in sequential order. Have you thought about doing them in reverse order? That removes the need for "else" conditionals.

Secondly, you will sometimes find it better to move instructions out of the "if" conditionals, meaning you should try not to nest computations inside them. This saves quite a lot of instructions.

Thirdly, removing the "&&" operator by using small functions saves one instruction each. Naturally, as your code grows, you will find you can save quite a number of instructions this way too.
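
One way to read the advice about dropping "else" for small branches is to compute every candidate result and pick one arithmetically. The following is a minimal C sketch of that reading, assuming a sector index 0..3 is already available; `select4` is a hypothetical name, and in practice the equivalent would be written in the shader itself.

```c
#include <assert.h>

/* Branch-free selection: each comparison evaluates to 0 or 1, so exactly one
 * weight is 1 and the weighted sum returns the matching candidate.
 * Assumes all four candidates are finite (0 * infinity would give NaN). */
float select4(int quadrant, float r0, float r1, float r2, float r3)
{
    assert(quadrant >= 0 && quadrant <= 3);

    float w0 = (float)(quadrant == 0);
    float w1 = (float)(quadrant == 1);
    float w2 = (float)(quadrant == 2);
    float w3 = (float)(quadrant == 3);

    return w0 * r0 + w1 * r1 + w2 * r2 + w3 * r3;
}
```

Every candidate is computed for every fragment, so this tends to pay off only when the branches are short.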

Guest Anonymous Poster
Quote:

As mentioned, ATI's region size is on the order of 16 pixels (4x4, or 4x12 on 1900 series) and NVIDIA's is somewhere in the neighborhood of 64x64.


Can you elaborate on what that means, please? What's a region size? And is nVidia's region size much bigger?

It's the number of fragments that the GPU processes simultaneously.

See this thread and this thread. Search might turn up more.


