jcabeleira

Shader branching ruins performance


We all know that dynamic branching hurts performance, but the size of the hit is quite ridiculous. Some time ago I discovered that a single "if" statement in one of my shaders was cutting the frame rate in half! I replaced it with a multiply and everything was fine.

Today I was experimenting with some simple raytracing calculations on the GPU, actually a kind of ambient occlusion based on intersecting rays with spheres, and the two nested loops I used (for each ray, check for collision against each sphere) completely ruin the shader's performance. Currently I'm doing this in a GLSL pixel shader: for each pixel, cast 8 rays that are checked for intersection against 10 spheres. No big deal, the code is short and clean, but it is also slow as hell!

I also tried moving the calculation to the vertex shader (my scene contains about 16,000 vertices) and you know what? I got exactly the same performance as when it was running per pixel (which accounts for 1,700,000 pixels)! When I saw this, I realized that the complexity of the calculation was not the reason for the poor performance. Then I moved back to the pixel shader and unrolled the two nested loops by hand, which resulted in awfully lengthy code that, surprisingly, runs really fast!

So my question is this: is there any way to make the GLSL compiler unroll the goddamn loops, instead of making the GPU use dynamic branching or making me unroll them by hand? Thanks
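For readers unfamiliar with the "replace the if with a multiply" trick mentioned above, here is a minimal sketch of the idea. The original shader was not posted, so `facing` and `litColor` are hypothetical names chosen purely for illustration; only the built-in `step()` is standard GLSL.

```glsl
#version 120

// Hypothetical inputs for illustration; none of these names come from the original shader.
varying float facing;    // assumed to hold something like dot(normal, lightDirection)
uniform vec4 litColor;   // assumed color to output when the condition holds

void main()
{
    // The branching form would be:
    //     if (facing >= 0.0) gl_FragColor = litColor;
    //     else               gl_FragColor = vec4(0.0);

    // step(edge, x) returns 0.0 when x < edge and 1.0 otherwise,
    // so the condition becomes a 0/1 factor instead of a branch.
    float mask = step(0.0, facing);
    gl_FragColor = litColor * mask;
}
```

The multiply form always evaluates both outcomes, so it tends to pay off only when the guarded work is cheap.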

- Could you show some code?
- Are you using constants or uniforms to define the number of iterations?
- What is your performance impact?
- Are you sure that your shader is not falling back to software mode?
- What hardware are you using?

--
Ashaman73
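To illustrate the constants-versus-uniforms question above, here is a rough sketch of the two kinds of loop bound. All names (`NUM_RAYS`, `MAX_SPHERES`, `sphereCount`, `radii`) are assumptions made for this example and do not come from the original shader.

```glsl
#version 120

const int NUM_RAYS    = 8;   // compile-time constant
const int MAX_SPHERES = 8;   // compile-time upper bound, used to size the array
uniform int sphereCount;     // runtime value (hypothetical); must stay <= MAX_SPHERES

uniform vec3  rays[NUM_RAYS];
uniform float radii[MAX_SPHERES];

void main()
{
    float sum = 0.0;

    // The trip count is a constant expression, so the compiler knows it
    // at compile time and is free to unroll the loop.
    for (int i = 0; i < NUM_RAYS; ++i)
        sum += rays[i].x;

    // The trip count depends on a uniform, so it is only known at run time
    // and the loop generally has to stay a real loop.
    for (int j = 0; j < sphereCount; ++j)
        sum += radii[j];

    gl_FragColor = vec4(sum);
}
```

Whether a given driver actually unrolls the first loop is up to its compiler; the sketch only shows which information is available to it.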

If you haven't already, try prefixing the if statement with [branch].

Check the disassembly in both cases; you may see an unrolled loop without it and an explicit if with it.

[edit: just noticed you're using GLSL; this might not be applicable outside HLSL]
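GLSL 1.20 has no standard counterpart to the HLSL attributes mentioned above. On NVIDIA drivers, a vendor-specific pragma is commonly used to request unrolling. The sketch below assumes that pragma is honored by the installed driver, which should be verified on the target hardware. The uniform and varying names are placeholders.

```glsl
#version 120

// NVIDIA-specific hint (assumption: the driver recognizes it).
// The GLSL specification says unrecognized #pragma directives are ignored,
// so other compilers should simply skip this line.
#pragma optionNV(unroll all)

uniform vec3 rays[8];   // placeholder data for the example
varying vec3 normal;    // placeholder input for the example

void main()
{
    float sum = 0.0;

    // With the hint active, the compiler is asked to unroll this loop
    // rather than emit a dynamic loop.
    for (int ray = 0; ray < 8; ++ray)
        sum += dot(rays[ray], normal);

    gl_FragColor = vec4(sum / 8.0);
}
```

Because the pragma is only a hint, comparing frame rates with and without it is the only reliable way to tell whether it took effect.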

Quote:

What video card do you use?


Nvidia GTX 260. It's good hardware.

Quote:

- Could you show some code?
- Are you using constants or uniforms to define the number of iterations?
- What is your performance impact?
- Are you sure that your shader is not falling back to software mode?
- What hardware are you using?


Here is the code. The rays and spheres are two constant arrays that I left out to keep the post short.


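// Occlusion accumulated over all rays.
float occlusion = 0.0;

for (int ray = 0; ray < 8; ++ray) {
    vec3 rayDirection = rays[ray];

    for (int sphere = 0; sphere < 8; ++sphere) {
        // Project the sphere center onto the ray's line to find the point
        // on that line closest to the center (assumes rayDirection is normalized).
        vec3 sphereVector = spheres[sphere].position - position;
        float d = dot(rayDirection, sphereVector);
        vec3 nearestPoint = position + rayDirection * d;

        // Count a hit when that closest point lies within the sphere's radius.
        if (length(nearestPoint - spheres[sphere].position) <= spheres[sphere].radius)
            occlusion += 1.0 * dot(rayDirection, normal);
    }
}

occlusion /= 8.0;
gl_FragColor = vec4(1.0 - occlusion);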





With dynamic branching I get 2 fps, but with unrolled loops I get 30 fps.
As I said before, I get the same performance whether I run it per pixel or per vertex.
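The hand-unrolled version was not posted. One way to keep such code manageable is to move the per-sphere test into a helper function and write the calls out explicitly, roughly as sketched below. The helper names `hitSphere` and `occlusionForRay`, the `Sphere` struct layout, and the use of uniforms and varyings for the inputs are assumptions of this sketch (the original used constant arrays that were not shown).

```glsl
#version 120

struct Sphere { vec3 position; float radius; };

uniform Sphere spheres[8];   // assumption: the original used a constant array
uniform vec3   rays[8];      // assumption: the original used a constant array
varying vec3   position;     // assumed surface position input
varying vec3   normal;       // assumed surface normal input

// Returns 1.0 when the ray's line passes within the sphere's radius, else 0.0.
float hitSphere(vec3 rayDirection, Sphere s)
{
    float d = dot(rayDirection, s.position - position);
    vec3 nearestPoint = position + rayDirection * d;
    // step(edge, x) is 1.0 when x >= edge, i.e. when radius >= distance.
    return step(length(nearestPoint - s.position), s.radius);
}

// Inner loop written out by hand: one call per sphere.
float occlusionForRay(vec3 rayDirection)
{
    float hits = 0.0;
    hits += hitSphere(rayDirection, spheres[0]);
    hits += hitSphere(rayDirection, spheres[1]);
    hits += hitSphere(rayDirection, spheres[2]);
    hits += hitSphere(rayDirection, spheres[3]);
    hits += hitSphere(rayDirection, spheres[4]);
    hits += hitSphere(rayDirection, spheres[5]);
    hits += hitSphere(rayDirection, spheres[6]);
    hits += hitSphere(rayDirection, spheres[7]);
    return hits * dot(rayDirection, normal);
}

void main()
{
    // Outer loop written out by hand: one call per ray.
    float occlusion = 0.0;
    occlusion += occlusionForRay(rays[0]);
    occlusion += occlusionForRay(rays[1]);
    occlusion += occlusionForRay(rays[2]);
    occlusion += occlusionForRay(rays[3]);
    occlusion += occlusionForRay(rays[4]);
    occlusion += occlusionForRay(rays[5]);
    occlusion += occlusionForRay(rays[6]);
    occlusion += occlusionForRay(rays[7]);

    occlusion /= 8.0;
    gl_FragColor = vec4(1.0 - occlusion);
}
```

This computes the same sum as the looped code, with 16 explicit calls instead of 64 copies of the loop body.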

It definitely sounds like it's running in software mode for whatever reason...

As far as unrolling goes, I believe that most graphics drivers will automatically unroll a loop or flatten a branch if they can (that is, if it does not depend on a runtime variable). Not sure where I read that though.

Quote:
Original post by jcabeleira
Nvidia GTX 260. It's good hardware.


Indeed, your shader should run well on this kind of card.

Quote:
Original post by jcabeleira
With dynamic branching I get 2 fps, but with unrolled loops I get 30 fps.
As I said before, I get the same performance whether I run it per pixel or per vertex.


Is the problem the dynamic branching or the loops?

If you keep the loops but remove the if and always execute the occlusion operation, how does it affect the framerate?

Do you have recent drivers?

Quote:

Is the problem the dynamic branching or the loops?

If you keep the loops but remove the if and always execute the occlusion operation, how does it affect the framerate?

Do you have recent drivers?


The loops are the problem. I tried replacing the "if" with a multiply, but performance did not change.

Yes I have the most recent drivers.
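The experiment described above (keeping both loops but replacing the "if" with a multiply) might look roughly like this. The exact code was not posted, so this is a sketch: the `Sphere` struct layout and the uniform and varying declarations are assumptions added to make the shader self-contained.

```glsl
#version 120

struct Sphere { vec3 position; float radius; };

uniform Sphere spheres[8];   // assumption: the original used a constant array
uniform vec3   rays[8];      // assumption: the original used a constant array
varying vec3   position;     // assumed surface position input
varying vec3   normal;       // assumed surface normal input

void main()
{
    float occlusion = 0.0;

    for (int ray = 0; ray < 8; ++ray) {
        vec3 rayDirection = rays[ray];

        for (int sphere = 0; sphere < 8; ++sphere) {
            vec3 sphereVector = spheres[sphere].position - position;
            float d = dot(rayDirection, sphereVector);
            vec3 nearestPoint = position + rayDirection * d;
            float dist = length(nearestPoint - spheres[sphere].position);

            // hit is 1.0 when dist <= radius and 0.0 otherwise,
            // so the conditional add becomes an unconditional multiply-add.
            float hit = step(dist, spheres[sphere].radius);
            occlusion += hit * dot(rayDirection, normal);
        }
    }

    occlusion /= 8.0;
    gl_FragColor = vec4(1.0 - occlusion);
}
```

With the "if" gone, the only control flow left in the shader is the two loops themselves, which matches the observation that the loops, not the branch, account for the slowdown.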

This may be a dumb idea but what happens if you allocate all of your local variables (like sphereVector, nearestPoint, and maybe even the loop counters) outside the loop?

I'm thinking maybe it is unrolling your loop internally but doing so requires it to create more temporary variables than it has room for?
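For anyone who wants to try the experiment suggested here, a sketch of the loop with its temporaries declared once, outside both loops, follows. Whether this changes anything depends on the driver's compiler, so this is a test to run rather than a known fix. As before, the `Sphere` struct and the uniform and varying declarations are assumptions added for self-containment.

```glsl
#version 120

struct Sphere { vec3 position; float radius; };

uniform Sphere spheres[8];   // assumption: the original used a constant array
uniform vec3   rays[8];      // assumption: the original used a constant array
varying vec3   position;     // assumed surface position input
varying vec3   normal;       // assumed surface normal input

void main()
{
    // All temporaries, including the loop counters, declared up front.
    int ray;
    int sphere;
    vec3 rayDirection;
    vec3 sphereVector;
    vec3 nearestPoint;
    float d;
    float occlusion = 0.0;

    for (ray = 0; ray < 8; ++ray) {
        rayDirection = rays[ray];

        for (sphere = 0; sphere < 8; ++sphere) {
            sphereVector = spheres[sphere].position - position;
            d = dot(rayDirection, sphereVector);
            nearestPoint = position + rayDirection * d;

            if (length(nearestPoint - spheres[sphere].position) <= spheres[sphere].radius)
                occlusion += dot(rayDirection, normal);
        }
    }

    occlusion /= 8.0;
    gl_FragColor = vec4(1.0 - occlusion);
}
```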

Would you mind showing a screenshot of your result? How does AO look with only 8 samples?
