jcabeleira

Shader branching ruins performance

Jason Z    6436
I think Lutz is on to something - my SSAO loop runs a very similar type of calculation (two nested for loops with some arithmetic in the center), and even on an 8600M it runs significantly faster than 30 fps for a similar number of iterations. I've also used dynamic branching with for loops in a parallax occlusion calculation and there weren't any big problems with performance... Have you tried his suggested test yet?

jcabeleira    723
Quote:
Original post by Jason Z
I think Lutz is on to something - my SSAO loop runs a very similar type of calculation (two nested for loops with some arithmetic in the center), and even on an 8600M it runs significantly faster than 30 fps for a similar number of iterations. I've also used dynamic branching with for loops in a parallax occlusion calculation and there weren't any big problems with performance... Have you tried his suggested test yet?


Not yet. I'm busy with other stuff right now, but I'll try it very soon.
Thanks for all your help, guys.

jcabeleira    723
Quote:
Original post by Lutz
I assume that rays and spheres are constant registers? Then that's the problem! I had the same issue with either a 7800 or a 9800, don't remember, but this hardware does not support constant register indexing in a pixel shader! At least not in an efficient way.


You were absolutely right, Lutz. It was the array indexing that was causing the major slowdown. I replaced the arrays with textures holding the same information, and the array indexing with texture reads, and now I get 30 FPS.

But what scares me the most is that I already had several shaders making heavy use of array indexing for the SSAO and soft shadowing effects. Although their performance wasn't bad, I wonder how much I will increase the frame rate of my application just by removing the array indexing.
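
For anyone reading later, here is a rough sketch of what the texture replacement described above could look like. The actual shader isn't shown in this thread, so every name here (`sphereTex`, `NUM_SPHERES`, `fetchSphere`, `rayOrigin`) is hypothetical, and it assumes the data is stored in a NUM_SPHERES x 1 floating-point texture with nearest filtering, one sphere center per texel.

```glsl
#version 120

// Hypothetical size; the real count is not given in the thread.
const int NUM_SPHERES = 16;

// Assumed layout: NUM_SPHERES x 1 float texture, rgb = sphere center,
// sampled with GL_NEAREST so each texel is read back unfiltered.
uniform sampler2D sphereTex;

// Hypothetical per-pixel input supplied by the vertex shader.
varying vec3 rayOrigin;

// Replaces "spheres[i]": read texel i at its center.
vec3 fetchSphere(int i)
{
    float u = (float(i) + 0.5) / float(NUM_SPHERES);
    return texture2D(sphereTex, vec2(u, 0.5)).rgb;
}

void main()
{
    // Placeholder work: accumulate the distance to every sphere center.
    float sum = 0.0;
    for (int i = 0; i < NUM_SPHERES; ++i)
    {
        sum += distance(fetchSphere(i), rayOrigin);
    }
    gl_FragColor = vec4(vec3(sum / float(NUM_SPHERES)), 1.0);
}
```

The loop body is only a stand-in; the point is that the indexed read goes through `texture2D` instead of an array in constant registers.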

jcabeleira    723
Hmmm, I overcomplicated the solution. Instead of using textures I'm now using a uniform array of vec3's, and it also works fine. It also saves me the trouble of encoding/decoding the needed information in textures.
What is really surprising is that the GPU accepts the uniform array as a fast input method, but doesn't handle a simple constant array declared in the shader the same way.
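
To make the difference concrete, here is a minimal sketch of the two declarations, assuming GLSL 1.20. The names (`sphereCenters`, `sphereRadius`, `rayOrigin`, `rayDir`) and the array size are made up for illustration, and the ray/sphere test is just a plausible loop body, not the original shader.

```glsl
#version 120

// Hypothetical size; the real count is not given in the thread.
const int NUM_SPHERES = 16;

// Variant reported as slow in this thread: a constant array declared
// in the shader itself and indexed inside the loop.
// const vec3 sphereCenters[NUM_SPHERES] = vec3[NUM_SPHERES]( /* values */ );

// Variant reported as fast: the same data uploaded by the application
// as a uniform array (for example with glUniform3fv).
uniform vec3 sphereCenters[NUM_SPHERES];
uniform float sphereRadius;

// Hypothetical per-pixel ray supplied by the vertex shader.
varying vec3 rayOrigin;
varying vec3 rayDir;

void main()
{
    vec3 dir = normalize(rayDir);
    float hits = 0.0;

    for (int i = 0; i < NUM_SPHERES; ++i)
    {
        // Vector from the ray origin to the sphere center.
        vec3 toCenter = sphereCenters[i] - rayOrigin;
        // Distance along the ray to the point closest to the center.
        float t = dot(toCenter, dir);
        // Offset from that closest point to the sphere center.
        vec3 offset = toCenter - t * dir;

        if (t > 0.0 && dot(offset, offset) < sphereRadius * sphereRadius)
        {
            hits += 1.0;
        }
    }

    // Visualize the fraction of spheres hit by this pixel's ray.
    gl_FragColor = vec4(vec3(hits / float(NUM_SPHERES)), 1.0);
}
```

The shader body is identical in both cases; only the declaration of `sphereCenters` changes, so it is an easy swap to benchmark on a given card.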