Shader branching ruins performance

Graphics and GPU Programming

Started by jcabeleira January 15, 2010 02:25 AM

28 comments, last by jcabeleira 14 years, 2 months ago

726

Author

January 16, 2010 08:40 PM

Quote:Original post by Ysaneya
If you use the latest drivers, then I would suggest rolling back to previous drivers to see if it makes a difference. Your problem really starts to look like a driver problem to me.

It shouldn't be a driver problem. I've tested the shader on two different computers. One of them is a laptop with an Nvidia GTX 260, the other is a PC with an Nvidia 9800. Those two computers use different but recent drivers, and the shader performance problem happens on both of them.

Quote:
Other than that, maybe you could give a try to storing your data in 1D textures instead of an array of constants. I've seen strange behaviors when accessing an array of constant uniforms in the past, although that'd be surprising on a GTX 260.

Yes, that could be a good idea. Thanks.

Voltaico Engine

MJP

20,295

January 16, 2010 09:57 PM

Quote:Original post by Krohm
As a side note, I am pretty sure PS3.0 has full branching support...
EDIT: Anyway, this is just ugly.

It most definitely does support dynamic branching.

With HLSL you can control things like unrolling and dynamic branching using attributes or compiler flags. I have no idea if GLSL supports such things.

The Blog | The Book

Lutz

462

January 17, 2010 02:06 PM

I assume that rays and spheres are constant registers? Then that's the problem! I had the same issue with either a 7800 or a 9800, don't remember, but this hardware does not support constant register indexing in a pixel shader! At least not in an efficient way. It does in a vertex shader, though. In a pixel shader, the indexing code rays[ray] is basically unrolled into

  if (ray == 0) return rays[0];
  else if (ray == 1) return rays[1];

and so on. As you can imagine, this is sub-optimal to say the least. If the for-loops are unrolled, the indexing is also done explicitly and you don't have that issue. So it's not dynamic branching, it's array indexing.

To test this hypothesis, try this: Keep the for-loop, but replace rays[ray] with rays[0] and the same with spheres. If the speed increases, indexing is indeed the problem.
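A minimal sketch of that test, assuming a GLSL 1.20 fragment shader. The original shader is not shown in this part of the thread, so `NUM_RAYS`, the array contents, and the loop body are hypothetical placeholders:

```glsl
#version 120

// Hypothetical stand-in for the constant array in the original shader.
const int NUM_RAYS = 4;
const vec3 rays[NUM_RAYS] = vec3[NUM_RAYS](
    vec3(1.0, 0.0, 0.0),
    vec3(0.0, 1.0, 0.0),
    vec3(0.0, 0.0, 1.0),
    vec3(0.577, 0.577, 0.577));

void main()
{
    vec3 sum = vec3(0.0);
    for (int ray = 0; ray < NUM_RAYS; ++ray)
    {
        // Original form, with a dynamic index:
        //   vec3 rayDirection = rays[ray];
        // Test form, with a fixed index. The image will be wrong;
        // only the change in frame time matters.
        vec3 rayDirection = rays[0];

        sum += rayDirection; // placeholder for the real per-ray work
    }
    gl_FragColor = vec4(sum / float(NUM_RAYS), 1.0);
}
```

Once the index is fixed, the compiler may also simplify the loop in other ways, so a speed-up here points towards indexing as the cause but does not prove it on its own.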

To fix the performance issues, use textures. Crytek & co are all using textures as well.

@Krypt0n: Doubles are an SM5 feature, G260s are SM4, though.

bubu LV

1,436

January 17, 2010 02:24 PM

Try replacing this:

  vec3 rayDirection= rays[ray];

with this:

  vec3 getRay(int idx)
  {
    if (idx == 0) return rays[0];
    else if (idx == 1) return rays[1];
    else if (idx == 2) return rays[2];
    ...
  }

  vec3 rayDirection = getRay(ray);

Do the same with spheres.

zedz

291

January 18, 2010 12:27 AM

btw I'm sure you're aware, but with nvemulate there's an option to dump the ASM from a GLSL file, which can give you an idea of what's causing the difference in speed between two different methods.

Jason Z

6,437

January 18, 2010 06:16 AM

I think Lutz is on to something - my SSAO loop runs a very similar type of calculation (two nested for loops with some arithmetic in the center), and even on an 8600M it runs significantly faster than 30 fps for a similar number of iterations. I've also used dynamic branching with for loops in a parallax occlusion calculation and there weren't any big problems with performance... Have you tried his suggested test yet?

Jason Zink :: DirectX MVP

Direct3D 11 engine on CodePlex: Hieroglyph 3

Direct3D Books: Practical Rendering and Computation with Direct3D 11, Programming Vertex, Geometry, and Pixel Shaders
Articles: Dual-Paraboloid Mapping Article :: Parallax Occlusion Mapping Article (original):: Fast Silhouettes Article

Games: Lunar Rift

jcabeleira

726

Author

January 19, 2010 09:31 PM

Quote:Original post by Jason Z
I think Lutz is on to something - my SSAO loop runs a very similar type of calculation (two nested for loops with some arithmetic in the center), and even on an 8600M it runs significantly faster than 30 fps for a similar number of iterations. I've also used dynamic branching with for loops in a parallax occlusion calculation and there weren't any big problems with performance... Have you tried his suggested test yet?

Not yet, I'm busy with other stuff right now, but I'll try it very soon.
Thanks for all your help, guys.

Voltaico Engine

outRider

852

January 20, 2010 07:46 PM

As already mentioned, Nvidia cards can't do register indexing. IIRC they use scratch memory to do it, which explains all those movs.

jcabeleira

726

Author

January 22, 2010 11:51 AM

Quote:Original post by Lutz
I assume that rays and spheres are constant registers? Then that's the problem! I had the same issue with either a 7800 or a 9800, don't remember, but this hardware does not support constant register indexing in a pixel shader! At least not in an efficient way.

You were absolutely right, Lutz. It was the array indexing that was causing the major slowdown. I replaced the arrays with textures holding the same information and the array indexing with texture reads, and now I get 30 FPS.
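A sketch of what such a texture read might look like, assuming GLSL 1.30 and a 1D floating-point texture with one ray direction per texel. The names `rayTexture`, `numRays`, and `fragColor` are hypothetical, and the loop body is a placeholder, since the actual shader is not shown here:

```glsl
#version 130

// Hypothetical: 1D RGB float texture holding one ray direction per texel.
uniform sampler1D rayTexture;
// Hypothetical: number of valid texels in rayTexture.
uniform int numRays;

out vec4 fragColor;

void main()
{
    vec3 sum = vec3(0.0);
    for (int ray = 0; ray < numRays; ++ray)
    {
        // texelFetch reads texel `ray` at mip level 0 with no filtering,
        // which replaces the former array access rays[ray].
        vec3 rayDirection = texelFetch(rayTexture, ray, 0).xyz;

        sum += rayDirection; // placeholder for the real per-ray work
    }
    fragColor = vec4(sum / float(max(numRays, 1)), 1.0);
}
```

The texture would need a floating-point or signed format if the stored vectors can have negative components. Otherwise the values have to be encoded on upload and decoded in the shader.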

But what scares me the most is that I already had several shaders making heavy use of array indexing for the SSAO and soft shadowing effects. Although their performance wasn't bad, I wonder how much the frame rate of my application will increase just from removing the array indexing.

Voltaico Engine

jcabeleira

726

Author

January 29, 2010 05:28 PM

Hmmm, I overcomplicated the solution. Instead of using textures I'm now using a uniform array of vec3s, which also works fine and saves me the trouble of encoding/decoding the needed information in textures.
What is really surprising is that the GPU accepts the uniform array as a fast input method, but doesn't handle a simple constant array declared in the shader the same way.
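The two declarations being compared might look roughly like this in GLSL 1.20. This is only a sketch: `NUM_RAYS` and the loop body are hypothetical, and the slow/fast labels reflect what was reported in this thread for this hardware and these drivers, not a general rule:

```glsl
#version 120

const int NUM_RAYS = 16; // hypothetical count

// Variant A: constant array declared in the shader.
// Dynamic indexing into this was the slow case reported in this thread.
// const vec3 rays[NUM_RAYS] = vec3[NUM_RAYS](/* NUM_RAYS vec3 values */);

// Variant B: uniform array filled by the application.
// This was the fast case reported in this thread.
uniform vec3 rays[NUM_RAYS];

void main()
{
    vec3 sum = vec3(0.0);
    for (int ray = 0; ray < NUM_RAYS; ++ray)
    {
        sum += rays[ray]; // dynamic index into the uniform array
    }
    gl_FragColor = vec4(sum / float(NUM_RAYS), 1.0);
}
```

On the application side, the uniform array can be uploaded in one call with `glUniform3fv`, passing the location of `rays` from `glGetUniformLocation`, the element count, and a pointer to the packed floats.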

Voltaico Engine

This topic is closed to new replies.
