totally confused on branching



#1 robotech_er   Members   -  Reputation: 105


Posted 10 July 2011 - 10:25 AM

Hi everyone, I need some help with how to handle branching in shaders. I have searched for articles and threads about this issue, but I am still confused.



1. People always say modern GPUs “avoid latency stalls by interleaving execution of many groups of fragments”. What exactly does that mean?



2. After SM3.0, shaders can “support true dynamic branching”. I thought that with “true dynamic branching” the branches would be executed just like on a CPU, but that turns out not to be true. I have read lots of articles saying that “both branches of if and else will be executed” (is this the so-called “branch predication”?). Why?



3. Besides branch predication, are there any other methods for handling branches on present-day GPUs?

4. Lots of articles about SIMD programming are about CUDA. Can this information about CUDA be applied to shaders such as GLSL?

5. A snippet of pseudocode like this:

for( int i=0; i<15; i++ )
{
    …..
    if( In.color > val1 )
        val2 = a;
    else
        val2 = b;
    …..
    val3 = function1();
}

function1()
{
    …….
    some branches…..
    …….
}

The branching in this code is very lightweight; I am told I need not worry about it because the compiler can kick in. But it is also “branching inside inner loops”, which in most cases should be avoided. So what should be done about this kind of branching?

And there are some branches inside function1(). Should these branches be considered “branching inside inner loops” or not?

6. The Cg compiler with a GLSL profile can output the assembly for a GLSL shader. Is this assembly good enough to indicate how the GPU executes it? I want to see how the GPU handles branches from this assembly. Is that possible, given that the assembly is still an intermediate language?

7. Where can I find information on branching and GPU architecture, and some basic skills for optimizing shaders? Could you offer me some links?



Maybe too many whys….. I am a newbie at GPU programming, so any suggestions will be appreciated. I hope my poor English did not bother you.

Thanks in advance!


#2 Danny02   Members   -  Reputation: 271


Posted 10 July 2011 - 03:10 PM

There is a pretty cool app from AMD called GPU ShaderAnalyzer.
With this tool you can analyze your shader code; it shows you the assembly code of your shader and also how many cycles it takes to compute.
Because of branching, the output includes MIN, MAX and AVG cycle counts, which show you the best and worst case of your branching code. (Another feature is that it shows you whether your shader is bound by ALU operations, texture sampling, etc.)

This tool only simulates AMD graphics cards, so you won't know how your shader will perform on NVIDIA, but I think NVIDIA was a bit worse at branching on their older cards than ATI.




#3 V-man   Members   -  Reputation: 805


Posted 10 July 2011 - 06:23 PM

GPUs are not good at branching. They process fragments in groups, and if one of those fragments needs to take a different branch, the whole fragment processing for that group stalls.
SM 3 GPUs process in groups of 4 (2x2 fragments). SM 4 GPUs (I don't know if it is all of them) process groups of 16 (4x4 fragments).
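
To make the "whole group pays" point concrete, here is a toy cost model in plain C. It is only a sketch under stated assumptions, not a description of any specific GPU: the function name group_branch_cost, the cycle costs, and the group size are all made up for illustration, and real hardware uses different group sizes and scheduling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost model: cycles one lockstep group spends on an if/else.
 * takes_if[i] says which side fragment i wants to run.
 * then_cost / else_cost are invented cycle counts for each side. */
int group_branch_cost(const bool *takes_if, size_t group_size,
                      int then_cost, int else_cost)
{
    assert(group_size > 0);

    bool any_then = false;
    bool any_else = false;
    for (size_t i = 0; i < group_size; i++) {
        if (takes_if[i])
            any_then = true;
        else
            any_else = true;
    }

    /* The group runs a side if at least one fragment needs it.
     * Fragments that do not need that side are masked off but still wait. */
    int cost = 0;
    if (any_then)
        cost += then_cost;
    if (any_else)
        cost += else_cost;
    return cost;
}
```

In this model a group whose fragments all agree pays for one side only, while a single disagreeing fragment makes the whole group pay for both sides. That is why coherent branches (neighbouring fragments choosing the same path) are much cheaper than divergent ones.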

Branch prediction? There is no such thing in GLSL.
I do remember reading a lot of articles about Intel and branch prediction in their CPUs and how AMD designed a better CPU with their brand new Athlon.

"Lots of articles about SIMD programming are about CUDA. Can this information about CUDA be applied to shaders such as GLSL?"
If the articles are about GPUs, then yes. GLSL is just a shading language and there isn't anything about GPU design in the GLSL specification.

As for your code, you can get rid of the "if"/"else" with the mix() function. Look at your GLSL manual.
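
A minimal sketch of that idea, written in plain C so it can be run on the CPU. The helpers step_f and mix_f are hypothetical stand-ins that follow the GLSL definitions of step() and mix(); select_branchless is also a made-up name, and the sketch assumes the compared color is a single float rather than a vector.

```c
#include <assert.h>

/* C stand-in for GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0. */
float step_f(float edge, float x)
{
    return x < edge ? 0.0f : 1.0f;
}

/* C stand-in for GLSL mix(x, y, t): x * (1 - t) + y * t. */
float mix_f(float x, float y, float t)
{
    return x * (1.0f - t) + y * t;
}

/* Branch-free equivalent of:
 *     if (color > val1) val2 = a; else val2 = b;
 * Assumes a and b are finite (0 * infinity would give NaN). */
float select_branchless(float color, float val1, float a, float b)
{
    /* t is 1.0 when color > val1, otherwise 0.0 */
    float t = 1.0f - step_f(color, val1);
    assert(t == 0.0f || t == 1.0f);
    return mix_f(b, a, t);
}
```

In an actual shader the same expression would use the built-in step() and mix() directly. Both values a and b are always computed here, so this only pays off when they are cheap, as in the snippet from the first post.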

If you are interested in using the GPU as a CPU, read about CUDA or OpenCL (Open Computing Language).
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, FALSE, inverse_matrix);

#4 robotech_er   Members   -  Reputation: 105


Posted 10 July 2011 - 11:42 PM


Thanks for your reply. I meant branch predication, not branch prediction. I have read about some methods that handle branching better by means of "regrouping the threads across warps". Is this an implementation of "dynamic branching"?





#5 V-man   Members   -  Reputation: 805


Posted 11 July 2011 - 08:39 AM

I'm assuming you are talking about the articles that were about CUDA (threads and all that).

I can't answer your question because I'm a GL and GLSL user. I'm more interested in games and 2D graphics.
I have never touched CUDA nor OpenCL.



