
Early return in a GPU shader does not speed up processing?


2 replies to this topic

#1 thedodgeruk   Members   -  Reputation: 124


Posted 14 May 2013 - 08:47 AM

I am doing shader work on a particle system in the geometry shader, and I thought that adding a check at the beginning and returning early would speed up the whole project.

I found that it made no speed improvement.

I even tried a modulo test so that the shader returns early for every particle except every 1000th one, and it still made no difference.

What can I do so that an early return actually speeds up the process?




#2 Promit   Moderators   -  Reputation: 6350


Posted 14 May 2013 - 05:18 PM

This is fairly standard: all executions of a shader run for the same time as the longest invocation of that shader in a processing group. The exact size and properties of the groups vary based on hardware and shader type. Essentially early-out doesn't help at all unless many -- ideally most -- of the elements using that shader hit the early out. Even then the difference may be trivial or undetectable due to other factors. I'm sure someone will be happy to chime in on the hardware concepts that underlie this behavior, and I'll see if I can find a useful paper or two.

 

All of this means that the way to optimize shaders is to shorten the longest shader execution, because everyone else is bottlenecked on the slow guy.
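
A toy cost model can make this concrete. The sketch below is an illustration only, not a measurement: it assumes a group width of 32 lanes and made-up costs of 1 unit for an early-out and 100 units for the full path (`GROUP_SIZE`, `EARLY`, `FULL`, and `total_cost` are hypothetical names, not taken from any API), and it charges each group the cost of its slowest lane.

```python
# Toy model: each processing group costs as much as its slowest lane.
# GROUP_SIZE, EARLY and FULL are assumed values for illustration only;
# real group widths and costs depend on hardware and shader type.
GROUP_SIZE = 32
EARLY, FULL = 1, 100


def total_cost(lane_costs, group_size=GROUP_SIZE):
    """Sum, over consecutive groups, of the most expensive lane in each group."""
    return sum(
        max(lane_costs[start:start + group_size])
        for start in range(0, len(lane_costs), group_size)
    )


n = 1024
# Nobody exits early.
no_early_out = [FULL] * n
# Every other element exits early: each group still contains slow lanes.
half_early = [EARLY if i % 2 == 0 else FULL for i in range(n)]
# Only elements 0 and 512 take the full path: just two groups stay slow.
most_early = [FULL if i % 512 == 0 else EARLY for i in range(n)]

print(total_cost(no_early_out))
print(total_cost(half_early))
print(total_cost(most_early))
```

In this model the half-early case costs exactly as much as having no early-out at all (3200 units each), while the mostly-early case drops to 230 units. Real hardware has other bottlenecks that this ignores, so the actual gain may be smaller.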



#3 menohack   Members   -  Reputation: 216


Posted 20 May 2013 - 05:41 PM

Threads within a group run in SIMD lock-step, meaning that every thread in the group executes the exact same instruction at the same time. Obviously we have if statements where half the threads will go one way and the other half will go another. What the GPU does in that case is execute BOTH branches, with the threads that failed the condition effectively executing nops (no operations). If half the threads are false and go to the else, they execute nops while the other half runs the code in the if block; then they switch roles, and the first half performs the else operations while the other half performs nops.
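
As a rough sketch of that behaviour, the snippet below counts how many instruction slots one lock-step group spends on an if/else. It is a simplified model under stated assumptions: `issued_instructions` is a hypothetical helper, the branch lengths (10 and 40) are invented, and it assumes the hardware skips a branch entirely when no thread in the group takes it.

```python
# Toy model of one lock-step group hitting an if/else.
# then_len / else_len are assumed instruction counts for each branch.


def issued_instructions(conditions, then_len, else_len):
    """Instruction slots the whole group spends on the if/else.

    A branch is issued if at least one lane needs it; lanes that did not
    take that branch sit idle (the "nops") while it runs.
    """
    slots = 0
    if any(conditions):        # at least one lane takes the if block
        slots += then_len
    if not all(conditions):    # at least one lane takes the else block
        slots += else_len
    return slots


uniform_true = [True] * 32               # every lane takes the if block
uniform_false = [False] * 32             # every lane takes the else block
diverged = [i < 16 for i in range(32)]   # half the lanes take each branch

print(issued_instructions(uniform_true, 10, 40))
print(issued_instructions(uniform_false, 10, 40))
print(issued_instructions(diverged, 10, 40))
```

The uniform groups pay for only one branch (10 or 40 slots), while the diverged group pays for both (50 slots), which is the cost of the "switch roles" behaviour described above.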

 

Now imagine you have 999 threads that exit early and 1 that does not. If they execute in lock-step together, all 1000 threads have to wait for the longest path!

 

NVIDIA GPUs break the problem into threadgroups which execute together. If you can get the early terminators to be in the same threadgroup then the computation will actually speed up. If you can't then there is no performance increase.
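
To illustrate why grouping matters, the sketch below compares the same workload in two orders: early-exiting elements interleaved with slow ones, and early-exiting elements made contiguous. It is a toy model, not a benchmark: the group width of 32, the costs of 1 and 100 units, and the names `GROUP_SIZE`, `EARLY`, `FULL`, and `total_cost` are all assumptions, and how you would actually reorder a particle buffer is not something this thread specifies.

```python
# Toy model: a group is cheap only if every lane in it exits early.
# GROUP_SIZE, EARLY and FULL are assumed values for illustration only.
GROUP_SIZE = 32
EARLY, FULL = 1, 100


def total_cost(exits_early, group_size=GROUP_SIZE):
    """Charge each consecutive group EARLY if all lanes exit early, else FULL."""
    cost = 0
    for start in range(0, len(exits_early), group_size):
        group = exits_early[start:start + group_size]
        cost += EARLY if all(group) else FULL
    return cost


# Half of the 1024 elements exit early, alternating with slow ones.
interleaved = [i % 2 == 0 for i in range(1024)]
# Same elements, reordered so that all early exits come first.
partitioned = sorted(interleaved, reverse=True)

print(total_cost(interleaved))
print(total_cost(partitioned))
```

With the interleaved order every group contains a slow lane, so the model charges 3200 units; with the partitioned order half the groups become cheap and the total falls to 1616 units, even though the amount of real work is identical.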


Edited by menohack, 20 May 2013 - 05:42 PM.







