
Early return in a GPU shader does not speed up the process?


I am working on a particle system in the geometry shader, and I thought that adding a check at the beginning with an early return would speed up the whole project.

I found that it made no speed improvement.

I even tried a modulo test so that the shader returns immediately for every particle except every 1000th one; it still made no difference.

What can I do so that an early return actually speeds up the process?


This is fairly standard: within a processing group, every execution of a shader takes as long as the longest invocation in that group. The exact size and properties of the groups vary with hardware and shader type. Essentially, an early-out doesn't help at all unless many (ideally most) of the elements using that shader hit it. Even then the difference may be trivial or undetectable due to other factors. I'm sure someone will be happy to chime in on the hardware concepts that underlie this behavior, and I'll see if I can find a useful paper or two.

All of this means that the way to optimize shaders is to shorten the longest shader execution, because everyone else is bottlenecked on the slow guy.
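To put rough numbers on "everyone waits for the slow guy", here is a toy cost model in plain Python (not shader code). The group size of 32 and the instruction counts are assumptions picked for illustration, and `group_costs` and `total_cost` are hypothetical helper names; real hardware differs in the details.

```python
# Toy cost model: a group of invocations finishes only when its slowest
# member does. All constants below are assumed values, not measurements.
GROUP_SIZE = 32       # assumed group width; real hardware varies
EARLY_OUT_COST = 2    # assumed cost of the check plus the early return
FULL_COST = 100       # assumed cost of the full shader body


def group_costs(per_invocation_costs, group_size=GROUP_SIZE):
    """Return the cost of each group: the maximum cost among its members."""
    return [
        max(per_invocation_costs[i:i + group_size])
        for i in range(0, len(per_invocation_costs), group_size)
    ]


def total_cost(per_invocation_costs, group_size=GROUP_SIZE):
    """Sum the per-group costs over the whole dispatch."""
    return sum(group_costs(per_invocation_costs, group_size))


# Exactly one invocation per group does the full work; the rest exit early.
one_slow_per_group = [
    FULL_COST if i % GROUP_SIZE == 0 else EARLY_OUT_COST for i in range(1024)
]
no_early_out = [FULL_COST] * 1024      # nobody exits early
all_early_out = [EARLY_OUT_COST] * 1024  # everybody exits early

print(total_cost(one_slow_per_group))  # 3200
print(total_cost(no_early_out))        # 3200
print(total_cost(all_early_out))       # 64
```

In this model, a single slow invocation per group wipes out the benefit completely; the early-out only pays off when an entire group takes it.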


Threads run in SIMD lock-step, meaning that every thread in a group has to execute the exact same instruction at the same time. Obviously we still have if statements where some of the threads go one way and the rest go the other. What the GPU effectively does in that case is execute BOTH branches and substitute a nop (no operation) for the threads whose condition failed. If half the threads are false and go to the else, they execute nops while the other half runs the code in the if block; then they switch roles, and the first half performs the else operations while the other half executes nops.
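The lock-step behaviour described above can be sketched as a small Python simulation. This is only a model: `run_lockstep` is a hypothetical helper, the instruction counts are made up, and it assumes the hardware skips a branch that no thread in the group takes.

```python
def run_lockstep(conditions, if_ops, else_ops):
    """Simulate one lock-step group executing an if/else.

    conditions: one bool per thread (True = thread takes the if branch).
    if_ops / else_ops: number of instructions in each branch.
    Returns (instruction slots the group spends, idle "nop" slots summed
    over all threads).
    """
    taken = sum(conditions)
    not_taken = len(conditions) - taken
    issued = 0
    idle = 0
    if taken:
        # The group steps through the if block; threads that failed the
        # condition sit idle for every instruction in it.
        issued += if_ops
        idle += if_ops * not_taken
    if not_taken:
        # The roles swap for the else block.
        issued += else_ops
        idle += else_ops * taken
    return issued, idle


# Divergent group: half the threads take each side, so both branches run.
print(run_lockstep([True] * 4 + [False] * 4, if_ops=10, else_ops=6))  # (16, 64)

# Uniform group: every thread agrees, so only one branch runs.
print(run_lockstep([True] * 8, if_ops=10, else_ops=6))  # (10, 0)
```

An early return is just the case where one branch is almost empty: the threads that returned still occupy their lanes and idle until the rest of the group finishes.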

 

Now imagine you have 1000 threads running together, where 999 exit early and 1 does not. All 1000 threads have to wait for the longest path!

 

NVIDIA GPUs break the work into groups of threads that execute together (warps, 32 threads each). If you can get the early terminators to land in the same group, that group finishes early and the computation will actually speed up. If you can't, there is no performance increase.
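As a rough illustration of why grouping the early terminators matters, the sketch below compares the same particles in two orders using a toy cost model in plain Python. The group size of 32 follows the warp size mentioned above; the costs and the one-in-four ratio of live particles are assumptions, and `total_cost` is a hypothetical helper. How you would actually keep live particles contiguous in the buffer on the GPU is outside this sketch.

```python
GROUP_SIZE = 32   # warp-sized groups
EARLY_COST = 2    # assumed cost of a particle that returns early
FULL_COST = 100   # assumed cost of a particle that does the full work


def total_cost(costs, group_size=GROUP_SIZE):
    """Each group costs as much as its slowest member; sum over all groups."""
    return sum(
        max(costs[i:i + group_size]) for i in range(0, len(costs), group_size)
    )


# 1024 particles, every 4th one does the full work, interleaved with the
# early-out particles so that every group contains some slow members.
interleaved = [FULL_COST if i % 4 == 0 else EARLY_COST for i in range(1024)]

# The same particles, reordered so the slow ones sit next to each other.
compacted = sorted(interleaved, reverse=True)

print(total_cost(interleaved))  # 3200
print(total_cost(compacted))    # 848
```

The amount of real work is identical in both orders; only the layout changes. In the compacted order, 24 of the 32 groups contain nothing but early-out particles and finish almost immediately.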

Edited by menohack
