Daban

Dynamic branching on the GPU


Can you explain why dynamic branching on a GPU is considered an expensive operation? How many instructions (or by some other measure) does it usually cost? Thanks.


The last time I had some info on this (it's been a while), the issue was that a group of computing units runs in lockstep (essentially the same instruction pointer across all of them at any one time). When the units in a group disagree on a branch, all of them effectively have to step through both sides of it, even though the results are discarded (by disabling memory loads and stores) on the units that did not take that side.
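
To make the lockstep idea concrete, here is a minimal sketch in plain Python of how a group might handle an if/else. It is a toy model, not how any particular GPU is implemented: the function name `run_branch_lockstep` and the `passes` counter are made up for illustration, and the "mask" is just a list of booleans.

```python
def run_branch_lockstep(values, cond, then_fn, else_fn):
    """Toy model: one group of lanes sharing a single instruction pointer."""
    mask = [cond(v) for v in values]   # per-lane branch condition
    results = list(values)
    passes = 0                         # how many branch sides the group stepped through

    if any(mask):                      # at least one lane wants the "then" side
        passes += 1
        for i, v in enumerate(values):
            out = then_fn(v)           # every lane steps through the instructions...
            if mask[i]:                # ...but only the active lanes keep the result
                results[i] = out

    if not all(mask):                  # at least one lane wants the "else" side
        passes += 1
        for i, v in enumerate(values):
            out = else_fn(v)
            if not mask[i]:
                results[i] = out

    return results, passes


def is_even(v):
    return v % 2 == 0


# All lanes agree: only one side of the branch is executed.
uniform = run_branch_lockstep([2, 4, 6, 8], is_even, lambda v: v * 10, lambda v: -v)
# Lanes disagree: the group steps through both sides.
mixed = run_branch_lockstep([1, 2, 3, 4], is_even, lambda v: v * 10, lambda v: -v)

print(uniform)  # ([20, 40, 60, 80], 1)
print(mixed)    # ([-1, 20, -3, 40], 2)
```

In this model the per-lane results are always correct; what changes is how many sides of the branch the whole group has to pay for.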

 

In addition, if the branch depends on external non-constant input (like a texture value or an input primitive value) and determines, for example, a texture sampling position, subsequent memory accesses become hard to predict and texture caching may stop being effective.

 

There are surely more reasons than the ones mentioned here, but these are off the top of my head.

 

It is possible that newer hardware has some smart workarounds for these issues.

Edited by Nik02


It is possible that newer hardware has some smart workarounds for these issues.

Nope. That has hardly changed.

The cost of a branch is usually not measured in cycles but in terms of something called "divergence" or "input coherence".

GPUs work in parallel, in lockstep as has been said. Threads are grouped, launched together, and must execute the same instructions.

When one thread inside that group needs to take a different branch than the rest of the threads in its group, we say that we have "divergence".
The more divergence you have, the bigger the negative impact of branches, because every thread in a diverging group must step through all of the branches taken, and only afterwards are the wrong results discarded (masked out).

When all threads in one group follow one branch while all threads in another group follow a different branch, all is well. No divergence happens, and we say that the input is coherent, homogeneous, or that it follows a nice pattern.
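
As a rough illustration of what "coherent" means here, the sketch below counts how many groups contain threads that disagree on a branch condition. The helper `count_divergent_groups` and the group size of 4 are assumptions chosen to keep the example small; real hardware groups are commonly 32 or 64 threads wide.

```python
def count_divergent_groups(conds, group_size):
    """Count groups whose threads disagree on the branch condition."""
    divergent = 0
    for start in range(0, len(conds), group_size):
        group = conds[start:start + group_size]
        if any(group) and not all(group):  # some threads say yes, some say no
            divergent += 1
    return divergent


GROUP_SIZE = 4  # hypothetical, small on purpose

# Same number of True/False threads in both inputs, only the layout differs.
coherent = [True] * 8 + [False] * 8   # neighbors agree
incoherent = [True, False] * 8        # neighbors alternate

coherent_count = count_divergent_groups(coherent, GROUP_SIZE)
incoherent_count = count_divergent_groups(incoherent, GROUP_SIZE)

print(coherent_count)    # 0
print(incoherent_count)  # 4
```

Both inputs send exactly half of the threads down each side. Only the arrangement decides whether any group diverges.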

Of course, even if the data is coherent, some divergence may still happen in some groups. The key is whether the data is coherent enough that the performance gained by skipping work in most of the groups outweighs the performance lost in the groups that ended up diverging.
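
The trade-off can be sketched with a toy cost model. Everything here is an assumption for illustration: the cost numbers are invented, `branch_overhead` stands in for whatever fixed price a real branch carries, and the "flattened" alternative simply means every group always runs both sides and selects a result.

```python
def branched_cost(conds, group_size, cost_then, cost_else, branch_overhead):
    """Each group pays the overhead plus every side at least one of its threads takes."""
    total = 0
    for start in range(0, len(conds), group_size):
        group = conds[start:start + group_size]
        total += branch_overhead
        if any(group):
            total += cost_then
        if not all(group):
            total += cost_else
    return total


def flattened_cost(conds, group_size, cost_then, cost_else):
    """Every group runs both sides unconditionally, with no branch at all."""
    n_groups = -(-len(conds) // group_size)  # ceiling division
    return n_groups * (cost_then + cost_else)


GROUP_SIZE = 4                      # hypothetical group width
COSTS = dict(cost_then=100, cost_else=1)  # made-up units: expensive side vs cheap side

# 32 threads: six groups skip the expensive side, one takes it, one diverges.
mostly_coherent = [False] * 24 + [True] * 4 + [True, False, True, False]
# 32 threads: one thread in every group wants the expensive side.
incoherent = [True, False, False, False] * 8

flat = flattened_cost(mostly_coherent, GROUP_SIZE, **COSTS)
good = branched_cost(mostly_coherent, GROUP_SIZE, branch_overhead=10, **COSTS)
bad = branched_cost(incoherent, GROUP_SIZE, branch_overhead=10, **COSTS)

print(flat)  # 808
print(good)  # 287
print(bad)   # 888
```

With these invented numbers the branch is a clear win on the mostly coherent input and a small loss on the incoherent one, which is the balance described above.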
