dynamic branching on the GPU

3 comments, last by Matias Goldberg 9 years, 6 months ago

Can you explain why dynamic branching on the GPU is considered an expensive operation? How much does it typically cost, in instructions or by some other measure? Thanks.


I found some other threads similar to my question, so please ignore this one. Sorry.

Post links to those threads for googlers/bingers/search results

The last time I looked into this (it's been a while), the issue was that a group of compute units runs in lockstep (essentially sharing the same instruction pointer at any one time), so all of the compute units in a given group effectively have to execute both sides of each branch, even though the results are discarded (by disabling memory loads and stores) on the units that didn't take a given side. A sketch of this is below.
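A minimal CUDA sketch of such a branch (the kernel and its names are mine, purely for illustration): the condition flips between even and odd lanes of the same warp, so the warp executes both paths and masks out the inactive lanes on each one.

```
// Illustrative CUDA kernel: lanes of one 32-wide warp take different paths,
// so the hardware serializes BOTH paths over the whole warp and masks out
// the writes from whichever lanes are inactive on each path.
__global__ void divergentKernel(float* out, const float* in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (threadIdx.x % 2 == 0)   // even lanes take this path...
        out[i] = in[i] * 2.0f;
    else                        // ...odd lanes take this one: the warp
        out[i] = in[i] + 1.0f;  // pays for both branches
}
```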

In addition, if a branch depends on external, non-constant input (like a texture value or a per-primitive value) and determines, for example, a texture sampling position, subsequent memory accesses become impossible to predict, so texture caching may cease to be effective (see the sketch below).
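Here is a hedged CUDA sketch of that access pattern (lookup and table are hypothetical names, not from this thread): the address of the second read depends on the value of the first, so neighbouring threads can end up touching scattered cache lines.

```
// Data-dependent "gather": the second load's address comes from memory,
// so the access pattern can't be predicted and cache hit rates may drop.
__global__ void dependentFetch(float* out, const int* lookup,
                               const float* table, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int j = lookup[i];  // value read from memory...
    out[i] = table[j];  // ...determines the next address
}
```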

There are surely more reasons than the ones mentioned here, but these are off the top of my head.

It is possible that newer hardware has some smart workarounds for these issues.

Niko Suni

It is possible that newer hardware has some smart workarounds for these issues.

Nope. That has hardly changed.

The cost of branches is commonly analyzed not in cycles but in terms of something called "divergence" or "input coherence".

GPUs work in parallel, in lockstep, as has been said. Threads are grouped, launched together, and must execute the same instructions.

When one thread inside that group needs to take a different branch from the rest of the threads in its group, we say we have a "divergence".
The more divergence you have, the bigger the negative impact of branching: all the branches must be executed by all threads, and only afterwards are the wrong results discarded (masked out).

When all threads in one group follow one branch while another group follows a different branch, all is well. No divergence happens, and we say that the input is coherent, homogeneous, or that it follows a nice pattern.

Of course, even if the data is coherent, some divergence may still happen in some groups. The key is whether the data is coherent enough that the performance improvement from skipping work in most of the groups outweighs the performance drop caused by the groups that ended up diverging. The two cases are contrasted in the sketch below.
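The contrast might look like this in CUDA (a sketch of mine, not from the thread): the first branch keys on the lane index, so every warp diverges; the second keys on the block index, which is uniform across each warp, so every warp takes exactly one path and genuinely skips the other.

```
// Illustrative contrast: divergent vs. coherent branch conditions.
__global__ void coherentVsDivergent(float* out, const float* in)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Divergent: the condition flips every lane, so each warp executes
    // both paths and masks the inactive lanes. Worst case.
    if (threadIdx.x % 2 == 0) out[i] = in[i] * 2.0f;
    else                      out[i] = in[i] + 1.0f;

    // Coherent: the condition is uniform per block (hence per warp), so
    // each warp takes a single path and truly skips the other one.
    if (blockIdx.x % 2 == 0)  out[i] *= 0.5f;
    else                      out[i] -= 1.0f;
}
```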

