Nope. That has hardly changed.
It is possible that newer hardware has some smart workarounds for these issues.
The cost of branches is commonly analyzed not in cycles but in terms of something called "divergence" or "input coherence".
GPUs work in parallel and, as has been said, in lockstep: threads are grouped, launched together, and must execute the same instructions.
When a thread in a group needs to take a different branch than the rest of the threads in that group, we say we have a "divergence".
The more divergence you have, the bigger the negative impact of branches, because all the branches must be executed by all the threads in the group, only for the wrong results to be discarded (masked out) later.
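To make the masking idea concrete, here is a minimal sketch in plain Python (not real GPU code) that simulates one group stepping through an if/else in lockstep. The names `run_group`, `is_even`, `halve` and `triple_plus_one` are made up for illustration, and the group size of 4 is chosen only to keep the example short.

```python
def run_group(inputs, cond, then_fn, else_fn):
    """Simulate one group of threads executing an if/else in lockstep.

    Returns (results, passes), where passes is the number of branch bodies
    the whole group had to step through.
    """
    mask = [cond(x) for x in inputs]  # which threads want the "then" side
    results = [None] * len(inputs)
    passes = 0

    if any(mask):  # at least one thread needs the "then" body
        passes += 1
        for i, x in enumerate(inputs):
            value = then_fn(x)      # every thread steps through the body...
            if mask[i]:
                results[i] = value  # ...but only active threads keep the result

    if not all(mask):  # at least one thread needs the "else" body
        passes += 1
        for i, x in enumerate(inputs):
            value = else_fn(x)
            if not mask[i]:
                results[i] = value

    return results, passes


# Hypothetical per-thread work, chosen only for the example.
is_even = lambda x: x % 2 == 0
halve = lambda x: x // 2
triple_plus_one = lambda x: 3 * x + 1

uniform_results, uniform_passes = run_group([2, 4, 6, 8], is_even, halve, triple_plus_one)
mixed_results, mixed_passes = run_group([2, 4, 6, 7], is_even, halve, triple_plus_one)

print(uniform_results, uniform_passes)  # [1, 2, 3, 4] 1
print(mixed_results, mixed_passes)      # [1, 2, 3, 22] 2
```

A single odd input is enough to make the whole group pay for both bodies, even though three of the four threads throw away what they computed in the second pass.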
When all threads in one group follow one branch while another group follows a different branch, all is well. No divergence happens, and we say that the input is coherent, homogeneous, or that it follows a nice pattern.
Of course, even if the data is coherent, some divergence may still happen in some groups. The key is whether the data is coherent enough that the performance gained by skipping work in most groups outweighs the performance lost in the groups that end up diverging.
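That trade-off can be put into rough numbers with a toy cost model, again in plain Python rather than real GPU code. Everything here is an assumption made for illustration: a group size of 32, an expensive path costing 10 units, a cheap path costing 1 unit, and no cost for the branch instruction itself. A group pays for every path that at least one of its threads takes.

```python
import random

GROUP_SIZE = 32       # assumed group width
EXPENSIVE_COST = 10   # assumed cost of the path that does the real work
CHEAP_COST = 1        # assumed cost of the path that skips the work


def branched_cost(flags, group_size=GROUP_SIZE):
    """Total cost when each thread branches on its own flag.

    flags[i] is True when thread i needs the expensive path.
    """
    total = 0
    for start in range(0, len(flags), group_size):
        group = flags[start:start + group_size]
        if any(group):       # someone in the group needs the expensive path
            total += EXPENSIVE_COST
        if not all(group):   # someone in the group takes the cheap path
            total += CHEAP_COST
    return total


num_threads = GROUP_SIZE * 64  # 64 groups

# Baseline without the branch: every group always runs the expensive path.
baseline_cost = (num_threads // GROUP_SIZE) * EXPENSIVE_COST

# Coherent input: the threads needing the expensive path are contiguous.
coherent = [i < num_threads // 2 for i in range(num_threads)]

# Incoherent input: same number of expensive threads, scattered at random.
incoherent = coherent[:]
random.Random(0).shuffle(incoherent)

coherent_cost = branched_cost(coherent)
incoherent_cost = branched_cost(incoherent)

print(baseline_cost, coherent_cost, incoherent_cost)
```

With these made-up numbers the coherent input costs 352 units against a baseline of 640, so the branch pays off. The shuffled input does the same amount of useful work but ends up costing more than the baseline, because nearly every group diverges and pays for both paths.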