Your second code can be rewritten as:
dir = mix(mix(sNormalNext, sNormalPrev, order.x), dir * order.x / sinA, step(0.2, abs(sinA))); // assuming order.x is exactly 1.0 or 0.0
// if it is not, then my code is incorrect, and so is yours
The compiler will probably optimize your code into something similar anyway. Written this way, though, it is much clearer that your two options are pretty similar.
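To make the comparison concrete, here is a rough sketch of the two forms side by side. The branching version is my reconstruction from the mix() expression above, not necessarily your exact code. The function names are made up for illustration, and I am assuming dir and the normals are vec2 and that order.x is exactly 0.0 or 1.0.

```glsl
// Hypothetical helper: branching form, reconstructed from the mix() expression.
vec2 dirBranching(vec2 dir, vec2 sNormalPrev, vec2 sNormalNext, float orderX, float sinA)
{
    if (abs(sinA) < 0.2) {
        // Small |sinA|: pick one of the two normals based on order.x.
        if (orderX == 1.0) {
            return sNormalPrev;
        }
        return sNormalNext;
    }
    // Otherwise scale the existing direction.
    return dir * orderX / sinA;
}

// Hypothetical helper: branchless form. Both candidates are always computed.
vec2 dirBranchless(vec2 dir, vec2 sNormalPrev, vec2 sNormalNext, float orderX, float sinA)
{
    // mix(a, b, 0.0) == a and mix(a, b, 1.0) == b, so order.x acts as a selector.
    vec2 smallAngle = mix(sNormalNext, sNormalPrev, orderX);
    vec2 general = dir * orderX / sinA;
    // step(0.2, x) is 0.0 when x < 0.2 and 1.0 otherwise.
    return mix(smallAngle, general, step(0.2, abs(sinA)));
}
```

One difference to keep in mind: the branchless form always evaluates dir * orderX / sinA, so with sinA exactly 0.0 the division can produce Inf or NaN, and mix() may propagate that into the result even when step() returns 0.0.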
There are two things to consider:
1. Control divergence. A GPU core is a SIMD processor, and shaders are executed in packages (aka warps/wavefronts). If execution follows different branches within one package, both branches end up being executed (plus some overhead for masking threads and evaluating the conditions). The first option will only be faster if there is no control divergence inside a package. That depends on:
   - sinA: is it uniform? If so, there will be no divergence for the first 'if'.
   - order.x: it is a vertex attribute, so it can differ within a package. I'd replace the second 'if' with mix(sNormalNext, sNormalPrev, order.x).
2. The execution latency of either version might be hidden by memory operations. If the subsequent code contains a texture read, then when a package reaches the texture read instruction, the GPU core switches to another package and executes its code while the texture unit fetches data from memory for the first one. So it is possible that there is no point in optimizing this code at all.