GLSL optimization. What is faster?

Started by
3 comments, last by Mihumihu 8 years ago

I'm using OpenGL ES and have two ways of calculating the "dir" vector. Which one is faster?


attribute vec2 order;

code1:


  if( abs(sinA) < 0.2 ) {
    if(order.x == 1.0){
        dir = sNormalPrev;   
    } else {
        dir = sNormalNext;   
    }
  } else {
    dir *= order.x / sinA;
  }

code2:


float k = step(0.2, abs(sinA));
dir = k * dir * order.x / sinA - (k-1.0) * (step(1.0, order.x + 1.0) * sNormalPrev + step(1.0, -order.x + 1.0) * sNormalNext);


genType step(genType edge, genType x);

Parameters

edge: Specifies the location of the edge of the step function.

x: Specifies the value to be used to generate the step function.

For element i of the return value, 0.0 is returned if x[i] < edge[i], and 1.0 is returned otherwise.
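As a small illustration of that definition, here is a sketch of how step() can act as a 0/1 selector between two values. The function name selectByThreshold and its parameters are made up for this example and do not come from the thread.

```glsl
// Hypothetical helper, not from the thread: returns `below` when
// abs(value) < threshold and `above` otherwise, without an if statement.
vec2 selectByThreshold(float value, float threshold, vec2 below, vec2 above)
{
    // step(edge, x) is 0.0 when x < edge and 1.0 otherwise.
    float k = step(threshold, abs(value));
    // For finite inputs, mix(a, b, k) returns a for k == 0.0 and b for k == 1.0.
    return mix(below, above, k);
}
```

Note that both `below` and `above` are always evaluated; only the final selection is arithmetic.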

Now consider this:

template <class T> T Max(T a, T b) { return (a > b ? a : b); }

That is an if as well. Whether it is faster than a normal if is something we could debate, but if we dig into it we end up with timings for each GPU.

The second version seems to have more instructions than the first. I only converted it to asm on the fly, so I am not 100% sure, but it looks that way: the FPU will do more work, and the second version has three ifs where the first has only two. We could argue about the optimization that GLSL goes through and conclude that the calculations (* and /) are faster, but the first version does not always execute the division. Someone more experienced should reply to that, but I think that for a small scene without much data to process the second should be faster than the first, whereas the first is faster for complicated scenes, though not always. To really answer the question you would need a specialist who knows how GLSL optimization is done, and the result can vary between GPUs/CPUs. And a mystery for 2016: if/else wasn't invented to ruin performance.

Your second code can be rewritten as:


dir = mix(mix(sNormalNext, sNormalPrev, order.x), dir * order.x / sinA, step(0.2, abs(sinA))); // assuming order.x is either 1.0 or 0.0
// if it is not, then my code is incorrect, as is yours

The compiler will probably optimize your code into something similar on its own. Either way, it is now much clearer that your two options are pretty similar.
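For anyone who wants to try the branchless form, here is a minimal sketch of it wrapped in a function. Several things in it are assumptions rather than facts from the thread: the name blendDir, the vec2 type of the vectors, and the convention that order.x is +1.0 or -1.0 (which the step() terms in code2 seem to imply). If order.x is really 0.0 or 1.0, pass it to the inner mix() directly instead of remapping it. The sketch also guards the divisor, because mix() evaluates both of its inputs and, on hardware that follows IEEE rules, a division by a zero sinA could leak Inf or NaN into the result even when its weight is 0.0.

```glsl
// Sketch only, not the poster's exact code.
// Assumes vec2 vectors and orderX equal to +1.0 or -1.0.
vec2 blendDir(vec2 dir, vec2 sNormalPrev, vec2 sNormalNext, float sinA, float orderX)
{
    // k is 0.0 when abs(sinA) < 0.2 and 1.0 otherwise.
    float k = step(0.2, abs(sinA));

    // Added assumption: divide by 1.0 instead of sinA when k == 0.0,
    // so a zero sinA cannot produce Inf/NaN in the unused term.
    float safeSinA = mix(1.0, sinA, k);

    // Remap -1.0/+1.0 to 0.0/1.0 so that +1.0 selects sNormalPrev.
    float t = orderX * 0.5 + 0.5;
    vec2 fallback = mix(sNormalNext, sNormalPrev, t);

    vec2 scaled = dir * (orderX / safeSinA);
    return mix(fallback, scaled, k);
}
```

In a vertex shader it could be called as dir = blendDir(dir, sNormalPrev, sNormalNext, sinA, order.x); whether it beats the branching version still has to be measured on the target GPU.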

There are two things to consider:

1. Control divergence. A GPU core is a SIMD processor, and shaders are executed in packages (aka warps/wavefronts). If execution follows different branches within one package, both branches end up being executed (plus some overhead for masking and for the conditional statements). The first option will only be faster if there is no control divergence inside a package, which depends on:

sinA - is it uniform? In that case there will be no divergence for the first 'if'.

order.x - it is a vertex attribute, so I'd replace the second 'if' with mix(sNormalNext, sNormalPrev, order.x).

2. The execution latency of both versions might be hidden by memory operations. If the subsequent code contains a texture read, then when a package reaches the texture read instruction, the GPU core will switch to another package, which executes its code while the texture unit is fetching data from memory for the first package. It is possible that there is no point in optimizing this code at all.
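The suggestion in point 1 (keep the outer 'if', replace the inner one with mix()) could look roughly like the sketch below. The function name branchOnAngle, the vec2 types, and the +1.0/-1.0 convention for order.x are assumptions made for this example. With a 0.0/1.0 convention, order.x can be passed to mix() directly, as written in point 1.

```glsl
// Sketch only. Keeps the outer branch and removes the inner one.
// Assumes vec2 vectors and orderX equal to +1.0 or -1.0.
vec2 branchOnAngle(vec2 dir, vec2 sNormalPrev, vec2 sNormalNext, float sinA, float orderX)
{
    if (abs(sinA) < 0.2) {
        // Per-vertex selection done arithmetically:
        // orderX == +1.0 picks sNormalPrev, orderX == -1.0 picks sNormalNext.
        float t = orderX * 0.5 + 0.5;
        return mix(sNormalNext, sNormalPrev, t);
    }
    // The division only runs when abs(sinA) >= 0.2, so it never divides by zero.
    return dir * (orderX / sinA);
}
```

The remaining branch only costs extra when vertices in the same package disagree about abs(sinA) < 0.2, which is the divergence case described in point 1.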

As you can see, it's a complex thing. You need to adapt this to your own needs first, and then we can talk about optimization. So if/else is the best thing you can use, as long as you don't use many else conditions.

Yes! Thank you very much WiredCat and Alex! I think I understand. Thank you!

