[SOLVED] Lerp then Add, or Add then Lerp?

4 comments, last by n00body 7 years, 9 months ago

Background

I'm trying to smoothly blend away an emission effect from my shader's total emission output by using the lerp() intrinsic.

Questions

  1. Is it cheaper or equal cost to do "a += lerp(0, b, mask);" or "a = lerp(a, a + b, mask);"?
  2. Is the answer the same when replacing "+" with "*", "+=" with "*=", and "0" with "1"?
  3. Will either optimize down to fewer instructions if the mask is set to a constant of 1?

[Hardware:] Falcon Northwest Tiki, Windows 7, Nvidia Geforce GTX 970

[Websites:] Development Blog | LinkedIn
[Unity3D :] Alloy Physical Shader Framework


a += lerp(0, b, mask);

This is a += b*mask, or a = b*mask + a, which is a single multiply-add (MAD) instruction ;)
You'd be better off writing it in this form so that the compiler has the best chance of realizing that it's got a MAD on its hands here, instead of hoping the optimizer discovers it.
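
The identity above can be checked on the CPU. What follows is a minimal sketch in C++ rather than HLSL, so it can be compiled without a GPU toolchain. The helper hlsl_lerp is an assumed stand-in for the shader intrinsic, and the other function names are made up for illustration.

```cpp
#include <cmath>

// Assumed stand-in for the HLSL lerp() intrinsic: a*(1 - t) + b*t.
inline float hlsl_lerp(float a, float b, float t) {
    return a * (1.0f - t) + b * t;
}

// Form from the question: a = lerp(a, a + b, mask).
inline float add_then_lerp(float a, float b, float mask) {
    return hlsl_lerp(a, a + b, mask);
}

// Form from the question: a += lerp(0, b, mask).
inline float lerp_then_add(float a, float b, float mask) {
    return a + hlsl_lerp(0.0f, b, mask);
}

// Simplified form from the reply: a = b*mask + a (one multiply-add).
inline float mad_form(float a, float b, float mask) {
    return b * mask + a;
}

// True when x and y agree to within a small absolute tolerance.
inline bool nearly_equal(float x, float y, float tol = 1e-5f) {
    return std::fabs(x - y) <= tol;
}
```

Calling all three functions with the same a, b and mask should give results that agree up to float rounding, which is all the simplification relies on.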

2) No, different rules apply.
e.g. a *= lerp(1, b, mask) is equivalent to a = a*(b*mask + 1 - mask), or a = a*b*mask - a*mask + a, which is quite a different beast to the above.
3) Using constants instead of uniforms can often positively affect code generation.
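
The multiplicative case can be sketched the same way. This is C++ for CPU-side checking only, with hlsl_lerp as an assumed stand-in for the intrinsic and illustrative function names. With mask fixed at 1, all three forms reduce to a*b, which is the kind of folding a compiler can only do when it knows the value at compile time.

```cpp
// Assumed stand-in for the HLSL lerp() intrinsic: a*(1 - t) + b*t.
inline float hlsl_lerp(float a, float b, float t) {
    return a * (1.0f - t) + b * t;
}

// a *= lerp(1, b, mask)
inline float lerp_then_mul(float a, float b, float mask) {
    return a * hlsl_lerp(1.0f, b, mask);
}

// a = lerp(a, a * b, mask)
inline float mul_then_lerp(float a, float b, float mask) {
    return hlsl_lerp(a, a * b, mask);
}

// Expanded form from the reply: a*b*mask - a*mask + a
inline float expanded_form(float a, float b, float mask) {
    return a * b * mask - a * mask + a;
}
```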

Check out:
http://www.humus.name/index.php?page=Articles&ID=6
http://www.humus.name/index.php?page=Articles&ID=9

Doh! Well that's embarrassing. Apparently experience doesn't stop you from forgetting basic math from time to time. >_<

As for Question 2, I guess it won't really matter either way. Thanks for the help.



1) a*(1.f - mask) + (a+b)*mask
vs.
2) a + (0.f*(1.f-mask) + b*mask) or if simplified a + b*mask

The second one is also better if you are concerned about precision, as there are fewer +/- operations (additions and subtractions are the main thing that kills precision, because the exponent parts of both floating-point numbers must be made equal first).

If we're talking floats, 1) cannot be optimized well either: reordering the operations could change the rounding, so without relaxed floating-point settings the compiler has to leave them as written.
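
One way to measure the rounding difference yourself is to evaluate both forms in float and compare each against the same expression evaluated in double. This is a sketch in C++ with illustrative function names, not a statement about what any particular shader compiler does.

```cpp
#include <cmath>

// Form 1 from the post: a*(1 - mask) + (a + b)*mask, evaluated in float.
inline float form1(float a, float b, float mask) {
    return a * (1.0f - mask) + (a + b) * mask;
}

// Form 2 from the post: a + b*mask, evaluated in float.
inline float form2(float a, float b, float mask) {
    return a + b * mask;
}

// The same quantity evaluated in double, used as a higher-precision reference.
inline double reference(float a, float b, float mask) {
    return static_cast<double>(a) +
           static_cast<double>(b) * static_cast<double>(mask);
}

// Absolute error of a float result against the double reference.
inline double abs_error(float result, float a, float b, float mask) {
    return std::fabs(static_cast<double>(result) - reference(a, b, mask));
}
```

Printing abs_error for form1 and form2 over a range of inputs where a is much larger than b shows how much each form loses. The exact numbers depend on the inputs and on whether the compiler contracts the multiply and add into an FMA.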

Although Hodgman's advice is likely the better option, I would point out that there _is_ an advantage to using higher-level primitives in your math: it's clearer to other developers (including future you) what your intent was.

I'd also suspect a good optimizer in fast-math mode to apply the following transformations:

lerp(a, b, t) -> a*(1 - t) + b*t
a += lerp(0, b, t)

a = a + lerp(0, b, t) // operator expansion
a = a + 0*(1 - t) + b*t // inlining
a = a + 0 + b*t // constant expression resolution
a = a + b*t // identity transformation folding
a = b*t + a // reordering add-multiply to multiply-add
a = fma(b, t, a) // fused-multiply-add
... // instruction selection for target machine
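
The start and end points of this chain can be written as two small C++ functions and fed to a compiler to compare the generated assembly. This is a sketch: hlsl_lerp is an assumed stand-in for the intrinsic, the function names are illustrative, and std::fma is the standard library's fused multiply-add.

```cpp
#include <cmath>

// Assumed stand-in for the HLSL lerp() intrinsic: a*(1 - t) + b*t.
inline float hlsl_lerp(float a, float b, float t) {
    return a * (1.0f - t) + b * t;
}

// Start of the chain: a += lerp(0, b, t)
inline float blend_lerp(float a, float b, float t) {
    return a + hlsl_lerp(0.0f, b, t);
}

// End of the chain: a = fma(b, t, a)
inline float blend_fma(float a, float b, float t) {
    return std::fma(b, t, a);
}
```

If the optimizer performs every step listed above, both functions should compile to the same instruction sequence. Without fast-math-style flags the compiler is generally not allowed to drop the 0*(1 - t) term, so the two may differ.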

Sure enough, with GCC 5.1 and Clang 3.2 we see that the optimization does indeed happen (link below) when using -O3 -mavx2 -mfma -ffast-math (I didn't play around with the settings much, so I don't know if you need all that). GCC 5.1 and Clang 3.8 optimize perfectly, while Clang 3.2 - 3.7 select a poorer FMA instruction (they literally translate the add-multiply rather than transforming it into a multiply-add) that requires an extra mov instruction to compensate (Hodgman's suggested simplification _also_ does this, though, as it's still an add-multiply).

Clang 3.0 does not do the optimization, I don't see an option to test 3.1, and I didn't bother with any GCC older than 5.1. The online MSC compiler doesn't let me set the target architecture or see assembly output and I'm too lazy to compile locally to test right now, so I'm unsure how well it does at this test (but I've so far been _extremely_ happy with the quality of optimizations in MSVC 2013+). ICC 13 surprisingly does not ever emit an FMA instruction in my testing, but does optimize down to just two instructions (add and multiply, unsurprisingly).
gcc.godbolt.org test

Compilers are neat.

That said, debug performance _does_ matter in games, at least at the higher end of development, so there's an argument to be made that your code should be as fast as possible even with optimizations off. The selection of trade-offs between optimal-debug and optimal-clarity is a constant battle in game code engineering, unfortunately. :)

Sean Middleditch – Game Systems Engineer – Join my team!

I already said the equation "a += lerp(0, b, mask);" was a mistake as I had briefly forgotten a basic math identity. Then someone replied before I could remove the thread. So while I appreciate the breakdown, it's for an erroneous equation that I don't plan to use.



