n00body

[SOLVED] Lerp then Add, or Add then Lerp?


Background

I'm trying to smoothly blend away an emission effect from my shader's total emission output by using the lerp() intrinsic.

 

Questions

  1. Is it cheaper or equal cost to do "a += lerp(0, b, mask);" or "a = lerp(a, a + b, mask);"?
  2. Is the answer the same when replacing "+" with "*", "+=" with "*=", and "0" with "1"?
  3. Will either optimize down to fewer instructions if the mask is set to a constant of 1?


a += lerp(0, b, mask);

This is a += b*mask, or a = b * mask + a, which is a single multiply-add (MAD) instruction :wink:
You'd be better off writing it in this form so that the compiler has the best chance of realizing that it's got a MAD on its hands, instead of hoping the optimizer discovers it.
 
2) No, different rules apply.
e.g. a *= lerp(1, b, mask) is equivalent to a = a*(b*mask + 1 - mask), or a = a*b*mask - a*mask + a, which is quite a different beast to the above.
3) Using constants instead of uniforms can often positively affect code generation.
 
Check out:
http://www.humus.name/index.php?page=Articles&ID=6
http://www.humus.name/index.php?page=Articles&ID=9


Doh! Well that's embarrassing. Apparently experience doesn't stop you from forgetting basic math from time to time. >_<

 

As for Question 2, I guess it won't really matter either way. Thanks for the help.

Edited by n00body


1) a*(1.f - mask) + (a+b)*mask 
vs.
2) a + (0.f*(1.f-mask) + b*mask) or if simplified a + b*mask 

Form 2) is also better if you are concerned about precision, as it has fewer +/- operations (the main thing that kills precision, because the exponents of the two operands must be made equal before they can be added or subtracted).

 

If we are talking about floats, form 1) cannot be optimized well, because the compiler has to assume that simplifying the expression could change the rounded result.

Edited by imoogiBG

Although Hodgman's advice is likely the better, I would point out that there _is_ an advantage to using higher-level primitives in your math: it's clearer to other developers (including future you) what your intent was.

I'd also suspect a good optimizer in fast-math mode to apply the following transformations:

lerp(a, b, t) -> a*(1 - t) + b*t
a += lerp(0, b, t)

a = a + lerp(0, b, t) // operator expansion
a = a + 0*(1 - t) + b*t // inlining
a = a + 0 + b*t // constant expression resolution
a = a + b*t // identity transformation folding
a = b*t + a // reordering add-multiply to multiply-add
a = fma(b, t, a) // fused-multiply-add
... // instruction selection for target machine
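
The first and last steps of that chain can be written as a short C++ sketch. The names `lerp_hlsl`, `emission_lerp`, and `emission_fma` are hypothetical; `std::fma` is the standard `<cmath>` function. Whether a compiler actually turns the first form into the second depends on its version and flags, as described below.

```cpp
#include <cmath>

// Hypothetical stand-in for lerp(a, b, t) -> a*(1 - t) + b*t.
inline float lerp_hlsl(float x, float y, float t) {
    return x * (1.0f - t) + y * t;
}

// Starting point of the chain: a += lerp(0, b, t)
inline float emission_lerp(float a, float b, float t) {
    return a + lerp_hlsl(0.0f, b, t);
}

// End of the chain, written by hand: a = fma(b, t, a)
inline float emission_fma(float a, float b, float t) {
    return std::fma(b, t, a);
}
```

`std::fma` rounds once instead of twice, so the two functions can differ in the last bit for some inputs even though they are algebraically identical.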

Sure enough, with GCC 5.1 and Clang 3.2 we see that the optimization does happen (link below) when using -O3 -mavx2 -mfma -ffast-math (I didn't play around with the settings much, so I don't know if you need all that). GCC 5.1 and Clang 3.8 optimize perfectly, while Clang 3.2 - 3.7 select a poorer FMA instruction (it literally translates the add-multiply rather than transforming it into a multiply-add) that requires an extra mov instruction to compensate (Hodgman's suggested simplification _also_ does this, though, as it's still an add-multiply).

Clang 3.0 does not do the optimization, I don't see an option to test 3.1, and I didn't bother testing any GCC older than 5.1. The online MSC compiler doesn't let me set the target architecture or see assembly output and I'm too lazy to compile locally to test right now, so I'm unsure how well it does at this test (but I've so far been _extremely_ happy with the quality of optimizations in MSVC 2013+). ICC 13 surprisingly does not ever emit an FMA instruction in my testing, but does optimize down to just two instructions (add and multiply, unsurprisingly).
gcc.godbolt.org test

Compilers are neat.

That said, debug performance _does_ matter in games, at least at the higher end of development, so there's an argument to be made that your code should be as fast as possible even with optimizations off. The selection of trade-offs between optimal-debug and optimal-clarity is a constant battle in game code engineering, unfortunately. :)


I already said the equation "a += lerp(0, b, mask);" was a mistake as I had briefly forgotten a basic math identity. Then someone replied before I could remove the thread. So while I appreciate the breakdown, it's for an erroneous equation that I don't plan to use.
