Sign in to follow this  

Is linear interpolation possible without dividing?

This topic is 3861 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I hate dividing. I really, really hate the expense of '/'. That symbol in the context of an algorithm equates to a curse word to me. How can I perform linear interpolation without paying the penalty of divide? My macro for lerping is the equation that only does one multiplication.
[source ="cpp"]
#define lerpf(fStart,fEnd,fAlpha) (fStart + fAlpha * (fEnd + -fStart))

That is all well and good. This next part is killing me!
[source = "cpp"]
// Normalize the life of the particle... ewwww divide!
float percentLife = p->life.x/p->life.y;

// Establish the time difference of the current keyframe and next keyframe.
float fRange = m_vecSizeSets[p->scaleNdx].frame[nextKeyframe].x - m_vecSizeSets[p->scaleNdx].frame[p->scaleKeyframe].x;
		
// Subtract the start of the current keyframe time from the current total time
float fCurrent = percentLife - m_vecSizeSets[p->scaleNdx].frame[p->scaleKeyframe].x;


// Now find out the percentage that we are in the the current range! another divide.. :(
p->scale = lerpf(m_vecSizeSets[p->scaleNdx].frame[p->scaleKeyframe].w,m_vecSizeSets[p->scaleNdx].frame[nextKeyframe].w,fCurrent/fRange);

Is there a better way to achieve this mathematically as I am going to have to do something very similar to a few other members of the class. :( Brandon

Share this post


Link to post
Share on other sites
If you dont care that much about precision and 'p->life.x' is the current particle life time and 'p->life.y' is your max life you can always precalc.
In this case all you would have to precalc is "1.0/p->life.y" and change 'p->life.x/p->life.y' to 'p->life.x*p->life.oneOverY'.
The same might be possible for you fRange as long as its the difference between keyframe n and n+1. Then just precalc that to.

Then of course you have the option to go with SSE and use it's operations.

Share this post


Link to post
Share on other sites
Quote:
Original post by codingsolo
How can I perform linear interpolation without paying the penalty of divide?
By performing a sequence of convoluted and contrived operations, the final result of which will be difficult to read, difficult to maintain and modify -- and here's the kicker, slower than dividing.

Use the god damn division op.

Share this post


Link to post
Share on other sites
Well, no. There's no way to normalise or to scale between arbitrary values without dividing.

Is there a good reason you're disallowing division, or do you actually believe the outrageous fallacy that division is slow? The x86 (and x64) MUL/DIV, IMUL/IDIV, FMUL/FDIV have been equally fast (or almost as good as) for such a long time. It's true that integer multiplication has the potential to be faster than division, but it also has the potential to be slower. If you're panicking about the order of ten clock cycles in the days of quad-core, hyper-threaded, cached-up-to-the-follicles, highly unpredictable and nonlinear processors, you have it all wrong [rolleyes].

I suggest you let your compiler take care of the micro-optimisation, because it will do a much better job than any human ever could, and spend your efforts optimising things that will actually make a difference [wink].

Admiral

Share this post


Link to post
Share on other sites
Actually divisions are very slow (compared to the other instruction) if nothing has changed in the last few years and I doubt they have. Last time I played around with them was the main problem not so much that they took a few clock cycles. The killer was that they stalled the whole pipeline on the cpu. *ouch*

But I assume codingsolo has profiled his code already and
A) Done any algorithm optimizations he can think of
B) Seen that the division is whats actually killing him and not 'hidden' problems like 'p->life.x/p->life.y' is the first time the particle is accessed so the real problem is that the cpu is waiting to get the particle into cache.

Share this post


Link to post
Share on other sites
These few divisions will not impact your performance at all. The whole out-of-order core destroyed the problems of 'slow' division on PCs over a decade ago.

In 13 years (wow!) of optimizing code, I know that it is never where you expect it to be, even when you really think you know where it is.


Use a profiler to figure out where the *real* bottlenecks are, and fix those. Measure before and after to make sure you improved performance. For example, one drawing API call may be taking a few μs and really slowing your game down, when a similar but adequate function may take just a few ns to complete.

It isn't the division that slows your game down. It is the function with an unintended side effect of internally calling strlen() thousands of times per frame that slow the game down. It is the unexpected use of an API function that requires round-trip data reads from hardware when it could be using cached values. It is an unintended new/free cycle that makes a lot of unnecessary OS work.

Learn to love your profiler, and *never* guess when optimization is concerned.

Share this post


Link to post
Share on other sites
Good words of wisdom but I have put the code under the scrutiny of Vtune and have detected this as a hotspot. I appreciate the profound opinions about the topic, but Xetick was the only one who hit the nail on the head. You other guys are good problem solvers which makes you excellent programmers, but sometimes a simple answer is the wanted solution. I don't want to be doing divides for as many particles that are in each system if I could avoid it through ingenuity of mathematics. I expanded my code for your readability in the snippet as I only do random accesses once and utilize pointers to the data structures. I just wanted to know what I could do with the divides and I got a helpful answer from Xetick. Thank you.

Brandon

Share this post


Link to post
Share on other sites

Wouldn't it be far more efficient to offload the processing to GPU. Just feed the data to fragment shader as a texture and render to another texture.

Share this post


Link to post
Share on other sites
Back to the original question...

You can get rid of one of the divides if you can "predivide" some values.


Here is your lerp after I simplified all the variables and removed the intermediate computations:
    p->scale = Sw + (L - Sx)/(Nx - Sx)( Nw - Sw ) 
This is the same as:
    p->scale = Sw/(Nx - Sx)(Nx - Sx) + (L - Sx)( Nw - Sw )/(Nx - Sx) 
which is the same as
    p->scale = Skk + (L - Sx)( Nk - Sk ) 
where k, Nk, and Sk are precomputed like this:
    k = Nx - Sx
Nk = Nw / k
Sk =Sw / k
I have no idea if you can precompute these values, but if you can it will save one divide.

Edit: fixed a typo

[Edited by - JohnBolton on May 23, 2007 12:18:19 PM]

Share this post


Link to post
Share on other sites
Actually, I've got a small class at home for linear interpolation using a fixed-point algorithm similar to a DDA.

I guess it wouldn't help much if you're restricted to floats, but my class is much faster for my needs.

Share this post


Link to post
Share on other sites

This topic is 3861 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this