Carmack's sqrt()

Again, testing in C# (as it's my current language of choice) with your new algorithm, the results on the set [1,10e5] over 100 iterations are as follows:

Average percent error: 0.096%
Average time difference: 4.79 ms in favor of .NET's sqrt()

-- Exitus Acta Probat --
From my tests I determined that using 0x5f375a84 was slightly less accurate than the original over a range of [1,1000] with increments of 0.5. However, it was only around 3/1000 of a difference, so it doesn't matter much which is used, and over some ranges it was more accurate. What I am interested in is the math behind the calculations you did to arrive at that number.
-YoshiXGXCX '99
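Not the official answer, but for anyone impatient for a write-up, here is a rough sketch of the usual way such a constant can be derived (my own reconstruction; the correction term mu below is an assumed tuning parameter, and different choices of mu and of error metric, especially once the Newton step is included, give slightly different "best" constants):

#include <cstdio>

// Sketch: for a positive float x = 2^e * (1+m), the bit pattern read as an
// integer is I = 2^23 * (e + 127 + m), and since log2(1+m) ~= m + mu for a
// small correction mu, we get I ~= 2^23 * (log2(x) + 127 - mu).
// For y = 1/sqrt(x) we want log2(y) = -log2(x)/2, which in integer terms is
//     I_y ~= (3/2) * 2^23 * (127 - mu) - I_x/2,
// i.e. the code computes magic - (i >> 1) where
//     magic = (3/2) * 2^23 * (127 - mu).
int main()
{
    double mu = 0.0450466;                            // assumed correction term
    double magic = 1.5 * 8388608.0 * (127.0 - mu);    // 8388608 = 2^23
    std::printf("magic ~= 0x%lx\n", (unsigned long)(magic + 0.5));
    // With this mu the result is close to the original 0x5f3759df; tuning mu
    // (or optimizing the error after the Newton iteration) shifts it slightly.
    return 0;
}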
quote:Original post by Anonymous Poster
I figured out how this thing works, and have improved the accuracy. The new code is

/// Chris Lomont version of fast inverse sqrt,
/// based on Newton method, 1 iteration, more accurate
float InvSqrt_Lomont(float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;          // reinterpret the float's bits as an int
    i = 0x5f375a84 - (i>>1);    // hidden initial guess, fast
    x = *(float*)&i;            // back to float: rough 1/sqrt estimate
    x = x*(1.5f-xhalf*x*x);     // one Newton-Raphson iteration
    return x;
} // InvSqrt_Lomont

I changed the const from 0x5f3759df to 0x5f375a84, which reduces the maximum relative error over the entire range of floating point numbers. If anyone else would like to test this, please post anything you find.

I hope to write up an explanation of how this works, and post it to my site (www.math.purdue.edu/~clomont) eventually under papers. The explanation by David Eberly at www.magic-software.com/Documentation/FastInverseSqrt.pdf is not entirely correct, since he ignores a lot of bit errors and shifts.

Let me know if this works for everyone. I am pretty certain about my analysis.

Chris Lomont


Ok, I plugged it into my test program and here's what I got, with the C sqrt() as the 'true value'. The 1st style uses 0x5f3759df and the 2nd style uses 0x5f375a84.

first 500 integers:
Avg. Pct Error for 1st style: 0.0946063
Avg. Difference for 1st style: 0.0142623
Avg. Pct Error for 2nd style: 0.0946675
Avg. Difference for 2nd style: 0.0142718
first 5000 integers:
Avg. Pct Error for 1st style: 0.0971904
Avg. Difference for 1st style: 0.0409933
Avg. Pct Error for 2nd style: 0.0972366
Avg. Difference for 2nd style: 0.0410079
first 20000 integers:
Avg. Pct Error for 1st style: 0.112022
Avg. Difference for 1st style: 0.0893813
Avg. Pct Error for 2nd style: 0.112074
Avg. Difference for 2nd style: 0.0894122
first 100000 integers:
Avg. Pct Error for 1st style: 0.10822
Avg. Difference for 1st style: 0.192221
Avg. Pct Error for 2nd style: 0.108278
Avg. Difference for 2nd style: 0.192317
first 1e+006 integers:
Avg. Pct Error for 1st style: 0.105428
Avg. Difference for 1st style: 0.647431
Avg. Pct Error for 2nd style: 0.105498
Avg. Difference for 2nd style: 0.647852


Looks like there's a difference between the two methods, though these results show the 1st method is only very slightly more accurate. I agree with YoshiN on this one. However, both methods are just as suitable because they're approximations anyway.
quote:Original post by jperalta
Again, testing in C# (as it's my current language of choice) with your new algorithm, the results on the set [1,10e5] over 100 iterations are as follows:

Average percent error: 0.096%
Average time difference: 4.79 ms in favor of .NET's sqrt()

-- Exitus Acta Probat --


It occurs to me that .NET might be exploiting processor features w/o telling you it is, hence the speed of that particular function.

##UnknownPlayer##
quote:Original post by UnknownPlayer

It occurs to me that .NET might be exploiting processor features w/o telling you it is, hence the speed of that particular function.

##UnknownPlayer##


Just goes to show you how 'portable' their languages are.
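For what it's worth, "exploiting processor features" in C/C++ could look something like the sketch below, using the SSE reciprocal square root instruction through compiler intrinsics (x86-specific; rsqrtss is itself only an approximation, on the order of 12 bits of precision, and whether .NET actually does anything like this is pure speculation):

#include <xmmintrin.h>   // SSE intrinsics; needs an SSE-capable x86 CPU/compiler

// Hardware approximation of 1/sqrt(x) via the rsqrtss instruction.
float InvSqrt_SSE(float x)
{
    __m128 v = _mm_set_ss(x);      // put x in the low lane of an SSE register
    __m128 r = _mm_rsqrt_ss(v);    // approximate reciprocal square root
    return _mm_cvtss_f32(r);       // extract the low float
}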
About the accuracy, I tested it over almost ALL floating point numbers. I think the posted function was only tested over numbers close to 1. The test you guys are posting over integers is not complete, since this is probably used a lot to normalize numbers less than one. Here is the function I used to test it, and I still find my version has less relative error over the entire floating point range, as well as a lot of sub ranges.

Please test more, on numbers besides integers, since in practice x may not be integral.

Also, I find that the constant 0x5f375a85 is a tad more accurate over the range than the earlier 0x5f375a84 I posted.

Chris Lomont

#include <cmath>     // sqrt, fabs

/// tests a given function,
/// returns the max relative error, max error, and average relative error
void TestFunc(float (*func)(float), float & relerr, float & maxerr, float & avgerr)
{
    float s0,s1,d1,v;       // inv sqrt values, delta value, test value
    float worstval = 1;     // where the worst rel error occurred
    unsigned int mant, count = 0;
    relerr = maxerr = avgerr = 0;   // reset these
    int bits = 23;          // how many bits to test in mantissa, from 1 to 23
    for (int e = -126; e <= 126; e += 1)            // the exponent values, best kept in -126 to 126 to avoid mistakes
        for (mant = 0; mant < (1UL<<bits); mant++)  // the mantissa values
        {
            unsigned int temp;  // used to construct the floating point number
            unsigned int E;
            E = (127+e);
            temp = (E<<23) | (mant<<(23-bits));
            v = *(float*)&temp;         // construct a float with this exponent and mantissa
            s0 = (float)(1.0/sqrt(v));  // "correct" value
            s1 = func(v);               // the passed-in function's value

            // compute errors
            d1 = (float)fabs(s0-s1);    // delta value, or absolute error
            if (d1 > maxerr)
                maxerr = d1;
            if (d1/s0 > relerr)         // checking the relative error
            {
                relerr = d1/s0;
                worstval = v;
            }
            avgerr += d1/s0;            // accumulate this sample's relative error
            count++;
        }

    // uncomment this to see where the max relative error occurred
    // cout << worstval << " has error " << relerr << endl;

    // finish averages
    avgerr /= count;
} // TestFunc
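In case it helps anyone reproduce the numbers, here is one way the harness above might be driven (my own sketch; InvSqrt_Lomont is the function quoted earlier in this thread):

#include <iostream>

int main()
{
    float relerr, maxerr, avgerr;
    TestFunc(InvSqrt_Lomont, relerr, maxerr, avgerr);
    std::cout << "max relative error: " << relerr << "\n"
              << "max absolute error: " << maxerr << "\n"
              << "avg relative error: " << avgerr << std::endl;
    return 0;
}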
quote:
It occurs to me that .NET might be exploiting processor features w/o telling you it is, hence the speed of that particular function.

##UnknownPlayer##


That occurred to me too; I was just posting the results in case anyone cared.

-- Exitus Acta Probat --
darookie:
Question: Why do you subtract 127 from the first few to calculate the exponent, yet on others you subtract 126?

Thanks
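Not speaking for Chris, but as far as I can tell from the code: 127 is the IEEE 754 single-precision exponent bias (the stored field is E = e + 127), and the loop is kept in -126..126 simply to stay in the normal range, away from denormals (E = 0) and infinities/NaNs (E = 255). A tiny sketch of the same decoding:

#include <cstdio>

int main()
{
    float f = 8.0f;                          // 8.0 = 1.0 * 2^3
    unsigned int bits = *(unsigned int*)&f;  // same type-pun the thread already uses
    int E = (bits >> 23) & 0xFF;             // stored exponent field, here 130
    int e = E - 127;                         // unbiased exponent, here 3
    std::printf("E = %d, e = %d\n", E, e);
    return 0;
}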
quote:
About the accuracy, I tested it over almost ALL floating point numbers. I think the posted function was only tested over numbers close to 1.


Here is the function I used to test it, and I still find my version has less relative error over the entire floating point range, as well as a lot of sub ranges.

Please test more, on numbers besides integers, since in practice x may not be integral.

Also, I find that the constant 0x5f375a85 is a tad more accurate over the range than the earlier 0x5f375a84 I posted.


Ok, I'll be fair. Here are some different results with smaller increments (I would have tested even more cases, but my computer's not exactly fast and I'm a little impatient):
1st style is 0x5f3759df, 2nd is 0x5f375a84, 3rd is 0x5f375a85

For all numbers 1 - 1000000 incremented by 0.0010000000475:
Total number of times: 999998953
Avg. Pct Error for 1st style: 0.0946063678424
Avg. Difference for 1st style: 0.628209096009
Avg. Pct Error for 2nd style: 0.0946706139572
Avg. Difference for 2nd style: 0.62862068985
Avg. Pct Error for 3rd style: 0.0946710035252
Avg. Difference for 3rd style: 0.628623185747


For all numbers 1 - 1000 incremented by 9.99999974738e-005:
Total number of times: 9990001
Avg. Pct Error for 1st style: 0.0963345727045
Avg. Difference for 1st style: 0.0204320324505
Avg. Pct Error for 2nd style: 0.0963950607491
Avg. Difference for 2nd style: 0.0204438955941
Avg. Pct Error for 3rd style: 0.0963954275384
Avg. Difference for 3rd style: 0.0204439675353


For all numbers 1 - 10 incremented by 1.00000001169e-007:
Total number of times: 89999999
Avg. Pct Error for 1st style: 0.0959722277237
Avg. Difference for 1st style: 0.00222551400415
Avg. Pct Error for 2nd style: 0.0960468310797
Avg. Difference for 2nd style: 0.00222733983144
Avg. Pct Error for 3rd style: 0.09604728339
Avg. Difference for 3rd style: 0.00222735090097

Ok, I'll admit there's less of a difference.
I think I know what might be causing the difference between our readings: I used 1.0f/InvSqrt() and you used 1.0f/sqrt() to get the two functions into alignment. It doesn't sound like it should make a major difference, but it's the only thing I can think of.
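To make that concrete, the two measurement set-ups look roughly like this (just a sketch of what I think each of us did, with assumed names; the extra division on each side carries its own rounding, which could plausibly account for small discrepancies):

#include <cmath>
#include <cstdio>

// Compare the two error conventions for one test value v and one of the
// InvSqrt() variants from this thread (names assumed).
void CompareStyles(float v, float (*InvSqrt)(float))
{
    // Style A: invert the approximation, compare against sqrt(v).
    float errA = std::fabs(1.0f / InvSqrt(v) - (float)std::sqrt((double)v));

    // Style B: compare the approximation directly against 1/sqrt(v).
    float errB = std::fabs(InvSqrt(v) - (float)(1.0 / std::sqrt((double)v)));

    std::printf("errA = %g  errB = %g\n", errA, errB);
}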

quote:

The test you guys are posting over integers is not complete, since this is probably used a lot to normalize numbers less than one.



True in theory, but in practice it's not a problem; all three of the InvSqrt functions are more than accurate enough for this. Applications that need high levels of accuracy can use the slower sqrt() function. But we don't care about that, because this stuff is so cool to tinker with.
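For a concrete picture of that use case, normalization with the fast routine looks something like this (a sketch using the InvSqrt_Lomont quoted above; any of the three constants behaves essentially the same here):

// Normalize a 3-vector in place using the fast inverse square root.
void Normalize3(float v[3])
{
    float len2 = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];   // squared length
    float inv  = InvSqrt_Lomont(len2);                // ~ 1/|v|, quoted above
    v[0] *= inv;
    v[1] *= inv;
    v[2] *= inv;
}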

I'll play around with Chris's code later and see if I can figure this whole thing out. If you're interested, I'll clean up my incredibly unreadable code and post it.

Oh, and I forgot to ask: how did you derive those new constants? I'm interested in the math behind it.
I am working on a PDF file explaining the analysis of how I derived the new constant. It will be a bit more technical than most are used to reading, but will explain in gory detail (and CORRECTLY!) how this works (allowing similar things to be derived).

Stay tuned

Chris Lomont

clomont@removemeforsp@m.math.purdue.edu

www.math.purdue.edu/~clomont

