# Carmacks sqrt()

###
#2
Members - Reputation: **142**

Posted 15 February 2003 - 04:26 AM

float InvSqrt (float x)

{

float xhalf = 0.5f*x;

int i = *(int*)&x;

i = 0x5f3759df - (i >> 1);

x = *(float*)&i;

x = x*(1.5f - xhalf*x*x);

return x;

}

I believe it was inverse square root. To test it I used:

int main()

{

cout << "Using Carmacks: " << 1 / InvSqrt(3) << endl;

cout << "math.h: " << sqrt(3) << endl;

}

And it was pretty accurate.

I did a quick (and I believe innacurate, because I used SDL's getticks instead of other means due to inexperience) test using full compiler optimization in Dev-c++ of the time required to do 10,000,000 loops on both math.h's and carmacks.

Carmack: 28ms

math.h: 232ms

Hmm. That's an awfully large difference. Please correct me if I'm wrong.

[edited by - Ronin Magus on February 15, 2003 11:35:18 AM]

###
#3
Members - Reputation: **142**

Posted 15 February 2003 - 04:58 AM

- Mikko Kauppila (whose name doesn't sound cool

[edited by - uutee on February 15, 2003 11:59:23 AM]

###
#6
Members - Reputation: **491**

Posted 15 February 2003 - 05:06 AM

###
#7
Crossbones+ - Reputation: **4956**

Posted 15 February 2003 - 05:16 AM

i tried to do it by hand and i''m getting weird numbers...

float x == 4

float xhalf == 0.5f * 4 == 2

int i == 2

i == 5f3759df (which is 1597463007 in decimal) - (2 >> 1)

== 1597463006

x == 1597463006

x == 1597463006*(1.5f - 2*x*x)

x == some ridiculously long number:

(8153093528352233345939813923)

and the inverse is (1/x): 1.22 e-28.

in other words not 2.

can someone tell me where i went wrong with the calculation?

###
#8
Members - Reputation: **1437**

Posted 15 February 2003 - 05:30 AM

quote:Original post by Alpha_ProgDes

can someone tell me where i went wrong with the calculation?

You are using integers in your calculation.

The function relies on the different representations of integers and floats in memory. This is why there are pointer dereferences instead of simple assignments.

###
#9
Members - Reputation: **211**

Posted 15 February 2003 - 06:18 AM

Then just toss in a (rcp*s) to take a fast reciprocal of that, and you have the square root.

SSE is supported by Pentium 3/4 and Athlon XP, but you might want to check out 3DNOW! stuff for AMD processors because you might happen to find a quicker way of doing it.

###
#10
Members - Reputation: **145**

Posted 15 February 2003 - 06:27 AM

Darookie mentioned something about it, but what are the details behind that method?

Could anyone explain it briefly?

**Click NOW!**

###
#11
Members - Reputation: **1437**

Posted 15 February 2003 - 06:39 AM

On x86 platforms, floats are represented in IEEE format:

Single Precision (32 bits)

The IEEE single precision floating point standard representation requires a 32 bit word, which may be represented as numbered from 0 to 31, left to right. The first bit is the sign bit, S, the next eight bits are the exponent bits, ''E'', and the final 23 bits are the fraction ''F'':

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF

0 1 8 9 31

The value V represented by the word may be determined as follows:

If E=255 and F is nonzero, then V=NaN ("Not a number")

If E=255 and F is zero and S is 1, then V=-Infinity

If E=255 and F is zero and S is 0, then V=Infinity

If 0If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F) These are "unnormalized" values.

If E=0 and F is zero and S is 1, then V=-0

If E=0 and F is zero and S is 0, then V=0

In particular,

0 00000000 00000000000000000000000 = 0

1 00000000 00000000000000000000000 = -0

0 11111111 00000000000000000000000 = Infinity

1 11111111 00000000000000000000000 = -Infinity

0 11111111 00000100000000000000000 = NaN

1 11111111 00100010001001010101010 = NaN

0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2

0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5

1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5

0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)

0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)

0 00000000 00000000000000000000001 = +1 * 2**(-126) *

0.00000000000000000000001 =

2**(-149) (Smallest positive value)

If you write int i = x, the compiler will convert x to the nearest integer that is lower than x, which is not what you want.

Since you need to do some bit shifting and masking tricks that are only available for integers, you need to fool the compiler to have an integer variable that contains the exact bits of x.

This is why pointer dereferencing i used instead of simple assignments.

###
#12
Members - Reputation: **145**

Posted 15 February 2003 - 06:51 AM

Now it''s clear.

**Click NOW!**

###
#13
Anonymous Poster_Anonymous Poster_*
Guests - Reputation:

Posted 16 February 2003 - 05:25 AM

###
#14
Anonymous Poster_Anonymous Poster_*
Guests - Reputation:

Posted 16 February 2003 - 04:31 PM

taking the standard c sqrt and the ''magic'' sqrt of all #s from 1-5000, assuming the standard sqrt is the true value:

the average % error is: 0.0877298% (and no, I didn''t forget to multipy by 100 - it''s really that low)

the average difference between the ''magic'' and c sqrts is: 0.0395671

if 0.0395671 is subracted from every ''magic'' sqrt the average % error drops to: 0.0746919%

However, the %error for smaller #s skyrockets.

Going from 0 to 4809 the percent error continually drops.

At 4810 to 4817, my level of precision could not tell if there was any difference.

from 4818 to 5000 the % error slowly increased.

For 4793 to 4809 and 4818 to 4837, the percent error output was put in scientific notation (using E)

I didn''t time the functions though - maybe I''ll do that later.

###
#15
Members - Reputation: **1402**

Posted 17 February 2003 - 03:20 AM

float InvSqrt2 (float x){

union{

int intPart;

float floatPart;

} convertor;

convertor.floatPart = x;

convertor.intPart = 0x5f3759df - (convertor.intPart >> 1);

return 0.5f*convertor.floatPart*(3.0f - x*convertor.floatPart*convertor.floatPart);

}

I''m not sure how much accuracy it''ll lose by pulling the 0.5f out in the return line. I''m guessing it''ll be less than the error already introduced by the method. Just in case though I also tried:

float InvSqrt3 (float x){

union{

int intPart;

float floatPart;

} convertor;

convertor.floatPart = x;

convertor.intPart = 0x5f3759df - (convertor.intPart >> 1);

return convertor.floatPart*(1.5f - 0.5f*x*convertor.floatPart*convertor.floatPart);

}

I timed them and averaged over several runs and got the following times for performing 500 000 000 invsqrts:

InvSqrt (original): 29.233s = 58.466ns per invsqrt

InvSqrt2: 20.957 = 41.914ns per invsqrt

InvSqrt3: 21.550 = 43.100ns per invsqrt

So my best attempt was 28.3% faster than the original function. If you take off the loop overhead (4.48694s) the speed increase is 33.4%!

Can anyone beat that?

Enigma

###
#16
Anonymous Poster_Anonymous Poster_*
Guests - Reputation:

Posted 17 February 2003 - 06:54 AM

quote:Original post by Enigma

I''m not sure how much accuracy it''ll lose by pulling the 0.5f out in the return line. I''m guessing it''ll be less than the error already introduced by the method.

Try ''none''. If there was any difference it wasn''t obvious - the avg. difference and the avg. % error are the same for InvSqrt and InvSqrt2, compared to sqrt().

I think the only way to make it even faster is either to write in assembly(maybe use sse or mmx?) or double check to make sure the optimizer for your compiler is on full throttle. Oh, and maybe make the function inline.

Anyway, can this formula work with doubles and long doubles?

###
#17
Members - Reputation: **146**

Posted 17 February 2003 - 10:00 AM

###
#18
Members - Reputation: **122**

Posted 17 February 2003 - 12:03 PM

CONST SEGMENT

__real@3f000000 DD 03f000000r ; 0.5

CONST ENDS

CONST SEGMENT

__real@3fc00000 DD 03fc00000r ; 1.5

; Function compile flags: /Ogty

CONST ENDS

_TEXT SEGMENT

_x$ = 8

_i$ = 8

?InvSqrt@@YAMM@Z PROC NEAR

mov eax, DWORD PTR _x$[esp-4]

mov ecx, 1597463007 ; 5f3759dfH

fld DWORD PTR _x$[esp-4]

fmul DWORD PTR __real@3f000000

sar eax, 1

sub ecx, eax

mov DWORD PTR _i$[esp-4], ecx

fld DWORD PTR _i$[esp-4]

fmul DWORD PTR _i$[esp-4]

fmulp ST(1), ST(0)

fsubr DWORD PTR __real@3fc00000

fmul DWORD PTR _i$[esp-4]

ret 0

?InvSqrt@@YAMM@Z ENDP

_TEXT ENDS

the code seem good optimized, and of course use bit rotation, but I think that the fsqrt FPU instrucction its enough fast and accurate, or the equivalents in 3dnow/SSE/SSE2, not mmx because it works with integers

###
#19
Members - Reputation: **356**

Posted 18 February 2003 - 05:39 AM

Sqrt: 4.01 ms

Test: 11.0 ms

Diff: 7.01 ms

%err: 0.095%

Sorry about the previous numbers, I put an extra timer call in that messed some stuff up. So the C# sqrt is still faster than the Newton method when used under C# however the speed difference is very small.

-- Exitus Acta Probat --

[edited by - jperalta on February 18, 2003 12:55:14 PM]

[edited by - jperalta on February 18, 2003 12:56:02 PM]

###
#20
Anonymous Poster_Anonymous Poster_*
Guests - Reputation:

Posted 18 February 2003 - 01:14 PM

/// Chris Lomont version of fast inverse sqrt,

/// based on Newton method, 1 iteration, more accurate

float InvSqrt_Lomont(float x)

{

float xhalf = 0.5f*x;

int i = *(int*)&x;

i = 0x5f375a84 - (i>>1); // hidden initial guess, fast

x = *(float*)&i;

x = x*(1.5f-xhalf*x*x);

return x;

} // InvSqrt_Lomont

I changed the const from 0x5f3759df to 0x5f375a84 which reduces the maximum relative error over the entire range of floating point numbers. If someone else will like to test this, please post anything you find.

I hope to write up an explanation of how this works, and post it to my site (www.math.purdue.edu/~clomont) eventually under papers. The explanation by David Eberly at www.magic-software.com/Documentation/FastInverseSqrt.pdf is not entirely correct, since he ignores a lot of bit errors and shifts.

Let me know if this works for everyone. I am pretty certain about my analysis.

Chris Lomont