• Create Account

# Carmacks sqrt()

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

100 replies to this topic

### #1jonbell  Members   -  Reputation: 100

Like
Likes
Like

Posted 15 February 2003 - 04:23 AM

Last week i saw a topic in which someone posted a copy of the sqrt function used in quake 3 but i was at work and couldn''t grab it. If anyone has it can i have a copy plz. Also how much of a loss in accuracy does it take for speed?

### #2Ronin Magus  Members   -  Reputation: 142

Like
Likes
Like

Posted 15 February 2003 - 04:26 AM

      float InvSqrt (float x){    float xhalf = 0.5f*x;    int i = *(int*)&x;    i = 0x5f3759df - (i >> 1);    x = *(float*)&i;    x = x*(1.5f - xhalf*x*x);    return x;}  I believe it was inverse square root.  To test it I used:      int main(){    cout << "Using Carmacks: " << 1 / InvSqrt(3) << endl;    cout << "math.h: " << sqrt(3) << endl;}

And it was pretty accurate.

I did a quick (and I believe innacurate, because I used SDL's getticks instead of other means due to inexperience) test using full compiler optimization in Dev-c++ of the time required to do 10,000,000 loops on both math.h's and carmacks.

Carmack: 28ms
math.h: 232ms

Hmm. That's an awfully large difference. Please correct me if I'm wrong.

[edited by - Ronin Magus on February 15, 2003 11:35:18 AM]

### #3uutee  Members   -  Reputation: 142

Like
Likes
Like

Posted 15 February 2003 - 04:58 AM

Doh. This method (Newton-Rhapson) has been used for years, and the theory behind it is hundreds of years old. It's a damn shame if people begin to call it Carmack's sqrt now, just because Carmack has a guru's status and a cool sounding name.

- Mikko Kauppila (whose name doesn't sound cool

[edited by - uutee on February 15, 2003 11:59:23 AM]

### #4Ronin Magus  Members   -  Reputation: 142

Like
Likes
Like

Posted 15 February 2003 - 04:59 AM

Well I just called it that because I''m no math person, I saw this post on the front page and remembered copying that code.

### #5Pipo DeClown  Members   -  Reputation: 804

Like
Likes
Like

Posted 15 February 2003 - 05:06 AM

sorry, bt what does it do actually? (in code)

.lick

### #6LilBudyWizer  Members   -  Reputation: 491

Like
Likes
Like

Posted 15 February 2003 - 05:06 AM

I wouldn''t say it is awefully large. Rather ten to one is what I generally use as a rule of thumb as to whether something is worth while. Within performance you can get 100 to 1 and 1000 to 1 improvements, i.e. bubble sort versus quicksort. With 10 to 1 you might as well go ahead and do it without compelling reasons otherwise. With 10% it better be a big part of your processing. 90% of 10% is just as big as 10% of 90%, but a whole lot less work as a general rule.

### #7Alpheus  Crossbones+   -  Reputation: 6692

Like
Likes
Like

Posted 15 February 2003 - 05:16 AM

i''m not seeing how this works.
i tried to do it by hand and i''m getting weird numbers...

float x == 4

float xhalf == 0.5f * 4 == 2

int i == 2

i == 5f3759df (which is 1597463007 in decimal) - (2 >> 1)
== 1597463006

x == 1597463006

x == 1597463006*(1.5f - 2*x*x)

x == some ridiculously long number:
(8153093528352233345939813923)

and the inverse is (1/x): 1.22 e-28.
in other words not 2.

can someone tell me where i went wrong with the calculation?

### #8darookie  Members   -  Reputation: 1441

Like
Likes
Like

Posted 15 February 2003 - 05:30 AM

quote:
Original post by Alpha_ProgDes
can someone tell me where i went wrong with the calculation?

You are using integers in your calculation.
The function relies on the different representations of integers and floats in memory. This is why there are pointer dereferences instead of simple assignments.

### #9The Reindeer Effect  Members   -  Reputation: 211

Like
Likes
Like

Posted 15 February 2003 - 06:18 AM

I know that in the SSE instruction set there is a similar instruction (rsqrt*s) that uses an on chip LUT.

Then just toss in a (rcp*s) to take a fast reciprocal of that, and you have the square root.

SSE is supported by Pentium 3/4 and Athlon XP, but you might want to check out 3DNOW! stuff for AMD processors because you might happen to find a quicker way of doing it.

### #10SwSh  Members   -  Reputation: 145

Like
Likes
Like

Posted 15 February 2003 - 06:27 AM

There''s a thing I don''t understand. Why the: int i = *(int*)&x; instead of: int i = (int)x;
Darookie mentioned something about it, but what are the details behind that method?
Could anyone explain it briefly?

Click NOW!

### #11darookie  Members   -  Reputation: 1441

Like
Likes
Like

Posted 15 February 2003 - 06:39 AM

Ok. I''ll try to explain it

On x86 platforms, floats are represented in IEEE format:

Single Precision (32 bits)
The IEEE single precision floating point standard representation requires a 32 bit word, which may be represented as numbered from 0 to 31, left to right. The first bit is the sign bit, S, the next eight bits are the exponent bits, ''E'', and the final 23 bits are the fraction ''F'':
  S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF  0 1      8 9                    31The value V represented by the word may be determined as follows:If E=255 and F is nonzero, then V=NaN ("Not a number") If E=255 and F is zero and S is 1, then V=-Infinity If E=255 and F is zero and S is 0, then V=Infinity If 0If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F) These are "unnormalized" values. If E=0 and F is zero and S is 1, then V=-0 If E=0 and F is zero and S is 0, then V=0 In particular,  0 00000000 00000000000000000000000 = 0  1 00000000 00000000000000000000000 = -0  0 11111111 00000000000000000000000 = Infinity  1 11111111 00000000000000000000000 = -Infinity  0 11111111 00000100000000000000000 = NaN  1 11111111 00100010001001010101010 = NaN  0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2  0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5  1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5  0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)  0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)   0 00000000 00000000000000000000001 = +1 * 2**(-126) *                                        0.00000000000000000000001 =                                        2**(-149)  (Smallest positive value)

If you write int i = x, the compiler will convert x to the nearest integer that is lower than x, which is not what you want.
Since you need to do some bit shifting and masking tricks that are only available for integers, you need to fool the compiler to have an integer variable that contains the exact bits of x.

This is why pointer dereferencing i used instead of simple assignments.

### #12SwSh  Members   -  Reputation: 145

Like
Likes
Like

Posted 15 February 2003 - 06:51 AM

Thanks a lot. For one or another reason I didn''t know that bitwise operations are not permitted on floats. (probably I never needed to perform them on floats )
Now it''s clear.

Click NOW!

### #13 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:

Likes

Posted 16 February 2003 - 05:25 AM

Plug that "magic constant" into Google. The first result is an explanation of this code. Interesting stuff.

### #14 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:

Likes

Posted 16 February 2003 - 04:31 PM

If anyone''s interested I did some testing on this formula:

taking the standard c sqrt and the ''magic'' sqrt of all #s from 1-5000, assuming the standard sqrt is the true value:

the average % error is: 0.0877298% (and no, I didn''t forget to multipy by 100 - it''s really that low)

the average difference between the ''magic'' and c sqrts is: 0.0395671

if 0.0395671 is subracted from every ''magic'' sqrt the average % error drops to: 0.0746919%
However, the %error for smaller #s skyrockets.

Going from 0 to 4809 the percent error continually drops.
At 4810 to 4817, my level of precision could not tell if there was any difference.
from 4818 to 5000 the % error slowly increased.
For 4793 to 4809 and 4818 to 4837, the percent error output was put in scientific notation (using E)

I didn''t time the functions though - maybe I''ll do that later.

### #15Enigma  Members   -  Reputation: 1402

Like
Likes
Like

Posted 17 February 2003 - 03:20 AM

That was fun - I decided to see if I could optimise the function any further. Here''s what I came up with:
float InvSqrt2 (float x){	union{		int intPart;		float floatPart;	} convertor;	convertor.floatPart = x;	convertor.intPart = 0x5f3759df - (convertor.intPart >> 1);	return 0.5f*convertor.floatPart*(3.0f - x*convertor.floatPart*convertor.floatPart);}

I''m not sure how much accuracy it''ll lose by pulling the 0.5f out in the return line. I''m guessing it''ll be less than the error already introduced by the method. Just in case though I also tried:
float InvSqrt3 (float x){	union{		int intPart;		float floatPart;	} convertor;	convertor.floatPart = x;	convertor.intPart = 0x5f3759df - (convertor.intPart >> 1);	return convertor.floatPart*(1.5f - 0.5f*x*convertor.floatPart*convertor.floatPart);}

I timed them and averaged over several runs and got the following times for performing 500 000 000 invsqrts:

InvSqrt (original): 29.233s = 58.466ns per invsqrt
InvSqrt2: 20.957 = 41.914ns per invsqrt
InvSqrt3: 21.550 = 43.100ns per invsqrt

So my best attempt was 28.3% faster than the original function. If you take off the loop overhead (4.48694s) the speed increase is 33.4%!

Can anyone beat that?

Enigma

### #16 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:

Likes

Posted 17 February 2003 - 06:54 AM

quote:
Original post by Enigma
I''m not sure how much accuracy it''ll lose by pulling the 0.5f out in the return line. I''m guessing it''ll be less than the error already introduced by the method.

Try ''none''. If there was any difference it wasn''t obvious - the avg. difference and the avg. % error are the same for InvSqrt and InvSqrt2, compared to sqrt().

I think the only way to make it even faster is either to write in assembly(maybe use sse or mmx?) or double check to make sure the optimizer for your compiler is on full throttle. Oh, and maybe make the function inline.

Anyway, can this formula work with doubles and long doubles?

### #17NickB  Members   -  Reputation: 146

Like
Likes
Like

Posted 17 February 2003 - 10:00 AM

This code in it''s self would not work directly with doubles (or long doubles), but I think the principle could be adapted to work with doubles. However, I''m not sure how you''d work it since you need to do 64bit integer arithmetic & I don''t know of any standard implementation of C that allow 64 bit integer processing (MSVC has __int64 - but I''m unsure of how it codes instructions). If you don''t mind it being un-portable, then you could use x86 asm, which has a weird & wonderful instruction (that I''ve forgotten!) that would make the right shift operation to be done fairly easily & efficiently with 32 bit registers (of course, there''s MMX - but for the sake of 2 instructions it''d kill the speed gain. Something to think about!

### #18q2guy  Members   -  Reputation: 122

Like
Likes
Like

Posted 17 February 2003 - 12:03 PM

with vc++ 6.0 optimizations on:

CONST SEGMENT
__real@3f000000 DD 03f000000r ; 0.5
CONST ENDS
CONST SEGMENT
__real@3fc00000 DD 03fc00000r ; 1.5
; Function compile flags: /Ogty
CONST ENDS
_TEXT SEGMENT
_x$= 8 _i$ = 8
?InvSqrt@@YAMM@Z PROC NEAR
mov eax, DWORD PTR _x$[esp-4] mov ecx, 1597463007 ; 5f3759dfH fld DWORD PTR _x$[esp-4]
fmul DWORD PTR __real@3f000000
sar eax, 1
sub ecx, eax

mov DWORD PTR _i$[esp-4], ecx fld DWORD PTR _i$[esp-4]
fmul DWORD PTR _i$[esp-4] fmulp ST(1), ST(0) fsubr DWORD PTR __real@3fc00000 fmul DWORD PTR _i$[esp-4]
ret 0
?InvSqrt@@YAMM@Z ENDP
_TEXT ENDS

the code seem good optimized, and of course use bit rotation, but I think that the fsqrt FPU instrucction its enough fast and accurate, or the equivalents in 3dnow/SSE/SSE2, not mmx because it works with integers

### #19jperalta  Members   -  Reputation: 356

Like
Likes
Like

Posted 18 February 2003 - 05:39 AM

Apparently the C# square root function is a lot more optimized than this one. I did the test with the functions under C# and got the following results (Tested on all integers in the set [1,10e5]:

Sqrt: 4.01 ms
Test: 11.0 ms
Diff: 7.01 ms
%err: 0.095%

Sorry about the previous numbers, I put an extra timer call in that messed some stuff up. So the C# sqrt is still faster than the Newton method when used under C# however the speed difference is very small.
-- Exitus Acta Probat --

[edited by - jperalta on February 18, 2003 12:55:14 PM]

[edited by - jperalta on February 18, 2003 12:56:02 PM]

### #20 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:

Likes

Posted 18 February 2003 - 01:14 PM

I figured out how this thing works, and have improved the accuracy. The new code is

/// Chris Lomont version of fast inverse sqrt,
/// based on Newton method, 1 iteration, more accurate
float InvSqrt_Lomont(float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f375a84 - (i>>1); // hidden initial guess, fast
x = *(float*)&i;
x = x*(1.5f-xhalf*x*x);
return x;
} // InvSqrt_Lomont

I changed the const from 0x5f3759df to 0x5f375a84 which reduces the maximum relative error over the entire range of floating point numbers. If someone else will like to test this, please post anything you find.

I hope to write up an explanation of how this works, and post it to my site (www.math.purdue.edu/~clomont) eventually under papers. The explanation by David Eberly at www.magic-software.com/Documentation/FastInverseSqrt.pdf is not entirely correct, since he ignores a lot of bit errors and shifts.

Let me know if this works for everyone. I am pretty certain about my analysis.

Chris Lomont

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

PARTNERS