Archived

This topic is now archived and is closed to further replies.

jonbell

Carmack's sqrt()

100 posts in this topic

Last week I saw a topic in which someone posted a copy of the sqrt function used in Quake 3, but I was at work and couldn't grab it. If anyone has it, can I have a copy please? Also, how much of a loss in accuracy does it take for the speed?

      
float InvSqrt (float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;          // reinterpret the float's bits as an int
    i = 0x5f3759df - (i >> 1);  // shift and subtract from the magic constant: initial guess
    x = *(float*)&i;            // reinterpret the bits back to a float
    x = x*(1.5f - xhalf*x*x);   // one Newton-Raphson refinement step
    return x;
}


I believe it was inverse square root. To test it I used:


#include <cmath>
#include <iostream>
using namespace std;

// assumes the InvSqrt function posted above is in scope
int main()
{
    cout << "Using Carmack's: " << 1 / InvSqrt(3) << endl;
    cout << "math.h: " << sqrt(3.0) << endl;
    return 0;
}


And it was pretty accurate.

I did a quick (and I believe inaccurate, because I used SDL_GetTicks() instead of a better timer, due to inexperience) test, using full compiler optimization in Dev-C++, of the time required to do 10,000,000 loops of both math.h's sqrt and Carmack's.

Carmack: 28ms
math.h: 232ms

Hmm. That's an awfully large difference. Please correct me if I'm wrong.

[edited by - Ronin Magus on February 15, 2003 11:35:18 AM]
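For reference, a rough timing harness along those lines might look like the sketch below. It uses std::chrono rather than SDL_GetTicks, and the loop count and the volatile accumulator are just illustrative choices, not anything from the original test; the function is repeated so the snippet compiles on its own.

#include <chrono>
#include <cstdio>

// The InvSqrt from the post above, repeated so this snippet is self-contained.
float InvSqrt(float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i >> 1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

int main()
{
    const int N = 10000000;           // same loop count as in the post
    volatile float sink = 0.0f;       // keeps the optimizer from deleting the loop

    auto start = std::chrono::steady_clock::now();
    for (int i = 1; i <= N; ++i)
        sink = sink + InvSqrt((float)i);
    auto stop = std::chrono::steady_clock::now();

    long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    std::printf("%d InvSqrt calls: %lld ms (sink=%f)\n", N, ms, (double)sink);
    return 0;
}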

Doh. This method (Newton-Raphson) has been used for years, and the theory behind it is hundreds of years old. It's a damn shame if people begin to call it Carmack's sqrt now, just because Carmack has a guru's status and a cool-sounding name.

- Mikko Kauppila (whose name doesn't sound cool)

[edited by - uutee on February 15, 2003 11:59:23 AM]
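For what it's worth, the refinement line in the posted code is exactly one Newton-Raphson step: applying Newton's method to f(y) = 1/y^2 - x, whose positive root is 1/sqrt(x), gives y_new = y*(1.5 - 0.5*x*y*y), which is the x = x*(1.5f - xhalf*x*x) line; the magic constant only supplies the initial guess. As a rough sketch (the name is made up, and this is not from the Quake source), a second iteration can be added for more accuracy at some speed cost:

float InvSqrt_TwoSteps(float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i >> 1);     // bit-trick initial guess
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);      // first Newton-Raphson step
    x = x*(1.5f - xhalf*x*x);      // second step: a few more correct bits
    return x;
}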

Well, I just called it that because I'm no math person; I saw this post on the front page and remembered copying that code.

I wouldn't say it is awfully large. Rather, ten to one is what I generally use as a rule of thumb for whether something is worthwhile. Within performance work you can get 100-to-1 and 1000-to-1 improvements, e.g. bubble sort versus quicksort. With 10 to 1 you might as well go ahead and do it unless there are compelling reasons otherwise. With 10% it had better be a big part of your processing. 90% of 10% is just as big as 10% of 90%, but a whole lot less work as a general rule.

I'm not seeing how this works.
I tried to do it by hand and I'm getting weird numbers...

float x == 4

float xhalf == 0.5f * 4 == 2

int i == 2

i == 0x5f3759df (which is 1597463007 in decimal) - (2 >> 1)
  == 1597463006

x == 1597463006

x == 1597463006*(1.5f - 2*x*x)

x == some ridiculously long number:
(8153093528352233345939813923)

and the inverse is (1/x): 1.22e-28.
In other words, not 2.

Can someone tell me where I went wrong with the calculation?


quote:
Original post by Alpha_ProgDes
Can someone tell me where I went wrong with the calculation?


You are using integers in your calculation.
The function relies on the different representations of integers and floats in memory. This is why there are pointer dereferences instead of simple assignments.
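To make that concrete, here is a small sketch with x = 4, as in the calculation above (the variable names are made up for the example). The int the function actually works with is the bit pattern of 4.0f, not the small integer used in the hand calculation:

#include <cstdio>
#include <cstring>

int main()
{
    float x = 4.0f;

    int asValue = (int)x;                      // numeric conversion: 4
    unsigned int asBits;
    std::memcpy(&asBits, &x, sizeof asBits);   // the raw IEEE bit pattern of 4.0f

    std::printf("(int)x    = %d\n", asValue);
    std::printf("bits of x = 0x%08X = %u\n", asBits, asBits);
    // prints 0x40800000 = 1082130432: this is the value the function shifts
    // and subtracts from 0x5f3759df, not the small integer used by hand above.
    return 0;
}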


I know that in the SSE instruction set there is a similar instruction (rsqrtss/rsqrtps) that uses an on-chip LUT.

Then just toss in rcpss/rcpps to take a fast reciprocal of that, and you have the square root.

SSE is supported by the Pentium 3/4 and the Athlon XP, but you might want to check out the 3DNow! stuff for AMD processors, because you might find a quicker way of doing it.
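A minimal sketch of that idea using the SSE intrinsics rather than raw assembly (assuming a compiler that ships xmmintrin.h; the function name is made up, and rsqrtss/rcpss are only ~12-bit approximations, so the result is approximate):

#include <xmmintrin.h>

// Approximate sqrt(x) as rcp(rsqrt(x)), i.e. 1 / (1/sqrt(x)).
float ApproxSqrtSSE(float x)
{
    __m128 v = _mm_set_ss(x);
    __m128 r = _mm_rcp_ss(_mm_rsqrt_ss(v));
    float result;
    _mm_store_ss(&result, r);
    return result;
}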

There's one thing I don't understand: why int i = *(int*)&x; instead of int i = (int)x;?
Darookie mentioned something about it, but what are the details behind that method?
Could anyone explain it briefly?


Ok, I'll try to explain it.

On x86 platforms, floats are represented in IEEE format:

Single Precision (32 bits)
The IEEE single precision floating point standard representation requires a 32 bit word, which may be represented as numbered from 0 to 31, left to right. The first bit is the sign bit, S, the next eight bits are the exponent bits, 'E', and the final 23 bits are the fraction 'F':

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
0 1      8 9                    31

The value V represented by the word may be determined as follows:


If E=255 and F is nonzero, then V=NaN ("Not a number")
If E=255 and F is zero and S is 1, then V=-Infinity
If E=255 and F is zero and S is 0, then V=Infinity
If 0<E<255 then V=(-1)**S * 2**(E-127) * (1.F), where "1.F" is the binary number formed by prefixing F with an implicit leading 1 and a binary point. These are the "normalized" values.
If E=0 and F is nonzero, then V=(-1)**S * 2**(-126) * (0.F). These are "unnormalized" (denormalized) values.
If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0
In particular,


0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0

0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity

0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN

0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5

0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001 = 2**(-149) (Smallest positive value)


If you write int i = x, the compiler will convert x to an integer by truncating the value, which is not what you want here.
Since you need to do some bit shifting and masking tricks that are only available for integers, you need to fool the compiler into giving you an integer variable that contains the exact bits of x.

This is why pointer dereferencing is used instead of a simple assignment.
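To make the layout concrete, here is a small sketch that extracts the three fields from a float. It uses memcpy for the reinterpretation rather than the pointer cast, and the value -6.5f matches one of the example bit patterns above:

#include <cstdio>
#include <cstring>

int main()
{
    float x = -6.5f;                       // 1 10000001 10100000000000000000000
    unsigned int bits;
    std::memcpy(&bits, &x, sizeof bits);   // the raw IEEE bits of x

    unsigned int S = bits >> 31;           // sign bit
    unsigned int E = (bits >> 23) & 0xFF;  // eight exponent bits
    unsigned int F = bits & 0x7FFFFF;      // 23 fraction bits

    std::printf("S=%u E=%u F=0x%06X\n", S, E, F);
    // -6.5 = -1 * 2**(129-127) * 1.101, so S=1, E=129, F=0x500000
    return 0;
}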

Thanks a lot. For one reason or another I didn't know that bitwise operations are not permitted on floats (probably because I never needed to perform them on floats).
Now it's clear.


Plug that "magic constant" into Google. The first result is an explanation of this code. Interesting stuff.

If anyone's interested, I did some testing on this formula:

Taking the standard C sqrt and the 'magic' sqrt of all numbers from 1-5000, and assuming the standard sqrt is the true value:

the average % error is 0.0877298% (and no, I didn't forget to multiply by 100 - it really is that low)

the average difference between the 'magic' and C sqrts is 0.0395671

if 0.0395671 is subtracted from every 'magic' sqrt, the average % error drops to 0.0746919%.
However, the % error for smaller numbers skyrockets.

Going from 0 to 4809 the percent error continually drops.
From 4810 to 4817, my level of precision could not tell if there was any difference.
From 4818 to 5000 the % error slowly increased.
For 4793 to 4809 and 4818 to 4837, the percent error output was put in scientific notation (using E).

I didn't time the functions though - maybe I'll do that later.

That was fun - I decided to see if I could optimise the function any further. Here's what I came up with:

float InvSqrt2 (float x){
    union{
        int intPart;
        float floatPart;
    } convertor;

    convertor.floatPart = x;
    convertor.intPart = 0x5f3759df - (convertor.intPart >> 1);
    return 0.5f*convertor.floatPart*(3.0f - x*convertor.floatPart*convertor.floatPart);
}


I'm not sure how much accuracy it'll lose by pulling the 0.5f out in the return line. I'm guessing it'll be less than the error already introduced by the method. Just in case though I also tried:

float InvSqrt3 (float x){
    union{
        int intPart;
        float floatPart;
    } convertor;

    convertor.floatPart = x;
    convertor.intPart = 0x5f3759df - (convertor.intPart >> 1);
    return convertor.floatPart*(1.5f - 0.5f*x*convertor.floatPart*convertor.floatPart);
}


I timed them and averaged over several runs and got the following times for performing 500 000 000 invsqrts:

InvSqrt (original): 29.233s = 58.466ns per invsqrt
InvSqrt2: 20.957s = 41.914ns per invsqrt
InvSqrt3: 21.550s = 43.100ns per invsqrt

So my best attempt was 28.3% faster than the original function. If you take off the loop overhead (4.48694s) the speed increase is 33.4%!

Can anyone beat that?

Enigma

quote:
Original post by Enigma
I'm not sure how much accuracy it'll lose by pulling the 0.5f out in the return line. I'm guessing it'll be less than the error already introduced by the method.



Try 'none'. If there was any difference it wasn't obvious - the avg. difference and the avg. % error are the same for InvSqrt and InvSqrt2, compared to sqrt().

I think the only way to make it even faster is either to write it in assembly (maybe use SSE or MMX?) or to double-check that your compiler's optimizer is on full throttle. Oh, and maybe make the function inline.

Anyway, can this formula work with doubles and long doubles?

This code in itself would not work directly with doubles (or long doubles), but I think the principle could be adapted to work with them. However, I'm not sure how you'd work it, since you need to do 64-bit integer arithmetic and I don't know of any standard implementation of C that allows 64-bit integer processing (MSVC has __int64, but I'm unsure of how it codes the instructions). If you don't mind it being unportable, you could use x86 asm, which has a weird and wonderful instruction (that I've forgotten!) that would make the right-shift operation fairly easy and efficient with 32-bit registers (of course, there's MMX, but for the sake of two instructions it'd kill the speed gain). Something to think about!
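As a rough sketch of how that adaptation might look with a 64-bit integer type (std::int64_t here; the function name is made up, and the constant 0x5fe6eb50c7b537a9 is one that has been quoted for doubles elsewhere, so treat it as an assumption rather than something verified in this thread):

#include <cstdint>
#include <cstring>

// Hypothetical double-precision version: the same idea, with the shift and
// the magic constant applied to a 64-bit integer.
double InvSqrt64(double x)
{
    double xhalf = 0.5 * x;
    std::int64_t i;
    std::memcpy(&i, &x, sizeof i);        // reinterpret the 64 IEEE bits
    i = 0x5fe6eb50c7b537a9LL - (i >> 1);  // initial guess (assumed constant)
    std::memcpy(&x, &i, sizeof x);
    x = x * (1.5 - xhalf * x * x);        // one Newton-Raphson step
    return x;
}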

With VC++ 6.0 optimizations on:

CONST SEGMENT
__real@3f000000 DD 03f000000r ; 0.5
CONST ENDS
CONST SEGMENT
__real@3fc00000 DD 03fc00000r ; 1.5
; Function compile flags: /Ogty
CONST ENDS
_TEXT SEGMENT
_x$ = 8
_i$ = 8
?InvSqrt@@YAMM@Z PROC NEAR
mov eax, DWORD PTR _x$[esp-4]
mov ecx, 1597463007 ; 5f3759dfH
fld DWORD PTR _x$[esp-4]
fmul DWORD PTR __real@3f000000
sar eax, 1
sub ecx, eax

mov DWORD PTR _i$[esp-4], ecx
fld DWORD PTR _i$[esp-4]
fmul DWORD PTR _i$[esp-4]
fmulp ST(1), ST(0)
fsubr DWORD PTR __real@3fc00000
fmul DWORD PTR _i$[esp-4]
ret 0
?InvSqrt@@YAMM@Z ENDP
_TEXT ENDS

The code seems well optimized, and of course it uses the bit shift, but I think the fsqrt FPU instruction is fast and accurate enough, or the equivalents in 3DNow!/SSE/SSE2 (not MMX, because that works with integers).
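For comparison, the straightforward version below just leans on the hardware square root (fsqrt, or sqrtss when compiling for SSE); this is a baseline sketch with a made-up name, not code from the thread:

#include <cmath>

// Exact (to float precision) inverse square root via the hardware
// square-root instruction; compilers emit fsqrt or sqrtss for std::sqrt.
float InvSqrt_Hardware(float x)
{
    return 1.0f / std::sqrt(x);
}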

Apparently the C# square root function is a lot more optimized than this one. I did the test with the functions under C# and got the following results (tested on all integers in the set [1,10e5]):

Sqrt: 4.01 ms
Test: 11.0 ms
Diff: 7.01 ms
%err: 0.095%

Sorry about the previous numbers - I put an extra timer call in that messed some stuff up. So the C# sqrt is still faster than the Newton method when used under C#; however, the speed difference is very small.
-- Exitus Acta Probat --

[edited by - jperalta on February 18, 2003 12:55:14 PM]

[edited by - jperalta on February 18, 2003 12:56:02 PM]

I figured out how this thing works, and have improved the accuracy. The new code is

/// Chris Lomont version of fast inverse sqrt,
/// based on Newton method, 1 iteration, more accurate
float InvSqrt_Lomont(float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f375a84 - (i>>1); // hidden initial guess, fast
    x = *(float*)&i;
    x = x*(1.5f-xhalf*x*x);
    return x;
} // InvSqrt_Lomont

I changed the constant from 0x5f3759df to 0x5f375a84, which reduces the maximum relative error over the entire range of floating point numbers. If someone else would like to test this, please post anything you find.

I hope to write up an explanation of how this works, and post it to my site (www.math.purdue.edu/~clomont) eventually under papers. The explanation by David Eberly at www.magic-software.com/Documentation/FastInverseSqrt.pdf is not entirely correct, since he ignores a lot of bit errors and shifts.

Let me know if this works for everyone. I am pretty certain about my analysis.

Chris Lomont

Again, testing in C# as it's my current language of choice and using your new algorithm, the results on the set [1,10e5], 100 iterations, are as follows:

Average percent error: 0.096%
Average time difference: 4.79 ms in favor of .NET's sqrt()

-- Exitus Acta Probat --

From my tests I determined that using 0x5f375a84 was slightly more inaccurate than the original over a range of [1,1000], with increments of 0.5. However, it was only around 3/1000 of a difference, so it doesn't make much difference which is used, and over some ranges it was more accurate. What I am interested in is the math behind the calculations you did to arrive at that number.

quote:
Original post by Anonymous Poster
I figured out how this thing works, and have improved the accuracy. The new code is

/// Chris Lomont version of fast inverse sqrt,
/// based on Newton method, 1 iteration, more accurate
float InvSqrt_Lomont(float x)
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f375a84 - (i>>1); // hidden initial guess, fast
    x = *(float*)&i;
    x = x*(1.5f-xhalf*x*x);
    return x;
} // InvSqrt_Lomont

I changed the constant from 0x5f3759df to 0x5f375a84, which reduces the maximum relative error over the entire range of floating point numbers. If someone else would like to test this, please post anything you find.

I hope to write up an explanation of how this works, and post it to my site (www.math.purdue.edu/~clomont) eventually under papers. The explanation by David Eberly at www.magic-software.com/Documentation/FastInverseSqrt.pdf is not entirely correct, since he ignores a lot of bit errors and shifts.

Let me know if this works for everyone. I am pretty certain about my analysis.

Chris Lomont


Ok, I plugged it into my test program and here's what I got, with the C sqrt() being the 'true value'. The 1st style uses 0x5f3759df and the 2nd method uses 0x5f375a84.

first 500 integers:
Avg. Pct Error for 1st style: 0.0946063
Avg. Difference for 1st style: 0.0142623
Avg. Pct Error for 2nd style: 0.0946675
Avg. Difference for 2nd style: 0.0142718
first 5000 integers:
Avg. Pct Error for 1st style: 0.0971904
Avg. Difference for 1st style: 0.0409933
Avg. Pct Error for 2nd style: 0.0972366
Avg. Difference for 2nd style: 0.0410079
first 20000 integers:
Avg. Pct Error for 1st style: 0.112022
Avg. Difference for 1st style: 0.0893813
Avg. Pct Error for 2nd style: 0.112074
Avg. Difference for 2nd style: 0.0894122
first 100000 integers:
Avg. Pct Error for 1st style: 0.10822
Avg. Difference for 1st style: 0.192221
Avg. Pct Error for 2nd style: 0.108278
Avg. Difference for 2nd style: 0.192317
first 1e+006 integers:
Avg. Pct Error for 1st style: 0.105428
Avg. Difference for 1st style: 0.647431
Avg. Pct Error for 2nd style: 0.105498
Avg. Difference for 2nd style: 0.647852


Looks like there's a difference between the two methods, though these results show the 1st method is only very slightly more accurate. I agree with YoshiN on this one. However, both methods are just as suitable because they're approximations anyway.
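One caveat on the comparison: the figures above are average errors over integer inputs, while the 0x5f375a84 constant was tuned for the maximum relative error against the true 1/sqrt(x). A small sketch for measuring that metric instead might look like this (the test range and step are arbitrary choices):

#include <cmath>
#include <cstdio>

float InvSqrt(float x)              // original constant
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i >> 1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

float InvSqrt_Lomont(float x)       // adjusted constant
{
    float xhalf = 0.5f*x;
    int i = *(int*)&x;
    i = 0x5f375a84 - (i >> 1);
    x = *(float*)&i;
    x = x*(1.5f - xhalf*x*x);
    return x;
}

int main()
{
    double maxErrOld = 0.0, maxErrNew = 0.0;
    for (float x = 0.01f; x < 10000.0f; x += 0.01f)   // arbitrary test range
    {
        double exact = 1.0 / std::sqrt((double)x);
        double eOld = std::fabs(InvSqrt(x) - exact) / exact;
        double eNew = std::fabs(InvSqrt_Lomont(x) - exact) / exact;
        if (eOld > maxErrOld) maxErrOld = eOld;
        if (eNew > maxErrNew) maxErrNew = eNew;
    }
    std::printf("max relative error, 0x5f3759df: %g\n", maxErrOld);
    std::printf("max relative error, 0x5f375a84: %g\n", maxErrNew);
    return 0;
}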

quote:
Original post by jperalta
Again, testing in C# as it's my current language of choice and using your new algorithm, the results on the set [1,10e5], 100 iterations, are as follows:

Average percent error: 0.096%
Average time difference: 4.79 ms in favor of .NET's sqrt()

-- Exitus Acta Probat --


It occurs to me that .NET might be exploiting processor features w/o telling you it is, hence the speed of that particular function.

##UnknownPlayer##

quote:
Original post by UnknownPlayer

It occurs to me that .NET might be exploiting processor features w/o telling you it is, hence the speed of that particular function.

##UnknownPlayer##


Just goes to show you how 'portable' their languages are.