#### Archived

This topic is now archived and is closed to further replies.

# optimize square root

## Recommended Posts

bzroom    647
is there a way i could optimize the square root function to be faster but less precise, like maybe ot the nearest .5 or something, on some depth tests im doin with it i just need an aproximation, any ideas?

##### Share on other sites
billybob    134
you might want to search for carmack''s square root, or something like that. or look in the quake 2 source for it, or if the search is working on the forum, try that.

##### Share on other sites
JONSKI    122
It''s called the Newton-Raphson method. I can find it in my calculus book (no code, though ).

http://www.realtimerendering.com/

##### Share on other sites
Punty50    122
I would not recommend writing any of your own math functions. All of them will most likely be much slower than the ones that are built into FPU''s and GPUs. If you want to do your own function, use something like Carmack''s because it exploits certain properties of floats according to IEEE standards. I would not use power series or Newtonian approximation because beyond the second or third approximations, you will be running the function slower than sqrt or the asm function fsqrt. Another method you could use is a look up table and then interpolate the values.

Brendan

##### Share on other sites
grhodes_at_work    1385
quote:
Original post by Punty50
I would not recommend writing any of your own math functions. All of them will most likely be much slower than the ones that are built into FPU''s and GPUs. If you want to do your own function, use something like Carmack''s because it exploits certain properties of floats according to IEEE standards. I would not use power series or Newtonian approximation because beyond the second or third approximations, you will be running the function slower than sqrt or the asm function fsqrt. Another method you could use is a look up table and then interpolate the values.

If you can accept very low precision, then one or two highly optimized Newton-Raphson iterations might be precise enough, and could be faster than the FPU calcs for some functions.

Graham Rhodes
Senior Scientist
Applied Research Associates, Inc.

##### Share on other sites
superpig    1825
Don''t many processors have rsq (reciprocal square root, 1/sqrt(x) ) instructions these days? You could surely divide that into 1 to get the square root itself back out again...

##### Share on other sites
Punty50    122
GRhodes and Honayboyz-
Yes, I was trying to make the point that for the first few terms, the accuracy you get is worth it, however, if you start using too many terms, then you aren''t optimizing intelligently. Does anyone know how a lookup table and interpolation would fare compared to a computational method?

##### Share on other sites
rypyr    252
Are you sure you need square root? Can you live with distance squared and compare that? Just curious...

##### Share on other sites
bzroom    647
holy crap, ur a genius, i never thought of that, why do i need the rational number when i can just have the squares off all the differences and compare them, thank you so much, and at that, do i even need to square them? or cout i just get absolute value, which ever is faster?

##### Share on other sites
Enigma    1410
If you find you do need a square root, here''s an optimised sqrt function based on the Carmack InvSqrt. The magic number can probably be tweaked a little more if anybody fancies playing with it. I tested it on the full range of normalised floating point numbers. The maximum percentage error was 2.84658% and the average was 1.73863%.
float mysqrt(float x){	union{		int intPart;		float floatPart;	} convertor;	union{		int intPart;		float floatPart;	} convertor2;	convertor.floatPart = x;	convertor2.floatPart = x;	convertor.intPart = 0x1FBCF800 + (convertor.intPart >> 1);	convertor2.intPart = 0x5f3759df - (convertor2.intPart >> 1);	return 0.5f*(convertor.floatPart + (x * convertor2.floatPart));}

I think the accuracy can be slightly improved by changing the 0.5f in the return statement into a 0.4999f because the function always overestimates slightly. Again this number can probably be tweaked. Here''s the results I got for timing 500 000 000 sqrts:

sqrt (standard): >38s
fsqrt (asm): >38s
mysqrt: <20s

That includes function call and loop overheads.

Enigma

##### Share on other sites
rypyr    252
quote:
Original post by honayboyz
and at that, do i even need to square them? or cout i just get absolute value, which ever is faster?

You may need to square them because usually when you''re looking at distances:

d^2 = x^2 + y^2

You need to square both values and then add them to get the distance squared. Otherwise this will skew your math:

if x1 = 6 and y1 = 6
and x2 = 10 and y2 = 0

then if you don''t square them and just add them, your first point will appear farther away than your second point. Which it isn''t.

##### Share on other sites
Guest Anonymous Poster
I would suggest using a handheld calculator thus releaving the burnden on the processor.

##### Share on other sites
Chris Jungen    144
By far the fastest square root you can get,
are those SIMD instructions that are provided with SSE or 3DNow.
They give you 12 bit precision and eat only 2 cycles.

23 bit precision are possible with a few extra instructions.
On SSE you can even do 4 squareroots in parallel.

The drawback is, you have to write the same
for SSE, 3DNow and perhabs Fpu.

Onother option would be to use the faboulus VectorC
compiler, where you hint the compiler to do
lower precision square roots and divisions and reziprocal
square roots. It creates those SIMD instructions for you.

Chris

##### Share on other sites
quote:
Original post by Enigma
If you find you do need a square root, here's an optimised sqrt function based on the Carmack InvSqrt. The magic number can probably be tweaked a little more if anybody fancies playing with it. I tested it on the full range of normalised floating point numbers. The maximum percentage error was 2.84658% and the average was 1.73863%.
float mysqrt(float x){	union{		int intPart;		float floatPart;	} convertor;	union{		int intPart;		float floatPart;	} convertor2;	convertor.floatPart = x;	convertor2.floatPart = x;	convertor.intPart = 0x1FBCF800 + (convertor.intPart >> 1);	convertor2.intPart = 0x5f3759df - (convertor2.intPart >> 1);	return 0.5f*(convertor.floatPart + (x * convertor2.floatPart));}

I think the accuracy can be slightly improved by changing the 0.5f in the return statement into a 0.4999f because the function always overestimates slightly. Again this number can probably be tweaked. Here's the results I got for timing 500 000 000 sqrts:

sqrt (standard): >38s
fsqrt (asm): >38s
mysqrt: <20s

That includes function call and loop overheads.

Enigma

Re: 0x5f3759df

You may wish to take a look at this website: Homepage of Chris Lomont, who has a paper "explaining the magic constant 0x5f3759df floating around the net for inverse square root code, and he shows that 0x5f375a86 is probably a better choice".

Jason Doucette - my online résumé page: www.jasondoucette.com
projects, real-time graphics, artificial intelligence, world records, wallpapers/desktops
"Great minds discuss ideas, average minds discuss events, small minds discuss people." - Admiral Hyman G. Rickover

[edited by - Jason Doucette on May 23, 2003 8:24:52 AM]

##### Share on other sites
So far, the fastest square root algorithm I have tested is this one. Posted by shadow12345 on 18 February 2004 1:43:25 PM:
http://www.gamedev.net/community/forums/topic.asp?topic_id=208248

quote:
EDIT: here is another one from tricks of the 3d game programming gurus (in the far back of the book). It's based on the same idea. I think it's equivalent to the carmack inverse sqrt. It's ridiculously fast, but only within 5% accuracy.

/*	-From tricks of the 3d game programming gurus	I guess this makes me a 3d game programming guru then...right...*/float	Faster_Sqrtf(float f){	float	result;	_asm	{		mov eax, f		sub eax, 0x3f800000		sar eax, 1		add eax, 0x3f800000		mov result, eax	}	return result;}

matthewdoucette.com
- GFX (DOS/3Dfx/Rendered), Music, A.I., Inspirational Quotes, Wallpapers, Chess
- Contact
jasondoucette.com
pureproductivity.com - Pure Productivity Inc. Web Development & Hosting

"Discussion is an exchange of intelligence; argument is an exchange of ignorance." - Unknown

[edited by - Doucette on March 17, 2004 2:26:02 PM]