#### Archived

This topic is now archived and is closed to further replies.

# optimize square root


## Recommended Posts

Is there a way I could optimize the square root function to be faster but less precise, maybe to the nearest 0.5 or something? On some depth tests I'm doing with it I just need an approximation. Any ideas?

##### Share on other sites
You might want to search for Carmack's square root, or something like that. Or look in the Quake 2 source for it, or if the search is working on the forum, try that.

##### Share on other sites
It's called the Newton-Raphson method. I can find it in my calculus book (no code, though).

Search for the word "square" on this page:
http://www.realtimerendering.com/

It's about halfway down.

##### Share on other sites
I would not recommend writing any of your own math functions. All of them will most likely be much slower than the ones that are built into FPUs and GPUs. If you want to do your own function, use something like Carmack's, because it exploits certain properties of floats according to IEEE standards. I would not use power series or Newtonian approximation, because beyond the second or third approximation you will be running the function slower than sqrt or the asm instruction fsqrt. Another method you could use is a lookup table, and then interpolate the values.

Brendan

##### Share on other sites
quote:
Original post by Punty50
I would not recommend writing any of your own math functions. All of them will most likely be much slower than the ones that are built into FPUs and GPUs. If you want to do your own function, use something like Carmack's, because it exploits certain properties of floats according to IEEE standards. I would not use power series or Newtonian approximation, because beyond the second or third approximation you will be running the function slower than sqrt or the asm instruction fsqrt. Another method you could use is a lookup table, and then interpolate the values.

If you can accept very low precision, then one or two highly optimized Newton-Raphson iterations might be precise enough, and could be faster than the FPU calcs for some functions.

Graham Rhodes
Senior Scientist
Applied Research Associates, Inc.
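For reference, a minimal sketch of what one or two Newton-Raphson iterations for sqrt look like. The initial guess here is a deliberately crude one for illustration; a bit-level guess (as in the Carmack trick) converges much faster, which is why only an iteration or two is needed in practice:

```c
#include <assert.h>

/* Newton-Raphson for sqrt(a): x' = (x + a/x) / 2.
   Each iteration roughly doubles the number of correct digits,
   so a good initial guess matters far more than iteration count. */
float nr_sqrt(float a, int iterations)
{
    float x = a * 0.5f + 0.5f;        /* crude initial guess */
    for (int i = 0; i < iterations; ++i)
        x = 0.5f * (x + a / x);
    return x;
}
```

With this poor starting guess, sqrt(25) needs about five iterations to settle near 5; starting from a bit-trick estimate, one iteration typically suffices.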

##### Share on other sites
Don't many processors have rsq (reciprocal square root, 1/sqrt(x)) instructions these days? You could surely divide that into 1 to get the square root itself back out again...
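For what it's worth, on x86 the SSE `rsqrtss` instruction exposes exactly this. A sketch (assuming an SSE-capable CPU and a compiler providing `xmmintrin.h`) of recovering sqrt from it; multiplying by x rather than dividing 1 by the result avoids the division entirely, since sqrt(x) = x * (1/sqrt(x)):

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Fast sqrt via the hardware reciprocal-square-root estimate.
   rsqrtss is only accurate to roughly 12 bits, so expect around
   0.04% relative error, not full float precision. */
float sse_sqrt(float x)
{
    float r;
    _mm_store_ss(&r, _mm_rsqrt_ss(_mm_set_ss(x)));
    return x * r;    /* sqrt(x) = x * (1/sqrt(x)) */
}
```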

##### Share on other sites
GRhodes and Honayboyz-
Yes, I was trying to make the point that for the first few terms the accuracy you get is worth it; however, if you start using too many terms, then you aren't optimizing intelligently. Does anyone know how a lookup table and interpolation would fare compared to a computational method?

##### Share on other sites
Are you sure you need square root? Can you live with distance squared and compare that? Just curious...

##### Share on other sites
Holy crap, you're a genius, I never thought of that. Why do I need the actual distance when I can just take the squares of all the differences and compare those? Thank you so much. And at that, do I even need to square them, or could I just take the absolute value, whichever is faster?
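For anyone finding this later, the squared-distance comparison looks like this. Note that for non-negative values a < b exactly when a² < b², so squared distances preserve the ordering of the real distances; summing absolute values instead gives the Manhattan distance, a different metric that can order points differently:

```c
#include <assert.h>

/* Squared Euclidean distance between two 2D points.
   No sqrt needed when you only compare distances. */
float dist_sq(float x1, float y1, float x2, float y2)
{
    float dx = x2 - x1;
    float dy = y2 - y1;
    return dx * dx + dy * dy;
}

/* Returns 1 if (px,py) is closer to (ax,ay) than to (bx,by). */
int closer_to_a(float px, float py,
                float ax, float ay,
                float bx, float by)
{
    return dist_sq(px, py, ax, ay) < dist_sq(px, py, bx, by);
}
```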

##### Share on other sites
If you find you do need a square root, here's an optimised sqrt function based on the Carmack InvSqrt. The magic number can probably be tweaked a little more if anybody fancies playing with it. I tested it on the full range of normalised floating point numbers. The maximum percentage error was 2.84658% and the average was 1.73863%.
```cpp
float mysqrt(float x)
{
	union
	{
		int intPart;
		float floatPart;
	} convertor;
	union
	{
		int intPart;
		float floatPart;
	} convertor2;
	convertor.floatPart = x;
	convertor2.floatPart = x;
	convertor.intPart = 0x1FBCF800 + (convertor.intPart >> 1);
	convertor2.intPart = 0x5f3759df - (convertor2.intPart >> 1);
	return 0.5f*(convertor.floatPart + (x * convertor2.floatPart));
}
```

I think the accuracy can be slightly improved by changing the 0.5f in the return statement into a 0.4999f, because the function always overestimates slightly. Again, this number can probably be tweaked. Here are the results I got for timing 500 000 000 sqrts:

sqrt (standard): >38s
fsqrt (asm): >38s
mysqrt: <20s

That includes function call and loop overheads.

Enigma
