optimize square root

Started by
13 comments, last by bzroom 20 years, 1 month ago
Is there a way I could optimize the square root function to be faster but less precise, like maybe to the nearest .5 or something? On some depth tests I'm doing with it I just need an approximation. Any ideas?
You might want to search for Carmack's square root, or something like that. Or look in the Quake 2 source for it, or if the search is working on the forum, try that.
It's called the Newton-Raphson method. I can find it in my calculus book (no code, though).


Search for the word "square" on this page:
http://www.realtimerendering.com/

It's about halfway down.
Interests: my money-pit car, computer hardware/programming, anything 3D, '93 RX-7
I would not recommend writing any of your own math functions. All of them will most likely be much slower than the ones that are built into FPUs and GPUs. If you want to write your own function, use something like Carmack's, because it exploits certain properties of floats according to IEEE standards. I would not use a power series or Newtonian approximation, because beyond the second or third approximation you will be running the function slower than sqrt or the asm instruction fsqrt. Another method you could use is a lookup table, interpolating between the values.

Brendan
"Mathematics is the Queen of the Sciences, and Arithmetic the Queen of Mathematics" -Gauss
quote:Original post by Punty50
I would not recommend writing any of your own math functions. All of them will most likely be much slower than the ones that are built into FPUs and GPUs. If you want to write your own function, use something like Carmack's, because it exploits certain properties of floats according to IEEE standards. I would not use a power series or Newtonian approximation, because beyond the second or third approximation you will be running the function slower than sqrt or the asm instruction fsqrt. Another method you could use is a lookup table, interpolating between the values.


If you can accept very low precision, then one or two highly optimized Newton-Raphson iterations might be precise enough, and could be faster than the FPU calcs for some functions.



Graham Rhodes
Senior Scientist
Applied Research Associates, Inc.
Graham Rhodes Moderator, Math & Physics forum @ gamedev.net
Don't many processors have rsq (reciprocal square root, 1/sqrt(x) ) instructions these days? You could surely divide that into 1 to get the square root itself back out again...

Richard "Superpig" Fine - saving pigs from untimely fates - Microsoft DirectX MVP 2006/2007/2008/2009
"Shaders are not meant to do everything. Of course you can try to use it for everything, but it's like playing football using cabbage." - MickeyMouse

GRhodes and Honayboyz-
Yes, I was trying to make the point that for the first few terms, the accuracy you get is worth it; however, if you start using too many terms, then you aren't optimizing intelligently. Does anyone know how a lookup table and interpolation would fare compared to a computational method?
Brendan"Mathematics is the Queen of the Sciences, and Arithmetic the Queen of Mathematics" -Gauss
Are you sure you need square root? Can you live with distance squared and compare that? Just curious...
Holy crap, you're a genius, I never thought of that. Why do I need the root when I can just take the squares of all the differences and compare those? Thank you so much. And at that, do I even need to square them, or could I just take the absolute value, whichever is faster?
If you find you do need a square root, here's an optimised sqrt function based on the Carmack InvSqrt. The magic number can probably be tweaked a little more if anybody fancies playing with it. I tested it on the full range of normalised floating point numbers. The maximum percentage error was 2.84658% and the average was 1.73863%.
float mysqrt(float x)
{
	union
	{
		int intPart;
		float floatPart;
	} convertor;
	union
	{
		int intPart;
		float floatPart;
	} convertor2;
	convertor.floatPart = x;
	convertor2.floatPart = x;
	convertor.intPart = 0x1FBCF800 + (convertor.intPart >> 1);
	convertor2.intPart = 0x5f3759df - (convertor2.intPart >> 1);
	return 0.5f*(convertor.floatPart + (x * convertor2.floatPart));
}


I think the accuracy can be slightly improved by changing the 0.5f in the return statement into a 0.4999f, because the function always overestimates slightly. Again, this number can probably be tweaked. Here are the results I got for timing 500 000 000 sqrts:

sqrt (standard): >38s
fsqrt (asm): >38s
mysqrt: <20s

That includes function call and loop overheads.

Enigma

This topic is closed to new replies.
