Fast sqrt()?

Started by
18 comments, last by d000hg 21 years, 1 month ago
I know this must have come up often but I''m in a rush! Can I do in software square roots to ~2% accuracy faster than calling sqrt() in MSVC++ thankyou! Read about my game, project #1 NEW (18th December)2 new screenshots, one from the engine and one from the level editor John 3:16
Advertisement
Depends on your hardware. Maybe if you do a look-up-table that contains approximate sqrts and then use one newton's iteration step to make it more accurate. Should work fine if the range isn't too large.

edit: linearly interpolating in the LUT would be pretty good too

But of course, using a LUT requires you to convert a float/double into int. I don't really know how slow that operation is in general.

[edited by - civguy on January 31, 2003 7:21:34 AM]
What range of values are you looking at?
If it''s small, you might be able to do a taylor approximation. Else use a lookup table and interpolate between it, using either linear or cubic spline interpolation.

- JQ
#define NULL 1
~phil
check out the downloads section on my site. there''s an assembly optimized square root. it''s pretty quick.

My Homepage
My HomepageSome shoot to kill, others shoot to mame. I say clear the chamber and let the lord decide. - Reno 911
NVidia''s fastmath sqrt is a LUT based approach; I see no asm here
Also, blowing all your L2 cache on a *drumroll* sqrt table is not conducive to performance, if you''re doing anything other than sqrt() (=> cache miss every time => pretty much anything is faster)

copy+paste from a previous post:
quote:
If you can''t avoid the sqrt, calculate it as x * rsqrt(x):
- on Athlons: pfrsqrt + pfmul (7 clock latency)
- otherwise, use a clever bit of code credited to Carmack:

  float InvSqrt (float x){	float xhalf = 0.5f*x;	int i = *(int*)&x	i = 0x5f3759df - (i >> 1);	x = *(float*)&i	x = x*(1.5f - xhalf*x*x);	return x;}  


What this does in a nutshell is approximate via one iteration of Newton''s method (the line with the magic number computes an initial guess, important for convergence).


HTH
Jan
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
That looks weird - I should know Newton-Raphson but it''s a while since I did it last - what''s the formula?

One situation I need to get a square root is solely for numbers in the range 0-1. This must be easy! I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)] but I have a feeling that casting is a slow operation? Would this be faster or slower, more or less accurate than that method above?

How accurate generally is the method above?


Read about my game, project #1
NEW (18th December)2 new screenshots, one from the engine and one from the level editor


John 3:16
quote:Original post by d000hg
How accurate generally is the method above?
Uh, perhaps you should try it out yourself instead of wait for breast feeding.

Also, caches aren''t so small that a tiny LUT would kill it. If the LUT is accessed often enough, it''ll stay in cache. If it''s not, then it''s not performance critical.
Solve F(y) = 1/y^2 - x, which is obviously 0 <==> |y| = x^(-1/2); Newton''s method will get you the positive root.

> How accurate generally is the method above?
Depends on your initial guess (I think the Taylor expansion for approximating the mantissa, i.e. the magic number step could be better). If it ain''t good enough, just add iterations.

> One situation I need to get a square root is solely for numbers in the range 0-1.
You could approximate by piecewise linear functions (3 ought to cut it).

> I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)]
That''ll work, but get rid of the cast (ends up as a call to _ftol(), which really sucks).
a) enable -QIfist compiler option (your code will break if compiler != VC)
b) write your own round() (either with fist or by extracting the mantissa from a double)

>> Would this be faster or slower, more or less accurate than that method above?
> Uh, perhaps you should try it out yourself instead of wait for breast feeding.
Agreed

quote:Also, caches aren''t so small that a tiny LUT would kill it. If the LUT is accessed often enough, it''ll stay in cache. If it''s not, then it''s not performance critical.

Of course. My comment referred to the NVidia fastmath table (256 KB), which is not so tiny
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
quote:

That looks weird - I should know Newton-Raphson but it''s a while since I did it last - what''s the formula?


Xnew = X - ((f''(x) / f(x))
What is it with people not wanting to give specific information? Thanks for the help but if someone knows how good a result is from experience there''s no need for me to test it is there - if you don''t know without testing yourself then just say so! I''m working to finish a major project in a limited period and don''t have the time to fiddle around unless I need to!

This topic is closed to new replies.

Advertisement