#### Archived

This topic is now archived and is closed to further replies.

# Fast sqrt()?

This topic is 5561 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I know this must have come up often but I''m in a rush! Can I do in software square roots to ~2% accuracy faster than calling sqrt() in MSVC++ thankyou!
Read about my game, project #1 NEW (18th December)2 new screenshots, one from the engine and one from the level editor
John 3:16

##### Share on other sites
Depends on your hardware. Maybe if you do a look-up-table that contains approximate sqrts and then use one newton's iteration step to make it more accurate. Should work fine if the range isn't too large.

edit: linearly interpolating in the LUT would be pretty good too

But of course, using a LUT requires you to convert a float/double into int. I don't really know how slow that operation is in general.

[edited by - civguy on January 31, 2003 7:21:34 AM]

##### Share on other sites
What range of values are you looking at?
If it''s small, you might be able to do a taylor approximation. Else use a lookup table and interpolate between it, using either linear or cubic spline interpolation.

- JQ
#define NULL 1

##### Share on other sites
check out the downloads section on my site. there''s an assembly optimized square root. it''s pretty quick.

My Homepage

##### Share on other sites
NVidia''s fastmath sqrt is a LUT based approach; I see no asm here
Also, blowing all your L2 cache on a *drumroll* sqrt table is not conducive to performance, if you''re doing anything other than sqrt() (=> cache miss every time => pretty much anything is faster)

copy+paste from a previous post:
quote:

If you can''t avoid the sqrt, calculate it as x * rsqrt(x):
- on Athlons: pfrsqrt + pfmul (7 clock latency)
- otherwise, use a clever bit of code credited to Carmack:

  float InvSqrt (float x){	float xhalf = 0.5f*x;	int i = *(int*)&x;	i = 0x5f3759df - (i >> 1);	x = *(float*)&i;	x = x*(1.5f - xhalf*x*x);	return x;}

What this does in a nutshell is approximate via one iteration of Newton''s method (the line with the magic number computes an initial guess, important for convergence).

HTH
Jan

##### Share on other sites
That looks weird - I should know Newton-Raphson but it''s a while since I did it last - what''s the formula?

One situation I need to get a square root is solely for numbers in the range 0-1. This must be easy! I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)] but I have a feeling that casting is a slow operation? Would this be faster or slower, more or less accurate than that method above?

How accurate generally is the method above?

NEW (18th December)2 new screenshots, one from the engine and one from the level editor

John 3:16

##### Share on other sites
quote:
Original post by d000hg
How accurate generally is the method above?
Uh, perhaps you should try it out yourself instead of wait for breast feeding.

Also, caches aren''t so small that a tiny LUT would kill it. If the LUT is accessed often enough, it''ll stay in cache. If it''s not, then it''s not performance critical.

##### Share on other sites
Solve F(y) = 1/y^2 - x, which is obviously 0 <==> |y| = x^(-1/2); Newton''s method will get you the positive root.

> How accurate generally is the method above?
Depends on your initial guess (I think the Taylor expansion for approximating the mantissa, i.e. the magic number step could be better). If it ain''t good enough, just add iterations.

> One situation I need to get a square root is solely for numbers in the range 0-1.
You could approximate by piecewise linear functions (3 ought to cut it).

> I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)]
That''ll work, but get rid of the cast (ends up as a call to _ftol(), which really sucks).
a) enable -QIfist compiler option (your code will break if compiler != VC)
b) write your own round() (either with fist or by extracting the mantissa from a double)

>> Would this be faster or slower, more or less accurate than that method above?
> Uh, perhaps you should try it out yourself instead of wait for breast feeding.
Agreed

quote:
Also, caches aren''t so small that a tiny LUT would kill it. If the LUT is accessed often enough, it''ll stay in cache. If it''s not, then it''s not performance critical.

Of course. My comment referred to the NVidia fastmath table (256 KB), which is not so tiny

##### Share on other sites
quote:

That looks weird - I should know Newton-Raphson but it''s a while since I did it last - what''s the formula?

Xnew = X - ((f''(x) / f(x))

##### Share on other sites
What is it with people not wanting to give specific information? Thanks for the help but if someone knows how good a result is from experience there''s no need for me to test it is there - if you don''t know without testing yourself then just say so! I''m working to finish a major project in a limited period and don''t have the time to fiddle around unless I need to!

• 23
• 10
• 19
• 15
• 14