• Advertisement

Archived

This topic is now archived and is closed to further replies.

Fast sqrt()?

This topic is 5470 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I know this must have come up often but I''m in a rush! Can I do in software square roots to ~2% accuracy faster than calling sqrt() in MSVC++ thankyou!
Read about my game, project #1 NEW (18th December)2 new screenshots, one from the engine and one from the level editor
John 3:16

Share this post


Link to post
Share on other sites
Advertisement
Depends on your hardware. Maybe if you do a look-up-table that contains approximate sqrts and then use one newton's iteration step to make it more accurate. Should work fine if the range isn't too large.

edit: linearly interpolating in the LUT would be pretty good too

But of course, using a LUT requires you to convert a float/double into int. I don't really know how slow that operation is in general.

[edited by - civguy on January 31, 2003 7:21:34 AM]

Share this post


Link to post
Share on other sites
What range of values are you looking at?
If it''s small, you might be able to do a taylor approximation. Else use a lookup table and interpolate between it, using either linear or cubic spline interpolation.

- JQ
#define NULL 1

Share this post


Link to post
Share on other sites
check out the downloads section on my site. there''s an assembly optimized square root. it''s pretty quick.

My Homepage

Share this post


Link to post
Share on other sites
NVidia''s fastmath sqrt is a LUT based approach; I see no asm here
Also, blowing all your L2 cache on a *drumroll* sqrt table is not conducive to performance, if you''re doing anything other than sqrt() (=> cache miss every time => pretty much anything is faster)

copy+paste from a previous post:
quote:

If you can''t avoid the sqrt, calculate it as x * rsqrt(x):
- on Athlons: pfrsqrt + pfmul (7 clock latency)
- otherwise, use a clever bit of code credited to Carmack:


  
float InvSqrt (float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i >> 1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}


What this does in a nutshell is approximate via one iteration of Newton''s method (the line with the magic number computes an initial guess, important for convergence).


HTH
Jan

Share this post


Link to post
Share on other sites
That looks weird - I should know Newton-Raphson but it''s a while since I did it last - what''s the formula?

One situation I need to get a square root is solely for numbers in the range 0-1. This must be easy! I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)] but I have a feeling that casting is a slow operation? Would this be faster or slower, more or less accurate than that method above?

How accurate generally is the method above?



Read about my game, project #1
NEW (18th December)2 new screenshots, one from the engine and one from the level editor



John 3:16

Share this post


Link to post
Share on other sites
quote:
Original post by d000hg
How accurate generally is the method above?
Uh, perhaps you should try it out yourself instead of wait for breast feeding.

Also, caches aren''t so small that a tiny LUT would kill it. If the LUT is accessed often enough, it''ll stay in cache. If it''s not, then it''s not performance critical.

Share this post


Link to post
Share on other sites
Solve F(y) = 1/y^2 - x, which is obviously 0 <==> |y| = x^(-1/2); Newton''s method will get you the positive root.

> How accurate generally is the method above?
Depends on your initial guess (I think the Taylor expansion for approximating the mantissa, i.e. the magic number step could be better). If it ain''t good enough, just add iterations.

> One situation I need to get a square root is solely for numbers in the range 0-1.
You could approximate by piecewise linear functions (3 ought to cut it).

> I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)]
That''ll work, but get rid of the cast (ends up as a call to _ftol(), which really sucks).
a) enable -QIfist compiler option (your code will break if compiler != VC)
b) write your own round() (either with fist or by extracting the mantissa from a double)

>> Would this be faster or slower, more or less accurate than that method above?
> Uh, perhaps you should try it out yourself instead of wait for breast feeding.
Agreed

quote:
Also, caches aren''t so small that a tiny LUT would kill it. If the LUT is accessed often enough, it''ll stay in cache. If it''s not, then it''s not performance critical.

Of course. My comment referred to the NVidia fastmath table (256 KB), which is not so tiny

Share this post


Link to post
Share on other sites
quote:


That looks weird - I should know Newton-Raphson but it''s a while since I did it last - what''s the formula?



Xnew = X - ((f''(x) / f(x))

Share this post


Link to post
Share on other sites
What is it with people not wanting to give specific information? Thanks for the help but if someone knows how good a result is from experience there''s no need for me to test it is there - if you don''t know without testing yourself then just say so! I''m working to finish a major project in a limited period and don''t have the time to fiddle around unless I need to!

Share this post


Link to post
Share on other sites
this is my Newton-Rhapson method "optimized":

__inline double sqrt_newton(double c)
{
// x0 = 1
// f(x) = x^2 - c
// f''(x) = 2*x
double x = 1;

// 10 iterations (float Min=4 Max=15)
x -= ((SQR(x) - c) / (x+x));
x -= ((SQR(x) - c) / (x+x));
x -= ((SQR(x) - c) / (x+x));
x -= ((SQR(x) - c) / (x+x));

return x;
}

SQR is a define for x*x

Share this post


Link to post
Share on other sites
the 4 iterations can be replaced by this others:

x = ((SQR(x) + c)/(x+x));
x = ((SQR(x) + c)/(x+x));
x = ((SQR(x) + c)/(x+x));
x = ((SQR(x) + c)/(x+x));

Share this post


Link to post
Share on other sites
> What is it with people not wanting to give specific information?
Dude, if I had happened to know the exact accuracy of both methods, for the entire input domain, I would have gladly told you. Get real.

Did you see my statement (or get the idea from q2guy''s posts) that you can have results as accurate as you like?

Share this post


Link to post
Share on other sites
Hey man, I just thought someone who''d faced this problem before might have tested it. If people said ''I dunno, you''d have to test it '' rather than ''test it instead of having to be baby fed with answers'' then I''d be fine with that! The whole point of good programming is to avoid reinventing the wheel I thought!

Anyway... yeah I see you can just keep looping until the result squared is sufficently close to the original e.g

float SQRT(float f,float accuaracy)
{
//accuaracy is in range 0...1
float val=1,inv_f=1/f;
while(val*val*inv_f > accuracy)
val = (val*val + f)/(val+val));
return val;
}


I don''t have a compiler on this machine (at uni) so I can''t test it but it should work. Is it sensible in terms of performance or would the loop mean I''d be just as well off using a normal sqrt() function? Obviously this''d depend on the accuracy value but I guess I want 1-10%. And if I know the value I''m rooting is small I can probably fixed-code how many loops there are too like q2guy does. How''s this approach for arbritary numbers d''you think i.e 100s or 1000s?

Share this post


Link to post
Share on other sites
> Hey man, I just thought someone who''d faced this problem before might have tested it.
I have, but not an exhaustive check, and it was long ago. Besides, shouldn''t take all that long.

> The whole point of good programming is to avoid reinventing the wheel I thought!
Reinventing the wheel would, in this case, entail coming up with this rather clever code.

quote:
Is it sensible in terms of performance or would the loop mean I''d be just as well off using a normal sqrt() function?

Come on. If you''re worried about performance, measure it.

> I can probably fixed-code how many loops there are too like q2guy does.
Yes. I think a loop is unnecessary.

> How''s this approach for arbritary numbers d''you think i.e 100s or 1000s?
The bigger inaccuracy is probably the mantissa of the initial guess. Anyway, test it!

500x2

Share this post


Link to post
Share on other sites
d000hg, it didn''t work...

I converted q2guy''s method to ASM, but it is slower then the Cpp function...

You could download my asm version here: www.illicitdev.net/~matrixvb/dload/asm_sqr.zip

Share this post


Link to post
Share on other sites
If you''re gonna use assembly for fast sqrt why not just do the following:

double fsqrtd(double x) {
__asm {
fld x;
fsqrt;
fst x;
}
return x;
}

float fsqrt(float x) {
__asm {
fld x;
fsqrt;
fst x;
}
return x;
}

This is one of the fastest ways to calculate the square root of a number.

-- Exitus Acta Probat --

Share this post


Link to post
Share on other sites
Because an SSE / 3DNow rsqrt can be had in 3 clocks (follow this with a multiply, if you need the real square root, for a grand total of 7 clocks), whereas fsqrt takes 25 clocks.
(timings for Athlon XP CPUs)

Share this post


Link to post
Share on other sites
Jan, do you know some good ways to get the recipinial (or somthing like that ) square root?

Do I have to multiply the result of rsqrt with the value passed in to get the real square root?

Share this post


Link to post
Share on other sites
reciprocal of x := 1 / x (Kehrwert).
r(eciprocal) sqrt of x (calculated with 3DNow! pfrsqrt instruction or SSE rsqrtss) * x = sqrt(x).

Share this post


Link to post
Share on other sites

  • Advertisement