Fast sqrt()?
I know this must have come up often but I''m in a rush! Can I do in software square roots to ~2% accuracy faster than calling sqrt() in MSVC++
thankyou!
Read about my game, project #1
NEW (18th December)2 new screenshots, one from the engine and one from the level editor
John 3:16
Depends on your hardware. Maybe if you do a look-up-table that contains approximate sqrts and then use one newton's iteration step to make it more accurate. Should work fine if the range isn't too large.
edit: linearly interpolating in the LUT would be pretty good too
But of course, using a LUT requires you to convert a float/double into int. I don't really know how slow that operation is in general.
[edited by - civguy on January 31, 2003 7:21:34 AM]
edit: linearly interpolating in the LUT would be pretty good too
But of course, using a LUT requires you to convert a float/double into int. I don't really know how slow that operation is in general.
[edited by - civguy on January 31, 2003 7:21:34 AM]
What range of values are you looking at?
If it''s small, you might be able to do a taylor approximation. Else use a lookup table and interpolate between it, using either linear or cubic spline interpolation.
- JQ
If it''s small, you might be able to do a taylor approximation. Else use a lookup table and interpolate between it, using either linear or cubic spline interpolation.
- JQ
#define NULL 1
check out the downloads section on my site. there''s an assembly optimized square root. it''s pretty quick.
My Homepage
My Homepage
NVidia''s fastmath sqrt is a LUT based approach; I see no asm here
Also, blowing all your L2 cache on a *drumroll* sqrt table is not conducive to performance, if you''re doing anything other than sqrt() (=> cache miss every time => pretty much anything is faster)
copy+paste from a previous post:
What this does in a nutshell is approximate via one iteration of Newton''s method (the line with the magic number computes an initial guess, important for convergence).
HTH
Jan
Also, blowing all your L2 cache on a *drumroll* sqrt table is not conducive to performance, if you''re doing anything other than sqrt() (=> cache miss every time => pretty much anything is faster)
copy+paste from a previous post:
quote:
If you can''t avoid the sqrt, calculate it as x * rsqrt(x):
- on Athlons: pfrsqrt + pfmul (7 clock latency)
- otherwise, use a clever bit of code credited to Carmack:
float InvSqrt (float x){ float xhalf = 0.5f*x; int i = *(int*)&x i = 0x5f3759df - (i >> 1); x = *(float*)&i x = x*(1.5f - xhalf*x*x); return x;}
What this does in a nutshell is approximate via one iteration of Newton''s method (the line with the magic number computes an initial guess, important for convergence).
HTH
Jan
That looks weird - I should know Newton-Raphson but it''s a while since I did it last - what''s the formula?
One situation I need to get a square root is solely for numbers in the range 0-1. This must be easy! I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)] but I have a feeling that casting is a slow operation? Would this be faster or slower, more or less accurate than that method above?
How accurate generally is the method above?
Read about my game, project #1
NEW (18th December)2 new screenshots, one from the engine and one from the level editor
John 3:16
One situation I need to get a square root is solely for numbers in the range 0-1. This must be easy! I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)] but I have a feeling that casting is a slow operation? Would this be faster or slower, more or less accurate than that method above?
How accurate generally is the method above?
Read about my game, project #1
NEW (18th December)2 new screenshots, one from the engine and one from the level editor
John 3:16
quote:Original post by d000hgUh, perhaps you should try it out yourself instead of wait for breast feeding.
How accurate generally is the method above?
Also, caches aren''t so small that a tiny LUT would kill it. If the LUT is accessed often enough, it''ll stay in cache. If it''s not, then it''s not performance critical.
Solve F(y) = 1/y^2 - x, which is obviously 0 <==> |y| = x^(-1/2); Newton''s method will get you the positive root.
> How accurate generally is the method above?
Depends on your initial guess (I think the Taylor expansion for approximating the mantissa, i.e. the magic number step could be better). If it ain''t good enough, just add iterations.
> One situation I need to get a square root is solely for numbers in the range 0-1.
You could approximate by piecewise linear functions (3 ought to cut it).
> I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)]
That''ll work, but get rid of the cast (ends up as a call to _ftol(), which really sucks).
a) enable -QIfist compiler option (your code will break if compiler != VC)
b) write your own round() (either with fist or by extracting the mantissa from a double)
>> Would this be faster or slower, more or less accurate than that method above?
> Uh, perhaps you should try it out yourself instead of wait for breast feeding.
Agreed
Of course. My comment referred to the NVidia fastmath table (256 KB), which is not so tiny
> How accurate generally is the method above?
Depends on your initial guess (I think the Taylor expansion for approximating the mantissa, i.e. the magic number step could be better). If it ain''t good enough, just add iterations.
> One situation I need to get a square root is solely for numbers in the range 0-1.
You could approximate by piecewise linear functions (3 ought to cut it).
> I could use a table with 100 entries and do sqrt(x)=table[(int)(x*100)]
That''ll work, but get rid of the cast (ends up as a call to _ftol(), which really sucks).
a) enable -QIfist compiler option (your code will break if compiler != VC)
b) write your own round() (either with fist or by extracting the mantissa from a double)
>> Would this be faster or slower, more or less accurate than that method above?
> Uh, perhaps you should try it out yourself instead of wait for breast feeding.
Agreed
quote:Also, caches aren''t so small that a tiny LUT would kill it. If the LUT is accessed often enough, it''ll stay in cache. If it''s not, then it''s not performance critical.
Of course. My comment referred to the NVidia fastmath table (256 KB), which is not so tiny
quote:
That looks weird - I should know Newton-Raphson but it''s a while since I did it last - what''s the formula?
Xnew = X - ((f''(x) / f(x))
What is it with people not wanting to give specific information? Thanks for the help but if someone knows how good a result is from experience there''s no need for me to test it is there - if you don''t know without testing yourself then just say so! I''m working to finish a major project in a limited period and don''t have the time to fiddle around unless I need to!
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement