Sign in to follow this  

sqrt question

This topic is 4553 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I know using sqrt is super slow, but sometimes it's unavoidable (like when finding the magnitude of a vector). I always use the standard sqrt function and just realized i've never seen how it is implemented, yay black box programming. Anyways I was wondering if anyone knew what the implementation did to calculate square roots. I'm assuming it's about as optimized as it will get, but i'd like to see how it is done. Thanks if anyone can put me in the know.

Share this post


Link to post
Share on other sites
Most of the compilers will just call the native hardware implementation (whatever sqrt function the CPU has).
For the CPUs without a FPU, it will emulate it, but it should be resonably fast.

Share this post


Link to post
Share on other sites
Two easy ways to calculate an approximation of the square root is use either the first few terms of its taylor series expanded about 1 (if the nonnegative value you are taking the square root of is less than 1) or use newtons method.

Share this post


Link to post
Share on other sites
Quote:
Original post by Grain
Are there any faster sqrt functions floating around out there other than the one you get from math.h?


The cmath version can usually be replcaed by a processor intrinsic, which uses a hardware lookup table and is pretty snappy.

Share this post


Link to post
Share on other sites
Quote:
Original post by Grain
Are there any faster sqrt functions floating around out there other than the one you get from math.h?


If you have an SSE code path:
inline float sqrt(float x)
{
_asm
{
movss xmm0, x
rsqrtss xmm0, xmm0
rcpss xmm0, xmm0
movss x, xmm0
}
return x;
}


I did some tests between the CRTs "sqrtf" and the SSE code above.. and I found the results (10,000,000 iterations, random values ranging from 0..0xFFFFFFFF modulo 0xff, 0xfff, 0xffff, 0xfffff, etc. whole numbers only)

0..0xff showed a +/- difference of about 0.001
0..0xfff showed a +/- difference of about 0.005
0..0xffff showed a +/- difference of about 0.026
0..0xfffff showed a +/- difference of about 0.101

and it went up from there. this probably has to do with floating point error and internal optimizations that the SSE instructions do in trade-off for speed. For 10,000,000 iterations (I tried to make both speed tests uniform without favoring either one cache-wise), I found the SSE method to be between 30% and 40% faster than the built-in "sqrt" function (on a 2.8C, P4P800-VM) when asked to sqrt random numbers between 0 and 0xffffffff.

Share this post


Link to post
Share on other sites
I've got one. (Really easy if you can implement it in hw).

You get each bit, and you move it to the bit thats at position (b/2).

at most its an add (if you want to, for added precision), if not, it doesn't ned a single transister.

For eg, 4
4 = 2^2
sqr(2^2) = 2^1 = 2
Move the bit from 2^2 to 2^1 and there ya go

3 = 2^2 + 2^1
sqr(3) = 2^1 + 2^0.5
= 2 + ~1 ~= 3

Thats the best and the worst. (its usually pretty good tho).

Hw implementations can be a little... complex tho.

From,
Nice coder

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder
I've got one. (Really easy if you can implement it in hw).

You get each bit, and you move it to the bit thats at position (b/2).

at most its an add (if you want to, for added precision), if not, it doesn't ned a single transister.

For eg, 4
4 = 2^2
sqr(2^2) = 2^1 = 2
Move the bit from 2^2 to 2^1 and there ya go

3 = 2^2 + 2^1
sqr(3) = 2^1 + 2^0.5
= 2 + ~1 ~= 3

Thats the best and the worst. (its usually pretty good tho).

Hw implementations can be a little... complex tho.

From,
Nice coder


This idea is terribly useless. It only performs square roots of integers and returns an integral result. Therefore sqrt(1) = sqrt(2) = sqrt(3) = 1. sqrt(4) = 2. sqrt(5) = sqrt(6) = sqrt(7) = sqrt(8) = sqrt(9) = 3, etc. A hardware implementation that simply used bit repositioning would have each input bit hardwired to an output bit which would be a rather easy implementation, however the results of this would be entirely inaccurate. Or if they add you get results like sqrt(7) = 5. That's not useful for anything, that doesn't even remotely resemble the correct answer.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Do you really have to take the sqrt? There are times that you absolutly have to but if you are comparing if something is further than something else just compare the squared versions of the magnitudes.

Share this post


Link to post
Share on other sites

This topic is 4553 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this