Public Group

# sqrt question

This topic is 4774 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I know using sqrt is super slow, but sometimes it's unavoidable (like when finding the magnitude of a vector). I always use the standard sqrt function and just realized i've never seen how it is implemented, yay black box programming. Anyways I was wondering if anyone knew what the implementation did to calculate square roots. I'm assuming it's about as optimized as it will get, but i'd like to see how it is done. Thanks if anyone can put me in the know.

##### Share on other sites
Most of the compilers will just call the native hardware implementation (whatever sqrt function the CPU has).
For the CPUs without a FPU, it will emulate it, but it should be resonably fast.

##### Share on other sites
Two easy ways to calculate an approximation of the square root is use either the first few terms of its taylor series expanded about 1 (if the nonnegative value you are taking the square root of is less than 1) or use newtons method.

##### Share on other sites
Are there any faster sqrt functions floating around out there other than the one you get from math.h?

##### Share on other sites
Quote:
 Original post by GrainAre there any faster sqrt functions floating around out there other than the one you get from math.h?

The cmath version can usually be replcaed by a processor intrinsic, which uses a hardware lookup table and is pretty snappy.

##### Share on other sites
Quote:
 Original post by GrainAre there any faster sqrt functions floating around out there other than the one you get from math.h?

If you have an SSE code path:
inline float sqrt(float x)
{
_asm
{
movss xmm0, x
rsqrtss xmm0, xmm0
rcpss xmm0, xmm0
movss x, xmm0
}
return x;
}

I did some tests between the CRTs "sqrtf" and the SSE code above.. and I found the results (10,000,000 iterations, random values ranging from 0..0xFFFFFFFF modulo 0xff, 0xfff, 0xffff, 0xfffff, etc. whole numbers only)

0..0xff showed a +/- difference of about 0.001
0..0xfff showed a +/- difference of about 0.005
0..0xffff showed a +/- difference of about 0.026
0..0xfffff showed a +/- difference of about 0.101

and it went up from there. this probably has to do with floating point error and internal optimizations that the SSE instructions do in trade-off for speed. For 10,000,000 iterations (I tried to make both speed tests uniform without favoring either one cache-wise), I found the SSE method to be between 30% and 40% faster than the built-in "sqrt" function (on a 2.8C, P4P800-VM) when asked to sqrt random numbers between 0 and 0xffffffff.

##### Share on other sites
wow, that's perhaps the most thorough and impressive response i've ever seen to a post. thank you very much!

##### Share on other sites
I've got one. (Really easy if you can implement it in hw).

You get each bit, and you move it to the bit thats at position (b/2).

at most its an add (if you want to, for added precision), if not, it doesn't ned a single transister.

For eg, 4
4 = 2^2
sqr(2^2) = 2^1 = 2
Move the bit from 2^2 to 2^1 and there ya go

3 = 2^2 + 2^1
sqr(3) = 2^1 + 2^0.5
= 2 + ~1 ~= 3

Thats the best and the worst. (its usually pretty good tho).

Hw implementations can be a little... complex tho.

From,
Nice coder

##### Share on other sites
Quote:
 Original post by Nice CoderI've got one. (Really easy if you can implement it in hw).You get each bit, and you move it to the bit thats at position (b/2).at most its an add (if you want to, for added precision), if not, it doesn't ned a single transister.For eg, 44 = 2^2sqr(2^2) = 2^1 = 2Move the bit from 2^2 to 2^1 and there ya go3 = 2^2 + 2^1sqr(3) = 2^1 + 2^0.5= 2 + ~1 ~= 3Thats the best and the worst. (its usually pretty good tho).Hw implementations can be a little... complex tho.From,Nice coder

This idea is terribly useless. It only performs square roots of integers and returns an integral result. Therefore sqrt(1) = sqrt(2) = sqrt(3) = 1. sqrt(4) = 2. sqrt(5) = sqrt(6) = sqrt(7) = sqrt(8) = sqrt(9) = 3, etc. A hardware implementation that simply used bit repositioning would have each input bit hardwired to an output bit which would be a rather easy implementation, however the results of this would be entirely inaccurate. Or if they add you get results like sqrt(7) = 5. That's not useful for anything, that doesn't even remotely resemble the correct answer.

##### Share on other sites
Do you really have to take the sqrt? There are times that you absolutly have to but if you are comparing if something is further than something else just compare the squared versions of the magnitudes.

1. 1
2. 2
Rutin
19
3. 3
4. 4
5. 5

• 14
• 30
• 13
• 11
• 11
• ### Forum Statistics

• Total Topics
631782
• Total Posts
3002324
×