Morpheus011 309 Report post Posted June 23, 2005 I know using sqrt is super slow, but sometimes it's unavoidable (like when finding the magnitude of a vector). I always use the standard sqrt function and just realized i've never seen how it is implemented, yay black box programming. Anyways I was wondering if anyone knew what the implementation did to calculate square roots. I'm assuming it's about as optimized as it will get, but i'd like to see how it is done. Thanks if anyone can put me in the know. 0 Share this post Link to post Share on other sites
Raduprv 997 Report post Posted June 23, 2005 Most of the compilers will just call the native hardware implementation (whatever sqrt function the CPU has).For the CPUs without a FPU, it will emulate it, but it should be resonably fast. 0 Share this post Link to post Share on other sites
Mastaba 761 Report post Posted June 23, 2005 Two easy ways to calculate an approximation of the square root is use either the first few terms of its taylor series expanded about 1 (if the nonnegative value you are taking the square root of is less than 1) or use newtons method. 0 Share this post Link to post Share on other sites
Grain 500 Report post Posted June 24, 2005 Are there any faster sqrt functions floating around out there other than the one you get from math.h? 0 Share this post Link to post Share on other sites
Promit 13259 Report post Posted June 24, 2005 Quote:Original post by GrainAre there any faster sqrt functions floating around out there other than the one you get from math.h?The cmath version can usually be replcaed by a processor intrinsic, which uses a hardware lookup table and is pretty snappy. 0 Share this post Link to post Share on other sites
sordid 246 Report post Posted June 24, 2005 Quote:Original post by GrainAre there any faster sqrt functions floating around out there other than the one you get from math.h?If you have an SSE code path:inline float sqrt(float x){ _asm { movss xmm0, x rsqrtss xmm0, xmm0 rcpss xmm0, xmm0 movss x, xmm0 } return x;}I did some tests between the CRTs "sqrtf" and the SSE code above.. and I found the results (10,000,000 iterations, random values ranging from 0..0xFFFFFFFF modulo 0xff, 0xfff, 0xffff, 0xfffff, etc. whole numbers only)0..0xff showed a +/- difference of about 0.0010..0xfff showed a +/- difference of about 0.0050..0xffff showed a +/- difference of about 0.0260..0xfffff showed a +/- difference of about 0.101and it went up from there. this probably has to do with floating point error and internal optimizations that the SSE instructions do in trade-off for speed. For 10,000,000 iterations (I tried to make both speed tests uniform without favoring either one cache-wise), I found the SSE method to be between 30% and 40% faster than the built-in "sqrt" function (on a 2.8C, P4P800-VM) when asked to sqrt random numbers between 0 and 0xffffffff. 0 Share this post Link to post Share on other sites
Morpheus011 309 Report post Posted June 24, 2005 wow, that's perhaps the most thorough and impressive response i've ever seen to a post. thank you very much! 0 Share this post Link to post Share on other sites
Nice Coder 366 Report post Posted June 24, 2005 I've got one. (Really easy if you can implement it in hw).You get each bit, and you move it to the bit thats at position (b/2).at most its an add (if you want to, for added precision), if not, it doesn't ned a single transister.For eg, 44 = 2^2sqr(2^2) = 2^1 = 2Move the bit from 2^2 to 2^1 and there ya go3 = 2^2 + 2^1sqr(3) = 2^1 + 2^0.5= 2 + ~1 ~= 3Thats the best and the worst. (its usually pretty good tho).Hw implementations can be a little... complex tho.From,Nice coder 0 Share this post Link to post Share on other sites
jperalta 356 Report post Posted June 24, 2005 Quote:Original post by Nice CoderI've got one. (Really easy if you can implement it in hw).You get each bit, and you move it to the bit thats at position (b/2).at most its an add (if you want to, for added precision), if not, it doesn't ned a single transister.For eg, 44 = 2^2sqr(2^2) = 2^1 = 2Move the bit from 2^2 to 2^1 and there ya go3 = 2^2 + 2^1sqr(3) = 2^1 + 2^0.5= 2 + ~1 ~= 3Thats the best and the worst. (its usually pretty good tho).Hw implementations can be a little... complex tho.From,Nice coderThis idea is terribly useless. It only performs square roots of integers and returns an integral result. Therefore sqrt(1) = sqrt(2) = sqrt(3) = 1. sqrt(4) = 2. sqrt(5) = sqrt(6) = sqrt(7) = sqrt(8) = sqrt(9) = 3, etc. A hardware implementation that simply used bit repositioning would have each input bit hardwired to an output bit which would be a rather easy implementation, however the results of this would be entirely inaccurate. Or if they add you get results like sqrt(7) = 5. That's not useful for anything, that doesn't even remotely resemble the correct answer. 0 Share this post Link to post Share on other sites
Guest Anonymous Poster Report post Posted June 27, 2005 Do you really have to take the sqrt? There are times that you absolutly have to but if you are comparing if something is further than something else just compare the squared versions of the magnitudes. 0 Share this post Link to post Share on other sites
nmi 978 Report post Posted June 27, 2005 Here you can find an integer sqrt algorithm:http://astronomy.swin.edu.au/~pbourke/analysis/sqrt/You can also use it for fixed point numbers in a modified form, and also for floating point numbers (on systems which have no fpu). 0 Share this post Link to post Share on other sites