
Fast sqrt for 64bit


The following code is a very fast way to compute sqrt, but it only works in a 32-bit application.

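// MSVC, 32-bit x86 only: naked function using inline x87 assembly.
double inline __declspec (naked) __fastcall sqrt_asm(double n)
{
    _asm fld qword ptr [esp+4]  // push n (first stack argument) onto the x87 stack
    _asm fsqrt                  // st(0) = sqrt(st(0))
    _asm ret 8                  // result stays in st(0); pop the 8-byte argument
}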

What is the equivalent (in speed) for a 64-bit application?


#include <emmintrin.h>

_mm_sqrt_ss

(or actually _mm_sqrt_sd as your version uses doubles).

 

In 64-bit code, floating-point values live in the XMM registers by default (so the old x87 FPU used in your 32-bit version isn't normally used at all).
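A minimal sketch of what that could look like for a double, assuming an x86-64 compiler that provides the SSE2 intrinsics header. The name sqrt_sse2 is made up for illustration; only the intrinsics themselves come from <emmintrin.h>.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: _mm_set_sd, _mm_sqrt_sd, _mm_cvtsd_f64

// Hypothetical helper (name is an assumption): scalar double-precision
// square root using the SQRTSD instruction via intrinsics.
static inline double sqrt_sse2(double n)
{
    __m128d v = _mm_set_sd(n);   // low lane = n, high lane = 0.0
    v = _mm_sqrt_sd(v, v);       // low lane = sqrt(low lane of 2nd argument)
    return _mm_cvtsd_f64(v);     // extract the low lane as a double
}
```

No inline assembly is needed, which matters because MSVC does not accept _asm blocks when targeting x64.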

Edited by Erik Rufelt



Fast compared to what? std::sqrt should be just as fast, unless I am missing something.
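To illustrate the point: optimizing x86-64 compilers commonly turn std::sqrt on a double into a single SQRTSD instruction (GCC and Clang may also keep a library-call fallback for errno handling unless built with -fno-math-errno). A plain wrapper like the sketch below is therefore often all that is needed. The name sqrt_std is an assumption for illustration.

```cpp
#include <cmath>  // std::sqrt

// Hypothetical wrapper (name is an assumption): relies on the compiler
// to pick the hardware square-root instruction for the target.
inline double sqrt_std(double n)
{
    return std::sqrt(n);
}
```

Checking the generated assembly for your own compiler and flags is the only way to be sure what it emits.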



The SQRTSS instruction is much faster than the legacy FSQRT instruction.

 

Technically, for a single sqrt calculation taken in isolation, it's probably about 2x faster for single precision and about the same for double precision (going by Intel's instruction listings https://www-ssl.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf and the measurements here: http://www.agner.org/optimize/instruction_tables.pdf).

(Though the surrounding code will, of course, probably be faster overall.)


 

float Sqrt(float x){
    int i = *(int*)&x;
    i = 0x5f3759df - (i>>1);
    float r = *(float*)&i;
    r = r*(1.5f - 0.5f*x*r*r);
    return r * x;
}
Just don't ask me how it works...

 


That code computes a fast approximation to 1/sqrt(x) and multiplies it by x.

 

On modern CPUs, reinterpreting a float as an int and back forces the data to move between different register files, and potentially through memory, which can cause stalls.

I wouldn't be surprised if it performs poorly today.
