# Fast sqrt for 64-bit

The following code is a very fast way to compute sqrt, but it only works in a 32-bit application.

double inline __declspec (naked) __fastcall sqrt_asm(double n)
{
_asm fld qword ptr [esp+4]
_asm fsqrt
_asm ret 8
}


What is the equivalent (in speed) for a 64-bit application?

#include <emmintrin.h>

_mm_sqrt_ss

(or rather _mm_sqrt_sd, since your version uses doubles).

In 64-bit code, floating-point values are held in the XMM registers by default (so the legacy x87 FPU used by your 32-bit version normally isn't used at all).

Fast compared to what? std::sqrt should be just as fast, unless I am missing something.

Here it shows that you get better performance than the standard sqrt function. I have no idea whether that is still true for 64-bit.

The SQRTSS instruction is much faster than the legacy FSQRT instruction.

Technically, for a single sqrt calculation on its own, it's probably 2x faster for single precision and about the same for double precision (per Intel's instruction listings https://www-ssl.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf and the measurements here: http://www.agner.org/optimize/instruction_tables.pdf).

(Though the surrounding code will of course probably run faster overall.)

float Sqrt(float x){
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
float r = *(float*)&i;
r = r*(1.5f - 0.5f*x*r*r);
return r * x;
}
Just don't ask me how it works...

That code computes a fast approximation to 1/sqrt(x) and multiplies it by x.

On modern CPUs, casting from float to int and back moves data between different register files, and potentially through memory, which can cause stalls.

I wouldn't be surprised if it performs poorly today.