Core 2 Duo
div a/b 20 cycles
sqrt(a) 60 cycles
implicit
It's not a bit hack, it's a reasonable implementation of some algorithm. The standard organisation made FP64 in a way to permit algorithms like this.
Though quite a lot of people doesn't have the right education to know these algorithms (from experience).
Optimized invSqrt( double )
Quote:Original post by RagharDude, if the way the initial guess is made by shifting the exponent and mantissa together and adjusting the result with a magic constant *isn't* a bithack then I don't know what is. I seriously doubt that the IEEE committee had this kind of abuse in mind when choosing their format, or anything much beyond the ability to easily compare floats as integers really.
It's not a bit hack, it's a reasonable implementation of some algorithm. The standard organisation made FP64 in a way to permit algorithms like this.
Though quite a lot of people doesn't have the right education to know these algorithms (from experience).
One could do sqrt(1/x) = 2^((1/2)*log2(1/x)) = 2^(-(1/2)*log2(x)), which should be pretty fast assuming the standard library has implemented fast exponents and logarithms.
On my machine (AMD Sempron 1.91GHz) also, invSqrt() runs 20% slower than 1/sqrt(). I'm using Visual Studio 2005, invSqrt() takes 33.98 ns in average while 1/sqrt() takes 28.53 ns.
And I disassembled the code generated by compiler and confirmed that 1/sqrt() is implemented by fsqrt and fdiv instructions.
What the hell is going on?
And yes, floating point model is set to /fp:precise
And I disassembled the code generated by compiler and confirmed that 1/sqrt() is implemented by fsqrt and fdiv instructions.
What the hell is going on?
And yes, floating point model is set to /fp:precise
Well, I don't know if that will make a difference but here is the code that I did find:
I used ( long long ) instead of __int64 for GCC.
float Q_rsqrt( float number ){ long i; float x2, y; const float threehalfs = 1.5F; x2 = number * 0.5F; y = number; i = * ( long * ) &y; // evil floating point bit level hacking i = 0x5f3759df - ( i >> 1 ); // what the fuck? y = * ( float * ) &i; y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed #ifndef Q3_VM #ifdef __linux__ assert( !isnan(y) ); // bk010122 - FPE? #endif #endif return y;} double InvSqrt(double number){ __int64 i; double x2, y; const double threehalfs = 1.5F; x2 = number * 0.5F; y = number; i = * ( __int64 * ) &y; i = 0x5fe6ec85e7de30da - ( i >> 1 ); y = * ( double * ) &i; y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed return y;}
I used ( long long ) instead of __int64 for GCC.
Quote:Original post by rozz666
As for the pointer casts - it's undefined behaviour. It's undefined what will happen. Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.
Can't you get the same result without the undefined behavior by using a union instead of a pointer cast?
Quote:Original post by rozz666Enter: Unit tests ;)
Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.
Set up your build system to run the unit tests as a post-compile step, then if the code is broken it will show up as a compile-time error.
Quote:Original post by lolQuote:Original post by rozz666
As for the pointer casts - it's undefined behaviour. It's undefined what will happen. Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.
Can't you get the same result without the undefined behavior by using a union instead of a pointer cast?
No. Accessing a union through alternate members is undefined behavior. Once you set a union via a certain member you're only supposed to access it through that member.
Using unions to convert between types is an all too common abuse which few people know about.
Back on topic, any processor that supports SSE already has an inverse square root instruction built in, RSQRTSS. You can access it via a compiler intrinsic (_mm_rsqrt_ss) in MSVC 2008. However, even using the old-style FPU math will outperform Quake's inverse square root on any modern processors.
This kind of thing is a micro-optimization that shouldn't be performed until you've determined, with a profiler, that you actually need it. Which means the profiler is reporting that you're spending 80% of your time on sqrt calls. Writing out a square root function using integer math and bit hacks is a really bad idea, as it'll only turn out much slower than the built-in instructions on modern processors.
Quote:Original post by RaQuote:Original post by lolQuote:Original post by rozz666
As for the pointer casts - it's undefined behaviour. It's undefined what will happen. Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.
Can't you get the same result without the undefined behavior by using a union instead of a pointer cast?
No. Accessing a union through alternate members is undefined behavior. Once you set a union via a certain member you're only supposed to access it through that member.
Using unions to convert between types is an all too common abuse which few people know about.
Back on topic, any processor that supports SSE already has an inverse square root instruction built in, RSQRTSS. You can access it via a compiler intrinsic (_mm_rsqrt_ss) in MSVC 2008. However, even using the old-style FPU math will outperform Quake's inverse square root on any modern processors.
This kind of thing is a micro-optimization that shouldn't be performed until you've determined, with a profiler, that you actually need it. Which means the profiler is reporting that you're spending 80% of your time on sqrt calls. Writing out a square root function using integer math and bit hacks is a really bad idea, as it'll only turn out much slower than the built-in instructions on modern processors.
As far as I know this rsqrt_ss is using a lookup table which is too unprecise for most people's need.
Did there ever someone represent the square root as a cubic or quadratic spline and implement the evaluation with sse?
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement