Optimized invSqrt( double )
Well, I'm looking for the double version of the famous function invSqrt() of the Quake 3 Engine. Could someone provide it?
Thanks.
[Edited by - johnstanp on July 5, 2008 6:41:02 PM]
Well, using google, I've found a version for the function taking a double parameter.
[Edited by - johnstanp on July 5, 2008 6:23:20 PM]
inline double invSqrt( const double& x ){ double y = x; double xhalf = ( double )0.5 * y; long long i = *( long long* )( &y ); i = 0x5fe6ec85e7de30daLL - ( i >> 1 );//LL suffix for (long long) type for GCC y = *( double* )( &i ); y = y * ( ( double )1.5 - xhalf * y * y ); return y;}
[Edited by - johnstanp on July 5, 2008 6:23:20 PM]
I read an article about that function some time ago, in that article there was a warning about the speed of it. It was a lot faster to use "back then", but as compilers have gotten better, that code might actually make your program run slower.
This function leads to undefined behaviour. Particularly:
Apart from that, it relies on implemenation specific factors like double implementation and size, long long size, etc.
Lastly, try to profile what's really faster. My bet is that 1/std::sqrt(x) should be faster on modern CPUs.
long long i = *( long long* )( &y ); // pointer cast y = *( double* )( &i ); // pointer cast( i >> 1 ) // (not strictly UB, but right shifts are implementation depenedent for signed integers
Apart from that, it relies on implemenation specific factors like double implementation and size, long long size, etc.
Lastly, try to profile what's really faster. My bet is that 1/std::sqrt(x) should be faster on modern CPUs.
Quote:Original post by rozz666
This function leads to undefined behaviour. Particularly:long long i = *( long long* )( &y ); // pointer cast y = *( double* )( &i ); // pointer cast( i >> 1 ) // (not strictly UB, but right shifts are implementation depenedent for signed integers
Apart from that, it relies on implemenation specific factors like double implementation and size, long long size, etc.
Lastly, try to profile what's really faster. My bet is that 1/std::sqrt(x) should be faster on modern CPUs.
I use a laptop with a Pentium 4( 2.66Ghz ). That implementation is faster than 1/std::sqrt(x), actually seven times faster. My sqrt() is an inline member function in a template class.
If that sqrt() implementation yields faster results, to overcome the problem you cite, writing different functions for each targeted platform solves the problem.
PS: I use dev-cpp for cross-platform development. I have yet to test and profile it on a Unix machine.
Optimising things like square roots is an old trick - in modern programming, we would prefer to optimise our use of square roots instead: try to change your high level algorithms so that they make more economical use of such maths functions.
On my CPU (Pentium 4 1.4 GHz) invSqrt runs about 20% slower than 1 / std::sqrt.
As for the pointer casts - it's undefined behaviour. It's undefined what will happen. Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.
As for the pointer casts - it's undefined behaviour. It's undefined what will happen. Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.
Quote:Original post by h3ro
I read an article about that function some time ago, in that article there was a warning about the speed of it. It was a lot faster to use "back then", but as compilers have gotten better, that code might actually make your program run slower.
While current CPUs use radix 16 division, SQRT remained the same. That said inline 64 bit ASM might be actually faster. That function is hardcoded into CPU when asembly programmers need a fast aproximation of square root. Also, you will get rid of an ambiguity of a C++ code.
Quote:Original post by dmatter
Optimising things like square roots is an old trick - in modern programming, we would prefer to optimise our use of square roots instead: try to change your high level algorithms so that they make more economical use of such maths functions.
Thousand times nothing killed donkey. While programmers might do effort to invent, or choose, algorithms with low comutional cost, and high numerical accuracy. At the end they still might like to speed up theirs fast algorithm.
Quote:Original post by dmatterI tend to think that thing are a bit more complicated than that. True, new algorithms are typically the only way to improvements beyond an order of magnitude, but on the other hand implementing more complex specialized algorithms will affect far more code and is usually more brittle. That is to say that if this (admittedly bizarre) bithack allows you not to re-engineer dozens of systems around the game to keep values normalized then you've saved a great deal of work, at the cost of figuring out whether the hack is reliable.
Optimising things like square roots is an old trick - in modern programming, we would prefer to optimise our use of square roots instead: try to change your high level algorithms so that they make more economical use of such maths functions.
Not to say that there are any hard rules as to when to perform micro and macro optimize something but the "modern" view that low-level optimizations are merely a last resort when you absolutely can't convolute your algorithms or datastructures anymore is dangerous.
Quote:Original post by rozz666Odd.. What's your x87 precision set to?
On my CPU (Pentium 4 1.4 GHz) invSqrt runs about 20% slower than 1 / std::sqrt.
As for the pointer casts - it's undefined behaviour. It's undefined what will happen. Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.
Of course systems with SSE/3DNow have fast inverse square-root approximations in hardware already so you should really be using those on recent hardware.
As for the casts how about a memcpy() then? That ought to be portable and, assuming that the compiler has the proper intrinsics, about as fast.
Quote:Original post by rozz666
On my CPU (Pentium 4 1.4 GHz) invSqrt runs about 20% slower than 1 / std::sqrt.
You probably messed up your profiling somehow.
A division op is just about the slowest thing you can do on any CPU, old or new. I'm not talking 2 times slower than the simpler ops, or even 5 times slower.. try up to 60 times slower in some cases and no better than 20 times slower on the freshest cpu's available. Division times rival those of L2 and L3 cache misses, and cannot be paired with other divisions. Its a rare day when avoiding a division isnt the right thing to do if performance is a consideration.
Now maybe you have found a compiler that converts 1 / sqrt() into an SSE reciprocal square root, but I am skeptical unless you blatently told the compiler to use as little precision as possible in floating point calculations (which completely defeats the purpose of using doubles instead of floats everywhere else, not just at the site of the rsqrt)
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement