• 9
• 9
• 11
• 13
• 9

# Optimized invSqrt( double )

This topic is 3566 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Well, I'm looking for the double version of the famous function invSqrt() of the Quake 3 Engine. Could someone provide it? Thanks. [Edited by - johnstanp on July 5, 2008 6:41:02 PM]

##### Share on other sites
Well, using google, I've found a version for the function taking a double parameter.

inline double invSqrt( const double& x ){    double y = x;    double xhalf = ( double )0.5 * y;    long long i = *( long long* )( &y );    i = 0x5fe6ec85e7de30daLL - ( i >> 1 );//LL suffix for (long long) type for GCC    y = *( double* )( &i );    y = y * ( ( double )1.5 - xhalf * y * y );        return y;}

[Edited by - johnstanp on July 5, 2008 6:23:20 PM]

##### Share on other sites
I read an article about that function some time ago, in that article there was a warning about the speed of it. It was a lot faster to use "back then", but as compilers have gotten better, that code might actually make your program run slower.

##### Share on other sites
This function leads to undefined behaviour. Particularly:
long long i = *( long long* )( &y ); // pointer cast    y = *( double* )( &i ); // pointer cast( i >> 1 ) // (not strictly UB, but right shifts are implementation depenedent for signed integers

Apart from that, it relies on implemenation specific factors like double implementation and size, long long size, etc.

Lastly, try to profile what's really faster. My bet is that 1/std::sqrt(x) should be faster on modern CPUs.

##### Share on other sites
Quote:
 Original post by rozz666This function leads to undefined behaviour. Particularly:long long i = *( long long* )( &y ); // pointer cast y = *( double* )( &i ); // pointer cast( i >> 1 ) // (not strictly UB, but right shifts are implementation depenedent for signed integersApart from that, it relies on implemenation specific factors like double implementation and size, long long size, etc.Lastly, try to profile what's really faster. My bet is that 1/std::sqrt(x) should be faster on modern CPUs.

I use a laptop with a Pentium 4( 2.66Ghz ). That implementation is faster than 1/std::sqrt(x), actually seven times faster. My sqrt() is an inline member function in a template class.

If that sqrt() implementation yields faster results, to overcome the problem you cite, writing different functions for each targeted platform solves the problem.

PS: I use dev-cpp for cross-platform development. I have yet to test and profile it on a Unix machine.

##### Share on other sites
Optimising things like square roots is an old trick - in modern programming, we would prefer to optimise our use of square roots instead: try to change your high level algorithms so that they make more economical use of such maths functions.

##### Share on other sites
On my CPU (Pentium 4 1.4 GHz) invSqrt runs about 20% slower than 1 / std::sqrt.
As for the pointer casts - it's undefined behaviour. It's undefined what will happen. Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.

##### Share on other sites
Quote:
 Original post by h3roI read an article about that function some time ago, in that article there was a warning about the speed of it. It was a lot faster to use "back then", but as compilers have gotten better, that code might actually make your program run slower.

While current CPUs use radix 16 division, SQRT remained the same. That said inline 64 bit ASM might be actually faster. That function is hardcoded into CPU when asembly programmers need a fast aproximation of square root. Also, you will get rid of an ambiguity of a C++ code.
Quote:
 Original post by dmatterOptimising things like square roots is an old trick - in modern programming, we would prefer to optimise our use of square roots instead: try to change your high level algorithms so that they make more economical use of such maths functions.

Thousand times nothing killed donkey. While programmers might do effort to invent, or choose, algorithms with low comutional cost, and high numerical accuracy. At the end they still might like to speed up theirs fast algorithm.

##### Share on other sites
Quote:
 Original post by dmatterOptimising things like square roots is an old trick - in modern programming, we would prefer to optimise our use of square roots instead: try to change your high level algorithms so that they make more economical use of such maths functions.
I tend to think that thing are a bit more complicated than that. True, new algorithms are typically the only way to improvements beyond an order of magnitude, but on the other hand implementing more complex specialized algorithms will affect far more code and is usually more brittle. That is to say that if this (admittedly bizarre) bithack allows you not to re-engineer dozens of systems around the game to keep values normalized then you've saved a great deal of work, at the cost of figuring out whether the hack is reliable.
Not to say that there are any hard rules as to when to perform micro and macro optimize something but the "modern" view that low-level optimizations are merely a last resort when you absolutely can't convolute your algorithms or datastructures anymore is dangerous.

Quote:
 Original post by rozz666On my CPU (Pentium 4 1.4 GHz) invSqrt runs about 20% slower than 1 / std::sqrt.As for the pointer casts - it's undefined behaviour. It's undefined what will happen. Sure, many compilers allows this, but then you have to check your code with every compiler you use and still can't be sure when it's going to fail.
Odd.. What's your x87 precision set to?
Of course systems with SSE/3DNow have fast inverse square-root approximations in hardware already so you should really be using those on recent hardware.
As for the casts how about a memcpy() then? That ought to be portable and, assuming that the compiler has the proper intrinsics, about as fast.

##### Share on other sites
Quote:
 Original post by rozz666On my CPU (Pentium 4 1.4 GHz) invSqrt runs about 20% slower than 1 / std::sqrt.

You probably messed up your profiling somehow.

A division op is just about the slowest thing you can do on any CPU, old or new. I'm not talking 2 times slower than the simpler ops, or even 5 times slower.. try up to 60 times slower in some cases and no better than 20 times slower on the freshest cpu's available. Division times rival those of L2 and L3 cache misses, and cannot be paired with other divisions. Its a rare day when avoiding a division isnt the right thing to do if performance is a consideration.

Now maybe you have found a compiler that converts 1 / sqrt() into an SSE reciprocal square root, but I am skeptical unless you blatently told the compiler to use as little precision as possible in floating point calculations (which completely defeats the purpose of using doubles instead of floats everywhere else, not just at the site of the rsqrt)