Jump to content
  • Advertisement
gustavo rincones

C++ 3 quick ways to calculate the square root in c++

Recommended Posts

 

 

Hi, this is my first forum and I want to do it: quick way to calculate the square root in c ++ with floating data types. These types of functions are very useful to gain some CPU time, especially when used continuously. I will show you 3 similar functions and indicate the advantages and disadvantages of each of them. The last of these three functions was written by me. If you notice that the writing is a bit out of grammar, it is because I do not speak English and I am using a support tool. My native language is Spanish. Well, let's start:


The First method is very famous was used in the video game quake III arena and you can find a reference in Wikipedia as: :https://en.wikipedia.org/wiki/Fast_inverse_square_root.

 

The Function was optimized for improvements in computing times.

float sqrt1(const float &n) 
{
   static union{int i; float f;} u;
   u.i = 0x5F375A86 - (*(int*)&n >> 1);
   return (int(3) - n * u.f * u.f) * n * u.f * 0.5f;
}

 

-Advantages:

* When Root of 0 is calculated the function returns 0.

* The convergence of the function is acceptable enough for games.

* It generates very good times.

* The Reciprocal of the root can be calculated by removing the second “n” from the third line. According to the property of: 1 / sqrt (n) * n = sqrt (n).

-Disadvantages:

* Convergence decreases when the root to be calculated is very large.

 

The second method is not as famous as the first. But it does the same function calculate the root.


float sqrt2(const float& n)
{
   union {int i; float f;} u;
   u.i = 0x1FB5AD00 + (*(int*)&n >> 1);
   u.f = n / u.f + u.f;
   return n / u.f + u.f * 0.25f;
}

 

-Advantages:

* The convergence of the function is high enough to be used in applications other than games.

-Disadvantages:

* Computing times are much larger.

* The square root of “0” is a number very close to “0” but never “0”.

* The division operation is the bottleneck in this function. because the division operation is more expensive than the other arithmetic operations of Arithmetic Logic Units (ALU).

 

The third method takes certain characteristics of the two previous methods.

float sqrt3(const float& n)
{
   static union {int i; float f;} u;
   u.i = 0x2035AD0C + (*(int*)&n >> 1);
   return n / u.f + u.f * 0.25f;
}

 

Advantages:

* The convergence of the function is greater than that of the first method.

* Generates times equal to or greater than the first method.

Disadvantages:

* The square root of “0” is a number very close to “0” but never “0”.

 

 

The 3 previous methods have something in common.

They are based on the definition of the Newton-Raphson Method. according to the function of the square root > f (x) = x ^ 2 - s.

 

well thanks to you for reading my forum.

well thanks to you for reading my forum.

Share this post


Link to post
Share on other sites
Advertisement

Very good information and also useful in its entirety. You are right, the "static" member avoids continuous creation
of the union in exchange for speed but also completely limits the function. On the other hand,
I like being able to see a function written in c ++ on how to calculate
the square root in a more efficient way as you just described.
I would really like to see that function please.

Edited by gustavo rincones

Share this post


Link to post
Share on other sites

Interesting.

But what's wrong with using std::sqrt() ?

Do modern processors have a hardware accelerated version ?

 

Share this post


Link to post
Share on other sites
1 hour ago, Green_Baron said:

But what's wrong with using std::sqrt() ?

Do modern processors have a hardware accelerated version ?

 

Check the latencies on the link. This is for vectorized code, but I guess, the results are quite similar for serial instructions. `_mm_sqrt_ps` is the standard square root while `_mm_rsqrt_ps` calculates the reciprocal square root using an approximation formula as described by the TO. It is 3.25 times faster (on a skylake processor) but less accurate. If you perform a lot of square root calculations, this might be beneficial.

However, there are many situations where you can simply avoid calculating the square root. For example, when you want to know which vectors length is bigger, you can simply compare the squared lengths instead of calculating the length using the square root. No square root is faster than a fast square root 😛

You should generally avoid fast square roots if the result is used in complex subsequent calculations (physics simulations, linear solvers like LLT or QR). The error can lead to unwanted numerical instabilities, especially if you are only using 32bit floats.  However, if accuracy is not an issue, you might get a performance boost. I am not sure if it really matters since the relative number of square root calculations per frame shouldn't be too high, but that depends on your game/program.

Greetings

 

Share this post


Link to post
Share on other sites

I see. So its the usual tradeoff between speed and accuracy.

Yeah, i am aware. I don't use sqrt extensively for example in frustum and intersection tests. Was that grammatically correct ? :-)

Share this post


Link to post
Share on other sites
Spoiler

 

The third method can also be written this way:

float sqrt3(const float& n)
{
   int i = 0x2035AD0C + (*(int*)&n >> 1);

   return n / *(float*)&i + *(float*)&i * 0.25f;
}

the static member was removed from the function as a quote from: SeanMiddleditch.

Share this post


Link to post
Share on other sites
On 10/24/2019 at 12:34 AM, gustavo rincones said:
  Reveal hidden contents

 

The third method can also be written this way:


float sqrt3(const float& n)
{
   int i = 0x2035AD0C + (*(int*)&n >> 1);

   return n / *(float*)&i + *(float*)&i * 0.25f;
}

the static member was removed from the function as a quote from: SeanMiddleditch.

No it can't. That code breaks strict aliasing rules and is 100% undefined behavior ;) 

Share this post


Link to post
Share on other sites

Also, the following three all produce the same code on gcc, clang and msvc with optimizations enabled (but only one is actually legal)

#include <string.h>
float sqrt1(float n)
{
    int ni;
    memcpy(&ni, &n, sizeof(n));
    int ui = 0x2035AD0C + (ni >> 1);
    float u;
    memcpy(&u, &ui, sizeof(u));
    return n / u + u * 0.25f;
}

float sqrt2(float n)
{
   union {int i; float f;} u; // no need for the static!
   u.i = 0x2035AD0C + (*(int*)&n >> 1);
   return n / u.f + u.f * 0.25f;
}

float sqrt3(float n)
{
   int i = 0x2035AD0C + (*(int*)&n >> 1);

   return n / *(float*)&i + *(float*)&i * 0.25f;
}

 

Share this post


Link to post
Share on other sites
1 hour ago, l0calh05t said:

Also, the following three all produce the same code on gcc, clang and msvc with optimizations enabled (but only one is actually legal)

Its almost like compiler-writers know about these kinds of optimizations :D

But seriously, on one hand its awesome that compilers from being able to optimize usage of functions like sqrt (pretty much every compiler has settings for fast-math).
On the other hand, its even funnier that sometimes, compilers will figure out what extensive crap you are doing, and simply replace it with their generic optimized routine, or sometimes even with the thing you were actually trying to avoid :D (see Matt Godbolts talk from CppCon 2017 for some examples of this.

Today, "let the compiler figure it out" really applies at least for such local scope. Save optimizations for alogrithm/data-structure related matters that cannot be resolved by a simple compiler flag.

On 10/22/2019 at 5:14 AM, gustavo rincones said:

You are right, the "static" member avoids continuous creation
of the union in exchange for speed but also completely limits the function.

The "static" member also does everything but give you speed. Under optimization, both with or without it you get the same code (https://godbolt.org/z/ZBz6MM), because the compiler can figure out that you not using the value of the unoin in subsquent function call. But if itdidn't, that static would make the code much slower. You save one increment of the stack-pointer, but you pay the cost of heap-access, as well as (at least) a check on each access for multithreaded initialization.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!