Faster Sin and Cos

Started by
37 comments, last by alvaro 7 years, 7 months ago

You can make this code more precise by using a higher-degree polynomial in x2, and by picking better coefficients (I pretty much picked these by hand).

Any takers?

Sure. I have plans for tonight but I can do it soon.

One thing that will make it faster and more accurate is to replace:
static const float pi_halves = float(std::atan(1.0)*2.0);
with:
const float pi_halves = 1.5707963267948966192313216916398f; // float(std::atan(1.0)*2.0)
Even using a high-precision calculator does not give exactly PI/2 when I go through atan(1)*2. Due to the accuracy of floats, it will likely end up being the same constant no matter what, but this ensures it is correct for higher degrees of accuracy, and it makes us sure we have exactly the best constant.
It also avoids the possibility that a certain compiler does not implement atan() as an intrinsic and evaluated at compile time.

The 2nd and more-important point is to remove static. This will add a lot of code for initializing, in a thread-safe way, pi_halves. It will always add a branch (critical to avoid in performant code such as this) and may add instruction-cache misses due to the extra code that exists for locking, applying the value, setting a flag, and unlocking.

I see a few other things that could impact performance as well.
And I will tackle accuracy soon.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Advertisement

One thing that will make it faster and more accurate is to replace:

static const float pi_halves = float(std::atan(1.0)*2.0);
with:
const float pi_halves = 1.5707963267948966192313216916398f; // float(std::atan(1.0)*2.0)


Actually, my compiler produces the exact same binary with that modification. As usual when optimizing, even if you have a plausible argument for why a change should be an improvement, you just need to test.
You should find that these coefficients give you the best-possible accuracy:
9.99934434890747070313e-01 3.32740783691406250000e-01 6.53432160615921020508e-02
return x / (0.999934434890747070313f + 0.33274078369140625f*x2 - 0.0653432160615921020508f*x2*x2);


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

It would be good to have my_atan(1) be exactly pi/4, because otherwise my_atan will not be monotonic around 1.

Then you may find these constants more to your liking:
1.00022506713867187500f 0.324211299419403076172f 0.0511969886720180511475f


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Then you may find these constants more to your liking:
1.00022506713867187500f 0.324211299419403076172f 0.0511969886720180511475f


L. Spiro


That's great! The maximum difference is about .00074, which is really good for such a simple formula.

Here is the code with updated coefficients:

float atan_near_0(float x) {
  float x2 = x * x;
  return x / (1.00022506713867187500f + 0.324211299419403076172f*x2 - 0.0511969886720180511475f*x2*x2);
}

float my_atan(float x) {
  static const float pi_halves = float(std::atan(1.0)*2.0);
  if (std::abs(x) > 1.0f)
    return std::copysign(pi_halves, x) - atan_near_0(1.0f/x);
  return atan_near_0(x);
}

Any chance we can get a version with a higher-degree polynomial?

It’s possible if I have time.

I am in the middle of writing a blog post about sin() and cos() at the moment and it is going into a lot of detail (more than I originally planned).

I might include a note about your my_atan(), and possibly exp() and some others. Did you come up with the original code?

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

I might include a note about your my_atan(), and possibly exp() and some others. Did you come up with the original code?


Yes, I did. I got the idea of using something like x/polynomial from this paper.

I just found a very interesting trick I hadn't thought about: You can reduce everything to computing atan(x) for x smaller than tan(pi/12) ~= 0.268, where it's really easy to approximate well. See here. The idea seems to come from "Math Toolkit for Real-Time Programming" by Jack Crenshaw, which I don't own.

On the GPU, I've used these before: https://seblagarde.wordpress.com/2014/12/01/inverse-trigonometric-functions-gpu-optimization-for-amd-gcn-architecture/

You might find his discussion of different methods of interest.

It probably would make sense to add terms to the numerator of my formula for atan near 0 as well. This is motivated by a continued fraction formula for atan, which gives you approximations as ratios of an odd polynomial and an even polynomial.

This topic is closed to new replies.

Advertisement