Back to General and Gameplay Programming

Faster Sin and Cos

General and Gameplay Programming Programming

Started by L. Spiro August 28, 2016 05:01 PM

37 comments, last by alvaro 7 years, 7 months ago

L. Spiro

25,818

Author

September 08, 2016 07:13 AM

You can make this code more precise by using a higher-degree polynomial in x2, and by picking better coefficients (I pretty much picked these by hand).

Any takers?

Sure. I have plans for tonight but I can do it soon.

One thing that will make it faster and more accurate is to replace:

static const float pi_halves = float(std::atan(1.0)*2.0);

with:

const float pi_halves = 1.5707963267948966192313216916398f; // float(std::atan(1.0)*2.0)

Even using a high-precision calculator does not give exactly PI/2 when I go through atan(1)*2. Due to the accuracy of floats, it will likely end up being the same constant no matter what, but this ensures it is correct for higher degrees of accuracy, and it makes us sure we have exactly the best constant.
It also avoids the possibility that a certain compiler does not implement atan() as an intrinsic and evaluated at compile time.

The 2nd and more-important point is to remove static. This will add a lot of code for initializing, in a thread-safe way, pi_halves. It will always add a branch (critical to avoid in performant code such as this) and may add instruction-cache misses due to the extra code that exists for locking, applying the value, setting a flag, and unlocking.

I see a few other things that could impact performance as well.
And I will tackle accuracy soon.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

alvaro

21,604

September 08, 2016 11:15 AM

One thing that will make it faster and more accurate is to replace:
static const float pi_halves = float(std::atan(1.0)*2.0);
with:
const float pi_halves = 1.5707963267948966192313216916398f; // float(std::atan(1.0)*2.0)

Actually, my compiler produces the exact same binary with that modification. As usual when optimizing, even if you have a plausible argument for why a change should be an improvement, you just need to test.

L. Spiro

25,818

Author

September 10, 2016 01:47 PM

You should find that these coefficients give you the best-possible accuracy:
9.99934434890747070313e-01 3.32740783691406250000e-01 6.53432160615921020508e-02

return x / (0.999934434890747070313f + 0.33274078369140625f*x2 - 0.0653432160615921020508f*x2*x2);

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

alvaro

21,604

September 10, 2016 02:06 PM

It would be good to have my_atan(1) be exactly pi/4, because otherwise my_atan will not be monotonic around 1.

L. Spiro

25,818

Author

September 10, 2016 08:16 PM

Then you may find these constants more to your liking:
1.00022506713867187500f 0.324211299419403076172f 0.0511969886720180511475f

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

alvaro

21,604

September 11, 2016 04:04 PM

Then you may find these constants more to your liking:
1.00022506713867187500f 0.324211299419403076172f 0.0511969886720180511475f

L. Spiro

That's great! The maximum difference is about .00074, which is really good for such a simple formula.

Here is the code with updated coefficients:

float atan_near_0(float x) {
  float x2 = x * x;
  return x / (1.00022506713867187500f + 0.324211299419403076172f*x2 - 0.0511969886720180511475f*x2*x2);
}

float my_atan(float x) {
  static const float pi_halves = float(std::atan(1.0)*2.0);
  if (std::abs(x) > 1.0f)
    return std::copysign(pi_halves, x) - atan_near_0(1.0f/x);
  return atan_near_0(x);
}

Any chance we can get a version with a higher-degree polynomial?

L. Spiro

25,818

Author

September 12, 2016 08:16 AM

It’s possible if I have time.

I am in the middle of writing a blog post about sin() and cos() at the moment and it is going into a lot of detail (more than I originally planned).

I might include a note about your my_atan(), and possibly exp() and some others. Did you come up with the original code?

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

alvaro

21,604

September 12, 2016 10:34 AM

I might include a note about your my_atan(), and possibly exp() and some others. Did you come up with the original code?

Yes, I did. I got the idea of using something like x/polynomial from this paper.

I just found a very interesting trick I hadn't thought about: You can reduce everything to computing atan(x) for x smaller than tan(pi/12) ~= 0.268, where it's really easy to approximate well. See here. The idea seems to come from "Math Toolkit for Real-Time Programming" by Jack Crenshaw, which I don't own.

Hodgman

52,717

September 12, 2016 11:14 AM

On the GPU, I've used these before: https://seblagarde.wordpress.com/2014/12/01/inverse-trigonometric-functions-gpu-optimization-for-amd-gcn-architecture/

You might find his discussion of different methods of interest.

. 22 Racing Series .

alvaro

21,604

September 12, 2016 01:52 PM

It probably would make sense to add terms to the numerator of my formula for atan near 0 as well. This is motivated by a continued fraction formula for atan, which gives you approximations as ratios of an odd polynomial and an even polynomial.

Faster Sin and Cos

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Faster Sin and Cos

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines