If you search for these terms you will get a lengthy list of possibilities.
Look-up tables from the old id Tech days are too cache-miss heavy.
Other implementations, including the one in Unreal Engine 4, are only faster if you are re-implementing sincos().
Sony presents an idea using Chebyshev polynomials, but accuracy is still bound by how many terms of the series you are willing to expand.
With the goal of making something both fast and accurate in mind, I have come up with the following functions.
float Sin( float _fX ) {
    int32 i32I = int32( _fX * 0.31830988618379067153776752674503f );    // 1 / PI.
    _fX = (_fX - float( i32I ) * 3.1415926535897932384626433832795f);

    float fX2 = _fX * _fX;

    return (i32I & 1) ?
        -_fX * (float( 1.0 ) +
            fX2 * (float( -1.66666671633720398e-01 ) +
            fX2 * (float( 8.33333376795053482e-03 ) +
            fX2 * (float( -1.98412497411482036e-04 ) +
            fX2 * (float( 2.75565571428160183e-06 ) +
            fX2 * (float( -2.50368437093584362e-08 ) +
            fX2 * (float( 1.58846852338356825e-10 ) +
            fX2 * float( -6.57978446033657960e-13 )))))))) :
        _fX * (float( 1.0 ) +
            fX2 * (float( -1.66666671633720398e-01 ) +
            fX2 * (float( 8.33333376795053482e-03 ) +
            fX2 * (float( -1.98412497411482036e-04 ) +
            fX2 * (float( 2.75565571428160183e-06 ) +
            fX2 * (float( -2.50368437093584362e-08 ) +
            fX2 * (float( 1.58846852338356825e-10 ) +
            fX2 * float( -6.57978446033657960e-13 ))))))));
}
float Cos( float _fX ) {
    int32 i32I = int32( _fX * 0.31830988618379067153776752674503f );    // 1 / PI.
    _fX = (_fX - float( i32I ) * 3.1415926535897932384626433832795f);

    float fX2 = _fX * _fX;

    return (i32I & 1) ?
        -float( 1.0 ) -
            fX2 * (float( -5.00000000000000000e-01 ) +
            fX2 * (float( 4.16666641831398010e-02 ) +
            fX2 * (float( -1.38888671062886715e-03 ) +
            fX2 * (float( 2.48006836045533419e-05 ) +
            fX2 * (float( -2.75369188784679864e-07 ) +
            fX2 * (float( 2.06202765973273472e-09 ) +
            fX2 * float( -9.77589970779790818e-12 ))))))) :
        float( 1.0 ) +
            fX2 * (float( -5.00000000000000000e-01 ) +
            fX2 * (float( 4.16666641831398010e-02 ) +
            fX2 * (float( -1.38888671062886715e-03 ) +
            fX2 * (float( 2.48006836045533419e-05 ) +
            fX2 * (float( -2.75369188784679864e-07 ) +
            fX2 * (float( 2.06202765973273472e-09 ) +
            fX2 * float( -9.77589970779790818e-12 )))))));
}
Performance on PC may vary, from 1.02 times as fast to 2.0 times as fast.
On PlayStation 4 this is around 7 or 8 times as fast.
On Xbox One this is around 2 or 3 times as fast.
Accuracy is no fewer than 6 digits. I implemented a less-accurate version on Final Fantasy XV, so these versions are entirely suitable for any AAA production.
Explanation:
The code starts off by performing the equivalent of fmodf() by PI. This is implemented manually via a cast to an integer, which truncates toward zero. This gives it a valid range of ±52,707,130.87185.
cos() and sin() are curves that go up, then down, then up, etc. “i32I & 1” checks whether we are on an up curve or a down curve. i32I is the number of whole multiples of PI that were removed from the input; each even multiple goes one way and each odd multiple goes the other way, so odd values simply negate the result.
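The range reduction and the sign flip can be looked at in isolation. The following is only a sketch: Reduced, ReduceByPi(), SinViaReduction(), and CosViaReduction() are hypothetical names that do not appear in the original code, and std::sin()/std::cos() stand in for the polynomials so that only the reduction step is being exercised.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of the original code): splits x into a whole
// number of half-periods k and a remainder r, so that x = k * PI + r.
struct Reduced {
    std::int32_t halfPeriods;   // k: how many multiples of PI were removed.
    float        remainder;     // r: what is left over, roughly within (-PI, PI).
};

inline Reduced ReduceByPi( float x ) {
    const float kInvPi = 0.31830988618379067f;  // 1 / PI.
    const float kPi    = 3.14159265358979323f;
    // Precondition: the quotient must fit in a 32-bit integer.
    assert( std::fabs( x * kInvPi ) < 2147483648.0f );
    // The cast truncates toward zero, the same quotient fmodf() works with.
    const std::int32_t k = static_cast<std::int32_t>( x * kInvPi );
    return { k, x - static_cast<float>( k ) * kPi };
}

// sin( k * PI + r ) = (-1)^k * sin( r ), so an odd k flips the sign.
inline float SinViaReduction( float x ) {
    const Reduced red = ReduceByPi( x );
    const float s = std::sin( red.remainder );  // Stand-in for the polynomial.
    return (red.halfPeriods & 1) ? -s : s;
}

// cos( k * PI + r ) = (-1)^k * cos( r ), the same parity rule.
inline float CosViaReduction( float x ) {
    const Reduced red = ReduceByPi( x );
    const float c = std::cos( red.remainder );  // Stand-in for the polynomial.
    return (red.halfPeriods & 1) ? -c : c;
}
```

For an input of 4.0f the quotient truncates to 1, the remainder is about 0.858, and the odd count negates the result. For -4.0f the quotient is -1, whose lowest bit is also set, so negative inputs follow the same rule.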
Here is the fancy part.
You will notice that the magic constants start off being close to one over each odd factorial (for Sin()) and one over each even factorial (for Cos()).
0.16666666666666666666666666666667 = 1 / 6 (6 = 1*2*3)
0.00833333333333333333333333333333 = 1 / 120 (120 = 1*2*3*4*5)
etc.
But by the end, they drift rather significantly.
1/15! = 7.6471637318198164759011319857881e-13
I use
6.57978446033657960e-13
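The same comparison can be made for all eight Sin() constants at once. This is a small sketch rather than anything from the original code: kSinCoeffs is simply the list of constants copied from Sin(), and TaylorSinCoeff(), DriftRatio(), and PrintDriftTable() are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <cstdio>

// The eight Sin() constants from the listing, lowest power first
// (they multiply x^1, x^3, ..., x^15).
static const double kSinCoeffs[8] = {
     1.0,
    -1.66666671633720398e-01,
     8.33333376795053482e-03,
    -1.98412497411482036e-04,
     2.75565571428160183e-06,
    -2.50368437093584362e-08,
     1.58846852338356825e-10,
    -6.57978446033657960e-13
};

// Taylor coefficient of x^(2i+1) in sin( x ): (-1)^i / (2i+1)!
inline double TaylorSinCoeff( int i ) {
    assert( i >= 0 && i < 8 );
    double factorial = 1.0;
    for ( int n = 2; n <= 2 * i + 1; ++n ) { factorial *= n; }
    return ((i & 1) ? -1.0 : 1.0) / factorial;
}

// Ratio of the constant actually used to the Taylor one; 1.0 means no drift.
inline double DriftRatio( int i ) {
    return kSinCoeffs[i] / TaylorSinCoeff( i );
}

// Prints one row per coefficient so the drift can be read off directly.
inline void PrintDriftTable() {
    for ( int i = 0; i < 8; ++i ) {
        std::printf( "x^%-2d  taylor=% .17e  used=% .17e  ratio=%.6f\n",
            2 * i + 1, TaylorSinCoeff( i ), kSinCoeffs[i], DriftRatio( i ) );
    }
}
```

The early ratios sit very close to 1.0, while the last one (the x^15 term) comes out at roughly 0.86, which is the drift described above.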
The reason is that the series should normally continue on into infinity, but we cut it short.
In the case of Sin(), if we don’t account for this, our numbers drift low (because we actually use the negative of the constant and it overshoots low).
Using a lower number as I have done accounts for this.
I’ve adjusted each of the constants to account for this type of drift and give the best-possible accuracy for this degree of precision.
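One way to see the effect of the adjustment is to measure the worst-case error of the plain truncated Taylor series against the adjusted constants over the reduced range. The sketch below makes a few assumptions of its own: it evaluates in double rather than float so that only the choice of constants is being compared, it samples -PI to PI at evenly spaced points, and EvalOddPoly() and MaxAbsError() are hypothetical helpers, not part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Coefficients of x^1, x^3, ..., x^15. kTuned is copied from Sin();
// kTaylor is the plain truncated series (-1)^i / (2i+1)!.
static const double kTuned[8] = {
     1.0,                     -1.66666671633720398e-01,
     8.33333376795053482e-03, -1.98412497411482036e-04,
     2.75565571428160183e-06, -2.50368437093584362e-08,
     1.58846852338356825e-10, -6.57978446033657960e-13
};
static const double kTaylor[8] = {
     1.0,                  -1.0 / 6.0,
     1.0 / 120.0,          -1.0 / 5040.0,
     1.0 / 362880.0,       -1.0 / 39916800.0,
     1.0 / 6227020800.0,   -1.0 / 1307674368000.0
};

// Horner evaluation of x * (c0 + x^2 * (c1 + x^2 * (...))).
inline double EvalOddPoly( const double (&c)[8], double x ) {
    const double x2 = x * x;
    double acc = c[7];
    for ( int i = 6; i >= 0; --i ) { acc = c[i] + x2 * acc; }
    return x * acc;
}

// Largest |poly( x ) - sin( x )| over evenly spaced samples of [-PI, PI].
inline double MaxAbsError( const double (&c)[8], int samples = 20001 ) {
    assert( samples > 1 );
    const double kPi = 3.14159265358979323846;
    double worst = 0.0;
    for ( int i = 0; i < samples; ++i ) {
        const double x = -kPi + 2.0 * kPi * i / (samples - 1);
        const double err = std::fabs( EvalOddPoly( c, x ) - std::sin( x ) );
        if ( err > worst ) { worst = err; }
    }
    return worst;
}
```

Comparing MaxAbsError( kTaylor ) with MaxAbsError( kTuned ) shows where the truncated series runs out of accuracy near ±PI and how much of that the adjusted constants recover. The float version in Sin() adds its own rounding error on top of this.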
The precision here is enough to entirely drive a AAA game such as Final Fantasy XV, Star Ocean 5, and others.
Later I will re-evaluate the constants used in Unreal Engine 4, and then I will post a super-fast version.
L. Spiro