# Cosine Taylor Series Issue


## Recommended Posts

I'm not sure if this belongs in General Programming or here.

I've been playing around with the SSE/SSE2 instructions, and have been trying to implement the calculation of cosine using the Taylor series. I've been basing my code on an article I found online, Fast Trigonometric Functions Using Intel's SSE2.

I could never quite get his ASM instructions (more of a step-by-step how-to) to work right, but I managed to adapt the C example into an SSE2 intrinsic (MSVC) version. I also found that the C example he gives doesn't seem to correctly pull the input into the desired -pi/2 < x < pi/2 range, so my version is adapted from his earlier example that uses branching.

I've been running a comparison between std::cos and the results from this function, and overall they tend to be pretty close. Where it really starts to break down is right around Pi.

| Input | std::cos | sse2 | difference |
| --- | --- | --- | --- |
| -0.983189225 | 0.468513191 | 0.468513191 | 0 |
| -0.883189201 | 0.554371059 | 0.554371119 | -5.96E-08 |
| -0.783189178 | 0.634689868 | 0.634689868 | 0 |
| -0.683189154 | 0.70866704 | 0.70866704 | 0 |
| -0.58318913 | 0.775563479 | 0.775563479 | 0 |
| -0.483189136 | 0.834710658 | 0.834710658 | 0 |
| -0.383189142 | 0.885517716 | 0.885517716 | 0 |
| -0.283189148 | 0.927477002 | 0.927477002 | 0 |
| -0.183189154 | 0.960169196 | 0.960169196 | 0 |
| -0.083189152 | 0.983267725 | 0.983267725 | 0 |
| 0.016810849 | 0.996541798 | 0.996541798 | 0 |
| 0.116810851 | 0.999858677 | 0.999858677 | 0 |

| Input | std::cos | sse2 | difference |
| --- | --- | --- | --- |
| -3.683187962 | -0.801142097 | -0.801262379 | 0.000120282 |
| -3.583188057 | -0.8568874 | -0.857079506 | 0.000192106 |
| -3.483188152 | -0.904070973 | -0.904372096 | 0.000301123 |
| -3.383188248 | -0.942221403 | -0.942686081 | 0.000464678 |
| -3.283188343 | -0.970957458 | -0.971663117 | 0.000705659 |
| -3.183188438 | -0.989992082 | -0.991048455 | 0.001056373 |
| -3.083188534 | -0.999135017 | -1.000695705 | 0.001560688 |
| -2.983188629 | -0.99829495 | -0.999758005 | 0.001463056 |
| -2.883188725 | -0.987480283 | -0.988468528 | 0.000988245 |
| -2.78318882 | -0.96679908 | -0.967457533 | 0.000658453 |
| -2.683188915 | -0.936457932 | -0.936890245 | 0.000432312 |
| -2.583189011 | -0.896759987 | -0.897039414 | 0.000279427 |
| -2.483189106 | -0.848101974 | -0.848279595 | 0.000177622 |
| -2.383189201 | -0.790970027 | -0.791080833 | 0.000110805 |

I believe that around Pi is where the Taylor series stops producing numbers close to cosine. Is there some way around this, or is there an error in my calculations?
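For reference, here is the same degree-10 polynomial in plain scalar form (the helper name `taylor_cos10` is made up for illustration), which shows where the truncation error comes from:

```cpp
#include <cmath>

// Degree-10 Maclaurin polynomial for cosine, evaluated by Horner's rule.
// Folding the sign into z = -x*x makes every coefficient positive, which is
// the same trick the SSE version uses when it ORs in the sign bit.
float taylor_cos10(float x)
{
	const float z = -x * x;
	float p = 1.0f / 3628800.0f;  // 1/10!
	p = p * z + 1.0f / 40320.0f;  // 1/8!
	p = p * z + 1.0f / 720.0f;    // 1/6!
	p = p * z + 1.0f / 24.0f;     // 1/4!
	p = p * z + 1.0f / 2.0f;      // 1/2!
	return p * z + 1.0f;
}
```

Near 0 this agrees with std::cos to float precision, but at x = 3.14 the first dropped term, x^12/12!, is already about 2e-3, which is the same order as the differences in the second table.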

A couple of needed constants were not in the original post.

```cpp
#include <cmath>
#include <iostream>
#include <emmintrin.h>

namespace KS
{
	// F_PI was not defined in the posted snippet; a plain float pi is assumed here.
	static const float F_PI = 3.14159265f;

	static const __m128 PS_ZERO(_mm_setzero_ps());
	static const __m128 PS_SIGN_MASK(_mm_set1_ps(-0.0f));
	static const __m128 PS_ALL_MASK(_mm_cmpeq_ps(PS_ZERO, PS_ZERO));
	static const __m128 PS_INV_SIGN_MASK(_mm_andnot_ps(PS_SIGN_MASK, PS_ALL_MASK));
	static const __m128 PS_ONE(_mm_set1_ps(1.0f));
	static const __m128 PS_PI(_mm_set1_ps(F_PI));
	static const __m128 PS_1_OVER_PI(_mm_rcp_ps(PS_PI));
	static const __m128 PS_PI_OVER_2(_mm_set1_ps(F_PI / 2.0f));
	static const __m128i EPI32_ONE(_mm_set1_epi32(1));
	static const __m128 PS_1_OVER_2FACTORIAL(_mm_set1_ps(1.0f / 2.0f));
	static const __m128 PS_1_OVER_4FACTORIAL(_mm_set1_ps(1.0f / 24.0f));
	static const __m128 PS_1_OVER_6FACTORIAL(_mm_set1_ps(1.0f / 720.0f));
	static const __m128 PS_1_OVER_8FACTORIAL(_mm_set1_ps(1.0f / 40320.0f));
	static const __m128 PS_1_OVER_10FACTORIAL(_mm_set1_ps(1.0f / 3628800.0f));

	inline const float cossse(float in_Value)
	{
		/*	Convert input to vector	*/
		__m128 _Value(_mm_set_ss(in_Value));

		/*	Remove sign	*/
		_Value = _mm_and_ps(_Value, PS_INV_SIGN_MASK);

		/*	Clamp the input to -Pi/2 < _Value < Pi/2	*/
		/*	_Value % Pi == _Value - (Pi * static_cast<int>(_Value / Pi))	*/
		__m128 _Quadrant(_mm_mul_ss(_Value, PS_1_OVER_PI));
		__m128i _TruncatedQuadrant(_mm_cvttps_epi32(_Quadrant));
		_Quadrant = _mm_cvtepi32_ps(_TruncatedQuadrant);
		_Quadrant = _mm_mul_ss(_Quadrant, PS_PI);
		_Value = _mm_sub_ss(_Value, _Quadrant);

		/*	Simulate _Value = ((_Value > Pi/2) ? Pi - _Value : 0 - _Value)	*/
		_TruncatedQuadrant = _mm_add_epi32(_TruncatedQuadrant, EPI32_ONE);
		_TruncatedQuadrant = _mm_srli_epi32(_TruncatedQuadrant, 1);
		_TruncatedQuadrant = _mm_and_si128(_TruncatedQuadrant, EPI32_ONE);
		_TruncatedQuadrant = _mm_cmpeq_epi32(_TruncatedQuadrant, EPI32_ONE);
		_Quadrant = _mm_castsi128_ps(_TruncatedQuadrant);
		_Quadrant = _mm_and_ps(_Quadrant, PS_PI);
		_Value = _mm_sub_ss(_Quadrant, _Value);

		/*	Square	*/
		_Value = _mm_mul_ss(_Value, _Value);

		/*	Make negative - IEEE 754	*/
		_Value = _mm_or_ps(_Value, PS_SIGN_MASK);

		/*	Taylor series	*/
		__m128 _Math(_mm_mul_ss(PS_1_OVER_10FACTORIAL, _Value));
		_Math = _mm_add_ss(_Math, PS_1_OVER_8FACTORIAL);
		_Math = _mm_mul_ss(_Math, _Value);
		_Math = _mm_add_ss(_Math, PS_1_OVER_6FACTORIAL);
		_Math = _mm_mul_ss(_Math, _Value);
		_Math = _mm_add_ss(_Math, PS_1_OVER_4FACTORIAL);
		_Math = _mm_mul_ss(_Math, _Value);
		_Math = _mm_add_ss(_Math, PS_1_OVER_2FACTORIAL);
		_Math = _mm_mul_ss(_Math, _Value);
		_Math = _mm_add_ss(_Math, PS_ONE);

		float _Out;
		_mm_store_ss(&_Out, _Math);
		return _Out;
	}
}

int main()
{
	std::cout.precision(10);
	float fValue(KS::F_PI * -2.0f);
	while (fValue < (KS::F_PI * 2))
	{
		float fDef(std::cos(fValue));
		float fSSE(KS::cossse(fValue));
		float fDif(fDef - fSSE);
		/*	Print before incrementing so the listed input matches the computed values	*/
		std::cout << fValue << "\t" << fDef << "\t" << fSSE << "\t" << fDif << "\n";
		fValue += 0.1f;
	}
	std::cout.flush();
	return 0;
}
```

##### Share on other sites
From your numbers (I didn't bother with the code), it looks like you are using the 10th-degree Taylor expansion at 0 as your quick cosine formula. You are evaluating your formula pretty far away from 0, and the error you are seeing is normal. There are several things you can do to improve the accuracy:
* Use the 12th- or 14th-degree Taylor expansion.
* Use your formula only for angles under pi/2 and use reflection to compute the others.
* Use a different polynomial approximation that is designed to have the smallest maximum error for whatever interval you'll be evaluating it in. (Sorry I don't have a reference handy for how to do this, and I don't have time to look for one.)
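A minimal sketch of the reflection idea in the second bullet (the helper names `cos_poly` and `cos_reflected` are made up; the polynomial is the same degree-10 Taylor expansion discussed in the thread):

```cpp
#include <cmath>

// Degree-10 Maclaurin polynomial for cosine, accurate on [0, pi/2].
static float cos_poly(float x)
{
	const float z = -x * x;
	float p = 1.0f / 3628800.0f;  // 1/10!
	p = p * z + 1.0f / 40320.0f;  // 1/8!
	p = p * z + 1.0f / 720.0f;    // 1/6!
	p = p * z + 1.0f / 24.0f;     // 1/4!
	p = p * z + 1.0f / 2.0f;      // 1/2!
	return p * z + 1.0f;
}

// Fold any x into [0, pi/2] using cos(-x) == cos(x), the 2*pi period,
// and the reflection cos(pi - r) == -cos(r), then call the polynomial.
float cos_reflected(float x)
{
	const float pi = 3.14159265358979f;
	x = std::fabs(std::fmod(x, 2.0f * pi));  // even symmetry + periodicity
	if (x > pi)
		x = 2.0f * pi - x;                   // cos(2*pi - r) == cos(r)
	if (x > pi / 2.0f)
		return -cos_poly(pi - x);            // reflection across pi/2
	return cos_poly(x);
}
```

Because the polynomial only ever sees arguments in [0, pi/2], the blow-up near pi disappears; the cost is a `fmod`, a compare, and a sign flip.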

##### Share on other sites
After a little more experimenting, adding a few more factorial terms greatly reduces the divergence (which makes sense). Most of the articles I've read have said that for single precision (floats), 1/10! tends to be enough, but from what I'm seeing it's more along the lines of 1/16!.
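A quick scalar check agrees (the term-recurrence form below is a sketch, not the SSE code; `n` counts terms beyond the leading 1, so n = 5 stops at 1/10! and n = 8 at 1/16!):

```cpp
#include <cmath>

// Maclaurin cosine truncated after n terms beyond the leading 1.
// Each term comes from the previous one via t_{k+1} = t_k * (-x*x) / ((2k+1)(2k+2)),
// so any truncation degree can be tried without hard-coding factorials.
double taylor_cos_n(double x, int n)
{
	const double z = -x * x;
	double term = 1.0;
	double sum = 1.0;
	for (int k = 0; k < n; ++k)
	{
		term *= z / ((2.0 * k + 1.0) * (2.0 * k + 2.0));
		sum += term;
	}
	return sum;
}
```

At x = 3.14, stopping at 1/10! (n = 5) leaves an error around 2e-3, while stopping at 1/16! (n = 8) brings it down to roughly 1e-7.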

##### Share on other sites
Thanks for the suggestions

Quote:
 Original post by alvaro
 * Use the 12th- or 14th-degree Taylor expansion.

Yeah, this seems to be the easy solution at the moment.

Quote:
 * Use your formula only for angles under pi/2 and use reflection to compute the others.

I think this might be the absolute cheapest way, if I can figure out how. (Trig has never been one of my strong points).

Quote:
 * Use a different polynomial approximation that is designed to have the smallest maximum error for whatever interval you'll be evaluating it in. (Sorry I don't have a reference handy for how to do this, and I don't have time to look for one.)

From what I've seen, the Taylor series tends to be the cheapest with good results (once the series math begins, it's just a multiply and an add per term); CORDIC and Euler tend to require significantly more math to achieve similar results (the article actually addressed this).

##### Share on other sites
Quote:
Original post by Rattrap
Quote:
 * Use a different polynomial approximation that is designed to have the smallest maximum error for whatever interval you'll be evaluating it in. (Sorry I don't have a reference handy for how to do this, and I don't have time to look for one.)

From what I've seen, the Taylor series tends to be the cheapest with good results (once the series math begins, it's just a multiply and an add per term); CORDIC and Euler tend to require significantly more math to achieve similar results (the article actually addressed this).

I am not sure you understood what I said in that last bullet point: I am suggesting using a polynomial approximation which would basically have identical code to the evaluation of the Taylor expansion, but slightly different coefficients, to distribute the error better along the interval of interest. I believe the technique is called "minimax polynomial approximation". Robin Green has a good article that covers it in "Game Programming Gems 3" ("More Approximations to Trigonometric Functions").
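To make the comparison concrete, here is a sketch with the same Horner structure as the Taylor version but with minimax-style coefficients as found in the Cephes math library's single-precision sine/cosine (quoted from memory, so verify against the Cephes source; Cephes only uses this on |x| <= pi/4, after reducing the argument by octants):

```cpp
#include <cmath>

// Polynomial cosine on |x| <= pi/4, coefficients from the Cephes math
// library. Structurally identical to the Taylor evaluation; the constants
// are just slightly retuned, e.g. 2.4433e-5 here versus 1/8! ~= 2.4802e-5.
float cos_minimax_core(float x)
{
	const float z = x * x;
	float p =  2.443315711809948e-5f;   // near  1/8!
	p = p * z - 1.388731625493765e-3f;  // near -1/6!
	p = p * z + 4.166664568298827e-2f;  // near  1/4!
	return p * z * z - 0.5f * z + 1.0f;
}
```

Same multiply-add count as the truncated Taylor series, but the error is spread evenly over the interval instead of piling up at the endpoint.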

##### Share on other sites
Quote:
 Original post by alvaro
 I am not sure you understood what I said in that last bullet point: I am suggesting using a polynomial approximation which would basically have identical code to the evaluation of the Taylor expansion, but slightly different coefficients, to distribute the error better along the interval of interest. I believe the technique is called "minimax polynomial approximation". Robin Green has a good article that covers it in "Game Programming Gems 3" ("More Approximations to Trigonometric Functions").

Gotcha. I'll take a look into that (I own that book already, which will make that easier).

I had found another example online that used some constants that didn't make any sense to me (I wasn't sure where they were coming from); I'm thinking that might be related. Though when I tested that code (it was all ASM), it tended to have a lot of drift too.

Thanks for the suggestion.

##### Share on other sites
Quote:
 Original post by Rattrap

Quote:
 * Use your formula only for angles under pi/2 and use reflection to compute the others.

I think this might be the absolute cheapest way, if I can figure out how. (Trig has never been one of my strong points).

sin(pi - x) = sin(x)

So if x < pi/2, simply use your approximation on x. If pi/2 < x < pi, compute y = pi - x and use sin(y). Similar reasoning holds for the other two quadrants: if pi < x < 3*pi/2, use -sin(x - pi), and if 3*pi/2 < x < 2*pi, use -sin(2*pi - x). That way every call to your sin approximation has an argument less than pi/2. If you draw a circle and then draw the mirrored lines in the various quadrants, it becomes more intuitive why this works.
Jesse
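Jesse's four cases can be sketched like this (assuming x is already in [0, 2*pi); `sin_approx` is a stand-in for any small-angle approximation, here a degree-9 Taylor polynomial):

```cpp
#include <cmath>

// Placeholder small-angle approximation: degree-9 Maclaurin sine,
// accurate on [0, pi/2]. Any approximation valid there would do.
static float sin_approx(float x)
{
	const float z = -x * x;
	float p = 1.0f / 362880.0f;  // 1/9!
	p = p * z + 1.0f / 5040.0f;  // 1/7!
	p = p * z + 1.0f / 120.0f;   // 1/5!
	p = p * z + 1.0f / 6.0f;     // 1/3!
	return (p * z + 1.0f) * x;
}

// Quadrant folding for sine on [0, 2*pi), per the rules above:
// Q1: sin(x), Q2: sin(pi - x), Q3: -sin(x - pi), Q4: -sin(2*pi - x).
float sin_by_quadrant(float x)
{
	const float pi = 3.14159265358979f;
	if (x < pi / 2.0f)        return  sin_approx(x);
	if (x < pi)               return  sin_approx(pi - x);
	if (x < 3.0f * pi / 2.0f) return -sin_approx(x - pi);
	return                           -sin_approx(2.0f * pi - x);
}
```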
