# simd sine?

Recently I was SIMD-izing some source code and ran into a matrix rotation that uses the sine/cosine functions. On the FPU those are easy, just use fsincos, but what would be the fastest/most effective way to get the sine and cosine using the SIMD instructions?

I implemented a Taylor polynomial for a trial run, and I noticed that with my code (see below) there is a little bit of inaccuracy, especially when using singles. Has anyone here ever done this before? I'm not even sure this is an "optimal" solution considering the inaccuracies.

I estimate ~42 clocks to sine 4 packed singles or 2 packed doubles, but with those inaccuracies it might not be worth it.

BTW, this is testing code, so everything uses scalar singles and there is no pipelining. Nor do I check the bounds of x to be sure it's between 0 and 2pi. Also, it's in SpAsm syntax, not Intel syntax.

    ; ------------ 8< --------------
    [ff6r: F$6.0] [ff120r: F$120.0] [ff5040r: F$5040.0]
    [sinex: F$0.5]

    ; initialize fpu constants (each n is replaced by 1/n):
    fld F$ff6r | fld1 | fdivrp
    fld F$ff120r | fld1 | fdivrp
    fld F$ff5040r | fld1 | fdivrp
    fstp F$ff5040r | fstp F$ff120r | fstp F$ff6r

    ; formula:
    ; x-((1/6)*x^3)+((1/120)*x^5)-((1/5040)*x^7)

    ; assume a scalar single for simplicity:
    movss xmm0 X$sinex | movss xmm1 X$sinex
    mulss xmm1 xmm0 | mulss xmm1 xmm0 ; xmm1 = x^3
    movss xmm2 xmm1 ; copy it to xmm2
    mulss xmm1 X$ff6r ; xmm1 = (1/6)*x^3
    mulss xmm2 xmm0 | mulss xmm2 xmm0 ; xmm2 = x^5
    movss xmm3 xmm2 ; copy it to xmm3
    mulss xmm2 X$ff120r ; xmm2 = (1/120)*x^5
    mulss xmm3 xmm0 | subss xmm1 xmm2 ; xmm1 = ((1/6)*x^3)-((1/120)*x^5)
    mulss xmm3 xmm0 ; xmm3 = x^7
    mulss xmm3 X$ff5040r | addss xmm1 xmm3 ; xmm1 = ((1/6)*x^3)-((1/120)*x^5)+((1/5040)*x^7)
    subss xmm0 xmm1 ; xmm0 = x - xmm1 = sin(x)
    dbgxmm
    ; ------------ 8< --------------
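
For comparison, here is a minimal sketch of the same degree-7 Taylor polynomial written with SSE intrinsics in C, doing four packed singles at once. The names `sin4_taylor7` and `sin4_taylor7_array` are made up for illustration, and the sketch assumes an x86 target where `<xmmintrin.h>` is available. Like the test code above, it does no range reduction, so it is only reasonable for small |x|.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target */

/* Approximate sin(x) for four packed singles with the Taylor polynomial
 *   x - x^3/6 + x^5/120 - x^7/5040
 * evaluated in Horner form on x^2. No range reduction is done, so the
 * result is only useful for small |x| (hypothetical helper name). */
static inline __m128 sin4_taylor7(__m128 x)
{
    const __m128 c3 = _mm_set1_ps(-1.0f / 6.0f);
    const __m128 c5 = _mm_set1_ps( 1.0f / 120.0f);
    const __m128 c7 = _mm_set1_ps(-1.0f / 5040.0f);

    __m128 x2 = _mm_mul_ps(x, x);                   /* x^2 */
    __m128 p  = _mm_add_ps(_mm_mul_ps(c7, x2), c5); /* c5 + c7*x^2 */
    p = _mm_add_ps(_mm_mul_ps(p, x2), c3);          /* c3 + c5*x^2 + c7*x^4 */
    p = _mm_mul_ps(_mm_mul_ps(p, x2), x);           /* c3*x^3 + c5*x^5 + c7*x^7 */
    return _mm_add_ps(x, p);                        /* x + the odd higher terms */
}

/* Convenience wrapper for plain float arrays (also a hypothetical name).
 * Uses unaligned loads/stores so the caller needs no special alignment. */
void sin4_taylor7_array(const float in[4], float out[4])
{
    _mm_storeu_ps(out, sin4_taylor7(_mm_loadu_ps(in)));
}
```

Horner form needs 5 multiplies and 3 adds per four values and never forms x^5 or x^7 separately. The truncation error of this polynomial is roughly x^9/9!, so about 2.7e-6 at x = 1 and about 1.6e-4 at x = pi/2, and it grows quickly beyond that, which is why the input should be reduced to a small interval first.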

Download AMD's maths library for 3DNow!, see how they did it, and rewrite it for SSE. I assume that AMD would know what they are doing, and that it is 100% accurate.

I must be missing something, but it's not open source, just a DLL.

You gave me a good idea though... after a little searching on Intel's site, I came across the Approximate Math Library:

http://www.intel.com/design/pentiumiii/devtools/AMaths.zip

Very nice, very nice indeed.
