
Recently I was SIMD-izing some source code, and I ran into a matrix rotation that uses the sine/cosine functions. On the FPU those are easy, just use fsincos, but what would be the fastest/most effective way to get the sine and cosine using the SIMD instructions?

I implemented a Taylor polynomial for a trial run, and I noticed that with my code (see below) there is a little bit of inaccuracy (especially when using singles). Has anyone here ever done this before? I'm not even sure this is an "optimum" solution considering the inaccuracies.

I estimate ~42 clocks to sine 4 packed singles or 2 packed doubles. But with those inaccuracies, it might not be worth it.

BTW, this is testing code, so everything uses scalar singles and no pipelining. Nor do I check the bounds of x to be sure it's between 0 and 2pi. Also, it's in SpAsm syntax, not Intel syntax.

; ------------ 8< --------------
[ff6r: F$6.0] [ff120r: F$120.0] [ff5040r: F$5040.0] [sinex: F$0.5]

; initialize fpu constants (replace each n with its reciprocal 1/n):
fld F$ff6r | fld1 | fdivrp
fld F$ff120r | fld1 | fdivrp
fld F$ff5040r | fld1 | fdivrp
fstp F$ff5040r | fstp F$ff120r | fstp F$ff6r

; formula:
; x-((1/6)*x^3)+((1/120)*x^5)-((1/5040)*x^7)

; assume scalar single for simplicity:
movss xmm0 X$sinex | movss xmm1 X$sinex
mulss xmm1 xmm0 | mulss xmm1 xmm0       ; xmm1 = x^3
movss xmm2 xmm1                         ; copy x^3 to xmm2
mulss xmm1 X$ff6r                       ; xmm1 = (1/6)*x^3
mulss xmm2 xmm0 | mulss xmm2 xmm0       ; xmm2 = x^5
movss xmm3 xmm2                         ; copy x^5 to xmm3
mulss xmm2 X$ff120r                     ; xmm2 = (1/120)*x^5
mulss xmm3 xmm0 | subss xmm1 xmm2       ; xmm1 = ((1/6)*x^3)-((1/120)*x^5)
mulss xmm3 xmm0                         ; xmm3 = x^7
mulss xmm3 X$ff5040r | addss xmm1 xmm3  ; xmm1 = ((1/6)*x^3)-((1/120)*x^5)+((1/5040)*x^7)
subss xmm0 xmm1                         ; xmm0 = x - xmm1 = approximate sin(x)
dbgxmm
; ------------ 8< --------------
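For comparison, here is a rough sketch of the same degree-7 Taylor polynomial written with SSE intrinsics in C, handling four packed singles per call. It is only an illustration, not a tuned or definitive routine. The function names `sin_poly7_ps` and `sin_poly7_x4` are made up for this example, and it assumes the inputs have already been reduced to roughly [-pi/2, pi/2]. It uses Horner's form, which needs fewer multiplies than building x^3, x^5 and x^7 separately.

```c
#include <xmmintrin.h>  /* SSE intrinsics (x86/x86-64 only) */

/* Hypothetical helper: degree-7 Taylor approximation of sin(x) for four
 * packed singles, evaluated in Horner form:
 *   x * (1 + x2*(-1/6 + x2*(1/120 + x2*(-1/5040)))),  with x2 = x*x
 * Assumes each lane is already range-reduced to about [-pi/2, pi/2]. */
static __m128 sin_poly7_ps(__m128 x)
{
    const __m128 one = _mm_set1_ps(1.0f);
    const __m128 c3  = _mm_set1_ps(-1.0f / 6.0f);
    const __m128 c5  = _mm_set1_ps( 1.0f / 120.0f);
    const __m128 c7  = _mm_set1_ps(-1.0f / 5040.0f);

    __m128 x2 = _mm_mul_ps(x, x);                     /* x^2            */
    __m128 p  = _mm_add_ps(_mm_mul_ps(c7, x2), c5);   /* c5 + c7*x^2    */
    p = _mm_add_ps(_mm_mul_ps(p, x2), c3);            /* c3 + x^2*(...) */
    p = _mm_add_ps(_mm_mul_ps(p, x2), one);           /* 1  + x^2*(...) */
    return _mm_mul_ps(p, x);                          /* x * (...)      */
}

/* Hypothetical convenience wrapper: unaligned load, evaluate, store. */
static void sin_poly7_x4(const float in[4], float out[4])
{
    _mm_storeu_ps(out, sin_poly7_ps(_mm_loadu_ps(in)));
}
```

Calling `sin_poly7_x4` on an array of four angles fills the output array with the approximations. The truncation error of this polynomial is bounded by |x|^9/9!, which is about 1.6e-4 at x = pi/2 and grows quickly beyond that, so range reduction matters far more than single versus double precision here. A minimax-fitted polynomial of the same degree would likely give a smaller worst-case error than the plain Taylor coefficients.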

Download AMD's maths library for 3DNow!, see how they did it, and rewrite it for SSE. I assume that AMD would know what they are doing, and it is 100% accurate.
