sin() / cos() speed

I'm reading Tricks of the 3D Game Programming Gurus at the moment, and LaMothe writes that the trig functions are very slow and that the best thing to do when computing sin and cos in a game is to use pre-built lookup tables. This is obviously faster, but in this book he also creates a function that calculates sin and cos for non-integer values by interpolating between table entries. I performed a small speed test and it turns out that calculating these values from the lookup table was actually *slower*! For your game engines, do you guys just use the sin() and cos() functions, or is there a better way?

By the way, the code from the book is something like this:

float Sin(float theta)   // theta in degrees
{
    // fSin[] holds sin() of 0..360 degrees in whole-degree steps. It needs
    // 361 entries, because fSin[iThetaInt + 1] can read fSin[360].
    theta = fmodf(theta, 360);
    if (theta < 0)
        theta += 360;
    if (theta >= 360)    // a tiny negative theta + 360 can round to 360.0f
        theta -= 360;

    int iThetaInt = (int)theta;              // whole degrees, 0..359
    float fThetaFrac = theta - iThetaInt;    // fractional part, 0 <= frac < 1

    // linear interpolation between the two neighboring table entries
    return(fSin[iThetaInt] + fThetaFrac * (fSin[iThetaInt + 1] - fSin[iThetaInt]));
}

Turns out that the fmodf() function seems to take twice as long as just calling sin()!

[edited by - Strayfire on May 29, 2004 12:27:27 AM]

The x87 FPU has an instruction for this, FSIN, available on at least every processor in the P6 family:

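inline float Sin(float theta)  // theta in radians; MSVC inline asm, 32-bit x86 only
{
    _asm {
        FLD theta   // push theta onto the FPU register stack
        FSIN        // st0 = sin(st0); only valid for |theta| < 2^63
        FSTP theta  // store st0 into theta and pop the register stack
    }
    return theta;   // theta now holds sin(theta)
}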


Make your function inline because it is so small; this removes the call overhead of pushing EIP (and maybe theta) onto the stack, but I'm not sure. Another thing: there's probably a more efficient way of returning the value in st0, but I'm just learning IA-32 and couldn't tell you where the compiler puts floating-point return values. A method with potentially less overhead might be a macro:


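// Params: theta (input) and dest (output), both 32-bit floats.
// In an MSVC macro, each instruction needs its own _asm, and comments
// must be /* */ style because the lines are spliced together.
#define Sin(theta, dest) \
_asm {                   \
    _asm FLD theta       \
    _asm FSIN            \
    _asm FSTP dest       \
}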

Sorry for the syntax errors; the reply edit box isn't a very good compiler.

Thanks, that certainly sped it up. It's a little under 1.5x the speed of calling sin(). I always thought that the trig functions were slow, but even the standard sin() and cos() functions seem to be pretty quick. Odd.

No, it's about 1.5x the speed. Definitely worth the speed increase. I just think it's odd that even the C library trig functions seem to be a lot faster than what I've heard in the past.

Temporarily changing the precision control to 24-bit significands (for single precision) may speed things up a bit:

unsigned short x87fpuControlWordStore;            // caller's control word
unsigned short x87fpuControlWordPCMask = 0xFCFF;  // 1111 1100 1111 1111: clears bits 8 and 9 (PC field)
unsigned short x87fpuControlWordLoad;             // same control word with PC = 00 (24-bit)

// For 32-bit floating point only (MSVC inline asm, 32-bit x86).
// In a macro: one _asm per instruction, /* */ comments only.
#define Sin(theta, dest)                                                 \
_asm {                                                                   \
    /* save the control word without checking for pending exceptions */ \
    _asm FNSTCW x87fpuControlWordStore                                   \
    /* mask the precision-control field to 00 (24-bit) */                \
    _asm MOV AX, x87fpuControlWordStore                                  \
    _asm AND AX, x87fpuControlWordPCMask                                 \
    _asm MOV x87fpuControlWordLoad, AX   /* copy to WordLoad */          \
    _asm FLDCW x87fpuControlWordLoad     /* load new control word */     \
    /* continue with calculations */                                     \
    _asm FLD theta                                                       \
    _asm FSIN                                                            \
    _asm FSTP dest                                                       \
    /* restore the previous control word */                              \
    _asm FLDCW x87fpuControlWordStore                                    \
}

Probably needs some syntax tweaking. I also suspect it may be slower because of all the memory accesses. Changing the precision should probably be done once at the beginning of a stretch of code that you know will use only floats, not every time you want to find a sine, but the code above shows how (if it works, lol).
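For reference, a small portable C sketch of the bit arithmetic behind these masks, assuming the control-word layout given in Intel's IA-32 manuals: precision control (PC) in bits 8-9 (00 = 24-bit, 10 = 53-bit, 11 = 64-bit significand) and rounding control (RC) in bits 10-11. Clearing PC takes the mask 0xFCFF; a mask of 0xF3FF would instead clear RC, selecting round-to-nearest and leaving the precision alone. The same manuals state that PC only affects FADD, FSUB, FMUL, FDIV (and their variants) and FSQRT, not transcendental instructions such as FSIN, so any gain would come from the surrounding arithmetic. The macro and function names are hypothetical, and the code only computes control-word values; it never touches the FPU.

```c
#include <assert.h>

/* x87 FPU control-word fields, per Intel's IA-32 manuals (names are hypothetical). */
#define X87_PC_MASK 0x0300u   /* bits 8-9: precision control */
#define X87_RC_MASK 0x0C00u   /* bits 10-11: rounding control */
#define X87_PC_24   0x0000u   /* PC = 00: 24-bit significand (single) */
#define X87_PC_53   0x0200u   /* PC = 10: 53-bit significand (double) */
#define X87_PC_64   0x0300u   /* PC = 11: 64-bit significand (double extended) */

/* Return control word cw with its PC field replaced by pc.
   Pure bit arithmetic: this does not read or write the FPU. */
unsigned short SetPC(unsigned short cw, unsigned int pc)
{
    assert((pc & ~X87_PC_MASK) == 0);  /* pc may only use bits 8-9 */
    return (unsigned short)((cw & ~X87_PC_MASK) | pc);
}
```

For example, 0x037F (the value FINIT loads, with PC = 11) becomes 0x007F with PC = 00, while 0x037F & 0xF3FF is still 0x037F, so that mask changes nothing for a freshly initialized FPU.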

quote:
Original post by Strayfire
No, it's about 1.5x the speed. Definitely worth the speed increase. I just think it's odd that even the C library trig functions seem to be a lot faster than what I've heard in the past.


Lol, I'm surprised my code actually worked. Did you have to tweak it at all?

How are you measuring the speed? With QueryPerformanceCounter()? I hear it's really precise and accurate. There's a cockroach crawling around my room... eek!

EDIT: Don't run the last code snippet! There's a bug, and I don't know what it will do! Hold on, I'm working on something with fewer memory accesses and the correct mask, lol

[edited by - temp_ie_cant_thinkof_name on May 30, 2004 2:58:05 AM]

I just used the original inline function you posted. I usually time code by looping hundreds of thousands of times and calculating the time difference with GetTickCount(). It's probably not the best method of timing code, but it gives me a good idea of how fast one piece of code is compared to another. For instance, Sin() looped about 360,000 times took about 5 seconds, while the C library sin() took about 7.5 seconds.

I went to the kitchen to get a snack, thought about it for a few minutes, and came up with a (probably) faster piece of code. To test its effectiveness, you have to time it both with the PC changed to 24-bit and without. I'm not really sure whether the compiler generates that code for floats or not, but if it doesn't, then doing this may roughly double the speed compared with the default double-extended precision (80-bit, ouch!).

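unsigned short ControlWord;      // caller's control word, restored by RestorePrecision
unsigned short ControlWordPC24;  // same control word with PC = 00 (24-bit)

// MSVC inline asm in macros: one _asm per instruction, /* */ comments only.
#define SetPrecision24                                         \
_asm {                                                         \
    _asm FNSTCW ControlWord      /* store the current CW */    \
    _asm MOV AX, 0xFCFF          /* make bits 8 and 9 be 0 */  \
    _asm AND AX, ControlWord     /* AND CW with AX --> AX */   \
    _asm MOV ControlWordPC24, AX /* copy AX to memory */       \
    _asm FLDCW ControlWordPC24   /* load PC = 00 CW */         \
}

#define RestorePrecision \
_asm { FLDCW ControlWord }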

#define Sin(theta, dest) _asm { //...


Fewer memory accesses, yay!

EDIT:
The x87 FPU also has FPTAN, FPATAN, FSINCOS (faster than computing the sine and then the cosine separately), and other transcendental instructions. Download a copy of the IA-32 architecture manuals from the Intel site to see how they're used.

edit: code

[edited by - temp_ie_cant_thinkof_name on May 30, 2004 3:22:18 AM]

Well, the last few functions for changing the precision haven't worked for me (fatal errors), but the inline Sin() function is working fine. I can't say I really understand what's going on in the precision code. I know a fair bit of assembly language, but not much when it comes to floating-point numbers. I ordered and received the Intel IA-32 manuals and AMD64 manuals about 4 months ago, but still haven't gotten around to reading them. Thanks a bunch for the code and for reminding me that I still had those manuals lying around.

Glad to help. Which functions weren't working? The macros in my last post? Because they should work... I may not have the syntax down, though. Anyway, I just thought it would be nice to know that the FPU was working with single precision instead of double-extended, which is 2x slower (guesstimating). I don't know whether VC++ sets the Precision Control field in the FPU Control Word or not, which is why I'm curious.
