# a gripe about tiny speed increases

I don't like the standard C math library. I ran a speed test on some of these functions a while back: fabs(), which should take about five ticks, clocked in at nearly forty, which is about as long as your processor takes to compute a square root! sin and cos can be about twice as slow as they should be. People panic about speed, but a few minutes spent rewriting, e.g.,
double Sin(float theta)
{
__asm {
fld theta
fsin
}     // simple, isn't it?
}

can give massive percentage increases. I'm from the ultra-tight-code school of thought, so seeing people use fabs() liberally makes me itch. What really bugs me is not the slight difference in speed, but the fact that nobody seems to care! It's a lonely category in the world of COM users. The only trouble is that I can't get powers to work: f2xm1 does nothing on my processor, but pow() works fine. Ho hum.

********

A Problem Worthy of Attack
Proves Its Worth by Fighting Back

##### Share on other sites
People also don't know enough asm to do it. I know a little asm, but I have no idea what the code you just wrote does, other than return the sine.

---
Make it work.
Make it fast.

"I’m happy to share what I can, because I’m in it for the love of programming. The Ferraris are just gravy, honest!" --John Carmack: Foreword to Graphics Programming Black Book

##### Share on other sites
Dean Harding    546
I hope you're comparing Release-mode code and not Debug code. My standard libraries (the ones that come with Visual C++) turn sin() and co. into the fsin and co. instructions...

If I had my way, I'd have all of you shot!

codeka.com - Just click it.

##### Share on other sites
Right you are!
By the way, replace fabs() with *(uint32*)&f &= 0x7fffffff.

Dean: my VC++ 6 calls _fabs in "optimize for size" mode, and uses fld/fabs otherwise.
Also, if you need both sin and cos, fsincos will be 2x as fast.
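
A minimal sketch of the sign-bit trick in standard C, assuming `float` is a 32-bit IEEE 754 single. It uses `memcpy` instead of the pointer cast because the cast breaks the aliasing rules that current optimizers rely on; compilers typically reduce the copies to a plain register move. `abs_by_mask` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: absolute value of a float by clearing bit 31.
   Assumes float is a 32-bit IEEE 754 single. */
static float abs_by_mask(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* guard the size assumption */
    memcpy(&bits, &f, sizeof bits);    /* read the raw bit pattern */
    bits &= 0x7fffffffu;               /* clear the sign bit only */
    memcpy(&f, &bits, sizeof f);       /* write the pattern back */
    return f;
}
```

The mask leaves the exponent and mantissa untouched, so -0.0 becomes +0.0, negative infinity becomes positive infinity, and a NaN stays a NaN.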

##### Share on other sites
petewood    819
When people post about the language being slow but don't mention which implementation they're talking about, it doesn't really make any sense to listen.

I personally implement part of the C standard library like this...

give me a piece of paper with a number written on it in pencil

i rub out the minus sign if there is one

i give it back to you

fab!

##### Share on other sites
I use Borland. Is there some unorthodox set of #define symbols to speed things up?

##### Share on other sites
pete:
Spot the non-standard keyword that narrows it down to MS or Borland? Or maybe his observation applies to all implementations that I know of...

Don't have BCB installed anymore, but #pragma intrinsic should do the trick.

##### Share on other sites
Shannon Barber    1681
Your assembly potentially contains a bug (I think you need an fwait after fsin), and several IEEE non-conformances.

While *(uint32*)&f &= 0x7fffffff works for the expected case, it too contains an IEEE non-conformity.

If you can cut something out and decide you don't care about it, you can always go faster. If you need consistent, portable, standard behavior, you can't cut the corners.

If you're interested in fast math functions, look at Intel's Math Kernel Library.

[edited by - Magmai Kai Holmlor on October 2, 2002 11:41:47 PM]

##### Share on other sites
quote:
Original post by Magmai Kai Holmlor
I think you need an fwait after fsin

Nope. (OK, unless the x in x86 <= 3)

quote:
Original post by Magmai Kai Holmlor
Your assembly potentially contains [...] several IEEE non-conformances.

While *(uint32*)&f &= 0x7fffffff works for the expected case, it too contains an IEEE non-conformity.

Correct. However, I usually gratuitously assume that the arguments aren't denormals or SNaNs, and that |angle| < 2^63.

quote:
Original post by Magmai Kai Holmlor
If you can cut something out and decide you don't care about it, you can always go faster. If you need consistent, portable, standard behavior, you can't cut the corners.

Also correct. I gather he's interested in speeding things up, though, and GIGO is a reasonable way to do so, if you control input.
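
The |angle| < 2^63 assumption can be made explicit without costing anything in release builds. Here is a minimal sketch, assuming the fast path is a bare fsin, which leaves its operand unchanged and sets the C2 status flag when the magnitude reaches 2^63. `FSIN_LIMIT`, `in_fsin_range` and `check_angle` are hypothetical names.

```c
#include <assert.h>

/* 2^63 as a double; a bare fsin does not reduce arguments at or
   beyond this magnitude. */
#define FSIN_LIMIT 9223372036854775808.0

/* Hypothetical helper: nonzero if theta is acceptable to a bare fsin.
   A NaN fails both comparisons, so it is rejected as well. */
static int in_fsin_range(double theta)
{
    return theta > -FSIN_LIMIT && theta < FSIN_LIMIT;
}

/* Debug builds trap bad input here; with NDEBUG defined the assert
   compiles away and the caller is trusted (the GIGO approach). */
static void check_angle(double theta)
{
    assert(in_fsin_range(theta));
    (void)theta;   /* avoid an unused-parameter warning under NDEBUG */
}
```

Calling `check_angle(theta)` just before the inline-asm routine keeps the "I control my input" contract visible in debug builds while leaving the release path untouched.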