a gripe about tiny speed increases

7 comments, last by walkingcarcass 21 years, 6 months ago
I don't like the standard C math library: I did a speed test on some of these functions a while back. fabs(), which should take about five ticks, I clocked at somewhere near forty, which is about as long as it takes your processor to compute a square root! sin and cos could be about twice as slow as they should be. People panic about speed, but a few minutes of rewriting, e.g.
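double Sin(float theta)
{
    double result;
    __asm {
        fld  theta      // push theta onto the x87 stack
        fsin            // ST(0) = sin(ST(0)); valid only for |theta| < 2^63
        fstp result     // pop the sine into result
    }
    return result;      // simple, isn't it?
}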
 
can give massive percentage increases. I'm from the ultra-tight code school of thought, so seeing people use fabs() liberally makes me itch. What really bugs me is not the slight increase in speed, but the fact that nobody seems to care! It's a lonely category in the world of COM users. The only trouble is I can't get powers to work: f2xm1 does nothing on my processor, but pow() works fine. Ho hum.

********
A Problem Worthy of Attack Proves Its Worth by Fighting Back
spraff.net: don't laugh, I'm still just starting...
People also don't know enough asm to do it. I know a little asm, but I have no idea what the code you just wrote does, other than return the sine.

---
Make it work.
Make it fast.

"I’m happy to share what I can, because I’m in it for the love of programming. The Ferraris are just gravy, honest!" --John Carmack: Foreword to Graphics Programming Black Book
"None of us learn in a vacuum; we all stand on the shoulders of giants such as Wirth and Knuth and thousands of others. Lend your shoulders to building the future!" - Michael Abrash
[JavaGaming.org][The Java Tutorial][Slick][LWJGL][LWJGL Tutorials for NeHe][LWJGL Wiki][jMonkey Engine]
I hope you're comparing Release-mode code and not Debug code. My standard libraries (the ones that come with Visual C++) turn sin() and friends into the fsin and related instructions...

If I had my way, I'd have all of you shot!

codeka.com - Just click it.
Right you are!
By the way, for a 32-bit float f, you can replace fabs(f) with *(uint32*)&f &= 0x7fffffff.
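The trick works because an IEEE-754 single keeps its sign in bit 31. Writing through a uint32 pointer to a float breaks C's strict-aliasing rule, though, so an optimizing compiler is allowed to miscompile it. A safer sketch, assuming float is a 32-bit IEEE single (the _Static_assert only checks the size), copies the bits with memcpy; fast_fabsf is an illustrative name, not a library function.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Clear the sign bit of a float without type-punning through a pointer.
 * GCC and Clang typically compile the memcpy calls down to register moves. */
float fast_fabsf(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the float's bytes */
    bits &= 0x7fffffffu;             /* bit 31 is the sign bit        */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, -2.5f is stored as 0xC0200000; clearing bit 31 leaves 0x40200000, which is 2.5f.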

Dean: my VC++ 6 calls _fabs in "optimize for size" mode, and uses fld/fabs otherwise.
Also, if you need both sin and cos, fsincos will be 2x as fast.
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
When people post about the language being slow but don't mention which implementation they're talking about, it doesn't really make sense to listen.

I personally implement part of the C standard library like this...

Give me a piece of paper with a number written on it in pencil.

I rub out the minus sign if there is one.

I give it back to you.

fab!
I use Borland. Is there some unorthodox set of #define symbols to speed things up?

pete:
Did you spot the non-standard keyword (__asm) that narrows it down to MS or Borland? Or maybe his observation applies to all implementations that I know of...

I don't have BCB (Borland C++ Builder) installed anymore, but #pragma intrinsic should do the trick.
Your assembly potentially contains a bug (I think you need a fwait after fsin), and several IEEE non-conformances.

While *(uint32*)&f &= 0x7fffffff works for the expected case, it too contains an IEEE non-conformity.

If you can cut something out and decide you don't care about it, you can always go faster. If you need consistent, portable, standard behavior, you can't cut the corners.

If you're interested in fast math functions, look at Intel's Math Kernel Library.

[edited by - Magmai Kai Holmlor on October 2, 2002 11:41:47 PM]
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
Quoting Magmai Kai Holmlor: "I think you need a fwait after fsin"

Nope. (OK, unless the x in x86 <= 3)

Quoting Magmai Kai Holmlor: "Your assembly potentially contains [...] several IEEE non-conformances. While *(uint32*)&f &= 0x7fffffff works for the expected case, it too contains an IEEE non-conformity."

Correct. However, I usually gratuitously assume that the arguments aren't denormals or SNaNs, and that |angle| < 2^63.

Quoting Magmai Kai Holmlor: "If you can cut something out and decide you don't care about it, you can always go faster. If you need consistent, portable, standard behavior, you can't cut the corners."

Also correct. I gather he's interested in speeding things up, though, and GIGO (garbage in, garbage out) is a reasonable way to do so if you control the input.

