GCC optimization

Started by
2 comments, last by q2guy 20 years, 1 month ago
Hi, I would like to know the best GCC optimization options for a Pentium 4 without Hyper-Threading. This is what I have in mind:
-march=pentium4 -O2 -pipe -fomit-frame-pointer -mmmx -msse -msse2 -ffast-math -mfpmath=sse -fprefetch-loop-arrays -finline-functions -frename-registers -fforce-mem -fforce-addr -malign-double -falign-functions=64 -falign-loops=5 -falign-jumps=5 -falign-labels=5
I just want to know how you would do it if you wanted to optimize to the maximum for a specific CPU (the P4) without using asm. I know it may only be very slightly faster, but I would like to know. Thanks.
I'm not sure about the P4 specifically, but a lot of systems actually run slower with -finline-functions and -funroll-loops enabled. Just because something sounds like an improvement doesn't mean it actually is. You might do just as well with something like:
-march=pentium4 -Os -pipe -fomit-frame-pointer -msse2 -mfpmath=sse 


In a lot of cases with recent systems, the processor isn't the real bottleneck, so you'll see a better improvement if you try to optimise memory and hard disk usage (hence -Os).

Also, beware of -ffast-math. On modern processors, the FPU math is quite fast enough, and -ffast-math can produce inaccurate results. As one of the Gentoo developers said: "If I want my math done wrong, I'll ask my cat."

Disclaimer: I don't have a P4, and I haven't benchmarked any of these.
I know that unrolling loops is bad for a modern CPU's cache, but inlining functions too? It only avoids the call to some functions (3 or 4 asm instructions).
Not thrashing the cache is better than any function call overhead...
