fir

What gcc options to use to optimize an exe


I am using MinGW (I am new to it). Running some very simple raytracing code, I noticed these results when compiling with different flags and running:

no -Ox flag   22 ms
-O1           12 ms
-O2           12 ms
-O3           6.5 ms

So the difference is quite shocking (I thought 22 ms was slow for the simple calculations I do; 7 ms is OK).

I am using Windows XP on an old Core 2 Duo machine.

What else could I set? Does anyone know which instruction set is generated by default, and what to set here?

Also, do I need -O3 only for the critical .o module, or for all the other modules too?


Start by making sure you have a modern version of gcc. Compiling the module that contains your inner loop with -O3 is the most important thing. You might get some mileage out of `-flto' (both in compilation and linking commands), which allows the compiler to inline functions across modules. Then you can try profile-based optimizations (where you compile your program with -fprofile-generate, run it on some test cases and then compile it again with -fprofile-use).

 

For more information: http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Optimize-Options.html
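A minimal sketch of the build variants mentioned above, attached to a stand-in for a raytracer inner loop. The file names (trace.c, main.c, rt), the vec3 type and the ray_hits_sphere function are hypothetical; only the flags (-O3, -flto, -fprofile-generate, -fprofile-use) come from the post. How much each variant helps depends on the code and the gcc version.

```c
/* Build variants (file names are made up for illustration):
 *
 *   gcc -O3 -c trace.c                        # -O3 on the module with the hot loop
 *
 *   gcc -O3 -flto -c trace.c main.c           # LTO: pass -flto when compiling...
 *   gcc -O3 -flto trace.o main.o -o rt        # ...and again when linking
 *
 *   gcc -O3 -fprofile-generate trace.c main.c -o rt   # instrumented build
 *   ./rt                                              # run test cases, writes .gcda profile data
 *   gcc -O3 -fprofile-use trace.c main.c -o rt        # rebuild using the recorded profile
 */
#include <assert.h>

typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns 1 if a ray (origin, normalized dir) hits the sphere, 0 otherwise.
 * Small hot functions like this are what inlining across modules can help with. */
int ray_hits_sphere(vec3 origin, vec3 dir, vec3 center, float radius)
{
    assert(radius > 0.0f);
    vec3 oc = { origin.x - center.x, origin.y - center.y, origin.z - center.z };
    float b = dot3(oc, dir);
    float c = dot3(oc, oc) - radius * radius;
    if (c < 0.0f) return 1;        /* origin is inside the sphere */
    if (b > 0.0f) return 0;        /* sphere is behind the ray */
    return b * b - c >= 0.0f;      /* discriminant test */
}
```

Timing the same test scene with each variant, as in the table in the first post, is the simplest way to see whether a flag pays off.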



I am using 

 

c:\MinGW\bin>g++ -v
Using built-in specs.
COLLECT_GCC=g++
Target: mingw32
Configured with: ../../src/gcc-4.7.1/configure --build=mingw32 --enable-languages=c,c++,ada,fortran,objc,obj-c++ --enable-threads=win32 --enable-libgomp --enable-lto --enable-fully-dynamic-string --enable-libstdcxx-debug --enable-version-specific-runtime-libs --with-gnu-ld --disable-nls --disable-win32-registry --disable-symvers --disable-build-poststage1-with-cxx --disable-werror --prefix=/mingw32tdm --with-local-prefix=/mingw32tdm --enable-cxx-flags='-fno-function-sections -fno-data-sections' --with-pkgversion=tdm-1 --enable-sjlj-exceptions --with-bugu
Thread model: win32
gcc version 4.7.1 (tdm-1)
 
Do you maybe know how to set the supported architecture?
I do not know what it uses by default, and I think that would be important.
Edited by fir


You can get a newer version of gcc by using mingw-w64 instead of mingw. It can generate 32 or 64-bit executables.

 

For 64-bit release builds, many projects use no flags beyond "-march=x86-64 -mtune=nocona -O3". You can fiddle with other flags if you'd like.

 

For 32-bit release builds, you'd probably want "-march=i686 -msse2 -mtune=nocona -mfpmath=sse -O3", since SSE math isn't the default and -mfpmath=sse only takes effect when SSE is enabled (which -march=i686 alone does not do). Note that this will use 64-bit FP temps instead of 80-bit temps, so it can break things if you're using a math library that requires 80-bit temps.

 

If you're not distributing the executables, you could use "-march=core2" instead so you get SSSE3's 16 instructions (-mtune only changes how the code is scheduled, not which instructions may be used). But it'd be kind of stupid to kill P4 support for so little gain if you actually distribute the executable.
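The note above about 64-bit versus 80-bit FP temps can be checked from inside the program. This is a small sketch, assuming a C99 compiler that defines FLT_EVAL_METHOD in <float.h> (gcc does); the function names and the file name fpcheck.c are made up for illustration.

```c
/* Try it with both kinds of 32-bit build (add or drop -m32 as your toolchain needs):
 *
 *   gcc -m32 -march=i686 -O3 -c fpcheck.c                        x87 math, typically reports 2
 *   gcc -m32 -march=i686 -msse2 -mfpmath=sse -O3 -c fpcheck.c    SSE math, typically reports 0
 *
 * -msse2 is passed explicitly because -mfpmath=sse needs SSE to be enabled.
 */
#include <assert.h>
#include <float.h>

/* Raw value of the standard macro for the flags this file was compiled with. */
int fp_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Human-readable meaning of that value, following the C99 definition. */
const char *fp_eval_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "temps keep the declared type (float 32-bit, double 64-bit)";
    case 1:  return "float temps are widened to double";
    case 2:  return "temps are widened to long double (80-bit on x87)";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}

/* Guard for code that really depends on 80-bit temps: fails loudly in a
 * debug build if the flags switched the compiler away from x87 math. */
void require_extended_temps(void)
{
    assert(FLT_EVAL_METHOD == 2 && "built without 80-bit x87 temporaries");
}
```

Calling require_extended_temps() once at startup is one way to catch a math library being built with the wrong -mfpmath setting.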



As I understand it, I can generate code for a specified architecture like this, but how do I generate for that architecture and everything above it, for example SSE2 and above?


And if you're not distributing binaries but only the source, which users compile themselves, you can use -march=native (gcc, mingw, clang), which will use every arch-specific optimization supported by the CPU of the machine doing the compiling. Don't do this if you are distributing the binaries, since it will prevent users without your CPU's features from running your program (in that case you either aim for a lowest common denominator such as SSE2, which every computer from the last decade or so supports, or you select code paths at runtime via cpuid, which is harder to maintain for little gain).
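Selecting code paths at runtime can be sketched roughly as below. This assumes gcc 4.8 or later on x86, where the builtins __builtin_cpu_init and __builtin_cpu_supports wrap the cpuid query (the 4.7.1 toolchain shown earlier in the thread does not have them). The function names are hypothetical, and dot_fast is only a stand-in: in a real project the fast path would sit in its own file compiled with something like -msse4.1.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*dot_fn)(const float *a, const float *b, size_t n);

/* Baseline path: plain code that runs on every CPU the binary targets. */
static float dot_baseline(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for a path built with newer instructions (here just unrolled C). */
static float dot_fast(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}

/* Ask the CPU once which path is safe; other compilers and targets
 * simply fall back to the baseline. */
dot_fn select_dot(void)
{
    int have_sse41 = 0;
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)) \
    && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    have_sse41 = __builtin_cpu_supports("sse4.1");
#endif
    return have_sse41 ? dot_fast : dot_baseline;
}

/* Public entry point: picks the implementation on first use. */
float dot(const float *a, const float *b, size_t n)
{
    static dot_fn impl = NULL;
    assert(a != NULL && b != NULL);
    if (impl == NULL)
        impl = select_dot();
    return impl(a, b, n);
}
```

Every dispatched routine needs at least two implementations kept in sync, which is the maintenance cost the post refers to.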



But how do I set SSE2 and above? I cannot specify, say, pentium4, because it works differently from the CPUs that came after it, so compiling for it could slow the later ones down. The optimizer should take some set of target CPUs, and maybe even some statistics about them, I think... or no?



Using "-march=pentium4" simply tells the compiler to target a generic Pentium 4 processor, in other words, to consider the following instruction set:

 

base x86 instruction set

+ mmx instructions

+ sse instructions

+ sse2 instructions

 

Any instruction outside this instruction set is forbidden to appear in the resulting binary (the optimizer just behaves as if it did not know about them). Because the x86 series of processors is largely backwards compatible, any CPU which has those instruction sets, and possibly other, later ones, is able to run programs compiled for pentium4. The program will not be "slow" on later CPUs, it just won't use the later instructions.
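One way to see which instruction sets a given -march value allows is to look at the macros gcc predefines for it. The sketch below relies on the standard gcc macros __SSE__, __SSE2__, __SSE3__ and __SSSE3__; the function names are made up for illustration.

```c
/* The full macro list for a target can be dumped without compiling anything:
 *
 *   gcc -march=pentium4 -dM -E -x c - < /dev/null     (use "< nul" in cmd.exe;
 *                                                      add -m32 on a 64-bit toolchain)
 *
 * With -march=pentium4, __SSE2__ is defined but __SSE3__ is not;
 * with -march=core2, __SSSE3__ is defined as well;
 * with -march=i686, none of the SSE macros are defined.
 */
#include <assert.h>

/* Highest SSE generation the compiler may use in this translation unit:
 * 0 = none (or not x86), 1 = SSE, 2 = SSE2, 3 = SSE3, 4 = SSSE3. */
int enabled_sse_level(void)
{
#if defined(__SSSE3__)
    return 4;
#elif defined(__SSE3__)
    return 3;
#elif defined(__SSE2__)
    return 2;
#elif defined(__SSE__)
    return 1;
#else
    return 0;
#endif
}

/* Printable name for a level returned by enabled_sse_level(). */
const char *sse_level_name(int level)
{
    static const char *const names[] = { "none", "sse", "sse2", "sse3", "ssse3" };
    assert(level >= 0 && level <= 4);
    return names[level];
}
```

The result is fixed when the file is compiled, which matches the point above: the binary does not adapt to the CPU it later runs on.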

 

So to answer your question, you cannot set "sse2 and above" (in the way you seem to mean) because if you build for a target instruction set greater than sse2 (for instance, sse3) then any processor which does not have sse3 support cannot run the program. The code inside the binary won't adapt itself to the CPU it's running on after being compiled - if you want this to happen, you have to code that mechanism yourself, and it can be quite a bit of work. Or you can distribute the program in a more malleable format, such as plain source code, or an intermediate language equipped with a jitter (such as Java bytecode or C# IL). Otherwise, "sse2 and above" really means "we use sse2 only, any later instruction sets don't matter as the program won't use them".

 

Another popular approach is to distribute a 32-bit and a 64-bit version of your program. The 32-bit version is for old computers and targets Pentium 4, and the 64-bit version is for recent computers and targets the Core 2 series of processors (64-bit extensions and up to ssse3). Recently you also start seeing avx builds pop up, which build for Core 2 along with sse4, sse4.1, sse4.2, sse4a, avx and, occasionally, avx2, for the very recent computers (last two or three years). The program won't be perfectly optimized for everyone, but the available selection of builds is sufficiently granular to ensure the program takes reasonable advantage of the user's hardware. That assumes users pick the right binary for their machine, which can potentially translate to increased support costs.

 

Note that gcc also has the "generic" architecture target, which changes over time and tries to target the mainstream, but you might not want to rely on what they consider mainstream if you have specific needs. I quote: "-mtune=generic - Produce code optimized for the most common IA32/AMD64/EM64T processors. If you know the CPU on which your code will run, then you should use the corresponding -mtune option instead of -mtune=generic. But, if you do not know exactly what CPU users of your application will have, then you should use this option. As new processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, the code generated option will change to reflect the processors that were most common when that version of GCC was released."


 


Well, thanks for the extensive info, though I must say I did not understand it clearly.

Still, I can say what I would like to generate. I would probably need 4 versions:

1) a generic version for all computers, including very old stuff, from the 386 and above

2) a generic version for medium-weak stuff like the Pentium 3 and above

3) a generic version for Core 2 and above

4) a generic version for today's CPUs (on the market for the last 1-2 years) [that would be 64-bit]

Those boundaries are maybe not very well chosen. Could someone help 1) with choosing them and 2) by writing down the compile-line switches for the four sets above?

Right now I am compiling pure Win32 WinAPI stuff (no OpenGL or any other dependency), so I think it should run even on Windows 95, or maybe even a step earlier (or not?). I am not sure: if I just compile this for 64 bits, will it work with no source change?

(Though I couldn't test it, because I have 32-bit XP installed right now.)
