Compiling + Linking too long...

Hi GDNet,

We're losing a lot of time in our project due to very long compile times. On a single-core/dual-core machine it takes about 20 minutes to compile and 2 minutes to link. On a quad core (actually a hyper-threaded dual-core CPU), it takes about 30 minutes.

We suspect that disk access might be what's slowing us down, even though our machines are equipped with 10k RPM drives in RAID.

My question is: does anyone know a tool that could help us monitor system activity during our build process (which, by the way, uses makefiles)? I'd like a report on CPU activity, disk activity, time spent seeking, reading, writing, and so on.

Any ideas would be appreciated. Thank you.
What does it matter? You can't reduce the disk usage of your compiler.

Better to focus on reducing the interdependencies between your modules, as that reduces average compile times (although not full-rebuild times). Precompiled headers can occasionally give you a benefit if used properly. You can also look into distributed compilation tools like Incredibuild.
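For what it's worth, one cheap way to cut those interdependencies in C++ is to forward-declare types in headers and only include the full definition in the .cpp. A minimal sketch (the file and class names are just made-up examples):

    // renderer.h -- only needs to know that Texture exists, so forward-declare it
    // instead of #include "texture.h"; editing texture.h then no longer forces
    // every file that includes renderer.h to recompile.
    class Texture;            // forward declaration

    class Renderer {
    public:
        void draw(const Texture& tex);   // references/pointers only need the declaration
    };

    // renderer.cpp -- the full definition is only needed here
    #include "renderer.h"
    #include "texture.h"

    void Renderer::draw(const Texture& tex) {
        /* ... uses Texture's members ... */
    }

The same idea applies to any header that only needs references or pointers to a type.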

But, to be honest, thirty minutes is nothing.
I understand that disk usage can't be reduced, but the point is that the fastest machine we have (an overclocked, hyper-threaded dual core) takes about 50% more time to compile/link than the other machine, which leads me to think that the CPU is not the limiting factor. And I need to find out what it is.

I agree that 30 minutes is not that much in itself, but the nature of our work (porting, not original development) requires our developers to make tiny modifications/hacks and test them very frequently. Because of some genuinely nasty dependencies between files, testing even a slight modification sometimes requires a full rebuild of the project, which wastes precious time. In that context, 30 minutes is a lot. Unfortunately, we have no control over the original project architecture and cannot reduce those dependencies. So our only remaining option is to speed things up as much as we can, and to do that we need to "profile" where our bottleneck is.

Also, we've tried Incredibuild, but since that tool cannot determine file dependencies from a makefile, no parallel compilation actually happens (all machines but one are ignored).
Quote: Original post by janta
We're losing a lot of time in our project due to very long compile times. [...] We suspect that disk access might be what's slowing us down. [...] Does anyone know a tool that could help us monitor system activity during our build process?

To really help you, you'll have to provide some more background about your project's structure, the build system you are using right now, and the build parameters/settings you use.

You said, for example, that you are using makefiles. If you mean GNU make, you can use parallel builds ("make -j x", where x is the number of parallel build jobs you want make to spawn), provided the makefiles (and the project itself) are structured accordingly.

This can significantly reduce build times, since several compile processes run at once, each building individual targets, rather than one single process building all targets sequentially. Likewise, you should be able to verify whether I/O is really the limiting factor by running the whole build from a memory disk (RAM disk) and comparing the resulting times.

Apart from that, you didn't say much about the platform the build system is running on. Under Linux/Unix, for example, you could easily get all of that data (and much more) by simply running "top" in a separate console/window while the build runs.

In general, however, it is important to realize that compiling a single translation unit is an inherently sequential process; compilers themselves are usually single-threaded and only rarely able to use more than one core. Any parallelism you want to exploit on an SMP machine therefore has to come from the structure of the project's source code and the configuration of the build system, i.e. from building many independent targets at the same time.

Thus, some refactoring (source code restructuring) may be needed to keep build targets independent of each other and reduce inter-dependencies, so that the build system can build as many individual targets simultaneously as possible. Even if you are already using Linux/(g)make, you may find that you are not yet using these tools optimally.

You may want to do a Google search for "parallel builds" with make for further pointers on how to achieve this.

Quote: Original post by janta
I understand that disk usage can't be reduced, but the point is that the fastest machine we have (an overclocked, hyper-threaded dual core) takes about 50% more time to compile/link than the other machine, which leads me to think that the CPU is not the limiting factor.

...or that the multi-core platform simply isn't being used optimally?

Quote: And I need to find out what it is.

How large is the source tree being built?
What build/compiler settings are you using?
Are you using precompiled headers where appropriate?

Quote: Unfortunately, we have no control over the original project architecture and cannot reduce those dependencies. So our only remaining option is to speed things up as much as we can, and to do that we need to "profile" where our bottleneck is.


If you can't do that, then running the whole build in memory might really be a viable way to improve build times significantly, provided, of course, that your build machine has enough RAM.
Using a cron job, you could still sync everything back to disk regularly.

Quote: Also, we've tried Incredibuild, but since that tool cannot determine file dependencies from a makefile, no parallel compilation actually happens (all machines but one are ignored).


Well, it sounds as though you are already using parallel compilation? If so, then depending on the parameters you use, THIS could actually be your limiting factor: if you allow make to start too many jobs for your particular machine, that can also significantly increase overall build times.

Depending on your platform (RAM/CPU), you may want to use a more appropriate number of jobs; the GNU make documentation has more information on this, too.
Build-process bottleneck monitoring is probably going to be platform dependent, so it would help if you named yours.

My advice on optimizing compile times:

1) You need a system that will only recompile affected files. I didn't like any of the ones out there, but I considered it important enough that I built my own build system by hand (using Ruby scripts). Automake may have something like this (I forget); SCons also comes to mind. Assuming you don't want to hand-manage your Makefile for every dependency update, and don't want to do a full rebuild every time (you don't), this is a must. IDEs like VS2005 and Eclipse's CDT plugin have options for this in their automatic project management. Far fewer forced full recompiles as a result. Like, basically none.

WRT building your own: I use hidden files associated with every source file (including headers) to keep track of what they #include. A given source file is only rebuilt if it, or any of the files listed, has a more recent modification timestamp than this hidden file (which is then touched at the end of a successful build). There's a sketch of that check after this list.

2) If the above isn't enough, using Pimpl may help (or other methods of decoupling implementation changes from header file changes, minimizing rebuilds as per the above); see the sketch after this list.

3) If your compiler has an option for incremental linking, turn it on if it isn't already!

4) If you're using boost::spirit grammars in many places in your code (or other insanely template-heavy code), use explicit template instantiation instead of leaving everything to implicit instantiation (rough example after this list).

5) ...

6) $ $ $ Profit $ $ $
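
Regarding point 1: a rough sketch of the timestamp check I described, written here in C++ for illustration (my real version is a Ruby script, and the file layout and names below are invented):

    #include <filesystem>
    #include <fstream>
    #include <string>

    namespace fs = std::filesystem;

    // Returns true if 'source' must be rebuilt, given a hidden file (e.g. ".foo.cpp.deps")
    // that lists the headers 'source' #includes, one path per line, and whose own
    // timestamp records the last successful build.
    bool needs_rebuild(const fs::path& source, const fs::path& deps_file) {
        if (!fs::exists(deps_file))
            return true;                                  // never built before
        const auto last_build = fs::last_write_time(deps_file);
        if (fs::last_write_time(source) > last_build)
            return true;                                  // the source file itself changed
        std::ifstream in(deps_file);
        for (std::string header; std::getline(in, header); ) {
            if (header.empty())
                continue;
            if (!fs::exists(header) || fs::last_write_time(header) > last_build)
                return true;                              // an included header changed
        }
        return false;                                     // everything is older than the last build
    }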
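
For point 2, a minimal Pimpl sketch (names are made up): the header only exposes an opaque pointer, so implementation changes stay out of the header and don't ripple into every file that includes it.

    // widget.h -- stays stable even when implementation details change
    #include <memory>

    class Widget {
    public:
        Widget();
        ~Widget();                  // must be defined in the .cpp, where Impl is complete
        void update();
    private:
        struct Impl;                // defined only in widget.cpp
        std::unique_ptr<Impl> impl;
    };

    // widget.cpp -- all the heavy includes and data members live here
    #include "widget.h"

    struct Widget::Impl {
        int frame_count = 0;        // implementation detail, invisible to clients
    };

    Widget::Widget() : impl(std::make_unique<Impl>()) {}
    Widget::~Widget() = default;
    void Widget::update() { ++impl->frame_count; }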
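
And for point 4, explicit instantiation looks roughly like this (a generic example, not boost::spirit specifically): the template is compiled once in a single .cpp instead of in every translation unit that uses it.

    // parser.h -- declaration plus an 'extern template' so includers don't instantiate it
    #include <string>

    template <typename T>
    T parse_value(const std::string& text);

    extern template int parse_value<int>(const std::string&);   // compiled elsewhere (C++11)

    // parser.cpp -- the definition and the single explicit instantiation
    #include "parser.h"
    #include <sstream>

    template <typename T>
    T parse_value(const std::string& text) {
        std::istringstream in(text);
        T value{};
        in >> value;
        return value;
    }

    template int parse_value<int>(const std::string&);   // instantiated once, here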
Quote: Original post by janta
We suspect that disk access might be what's slowing us down, even though our machines are equipped with 10k RPM drives in RAID.


Just to be sure: hardware or software RAID?

You'll usually want hardware RAID, with a reasonable controller that has plenty of on-board cache RAM.

Your RAID level (0, 1, 5, etc.) could in theory be a limiting factor as well. Likewise, if you are building remotely over the network, or on network storage (NFS), that may be a factor too.

However, we'll have to keep guessing if we don't get more info about your build platform and setup.

In general, if I were in a situation where I had to regularly modify and rebuild a complex source tree with lots of interdependencies that cannot be resolved, I'd probably at least take the RAM-disk approach for starters and make sure I'm using the build system optimally.
You don't mention how much memory you have. You could have 128 processors running at 100 petahertz, but performance will still be poor if all they do is wait for the swapper to thrash.

Many compilers, including and especially certain versions of GCC, are memory pigs. More memory means faster compiles.

Also, if you're using GCC, try the '-pipe' switch to avoid writing temporary files between compilation stages (it uses pipes instead). This won't help if you're memory bound (page thrashing), but it will help if you're disk bound.

Honestly, though, properly factored source files, good dependency checking (automake works wonders), and precompiled headers will make a palpable difference.
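
As a rough illustration of the precompiled-header point (file names are just an example): gather the big, rarely-changing includes into one header and let the compiler precompile it. The exact mechanics are compiler-specific; with GCC, compiling the header itself produces a .gch file that is picked up automatically afterwards.

    // pch.h -- big, stable headers that almost every translation unit pulls in
    #include <vector>
    #include <string>
    #include <map>
    #include <algorithm>
    // #include <boost/spirit/...>   // expensive third-party headers go here too

    // Each .cpp then includes pch.h first:
    //   #include "pch.h"
    //
    // With GCC, something like "g++ -x c++-header pch.h" produces pch.h.gch,
    // which is used automatically on later compiles of files that include pch.h.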

Stephen M. Webb
Professional Free Software Developer

Just FYI: some of the people who contribute to forums like this one actually find it discouraging when the OP they tried to help doesn't come back to provide the requested feedback.
