C++ multiple files and low-level effects

4 comments, last by Zoner 11 years, 8 months ago
*Note: the following is asked purely for the sake of accurate knowledge, so no "why do you care" or "it doesn't really matter" responses are necessary.


I was thinking about the compilation process and ASM the other day and came up with the following question: are there any "adverse" low-level effects of multi-source-file compilation in C++? My reasoning is this: object code (before linking) is already machine language, right? So the addressing mode of every branch and jump has already been 'baked in'. If every source file is an independent translation unit, don't the branches or jumps between translation units have to use a full pointer instead of smaller (PC-relative) immediate data? If so, does/can the compiler's optimization mode make a difference? It doesn't seem possible, but I thought I'd ask.

In summary: I've seen source where every class has its own .cpp and header file, and I wondered whether executables built to this extreme are more 'bloated' for it.


Thank you in advance for any insight into the subject.

-potential energy is easily made kinetic-

Branches can be rewritten/translated/moved/patched, as long as a 'baked' section of code carries some kind of manifest/reflection data (relocation records) describing the locations of the pointer/offset values.
A bigger deal is that any optimisations done by the compiler are restricted to a single translation unit (e.g. the compiler can't inline a function that's implemented in a different translation unit). However, modern compilers support Link Time Code Generation and Whole Program Optimization, which allow the linking step to act like an optimising compiler and rewrite code during/after linking.


Assuming you don't have the latter, what source-code solutions are available to allow inlining? On a related note, I once read somewhere that if you inline a function (put its implementation in the class definition), the compiler will inline the object code. Is this true? (I didn't believe it.)



Branches can be re-written/translated/moved/patched, as long as a 'baked' section of code has some kind of manifest/reflection data with it, describing the locations of pointer/offset values.

Just to be sure: you are saying that a set of instructions (2 or 3) that executes a branch can be rewritten into a single branch instruction (or, more generally, a branch instruction of size X can be changed into one of size Y), after which all the branch targets following that branch are updated, and then all branches in other translation units are updated with the new targets? And this goes round and round until everything 'settles into place'? Wow, if so I'm impressed; I never knew linkers were that complicated.



On a related note, I once read somewhere that if you inline a function (put its implementation in the class definition), the compiler will inline the object code. Is this true? (I didn't believe it.)


Maybe.

You should treat 'inline' more as a hint, which the compiler is free to ignore; during compilation it runs various stages of analysis to decide whether a function should be inlined or called directly, depending on what is going on at each call site. This can mean that calling function 'foo' from two locations results in it being inlined at one but not the other. However, this only applies to functions the compiler can see in full, which is at the 'translation unit' level (pretty much a single .cpp plus its headers), as that is all the code it has access to; so if a function is fully defined in a header, it can potentially be inlined.

Note the 'only code it can see' above; this is where link time code generation and whole program optimisation enter the picture.

The linker has a much more complete picture of the code and can run its own analysis to decide what to inline and where. So a function that is defined in one .obj file and called from two others could end up inlined in one case but not the other, depending on the call site.

The code can also be physically moved around in the executable to try to improve locality (if a function is only called in a few places and those places are close together, putting the function close to them is good), and this requires adjusting the code itself (the compiler might have used a 'long' jump, but with the code moved closer the linker can emit the 'short' version instead).

Simply put, compilers do a lot of clever stuff these days, some better than others, and applying optimisations is still a very active area of research.

Multiple OBJ files foil inlining unless you use link time code generation (/LTCG).

If you turn off function-level linking, the whole OBJ gets linked in if ANY function or data in it is referenced by the rest of the program. This can be an issue if you care about space (as we definitely do on consoles).

Multiple OBJs can be a problem if a header file defines static variables. This causes the variable to exist 'without a name' (with internal linkage) in each of the OBJ files and get linked into the final binary multiple times (another space issue, and the cause can be hard to track down).
http://www.gearboxsoftware.com/

