How to know an inline function is really inline

14 comments, last by Jan Wassenberg 16 years, 5 months ago
Quote:Original post by mattnewport
Quote:Original post by Dave
TBH it's barely worth inlining anything at all. Any language directive is only a hint and without it the compiler will still do what it thinks is best.


Some compilers, like Visual C++, can sometimes inline functions that are not declared inline, through whole program optimization and link time code generation. In general, though, if you have a function that is a good candidate for inlining (generally a short, simple function that you know is called frequently in performance-sensitive code), it is worth declaring it inline and defining it in a header where it is visible to all callers, as the optimizer is then more likely to be able to inline it where appropriate.


This is true. Typically, the inliner in a compiler is implemented as a set of heuristics. The compiler weighs the characteristics of a given inline candidate: the size of the callee, whether the user asked for it to be inlined, whether it has EH, and so on. If you specify __forceinline, you tell the compiler to ignore the heuristics and always inline. However, there are still cases where the function will NOT be inlined. This happens when the compiler does not know how to inline the function; sometimes it's just too difficult. Examples include inlining SEH into C++ EH (which introduces a nasty thing called 2-pass flow), inlining vararg functions, and inlining functions with inline asm.

However, because __forceinline is often used for correctness (e.g. wrapping an intrinsic call that must be inlined for it to work correctly), the compiler never ignores a __forceinline merely because it judges the inline to be unprofitable.

But, in the end, the moral of the story is probably to let the compiler do what it wants. It's very good at determining what the best inlining pattern is.

(BTW, I work on the VC++ backend)
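
To make the __forceinline discussion above concrete, here is a minimal sketch of a portable wrapper macro. The names FORCE_INLINE, Pixel and luminance are hypothetical and not from this thread; the GCC/Clang branch uses the always_inline attribute as a rough counterpart of __forceinline, which is an assumption about what "equivalent" means here.

```cpp
// Hypothetical portability macro; not something posted in this thread.
#if defined(_MSC_VER)
  #define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
  #define FORCE_INLINE inline __attribute__((always_inline))
#else
  #define FORCE_INLINE inline   // fall back to a plain hint
#endif

// Illustrative type and function: a small, hot helper of the kind
// that is a plausible candidate for forced inlining.
struct Pixel { unsigned char r, g, b; };

FORCE_INLINE int luminance(const Pixel& p) {
    // Integer approximation of a weighted sum; the weights add up to 256.
    return (p.r * 77 + p.g * 150 + p.b * 29) >> 8;
}
```

On MSVC, warning C4714 reports a function marked __forceinline that was not inlined. It is a level 4 warning, so it likely needs /W4 or to be enabled explicitly before it shows up.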
Quote:Original post by Drew_Benton
You can't 'truly' know unless you look at the generated assembly and see how the function is called and where it appears in the target. If you need help finding it in the assembly, though, you could use something like:

__asm {
    __emit 0x90
    __emit 0x90
    __emit 0x90
    __emit 0x90
    __emit 0x90
}
// function body goes here
__asm {
    __emit 0x90
    __emit 0x90
    __emit 0x90
    __emit 0x90
    __emit 0x90
}


That pads the body of the function with 5 NOPs on each side (they still come after the prolog and before the epilog code). You can then open a free disassembler, OllyDbg for example, search for 5 NOPs in a row, and see where your code is. **

The only caveat is that the compiler might optimize such things out, but I am not sure. The code differs between debug and release configurations anyway, so there is no telling whether this will actually work.

** The above is not truly tested against identifying inline code and is just an idea.


If you do this (btw, the inline asm instructions must be on separate lines), you will see them in the assembly listing (/FA).
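
A hedged variant of the NOP marker idea, for compilers where the __asm block is not available (MSVC does not support inline asm on x64). This is only a sketch: NOP_MARKER and scaled are hypothetical names, the MSVC branch assumes the __nop intrinsic from <intrin.h>, and the other branch assumes GCC/Clang-style inline asm on a target that has a "nop" mnemonic.

```cpp
// Hypothetical marker macro; emits 5 NOPs that are easy to search for.
#if defined(_MSC_VER)
  #include <intrin.h>
  #define NOP_MARKER() do { __nop(); __nop(); __nop(); __nop(); __nop(); } while (0)
#else
  // volatile keeps the optimizer from deleting the asm statement
  #define NOP_MARKER() __asm__ __volatile__("nop\n\tnop\n\tnop\n\tnop\n\tnop")
#endif

// Illustrative inline candidate with its body bracketed by markers.
static inline int scaled(int x) {
    NOP_MARKER();
    int r = x * 3 + 1;   // function body goes here
    NOP_MARKER();
    return r;
}
```

Generate a listing with /FA (MSVC) or -S (GCC/Clang) and search for the run of NOPs: if it shows up inside a caller rather than only in a separate scaled function, the call was inlined there. Note that the markers themselves may influence the inliner's decision, so treat the result as a hint about this build only.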
Quote:There are a fair number of good reasons not to care.

Quote:But, in the end, the moral of the story is probably to let the compiler do what it wants. It's very good at determining what the best inlining pattern is.

Argh! Such one-sided advice and "black-box" thinking really is dangerous.
I was recently asked to help optimize an implementation of the SIFT operator; the SECOND function on the profile was a trivial getter function that wasn't inlined (despite global optimization). If one does not know to look for such things and how to fix them, there are serious prices to be paid in terms of performance.

It is much more valuable that you listed the exact cases where the compiler currently has trouble inlining :D
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
When I was using Doxygen to generate documentation for one of my programs, I saw it listed what was inlined and what wasn't (but I don't know how it came up with that info).

Would this be helpful to the OP, or is Doxygen's determination of whether something is inlined or not not reliable?
Quote:Original post by Jan Wassenberg
Quote:There are a fair number of good reasons not to care.

Quote:But, in the end, the moral of the story is probably to let the compiler do what it wants. It's very good at determining what the best inlining pattern is.

Argh! Such one-sided advice and "black-box" thinking really is dangerous.
I was recently asked to help optimize an implementation of the SIFT operator; the SECOND function on the profile was a trivial getter function that wasn't inlined (despite global optimization). If one does not know to look for such things and how to fix them, there are serious prices to be paid in terms of performance.

It is much more valuable that you listed the exact cases where the compiler currently has trouble inlining :D


By "global optimization" do you mean link time code gen (/LTCG)? C++ compilation happens in modules, typically a single CPP file. Each file is compiled separately into an obj, then fed to the linker to be mashed together. Under this model, you can't inline a function from outside the module (say, defined in another class), because the compiler doesn't know how to get to it. Under LTCG, however, the linker creates a "meta-module" and then calls the backend for each function in this meta-module, allowing for inlining and general optimizations across compilation units.

Obviously, there are instances where specifying __forceinline can be beneficial. Saying something like "this isn't getting inlined, and I KNOW it's very hot" is OK. But the general usage pattern tends to be this:

1. Developer discovers __forceinline, and it helps perf for some special cases where perf was bad before.
2. Developer gets it into his head that, well... "if it helped here, then it'll help elsewhere, right?"
3. Developer starts specifying it before all his functions, or even before all of his "small" functions.

This is NOT good thinking. The end result may be much worse. The compiler knows a lot of things the developer has a hard time seeing: the size of the inline tree, how the inline will affect register pressure, how it will affect the ability to run other optimizations down the line, etc. That is why, in MOST cases, it's best to just let the compiler do what it wishes.
Quote:When I was using Doxygen to generate documentation for one of my programs, I saw it listed what was inlined

Useful though that may be, Doxygen will only be looking at the C++ code (for the inline keyword, or whether a function is defined in the class declaration). No information can be gained about the code that's actually generated.

Quote:By "global optimization" do you mean link time code gen (/LTCG)?

Exactly. BTW, ICC's WPO did not succeed in inlining the function, either.
In this case, it was sufficient to move the function to the header; __forceinline was not necessary.
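
As a rough illustration of "moving the function to the header" (the class and member names are hypothetical, not the actual SIFT code), the two usual ways to make a getter's body visible to every caller look like this:

```cpp
// image.h (hypothetical header, included by every caller)
class Image {
public:
    Image(int width, int height) : width_(width), height_(height) {}

    // Defined inside the class body: implicitly inline, and the
    // definition is visible in every translation unit that includes this.
    int width() const { return width_; }

    // Declared here, defined below in the same header.
    int height() const;

private:
    int width_;
    int height_;
};

// Defined outside the class but still in the header, so it must be
// marked inline to avoid multiple-definition errors at link time.
// Had this lived in image.cpp, callers in other files would see only
// the declaration unless link time code generation stepped in.
inline int Image::height() const { return height_; }
```

Either form only makes inlining possible in each caller's translation unit; whether it actually happens is still up to the compiler's heuristics.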

Quote:Which is why in MOST cases, it's best to just let the compiler do what it wishes.

Agreed :) However, we should not forget that other cases exist and ought to be prepared for them when they arise.

This topic is closed to new replies.
