# How to know an inline function is really inline



Is there a simple method? (without knowledge of disassembly)

##### Share on other sites
You can't 'truly' know unless you look at the generated assembly and see how the function is called and where it appears in the target. If you need help finding it in the assembly, though, you could use something like:

    __asm { __emit 0x90 __emit 0x90 __emit 0x90 __emit 0x90 __emit 0x90 }
    // function body goes here
    __asm { __emit 0x90 __emit 0x90 __emit 0x90 __emit 0x90 __emit 0x90 }

That pads the body of the function with five NOPs on each side (they still come after the prolog code and end before the epilog code). You can then open a free disassembler, OllyDbg for example, search for five NOPs in a row, and see where your code ended up. **

The only caveat is that the compiler might optimize out things like that, but I am not sure. The generated code also differs between debug and release configurations anyway, so there is no telling whether this will actually work.

** The above is not truly tested against identifying inline code and is just an idea.

##### Share on other sites
Just run your code with a profiler. Inlined functions just don't show up there anymore.

##### Share on other sites
You could try putting a breakpoint on the line calling your function. When you run the app and arrive at the breakpoint, if tracing into the function does not actually get you into the code of that function, it is most probably inlined.

But the best way is to look at the generated code. It really isn't hard, you know. Just put a breakpoint, run, right-click, choose "show disassembly", and see whether there is a "call ...." to your function. If there is one, the function isn't inlined.

(all of the above was written with Visual C++ in mind as IDE ;) )
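The disassembly check described above is easy to try on a tiny test file. The following is only a sketch: the file name and the functions `square` and `sum_of_squares` are made up for illustration, and the command lines in the comments assume a stock Visual C++ or GCC/Clang install.

```cpp
// inline_check.cpp -- hypothetical test file; all names are illustrative.
//
// Visual C++:  cl /O2 /c /FA inline_check.cpp   (writes inline_check.asm)
// GCC/Clang:   g++ -O2 -S inline_check.cpp      (writes inline_check.s)
//
// Open the listing and read the body of sum_of_squares. If it contains a
// "call" instruction targeting square, that call site was not inlined.

inline int square(int x)
{
    return x * x;
}

int sum_of_squares(const int* values, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += square(values[i]);   // the call site to inspect
    return total;
}
```

Comparing the listing from an optimized build with one from a debug build shows the difference quickly, since debug builds normally keep the call.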

##### Share on other sites
Quote:
 Original post by Keisni
 Is there a simple method? (without knowledge of disassembly)

Without looking at the assembly, I don't think you can. C++ doesn't have any way to guarantee that a function is inlined. The compiler can ignore the inline keyword if it wants to.

##### Share on other sites
Some compilers also provide the "__forceinline" keyword (MSVC does) but, once again, I'm not sure if it's 100% guaranteed to come out inlined.

##### Share on other sites
Quote:
 Original post by Trillian
 Some compilers also provide the "__forceinline" keyword (MSVC does) but, once again, I'm not sure if it's 100% guaranteed to come out inlined.

It isn't, in MSVC's case. However, the compiler should emit warning C4714 at compile time when it fails to inline a function marked __forceinline.
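One way to make use of that warning is to promote it to an error, so a failed `__forceinline` cannot go unnoticed. This is a minimal sketch under some assumptions: `FORCE_INLINE`, `clamp01` and `brighten` are hypothetical names, and whether C4714 actually fires may depend on your warning level and optimization settings.

```cpp
// Hypothetical portability macro; FORCE_INLINE is not a standard name.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
   // Treat "function marked as __forceinline not inlined" as an error.
#  pragma warning(error : 4714)
#else
#  define FORCE_INLINE inline   // assumption: plain inline on other compilers
#endif

// Small, simple function: a typical candidate the compiler can inline.
FORCE_INLINE float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

float brighten(float pixel, float amount)
{
    return clamp01(pixel + amount);   // call site expected to be inlined
}
```

The same effect should be available from the command line with `/we4714`, which avoids touching the source.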

##### Share on other sites
TBH it's barely worth inlining anything at all. Any language directive is only a hint and without it the compiler will still do what it thinks is best.

##### Share on other sites
Quote:
 Original post by Dave
 TBH it's barely worth inlining anything at all. Any language directive is only a hint and without it the compiler will still do what it thinks is best.

Some compilers, like Visual C++, can sometimes inline functions that are not declared as inline, through whole program optimization and link time code generation. In general, though, if you have a function that is a good candidate for inlining (generally a short and simple function that you know is called with high frequency in performance sensitive code), then it is worth declaring it as inline and defining it in a header where it is visible to all callers, as the optimizer is then more likely to be able to inline it where appropriate.
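As a rough illustration of that advice, here is what a header-defined inline function might look like. The header name, the `Vec2` type and both functions are invented for the example, and the two "files" are shown in one block only so that it compiles on its own.

```cpp
// vec2.h -- hypothetical header; the type and function names are made up.
#ifndef VEC2_H
#define VEC2_H

struct Vec2
{
    float x;
    float y;
};

// Defined in the header and marked inline: every .cpp file that includes
// this header sees the body, so the optimizer can inline it at each call
// site without relying on whole program optimization.
inline float dot(const Vec2& a, const Vec2& b)
{
    return a.x * b.x + a.y * b.y;
}

#endif // VEC2_H

// physics.cpp -- hypothetical caller that would normally #include "vec2.h".
float length_squared(const Vec2& v)
{
    return dot(v, v);   // the body of dot() is visible here
}
```

The `inline` keyword also matters for correctness here: it is what allows the same definition to appear in several translation units without a multiple-definition error at link time.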

##### Share on other sites

This is true. Typically, the inliner in a compiler is implemented as a set of heuristics. The compiler weighs the characteristics of a given inline candidate. Typically it might look at the size of the callee, whether the user said to inline it, whether it has EH, etc. If you specify __forceinline, you tell the compiler to ignore the heuristics and always inline. However, there are still cases when this is NOT going to inline the function. This can happen if the compiler does not know how to inline the function. Sometimes it's just too difficult. For instance, inlining SEH into C++ EH (which introduces a nasty thing called 2-pass flow), inlining vararg functions, or inlining functions with inline asm.
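To make one of those cases concrete, here is a sketch of a vararg function marked for forced inlining. The names `FORCE_INLINE`, `sum_ints` and `sum_three` are hypothetical, and the macro deliberately falls back to plain `inline` outside MSVC, since other compilers may reject a forced-inline attribute on a variadic function outright.

```cpp
#include <cstdarg>

// Hypothetical portability macro; FORCE_INLINE is not a standard name.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline   // assumption: no forced inlining elsewhere
#endif

// A variable argument list is one of the cases listed above where the
// request cannot be honored, so calls to this function stay real calls.
FORCE_INLINE int sum_ints(int count, ...)
{
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}

int sum_three(int a, int b, int c)
{
    return sum_ints(3, a, b, c);   // expect a "call" here in the listing
}
```

The code still compiles and runs correctly; the only visible effect of the failed request is the call instruction in the generated assembly and, in an optimized MSVC build with the warning enabled, the diagnostic.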

However, because __forceinline is often used for correctness (ex: wrapping an intrinsic call that must be inlined for it to work correctly), the compiler never ignores a __forceinline merely because it judges the inlining to be a net loss.

But, in the end, the moral of the story is probably to let the compiler do what it wants. It's very good at determining what the best inlining pattern is.

(BTW, I work on the VC++ backend)

##### Share on other sites
Quote:
 Original post by Drew_Benton
 You can't 'truly' know unless you look at the generated assembly and see how it is called and appears in the target. If you need help figuring it out though in ASM, you could use something like:
 __asm { __emit 0x90 __emit 0x90 __emit 0x90 __emit 0x90 __emit 0x90 }
 // function body goes here
 __asm { __emit 0x90 __emit 0x90 __emit 0x90 __emit 0x90 __emit 0x90 }
 What that does is pad 5 NOPs around the body of the code (but will still come after prolog and end before epilog code) but you can open up a free disassembler, OllyDbg for example, and use the search command for 5 NOPs in a row and see where your code is. **
 The only caveat to that is the compiler might optimize out certain things like that, but I am not sure. When going between debug and release configurations the code is different anyways, so no telling if this will actually work.
 ** The above is not truly tested against identifying inline code and is just an idea.

If you do this (btw, the inline asm instructions must be on separate lines), you will see them in the assembly listing that the compiler outputs (/FA).

##### Share on other sites
Quote:
 There are a fair number of good reasons not to care.

Quote:
 But, in the end, the moral of the story is probably to let the compiler do what it wants. It's very good at determining what the best inlining pattern is.

Argh! Such one-sided advice and "black-box" thinking really is dangerous.
I was recently asked to help optimize an implementation of the SIFT operator; the SECOND function on the profile was a trivial getter function that wasn't inlined (despite global optimization). If one does not know to look for such things and how to fix them, there are serious prices to be paid in terms of performance.

It is much more valuable that you listed the exact cases where the compiler currently has trouble inlining :D

##### Share on other sites
When I was using Doxygen to generate documentation for one of my programs, I saw it listed what was inlined and what wasn't (but I don't know how it came up with that info).

Would this be helpful to the OP, or is Doxygen's determination of whether something is inlined or not not reliable?

##### Share on other sites
Quote:
Original post by Jan Wassenberg
Quote:
 There are a fair number of good reasons not to care.

Quote:
 But, in the end, the moral of the story is probably to let the compiler do what it wants. It's very good at determining what the best inlining pattern is.

Argh! Such one-sided advice and "black-box" thinking really is dangerous.
I was recently asked to help optimize an implementation of the SIFT operator; the SECOND function on the profile was a trivial getter function that wasn't inlined (despite global optimization). If one does not know to look for such things and how to fix them, there are serious prices to be paid in terms of performance.

It is much more valuable that you listed the exact cases where the compiler currently has trouble inlining :D

By "global optimization" do you mean link time code gen (/LTCG)? C++ compilation happens in modules, typically a single CPP file. Each file is compiled separately into an obj, then fed to the linker to be mashed together. Under this model, you can't inline a function defined outside the module (say, one defined in another class's CPP file), because the compiler doesn't know how to get to it. Under LTCG, however, the linker creates a "meta-module" and then calls the backend for each function in this meta-module, allowing for inlining and general optimizations across compilation units.

Obviously, there are instances where specifying __forceinline can be beneficial. Saying something like "this isn't getting inlined, and I KNOW it's very hot" is OK. But the general usage pattern tends to be this:

1. Developer discovers __forceinline, and it helps perf for some special cases where perf was bad before.
2. Developer gets it into his head that, well... "if it helped here, then it'll help elsewhere, right?"
3. Developer starts specifying it before all his functions, or even before all of his "small" functions.

This is NOT good thinking. The end outcome may be much worse. The compiler can know a lot of things the developer has a hard time seeing: the size of the inline tree, how the inline will affect register pressure, how it will affect the ability to run other optimizations down the line, etc. Which is why in MOST cases, it's best to just let the compiler do what it wishes.

##### Share on other sites
Quote:
 When I was using Doxygen to generate documentation for one of my programs, I saw it listed what was inlined

Useful though that may be, Doxygen will only be looking at the C++ code (for the inline keyword, or whether a function is defined in the class declaration). No information can be gained about the code that's actually generated.
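A small sketch of the distinction, assuming a made-up `Counter` class: a source-level tool can only see which of these functions the language treats as inline, not what the optimizer ends up doing with them.

```cpp
// Hypothetical class used only to illustrate source-level "inline" status.
class Counter
{
public:
    Counter() : value_(0) {}

    // Defined inside the class: implicitly inline as far as the language
    // (and therefore a tool that reads only the source) is concerned.
    int value() const { return value_; }

    // Only declared here; the definition follows outside the class.
    void increment();

private:
    int value_;
};

// Out-of-class definition without the inline keyword. A source-level tool
// would not label it inline, yet the optimizer is still free to inline
// calls to it wherever the body is visible.
void Counter::increment()
{
    ++value_;
}
```

Either function may or may not end up inlined in the generated code, which is why the generated assembly remains the only reliable source.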

Quote:
 By "global optimization" do you mean link time code gen (/LTCG)?

Exactly. BTW, ICC's WPO did not succeed in inlining the function, either.
In this case, it was sufficient to move the function to the header; __forceinline was not necessary.

Quote:
 Which is why in MOST cases, it's best to just let the compiler do what it wishes.

Agreed :) However, we should not forget that other cases exist and ought to be prepared for them when they arise.