a. That is a terrible way of measuring performance:
- clock() has bad granularity
- Even if it had good granularity, the overhead of calling clock() so often in such tiny code completely obliterates any difference between the inlined & non-inlined version (in other words, you're just reading noise).
- What happens inside std::cout can affect your code: things like a buffer flush or the buffer being full can skew the test results (edit: especially since cout's state is not the same between f1 & f2).
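To make the three points above concrete, here is a rough sketch of a less noisy measurement, assuming a C++11 compiler with std::chrono. The names work and time_loop are made up for illustration, they are not from the original code. The idea is to read the clock only twice around a whole batch of calls and to keep all printing out of the timed region.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the function under test (not the original f1/f2).
inline std::uint32_t work(std::uint32_t x) {
    return (x & 1u) ? x * 3u + 1u : x / 2u;
}

// Calls f `iterations` times between a single pair of clock reads and
// returns the elapsed time in nanoseconds. The accumulated result is
// written to `checksum` so the compiler cannot throw the loop away.
template <typename F>
std::int64_t time_loop(F f, std::uint32_t iterations, std::uint32_t& checksum) {
    using clock = std::chrono::steady_clock;
    std::uint32_t acc = 0;
    const auto start = clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) {
        acc += f(i);
    }
    const auto stop = clock::now();
    checksum = acc;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Print the returned time and the checksum only after every measurement is done, so std::cout never runs inside the timed region. Passing a lambda that calls your function directly, rather than a function pointer, keeps the call itself a direct call.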
b. Inlining isn't always faster:
- It produces bigger code, which can thrash the L1 cache (not your case though).
- There is an important branch in your code. The CPU's branch predictor was warmed up by f1's iterations, so it predicts better what is going to happen in f2. Also, branch predictors may or may not be better at predicting your result depending on the call stack (completely architecture dependent), so it may be predicting the non-inlined version better. My bet is that the warm-up is what's causing your strange results.
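One way to take the warm-up effect out of the comparison is sketched below, again assuming C++11. The names version_a, version_b, run_once and compare are hypothetical. Both versions get one untimed pass first, then the timed rounds alternate which version goes first, and only the best time of each is kept.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for the two versions being compared.
std::uint32_t version_a(std::uint32_t x) { return (x & 1u) ? x * 3u + 1u : x / 2u; }
std::uint32_t version_b(std::uint32_t x) { return (x & 1u) ? x * 3u + 1u : x >> 1; }

struct Timings {
    std::int64_t best_a_ns;
    std::int64_t best_b_ns;
    std::uint32_t checksum;  // keeps the loops observable to the compiler
};

// Times one batch of calls and adds the batch result to `checksum`.
template <typename F>
std::int64_t run_once(F f, std::uint32_t iterations, std::uint32_t& checksum) {
    using clock = std::chrono::steady_clock;
    std::uint32_t acc = 0;
    const auto start = clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) acc += f(i);
    const auto stop = clock::now();
    checksum += acc;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

Timings compare(std::uint32_t iterations, int rounds) {
    const std::int64_t never = std::numeric_limits<std::int64_t>::max();
    Timings t{never, never, 0};

    // Untimed warm-up: both versions see warm caches and a trained predictor.
    run_once(version_a, iterations, t.checksum);
    run_once(version_b, iterations, t.checksum);

    for (int r = 0; r < rounds; ++r) {
        if (r % 2 == 0) {  // even rounds: a first
            t.best_a_ns = std::min(t.best_a_ns, run_once(version_a, iterations, t.checksum));
            t.best_b_ns = std::min(t.best_b_ns, run_once(version_b, iterations, t.checksum));
        } else {           // odd rounds: b first
            t.best_b_ns = std::min(t.best_b_ns, run_once(version_b, iterations, t.checksum));
            t.best_a_ns = std::min(t.best_a_ns, run_once(version_a, iterations, t.checksum));
        }
    }
    return t;
}
```

If the gap between the two versions disappears or flips once the order alternates, the original difference was most likely an ordering artifact and not the effect of inlining.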
c. What SiCrane said, inline is a hint, not a guarantee. You have to look at the actual generated assembly to see what's inlined and what's not (it may be possible your non-inlined function got actually inlined).
Although, I disagree with SiCrane that MSVC 100% ignores the keyword. Coincidentally, a week ago while playing with VS 2008 I saw the compiler inlining a function only when I wrote the "inline" keyword. The operation going on was moderate (not too short, not too big), so I guess MSVC's decision about whether to inline it was on the edge. 99% of the time though, it just ignores me.
Read MSDN's documentation on inlining; there are some cases where even __forceinline can't be inlined. Just to quote:
Even with __forceinline, the compiler cannot inline code in all circumstances. The compiler cannot inline a function if:
- The function or its caller is compiled with /Ob0 (the default option for debug builds).
- The function and the caller use different types of exception handling (C++ exception handling in one, structured exception handling in the other).
- The function has a variable argument list.
- The function uses inline assembly, unless compiled with /Og, /Ox, /O1, or /O2.
- The function is recursive and not accompanied by #pragma inline_recursion(on). With the pragma, recursive functions are inlined to a default depth of 16 calls. To reduce the inlining depth, use inline_depth pragma.
- The function is virtual and is called virtually. Direct calls to virtual functions can be inlined.
- The program takes the address of the function and the call is made via the pointer to the function. Direct calls to functions that have had their address taken can be inlined.
- The function is also marked with the naked __declspec modifier
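If you want the benchmark to compare a version that is known to be inlined against one that is known not to be, you can ask for each explicitly. The sketch below is only illustrative: the macro names FORCE_INLINE and NO_INLINE and the clamp functions are made up, while __forceinline, __declspec(noinline) and the GCC/Clang attributes are the compilers' own extensions. As the list above shows, even __forceinline can be refused, so still check the assembly listing (for example /FA on MSVC or -S on GCC/Clang).

```cpp
// Hypothetical portability macros wrapping compiler-specific extensions.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#  define NO_INLINE    __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE    __attribute__((noinline))
#else
#  define FORCE_INLINE inline  // plain hint only on unknown compilers
#  define NO_INLINE
#endif

// Same body twice: one version requested inline, one requested as a real call.
FORCE_INLINE int clamp_inlined(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

NO_INLINE int clamp_called(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Two otherwise identical loops, one per version, ready to be timed.
int sum_inlined(const int* data, int count, int lo, int hi) {
    int total = 0;
    for (int i = 0; i < count; ++i) total += clamp_inlined(data[i], lo, hi);
    return total;
}

int sum_called(const int* data, int count, int lo, int hi) {
    int total = 0;
    for (int i = 0; i < count; ++i) total += clamp_called(data[i], lo, hi);
    return total;
}
```

Both loops must return the same result; only the generated code should differ. In a debug build with /Ob0 neither version gets inlined, so measure an optimized build.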
Edited by Matias Goldberg, 25 March 2013 - 08:01 PM.