# Shouldn't inline functions be faster?

Old topic!

Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

16 replies to this topic

### #1 noatom - Members - Reputation: 917

Posted 25 March 2013 - 01:39 PM

So there is an exercise that says I should create a program to measure which function is faster: a normal one or an inline one.

Here's the code:


    #include <iostream>
    #include <string>
    #include <assert.h>
    #include <ctime>
    using namespace std;

    clock_t t;

    inline void f1(){
        t=clock()-t;
        if(t!=0)
            cout << "f1 " << t << endl;
    }

    void f2(){
        t=clock()-t;
        if(t!=0)
            cout << "f2 " << t << endl;
    }

    int main() {
        for(int i = 0;i<10000;i++){
            t=clock();
            f1();
        }
        for(int i = 0;i<10000;i++){
            t=clock();
            f2();
        }
    }


The problem is f1 appears 4 times and f2 appears 2 times. As you can see, I set the functions so they'll show text only if the time difference is bigger than 0.

Why does the inline function get executed slower?! Shouldn't it be faster?

### #2 SiCrane - Moderators - Reputation: 11376

Posted 25 March 2013 - 01:50 PM

Using the inline keyword doesn't guarantee that the compiler will inline a function. Not using the inline keyword doesn't mean that the compiler won't inline it. The inline keyword is a hint that you give the compiler; a hint that most modern compilers will ignore and do their own thing anyway. With a modern compiler, about the only difference the keyword makes is allowing you to put the definition of an inline function in a header.

### #3 EddieV223 - Members - Reputation: 1827

Posted 25 March 2013 - 01:50 PM

It should be faster, not by much but a little; if you call them lots of times you should see a slight increase in performance. However, the compiler is free to choose to inline or not, whether you declare a function inline or not. For example, with optimisations on, your compiler may inline functions that you didn't declare inline, and it may choose not to inline ones that you did.

That said, your clock probably doesn't have the accuracy needed for this test. Try using the C++11 high_resolution_clock instead ( <chrono> ). Or if not using C++11, try the Win32 performance counter query functions (QueryPerformanceCounter and QueryPerformanceFrequency).
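
A minimal sketch of the <chrono> suggestion, assuming a C++11 compiler. The helper names elapsed_ns and busy_sum are hypothetical, and std::chrono::steady_clock is swapped in for high_resolution_clock here because it is guaranteed to be monotonic; either can be used the same way.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the thread): runs `work` once and returns
// the elapsed time in nanoseconds.
template <typename Work>
std::int64_t elapsed_ns(Work work) {
    // steady_clock is monotonic, so it never jumps backwards mid-measurement.
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Hypothetical workload: sums the integers 0..n-1 through a volatile
// accumulator so the optimizer cannot delete the loop.
long long busy_sum(int n) {
    volatile long long sink = 0;
    for (int i = 0; i < n; ++i) sink = sink + i;
    return sink;
}
```

Calling elapsed_ns([] { busy_sum(1000000); }) returns the time for the whole loop in nanoseconds. How fine-grained that number really is still depends on the clock the platform provides.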

Edited by EddieV223, 25 March 2013 - 01:51 PM.

### #4 Ravyne - GDNet+ - Reputation: 13706

Posted 25 March 2013 - 02:27 PM

That said, your clock probably doesn't have the accuracy needed for this test. Try using the C++11 high_resolution_clock instead ( <chrono> ). Or if not using C++11, try the Win32 performance counter query functions.

If you're using Visual Studio 2012 then high_resolution_clock won't work unless they've fixed it in one of the updates. It shipped with a placeholder version that used a low-resolution clock. If you're on Windows/Visual Studio 2012, you should use QueryPerformanceCounter instead.

Somewhat related are the contents of this thread; in particular, my last post and the one preceding it touch a little on why inlining and other micro-optimizations aren't so simple to reason about. From there you can infer why it's a good thing that the inline keyword is only a hint to the compiler rather than a direct command.

### #5 deftware - Prime Members - Reputation: 1612

Posted 25 March 2013 - 02:44 PM

I did my own test a while back on inlining functions: I compiled a similar test exe and disassembled it to look at the actual output from the compiler, only to find that the code for both functions, and for calling them, was identical. There was no discernible difference between using inline and not using it at the machine-code level. I assume this is the result of what everybody here is saying about it.

So, I just #define function-like macros to produce virtually the same result as inlining functions. At least, from what I've read, it's virtually the same as inlining functions. Either way, it works how I want it to at the machine level.

### #6 Keyboardwarrior - Members - Reputation: 418

Posted 25 March 2013 - 03:19 PM

If you are using Microsoft Visual Studio you can force inlining with the '__forceinline' keyword, whereas 'inline' just gives the compiler a "hint". Inlining a function removes the call/ret overhead generated by the compiler, but creates a larger executable image. The only time I use inlining is in small tight loops where the overhead would cost too many CPU cycles; otherwise I rely on the compiler to make the right decision.

### #7 SiCrane - Moderators - Reputation: 11376

Posted 25 March 2013 - 03:28 PM

If you are using Microsoft Visual Studio you can force inlining with the '__forceinline' keyword

Actually you can't. Even __forceinline is just a strong suggestion. From the MSDN documentation on __forceinline:

The compiler treats the inline expansion options and keywords as suggestions. There is no guarantee that functions will be inlined. You cannot force the compiler to inline a particular function, even with the __forceinline keyword.

### #8 Ohforf sake - Members - Reputation: 2048

Posted 25 March 2013 - 03:33 PM

Function inlining can cost you performance because it increases the code size and thus the strain on the I-cache.

However, if used correctly it can significantly increase performance for two reasons:

1. The compiler can optimize across the boundaries of the function. In your case you won't be seeing this, because there is virtually nothing outside the function except for the loop.

2. You don't need the instructions for calling the function, passing parameters, creating a stack frame, etc. You only see a benefit from this if the overhead is actually a large percentage of what the function does. This is typically the case for getters and setters, which would boil down to about 1 instruction if not for that overhead. However, in your case you are doing a syscall in that function worth a couple of thousand instructions, so the overhead of not inlining is insignificant.

Bottom line, in the scenario you chose, inlining should not give you any visible performance advantage.
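
To illustrate the getter case from point 2, here is a small sketch with a hypothetical Counter class (none of these names come from the thread). Member functions defined inside the class body are implicitly inline, and with optimization enabled a compiler will typically fold value() and increment() into the loop, though only the disassembly can confirm that for a given build.

```cpp
// Hypothetical type for illustration only.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}

    // A getter this small is roughly one load once inlined; a real call
    // would add call/return and stack-frame work around it.
    int value() const { return value_; }

    void increment() { ++value_; }

private:
    int value_;
};

// Adds up the getter's result over `iterations` calls.
long long sum_values(int iterations) {
    Counter c(0);
    long long total = 0;
    for (int i = 0; i < iterations; ++i) {
        total += c.value();
        c.increment();
    }
    return total;
}
```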

### #9 swiftcoder - Senior Moderators - Reputation: 17609

Posted 25 March 2013 - 04:14 PM

So, I just #define function-like macros to produce virtually the same result as inlining functions. At least, from what I've read, it's virtually the same as inlining functions. Either way, it works how I want it to at the machine level.

I would recommend that no one do this, unless they have hard performance data to back up the change.

When your compiler chooses to ignore the 'inline' suggestion, it generally has a good reason why...
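
Performance aside, function-like macros also behave differently from functions, which is one more reason to be careful with them. A small sketch with hypothetical names (MAX_MACRO, max_inline, next_value) showing that a macro pastes its argument text wherever the parameter appears, so an argument with side effects can run more than once:

```cpp
// Hypothetical names throughout; an illustration, not code from the thread.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

inline int max_inline(int a, int b) { return a > b ? a : b; }

int g_calls = 0;

// Returns 1, 2, 3, ... on successive calls and counts how often it ran.
int next_value() { return ++g_calls; }

// How many times next_value() runs when handed to the macro.
int calls_via_macro() {
    g_calls = 0;
    int result = MAX_MACRO(next_value(), 0);  // argument text appears twice
    (void)result;
    return g_calls;
}

// How many times next_value() runs when handed to the function.
int calls_via_function() {
    g_calls = 0;
    int result = max_inline(next_value(), 0);  // argument evaluated once
    (void)result;
    return g_calls;
}
```

Here calls_via_macro() reports two calls and calls_via_function() reports one, regardless of whether the compiler ends up inlining max_inline.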

### #10 Ryan_001 - Prime Members - Reputation: 2846

Posted 25 March 2013 - 05:29 PM

You shouldn't treat inline as a performance directive. Much like the old auto and register keywords, its initial use is pretty much deprecated. Rather, think of it as a linkage directive, like extern or static. The inline keyword allows you to define a function in a header, and it is quite useful in that regard. As far as performance is concerned, modern compilers with global optimizations do not need a hint to know when to consider inlining.
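
As a sketch of the linkage role described above, this is what a hypothetical header might contain (the file name math_utils.h and every identifier in it are assumptions made for illustration):

```cpp
// Contents of a hypothetical header, e.g. "math_utils.h".
#ifndef MATH_UTILS_H
#define MATH_UTILS_H

// `inline` lets this definition appear in every translation unit that
// includes the header. Without it, two .cpp files including the header
// would each define square() and the link would fail with a
// multiple-definition error.
inline int square(int x) { return x * x; }

// Member functions defined inside the class body get the same allowance
// without writing the keyword.
struct Point {
    int x, y;
    int length_squared() const { return square(x) + square(y); }
};

#endif  // MATH_UTILS_H
```

Whether any call to square() is actually expanded in place remains the optimizer's decision; the keyword only changes what the linker will accept.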

### #11 EWClay - Members - Reputation: 659

Posted 25 March 2013 - 06:44 PM

If you are using Microsoft Visual Studio you can force inlining with the '__forceinline' keyword

Actually you can't. Even __forceinline is just a strong suggestion. From the MSDN documentation on __forceinline:

The compiler treats the inline expansion options and keywords as suggestions. There is no guarantee that functions will be inlined. You cannot force the compiler to inline a particular function, even with the __forceinline keyword.

It's a very strong suggestion. It overrides the compiler's own analysis and it will do it unless it's impossible to inline the function, for example if it is recursive or virtual.

Because it is such a strong suggestion, __forceinline should be used with caution and only after profiling, unlike inline itself, which means nothing but 'I put this function in a header'.

For this test you may very well need __forceinline and possibly __declspec(noinline) too. Check in the debugger that it's doing what you expect.

The method of timing looks unreliable. It's most likely random whether a tick occurs between those two points. Try timing the whole loop instead.
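
A rough sketch of timing the whole loop, under some assumptions: the BENCH_* macro names, the step functions, and LoopResult are hypothetical; the non-MSVC branch assumes GCC or Clang attribute syntax; and std::chrono::steady_clock stands in for whatever high-resolution timer is available. The clock is read only twice per variant and nothing is printed inside the timed region.

```cpp
#include <chrono>

// Compiler-specific spellings; the macro names themselves are hypothetical.
#if defined(_MSC_VER)
#  define BENCH_NOINLINE __declspec(noinline)
#  define BENCH_FORCEINLINE __forceinline
#else  // assumes GCC or Clang
#  define BENCH_NOINLINE __attribute__((noinline))
#  define BENCH_FORCEINLINE inline __attribute__((always_inline))
#endif

// Two copies of the same tiny function, differing only in the inlining request.
BENCH_FORCEINLINE unsigned step_inlined(unsigned x) { return x * 1664525u + 1013904223u; }
BENCH_NOINLINE unsigned step_called(unsigned x) { return x * 1664525u + 1013904223u; }

struct LoopResult {
    unsigned value;       // final value, returned so the loop cannot be discarded
    double milliseconds;  // time for the whole loop
};

LoopResult run_inlined(int iterations) {
    unsigned x = 1;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) x = step_inlined(x);
    auto stop = std::chrono::steady_clock::now();
    return {x, std::chrono::duration<double, std::milli>(stop - start).count()};
}

LoopResult run_called(int iterations) {
    unsigned x = 1;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) x = step_called(x);
    auto stop = std::chrono::steady_clock::now();
    return {x, std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

Running each variant several times with a large iteration count and comparing the milliseconds fields gives a more meaningful number than counting clock ticks per call. It is still worth checking in the debugger or disassembly that the two step functions were compiled the way the attributes ask.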

### #12 Matias Goldberg - Crossbones+ - Reputation: 8472

Posted 25 March 2013 - 07:54 PM

a. That is a terrible way of measuring performance:
• Even if clock() had enough resolution, the overhead of calling it so often in such tiny code completely obliterates any difference between the inlined and non-inlined versions (in other words, you're just reading noise).
• What happens inside std::cout can affect your code; things like buffer flushes and the buffer being full can affect the test results (edit: especially since cout's state is not the same between f1 and f2).
b. Inlining isn't always faster:
• It produces bigger code, which can thrash the L1 cache (not your case though).
• There is an important branch in your code. The CPU's branch predictor was warmed up by f1's iterations, so it predicts better what is going to happen in f2. Branch predictors may also be better or worse at predicting your result depending on the call stack (completely architecture dependent), so it may be predicting the non-inlined version better. My bet is that the warm-up is what's causing your strange results.
c. What SiCrane said: inline is a hint, not a guarantee. You have to look at the actual generated assembly to see what's inlined and what's not (it's possible your non-inlined function actually got inlined).
Although, I disagree with SiCrane that MSVC 100% ignores the keyword. Coincidentally, a week ago while playing with VS 2008 I saw the compiler inlining a function only when I wrote the "inline" keyword. The operation was moderate in size (not too short, not too big), so I guess MSVC's decision about whether to inline it was on the edge. 99% of the time, though, it just ignores me.

Read MSDN's documentation on inlining, there are some cases where even __forceinline can't be inlined, just to quote:

Even with __forceinline, the compiler cannot inline code in all circumstances. The compiler cannot inline a function if:

• The function or its caller is compiled with /Ob0 (the default option for debug builds).
• The function and the caller use different types of exception handling (C++ exception handling in one, structured exception handling in the other).
• The function has a variable argument list.
• The function uses inline assembly, unless compiled with /Og, /Ox, /O1, or /O2.
• The function is recursive and not accompanied by #pragma inline_recursion(on). With the pragma, recursive functions are inlined to a default depth of 16 calls. To reduce the inlining depth, use inline_depth pragma.
• The function is virtual and is called virtually. Direct calls to virtual functions can be inlined.
• The program takes the address of the function and the call is made via the pointer to the function. Direct calls to functions that have had their address taken can be inlined.
• The function is also marked with the naked __declspec modifier

Cheers
Dark Sylinc

Edited by Matias Goldberg, 25 March 2013 - 08:01 PM.

### #13 LorenzoGatti - Crossbones+ - Reputation: 3905

Posted 26 March 2013 - 03:02 AM

Inspecting the compiled assembler code should be the first step of any performance test: if the code is identical there is nothing to measure.
With GCC, you can simply pass option -S to get assembler code.

You should also test completely separate programs, not one program with two functions: the compiler could do something different because there are two identical functions in the same compilation unit.
The test program can be written as

    INLINESPEC void testFunction(){
        ...
    }

    int main (){
        ...
        testFunction();
        ...
    }


You can define INLINESPEC as "inline", an empty string, or the fancy compiler-specific attributes through compiler command-line options to get each variant.
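
One way the INLINESPEC idea might look in practice, as a sketch only: the file name test.cpp, the function body, and the callTestFunction wrapper are assumptions filled in for illustration, and the commands in the comment assume GCC. Each variant is compiled to assembly separately so the two outputs can be compared before anything is timed.

```cpp
// test.cpp (hypothetical file name). Generate assembly for each variant:
//   g++ -O2 -S -DINLINESPEC=inline test.cpp -o inline.s
//   g++ -O2 -S -DINLINESPEC=       test.cpp -o plain.s
// and compare the two .s files; if they match, there is nothing to measure.

// Default to no specifier when none is given on the command line.
#ifndef INLINESPEC
#define INLINESPEC
#endif

// Placeholder body; the original post leaves the contents unspecified.
INLINESPEC int testFunction(int x) {
    return x * 2 + 1;
}

// A caller with external linkage, so code is emitted for it even when
// testFunction is declared inline.
int callTestFunction(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += testFunction(i);
    return total;
}
```

Note that -DINLINESPEC= (with the trailing equals sign) defines the macro as empty, whereas -DINLINESPEC alone would define it as 1 and break the declaration.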

### #14 NightCreature83 - Crossbones+ - Reputation: 4673

Posted 26 March 2013 - 03:23 AM

To understand why inlining is sometimes faster and sometimes isn't, you need a solid understanding of what a C++ compiler is doing under the hood and how your code is transformed into assembler. There can be quite a few surprises when you look at the assembler code of a particular function; for example, a single float divide can produce SSE2 instructions if you compile with that option on, even in non-vectorised code.

http://www.altdevblogaday.com/2013/01/05/cc-low-level-curriculum-part-10-user-defined-types/ the articles referenced in this link will give you a good explanation of what is going on under the hood if you want to know more.

Inspecting the compiled assembler code should be the first step of any performance test: if the code is identical there is nothing to measure.
With GCC, you can simply pass option -S to get assembler code.

You should also test completely separate programs, not one program with two functions: the compiler could do something different because there are two identical functions in the same compilation unit.
The test program can be written as

    INLINESPEC void testFunction(){
        ...
    }

    int main (){
        ...
        testFunction();
        ...
    }

You can define INLINESPEC as "inline", an empty string, or the fancy compiler-specific attributes through compiler command-line options to get each variant.

Actually, it shouldn't be. That works for this case, but if you find that your application is running slowly, the first thing you should do is run a profiler and find where the hotspot is. Looking at the generated assembler is a last resort, as a C++ compiler targeting an out-of-order CPU is better at optimising this than you are; on in-order CPUs, hand-optimised assembler can be faster. Nowadays optimisations are more about data locality than about instruction-level optimisations: a cache miss is more costly than a slightly unoptimised instruction order, especially on out-of-order CPUs. The compiler is not the only place where optimisations to your code happen; even at runtime a modern CPU reorders the way your instructions are issued to the ALU, and you have no control over this.
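
A small sketch of what the data-locality point can look like in code, with hypothetical function names and a matrix stored row-major in one contiguous std::vector. Both functions do identical arithmetic; only the order in which memory is visited differs, and on matrices much larger than the cache the strided version tends to be noticeably slower.

```cpp
#include <cstddef>
#include <vector>

// Visits elements in storage order, so consecutive reads usually fall in
// the same cache line.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Visits the same elements column by column, jumping `cols` elements
// between consecutive reads.
long long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

The two functions return the same sum, so any timing difference between them comes from the access pattern rather than from the instructions executed.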

http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2 Even though this presentation is mostly about parallel programming, in the first part he explains what happens after your code is compiled and run on a modern CPU, and which transformations it can apply to that code.

Edited by NightCreature83, 26 March 2013 - 06:58 AM.

### #15 Olof Hedman - Crossbones+ - Reputation: 5545

Posted 26 March 2013 - 03:25 AM

The problem is f1 appears 4 times and f2 appears 2 times. As you can see, I set the functions so they'll show text only if the time difference is bigger than 0.

Why does the inline function get executed slower?! Shouldn't it be faster?

Disregarding all the issues others have mentioned about the way you measure time, doesn't that result say that the inlined f2 executes faster than f1?

The fewer times the time difference is > 0, the faster the loop runs...

### #16 EWClay - Members - Reputation: 659

Posted 26 March 2013 - 04:17 AM

The problem is f1 appears 4 times and f2 appears 2 times. As you can see, I set the functions so they'll show text only if the time difference is bigger than 0.

Why does the inline function get executed slower?! Shouldn't it be faster?

Disregarding all the issues others have mentioned about the way you measure time, doesn't that result say that the inlined f2 executes faster than f1?
The fewer times the time difference is > 0, the faster the loop runs...

That would be a good point if f2 were the inlined one.

### #17 Olof Hedman - Crossbones+ - Reputation: 5545

Posted 26 March 2013 - 04:19 AM

That would be a good point if f2 were the inlined one.

I'll climb back in my cave now
