# MSVC generating much slower code compared to GCC

This topic is 971 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hello everyone!

We just noticed that the same C++-Program performs way better when ran using gcc and Linux, than it does in Windows, using VisualStudio 2015.

The test program is calculates some integrals over and over again and was used in one of my programming assignments, where I work under Linux using my Laptop, which doesn't have a very fast CPU built in. The program finishes in about 0.9 second on it, while it takes a whooping 8-9 seconds on my much stronger Windows-machine!

Another problem is, that our game-project runs much faster on my friends linux machine than it does on mine. He is still going at ~140fps while some heavy physics scenes, while I'm down to ~20.

I'm very confused about that and I can't seem to find any more compile-flags which would improve the situation.

The flags I used on Windows are (These come directly from a compiler-benchmark since I was desperate):

/arch:SSE2 /Ox /Ob2 /Oi /Ot /Oy /fp:fast /GF /FD /MT /GS- /openmp


While the gcc-build uses:

-O3 -ffast-math -fopenmp -funroll-loops -march=native


The same phenomenon could be observed on an other friends windows-machine which has an even better CPU then mine.

It's really weird that my tiny laptop can outperform these computers in no-time. I mean, I should get the same code roughly up to the same speed on both operating systems, right?

Are there any more magic compilerflags to set or other pitfalls I should look out for?

##### Share on other sites
When I'm not sure what's going on with code generation, I look at the disassembly. Maybe the linux version discovered that it can optimize intermediate steps of a loop out because those steps aren't used?

Can we look at your code?

##### Share on other sites
Are you using the standard library at all? MSVC by default turns on a bunch of security and correctness checking that has substantial performance overhead.

##### Share on other sites
You can help out MSVC by making use of __restrict, to give manual promises/hints about where aliasing situations cannot arise (this will also help out GCC as long as you use a macro, so that it changes to __restrict__ on GCC

That isn't even necessary, __restrict works mighty fine.

What doesn't work is using restrict (without underscores) as per C99. Which I deemed somewhat unlucky for a long time because GCC allows a lot of C99 stuff as GNU extension in C++ that isn't very useful, but this one which would be quite nice isn't supported. Then again, you can use the exact same spelling on either compiler with __restrict which is actually preferrable to having a no-underscore version (if you ever intend to make code portable between MS and GCC). Insofar, I don't deem this "unlucky" any more, it's actually a good decision.

Edited by samoth

##### Share on other sites

e.g. this is terrible code that should fail a code review:

//members: size_t m_sum; vector<size_t> m_vec;
m_sum = 0;
for( size_t i=0; i != m_vec.size(); ++i )
m_sum += m_vec[i];

A good compiler is forced to generate terrible asm given ^that^ code. Even GCC with it's strict aliasing can't fix the mistakes in it.

This is the fixed version that is giving the correct hints to the compiler so that it can produce good code:

size_t sum = 0;
for( size_t i=0, end=m_vec.size(); i != end; ++i )
sum += m_vec[i];
m_sum = sum;

Could you elaborate on the example you posted, why the first version is bad?

That by 'm_sum'/'m_vec' being acessed through pointer ('this'?), during each iteration the loop has to load the members from memory? Why can't it just load the initial value of 'm_sum' into a CPU register, add to it during the loop and then store the value?

That the end statement is recalculated every iteration?

Why can't the compiler just assume the member variables won't be altered outside the function? And if has to be, force the programmer to make explicit use of the volatile keyword?

##### Share on other sites
The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.

##### Share on other sites

The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.

I would have thought that something as trivial as size() would have gotten inlined out? Though at least the implementation I'm looking at computes the size by creating the beginning and end iterators and subtracting them, so maybe that isn't much of a savings anyway.

##### Share on other sites

The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.

I would have thought that something as trivial as size() would have gotten inlined out? Though at least the implementation I'm looking at computes the size by creating the beginning and end iterators and subtracting them, so maybe that isn't much of a savings anyway.

It was inlined out, but calling size() every iteration prevented compiler from vectorizing the loop (i.e. using SIMD commands). Precalculating size lets optimizer to figure out it can vectorize the loop. Using local variable to store temporary results also improves chances.

1. 1
2. 2
3. 3
4. 4
5. 5

• 14
• 9
• 9
• 9
• 10
• ### Forum Statistics

• Total Topics
632912
• Total Posts
3009197
• ### Who's Online (See full list)

There are no registered users currently online

×