MSVC generating much slower code compared to GCC


Hello everyone!

We just noticed that the same C++ program performs way better when run with GCC on Linux than it does on Windows using Visual Studio 2015.

The test program calculates some integrals over and over again and was used in one of my programming assignments, which I work on under Linux on my laptop, which doesn't have a very fast CPU. The program finishes in about 0.9 seconds there, while it takes a whopping 8-9 seconds on my much stronger Windows machine!

Another problem is that our game project runs much faster on my friend's Linux machine than it does on mine. He still gets ~140 fps during some heavy physics scenes, while I'm down to ~20.

I'm very confused by this, and I can't seem to find any more compiler flags that would improve the situation.

The flags I used on Windows are (these come directly from a compiler benchmark, since I was desperate):


/arch:SSE2 /Ox /Ob2 /Oi /Ot /Oy /fp:fast /GF /FD /MT /GS- /openmp

While the gcc-build uses:


-O3 -ffast-math -fopenmp -funroll-loops -march=native

The same phenomenon could be observed on another friend's Windows machine, which has an even better CPU than mine.

It's really weird that my tiny laptop can outperform these computers with no effort. I mean, I should be able to get the same code up to roughly the same speed on both operating systems, right?

Are there any more magic compiler flags to set, or other pitfalls I should look out for?

Thanks in advance!

When I'm not sure what's going on with code generation, I look at the disassembly. Maybe the Linux version discovered that it can optimize intermediate steps of a loop out because those steps aren't used?

Can we look at your code?
Are you using the standard library at all? MSVC by default turns on a bunch of security and correctness checking that has substantial performance overhead.
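
If you want to rule that out quickly, here's a minimal sketch (assuming the VS2015 standard library; _ITERATOR_DEBUG_LEVEL has to be set consistently for every translation unit and before any standard header is included):


// 0 = no checks, 1 = basic checks, 2 = full debug checks (the Debug-build default).
// Usually set as a project-wide preprocessor definition rather than in source.
#define _ITERATOR_DEBUG_LEVEL 0
#include <vector>
#include <cstddef>

int main()
{
    std::vector<std::size_t> v(1000000, 1);
    std::size_t sum = 0;
    for (std::size_t i = 0, end = v.size(); i != end; ++i)
        sum += v[i];        // operator[] is not range-checked at level 0
    return static_cast<int>(sum & 0xff);
}

Keep in mind that a Release build already defaults to level 0, so if you're timing a Release build this particular setting isn't your 10x.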

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Visual C++ tends to err on the side of code safety, and has a habit of assuming everything is volatile. It can be persuaded to optimise your code (and can do a very good job of it), but it can be very picky as to when it applies the optimisations. As ApochPiQ mentions, make sure you disable the security checks in the compiler settings, but without seeing any of the code, it's kinda hard to say much more than that.

Maybe the Linux version discovered that it can optimize intermediate steps of a loop out because those steps aren't used?


It's unlikely. VC++ is pretty good at stripping unused code paths. VC++ tends to assume that variables are volatile, so accessing values through references (or this) can prevent vectorisation in tight loops (and a few other fun situations like that). Clang and GCC are a little bit more aggressive in this respect (occasionally too aggressive!). Fire a profiler over the code, look at the disassembly, and tackle those hot spots one at a time. If you see lots of instructions ending in 'ss' or 'sd' (instead of 'ps' or 'pd') then the vectorisation has failed.
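
If you don't have a profiler handy, you can also just make MSVC dump the assembly it generated. A rough sketch (the file and function names are made up for illustration; /FAs writes a .asm listing next to the object file):


// Compile with:  cl /O2 /fp:fast /FAs sum.cpp
// then look in sum.asm: addps/mulps means packed (vectorised),
// addss/mulss means scalar (vectorisation failed).
#include <cstddef>

float sum_array(const float* data, std::size_t n)
{
    float total = 0.0f;
    for (std::size_t i = 0; i != n; ++i)
        total += data[i];   // candidate for vectorisation under /fp:fast
    return total;
}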

VC++ tends to assume that variables are volatile

Volatile isn't quite the right word. MSVC tends not to be too aggressive when making assumptions about aliasing. GCC defaults to a very strict interpretation of the aliasing rules (which also relies on programmers being aware of the aliasing rules and using a lot of care during dodgy type reinterpretation tricks... whereas MSVC lets you write incorrect code and maybe have it work ok sometimes).

You can help out MSVC by making use of __restrict, to give manual promises/hints about where aliasing situations cannot arise (this will also help out GCC as long as you use a macro, so that it changes to __restrict__ on GCC).
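
Something along these lines, as a sketch (the macro and function names are made up; both spellings are vendor extensions, not standard C++):


#include <cstddef>

#if defined(_MSC_VER)
    #define MY_RESTRICT __restrict
#else
    #define MY_RESTRICT __restrict__
#endif

// Promises the compiler that dst and src never overlap, so it can keep
// values in registers and vectorise the loop instead of reloading memory.
void scale(float* MY_RESTRICT dst, const float* MY_RESTRICT src, std::size_t n, float k)
{
    for (std::size_t i = 0; i != n; ++i)
        dst[i] = src[i] * k;
}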

Alternatively, you can simply avoid accessing values indirectly (i.e. via pointers, which includes member variables and their implicit this pointer) within loops. That should always be the first step before reaching for restrict.
e.g. this is terrible code that should fail a code review:


//members: size_t m_sum; vector<size_t> m_vec;
m_sum = 0;
for( size_t i=0; i != m_vec.size(); ++i )
  m_sum += m_vec[i];

A good compiler is forced to generate terrible asm given ^that^ code. Even GCC with its strict aliasing can't fix the mistakes in it.

This is the fixed version that is giving the correct hints to the compiler so that it can produce good code:


size_t sum = 0;
for( size_t i=0, end=m_vec.size(); i != end; ++i )
  sum += m_vec[i];
m_sum = sum;
You can help out MSVC by making use of __restrict, to give manual promises/hints about where aliasing situations cannot arise (this will also help out GCC as long as you use a macro, so that it changes to __restrict__ on GCC).

That isn't even necessary; __restrict works just fine on GCC.

What doesn't work is using restrict (without underscores) as per C99. For a long time I considered that somewhat unlucky, because GCC allows a lot of C99 stuff as GNU extensions in C++ that isn't very useful, yet this one, which would be quite nice, isn't supported. Then again, you can use the exact same spelling on either compiler with __restrict, which is actually preferable to having a no-underscore version (if you ever intend to make code portable between MSVC and GCC). So I don't consider it "unlucky" any more; it's actually a good decision.


e.g. this is terrible code that should fail a code review:


//members: size_t m_sum; vector<size_t> m_vec;
m_sum = 0;
for( size_t i=0; i != m_vec.size(); ++i )
  m_sum += m_vec[i];

A good compiler is forced to generate terrible asm given ^that^ code. Even GCC with its strict aliasing can't fix the mistakes in it.

This is the fixed version that is giving the correct hints to the compiler so that it can produce good code:


size_t sum = 0;
for( size_t i=0, end=m_vec.size(); i != end; ++i )
  sum += m_vec[i];
m_sum = sum;

Could you elaborate on the example you posted? Why is the first version bad?

Is it that, because 'm_sum'/'m_vec' are accessed through a pointer ('this'?), the loop has to load the members from memory on each iteration? Why can't it just load the initial value of 'm_sum' into a CPU register, add to it during the loop, and then store the value?

Is it that the end condition is recalculated every iteration?

Why can't the compiler just assume the member variables won't be altered from outside the function? And if they have to be, force the programmer to make explicit use of the volatile keyword?

The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.
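
The member access is the other half of it. As a sketch of the underlying aliasing issue (standalone functions, made up for illustration): when the accumulator lives in memory that the compiler can't prove is separate from the data, every store may invalidate what was just read, so it has to reload and re-store on each iteration; a local accumulator stays in a register until the final store.


#include <cstddef>

// The compiler must assume *out might overlap data, so it re-stores *out
// (and may re-read data) on every iteration - no register accumulation.
void sum_aliased(std::size_t* out, const std::size_t* data, std::size_t n)
{
    *out = 0;
    for (std::size_t i = 0; i != n; ++i)
        *out += data[i];
}

// The local accumulator can live in a register; there is a single store at the end.
void sum_local(std::size_t* out, const std::size_t* data, std::size_t n)
{
    std::size_t sum = 0;
    for (std::size_t i = 0; i != n; ++i)
        sum += data[i];
    *out = sum;
}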

The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.


I would have thought that something as trivial as size() would have gotten inlined out? Though at least the implementation I'm looking at computes the size by creating the beginning and end iterators and subtracting them, so maybe that isn't much of a savings anyway.

The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.


I would have thought that something as trivial as size() would have gotten inlined out? Though at least the implementation I'm looking at computes the size by creating the beginning and end iterators and subtracting them, so maybe that isn't much of a savings anyway.

It was inlined, but calling size() every iteration prevented the compiler from vectorizing the loop (i.e. using SIMD instructions). Precalculating the size lets the optimizer figure out that it can vectorize the loop. Using a local variable to store intermediate results also improves the chances.
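
You can also ask the compilers to report this directly. A rough sketch (flags as I remember them, so double-check against your compiler version; the file and function names are made up):


// Hypothetical check - compile with:
//   cl /O2 /fp:fast /Qvec-report:2 integrate.cpp       (MSVC: lists vectorised loops
//                                                        and reason codes for failures)
//   g++ -O3 -ffast-math -fopt-info-vec integrate.cpp   (GCC 4.9+: reports vectorised loops)
#include <cstddef>

double integrate(const double* f, std::size_t n, double dx)
{
    double acc = 0.0;                 // local accumulator, nothing to alias
    for (std::size_t i = 0; i != n; ++i)
        acc += f[i] * dx;             // should show up as vectorised in the reports
    return acc;
}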

