MSVC generating much slower code compared to GCC
Of course, if the function is defined in a different translation unit, the compiler has to assume that it potentially has side effects (it would be nice if this information were exported in object files), but IMO that's more of a case for link-time code generation than for micro-optimizing with additional local variables.
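For anyone wondering what the local-variable trick looks like, here is a minimal sketch. The names Mixer, on_sample, sum_reload and sum_cached are made up for illustration. on_sample stands in for a function whose body lives in another translation unit; it is defined inline here only so the snippet links, which also means an optimizer may see through it in this particular form.

```cpp
#include <cstddef>

struct Mixer {
    float gain;  // read on every loop iteration
    int calls;   // touched by the opaque call
};

// Hypothetical stand-in for a function defined in another translation unit.
// If the compiler cannot see this body, it must assume any field of m may change.
void on_sample(Mixer& m) { ++m.calls; }

// m.gain is read through the reference after each call, so with an opaque
// on_sample the compiler has to reload it from memory every iteration.
float sum_reload(Mixer& m, const float* in, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        on_sample(m);
        total += in[i] * m.gain;
    }
    return total;
}

// Copying the field into a local makes it explicit that the value cannot
// change during the loop, so it can stay in a register.
float sum_cached(Mixer& m, const float* in, std::size_t n) {
    const float gain = m.gain;
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        on_sample(m);
        total += in[i] * gain;
    }
    return total;
}
```

Both versions return the same result as long as on_sample really leaves gain alone; the cached version just states that assumption in the code instead of relying on link-time code generation to discover it.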
__restrict cannot be used on references though, for whatever reason. And even though I read an explanation on these forums that references are already assumed not to be rebindable, checking the generated assembly showed significantly worse code than with pointers and __restrict (or an equivalent).
__restrict can be used on references as of Visual C++ 2015. It was one of the additions I was glad to see. In 2015 you can also effectively mark member functions as restrict (where the this pointer is restrict).
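A rough sketch of the difference being discussed, for comparison in a disassembler. The function names are made up. The __restrict spelling is accepted by MSVC, GCC and Clang as an extension; the reference form assumes a compiler that supports it (GCC does, and Visual C++ 2015 onward does according to the post above).

```cpp
#include <cstddef>

// No aliasing information: dst[i] might overlap *scale, so the compiler
// has to assume each store can change the value being read.
void scale_plain(float* dst, const float* scale, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] *= *scale;
}

// __restrict promises that dst and scale do not alias, so *scale can be
// loaded once and the loop is easier to vectorize.
void scale_restrict(float* __restrict dst, const float* __restrict scale,
                    std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] *= *scale;
}

// Same promise with a restrict-qualified reference (compiler extension;
// assumed to be available, see the note above).
void scale_ref(float* __restrict dst, const float& __restrict scale,
               std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] *= scale;
}
```

The promise is on the caller: passing overlapping memory to the restrict versions is undefined behaviour, so this only makes sense where the no-alias guarantee actually holds.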
There's one thing Visual Studio can do to trip up performance measurements if you're not aware of it, which has nothing to do with the compiler.
If you run a program by pressing F5, the debugger is attached at launch and you get the Windows debug heap enabled, which is much slower than the non-debug one.
The simple workaround is to launch it without the debugger attached by using Ctrl+F5 if you're doing performance testing.
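One way to check this on your own machine is a small allocation timer like the sketch below, run once with F5 and once with Ctrl+F5. The function name time_allocations and the 64-byte block size are arbitrary choices for illustration, and the absolute numbers will vary by machine and runtime.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Allocates and then frees `count` small heap blocks and returns the
// elapsed wall-clock time in milliseconds.
double time_allocations(std::size_t count) {
    using clock = std::chrono::steady_clock;

    std::vector<std::unique_ptr<char[]>> blocks;
    blocks.reserve(count);  // keep the vector's own growth out of the timing

    const auto start = clock::now();
    for (std::size_t i = 0; i < count; ++i)
        blocks.push_back(std::unique_ptr<char[]>(new char[64]));
    blocks.clear();  // frees every block
    const auto stop = clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_allocations with a large count (say a million) and printing the result should make any heap difference between the two launch modes visible, if there is one in your setup.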
The simpler workaround is to use optimised release builds in a profiler when doing performance testing.
As near as I can tell, debug/release and code optimizations have no effect on the behavior Adam_42 is talking about, though. If you run a release build by pressing F5 and profile it, its allocation logic will be slower than if you attach after launching the process.