GCC v VC++ and C++ v inline assembly.

Started by
22 comments, last by Hodgman 15 years, 6 months ago
I've been taking an assembly course for a couple of months now. Worthless as it is, using assembly isn't the priority, just learning how to. So, I finally got some code written and tested my assembly against the assembly GCC generated from my C++ code. No matter how many times I made the assembly version run, I could never time it because it was too fast. The GCC version, on the other hand, took ten seconds to run an insane number of iterations. Even after looking at GCC's assembly, I couldn't figure out why it was any slower. I tried this on M$' compiler. Even though they had the home field advantage, it took them twenty-some seconds, and the inline assembly took 13-some. Here's the source:
#include <iostream>
#include <ctime>
#include <string>
#include <cstdlib> // for system()
using namespace std;

#define USING_C

#if defined(USING_C)
    string codeLang = "C++";
#else
    string codeLang = "ASM";
#endif

#if defined(_MSC_VER)
    string compiler = "M$";
#else
    string compiler = "GCC";
#endif

unsigned int fact( unsigned int x )
{
    #if defined( USING_C )
        int sum = 1;
        for(; x; --x )
        {
            sum *= x;
        }

        return sum;
    #else // if using ASM
        #if defined(__GNUC__)
            asm
            (
                // Exit quick if the input is 2 or less; x is returned unchanged.
                "   cmp  $2, %%ebx     \n"
                "   jle   2f           \n"

                "   movl $1,    %%eax  \n" // eax is the sum.
                "1: imul %%ebx, %%eax  \n"
                "   decl %%ebx         \n"
                "   jnz  1b            \n"
                "   movl %%eax, %%ebx  \n"
                "2:                    \n"
                : "=b" ( x )
                : "b"  ( x )
                : "%eax"
            );
        #elif defined(_MSC_VER)
            __asm
            {
                mov EBX, x

                // Exit quick if input is zero.
                xor EAX, EAX
                cmp EAX, EBX
                je  A_End

                // The factorial loop:
                mov  EAX, 1 // EAX is the sum.
        A_Loop: imul EBX
                dec  EBX
                jnz  A_Loop

                mov x, EAX

         A_End:
            }
        #endif // M$ compiler
    #endif // ASM code

    return x;
}

int main()
{
    #if defined(_MSC_VER)
        clock_t timer = clock();
    #endif

    const int MAX = 99999;

    unsigned int accum = MAX;
    while( --accum )
    {
        fact( accum );
    }

    cout << "Time on " << compiler << " using " << codeLang << ": ";

    #if defined(_MSC_VER)
        // My non-M$ compiler would normally do this for me.
        // M$V$ won't, or at least isn't.
        cout << float(clock() - timer) / CLOCKS_PER_SEC << " secs" << endl;
        system("pause");
    #endif

    /*  |_____|___M$___|__GCC__|
     *  |C++__|_22.157_|_8.406_|
     *  |ASM__|_13.900_|_XXXXX_| X = Too fast to time.
     */
}
Can anyone explain to me, or help me figure out, why my asm code is so fast in GCC, or why GCC can't produce similar code? How about why M$ is so slow, even when I give it similar asm? (Or maybe I could optimize better?) I think my example might be too simple, but that only alarms me more! Looking at M$' disassembly, it wasn't smart enough to know that neither x nor sum needs to be on the stack and that both could be register variables. It seems GCC was smart enough to see this, but I don't see why it's any slower than my code. The MS prologue and epilogue seem quite large compared to GCC's: it pushes the source and destination index registers, which this function does not use or modify, and it also pushes ebx. It then has to pop them all off the stack at the end. Is this the culprit? Thanks in advance.
When profiling/timing these speeds, are you compiling/running in Release? If not, please do so as profiling in debug is useless since the compiler does not optimize and in some cases may add bloat to your executable. Interesting stuff though dude. :)
What on earth is an "M$"? Monopoly money?

EDIT: As for the actual question, why don't you look at the disassembly of the C++ code and see what's being generated?
Also did you look at the generated code for the gcc assembly version? It's possible the optimizer optimized the whole shebang out altogether.
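One way to test that hypothesis with GCC: an asm statement that has output operands and is not marked volatile may be discarded when its outputs turn out to be unused. The sketch below is a hypothetical rewrite of the GCC branch, not the original poster's code. The name `fact_asm`, the named operands, the x86-only guard and the plain C++ fallback are all assumptions made for illustration. It marks the block volatile so it survives optimization, and it returns 1 for an input of 0.

```cpp
// Hypothetical variant of the GCC branch of fact(); not the original code.
unsigned int fact_asm(unsigned int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int result = 1;              // running product, 0! == 1
    asm volatile(                         // volatile: keep it even if unused
        "   test %[x], %[x]     \n"       // skip the loop when x == 0
        "   jz   2f             \n"
        "1: imul %[x], %[res]   \n"       // res *= x
        "   dec  %[x]           \n"       // --x, sets ZF when x reaches 0
        "   jnz  1b             \n"
        "2:                     \n"
        : [res] "+r"(result), [x] "+r"(x) // both are read and written
        :
        : "cc");                          // the flags register is clobbered
    return result;
#else
    // Fallback for other compilers or targets: the same loop in plain C++.
    unsigned int result = 1;
    for (; x; --x)
        result *= x;
    return result;
#endif
}
```

With the volatile qualifier in place, a timing of "too fast to measure" would no longer be explained by the compiler deleting the loop.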

I've found in tests like these, it helps immensely to have something like
cout << "Final: " << accum;
or something, so that the compiler can't be like "oh, well we calculate it but never use it so *poof*". Just a thought. Other thoughts include differing calling conventions etc.: since you didn't inline "fact", Microsoft's compiler is most likely calling it with cdecl (its default) and thus will always push the arguments onto the stack... and also what shwasasin said about making sure they are both in release builds as well.

Cheers
-Scott
I just ran your code in VC++ 2008:

Time on M$ using C++: 0 secs
Press any key to continue . . .


Time on M$ using ASM: 12.656 secs
Press any key to continue . . .


And on GCC:
Time on GCC using C++: 0 secs

Time on GCC using ASM: 0 secs 


Have you ever considered perhaps enabling optimizations?

The "problem" is that you perform a bunch of computations which you don't actually use for anything. So any intelligent compiler will say "oh, that's a waste of time, let's skip those computations".

That's what VC++ does with C/C++ code (which it understands well, and can analyze), but apparently isn't able to do with ASM (which it doesn't understand well, and which is much harder for the compiler to reason about).

Apart from that, we see that GCC is apparently able to discard the inline assembly as well, since its output is never used, so it performs the same optimization there.

If you want *real* results, you need to 1) use the result of the computation for something, so that the compiler doesn't skip it, and 2) enable optimizations in your compiler.
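A minimal sketch of point 1, assuming a C++11 compiler for `std::chrono`. The helper names `checksum` and `run_benchmark` are made up for illustration and are not from the original program. Every call to `fact` feeds a running total that is printed, so the compiler cannot treat the work as dead code, and the timer no longer depends on the compiler-specific `clock()` branch.

```cpp
#include <chrono>
#include <iostream>

// Plain C++ factorial, wrapping modulo 2^32 like the original.
unsigned int fact(unsigned int x)
{
    unsigned int product = 1;
    for (; x; --x)
        product *= x;
    return product;
}

// Hypothetical helper: sums fact(1) ... fact(limit - 1) so that every
// call contributes to a value the caller can observe.
unsigned int checksum(unsigned int limit)
{
    unsigned int total = 0;
    for (unsigned int x = 1; x < limit; ++x)
        total += fact(x);
    return total;
}

// Hypothetical helper: times one checksum() run, prints the checksum
// and the elapsed seconds, and returns the checksum.
unsigned int run_benchmark(unsigned int limit)
{
    using steady = std::chrono::steady_clock;
    const steady::time_point start = steady::now();
    const unsigned int total = checksum(limit);
    const std::chrono::duration<double> elapsed = steady::now() - start;
    std::cout << "Checksum: " << total << '\n'
              << "Time: " << elapsed.count() << " secs\n";
    return total;
}
```

Calling `run_benchmark(99999)` from `main()` covers the same values of `accum` as the original loop. The printed checksum should be identical across compilers and across the C++ and assembly variants, which doubles as a correctness check.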
Benchmarking and interpretation of results is a difficult topic. That is why the internet is littered with misinformation.

The main reason for the difference is flawed methodology, along with an almost certainly invalid approach to testing.

Using corrected benchmarking code to prevent NOP elimination:
    const int MAX = 99999;
    int foo = 0;          // <--
    unsigned int accum = MAX;
    while( --accum )
    {
        foo += fact( accum ); // <--
    }
    std::cout << foo;  // <--
    cout << "Time on " << compiler << " using " << codeLang << ": ";


MSVC 2008,
Quote:/Ox /Ob2 /Oi /GL /I "XXXX\src" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_SECURE_SCL=0" /D "_CRT_SECURE_NO_WARNINGS" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /GS- /Gy /arch:SSE2 /Fx /Fo"XXXX\\" /Fd"XXXX\vc90.pdb" /W4 /nologo /c /Zi /TP /wd4355 /errorReport:prompt


#define USING_C
Quote:-125961703Time on M$ using C++: 7.359 secs
Press any key to continue . . .


    int foo = 0;
00401014  xor         esi,esi
    unsigned int accum = MAX;
    while( --accum )
00401016  mov         edx,1869Eh
0040101B  jmp         main+20h (401020h)
0040101D  lea         ecx,[ecx]
    {
        foo += fact( accum );
00401020  mov         ecx,edx
00401022  mov         eax,1
00401027  test        edx,edx
00401029  je          main+38h (401038h)
0040102B  jmp         main+30h (401030h)
0040102D  lea         ecx,[ecx]
00401030  imul        eax,ecx
00401033  sub         ecx,1
00401036  jne         main+30h (401030h)
00401038  add         esi,eax
0040103A  sub         edx,1
0040103D  jne         main+20h (401020h)
    }


Without #define USING_C
Quote:-125961703Time on M$ using ASM: 7.453 secs
Press any key to continue . . .



    int foo = 0;
00401014  xor         ecx,ecx
    unsigned int accum = MAX;
    while( --accum )
00401016  mov         esi,1869Eh
0040101B  jmp         main+20h (401020h)
0040101D  lea         ecx,[ecx]
    {
        foo += fact( accum );
00401020  mov         dword ptr [ebp-4],esi
00401023  mov         ebx,dword ptr [ebp-4]
00401026  xor         eax,eax
00401028  cmp         eax,ebx
0040102A  je          main+39h (401039h)
0040102C  mov         eax,1
00401031  imul        ebx
00401033  dec         ebx
00401034  jne         main+31h (401031h)
00401036  mov         dword ptr [ebp-4],eax
00401039  add         ecx,dword ptr [ebp-4]
0040103C  sub         esi,1
0040103F  jne         main+20h (401020h)




Using these results I can also conclude that your gcc benchmark is invalid, but I'm not doing the gcc benchmarks as well.
Quote:Original post by Splinter of Chaos
I tried this on M$' compiler.
Stop doing that. This is a place for intelligent discussion, not bullshit trolling.

(Just to clarify, it's awfully tempting to delete any further posts that do that.)
Quote:Original post by Antheus
Using these results I can also conclude that your gcc benchmark is invalid, but I'm not doing the gcc benchmarks as well.
I'm not too sure. In all likelihood that's the problem, but I wouldn't discount the possibility of GCC simply being clever.
There are more efficient algorithms for doing this for large numbers of iterations after all, e.g. like this:
unsigned exp(unsigned k, unsigned n) {
    unsigned v;
    for(v = 1; n; n >>= 1) {
        if(n & 1) v *= k;
        k *= k;
    }
    return v;
}
I'm not suggesting that any compiler is anywhere near clever enough to do this (unless they've included a pattern precisely for this case in order to speed up benchmarks), but any half-decent compiler will split common factors out of an expression like "k * k * k * k" in its sleep (though to be honest a great many compilers are less than half-decent). Now add 4x loop unrolling to the mix and that's basically what you've got in the inner loop.

Granted, I haven't actually been able to make GCC do this, but my version is getting a bit old. Or perhaps the whole thing is simply being evaluated at compile-time, though I would've expected the compiler to hit some sort of limit long before reaching the end of this loop.
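To make the square-and-multiply idea concrete, here is a small sketch that checks it against a naive loop. The names `pow_naive`, `pow_by_squaring` and `powers_agree` are hypothetical; `pow_by_squaring` follows the same algorithm as the `exp()` function quoted above, renamed only to avoid clashing with `std::exp`. All arithmetic wraps modulo 2^32 (assuming a 32-bit `unsigned`), which is well-defined for unsigned types.

```cpp
// Naive reference: performs n multiplications.
unsigned pow_naive(unsigned k, unsigned n)
{
    unsigned v = 1;
    while (n--)
        v *= k;
    return v;
}

// Square-and-multiply: one iteration per bit of n.
unsigned pow_by_squaring(unsigned k, unsigned n)
{
    unsigned v = 1;
    for (; n; n >>= 1) {
        if (n & 1) v *= k;   // this bit of the exponent is set
        k *= k;              // k, k^2, k^4, k^8, ...
    }
    return v;
}

// Returns true when both versions agree for every exponent below max_n.
bool powers_agree(unsigned k, unsigned max_n)
{
    for (unsigned n = 0; n < max_n; ++n)
        if (pow_naive(k, n) != pow_by_squaring(k, n))
            return false;
    return true;
}
```

Note that this speeds up repeated multiplication by the same factor. The factorial loop multiplies by a different value each time, so the trick does not carry over to it directly.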
Quote:Original post by Promit
Quote:Original post by Splinter of Chaos
I tried this on M$' compiler.
Stop doing that. This is a place for intelligent discussion, not bullshit trolling.


Sorry, I just got in the habit through interacting with other internet communities. Different communities have different conventions. Even if you find it annoying, some communities feel the opposite way. And, without having said anything bad about MS (I had to think about it), I don't see how trolling applies here.

Quote:Original post by Spoonbender
Have you ever considered perhaps enabling optimizations?


I generally work with all optimizations on.

Quote:
The "problem" is that you perform a bunch of computations which you don't actually use for anything. So any intelligent compiler will say "oh, that's a waste of time, let's skip those computations".


That was it. I changed my code to accumulate a variable based on the outcome. This also let me verify that each separate build computes the same result.

I found out that eight seconds is about how long you can expect this function to take when it's called 99999 times. It's just what it does. I wonder if I can find a better test to pit compilers against each other and C++ against assembly.
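A rough sanity check on that figure: each call fact(x) runs its loop x times, so the whole benchmark performs 1 + 2 + ... + 99998 multiplications. The helper below (`inner_iterations` is a made-up name, not part of the original program) computes that count with the usual n(n+1)/2 formula in 64-bit arithmetic.

```cpp
// Hypothetical helper: total multiplications performed when fact(x) is
// called for x = limit - 1 down to 1, i.e. 1 + 2 + ... + (limit - 1).
unsigned long long inner_iterations(unsigned long long limit)
{
    return limit == 0 ? 0 : (limit - 1) * limit / 2;
}
```

For MAX = 99999 this gives 4,999,850,001, roughly five billion multiplies where each one depends on the previous result, so a run time measured in seconds is plausible rather than a sign of bad code generation.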

Thanks for all the replies and helping me out with this!

EDIT:
Quote:Original post by Evil Steve
why don't you look at the disassembly of the C++ code and see what's being generated?


I did. It's not all that easy for me to read yet, as I've only just begun. What I saw seemed the same to me, but now I know about the code skipping, which I never spotted in the disassembly. This is why I didn't know why the functions took different amounts of time to execute.
Quote:Original post by Splinter of Chaos
Quote:Original post by Promit
Quote:Original post by Splinter of Chaos
I tried this on M$' compiler.
Stop doing that. This is a place for intelligent discussion, not bullshit trolling.


Sorry, I just got in the habit through interacting with other internet communities. Different communities have different conventions. Even if you find it annoying, some communities feel the opposite way. And, without having said anything bad about MS (I had to think about it), I don't see how trolling applies here.


"M$" is fairly synonymous, at least in terms of how it's read around these parts, with "lol I made a joke about an evil monopoly lol money lolololol". That's how low our tolerance for the term has gotten, thanks to the users of that term (as a generalization). If nothing remotely resembling that was running through your mind when you made your posting, then great!

But you'll have to forgive Promit for (probably) hating some of those other communities you came from and... ah... encouraging you to break less savory habits picked up from there. You can forgive me too if you want, for egging him on in IRC -- but I'm just an asshole, so that one's entirely up to you [lol].

This topic is closed to new replies.
