#include <iostream>
#include <ctime>
#include <cstdlib> // for system()
#include <string>
using namespace std;
#define USING_C
#if defined(USING_C)
string codeLang = "C++";
#else
string codeLang = "ASM";
#endif
#if defined(_MSC_VER)
string compiler = "M$";
#else
string compiler = "GCC";
#endif
unsigned int fact( unsigned int x )
{
#if defined( USING_C )
    unsigned int sum = 1;
    for( ; x; --x )
    {
        sum *= x;
    }
    return sum;
#else // if using ASM
#if defined(__GNUC__)
    asm
    (
        // For x <= 2, fact(x) == x, so skip the loop entirely.
        " cmp $2, %%ebx \n"
        " jle 2f \n"
        " movl $1, %%eax \n" // eax accumulates the product.
        "1: imul %%ebx, %%eax \n"
        " decl %%ebx \n"
        " jnz 1b \n"
        " movl %%eax, %%ebx \n"
        "2: \n"
        : "=b" ( x )
        : "b" ( x )
        : "%eax"
    );
#elif defined(_MSC_VER)
    __asm
    {
        mov EBX, x
        // Exit quickly if the input is zero.
        xor EAX, EAX
        cmp EAX, EBX
        je A_End
        // The factorial loop:
        mov EAX, 1 // EAX accumulates the product.
    A_Loop:
        imul EBX
        dec EBX
        jnz A_Loop
        mov x, EAX
    A_End:
    }
#endif // M$ compiler
#endif // ASM code
    return x;
}
int main()
{
#if defined(_MSC_VER)
    clock_t timer = clock();
#endif
    const int MAX = 99999;
    unsigned int accum = MAX;
    while( --accum )
    {
        fact( accum );
    }
    cout << "Time on " << compiler << " using " << codeLang << ": ";
#if defined(_MSC_VER)
    // My non-M$ compiler would normally do this for me.
    // M$VC won't, or at least doesn't.
    cout << double(clock() - timer) / CLOCKS_PER_SEC << " secs" << endl;
    system("pause");
#endif
    /* |_____|___M$___|__GCC__|
     * |C++__|_22.157_|_8.406_|
     * |ASM__|_13.900_|_XXXXX_| X = Too fast to time.
     */
}
GCC vs VC++ and C++ vs inline assembly.
I've been taking an assembly course for a couple of months now, but, worthless as it is, actually using assembly isn't a priority; I'm just learning how.
So, I finally got some code out and tested my assembly against the assembly GCC generates from my C++ code. No matter how many times I made the assembly version run, I could never time it because it was too fast. The GCC version, on the other hand, took ten seconds to run an insane number of iterations.
Even after looking at GCC's assembly, I couldn't figure out why it was at all slower.
I tried this on M$' compiler. Even though it had the home-field advantage, the C++ version took twenty-some seconds, and the inline assembly took thirteen-some.
Here's the source (shown at the top of this post):
Can anyone explain to me, or help me figure out, why my asm code is so fast under GCC, or why GCC can't produce similar code? And why is M$ so slow, even when I give it similar asm? (Or maybe I could optimize better?)
I think my example might be too simple, but that only alarms me more! Looking at M$'s disassembly, it wasn't smart enough to realize that neither x nor sum needs to be on the stack; both could live in registers. It seems GCC was smart enough to see this, but I don't see why it's any slower than my code. The MS prologue and epilogue seem quite large compared to GCC's: it pushes the source and destination index registers, which this function neither uses nor modifies, and it also pushes ebx. It then has to pop them all from the stack at the end. Is this the culprit?
Thanks in advance.
When profiling/timing these speeds, are you compiling/running in Release? If not, please do so; profiling in Debug is useless, since the compiler does not optimize and in some cases may add bloat to your executable. Interesting stuff though, dude. :)
What on earth is an "M$"? Monopoly money?
EDIT: As for the actual question, why don't you look at the disassembly of the C++ code and see what's being generated?
Also did you look at the generated code for the gcc assembly version? It's possible the optimizer optimized the whole shebang out altogether.
I've found in tests like these, it helps immensely to have something like
cout << "Final: " << accum;
or something, so that the compiler can't say "oh, well, we calculate it but never use it, so *poof*". Just a thought. Other thoughts include differing calling conventions, etc.: since you didn't inline fact, Microsoft is most likely calling it with stdcall and thus will always push the argument onto the stack. There's also what shwasasin said about making sure both are Release builds.

Cheers
-Scott
I just ran your code in VC++ 2008:
Time on M$ using C++: 0 secs
Time on M$ using ASM: 12.656 secs
And on GCC:
Time on GCC using C++: 0 secs
Time on GCC using ASM: 0 secs
Have you ever considered perhaps enabling optimizations?
The "problem" is that you perform a bunch of computations which you don't actually use for anything. So any intelligent compiler will say "oh, that's a waste of time, let's skip those computations".
That's what VC++ does with C/C++ code (which it understands well, and can analyze), but apparently isn't able to do with ASM (which it doesn't understand well, and which is much harder for the compiler to reason about).
Apart from that, we see that GCC is apparently also pretty decent at optimizing assembly, so it is able to perform the same optimization there.
If you want *real* results, you need to 1) use the result of the computation for something, so that the compiler doesn't skip it, and 2) enable optimizations in your compiler.
Benchmarking and interpretation of results is a difficult topic. That is why the internet is littered with misinformation.
The main reason for the difference is flawed methodology, as well as an almost certainly invalid approach to testing.
Using corrected benchmarking code to prevent NOP elimination:
const int MAX = 99999;
int foo = 0;                  // <--
unsigned int accum = MAX;
while( --accum )
{
    foo += fact( accum );     // <--
}
std::cout << foo;             // <--
cout << "Time on " << compiler << " using " << codeLang << ": ";
MSVC 2008, with these compiler options:
Quote:
/Ox /Ob2 /Oi /GL /I "XXXX\src" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_SECURE_SCL=0" /D "_CRT_SECURE_NO_WARNINGS" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /GS- /Gy /arch:SSE2 /Fx /Fo"XXXX\\" /Fd"XXXX\vc90.pdb" /W4 /nologo /c /Zi /TP /wd4355 /errorReport:prompt
#define USING_C
Quote:
-125961703Time on M$ using C++: 7.359 secs
Press any key to continue . . .
    int foo = 0;
00401014  xor         esi,esi
    unsigned int accum = MAX;
    while( --accum )
00401016  mov         edx,1869Eh
0040101B  jmp         main+20h (401020h)
0040101D  lea         ecx,[ecx]
    {
        foo += fact( accum );
00401020  mov         ecx,edx
00401022  mov         eax,1
00401027  test        edx,edx
00401029  je          main+38h (401038h)
0040102B  jmp         main+30h (401030h)
0040102D  lea         ecx,[ecx]
00401030  imul        eax,ecx
00401033  sub         ecx,1
00401036  jne         main+30h (401030h)
00401038  add         esi,eax
0040103A  sub         edx,1
0040103D  jne         main+20h (401020h)
    }
Without #define USING_C
Quote:
-125961703Time on M$ using ASM: 7.453 secs
Press any key to continue . . .
    int foo = 0;
00401014  xor         ecx,ecx
    unsigned int accum = MAX;
    while( --accum )
00401016  mov         esi,1869Eh
0040101B  jmp         main+20h (401020h)
0040101D  lea         ecx,[ecx]
    {
        foo += fact( accum );
00401020  mov         dword ptr [ebp-4],esi
00401023  mov         ebx,dword ptr [ebp-4]
00401026  xor         eax,eax
00401028  cmp         eax,ebx
0040102A  je          main+39h (401039h)
0040102C  mov         eax,1
00401031  imul        ebx
00401033  dec         ebx
00401034  jne         main+31h (401031h)
00401036  mov         dword ptr [ebp-4],eax
00401039  add         ecx,dword ptr [ebp-4]
0040103C  sub         esi,1
0040103F  jne         main+20h (401020h)
Using these results I can also conclude that your gcc benchmark is invalid, but I'm not doing the gcc benchmarks as well.
Quote: Original post by Splinter of Chaos
I tried this on M$' compiler.

Stop doing that. This is a place for intelligent discussion, not bullshit trolling.
(Just to clarify, it's awfully tempting to delete any further posts that do that.)
Quote: Original post by Antheus
Using these results I can also conclude that your gcc benchmark is invalid, but I'm not doing the gcc benchmarks as well.

I'm not too sure. In all likelihood that's the problem, but I wouldn't discount the possibility of GCC simply being clever.
There are more efficient algorithms for doing this for large numbers of iterations after all, e.g. like this:
unsigned exp(unsigned k, unsigned n)
{
    unsigned v;
    for(v = 1; n; n >>= 1)
    {
        if(n & 1) v *= k;
        k *= k;
    }
    return v;
}
I'm not suggesting that any compiler is anywhere near clever enough to do this (unless they've included a pattern precisely for this case in order to speed up benchmarks), but any half-decent compiler will split common factors out of an expression like "k * k * k * k" in its sleep (though, to be honest, a great many compilers are less than half-decent). Now add 4x loop unrolling to the mix, and that's basically what you've got in the inner loop.
Granted, I haven't actually been able to make GCC do this, but my version is getting a bit old. Or perhaps the whole thing is simply being evaluated at compile time, though I would've expected the compiler to hit some sort of limit long before reaching the end of this loop.
Quote: Original post by Promit
Quote: Original post by Splinter of Chaos
I tried this on M$' compiler.

Stop doing that. This is a place for intelligent discussion, not bullshit trolling.
Sorry, I just got in the habit through interacting with other internet communities. Different communities have different conventions. Even if you find it annoying, some communities feel the opposite way. And, without having said anything bad about MS (I had to think about it), I don't see how trolling applies here.
Quote: Original post by Spoonbender
Have you ever considered perhaps enabling optimizations?
I generally work with all optimizations on.
Quote:
The "problem" is that you perform a bunch of computations which you don't actually use for anything. So any intelligent compiler will say "oh, that's a waste of time, let's skip those computations".
That was it. I changed my code to accumulate a variable based on the outcome. This also let me verify that each separate compile produced the same result.
I found that eight seconds is about how long you can expect this function to take over 99999 calls. It's just what it does. I wonder if I can find a better test to pit the compilers against each other, and C++ against assembly.
Thanks for all the replies and helping me out with this!
EDIT:
Quote: Original post by Evil Steve
why don't you look at the disassembly of the C++ code and see what's being generated?
I did. It's not all that easy for me to read yet, as I've only just begun. What I saw seemed the same to me, but now I know about the code skipping, which I never spotted in the disassembly. That's why I couldn't tell why the functions took different amounts of time to execute.
Quote: Original post by Splinter of Chaos
Quote: Original post by Promit
Quote: Original post by Splinter of Chaos
I tried this on M$' compiler.

Stop doing that. This is a place for intelligent discussion, not bullshit trolling.

Sorry, I just got in the habit through interacting with other internet communities. Different communities have different conventions. Even if you find it annoying, some communities feel the opposite way. And, without having said anything bad about MS (I had to think about it), I don't see how trolling applies here.
"M$" is fairly synonymous -- at least in terms of how it's read around these parts - with "lol I made a joke about an evil monopoly lol money lolololol". That's how low our tolerance for the term has gotten, thanks to the users of that term (as a generalization). If nothing remotely resembling that was running through your mind when you made your posting, then great!
But you'll have to forgive Promit for (probably) hating some of those other communities you came from and... ah... encouraging you to break less savory habits picked up from there. You can forgive me too if you want, for egging him on in IRC -- but I'm just an asshole, so that one's entirely up to you [lol].
This topic is closed to new replies.