Visual C++ 2008 producing very strange code

I was looking at some assembler output and noticed the following. This line of code (a and b are floats)
return a < b ? a : b;
creates the following output

movss	xmm0, DWORD PTR _a$[esp+8]
movss	xmm1, DWORD PTR _b$[esp+8]
cvtps2pd xmm0, xmm0
cvtps2pd xmm1, xmm1
comisd	xmm1, xmm0
lea	eax, DWORD PTR _a$[esp+8]
ja	SHORT $LN6@main
lea	eax, DWORD PTR _b$[esp+8]
$LN6@main:

I find it very strange that two single-precision floats are first expanded into double precision floats before being compared. Why isn't comiss used instead of comisd? This would save two instructions and would generally seem to be far more efficient.
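For reference, the listing above comes from compiling something along these lines (the function name float_min and the switches /O2 /arch:SSE2 are my assumptions, not from the post; the body is exactly the line quoted above):

float float_min(float a, float b)
{
	return a < b ? a : b;
}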
Compilers cannot understand a programmer's context and will rarely produce the best code for every situation.
Obviously, but in this case the compiler knows a and b are floats. Why does it expand them to doubles??

If it understood intent it should produce something along the lines of:
movss xmm0, a
movss xmm1, b
minss xmm0, xmm1
movss retval, xmm0
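For what it's worth, the single-precision path can be forced by hand with the SSE intrinsics. A minimal sketch (my own addition, not from the thread; assumes <xmmintrin.h> and an SSE-capable target), which compilers generally turn into the minss shown above:

#include <xmmintrin.h>

// min via minss: like a < b ? a : b, the second operand is returned
// when the comparison is false.
inline float sse_min(float a, float b)
{
	float result;
	_mm_store_ss(&result, _mm_min_ss(_mm_set_ss(a), _mm_set_ss(b)));
	return result;
}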
Perhaps it is willing to make the tradeoff in favor of execution performance versus space. In other words, the conversion to double may yield better performance in hardware than using float.
Why would it? The instruction set has the exact same comparison in a single-precision variant, and using it would take two fewer instructions (the conversions to double precision).
Most likely the compiler recognizes the pattern and just provides the one solution it knows. The developers might have reasoned that the expansion is practically "free".

No no no no! :)
Well, it isn't:

	float a;
	float b;
	float res;
	std::cin >> a;
	std::cin >> b;
	sf::Clock clock;
	clock.Reset();
	for(unsigned int i = 0; i < 10000000; ++i)
	{
		__asm
		{
			movss xmm0, a;
			movss xmm1, b;
			movss xmm2, xmm0;
			movss xmm3, xmm1;
			cvtps2pd xmm0, xmm0;
			cvtps2pd xmm1, xmm1;
			comisd xmm1, xmm0;
			movss res, xmm2;
			ja min_is_a;
			movss res, xmm3;
min_is_a:
		}
	}
	float time1 = clock.GetElapsedTime();
	clock.Reset();
	for(unsigned int i = 0; i < 10000000; ++i)
	{
		__asm
		{
			movss xmm0, a;
			movss xmm1, b;
			movss xmm2, xmm0;
			movss xmm3, xmm1;
			comiss xmm1, xmm0;
			movss res, xmm2;
			ja min_is_a2;
			movss res, xmm3;
min_is_a2:
		}
	}
	float time2 = clock.GetElapsedTime();
	std::cout << res << "\n";
	std::cout << time1 << "\n";
	std::cout << time2 << "\n";


produces

0.0515853
0.0381895

on my PC, so the expansion is anything but free (0.0516 s vs. 0.0382 s, roughly 35% slower).
It probably doesn't matter because it's only a few cycles lost, and that code won't normally run a billion times per second, but have you tried std::min or the intrinsic min() function?
I'm not using Visual C++, so I have no idea about that one, but under gcc such functions usually get an optimal implementation, about as good as (and sometimes better than) hand-written assembly.

Having said that, I gave up going anywhere near assembler quite a while ago, because it just isn't worth it any more. For anything longer than 3-4 isolated instructions, compiler output with full optimization is rarely even a few cycles slower than what you could code in assembler, and is usually as fast or faster. Also, writing the C++ takes only about 5% of the time, and the code is a lot easier to manage and debug.
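To make the std::min suggestion concrete, here is the trivial wrapper one could drop into the benchmark above (my sketch, not from the thread; whether it ends up as comiss, minss, or the double-promoted sequence depends on the compiler and its floating-point switches, e.g. /arch:SSE2 and /fp:fast on MSVC):

#include <algorithm>

// std::min(a, b) returns the smaller of the two values
// (the first argument when they compare equal).
float std_min(float a, float b)
{
	return std::min(a, b);
}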
Run that benchmark 10k times and tell us the averages. I trust that you told your compiler to generate optimized code.
Quote:Original post by thedustbustr
Run that benchmark 10k times and tell us the averages. I trust that you told your compiler to generate optimized code.


Right... and yes.

I ran it 5 times and the differences were in the 5th decimal place. Good enough for me.
