l0calh05t

Visual C++ 2008 producing very strange code


I was looking at some assembler output and noticed something odd. This line of code (a and b are floats):
return a < b ? a : b;
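produces the following output:
movss	xmm0, DWORD PTR _a$[esp+8]	; load a
movss	xmm1, DWORD PTR _b$[esp+8]	; load b
cvtps2pd xmm0, xmm0			; widen a to double
cvtps2pd xmm1, xmm1			; widen b to double
comisd	xmm1, xmm0			; compare b with a as doubles
lea	eax, DWORD PTR _a$[esp+8]	; eax = address of a (lea leaves the flags alone)
ja	SHORT $LN6@main			; b > a: the result is a
lea	eax, DWORD PTR _b$[esp+8]	; otherwise eax = address of b
$LN6@main: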

I find it very strange that two single-precision floats are first widened to double precision before being compared. Why isn't comiss used instead of comisd? That would save two instructions and should generally be more efficient.

Compilers cannot understand a programmer's context and will rarely produce the best code for every situation.

Obviously, but in this case the compiler knows a and b are floats. Why does it widen them to doubles?

If it understood the intent, it would produce something along the lines of:

movss xmm0, a;
movss xmm1, b;
minss xmm0, xmm1;
movss retval, xmm0;

Perhaps it is willing to make the tradeoff in favor of execution performance versus space. In other words, the conversion to double may yield better performance in hardware than using float.

Why would it? The instruction set has a single-precision variant of the same instruction (comiss), and using it would take two fewer instructions (the conversions to double precision).

Most likely the compiler recognizes the pattern and just provides the one solution it knows. The developers might have reasoned that the expansion is practically "free".

Well, it isn't:


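#include <iostream>
#include <SFML/System.hpp> // sf::Clock (SFML 1.x API: Reset, GetElapsedTime)

// Built with Visual C++ for 32-bit x86: __asm blocks are not available on x64.
int main()
{
    float a;
    float b;
    float res;

    std::cin >> a;
    std::cin >> b;

    sf::Clock clock;

    // Variant 1: the compiler's approach (widen to double, compare with comisd).
    clock.Reset();
    for(unsigned int i = 0; i < 10000000; ++i)
    {
        __asm
        {
            movss    xmm0, a       ; load a
            movss    xmm1, b       ; load b
            movss    xmm2, xmm0    ; keep a float copy of a
            movss    xmm3, xmm1    ; keep a float copy of b
            cvtps2pd xmm0, xmm0    ; widen a to double
            cvtps2pd xmm1, xmm1    ; widen b to double
            comisd   xmm1, xmm0    ; compare b with a as doubles
            movss    res, xmm2     ; res = a (movss does not change the flags)
            ja       min_is_a      ; b > a: keep res = a
            movss    res, xmm3     ; otherwise res = b
        min_is_a:
        }
    }
    float time1 = clock.GetElapsedTime(); // seconds

    // Variant 2: compare the floats directly with comiss.
    clock.Reset();
    for(unsigned int i = 0; i < 10000000; ++i)
    {
        __asm
        {
            movss    xmm0, a       ; load a
            movss    xmm1, b       ; load b
            movss    xmm2, xmm0    ; keep a copy of a
            movss    xmm3, xmm1    ; keep a copy of b
            comiss   xmm1, xmm0    ; compare b with a as floats
            movss    res, xmm2     ; res = a
            ja       min_is_a2     ; b > a: keep res = a
            movss    res, xmm3     ; otherwise res = b
        min_is_a2:
        }
    }
    float time2 = clock.GetElapsedTime(); // seconds

    std::cout << res << "\n";
    std::cout << time1 << "\n";
    std::cout << time2 << "\n";
    return 0;
}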




produces

0.0515853
0.0381895

on my PC (elapsed seconds for 10,000,000 iterations, comisd version first). So the expansion is anything but free: it is about 35% slower.

It probably doesn't matter, since it's only a few cycles and that code won't normally run a billion times per second, but have you tried std::min or the intrinsic min() function?
I'm not using Visual C++, so I have no idea about that one, but under gcc such functions usually have an optimal implementation, about as good as (and sometimes better than) hand-written assembly.

Having said that, I gave up going anywhere near assembler quite a while ago, because it isn't really worth it any more. Beyond 3-4 isolated instructions, fully optimized compiler output is rarely more than a few cycles slower than what you could write in assembler, and it is usually as fast or even faster. Also, writing C++ takes only about 5% of the time, and the code is much easier to maintain and debug.

Quote:
Original post by thedustbustr
Run that benchmark 10k times and tell us the averages. I trust that you told your compiler to generate optimized code.


Right... and yes.

I ran it five times and the differences were in the fifth decimal place. Good enough for me.

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.
