Interpreting ASM of "a <cross> b" against "a.cross(b)" and "cross(a,b)"

Hello, I need some help interpreting the results of this test:

For two vectors, [tt]a[/tt] and [tt]b[/tt], there seems to be more than one way (in C++) to express a vector operation. Because the C++ language doesn't allow for custom infix operators, there have been many ways to write "a cross b": As a member function ([tt]a.cross(b)[/tt]), a free function ([tt]cross(a,b)[/tt]), or even using operator overloading in nonstandard ways ([tt]a ^ b[/tt]). I personally prefer the member function notation, but like the appeal and simplicity of the xor operator overload.
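For concreteness, here is a minimal sketch of the three conventional spellings side by side. The [tt]vec3[/tt] type and the helper names are hypothetical and exist only for illustration; they are not part of the test further down.

```cpp
#include <cassert>

// Hypothetical minimal 3D vector, for illustration only.
struct vec3 {
    float x, y, z;

    // 1. Member-function spelling: a.cross(b)
    vec3 cross(const vec3& r) const {
        return vec3{ y * r.z - z * r.y,
                     z * r.x - x * r.z,
                     x * r.y - y * r.x };
    }
};

// 2. Free-function spelling: cross(a, b)
inline vec3 cross(const vec3& a, const vec3& b) { return a.cross(b); }

// 3. Nonstandard operator overload: a ^ b
// Caveat: ^ binds more loosely than ==, + and *, so parenthesize it.
inline vec3 operator^(const vec3& a, const vec3& b) { return a.cross(b); }

// Component-wise equality helper used by the check below.
inline bool same(const vec3& a, const vec3& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Sanity check: all three spellings forward to the same member function.
inline void check_spellings_agree() {
    const vec3 ex{1.f, 0.f, 0.f};
    const vec3 ey{0.f, 1.f, 0.f};
    assert(same(ex.cross(ey), cross(ex, ey)));
    assert(same(ex.cross(ey), ex ^ ey));
}
```

Since spellings 2 and 3 simply forward to the member function, an optimizing compiler would be expected to inline them away, though that is an assumption worth checking in the generated assembly.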

"Wait," I say, "couldn't I define a macro -- say [tt]CROSS[/tt] -- to place some commas and construct a temporary helper object? Sure, I would sacrifice some performance in debug mode, but if it is optimized to the same code as [tt]a.cross(b)[/tt], then I'd much prefer the infix [tt]CROSS[/tt] notation over [tt]a.cross(b)[/tt] and get the best of both worlds."

After whipping up a test, I found [url=http://stackoverflow.com/questions/1515399/can-you-make-custom-operators-in-c]I wasn't the first[/url] to come to this realization. I quickly found [url=http://cogwheel.info/idop/]IdOp[/url] and started playing around.

What I need help with, however, is comparing the output assembly of the various methods.
[tt]vector2_t.h[/tt]
[code]
#pragma once

#include <idop.h>

template<typename T>
class vector2_t {
public:
    typedef T value_type;

public:
    vector2_t() : x(), y() { }
    vector2_t( T x, T y ) : x(x), y(y) { }

public:
    T dot( const vector2_t<T>& vec ) const {
        return x * vec.x + y * vec.y;
    }

public:
    T x, y;
};

namespace inplace_operators {
    template<typename T>
    struct dot_product {
        typename T::value_type operator() ( const T& left, const T& right ) const {
            return left.dot(right);
        }
    };
}

IDOP_CREATE_LEFT_HANDED_RET( <, dot, >, inplace_operators::dot_product, float );

// My first test:
namespace test {
    namespace inplace_operators {
        namespace vector2 {
            struct dot_product {
            public:
                const vector2_t<float>* left;

            public:
                // Note: DOT_2 passes a temporary dot_product() here; binding it to a
                // non-const reference relies on an MSVC language extension.
                inline friend dot_product& operator, (const vector2_t<float>& left, dot_product& mid) {
                    mid.left = &left;
                    return mid;
                }
                float operator, ( const vector2_t<float>& right ) {
                    return left->dot( right );
                }
            };
        }
    }
}

#define DOT_2 ,test::inplace_operators::vector2::dot_product(),
[/code]

[tt]main.cpp[/tt]
[code]
#include <iostream>
#include <string>
#include <sstream>
#include "vector2_t.h"


int main( int, char*[] ) {
    typedef vector2_t<float> vector2;

    int m;
    float a, b, c, d;
    std::string line;
    while ( std::cout << "\n> " && std::getline( std::cin, line ) ) {
        auto ss = std::istringstream(line);
        if ( (ss >> m >> a >> b >> c >> d) && (m >= 0 && m < 3) ) {
            float r = 0.f;
            auto p1 = vector2(a,b);
            auto p2 = vector2(c,d);
            switch ( m ) {
            case 0: r = (p1 <dot> p2); break;
            case 1: r = (p1 DOT_2 p2); break;
            case 2: r = p1.dot(p2); break;
            default:
                __assume(0);
            }
            std::cout << r << '\n';
        }
    }
}
[/code]

And the output assembly (Visual Studio 11 Beta, default Release configuration settings):
[code]
float r = 0.f;
auto p1 = vector2(a,b);
011C148F movss xmm0,dword ptr [esp+24h]
case 0: r = (p1 <dot> p2); break;
011C1495 movss xmm1,dword ptr [esp+14h]
float r = 0.f;
auto p1 = vector2(a,b);
011C149B movss xmm2,dword ptr [esp+1Ch]
case 0: r = (p1 <dot> p2); break;
011C14A1 mulss xmm1,xmm0
float r = 0.f;
auto p1 = vector2(a,b);
011C14A5 movss dword ptr [esp+30h],xmm0
case 0: r = (p1 <dot> p2); break;
011C14AB movss xmm0,dword ptr [esp+10h]
011C14B1 mulss xmm0,xmm2
case 1: r = (p1 DOT_2 p2); break;
case 2: r = p1.dot(p2); break;
default:
__assume(0);
}
std::cout << r << '\n';
011C14B5 push ecx
011C14B6 mov ecx,dword ptr [__imp_std::cout (11C40B4h)]
case 0: r = (p1 <dot> p2); break;
011C14BC addss xmm1,xmm0
float r = 0.f;
auto p1 = vector2(a,b);
011C14C0 movss dword ptr [esp+30h],xmm2
auto p2 = vector2(c,d);
switch ( m ) {
011C14C6 sub eax,0
case 1: r = (p1 DOT_2 p2); break;
case 2: r = p1.dot(p2); break;
default:
__assume(0);
}
std::cout << r << '\n';
011C14C9 movss dword ptr [esp],xmm1
011C14CE call dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (11C4064h)]
011C14D4 mov ecx,eax
011C14D6 call std::operator<<<std::char_traits<char> > (11C2570h)
011C14DB lea eax,[esp+94h]
011C14E2 mov dword ptr [esp+20h],eax
[/code]

Okay, so... Wait, what?
It doesn't look like there are any [tt]cmp[/tt] instructions in this section of the assembly. This seems to imply that all three approaches optimize to the same code. Is it possible I am misinterpreting the results? What other tests can I perform to probe the limits of compiler optimization?

Many thanks,
fastcall22

Why do you expect cmp instructions?

I think that what he means with "no cmp" is that his switch case seems nowhere to be found in the generated assembly.

fastcall, I think your interpretation is correct. The code in each branch of the switch was probably identical, so the branches were merged together. That left a switch whose cases all point to the same code and whose default is unreachable (as hinted by __assume(0)), so the switch was removed altogether.

And indeed, I would not expect the three approaches to define that cross operator to result in different code. In the three cases the compiler calls (and inlines) the same function, regardless of the specific syntax you use.

Another test you might want to do, if you want to double check, is to remove the switch altogether and make three versions of your source: one with r = (p1 <dot> p2);, one with r = (p1 DOT_2 p2);, and one with r = p1.dot(p2);. Compile all three and compare the generated assembly, which should be the same.
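A variation on the same idea, as a rough sketch: keep everything in one file but put each spelling in its own non-inlined function, so each one gets a separate body in the assembly listing (for example via /FA on MSVC or -S on GCC/Clang). The [tt]vec2[/tt] type, the [tt]dot_tag[/tt]/[tt]dot_lhs[/tt] infix helper, and the [tt]NOINLINE[/tt] macro are hypothetical stand-ins written for this example. In particular, the infix helper is not IdOp's actual implementation, and the comma helper takes its argument by value rather than by reference.

```cpp
#include <cassert>

struct vec2 {
    float x, y;
    float dot(const vec2& r) const { return x * r.x + y * r.y; }
};

// Hypothetical stand-in for an IdOp-style infix operator (not IdOp itself):
// "a <dot> b" parses as "(a < dot) > b".
struct dot_tag {};
struct dot_lhs { const vec2* left; };
const dot_tag dot = {};
inline dot_lhs operator<(const vec2& l, dot_tag) { return dot_lhs{ &l }; }
inline float operator>(dot_lhs l, const vec2& r) { return l.left->dot(r); }

// Comma-based helper in the spirit of DOT_2, taking the helper by value.
struct dot_comma { const vec2* left; };
inline dot_comma operator,(const vec2& l, dot_comma mid) { mid.left = &l; return mid; }
inline float operator,(dot_comma mid, const vec2& r) { return mid.left->dot(r); }
#define DOT_2 ,dot_comma(),

// Keep each spelling in its own function body so the listings can be compared.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

NOINLINE float dot_infix(const vec2& a, const vec2& b)       { return (a <dot> b); }
NOINLINE float dot_comma_macro(const vec2& a, const vec2& b) { return (a DOT_2 b); }
NOINLINE float dot_member(const vec2& a, const vec2& b)      { return a.dot(b); }

// Sanity check that the three spellings compute the same value.
inline void check_all_agree() {
    const vec2 a{1.f, 2.f};
    const vec2 b{3.f, 4.f};
    assert(dot_infix(a, b) == dot_member(a, b));
    assert(dot_comma_macro(a, b) == dot_member(a, b));
}
```

If the optimizer sees through the helper objects, the three function bodies should come out identical. A linker that folds identical functions may then merge them, so it is probably safer to compare the compiler's listing than the final binary.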

"sub eax, 0" ?? Nice job compiler, nice job. (That only makes sense if the instruction is needed for better instruction pairing, out-of-order execution, or something like that.)

Pretty impressive that they all compiled to the same code.

PS: For some reason, I find code using the new keyword 'auto' (outside a function made up of 90% template code), very hard to grok.

[quote name='Matias Goldberg' timestamp='1337614664' post='4941933']
PS: For some reason, I find code using the new keyword 'auto' (outside a function made up of 90% template code), very hard to grok.
[/quote]

I tend to agree. What is `auto' buying us when we write this
[code] auto ss = std::istringstream(line);[/code]
instead of this?
[code] std::istringstream ss(line);[/code]
