## Interpreting ASM of "a <cross> b" against "a.cross(b)" and "cross(a,b)"

### #1 fastcall22 (Moderators)

Posted 20 May 2012 - 07:12 PM

Hello, I need some help interpreting the results of this test:

For two vectors, a and b, there seems to be more than one way (in C++) to express a vector operation. Because the C++ language doesn't allow for custom infix operators, there have been many ways to write "a cross b": As a member function (a.cross(b)), a free function (cross(a,b)), or even using operator overloading in nonstandard ways (a ^ b). I personally prefer the member function notation, but like the appeal and simplicity of the xor operator overload.
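To make the three notations concrete, here is a minimal sketch for a 2D cross product. The type `vec2` and the function bodies are assumptions for illustration only; they are not taken from the test code in this post.

```cpp
// Hypothetical minimal 2D vector, used only for this illustration.
struct vec2 {
    float x, y;

    // Member-function spelling: a.cross(b)
    float cross(const vec2& rhs) const {
        return x * rhs.y - y * rhs.x;
    }
};

// Free-function spelling: cross(a, b)
inline float cross(const vec2& a, const vec2& b) {
    return a.cross(b);
}

// Operator-overload spelling: a ^ b
// The overload keeps the precedence of bitwise XOR, which is lower than
// that of ==, so mixing it with comparisons requires parentheses.
inline float operator^(const vec2& a, const vec2& b) {
    return a.cross(b);
}
```

With `vec2 a = {1.0f, 2.0f}` and `vec2 b = {3.0f, 4.0f}`, all of `a.cross(b)`, `cross(a, b)` and `(a ^ b)` forward to the same member function.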

"Wait," I say, "couldn't I define a macro -- say CROSS -- to place some commas and construct a temporary helper object? Sure, I would sacrifice some performance in debug mode, but if it is optimized to the same code as a.cross(b), then I'd much prefer the infix CROSS notation over the a.cross(b) and get the best of both worlds."

After whipping up a test, I found I wasn't the first to come to this realization. I quickly found IdOp and started playing around.

What I need help with, however, is comparing the output assembly of the various methods.
vector2_t.h:

    #pragma once

    #include <idop.h>

    template<typename T>
    class vector2_t {
    public:
        typedef T value_type;

    public:
        vector2_t() : x(), y() { }
        vector2_t( T x, T y ) : x(x), y(y) { }

    public:
        T dot( const vector2_t<T>& vec ) const {
            return x * vec.x + y * vec.y;
        }

    public:
        T x, y;
    };

    // Functor handed to IdOp; it simply forwards to the member function.
    namespace inplace_operators {
        template<typename T>
        struct dot_product {
            typename T::value_type operator() ( const T& left, const T& right ) const {
                return left.dot(right);
            }
        };
    }

    // Makes "p1 <dot> p2" available (see main.cpp).
    IDOP_CREATE_LEFT_HANDED_RET( <, dot, >, inplace_operators::dot_product, float );

    // My first test:
    namespace test {
        namespace inplace_operators {
            namespace vector2 {
                struct dot_product {
                public:
                    const vector2_t<float>* left;

                public:
                    // "left , helper": remember the left operand.
                    // Note: DOT_2 passes a temporary here; binding it to a
                    // non-const reference relies on an MSVC extension.
                    inline friend dot_product& operator, (const vector2_t<float>& left, dot_product& mid) {
                        mid.left = &left;
                        return mid;
                    }
                    // "helper , right": finish the computation.
                    float operator, ( const vector2_t<float>& right ) {
                        return left->dot( right );
                    }
                };
            }
        }
    }

    #define DOT_2 ,test::inplace_operators::vector2::dot_product(),

main.cpp:

    #include <iostream>
    #include <string>
    #include <sstream>
    #include "vector2_t.h"

    int main( int, char*[] ) {
        typedef vector2_t<float> vector2;

        int m;
        float a, b, c, d;
        std::string line;
        // Each input line: <method 0..2> <a> <b> <c> <d>
        while ( std::cout << "\n> " && std::getline( std::cin, line ) ) {
            auto ss = std::istringstream(line);
            if ( (ss >> m >> a >> b >> c >> d) && (m >= 0 && m < 3) ) {
                float r = 0.f;
                auto p1 = vector2(a,b);
                auto p2 = vector2(c,d);
                switch ( m ) {
                case 0: r = (p1 <dot> p2); break;
                case 1: r = (p1 DOT_2 p2); break;
                case 2: r = p1.dot(p2); break;
                default:
                    __assume(0);  // MSVC-specific hint: this branch is unreachable
                }
                std::cout << r << '\n';
            }
        }
    }


And the output assembly (Visual Studio 11 Beta, default Release configuration settings):

                float r = 0.f;
                auto p1 = vector2(a,b);
    011C148F  movss       xmm0,dword ptr [esp+24h]
                case 0: r = (p1 <dot> p2); break;
    011C1495  movss       xmm1,dword ptr [esp+14h]
                float r = 0.f;
                auto p1 = vector2(a,b);
    011C149B  movss       xmm2,dword ptr [esp+1Ch]
                case 0: r = (p1 <dot> p2); break;
    011C14A1  mulss       xmm1,xmm0
                float r = 0.f;
                auto p1 = vector2(a,b);
    011C14A5  movss       dword ptr [esp+30h],xmm0
                case 0: r = (p1 <dot> p2); break;
    011C14AB  movss       xmm0,dword ptr [esp+10h]
    011C14B1  mulss       xmm0,xmm2
                case 1: r = (p1 DOT_2 p2); break;
                case 2: r = p1.dot(p2); break;
                default:
                __assume(0);
                }
                std::cout << r << '\n';
    011C14B5  push        ecx
    011C14B6  mov         ecx,dword ptr [__imp_std::cout (11C40B4h)]
                case 0: r = (p1 <dot> p2); break;
                float r = 0.f;
                auto p1 = vector2(a,b);
    011C14C0  movss       dword ptr [esp+30h],xmm2
                auto p2 = vector2(c,d);
                switch ( m ) {
    011C14C6  sub         eax,0
                case 1: r = (p1 DOT_2 p2); break;
                case 2: r = p1.dot(p2); break;
                default:
                __assume(0);
                }
                std::cout << r << '\n';
    011C14C9  movss       dword ptr [esp],xmm1
    011C14CE  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (11C4064h)]
    011C14D4  mov         ecx,eax
    011C14D6  call        std::operator<<<std::char_traits<char> > (11C2570h)
    011C14DB  lea         eax,[esp+94h]
    011C14E2  mov         dword ptr [esp+20h],eax


Okay, so... Wait, what?
There don't appear to be any cmp instructions in this section of the assembly, which seems to imply that all three approaches optimize to the same code. Is it possible I am misinterpreting the results? What other tests can I perform to test the limits of compiler optimization?

Many thanks,
fastcall22

Edited by fastcall22, 20 May 2012 - 07:17 PM.


### #2 Antheus (Members)

Posted 21 May 2012 - 05:53 AM

Try:
int m = argc;

### #3 SiCrane (Moderators)

Posted 21 May 2012 - 05:54 AM

Why do you expect cmp instructions?

### #4 Zlodo (Members)

Posted 21 May 2012 - 06:54 AM

I think that what he means with "no cmp" is that his switch case seems nowhere to be found in the generated assembly.

fastcall, I think your interpretation is correct. The code in each branch of the switch was probably exactly the same, so the branches were merged together. That left a switch in which every possible case points to the same code, with no default (which has been marked as unreachable with __assume(0)), so the switch was removed altogether.
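A reduced sketch of that situation, for anyone who wants to experiment with it in isolation. The names `vec2`, `dot_free`, `select_dot` and the `UNREACHABLE` macro are assumptions made for this example. Whether a given compiler actually drops the switch depends on the compiler and its settings, so treat this as something to inspect rather than a guarantee.

```cpp
// Hypothetical portability wrapper around the compilers' unreachable hints.
#if defined(_MSC_VER)
#  define UNREACHABLE() __assume(0)
#elif defined(__GNUC__)
#  define UNREACHABLE() __builtin_unreachable()
#else
#  define UNREACHABLE() ((void)0)
#endif

struct vec2 {
    float x, y;
    float dot(const vec2& rhs) const { return x * rhs.x + y * rhs.y; }
};

// Two alternative spellings that forward to the same member function.
inline float dot_free(const vec2& a, const vec2& b) { return a.dot(b); }
inline float operator*(const vec2& a, const vec2& b) { return a.dot(b); }

// Precondition: m is 0, 1 or 2. After inlining, every case computes the
// same expression, so an optimizer is free to merge the branches.
float select_dot(int m, const vec2& a, const vec2& b) {
    float r = 0.f;
    switch (m) {
    case 0: r = a.dot(b);       break;
    case 1: r = dot_free(a, b); break;
    case 2: r = a * b;          break;
    default: UNREACHABLE();
    }
    return r;
}
```

Calling `select_dot` with any `m` outside 0..2 is undefined behaviour on compilers where the hint is active, which is exactly what allows the default branch to disappear.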

And indeed, I would not expect the three approaches to defining that cross operator to result in different code. In all three cases the compiler calls (and inlines) the same function, regardless of the specific syntax you use.
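To illustrate why the syntax should not matter, here is a minimal sketch of how an infix spelling such as `a <dot> b` can be built from ordinary overloads of `<` and `>`. This is not IdOp's actual implementation; `vec2`, `dot_tag` and `dot_proxy` are hypothetical names for this example. Both operators are small inline functions that end in the same call to `vec2::dot`.

```cpp
struct vec2 {
    float x, y;
    float dot(const vec2& rhs) const { return x * rhs.x + y * rhs.y; }
};

// Hypothetical tag type; the object `dot` is what sits between < and >.
struct dot_tag {};
const dot_tag dot = {};

// Proxy that remembers the left operand between the two operator calls.
struct dot_proxy {
    const vec2* left;
};

// `a < dot` captures the left operand...
inline dot_proxy operator<(const vec2& lhs, const dot_tag&) {
    dot_proxy p = { &lhs };
    return p;
}

// ...and `proxy > b` finishes the computation with the member function.
inline float operator>(const dot_proxy& lhs, const vec2& rhs) {
    return lhs.left->dot(rhs);
}
```

`a <dot> b` parses as `(a < dot) > b` because relational operators associate left to right. They also bind less tightly than arithmetic operators, so the expression is best wrapped in parentheses, as the original test does.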

Another test you might want to do, if you want to double-check, is to remove the switch altogether and make three versions of your source: one with r = (p1 <dot> p2);, one with r = (p1 DOT_2 p2); and one with r = p1.dot(p2);. Compile all three and compare the generated assembly, which should be the same.
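One way to set up such a comparison without maintaining three files is to select the spelling with a preprocessor symbol. The sketch below is an assumption-laden stand-in: `VARIANT`, `vec2`, `dot_helper`, `DOT_INFIX` and `compute` are hypothetical names, and the comma-operator helper takes its proxy by value so that it does not depend on compiler extensions.

```cpp
// Build once per VARIANT value and diff the assembly listings, e.g.
//   cl /O2 /FA /DVARIANT=1 /c variant.cpp      (MSVC, writes a .asm listing)
//   g++ -O2 -S -DVARIANT=1 variant.cpp          (GCC/Clang, writes a .s file)
#ifndef VARIANT
#  define VARIANT 0
#endif

struct vec2 {
    float x, y;
    float dot(const vec2& rhs) const { return x * rhs.x + y * rhs.y; }
};

// Free-function spelling.
inline float dot(const vec2& a, const vec2& b) { return a.dot(b); }

// Comma-operator helper in the spirit of DOT_2.
struct dot_helper { const vec2* left; };
inline dot_helper operator,(const vec2& lhs, dot_helper mid) {
    mid.left = &lhs;   // remember the left operand
    return mid;
}
inline float operator,(const dot_helper& lhs, const vec2& rhs) {
    return lhs.left->dot(rhs);
}
#define DOT_INFIX ,dot_helper(),

// The only function whose generated code needs to be compared.
float compute(const vec2& p1, const vec2& p2) {
#if VARIANT == 0
    return p1.dot(p2);
#elif VARIANT == 1
    return dot(p1, p2);
#else
    return (p1 DOT_INFIX p2);
#endif
}
```

The parentheses around `p1 DOT_INFIX p2` matter: inside a function-call argument list the commas would otherwise be read as argument separators.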

Edited by Zlodo, 21 May 2012 - 06:55 AM.

### #5 Matias Goldberg (Members)

Posted 21 May 2012 - 09:37 AM

"sub eax, 0"?? Nice job compiler, nice job. (It only makes sense if that instruction is needed for better instruction pairing, out-of-order execution, or something like that.)

Pretty impressive that they all compiled to the same code.

PS: For some reason, I find code using the new keyword 'auto' (outside a function made up of 90% template code), very hard to grok.

### #6 Álvaro (Members)

Posted 21 May 2012 - 10:06 AM

> PS: For some reason, I find code using the new keyword 'auto' (outside a function made up of 90% template code), very hard to grok.

I tend to agree. What is `auto` buying us when we write `auto ss = std::istringstream(line);` instead of `std::istringstream ss(line);`?