Interpreting ASM of "a <cross> b" against "a.cross(b)" and "cross(a,b)"


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

5 replies to this topic

#1 fastcall22   Crossbones+   -  Reputation: 4332


Posted 20 May 2012 - 07:12 PM

Hello, I need some help interpreting the results of this test:

For two vectors, a and b, there seems to be more than one way (in C++) to express a vector operation. Because the C++ language doesn't allow for custom infix operators, there have been many ways to write "a cross b": As a member function (a.cross(b)), a free function (cross(a,b)), or even using operator overloading in nonstandard ways (a ^ b). I personally prefer the member function notation, but like the appeal and simplicity of the xor operator overload.

"Wait," I say, "couldn't I define a macro -- say CROSS -- to place some commas and construct a temporary helper object? Sure, I would sacrifice some performance in debug mode, but if it is optimized to the same code as a.cross(b), then I'd much prefer the infix CROSS notation over the a.cross(b) and get the best of both worlds."

After whipping up a test, I found I wasn't the first to come to this realization. I quickly found IdOp and started playing around.

What I need help with, however, is comparing the output assembly of the various methods.
vector2_t.h
#pragma once

#include <idop.h>

template<typename T>
class vector2_t {
public:
	typedef T value_type;

public:
	vector2_t() : x(), y() { }
	vector2_t( T x, T y ) : x(x), y(y) { }

public:
	T dot( const vector2_t<T>& vec ) const {
		return x * vec.x + y * vec.y;
	}
	
public:
	T x, y;
};

namespace inplace_operators {
	template<typename T>
	struct dot_product {
		typename T::value_type operator() ( const T& left, const T& right ) const {
			return left.dot(right);
		}
	};
}

IDOP_CREATE_LEFT_HANDED_RET( <, dot, >, inplace_operators::dot_product, float );

// My first test:
namespace test {
	namespace inplace_operators {
		namespace vector2 {
			struct dot_product {
			public:
				const vector2_t<float>* left;

			public:
				inline friend dot_product& operator, (const vector2_t<float>& left, dot_product& mid) {
					mid.left = &left;
					return mid;
				}
				float operator, ( const vector2_t<float>& right ) {
					return left->dot( right );
				}
			};
		}
	}
}

#define DOT_2 ,test::inplace_operators::vector2::dot_product(),

main.cpp
#include <iostream>
#include <string>
#include <sstream>
#include "vector2_t.h"


int main( int, char*[] ) {
	typedef vector2_t<float> vector2;
	
	int m;
	float a, b, c, d;
	std::string line;
	while ( std::cout << "\n> " && std::getline( std::cin, line ) ) {
		auto ss = std::istringstream(line);
		if ( (ss >> m >> a >> b >> c >> d) && (m >= 0 && m < 3) ) {
			float r = 0.f;
			auto p1 = vector2(a,b);
			auto p2 = vector2(c,d);
			switch ( m ) {
				case 0: r = (p1 <dot> p2); break;
				case 1: r = (p1 DOT_2 p2); break;
				case 2: r = p1.dot(p2); break;
				default:
					__assume(0);
			}
			std::cout << r << '\n';
		}
	}
}

And the output assembly (Visual Studio 11 Beta, default Release configuration settings):
			float r = 0.f;
			auto p1 = vector2(a,b);
011C148F  movss       xmm0,dword ptr [esp+24h]  
				case 0: r = (p1 <dot> p2); break;
011C1495  movss       xmm1,dword ptr [esp+14h]  
			float r = 0.f;
			auto p1 = vector2(a,b);
011C149B  movss       xmm2,dword ptr [esp+1Ch]  
				case 0: r = (p1 <dot> p2); break;
011C14A1  mulss       xmm1,xmm0  
			float r = 0.f;
			auto p1 = vector2(a,b);
011C14A5  movss       dword ptr [esp+30h],xmm0  
				case 0: r = (p1 <dot> p2); break;
011C14AB  movss       xmm0,dword ptr [esp+10h]  
011C14B1  mulss       xmm0,xmm2  
				case 1: r = (p1 DOT_2 p2); break;
				case 2: r = p1.dot(p2); break;
				default:
					__assume(0);
			}
			std::cout << r << '\n';
011C14B5  push        ecx  
011C14B6  mov         ecx,dword ptr [__imp_std::cout (11C40B4h)]  
				case 0: r = (p1 <dot> p2); break;
011C14BC  addss       xmm1,xmm0  
			float r = 0.f;
			auto p1 = vector2(a,b);
011C14C0  movss       dword ptr [esp+30h],xmm2  
			auto p2 = vector2(c,d);
			switch ( m ) {
011C14C6  sub         eax,0  
				case 1: r = (p1 DOT_2 p2); break;
				case 2: r = p1.dot(p2); break;
				default:
					__assume(0);
			}
			std::cout << r << '\n';
011C14C9  movss       dword ptr [esp],xmm1  
011C14CE  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (11C4064h)]  
011C14D4  mov         ecx,eax  
011C14D6  call        std::operator<<<std::char_traits<char> > (11C2570h)  
011C14DB  lea         eax,[esp+94h]  
011C14E2  mov         dword ptr [esp+20h],eax

Okay, so... Wait, what?
It doesn't look like there are any cmp instructions in this section of the assembly, which seems to imply that all three approaches optimize to the same code. Is it possible I am misinterpreting the results? What other tests can I perform to probe the limits of compiler optimization?

Many thanks,
fastcall22

Edited by fastcall22, 20 May 2012 - 07:17 PM.

c3RhdGljIGNoYXIgeW91cl9tb21bMVVMTCA8PCA2NF07CnNwcmludGYoeW91cl9tb20sICJpcyBmYXQiKTs=


#2 Antheus   Members   -  Reputation: 2397


Posted 21 May 2012 - 05:53 AM

Try:
int m = argc;


#3 SiCrane   Moderators   -  Reputation: 9596


Posted 21 May 2012 - 05:54 AM

Why do you expect cmp instructions?

#4 Zlodo   Members   -  Reputation: 237


Posted 21 May 2012 - 06:54 AM

I think that what he means with "no cmp" is that his switch case seems nowhere to be found in the generated assembly.

fastcall, I think your interpretation is correct. The code in each branch of the switch was probably exactly the same, so the branches were merged together. That left a switch in which every possible case pointed to the same code and there was no default (which had been marked as unreachable with __assume(0)), so the switch was removed altogether.

And indeed, I would not expect the three approaches to defining that cross operator to result in different code. In all three cases the compiler calls (and inlines) the same function, regardless of the specific syntax you use.

Another test you might want to do if you want to double check is just to remove the switch case altogether, and make three versions of your source, one with r = (p1 <dot> p2);, one with r = (p1 DOT_2 p2); and one with r = p1.dot(p2);, compile all three, and compare the generated assembly, which should be the same.

Edited by Zlodo, 21 May 2012 - 06:55 AM.


#5 Matias Goldberg   Crossbones+   -  Reputation: 3393


Posted 21 May 2012 - 09:37 AM

"sub eax, 0" ?? Nice job compiler, nice job. (makes only sense if that instruction is needed for better instruction pairing, out of order execution, or something like that)

Pretty impressive that they all compiled to the same code.

PS: For some reason, I find code using the new keyword 'auto' (outside a function made up of 90% template code), very hard to grok.

#6 Álvaro   Crossbones+   -  Reputation: 13311


Posted 21 May 2012 - 10:06 AM

PS: For some reason, I find code using the new keyword 'auto' (outside a function made up of 90% template code), very hard to grok.


I tend to agree. What is `auto' buying us when we write this
auto ss = std::istringstream(line);
instead of this?
std::istringstream ss(line);




