Sign in to follow this  
Jesper T

Operator overloading efficiency (c++)

Recommended Posts

Operator overloading is a quite essential feature of c++. At least I use it exstensively, it really makes the code simple and easy to read. But I had not realised at how great a cost this came at before I ran some speed tests. Imagine you have a class with three members, and adding two objects of this class is defined as adding the corresponding members in quadrature. There are many ways to do this. The most simple way would be to overload the + operator, making the code simple to write and to read. So let us compare it to a couple of other ways which doesnt look so nice and are harder to read.
class Foo
{
public:
	Foo(const double a = 0.0, const double b = 0.0, const double t = 1.0)
	{
		alpha = a;
		beta = b;
		theta = t;
	}

	const Foo & operator = (const Foo &R)
	{
		alpha = R.alpha;
		beta = R.beta;
		theta = R.theta;

		return *this;
	}

	Foo operator + (const Foo &R) const
	{
		return Foo(sqrt(alpha*alpha + R.alpha*R.alpha),
			       sqrt(beta*beta + R.beta*R.beta),
				   sqrt(theta*theta + R.theta*R.theta));
	}

	const Foo & operator += (const Foo &R)
	{
		alpha = sqrt(alpha*alpha + R.alpha*R.alpha);
		beta = sqrt(beta*beta + R.beta*R.beta);
		theta = sqrt(theta*theta + R.theta*R.theta);

		return *this;
	}

	const Foo & sum(const Foo &R, Foo &result)
	{
		result.alpha = sqrt(alpha*alpha + R.alpha*R.alpha);
		result.beta = sqrt(beta*beta + R.beta*R.beta);
		result.theta = sqrt(theta*theta + R.theta*R.theta);

		return result;
	}

	~Foo(void) { }

	double alpha;
	double beta;
	double theta;
};
The weakness of the + operator overloading is obvious, first a local Foo object has to be created, then returned by value. The overloaded += operator avoids this but it is cumbersome to use, requiring two statements in the code. As a third option we dont use operator overloading at all. Testing the various methods times gave the following results
uba = foo + bar;         ->  35077ms, uba = 5.17344, 8.1232, 8.08741
uba = foo; uba += foo;   ->    938ms, uba = 5.17344, 8.1232, 8.08741
foo.sum(bar, uba);       ->    939ms, uba = 5.17344, 8.1232, 8.08741
Quite an amazing difference, isnt it? I would have thought that the compiler would be able to optimize the first one, but apparently it did not. Now, my question is this: is there a way of obtaining the simplicity of the statement "uba = foo + bar;" together with the speed of "foo.sum(bar, uba);"? Here is the code for testing by the way (compiled in release mode):
#include "stdafx.h"

class Foo
{
public:
	Foo(const double a = 0.0, const double b = 0.0, const double t = 1.0)
	{
		alpha = a;
		beta = b;
		theta = t;
	}

	const Foo & operator = (const Foo &R)
	{
		alpha = R.alpha;
		beta = R.beta;
		theta = R.theta;

		return *this;
	}

	const Foo & operator += (const Foo &R)
	{
		alpha = sqrt(alpha*alpha + R.alpha*R.alpha);
		beta = sqrt(beta*beta + R.beta*R.beta);
		theta = sqrt(theta*theta + R.theta*R.theta);

		return *this;
	}

	Foo operator + (const Foo &R) const
	{
		return Foo(sqrt(alpha*alpha + R.alpha*R.alpha),
			       sqrt(beta*beta + R.beta*R.beta),
				   sqrt(theta*theta + R.theta*R.theta));
	}

	const Foo & sum(const Foo &R, Foo &result)
	{
		result.alpha = sqrt(alpha*alpha + R.alpha*R.alpha);
		result.beta = sqrt(beta*beta + R.beta*R.beta);
		result.theta = sqrt(theta*theta + R.theta*R.theta);

		return result;
	}

	~Foo(void) { }

	double alpha;
	double beta;
	double theta;
};

int _tmain(int argc, _TCHAR* argv[])
{
	Foo foo(frand(0.0, 10.0), frand(0.0, 10.0), frand(0.0, 10.0));
	Foo bar(frand(0.0, 10.0), frand(0.0, 10.0), frand(0.0, 10.0));
	Foo uba = foo + bar;

	timeBeginPeriod(1);

	ofstream savf("result.txt", ios::out);

	int ta, tb;

	__int64 counts = 1000000000;


	ta = timeGetTime();

	for(__int64 i = 0; i < counts; i++)
		uba = foo + bar;

	tb = timeGetTime();
	savf << "uba = foo + bar;         ->  " << setw(5) << tb - ta << "ms, uba = " << uba.alpha << ", " << uba.beta << ", " << uba.theta<< "\n";
	cout << "uba = foo + bar;         ->  " << setw(5) << tb - ta << "ms, uba = " << uba.alpha << ", " << uba.beta << ", " << uba.theta<< "\n";


	ta = timeGetTime();

	for(__int64 i = 0; i < counts; i++)
	{
		uba = foo; 
		uba += bar;
	}

	tb = timeGetTime();
	savf << "uba = foo; uba += foo;   ->  " << setw(5) << tb - ta << "ms, uba = " << uba.alpha << ", " << uba.beta << ", " << uba.theta<< "\n";
	cout << "uba = foo; uba += foo;   ->  " << setw(5) << tb - ta << "ms, uba = " << uba.alpha << ", " << uba.beta << ", " << uba.theta<< "\n";

	ta = timeGetTime();

	for(__int64 i = 0; i < counts; i++)
		foo.sum(bar, uba);

	tb = timeGetTime();
	savf << "foo.sum(bar, uba);       ->  " << setw(5) << tb - ta << "ms, uba = " << uba.alpha << ", " << uba.beta << ", " << uba.theta<< "\n";
	cout << "foo.sum(bar, uba);       ->  " << setw(5) << tb - ta << "ms, uba = " << uba.alpha << ", " << uba.beta << ", " << uba.theta<< "\n";  

	cin.ignore(1);
}


Edit: Dont ask me why I used theta instead of gamma, I really have no idea.

Share this post


Link to post
Share on other sites
Quote:
Quite an amazing difference, isnt it? I would have thought that the compiler would be able to optimize the first one, but apparently it did not.


Such a difference means the test case is flawed. My guess would be that the first case is the only one that gets executed, other are probably evaluated during compile time, and the time shown is just the cost of test loop overhead.

In first loop, the entire expression gets evaluated on each pass.

In following two loops, expression gets evaluated exactly once, after that, only loop counter gets incremented.

The difference here is unrelated to performance of operators.

Share this post


Link to post
Share on other sites
They are inlined, at least the results were almost identical (within a few ms) after adding __forceinline before all the member functions.

Quote:
Original post by Antheus
Such a difference means the test case is flawed. My guess would be that the first case is the only one that gets executed, other are probably evaluated during compile time, and the time shown is just the cost of test loop overhead.


How would you suggest the test method be improved?

Share this post


Link to post
Share on other sites
Obviously the operator+ version does more work and has a different usage. (Using += instead doesn't help a lot if you do need the sum in a different object.)

However, the time difference looks suspiciously large, so there must be something wrong with the benchmark. (With GCC, the difference is about 15%, not orders of magnitude.)

Share this post


Link to post
Share on other sites
Quote:
Original post by Jesper T

How would you suggest the test method be improved?


#include <windows.h>
#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>
#include <cmath>

using namespace std;

class Foo
{
public:
Foo(const double a = 0.0, const double b = 0.0, const double t = 1.0)
{
alpha = a;
beta = b;
theta = t;
}

const Foo & operator = (const Foo &R)
{
alpha = R.alpha;
beta = R.beta;
theta = R.theta;

return *this;
}

const Foo & operator += (const Foo &R)
{
alpha = sqrt(alpha*alpha + R.alpha*R.alpha);
beta = sqrt(beta*beta + R.beta*R.beta);
theta = sqrt(theta*theta + R.theta*R.theta);

return *this;
}

Foo operator + (const Foo &R) const
{
return Foo(sqrt(alpha*alpha + R.alpha*R.alpha),
sqrt(beta*beta + R.beta*R.beta),
sqrt(theta*theta + R.theta*R.theta));
}

const Foo & sum(const Foo &R, Foo &result)
{
result.alpha = sqrt(alpha*alpha + R.alpha*R.alpha);
result.beta = sqrt(beta*beta + R.beta*R.beta);
result.theta = sqrt(theta*theta + R.theta*R.theta);

return result;
}

~Foo(void) { }

double alpha;
double beta;
double theta;
};

float frand(float min, float max) {
return min + ((max-min) * rand() / RAND_MAX);
}

struct Tester {
virtual void test() = 0;
virtual void print() = 0;
};

struct Baseline : Tester {
Baseline()
{
}
virtual void test() {
}
virtual void print() {
}
};


struct Test1 : Tester {
Test1()
: foo(frand(0.0, 10.0), frand(0.0, 10.0), frand(0.0, 10.0))
, bar(frand(0.0, 10.0), frand(0.0, 10.0), frand(0.0, 10.0))
{
}
virtual void test() {
uba = foo + bar;
}
virtual void print() {
std::cout << uba.alpha << ", " << uba.beta << ", " << uba.theta << std::endl;
}

Foo uba;
Foo foo;
Foo bar;
};

struct Test2 : Tester {
Test2()
: foo(frand(0.0, 10.0), frand(0.0, 10.0), frand(0.0, 10.0))
, bar(frand(0.0, 10.0), frand(0.0, 10.0), frand(0.0, 10.0))
{
}
virtual void test() {
uba = foo;
uba += bar;
}
virtual void print() {
std::cout << uba.alpha << ", " << uba.beta << ", " << uba.theta << std::endl;
}

Foo uba;
Foo foo;
Foo bar;
};

struct Test3 : Tester {
Test3()
: foo(frand(0.0, 10.0), frand(0.0, 10.0), frand(0.0, 10.0))
, bar(frand(0.0, 10.0), frand(0.0, 10.0), frand(0.0, 10.0))
{
}
virtual void test() {
foo.sum(bar, uba);

}
virtual void print() {
std::cout << uba.alpha << ", " << uba.beta << ", " << uba.theta << std::endl;
}

Foo uba;
Foo foo;
Foo bar;
};

int run(Tester * test, const char * name, long long counts, int baseline = 0) {
int ta = timeGetTime();
for(__int64 i = 0; i < counts; i++) test->test();
int tb = timeGetTime();

int delta = tb-ta-baseline;
cout << name << setw(5) << delta << "ms" << std::endl;
return delta;
};

int main(int argc, char* argv[])
{
srand(argc);

timeBeginPeriod(1);

__int64 counts = 10000000;

Baseline basetest;

// calibrate the cost of NOP virtual function call + loop counter
int baseline = run(&basetest, "NOP virtual call", counts);

Test1 test1;
run(&test1, "uba = foo + bar;", counts, baseline);

Test2 test2;
run(&test2, "uba = foo; uba += foo;", counts, baseline);

Test3 test3;
run(&test3, "foo.sum(bar, uba);", counts, baseline);
};



The above is a mess of copy-paste code. Feel free to improve the style as needed. The important points:
- confuse the compiler via virtual functions enough to take potential optimizations out of the equation
- calibrate the base cost and subtract it from measurements
- have each test case in separate class to prevent global optimizations

Quote:
/fp:precise
NOP virtual call 32ms
uba = foo + bar; 534ms
uba = foo; uba += foo; 490ms
foo.sum(bar, uba); 457ms


Quote:
/fp:fast
NOP virtual call 31ms
uba = foo + bar; 320ms
uba = foo; uba += foo; 321ms
foo.sum(bar, uba); 316ms


Quote:
/arch:SSE2
NOP virtual call 31ms
uba = foo + bar; 386ms
uba = foo; uba += foo; 349ms
foo.sum(bar, uba); 345ms


Now you can also begin to study isolated case-by-case generated assembly to see why things are different.

Note: seed isn't reset, but I don't believe it should matter (don't have time or interest to mess with it further right now)

Share this post


Link to post
Share on other sites
Changing the test to do a += equivalent operation changed things quite a bit:


uba = uba + bar; -> 34903ms, uba = 61128.2, 178221, 395.683
uba += bar; -> 32618ms, uba = 61128.2, 178221, 395.683
bar.sum(uba, uba); -> 33224ms, uba = 61128.2, 178221, 395.683


Now there is a recursive relation, making it more difficult to optimize. So it seems you were right Antheus. The compiler played a trick on me :s

Edit: Ah, I see you came to similar results. Thanks for the effort ;)

Share this post


Link to post
Share on other sites
The temporary object that gets created with binary operators like +, can in certain types of classes be eliminated through a technique called "Expression Templates". However it is pretty advanced stuff.

Have you considered implementing operator + in terms of operator += ? It might be worth checking how much it effects performance, but it shouldn't be by much:
	friend Foo operator + (Foo L, const Foo &R) const
{
return L += R;
}

Share this post


Link to post
Share on other sites
Quote:
Original post by iMalc
Have you considered implementing operator + in terms of operator += ?

That's the idiomatic way. Binary operators implemented that way do not have to be friends. They also can take advantage of NRVO to reduce copying. An optimizing compiler can do amazing stuff with inline functions.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this