Jump to content
  • Advertisement
Sign in to follow this  
Endar

Multiplication

This topic is 4831 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Now, multiplying integers with bit-shifting and addition, I heard that it was fast a while ago, and I thought "uh oh, that sounds hard!". Last night I thought I'd try it. I did some multiplication of binary numbers on paper, and the process became ludicrously obvious (if I'm wrong, I'll be feeling like an idiot). So, I decided to do some speed testing between my function and the multiplication operator, and, surprise, surprise, I lost. :D I just wanted to post my code and the info from my 10 speed runs of the program, and I wanted to know if there are any ideas about speeding it up (short of writing it in asm).

#include <stdlib.h>
#include <time.h>

#include <fstream>
#include <iostream>
using namespace std;

#include "..\everything\util\CHiResTimer.h"



/**
 * It always ends up that you are adding to a total, the first
 * number shifted by a specific number.
 * If you can't understand this, do a few multiplication sums
 * with binary.
 */
int multiply(int a, int b)
{
	int temp = 0;

	for(int i=0; i < 31; i++)
		if( (b >> i) & 0x1 )	// if the bit 'i' is a 1, 
			temp += (a << i);	// then add to the total 'a' << i

	return temp;
}

// average of an array of ints
float average(float* arr, int size)
{
	float temp = 0;
	for(int i=0; i < size; i++)
		temp += arr;

	return (temp / (float)size);
}


#define COUNT_MAX 100000000

int main()
{
	CHiResTimer timer;

	float* arr = new float[COUNT_MAX];	// time averages
	float myav, av;
	time_t total_start;

	//int a = 5;
	//int b = 56555;

	int a = 0x0fffffff;
	int b = 0x0fffffff;

	int count;

	// if system supports the timer
	if( !timer.init() ){
		cout << "System doens't support Hi-Res timer." << endl;
		exit(1);
	}


	ofstream s("multiply test.txt", ios::app );	// append to the file
	// get time and print it to file before message
	time_t t = time(NULL);
	struct tm* now = localtime( &t );
	char buf[200];
	strftime(buf, 200, "%d/%m/%y %I:%M:%S\t ", now);

	s << buf << endl << endl;

	total_start = time(NULL);	// get start time of whole prodecure	

	timer.getElapsedSeconds();
	/**
	 * Speed testing for the multiplication function, compared to the
	 * default multiplication operator for integers.
	 */
	for(count = 1000; count <= COUNT_MAX; count*= 10 ){

		// my multiply function
		for(int i=0; i < count; i++){
			multiply(a, b);		// multiply

			arr = timer.getElapsedSeconds();	// get time spent in multiply function
		}

		myav = average(arr, count);
		cout << "My function average time (seconds) over " << count << " trials: " << myav << endl;
		s << "My function average time (seconds) over " << count << " trials: " << myav << endl;


		timer.getElapsedSeconds();
		// default * operator
		for(i=0; i < count; i++){
			a * b;

			arr = timer.getElapsedSeconds();
		}

		av = average(arr, count);
		cout << "Default * operator average time (seconds) over " << count << " trials: " << av << endl;
		s << "Default * operator average time (seconds) over " << count << " trials: " << av << endl;


		if( myav < av ){
			cout << "My function was faster." << endl;
			s << "My function was faster." << endl;

		}
		else{
			cout << "Default was faster." << endl;
			s << "Default was faster." << endl;
		}

		cout << endl << endl;
		s << endl << endl;
		s.flush();
	}

	s << "Total time taken: " << (time(NULL) - total_start) << " seconds." << endl << endl;


	s.close();		// close the file
	delete[] arr;	// delete the array

return 0;
}



// CHiResTimer.h
#include <windows.h>


/**
 * A class that uses QueryPerformanceTimer
 */
class CHiResTimer
{
public:

	LARGE_INTEGER m_startTime;
	LARGE_INTEGER m_ticksPerSecond;

public:

	/**
	 * Default constructor
	 */
	CHiResTimer()
	{

	}

	/**
	 * Destructor
	 */
	~CHiResTimer()
	{

	}

	/**
	 * If the hi res timer is present, the tick rate is stored and the function
	 * returns true. Otherwise, the function returns false, and the timer should
	 * not be used.
	 * \return True if the system supports the hi-res timer, else false.
	 */
	bool init()
	{
		if( !QueryPerformanceFrequency(&m_ticksPerSecond) ){
			// return false because the system doens't support hi-res timer
			return false;
		}
	
		// system supports hi-res timer
		QueryPerformanceCounter(&m_startTime);
		return true;
	}

	/**
	 * Returns the total time in seconds (as a float) that has elapsed since the 
	 * function was last called.
	 * \return The number of elapsed seconds (as a float) that has elapsed since the function was last called.
	 */
	float getElapsedSeconds()
	{
		static LARGE_INTEGER s_lastTime = m_startTime;
		LARGE_INTEGER currentTime;

		QueryPerformanceCounter(¤tTime);

		float seconds = (	(float)currentTime.QuadPart - (float)s_lastTime.QuadPart) /
							(float)m_ticksPerSecond.QuadPart;

		// reset the timer
		s_lastTime = currentTime;
		
		return seconds;
	}

};



And, here's the info from the speed tests. I ran the program 10 times, and each of those times, first 1000 multiplications were done, then 10000, etc, all the way to 100,000,000 ( I wanted to go one higher, but with 1,000,000,000, I would have needed about 3 gig of RAM, and, well, I don't have that much, and I'm not going to bother about sitting here while the computer swaps 3 gig worth in and out of memory :D ).
22/08/05 09:21:52	 

My function average time (seconds) over 1000 trials: 1.02774e-006
Default * operator average time (seconds) over 1000 trials: 3.58368e-007
Default was faster.


My function average time (seconds) over 10000 trials: 6.4154e-007
Default * operator average time (seconds) over 10000 trials: 3.60087e-007
Default was faster.


My function average time (seconds) over 100000 trials: 4.84503e-007
Default * operator average time (seconds) over 100000 trials: 3.76393e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.9021e-007
Default * operator average time (seconds) over 1000000 trials: 3.59801e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.71545e-007
Default * operator average time (seconds) over 10000000 trials: 3.7323e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.72407e-007
Default * operator average time (seconds) over 100000000 trials: 3.70468e-007
Default was faster.


Total time taken: 94 seconds.

22/08/05 09:25:02	 

My function average time (seconds) over 1000 trials: 8.86019e-007
Default * operator average time (seconds) over 1000 trials: 4.52297e-007
Default was faster.


My function average time (seconds) over 10000 trials: 7.88578e-007
Default * operator average time (seconds) over 10000 trials: 4.6495e-007
Default was faster.


My function average time (seconds) over 100000 trials: 4.86409e-007
Default * operator average time (seconds) over 100000 trials: 3.64237e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.79855e-007
Default * operator average time (seconds) over 1000000 trials: 3.64086e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.74222e-007
Default * operator average time (seconds) over 10000000 trials: 3.68062e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.71255e-007
Default * operator average time (seconds) over 100000000 trials: 3.6586e-007
Default was faster.


Total time taken: 94 seconds.

22/08/05 09:26:56	 

My function average time (seconds) over 1000 trials: 9.53864e-007
Default * operator average time (seconds) over 1000 trials: 4.62415e-007
Default was faster.


My function average time (seconds) over 10000 trials: 8.99186e-007
Default * operator average time (seconds) over 10000 trials: 3.54564e-007
Default was faster.


My function average time (seconds) over 100000 trials: 5.26999e-007
Default * operator average time (seconds) over 100000 trials: 3.68218e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.80827e-007
Default * operator average time (seconds) over 1000000 trials: 3.62536e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.74124e-007
Default * operator average time (seconds) over 10000000 trials: 3.69278e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.74335e-007
Default * operator average time (seconds) over 100000000 trials: 3.84967e-007
Default was faster.


Total time taken: 96 seconds.

22/08/05 09:29:24	 

My function average time (seconds) over 1000 trials: 8.96095e-007
Default * operator average time (seconds) over 1000 trials: 4.53138e-007
Default was faster.


My function average time (seconds) over 10000 trials: 9.89692e-007
Default * operator average time (seconds) over 10000 trials: 3.94524e-007
Default was faster.


My function average time (seconds) over 100000 trials: 4.86586e-007
Default * operator average time (seconds) over 100000 trials: 3.65008e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.78785e-007
Default * operator average time (seconds) over 1000000 trials: 3.63629e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.71359e-007
Default * operator average time (seconds) over 10000000 trials: 3.64601e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.69104e-007
Default * operator average time (seconds) over 100000000 trials: 3.70442e-007
Default was faster.


Total time taken: 94 seconds.

22/08/05 09:32:25	 

My function average time (seconds) over 1000 trials: 9.22947e-007
Default * operator average time (seconds) over 1000 trials: 4.53717e-007
Default was faster.


My function average time (seconds) over 10000 trials: 8.17053e-007
Default * operator average time (seconds) over 10000 trials: 3.8023e-007
Default was faster.


My function average time (seconds) over 100000 trials: 4.89535e-007
Default * operator average time (seconds) over 100000 trials: 3.63216e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.84121e-007
Default * operator average time (seconds) over 1000000 trials: 3.67647e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.76502e-007
Default * operator average time (seconds) over 10000000 trials: 3.65797e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.69503e-007
Default * operator average time (seconds) over 100000000 trials: 3.64771e-007
Default was faster.


Total time taken: 93 seconds.

22/08/05 09:34:47	 

My function average time (seconds) over 1000 trials: 8.91015e-007
Default * operator average time (seconds) over 1000 trials: 4.54103e-007
Default was faster.


My function average time (seconds) over 10000 trials: 8.75183e-007
Default * operator average time (seconds) over 10000 trials: 3.67992e-007
Default was faster.


My function average time (seconds) over 100000 trials: 4.84544e-007
Default * operator average time (seconds) over 100000 trials: 3.60411e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.82569e-007
Default * operator average time (seconds) over 1000000 trials: 3.65065e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.67239e-007
Default * operator average time (seconds) over 10000000 trials: 3.69432e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.71798e-007
Default * operator average time (seconds) over 100000000 trials: 3.7032e-007
Default was faster.


Total time taken: 94 seconds.

22/08/05 09:37:44	 

My function average time (seconds) over 1000 trials: 7.34354e-007
Default * operator average time (seconds) over 1000 trials: 3.5702e-007
Default was faster.


My function average time (seconds) over 10000 trials: 6.29189e-007
Default * operator average time (seconds) over 10000 trials: 3.5867e-007
Default was faster.


My function average time (seconds) over 100000 trials: 4.90533e-007
Default * operator average time (seconds) over 100000 trials: 3.93052e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.77179e-007
Default * operator average time (seconds) over 1000000 trials: 3.68698e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.71229e-007
Default * operator average time (seconds) over 10000000 trials: 3.64018e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.73007e-007
Default * operator average time (seconds) over 100000000 trials: 3.73952e-007
Default was faster.


Total time taken: 94 seconds.

22/08/05 09:39:43	 

My function average time (seconds) over 1000 trials: 8.99486e-007
Default * operator average time (seconds) over 1000 trials: 4.47087e-007
Default was faster.


My function average time (seconds) over 10000 trials: 8.31571e-007
Default * operator average time (seconds) over 10000 trials: 4.40343e-007
Default was faster.


My function average time (seconds) over 100000 trials: 4.8538e-007
Default * operator average time (seconds) over 100000 trials: 3.62069e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.79304e-007
Default * operator average time (seconds) over 1000000 trials: 3.63649e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.78035e-007
Default * operator average time (seconds) over 10000000 trials: 3.69182e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.72039e-007
Default * operator average time (seconds) over 100000000 trials: 3.67567e-007
Default was faster.


Total time taken: 94 seconds.

22/08/05 09:41:51	 

My function average time (seconds) over 1000 trials: 8.95219e-007
Default * operator average time (seconds) over 1000 trials: 4.52262e-007
Default was faster.


My function average time (seconds) over 10000 trials: 8.28803e-007
Default * operator average time (seconds) over 10000 trials: 4.4802e-007
Default was faster.


My function average time (seconds) over 100000 trials: 6.29538e-007
Default * operator average time (seconds) over 100000 trials: 3.74929e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.87073e-007
Default * operator average time (seconds) over 1000000 trials: 3.61068e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.80001e-007
Default * operator average time (seconds) over 10000000 trials: 3.69535e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.70066e-007
Default * operator average time (seconds) over 100000000 trials: 3.68463e-007
Default was faster.


Total time taken: 94 seconds.

22/08/05 09:43:57	 

My function average time (seconds) over 1000 trials: 8.93779e-007
Default * operator average time (seconds) over 1000 trials: 4.50665e-007
Default was faster.


My function average time (seconds) over 10000 trials: 8.11227e-007
Default * operator average time (seconds) over 10000 trials: 4.29347e-007
Default was faster.


My function average time (seconds) over 100000 trials: 4.77434e-007
Default * operator average time (seconds) over 100000 trials: 3.63546e-007
Default was faster.


My function average time (seconds) over 1000000 trials: 4.77176e-007
Default * operator average time (seconds) over 1000000 trials: 3.68482e-007
Default was faster.


My function average time (seconds) over 10000000 trials: 4.72453e-007
Default * operator average time (seconds) over 10000000 trials: 3.65122e-007
Default was faster.


My function average time (seconds) over 100000000 trials: 4.78379e-007
Default * operator average time (seconds) over 100000000 trials: 3.67524e-007
Default was faster.


Total time taken: 95 seconds.



What I was wondering most was that with most of them, the first two with 1000, and 10,000 multiplications took so much longer than all the rest. EDIT:: I think I found why the first two took so much longer. I wasn't "resetting" the timer, instead, it was "restting" when it was created, since whenever 'getElapsedSeconds()' is called it returns the elapsed time since the last call to 'getElapsedSeconds()'. But, still, the first couple are always taking more time than the rest.

Share this post


Link to post
Share on other sites
Advertisement
You method is always going to be slow. First, your multiply function is pretty much wired into the CPU, thats HOW your CPU does multiplication as far as I know. Writing it out is slow. Also, the overhead of calling another function will slow it down, although the compiler would probably inline that function.

The only time bitshifts are really faster than multiplying is when multiplying by a constant. Bitshifting something to the left once is MUCH faster than loading the value 2 from memory and then multiplying.

Then again, I would expect most comilers to automatically use bitshifts when you say something like x*4. Anybody know if this is the case?

Share this post


Link to post
Share on other sites

int multiply(int a, int b)
{
int temp = 0;

for(int i=0; i < 31; i++)
if( (b >> i) & 0x1 )
temp += (a << i);
return temp;
}


Are you serious ? [smile]

Share this post


Link to post
Share on other sites
Quote:
Original post by blizzard999

int multiply(int a, int b)
{
int temp = 0;

for(int i=0; i < 31; i++)
if( (b >> i) & 0x1 )
temp += (a << i);
return temp;
}


Are you serious ? [smile]



No, I just enjoy posting large amounts of code [smile]

Serious about what? Does this code not work? I wasn't serious about getting it to run faster. I know that a really good day would be getting something to run almost as fast as anything the compiler spits out.

Share this post


Link to post
Share on other sites
Quote:

What I was wondering most was that with most of them, the first two with 1000, and 10,000 multiplications took so much longer than all the rest.


Most likely because of the cache. Your test is pretty memory intensive, so I imagine it gets a lot of cache misses when it first starts up. But once it gets going, the cache is able to keep up with you. The cache does a lot of things to try to predict what memory you'll need next, one of these things is that it figures if you need memory X, you'll probably need a bunch of memory that comes right after X. So you get much fewer cache misses when you are moving forward through the data, and the more time you spend doing this, the less of an impact the initial cache misses have on the final average.

Also, you don't need to store every single test. You could just go like this:


long start_time = timer.getElapsedSeconds();
for(int i=0; i < count; i++){
multiply(a, b); // multiply
}
long end_time = timer.getElapsedSeconds();
my_av = (float) (((double) end_time - start_time) / count);

Share this post


Link to post
Share on other sites
In neither case do you do anything with the resulting product value: there's a good chance that the compiler is able to optimize out the multiplication entirely (while it might not manage the same with the function call, because the analysis is a bit harder there).

Then again, your timing method is horrible: any given call takes negligible time even by the standards of the best timing method that's likely available to you, so the values in the array will be really imprecise. Worse, you mostly time the overhead of the timing calls. What you want to do is getElapsedSeconds() once after the end of the loop, and divide the result by the count. Much more straightforward too :)


// A more solid test
int accum = 0;
for(i=0; i < count; i++){
accum += (a * b);
}
float macTime = timer.getElapsedSeconds(); // time to Multiply and Accumulate
// Now find the time for just accumulating and subtract it out
for(i=0; i < count; i++) {
accum += a;
}
cout << "multiplication (after correction) takes " << (timer.getElapsedSeconds() - macTime)/ count << " seconds" << endl;
// make sure the compiler knows you have to do the multiplication
cout << "the result of all accumulations is " << accum << endl;


But even then, the result is likely to be totally bogus simply because a and b will be determined to be constant, and thus 'a * b' will be constant-folded by the compiler (a trivial optimization; determining the constant-ness is the hard part).

Share this post


Link to post
Share on other sites
Unfortunately, the method of performing multiplication by using other operations such as shifting will only ever be fast in the most basic of cases, such as multiplying by 3 (one shift, one add). Even then it's probably not worth it.

However, don't despair[smile]. You've done well to work that out on your own. A number of people come here wanting to work it out, but got stuck. The thing is, that is exactly what you need to write if you want to implement your own huge number class. e.g. a class that acts like a 512-bit integer or something.
Myself I have an n-bit number class, so I'll tell you that there is certainly more that you can optimise with it.

Now, if you want something just slightly harder, try doing the same for division. It's not that bad really.

Have fun![grin]

Share this post


Link to post
Share on other sites
How would I go about writing an n-bit number class?

I mean, how would I start? I've looked on google, but I haven't found a lot to help.

Share this post


Link to post
Share on other sites
Quote:
Original post by Endar
How would I go about writing an n-bit number class?

I mean, how would I start? I've looked on google, but I haven't found a lot to help.
Take a look at BIGINT by "William A. Rossi"[google]

I could show you mine if you really want. It's even better than Rossi's, and includes signed numbers as well.[smile]

Share this post


Link to post
Share on other sites
Now, what the others have said about bitshifting not being fast for dynamic multiplication is certainly true. But it seems to me that your actual experimental design is also a bit off. You are not comparing bitshifts to multiplys, exactly; you are comparing a function call plus a bitshift, to multiply. To factor out the function call, you should wrap the builtin multiply like so :


float multiply(float a, float b) {
return a*b;
}


And then you might want to add some padding instructions to make the functions the same size. And then you'll want to turn off compiler optimising so it doesn't inline one function and not the other. After all that, you'll be in a position to prove that your bitshifts are slower than multiplys. Not before.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!