Back to General and Gameplay Programming

Surely this optimisation can't be right...

General and Gameplay Programming Programming

Started by MENTAL August 11, 2002 06:45 AM

5 comments, last by MENTAL 21 years, 8 months ago

MENTAL

384

Author

August 11, 2002 06:45 AM

okay, according to my test application, using "Number |= (uint32) pow (2, Pos);" to turn on a bit is faster than using "Number |= 1 << Pos". The test does 1000000000 calculations, and every time i've run it the pow () method comes out tops. I was wondering if you could check out the source code for errors (knowing me there's something stupidly wrong with it).

    

// Bit Test.cpp : Defines the entry point for the console application.


#define WIN32_LEAN_AND_MEAN

#include <stdio.h>
#include <conio.h>
#include <windows.h>
#include <math.h>

typedef unsigned __int32 uint32;

const __int64 NUM_MAX = 1000000000;

int main(int argc, char* argv[])
{
	__int64 i;
	uint32 Pos;
	uint32 Number = 0;
	LARGE_INTEGER Time1, Time2, TimeT;

	Pos = 3;

	printf ("==================== Bit Test ====================\n\n");

	printf ("This is a test to compare two different ways of\n");
	printf ("turning on and off bits. %I64i operations will be\n", NUM_MAX);
	printf ("performed for each test.\n\n");

	printf ("Press any key to continue...");

	getch ();

	printf ("\n\nPerforming test for \"1 << BitPos\"... ");

	QueryPerformanceCounter (&TimeT);

	for (i = 0; i < NUM_MAX; i++)
	{
		Number |= 1 << Pos;
	}

	QueryPerformanceCounter (&Time1);

	Time1.QuadPart -= TimeT.QuadPart;

	printf ("completed.\n\n");

	printf ("Performing test for \"(uint32) pow (2, Pos)\"... ");

	Number = 0;

	QueryPerformanceCounter (&TimeT);

	for (i = 0; i < NUM_MAX; i++)
	{
		Number |= (uint32) pow (2, Pos);
	}

	QueryPerformanceCounter (&Time2);

	Time2.QuadPart -= TimeT.QuadPart;

	printf ("completed.\n\n");

	printf ("Time for method 1: %I64i\n", Time1.QuadPart);
	printf ("Time for method 2: %I64i\n", Time2.QuadPart);
	
	if (Time1.QuadPart < Time2.QuadPart)
		printf ("Method 1 is the quicker.\n\n");

	if (Time1.QuadPart > Time2.QuadPart)
		printf ("Method 2 is the quicker.\n\n");

	if (Time1.QuadPart == Time2.QuadPart)
		printf ("Both methods took exactly the same amount of time.\n\n");

	getch ();

	return 0;
}

Download the program here. the output is raw CPU cycles. ==================== Bit Test ==================== This is a test to compare two different ways of turning on and off bits. 1000000000 operations will be performed for each test. Press any key to continue... Performing test for "1 << BitPos"... completed. Performing test for "(uint32) pow (2, Pos)"... completed. Time for method 1: 20205700 Time for method 2: 19945321 Method 2 is the quicker. EDIT: corrected url [edited by - MENTAL on August 11, 2002 7:48:45 AM]

foofightr

130

August 11, 2002 07:22 AM

That''s a hell of a lot of loop overhead, so I''d think the majority of your time for both tests is spent updating the loop (which explains the similar times). Try to unroll it bigtime, say 32x. That will focus on the operation(s) inside the loop, not the loop update itself.

foofightr

130

August 11, 2002 07:24 AM

And for what it''s worth, these are the results I get:

Time for method 1: 4218462Time for method 2: 5367457Method 1 is the quicker.

I tested several times and method 1 always comes out on top by 10-15%

foofightr

130

August 11, 2002 07:34 AM

Another factor that bloats the loop overhead is that you use __int64 as the counter. Correct me if I''m wrong, but this is not a supported data type by the CPU, so it''s emulated with code (emulate = slow). Every time you perform an operation on it, VC probably calls a library function (major slowdown). Even if it inlines it, the slowdown is still probably substantial. The point is, your counter only goes up to a billion, and this fits inside a 32-bit counter, so use that instead.

Also be careful that the compiler is not doing any optimizations that mess up the results... one obvious one is "Pos" which is always 3, so it probably doesn''t bother checking from memory and uses immediate operands. I think the keyword volatile forces it to do the memory read every time.

To sum up, try using a 32-bit counter, unrolling the loop, and disabling optimizations.

foofightr

130

August 11, 2002 07:59 AM

I took the liberty of doing the changes myself

(and lowering NUM_MAX which was a ridiculous number. no way your test program actually used this btw, the test would take minutes/hours to run)

without unrolling:

method 1 --> ~0.00631 seconds
method 2 --> ~0.25993 seconds
ratio = 41.23

with unrolling:
method 1 --> ~0.0053344 seconds
method 2 --> ~0.2576543 seconds
ratio = 48.3

I think that's more like what you expected?

[edited by - foofightr on August 11, 2002 9:04:38 AM]

Burning_Ice

122

August 11, 2002 05:02 PM

Hi

hehe this is cool..i was astonished at first too that method 2 is faster...but then i noticed that VC++'s speed optimization kicks in there fully

if you let the compiler generate a assembler listing you will see that in method 1 it does a "or esi, 8"
in method 2, it just calls pow and ftol once before entering the loop and stores the result in eax, ( since (uint32)pow(2, pos) is constant since pos stays at 3), and then in the loops just does a "or esi, eax", which seems to be slightly faster then the "or esi, 8"

if i run the thing in debug mode withoud speed optimization method 2 takes a multiple of the time method 1 uses

edit: sorry foofightr, didn't see that you already mentioned that it's a compiler optimization

------------
Runicsoft

[edited by - Burning_Ice on August 11, 2002 6:05:57 PM]

My Programming Tips & Tricks Page

Beer Hunter

712

August 11, 2002 05:41 PM

Time for method 1: 6570184
Time for method 2: 8334828
Method 1 is the quicker.

Surely this optimisation can't be right...

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Surely this optimisation can't be right...

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines