Jump to content
  • Advertisement
Sign in to follow this  
Basiror

optimize smooth code

This topic is 4854 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

How would you optimize this code? 1. replacing the '%' operations with simple if else conditions? currently i use the modulo to smooth the transitions 2.maybe one could unroll the loop completely? do compilers unroll loops with 32*32 iterations? or would this blow the code up too much?
//template function for smoothing
	template<class T> void smoothmap(T &x)
	{
		uint32 i = 0;
		uint32 j = 0;

		for(i=1;i<=x.m_uiSize;i++)
		{
			for(j=1;j<=x.m_uiSize;j++)
			{
				x.m_Map[i%x.m_uiSize][j%x.m_uiSize] = 
					//center
					x.m_Map[j]*0.25f 
					//diagonal corners
					+ x.m_Map[(i-1)%x.m_uiSize][(j-1)%x.m_uiSize]*0.0625f
					+ x.m_Map[(i-1)%x.m_uiSize][(j+1)%x.m_uiSize]*0.0625f
					+ x.m_Map[(i+1)%x.m_uiSize][(j-1)%x.m_uiSize]*0.0625f
					+ x.m_Map[(i+1)%x.m_uiSize][(j+1)%x.m_uiSize]*0.0625f
					//horizontal and vertical corners
					+ x.m_Map[(i-1)%x.m_uiSize][(j)%x.m_uiSize]*0.125f
					+ x.m_Map[(i+1)%x.m_uiSize][(j)%x.m_uiSize]*0.125f
					+ x.m_Map[(i)%x.m_uiSize][(j-1)%x.m_uiSize]*0.125f
					+ x.m_Map[(i)%x.m_uiSize][(j+1)%x.m_uiSize]*0.125f
					//end of instruction
					;
			}
		}
	}
2. is there a way to define count as a class specific constant? so i don t have to store it in a variable? that way i could pass it to the templated functions and give the compiler the possibility to unroll everything
template<class T, int count>
class someclass
{
private:
T m_Map[count][count];
}


Share this post


Link to post
Share on other sites
Advertisement
ah I think I found the solution

i just add a enum to the class definition like this one


//template quadratic map class
template<class T, uint32 uiSize> class map
{
public:
//typedefs and enums
enum {MAP_SIZE = uiSize};
typedef T type_name;
public:
//member functions constructors & destructors

//fills the map with zero
map():m_uiSize(uiSize)
{
T tFill = 0;
std::fill<T*,T>(&m_Map[0][0],&m_Map[uiSize-1][uiSize-1],tFill);
}

public:
//friend function declarations to access private members
template<class U> friend void fillmap(U &x);
template<class U> friend void smoothmap(U &x);
private:
uint32 m_uiSize;
T m_Map[uiSize][uiSize];
};



that should do the job

Share this post


Link to post
Share on other sites
It would probably be faster to do something like this: (not sure if the compiler is this smart?)


template<class T> void smoothmap(T &x)
{
uint32 i;
uint32 j;
uint32 ti;
uint32 tj;
uint32 t1, t2;
uint32 im,ip,jm,jp;

for(i=1;i<=x.m_uiSize;i++)
{
ti = i%x.m_uiSize;
im = (i-1)%x.m_uiSize;
ip =(i+1)%x.m_uiSize;
for(j=1;j<=x.m_uiSize;j++)
{
tj = j%x.m_uiSize;
jm=(j-1)%x.m_uiSize;
jp=(j+1)%x.m_uiSize;

x.m_Map[ti][tj]>>=2;
//center point

//diagonal corners
t1 = (x.m_Map[im][jm]
+ x.m_Map[im][jp]
+ x.m_Map[ip][jm]
+ x.m_Map[ip][jp])>>4;

//horizontal and vertical corners
t2 = (x.m_Map[im][tj]
+ x.m_Map[ip][tj]
+ x.m_Map[ti][jm]
+ x.m_Map[ti][jp])>>3;

x.m_Map[ti][tj]+= t1+t2;
}
}
}



This removes a bunch of modules operations, and removes a bunch of multiplication operations as well. Of course there are more optimizations, but this should be a good start. By shifting by 4, it is the same as dividing by 16 (which is what you were trying to do with your mult), and 3 is equivalant to dividing by 8 (as far as integer math is concerned!). If you're using floats, you must replace >>4 with *0.0625f and >>3 with *0.125f, and >>=2 with *=0.25f but that still saves 6 mults per iteration ;). Why do you use [i%x.m_uiSize] and another spot use ? You realize that if you're going from 1 -> uiSize, it will be reading/writing an invalid memory location right? My solution fixes this problem by using modulus for both of them, without actually requiring any extra operations (just extra variables). Each of the modulus for the i loop only has to be executed uiSize times rather than uiSize*uiSize, and it removes the # of times that you repeat the exact same operation ;).

Share this post


Link to post
Share on other sites
Replacing division of powers of 2 by shifts just makes the code harder to read. The compiler is indeed smart enough to apply decades-old point optimizations - assuming optimizations are turning on at all of course.

You can always look at the compiler assembly output to verify this sort of thing.

Share this post


Link to post
Share on other sites
Quote:
Original post by Anon Mike
Replacing division of powers of 2 by shifts just makes the code harder to read. The compiler is indeed smart enough to apply decades-old point optimizations - assuming optimizations are turning on at all of course.


Besides, shifts may not actually be faster. The Pentium 4, for example, doesn't do the shift in hardware, but instead uses an integer divide, iirc.

[Edited by - Promit on August 1, 2005 7:27:06 PM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Promit
Quote:
Original post by Anon Mike
Replacing division of powers of 2 by shifts just makes the code harder to read. The compiler is indeed smart enough to apply decades-old point optimizations - assuming optimizations are turning on at all of course.


Besides, shifts may not actually be faster. The Pentium 4, for example, doesn't do the shift in hardware, but instead uses an integer divide, iirc.


erm, luckily I have only bought AMDs for arond 5 years now *G*

@ready4dis: i just missed the % in this case but I thnk I will modify it to what you suggested

Share this post


Link to post
Share on other sites
so i have applied the changes


//template function for smoothing
template<class T> void smoothmap(T &x)
{
uint32 i;
uint32 j;

uint32 ti1,ti2,ti3;
uint32 tj1,tj2,tj3;

for(i=1;i<=T::MAP_SIZE;i++)
{
//do modulos only once
ti1 = i%T::MAP_SIZE;
ti2 = (i+1)%T::MAP_SIZE;
ti3 = (i-1)%T::MAP_SIZE;

for(j=1;j<=T::MAP_SIZE;j++)
{
tj1 = j%T::MAP_SIZE;
tj2 = (j+1)%T::MAP_SIZE;
tj3 = (j-1)%T::MAP_SIZE;

x.m_Map[ti1][tj1] =
//center
x.m_Map[ti1][tj2]*0.25f
//diagonal corners
+ (x.m_Map[ti3][tj3]
+ x.m_Map[ti3][tj2]
+ x.m_Map[ti2][tj3]
+ x.m_Map[ti2][tj2])*0.0625f
//horizontal and vertical corners
+ (x.m_Map[ti3][tj1]
+ x.m_Map[ti2][tj1]
+ x.m_Map[ti1][tj3]
+ x.m_Map[ti1][tj2])*0.125f;
//end of instruction
;
}
}
}



as for the floating point multiplications, the m_Map values should be floats in general and i guess if i would use integers the compiler would optimize it to shifts on his own

Share this post


Link to post
Share on other sites
You don't want the compiler to unroll that, it will probably only worsen performance. There are more higher level optimisations left which would actually make a difference.

These moduluses are redundant, so you may as well remove them:
			ti3 = (i-1)%T::MAP_SIZE;
tj3 = (j-1)%T::MAP_SIZE;
Actually, given that it shouldn't matter in what order the smoothing function is applied to the grid elements, there's a cunning way of doing it without any modulus at all (or even an if!)
for (ti1=0, ti2=T::MAP_SIZE-1, ti3=1; ti2 >= 0; ti3=ti1, ti1=ti2, --ti2)
for (tj1=0, tj2=T::MAP_SIZE-1, tj3=1; tj2 >= 0; tj3=tj1, tj1=tj2, --tj2)
This even eliminates i and j too.[cool]

Anyway, by now I assume it's fast enough, if not, the memory access is probably the bottleneck at this stage.[grin]

Share this post


Link to post
Share on other sites
yeah what i could think of would be reorder the memory access to reduce memory jumping if that is still an issue at all with all those optimizations

i assume the compiler prefetches the array on its own so i don t have to add a prefetch instruction right?

Share this post


Link to post
Share on other sites
I run in a little problem now
at the top you see the typedef with 256 which created a cross:map with an array
T[256][256]

so far so good everything works ok as long as i don t insert a nummer > 256
if i insert 512 the application instantly crashes without printing the std:cerr at all

so something must be screwed i use the VC++ toolkit 2003
any idea on what s going wrong?

this piece of code instanciates the map

typedef cross::map<float,256> map32;
int main(int argc,char *argv[])
{
std::cerr<<"short: "<<map32::MAP_SIZE<<std::endl;

map32 maps;

cross::fillmap<map32>(maps);
cross::smoothmap<map32>(maps);
cross::smoothmap<map32>(maps);
cross::smoothmap<map32>(maps);

maps.dump8(std::string("test.raw"));


return 0;
}





2. class declaration and implementation of fillmap +dump8

//template quadratic map class
template<class T, uint32 uiSize> class map
{
public:
//typedefs and enums

//type specific definition of size to allow unrolling of loops
enum {MAP_SIZE = uiSize};
typedef T type_name;
public:
//member functions constructors & destructors

//fills the map with zero
map():m_uiSize(uiSize)
{
T tFill = 0;
std::fill<T*,T>(&m_Map[0][0],&m_Map[uiSize-1][uiSize-1],tFill);
};

//dumps a 8bit heightmap
void dump8(std::string strFilename)
{
uint32 i;
uint32 j;

std::ofstream file;
file.open(strFilename.c_str());

for(i=0;i<uiSize;i++)
{
for(j=0;j<uiSize;j++)
{
file << static_cast<uchar8>((m_Map[j] + 1.0f)*127);
}
}

file.close();
};

public:
//friend function declarations to access private members
template<class U> friend void fillmap(U &x);
template<class U> friend void smoothmap(U &x);
private:
uint32 m_uiSize;
T m_Map[MAP_SIZE][uiSize];
};

//template function for noise generation
template<class T> void fillmap(T &x)
{
uint32 i;
uint32 j;

//loop through the rows
for(i=0;i<T::MAP_SIZE;i++)
{
//loop through the colums
for(j=0;j<T::MAP_SIZE;j++)
{
//set a new random seed everytime
srand(static_cast<int>(time(NULL)));

//retrieve a noise value with a new random pattern
x.m_Map[j] = noise(i,j,rand());
}
}
};






Just found the problem it caused a stack overflow i moved the instance of map32 maps out of the main function and it worked fine

i didn t know that the stack is so tiny
the map32 class is only 1 mb large

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!