need help optimising...

Started by
12 comments, last by Zahlman 18 years, 9 months ago
What up. I have just written some code to rotate a sprite using SDL and it works quite well...but unfortunately its a little slow. I've done my best with optimisation (with what i know so far...heh), I was just wondering if anyone can offer some tips on how to get it a bit faster. Heres the prototype SDL_Surface* rotate(int degrees, int x0=33, int y0=33) degrees- -degrees to rotate sprite x0,y0 - coordinates of axis of rotation Returns a pointer to a surface containing the rotated sprite. and the code.... ---------------------------------------------------------------------

SDL_Surface* rotate(int degrees, int x0=33,  int y0=33)
{
	SDL_Surface *temp_src,*temp_dst;
	//The variables used to store coord data
	double x2,y2,fx,fy,w1,w2,w3,w4;
	int x1,y1;
	int x2a,x2b,y2a,y2b;
	double cosined,sined;

	//Need four variables for surrounding pixels (bilinear filtering)
	Uint32	Ca=0,	Cb=0,
			Cc=0,	Cd=0;

	//Our resultant pixel
	Uint32 Pixel=0;

	//Sprite data (height and width)
	int h=65;
	int w=65;

	//Absolute axis coords
	unsigned int absol_coord_dst;

	//assign some variables for masking
	Uint32 rmask,gmask,bmask,amask;
	#if SDL_BYTEORDER == SDL_BIG_ENDIAN
	rmask = 0xff000000;
	gmask = 0x00ff0000;
	bmask = 0x0000ff00;
	amask = 0x000000ff;
	#else
	rmask = 0x000000ff;
	gmask = 0x0000ff00;
	bmask = 0x00ff0000;
	amask = 0xff000000;
	#endif

	//If there is no rotation,return original sprite
	if (degrees==0) return(basesprite);
	
	//copy the original sprite to temp
	temp_src = SDL_DisplayFormat(basesprite);
	temp_dst = SDL_DisplayFormat(basesprite);
	
	//Lock both surfaces so we can read/write individual pixels
	SDL_LockSurface(temp_src);
	SDL_LockSurface(temp_dst);
	
	
	//Precalc sin/cos values
	cosined=cos((360-degrees)*PI/180);
	sined=sin((360-degrees)*PI/180);

	//For each pixel in our rotated sprite
	for(y1=0;y1<h;y1++)
	{
		for(x1=0;x1<w;x1++)
		{

			//Find its equiv x,y location before the rotation
			x2=(cosined*(x1-x0)-sined*(y1-y0)+x0);
			y2=(sined*(x1-x0)+cosined*(y1-y0)+y0);
					
			//If it was out of bounds, set pixel to transparent
			if ((x2>64)|(x2<0)|(y2>64)|(y2<0)) 
			{
				Pixel=gmask; //should be ColourKey! Set to green for now
			}
			else
			{
				//BILINEAR FILTERING
				//-------------------
				//Now find the four pixels around x2,y2 imaginary location (ie its still a float)
				x2a=(int)x2;x2b=x2a+1;
				y2a=(int)y2;y2b=y2a+1;

				//Used for weightings
				fx=x2-x2a;
				fy=y2-y2a;
				w1=(1-fx)*(1-fy);
				w2=(fx)*(1-fy);
				w3=(1-fx)*(fy);
				w4=(fx)*(fy);

				//Assign the four surrounding pixels
				Ca=((unsigned int *)temp_src->pixels)[x2a + y2a * w];
				Cb=((unsigned int *)temp_src->pixels)[x2b + y2a * w];
				Cc=((unsigned int *)temp_src->pixels)[x2a + y2b * w];
				Cd=((unsigned int *)temp_src->pixels)[x2b + y2b * w];


				//Calculate the weighted average of each channel
				Pixel=	  (Uint32)(  ( w1*(rmask&Ca) ) + ( w2*(rmask&Cb) ) + ( w3*(rmask&Cc) ) + ( w4*(rmask&Cd) ) )
						| ((Uint32)(  ( w1*( (gmask&Ca) >>8 ) ) + ( w2*( (gmask&Cb) >>8 ) ) + ( w3*( (gmask&Cc)>>8 ) ) + ( w4*( (gmask&Cd) >>8 ) ))<<8)
						| ((Uint32)(  ( w1*((bmask&Ca)>>16) ) + ( w2*((bmask&Cb)>>16) ) + ( w3*((bmask&Cc)>>16) ) + ( w4*( (bmask&Cd)>>16 ) )) <<16);
				
				
			}

			//and store the pixel in the sprite			
			((unsigned int *)temp_dst->pixels)[x1 + y1 * w]=Pixel;


		}
	}
	
	
	//We are finished with the pixels
	SDL_UnlockSurface(temp_src);
	SDL_UnlockSurface(temp_dst);
	
	//copy the result to "sprite" surface
	sprite=SDL_DisplayFormat(temp_dst);
	
	//Dont need temps anymore
	SDL_FreeSurface(temp_src);
	SDL_FreeSurface(temp_dst);

	//Return a pointer to the surface...
	return(sprite);


}
Advertisement
don't use sdl :P
sry i couldn't resist :) anyways, i heard some bad stuff with sdl thats why i said that, but it seems in the lines were you have
cosined=cos((360-degrees)*PI/180);
sined=sin((360-degrees)*PI/180);
you should already have PI/180 figured out, sry i dont know about sdl as much as i should to help you.
Before optimizing, you need to know what parts of the code are slow. I would suggest running it through a profiler (or sprinkling some timing information through the function). With that information, you'll be able to tell whether the problem is in your calculations, the SDL surface functions, or whatever.
Quote:Original post by Anonymous Poster
you should already have PI/180 figured out, sry i dont know about sdl as much as i should to help you.

The compiler will already do that for you.
Software rotation is of course always slow... can't you use OpenGL for this?
Just one little thing that jumped out at me:

if ((x2>64)|(x2<0)|(y2>64)|(y2<0)) 


This is a bitwise Or operation. It forces the compiler to evaluate every single part of the conditional before deciding which branch to take. If you use a logical Or (the || operator) the compiler will short-circuit the conditional and abort as soon as it figures out what to do; this will minimize the number of checks you have to make. It won't make a massive difference, but it's important to understand the difference when writing performance-critical code.

For best results, change it to something like:

if((x2 < 0) || (y2 < 0) || (x2 > 64) || (y2 > 64))   Pixel = gmask;else{   // ... Regular code here}



Chances are your slowness is coming from either repeatedly locking/unlocking the surfaces, allocating and freeing temp surfaces, or a mixture of both. If you can find ways to reduce or eliminate those operations as much as possible you'll get a noticeable speed boost. Especially if you are wanting to rotate a sprite per-frame (or many sprites per-frame) this will incur a major penalty. (And, obviously, as Konfusius said, software rotation is slow. Always optimize your algorithms and design before optimizing individual lines of code.)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Thanks for the help guys.

I'll look into learning how to get timing information fom VC++.

One more quick question. I put the following code in to speed up the
bilinear filtering (Instead of interpolating for every pixel), and it seems to have speeded up the code slightly.

For some reason i cant figure out, If i replace Pixel=Ca; with Pixel=rmask;, the affected pixels go blue....and vice versa (bmask makes them go red). I have already copied some code to change their definitions depending on if its little endian or not (see my first post)...any ideas why this is happening?

//If its surrounded by identical pixels...dont interpolate				if((Ca==Cb)&(Cb==Cc)&(Cc==Cd))				{					Pixel=Ca;				}				else				{					//Calculate the weighted average of each channel					Pixel=	  (Uint32)(  ( w1*(rmask&Ca) ) + ( w2*(rmask&Cb) ) + ( w3*(rmask&Cc) ) + ( w4*(rmask&Cd) ) )							| ((Uint32)(  ( w1*( (gmask&Ca) >>8 ) ) + ( w2*( (gmask&Cb) >>8 ) ) + ( w3*( (gmask&Cc)>>8 ) ) + ( w4*( (gmask&Cd) >>8 ) ))<<8)							| ((Uint32)(  ( w1*((bmask&Ca)>>16) ) + ( w2*((bmask&Cb)>>16) ) + ( w3*((bmask&Cc)>>16) ) + ( w4*( (bmask&Cd)>>16 ) )) <<16);				}
Quote:
if ((x2>64)|(x2<0)|(y2>64)|(y2<0)) 	{	Pixel=gmask; //should be ColourKey! Set to green for now	}



if ((x1 | x2) & ((~127)  {pixed = gmask}


Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
Weighted average of 4 pixels?

Hmmm.
I don't know many optimisations for that.

Are you sure you can't get a simple average. (i can do normal averages really fast).

Normal average
pixel = ((((w + x) + (y + z))) & mask) >> 2); //Mask gets rid of pixels that move from red to green, and from green to blue during the shift.

Note:
If you make w and x, into one 16 bit int, you can get it to work faster.
There will be leak through. (for eg. if you have a lot of red, it may leak through to the green if theres heaps of it. But you won't see it much. At most it will change the lsb. That might cause a flow through effect if you set all channels to max tho. Be sure to hold it in a big enough variable, just in case.)

Also note, you do this one the ENTIRE PIXEL. (not per channel. This is what makes it fast).

Note, the >> and the mask do three divides. (for each channel).

All over, compare 3 adds, a bitshift and an and, with having to do (per channel,
4 adds and a divide).

Mask, just has a set of zeros where the bits would move into the next channell.

Now, weighted average, i'd have to think some....

From,
Nice coder
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.

This topic is closed to new replies.

Advertisement