# need help optimising...

## Recommended Posts

What up. I have just written some code to rotate a sprite using SDL and it works quite well...but unfortunately its a little slow. I've done my best with optimisation (with what i know so far...heh), I was just wondering if anyone can offer some tips on how to get it a bit faster. Heres the prototype SDL_Surface* rotate(int degrees, int x0=33, int y0=33) degrees- -degrees to rotate sprite x0,y0 - coordinates of axis of rotation Returns a pointer to a surface containing the rotated sprite. and the code.... ---------------------------------------------------------------------
SDL_Surface* rotate(int degrees, int x0=33,  int y0=33)
{
SDL_Surface *temp_src,*temp_dst;
//The variables used to store coord data
double x2,y2,fx,fy,w1,w2,w3,w4;
int x1,y1;
int x2a,x2b,y2a,y2b;
double cosined,sined;

//Need four variables for surrounding pixels (bilinear filtering)
Uint32	Ca=0,	Cb=0,
Cc=0,	Cd=0;

//Our resultant pixel
Uint32 Pixel=0;

//Sprite data (height and width)
int h=65;
int w=65;

//Absolute axis coords
unsigned int absol_coord_dst;

#if SDL_BYTEORDER == SDL_BIG_ENDIAN
#else
#endif

//If there is no rotation,return original sprite
if (degrees==0) return(basesprite);

//copy the original sprite to temp
temp_src = SDL_DisplayFormat(basesprite);
temp_dst = SDL_DisplayFormat(basesprite);

//Lock both surfaces so we can read/write individual pixels
SDL_LockSurface(temp_src);
SDL_LockSurface(temp_dst);

//Precalc sin/cos values
cosined=cos((360-degrees)*PI/180);
sined=sin((360-degrees)*PI/180);

//For each pixel in our rotated sprite
for(y1=0;y1<h;y1++)
{
for(x1=0;x1<w;x1++)
{

//Find its equiv x,y location before the rotation
x2=(cosined*(x1-x0)-sined*(y1-y0)+x0);
y2=(sined*(x1-x0)+cosined*(y1-y0)+y0);

//If it was out of bounds, set pixel to transparent
if ((x2>64)|(x2<0)|(y2>64)|(y2<0))
{
Pixel=gmask; //should be ColourKey! Set to green for now
}
else
{
//BILINEAR FILTERING
//-------------------
//Now find the four pixels around x2,y2 imaginary location (ie its still a float)
x2a=(int)x2;x2b=x2a+1;
y2a=(int)y2;y2b=y2a+1;

//Used for weightings
fx=x2-x2a;
fy=y2-y2a;
w1=(1-fx)*(1-fy);
w2=(fx)*(1-fy);
w3=(1-fx)*(fy);
w4=(fx)*(fy);

//Assign the four surrounding pixels
Ca=((unsigned int *)temp_src->pixels)[x2a + y2a * w];
Cb=((unsigned int *)temp_src->pixels)[x2b + y2a * w];
Cc=((unsigned int *)temp_src->pixels)[x2a + y2b * w];
Cd=((unsigned int *)temp_src->pixels)[x2b + y2b * w];

//Calculate the weighted average of each channel
| ((Uint32)(  ( w1*( (gmask&Ca) >>8 ) ) + ( w2*( (gmask&Cb) >>8 ) ) + ( w3*( (gmask&Cc)>>8 ) ) + ( w4*( (gmask&Cd) >>8 ) ))<<8)

}

//and store the pixel in the sprite
((unsigned int *)temp_dst->pixels)[x1 + y1 * w]=Pixel;

}
}

//We are finished with the pixels
SDL_UnlockSurface(temp_src);
SDL_UnlockSurface(temp_dst);

//copy the result to "sprite" surface
sprite=SDL_DisplayFormat(temp_dst);

//Dont need temps anymore
SDL_FreeSurface(temp_src);
SDL_FreeSurface(temp_dst);

//Return a pointer to the surface...
return(sprite);

}


don't use sdl :P

##### Share on other sites
sry i couldn't resist :) anyways, i heard some bad stuff with sdl thats why i said that, but it seems in the lines were you have
cosined=cos((360-degrees)*PI/180);
sined=sin((360-degrees)*PI/180);

##### Share on other sites
Before optimizing, you need to know what parts of the code are slow. I would suggest running it through a profiler (or sprinkling some timing information through the function). With that information, you'll be able to tell whether the problem is in your calculations, the SDL surface functions, or whatever.

##### Share on other sites
Quote:
 Original post by Anonymous Posteryou should already have PI/180 figured out, sry i dont know about sdl as much as i should to help you.

The compiler will already do that for you.

##### Share on other sites
Software rotation is of course always slow... can't you use OpenGL for this?

##### Share on other sites
Just one little thing that jumped out at me:

if ((x2>64)|(x2<0)|(y2>64)|(y2<0))

This is a bitwise Or operation. It forces the compiler to evaluate every single part of the conditional before deciding which branch to take. If you use a logical Or (the || operator) the compiler will short-circuit the conditional and abort as soon as it figures out what to do; this will minimize the number of checks you have to make. It won't make a massive difference, but it's important to understand the difference when writing performance-critical code.

For best results, change it to something like:

if((x2 < 0) || (y2 < 0) || (x2 > 64) || (y2 > 64))   Pixel = gmask;else{   // ... Regular code here}

Chances are your slowness is coming from either repeatedly locking/unlocking the surfaces, allocating and freeing temp surfaces, or a mixture of both. If you can find ways to reduce or eliminate those operations as much as possible you'll get a noticeable speed boost. Especially if you are wanting to rotate a sprite per-frame (or many sprites per-frame) this will incur a major penalty. (And, obviously, as Konfusius said, software rotation is slow. Always optimize your algorithms and design before optimizing individual lines of code.)

##### Share on other sites
Thanks for the help guys.

I'll look into learning how to get timing information fom VC++.

One more quick question. I put the following code in to speed up the
bilinear filtering (Instead of interpolating for every pixel), and it seems to have speeded up the code slightly.

For some reason i cant figure out, If i replace Pixel=Ca; with Pixel=rmask;, the affected pixels go blue....and vice versa (bmask makes them go red). I have already copied some code to change their definitions depending on if its little endian or not (see my first post)...any ideas why this is happening?

//If its surrounded by identical pixels...dont interpolate				if((Ca==Cb)&(Cb==Cc)&(Cc==Cd))				{					Pixel=Ca;				}				else				{					//Calculate the weighted average of each channel					Pixel=	  (Uint32)(  ( w1*(rmask&Ca) ) + ( w2*(rmask&Cb) ) + ( w3*(rmask&Cc) ) + ( w4*(rmask&Cd) ) )							| ((Uint32)(  ( w1*( (gmask&Ca) >>8 ) ) + ( w2*( (gmask&Cb) >>8 ) ) + ( w3*( (gmask&Cc)>>8 ) ) + ( w4*( (gmask&Cd) >>8 ) ))<<8)							| ((Uint32)(  ( w1*((bmask&Ca)>>16) ) + ( w2*((bmask&Cb)>>16) ) + ( w3*((bmask&Cc)>>16) ) + ( w4*( (bmask&Cd)>>16 ) )) <<16);				}

##### Share on other sites
Quote:
 if ((x2>64)|(x2<0)|(y2>64)|(y2<0)) { Pixel=gmask; //should be ColourKey! Set to green for now }

if ((x1 | x2) & ((~127)  {pixed = gmask}

##### Share on other sites
Weighted average of 4 pixels?

Hmmm.
I don't know many optimisations for that.

Are you sure you can't get a simple average. (i can do normal averages really fast).

Normal average
pixel = ((((w + x) + (y + z))) & mask) >> 2); //Mask gets rid of pixels that move from red to green, and from green to blue during the shift.

Note:
If you make w and x, into one 16 bit int, you can get it to work faster.
There will be leak through. (for eg. if you have a lot of red, it may leak through to the green if theres heaps of it. But you won't see it much. At most it will change the lsb. That might cause a flow through effect if you set all channels to max tho. Be sure to hold it in a big enough variable, just in case.)

Also note, you do this one the ENTIRE PIXEL. (not per channel. This is what makes it fast).

Note, the >> and the mask do three divides. (for each channel).

All over, compare 3 adds, a bitshift and an and, with having to do (per channel,

Mask, just has a set of zeros where the bits would move into the next channell.

Now, weighted average, i'd have to think some....

From,
Nice coder

##### Share on other sites
x2a=(int)x2;x2b=x2a+1;
y2a=(int)y2;y2b=y2a+1;

//Used for weightings
fx=x2-x2a; //So this is how far it is from an int.

fy=y2-y2a; //Same

w1=(1-fx)*(1-fy);

w2=(fx)*(1-fy);

w3=(1-fx)*(fy);

w4=(fx)*(fy);

I don't know why your doing this. Mind enlightening me? (hey, i'm optimising your code!)

As for neibouring pixels
pixel[x + 1]
pixel[x - 1]
pixel[x + row]
pixel[x - row]

Where row is the length of one row.

Care has to be taken for the edjes of the image. (shove a border around it, and start, using the border as the neibour for the first iteration).

From,
Nice coder

##### Share on other sites
Hey nice coder,

The weighted average is a part of a process called "bilinear filtering", basically instead of taking each pixel in the original and rotating it, you take each pixel in the final image and reverse the rotation to find its colour. Most of the time the rotation doesn't give integer coordinates, so to find the equivalent colour at that (non-int) position, you take a weighted average of the pixels around it.
see http://www.flipcode.com/articles/article_bilinearfiltering.shtml

You're right in that its not the fastest way of performing a rotation, but it does give pretty nice results.

The reason i'm trying to get it faster is because i wanted to be able to rotate a little spaceship in any direction realtime. At the moment it could probably handle that, but not if i need to rotate several frames of an animated sprite.

##### Share on other sites
Are you using fast sin/cos functions?

** edit **
If you are using non-optimized sin/cos routines, what I would recommend is that you create yourself some lookup tables:
float sin_lookup[361]; // 0-360 inclusive
float cos_lookup[361]; // 0-360 inclusive

fill them in at runtime via a loop, and then you may instantly gain the sin/cos of any whole angle:
sin_lookup[angle%360];

If you need more accuracy one option is to linearly interpolate between two values in the lookup.
int i_angle = (int)angle;
float dec = angle – _floor(angle);

float sin = sin_lookup[i_angle]*(1-dec) + sin_lookup[i_angle+1]*dec;

HTH

[Edited by - CyberFox on June 24, 2005 1:00:37 PM]

##### Share on other sites
Optimization questions really shouldn't go in For Beginners. :(

Quote:
 x2=(cosined*(x1-x0)-sined*(y1-y0)+x0);y2=(sined*(x1-x0)+cosined*(y1-y0)+y0);

Since x0/y0 are constant and x1/y1 vary linearly, it should be possible to avoid the multiplications here by doing accumulation instead. Outside the loop, initialize:

x2 = sined*y0 - cosined*x0 + x0;y2 = y0 - sined*x0 - cosined*y0;

Then adjust each by the appropriate sined/cosined on each iteration (remembering to reset at the end of each "row").

OTOH, your compiler might be good enough to do that for you.

Other than that, working indirectly through surfaces is quite possibly the biggest speed hit here. Try caching basesprite->pixels (or whatever), and working with that directly - rotate "into" a local Uint32[] of the same size, and copy the data back to the sprite's pixel data.

## Create an account

Register a new account

• ## Partner Spotlight

• ### Forum Statistics

• Total Topics
627668
• Total Posts
2978540

• 10
• 10
• 10
• 12
• 22