Sign in to follow this  
h3ro

Fastest possible way to rotate an image?

Recommended Posts

Hallo, What is the fastest possible way to rotate an image n degrees? Now I am doing it pixel by pixel, and checking them against the bounding box. It is a bit to slow and would like to speed it up. If it has anything to say, the image is stored as a list of colors in a one dimensional array: eg. imageArray[0] // pixel_1 blue imageArray[1] // pixel_1 green imageArray[2] // pixel_1 red imageArray[3] // pixel_2 blue ... thanks:)

Share this post


Link to post
Share on other sites
You have to do it pixel by pixel unless you call an opengl or directx function to do it that may offload it to the gpu (hence faster). As far as the math goes as long as you're doing it in matrix form there's not really any way to speed it up. I do recommend, if you're not already doing it, for quality purposes that you invert your rotation and go through each pixel in the result image and map it back to the original.

Share this post


Link to post
Share on other sites
You have to do it pixel by pixel, but work backwards, as the above poster noted, start with the rotated destination image and for each pixel, work backwards to figure what source pixel it corresponds most closely to. (there are probably lots of sophisticated dithering ideas I'm ignoring right there.)

If the image data is static, you can rotate it once through 360 degrees at whatever intervals of the circle you need, and cache the pre-rotated images.

Then when you need the image rotated to n degrees, you'll already have it lying around, it's just image[n], for instance. You might also consider using something other than degrees as your angular unit, like 256ths of a circle, so the angle can fit into a byte, or 32nds of a circle, if 256 increments is overkill. You might store more precision, but render less precision, for instance, if storing 256 slightly rotated versions of each image is too memory intensive. How much precision you need there depends on what you're trying to do.

Share this post


Link to post
Share on other sites
You may not want the "fastest" way (in software) as it doesn't look particularly good. I imagine you just want something much faster than you have so far. In order to get any kind of decent result you need to perform bilinear filtering.

Here is an excellent article about making a 2D renderer that makes heavy use of rotation.

Share this post


Link to post
Share on other sites
Hey,

Thanks for the replies.

I am doing the rotation backwards, just forgot to mention it.

smcameron:
Ill try to pre-rotate it to 256 different degrees. Thanks for the tips :)
As of now memory is not really a problem (but cache misses are), as its the cpu that is using most of the time.

iMalc:
Thanks a lot for the article. Look like you have a lot of experience with software rendering? If I remember correct, you made a 3d software render as well? Im trying to add a bit of 3d to my engine, but Im having trouble getting the textures to work.

Thanks :)

Share this post


Link to post
Share on other sites
What does your rotation code look like? The inside of your loops should have no trig functions and no multiplies; when you have the source coordinates for one pixel, the source coordinates for an adjacent pixel can be found by simply adding precomputed values.

Share this post


Link to post
Share on other sites
I'm a bit in a hurry so short answer. If your code suffers from cache misses try tiling your images. At an angle above 90 degress you get essentially a cache miss for every pixel. 8*8 or 16*16 tiles should do better. At least this was the solution when I was doing this stuff on 486 in assembler.

Share this post


Link to post
Share on other sites
Quote:
Original post by Vorpy
What does your rotation code look like? The inside of your loops should have no trig functions and no multiplies; when you have the source coordinates for one pixel, the source coordinates for an adjacent pixel can be found by simply adding precomputed values.


From reading your responds, I think the code can be optimized a bit more, but in not 100% sure on how. Any tips ?



// Convert from radians to degrees. This is the valuse we use to rotate the point back
// to find the right sprite pixel
float cosVal = float(cos(-(rotation * PI / 180.0)));
float sinVal = float(sin(-(rotation * PI / 180.0)));

// For each pixel in the aliignedBox go back to the texture and see if there is a pixel there.
// if it is, copy the color of it
for(int y = yPos; y < (yPos + newTextureRec.getHeight()) ; y++)
{
for(int x = xPos; x < (xPos + newTextureRec.getWidth()) ; x++)
{
vertex point(x,y);

if(boundingBox.pointIntersection(point))
{
// Rotate the point back to find the right pixel color
int newX = (int)(point.x * cosVal - point.y * sinVal);
int newY = (int)(point.y * cosVal + point.x * sinVal);

// Find the correct sprite pixel
BYTE *textureDataPnt = spriteData + (newY * getWidth() + newX) * 4;

// Do the Drawing
// Draw the point
int offset = 4 * ((y * screenRec.getWidth()) + x);

// So alpha checking code

screenData[offset+0] = screenData[offset+0] + (( textureDataPnt[3] * (textureDataPnt[0] - screenData[offset+0] )) >> 8);
// ... More drawing code
}
}
}


// See if a point is inside a rectangle or not
bool BoundingBox::pointIntersection(vertex &point)
{
point.x -= vertexList[1].x;
point.y -= vertexList[1].y;

Vector2D v0( point.x, point.y );
Vector2D v1( (vertexList[0].x - vertexList[1].x), (vertexList[0].y - vertexList[1].y));

float val_1 = Dot(v0,v1);

if(0 <= val_1 && val_1 <= Dot(v1,v1))
{
Vector2D v2( (vertexList[3].x - vertexList[1].x), (vertexList[3].y - vertexList[1].y));
float val_2 = Dot(v0,v2);

if( 0 <= val_2 && val_2 <= Dot(v2,v2) )
return true;
}

return false;
}


Share this post


Link to post
Share on other sites
There's a number of problems with that piece of code.
The first realization is that the texture coordinates only change by a constant value each horizontal and vertical pixel. That is you can keep running counters and add cosVal to the x component and sinVal to the y component for each pixel. Typically you'd also avoid the texture width multiplication by restricting the textures to powers of two and shifting/masking instead.
Similarly there's no need to recalculate the destination pointer for each pixel. It'll simply be increasing by one for each horizontal step, and a getWidth() units for each vertical step, so calculate the base address once at the beginning of the loop and add the appropriate deltas within the loops here too.
Another major performance hit, although this has gotten better with recent CPU models and instruction sets, is the conversion from floating point to integers. So use fixed point or SIMD intrinsics for fast conversions.
Poor cache behavior is a big problem for large images, especially at near 90-degree angles. There are a number ways to deal with this, such as texture swizzling, but probably the easiest solution is simply render the bitmap in small (8x8 say) tiles.
You probably don't want to clip each pixel in the innerloop either. This is typically handled by writing a general-purpose polygon scan converter rather than implementing a rotozoomer like this. However if you decide to do tiled rendering then you may simply predetermine whether an entire tile is either invisible (skipped), fully visible (rendered without clipping), or partially visible (rendered with a slower innerloop).

As you might have guessed there are any number of other tricks and techniques but those should keep you occupied for a while, depending on how far you want to take these optimizations. Let me know if you want anything clarified.

Share this post


Link to post
Share on other sites
Hey,

thanks a lot for your points. Ill look into them, and will see what I can come up with.

There are two thinks I wonder about:
>> So use fixed point or SIMD intrinsics for fast conversions.
Is there a faster way to convert between float/int then casting? I though that the compiler optimized that so that it was as good as possible.
Could you please give an example?

>> probably the easiest solution is simply render the bitmap in small (8x8 say) tiles.
Im not sure I understand how this can lead to a performance gain.

>>You probably don't want to clip each pixel in the innerloop either.
Are you refering to the if(boundingBox.pointIntersection(point)) line?
If so, thats how I do the rotation backwards.
Rotate a point
Check if the point is inside the original sprite
if so, draw it.

Thanks again.

Share this post


Link to post
Share on other sites
Quote:
Original post by h3ro
Is there a faster way to convert between float/int then casting? I though that the compiler optimized that so that it was as good as possible.
Could you please give an example?
Essentially C specifies that float-to-int conversions should round towards zero, while the x86 ABI sets the FPU's rounding mode to round-to-nearest by default. So conversions typically call a fairly slow conversion function which temporarily switches modes. However toggling between two modes is optimized on some processors, and SSE2 has special truncation instructions which the compiler might be able to make use of.
Now in a texture mapper like this you wouldn't want rounding towards zero anyway since it's inconsistent around zero, so there are various ways of getting access to the low-level conversion instructions and getting the compiler not to switch the modes around if you need to. But even in the best case it's still quite slow, considering latency for getting the values back into the integer registers for doing address calculations, so I'd recommend that you use fixed-point instead. Another problem with floating point is that it's quite hard to guarantee that you don't have any off-by-one errors (e.g. stay within the texture) when not clipping every single pixel after the float-to-int conversion.

Quote:
Quote:
probably the easiest solution is simply render the bitmap in small (8x8 say) tiles.
Im not sure I understand how this can lead to a performance gain.
If you render a bitmap at a 90-degree angle then you're jumping around wildly in memory and at worst not getting any mileage out of the cache at all. Basically if the image is large enough then every single pixel in a horizontal scan will reside in a separate cache-line, and will likely kick old data out which you'll need on the next scanline. By rendering in small tiles instead all pixels involved in a single tile will fit nicely in the L1 cache, so even when you're traversing the bitmap along the "wrong" axis the lower columns of the tile will still take advantage of cached data from the upper columns.

Quote:
You probably don't want to clip each pixel in the innerloop either.
Are you refering to the if(boundingBox.pointIntersection(point)) line?
If so, thats how I do the rotation backwards.
Rotate a point
Check if the point is inside the original sprite
if so, draw it.
The standard technique is called polygon scan-conversion. You'll still be mapping destination pixels to the source texture as before but instead of drawing a large rectangular block you'll trace the left and right edges of a convex polygon and fill scanlines in between without any clipping in the innerloop. You'll find any number of tutorials on this if you google for software texture mapping.
The other alternative is to render the polygon in tiles and calculate beforehand whether the *entire* tile is inside or outside the polygon, and only perform clipping on the edge tiles.
Also do you really need to clip against general four-sided polygons? The clipping code for rectangles is a whole lot easier, e.g. simply check whether the x and y coordinates are in range.

Share this post


Link to post
Share on other sites
If vi is the i'th point to be rotated to position vi', and R is a rotation matrix so that vi' = R*vi does the rotation, then (v1'|v2'|v3'|...) = R*(v1|v2|v3|...) will rotate all points simultaneously, where (v1|v2|v3|...) is a matrix containing all the points (each point in a different column). If you think about the matrix multiplication for a minute, this works out to be the same.

Now, you might say: hey, this isnt any faster, you are just writing the same thing in a more succinct form. It is true that we haven't actually done anything groundbreaking here. However, a good matrix multiplication library can actually do better if you write it like this, since mathematicians have figured out incredibly tricky algorithms to speed up matrix multiplications orders of magnitude with all sorts of crazy divide and conquer methods. Thus if you write it like this it should be faster.

In fact, if you use a good matrix multiplication library, you can be assured that this is the fastest possible way to rotate all the points up to current math knowledge, since any faster method would imply a way to multiply matrices faster.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this