# Code that needs to be optimized

## Recommended Posts

Hi there, First some introductory words: I'm writing a graphics related application that works on 2D images. An image needs to be tiled (i.e. repeated in an infinite plane) or mirrored (i.e. repeated in a way that every odd tile is flipped in x and y respectively). I've an image class that can take any coordinate (also outside the image's dimensions and negative). To access a pixel, the coordinate is "brought back" to the corresponding coordinate (depending on the tiling or mirroring). This "bringing back" code gets executed millions of times and is the bottle neck of the application. Since the running time takes up to half an hour I try to optimize my code. Below you find my current implementation. My question is whether it is possible to optimize this code (I don't know how) or to compute the coordinates in a different -- more efficient -- way. Many thanks and Cheers, Data
/// A simple 2D point
class Point2D {
int X, Y;
};

/// The image class
class Image2D {
/// some stuff, like member variables and so on

Point2D Tile(const Point2D& coord) const
{
int x = coord.X % (int)m_width;
int y = coord.Y % (int)m_height;
if(x < 0)
{
x += m_width;
}
if(y < 0)
{
y += m_height;
}

return Point2D(x, y);
};

Point2D Mirror(const Point2D& coord) const
{
Point2D result = Tile(coord);

// handle negative coordinates
if(coord.X < 0)
{
int x = -coord.X - 1;    // note that negative coords do not start with 0

// even tiles are flipped
if(((x / m_width) % 2) == 0)
{
result.X = m_width - result.X - 1;
}
}
// odd tiles are flipped
else if(((coord.X / m_width) % 2) != 0)
{
result.X = m_width - result.X - 1;
}

// handle negative coordinates
if(coord.Y < 0)
{
int y = -coord.Y - 1;    // note that negative coords do not start with 0

// even tiles are flipped
if(((y / m_height) % 2) == 0)
{
result.Y = m_height - result.Y - 1;
}
}
// odd tiles are flipped
else if(((coord.Y / m_height) % 2) != 0)
{
result.Y = m_height - result.Y - 1;
}

return result;
};

};

##### Share on other sites
For the "tile" option all you need to do is a simple modulus. The conditionals aren't needed - modulus should have well-defined behaviour for negative numbers. All that should be needed is:
return Point2D(coord.x % m_width, y % m_height);

If m_width and m_height are floats, consider storing them as integers instead. Float to integer conversions are computationally expensive. Also, you may want to consider avoiding returning large data types - I usually don't return anything larger than 4 bytes. But in this case it should be fine (8 bytes).

Although I'm not sure how to do mirror mode.

##### Share on other sites
Quote:
 Original post by Sc4FreakFor the "tile" option all you need to do is a simple modulus. The conditionals aren't needed - modulus should have well-defined behaviour for negative numbers. All that should be needed is:return Point2D(coord.x % m_width, y % m_height);

I tried it out and actually the result of a modulus is negative if the left hand side is negative. So I still need the if statements...

##### Share on other sites
Quote:
Original post by data2
Quote:
 Original post by Sc4FreakFor the "tile" option all you need to do is a simple modulus. The conditionals aren't needed - modulus should have well-defined behaviour for negative numbers. All that should be needed is:return Point2D(coord.x % m_width, y % m_height);

I tried it out and actually the result of a modulus is negative if the left hand side is negative. So I still need the if statements...

Yes you're absolutely right they can't just be left out, or you still get negative values.

You have a few float to int conversions in there by the looks of it. I suggest adding extra member variables m_intHeight, and m_intWidth which hold the height and width pre-converted to integers. That should speed it up a lot, assuming they are floats or doubles.
Then, by expanding the definition of Tile into Mirror, we are able to take advantage of the fact that coord.X in Mirror is only negative when x in Tile is negative (same for coord.Y and y). This finally does remove those if statements.
Next we replace % 2 with & 1, in case we are using a braindead compiler here.
I've then grouped all the X and Y calculations together.
This brings me to the below code. Note that you might not need 'Tile' any more if it isn't used elsewhere. There is certainly more that can be done though. If the height and/or width is always power of two then it can be sped up much more.
    Point2D Tile(const Point2D& coord) const    {        int x = coord.X % m_intWidth;        int y = coord.Y % m_intHeight;        if(x < 0)        {            x += m_intWidth;        }        if(y < 0)        {            y += m_intHeight;        }        return Point2D(x, y);    };    Point2D Mirror(const Point2D& coord) const    {        Point2D result;        result.X = coord.X % m_intWidth;        // handle negative coordinates         if (coord.X < 0)        {            result.X += m_intWidth;            int x = -coord.X - 1;    // note that negative coords do not start with 0            // even tiles are flipped            if (((x / m_intWidth) & 1) == 0)            {                result.X = m_intWidth - result.X - 1;            }        }        // odd tiles are flipped        else if ((coord.X / m_intWidth) & 1)        {            result.X = m_intWidth - result.X - 1;        }        result.Y = coord.Y % m_intHeight;        // handle negative coordinates         if (coord.Y < 0)        {            result.Y += m_intHeight;            int y = -coord.Y - 1;    // note that negative coords do not start with 0            // even tiles are flipped            if (((y / m_intHeight) & 1) == 0)            {                result.Y = m_intHeight - result.Y - 1;            }        }        // odd tiles are flipped        else if ((coord.Y / m_intHeight) & 1)        {            result.Y = m_intHeight - result.Y - 1;        }        return result;    };

[Edited by - iMalc on November 21, 2007 12:15:20 AM]

##### Share on other sites
Quote:
 Original post by iMalcI've then grouped all the X and Y calculations together.*** Source Snippet Removed ***

You're better off not doing that. Leave result.Y = coord.Y % m_intHeight; at the start of the function and let the compiler schedule the divide early, before you need the result for the Y-half of the function. The memory moves for result.Y also give you some space between the time result.X is first calculated and the time it is needed.

##### Share on other sites
Might I suggest providing a line iterator? It's much shorter to compute the "next pixel to the right" given the current pixel, than it is to compute the "texture-space coordinates of pixel x+1", so looping over the pixels in a give line by extracting the iterator to that line's start, and the incrementing it to traverse all pixel addresses should provide an increase in performance.

Also, note that mirroring an image can be resolved simply by creating an image twice as large and manually mirroring the relevant parts.

##### Share on other sites
Actually, looking over your code again, I'm thinking now that m_height and m_width are already integers. Those (int) casts are just there to fool us right?

##### Share on other sites
If you enforce power of 2 textures, you could use:

return Point2D(coord.x & (m_Width - 1), coord.y & (m_Height - 1));

for tiling.

##### Share on other sites
Thanks for all the replies and suggestions.

(1) m_width and m_height are unsigned int. That's why there are casts
(2) The image is not necessarily pow2, even though often the case (I might think of introducing special routines for pow2 images -- any suggestions of how to optimize Mirror() in case of pow2 textures?)
(3) I need to access pixels very randomly and do not really iterate over them. So an iterator wouldn't be of much help. However, I introduced direct access to the raw data and can loop over the memory with pointers if I have to.

@iMalc The idea of moving the Tile function's calculations into Mirror looks quite nice. It is not that much code, so I don't really need to call an extra function.

@outRider What is that with the compiler optimizations? I never think of how the compiler optimizes code, so I'm curious...

Cheers,
data

[Edited by - data2 on November 21, 2007 3:50:11 PM]

##### Share on other sites
I'm thinking that maybe you should mirror the entire texture, and save the mirrored texture? then mirroring reduces to a modulus as well.

Sometimes it is not the algorithm that needs to be optimized, but the design. Are you sure you need to do this millions of times?

##### Share on other sites
Hmm, saving the mirrored texture is not quite the optimal solution, as I already consume plenty of memory.

The whole thing is a texture synthesis project and the underlying algorithm is not exactly small and easy. I could (and will) certainly try to optimize the design in general and try to avoid calls where appropriate. But still the function will be called very often.

The problem is that pixel neighborhoods are computed all the time. For border pixels you need to tile or mirror the image (a user option). In addition, you might have neighboring pixels in the synthesized texture that originate from completely different parts of the input. However, I still need to access the neighbors of these different pixels.

As you can see, many random accesses and a whole lot of computations. Even if I can optimize my implementation to reduce the number of calls, it is still worth optimizing it.

So, does anyone have an idea how to optimize Mirror() for a power of two texture? This would be fantastic, because most often mirroring is used instead of tiling and most often, I deal with power of two textures...

Oh, and I tried out the following code (as proposed by iMalc), and in fact it gets slightly slower! According to the profiler (CodeAnalyst) the first line is slowest, followed by the if statements with the divides...
Point2D Mirror2(const Point2D& coord) const{    Point2D result(coord.X % (int)m_width, coord.Y % (int)m_height);    // handle negative coordinates     if(coord.X < 0)    {        int x = -coord.X - 1;    // note that negative coords do not start with 0        // even tiles are flipped        if(((x / m_width) % 2) == 0)        {            result.X = -result.X - 1;        }        // odd tiles simply need to be shifted into positive range        else        {            result.X += m_width;        }    }    // odd tiles are flipped    else if(((coord.X / m_width) % 2) != 0)    {        result.X = m_width - result.X - 1;    }    // handle negative coordinates     if(coord.Y < 0)    {        int y = -coord.Y - 1;    // note that negative coords do not start with 0        // even tiles are flipped        if(((y / m_height) % 2) == 0)        {            result.Y = -result.Y - 1;        }        // odd tiles simply need to be shifted into positive range        else        {            result.Y += m_height;        }    }    // odd tiles are flipped    else if(((coord.Y / m_height) % 2) != 0)    {        result.Y = m_height - result.Y - 1;    }    return result;}

##### Share on other sites
How about using your graphics card instead of software? I'm not very good at OpenGL, but I remember texture functions specifying wrap modes, including tile and mirror. With a little research I'm sure you could figure out how to pull the hardware-mirrored pixels from vram.

This is assuming your images are reasonably sized. I realize your app isn't realtime so maybe this won't work.

##### Share on other sites
In fact, the original algorithm is written for the GPU :-). However, I first have to try out things on CPU, because development is much easier than writing shaders. Nevertheless, optimizing the CPU code is still worth it --- e.g. to save time for the tests (already got ~30% out of it).

##### Share on other sites
Quote:
 Original post by data2Oh, and I tried out the following code (as proposed by iMalc), and in fact it gets slightly slower! According to the profiler (CodeAnalyst) the first line is slowest, followed by the if statements with the divides...*** Source Snippet Removed ***
Wow really, that's horribly unexpected!

Well as they say: Profile, optimise, profile, optimise.

Still, I'm at a loss to explain why that could be. Are you sure you ran the exact same test? Nothing else was different?

No wait, I see your mistake, you put result.X += m_width; in an else case, but the preceeding if case relies on that statement already having been executed becuase it uses the previous value. So you introduced a bug, and that extra else case means you've got about as much branching as you had before.
Compare it to the code I posted, and try my code except changing back to with m_width and m_height.

Also (and this may be counter-intuitive) but thanks to pipeline stalls, this:
if condition then    doAelse    doB
Can actually be slower than this:
doAif !condition then    undoA    doB
Assuming doA is something fairly simple (like incrementing a variable)

##### Share on other sites
Quote:
 Also (and this may be counter-intuitive) but thanks to pipeline stalls, this: /* code snip */

I've been drinking, but are you implying that he drop into assembly? Thats a bit much.

##### Share on other sites
Those modulus operations are your performance killer. Period.

We can argue bout micro-optimizing the rest of the code but the fact remains that those operations take an extremely long amount of time.

On 32-bit integers we are talking about ~40 CPU cycles to perform one of them on an AMD64 class machine. So the absolute hard-cap max throughput of this on a 2ghz AMD64 processor is 50 million per second. It does not help that the very next thing you do is perform an operation that depends on the result of the moduluses but thats neither here nor there.

I suggest a strategy shift..

..now you say that you arent dealing with powers of two.. however there is no reason that you cannot baseline on a very large power of two (such as 2^31)

that is to say, ALL input texture coordinates and increments could be given in 1.31 fixed point, tiled or mirrored within that scale, and then these values could be rescaled for output to the proper range via multiplication and a shift

For tiled we are looking at something like:

x = coord.x & (1 << 31 - 1); // modulus by 2^31
y = coord.y & (1 << 31 - 1); // modulus by 2^31
return point2d( (x * m_width) >> 31,
(y * m_height) >> 31); // scale to actual texture range

I appologize for assuming that your integer word length is 64 bit .. make sure that x * m_width and y * m_height are computed with 64-bit precision

##### Share on other sites
There's another way you can do the mirror:

1. Tile it at double the width and height. For a power of 2 this is trivial.
2. Do the mirror with a lookup table, or some bit twiddling.

You will probably also gain some speed by taking a pointer to the output location to avoid a copy constructor / assignment operator call on return.

Here's some code that should work for mirroring power of 2 textures without any divides or multiplies. Assumes int is 32-bit, but you can easily add a define for the number of bits in an int if need be. Also assumes that bit shifts on an int preserve the sign bit.

If your CPU has slow variable sized shifts (e.g. PPC) use a lookup table instead of the bit twiddling for the mirror (i.e. x = m_Lookup[m_WidthPower][x];).

int m_WidthPower = 5;int m_HeightPower = 5;struct Point2D {int x, y;};// Assume m_WidthPower is set such that 1 << m_WidthPower == m_Width. Same for height.void Mirror(const Point2D *pIn, Point2D *pOut){  // Tile at double the size  int x = pIn->x & ((2 << m_WidthPower) - 1);  int y = pIn->y & ((2 << m_HeightPower) - 1);  // Mirror X  const int bMirrorX = (x << (31 - m_WidthPower)) >> 31;  x ^= bMirrorX;  x += ((2 << m_WidthPower) + 1) & bMirrorX;  // Mirror Y  const int bMirrorY = (y << (31 - m_HeightPower)) >> 31;  y ^= bMirrorY;  y += ((2 << m_HeightPower) + 1) & bMirrorY;  pOut->x = x;  pOut->y = y;}

##### Share on other sites
Quote:
 Original post by data2Point2D Tile(const Point2D& coord) const { int x = coord.X % (int)m_width; int y = coord.Y % (int)m_height; if(x < 0) { x += m_width; } if(y < 0) { y += m_height; } return Point2D(x, y); };

For correctness, I'd replace the ifs with whiles. Otherwise, you have an infinite wraparound in positive direction, but only a single wraparound in negative direction.

Also, you should be able to do that without resorting to branching, I think:

int Wrap(int value, int lower, int upper) {  int distance = upper - lower;  int times = (value - lower) / distance;  return value - (times * distance);}Point2D Tile(const Point2D& coord) const {  return Point2D(    Wrap(coord.X, 0, static_cast<int>(m_width)),    Wrap(coord.Y, 0, static_cast<int>(m_height))  );}

If m_width and m_height are floating point fields you might want to cache their integer counterparts. If that method is really being called millions of times, you'd have millions of float-to-integer conversions as well with that code.

-Markus-

##### Share on other sites
Quote:
 Original post by data2or mirrored (i.e. repeated in a way that every odd tile is flipped in x and y respectively).

Think of 2x2 patches of mirrored tiles as a larger "chunk" which is tiled normally. We can get the tile coordinates within the larger chunk easily, and them map them into original-tile coordinates used to make the chunk.

(Note: I might well be off-by-one in one or more places in the following...)

    Point2D Tile(const Point2D& coord, unsigned int width = m_width, unsigned int height = m_height) const {        int x = coord.X % width;        int y = coord.Y % height;        if (x < 0) { x += width; }        if (y < 0) { y += height; }        return Point2D(x, y);    }    Point2D Mirror(const Point2D& coord) const {        Point2D chunk_coordinate = Tile(coord, 2 * m_width, 2 * m_height);        int& x = chunk_coordinate.X;        int& y = chunk_coordinate.Y;        if (x >= m_width) x = 2 * m_width - x;        if (y >= m_height) y = 2 * m_height - y;    }

We might also benefit from avoiding object construction where possible (the compiler might be sucking at RVO for whatever reason):

// a helper function - can be a free functionvoid wrap(int& x, int& y, int width, int height) {    x %= width;    y %= height;    if (x < 0) { x += width; }    if (y < 0) { y += height; }}// And now:Point2D Tile::Tile(Point2D coord) const {    // I passed by value because we need a copy anyway :)    wrap(coord.X, coord.Y, m_width, m_height);    return coord;}// SimilarlyPoint2D Mirror(Point2D coord) const {    wrap(coord.X, coord.Y, 2 * m_width, 2 * m_height);    if (coord.X >= m_width) { coord.X = 2 * m_width - coord.X - 1; }    if (coord.Y >= m_height) { coord.Y = 2 * m_height - coord.Y - 1; }    return coord;}

Actually, I'm not sure that amounts to any less object construction :
There's a further optimization, though, of course: Because the chunks are symmetrical, we can simplify the logic for the negative values. We should just be able to take the abs() of the coordinates within Mirror(), and assuming that's a compiler intrinsic, we avoid (some) branching.

int mirror_coord(int original, unsigned int size) {    int chunksize = 2 * size;    int result = abs(original) % chunksize;    if (result >= size) within_chunk = chunksize - result - 1;    return result;}Point2D Mirror(const Point2D& coord) const {    return Point2D(mirror_coord(coord.X, m_width), mirror_coord(coord.Y, m_height));}

I think ToohrVyk has the right idea though. The real optimization will come from sharing information between calls to the function; a line iterator can do that.

##### Share on other sites
Quote:
 Original post by CygonFor correctness, I'd replace the ifs with whiles. Otherwise, you have an infinite wraparound in positive direction, but only a single wraparound in negative direction.

I'm afraid that's not true at all. the modulus takes care of most negative values here.
For example, -17 % 4 equals -1. Then add at most one lot of 4 and you get 3. Viola the number is now guaranteed positive.

thedustbustr: Nope, wasn't suggesting assembly at all.

##### Share on other sites
Wow,

first of all many many thanks for all the suggestions. That's much more than I've expected :-).

I currently implemented special functions for power-of-two textures:
Point2D TilePow2(const Point2D& coord) const{    return Point2D(coord.X & ((int)m_width - 1), coord.Y & (m_height - 1));}Point2D MirrorPow2(const Point2D& coord) const{    Point2D result;        result.X = coord.X & ((int)m_width - 1);   // fast tiling    // handle negative coordinates     if(coord.X < 0)    {        int x = -coord.X - 1;    // note that negative coords do not start with 0        // even tiles are flipped        if(((x / m_width) % 2) == 0)        {            result.X = m_width - result.X - 1;        }        // odd tiles simply need to be shifted into positive range        else        {            result.X += m_width;        }    }    // odd tiles are flipped    else if(((coord.X / m_width) % 2) != 0)    {        result.X = m_width - result.X - 1;    }        result.Y = coord.Y & ((int)m_height - 1);   // fast tiling    // handle negative coordinates     if(coord.Y < 0)    {        int y = -coord.Y - 1;    // note that negative coords do not start with 0        // even tiles are flipped        if(((y / m_height) % 2) == 0)        {            result.Y = m_height - result.Y - 1;        }        // odd tiles simply need to be shifted into positive range        else        {            result.Y += m_height;        }    }    // odd tiles are flipped    else if(((coord.Y / m_height) % 2) != 0)    {        result.Y = m_height - result.Y - 1;    }    return result;}

This already gave me some 10% more performance :-). But still, the divides and modulus in the if-statements are the bottlenecks.

@Adam_42 I tried out the code and it gives exeptions. The implementation is as follows:
Point2D MirrorPow2(const Point2D& coord){    Point2D result;        result.X = coord.X & ((2 << m_powerWidth)  - 1);   // fast tiling    result.Y = coord.Y & ((2 << m_powerHeight) - 1);   // fast tiling    // Mirror X    const int bMirrorX = (result.X << (31 - m_powerWidth)) >> 31;    result.X ^= bMirrorX;    result.X += ((2 << m_powerWidth) + 1) & bMirrorX;    // Mirror Y    const int bMirrorY = (result.Y << (31 - m_powerHeight)) >> 31;    result.Y ^= bMirrorY;    result.Y += ((2 << m_powerHeight) + 1) & bMirrorY;    return result;}

Image size is 128x128, powerWidth/powerHeight is 7, and the input coord is (127/128). The problem is that
result.Y = coord.Y & ((2 << m_powerHeight) - 1);

does not tile. The result still is 128. Any ideas?

Cheers,
data

##### Share on other sites
Hmm, looks like I just got my add off by 1. I think you just need to drop the add 1 in the mirroring code.

Here's an updated version, I've added some comments so it's a bit clearer how it works.

Point2D MirrorPow2(const Point2D& coord){    Point2D result;        result.X = coord.X & ((2 << m_powerWidth)  - 1);   // fast tiling    result.Y = coord.Y & ((2 << m_powerHeight) - 1);   // fast tiling    // Mirror X. bMirrorX will be -1 if it's mirrored, 0 otherwise    const int bMirrorX = (result.X << (31 - m_powerWidth)) >> 31;    // Same as: result.X = (result.X == 0) ? result.X : -(result.X +1);    result.X ^= bMirrorX;    // Same as: if (bMirrorX) result.X += (2 << m_powerWidth);    result.X += (2 << m_powerWidth) & bMirrorX;    // Mirror Y. Same as for X    const int bMirrorY = (result.Y << (31 - m_powerHeight)) >> 31;    result.Y ^= bMirrorY;    result.Y += (2 << m_powerHeight) & bMirrorY;    return result;}

##### Share on other sites
Wow, you're right. And actually it gave me another 15%. That's awesome. Many thanks :-)

## Create an account

Register a new account

• ## Partner Spotlight

• ### Forum Statistics

• Total Topics
627662
• Total Posts
2978519

• 10
• 10
• 12
• 22
• 13