fluke

image processing performance


While the below is a very dumbed-down example, this kind of interpolation is VERY sluggish against bigger bitmaps at sizes of 4000x3000 px and above. Has anyone been down this path before and have suggestions on how to get any performance gains? Precision is important too.

typedef float image[3];
image** a=(image**)malloc(height*sizeof(image*));
image** b=(image**)malloc(height*sizeof(image*));
for(y=0;y<height;y++){
  a[y]=(image*)malloc(width*sizeof(image));
  b[y]=(image*)malloc(width*sizeof(image));
}
//set image data for image 'a' here...
int x1,y1;
for(y=0;y<height;y++){
  for(x=0;x<width;y++){
    x1 = (x==0?0:x-1);
    y1 = (y==0?0:y-1);
    b[y][x][0] = a[y1][x1][0];
    b[y][x][1] = a[y1][x1][1];
    b[y][x][2] = a[y1][x1][2];
  }
}

One thing would be the fact that you don't need your array to be floats, presuming you are storing data via the standard RGB channels. Just use 3 chars (8 bits, 0 - 255).

Alternatively, you could use a single integer to represent the 3 channels (what I do), but the downside to this is if you need the RGB values a lot, you have to do the conversions involving a lot of the modulus operator.

If you are going to be saving them in files, the char solution is probably best, since you don't have to bang any bits to output them, they will already be in an outputtable form (unless you have some extra data you need between each bit... again, like me :p).

If you decide to do the integer thing, send me a PM and I will send you the code that I use.

Since you are assigning data, there aren't really any fancy algorithms you can run to speed up the assignment time. One thing you could do is split something like each line of the image into a separate thread to take advantage of multi-core processors.

EDIT:

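Just change this:

int x1,y1;
for(y=0;y<height;y++){
for(x=0;x<width;x++){
x1 = (x==0?0:x-1);
y1 = (y==0?0:y-1);

To:
int x1;
int y1 = 0;
for(y=0;y<height;y++){
//x==0 edge case, once per row: b[y][0] = a[y1][0]
x1 = 0;
for(x=1;x<width;x++){
//assign stuff here: b[y][x] = a[y1][x1]
++x1;
}
if(y>0) ++y1; //y1 trails y by one row
}

The reason being that with what you have, you have to do a compare AND subtract for both x and y on every pixel. This way the x==0 case is handled once per row outside the inner loop, so all the inner loop has to do is an increment.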

you are going to see huge gains by fixing your malloc

int channels = 3;
int height = 4000;
int width = 3000;
float *imageDataA = (float*)malloc(height*width*sizeof(float)*channels);
inline int index(int x, int y, int c) { return c + x*channels + y*width*channels; }

a[y1][x1][0]; then becomes imageDataA[index(x1,y1,0)];

This will give you much better memory locality and turn all of your
pointer-to-pointer-to-data lookups to pointer-to-data lookups which will also be
a performance boost.
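
For anyone who wants to see the whole thing in one piece, here is a minimal sketch of the original shift loop on top of a single contiguous buffer. The names `pixel_index`, `image_alloc` and `shift_copy` are made up for illustration, and it assumes row-major, interleaved RGB floats as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { CHANNELS = 3 };  /* RGB, as in the original post */

/* Offset of channel c of pixel (x, y) in a row-major, interleaved buffer. */
static size_t pixel_index(size_t x, size_t y, size_t c, size_t width)
{
    return c + CHANNELS * (x + y * width);
}

/* One contiguous allocation for the whole image; returns NULL on failure. */
static float *image_alloc(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    return malloc(width * height * CHANNELS * sizeof(float));
}

/* b(x, y) = a(max(x-1, 0), max(y-1, 0)), the same shift as the original loop. */
static void shift_copy(float *b, const float *a, size_t width, size_t height)
{
    for (size_t y = 0; y < height; y++) {
        size_t y1 = (y == 0) ? 0 : y - 1;
        for (size_t x = 0; x < width; x++) {
            size_t x1 = (x == 0) ? 0 : x - 1;
            for (size_t c = 0; c < CHANNELS; c++)
                b[pixel_index(x, y, c, width)] = a[pixel_index(x1, y1, c, width)];
        }
    }
}
```

Each image is now one `malloc` and one `free`, and every pixel access is a single load from the buffer rather than a row-pointer load followed by a pixel load.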


note: typo here, check that this isn't in your real code.
for(y=0;y<height;y++){
for(x=0;x<width;y++){ // <<<<<<<<< should be x++

KulSeran, yeah, i just typed that out, it's not from my code :)

I always wondered about storing image data this way, but I thought calculating the pixel/color channel index per pixel would be expensive. An added benefit, I guess, is that straight copies can be done using memcpy and the like.

thanks, i'll definitely give this a shot
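
On the memcpy point, a rough sketch of how the example shift could be done a row at a time once the data is contiguous. `shift_copy_rows` is a hypothetical name, and it assumes `a` and `b` are separate, non-overlapping buffers of width*height*channels floats in row-major order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shift as the original loop, done one row at a time with memcpy.
   a and b must not overlap. */
static void shift_copy_rows(float *b, const float *a,
                            size_t width, size_t height, size_t channels)
{
    assert(width > 0 && channels > 0);
    const size_t stride = width * channels;   /* floats per row */
    for (size_t y = 0; y < height; y++) {
        const float *src = a + ((y == 0) ? 0 : y - 1) * stride;
        float *dst = b + y * stride;
        /* x == 0 clamps to source pixel 0 */
        memcpy(dst, src, channels * sizeof(float));
        /* pixels 1..width-1 come from source pixels 0..width-2 */
        memcpy(dst + channels, src, (width - 1) * channels * sizeof(float));
    }
}
```

This only works because the example is a pure copy. A real interpolation still needs a per-pixel loop, but the flat layout helps there too.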


Share this post


Link to post
Share on other sites
fluke, depending on the access pattern, the indexing could be reduced to something simpler than that.
Store width*channels as "stride"
and now you have 2 adds and 2 multiplies per lookup (c + x*channels + y*stride). That is going to be nothing compared to the cache misses you might incur all over the place because your data is not contiguous (and thus two rows, or a row and a column, might compete for the same cache lines and fight the entire time that row is being accessed). And for each item access you have to do two loads from memory (the row pointer, then the pixel) instead of the one load in the method I described.
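
As a sketch of the stride idea for a simple sequential access pattern: compute the row pointer once per row and let the inner loop just step along it. `scale_image` and its `gain` parameter are made-up stand-ins for whatever per-sample operation is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every sample by gain, walking the buffer row by row. */
static void scale_image(float *data, size_t width, size_t height,
                        size_t channels, float gain)
{
    assert(data != NULL);
    const size_t stride = width * channels;   /* floats per row, computed once */
    for (size_t y = 0; y < height; y++) {
        float *row = data + y * stride;       /* one multiply per row */
        for (size_t i = 0; i < stride; i++)   /* inner loop only increments */
            row[i] *= gain;
    }
}
```

With this pattern the index math per sample is a single increment, and memory is touched strictly in order.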

Fancy filters might become compute bound, but most of the time a modern processor can do tonnes of computations in the time it takes to read one item from memory. So you get big gains by cutting down memory accesses while doing your image processing.

Quote:
Original post by MortusMaximus
One thing would be the fact that you don't need your array to be floats, presuming you are storing data via the standard RGB channels. Just use 3 chars (8 bits, 0 - 255).

Alternatively, you could use a single integer to represent the 3 channels (what I do), but the downside to this is if you need the RGB values a lot, you have to do the conversions involving a lot of the modulus operator.

Modulus? Something like this should work pretty well:


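// Argument is parenthesized so expressions like ALPHA(a | b) expand correctly.
#define ALPHA(x) (((x) & 0xff000000u) >> 24)
#define RED(x) (((x) & 0x00ff0000u) >> 16)
#define GREEN(x) (((x) & 0x0000ff00u) >> 8)
#define BLUE(x) ((x) & 0x000000ffu)

// u32 and u8 are unsigned 32-bit and 8-bit integer typedefs
u32 uARGB = 0x53c6d2f8;

u8 uAlpha = ALPHA(uARGB); // 0x53
u8 uRed = RED(uARGB); // 0xc6
u8 uGreen = GREEN(uARGB); // 0xd2
u8 uBlue = BLUE(uARGB); // 0xf8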



And for the conversion to and from float, just divide by 255 to get a float in the 0 to 1 range, and multiply by 255 to go back.
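
A minimal sketch of that conversion, assuming the float channels are meant to live in the 0 to 1 range. The function names are hypothetical, and the clamping and round-to-nearest are additions for safety, not something from the posts above.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit channel -> float in [0, 1]. */
static float channel_to_float(uint8_t v)
{
    return (float)v / 255.0f;
}

/* float -> 8-bit channel, clamped to [0, 1] and rounded to nearest. */
static uint8_t float_to_channel(float f)
{
    if (!(f > 0.0f)) return 0;      /* also catches NaN */
    if (f >= 1.0f) return 255;
    return (uint8_t)(f * 255.0f + 0.5f);
}

/* Pack four 8-bit channels into one 0xAARRGGBB value. */
static uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8) | (uint32_t)b;
}

/* Usage: every 8-bit value should survive a trip through float. */
static void check_round_trip(void)
{
    for (int v = 0; v <= 255; v++)
        assert(float_to_channel(channel_to_float((uint8_t)v)) == v);
}
```

Keep in mind that going down to 8 bits per channel throws away precision, which the original post said matters, so this is mostly useful at load and save time rather than in the middle of the processing.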
