Efficient way to find same pixel?

Started by
25 comments, last by Melekor 14 years, 1 month ago
Hi, Let's say this is to be done: Given an image with 32bit RGBA pixels, I wish to list down all the unique pixels, such that: R:G:B:A = 5:230:5:90 = occurs 5x in image R:G:B:A = 2: 42:5:10 = occurs 2x in image and so on... I could go and compare each pixel one by one through a loop (comparing first pixel with the rest of the image, then comparing 2nd pixel with the rest of the image, and so on). But that would be slow? is there a faster way? Or is there a method / tool already that does this? Thanks!
Advertisement
1) Create a 4 dimensional array of integers, indexed by r, g, b and a.
2) Zero all entries
3) For each pixel, increment the correct array entry
4) Win (if you have enough memory)
Quote:Original post by helloworld123
I could go and compare each pixel one by one through a loop (comparing first pixel with the rest of the image, then comparing 2nd pixel with the rest of the image, and so on).

But that would be slow?

is there a faster way? Or is there a method / tool already that does this?

That can be performed much more efficiently if you sort the image's pixel values. Assuming img is a pointer to the image (if it's already a 32-bit type pointer, the cast is unnecessary).
unsigned int *img32bit = reinterpret_cast<unsigned int *>(img);std::sort(img32bit, img32bit+width*height);

Now all you have to do is find adjacent pixel values with the same color, instead of searching the whole image for the same color.

Quote:Original post by pbryant
1) Create a 4 dimensional array of integers, indexed by r, g, b and a.
2) Zero all entries
3) For each pixel, increment the correct array entry
4) Win (if you have enough memory)

And how much memory do you think would such an array require, even for a moderate color space such as 8-bits per component?
Quote:Original post by Brother Bob
And how much memory do you think would such an array require, even for a moderate color space such as 8-bits per component?


16gb, but that is the speed/memory trade off (which incidentally, is the point I was trying to make)

Depending on your environment, you might not be able to allocate such a big array, so it would have to be split up. However, testing with a 1024x1024 rgb image gets the counts 33x faster than the required std::sort call alone (which essentially just does exactly what he stated in the original post - a compare of each pixel to every other pixel in the image).
What if instead of int you'd store a bool value, so when the value is true, you know there's at least 1 pixel of that color. It would work when you only need to know whether color is unique not how many times it occurs.
Quote:Original post by Eigen
What if instead of int you'd store a bool value, so when the value is true, you know there's at least 1 pixel of that color. It would work when you only need to know whether color is unique not how many times it occurs.


Yeah, if you just need to know if each color occurs then you can shave it down to 4gb, assuming your compiler has 1 byte bools.
Quote:Original post by pbryant
(which essentially just does exactly what he stated in the original post - a compare of each pixel to every other pixel in the image).

No, that's not what the sort method does. It doesn't compare with every pixel of the image, it compares values, and only as much as is needed to determine how many there are. Sorting only rearranges the order so you can make efficient assumptions about the order of equal pixels that you cannot do about the original image.

Quote:Original post by pbryant
Quote:Original post by Eigen
What if instead of int you'd store a bool value, so when the value is true, you know there's at least 1 pixel of that color. It would work when you only need to know whether color is unique not how many times it occurs.


Yeah, if you just need to know if each color occurs then you can shave it down to 4gb, assuming your compiler has 1 byte bools.

If the task is only to find duplicate values, not how many of them (and that is, indeed, how he formulated it), you can even cut it down to 4 Gbits, or 512 MB. (edit: Scrap that; you need 2 bits to determine the second time a color is seen, so 1 GB.)
What are you going to do with such a dataset?

Would it instead suffice that when an user provides a given color, the app would report the count of pixels with that particular color? This way, you wouldn't have to store anything but the actual image.

Niko Suni

Quote:Original post by Brother Bob
No, that's not what the sort method does. It doesn't compare with every pixel of the image, it compares values, and only as much as is needed to determine how many there are. Sorting only rearranges the order so you can make efficient assumptions about the order of equal pixels that you cannot do about the original image.


~Essentially~

I realize that it doesn't actually compare each pixel with all the rest.
std::sort certainly doesn't determine "how many there are".

The point remains that massive gains can be made in terms of speed (far and above sorting) if you are willing to sacrifice memory.

Edit: In fact, you could probably just use a comparatively small hash table rather than a huge array, and still come out ahead by a factor of 30 over sorting.
512MB with one bit per color would be enough to check for dublication (but the helloworld123 wants the count, so probably it's really 16GB).

which version is faster depends on the image size, the bit table has contant cost regarding the amount of pixel O(1) while the sorting is probably O(n log n).

Sometimes you can reduce the color range, e.g. if you want to generate a 256-color palette, you'd be fine with probably 6bit/channel, cause you'll quantitize the image anyway. 2^(3*6) bit are just 262144entries, even using int32, you'll end up with 1MB.


So I think both versions are valid, the better choice depends on the particular case:
-bit/channel
-image resolution
-checking for doublication vs. counting

This topic is closed to new replies.

Advertisement