Quantifying Difference Between Two Images

Hi,

I am learning about genetic algorithms, and my goal is to make a replica of (or at least something similar to) the one shown in this YouTube video:

[embedded YouTube video]

I need to be able to rate the "fitness" of my generated images. To do this, I need to compare them to a supplied picture, and was wondering what algorithm I should use to quantify the similarity (distance?) between the two images. I'm not familiar with algorithms in this category, so any insight would be very helpful! One that is simple to implement would be best, considering the goal of my project is the genetic algorithm and not comparing images, although a little learning never hurts. :)

Thanks in advance!
The first idea is to treat the images as vectors in R^(height*width), with one component per pixel, and use Euclidean distance. There are more sophisticated measures that are closer to what humans perceive as being similar. They are usually based on the wavelet transform.
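
As a minimal sketch of that idea (grayscale images stored as flat byte arrays are an assumption for illustration):

#include <cmath>

// Euclidean distance between two images viewed as vectors in
// R^(width*height), one component per pixel.
double euclideanDistance(const unsigned char *a, const unsigned char *b, int pixelCount)
{
    double sum = 0.0;
    for (int i = 0; i < pixelCount; ++i)
    {
        double d = (double)a[i] - (double)b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

For ranking candidates in a genetic algorithm you can even skip the final sqrt, since it doesn't change the ordering.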

You can do something like reduce the resolution of the images to several different scales and add the corresponding Euclidean distances with some weights (I think that's what you would get with some of the wavelet methods if you were to use Haar wavelets).
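
A hedged sketch of one downsampling step (a 2x2 box filter; grayscale byte buffers and even dimensions are assumptions for illustration):

// Average each 2x2 block of source pixels into one destination pixel.
void downsample2x(const unsigned char *src, int width, int height, unsigned char *dst)
{
    for (int y = 0; y < height / 2; ++y)
    {
        for (int x = 0; x < width / 2; ++x)
        {
            int sum = src[(2 * y) * width + (2 * x)]
                    + src[(2 * y) * width + (2 * x + 1)]
                    + src[(2 * y + 1) * width + (2 * x)]
                    + src[(2 * y + 1) * width + (2 * x + 1)];
            dst[y * (width / 2) + x] = (unsigned char)(sum / 4);
        }
    }
}

Apply this repeatedly to get each scale, compute the distance at every scale, and sum the distances with weights of your choosing.
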
The simplest way would be something like:

#include <cstdlib>

int weight = 0;
for (int i = 0; i < pixelCount; ++i)
{
    weight += abs(original[i].r - test[i].r); // assuming each channel is in [0, 255]
    weight += abs(original[i].g - test[i].g);
    weight += abs(original[i].b - test[i].b);
}


The lower the weight the better the match.
You might find the ImageMagick 'compare' tool useful. It has a number of metrics which you can use when comparing images: http://www.imagemagick.org/script/command-line-options.php#metric
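For example (assuming ImageMagick is installed), an invocation like:

compare -metric RMSE original.png candidate.png null:

prints the root-mean-square difference between the two images ("null:" just discards the difference image that compare would otherwise write out).
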
You need at least some idea of what kind of similarity you want to use as the basis:

  • The easiest is to simply sum the color differences of individual pixels (like turch suggested)
  • Another is to find the Euclidean distance between the images treated as M×N-dimensional vectors (like Alvaro suggested)
  • If you want to punish big differences proportionally more, you can square the absolute differences before adding them
  • If you want to give less weight to high frequencies, you should downsample the images before comparing
  • If you are interested in humanly perceivable difference, you should transform your images to YUV or a similar colorspace (see the sketch after this list)
  • If you want to give more importance to shapes, you can apply some edge detection first

And so on. ;)
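
As a minimal sketch of the YUV idea above, using the standard BT.601 coefficients (the per-channel weights you apply afterwards are up to you):

// Convert one RGB pixel (channels in [0, 1]) to YUV (BT.601).
// Comparing Y with a higher weight than U and V approximates
// perceptual importance.
void rgbToYuv(float r, float g, float b, float &y, float &u, float &v)
{
    y = 0.299f * r + 0.587f * g + 0.114f * b;
    u = 0.492f * (b - y);
    v = 0.877f * (r - y);
}
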
Lauris Kaplinski

First technology demo of my game Shinya is out: http://lauris.kaplinski.com/shinya
Khayyam 3D - a freeware poser and scene builder application: http://khayyam.kaplinski.com/


The per-pixel sum of absolute differences suggested above is not the way to go and can in fact be very hurtful to the actual result.


What you want is MSE (mean squared error) if your goal is simply to compare quality between different images, or PSNR (peak signal-to-noise ratio) if you want to quantify the differences in discrete units. Calculating the MSE is part of determining the PSNR, and a higher MSE always results in a lower dB value (the unit in which PSNR is measured). So if your goal is simply to determine which of two images is closer to the original, MSE is all that is needed; if you want to quantify exactly how close in dB, continue on and calculate the PSNR, although even then that last step is optional.

For MSE (all you probably really need), the code would be something like:

double error = 0.0;
for (int y = 0; y < height; ++y)
{
    for (int x = 0; x < width; ++x)
    {
        double d = (double)src(x, y) - (double)copy(x, y); // per-pixel difference
        error += d * d;
    }
}
error /= (double)(width * height); // mean over all pixels


This is applied to each channel separately, so you would have one error value for R, one for G, and one for B.
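
If you do want the optional PSNR step described above, it follows directly from the MSE (assuming 8-bit channels, so a peak value of 255):

#include <cmath>

// PSNR in decibels from a mean squared error; higher dB means a closer
// match. An MSE of 0 means the images are identical (PSNR is infinite),
// so guard against dividing by zero.
double psnr(double mse)
{
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}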

Then you apply perceptual weights to these error values based on how humans perceive light.

Common weights are:
(0.299f, 0.587f, 0.114f)
(0.212656f, 0.715158f, 0.072186f)
(0.3086f, 0.6094f, 0.082f)

Notice how they all add up to 1, and also how green is always weighted much higher than the rest. This is also why the 565 16-bit color format exists: green is the most important color perceptually. If it were instead red that mattered most to us humans, the 16-bit color format would have been 655.


Your final total perceptually adjusted error would be (assuming weights of (0.212656f, 0.715158f, 0.072186f)):
redError * 0.212656f + greenError * 0.715158f + blueError * 0.072186f


This gets you a properly weighted perceptually adjusted error metric that allows you to determine how close you are to the original, and is simple to implement.
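
Putting the pieces together, here is a hedged end-to-end sketch (the interleaved 8-bit RGB pixel layout is an assumption for illustration):

// Per-channel MSE followed by the perceptual weights from above.
double perceptualError(const unsigned char *src, const unsigned char *copy,
                       int width, int height)
{
    double err[3] = { 0.0, 0.0, 0.0 };
    const int pixelCount = width * height;
    for (int i = 0; i < pixelCount; ++i)
    {
        for (int c = 0; c < 3; ++c) // 0 = R, 1 = G, 2 = B
        {
            double d = (double)src[i * 3 + c] - (double)copy[i * 3 + c];
            err[c] += d * d;
        }
    }
    for (int c = 0; c < 3; ++c)
        err[c] /= (double)pixelCount; // per-channel MSE
    return err[0] * 0.212656 + err[1] * 0.715158 + err[2] * 0.072186;
}

The lower the returned value, the fitter the generated image.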


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid
