Matching algorithms

Artificial Intelligence Programming

Started by Kylotan November 24, 2005 08:02 AM

20 comments, last by Timkin 18 years, 4 months ago

Kylotan

10,510

Author

November 28, 2005 02:27 PM

Ok guys, I'll look into cross correlation and k-d trees. Thanks. :)

Kylotan

10,510

Author

November 29, 2005 11:05 AM

Hmm, one more thing, which I didn't stress enough before: the two lists are roughly in order - so if the data was perfect, x[1] would match y[1], x[2] would match y[2], etc. Unfortunately, sometimes items are missing from one or the other, and more generally, the values in one list are biased an unknown amount relative to the other list.

A good analogy (in 2D rather than 1D) would be matching fingerprints, where the lines for a good input match and the model it's matching with will be largely superimposed, but with a degree of error either side. Also, the input sample is not going to be exactly in the same position in the image data as the training model is - perhaps one is off-centre, for example. (This corresponds to my bias factor.)

Does that make sense? :)

Anonymous

November 29, 2005 11:54 AM

Could you approach it in a similar way to a diff algorithm? Sounds very similar to me.

Sneftel

1,788

November 29, 2005 12:30 PM

Look up "dynamic time warping". Useful for roughly linear correlations between two sequences.

Sneftel

1,788

November 29, 2005 02:31 PM

Also look up RANSAC.

Timkin

864

November 29, 2005 06:30 PM

Quote:Original post by Kylotan
Hmm, one more thing, which I didn't stress enough before: the two lists are roughly in order - so if the data was perfect, x[1] would match y[1], x[2] would match y[2], etc. Unfortunately, sometimes items are missing from one or the other, and more generally, the values in one list are biased an unknown amount relative to the other list.

Okay, that's a slightly different problem to the way I interpreted your initial post. Correlation algorithms don't typically care where in the list of data you find the information, only what information is contained in the data set (actually, that one data set contains information about the other... Mutual Information is a better way of understanding correlation IMHO). So, my question is this: Consider lists A and B, such that A = { x1,x2,x3,x4,... } and B = { x2,x3,x4,... }. What do you want to say about these lists? Do you want to say they are completely mismatched (becase a1!=b1, a2!=b2, etc) or that they are offset (B(i) = A(i+1)) or that they have a distribution of offsets or something else entirely?

Cheers,

Timkin

Kylotan

10,510

Author

November 30, 2005 03:44 AM

I don't really want to say anything about the lists, just about the data items within them. In this case, I want to report that there are good matches for x2, x3, and x4, and no good match for x1. I'm not interested in the offset except for how it aids matching, and the offset I referred to was not necessarily an offset in the ordinal value of list members - although that can and does sometimes exist - but an offset between the actual data being matched. eg. I would want the list [1,2,100,101] to match to [5,6,104,105]. Here the offset looks like +4 to get the second set of values from the first. However there are error values also so I'd want it to match with [5,6,105,106] too, for example.

evolutional

1,397

November 30, 2005 04:27 AM

Assuming you're comparing strings, look up the Levenshtein Distance and Single/Double Metaphone functions.

Kylotan

10,510

Author

November 30, 2005 04:32 AM

I'm not comparing strings. :) The items are largely judged on two floating point values, one of which dictates the ordering, the other of which just has to 'match'.

RPGeezus

216

November 30, 2005 11:43 AM

This may not be the best solution, but...

From what you have described I think you could use a cross-correlation algorithm. The cross-correlation algorithm will return a degree of similarity for any offset you try.

want to find [1,2,3,4,5,6]in [1, 1, 1, 1, 4,3,3,5,6,7, 1, 1, 1, 1]                -----------

start at offset 0, then 1, 2, 3, 4, 5, 6, ..... if currScore > bestScore then bestOffset = currOffset.

Cross-correlation should tell you that the first list appears at offset 4 in the second list.

Noise in the data shouldn't be a major problem, as the correct match (even if not perfect) will have a higher similarity score than any other.

If list 1 is bigger than list 2, then search list 1 for the occurance of list 2, otherwise search list 2 for list 1.

It might help to normalize the blocks of data (as opposed to normalizing the entire list).

Will

------------------http://www.nentari.com

Matching algorithms

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Matching algorithms

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines