Jump to content
  • Advertisement
Sign in to follow this  
Kylotan

Matching algorithms

This topic is 4614 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I am interested in finding out about any matching algorithms which can pair up items in one list with items in a second list, given a function that can return a degree of similarity between two items, and given that the lists may be of different sizes and some items may not match at all. To try every single permutation would run in factorial time and therefore be prohibitively slow on anything but short lists, so I'm hoping there is a good heuristic approach available. I suppose that ranking all possible matches and then using a stable matching algorithm might be one good approach?

Share this post


Link to post
Share on other sites
Advertisement
Just an idea:
If the function for determining similairity, f(a,b) can be written as something like g(a)-g(b)+smallfactor(a,b), then you could sort the list by g, and look only at close neighbours in the sorted list.
Depending on what you are using it for you could also take a look at sequence alignment algorithms.

Share this post


Link to post
Share on other sites
Yeah, sequence alignment is a better name for what I'm looking for, although there are not going to be any exact matches, just a lot of very close matches. Yes, that will be maximising the sum of the similarity, or minimising the sum of the difference, but I don't think that alone is sufficient, due to the presence of some data in one or both lists that may need to remain unmatched. The lists are already roughly sorted but the similarity function could be quite complex.

Share this post


Link to post
Share on other sites
Surely there is an existing matching/permutation algorithm that minimize the Total Least Square (TLS) of the differences. Try looking up with that keyword. Do you need the absolute best match, or a good local minima will do?

Share this post


Link to post
Share on other sites
It may not be possible to score the differences, only to rank them. So I'm interested in more algorithms than just one that will numerically reduce differences, and also in how they will handle sequences of different lengths - what is the square of the difference between an item and a non-existent item, for example?

Share this post


Link to post
Share on other sites
You can solve this problem exactly (wrt. some metric) in O(n^6) with a minimum cost flow algorithm (n=|a|+|b|). Maybe it's only n^5, I didn't try to find a tight upper bound.

Both lists can be of different length and arbitrary similarity relations are allowed.

Share this post


Link to post
Share on other sites
Quote:
Original post by Kylotan
what is the square of the difference between an item and a non-existent item, for example?


Usually it is a normalization constant is the minimization algorithm.

Share this post


Link to post
Share on other sites
This sounds like a classic nearest neighbours search problem, which can be solved efficiently using a k-d tree representation of the data. Create one tree per list (where data is distributed based on intra-list similarity). Then, if you want to find a match for each item in each list: for each item not presently matched, use it as the query to find the nearest neighbour in the other list. You can choose to find the n nearest neighbours with minimal additional runtime cost.

Your problem sounds very much like a data set cross-correlation problem I solved a couple of years ago. It becomes more efficient to use nearest neighbour search than brute force when the number of items for comparison reaches something like 10^5 (if I recall the literature correctly).

If you get desperate for an implementation, drop me an email and I'll forward you some code you can play with.

Cheers,

Timkin

Share this post


Link to post
Share on other sites
Would cross-corelation work for what you want?

I've used the Zero-mean normalized cross-correlation algorithm successfully to find similar chunks of data in two images.

Will

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!