Matching algorithms

This topic is 4860 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

I am interested in finding out about any matching algorithms which can pair up items in one list with items in a second list, given a function that can return a degree of similarity between two items, and given that the lists may be of different sizes and some items may not match at all. To try every single permutation would run in factorial time and therefore be prohibitively slow on anything but short lists, so I'm hoping there is a good heuristic approach available. I suppose that ranking all possible matches and then using a stable matching algorithm might be one good approach?

Share on other sites
You want to maximize the sum of the similarity(a,b) for all matched items?

Share on other sites
Just an idea:
If the function for determining similairity, f(a,b) can be written as something like g(a)-g(b)+smallfactor(a,b), then you could sort the list by g, and look only at close neighbours in the sorted list.
Depending on what you are using it for you could also take a look at sequence alignment algorithms.

Share on other sites
Yeah, sequence alignment is a better name for what I'm looking for, although there are not going to be any exact matches, just a lot of very close matches. Yes, that will be maximising the sum of the similarity, or minimising the sum of the difference, but I don't think that alone is sufficient, due to the presence of some data in one or both lists that may need to remain unmatched. The lists are already roughly sorted but the similarity function could be quite complex.

Share on other sites
Surely there is an existing matching/permutation algorithm that minimize the Total Least Square (TLS) of the differences. Try looking up with that keyword. Do you need the absolute best match, or a good local minima will do?

Share on other sites
It may not be possible to score the differences, only to rank them. So I'm interested in more algorithms than just one that will numerically reduce differences, and also in how they will handle sequences of different lengths - what is the square of the difference between an item and a non-existent item, for example?

Share on other sites
You can solve this problem exactly (wrt. some metric) in O(n^6) with a minimum cost flow algorithm (n=|a|+|b|). Maybe it's only n^5, I didn't try to find a tight upper bound.

Both lists can be of different length and arbitrary similarity relations are allowed.

Share on other sites
Quote:
 Original post by Kylotan what is the square of the difference between an item and a non-existent item, for example?

Usually it is a normalization constant is the minimization algorithm.

Share on other sites
This sounds like a classic nearest neighbours search problem, which can be solved efficiently using a k-d tree representation of the data. Create one tree per list (where data is distributed based on intra-list similarity). Then, if you want to find a match for each item in each list: for each item not presently matched, use it as the query to find the nearest neighbour in the other list. You can choose to find the n nearest neighbours with minimal additional runtime cost.

Your problem sounds very much like a data set cross-correlation problem I solved a couple of years ago. It becomes more efficient to use nearest neighbour search than brute force when the number of items for comparison reaches something like 10^5 (if I recall the literature correctly).

If you get desperate for an implementation, drop me an email and I'll forward you some code you can play with.

Cheers,

Timkin

Share on other sites
Would cross-corelation work for what you want?

I've used the Zero-mean normalized cross-correlation algorithm successfully to find similar chunks of data in two images.

Will

• What is your GameDev Story?

In 2019 we are celebrating 20 years of GameDev.net! Share your GameDev Story with us.

• 24
• 15
• 36
• 12
• Forum Statistics

• Total Topics
634824
• Total Posts
3019461
×