Need help formulating a matching algorithm.

This topic is 3969 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

I have a group of n people where anyone can be matched to anyone. Only 1 pair is allowed such that: Persons {A,B,C,D,E,F} would have pairings {AF} {CE} {BD} if they are approximate matches. (I am not looking for optimal, as that would be far to expensive) The problem I'm having isn't the matching. It's assigning a single point total to compare 2 people. I've went through ways of doing this by assigned point values for each pair, but I need something faster. Here's what I have for a questionaire without getting into the details. Note that a bool is 2 choices and an int(x) is 3 more more choices where x is the number of choices. The percentage is the importance of the match. For multiple choice questions, a close match can have importance also. For example: How old are you? {18-21, 22-25, 25-29, 30+} 2 19 year olds would match perfectly and get say 10 points, as a 19 and 22 year old may get 7 points. A 18 and 32 year old would get 0 points, but this is more assigning points to matches which I really don't want to do. Here we go with my questionaire: bool: 35% bool: 15% int(4): 10% int(6): 10% int(6): 10% int(3): 10% int(10): 5% int(8): 5% I want to assign a score to each person such that 2 similar scores are very close on the answers. I've came up with the problem of having several opposite answers averaging out to give similar scores for 2 drastically different people. Take the example below where the choices for 4 questions are {bad, average, good, great} person A: Q1: bad Q2: bad Q3: great Q4: great Person B: Q1: great Q2: great Q3: bad Q4: bad If all questions are weighted equally with points assigned to each answer. Person A and B are complete opposites but would be assigned the same score by averaging their answers. Is their a good way to assign a point score to each individual where I could just match them based on a single number rather than trying to match for several questions each with multiple answers?

Share on other sites
I should also note that the 2nd part of this will be utilizing the stable matching algorithm to assign each pair to a location. Their are 8 locations, and each person assigns a ranking of preference, so I kind of have to do this by making sure each person gets what they want, instead of just a '1 person to 1 location'

For the scope of my project at this point, capacity of a location isn't much of a problem, although as the number of people goes up, this could be a problem.

I'm not sure if I should make this a second step, or compare the preferences of each individual on the locations in the algorithm i described in my last post.

Share on other sites
Quote:
 The problem I'm having isn't the matching. It's assigning a single point total to compare 2 people....I should also note that the 2nd part of this will be utilizing the stable matching algorithm to assign each pair to a location.
The problem IS the matching.

You are attempting to apply a matching algorithm that works on a one-dimensional space to a similarity problem that's expressed on a multidimensional space.

Unless the topology of your similarity metric is degenerate, you're not going to find a mapping from multidimensional space into a one dimensional space that preserves your metric. If only.

That's why averaging doesn't make any sense.

A better idea would be to find a metric in your space that makes sense (in the case of four questions, a four dimensional space). For example, let's somewhat arbitrarily map {bad, average, good, great} into the real number line in a way that preserves their relative meaning: {bad -> 0, average ->1, good->2, great->3}.

Now we can apply, say, the typical euclidean metric to find the distance between two people in a way that makes sense. Let's say P1 and P2 are 4 dimensional person vectors, where each dimension captures the answer to one of the test questions:

d(P1, P2) = sqrt( (P1[0] - P2[0])^2 + (P1[1] - P2[1])^2 + (P1[2] - P2[2])^2 + (P1[3] - P2[3])^2 )

Well, this distance does preserve our intuitive understanding of which people are 'close' under the test questions. But how do we use this to pair them up?

Welcome to the nearest neighbor problem! Google will turn up tons of papers.

If your space is small-dimensional (say, d < 15) you can get by using kd trees or other metric trees.

If it's bigger, you can either bite the bullet and take quadratic time or implement locality sensitive hashing.

Share on other sites
Thanks for the reply. I've given up on assigning a score to each individual and will just assign point values to pairs and will just use a variation of the stable marriage problem to do this. If space or time is a problem, I can just break up the group of people into more subsets based on rules like, no male/female combination can be paired; a smoker/non smoker can't be paired, and so forth.

I plan on this working this for over 1000 people and doing it in PHP, so hashing will be fairly simplistic.

1. 1
2. 2
Rutin
20
3. 3
khawk
16
4. 4
A4L
14
5. 5

• 11
• 16
• 26
• 10
• 11
• Forum Statistics

• Total Topics
633756
• Total Posts
3013707
×