Jump to content
  • Advertisement
Sign in to follow this  
sooner123

Finding mode and median

This topic is 2505 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Anyone know of any efficient algorithms for finding the mode or median of an unsorted array? (Both of these are trivial if the array is sorted)

I can't find much on the web regarding this. A few forum threads and a mention of an algorithm called "median of medians" which I believe is flawed and does not produce the correct result.

I do not agree that the median of a bunch of medians of a bunch of sets is the true median of the union of all the sets. On average it will probably be close, but I'm not interested in that.

Here are some stabs at algorithms. Please let me know if there are any better:

median:
two lists and a current_median_value
iterate through our array, if element is less than or equal to current_median, put it in min list. if it's greater, put it in max list. when the difference between the number of elements in one list and the other list becomes two, push the smallest (from the max list) or the largest (from the min list) into the current_median_value spot and shove the previous current_median_value into the other list.
-you have to keep both lists sorted which is linear per insert but n^2 in total.
diminishing merge sort
a vanishing (1/2^n) merge sort where you pick your pivot. move everything to its partition, then if a sub-list is smaller than half the total elements of the original array, you discard it. basically you do merge sort where you don't care about the lower half of the array. so it's faster than quick sort. final answer is a O(1) lookup

mode:
hash table
iterate through array. for each element, insert it into hash table if it doesn't exist. otherwise, increment its counter by 1.
-seems like there should be a more efficient way. obviously if an element occurs over half the time, then you can just do bit-wise counting and do the whole thing in linear time and constant space. but that's a clever optimization in that special case. is there any adaptation to the general case?

Share this post


Link to post
Share on other sites
Advertisement
You can compute the median using QuickSelect, which is easy to implement and takes average linear time. If you want a guaranteed linear-time algorithm, use "median of medians".

For the mode, using the hash is the best solution I know of.

Share this post


Link to post
Share on other sites
You can compute the median using QuickSelect, which is easy to implement and takes average linear time. If you want a guaranteed linear-time algorithm, use "median of medians". For the mode, using the hash is the best solution I know of.


How does median of medians give you the true median?

If you take an array, break it into sub arrays, find the median of all of them, then take the median of all the medians u computed, you will NOT have the actual median 100% of the time. Probably not even the majority of the time. I think it's an approximation. A statistical thing that works when you don't necessarily need the exact median but just something close.

Share this post


Link to post
Share on other sites
You need to read the article I quoted, not try to infer its content by its name.

EDIT: To be more helpful, the median of medians is used as the pivot in QuickSelect to guarantee linear worst-case performance.

Share this post


Link to post
Share on other sites
Oh I see. So median of medians finds a good pivot to guarantee that quickselect runs in linear time. It'll be the quick select that finds us our actual median.

Share this post


Link to post
Share on other sites

Anyone know of any efficient algorithms for finding the mode or median of an unsorted array? (Both of these are trivial if the array is sorted)


What does efficient mean?

O(nlogn) is typical sorting complexity.

But radix or merge sort are O(n), which is minimum required for exact solution.

Median-of-medians or similar specialized solutions might have lower constant factors, but are the same as far as complexity goes.

The rest would be implementation trade-offs.

Share this post


Link to post
Share on other sites

O(nlogn) is typical sorting complexity.

But radix or merge sort are O(n), which is minimum required for exact solution.


Hmmm... radix sort is not universally applicable (your data may not be integers), and merge sort is O(n*log(n)). Perhaps you meant something else instead of "merge sort"...

Share this post


Link to post
Share on other sites
Merge has O(n) best case, so it's not ideal.

radix sort is not universally applicable (your data may not be integers)[/quote]

A little though experiment - is median/mode sensible for items which cannot be radix sorted?

We're still at Big-O level here, radix sorting floats can be done by using their string representation if need be.

Share this post


Link to post
Share on other sites

A little though experiment - is median/mode sensible for items which cannot be radix sorted?

We're still at Big-O level here, radix sorting floats can be done by using their string representation if need be.


I am not sure if I have a practical example, but if you implement a data type that represents algebraic real numbers (as a polynomial and an approximate value of the root), there are algorithms that allow you to compare any two numbers, so you can use a comparison-based selection algorithm to compute the median. However, radix sort is kind of useless in this case.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!