# Selecting K smallest elements

This topic is 3773 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

In the wikipedia entry, a bound is mentioned: http://en.wikipedia.org/wiki/Selection_algorithm#Lower_bounds Where can I find an algorithm that actually achieves that lower bound?

##### Share on other sites
it tells you right there in the wiki... doing a partial sort. a partial heapsort will do the trick. you just make a heap which can be done in O(logn) and then place the smallest element in the front. then reheap. then repeat k times for the k smallest elements. you should have yourself a kO(logn) ~ O(logn) algorithm if k is small. with heapsort you can find the k-smallest and largest elements as well by making minheaps and maxheaps.

##### Share on other sites
Er.. As far as I know, fastest option for building a heap is O(n), and not O(log n).

##### Share on other sites
Quote:
 Original post by StarfoxEr.. As far as I know, fastest option for building a heap is O(n), and not O(log n).

Yup. If it was O(log n), that would imply you're only touching log n elements!

One of the simplest ways to achieve decent performance when k is reasonably smaller than n is to build a heap in O(n) time, then remove k elements from the heap in O(log (n-i)) time for each i'th element.

I'm not sure that's the lowest bound, Knuth's description of the problem is pretty complicated (although I've skim read over that section). To get an idea of how complicated, Knuth states it took mathematicians several attempts to even determine what the lower bound is!

##### Share on other sites
That bound is not a big-oh. It's the exact number of comparisons involved. In other words, and for the simplest case of k = 2: What algorithm would find the second smallest elements with (n-k+log(n)) comparisons?

##### Share on other sites
Quote:
 Original post by Bombshell_MWhat algorithm would find the second smallest elements with (n-k+log(n)) comparisons?

For k=2 it evaluates to n - 2 + log n.

I would guess that quickselect that returns only k-th (presented in article) element achieves that, although I cannot prove it right now.

Lower bound might also not be universally attainable. Quicksort in general is known as optimal, yet it will never reach lower bound. I seem to recall that it's some 40% away from optimal algorithm for arbitrary N.

##### Share on other sites
ah, yeah, my goof, heapify is O(n). got it confused with reheaping. just trying to tell the guy though that he can use a partial heapsort to solve that problem.

so let's see... heapify = O(n)
then k - 1 reheap operations = (k - 1)O(lg(n))

so lowerbound should be O(n) right (if k is small that is)?

##### Share on other sites
As I understand the article, the given formula is a lower bound on the complexity of the problem, meaning that it is impossible to solve the problem using fewer comparisons. This does not mean that it is possible to solve the problem with this many comparisons.

In fact, the statement
Quote:
 (from Wikipedia)This bound is achievable for k=2 but better, more complex bounds exist for larger k.

suggests that it has in fact been proven that for k>2 no algorithm exists that uses only the number of comparisons given by the formula in that article.

If my back-of-the-envelope calculations are correct, an optimal algorithm for k=2 can be obtained using a divide-and-conquer strategy similar to the "Median of medians" algorithm described on the Wikipedia page (it's simpler than that algorithm, in fact). The same strategy should give good algorithms for k>2 as well, but I don't know if they're still optimal.

Try to develop that algorithm yourself, it's a good exercise. If you're stuck, feel free to ask for more ideas.

cu,
Prefect

##### Share on other sites
Which median-of-medianss algorithm are you talking about?

##### Share on other sites
Wikipedia's version is not optimal. Here's my implementation:

int PerfCount = 0;int select_partition(int* A, int L, int R, int PivotIndex){	int PivotValue = A[PivotIndex];	swap(A[PivotIndex], A[R]);	int StoreIndex = L;	for(int i = L; i <= R-1; ++i)	{		++PerfCount;		if(A < PivotValue)		{			swap(A[StoreIndex], A);			++StoreIndex;		}	}	swap(A[R], A[StoreIndex]);	return StoreIndex;}int select(int* A, int K, int L, int R){	while(true)	{		int PivotIndex = (L+R)/2;		int PivotNewIndex = select_partition(A, L, R, PivotIndex);		if(K == PivotNewIndex)		{			return A[K];		}		else if(K < PivotNewIndex)		{			R = PivotNewIndex-1;		}		else		{			L = PivotNewIndex+1;		}	}}

for n = 8, K = 2, PerfCount = 13 > n + log(n) - 2.

• 18
• 11
• 16
• 9
• 50
• ### Forum Statistics

• Total Topics
631396
• Total Posts
2999783
×