which sorting algorithm

Started by
19 comments, last by Rockoon1 16 years, 1 month ago
I have a large array (somewhere around 64K) with the majority of the elements already sorted. Am I right in assuming that I should use insertion sort to sort the array? Which alternatives are there?
Advertisement
One alternative is to use whatever generic sorting functionality your programming language provides (generally quick sort or introsort) and then switching algorithms if the profiler indicates that sorting is actually a bottleneck for your program.
That might be a problem because I need deterministic sorting behaviour with equal elements. Since nonstable sorting algorithms might change implementation from language version to language version, I can't rely on quicksort for example.
For already sorted or mostly sorted arrays insertionsort is quite efficient (O(n) best case, which is better than quicksort or other more complex sorting algorithms). But this depends on how sorted your array is.
If you need a stable sort, many languages also supply a stable sorting algorithm. In C++, for example, there's the aptly named std::stable_sort() function.
Quote:Original post by SiCrane
If you need a stable sort, many languages also supply a stable sorting algorithm. In C++, for example, there's the aptly named std::stable_sort() function.

I use Java. The problem is: there is one level of indirection in my data. The array consists of indices into another array. So I can't use java.util.Arrays.sort(array).

Of course there is java.util.Arrays.sort(array, comparator), but that only works with elements of reference types, and I don't want to copy my int[] into a Integer[] first.

I have no problem with implementing the algorithm myself, I would just like to know if there are smarter ways then insertion sort given that ~99% of the array is already sorted.

Plus I wonder if a binary search makes that much of a difference - it seems as if the majority of time is will be spent moving the elements to make room for the one you want to insert.

[Edited by - DevFred on March 2, 2008 11:14:01 AM]
I've heard good things about the Natural Merge Sort, which is or can be stable and has near O(n) complexity on partially sorted inputs.
It surely requires O(n) auxiliary space though.
I'm currently viewing my problem from a completely different perspective, and I have found a "hack" to use a nonstable sorting algorithms to behave in a stable way that I want to share with you.

requirement: log(number of elements) + log(range of values) < 32 to fit into an integer or < 64 to fit into a long

In one of my subproblems, I need to aquire a sorted array of indices into an unsorted array of 256 elements, each of which can be one of 65536 values. Since log(256)+log(65536) = 8+16 = 24 I can use integers.

description of desired behavior
-----------------------------
input: weight is an unsorted array of integers
output: index is an array of integers so that
weight[index] < weight[index[i+1]] OR
weight[index] == weight[index[i+1]] AND index < index[i+1]

description of algorithm
----------------------
1. preprocess the data by shifting the values 8 bits to the left and add the index
2. invoke nonstable sorting algorithm, i.e. quicksort
3. postprocess the sorted data by extracting the index

java implementation
-------------------

public static int[] generateSortedIndices(int[] weight, int bits){	final int size = weight.length;	assert size <= 1 << bits;	int[] index = new int[size];	for (int i = 0; i < size; ++i)	{		index = (weight << bits) | i;	}	java.util.Arrays.sort(index);	final int mask = (1 << bits) - 1; 	for (int i = 0; i < size; ++i)	{		index &= mask;	}	return index;}


The basic idea to this "hack" is that equal weights will be treated nonequally by the sorting algorithm due to the index hiding in the lower 8 bits ;)

[Edited by - DevFred on March 2, 2008 4:57:11 PM]
Quote:Original post by DevFred
The basic idea to this "hack" is that equal weights will be treated nonequally by the sorting algorithm due to the index hiding in the lower 8 bits ;)


Not a bad idea, however for larger datasets the 2 additional passes over the data could overshadow the savings from using a non-stable sort.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Quote:Original post by DevFred
I have a large array (somewhere around 64K) with the majority of the elements already sorted. Am I right in assuming that I should use insertion sort to sort the array? Which alternatives are there?

Insert them to the proper initially, or reposition them when you modify them. There are several sorted containers that will do this for you.

Also, 64K is a fairly easy to handle list unless you left off some very important details. Are you sure this is an issue? How do you know?

This topic is closed to new replies.

Advertisement