Which sorting algorithm?


I have a large array (somewhere around 64K elements) with the majority of the elements already sorted. Am I right in assuming that I should use insertion sort to sort the array? What alternatives are there?

One alternative is to use whatever generic sorting functionality your programming language provides (generally quicksort or introsort) and then switch algorithms if the profiler indicates that sorting is actually a bottleneck in your program.

That might be a problem, because I need deterministic sorting behaviour for equal elements. Since a non-stable sorting algorithm's implementation might change from one language version to the next, I can't rely on quicksort, for example.

For already sorted or mostly sorted arrays, insertion sort is quite efficient (O(n) best case, which is better than quicksort or other more complex sorting algorithms). But this depends on how sorted your array is.
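
For reference, a plain insertion sort looks something like this - just a minimal, untested sketch in Java. The inner loop rarely has to move anything on nearly sorted input, which is where the O(n) best case comes from:

// Plain insertion sort. On nearly sorted input the while loop exits almost
// immediately, so the whole pass is close to O(n).
public static void insertionSort(int[] a)
{
    for (int i = 1; i < a.length; i++)
    {
        int key = a[i];
        int j = i - 1;
        // shift larger elements one slot to the right
        while (j >= 0 && a[j] > key)
        {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}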

If you need a stable sort, many languages also supply a stable sorting algorithm. In C++, for example, there's the aptly named std::stable_sort() function.

Quote:
Original post by SiCrane
If you need a stable sort, many languages also supply a stable sorting algorithm. In C++, for example, there's the aptly named std::stable_sort() function.

I use Java. The problem is: there is one level of indirection in my data. The array consists of indices into another array. So I can't use java.util.Arrays.sort(array).

Of course there is java.util.Arrays.sort(array, comparator), but that only works with elements of reference types, and I don't want to copy my int[] into an Integer[] first.
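
Just to illustrate what I mean, the boxed version would look roughly like this (untested sketch; Arrays.sort on an object array is a stable merge sort, but it boxes every index, which is the overhead I'd like to avoid):

// Sketch of the boxed approach: sort an Integer[] of indices with a
// comparator on the weights. Stable, but every index gets boxed.
public static Integer[] sortedIndicesBoxed(final int[] weight)
{
    Integer[] index = new Integer[weight.length];
    for (int i = 0; i < index.length; i++)
    {
        index[i] = i;
    }
    java.util.Arrays.sort(index, new java.util.Comparator<Integer>()
    {
        public int compare(Integer a, Integer b)
        {
            // no overflow here, since my weights are in [0, 65535]
            return weight[a] - weight[b];
        }
    });
    return index;
}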

I have no problem with implementing the algorithm myself, I would just like to know if there are smarter ways than insertion sort, given that ~99% of the array is already sorted.

Plus I wonder if a binary search makes that much of a difference - it seems as if the majority of the time will be spent moving the elements to make room for the one you want to insert.
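
Something like this is what I have in mind for the binary-search variant (untested sketch; note that Arrays.binarySearch makes no promise about which of several equal elements it finds, so this isn't stable as written):

// Insertion sort that finds the insertion point with a binary search.
// The search saves comparisons, but the System.arraycopy shift is still
// O(n) per element, so moving elements dominates, as suspected above.
public static void binaryInsertionSort(int[] a)
{
    for (int i = 1; i < a.length; i++)
    {
        int key = a[i];
        int pos = java.util.Arrays.binarySearch(a, 0, i, key);
        if (pos < 0)
        {
            pos = -(pos + 1); // binarySearch encodes the insertion point as -(point) - 1
        }
        System.arraycopy(a, pos, a, pos + 1, i - pos);
        a[pos] = key;
    }
}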

[Edited by - DevFred on March 2, 2008 11:14:01 AM]

I'm currently viewing my problem from a completely different perspective, and I have found a "hack" that makes a non-stable sorting algorithm behave in a stable way, which I want to share with you.

requirement: log2(number of elements) + log2(range of values) < 32 to fit into an int, or < 64 to fit into a long

In one of my subproblems, I need to acquire a sorted array of indices into an unsorted array of 256 elements, each of which can be one of 65536 values. Since log2(256) + log2(65536) = 8 + 16 = 24, I can use ints.

description of desired behavior
-----------------------------
input: weight is an unsorted array of integers
output: index is an array of integers such that
weight[index[i]] < weight[index[i+1]] OR
(weight[index[i]] == weight[index[i+1]] AND index[i] < index[i+1])

description of algorithm
----------------------
1. preprocess the data by shifting the values 8 bits to the left and ORing in the index
2. invoke a non-stable sorting algorithm, e.g. quicksort
3. postprocess the sorted data by extracting the index

java implementation
-------------------


public static int[] generateSortedIndices(int[] weight, int bits)
{
    final int size = weight.length;
    assert size <= 1 << bits;

    // pack each weight into the upper bits and its index into the lower bits
    int[] index = new int[size];
    for (int i = 0; i < size; ++i)
    {
        index[i] = (weight[i] << bits) | i;
    }

    // any (even non-stable) sort now orders by weight first, then by index
    java.util.Arrays.sort(index);

    // strip the weights again, keeping only the indices
    final int mask = (1 << bits) - 1;
    for (int i = 0; i < size; ++i)
    {
        index[i] &= mask;
    }
    return index;
}



The basic idea behind this "hack" is that equal weights are made unequal for the sorting algorithm by the index hiding in the lower 8 bits ;)
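
A tiny usage example for the method above - the two equal weights keep their original index order:

public static void main(String[] args)
{
    int[] weights = { 500, 10, 42, 500 };
    int[] order = generateSortedIndices(weights, 8);
    // prints [1, 2, 0, 3]: ascending by weight, ties ordered by original index
    System.out.println(java.util.Arrays.toString(order));
}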

[Edited by - DevFred on March 2, 2008 4:57:11 PM]

Quote:
Original post by DevFred
The basic idea behind this "hack" is that equal weights are made unequal for the sorting algorithm by the index hiding in the lower 8 bits ;)


Not a bad idea; however, for larger datasets the two additional passes over the data could overshadow the savings from using a non-stable sort.

Quote:
Original post by DevFred
I have a large array (somewhere around 64K elements) with the majority of the elements already sorted. Am I right in assuming that I should use insertion sort to sort the array? What alternatives are there?

Insert them into the proper place initially, or reposition them when you modify them. There are several sorted containers that will do this for you.

Also, 64K is a fairly easy-to-handle list unless you left off some very important details. Are you sure this is an issue? How do you know?
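
In Java, for instance, something along these lines would keep the indices ordered by weight as they are added - just a rough sketch (names are illustrative), and note that it boxes the int keys:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Rough sketch: a TreeMap keyed by weight keeps entries ordered as they are
// inserted; appending equal-weight indices to a list preserves their original
// order, so reading the map back out gives a stable ordering.
public static int[] sortedIndicesViaTreeMap(int[] weight)
{
    TreeMap<Integer, List<Integer>> byWeight = new TreeMap<Integer, List<Integer>>();
    for (int i = 0; i < weight.length; i++)
    {
        List<Integer> bucket = byWeight.get(weight[i]);
        if (bucket == null)
        {
            bucket = new ArrayList<Integer>();
            byWeight.put(weight[i], bucket);
        }
        bucket.add(i);
    }
    int[] result = new int[weight.length];
    int pos = 0;
    for (Map.Entry<Integer, List<Integer>> entry : byWeight.entrySet())
    {
        for (int idx : entry.getValue())
        {
            result[pos++] = idx;
        }
    }
    return result;
}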

If you want to find lots of different sorting algorithms, then you probably won't find more than I've implemented here.

Possible candidates for common stable array sorting algorithms that are about O(n) on nearly sorted data include:

Improved BubbleSort,
ShakerSort,
InsertionSort,
StrandSort,
SplayTreeSort,
RadixSort.

You can find C++ implementations of all of these by following the above link.

dmatter: Merge Sort can be done in-place without O(n) extra space. The stable_sort in VS does this.

Hm, "the majority of the elements already sorted" wasn't precise enough :)

The first 256 elements are unsorted, and the following 65280 elements are sorted in descending order. I ended up sorting only the first 256 elements with my "stable quicksort" in ascending order and then merging the two sorted lists into a new list.

Since the first part is sorted in ascending order and the second part is sorted in descending order, the merging can be done quite nicely (no need to check if one of the lists is already completely processed).
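
In code, the merge looks roughly like this, assuming the ascending run and the descending run sit back to back in one array (untested sketch of the idea):

// Merge a bitonic array (ascending prefix followed by descending suffix)
// into a new ascending array. The smallest remaining element is always at
// one of the two ends, so there is no "is this run exhausted?" check: the
// two pointers simply walk toward each other and meet on the last step.
public static int[] bitonicMerge(int[] a)
{
    int[] out = new int[a.length];
    int lo = 0, hi = a.length - 1;
    for (int k = 0; k < out.length; k++)
    {
        out[k] = (a[lo] <= a[hi]) ? a[lo++] : a[hi--];
    }
    return out;
}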

According to this website:
http://warp.povusers.org/SortComparison/
The fastest sorting algorithm for almost sorted arrays (beginning or end of the array is unsorted) is Shell Sort! :)
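
For reference, Shell sort is essentially a gapped insertion sort; here's a minimal, untested sketch using the simple n/2, n/4, ..., 1 gap sequence (note that it is not stable):

// Shell sort: insertion sort over progressively smaller gaps. Not stable.
public static void shellSort(int[] a)
{
    for (int gap = a.length / 2; gap > 0; gap /= 2)
    {
        for (int i = gap; i < a.length; i++)
        {
            int key = a[i];
            int j = i;
            // gapped insertion: shift larger elements gap slots to the right
            while (j >= gap && a[j - gap] > key)
            {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = key;
        }
    }
}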

Quote:
Original post by DevFred
Hm, "the majority of the elements already sorted" wasn't precise enough :)

The first 256 elements are unsorted, and the following 65280 elements are sorted in descending order. I ended up sorting only the first 256 elements with my "stable quicksort" in ascending order and then merging the two sorted lists into a new list.

Since the first part is sorted in ascending order and the second part is sorted in descending order, the merging can be done quite nicely (no need to check if one of the lists is already completely processed).
That's known as a bitonic merge, which as you've noticed can be done easily enough without using extra space.

Note, I did not suggest shellsort because it is not stable. It could have been used, but be wary of what guardian24 just mentioned: ShellSort is reasonably good for reasonably sorted data, but it won't beat plain InsertionSort for data that is very close to already sorted. The reason is that ShellSort does far more comparisons in that case.

Quote:
Original post by DevFred
The first 256 elements are unsorted, and the following 65280 elements are sorted in descending order. I ended up sorting only the first 256 elements with my "stable quicksort" in ascending order and then merging the two sorted lists into a new list.

If they're already in two lists, and you want to keep the first list sorted, why are you not using a sorted list in the first place?

Insertions involve an O(log n) search repeated for m elements to be inserted. (Any of binary_search, lower_bound, or upper_bound used with insert.) That's much faster than adding your 256 elements and then sorting. It is roughly the same computational work as sorting your 256 elements and then doing an insertion sort, but much less programmer work.

Quote:
Original post by frob
Quote:
Original post by DevFred
The first 256 elements are unsorted, and the following 65280 elements are sorted in descending order. I ended up sorting only the first 256 elements with my "stable quicksort" in ascending order and then merging the two sorted lists into a new list.

If they're already in two lists, and you want to keep the first list sorted, why are you not using a sorted list in the first place?

Insertions involve an O(log n) search repeated for m elements to be inserted. (Any of binary_search, lower_bound, or upper_bound used with insert.) That's much faster than adding your 256 elements and then sorting. It is roughly the same computational work as sorting your 256 elements and then doing an insertion sort, but much less programmer work.


You forgot to count the cost of an insertion.

I believe the only data structure that offers both an O(log n) search and an O(1) insertion is a tree.

Certainly there is no "list" data structure that offers both. A linear sorted array offers O(log n) search but O(n) insertion, while a linked list offers an O(n) search and an O(1) insertion.

Neither of these (obviously) resolves down to something as good as an O(n log n) sort. The former resolves to O(n^2) (the O(n) insertions dominate the O(log n) searches) and the latter to O(n^2) as well... neither is as good as you thought, right?

I realize that some linked list methods offer better than an O(n) average search while still offering an O(~1) insertion, but implementing one of those seems like a lot of work when the theoretical best it can do is in fact O(n log n), just like every decent comparison-based sort.

Additionally, linked lists and trees are generally not cache friendly. Cache-oblivious implementations are often quite complex and are relatively "new", so you cannot expect any help from a compiler's standard library.

Seriously... his best bet is to load those 256 items into an array, sort it using a solid O(n log n) algorithm, and then do a bitonic merge of his two lists.

Quote:
Original post by Rockoon1
You forgot to count the cost of an insertion.

I believe the only data structure that offers both an O(log n) search and an O(1) insertion is a tree.

Certainly there is no "list" data structure that offers both. A linear sorted array offers O(log n) search but O(n) insertion, while a linked list offers an O(n) search and an O(1) insertion.

There are several binary search algorithms for previously sorted containers.


std::binary_search()
Complexity: At most log(last - first) + 2 comparisons.

If you are trying to get a stable sort, you can use
std::upper_bound(), or std::lower_bound()
Both are At most log(last - first) + 1 comparisons.

All three return an iterator suitable for insert().

Coupled with a std::list<>, amortized performance is as I described above.

Quote:
Original post by Rockoon1
You forgot to count the cost of an insertion.

I believe the only data structure that offers both an O(log n) search and an O(1) insertion is a tree.

Certainly there is no "list" data structure that offers both. A linear sorted array offers O(log n) search but O(n) insertion, while a linked list offers an O(n) search and an O(1) insertion.

Hmm, IIRC a std::deque typically offers O(~log n) search and O(~1) insertion - don't quote me on that though, I don't know what the actual complexity guarantees of the standard say about this.

Quote:
Original post by frob
There are several binary search algorithms for previously sorted containers.


std::binary_search()
Complexity: At most log(last - first) + 2 comparisons.

If you are trying to get a stable sort, you can use
std::upper_bound(), or std::lower_bound()
Both are At most log(last - first) + 1 comparisons.

All three return an iterator suitable for insert().

Coupled with a std::list<>, amortized performance is as I described above.


You can't do a binary search on a linked list, and any implementation of lower_bound or upper_bound for a linked list will be O(N). The only way to efficiently insert the elements one at a time is if you're using a tree or similar structure (like a skip list). Linked lists are inefficient for searching and arrays are inefficient for inserting, so in either case merging is more efficient.

Quote:
Original post by dmatter
Hmm, IIRC a std::deque typically offers O(~log n) search and O(~1) insertion - don't quote me on that though, I don't know what the actual complexity guarantees of the standard say about this.


A deque offers O(1) insertion only at the beginning and end. Insertions at random locations are O(N).

Quote:
Original post by Vorpy
Quote:
Original post by dmatter
Hmm, IIRC a std::deque typically offers O(~log n) search and O(~1) insertion - don't quote me on that though, I don't know what the actual complexity guarantees of the standard say about this.


A deque offers O(1) insertion only at the beginning and end. Insertions at random locations are O(N).

Yeah that's what I meant.
Having now applied a few more brain cells to this, it occurs to me that Rockoon1 probably meant insertions at arbitrary locations. [rolleyes]

Quote:
Original post by dmatter
Having now applied a few more brain cells to this, it occurs to me that Rockoon1 probably meant insertions at arbitrary locations. [rolleyes]


Well I don't know what other kind of insertion could have been implied given the topic. The reason for the list search is to find the insertion point.

The strategy itself is sound: use a data structure that offers O(log N) search and O(1) insertion. The problem I noted was that the proposed data structure (a list) does not offer those properties (AFAIK, certainly not using what is traditionally considered a list).

...however a tree does offer those properties, and it's a well-known technique for accumulating unordered data into ordered form.

In my eyes the issue is whether or not a tree can be leveraged in a way which is superior to a sort in this context, and I do not believe that to be so. If it were superior in this case then a binary search tree would most certainly be considered THE way to sort data, yet it isn't. It's perfect when your primary operations are searching and inserting, but not when your primary operations also include merging and iterating.
