Sorting algorithm


Recommended Posts

Hi, I've got some code that looks something like this:

struct SORT { int index; int value; };

SORT sort[10];
for (int n = 0; n < 10; n++)
{
    sort[n].index = n;
    sort[n].value = rand() % 32;
}

Now I want to order the indices so that the element with the greatest value has its index member set to 0, and so on. But I can't figure out how to do this. Please help.

Share on other sites
Quick Sort will do this for you. The link is to Java source, but it's easy to understand.

Share on other sites
For large datasets, radix sort is unbeatable.

AFAIK it's the only sorting algorithm that runs in linear time, and it's not hard to implement either.

Share on other sites
Actually, radix sort is O(n) only if you allow it to have an O(n²) memory complexity, which can sometimes be impossible (I mean, just imagine what you would need to sort 100,000 integers), and is usually a big drawback on huge datasets.

Otherwise, radix sort usually runs in O( n log N ) where N is the size of the biggest integer, because it reduces the amount of bins by increasing the amount of passes. Sadly, if the biggest integer is n (if, for instance, you're sorting a set with mostly different objects), it will be in O(n log n).

- Quicksort is fast, O(n log n) and has a very tight inner loop (that allows it to be faster than other O(n log n) sorts), but has a O(n²) worst-case complexity. If you know your data are not already sorted or almost sorted, use it.

- Merge sort is theoretically fast, O(n log n), but does not work well with arrays (it cannot be done in place, like the others). On the other hand, it's a good alternative on lists.

- Heap sort is fast as well, O(n log n), can be done in place just like quicksort, and does not suffer from the same worst-case complexity as quicksort. If your data might be almost totally sorted, use it instead of quicksort.

- Bubble sort is slow, O(n²), but is useful if your data is almost sorted (in which case it runs close to O(n), making it the fastest sort algorithm to run on sorted data).

A final word: if using C++, you have sort algorithms available as a standard, default part of the language, such as std::sort in <algorithm>, which sorts an array.
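To make the std::sort suggestion concrete for the original poster's struct, here is a minimal sketch. The comparator name and the sortIndices helper are my own assumptions, not code from this thread; it physically reorders the structs and then renumbers the index members so the greatest value ends up with index 0.

```cpp
#include <algorithm>
#include <cstdlib>

struct SORT { int index; int value; };

// Hypothetical comparator (not from the original post): greatest value first.
bool byValueDesc(const SORT& a, const SORT& b) { return a.value > b.value; }

// Reorder the structs descending by value, then renumber the index members
// so the element with the greatest value gets index 0, the next index 1, ...
void sortIndices(SORT* a, int n) {
    std::sort(a, a + n, byValueDesc);
    for (int i = 0; i < n; i++)
        a[i].index = i;
}
```

With the poster's setup (values from rand() % 32), calling sortIndices(sort, 10) leaves sort[0] holding the greatest value with its index member set to 0.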

Share on other sites
The above was me... looks like it didn't accept my password.

Share on other sites
I'm fairly certain that radix sort can be done in O(n + k) memory and O(k * n) complexity where n is the number of elements and k is the size of each element (usually in bytes, since that makes the lookup table small). Since k would be a constant per application, it simplifies to O(n) on both accounts.

Share on other sites
Extrarius: yes, k is the number of digits, in base B (in your case, B=256), of N, the largest number. So k = log_B N, hence my O(n log N) statement.

However, in the computer world, n is smaller than N, so quicksort, merge sort, and heap sort are all in O(n log N) (because n log n < n log N), which can all be simplified as O(n) by your reasoning.

By saying that "k is a constant", you're reducing the problem from a general environment to the 64-bit or 32-bit integer environment we work with. And in this environment, log n is also smaller than a constant = O(max pointer size), so all O(n log n) complexities will end up being O(n).

Share on other sites
Quote:
 Original post by ToohrVyk
 Extrarius: yes, k is the number of digits, in base B (in your case, B=256), of N, the largest number. So k = log_B N, hence my O(n log N) statement.

 However, in the computer world, n is smaller than N, so quicksort, merge sort, and heap sort are all in O(n log N) (because n log n < n log N), which can all be simplified as O(n) by your reasoning.

 By saying that "k is a constant", you're reducing the problem from a general environment to the 64-bit or 32-bit integer environment we work with. And in this environment, log n is also smaller than a constant = O(max pointer size), so all O(n log n) complexities will end up being O(n).

I'm sorry, but I don't think I get you.

Maybe what you're trying to say is that radix sort depends on the actual number of bits to sort, instead of the number of entries? That would be totally true, yet it's still linear in the number of entries nonetheless. Sure, sorting a dataset of 64-bit numbers will be twice as slow as an equal-sized dataset of 32-bit numbers, but as I said: for large datasets, linear time complexity just can't be beaten.

As far as memory consumption is concerned: it's also linear. True, it's not in-place, but you only need one back buffer plus a couple of bytes for an offset and count table. Not a big deal either, really.
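The one-back-buffer layout described above can be sketched as a byte-wise LSD radix sort. This is my own sketch, not the poster's actual code; it sorts 32-bit unsigned keys in four passes, using exactly one back buffer plus a 256-entry count/offset table per pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on 32-bit keys: one pass per byte, four passes total.
// Extra memory: one back buffer the size of the input, plus 256-entry tables.
void radixSort(std::vector<uint32_t>& a) {
    std::vector<uint32_t> buf(a.size());   // the single back buffer
    for (int shift = 0; shift < 32; shift += 8) {
        // Count how many keys fall into each of the 256 bins for this byte.
        size_t count[256] = {0};
        for (size_t i = 0; i < a.size(); ++i)
            ++count[(a[i] >> shift) & 0xFF];
        // Prefix-sum the counts into starting offsets.
        size_t offset[256];
        size_t sum = 0;
        for (int b = 0; b < 256; ++b) { offset[b] = sum; sum += count[b]; }
        // Stable scatter into the back buffer, then swap buffers.
        for (size_t i = 0; i < a.size(); ++i)
            buf[offset[(a[i] >> shift) & 0xFF]++] = a[i];
        a.swap(buf);
    }
}
```

Since 32 bits means an even number of passes, the sorted result ends up back in the input vector after the final swap.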

Share on other sites
Assuming you know how to sort, then this problem is only slightly different. Normally you would do your comparison as sort[i].value > sort[j].value. Here you use sort[sort[i].index].value > sort[sort[j].index].value. Assuming i < j, then if that condition is true you swap the indices, i.e. temp = sort[i].index; sort[i].index = sort[j].index; sort[j].index = temp. The values stay in the position they started in. The indices move from row to row. At the end, the index stored in the first row is the index of the row with the smallest value.

Share on other sites
Eelco: radix sort is linear in the number of entries. What I want to point out is that O-notation will hide constants away from you, and that in the case of radix sort, that constant is somewhere around log N. This is why in almost all cases, where n < N, I might prefer using a heap sort with O( n log n ) complexity instead of a sort in O( n ) that is actually a hidden O( n log N ).

And how about pushing that reasoning to the extreme? Here:

I recently coded an algorithm that sorts, in place, using a heapsort-like algorithm, a table of at most 2^32 numbers (you won't be needing more for most applications). Of course, you can submit smaller datasets.

The sort occurs in O(N log N), where N = 2^32, because that's how heap sort performs on my table. Since the algorithm doesn't depend on the number of entries, it works in O(1).

Yes, I see your point: my algorithm only allows a number of entries smaller than 2^32. Indeed, I did make that approximation here: a pointer is only 32 bits, hence the limited amount of memory. But radix sort does the same approximation: an integer is only 32 bits, hence the limited amount of passes.

Do you understand my point now? (And do you agree?)

[BTW, don't say someone's opinion is wrong if you admitted, in the same post, that you didn't understand it]

Share on other sites
Quote:
 Original post by ToohrVyk
 Eelco: radix sort is linear in the number of entries. What I want to point out is that O-notation will hide constants away from you, and that in the case of radix sort, that constant is somewhere around log N. This is why in almost all cases, where n < N, I might prefer using a heap sort with O( n log n ) complexity instead of a sort in O( n ) that is actually a hidden O( n log N ).

 And how about pushing that reasoning to the extreme? Here:

 I recently coded an algorithm that sorts, in place, using a heapsort-like algorithm, a table of at most 2^32 numbers (you won't be needing more for most applications). Of course, you can submit smaller datasets.

 The sort occurs in O(N log N), where N = 2^32, because that's how heap sort performs on my table. Since the algorithm doesn't depend on the number of entries, it works in O(1).

 Yes, I see your point: my algorithm only allows a number of entries smaller than 2^32. Indeed, I did make that approximation here: a pointer is only 32 bits, hence the limited amount of memory. But radix sort does the same approximation: an integer is only 32 bits, hence the limited amount of passes.

 Do you understand my point now? (And do you agree?)

No, I still don't understand you completely. Although I get your point about hidden constants, my experience is that they are relatively small in a decent radix sort, and they are still insignificant compared to the extra log(n) term.

Quote:
 [BTW, don't say someone's opinion is wrong if you admitted, in the same post, that you didn't understand it]

Sorry, it was a joke, probably a bad one. However, you are clearly wrong about the memory requirements; of that much I'm entirely sure.

Share on other sites

Not algorithmically, but if your data is already sorted you'll have zero cache misses, in other words: muchos speed. Other algorithms will AFAIK never benefit from the cache unless you do some funky data management to line things up.

Share on other sites
About memory requirements: you have a choice of two complexities for radix sort,

- O(n*B) memory and O(B + n log_B N) time (standard radix sort).
- O(n*N) memory and O(n+N) time (bin sort, a special case of radix sort where B = N).

Since n < N, n*N is bigger than n², therefore I was indeed wrong: O(n²) is too optimistic.

Quote:
 No, I still don't understand you completely. Although I get your point about hidden constants, my experience is that they are relatively small in a decent radix sort, and they are still insignificant compared to the extra log(n) term.

Ah-ha! But since n < N, log(n) is insignificant when compared to log(N) ! J/k
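For reference, the bin-sort special case mentioned above (B = N) can be sketched as a counting sort. This is my own sketch, under the assumption that every key lies in [0, N); it runs in O(n + N) time with a count table of size N.

```cpp
#include <cstddef>
#include <vector>

// Bin sort / counting sort: one bin per possible key value.
// Assumes every key is in [0, N). O(n + N) time, O(N) table memory.
void countingSort(std::vector<int>& a, int N) {
    std::vector<size_t> count(N, 0);
    for (size_t i = 0; i < a.size(); ++i)
        ++count[a[i]];                 // tally each key into its bin
    size_t out = 0;
    for (int key = 0; key < N; ++key)  // walk the bins in order
        for (size_t c = 0; c < count[key]; ++c)
            a[out++] = key;
}
```

This is exactly why it blows up when N is huge: the table, not the data, dominates the memory.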

Share on other sites
Yes, but all your cache voodoo will not save you from unsortable floating-point lists! ;)

Share on other sites
You don't need to reimplement the sorting algo. STL already has some.

std::stable_sort( sort, sort + 10, std::greater<SORT>() );

It uses the merge sort algorithm, and as ToohrVyk has said, it has O(n log n) complexity, which is as fast as a comparison sort can get. And it's stable. The bad thing about merge sort on arrays: the algorithm requires about O(n) extra memory. But radix sort, which is O(n) and can only be used to sort integers (and yours is a struct), also requires a lot of extra memory. So I recommend merge sort. Remember to overload operator> first.

std::sort( sort, sort + 10, std::greater<SORT>() );

Uses the quick sort algorithm. Average case is O(n log n). Worst case is O(n²). Unstable.

Or you may want to use introsort, the best sort in most cases, a combination of quick sort and heap sort. STL doesn't have it, so you probably need to do some research first.
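Putting the stable_sort suggestion together, here is a minimal compiling sketch. The operator> overload and the stableSortDesc wrapper are my assumptions (std::greater<SORT> evaluates a > b, so the struct needs operator>, not operator<):

```cpp
#include <algorithm>
#include <functional>

struct SORT { int index; int value; };

// std::greater<SORT> calls a > b, so overload operator> on the struct.
bool operator>(const SORT& a, const SORT& b) { return a.value > b.value; }

// Stable descending sort: equal values keep their original relative order.
void stableSortDesc(SORT* a, int n) {
    std::stable_sort(a, a + n, std::greater<SORT>());
}
```

The stability matters here: two elements with equal values come out in the same order they went in, which plain std::sort does not guarantee.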

Share on other sites
ToohrVyk: Radix sort is linear on number of entries, but the other sorts you mention are not because of the log(n) term they require. The difference is that radix sort is ALSO linear on the size of each individual element, while with other sorts the element sizes don't matter at all (algorithmically). Thus, radix sort reduces correctly to O(n) in both memory and time requirements, while other sorts remain O(n log(n)) as all comparison sorts must.

Radix sort is limited, however, in that it can only sort things with a simple mapping from the individual blocks of the element to its value (such as can be done with numeric data types, but cannot be done with polymorphic classes, for example). Really though, the whole element doesn't need to meet this requirement, just the 'key' part that is used for the sorting. Any data you want can follow its key along for the ride.

Share on other sites
Quote:
 Original post by ToohrVyk
 About memory requirements: you have a choice of two complexities for radix sort,

 - O(n*B) memory and O(B + n log_B N) time (standard radix sort).
 - O(n*N) memory and O(n+N) time (bin sort, a special case of radix sort where B = N).

 Since n < N, n*N is bigger than n², therefore I was indeed wrong: O(n²) is too optimistic.

Too optimistic?

The only memory required is simply:

one buffer to copy the data to: linear, and insignificant because any realtime-sortable dataset will always be << available memory
+
2 lookup tables with memory requirements that were already laughable in the '80s

Share on other sites
Quote:
 Original post by ToohrVykYes, but all your cache voodoo will not save you from unsortable floating-point lists! ;)

Again we differ.

If you're interested, I could post my radix float sort for you. But I didn't invent it myself, so you could Google it as well. It's only a few % slower than on integers.

Share on other sites
Quote:
 Original post by ToohrVykYes, but all your cache voodoo will not save you from unsortable floating-point lists! ;)
What do you mean? Floats can be sorted exactly as integers. Internally, the integer representation of a float is ordered in the same way as the floats themselves are, thus you can simply apply radix sort to the integer representation of floats to sort them.
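A caveat on the claim above: the raw integer representation only matches float order for non-negative values, because floats use sign-magnitude. The standard trick (my sketch, not necessarily the poster's version) is to flip the sign bit of non-negatives and all bits of negatives, after which unsigned integer order matches float order:

```cpp
#include <cstdint>
#include <cstring>

// Map a float's bit pattern to an unsigned key whose natural order matches
// the float order: flip all bits of negatives (sign-magnitude reverses their
// order), and just the sign bit of non-negatives (to put them above).
uint32_t floatKey(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);   // read the raw bits without aliasing UB
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}
```

Radix-sorting these keys sorts the floats; the mapping is reversible, so the original values can be recovered afterward.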

Share on other sites
Right, my float example was completely off.

Share on other sites
Quote:
 Original post by Eelco
 Too optimistic?

 The only memory required is simply:

 one buffer to copy the data to: linear, and insignificant because any realtime-sortable dataset will always be << available memory
 +
 2 lookup tables with memory requirements that were already laughable in the '80s

Yes, that could have been me being dumb, too.

Share on other sites
for (int n = 0; n < 10; n++)
{
    for (int m = n + 1; m < 10; m++)
    {
        if (data[data[n].nIndex].nValue < data[data[m].nIndex].nValue)
        {
            int index = data[n].nIndex;
            data[n].nIndex = data[m].nIndex;
            data[m].nIndex = index;
        }
    }
}

This didn't work. Why?

Share on other sites
Well, your comparison is backwards, assuming you want ascending order. More likely it is because your values are not in order: they don't move, just the indices move. So to actually display the list in order you have to use data[data[n].nIndex].nValue.
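The indirection described above can be sketched as a small helper (the DATA struct mirrors the field names in the earlier code; the valueAt helper is my own addition):

```cpp
struct DATA { int nIndex; int nValue; };

// Read the n-th value in sorted order through the index indirection:
// the values never move, only the nIndex members were reordered.
int valueAt(const DATA* data, int n) {
    return data[data[n].nIndex].nValue;
}
```

So after sorting the indices ascending, valueAt(data, 0) is the smallest value even though data[0].nValue may not be.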

Share on other sites
Plus, it should be:

for (int n = 0; n < (10 - 1); n++)
{
    for (int m = n; m < (10 - 1) - n; m++)
    {

at the beginning. But that's about the slowest sort you can do; I would use one of the ones already suggested in the thread if you plan to sort much more...

Share on other sites
The values will be 32 at most.