Sorting algorithm.

This topic is 4667 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

Basically, this is a non-inplace sorting algorithm designed to replace quicksort on some datasets. (not sure about what its good at yet.) First, take N records from the data that your sorting, and sort them. (using bubble/insertion sort). From those, use binary search to figure out where the first element goes in the second array. Now, get the last element from the array, and use binary search (using the first element as min), to figure out where it goes. Now, for every other element, use min and max values, which get shifted inwards every time one record finished. Once thats done, you get another N records and start again. Now, how does this fare? As an egsample, take the values 1 2 5 10 15 4 13 41 23 24 12 0 Now, were going 5 elements at a time. First 5 are: 1 2 5 10 15 There sorted in order. We then insert them in order making the secondary list 1 2 5 10 15 The next 5 are 4 13 41 23 24 Which, when sorted makes it 4 13 23 24 41 Which are then inserted into the second list as 1 2 4 5 10 13 15 23 24 41 The last two values are 12 0 Which sorted are 0, 12 and when there inserted into the list, they make 0 1 2 4 5 10 12 13 15 23 24 41 Which is the origional list, sorted. Now, for the min/max bit. Say your inserting 4 13 23 24 41 into 1 2 4 5 10 15 You go for the fist one. value 4, check 3, too big, check 2, too small, insert at location 3. Now, you add the last element which is 41 to the dataset. You set min to 3, max to 6, so you test #5. its too big for #5, and \$6, so you insert it in #7. 1 2 4 5 10 15 41 So now, you have bounds. (which gets smaller every time.) You then insert 13 then 23 then 24, that thats it. The next one needs to be further down still. Ect. How would this compare to other algos? From, Nice coder

Share on other sites
Well, making a gross analysis I get the following for sorting a list of N items in chunks of M items.

- do N/M times
- sort M items
- traverse a list of say N/2 items
- insert M items

As for the sorting, you suggested something like insertion sort (funny, a sorting algorithm using another sorting algorithm). The time complexity of insertion sort is O(n^2) worst case. The list traversal is linear, say O(N/2). And assuming insertion is constant time O(1), which it is probably not in vectors or lists, which is why other sorting algorithms use swapping, the complexity lies in the sub-sorting algorithm.

Hence it comes down to O(N/M * M^2) or basically O(NM). Since quicksort has an expected complexity of O(N log N) it generally beats your algorithm. Worst case, however, quicksort still does O(N^2) in which case you might be faster. However, that would require a more precise analysis which may turn out to prove something different, as this was something I thought up quickly. Moreover, the insertion and the all over implementational complexity as well as the not-in-placeness of your algorithm make it improbable for most uses, I think.

Greetz,

Ill.

Share on other sites
Yep, what definitely kills your performance is the fact that insertions into the middle of a vector are a linear-time operation. Even if the insertions were constant-time, the whole algorithm would still be average-case O(n²), because for each iteration, most of the resulting vector has to be traversed to find a place for each new element. Finding the min/max bounds won't help much on average case, it just decreases the hidden constant factor a bit (with some inputs it might even increase the factor!).

The regular quicksort is a very good choice for sorting smallish inputs. It has a low constant factor, and the worst-case quadratic behaviour doesn't matter much when inputs are small.

Share on other sites
@Nice Coder:
If you apply it recursively it is mergesort.

Share on other sites
Yeah, this looks to me like an attempt at a non-recursive mergesort implementation, building things from the bottom-up. It's definitely doable and may even be a good idea...

Share on other sites
What you're trying to describe bares some sililiarity to MergeSort, JSort, and StrandSort.
Probably JSort is the closest.

Also, example is spelt with an "x", not a "gs". I've been meaning to tell you that [smile].

Share on other sites
Ok. I thought that insertions are going to be the downside of this.

:(

ow well.

Another idea (o log2 n Fast Constant time) algorithm:

Start at the bottom of the array.Set the bottom element as the working element.Set the upper and lower second bounds to the bounds of the array.If the two bounds are close together then:       Sort using bubblesort, insertionsort, or similar       returnend ifWhile (the second bounds of the array do not cross) {if the working element has the current bit set then       Swap the working element, for the element at the upper second bound.       Lower the upper second bound by one.else       Increment the lower second bound by one.       Set the working element to be the element at the lower second bound.end if}Divide the current bit (as an int), by two.If this bit is 0, then return.Else:Call this function again, with the bounds being from the origional lower bound(low bound) to the current lower bound (heigh bound)Call this function again, with the bounds being the current heigher bound(as the lower bound), and the origional heigher bound as the heigh bound.Return

Its currently recursive, but that shouldn't be a problem should it?

From,
Nice coder

Share on other sites
Quote:
 Original post by iMalcWhat you're trying to describe bares some sililiarity to MergeSort, JSort, and StrandSort.Probably JSort is the closest.Also, example is spelt with an "x", not a "gs". I've been meaning to tell you that [smile].

I'm still in the habbit of using egsample. It just seems more natural to use eg. (considering that the contraction is e.g.)

For egsample, for eg. for example, for eg? (australian spelling, remember. We use colour instead of color.)

From,
Nice coer

Share on other sites
definition, but not super helpful on the sorting topic...sorry.

Share on other sites
Quote:
 Original post by Nice CoderI'm still in the habbit of using egsample. It just seems more natural to use eg. (considering that the contraction is e.g.)

It's latin: "Exempli Gratia", meaning "for example".

i.e. = "id est", "that is"

et al. = "et alia", "and others (people)"

etc. = "et caetera", "and others (things)"

Share on other sites
Quote:
Original post by Fruny
Quote:
 Original post by Nice CoderI'm still in the habbit of using egsample. It just seems more natural to use eg. (considering that the contraction is e.g.)

It's latin: "Exempli Gratia", meaning "for example".

i.e. = "id est", "that is"

et al. = "et alia", "and others (people)"

etc. = "et caetera", "and others (things)"

Very nifty.

I'll try to change over to example.

From,
Nice coder

Share on other sites
Quote:
 Original post by lonesockdefinition, but not super helpful on the sorting topic...sorry.

Ok....

Either way, any viewpoint is a good one. (just what do you think of it?
does it seem hard to code, have hidden costs, ect, ?)

From,
Nice coder

Share on other sites
If you're doing not-in-place sorting, rather than doing insertions into a vector (which can lead to horrible performance), use a binary tree, more specifically, a self balancing binary tree like the red-black tree. So, the algorithm becomes:
allocate memory for tree // you can build a tree in an arrayfor each element in array to sort    add element to self balancing treenextfor each element in tree    add element to end of vectornextdelete memory for tree

which gives you one allocation / deallocation, two copies per element (which can be very expensive depending on the class) plus O (log2 n), n = current element index (a summation of 0 <= n < number of elements). You can save a whole bunch of time by sorting an array of pointers to objects rather than objects if the object is greater in size than sizeof (int).

Probably not as fast as qsort (almost certainly slower if sorting pointers to objects).

Skizz

Share on other sites
Sounds now kinda like NC is describing radix sort or forward radix sort (Bitwise Sort).
Skizz is describing red-black-tree sort.

btw, For arrays I have templated C++ implementations of the following sorting algorithms, many of which are known by other names, and some of which are my own creation. In approximate order of best to worst Big-O-notation.

RadixSort, BitwiseSort, IntroSort, QuickSort, MeanSort, ProportionExtendSort, FlashSort, SmoothSort, FibonacciHeapSort, WeakHeapSort, HeapSort, BreadthFirstMergeSort, MergeSort, ShellSort, HybridCombSort, CombSort, AVLSort, RedBlackSort, AASort, SplaySort, BinaryTreeSort, BreadthBitonicSort, SemiBreadthBitonicSort, BitonicSort, ShearSort, JSort, InPlaceBreadthMergeSort, InPlaceMergeSort, ShellMergeSort, BidirectionalStrandSort, StrandSort, DualInsertionSort, InterpInsertionSort, BinaryInsertionSort, InsertionSort, DualSelectionSort, SelectionSort, BingoSort, SeveralUniqueSort, ShakerSort, ElevatorSort, BubbleSort, SimpleSort, OddEvenSort, GnomeSort, StupidSort, ExactSort, StableBozoSort, StoogeSort, FibSort, PermSort, BozoSort, and BogoSort.

All thoroughly tested of course, complete with asserts and unit tests, and a sorting demo program. I will release the code some day.
I know of a few that I haven't done yet (like telephone sort), and am always looking out for more. I'm not seeing anything new, worth implementing, here yet though.

Or if you are talking about list sorting, I could name the algorithms I have for that, etc. You could say I got a bit carried away with sorting...

So anyway, if you have any questions...

Share on other sites
Quote:
 Original post by iMalc... and BogoSort.

You've forgotten the linear-time Quantum Bogosort: randomize the data (constant-time quantum operation), and if it not sorted (linear-time check, maybe even constant-time), destroy the universe (constant-time).

Share on other sites
Quote:
 Original post by Skizzallocate memory for tree // you can build a tree in an arrayfor each element in array to sort add element to self balancing treenextfor each element in tree add element to end of vectornextdelete memory for treeSkizz

Sounds alot like a heap sort to me, which has best, average, and worst complexity always the same as qsort's average. It's basically what you have above, but with a heap ( priority queue ) instread of the RB tree, since all you care is that you can get the biggest element quickly, not that everything is sorted.

Share on other sites
Yes. It seems quite a bit like heapsort to me too.

But its a good idea.

From,
Nice coder

Share on other sites
What do you think about tthe algo?

From,
Nice coder

...

Share on other sites
It seems to beat quicksort.

(ts both in place, it does the same number of ops/node, and it runs in o(log2 n) max time. (Ie, most of the time, it hits a point where it switches to insertionsort.)

From,
Nice coder

Share on other sites
Quote:
 Original post by Nice CoderIt seems to beat quicksort.(ts both in place, it does the same number of ops/node, and it runs in o(log2 n) max time. (Ie, most of the time, it hits a point where it switches to insertionsort.)From,Nice coder

O( log2(n) )? Really? I think you'd get a prize, since you're sorting without even looking at some of them :P

iirc, quicksort is n, nlogn, n*n ( best, average, worst ) and heapsort is nlogn for all 3. If you want O( m*n ) always, try radix sort, although that one's /totally/ different conceptually...

Share on other sites
Sorry. n * log2(n) wost case.

Best case i don't know how to compute. (basically, it will always be as good or better then qyuicksort).

Why?

It groups from msb to lsb.
Once you hit a group, which is small enough (group == It shares the same msb's), something like insertionsort and bubblesort will beaat anything else.

And it just so happens that it does group all of the values together, and it does use insertionsort if the group is small enough.

From,
Nice coder

Share on other sites
Quote:
 Original post by me22iirc, quicksort is n, nlogn, n*n ( best, average, worst ) and heapsort is nlogn for all 3. If you want O( m*n ) always, try radix sort, although that one's /totally/ different conceptually...
Thanks for pointing out to him that no sorting algorithm can be better than O(n).

However, I'd like to point out to you that even QuickSort's best case is still O(nlogn), which occurs when the splitter chosen is exactly the median value of each sub-array.

Still, it is unlikely that NC is thinking of a sorting algorithm that's O(nlogn), yet is faster than quicksort on average. Implement it and you will probably see that the claim is wrong.

Share on other sites
Imalc. I think you misunderstand me.

Heres an egsapmle.

You have 2^32 elements that you want to sort. (there random, of cource).

So, at iteration 1, its at two lists, each one has 2^31 elements in them.
At iteration 2, there are 4 lists, each with 2^30 elements in them.
....
At iteration 25 (this is 8 iterations before quicksort ends).
There are 2^24 llists of 255 elements each.
2^24 is in the millions, so your doing a few million sorts for 255 elements.
you can sort 255 elements very very quickly, so it makes up the speed saving.

In compairon, quicksort would need to do a whole lot of work. (subdividing each list 8 times, copying eveything 8 times, before doing the rest of it.).

See?

Also, this is o(n log2 n) worst case. (assuming you didn´t do the final sort. I don´t know how to make two different o´s work in the same expression tho...)

Quicksort is o(n log2 n) best case. And somewhere near that for the average case.

See my point?

From,
Nice coder

Share on other sites
Also, (which its good to note)., that per node you only need:

One comparison and a swap. (comparison, to see wether or not it has the heighest power. See if it is larger then the power. If it is, then it must have it. Nifty eh [lol])

Also note: Each element per iteration is moved exactly once.
(compare to quicksort.)

Also consider that data is only being read and written to from two locations, which move sequentially.

With some loop unrolling, this is rather nice. (Two checks per how many operations to see if it needs to prefetch another block).

As well, you can have an explicit stack, which means that its simply three subtractions, and two memory acesses to get the memory bounds needed.

And now you see why i like it... (if there are a lot of small groups of things, this might only need a couple of iterations before it goes and sorts. However if there are a lot of very similar/same records, then its worst case performence is the same as quicksorts. o(n log2 n).)

From,
Nice coder