Quicksort Pivot Selection

Quote:Original post by iMalc
Quote:Original post by implicit
Quote:Original post by Martin
Quote:Original post by Adam_42
If I remember right, you can eliminate the chance of a stack overflow very simply - always recurse into the smaller partition, and then split the bigger one and repeat.

That's a good optimisation, yes. It doesn't eliminate the chance of stack overflow, but it does significantly reduce stack usage, i.e. it will take a much, much larger data set to generate a stack overflow.
I wouldn't worry about that, in practice you'll run out of address space *long* before causing a stack overflow (e.g. one stack frame per address bit at worst.)
There are certainly exceptions, such as microcontrollers with tiny fixed-depth stacks, but hopefully anyone writing software for one of those will know to avoid recursion.

Since it reduces the stack usage to O(log n), it's not so much a case of requiring a significantly larger data set to cause a stack overflow: multiplying the number of items by 1000 adds only about ten extra levels of recursion (log2 1000 ≈ 10).
It's more a case of only overflowing the stack if it was already extremely close to the limit at the point of the call to perform the sort.

Practically speaking, the optimisation mentioned does eliminate the likelihood of blowing the stack: unless the stack was already extremely close to being exhausted, even being asked to sort 4 billion items in the pathologically worst ordering won't be a problem.
Quicksort stack frames are fairly small in typical implementations.
Well, yeah... Wasn't that what I said?
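For anyone skimming, here's a minimal sketch of the trick Adam_42 describes - recurse only into the smaller partition, then loop on the larger one - so the stack depth stays O(log n) even on pathological inputs. The middle-element pivot and Hoare-style partition are illustrative choices here, not anything this thread prescribes.

#include <algorithm>

// Sorts a[lo..hi] inclusive. Recursion always takes the smaller partition,
// so the recursion depth is bounded by log2(n); the larger partition is
// handled by the enclosing loop instead of a second recursive call.
void quicksort(int* a, long lo, long hi)
{
    while (lo < hi)
    {
        const int pivot = a[lo + (hi - lo) / 2];  // middle-element pivot
        long i = lo, j = hi;
        while (i <= j)                            // Hoare-style partition
        {
            while (a[i] < pivot) ++i;
            while (a[j] > pivot) --j;
            if (i <= j) std::swap(a[i++], a[j--]);
        }
        if (j - lo < hi - i)        // left side is the smaller one
        {
            quicksort(a, lo, j);    // recurse into the smaller side
            lo = i;                 // loop around on the larger side
        }
        else
        {
            quicksort(a, i, hi);    // recurse into the smaller (right) side
            hi = j;
        }
    }
}

Called as quicksort(data, 0, n - 1); with 4 billion items the recursion can never nest more than about 32 deep, which is the point being argued above.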
Quote:Original post by implicit
Well, yeah... Wasn't that what I said?


Yeah. Sorry, I wasn't disagreeing with you. I was disagreeing with the other guy and quoting you for support.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Original post by iMalc
Yeah. Sorry, I wasn't disagreeing with you. I was disagreeing with the other guy and quoting you for support.
Whoa... You can do that? ;)
Quote:Original post by iMalc

Practically speaking, the optimisation mentioned does eliminate the likelihood of blowing the stack: unless the stack was already extremely close to being exhausted, even being asked to sort 4 billion items in the pathologically worst ordering won't be a problem.
Quicksort stack frames are fairly small in typical implementations.


When designing such algorithms, there are two cases:
- You need an absolute, precise guarantee, proven and verifiable
- Everything else

"Chance" is a very bad thing if reliability matters. Entire Java ecosystem, especially anything open source in there is designed around that. Which means you can almost bet that any of such solution will blow up hard when pressed just slightly, or when coming close to memory bounds.

Practice, however, has shown that very few actually need reliable algorithms (i.e. will pay for them), at which point anything goes; even bubble sort is a valid solution.

From a business perspective, bubble sort (or, more realistically, heap sort) is better, since it is deterministic and does not rely on "chance". Even probabilistic algorithms can have their bounds analyzed if needed. Otherwise, "chance" means "hopefully the machine is big enough, but we don't know or care how to test that" - which it frequently is.
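As a point of comparison, the deterministic alternative mentioned here is two calls with the standard library - a minimal sketch, and the choice of std::make_heap/std::sort_heap is an illustrative one, not something from this thread:

#include <algorithm>
#include <vector>

// Heap sort: O(n log n) worst case, O(1) auxiliary space, no recursion.
// Nothing here depends on "chance" in the pivot-selection sense.
void heapsort(std::vector<int>& v)
{
    std::make_heap(v.begin(), v.end());  // build a max-heap, O(n)
    std::sort_heap(v.begin(), v.end());  // repeatedly pop the max, O(n log n)
}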

This doesn't apply so much to QS (it's a bit of a dead horse) as to the general pragmatic approach to application design.

As far as QS goes, it is possible to determine the upper bound on the stack, implement the stack manually, and fail predictably if there is not enough memory. That doesn't necessarily solve implicit stack smashing, but neither do other function calls (if needed, they can be validated by making platform-specific assumptions and using benevolent hacks to assert them). It does give a fail-fast version which, at the very least, covers the algorithm part.
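A minimal sketch of that fail-fast idea, with the recursion replaced by a fixed-size manual stack. The 64-entry bound and the assert as the failure policy are assumptions for illustration; pushing the larger sub-range and looping on the smaller is what keeps the pending-range count within log2(n):

#include <algorithm>
#include <cassert>

struct Range { long lo, hi; };

// Iterative quicksort over a[0..n-1] with an explicit, bounded stack.
// Since we always defer the larger sub-range and keep working on the
// smaller, at most log2(n) ranges are ever pending, so 64 slots cover
// any count addressable on a 64-bit machine.
void quicksort_iter(int* a, long n)
{
    Range stack[64];
    int top = 0;
    long lo = 0, hi = n - 1;
    for (;;)
    {
        while (lo < hi)
        {
            const int pivot = a[lo + (hi - lo) / 2];
            long i = lo, j = hi;
            while (i <= j)                        // Hoare-style partition
            {
                while (a[i] < pivot) ++i;
                while (a[j] > pivot) --j;
                if (i <= j) std::swap(a[i++], a[j--]);
            }
            assert(top < 64 && "stack bound exceeded - fail fast");
            if (j - lo < hi - i) { stack[top++] = Range{i, hi}; hi = j; }  // defer larger right side
            else                 { stack[top++] = Range{lo, j}; lo = i; }  // defer larger left side
        }
        if (top == 0) return;   // nothing deferred - done
        --top;
        lo = stack[top].lo;     // resume a deferred (larger) range
        hi = stack[top].hi;
    }
}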
Thanks for all the replies, everyone,

I'll take a look at some of the suggested methods!
Quote:Original post by implicit
I wouldn't worry about that, in practice you'll run out of address space *long* before causing a stack overflow (e.g. one stack frame per address bit at worst.)
There are certainly exceptions, such as microcontrollers with tiny fixed-depth stacks, but hopefully anyone writing software for one of those will know to avoid recursion.


Hmm, this is true; it challenges my own assumptions. The worst-case data set you could provide (from the point of view of generating the most recursion) would be one which always splits the data set exactly in two; if the split were uneven, the smaller set would be dealt with recursively, meaning less recursion.

So for a 64-bit address space, sorting bytes, we have a maximum of 2^64 - 1 bytes to sort, and you can see immediately that the maximum recursion depth is 64 - 1 = 63.
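Spelling that bound out as a small worked step (with d the recursion depth, and noting we only recurse into ranges of at least two items): since each recursive call takes the smaller partition, a call at depth d sees at most n / 2^d items, so

\[
\frac{n}{2^{d}} \ge 2 \;\Longrightarrow\; d \le \log_2 n - 1 = 64 - 1 = 63 .
\]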

[Edited by - Martin on March 5, 2010 4:40:41 AM]
Cheers, Martin
If I've helped you, a rating++ would be appreciated
Quote:Original post by Martin
So for a 64-bit address space, sorting bytes, we have a maximum of 2^64 - 1 bytes to sort, and you can see immediately that the maximum recursion depth is 64 - 1 = 63.
Yup, you nailed it.

Novark, perhaps you missed my question earlier. Is this for an array or a linked-list?
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Novark, perhaps you missed my question earlier. Is this for an array or a linked-list?


I'm using arrays in my implementation, but feel free to share your method using linked lists if you want!

