Sorting objects and parallelism

13 comments, last by Skiller 15 years, 5 months ago
Quote:
Shhh, don't give away the ending.

Good thing he did, because I'm not willing to correct this algorithm :P. And actually it works, just not for 10k elements (stack overflow) :)
Sorting 10k elements shouldn't cause a stack overflow, unless you really have a problem in your algorithm.
Indeed, quicksort only requires logarithmic memory usage. And even if it were linear or n log(n), I hardly see how you're going to overflow the stack with 10k elements...

Also, the worst case of quicksort is only triggered on nearly-sorted sequences when the pivot chosen is the first or last element.

Typically, std::sort uses quicksort with median-of-3 pivot selection, falling back to heapsort when worst-case behavior is detected, and switching to insertion sort for small ranges for cache friendliness (a combination known as introsort).
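For illustration, here's a minimal sketch of that strategy (my own hypothetical code, not the actual std::sort implementation): median-of-3 quicksort with a depth limit that triggers a heapsort fallback, leaving small ranges for a final insertion-sort pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using It = std::vector<int>::iterator;

// Quicksort with a depth budget; once it is exhausted (i.e. quicksort
// is degenerating), switch the remaining range to heapsort.
static void introsort_loop(It first, It last, int depth_limit)
{
    while (last - first > 16) {               // leave small ranges alone
        if (depth_limit-- == 0) {
            std::make_heap(first, last);
            std::sort_heap(first, last);      // heapsort fallback
            return;
        }
        // Median-of-3 pivot selection.
        It mid = first + (last - first) / 2;
        int pivot = std::max(std::min(*first, *mid),
                             std::min(std::max(*first, *mid), *(last - 1)));
        // Three-way split: [< pivot) [== pivot) [> pivot). The middle
        // group is never empty, which guarantees progress.
        It lt = std::partition(first, last,
                               [pivot](int x) { return x < pivot; });
        It gt = std::partition(lt, last,
                               [pivot](int x) { return x == pivot; });
        introsort_loop(first, lt, depth_limit);
        first = gt;                           // loop instead of recursing
    }
}

void introsort(std::vector<int>& v)
{
    if (v.size() < 2) return;
    int depth = 0;
    for (std::size_t n = v.size(); n > 1; n >>= 1) ++depth;
    introsort_loop(v.begin(), v.end(), 2 * depth);
    // Final insertion sort: cheap because every element is near its slot.
    for (It i = v.begin() + 1; i != v.end(); ++i)
        for (It j = i; j != v.begin() && *(j - 1) > *j; --j)
            std::swap(*(j - 1), *j);
}
```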
Have you found it not efficient enough? Maybe you could look into smoothsort then.

[Edited by - loufoque on November 11, 2008 4:30:34 PM]
Quote:Original post by loufoque
Sorting 10k elements shouldn't cause a stack overflow [...] quicksort only requires logarithmic memory usage.

In the best case, yes. But imagine every pivot element you choose is the worst one (e.g. the minimum or maximum of the range to be sorted). Then you need linear memory usage.

For each recursive function call, you need at least four words on the stack: one for the return address, one for the saved base pointer, and at least two for the parameters. With 10,000 elements and 8-byte words, that's a worst-case memory footprint of about 320 KB; not sure if that's enough for a stack overflow though.
Quote:Original post by DevFred
In the best case, yes. But imagine every pivot element you chose is the worst one (e.g. the minimum or maximum of the range to be sorted). Then you need linear memory usage.
No. Quicksort only requires logarithmic memory usage, even in the worst case: whenever you split the interval around a pivot, recursively sort the smaller half, then tail-recurse on the larger half. Since tail recursion (either compiler-enforced or hand-optimized into a loop) uses no extra stack space, only the recursion on the smaller half matters, which results in worst-case logarithmic stack usage.
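A minimal sketch of that trick (hypothetical names, Lomuto partitioning for simplicity): recurse only into the smaller side, and loop on the larger one.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Lomuto partition: puts the pivot (last element) into its final
// position and returns that index.
static std::size_t partition(std::vector<int>& a, std::size_t lo, std::size_t hi)
{
    int pivot = a[hi - 1];
    std::size_t i = lo;
    for (std::size_t j = lo; j + 1 < hi; ++j)
        if (a[j] < pivot)
            std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi - 1]);
    return i;
}

// Quicksort over [lo, hi). Recurse on the smaller partition, loop on
// the larger one, so stack depth stays O(log n) even when every pivot
// is the worst possible choice.
void quicksort(std::vector<int>& a, std::size_t lo, std::size_t hi)
{
    while (hi - lo > 1) {
        std::size_t p = partition(a, lo, hi);
        if (p - lo < hi - (p + 1)) {   // left side is smaller
            quicksort(a, lo, p);
            lo = p + 1;                // "tail-recurse" on the right
        } else {
            quicksort(a, p + 1, hi);
            hi = p;                    // "tail-recurse" on the left
        }
    }
}
```

With this version, even an already-sorted input (the worst case for a last-element pivot) degrades to quadratic time but never to deep recursion.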
Do a search for radix sort; in my experience they have been damn good for sorting particles. Worst-case time is 5n (one histogram pass plus four byte-wide scatter passes for 32-bit keys), as opposed to the normal n log(n). They can take a bit to get your head around at first since they never actually compare values, but they are very, VERY awesome. I think I got 100k particles sorting at over 60fps (it was capped by vsync) on my Core 2 Duo 2.13GHz using a radix sort, and it can take temporal coherence into account so it can be done in just 1 pass :D. It's also a stable sort, so it can be used for a secondary sort, which can be handy for things like sorting tables.
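A minimal sketch of an LSD radix sort on floats (my own illustrative code, not Skiller's implementation; for clarity it rebuilds the histogram on each pass rather than using the single combined histogram pass that gets you down to 5n):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a float's bits to an unsigned key with the same ordering:
// flip all bits for negatives, flip only the sign bit for positives.
// (Standard trick for radix-sorting IEEE 754 floats; ignores NaN.)
static std::uint32_t float_key(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

// LSD radix sort on 8-bit digits: four stable counting passes over
// the 32-bit keys. Never compares two elements directly.
void radix_sort(std::vector<float>& v)
{
    std::vector<float> tmp(v.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::size_t count[257] = {0};
        for (float f : v)                       // histogram this digit
            ++count[((float_key(f) >> shift) & 0xFF) + 1];
        for (int i = 0; i < 256; ++i)           // prefix sum -> offsets
            count[i + 1] += count[i];
        for (float f : v)                       // stable scatter
            tmp[count[(float_key(f) >> shift) & 0xFF]++] = f;
        v.swap(tmp);
    }
}
```

The stability of each counting pass is what makes the digit-by-digit approach correct, and it's the same property that makes radix sort usable as a secondary sort.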

Actually, I got curious as to how fast it actually is, so I've just profiled my radix sort, and yep, 100k random floats sorted in ~0.014 seconds... then I noticed that was in debug; it turns out it only takes ~0.00864 seconds with optimizations on :). And perhaps less if I turned off all my debugging/profiling stuff.

Still, that's a fair chunk of time, usually half a frame, so it's not really that viable for most games (assuming a 60fps target). There might be better ways to handle things if you really need 100k, like staggering the sort somehow, but I can't think of how since particles change order every frame (though often only minimally).
-Skiller

This topic is closed to new replies.
