Sorting objects and parallelism



I've been thinking about sorting objects back-to-front to perform correct alpha blending. It is clear that for about 1k objects there shouldn't be a problem sorting them (http://linux.wku.edu/~lamonml/algor/sort/sort.html). But what if I need to sort about 100k objects? I think that sorting not all the time but, for example, once per second could be a good solution. But if the sort takes, let's say, 0.5 seconds, then after every second of rendering my application stalls for 0.5 seconds to do the sorting. How do I fight this? Maybe using an additional thread could help?

One common approach is iterative sorting. Every frame, do just a few passes of bubble sort. Assuming that Z orders don't usually change dramatically over a short period of time, this keeps things completely sorted most of the time, and mostly sorted all of the time.

Quote:
 Every frame, do just a few passes of bubble sort

Seems to be quite simple :). I suppose I can do a similar thing with some faster sorting algorithms? For 100k objects I would need much more than 1 second to do the sorting... :)

Quote:
 Assuming that Z orders don't usually change dramatically over a short period of time

Mhm, we MUST make such an assumption in this situation :)

Quote:
 Original post by Maxest
Seems to be quite simple :). I suppose I can do a similar thing with some faster sorting algorithms? For 100k objects I would need much more than 1 second to do the sorting... :)
There's this "bubble sort EEEVIL" idea which totally ignores temporal coherence. For a mostly-sorted array (measured by summed displacement), bubble sort and its ilk are hands down the fastest possible sorting algorithms. In contrast, quicksort (in its classical implementation) is very slow on a mostly-sorted array.

Quote:
Quote:
 Assuming that Z orders don't usually change dramatically over a short period of time

Mhm, we MUST make such an assumption in this situation :)

There are other sorting algorithms which may improve things here... I know some people use cocktail-sort for this. But if you want everything sorted perfectly and you're not willing to leverage temporal coherence, your options are limited. Also, remember to profile! Sorting 100k objects takes much less than 0.5 seconds on my computer.

I tried the code from the aforementioned website. For 10k random elements, algorithms like heapsort or quicksort do the sorting in less than 1 ms (AMD Turion, 2x1.8 GHz). Bubble sort did the same in 0.5 seconds.
However, after the first sort, with the camera moving later on, the sorting will only affect a few elements (depending on camera speed, of course...). So as you say, profiling is most important here. Geez... and all that for some stupid grass :D

Quote:
 Original post by Maxest
I tried the code from the aforementioned website. For 10k random elements, algorithms like heapsort or quicksort do the sorting in less than 1 ms (AMD Turion, 2x1.8 GHz). Bubble sort did the same in 0.5 seconds.
Now sort the already-sorted array again, using (a) quicksort, and (b) this slightly modified bubble-sort:

void bubbleSort(int numbers[], int array_size)
{
    int i, j, temp;
    for (i = (array_size - 1); i >= 0; i--)
    {
        bool mod = false;
        for (j = 1; j <= i; j++)
        {
            if (numbers[j - 1] > numbers[j])
            {
                temp = numbers[j - 1];
                numbers[j - 1] = numbers[j];
                numbers[j] = temp;
                mod = true;
            }
        }
        if (!mod) break; // no swaps in this pass: the array is already sorted
    }
}

You may want to use a fast sorting algorithm at the start of a scene, or after a transport, just whenever temporal coherence has gone out the window. Then use something like a bubble sort or a natural merge sort when temporal coherence is high and the objects are nearly sorted.

Quote:
 Now sort the already-sorted array again, using (a) quicksort, and (b) this slightly modified bubble-sort:

Quick-sort crashes when numbers are already sorted ;). So I used heap-sort instead. Heap-sort for an already-sorted list of 100k elements takes 50 ms (<1 ms for 10k elements), and your version of bubble sort takes <1 ms even for 100k elements.
I see that nowadays sorting objects in real time does not cost as much as I thought. The grass will rise :)

Quote:
 Original post by Maxest
Quick-sort crashes when numbers are already sorted ;).
An algorithm doesn't crash; your implementation might. Classically, quicksort exhibits its worst case of O(n²) when the sequence is already sorted.

Quote:
 Original post by dmatter
Classically, quicksort exhibits its worst case of O(n²) when the sequence is already sorted.

Shhh, don't give away the ending. [grin]

Quote:
 Shhh, don't give away the ending.

Good thing he did, because I'm not willing to correct this algorithm :P. And actually it does work, just not for 10k elements (stack overflow) :)

Sorting 10k elements shouldn't cause a stack overflow unless you really have a problem in your implementation.
Indeed, quicksort only requires logarithmic memory usage. And even if it were linear or n log(n), I hardly see how you're going to overflow the stack with 10k elements...

Also, the worst-case of quicksort is only invoked on nearly-sorted sequences if the pivot chosen is the first or last element.

Typically, std::sort uses quicksort with median-of-3 pivot selection, falling back to heapsort if the worst case is invoked, and using insertion sort on small ranges for cache friendliness.
Have you found it not efficient enough? Maybe you could look into smoothsort then.

[Edited by - loufoque on November 11, 2008 4:30:34 PM]

Quote:
 Original post by loufoque
Sorting 10k elements shouldn't cause a stack overflow [...] quicksort only requires logarithmic memory usage.

In the best case, yes. But imagine every pivot element you chose is the worst one (e.g. the minimum or maximum of the range to be sorted). Then you need linear memory usage.

For each recursive function call you need at least four words on the stack: one for the return address, one for the old base pointer, and at least two for the parameters. With 10,000 elements and 8-byte words, that is a worst-case memory footprint of 10,000 * 4 * 8 bytes = 320 kB; not sure if that's enough for a stack overflow though.

Quote:
 Original post by DevFred
In the best case, yes. But imagine every pivot element you chose is the worst one (e.g. the minimum or maximum of the range to be sorted). Then you need linear memory usage.
No. Quicksort only requires logarithmic memory usage, even in the worst-case situations: whenever you split the interval around a pivot, recursively sort the smaller half, then tail-recurse on the larger half. Since tail recursion (either compiler-enforced or hand-optimized) never uses stack space, only the recursion on the smaller half matters, which results in worst-case logarithmic stack usage.

Do a search for radix sort; in my experience it has been damn good for sorting particles. Worst-case time is 5n, as opposed to the usual n log(n). It can take a bit to get your head around at first, since it never actually compares values, but it is very, VERY awesome. I think I got 100k particles sorting at over 60 fps (capped by vsync) on my Core 2 Duo 2.13 GHz using a radix sort, and it can take temporal coherence into account so it can be done in just one pass :D. It's also a stable sort, so it can be used for a secondary sort, which can be handy for things like sorting tables.

Actually, I got curious as to how fast it really is, so I've just profiled my radix sort: 100k random floats sorted in ~0.014 seconds... then I noticed that was in debug; it turns out it only takes ~0.00864 seconds with optimizations on :). And perhaps less if I turned off all my debugging/profiling stuff.

Still, that's a fair chunk of time, usually half a frame, so it's still not really that viable for most games (assuming a 60 fps target). There might be better ways to handle things if you really need 100k, like staggering the sort somehow, but I can't think of how, since particles change order every frame (though often only minimally).