Never sort a render queue with std::sort(). It is neither stable (most important factor) nor fast enough thanks to pointer function calls on each compare. Create a templated base class that uses < and == operators within its code and add these as inlined overridden operators to the render-queue item class/structure.
std::sort (and presumably std::stable_sort, but I've not tried it) can and does inline the comparison. That will happen both when using the default < operator, and when specifying a functor to compare with like std::greater<T>(). Other cases do not inline so well in my experience.
Fully agree. std::sort being slow is a statement I can't understand. Default operators inline just fine, and with lambdas, it seems to inline pretty much everything else too (at least for me). I've never bothered about the speed of function pointer comparators since they've never been an issue, but I would concede that these likely won't inline, but who knows. While it is certainly possible to write some specialized sort that is faster, but as an a-priori strategy, I deem this a premature optimization in Hoare's sense. You can still resort to trying to write a better implementation if it turns out that sorting is really bottlenecking you.
As for the sort needing to be stable, I don't get the argument either. The point of a stable sort is that if for example you have a database of people who have a name ("John Smith") and an age ("45 years"), you first sort by name, and then later sort by age. If the sort is stable, this will give a list of people sorted by age, and within each group of people with the same age, names will be sorted alphabetically. That's not what you do in a render queue.
Following the above description, stable sorting is necessary to avoid flickering when objects have the same z distance. For that one would use Z buffering anyway. Stable sorting, or even perfect sorting at all, isn't necessary for that. Indeed, when you only care about Z, the common idiom is to run a single pass of bubble sort over all objects, which eventually puts them into kind-of-order (never 100% correctly, but good enough) and amortizes the cost for sorting over several frames (with an exactly predictable, constant cost per frame). Bubble sort as such is an ultra poor algorithm, but in this case, it turns out being one of the best solutions.
A fully fledged render queue has a few more things to worry about than just depth (such as grouping by shaders etc), but since you don't normally sort these independently, it shouldn't matter whether your sort is stable or not (at least I couldn't imagine how... feel free to explain where it matters). If you do for example something like in Lengyel's 6-7 year old blog post (basically, encode that stuff into a 64 bit sort key) you run just one single sort over the whole thing, and it doesn't matter at all whether it's stable or not.
Consider a very minimal render queue with very poor z resolution and a very minimal overall sort key:
1 bit: layer (HUD or scene)
1 bit: transparent or solid
2 bits: material (shader + texture + whatever)
4 bits: depth
No matter how you sort this 8-bit key (stable or not), renderables in the scene layer will always sort after the ones in the HUD, transparent objects (which test but don't write Z) will always draw after the solid ones (which test and write Z), and within each group, they'll also be sorted by material (shader/texture/whatever), and lastly depth.
Sure, if two objects have exactly identical sort keys, it is possible that they end up being rendered in alternating order between different frames, but who cares.
Edited by samoth, 09 March 2014 - 08:04 AM.