Jump to content

  • Log In with Google      Sign In   
  • Create Account

What kind of performance to expect from real-time particle sorting?

  • You cannot reply to this topic
1 reply to this topic

#1 Husbjörn   Members   -  Reputation: 235


Posted Today, 04:33 PM

This is mostly a theoretical question.

In the recent days I have implemented a simple GPU-driven particle system for use in my engine and after finally tracking down some annoyingly elusive bugs it is running quite well.

However I realized that sorting the individual particles so that they are drawn back-to-front (for proper alpha blending) seems to take significant processing time compared to just generating, updating and drawing the individual particles.

For example I get a frame rate drop of about 35 times if I compare running some 8000 particles with and without depth sorting (I get ~2100 FPS witout it and ~55 with on a relatively high-end GPU).

Is this to be expected due to way in which sorting has to be done on the GPU; eventually you need a single thread to traverse the whole list to ensure everything is in proper order? Or is there some kind of in-place sorting algorithm that can have separate threads work on separate sub-sections of the particle list without having to have a final step that puts all the pieces together? I have been unable to think of one.


My sorting algorithm is a pretty straight forward mergesort implementation; I first perform a swap-sort to bring sub-lists of two elements in order, then I proceed by calling a MergeSort shader program that merges two adjacent pre-sorted sub-lists 1 through log2(numParticles) times where the sub-list size goes from 2 and doubles for each step. This obviously means I have lots of threads doing little work in the first passes and the finally just one thread comparing all elements in the entire buffer at the end.



Ideally I would of course want to be able to perform the sorting in a quicker way (I suppose the key to this is better work distribution but as said I'm having a hard time figuring out how to accomplish that). If that is indeed not possible I guess I'll have to tinker with sorting parts of the buffer in each frame, but that can of course permit the ocassional artifact to slip through.


#2 TiagoCosta   Crossbones+   -  Reputation: 2049


Posted Today, 05:28 PM

Have you tried Bitonic sort? And if you can use compute shaders this talk gives details on how to implement bitonic sort using Compute shaders.


I haven't used it myself but it might be useful to you.

Edited by TiagoCosta, Today, 05:29 PM.