Efficient particle system memory management



I'm making a particle system in C# and XNA and am wondering what the best way(s) to manage the particles in memory are. There are two sample particle systems provided on the XNA creators website, a Simple and a Complex one. Both use a constant-sized array, which seems like a smart way to avoid allocating/deallocating memory at run-time; they just use a boolean to determine whether a particle is active (alive) or not.

One question I have, though: is there any sort of performance hit to using a System.Collections.Generic class like List to store the particles versus a plain array (like Particle[] particles)? I'm assuming that as long as you set the Capacity of the List, it will allocate the memory up front instead of allocating/deallocating at run-time. One possible advantage of a generic collection would be dynamically changing the capacity at run-time if needed, as well as the ability to search the collection, etc.

As for managing the array of particles, the Simple particle system uses a queue of free particles to keep track of which particles are not in use. When a particle is needed, it is taken from the front of the queue, and when a particle dies it is returned to the back of the queue. This is a very simple technique, but I don't think it's all that efficient (let me know if I'm wrong). The Complex particle system uses a more complicated but seemingly more efficient method. Rather than explain the method myself, I've just copied the explanation from their code below.
// The particles array and vertex buffer are treated as a circular queue.
// Initially, the entire contents of the array are free, because no particles
// are in use. When a new particle is created, this is allocated from the
// beginning of the array. If more than one particle is created, these will
// always be stored in a consecutive block of array elements. Because all
// particles last for the same amount of time, old particles will always be
// removed in order from the start of this active particle region, so the
// active and free regions will never be intermingled. Because the queue is
// circular, there can be times when the active particle region wraps from the
// end of the array back to the start. The queue uses modulo arithmetic to
// handle these cases. For instance with a four entry queue we could have:
//
//      0
//      1 - first active particle
//      2
//      3 - first free particle
//
// In this case, particles 1 and 2 are active, while 3 and 0 are free.
// Using modulo arithmetic we could also have:
//
//      0
//      1 - first free particle
//      2
//      3 - first active particle
//
// Here, 3 and 0 are active, while 1 and 2 are free.
//
// But wait! The full story is even more complex.
//
// When we create a new particle, we add them to our managed particles array.
// We also need to copy this new data into the GPU vertex buffer, but we don't
// want to do that straight away, because setting new data into a vertex buffer
// can be an expensive operation. If we are going to be adding several particles
// in a single frame, it is faster to initially just store them in our managed
// array, and then later upload them all to the GPU in one single call. So our
// queue also needs a region for storing new particles that have been added to
// the managed array but not yet uploaded to the vertex buffer.
//
// Another issue occurs when old particles are retired. The CPU and GPU run
// asynchronously, so the GPU will often still be busy drawing the previous
// frame while the CPU is working on the next frame. This can cause a
// synchronization problem if an old particle is retired, and then immediately
// overwritten by a new one, because the CPU might try to change the contents
// of the vertex buffer while the GPU is still busy drawing the old data from
// it. Normally the graphics driver will take care of this by waiting until
// the GPU has finished drawing inside the VertexBuffer.SetData call, but we
// don't want to waste time waiting around every time we try to add a new
// particle! To avoid this delay, we can specify the SetDataOptions.NoOverwrite
// flag when we write to the vertex buffer. This basically means "I promise I
// will never try to overwrite any data that the GPU might still be using, so
// you can just go ahead and update the buffer straight away". To keep this
// promise, we must avoid reusing vertices immediately after they are drawn.
//
// So in total, our queue contains four different regions:
//
// Vertices between firstActiveParticle and firstNewParticle are actively
// being drawn, and exist in both the managed particles array and the GPU
// vertex buffer.
//
// Vertices between firstNewParticle and firstFreeParticle are newly created,
// and exist only in the managed particles array. These need to be uploaded
// to the GPU at the start of the next draw call.
//
// Vertices between firstFreeParticle and firstRetiredParticle are free and
// waiting to be allocated.
//
// Vertices between firstRetiredParticle and firstActiveParticle are no longer
// being drawn, but were drawn recently enough that the GPU could still be
// using them. These need to be kept around for a few more frames before they
// can be reallocated.
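The four regions described above can be sketched as modulo arithmetic over one array. This is a simplified sketch, not the sample's actual code: it tracks only the four indices (named after the comment above), and the vertex-buffer upload is stubbed out with a comment.

```csharp
// Sketch of the four-region circular queue described in the comment above.
// Particle data and GPU calls are omitted; only the index bookkeeping is shown.
class CircularParticleQueue
{
    private readonly int capacity;
    private int firstActiveParticle;   // oldest particle still being drawn
    private int firstNewParticle;      // first particle not yet uploaded to the GPU
    private int firstFreeParticle;     // next slot to allocate
    private int firstRetiredParticle;  // oldest slot the GPU might still be reading

    public CircularParticleQueue(int capacity)
    {
        this.capacity = capacity;
    }

    // Allocate a slot for a new particle, or return -1 if the queue is full.
    public int AddParticle()
    {
        int nextFree = (firstFreeParticle + 1) % capacity;
        if (nextFree == firstRetiredParticle)
            return -1; // the free region is empty
        int slot = firstFreeParticle;
        firstFreeParticle = nextFree;
        return slot;
    }

    // At draw time: upload [firstNewParticle, firstFreeParticle) to the vertex
    // buffer in one call (with SetDataOptions.NoOverwrite in real XNA code),
    // then mark those particles as active.
    public void FlushNewParticles()
    {
        // vertexBuffer.SetData(...) would go here.
        firstNewParticle = firstFreeParticle;
    }

    // Retire the oldest active particle once its lifetime has expired
    // (all particles share one duration, so they retire in order).
    public void RetireActiveParticle()
    {
        firstActiveParticle = (firstActiveParticle + 1) % capacity;
    }

    // Free a retired slot once the GPU is guaranteed to be done with it
    // (in the sample, after a few frames have passed).
    public void FreeRetiredParticle()
    {
        firstRetiredParticle = (firstRetiredParticle + 1) % capacity;
    }
}
```

Note that one slot is always left unused so a full queue can be told apart from an empty one; with a four-entry queue, at most three particles are live at once.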


This method seems very efficient, except that because it uses a circular queue, it seems all of the particles would have to have the same lifetime (live for the same duration of time). I assume this because if, for example, Particle 1 is given a duration of 10 seconds and Particle 2 a duration of 5 seconds, then the firstFreeParticle iterator can't reach Particle 2 until Particle 1 has died, so Particle 2 (and potentially the rest of the array of particles) could not be reused until Particle 1 died and was reused first. This only becomes a problem when the particle array is full (i.e. when the back of the circular queue reaches the front). An example of when this could happen would be if the whole particle array was initialized in one shot and the first particle initialized was given the longest duration.

In the actual code, randomness is added to the lifetime, so it looks like the authors either chose not to address this potential problem or didn't think of it. Even with the potential problem, this method seems to be the most efficient one I've found. Can anybody see any other problems with this method, or think of a better method to use? Any help would be appreciated. Thanks.

[Edited by - deadlydog on June 16, 2008 12:19:16 PM]
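For comparison, the Simple sample's free-particle queue amounts to something like the following sketch. The type and member names here are mine, not the sample's, and particle fields are reduced to the minimum needed to show the bookkeeping.

```csharp
// Sketch of a fixed-size pool plus a queue of free indices, in the spirit
// of the Simple sample: allocate everything once, then recycle indices.
using System.Collections.Generic;

struct Particle
{
    public float Age;
    public bool Active;
}

class ParticlePool
{
    private readonly Particle[] particles;   // allocated once, up front
    private readonly Queue<int> freeIndices; // indices not currently in use

    public ParticlePool(int capacity)
    {
        particles = new Particle[capacity];
        freeIndices = new Queue<int>(capacity);
        for (int i = 0; i < capacity; i++)
            freeIndices.Enqueue(i);
    }

    public int FreeCount
    {
        get { return freeIndices.Count; }
    }

    // Take a particle from the front of the free queue, if any.
    public int Spawn()
    {
        if (freeIndices.Count == 0)
            return -1; // pool exhausted; caller skips this spawn
        int index = freeIndices.Dequeue();
        particles[index].Active = true;
        particles[index].Age = 0f;
        return index;
    }

    // Return a dead particle to the back of the free queue.
    public void Kill(int index)
    {
        particles[index].Active = false;
        freeIndices.Enqueue(index);
    }
}
```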

I'd do whatever is most simple and makes the most sense to you and then go back later if needed.

Quote:
Original post by wild_pointer: I'd do whatever is most simple and makes the most sense to you and then go back later if needed.

That's what I'm doing for now, but this is a project for a university class so I would like to eventually use the most efficient method possible

Quote:
Quote:
Original post by wild_pointer: I'd do whatever is most simple and makes the most sense to you and then go back later if needed.

That's what I'm doing for now, but this is a project for a university class so I would like to eventually use the most efficient method possible

I would try both and measure the performance in a profiler. That will tell you which is the most efficient.

I would have two pools, one for Live particles and the other for Free particles. As you spawn, they move from Free particles pool to the Live one. As particles die, they move from Live particles pool to the Free one.

Every frame, update all the particles in the Live pool.

It is simple, less likely to go wrong and doesn't do anything overly expensive. If performance does become an issue, as the above poster says, profile.
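One allocation-free way to realize the two-pool idea (my variant, not necessarily what the poster had in mind) is to partition a single array so that indices [0, liveCount) are live and the rest are free. "Moving" a particle between pools is then just a swap and a counter change.

```csharp
// Two logical pools (live and free) backed by one array, split at liveCount.
// Spawning grows the live region; killing swaps the dead particle to the
// boundary and shrinks it. No allocation happens after construction.
struct Particle
{
    public float Age;
}

class TwoPoolSystem
{
    private readonly Particle[] particles;
    private int liveCount; // particles[0..liveCount) are live

    public TwoPoolSystem(int capacity)
    {
        particles = new Particle[capacity];
    }

    public int LiveCount
    {
        get { return liveCount; }
    }

    public bool Spawn()
    {
        if (liveCount == particles.Length)
            return false; // free pool is empty
        particles[liveCount].Age = 0f; // reuse the slot at the boundary
        liveCount++;
        return true;
    }

    public void Kill(int liveIndex)
    {
        liveCount--;
        // Swap the dead particle with the last live one; order doesn't matter.
        Particle tmp = particles[liveIndex];
        particles[liveIndex] = particles[liveCount];
        particles[liveCount] = tmp;
    }

    // Update only the live region; iterate backwards so Kill's swap
    // never skips an unprocessed particle.
    public void Update(float dt, float lifetime)
    {
        for (int i = liveCount - 1; i >= 0; i--)
        {
            particles[i].Age += dt;
            if (particles[i].Age >= lifetime)
                Kill(i);
        }
    }
}
```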

You just have to profile, as it is not possible to predict what the .NET platform is going to do. A List can append efficiently, and I'd guess the JITters will be pretty happy with the generics. But I cannot predict it. Profile it.

I had something like this just a day ago. I had a graph that's represented by adjacency lists (that is, a list of adjacency lists). Random lookup was too slow, so I wrote a little function that compiled the list into an array. Given this function, I had two choices: calling it at selected times, or calling it whenever a lookup was requested. Guess what? Invalidating and recompiling that array whenever necessary reduced runtime by a factor of 10. I did not expect that.
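For illustration only (this is not the poster's actual code), "compiling" a list of adjacency lists into flat arrays usually means a CSR-style layout, where all the lists are laid out back to back and an offsets array says where each one starts:

```csharp
// Flatten a list of adjacency lists into two arrays (CSR layout):
// Neighbors holds every list back to back, and Offsets[v]..Offsets[v+1]
// delimits vertex v's neighbors within it.
using System.Collections.Generic;

class CompiledGraph
{
    public int[] Offsets;   // length = vertex count + 1
    public int[] Neighbors; // all adjacency lists concatenated

    public static CompiledGraph Compile(List<List<int>> adjacency)
    {
        var g = new CompiledGraph();
        g.Offsets = new int[adjacency.Count + 1];

        // First pass: compute where each vertex's list begins.
        int total = 0;
        for (int v = 0; v < adjacency.Count; v++)
        {
            g.Offsets[v] = total;
            total += adjacency[v].Count;
        }
        g.Offsets[adjacency.Count] = total;

        // Second pass: copy the lists into the flat array.
        g.Neighbors = new int[total];
        for (int v = 0; v < adjacency.Count; v++)
            adjacency[v].CopyTo(g.Neighbors, g.Offsets[v]);
        return g;
    }
}
```

Lookups then become plain index arithmetic over contiguous memory, which is why the recompile-on-change strategy can win despite the copying cost.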

So, if you talk about efficiency, profile, profile and profile again. You have no chance without profiling.

Your particle system, if it's CPU bound (which is unlikely; it's far more likely you'll be vertex-buffer-update or fillrate bound), won't have its bottleneck in the particle pool code. Meaning you could do a list traversal every time a particle spawns to find the first empty slot, and it probably won't make a difference at all.

If the particles have random lifetimes, then the circular queue probably won't help at all. I would advise just using a pre-allocated particle array with active/inactive flags. When you spawn a new particle, just find the first inactive one. It's the easiest method to implement, delivers good performance, and does not generate a lot of garbage, which may hurt frame rate in a managed environment.
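A minimal sketch of that approach, assuming a fixed lifetime passed in by the caller (the names here are illustrative):

```csharp
// Pre-allocated array with active/inactive flags: spawning does a linear
// scan for the first inactive slot; no allocation ever happens after startup.
struct Particle
{
    public bool Active;
    public float Age;
}

class FlagPool
{
    private readonly Particle[] particles;

    public FlagPool(int capacity)
    {
        particles = new Particle[capacity];
    }

    // Find and activate the first inactive particle, or return -1.
    public int Spawn()
    {
        for (int i = 0; i < particles.Length; i++)
        {
            if (!particles[i].Active)
            {
                particles[i].Active = true;
                particles[i].Age = 0f;
                return i;
            }
        }
        return -1; // no free slot this frame
    }

    // Age every active particle and deactivate the expired ones.
    public void Update(float dt, float lifetime)
    {
        for (int i = 0; i < particles.Length; i++)
        {
            if (!particles[i].Active)
                continue;
            particles[i].Age += dt;
            if (particles[i].Age >= lifetime)
                particles[i].Active = false;
        }
    }
}
```

The O(n) scan on spawn is the cost of the simplicity; as the earlier replies note, it rarely matters in practice unless the pool is huge.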

Quote:
Original post by Numsgil: Your particle system, if it's CPU bound (which is unlikely. It's far more likely you'll be update-the-vertex-buffer or fillrate bound), won't have a bottleneck in the particle pool code. Meaning you could do a list traversal every time a particle spawns and find the first empty slot and it probably won't make a difference at all.

Actually my particle system is CPU bound, as I want to do a few dynamic things that I don't think can be done in HLSL.

The method I'm using now: I have an array containing all the particles (up to a pre-defined maximum number), and two linked lists which point to the particles in the array, InactiveParticles and ActiveParticles. I chose linked lists because they have O(1) insertion, O(1) deletion (given the node), and O(1) count, with O(n) iteration. When I need a new particle, I take one from the InactiveParticles list if one is available and add it to the ActiveParticles list.

When updating, I iterate through the ActiveParticles list and update each particle, as opposed to looping through every particle in the array and checking whether it's active or not. After updating a particle, I check whether it's still active, and if it is, I copy its vertex information (Position, Normal, Color, TextureCoordinate) into the vertex buffer (so the vertex buffer is completely rewritten every time the particles are updated).

This solution was fairly easy to implement and seems to be working alright. I'm probably going to stick with this implementation unless anyone else can see a problem with it or a way it could be improved. Thanks for all the responses so far! :)
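A hedged sketch of the design described above: one pre-allocated particle array plus two LinkedLists of indices into it. The class and member names are mine, and the particle is reduced to an age field.

```csharp
// One array of particles plus two index lists: "inactive" feeds "active"
// on spawn, and expired particles flow back the other way on update.
using System.Collections.Generic;

struct Particle
{
    public float Age;
}

class LinkedListParticleSystem
{
    private readonly Particle[] particles;
    private readonly LinkedList<int> inactive = new LinkedList<int>();
    private readonly LinkedList<int> active = new LinkedList<int>();

    public LinkedListParticleSystem(int capacity)
    {
        particles = new Particle[capacity];
        for (int i = 0; i < capacity; i++)
            inactive.AddLast(i);
    }

    public bool Spawn()
    {
        if (inactive.Count == 0)
            return false;                  // O(1) availability check
        int index = inactive.First.Value;
        inactive.RemoveFirst();            // O(1) removal
        particles[index].Age = 0f;
        active.AddLast(index);             // O(1) insertion
        return true;
    }

    public void Update(float dt, float lifetime)
    {
        // Iterate only the active particles, not the whole array.
        var node = active.First;
        while (node != null)
        {
            var next = node.Next;          // grab before a possible removal
            int i = node.Value;
            particles[i].Age += dt;
            if (particles[i].Age >= lifetime)
            {
                active.Remove(node);       // O(1) given the node handle
                inactive.AddLast(i);
            }
            node = next;
        }
    }
}
```

One caveat: in .NET, LinkedList&lt;int&gt;.AddLast(int) allocates a new LinkedListNode on each call, so this design is not entirely garbage-free; reusing the detached nodes via the AddLast(LinkedListNode) overload would avoid that GC pressure.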
