
Multithreaded particles


I'm working on a particle system for an XNA project, and I'd like to offload the particle updates to a separate thread. I'm wondering what a good way to do this is. One option I can think of is to create a thread when the game starts that's dedicated to this task and keep it in the update loop (I suppose sleeping until the next frame if it finishes early). Another option would be to submit the update as a job to the .NET ThreadPool every frame. My concern with the second approach is that if no thread were available to take the job during a given frame, the update could be skipped (realistically, though, I can't see myself ever using more than a handful of threads in total, so this seems unlikely). Having never done this before, I'm really not sure whether either of these approaches is a good idea. Does anyone have any thoughts?

Use the thread pool. Dispatch groups of X particles to each thread: launch ParCount / X jobs (rounded up, so a partial last batch still gets processed), then wait for them all to finish. Profile to find a good value for X, but 250+ is probably a reasonable place to start.
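
A minimal sketch of that batching scheme, written in C++ rather than C#. The `Particle` layout and the function names are hypothetical, and `std::async` only stands in for a real thread pool (it may start a new thread per job, whereas a pool reuses its workers). In .NET the same shape would queue one work item per batch with `ThreadPool.QueueUserWorkItem` and wait for all of them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical particle type; the thread does not specify a layout.
struct Particle {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
};

// Advance one contiguous batch [begin, end). Batches never overlap, so
// several of these can run at the same time without locking.
void updateBatch(std::vector<Particle>& ps, std::size_t begin,
                 std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) {
        ps[i].x += ps[i].vx * dt;
        ps[i].y += ps[i].vy * dt;
        ps[i].z += ps[i].vz * dt;
    }
}

// Split the particles into batches of batchSize, launch one job per batch,
// then block until every job is done. Returns the number of jobs launched.
std::size_t updateParallel(std::vector<Particle>& ps, std::size_t batchSize,
                           float dt) {
    assert(batchSize > 0);
    std::vector<std::future<void>> jobs;
    for (std::size_t begin = 0; begin < ps.size(); begin += batchSize) {
        // The last batch may be shorter than batchSize.
        const std::size_t end = std::min(begin + batchSize, ps.size());
        jobs.push_back(std::async(std::launch::async, [&ps, begin, end, dt] {
            updateBatch(ps, begin, end, dt);
        }));
    }
    for (auto& job : jobs) {
        job.get();  // wait; also rethrows any exception from the job
    }
    return jobs.size();
}
```

Calling `updateParallel(particles, 250, dt)` once per frame gives the "launch, then wait" pattern described above; the batch size is the X to tune by profiling.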

1) Particles are trivial to parallelize. In the simple case, particles don't interact with each other, so you can update them in batches in parallel (you can even tessellate them in parallel if you are careful about your buffer structure). In the more complicated case where they do interact with each other, you can double-buffer their state, so all particles read last frame's data and write to this frame's data. You then swap the buffers and repeat for the next frame.

2) Particles are updated really late in the frame. You need to emit them during gameplay, update them after all the emitters are done, and then tessellate and render them at the appropriate points in the render state. Doing as much of that in parallel as you can will shorten the time your engine spends waiting on particles.
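
The double-buffering idea from point 1 could look roughly like the sketch below. Everything here is an assumption made for illustration: the 1D `State` type, the names, and the made-up interaction in which each particle is pulled toward its neighbour's position from the previous frame. The only point is that every read comes from `prev` and every write goes to `next`, so the result does not depend on update order (or on which thread runs first).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical 1D particle state, kept small for illustration.
struct State {
    float pos;
    float vel;
};

struct DoubleBufferedParticles {
    std::vector<State> prev;  // last frame's data: only read during step()
    std::vector<State> next;  // this frame's data: only written during step()

    explicit DoubleBufferedParticles(std::vector<State> initial)
        : prev(std::move(initial)), next(prev.size()) {}

    void step(float dt) {
        assert(prev.size() == next.size());
        const std::size_t n = prev.size();
        for (std::size_t i = 0; i < n; ++i) {
            // Made-up interaction: accelerate toward the neighbour's position
            // as it was LAST frame, even if that neighbour was already
            // processed earlier in this loop.
            const State& other = prev[(i + 1) % n];
            const float accel = other.pos - prev[i].pos;
            next[i].vel = prev[i].vel + accel * dt;
            next[i].pos = prev[i].pos + next[i].vel * dt;
        }
        // This frame's output becomes next frame's input.
        std::swap(prev, next);
    }
};
```

Because the loop body only reads `prev` and only writes `next[i]`, the loop can be split into batches and handed to worker threads without further synchronization, as long as the swap happens after all batches have finished.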

Testing on my machine:
With C++ and SSE-optimized, SoA-format particles in a uniform update function on a 4-core Phenom II, 60K particles update and tessellate in ~1 ms using 4 threads, batched in groups of 1000 particles per batch.
But note that they can take MUCH longer than that to draw. Particles are horrific on the GPU's fragment processing. Any complex shader or any large particles will quickly have you wasting 15+ ms rendering only 10 or 20 particles. Also, due to memory bandwidth, a simple screen-facing particle is likely just as expensive as one that does a lot of calculations to rotate, twist, and scale.
You might therefore be better off optimizing your effects first, and only then worrying about whether to throw particles into the thread pool.
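
For reference, "SoA" (structure of arrays) in the benchmark above means storing each field in its own contiguous array instead of keeping an array of particle structs. The reply doesn't show its actual layout or its SSE code, so the following is only a generic sketch with hypothetical names, using a plain loop that a compiler may be able to vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Structure of arrays: one contiguous array per field, so a loop over a
// single field touches memory sequentially. Field names are hypothetical.
struct ParticlesSoA {
    std::vector<float> x, y;    // positions
    std::vector<float> vx, vy;  // velocities

    explicit ParticlesSoA(std::size_t n) : x(n), y(n), vx(n), vy(n) {}
    std::size_t size() const { return x.size(); }
};

// Uniform update over [begin, end): the same arithmetic for every particle
// and no per-particle branching. The range parameters let the caller hand
// out one batch per worker thread.
void integrate(ParticlesSoA& p, std::size_t begin, std::size_t end, float dt) {
    assert(begin <= end && end <= p.size());
    for (std::size_t i = begin; i < end; ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}
```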

[Edited by - KulSeran on March 31, 2010 10:49:24 PM]
