GPU particles

Started by
38 comments, last by Telanor 10 years, 8 months ago
So I've been reading posts and articles about GPU particle systems and the general consensus for stateful systems seems to be to create 2 buffers, bind one as input, read from it in the vertex shader, then output to the other. And then in a second pass, you read from that second buffer to do your rendering.
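
For anyone following along, here is a minimal CPU-side sketch of that two-buffer scheme in C++. It only illustrates the data flow and is not taken from any of the samples; the `Particle` layout and the `PingPongBuffers` name are made up for the example, and on the GPU the loop body would live in a shader writing to the second buffer.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical particle layout; the thread does not specify one.
struct Particle {
    float x, y, z;
    float vx, vy, vz;
    float age;
};

// CPU stand-in for the two GPU buffers: `front` is read by the update
// pass, `back` receives its output, and the roles swap every frame.
struct PingPongBuffers {
    std::vector<Particle> front;
    std::vector<Particle> back;

    void update(float dt) {
        assert(dt >= 0.0f);
        back.clear();
        // Read each particle from the input buffer and write the
        // advanced particle to the output buffer.
        for (const Particle& p : front) {
            Particle q = p;
            q.x += q.vx * dt;
            q.y += q.vy * dt;
            q.z += q.vz * dt;
            q.age += dt;
            back.push_back(q);
        }
        // The render pass would now read from the buffer just written.
        std::swap(front, back);
    }
};
```

After `update`, `front` always holds the newest state, so the render pass and the next update both read from it.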

In all the samples I've looked at (like the ones in the DirectX SDK), they create a couple hundred particles to start off with and then just reuse those same ones over and over. The particles never die off and new ones are never spawned. In one of the demos they teleport the particles back to the spawn point to make it look like they're dying off and being respawned.

While that works in a demo, I don't see how that can work in practice when you can have dozens of different emitters emitting different types of particles at different rates. Essentially the problem of choosing when and where to spawn particles has been completely avoided by all the samples I've seen. So my question is: how would you handle spawning particles when you have many different emitters with different properties?

Also, on the topic of GPU particles, I've seen some systems that use DirectCompute instead. Am I correct in assuming that only works on feature level 11 and up?

A few days ago, I created a little GPU particle sample for SharpDX. The buffers start off completely empty. To spawn new particles, I copy them to a third vertex buffer. After the pass to update the particles from last frame, I just "draw" the third buffer to append the new particles to the buffer for the current frame as well.
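
A rough C++ sketch of that flow, simulated on the CPU, might look like the following. This is not the SharpDX sample itself: the `ParticleSystem` type, the `life` field, and the rule that expired particles are dropped during the update pass are all assumptions made for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Particle {
    float x, y;
    float vx, vy;
    float life;   // seconds remaining (hypothetical field)
};

struct ParticleSystem {
    std::vector<Particle> previous;  // last frame's particles (input)
    std::vector<Particle> current;   // this frame's particles (output)
    std::vector<Particle> spawned;   // small staging buffer filled by the CPU

    // Queue a new particle; it joins the output on the next step().
    void spawn(const Particle& p) { spawned.push_back(p); }

    void step(float dt) {
        assert(dt >= 0.0f);
        current.clear();
        // Pass 1: update last frame's particles, dropping expired ones.
        for (Particle p : previous) {
            p.life -= dt;
            if (p.life <= 0.0f) continue;
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            current.push_back(p);
        }
        // Pass 2: "draw" the staging buffer so the new particles are
        // appended, unchanged, to the same output buffer.
        current.insert(current.end(), spawned.begin(), spawned.end());
        spawned.clear();
        std::swap(previous, current);
    }
};
```

Because the staging buffer is appended after the update pass, freshly spawned particles get their first simulation step on the following frame.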

Any properties that are "sufficiently random" like the particle's color or various flags, I put directly into the vertex. I didn't include this in the sample to avoid bloating it, but to handle properties that a lot of particles share, every particle also has a type ID that indexes into an array of particle types in the geometry shader.
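
To make the type ID idea concrete, here is a small C++ sketch of the lookup. The field names and the two example types are invented for illustration; in the setup described above the table would be an array in the geometry shader rather than a C++ array.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Properties shared by every particle of one type (hypothetical fields).
struct ParticleType {
    float gravity;
    float drag;
    float lifetime;
};

// Per-particle data: unique values live here, shared ones are looked up.
struct Particle {
    float vy;
    std::uint32_t color;   // "sufficiently random", so stored per particle
    std::uint32_t typeId;  // index into the shared type table
};

// Illustrative table with two made-up particle types.
constexpr std::array<ParticleType, 2> kTypes = {{
    {-10.0f, 0.25f, 2.0f},  // type 0: e.g. sparks
    {  0.5f, 0.5f,  6.0f},  // type 1: e.g. smoke
}};

// Fetch the shared properties by type ID and apply them to one particle.
inline void applyForces(Particle& p, float dt) {
    assert(p.typeId < kTypes.size());
    const ParticleType& t = kTypes[p.typeId];
    p.vy += t.gravity * dt;
    p.vy *= 1.0f - t.drag * dt;
}
```

The per-particle struct stays small this way, and adding a new kind of particle only means adding a row to the table.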

current project: Roa

Very nice sample. So basically it has a small, 128-element buffer that the CPU can dump new particles into, and calling the shader with that as the input results in it appending them to the existing list.

I was hoping to avoid the geometry shader since a lot of people have been saying it has poor performance, but I'm not sure I can see a way of doing something similar using vertex texture fetch instead. Your GS setup is so straightforward and simple though that I might just go with that.

Off-topic question: How do you disable vsync for the SharpDX Toolkit? I want to see what kind of performance this has.


Also, on the topic of GPU particles, I've seen some systems that use DirectCompute instead. Am I correct in assuming that only works on feature level 11 and up?

Yes, that's correct. Or possibly feature level 10 depending on how you do it.

Last weekend I was actually knocking up a little CS-based GPU particle system. It's not very hard.

You have 2 buffers which you use as append and consume buffers. I made them relatively large (a million particles wide in my test). Perform the sim by consuming from one and appending to the other. Then go through the emitters and emit by appending into the current buffer. Then to render, use DrawInstancedIndirect with the size of the current buffer as the indirect draw arguments, and expand in the GS.
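
The frame described above can be sketched on the CPU roughly like this. It is only a model of the data flow, not the actual compute shader: `CountedBuffer`, `Emitter`, and `frame` are hypothetical names, and dropping appends once the buffer is full is an assumption about how overflow would be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Particle { float x, vx, life; };
struct Emitter  { float x, vx, life; std::size_t perFrame; };  // hypothetical

// Fixed-capacity buffer with a counter, loosely mimicking an
// append/consume buffer. Appends past capacity are dropped.
class CountedBuffer {
public:
    explicit CountedBuffer(std::size_t capacity) : data_(capacity), count_(0) {}
    bool append(const Particle& p) {
        if (count_ == data_.size()) return false;
        data_[count_++] = p;
        return true;
    }
    bool consume(Particle& out) {
        if (count_ == 0) return false;
        out = data_[--count_];
        return true;
    }
    std::size_t count() const { return count_; }
private:
    std::vector<Particle> data_;
    std::size_t count_;
};

// One frame: simulate (consume -> append), then emit into the new buffer.
// Returns the live count, which would feed the indirect draw arguments.
inline std::size_t frame(CountedBuffer& src, CountedBuffer& dst,
                         const std::vector<Emitter>& emitters, float dt) {
    assert(dst.count() == 0);
    Particle p;
    while (src.consume(p)) {
        p.life -= dt;
        if (p.life <= 0.0f) continue;  // dead particles are not re-appended
        p.x += p.vx * dt;
        dst.append(p);
    }
    for (const Emitter& e : emitters)
        for (std::size_t i = 0; i < e.perFrame; ++i)
            dst.append(Particle{e.x, e.vx, e.life});
    std::swap(src, dst);  // `src` is now the current buffer, `dst` is empty
    return src.count();
}
```

Note that dead particles need no explicit removal: anything that is consumed and not appended again simply disappears, and the counter stays exact for the draw call.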

On my 580, simulating and drawing half a million particles takes less than a millisecond. It's not appropriate for everything: all particle systems using this setup end up with the same sim code, driven by a few parameters, so if you want wildly different behaviour you might want another approach for those. It also gets hard to order particles against alpha objects in your scene, since they are all rendered in one batch. For this reason I will use it only for specific effects that need a large number of very small particles, and run a traditional CPU-based particle sim for more standard fuzzy particle effects.

Here's a quick video of what it looks like without textures / anything fancy:



Off-topic question: How do you disable vsync for the SharpDX Toolkit? I want to see what kind of performance this has.

this.graphicsDeviceManager.SynchronizeWithVerticalRetrace = false;
this.IsFixedTimeStep = false;

On my system (Nvidia GTX 460), updating and drawing 16k (1024 * 16) particles takes about 0.3 milliseconds.



I was hoping to avoid the geometry shader since a lot of people have been saying it has poor performance,

Actually, on some cards at least, 1:4 geometry expansion (the typical particle scenario) is handled as a special case so that it doesn't result in poor performance.

Cheers!


In the particle storm sample in Hieroglyph 3 I use 1:4 expansion in the GS, and it still runs pretty darn fast. It isn't perfect, but even on older DX11 hardware it handles 100k+ particles with no problems...

Yeah, the point->quad expansion has special-case handling in GPUs because it's so common. If you really want to avoid the GS you can also use instancing to accomplish the same thing.
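
As a rough illustration of the instancing route, the per-vertex work boils down to something like the C++ sketch below: each instance is one particle, and the vertex index within the instance selects a corner. The names and the triangle-strip corner order are assumptions for the example, and a real vertex shader would also orient the quad towards the camera.

```cpp
#include <cassert>
#include <cstdint>

struct Float2 { float x, y; };

// Corner offsets of a unit quad in triangle-strip order.
constexpr Float2 kCorners[4] = {
    {-0.5f, -0.5f}, {0.5f, -0.5f}, {-0.5f, 0.5f}, {0.5f, 0.5f},
};

// What a vertex shader would do per (instance, vertex) pair: the instance
// supplies the particle center and size, the vertex index picks the corner.
inline Float2 quadCorner(Float2 center, float size, std::uint32_t vertexIndex) {
    assert(vertexIndex < 4);
    const Float2 c = kCorners[vertexIndex];
    return Float2{center.x + c.x * size, center.y + c.y * size};
}
```

Drawing 4 vertices per instance with one instance per particle then produces the same quads the GS would have emitted.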

It's actually the GS-based update pass I was hoping to avoid more so than the point->quad expansion. There were some posts I read here on gamedev about how using the GS ended up being slower than some other method. I don't exactly remember what it said, I'd have to go find the link again. In any case, I'm in the process of porting the sample to my project. Even if it uses the GS it'll be faster than the CPU implementation I have now.


While that works in a demo, I don't see how that can work in practice when you can have dozens of different emitters emitting different types of particles at different rates.

Particle pooling works quite nicely. Just keep some metadata, like the number of active particles and a usage factor or similar. Once you approach the upper limit of active particles (e.g. 80%), your emitters start to emit fewer particles. This will auto-adjust quite nicely as long as you don't have some extreme situations.
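
One possible way to express that throttling in C++ is sketched below. The linear falloff between the threshold and a full pool is an assumption (the post only says emitters emit less past roughly 80%), and the function names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Scale factor for emission based on pool usage: full rate below the
// threshold, falling linearly to zero as the pool fills up.
inline float emissionScale(std::size_t active, std::size_t capacity,
                           float threshold = 0.8f) {
    assert(capacity > 0 && threshold >= 0.0f && threshold < 1.0f);
    const float usage = static_cast<float>(active) / static_cast<float>(capacity);
    if (usage <= threshold) return 1.0f;
    return std::max(0.0f, (1.0f - usage) / (1.0f - threshold));
}

// Number of particles an emitter may spawn this frame, never exceeding
// the room left in the pool.
inline std::size_t throttledCount(std::size_t requested, std::size_t active,
                                  std::size_t capacity) {
    const float scaled =
        static_cast<float>(requested) * emissionScale(active, capacity);
    const std::size_t room = capacity - std::min(active, capacity);
    return std::min(static_cast<std::size_t>(scaled), room);
}
```

Each emitter would call `throttledCount` with the number of particles it wants to spawn, so all emitters back off together as the pool fills.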


There were some posts I read here on gamedev about how using the GS ended up being slower than some other method.

Yes, the GS is not the best way to instantiate larger models (trees, rocks, ...). In that case standard instancing works better. But for very low expansion (1:3/1:4), the geometry shader seems to work much better, especially considering what MJP said about GPU hardware supporting this special case.

Particle systems often suffer more from pixel overdraw (alpha blending, no early-z, etc.) than from vertex performance. Still, it depends on a lot of factors (blend mode, particle size, attributes: when simulating the particles on the GPU, you will have a lot of attributes per vertex). Considering this, trimming the particles as Humus suggested in one of his presentations (Graphics Gems for Games - Findings from Avalanche Studios) might give you an additional performance boost (if needed :) ).

This topic is closed to new replies.
