How to create a particle system on the GPU?


Hi there,

I want to create a 2D GPU/GLSL-based particle system with some requirements in mind (up to 512k realtime simulated particles, OpenGL 3.3 features, dynamic grid search for neighbors, two-way collisions - no CUDA/OpenCL if possible).

I am familiar with GLSL, but I have not made anything beyond rendering on the GPU. I have, however, created several particle and physics systems on the CPU, so I have a pretty good understanding of how such systems work - I just don't know how they would work on the GPU. Please give me some light in the darkness I am in right now...

Thanks in advance,

Regards,

Finalspace


You don't need to use CUDA/OpenCL if you don't want to. You could store the particle positions and velocities in a floating point texture and calculate their motion in a shader. For example, suppose you create a 1024x1024 floating point RGB texture. This has enough pixels to store both positions and velocities for 512k particles. You'll want to create two of these textures and ping-pong them - in other words, during one frame you read from texture A and write to texture B, and during the next frame you alternate. During this step, you render a full-screen quad, read the position and velocity from your input texture, calculate new positions and velocities according to your equations of motion, and write the results to the output texture.
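For illustration, the update pass's fragment shader could look something like the sketch below. This assumes a layout where particle i's position lives in the top half of the texture and its velocity in the bottom half (the layout and all names here are mine, not a fixed convention):

```glsl
#version 330 core
// Ping-pong update pass: reads last frame's state from texture A,
// writes this frame's state to texture B (bound as the render target).
// Layout assumption: top half of the 1024x1024 texture = positions,
// bottom half = velocities, one texel per particle.
uniform sampler2D uState; // texture A (previous frame)
uniform float uDt;        // timestep
out vec4 outState;        // RGBA float target (RGBA32F is reliably color-renderable)

vec3 texelAt(int i) {
    return texelFetch(uState, ivec2(i % 1024, i / 1024), 0).rgb;
}

void main() {
    int texel    = int(gl_FragCoord.y) * 1024 + int(gl_FragCoord.x);
    bool isVel   = texel >= 512 * 1024;                 // bottom half = velocity texel
    int particle = isVel ? texel - 512 * 1024 : texel;

    vec3 pos = texelAt(particle);
    vec3 vel = texelAt(particle + 512 * 1024);
    vec3 acc = vec3(0.0, -9.81, 0.0); // gravity as a stand-in equation of motion

    // Euler integration: velocity texels get the new velocity,
    // position texels get the new position.
    outState = isVel ? vec4(vel + acc * uDt, 1.0)
                     : vec4(pos + vel * uDt, 1.0);
}
```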

Then, you render the particles using that output texture. The position attribute of each vertex in your particle system VBO would be relative to the center of the particle, and you would offset those positions in your shader according to the position stored in the texture. Your particle verts will need a particleID attribute so the shader can sample the correct position from the texture.
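A minimal vertex shader for that render step might look like this (attribute and uniform names are made up):

```glsl
#version 330 core
// Render pass: each vertex carries its particle's ID plus a corner
// offset relative to the particle center; the center is fetched from
// the state texture written by the update pass.
uniform sampler2D uState;  // output texture of the update pass
uniform mat4 uViewProj;
in vec2  aCorner;          // quad-corner offset, relative to particle center
in float aParticleID;      // which texel holds this particle's position

void main() {
    int id = int(aParticleID);
    vec3 center = texelFetch(uState, ivec2(id % 1024, id / 1024), 0).rgb;
    gl_Position = uViewProj * vec4(center + vec3(aCorner, 0.0), 1.0);
}
```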

If you want to get more advanced, you could of course include terms like mass, acceleration, rotation, angular speed, torque, etc., in your floating point texture. Use whatever your equations of motion demand.

If the simulation is stateless, you could just do the simulation in the vertex shader (vertex load will increase if you are not rendering points, but it should get the job done).
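For example, a toy sketch where the whole trajectory is a closed-form function of time and gl_VertexID, so nothing is stored between frames (all constants are arbitrary):

```glsl
#version 330 core
// Stateless simulation: position is a pure function of time and
// particle ID, so there is nothing to read back or ping-pong.
// Fountain-style toy example.
uniform float uTime;
uniform mat4  uViewProj;

void main() {
    float id  = float(gl_VertexID);
    float t   = mod(uTime + id * 0.01, 4.0);        // staggered 4-second lifetime
    float ang = id * 2.39996;                       // golden-angle spread
    vec3  v0  = vec3(cos(ang), 4.0, sin(ang));      // launch velocity
    vec3  pos = v0 * t + 0.5 * vec3(0.0, -9.81, 0.0) * t * t; // ballistic arc
    gl_Position  = uViewProj * vec4(pos, 1.0);
    gl_PointSize = 2.0;
}
```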

An alternative to CDProps' suggestion, if you don't want to use the texture-based solution, is transform feedback.

There are a few tutorials around on the net like this one: http://ogldev.atspace.co.uk/www/tutorial28/tutorial28.html

The premise is basically the same, but instead of using textures for your particle attributes you just have two buffer objects.

One buffer is used as your input buffer and the other as your output buffer.
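Host-side, the ping-pong loop looks roughly like the sketch below (buffer layout, varying names and helper functions are illustrative). Note GL_RASTERIZER_DISCARD, which skips the fragment stage entirely during the update pass:

```c
#include <GL/glew.h>   /* or your GL loader of choice */

/* Assumed to exist already: 'prog' linked with the varyings below,
 * buf[0]/buf[1] filled with initial particle data, and vao[src] set up
 * to read interleaved position+velocity from buf[src]. */
extern GLuint prog, buf[2], vao[2];

/* Once, before linking 'prog': tell GL which vertex shader outputs
 * to capture into the feedback buffer. */
void declare_varyings(GLuint program)
{
    const char *varyings[] = { "outPosition", "outVelocity" };
    glTransformFeedbackVaryings(program, 2, varyings, GL_INTERLEAVED_ATTRIBS);
    glLinkProgram(program);
}

/* Per frame: read particle state from buf[src], capture into buf[dst]. */
void update_particles(int src, int dst, GLsizei particleCount)
{
    glUseProgram(prog);
    glEnable(GL_RASTERIZER_DISCARD);     /* the "null fragment shader": nothing is rasterized */
    glBindVertexArray(vao[src]);         /* read previous state */
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, buf[dst]); /* capture new state */
    glBeginTransformFeedback(GL_POINTS);
    glDrawArrays(GL_POINTS, 0, particleCount);
    glEndTransformFeedback();
    glDisable(GL_RASTERIZER_DISCARD);
    /* afterwards: swap src/dst and render buf[dst] as an ordinary VBO */
}
```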

So transform feedback seems to be the way I need to go, and it has some advantages:

- No need to encode/decode positions from textures, so it's much easier to access particle position/velocity etc.

- No need to upload the modified positions into the VBO from the CPU - everything except initialization is done on the GPU

But I also found some drawbacks:

- Using random numbers seems like a real pain.

- It's hard to create multiple emitters, because transform feedback works at the vertex level and processes every particle - whether it's active or inactive - which requires a completely different way of thinking. Is there some "discard" function to skip certain vertices?

- Collisions seem limited, maybe? (Passing arrays via uniforms - there are limits on the maximum array length in GLSL, right? So I would need a multi-pass system where collision contacts are split across passes based on those limits.)

- Is there some "null" fragment shader I can create to skip rendering entirely? For multi-pass systems I want to render only at the very end, but before that run several GLSL passes that operate on the particles only.

But most importantly, I have no idea how to detect neighbor particles. On the CPU I used a dynamic two-dimensional hashmap to store the particles in a dynamic grid, with a bucket index-list of the particles in each cell; every particle also stores its current x/y cell on the grid, which may change once per frame.

How the hell do I integrate something like that with GLSL? I heard about radix sort and some hash-based algorithms to access neighbors, but they seem to be very complex...

What about compute shaders? I heard they are now part of OpenGL 4.x. Are they some sort of CUDA-like thing, but based on GLSL only?

Most of the time, the positions of particles are functional values, not explicit ones. If you can express your positions as functions, there is no need to store them and shuffle them back and forth.

You can then build a simple mesh of 100 or 1000 quads stacked on top of each other, each slightly offset in its y object-space position. Over this y you can implement a function in the vertex shader that repositions the quads of that one mesh (for example x=x+y, y=y, z=z+y, to create a diagonal "card stretch").
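As a vertex-shader sketch of that "card stretch" (names are mine):

```glsl
#version 330 core
// Quads are pre-placed in object space at increasing y; the shader
// repositions each one as a function of that y, per the formula above.
uniform mat4 uViewProj;
in vec3 aPos;

void main() {
    vec3 p = vec3(aPos.x + aPos.y, aPos.y, aPos.z + aPos.y); // diagonal stretch
    gl_Position = uViewProj * vec4(p, 1.0);
}
```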

I checked out some examples and tutorials regarding compute shaders and transform feedback.

Transform feedback is nice because it requires just OpenGL 3.3, but compute shaders (OpenGL 4.3 and up) are much more suitable for the scenario I am targeting.

To start with, I will just use compute shaders and render the final particle positions as spherical point sprites.

Thanks for the answers.

I implemented a basic compute shader particle system with simple attraction-force physics. It works great, and I can simulate up to 2 million particles running at 60 fps on my crappy GTX 580M. What I have so far are two shader storage buffer objects - one for positions and one for velocities - both initialized to max-particles (1 million) and filled using glMapBufferRange.
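The update kernel boils down to something like this (simplified - the binding points, names and the exact force term here are placeholders, not my code verbatim):

```glsl
#version 430
layout(local_size_x = 256) in;

layout(std430, binding = 0) buffer Positions  { vec4 pos[]; };
layout(std430, binding = 1) buffer Velocities { vec4 vel[]; };

uniform float uDt;
uniform vec3  uAttractor;
uniform uint  uCount;

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uCount) return;   // guard the rounded-up last workgroup

    vec3 d = uAttractor - pos[i].xyz;
    vec3 a = d * (10.0 / max(dot(d, d), 0.1)); // simple attraction, clamped near the attractor
    vel[i].xyz += a * uDt;
    pos[i].xyz += vel[i].xyz * uDt;
}
```

It gets dispatched with glDispatchCompute((particleCount + 255) / 256, 1, 1), hence the guard on i.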

Now I want to add emitters to the system, so that I can emit and destroy particles when needed, but I have no idea how to do that with this technique.

There are three questions I have about that:

- How can I integrate some sort of particle index buffer, to differentiate between active and dead particles, with compute shaders and SSBOs?

- How do I limit the compute dispatch to a certain amount? I don't want to always process the entire buffer; I want to differentiate between active and dead particles by moving dead particles to the end and processing only the active ones.

- How can I integrate the concept of particle emitters using compute shaders? Or is there a good way to mix CPU/GPU that is fast enough for ~512k particles?

I don't claim to have much experience with this, but here's an idea:

  • Have a compute shader determine dead particles, and write only the live particles (their IDs, say) into a new SSBO.
  • Somehow work out the number of particles that are still alive. Maybe atomically increment a shared counter in the shader that filters out dead particles? Or instead have that shader write a 0 to another SSBO when a particle is dead and a 1 when it is alive, and run another shader to count them. Not sure how well these would work.
  • To avoid having to determine work group sizes by reading buffers back, you can make use of glDispatchComputeIndirect. The work group counts are written by a previous shader into a buffer (bound as an SSBO there), and that same buffer is then bound to GL_DISPATCH_INDIRECT_BUFFER for the indirect dispatch.

So the process flow would look like...

Filter live particles (and write out work group sizes) -> compute indirect for processing only live particles

or..

Filter live particles (and write out 0 or 1 to separate SSBO) -> count live particles and write out work group sizes -> compute indirect

This way you never have to read data back to the CPU, which is exactly what you want to avoid. A sketch of the filter pass follows below.
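Here's what that filter pass could look like (everything here - names, bindings, storing the remaining lifetime in w - is illustrative). liveCount and numGroupsX have to be cleared to zero before each run, e.g. with glClearBufferSubData:

```glsl
#version 430
layout(local_size_x = 256) in;

layout(std430, binding = 0) buffer Particles   { vec4 posLife[]; }; // w = remaining lifetime
layout(std430, binding = 1) buffer LiveIndices { uint liveIndex[]; };
layout(std430, binding = 2) buffer Indirect {
    uint numGroupsX;  // consumed by glDispatchComputeIndirect
    uint numGroupsY;  // host sets these two to 1 once
    uint numGroupsZ;
    uint liveCount;   // also usable as the draw count later
};

uniform uint uTotalCount;

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uTotalCount) return;
    if (posLife[i].w <= 0.0) return;       // dead particle: skip it

    uint slot = atomicAdd(liveCount, 1u);  // reserve a slot in the compacted list
    liveIndex[slot] = i;
    // enough workgroups of 256 to cover all live slots so far, rounded up
    atomicMax(numGroupsX, slot / 256u + 1u);
}
```

The follow-up pass is then launched with glDispatchComputeIndirect(0) after binding this same buffer to GL_DISPATCH_INDIRECT_BUFFER.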

Read this excellent blog: https://directtovideo.wordpress.com/2009/10/06/a-thoroughly-modern-particle-system/

There are a lot of other blog entries there covering particles, and everything works within the limits of DX9-level hardware.

