GPU-based particle system

  • To stream from one buffer into another you need either DX10 or OpenGL 3 (the same generation of graphics card is required).
  • Using textures is the standard way on older hardware, and textures also make more effects possible (like particle lines).

Geometry shaders are quite expensive, so you shouldn't use them a lot (e.g. in a particle system with thousands of particles).

Thanks for the advice, but I think we are getting away from my real problem :D. Every method you mention needs to update some values, and that's where my problem begins: where should I write them? For example, I send initial positions in my buffer and, as I mentioned above, I find the current position of a particle by multiplying its velocity with the total elapsed game time. Something like position += velocity * elapsedTime won't work, because I always send the initial position with the buffer. So I need to update and keep some values, whether that's age or position. As far as I have found out, I need Shader Model 4.0 and DX10 to change a buffer from HLSL, and I'm using DX9. As spek said, I could use a texture for saving the data, but I don't like the idea of having a texture for the particle system. Also, a side question :D: do I need DirectX 10 to use geometry shaders? It is hard to find good tutorials for getting started, so if you know some beginner tutorials for geometry shaders I'd really appreciate it. Thanks.


With stateless particles you don't need to update the values after they are written, because they are permanently stored per potential particle.

As far as keeping data in a texture or a vertex buffer goes, it's largely irrelevant, as it's all video memory in the end. Don't think of a texture as a texture map, but just as a container of data. When the data is in a texture, you can use a pixel shader to update the values, never touching them with the CPU. Vertex texture access is also somewhat easier to deal with than Stream-Out and R2VB functionality.
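In HLSL, a minimal sketch of such a stateless vertex shader might look like the following (gTime, gGravity, and gWorldViewProj are assumed names, not anyone's actual code):

float gTime;             // total elapsed game time, set once per frame
float3 gGravity;         // constant acceleration, e.g. (0, -9.8, 0)
float4x4 gWorldViewProj;

struct VSIn
{
    float3 spawnPos  : POSITION;   // written once, when the particle is emitted
    float3 velocity  : TEXCOORD0;  // initial velocity, never updated
    float  spawnTime : TEXCOORD1;  // when the particle was born
};

float4 main(VSIn v) : POSITION
{
    float age = gTime - v.spawnTime;
    // Closed-form motion instead of a per-frame "position += velocity * dt":
    float3 pos = v.spawnPos + v.velocity * age + 0.5 * gGravity * age * age;
    return mul(float4(pos, 1), gWorldViewProj);
}

Nothing is ever written back: the same immutable spawn data produces the particle's state for any point in time.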

To stream from one buffer into another you need either DX10 or OpenGL 3 (the same generation of graphics card is required).
Using textures is the standard way on older hardware, and textures also make more effects possible (like particle lines).


Geometry shaders are quite expensive, so you shouldn't use them a lot (e.g. in a particle system with thousands of particles).

Currently I do billboarding by sending four texture coordinates to identify which corner I'm working on. So, apart from the texture coordinate, I copy everything four times for each particle. For 10,000+ particles I think that is a big overhead. So if it is possible to send just the center and derive the four vertex positions in a geometry shader, wouldn't that be better than my technique?
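(Roughly, that duplicated-vertex approach amounts to a vertex shader like this sketch; gCameraRight, gCameraUp, gSize, and gViewProj are assumed names:)

float3 gCameraRight, gCameraUp;  // camera basis vectors in world space
float  gSize;                    // particle size
float4x4 gViewProj;

struct VSIn
{
    float3 center : POSITION;    // identical in all four copies of the particle
    float2 corner : TEXCOORD0;   // (0,0), (1,0), (0,1) or (1,1): which corner this copy is
};

float4 main(VSIn v) : POSITION
{
    // Expand the shared center point out to this copy's corner of the quad:
    float2 offset = (v.corner - 0.5) * gSize;
    float3 pos = v.center + gCameraRight * offset.x + gCameraUp * offset.y;
    return mul(float4(pos, 1), gViewProj);
}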



With stateless particles you don't need to update the values after they are written, because they are permanently stored per potential particle.

As far as keeping data in a texture or a vertex buffer goes, it's largely irrelevant, as it's all video memory in the end. Don't think of a texture as a texture map, but just as a container of data. When the data is in a texture, you can use a pixel shader to update the values, never touching them with the CPU. Vertex texture access is also somewhat easier to deal with than Stream-Out and R2VB functionality.

I know I can use a texture to save any kind of data, but what I don't like about saving position or age to a texture is that I already have this information. I mean, I create a vertex buffer for my particle vertex type, which contains position and age. If I used the CPU for the particle engine, I would most likely update the position values in the vertex buffer, so I'm trying to find a similar solution, if there is one. Also, wouldn't it affect performance a lot to sample a texture for 10,000+ particles? And I don't understand what you mean with "When the data is in a texture, you can use a pixel shader to update the values, never touching them with the CPU." Do you mean updating with the CPU?

Also, another question popped into my head today: how can I delete a particle if I want to? Of course I know I should delete it from the vertex buffer :D, but I mean, how do I know which one to delete? Currently I reset particles back to the top when their position.y drops below 0, but let's say I wanted to remove particles instead when their position.y is below 0. Is moving them somewhere the camera can't see good enough? Because with that solution, until the GPU finds out the particle can't be seen by the camera, every operation before that still has to be done (which is the whole vertex shader, if I'm not wrong?). Thanks.

I know I can use a texture to save any kind of data, but what I don't like about saving position or age to a texture is that I already have this information. I mean, I create a vertex buffer for my particle vertex type, which contains position and age. If I used the CPU for the particle engine, I would most likely update the position values in the vertex buffer, so I'm trying to find a similar solution, if there is one. Also, wouldn't it affect performance a lot to sample a texture for 10,000+ particles?


If you are billboarding the sprites in a geometry shader, then it's a single texture fetch per texture, per sprite. It's point-filtered, so the hardware doesn't need to do anything fancy. It's fundamentally the same as passing the data in via a vertex stream. It's a fraction more expensive in the vertex shader, but you're now fetching less vertex data, so that's a bit cheaper.
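A rough sketch of that, assuming SM4/DX10 and hypothetical names (gDataTexture holds one particle position per texel):

Texture2D gDataTexture;          // particle positions, one texel per particle
SamplerState gPointSampler;      // point filtering, no mips
float3 gCameraRight, gCameraUp;  // view basis for billboarding
float  gHalfSize;
float4x4 gViewProj;

struct VSOut { float2 dataUV : TEXCOORD0; };  // the vertex shader just passes the texel coordinate through
struct GSOut { float4 pos : SV_Position; float2 uv : TEXCOORD0; };

[maxvertexcount(4)]
void main(point VSOut input[1], inout TriangleStream<GSOut> stream)
{
    // The single point-filtered fetch per sprite mentioned above:
    float3 center = gDataTexture.SampleLevel(gPointSampler, input[0].dataUV, 0).xyz;
    float3 right = gCameraRight * gHalfSize;
    float3 up    = gCameraUp    * gHalfSize;
    float2 uvs[4]  = { float2(0,0), float2(1,0), float2(0,1), float2(1,1) };
    float3 offs[4] = { -right + up, right + up, -right - up, right - up };
    for (int i = 0; i < 4; ++i)  // emit the quad as a 4-vertex triangle strip
    {
        GSOut o;
        o.pos = mul(float4(center + offs[i], 1), gViewProj);
        o.uv  = uvs[i];
        stream.Append(o);
    }
}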


And I don't understand what you mean with "When the data is in a texture, you can use a pixel shader to update the values, never touching them with the CPU." Do you mean updating with the CPU?


No, I mean with the GPU.
Instead of the CPU saying: foreach(particle) position = position + velocity * dt;
you would bind a rendertarget "result positions", bind two textures "previous positions" and "current velocities", and in a pixel shader do: output = tex2D(positionSampler, uv) + tex2D(velocitySampler, uv) * dt;
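Written out as a complete DX9-style pixel shader, that update pass might look like this sketch (sampler and constant names are assumptions):

sampler2D positionSampler;  // the "previous positions" texture, one particle per texel
sampler2D velocitySampler;  // the "current velocities" texture
float gDt;                  // frame delta time

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    float3 pos = tex2D(positionSampler, uv).xyz;
    float3 vel = tex2D(velocitySampler, uv).xyz;
    // Euler integration step, written into the "result positions" rendertarget:
    return float4(pos + vel * gDt, 1);
}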

Most likely, that pixel shader would be able to process an order of magnitude more particles than a CPU implementation. Sure, you can use SIMD intrinsics for your vector3 MAD instructions and then split the loop across multiple cores, but then you have the overhead of managing those threads, and the GPU is probably still going to be faster regardless. And if the game doesn't need to know the particle positions and velocities on the CPU, then there's no reason for them to be there. Bungie also do particle collisions with the world in screen space using the z-buffer, which shows that you don't really need to do collision detection on the CPU either.
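(Very roughly, and with the real implementation assumed away, that screen-space idea looks something like the sketch below inside the update shader; a real version would also derive a proper surface normal from the depth buffer rather than assuming one:)

sampler2D depthSampler;  // the scene depth buffer from the main render (assumed)
float4x4 gViewProj;
float gBounce;           // fraction of velocity kept after a bounce

float3 CollideVelocity(float3 pos, float3 vel)
{
    float4 clip = mul(float4(pos, 1), gViewProj);
    float2 suv = clip.xy / clip.w * float2(0.5, -0.5) + 0.5;  // clip space -> texture space
    float sceneDepth = tex2D(depthSampler, suv).r;
    if (clip.z / clip.w > sceneDepth)  // the particle is behind the visible surface
        vel = reflect(vel, float3(0, 1, 0)) * gBounce;  // flat-up normal assumed for the sketch
    return vel;
}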

CUDA lets you write these kinds of operations in a more generic C syntax than HLSL, and I believe NVidia's Apex makes heavy use of it.


Also, another question popped into my head today: how can I delete a particle if I want to? Of course I know I should delete it from the vertex buffer :D, but I mean, how do I know which one to delete? Currently I reset particles back to the top when their position.y drops below 0, but let's say I wanted to remove particles instead when their position.y is below 0. Is moving them somewhere the camera can't see good enough? Because with that solution, until the GPU finds out the particle can't be seen by the camera, every operation before that still has to be done (which is the whole vertex shader, if I'm not wrong?). Thanks.


Does it matter if you run the shader for a particle that is dead? If you do run it, you have a better idea of what your worst-case performance is going to be when you suddenly have every particle alive at once.

In the texture example I gave, a channel would be devoted to the time at which a particle spawned. You would kill a particle by writing the equivalent of FLT_MAX, so that the particle effectively hasn't spawned yet.
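(As a sketch, with dataSampler, gTime, gLifetime, and gSize all assumed names, the liveness test in the vertex shader could look like:)

float gTime, gLifetime, gSize;

float ParticleSize(sampler2D dataSampler, float2 uv)
{
    // One channel of the data texture holds the spawn time;
    // FLT_MAX means "killed / not spawned yet".
    float spawnTime = tex2Dlod(dataSampler, float4(uv, 0, 0)).w;
    float age = gTime - spawnTime;
    bool alive = spawnTime < 3.4e38 && age >= 0 && age < gLifetime;
    return alive ? gSize : 0.0;  // dead particles collapse to a degenerate, zero-size quad
}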
Thanks for the explanations.

If you are billboarding the sprites in a geometry shader, then it's a single texture fetch per texture, per sprite. It's point-filtered, so the hardware doesn't need to do anything fancy. It's fundamentally the same as passing the data in via a vertex stream. It's a fraction more expensive in the vertex shader, but you're now fetching less vertex data, so that's a bit cheaper.


So, as far as I understand, they are nearly the same per sprite. But even though it's good to know their speed is nearly the same, my problem is the unnecessary use of VRAM. Currently I have 100,000 particles in the scene, all billboarded, and in the vertex buffer I create four vertex structs per particle, all holding the same data (speed, position (the center, for billboarding), lifetime) except for their texture coordinate, which I use in the vertex shader to find which corner I'm currently working on. In one of the replies spek mentioned that with a geometry shader, sending just a single point (the center point, I believe, as you do with the default DirectX functionality) is enough. So I think what I'm trying to solve here is more of a resource-management problem than a speed problem.


Does it matter if you run the shader for a particle that is dead? If you do run it, you have a better idea of what your worst-case performance is going to be when you suddenly have every particle alive at once.
In the texture example I gave, a channel would be devoted to the time at which a particle spawned. You would kill a particle by writing the equivalent of FLT_MAX, so that the particle effectively hasn't spawned yet.

Let's say we have two kinds of particles and our computer can handle 10 particles at a time, so when we add an 11th particle the FPS starts to drop. First we create a vertex buffer containing 10 particles of the first kind. Then, for some reason, we want to add the second kind of particle, say 5 of them, and it's not a problem to remove some of the first-kind particles. In this case we exceed our limit. How should we solve this? Should we create a new vertex buffer for the first kind, copy 5 particles over from the old buffer, and delete the old buffer, or is it enough to just draw 5 of them outside the view? I'm not actually having this problem right now, since I started my game just 2 months ago and don't have much in it yet, but I wanted to learn how to approach it in case I ever use particles other than my snow particles (which I am planning to do :D).

So, as far as I understand, they are nearly the same per sprite. But even though it's good to know their speed is nearly the same, my problem is the unnecessary use of VRAM. Currently I have 100,000 particles in the scene, all billboarded, and in the vertex buffer I create four vertex structs per particle, all holding the same data (speed, position (the center, for billboarding), lifetime) except for their texture coordinate, which I use in the vertex shader to find which corner I'm currently working on. In one of the replies spek mentioned that with a geometry shader, sending just a single point (the center point, I believe, as you do with the default DirectX functionality) is enough. So I think what I'm trying to solve here is more of a resource-management problem than a speed problem.


If you're putting data into a VBO, you are using VRAM just like textures do. In fact, the memory used by VBOs and textures is interchangeable in DX11 (and probably DX10), because they are just buffers: you can have the data in a VBO and then bind the VBO as a texture by creating an appropriate DX11 ResourceView for it.

In my case, the VBO for the particle effect contains two streams:
  • A stream that defines the shape of a single particle via the sprite's texture coordinates. That's four float2s, stored in a 16-bit normalised type, so 16 bytes in total.
  • A stream that defines the index of the particle within the texture-mapped data. That's a single int32 per sprite (it could easily be a smaller type, too). In the vertex shader, that int32 is turned into a texture coordinate (a base offset into the texture is provided as a shader parameter).

The VBOs are rendered using instanced rendering, so the texture coordinates are set as the model stream and the index is set as the instance stream.

All other data is kept in textures, in my case. If I need to set up particles explicitly and individually, I draw into the textures using point sprites. But for the most part, I've found that any operation I do on one particle, I can run on all particles within the texture.
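A sketch of the vertex shader for that two-stream layout, in SM4-style syntax with all names assumed:

Texture2D gDataTexture;      // particle data, one particle per texel
SamplerState gPointSampler;  // point filtering, no mips
uint gBaseOffset;            // base offset into the texture (the shader parameter mentioned above)
uint gTexWidth;              // width of the data texture in texels
float2 gTexSize;             // (width, height) as floats
float4x4 gViewProj;

struct VSIn
{
    float2 corner        : TEXCOORD0;  // model stream: the sprite's shape/texture coordinates
    uint   particleIndex : TEXCOORD1;  // instance stream: the int32 index per sprite
};

float4 main(VSIn v) : SV_Position
{
    uint index = gBaseOffset + v.particleIndex;
    // Turn the flat index into the address of this particle's texel:
    float2 dataUV = (float2(index % gTexWidth, index / gTexWidth) + 0.5) / gTexSize;
    float3 center = gDataTexture.SampleLevel(gPointSampler, dataUV, 0).xyz;
    // ... billboard expansion using v.corner would go here, as in the earlier sketches
    return mul(float4(center, 1), gViewProj);
}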


[quote name='Digitalfragment' timestamp='1311756500' post='4841009']
Does it matter if you run the shader for a particle that is dead? If you do run it, you have a better idea of what your worst-case performance is going to be when you suddenly have every particle alive at once.
In the texture example I gave, a channel would be devoted to the time at which a particle spawned. You would kill a particle by writing the equivalent of FLT_MAX, so that the particle effectively hasn't spawned yet.

Let's say we have two kinds of particles and our computer can handle 10 particles at a time, so when we add an 11th particle the FPS starts to drop. First we create a vertex buffer containing 10 particles of the first kind. Then, for some reason, we want to add the second kind of particle, say 5 of them, and it's not a problem to remove some of the first-kind particles. In this case we exceed our limit. How should we solve this? Should we create a new vertex buffer for the first kind, copy 5 particles over from the old buffer, and delete the old buffer, or is it enough to just draw 5 of them outside the view? I'm not actually having this problem right now, since I started my game just 2 months ago and don't have much in it yet, but I wanted to learn how to approach it in case I ever use particles other than my snow particles (which I am planning to do :D).
[/quote]

Ignoring the GPU for a moment and talking about writing a fast CPU implementation: you really should keep all of your data, across all of your particles, relatively contiguous. If you have many buffers of particle data allocated separately, you will thrash your cache when you switch between them for processing. Here it's worthwhile working out a complete upper bound for the number of particles that can be live within the entire scene, and having the individual effects allocate/deallocate particles from within that one pool. An SOA (structure of arrays) approach, as opposed to an AOS (array of structures) approach, helps cache coherency and promotes the use of intrinsic-optimized math. For example:

fixed_vector<vec3> all_positions, all_velocities;  // one fixed-size pool shared by every effect
fixed_vector<float> all_lifetimes;
resizable_vector<int> eachEffects_startIndex, eachEffects_particleCount;  // each effect's subrange of the pools

So here each effect instance is effectively a subregion of the position, velocity, and lifetime pools. You've now set a budget for your particle systems that cannot be exceeded mid-game, so you won't risk running out of video memory. A management system would be responsible for defragmenting the pools when particles die, and also for placing per-effect restrictions on spawning based on the CPU load.

Going back to the GPU side now: those pools can, in your case, be packed into a single VBO, and the draw call for each effect can directly use the startIndex and particleCount from the "eachEffects" arrays. In my case, those fixed_vectors are channels in textures, and the startIndex/particleCount are used to share the textures between multiple instances of effects. Creating many different vertex buffers causes pretty much the same thrashing problems as making many memory allocations in system memory does, not only on the GPU itself, but also in the CPU now having to switch vertex buffers between draw calls.

Because I'm using textures to store the data, and they are rendertargets, I never have to upload the memory from CPU to GPU: zero memory-bandwidth cost. I never need the individual particles' data back on the CPU either, so it lives in VRAM only, whereas a dynamic VBO might very well exist in system memory and VRAM simultaneously.
First of all, thanks for explaining everything by writing mini-articles :D, they are very helpful. There really is no escaping textures for a particle system, so I will start using them. I was also already thinking about using instancing for particles; I learned it not long ago and use it for the grass in my game, but I wasn't sure whether I could use it with particles. So I will start using that as well.

