ekba89

GPU-based particle system


I'm trying to create a GPU-based particle system and I'm about halfway done. Currently I send position and velocity to the GPU and compute position += totalGameTime * velocity. My problem is that I need a mechanism to reset or delete particles, and I'm not sure how to build it. For example, a snow particle should restart from y = 50 once its y value drops below 0. Here my problem begins: if I loop over every particle on the CPU to decide which ones to delete or reset, there's no point computing them on the GPU. Any ideas how I can do this? Thanks.

What I did is use a lifetime value in the vertex data and reserve a maximum number of particles in the VBO. If the spawner can have up to 1,000 particles, reserve vertex data for 1,000 particles.

Aside from managing position and such, you can also store a particle lifetime. While it's higher than 0, the particle is "active". If it's <= 0, the particle becomes invisible or gets "recycled" as a new one. To know whether a particle can be used again, I numbered each particle with an ID (starting at 0, 1, 2, and so on).
[code]
// vertex shader (pseudocode)
if (particle.lifeTime <= 0.f)
{
    if (particle.id < currentMaxParticles)
    {
        // Recycle
        particle.lifeTime = ...; // something higher than 0 again
        // reset position, velocity, color, size, whatever
    }
    else
    {
        // Keep it invisible by scaling it to 0,
        // or place it somewhere far outside the world
        particle.size = 0;
    }
}
[/code]
With some extra intelligence in the timer, you can also add a spawn delay to prevent all particles from spawning right at the start.
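To make the flow concrete, here's the same recycle logic (plus a per-ID spawn delay) as a little CPU-side Python sketch; all the names and constants are invented for illustration:

```python
import random

MAX_PARTICLES = 1000

def update_particle(p, current_max_particles, dt, spawn_delay_per_id=0.01):
    """The shader pseudocode above, as plain Python: dead particles with a
    low enough ID get recycled, the rest are kept invisible."""
    p["life"] -= dt
    if p["life"] <= 0.0:
        if p["id"] < current_max_particles:
            # Recycle: stagger respawn by ID so not everything spawns at once
            p["life"] = 2.0 + p["id"] * spawn_delay_per_id
            p["pos"] = [random.uniform(-1, 1), 50.0, random.uniform(-1, 1)]
        else:
            p["size"] = 0.0  # keep it invisible

particles = [{"id": i, "life": 0.0, "pos": [0, 0, 0], "size": 1.0}
             for i in range(MAX_PARTICLES)]
for p in particles:
    update_particle(p, current_max_particles=200, dt=0.016)
```

After one update pass only the first 200 IDs are alive; everything else has been scaled to zero.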

Rick

Thanks for the reply. But as far as I know, we can't change data in the buffer from the vertex shader. So how can I change a particle's lifetime in the shader?

Here is my code. I already have an age value because I thought it would be useful, but as I said, I don't know how to change its real value from the shader, so I don't use it here. I'm currently using the particle system for snow, so particles start at y = 50 and fall over time. I need to reset them once their y position drops below 0.

[code]
BBVertexToPixel ParticleVS(float4 inPos : POSITION, float2 inTex : TEXCOORD0, float4 velocityAndAge : TEXCOORD1)
{
    BBVertexToPixel Output = (BBVertexToPixel)0;

    float3 position = inPos.xyz;

    position += velocityAndAge.xyz * xTotalGameTime;

    // other stuff for billboarding...
[/code]

Two ways to fix that:
1. Store the vertex results in a texture: reserve 1 pixel per particle (vertex) across 2 or 3 textures (or eventually pack them together). RGB for position, another RGB for velocity, A for lifetime, another A for...
2. Use OpenGL Transform Feedback (not sure what it's called in DX... Stream Output?)

Option 1 uses textures (FBOs / rendertargets) to store the results: read the position and other data per vertex from a texture, add velocity * time, and write it back to the texture.

I would go for option 2 though: no textures needed. There you write the data back into the VBO, so yes, you can :)
What I do is create 1 vertex per particle and update them all with a vertex shader (no fragment shader needed). Then another shader renders that VBO with a geometry shader, which expands the single points into billboard sprites. This lets you render huge numbers of particles with no CPU work at all.
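The geometry-shader expansion itself is just "center point + camera axes -> 4 corners". A rough CPU-side Python sketch of that math (the function and parameter names are mine, purely illustrative):

```python
def expand_billboard(center, cam_right, cam_up, size):
    """Expand one particle center point into 4 billboard corners, using the
    camera's right/up axes so the quad always faces the camera."""
    cx, cy, cz = center
    rx, ry, rz = cam_right
    ux, uy, uz = cam_up
    h = size * 0.5
    corners = []
    for sr, su in [(-1, -1), (1, -1), (1, 1), (-1, 1)]:
        corners.append((cx + sr * h * rx + su * h * ux,
                        cy + sr * h * ry + su * h * uy,
                        cz + sr * h * rz + su * h * uz))
    return corners

# One snow particle at y = 50, axis-aligned camera, 2-unit quad:
quad = expand_billboard((0.0, 50.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 2.0)
```

The geometry shader would emit these 4 positions (as a triangle strip) from the single input point, so the vertex buffer only ever carries one vertex per particle.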

Rick

Thanks again. I'm using DirectX, so I'll look into stream output. I'm currently using 4 vertices per billboard, so I'll look into the geometry shader solution too. If you know any good tutorials to start with, that would be very helpful.

You can make completely stateless particles on the GPU by letting particles respawn on the frame they die, treating their lifetime as a looping value too. Of course, this trick only works if their spawn rate is 1:1 with their lifetime and the particles are indeed purely stateless, but for many localised effects it's pretty valid.

By purely stateless, I mean the state of every particle is purely a function of time and constant values. At startup you just fill a buffer with a single 'spawns at' time and all the other initial parameters per particle. This means nothing can change at runtime, though, not even the emitter location. A particle exists as long as the effect instance's age >= the particle's spawn-at time. Each particle's age is ('instance age' - 'spawn at') % 'particle lifetime'.
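The age arithmetic is easy to check on the CPU. A minimal Python sketch (names are mine):

```python
def particle_age(instance_age, spawn_at, lifetime):
    """Stateless particle age: purely a function of time and constants.
    Returns None while the particle has not spawned yet."""
    if instance_age < spawn_at:
        return None          # not alive yet
    return (instance_age - spawn_at) % lifetime

# A particle that spawns at t=2 with a 5 second lifetime:
particle_age(1.0, 2.0, 5.0)   # not spawned yet
particle_age(3.5, 2.0, 5.0)   # 1.5 seconds into its first life
particle_age(8.0, 2.0, 5.0)   # 1.0 second into its second loop
```

Exactly this modulo runs in the vertex shader; no per-particle state ever has to be written back.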

This works well for localised looping effects. I've used it mostly for fire embers and ash. In those cases I typically had a single float4 per sprite: 1 spawn-at time and 3 random variables. InitialPosition, InitialDirection, DirectionSpread, InitialVelocity, Gravity and VortexLine were shader parameters. All the math was done in the vertex shader.

Where they can be used, these sorts of effects require zero CPU time beyond initialisation and issuing the draw call for the resulting effect (oh, and culling, but that's a given anyway). And the GPU doesn't need to feed anything back into VRAM either, since it's stateless.

The most annoying downside is that, because there is no feedback, you have to work out the maximum bounding-box size manually.

"Of course, this trick only works if their spawn-rate is 1:1 with their lifetime"

It's perfectly reasonable to include some downtime as a ratio inside the particle data. You just draw a degenerate quad/tri if the particle is currently not running. By tweaking the origin times and respawn ratios you can then generate effects like a fire that flares up and dies down.
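Folded into the stateless age math, that downtime ratio might look like this Python sketch (`duty` is my name for the visible fraction of each loop):

```python
def is_visible(instance_age, spawn_at, period, duty):
    """The particle loops with `period`, but is only drawn during the first
    `duty` fraction of each loop; the rest is hidden downtime (the shader
    would emit a degenerate quad/tri instead of a sprite)."""
    if instance_age < spawn_at:
        return False
    return ((instance_age - spawn_at) % period) < period * duty

# 4 second period, visible for 75% of it:
is_visible(2.0, 0.0, 4.0, 0.75)   # visible: 2.0 into the loop, before 3.0
is_visible(3.5, 0.0, 4.0, 0.75)   # hidden: in the 3.0..4.0 downtime window
```

Varying `duty` per particle (or animating it as a shader parameter) gives the flare-up/die-down behaviour described above.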

[quote name='Katie' timestamp='1311581890' post='4839897']
"Of course, this trick only works if their spawn-rate is 1:1 with their lifetime"

It's perfectly reasonable to include some downtime as a ratio inside the particle data. You just draw a degenerate quad/tri if the particle is currently not running. By tweaking the origin times/respawn ratio times you can then generate effects like a fire which flares up and dies down.
[/quote]

This is true, and it works because the hidden time is considered part of a single particle's lifetime. Terminating the effect so that all of the particles die off, without editing anything beyond a shader parameter, isn't too hard either; it's just the inverse of starting it up.

Instead of using an ID to control how many of the maximum particle count are active at a given time, you can just include a probability in the respawn code.

e.g.

You have a max of 1000 particles but only want 200 active; in your update code you have something like:

[code]
life_time -= time;
if (life_time < 0)
{
    if (rnd() < 0.2)
        life_time = start_life;
}
[/code]
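As a sanity check, here's a quick Python simulation of that respawn probability. One caveat worth knowing: the steady-state active fraction isn't the probability itself, because a dead particle only waits on average (1 - p) / p extra frames before respawning, so the fraction is roughly lifetime / (lifetime + downtime). The numbers below are invented:

```python
import random

random.seed(1)

MAX_COUNT = 500    # pool size
LIFETIME = 20      # frames a particle stays alive once (re)spawned
P_RESPAWN = 0.2    # per-frame chance that a dead particle respawns

life = [0] * MAX_COUNT            # everything starts dead
samples = []
for frame in range(2000):
    for i in range(MAX_COUNT):
        if life[i] > 0:
            life[i] -= 1
        elif random.random() < P_RESPAWN:
            life[i] = LIFETIME
    if frame >= 200:              # skip the warm-up
        samples.append(sum(1 for l in life if l > 0))

avg_active = sum(samples) / len(samples)
# Rough steady state: LIFETIME / (LIFETIME + 1 + (1 - P_RESPAWN) / P_RESPAWN)
# = 20 / 25 = 0.8 of the pool, i.e. around 400 of the 500 here.
```

So to hit a specific active count, tune the probability against the particle lifetime rather than reading it as a fraction directly.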

Thanks for the advice, but I think we're getting away from my real problem :D. Every method you've described needs to update some values, and that's where my problem begins: where should I write them? I send initial positions in my buffer and, as mentioned above, I find each particle's current position by multiplying velocity by the total elapsed game time. Something like position += velocity * elapsedTime won't work, because I always send the initial position in the buffer. So I need to update and store some values, whether age or position. As far as I've found, I need shader model 4.0 and DX10 to write to a buffer from HLSL, and I'm using DX9. As spek said, I could use a texture to store the data, but I don't like the idea of keeping a texture around for the particle system. Also, a side question :D: do I need DirectX 10 to use geometry shaders? It's hard to find good tutorials for getting started, so if you know some beginner tutorials for geometry shaders I'd really appreciate it. Thanks.

[list][*]To stream from one buffer into another you need either DX10 or OpenGL 3 (the same generation of graphics card is required).[*]Using textures is the standard way on older hardware, and textures also make more effects possible (like particle lines).[/list]
Geometry shaders are quite expensive, so you shouldn't use them heavily (like in a particle system with thousands of particles).

[quote name='ekba89' timestamp='1311604257' post='4839998']
Thanks for advices but i think we are getting away from my real problem :D. I mean every method you said needs to update some values and here my problem begins. Where should i write them? For example i send initial positions with my buffer and as i mentioned above i find current position of particle by multiplying it with total elapsed game time. For example if i do position += velocity *elapsedTime this won't work because i always sent initial position with buffer. So i need to update and keep some values whether it be age or position. As far as i found out i need shader 4.0 and dx 10 to change buffer from hlsl and i'm using dx 9. As spek said i can use a texture for saving data but i don't like the idea having a texture for particle system. Also a side question :D do i need directx 10 to use geometry shaders and it is hard to find good tutorials for starting glsl so if you know some beginner tutorials for geometry shader i really appreciate. Thanks.
[/quote]

With stateless particles you don't need to update the values after they are written, because they are permanently stored per potential particle.

As for keeping data in a texture versus a vertex buffer, it's largely irrelevant, since it's all video memory in the end. Don't think of a texture as a texture map, just as a container of data. When it's in a texture, you can use a pixel shader to update the values, never touching it with the GPU. Vertex texture access is also somewhat easier to deal with than Stream-Out and R2VB functionality.

[quote name='Danny02' timestamp='1311628499' post='4840199']
to stream from one buffer into another u need a either DX10 or OpenGL3(same graphic cards required)
it is the standard way to use textures on older hardware, also using textures make more effects possible(like particle lines)


geometry shaders are quite expensive so u shouldn't use them alot(like in a particle system with thousands of particles).
[/quote]
Currently I do billboarding by sending 4 texture coordinates, which tell me which corner I'm working on. So apart from the texture coordinate, I copy everything 4 times for each particle. For 10,000+ particles I think that's a big overhead. So if it's possible to send just the center and derive the 4 vertex positions in a geometry shader, wouldn't that be better than my technique?


[quote name='Digitalfragment' timestamp='1311668509' post='4840411']
With stateless particles you don't need to update the values after they are written, because they are permanently stored per potential particle.

As for keeping data in a texture versus a vertex buffer, it's largely irrelevant, since it's all video memory in the end. Don't think of a texture as a texture map, just as a container of data. When it's in a texture, you can use a pixel shader to update the values, never touching it with the GPU. Vertex texture access is also somewhat easier to deal with than Stream-Out and R2VB functionality.
[/quote]
I know I can use a texture to save any kind of data, but what I don't like about saving position or age in a texture is that I already have that information: I create a vertex buffer for my particle vertex type, which contains position and age. If I used the CPU for the particle engine, I would most likely update the position values in the vertex buffer, so I'm trying to find a similar solution, if there is one. Also, wouldn't sampling a texture for 10,000+ particles affect performance a lot? And I don't understand what you mean by "When it's in a texture, you can use a pixel shader to update the values, never touching it with the GPU." Do you mean updating with the CPU?

Another question popped into my head today: how can I delete a particle if I want to? Of course I know I should delete it from the vertex buffer :D, but how do I know which one to delete? Currently I reset particles to the top when their position.y drops below 0, but say I wanted to remove them at that point instead. Is moving them somewhere the camera can't see good enough? With that solution, until the GPU works out that the particle isn't visible, every operation before that still has to run (which is the whole vertex shader, if I'm not wrong?). Thanks.

[quote name='ekba89' timestamp='1311692235' post='4840567']
I know i can use texture to save any kind of data but what i don't like about saving position or age to a texture is i already have these information. I mean i create vertex buffer for my particle vertex type which contains position and age. So for example if i use cpu for particle engine i most likely update position values in vertex buffer. So i'm trying to find a similar solution if there is any. Also wouldn't it effect performance much to sample texture for 10000+ particle?
[/quote]

If you are billboarding the sprites in a geometry shader, then it's a single t-fetch per texture, per sprite. It's point-filtered, so the hardware doesn't need to do anything fancy. It's fundamentally the same as passing the data in via a vertex stream: a fraction more expensive in the vertex shader, but you're now fetching less vertex data, so that's a bit cheaper.

[quote name='ekba89' timestamp='1311692235' post='4840567']
And I don't understand what you mean by "When it's in a texture, you can use a pixel shader to update the values, never touching it with the GPU." Do you mean updating with the CPU?
[/quote]

No, I mean with the GPU.
Instead of the CPU doing: foreach(particle) position = position + velocity * dt;
you would bind a rendertarget "result positions", bind 2 textures "previous position" and "current velocity", and in a pixel shader do: output = tex2d(positionSampler, uv) + tex2d(velocitySampler, uv) * dt;

Most likely, that pixel shader would be able to process an order of magnitude more particles than a CPU implementation. Sure, you can use SIMD intrinsics for your vector3 MAD instructions and split the loop across multiple cores, but then you have the overhead of managing those threads, and the GPU is probably still going to be faster regardless. And if the game doesn't need to know the particle positions & velocities on the CPU, there's no reason for them to be there. Bungie even do particle collisions with the world in [i]screen space using the z-buffer[/i], which shows you don't really need to do collision detection on the CPU either.
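To make the rendertarget pass concrete, here is the same update written as plain Python over flat lists standing in for the two input textures and the output target (illustrative only):

```python
# "Textures" as flat lists of (x, y, z) tuples, one texel per particle.
prev_position = [(0.0, 50.0, 0.0), (1.0, 40.0, 0.0)]
velocity      = [(0.0, -2.0, 0.0), (0.0, -4.0, 0.0)]
dt = 0.5

# The pixel shader body, run once per texel:
#   output = tex2d(positionSampler, uv) + tex2d(velocitySampler, uv) * dt
result_position = [
    (px + vx * dt, py + vy * dt, pz + vz * dt)
    for (px, py, pz), (vx, vy, vz) in zip(prev_position, velocity)
]
# Next frame the targets ping-pong: result_position becomes prev_position.
```

The GPU does this for every texel in one fullscreen quad draw, which is where the order-of-magnitude win comes from.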

CUDA lets you write these sorts of operations in a more generic C syntax than HLSL, and I believe NVidia's APEX makes heavy use of it.

[quote name='ekba89' timestamp='1311692235' post='4840567']
Also today another question pop up in my head. How can i delete a particle if i want to.Of course i know i should delete it from vertex buffer :D but i mean how should i know which one i should delete. For example currently i'm trying to reset particles from the top when their position.y is below 0 but lets say i wanted to remove particles when their position.y is below 0. Is moving them a place where camera can't see good enough because with this solution until gpu finds out particle can't be seen by camera every operation before that needs to be done(which is all vertex shader if im not wrong?). Thanks.
[/quote]

Does it matter if you run the shader for a particle that is dead? If you do run it, you get a better idea of your worst-case performance for when every particle is suddenly alive at once.

In the texture example I gave, a channel would be devoted to the time at which a particle spawned. You would kill a particle by drawing the equivalent of float_max, so that the particle "hasn't spawned yet".

Thanks for the explanations.
[quote name='Digitalfragment' timestamp='1311756500' post='4841009']
If you are billboarding the sprites in a geometry shader, then its a single t-fetch per texture, per sprite. Its point-filtered, so the hardware doesnt need to do anything fancy. Its fundamentally the same as passing the data in via a vertex stream. Its a fraction more expensive in the vertex shader, but you're now fetching less vertex data so thats a bit cheaper.
[/quote]

So as far as I understand they're nearly the same speed per sprite, and although that's good to know, my problem is the unnecessary use of VRAM. Currently I have 100,000 billboarded particles in the scene. In the vertex buffer I create 4 vertex structs per particle, and all of them carry the same data (speed, position (the billboard center), lifetime) except for their texture coordinate, which I use in the vertex shader to work out which corner I'm on. In one of the replies spek mentioned that with a geometry shader, sending just a single point (the center, I believe, as with the default DirectX functionality) is enough. So what I'm trying to solve here is more a resource-management problem than a speed problem.

[quote name='Digitalfragment' timestamp='1311756500' post='4841009']
Does it matter if you run the shader for a particle that is dead? If you do run it, you have a better idea of what your worst case scenario performance is going to be when you suddenly have every particle alive at once.
In the texture example i gave, a channel would be devoted to the time at which a particle spawned. You would kill a particle by drawing the equivelant of float_max, thereby the particle hasnt spawned yet.
[/quote]
Let's say we have 2 kinds of particles and our computer can handle 10 particles at a time, so FPS starts to drop past 10. First we create a vertex buffer containing 10 particles of the first kind. Then for some reason we want to add 5 particles of a second kind, and it's fine to remove some of the first kind; but now we've exceeded our limit. How should we solve this? Should we create a new vertex buffer for the first kind, copy 5 particles' worth of the old buffer's data and delete the old buffer, or is it enough to just draw 5 of them outside the view? I'm not actually having this problem right now, since I started my game just 2 months ago and don't have much in it yet, but I wanted to learn how to approach it in case I use particles other than my snow (which I am planning to do :D).

[quote name='ekba89' timestamp='1311791938' post='4841216']
So as far as i understand they are nearly same per sprite but eventhough its good to learn their speed is nearly same per sprite my problem is unnecessary usage of vram. Currently i have 100000 particle in the scene which are using billboarding. And in vertex buffer for each particle i create 4 vertex struct and all of them have same data(speed, position(center for billboarding), lifetime) other than their texture coordinate and i use this to find which corner i'm currently working in vertex shader. In one of the replies spek mentioned with geometry shader just sending single point(which is center point i believe as you do with default directx functionality) is enough. So i think what i'm trying to solve here is more of a resource management problem than a speed problem.
[/quote]

If you're putting data into a VBO you are using VRAM just like textures do; in fact the memory used by VBOs and textures is interchangeable in DX11 (and probably DX10), because they are just buffers. You can have the data in a VBO and then bind the VBO as a texture by creating an appropriate DX11 ResourceView for it.

In my case, the VBO for the particle effect contains two streams:
A stream that defines the shape of a single particle via the sprite's texture coordinates. That's 4 float2s, stored in a 16-bit normalised type, so 16 bytes total.
A stream that defines the index of the particle within the texture-mapped data. That's a single int32 per sprite (it could easily be a smaller type). In the vertex shader that int32 is turned into a texture coordinate (a base offset into the texture is provided as a shader parameter).

The VBOs are rendered using instanced rendering, with the tex-coords set as the model stream and the index set as the instance stream.

All other data is kept in textures, in my case. If I need to set up particles explicitly and individually, I draw into the textures using point sprites. But for the most part, I've found that any operation I do on one particle, I can run on all particles within the texture.
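A Python sketch of building those two streams, and of the index-to-UV conversion the vertex shader does (the texture size and the +0.5 texel-center offset are my assumptions):

```python
TEX_SIZE = 256  # data texture is TEX_SIZE x TEX_SIZE, one texel per particle

# Model stream: one quad's texture coordinates, shared by every instance.
quad_uvs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Instance stream: one int per sprite, its index into the data texture.
instance_ids = list(range(1000))

def index_to_texel_uv(index, base_offset=0):
    """What the vertex shader does: particle index -> data-texture UV.
    Sampling at texel centers, hence the +0.5."""
    i = index + base_offset
    x = i % TEX_SIZE
    y = i // TEX_SIZE
    return ((x + 0.5) / TEX_SIZE, (y + 0.5) / TEX_SIZE)
```

With point filtering, that UV fetches exactly one particle's data texel, so the per-sprite vertex data really is just the quad shape plus one integer.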

[quote name='ekba89' timestamp='1311791938' post='4841216']
[quote name='Digitalfragment' timestamp='1311756500' post='4841009']
Does it matter if you run the shader for a particle that is dead? If you do run it, you have a better idea of what your worst case scenario performance is going to be when you suddenly have every particle alive at once.
In the texture example i gave, a channel would be devoted to the time at which a particle spawned. You would kill a particle by drawing the equivelant of float_max, thereby the particle hasnt spawned yet.
[/quote]
Lets say we have 2 kinds of particles and our computer can handle 10 particle at a time. So when we add a new particle after 10 particles fps starts to drop. And first we create vertex buffer which contains 10 particle for first kind of particle. And for some reason we want to add second kind of particle lets say 5 of it and it is not problem to remove some of the first kind particles. In this case we exceed our limit. How should we solve this. Should we create new vertex buffer of first kind and copy 5 of the old buffers data and remove old buffer or it is enough to just draw 5 of them outside the view. Actually i'm not having this problem right now since i started my game just 2 months age i don't have much in it yet. But i wanted to learn how should i approach in case i use other particles than my snow particles(which i am planning to use :D).
[/quote]

Ignoring the GPU and talking about writing a fast CPU implementation: you really should keep all of the data across all of your particles relatively contiguous. If you have many separately allocated buffers of particle data, you will thrash the cache when you switch between them for processing. Here, it's worthwhile working out a complete upper bound for the number of live particles in the entire scene, and having the individual effects allocate/deallocate particles from within that one pool. An SOA (structure of arrays) approach, as opposed to an AOS (array of structures) approach, helps cache coherency and promotes the use of intrinsic-optimised math. For example:

[code]
fixed_vector<vec3> all_positions, all_velocities;
fixed_vector<float> all_lifetimes;
resizable_vector<int> eachEffects_startIndex, eachEffects_particleCount;
[/code]

So here, each effect instance is effectively a subregion of the position, velocity & lifetime pools. You've now set a budget for your particle systems, so they can't grow mid-game and risk running out of video memory. A management system would be responsible for defragmenting the pools when particles die, and for placing per-effect restrictions on spawning based on CPU load.
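A toy Python version of that pooled SOA layout with per-effect subranges (all names invented; a real manager would also free and defragment):

```python
MAX_PARTICLES = 10000  # hard budget, set once

# Structure of Arrays: one flat pool per attribute.
all_positions  = [(0.0, 0.0, 0.0)] * MAX_PARTICLES
all_velocities = [(0.0, 0.0, 0.0)] * MAX_PARTICLES
all_lifetimes  = [0.0] * MAX_PARTICLES

effect_start = []   # eachEffects_startIndex
effect_count = []   # eachEffects_particleCount
_next_free = 0

def allocate_effect(count):
    """Carve a contiguous subrange out of the pool; returns the effect id,
    or None if the budget is exhausted."""
    global _next_free
    if _next_free + count > MAX_PARTICLES:
        return None
    effect_start.append(_next_free)
    effect_count.append(count)
    _next_free += count
    return len(effect_start) - 1

snow = allocate_effect(1000)
embers = allocate_effect(200)
```

Each effect then just iterates (or draws) its own [start, start + count) slice of the shared arrays, so all particle data stays contiguous in memory.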

Going back to the GPU side: those pools can, in your case, be packed into a single VBO, with the draw call for each effect directly using the startIndex and particleCount from the "eachEffects" arrays. In my case, those fixed_vectors are channels in textures, and the startIndex/particleCount are used to share the textures between multiple instances of effects. Creating many different vertex buffers causes much the same thrashing problems as making many allocations in system memory: not only on the GPU itself, but also on the CPU, which now has to switch vertex buffers between draw calls.

Because I'm using textures (which are rendertargets) to store the data, I never have to upload the memory from CPU to GPU: zero memory-bandwidth cost. I never need the individual particles' data back on the CPU either, so it's VRAM only, whereas a dynamic VBO might well exist in system memory and VRAM simultaneously.

First of all, thanks for explaining everything in mini-articles :D, they are very helpful. There's clearly no escaping textures for a particle system, so I'll start using them. I was also already considering instancing for particles; I learned it not long ago and use it for grass in my game, but wasn't sure it could work with particles, so I'll start using that as well.
