So as far as I understand, the two approaches are nearly the same speed per sprite, and even though that's good to know, my problem is unnecessary VRAM usage. Currently I have 100,000 particles in the scene, all billboarded. For each particle I create 4 vertex structs in the vertex buffer, and all of them carry the same data (speed, position (the center, for billboarding), lifetime); only the texture coordinate differs, and I use that in the vertex shader to find which corner I'm currently working on. In one of the replies spek mentioned that with a geometry shader, just sending a single point (the center point, I believe, as with the default DirectX functionality) is enough. So I think what I'm trying to solve here is more of a resource management problem than a speed problem.
If you're putting data into a VBO you are using VRAM just like textures do; in fact, the memory used by VBOs and textures is interchangeable in DX11 (and probably DX10), because they are just buffers. You can keep the data in a VBO and then bind that VBO as a texture by creating an appropriate DX11 ResourceView for it.
In my case, the VBO for the particle effect contains two streams:
A stream that defines the shape of a single particle via the sprite's texture coordinates. That's 4 float2s stored in a 16-bit normalised type, so 16 bytes total.
A stream that defines the index of the particle within the texture-mapped data. That's a single int32 per sprite (it could quite easily be a smaller type too). In the vertex shader that int32 is turned into a texture coordinate (a base offset into the texture is provided as a shader parameter).
The VBOs are rendered using instanced rendering: the tex-coords are set as the model stream, and the index is set as the instance stream.
All other data is kept in textures, in my case. If I need to set up particles explicitly and individually, I draw into the textures using point sprites. But for the most part, I've found that any operation I do on one particle, I can run on all particles within the texture.
[quote name='Digitalfragment' timestamp='1311756500' post='4841009']
Does it matter if you run the shader for a particle that is dead? If you do run it, you have a better idea of what your worst-case performance is going to be when you suddenly have every particle alive at once.
In the texture example I gave, a channel would be devoted to the time at which a particle spawned. You would kill a particle by writing the equivalent of float_max, so that the particle hasn't spawned yet.
Let's say we have 2 kinds of particles and our computer can handle 10 particles at a time, so when we add an 11th particle the FPS starts to drop. First we create a vertex buffer that contains 10 particles of the first kind. Then for some reason we want to add 5 particles of the second kind, and it's not a problem to remove some of the first kind. At this point we exceed our limit. How should we solve this? Should we create a new vertex buffer for the first kind, copy 5 of the old buffer's particles into it, and remove the old buffer, or is it enough to just draw 5 of them outside the view? Actually I'm not having this problem right now, since I started my game just 2 months ago and don't have much in it yet. But I wanted to learn how I should approach this in case I use particles other than my snow particles (which I am planning to use).
[/quote]
Ignoring the GPU and talking about writing a fast CPU implementation: you really should keep the data across all of your particles relatively contiguous. If you have many buffers of particle data allocated separately, you will thrash your cache when you switch between them for processing. Here it's worthwhile working out a complete upper bound for the number of particles that can be live within the entire scene, and having the individual effects allocate/deallocate particles from within that one pool. An SOA (structure of arrays) approach, as opposed to an AOS (array of structures) approach, helps cache coherency and promotes the use of optimized intrinsic math. For example:
// SOA: one shared pool per particle attribute
fixed_vector<vec3> all_positions, all_velocities;
fixed_vector<float> all_lifetimes;
// per-effect views into the pools
resizable_vector<int> eachEffects_startIndex, eachEffects_particleCount;
So here, each effect instance is effectively a subregion of the position, velocity and lifetime pools. You've now set a budget for your particle systems that cannot be exceeded mid-game, so you won't risk running out of video memory. A management system would be responsible for defragmenting the pools when particles die, and also for placing per-effect restrictions on spawning based on the CPU load.
Going back to the GPU side: those pools can, in your case, be packed into a single VBO, and the draw call for each effect can directly use the startIndex and particleCount from the "eachEffects" arrays. In my case, those fixed_vectors are channels in textures, and the startIndex/particleCount are used to share the textures between multiple instances of effects. Creating many different vertex buffers causes much the same thrashing problems that making many memory allocations in system memory does, not only on the GPU itself, but also in requiring the CPU to switch vertex buffers between draw calls.
Because I'm using textures (which are render targets) to store the data, I never have to upload the memory from CPU to GPU: zero memory bandwidth cost. I never need the individual particles' data back on the CPU either, so it's VRAM only, whereas a dynamic VBO might very well exist in system memory and VRAM simultaneously.