Improving particle system

Started by
8 comments, last by 21st Century Moose 13 years ago
Hi,

I just came across a problem with my particle system, which I'm currently using to create weather effects like snow. Because my particles use transparent textures, as expected, I have to order them manually from back to front. I got that working, but my speed has dropped heavily. Under normal circumstances I can display about 100,000 particles and the framerate stays at least acceptable (without any culling or similar) at around 30-45 FPS. Since I added sorting, I can't even have 10,000 particles (a decrease by a factor of 10) without the game getting unplayably slow (~10 FPS). I do know why: I'm using a std::map<float, CParticle*> object to store my particles by their distance to the camera and draw them in reverse order. As far as I know, maps are somewhat slow when they hold huge numbers of objects, and 100,000 isn't a small number at all. So my question is: Is there any way to optimize my sorting? Any alternative to using maps? Can I use a vertex/pixel shader for that? I just added shader support, so I can basically use them, but I'm not very experienced, so if there is an option, can you explain to me how to do it? This is my rendering/sorting method:

void CEmitter::Render(void)
{
    m_lpDevice->SetRenderState(D3DRS_LIGHTING, FALSE);
    m_lpDevice->SetVertexShader(NULL);
    m_lpDevice->SetPixelShader(NULL);

    // Particles keyed by distance to the camera; two particles at exactly
    // the same distance share a key, so the later one replaces the earlier.
    map<float, CParticle*> Particles;
    D3DXVECTOR3 Pos = m_Camera->GetPosition();
    D3DXVECTOR3 Dist;
    float Lenght;
    for(int i = 0; i != m_Data.Rate; i++)
    {
        Dist = Pos - m_Particles[i].GetPosition();
        Lenght = D3DXVec3Length(&Dist);
        Particles[Lenght] = &m_Particles[i];
    }

    // Walk the map in reverse to draw from farthest to nearest.
    for(map<float, CParticle*>::reverse_iterator ii = Particles.rbegin(); ii != Particles.rend(); ++ii)
    {
        ii->second->Render();
        ii->second->Update();
    }
    m_lpDevice->SetRenderState(D3DRS_LIGHTING, TRUE);
}


m_Data.Rate is the number of particles. Anyone got a solution here?
For starters, try a std::vector< CParticle * >, sorted with std::sort() using a custom comparison functor that expresses the distance to camera.
std::sort() isn't optimal for this (something like Radix sort is what people usually suggest), but it should be faster than inserting repeatedly into a map and iterating over a vector should be faster than iterating over a map.
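
A minimal sketch of that suggestion, assuming a simplified `Particle` struct in place of the thread's `CParticle` (the struct, the functor name and `SortBackToFront` are all hypothetical names used only for illustration). The functor stores the camera position and orders pointers from farthest to nearest:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the thread's CParticle; only a position is needed here.
struct Particle {
    float x, y, z;
};

// Comparison functor: "a comes before b" when a is farther from the camera than b.
struct FartherFromCamera {
    float cx, cy, cz; // camera position

    float DistanceSq(const Particle* p) const {
        assert(p != nullptr);
        const float dx = p->x - cx, dy = p->y - cy, dz = p->z - cz;
        return dx * dx + dy * dy + dz * dz;
    }

    bool operator()(const Particle* a, const Particle* b) const {
        return DistanceSq(a) > DistanceSq(b);
    }
};

// Sorts the pointer array in place so that iterating from begin() to end()
// visits particles back to front.
inline void SortBackToFront(std::vector<Particle*>& particles,
                            float cx, float cy, float cz) {
    const FartherFromCamera cmp = {cx, cy, cz};
    std::sort(particles.begin(), particles.end(), cmp);
}
```

Only the pointers move during the sort, and a forward loop over the vector then replaces the reverse iteration over the map.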

Another thing you may want to consider is creating bounding boxes for your emitters. Then sort your emitters by distance, and only sort particles per emitter, unless two or more emitters are intersecting. Then, instead of actually sorting the particles, give each emitter a layer that it goes into, so all "smoke" emitters are in one layer, while all fire emitters are in another. That way, most intersecting emitters will sort to different layers.
You might want to look into pre-multiplied alpha to get around your sorting problem.

"Optimizing the rendering of a particle system" and "A faster alpha-blended particle method?" might be useful references as well (others can probably be found around Le Internets).
What's going on inside of "ii->second->Render();"? If each particle you draw has its own DrawPrimitive (or equivalent) call, then this is going to run at nowhere near its optimal performance (in particular it will be quite CPU-heavy) and you will need to start batching those draw calls together.

Some more tips (in addition to sorting per-emitter, which is very much the right thing to do).

Your "D3DXVECTOR3 Pos = m_Camera->GetPosition();" and "Dist = Pos - m_Particles[i].GetPosition();" calls - are you doing sqrt calls in there? You don't need to; just leave everything squared and the comparison will still be valid. Likewise your D3DXVec3Length call is certainly calling an sqrt; avoiding this means switching to a different container however.
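
To illustrate why the squared values are enough: distances are never negative, so comparing squared lengths gives the same ordering as comparing lengths. The sketch below uses a hypothetical `Vec3` struct rather than D3DXVECTOR3 so that it compiles without the DirectX SDK; with D3DX, D3DXVec3LengthSq plays the role of `LengthSq`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type, standing in for D3DXVECTOR3.
struct Vec3 {
    float x, y, z;
};

// Squared length: three multiplies and two adds, no square root.
inline float LengthSq(const Vec3& v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// True length, shown only for comparison with the squared version.
inline float Length(const Vec3& v) {
    return std::sqrt(LengthSq(v));
}

// Depth comparison that never calls sqrt.
inline bool IsFarther(const Vec3& a, const Vec3& b) {
    return LengthSq(a) > LengthSq(b);
}
```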

Using hardware instancing can be useful enough for particles, and can reduce your vertex submission to 25% of what it would otherwise be. Definitely worth exploring.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Another option is to check whether you care if your particles are sorted; if your particles are "hard edged" like (say) snowflakes, you might be able to get away with a scissor test and thus avoid painting translucent pixels, and so avoid needing to sort.

If you can turn your particles into either additive or subtractive colours (flames can be the former, dark smoke is the latter) then again you can get away without sorting.
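
The reason additive particles need no sorting is that addition is order independent, while conventional alpha blending is not. Here is a small CPU-side sketch of the two blend equations; the `Color` struct and function names are made up for illustration, and clamping to the render target's range is not modelled:

```cpp
#include <cassert>

struct Color {
    float r, g, b;
};

// Additive blend: result = dst + src.
inline Color BlendAdditive(const Color& dst, const Color& src) {
    return Color{dst.r + src.r, dst.g + src.g, dst.b + src.b};
}

// Conventional alpha blend: result = src * alpha + dst * (1 - alpha).
inline Color BlendAlpha(const Color& dst, const Color& src, float alpha) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    const float inv = 1.0f - alpha;
    return Color{src.r * alpha + dst.r * inv,
                 src.g * alpha + dst.g * inv,
                 src.b * alpha + dst.b * inv};
}

inline bool SameColor(const Color& a, const Color& b) {
    return a.r == b.r && a.g == b.g && a.b == b.b;
}
```

Drawing a red and a green particle over black gives the same pixel in either order with `BlendAdditive`, but two different pixels with `BlendAlpha` at 50% opacity.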

If your particles don't move very far in any given frame, consider not sorting them all each time -- you could sort 1/10th of the array each time and still be sorting each particle more than once a second.
You may be able to get away with sorting more distant particles less often, concentrate on the nearby ones.

If your set is already almost ordered, generic sorts often don't perform well -- consider the bubble sort. (No, really. It performs well on nearly ordered sets. Also; you can halt it at any point in the sort. This makes it easy to allocate a "sorting budget" and stop when it's spent).
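
A rough sketch of such a budgeted bubble sort, working on a plain array of distance keys (farthest first). The function name and the choice of "number of passes" as the budget unit are assumptions made for this example; a time-based budget would work the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Runs bubble-sort passes over `keys` in descending order (farthest first)
// until a pass makes no swaps or `maxPasses` passes have been spent.
// Returns true only if a pass confirmed that the array is fully sorted.
inline bool BubbleSortBudgeted(std::vector<float>& keys, std::size_t maxPasses) {
    for (std::size_t pass = 0; pass < maxPasses; ++pass) {
        bool swapped = false;
        for (std::size_t i = 1; i < keys.size(); ++i) {
            if (keys[i - 1] < keys[i]) {
                std::swap(keys[i - 1], keys[i]);
                swapped = true;
            }
        }
        if (!swapped) {
            return true; // nothing moved: already in order
        }
    }
    return false; // budget spent; array is at least closer to sorted
}
```

On a nearly ordered array a pass or two is usually enough, and stopping early still leaves the data in a usable, partially improved order for this frame.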

Set things up so you sort your indices and not the actual vertex data -- the index sets will be much smaller.
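
One way to read this tip, sketched under the assumption that each particle has a precomputed squared distance: keep the particle data where it is and sort a small array of 16-bit indices that refer to it. `BuildDrawOrder` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Returns particle indices ordered farthest first, leaving the particle
// data itself untouched. `distSq[i]` is the squared camera distance of
// particle i.
inline std::vector<unsigned short> BuildDrawOrder(const std::vector<float>& distSq) {
    assert(distSq.size() <= 65536); // indices must fit in 16 bits
    std::vector<unsigned short> order(distSq.size());
    std::iota(order.begin(), order.end(), static_cast<unsigned short>(0));
    std::sort(order.begin(), order.end(),
              [&distSq](unsigned short a, unsigned short b) {
                  return distSq[a] > distSq[b];
              });
    return order;
}
```

Each swap during the sort then moves two bytes instead of a whole particle or a set of vertices.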

Have two buffers -- one can be sorted on the CPU while the GPU is drawing the other.

Rather than culling particles during your physics session (meaning you need to dynamically shuffle the previous frame's sorted data), arrange your sort to pull them to the front of your queue. You just start drawing later into that list. New additions can use that space before you need to start extending the array.
[font="Arial"]Wow, a whole lot of replies & ideas! Thanks already, I'm going to go into detail on some of them.

[quote='KulSeran']For starters, try a std::vector< CParticle * >, sorted with std::sort() using a custom comparison functor that expresses the distance to camera.
std::sort() isn't optimal for this (something like Radix sort is what people usually suggest), but it should be faster than inserting repeatedly into a map and iterating over a vector should be faster than iterating over a map.[/quote]

I tried it out, but all I get is an error when trying to use a sort function:

bool CEmitter::Compare(CParticle* a, CParticle* b)
{
    return a->GetDistance()<b->GetDistance();
}

void CEmitter::Render(void)
{

    m_lpDevice->SetRenderState(D3DRS_LIGHTING, FALSE);
    m_lpDevice->SetVertexShader(NULL);
    m_lpDevice->SetPixelShader(NULL);

    sort(m_Particles.begin(), m_Particles.end(), &CEmitter::Compare);

    ...
}

The error is in the file algorithm - error C2064 - something about a function that can't take 2 arguments. What's wrong here? If it is better, I'd rather go with radix sort eventually, if it is somewhat manageable to include.
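
A likely cause of C2064 here is that Compare is a non-static member function: std::sort calls its comparator as comp(a, b), and a pointer to a non-static member cannot be called that way because it also needs an object. The sketch below shows one possible fix, making the comparison static; the class bodies are simplified stand-ins and not the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for the thread's particle class.
class CParticle {
public:
    explicit CParticle(float distance) : m_Distance(distance) {}
    float GetDistance() const { return m_Distance; }
private:
    float m_Distance;
};

class CEmitter {
public:
    // static: no hidden `this` parameter, so &CEmitter::Compare is an
    // ordinary function pointer that std::sort can call as Compare(a, b).
    static bool Compare(const CParticle* a, const CParticle* b) {
        return a->GetDistance() < b->GetDistance();
    }

    void SortParticles() {
        std::sort(m_Particles.begin(), m_Particles.end(), &CEmitter::Compare);
    }

    std::vector<CParticle*> m_Particles;
};
```

A free function or a small functor object would work equally well; the only requirement is that the comparator can be invoked with exactly two arguments.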

[quote='KulSeran']Other things you may want to consider, is creating bounding boxes for your emitters. Then sort your emitters by distance, and only sort particles per-emitter, unless two or more emitters are intersecting. Then, instead of actually sorting the particles, give each emitter a layer that it goes into, so all "smoke" emitters are in one layer, while all fire emitters are in another. That way, most intersecting emitters will sort to different layers. [/quote]

Yep, that's actually how I'm doing it. Each CEmitter instance represents one emitter, and sorting happens per emitter. Bounding boxes are there, but I don't use them for intersection tests right now.

[quote='phantom']
You might want to look into pre-multiplied alpha to get around your sorting problem.[/quote]

Thanks, I read through your links and googled a bit. From what I understand this would be just what I was looking for, except for one problem: I don't really know how to implement it. I know that the color values have to be multiplied by the alpha value, but what's next? I found a reference saying that I'd have to set these two parameters:

RenderState.SourceBlend = Blend.One; RenderState.DestinationBlend = Blend.InverseSourceAlpha;

But where exactly do I do that in DirectX 9? Anyway, I'm working towards using shaders for everything, so it would be nice to have a reference for how to do such things in a shader, as all sources on premultiplied alpha blending seem to only handle the theory (which is nice, but I'm pretty new to HLSL, so it's impossible for me to figure such things out on my own). Any tutorials/explanations, please?
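
In Direct3D 9 those two parameters correspond to the D3DRS_SRCBLEND and D3DRS_DESTBLEND render states (set to D3DBLEND_ONE and D3DBLEND_INVSRCALPHA through SetRenderState, with D3DRS_ALPHABLENDENABLE turned on). What the hardware then computes can be sketched on the CPU; the structs and function names below are hypothetical and exist only to show the arithmetic:

```cpp
#include <cassert>

struct Rgba { float r, g, b, a; };
struct Rgb  { float r, g, b; };

// Multiplies the color channels by alpha, as a premultiplied texture
// (or a pixel shader) would.
inline Rgba Premultiply(const Rgba& c) {
    return Rgba{c.r * c.a, c.g * c.a, c.b * c.a, c.a};
}

// Conventional blending: source factor = src alpha, dest factor = 1 - src alpha.
inline Rgb BlendStraight(const Rgba& src, const Rgb& dst) {
    const float inv = 1.0f - src.a;
    return Rgb{src.r * src.a + dst.r * inv,
               src.g * src.a + dst.g * inv,
               src.b * src.a + dst.b * inv};
}

// Premultiplied blending: source factor = 1, dest factor = 1 - src alpha.
// `src` is expected to be premultiplied already.
inline Rgb BlendPremultiplied(const Rgba& src, const Rgb& dst) {
    const float inv = 1.0f - src.a;
    return Rgb{src.r + dst.r * inv,
               src.g + dst.g * inv,
               src.b + dst.b * inv};
}
```

For an ordinary translucent texel both paths give the same pixel. The difference is that a premultiplied texel with alpha 0 and non-zero color is simply added to the destination, so one blend state can cover both additive and conventional particles.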
[quote='mhagain']
What's going on inside of "ii->second->Render();"? If each particle you draw has its own DrawPrimitive (or equivalent) call, then this is going to run at nowhere near its optimal performance (in particular it will be quite CPU-heavy) and you will need to start batching those draw calls together.
[/quote]

Yes, I know this isn't quite optimal. I'd really like to change it, but what should I do if all my particles are meshes created from .x files? Should I create my own vertex and index buffers, read the one square consisting of 4 vertices into the vertex buffer, and then repeatedly load the indices into the index buffer until all my particles are in it? I use a constant number of particles for each emitter, so I shouldn't have a problem with regularly locking/unlocking the buffers, but I'd like to know if that is the best technique. If it is, I've already found a good document on MSDN explaining how to render multiple geometry in one draw call.

[quote='mhagain']Your "D3DXVECTOR3 Pos = m_Camera->GetPosition();" and "Dist = Pos - m_Particles[i].GetPosition();" calls - are you doing sqrt calls in there? You don't need to; just leave everything squared and the comparison will still be valid. Likewise your D3DXVec3Length call is certainly calling an sqrt; avoiding this means switching to a different container however.[/quote]

Yes, I calculate the length. Didn't realize that I don't have to *facepalm*. Thanks!

[/font][quote='Katie']Another option is to check whether you care if your particles are sorted; if your particles are "hard edged" like (say) snowflakes, you might be able to get away with a scissor test and thus avoiding painting translucent pixels. And thus avoid needing to sort.

If you can turn your particles into either additive or subtractive colours (flames can be the former, dark smoke is the latter) then again you can get away without sorting.

If your particles don't move very far in any given frame, consider not sorting them all each time -- you could sort 1/10th of the array each time and still be sorting each particle more than once a second.[/quote]
Hm, all of my particles are currently snowflakes, but they aren't really hard-edged, so a scissor test isn't getting me far. I like the idea of additive and subtractive colors for smoke and fire, but I don't really know how well this works out, as I'm not yet sure how realistic my graphics should be. I think real particles still give a better feel.
Sorting only part of the array won't work in general, as I want to create effects like heavy snowstorms, where the particles move very far each frame. But for slower emitters I can try it out!

[quote='Katie']If your set is already almost ordered, generic sorts often don't perform well -- consider the bubble sort. (No, really. It performs well on nearly ordered sets. Also; you can halt it at any point in the sort. This makes it easy to allocate a "sorting budget" and stop when it's spent).[/quote]

Yeah, I'll try bubble or radix sort next if I can't get premultiplied alpha to work. I hope someone can explain that to me. Anyway, I'm aiming for precise and mostly flawless graphics with as few artifacts as possible, so leaving some particles unsorted is an option for me, but not one I'd like to take if I can avoid it.
[quote='Katie']Set things up so you sort your indices and not the actual vertex data -- the index sets will be much smaller.
[/quote]

Actually I'm sorting neither index nor vertex data - I'm sorting instances of my custom CParticle class, each of which currently holds a reference to the same mesh. If I used batched draw calls, I would just sort the positions of the particles, not the vertices themselves.

Thanks for all the comments and suggestions, if someone could explain how to use premultiplied alpha any further, I'd be really glad though!



[quote]Yes, I know this isn't quite optimal. I'd really like to change it, but what should I do if all my particles are meshes created from .x files? Should I create my own vertex and index buffers, read the one square consisting of 4 vertices into the vertex buffer, and then repeatedly load the indices into the index buffer until all my particles are in it? I use a constant number of particles for each emitter, so I shouldn't have a problem with regularly locking/unlocking the buffers, but I'd like to know if that is the best technique. If it is, I've already found a good document on MSDN explaining how to render multiple geometry in one draw call.[/quote]


If they're all view-facing billboards then the vertexes and indexes for them are already well known; 4 vertexes per particle and indexes most likely go 0,1,2,0,2,3.
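
For the non-instanced, batched case, that index pattern just repeats with an offset of 4 vertices per particle. A short sketch, assuming 16-bit indices and a hypothetical helper name `BuildQuadIndices`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the index list for `quadCount` billboards stored back to back in
// one vertex buffer (4 vertices each), using the 0,1,2,0,2,3 pattern.
inline std::vector<unsigned short> BuildQuadIndices(std::size_t quadCount) {
    assert(quadCount * 4 <= 65536); // vertex indices must fit in 16 bits
    std::vector<unsigned short> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const unsigned short base = static_cast<unsigned short>(q * 4);
        indices.push_back(base + 0);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        indices.push_back(base + 0);
        indices.push_back(base + 2);
        indices.push_back(base + 3);
    }
    return indices;
}
```

Because the particle count per emitter is constant, this list only has to be generated and uploaded once; only the vertex positions change per frame.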

With hardware instancing that's all that you actually need, a vertex buffer and index buffer for a single particle; then set up a second larger dynamic vertex buffer containing your per-instance data which should be relatively lightweight - don't bother with matrixes, all you need is a position and some uniforms so that you can rotate each vertex into the view plane in your vertex shader (you can extract the relevant data for the uniforms from your view matrix). Colour as well might be per-instance, as well as perhaps a scaling factor. Boom, you're off (yes, I know I'm making it sound simpler than it is, but when you "get it" once you'll be shocked at how easy it was).

If they're more complex meshes then things become slightly more tricksy, but nothing insurmountable. Dropping the mesh format, copying the vertex and index data from it to system memory and writing that out to dynamic buffers at runtime might be one approach.


[quote="mhagain"]
If they're all view-facing billboards then the vertexes and indexes for them are already well known; 4 vertexes per particle and indexes most likely go 0,1,2,0,2,3.

With hardware instancing that's all that you actually need, a vertex buffer and index buffer for a single particle; then set up a second larger dynamic vertex buffer containing your per-instance data which should be relatively lightweight - don't bother with matrixes, all you need is a position and some uniforms so that you can rotate each vertex into the view plane in your vertex shader (you can extract the relevant data for the uniforms from your view matrix). Colour as well might be per-instance, as well as perhaps a scaling factor. Boom, you're off (yes, I know I'm making it sound simpler than it is, but when you "get it" once you'll be shocked at how easy it was). [/quote]

Yes, they are all view-facing 4-vertex billboards. I just realized I didn't really need any material from the .x file at all, which makes things easier. I don't really need a dynamic buffer if I don't change the number of particles after initializing the emitter, do I? I use a kind of particle system where particles are simply moved back to the origin if they either leave the bounding box or reach the time limit, so I neither create nor delete any particles after the emitter is set up.

Isn't it the best idea to just pass the world-view-projection matrix to the shader? That's the way most tutorials do it, and all my vertex shaders currently use such a float4x4 matrix. Should I change something here?

Well, I'm making progress, as I finally managed to get premultiplied alpha partially working. I just set the render states:

m_lpDevice->SetRenderState(D3DRS_SRCBLEND, D3DBLEND_ONE);
m_lpDevice->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA); // corrected; originally posted as D3DBLEND_INVDESTALPHA


Then I wrote a pixel shader that applies premultiplied alpha:

// Pixel shader input structure
struct PS_INPUT
{
    float4 Position : POSITION;
    float2 Texture  : TEXCOORD0;
};


// Pixel shader output structure
struct PS_OUTPUT
{
    float4 Color : COLOR0;
};


// Global variables
sampler2D Tex0;

PS_OUTPUT ps_main( in PS_INPUT In )
{
    PS_OUTPUT Out; //create an output pixel

    Out.Color = tex2D(Tex0, In.Texture);
    Out.Color.rgb *= Out.Color.a; //premultiply the color by its alpha

    return Out; //return output pixel
}


[s]Works fine so far, only one problem: the vertices underneath the alpha texture are all drawn, so there's no real alpha here. The problem certainly lies in the fact that sampler states aren't used anymore. I basically know how to use a sampler in a pixel shader thanks to MSDN. I create a global like this:

sampler2D s = sampler_state {//??};
float4 Color (float2 tex : TEXCOORD0)
{
    return tex2D(s, tex);
}

So which states from this page http://msdn.microsof...v=vs.85%29.aspx belong inside the sampler state? Is this even the right approach? I'd really be glad if someone told me, I'm still learning how pixel shaders and texture samplers work.[/s]

Dang, I'm stupid. I didn't realize I wrote D3DRS_DESTBLEND, D3DBLEND_INVDESTALPHA instead of D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA. The vertices weren't drawn at all; it was just artifacts caused by the wrong alpha value. Now it works perfectly fine, without any sorting. Well, after I spent 15 minutes figuring out that I need to deactivate the z-buffer and activate stencil before drawing these particles. Holy ****, this is amazing! Now I can skip all the sorting algorithms, so when I additionally batch the particles I should get an incredible speed. Thanks for all the help!
OK, here's the link I was looking for: http://zeuxcg.blogspot.com/2007/09/particle-rendering-revisited.html

Most instancing samples you see will include a matrix as per-instance data, but if you think about it, a matrix per instance is going to be 4x4 floats, which is a substantial percentage of the size of a non-instanced full particle. That's going to completely wipe out any potential benefit from using instancing in this case; it's clearly not good enough. For billboarding all that you really need are your right and up vectors, which you can extract from your view matrix (right is at _11, _21 and _31; up at _12, _22 and _32). These are global uniforms so just send them once per frame, and what they're used for is orienting the billboard so that it faces the camera. Your standard world * view * proj is also used of course. Then all you need per instance is the position of each particle, and use the corner_position formula given in the article linked above (right is camera_axis_x, up is camera_axis_y) to calculate the final correct vertex position, then mul it with your world * view * proj to get the transformed output position.
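
The corner calculation can be sketched on the CPU as follows. This is a commonly used form of the billboard expansion and is assumed here rather than copied from the linked article; `Vec2`, `Vec3`, `BillboardCorner` and the `halfSize` parameter are hypothetical names. In a real vertex shader the same arithmetic would run per vertex, followed by the world * view * proj transform.

```cpp
#include <cassert>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Expands one billboard corner around the particle center.
// `right` and `up` are the camera axes (sent once per frame),
// `corner` is (-1,-1), (1,-1), (1,1) or (-1,1) and comes from the
// shared 4-vertex quad, `center` is the per-instance position.
inline Vec3 BillboardCorner(const Vec3& center, const Vec3& right,
                            const Vec3& up, const Vec2& corner,
                            float halfSize) {
    assert(halfSize >= 0.0f);
    const float sx = corner.x * halfSize;
    const float sy = corner.y * halfSize;
    return Vec3{center.x + right.x * sx + up.x * sy,
                center.y + right.y * sx + up.y * sy,
                center.z + right.z * sx + up.z * sy};
}
```

With this layout the per-instance data is a single position (12 bytes, plus optional color and size) instead of a 64-byte matrix.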


