optimize particle system
hello,
i implemented a very simple particle system for a visualization which draws textured quads and uses a rather simple shader. it is surely not the most optimized code (using std::vector as a container, calling a render function for every particle, ...), but it is also not CPU bound now. i profiled it.
i think the problem is that the quads are rather huge and blended all the time. so the fillrate is very high and the fragment shader is really busy.
i use immediate mode for drawing (glBegin(GL_TRIANGLES), glTexCoord2f(), glVertex3f() ...). but i wondered if it would be possible to use a VBO for storing vertices on the graphics card. would this even make sense since i have to update particle positions every frame?
i also tried GL point sprites, but they have limitations i can't live with, and they were not so fast after all.
pseudo-code:
transform whole particlesystem
enable shader (common for all particles)
enable texture (common for all particles)
for each particle:
{
update shader uniforms for each particle
transform each particle
render each particle (using immediate mode)
}
does anybody have an idea how i could optimize this?
thanks in advance!
Don't use a uniform per particle. If you need to change data that quickly, use attributes; uniforms are slow to change.
Yes, it makes perfect sense to store the particle data in a VBO, certainly if you are drawing a lot of them. You have to send the data to the card anyway, so sending over a large batch in one hit is much more preferable than sending one point at a time via the CPU and generally starving the GPU of data.
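As a sketch of what that single large batch could look like on the CPU side (all names here are illustrative, not from this thread): build the six vertices of every particle quad into one array, then upload it with one glBufferData call and draw it with one glDrawArrays, instead of a glBegin/glEnd pair per particle.

```cpp
#include <cstddef>
#include <vector>

// One vertex of a particle quad: position + texcoord, interleaved.
struct Vertex { float x, y, z, u, v; };

// Expand each particle centre (x,y,z triples in `centers`) into the two
// triangles (6 vertices) of a camera-facing quad. The resulting array is
// what would be handed to the driver in one go, e.g.
//   glBufferData(GL_ARRAY_BUFFER, out.size()*sizeof(Vertex), out.data(), GL_DYNAMIC_DRAW);
std::vector<Vertex> buildBatch(const std::vector<float>& centers, float halfSize)
{
    std::vector<Vertex> out;
    out.reserve(centers.size() / 3 * 6);
    for (std::size_t i = 0; i + 2 < centers.size(); i += 3) {
        const float cx = centers[i], cy = centers[i + 1], cz = centers[i + 2];
        // corner offsets and texcoords for the two triangles of a quad
        const float off[6][2] = { {-1,-1}, {1,-1}, {1,1},   {-1,-1}, {1,1}, {-1,1} };
        const float uv [6][2] = { { 0, 0}, {1, 0}, {1,1},   { 0, 0}, {1,1}, { 0,1} };
        for (int k = 0; k < 6; ++k)
            out.push_back({ cx + off[k][0] * halfSize,
                            cy + off[k][1] * halfSize,
                            cz,
                            uv[k][0], uv[k][1] });
    }
    return out;
}
```

the quads here are expanded on the CPU; the rotation-as-attribute idea discussed later moves that work into the vertex shader.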
hmm, but i need the changed data per particle in the fragment shader. isn't that what uniforms are for? attributes are just per primitive, or do you mean using attributes in the vertex shader and propagating the value as a varying to the fragment shader?
concerning the VBO:
i have a rotation and a position for each quad. how do i update them for every frame in the VBO or how do i calculate the resulting vertices in a shader?
and if i do so, aren't these permanent updates of the VBO buffer data counterproductive and just as inefficient as sending it over the bus the whole time?
thanks for your help!
most likely VBOs etc ain't gonna help, 'cause the bottleneck is the fillrate
*use compressed textures
*use smaller textures, e.g. 64x64 instead of 256x256 (for particles, unlike solid things in the world, quality can be lower)
*fade particles out when they get close to the camera (until you don't draw them at all). personally i don't draw particles if they cover > 20% of screen height
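The last point can be sketched as a small helper (a rough estimate assuming a symmetric perspective projection; the names and exact formula are mine, not from the thread):

```cpp
#include <cmath>

// Estimate a particle's on-screen height as a fraction of the viewport,
// for zedz's "don't draw if it covers > 20% of screen height" rule.
float screenHeightFraction(float worldRadius, float distToCamera, float fovYRadians)
{
    if (distToCamera <= 0.0f)
        return 1.0f;  // at or behind the camera: treat as covering everything
    // visible half-height of the world at that distance
    const float halfHeight = distToCamera * std::tan(fovYRadians * 0.5f);
    return worldRadius / halfHeight;  // (2*radius) / (2*halfHeight)
}

bool shouldDrawParticle(float fraction)
{
    return fraction < 0.20f;  // the 20%-of-screen-height threshold
}
```

in practice one would fade alpha toward zero as the fraction approaches the threshold rather than popping the particle out.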
Quote: Original post by ghostd0g
...but it is also not CPU bound now. i profiled it.

It's shocking. Anyway, even if it is not now, setting uniform values is one of the most expensive things to do: don't do it.

Quote: Original post by ghostd0g
i think the problem is that the quads are rather huge and blended all the time. so the fillrate is very high and the fragment shader is really busy.
i use immediate mode for drawing (glBegin(GL_TRIANGLES), glTexCoord2f(), glVertex3f() ...).

This would happen only when a lot of pixels get rendered. If the particle system is far in view, this wouldn't bottleneck anyway. I believe you're flooding your driver/hw with batches (or not profiling correctly). The batch problem isn't as bad in GL as it is in D3D, but if you plot hw performance against batch size you realize that sending less than 1k vertices per batch is a real sin on today's hw.
I have seen a few programs hitting less than 1Mtri/s when, on the same hardware, shipping games hit 8-10Mtri/s. A program I am using (which sends more than 20k vertices per batch) clocks in at about 18Mtri/s.
GPUs are like internal combustion engines: high RPM = high torque. Low RPM = much less fun.

Quote: Original post by ghostd0g
would this even make sense since i have to update particle positions every frame?

Updating at half framerate and interpolating/extrapolating may also cut it.

Quote: Original post by ghostd0g
pseudo-code:
transform whole particlesystem

What does this mean?

EDIT: the two previous messages were written while I was writing this one, so some additions follow.

Quote: Original post by ghostd0g
hmm, but i need the changed data per particle in the fragment shader. isn't that what uniforms are for?

Not really. If you need to change this on a per-instance basis then you should find a way to route this info into the VS/PS. This isn't actually too difficult when dealing with disjoint triangles.

Quote: Original post by ghostd0g
i have a rotation and a position for each quad. how do i update them for every frame in the VBO, or how do i calculate the resulting vertices in a shader?

For example, by having another VBO with a per-vertex ubyte which is an angle. This would be more than enough to represent a screen space rotation.

Quote: Original post by ghostd0g
and if i do so, aren't these permanent updates of the VBO buffer data counterproductive and just as inefficient as sending it over the bus the whole time?

With respect to what?
Quote: Original post by zedz
most likely VBOs etc ain't gonna help, 'cause the bottleneck is the fillrate

Not necessarily; only when viewed up close. It really depends on how much fillrate is really going out.
Use VBOs. The VBO sends the vertex data to the VS, so the updated data will be there for the shaders. You update the positions with the CPU if you want, then call glBindBuffer() and glBufferSubData(), and now you have updated your VBO with new vertex data. Or you can calculate the positions in the VS with an R2VB method, with PBOs or vertex textures...
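On the CPU side, the round trip described above could look roughly like this (a sketch; the buffer handle, usage hint, and function name are assumptions, not code from this thread):

```cpp
#include <cstddef>
#include <vector>

// Per-frame CPU update of the particle positions. After this step the
// refreshed array is pushed into the *existing* buffer storage with
//     glBindBuffer(GL_ARRAY_BUFFER, vbo);
//     glBufferSubData(GL_ARRAY_BUFFER, 0,
//                     positions.size() * sizeof(float), positions.data());
// rather than recreating the VBO every frame. ("vbo" is assumed to have
// been created earlier with glBufferData and a GL_DYNAMIC_DRAW hint.)
void advanceParticles(std::vector<float>& positions,
                      const std::vector<float>& velocities, float dt)
{
    for (std::size_t i = 0; i < positions.size(); ++i)
        positions[i] += velocities[i] * dt;  // simple Euler integration step
}
```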
Quote: Original post by zedz
most likely VBOs etc ain't gonna help, 'cause the bottleneck is the fillrate
*use compressed textures
*use smaller textures, e.g. 64x64 instead of 256x256 (for particles, unlike solid things in the world, quality can be lower)
*fade particles out when they get close to the camera (until you don't draw them at all). personally i don't draw particles if they cover > 20% of screen height
how will compressed textures help? i already use 128x128 (one and the same texture for all particles). texture memory is not a problem, and compressed textures will just put another decompression workload on the GPU, i guess.
i can try going down to 64x64, but i think there will be a real visual problem (pixelation) because the particles have to be ~20-30% of the viewport size.
Quote:Original post by Krohm
.. setting uniform values is one of the most expensive things to do: don't do it.
ok, got that. but how do i avoid that if i need the updates in the fragment shader?
Quote:Original post by Krohm
This would happen only when a lot of pixels get rendered. If the particle system is far in view, this wouldn't bottleneck anyway. I believe you're flooding your driver/hw with batches (or not profiling correctly). The batch problem isn't as bad in GL as it is in D3D, but if you plot hw performance against batch size you realize that sending less than 1k vertices per batch is a real sin on today's hw.
I have seen a few programs hitting less than 1Mtri/s when, on the same hardware, shipping games hit 8-10Mtri/s. A program I am using (which sends more than 20k vertices per batch) clocks in at about 18Mtri/s.
GPUs are like internal combustion engines: high RPM = high torque. Low RPM = much less fun.
actually there are lots of pixels to be rendered because the big quads (20-30% of the viewport size) are all over the screen and they get blended too.
Quote: Original post by Krohm
Quote: Original post by ghostd0g
would this even make sense since i have to update particle positions every frame?

Updating at half framerate and interpolating/extrapolating may also cut it.
i'll investigate updating only at a lower framerate. but i do not understand how inter-/extrapolating would save some rendering power. or maybe i do not really understand what you mean by inter-/extrapolating.
Quote: Original post by Krohm
Quote: Original post by ghostd0g
pseudo-code:
transform whole particlesystem

What does this mean?
first transform the whole particlesystem (emitter position and area of validity), then transform the particles relative to the system.
Quote: Original post by Krohm
Quote: Original post by ghostd0g
hmm, but i need the changed data per particle in the fragment shader. isn't that what uniforms are for?

Not really. If you need to change this on a per-instance basis then you should find a way to route this info into the VS/PS. This isn't actually too difficult when dealing with disjoint triangles.
Quote: Original post by Krohm
Quote: Original post by ghostd0g
i have a rotation and a position for each quad. how do i update them for every frame in the VBO, or how do i calculate the resulting vertices in a shader?

For example, by having another VBO with a per-vertex ubyte which is an angle. This would be more than enough to represent a screen space rotation.
could you elaborate further on that topic or give me some more hints, please?
the way i am thinking of it is to have a VBO with vertex positions and texture coordinates. then i could send the rotations as attributes to the VS and calculate my new vertex positions (rotate them) in the shader. could this work, and is it a good way?
Quote: Original post by Krohm
Quote: Original post by ghostd0g
and if i do so, aren't these permanent updates of the VBO buffer data counterproductive and just as inefficient as sending it over the bus the whole time?

With respect to what?
i thought the real inefficiency of immediate mode comes from sending all the data over the bus every frame. so of course it is more effective when the vertex data stays in gfx memory. but when i use a dynamic buffer i want to update the positions/normals/whatever component, and therefore i have to send those updates over the bus again (inefficient again?!?). but this might just come from my lack of knowledge.
anyway, thank you very very much for your answers!
Quote:Original post by MARS_999
Use VBOs. The VBO sends the vertex data to the VS, so the updated data will be there for the shaders. You update the positions with the CPU if you want, then call glBindBuffer() and glBufferSubData(), and now you have updated your VBO with new vertex data. Or you can calculate the positions in the VS with an R2VB method, with PBOs or vertex textures...
thanks MARS_999, i'll try and look into that.
If you have positions and texcoords in your VBO consider a serial layout. Normally you'd store the vertices like this (interleaved layout):
p.x1 p.y1 p.z1 t.u1 t.v1 | p.x2 p.y2 p.z2 t.u2 t.v2 | ...
A serial layout could look like this:
p.x1 p.y1 p.z1 p.x2 p.y2 p.z2 ... | t.u1 t.v1 t.u2 t.v2 ...
Using a serial layout you are able to reduce the cost of updating the VBO since you don't need to send all the data down the pipeline again but just the updated positions via glBufferSubData.
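A quick sketch of the byte arithmetic this layout implies (function names are illustrative): positions occupy one contiguous block at the start of the buffer, so only that block needs re-uploading each frame.

```cpp
#include <cstddef>

// Offsets for the serial layout: all positions (3 floats each) first,
// then all texcoords (2 floats each). Refreshing only the positions is
// one contiguous call:
//     glBufferSubData(GL_ARRAY_BUFFER, 0, positionBytes(n), positions);
// while the texcoord block at texcoordOffset(n) is uploaded once and
// never touched again.
std::size_t positionBytes(std::size_t vertexCount)  { return vertexCount * 3 * sizeof(float); }
std::size_t texcoordOffset(std::size_t vertexCount) { return positionBytes(vertexCount); }
std::size_t totalBytes(std::size_t vertexCount)     { return vertexCount * 5 * sizeof(float); }
```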