I just implemented a simple 2D point particle system with transform feedback, but I'm running into a performance problem. I've now implemented almost identical particle systems in 4 different ways and tested how many particles each can run at 60 FPS on my laptop with an i5-2410 dual core and an NVIDIA GTX 460M.
- Threaded CPU updating and uploading all particles every frame: 1 000 000 particles
- Pingponging data between data textures to update them with a shader: 2 200 000 particles
- Updating a VBO with OpenCL and rendering it with OpenGL: 2 200 000 particles
- Both updating and rendering particles at the same time with transform feedback: 1 200 000 particles
With transform feedback I get only a little over half the performance of the other two GPU implementations. It still has a huge advantage, though: both of the other GPU implementations performed horribly with a large particle capacity but a low number of alive particles. Without a single particle alive, they still ran at only around 200 FPS. The new one takes almost no time when there are no particles, since it really does nothing, so it's actually practical in games. The performance loss is a bit disappointing though...
The CPU implementation uploaded only 12 bytes per particle (float 2D position + 4 bytes of color). The other two GPU implementations used 24 bytes per particle (float position, float velocity, 4 bytes of color and 2 shorts for life time). However, with transform feedback I'm limited to 4-byte values like floats or integers, so I simply converted everything to floats (I ditched the alpha though), leading to 36 bytes of data per particle. This is 50% more than the other two GPU implementations used. Adding things like blending and point smoothing hardly affected FPS at all (it did for the other two), so I'm pretty sure the amount of data per particle is the main bottleneck at the moment.
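To make the per-particle sizes above easy to check, here is a small Python sketch that spells out the three layouts with the standard `struct` module. The exact field order and the variable names are assumptions for illustration; the post only gives the field types and totals (position and velocity are taken to be 2 floats each, which is what the 24-byte total implies).

```python
import struct

# Format codes: f = 4-byte float, B = unsigned byte, H = unsigned 2-byte short.
# "<" means little-endian with no padding, so sizes are the plain sums.
# Field order is an assumption; only the types and totals come from the post.
cpu_layout = "<2f4B"      # 2D position + 4 bytes of color
gpu_layout = "<2f2f4B2H"  # position, velocity, 4 bytes of color, 2 shorts of life time
tf_layout = "<2f2f3f2f"   # everything as floats: position, velocity, RGB, life time

cpu_size = struct.calcsize(cpu_layout)
gpu_size = struct.calcsize(gpu_layout)
tf_size = struct.calcsize(tf_layout)

print(cpu_size, gpu_size, tf_size)  # 12 24 36
print(tf_size / gpu_size)           # 1.5, i.e. 50% more data per particle
```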
Is there any way of telling transform feedback to write out bytes and/or shorts instead of only 4 byte values? Maybe there's some way of packing and unpacking my color and life time data into one float each instead of 3 + 2 floats? Even just packing the color data together would save me 8 bytes, getting it down to 28 bytes. I can live with the improved life time precision. =S
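One possible direction, sketched here in Python only to show the bit layout: since transform feedback accepts 4-byte integers as well as floats, the color and the two life time shorts could each be packed into a single unsigned 32-bit integer and unpacked with shifts and masks in the shader. The function names and the meaning of the two life time values (`current`, `total`) are hypothetical. Integer outputs are used here rather than floats on the assumption that reinterpreting arbitrary packed bits as a float is risky, because a GPU can alter denormal or NaN bit patterns.

```python
import struct

def pack_rgb(r, g, b):
    """Pack three 0..255 channels into the low 24 bits of one 32-bit value."""
    return (r & 0xFF) | ((g & 0xFF) << 8) | ((b & 0xFF) << 16)

def unpack_rgb(packed):
    """Recover the three 8-bit channels with shifts and masks."""
    return packed & 0xFF, (packed >> 8) & 0xFF, (packed >> 16) & 0xFF

def pack_life(current, total):
    """Pack two 0..65535 life time values into one 32-bit value."""
    return (current & 0xFFFF) | ((total & 0xFFFF) << 16)

def unpack_life(packed):
    """Recover the two 16-bit life time values."""
    return packed & 0xFFFF, (packed >> 16) & 0xFFFF

# Hypothetical packed layout: position, velocity, packed color, packed life time.
packed_layout = "<2f2f2I"
packed_size = struct.calcsize(packed_layout)

record = struct.pack(packed_layout, 1.0, 2.0, 0.5, -0.5,
                     pack_rgb(255, 128, 0), pack_life(30, 120))
x, y, vx, vy, color, life = struct.unpack(packed_layout, record)

print(packed_size)        # 24
print(unpack_rgb(color))  # (255, 128, 0)
print(unpack_life(life))  # (30, 120)
```

Under these assumptions the record is back at 24 bytes, the same as the other two GPU implementations, at the cost of a few shift and mask instructions per particle. The same operations exist on integer types in GLSL; whether the driver handles integer transform feedback outputs as efficiently as floats would need to be measured.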
Output data types for transform feedback?