I have written particle systems using both OpenGL transform feedback and OpenCL. However, while they handle multiple emitters, attractors, and obstacles, they do not handle dynamically branching particle systems, where particles spawn other particles. I am trying to write an OpenCL particle system that can do this.
For example, consider a fire particle system. There are fire particles, and while they are alive, they emit smoke particles. One fire particle, over its lifetime, might create twenty smoke particles.
I am having trouble figuring out a good way to map this to the GPU. The particles need to be stored in memory somewhere, and this will most likely be analogous to a flat C array. Such data structures are not amenable to random insertions--and certainly not by many of their elements simultaneously.
I had the idea of making a large 2D array and storing "parent" particles in only one column. As these particles branched, their children could go into the same row, but a different column. With a 3D array, those children could branch in turn. This approach has two major disadvantages: particles could not split more than once or twice, and it would use far too much memory and processing power on average. This is because most of the cells aren't filled--a fire particle might only make ten particles, but for each fire particle, we'd need to allocate the worst case. Plus, it's inelegant.
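To put rough numbers on that cost, here is a minimal sketch in plain C (not OpenCL) of the fixed-stride indexing such a layout implies. The names `slot_index`, `slots_allocated`, and `occupancy` are made up for illustration, and the figures of 20 worst-case and 10 average children are just the example numbers from above.

```c
#include <stddef.h>

/* Hypothetical fixed-stride layout: each parent owns one row of
 * (max_children + 1) slots. Slot 0 is the parent, 1..max_children
 * are reserved for its children whether or not they ever exist. */
static size_t slot_index(size_t parent, size_t child, size_t max_children) {
    return parent * (max_children + 1) + child;
}

/* Total slots when branching is allowed `levels` deep.
 * levels = 1 is the 2D array, levels = 2 is the 3D array, and so on:
 * every extra level multiplies the allocation by (max_children + 1). */
static size_t slots_allocated(size_t parents, size_t max_children, unsigned levels) {
    size_t total = parents;
    for (unsigned i = 0; i < levels; ++i) {
        total *= max_children + 1;
    }
    return total;
}

/* Fraction of a row that is actually in use for the 2D case,
 * given the average number of children a parent really spawns. */
static double occupancy(size_t max_children, double avg_children) {
    return (1.0 + avg_children) / (double)(max_children + 1);
}
```

With 1000 fire particles and a worst case of 20 children each, one level of branching reserves 21,000 slots and two levels reserve 441,000, which is why this only seems workable for one or two splits.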
One could also try a 1D array and append new particles to it atomically. This has the disadvantage of serializing additions, but the advantage of simplicity and efficiency--particularly if you know roughly how many particles you'll have, but not necessarily where they'll all come from. I'm not sure how to make a lock in an OpenCL kernel, though (mostly because the threads in a warp execute in lockstep, so a spinning thread can stall the one holding the lock).
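One variant of the 1D idea that may not need a lock at all is to reserve slots with a single atomic add on a shared counter, and then write into the reserved range without further synchronization. Below is a minimal host-side sketch in C11, under the assumption that the same pattern carries over to a kernel; in OpenCL 1.1 and later the corresponding built-in would be `atomic_add` on a `__global` counter. The `Particle` fields, `ParticlePool`, and all function names are hypothetical, and the capacity is tiny on purpose.

```c
#include <stdatomic.h>

#define POOL_CAPACITY 8u  /* deliberately small so overflow is easy to hit */

typedef struct {
    float pos[3];
    float age;
    int   kind;           /* hypothetical tag: 0 = fire, 1 = smoke */
} Particle;

typedef struct {
    Particle    items[POOL_CAPACITY];
    atomic_uint next;     /* next free index; may overshoot POOL_CAPACITY */
} ParticlePool;

static void pool_init(ParticlePool *pool) {
    atomic_init(&pool->next, 0u);
}

/* Reserve up to n contiguous slots with one atomic add.
 * Stores the first reserved index in *first and returns how many slots
 * were actually granted (fewer than n, possibly 0, when the pool is
 * nearly full). The counter is never rolled back, so every index below
 * min(next, POOL_CAPACITY) belongs to exactly one caller. This sketch
 * assumes the counter is reset before it could wrap around. */
static unsigned pool_reserve(ParticlePool *pool, unsigned n, unsigned *first) {
    unsigned start = atomic_fetch_add(&pool->next, n);
    if (start >= POOL_CAPACITY) {
        *first = POOL_CAPACITY;
        return 0u;
    }
    unsigned room = POOL_CAPACITY - start;
    *first = start;
    return n < room ? n : room;
}

/* Number of slots that have been handed out so far. */
static unsigned pool_live_count(ParticlePool *pool) {
    unsigned n = atomic_load(&pool->next);
    return n < POOL_CAPACITY ? n : POOL_CAPACITY;
}

/* What a fire particle might do when it branches: reserve space, then
 * initialize its children from its own state. Returns how many children
 * were actually created. */
static unsigned emit_smoke(ParticlePool *pool, const Particle *parent, unsigned n) {
    unsigned first;
    unsigned granted = pool_reserve(pool, n, &first);
    for (unsigned i = 0; i < granted; ++i) {
        Particle *child = &pool->items[first + i];
        *child = *parent;   /* inherit position */
        child->age  = 0.0f;
        child->kind = 1;
    }
    return granted;
}
```

The additions are still serialized at the counter, but each particle pays for one atomic operation per burst of children rather than one per child, and no thread ever waits on another. The sketch only covers appending; reclaiming the slots of dead particles is a separate problem it does not address.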
I have been playing around with some other ideas--binary trees, semi-mipmapping-type schemes, linked lists--but I haven't hit on anything that seems obviously like the Right Thing. I realize this is open-ended (deliberately so), but does anyone have any good ideas for algorithms that handle dynamic branching in a GPU particle system?