Particle Systems Full GPU

Started by
2 comments, last by Tangletail 8 years, 4 months ago

Hi,

I think CPU particle systems make little sense nowadays, because it's possible to do a fully GPU-based particle system.

For mesh particles, instancing can be used; for the general case, a geometry shader can be used.

The CPU version uses classes inherited from Affector, each of which provides one function that updates each particle.
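For reference, a minimal sketch of the CPU-side pattern described above. All of the names here (Affector, GravityAffector, simulate, etc.) are hypothetical, just to illustrate the "one virtual update function per affector" design, not Ogre's or any engine's actual API:

```cpp
#include <vector>

// Hypothetical minimal particle state.
struct Particle {
    float x = 0.0f, y = 0.0f;    // position
    float vx = 0.0f, vy = 0.0f;  // velocity
};

// Base class: each affector overrides one per-particle update function.
struct Affector {
    virtual ~Affector() = default;
    virtual void update(Particle& p, float dt) const = 0;
};

// Example affector: applies gravity to the particle's velocity.
struct GravityAffector : Affector {
    float g = -9.8f;
    void update(Particle& p, float dt) const override { p.vy += g * dt; }
};

// Example affector: integrates velocity into position.
struct IntegrateAffector : Affector {
    void update(Particle& p, float dt) const override {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
};

// The CPU simulation loop: every affector is run over every particle.
void simulate(std::vector<Particle>& particles,
              const std::vector<const Affector*>& affectors, float dt) {
    for (Particle& p : particles)
        for (const Affector* a : affectors)
            a->update(p, dt);
}
```

The question below is essentially whether each of these `update` calls should become its own compute-shader dispatch on the GPU.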

Is it good to call one compute shader for each affector?

Thanks


I'm... not entirely sure what this question is asking, so I'll just answer as many of the things I think it might be asking as possible. The first point isn't really answering a question, just adding a note to a consideration. (Probably due to it being 4 in the morning.)

There are still reasons to use a CPU-based particle system over a GPU one.

Mostly because you get more control with a CPU-based system, without needing to wait for resources to transfer back and forth between the CPU and GPU. Remember that deferring calls tends to force the system to wait for callbacks. In a threaded system, this means you will have a thread that randomly stalls and has to wait for the GPU to finish whatever it's doing, then compute, then send the results back.

With CPUs you can control your particles via AI, boids, complex particle properties that tell each particle to do something very specific, and so on.

GPUs are also significantly slower than CPUs per core, on both simple and complex tasks. A GPU may have 512-1256 cores, but they run at a significantly lower clock. If you have a program that controls only 400-800 particles and uploads them to the GPU every frame, you're only hurting yourself. It takes time to cross from the CPU, over the bus to the GPU's card, to be handled there, and then to get a callback. So while the GPU's many cores could theoretically process this faster, the round trip still takes significantly more time than an update that takes the CPU barely a fraction of a microsecond.

A GPU is better when you need a massive number of particles, enough to justify the transfer time, and when all of the transformations are purely procedural math. Even then, there is a cost in draw time and update time.

This is also why games only push aesthetic physics to the GPU, rather than moving the entire physics system there: particles, wood chips, denting, cloth, anything that is not game-changing, and for larger objects and subsystems only after the CPU-side data systems have determined that the two are "colliding". Particle collisions themselves are only approximated against the world's texture space.


Is it good to call one compute shader for each affector?

Typically... probably not. It's expensive for the GPU to constantly swap shaders, even if they operate on the same data. I took a look at Unreal's code some time ago to figure out how they do particles.

Interestingly, it's faster for the CPU to apply multiple affectors to a particle's properties than it is for the GPU to do the same. There was also a very good reason why a good number of the affectors are available only for GPU-based or only for CPU-based particles, possibly due to how the two systems were designed. A GPU can only do one thing at a time and requires all data to be present; a CPU is perfectly capable of doing multiple things at once, and of holding something back if it doesn't have what it needs.

For CPUs, the data is instantiated when it's needed, and held onto. The kernel always exists. If I remember correctly, GPUs tend to dump kernel code, which is often re-uploaded by the CPU.

That being said, I guess this is what led to Unreal's design. Unreal actually treats its GPU particles like it does its materials: it procedurally creates a single shader for that particle system. When it's done, the game has one compute shader that defines the particle system's behavior.
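The "one generated shader per system" idea can be sketched in CPU-side C++ terms. This is a hypothetical illustration, not Unreal's actual code: instead of one pass (one dispatch) per affector, the affector steps are composed once into a single update function, which is then run once over all particles, the analogue of a single fused compute kernel:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal 1-D particle state, for illustration only.
struct Particle {
    float pos = 0.0f;
    float vel = 0.0f;
};

// One affector step: mutates a particle given a timestep.
using Step = std::function<void(Particle&, float)>;

// "Compile" a list of affector steps into a single fused update function.
// This mirrors generating one compute shader that applies every affector
// in sequence, instead of dispatching one shader per affector.
Step fuse(std::vector<Step> steps) {
    return [steps = std::move(steps)](Particle& p, float dt) {
        for (const Step& s : steps)
            s(p, dt);  // all affectors applied in one pass over the particle
    };
}
```

Usage would look like `Step fused = fuse({gravity, drag, integrate}); fused(p, dt);`, so the per-frame cost is one "dispatch" regardless of how many affectors the system was authored with.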

The last game I shipped simulated all particles on the GPU. We still had our legacy codepath for CPU-simulated particles, but I don't think we actually had any assets in the shipping game that used it. At least for us, all of our use cases for particles were things that didn't require feedback from the simulation back to the CPU. Personally, if any such case came up I would be fine writing an optimized component/actor system to handle that one particular case instead of making our particle systems flexible enough to handle it. For the other 99% of cases, the GPU path is extremely fast, and simulation was essentially free as far as we were concerned (especially on PS4, where we could use async compute).

I would also disagree with the notion that "uploading" is a high cost of GPU-based particle systems. In our case, the amount of CPU->GPU data transferred every frame was very small, and generally amounted to a few small constant buffers. Everything else was pre-packed into static buffers or textures kept in GPU memory.

Probably better to go with MJP's response. I'm still a college student in CS; he's actually worked on shipped products. Though I'd think a console is completely different from a desktop, if only because the Nintendo Wii had absolutely nothing remarkable about its specs (single-core processor, very weak GPU) and it was still able to produce some complicated and visually amazing games.

