Most efficient way to update/render particles?

Started by
4 comments, last by SGreth 18 years, 10 months ago
This is more a query for 'how did you implement it' rather than a 'how do you code it'. In the past, I've always locked the vertex buffer, updated the geometry of each particle, and then done one big render on the buffer. I suppose alternatively I could render each particle separately and change the transform for each one, but that's a lot of function calls on the CPU if the system is complex (i.e. it doesn't seem like it would scale). Oh, and I'm just using the fixed-function pipeline for now. I'm guessing you could write a shader that plays some fancy games with the color component or texture coordinate as the dynamic translation info, but I'm not coming up with any brilliant ideas. ~S'Greth
"The difference between insanity and genius is measured only by success."~Bruce Feirstein
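For reference, the "lock, update every particle, draw once" approach described above might look roughly like the sketch below. It is plain C++ with no Direct3D calls: `Particle`, `ParticleVertex`, and `updateAndFill` are hypothetical names, and the `out` pointer merely stands in for whatever pointer locking the vertex buffer would hand back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side particle; the fields are illustrative only.
struct Particle {
    float x, y, z;        // position
    float vx, vy, vz;     // velocity
    std::uint32_t color;  // packed diffuse colour
};

// Hypothetical vertex layout: position plus diffuse colour.
struct ParticleVertex {
    float x, y, z;
    std::uint32_t color;
};

// Advances every particle by dt and writes one vertex per particle into `out`.
// `out` stands in for the memory obtained by locking the vertex buffer;
// `capacity` is how many vertices that memory can hold.
// Returns the number of vertices written (what the single draw call would cover).
std::size_t updateAndFill(std::vector<Particle>& particles, float dt,
                          ParticleVertex* out, std::size_t capacity) {
    assert(out != nullptr);
    const std::size_t n =
        particles.size() < capacity ? particles.size() : capacity;
    for (std::size_t i = 0; i < n; ++i) {
        Particle& p = particles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
        out[i] = ParticleVertex{p.x, p.y, p.z, p.color};
    }
    return n;
}
```

The point of the layout is that the per-particle cost is a handful of float operations and one sequential write, with no per-particle API call.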
You pointed out two things. Using the CPU to do everything wastes time the GPU could spend processing, and using the GPU to do everything wastes time the CPU could spend processing. There is a happy medium that you have to find. I don't know how true it is anymore, because I had heard that games are more often CPU limited than GPU limited nowadays, but consider this anyway. Instead of batching up tons and tons of particles and drawing them all at once, batch a few thousand at a time. There are actual numbers (5000 sticks in mind; I think Simon O'Conner had some links to them on here) that will give a good balance between CPU and GPU. So, prepare 5000 particles, send 5000 to the video card. Prepare the next 5000, etc. If you do this with a large vertex buffer and use some tricks like the NOOVERWRITE flag, you can balance the CPU and GPU loads.

If you have a problem seeing how that balances, consider going into this "preparation" routine empty-handed. The GPU isn't working on anything at the time. You spend 5 ms calculating stuff, then you transfer it to video memory. Only now does the GPU start working on it. If you had started transmitting data to video memory in pieces only 0.5 ms into the computations, the GPU could be working alongside you. Instead, it has been held up for 5 ms with nothing to do. Hope that helps a bit.

Chris
Chris Byers, Microsoft DirectX MVP - 2005
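The batching idea above can be sketched as bookkeeping over one large dynamic vertex buffer. This is a hedged illustration only, with no Direct3D calls: `LockMode`, `BatchRange`, `BatchCursor`, and `planFrame` are made-up names, and the two lock modes merely mirror the Direct3D 9 `D3DLOCK_NOOVERWRITE` and `D3DLOCK_DISCARD` flags. The use of a discard lock when the buffer wraps is an assumption about the usual companion to NOOVERWRITE, not something stated in the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD lock flags.
enum class LockMode { NoOverwrite, Discard };

// One batch: where to write in the buffer, how many particles, how to lock.
struct BatchRange {
    std::size_t offset;
    std::size_t count;
    LockMode mode;
};

// Tracks the write position inside one large dynamic vertex buffer.
class BatchCursor {
public:
    BatchCursor(std::size_t capacity, std::size_t batchSize)
        : capacity_(capacity), batchSize_(batchSize) {
        assert(batchSize_ > 0 && batchSize_ <= capacity_);
    }

    // Reserves space for the next batch of at most batchSize_ particles.
    // Appends with NoOverwrite while there is room; when the batch would run
    // past the end, wraps to offset 0 and asks for a Discard lock instead.
    BatchRange next(std::size_t remaining) {
        const std::size_t count = std::min(remaining, batchSize_);
        LockMode mode = LockMode::NoOverwrite;
        if (offset_ + count > capacity_) {
            offset_ = 0;
            mode = LockMode::Discard;
        }
        const BatchRange range{offset_, count, mode};
        offset_ += count;
        return range;
    }

private:
    std::size_t capacity_;
    std::size_t batchSize_;
    std::size_t offset_ = 0;
};

// Splits one frame's particles into batches. In a real renderer each batch
// would be: lock the range, write the vertices, unlock, issue one draw call,
// so the GPU can start on batch N while the CPU prepares batch N + 1.
std::vector<BatchRange> planFrame(BatchCursor& cursor, std::size_t particleCount) {
    std::vector<BatchRange> batches;
    while (particleCount > 0) {
        const BatchRange range = cursor.next(particleCount);
        batches.push_back(range);
        particleCount -= range.count;
    }
    return batches;
}
```

With a 12,000-vertex buffer and a batch size of 5000, a 12,000-particle frame becomes three batches (5000, 5000, 2000), and the next frame's first batch wraps to the start of the buffer with a discard lock.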
That makes complete sense. I wasn't really considering doing batches that large, but my concerns were mostly if there was a more efficient way of doing translation other than locking down the vertex buffer and trolling through it. It just seems like there must be a better way. It'll work for now though as my particle counts won't be terribly high at all.
"The difference between insanity and genius is measured only by success."~Bruce Feirstein
Supernat02,

I was under the impression that the GPU never began to parse the vertex data until the D3D runtime's command buffer is flushed to the driver.

So once the app has called Present, the driver is off executing the commands in the command buffer, while the CPU is free to calculate new particle data. When writing this new data to the VB, locking with NOOVERWRITE would allow the CPU to access the VB without having to wait for the GPU to finish accessing it. Or maybe it would be beneficial to investigate some sort of double-buffering of VBs?
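The double-buffering idea floated here could be sketched as two buffers whose roles flip each frame: the CPU fills one while the other, filled last frame, is the one being drawn. The sketch below uses plain `std::vector` storage in place of real vertex buffers, and `Vertex` and `DoubleBufferedVertices` are hypothetical names; whether this beats a single buffer locked with NOOVERWRITE is left open in the thread.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical position-only vertex.
struct Vertex {
    float x, y, z;
};

// Two CPU-side buffers standing in for two vertex buffers.
class DoubleBufferedVertices {
public:
    explicit DoubleBufferedVertices(std::size_t capacity) {
        for (auto& buffer : buffers_) buffer.reserve(capacity);
    }

    // Buffer the CPU fills during the current frame.
    std::vector<Vertex>& writeBuffer() { return buffers_[writeIndex_]; }

    // Buffer filled during the previous frame; the one that would be drawn.
    const std::vector<Vertex>& readBuffer() const {
        return buffers_[1 - writeIndex_];
    }

    // Flips the roles at the end of a frame and empties the new write buffer.
    void swap() {
        writeIndex_ = 1 - writeIndex_;
        buffers_[writeIndex_].clear();
    }

private:
    std::array<std::vector<Vertex>, 2> buffers_;
    std::size_t writeIndex_ = 0;
};
```

The cost of this scheme is one frame of latency on particle positions and twice the buffer memory.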

Quote:Original post by don
I was under the impression that the GPU never began to parse the vertex data until the D3D runtime's command buffer is flushed to the driver.

Knowing exactly what happens is difficult as, for good reason, a lot of those decisions are left to the drivers/hardware. Those parts seem to work black-magic at times [smile]

As a general overview, you send stuff to D3D and it will then pass it on to the driver with as little interaction as possible. The GPU will then work through the entries in the command buffer as it sees fit, with various commands (e.g. some resource locks) causing a complete flush.


Anyway, to throw another idea into the mix... the last big particle system I did benefited greatly from two things:


  1. Throttling: the particle system updated at 15 Hz irrespective of the frame rate. At times it was noticeable - but ONLY when there were fast moving particles. Sometimes trying to brute force an update every single frame is just pointless if a particle is only gonna end up moving 0.5px in screen space [smile]

  2. Time slicing: This has been mentioned above, but I think I did it slightly differently... say we have 15,000 particles - I'd divide updates out over 1 second (or whatever) and have the CPU work on 1,000 particles per frame. I never tried double (or 15x) buffering with the data though - that sounds interesting.


Obviously, the entire set of particles was rendered each frame - it was just the CPU-based updating that was modified.
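Both ideas can be sketched in a few lines of plain C++. This is an illustration under assumptions, not the implementation described above: `FixedRateTicker` and `TimeSlicer` are hypothetical names, the ticker uses a simple time accumulator, and the slicer just hands out consecutive index ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Throttling: reports how many fixed-rate simulation steps are due,
// however often (or rarely) frames arrive.
class FixedRateTicker {
public:
    explicit FixedRateTicker(double hz) : interval_(1.0 / hz) {
        assert(hz > 0.0);
    }

    // Adds the frame's elapsed time and returns the number of whole update
    // intervals that have accumulated since the last call.
    int advance(double frameSeconds) {
        accumulated_ += frameSeconds;
        int steps = 0;
        while (accumulated_ >= interval_) {
            accumulated_ -= interval_;
            ++steps;
        }
        return steps;
    }

    double interval() const { return interval_; }

private:
    double interval_;
    double accumulated_ = 0.0;
};

// Time slicing: each frame, only the next `sliceSize` particles get updated.
class TimeSlicer {
public:
    explicit TimeSlicer(std::size_t sliceSize) : sliceSize_(sliceSize) {
        assert(sliceSize_ > 0);
    }

    // Returns the half-open index range [begin, end) to update this frame,
    // wrapping back to the first particle after the last slice.
    std::pair<std::size_t, std::size_t> nextRange(std::size_t total) {
        if (total == 0) return {0, 0};
        if (next_ >= total) next_ = 0;
        const std::size_t begin = next_;
        const std::size_t end = std::min(begin + sliceSize_, total);
        next_ = end;
        return {begin, end};
    }

private:
    std::size_t sliceSize_;
    std::size_t next_ = 0;
};
```

A ticker built with `FixedRateTicker(15.0)` would give the 15 Hz update rate mentioned above, and `TimeSlicer(1000)` over 15,000 particles touches each particle once every 15 frames. With slicing, each particle presumably needs to be advanced by the time elapsed since its own last update rather than by a single frame's delta, or it will move too slowly.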

hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

Thanks a bunch guys, I'll probably employ the periodic updates scheme, but I honestly don't plan on having particle counts in the thousands. Remember, this is just a simple breakout clone :) I just don't wanna implement the particle system half-assedly and have to re-code it again later.

Thanks again!
"The difference between insanity and genius is measured only by success."~Bruce Feirstein

