CPU GPU (compute shader) parallelism


Quick theoretical question:

Let's say my CPU frames run faster than my compute shader can finish its job, and I start a new dispatch call each CPU frame. What exactly would happen:

1. The compute shader dispatches get queued and executed later?

2. The new dispatch calls will be ignored?

3. If enough GPU threads are still available, they start handling the new dispatch call?

Basically I want my program to act as number 2 describes. If this is not the default behavior, I wonder how to obtain it.
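
To make the scenario concrete, here is a minimal D3D11 sketch of what "a new dispatch call each CPU frame" looks like; the names (context, cs, uav) are illustrative and assumed to be created elsewhere:

```cpp
#include <d3d11.h>

// Called once per CPU frame, whether or not the GPU has finished the
// previous frame's work. 'context', 'cs', and 'uav' are assumed to exist.
void OnCpuFrame(ID3D11DeviceContext* context,
                ID3D11ComputeShader* cs,
                ID3D11UnorderedAccessView* uav)
{
    context->CSSetShader(cs, nullptr, 0);
    context->CSSetUnorderedAccessViews(0, 1, &uav, nullptr);

    // Dispatch only records the command; the GPU executes it later.
    context->Dispatch(64, 1, 1);
}
```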


D3D and the driver will queue up as many commands as you give it, and the GPU will eventually execute them. Typically in applications that use the device for rendering, the device will block the CPU during Present if the CPU starts getting too far ahead of the GPU. I'm not sure how it works exactly if you're only using the device for compute, but I would assume that something similar happens if the driver has too many commands queued up.

If you wanted a system that dynamically changes what commands it issues based on the GPU load, there's no direct support for doing it. If I were to try implementing such a thing, I would probably start by trying to use timestamp queries to track when Dispatch calls actually get executed. Then, based on that feedback, you could decide whether to issue new Dispatch calls.
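
A rough sketch of that timestamp-query idea, assuming D3D11 with an existing device, context, and threadGroupsX count (in a real app you would keep a few sets of queries in flight so you never stall on GetData):

```cpp
// Create the queries once (ideally one set per in-flight frame).
D3D11_QUERY_DESC disjointDesc = { D3D11_QUERY_TIMESTAMP_DISJOINT, 0 };
D3D11_QUERY_DESC stampDesc    = { D3D11_QUERY_TIMESTAMP, 0 };
ID3D11Query *disjointQuery, *tsBefore, *tsAfter;
device->CreateQuery(&disjointDesc, &disjointQuery);
device->CreateQuery(&stampDesc, &tsBefore);
device->CreateQuery(&stampDesc, &tsAfter);

// When issuing the work:
context->Begin(disjointQuery);
context->End(tsBefore);                  // timestamp just before the Dispatch
context->Dispatch(threadGroupsX, 1, 1);
context->End(tsAfter);                   // timestamp just after the Dispatch
context->End(disjointQuery);

// Some frames later, poll without stalling. If GetData returns S_OK, the GPU
// has actually executed the Dispatch, and you also get its duration.
D3D11_QUERY_DATA_TIMESTAMP_DISJOINT disjointData;
UINT64 before = 0, after = 0;
if (context->GetData(disjointQuery, &disjointData, sizeof(disjointData), 0) == S_OK &&
    context->GetData(tsBefore, &before, sizeof(before), 0) == S_OK &&
    context->GetData(tsAfter,  &after,  sizeof(after),  0) == S_OK &&
    !disjointData.Disjoint)
{
    double gpuMs = double(after - before) * 1000.0 / double(disjointData.Frequency);
    // Based on this feedback, decide whether to issue another Dispatch.
}
```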

That sounds ok.

Since I am using one CS to spawn particles and another to update them, I was (am) worried that the data gets completely mixed up.

BUT, since you can't bind the same buffer to different compute shaders at the same time, this should theoretically not happen? (I just hope you also can't bind the same buffer to the same CS with another dispatch call.)

Let's say I am doing this (sketched in D3D11 calls right after the list):


1. Bind particleStateBuffer to CSUpdateParticle
2. Dispatch CSUpdateParticle
3. Unbind particleStateBuffer from CSUpdateParticle

4. Bind particleStateBuffer to CSSpawnParticle
5. Dispatch CSSpawnParticle
6. Unbind particleStateBuffer from CSSpawnParticle
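
In D3D11 calls, those six steps would look roughly like the sketch below (illustrative names; "unbinding" just means setting a null UAV on the slot):

```cpp
ID3D11UnorderedAccessView* nullUAV = nullptr;

// 1-3: update pass on particleStateBuffer
context->CSSetShader(csUpdateParticle, nullptr, 0);
context->CSSetUnorderedAccessViews(0, 1, &particleStateUAV, nullptr);
context->Dispatch(updateGroupCount, 1, 1);
context->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);   // unbind

// 4-6: spawn pass on the same buffer
context->CSSetShader(csSpawnParticle, nullptr, 0);
context->CSSetUnorderedAccessViews(0, 1, &particleStateUAV, nullptr);
context->Dispatch(spawnGroupCount, 1, 1);
context->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);   // unbind
```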

Once point 4 is reached and CSUpdateParticle is still running, CSSpawnParticle wouldn't do anything, right? Will it then get queued or just be ignored forever?

And now imagine that the CPU frames are much faster than the compute shaders. I just hope there aren't multiple instances of the same CS running in parallel on the same data, all manipulating it at the same time.

You don't have anything to worry about if I'm understanding you right.

Setting/unsetting shader resources and constant buffers, calling Dispatch, etc. are instructions that the GPU will execute sequentially at some point in the future, only overlapping work where it's valid to do so. Even if the CPU were allowed to run 50 frames ahead (it isn't), you're still just building up a buffer of commands that the GPU will execute in order without skipping any of them. The CPU is only allowed to get 1-3 frames ahead of the GPU before DirectX will block you from adding any more commands, to allow the GPU to catch up. It doesn't do this because the GPU might end up running multiple frames in parallel; it does it because letting the CPU run too far ahead would introduce unnecessary latency, where the time between the CPU issuing the commands and the GPU actually executing them gets higher and higher.

Even if the GPU has got 3 frames' worth of commands ready and waiting to go, it will run them sequentially and not skip any frames.
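
For reference, that "a few frames ahead" limit is adjustable in D3D11; a minimal sketch, assuming device is the ID3D11Device your swap chain was created from:

```cpp
#include <dxgi.h>

// Lowering the maximum frame latency makes Present block sooner, so the CPU
// queues up less work ahead of the GPU (at the cost of less CPU/GPU overlap).
IDXGIDevice1* dxgiDevice = nullptr;
if (SUCCEEDED(device->QueryInterface(__uuidof(IDXGIDevice1),
                                     reinterpret_cast<void**>(&dxgiDevice))))
{
    dxgiDevice->SetMaximumFrameLatency(1);   // default is 3
    dxgiDevice->Release();
}
```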

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

Thank you very much ajmiles!

In the case you're describing the driver will automatically detect the dependency between your two Dispatch calls, and it will insert a sync point before executing the second Dispatch. This will cause the GPU to wait until all threads from the first Dispatch complete, so the second Dispatch won't begin until all of the data is present in the state buffer.

