michaelruecker

CPU GPU (compute shader) parallelism


Quick theoretical question:

 

Let's say my CPU frames finish faster than my compute shader does, and I issue a new Dispatch call every CPU frame. What exactly would happen?

 

1. The compute shader dispatches get queued and executed later?

2. The new dispatch calls will be ignored?

3. If enough GPU threads are still available they would start handling the new dispatch call?

 

Basically I want my program to behave as number 2 describes. If this is not the default behavior, I wonder how I can achieve it.


D3D and the driver will queue up as many commands as you give it, and the GPU will eventually execute them. Typically in applications that use the device for rendering, the device will block the CPU during Present if the CPU starts getting too far ahead of the GPU. I'm not sure how it works exactly if you're only using the device for compute, but I would assume that something similar happens if the driver has too many commands queued up.

If you wanted a system that dynamically changes what commands it issues based on the GPU load, there's no direct support for doing it. If I were to try implementing such a thing, I would probably start by using timestamp queries to track when Dispatch calls actually get executed. Then, based on that feedback, you could decide whether to issue new Dispatch calls.
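To make the feedback idea concrete, here is a minimal sketch of the "skip the dispatch while the previous one is still running" policy (behaviour 2 from the question). It is a toy model in portable C++, not D3D11 code: `FakeGpu`, `Stats`, and `simulate` are hypothetical names, and the `lastCompleted` counter stands in for whatever completion signal you read back from the GPU. In D3D11 that signal would come from a query, for example the timestamp queries suggested above, or an event query (`D3D11_QUERY_EVENT`) polled with `ID3D11DeviceContext::GetData`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model, NOT the D3D11 API: every name here is hypothetical.
// The fake GPU holds queued dispatches and retires them oldest-first.
struct FakeGpu {
    std::deque<std::uint64_t> pending;   // ids of queued dispatches, oldest first
    std::uint64_t lastCompleted = 0;     // id of the newest dispatch that has finished

    void submit(std::uint64_t id) { pending.push_back(id); }

    // Pretend the GPU just finished its oldest queued dispatch.
    void retireOne() {
        if (pending.empty()) return;
        lastCompleted = pending.front();
        pending.pop_front();
    }
};

struct Stats {
    std::uint64_t issued;    // dispatches actually submitted
    std::uint64_t skipped;   // CPU frames that skipped their dispatch
    std::size_t maxQueued;   // deepest the GPU queue ever got
};

// Runs `cpuFrames` CPU frames. The fake GPU needs `gpuPeriod` CPU frames per
// dispatch. A new dispatch is issued only if the previous one has completed.
Stats simulate(int cpuFrames, int gpuPeriod) {
    assert(gpuPeriod > 0);
    FakeGpu gpu;
    Stats stats{0, 0, 0};
    for (int frame = 0; frame < cpuFrames; ++frame) {
        if (gpu.lastCompleted == stats.issued) {
            gpu.submit(++stats.issued);   // previous work is done: dispatch again
        } else {
            ++stats.skipped;              // still in flight: skip this frame
        }
        stats.maxQueued = std::max(stats.maxQueued, gpu.pending.size());
        if ((frame + 1) % gpuPeriod == 0) gpu.retireOne();
    }
    return stats;
}
```

With the fake GPU taking three CPU frames per dispatch, `simulate(9, 3)` issues 3 dispatches, skips 6 frames, and never has more than one dispatch queued. In a real application the completion check should be a non-blocking poll, otherwise the CPU simply stalls instead of skipping.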

Edited by MJP


That sounds ok.

 

Since I am using a CS to spawn particles and another to update them I was (am) worried that the data gets completely mixed up. 

 

BUT, since you can't bind the same buffer to different CS at the same time this should theoretically not happen? (I just hope you can also not bind the same buffer to the same CS with another dispatch call)

 

 

Let's say I am doing this:

1. Bind particleStateBuffer to CSUpdateParticle
2. Dispatch CSUpdateParticle
3. Unbind particleStateBuffer from CSUpdateParticle

4. Bind particleStateBuffer to CSSpawnParticle
5. Dispatch CSSpawnParticle
6. Unbind particleStateBuffer from CSSpawnParticle

Once point 4 is reached while CSUpdateParticle is still running, CSSpawnParticle wouldn't do anything, right? Would it then get queued, or just be ignored forever?

 

And now imagine that the CPU frames are much faster than the compute shaders. I just hope that there aren't multiple instances of the same CS running in parallel on the same data, both manipulating that data at the same time.

Edited by me_12


You don't have anything to worry about if I'm understanding you right.

 

Setting/unsetting shader resources, constant buffers, calling Dispatch etc. are instructions that the GPU will execute sequentially at some point in the future, only overlapping work where it's valid to do so. Even if the CPU were allowed to run 50 frames ahead (it isn't), you're still just building up a buffer of commands that the GPU will execute in order without skipping any of them. The CPU is only allowed to get 1-3 frames ahead of the GPU before DirectX will block you from adding any more commands, to allow the GPU to catch up. It doesn't do this because the GPU might otherwise end up running multiple frames in parallel, but because letting the CPU get further ahead would introduce unnecessary latency: the time between the CPU issuing the commands and the GPU actually executing them would get higher and higher.

 

Even if the GPU has got 3 frames' worth of commands ready and waiting to go, it will run them sequentially and not skip any frames.
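The queueing behaviour described here can be pictured with a small toy model. This is only a sketch in portable C++ and does not call D3D11: `CommandQueueModel` and its members are hypothetical names, the frame cap of 3 is just an assumed value from the "1-3 frames" range mentioned above, and the "block" is modelled by letting the fake GPU finish its oldest frame before the CPU may queue another.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Toy model, NOT the D3D11 API: every name here is hypothetical.
class CommandQueueModel {
public:
    explicit CommandQueueModel(std::size_t maxFramesAhead)
        : maxFramesAhead_(maxFramesAhead) {
        assert(maxFramesAhead_ > 0);
    }

    // CPU side: queue one frame's commands. If the CPU is already the maximum
    // number of frames ahead, it "blocks" until the GPU finishes its oldest frame.
    void submitFrame(std::vector<std::string> commands) {
        if (queued_.size() == maxFramesAhead_) {
            ++cpuStalls_;
            executeOldestFrame();
        }
        queued_.push_back(std::move(commands));
    }

    // GPU side: run every command of the oldest queued frame, in order.
    void executeOldestFrame() {
        if (queued_.empty()) return;
        for (std::string& command : queued_.front()) {
            executed_.push_back(std::move(command));
        }
        queued_.pop_front();
    }

    // Let the GPU catch up completely.
    void drain() {
        while (!queued_.empty()) executeOldestFrame();
    }

    const std::vector<std::string>& executed() const { return executed_; }
    std::size_t cpuStalls() const { return cpuStalls_; }

private:
    std::size_t maxFramesAhead_;
    std::size_t cpuStalls_ = 0;
    std::deque<std::vector<std::string>> queued_;   // frames waiting for the GPU
    std::vector<std::string> executed_;             // commands in execution order
};
```

Submitting five frames of `{"update i", "spawn i"}` to a `CommandQueueModel(3)` and then calling `drain()` stalls the CPU twice and executes all ten commands in submission order, starting with `update 0` and ending with `spawn 4`. Nothing is dropped; the CPU just waits.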


In the case you're describing the driver will automatically detect the dependency between your two Dispatch calls, and it will insert a sync point before executing the second Dispatch. This will cause the GPU to wait until all threads from the first Dispatch complete, so the second Dispatch won't begin until all of the data is present in the state buffer.

