
backstep

Posted 14 October 2013 - 01:50 PM

I'm no expert in DirectCompute, really just starting out with it, but I would think the difference in your example comes down to how the driver handles the API command queue. Your hardware almost certainly has more than 64 threads available, but will it share them amongst multiple small dispatches?
 
Assuming each loop iteration operates on different data and there are no collisions, the multiple dispatches could in principle run in parallel, just like the single dispatch. I don't know what level of support GPUs have for concurrently running multiple compute kernels, and that's what it would come down to.
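For illustration, here's roughly what the two patterns being compared would look like in D3D11. This is just a sketch, not a complete program: it assumes a compute shader declared with [numthreads(64, 1, 1)] and an ID3D11DeviceContext* named context with the shader and its UAVs already bound.

```cpp
// Sketch only: 'context' is an ID3D11DeviceContext* with the compute
// shader ([numthreads(64, 1, 1)]) and its resources already bound.

// (a) Looped version: 16 separate API calls, one 64-thread group each.
// Whether these overlap on the GPU is up to the driver and hardware.
for (UINT i = 0; i < 16; ++i)
    context->Dispatch(1, 1, 1);

// (b) Single dispatch: 16 thread groups (16 * 64 = 1024 threads) in one
// call, which the GPU can spread across its compute units at once.
context->Dispatch(16, 1, 1);
```

The total thread count is identical either way; the question in this thread is whether the driver will schedule the sixteen calls in (a) concurrently or one after another.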
 
Unless anyone else here knows more about how compute command queues are handled once they leave the API, my advice would be to profile it. You can use GPUView (http://graphics.stanford.edu/~mdfisher/GPUView.html - now part of the Windows SDK) to inspect the command queue and see how the driver/GPU schedules your calls, i.e. whether the looped version runs in series or in parallel. You could also grab a vendor profiling tool, Nsight for NVIDIA or GPU PerfStudio for AMD, to check the same thing and get feedback on thread occupancy as well.

Edit: Just noticed ATEfred's reply, and he put it more succinctly: you're asking the GPU to run multiple dispatches concurrently, and it sounds like that isn't supported.



