I tried naively wrapping Dispatch calls in timestamp queries (ID3D11Query), and the results were not reasonable. The first Dispatch seems to take a long time, while the next few take less than a microsecond. I suppose the GPU ends the first dispatch once the pipeline is ready to execute compute shaders, and the following dispatches just slot in whenever there is room for new threads. Any synchronization between them seems to be handled after that, with no impact on the dispatch timestamps. Unfortunately, Nvidia Nsight runs really slowly on my PC, so is there any way of measuring compute shader execution time using ID3D11Query or a similar approach? I am afraid there is no simple solution with the current API.
Compute Shader execution time
Members - Reputation: 1155
Posted 01 August 2013 - 06:52 AM
Have you tried using Intel GPA? I've had good results measuring Compute Shader dispatch times with that tool (it does not require an Intel GPU for most functionality).
Adam Miles - Senior Software Development Engineer - Xbox Advanced Technology Group
Moderators - Reputation: 16169
Posted 01 August 2013 - 12:49 PM
The problem with timestamp queries is that they just tell you the amount of time it takes for the GPU's command processor (CP) to reach a certain point in the command buffer. Actually measuring the amount of time it takes for a Draw or Dispatch call is more complicated than that, because the GPU can be executing multiple Draw/Dispatch calls simultaneously. Since there can be lots of things in flight, the command processor generally won't wait for a Draw/Dispatch to finish before moving on and executing the next command. So if you just wrap a single Dispatch call, all you'll get is the amount of time for the CP to start the Dispatch and then move on. To get any sort of accurate timing info, you need to wrap your Begin/End around something that will cause the driver to insert a sync point, or try to force a sync point yourself. Typically any Dispatch or Draw that reads from the output of another Dispatch or Draw will cause the GPU to sync. But of course inserting artificial sync points will hurt your overall efficiency by preventing multiple Draw/Dispatch calls from overlapping, so you have to be careful.
On a related note, this is why Nsight gives you both "sync" and "async" timings for a Draw or Dispatch call. The "sync" value gives you the time it takes to execute the call if it's the only call being executed on the GPU, while the "async" value gives you the time it took to execute alongside all of the other Draw/Dispatch calls being executed during the frame.
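For reference, here is a minimal sketch of the arithmetic behind a timestamp measurement, written as plain C++ so it compiles without d3d11.h. It assumes the two tick values come from a pair of D3D11_QUERY_TIMESTAMP queries and that the frequency and disjoint flag come from the D3D11_QUERY_DATA_TIMESTAMP_DISJOINT result of an enclosing D3D11_QUERY_TIMESTAMP_DISJOINT query. The `DisjointData` struct and the `ElapsedMilliseconds` helper are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the two fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT
// (Frequency and Disjoint), so this sketch does not depend on d3d11.h.
struct DisjointData {
    std::uint64_t frequency; // GPU timestamp ticks per second
    bool disjoint;           // true if timestamps taken in this interval are unreliable
};

// Converts a begin/end pair of GPU timestamps to milliseconds.
// Returns std::nullopt when the interval cannot be trusted.
std::optional<double> ElapsedMilliseconds(std::uint64_t beginTicks,
                                          std::uint64_t endTicks,
                                          const DisjointData& dj)
{
    // Discard the sample if the counter was disjoint, the frequency is
    // missing, or the timestamps are out of order.
    if (dj.disjoint || dj.frequency == 0 || endTicks < beginTicks) {
        return std::nullopt;
    }
    const std::uint64_t delta = endTicks - beginTicks;
    return static_cast<double>(delta) * 1000.0 / static_cast<double>(dj.frequency);
}
```

Keep in mind that the result only covers the Dispatch itself if the end timestamp is issued after something that makes the GPU wait for the Dispatch to finish; otherwise it just measures how long the command processor took to get from one timestamp to the other.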
Members - Reputation: 135
Posted 02 August 2013 - 02:53 AM
@MJP: I tried inserting some CopySubresourceRegion calls on dummy buffers between dispatches, but the driver is too smart for that. Inserting artificial synchronization points might work, but I won't do it until every other option fails. It seems like a nightmare to keep the code understandable after that.
@ATEFred: Does that happen only when the GPU must sync between them, or does it always happen? It would be strange if the GPU stalled on each dispatch with idle shading units.