Synchronizing Compute Shader

4 comments, last by gfxCahd 10 years, 1 month ago

I have a single compute shader that has two distinct steps.
I need the first step to be finished, before starting the second step
(i.e. threads in the second step use data that could be produced by any thread in the previous step).

I have separated the two steps into two functions in my compute shader.
Is there a way to synchronize them inside the compute shader?
Or do I need to do it by calling Dispatch twice, and run one of the methods depending on the value in a cbuffer?


[numthreads(128, 1, 1)]
void CSMAIN(uint3 DispatchThreadID : SV_DispatchThreadID, uint GroupID : SV_GroupID)
{
  if (DispatchThreadID.x < N)
  {
    StepOne(DispatchThreadID.x);
    //synchronization here?
    StepTwo(DispatchThreadID.x);
  }
}

p.s. I guess the following function does not fit my needs, since it only syncs the threads within a single group, not all of my threads across every group.

AllMemoryBarrierWithGroupSync function

Blocks execution of all threads in a group until all memory accesses have been completed and all threads in the group have reached this call.

http://msdn.microsoft.com/en-us/library/windows/desktop/ff471351%28v=vs.85%29.aspx

How many groups do you dispatch? If you dispatch just one group, i.e. Dispatch(1,1,1), the function GroupMemoryBarrierWithGroupSync() should work. To use this type of synchronization you need to set up a groupshared array and let each thread write once into that array after it has finished a step.
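
A minimal sketch of that single-group approach, assuming Dispatch(1,1,1) and N no larger than the group size. The buffer names (Input, Output), the cbuffer layout, and the bodies of the two steps are hypothetical placeholders, not the original poster's code:

```hlsl
// Sketch only: valid when one group covers all N elements,
// i.e. Dispatch(1, 1, 1) and N <= GROUP_SIZE.
#define GROUP_SIZE 128

cbuffer Params : register(b0)
{
    uint N; // number of elements, assumed <= GROUP_SIZE
};

StructuredBuffer<float>   Input  : register(t0); // hypothetical input
RWStructuredBuffer<float> Output : register(u0); // hypothetical output

// One slot per thread, visible to every thread in the group.
groupshared float stepOneResult[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 groupThreadID : SV_GroupThreadID)
{
    uint i = groupThreadID.x;

    // Step one (placeholder work): every thread writes its own slot,
    // so no slot is left uninitialized for step two.
    stepOneResult[i] = (i < N) ? Input[i] * 2.0f : 0.0f;

    // Every thread in the group reaches this call; it is deliberately
    // kept outside the (i < N) branch.
    GroupMemoryBarrierWithGroupSync();

    // Step two (placeholder work): read a value written by another thread.
    if (i < N)
    {
        uint neighbour = (i + 1) % N;
        Output[i] = stepOneResult[i] + stepOneResult[neighbour];
    }
}
```

Note that the barrier sits outside the bounds check. As far as I know, the HLSL compiler rejects a group sync placed inside flow control that varies per thread, so the layout in the original question (sync inside the if) would likely not compile as is.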


Actually, I'm using a variable number of groups (the number of threads per group is fixed; the number of groups depends on the size of the input/problem).

Could you maybe elaborate on the groupshared array method? Not sure I understand how it should work.

The sync functions only work across the threads within a single thread group; they do not synchronize one group with another. If you need a global sync point, then the easiest way to do it would be to split things into two Dispatch calls.
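
A rough sketch of the two-Dispatch split, with each step as its own entry point in the same file. The names CSStepOne, CSStepTwo, Input, Intermediate and Output, as well as the work done in each step, are hypothetical; only the 128-thread group size and the bounds check against N come from the original post:

```hlsl
cbuffer Params : register(b0)
{
    uint N; // number of elements to process
};

StructuredBuffer<float>   Input        : register(t0); // hypothetical input
RWStructuredBuffer<float> Intermediate : register(u0); // written by step one
RWStructuredBuffer<float> Output       : register(u1); // written by step two

// First Dispatch: each thread produces one intermediate value.
[numthreads(128, 1, 1)]
void CSStepOne(uint3 id : SV_DispatchThreadID)
{
    if (id.x < N)
    {
        Intermediate[id.x] = Input[id.x] * 2.0f; // placeholder work
    }
}

// Second Dispatch: each thread may read a value that was produced
// by any thread in any group of the first Dispatch.
[numthreads(128, 1, 1)]
void CSStepTwo(uint3 id : SV_DispatchThreadID)
{
    if (id.x < N)
    {
        uint other = (N - 1) - id.x; // likely owned by a different group
        Output[id.x] = Intermediate[id.x] + Intermediate[other];
    }
}
```

On the host side, each entry point would be compiled to its own compute shader and dispatched with (N + 127) / 128 groups, step one first. My understanding is that Direct3D 11 makes the first Dispatch's UAV writes visible to the second one without extra work, whereas Direct3D 12 would need an explicit UAV barrier between the two calls; treat that as an assumption to verify for your API.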

Ah ok, thanks!

Was hoping I could save 0.3 milliseconds or so in overhead, but it was not to be...

