newMe

Compute shader threads synchronization


Hi. I'm trying to synchronize thread groups at a certain point. The main idea is to make the groups execute one at a time, in succession, while the others just loop, waiting. It looks something like this:

 

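g_var is an RWBuffer variable whose element g_var[0] is initially set to g_numGroups - 1.

[allow_uav_condition] do
{
    // Loop 1: spin until it is this group's turn
    if (groupID == g_var[0])
    {
        if (groupID == 0)
            g_var[0] = g_numGroups - 1;   // last group: restore the initial value
        else
            g_var[0] -= 1;                // pass the turn to the next lower group
        break;
    }
} while (true);

[allow_uav_condition] do
{
    // Loop 2: spin until group 0 has restored the initial value
    if (g_var[0] == g_numGroups - 1)
        break;
} while (true);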
 
So the first group to break out of the first loop would be the one with the highest index; in fact, it should not stay in that loop at all. The others then follow one by one. Each of them gets caught in the second loop and spins there until group 0 restores the initial value of g_var, and then they all pass through together. I just want all the groups to contribute to the same variable so that it can be used later by all of them; I omitted that part. I also omitted some details, such as having only the thread with index 0 in each group update g_var; I just want to convey the idea. The second loop is not a problem, but apparently threads get stuck in the first one and stay there, and the card stops responding. I just cannot see where I am making a mistake. Could anyone help me by pointing out what is wrong, and maybe suggest another way to do this? Thanks.
I know I can break the code into pieces and synchronize them with another dispatch call. But the code itself is inside a for loop, so that would involve changing a shader or a constant buffer several times. I thought it might be more efficient to stall the groups here, just as they stall anyway at the end of any shader while waiting for the other groups to complete.
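To make the intended behaviour concrete, here is a CPU analog in C++ (not HLSL) of the turn-passing scheme: each std::thread plays the part of one thread group and a std::atomic<int> plays the part of g_var. RunTurnPassing is a hypothetical name used only for this sketch. On a CPU it finishes because the OS scheduler eventually runs every thread; whether a GPU gives the same guarantee for thread groups is exactly the open question.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// CPU analog of the turn-passing scheme: each std::thread plays one thread
// group and `turn` plays g_var. Returns the order in which groups got a turn.
// It completes only because the OS eventually schedules every thread.
std::vector<int> RunTurnPassing(int numGroups)
{
    assert(numGroups > 0);
    std::atomic<int> turn(numGroups - 1);       // g_var, initially numGroups - 1
    std::vector<int> order;                     // touched only by the turn holder
    order.reserve(static_cast<std::size_t>(numGroups));

    std::vector<std::thread> groups;
    for (int id = 0; id < numGroups; ++id)
    {
        groups.emplace_back([&turn, &order, numGroups, id] {
            // Loop 1: spin until it is this group's turn.
            while (turn.load() != id)
                std::this_thread::yield();
            order.push_back(id);                // stands in for the shared contribution
            turn.store(id == 0 ? numGroups - 1  // group 0 restores the initial value
                               : id - 1);       // others pass the turn down
            // Loop 2: spin until group 0 has restored the initial value.
            while (turn.load() != numGroups - 1)
                std::this_thread::yield();
        });
    }
    for (std::thread& t : groups)
        t.join();
    return order;
}
```

With five "groups", RunTurnPassing(5) returns {4, 3, 2, 1, 0}: the highest index goes first and group 0 releases everyone.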
Edited by newMe


It sounds like you're trying to do something like GroupMemoryBarrierWithGroupSync in a roundabout way. Have you checked the HLSL reference to see if any of the barrier commands might do the job you're looking for?
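Note that GroupMemoryBarrierWithGroupSync blocks until all threads within one thread group reach it; it does not synchronize different groups with each other. As a CPU analog of what it does (C++, not HLSL; GroupBarrier and NeighbourSum are hypothetical names), the sketch below has a few threads write their own slot of a shared array, which plays the role of groupshared memory, wait at a barrier, and only then read a neighbour's slot.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier (generation counting), standing in for
// GroupMemoryBarrierWithGroupSync within ONE group of threads.
class GroupBarrier
{
public:
    explicit GroupBarrier(int count) : count_(count), waiting_(0), generation_(0) {}

    void ArriveAndWait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned long gen = generation_;
        if (++waiting_ == count_)
        {
            waiting_ = 0;          // last thread to arrive releases everyone
            ++generation_;
            cv_.notify_all();
        }
        else
        {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const int count_;
    int waiting_;
    unsigned long generation_;
};

// Each thread writes its own slot, waits at the barrier, then reads its
// neighbour's slot. Without the barrier the neighbour might not have written yet.
std::vector<int> NeighbourSum(int threads)
{
    assert(threads > 0);
    std::vector<int> shared(threads, 0), result(threads, 0);
    GroupBarrier barrier(threads);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&, t] {
            shared[t] = t * 10;                                  // phase 1: write
            barrier.ArriveAndWait();                             // all writes done here
            result[t] = shared[t] + shared[(t + 1) % threads];   // phase 2: read
        });
    for (std::thread& th : pool)
        th.join();
    return result;
}
```

For example, NeighbourSum(4) returns {10, 30, 50, 30}.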

Share this post


Link to post
Share on other sites

I have 5 groups of 32 threads each. If I decrement g_var by 2, so that only the groups with indices 4, 2 and 0 are in play, it works. But when 4 is reduced by 1, group 3 never steps in and everything loops endlessly. It is kind of weird.

Edited by newMe


You can't reliably "synchronize all threads", because you have no guarantee that the hardware will actually be able to execute all of your thread groups simultaneously. GPUs have a finite number of hardware units, which corresponds to a finite number of threads that can be in flight at any given time. So if you launch enough thread groups to saturate the GPU and each thread then waits for all threads to hit a sync point, you'll get an instant deadlock.

I would really suggest just splitting into multiple dispatches if you want a global sync point. 
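The deadlock argument above can be checked with a toy model. The C++ sketch below is a hypothetical simulation, not a description of any real GPU scheduler: it assumes at most `slots` groups can be resident at once, that groups are launched in ascending index order, and that a group frees its slot only after leaving the second spin loop. Under those assumptions the scheme can only finish when every group is resident at the same time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical, not a real GPU scheduler) of the two spin loops:
//  - at most `slots` thread groups can be resident at once,
//  - groups are launched in ascending index order (an assumption),
//  - a group frees its slot only after it leaves the second loop.
// Returns true if every group finishes, false if all resident groups are
// stuck spinning and no free slot is left for the others (a deadlock).
bool SimulateSpinBarrier(int numGroups, int slots)
{
    assert(numGroups > 0 && slots > 0);
    int gVar = numGroups - 1;                  // shared turn counter (g_var)
    int nextToLaunch = 0;                      // lowest group not yet launched
    std::vector<int> resident;                 // groups occupying a slot
    std::vector<bool> hadTurn(numGroups, false);
    int finished = 0;

    while (finished < numGroups)
    {
        bool progress = false;

        // Launch waiting groups into any free slots.
        while (static_cast<int>(resident.size()) < slots && nextToLaunch < numGroups)
        {
            resident.push_back(nextToLaunch++);
            progress = true;
        }

        // Give every resident group one chance to advance.
        for (std::size_t i = 0; i < resident.size();)
        {
            const int id = resident[i];
            if (!hadTurn[id] && gVar == id)
            {
                // Loop 1 exits: hand the turn down, or restore the initial value.
                gVar = (id == 0) ? numGroups - 1 : id - 1;
                hadTurn[id] = true;
                progress = true;
                ++i;
            }
            else if (hadTurn[id] && gVar == numGroups - 1)
            {
                // Loop 2 exits: the group finishes and frees its slot.
                resident.erase(resident.begin() + static_cast<std::ptrdiff_t>(i));
                ++finished;
                progress = true;
            }
            else
            {
                ++i;                           // still spinning in loop 1 or loop 2
            }
        }

        if (!progress)
            return false;
    }
    return true;
}
```

SimulateSpinBarrier(5, 5) returns true, while SimulateSpinBarrier(5, 4) returns false: groups 0 to 3 fill all four slots and spin in the first loop waiting for group 4, which is never launched.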

Edited by MJP


OK, if I split it into several dispatch calls, which would be better: keeping the shader in one piece, dividing it into several if() sections and choosing the right one with cbuffer variables that I change between dispatches, or splitting it into several separate shaders?


If I try to apply some very basic logic, since I don't have any knowledge of this, I would say that setting a shader can involve moving some code, maybe compiling it, setting some registers with constants from the code, etc. Setting a constant buffer is just setting some registers. All threads take the same paths, based on the constants. From this point of view I would prefer to change a constant buffer, unless I'm missing some aspect I don't know about.
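Whichever option turns out faster, the single-shader structure looks roughly like the CPU analog below. Everything here is hypothetical: Constants stands in for the cbuffer, Dispatch for a dispatch call, and the lambda for the shader with its two if() paths. A real Dispatch returns to the CPU immediately; the join here only models the ordering D3D11 enforces between consecutive dispatches, which is what gives the global sync point suggested above. The sketch says nothing about the relative cost of switching constant buffers versus shaders.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a constant buffer: read-only for every group
// during one dispatch, changed by the host between dispatches.
struct Constants
{
    int phase;      // which if() path of the single "shader" to take
    int iteration;  // host-side for-loop counter
};

// Stand-in for one dispatch: runs every group and only returns once all of
// them are done, modelling the ordering between consecutive dispatches.
void Dispatch(int numGroups, const std::function<void(int)>& group)
{
    std::vector<std::thread> running;
    for (int g = 0; g < numGroups; ++g)
        running.emplace_back(group, g);
    for (std::thread& t : running)
        t.join();
}

// Each iteration: phase 0 has every group write its own contribution,
// phase 1 lets every group read the combined total. Returns each group's
// accumulated result.
std::vector<int> RunIterations(int numGroups, int iterations)
{
    assert(numGroups > 0);
    std::vector<int> partial(numGroups, 0);   // per-group contribution (UAV analog)
    std::vector<int> acc(numGroups, 0);       // per-group result
    for (int it = 0; it < iterations; ++it)
    {
        for (int phase = 0; phase < 2; ++phase)
        {
            const Constants cb{phase, it};    // "update the cbuffer" between dispatches
            Dispatch(numGroups, [&](int groupId) {
                if (cb.phase == 0)
                {
                    partial[groupId] = groupId + cb.iteration;
                }
                else
                {
                    int total = 0;
                    for (int v : partial)     // safe: the phase 0 dispatch has finished
                        total += v;
                    acc[groupId] += total;
                }
            });
        }
    }
    return acc;
}
```

With three groups and two iterations, RunIterations(3, 2) returns {9, 9, 9}: the totals are 3 and then 6.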


There is a funny thing I found. Say there is a shader with two paths and two dispatch calls taking those paths. In the first path one writes some value to a buffer, and in the second one just reads it back. The path is selected by cbuffer variables that are switched between the calls. If we write into a RWByteAddressBuffer during the first call, the value does not survive the call: it cannot be read back and comes out as zero. I can still read it back during the first call, though, so it is not that I fail to set it. It is there, but then it gets lost. RWStructuredBuffer seems OK. Has anyone noticed this kind of behavior?

Edited by newMe


Quoting newMe: "I would say setting a shader can involve moving some code, maybe compiling it, setting some registers with constants from the code, etc."

Switching shaders in and out doesn't require recompiling each time; you only need to compile each one once, assuming you don't have to change the logic within the shaders on each iteration.
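A hypothetical C++ sketch of the create-once, bind-many pattern: ShaderLibrary, Create and Bind are illustrative names, not D3D11 calls, and a std::function stands in for a compiled shader object. In real D3D11 code the analogous steps would roughly be compiling the HLSL and calling ID3D11Device::CreateComputeShader once at startup, then ID3D11DeviceContext::CSSetShader before each dispatch. What the driver does internally at creation versus first use varies, so this only shows the bookkeeping, not the cost.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled shader object.
using Shader = std::function<int(int)>;

struct ShaderLibrary
{
    int compileCount = 0;
    std::map<std::string, Shader> shaders;

    // Stand-in for compile + create: done once per shader, at startup.
    void Create(const std::string& name, Shader body)
    {
        ++compileCount;
        shaders[name] = std::move(body);
    }

    // Stand-in for binding: just a lookup, no compilation.
    const Shader& Bind(const std::string& name) const { return shaders.at(name); }
};

// Alternates two shaders for `iterations` passes and returns the final value.
int RunPasses(const ShaderLibrary& lib, int iterations)
{
    assert(iterations >= 0);
    int value = 0;
    for (int i = 0; i < iterations; ++i)
    {
        value = lib.Bind("writePass")(value);
        value = lib.Bind("readPass")(value);
    }
    return value;
}
```

With "writePass" adding 1 and "readPass" doubling, RunPasses(lib, 3) returns 14, and compileCount stays at 2 however many passes run.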


I am not entirely sure about that. I don't know what a shader compiled by DirectX actually is. Maybe it cannot just be loaded into the processor as it is, and some additional processing is required.
