Compute shader thread synchronization

9 comments, last by newMe 9 years, 3 months ago

Hi. I'm trying to synchronize thread groups at some point. The main idea is to make the groups execute one at a time, in sequence, while the others just loop and wait. It looks something like this:

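g_var is a RWBuffer<uint> whose first element, g_var[0], is initially set to g_numGroups-1:

    [allow_uav_condition] do
    {
        // Spin until it is this group's turn.
        if (groupID == g_var[0])
        {
            if (groupID == 0)
                g_var[0] = g_numGroups - 1;  // last group restores the initial value
            else
                g_var[0] -= 1;               // hand the turn to the next lower group
            break;
        }
    } while (true);

    // Wait until group 0 has restored the initial value.
    [allow_uav_condition] do
    {
        if (g_var[0] == g_numGroups - 1)
            break;
    } while (true);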
So the first group to break out of the first loop is the one with the highest index; in fact, it should not stay in that loop at all. Then the others follow in turn. They get caught in the second loop and spin there until group 0 restores the initial value of g_var, and then they all go through together. I just want all the groups to contribute to the same variable, so that it can be used later by all groups; I omitted that part. I also omitted some details, such as having only the thread with index 0 in each group set g_var; I just want to convey the idea. The second loop is not a problem, but apparently threads get stuck in the first one and stay there, and the card stops responding. I just can't see where I'm making a mistake. Could anyone point out what is wrong and maybe suggest another way to do this? Thanks
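To make the intended hand-off concrete, here is a minimal CPU sketch of the same two loops in C++. It assumes each std::thread stands in for a whole thread group and a std::atomic<int> named `turn` (a hypothetical name) stands in for g_var[0]. It runs to completion on a CPU because the operating system keeps time-slicing every spinning thread; it says nothing about whether a GPU will do the same for thread groups.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// CPU analogue of the two spin loops. Each std::thread stands in for a
// whole thread group, and `turn` stands in for g_var[0].
std::vector<int> run_turn_taking(int numGroups) {
    std::atomic<int> turn{numGroups - 1};   // initially g_numGroups - 1
    std::vector<int> order;                 // which group took its turn when

    std::vector<std::thread> groups;
    for (int id = 0; id < numGroups; ++id) {
        groups.emplace_back([&turn, &order, numGroups, id] {
            // First loop: spin until it is this group's turn.
            while (turn.load() != id) std::this_thread::yield();

            // Only one group at a time gets here, and the atomic hand-off
            // orders these writes, so `order` needs no extra lock.
            order.push_back(id);

            // Group 0 restores the initial value; others hand the turn down.
            turn.store(id == 0 ? numGroups - 1 : id - 1);

            // Second loop: wait until group 0 has restored the initial value.
            while (turn.load() != numGroups - 1) std::this_thread::yield();
        });
    }
    for (auto& g : groups) g.join();
    return order;
}
```

Calling run_turn_taking(5) records the groups in the order 4, 3, 2, 1, 0.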
I know I can break the code into pieces and synchronize them with separate dispatch calls. But the code itself is inside a for loop, so that would mean changing a shader or a constant buffer several times. I thought it might be more efficient to stall the groups, as they would stall anyway at the end of the shader while waiting for the other groups to complete.

It sounds like you're trying to do something like GroupMemoryBarrierWithGroupSync in a roundabout way. Have you checked the HLSL reference to see if any of the barrier commands might do the job you're looking for?

Visit http://www.mugsgames.com

Stroids, a retro style mini-game for Windows PC. http://barryskellern.itch.io/stroids

Mugs Games on Twitter: @MugsGames and Facebook: www.facebook.com/mugsgames

Me on Twitter: @BarrySkellern

GroupMemoryBarrierWithGroupSync synchronizes threads within a group. I want to synchronize all threads across all groups.

I have 5 groups of 32 threads each. If I step g_var down by 2, so that only groups 4, 2 and 0 are in play, it works. But when g_var is reduced by 1 from 4, group 3 never steps in and everything loops endlessly. It is kind of weird.

You can't reliably "synchronize all threads" because you have no guarantee that the hardware will actually be able to execute all of your thread groups simultaneously. GPUs have a finite number of hardware units, which corresponds to a finite number of threads being in flight at any given time. So if you launch enough thread groups to saturate the GPU and then each thread waits for all threads to hit the sync point, you'll get an instant deadlock.
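A toy model of that argument, in C++. It assumes a fixed number of hardware slots, groups launched in increasing index order, and spinning groups that never give up their slot; none of these is a D3D guarantee, and the function name is hypothetical. Under those assumptions the highest-index group, which must go first, only gets a slot when there are at least as many slots as groups, so any smaller GPU deadlocks. The real shader is stricter still, since each group also holds its slot in the second loop until group 0 is done.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy scheduler model (assumptions, not D3D guarantees):
//  * the GPU has `numSlots` slots, each holding one resident thread group;
//  * groups are launched in increasing index order 0, 1, 2, ...;
//  * a spinning group never gives up its slot.
// Group `id` can only finish once turn == id; turn starts at numGroups - 1.
// Returns true if every group finishes, false if the resident groups deadlock.
bool all_groups_finish(int numGroups, int numSlots) {
    int turn = numGroups - 1;
    int nextToLaunch = 0;
    std::vector<int> resident;

    for (int finished = 0; finished < numGroups; ++finished) {
        // Fill any free slots with the next groups in launch order.
        while (static_cast<int>(resident.size()) < numSlots && nextToLaunch < numGroups) {
            resident.push_back(nextToLaunch++);
        }
        // Look for the resident group whose turn it is.
        bool found = false;
        for (std::size_t i = 0; i < resident.size(); ++i) {
            if (resident[i] == turn) {
                // It finishes and frees its slot.
                resident.erase(resident.begin() + static_cast<std::ptrdiff_t>(i));
                --turn;
                found = true;
                break;
            }
        }
        if (!found) return false;  // every resident group spins forever
    }
    return true;
}
```

With these assumptions, all_groups_finish(5, 5) returns true and all_groups_finish(5, 4) returns false.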

I would really suggest just splitting into multiple dispatches if you want a global sync point.
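Here is a rough CPU analogue of that advice in C++. It assumes a hypothetical dispatch() helper that starts one std::thread per group and joins them all before returning; the join plays the part of the boundary between two Dispatch calls. The first pass accumulates every group's contribution into one shared value with std::atomic::fetch_add (standing in for HLSL's InterlockedAdd), and the second pass lets every group read the finished result, which is the "contribute, then use it everywhere" pattern from the original question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical CPU stand-in for a Dispatch call: one std::thread per group,
// all joined before the function returns. That join is the "global sync
// point" a dispatch boundary gives you.
template <typename Kernel>
void dispatch(int numGroups, Kernel kernel) {
    std::vector<std::thread> groups;
    for (int g = 0; g < numGroups; ++g) groups.emplace_back(kernel, g);
    for (auto& t : groups) t.join();
}

// Pass 1: every group adds its contribution to one shared value.
// Pass 2: every group reads the finished value.
std::vector<int> two_dispatches(int numGroups) {
    std::atomic<int> shared{0};
    std::vector<int> seen(numGroups, -1);

    dispatch(numGroups, [&shared](int g) { shared.fetch_add(g + 1); });
    dispatch(numGroups, [&shared, &seen](int g) { seen[g] = shared.load(); });
    return seen;
}
```

two_dispatches(5) gives every group the value 1 + 2 + 3 + 4 + 5 = 15.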

OK, if I split it into several dispatch calls, which would be better: keeping the shader in one piece, divided into several if() sections, and choosing the right section with cbuffer variables that I change between dispatches, or splitting it into several separate shaders?

I think splitting would make more sense rather than branching.

Or you can use #defines to effectively split the shader; either way, you end up with different shaders to set.
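A C++ analogy for the #define approach, offered as a sketch: one source is compiled into two variants by a template parameter (PASS, a hypothetical name) and `if constexpr`, so each variant contains only its own path and no runtime branch. In HLSL the equivalent would be compiling the same file twice with different macros, for example through the pDefines parameter of D3DCompile, and binding the resulting shaders before the corresponding dispatches. Requires C++17.

```cpp
#include <cassert>
#include <vector>

// One source, several compiled variants: the template parameter PASS plays
// the role of a preprocessor define, and `if constexpr` discards the unused
// branch at compile time.
template <int PASS>
void kernel(std::vector<int>& data, int i) {
    if constexpr (PASS == 0) {
        data[i] = i * i;   // variant 0: write initial values
    } else {
        data[i] += 1;      // variant 1: update what pass 0 wrote
    }
}

std::vector<int> run_two_variants(int n) {
    std::vector<int> data(n, 0);
    for (int i = 0; i < n; ++i) kernel<0>(data, i);  // "set shader 0, dispatch"
    for (int i = 0; i < n; ++i) kernel<1>(data, i);  // "set shader 1, dispatch"
    return data;
}
```

run_two_variants(4) returns {1, 2, 5, 10}.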

If I apply some very basic logic, since I don't have any knowledge of this, I would say that setting a shader can involve moving some code, maybe compiling it, setting some registers with constants from the code, etc., whereas setting a constant buffer just sets some registers. All threads take the same paths, based on the constants. From this point of view I would prefer to change a constant buffer, unless I'm missing some aspect I don't know about.

There is a funny thing I found. Say there is a shader with two paths and two dispatch calls, one taking each path. In the first path I write some value to a buffer, and in the second I just read it back. The path is chosen by cbuffer variables that are switched between the calls. If I write into a RWByteAddressBuffer in the first call, the value does not survive the call: I cannot read it back, it is zeroed out. I can still read it back during the first call, so it is not that I don't set it. It is there, but then it gets lost. RWStructuredBuffer seems OK. Has anyone noticed this kind of behavior?
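The thread does not resolve this. One pitfall worth ruling out, offered as a guess rather than a diagnosis: Load and Store on a (RW)ByteAddressBuffer take byte addresses that must be multiples of 4, whereas a RWStructuredBuffer is indexed by element. If one path computes byte offsets and the other uses element indices, the reader looks at a different location and sees the zero-initialized data. This toy C++ model (RawBuffer and element_address are hypothetical names) shows the mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy CPU model of an HLSL (RW)ByteAddressBuffer: Load/Store take BYTE
// addresses, which must be multiples of 4 (one uint = 4 bytes).
class RawBuffer {
public:
    explicit RawBuffer(std::size_t numUints) : bytes_(numUints * 4, 0) {}

    void Store(std::uint32_t address, std::uint32_t value) {
        assert(address % 4 == 0 && address + 4 <= bytes_.size());
        std::memcpy(&bytes_[address], &value, sizeof value);
    }

    std::uint32_t Load(std::uint32_t address) const {
        assert(address % 4 == 0 && address + 4 <= bytes_.size());
        std::uint32_t value;
        std::memcpy(&value, &bytes_[address], sizeof value);
        return value;
    }

private:
    std::vector<std::uint8_t> bytes_;
};

// Byte address of element `index` in a buffer viewed as an array of uints.
std::uint32_t element_address(std::uint32_t index) { return index * 4; }
```

After buf.Store(element_address(4), 123), buf.Load(element_address(4)) returns 123, but buf.Load(4) reads element 1 and returns 0.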


Switching shaders in and out doesn't require recompiling each time; you only need to compile each one once, assuming you don't have to change the logic within the shaders on each iteration.
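To illustrate the compile-once point, here is a C++ sketch in which a hypothetical compile() function stands in for the expensive step (in D3D11, roughly D3DCompile followed by ID3D11Device::CreateComputeShader), and calling a stored object stands in for binding with ID3D11DeviceContext::CSSetShader and dispatching. The compile count stays at two however many iterations the loop runs.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a compiled compute shader object.
using CompiledShader = std::function<int(int)>;

// Counts how many times the expensive compile step runs.
int g_compileCount = 0;

CompiledShader compile(const std::string& name) {
    ++g_compileCount;
    if (name == "passA") return [](int x) { return x + 1; };
    return [](int x) { return x * 2; };
}

int run_iterations(int iterations) {
    // Compile both variants once, before the loop.
    const CompiledShader passA = compile("passA");
    const CompiledShader passB = compile("passB");

    int value = 0;
    for (int i = 0; i < iterations; ++i) {
        value = passA(value);  // "CSSetShader(A); Dispatch(...)"
        value = passB(value);  // "CSSetShader(B); Dispatch(...)"
    }
    return value;
}
```

run_iterations(3) returns 14 and leaves g_compileCount at 2.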

Justin Stenning | Blog | Book - Direct3D Rendering Cookbook (using C# and SharpDX)

Projects: Direct3D Hook, EasyHook, Shared Memory (IPC), SharpDisasm (x86/64 disassembler in C#)

@spazzarama

 

This topic is closed to new replies.
