Compute shader:
RWStructuredBuffer<float> DataA;
RWStructuredBuffer<float> DataB;
int NumItems;

[numthreads(1,2,1)]
void Method(uint3 i_DispatchThreadID : SV_DispatchThreadID)
{
    int Index;
    Index = i_DispatchThreadID.y;

    DataA[Index] = 1.0;

    AllMemoryBarrier(); // Wait for adjacent values of DataA to be set.

    [branch] if (Index != NumItems - 1)
        DataB[Index] = DataA[Index + 1];
}
Called by:
Dispatch(1,ceil(NumItems/ThreadGroupSize.y),1);
Before the code is called, DataA contains all 0.0s.
The code writes 1.0 to each element of DataA (one element per thread), waits for the writes to complete, then sets each element of DataB (except the last) to the next element of DataA.
With 8 items, 2 threads per group, and 4 groups, the results are:
GPU:
DataA={1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0};
DataB={1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0}; // Correct.
Reference device:
DataA={1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0};
DataB={1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0}; // Wrong!
With 8 items, 8 threads per group, and 1 group, the results are:
GPU:
DataA={1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0};
DataB={1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0}; // Correct.
Reference device:
DataA={1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0};
DataB={1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0}; // Wrong!
Any ideas?
JB.
[Edited by - JB2009 on November 9, 2010 6:07:57 AM]