DX11 Data processing (queue?) in compute shader

I have a simple traversal algorithm that I want to implement in a compute shader. It looks something like this:

while (!isEmpty(itemList))
{
    a = itemList.Pop();
    results = Process(a);
    for each r in results:
        itemList.Push(r);   // results become new work items
}

My initial thought was to do it with multiple passes (Dispatch calls), each one creating a new list of items, and to swap the lists between dispatches. The problem is that there is no way to know when you're done without a readback.
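To make the multi-pass idea concrete, here is a rough CPU-side C++ model of it, not actual shader code. The names `appendChildren`, `traverseInPasses` and `PassResult` are made up for illustration, and `appendChildren` is a stand-in for Process() that just walks an implicit binary tree. Each iteration of the outer loop plays the role of one Dispatch, and the `current.empty()` check is exactly the part that would need a readback on the GPU.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct PassResult {
    uint32_t processed;  // total items handled across all passes
    uint32_t passes;     // number of simulated Dispatch calls
};

// Hypothetical stand-in for Process(): node n yields children 2n+1 and 2n+2.
void appendChildren(uint32_t node, uint32_t nodeCount, std::vector<uint32_t>& out) {
    uint32_t left = 2 * node + 1;
    uint32_t right = 2 * node + 2;
    if (left < nodeCount) out.push_back(left);
    if (right < nodeCount) out.push_back(right);
}

PassResult traverseInPasses(uint32_t nodeCount) {
    assert(nodeCount > 0);  // the root item (node 0) must exist
    std::vector<uint32_t> current = {0u};  // list consumed by this pass
    std::vector<uint32_t> next;            // list produced by this pass
    PassResult result = {0, 0};

    // On the GPU this emptiness test is what requires the readback.
    while (!current.empty()) {
        next.clear();
        // One simulated Dispatch: conceptually one thread per item.
        for (uint32_t node : current) {
            ++result.processed;
            appendChildren(node, nodeCount, next);
        }
        std::swap(current, next);  // ping-pong the two lists
        ++result.passes;
    }
    return result;
}
```

For a 15-node tree this takes one pass per tree level, so the number of dispatches (and readbacks) grows with the depth of the traversal rather than with the number of items.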

But then I thought I could possibly use a RWStructuredBuffer as a circular buffer to implement a processing queue, with atomic operations that increment the read and write positions in the buffer. This way I could do it with a single large dispatch.
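Roughly what I have in mind for the queue, again as a CPU-side C++ sketch rather than HLSL. Here `std::atomic` stands in for the Interlocked* intrinsics, the `slots` vector stands in for the RWStructuredBuffer, and `WorkQueue`, `expandNode` and `traverseWithQueue` are hypothetical names. The model runs on a single thread and assumes the buffer never overflows. A real concurrent version would also need some way to tell that a reserved slot has actually been written before another thread reads it, which this sketch does not address.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Circular work queue: positions only ever increase and are wrapped with a mask.
struct WorkQueue {
    std::vector<uint32_t> slots;        // stands in for the RWStructuredBuffer
    std::atomic<uint32_t> readPos{0};   // next slot to consume
    std::atomic<uint32_t> writePos{0};  // next slot to fill

    explicit WorkQueue(uint32_t capacity) : slots(capacity) {
        // Power-of-two capacity so wrapping is a simple bit mask.
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    // Reserve a slot with an atomic increment, then write the item into it.
    // Assumption: the caller never has more than `capacity` items pending.
    void push(uint32_t item) {
        uint32_t index = writePos.fetch_add(1);
        slots[index & (slots.size() - 1)] = item;
    }

    // Claim the next unread slot, or return false if the queue is empty.
    bool pop(uint32_t& item) {
        uint32_t index = readPos.load();
        while (index != writePos.load()) {
            // On failure, compare_exchange_weak reloads `index` and we retry.
            if (readPos.compare_exchange_weak(index, index + 1)) {
                item = slots[index & (slots.size() - 1)];
                return true;
            }
        }
        return false;
    }
};

// Hypothetical stand-in for Process(): node n yields children 2n+1 and 2n+2.
void expandNode(uint32_t node, uint32_t nodeCount, WorkQueue& queue) {
    uint32_t left = 2 * node + 1;
    uint32_t right = 2 * node + 2;
    if (left < nodeCount) queue.push(left);
    if (right < nodeCount) queue.push(right);
}

// Drains the queue starting from node 0 and returns how many items were processed.
uint32_t traverseWithQueue(uint32_t nodeCount, uint32_t capacity) {
    WorkQueue queue(capacity);
    queue.push(0);
    uint32_t processed = 0;
    uint32_t node = 0;
    while (queue.pop(node)) {
        ++processed;
        expandNode(node, nodeCount, queue);
    }
    return processed;
}
```

The part I'm least sure about is termination on the GPU: an empty queue does not mean the work is finished if other threads are still processing items and may push more.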

Has anyone done something like this? Are there any references/tips/ideas you could give me?


