A compute shader splits its work across thread groups.
You can use groupshared memory as a cache so that the threads of a group can reuse data.
The problem is that if you want to store an offset for each cell, computed from the counts of the cells, the order in which the thread groups run is important.
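As a rough illustration of the offset problem, here is a minimal CPU-side sketch in Python. It assumes that the offset of a cell is the sum of the counts of all cells before it (an exclusive prefix sum); the name `cell_offsets` and the sample counts are hypothetical and only serve the example.

```python
def cell_offsets(counts):
    """Exclusive prefix sum: offsets[i] is the sum of counts[0..i-1]."""
    offsets = []
    running = 0  # total count of all cells visited so far
    for c in counts:
        offsets.append(running)  # offset is fixed before adding this cell's count
        running += c
    return offsets


counts = [2, 0, 3, 1]
print(cell_offsets(counts))  # [0, 2, 2, 5]
```

Each offset depends on every earlier cell, so visiting the cells in a different order would give different offsets.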
My question is simple: does a compute shader dispatch its thread groups in linear order?
I mean, does it run the thread groups left to right and top to bottom?