I am trying to render an effect that requires ping-ponging back and forth between two textures. First I tried to implement it as a loop in a compute shader, with the two textures represented as RWStructuredBuffers and the output directed to a RWTexture2D. This resulted in a lot of "feedback" in the final texture, even though I added a call to DeviceMemoryBarrierWithGroupSync after each iteration of the loop.
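To make the "feedback" concrete, here is a minimal CPU-side analogy in C++. It is not my actual shader: the 1D buffer and the 3-tap box blur are hypothetical stand-ins. It compares a pass that reads from one buffer and writes to another with the same pass done in place, where a texel can read a neighbour that has already been overwritten in the same pass. As far as I understand, DeviceMemoryBarrierWithGroupSync only waits for threads in the same thread group, so on the GPU that hazard would remain for neighbours owned by other groups.

```cpp
#include <cassert>
#include <vector>

// CPU analogy only (not HLSL): a hypothetical 1D "texture" blurred with a
// 3-tap box filter. Edge texels are clamped.

// Double-buffered pass: every read comes from src, every write goes to dst.
void blurPass(const std::vector<float>& src, std::vector<float>& dst) {
    assert(&src != &dst && src.size() == dst.size());  // buffers must differ
    const int n = static_cast<int>(src.size());
    for (int i = 0; i < n; ++i) {
        const float l = src[i > 0 ? i - 1 : 0];
        const float r = src[i < n - 1 ? i + 1 : n - 1];
        dst[i] = (l + src[i] + r) / 3.0f;
    }
}

// In-place pass: texel i reads texel i-1 after it has already been
// overwritten in this same pass, so the result depends on execution order.
void blurInPlace(std::vector<float>& tex) {
    const int n = static_cast<int>(tex.size());
    for (int i = 0; i < n; ++i) {
        const float l = tex[i > 0 ? i - 1 : 0];
        const float r = tex[i < n - 1 ? i + 1 : n - 1];
        tex[i] = (l + tex[i] + r) / 3.0f;
    }
}
```

With the input {0, 0, 3, 0, 0}, the double-buffered pass gives {0, 1, 1, 1, 0}, while the in-place pass smears the value to the right and gives a different, asymmetric result.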
Then I tried setting up a loop in the C++ code that renders back and forth between two render targets with a pixel shader, using resource barrier transitions from render target to pixel shader resource and back. This seems to produce only the output of the first ping-pong pass.
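Modeled on the CPU, the structure I am aiming for is roughly the following sketch. It is not real D3D12 code: the `State` enum, the `Texture` struct, `transition`, `drawPass`, and the "+1 per pass" effect are hypothetical stand-ins for the resources, the transition barriers, and the pixel shader. The `assert` in `transition` mirrors the requirement that a transition barrier's before-state match the resource's current state.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the two resource states involved.
enum class State { RenderTarget, PixelShaderResource };

struct Texture {
    std::vector<float> texels;
    State state;
};

// Models a transition barrier: "before" must match the current state.
void transition(Texture& t, State before, State after) {
    assert(t.state == before);
    t.state = after;
}

// Models one full-screen draw: read src as a shader resource and write dst
// as the render target. The "effect" is a hypothetical +1 per pass.
void drawPass(const Texture& src, Texture& dst) {
    assert(src.state == State::PixelShaderResource);
    assert(dst.state == State::RenderTarget);
    for (std::size_t i = 0; i < dst.texels.size(); ++i)
        dst.texels[i] = src.texels[i] + 1.0f;
}

// Runs `passes` ping-pong passes; returns the texture holding the result.
// Expects a in PixelShaderResource state and b in RenderTarget state.
Texture& pingPong(Texture& a, Texture& b, int passes) {
    Texture* src = &a;
    Texture* dst = &b;
    for (int p = 0; p < passes; ++p) {
        drawPass(*src, *dst);
        // Swap roles: the texture just written becomes the next input.
        transition(*dst, State::RenderTarget, State::PixelShaderResource);
        transition(*src, State::PixelShaderResource, State::RenderTarget);
        std::swap(src, dst);  // the bound SRV and RTV must follow this swap
    }
    return *src;  // the most recently written texture
}
```

With three passes starting from zeros, the result ends up in `b` with every texel equal to 3, and `b` is left in the shader-resource state ready to be sampled.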
How should I be doing this, in principle? I like the idea of the compute shader approach, but each thread group samples data outside its own quadrant of the texture, and that seems to be what breaks it. I'm hesitant to submit each pass and then wait on a fence for completion, because that seems wasteful.
I guess my basic question is: what's the best way to update a texture and ensure the update has finished before the texture is used as a source?