I noticed some strange behaviour while playing around with GPU-controlled drawing and thought I'd post about it in case anybody happens to know what may cause it; I've been looking over and rewriting my code for two days now.
Essentially I have a pair of Append/ConsumeStructuredBuffers that are used to insert / remove billboard data. I use ID3D11DeviceContext::CopyStructureCount to copy the number of used elements in the append buffer (which acts as output for the current frame) to a buffer, which apparently works correctly. The buffer that the structure count is copied to is created with the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS and D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS misc flags and is 16 bytes in size. The element count is copied to the first 4 bytes; the remaining 12 bytes are filled with the UINT values 1, 0, 0. This buffer is later used in a call to DrawInstancedIndirect, such that the element count becomes the number of vertices sent using a pointlist topology without any vertex layout (the "vertex IDs" are used to index into the billboard output buffer described above and are passed along to a geometry shader that creates a quad per input "vertex" / point).
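For reference, the 16-byte layout described above can be sketched on the CPU side as follows. This is only a model under stated assumptions, not the actual code from my project: the struct name, `makeInitialArgs` and `copyStructureCountModel` are hypothetical helpers made up for illustration, and the last one merely imitates with a memcpy what CopyStructureCount(pDstBuffer, DstAlignedByteOffset, pSrcView) is expected to do to the destination buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical struct mirroring the four UINTs that DrawInstancedIndirect
// reads from the argument buffer, in this order.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance; // receives the append buffer's element count
    std::uint32_t instanceCount;          // 1 in the setup described above
    std::uint32_t startVertexLocation;    // 0
    std::uint32_t startInstanceLocation;  // 0
};

static_assert(sizeof(DrawInstancedIndirectArgs) == 16,
              "argument buffer is expected to be 16 bytes");
static_assert(offsetof(DrawInstancedIndirectArgs, vertexCountPerInstance) == 0,
              "the element count must land in the first 4 bytes");

// Initial contents of the buffer: a placeholder count followed by 1, 0, 0.
inline DrawInstancedIndirectArgs makeInitialArgs() {
    return DrawInstancedIndirectArgs{0u, 1u, 0u, 0u};
}

// CPU stand-in (an assumption, for illustration only) for the effect of
// CopyStructureCount on the destination: one 32-bit count is written at a
// 4-byte-aligned offset and nothing else is touched.
inline void copyStructureCountModel(unsigned char* dst, std::size_t dstSize,
                                    std::size_t dstAlignedByteOffset,
                                    std::uint32_t count) {
    assert(dstAlignedByteOffset % 4 == 0);
    assert(dstAlignedByteOffset + sizeof(count) <= dstSize);
    std::memcpy(dst + dstAlignedByteOffset, &count, sizeof(count));
}
```

With a destination offset of 0, only vertexCountPerInstance should change and the 1, 0, 0 tail should stay intact, which is what I would expect the draw call to consume.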
However, this apparently doesn't work the way I expect it to. Inspecting the drawing pass in RenderDoc indicates that the DrawInstancedIndirect call is made with the wrong vertex count (often reported as somewhere in the range 0 .. 3, while the actual number that can be mapped back to the CPU from the buffer is hundreds of times larger most of the time).
If I copy the element count into a cbuffer instead and use another compute shader to write it into the indirect buffer, RenderDoc agrees that the value in the cbuffer is what is expected, yet the draw call still gets the wrong vertex count. The weird thing is that if I write a constant value from this small compute shader instead of the element count from the cbuffer, the correct value is preserved into the draw call according to RenderDoc.
What could possibly cause this? Could it be that the written value is a dirty read because it is accessed before CopyStructureCount has finished copying? In that case, though, I would expect arbitrary values rather than something that looks clamped to a range of low numbers. Also, there is no documentation suggesting this function is asynchronous. I guess the function call itself may be, but shouldn't the submission order be preserved when the instruction list is actually carried out?
Thanks for any pointers,
Husbjörn