Seemingly incorrect buffer data used with indirect draw calls

4 comments, last by Husbj 9 years, 7 months ago

I noticed some strange behaviour playing around with GPU-controlled drawing and thought I'd make a post about it in case anybody happens to know what may cause it; I've been looking over and re-writing my code for two days now.

Essentially I have a pair of Append/ConsumeStructuredBuffers that are used to insert / remove billboard data. I use ID3D11DeviceContext::CopyStructureCount to copy the number of used elements in the append buffer (which acts as output for the current frame) to a buffer, which apparently works correctly. The buffer that the structure count is copied to is created with the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS and D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS misc flags and is 16 bytes in size. The element count is copied to the first 4 bytes; the remaining 12 bytes are filled with the UINT values 1, 0, 0. This is later used in a call to DrawInstancedIndirect, so that the element count becomes the number of vertices sent using a point list topology without any vertex layout (the "vertex IDs" are used to index into the billboard output buffer described above and are passed along to a geometry shader that creates a quad per input "vertex" / point).
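To make the 16-byte layout concrete, here is a minimal CPU-side sketch in portable C++ rather than real D3D11 code. The struct mirrors the field order of D3D11_DRAW_INSTANCED_INDIRECT_ARGS; makeArgsBuffer and copyStructureCount are hypothetical stand-ins for creating the buffer with initial data and for ID3D11DeviceContext::CopyStructureCount, which writes one 32-bit counter at the given destination byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Same field order as D3D11_DRAW_INSTANCED_INDIRECT_ARGS (four UINTs).
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance; // overwritten with the append buffer's counter
    std::uint32_t instanceCount;          // 1: a single instance of the point list
    std::uint32_t startVertexLocation;    // 0
    std::uint32_t startInstanceLocation;  // 0
};
static_assert(sizeof(DrawInstancedIndirectArgs) == 16, "args buffer must be 16 bytes");

// Hypothetical helper: the initial bytes of the args buffer, i.e. (0, 1, 0, 0).
std::vector<std::uint8_t> makeArgsBuffer() {
    const DrawInstancedIndirectArgs init{0u, 1u, 0u, 0u};
    std::vector<std::uint8_t> bytes(sizeof(init));
    std::memcpy(bytes.data(), &init, sizeof(init));
    return bytes;
}

// CPU stand-in for CopyStructureCount(argsBuffer, dstByteOffset, appendUAV):
// only the 4 bytes at dstByteOffset change, the other 12 bytes are untouched.
void copyStructureCount(std::vector<std::uint8_t>& dst,
                        std::size_t dstByteOffset,
                        std::uint32_t count) {
    assert(dstByteOffset % 4 == 0);
    assert(dstByteOffset + sizeof(count) <= dst.size());
    std::memcpy(dst.data() + dstByteOffset, &count, sizeof(count));
}
```

With a destination offset of 0, the counter lands in vertexCountPerInstance and the remaining fields keep their initial 1, 0, 0 values, which is the (x, 1, 0, 0) layout described above.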

However, this apparently doesn't work the way I expect it to. Inspecting the drawing pass in RenderDoc suggests that the DrawInstancedIndirect call is made with the wrong vertex count (it is often stated to be in the range 0 .. 3 or thereabouts, while the actual number that can be mapped back to the CPU from the buffer is hundreds of times larger most of the time).

If I copy the element count into a cbuffer instead and use another compute shader to write it into the indirect buffer, RenderDoc agrees that the value in the cbuffer is what is expected; however, the draw call still gets the wrong vertex count value. The weird thing is that if I write a constant value from this small compute shader instead of the element count from the cbuffer, the correct value is preserved into the draw call according to RenderDoc.

What could possibly cause this to happen? Could it be that the written value is a dirty read because it is accessed before the CopyStructureCount function has finished copying? In that case I would expect arbitrary values rather than the count somehow being clamped into the "low numbers" range. There is also no documentation suggesting that this function is asynchronous. I guess the function call itself may be, but shouldn't the submission order be preserved when the command list is actually carried out in that case?

Thanks for any pointers,

Husbjörn


Like any other GPU-executed command, CopyStructureCount has implicit synchronization with any commands issued afterwards. So there shouldn't be any kind of manual waiting or synchronization required, the driver is supposed to handle it.

Your approach all sounds okay, and I've successfully implemented something similar several times in the past. I'm not sure if you have a bug somewhere in your code, if there's a driver issue, or if RenderDoc is giving you incorrect information. Driver bugs can usually be diagnosed by enabling the reference rasterizer, and comparing the output. So you should do that, if you haven't already (WARNING: it's *very* slow). Reading back the value yourself on the CPU would be my other suggestion, but it sounds like you're doing that already.

I don't have too much to add to what MJP said; those are all good suggestions to look at next. I have tried RenderDoc with this kind of use case and it's worked, but by no means is that a guarantee that it's not buggy in some way :). Though if the results look equally broken on replay as in your program, that suggests that something else is wrong - inaccurate information won't help to diagnose the problem, mind you.

To clarify - the number that you see in the DrawInstancedIndirect(<X, Y>) in the event browser in RenderDoc is just retrieved the same way as you described by copying the given buffer to a staging buffer, and mapping that. The values are read out from the byte offset that you pass into that function. You can also see the structure counts in the pipeline view for any buffers bound to UAV slots, those are read out at the point before the drawcall occurs via the same mechanism of CopyStructureCount and mapping back to the CPU. I wasn't clear on whether you had any persistence between frames, but RenderDoc will capture these structure counts at the start of the frame and ensure they're correct on replay.
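As a rough illustration of that readback (a sketch only, not RenderDoc's actual code), the portable C++ below decodes the four UINT arguments from a mapped copy of the args buffer at a given byte offset and formats only the two count values, the way the <X, Y> label is described. The function name indirectDrawLabel and the raw byte pointer standing in for the mapped staging buffer are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: 'mapped' stands in for the pointer obtained by mapping
// a staging copy of the indirect args buffer for CPU reads.
std::string indirectDrawLabel(const std::uint8_t* mapped,
                              std::size_t mappedSize,
                              std::size_t argsByteOffset) {
    std::uint32_t args[4]; // vertex count, instance count, start vertex, start instance
    assert(argsByteOffset % 4 == 0);
    assert(argsByteOffset + sizeof(args) <= mappedSize);
    std::memcpy(args, mapped + argsByteOffset, sizeof(args));

    // Only the two counts are shown; the two offsets are left out of the label.
    return "DrawInstancedIndirect(<" + std::to_string(args[0]) + ", " +
           std::to_string(args[1]) + ">)";
}
```

For a buffer holding (512, 1, 0, 0) at offset 0 this yields the label DrawInstancedIndirect(<512, 1>), so a label showing a vertex count near zero means the bytes read back at that point really were near zero.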

The toy program I tested with implements a basic form of this kind of algorithm. Roughly speaking, it works by ping-ponging between two Append/ConsumeStructuredBuffers as follows (I could upload the source if that would be useful for you as a working reference):

  1. Frame N has a source buffer with some non-zero structure count containing the already alive particles
  2. Any new particles that are spawned this frame are Append()'d onto the source buffer in a compute shader
  3. An update step runs with the source buffer as a ConsumeBuffer, and another empty buffer as an AppendBuffer. It pulls from the first, does any update, and if the particle is still alive pushes it onto the other. This is launched from a DispatchIndirect(), with the current source structure count being CopyStructureCount'd into the args buffer.
  4. From here, the second buffer can be used to render from. Either bound directly similar to how you describe or run another compute pass to prepare a packed vertex buffer - either works. In either case again this is done via an Indirect() call that copies in the current structure count with CopyStructureCount
  5. Finally clear what was the 'source' buffer and swap your pointers, so that frame N+1 uses the buffer full of newly updated data as its source buffer.
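The five steps above can be modelled on the CPU to check the bookkeeping. This is only a sketch in portable C++, not the toy program itself: a std::vector stands in for each Append/ConsumeStructuredBuffer, its size() plays the role of the hidden structure count, and the Particle fields and the ParticleSystem type are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Particle {
    float age;      // seconds lived so far
    float lifetime; // particle dies once age >= lifetime
};

// Stand-in for an Append/ConsumeStructuredBuffer; size() is the structure count.
using ParticleBuffer = std::vector<Particle>;

struct ParticleSystem {
    ParticleBuffer source; // step 1: particles still alive from the previous frame
    ParticleBuffer target; // empty at the start of every frame

    // Runs one frame and returns the count an indirect draw would receive.
    std::size_t frame(const std::vector<Particle>& spawned, float dt) {
        // Step 2: newly spawned particles are appended onto the source buffer.
        source.insert(source.end(), spawned.begin(), spawned.end());

        // Step 3: the source count sizes the update pass
        // (CopyStructureCount into the DispatchIndirect args on the GPU).
        const std::size_t dispatchCount = source.size();
        for (std::size_t i = 0; i < dispatchCount; ++i) {
            Particle p = source.back(); // Consume()
            source.pop_back();
            p.age += dt;
            if (p.age < p.lifetime) {
                target.push_back(p);    // Append() survivors only
            }
        }

        // Step 4: the target count is what the indirect draw would use.
        const std::size_t drawCount = target.size();

        // Step 5: clear the old source and swap roles for frame N+1.
        source.clear();
        std::swap(source, target);
        return drawCount;
    }
};
```

On real hardware the consume order is unspecified; popping from the back here is just one valid order and does not affect the counts.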

Like any other GPU-executed command, CopyStructureCount has implicit synchronization with any commands issued afterwards. So there shouldn't be any kind of manual waiting or synchronization required, the driver is supposed to handle it.

That's what I thought.

After a third rewrite (and a full rendering shader rewrite as well) it turned out I managed to build my quads the wrong way in the geometry shader so that they weren't visible; the appropriate vertex count does indeed seem to be passed to the DrawInstancedIndirect call. However, RenderDoc is still reporting the call as having a zero argument for the vertex count, so I guess there's a quite sneaky bug in there too which threw me off (naturally I expected it to give the correct output).

Thanks for your suggestions though :)

Edit: Didn't see your ninja post baldurk.

To clarify - the number that you see in the DrawInstancedIndirect(<X, Y>) in the event browser in RenderDoc is just retrieved the same way as you described by copying the given buffer to a staging buffer, and mapping that.

That is indeed weird because now I do get the proper count read back if I map it to a staging buffer myself and the correct draw results, yet RenderDoc claims this function is called with the arguments <0, 1>. I guess it clips away the last two offset integers because in reality the buffer should contain 4 values (mine would be x, 1, 0, 0) right?

My byte offset is zero, there is nothing more in the indirect buffer than the 16 bytes representing the argument list.

I'll try to add to my currently working minimalistic program to see if it still renders correctly and whether RenderDoc will keep on showing that 0 (or something else that's unreasonable) and get back. Maybe the problems will resurface in a different way once I add some complexity back in, though I hope not.

Hah, well I imagine this is little comfort to you now, but I found a bug in RenderDoc's handling - those Draw*Indirect parameters are only read back the first time it replays, as the log is being loaded, but the initial structure counts of UAVs and other initial contents are only properly applied on the subsequent replays after the log is loaded. That caused the incorrect low numbers in the draw call that led you down the wrong path. Many apologies for that, it's fixed now so that no-one will run into it again at least! And yes I omit the offset parameters and only show the count parameters - just to keep the event list more readable.

Oh I see, glad I could help with finding that out :)

I didn't realize you were its author, I owe you many thanks for all the times that nice little program has pointed out I forgot to bind some resource or similar when my renders went out to tea.

Also many thanks for the fix!


And yes I omit the offset parameters and only show the count parameters - just to keep the event list more readable.

Sounds reasonable.
