"ID3D11DeviceContext::DrawIndexed: The Shader Resource View in slot 0 of the Vertex Shader unit has structure stride 128 while the shader expects a structure stride of 32. This mismatch is invalid if the shader actually uses the view (e.g. it is not skipped due to shader code branching)."
The app works as expected, but I want to stop the error from filling the log. I've done a lot of searching online and experimenting with the code, and I can't get rid of it.
What I've got is an append buffer that the compute shader writes to through a UAV. It's then unbound from the compute shader and bound to the vertex shader as a shader resource view.
The append buffer is created with a StructureByteStride of 128 bytes. Each struct contains 4 vertices. I need to have 4 vertices in one chunk so that they are written to the append buffer in a single Append() as two primitives and don't get interleaved with vertices from other threads during the append.
But the vertex shader reads single 32 byte vertices. Here's how the buffer is declared in the vert shader:
So I guess the mismatch is between the 128 byte stride specified when the buffer was created, and the 32 byte stride of the struct declared in the vert shader hlsl.
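To make the stride arithmetic concrete, here is a minimal C++ sketch of the two layouts. The `Vertex` fields are hypothetical (the real 32-byte layout isn't shown here), and `locateVertex` and `byteOffset` are illustrative helpers, not part of any API. It shows that a flat 32-byte vertex index and a (128-byte element, corner) pair address the same bytes, which is the mapping a shader would presumably need if it declared the four-vertex struct instead of the single vertex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-byte vertex; the actual fields are an assumption.
struct Vertex {
    float position[4];
    float color[4];
};

// Four vertices appended as one element, matching StructureByteStride = 128.
struct Quad {
    Vertex v[4];
};

static_assert(sizeof(Vertex) == 32, "vertex must be 32 bytes");
static_assert(sizeof(Quad) == 128, "quad must match the buffer stride");

// Where a flat vertex index lands when the buffer is viewed as 128-byte elements.
struct VertexLocation {
    std::uint32_t quadIndex; // which 128-byte element
    std::uint32_t corner;    // which of the 4 vertices inside it
};

constexpr VertexLocation locateVertex(std::uint32_t vertexId) {
    return VertexLocation{vertexId / 4u, vertexId % 4u};
}

// Byte offset of the same vertex, computed through the 128-byte view.
constexpr std::size_t byteOffset(std::uint32_t vertexId) {
    return static_cast<std::size_t>(vertexId / 4u) * sizeof(Quad)
         + static_cast<std::size_t>(vertexId % 4u) * sizeof(Vertex);
}
```

In HLSL terms this would correspond to indexing something like `quads[id / 4].v[id % 4]`, assuming the shader-side struct is declared with the same 128-byte size as the buffer element.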
Is there a way of having different byte strides for a resource view and its associated buffer? Any other way to avoid the error? I have to use an append buffer, because the amount of data the compute shader generates is variable.
Profiling has shown that there's a massive slowdown at one point in my game app.
In each frame, I use the compute shader to create vertices, which are written to a default-usage append buffer. The code then reads the number of vertices written by the compute shader with CopyStructureCount(). The target buffer for CopyStructureCount() is a four-byte D3D11_USAGE_STAGING buffer created with D3D11_CPU_ACCESS_READ. My app then calls Map() -> memcpy() -> Unmap(). This last step stalls the CPU for 4 ms and the GPU for 1 ms.
Without the call to the staging buffer's map/unmap, other dx calls and the app generally seem to take the right amount of time.
It's possible for me to calculate from the game data how many verts should be written, and therefore not call CopyStructureCount(). But it's a huge headache, involving tracking lots of data that I otherwise wouldn't need to.
The length of the pause is directly related to the length of the compute shader dispatch: the more vertices there are to create, the longer the pause. It seems likely that the CPU is waiting for the dispatch to finish.
Now, I know that with some dx calls the cpu is forced to wait for the gpu, because the gpu is already using that resource. But why does the GPU pause too? And surely double buffering won't help? Because the *same* frame needs to know how many primitives to write in the soon-to-follow draw() call.
Any other suggestions? I'm sort of guessing here, but could I swap the order of each frame? Maybe:
- <Frame starts>
- Get the struct count from last frame
- Draw the verts
- Generate the next frame's verts
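The reordering in the list above can be modeled in a few lines of C++. This is only a sketch of the ordering, with no Direct3D calls; `FramePipeline`, `pendingCount`, and `runFrame` are hypothetical names. It shows the consequence of the swap: each frame draws the count produced by the previous frame's generate step, so the first frame draws nothing.

```cpp
#include <cassert>
#include <cstdint>

// Models only the per-frame ordering; the GPU work is represented by plain numbers.
struct FramePipeline {
    // Count written by the previous frame's generate step (0 before the first frame).
    std::uint32_t pendingCount = 0;

    // Runs one frame in the reordered sequence and returns how many vertices it draws.
    std::uint32_t runFrame(std::uint32_t vertsGeneratedThisFrame) {
        // 1. Get the struct count from last frame.
        const std::uint32_t drawCount = pendingCount;

        // 2. Draw the verts (here: just report the count that would be drawn).

        // 3. Generate the next frame's verts; their count is read next frame.
        pendingCount = vertsGeneratedThisFrame;

        return drawCount;
    }
};
```

Whether this actually removes the stall would depend on the compute work having finished by the time the count is mapped on the following frame, which this model doesn't capture.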
It's very hard to get *general* info about dx11 and the temporal relationship between the gpu and cpu, so any experienced help would be great!