Paul__

Member

  • Content count: 8
  • Joined
  • Last visited

Community Reputation: 117 Neutral

About Paul__

  • Rank: Newbie
  1. Ah, thanks, that's how I thought I'd have to access the object.
  2. Meaning that the vertex shader will still report a 32-byte stride to DX? I assumed the HLSL compiler looked at which struct was used with the StructuredBuffer. Oh well. Perhaps I'll live with the warning, or work out how to use a raw buffer with a counter. Thanks again!
  3. Hi MJP, thanks for your reply. Dash it! I was hoping there was some way around it. I tried copying the buffer to another one some weeks ago, but that was just too slow (the buffer is very large). I might give the raw buffer + atomic increment thing a go (see the sketch below), or change the vertex shader to accept the 128-byte struct and read from within that struct the one vert it needs. Thanks for the advice.
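     A minimal sketch of the raw buffer + atomic increment idea mentioned above, in HLSL, assuming a hypothetical 32-byte cVertIn layout (pos/normal/uv) and placeholder names; a raw buffer has no declared structure stride, so the 128-vs-32 mismatch warning wouldn't apply (the C++ side would need the buffers created with D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS):

        // ---- Compute-shader side: reserve 128 bytes (4 verts) with one atomic add ----
        RWByteAddressBuffer gVertsOut : register(u0);   // raw UAV, no structure stride
        RWByteAddressBuffer gCounter  : register(u1);   // 4-byte byte-offset counter

        struct cVertIn { float3 pos; float3 normal; float2 uv; };   // assumed 32 bytes

        void StoreVert(uint byteOffset, cVertIn v)
        {
            gVertsOut.Store4(byteOffset,      asuint(float4(v.pos, v.normal.x)));
            gVertsOut.Store4(byteOffset + 16, asuint(float4(v.normal.yz, v.uv)));
        }

        void EmitChunk(cVertIn v0, cVertIn v1, cVertIn v2, cVertIn v3)
        {
            uint base;
            gCounter.InterlockedAdd(0, 128, base);   // reserve one 128-byte chunk
            StoreVert(base,      v0);
            StoreVert(base + 32, v1);
            StoreVert(base + 64, v2);
            StoreVert(base + 96, v3);
        }

        // ---- Vertex-shader side: read one 32-byte vertex straight back out ----
        ByteAddressBuffer gVertsIn : register(t0);

        cVertIn LoadVert(uint vid)
        {
            float4 a = asfloat(gVertsIn.Load4(vid * 32));
            float4 b = asfloat(gVertsIn.Load4(vid * 32 + 16));
            cVertIn v;
            v.pos = a.xyz;  v.normal = float3(a.w, b.xy);  v.uv = b.zw;
            return v;
        }

     The trade-off is that the counter becomes a plain byte offset rather than an append-buffer counter, so CopyStructureCount() no longer applies; the count lives in gCounter instead.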
  4. Hello all, I'm getting this error from DX: "ID3D11DeviceContext::DrawIndexed: The Shader Resource View in slot 0 of the Vertex Shader unit has structure stride 128 while the shader expects a structure stride of 32. This mismatch is invalid if the shader actually uses the view (e.g. it is not skipped due to shader code branching)." The app works as expected, but I want to avoid the error filling the log. I've done lots of searching on the net and experimenting with code and can't get rid of it.

     What I've got is an append buffer that the compute shader writes to through a UAV. It's then unbound from the compute shader and bound to the vertex shader as a shader resource view. The append buffer is made with a StructureByteStride of 128 bytes; this one struct contains 4 vertices. I need 4 vertices in one chunk so that when vertices are written to the append buffer they go in as a single append(), as two primitives, and aren't mixed up in the append process. But the vertex shader reads single 32-byte vertices. Here's how the buffer is declared in the vert shader: StructuredBuffer<cVertIn> vertsIn

     So I guess the mismatch is between the 128-byte stride specified when the buffer was created and the 32-byte stride of the struct declared in the vert shader HLSL. Is there a way of having different byte strides for a resource view and its associated buffer? Any other way to avoid the error? I have to use an append buffer, because the amount of data the compute shader generates is variable. Paul
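     For reference, a hedged sketch of the workaround from the replies above (declare the full 128-byte chunk in the vertex shader and pick one 32-byte vertex out of it), with an assumed cVertIn layout and placeholder names; the HLSL declaration and the buffer then agree on a 128-byte stride, so the warning should go away:

        struct cVertIn                      // 32 bytes (assumed layout)
        {
            float3 pos;
            float3 normal;
            float2 uv;
        };

        struct cVertChunk                   // 128 bytes = 4 vertices, matching the
        {                                   // buffer's StructureByteStride of 128
            cVertIn verts[4];
        };

        StructuredBuffer<cVertChunk> vertsIn : register(t0);

        struct VSOut { float4 posH : SV_Position; float2 uv : TEXCOORD0; };

        VSOut main(uint vid : SV_VertexID)
        {
            cVertIn v = vertsIn[vid / 4].verts[vid % 4];   // one vert per 128-byte chunk
            VSOut o;
            o.posH = float4(v.pos, 1.0f);   // placeholder; the real shader transforms here
            o.uv   = v.uv;
            return o;
        }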
  5. Thanks for all your replies -- a great help. MJP: thanks for clarifying about the app reading GPU resources and the effect it has. I guess this means programmers avoid reading the GPU from the app if possible, because such an app can't have the CPU working many frames ahead of the GPU. It kind of locks the CPU and GPU to each other every single frame, so they can't operate independently. Also, with the driver using buffer renaming, I guess there's also no point in an app multi-buffering its dynamic buffers, because it's already done for it?

     About the primitive count and why it's important: in my app, the compute shader generates a variable number of primitives. Variable, because it's creating water tiles and each chunk of terrain has a variable number of water tiles. On top of that, the number of water tiles for each chunk changes throughout the game, based on water physics and other factors. So regardless of whether I use DrawIndirect or Draw, I think I still need to know the number of water tiles in order to render them, either by reading back how many tiles the compute shader made, or by the app keeping track of each chunk's water tile count and updating those counts when the water behaviour changes. Keeping track is difficult, because terrain data is duplicated in video RAM and is updated from the main RAM version only when there's a change. But I can and probably will maintain such a tile count, even though it'll be a bit of a pain. Anyway, I thought I'd say why reading the GPU would simplify the code so much. But I'm now persuaded it's probably not a good idea!
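     Since DrawIndirect comes up here: a hedged C++/D3D11 sketch of feeding the append buffer's hidden counter straight into an indirect-args buffer with CopyStructureCount() and drawing with DrawInstancedIndirect(), so the count never has to be read by the CPU at all. Names are placeholders, and it assumes one appended structure per vertex; if each structure holds four vertices (the 128-byte chunks above), the copied value is a chunk count and would need either a tiny compute pass to scale it, or to be used as the InstanceCount field (byte offset 4) with four vertices drawn per instance.

        #include <d3d11.h>

        ID3D11Buffer* CreateDrawArgsBuffer(ID3D11Device* dev)
        {
            // Args layout for DrawInstancedIndirect:
            // { VertexCountPerInstance, InstanceCount, StartVertex, StartInstance }
            UINT initialArgs[4] = { 0, 1, 0, 0 };

            D3D11_BUFFER_DESC desc = {};
            desc.ByteWidth = sizeof(initialArgs);
            desc.Usage     = D3D11_USAGE_DEFAULT;
            desc.MiscFlags = D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS;

            D3D11_SUBRESOURCE_DATA init = { initialArgs, 0, 0 };
            ID3D11Buffer* argsBuffer = nullptr;
            dev->CreateBuffer(&desc, &init, &argsBuffer);
            return argsBuffer;
        }

        void DrawFromAppendBuffer(ID3D11DeviceContext* ctx,
                                  ID3D11Buffer* argsBuffer,
                                  ID3D11UnorderedAccessView* appendUAV)
        {
            // Copy the append counter into VertexCountPerInstance (byte offset 0);
            // the count stays on the GPU, so there is no CPU/GPU sync point here.
            ctx->CopyStructureCount(argsBuffer, 0, appendUAV);

            // ... bind the SRV, shaders and topology as usual ...
            ctx->DrawInstancedIndirect(argsBuffer, 0);
        }

     This sidesteps the CPU needing the tile count for rendering, though the app would still need its own count for anything gameplay-related.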
  6. Okay, thanks Nik02, I think I understand the GPU/CPU relationship a bit better now.
  7. Thanks for your answer. I'm not sure I can really reorganise the way a frame is structured, which means I might have to go the hard way and maintain counts of all the geometry, rather than read the count from the append buffer. Damn! So just to clarify: does an app *reading* a GPU buffer with Map()/Unmap() *always* cause the CPU to wait for the GPU? Compared to an app *writing* to a dynamic buffer, which doesn't always cause the CPU to wait (I guess because under the hood DX maintains multiple buffers for dynamic writes). Also, when you say that the CPU "sits around waiting for the GPU to execute all pending commands", does that truly mean that all DX commands queued up for that frame have to be executed before the buffer can be read, or only the commands involving the particular append buffer being read? I'm using DX queries to time the GPU. I could well have made a mistake though! Thanks again. Paul
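     One small detail related to the "will a read always stall" question: Map() on a staging resource accepts D3D11_MAP_FLAG_DO_NOT_WAIT, which makes the call return DXGI_ERROR_WAS_STILL_DRAWING instead of blocking when the GPU hasn't produced the data yet, so an app can poll rather than wait. A minimal hedged sketch with placeholder names:

        #include <d3d11.h>

        // Try to read the 4-byte counter without blocking; returns false if the
        // GPU hasn't finished writing it, so the caller can just try again later.
        bool TryReadCount(ID3D11DeviceContext* ctx, ID3D11Buffer* staging, UINT* outCount)
        {
            D3D11_MAPPED_SUBRESOURCE mapped = {};
            HRESULT hr = ctx->Map(staging, 0, D3D11_MAP_READ,
                                  D3D11_MAP_FLAG_DO_NOT_WAIT, &mapped);
            if (hr == DXGI_ERROR_WAS_STILL_DRAWING)
                return false;                       // not ready yet, no stall
            if (FAILED(hr))
                return false;                       // some other failure

            *outCount = *static_cast<UINT*>(mapped.pData);
            ctx->Unmap(staging, 0);
            return true;
        }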
  8. Hey all, profiling has shown that there's a massive slowdown at one point in my game app. In each frame, I use the compute shader to create vertices, which are written to a default-usage append buffer. Then the code reads the number of vertices written by the compute shader with CopyStructureCount(). The target buffer for CopyStructureCount() is a D3D11_USAGE_STAGING buffer which is four bytes long, created with D3D11_CPU_ACCESS_READ. Then my app calls Map() -> memcpy() -> Unmap(). This last step causes the CPU to stop for 4 ms and the GPU to stop for 1 ms. Without the call to the staging buffer's Map()/Unmap(), the other DX calls and the app generally seem to take the right amount of time.

     It's possible for me to calculate from the game data how many verts should be written, and therefore not call CopyStructureCount(). But it's a huge headache, involving tracking lots of data that I otherwise wouldn't need to. The length of the pause is directly related to the length of the compute shader call: more vertices to create, longer pause. It seems likely the CPU is waiting for it to finish. Now, I know that with some DX calls the CPU is forced to wait for the GPU, because the GPU is already using that resource. But why does the GPU pause too? And surely double buffering won't help, because the *same* frame needs to know how many primitives to draw in the soon-to-follow Draw() call?

     Any other suggestions? I'm sort of guessing here, but could I swap the order of each frame? Maybe (rough sketch below):
     - <Frame starts>
     - Get the struct count from last frame
     - Draw the verts
     - Generate the next frame's verts
     - Present

     It's very hard to get *general* info about DX11 and the temporal relationship between the GPU and CPU, so any experienced help would be great!
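     For what it's worth, a hedged sketch of the "get the struct count from last frame" idea from the post above: keep a small ring of 4-byte staging buffers, copy this frame's counter into one slot, and Map() the slot written a frame or two ago, which the GPU has usually long since finished. Names and the latency of 2 are assumptions, and the caller has to tolerate a stale (or initially zero) count:

        #include <d3d11.h>

        static const UINT kLatency = 2;

        ID3D11Buffer* gCountStaging[kLatency] = {};   // 4-byte D3D11_USAGE_STAGING
                                                      // buffers with D3D11_CPU_ACCESS_READ
        UINT gFrame = 0;

        UINT ReadCountWithLatency(ID3D11DeviceContext* ctx,
                                  ID3D11UnorderedAccessView* appendUAV)
        {
            // Queue this frame's counter copy into one slot of the ring...
            ctx->CopyStructureCount(gCountStaging[gFrame % kLatency], 0, appendUAV);

            // ...and read back the slot written kLatency - 1 frames ago.
            UINT readSlot = (gFrame + 1) % kLatency;
            UINT count = 0;

            D3D11_MAPPED_SUBRESOURCE mapped = {};
            if (SUCCEEDED(ctx->Map(gCountStaging[readSlot], 0, D3D11_MAP_READ, 0, &mapped)))
            {
                count = *static_cast<UINT*>(mapped.pData);
                ctx->Unmap(gCountStaging[readSlot], 0);
            }

            ++gFrame;
            return count;   // count from a previous frame; pairs with drawing that
                            // frame's vertices, as in the reordered frame above
        }

     Whether a frame-old count is acceptable depends on how the draw consumes it; if the same frame's exact count is required, the indirect-draw route sketched earlier avoids the read-back entirely.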