D3DLOCK_NOOVERWRITE Erratic Flashing

3 comments, last by Hodgman 7 years, 9 months ago

Hi everyone,

I've been working on a quadtree LOD terrain system lately and up until now, things have been going pretty smoothly. You can see a couple of screenshots showing the kind of basic result I'm getting in this post.

I recently started working on a different computer and I just copied my VC++ project directly across, opened it and re-compiled it to check it would work. That's when I ran into a very strange issue that I don't really know how to diagnose. My terrain mesh, which rendered perfectly well on the previous computer, now appears as a bunch of random flashing squares instead of a complete mesh. I've made a gif from a few frame captures which shows you the kind of result I'm getting:

[Image: R2u7nAx.gif]

I've isolated the issue to my Vertex Buffer locking calls, where I specify the flags D3DLOCK_DISCARD | D3DLOCK_NOOVERWRITE. If I remove D3DLOCK_NOOVERWRITE, the issue is resolved but my frame rate drops dramatically. It is for performance reasons that I used D3DLOCK_NOOVERWRITE in the first place, so I'd like to keep using it if possible!

So, my question is, can anyone think of a reason why my code which worked perfectly fine on a different computer just yesterday is suddenly so broken? I guess it's some sort of hardware issue (I can post specs if you think it may help). Or perhaps there's a code issue which was somehow masked on the previous computer?

I'll explain how the program works. The terrain is built from square tiles which recursively subdivide into four more tiles, or merge back, depending on the camera distance each frame. Once I have created the list of tiles to render for a given frame, I copy them, one by one, into my vertex buffer and draw each one individually using D3DPT_TRIANGLESTRIP. This means that the vertex buffer is emptied and re-filled many times each frame, holding a maximum of four vertices at any one time. Here is some code which may be useful:

TerrainVertex structure (each terrain tile has an array of four of these (i.e. TerrainVertex vertices[4];) which are filled when the tile is created):


struct TerrainVertex
{
    D3DXVECTOR4 pos;
    D3DXVECTOR2 texCoords;
    D3DXVECTOR3 normal;

    TerrainVertex()
    {pos = D3DXVECTOR4(0, 0, 0, 1); texCoords = D3DXVECTOR2(0, 0); normal = D3DXVECTOR3(0, 1, 0);}
    TerrainVertex(D3DXVECTOR4 p, D3DXVECTOR2 txc, D3DXVECTOR3 n)
    {pos = p; texCoords = txc; normal = n;}
};

Vertex declaration/buffer setup:


D3DVERTEXELEMENT9 elements[] =
{
    {0, sizeof(float)*0, D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_POSITION, 0},
    {0, sizeof(float)*4, D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_TEXCOORD, 0},
    {0, sizeof(float)*6, D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT, D3DDECLUSAGE_NORMAL, 0},
    D3DDECL_END()
};
vertexDecleration = 0;    //This is a LPDIRECT3DVERTEXDECLARATION9 declared elsewhere in the program
if (FAILED(d3ddev->CreateVertexDeclaration(elements, &vertexDecleration))){return false;}
if (FAILED(d3ddev->CreateVertexBuffer(4*sizeof(TerrainVertex), D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY, 0, D3DPOOL_DEFAULT, &vertexBuffer, 0))){return false;}

Rendering loop which should render all the tiles for a given frame, one after the other:


//Note that renderQueue is just a list of tiles which is re-filled on each frame. Each element in renderQueue has four vertices as described above
for (unsigned rqc = 0; rqc < renderQueue.size(); rqc++)
{
    void* pVoid;
    if (FAILED(vertexBuffer->Lock(0, 0, (void**)&pVoid, D3DLOCK_DISCARD | D3DLOCK_NOOVERWRITE))){return false;}
    memcpy(pVoid, renderQueue[rqc]->vertices, sizeof(renderQueue[rqc]->vertices));
    if (FAILED(vertexBuffer->Unlock())){return false;}

    if (FAILED(d3ddev->SetVertexDeclaration(vertexDecleration))){return false;}
    if (FAILED(d3ddev->SetStreamSource(0, vertexBuffer, 0, sizeof(TerrainVertex)))){return false;}

    if (FAILED(d3ddev->DrawPrimitive(D3DPT_TRIANGLESTRIP, 0, 2))){return false;}
}

So, you can see that vertexBuffer is continually being re-filled with four different vertices for each tile.

Thanks for looking, and let me know if I can provide any more useful info!

Have you enabled the debug layer?

D3DLOCK_DISCARD together with D3DLOCK_NOOVERWRITE is nonsense. Either you use only discard to get a new buffer every time or you create a larger buffer, map a small part with nooverwrite and use it, map the next small part with nooverwrite and use it, map the next...

Yeah, as above, you shouldn't use discard and no-overwrite together; use one or the other.

When you're writing graphics code, you're actually writing multi-threaded code, where one "thread" is your CPU, and one "thread" is your GPU.

For the lock flags:

  • Passing 0 or D3DLOCK_READONLY is equivalent to acquiring a mutex. Both threads will synchronize (one will stall) if they're both using the resource.
  • Passing D3DLOCK_NOOVERWRITE is equivalent to doing nothing... If both threads are using the resource, you've got yourself a race condition! Using this flag requires that you implement your own synchronization, e.g. by using IDirect3DQuery9::Issue to keep track of the GPU's progress.
  • Passing D3DLOCK_DISCARD is similar to no-overwrite, but relies on the driver to implement a clever scheme for you. It is effectively equivalent to releasing your buffer and creating a new one every time you lock it (except far more efficient than that!). Internally, D3D releases the buffer's memory allocation and returns a fresh allocation as the result of the Lock call. The old allocation isn't deleted/freed immediately: if the GPU is still using it, the GPU holds a reference, so the free only happens once the GPU has finished with that data.

If you want something similar to Discard, but want to do it yourself, the typical solution is to allocate a buffer that is N*(M+1)*Size, where Size = the number of bytes you need the buffer to store, N = the number of times that you update the data per frame, and M = the number of frames that you wish to have in flight (typically 1 or 2).

Every time the user wishes to "lock" your buffer, increment the offset by Size (or wrap back around to zero when you reach the end). Lastly, you need to ensure that the GPU is only ever M frames behind the CPU. To do this, you can issue a query at the end of every frame as a kind of 'fence', and use IDirect3DQuery9::GetData to periodically see which fences the GPU has passed. If the GPU is too far behind (M frames), then you need to busy wait on GetData (using the D3DGETDATA_FLUSH flag while busy waiting to ensure the GPU is making progress) until it has caught up.

This is a typical "ring buffer" used to stream data to the GPU :)

If done right, it should be slightly faster than using the Discard flag while actually being safe and not giving you flickering bugs from a race condition :lol:
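
To make the offset bookkeeping above concrete, here is a minimal sketch of just the "increment the offset by Size, or wrap back to zero" part. RingAllocator and its member names are made up for illustration and do not come from D3D or from anyone's code in this thread. The fence/query waiting is deliberately left out, so on its own this does not make no-overwrite safe; it only shows where each lock would start. For the terrain in this thread, Size would be 4 * sizeof(TerrainVertex), which should be 4 * 36 = 144 bytes if the struct has no padding.

```cpp
#include <cstddef>

// Hypothetical helper (not a D3D type): hands out byte offsets into one large
// dynamic vertex buffer that was created with N * (M + 1) * Size bytes.
struct RingAllocator
{
    std::size_t sliceSize;   // Size: bytes written per lock
    std::size_t capacity;    // total size of the vertex buffer in bytes
    std::size_t offset;      // where the next lock will start

    RingAllocator(std::size_t size, std::size_t locksPerFrame, std::size_t framesInFlight)
        : sliceSize(size),
          capacity(locksPerFrame * (framesInFlight + 1) * size),
          offset(0) {}

    // Returns the byte offset for the next lock, then advances.
    // Wraps back to zero once another slice would no longer fit.
    std::size_t next()
    {
        const std::size_t current = offset;
        offset += sliceSize;
        if (offset + sliceSize > capacity) { offset = 0; }
        return current;
    }
};
```

In real code the returned value would presumably be passed as the OffsetToLock argument of Lock (with sliceSize as SizeToLock and D3DLOCK_NOOVERWRITE as the flag), and the draw call would use a StartVertex of offset / sizeof(TerrainVertex). Before re-using a region after a wrap, you would still need the per-frame fence check described above.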

Ah OK, thanks for the responses! Strange how it worked perfectly on my other computer! I guess a good approach might be to have a fixed-size buffer to hold, say, 100 tiles and I can loop through my tiles to add them into the buffer using D3DLOCK_NOOVERWRITE. Then when I fill up the buffer I can draw every tile in there, then lock it with D3DLOCK_DISCARD and fill it up from the bottom again. i.e. draw 100 tiles at a time, emptying the buffer for the next 100.

Yep, that's another common way to use no-overwrite without the hassle of managing the queries like in my example!
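
A rough sketch of that batching idea, assuming a vertex buffer sized for a fixed number of tiles. TileBatcher, TileSlot and LockFlag are invented names standing in for the real D3D calls and flags, so treat this as an illustration of the flag/offset selection only: the first tile written after the buffer fills up (or on first use) is locked with discard, and every other tile is appended with no-overwrite.

```cpp
// Stand-ins for D3DLOCK_DISCARD / D3DLOCK_NOOVERWRITE (hypothetical enum, not from d3d9.h).
enum class LockFlag { Discard, NoOverwrite };

// Where one tile's four vertices go, and how to lock for it.
struct TileSlot
{
    LockFlag flag;
    unsigned startVertex;   // would be used as StartVertex in DrawPrimitive
};

// Hypothetical helper: picks the lock flag and vertex offset for each tile.
class TileBatcher
{
public:
    static const unsigned kVerticesPerTile = 4;

    explicit TileBatcher(unsigned capacityInTiles)
        : capacity_(capacityInTiles), next_(0) {}

    TileSlot acquire()
    {
        // Buffer is full: start again from the bottom.
        if (next_ == capacity_) { next_ = 0; }

        // Slot 0 means the old contents are abandoned, so ask for a fresh
        // buffer (discard). Any other slot has not been handed out since the
        // last discard, so appending with no-overwrite does not touch data
        // the GPU may still be reading.
        TileSlot slot;
        slot.flag = (next_ == 0) ? LockFlag::Discard : LockFlag::NoOverwrite;
        slot.startVertex = next_ * kVerticesPerTile;
        ++next_;
        return slot;
    }

private:
    unsigned capacity_;   // number of tiles the vertex buffer can hold
    unsigned next_;       // index of the next free tile slot
};
```

In the render loop, each tile would lock only its own region (byte offset startVertex * sizeof(TerrainVertex), size 4 * sizeof(TerrainVertex)) with the chosen flag, copy its vertices in, unlock, and draw with DrawPrimitive(D3DPT_TRIANGLESTRIP, slot.startVertex, 2). The vertex buffer would need to be created with room for capacityInTiles * 4 vertices instead of 4.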
