Weird corrupt buffers when created concurrently

7 comments, last by Tispe 9 years ago

For some unknown reason, I get corrupted vertex/index/texture buffers (leading to glitchy rendering) in some rare cases when I concurrently create those resources while rendering in a different thread.

Here's a breakdown:

I have a render thread that renders some UI (e.g. a loading progress bar) while a loader thread loads data off the disk. The loader thread calls ID3D11Device Create functions with initialization data (pSysMem of D3D11_SUBRESOURCE_DATA is valid). For some hair-pulling reason, some buffers initialized this way end up corrupted at some point. I am 100% sure that the data being passed in is correct (I verify it before calling Create). When I copy the resource to a staging buffer and read it back on the CPU (after all loading has completed), the buffer contents are indeed not the data I initialized them with. The buffer objects themselves are perfectly fine (GetDesc returns the correct creation parameters).
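Roughly, the creation call on the loader thread looks like this (a simplified sketch rather than my actual code; device, vertices and vertexBytes are placeholder names):

// Loader thread: create an immutable vertex buffer with initial data.
// 'device' is the shared ID3D11Device; 'vertices'/'vertexBytes' stand in
// for data just loaded from disk and verified before this call.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = vertexBytes;
desc.Usage = D3D11_USAGE_IMMUTABLE;
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

D3D11_SUBRESOURCE_DATA init = {};
init.pSysMem = vertices;

ID3D11Buffer* vertexBuffer = nullptr;
HRESULT hr = device->CreateBuffer(&desc, &init, &vertexBuffer);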

The buffers that get corrupted are completely random. If I reload the same scene over and over, a different buffer gets corrupted each time, which points to some sort of race condition. The code never crashes, so CPU-side stacks/heaps are probably not being corrupted.

I've verified that I never access the immediate device context (owned by the render thread) from the loader thread (nor any DXGI objects). I've commented out all Map/UpdateSubresource calls to rule out accidentally writing past buffer ends. The device is created with the correct flags (i.e. the SINGLETHREADED flag is not set).
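For completeness, device creation is roughly the following (sketch only; feature levels, the swap chain and error handling are omitted):

UINT flags = 0;
#ifdef _DEBUG
flags |= D3D11_CREATE_DEVICE_DEBUG;   // debug layer reports API misuse
#endif
// D3D11_CREATE_DEVICE_SINGLETHREADED is deliberately NOT set, so the
// device's Create* methods may be called from multiple threads.
ID3D11Device* device = nullptr;
ID3D11DeviceContext* immediateContext = nullptr;
HRESULT hr = D3D11CreateDevice(
    nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, flags,
    nullptr, 0, D3D11_SDK_VERSION,
    &device, nullptr, &immediateContext);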

This problem does not occur if I load synchronously (i.e. block rendering while loading).

A workaround is to create a deferred device context, initialize the buffers through it with UpdateSubresource, and then execute the recorded command list on the main render thread once the loader completes. I lose the ability to declare the buffers IMMUTABLE, and possibly some concurrent driver cleverness in getting the data to the GPU, since the copies only really happen when the deferred command list is executed on the main render thread.
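The workaround looks roughly like this (sketch; error handling omitted and names are placeholders):

// Loader thread: the buffer must now be DEFAULT usage (not IMMUTABLE),
// because its data is supplied after creation via the deferred context.
ID3D11DeviceContext* deferred = nullptr;
device->CreateDeferredContext(0, &deferred);

D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = vertexBytes;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

ID3D11Buffer* buffer = nullptr;
device->CreateBuffer(&desc, nullptr, &buffer);

deferred->UpdateSubresource(buffer, 0, nullptr, vertices, 0, 0);

ID3D11CommandList* commandList = nullptr;
deferred->FinishCommandList(FALSE, &commandList);

// Render thread, once the loader signals completion:
immediateContext->ExecuteCommandList(commandList, FALSE);
commandList->Release();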

I've been able to reproduce this problem on different devices (GTX 970 and 680) albeit on the same system.

Has anyone encountered this problem before, or have any insights? The documentation makes clear that the ID3D11Device Create functions are supposed to be thread-safe. What could possibly corrupt buffer data on the device?

Many thanks in advance.


How are you guaranteeing that all the data is actually finished copying before using it for drawing?

What kind of thread safety do you have to separate your drawing from your updates?

You should have a queue of scenes, and only pass completed scenes to the draw thread, so that they can't be changed whilst being drawn. Completed scenes are disposed of along with any resources no longer needed.

Please let me know if this helps.

How are you guaranteeing that all the data is actually finished copying before using it for drawing?

I'm assuming the driver needs to guarantee this somehow in order to support Creates (with supplied data) asynchronously. That is, when Create is called with init data, the driver takes care of getting it to the device. If the resource is used before the driver has copied the data to the GPU, I would expect the driver to block or defer somehow, but unfortunately I don't know exactly how drivers implement this.

What kind of thread safety do you have to separate your drawing from your updates?

You should have a queue of scenes, and only pass completed scenes to the draw thread, so that they can't be changed whilst being drawn. Completed scenes are disposed of along with any resources no longer needed.

Please let me know if this helps.

I effectively poll an atomic value shared between the render and loader threads until the loader thread completes and sets the value to indicate completion. Until then, none of the data loaded by the loader is accessed by the render thread.
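The handshake is essentially this (simplified sketch with a single flag; loadingComplete is a placeholder name):

#include <atomic>

std::atomic<bool> loadingComplete{false};

// Loader thread, after the last Create* call has returned:
loadingComplete.store(true, std::memory_order_release);

// Render thread, checked each frame before touching any loaded resource:
if (loadingComplete.load(std::memory_order_acquire))
{
    // safe to start drawing with the newly created buffers/textures
}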

Thanks for the help folks, it's appreciated.

Quick update: I was unable to reproduce this on a different system. OK, a sample size of 2 is probably not going to prove it's a driver problem yet, but I really have no other explanation for corrupted data on the device.

Perhaps your synchronization method isn't quite correct? Have you tried using something heavier-weight than an atomic? It might also be interesting to add a sleep after your resource creation but before you set your atomic, to see if a time delay helps the situation.
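Something along these lines, as a rough sketch (CreateAllResources and loadingComplete are placeholder names standing in for your existing loader code):

// Experiment: give the driver extra time between the last Create* call
// and the completion signal, to see whether timing is a factor.
CreateAllResources();                                   // loader's Create* calls
Sleep(2000);                                            // arbitrary delay in ms (Win32 Sleep)
loadingComplete.store(true, std::memory_order_release); // then signal completion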

So, the system where this problem manifests has two video cards (GTX 980 + 680). When I pulled one card, I was unable to reproduce it. So I'm thinking this might be a driver issue with multiple non-SLI'd video cards being used at the same time (one monitor attached to each card for an extended desktop, but the program only actually renders on one card). Either that, or removing the second card changed the timing of things and the problem is now merely hidden from my test cases...

Anyway I hope this helps anyone encountering similar problems. I'll post again if it re-manifests.

Does your render thread also create the ID3D11Device and ID3D11DeviceContext?

