Multithreading and ID3D11DeviceContext::Map

6 comments, last by Jason Z 11 years ago

Over the last two months I have been converting my engine from 64-bit single-threaded to 32-bit multithreaded. I chose multithreading for various performance reasons. I chose 32-bit because I use Chromium Embedded Framework (CEF) to power HTML-based (and CSS, Flash, Java, etc.) user interfaces. I had originally wrapped CEF in a separate 32-bit process and used shared memory so that it could communicate with my 64-bit engine. I believe that this was the cause of false positives from my anti-virus software, so I decided that going 32-bit to use CEF directly was a justifiable option.

CEF creates multiple threads internally. One of these threads calls an OnPaint() function when it is necessary to update the user interface image data. Inside OnPaint(), I call m_pImmediateContext->Map() with D3D11_MAP_WRITE_DISCARD in order to update the user interface texture. This, obviously, is giving me problems...
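One detail worth keeping in mind when copying paint data into a mapped texture: the D3D11_MAPPED_SUBRESOURCE filled in by Map() carries a RowPitch that may be larger than the width of one row in bytes, so a single memcpy of the whole image is only safe when the two match. Below is a minimal sketch of a pitch-aware copy. `CopyRows` and its parameter names are made up for illustration and are not part of CEF or Direct3D.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies `height` rows of `rowBytes` bytes each.
// In a real OnPaint(), `dst` and `dstPitch` would come from
// D3D11_MAPPED_SUBRESOURCE::pData and ::RowPitch.
inline void CopyRows(void* dst, std::size_t dstPitch,
                     const void* src, std::size_t srcPitch,
                     std::size_t rowBytes, std::size_t height)
{
    assert(rowBytes <= dstPitch && rowBytes <= srcPitch);

    auto* d = static_cast<std::uint8_t*>(dst);
    const auto* s = static_cast<const std::uint8_t*>(src);

    // Fast path: both buffers are tightly packed, so one copy is enough.
    if (dstPitch == rowBytes && srcPitch == rowBytes) {
        std::memcpy(d, s, rowBytes * height);
        return;
    }

    // General path: step each buffer by its own pitch.
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(d + y * dstPitch, s + y * srcPitch, rowBytes);
    }
}
```

For a 32-bit BGRA paint buffer, `rowBytes` would typically be `width * 4`, with `srcPitch` equal to `rowBytes` if the source is tightly packed.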

D3D11: CORRUPTION: ID3D11DeviceContext::Map: Two threads were found to be executing functions associated with the same Device at the same time. This will cause corruption of memory. Appropriate thread synchronization needs to occur external to the Direct3D API. 66768 and 67180 are the implicated thread ids. [ MISCELLANEOUS CORRUPTION #28: CORRUPTED_MULTITHREADING ]

My first thought was to simply lock the immediate context before calling Map(). However, my user interface is rendered on its own dedicated deferred context (concurrently with all other deferred contexts). Specifically, a command list is generated (on the user interface deferred context) for rendering the user interface overlay to the screen on start up. That command list is reused for the life of the program (since the sequence of commands for rendering the user interface does not change), leaving the user interface's deferred context unused, yet available. In theory, if I can use this unused context to update the texture, I should be able to avoid having to lock the immediate context. It would require quite a bit of rewriting just to be able to test this theory, so I decided to ask beforehand to avoid doing so in vain.
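The "lock the immediate context" option could look roughly like the sketch below. `GuardedContext` and `With` are hypothetical names, and the template parameter stands in for ID3D11DeviceContext so the sketch compiles without the Direct3D headers. The approach only helps if every thread, including the render thread, reaches the context through the same guard.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper: the wrapped context can only be touched while
// the internal mutex is held.
template <typename Context>
class GuardedContext {
public:
    explicit GuardedContext(Context* ctx) : ctx_(ctx) {
        assert(ctx_ != nullptr);
    }

    // Runs `fn` with exclusive access to the context and returns its result.
    template <typename Fn>
    auto With(Fn&& fn) -> decltype(fn(std::declval<Context&>())) {
        std::lock_guard<std::mutex> lock(mutex_);
        return fn(*ctx_);
    }

private:
    std::mutex mutex_;
    Context* ctx_;
};
```

In OnPaint() the Map/memcpy/Unmap sequence would go inside one `With` call, and the render thread would wrap its own immediate-context work the same way, which is exactly the shared cost this approach carries.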

Can ID3D11DeviceContext::Map() be called on a deferred context and provide immediate access to subresource data?

I apologize for taking so long to get to the question. I just wanted to explain my intent and what led me to this question.


I managed to rewrite the code faster than I anticipated. Using the deferred context did, in fact, take care of the threading problem. However, the user interface texture does not update. I believe that the Map() and Unmap() calls made on the deferred context must be executed by the immediate context via a command list. If this is the case, I am utterly confused and here is why...


DeferredContext->Map(); // Get a pointer to subresource data
memcpy(); // Copy some new data to that location
DeferredContext->Unmap();
DeferredContext->FinishCommandList(); // Map() and Unmap() get saved to this but NOT the memcpy call
...
...
...
ImmediateContext->ExecuteCommandList(); // What happens to the memcpy call to modify the data that Map() gives access to???

At the same time, I know that calling Map() does return a usable pointer immediately. If it did not, I would get an access violation from immediately trying to update the texture data.

There is very little information in the D3D docs about how Map() and Unmap() work with deferred contexts. Any help in the right direction would be greatly appreciated.

I have somewhat figured it out...

  • It appears that when you call Map() on a deferred context, memory is allocated by the API and a pointer to that memory is what is returned (immediately) in the D3D11_MAPPED_SUBRESOURCE struct.
  • All writes are actually written to this temporary memory.
  • The Map() and Unmap() calls are recorded to the command list.
  • Executing the command list results in the actual resource data being updated with the data written to the temporary memory.
  • Releasing the command list will free this temporary memory.
  • Command list interfaces are free threaded, just like the device interface. (This means I don't have to sync the command list for creation on the interface thread and execution on the rendering thread.)
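The behavior described in the list above can be modeled with a small toy in plain C++. Nothing here is a Direct3D type: `ToyResource`, `ToyDeferredContext`, `ToyCommandList`, and `Execute` are invented stand-ins, and the model only mirrors the observations in this post, not how any driver actually implements them.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a texture or buffer.
struct ToyResource { std::vector<std::uint8_t> bytes; };

// One recorded "replace the whole resource" update.
struct ToyUpdate {
    ToyResource* target = nullptr;
    std::vector<std::uint8_t> staging;  // the temporary memory handed out by Map
};

// Owns the staging memory; destroying the list frees it.
struct ToyCommandList { std::vector<ToyUpdate> updates; };

class ToyDeferredContext {
public:
    // Like Map with write-discard: returns writable temporary memory at once.
    std::uint8_t* Map(ToyResource& r) {
        assert(!open_);
        pending_.target = &r;
        pending_.staging.assign(r.bytes.size(), 0);
        open_ = true;
        return pending_.staging.data();
    }

    // Records the update; the real resource is still untouched.
    void Unmap() {
        assert(open_);
        recorded_.updates.push_back(std::move(pending_));
        pending_ = ToyUpdate{};
        open_ = false;
    }

    // Hands the recorded updates (and their staging memory) to the caller.
    ToyCommandList FinishCommandList() {
        assert(!open_);
        return std::exchange(recorded_, ToyCommandList{});
    }

private:
    ToyUpdate pending_;
    ToyCommandList recorded_;
    bool open_ = false;
};

// Like ExecuteCommandList on the immediate context: only now does the
// staged data reach the resource.
inline void Execute(const ToyCommandList& list) {
    for (const ToyUpdate& u : list.updates) {
        u.target->bytes = u.staging;
    }
}
```

In this model the memcpy in the earlier pseudocode is not "lost": it has already landed in the staging memory by the time Unmap() is recorded, and the command list simply carries that memory along until it is executed.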

This may not be 100% correct but I am positive I am close to accurate. I have not figured out if the temporary memory is allocated on the GPU, but its location is pretty far off from all other pointer locations I am using in that section of code (i.e. 0x0b5----- returned from deferred context Map() call vs. 0x16b----- that appears everywhere else, including what the immediate context Map() call returns). I look forward to your feedback.

Thanks for sharing your results so far. From my past experience, what you have said is correct - mapping the resource on a deferred context will produce a temporary buffer that carries around your data with it, and the results are applied only when the command list is executed. I also seem to recall that you can't read data from this pointer on a deferred context - only writing.

I'm not sure what the actual issue is now - are your changes not being applied to the mapped buffer after executing the command list?

Thanks for the confirmation. I got so focused on posting what I found out that I forgot to mention that it works now. All that is left for me to do is clean up the code but before I do, I was wondering if there is a way to reuse that temporary buffer? Or is it automatically being reused by the API? It just seems terribly inefficient to have to allocate and release a buffer every frame just to update a texture.

Nope - that is all at the driver level, and never visible to the application. That is why in general they recommend that you don't do too many of the Map calls on the deferred contexts. Eventually you get into a diminishing returns scenario if you have to do lots of map calls.

I actually saw in one of the DirectX LinkedIn discussion groups that the AMD developer evangelist recommended not to use deferred contexts at all... However, I have indeed seen a performance gain in certain scenarios with deferred contexts - but it all depends on the scene and what you are doing. The biggest factor in the whole thing is how the driver is implemented and how you use the deferred contexts.

Out of curiosity, I decided to see where the CPU was spending the most time.

The user interface thread:

[Image: Untitled1.png]

The main rendering thread:

[Image: Untitled2.png]

In my opinion, letting a separate thread incur the expense of the Map/memcpy calls on a deferred context and then letting the main thread spend almost no time executing the resulting command list is a pretty nice solution. The alternative would be to lock the immediate context in order to perform the Map/memcpy/Unmap. Doing that would, in a sense, force both the user interface thread and the main thread to pay the price.
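One way to pass the finished command list from the user interface thread to the main thread is a single-slot "latest wins" mailbox, sketched below. `LatestSlot`, `Publish`, and `Take` are hypothetical names, and `std::unique_ptr<T>` stands in for whatever owning pointer would hold the real command list. Even if the command list interface itself is free threaded, the variable used to hand it across threads still needs some form of synchronization, which is all this sketch provides.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical single-slot mailbox: the producer overwrites, the consumer takes.
template <typename T>
class LatestSlot {
public:
    // Producer side: replace whatever is waiting with a newer item.
    // An item that was never taken is destroyed here.
    void Publish(std::unique_ptr<T> item) {
        std::lock_guard<std::mutex> lock(mutex_);
        slot_ = std::move(item);
    }

    // Consumer side: take the waiting item, or get nullptr if there is none.
    std::unique_ptr<T> Take() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::move(slot_);
    }

private:
    std::mutex mutex_;
    std::unique_ptr<T> slot_;
};
```

Dropping an unexecuted list is only reasonable under the assumption that each list rewrites the whole texture; if updates were partial, a queue would be the safer choice. The lock is held just long enough to swap a pointer, so neither thread waits on the other's Map/memcpy work.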

Did the AMD developer give any reasons for discouraging the use of deferred contexts?

Not really - he just listed the ways that you could use multithreading without deferred contexts. I agree with you though - I have seen clear benefits when you use them properly. I wonder if NVIDIA has the same advice, or if they are encouraging the use of deferred contexts as well...
