Hence why I've not updated anything or posted much the last couple of days. If you've not already seen it, check out the "Are you using the new version of PIX?" thread:
Quote:Might as well make the most of an opportunity to provide feedback/comments on one of the best D3D tools going [wink]
Inquiring minds want to know.
In closing... another FAQ entry. Before doing my research I thought there'd be more to include, but it seems it's actually quite a simple topic. By all means let me know if I've missed anything (and/or if you have any other comments):
D3D #20: Direct3D and multi-threading.
With the increasing presence of multi-core and multi-processor systems it is becoming more and more important to consider multi-threading options in your software. Even with traditional single-core CPUs it can be a substantial improvement to delegate resource loading/manipulation to "worker threads" (mostly due to the I/O-related stalls when loading from a disk).
The simple rule is to keep all Direct3D related work in a single thread. Some resource-related operations can be improved via multi-threading, but core rendering and pipeline configuration gain nothing from multi-threading.
More specifically, if you do need to use the API across multiple threads you must add the D3DCREATE_MULTITHREADED flag to your IDirect3D9::CreateDevice() call. Adding this flag forces the runtime to take critical sections on most API calls, adding around 100 cycles per call (source) - refer to this list for normal function call times. That's before considering any delays resulting from actual contention. By keeping all device/API access in a single thread you do not need to add this flag, even if other parts of your application are using multiple threads. Good software design should allow you to completely avoid using this flag.
Refer to the "Coding for Multiple Cores" presentation from the GDC 2006 conference (more Microsoft presentations) for more information and general best practices.
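One way to keep every device call on a single thread while still using workers is to have the workers queue up small tasks that the device thread runs once per frame. What follows is only a minimal sketch of that idea in standard C++. The names `DeviceTask`, `DeviceTaskQueue` and `runQueueDemo` are my own inventions for illustration and are not part of Direct3D; in a real application the queued tasks would wrap the actual API calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for work that must touch the device.
using DeviceTask = std::function<void()>;

// Worker threads push tasks; only the device thread drains them.
class DeviceTaskQueue {
public:
    void push(DeviceTask task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Call once per frame on the device thread. Returns the number of tasks run.
    std::size_t drain() {
        std::queue<DeviceTask> local;
        {
            // Hold the lock only long enough to grab the pending tasks.
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(local, tasks_);
        }
        std::size_t count = 0;
        while (!local.empty()) {
            assert(local.front());  // an empty std::function would throw when called
            local.front()();
            local.pop();
            ++count;
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::queue<DeviceTask> tasks_;
};

// Demo: each worker queues one task; the calling ("device") thread runs them.
// Returns how many tasks actually executed on the calling thread.
inline std::size_t runQueueDemo(int workerCount) {
    DeviceTaskQueue queue;
    const std::thread::id deviceThread = std::this_thread::get_id();
    std::size_t ranOnDeviceThread = 0;  // only ever touched by the device thread

    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) {
        workers.emplace_back([&queue, &ranOnDeviceThread, deviceThread] {
            // ... expensive, device-free work would happen here ...
            queue.push([&ranOnDeviceThread, deviceThread] {
                if (std::this_thread::get_id() == deviceThread) ++ranOnDeviceThread;
            });
        });
    }
    for (auto& worker : workers) worker.join();
    queue.drain();
    return ranOnDeviceThread;
}
```

The only lock here protects the queue itself, so the cost is paid once per queued task rather than on every API call.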
As previously mentioned, various resource related algorithms suit a multi-threaded approach. Simple examples are streaming resources from disk and decompressing or extracting resources from virtual file systems. A simple way to mitigate any I/O stalling - either from regular storage or from virtual storage - is to do the I/O loading in the worker thread and then use one of D3DX's "In Memory" functions on the main thread. For example, prefer D3DXCreateTextureFromFileInMemoryEx() over D3DXCreateTextureFromFileEx().
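As a rough illustration of the "load in a worker, create on the main thread" split, here is a sketch using only standard C++. The helper names (`readFileBytes`, `beginLoad`, `isReady`, `finishLoad`) are hypothetical, and the actual D3DX call is left as a comment because it needs a live device.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <future>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory. Touches no Direct3D state, so it is safe
// to run on a worker thread. An empty buffer signals failure in this sketch.
inline std::vector<char> readFileBytes(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return {};
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}

// Starts the read on a worker thread and returns immediately.
inline std::future<std::vector<char>> beginLoad(const std::string& path) {
    return std::async(std::launch::async, readFileBytes, path);
}

// Main thread: non-blocking check, suitable for polling once per frame.
inline bool isReady(const std::future<std::vector<char>>& pending) {
    return pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Main thread: collect the bytes. In a real application this is the point
// where the buffer would be handed to D3DXCreateTextureFromFileInMemoryEx().
inline std::vector<char> finishLoad(std::future<std::vector<char>>& pending) {
    assert(pending.valid());  // get() may only be called once per future
    return pending.get();
}
```

The main thread would call `beginLoad()` when it decides a texture is needed, poll `isReady()` each frame, and only call `finishLoad()` (and then the D3DX function) once the data has arrived.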
For textures, the "Multithreaded texture loading again..." discussion (from the DirectXDev mailing list) is worth reading. A texture created on a separate thread using a null reference device (D3DDEVTYPE_NULLREF) and the D3DPOOL_SCRATCH pool can then be loaded on the device/main thread using D3DXLoadSurfaceFromMemory(). However, comments on the same mailing list suggest that better performance can be achieved by "rolling your own" loading/copying mechanism. As with many things it's a trade-off between simple or complex code and good or better performance.
It is worth noting that the main device thread can lock a resource and hand its contents to a worker thread to perform the real work. Be sure to do a lock/copy/unlock operation on the device thread and only then pass the copied data to the worker thread; if the resource stays locked while the worker runs, the threads (and GPU) will be synchronized until the work is complete, thus gaining no multi-threading advantage! The copy is essential because the locked pointer is invalidated after an unlock operation.
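The lock/copy/unlock ordering is easy to get wrong, so here is a small sketch of the shape I have in mind. `FakeResource` is a made-up stand-in for a lockable Direct3D resource (it is not a real API type), and `sumBytes` is just a placeholder for whatever work the worker thread would really do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lockable resource such as a vertex buffer.
// Only the lock/unlock shape matters here.
class FakeResource {
public:
    explicit FakeResource(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

    const std::uint8_t* lock() {
        assert(!locked_);
        locked_ = true;
        return data_.data();
    }
    void unlock() {
        assert(locked_);
        locked_ = false;
    }
    std::size_t size() const { return data_.size(); }
    bool isLocked() const { return locked_; }

private:
    std::vector<std::uint8_t> data_;
    bool locked_ = false;
};

// Device thread: lock, copy, unlock. The pointer never leaves this function.
inline std::vector<std::uint8_t> snapshot(FakeResource& resource) {
    const std::uint8_t* locked = resource.lock();
    std::vector<std::uint8_t> copy(locked, locked + resource.size());
    resource.unlock();
    return copy;
}

// Worker thread: operates only on its own private copy of the data.
inline std::uint32_t sumBytes(std::vector<std::uint8_t> bytes) {
    return std::accumulate(bytes.begin(), bytes.end(), std::uint32_t{0});
}

// Device thread: the resource is already unlocked before the worker starts.
inline std::future<std::uint32_t> processOnWorker(FakeResource& resource) {
    std::vector<std::uint8_t> copy = snapshot(resource);
    return std::async(std::launch::async, sumBytes, std::move(copy));
}
```

The extra copy costs memory bandwidth, but it means the resource is only locked for the duration of the copy rather than for the whole of the worker's job.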
Compiling and creating shaders and effects is also a tempting candidate for a multi-threaded approach. Note that D3DXCreateEffect() creates device objects, so it should either be protected using the D3DCREATE_MULTITHREADED flag or only be run on the main device thread. To migrate this work to a worker thread, consider using ID3DXEffectCompiler (the API form of the command line fxc.exe) and passing the results to the main thread where D3DXCreateEffect() can be used. Use of D3DXCompileShader() should be safe across multiple threads as it doesn't require device interaction, and the returned byte code can be passed to the main thread to create the actual vertex/pixel shader. Refer to the "Building Effects in a Worker Thread" discussion on the DirectXDev mailing list.
However, as a general note, run-time creation of a large number of shaders is always going to be slow, so the best performance win might be to compile shaders as part of the build process.
It is worth noting that D3DX10 has a powerful multi-threaded model that should allow easy utilization of a multi-core host system. More details required?