DX11: How do you multithread in DirectX 11?

I looked around the net and the forums, but I couldn't find basic instructions on how to multithread in DirectX 11. From what I understood, you have to create the device and the context with some different flags? Is there any tutorial or post that explains how to pull it off? What are the actual benefits of multithreading in DX11? I have a deferred renderer that works fine, but from what I heard, to really make the performance acceptable, you have to implement multithreading? I also saw something like that in the Frostbite 2 technology presentation PDF.

[quote name='mrheisenberg' timestamp='1344850055' post='4968992']from what I heard, to really make the performance acceptable, you have to implement multithreading?[/quote]
How many milliseconds does your game currently use to perform all of its D3D calls on the CPU? Multi-threading your D3D calls is an optimisation, and the first step in optimizing is always to take measurements.
Keep in mind, the bulk of rendering takes place on the GPU, with the CPU only managing resources and submitting commands - so you'll only need to multi-thread your CPU code if you need to submit a LOT of commands, or if you do a lot of resource management at the same time as rendering.
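
As a rough illustration of the "measure first" advice, here is a minimal sketch of timing the CPU side of a frame with the standard library. The helper name `MeasureCpuMilliseconds` is made up for this example and is not part of D3D. It only measures how long the calling thread spends inside the callable, not how long the GPU takes to execute the submitted work.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not a D3D API): returns the wall-clock milliseconds
// the calling thread spent inside `work`. GPU execution is asynchronous,
// so this captures CPU-side submission cost only.
inline double MeasureCpuMilliseconds(const std::function<void()>& work)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    work();
    const Clock::time_point end = Clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

In a renderer you would wrap the block that issues the D3D calls for one frame, for example `double ms = MeasureCpuMilliseconds([&] { RenderScene(); });` where `RenderScene` stands for your own submission code. It is probably best to keep `Present` out of the timed region, since it can block waiting on the GPU and would inflate the number.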

It takes 7 milliseconds to render a large building with 10 large directional lights and about 11 milliseconds for 60 large directional lights. I'm not very happy about the performance right now and I'm going to optimize some more and implement instancing, but from what I understood, modern engines like Frostbite 2 have both instancing and multi-threaded rendering. The thing is, I have no idea how to implement multithreading. What changes do I have to make to my device and context creation? Currently I'm just using [b]D3D11CreateDeviceAndSwapChain(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, 0, &featureLevel, 1, D3D11_SDK_VERSION, &swapChainDesc, &swapChain, &DEVICE, NULL, &CONTEXT);[/b]

[quote name='mrheisenberg' timestamp='1344853230' post='4969003']
It takes 7 milliseconds to render a large building with 10 large directional lights and about 11 milliseconds for 60 large directional lights. I'm not very happy about the performance right now and I'm going to optimize some more and implement instancing, but from what I understood, modern engines like Frostbite 2 have both instancing and multi-threaded rendering. The thing is, I have no idea how to implement multithreading. What changes do I have to make to my device and context creation? Currently I'm just using [b]D3D11CreateDeviceAndSwapChain(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, 0, &featureLevel, 1, D3D11_SDK_VERSION, &swapChainDesc, &swapChain, &DEVICE, NULL, &CONTEXT);[/b]
[/quote]
You aren't listening to what Hodgman is saying here. Do you know how much of that time is spent queuing up draw calls on the CPU? What he's getting at is that you may actually be GPU limited; that is, your CPU is mostly farting around waiting for the GPU to do the work assigned to it. With multithreading you'd ultimately end up making the CPU fart around even more for no actual performance gain, and you in fact stand to make things worse if you handle threading poorly. Many professionals still can't get this right, although that's probably more the result of mediocre teaching than any inherent difficulty.

For what it's worth, though, [url="http://msdn.microsoft.com/en-us/library/windows/desktop/ff476082(v=vs.85).aspx"]this[/url] and [url="http://msdn.microsoft.com/en-us/library/windows/desktop/ff476385(v=vs.85).aspx"]this[/url] (and even more specifically, these [url="http://msdn.microsoft.com/en-us/library/windows/desktop/ff476423(v=vs.85).aspx"]two[/url] [url="http://msdn.microsoft.com/en-us/library/windows/desktop/ff476424(v=vs.85).aspx"]methods[/url]) should help get you started.

When/if you do determine that your single CPU thread is the bottleneck, you can create deferred contexts for each additional thread by using ID3D11Device::CreateDeferredContext.

Each thread that handles rendering tasks has its own deferred context. Once the secondary threads have done their tasks for the frame (which usually is the CPU-side heavy lifting), you play back their finished command lists on the primary context to actually submit the state changes and draw commands to the device.

The SDK has a programming guide article about this: "Immediate and Deferred Rendering".
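
The record-then-play-back flow described above can be sketched without any D3D headers. This is a portable model only, not the D3D11 API. `CommandList`, `RecordCommands` and `BuildFrame` are hypothetical names for this example, and the strings stand in for real state changes and draw calls. In actual D3D11 code, each worker would record through a context obtained from ID3D11Device::CreateDeferredContext and close its list with ID3D11DeviceContext::FinishCommandList, and the main thread would submit each list with ExecuteCommandList on the immediate context.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for ID3D11CommandList: an ordered list of recorded commands.
using CommandList = std::vector<std::string>;

// Each worker records into its own list, just as each rendering thread
// would own its own deferred context. No list is shared between threads.
void RecordCommands(std::size_t workerIndex, std::size_t drawCount, CommandList& out)
{
    for (std::size_t i = 0; i < drawCount; ++i)
        out.push_back("worker " + std::to_string(workerIndex) +
                      " draw " + std::to_string(i));
}

// Records on `workerCount` threads, waits for all of them, then plays the
// lists back in a fixed order on the calling ("primary") thread.
std::vector<std::string> BuildFrame(std::size_t workerCount, std::size_t drawsPerWorker)
{
    std::vector<CommandList> lists(workerCount);
    std::vector<std::thread> workers;
    workers.reserve(workerCount);

    for (std::size_t w = 0; w < workerCount; ++w)
        workers.emplace_back(RecordCommands, w, drawsPerWorker, std::ref(lists[w]));

    // Playback must not start until recording has finished.
    for (std::thread& t : workers)
        t.join();

    // Playback: the analogue of executing each finished command list
    // on the immediate context.
    std::vector<std::string> submitted;
    for (const CommandList& list : lists)
        submitted.insert(submitted.end(), list.begin(), list.end());
    return submitted;
}
```

Calling `BuildFrame(2, 2)` yields four commands, with everything from worker 0 ahead of everything from worker 1. The submission order is decided by the playback loop on the primary thread, not by which worker happens to finish first.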

Deferred contexts require a lot of care to get right in a multithreaded setting. For most of my applications, I don't bother. Instead, I create the device multithreaded (that is, without the D3D11_CREATE_DEVICE_SINGLETHREADED flag, so the device itself can be called from several threads) and use a second thread for resource creation and destruction. Done right, you can have resource creation running on one CPU thread while another CPU thread is busy with the previous resources. When the second thread is ready to process new data, that data is (hopefully) already available in GPU memory. Always make sure you profile. For some of my applications it is faster to create and destroy resources each frame than to map/unmap an already existing resource.
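
The overlap described here (one thread creating the next resource while another works with the previous one) can be modelled with `std::async`. This is only a sketch under assumptions. `Resource`, `CreateResource`, `ProcessResource` and `RunFrames` are hypothetical stand-ins, and a plain vector plays the role of a GPU buffer. In D3D11 the creation step would be a call such as ID3D11Device::CreateBuffer made from the background thread.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a GPU resource such as a buffer.
struct Resource
{
    std::vector<int> data;
};

// Stand-in for resource creation: four elements, each holding the frame index.
Resource CreateResource(int frame)
{
    Resource r;
    r.data.assign(4, frame);
    return r;
}

// Stand-in for the work done with a resource once it is ready.
int ProcessResource(const Resource& r)
{
    return std::accumulate(r.data.begin(), r.data.end(), 0);
}

// While the calling thread processes frame N's resource, a background task
// is already creating the resource for frame N + 1.
std::vector<int> RunFrames(int frameCount)
{
    std::vector<int> results;
    if (frameCount <= 0)
        return results;

    std::future<Resource> pending = std::async(std::launch::async, CreateResource, 0);
    for (int frame = 0; frame < frameCount; ++frame)
    {
        Resource current = pending.get();  // blocks only if creation is still running
        if (frame + 1 < frameCount)
            pending = std::async(std::launch::async, CreateResource, frame + 1);
        results.push_back(ProcessResource(current));
    }
    return results;
}
```

Whether this beats updating an existing resource in place depends on the workload, which is why the advice to profile applies here as well.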

Did you try MSDN?

[url="http://msdn.microsoft.com/en-us/library/windows/desktop/ff476891(v=vs.85).aspx"]'Introduction to Multithreading in Direct3D 11'[/url] and all the pages it references?
