DX11 [Solved] Multi-Threading SlimDX DX11 Radeon Win7 Crash


Recommended Posts

Hi, I'm writing an engine for a game in C# using SlimDX (like many other developers here too ^^). One of the features is multi-threading, which I do using the deferred contexts of the DX11 API. Each of my threads collects its calls in a deferred context, and then the resulting command list is executed on the immediate context from that same thread. Because the device is synchronized (I'm not using the single-threaded flag!), all the threads can execute their calls on the device themselves. I do not need a 3rd thread which gathers the calls and pushes them to the device (because that would be single-threaded rendering!). All is working well.

Now I have new test code for the engine, which renders in parallel on 2 threads into two Win7 windows. First I created the two windows, then I created two threads which do the DoEvents() stuff for each window. Then on each window I created a render target. Two other rendering threads render in parallel to these two render targets.

For the first few seconds it seems that all is OK, but then the application crashes. For some reason the GUI of my Win7 gets errors too: you can see that the window vertices (I mean the real GUI elements in Win7) go crazy, or the textures become stretched. If I create the device in SlimDX with the debug option, the application only says there is an internal error of an external component, and that's all. On a REF device, in 90% of the tries, the GUI of Win7 becomes unstable and crashes the device driver, which resets after a few seconds. I'm using the newest Radeon driver on my 5870.

Does someone else have this problem too? Or maybe I'm wrong in assuming that I can use two threads accessing the immediate context on the DX device simultaneously (which should be synchronized internally and executed successively). [Edited by - Pyrogame on March 8, 2010 2:43:40 AM]

You're misunderstanding how the multithreading functionality works. The point isn't to let two threads use the device simultaneously; the point is to have multiple threads building up portions of a command list so that your main rendering thread can combine the portions into one big command list that can be submitted to the GPU. You're incorrect when you say that this would be like single-threaded rendering: using a deferred context allows you to spread the overhead of building the command list over multiple threads, and the final part where you combine command lists is relatively cheap.

Ultimately your deferred contexts need to be serialized into a single command list. This is because the GPU only reads one command at a time, so you need to make sure that the command lists from your deferred contexts are combined in the order needed for the GPU to properly draw your scene.

If you want to support multiple output windows, you need to do it the old-fashioned way: create a swap chain for each window, render each view separately, and present with the corresponding swap chain.
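This combine-then-submit scheme can be sketched as a minimal Python simulation (all names and counts here are invented for illustration; real code would record into deferred contexts and submit via the immediate context): workers build their list portions in parallel, and a single thread concatenates them in a fixed order before submission.

```python
import threading

def build_commands(thread_id, n):
    # Each worker records its own portion in isolation, like a
    # deferred context recording draw calls.
    return [(thread_id, i) for i in range(n)]

def render_frame(num_workers=4, commands_per_worker=3):
    portions = [None] * num_workers

    def worker(tid):
        portions[tid] = build_commands(tid, commands_per_worker)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # One thread serializes the portions in scene order before
    # "submitting" them, like ExecuteCommandList on the immediate context.
    submitted = []
    for portion in portions:
        submitted.extend(portion)
    return submitted
```

The submission order is deterministic no matter how the workers are scheduled, which is exactly what the GPU needs.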

Thank you for your comment, MJP. I figured out (with your help) why my way didn't work.

In my opinion, the deferred contexts are only simple software lists, so it shouldn't matter how the lists are pushed to the immediate context. If you use a 3rd thread, you automatically serialize the lists. And this was my problem: the synchronized execution of the command lists. Because I render all my stuff in rendering threads, it doesn't matter that these threads wait for the (synchronized) immediate context to push their commands. In my opinion, this is faster than using a 3rd thread.

Now, instead of using a 3rd thread, I simply synchronize my immediate context like this:

for every rendering thread:
    render to the deferred context
    commandList = deferredContext.FinishCommandList(false)
    lock (immediateContext) {
        immediateContext.ExecuteCommandList(commandList, false);
    }
    commandList.Dispose()

Only the immediate context has to be synchronized. Other things, like the device itself, are already synchronized by SlimDX or DX; this was something I misunderstood in the DX API :) Now all the threads render, render, render, and when one of them becomes ready, it submits to the device. Only if two threads finish their work simultaneously does one of them have to wait for the other to execute its command list. This should be more efficient than using a third thread, which implies synchronized passing of command lists from the rendering threads to that 3rd thread, etc.
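A minimal Python sketch of that locking pattern (the class and names are stand-ins, not SlimDX types): each thread records its own list in parallel and only takes the lock for the submission itself.

```python
import threading

class ImmediateContextStub:
    """Stand-in for the single immediate context; not a SlimDX type."""
    def __init__(self):
        self._lock = threading.Lock()
        self.executed = []  # commands, in the order the GPU would see them

    def execute_command_list(self, command_list):
        # Mirrors lock (immediateContext) { ExecuteCommandList(...); }:
        # one thread at a time, and the whole list goes in atomically.
        with self._lock:
            self.executed.extend(command_list)

def render_thread(ctx, tid, draws):
    # "Record" into a private deferred list, then submit the finished list.
    command_list = [(tid, i) for i in range(draws)]
    ctx.execute_command_list(command_list)

ctx = ImmediateContextStub()
threads = [threading.Thread(target=render_thread, args=(ctx, t, 5))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Threads only block while another thread is inside execute_command_list; the recording work itself stays fully parallel, and each submitted list lands contiguously.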

Thank you for your help :)

  • Similar Content

    • By AxeGuywithanAxe
      I wanted to see how others are currently handling descriptor heap updates and management.
      I've read a few articles, and there tend to be three major strategies:
      1) You split up descriptor heaps per shader stage (i.e. one for vertex shader, pixel, hull, etc.)
      2) You have one descriptor heap for an entire pipeline
      3) You split up descriptor heaps per update frequency (i.e. EResourceSet_PerInstance, EResourceSet_PerPass, EResourceSet_PerMaterial, etc.)
      The benefit of the first two approaches is that they make it easier to port current code, and descriptor management and updating tend to be easier, but they seem to be less efficient.
      The benefit of the third approach seems to be that it's the most efficient, because you only manage and update objects when they change.
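A tiny Python simulation of strategy 3 (the set names and the frame/pass/material counts are invented for illustration, not D3D12 API): descriptors are grouped by update frequency, and a group is only re-uploaded when its data actually changes.

```python
# Hypothetical update-frequency groups, mirroring the EResourceSet_Per* idea.
PER_FRAME, PER_PASS, PER_MATERIAL = "frame", "pass", "material"

class DescriptorTable:
    """Stand-in for a range of descriptors sharing one update frequency."""
    def __init__(self, frequency):
        self.frequency = frequency
        self.uploads = 0  # how many times we had to write descriptors

    def update(self, dirty):
        # Only re-upload when the backing resources actually changed.
        if dirty:
            self.uploads += 1

tables = {f: DescriptorTable(f) for f in (PER_FRAME, PER_PASS, PER_MATERIAL)}

# Simulate 3 frames with 2 passes each; the material set changes only once.
for frame in range(3):
    tables[PER_FRAME].update(dirty=True)
    for _ in range(2):
        tables[PER_PASS].update(dirty=True)
    tables[PER_MATERIAL].update(dirty=(frame == 0))
```

With per-stage or per-pipeline heaps (strategies 1 and 2) the groups would be rewritten together, so the per-material descriptors would be uploaded every frame instead of once.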
    • By evelyn4you
      Until now I use the typical vertex shader approach for skinning, with a constant buffer containing the transform matrices for the bones and a vertex buffer containing bone index and bone weight.
      Now I have implemented realtime environment probe cubemapping, so I have to render my scene from many points of view, and the time for skinning takes too long because it is recalculated for every side of the cubemap.
      For info: I am working on Win7 and therefore use Shader Model 5.0, not 5.x which has more options. Or is there a way to use 5.x on Win7?
      My graphics card is a DirectX 12 compatible NVidia GTX 960.
      The member turanszkij has posted a compute shader that I found understandable. (For info: in his engine he uses an optimized version of it.)
      Now my questions:
      Is it possible to feed the compute shader with my original vertex buffer, or do I have to copy it into several ByteAddressBuffers as implemented in the following code?
      The same question applies to the constant buffer of the matrices.
      My more urgent question is: how do I feed my normal pipeline with the result of the compute shader, which is 2 RWByteAddressBuffers that contain position and normal?
      For example, I could use 2 vertex buffer bindings:
      1. containing only the uv coordinates
      2. containing position and normal
      How do I copy from the RWByteAddressBuffers to the vertex buffer?
      (Code from turanszkij)
      Here is my shader implementation for skinning a mesh in a compute shader:
      struct Bone
      {
          float4x4 pose;
      };
      StructuredBuffer<Bone> boneBuffer;

      ByteAddressBuffer vertexBuffer_POS; // T-Pose pos
      ByteAddressBuffer vertexBuffer_NOR; // T-Pose normal
      ByteAddressBuffer vertexBuffer_WEI; // bone weights
      ByteAddressBuffer vertexBuffer_BON; // bone indices

      RWByteAddressBuffer streamoutBuffer_POS; // skinned pos
      RWByteAddressBuffer streamoutBuffer_NOR; // skinned normal
      RWByteAddressBuffer streamoutBuffer_PRE; // previous frame skinned pos

      inline void Skinning(inout float4 pos, inout float4 nor, in float4 inBon, in float4 inWei)
      {
          float4 p = 0, pp = 0;
          float3 n = 0;
          float4x4 m;
          float3x3 m3;
          float weisum = 0;

          // force loop to reduce register pressure
          // though this way we can not interleave TEX - ALU operations
          [loop]
          for (uint i = 0; ((i < 4) && (weisum < 1.0f)); ++i)
          {
              m = boneBuffer[(uint)inBon[i]].pose;
              m3 = (float3x3)m;

              p += mul(float4(pos.xyz, 1), m) * inWei[i];
              n += mul(nor.xyz, m3) * inWei[i];

              weisum += inWei[i];
          }

          bool w = any(inWei);
          pos.xyz = w ? p.xyz : pos.xyz;
          nor.xyz = w ? n : nor.xyz;
      }

      [numthreads(1024, 1, 1)]
      void main(uint3 DTid : SV_DispatchThreadID)
      {
          const uint fetchAddress = DTid.x * 16; // stride is 16 bytes for each vertex buffer now...

          uint4 pos_u = vertexBuffer_POS.Load4(fetchAddress);
          uint4 nor_u = vertexBuffer_NOR.Load4(fetchAddress);
          uint4 wei_u = vertexBuffer_WEI.Load4(fetchAddress);
          uint4 bon_u = vertexBuffer_BON.Load4(fetchAddress);

          float4 pos = asfloat(pos_u);
          float4 nor = asfloat(nor_u);
          float4 wei = asfloat(wei_u);
          float4 bon = asfloat(bon_u);

          Skinning(pos, nor, bon, wei);

          pos_u = asuint(pos);
          nor_u = asuint(nor);

          // copy prev frame current pos to current frame prev pos
          streamoutBuffer_PRE.Store4(fetchAddress, streamoutBuffer_POS.Load4(fetchAddress));
          // write out skinned props:
          streamoutBuffer_POS.Store4(fetchAddress, pos_u);
          streamoutBuffer_NOR.Store4(fetchAddress, nor_u);
      }
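For reference, the linear-blend-skinning math inside Skinning() can be mirrored in plain Python (the helper is only an illustration, not engine code; it uses the same row-vector-times-matrix convention as mul(float4(pos.xyz, 1), m), up to 4 bones, with the same early-out once the weights sum to 1):

```python
def skin_vertex(pos, bone_indices, bone_weights, bone_poses):
    """Linear-blend skinning of one position, mirroring the Skinning() loop."""
    out = [0.0, 0.0, 0.0]
    weight_sum = 0.0
    for k in range(4):
        if weight_sum >= 1.0:
            break
        w = bone_weights[k]
        m = bone_poses[bone_indices[k]]  # 4x4, row-major
        v = (pos[0], pos[1], pos[2], 1.0)
        for c in range(3):  # mul(float4(pos.xyz, 1), m), xyz components only
            out[c] += w * sum(v[r] * m[r][c] for r in range(4))
        weight_sum += w
    if not any(bone_weights):
        return tuple(pos)  # no bones bound: keep the T-pose position
    return tuple(out)
```

With a row-vector convention the bone translation sits in the 4th row of each pose matrix, and blending two half-weight bones averages their transformed positions.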
    • By mister345
      Hi, can someone please explain why this is giving an assertion EyePosition!=0 exception?
      _lightBufferVS->viewMatrix = DirectX::XMMatrixLookAtLH(XMLoadFloat3(&_lightBufferVS->position), XMLoadFloat3(&_lookAt), XMLoadFloat3(&up));
      It looks like DirectX doesn't want the 2nd parameter to be a zero vector in the assertion, but I passed in a zero vector with this exact same code in another program and it ran just fine. (Here is the version of the code that worked - note the XMLoadFloat3(&m_lookAt) parameter value is (0,0,0) at runtime - I debugged it - but it throws no exceptions.)
      m_viewMatrix = DirectX::XMMatrixLookAtLH(XMLoadFloat3(&m_position), XMLoadFloat3(&m_lookAt), XMLoadFloat3(&up));
      Here is the repo for the broken code (see LightClass): https://github.com/mister51213/DirectX11Engine/blob/master/DirectX11Engine/LightClass.cpp
      and here is the repo with the alternative version of the code that is working with a value of (0,0,0) for the second parameter.
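If I remember the DirectXMath internals correctly, XMMatrixLookAtLH computes EyeDirection = FocusPosition - EyePosition and hands it to XMMatrixLookToLH, which asserts that this direction is neither zero nor infinite. So a (0,0,0) lookAt is fine as long as the eye sits somewhere else; the assertion fires when eye and lookAt coincide. A small Python sketch of that precondition (the helper name is made up):

```python
def look_direction(eye, look_at):
    """The precondition XMMatrixLookAtLH effectively checks: the
    eye-to-focus direction must not be the zero vector."""
    direction = tuple(f - e for e, f in zip(eye, look_at))
    if all(c == 0.0 for c in direction):
        raise ValueError("eye and look_at coincide; view direction undefined")
    return direction

# A zero look_at is fine when the eye is elsewhere:
look_direction((0.0, 0.0, -5.0), (0.0, 0.0, 0.0))  # -> (0.0, 0.0, 5.0)
```

So the likely difference between the two programs is the eye position at the moment of the call, not the zero lookAt itself.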
    • By mister345
      Hi, can somebody please tell me in clear, simple steps how to debug and step through an HLSL shader file?
      I already did Debug > Start Graphics Debugging, then captured some frames from Visual Studio and
      double-clicked on the frame to open it, but I have no idea where to go from there.
      I've been searching for hours and there's no information on this, not even on the Microsoft website!
      They say "open the Graphics Pixel History window", but there is no such window!
      Then they say, in the "Pipeline Stages", choose "Start Debugging", but the Start Debugging option is nowhere to be found in the whole interface.
      Also, how do I even open the HLSL file that I want to set a breakpoint in from inside the Graphics Debugger?
      All I want to do is set a breakpoint in a specific HLSL file, step through it, and see the data, but this is so unbelievably complicated,
      and Microsoft's instructions are horrible! Somebody please, please help.

    • By mister345
      I finally ported Rastertek's tutorial # 42 on soft shadows and blur shading. This tutorial has a ton of really useful effects and there's no working version anywhere online.
      Unfortunately it just draws a black screen, and I'm not sure what's causing it. I'm guessing the camera or ortho matrix transforms are wrong, the light directions, or maybe texture resources not being properly initialized. I didn't change any of the variables though; I only upgraded all types and functions from DirectX3DVector3 to XMFLOAT3, and used DirectXTK for texture loading. If anyone is willing to take a look at what might be causing the black screen, maybe something pops out to you - let me know, thanks.
      Also, for reference, here's tutorial #40 which has normal shadows but no blur, which I also ported, and it works perfectly.