Barnett

DX12 DirectX 12 Multi Threading / Low-latency presentation


Hi,

 

Does anyone know if it is safe to call IDXGISwapChain3::Present() on one thread, while at the same time calling ID3D12CommandQueue::ExecuteCommandLists() on another thread?

I have an application that creates two windows, one for each monitor. I have two threads that render to these two windows simultaneously. Each thread has its own ID3D12Device, ID3D12CommandQueue, IDXGISwapChain3 and everything else; they are completely independent.

Yet the application hangs randomly. Below are the call stacks of the two threads when they hang:

ntdll.dll!NtWaitForAlertByThreadId()
ntdll.dll!RtlpWaitOnAddressWithTimeout()
ntdll.dll!RtlpWaitOnAddress()
ntdll.dll!RtlpWaitOnCriticalSection()
ntdll.dll!RtlpEnterCriticalSectionContended()
D3D12.dll!CCommandQueue<0>::ExecuteCommandLists(unsigned int,struct ID3D12CommandList * const *)
dxgi.dll!CD3D12Device::CloseAndSubmitCommandList(unsigned int,enum CD3D12Device::QueueType)
dxgi.dll!CD3D12Device::PresentExtended(struct DXGI_PRESENTSURFACE const *,struct IDXGIResource * const *,unsigned int,struct IDXGIResource *,void *,unsigned int,unsigned int,int *,unsigned int *)
dxgi.dll!CDXGISwapChain::FlipPresentToDWM(struct SPresentArgs const *,unsigned int,unsigned int,unsigned int &,unsigned int,struct tagRECT const *,struct DXGI_SCROLL_RECT const *,struct DXGI_INTERNAL_CONTENT_PROTECTION const &)
dxgi.dll!CDXGISwapChain::PresentImplCore(struct SPresentArgs const *,unsigned int,unsigned int,unsigned int,struct tagRECT const *,unsigned int,struct DXGI_SCROLL_RECT const *,struct IDXGIResource *,bool &,bool &,bool &)
dxgi.dll!CDXGISwapChain::Present(unsigned int,unsigned int)
MyApp.exe!gui::CDXProc::CJob::Present() Line 963    C++

ntdll.dll!NtWaitForAlertByThreadId()
ntdll.dll!RtlpWaitOnAddressWithTimeout()
ntdll.dll!RtlpWaitOnAddress()
ntdll.dll!RtlpWaitOnCriticalSection()
ntdll.dll!RtlpEnterCriticalSectionContended()
dxgi.dll!CDXGISwapChain::GetCurrentBackBufferIndex(void)
dxgi.dll!CDXGISwapChain::GetCurrentCommandQueue(struct _GUID const &,void * *)
D3D12.dll!CCommandQueue<0>::ExecuteCommandLists(unsigned int,struct ID3D12CommandList * const *)
MyApp.exe!gui::CDXProc::CJob::ExecuteCommandList(ID3D12CommandList * iCommandList) Line 1172    C++

As you can see, one thread is stuck in the Present() call, while the other thread is stuck inside ExecuteCommandLists().

I can get around the problem by putting a critical section around all calls to Present() and ExecuteCommandLists(), but I do not understand why this is necessary. Any ideas?
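A minimal sketch of the critical-section workaround described above, assuming a single lock shared by both render threads. `SubmitSerializer` and `NeverOverlaps` are hypothetical names, and the callables stand in for the real `IDXGISwapChain3::Present()` and `ID3D12CommandQueue::ExecuteCommandLists()` calls so that the sketch compiles without the Windows SDK. It only reproduces the workaround; it does not explain why DXGI needs it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: one process-wide lock that every render thread shares.
// Both Present() and ExecuteCommandLists() go through Run(), so they never overlap.
class SubmitSerializer {
public:
    // Runs fn while holding the lock and passes its result (e.g. an HRESULT) through.
    template <typename Fn>
    decltype(auto) Run(Fn&& fn) {
        std::lock_guard<std::mutex> guard(mutex_);
        return std::forward<Fn>(fn)();
    }

private:
    std::mutex mutex_;
};

// Self-check: two threads hammer Run() with a fake submit call that detects overlap.
bool NeverOverlaps(SubmitSerializer& lock, int iterations) {
    assert(iterations >= 0);
    std::atomic<int> inFlight{0};
    std::atomic<bool> overlapped{false};
    auto fakeSubmit = [&] {
        if (inFlight.fetch_add(1) != 0) overlapped = true;  // someone else is inside
        std::this_thread::yield();
        inFlight.fetch_sub(1);
    };
    auto worker = [&] { for (int i = 0; i < iterations; ++i) lock.Run(fakeSubmit); };
    std::thread a(worker), b(worker);
    a.join();
    b.join();
    return !overlapped;
}
```

In the real render threads this would be used as something like `lock.Run([&] { return swapChain->Present(1, 0); });` and `lock.Run([&] { queue->ExecuteCommandLists(1, lists); });`, with the same `lock` instance visible to both threads (`lists` is a placeholder). The cost is that the two windows can no longer submit or present concurrently.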

 

Edit: Changed the thread title to reflect the direction things are going.
 


The first argument to CreateSwapChain is your main commandQueue. I guess that Present makes use of this queue internally, which means that no other thread should be using that queue during a Present call.

Is ID3D12CommandQueue not free-threaded? I thought the general idea is that multiple threads can create multiple ID3D12GraphicsCommandLists in parallel, and submit them in parallel to a single ID3D12CommandQueue?

 

But anyway, that is not what I am doing. I created two separate ID3D12CommandQueues. Or are they perhaps one and the same internally?



You're right: "Any thread may submit a command list to any command queue at any time, and the runtime will automatically serialize submission of the command list in the command queue while preserving the submission order." That sounds like the queue has an internal mutex that's acquired for you...

Perhaps there's a bug and Present fails to acquire this mutex? Hopefully someone with deeper knowledge of D3D12 can shed light on this...


That particular deadlock was discovered and fixed a while back, if I remember correctly. Make sure you're on the latest version of Windows 10.

I think this is something else - I am on Build 10586.494. Windows Update says: "Your device is up to date. Last checked: 2016/08/09, 00:35"

I narrowed the deadlock down to a ResourceBarrier I have that straddles VSync. I set a barrier from PRESENT to RENDER_TARGET directly *after* Present(), followed by a Signal()+SetEventOnCompletion()+WaitForSingleObject().

This is the only way I have been able to achieve "Direct Flip" latency. The usual method of waiting on a WAITABLE_OBJECT after Present() does not seem to work: whether the window covers only a portion of the monitor or the entire monitor, I always get the same latency of about 34ms:
(The picture below is from an oscilloscope that I trigger when I start rendering a new frame; I then measure how long it takes for the change to appear on the screen, as picked up by a photodiode taped to the monitor.)
[Image: Tek_Temp1.png, oscilloscope trace using the waitable object, about 34ms]
If instead I remove the wait on the WAITABLE_OBJECT and replace it with a wait on the resource barrier, I get the expected "Direct Flip" behaviour, with latency going down to 18ms (a 16ms reduction) when the window covers the entire screen:
[Image: Tek_Temp2.png, oscilloscope trace using the wait on the resource barrier, about 18ms]
This worked, but it now causes the deadlock when I do the same thing on two screens simultaneously. I could go back to using waitable objects, but then I won't get the lower latency of "Direct Flip". Is there some other way of doing the timing?
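To put the measurements above in frames: at a fixed refresh rate, one frame is 1000 / refresh-rate milliseconds, so latency in frames is just the measured time divided by that interval. A small sketch, assuming the 60 Hz refresh rate these measurements appear to use; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Converts a latency in milliseconds into refresh intervals ("frames").
// Assumes a fixed refresh rate; the measurements in this thread look like 60 Hz.
double LatencyInFrames(double latencyMs, double refreshHz) {
    assert(refreshHz > 0.0);
    const double frameMs = 1000.0 / refreshHz;  // about 16.67 ms at 60 Hz
    return latencyMs / frameMs;
}

// Nearest whole number of frames, for comparing against "one frame" or "two frames".
long WholeFrames(double latencyMs, double refreshHz) {
    return std::lround(LatencyInFrames(latencyMs, refreshHz));
}
```

For the numbers above, 34ms is about 2.04 refresh intervals and 18ms about 1.08, so the 16ms reduction is almost exactly one frame (0.96 of an interval).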
 


Well, the Anniversary update was just released as 14393, so I'd recommend giving that one a shot first to see what's going on.

 

You can also try out PresentMon as a software technique for measuring latency. It'll also tell you whether you're in independent flip or getting composed. You might just be using the waitable object incorrectly while trying to get low latency, but that is absolutely our recommended way of controlling your latency, even in D3D12. For example, are you waiting on the object before your first frame? If not, you'll end up with latency that's one frame higher than you'd want, even in independent flip mode.
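To illustrate why skipping the wait before the first frame costs a frame, here is a toy model, not DXGI's actual implementation. The frame latency waitable object (the one returned by IDXGISwapChain2::GetFrameLatencyWaitableObject on a swap chain created with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT) is treated as a counting semaphore that starts at the maximum frame latency and is released whenever a queued frame reaches the screen at a vsync. Rendering is assumed to be instantaneous and composition is ignored; `SteadyStateLatencyInFrames` is a hypothetical name.

```cpp
#include <cassert>
#include <deque>

// Toy model only (an assumption, not DXGI's implementation): the waitable object is a
// counting semaphore that starts at maxFrameLatency and is released whenever a queued
// frame is shown at a vsync. Rendering is instantaneous; composition is ignored.
// Returns the latency, in vsync intervals, of the last frame that reached the screen.
int SteadyStateLatencyInFrames(int maxFrameLatency, bool waitBeforeFirstFrame, int frameCount) {
    assert(maxFrameLatency >= 1);
    int semaphore = maxFrameLatency;
    std::deque<int> queued;  // render-start time of each presented frame not yet on screen
    int now = 0;             // current time, in vsync intervals
    int lastLatency = -1;

    auto vsync = [&] {
        ++now;
        if (!queued.empty()) {
            lastLatency = now - queued.front();  // frame becomes visible at this vsync
            queued.pop_front();
            ++semaphore;                         // a queue slot frees up: object is signaled
        }
    };

    for (int frame = 0; frame < frameCount; ++frame) {
        // Stands in for WaitForSingleObjectEx(waitableObject, ...) at the top of the frame.
        if (frame > 0 || waitBeforeFirstFrame) {
            while (semaphore == 0) vsync();  // the queue is never empty here, so this ends
            --semaphore;
        }
        queued.push_back(now);  // render (instantaneous), then Present()
    }
    return lastLatency;
}
```

In this model, with a maximum frame latency of 1, waiting before every frame including the first gives one interval of latency; skipping the first wait leaves one extra slot in the semaphore for the rest of the run, and latency settles at two intervals, which matches the extra frame described above.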


According to PresentMon everything is fine, but in reality (when measuring the light coming out of the screen), all is not as it seems. I did the following four tests:
Test 1: Using a "waitable object" on a non-fullscreen window.
Test 2: Using a "waitable object" on a fullscreen window.
Test 3: Using a "wait on barrier" on a non-fullscreen window.
Test 4: Using a "wait on barrier" on a fullscreen window.

Below is the output from PresentMon: (I added a column on the far right with the actual latency as measured using an oscilloscope.)

      Runtime SyncInterval AllowsTearing PresentFlags PresentMode                Dropped TimeInSeconds MsBetweenPresents MsBetweenDisplayChange MsInPresentAPI MsUntilRenderComplete MsUntilDisplayed Measured Latency
      ------- ------------ ------------- ------------ -----------                ------- ------------- ----------------- ---------------------- -------------- --------------------- ---------------- ----------------
Test 1:
      DXGI    1            0             64           Composed: Flip             0       4.134419      16.581            16.756                 0.488          0.429                 32.617           35
      DXGI    1            0             64           Composed: Flip             0       4.151078      16.659            16.605                 0.506          0.5                   32.563           35
      DXGI    1            0             64           Composed: Flip             0       4.167767      16.689            16.673                 0.39           0.512                 32.547           35
Test 2:
      DXGI    1            0             64           Hardware: Independent Flip 0       4.396671      16.611            16.717                 0.466          0.426                 16.311           35
      DXGI    1            0             64           Hardware: Independent Flip 0       4.413443      16.772            16.648                 0.382          0.396                 16.187           35
      DXGI    1            0             64           Hardware: Independent Flip 0       4.430011      16.568            16.734                 0.397          0.41                  16.353           35
Test 3:
      DXGI    1            0             64           Composed: Flip             0       2.242991      16.301            16.689                 0.371          0.431                 32.319           35
      DXGI    1            0             64           Composed: Flip             0       2.259456      16.465            16.67                  0.376          0.347                 32.524           35
      DXGI    1            0             64           Composed: Flip             0       2.276224      16.768            16.694                 0.359          0.434                 32.45            35
Test 4:
      DXGI    1            0             64           Hardware: Independent Flip 0       3.005195      16.478            16.696                 0.43           0.447                 15.927           19
      DXGI    1            0             64           Hardware: Independent Flip 0       3.021999      16.804            16.679                 0.387          0.394                 15.802           19
      DXGI    1            0             64           Hardware: Independent Flip 0       3.038641      16.642            16.72                  0.383          0.391                 15.88            19

Note that in Test 2 (i.e. using a waitable object on a fullscreen window) there is a 16ms discrepancy between what PresentMon says the latency is and what is measured in hardware.

It might be that I am doing something wrong with the waitable object; I will keep looking. I will also try the Windows 10 Anniversary Update.


Sorry, one more thing... I watched your "Presentation Modes" video about 101 times but still don't understand the difference between "Independent Flip" and "True Immediate Independent Flip". Can you perhaps point me to some additional information on the "True Immediate Independent Flip" mode, and under what conditions it becomes active?

 

Also, you mentioned in the video that with a DXGI_SWAP_EFFECT_FLIP_DISCARD back buffer, DXGI will render things like the volume control directly onto my back buffer while staying in Independent Flip mode. I have not been able to reproduce this behaviour: as soon as the volume control comes up I can see that DXGI adds an extra frame of latency... or am I missing something?


True immediate independent flip is engaged either by calling SetFullscreenState with TRUE (Win32 only, not recommended), or using the new DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING and DXGI_PRESENT_ALLOW_TEARING. When independent flip is entered and sync interval is 0, the flip will happen as soon as rendering is complete.
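The choice of Present() arguments implied by the reply above can be written down as a small decision function. This is a sketch under stated assumptions: `PresentArgs` and `ChoosePresentArgs` are hypothetical; the boolean inputs stand for the result of IDXGIFactory5::CheckFeatureSupport(DXGI_FEATURE_PRESENT_ALLOW_TEARING, ...), whether the swap chain was created with DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING, and whether SetFullscreenState(TRUE, ...) is in effect. The rule that the tearing flag is only valid with sync interval 0 and not in exclusive fullscreen is our reading of the DXGI documentation.

```cpp
#include <cassert>

// Hypothetical result type: what to pass to IDXGISwapChain::Present().
struct PresentArgs {
    unsigned syncInterval;  // first argument to Present()
    bool allowTearing;      // if true, pass DXGI_PRESENT_ALLOW_TEARING as the flags
};

// tearingSupported:        CheckFeatureSupport(DXGI_FEATURE_PRESENT_ALLOW_TEARING) said yes
// swapChainHasTearingFlag: swap chain created with DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING
// exclusiveFullscreen:     SetFullscreenState(TRUE, ...) is in effect
PresentArgs ChoosePresentArgs(bool wantVSync, bool tearingSupported,
                              bool swapChainHasTearingFlag, bool exclusiveFullscreen) {
    if (wantVSync) return {1, false};  // normal vsync'd present, no tearing flag
    // Tearing flag: only with sync interval 0, only on a swap chain created with the
    // tearing flag, and not in exclusive fullscreen (which flips immediately at interval 0).
    const bool canTear = tearingSupported && swapChainHasTearingFlag && !exclusiveFullscreen;
    return {0, canTear};
}
```

Either way, once independent flip is engaged with sync interval 0, the flip is expected to happen as soon as rendering completes, as described above.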

 

The FLIP_SEQUENTIAL and FLIP_DISCARD swap effects allow seamless transitions between independent flip and composition. It is also possible that on systems with hardware composition support (e.g. multiple hardware overlay planes) that things like the volume controls can be rendered without dropping back to software composition and adding back the latency.

 

The PresentMon data looks like what I'd expect. Are you sure that case 2 has data that looks like that at the same time as your monitor latency test? Note that it's possible that independent flip didn't properly engage 100% of the time, but if your results were consistent then it's probably not that.


Yes, I am sure. I repeated the tests multiple times, now also on two different computers.

 

In order to rule out the possibility that I am doing something wrong I thought I would start with a working sample, Intel's FlipModelD3D12 sample as documented here. But as I will show below, "Direct Flip" does not seem to work, not even with this unmodified sample.

 

Firstly, running in a window I get the expected best case of two frames of latency - one frame for rendering plus the one frame for compositing:

 

[Image: Untitled2.jpg, FlipModelD3D12 latency display, windowed]

 

 

Next, going to fullscreen to enable "Direct Flip" (or "Independent Flip"), the results however remain the same - still two frames of latency:

 

[Image: Untitled1.jpg, FlipModelD3D12 latency display, fullscreen]

 

 

This is the output of PresentMon during the window-to-fullscreen transition:

                                           Runtime SyncInterval AllowsTearing PresentFlags PresentMode                Dropped TimeInSeconds MsBetweenPresents MsBetweenDisplayChange MsInPresentAPI MsUntilRenderComplete MsUntilDisplayed
                                           ------- ------------ ------------- ------------ -----------                ------- ------------- ----------------- ---------------------- -------------- --------------------- ----------------
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Composed: Flip             0       3.472288      16.614            16.659                 0.195          0.74                  22.975
                   1008 0x000001E61C854480 DXGI    1            0             0            Hardware: Legacy Flip      0       3.495689      16.659            16.715                 0.228          0.449                 16.251
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Composed: Flip             0       3.488948      16.659            16.707                 0.218          0.808                 23.023
                   1008 0x000001E61C854480 DXGI    1            0             0            Hardware: Legacy Flip      0       3.51256       16.871            16.616                 0.184          0.445                 15.996
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Composed: Flip             0       3.505638      16.69             16.622                 0.238          0.645                 22.954
                   1008 0x000001E61C854480 DXGI    1            0             0            Hardware: Legacy Flip      0       3.529297      16.737            16.669                 0.166          0.475                 15.928
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Composed: Flip             0       3.522536      16.897            16.66                  0.207          0.575                 22.717
                   1008 0x000001E61C854480 DXGI    1            0             0            Hardware: Legacy Flip      0       3.546146      16.848            16.73                  0.14           0.117                 15.809
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Composed: Flip             0       3.53896       16.424            16.76                  0.196          0.787                 23.052
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Composed: Flip             1       3.555698      16.738            0                      0.241          1.015                 0
                   1008 0x000001E61C854480 DXGI    1            0             0            Hardware: Legacy Flip      0       3.562076      15.93             16.712                 0.119          0.12                  16.591
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Hardware: Independent Flip 0       3.572275      16.578            33.322                 0.213          8.253                 23.059
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Hardware: Independent Flip 0       3.590761      18.486            16.669                 0.204          6.514                 21.242
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Hardware: Independent Flip 0       3.607514      16.753            16.73                  0.213          6.378                 21.219
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Hardware: Independent Flip 0       3.624395      16.881            16.632                 0.261          6.284                 20.97
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Hardware: Independent Flip 0       3.640937      16.542            16.696                 0.213          6.308                 21.125
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Hardware: Independent Flip 0       3.657528      16.591            16.657                 0.206          6.488                 21.191
FlipModelD3D12.exe 6996 0x000001F27D6B42C0 DXGI    1            0             0            Hardware: Independent Flip 0       3.674217      16.689            16.698                 0.2            6.37                  21.2

So PresentMon says that in fullscreen we are in Independent Flip, but according to the FlipModelD3D12 sample we are still getting two frames of latency.

 

In order to resolve this discrepancy, I added six lines of code to the FlipModelD3D12 sample program that allow me to measure the latency with an oscilloscope:

(Basically, for one in every eight frames I set an RS-232 port line high at the start of rendering and low again at the end of rendering. I also set the background color to black in that frame so that I can pick it up on screen with a photodiode.)
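The instrumentation just described can be sketched as a small helper. This is not the author's code: `LatencyMarker` is a hypothetical class, the line-setting callback is an assumption (on Windows it might wrap EscapeCommFunction with SETRTS/CLRRTS, but the post does not say which line or call was used), and the normal clear color is passed in rather than guessed.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <utility>

using Color = std::array<float, 4>;  // RGBA clear color

// Hypothetical sketch: every `period`-th frame raises a serial-port line at start of
// render, lowers it at end of render, and clears to black so a photodiode can see it.
class LatencyMarker {
public:
    LatencyMarker(unsigned period, std::function<void(bool)> setLine, Color normalColor)
        : period_(period), setLine_(std::move(setLine)), normalColor_(normalColor) {
        assert(period_ > 0);
    }

    // Call at start of render; returns the clear color to use for this frame.
    Color BeginFrame(unsigned frameIndex) {
        marking_ = (frameIndex % period_) == 0;
        if (!marking_) return normalColor_;
        setLine_(true);                        // scope trigger: rising edge
        return Color{0.0f, 0.0f, 0.0f, 1.0f};  // black marker frame
    }

    // Call at end of render.
    void EndFrame() {
        if (marking_) setLine_(false);  // falling edge
        marking_ = false;
    }

private:
    unsigned period_;
    std::function<void(bool)> setLine_;
    Color normalColor_;
    bool marking_ = false;
};
```

With `period` set to 8 this matches the one-in-eight scheme above; the time from the rising edge to the photodiode response is the start-of-render to screen latency.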

 

[Image: Tek_Temp.png, oscilloscope trace of the instrumented FlipModelD3D12 sample]

 

So the measured latency is about 33ms (two frames), as predicted by the FlipModelD3D12 app, and NOT the 16ms (one frame) you would expect for Independent Flip mode.

 

Originally I did all these tests on my main PC which is:

  Windows 10 build 10586, multi-monitor, Nvidia discrete GPU.

 

Thinking that there is something wrong with this computer I repeated everything on a:

  Windows 10 build 14393, single monitor, Intel integrated GPU.

 

But the results are exactly the same.

 

I don't really know what to try next - any help would be appreciated.


PresentMon agrees with you for your tests with the Intel flip model sample app - it says 21ms of latency from the time Present() is called until it hits the screen, not 16ms. The reason for that in the sample is that it doesn't work properly :) notice the 6ms it takes to render, but only when in fullscreen. That's because the app is rendering too early to get just 1 frame of latency; I suspect there's a bug which causes it to drop a wait on the waitable object, accidentally causing it to run with latency of 2.

 

I'm interested in focusing on the discrepancy between PresentMon and your hardware measurements. If PresentMon says you should be getting 16ms, then you legitimately should be - it's not a predictive tool based on inputs into the system, it measures events from the components responsible for getting contents on screen. When it says you've flipped, we've really requested the hardware to flip.

 

As something to try, just to see what happens, try increasing the buffer count to 3?


When changing the buffer count to 3, the diagram displayed by FlipModelD3D12 looks slightly different:

 

[Image: Untitled3.jpg, FlipModelD3D12 latency display with 3 buffers]

 

However, the latency is still two frames.

 

If FlipModelD3D12 is not working correctly now, it must have been working before, because the image on Intel's web page looks like they had it running with one frame of latency:

 

[Image: 3_minimumlatency.png, image from Intel's web page showing minimum latency]

 

 

I am going to see if I can move the timing pulse to wrap only the Present() call, to see how the hardware measurements compare with PresentMon.


I changed FlipModelD3D12 by adding a pulse around only the Present() call, so that I can measure the latency from Present() to when the image is on the screen. I tested:
  1) FlipModelD3D12 running in a window with 2 buffers.
  2) FlipModelD3D12 running full screen with 2 buffers.
  3) FlipModelD3D12 running full screen with 3 buffers.

The results are:

                               PresentMon MsUntilDisplayed        Hardware measurement
  1) Windowed 2 buffers   :    26.3                               27.6
  2) Fullscreen 2 buffers :    22.9                               23.6
  3) Fullscreen 3 buffers :    26.3                               27.6

I think that looks pretty good: the hardware measurement is off by about one ms, which is not unexpected considering the simple measuring technique.

So what do you think can be wrong with FlipModelD3D12 that makes fullscreen no better than windowed?

Is there perhaps another sample I can try that is known to be good?

Edited by Barnett


I managed to get FlipModelD3D12 working very nicely, with a solid 16.6ms (one frame) latency:

 

[Image: Untitled4.jpg, FlipModelD3D12 latency display showing one frame of latency]

 

 

PresentMon reports about 9.9ms latency, but that is measured from Present() to the screen, so it will obviously be a bit shorter than the "Start Of Render" to screen latency displayed by FlipModelD3D12:

FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.065275,16.737,16.655,0.061,2.298,9.809
FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.081854,16.579,16.662,0.054,2.374,9.892
FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.098540,16.685,16.660,0.051,2.384,9.866
FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.115118,16.579,16.660,0.052,2.463,9.948
FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.132057,16.938,16.667,0.077,2.322,9.676
FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.148549,16.493,16.658,0.057,2.468,9.842
FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.165347,16.798,16.665,0.067,2.391,9.709
FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.181797,16.450,16.656,0.053,2.515,9.916
FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.198429,16.632,16.662,0.051,2.582,9.945

My hardware measurements confirm these numbers as well.
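Averages like the "about 9.9ms" above can be computed directly from PresentMon's CSV output. A minimal sketch, assuming the column layout of the rows above (Application, ProcessID, SwapChainAddress, Runtime, SyncInterval, AllowsTearing, PresentFlags, PresentMode, Dropped, TimeInSeconds, MsBetweenPresents, MsBetweenDisplayChange, MsInPresentAPI, MsUntilRenderComplete, MsUntilDisplayed); other PresentMon versions may order or name columns differently. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits one PresentMon CSV row. Assumes no quoted fields (true for the rows above).
std::vector<std::string> SplitCsv(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// Column positions for the layout assumed above.
constexpr std::size_t kPresentModeColumn = 7;
constexpr std::size_t kDroppedColumn = 8;
constexpr std::size_t kMsUntilDisplayedColumn = 14;

// Average MsUntilDisplayed over non-dropped rows in the given present mode.
// Dropped frames are skipped because their MsUntilDisplayed is 0. Returns -1 if none match.
double AverageMsUntilDisplayed(const std::vector<std::string>& rows, const std::string& mode) {
    double sum = 0.0;
    int count = 0;
    for (const auto& row : rows) {
        const auto f = SplitCsv(row);
        if (f.size() <= kMsUntilDisplayedColumn) continue;  // header or malformed row
        if (f[kPresentModeColumn] != mode || f[kDroppedColumn] != "0") continue;
        sum += std::stod(f[kMsUntilDisplayedColumn]);
        ++count;
    }
    return count > 0 ? sum / count : -1.0;
}
```

Applied to the nine Independent Flip rows above, this gives roughly 9.84ms, consistent with the "about 9.9ms" quoted.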

 

To make it work I had to add an additional IDXGIOutput::WaitForVBlank() call directly after the wait on the waitable object. I think there is a bug in Windows causing the waitable object to be signaled one frame too early when in Independent Flip mode. Perhaps they forgot to compensate for the lower latency when switching from composition to Independent Flip? That is all I can think of.

 

Obviously you cannot always just call WaitForVBlank() to add the one-frame delay, because most of the time the waitable object works correctly. So you will have to wrap that call in some kind of if statement. The next problem will be to come up with logic for doing that reliably...
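One possible starting point for that if statement, offered only as a hypothetical heuristic and not a reliable detector: independent flip can only engage when the window covers the whole output, so the extra WaitForVBlank() is considered only in that case. Covering the output is necessary but not sufficient (the swap chain may still be composed), so this can misfire. `Rect`, `CoversWholeOutput` and `BeginFrame` are hypothetical; in real code the rectangles would come from the window's client area in screen coordinates and DXGI_OUTPUT_DESC::DesktopCoordinates, and the callables would wrap the waitable-object wait and IDXGIOutput::WaitForVBlank().

```cpp
#include <cassert>
#include <functional>

struct Rect { long left, top, right, bottom; };  // same layout idea as a Win32 RECT

// Hypothetical heuristic: true when the window fully covers the output's desktop area.
bool CoversWholeOutput(const Rect& windowScreenRect, const Rect& outputDesktopRect) {
    return windowScreenRect.left <= outputDesktopRect.left &&
           windowScreenRect.top <= outputDesktopRect.top &&
           windowScreenRect.right >= outputDesktopRect.right &&
           windowScreenRect.bottom >= outputDesktopRect.bottom;
}

// Start-of-frame sequence described above: wait on the frame latency waitable object,
// then, only when the heuristic says so, wait one extra vblank.
void BeginFrame(const std::function<void()>& waitOnLatencyObject,
                const std::function<void()>& waitForVBlank,
                bool extraVBlankWait) {
    waitOnLatencyObject();
    if (extraVBlankWait) waitForVBlank();
}
```

A render loop would call something like `BeginFrame(waitLatency, waitVBlank, CoversWholeOutput(windowRect, outputRect))` each frame and then render as usual.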


Hm, that's unexpected. It's possible that we have a bug with independent flip causing the waitable object to be signaled when the frame is queued instead of released... If I find out anything more I'll update this thread.


Thanks for all your help - greatly appreciated.

 

Can you perhaps shed some light on how presentation to secondary monitors is handled? How is it different from the main monitor? I cannot seem to get a secondary monitor to go into Independent Flip.

 

Rendering to Main Monitor:

  0x00000218C02CADC0,DXGI,1,0,0,Hardware: Independent Flip,0,3.555798,16.532,16.637,0.508,0.509,15.843

 

Same code rendering to Secondary Monitor:

  0x00000218C4C1FF10,DXGI,1,0,0,Composed: Flip,0,3.522093,15.864,16.663,0.222,0.377,49.643

 

 

In a previous post you explained "True Immediate Independent Flip", which I think I understand now. But what then is the difference between "Direct Flip" and "Independent Flip"?


Yep, secondary monitors may not support independent flip yet. Additionally, PresentMon can't properly detect direct flip and classifies it as composed.

 

Direct flip occurs when the compositor detects that it doesn't need to compose because only one app is covering the screen - it just uses that app's contents when it would normally compose to one of its own surfaces.

 

Independent flip occurs when the compositor decides it no longer even needs to wake up, because that app will continue covering the screen until some other event happens. It tells the system to continue flipping independently; at that point, it can start to get even faster than the compositor's rate (immediate independent flip).

 

Edit: Clarifying, secondary monitor independent flip requires the Anniversary update, along with newer drivers, and isn't guaranteed to be supported even then.

Edited by Jesse Natalie
