Barnett

Member
  • Content Count

    16
  • Joined

  • Last visited

Community Reputation

140 Neutral

About Barnett

  • Rank
    Member

Personal Information

  • Role
    Programmer
  • Interests
    Programming


  1. Oh, I think I figured it out... On tier 1 hardware you will be forced to create all placed textures with D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET if you want to render to any of the textures in that heap. The disadvantage is that this will use more memory. From the documentation on D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET: On tier 2 hardware the advantage is that you can create placed textures with or without the flag as needed. Is this correct?
  2. Thanks for your help. I am not sure I understand #3. I would like to create a large heap and then sub-allocate 2D textures within the heap using ID3D12Device::CreatePlacedResource(). I would like to use these textures as both render targets and as shader resources. The documentation says that when calling ID3D12Device::CreateHeap(), you have to specify D3D12_HEAP_FLAGS such that:

     "Adapters that only support heap tier 1 must set two out of the three following flags."

       D3D12_HEAP_FLAG_DENY_BUFFERS
       D3D12_HEAP_FLAG_DENY_RT_DS_TEXTURES
       D3D12_HEAP_FLAG_DENY_NON_RT_DS_TEXTURES

     What then is the purpose of these flags and how should you set them?
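A minimal sketch of how the three deny flags could be chosen per heap, under the rule quoted above. It assumes a texture that is both a render target and a shader resource counts as an "RT/DS texture" for heap purposes. The constants are stand-ins meant to mirror the D3D12_HEAP_FLAGS values in d3d12.h so that the sketch compiles without the Windows SDK. `HeapTier`, `HeapContent` and `HeapFlagsFor` are hypothetical names, not part of D3D12.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in constants (assumption: same values as D3D12_HEAP_FLAGS in d3d12.h).
// Real code should use the SDK enum instead.
enum HeapFlags : std::uint32_t {
    HEAP_FLAG_NONE                    = 0x0,
    HEAP_FLAG_DENY_BUFFERS            = 0x4,
    HEAP_FLAG_DENY_RT_DS_TEXTURES     = 0x40,
    HEAP_FLAG_DENY_NON_RT_DS_TEXTURES = 0x80,
};

// Hypothetical helper types, not part of D3D12.
enum class HeapTier { Tier1 = 1, Tier2 = 2 };
enum class HeapContent { Buffers, RenderTargetTextures, OtherTextures };

// Returns the deny flags for a heap that will hold one category of resource.
// On tier 1 exactly two of the three deny flags are set, so each heap holds
// a single category. On tier 2 no deny flags are needed.
std::uint32_t HeapFlagsFor(HeapTier tier, HeapContent content) {
    if (tier == HeapTier::Tier2) {
        return HEAP_FLAG_NONE;
    }
    switch (content) {
    case HeapContent::Buffers:
        return HEAP_FLAG_DENY_RT_DS_TEXTURES | HEAP_FLAG_DENY_NON_RT_DS_TEXTURES;
    case HeapContent::RenderTargetTextures:
        return HEAP_FLAG_DENY_BUFFERS | HEAP_FLAG_DENY_NON_RT_DS_TEXTURES;
    case HeapContent::OtherTextures:
        return HEAP_FLAG_DENY_BUFFERS | HEAP_FLAG_DENY_RT_DS_TEXTURES;
    }
    assert(false && "unhandled HeapContent");
    return HEAP_FLAG_NONE;
}
```

With these stand-ins, `HeapFlagsFor(HeapTier::Tier1, HeapContent::RenderTargetTextures)` yields 0x84, which should correspond to D3D12_HEAP_FLAG_ALLOW_ONLY_RT_DS_TEXTURES.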
  3. I have a few questions about DirectX 12 memory management:

     1) When you call ID3D12Device::Evict(), how do you specify that the contents of the resource are no longer needed and can therefore be discarded? For example, if I have a resource that I am done with, but will need again later as a render target, I don't want the contents to be swapped out and restored, because that would just be a waste of bandwidth.

     2) When you call QueryVideoMemoryInfo() and find that your memory budget has been reduced and that you are now over budget, how much time do you have to rectify the situation?

     3) Am I correct in saying that placed resources cannot be used when you want to use a resource as both a render target and a shader resource? (The reason being that you cannot create a heap that supports both uses on heap tier 1 hardware, which covers almost all current NVidia hardware.)
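For question 2, a rough sketch of the bookkeeping side only. `VideoMemoryInfo` is a stand-in for the Budget and CurrentUsage fields of DXGI_QUERY_VIDEO_MEMORY_INFO, so the example builds without the Windows SDK. `ResidentResource`, `BytesOverBudget` and `PickEvictions` are hypothetical names, and the least-recently-used ordering is an assumed policy. It does not say anything about how long the OS tolerates being over budget.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the two fields of DXGI_QUERY_VIDEO_MEMORY_INFO used here.
struct VideoMemoryInfo {
    std::uint64_t Budget;
    std::uint64_t CurrentUsage;
};

// Hypothetical record the application keeps for each resident resource.
struct ResidentResource {
    int id;
    std::uint64_t sizeInBytes;
};

// Number of bytes that must be freed to get back under budget (0 if within budget).
std::uint64_t BytesOverBudget(const VideoMemoryInfo& info) {
    return info.CurrentUsage > info.Budget ? info.CurrentUsage - info.Budget : 0;
}

// Walks a list ordered least-recently-used first and returns the ids to evict
// until at least bytesToFree bytes would be released.
std::vector<int> PickEvictions(const std::vector<ResidentResource>& lruOrder,
                               std::uint64_t bytesToFree) {
    std::vector<int> picked;
    std::uint64_t freed = 0;
    for (const ResidentResource& r : lruOrder) {
        if (freed >= bytesToFree) {
            break;
        }
        picked.push_back(r.id);
        freed += r.sizeInBytes;
    }
    return picked;
}
```

In a real application the picked ids would be mapped back to ID3D12Pageable objects and passed to ID3D12Device::Evict().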
  4. Barnett

    DirectX 12 Multi Threading

      Thanks. That is a lot more permissive than I expected. I am going to have to go through my code and see where I have been overly conservative.
  5. Barnett

    DirectX 12 Multi Threading

    OK, thanks, that makes sense. What about sharing resources? Is it safe to call ID3D12Resource::GetGPUVirtualAddress() on more than one thread in order to use the same resource on different threads?
  6. I still have a great deal of uncertainty about what is and what is not thread safe in DX12.

     I understand that ID3D12Device and ID3D12CommandQueue objects are free threaded, so an application only has to create one instance of each and can then call their member functions from multiple threads simultaneously. On the other hand, ID3D12CommandAllocator and ID3D12GraphicsCommandList objects are not free threaded, so each thread needs to create its own instances of these.

     But what about things like ID3D12RootSignature and ID3D12PipelineState? Can a single instance of these be used simultaneously on multiple threads? E.g.:

         ID3D12CommandQueue*        pCQ;     // shared by both threads
         ID3D12GraphicsCommandList* pGCL_1;  // owned by thread 1
         ID3D12GraphicsCommandList* pGCL_2;  // owned by thread 2
         ID3D12RootSignature*       pRS;     // shared by both threads

         void Thread_1()
         {
           pGCL_1->SetGraphicsRootSignature( pRS );
           //...
           // ExecuteCommandLists() expects an array of ID3D12CommandList pointers.
           ID3D12CommandList* ppLists[] = { pGCL_1 };
           pCQ->ExecuteCommandLists( 1, ppLists );
         }

         void Thread_2()
         {
           pGCL_2->SetGraphicsRootSignature( pRS );
           //...
           ID3D12CommandList* ppLists[] = { pGCL_2 };
           pCQ->ExecuteCommandLists( 1, ppLists );
         }

     Would this be safe to do? How can I tell what is being done to the shared ID3D12RootSignature object, and how do I know if it is thread safe?
  7. Thanks for all your help - greatly appreciated.

     Can you perhaps shed some light on how presentation to secondary monitors is handled? How are they different from the main monitor? I cannot seem to get a secondary monitor to go into Independent Flip.

     Rendering to Main Monitor:

         0x00000218C02CADC0,DXGI,1,0,0,Hardware: Independent Flip,0,3.555798,16.532,16.637,0.508,0.509,15.843

     Same code rendering to Secondary Monitor:

         0x00000218C4C1FF10,DXGI,1,0,0,Composed: Flip,0,3.522093,15.864,16.663,0.222,0.377,49.643

     In a previous post you explained "True Immediate Independent Flip", which I think I understand now. But what then is the difference between "Direct Flip" and "Independent Flip"?
  8. I managed to get FlipModelD3D12 working very nicely with a solid 16.6ms one frame latency.

     PresentMon reports about 9.9ms latency, but that is from Present()->screen, so it will obviously be a bit shorter than the "Start Of Render"->screen latency displayed by FlipModelD3D12:

         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.065275,16.737,16.655,0.061,2.298,9.809
         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.081854,16.579,16.662,0.054,2.374,9.892
         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.098540,16.685,16.660,0.051,2.384,9.866
         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.115118,16.579,16.660,0.052,2.463,9.948
         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.132057,16.938,16.667,0.077,2.322,9.676
         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.148549,16.493,16.658,0.057,2.468,9.842
         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.165347,16.798,16.665,0.067,2.391,9.709
         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.181797,16.450,16.656,0.053,2.515,9.916
         FlipModelD3D12.exe,2772,0x000001E13F9D5800,DXGI,1,0,0,Hardware: Independent Flip,0,1.198429,16.632,16.662,0.051,2.582,9.945

     My hardware measurements confirm these numbers as well.

     To make it work I had to add an additional IDXGIOutput::WaitForVBlank() call directly after the wait on the waitable object. I think there is a bug in Windows causing the waitable object to be signalled one frame too early when in Independent Flip mode. Perhaps they forgot to compensate for the lower latency when switching from composition to Independent Flip? That is all I can think of.

     Obviously you cannot always just call WaitForVBlank() to add the one frame delay, because most of the time the waitable object works correctly. So you will have to wrap that call in some kind of if statement. The next problem will be to come up with the logic for doing that reliably...
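A small sketch for pulling the numbers out of PresentMon rows like the ones in this post. The column positions are an assumption read off those rows: 15 comma-separated fields, with PresentMode at index 7, MsBetweenPresents at index 10 and MsUntilDisplayed at index 14. `PresentSample`, `ParsePresentMonRow` and `MeanMsUntilDisplayed` are hypothetical names, and other PresentMon versions may use a different layout.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// The subset of one PresentMon row that is of interest here.
struct PresentSample {
    std::string presentMode;
    double msBetweenPresents;
    double msUntilDisplayed;
};

// Splits one CSV row on commas and extracts the fields named above.
// Returns false if the row does not have the assumed 15-column layout.
bool ParsePresentMonRow(const std::string& row, PresentSample& out) {
    std::vector<std::string> fields;
    std::stringstream ss(row);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    if (fields.size() != 15) {
        return false;
    }
    try {
        out.presentMode = fields[7];
        out.msBetweenPresents = std::stod(fields[10]);
        out.msUntilDisplayed = std::stod(fields[14]);
    } catch (const std::exception&) {
        return false;  // a numeric field could not be parsed
    }
    return true;
}

// Averages MsUntilDisplayed over all rows that parse; returns 0 if none do.
double MeanMsUntilDisplayed(const std::vector<std::string>& rows) {
    double sum = 0.0;
    int count = 0;
    for (const std::string& row : rows) {
        PresentSample s;
        if (ParsePresentMonRow(row, s)) {
            sum += s.msUntilDisplayed;
            ++count;
        }
    }
    return count > 0 ? sum / count : 0.0;
}
```

Averaging over a run makes it easier to compare the Present()->screen figure against an external measurement than eyeballing individual rows.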
  9. I changed FlipModelD3D12, adding a pulse around only the Present() call so that I can measure the latency from Present() to when the image is on the screen. I tested:

     1) FlipModelD3D12 running in a window with 2 buffers.
     2) FlipModelD3D12 running full screen with 2 buffers.
     3) FlipModelD3D12 running full screen with 3 buffers.

     The results are:

                                 PresentMon MsUntilDisplayed (ms)   Hardware measurement (ms)
     1) Windowed 2 buffers       26.3                               27.6
     2) Fullscreen 2 buffers     22.9                               23.6
     3) Fullscreen 3 buffers     26.3                               27.6

     I think that looks pretty good - the hardware measurement is out by maybe one ms, but that is not unexpected considering the simple measuring technique. So what do you think can be wrong with FlipModelD3D12 that makes fullscreen no better than windowed? Is there perhaps another sample I can try that is known to be good?
  10. When changing the number of buffers to 3, the diagram displayed by FlipModelD3D12 looks slightly different. However, the latency is still two frames.

      If FlipModelD3D12 is not working correctly now, it must have been working before, because the image on Intel's web page looks like they had it running with one frame of latency.

      I am going to see if I can move the timing pulse to wrap only the Present() call, to see how the hardware measurements compare with PresentMon.
  11. Yes, I am sure. I repeated the tests multiple times, now also on two different computers.

      In order to rule out the possibility that I am doing something wrong I thought I would start with a working sample, Intel's FlipModelD3D12 sample as documented here. But as I will show below, "Direct Flip" does not seem to work, not even with this unmodified sample.

      Firstly, running in a window I get the expected best case of two frames of latency - one frame for rendering plus one frame for compositing.

      Next, going to fullscreen to enable "Direct Flip" (or "Independent Flip"), the results however remain the same - still two frames of latency.

      This is the output of PresentMon during the window-to-fullscreen transition:

      Application         ProcessID  SwapChainAddress    Runtime  SyncInterval  AllowsTearing  PresentFlags  PresentMode                 Dropped  TimeInSeconds  MsBetweenPresents  MsBetweenDisplayChange  MsInPresentAPI  MsUntilRenderComplete  MsUntilDisplayed
      ------------------  ---------  ------------------  -------  ------------  -------------  ------------  --------------------------  -------  -------------  -----------------  ----------------------  --------------  ---------------------  ----------------
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Composed: Flip              0        3.472288       16.614             16.659                  0.195           0.74                   22.975
                          1008       0x000001E61C854480  DXGI     1             0              0             Hardware: Legacy Flip       0        3.495689       16.659             16.715                  0.228           0.449                  16.251
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Composed: Flip              0        3.488948       16.659             16.707                  0.218           0.808                  23.023
                          1008       0x000001E61C854480  DXGI     1             0              0             Hardware: Legacy Flip       0        3.51256        16.871             16.616                  0.184           0.445                  15.996
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Composed: Flip              0        3.505638       16.69              16.622                  0.238           0.645                  22.954
                          1008       0x000001E61C854480  DXGI     1             0              0             Hardware: Legacy Flip       0        3.529297       16.737             16.669                  0.166           0.475                  15.928
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Composed: Flip              0        3.522536       16.897             16.66                   0.207           0.575                  22.717
                          1008       0x000001E61C854480  DXGI     1             0              0             Hardware: Legacy Flip       0        3.546146       16.848             16.73                   0.14            0.117                  15.809
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Composed: Flip              0        3.53896        16.424             16.76                   0.196           0.787                  23.052
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Composed: Flip              1        3.555698       16.738             0                       0.241           1.015                  0
                          1008       0x000001E61C854480  DXGI     1             0              0             Hardware: Legacy Flip       0        3.562076       15.93              16.712                  0.119           0.12                   16.591
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Hardware: Independent Flip  0        3.572275       16.578             33.322                  0.213           8.253                  23.059
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Hardware: Independent Flip  0        3.590761       18.486             16.669                  0.204           6.514                  21.242
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Hardware: Independent Flip  0        3.607514       16.753             16.73                   0.213           6.378                  21.219
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Hardware: Independent Flip  0        3.624395       16.881             16.632                  0.261           6.284                  20.97
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Hardware: Independent Flip  0        3.640937       16.542             16.696                  0.213           6.308                  21.125
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Hardware: Independent Flip  0        3.657528       16.591             16.657                  0.206           6.488                  21.191
      FlipModelD3D12.exe  6996       0x000001F27D6B42C0  DXGI     1             0              0             Hardware: Independent Flip  0        3.674217       16.689             16.698                  0.2             6.37                   21.2

      So PresentMon says that in fullscreen we are in Independent Flip, but according to the FlipModelD3D12 sample we are still getting two frames of latency.

      In order to resolve this discrepancy I added six lines of code to the FlipModelD3D12 sample program that allow me to measure the latency with an oscilloscope. (Basically all I do is, for one in every eight frames, set an RS232 port line high at the start of render, and low again at the end of render. I also set the background color to black in this frame so that I can pick it up on screen with a photodiode.)

      So the latency measured is about 33ms (two frames), as predicted by the FlipModelD3D12 app, and NOT the 16ms (or one frame) you would expect for Independent Flip mode.

      Originally I did all these tests on my main PC, which is:

        Windows 10 build 10586, multi monitor, Nvidia discrete GPU.

      Thinking that there is something wrong with this computer, I repeated everything on a:

        Windows 10 build 14393, single monitor, Intel integrated GPU.

      But the results are exactly the same.

      I don't really know what to try next - any help would be appreciated.
  12. Sorry, one more thing... I watched your "Presentation Modes" video about 101 times but still don't understand the difference between "Independent Flip" and "True Immediate Independent Flip". Can you perhaps point me to some additional information on the "True Immediate Independent Flip" mode, and under what conditions it becomes active?   Also you mentioned in the video that with a DXGI_SWAP_EFFECT_FLIP_DISCARD backbuffer DXGI will render things like the volume control directly on my backbuffer while staying in Independent Flip mode. I have not been able to reproduce this behaviour - as soon as the volume control comes up I can see that DXGI is adding in an extra frame of latency... or am I missing something?
  13. According to PresentMon everything is fine, but in reality (when measuring the light coming out of the screen), all is not as it seems. I did the following four tests:

      Test 1: Using a "waitable object" on a non-fullscreen window.
      Test 2: Using a "waitable object" on a fullscreen window.
      Test 3: Using a "wait on barrier" on a non-fullscreen window.
      Test 4: Using a "wait on barrier" on a fullscreen window.

      Below is the output from PresentMon. (I added a column on the far right with the actual latency as measured using an oscilloscope.)

      Runtime  SyncInterval  AllowsTearing  PresentFlags  PresentMode                 Dropped  TimeInSeconds  MsBetweenPresents  MsBetweenDisplayChange  MsInPresentAPI  MsUntilRenderComplete  MsUntilDisplayed  Measured Latency (ms)
      -------  ------------  -------------  ------------  --------------------------  -------  -------------  -----------------  ----------------------  --------------  ---------------------  ----------------  ---------------------
      Test 1:
      DXGI     1             0              64            Composed: Flip              0        4.134419       16.581             16.756                  0.488           0.429                  32.617            35
      DXGI     1             0              64            Composed: Flip              0        4.151078       16.659             16.605                  0.506           0.5                    32.563            35
      DXGI     1             0              64            Composed: Flip              0        4.167767       16.689             16.673                  0.39            0.512                  32.547            35
      Test 2:
      DXGI     1             0              64            Hardware: Independent Flip  0        4.396671       16.611             16.717                  0.466           0.426                  16.311            35
      DXGI     1             0              64            Hardware: Independent Flip  0        4.413443       16.772             16.648                  0.382           0.396                  16.187            35
      DXGI     1             0              64            Hardware: Independent Flip  0        4.430011       16.568             16.734                  0.397           0.41                   16.353            35
      Test 3:
      DXGI     1             0              64            Composed: Flip              0        2.242991       16.301             16.689                  0.371           0.431                  32.319            35
      DXGI     1             0              64            Composed: Flip              0        2.259456       16.465             16.67                   0.376           0.347                  32.524            35
      DXGI     1             0              64            Composed: Flip              0        2.276224       16.768             16.694                  0.359           0.434                  32.45             35
      Test 4:
      DXGI     1             0              64            Hardware: Independent Flip  0        3.005195       16.478             16.696                  0.43            0.447                  15.927            19
      DXGI     1             0              64            Hardware: Independent Flip  0        3.021999       16.804             16.679                  0.387           0.394                  15.802            19
      DXGI     1             0              64            Hardware: Independent Flip  0        3.038641       16.642             16.72                   0.383           0.391                  15.88             19

      Note that in Test 2 (i.e. using a waitable object on a fullscreen window) there is a discrepancy of roughly one frame (about 19ms) between what PresentMon says the latency is and what is measured in hardware. It might be that I am doing something wrong with the waitable object - I will keep looking... I will also try the Windows 10 upgrade.
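The comparison between the PresentMon column and the oscilloscope column can be expressed in whole frames. This is a minimal sketch that assumes a 60 Hz display, which the MsBetweenDisplayChange values of about 16.7ms suggest but do not prove. `LatencyInFrames` and `DiscrepancyInFrames` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Rounds a latency in milliseconds to the nearest whole number of refresh intervals.
// refreshHz defaults to 60, which is an assumption about the monitor.
int LatencyInFrames(double latencyMs, double refreshHz = 60.0) {
    assert(refreshHz > 0.0);
    const double frameMs = 1000.0 / refreshHz;
    return static_cast<int>(std::lround(latencyMs / frameMs));
}

// Difference between an externally measured latency and the latency reported
// by PresentMon (MsUntilDisplayed), rounded to whole frames.
int DiscrepancyInFrames(double reportedMs, double measuredMs, double refreshHz = 60.0) {
    return LatencyInFrames(measuredMs - reportedMs, refreshHz);
}
```

For the first Test 2 row, `DiscrepancyInFrames(16.311, 35.0)` gives 1, while the first Test 4 row, `DiscrepancyInFrames(15.927, 19.0)`, gives 0. Note that the oscilloscope measurement starts at the beginning of rendering rather than at Present(), so a small non-zero offset is expected even when the two agree.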
  14. I think this is something else - I am on Build 10586.494. Windows Update says: "Your device is up to date. Last checked: 2016/08/09, 00:35"

      I narrowed the deadlock down to a ResourceBarrier I have that straddles VSync. I set a barrier from PRESENT to RENDER_TARGET directly *after* Present(), followed by a Signal()+SetEventOnCompletion()+WaitForSingleObject(). This is the only way I have been able to achieve "Direct Flip" latency.

      The usual method of waiting on a WAITABLE_OBJECT after Present() does not seem to work: whether the window covers only a portion of the monitor or the entire monitor, I always get the same latency of about 34ms. (I measure this with an oscilloscope that I trigger when I start to render a new frame, and then see how long it takes for the change to appear on the screen, as picked up by a photo diode that I taped to the monitor.)

      If instead I remove the wait on the WAITABLE_OBJECT and replace it with a wait on the resource barrier, I get the expected behaviour of "Direct Flip", with latency going down to 18ms (a 16ms reduction) for the case where the window covers the entire screen.

      This worked, but is now causing the deadlock when I do the same thing on two screens simultaneously. I suppose I could also go back to using waitable objects, but then I won't get the lower latency of "Direct Flip". Is there some other way of doing the timing?
  15. Is ID3D12CommandQueue not free threaded? I thought the general idea is that multiple threads can record multiple ID3D12GraphicsCommandLists in parallel, and submit them in parallel to a single ID3D12CommandQueue?

      But anyway, that is not what I am doing. I created two separate ID3D12CommandQueues. Or are they perhaps one and the same internally?