turanszkij

DX12 WriteBufferImmediate use-cases


I am writing a DX12 graphics wrapper and I would like to update constant buffers. I found the ID3D12GraphicsCommandList2::WriteBufferImmediate method, which is apparently only available from the Windows 10 Creators Update onward. I couldn't really find any information about it (and couldn't try it yet). Am I correct to assume this would be useful for writing to constant buffers without much need for synchronization? It seems to me like this method copies data into the command list itself, and that data will then be copied into the DEFAULT resource address I provided? The only synchronization needed here would be transition barriers to COPY_DEST before WriteBufferImmediate() and back to GENERIC_READ afterwards? I could be totally off though; I'm still wrapping my head around a lot of things.

What other use cases would this method allow for?


At a high level, no, that is not its intended use. Using MODE_DEFAULT would (probably) cause the graphics pipeline to stall/drain every time you issue one of these writes, which would kill performance. Using either of the other modes could cause the writes to happen too soon (affecting draws already in flight) or too late (after all previous draws in flight are fully finished, not necessarily in time for the next one).

Its intended use is for checking progress of GPU execution, specifically when the GPU has faulted and the device has become removed. If you use WriteBufferImmediate to insert "breadcrumbs" at the top of pipe and bottom of pipe, and the GPU faults, you can inspect these breadcrumbs to see which workloads had started but not finished - i.e. which workloads could have possibly contributed to the fault.
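
For illustration, here is a minimal sketch of such breadcrumbs, assuming a small buffer set aside for them and kept in the COPY_DEST (or COMMON) state; the buffer, offsets, and helper names are hypothetical. After a device-removed error, a workload whose begin marker is present but whose end marker is missing had started but not finished.

    #include <d3d12.h>

    // 'breadcrumbBuffer' is a hypothetical small buffer reserved for progress markers,
    // kept in the COPY_DEST (or COMMON) state and inspected on the CPU after a fault.
    void WriteBreadcrumbBegin(ID3D12GraphicsCommandList2* cmd,
                              ID3D12Resource* breadcrumbBuffer, UINT32 workloadID)
    {
        // MARKER_IN: the value lands when the GPU front-end reaches this point,
        // i.e. the workload recorded after this call has at least started.
        D3D12_WRITEBUFFERIMMEDIATE_PARAMETER p = { breadcrumbBuffer->GetGPUVirtualAddress(), workloadID };
        D3D12_WRITEBUFFERIMMEDIATE_MODE mode = D3D12_WRITEBUFFERIMMEDIATE_MODE_MARKER_IN;
        cmd->WriteBufferImmediate(1, &p, &mode);
    }

    void WriteBreadcrumbEnd(ID3D12GraphicsCommandList2* cmd,
                            ID3D12Resource* breadcrumbBuffer, UINT32 workloadID)
    {
        // MARKER_OUT: the value lands only after all preceding work has drained,
        // i.e. the workload recorded before this call has fully finished.
        D3D12_WRITEBUFFERIMMEDIATE_PARAMETER p = { breadcrumbBuffer->GetGPUVirtualAddress() + sizeof(UINT32), workloadID };
        D3D12_WRITEBUFFERIMMEDIATE_MODE mode = D3D12_WRITEBUFFERIMMEDIATE_MODE_MARKER_OUT;
        cmd->WriteBufferImmediate(1, &p, &mode);
    }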


SoldierOfLight is correct that one of the main purposes of WriteBufferImmediate is to provide writes that are synchronized with work in the pipeline, via the MARKER_IN and MARKER_OUT modes. However, MODE_DEFAULT is not a synchronizing operation. In fact, the purpose of MODE_DEFAULT is to enable quick, sporadic writes to buffer locations, such as updating a few constants. This also eliminates the need for an upload-heap staging buffer in these cases.

The buffer must be in either the COPY_DEST or COMMON state (COMMON will be promoted to COPY_DEST).

We would love to hear your feedback on how this affects your application performance.  Also, let me know if you have any more questions.
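
For illustration, a minimal sketch of what a MODE_DEFAULT patch of a couple of constants might look like, assuming the destination is a DEFAULT-heap constant buffer and the d3dx12.h barrier helpers are available (function and variable names are hypothetical):

    #include <d3d12.h>
    #include "d3dx12.h"  // CD3DX12_RESOURCE_BARRIER helpers

    // Hypothetical: patch two 32-bit constants in a DEFAULT-heap constant buffer.
    void PatchConstants(ID3D12GraphicsCommandList2* cmd, ID3D12Resource* constantBuffer,
                        UINT32 value0, UINT32 value1)
    {
        D3D12_GPU_VIRTUAL_ADDRESS base = constantBuffer->GetGPUVirtualAddress();

        // The destination must be in COPY_DEST (or COMMON, which gets promoted).
        CD3DX12_RESOURCE_BARRIER toCopyDest = CD3DX12_RESOURCE_BARRIER::Transition(
            constantBuffer,
            D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER,  // or GENERIC_READ, as in the original question
            D3D12_RESOURCE_STATE_COPY_DEST);
        cmd->ResourceBarrier(1, &toCopyDest);

        // Each parameter writes one 32-bit value to one GPU virtual address.
        D3D12_WRITEBUFFERIMMEDIATE_PARAMETER params[2] =
        {
            { base + 0, value0 },
            { base + 4, value1 },
        };
        // Passing null modes uses MODE_DEFAULT for every element.
        cmd->WriteBufferImmediate(2, params, nullptr);

        CD3DX12_RESOURCE_BARRIER toRead = CD3DX12_RESOURCE_BARRIER::Transition(
            constantBuffer,
            D3D12_RESOURCE_STATE_COPY_DEST,
            D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER);
        cmd->ResourceBarrier(1, &toRead);
    }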


Right, to clarify, the barrier to COPY_DEST would cause the write to be serialized with the previous read operation. However, if you weren't previously reading from the resource in the command list, then yes, WriteBufferImmediate is an excellent replacement for CopyBufferRegion.

On 05/12/2017 at 7:19 PM, SoldierOfLight said:

Right, to clarify, the barrier to COPY_DEST would cause the write to be serialized with the previous read operation. However, if you weren't previously reading from the resource in the command list, then yes, WriteBufferImmediate is an excellent replacement for CopyBufferRegion.

Right now I am using an upload-heap allocator: the CPU writes into it and then a CopyBufferRegion is issued into a default-heap resource. Each buffer has its own default-heap resource, while the upload heap is a single global heap shared by all buffers. This way, on each update I will have:

  1. allocate next chunk from upload heap
  2. memcpy into upload heap
  3. transition barrier from constant buffer to copy_dest
  4. CopyBufferRegion(default_heap, 0, upload_heap, upload_heap_offset, dataSize)
  5. transition back to constant buffer
  6. bind constant buffer to pixel shader

Do you think this would be an acceptable/standard way of doing this? I couldn't test performance yet; I'm just setting everything up. The data looks correct in the debugger.

The WriteBufferImmediate path would be nearly the same, except that the constant data gets copied into the command list itself rather than into the upload heap (a code sketch of the current flow is below).
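
For reference, a minimal sketch of the listed steps, assuming a hypothetical ring allocator over a single persistently mapped UPLOAD-heap buffer (all names here are placeholders):

    #include <cstring>
    #include <d3d12.h>
    #include "d3dx12.h"  // CD3DX12_RESOURCE_BARRIER helpers

    // Hypothetical chunk handed out by a ring allocator over a mapped UPLOAD-heap buffer.
    struct UploadAllocation
    {
        ID3D12Resource* buffer;   // the shared UPLOAD-heap buffer
        UINT64 offset;            // offset of this chunk within that buffer
        void* cpuAddress;         // mapped CPU pointer to the chunk
    };

    void UpdateConstantBuffer(ID3D12GraphicsCommandList* cmd,
                              ID3D12Resource* defaultHeapBuffer,
                              const UploadAllocation& chunk,   // step 1: allocated by the caller
                              const void* data, UINT64 dataSize)
    {
        // step 2: memcpy into the upload heap
        std::memcpy(chunk.cpuAddress, data, dataSize);

        // step 3: transition the destination from constant-buffer reads to COPY_DEST
        CD3DX12_RESOURCE_BARRIER toCopyDest = CD3DX12_RESOURCE_BARRIER::Transition(
            defaultHeapBuffer,
            D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER,
            D3D12_RESOURCE_STATE_COPY_DEST);
        cmd->ResourceBarrier(1, &toCopyDest);

        // step 4: copy from the upload chunk into the default-heap buffer
        cmd->CopyBufferRegion(defaultHeapBuffer, 0, chunk.buffer, chunk.offset, dataSize);

        // step 5: transition back so the buffer can be read as a constant buffer again
        CD3DX12_RESOURCE_BARRIER toRead = CD3DX12_RESOURCE_BARRIER::Transition(
            defaultHeapBuffer,
            D3D12_RESOURCE_STATE_COPY_DEST,
            D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER);
        cmd->ResourceBarrier(1, &toRead);

        // step 6: bind, e.g.
        // cmd->SetGraphicsRootConstantBufferView(rootParamIndex, defaultHeapBuffer->GetGPUVirtualAddress());
    }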


What you've implemented is the D3D11 equivalent of UpdateSubresource on a default constant buffer. WriteBufferImmediate would be roughly the same thing. In my experience, most people prefer to implement the D3D11 equivalent of Map(DISCARD) on a dynamic constant buffer, which would mean just binding your upload heap directly to the pixel shader.
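
A minimal sketch of that dynamic approach, assuming a hypothetical per-frame ring over a persistently mapped UPLOAD-heap buffer and a root-signature slot that takes a root CBV (constant buffer GPU addresses must be 256-byte aligned):

    #include <cstring>
    #include <d3d12.h>

    // Hypothetical per-frame ring over an UPLOAD-heap buffer that stays mapped forever.
    struct FrameUploadRing
    {
        ID3D12Resource* buffer;   // UPLOAD-heap buffer, mapped once at creation
        UINT8* cpuBase;           // pointer returned by Map(0, nullptr, ...)
        UINT64 offset;            // bump pointer, reset once the frame's fence has completed
    };

    void BindDynamicConstants(ID3D12GraphicsCommandList* cmd, FrameUploadRing& ring,
                              UINT rootParamIndex, const void* data, UINT64 dataSize)
    {
        // Grab a 256-byte-aligned chunk from this frame's ring (the Map(DISCARD) analogue).
        ring.offset = (ring.offset + 255) & ~255ull;
        const UINT64 chunkOffset = ring.offset;
        ring.offset += dataSize;

        // Write the constants straight into the persistently mapped upload memory.
        std::memcpy(ring.cpuBase + chunkOffset, data, dataSize);

        // Bind the upload-heap address directly; no copy to a DEFAULT heap and no barriers.
        cmd->SetGraphicsRootConstantBufferView(
            rootParamIndex, ring.buffer->GetGPUVirtualAddress() + chunkOffset);
    }

The chunk just has to stay untouched until the GPU has finished the frame that reads it, which is where the per-frame fencing discussed below comes in.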


Thanks for this, great information. 

12 hours ago, MJP said:

You'll also have to track your fence on the DIRECT queue  to know when to free your chunk from the UPLOAD heap.

I've been thinking about skipping per-allocation fence tracking and just freeing the UPLOAD heaps at frame start. I have a unique upload heap per frame for double (or triple) buffering. I will only wait on a fence when there are no more frame slots available to queue up, i.e. when the GPU hasn't yet finished the oldest in-flight frame (as sketched below).
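
A minimal sketch of that per-frame scheme, assuming one hypothetical FrameContext per buffered frame (the fence, event, and frame count are placeholders):

    #include <windows.h>
    #include <d3d12.h>

    static const UINT kFrameCount = 2;   // double buffering (3 for triple)

    // Hypothetical per-frame state: the fence value signaled when this frame was submitted,
    // plus that frame's upload heap / ring allocator.
    struct FrameContext
    {
        UINT64 fenceValue = 0;
        // UploadRing uploadRing;  // hypothetical per-frame upload heap
    };

    void BeginFrame(ID3D12Fence* fence, HANDLE fenceEvent,
                    FrameContext frames[kFrameCount], UINT frameIndex)
    {
        FrameContext& frame = frames[frameIndex];

        // Only wait if the GPU hasn't finished the frame that last used this slot.
        if (fence->GetCompletedValue() < frame.fenceValue)
        {
            fence->SetEventOnCompletion(frame.fenceValue, fenceEvent);
            WaitForSingleObject(fenceEvent, INFINITE);
        }

        // Safe to reset this frame's upload heap allocations now.
        // frame.uploadRing.Reset();  // hypothetical
    }

    void EndFrame(ID3D12CommandQueue* directQueue, ID3D12Fence* fence,
                  FrameContext frames[kFrameCount], UINT frameIndex, UINT64& nextFenceValue)
    {
        // Signal on the DIRECT queue after submitting this frame's command lists.
        frames[frameIndex].fenceValue = ++nextFenceValue;
        directQueue->Signal(fence, nextFenceValue);
    }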

That's also a very interesting way of using the copy queue. I won't get into that yet, but it seems like an interesting technique. I heard that the copy queue could be slower, but since it uses different hardware units, overall utilization could be better. Could you compare this across different hardware vendors as well?

This also cleared up some confusion, thanks for this:

18 hours ago, SoldierOfLight said:

What you've implemented is the D3D11 equivalent of UpdateSubresource on a default constant buffer. WriteBufferImmediate would be roughly the same thing. In my experience, most people prefer to implement the D3D11 equivalent of Map(DISCARD) on a dynamic constant buffer, which would mean just binding your upload heap directly to the pixel shader.

I had thought that you can't bind an upload heap as a shader resource. I guess this way you are creating constant buffer views for each allocation from the heap, for binding via descriptor tables? Does an UPLOAD heap that is visible to shaders need to be unmapped, or can it stay mapped forever?

3 hours ago, turanszkij said:

I guess this way you are creating constant buffer views for each allocation from the heap, for binding via descriptor tables? Does an UPLOAD heap that is visible to shaders need to be unmapped, or can it stay mapped forever?

Yep, that's right, and no it doesn't need to be unmapped.
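
For completeness, a minimal sketch of creating a CBV over one suballocation of a persistently mapped UPLOAD-heap buffer (the descriptor handle comes from a descriptor heap; names are hypothetical):

    #include <d3d12.h>

    // Hypothetical: describe one chunk of a mapped UPLOAD-heap buffer as a constant buffer view.
    // The buffer is mapped once at creation and never unmapped.
    void CreateCBVForAllocation(ID3D12Device* device, ID3D12Resource* uploadBuffer,
                                UINT64 chunkOffset, UINT dataSize,
                                D3D12_CPU_DESCRIPTOR_HANDLE destDescriptor)
    {
        D3D12_CONSTANT_BUFFER_VIEW_DESC cbvDesc = {};
        cbvDesc.BufferLocation = uploadBuffer->GetGPUVirtualAddress() + chunkOffset;
        cbvDesc.SizeInBytes = (dataSize + 255) & ~255u;   // CBV sizes must be multiples of 256
        device->CreateConstantBufferView(&cbvDesc, destDescriptor);
    }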

11 hours ago, turanszkij said:

That's also a very interesting way of using the copy queue. I won't get into that yet, but it seems like an interesting technique. I heard that the copy queue could be slower, but since it uses different hardware units, overall utilization could be better. Could you compare this across different hardware vendors as well?

I don't have any comprehensive numbers at the moment, so I'll have to try to set up a benchmark at some point. I would guess that the difference would be pretty minimal unless you're uploading a very large buffer. For me it was also somewhat convenient to use the COPY queue since I already had a system in place for initializing resources using the COPY queue, and the buffer updates go through the same system. The IHVs have recommended using the COPY queue for resource initialization, since the DMA units are optimized for pulling lots of data over the PCI-e bus without disrupting rendering too much (which is necessary in D3D11 games that stream in new textures while gameplay is going on).
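
A minimal sketch of pushing such uploads through a dedicated COPY queue and making the DIRECT queue wait on them with a fence (the queues, fence, and command list are assumed to exist already; names are placeholders):

    #include <d3d12.h>

    // Hypothetical: execute recorded upload copies on the COPY queue, then make the
    // DIRECT queue wait for them on the GPU timeline before rendering uses the data.
    void SubmitUploads(ID3D12CommandQueue* copyQueue,
                       ID3D12GraphicsCommandList* copyCommandList,  // recorded from a COPY-type allocator
                       ID3D12CommandQueue* directQueue,
                       ID3D12Fence* copyFence, UINT64& copyFenceValue)
    {
        // The copy command list holds CopyBufferRegion / CopyTextureRegion calls from
        // UPLOAD-heap staging memory into DEFAULT-heap resources.
        copyCommandList->Close();
        ID3D12CommandList* lists[] = { copyCommandList };
        copyQueue->ExecuteCommandLists(1, lists);

        // Signal on the COPY queue when the DMA work is done...
        copyQueue->Signal(copyFence, ++copyFenceValue);

        // ...and make the DIRECT queue wait for it (a GPU-side wait, no CPU stall)
        // before any work that reads the freshly uploaded resources.
        directQueue->Wait(copyFence, copyFenceValue);
    }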
