mark_braga

DX12 ID3D12CommandAllocator questions


I am working on making a DX12/Vulkan framework that runs the CPU and GPU in parallel.

Decided to finish the Vulkan implementation before DX12. (Eat the veggies before having the steak XDD)

I have a few questions about the usage of ID3D12CommandAllocator:

  • Different-sized command lists should use different allocators so the allocators don't grow to worst-case size
    • Does this mean that I need to know the size of the command list before calling CreateCommandList and pass the appropriate allocator?
  • Try to keep the number of allocators to a minimum
    • What are the pitfalls if I create a command allocator per list? That way each allocator would never grow too large for its list. In addition, there would be no need for synchronization.

Most of the examples I have seen just use a pool of allocators and do fence-based synchronization. I can modify that to also take command list size into account, but before I do, any advice on this would really help me understand the internal workings of ID3D12CommandAllocator better.


The idea behind reusing allocators is to reuse the blocks of memory needed for command list generation. There is no single strategy here, especially because you don't know how the allocators behave internally or where the sweet spots may be.

 

I can at best give you ONE idea on how to reuse them :) Because an allocator can only be bound to one active command list at a time, you can create a command allocator per thread that produces command lists (imagine a job queue system, where a job produces a command list). You can then have a pool of allocators to alternate between frames, giving the GPU enough time to execute them before you need to reset a set of allocators again.
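A minimal sketch of that scheme, assuming a fixed number of in-flight frame slots and recording threads; the counts, names, and helper functions below are illustrative assumptions, not from the post:

    #include <windows.h>
    #include <d3d12.h>

    // Sketch: one allocator per recording thread, rotated across in-flight frames.
    static const UINT kFrameCount  = 3; // frame slots that may be in flight at once
    static const UINT kThreadCount = 4; // command-list-producing jobs per frame

    ID3D12CommandAllocator* gAllocators[kFrameCount][kThreadCount];
    UINT64 gFrameFence[kFrameCount] = {}; // fence value signaled when each slot was submitted

    void CreateAllocators(ID3D12Device* device)
    {
        for (UINT f = 0; f < kFrameCount; ++f)
            for (UINT t = 0; t < kThreadCount; ++t)
                device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                               IID_PPV_ARGS(&gAllocators[f][t]));
    }

    // At the start of a frame slot: wait until the GPU has consumed the work
    // recorded into this slot kFrameCount frames ago, then every recording
    // thread can safely Reset() its own allocator without locking.
    void BeginFrame(UINT frame, ID3D12Fence* fence, HANDLE fenceEvent)
    {
        if (fence->GetCompletedValue() < gFrameFence[frame])
        {
            fence->SetEventOnCompletion(gFrameFence[frame], fenceEvent);
            WaitForSingleObject(fenceEvent, INFINITE);
        }
        for (UINT t = 0; t < kThreadCount; ++t)
            gAllocators[frame][t]->Reset(); // safe: GPU is done with this slot
    }

Each thread then calls ID3D12GraphicsCommandList::Reset with its own allocator, so no cross-thread synchronization is needed while recording.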

Edited by galop1n

On 8/9/2017 at 6:20 PM, mark_braga said:

Different-sized command lists should use different allocators so the allocators don't grow to worst-case size

  • Does this mean that I need to know the size of the command list before calling CreateCommandList and pass the appropriate allocator?

 

Yes to the first part; not exactly (but close) to the bulleted question. If you can swing it, you should consider using the same or similar allocators for the same command lists. For example, let's say you're doing deferred rendering and are drawing some number of items. Your pass to fill the gbuffer may take (for example's sake) 200 calls to various functions on the command list, which are written to the allocator. Your shadowmap pass has to run once per shadow-casting light, so you'll have somewhere in the neighborhood of 200 * numLights entries in the command list, and therefore in that allocator. If in a subsequent frame you use the allocator you initially used for the gbuffer pass for the shadowmap pass, that allocator will grow to the worst-case size, i.e., the size required by the shadowmap pass. If instead you keep that allocator designated for gbuffer work only, it'll only be sized for your worst-case gbuffer pass, which can save on your overall memory budget.

To answer whether you need to know the size of the command list before calling CreateCommandList: not precisely, but it's helpful to have some idea of how the command list will be used so you have a ballpark idea of how it will affect the allocator bound to it.
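To make the "designated allocator" idea concrete, here is a minimal sketch that keys allocators by pass, so each one only ever grows to the worst case of its own pass. The pass names, double-buffer count, and helper are illustrative assumptions:

    #include <windows.h>
    #include <d3d12.h>

    // Sketch: dedicate allocators per pass so each grows only to that pass's worst case.
    enum Pass { PASS_GBUFFER, PASS_SHADOWS, PASS_COUNT };

    // Double-buffered so the GPU can consume frame N-1 while the CPU records frame N.
    ID3D12CommandAllocator* gPassAllocators[2][PASS_COUNT];

    void RecordGBufferPass(ID3D12GraphicsCommandList* list, UINT frameIndex)
    {
        ID3D12CommandAllocator* alloc = gPassAllocators[frameIndex & 1][PASS_GBUFFER];
        alloc->Reset();              // assumes a fence elsewhere proved the GPU is done with it
        list->Reset(alloc, nullptr); // the list is reusable; the allocator stays gbuffer-only
        // ... record the ~200 gbuffer calls; their memory lands in 'alloc' ...
        list->Close();
    }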

On 8/9/2017 at 6:20 PM, mark_braga said:

Try to keep the number of allocators to a minimum

  • What are the pitfalls if I create a command allocator per list? That way each allocator would never grow too large for its list. In addition, there would be no need for synchronization.

 

Keep them to a minimum, but don't dig yourself into a hole. You can reuse an existing command list to record into a different allocator than the one you initially created it with, while the initial allocator's contents are being consumed by the GPU. You still must set and track your fence values to be sure that the GPU is done with an allocator before you reset and reuse it. You should create enough allocators to cover your maximum latency, but if it comes down to it, you must still be able to block and wait for the GPU to finish its work when all the command allocators are occupied. This is like the pool of allocators @galop1n mentioned in his post.
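A sketch of that fence bookkeeping, assuming a simple ring of allocators submitted to a single direct queue; the struct, its size, and its method names are my own illustration, not an API:

    #include <windows.h>
    #include <d3d12.h>

    // Sketch: a ring of allocators; each remembers the fence value that must
    // complete before it may be reset. Blocks only when every slot is in flight.
    struct AllocatorRing
    {
        static const UINT kCount = 3; // enough to cover your maximum latency
        ID3D12CommandAllocator* allocators[kCount] = {};
        UINT64 fenceValues[kCount] = {};
        UINT cursor = 0;

        ID3D12CommandAllocator* Acquire(ID3D12Fence* fence, HANDLE fenceEvent)
        {
            if (fence->GetCompletedValue() < fenceValues[cursor])
            {
                // GPU still owns this allocator: block until it is done.
                fence->SetEventOnCompletion(fenceValues[cursor], fenceEvent);
                WaitForSingleObject(fenceEvent, INFINITE);
            }
            allocators[cursor]->Reset();
            return allocators[cursor];
        }

        // Call right after ExecuteCommandLists: signal the fence and remember
        // which value guards the allocator that was just submitted.
        void Retire(ID3D12CommandQueue* queue, ID3D12Fence* fence, UINT64 value)
        {
            queue->Signal(fence, value);
            fenceValues[cursor] = value;
            cursor = (cursor + 1) % kCount;
        }
    };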



