
DX12 ID3D12CommandAllocator questions


mark_braga

I am working on making a DX12/Vulkan framework that keeps the CPU and GPU running in parallel.

I decided to finish the Vulkan implementation before DX12. (Eat the veggies before having the steak XDD)

I have a few questions about the usage of ID3D12CommandAllocator:

  • Different-sized command lists should use different allocators so the allocators don't grow to the worst-case size.
    • Does this mean that I need to know the size of a command list before calling CreateCommandList so I can pass the appropriate allocator?
  • Try to keep the number of allocators to a minimum.
    • What are the pitfalls if I create a command allocator per command list? That way each allocator never grows too large for its list, and there is no need for synchronization.

Most of the examples I have seen just use a pool of allocators with fence-based synchronization. I can modify that to also take command list size into account, but before I do, any advice on this would really help me understand the internal workings of ID3D12CommandAllocator better.

galop1n

The idea behind reusing allocators is to reuse the blocks of memory needed for command list generation. There is no single strategy here, especially because you don't know how the allocators behave internally or where the sweet spots may be.

 

I can at best give you ONE idea on how to reuse them :) Because an allocator can only be bound to one active command list at a time, you can create a command allocator per thread that produces command lists (imagine a job queue system where a job produces a command list). You can then have a pool of allocators to alternate between frames, which gives the GPU enough time to execute them before you again need to reset a set of allocators.
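In code, the layout could look roughly like this (a minimal sketch; the frame/thread counts and the BeginWorkerRecording helper are placeholders, and error handling is omitted):

#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

constexpr UINT kFrameCount    = 2; // frames in flight (placeholder value)
constexpr UINT kWorkerThreads = 4; // threads recording command lists (placeholder value)

// One allocator per (frame, thread): a thread always records into its own
// allocator, and rotating by frame index gives the GPU time to finish with
// an allocator before it is reset again.
ComPtr<ID3D12CommandAllocator> g_allocators[kFrameCount][kWorkerThreads];

void CreateAllocators(ID3D12Device* device)
{
    for (UINT frame = 0; frame < kFrameCount; ++frame)
        for (UINT thread = 0; thread < kWorkerThreads; ++thread)
            device->CreateCommandAllocator(
                D3D12_COMMAND_LIST_TYPE_DIRECT,
                IID_PPV_ARGS(&g_allocators[frame][thread]));
}

// Called by a worker at the start of a frame, after a fence has confirmed
// the GPU is done with this frame slot.
void BeginWorkerRecording(UINT frameIndex, UINT threadIndex,
                          ID3D12GraphicsCommandList* commandList)
{
    ID3D12CommandAllocator* allocator = g_allocators[frameIndex][threadIndex].Get();
    allocator->Reset();                     // safe only because the GPU has finished with it
    commandList->Reset(allocator, nullptr); // reuse the command list with this frame's allocator
}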


WFP
On 8/9/2017 at 6:20 PM, mark_braga said:

Different-sized command lists should use different allocators so the allocators don't grow to the worst-case size.

  • Does this mean that I need to know the size of a command list before calling CreateCommandList so I can pass the appropriate allocator?

 

Yes to the first part, and not exactly (but close) to the bulleted question. If you can swing it, you should consider using the same or similar allocators for the same kinds of command lists. For example, let's say you're doing deferred rendering and are drawing some number of items. Your pass to fill the gbuffer may take (for example's sake) 200 calls to various functions on the command list, which are written to the allocator. Your shadowmap pass has to be run for each shadow-casting light, so you'll have somewhere in the neighborhood of 200 * numLights entries in the command list, and therefore in that allocator. If, in a subsequent frame, you use the allocator you initially used for the gbuffer pass for the shadowmap pass, that allocator will grow to the worst-case size, i.e., the size required by the shadowmap pass. If instead you keep that allocator designated for gbuffer work only, it will only ever be sized for your worst-case gbuffer pass, which could save on your overall memory budget.

To answer whether you need to know the size of the command list before calling CreateCommandList, not precisely, but it's helpful to have some idea of how the command list will be used so you have a ballpark idea of how it will affect the allocator bound to it.
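A rough sketch of keeping an allocator per pass (the FrameAllocators struct, pass names, and call counts are just for illustration, and it assumes a fence has already confirmed the GPU is done with these allocators):

#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// One allocator per pass (per frame in flight). The gbuffer allocator only
// ever sees gbuffer-sized workloads and the shadow allocator only shadow-sized
// ones, so neither grows to the other's worst case.
struct FrameAllocators
{
    ComPtr<ID3D12CommandAllocator> gbuffer;
    ComPtr<ID3D12CommandAllocator> shadow;
};

void RecordFrame(FrameAllocators& alloc, ID3D12GraphicsCommandList* cmdList,
                 ID3D12CommandQueue* queue)
{
    // GBuffer pass: always recorded against the gbuffer allocator.
    alloc.gbuffer->Reset();
    cmdList->Reset(alloc.gbuffer.Get(), nullptr);
    // ... ~200 gbuffer calls ...
    cmdList->Close();
    ID3D12CommandList* lists[] = { cmdList };
    queue->ExecuteCommandLists(1, lists);

    // Shadow pass: always recorded against the (larger) shadow allocator.
    alloc.shadow->Reset();
    cmdList->Reset(alloc.shadow.Get(), nullptr);
    // ... ~200 * numLights shadow calls ...
    cmdList->Close();
    queue->ExecuteCommandLists(1, lists);
}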

On 8/9/2017 at 6:20 PM, mark_braga said:

Try to keep the number of allocators to a minimum.

  • What are the pitfalls if I create a command allocator per command list? That way each allocator never grows too large for its list, and there is no need for synchronization.

 

Keep them to a minimum, but don't dig yourself into a hole. You can reuse an existing command list to write into a different allocator from the one you initially created it with while the initial allocator's contents are being consumed by the GPU. You still must set and track your fence values to be sure the GPU is done with the allocator you're about to reset and reuse. You should create enough allocators to cover your maximum latency, but if it comes down to it, you must still be able to block and wait for the GPU to finish its work if all the command allocators are occupied. This is like the pool of allocators @galop1n mentioned in his post.
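Putting that together, a fence-guarded pool could look roughly like this (kMaxFramesInFlight, FrameContext, and the helper functions are placeholders; error handling omitted):

#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

constexpr UINT kMaxFramesInFlight = 3; // placeholder latency

struct FrameContext
{
    ComPtr<ID3D12CommandAllocator> allocator;
    UINT64                         fenceValue = 0; // value signaled when the GPU finished this slot
};

FrameContext        g_frames[kMaxFramesInFlight];
ComPtr<ID3D12Fence> g_fence;
HANDLE              g_fenceEvent     = nullptr; // created with CreateEvent at startup
UINT64              g_nextFenceValue = 1;

// Acquire the allocator for this frame slot, blocking only if the GPU has not
// yet consumed the commands that were recorded into it last time.
ID3D12CommandAllocator* AcquireAllocator(UINT frameIndex)
{
    FrameContext& frame = g_frames[frameIndex % kMaxFramesInFlight];
    if (g_fence->GetCompletedValue() < frame.fenceValue)
    {
        // Every allocator for this slot is still in flight: wait for the GPU.
        g_fence->SetEventOnCompletion(frame.fenceValue, g_fenceEvent);
        WaitForSingleObject(g_fenceEvent, INFINITE);
    }
    frame.allocator->Reset(); // now safe: the GPU is done with it
    return frame.allocator.Get();
}

// After submitting the frame's command lists, signal the fence and remember
// the value so this slot can be safely reused later.
void SubmitFrame(UINT frameIndex, ID3D12CommandQueue* queue,
                 ID3D12GraphicsCommandList* cmdList)
{
    ID3D12CommandList* lists[] = { cmdList };
    queue->ExecuteCommandLists(1, lists);
    queue->Signal(g_fence.Get(), g_nextFenceValue);
    g_frames[frameIndex % kMaxFramesInFlight].fenceValue = g_nextFenceValue++;
}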


