
DX12 ID3D12CommandAllocator questions

Recommended Posts

I am working on making a DX12/Vulkan framework run on the CPU and GPU in parallel.

I decided to finish the Vulkan implementation before DX12. (Eat the veggies before having the steak XDD)

I have a few questions about the usage of ID3D12CommandAllocator:

  • Differently sized command lists should use different allocators, so the allocators don't grow to the worst-case size.
    • Does this mean that I need to know the size of the command list before calling CreateCommandList, and pass in the appropriate allocator?
  • Try to keep the number of allocators to a minimum.
    • What are the pitfalls if I create a command allocator per list? This way each allocator will never grow too large for its list. In addition, there will be no need for synchronization.

Most of the examples I have seen just use a pool of allocators and do fence-based synchronization. I can modify that to also take command list size into account, but before that, any advice on this would really help me understand the internal workings of ID3D12CommandAllocator better.
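For reference, this is roughly the pattern I mean (a minimal sketch, assuming a simple per-frame ring; FrameContext, AcquireAllocator, and the fence plumbing are my own names, not from any particular sample):

    #include <windows.h>
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    static const UINT kFrameCount = 2; // frames in flight

    struct FrameContext {
        ComPtr<ID3D12CommandAllocator> allocator;
        UINT64 fenceValue = 0; // signaled once the GPU has finished this frame
    };
    FrameContext g_frames[kFrameCount];

    // Returns frame N's allocator, blocking only if the GPU is still
    // consuming the commands that were recorded through it.
    ID3D12CommandAllocator* AcquireAllocator(UINT64 frameNumber,
                                             ID3D12Fence* fence,
                                             HANDLE fenceEvent)
    {
        FrameContext& ctx = g_frames[frameNumber % kFrameCount];
        if (fence->GetCompletedValue() < ctx.fenceValue) {
            fence->SetEventOnCompletion(ctx.fenceValue, fenceEvent);
            WaitForSingleObject(fenceEvent, INFINITE);
        }
        ctx.allocator->Reset(); // safe now: recycles its memory for rewriting
        return ctx.allocator.Get();
    }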


The idea behind reusing allocators is, most likely, to reuse the blocks of memory needed for command list generation. There is no single strategy here, especially because you don't know how the allocators behave internally or where the sweet spots are.

 

At best I can give you ONE idea on how to reuse them :) Because an allocator can only be bound to one active command list at a time, you can create a command allocator per thread that produces command lists (imagine a job queue system, where a job produces a command list). You can then have a pool of allocators to alternate between frames, which gives the GPU enough time to execute them before you need to reset a set of allocators again.
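Something like this layout is what I have in mind (a rough sketch; the array dimensions and names are only illustrative):

    #include <windows.h>
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    static const UINT kFrameCount  = 2; // frames in flight
    static const UINT kWorkerCount = 4; // threads recording command lists

    // One allocator per (frame, worker) pair. A worker only ever touches
    // its own slot, so recording needs no cross-thread synchronization,
    // and all of frame N's allocators can be reset together once frame
    // N's fence has been reached.
    ComPtr<ID3D12CommandAllocator> g_allocators[kFrameCount][kWorkerCount];

    ID3D12CommandAllocator* AllocatorFor(UINT64 frameNumber, UINT workerIndex)
    {
        return g_allocators[frameNumber % kFrameCount][workerIndex].Get();
    }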

Edited by galop1n

On 8/9/2017 at 6:20 PM, mark_braga said:

Differently sized command lists should use different allocators, so the allocators don't grow to the worst-case size.

  • Does this mean that I need to know the size of the command list before calling CreateCommandList, and pass in the appropriate allocator?

 

Yes to the first part, and not exactly (but close) to the bulleted question. If you can swing it, you should consider using the same or similar allocators for the same command lists. For example, let's say you're doing deferred rendering and are drawing some number of items. Your pass to fill the gbuffer may take (for example's sake) 200 calls to various functions on the command list, which are written to the allocator. Your shadowmap pass has to be run for each shadow-casting light, so you'll have somewhere in the neighborhood of 200 * numLights entries in the command list, and therefore in that allocator. If, in a subsequent frame, you use the allocator you initially used for the gbuffer pass for the shadowmap pass, that allocator will grow to the worst-case size, i.e., the size required by the shadowmap pass. If instead you keep that allocator designated for gbuffer work only, it will only ever be sized for your worst-case gbuffer pass, which could save on your overall memory budget.

As for whether you need to know the size of the command list before calling CreateCommandList: not precisely, but it's helpful to have some idea of how the command list will be used, so you have a ballpark sense of how it will affect the allocator bound to it.
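In code, that designation could look something like this (just a sketch of the idea above; the Pass enum and array names are hypothetical):

    #include <windows.h>
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    static const UINT kFrameCount = 2;

    enum Pass { PASS_GBUFFER, PASS_SHADOW, PASS_COUNT };

    // Frame N's gbuffer commands always go through the PASS_GBUFFER slot,
    // so that allocator only grows to the worst-case gbuffer size instead
    // of the (much larger) worst-case shadowmap size.
    ComPtr<ID3D12CommandAllocator> g_passAllocators[kFrameCount][PASS_COUNT];

    ID3D12CommandAllocator* PassAllocator(UINT64 frameNumber, Pass pass)
    {
        return g_passAllocators[frameNumber % kFrameCount][pass].Get();
    }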

On 8/9/2017 at 6:20 PM, mark_braga said:

Try to keep the number of allocators to a minimum.

  • What are the pitfalls if I create a command allocator per list? This way each allocator will never grow too large for its list. In addition, there will be no need for synchronization.

 

Keep them to a minimum, but don't dig yourself into a hole. You can reuse an existing command list to write into a different allocator from the one you initially created it with, while the initial allocator's contents are being consumed by the GPU. You still must set and track your fence values to be sure that the GPU is done with the allocator you're about to reset and reuse. You should create enough allocators to cover your maximum latency, but if it comes down to it, you must still be able to block and wait for the GPU to finish its work if all the command allocators are occupied. This is like the pool of allocators @galop1n mentioned in his post.
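Putting the reuse and fence tracking together, it could look roughly like this (a sketch only, assuming the command list starts out closed; the names are illustrative and error handling is omitted):

    #include <windows.h>
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    static const UINT kFrameCount = 2;

    struct FrameContext {
        ComPtr<ID3D12CommandAllocator> allocator;
        UINT64 fenceValue = 0;
    };
    FrameContext g_frames[kFrameCount];

    // One command list object, re-targeted at a different allocator each
    // frame while the previous frame's allocator is consumed by the GPU.
    void RecordAndSubmit(UINT64 frameNumber, ID3D12GraphicsCommandList* cmdList,
                         ID3D12CommandQueue* queue, ID3D12Fence* fence,
                         HANDLE fenceEvent)
    {
        FrameContext& ctx = g_frames[frameNumber % kFrameCount];

        // Block only if the GPU hasn't finished with this allocator yet.
        if (fence->GetCompletedValue() < ctx.fenceValue) {
            fence->SetEventOnCompletion(ctx.fenceValue, fenceEvent);
            WaitForSingleObject(fenceEvent, INFINITE);
        }
        ctx.allocator->Reset();

        // The list itself is reusable right after Close(); only the
        // allocator has to wait for the GPU.
        cmdList->Reset(ctx.allocator.Get(), nullptr);

        // ... record commands here ...

        cmdList->Close();
        ID3D12CommandList* lists[] = { cmdList };
        queue->ExecuteCommandLists(1, lists);

        // Remember which fence value means "GPU is done with this allocator".
        ctx.fenceValue = frameNumber + 1;
        queue->Signal(fence, ctx.fenceValue);
    }

In other words, the fence protects the allocator, not the command list.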
