DX12 [D3D12] Descriptor Heap Strategies

Recommended Posts

A lot of DX12 articles suggest implementing descriptor heap entries as a ring buffer (notably the Nvidia Do's and Don'ts). I've also read in these forums that some people prefer a stack-allocated scheme. I don't see why these methods would be the preferred way of solving this problem. A ring buffer of descriptors is great if you're always adding new descriptors while deleting the oldest ones, but what happens when you want to remove a descriptor from the middle of the active set? And as for a stack-allocated scheme, wouldn't that involve copying in the descriptors every frame? Why wouldn't something like a free-list or buddy allocator be preferable to either of these setups?
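For reference, the free-list idea can be sketched like this. This is a hypothetical first-fit allocator managing contiguous ranges of slots in a descriptor heap; the heap itself (and the descriptor increment size you'd multiply offsets by) is elided, so the class only tracks `[offset, count)` ranges and coalesces adjacent free blocks on release:

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch: a first-fit free-list allocator for contiguous
// descriptor ranges. Offsets index into an imagined ID3D12DescriptorHeap.
class DescriptorFreeList {
public:
    explicit DescriptorFreeList(uint32_t capacity) {
        free_[0] = capacity;  // one free block spanning the whole heap
    }

    // Returns the start offset of a free range, or UINT32_MAX on failure.
    uint32_t Allocate(uint32_t count) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= count) {
                uint32_t offset = it->first;
                uint32_t remaining = it->second - count;
                free_.erase(it);
                if (remaining > 0)
                    free_[offset + count] = remaining;  // keep the tail free
                return offset;
            }
        }
        return UINT32_MAX;
    }

    // Returns a range to the free list, merging with adjacent free blocks.
    void Free(uint32_t offset, uint32_t count) {
        auto next = free_.lower_bound(offset);
        // Merge with the following block if contiguous.
        if (next != free_.end() && offset + count == next->first) {
            count += next->second;
            next = free_.erase(next);
        }
        // Merge with the preceding block if contiguous.
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                prev->second += count;
                return;
            }
        }
        free_[offset] = count;
    }

private:
    std::map<uint32_t, uint32_t> free_;  // offset -> count, sorted by offset
};
```

The trade-off versus a ring buffer is exactly the one raised above: this handles frees from the middle of the active set, at the cost of per-allocation search and eventual fragmentation.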


I guess what I don't understand is why there would be a lot of objects with transient lifetimes. It seems like most textures and constant buffers are going to stick around for a while. In fact, it seems like adding new descriptors and removing old ones would happen pretty infrequently. Can you describe a use case where a majority of objects would require new descriptors every frame? And also, are you saying to call CreateShaderResourceView/CreateConstantBufferView every frame?


The problem isn't individual textures having transient lifetimes, but rather the sets/tables of textures having transient lifetimes, as they're arbitrarily combined by the engine. I've seen both sides of this: one where every descriptor table was pre-allocated at engine initialization, and another where everything was dynamic. In the static case, the unit of allocation was a fixed-size descriptor table, used in a heap-allocator scheme. In the dynamic case, the unit of allocation was the individual descriptor or view.

 

For the dynamic case, a common pattern is to use a set of "offline" descriptor heaps which exist on the CPU timeline to stage the descriptors, and CopyDescriptors on a per-frame basis to gather them into "online" descriptor heaps, into tables for binding. The Create*View APIs only need to be called on these "offline" descriptor heaps.
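The shape of that gather step can be sketched as follows, with real `D3D12_CPU_DESCRIPTOR_HANDLE`s replaced by plain integers so the example is self-contained. `ID3D12Device::CopyDescriptors` performs the equivalent scatter/gather copy: multiple source ranges from CPU-only (non-shader-visible) staging heaps are packed into one contiguous destination range in the shader-visible heap, which then becomes the bound table:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Descriptors modeled as uint64_t payloads instead of
// D3D12_CPU_DESCRIPTOR_HANDLEs (hypothetical simplification).
struct OfflineHeap {
    std::vector<uint64_t> slots;  // CPU-timeline staging descriptors
};

// Gather arbitrary offline slots into one contiguous "online" table,
// the way CopyDescriptors gathers N source ranges into one destination.
std::vector<uint64_t> BuildOnlineTable(const OfflineHeap& offline,
                                       const std::vector<std::size_t>& indices) {
    std::vector<uint64_t> table;
    table.reserve(indices.size());
    for (std::size_t i : indices)
        table.push_back(offline.slots[i]);  // one source range per descriptor
    return table;
}
```

The point of the split is that the Create*View calls and their bookkeeping live entirely on the CPU timeline, while the per-frame cost is reduced to a bulk copy into the shader-visible heap.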


Until now I've just been creating two descriptors for any per-frame resources (mostly buffers that are constantly updated), but I've also been calling SetGraphicsRootDescriptorTable for every bound resource rather than batching things into contiguous regions and minimizing those calls. This has worked fine for the relatively small shaders I've tested my scenes with, but it's clear now that this strategy could quickly hit a wall.

 

It's pretty much a classic allocation problem, except there's no reason not to apply extra memory/processing power to making the allocations/deallocations as fast as possible.  I am trying to dream up a faster scheme, but so far it seems like the ring buffer / stack allocator strategy is the way to go.
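For comparison, the ring-buffer strategy can be sketched like this (hypothetical; offsets are plain integers, and the overlap check against in-flight frames is deliberately omitted). Each frame's allocations are retired together once the GPU fence for that frame has been reached:

```cpp
#include <cstdint>
#include <queue>

// Hypothetical sketch: a ring-buffer allocator for a shader-visible
// descriptor heap, retired per frame via fence values.
class DescriptorRing {
public:
    explicit DescriptorRing(uint32_t capacity) : capacity_(capacity) {}

    // Allocate `count` contiguous slots for the current frame.
    uint32_t Allocate(uint32_t count) {
        // Wrap to offset 0 if the range would run past the end of the heap.
        if (head_ + count > capacity_) head_ = 0;
        // A real implementation must also verify the new range does not
        // overlap slots still owned by in-flight frames (omitted here).
        uint32_t offset = head_;
        head_ += count;
        return offset;
    }

    // Record the fence value signalled after this frame's work.
    void EndFrame(uint64_t fence) { retired_.push({fence, head_}); }

    // Once the GPU has passed `completedFence`, everything up to the
    // matching head position may be reused.
    void ReclaimUpTo(uint64_t completedFence) {
        while (!retired_.empty() && retired_.front().fence <= completedFence) {
            tail_ = retired_.front().head;
            retired_.pop();
        }
    }

private:
    struct FrameMark { uint64_t fence; uint32_t head; };
    uint32_t capacity_;
    uint32_t head_ = 0;
    uint32_t tail_ = 0;  // oldest slot still owned by the GPU
    std::queue<FrameMark> retired_;
};
```

This is cheap precisely because it assumes allocations die in FIFO order, which is the limitation the original question points at: it has no answer for freeing a descriptor from the middle of the active set.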

 

That bindless strategy is intriguing.  That approach would use a common root signature with access to every descriptor, right?  It might be tricky getting root constants to work with that, but the tradeoff for not having to manage the descriptor heap is enticing...

Edited by Funkymunky


After some deliberation, I think I'm going to adopt the following scheme:

 

I'll create one or more "offline" heaps to create descriptors in, as this will let me create resources in separate threads.  For the "online" heap, I'll use a freelist allocator to give me descriptor ranges.  I'll track three lists for maintaining this.  The first will be a list of available allocations, sorted by size (a normal freelist).  The second will be a list of allocated and deallocated ranges, sorted by their offset from the start of the heap.  The third will be a list of just the deallocated ranges, also sorted by their offset from the start of the heap.  (The second and third lists will use the same structures, with each structure having pointers to its neighbors).

 

Every frame I will run a basic defragmentation pass.  It will look at the first entry in that third list (deallocations).  If the neighbor to the right of that entry is also a deallocation, then I will coalesce the two into a single deallocation.  If the neighbor is instead already-allocated, then I will shuffle that allocated range to the left, essentially bubbling the deallocation toward the end of the heap.
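One step of that defragmentation pass might look like the following sketch. For brevity it models the heap as a single ordered list of ranges rather than the three-list layout described above; each call either coalesces two adjacent deallocated ranges or bubbles a deallocated range one slot toward the end of the heap:

```cpp
#include <cstdint>
#include <iterator>
#include <list>
#include <utility>

// A range of descriptor slots; `free` marks a deallocation.
struct Range { uint32_t size; bool free; };

// One defragmentation step, hypothetical sketch.
// Returns true if a change was made.
bool DefragStep(std::list<Range>& heap) {
    for (auto it = heap.begin(); it != heap.end(); ++it) {
        if (!it->free) continue;                // find the first free range
        auto next = std::next(it);
        if (next == heap.end()) return false;   // free space already at the end
        if (next->free) {
            it->size += next->size;             // coalesce two free neighbours
            heap.erase(next);
        } else {
            std::swap(*it, *next);              // shuffle the allocated range left
            // (a real heap would also CopyDescriptors the moved range and
            // patch any tables that still reference its old offset)
        }
        return true;
    }
    return false;
}
```

The comment in the `else` branch is the expensive part: shuffling an allocated range means physically moving its descriptors and updating whoever points at them, which is why it's worth rate-limiting the pass as described below.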

 

In practice, I'll probably split the "online" heap into multiple regions (one for each frame).  This way I can shuffle descriptors after a fence without disrupting something that's being used.  I think as long as I don't hammer the heap with constant allocations and fragmenting deallocations, this should keep me relatively well managed.  And even if I do, I can always increase the number of defragmentation passes to keep things in check.


My approach was to give every command list a portion of the heap, so there is no need for any kind of fence synchronization. This works well with Vulkan too, where every command list has its own descriptor pool. The user has the option to specify the size of the sub-allocation, plus some further customizations. This approach guarantees lock-free descriptor allocation, except for the one time when the command list sub-allocates its "local" descriptor heap from the global one.
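The core of that scheme can be sketched in a few lines (hypothetical names; offsets again stand in for descriptor handles). The only contended operation is one atomic fetch-add when a command list claims its block; after that, each thread allocates linearly from its private range with no synchronization at all:

```cpp
#include <atomic>
#include <cstdint>

// Global shader-visible heap: hands out fixed-size blocks, one per
// command list, with a single atomic fetch-add as the only contention.
class GlobalDescriptorHeap {
public:
    explicit GlobalDescriptorHeap(uint32_t capacity) : capacity_(capacity) {}

    // Called once per command list.
    uint32_t SubAllocate(uint32_t blockSize) {
        uint32_t offset = next_.fetch_add(blockSize, std::memory_order_relaxed);
        return offset + blockSize <= capacity_ ? offset : UINT32_MAX;
    }

private:
    uint32_t capacity_;
    std::atomic<uint32_t> next_{0};
};

// Per-command-list linear allocator over its private block: lock-free
// because no other thread ever touches this range.
class CommandListAllocator {
public:
    CommandListAllocator(uint32_t base, uint32_t size)
        : base_(base), end_(base + size), head_(base) {}

    uint32_t Allocate(uint32_t count) {
        if (head_ + count > end_) return UINT32_MAX;  // block exhausted
        uint32_t offset = head_;
        head_ += count;
        return offset;
    }

private:
    uint32_t base_, end_, head_;
};
```

The design choice here is to trade heap space (each block must be sized for its command list's worst case) for the removal of both locks and fence tracking from the per-descriptor hot path.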

Edited by mark_braga

