Mr_Fox

Changing heaps should be avoided?

This topic is 1015 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Intel mentions that changing heaps is expensive (https://software.intel.com/sites/default/files/managed/4a/38/Efficient-Rendering-with-DirectX-12-on-Intel-Graphics.pdf, page 35):

 

Changing heaps is expensive (pipeline flush)

– Ideally use a single heap of each type (sampler, CBV/SRV/UAV)

– Exception: changing heaps at command list boundary is “free”

 

So I'm wondering: if changing heaps at a command list boundary is "free", could we give each command list its own CBV/SRV/UAV heap and make sure we only change heaps at command list boundaries? Is this pattern recommended? (Intel definitely doesn't seem to think so...) It seems descriptor management would be much easier this way.
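To make the distinction in the slide concrete, here is a minimal sketch in plain C++ (no D3D12 calls) that counts how often the bound heap changes in the middle of a command list versus at a command list boundary. The names `CommandListModel` and `countHeapSwitches` are hypothetical, and modeling a command list as a sequence of heap ids is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a command list is just the sequence of heap ids that
// would be bound for each of its draws, in recording order.
using HeapId = int;
using CommandListModel = std::vector<HeapId>;

struct HeapSwitchCounts {
    std::size_t midList = 0;     // changes inside a list (the expensive case per the slide)
    std::size_t atBoundary = 0;  // changes on the first draw of a list (the "free" case)
};

// Walk the lists in submission order and classify every heap change.
HeapSwitchCounts countHeapSwitches(const std::vector<CommandListModel>& lists) {
    HeapSwitchCounts counts;
    bool havePrevious = false;
    HeapId previous = 0;
    for (const CommandListModel& list : lists) {
        for (std::size_t i = 0; i < list.size(); ++i) {
            if (havePrevious && list[i] != previous) {
                if (i == 0) {
                    ++counts.atBoundary;
                } else {
                    ++counts.midList;
                }
            }
            previous = list[i];
            havePrevious = true;
        }
    }
    return counts;
}
```

With one heap per command list, e.g. `{{0, 0, 0}, {1, 1}, {2, 2, 2}}`, every change lands on a boundary, which is the pattern the question proposes.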


I believe the official samples that use multiple command lists create a separate CBV/SRV/UAV heap per command list. I don't know if that makes it "best practice," but it is at least "common practice."


I believe the official samples that use multiple command lists create a separate CBV/SRV/UAV heap per command list. I don't know if that makes it "best practice," but it is at least "common practice."

Is there an advantage to creating multiple heaps vs. one superheap per type, with every command list set to use it? You'd have to copy descriptors around, but it seems like that happens a lot anyway.


In my app I use two pairs of descriptor heaps (each pair has one heap for CBV/SRV/UAV and one for samplers) that swap every frame, and it has worked quite well so far with up to 2000 draw calls, despite recreating the descriptors every time. Apparently creating descriptors is not that costly (although in the future I may try to reuse older ones when possible).

The resource heap holds 10000 descriptors, while the sampler heap holds 2048 (the maximum possible).
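The setup described above can be sketched roughly as follows. This is a plain C++ model that only tracks slot indices; it makes no D3D12 calls, and the names `LinearDescriptorRange`, `FrameHeaps`, and `FrameHeapRing` are hypothetical. A real renderer would also have to wait on a fence before recycling a pair, which this sketch assumes the caller handles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Bump allocator over one descriptor heap. Only indices are tracked; turning
// an index into a descriptor handle is left to the real renderer.
class LinearDescriptorRange {
public:
    explicit LinearDescriptorRange(std::uint32_t capacity) : capacity_(capacity) {}

    // Reserve `count` consecutive slots and return the index of the first one.
    std::uint32_t allocate(std::uint32_t count) {
        if (count > capacity_ - used_) {
            throw std::runtime_error("descriptor heap full");
        }
        const std::uint32_t first = used_;
        used_ += count;
        return first;
    }

    void reset() { used_ = 0; }
    std::uint32_t used() const { return used_; }

private:
    std::uint32_t capacity_;
    std::uint32_t used_ = 0;
};

// One pair of heaps, using the sizes quoted in the post.
struct FrameHeaps {
    LinearDescriptorRange resources{10000};  // CBV/SRV/UAV
    LinearDescriptorRange samplers{2048};
};

// Two pairs that alternate every frame.
class FrameHeapRing {
public:
    // Switch to the other pair and recycle it. Assumption: the caller has
    // already made sure the GPU is done with the frame that last used it.
    FrameHeaps& beginFrame() {
        current_ = (current_ + 1) % frames_.size();
        frames_[current_].resources.reset();
        frames_[current_].samplers.reset();
        return frames_[current_];
    }

    std::size_t currentIndex() const { return current_; }

private:
    std::array<FrameHeaps, 2> frames_;
    std::size_t current_ = 1;  // so the first beginFrame() selects pair 0
};
```

Each frame calls `beginFrame()` once and then bump-allocates descriptors for its draws, which matches the "recreate every time" approach: nothing is reused across frames, so there is no per-descriptor lifetime tracking.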

 

At the beginning I tried using one command list per draw call, and there was noticeable overhead. I read that the most efficient count is 1000 draw calls per command list, and that having too many command lists doesn't help.


2048 samplers is not a big amount and having one heap per frame is a cheap way to double it.

For CBV/UAV/SRV I could use a single heap, but since I reset the command list every frame I don't know if it changes anything.


I noticed this thread this morning and it kind of stuck with me all day.  I'm curious about my understanding of descriptor heap usage and whether I'm missing something.  

 

This evening I went back and checked a few of the MS samples because I thought I remembered them using a single descriptor heap, and indeed so far as I can tell they use the same desc heap for every frame and every commandlist within those frames.  Even likely candidates still stick to a single desc heap, such as the nbodygravity sample (separate graphics and async compute command queues/lists), or the multithreading sample with 2 decent sized commandlists in each of the 3 worker threads (shadow and main pass in each thread).  

 

I'm having trouble seeing why you might want to use multiple descriptor heaps.  Looking at the ways you could implement multiple heaps:

 

heap-per-frame:  With a desc heap per backbuffer, let's say 2, alternating each frame, your SRV and UAV descriptors will be duplicated, while the CBV descriptors will be unique.  You can generally split constants into groups by how frequently they're bound: per-draw and per-frame.  Per-draw constants really shouldn't be in a descriptor heap or set through a table, but instead set directly with a root CBV.  That only leaves your per-frame constants like camera/light viewproj matrices, viewport size, etc., which are relatively few compared to SRVs/UAVs.  It seems you're doubling your total desc heap size for virtually no benefit, since with a single heap the only descriptor offsets you need to adjust between frames are your few per-pass constants.
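One way to picture the single-heap alternative is a layout with a static region for SRVs/UAVs followed by one small slice of CBV slots per frame in flight. The sketch below is plain C++ index arithmetic under that assumption; `SingleHeapLayout` and its members are hypothetical names, and the counts used with it are made-up illustration values, not measurements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of a single shader-visible CBV/SRV/UAV heap:
// [ static SRV/UAV region | frame 0 CBVs | frame 1 CBVs | ... ]
struct SingleHeapLayout {
    std::uint32_t staticCount;     // SRV/UAV descriptors, written once
    std::uint32_t perFrameCount;   // per-frame/per-pass CBV descriptors for one frame
    std::uint32_t framesInFlight;  // e.g. one per backbuffer

    // Total slots needed when everything lives in one heap.
    std::uint32_t totalSize() const {
        return staticCount + perFrameCount * framesInFlight;
    }

    // Total slots needed when each frame gets a full copy in its own heap.
    std::uint32_t heapPerFrameTotal() const {
        return (staticCount + perFrameCount) * framesInFlight;
    }

    // Static descriptors keep the same slot in every frame.
    std::uint32_t staticSlot(std::uint32_t i) const {
        assert(i < staticCount);
        return i;
    }

    // Only this offset changes from frame to frame.
    std::uint32_t perFrameSlot(std::uint64_t frameNumber, std::uint32_t i) const {
        assert(i < perFrameCount);
        const std::uint32_t slice =
            static_cast<std::uint32_t>(frameNumber % framesInFlight);
        return staticCount + slice * perFrameCount + i;
    }
};
```

For example, with 1000 static views, 8 per-frame CBVs, and 2 frames in flight, the single heap needs 1016 slots while two full per-frame heaps need 2016.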

 

heap-per-thread:  With a separate heap per thread, unless you duplicate all the SRV/UAV/CBVs across all heaps (identical heaps), you can't process any pass in parallel (break a pass's draw calls into chunks and have each thread build the corresponding command list).  You'd need to know ahead of time which thread is processing which draw call in each pass to ensure the corresponding views are available in that thread's desc heap.  So per-thread desc heaps make little sense.

 

heap-per-pass (cmdlist?):  I'm guessing that giving each render pass its own desc heap is what was meant by giving each cmdlist its own heap.  With this you could at least process each pass's draw calls across multiple threads, unlike with the above.  But you'd still have the issue of needless duplication any time two different passes use the same resource views.  It also seems like it would add complexity in other ways (e.g. in a render queue, would a material have multiple handles for one of its textures if it can be used in multiple passes?).  Admittedly, it wouldn't have a downside for passes that only use completely unique resource views unrelated to any other passes, but that seems a very narrow use case.
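The duplication argument can be checked with a small count. The sketch below is plain C++ with hypothetical names (`ViewSet`, `heapPerPassDescriptorCount`, `sharedHeapDescriptorCount`); it assumes each pass is described by the set of resource views it reads, identified by name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a pass: the names of the views it uses.
using ViewSet = std::set<std::string>;

// One heap per pass: every pass needs its own copy of each view it uses.
std::size_t heapPerPassDescriptorCount(const std::vector<ViewSet>& passes) {
    std::size_t total = 0;
    for (const ViewSet& pass : passes) {
        total += pass.size();
    }
    return total;
}

// One shared heap: each distinct view is stored once, however many passes use it.
std::size_t sharedHeapDescriptorCount(const std::vector<ViewSet>& passes) {
    ViewSet all;
    for (const ViewSet& pass : passes) {
        all.insert(pass.begin(), pass.end());
    }
    return all.size();
}
```

The two counts are equal only when no view is shared between passes, which is the narrow case mentioned above.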

 

The documentation has some detailed guidance on descriptor heap usage in the "switching" and "management" sections here.  There's also a note about using too many very large descriptor heaps on this page.

 

I really am curious if I'm missing something.  

 

Lastly, I realize this will only seem more ignorant, but as an aside on sampler desc heaps, when would you ever need 2048 unique samplers??  Even with a complex frame, supporting all the different levels of max anisotropy with a selection of filters, address modes, comparisons and LOD limits, it would only be several hundred.  Additionally, wouldn't it make more sense in some of those cases to use static samplers instead, if a shader always uses the same samplers? (That also saves a DWORD in the root sig size.)
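The DWORD saving can be tallied with the documented root signature costs: a descriptor table costs 1 DWORD, a root descriptor (such as a root CBV) costs 2, each 32-bit root constant costs 1, and static samplers cost nothing. The sketch below is plain C++ with hypothetical names (`RootParam`, `rootSignatureCost`); the example root signatures are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Kinds of root parameter that count toward the root signature size limit.
enum class RootParam { DescriptorTable, RootDescriptor, Root32BitConstant };

// Cost of one root parameter in DWORDs.
inline unsigned costInDwords(RootParam param) {
    switch (param) {
        case RootParam::DescriptorTable:   return 1;
        case RootParam::RootDescriptor:    return 2;
        case RootParam::Root32BitConstant: return 1;
    }
    return 0;
}

// Sum the cost of all root parameters. Static samplers are not root
// parameters, so they do not appear here and add nothing to the total.
inline unsigned rootSignatureCost(const std::vector<RootParam>& params) {
    unsigned total = 0;
    for (RootParam param : params) {
        total += costInDwords(param);
    }
    return total;
}
```

For instance, an SRV table plus a sampler table plus a root CBV comes to 4 DWORDs; replacing the sampler table with static samplers drops that to 3.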
