Funkymunky

[D3D12] CBuffer layout, contiguous or interleaved?


Let's say I have a shader that uses 2 textures and 3 cbuffers.  I've actually set up 6 cbuffers so as to support a 2-frame update scheme.  The textures are t0 and t1, and the buffers are b0[0], b0[1], b1[0], b1[1], b2[0], b2[1].  I've read that cache locality isn't as important for GPU work, but I'm still wondering, would it be better to lay out the shader resources interleaved like this:

 

t0 | t1 | b0[0] | b0[1] | b1[0] | b1[1] | b2[0] | b2[1]

 

or contiguously, like this:

 

t0 | t1 | b0[0] | b1[0] | b2[0] | b0[1] | b1[1] | b2[1]

 

...I doubt it's going to make or break my performance, but I'm interested in doing it right since I'm at the point in development where I have to decide how it happens.
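
To make the two candidate orderings concrete, here is a minimal C++ sketch of the slot arithmetic for each, assuming the two textures occupy slots 0 and 1 of a single heap. The function names and constants are hypothetical and are not part of D3D12. It also checks one structural difference: in the second layout a frame's three CBVs sit in consecutive slots, so a single descriptor range of three CBVs could cover them, while the first layout spaces them two slots apart.

```cpp
#include <cstdint>

// Hypothetical constants for this thread's example: 2 SRVs, 3 cbuffers, 2 frames.
constexpr std::uint32_t kTextureCount = 2;
constexpr std::uint32_t kBufferCount  = 3;
constexpr std::uint32_t kFrameCount   = 2;

// Layout 1: t0 | t1 | b0[0] | b0[1] | b1[0] | b1[1] | b2[0] | b2[1]
constexpr std::uint32_t interleavedSlot(std::uint32_t buffer, std::uint32_t frame) {
    return kTextureCount + buffer * kFrameCount + frame;
}

// Layout 2: t0 | t1 | b0[0] | b1[0] | b2[0] | b0[1] | b1[1] | b2[1]
constexpr std::uint32_t contiguousSlot(std::uint32_t buffer, std::uint32_t frame) {
    return kTextureCount + frame * kBufferCount + buffer;
}

// True when every cbuffer of `frame` sits directly after the previous one.
template <typename SlotFn>
bool frameIsContiguous(SlotFn slot, std::uint32_t frame) {
    for (std::uint32_t b = 1; b < kBufferCount; ++b) {
        if (slot(b, frame) != slot(b - 1, frame) + 1) {
            return false;
        }
    }
    return true;
}
```

For example, `frameIsContiguous(contiguousSlot, 1)` holds while `frameIsContiguous(interleavedSlot, 1)` does not. Whether that translates into any measurable performance difference is a separate question that only profiling can answer.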


Do you mean that the two texture views and six buffer views are all allocated contiguously within one heap, and you're wondering about the order in which they should be stacked inside that heap?


I'm guessing that you are asking what Hodgman suspects... my intuition is that it's going to make no difference, at all. I guess that if the CBs are small enough, then the second layout could be faster in some scenario, but I can't really imagine a set of architecture decisions that would ever make the first layout faster. Of course, GPUs are often not susceptible to intuition, so your best bet is to test both and measure. (Then remember that whatever results you get are probably inverted from what they'll be on a different vendor, or in the next generation of HW from your vendor).


I certainly hope that your 2-frame buffer update scheme doesn't mean this:

 

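// Anti-pattern: both frames' copies are bound and the shader picks one.
// MyBuffer's contents and the FrameInfo cbuffer are placeholders.
struct MyBuffer { float4 data; };
ConstantBuffer<MyBuffer> myBuffer0 : register(b0);
ConstantBuffer<MyBuffer> myBuffer1 : register(b1);
cbuffer FrameInfo : register(b2) { uint frame; };

Texture2D myTex0 : register(t0);
Texture2D myTex1 : register(t1);

void main()
{
    if (frame == 0)
    {
        // Use myTex0 and myBuffer0
    }
    else
    {
        // Use myTex1 and myBuffer1
    }
}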

 

Because that would be really bad.
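
A common alternative to branching in the shader is to declare each resource once and let the CPU bind whichever copy belongs to the frame being recorded. Below is a rough C++ sketch of that selection, assuming the frame-major ordering (the second layout in the first post). `GpuHandle` is a hypothetical stand-in for `D3D12_GPU_DESCRIPTOR_HANDLE`, and `incrementSize` is assumed to be the value returned by `ID3D12Device::GetDescriptorHandleIncrementSize` for the heap type.

```cpp
#include <cstdint>

// Hypothetical stand-in for D3D12_GPU_DESCRIPTOR_HANDLE (a struct with a 64-bit ptr).
struct GpuHandle {
    std::uint64_t ptr;
};

// Assumed heap contents: 2 SRVs, then 3 CBVs per frame for 2 frames in flight.
constexpr std::uint32_t kSrvSlots       = 2;
constexpr std::uint32_t kCbvsPerFrame   = 3;
constexpr std::uint32_t kFramesInFlight = 2;

// Returns the handle of the first CBV belonging to the copy used by frameNumber.
// The shader then only ever sees b0..b2 and needs no per-frame branch.
inline GpuHandle cbvTableForFrame(GpuHandle heapStart,
                                  std::uint32_t incrementSize,
                                  std::uint64_t frameNumber) {
    const std::uint64_t frameSlot = frameNumber % kFramesInFlight;
    const std::uint64_t firstDescriptor = kSrvSlots + frameSlot * kCbvsPerFrame;
    return GpuHandle{heapStart.ptr + firstDescriptor * incrementSize};
}
```

In real code the returned handle would be passed to something like `SetGraphicsRootDescriptorTable` for the root parameter that holds the CBV range.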

If I understand correctly, you're creating a "static" heap for every object in your world and reusing it each frame when necessary. At least that's what the single texture views but duplicated buffer views suggest.

According to MSDN, switching descriptor heaps can be costly on some hardware except at "command list boundaries", so I guess it's better to use a single "ring" descriptor heap for every pipeline state object, or for every object in your app, and update it on the fly with the corresponding barriers. Writing a view is cheap in my experience.
Of course, this comes from the documentation; a proper benchmark would be a better way to find out the impact in your case.
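
For illustration, here is a minimal sketch of what such a "ring" over one heap's slots could look like in C++. Everything in it (`DescriptorRing`, `mark`, `retireUpTo`, `kNoSlot`) is hypothetical and not a D3D12 API. It only tracks slot indices, and it assumes the caller calls `retireUpTo` once it knows the GPU has finished the corresponding frame, for instance after waiting on a fence.

```cpp
#include <cstdint>

// Sentinel returned when the ring has no room (hypothetical convention).
constexpr std::uint32_t kNoSlot = 0xFFFFFFFFu;

// Hands out runs of consecutive slots from a fixed-size heap, wrapping at the end.
class DescriptorRing {
public:
    explicit DescriptorRing(std::uint32_t capacity) : capacity_(capacity) {}

    // Reserves `count` consecutive slots and returns the first index, or kNoSlot.
    std::uint32_t allocate(std::uint32_t count) {
        if (count == 0 || count > capacity_) {
            return kNoSlot;
        }
        const std::uint64_t head = written_ % capacity_;
        // If the run would straddle the end of the heap, skip the tail slots.
        const std::uint64_t padding =
            (head + count > capacity_) ? capacity_ - head : 0;
        if ((written_ - retired_) + padding + count > capacity_) {
            return kNoSlot;  // would overwrite slots the GPU may still read
        }
        written_ += padding;
        const std::uint32_t start = static_cast<std::uint32_t>(written_ % capacity_);
        written_ += count;
        return start;
    }

    // Marker for "everything allocated so far"; record one at the end of a frame.
    std::uint64_t mark() const { return written_; }

    // Frees every slot allocated before `marker`.
    void retireUpTo(std::uint64_t marker) {
        if (marker > written_) marker = written_;
        if (marker > retired_) retired_ = marker;
    }

private:
    std::uint32_t capacity_;
    std::uint64_t written_ = 0;  // total slots consumed, including skipped tails
    std::uint64_t retired_ = 0;  // total slots released back to the ring
};
```

With a capacity of 8, two allocations of 3 slots return indices 0 and 3, a third fails until the first frame's marker is retired, and then the ring wraps and returns 0 again. A production version would also need to copy or create the views into the returned slots and tie retirement to actual GPU progress.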
