360GAMZ

DX11
[DX11] Fastest way to update a constant buffer per draw call

9 posts in this topic

Let's say I have 500 draw calls per frame and all 500 draw calls use the same shader and that shader uses one constant buffer. Let's also assume that the data in the constant buffer needs to be built dynamically for each draw call. What would be the most desirable way to update the constant buffers, in terms of efficiency?

A) Create a single constant buffer and call Map/Unmap on that same constant buffer before each draw call.

B) Create 500 constant buffers, one for each draw call, and call Map/Unmap on the draw call's own constant buffer.

C) Or, another idea?

I know that for (A) the driver will rename the buffer each time I Map it, discarding the previous contents which is fine. But is it ok to expect that the driver can handle hundreds or even thousands of renames per frame? And I assume the rename process consumes some time, too.

On the other hand, (B) avoids the renaming and any associated overhead at the expense of possibly more video memory being consumed (500 constant buffers, even if fewer draw calls are actually used) and more code complexity.
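To make the two options concrete, here is a minimal CPU-side sketch. It does not call Direct3D: `MockConstantBuffer`, `PerDrawConstants`, `run_option_a` and `run_option_b` are hypothetical names, and `map_discard()` only stands in for a `Map` with `D3D11_MAP_WRITE_DISCARD` followed by `Unmap`. It shows where the per-frame work lands in each scheme and says nothing about driver cost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-draw payload; the real layout is not given in the post.
struct PerDrawConstants {
    std::uint32_t values[8];
};

// CPU-side stand-in for a dynamic constant buffer. map_discard() models a
// Map(D3D11_MAP_WRITE_DISCARD)/Unmap pair: it stores the data and counts calls.
struct MockConstantBuffer {
    PerDrawConstants contents{};
    std::size_t discard_maps = 0;

    void map_discard(const PerDrawConstants& data) {
        contents = data;
        ++discard_maps;
    }
};

struct FrameStats {
    std::size_t buffers_created;
    std::size_t max_discards_on_one_buffer;
};

// Option (A): one shared buffer, rewritten before every draw call.
FrameStats run_option_a(std::size_t draw_calls) {
    assert(draw_calls > 0);
    MockConstantBuffer shared;
    for (std::size_t i = 0; i < draw_calls; ++i) {
        PerDrawConstants data{};
        data.values[0] = static_cast<std::uint32_t>(i);
        shared.map_discard(data);
        // Draw call i would be issued here.
    }
    return {1, shared.discard_maps};
}

// Option (B): one buffer per draw call, each written once per frame.
FrameStats run_option_b(std::size_t draw_calls) {
    assert(draw_calls > 0);
    std::vector<MockConstantBuffer> buffers(draw_calls);
    std::size_t max_discards = 0;
    for (std::size_t i = 0; i < draw_calls; ++i) {
        PerDrawConstants data{};
        data.values[0] = static_cast<std::uint32_t>(i);
        buffers[i].map_discard(data);
        if (buffers[i].discard_maps > max_discards) {
            max_discards = buffers[i].discard_maps;
        }
        // Draw call i would be issued here.
    }
    return {buffers.size(), max_discards};
}
```

With 500 draw calls, (A) ends the frame with one buffer that was discarded 500 times, while (B) ends with 500 buffers that were each discarded once. Which of the two a given driver handles better is exactly the open question.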

On older GPUs, there's no such thing as a cbuffer; there's just one global set of shader registers. On these cards, when you ask to set a cbuffer, it copies the register-id/value pairs out of the cbuffer and into the command-buffer. The GPU consumes the command-buffer in order, reading out the register values before reading the draw-call.
For these kinds of GPUs, I'd theorise that option (A) would be the most efficient, as there really is no cbuffer management going on behind the scenes.

On newer GPUs, it's possible for cbuffers to be stored in VRAM, and then moved into registers when required. On these cards, when you put data into a cbuffer, it can actually perform a VRAM transfer ([i]and possibly issue a cache-invalidation command to the command-buffer[/i]). When you bind a cbuffer, you're writing a command into the command buffer that instructs the GPU to fetch some register values from VRAM.
On these cards, using option (B) would allow you to perform all of the VRAM transfers well in advance of any draw-calls that use that data, which reduces the amount of data flowing through the command-buffer. However, as you're still moving the same amount of data to the GPU every frame anyway ([i]as you're regenerating the cbuffers each frame[/i]), there isn't really a bandwidth saving here... though it still might be more efficient...
You'd probably have to test it ([i]on multiple GPUs[/i]) to find out :P


On [i]really [/i]old GPUs, there's no such thing as cbuffers AND there's no such thing as shader registers! On these cards, when you set a cbuffer, the driver actually takes the compiled shader code and inserts new instructions into it that contain your shader values ([i]now as hard-coded numbers, not variables[/i]). On this class of GPUs, no matter what you do, setting shader variables is going to be bad for performance, as every change-of-variables actually produces a whole new shader program ;)

This is a DX11 compliant card. An NVIDIA GeForce GTX 460, for example. The cbuffers are indeed in VRAM on this type of graphics card. I suppose my question boils down to, is it ok to assume that the driver for this class of modern graphics card can handle hundreds or even thousands of buffer renames each frame without breaking a sweat? Or is the buffer renaming mechanism really there only to handle a few rare cases of multiple Map/Unmaps to the same buffer?




Is there any reason you can't generate the data up front, before issuing draw calls, then build one large cbuffer and index in the shader based on an instance ID? Maybe split this up over a few buffers depending on cbuffer size so you aren't updating a massive chunk of data in one go.

So; [generate all data] -> [bind] -> [draw objects as required with indexing]

Generating data at render time seems like Bad Voodoo to me anyway; render time should just be rendering, so sort your data out beforehand.
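A rough CPU-side sketch of the [generate all data] -> [bind] -> [draw with indexing] idea, assuming a hypothetical 16-byte `PerObjectData` record. `Batch`, `build_batches` and `fetch` are illustrative names only. Each `Batch` stands for one large buffer that would be uploaded once, and `fetch` mimics the lookup a shader would do with an instance ID.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-object record (16 bytes); the real contents are up to you.
struct PerObjectData {
    std::uint32_t payload[4];
};

// One batch models one large buffer that is filled before any draw is issued.
struct Batch {
    std::size_t first_object;            // index of the first object in this batch
    std::vector<PerObjectData> entries;  // data that would be uploaded in one go
};

// Split all per-object data into batches of at most max_per_batch entries,
// so no single upload becomes a massive chunk of data.
std::vector<Batch> build_batches(const std::vector<PerObjectData>& objects,
                                 std::size_t max_per_batch) {
    assert(max_per_batch > 0);
    std::vector<Batch> batches;
    for (std::size_t first = 0; first < objects.size(); first += max_per_batch) {
        const std::size_t count = std::min(max_per_batch, objects.size() - first);
        const auto begin = objects.begin() + static_cast<std::ptrdiff_t>(first);
        Batch batch;
        batch.first_object = first;
        batch.entries.assign(begin, begin + static_cast<std::ptrdiff_t>(count));
        batches.push_back(std::move(batch));
    }
    return batches;
}

// Mimics the shader-side lookup: index into the bound batch by instance ID.
const PerObjectData& fetch(const Batch& batch, std::size_t instance_id) {
    assert(instance_id < batch.entries.size());
    return batch.entries[instance_id];
}
```

Ten objects with a batch size of four give three batches (4, 4 and 2 entries), and object 5 is found at instance ID 1 of the second batch. The batch size is the knob for "split this up over a few buffers".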

[quote name='360GAMZ' timestamp='1323135041' post='4890936']
Let's say I have 500 draw calls per frame and all 500 draw calls use the same shader and that shader uses one constant buffer. Let's also assume that the data in the constant buffer needs to be built dynamically for each draw call. What would be the most desirable way to update the constant buffers, in terms of efficiency?

A) Create a single constant buffer and call Map/Unmap on that same constant buffer before each draw call.

B) Create 500 constant buffers, one for each draw call, and call Map/Unmap on the draw call's own constant buffer.

C) Or, another idea?

I know that for (A) the driver will rename the buffer each time I Map it, discarding the previous contents which is fine. But is it ok to expect that the driver can handle hundreds or even thousands of renames per frame? And I assume the rename process consumes some time, too.

On the other hand, (B) avoids the renaming and any associated overhead at the expense of possibly more video memory being consumed (500 constant buffers, even if fewer draw calls are actually used) and more code complexity.
[/quote]


Regarding (A), I believe you should use UpdateSubresource instead.
I think I read in the SDK that it's faster for constant buffers.

Map/Unmap is for vertex buffers and textures, I think.
NOTE: I'm not 100% sure.


[quote name='phantom' timestamp='1323179144' post='4891086']
Is there any reason you can't generate the data up front, before issuing draw calls, then build one large cbuffer and index in the shader based on an instance ID? Maybe split this up over a few buffers depending on cbuffer size so you aren't updating a massive chunk of data in one go.

So; [generate all data] -> [bind] -> [draw objects as required with indexing]

Generating data at render time seems like Bad Voodoo to me anyway; render time should just be rendering, sort your data out before hand.
[/quote]

I'm trying to implement what DICE has done for Battlefield 3, in terms of using buffers to store per-instance matrices to reduce draw calls. The constant buffer will hold data such as the number of matrices in the bone matrix palette, and that number (as well as additional data being stored in it) can be different for each type of object and so needs to be updated for each draw call. Here's a link to the DICE presentation. The instancing section is the first section in the Performance section, about half way through the doc.
http://publications.dice.se/attachments/GDC11_DX11inBF3_Public.pdf

Right, I see... Well, what I said above still stands for most of your data. If you look at slides 30/31, you'll see they have a very small cbuffer for the per-draw-call data, so you might want to consider how much you place in it.

I suspect that if you are moving around small enough buffers, either option would be fine. We had a purely CPU-limited rendering test at work which was drawing 50,000 cubes and, for each draw call, was doing a map/unmap for a cbuffer on multiple contexts (6 IIRC; there might have been one per context, but don't quote me on that, it's been a while since I played with that bit of the code). With that test we were good up until around 15,000 draw calls before the driver started to get into trouble internally with memory issues.

Do whatever makes organisational sense, I guess...

My cbuffer consists of eight 32-bit integers, so only 2 vector registers. Pretty darn small. We won't have anywhere near 15,000 draw calls. Probably under 1,000, but we need to maintain 60 FPS at all times. Also, since DICE is using this method, the hardware vendors may target it for optimization in their drivers. Though, there's no telling whether DICE is using a single cbuffer and relying on renaming by the driver, or using a bunch of cbuffers. Or, using UpdateSubresource instead of Map/Unmap, as Tordin mentioned earlier.
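The size arithmetic can be checked at compile time. This is a small sketch that assumes a hypothetical `PerDrawConstants` struct matching the "eight 32-bit integers" description and a 16-byte vector register (four 32-bit components). The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: eight 32-bit integers, as described in the post.
struct PerDrawConstants {
    std::uint32_t values[8];
};

// One vector register holds four 32-bit components, i.e. 16 bytes.
constexpr std::size_t kRegisterBytes = 4 * sizeof(std::uint32_t);
constexpr std::size_t kRegistersUsed = sizeof(PerDrawConstants) / kRegisterBytes;

static_assert(sizeof(PerDrawConstants) == 32, "eight 32-bit integers are 32 bytes");
static_assert(kRegistersUsed == 2, "32 bytes fill exactly two vector registers");

// Constant data written per frame for a given number of draw calls.
inline std::size_t bytes_per_frame(std::size_t draw_calls) {
    return draw_calls * sizeof(PerDrawConstants);
}

// Constant data written per second at a given frame rate.
inline std::size_t bytes_per_second(std::size_t draw_calls, std::size_t frames_per_second) {
    assert(frames_per_second > 0);
    return bytes_per_frame(draw_calls) * frames_per_second;
}
```

At 1,000 draw calls that is 32,000 bytes of constant data per frame, or 1,920,000 bytes per second at 60 FPS, so the raw data volume is small and any cost would come from the per-call overhead.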




I've not seen anything which says to prefer UpdateSubresource over Map/Unmap. A quick look at the SDK docs would suggest that, in the best case, UpdateSubresource will put the data straight into "destination memory"; in the worst case, it creates an extra buffer, copies the data there first, and the data is then copied again into destination memory when the command buffer is flushed. A discard-map would likely do much the same but probably quicker, as it doesn't have to worry about checking for resource contention; it can just throw away the reference and grab a new chunk or reuse a chunk of memory.

In short I'd probably go for a discard-map + a cbuffer per object type but make it easy to go with multiples if it proves to be a bottleneck.
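One way to keep "a cbuffer per object type, but easy to go with multiples" flexible is to hide the choice behind a small pool. This is a sketch under assumptions, not code from any engine: `BufferSlot` stands in for a real buffer handle, `PerTypeBufferPool` is a hypothetical name, and the counter only models where a discard-map would happen.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for a real constant buffer handle; counts modelled discard-maps.
struct BufferSlot {
    int id;
    std::size_t discard_maps;
};

// Hands out buffers per object type. With buffers_per_type == 1 every draw of
// a type reuses one buffer; larger values rotate through several buffers.
class PerTypeBufferPool {
public:
    explicit PerTypeBufferPool(std::size_t buffers_per_type)
        : buffers_per_type_(buffers_per_type) {
        assert(buffers_per_type > 0);
    }

    // Returns the buffer the next draw of this object type should discard-map.
    BufferSlot& acquire(const std::string& object_type) {
        Entry& entry = entries_[object_type];
        if (entry.slots.empty()) {
            // First use of this type: create its buffers once.
            for (std::size_t i = 0; i < buffers_per_type_; ++i) {
                entry.slots.push_back(BufferSlot{next_id_++, 0});
            }
        }
        BufferSlot& slot = entry.slots[entry.cursor];
        entry.cursor = (entry.cursor + 1) % entry.slots.size();
        ++slot.discard_maps;
        return slot;
    }

    std::size_t total_buffers() const {
        std::size_t total = 0;
        for (const auto& kv : entries_) {
            total += kv.second.slots.size();
        }
        return total;
    }

private:
    struct Entry {
        std::vector<BufferSlot> slots;
        std::size_t cursor = 0;
    };

    std::size_t buffers_per_type_;
    int next_id_ = 0;
    std::map<std::string, Entry> entries_;
};
```

Starting with `PerTypeBufferPool pool(1)` gives the single-buffer-per-type behaviour. If profiling points at the repeated discards, changing the constructor argument is the only call-site change needed to spread the writes over several buffers.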