difference between D3D11_USAGE_DEFAULT and D3D11_USAGE_DYNAMIC

3 comments, last by Matias Goldberg 8 years, 4 months ago

MSDN says that D3D11_USAGE_DEFAULT is for resources that are read and written only by the GPU, and that if we want to write from the CPU we have to use a dynamic buffer (D3D11_USAGE_DYNAMIC). But I can update a buffer created with D3D11_USAGE_DEFAULT from the CPU using UpdateSubresource(), like this:

d3d11DevCon->UpdateSubresource(cbPerObjectBuffer, 0, NULL, &cbPerObj, 0, 0);

Does anybody know what the difference is between D3D11_USAGE_DEFAULT and D3D11_USAGE_DYNAMIC in terms of writing to a buffer from the CPU?

Regards,


UpdateSubresource is generally not a direct upload from system memory to video memory, but will probably do something similar to using an intermediate dynamic buffer to upload your data. Because of this, it is good practice to only use UpdateSubresource on resources that won't get updated frequently as the cost of getting data into video memory this way is fairly high.

The difference between DEFAULT and DYNAMIC is exactly as the MSDN documentation describes it. A dynamic resource can be written to directly from system memory by using Map/Unmap; a default resource can't. Have a look at this page for an explanation of dynamic resources: https://msdn.microsoft.com/en-us/library/windows/desktop/dn508285(v=vs.85).aspx

Dynamic buffer resources can be used to build some very powerful and useful tools such as linear or ring allocators for frequently changing data using the D3D11_MAP_WRITE_NO_OVERWRITE flag.

I gets all your texture budgets!

These flags may also serve as hints to the driver for where in memory to store the buffer. A DEFAULT usage buffer might be stored in memory that's fast for the GPU to read from but slow for everything else, directly in video RAM for example. On the other hand, a DYNAMIC buffer might be stored in memory that's fast for the CPU to write to. This will of course depend on the drivers and hardware architecture, but it can be significant. In other words, you had better choose the appropriate D3D11_USAGE depending on how you actually intend to use the buffer. See https://msdn.microsoft.com/en-us/library/windows/desktop/ff476259%28v=vs.85%29.aspx for more information.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

I knew how to update dynamic buffers, but since UpdateSubresource is much more convenient to use, I wondered why I would need dynamic buffers for writing from the CPU.

Now I know where I should use UpdateSubresource and where I shouldn't.

Thanks a lot,

When you use UpdateSubresource, you rely on DX and the driver to schedule an asynchronous transfer. If the data to upload is too big (or you have already exhausted the internal schedule queue), DX will stall. This is very bad for performance.

Because the pointer you provide to UpdateSubresource may be freed at any moment after the call returns, the DX runtime/driver can't assume the memory will still be valid when the async transfer happens, and thus needs to copy your data to a temporary internal buffer.

Therefore:

  • Best case scenario: DX memcpy's your data CPU->GPU. At a later moment DX will perform a GPU->GPU transfer. That's two memcpys.
  • Common case scenario: DX memcpy's your data CPU->CPU. At a later moment DX will perform a CPU->GPU transfer and, immediately afterwards, a GPU->GPU transfer. That's three memcpys.
  • Worst case scenario: DX will stall, and memcpy CPU->GPU then GPU->GPU. That's two memcpys + stall.
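The reason for the extra copy can be illustrated with a small model in plain C++. This is only a sketch of the idea, not how the D3D11 runtime is implemented: `DeferredUploadQueue`, `enqueue` and `flush` are hypothetical names, and ordinary memory stands in for GPU memory. The queue has to snapshot the caller's bytes at enqueue time because it cannot know how long the source pointer stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of a deferred upload queue; NOT the real D3D11 runtime.
class DeferredUploadQueue {
public:
    // Snapshot the caller's bytes immediately: the caller may free or
    // overwrite `src` as soon as this call returns.
    void enqueue(void* dst, const void* src, std::size_t bytes) {
        assert(dst != nullptr && src != nullptr);
        const unsigned char* first = static_cast<const unsigned char*>(src);
        pending_.push_back({dst, std::vector<unsigned char>(first, first + bytes)});
    }

    // Stands in for "the transfer is finally scheduled": copy every
    // snapshot to its destination and return the number of bytes moved.
    std::size_t flush() {
        std::size_t copied = 0;
        for (const Pending& p : pending_) {
            if (!p.snapshot.empty()) {
                std::memcpy(p.dst, p.snapshot.data(), p.snapshot.size());
            }
            copied += p.snapshot.size();
        }
        pending_.clear();
        return copied;
    }

private:
    struct Pending {
        void* dst;                            // stand-in for the GPU resource
        std::vector<unsigned char> snapshot;  // the extra copy discussed above
    };
    std::vector<Pending> pending_;
};
```

In this model every upload costs one copy into the snapshot and one copy out of it, which is where the extra memcpys in the list above come from.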

For data that is modified sporadically, or small data that gets uploaded often, this works well. For large amounts of data that need to be uploaded every frame, the two or three extra memcpys (plus a potential stall) can hurt you badly. In those cases you use a USAGE_DYNAMIC buffer, in which case the GPU will read directly from your CPU-visible buffer.

Note that mapping with DISCARD (often associated with USAGE_DYNAMIC) also has an internal memory limit before it stalls (e.g. don't map more than 4 MB per frame using DISCARD on AMD drivers), which is why you should use NO_OVERWRITE as much as possible and issue a DISCARD only every now and then.
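A minimal sketch of that pattern, written as plain C++ bookkeeping so it does not depend on the D3D11 headers. `MapMode`, `Allocation` and `DynamicBufferCursor` are hypothetical names; the two enum values stand in for D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD, and the 16-byte default alignment is an assumption. The cursor hands out fresh regions with NO_OVERWRITE until the buffer is full, and only then asks for a DISCARD.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for D3D11_MAP_WRITE_NO_OVERWRITE and
// D3D11_MAP_WRITE_DISCARD.
enum class MapMode { NoOverwrite, Discard };

struct Allocation {
    std::size_t offset;  // byte offset to write at inside the dynamic buffer
    MapMode mode;        // map flag the caller should use for this write
};

// Tracks the write position inside one dynamic buffer of fixed capacity.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity) : capacity_(capacity) {}

    Allocation allocate(std::size_t bytes, std::size_t alignment = 16) {
        assert(alignment != 0 && bytes <= capacity_);
        // Round the cursor up to the requested alignment.
        std::size_t offset = (cursor_ + alignment - 1) / alignment * alignment;
        MapMode mode = MapMode::NoOverwrite;
        if (offset + bytes > capacity_) {
            // Out of untouched space: restart at the beginning and request
            // a DISCARD so data the GPU may still be reading is not overwritten.
            offset = 0;
            mode = MapMode::Discard;
        }
        cursor_ = offset + bytes;
        return Allocation{offset, mode};
    }

private:
    std::size_t capacity_;
    std::size_t cursor_ = 0;
};
```

The caller would pass the returned mode to Map and write at the returned offset. The larger the buffer, the less often a DISCARD is issued.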

Of course hardware is complex and nothing is as straightforward as it seems. On GCN hardware, writing your data to a staging buffer (or a USAGE_DYNAMIC buffer mapped with NO_OVERWRITE) and then calling CopySubresourceRegion (or CopyResource) into a USAGE_DEFAULT buffer can end up being faster than using a USAGE_DYNAMIC buffer directly, because GCN may end up transferring CPU->GPU in the background using its DMA engines. By the time it's time to render, the data is already in GPU memory (which has higher memory bandwidth than the PCIe bus).

But this trick works like crap on Intel hardware because you're just adding extra memcpys on system memory (which has much lower bandwidth than a discrete GPU, and shares it with the CPU).

The ideal is to abstract the buffer mapping interface so that you can switch strategies based on what's faster on each hardware.
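One way such an abstraction could look, as a rough sketch only. Everything here is hypothetical: `BufferUploader`, the two implementations, `GpuFamily` and `makeUploader` are invented for illustration, plain `std::vector` storage stands in for the D3D11 buffers, and the selection rule simply mirrors the GCN/Intel observation above rather than any measured result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

enum class GpuFamily { AmdGcn, IntelIntegrated, Unknown };  // hypothetical

// Hypothetical interface: callers upload bytes without knowing the strategy.
class BufferUploader {
public:
    virtual ~BufferUploader() = default;
    virtual void upload(const void* src, std::size_t bytes) = 0;
    virtual const char* name() const = 0;
    // The memory the GPU would read from (simulated with a vector here).
    virtual const std::vector<unsigned char>& gpuView() const = 0;
};

// Models writing straight into a mapped USAGE_DYNAMIC buffer: one copy.
class DirectDynamicUploader final : public BufferUploader {
public:
    explicit DirectDynamicUploader(std::size_t capacity) : dynamic_(capacity) {}
    void upload(const void* src, std::size_t bytes) override {
        assert(bytes <= dynamic_.size());
        if (bytes != 0) std::memcpy(dynamic_.data(), src, bytes);
    }
    const char* name() const override { return "direct-dynamic"; }
    const std::vector<unsigned char>& gpuView() const override { return dynamic_; }
private:
    std::vector<unsigned char> dynamic_;
};

// Models writing to a staging buffer, then copying into a USAGE_DEFAULT buffer.
class StagedCopyUploader final : public BufferUploader {
public:
    explicit StagedCopyUploader(std::size_t capacity)
        : staging_(capacity), defaultBuf_(capacity) {}
    void upload(const void* src, std::size_t bytes) override {
        assert(bytes <= staging_.size());
        if (bytes == 0) return;
        std::memcpy(staging_.data(), src, bytes);                // CPU write
        std::memcpy(defaultBuf_.data(), staging_.data(), bytes); // stands in for the GPU-side copy
    }
    const char* name() const override { return "staged-copy"; }
    const std::vector<unsigned char>& gpuView() const override { return defaultBuf_; }
private:
    std::vector<unsigned char> staging_;
    std::vector<unsigned char> defaultBuf_;
};

// Assumed selection rule; a real engine would profile each target instead.
std::unique_ptr<BufferUploader> makeUploader(GpuFamily family, std::size_t capacity) {
    if (family == GpuFamily::AmdGcn) {
        return std::unique_ptr<BufferUploader>(new StagedCopyUploader(capacity));
    }
    return std::unique_ptr<BufferUploader>(new DirectDynamicUploader(capacity));
}
```

Rendering code would hold only a `BufferUploader` pointer, so changing the strategy for a given GPU becomes a one-line change in the factory.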

Fun times.

