Questions on buffers and their lifetime in GPU memory

Started by chiffre; 10 comments, last by chiffre 6 years, 5 months ago

Heads up: this question is more theoretical than practical. My (minute) knowledge of D3D11 is self-taught, so please take any premise I make with additional care. I invite everyone to correct anything I say. Now to the actual post.

I have a question about the lifetime of a D3D11_USAGE_DEFAULT buffer, used through an ID3D11ShaderResourceView as a StructuredBuffer, in GPU memory. First I need to make sure I am understanding the difference between DEFAULT and DYNAMIC buffers correctly. My understanding of that difference comes from here:

D3D11_USAGE_DEFAULT

D3D11_USAGE_DEFAULT tells the API to store my buffer in memory that is fast for the GPU to access. This does not guarantee (?) that it is located in VRAM, but it makes that more likely. I can update the buffer (partially) by using UpdateSubresource (a minimal creation/update sketch follows after the quote below). Here is some info from the previously mentioned thread.

On 27.11.2015 at 6:41 PM, Matias Goldberg said:

When you use UpdateSubresource, you rely on DX and the driver to schedule an asynchronous transfer. If the data to upload is too big (or you have already exhausted the internal scheduling queue), DX will stall. This is very bad for performance.

 

Because the pointer you provide to UpdateSubresource may be freed at an undisclosed moment, the DX runtime/driver can't assume it won't be freed before the async transfer happens, and thus needs to copy your data to a temporary internal buffer.

Therefore:

  • Best case scenario: DX memcpys your data CPU->GPU. At a later moment DX will perform a GPU->GPU transfer. That's two memcpys.
  • Common case scenario: DX memcpys your data CPU->CPU. At a later moment DX will perform a CPU->GPU and immediately afterwards a GPU->GPU transfer. That's three memcpys.
  • Worst case scenario: DX will stall, and memcpy CPU->GPU then GPU->GPU. That's two memcpys + a stall.
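To make the DEFAULT/UpdateSubresource path concrete, here is a minimal sketch (my own illustration, not code from the thread; the device, context, and payload type are assumptions):

```cpp
#include <d3d11.h>

struct MyData { float values[4]; };  // illustrative payload type

void CreateAndUpdate(ID3D11Device* device, ID3D11DeviceContext* context)
{
    // Creation: DEFAULT usage, no CPU access flags.
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth      = sizeof(MyData) * 1024;
    desc.Usage          = D3D11_USAGE_DEFAULT;        // GPU-friendly memory
    desc.BindFlags      = D3D11_BIND_SHADER_RESOURCE;
    desc.CPUAccessFlags = 0;                          // no Map from the CPU

    ID3D11Buffer* buffer = nullptr;
    device->CreateBuffer(&desc, nullptr, &buffer);

    // Update: the runtime copies 'cpuData' to a safe internal location first
    // (the extra memcpy discussed in the quote above), then schedules the
    // GPU-side transfer at a time of its choosing.
    MyData cpuData[1024] = {};
    context->UpdateSubresource(buffer, 0, nullptr, cpuData, 0, 0);
}
```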

D3D11_USAGE_DYNAMIC

D3D11_USAGE_DYNAMIC tells the API to store my buffer in memory that is fast for the CPU to access. This guarantees (?) it will be located in system RAM and not VRAM. Whenever the GPU needs to access the data, it will upload the data to VRAM. Assuming the hardware can handle buffers larger than 128 MB (see footnote 1 here), this theoretically means the size of the buffer is limited by the amount of data that can be transferred from CPU memory to GPU memory in the desired frametime. An estimate for the upper boundary, ignoring the time needed to actually process the data, would be the PCIe bandwidth available to the GPU divided by the desired framerate (can we estimate a more precise upper boundary? a rough worked example follows after the list below). I can update the buffer using Map/Unmap with one of the following flags:

  1. D3D11_MAP_WRITE
  2. D3D11_MAP_READ_WRITE
  3. D3D11_MAP_WRITE_DISCARD
  4. D3D11_MAP_WRITE_NO_OVERWRITE
  5. (D3D11_MAP_READ <- this would not be for updating, but simply for reading)
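To put a rough number on that upper boundary (my own back-of-the-envelope figures, not from the thread): a PCIe 3.0 x16 link peaks at about 15.75 GB/s, so at 60 fps the transfer alone caps you at roughly 15.75 GB/s ÷ 60 ≈ 262 MB of dynamic data per frame. Real-world throughput is lower, and the upload has to share the frame with actual rendering, so a practical budget is a small fraction of that.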

Nvidia suggests using D3D11_MAP_WRITE_DISCARD (for constant buffers).

The reason for this (as I understand from here) is that a buffer may still be in use when you are trying to update it. MAP_WRITE_DISCARD lets you write to a different region of memory, so the GPU can discard the old buffer when it is done with it and grab the new one when it needs it (a minimal Map/Unmap sketch follows below). All of this is still under my personal, possibly wrong, premise that a USAGE_DYNAMIC buffer is stored in system RAM and fetched by the GPU over PCIe lanes when it needs it.
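For reference, the WRITE_DISCARD update itself is just this (a minimal sketch; it assumes a buffer created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE, plus the usual immediate context):

```cpp
// Assumed: dynamicBuffer, newData, and dataSize exist in the caller's scope.
D3D11_MAPPED_SUBRESOURCE mapped = {};
if (SUCCEEDED(context->Map(dynamicBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
{
    // DISCARD hands back a fresh memory region; the GPU keeps reading the
    // old contents until it is done with them. Write only - never read pData.
    memcpy(mapped.pData, newData, dataSize);
    context->Unmap(dynamicBuffer, 0);
}
```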

If I were to use MAP_WRITE_NO_OVERWRITE, I could write to a buffer that is in use, but I would have to guarantee that my implementation does not overwrite anything the GPU is currently using (the usual ring-buffer pattern is sketched below). I assume something undefined happens otherwise. Here I would really need to understand the intricacies of how D3D11 manages CPU/GPU memory, so if you happen to know about these intricacies in relation to the map flags, please share your knowledge. :)
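The usual way to uphold that guarantee is a ring-buffer pattern: append with NO_OVERWRITE and fall back to DISCARD only when wrapping. A sketch under those assumptions (the names are mine; note that before D3D11.1, NO_OVERWRITE is only allowed on vertex/index buffers):

```cpp
#include <d3d11.h>
#include <cstring>

static UINT g_cursor = 0;                    // persists across calls
static const UINT kBufferSize = 1024 * 1024; // size the buffer was created with

void Append(ID3D11DeviceContext* context, ID3D11Buffer* vb,
            const void* src, UINT bytes)
{
    D3D11_MAP mapType = D3D11_MAP_WRITE_NO_OVERWRITE;
    if (g_cursor + bytes > kBufferSize)
    {
        // Wrapping would touch regions the GPU may still be reading, so ask
        // for a fresh buffer instead and restart at the front.
        mapType  = D3D11_MAP_WRITE_DISCARD;
        g_cursor = 0;
    }

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (SUCCEEDED(context->Map(vb, 0, mapType, 0, &mapped)))
    {
        memcpy(static_cast<unsigned char*>(mapped.pData) + g_cursor, src, bytes);
        context->Unmap(vb, 0);
        g_cursor += bytes;
    }
}
```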

Back to my initial question:

A structured buffer is nothing but an ID3D11Buffer wrapped by an ID3D11ShaderResourceView (see the creation sketch below). As I understand it, this means the memory management by D3D11 should be no different. Of course that assumption could be fatally flawed, but that is why I am posting here asking for help.
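For concreteness, this is roughly what that wrapping looks like (my sketch; the device and the element struct are assumptions):

```cpp
#include <d3d11.h>

struct Particle { float position[3]; float velocity[3]; }; // illustrative element

void CreateStructuredBufferSRV(ID3D11Device* device,
                               ID3D11Buffer** outBuffer,
                               ID3D11ShaderResourceView** outSRV)
{
    const UINT kElementCount = 4096;

    // The STRUCTURED misc flag plus a stride is what makes the ID3D11Buffer
    // usable as a StructuredBuffer<Particle> in HLSL.
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth           = sizeof(Particle) * kElementCount;
    desc.Usage               = D3D11_USAGE_DEFAULT;
    desc.BindFlags           = D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = sizeof(Particle);
    device->CreateBuffer(&desc, nullptr, outBuffer);

    // The SRV "wrapper": Format must be DXGI_FORMAT_UNKNOWN for structured buffers.
    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format              = DXGI_FORMAT_UNKNOWN;
    srvDesc.ViewDimension       = D3D11_SRV_DIMENSION_BUFFER;
    srvDesc.Buffer.FirstElement = 0;
    srvDesc.Buffer.NumElements  = kElementCount;
    device->CreateShaderResourceView(*outBuffer, &srvDesc, outSRV);
}
```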

Nonetheless I have to bind and unbind shader resources, for example for the vertex shader via VSSetShaderResources (a small example follows below). How is binding/unbinding (either implicitly by binding a new resource, or explicitly by binding a nullptr) related to the memory management of my ID3D11Buffer by the D3D11 API? Assuming I have used a USAGE_DEFAULT buffer, I would hope my structured buffer stays in VRAM until I Release() the resource explicitly, meaning I can bind/unbind without the cost of having to move the buffer between RAM and VRAM.
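For completeness, the bind/unbind itself, using the SRV from the sketch above (slot 0 is an arbitrary choice):

```cpp
context->VSSetShaderResources(0, 1, &srv);      // bind for the vertex shader
// ... issue draw calls that read the structured buffer ...
ID3D11ShaderResourceView* nullSRV = nullptr;
context->VSSetShaderResources(0, 1, &nullSRV);  // explicit unbind
```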

I guess this question can be generalized to the following: do I ever get a guarantee from D3D11 that something is stored in VRAM until I decide to remove/release it? Of course I still need clarification/answers for the rest of the questions in my post, but my difficulties with D3D11 boil down to a lack of understanding of the lifetime of objects in VRAM, and of how I can influence these lifetimes.

Thanks for reading this far, hope someone can help me. :D

 

 


IIRC binding and unbinding resources to the pipeline has mostly nothing to do with the residency lifetime of resources. The only thing I can think of is that when you unbind a resource you are indirectly telling the API that it is now safe to do things with said resource (for example, making it non-resident). There are two basic states of your application: being overcommitted in terms of VRAM, and not being overcommitted. If you're not overcommitted, I don't think D3D/drivers will mess with residency. Basically, what I'm saying is that if there's no reason to mess with video memory, I don't think D3D/drivers will. If you are overcommitted, then D3D/drivers will mess with allocations. I'm not sure what method they use for management (LRU maybe).

edit - I should also mention that the CPU and GPU work asynchronously... moreover, the GPU takes a variable amount of time to execute each command. Practically this means the CPU/runtime/drivers buffer lots of commands for the GPU to execute. So when you bind or unbind something, it's not happening at that moment in time; it happens later, at the right time. Also, IIRC binding resources has more to do with the API abstraction than the GPU... remember the API is a state machine, and commands inherit state.

-potential energy is easily made kinetic-

In D3D11, you actually have no direct control over, or guarantee about, whether a resource is in VRAM. This is managed automatically by the D3D11 drivers. If VRAM fills up, for example, the driver will make a decision about which resources it can page out to system memory, and if system memory fills up, it will make a decision about which resources can be paged out to disk. If a resource is needed by a shader, it'll be paged back into VRAM.

Before Windows 10 (WDDM 1.x), the driver was responsible for submitting an "allocation list" and "patch location list" alongside every command buffer. These lists served two primary purposes: they let the kernel-mode driver patch addresses in the command buffer with actual physical memory addresses (this old model assumed that GPUs didn't have virtual memory capabilities), and they let the video memory manager (VIDMM) know which resources were being referenced by the command buffers. That second part is tied to residency: the video memory manager would go off those lists of referenced resources when trying to determine what to keep resident in video memory and what to evict to system memory. So with that in mind, the act of "binding" a resource can indirectly affect residency, assuming that you issue a Draw or Dispatch call that actually references that binding. In other words, if you bind your buffer every frame and use it in one draw call, it's less likely to get evicted than if you didn't reference it at all for a while.

With WDDM 2.0, the patch and allocation lists are gone for GPUs that support virtual addressing. Under this model residency is explicit from an OS point of view: it just responds to requests to evict resources or make them resident. In D3D12 these actions are made directly available to you on a per-resource or per-heap basis (see the sketch below). In D3D11 you don't have those controls, so instead it's up to the driver to make changes to residency automatically, which it will (likely) do based on which resources are referenced by your Draw/Dispatch calls.
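For anyone curious what those explicit D3D12 controls look like, a minimal sketch (the device and heap are assumed to exist; error handling omitted):

```cpp
#include <d3d12.h>

void ToggleResidency(ID3D12Device* device, ID3D12Heap* heap)
{
    ID3D12Pageable* pageables[] = { heap };

    // Allow the OS to page this allocation out under memory pressure...
    device->Evict(1, pageables);

    // ...and bring it back before submitting GPU work that references it.
    device->MakeResident(1, pageables);
}
```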

 

Damn, I can't believe I left out the simplest case... when something isn't resident and you bind it.  Thanks MJP.  BTW @MJP we miss you over on B3D. 

-potential energy is easily made kinetic-

@MJP and @Infinisearch are totally correct. Some clarifications/additions:

3 hours ago, MJP said:

In D3D11 you don't have those controls, so instead it's up to the driver to make changes to residency automatically, which it will (likely) do based on which resources are referenced by your Draw/Dispatch calls.

There's some pseudocode that drivers should follow which basically implements the logic that VidMM used to run on the WDDM1 model, but is now the responsibility of the WDDM2 usermode drivers. If the driver doesn't do this well, then the graphics system has no choice but to suspend the app, page out its stuff to let other apps run, page the app's stuff back in, and then resume the app, since the app/driver said it needs those resources resident in order to be able to run.

 

To your original post:

1. Dynamic buffers only support the WRITE_DISCARD and WRITE_NO_OVERWRITE map flags. READ, WRITE, and READ_WRITE are not supported.

2. 

12 hours ago, chiffre said:

do I ever get a guarantee from D3D11 that something is stored in VRAM until I decide to remove/release it?

So, first off, the answer is no - this is called "pinning" and is not expressible. Secondly, it's a little more complicated than that. The GPU has the ability to read data directly from system RAM (as you pointed out in your original post). There are 3 states a resource can be in: resident in VRAM, resident in RAM, or not resident. A resource that's "not resident", as MJP mentioned, is inaccessible; it can be paged out of VRAM via the video memory manager, and even paged out of RAM to disk via the regular OS memory manager. A resource that's resident in system memory has essentially already been paged out of VRAM, but still needs to be accessible. Like Infinisearch mentioned, simply making a resource not resident (which can happen as a result of unbinding, depending on how the driver tracks "in use") doesn't necessarily actually move it out of VRAM; it just means that it can be moved if something else needs to be there instead.

3. Some resources are requested to be resident in VRAM, and some are only ever requested to be resident in RAM. Generally, DYNAMIC resources don't have a request for VRAM, whereas DEFAULT resources generally do. There are exceptions (e.g. DEFAULT resources with CPU access flags generally don't reside in VRAM), but for the most part those generalizations hold true.

First of all, thanks for all the answers! I see that I only have indirect influence on the residency of resources, at least in D3D11.

10 hours ago, SoldierOfLight said:

The GPU has the ability to read data directly from system RAM (as you pointed out in your original post). There are 3 states a resource can be in: resident in VRAM, resident in RAM, or not resident. A resource that's "not resident", as MJP mentioned, is inaccessible; it can be paged out of VRAM via the video memory manager, and even paged out of RAM to disk via the regular OS memory manager. A resource that's resident in system memory has essentially already been paged out of VRAM, but still needs to be accessible. Like Infinisearch mentioned, simply making a resource not resident (which can happen as a result of unbinding, depending on how the driver tracks "in use") doesn't necessarily actually move it out of VRAM; it just means that it can be moved if something else needs to be there instead.

3. Some resources are requested to be resident in VRAM, and some are only ever requested to be resident in RAM. Generally, DYNAMIC resources don't have a request for VRAM, whereas DEFAULT resources generally do. There are exceptions (e.g. DEFAULT resources with CPU access flags generally don't reside in VRAM), but for the most part those generalizations hold true.

The way I understand the first underlined portion of SoldierOfLight's post, I can trust the video memory manager to "do the right thing" as long as I don't (how do I word this? ask the D3D11 API politely? ;) ) attempt to overcommit VRAM. In the scenario where "something else needs to be there instead", the video memory manager will prioritize resources based on my usage/access flags and change residencies as it thinks appropriate.

To the second underlined bit: "DEFAULT resources with CPU access flags generally don't reside in VRAM". Does this mean I might as well use a dynamic buffer with CPU access flags, because I can take advantage of Map/Unmap over UpdateSubresource?

 

40 minutes ago, chiffre said:

To the second underlined bit: "DEFAULT resources with CPU access flags generally don't reside in VRAM". Does this mean I might as well use a dynamic buffer with CPU access flags, because I can take advantage of Map/Unmap over UpdateSubresource?

No, UpdateSubresource isn't the same as CPU access flags. See the MapOnDefaultBuffers member of D3D11_FEATURE_DATA_D3D11_OPTIONS1 (a quick feature-check sketch follows below).
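A quick way to check for that capability (a sketch assuming an ID3D11Device* named device; whether the feature is reported depends on the OS and driver):

```cpp
#include <d3d11_2.h> // D3D11_FEATURE_DATA_D3D11_OPTIONS1

bool SupportsMapOnDefaultBuffers(ID3D11Device* device)
{
    D3D11_FEATURE_DATA_D3D11_OPTIONS1 options1 = {};
    HRESULT hr = device->CheckFeatureSupport(D3D11_FEATURE_D3D11_OPTIONS1,
                                             &options1, sizeof(options1));
    // If TRUE, DEFAULT buffers created with CPU access flags can be Mapped.
    return SUCCEEDED(hr) && options1.MapOnDefaultBuffers;
}
```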

@SoldierOfLight I was wondering: in modern computers, is there still something like AGP memory? Moreover, does GPU-visible system memory stay statically mapped, or is it mapped on demand? Also, any reason why DX9's UP (user pointer) calls were dropped in later API revisions?

-potential energy is easily made kinetic-

I'm not really familiar with AGP memory. Sounds like it's just RAM that's visible to the GPU? That's definitely still a thing. If you're talking about the specialized hardware I see mentioned, which essentially virtualizes RAM, nowadays that's just virtual addressing. I'm not super familiar with how this works these days, but I expect that the GPU gets a virtual address assigned to system memory, and the corresponding memory managers make sure that the mapping stays up to date.

For the UP support, I believe that WDDM didn't natively support this functionality (Vista+), so for D3D9 it was emulated in newer drivers. However, you might be interested in ID3D12Device3::OpenExistingHeapFromAddress (sketched below).
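A sketch of that API for the curious (the ID3D12Device3* is assumed, and the memory must come from VirtualAlloc so it is page-aligned; error handling omitted):

```cpp
#include <windows.h>
#include <d3d12.h>

void WrapCpuMemoryInHeap(ID3D12Device3* device3)
{
    const SIZE_T size = 64 * 1024;
    void* cpuMemory = VirtualAlloc(nullptr, size, MEM_COMMIT | MEM_RESERVE,
                                   PAGE_READWRITE);

    // Wraps existing CPU memory in a D3D12 heap; resources placed in this
    // heap alias that memory directly - the closest modern analogue to the
    // old DrawPrimitiveUP-style user pointers.
    ID3D12Heap* heap = nullptr;
    device3->OpenExistingHeapFromAddress(cpuMemory, IID_PPV_ARGS(&heap));
}
```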
