Texture/Buffer persistence in VRAM


Hi all,

I've written the odd shader, but I must admit my overall understanding of the pipeline is lacking. I'm wondering how VRAM memory management works. In particular:

  • Which objects persist in VRAM between frames? I assume textures and vertex info?
  • Can a framebuffer be persisted on the GPU between frames, e.g. if you have a bunch of security cameras in your game that don't need to be updated every frame? Or for temporal reprojection for fancy screen-space effects.
  • What happens if you have too much in VRAM? Does it get paged out, or something worse?

If you can answer any of the above it would be much appreciated.

Thanks,

JT


So this depends on the operating system, as well as the API being used. I'm only familiar with how things work on Windows, so that's what I'll explain. If you want the full details, the official WDDM documentation is the place to look.

On Windows Vista through Windows 8, the OS uses WDDM 1.x, where the OS pretty much completely controls VRAM allocation. To simplify things considerably, the way it works is that the OS will keep track of both the dedicated memory on the video card (VRAM), as well as a chunk of system memory that's usually around 2x the size of VRAM. Any apps that use the GPU will tell the driver to create "resources", where a resource is generally a single allocation (typically either a texture or a buffer). The app will then issue rendering commands, which causes the driver to create command buffers that are submitted to the GPU for execution. Whenever the driver submits one of these command buffers, it also has to submit a list of resources that are referenced by that command buffer. So if you issue a draw that uses TextureA and TextureB via shader resource views, the driver will include those resources in the list. This list serves two purposes:

  1. It allowed the kernel-mode driver to patch physical addresses into the command buffer. Older GPUs didn't support virtual addressing, and required raw physical addresses provided by the OS's memory manager.
  2. It allowed the OS's memory manager to know which resources were needed for that command buffer to execute. The memory manager would use this list to shuffle resources in and out of VRAM, which allows multiple apps to share the GPU without requiring the sum of all their resources to fit in VRAM. It also allows a single app to oversubscribe VRAM. Resources that are in VRAM are considered to be "resident", while resources that are paged out to system memory are considered "evicted".

In other words, the OS would try its best to make sure that the resources you're actually using stay in VRAM. In that way it's somewhat similar to CPU memory, which can get paged out to the swap file when it's not in use. However, if your app (or multiple apps collectively) tries to use too much memory simultaneously, it will typically manifest as poor performance resulting from the OS frantically moving things in and out of VRAM. In some cases it's also possible that a resource will get moved to an area of CPU memory that's still accessible by the GPU, albeit more slowly than if it were in VRAM. Either way, there are tools that you can use to track this down.
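To make "creating a resource" concrete, here's a minimal D3D11-style sketch (the size, format, and function name are arbitrary placeholders, not anything prescribed by the API):

```cpp
#include <d3d11.h>

// Hypothetical example: creating a single "resource" (here, a 2D texture).
// Under WDDM 1.x / D3D11 the OS decides when this allocation actually lives
// in VRAM; the app never manages residency directly.
ID3D11Texture2D* CreateSimpleTexture(ID3D11Device* device)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = 512;
    desc.Height           = 512;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;
    desc.BindFlags        = D3D11_BIND_SHADER_RESOURCE;

    ID3D11Texture2D* texture = nullptr;
    device->CreateTexture2D(&desc, nullptr, &texture);
    // Every draw that references this texture (via an SRV) causes the driver
    // to include it in the allocation list for the submitted command buffer.
    return texture;
}
```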

On Windows 10 there's a new driver model called WDDM 2.0. Under this driver model GPUs are expected to have virtual addressing support, which simplifies things a bit: using virtual addresses in command buffers avoids the need for patching, which saves CPU overhead. If the app is using D3D12, control over residency is also given to the app instead of being automatic. The app can use the MakeResident and Evict functions to manually move resources (or heaps) in and out of VRAM. The OS also has a mechanism for notifying apps when their amount of available GPU memory is shrinking, which usually happens because another app has requested VRAM. In that scenario the app is expected to destroy or evict resources itself, but in practice the OS will start evicting automatically if your app fails to do so. In that case the evicted resources will most likely be slower to access, which can degrade performance. D3D11 apps see the same behavior as they did previously (residency is automatic), but under WDDM 2.0 the driver is responsible for providing this behavior.
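As a rough sketch of what that manual control looks like in D3D12 (error handling omitted; the heap and adapter are assumed to have been created elsewhere, and the budget check via IDXGIAdapter3::QueryVideoMemoryInfo is my addition for illustration):

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>

// Sketch: decide whether a heap should be resident based on the current
// local (VRAM) budget reported by the OS.
void ManageResidency(ID3D12Device* device, IDXGIAdapter3* adapter, ID3D12Heap* heap)
{
    // Query how much of the VRAM budget this process is currently using.
    DXGI_QUERY_VIDEO_MEMORY_INFO memInfo = {};
    adapter->QueryVideoMemoryInfo(0, DXGI_MEMORY_SEGMENT_GROUP_LOCAL, &memInfo);

    ID3D12Pageable* pageables[] = { heap };

    if (memInfo.CurrentUsage > memInfo.Budget)
    {
        // Over budget: evict something we don't need this frame. Its contents
        // may be paged out to system memory until it's made resident again.
        device->Evict(_countof(pageables), pageables);
    }
    else
    {
        // Make sure the heap is resident before submitting command lists
        // that reference resources placed in it.
        device->MakeResident(_countof(pageables), pageables);
    }
}
```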

@MJP

Thanks for the in-depth explanation! It definitely demystifies a lot of things for me.

Do you happen to know whether frame buffers can remain in VRAM between frames? And whether you can copy a buffer to another buffer purely on the GPU? I'm wondering whether a big bank of static security cameras could be rendered on the cheap by keeping pre-rendered low res frame buffers (with depth) and simply drawing dynamic objects on top of them. My guesstimate is that it may work well due to:

  1. Low number of meshes
  2. Low pixel coverage
  3. The majority of pixels failing the Z Test

Framebuffers will most likely remain in VRAM, but you would have to use specific tools/APIs to be sure. It all depends on the amount of memory you use: you can have hundreds of framebuffers on the GPU as long as the memory they occupy isn't needed for other resources or applications. It's a good idea to allocate large resources (like framebuffers) as early as possible to reduce the chance of eviction.
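As a rough illustration of the security-camera idea (the names, sizes, and refresh policy here are made-up placeholders, not a definitive implementation), a persistent offscreen target in D3D11 might look like this:

```cpp
#include <d3d11.h>

// Create a small render target once at load time. Its contents persist
// between frames for as long as the texture object is alive (and the OS
// hasn't been forced to evict it).
void CreateCameraTarget(ID3D11Device* device,
                        ID3D11Texture2D** outTex,
                        ID3D11RenderTargetView** outRTV,
                        ID3D11ShaderResourceView** outSRV)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width            = 256;   // low-res camera feed
    desc.Height           = 144;
    desc.MipLevels        = 1;
    desc.ArraySize        = 1;
    desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage            = D3D11_USAGE_DEFAULT;
    // Render into it once, then sample it when drawing the monitor quad.
    desc.BindFlags        = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;

    device->CreateTexture2D(&desc, nullptr, outTex);
    device->CreateRenderTargetView(*outTex, nullptr, outRTV);
    device->CreateShaderResourceView(*outTex, nullptr, outSRV);
}

// Per frame: bind the cached RTV (plus a persistent depth buffer) only when
// the camera's static background needs a refresh; otherwise just draw the
// dynamic objects on top and composite the SRV onto the monitor.
```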

Yes, you can 'freely' copy resources (buffers, textures, ...) on the GPU. Note the quotes: the details are all API-dependent. For example, some APIs have the constraint that the textures need to have the same dimensions and a compatible format, while other APIs are pretty much memcpy-style (very powerful, but a possible source of headaches :) ).
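In D3D11, for instance, a GPU-side copy is a single call. A minimal sketch (the device context and the two textures are assumed to exist already):

```cpp
#include <d3d11.h>

// Hypothetical helper: copy one texture into another entirely on the GPU.
void CopyOnGpu(ID3D11DeviceContext* context,
               ID3D11Texture2D* dst, ID3D11Texture2D* src)
{
    // Whole-resource copy: in D3D11 both textures must be the same type with
    // identical dimensions and compatible formats.
    context->CopyResource(dst, src);

    // Partial copy: take a 128x128 region of src and place it at (0, 0) in dst.
    D3D11_BOX srcBox = { 0, 0, 0, 128, 128, 1 }; // left, top, front, right, bottom, back
    context->CopySubresourceRegion(dst, 0, 0, 0, 0, src, 0, &srcBox);
}
```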

