Say your typical low-end graphics card has 512MB-1GB of memory. If it's realistic that the total data required to draw a complete frame is 2GB, would that mean the GPU memory has to be refreshed 2-5+ times every frame?
Do I need to start batching based on buffer sizes?
It's always been the case that you shouldn't use more memory than the GPU actually has, because it results in terrible performance. So, assuming you've always followed this advice, you won't have to do much extra work in the future.
The article is not very clear on this. It says that the driver will tell the operating system to copy resources into GPU memory (from system memory) as required, but that only the application can free those resources, once all of the queued commands using them have been processed by the GPU.

It's not clear whether resources can also be released (from GPU memory, by the OS) while already-queued commands are still being processed, to make room for the next 512MB (or 1GB, or whatever size) of your 2GB of data. My guess is that this is not possible. It would imply that the application's "swap resource" request could somehow be plugged into the driver/GPU's queue of commands, releasing unused resource memory mid-frame - which is probably not possible, since (also according to the article) the application has to wait for all of the queued commands in a frame to be executed before it knows which resources are no longer needed.

Also, "the game already knows that a sequence of rendering commands refers to a set of resources" - this implies that the application (let alone the OS) can only change resource residency in-between frames (i.e. between sequences of rendering commands), not during a single frame.
If D3D12 is going down the same path as the other low-level APIs, then resources as we know them in D3D don't really exist any more.
A resource such as a texture ceases to exist. Instead, you just get a form of malloc/free to use as you will. You can malloc/free memory whenever you want, but freeing memory too soon (while command lists referencing that memory are still in flight) will be undefined behavior (logged by the debug runtime, likely to cause corruption in the regular runtime).
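To make that concrete, here's a rough sketch of what that kind of allocation model might look like (all of the function names are made up for illustration - don't go looking for them in any real API):

```cpp
// Pseudocode sketch -- GpuMalloc/GpuFree/SubmitCommandList/WaitForFence are made-up names.
GpuAllocation verts = GpuMalloc(4 * 1024 * 1024);   // grab 4MB of GPU memory
WriteVertexData(verts, myMeshData);                 // fill it however you like
RecordDrawsThatRead(cmdList, verts);                // command list now references that memory

uint64_t fence = SubmitCommandList(cmdList);        // GPU signals 'fence' when it's done

// Calling GpuFree here would be the "too soon" case -- the command list is still in flight,
// so you have to wait (or defer the free for a few frames) before releasing the memory.
WaitForFence(fence);
GpuFree(verts);
```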
Resource-views stay pretty much as-is, but instead of creating a resource-view that points to a resource object, they just have a raw pointer inside them, which points somewhere into one of the gpu-malloc allocations that you've previously made. These resource-view objects will hopefully be POD blobs instead of COM objects, which can easily be copied around into your descriptor tables. These 'view' structures are in native-to-the-GPU formats, and will be read directly, as-is, by your shader programs executing on the GPU.
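Something vaguely along these lines (again, a made-up layout - the real descriptor formats are vendor-specific and opaque):

```cpp
// Made-up POD "view" blob -- real hardware descriptor layouts are vendor-specific.
struct TextureView
{
    uint64_t gpuAddress;   // raw pointer into one of your gpu-malloc allocations
    uint32_t format;       // e.g. an RGBA8 enum value the GPU understands natively
    uint32_t width, height, mipCount;
};

// Because it's plain data (not a COM object), building a descriptor table is just copying:
TextureView table[2] = { albedoView, normalView };
memcpy(descriptorTableMemory, table, sizeof(table));   // shaders read these blobs directly
```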
This is basically what's going on already inside D3D, but it's hidden behind a thick layer of COM abstraction.
At the moment, the driver/runtime has to track which "resources" are used by a command list, and from there figure out which range of memory addresses are used by a command list.
The command-list and this list of memory-ranges are passed down to the Windows display manager, which is responsible for virtualizing the GPU and sharing it between processes. It stores this info in a system-wide queue, and eventually gets around to ensuring that your range of (virtual) memory addresses are actually resident in (physical) GPU-RAM and are correctly mapped, and then it submits the command list.
At the moment, it's up to D3D to internally keep track of how many megabytes of memory are required by a command-list (i.e. how many megabytes of resources are referenced by it). When D3D detects that you're using too much memory, it's likely ending its internal command-list early, submitting this partial command-list, and then starting a new command list for the rest of the frame.
Also, DX12 is only a driver/application-side improvement over DX11. Adding memory management capabilities to the GPU itself would also require a hardware-side redesign.
This kind of memory management is already required in order to implement the existing D3D runtime - pretending that the managed pool can be of unlimited size requires that the runtime can submit partial command buffers and page resources in and out of GPU-RAM during a frame.
There are already lots of HW features available to allow this:
Both the CPU and the GPU use virtual-addressing, where the value of a pointer doesn't necessarily correspond to a physical address in RAM.
Generally, most pointers (i.e. virtual addresses) we use on the CPU are mapped to physical "main RAM", but pointers can also be mapped to IO devices, or other bits of RAM, such as RAM that's physically on the GPU.
The most basic system is then for us to use an event, such that when the GPU executes that event command, it writes a '1' into an area of memory that we've previously initialized with a zero. The CPU can submit the command buffer containing this event command, and then poll that memory location until it contains a '1', indicating the GPU has completed the commands preceding the event. The CPU can then map physical GPU memory into the CPU's address space, and memcpy new data into it.
This is a slow approach though - it requires the CPU to waste time doing memcpys... and it's even worse than it sounds, because memcpy'ing from the CPU into GPU-RAM is much slower than memcpy'ing into regular RAM!
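In code, that basic polling approach looks something like this (pseudocode, made-up names again):

```cpp
// Pseudocode -- InsertEventCommand/Submit/MapGpuAllocation are illustrative names only.
volatile uint32_t* flag = AllocateCpuVisibleWord();   // word visible to both CPU and GPU
*flag = 0;

RecordDraws(cmdList);
InsertEventCommand(cmdList, flag, 1);   // GPU writes 1 here once the preceding commands are done
Submit(cmdList);

while (*flag != 1) { /* spin (or sleep) -- CPU waits for the GPU to reach the event */ }

void* dst = MapGpuAllocation(oldTextureAllocation);   // map physical GPU-RAM into CPU address space
memcpy(dst, newTextureData, newTextureSize);          // the slow part: CPU writes across the bus
```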
Another approach is to get the GPU to do the memcpy instead. You map some CPU-side physical memory into the GPU's virtual address space and, at the end of the command buffer, insert a dispatch command that launches a compute shader which just reads from the CPU-side pointer and writes to a GPU-side pointer.
This frees up the CPU, but wastes precious GPU-compute time on something as basic as a memcpy. On that note - yep, the GPU can read from CPU RAM at any time really - you could just leave your textures and vertex buffers in CPU-RAM if you liked... but performance would be much worse... Also, on Windows, you can't have too much CPU-RAM mapped into GPU-address-space at any one time or you degrade system-wide performance (as it requires pinning the CPU-side pages / marking them as unable to be paged out).
Even older GPUs that don't have compute capabilities will have a mechanism to implement this technique -- it's just easier to explain if we talk about a memcpy compute shader.
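The compute-shader flavour of the idea, sketched out (made-up names; the shader body itself is effectively just `dst[i] = src[i]`):

```cpp
// Pseudocode -- Map/Bind/Dispatch names here are illustrative, not a real API.
void* src = MapCpuMemoryIntoGpuAddressSpace(newTextureData, newTextureSize); // pins these CPU pages
void* dst = gpuAllocation.gpuPtr;                                            // destination in GPU-RAM

// Appended to the *end* of the frame's command buffer, so it runs after the last draw
// that still reads the old contents of 'dst':
BindComputeShader(cmdList, copyShader);              // shader just does dst[i] = src[i]
BindShaderPointers(cmdList, src, dst);
Dispatch(cmdList, newTextureSize / (64 * 4), 1, 1);  // e.g. 64 threads per group, 4 bytes each
Submit(cmdList);
```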
Lastly, modern GPUs have dedicated "DMA units", which are basically just asynchronous memcpy queues. As well as your regular command buffer full of draw/dispatch commands, you can have a companion buffer that contains DMA commands. You can insert a DMA command that says to memcpy some data from CPU-RAM to GPU-RAM, but before it, insert a command that says "wait until the word at address blah changes from a 0 to a 1". We can also put an event at the end of our drawing command buffer like in the first example, which lets the DMA queue know that it's time to proceed. This can be an amazing solution as it has zero impact on the CPU or GPU!
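Sketched out, with the same caveat that these are made-up names:

```cpp
// Pseudocode -- two command buffers on two queues, synchronised through one shared word.
*flag = 0;

// Graphics queue: draw the frame, then flip the flag once the old data is no longer needed.
RecordDraws(gfxCmdList);
InsertEventCommand(gfxCmdList, flag, 1);

// DMA queue: wait for the flag, then do the copy -- no CPU time, no shader cores involved.
InsertWaitCommand(dmaCmdList, flag, 1);                 // "wait until this word becomes 1"
InsertCopyCommand(dmaCmdList, dstGpuAlloc, srcCpuData, newTextureSize);

Submit(gfxQueue, gfxCmdList);
Submit(dmaQueue, dmaCmdList);   // both queues run independently, so the copy overlaps other work
```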
Instead of D3D just doing all this magically internally, if you want to use an excessive amount of memory in your app (more than the GPU can handle), then in D3D12 it's going to be up to you to implement this crap...
If you don't want to use an excessive amount of RAM, the only thing you'll have to do is generate, while you're submitting draw-calls into your command buffer, a list of the resources that are going to be used by that command buffer, so you can inform Windows which bits of GPU-RAM are potentially going to be read/written by it.
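i.e. something roughly like this while recording (the exact API shape is hypothetical, but this is the gist of the bookkeeping):

```cpp
// Hypothetical bookkeeping -- collect every allocation the command buffer can touch.
std::vector<GpuAllocation*> residencySet;

void DrawMesh(CommandList& cmd, Mesh& mesh)
{
    residencySet.push_back(&mesh.vertexAllocation);
    residencySet.push_back(&mesh.textureAllocation);
    RecordDraw(cmd, mesh);
}

// At submit time, hand the list over so the OS can make sure those ranges are resident first:
DeclareResidency(residencySet.data(), residencySet.size());
Submit(queue, cmd);
```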