Where does a CommandAllocator store its data: VRAM or RAM?

2 comments, last by Hodgman 5 years, 7 months ago

MSDN says a CommandAllocator is used to store GPU commands, but it doesn't say where they're stored: in the graphics card's VRAM or in system RAM.

If it's stored on the CPU side, is it copied to the graphics card every time we call execute? Or is it copied to the GPU as we add each command to the list, and then executed afterwards?

I think storing them on the GPU side would be better, because that way we could cache commands instead of recreating them every frame.


As far as I know this isn't really exposed to you at the API level; it's a detail of the driver and the Windows driver model that it plugs into. Before Win10/WDDM 2.0 there was a distinction on the driver side between a "command buffer" and a "DMA buffer": the command buffer was generated by the user-mode driver and placed in normal paged memory, while the DMA buffer was handled by the kernel-mode driver and could be made directly accessible to the GPU. That distinction seems to be gone in WDDM 2.0 for GPUs whose drivers support GPU virtual addressing: the user-mode driver can directly generate a GPU-accessible command buffer and then submit it to the scheduler. I would assume that pretty much all D3D12 drivers are going down that path, in which case the commands are probably being stored directly in GPU-accessible memory. However, the D3D12 API doesn't preclude patching occurring when you call ExecuteCommandLists, so it's entirely possible that a driver might want to do a patch followed by a copy.

Main RAM can be configured to be directly readable by the GPU, with no extra copies required. PCIe speeds are pretty fast these days: around 8 GB/s to 63 GB/s, or around 100 MB to 1 GB per frame. Command buffers are typically quite small, are read linearly, and are tolerant of long fetch latency (draw setup happens asynchronously from draw shading), so they work perfectly fine from system RAM. The benefit of actually copying them to VRAM would be very small.

As an extreme example, you can even have a 100MB vertex buffer living in system RAM, being fetched on demand by the vertex shader over the PCIe bus, and still have it perform fine on modern systems :)

