So in essence, a game rated as requiring a minimum of 512MB of VRAM (does DX have memory requirements?) never uses more than that for any single frame/scene?
You would think that AAA games that require tens of gigabytes of disk space would at some point use more memory in a scene than what is available on the GPU. Is this just artist trickery to keep every scene below the rated GPU memory?
Spare a thought for the PS3 devs, with tens of GBs of disc space and just 256MB of GPU RAM. I'm always completely blown away when I see stuff like GTA5 running on that era of consoles!
On PC, if you have 512MB of VRAM, then yes, ideally you should never use more than that in a frame. In fact, ideally you should never use more than that, ever!
If you've got 1024MB in resources, but on frame #1 you use the first 50% of it, and on frame #2 you use the second 50%, it's still going to really hurt performance -- in between those two frames, you're asking D3D to memcpy half a gig out of VRAM, and then another half a gig into VRAM. That's a lot of memcpy'ing!
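Just to put numbers on that, here's a plain C++ back-of-the-envelope sketch -- the function is my own illustration of the accounting, not how a real driver tracks residency:

```cpp
#include <cstdint>

// Hypothetical sketch: how many bytes the runtime has to shuffle between
// two frames when their combined working sets over-commit VRAM.
constexpr std::uint64_t MB = 1024ull * 1024ull;

std::uint64_t interFrameTraffic(std::uint64_t vramBytes,
                                std::uint64_t frameAWorkingSet,
                                std::uint64_t frameBWorkingSet) {
    // If both working sets fit in VRAM at once, nothing needs to move.
    if (frameAWorkingSet + frameBWorkingSet <= vramBytes)
        return 0;
    // Otherwise the overflow has to be copied out of VRAM (eviction)
    // and the newly-needed data copied back in (upload).
    std::uint64_t overflow = frameAWorkingSet + frameBWorkingSet - vramBytes;
    return 2 * overflow; // one copy out, one copy in
}
```

With 512MB of VRAM and two frames each touching a disjoint 512MB half, that's a full gigabyte of copying between every pair of frames -- which is why over-committing hurts so much.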
Game consoles don't do this kind of stuff for you automatically (they're a lot more like D3D12 already!), so big AAA games made for consoles are going to be designed to deal with harsh memory limits themselves. e.g. on a console that has 256MB of VRAM, the game will crash as soon as you try to allocate the 257th MB of it. There are no friendly PC runtimes/drivers that are going to pretend that everything's OK and start doing fancy stuff behind the scenes for you.
The tricky part in doing a PC version is that you've got a wide range of resource budgets. On the PS3 version, you can just say "OK, we crash after 256MB of allocations, deal with it", and do the best you can while fitting into that budget. On PC, you need to do the same, but also make it able to utilize 512MB, or 700MB, or 1GB, etc... The other hard part is that on PC, it's almost impossible to know how much memory any resource actually takes up, or how much VRAM is actually available to you... Most people probably just make guesses based on the knowledge they have from their console versions.
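As a sketch of that "scale the fixed console budget up to whatever the PC has" idea -- the tier names and thresholds here are made up for illustration, not from any real engine:

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: pick a texture-quality tier from a *guessed* VRAM
// budget, the way a PC port might scale its console-sized asset budget.
std::string pickTextureTier(std::uint64_t guessedVramMB) {
    if (guessedVramMB >= 1024) return "high";   // 1GB+ cards
    if (guessedVramMB >= 512)  return "medium"; // 512MB-class cards
    return "low";                               // console-like 256MB budget
}
```

The "guessed" part is the point: since you can't reliably query real VRAM usage, the thresholds end up being tuned by hand against whatever hardware you tested on.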
Would that method also have to be used when -- for example -- all of that larger-than-video-memory resource is being accessed by the GPU in the same, single shader invocation? Or would that shader invocation (somehow) have to be broken up into the subsequently generated command lists? Does that mean that the DirectX pipeline is also virtualized on the CPU?
I don't know if it's possible to support that particular situation? Can you bind 10GB of resources to a single draw/dispatch command at the moment?
I don't think the application will be allowed to use that DMA-based synchronisation method (or trick?) that you explained.
D3D12 damn well better expose the DMA command queues. nVidia are starting to expose them in GL, and a big feature of modern hardware is that it can consume many command queues at once, rather than a single one as with old hardware.
Wait. That's how tiled resources already work
Tiled resources tie in with the virtual address space stuff. Say you've got a texture that exists in an allocation from pointer 0x10000 to 0x90000 (a 512KB range) -- you can think of this allocation being made up of 8 individual 64KB pages.
Tiled resources are a fancy way of saying that the entire range of this allocation doesn't necessarily need to be 'mapped' -- i.e. it doesn't all have to actually translate to a physical allocation.
It's possible that 0x10000 - 0x20000 is actually backed by physical memory, while 0x20000 - 0x90000 isn't -- those aren't valid pointers (much like a null pointer), and they don't correspond to any physical location.
This isn't actually new stuff -- at the OS level, allocating a range of the virtual address space (allocating yourself a new pointer value) is actually a separate operation to allocating some physical memory, and then creating a link between the two. The new part that makes this extremely useful is a new bit of shader hardware -- when a shader tries to sample a texel from this texture, it now gets an additional return value indicating whether the texture-fetch actually succeeded or not (i.e. whether the resource pointer was actually valid or not). With older hardware, fetching from an invalid resource pointer would just crash (like invalid pointers do on the CPU), but now we get error flags.
This means you can create absolutely huge resources, but then, at the granularity of 64KB pages, determine whether those pages are actually physically allocated or not. You can use this so that the application can appear simple and just use huge textures, while the engine/driver/whatever intelligently allocates/deallocates parts of those textures as required.
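Here's a rough CPU-side analogy of that page-level residency in plain C++ -- all the names are hypothetical, and real tiled resources do this in the GPU's page tables rather than in a std::vector, but the shape of the idea is the same: a big reserved range, 64KB pages that may or may not be backed, and fetches that report success/failure instead of crashing:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch of a tiled resource: the virtual range is divided
// into 64KB pages, and each page may or may not be physically backed.
constexpr std::size_t kPageSize = 64 * 1024;

struct TiledResource {
    // An empty inner vector means "page reserved but not resident".
    std::vector<std::vector<std::uint8_t>> pages;

    explicit TiledResource(std::size_t sizeBytes)
        : pages((sizeBytes + kPageSize - 1) / kPageSize) {}

    // Back one 64KB page with "physical" memory.
    void mapPage(std::size_t pageIndex) {
        pages[pageIndex].assign(kPageSize, 0);
    }

    // Release the backing; the address range itself stays reserved.
    void unmapPage(std::size_t pageIndex) {
        pages[pageIndex].clear();
        pages[pageIndex].shrink_to_fit();
    }

    // Like the shader-side fetch: returns the value *plus* a
    // did-it-succeed flag, rather than crashing on a non-resident page.
    std::optional<std::uint8_t> fetch(std::size_t offset) const {
        const auto& page = pages[offset / kPageSize];
        if (page.empty()) return std::nullopt; // residency check failed
        return page[offset % kPageSize];
    }
};
```

So a 512KB resource reserves 8 pages up front, but you only pay for the ones you map -- and a fetch into an unmapped page comes back as a failure flag the caller can react to (e.g. by falling back to a lower mip).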