DX12 - Documentation / Tutorials?

Started by
33 comments, last by Alessio1989 9 years, 6 months ago

It's always been the case that you shouldn't use more memory than the GPU actually has, because doing so results in terrible performance. So, assuming that you've always followed this advice, you won't have to do much work in the future.

So in essence, a game rated to have minimum 512MB VRAM (does DX have memory requirements?) never uses more than that for any single frame/scene?

You would think that AAA games that require tens of gigabytes of disk space would at some point use more memory in a scene than what is available on the GPU. Is this just artist trickery to keep any scene below rated GPU memory?


Spare a thought for the PS3 devs with tens of GBs of disc space, and then just 256MB of GPU RAM. I'm always completely blown away when I see stuff like GTA5 running on that era of consoles!

Ideally on PC, if you have 512MB of VRAM, yes, you should never use more than that in a frame. Ideally you should never use more than that, ever!
If you've got 1024MB in resources, but on frame #1 you use the first 50% of it, and on frame #2 you use the second 50%, it's still going to really hurt performance -- in between those two frames, you're asking D3D to memcpy half a gig out of VRAM, and then another half a gig into VRAM. That's a lot of memcpy'ing!

Game consoles don't do this kind of stuff for you automatically (they're a lot more like D3D12 already!), so big AAA games made for consoles are going to be designed to deal with harsh memory limits themselves. e.g. on a console that has 256MB of VRAM, the game will crash as soon as you try to allocate the 257th MB of it. There are no friendly PC runtimes/drivers that are going to pretend that everything's ok and start doing fancy stuff behind the scenes for you.

The tricky part in doing a PC version is that you've got a wide range of resource budgets. On the PS3 version, you can just say "OK, we crash after 256MB of allocations, deal with it", and do the best you can while fitting into that budget. On PC, you need to do the same, but also make it able to utilize 512MB, or 700MB, or 1GB, etc... The other hard part is that on PC, it's almost impossible to know how much memory any resource actually takes up, or how much VRAM is actually available to you... Most people probably just make guesses based on the knowledge they have from their console versions.

That method would also have to be used -- for example -- when all of a larger-than-video-memory resource is being accessed by the GPU in the same, single shader invocation? Or would that shader invocation (somehow) have to be broken up across the subsequently generated command lists? Does that mean that the DirectX pipeline is also virtualized on the CPU?

I don't know if it's possible to support that particular situation? Can you bind 10GB of resources to a single draw/dispatch command at the moment?

I don't think the application will be allowed to use that DMA-based synchronisation method (or trick?) that you explained.

D3D12 damn well better expose the DMA command queues. nVidia are starting to expose them in GL, and a big feature of modern hardware is that it can consume many command queues at once, rather than a single one as with old hardware.

Wait. That's how tiled resources already work.

Tiled resources tie in with the virtual address space stuff. Say you've got a texture that exists in an allocation from pointer 0x10000 to 0x90000 (a 512KB range) -- you can think of this allocation being made up of 8 individual 64KB pages.
Tiled resources are a fancy way of saying that the entire range of this allocation doesn't necessarily need to be 'mapped', i.e. doesn't have to actually translate to a physical allocation.
It's possible that 0x10000 - 0x20000 is actually backed by physical memory, but 0x20000 - 0x90000 aren't actually valid pointers (much like a null pointer), and they don't correspond to any physical location.
This isn't actually new stuff -- at the OS level, allocating a range of the virtual address space (allocating yourself a new pointer value) is actually a separate operation to allocating some physical memory, and then creating a link between the two. The new part that makes this extremely useful is a new bit of shader hardware -- when a shader tries to sample a texel from this texture, it now gets an additional return value indicating whether the texture fetch actually succeeded or not (i.e. whether the resource pointer was actually valid or not). With older hardware, fetching from an invalid resource pointer would just crash (like they do on the CPU), but now we get error flags.

This means you can create absolutely huge resources, but then determine, at the granularity of 64KB pages, whether those pages are actually physically allocated or not. You can use this so that the application can appear simple and just use huge textures, but then the engine/driver/whatever can intelligently allocate/deallocate parts of those textures as required.


So what you are saying is that we CAN have 2GB+ of game resources allocated in GPU virtual address space just fine, but only when we need them do we tell the driver to actually page things into VRAM?

Assume now that a modern computer has at least 16GB of system memory, and a game has 8GB of resources while the GPU has 2GB of VRAM. In this situation a DX12 game would just copy all game data from disk to system memory (8GB), then create 8GB of resources in the GPU's virtual address space, even though the physical limit is 2GB. Command queues would then tell the driver which parts of those 8GB to page in and out? But is that not just what the managed pool does anyway?

It's pretty much VirtualAlloc & VirtualFree for the GPU.

You still have to manually manage the memory yourself, flagging pages as appropriate and loading/inflating data from disk/RAM to VRAM on need.

Virtual Textures were available in hardware on 3DLabs cards 10 years ago, long before "Mega Textures" ever existed...

Doing it manually allows you to predict what you'll need and keep things compressed in RAM. That's not the case for Managed, which needs you to upload everything in final format and lets the system page in/out on use (which is too late to avoid a performance hit).

-* So many things to do, so little time to spend. *-
But is that not just what managed pool does anyway?

The managed pool needs to swap in and out entire textures. Say you've got a 2048x2048 texture but your draw call is only going to reference a small portion of it. With the managed pool the entire texture needs to be swapped in for this to happen. With proper virtualization of textures, only the small portion that is being used (in practice probably a little bigger, on the order of a 64K tile) will get swapped in. That's an efficiency win straight away.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

But if you never use more than the physically available VRAM in a scene, why would you ever need to page data in and out during that scene? Is it not only when new things come into the scene and old stuff leaves that we have to move data in and out of GPU memory? Does this memory management stuff really increase framerate? Or is it just loading times that get better?

Also, what data is actually "paged out"? Unless a compute shader makes some data for the CPU, what is there to "copy out"?

Ok, let's restart the conversation about the new rendering features of Direct3D 11.3 and 12.0, some details here: http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-01-20-72/8463.Direct3D-12-_2D00_-New-Rendering-Features.pptx

3D tiled resources!

"Recursion is the first step towards madness." - "Skegg?ld, Skálm?ld, Skildir ro Klofnir!"
Direct3D 12 quick reference: https://github.com/alessiot89/D3D12QuickRef/

Conservative rasterization should allow for some film-quality AA solutions to be developed. It would be possible to store, per pixel, a list of every triangle that in any way intersects that pixel's bounds. You could then resolve that list in post, sorting and clipping the triangles against each other to get the actual area covered by each primitive.

Sure, although you'd have to sort the list of triangles by depth in order to get the correct result. You'd also have to forego standard z buffering.

I know this isn't really contributing to the conversation much, but did you see that last picture of the Sponza Atrium in the PowerPoint Alessio posted? My eyes almost popped out in awe.

The difference in memory consumption between tiled textures and regular textures is really sweet. The tiled textures in that scene take less than a tenth of the memory of regular textures? Or am I reading that wrong?

This topic is closed to new replies.
