Tispe

DX12 - Documentation / Tutorials?

Recommended Posts

Spazzarama    1643

Haven't heard of any releases for DX12 yet; the hints are of an early access preview later this year. You can apply for the early access from a link down the page here. That same page gives a good overview of what is changing - I think the main thing is a lower level of hardware abstraction. It looks like it builds upon the approach used in DX11 (e.g. it still supports command lists and so on), so I think you will be safe to start with DX 11.1 or 11.2.

Edited by spazzarama

tonemgub    2008

Closest thing to documentation right now: http://channel9.msdn.com/Events/Build/2014/3-564

(Rant: And knowing Microsoft, that's probably the only "documentation" we will ever have for a long time. With DX 10 & 11 I got the feeling they either didn't finish the documentation, or they did finish it but didn't provide it for free. I hear some guy at Futuremark (3DMark) already had access to the API to write a DX12 demo - I fear that they, like other companies, are probably already working directly with MS to figure this stuff out, and we average Joes will again be left in the dark.)

Edited by tonemgub

SeanMiddleditch    17565

what kind of documentation are you searching for?


Sorry, should've been more specific. I'm referring to documentation on the binary format to allow you to produce/consume compiled shaders like you can with SM1-3, without having to pass through Microsoft DLLs or HLSL. Consider projects like MojoShader that could make use of this functionality to decompile SM4/5 code to GLSL when porting software, or a possible Linux D3D11 driver that would need to be able to compile compiled SM4/5 code into Gallium IR and eventually GPU machine code.

There's also no way with SM4/5 to write assembly and compile it, which is a pain for various tools that don't want to work through HLSL or the HLSL compiler.

ankhd    2304

So, is DirectX 12 going to be like DX10, where they get you started and then just drop it in no time and replace it with 11? WTF. Is it going to be like that?

Alessio1989    4634

 

 

D3D12 will be the same, except it will perform much better (D3D11 deferred contexts do not actually provide good performance increases in practice... or that is the excuse of AMD and Intel, which do not support driver command lists).

Fixed.

 

AMD supports them in Mantle and multiple game console APIs. It's a back-end D3D (Microsoft code) issue, forcing a single thread in the kernel-mode driver to be responsible for kickoff. The D3D12 presentations have pointed out this flaw themselves.

 

 
I know that D3D11 command lists are far from perfect, but AMD was the first IHV to sell DX11 GPUs (the Radeon HD 5000 series), claiming "multi-threading support" as one of the big features of their graphics cards.

Here is what AMD proclaims:

http://www.amd.com/en-us/products/graphics/desktop/5000/5970
 

  • Full DirectX® 11 support
    • Shader Model 5.0
    • DirectCompute 11
    • Programmable hardware tessellation unit
    • Accelerated multi-threading
    • HDR texture compression
    • Order-independent transparency

 

They also claimed the same thing with DX 11.1 GPUs when WDDM 1.2 drivers came out. 

 

Yes, their driver is itself "multi-threaded" (I remember that a few years ago it scaled well on two cores, with half the CPU driver overhead), and you can always use deferred contexts in different "app threads" (since they are emulated by the D3D runtime - more CPU overhead, yeah!), but that's not the same thing.
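For reference, here is a minimal C++ sketch of the API surface being discussed: checking D3D11_FEATURE_THREADING to see whether the driver supports command lists natively, and recording work on a deferred context. The device and immediateContext variables are assumed to already exist, and error handling is omitted.

```cpp
// Sketch only: 'device' and 'immediateContext' are assumed to exist;
// error handling omitted.
#include <d3d11.h>

void RecordAndExecute(ID3D11Device* device, ID3D11DeviceContext* immediateContext)
{
    // Ask the driver whether command lists are supported natively or
    // emulated by the D3D11 runtime (the case complained about above).
    D3D11_FEATURE_DATA_THREADING threading = {};
    device->CheckFeatureSupport(D3D11_FEATURE_THREADING,
                                &threading, sizeof(threading));
    // threading.DriverCommandLists == FALSE means runtime emulation.

    // Record commands on a deferred context (typically on a worker thread).
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);
    // ... issue state changes and draw calls on 'deferred' here ...

    ID3D11CommandList* commandList = nullptr;
    deferred->FinishCommandList(FALSE, &commandList);

    // Playback happens on the immediate context (main thread).
    immediateContext->ExecuteCommandList(commandList, FALSE);

    commandList->Release();
    deferred->Release();
}
```

On drivers that report DriverCommandLists = FALSE, the runtime does the recording and playback itself, which is exactly the extra CPU overhead being complained about here.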

 

The Graphics Mafia... ehm, NVIDIA, supports driver command lists, and where they are used correctly they work just fine (a big example: Civilization V). Yes, they also "cheat" consumers on feature level 11.1 support (just as AMD "cheated" consumers... and developers! on tier-2 tiled resources support), and they really like to break compatibility with old applications and games (especially old OpenGL games), but those are other stories.

Edited by Alessio1989

mhagain    13430

 

what kind of documentation are you searching for?


Sorry, should've been more specific. I'm referring to documentation on the binary format to allow you to produce/consume compiled shaders like you can with SM1-3, without having to pass through Microsoft DLLs or HLSL. Consider projects like MojoShader that could make use of this functionality to decompile SM4/5 code to GLSL when porting software, or a possible Linux D3D11 driver that would need to be able to compile compiled SM4/5 code into Gallium IR and eventually GPU machine code.

There's also no way with SM4/5 to write assembly and compile it, which is a pain for various tools that don't want to work through HLSL or the HLSL compiler.

 

 

I'm not sure what the actual problem you have here is.  It's an ID3DBlob.

 

If you want to load a precompiled shader, it's as simple as (and I'll even do it in C, just to prove the point) fopen, fread and a bunch of ftell calls to get the file size. Similarly, to save one it's fopen and fwrite.
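Something along these lines - a minimal sketch, where the "pixel.cso"-style file name, the choice of a pixel shader, and the device variable are placeholder assumptions:

```cpp
// Minimal sketch: load a precompiled shader blob from disk and create the
// shader object from it. 'device' and the file path are assumptions here.
#include <stdio.h>
#include <stdlib.h>
#include <d3d11.h>

ID3D11PixelShader* LoadPixelShader(ID3D11Device* device, const char* path)
{
    FILE* f = fopen(path, "rb");
    if (!f) return NULL;

    fseek(f, 0, SEEK_END);          // find the file size...
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);          // ...then rewind

    void* bytecode = malloc(size);  // this is all an ID3DBlob really holds
    fread(bytecode, 1, size, f);
    fclose(f);

    ID3D11PixelShader* shader = NULL;
    device->CreatePixelShader(bytecode, (SIZE_T)size, NULL, &shader);

    free(bytecode);
    return shader;
}
```

Saving one is the mirror image: fopen with "wb" and a single fwrite of the blob's GetBufferPointer() / GetBufferSize().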

 

Unless you're looking for something else that Microsoft actually has no obligation whatsoever to give you, that is...

Tispe    1468

The DX12 overview indicates that the "unlimited memory" the managed pool offers will be replaceable with custom memory management.

 

Say your typical low-end graphics card has 512MB - 1GB of memory. Is it realistic to say that the total data required to draw a complete frame is 2GB? Would that mean that the GPU memory would have to be refreshed 2-5+ times every frame?

 

Do I need to start batching based on buffer sizes? 

tonemgub    2008

Is it realistic to say that the total data required to draw a complete frame is 2GB

Unless you have another idea, this is completely unrealistic. The amount of data required for one frame should be somewhere in the order of megabytes...  And DX11.2 minimizes the memory requirement with "tiled resources".

 

 


would that mean that the GPU memory would have to be refreshed 2-5+ times every frame?

This is not the case, but even if it were... The article is not very clear on this. It says that the driver will tell the operating system to copy resources into GPU memory (from system memory) as required, but only the application can free those resources, once all of the queued commands using them have been processed by the GPU.

It's not clear whether resources can also be released (from GPU memory, by the OS) during the processing of already-queued commands, to make room for the next 512MB (or 1GB, or whatever size) of your 2GB of data. My guess is that this is not possible. It would imply that the application's "swap resource" request could somehow be plugged into the driver/GPU's queue of commands, to release unused resource memory mid-frame, which is probably not possible, since (also according to the article) the application has to wait for all of the queued commands in a frame to be executed before it knows which resources are no longer needed.

Also, "the game already knows that a sequence of rendering commands refers to a set of resources" - this implies that the application (not even the OS) can only change resource residency in between frames (sequences of rendering commands), not during a single frame. And DX12 is only a driver/application-side improvement over DX11; adding memory management capabilities to the GPU itself would also require a hardware-side redesign.

 

 


Do I need to start batching based on buffer sizes?

If you think that you'll need to use 2GB (or more than the recommended/available resource limits) of data per frame, then yes. Otherwise, no.

Edited by tonemgub

tonemgub    2008

Thanks, Hodgman! Really good explanation!

 

 

 


Quote

Also, DX12 is only a driver/application-side improvement over DX11. Adding memory management capabilities to the GPU itself would also require a hardware-side redesign.

This kind of memory management is already required in order to implement the existing D3D runtime - pretending that the managed pool can be of unlimited size requires that the runtime can submit partial command buffers and page resources in and out of GPU-RAM during a frame.

What I meant to point out by that (and this was the main conclusion of my train of thought) was that the CPU is still the one doing the heavy lifting when it comes to memory management. But now that I think about it, I guess it makes no difference - the main bottleneck is having to do an extra "memcpy" when there's not enough video memory.

 

As for your explanation of how DMA could be used to make this work: would that method also have to be used when, for example, all of that larger-than-video-memory resource is accessed by the GPU in the same, single shader invocation? Or would that shader invocation (somehow) have to be broken up into the subsequently generated command lists? Does that mean that the DirectX pipeline is also virtualized on the CPU?

 

Anyway, I think the main question that must be answered here is whether the resource limits imposed by DX11 will go away in DX12. Yes, theoretically (and perhaps even practically) the CPU and GPU could be programmed to work together to provide virtually unlimited memory, but will this really be the case with DX12? From what I can tell, that article implies that the "unlimited memory" has to be implemented as "custom memory management" done by the application - not the runtime, nor the driver or GPU. This probably also means that it will be the application's job to split the processing of the large data/resources into multiple command lists, and I don't think the application will be allowed to use that DMA-based synchronisation method (or trick?) that you explained.

Edit: Wait. That's how tiled resources already work. Never mind... :)

Edited by tonemgub

Tispe    1468

It's always been the case that you shouldn't use more memory than the GPU actually has, because it results in terrible performance. So, assuming that you've always followed this advice, you don't have to do much work in the future 

 

So in essence, a game rated to need a minimum of 512MB VRAM (does DX have memory requirements?) never uses more than that for any single frame/scene?

 

You would think that AAA games that require tens of gigabytes of disk space would at some point use more memory in a scene than what is available on the GPU. Is this just artist trickery to keep every scene below the rated GPU memory?

Edited by Tispe

Tispe    1468


Tiled resources tie in with the virtual address space stuff. Say you've got a texture that exists in an allocation from pointer 0x10000 to 0x90000 (a 512KB range) -- you can think of this allocation as being made up of 8 individual 64KB pages.
Tiled resources are a fancy way of saying that the entire range of this allocation doesn't necessarily need to be 'mapped' / doesn't have to actually translate to a physical allocation.
It's possible that 0x10000 - 0x20000 is actually backed by physical memory, but 0x20000 - 0x90000 aren't actually valid pointers (much like a null pointer), and they don't correspond to any physical location.
This isn't actually new stuff -- at the OS level, allocating a range of the virtual address space (allocating yourself a new pointer value) is actually a separate operation from allocating some physical memory and then creating a link between the two. The new part that makes this extremely useful is a new bit of shader hardware -- when a shader tries to sample a texel from this texture, it now gets an additional return value indicating whether the texture fetch actually succeeded or not (i.e. whether the resource pointer was actually valid or not). With older hardware, fetching from an invalid resource pointer would just crash (like it does on the CPU), but now we get error flags.

This means you can create absolutely huge resources, but then, at the granularity of 64KB pages, you can determine whether those pages are actually physically allocated or not. You can use this so that the application can appear simple and just use huge textures, while the engine/driver/whatever intelligently allocates/deallocates parts of those textures as required.

 

So what you are saying is that we CAN have 2GB+ of game resources allocated in GPU VRAM using virtual addresses just fine, but only when we need them do we tell the driver to actually page things into VRAM?

 

Assume now that a modern computer has at least 16GB of system memory, a game has 8GB of resources, and the GPU has 2GB of VRAM. In this situation a DX12 game would just copy all game data from disk to system memory (8GB), then allocate those 8GB against the VRAM and create 8GB of resources, even though the physical limit is 2GB. Command queues would then tell the driver which parts of those 8GB to page in and out? But is that not just what the managed pool does anyway?

Ingenu    1629

It's pretty much VirtualAlloc & VirtualFree for the GPU.

You still have to manage the memory manually yourself, flagging pages as appropriate and loading/inflating data from disk/RAM to VRAM as needed.
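For readers less familiar with the CPU-side analogy, here is a tiny Win32 sketch of the reserve-then-commit pattern being alluded to (illustrative only; the sizes are arbitrary assumptions):

```cpp
// Sketch of the CPU-side analogy: reserve a large virtual range up front,
// then commit (back with physical memory) only the pages you actually use.
#include <windows.h>

int main()
{
    const SIZE_T totalSize = 512 * 1024 * 1024;   // 512MB of address space
    const SIZE_T chunkSize = 64 * 1024;           // commit in 64KB chunks

    // Reserve address space only -- no physical memory is consumed yet.
    char* base = (char*)VirtualAlloc(NULL, totalSize,
                                     MEM_RESERVE, PAGE_NOACCESS);

    // Commit just the first 64KB; the rest stays unmapped, like the
    // never-referenced tiles of a tiled resource.
    VirtualAlloc(base, chunkSize, MEM_COMMIT, PAGE_READWRITE);
    base[0] = 42;   // touching committed memory is fine

    // Decommit the chunk when done, then release the whole reservation.
    VirtualFree(base, chunkSize, MEM_DECOMMIT);
    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}
```

The tiled/reserved resource APIs give you the same reserve-versus-commit split for GPU memory, just at 64KB tile granularity.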

 

Virtual Textures were available in hardware on 3DLabs cards 10 years ago, long before "Mega Textures" ever existed...

 

Doing it manually allows you to predict what you'll need and keep things compressed in RAM; that's not the case for the managed pool, which requires you to upload everything in final format and lets the system page in/out on use [which is too late to avoid a performance hit].

Edited by Ingenu

mhagain    13430
But is that not just what the managed pool does anyway?

 

The managed pool needs to swap in and out entire textures. Say you've got a 2048x2048 texture but your draw call is only going to reference a small portion of it. With the managed pool, the entire texture needs to be swapped in for this to happen. With proper virtualization of textures, only the small portion that is being used (in practice it will probably be a little bigger, on the order of a 64KB tile) gets swapped in. That's an efficiency win straight away.
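To make that concrete, here is a rough D3D11.2-style sketch of backing just one 64KB tile of a large tiled texture; device2, context2 and tiledTexture are placeholder assumptions (their creation and all error handling are omitted):

```cpp
// Sketch only: map a single 64KB tile of a tiled texture to physical
// memory from a tile pool (D3D11.2 tiled resources).
#include <d3d11_2.h>

ID3D11Buffer* MapOneTile(ID3D11Device2* device2,
                         ID3D11DeviceContext2* context2,
                         ID3D11Texture2D* tiledTexture) // created with D3D11_RESOURCE_MISC_TILED
{
    // The tile pool is the physical backing store, carved into 64KB tiles.
    D3D11_BUFFER_DESC poolDesc = {};
    poolDesc.ByteWidth = 16 * 64 * 1024;                // room for 16 tiles
    poolDesc.Usage     = D3D11_USAGE_DEFAULT;
    poolDesc.MiscFlags = D3D11_RESOURCE_MISC_TILE_POOL;
    ID3D11Buffer* tilePool = nullptr;
    device2->CreateBuffer(&poolDesc, nullptr, &tilePool);

    // Map tile (0,0) of subresource 0 to the first tile of the pool;
    // every other tile of the texture stays unmapped.
    D3D11_TILED_RESOURCE_COORDINATE coord = {};  // X = Y = Z = Subresource = 0
    D3D11_TILE_REGION_SIZE region = {};
    region.NumTiles = 1;

    UINT rangeFlags      = 0;  // a normal mapping (not NULL / skip / reuse)
    UINT poolStartOffset = 0;  // first tile in the pool
    UINT rangeTileCount  = 1;

    context2->UpdateTileMappings(tiledTexture,
                                 1, &coord, &region,    // one region of the resource
                                 tilePool,
                                 1, &rangeFlags, &poolStartOffset, &rangeTileCount,
                                 0);

    return tilePool;  // keep it alive for as long as the mapping is in use
}
```

Only the tiles you actually map consume physical memory; on the HLSL side, sampling an unmapped tile reports failure through the status value (checked with CheckAccessFullyMapped) rather than crashing, which is the error-flag behaviour described in the quoted explanation above.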


