Tispe

DX12 - Documentation / Tutorials?

Recommended Posts

Haven't heard of any releases for DX12 yet; hints point to an early access preview later this year. You can apply for the early access program via a link down the page here. That same page gives a good overview of what is changing - I think the main thing is a lower level of hardware abstraction. It looks like it builds upon the approach used in DX11 (e.g. it still supports command lists and so on), so I think you'll be safe starting with DX 11.1 or 11.2.

Edited by spazzarama


Closest thing to documentation right now: http://channel9.msdn.com/Events/Build/2014/3-564

(Rant: And knowing Microsoft, that's probably the only "documentation" we will have for a long time. With DX 10 & 11 I got the feeling they either didn't finish the documentation, or they did finish it but didn't provide it for free. I hear some guy at Futuremark (3DMark) already had access to the API to write a DX12 demo - I fear they, like other companies, are probably already working directly with MS to figure things out, and we average Joes will again be left in the dark.)

Edited by tonemgub


what kind of documentation are you searching for?


Sorry, should've been more specific. I'm referring to documentation on the binary format, to allow you to produce/consume compiled shaders like you can with SM1-3 without having to pass through Microsoft DLLs or HLSL. Consider projects like MojoShader, which could make use of this functionality to decompile SM4/5 code to GLSL when porting software, or a possible Linux D3D11 driver that would need to be able to compile SM4/5 bytecode into Gallium IR and eventually GPU machine code.

There's also no way with SM4/5 to write assembly and compile it, which is a pain for various tools that don't want to work through HLSL or the HLSL compiler.


So is DirectX 12 going to be like DX10, where they get you started and then just drop it in no time and replace it with 11? WTF. Is it going to be like that?


 

 

D3D12 will be the same, except it will perform much better (D3D11 deferred contexts do not actually provide good performance increases in practice... or that is the excuse of AMD and Intel, which do not support driver command lists).

Fixed.

 

AMD supports them on Mantle and multiple game console APIs. It's a back-end D3D (Microsoft code) issue, forcing a single thread in the kernel-mode driver to be responsible for kickoff. The D3D12 presentations have pointed out this flaw themselves.

 

 
I know that D3D11 command lists are far from perfect, but AMD was the first IHV to sell DX11 GPUs (Radeon HD5000 series) claiming "multi-threading support" as one of the big features of their graphics cards.
 
Here is what AMD proclaims:
 
http://www.amd.com/en-us/products/graphics/desktop/5000/5970
 

  • Full DirectX® 11 support
    • Shader Model 5.0
    • DirectCompute 11
    • Programmable hardware tessellation unit
    • Accelerated multi-threading
    • HDR texture compression
    • Order-independent transparency

 

They also claimed the same thing with DX 11.1 GPUs when WDDM 1.2 drivers came out. 

 

Yes, their driver is itself "multi-threaded" (I remember a few years ago it scaled well on two cores with half the CPU driver overhead), and you can always use deferred contexts in different "app threads" (since they are emulated by the D3D runtime - more CPU overhead, yay!), but that's not the same thing.

 

Graphics Mafia... ehm, NVIDIA supports driver command lists, and where used correctly they work just fine (big example: Civilization V). Yes, they also "cheat" consumers on feature level 11.1 support (as AMD "cheated" consumers... and developers! on tier-2 tiled resources support), and they really like to break compatibility with old applications and games (especially old OpenGL games), but those are other stories.
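By the way, an application can ask the D3D11 runtime whether the driver actually supports command lists natively, or whether deferred contexts will be emulated. A minimal sketch (it assumes you already have an ID3D11Device; the function name is just illustrative):

```cpp
// Minimal sketch: query whether the driver supports native command lists,
// or whether deferred contexts / command lists are emulated by the runtime.
// Assumes 'device' is an already-created ID3D11Device.
#include <d3d11.h>
#include <cstdio>

void ReportCommandListSupport(ID3D11Device *device)
{
    D3D11_FEATURE_DATA_THREADING threading = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D11_FEATURE_THREADING,
                                              &threading, sizeof(threading))))
    {
        // FALSE here means FinishCommandList/ExecuteCommandList still work,
        // but the recording is done by the runtime, not by the driver.
        std::printf("Driver command lists:      %s\n",
                    threading.DriverCommandLists ? "yes" : "emulated by the runtime");
        std::printf("Driver concurrent creates: %s\n",
                    threading.DriverConcurrentCreates ? "yes" : "no");
    }
}
```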

Edited by Alessio1989


 

what kind of documentation are you searching for?


Sorry, should've been more specific. I'm referring to documentation on the binary format, to allow you to produce/consume compiled shaders like you can with SM1-3 without having to pass through Microsoft DLLs or HLSL. Consider projects like MojoShader, which could make use of this functionality to decompile SM4/5 code to GLSL when porting software, or a possible Linux D3D11 driver that would need to be able to compile SM4/5 bytecode into Gallium IR and eventually GPU machine code.

There's also no way with SM4/5 to write assembly and compile it, which is a pain for various tools that don't want to work through HLSL or the HLSL compiler.

 

 

I'm not sure what the actual problem you have here is.  It's an ID3DBlob.

 

If you want to load a precompiled shader, it's as simple as (and I'll even do it in C, just to prove the point) fopen, fread and a couple of fseek/ftell calls to get the file size. Similarly, to save one it's fopen and fwrite.
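Something like this (just a sketch - the function name and error handling are illustrative, and the returned buffer is what you'd hand to e.g. CreateVertexShader):

```cpp
// Plain C-style loader for a precompiled shader blob: the whole file *is* the
// bytecode. The caller owns (and must free) the returned buffer.
#include <stdio.h>
#include <stdlib.h>

unsigned char *LoadShaderBlob(const char *path, size_t *outSize)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);              /* file size == blob size */
    fseek(f, 0, SEEK_SET);

    unsigned char *data = (unsigned char *)malloc((size_t)size);
    if (data && fread(data, 1, (size_t)size, f) != (size_t)size)
    {
        free(data);                    /* short read: bail out */
        data = NULL;
    }
    fclose(f);

    if (data && outSize) *outSize = (size_t)size;
    return data;
}
```

Saving is the mirror image: fopen with "wb" and a single fwrite of the blob's buffer and size.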

 

Unless you're looking for something else that Microsoft actually have no obligation whatsoever to give you, that is.....


The DX12 overview indicates that the "unlimited memory" that the managed pool offers will be replaceable with custom memory management.

 

Say your typical low-end graphics card has 512MB - 1GB of memory. Is it realistic to say that the total data required to draw a complete frame is 2GB? Would that mean that the GPU memory would have to be refreshed 2-5+ times every frame?

 

Do I need to start batching based on buffer sizes? 


Is it realistic to say that the total data required to draw a complete frame is 2GB

Unless you have another idea, this is completely unrealistic. The amount of data required for one frame should be somewhere in the order of megabytes...  And DX11.2 minimizes the memory requirement with "tiled resources".

 

 


would that mean that the GPU memory would have to be refreshed 2-5+ times every frame?

This is not the case, but even if it was... The article is not very clear on this. It says that the driver will tell the operating system to copy resources into GPU memory (from system memory) as required, but only the application can free those resources, once all of the queued commands using them have been processed by the GPU.

It's not clear whether the resources can also be released (from GPU memory, by the OS) during the processing of already-queued commands, to make room for the next 512MB (or 1GB, or whatever size) of your 2GB of data - but my guess is that this is not possible. It would imply that the application's "swap resource" request could somehow be plugged into the driver/GPU's queue of commands, to release unused resource memory mid-frame, which is probably not possible, since (also according to the article) the application has to wait for all of the queued commands in a frame to be executed before it knows which resources are no longer needed.

Also, "the game already knows that a sequence of rendering commands refers to a set of resources" - this implies that the application (and not even the OS) can only change resource residency in between frames (sequences of rendering commands), not during a single frame.

Also, DX12 is only a driver/application-side improvement over DX11. Adding memory management capabilities to the GPU itself would also require a hardware-side redesign.

 

 


Do I need to start batching based on buffer sizes?

If you think that you'll need to use 2GB (or more than the recommended/available resource limits) of data per frame, then yes. Otherwise, no.

Edited by tonemgub


Thanks, Hodgman! Really good explanation!

 

 

 


Quote

Also, DX12 is only a driver/application-side improvement over DX11. Adding memory management capabilities to the GPU itself would also require a hardware-side redesign.

This kind of memory management is already required in order to implement the existing D3D runtime - pretending that the managed pool can be of unlimited size requires that the runtime can submit partial command buffers and page resources in and out of GPU-RAM during a frame.

What I meant to point out by that (and this was the main conclusion I reached with my train of thought) was that the CPU is still the one doing the heavy lifting when it comes to memory management. But now that I think about it, I guess it makes no difference - the main bottleneck is having to do an extra "memcpy" when there's not enough video memory.

 

As for your explanation of how DMA could be used to make this work: would that method also have to be used - for example - when all of that larger-than-video-memory resource is accessed by the GPU in the same, single shader invocation? Or would that shader invocation (somehow) have to be broken up across the subsequently generated command lists? Does that mean that the DirectX pipeline is also virtualized on the CPU?

 

Anyway, I think the main question that must be answered here is whether the resource limits imposed by DX11 will go away in DX12. Yes, theoretically (and perhaps even practically) the CPU & GPU could be programmed to work together to provide virtually unlimited memory, but will this really be the case with DX12? From what I can tell, that article implies that the "unlimited memory" has to be implemented as "custom memory management" done by the application - not the runtime, nor the driver or GPU. This probably also means that it will be the application's job to split the processing of the large data/resources into multiple command lists, and I don't think the application will be allowed to use that DMA-based synchronisation method (or trick? :) ) that you explained.

Edit: Wait. That's how tiled resources already work. Never mind... :)

Edited by tonemgub


It's always been the case that you shouldn't use more memory than the GPU actually has, because it results in terrible performance. So, assuming that you've always followed this advice, you don't have to do much work in the future 

 

So in essence, a game rated to have a minimum of 512MB VRAM (does DX have memory requirements?) never uses more than that for any single frame/scene?

 

You would think that AAA games requiring tens of gigabytes of disk space would at some point use more memory in a scene than what is available on the GPU. Is this just artist trickery to keep any scene below the rated GPU memory?

Edited by Tispe



Tiled resources tie in with the virtual address space stuff. Say you've got a texture that exists in an allocation from pointer 0x10000 to 0x90000 (a 512KB range) -- you can think of this allocation as being made up of 8 individual 64KB pages.
Tiled resources are a fancy way of saying that the entire range of this allocation doesn't necessarily need to be 'mapped' / doesn't actually have to translate to a physical allocation.
It's possible that 0x10000 - 0x20000 is actually backed by physical memory, but 0x20000 - 0x90000 aren't actually valid pointers (much like a null pointer), and they don't correspond to any physical location.
This isn't actually new stuff -- at the OS level, allocating a range of the virtual address space (allocating yourself a new pointer value) is actually a separate operation from allocating some physical memory and then creating a link between the two. The new part that makes this extremely useful is a new bit of shader hardware -- when a shader tries to sample a texel from this texture, it now gets an additional return value indicating whether the texture fetch actually succeeded or not (i.e. whether the resource pointer was actually valid or not). With older hardware, fetching from an invalid resource pointer would just crash (like it does on the CPU), but now we get error flags.
 
This means you can create absolutely huge resources, but then, at the granularity of 64KB pages, you can determine whether those pages are actually physically allocated or not. You can use this so that the application can appear simple and just use huge textures, while the engine/driver/whatever intelligently allocates/deallocates parts of those textures as required.
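To make the page arithmetic above concrete, here is a trivial sketch using the same illustrative numbers (nothing D3D-specific about it):

```cpp
// Trivial sketch of the page arithmetic: a 512KB allocation spanning
// 0x10000 - 0x90000 is made of eight 64KB pages, and any address in the
// range maps to exactly one of them. The values are purely illustrative.
#include <cstdio>

int main()
{
    const unsigned pageSize  = 0x10000;             // 64KB
    const unsigned allocBase = 0x10000;
    const unsigned allocEnd  = 0x90000;             // 8 * 64KB = 512KB

    unsigned addr = 0x27F00;                        // some pointer inside the range
    unsigned page = (addr - allocBase) / pageSize;  // which 64KB page backs it?

    std::printf("0x%X falls in page %u of %u\n",
                addr, page, (allocEnd - allocBase) / pageSize);   // "page 1 of 8"
    return 0;
}
```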

 

So what you are saying is that we CAN have 2GB+ of game resources allocated in GPU VRAM using virtual addresses just fine, but only when we need them do we tell the driver to actually page things into VRAM?

 

Assume now that a modern computer has at least 16GB of system memory, that a game has 8GB of resources, and that the GPU has 2GB of VRAM. In this situation a DX12 game would just copy all game data from disk to system memory (8GB), then allocate those 8GB in VRAM and create 8GB of resources, even though the physical limit is 2GB. Command queues would then tell the driver which parts of those 8GB to page in and out? But is that not just what the managed pool does anyway?


It's pretty much VirtualAlloc & VirtualFree for the GPU.

You still have to manually manage the memory yourself, flagging pages as appropriate and loading/inflating data from disk/RAM to VRAM as needed.
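A minimal CPU-side sketch of that reserve/commit split, using Win32 VirtualAlloc/VirtualFree (the sizes and offsets are purely illustrative):

```cpp
// Reserve a large virtual address range up front, commit (back with physical
// memory) only the 64KB chunks you actually need, then decommit/release.
#include <windows.h>

int main()
{
    const SIZE_T reserveSize = 256 * 1024 * 1024;   // 256MB of address space
    const SIZE_T chunkSize   = 64 * 1024;           // one 64KB "page"

    // Reserve: we own the pointer range, but no physical memory is used yet.
    char *base = (char *)VirtualAlloc(NULL, reserveSize, MEM_RESERVE, PAGE_NOACCESS);
    if (!base) return 1;

    // Commit a single 64KB chunk somewhere inside the reservation.
    char *chunk = (char *)VirtualAlloc(base + 8 * chunkSize, chunkSize,
                                       MEM_COMMIT, PAGE_READWRITE);
    if (chunk) chunk[0] = 42;                       // now backed by physical memory

    VirtualFree(chunk, chunkSize, MEM_DECOMMIT);    // drop the physical backing
    VirtualFree(base, 0, MEM_RELEASE);              // give back the address range
    return 0;
}
```

On the GPU side the "commit" step is also the point where you stream the actual texture data into the newly backed pages.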

 

Virtual Textures were available in hardware on 3DLabs cards 10 years ago, long before "Mega Textures" ever existed...

 

Doing it manually allows you to predict what you'll need and keep things compressed in RAM; that's not the case for the managed pool, which needs you to upload everything in final format and lets the system page it in/out on use [which is too late to avoid a performance hit].

Edited by Ingenu

But is that not just what the managed pool does anyway?

 

The managed pool needs to swap entire textures in and out. Say you've got a 2048x2048 texture but your draw call is only going to reference a small portion of it. With the managed pool the entire texture needs to be swapped in for this to happen. With proper virtualization of textures, only the small portion that is being used (in practice it will probably be a little bigger, on the order of a 64K tile) will get swapped in. That's an efficiency win straight away.
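As a rough back-of-the-envelope comparison (assuming an uncompressed 32-bit texel format, which is just for illustration):

```cpp
// Rough comparison of bytes transferred: managed pool (whole texture) versus
// a single 64KB tile. The 4-bytes-per-texel format is an assumption.
#include <cstdio>

int main()
{
    const unsigned texBytes  = 2048 * 2048 * 4;     // whole texture: 16MB
    const unsigned tileBytes = 64 * 1024;           // one tile: 64KB

    std::printf("managed pool swaps in %u KB; a tiled resource can get away with %u KB (%ux less)\n",
                texBytes / 1024, tileBytes / 1024, texBytes / tileBytes);
    return 0;
}
```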


