DX12 D3D12 - Record commands for multibuffering



Hello guys,

 

I'm coding a simple D3D12 program and have many command lists with hundreds of prerecorded commands (the commands are recorded once at initialization and the lists are never reset again).

The problem is that commands that reference the backbuffer cannot be prerecorded, because I'm using triple buffering: when a command recorded for the current backbuffer is executed on the next frame, the program hangs. For example, I can't do something like this (I can record it, but I can't execute it without hanging):

// Transition the current backbuffer from PRESENT to RENDER_TARGET (named local instead of the address of a temporary).
auto to_render_target = CD3DX12_RESOURCE_BARRIER::Transition(m_render_targets[m_frame_index].Get(), D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET);
m_command_list->ResourceBarrier(1, &to_render_target);
m_command_list->OMSetRenderTargets(1, &m_rtv_handle[m_frame_index], false, nullptr);
m_command_list->ClearRenderTargetView(m_rtv_handle[m_frame_index], clearColor, 0, nullptr);
// Transition the backbuffer back to PRESENT.
auto to_present = CD3DX12_RESOURCE_BARRIER::Transition(m_render_targets[m_frame_index].Get(), D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT);
m_command_list->ResourceBarrier(1, &to_present);

This totally breaks my prerecording model. One option would be to get the handle to the current backbuffer and record a separate command list every frame. Another would be to prerecord a set of command lists (one for each backbuffer) with the commands in the example above, and execute the corresponding one before or after my other prerecorded command lists (the ones with draw submissions). But what if I'd like to set a resource barrier or clear the backbuffer in the middle of my command lists? (It makes no sense to clear the backbuffer in the middle of a frame; it's just an example.)

In D3D11 it was easy to do this with deferred contexts when creating the swap chain with the DXGI_SWAP_EFFECT_DISCARD swap effect, because the current writeable backbuffer was only accessible through index 0. In D3D12 I cannot even set the backbuffer count below 2, no matter which swap effect I create the swap chain with.

Do you guys have a programming model to overcome this?

 

You could use bundles for this. Bundle the first commands, bundle the second commands, then execute. I am not really sure why you're concerned with resetting in the first place. I highly doubt you're getting much performance gain, but if you do, please report your findings. I'd love to hear about it.

As an alternative to bundles: you can re-execute command lists, but you have to wait for completion before re-submitting. You can triple buffer your command lists with back buffer references, and have one per buffer.
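The "wait for completion before re-submitting" rule can be sketched without any D3D12 types. This is a minimal model, not real API code: `FrameSync` and its members are hypothetical names, and the counters stand in for the values a real program would pass to `ID3D12CommandQueue::Signal` and read back from `ID3D12Fence::GetCompletedValue`. It assumes one prerecorded list per back buffer slot and `kFrameCount` = 3.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFrameCount = 3;  // assumed triple buffering

// Hypothetical model of a queue plus fence: values are plain counters.
struct FrameSync {
    std::uint64_t next_signal = 1;  // value the next submission will signal
    std::uint64_t completed = 0;    // highest value the GPU has reached
    std::array<std::uint64_t, kFrameCount> last_submitted{};  // 0 = never submitted

    // True when the prerecorded list for this slot is no longer in flight.
    bool can_resubmit(std::size_t slot) const {
        assert(slot < kFrameCount);
        return last_submitted[slot] <= completed;
    }

    // Note that the slot's list was submitted; returns the fence value to signal.
    std::uint64_t submit(std::size_t slot) {
        assert(can_resubmit(slot));
        last_submitted[slot] = next_signal;
        return next_signal++;
    }

    // Called when the GPU reports progress on the fence.
    void gpu_reached(std::uint64_t value) {
        if (value > completed) completed = value;
    }
};
```

With three slots, the CPU only has to block when it wraps around to a slot whose previous submission has not finished yet.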

Alternatively, you can use an intermediate and issue a final copy to the back buffer (which is essentially how D3D11 handled swapchains with one buffer).


Hello guys, thank you for your interest in this topic.

 

To begin with, I must say that if any of you don't understand the problem, it is very easy to reproduce. Simply grab the most basic example in the D3D12 SDK, the "HelloWindow" example, and move line 162 (a call to the function that populates the command list) to line 151 (at the end of the function that loads assets). What this does is record the command list once at initialization and then execute it every frame. If you compile and run the program, it will clear the first frame correctly, but on the next frame the command list will reference the previous backbuffer and it will crash. I've attached a ZIP file to this reply with the C++ source file and the compiled program; try it.
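Why the reproduction fails can be shown with a toy model. This is only a sketch under stated assumptions: `RecordedList`, `record`, `next_index` and `matches_current` are hypothetical stand-ins, not D3D12 calls, and the back buffer index is assumed to simply cycle through `kFrameCount` buffers after each present.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kFrameCount = 3;  // assumed triple buffering

// Hypothetical stand-in for a recorded command list: it only remembers
// which back buffer its barriers and clears were recorded against.
struct RecordedList {
    std::size_t target_buffer;
};

// Recording bakes the index that is current at record time into the list.
RecordedList record(std::size_t frame_index) {
    assert(frame_index < kFrameCount);
    return RecordedList{frame_index};
}

// After each present the swap chain moves on to the next buffer
// (assumed here to cycle in order).
std::size_t next_index(std::size_t frame_index) {
    return (frame_index + 1) % kFrameCount;
}

// A list is only valid for the frame whose back buffer it was recorded against.
bool matches_current(const RecordedList& list, std::size_t current_index) {
    return list.target_buffer == current_index;
}
```

A list recorded with index 0 matches on the first frame only; on the second frame `next_index(0)` is 1, so the same list would transition and clear a buffer that is not the current one.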

 

Now i'll answer some fragments of this topic:

 

 

I am not really sure why you're concerned with resetting in the first place. I highly doubt you're getting much performance gain

 

I'm not resetting my lists because I have too many commands, and the prerecording model saves CPU time. As you say, this may not improve performance on the GPU side, but it pays off when the CPU is doing heavy work.

 

 

Yeah bundles is what you want.

 

Bundles behave no differently from direct command lists with respect to the backbuffer index issue. The problem persists even with bundles.

 

 

Alternatively, you can use an intermediate and issue a final copy to the back buffer (which is essentially how D3D11 handled swapchains with one buffer).

 

This is a good idea. Create an "ID3D12Resource" and a handle to it, use it as the render target for all my commands, and then copy the whole region to the current backbuffer. It sounds great. Sure, it will require memory for the extra frame buffer, but it's a sacrifice worth making (and not that much memory anyway; it depends on the resolution, 4K omg). Entire frame buffer copies are expensive, but again that depends on the resolution. I wonder how performance will be affected and how it will scale with resolution. I'll elaborate more on the subject once I've made an implementation of it. Thanks for the advice, I'll have to try this.
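As a rough feel for the memory cost mentioned above, the size of one uncompressed intermediate target is width × height × bytes per pixel. The helper names below are hypothetical, and the sketch assumes a 4-byte-per-pixel format (such as an 8-bit RGBA target) and ignores any alignment or padding the driver may add.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one uncompressed render target, ignoring any
// alignment or padding added by the driver.
std::uint64_t target_bytes(std::uint64_t width, std::uint64_t height,
                           std::uint64_t bytes_per_pixel) {
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

// Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Under those assumptions a 3840x2160 target is 33,177,600 bytes (about 31.6 MiB), and a 1920x1080 target is 8,294,400 bytes (about 7.9 MiB).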

 

 

You can triple buffer your command lists with back buffer references, and have one per buffer

I also thought about creating a command list for each backbuffer, but that would be 100+ commands per list for each buffer. It would completely solve the execution problem and allow me to write directly to the backbuffer, but it would increase memory usage a lot (seriously, I'm precaching too many commands across many lists). To counter the memory usage, I was thinking about branching my command lists using linked lists. The structure used for the linked list can specify whether a link is of the "normal" type or the "backbuffer reference" type. The normal type would use only one command list, and the other type would use FRAME_COUNT command lists (which can be optimized by creating them as bundles). This way, when composing the final array of command lists to submit to the command queue, I can create an arbitrarily long branch of mixed normal and backbuffer reference types. This is my concept:

struct CommandLink
{
    // 0 = normal (uses m_command_list[0] only),
    // 1 = backbuffer reference (uses m_command_list[0] to m_command_list[FRAME_COUNT - 1]).
    uint8_t type = 0;
    ComPtr<ID3D12GraphicsCommandList> m_command_list[FRAME_COUNT];
    CommandLink* next = nullptr;

    // Note that this structure can be extended or optimized using unions.
};

 

And this can be an example branch:

1 - Normal [0]
           |
2 - Backbuffer reference [0-(FRAME_COUNT - 1)]
           |
3 - Normal [0]
           |
4 - Normal [0] (I can do this, but two normal types can be merged together for better performance)
           |
5 - Backbuffer reference [0-(FRAME_COUNT - 1)]
           |
6 - Normal [0]
           |
7 - Backbuffer reference [0-(FRAME_COUNT - 1)]

 

EDIT: Actually this is more like serializing command lists rather than branching them. Also this can be done with arrays instead of linked lists.
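The array-based variant mentioned in the edit could look roughly like the sketch below. It is a simplified model rather than the poster's implementation: `ListHandle` is a plain string standing in for a recorded command list, and `CommandEntry`, `normal`, `per_backbuffer` and `serialize` are hypothetical names. In a real program the resulting array is what would be handed to `ID3D12CommandQueue::ExecuteCommandLists`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t kFrameCount = 3;  // assumed triple buffering

// Hypothetical stand-in for a prerecorded command list.
using ListHandle = std::string;

// One entry in the serialized sequence.
struct CommandEntry {
    bool per_backbuffer;            // false = normal, true = backbuffer reference
    std::vector<ListHandle> lists;  // 1 list if normal, kFrameCount otherwise
};

// A normal entry owns a single list that is valid for every frame.
CommandEntry normal(ListHandle list) {
    return CommandEntry{false, {std::move(list)}};
}

// A backbuffer-reference entry owns one list per back buffer.
CommandEntry per_backbuffer(const std::string& name) {
    CommandEntry entry{true, {}};
    for (std::size_t i = 0; i < kFrameCount; ++i) {
        entry.lists.push_back(name + "[" + std::to_string(i) + "]");
    }
    return entry;
}

// Build the flat, ordered array of lists to submit for the given frame.
std::vector<ListHandle> serialize(const std::vector<CommandEntry>& chain,
                                  std::size_t frame_index) {
    assert(frame_index < kFrameCount);
    std::vector<ListHandle> out;
    out.reserve(chain.size());
    for (const CommandEntry& entry : chain) {
        out.push_back(entry.per_backbuffer ? entry.lists[frame_index]
                                           : entry.lists[0]);
    }
    return out;
}
```

For a chain of normal "draw_a", per-backbuffer "clear", normal "draw_b", serializing for frame index 1 yields draw_a, clear[1], draw_b. Only the small backbuffer-dependent lists are duplicated; the large draw lists exist once.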

 

This could sound like an overthought concept, but I'm guessing that it will have low memory usage and good performance compared to the intermediate RTV solution. I'll also have to code something like this to see how it goes.

Well, this has gone on long enough. I'll try to post my results for the 2 solutions, but I'll need some time. Also, this has somehow turned into something fun for me. I'm really liking D3D12 a lot; it is flexible enough to let you do anything you want, even crash your program on purpose.

 

Cheers guys, take care.

Edited by HateWork


 

Yeah bundles is what you want.

 

Bundles behave no differently from direct command lists with respect to the backbuffer index issue. The problem persists even with bundles.

 

 

I didn't mean that the issue is directly tackled. I meant use bundles for what you can pre-record and then use direct command lists for the rest. At least that's what I was thinking at the time. To be honest, though, D3D12 has a lot less CPU usage than 'classic' APIs, so I don't really see the point of going out of your way to reduce it further. But like I said, wouldn't a combination of bundles and direct command lists work out for you? Or, if you're really worried about it, a combination of all three: direct, prerecorded direct, and bundles with no state inheritance.


This is a good idea. Create an "ID3D12Resource" and a handle to it, use it as the render target for all my commands, and then copy the whole region to the current backbuffer. It sounds great. Sure, it will require memory for the extra frame buffer, but it's a sacrifice worth making (and not that much memory anyway; it depends on the resolution, 4K omg). Entire frame buffer copies are expensive, but again that depends on the resolution. I wonder how performance will be affected and how it will scale with resolution. I'll elaborate more on the subject once I've made an implementation of it. Thanks for the advice, I'll have to try this.

 

I've been doing this and it works fine. You only need one additional intermediate texture plus the swap chain textures. You then have two loops that don't need to run in lock step: they still need to be synchronized, but one can run many more times than the other. You need as many command lists as you have swap chain textures, but once you make them you literally don't need to reset anything ever (so long as your command lists don't need to change). You can even use the same allocator for everything, which I think is fine, because you're not resetting anything. It's not profoundly useful or anything and mostly a fun challenge, but I'm pretty sure no one else is doing this.

 

I like this because I can do sort of phony "background" tasks on the gpu and fully saturate its workload so that there's almost no idle time and maintain a very reliable 60fps. 
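The two decoupled loops described above can be modelled with a pair of counters. This is only an illustration under assumptions: `IntermediatePipeline`, `render_scene` and `present` are hypothetical names, the "version" numbers stand in for image contents, and the synchronization between the loops that a real implementation still needs is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFrameCount = 3;  // assumed number of swap chain textures

// Toy model: the scene is rendered into one intermediate texture, and a
// per-back-buffer copy is the only step that depends on the frame index.
struct IntermediatePipeline {
    std::uint64_t intermediate_version = 0;                 // bumped per scene render
    std::array<std::uint64_t, kFrameCount> back_buffer{};   // version last copied in
    std::size_t frame_index = 0;

    // Back-buffer-agnostic work: the prerecorded lists can be reused as-is.
    void render_scene() { ++intermediate_version; }

    // Copy the latest intermediate image into the current back buffer and
    // advance to the next one. Returns the index that was presented.
    std::size_t present() {
        assert(frame_index < kFrameCount);
        const std::size_t used = frame_index;
        back_buffer[used] = intermediate_version;
        frame_index = (frame_index + 1) % kFrameCount;
        return used;
    }
};
```

Calling `render_scene()` three times and then `present()` once leaves back buffer 0 holding version 3, which mirrors the point that one loop can run more often than the other.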

Edited by Dingleberry


OK, so I'm back here to report my progress. I finished the implementation of my "command serializer" concept and it ended up pretty damn good! I did some basic testing and here are the results:

 

NVIDIA GTX 750 Ti (v-sync off):

[Default MS Implementation]
3317 fps average
30.6 MiB (RAM)

[Command Serializer]
3616 fps average with spikes up to 3850 fps
30.2 MiB (RAM)

 

I ran the tests many times and the results were the same. The "Default MS Implementation" means that commands that reference the backbuffer are recorded every frame in a command list dedicated to this purpose, and my normal commands are recorded once in their own command lists.

The serializer method needs more testing under different scenarios to see how it behaves, but so far it has been doing well for command lists that are prerecorded once. It works perfectly for every type of command; I can reference any backbuffer at any moment and mix those commands with normal ones.

What's next? I want to code two more solutions and publish the results: the "intermediate RTV" and the more common "one command list per backbuffer". I expected the latter to use too much memory, because I thought vertex buffer data and other resource data were cached by command lists, but I now think they aren't. That should make it the preferred solution, because it would be standard, lightweight and faster. Let's wait for the results.
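To put the two averages in perspective, frame rate can be converted to time per frame. The helper names below are hypothetical, and the only inputs are the 3317 fps and 3616 fps figures reported above.

```cpp
#include <cassert>

// Average frame time in milliseconds for a given frames-per-second figure.
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Relative throughput gain of after_fps over before_fps, as a fraction.
double relative_gain(double before_fps, double after_fps) {
    assert(before_fps > 0.0);
    return after_fps / before_fps - 1.0;
}
```

With those numbers, 3317 fps is roughly 0.301 ms per frame and 3616 fps is roughly 0.277 ms, so the serializer saves about 0.025 ms (25 microseconds) per frame, a gain of about 9% in this test.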
