D3D12 - Record commands for multibuffering

6 comments, last by HateWork 7 years, 11 months ago

Hello guys,

I'm coding a simple D3D12 program and have many command lists with hundreds of prerecorded commands (commands are recorded once at initialization and never reset again).

The problem is that commands that reference the backbuffer cannot be prerecorded, because I'm using triple buffering: when a command list recorded for the current backbuffer is executed on the next frame, the program hangs. For example, I can't do something like this (I can record it, but I can't execute it without hanging):


// Keep the barriers in named locals: taking the address of a temporary only compiles as an MSVC extension.
auto to_render_target = CD3DX12_RESOURCE_BARRIER::Transition(m_render_targets[m_frame_index].Get(), D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET);
m_command_list->ResourceBarrier(1, &to_render_target);
m_command_list->OMSetRenderTargets(1, &m_rtv_handle[m_frame_index], FALSE, nullptr);
m_command_list->ClearRenderTargetView(m_rtv_handle[m_frame_index], clearColor, 0, nullptr);
auto to_present = CD3DX12_RESOURCE_BARRIER::Transition(m_render_targets[m_frame_index].Get(), D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT);
m_command_list->ResourceBarrier(1, &to_present);

This totally breaks my prerecording model. One option would be to get the handle to the current backbuffer and record a separate command list every frame. Another would be to prerecord a set of command lists (one for each backbuffer) with the commands in the example above and execute the matching one before or after my other prerecorded command lists (the ones with draw submissions). But what if I'd like to set a resource barrier or clear the backbuffer in the middle of my command lists? (It makes no sense to clear the backbuffer in the middle of a frame, but it's just an example.)
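To make the mismatch concrete, here is a minimal sketch in plain C++ with no D3D12 calls. `RecordedList`, `current_back_buffer` and `first_mismatch` are hypothetical names used only for illustration, and the sketch assumes the swap chain hands out its back buffers in a simple repeating order.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kFrameCount = 3;  // triple buffering, as in the post

// Hypothetical stand-in for a prerecorded command list: it only remembers
// which back buffer index was baked in when it was recorded.
struct RecordedList {
    std::size_t baked_index;
};

// Assumption: back buffers are used in a simple repeating order.
std::size_t current_back_buffer(std::size_t frame) {
    return frame % kFrameCount;
}

// Returns the first frame on which the baked-in index differs from the
// buffer that is current for that frame, or `frames` if none differs.
std::size_t first_mismatch(const RecordedList& list, std::size_t frames) {
    assert(list.baked_index < kFrameCount);
    for (std::size_t frame = 0; frame < frames; ++frame) {
        if (current_back_buffer(frame) != list.baked_index) {
            return frame;
        }
    }
    return frames;
}
```

Under these assumptions, a list recorded during frame 0 targets the wrong buffer as soon as frame 1 starts, which matches the behaviour described above.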

In D3D11 this was easy to do with deferred contexts when the swap chain was created with the DXGI_SWAP_EFFECT_DISCARD swap effect, because the current writable backbuffer was only accessible through index 0. In D3D12 I cannot even set the backbuffer count below 2, no matter which swap effect I create the swap chain with.

Do you guys have a programming model to overcome this?

You could use bundles for this. Bundle the first commands, bundle the second commands, then execute. I am not really sure why you're concerned with resetting in the first place. I highly doubt you're getting much performance gain, but if you do, please report your findings. I'd love to hear about it.

Yeah bundles is what you want.

-potential energy is easily made kinetic-

As an alternative to bundles: you can re-execute command lists, but you have to wait for completion before resubmitting. You can triple buffer your command lists with back buffer references, and have one per buffer.
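The "one per buffer" idea can be sketched as follows. This is a plain C++ model rather than real D3D12 code: `RecordedList`, `record_per_buffer_lists` and `select_list` are hypothetical names, and in a real program the index passed in would presumably come from the swap chain's current back buffer index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kFrameCount = 3;  // number of back buffers (assumed)

// Hypothetical stand-in for a prerecorded command list.
struct RecordedList {
    std::size_t baked_index;  // back buffer referenced at record time
};

// Record once at initialization: list i references back buffer i.
std::array<RecordedList, kFrameCount> record_per_buffer_lists() {
    std::array<RecordedList, kFrameCount> lists{};
    for (std::size_t i = 0; i < kFrameCount; ++i) {
        lists[i] = RecordedList{i};
    }
    return lists;
}

// Each frame, pick the list that was recorded for the current back buffer.
const RecordedList& select_list(
    const std::array<RecordedList, kFrameCount>& lists,
    std::size_t current_index) {
    assert(current_index < kFrameCount);
    return lists[current_index];
}
```

Nothing is re-recorded per frame here; the only per-frame work is choosing which of the prerecorded lists to submit.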

Alternatively, you can use an intermediate and issue a final copy to the back buffer (which is essentially how D3D11 handled swapchains with one buffer).

Hello guys, thank you for your interest in this topic.

To begin with, I must say that if any of you don't understand the problem, it is very easy to reproduce. Simply grab the most basic example in the D3D12 SDK, the "HelloWindow" example, and move line 162 (a call to the function that populates the command list) to line 151 (at the end of the function that loads assets). This records the command list once at initialization and then executes it every frame. If you compile and run the program, it will start and clear the first frame correctly, but in the next frame the command list will reference the previous backbuffer and it will crash. I've attached to this reply a ZIP file with the C++ source file and the compiled program; try it.

Now I'll answer some fragments of this topic:

I am not really sure why you're concerned with resetting in the first place. I highly doubt you're getting much performance gain

I'm not resetting my lists because I have too many commands, and the prerecording model saves me CPU time. This may not improve performance on the GPU side, as you say, but it will pay off when the CPU is doing heavy work.

Yeah bundles is what you want.

Bundles behave no differently from direct command lists as far as the backbuffer index issue goes. The problem persists even with bundles.

Alternatively, you can use an intermediate and issue a final copy to the back buffer (which is essentially how D3D11 handled swapchains with one buffer).

This is a good idea. Create an "ID3D12Resource" and a handle to it, use it as the render target for all my commands, and then copy the whole region to the current backbuffer. It sounds great. Sure, it will require memory for the extra frame buffer, but it's a sacrifice worth making (and not that much memory anyway; it depends on the resolution, 4K omg). Entire frame buffer copies are expensive, but again that depends on resolution. I wonder how performance will be affected and how it will scale with resolution. I'll have to elaborate more on the subject once I've made an implementation of it. Thanks for the advice, I'll have to try this.
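A rough estimate of that extra memory can be sketched like this. It assumes a 4-byte-per-pixel format (for example an 8-bit RGBA target) and ignores any padding or alignment the driver may add; `render_target_bytes` and `to_mib` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Bytes for one uncompressed render target, ignoring padding and alignment.
std::uint64_t render_target_bytes(std::uint64_t width,
                                  std::uint64_t height,
                                  std::uint64_t bytes_per_pixel) {
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

// Convert a byte count to mebibytes.
double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

With these assumptions, a 1920x1080 intermediate target comes to about 7.9 MiB and a 3840x2160 one to about 31.6 MiB. The cost of the per-frame copy is a separate question that only measurement can answer.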

You can triple buffer your command lists with back buffer references, and have one per buffer

I also thought about creating a command list for each backbuffer, but that would be 100+ commands per list for each buffer. This would completely solve the execution problem and would allow me to write directly to the backbuffer, but it would increase memory usage by a lot (seriously, I'm precaching too many commands across many lists). To counter the memory usage, I was thinking about branching my command lists using linked lists. The structure used for the linked list can specify whether a link is a "normal" type or a "backbuffer reference" type. The normal type would only use one command list, and the other type would use FRAME_COUNT command lists (which can be optimized by creating them as bundles). This way, when composing the final array of command lists that are going to be submitted to the command queue, I can create an arbitrarily long branch of mixed normal and backbuffer reference types. This is my concept:

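#include <cstdint>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

struct CommandLink
{
    uint8_t type = 0; // 0 = normal (use m_command_list[0]), 1 = backbuffer reference (use m_command_list[0 to FRAME_COUNT - 1]).
    ComPtr<ID3D12GraphicsCommandList> m_command_list[FRAME_COUNT]; // FRAME_COUNT (number of backbuffers) is defined elsewhere.
    CommandLink* next = nullptr; // Next link in the chain, or nullptr at the end.

    // Note that this structure can be extended or optimized using unions.
};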

And this can be an example branch:

1 - Normal [0]
|
2 - Backbuffer reference [0-(FRAME_COUNT - 1)]
|
3 - Normal [0]
|
4 - Normal [0] (I can do this, but two normal types can be merged together for better performance)
|
5 - Backbuffer reference [0-(FRAME_COUNT - 1)]
|
6 - Normal [0]
|
7 - Backbuffer reference [0-(FRAME_COUNT - 1)]

EDIT: Actually this is more like serializing command lists rather than branching them. Also this can be done with arrays instead of linked lists.

This could sound like an overthought concept, but I'm guessing that it will have low memory usage and good performance compared to the intermediate RTV solution. I'll also have to code something like this to see how it goes.
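The array-based variant mentioned in the EDIT could look roughly like the sketch below. It is a plain C++ model, not the implementation from this thread: command lists are represented by string labels, and `ListLabel`, `backbuffer_reference` and `serialize` are hypothetical names. A real version would hold command list pointers and pass the flattened array to the command queue.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kFrameCount = 3;  // stands in for FRAME_COUNT

// Hypothetical stand-in for a recorded command list.
using ListLabel = std::string;

struct CommandLink {
    bool backbuffer_reference = false;         // false = normal link
    std::array<ListLabel, kFrameCount> lists;  // normal links use lists[0] only
};

// Flatten the chain into the array to submit for the given back buffer:
// normal links always contribute lists[0], backbuffer reference links
// contribute the list recorded for `frame_index`.
std::vector<ListLabel> serialize(const std::vector<CommandLink>& chain,
                                 std::size_t frame_index) {
    assert(frame_index < kFrameCount);
    std::vector<ListLabel> submission;
    submission.reserve(chain.size());
    for (const CommandLink& link : chain) {
        submission.push_back(link.backbuffer_reference ? link.lists[frame_index]
                                                       : link.lists[0]);
    }
    return submission;
}
```

Since the chain never changes after initialization, the flattened array for each back buffer index could also be built once and cached, leaving only an index lookup per frame.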

Well, this has gone on long enough. I'll try to post my results for the two solutions, but I'll need some time. Also, this has somehow turned into something fun for me. I'm really liking D3D12 a lot; it is flexible enough to let you do anything you want, even crash your program on purpose.

Cheers guys, take care.

Yeah bundles is what you want.

Bundles behave no differently from direct command lists as far as the backbuffer index issue goes. The problem persists even with bundles.

I didn't mean that the issue is directly tackled; I meant use bundles for what you can prerecord and then use direct command lists for the rest. At least that's what I was thinking at the time. To be honest, though, D3D12 has a lot less CPU usage than 'classic' APIs, so I don't really see the point of going out of your way to reduce it further. But like I said, wouldn't a combination of bundles and direct command lists work out for you? Or, if you're really worried about it, a combination of all three: direct, prerecorded direct, and bundles with no state inheritance.

-potential energy is easily made kinetic-

This is a good idea. Create an "ID3D12Resource" and a handle to it, use it as the render target for all my commands, and then copy the whole region to the current backbuffer. It sounds great. Sure, it will require memory for the extra frame buffer, but it's a sacrifice worth making (and not that much memory anyway; it depends on the resolution, 4K omg). Entire frame buffer copies are expensive, but again that depends on resolution. I wonder how performance will be affected and how it will scale with resolution. I'll have to elaborate more on the subject once I've made an implementation of it. Thanks for the advice, I'll have to try this.

I've been doing this and it works fine. You only need one additional intermediate texture plus the swap chain textures. You then have two loops that don't need to run in lock step: they still need to be synchronized, but one can run many more times than the other. You need as many command lists as you have swap chain textures, but once you make them you literally don't need to reset anything ever (so long as your command lists don't need to change). You can even use the same allocator for everything, which I think is fine, because you're not resetting anything. It's not profoundly useful or anything, and mostly a fun challenge, but I'm pretty sure no one else is doing this.

I like this because I can do sort of phony "background" tasks on the gpu and fully saturate its workload so that there's almost no idle time and maintain a very reliable 60fps.
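For a rough sense of how many prerecorded command lists each approach in this thread needs, here is a small counting sketch. The function names are hypothetical and the formulas are a reading of the descriptions above, not measurements: full duplication records every list once per back buffer, the intermediate texture approach adds one copy list per back buffer, and the linked "backbuffer reference" concept duplicates only the lists that touch the back buffer.

```cpp
#include <cassert>
#include <cstddef>

// Every list is recorded once per back buffer.
std::size_t lists_full_duplication(std::size_t total_lists,
                                   std::size_t frame_count) {
    assert(frame_count >= 2);  // D3D12 swap chains need at least two buffers
    return total_lists * frame_count;
}

// Every list targets the intermediate texture; one copy list per back buffer.
std::size_t lists_intermediate_target(std::size_t total_lists,
                                      std::size_t frame_count) {
    assert(frame_count >= 2);
    return total_lists + frame_count;
}

// Only the lists that reference the back buffer are recorded per buffer.
std::size_t lists_backbuffer_reference(std::size_t normal_lists,
                                       std::size_t backbuffer_lists,
                                       std::size_t frame_count) {
    assert(frame_count >= 2);
    return normal_lists + backbuffer_lists * frame_count;
}
```

For the seven-link example chain earlier in the thread (four normal links, three backbuffer reference links) and three back buffers, this gives 21, 10 and 13 lists respectively. These are list counts only; they say nothing about bytes used or GPU time.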

OK, so I'm back here to report my progress. I finished the implementation of my "command serializer" concept and it ended up pretty damn good! I did some basic testing and here are the results:

NVIDIA GTX 750 Ti (v-sync off):

[Default MS Implementation]
3317 fps average
30.6 MiB (RAM)

[Command Serializer]
3616 fps average with spikes up to 3850 fps
30.2 MiB (RAM)

I ran the tests many times and the results were the same. The "Default MS Implementation" means that commands that reference the backbuffer are recorded every frame in a command list dedicated to this purpose, and my normal commands are recorded once in their own command lists.
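At frame rates this high, the difference is easier to judge as time per frame. The following is a small arithmetic sketch with hypothetical helper names; it uses only the two averages reported above.

```cpp
#include <cassert>

// Convert a frame rate to the time spent on one frame, in milliseconds.
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Relative throughput gain of `after_fps` over `before_fps`, as a fraction.
double relative_gain(double before_fps, double after_fps) {
    assert(before_fps > 0.0);
    return after_fps / before_fps - 1.0;
}
```

Going from 3317 fps to 3616 fps means roughly 0.301 ms versus 0.277 ms per frame, a saving of about 0.025 ms per frame, or about 9% more frames per second.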

The serializer method needs more testing under different scenarios to see how it behaves, but so far it has been doing well for command lists that are prerecorded once. It works perfectly for every type of command: I can reference any backbuffer at any moment and mix those commands in between normal ones.

What's next? I want to code two more solutions and publish the results: the "intermediate RTV" and also the more common "one command list per backbuffer". I thought the latter would use too much memory, because I assumed vertex buffer data and other resource data were cached by command lists, but I now think they aren't. That should make it the preferred solution, because it would be standard, lightweight and faster. Let's wait for the results.

