[D3D12] Use compute shader to write to swap chain backbuffer?

Started by
13 comments, last by Adam Miles 8 years, 4 months ago

Hey! I was hoping to do all of my postprocessing (tonemapping, bloom, etc.) in my D3D12 renderer using compute shaders. In an ideal world, I think, I'd just bind the swap chain's back buffer as a RWTexture2D<float4> in the final compute shader and blast the processed colors out directly.

However, IDXGIFactory4::CreateSwapChain fails if I try to specify a BufferUsage that contains DXGI_USAGE_UNORDERED_ACCESS ("DXGI ERROR: IDXGIFactory::CreateSwapChain: The BufferUsage field of the swapchain description contains some DXGI_USAGE flags that are not supported.")

Without that flag I can't create an unordered access view for the back buffer or use a resource barrier to transition it into the unordered-access state, so there seems to be no way to write to it from a compute shader at all!

Is there a way around this (another way to write to a texture from compute, or another way to create a swap chain that plays nice), or should I just go back to postprocessing using pixel shaders like a caveman?


Perhaps someone more knowledgeable can comment on how correct or optimal this is, but in both D3D11 and now D3D12 I've always used an intermediate Texture2D resource for my compute output (created with UAV and SRV usage, and accessed as a RWTexture2D in the compute shader).

The texture is then transitioned to D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE and copied into the actual back buffer by drawing a fullscreen triangle through the normal graphics pipeline (sampled as a Texture2D in the pixel shader), with the back buffer bound as the render target. The UI is then rendered into the back buffer, and so on. There's probably a more direct way to copy if the intermediate resource and the back buffer share the same DXGI_FORMAT, although my compute output is usually still in an HDR format, so the pixel shader is needed for tone mapping anyway.
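The fullscreen triangle mentioned here is commonly generated without any vertex buffer: the vertex shader derives each corner from SV_VertexID, and the D3D12 side would be something like IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST) followed by DrawInstanced(3, 1, 0, 0). The sketch below is a CPU-side C++ mirror of that vertex math, assuming the widespread HLSL idiom quoted in the comment; the function names are hypothetical. One oversized triangle covers the whole [-1, 1] clip square, and the texture coordinates interpolate to exactly [0, 1] across the visible area.

```cpp
#include <cassert>
#include <cstdint>

struct Vertex { float x, y, u, v; };  // clip-space position + texcoord

// Mirrors the HLSL idiom (an assumption, not code from this thread):
//   float2 uv  = float2((id << 1) & 2, id & 2);
//   float4 pos = float4(uv * float2(2, -2) + float2(-1, 1), 0, 1);
// Produces (-1, 1), (3, 1), (-1, -3): a triangle enclosing the viewport.
Vertex fullscreenTriangleVertex(std::uint32_t vertexId) {
    assert(vertexId < 3);
    float u = static_cast<float>((vertexId << 1) & 2u);
    float v = static_cast<float>(vertexId & 2u);
    return {u * 2.0f - 1.0f, v * -2.0f + 1.0f, u, v};
}

// Hypothetical helper: the texcoord the rasterizer interpolates at a clip-space
// point. UV is an affine function of position, so it matches the three vertices
// and maps the visible square to [0, 1] (v grows downward, as in D3D).
void clipToTexcoord(float x, float y, float& u, float& v) {
    u = (x + 1.0f) * 0.5f;
    v = (1.0f - y) * 0.5f;
}
```

Only the part of the triangle inside the viewport is rasterized, so each back-buffer pixel is shaded once, and there is no diagonal seam of the kind a two-triangle quad introduces.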

D3D12 now disallows UAV usage on swap chains. You'll need to copy into the swap chain or ensure the last event in your frame can be performed using a pixel shader that can write directly to the swap chain. By all means keep the rest of your post-processing in Compute though.
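If that final pixel-shader pass is also where HDR tone mapping happens, its per-channel work boils down to something like the CPU reference below. This is only a sketch under stated assumptions: the thread does not name a tone-mapping operator, so basic Reinhard (c / (1 + c)) is a stand-in, the back buffer is assumed to be a non-sRGB 8-bit UNORM format such as DXGI_FORMAT_R8G8B8A8_UNORM (no gamma encoding is applied), and the round-to-nearest tie-breaking is an assumption. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Basic Reinhard operator (an assumed choice): maps [0, inf) into [0, 1).
float reinhard(float hdr) {
    float c = std::max(hdr, 0.0f);  // treat negative radiance as black
    return c / (1.0f + c);
}

// Float -> 8-bit UNORM: clamp to [0, 1], scale by 255, round to nearest.
std::uint8_t toUnorm8(float x) {
    assert(!std::isnan(x));
    float clamped = std::min(std::max(x, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::floor(clamped * 255.0f + 0.5f));
}

// What the last pass would compute per color channel before the ROPs write it
// into the swap chain back buffer.
std::uint8_t tonemapChannelToUnorm8(float hdr) {
    return toUnorm8(reinhard(hdr));
}
```

On the GPU the UNORM conversion is done by the hardware when the pixel shader's float4 output is written to the render target, so in HLSL only the reinhard() part would normally be written by hand.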

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

Thanks for the info. Yeah, that kinda sucks (it's a sad regression from D3D11), but I've at least worked around it in the short term by having a simple copy postprocess (via a pixel shader) that gets things onto the backbuffer and still allows all of my postproc to run as compute (as you suggested).

Thanks again!

EDIT: I suppose I could use CopyTextureRegion instead, and take advantage of that fancy DMA hardware that may or may not exist, if I'm already using a back-buffer-format-compatible texture by the time I reach the end of the postprocessing chain. But a pixel shader works well enough for now.

Side note: Watch out for the "fancy" DMA; for example, on consoles its performance can be unexpectedly shitty. Be sure to profile it.

As an aside, the vast majority of post processes will be slower in a compute shader. Doing them with a pixel shader isn't doing it like a caveman. You're rasterizing colors in a very coherent manner, which GPUs have been designed to do quickly, and you're leveraging more hardware than a compute shader can use on its own. It can do things like compress the output of your colors to a render target to reduce bandwidth, and there's just no way to do that in a compute shader.

Given that most of your texture fetches are going to be cached in a post process shader, you need to be doing a lot of cooperative work in a compute shader to realize a benefit.
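The "cooperative work" that can pay off in compute is typically tap sharing through groupshared memory (LDS). The CPU simulation below sketches the idea for a 1D box blur of radius R under assumed, hypothetical names: a pixel-shader-style version where every output fetches all 2R+1 taps itself, and a compute-style version where each thread group loads its tile plus an R-wide apron once and then reads every tap from the shared buffer. It is not GPU code; it only counts fetches and checks that both versions produce identical results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A row of texels whose load() counts every "texture fetch" and clamps
// out-of-range coordinates to the edge, like a CLAMP address mode.
struct Source {
    std::vector<float> texels;
    mutable std::size_t fetches = 0;
    explicit Source(std::vector<float> t) : texels(std::move(t)) {}
    float load(long i) const {
        ++fetches;
        long last = static_cast<long>(texels.size()) - 1;
        return texels[static_cast<std::size_t>(std::min(std::max(i, 0L), last))];
    }
};

// Pixel-shader style: each output texel fetches all 2R+1 taps on its own.
std::vector<float> boxBlurPerPixel(const Source& src, long radius) {
    assert(radius >= 0);
    long n = static_cast<long>(src.texels.size());
    std::vector<float> out(src.texels.size());
    for (long x = 0; x < n; ++x) {
        float sum = 0.0f;
        for (long d = -radius; d <= radius; ++d) sum += src.load(x + d);
        out[static_cast<std::size_t>(x)] = sum / static_cast<float>(2 * radius + 1);
    }
    return out;
}

// Compute style: each group of groupSize "threads" loads its tile plus a
// radius-wide apron on both sides into a shared buffer (standing in for HLSL
// groupshared memory), one fetch per slot, then reads every tap from it.
std::vector<float> boxBlurGroupShared(const Source& src, long radius, long groupSize) {
    assert(radius >= 0 && groupSize > 0);
    long n = static_cast<long>(src.texels.size());
    std::vector<float> out(src.texels.size());
    std::vector<float> shared(static_cast<std::size_t>(groupSize + 2 * radius));
    for (long base = 0; base < n; base += groupSize) {
        // Cooperative load phase: slot i holds texel (base - radius + i).
        for (std::size_t i = 0; i < shared.size(); ++i)
            shared[i] = src.load(base - radius + static_cast<long>(i));
        // On a GPU, GroupMemoryBarrierWithGroupSync() would sit here.
        for (long t = 0; t < groupSize && base + t < n; ++t) {
            float sum = 0.0f;
            for (long k = 0; k <= 2 * radius; ++k)
                sum += shared[static_cast<std::size_t>(t + k)];
            out[static_cast<std::size_t>(base + t)] = sum / static_cast<float>(2 * radius + 1);
        }
    }
    return out;
}
```

For a row of width W and groups of size G, fetches drop from W(2R+1) to about (W/G)(G+2R). Since the texture cache already absorbs many of the repeated per-pixel fetches, the saving is mostly in fetch instructions and cache traffic, which is why the radius and amount of reuse need to be large before compute clearly wins.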

"It can do things like compress the output of your colors to a render target to reduce bandwidth, and there's just no way to do that in a compute shader."

What do you mean by that? Are you suggesting that writes to UAVs can't be made to formats smaller than 32 bits per channel?

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

I mean that the rasterizer writes to memory differently from the way a compute shader writes to a UAV. If you're writing coherent color values to a giant rectangular render target meant for display, a pixel shader is going to be very fast at this relative to a compute shader writing to a UAV. If you profile your code, you'll probably see this immediately: even a trivial post-process effect done in a compute shader will measure considerably slower than its pixel shader equivalent.

That's not been my experience on hardware of the last few years at least.

In fact, since Compute can bypass the traditional fill-rate limit (pixels per clock) by avoiding the ROPs entirely, you may even find that Compute is faster than Raster in trivial shaders that would otherwise have hit that bottleneck before any other. As the shaders get more complicated, either bandwidth or ALU becomes the limiting factor, and neither is inherently faster or slower in Compute or Raster. For the class of post-processing shaders that share taps between threads (blurs, for example), using Compute with LDS is likely to offset any mild performance cost that Compute might otherwise have.
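To put the fill-rate limit in numbers: if a trivial fullscreen pass is bound purely by how many pixels the ROPs can write per clock, its time is roughly (width × height) / (pixels per clock × clock rate). The sketch below computes that lower bound; the function name and the example figures (a 3840×2160 target, 64 pixels per clock, 1.5 GHz) are made up for illustration and do not describe any specific GPU.

```cpp
#include <cassert>

// Lower bound, in microseconds, on a fullscreen pass limited only by ROP
// fill rate. Ignores bandwidth, ALU, and format-dependent ROP rates.
double fillRateBoundMicroseconds(int width, int height,
                                 double pixelsPerClock, double clockHz) {
    assert(width > 0 && height > 0 && pixelsPerClock > 0.0 && clockHz > 0.0);
    double pixels = static_cast<double>(width) * static_cast<double>(height);
    double seconds = pixels / (pixelsPerClock * clockHz);
    return seconds * 1e6;
}
```

With the hypothetical figures above this gives about 86.4 µs. A compute shader writing through a UAV is not subject to this particular limit, although, as noted, it still runs into bandwidth or ALU limits once the shader does real work.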

The definition of "considerably" is up for debate, but I'm not sure I've ever seen Compute come out more than about 10% slower than Raster in anything I've tried.

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

"That's not been my experience on hardware of the last few years at least."

This is the impression I was under as well, from reading at least.

-potential energy is easily made kinetic-

This topic is closed to new replies.
