zmic

DX12 CopyTextureRegion to backbuffer of swapchain does nothing


Hi programmers,

I have this problem in DirectX 12.

I have one thread that runs a compute shader in a loop. This compute shader generates a 2d image in a buffer (D3D12_HEAP_TYPE_DEFAULT, D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS). The buffer gets updated, say, a hundred times a second.

In the main thread, I copy this buffer to the back buffer of the swap chain whenever WM_PAINT comes in. WM_PAINT keeps getting called because I never call BeginPaint/EndPaint. For this copy operation I use a graphics CommandQueue/CommandList. Here's the pseudo-code for the paint operation:

... reset CommandQueue/CommandList

swapchain->GetBuffer(back_buffer)
commandlist->CopyTextureRegion(back_buffer, 0, 0, 0, computed_buffer, nullptr);
commandlist->ResourceBarrier(back_buffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_PRESENT);

... execute CommandList and wait for finish using fence
... swapchain->Present(..)
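Written out a little more fully, the paint path looks roughly like this (just a sketch with placeholder names, using the CD3DX12_RESOURCE_BARRIER helper from d3dx12.h; the PRESENT -> COPY_DEST transition before the copy is included for completeness):

    // Sketch only -- g_cmdalloc, g_cmdlist, g_queue are placeholders for my real objects.
    g_cmdalloc->Reset();
    g_cmdlist->Reset(g_cmdalloc, nullptr);                  // no PSO needed for a pure copy

    Microsoft::WRL::ComPtr<ID3D12Resource> back_buffer;
    swapchain->GetBuffer(swapchain->GetCurrentBackBufferIndex(), IID_PPV_ARGS(&back_buffer));

    // back buffer: PRESENT -> COPY_DEST so it can be a copy destination
    auto to_copy_dest = CD3DX12_RESOURCE_BARRIER::Transition(back_buffer.Get(),
        D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_COPY_DEST);
    g_cmdlist->ResourceBarrier(1, &to_copy_dest);

    // ... CopyTextureRegion from the computed buffer into back_buffer goes here ...

    // back buffer: COPY_DEST -> PRESENT before presenting
    auto to_present = CD3DX12_RESOURCE_BARRIER::Transition(back_buffer.Get(),
        D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_PRESENT);
    g_cmdlist->ResourceBarrier(1, &to_present);

    g_cmdlist->Close();
    ID3D12CommandList* lists[] = { g_cmdlist };
    g_queue->ExecuteCommandLists(1, lists);
    // ... signal the fence and wait for it, then:
    swapchain->Present(1, 0);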

I use a good old CriticalSection to make sure the compute CommandList and the graphical CommandList don't run at the same time.

When I start the program, this runs fine in a normal window: I see the procedurally generated buffer animated in real time. However, when I switch to full screen (and resize the swap chain), nothing gets rendered and the screen stays black. When I leave full screen (again resizing the swap chain), same problem: the screen just stays black. Other than that the application runs stable. No DirectX warnings in the debug output, nothing at all. I checked that the WM_PAINT messages keep coming and that the compute thread keeps computing.

Note that I don't do anything else with the graphics command list. I set no pipeline state or root signature because I have no 3D rendering to do. Could this be a problem?

I suppose I could retrieve the computed buffer with a readback buffer and paint it with an ordinary GDI function, but that seems silly with the data already being on the GPU.

EDIT: I ran the code on another PC and there the window stays black right from the start. So the resizing doesn't seem to be the problem.

Any ideas appreciated!

Edited by zmic


EDIT2: I removed the compute thread altogether. Still the same problem. So it's not some concurrency problem.

I guess my question simply boils down to this: I have a non-texture buffer with RGBA data on the GPU. How do I blit it to the back buffer of the swap chain (without running an actual 3D rendering pipeline, if possible)?

 

Edited by zmic


I am able to copy to the backbuffer (I used CopyResource rather than CopyTextureRegion, but I doubt that is the problem).

Remember to switch your backbuffer to D3D12_RESOURCE_STATE_COPY_DEST before copying to it.

What happens if you resize the window not by switching to fullscreen but by dragging the edge of the window to make it bigger? (Assuming you call your swap-chain resize code there as well.)

The problem may be something you have missed when resizing. For example, if you were using a pixel shader you would have to recreate the render target views when you resize the swap chain, because the old texture buffers don't exist anymore and pointers to them are garbage.

The same applies to the UAV on the texture your compute shader is writing to, if you recreate that texture on a resize.
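A rough sketch of what such a resize path typically looks like (placeholder names; it assumes two back buffers and RTVs in a descriptor heap -- in a copy-only setup there are no RTVs, but the old back-buffer references still have to be released before ResizeBuffers):

    // Sketch only -- assumes 2 back buffers and that the GPU has already been waited on.
    void OnResize(IDXGISwapChain3* swapchain, ID3D12Device* device,
                  ID3D12DescriptorHeap* rtv_heap,
                  Microsoft::WRL::ComPtr<ID3D12Resource> back_buffers[2],
                  UINT width, UINT height)
    {
        // drop every reference to the old back buffers before resizing
        for (int i = 0; i < 2; ++i)
            back_buffers[i].Reset();

        DXGI_SWAP_CHAIN_DESC desc = {};
        swapchain->GetDesc(&desc);
        swapchain->ResizeBuffers(2, width, height, desc.BufferDesc.Format, desc.Flags);

        // recreate the RTVs -- the old descriptors point at resources that no longer exist
        UINT rtv_size = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);
        D3D12_CPU_DESCRIPTOR_HANDLE rtv = rtv_heap->GetCPUDescriptorHandleForHeapStart();
        for (int i = 0; i < 2; ++i)
        {
            swapchain->GetBuffer(i, IID_PPV_ARGS(&back_buffers[i]));
            device->CreateRenderTargetView(back_buffers[i].Get(), nullptr, rtv);
            rtv.ptr += rtv_size;
        }
    }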

Edited by CortexDragon

2 hours ago, zmic said:

I use a good old CriticalSection to make sure the compute CommandList and the graphical CommandList don't run at the same time

You can submit work on two different queues at the same time - there's no need for a mutex there. Queues just enqueue work for the GPU to perform later, so a mutex around your queues won't affect when the GPU actually executes the commands. In this situation a single queue would be fine instead of two, anyway.

Does your graphics command list contain the right barriers to transition the texture from UAV to copy-source and back again? How do you keep track of your backbuffer descriptors? Do you create your device with the debug flag set?
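Something along these lines, for example (a sketch; computed_buffer and commandlist are the names from the first post):

    // Sketch only -- make the compute output readable as a copy source,
    // then return it to UAV state for the next compute dispatch.
    auto to_copy_src = CD3DX12_RESOURCE_BARRIER::Transition(computed_buffer,
        D3D12_RESOURCE_STATE_UNORDERED_ACCESS, D3D12_RESOURCE_STATE_COPY_SOURCE);
    commandlist->ResourceBarrier(1, &to_copy_src);

    // ... CopyTextureRegion into the back buffer ...

    auto back_to_uav = CD3DX12_RESOURCE_BARRIER::Transition(computed_buffer,
        D3D12_RESOURCE_STATE_COPY_SOURCE, D3D12_RESOURCE_STATE_UNORDERED_ACCESS);
    commandlist->ResourceBarrier(1, &back_to_uav);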

54 minutes ago, CortexDragon said:

I am able to copy to the backbuffer (I used CopyResource rather than CopyTextureRegion, but I doubt that is the problem). [...]

Thanks for your input! I was able to reproduce your suggestion: I can CopyResource a texture to the backbuffer (of the same size) and it works nicely in both full-screen and windowed mode.

However, the buffer calculated by the compute thread is not a texture but a plain D3D12_RESOURCE_DIMENSION_BUFFER, so I cannot CopyResource it -- the debug layer complains that the backbuffer and the compute buffer are of different types. I need to use CopyTextureRegion with the buffer "wrapped" inside a D3D12_TEXTURE_COPY_LOCATION structure.

Maybe I can let the compute shader write into a texture rather than a plain buffer... gonna try that first.
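For reference, the wrapping looks roughly like this (a sketch; width, height and the format are assumptions based on what I described above, and the footprint's RowPitch has to be a multiple of D3D12_TEXTURE_DATA_PITCH_ALIGNMENT, i.e. 256 bytes):

    // Sketch only -- the source is the plain buffer, described via a placed footprint.
    D3D12_TEXTURE_COPY_LOCATION src = {};
    src.pResource = computed_buffer;
    src.Type = D3D12_TEXTURE_COPY_LOCATION_TYPE_PLACED_FOOTPRINT;
    src.PlacedFootprint.Offset = 0;
    src.PlacedFootprint.Footprint.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    src.PlacedFootprint.Footprint.Width  = width;
    src.PlacedFootprint.Footprint.Height = height;
    src.PlacedFootprint.Footprint.Depth  = 1;
    // 256-byte aligned; the data in the buffer must be laid out with this same row pitch
    src.PlacedFootprint.Footprint.RowPitch = (width * 4 + 255) & ~255u;

    // The destination is the swap chain back buffer, addressed by subresource index.
    D3D12_TEXTURE_COPY_LOCATION dst = {};
    dst.pResource = back_buffer;
    dst.Type = D3D12_TEXTURE_COPY_LOCATION_TYPE_SUBRESOURCE_INDEX;
    dst.SubresourceIndex = 0;

    commandlist->CopyTextureRegion(&dst, 0, 0, 0, &src, nullptr);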

 

 

 

10 hours ago, Hodgman said:

You can submit work on two different queues at the same time - there's no need for a mutex there. [...]

Thanks for your input, Hodgman!

You are right about the queuing. Those mutexes don't make a difference, so I threw them out again.

Yeah, the debug flag is set and I think the state transitions are OK. Whenever something is wrong with those barriers I get spammed instantly in the debug output.

 

Edited by zmic


CopyTextureRegion is extremely difficult to use, because you have to make sure that the formats match. I used a similar approach to yours in a performance test: copying a loaded texture directly to the swap chain back buffer of my output window. Some textures would work, others not. That could explain why your app does not work at all on the other PC.

I guess the only good way to get the image from your compute shader to the output window is the thing you didn't want to do: use the texture in a regular draw call, because that will convert between the different formats for you. I know it sounds silly to program a shader for a simple copy to the output window, but at least I could not find another way.

Bottom line: CopyTextureRegion is fine for copying areas between your own buffers that have the same format, but it is not much use for copying to the swap chain back buffer, because that buffer's format will depend on the machine you run your program on.

18 minutes ago, clemensx said:

CopyTextureRegion is extremely difficult to use, because you have to make sure that the formats match. [...]

Thanks for answering!

In the meantime I stumbled on some kludge that works for whatever reason... the kludge being that I add 16 extra pixels to the size of the backbuffer on resize. I have no idea why this works, but the program only has to run on one machine so I'll take it... for now. When I have some more time I will look into it again.

    _iswapchain3->GetDesc(&desc);
    if (!minimized)
    {
        THROW_FAIL(_iswapchain3->ResizeBuffers(2, X + 16, Y + 16, desc.BufferDesc.Format, desc.Flags));
    }

Without the 16, it doesn't work. With the 16, it does. Yeah, really :)

By the way, the backbuffer is created with DXGI_FORMAT_R8G8B8A8_UNORM and my own buffer has the same format internally, so I just assumed they would be compatible that way.

 

Edited by zmic

3 hours ago, zmic said:

[...] By the way, the backbuffer is created with DXGI_FORMAT_R8G8B8A8_UNORM and my own buffer has the same format internally, so I just assumed they would be compatible that way.

Hello zmic,

 

I have absolutely no experience with what you are trying to do (copying data directly to the swap chain buffer), but I noticed one thing in your last comment after reading what clemensx said. I'm not sure whether clemensx meant that the formats have to be exactly the same or only that their byte layouts have to match. If the formats really have to be exactly the same, are you sure the buffer of your swap chain is not actually B8G8R8A8_UNORM instead of R8G8B8A8_UNORM? I've seen swap chains using the B8G8R8A8_UNORM, R11G11B10_FLOAT and R16G16B16A16_FLOAT formats, but never R8G8B8A8_UNORM like common textures.
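If it helps, the format the swap chain was really created with can be checked at runtime, something like this (a sketch, assuming an IDXGISwapChain1 or later):

    DXGI_SWAP_CHAIN_DESC1 desc = {};
    swapchain->GetDesc1(&desc);
    // desc.Format reports e.g. DXGI_FORMAT_R8G8B8A8_UNORM or DXGI_FORMAT_B8G8R8A8_UNORM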

 

Also, I've had to do the same thing as you in the past, and I did it with an extra draw call and a quad covering the whole buffer, after reading everywhere that it was normal to have to do such a thing.

Edited by ChuckNovice


Yes ChuckNovice, this is exactly what I meant. A simple format change from B8G8... to R8G8... will not work. Some details can be found in the official documentation at https://msdn.microsoft.com/de-de/library/windows/desktop/dn903862(v=vs.85).aspx

I ended up doing the same thing as you: using a very simple triangle setup that spans the whole screen and copying the texture there with normal shader code. That's the kind of thing you need to do anyway for a number of reasons, like blur effects and many more.
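For what it's worth, the draw side of that is small (a sketch with placeholder names; it assumes a PSO whose vertex shader builds the fullscreen triangle from SV_VertexID and whose pixel shader samples the source texture, and the back buffer has to be in D3D12_RESOURCE_STATE_RENDER_TARGET for it):

    // Sketch only -- fullscreen triangle, no vertex buffer needed.
    cmdlist->SetPipelineState(fullscreen_pso);
    cmdlist->SetGraphicsRootSignature(root_signature);
    cmdlist->SetDescriptorHeaps(1, &srv_heap);
    cmdlist->SetGraphicsRootDescriptorTable(0, srv_heap->GetGPUDescriptorHandleForHeapStart());
    cmdlist->OMSetRenderTargets(1, &backbuffer_rtv, FALSE, nullptr);
    cmdlist->RSSetViewports(1, &viewport);
    cmdlist->RSSetScissorRects(1, &scissor_rect);
    cmdlist->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    cmdlist->DrawInstanced(3, 1, 0, 0);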

