DX12: Is a barrier needed after ClearRenderTargetView and before actual rendering?


Hey guys,

A quick question: do we need to insert some kind of barrier between an RTV clear and the actual rendering to the same RTV?

It seems we don't need this barrier, but after struggling with a lot of resource state tracking issues in DX12, I kind of overreact whenever two GPU tasks work on the same memory address...

If we don't need this barrier, how can we be sure that by the time I actually write to a pixel, the clear operation on that same pixel has finished?

Thanks in advance


No, you don't need a barrier in the case of a clear followed by a draw. Draws implicitly have ordering guarantees with regard to render target read/write operations: if you issue Draw A before Draw B, then the writes (and blending operations) of Draw B are guaranteed to happen after the writes of Draw A have completed.

Note that these guarantees only apply to render target operations, and not to the shaders themselves or to any UAV reads/writes. The pixel shaders can and will execute in any order, and typically the hardware will have some sort of mechanism for ensuring that the RT writes get put in the correct order even though the shaders themselves did not execute in draw order. This is why Rasterizer Ordered Views were added for D3D12, since they let you ensure that writes to UAVs happen in draw order.

Edited by MJP


There is no need for a barrier between a clear and a draw. In fact, if your resource is not in the render target state, the debug layer is likely to yell at you when you clear. You will of course need a barrier from RTV to SRV later, and this will instruct the driver to perform fast clear elimination. As a side note, be sure to specify the proper clear color at resource creation so that the fast clear can be enabled.



Thanks. Just curious how the GPU achieves that ordering for RT writes. Does it tag each RT write with a draw ID and block any write that would land out of order (and so possibly stall the related PS thread)?

"later, a barrier from RTV to SRV, this will instruct the driver to perform fast clear elimination."

Thanks for the reply, but I'm confused by this sentence. Why is there a fast clear elimination when we transition a resource from RTV to SRV? What does this clear elimination do?

Thanks


For the case of traditional "immediate mode" GPUs (the kind you find in the discrete video cards used by laptops and desktops), the magic happens in the ROPs. The ROPs are the bit of hardware that handles memory access to the render targets, and they're capable of sorting their inputs by draw order to ensure that the writes happen in the correct order. See this for more info: https://fgiesen.wordpress.com/2011/07/12/a-trip-through-the-graphics-pipeline-2011-part-9/
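
To make the "sorting by draw order" idea concrete, here is a minimal CPU-side sketch in C++. It is only a toy model and not how any real ROP is built: the `Fragment` struct, its `drawId` field, and `resolveInDrawOrder` are hypothetical names invented for illustration, and blending is left out. It shows that shaded fragments can arrive in any order while the render target still ends up as if the draws had run one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: one shaded pixel waiting to be written to the render target.
struct Fragment {
    uint32_t drawId;  // hypothetical sequence number of the draw that produced it
    uint32_t pixel;   // index into the render target
    uint32_t color;   // value to write (no blending in this sketch)
};

// Applies fragments to the render target in draw order, regardless of the
// order in which their shaders finished. stable_sort keeps the arrival order
// for fragments that belong to the same draw.
inline void resolveInDrawOrder(std::vector<Fragment> arrived,
                               std::vector<uint32_t>& renderTarget) {
    std::stable_sort(arrived.begin(), arrived.end(),
                     [](const Fragment& a, const Fragment& b) {
                         return a.drawId < b.drawId;
                     });
    for (const Fragment& f : arrived) {
        assert(f.pixel < renderTarget.size());
        renderTarget[f.pixel] = f.color;
    }
}
```

If a fragment from draw 2 reaches `resolveInDrawOrder` before a fragment from draw 1 that targets the same pixel, the pixel still ends up holding the draw 2 color, which is the guarantee described above for render target writes.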

The fast clear mechanism is a GPU optimisation. The GPU splits your surface into small tiles and keeps a small block of memory for their status. When you clear, only the status of each tile is cleared, not your surface. When you render, touched tiles will clear themselves (if not fully covered). Then once you are done, the GPU will have to clear the remaining tiles. Hopefully there are not many, as you have covered most of the surface, and so you save on bandwidth.

This is why you provide a clear color at resource creation. Usually, the fast clear will only work with it.

That kind of system exists for color compression and depth buffer optimisation too. That is why resource barriers are important: they let the driver know when to perform these actions.
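
The tile-status scheme described above can be sketched as a small C++ toy model. Everything here is an assumption made for illustration: the `FastClearSurface` class, its method names, the tile layout, and the `0xDEADBEEF` stale value are invented, and real hardware details differ per vendor. For simplicity the sketch always expands a touched tile, even if a draw would fully cover it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a fast-clear scheme; names and layout are hypothetical.
class FastClearSurface {
public:
    FastClearSurface(std::size_t tileCount, std::size_t pixelsPerTile,
                     uint32_t clearColor)
        : pixelsPerTile_(pixelsPerTile),
          clearColor_(clearColor),
          pixels_(tileCount * pixelsPerTile, 0xDEADBEEFu),  // stale contents
          tileIsCleared_(tileCount, false) {}

    // A clear only flags every tile; the pixel memory is not touched.
    void clear() { tileIsCleared_.assign(tileIsCleared_.size(), true); }

    // Writing a pixel first expands its tile if the tile is still flagged.
    void writePixel(std::size_t index, uint32_t color) {
        assert(index < pixels_.size());
        expandTile(index / pixelsPerTile_);
        pixels_[index] = color;
    }

    // "Fast clear elimination": expand every tile that is still flagged so the
    // memory can be read directly. Returns how many tiles had to be expanded.
    std::size_t eliminateFastClear() {
        std::size_t expanded = 0;
        for (std::size_t t = 0; t < tileIsCleared_.size(); ++t) {
            if (tileIsCleared_[t]) {
                expandTile(t);
                ++expanded;
            }
        }
        return expanded;
    }

    // Reads pixel memory directly, ignoring the tile status.
    uint32_t rawPixel(std::size_t index) const { return pixels_.at(index); }

private:
    // Fills a flagged tile with the clear color and drops its flag.
    void expandTile(std::size_t tile) {
        if (!tileIsCleared_[tile]) return;
        for (std::size_t i = 0; i < pixelsPerTile_; ++i) {
            pixels_[tile * pixelsPerTile_ + i] = clearColor_;
        }
        tileIsCleared_[tile] = false;
    }

    std::size_t pixelsPerTile_;
    uint32_t clearColor_;
    std::vector<uint32_t> pixels_;
    std::vector<bool> tileIsCleared_;
};
```

In this model, `clear()` is cheap because it only flips flags, and `eliminateFastClear()` plays the role of the work the driver may schedule at the RTV to SRV transition: a reader that ignores the tile status (like `rawPixel`) would otherwise see stale memory in tiles that were never drawn to.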


Thanks MJP, that link you posted is super useful :)

And thanks galop1n for elaborating on the RT clear. HW vendors are doing crazy things in a black box... I wish I knew more...


AMD has lots of hardware documentation available if you really want to get into some of the low-level details of their GPUs: http://developer.amd.com/resources/developer-guides-manuals/ (scroll down to "Instruction Set Architecture (ISA) Documents" and "Open GPU Documentation"). Intel also has a ton of docs available: https://01.org/linuxgraphics/documentation



And NVIDIA keeps everything secret :(
