Mr_Fox

DX12: Need a barrier after an RTV ClearRenderTargetView and before actual rendering?


Hey Guys,

 

a quick question:

Do we need to insert some kind of barrier between an RTV clear and actual rendering to the same RTV?

 

It seems we don't need this barrier, but after struggling with a lot of resource state tracking issues in DX12, I kind of overreact whenever two GPU tasks work on the same memory address...

 

If we don't need this barrier, how can we be sure that when I actually write to a pixel, the clear operation on the same pixel has completed by then?

 

Thanks in advance


No, you don't need a barrier in the case of a clear followed by a draw. Draws implicitly have ordering guarantees with regards to render target read/write operations: if you issue Draw A before Draw B, then the writes (and blending operations) of Draw B are guaranteed to happen after the writes of Draw A are completed.

 

Note that these guarantees only apply to render target operations, and not to the shaders themselves or any UAV read/writes. The pixel shaders can and will execute in any order, and typically the hardware will have some sort of mechanism for ensuring that the RT writes get put in the correct order even though the shaders themselves did not execute in draw order. This is why Rasterizer Ordered Views were added for D3D12, since they let you ensure that writes to UAVs happen in draw order.

Edited by MJP


There is no need for a barrier between a clear and a render. In fact, if your resource is not in the render target state, the debug layer is likely to yell at you about the clear. You will of course need a barrier from RTV to SRV later; this instructs the driver to perform fast clear elimination. As a side note, be sure to set the proper clear color at resource creation so the fast clear can be enabled.
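To make the two points above concrete, here is a minimal sketch of the command recording (names like cmdList, rtvHandle, texture, and vertexCount are placeholders, not from the original posts): no barrier between the clear and the draws, but a transition barrier before the target is sampled as a texture.

```cpp
// Clear color should match the D3D12_CLEAR_VALUE given at resource creation,
// so the fast clear path can be used.
const float clearColor[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
cmdList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);

// No barrier needed here: render-target writes are ordered by draw order.
cmdList->DrawInstanced(vertexCount, 1, 0, 0);

// Before sampling the texture in a later pass, transition RTV -> SRV; this is
// also where the driver can schedule fast clear elimination.
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource = texture;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
cmdList->ResourceBarrier(1, &barrier);
```

This fragment assumes the resource is already in the RENDER_TARGET state when the clear is issued, as the debug layer requires.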


No, you don't need a barrier in the case of a clear followed by a draw. Draws implicitly have ordering guarantees with regards to render target read/write operations: if you issue Draw A before Draw B, then the writes (and blending operations) of Draw B are guaranteed to happen after the writes of Draw A are completed.

 

Note that these guarantees only apply to render target operations, and not to the shaders themselves or any UAV read/writes. The pixel shaders can and will execute in any order, and typically the hardware will have some sort of mechanism for ensuring that the RT writes get put in the correct order even though the shaders themselves did not execute in draw order. This is why Rasterizer Ordered Views were added for D3D12, since they let you ensure that writes to UAVs happen in draw order.

Thanks. Just curious how the GPU achieves that ordering of RT writes. Does it give each RT write a draw ID and hold back out-of-order RT writes (possibly blocking the corresponding PS threads)?


You will of course need a barrier from RTV to SRV later; this instructs the driver to perform fast clear elimination.

Thanks for the reply, but that sentence confuses me. Why is there a fast clear elimination when we transition the resource from RTV to SRV? What does this clear elimination do?

 

Thanks 


Thanks. Just curious how the GPU achieves that ordering of RT writes. Does it give each RT write a draw ID and hold back out-of-order RT writes (possibly blocking the corresponding PS threads)?


For the case of traditional "immediate mode" GPUs (the kind you find in the discrete video cards used by laptops and desktops), the magic happens in the ROPs. The ROPs are the bit of hardware that handles memory access to the render targets, and they're capable of sorting their inputs by draw order to ensure that the writes happen in the correct order. See this for more info: https://fgiesen.wordpress.com/2011/07/12/a-trip-through-the-graphics-pipeline-2011-part-9/
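As a toy illustration of the idea (not how any real ROP is implemented), the sketch below models shader results arriving out of completion order, while the render-target writes are replayed sorted by draw ID, so the last draw's write always wins:

```cpp
#include <algorithm>
#include <vector>

// Toy model: a fragment is a render-target write tagged with the ID of the
// draw that produced it. Fragments may arrive in any order (shaders finish
// out of order), but the "ROP" commits them sorted by draw ID.
struct Fragment { int drawId; int pixel; int color; };

std::vector<int> resolve(std::vector<Fragment> frags, int numPixels) {
    std::vector<int> target(numPixels, 0);  // cleared render target
    // Stable sort keeps submission order within a single draw.
    std::stable_sort(frags.begin(), frags.end(),
                     [](const Fragment& a, const Fragment& b) {
                         return a.drawId < b.drawId;
                     });
    for (const Fragment& f : frags)
        target[f.pixel] = f.color;  // writes land in draw order
    return target;
}
```

For example, if draw 1's fragment for a pixel arrives before draw 0's, `resolve` still leaves draw 1's color in the target, matching the API's ordering guarantee.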

The fast clear mechanism is a GPU optimization. The GPU splits your surface into small tiles and keeps a small block of memory for their status. When you clear, only the status of each tile is cleared, not your surface. When you render, touched tiles clear themselves (if not fully covered). Then, once you are done, the GPU has to clear the remaining tiles. Hopefully not many, since you have covered most of the surface, and so you save on bandwidth.

This is why you provide a clear color at resource creation: usually the fast clear will only work with that color.

That kind of system exists for color compression and depth buffer optimization too. That is why resource barriers are important: they tell the driver when to perform these actions.
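The tile mechanism described above can be sketched as a small simulation (purely illustrative; tile size, flag layout, and method names are made up, not real driver internals):

```cpp
#include <algorithm>
#include <vector>

// Toy model of tile-based fast clear: Clear() only touches per-tile flags,
// pixels are lazily filled when a tile is first rendered to, and the leftover
// flagged tiles are expanded during "fast clear elimination" (which the driver
// can schedule at the RTV -> SRV barrier).
constexpr int kTileSize = 4;

struct Surface {
    std::vector<int> pixels;
    std::vector<bool> tileCleared;  // one status flag per tile
    int clearColor = 0;

    explicit Surface(int w) : pixels(w, -1), tileCleared(w / kTileSize, false) {}

    void Clear(int color) {  // cheap: only the tile status is touched
        clearColor = color;
        std::fill(tileCleared.begin(), tileCleared.end(), true);
    }

    void Write(int x, int color) {  // a draw touching one pixel
        int tile = x / kTileSize;
        if (tileCleared[tile]) {    // touched tile clears itself first
            for (int i = tile * kTileSize; i < (tile + 1) * kTileSize; ++i)
                pixels[i] = clearColor;
            tileCleared[tile] = false;
        }
        pixels[x] = color;
    }

    void EliminateFastClear() {  // expand the remaining flagged tiles
        for (int t = 0; t < static_cast<int>(tileCleared.size()); ++t) {
            if (!tileCleared[t]) continue;
            for (int i = t * kTileSize; i < (t + 1) * kTileSize; ++i)
                pixels[i] = clearColor;
            tileCleared[t] = false;
        }
    }
};
```

The bandwidth saving comes from the tiles that were rendered over: they never pay for a separate clear pass, and only untouched tiles are filled at elimination time.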


AMD has lots of hardware documentation available if you really want to get into some of the low-level details of their GPU's: http://developer.amd.com/resources/developer-guides-manuals/ (scroll down to "Instruction Set Architecture (ISA) Documents" and "Open GPU Documentation"). Intel also has a ton of docs available: https://01.org/linuxgraphics/documentation


AMD has lots of hardware documentation available if you really want to get into some of the low-level details of their GPU's: http://developer.amd.com/resources/developer-guides-manuals/ (scroll down to "Instruction Set Architecture (ISA) Documents" and "Open GPU Documentation"). Intel also has a ton of docs available: https://01.org/linuxgraphics/documentation

 

And NVIDIA keeps everything secret :(


