  • 01/13/19 10:39 AM

    Graphics
    A Practical Approach to Managing Resource States in Vulkan and Direct3D12

    Graphics and GPU Programming

    DiligentDev

    Introduction

    Explicit resource state management and synchronization is both one of the main advantages and one of the main challenges that modern graphics APIs such as Direct3D12 and Vulkan offer application developers. It makes rendering command recording very efficient, but getting state management right is a hard problem. This article explains why explicit state management is important and introduces a solution implemented in Diligent Engine, a modern cross-platform low-level graphics library. Diligent Engine has Direct3D11, Direct3D12, OpenGL/GLES and Vulkan backends and supports Windows Desktop, Universal Windows, Linux, Android, Mac and iOS platforms. Its full source code is available on GitHub and is free to use. A separate article gives an introduction to Diligent Engine.

     

    Synchronization in Next-Gen APIs

    Modern graphics applications can best be described as client-server systems where the CPU is a client that records rendering commands and puts them into queue(s), and the GPU is a server that asynchronously pulls commands from the queue(s) and processes them. As a result, commands are not executed immediately when the CPU issues them, but rather some time later (typically one to two frames) when the GPU gets to the corresponding point in the queue. Besides that, GPU architecture is very different from CPU architecture because of the kind of problems that GPUs are designed to handle. While CPUs are great at running algorithms with lots of flow control constructs (branches, loops, etc.), such as handling events in an application input loop, GPUs are more efficient at crunching numbers by executing the same computation thousands or even millions of times. Of course, this statement is a bit of an oversimplification, as modern CPUs also have wide SIMD (single instruction, multiple data) units that let them perform such computations efficiently as well. Still, GPUs are at least an order of magnitude faster at these kinds of problems.

    The main challenge that both CPUs and GPUs need to solve is memory latency. CPUs are out-of-order machines with beefy cores and large caches that use fancy prefetching and branch-prediction circuitry to make sure that data is available when a core actually needs it. GPUs, in contrast, are in-order beasts with small caches, thousands of tiny cores and very deep pipelines. They don't use any branch prediction or prefetching, but instead maintain tens of thousands of threads in flight and are capable of switching between threads instantaneously. When one group of threads waits for a memory request, the GPU can simply switch to another group, provided it has enough work.

    When programming a CPU (by CPU I mean an x86 CPU; things may be a little more involved for ARM ones), the hardware does a lot of things that we usually take for granted. For instance, after one core has written something to a memory address, we know that another core can immediately read the same memory. The cache line containing the data will need to do a little bit of travelling through the CPU, but eventually another core will get the correct piece of information with no extra effort from the application. GPUs, in contrast, give very few explicit guarantees. In many cases, you cannot expect that a write is visible to subsequent reads unless special care is taken by the application. Besides that, the data may need to be converted from one form to another before it can be consumed by the next step. A few examples where explicit synchronization may be required:

    • After data has been written to a texture or a buffer through an unordered access view (UAV in Direct3D) or an image (in Vulkan/OpenGL terminology), the GPU may need to wait until all writes are complete and flush the caches to memory before the same texture or buffer can be read by another shader.
    • After a shadow map rendering command is executed, the GPU may need to wait until rasterization and all writes are complete, flush the caches and change the texture layout to a format optimized for sampling before that shadow map can be used in a lighting shader.
    • If the CPU needs to read data previously written by the GPU, it may need to invalidate that memory region to make sure that the CPU caches do not hold stale bytes.

    These are just a few examples of synchronization dependencies that a GPU needs to resolve. Traditionally, all these problems were handled by the API/driver and were hidden from the developer. Old-school implicit APIs such as Direct3D11 and OpenGL/GLES work that way. This approach, while convenient from a developer's point of view, has major limitations that result in suboptimal performance. First, a driver or API does not know what the developer's intent is and has to always assume the worst-case scenario to guarantee correctness. For instance, if one shader writes to one region of a UAV, but the next shader reads from another region, the driver must still insert a barrier to guarantee that all writes are complete and visible, because it cannot know that the regions do not overlap and the barrier is not really necessary.
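    The UAV case can be made concrete with a small sketch of the reasoning that an application is able to do but a driver generally is not: if the byte range written by one dispatch and the range read by the next are known to be disjoint, no barrier is needed between them. `ByteRange`, `RangesOverlap()` and `NeedsUavBarrier()` are hypothetical helpers for illustration, not part of any real API.

```cpp
#include <cstdint>

// Half-open byte range [Offset, Offset + Size) within a single buffer.
struct ByteRange
{
    std::uint64_t Offset = 0;
    std::uint64_t Size   = 0;
};

// Returns true if the two half-open ranges share at least one byte.
inline bool RangesOverlap(const ByteRange& A, const ByteRange& B)
{
    return A.Size != 0 && B.Size != 0 &&
           A.Offset < B.Offset + B.Size &&
           B.Offset < A.Offset + A.Size;
}

// An application that knows which regions are written and read can skip the
// barrier when they are disjoint. A driver that cannot see the access pattern
// has to assume an overlap and always insert the barrier.
inline bool NeedsUavBarrier(const ByteRange& Written, const ByteRange& Read, bool AccessPatternKnown)
{
    if (!AccessPatternKnown)
        return true; // worst-case assumption made by implicit APIs
    return RangesOverlap(Written, Read);
}
```

    For example, `NeedsUavBarrier({0, 256}, {256, 256}, true)` returns false because the two ranges are adjacent but disjoint, while passing `false` as the last argument models the driver's worst-case assumption and always requests the barrier.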

    The biggest problem, though, is that this approach makes parallel command recording almost useless. Consider a scenario where one thread records commands to render a shadow map, while a second thread simultaneously records commands that use this shadow map in a forward rendering pass. The first thread needs the shadow map to be in the depth-stencil writable state, while the second thread needs it to be in the shader readable state. The problem is that the second thread does not know what the original state of the shadow map is. So when the application submits the second command buffer for execution, the API needs to find out what the actual state of the shadow map texture is and patch the command buffer with the right state transition. It needs to do this not only for our shadow map texture but for every other resource that the command list may use. This is a significant serialization bottleneck, and there was no way to solve it in the old APIs.

    The next-generation APIs (Direct3D12 and Vulkan) solve the aforementioned problems by making all resource transitions explicit. It is now up to the application to track the states of all resources and ensure that all required barriers/transitions are executed. In the example above, the application knows that when the shadow map is used in the forward pass, it will have been left in the depth-stencil writable state by the shadow pass, so the transition to the shader readable state can be recorded right away, without waiting for the first command buffer to be recorded or submitted. The downside is that the application is now responsible for tracking all resource states, which can be a significant burden.
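    A minimal sketch of the shadow map scenario, assuming hypothetical `CommandList` and `Transition` types rather than any real API: each recording thread writes down transitions with explicitly known before and after states, so nothing has to be looked up or patched at submission time. The optional `TransitionsAreConsistent()` check, also hypothetical, shows the kind of validation a debug layer could run over the submitted command lists.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified resource states (not a real API).
enum class State { DepthWrite, ShaderRead };

// One explicit transition recorded by the application.
struct Transition
{
    std::string Resource;
    State       Before;
    State       After;
};

// A command list only stores what was recorded into it. It never queries
// the current global state of a resource while recording.
struct CommandList
{
    std::vector<Transition> Transitions;

    void Barrier(const std::string& Resource, State Before, State After)
    {
        Transitions.push_back({Resource, Before, After});
    }
};

// Thread 1: the shadow map was sampled last frame, so it is moved back
// to the depth-write state before rendering into it.
CommandList RecordShadowPass()
{
    CommandList Cmd;
    Cmd.Barrier("ShadowMap", State::ShaderRead, State::DepthWrite);
    // ... depth-only draw calls ...
    return Cmd;
}

// Thread 2: the application knows the shadow pass leaves the map in
// DepthWrite, so it records the transition immediately, without waiting
// for thread 1 and without any patching at submission time.
CommandList RecordForwardPass()
{
    CommandList Cmd;
    Cmd.Barrier("ShadowMap", State::DepthWrite, State::ShaderRead);
    // ... draw calls that sample the shadow map ...
    return Cmd;
}

// Optional debug check in submission order: each transition must start
// from the state in which the previous transition left the resource.
bool TransitionsAreConsistent(const std::vector<CommandList>& Submitted)
{
    std::map<std::string, State> LastState;
    for (const CommandList& Cmd : Submitted)
    {
        for (const Transition& T : Cmd.Transitions)
        {
            auto It = LastState.find(T.Resource);
            if (It != LastState.end() && It->second != T.Before)
                return false;
            LastState[T.Resource] = T.After;
        }
    }
    return true;
}
```

    Submitting the shadow pass followed by the forward pass passes the check, while submitting the forward pass twice fails it, because the second list expects the map to still be in the depth-write state.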

    Let's now take a closer look at how synchronization is implemented in Vulkan and Direct3D12.

     

    Synchronization in Vulkan

    Vulkan enables very fine-grained control over synchronization operations and provides tools to individually tweak the following aspects:

    • Execution dependencies, i.e. which set of operations must be completed before another set of operations can begin.
    • Memory dependencies, i.e. which memory writes must be made available to subsequent reads.
    • Layout transitions, i.e. what texture memory layout transformations must be performed, if any.

    Execution dependencies are expressed as dependencies between pipeline stages that naturally map to the traditional GPU pipeline. The type of memory access is defined by the VkAccessFlagBits enum. Certain access types are only valid for specific pipeline stages. All valid combinations are listed in Section 6.1.3 of the Vulkan spec and are also given in the following table:

     

    | Access flag (VK_ACCESS_)           | Pipeline stages (VK_PIPELINE_STAGE_)                | Access type description
    |------------------------------------|-----------------------------------------------------|---------------------------------------------------------------------
    | INDIRECT_COMMAND_READ_BIT          | DRAW_INDIRECT_BIT                                   | Read access to indirect draw/dispatch command arguments stored in a buffer
    | INDEX_READ_BIT                     | VERTEX_INPUT_BIT                                    | Read access to an index buffer
    | VERTEX_ATTRIBUTE_READ_BIT          | VERTEX_INPUT_BIT                                    | Read access to a vertex buffer
    | UNIFORM_READ_BIT                   | ANY_SHADER_BIT                                      | Read access to a uniform (constant) buffer
    | SHADER_READ_BIT                    | ANY_SHADER_BIT                                      | Read access to a storage buffer (buffer UAV), uniform texel buffer (buffer SRV), sampled image (texture SRV) or storage image (texture UAV)
    | SHADER_WRITE_BIT                   | ANY_SHADER_BIT                                      | Write access to a storage buffer (buffer UAV) or storage image (texture UAV)
    | INPUT_ATTACHMENT_READ_BIT          | FRAGMENT_SHADER_BIT                                 | Read access to an input attachment (render target) during fragment shading
    | COLOR_ATTACHMENT_READ_BIT          | COLOR_ATTACHMENT_OUTPUT_BIT                         | Read access to a color attachment (render target), such as via blending or logic operations
    | COLOR_ATTACHMENT_WRITE_BIT         | COLOR_ATTACHMENT_OUTPUT_BIT                         | Write access to a color attachment (render target) during a render pass or via certain operations such as blending
    | DEPTH_STENCIL_ATTACHMENT_READ_BIT  | EARLY_FRAGMENT_TESTS_BIT or LATE_FRAGMENT_TESTS_BIT | Read access to a depth/stencil buffer via depth/stencil operations
    | DEPTH_STENCIL_ATTACHMENT_WRITE_BIT | EARLY_FRAGMENT_TESTS_BIT or LATE_FRAGMENT_TESTS_BIT | Write access to a depth/stencil buffer via depth/stencil operations
    | TRANSFER_READ_BIT                  | TRANSFER_BIT                                        | Read access to an image (texture) or buffer in a copy operation
    | TRANSFER_WRITE_BIT                 | TRANSFER_BIT                                        | Write access to an image (texture) or buffer in a clear or copy operation
    | HOST_READ_BIT                      | HOST_BIT                                            | Read access by the host
    | HOST_WRITE_BIT                     | HOST_BIT                                            | Write access by the host

    Table 1. Valid combinations of access flags and pipeline stages. ANY_SHADER_BIT means VERTEX_SHADER_BIT, TESSELLATION_CONTROL_SHADER_BIT, TESSELLATION_EVALUATION_SHADER_BIT, GEOMETRY_SHADER_BIT, FRAGMENT_SHADER_BIT, or COMPUTE_SHADER_BIT.

     

    As you can see, most access flags correspond 1:1 to a pipeline stage. For example, quite naturally, vertex indices can only be read at the vertex input stage, while the final color can only be written at the color attachment (render target in Direct3D12 terminology) output stage. For certain access types, you can precisely specify which stage will use that access type. Most importantly, for shader reads (such as texture sampling), writes (UAV/image stores) and uniform buffer access, it is possible to tell the system precisely which shader stages will be using that access type. For depth-stencil read/write access, it is possible to distinguish whether the access happens at the early or late fragment test stage. Quite honestly, I can't really come up with any examples where this flexibility may be useful and result in a measurable performance improvement. Note that it is against the spec to specify an access flag for a stage that does not support that type of access (such as depth-stencil write access for the vertex shader stage).
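    The rule from the last sentence can be expressed as a simple mask test. In this sketch, `Stage` and `Access` are local stand-ins covering only part of Table 1, and their bit values are made up rather than taken from the Vulkan headers. Following the spec's valid-usage rule, an access flag is accepted if at least one stage in the barrier's stage mask supports it.

```cpp
#include <cstdint>

// Local stand-ins for a subset of VkPipelineStageFlagBits.
// The bit values are illustrative, NOT the real Vulkan values.
enum Stage : std::uint32_t
{
    STAGE_DRAW_INDIRECT           = 1u << 0,
    STAGE_VERTEX_INPUT            = 1u << 1,
    STAGE_VERTEX_SHADER           = 1u << 2,
    STAGE_EARLY_FRAGMENT_TESTS    = 1u << 3,
    STAGE_FRAGMENT_SHADER         = 1u << 4,
    STAGE_LATE_FRAGMENT_TESTS     = 1u << 5,
    STAGE_COLOR_ATTACHMENT_OUTPUT = 1u << 6,
    STAGE_COMPUTE_SHADER          = 1u << 7,
    STAGE_TRANSFER                = 1u << 8,
};

// Shader stages included in this subset ("ANY_SHADER_BIT" in Table 1).
constexpr std::uint32_t kAnyShaderStage =
    STAGE_VERTEX_SHADER | STAGE_FRAGMENT_SHADER | STAGE_COMPUTE_SHADER;

// A subset of the access types from Table 1.
enum class Access
{
    IndirectCommandRead,
    IndexRead,
    VertexAttributeRead,
    UniformRead,
    ShaderWrite,
    ColorAttachmentWrite,
    DepthStencilAttachmentWrite,
    TransferWrite,
};

// Returns the mask of pipeline stages that support the given access type.
inline std::uint32_t SupportedStages(Access A)
{
    switch (A)
    {
        case Access::IndirectCommandRead:         return STAGE_DRAW_INDIRECT;
        case Access::IndexRead:
        case Access::VertexAttributeRead:         return STAGE_VERTEX_INPUT;
        case Access::UniformRead:
        case Access::ShaderWrite:                 return kAnyShaderStage;
        case Access::ColorAttachmentWrite:        return STAGE_COLOR_ATTACHMENT_OUTPUT;
        case Access::DepthStencilAttachmentWrite: return STAGE_EARLY_FRAGMENT_TESTS | STAGE_LATE_FRAGMENT_TESTS;
        case Access::TransferWrite:               return STAGE_TRANSFER;
    }
    return 0;
}

// An access flag is valid in a barrier only if at least one stage in the
// barrier's stage mask supports it.
inline bool IsValidCombination(Access A, std::uint32_t StageMask)
{
    return (SupportedStages(A) & StageMask) != 0;
}
```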

    An application may use these tools to specify dependencies between stages very precisely. For example, it may request that writes to a storage buffer from the vertex shader stage are made available to reads from the fragment shader stage in a subsequent draw call. The advantage here is that since the destination of the dependency is the fragment shader stage, the driver does not need to hold back the vertex shader stage of the subsequent draw, potentially saving some GPU cycles.
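    In real Vulkan code, this dependency would be recorded with vkCmdPipelineBarrier() using VK_PIPELINE_STAGE_VERTEX_SHADER_BIT as the source stage, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT as the destination stage, and a VkMemoryBarrier with srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT and dstAccessMask = VK_ACCESS_SHADER_READ_BIT. The sketch below, built on a simplified local `GfxStage` enum rather than Vulkan types, models only which stages of the subsequent draw have to wait: the destination stage and every logically later stage.

```cpp
// Simplified subset of graphics pipeline stages in their logical order
// (tessellation and geometry stages omitted). Not a Vulkan type.
enum class GfxStage
{
    DrawIndirect,
    VertexInput,
    VertexShader,
    EarlyFragmentTests,
    FragmentShader,
    LateFragmentTests,
    ColorAttachmentOutput,
};

// For a barrier whose destination stage is DstStage, a stage of a subsequent
// command has to wait only if it is DstStage or a logically later stage.
// Logically earlier stages, such as the vertex shader here, may start right away.
inline bool StageWaitsOnBarrier(GfxStage Stage, GfxStage DstStage)
{
    return static_cast<int>(Stage) >= static_cast<int>(DstStage);
}
```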

    For image (texture) resources, a synchronization barrier also defines layout transitions, i.e. potential data reorganization that the GPU may need to perform to support the requested access type. Section 11.4 of the Vulkan spec describes the available layouts and how they must be used. Since every layout can only be used at certain pipeline stages (for example, the color-attachment-optimal layout can only be used at the color attachment output stage), and every pipeline stage allows only a few access types, we can list all allowed access flags for every layout, as presented in the table below:

     

    | Image layout (VK_IMAGE_LAYOUT_)  | Access (VK_ACCESS_)                                                   | Description
    |----------------------------------|-----------------------------------------------------------------------|----------------------------------------------------
    | UNDEFINED                        | n/a                                                                   | Can only be used as the initial layout when creating an image or as the old layout in an image transition. When transitioning out of this layout, the contents of the image are not preserved.
    | GENERAL                          | Any                                                                   | Supports all types of device access.
    | COLOR_ATTACHMENT_OPTIMAL         | COLOR_ATTACHMENT_READ_BIT, COLOR_ATTACHMENT_WRITE_BIT                 | Must only be used as a color attachment.
    | DEPTH_STENCIL_ATTACHMENT_OPTIMAL | DEPTH_STENCIL_ATTACHMENT_READ_BIT, DEPTH_STENCIL_ATTACHMENT_WRITE_BIT | Must only be used as a depth-stencil attachment.
    | DEPTH_STENCIL_READ_ONLY_OPTIMAL  | DEPTH_STENCIL_ATTACHMENT_READ_BIT, SHADER_READ_BIT                    | Must only be used as a read-only depth-stencil attachment or as a read-only image in a shader.
    | SHADER_READ_ONLY_OPTIMAL         | SHADER_READ_BIT                                                       | Must only be used as a read-only image in a shader (sampled image or input attachment).
    | TRANSFER_SRC_OPTIMAL             | TRANSFER_READ_BIT                                                     | Must only be used as the source of transfer (copy) commands.
    | TRANSFER_DST_OPTIMAL             | TRANSFER_WRITE_BIT                                                    | Must only be used as the destination of transfer (copy and clear) commands.
    | PREINITIALIZED                   | n/a                                                                   | Can only be used as the initial layout when creating an image or as the old layout in an image transition. When transitioning out of this layout, the contents of the image are preserved, as opposed to the UNDEFINED layout.

    Table 2. Image layouts and allowed access flags.

     

    As with access flags and pipeline stages, there is very little freedom in combining image layouts and access flags. As a result, image layouts, access flags and pipeline stages in many cases form uniquely defined triplets.
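    As an illustration of these triplets, the hypothetical function below returns, for a few layouts, the access and stage flags that a barrier would typically pair with that layout, derived from Tables 1 and 2 (flag names are spelled as text, without the VK_ prefixes). SHADER_READ_ONLY_OPTIMAL shows where some freedom remains: the reading shader stage has to be chosen by the application, and this sketch simply assumes the fragment shader.

```cpp
#include <string>

// A few image layouts from Table 2 (local stand-in, not VkImageLayout).
enum class ImageLayout
{
    ColorAttachmentOptimal,
    DepthStencilAttachmentOptimal,
    ShaderReadOnlyOptimal,
    TransferSrcOptimal,
    TransferDstOptimal,
};

// Access and stage flags, spelled as Vulkan names without prefixes.
struct LayoutUsage
{
    std::string Access;
    std::string Stages;
};

inline LayoutUsage UsageForLayout(ImageLayout L)
{
    switch (L)
    {
        case ImageLayout::ColorAttachmentOptimal:
            return {"COLOR_ATTACHMENT_READ_BIT | COLOR_ATTACHMENT_WRITE_BIT", "COLOR_ATTACHMENT_OUTPUT_BIT"};
        case ImageLayout::DepthStencilAttachmentOptimal:
            return {"DEPTH_STENCIL_ATTACHMENT_READ_BIT | DEPTH_STENCIL_ATTACHMENT_WRITE_BIT",
                    "EARLY_FRAGMENT_TESTS_BIT | LATE_FRAGMENT_TESTS_BIT"};
        case ImageLayout::ShaderReadOnlyOptimal:
            // The only real choice left: which shader stage(s) read the image.
            return {"SHADER_READ_BIT", "FRAGMENT_SHADER_BIT"};
        case ImageLayout::TransferSrcOptimal:
            return {"TRANSFER_READ_BIT", "TRANSFER_BIT"};
        case ImageLayout::TransferDstOptimal:
            return {"TRANSFER_WRITE_BIT", "TRANSFER_BIT"};
    }
    return {};
}
```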

    Note that Vulkan also exposes another form of synchronization called render passes and subpasses. The main purpose of render passes is to provide implicit synchronization guarantees so that an application does not need to insert a barrier after every single rendering command (such as a draw or clear). Render passes also allow expressing the same dependencies in a form that may be leveraged by the driver (especially on GPUs that use tiled deferred rendering architectures) for more efficient rendering. A full discussion of render passes is out of the scope of this post.

     

    Synchronization in Direct3D12

    Synchronization tools in Direct3D12 are not as expressive as in Vulkan, but they are also not as intricate. With the exception of UAV barriers described below, Direct3D12 does not distinguish between execution barriers and memory barriers, and instead operates with resource states (see Table 3).

    | Resource state (D3D12_RESOURCE_STATE_) | Description
    |----------------------------------------|-------------------------------------------------------
    | VERTEX_AND_CONSTANT_BUFFER             | The resource is used as a vertex or constant buffer.
    | INDEX_BUFFER                           | The resource is used as an index buffer.
    | RENDER_TARGET                          | The resource is used as a render target.
    | UNORDERED_ACCESS                       | The resource is used for unordered access via an unordered access view (UAV).
    | DEPTH_WRITE                            | The resource is used in a writable depth-stencil view or a clear command.
    | DEPTH_READ                             | The resource is used in a read-only depth-stencil view.
    | NON_PIXEL_SHADER_RESOURCE              | The resource is accessed via a shader resource view in any shader stage other than the pixel shader.
    | PIXEL_SHADER_RESOURCE                  | The resource is accessed via a shader resource view in the pixel shader.
    | INDIRECT_ARGUMENT                      | The resource is used as the source of indirect arguments for an indirect draw or dispatch command.
    | COPY_DEST                              | The resource is used as the destination in a copy command.
    | COPY_SOURCE                            | The resource is used as the source in a copy command.

    Table 3. Most commonly used resource states in Direct3D12.

     

    Direct3D12 defines three resource barrier types:

    • State transition barrier defines a transition from one resource state listed in Table 3 to another. This type of barrier maps to a Vulkan barrier whose old and new access flags and/or image layouts are not the same.
    • UAV barrier is an execution plus memory barrier in Vulkan terminology. It does not change the state (layout), but instead indicates that all UAV accesses (reads or writes) to a particular resource must complete before any future UAV accesses (reads or writes) can begin.
    • Aliasing barrier indicates a usage transition between two resources that are backed by the same memory and is out of scope of this article.
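    A minimal sketch of how an application might pick between the first two barrier types for two consecutive uses of the same resource. `D3DState` and `BarrierKind` are simplified local enums, not the D3D12 types; the comments name the D3D12_RESOURCE_BARRIER_TYPE values they correspond to. The sketch assumes that back-to-back UAV accesses always form a hazard, which, as discussed earlier, is not necessarily true when the application knows the accessed regions are disjoint.

```cpp
// Local stand-ins for a few D3D12_RESOURCE_STATES values (illustrative only).
enum class D3DState { Common, RenderTarget, UnorderedAccess, PixelShaderResource };

enum class BarrierKind { None, Transition, Uav };

// Chooses the barrier needed between two consecutive uses of one resource.
inline BarrierKind ChooseBarrier(D3DState Before, D3DState After)
{
    if (Before != After)
        return BarrierKind::Transition; // D3D12_RESOURCE_BARRIER_TYPE_TRANSITION
    if (Before == D3DState::UnorderedAccess)
        return BarrierKind::Uav;        // D3D12_RESOURCE_BARRIER_TYPE_UAV: same state, but prior UAV accesses must complete
    return BarrierKind::None;           // same non-UAV state: nothing to do in this simplified model
}
```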

     

    Resource state management in Diligent Engine

    The purpose of Diligent Engine is to provide an efficient cross-platform low-level graphics API that is convenient to use, but at the same time flexible enough not to limit applications in expressing their intent. Before version 2.4, the ability of an application to control resource state transitions was very limited. Version 2.4 made resource state transitions explicit and introduced two ways to manage the states. The first one is fully automatic: the engine internally keeps track of the states and performs the necessary transitions. The second one is manual and completely driven by the application.

     

    Automatic State Management

    Every command that may potentially perform state transitions uses one of the following state transition modes:

    • RESOURCE_STATE_TRANSITION_MODE_NONE - Perform no state transitions and no state validation.
    • RESOURCE_STATE_TRANSITION_MODE_TRANSITION - Transition resources to the states required by the command.
    • RESOURCE_STATE_TRANSITION_MODE_VERIFY - Do not transition, but verify that the states are correct.

    The code snippet below gives an example of a sequence of typical rendering commands in Diligent Engine 2.4:

    // Clear the back buffer 
    const float ClearColor[] = {  0.350f,  0.350f,  0.350f, 1.0f }; 
    m_pImmediateContext->ClearRenderTarget(nullptr, ClearColor, RESOURCE_STATE_TRANSITION_MODE_TRANSITION);
    m_pImmediateContext->ClearDepthStencil(nullptr, CLEAR_DEPTH_FLAG, 1.f, 0, RESOURCE_STATE_TRANSITION_MODE_TRANSITION);
    
    // Bind vertex buffer
    Uint32 offset = 0;
    IBuffer *pBuffs[] = {m_CubeVertexBuffer};
    m_pImmediateContext->SetVertexBuffers(0, 1, pBuffs, &offset, RESOURCE_STATE_TRANSITION_MODE_TRANSITION,
                                          SET_VERTEX_BUFFERS_FLAG_RESET);
    m_pImmediateContext->SetIndexBuffer(m_CubeIndexBuffer, 0, RESOURCE_STATE_TRANSITION_MODE_TRANSITION);
    
    // Set pipeline state
    m_pImmediateContext->SetPipelineState(m_pPSO);
    // Commit shader resources
    m_pImmediateContext->CommitShaderResources(m_pSRB, RESOURCE_STATE_TRANSITION_MODE_TRANSITION);
        
    DrawAttribs DrawAttrs;
    DrawAttrs.IsIndexed = true;
    DrawAttrs.IndexType = VT_UINT32; // Index type
    DrawAttrs.NumIndices = 36;
    // Verify the state of vertex and index buffers
    DrawAttrs.Flags = DRAW_FLAG_VERIFY_STATES;
    m_pImmediateContext->Draw(DrawAttrs);
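    The three transition modes can be illustrated with a hypothetical sketch of what a command might do with one resource before using it. This is not Diligent Engine's implementation: `ResState`, `Resource` and `Recorder` are simplified stand-ins, and the error handling in the VERIFY branch is an assumption.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

// Simplified stand-ins for a few states and the three transition modes.
enum class ResState { Undefined, RenderTarget, ShaderResource, DepthWrite };
enum class TransitionMode { None, Transition, Verify };

// A resource whose current state is tracked by the engine.
struct Resource
{
    ResState State = ResState::Undefined;
};

// Collects the barriers a command would record (stand-in for a command buffer).
struct Recorder
{
    std::vector<std::pair<ResState, ResState>> Barriers;

    // Prepares Res for a command that requires the Required state.
    void Prepare(Resource& Res, ResState Required, TransitionMode Mode)
    {
        switch (Mode)
        {
            case TransitionMode::None:
                break; // the application takes full responsibility
            case TransitionMode::Transition:
                if (Res.State != Required)
                {
                    Barriers.emplace_back(Res.State, Required);
                    Res.State = Required; // tracked state is updated
                }
                break;
            case TransitionMode::Verify:
                // This sketch throws; a real engine might only report an error in debug builds.
                if (Res.State != Required)
                    throw std::logic_error("resource is not in the required state");
                break;
        }
    }
};
```

    Preparing a render target twice in TRANSITION mode records a single barrier, because the second call finds the tracked state already correct.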

    Automatic state management is useful in many scenarios, especially when porting old applications to the Diligent API. It has the following limitations, though:

    • The state is tracked for the whole resource only. Individual mip levels and/or texture array slices cannot be transitioned.
    • The state is a global resource property. Every device context that uses a resource sees the same state.
    • Automatic state transitions are not thread safe. Any operation that uses RESOURCE_STATE_TRANSITION_MODE_TRANSITION requires that no other thread accesses the states of the same resources simultaneously.

     

    Explicit State Management

    As we discussed above, there is no way to solve the resource state management problem efficiently in a fully automated manner, so Diligent Engine does not try to outsmart the industry and makes state transitions part of the API. It introduces a set of states that mostly map to Direct3D12 resource states, as we believe this approach is expressive enough and much clearer than Vulkan's. If an application needs very fine-grained control, it can use native API interoperability to insert Vulkan barriers directly into a command buffer. The list of states defined by Diligent Engine, as well as their mapping to Direct3D12 and Vulkan, is given in Table 4 below.

    | Diligent state    | Direct3D12 state                                 | Vulkan image layout              | Vulkan access flags
    | (RESOURCE_STATE_) | (D3D12_RESOURCE_STATE_)                          | (VK_IMAGE_LAYOUT_)               | (VK_ACCESS_)
    |-------------------|--------------------------------------------------|----------------------------------|---------------------------------------------------------------------
    | UNKNOWN           | n/a                                              | n/a                              | n/a
    | UNDEFINED         | COMMON                                           | UNDEFINED                        | 0
    | VERTEX_BUFFER     | VERTEX_AND_CONSTANT_BUFFER                       | n/a                              | VERTEX_ATTRIBUTE_READ_BIT
    | CONSTANT_BUFFER   | VERTEX_AND_CONSTANT_BUFFER                       | n/a                              | UNIFORM_READ_BIT
    | INDEX_BUFFER      | INDEX_BUFFER                                     | n/a                              | INDEX_READ_BIT
    | RENDER_TARGET     | RENDER_TARGET                                    | COLOR_ATTACHMENT_OPTIMAL         | COLOR_ATTACHMENT_READ_BIT, COLOR_ATTACHMENT_WRITE_BIT
    | UNORDERED_ACCESS  | UNORDERED_ACCESS                                 | GENERAL                          | SHADER_WRITE_BIT, SHADER_READ_BIT
    | DEPTH_READ        | DEPTH_READ                                       | DEPTH_STENCIL_READ_ONLY_OPTIMAL  | DEPTH_STENCIL_ATTACHMENT_READ_BIT
    | DEPTH_WRITE       | DEPTH_WRITE                                      | DEPTH_STENCIL_ATTACHMENT_OPTIMAL | DEPTH_STENCIL_ATTACHMENT_READ_BIT, DEPTH_STENCIL_ATTACHMENT_WRITE_BIT
    | SHADER_RESOURCE   | NON_PIXEL_SHADER_RESOURCE, PIXEL_SHADER_RESOURCE | SHADER_READ_ONLY_OPTIMAL         | SHADER_READ_BIT
    | INDIRECT_ARGUMENT | INDIRECT_ARGUMENT                                | n/a                              | INDIRECT_COMMAND_READ_BIT
    | COPY_DEST         | COPY_DEST                                        | TRANSFER_DST_OPTIMAL             | TRANSFER_WRITE_BIT
    | COPY_SOURCE       | COPY_SOURCE                                      | TRANSFER_SRC_OPTIMAL             | TRANSFER_READ_BIT
    | PRESENT           | PRESENT                                          | PRESENT_SRC_KHR                  | MEMORY_READ_BIT

    Table 4. Mapping between Diligent resource states, Direct3D12 states, Vulkan image layouts and access flags. Where a cell lists several flags, the state maps to their combination.

     

    Diligent resource states map almost exactly 1:1 to Direct3D12 resource states. The only real difference is that in Diligent, the SHADER_RESOURCE state maps to the union of the NON_PIXEL_SHADER_RESOURCE and PIXEL_SHADER_RESOURCE states, which does not seem to be a real issue.
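    The Direct3D12 column of Table 4 can be sketched as a lookup function. The `d3d12` constants below are local stand-ins named after D3D12_RESOURCE_STATES members; apart from COMMON and PRESENT, which are both zero in Direct3D12 as well, their values are made up. `DiligentState` is likewise a simplified local enum, not Diligent's RESOURCE_STATE type.

```cpp
#include <cstdint>

// Local bit flags named after D3D12_RESOURCE_STATES members.
// As in D3D12, COMMON and PRESENT are both 0; the other values are illustrative.
namespace d3d12
{
    constexpr std::uint32_t COMMON                     = 0;
    constexpr std::uint32_t PRESENT                    = 0;
    constexpr std::uint32_t VERTEX_AND_CONSTANT_BUFFER = 1u << 0;
    constexpr std::uint32_t RENDER_TARGET              = 1u << 1;
    constexpr std::uint32_t UNORDERED_ACCESS           = 1u << 2;
    constexpr std::uint32_t NON_PIXEL_SHADER_RESOURCE  = 1u << 3;
    constexpr std::uint32_t PIXEL_SHADER_RESOURCE      = 1u << 4;
    constexpr std::uint32_t COPY_DEST                  = 1u << 5;
}

// A subset of the Diligent states from Table 4.
enum class DiligentState { Undefined, VertexBuffer, ConstantBuffer, RenderTarget, UnorderedAccess, ShaderResource, CopyDest, Present };

// Maps a Diligent state to a D3D12 state mask, following Table 4.
inline std::uint32_t ToD3D12State(DiligentState S)
{
    switch (S)
    {
        case DiligentState::Undefined:       return d3d12::COMMON;
        case DiligentState::VertexBuffer:    // both buffer usages share one D3D12 state
        case DiligentState::ConstantBuffer:  return d3d12::VERTEX_AND_CONSTANT_BUFFER;
        case DiligentState::RenderTarget:    return d3d12::RENDER_TARGET;
        case DiligentState::UnorderedAccess: return d3d12::UNORDERED_ACCESS;
        case DiligentState::ShaderResource:  // union of both shader resource states
            return d3d12::NON_PIXEL_SHADER_RESOURCE | d3d12::PIXEL_SHADER_RESOURCE;
        case DiligentState::CopyDest:        return d3d12::COPY_DEST;
        case DiligentState::Present:         return d3d12::PRESENT;
    }
    return d3d12::COMMON;
}
```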

    Compared to Vulkan, resource states in Diligent are a little bit more general, specifically:

    • RENDER_TARGET state always defines a writable render target (sets both COLOR_ATTACHMENT_READ_BIT and COLOR_ATTACHMENT_WRITE_BIT access type flags).
    • UNORDERED_ACCESS state always defines a writable storage image/storage buffer (sets both SHADER_WRITE_BIT and SHADER_READ_BIT access type flags).
    • Transitions to and from the CONSTANT_BUFFER, UNORDERED_ACCESS and SHADER_RESOURCE states always set all applicable pipeline stage flags as given by Table 1.

    None of the limitations above seem to cause any measurable performance degradation. Again, if an application really needs to specify a more precise barrier, it can rely on native API interoperability.
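    Similarly, here is a sketch of the Vulkan side of Table 4 for a few states, including the generalizations listed above: writable states always include the matching read flag, and shader-visible states use all shader stages. All enums and constants are local stand-ins named after Vulkan flags, with made-up values; a real backend would use the VkImageLayout, VkAccessFlags and VkPipelineStageFlags types from the Vulkan headers.

```cpp
#include <cstdint>

// Local stand-ins named after Vulkan enums; values are NOT the real ones.
enum class VulkanLayout { NotApplicable, General, ColorAttachmentOptimal, ShaderReadOnlyOptimal };

namespace vkaccess
{
    constexpr std::uint32_t UNIFORM_READ           = 1u << 0;
    constexpr std::uint32_t SHADER_READ            = 1u << 1;
    constexpr std::uint32_t SHADER_WRITE           = 1u << 2;
    constexpr std::uint32_t COLOR_ATTACHMENT_READ  = 1u << 3;
    constexpr std::uint32_t COLOR_ATTACHMENT_WRITE = 1u << 4;
}

namespace vkstage
{
    constexpr std::uint32_t VERTEX_SHADER           = 1u << 0;
    constexpr std::uint32_t TESS_CONTROL_SHADER     = 1u << 1;
    constexpr std::uint32_t TESS_EVALUATION_SHADER  = 1u << 2;
    constexpr std::uint32_t GEOMETRY_SHADER         = 1u << 3;
    constexpr std::uint32_t FRAGMENT_SHADER         = 1u << 4;
    constexpr std::uint32_t COMPUTE_SHADER          = 1u << 5;
    constexpr std::uint32_t COLOR_ATTACHMENT_OUTPUT = 1u << 6;
    // Every shader stage ("ANY_SHADER_BIT" in Table 1).
    constexpr std::uint32_t ALL_SHADERS = VERTEX_SHADER | TESS_CONTROL_SHADER | TESS_EVALUATION_SHADER |
                                          GEOMETRY_SHADER | FRAGMENT_SHADER | COMPUTE_SHADER;
}

// A subset of the Diligent states from Table 4 (local stand-in).
enum class DiligentState { ConstantBuffer, RenderTarget, UnorderedAccess, ShaderResource };

struct VulkanUsage
{
    VulkanLayout  Layout;     // NotApplicable for buffer-only states
    std::uint32_t AccessMask;
    std::uint32_t StageMask;
};

inline VulkanUsage ToVulkan(DiligentState S)
{
    switch (S)
    {
        case DiligentState::ConstantBuffer:
            return {VulkanLayout::NotApplicable, vkaccess::UNIFORM_READ, vkstage::ALL_SHADERS};
        case DiligentState::RenderTarget:
            return {VulkanLayout::ColorAttachmentOptimal,
                    vkaccess::COLOR_ATTACHMENT_READ | vkaccess::COLOR_ATTACHMENT_WRITE,
                    vkstage::COLOR_ATTACHMENT_OUTPUT};
        case DiligentState::UnorderedAccess:
            return {VulkanLayout::General, vkaccess::SHADER_READ | vkaccess::SHADER_WRITE, vkstage::ALL_SHADERS};
        case DiligentState::ShaderResource:
            return {VulkanLayout::ShaderReadOnlyOptimal, vkaccess::SHADER_READ, vkstage::ALL_SHADERS};
    }
    return {VulkanLayout::NotApplicable, 0, 0};
}
```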

    Note that Diligent defines both UNKNOWN and UNDEFINED states, which have very different meanings. UNKNOWN means that the state is not known to the engine and that the application manually manages the state of this resource. UNDEFINED means that the state is known to the engine and is undefined from the point of view of the underlying API. This state has well-defined counterparts in Direct3D12 and Vulkan.

    Explicit resource state transitions in Diligent Engine are performed with the IDeviceContext::TransitionResourceStates() method, which takes an array of StateTransitionDesc structures:

    void IDeviceContext::TransitionResourceStates(Uint32 BarrierCount, StateTransitionDesc* pResourceBarriers)

    Every element of the array defines the resource to transition (a texture or a buffer), the old state and the new state, as well as the range of mip levels and array slices for a texture resource:

    struct StateTransitionDesc
    {
        ITexture* pTexture       = nullptr;
        IBuffer*  pBuffer        = nullptr;
    
        Uint32    FirstMipLevel  = 0;
        Uint32    MipLevelsCount = 0;
        Uint32    FirstArraySlice= 0;
        Uint32    ArraySliceCount= 0;
    
        RESOURCE_STATE OldState = RESOURCE_STATE_UNKNOWN;
        RESOURCE_STATE NewState = RESOURCE_STATE_UNKNOWN;
    
        bool UpdateResourceState = false;
    };


    If the state of the resource is known to the engine, the OldState member can be set to UNKNOWN, in which case the engine will use the state stored in the resource. If the state is not known to the engine, OldState must not be UNKNOWN. NewState can never be UNKNOWN.

    An important member is the UpdateResourceState flag. If it is set to true, the engine will set the state of the resource to the value given by NewState. Otherwise, the state will remain unchanged.
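    The OldState and UpdateResourceState rules can be summarized in a short hypothetical sketch. `TransitionDesc` mirrors only the relevant fields of StateTransitionDesc, `TrackedResource` stands in for a texture or buffer, and the exceptions are an assumption about error handling; this is not how Diligent Engine implements the method.

```cpp
#include <stdexcept>

// Simplified stand-in for a few RESOURCE_STATE values.
enum class RState { Unknown, Undefined, RenderTarget, ShaderResource };

// Stand-in for a texture or buffer; Unknown means "not tracked by the engine".
struct TrackedResource
{
    RState State = RState::Unknown;
};

// Mirrors only the StateTransitionDesc fields used by the rules above.
struct TransitionDesc
{
    TrackedResource* pResource           = nullptr;
    RState           OldState            = RState::Unknown;
    RState           NewState            = RState::Unknown;
    bool             UpdateResourceState = false;
};

struct ResolvedBarrier
{
    RState Before;
    RState After;
};

// Hypothetical resolution of one transition request.
inline ResolvedBarrier ResolveTransition(const TransitionDesc& Desc)
{
    if (Desc.NewState == RState::Unknown)
        throw std::invalid_argument("NewState can never be UNKNOWN");

    RState Before = Desc.OldState;
    if (Before == RState::Unknown)
    {
        // Fall back to the state tracked by the engine.
        Before = Desc.pResource->State;
        if (Before == RState::Unknown)
            throw std::invalid_argument("state is not known to the engine; OldState must be set");
    }

    if (Desc.UpdateResourceState)
        Desc.pResource->State = Desc.NewState; // the engine now tracks the new state

    return {Before, Desc.NewState};
}
```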

     

    Switching between explicit and automatic state management

    Diligent Engine provides tools that allow switching between and mixing automatic and manual state management. Both the ITexture and IBuffer interfaces expose SetState() and GetState() methods that allow an application to get and set the resource state. When the state of a resource is set to UNKNOWN, this resource will be ignored by all methods that use RESOURCE_STATE_TRANSITION_MODE_TRANSITION mode. State transitions will still be performed for all resources whose state is known. An application can thus mix automatic and manual state management by setting the state of manually managed resources to UNKNOWN. If an application wants to hand state management back to the engine, it can use the SetState() method to set the resource state. Alternatively, it can set the UpdateResourceState flag to true, which has the same effect.
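    A hypothetical sketch of how automatic mode can coexist with manually managed resources: a helper that transitions every resource with a known state and skips those whose state is UNKNOWN. The `Resource` class only mirrors the SetState()/GetState() pair and is not the Diligent ITexture/IBuffer interface; `TransitionKnownResources()` is likewise an illustrative name.

```cpp
#include <vector>

// Simplified stand-in for a few RESOURCE_STATE values.
enum class RState { Unknown, Undefined, RenderTarget, ShaderResource };

// Mirrors only the SetState()/GetState() pair described above.
class Resource
{
public:
    void   SetState(RState S) { m_State = S; }
    RState GetState() const   { return m_State; }

private:
    RState m_State = RState::Undefined;
};

// Hypothetical helper for a command in TRANSITION mode: every resource with a
// known state is moved to Required; resources in the UNKNOWN state are left
// to the application. Returns the number of barriers that would be recorded.
inline int TransitionKnownResources(const std::vector<Resource*>& Resources, RState Required)
{
    int NumBarriers = 0;
    for (Resource* pRes : Resources)
    {
        const RState Current = pRes->GetState();
        if (Current == RState::Unknown || Current == Required)
            continue;
        pRes->SetState(Required);
        ++NumBarriers;
    }
    return NumBarriers;
}
```

    Calling SetState() with a known state on a resource that was previously UNKNOWN hands it back to automatic management, and the next call of the helper transitions it as well.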

     

    Multithreaded Safety

    As we discussed above, the main advantage of manual resource state management is the ability to record rendering commands in parallel. As resource states are tracked globally in Diligent Engine, the following precautions must be taken:

    • Recording state transitions of the same resource in multiple threads simultaneously with IDeviceContext::TransitionResourceStates() is safe as long as UpdateResourceState flag is set to false.
    • Any thread that uses RESOURCE_STATE_TRANSITION_MODE_TRANSITION mode with any method must be the only thread accessing resources that may be transitioned. This also applies to the IDeviceContext::TransitionShaderResources() method.
    • If a thread uses RESOURCE_STATE_TRANSITION_MODE_VERIFY mode with any method (which is recommended whenever possible), no other thread should alter the states of the same resources.

     

    Discussion

    Diligent Engine adopts a D3D11-style API with immediate and deferred contexts to record rendering commands. Since it is well known that deferred contexts did not work well in Direct3D11, a natural question one may ask is why they work in Diligent. The answer is explicit state transition control. While resource state management in Direct3D11 was always automatic, Diligent gives the application direct control over how resource states must be handled by every operation. At the same time, device contexts incorporate dynamic memory, descriptor management and other tasks that need to be handled by a thread that records rendering commands.

     

    Conclusion

    The explicit resource state management system introduced in Diligent Engine v2.4 combines flexibility, efficiency and ease of use. An application may rely on automatic resource state management in typical rendering scenarios and switch to manual mode when the engine does not have enough knowledge to manage the states optimally, or when automatic management is not possible, such as in the case of multithreaded rendering command recording.

    At the moment, Diligent Engine only supports one command queue, exposed as a single immediate context. One of the next steps is to expose multiple command queues through multiple immediate contexts, as well as primitives to synchronize execution between queues, to allow async compute and other advanced rendering techniques.





    User Feedback


    A very interesting introduction. Thank you for that.

    I was very excited right until the end where you revealed that you aren't trying to fully solve what the driver is doing for us in D3D11, especially in the context of multi-threaded command recording. Good for you! Good in overall! :)

    I've got my hands on a multi-platform engine built around the logic of D3D11 and, unfortunately, high-level users sometimes do all kinds of stuff to resources on all the threads in parallel. This brought me a lot of headache because trying to support that is very difficult, error-prone and certainly won't ever be more performant than the battle-hardened drivers.

    Just not doing that, that is no transitions inside of parallel recording jobs, but only on their boundaries, is the only way to go, obviously.


    There is really no efficient solution to resolving state dependencies in multithreaded environment, which is why D3D12 and Vulkan make that an application's problem. I believe that giving an option to choose between manual and automatic state management is a convenient way to make API easy to use yet expressive when necessary.

    Share this comment


    Link to comment
    Share on other sites


    Create an account or sign in to comment

    You need to be a member in order to leave a comment

    Create an account

    Sign up for a new account in our community. It's easy!

    Register a new account

    Sign in

    Already have an account? Sign in here.

    Sign In Now
