GuyWithBeard

Vulkan render pass interdependencies?


Hi,

In Vulkan you have render passes where you specify which attachments to render to and which to read from, and subpasses within the render pass which can depend on each other. If one subpass needs to finish before another can begin you specify that with a subpass dependency.

In my engine I don't currently use subpasses. My concept of a "render pass" translates roughly to setting a render target and clearing it, followed by a number of draw calls in DirectX, and there isn't really a good way to model subpasses in DX. Because of this, in Vulkan my frame mostly consists of a number of render passes, each with one subpass.

My question is: do I have to specify dependencies between the render passes, or is that needed only if you have multiple subpasses?

In the Vulkan Programming Guide, chapter 13, it says: "In the example renderpass we set up in Chapter 7, we used a single subpass with no dependencies and a single set of outputs.", which suggests that you only need dependencies between subpasses, not between render passes. However, the (excellent) tutorials at vulkan-tutorial.com have you create a subpass dependency on "external subpasses" in the chapter on "Rendering and presentation", under "Subpass dependencies": https://vulkan-tutorial.com/Drawing_a_triangle/Drawing/Rendering_and_presentation even though they use only one render pass with a single subpass.

So, in short: if I have render pass A, with a single subpass, rendering to an attachment, and render pass B, also with a single subpass, rendering to that same attachment, do I have to specify subpass dependencies between the two subpasses of the render passes in order to make render pass A finish before B can begin? Or is that handled implicitly by the fact that they belong to different render passes?

Thanks!

Edited by GuyWithBeard


The short answer is that you are likely to have problems and will need to do something to introduce dependencies between render passes.  The longer answer is that this is very driver specific and you might get away with it, at least for a while.  The difference here is that, unlike DX12, Vulkan does a little driver-level work to help you out, work which you usually want to duplicate in DX12 yourself anyway.  Basically though, if you issue 10 render passes, the Vulkan driver does not have to heed the order you sent them to the queue unless you have supplied some form of dependency information or explicit synchronization.  Vulkan is allowed to, and probably will, issue the render passes out of order, in parallel or completely ass backwards depending on the drivers involved.  Obviously this means that the draw commands associated with each begin/next subpass can execute out of order.

When I implemented my wrapper, I ended up duplicating the entire concept of render passes and subpasses in DX12. It was the most appropriate solution I could find that would let me solve the various issues on the three primary rendering APIs I wanted to support.  The primary reason for putting the work into this was exactly the problems you are asking about.  At least in my case, when I really dug into and understood render passes, I realized I had basically been doing exactly the same things, just in my rendering code.  Pushing it down into an abstraction cleaned things up quite a lot and made for a much cleaner API.  Additionally, at a later time, it will make it considerably easier to optimize for better parallelism and GPU utilization, since all the transitions and issuance are in one place that I can get clever with, without having to reorganize large swaths of code.

So, yup, I'd suggest you consider implementing the subpass portion because it has a lot of benefits and solves the problem you are asking about.


Interesting. I wonder if you could force render passes into a specific execution order (i.e. the order they are recorded into the command buffer) by using only VK_SUBPASS_EXTERNAL as srcSubpass and dstSubpass.

EDIT: And by that I mean that the vulkan-tutorial.com article seems to suggest that you can refer to a subpass of a previous/subsequent render pass using VK_SUBPASS_EXTERNAL, although I am a bit unsure if that is what it really means.

Of course, it wouldn't be utilizing the GPU to its fullest but it would ensure correct behavior...

Edited by GuyWithBeard

1 hour ago, GuyWithBeard said:

Interesting. I wonder if you could force render passes into a specific execution order (i.e. the order they are recorded into the command buffer) by using only VK_SUBPASS_EXTERNAL as srcSubpass and dstSubpass.

Yes, you can.  This was actually the first approach I tried, but it is very bad for the GPU because it severely under-utilizes parallelism.  As a quick and dirty 'get it running' solution it is fine though.  If you read that tutorial again you might catch that they talk about there being implicit 'subpasses' around each render pass, roughly 'pre', 'your pass', 'post'.  Setting dependencies on those hidden subpasses is how you control render pass to render pass dependencies, or at least ordering.

1 hour ago, GuyWithBeard said:

BTW, did you manage to hide away all resource barriers and transitions into your render pass system or do you still have to issue barrier/transition commands on your command buffers at the rendering code level?

Nope, I still expose quite a bit of that for things not directly within a render pass, since it is common to all three APIs, or a no-op where Metal does some of it for you.  The most specific case is the initial upload of a texture, where you still have to do the transition to CPU visible, copy the data, transition to GPU visible, and copy to GPU-only memory.  While I have considered hiding all these items, the goal is to keep the adapter as close to the underlying APIs as possible and maintain the free-threaded, externally synchronized model of DX12/Vulkan and, to a lesser degree, Metal.  Hiding transitions and such would mean building more thread handling and synchronization into that low level than I desire.  That would be detrimental: I'm generating command buffers on 32 threads (Threadripper is fun), and it would really suck if the lowest-level adapter introduced synchronization at the CPU level just to hide a couple of barrier and transition calls.

Long story short, the middle layer 'rendering engine' will probably hide all those details.  I just haven't really bothered much and hand code a lot of it since I'm more focused on game systems than on the rendering right now.
