Vulkan render pass questions


Recommended Posts

Hello. 

 

I'm adding Vulkan support through an abstraction layer I'm writing so that I can switch between OpenGL and Vulkan easily and test things out. Right now I'm working out how my postprocessing would map to render passes in Vulkan, and it's not looking good. I thought it would be easy to make each fullscreen pass a subpass in one big render pass, but it seems very easy to "break" a render pass.

 

 - Changing the width and height of the framebuffer requires a completely new render pass, not just a new subpass. This messes with my bloom processor, since I repeatedly downsample the scene and blur it. Each mipmap would require its own render pass.
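
To make the problem concrete, this is roughly what each bloom mip ends up looking like in API terms. A framebuffer's extent is fixed at creation, so every mip level gets its own framebuffer and its own render pass instance rather than a subpass. This is only a sketch; blurRenderPass, mipViews, mipFramebuffers, baseWidth/baseHeight, cmd and the actual draws are placeholders.

// One framebuffer and one render pass instance per bloom mip level.
for (uint32_t mip = 0; mip < mipCount; ++mip)
{
    VkFramebufferCreateInfo fbInfo = {};
    fbInfo.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    fbInfo.renderPass      = blurRenderPass;            // a compatible single-subpass render pass
    fbInfo.attachmentCount = 1;
    fbInfo.pAttachments    = &mipViews[mip];             // image view of this mip level only
    fbInfo.width           = baseWidth  >> (mip + 1);    // bloom chain starts at half resolution
    fbInfo.height          = baseHeight >> (mip + 1);
    fbInfo.layers          = 1;
    vkCreateFramebuffer(device, &fbInfo, nullptr, &mipFramebuffers[mip]);

    VkRenderPassBeginInfo beginInfo = {};
    beginInfo.sType             = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
    beginInfo.renderPass        = blurRenderPass;
    beginInfo.framebuffer       = mipFramebuffers[mip];
    beginInfo.renderArea.extent = { fbInfo.width, fbInfo.height };
    vkCmdBeginRenderPass(cmd, &beginInfo, VK_SUBPASS_CONTENTS_INLINE);
    // ... bind pipeline and draw the fullscreen downsample/blur for this mip ...
    vkCmdEndRenderPass(cmd);
}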

 

 - It might be possible to work around this by packing all my bloom mipmaps into a single texture. The bloom texture starts at half resolution and goes down, and I need two to pingpong between for the blurring. I could pack both these mip chains into one full-resolution buffer and only have a tiny bit of wasted area to avoid having to switch resolution when downsampling, but this would require the same functionality as NV_texture_barrier provides, which I don't think is available in Vulkan without breaking the render pass.

 

 - According to the spec, "Image subresources used as attachments must not be used via any non-attachment usage for the duration of a render pass instance." I suppose this is partly replaced by input attachments, but input attachment loads are always unfiltered. This breaks bloom even more, since I use a bilinear-filtering-accelerated Gaussian blur for each blur pass: not only would I need one render pass per mip level, I'd also need two render passes per Gaussian blur (one for the horizontal pass and one for the vertical pass). It would also break FXAA, which bilinearly filters the input buffer.
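
To spell out what that costs per blur direction: between the horizontal and the vertical pass I would have to end the render pass, transition the just-written image so it can be read through a regular bilinear sampler, and then begin a new render pass. Roughly like this (srcImage and cmd are placeholders, and the begin/end of the surrounding passes is omitted):

vkCmdEndRenderPass(cmd);   // horizontal blur pass ends here

// Make the horizontal result visible to the fragment shader of the vertical pass
// and move it into a sampleable layout.
VkImageMemoryBarrier barrier = {};
barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
barrier.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;
barrier.oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
barrier.newLayout           = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image               = srcImage;
barrier.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                     VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                     0, 0, nullptr, 0, nullptr, 1, &barrier);

// The vertical pass then begins its own render pass and samples srcImage with a
// bilinear sampler (a combined image sampler, not an input attachment).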

 

 - I mainly develop for PC. Is it even important in the first place to have as few render passes as possible if I'm not on a tiled architecture? Do transient attachments have any advantages at all on desktop GPUs? Is there any point at all in trying to force all postprocessing into the same render pass and make pretty much all intermediate images transient?

 

 - Is it legal to use the same image multiple times in a Vulkan framebuffer? Let's say I have two fullscreen passes to do and three attachments defined in my render pass create info. The first pass reads from the first attachment and writes to the second; the second pass reads the second and writes to the third. Would it be legal to use the same image for both the first and third attachments when I create my framebuffer? I don't see anything in the spec specifically disallowing this, and it seems to allow multiple image views of the same underlying image as attachments, so I believe this is OK to do.
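
In other words, something like this, purely to illustrate the setup I'm asking about (viewA, viewB and postRenderPass are placeholder names, and whether this aliasing is actually allowed is exactly the question; the spec's attachment aliasing rules, e.g. VK_ATTACHMENT_DESCRIPTION_MAY_ALIAS_BIT, are what would need checking):

// Attachment 0 and attachment 2 are views of the same VkImage, attachment 1 is a separate image.
VkImageView attachments[3] = { viewA, viewB, viewA };

VkFramebufferCreateInfo fbInfo = {};
fbInfo.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
fbInfo.renderPass      = postRenderPass;   // render pass containing the two fullscreen subpasses
fbInfo.attachmentCount = 3;
fbInfo.pAttachments    = attachments;
fbInfo.width           = width;
fbInfo.height          = height;
fbInfo.layers          = 1;
vkCreateFramebuffer(device, &fbInfo, nullptr, &framebuffer);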

Edited by theagentd


Why are you repeatedly downsampling and blurring? If it's just for the bloom, you may want to reconsider whether you really have to do that. The downsampling itself acts as a low-pass filter, so simply rendering to a lower-resolution intermediate and applying an n×n kernel there gives the appearance of a wider kernel (hopefully that makes sense). If you're worried about efficiency, all you'd really need to do is make the blur separable...


Why are you repeatedly downsampling and blurring? If it's just for the bloom, you may want to reconsider whether you really have to do that. The downsampling itself acts as a low-pass filter, so simply rendering to a lower-resolution intermediate and applying an n×n kernel there gives the appearance of a wider kernel (hopefully that makes sense). If you're worried about efficiency, all you'd really need to do is make the blur separable...

The blur is already separable and accelerated with bilinear filtering (two pixels read per sample). I downsample for performance reasons: I want a blur radius the size of the screen, which this gives me. Doing that at a fixed resolution would be way too expensive.


Also, from an input attachment you can only read the exact pixel currently being processed: subpassLoad(subpass) is the same as texelFetch(subpassSampler, ivec2(gl_FragCoord.xy), 0). This breaks even more things in my postprocessing. SSAO blurring is now broken since it requires reading with an offset, and my SRAA is broken since I need neighboring pixels as well.

 

What the hell is the point of a subpass then? If subpasses can only render at a fixed resolution and I can only read the exact same pixel from the previous pass, I might as well merge it all together into one shader manually in the first place. Hell, I've already done that in all places I could, pretty much.


Subpasses aren't meant for post-processing in general. They're meant for optimizations that tiled GPUs in particular can make use of when they have guarantees like only reading an attachment at the exact position of the fragment being rendered.

 

I imagine merging everything into a single shader would have some downsides as opposed to doing it within the render pass framework, or we wouldn't have render passes in the first place.


 - It might be possible to work around this by packing all my bloom mipmaps into a single texture.

Yeah, I googled around for "nv_texture_barrier vulkan" but got no results.

 

In any case, yeah, just use more render passes.

 

You've probably already seen this talk.

 

They talk a bit about sub render passes but the example they gave is contrived (as the speaker acknowledged).

Edited by TheChubu


I think render passes can be used for the G-buffer. It should be possible to encapsulate the whole G-buffer generation + lighting + transparent shading in a single render pass.
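
A rough sketch of the idea, with placeholder attachment indices (0 and 1 are G-buffer color, 2 is depth, 3 is the final color). The VkAttachmentDescription array and a third subpass for transparency are omitted:

VkAttachmentReference gbufferWrites[] = {
    { 0, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL },
    { 1, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL },
};
VkAttachmentReference gbufferDepth    = { 2, VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL };
VkAttachmentReference lightingReads[] = {
    { 0, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL },
    { 1, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL },
    { 2, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL },
};
VkAttachmentReference lightingWrite   = { 3, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL };

VkSubpassDescription subpasses[2] = {};
subpasses[0].pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS;   // G-buffer fill
subpasses[0].colorAttachmentCount    = 2;
subpasses[0].pColorAttachments       = gbufferWrites;
subpasses[0].pDepthStencilAttachment = &gbufferDepth;

subpasses[1].pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS;   // lighting
subpasses[1].inputAttachmentCount    = 3;
subpasses[1].pInputAttachments       = lightingReads;                     // read via subpassLoad
subpasses[1].colorAttachmentCount    = 1;
subpasses[1].pColorAttachments       = &lightingWrite;

// By-region dependency: the lighting subpass only waits for the pixel it is shading.
VkSubpassDependency dependency = {};
dependency.srcSubpass      = 0;
dependency.dstSubpass      = 1;
dependency.srcStageMask    = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT |
                             VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;
dependency.dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
dependency.srcAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
                             VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dependency.dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
dependency.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;

VkRenderPassCreateInfo rpInfo = {};
rpInfo.sType           = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
rpInfo.attachmentCount = 4;
rpInfo.pAttachments    = attachmentDescriptions;   // assumed to be set up elsewhere
rpInfo.subpassCount    = 2;
rpInfo.pSubpasses      = subpasses;
rpInfo.dependencyCount = 1;
rpInfo.pDependencies   = &dependency;
vkCreateRenderPass(device, &rpInfo, nullptr, &deferredRenderPass);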


The only legitimate case I can imagine this actually helping on tiled GPUs is a geometry pass followed by a fullscreen pass. For example, I draw to my G-buffer, then generate a linear depth buffer from the hardware depth buffer. If I made the G-buffer filling one subpass and the linear depth calculation a second subpass, a tiled GPU would be able to combine the geometry pass with the fullscreen pass, first shading the geometry inside the tile, then immediately linearizing the depth buffer without the depth values ever leaving on-chip memory. I guess I could just add the linearization to the G-buffer shader and write an extra color attachment, since tiled renderers only ever shade each fragment once, but using subpasses here lets me write code that is optimized for immediate-mode renderers while still providing enough information for tiled renderers to run it at maximum efficiency.

 

This also means that they really are worthless for postprocessing, which is really sad. I was hoping for much bigger gains for tiled renderers there. Theoretically, a tiled renderer should be able to run my entire postprocessing pipeline solely using on-chip memory (transient lazily initialized memory), but since I need texture filtering and neighbor sampling here and there for blurs and other effects, that won't be possible. I guess it makes sense that that limitation is there, since it would be possible to sample outside the tile, so it probably won't be fixable due to inherent limitations of tiled renderers.

 

EDIT: vlj's idea would also be valid, but is problematic for me to implement in my lighting engine. We potentially need multiple shadow map passes. We pack the shadow maps of multiple lights into one big shadow map and then draw all those lights in one batch. If the shadow map isn't big enough to contain all shadow maps, we do multiple passes. That'd mean I have a dynamic number of subpasses based on light count, which is a bit hard to implement (but not impossible).

Edited by theagentd


For fullscreen passes you can try using compute shaders; I think they're mandatory in Vulkan. Tiled renderers don't have special handling for compute shaders, but you can implement the tiling yourself with local data storage (shared memory). Unfortunately there will be a flush between the geometry/lighting pass and the fullscreen one, but if you can pack your whole fullscreen algorithm into a single, big compute shader it might save some memory bandwidth.
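
Something along these lines on the command buffer side; the pipeline, descriptor set and image names are placeholders, and the compute shader that does the actual tiling through shared memory is not shown:

// The post-process input was just written as a color attachment by the last render pass.
// This barrier is the "flush" mentioned above: the compute work cannot overlap the raster work.
VkImageMemoryBarrier toCompute = {};
toCompute.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
toCompute.srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
toCompute.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;
toCompute.oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
toCompute.newLayout           = VK_IMAGE_LAYOUT_GENERAL;   // readable as a storage image
toCompute.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
toCompute.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
toCompute.image               = lightingImage;
toCompute.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

vkCmdPipelineBarrier(cmd,
                     VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                     0, 0, nullptr, 0, nullptr, 1, &toCompute);

vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, postProcessPipeline);
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, postProcessLayout,
                        0, 1, &postProcessSet, 0, nullptr);

// One workgroup per 16x16 tile; the shader can stage its tile in shared memory (LDS).
vkCmdDispatch(cmd, (width + 15) / 16, (height + 15) / 16, 1);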
