Vulkan render pass questions


Hello.

I'm looking into adding Vulkan support through an abstraction I'm writing so that I can switch between OGL and VK easily and test it out. I'm looking into how my postprocessing would map to render passes in Vulkan right now, and it's not looking very good. I thought it would be easy to make each fullscreen pass a subpass in one big render pass, but it seems very easy to "break" a render pass.

- Changing the width and height of the framebuffer requires a completely new render pass, not just a new subpass. This messes with my bloom processor, since I repeatedly downsample the scene and blur it. Each mipmap would require its own render pass.

- It might be possible to work around this by packing all my bloom mipmaps into a single texture. The bloom texture starts at half resolution and goes down, and I need two to pingpong between for the blurring. I could pack both these mip chains into one full-resolution buffer and only have a tiny bit of wasted area to avoid having to switch resolution when downsampling, but this would require the same functionality as NV_texture_barrier provides, which I don't think is available in Vulkan without breaking the render pass.

- According to the spec, "Image subresources used as attachments must not be used via any non-attachment usage for the duration of a render pass instance." I suppose this is partly replaced by input attachments, but input attachment loads are always unfiltered. This breaks bloom even more: I use a bilinear-filtering-accelerated Gaussian blur, so I'd not only need one render pass for each mipmap level, I'd also need two render passes per Gaussian blur pass (one horizontal and one vertical). It also breaks FXAA, which uses bilinear filtering of the input buffer.

- I mainly develop for PC. Is it even important in the first place to have as few render passes as possible if I'm not on a tiled architecture? Do transient attachments have any advantages at all on desktop GPUs? Is there any point at all in trying to force all postprocessing into the same render pass and make pretty much all intermediate images transient?

- Is it legal to use the same image multiple times in a Vulkan framebuffer? Let's say I have 2 fullscreen passes I need done. I have 3 attachments defined in my render pass create info. The first pass reads from the first attachment and writes to the second. The second pass reads the second and writes to the third. Would it be legal for me to have the same image as the first and third attachments when I create my framebuffer? I don't see anything specifically disallowing this in the spec, and it seems to allow multiple image-views both using the same underlying image as attachments, so I believe this is 100% OK to do.
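For concreteness, here's roughly what I mean. A minimal sketch, where pingView and pongView are hypothetical image views and the same view is bound as both attachment 0 and attachment 2:

```cpp
// Sketch: one VkImage bound as both attachment 0 and attachment 2.
// pingView/pongView are assumed to be VkImageViews created elsewhere;
// renderPass declares the three attachments described above.
VkImageView attachments[3] = {
    pingView,  // attachment 0: read by the first pass
    pongView,  // attachment 1: written by the first pass, read by the second
    pingView,  // attachment 2: written by the second pass (same image as 0)
};

VkFramebufferCreateInfo fbInfo = {};
fbInfo.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
fbInfo.renderPass      = renderPass;
fbInfo.attachmentCount = 3;
fbInfo.pAttachments    = attachments;
fbInfo.width           = width;
fbInfo.height          = height;
fbInfo.layers          = 1;

VkFramebuffer framebuffer;
vkCreateFramebuffer(device, &fbInfo, nullptr, &framebuffer);
```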


You are repeatedly downsampling and blurring... why? If it's just for the bloom, you may want to reconsider whether you really have to do that. The downsampling itself acts as a low-pass filter, so simply rendering to a lower-resolution intermediate with an n×n kernel will actually give the appearance of a wider kernel (hopefully that makes sense). In reality, all you would need to do, if you are worried about efficiency, is make the blur separable...


The blur is already separable and accelerated with bilinear filtering (two pixel reads per sample). I downsample for performance reasons: I want a blur radius the size of the screen, which this gives me. Doing that at a fixed resolution would be waaaay too expensive.
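For reference, the weight folding for the bilinear trick looks roughly like this. A minimal sketch; the shader side and the actual Gaussian weights are omitted:

```cpp
#include <vector>

// Fold pairs of adjacent discrete Gaussian taps into single linear-filtered
// fetches. 'discrete' holds one-sided weights (discrete[0] = center tap).
struct Tap { float offset; float weight; };

std::vector<Tap> buildLinearTaps(const std::vector<float>& discrete)
{
    std::vector<Tap> taps;
    taps.push_back({0.0f, discrete[0]});  // center tap, fetched on its own
    for (size_t i = 1; i + 1 < discrete.size(); i += 2)
    {
        float w1 = discrete[i], w2 = discrete[i + 1];
        float w  = w1 + w2;  // combined weight of the two texels
        // Place the sample between the two texels so the hardware's linear
        // interpolation reproduces the two discrete weights exactly.
        float o  = (float(i) * w1 + float(i + 1) * w2) / w;
        taps.push_back({o, w});
    }
    return taps;  // note: a trailing unpaired tap would need its own fetch
}
```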

Also, you can only read the pixel at the exact XY currently being processed from input attachments: subpassLoad(subpass) is equivalent to texelFetch(subpassSampler, ivec2(gl_FragCoord.xy), 0). This breaks even more things in my postprocessing. SSAO blurring is now broken, since it requires reads at an offset, and my SRAA is broken, since I need neighboring pixels as well.
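The reason seems inherent to how input attachments are bound: they go through the descriptor set machinery without a sampler, so there is nothing to filter or offset with. A rough sketch of the C++ side (binding numbers are just examples):

```cpp
// Sketch: an input attachment is a sampler-less descriptor, which is why
// filtered or offset reads are impossible.
VkDescriptorSetLayoutBinding inputBinding = {};
inputBinding.binding         = 0;
inputBinding.descriptorType  = VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT;
inputBinding.descriptorCount = 1;
inputBinding.stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;
// In GLSL this maps to:
//   layout(input_attachment_index = 0, set = 0, binding = 0)
//       uniform subpassInput prevPass;
//   vec4 c = subpassLoad(prevPass);  // always the current fragment's pixel
```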

What the hell is the point of a subpass then? If subpasses can only render at a fixed resolution and I can only read the exact same pixel from the previous pass, I might as well merge it all together into one shader manually in the first place. Hell, I've already done that in all places I could, pretty much.

It's not meant for post-processing in general. It's meant for optimizations that tiled GPUs especially can make use of whenever they have guarantees like only reading an attachment at the exact position it's rendering the fragment.

I imagine merging everything into a single shader would have some downsides as opposed to doing it within the render pass framework, or we wouldn't have render passes in the first place.

"So there you have it, ladies and gentlemen: the only API I’ve ever used that requires both elevated privileges and a dedicated user thread just to copy a block of structures from the kernel to the user." - Casey Muratori

boreal.aggydaggy.com


- It might be possible to work around this by packing all my bloom mipmaps into a single texture.

Yeah, I googled around for "nv_texture_barrier vulkan" but got no results.

In any case, yeah, just use more render passes.

You've probably seen this already.

They talk a bit about sub render passes but the example they gave is contrived (as the speaker acknowledged).

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

I think render passes may be used for the G-buffer. It should be possible to encapsulate the whole G-buffer generation + lighting + transparent shading in a single render pass.

The only legitimate time I can imagine this actually helping for tiled GPUs is when doing a geometry pass followed by a fullscreen pass. For example, I draw to my G-buffer, then generate a linear depth buffer from the hardware depth buffer. If I made the G-buffer filling one subpass and the linear depth calculation a second subpass, a tiled GPU would be able to combine the geometry pass with the fullscreen pass: first shading the geometry inside the tile, then immediately linearizing the depth buffer without the depth data ever leaving on-chip memory. I guess I could just add the linearization to the G-buffer shader and add an extra color attachment, since tiled renderers only ever shade once, but using subpasses here allows me to write code that is optimized for immediate renderers while providing enough information for tiled renderers to run it at maximum efficiency.
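If I were to set this up, I believe the key piece would be a by-region dependency between the two subpasses. A rough sketch, assuming the attachment descriptions and subpass descriptions are filled in elsewhere:

```cpp
// Sketch: subpass 0 fills the G-buffer + depth, subpass 1 linearizes depth.
// VK_DEPENDENCY_BY_REGION_BIT tells the driver the second subpass only reads
// the pixel it is shading, so a tiled GPU can keep the tile on-chip.
VkSubpassDependency dep = {};
dep.srcSubpass      = 0;
dep.dstSubpass      = 1;
dep.srcStageMask    = VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;
dep.dstStageMask    = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
dep.srcAccessMask   = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dep.dstAccessMask   = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
dep.dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;

VkRenderPassCreateInfo rpInfo = {};
rpInfo.sType           = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
rpInfo.attachmentCount = attachmentCount;  // G-buffer targets + depth + linear depth
rpInfo.pAttachments    = attachmentDescs;  // assumed filled in elsewhere
rpInfo.subpassCount    = 2;
rpInfo.pSubpasses      = subpassDescs;     // assumed filled in elsewhere
rpInfo.dependencyCount = 1;
rpInfo.pDependencies   = &dep;

VkRenderPass renderPass;
vkCreateRenderPass(device, &rpInfo, nullptr, &renderPass);
```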

This also means that they really are worthless for postprocessing, which is really sad. I was hoping for much bigger gains for tiled renderers there. Theoretically, a tiled renderer should be able to run my entire postprocessing pipeline solely in on-chip memory (transient, lazily allocated memory), but since I need texture filtering and neighbor sampling here and there for blurs and other effects, that won't be possible. I guess that limitation makes sense, since otherwise it would be possible to sample outside the tile, so it probably won't be fixable due to inherent limitations of tiled renderers.
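For completeness, my understanding is that a transient attachment would be created along these lines (a simplified sketch; memory-type selection and error handling omitted):

```cpp
// Sketch: a transient attachment that a tiled GPU may keep entirely on-chip.
// TRANSIENT usage plus LAZILY_ALLOCATED memory means it might never be
// backed by actual device memory at all.
VkImageCreateInfo imgInfo = {};
imgInfo.sType       = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO;
imgInfo.imageType   = VK_IMAGE_TYPE_2D;
imgInfo.format      = VK_FORMAT_R8G8B8A8_UNORM;
imgInfo.extent      = { width, height, 1 };
imgInfo.mipLevels   = 1;
imgInfo.arrayLayers = 1;
imgInfo.samples     = VK_SAMPLE_COUNT_1_BIT;
imgInfo.tiling      = VK_IMAGE_TILING_OPTIMAL;
imgInfo.usage       = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT
                    | VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT
                    | VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT;

VkImage image;
vkCreateImage(device, &imgInfo, nullptr, &image);
// ...then back it with memory from a heap advertising
// VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT, if the implementation has one.
```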

EDIT: vlj's idea would also be valid, but is problematic for me to implement in my lighting engine. We potentially need multiple shadow map passes. We pack the shadow maps of multiple lights into one big shadow map and then draw all those lights in one batch. If the shadow map isn't big enough to contain all shadow maps, we do multiple passes. That'd mean I have a dynamic number of subpasses based on light count, which is a bit hard to implement (but not impossible).

For fullscreen passes you can try using compute shaders; I think they're mandatory in Vulkan. Tiled renderers don't have special handling for compute shaders, but you can implement the tiling yourself with local data storage. Unfortunately there will be a flush between the geometry/lighting pass and the fullscreen one, but if you can pack your whole fullscreen algorithm into a single, big compute shader it might save some memory bandwidth.
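Something along these lines, maybe (a rough sketch; sceneColorImage and postProcessPipeline are placeholders for your own objects, and descriptor binding is omitted):

```cpp
// Sketch: run the whole post-processing chain as one compute dispatch after
// the render pass. The barrier is the flush mentioned above: color writes
// must be made visible before the compute shader reads the image.
VkImageMemoryBarrier barrier = {};
barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
barrier.srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
barrier.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;
barrier.oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
barrier.newLayout           = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
barrier.image               = sceneColorImage;
barrier.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

vkCmdPipelineBarrier(cmd,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
    0, 0, nullptr, 0, nullptr, 1, &barrier);

vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, postProcessPipeline);
vkCmdDispatch(cmd, (width + 15) / 16, (height + 15) / 16, 1);  // 16x16 groups
```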

