I just finished implementing a post-processing pipeline in my renderer. Its initial performance is better than I expected, but I still need to optimize it a bit.
One of the things that worries me is the number of passes (by a pass I mean a single full-screen-quad draw call, so a single Gaussian blur is 2 passes: BlurU and BlurV). This is how the pipeline looks:
- Screen Space Subsurface Scattering - 6 passes (3 Gaussian blurs). I use stencil and depth test to avoid unnecessary pixel blurs.
- Bloom - 7 passes (1 bright pass filter, 3 Gaussian blurs).
- HDR Tone Mapping - 2 passes - create a luminance texture, call GenerateMips to get the average, then tone map. I read MJP's post about using a CS instead of GenerateMips; it's on my todo list.
- DOF - 3 passes (generate CoC map, 1 Gaussian blur).
- Film Grain - 1 pass. This is the easiest one to remove, which I tried, but performance stayed the same.
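For anyone unfamiliar with the averaging step in the tone-mapping entry, here is a rough CPU-side sketch of the idea. It assumes a square power-of-two image, Rec. 709 luminance weights, and a plain 2x2 box filter per mip level; the filter GenerateMips really applies depends on the implementation, and the function names are made up for this illustration.

```python
def luminance(rgb):
    """Rec. 709 luminance of a linear RGB triple (weights are an assumption)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def downsample_2x2(tex):
    """Build the next mip level by averaging each 2x2 block (box filter)."""
    n = len(tex) // 2
    return [
        [
            (tex[2 * y][2 * x] + tex[2 * y][2 * x + 1]
             + tex[2 * y + 1][2 * x] + tex[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(n)
        ]
        for y in range(n)
    ]


def average_via_mips(tex):
    """Walk down the mip chain until a single 1x1 texel (the average) is left."""
    while len(tex) > 1:
        tex = downsample_2x2(tex)
    return tex[0][0]


# Hypothetical 2x2 HDR image: one white, one black, two mid-gray pixels.
image = [
    [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
    [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)],
]
lum_texture = [[luminance(p) for p in row] for row in image]
avg_luminance = average_via_mips(lum_texture)
```

Under these assumptions the 1x1 mip is just the arithmetic mean of the luminance texture, which is the value the tone-map pass then reads.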
So my starting point is 19 passes, most of them blurs, so it's heavy on the TXS. I've tried removing some, but that hurts the visual quality. What I'm trying to do is improve performance while preserving the visual quality and, if possible, the pipeline's flexibility.
I have some ideas, mainly:
- Use CS for better sampling efficiency.
- Widen the blur kernels while reducing the number of blurs. This should reduce the total number of TXS ops, but will probably reduce the visual quality.
- Merge passes. Not sure how that will work.
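To put rough numbers on the kernel-widening idea, here is a back-of-the-envelope sketch. It assumes every blur is separable (BlurU + BlurV), runs at the same resolution, and costs one texture sample per kernel tap; the kernel widths are hypothetical, not my actual settings.

```python
def blur_taps(kernel_width, passes_per_blur=2):
    """Texture samples per pixel for one separable blur (BlurU + BlurV)."""
    return kernel_width * passes_per_blur


def pipeline_taps(kernel_widths):
    """Total samples per pixel for a chain of separable blurs."""
    return sum(blur_taps(w) for w in kernel_widths)


# Hypothetical widths, for illustration only.
three_narrow = pipeline_taps([7, 7, 7])   # 3 blurs, 6 passes
two_medium = pipeline_taps([9, 9])        # 2 blurs, 4 passes
two_wide = pipeline_taps([11, 11])        # 2 blurs, 4 passes

print(three_narrow, two_medium, two_wide)  # prints: 42 36 44
```

So in this simple model, dropping a blur only saves samples if the remaining kernels grow by less than the taps that were removed; two 11-tap blurs already cost more than three 7-tap ones, even though they need two fewer passes.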
Any advice will be appreciated.