Hi all, I'm new to GameDev and new-ish to modern graphics programming so you'll have to forgive any extreme ignorance on my part.
I'm in the process of adding various post-processing effects to a rendering engine I'm working on. I'm using OpenGL and have set up a couple of framebuffers with multiple texture attachments, which I render multiple passes of screen-space effects into (DoF, SSAO, bloom, etc.).
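For context, the setup is roughly this (a simplified sketch; sceneColor, brightPass and screenWidth/screenHeight are just placeholder names, not my exact code):

```cpp
// Simplified sketch of one post-processing FBO with two colour attachments.
// sceneColor/brightPass and screenWidth/screenHeight are placeholders.
GLuint fbo = 0, sceneColor = 0, brightPass = 0;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);

glGenTextures(1, &sceneColor);
glBindTexture(GL_TEXTURE_2D, sceneColor);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA16F, screenWidth, screenHeight);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, sceneColor, 0);

glGenTextures(1, &brightPass);
glBindTexture(GL_TEXTURE_2D, brightPass);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA16F, screenWidth, screenHeight);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
                       GL_TEXTURE_2D, brightPass, 0);

// Each effect pass binds this FBO, picks a draw buffer, and samples the
// previous pass's attachment as a texture.
```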
Currently I'm trying to optimise the post-processing effects as much as possible, and after doing some rudimentary profiling I was surprised to see how much time is taken by what I thought should be very simple, fast tasks: primarily generating mipmaps and up-/downscaling images.
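For concreteness, a minimal GL timer-query setup for measuring a single pass looks something like this (a sketch, not my exact profiling code):

```cpp
// Sketch: timing one pass with a GL_TIME_ELAPSED query.
GLuint query = 0;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
glGenerateMipmap(GL_TEXTURE_2D);      // or whichever pass is being measured
glEndQuery(GL_TIME_ELAPSED);

// GL_QUERY_RESULT blocks until the GPU has finished, so in a real loop
// you'd read the result a frame or two later.
GLuint64 ns = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);
double ms = ns / 1.0e6;
```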
Much of what I have read about post-processing talks about getting a significant speed increase by processing only a half- or quarter-size screen image where suitable. That makes sense, but the cost of the downscaling and upscaling passes themselves never seems to be mentioned, and my testing shows a significant cost that often almost cancels out the benefit.
I'm aware that GPUs can run ALU instructions much faster than they can read from and write to textures. On modern GPUs, is that difference big enough to explain the performance I'm seeing?
Some examples:
0.595ms to generate mipmaps for one fullscreen texture (using glGenerateMipmap(GL_TEXTURE_2D)).
0.268ms to read a fullscreen texture and render it to another texture at 1/4 size.
In contrast, my SSAO takes almost exactly 1ms at fullscreen: multiple reads of a 32-bit depth buffer per screen pixel, then a read of a fullscreen texture, applying the AO result to it, and writing it out to a new fullscreen texture. Why would rendering a texture at 1/4 size, where the shader is literally one texture fetch then an output, cost almost 1/3 as much as a fullscreen AO effect? It just isn't making sense to me.
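To be clear about how trivial that downscale pass is, the fragment shader is essentially equivalent to this (sketch; uv comes from a fullscreen triangle and sourceTex is the fullscreen image):

```glsl
#version 330 core
// One fetch, one write: bilinear filtering on sourceTex does the downsample.
in vec2 uv;
out vec4 fragColor;
uniform sampler2D sourceTex;

void main()
{
    fragColor = texture(sourceTex, uv);
}
```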
I was hoping to convert my SSAO to run at 1/2 screen size, and I also hoped to convert my bloom from a single 1/4-size blur to a combination of multiple screen sizes, as I have seen in many examples. At this point the cost of writing and reading so many textures seems very high; around 3ms in total is being spent doing nothing other than rescaling.
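The multi-scale bloom I had in mind is the usual chain of progressively smaller targets, roughly like this (a sketch; drawFullscreenQuad, bloomFbo and bloomTex are hypothetical placeholders for my own helpers):

```cpp
// Hypothetical sketch of a bloom downsample chain: 1/2, 1/4, 1/8, 1/16 size.
// drawFullscreenQuad(tex) stands in for a quad draw with the one-fetch
// shader above, so bilinear filtering does the downsampling.
GLuint src = brightPassTex;              // fullscreen bright-pass result
int w = screenWidth, h = screenHeight;
for (int i = 0; i < 4; ++i)
{
    w /= 2; h /= 2;
    glBindFramebuffer(GL_FRAMEBUFFER, bloomFbo[i]);
    glViewport(0, 0, w, h);
    drawFullscreenQuad(src);
    src = bloomTex[i];                   // next level reads this one
}
// Each level is then blurred and combined back at full resolution.
```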
I can't help but feel I must be doing something fundamentally wrong. Many papers, books and tutorials talk about using many render passes and many variously scaled images as if it were commonplace and trivial; many bloom techniques I have read about use 1/2, 1/4, 1/8 and 1/16 size images, blurred and combined. I know many modern game engines do many post-processing passes and rescales for screen-space effects, and I find it hard to believe they are spending several ms just reading and writing multiple textures.
Would using compute shaders to resize textures be faster than fragment shaders?
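Something along these lines is what I was imagining (an untested sketch; the bindings and rgba16f format are just assumptions about my targets):

```glsl
#version 430 core
// Untested sketch: one compute thread per destination texel.
layout(local_size_x = 8, local_size_y = 8) in;
layout(binding = 0) uniform sampler2D sourceTex;
layout(binding = 0, rgba16f) writeonly uniform image2D destImage;

void main()
{
    ivec2 dst = ivec2(gl_GlobalInvocationID.xy);
    ivec2 dstSize = imageSize(destImage);
    if (dst.x >= dstSize.x || dst.y >= dstSize.y)
        return;

    // Sample at the destination texel centre; bilinear filtering on the
    // sampler averages the corresponding source texels.
    vec2 uv = (vec2(dst) + 0.5) / vec2(dstSize);
    imageStore(destImage, dst, textureLod(sourceTex, uv, 0.0));
}
```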
What is the typical "budget" of processing time for the various common effects? That is, assuming most post effects are screen-space and have a relatively fixed cost per screen size regardless of the scene, what percentage of a theoretical 16.6ms total rendering budget would be allocated to bloom, DoF, SSAO, etc.?