You seem to be worried about counting "passes", but all you're really counting here is how many times you change the current render-target, which is just a state-change like any other.
e.g. here's the thumbnail view of about 70 passes from my last game:
Why can't the first two passes be done in one drawcall? To me it seems logical to output to two RenderTargets, one color for HDR scene and one color for the luminance. Is this supported in D3D9, having MRT, one with a monochromatic/mipmap chain and another without? Or will this "optimization" defeat its purpose when the mipmap chain generation takes 4 times longer with 4 16F channels instead of one?
Yes, that approach would work and yes the mip-map generation will require 4x the bandwidth, but...
You mentioned deferred shading earlier, so --
During your lighting pass, if a pixel is covered by multiple lights, then that pixel is drawn to multiple times (once per light), with additive blending.
With the MRT approach, each pixel write is 16 bytes of data (ABGR16*2). Say we've got 4 lights per pixel on average for a 720p scene; that gives us 56.25MiB of data being written.
If you don't use an MRT, and instead compute luminosity at the end in another "pass", then when drawing the scene you've only got 28.125MiB of bandwidth, and then the luminosity pass reads ABGR16 and writes R16 once per pixel, which is another ~8.8MiB, for a total of ~37MiB of bandwidth.
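If you want to sanity-check that arithmetic, here's a quick Python sketch of it. The constant and function names are just mine for illustration, and it only counts the traffic mentioned above (it ignores the reads that blending itself needs, the same simplification as in the numbers above).

```python
# Back-of-the-envelope bandwidth for the two approaches.
WIDTH, HEIGHT = 1280, 720      # 720p
LIGHTS_PER_PIXEL = 4           # assumed average number of lights covering a pixel
BYTES_ABGR16 = 8               # 4 channels x 16 bits
BYTES_R16 = 2                  # 1 channel x 16 bits
MIB = 1024 * 1024

PIXELS = WIDTH * HEIGHT


def mrt_bytes():
    # Every light writes to two ABGR16 targets for each pixel it covers.
    return PIXELS * LIGHTS_PER_PIXEL * BYTES_ABGR16 * 2


def separate_pass_bytes():
    # Lighting writes one ABGR16 target per light per pixel...
    lighting = PIXELS * LIGHTS_PER_PIXEL * BYTES_ABGR16
    # ...then the luminosity pass reads ABGR16 and writes R16 once per pixel.
    luminosity = PIXELS * (BYTES_ABGR16 + BYTES_R16)
    return lighting + luminosity


print(f"MRT:           {mrt_bytes() / MIB:.3f} MiB")            # 56.250 MiB
print(f"Separate pass: {separate_pass_bytes() / MIB:.3f} MiB")  # 36.914 MiB
```

The gap grows with the number of lights per pixel, because the MRT version pays the doubled write cost once per light, while the separate luminosity pass is a fixed per-pixel cost.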
Actually... log(light1) + log(light2) != log( light1 + light2 ), so the first approach would give incorrect results for a deferred rendering set-up anyway ;)
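To make that concrete, here's a tiny Python check with two made-up light contributions (the values are hypothetical, picked only so the numbers are easy to read). It assumes the MRT version would additively blend a log-luminance value per light.

```python
import math

# Hypothetical luminance contributions from two lights hitting the same pixel.
light1, light2 = 0.5, 2.0

# What additive blending would accumulate if each light wrote log(luminance).
sum_of_logs = math.log(light1) + math.log(light2)

# What we actually want: the log of the pixel's total luminance.
log_of_sum = math.log(light1 + light2)

print(sum_of_logs)  # about 0.0, i.e. log(light1 * light2)
print(log_of_sum)   # about 0.916
```

Summing logs gives you the log of the product of the lights, not the log of their sum, so the log has to be taken after all the lights have been accumulated.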
Will an auto mip-mapped texture generate its chain when the scene is done, or do I need to call GenerateMipSubLevels() before trying to sample it in the second pass?
Auto-mipped textures generate their chains automatically, at some unspecified time after they've been rendered to but before they're used in a texture-sampler slot.
The function to manually generate the chain, GenerateMipSubLevels(), is just a hint that says "now would be an optimal time to generate your mips" -- you don't have to call it in D3D9 (some other APIs do require you to manually perform mipmap generation).
I assume this reads the last mip surface in the chain, even though the chain is much shorter.
Exactly. If you know how many levels there are in the mip chain, you can use the correct number. I just used 9000 as an in-source code joke (the mip level "is over 9000!"), and because the code should work for any resolution...
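If you'd rather compute the real number, a full mip chain halves the larger dimension until it reaches 1, so the count is floor(log2(max(width, height))) + 1. A minimal Python sketch of that follows; the function names are mine, and the clamping helper just mimics the "an over-large level ends up at the last mip" behaviour described above.

```python
def mip_level_count(width, height):
    """Number of levels in a full mip chain, from full size down to 1x1."""
    # bit_length() of a positive integer n equals floor(log2(n)) + 1.
    return max(width, height).bit_length()


def last_mip_index(width, height, requested_level=9000):
    """Clamp a requested mip level to the smallest level that actually exists."""
    return min(requested_level, mip_level_count(width, height) - 1)


print(mip_level_count(1280, 720))  # 11 levels: 1280, 640, 320, ..., 2, 1
print(last_mip_index(1280, 720))   # 10, the 1x1 level
```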
What are the pro/cons in splitting the blurs in two passes instead of two render targets?
Blurring the original scene horizontally and vertically, like you described earlier, doesn't give the correct result. It will give you a plus-sign shaped blur, instead of a circular blur.
What we're doing is a Gaussian blur, which in its basic form is a single-pass effect (read in the original scene, output a circularly-blurred scene). In that traditional single-pass form, though, it requires us to sample a large box around each pixel, e.g. 5x5 = 25 samples per pixel.
However, the Gaussian blur has a special property of being "separable", which means that instead of doing it in one 5x5 pass, we can instead do a 5x1 pass, then use that result as the input to a 1x5 pass, and the math works out to give us the exact same final result as the traditional method (but with just 10 samples instead of 25).
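Here's a small pure-Python sketch you can run to convince yourself of both points. It blurs a single bright pixel three ways: the full 5x5 pass, the chained 5x1 then 1x5 passes, and the "blur the original both ways and combine" version. The kernel radius, sigma, image size and zero-padded borders are all arbitrary choices of mine for the demo.

```python
import math


def gaussian_kernel_1d(radius=2, sigma=1.0):
    """Normalised 1D Gaussian weights (radius=2 gives 5 taps)."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_horizontal(image, kernel):
    """5x1 pass: each pixel is a weighted sum of its row neighbours."""
    r = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for k, weight in enumerate(kernel):
                sx = x + k - r
                if 0 <= sx < width:  # pixels outside the image count as black
                    out[y][x] += weight * image[y][sx]
    return out


def transpose(image):
    return [list(column) for column in zip(*image)]


def blur_vertical(image, kernel):
    """1x5 pass, implemented as a horizontal pass on the transposed image."""
    return transpose(blur_horizontal(transpose(image), kernel))


def blur_full_2d(image, kernel):
    """Traditional single pass: 5x5 = 25 taps per pixel."""
    r = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for j, wy in enumerate(kernel):
                for i, wx in enumerate(kernel):
                    sy, sx = y + j - r, x + i - r
                    if 0 <= sy < height and 0 <= sx < width:
                        out[y][x] += wy * wx * image[sy][sx]
    return out


# A 7x7 black image with one bright pixel in the middle.
SIZE = 7
image = [[0.0] * SIZE for _ in range(SIZE)]
image[3][3] = 1.0
kernel = gaussian_kernel_1d()

full = blur_full_2d(image, kernel)
separable = blur_vertical(blur_horizontal(image, kernel), kernel)

# The incorrect version: both blurs read the ORIGINAL image, then get averaged.
h_only = blur_horizontal(image, kernel)
v_only = blur_vertical(image, kernel)
plus = [[0.5 * (h_only[y][x] + v_only[y][x]) for x in range(SIZE)]
        for y in range(SIZE)]

max_diff = max(abs(full[y][x] - separable[y][x])
               for y in range(SIZE) for x in range(SIZE))
print("max difference, full vs separable:", max_diff)
print("diagonal neighbour, separable:", separable[2][2])
print("diagonal neighbour, plus-shaped:", plus[2][2])
```

The chained version matches the 25-tap version to within floating-point error, while the combined version leaves the diagonal neighbours completely black, which is the plus-sign shape.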
The earliest article that I know of on real-time bloom is here, and does a decent job explaining this.