You say I need to be careful to not reach 1.0f with the former, but what about reaching 65504 (max half precision) with the latter?
Yeah, that's just as bad, but it's a lot harder to do.
And say my brightest pixel is 1000.0f, won't all values between 1000.0f and 65504 then be wasted?
Yes, this is true whether you use integer-fractional or floating-point formats.
However, floating-point formats have logarithmic precision, where, as numbers get bigger, they become less precise -- so you can give yourself a bit of extra headroom without sacrificing precision in the common case.
e.g. say your scene brightness is usually from 0-1k, but in very rare cases some pixels might have values of 60k. With ABGR16F, the usual numbers have great precision, and the rare numbers are still valid, just slightly less accurate (you might get some colour banding if you're unlucky, but that's much better than clamping).
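You can see this logarithmic spacing directly. Here's a quick sketch (using Python's `struct` module, which supports the IEEE 754 half-float format via the `'e'` code) showing the quantisation step near 1000 versus near 60000:

```python
import struct

def to_half_and_back(x):
    """Round-trip a value through 16-bit (half) float, giving
    the nearest representable half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Near 1000, half-floats are spaced 0.5 apart...
print(to_half_and_back(1000.3))   # -> 1000.5
# ...but near 60000 they're spaced 32 apart.
print(to_half_and_back(60010.0))  # -> 60000.0
```

So in the "usual" 0-1k range you keep sub-unit precision, and the rare 60k pixels are merely coarse, not clamped.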
With ABGR16 (integer-fractional), you'd have to either set your scale factor to 1000 and suffer from clamping issues, or you'd set it to 60000 and suffer reduced precision across both the 'usual' brightnesses and the 'rare' brightnesses.
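To illustrate the integer-fractional trade-off, here's a sketch of a 16-bit unsigned-normalised encoding with a scale factor (the exact encoding your platform uses may differ):

```python
def encode_u16(brightness, scale):
    """Store brightness/scale as a 16-bit unsigned-normalised
    value, clamping anything brighter than the scale factor."""
    return min(max(round(brightness / scale * 65535), 0), 65535)

def decode_u16(stored, scale):
    return stored / 65535 * scale

# Scale = 1000: great precision, but a 60k pixel clamps to 1000.
print(decode_u16(encode_u16(60000.0, 1000.0), 1000.0))  # -> 1000.0

# Scale = 60000: no clamping, but the step size is a uniform
# 60000/65535 (~0.92) everywhere -- even for dim pixels.
print(decode_u16(encode_u16(1.0, 60000.0), 60000.0))
```

Unlike the half-float case, the quantisation step here is constant across the whole range, which is why the 'usual' brightnesses suffer when you raise the scale factor.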
Depending on your game, one may be better than the other, but I'd recommend 16F just for simplicity.
On my last game, we actually used 10bit integer-fractional on one platform (with a scale factor), and 16bit float on other platforms, depending on which one was more optimal. For us, 16F "just worked", whereas 10-bit integer was a headache that we had to constantly tweak per time-of-day to get acceptable results.
I thought that HDR was to utilize the whole range, e.g. 0-1 with D3DFMT_A16B16G16R16 in increments of 1/65536.
The point is to not have a limit to your brightness values (i.e. the 0 to infinity thing). Infinity doesn't work in practice, so the point is to have a large limit ;)
If you're doing a physical simulation, then in the general case it's hard to predict what your maximum brightness value is. e.g. direct sunlight might be 1000 times brighter than fluorescent lighting, and an object 1cm from a light-bulb might be 1000 times brighter than an object 1m from it. So for a general solution, full 32-bit floating point would be preferable, but it's quite expensive to use F32 formats!
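A quick sketch of that limit, again using Python's `struct` half/single-float codes: a physically-plausible value like 1,000,000 simply doesn't fit in a half (max 65504), while 32-bit float takes it in stride -- at twice the storage cost per channel.

```python
import struct

# Half (16-bit) float tops out at 65504 -- a physically based
# value of 1e6 can't even be represented.
try:
    struct.pack('e', 1_000_000.0)
    print("fits in half")
except OverflowError:
    print("too large for half")   # this branch runs

# 32-bit float stores it fine, but doubles your bandwidth/memory.
print(len(struct.pack('f', 1_000_000.0)))  # -> 4 bytes per channel
```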
These are produced in the first pass:
RenderTarget0 (D3DFMT_A16B16G16R16F): Render scene with light values between 0.0f and 65504? (or just ignore 65504 and possibly output a value like 2344534756348.0f)
RenderTarget1 (D3DFMT_A16B16G16R16F): Render same scene but with Log luminance instead for pixel values and mipmap it.
Get the K in application by reading lowest mip surface?
These are two passes.
After drawing the HDR scene to RT0, you then render to RT1 using RT0 as input. The shader reads RT0 and uses a luminance function to find the luminance of each pixel.
RT1 is monochromatic, so you could use the D3DFMT_R16F format for it. In my last game, we actually compressed the data down to just 8-bits, but a half-float format is easier.
Instead of reading the K value back in the application, you can keep everything on the GPU-side. If RT1 is an auto mip-mapped texture, you can later use tex2Dlod(rt1, float4(0.5,0.5,0,9000)).r to read the 1x1 pixel result in your tone-mapping shader.
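What that log-luminance mip chain converges to is the average of log(luminance) -- i.e. the log of the geometric mean of the scene's luminances, which is the usual "key" value. A CPU-side sketch of the same math (Rec.709 luma weights assumed; the thread doesn't specify which luminance function was used):

```python
import math

def luminance(r, g, b):
    """Rec.709 luma weights -- a common choice of luminance function."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def log_average_luminance(pixels, delta=1e-4):
    """What averaging a log-luminance target down to 1x1 computes:
    exp of the mean of log(lum), i.e. the geometric mean.
    delta avoids log(0) on black pixels."""
    logs = [math.log(delta + luminance(*p)) for p in pixels]
    return math.exp(sum(logs) / len(logs))

# One dim pixel and one very bright pixel: the geometric mean
# (~7) isn't dragged as far up by the outlier as the arithmetic
# mean (~50) would be, which is why log-averaging is used.
scene = [(0.5, 0.5, 0.5), (100.0, 100.0, 100.0)]
print(log_average_luminance(scene))
```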
Each of these is also an individual "pass":
3. Output to RT2 (lower resolution), read RT0 as input. The shader reads RT0 and multiplies by some fraction to darken it (or does (input-threshold)*fraction if you want to have a sharper cut-off to the effect). Now you've got your "downsampled" and darkened version of the scene.
You'll want to make sure you can tweak this fraction value (and possibly threshold value) at run-time, to fine-tune the effect. You probably want to use quite small numbers, like 0.1, but that of course depends on the scene/lighting/game.
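Pass 3's bright-pass boils down to a one-liner per pixel. A sketch with the threshold/fraction tweakables mentioned above (values are illustrative, not from the original thread):

```python
def bright_pass(lum, threshold=1.0, fraction=0.1):
    """Darken the scene so only bright pixels contribute to bloom:
    (input - threshold) * fraction, clamped at zero."""
    return max(lum - threshold, 0.0) * fraction

print(bright_pass(0.8))    # -> 0.0  (ordinary pixel: no bloom)
print(bright_pass(11.0))   # -> 1.0  (bright pixel survives)
```

Raising `threshold` sharpens the cut-off; raising `fraction` brightens the whole bloom effect.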
4. Output to RT3 (lower resolution), read RT2 as input several times with different horizontal offsets. Add all the samples together using different weights for each sample. Now you've got a horizontally blurred image.
5. Output to RT2 (RT2's data isn't required any more, so let's recycle that target!), read RT3 as input several times with different vertical offsets, and add/weight as before. Now you've got a vertically blurred version of a horizontally blurred image -- this produces a circularly blurred image!
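The reason passes 4 and 5 are split is that a Gaussian blur is separable: a horizontal 1D blur followed by a vertical 1D blur gives the same result as a full 2D blur, for far fewer texture samples. A CPU-side sketch with an assumed 3-tap kernel:

```python
def blur_1d(row, weights):
    """Weighted sum of neighbouring samples along one axis
    (edge samples are clamped, like CLAMP texture addressing)."""
    r, n = len(weights) // 2, len(row)
    return [sum(w * row[min(max(i + k - r, 0), n - 1)]
                for k, w in enumerate(weights))
            for i in range(n)]

def blur_2d_separable(img, weights):
    h = [blur_1d(row, weights) for row in img]   # pass 4: horizontal
    cols = list(map(list, zip(*h)))              # transpose
    v = [blur_1d(col, weights) for col in cols]  # pass 5: vertical
    return list(map(list, zip(*v)))              # transpose back

gauss = [0.25, 0.5, 0.25]      # simple 3-tap kernel, sums to 1
img = [[0.0] * 5 for _ in range(5)]
img[2][2] = 1.0                # a single bright pixel
out = blur_2d_separable(img, gauss)
print(out[2][2])               # -> 0.25 (centre weight 0.5 * 0.5)
```

With an n-tap kernel this costs 2n samples per pixel instead of n² for a direct 2D blur, which is why everyone does it this way.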
6. Read RT2 and RT0, combine them, and tone-map the result.
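A sketch of what step 6 might look like per pixel, using the simple Reinhard operator x/(1+x) as the tone-map (the thread doesn't name a specific operator, and key=0.18 "middle grey" is an assumed tuning value):

```python
def tone_map(hdr, bloom, avg_lum, key=0.18):
    """Combine scene (RT0) + bloom (RT2), expose by the average
    luminance read from the 1x1 mip, then apply the simple
    Reinhard operator x/(1+x), which maps [0, inf) into [0, 1)."""
    x = (hdr + bloom) * (key / avg_lum)
    return x / (1.0 + x)

# A pixel at the average brightness lands near middle grey,
# while very bright pixels compress smoothly toward 1.0
# instead of clamping.
print(tone_map(1.0, 0.0, 1.0))      # -> ~0.153
print(tone_map(1000.0, 0.0, 1.0))   # -> ~0.994
```

The same code runs identically in a pixel shader per-channel; the key/avg_lum scaling is what makes the whole pipeline adapt to scene brightness automatically.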