If you decide to try Voxel Cone Tracing, do it on (nested) dense grids. A Sparse Voxel Octree, besides not really being affordable performance-wise in a setting where you do other stuff besides dynamic GI, is also a pain in the butt to implement and maintain. Trust me, I have been there. :\
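To sketch what I mean by "nested dense grids" (clipmap-style cascades; all names and the cascade count here are made up for illustration):

```glsl
// Hedged sketch of nested dense grids: a few 3D textures with identical
// resolution but doubling world-space extent, all centered on the camera.
const int CASCADES = 4;
uniform sampler3D uVoxels[CASCADES]; // same texel count, growing extent
uniform vec3  uCenter;               // common center (e.g. camera position)
uniform float uHalfExtent0;          // half-size of the finest cascade

// Constant indices keep sampler-array access legal on older GLSL versions.
vec4 fetchCascade(int i, vec3 uvw)
{
    if (i == 0) return texture(uVoxels[0], uvw);
    if (i == 1) return texture(uVoxels[1], uvw);
    if (i == 2) return texture(uVoxels[2], uvw);
    return texture(uVoxels[3], uvw);
}

vec4 sampleNestedGrids(vec3 worldPos)
{
    // Pick the finest cascade whose cube still contains the point.
    vec3  rel   = abs(worldPos - uCenter);
    float d     = max(rel.x, max(rel.y, rel.z));
    float level = ceil(max(0.0, log2(d / uHalfExtent0)));
    if (level >= float(CASCADES))
        return vec4(0.0); // outside all cascades
    float halfExtent = uHalfExtent0 * exp2(level);
    vec3  uvw = (worldPos - uCenter) / (2.0 * halfExtent) + 0.5;
    return fetchCascade(int(level), uvw);
}
```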
Because radiance is flux differentiated with respect to two quantities (projected solid angle and surface area), which gives you two differential operators in the denominator and therefore two in the numerator as well. Radiometry isn't really that hard once it clicks, but it can take some time and practice(!) depending on your prior knowledge and basic intelligence.
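Written out, with the projected solid angle $\omega^{\perp}$ in the denominator:

$$L = \frac{\mathrm{d}^2\Phi}{\mathrm{d}A\,\mathrm{d}\omega^{\perp}} = \frac{\mathrm{d}^2\Phi}{\mathrm{d}A\,\cos\theta\,\mathrm{d}\omega}$$

where $\theta$ is the angle between the surface normal and the direction of $\mathrm{d}\omega$.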
To be physically plausible, Bloom/Glare/Glow should be added before tonemapping, since it simulates scattering due to diffraction effects in the eye/camera lens, which are wavelength- but not intensity-dependent. (Usually the wavelength dependency is neglected too, though.) Tonemapping happens as soon as the photons hit the sensor (or later, as a post-processing step, if the sensor supports a high dynamic range, as is the case with our virtual sensors) and therefore after the scattering events.
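As a sketch of that ordering (the shader and texture names are made up, and the simple Reinhard curve just stands in for whatever tonemapper you actually use):

```glsl
#version 330 core
// Composite pass sketch: add bloom in linear HDR space, tonemap afterwards.
uniform sampler2D uSceneHDR;  // linear, scene-referred radiance
uniform sampler2D uBloomHDR;  // blurred bright pass, same linear space
in vec2 vUV;
out vec4 fragColor;

vec3 tonemapReinhard(vec3 c) { return c / (1.0 + c); }

void main()
{
    // Scattering happens in the lens/eye, i.e. before the light
    // reaches the sensor, so bloom is added while still in HDR...
    vec3 hdr = texture(uSceneHDR, vUV).rgb + texture(uBloomHDR, vUV).rgb;
    // ...and only then does the sensor response (tonemapping) apply.
    fragColor = vec4(tonemapReinhard(hdr), 1.0);
}
```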
Also, most GPUs implement "Hi-Z" or some similar marketing buzzword, which conceptually is like a tiled min-max depth buffer: e.g. for every 8x8 pixel block there is an auxiliary buffer that stores the smallest and largest depth value inside that region. The fixed-function rasterizer can make use of this information to sometimes very quickly accept (all pixels pass the depth test without actually doing per-pixel depth testing) or reject (all pixels fail the depth test and are discarded immediately) whole blocks of pixels.
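In (purely hypothetical) code, the coarse test boils down to something like this, assuming a LESS depth comparison:

```glsl
// Conceptual sketch of a coarse Hi-Z test; all names are made up.
// Per 8x8 tile the hardware keeps min/max depth, so an incoming block
// of fragments can often be accepted or rejected wholesale.
struct TileDepth { float zMin; float zMax; };

// -1: whole tile rejected, +1: whole tile accepted, 0: needs per-pixel test.
int coarseDepthTest(TileDepth tile, float blockMinZ, float blockMaxZ)
{
    if (blockMinZ >= tile.zMax) return -1; // everything behind: reject
    if (blockMaxZ <  tile.zMin) return  1; // everything in front: accept
    return 0;                              // overlap: fine-grained test
}
```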
I might be wrong here, but isn't Hi-Z about culling whole triangles before they are even passed to the rasterizer? In other words, based on the depth values of the three vertices of a triangle?
[...] it instead implements the depth test as a sorting algorithm!
But for intersecting triangles that doesn't work...or does it somehow?
I'm by no means an expert with regard to SHs, but I think what you are doing is correct. A directional light is basically described by a delta function, which you cannot reproduce with a finite number of SH bands. Intuitively speaking, the best you can do with just two bands is to set the 'vector' part (2nd band, index 1) to the direction of the delta function and use the constant SH term (1st band, index 0) to account for the clamping in the negative direction. This is basically the same thing you do to represent a clamped cosine lobe with SHs (I'm not sure whether the weights are exactly the same, though). All you could therefore do to increase the quality is to increase the number of SH bands used. I think.
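For what it's worth, the least-squares projection of a delta function onto SH just samples the basis functions at the light direction, so a two-band version could look like the sketch below (the coefficient ordering is my assumption):

```glsl
// Hedged sketch: projecting a directional (delta) light into the first
// two SH bands. Projecting a delta simply evaluates the basis functions
// at the light direction: c_l^m = Y_l^m(dir).
void projectDirectionalLightSH2(vec3 dir, vec3 color, out vec3 sh[4])
{
    sh[0] = color * 0.282095;         // Y_0^0  (constant band)
    sh[1] = color * 0.488603 * dir.y; // Y_1^-1
    sh[2] = color * 0.488603 * dir.z; // Y_1^0
    sh[3] = color * 0.488603 * dir.x; // Y_1^1
}
```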
I'm trying to reconcile that with the ideal diffuser (Lambert) case now... The Lambert BRDF is just "k" (the diffuse colour), so for a white surface we typically just use dot(N,L) in our per-pixel calculations.
If we incorporate the view angle, though, we get dot(N,L)/dot(N,V)... which results in a very flat and unrealistic-looking surface.
There are actually three cosine terms: dot(incident light dir, surface normal), dot(emitted light dir, normal), and dot(viewer dir, normal) -> the last two cancel out (the differential area over which the emitted light is distributed grows in proportion to the observed differential area).
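Sketched as equations: the flux a pixel collects from a visible surface patch $\mathrm{d}A$ is

$$\mathrm{d}^2\Phi = L_o \,\cos\theta_v \,\mathrm{d}A \,\mathrm{d}\omega,$$

but the patch seen through a pixel of fixed footprint grows as $\mathrm{d}A = \mathrm{d}A^{\perp}/\cos\theta_v$, so the two $\cos\theta_v$ factors cancel and the pixel measures $L_o$ directly. For a Lambertian surface $L_o = (k/\pi)\,E$ with irradiance $E \propto \mathrm{dot}(N,L)$, which is why plain dot(N,L) is correct and no dot(N,V) division appears.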
Just a general idea regarding the light-info accumulation concept, which has been floating around my head for some time now and which I finally want to get rid of:
Instead of cone-tracing per screen pixel (which is how the technique works by default, IIRC), couldn't you separate your view frustum into cells (similar to what you do for clustered shading, but perhaps with cube-shaped cells), accumulate the light information in these cells, represented as spherical harmonics, using cone tracing, and finally use this SH 'volume' to light your scene?
You would of course end up with low-frequency information only suitable for diffuse lighting, like when using light propagation volumes, but still with less quantization, since you would not (necessarily) propagate the information iteratively (or would at least use fewer steps, if you choose to propagate to keep the trace range shorter). On the other hand, you could probably reduce the number of required cone traces considerably (you would also only need to fill cells that intersect geometry, if you choose not to propagate iteratively) and, to some extent, break the correlation between the number of traces and the output pixel count.
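A very rough compute-shader sketch of what I mean; every name here (uVoxelRadiance, uCellToWorld, the cone set, the SH packing) is made up for illustration, and the cone march is deliberately crude:

```glsl
#version 430
// One thread per frustum cell: trace a few cones through the voxelized
// scene and accumulate the result into a 2-band SH per cell.
layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;

layout(rgba16f, binding = 0) uniform writeonly image3D uSH[4];
uniform sampler3D uVoxelRadiance; // pre-voxelized, pre-lit scene
uniform mat4 uCellToWorld;        // integer cell coords -> world position
uniform vec3 uConeDirs[16];
uniform int  uConeCount;

vec4 coneTrace(vec3 origin, vec3 dir)
{
    // Crude cone march (sketch only; a real tracer would pick the mip
    // level from the cone aperture, not from the step index).
    vec4 acc = vec4(0.0);
    float t = 0.05;
    for (int i = 0; i < 32 && acc.a < 1.0; ++i)
    {
        vec3 uvw = origin + t * dir;   // assumes world == grid UVW space
        vec4 s = textureLod(uVoxelRadiance, uvw, float(i) * 0.25);
        acc += (1.0 - acc.a) * s;      // front-to-back compositing
        t *= 1.5;                      // widen the step with distance
    }
    return acc;
}

void main()
{
    ivec3 cell = ivec3(gl_GlobalInvocationID);
    vec3 origin = (uCellToWorld * vec4(vec3(cell) + 0.5, 1.0)).xyz;

    vec4 shR = vec4(0.0), shG = vec4(0.0), shB = vec4(0.0);
    for (int c = 0; c < uConeCount; ++c)
    {
        vec3 d = normalize(uConeDirs[c]);
        vec3 L = coneTrace(origin, d).rgb;
        // Project the traced radiance into 2-band SH (delta projection).
        vec4 basis = vec4(0.282095,
                          0.488603 * d.y, 0.488603 * d.z, 0.488603 * d.x);
        shR += L.r * basis;
        shG += L.g * basis;
        shB += L.b * basis;
    }
    for (int k = 0; k < 4; ++k)
        imageStore(uSH[k], cell, vec4(shR[k], shG[k], shB[k], 0.0));
}
```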
I always use x * x * ( 3.0 - 2.0 * x ) instead of smoothstep(), which I think is often cheaper, since smoothstep first remaps and clamps the input to the range [0..1] even if it is already in that range.
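For comparison, the built-in is specified to behave like the reference below, so the saving comes from skipping the remap and clamp when x is already known to be in [0, 1]:

```glsl
// What smoothstep(edge0, edge1, x) is specified to compute:
float smoothstep_ref(float edge0, float edge1, float x)
{
    float t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0);
    return t * t * (3.0 - 2.0 * t);
}

// If x is already guaranteed to lie in [0, 1], the remap and clamp
// can be dropped:
float smoothstep01(float x)
{
    return x * x * (3.0 - 2.0 * x);
}
```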