I had the same idea some time ago as well, but there's actually no need to encode the screen-space approximation of the visibility function into an SH basis. They only do it because they need to pass the function from one pass to the next (DSSDO -> Lighting).
It would be much better to just sample the screen space along the direction of the light and lerp the light's color with the occluder's color, based on how sure the algorithm is that the occluder is actually occluding the light. This way there's no need for any spherical harmonics: you have the full screen-space approximation of the visibility function (technically it's more than just a visibility function, since you also get chromatic information), and you can plug that straight into your rendering equation.
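To make that concrete, here's a rough CPU-side sketch of the per-pixel march in Python. It is only an illustration under my own assumptions, not anyone's shipping shader: the names (`incoming_light`, `step_xy`, `step_z`, `thickness`, `bias`) are made up, depth is view-space with larger values farther from the camera, and the "how sure" term is a simple linear falloff on how far the ray sits behind the sampled surface.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation between two RGB tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def incoming_light(depth, albedo, x, y, step_xy, step_z, light_color,
                   num_samples=6, thickness=1.0, bias=0.01):
    """March from pixel (x, y) toward the light in screen space.

    depth   -- 2D list of view-space depths (larger = farther away)
    albedo  -- 2D list of RGB tuples, same size as depth
    step_xy -- per-sample pixel offset toward the light (hypothetical parameter)
    step_z  -- per-sample depth change of the ray toward the light
    Returns the light colour, tinted toward the occluder's colour in
    proportion to how confident we are that it really blocks the light.
    """
    height, width = len(depth), len(depth[0])
    start_depth = depth[y][x]
    best_conf, best_color = 0.0, light_color

    for i in range(1, num_samples + 1):
        sx = int(round(x + step_xy[0] * i))
        sy = int(round(y + step_xy[1] * i))
        if not (0 <= sx < width and 0 <= sy < height):
            break  # left the screen: no information, stop marching

        ray_z = start_depth + step_z * i
        behind = ray_z - depth[sy][sx]  # > 0 means the ray is behind a surface
        if behind > bias:
            # Assumed confidence model: the deeper the ray is behind the
            # surface, the less sure we are that the surface is thick
            # enough to actually block the light.
            conf = max(0.0, 1.0 - behind / thickness)
            if conf > best_conf:
                best_conf, best_color = conf, albedo[sy][sx]

    return lerp(light_color, best_color, best_conf)
```

The result goes directly into the lighting term for that light, so nothing has to be stored between passes. In a real shader the loop would be a handful of depth/colour texture fetches per light.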
I've done that for one light, and AFAIK Crysis 2 does it as well, just for the sun (maybe only on characters, for finely detailed self-shadowing?) -- but only with occlusion, not colouring based on the occluder's colour.
It gets pretty expensive if you do it on every light though, because you want at least 4, and preferably many more, depth samples per light to make an accurate occlusion decision (I got decent results with just 6 samples).
With the SH method, you get more approximate results, but it scales to many lights very well.
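For comparison, here's a minimal Python sketch of why the SH route scales: the occluders around a pixel get projected once into a few coefficients, and each light then costs only a short dot product instead of its own set of depth samples. This is my own simplified illustration, not the actual DSSDO code: the function names are made up, it uses only the first two SH bands (4 coefficients), and it treats each occluder as a weighted direction.

```python
def sh_basis(d):
    """Real SH basis for bands 0 and 1, evaluated at unit direction d."""
    x, y, z = d
    return (0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x)


def project_occlusion(occluders):
    """Project weighted occluder directions into 4 SH coefficients.

    occluders -- list of (unit_direction, weight) pairs; done once per pixel.
    """
    coeffs = [0.0, 0.0, 0.0, 0.0]
    for direction, weight in occluders:
        for k, b in enumerate(sh_basis(direction)):
            coeffs[k] += weight * b
    return coeffs


def occlusion_toward(coeffs, light_dir):
    """Approximate occlusion in the direction of one light.

    Per light this is just a 4-term dot product, clamped to [0, 1].
    """
    value = sum(c * b for c, b in zip(coeffs, sh_basis(light_dir)))
    return min(1.0, max(0.0, value))
```

With only two bands the reconstructed lobe is very wide, which is where the "more approximate" part comes from: a single occluder straight above the pixel smears its occlusion over most of the upper hemisphere.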