Another idea -- at the moment you're doing 'for each position, trace 16 rays'. You could instead use the algorithm from this ray-tracer, which is 'for each ray direction, trace rays through a million positions'. That maps better to GPU rasterization hardware and might result in fewer passes over the scene overall.
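To make the loop inversion concrete, here's a rough CPU sketch of the 'for each direction, test every position' idea. It is not the linked ray-tracer's code. The names `lit_from_direction` and `sky_visibility`, the dictionary standing in for a depth buffer, and the `cell`/`bias` values are all made up for illustration, and `to_light` is assumed to be unit length.

```python
import math

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def lit_from_direction(points, to_light, cell=1.0, bias=1e-3):
    """One 'pass': build an orthographic depth map for a single direction,
    then test every point against it. to_light must be unit length."""
    # Two axes spanning the depth map's image plane (perpendicular to to_light).
    helper = (1.0, 0.0, 0.0) if abs(to_light[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _cross(to_light, helper)
    n = math.sqrt(_dot(u, u))
    u = (u[0] / n, u[1] / n, u[2] / n)
    v = _cross(to_light, u)

    def project(p):
        texel = (math.floor(_dot(p, u) / cell), math.floor(_dot(p, v) / cell))
        depth = -_dot(p, to_light)  # smaller value = closer to the light
        return texel, depth

    # "Rasterize": keep only the depth nearest to the light in each texel.
    depth_map = {}
    for p in points:
        texel, depth = project(p)
        depth_map[texel] = min(depth, depth_map.get(texel, math.inf))

    # A point is lit if nothing in its texel is closer to the light.
    result = []
    for p in points:
        texel, depth = project(p)
        result.append(depth <= depth_map[texel] + bias)
    return result

def sky_visibility(points, directions):
    """Outer loop over directions, inner loop over all positions."""
    totals = [0] * len(points)
    for d in directions:
        for i, lit in enumerate(lit_from_direction(points, d)):
            totals[i] += lit
    return [t / len(directions) for t in totals]
```

With 16 directions that is 16 depth-map passes in total, regardless of how many positions get tested against each map.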
Hodgman - Yeah, I can't do it that way because of the cavities/indoor areas. Plus, that doesn't really work well in outdoor forests either. If you have a few close boxes, but they are under a ton of tall trees, then they will get no direct lighting at all, because leaves will cover them in every one of those depth buffers. So I have to start at the object and go outwards, just like SSAO.
My very cheap hack to deal with that is to mostly place lights in the upper hemisphere, but also place some in the lower hemisphere, and to not render the ground into the depth buffer, so it doesn't cast shadows upwards onto objects. This allows objects that are covered by a 'canopy' to still receive some level of "AO" gradients.
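A minimal sketch of that light placement, in case it helps. Everything here is an assumption for illustration: the 12/4 split between hemispheres in the usage example, z as the up axis, the `is_ground` flag, and the function names.

```python
import math
import random

def sample_light_directions(n_upper, n_lower, seed=0):
    """Return unit vectors: the first n_upper point into the upper (z >= 0)
    hemisphere, the remaining n_lower into the lower one."""
    rng = random.Random(seed)
    directions = []
    for i in range(n_upper + n_lower):
        # Uniform z plus uniform angle gives a uniform spread over a hemisphere.
        z = rng.uniform(0.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        if i >= n_upper:
            z = -z  # flip the last n_lower directions below the horizon
        directions.append((r * math.cos(phi), r * math.sin(phi), z))
    return directions

def shadow_casters(objects):
    """Leave the ground out of the depth pass so it can't shadow objects
    from below. 'is_ground' is a hypothetical per-object flag."""
    return [o for o in objects if not o.get("is_ground", False)]
```

Usage would be something like `sample_light_directions(12, 4)` for 16 lights, with each depth map rendered from `shadow_casters(scene_objects)` only.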
Stealing inspiration from the above link, you could also solve this issue completely by rendering the shadow maps with depth peeling (keeping several depth layers per texel instead of only the nearest one). That allows you to measure the length of the ray from the surface position to the nearest occluder, rather than to the "most outside" occluder.
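Here's a small sketch of the difference for a single texel. It assumes the peeled layers are already available as a list of depths sorted ascending, measured from the light, which is just one way you might store them. The function names and the `bias` value are made up for illustration.

```python
from bisect import bisect_left

def nearest_occluder_distance(layers, surface_depth, bias=1e-3):
    """Distance from the surface to the closest layer between it and the
    light, or None if nothing is in the way. 'layers' is sorted ascending."""
    i = bisect_left(layers, surface_depth - bias)
    if i == 0:
        return None  # no layer lies in front of the surface
    return surface_depth - layers[i - 1]

def outermost_occluder_distance(layers, surface_depth, bias=1e-3):
    """What a single-layer shadow map gives you: only the layer nearest
    to the light (the "most outside" occluder) is known."""
    if not layers or layers[0] >= surface_depth - bias:
        return None
    return surface_depth - layers[0]
```

For a canopy at depth 1, a roof at depth 9 and a surface at depth 10, the peeled version reports an occluder 1 unit away, while the single-layer version reports 9 units, which is the error the 'canopy' case runs into.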