Why not voxels? The idea is not so crazy anymore; voxels are already used for real-time GI in games (e.g. Crysis 3).
It may sound crazy because of the memory requirements. However, for shadow mapping you only need 1 bit per voxel.
A 1024x1024x1024 voxel grid would then need 128 MB (2^30 bits), which suddenly starts to feel appealing.
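To make the memory arithmetic concrete: 1024^3 voxels at 1 bit each is 2^30 bits, i.e. 2^27 bytes = 128 MiB. The sketch below packs one occupancy bit per voxel into a flat byte array. The names `BitVoxelGrid` and `grid_bytes` are hypothetical and not taken from any engine; on the GPU this would be a 3D texture or buffer rather than a Python `bytearray`.

```python
class BitVoxelGrid:
    """Hypothetical 1-bit-per-voxel occupancy grid of size n x n x n (sketch only)."""

    def __init__(self, n: int):
        self.n = n
        # n^3 bits, rounded up to whole bytes
        self.bits = bytearray((n * n * n + 7) // 8)

    def _index(self, x: int, y: int, z: int) -> int:
        # Linear voxel index, x varying fastest
        return (z * self.n + y) * self.n + x

    def set(self, x: int, y: int, z: int) -> None:
        # Mark the voxel as occupied (occluding)
        i = self._index(x, y, z)
        self.bits[i >> 3] |= 1 << (i & 7)

    def occluded(self, x: int, y: int, z: int) -> bool:
        # Shadow test: is this voxel occupied?
        i = self._index(x, y, z)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))


def grid_bytes(n: int) -> int:
    """Bytes needed for an n^3 grid at 1 bit per voxel."""
    return (n ** 3 + 7) // 8


print(grid_bytes(1024) // 2**20)  # 128 (MiB)
```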
Perhaps the biggest obstacle right now is that there is no way to fill this voxel grid with occlusion data in real time.
The most efficient approach I can see would be regular rasterization, but with the shader (or the rasterizer) deciding on the fly which layer of the 3D texture each pixel should be written to, based on its quantized interpolated depth. This would be highly parallel. However, I'm not aware of any API or GPU that has this capability.
Geometry shaders allow selecting which render target a triangle is rendered to, but there is no way to select which render target an individual pixel is rendered to (this could be fixed-function; it doesn't necessarily have to be exposed to shaders).
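A CPU sketch of the per-pixel layer selection described above, assuming a triangle already projected into light space with x, y in pixel units and depth normalized to [0, 1]. Each covered pixel center gets barycentric weights, interpolates depth, and quantizes it to a layer index; that per-pixel layer choice is the step the answer says rasterizers don't expose. The function name `voxelize_triangle` is hypothetical, no fill rule is applied (pixels exactly on an edge are included), and a triangle that is steep in depth can span several layers within one pixel, while this sketch marks only the layer at the pixel center.

```python
import math


def voxelize_triangle(v0, v1, v2, res):
    """Hypothetical sketch: each vertex is (x, y, depth), x/y in [0, res), depth in [0, 1].

    Returns the set of (x, y, layer) voxels hit at pixel centers.
    """
    (x0, y0, d0), (x1, y1, d1), (x2, y2, d2) = v0, v1, v2
    # Twice the signed triangle area; used to normalize the edge functions
    area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area == 0:
        return set()

    # Pixel bounding box, clamped to the grid
    min_x = max(math.floor(min(x0, x1, x2)), 0)
    max_x = min(math.ceil(max(x0, x1, x2)), res)
    min_y = max(math.floor(min(y0, y1, y2)), 0)
    max_y = min(math.ceil(max(y0, y1, y2)), res)

    voxels = set()
    for py in range(min_y, max_y):
        for px in range(min_x, max_x):
            cx, cy = px + 0.5, py + 0.5
            # Barycentric weights from edge functions (works for either winding)
            w0 = ((x1 - cx) * (y2 - cy) - (x2 - cx) * (y1 - cy)) / area
            w1 = ((x2 - cx) * (y0 - cy) - (x0 - cx) * (y2 - cy)) / area
            w2 = 1.0 - w0 - w1
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel center outside the triangle
            depth = w0 * d0 + w1 * d1 + w2 * d2
            # The key step: this pixel chooses its own layer from quantized depth
            layer = min(max(int(depth * res), 0), res - 1)
            voxels.add((px, py, layer))
    return voxels
```

For example, a right triangle with corners (0, 0), (8, 0), (0, 8) whose depth rises from 0 to 1 along x covers 36 pixels in an 8x8x8 grid, and each pixel lands in the layer equal to its x coordinate.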
You could use a variant of the KinectFusion algorithm to build a volumetric representation of the scene. The basic idea is to take a depth image (or the depth buffer, in the rendering case) and then find the camera pose relative to your volume representation. Then, for each pixel of the depth image, you trace through the volume, updating each voxel along the way with the distance information from the depth image. The volume stores, at each voxel, the (truncated) signed distance from the observed surface. For the next frame, the volume is used to work out where the Kinect has moved, and the process repeats. The distances are averaged over time with a capped weight, which acts like a time constant: it smooths out sensor noise while still letting the volume adapt to moving objects.
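The per-voxel update can be sketched for the voxels along a single depth-buffer pixel's ray. It follows the general TSDF fusion rule: the signed distance is truncated to ±`trunc` and normalized, then folded into a weighted running average whose weight is capped at `max_weight`, which is what produces the time-constant behavior. This is a simplified 1D illustration, not KinectFusion's actual implementation: the `RayTSDF` class and its parameters are hypothetical, the ray is assumed to run along the view axis with voxels evenly spaced in depth, and camera tracking is omitted.

```python
class RayTSDF:
    """Hypothetical 1D slice of a TSDF volume: the voxels along one pixel's ray."""

    def __init__(self, voxel_size, num_voxels, trunc, max_weight=64.0):
        self.voxel_size = voxel_size
        self.num_voxels = num_voxels
        self.trunc = trunc            # truncation distance, same units as depth
        self.max_weight = max_weight  # weight cap -> acts like a time constant
        self.tsdf = [1.0] * num_voxels
        self.weight = [0.0] * num_voxels

    def integrate(self, measured_depth, w=1.0):
        """Fuse one depth sample into every voxel the ray passes through."""
        for i in range(self.num_voxels):
            voxel_depth = (i + 0.5) * self.voxel_size
            sdf = measured_depth - voxel_depth  # > 0 in front of the surface
            if sdf < -self.trunc:
                break  # hidden behind the observed surface: no information
            f = max(-1.0, min(1.0, sdf / self.trunc))
            old_w = self.weight[i]
            # Weighted running average of the truncated distance
            self.tsdf[i] = (old_w * self.tsdf[i] + w * f) / (old_w + w)
            self.weight[i] = min(old_w + w, self.max_weight)

    def surface_depth(self):
        """Depth of the first +/- zero crossing (linearly interpolated), or None."""
        for i in range(self.num_voxels - 1):
            a, b = self.tsdf[i], self.tsdf[i + 1]
            if self.weight[i] > 0 and self.weight[i + 1] > 0 and a > 0 >= b:
                t = a / (a - b)
                return (i + 0.5 + t) * self.voxel_size
        return None
```

Calling `integrate(2.0)` and then `integrate(2.2)` on the same ray moves the zero crossing to about 2.1, the average of the two measurements; with the weight capped, later samples keep pulling the surface, so moving objects are eventually reflected in the volume.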
This is a somewhat heavy algorithm to run on top of everything else you do to render a scene, but key parts of it would no longer be needed. For example, you don't need to solve for the camera pose, because the renderer already knows it. That signed-distance voxel representation could then be modified, or used directly, to calculate occlusion. It might be worth investigating further to see whether it could run in real time...
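One common way to get occlusion out of a signed distance field (not something the answer specifies) is to sphere-trace a shadow ray toward the light: the SDF value at each point is a safe step size, and a value near zero means the ray has hit geometry. In the sketch below an analytic sphere (`sphere_sdf`) stands in for sampling the voxel volume; the function names and the `eps`/`max_steps` defaults are assumptions. With a truncated SDF like KinectFusion's, distances are only meaningful near surfaces, so steps would have to be limited to the truncation distance.

```python
import math


def sphere_sdf(p, center=(0.0, 1.0, 0.0), radius=0.5):
    """Stand-in SDF (hypothetical scene): a single sphere."""
    return math.dist(p, center) - radius


def shadow_ray_occluded(sdf, origin, light_pos, eps=1e-3, max_steps=128):
    """Sphere-trace from origin toward light_pos; True if geometry blocks the light."""
    delta = [l - o for l, o in zip(light_pos, origin)]
    dist_to_light = math.sqrt(sum(d * d for d in delta))
    direction = [d / dist_to_light for d in delta]
    t = 10 * eps  # start slightly off the surface to avoid self-occlusion
    for _ in range(max_steps):
        if t >= dist_to_light:
            return False  # reached the light unobstructed
        p = [o + t * d for o, d in zip(origin, direction)]
        d = sdf(p)
        if d < eps:
            return True  # hit a surface before the light
        t += d  # safe step: nothing is closer than d
    return False  # step budget exhausted; treat as lit
```

Sphere tracing is used here because the distance values let each step skip empty space, which suits a signed-distance volume better than testing voxels one at a time.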