I totally agree with you that code readability is far more important than conciseness. However, my aim here is to keep the LOC count as low as possible, in the spirit of smallpt, while retaining a minimum of clarity.
I haven't checked the paper you cited yet (I'll read it as soon as I'm back home), but I'm already working on adding direct light sampling, which should speed things up a bit. Right now the convergence rate is really slow, and any resistance is futile.
I'm using importance sampling quite a lot in my implementation. First, I use it to sample the vertices along the ray that are expected to contribute the most to a pixel. Second, since this is path tracing and exponential ray growth is not allowed, I use the contribution of multiple scattering as a hint to decide whether it is worth tracing the "volume" ray further or just continuing with normal path tracing.
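As a rough illustration of those two ideas, here is a minimal sketch in C++. It is not the actual implementation. It assumes a homogeneous medium with extinction coefficient sigma_t, and it substitutes plain transmittance (exponential) sampling for the vertex selection, which is only one common choice. The names sampleDistance and chooseVolumeBranch, and the clamps pMin and pMax, are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: a distance along the ray and the pdf it was drawn with.
struct DistanceSample { double t; double pdf; };

// Importance-sample a scattering distance with pdf(t) = sigma_t * exp(-sigma_t * t),
// i.e. proportional to the transmittance of a homogeneous medium (an assumption).
// u is a uniform random number in [0, 1).
inline DistanceSample sampleDistance(double sigma_t, double u) {
    assert(sigma_t > 0.0 && u >= 0.0 && u < 1.0);
    double t = -std::log(1.0 - u) / sigma_t;          // invert the CDF 1 - exp(-sigma_t * t)
    return { t, sigma_t * std::exp(-sigma_t * t) };
}

// Pick exactly one branch per vertex, so the ray count never grows exponentially.
// scatterEstimate is an assumed estimate in [0, 1] of how much multiple scattering
// matters here; it is clamped to [pMin, pMax] so that neither branch starves.
// Returns true for the "volume" branch, false for normal path tracing, and divides
// the path weight by the probability of the chosen branch to keep the estimate unbiased.
inline bool chooseVolumeBranch(double scatterEstimate, double u, double& weight,
                               double pMin = 0.05, double pMax = 0.95) {
    double p = std::fmin(pMax, std::fmax(pMin, scatterEstimate));
    if (u < p) { weight /= p; return true; }
    weight /= (1.0 - p);
    return false;
}
```

Both functions expect u to be drawn fresh from a uniform generator on [0, 1) for every call. The reweighting by the branch probability is what lets a single branch stand in for both.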
I'm still not completely satisfied with the results that I currently have, so there is still a lot of work to be done (especially in the convergence part) before I can call this thing done. In the meantime, here is my latest test render: