This looks quite convincing, but naturally there are still some artefacts related to what you called "dark-is-deep" -- white spots are given little hills that obviously aren't really there. A lot of the instances I've seen of this seem to, roughly speaking, construct a depth map from a sum of blurred layers of the image (often with what looks like some extra hand-wavy convolution applied at each step), and it seems like you're perhaps doing something similar?
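For what it's worth, here's a toy sketch in Python/NumPy of the blur-sum construction I mean -- a guess at the general technique, not anyone's actual pipeline (the function name and level count are made up):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def naive_depth_from_luminance(img, levels=5):
    """Rough sketch of the 'sum of blurred layers' heuristic:
    stack progressively blurred copies of the luminance so broad
    dark regions read as deep and fine detail adds surface relief."""
    depth = np.zeros(img.shape, dtype=np.float64)
    for i in range(levels):
        sigma = 2.0 ** i  # roughly double the blur radius per layer
        depth += gaussian_filter(img.astype(np.float64), sigma=sigma)
    depth /= levels
    # Normalize to [0, 1]. Because height is a monotone function of
    # brightness, any bright albedo feature becomes a hill -- which is
    # exactly the white-spots artefact described above.
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-12)
    return depth
```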
I suspect a purely statistical approach based on windows of pixels might actually work pretty well, but I'm not sure there's enough easily available reference data to train a model.
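If that data did exist, the simplest version I can imagine is a linear regressor from each pixel window to the depth at its centre -- purely hypothetical, everything here is made up for illustration:

```python
import numpy as np

def fit_window_depth_model(images, depths, window=7, lam=1e-3):
    """Hypothetical patch-based regressor: learn a linear map from a
    window of pixels to the ground-truth depth at the window's centre.
    Assumes aligned (image, depth) reference pairs, which is exactly
    the data-availability problem mentioned above."""
    r = window // 2
    X, y = [], []
    for img, dep in zip(images, depths):
        for i in range(r, img.shape[0] - r):
            for j in range(r, img.shape[1] - r):
                X.append(img[i - r:i + r + 1, j - r:j + r + 1].ravel())
                y.append(dep[i, j])
    X, y = np.asarray(X, dtype=np.float64), np.asarray(y, dtype=np.float64)
    # Ridge-regularized least squares: w = (X'X + lam*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return w
```

Prediction would then just be sliding the same window over a new image and dotting each patch with w. A linear model over a 7x7 patch is obviously too weak for real depth, but even that could in principle learn that local brightness alone shouldn't set the height.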
Also, I forgot to mention something. You were talking about how white spots are given little hills. I know it happens, and it happens because of the variation in the albedo of the surface. But that only happens if you work on a plain unprocessed image. If you were to work only on the lighting data, it would work pretty well. (Deeper areas do naturally come out darker -- ambient occlusion etc. -- so if you do some crazy math there you can actually recover the depth information pretty accurately.)
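By "lighting data" I mean something like the shading term of an intrinsic-image decomposition. A very crude way to approximate it, assuming shading varies slowly while albedo changes abruptly (a big assumption -- this is just an illustration, not a real decomposition):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def crude_shading_split(img, sigma=15.0):
    """Split log-luminance into a smooth 'shading' component and a
    residual 'albedo' component. Real intrinsic decomposition is much
    harder; this only shows where the white-spot artefact goes."""
    log_img = np.log(img.astype(np.float64) + 1e-6)   # avoid log(0)
    shading = gaussian_filter(log_img, sigma=sigma)   # slow variation ~ lighting
    albedo = log_img - shading                        # abrupt changes ~ texture
    return np.exp(shading), np.exp(albedo)
```

Run the depth heuristic on the shading image instead of the raw one, and a white spot mostly lands in the albedo term rather than the height map, so it stops turning into a hill.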