Yes, but I would take a few steps back first. Is there any way to only re-render objects that are in motion relative to the light?
Caching a shadow map of the static objects is OK, but I would not be surprised if the cache didn't survive very long. The reason is that shadow mapping suffers from a lot of artifacts, so you often try to maximize the shadow map density and apply other tricks that depend on both the light source and the camera. Techniques like CSM (cascaded shadow maps) often render only the shadow maps for the area directly in front of the camera, which means that turning the camera would often invalidate the cache.
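To make the invalidation problem concrete, here is a minimal sketch, not taken from any real engine. It assumes a directional light pointing straight down, a single cascade, and a top-down 2D view of the frustum slice. The names `cascade_bounds` and `StaticShadowCache` are made up for illustration. The cached static pass is only reused while the texel-snapped bounds of the cascade stay the same.

```python
import math


def cascade_bounds(cam_x, cam_z, yaw_deg, near, far, fov_deg, texel):
    """Texel-snapped XZ bounding box of one cascade slice.

    Assumption: the light points straight down, so the light-space
    bounds are simply the axis-aligned box around the slice corners.
    """
    half = math.radians(fov_deg) / 2.0
    yaw = math.radians(yaw_deg)
    corners = []
    for dist in (near, far):
        for side in (-1.0, 1.0):
            # Corner of the frustum slice at view depth `dist`.
            lateral = side * dist * math.tan(half)
            x = cam_x + dist * math.sin(yaw) + lateral * math.cos(yaw)
            z = cam_z + dist * math.cos(yaw) - lateral * math.sin(yaw)
            corners.append((x, z))
    xs = [c[0] for c in corners]
    zs = [c[1] for c in corners]
    # Snap outward to whole texels so tiny camera jitter keeps the same key.
    return (
        math.floor(min(xs) / texel) * texel,
        math.floor(min(zs) / texel) * texel,
        math.ceil(max(xs) / texel) * texel,
        math.ceil(max(zs) / texel) * texel,
    )


class StaticShadowCache:
    """Remembers which bounds the cached static depth pass was rendered for."""

    def __init__(self):
        self.bounds = None
        self.rerenders = 0

    def update(self, bounds):
        """Return True if the static pass had to be re-rendered."""
        if bounds == self.bounds:
            return False  # cache hit: only dynamic objects need drawing
        self.bounds = bounds
        self.rerenders += 1
        return True
```

With this sketch, calling `update` twice with the bounds for the same camera pose hits the cache the second time, while rotating the camera by 45 degrees produces different bounds and forces a full re-render of the static geometry. In a real CSM setup the bounds also depend on the light direction and the cascade split scheme, so the cache tends to be invalidated even more often.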
On the other hand, drawing only to the depth buffer is really fast. IMHO the benefit of such an optimization is questionable, and other parts of the pipeline often have a much heavier impact on performance. Adding animation for trees and grass would also make most of the effort obsolete, since those objects could no longer be cached as static.