I just had a random thought the other day that sounds feasible, but I wonder if any of you guys can poke a hole in the concept.
The depth renders for cascaded shadow maps are orthographic, which means that there isn't any perspective foreshortening on the model renders. Every object with the same model/rotation/scale should have the same relative depth extents regardless of where they are on the shadow map.
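To make the invariance concrete, here's a minimal C++ sketch (function and parameter names are just made up for illustration) of why a pure translation only adds a constant to the depth under an orthographic projection:

```cpp
// For an orthographic projection, the stored depth is an affine function of
// view-space z, so translating an object along the light direction shifts
// every fragment's depth by the same constant.
float orthoDepth(float viewZ, float nearZ, float farZ)
{
    // D3D-style [0,1] depth for an ortho projection looking down +z.
    return (viewZ - nearZ) / (farZ - nearZ);
}

// orthoDepth(viewZ + dz, nearZ, farZ)
//   == orthoDepth(viewZ, nearZ, farZ) + dz / (farZ - nearZ),
// so the per-instance correction is a single uniform offset.
```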
I'm thinking it might be possible to render the model depth once to a render target texture, and then for every instance of that model on the shadow cascade you just draw a quad and read the depths from the depth texture that was rendered up front. You would also have to offset the depths read from the texture by the instance's distance from the orthographic camera. Then you'd just write the depth from the pixel shader so depth-testing works like normal.
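Roughly, the per-fragment work for the quad pass would look something like the sketch below. This is plain C++ standing in for the pixel shader, and the names (bakedDepth, instanceDepthOffset, clearDepth) are placeholders I made up, not an existing API:

```cpp
#include <optional>

// bakedDepth:          depth sampled from the prebaked model-depth texture
//                      at this fragment's UV.
// instanceDepthOffset: this instance's distance from the ortho camera,
//                      relative to the reference render, converted into the
//                      same [0,1] depth units.
// Returns the depth to write, or nothing if the fragment should be discarded
// because the prebaked texel was never covered by the model.
std::optional<float> shadowQuadDepth(float bakedDepth,
                                     float instanceDepthOffset,
                                     float clearDepth = 1.0f)
{
    if (bakedDepth >= clearDepth)   // texel outside the model's silhouette
        return std::nullopt;        // clip()/discard in the real shader

    // Shift the prebaked depth by the instance's offset and let the normal
    // depth test handle occlusion between instances as usual.
    return bakedDepth + instanceDepthOffset;
}
```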
Depth-only rendering is already pretty fast, so this might end up counterproductive because of the extra texture reads. You would probably have to atlas several model depth renders onto the same render target to minimize the number of binds/state changes.
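For the atlasing part, I imagine something like this: each model gets a rectangle in one big depth atlas, and the instance quads remap their local UVs into that rectangle so everything samples from a single bound texture. Again just a rough sketch with invented names:

```cpp
// One tile per prebaked model in the shared depth atlas.
struct AtlasRegion
{
    float u0, v0;       // top-left corner of this model's tile (0..1)
    float uSize, vSize; // tile extent in the atlas (0..1)
};

// Map a quad-local UV (0..1 across the model's footprint) into atlas space.
inline void localToAtlasUV(const AtlasRegion& r,
                           float localU, float localV,
                           float& atlasU, float& atlasV)
{
    atlasU = r.u0 + localU * r.uSize;
    atlasV = r.v0 + localV * r.vSize;
}
```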
Can anyone see a flaw with the idea, or should I try out an implementation of it and tell you all how it goes?