The hardware depth test has optimizations in place that can significantly speed up the rendering of occluded fragments. However, those optimizations require additional memory, which is why a depth attachment is more than a simple texture. You can't just use it as a regular render target, write to it, and still expect the additional memory used by those optimizations to stay consistent.
This. Keep in mind that writing directly to the depth buffer means no early Z rejection. Normally the GPU can "preemptively" reject fragments before executing the fragment shader, just by testing the depth value interpolated from the vertex shader outputs.
If you write the depth value from the fragment shader, the GPU has to execute the shader to know the actual depth, so no early Z rejection is possible.
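To make the cost concrete, here is a toy CPU model of a single pixel, not how any real GPU is implemented. The `Fragment` struct and `countShaderInvocations` are made-up names for illustration; the sketch assumes a "less than" depth test and a depth buffer cleared to 1.0.

```cpp
#include <cstddef>
#include <vector>

// Toy model of one pixel's depth test; all names here are illustrative.
struct Fragment {
    float interpolatedDepth; // depth produced by rasterization
    float shaderDepth;       // depth the fragment shader would write, if it writes one
};

// Counts fragment shader invocations for fragments submitted in order.
// Assumes a "less than" depth test and a depth buffer cleared to 1.0.
std::size_t countShaderInvocations(const std::vector<Fragment>& fragments,
                                   bool shaderWritesDepth) {
    float depthBuffer = 1.0f;
    std::size_t invocations = 0;
    for (const Fragment& f : fragments) {
        if (!shaderWritesDepth) {
            // Early Z: the final depth is known before shading, so an
            // occluded fragment is dropped without running the shader.
            if (f.interpolatedDepth >= depthBuffer) continue;
            ++invocations;
            depthBuffer = f.interpolatedDepth;
        } else {
            // Late Z: the depth only exists after the shader has run.
            ++invocations;
            if (f.shaderDepth < depthBuffer) depthBuffer = f.shaderDepth;
        }
    }
    return invocations;
}
```

With three overlapping fragments submitted front to back, the early Z path shades only the first one, while the shader-written-depth path shades all three.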
The reason I initially wanted to do this is that my shadow maps for directional lights have 4 cascades, and I tried generating them simultaneously by quadrupling the geometry in the geometry shader. By rendering only to the depth buffer I could avoid allocating 4 additional R32F 2048x2048 color attachment slices.
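For scale, the memory those four slices would take can be worked out directly. This is only a back-of-the-envelope sketch, assuming 4 bytes per R32F texel and no mipmaps; `textureArrayBytes` is a helper name made up for the example.

```cpp
#include <cstdint>

// Bytes for `layers` square slices of `size` x `size` texels.
constexpr std::uint64_t textureArrayBytes(std::uint64_t size,
                                          std::uint64_t layers,
                                          std::uint64_t bytesPerTexel) {
    return size * size * layers * bytesPerTexel;
}

// Four 2048x2048 R32F slices, 4 bytes per texel, no mipmaps (assumed).
constexpr std::uint64_t kCascadeColorBytes = textureArrayBytes(2048, 4, 4);
static_assert(kCascadeColorBytes == 64ull * 1024 * 1024,
              "four cascades take 64 MiB");
```

So the saving in question is 64 MiB per directional light, before any driver-side padding or alignment.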
After some profiling I noticed that this isn't really any faster than generating the cascades in individual passes, though, provided that I use decent frustum culling for each cascade.
I think the best option is to allocate 3 buffers: the depth attachment you render into, the intermediate texture, and the final texture. Note that the first 2 can be reused by other lights after the filtering, as long as the other shadow maps are the same size or smaller.
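The reuse rule can be sketched as plain bookkeeping, independent of any graphics API. Everything below (`ShadowMapDesc`, `sharedScratchSize`, `canReuseScratch`, `textureCount`) is a hypothetical name for illustration, and the sketch assumes square shadow maps.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one light's shadow map; names are illustrative.
struct ShadowMapDesc {
    int size; // width and height of the final, filtered shadow map
};

// Resolution the shared depth attachment and intermediate texture need
// so that every light can reuse them: the largest final map size.
int sharedScratchSize(const std::vector<ShadowMapDesc>& maps) {
    int largest = 0;
    for (const ShadowMapDesc& m : maps) largest = std::max(largest, m.size);
    return largest;
}

// A light can reuse the scratch buffers when its map is not larger than them.
bool canReuseScratch(int scratchSize, const ShadowMapDesc& map) {
    return map.size <= scratchSize;
}

// Texture count: two shared scratch buffers plus one final texture per light,
// instead of three textures per light.
int textureCount(int lightCount) {
    return lightCount > 0 ? 2 + lightCount : 0;
}
```

For a light whose map is smaller than the scratch buffers, one option is to render and filter into a sub-rectangle of them, for example by setting a smaller viewport.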
This is precisely what I am doing now. It is also much easier to have either a linear or exponential shadow map this way, which is needed for ESM.
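For anyone unfamiliar with ESM, the usual formulation stores exp(c * d) for the occluder depth d, filters that, and multiplies by exp(-c * z) for the receiver depth z. Below is a minimal CPU-side sketch of that math, assuming linear depths in [0, 1]; the function names and the constant `c` are placeholders, not taken from anyone's actual shader.

```cpp
#include <algorithm>
#include <cmath>

// Value written to the color texture for an occluder at linear depth d in [0, 1].
// c is the sharpness constant; with 32-bit floats exp(c * d) overflows
// once c * d exceeds roughly 88.
float esmStore(float occluderDepth, float c) {
    return std::exp(c * occluderDepth);
}

// Visibility of a receiver at linear depth z, given the (possibly filtered)
// stored value. Equivalent to clamp(exp(c * (d - z)), 0, 1) when unfiltered.
float esmVisibility(float filteredValue, float receiverDepth, float c) {
    float v = filteredValue * std::exp(-c * receiverDepth);
    return std::min(std::max(v, 0.0f), 1.0f);
}
```

With c = 40 and an occluder at depth 0.5, a receiver at 0.3 comes out fully lit (1.0), while a receiver at 0.7 comes out at about exp(-8), which is effectively in shadow. Storing exp(c * d) is what makes a separate color texture convenient here, since the value is no longer a plain hardware depth.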