Sorry for the slow response, just got back from my honeymoon!
Before doing anything else, just draw the terrain into the shadow map and see whether the artifact is even related to that. If that fixes it, then you either need to prebake the terrain into some cheaper form or pick another shadow map algorithm. Personally, I have never been satisfied with VSM in real scenes.
You shouldn't need to draw your terrain into the shadowmap as long as you clear the shadow map to the "maximum depth value".
Already cleared to max depth. The artifact I'm referring to is caused by blurring between max depth and occluders with much lower depth values. A blur across a very large depth difference produces shadows with sharp edges and reduces the ability of the blur kernel to eliminate discretization artifacts. The result is that shadows that project onto the far plane are sharp and discrete, whereas the blur antialiasing works well when the shadow is projected onto a non-terrain surface, and the shadow edges there are soft.
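To make the effect above concrete, here is a minimal Python sketch. It assumes the standard VSM Chebyshev upper bound and a blur kernel that only ever sees two depths: the occluder and the background. The function name `vsm_visibility` and all depth values are made up for illustration and are not taken from anyone's renderer in this thread.

```python
def vsm_visibility(alpha, d_occ, d_far, t):
    """Chebyshev upper bound on visibility for a receiver at depth t.

    alpha is the fraction of the blur kernel covered by the occluder
    (depth d_occ); the rest of the kernel sees the background depth d_far.
    All names and values here are illustrative assumptions.
    """
    mean = alpha * d_occ + (1.0 - alpha) * d_far              # blurred E[z]
    mean_sq = alpha * d_occ ** 2 + (1.0 - alpha) * d_far ** 2  # blurred E[z^2]
    variance = max(mean_sq - mean * mean, 0.0)
    if t <= mean:
        return 1.0  # receiver is in front of the mean depth: fully lit
    diff = t - mean
    return variance / (variance + diff * diff)


# Receiver (terrain) at depth 0.25, occluder at depth 0.2.
# Compare a background cleared to max depth (1.0) with a background
# that holds the terrain itself (0.25).
for alpha in (0.0, 0.25, 0.5, 0.75, 0.9, 0.99, 1.0):
    cleared = vsm_visibility(alpha, 0.2, 1.0, 0.25)
    with_terrain = vsm_visibility(alpha, 0.2, 0.25, 0.25)
    print(f"alpha={alpha:.2f}  cleared={cleared:.3f}  with_terrain={with_terrain:.3f}")
```

With these numbers, the cleared map stays fully lit until the occluder covers about 94% of the kernel and then drops to zero over the remaining sliver, which is the sharp edge. With the terrain depth as the background, visibility falls off linearly with coverage, which is the soft edge.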
Instead of using a shadowmap for shadows, you could bake the visibility between the terrain and the main light into an occlusion map. For each texel, cast a ray from the corresponding terrain sample towards the light, and store 0 or 1 depending on whether the ray hits the terrain or not. The intersection code can be optimized in several ways, for instance by precaching the terrain geometry instead of evaluating the fractal function on the fly.
An occlusion map has many advantages: it filters correctly, takes little memory (an 8-bit format is enough), gives you soft shadows for free, and it's view-independent. Of course, you can tweak its resolution depending on your memory and runtime budget.
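A rough CPU sketch of that bake in Python, under several assumptions: the terrain is a precached height grid with one unit per texel, the light is directional, and the ray is marched in fixed steps with nearest-texel lookups. The name `bake_occlusion` and its parameters are hypothetical, and a real implementation would likely use a smarter intersection test.

```python
import math


def bake_occlusion(heights, light_dir, step=0.5, max_steps=256):
    """Return a 2D list: 1 where the texel sees the light, 0 where terrain blocks it.

    heights[y][x] is a precached height grid (assumed 1 unit per texel).
    light_dir is (dx, dy, dz), pointing from the terrain towards the light.
    """
    rows, cols = len(heights), len(heights[0])
    length = math.sqrt(sum(c * c for c in light_dir))
    dx, dy, dz = (c / length for c in light_dir)
    result = [[1] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Start the ray at the terrain sample for this texel.
            px, py, pz = float(x), float(y), float(heights[y][x])
            for _ in range(max_steps):
                px += dx * step
                py += dy * step
                pz += dz * step
                ix, iy = int(round(px)), int(round(py))
                if not (0 <= ix < cols and 0 <= iy < rows):
                    break  # ray left the grid without hitting anything
                if heights[iy][ix] > pz:
                    result[y][x] = 0  # terrain is above the ray: occluded
                    break
    return result


# A single row with one tall column; the light comes from the +x side at 45 degrees.
occlusion = bake_occlusion([[0, 0, 5, 0, 0]], (1.0, 0.0, 1.0))
print(occlusion)
```

The stored values are hard 0/1, so the softness would come from sampling the map with ordinary texture filtering at runtime, which should be safe here because the map holds visibility rather than depth.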
That's the approach I used more than 10 years ago for shadowing my terrains (on the CPU) and it worked very well. You should be able to prototype it on the GPU rather quickly.
Just an idea!
Not sure I follow what you're describing. Could you cite a whitepaper, perhaps?
Another possibility I was looking at was ESSM. It looks like it varies the blur kernel size, so perhaps a solution lies in a soft-shadowing algorithm like that.