Cascaded shadow maps biggest problem is in the computation of the frustums of the cameras used to render the shadows. There are multiple kinds of policies;
The most common is surely the one that cuts the main cmera view frusutms into subparts according to distance and use a bouding volume of those slice to create an encompassing orthogonal frustum for the shadow camera.
There are lost efficiency in this scheme because of bounding volume of bounding volume so lots of shadow pixels end up off screen and never used. In other words you loose resolution in the visible zone.
Therefore some recent solutions using compute shaders to be able to determine the actual pixel perfect min depth and max depth percieved in a given image, then you can optimize the slices of the camera frustum to perfection making crazy high resolution shadows, especially in scenes a bit enclosed in walls.
There is another very simple policy for shadow frustums, just center the shadow camera on the view camera's position and zoom out in the direction of the light, each cascade zooming out a bit more thus logically encoding more distance in view space. But this has the problem of calculating shadows behind the view where they could be unnecessary.
I say could; because actually you never know when a long oblique shadows must drop from a high rise bulding located far behind you. this is why this simple scheme is also popular.
In my opinion; this is your scheme that fails. you should visualize the shadow zones by sampling red; blue and green to obtain this:
once you get this debugging will be easy.