When you create your orthographic matrix for rendering the first shadow cascade, the width and height (S.x and S.y) should be just big enough to encompass the first slice of your view frustum as seen from the light, as L.Spiro mentioned.
S.x and S.y are the scale of the scene on X and Y. I'm using an AABB to determine what should be in the shadow map, so the region isn't always a square that matches the 1024x1024 shadow map; it is scaled to fit so the map's space is used optimally.
Imagine your physical camera frustum has been split into 4 pieces along the z direction. When looking from the light's position, your orthographic view width and height should only be big enough to encompass the camera frustum slice you're working on.
If your first slice is fairly small, say 10 units, the width and height of the orthographic projection will probably be fairly small too, so a 1024x1024 shadow map spread over that area gives you a fairly large amount of detail.
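To put a rough number on that, here is a minimal sketch of the texel-size arithmetic. The function name worldUnitsPerTexel is made up for illustration, and it assumes the orthographic volume is mapped across the whole shadow map along that axis.

```cpp
#include <cassert>

// World-space size covered by one shadow-map texel along one axis.
// orthoExtent:   width (or height) of the light's orthographic volume, in world units.
// mapResolution: shadow map size in texels along the same axis.
// Hypothetical helper, not taken from any particular engine.
float worldUnitsPerTexel(float orthoExtent, int mapResolution) {
    assert(orthoExtent > 0.0f && mapResolution > 0);
    return orthoExtent / static_cast<float>(mapResolution);
}
```

With a 10-unit slice and a 1024-texel map, each texel covers a bit under 0.01 world units. Stretch the same map over a 1024-unit volume and each texel covers a full unit.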
S.x and S.y are for the crop matrix, which makes the shadow map encompass the frustum slice.
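For reference, a sketch of how the crop scale might be derived from the slice's AABB. This assumes the AABB is measured in the light's clip space, where the full shadow map spans [-1, 1] on each axis. The offset terms and the names CropParams, computeCrop, and applyCrop are my own additions following the commonly used crop-matrix formulation, not something stated in this thread.

```cpp
#include <cassert>

// Scale and offset of a crop matrix that stretches an AABB (in the light's
// clip space, assumed to span [-1, 1] for the full map) over the whole map.
struct CropParams {
    float scaleX, scaleY;    // S.x, S.y
    float offsetX, offsetY;  // assumed offset terms, applied after the scale
};

// minX..maxY: bounds of the frustum slice's AABB in light clip space.
CropParams computeCrop(float minX, float maxX, float minY, float maxY) {
    assert(maxX > minX && maxY > minY);
    CropParams c;
    // Each axis is scaled independently, so a non-square AABB still fills
    // the square shadow map.
    c.scaleX = 2.0f / (maxX - minX);
    c.scaleY = 2.0f / (maxY - minY);
    // Re-centre the AABB on the origin after scaling.
    c.offsetX = -0.5f * (maxX + minX) * c.scaleX;
    c.offsetY = -0.5f * (maxY + minY) * c.scaleY;
    return c;
}

// Applies the crop to one coordinate: the AABB's min maps to -1, its max to +1.
float applyCrop(float v, float scale, float offset) {
    return v * scale + offset;
}
```

Because X and Y get their own scale, an AABB that is wider than it is tall simply ends up with a smaller S.x than S.y, which is how the rectangle gets fitted to the square map.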