Difference between SDSM and PSSM?

14 comments, last by MJP 9 years ago

I've been implementing PSSM (http://http.developer.nvidia.com/GPUGems3/gpugems3_ch10.html) so far for cascaded shadow mapping. I've been reading a bit about SDSM (http://visual-computing.intel-research.net/art/publications/sdsm/sampleDistributionShadowMaps_SIGGRAPH2010_notes.pdf) recently, and I'm not sure I understand the difference between them.

From what I gather, PSSM uses a logarithmic division of the splits with static near/far Z-bounds (such as the near/far view distance), while SDSM first transforms all visible objects into the camera's view space, finds the min/max Z, and uses that for the same split formula as PSSM (logarithmic).
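For reference, the split scheme I mean is the "practical" one from the PSSM chapter, which blends a logarithmic and a uniform distribution between the near and far planes. A minimal sketch (the function name and the lambda blend weight are just my own labels):

#include <cmath>
#include <vector>

// Sketch of the practical split scheme from the PSSM chapter: blend between a
// logarithmic and a uniform distribution of split planes between nearZ and farZ.
std::vector<float> computeSplitDistances(float nearZ, float farZ, int numSplits, float lambda)
{
    std::vector<float> splits(numSplits + 1);
    for (int i = 0; i <= numSplits; ++i)
    {
        float t = static_cast<float>(i) / numSplits;
        float logSplit = nearZ * std::pow(farZ / nearZ, t); // logarithmic distribution
        float uniSplit = nearZ + (farZ - nearZ) * t;        // uniform distribution
        splits[i] = lambda * logSplit + (1.0f - lambda) * uniSplit;
    }
    return splits;
}

My understanding is that SDSM's simplest variant would just feed tighter nearZ/farZ values into the same kind of function.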

What I find confusing, though, is that the PSSM article then builds a crop matrix in Section 10.2 which takes the smaller of the frustum AABB and the visible objects' combined AABB... resulting in the same tight frustum as SDSM produces...? Isn't that the exact same thing?

Another thing: the SDSM article doesn't seem to mention tightening the frustums in the x/y dimensions?


SDSM works by rasterizing to a depth buffer on the GPU, and then using a compute shader to analyze the depth samples in order to come up with optimal projections for the shadow map splits. This can be much more accurate than using object bounding boxes, especially when you consider that it will handle occlusion.

The SDSM paper and demo propose a few different techniques. The simplest one is to just compute the min and max Z values visible to the camera using the depth buffer, which you can then use to compute optimal split distances. The paper also proposes taking things a step further by transforming every depth buffer position into the local space of the directional light, and then fitting a tight AABB per split.
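To make the simplest variant concrete, here's a rough CPU-side illustration of what the reduction computes (in practice it's a parallel reduction in a compute shader; the parameter names are placeholders, and it assumes a standard non-reversed perspective depth buffer):

#include <algorithm>
#include <cfloat>
#include <vector>

// CPU-side illustration of the min/max depth reduction that SDSM runs in a
// compute shader. nearZ/farZ are the camera projection planes; depth samples
// are assumed to be non-linear [0,1] values from a standard perspective projection.
void reduceDepthBounds(const std::vector<float>& depthBuffer,
                       float nearZ, float farZ,
                       float& outMinZ, float& outMaxZ)
{
    outMinZ = FLT_MAX;
    outMaxZ = -FLT_MAX;
    for (float d : depthBuffer)
    {
        if (d >= 1.0f)            // skip "sky" samples where nothing was rendered
            continue;
        // Convert post-projection depth back to view-space Z.
        float viewZ = (nearZ * farZ) / (farZ - d * (farZ - nearZ));
        outMinZ = std::min(outMinZ, viewZ);
        outMaxZ = std::max(outMaxZ, viewZ);
    }
    // The resulting [outMinZ, outMaxZ] range replaces the static near/far
    // bounds when computing the split distances.
}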

If you use the GPU to produce the splits, do you have to read back the values on the CPU afterwards? So the CPU has to stall for the GPU to finish before continuing?

You can generate the matrices on the GPU. It seems wasteful to dispatch a compute shader just to generate four matrices, but it works and the execution time is negligible (less than 10 µs according to GPU PerfStudio 2).

If you use the GPU to produce the splits, do you have to read back the values on the CPU afterwards? So the CPU has to stall for the GPU to finish before continuing?

A common approach is to retrieve the results from the previous frame (or the frame before that) to avoid stalling, in exchange for a very small temporal error that goes unnoticed.
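A D3D11-flavored sketch of that idea (the ring size, struct layout, and names are just illustrative): keep a small ring of staging buffers, queue a copy of this frame's reduction result, and map the copy that was issued a few frames ago so that Map never has to wait:

#include <d3d11.h>

// Read back the GPU-computed depth bounds with a few frames of latency so the
// CPU never stalls waiting on the GPU.
struct DepthBounds { float minZ; float maxZ; };

static const UINT NumFrames = 3;
ID3D11Buffer* stagingBuffers[NumFrames]; // created with D3D11_USAGE_STAGING + D3D11_CPU_ACCESS_READ

DepthBounds ReadBackDepthBounds(ID3D11DeviceContext* context,
                                ID3D11Buffer* gpuResultBuffer,
                                UINT64 frameIndex)
{
    // Queue a copy of this frame's reduction result into the current staging buffer.
    context->CopyResource(stagingBuffers[frameIndex % NumFrames], gpuResultBuffer);

    // Map the buffer written NumFrames-1 frames ago; by now the GPU has finished
    // with it, so Map returns immediately instead of stalling. For the first few
    // frames this just returns default/stale data.
    DepthBounds bounds = { 0.0f, 1.0f };
    ID3D11Buffer* oldest = stagingBuffers[(frameIndex + 1) % NumFrames];
    D3D11_MAPPED_SUBRESOURCE mapped;
    if (SUCCEEDED(context->Map(oldest, 0, D3D11_MAP_READ, 0, &mapped)))
    {
        bounds = *static_cast<const DepthBounds*>(mapped.pData);
        context->Unmap(oldest, 0);
    }
    return bounds;
}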


If you use the GPU to produce the splits, do you have to read back the values on the CPU afterwards? So the CPU has to stall for the GPU to finish before continuing?

Yes, you need to read back the results if you want to do split setup, culling, and draw setup on the CPU. If you just wait immediately you'll get a stall on the CPU where it waits for the GPU to finish processing pending commands, and that will immediately be followed by a stall on the GPU (since it's run out of commands to execute). Adding latency can allow you to avoid the stall, but if you decide to do a full frame of latency your split setup will be out of date. This can result in artifacts if the splits end up constrained too tightly for the pixels that are actually visible, which can happen due to either camera movement or object movement. It's possible to compute instantaneous velocity and use that to predict a particular pixel's position for the next frame, which can be used during the depth buffer analysis to compute bounds that (hopefully) work for the next frame. This works pretty well for simple camera movement, but will still break with more complex movement or with camera teleportation.

With modern APIs it's possible to actually do all of the setup and culling on the GPU, which eliminates the need for reading back the data. It can still be awkward and limiting depending on the API and extensions used, but it's definitely doable. In fact, the demo that I made for this article has a GPU-only implementation that makes use of a compute shader batching system and DrawIndirect, so you can look at that if you want. It's not really an implementation that would scale up to a real-world scenario (unless your game is pretty simple), but it might give you some ideas. DX12, Vulkan, and some of the newer GL extensions have even more functionality available for GPU-driven rendering, which should allow for more flexible implementations.

Is it really necessary to do all of the SDSM steps, or is just the depth reduction pass enough?

I actually just do the depth reduction pass.


Is it really necessary to do all of the SDSM steps, or is just the depth reduction pass enough?
I actually just do the depth reduction pass.

It does improve quality, if that's what you're asking. Where it really helps is cases where the visible surfaces are very constrained along the view-space X and Y axes. For instance, consider looking down a narrow alleyway. I believe the original paper or presentation had some images demonstrating this case.
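Roughly, that per-split fitting step looks like this (a CPU-side sketch of what the compute shader does; the vector type and names are placeholders): project each visible sample onto the light's axes and grow that split's AABB, which directly gives the orthographic bounds for the cascade.

#include <algorithm>
#include <cfloat>
#include <vector>

// Fit a tight light-space AABB around the visible samples belonging to one split.
struct Vec3 { float x, y, z; };

static float Dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct LightSpaceAABB { Vec3 minP; Vec3 maxP; };

LightSpaceAABB FitSplitBounds(const std::vector<Vec3>& worldPositions, // visible samples in this split
                              const Vec3& lightRight,                  // light-space basis (unit vectors)
                              const Vec3& lightUp,
                              const Vec3& lightDir)
{
    LightSpaceAABB box = { {  FLT_MAX,  FLT_MAX,  FLT_MAX },
                           { -FLT_MAX, -FLT_MAX, -FLT_MAX } };
    for (const Vec3& p : worldPositions)
    {
        // Project the sample onto the light's axes to get its light-space coordinates.
        Vec3 ls = { Dot(p, lightRight), Dot(p, lightUp), Dot(p, lightDir) };
        box.minP.x = std::min(box.minP.x, ls.x);  box.maxP.x = std::max(box.maxP.x, ls.x);
        box.minP.y = std::min(box.minP.y, ls.y);  box.maxP.y = std::max(box.maxP.y, ls.y);
        box.minP.z = std::min(box.minP.z, ls.z);  box.maxP.z = std::max(box.maxP.z, ls.z);
    }
    // The x/y extents give the orthographic left/right/bottom/top for the cascade;
    // the z extents give the receiver depth range along the light direction.
    return box;
}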

OK, yeah, since the AABB is tighter the shadow quality is always better, and cascade stabilization is then not needed.

One problem with using the depth reduction is when you don't have enough information: for example, if a wall is just behind the camera, you get the shadows wrong.

Do you know a way to avoid this problem?

One problem with using the depth reduction is when you don't have enough information: for example, if a wall is just behind the camera, you get the shadows wrong.

Do you know a way to avoid this problem?

Then you are doing it wrong. The whole idea of SDSM is to get a smaller shadow receiver volume and then project that toward the light out to infinity. You never get any missing shadows if you do it like this.
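In other words, the tight bounds from the depth reduction should only constrain the receivers; the casters are still caught by pulling the shadow near plane back toward the light (or by enabling depth clamping so casters get "pancaked" onto it). A minimal sketch, with sceneMinZ as an assumed input:

#include <algorithm>

// Turn the tight receiver bounds from the depth reduction into caster bounds:
// only the x/y extents and the far plane come from the receivers; the near
// plane is pulled back toward the light so that casters outside the receiver
// volume (e.g. a wall just behind the camera) are still rasterized into the map.
// sceneMinZ is the scene's minimum extent along the light direction.
void ComputeCasterDepthRange(float receiverMinZ, float receiverMaxZ,
                             float sceneMinZ,
                             float& outShadowNear, float& outShadowFar)
{
    outShadowNear = std::min(receiverMinZ, sceneMinZ); // extend toward the light
    outShadowFar  = receiverMaxZ;                      // receivers end here
}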

