Spot light soft shadows (variance vs. depth PCF)

Hi, I currently have two different implementations for my spot light shadows.

The first uses a depth render target and NVIDIA's hardware PCF, which gives me a bilinearly filtered comparison and creates some nice soft shadow edges. Performance is very good since I only render into the depth target.

The second is a variance implementation. As everybody knows, the blur (the Gaussian filter used by post-processing effects like HDR, blur, depth of field...) looks excellent, but the performance is not very good: I need to write a hacked color containing the moments as well as the depth, and the 2x float16 render target is written more slowly than a depth24 target. And of course there is the light bleeding problem.

Now I want to increase the softness of my PCF shadows, but I really don't know the best technique to use in real time. Any ideas? Just sample the neighbouring texels? Use a noise function (a volume texture?)? I've tried a lot of things, but each time the shadow looks very uniform (in fact I can see the pattern). Thanks a lot for any help! (Sorry for my English!)
For spot lights I use VSM with rather small shadow maps which I then blur (normally a spot light shouldn't cover too much world space). The shadows are far from hard and you might have some light bleeding, but the overall result looks really soft. For characters and other objects which need to cast highly detailed shadows you need another solution anyway.
VSM and PCF will give you the same results for a single planar occluder and receiver (as proven in the paper). As mentioned, though, light bleeding can occur when the depth distribution becomes more complex.

Still, don't think of them as separate things: VSM is a fast way to do PCF (and it can do more with hardware, like mipmapping and aniso). Indeed the two can be combined in some good hybrid strategies. PCF is just the brute-force way to filter the depth distribution.
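
For anyone who hasn't implemented it yet, here is a minimal sketch (my wording, not code from the paper) of what VSM stores and how the lookup works, written as plain C++ for clarity; in a real renderer the first function is the shadow pass writing to a two-channel float target and the second runs per fragment on the filtered moments:

// Minimal VSM sketch (CPU-side reference, not production shader code).
struct Moments { float m1, m2; };

// Shadow pass: store the first two moments of the light-space depth,
// which is assumed to be normalized to [0, 1].
Moments StoreMoments(float lightSpaceDepth)
{
    Moments m;
    m.m1 = lightSpaceDepth;                    // E[x]
    m.m2 = lightSpaceDepth * lightSpaceDepth;  // E[x^2]
    return m;
}

// Shading pass: the *filtered* moments give a mean and variance over the
// filter region, and the one-sided Chebyshev inequality bounds the fraction
// of that region lying in front of the receiver (i.e. how lit it is).
float ChebyshevUpperBound(Moments m, float receiverDepth)
{
    if (receiverDepth <= m.m1)
        return 1.0f;                               // receiver is in front: fully lit
    float variance = m.m2 - m.m1 * m.m1;
    if (variance < 0.00001f) variance = 0.00001f;  // clamp for numerical stability
    float d = receiverDepth - m.m1;
    return variance / (variance + d * d);          // upper bound on the lit fraction
}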

Blurring performance should actually be pretty good, depending of course on your kernel size. Still, we could easily do 13x13 blurs of 1024x1024 shadow maps on modern hardware quite quickly. Some tips:

1) You can get two samples at once using the bilinear filtering hardware: ask for a pixel between the two that you want, with the fractional part being the relative weights that you want each pixel to get (see the sketch after this list).

2) Make *sure* you're using a separable convolution, like Gaussian, and do the horizontal and vertical passes separately.

3) You can also apply smaller kernels several times instead of one large kernel. Ex. apply a 3x3 kernel, then triple the Gaussian sigma value and apply it again.

4) ATI can use Fetch4 to get 2-4 pixels at once if you split your textures into two one-component ones. May or may not be worth it.
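
To make tips 1 and 2 concrete, here is a sketch (an assumption about how one might set it up, not code from this thread) that folds pairs of adjacent 1-D Gaussian taps into single bilinear fetches; the separable blur then runs the resulting taps once along each axis:

#include <cmath>
#include <vector>

struct Tap { float offset; float weight; };   // offset in texels along the blur axis

std::vector<Tap> BuildLinearGaussianTaps(int radius, float sigma)
{
    // Plain per-texel Gaussian weights for offsets -radius..+radius.
    std::vector<float> w(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        w[i + radius] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += w[i + radius];
    }
    for (float& x : w) x /= sum;              // normalize so the kernel sums to 1

    // Keep the centre tap, then merge each remaining pair (i, i+1) on either
    // side into one fetch placed at the weighted position between the two
    // texels, so a single bilinear fetch reproduces the two-tap sum exactly.
    std::vector<Tap> taps;
    taps.push_back({ 0.0f, w[radius] });
    for (int i = 1; i <= radius; i += 2) {
        const float w0   = w[radius + i];
        const float w1   = (i + 1 <= radius) ? w[radius + i + 1] : 0.0f;
        const float wSum = w0 + w1;
        const float off  = i + w1 / wSum;     // fractional offset between the two texels
        taps.push_back({  off, wSum });
        taps.push_back({ -off, wSum });
    }
    return taps;
}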

There's more stuff, but those tips should get you most of the way there. In my experience VSM can be implemented with a very small performance hit over regular shadow mapping, and *much* better quality. Also note that, as Guoshima mentions, you don't need as large a shadow map since VSM can hide aliasing much better without needing to resort to huge resolutions. Thus on equal-quality comparisons, VSM can be a *huge* win (as also shown in the VSM paper).
Quote:Original post by Guoshima
For characters and other objects which need to cast highly detailed shadows you need another solution anyway.


OK, so what is this solution?
I just need to know how I can get a decent blur when I'm not using variance shadow maps.
I know I will never have the same blur quality, but I need decent quality, and so far all my blur shader tests have looked very bad! :(

Quote:Original post by AndyTX
Thus on equal-quality comparisons, VSM can be a *huge* win (as also shown in the VSM paper)


The problem is that I'm working on NVIDIA hardware, and writing to a 64-bit (8-byte) render target (instead of a 32-bit / 4-byte one) is twice as costly as writing only to a depth24 buffer (where I can disable color writes).
Is it possible to use another format for the render target?

So I have variance shadows working on a small render target, and the blur is very nice, but I still need to improve the blur of my classic depth shadow map! ;)


I've found manual bilinear 3x3 PCF (weighted PCF, not simple averaging) gives very good results even with a large magnification of a small shadow map. It's quite expensive though, as it takes 9 samples. It looks much better than some limited VSM blurs, but is probably much more expensive. Better VSM blurs (which require more time) would probably be just as good minus any light bleeding artifacts. NOTE: My VSM experience is somewhat limited, as I got it working but spent most of the time dealing with 16-bit fp precision issues on NVIDIA hardware for moderate to long ranges (even storing -1 to 1, and trying fp16x4). Also had issues with NVIDIA's fp16 texture filtering causing artifacts compared to manual bilinear filtering in "large" light ranges. I haven't had time to get back to VSM due to other work. fp32x2 works quite well, but is sloooooowww on most current cards.
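
For concreteness, here is roughly what I mean by a weighted (rather than averaged) 3x3 PCF, as a CPU-side sketch. The tent weighting is one reasonable choice, not necessarily the exact filter anyone here uses, and shadowDepth() is a made-up accessor returning the stored light-space depth at integer texel coordinates:

#include <cmath>
#include <algorithm>

// u, v: shadow-map lookup position in texel units (texel centres assumed at
// integer coordinates); receiverDepth: the fragment's light-space depth in
// the same range as the stored values.
float WeightedPCF3x3(float u, float v, float receiverDepth,
                     float (*shadowDepth)(int, int))
{
    const int   cx = (int)std::floor(u);
    const int   cy = (int)std::floor(v);
    const float fx = u - cx;                   // fractional position inside the texel
    const float fy = v - cy;

    float lit = 0.0f, weightSum = 0.0f;
    for (int j = -1; j <= 1; ++j) {
        for (int i = -1; i <= 1; ++i) {
            // Separable tent weights centred on the true (fractional) sample
            // position: comparisons nearer the shaded point count more.
            const float wx = std::max(0.0f, 1.5f - std::fabs((float)i - fx));
            const float wy = std::max(0.0f, 1.5f - std::fabs((float)j - fy));
            const float w  = wx * wy;
            const float stored = shadowDepth(cx + i, cy + j);
            lit       += w * (receiverDepth <= stored ? 1.0f : 0.0f);
            weightSum += w;
        }
    }
    return lit / weightSum;                    // 1 = fully lit, 0 = fully in shadow
}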

You could probably get similar results with 3-4 weighted hardware bilinear PCF lookups (NVIDIA cards only), offset from the original sample points.

A 4x4 dithered average PCF (only 4 samples per pixel, mentioned in one of the Graphics Gems books I think) looks good for less work, but still has some visible blockiness in the worst situations due to the averaging. It does require the screen position to calculate the sample offsets, so it's still got a few more instructions than a standard 4-sample filter.
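
Something along these lines (my reconstruction of the idea, not the book's code; pcfCompare() is a made-up single-tap depth comparison returning 0 or 1): the low bits of the screen position pick which quarter of a 4x4 texel footprint this pixel samples, so a 2x2 block of screen pixels covers the whole footprint between them and the error shows up as noise rather than blocks.

// 4-sample dithered PCF over a 4x4 texel footprint.
float DitheredPCF4(float u, float v, float receiverDepth,
                   int screenX, int screenY,
                   float (*pcfCompare)(float, float, float))
{
    // Alternate the sub-pattern in a 2x2 screen-space arrangement.
    const float ox = (screenX & 1) ? 0.5f : -0.5f;
    const float oy = (screenY & 1) ? 0.5f : -0.5f;

    float lit = 0.0f;
    for (int j = 0; j < 2; ++j)
        for (int i = 0; i < 2; ++i)
            lit += pcfCompare(u + ox + (float)(i * 2 - 1),
                              v + oy + (float)(j * 2 - 1),
                              receiverDepth);
    return lit * 0.25f;
}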

I hear of people using rotated sample locations (4 samples or more) to good effect, but I haven't tried it much. If anyone (wolf?) wishes to speak about their usage of / attempts at this, it would be welcome by me as well.

Anything that is ordered (such as averaging PCF) will probably look "blockier" unless jittered or rotated. The bilinear weighting helps this to some extent compared to simple averaging, but it's still not as good as jittered/rotated sampling. I'm no expert on this subject, so you can always just try things. There are a few papers out there discussing and comparing different PCF techniques, Poisson disc filters, and the like.
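
As a sketch of the jittered/rotated idea (the offsets and the hash here are purely illustrative, not taken from any particular paper): rotate a small Poisson-style offset set by a pseudo-random angle derived from the screen position, so neighbouring pixels use different patterns and the ordered banding turns into noise.

#include <cmath>

float RotatedPoissonPCF(float u, float v, float receiverDepth,
                        int screenX, int screenY, float filterRadiusTexels,
                        float (*pcfCompare)(float, float, float))
{
    // Small, hand-picked disc of offsets inside the unit circle.
    static const float offsets[4][2] = {
        { -0.326f, -0.406f }, { -0.840f, -0.074f },
        { -0.696f,  0.457f }, {  0.962f, -0.195f } };

    // Cheap integer hash of the screen position -> rotation angle in [0, 2*pi).
    const unsigned h = (unsigned)screenX * 73856093u ^ (unsigned)screenY * 19349663u;
    const float angle = (float)(h % 1024u) * (6.2831853f / 1024.0f);
    const float s = std::sin(angle), c = std::cos(angle);

    float lit = 0.0f;
    for (int k = 0; k < 4; ++k) {
        const float ox = (offsets[k][0] * c - offsets[k][1] * s) * filterRadiusTexels;
        const float oy = (offsets[k][0] * s + offsets[k][1] * c) * filterRadiusTexels;
        lit += pcfCompare(u + ox, v + oy, receiverDepth);
    }
    return lit * 0.25f;
}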

All that said, I would also keep working on VSM. I think it has good potential, and anything that can reduce texture memory usage for shadows (i.e. store at lower resolutions for comparable quality) will help speed tremendously.
Quote:Original post by zoret
The problem is that I'm working on NVIDIA hardware, and writing to a 64-bit (8-byte) render target (instead of a 32-bit / 4-byte one) is twice as costly as writing only to a depth24 buffer (where I can disable color writes).
Is it possible to use another format for the render target?

Texture formats are certainly a problem right now as they are highly unorthogonal on both vendors...

Unfortunately, you will probably need at least 16 bits for depth and another 16 for depth^2. Theoretically all you need then is 32 bits, and indeed you can use G16R16 on ATI. Unfortunately I know of no similar format on current NVIDIA cards, and thus you'll have to use 4x fp16 (annoyingly enough, 2x fp16 is not renderable).

I expect this will get better with new hardware, but for now you may just have to eat the bandwidth cost, making the shadow map smaller if necessary.
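
If it helps, you can at least probe what your particular card will accept before committing to fp16x4; something like this (assumes a valid IDirect3D9 pointer and the usual HAL device with an X8R8G8B8 display mode, so adjust to your setup):

#include <d3d9.h>

// Returns true if 'fmt' can be created as a render-target texture.
bool IsRenderableTexture(IDirect3D9* d3d, D3DFORMAT fmt)
{
    return SUCCEEDED(d3d->CheckDeviceFormat(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
                                            D3DFMT_X8R8G8B8,
                                            D3DUSAGE_RENDERTARGET,
                                            D3DRTYPE_TEXTURE, fmt));
}

// Example fallback chain: prefer the 32-bit-per-texel two-channel formats,
// otherwise eat the bandwidth cost of fp16x4.
D3DFORMAT PickMomentFormat(IDirect3D9* d3d)
{
    if (IsRenderableTexture(d3d, D3DFMT_G16R16F)) return D3DFMT_G16R16F;
    if (IsRenderableTexture(d3d, D3DFMT_G16R16))  return D3DFMT_G16R16;
    return D3DFMT_A16B16G16R16F;
}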
Quote:Original post by multisample
Better VSM blurs (which require more time) would probably be just as good minus any light bleeding artifacts.

Yep - actually, with VSMs it's possible to reproduce exactly, as a preprocessing step, any uniform linear filtering that you would otherwise do per-fragment. In particular, NxN Gaussians are what we used the most in the paper. The advantage of doing it in advance is that you only have to do it once per shadow-map pixel (rather than per fragment), and it's often separable (such as a Gaussian), making really huge filter kernels quite viable.

Quote:Original post by multisample
Also had issues with NVIDIA's fp16 texture filtering causing artifacts compared to manual bilinear filtering in "large" light ranges.

You too? I thought it was just me! I was always surprised when the manual bilinear filtering produced better results than the hardware... guess they're doing it in lower precision somewhere...

Quote:Original post by multisample
I hear of people using rotated sample locations (4 samples or more) to good effect, but I haven't tried it much.

ATI has quite a few papers on this. Indeed the goal is to try and further hide the discrete nature of the shadow map - it works well with enough samples. On ATI you can use Fetch4 to do quite a few samples and get a pretty good result actually.

Regarding bilinear weights, you should probably always use them. Otherwise you're going to be dealing with only N possible contributions... unless you're taking 256 samples or more (unlikely), this is quite undesirable.
Quote:Unfortunately I know of no similar format on current NVIDIA cards, and thus you'll have to use 4x fp16 (annoyingly enough, 2x fp16 is not renderable).


Both D3DFMT_G16R16F and D3DFMT_G16R16 are renderable and filterable on my 6800GT.
Quote:
Both D3DFMT_G16R16F and D3DFMT_G16R16 are renderable and filterable on my 6800GT.


Actually, of those two only the fp16x2 format is renderable on the 6/7 series (according to NVIDIA's GPU Programming Guide and my experience). Both formats are filterable. NVIDIA could be using fp16x4 behind our backs, but I don't think they are.

AFAIK, filtering on NVIDIA's hw happens at the same bit/precision as the source, so fp16 is filtered using fp16, rgba8 using rgba8, and fp32 as fp32. Hence filtering results can be less than spectacular when you filter outside of the range/precision of the source.
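
For comparison, this is essentially what "manual bilinear filtering" of the moments looks like (CPU-side sketch; fetchMoments() is a made-up point fetch of the stored depth/depth^2 pair at integer texel coordinates): the blend itself runs at full float precision regardless of how the texture is stored, which is why it can beat the hardware filter here.

#include <cmath>

struct Moments { float m1, m2; };

Moments ManualBilinearMoments(float u, float v, Moments (*fetchMoments)(int, int))
{
    // Texel centres assumed at integer coordinates in texel units.
    const int   x0 = (int)std::floor(u);
    const int   y0 = (int)std::floor(v);
    const float fx = u - x0;
    const float fy = v - y0;

    const Moments s00 = fetchMoments(x0,     y0);
    const Moments s10 = fetchMoments(x0 + 1, y0);
    const Moments s01 = fetchMoments(x0,     y0 + 1);
    const Moments s11 = fetchMoments(x0 + 1, y0 + 1);

    // Standard bilinear blend of both moments, carried out in fp32.
    Moments r;
    r.m1 = (s00.m1 * (1 - fx) + s10.m1 * fx) * (1 - fy) +
           (s01.m1 * (1 - fx) + s11.m1 * fx) * fy;
    r.m2 = (s00.m2 * (1 - fx) + s10.m2 * fx) * (1 - fy) +
           (s01.m2 * (1 - fx) + s11.m2 * fx) * fy;
    return r;
}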


Quote:Original post by AndyTX
ATI has quite a few papers on this. Indeed the goal is to try and further hide the discrete nature of the shadow map - it works well with enough samples. On ATI you can use Fetch4 to do quite a few samples and get a pretty good result actually.

Regarding bilinear weights, you should probably always use them. Otherwise you're going to be dealing with only N possible contributions... unless you're taking 256 samples or more (unlikely), this is quite undesirable.


I haven't found many papers that use only around 4-6 samples, which is what I'm finding acceptable for what I need (multiple lights, multiple shadows, current gen). If I had only one shadow, the 12-sample approach might be doable. Does anyone have a reference to any lower-sample rotated/jittered versions that look good?

As for the bilinear weights, I totally agree. I messed around with this a bit and I personally think it's almost essential. Someone could probably work out the 4x4 dithered filter to use bilinear weights for each sample and get good results. At a decent distance the 4x4 dithered filter looks pretty good (due to the "noise" factor), so a bilinear-weighted version may work well.

As for VSM and fp16, I was fighting the filtering problems (I could live with manual filtering if I can blur the maps), but I still had problems with medium to large light ranges. I believe it's due to the "square" of the depth, since it requires much more precision than the depth itself. I was wondering if the Chebyshev inequality would work with depth and sqrt(depth), as that might give better results (in this case treating sqrt(depth) as the random variable, and depth as the "square" of the random variable). You would probably still have issues though. What were your results with light ranges that were acceptable (relative to your smallest shadowable range)? Maybe I should move this to another thread, as I think I may have hijacked it a bit :)
Quote:Original post by multisample
I believe it's due to the "square" of the depth, since it requires much more precision than the depth itself.

Yeah, the higher moments need more precision to represent. Specifically, Chebyshev's inequality is highly numerically unstable.
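
For reference, writing it out makes the instability obvious:

$$\mu = E[x], \qquad \sigma^2 = E[x^2] - \mu^2, \qquad P(x \ge t) \;\le\; \frac{\sigma^2}{\sigma^2 + (t - \mu)^2}.$$

When the local variance is small, $E[x^2]$ and $\mu^2$ agree in almost all of their significant bits, so the subtraction cancels most of an fp16 mantissa and the bound ends up being computed largely from rounding noise.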

Quote:Original post by multisample
I was wondering if the Chebyshev inequality would work with depth and sqrt(depth), as that might give better results (in this case treating sqrt(depth) as the random variable, and depth as the "square" of the random variable).

I'm not sure if that would help. I think we're actually dealing with a "true" precision issue here, in that we've lost too much data to properly combine lots of distributions and still get an extremely accurate result.

Quote:Original post by multisample
What were your results with light ranges that were acceptable (relative to your smallest shadowable range)?

We got decent results for fp16 and fx16 (better with the latter of course), and pretty much perfect results for fp32 (as expected with a 24-bit mantissa). fx32 would be ideal, and potentially one could encode it into 4x fx8, but one may lose precision in the process.
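
A sketch of the 4x fx8 encoding idea (illustrative only, not something from the paper): pack a depth in [0, 1) as successive base-256 digits across the four channels. The precision caveat mentioned above is real: hardware filtering of the packed texture works on each 8-bit channel at limited precision, so in practice you would fetch point samples and do any filtering after decoding.

#include <cstdint>
#include <cmath>

// Pack a depth value in [0, 1) into four 8-bit channels (most significant first).
void PackDepthToRGBA8(float depth, uint8_t out[4])
{
    float d = depth;
    for (int i = 0; i < 4; ++i) {
        d *= 256.0f;
        const float digit = std::floor(d);
        out[i] = (uint8_t)digit;
        d -= digit;
    }
}

// Recombine the four base-256 digits back into a single depth value.
float UnpackDepthFromRGBA8(const uint8_t in[4])
{
    return in[0] / 256.0f + in[1] / 65536.0f +
           in[2] / 16777216.0f + in[3] / 4294967296.0f;
}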
