Hi,

I have had Variance Shadow Maps running for a while now, and I now have some time to improve them. The old thread was retired: http://www.gamedev.net/community/forums/topic.asp?topic_id=374989&PageSize=25&WhichPage=5 So I had to open a new one.

I want to look into light bleeding artefacts. What I did first was go through the equations to check that I understand everything. I need to write some short documentation on how it works. Here is what I have:

-----------------

The value x is the depth value, calculated from the distance of the light to the vertex. The value x (also called Moment 1) is stored in the x channel of the two-channel render target in the first render pass. E(x) holds the result of filtering x.

The value x^2 (Moment 2) is stored in the y channel in the first render pass. E(x^2) holds the result of filtering x^2.

Equation for the mean μ:

$$\mu = E(x) = M_1$$

The mean μ, i.e. E(x), is the result of filtering the x channel of the texture that holds the depth values x.

Equation for the variance:

$$\sigma^2 = E(x^2) - E(x)^2 = M_2 - M_1^2$$

x^2 is the depth value multiplied by itself and stored in the y channel of the render target. E(x^2) holds the result of filtering this value.

Chebyshev's inequality (one-tailed version):

$$P(x \ge t) \le p_{\max}(t) = \frac{\sigma^2}{\sigma^2 + (t - \mu)^2}$$

The variable t holds the current fragment depth.

-----------------

The interesting part is that all implementations I see seem to swap the two terms of t - μ in the source code. It comes down to the following two lines:
// Difference between the filtered mean depth (moments.x, i.e. mu) and the fragment depth t
float m_d = (moments.x - RescaledDistToLight);

// Chebyshev's inequality
float p = variance / (variance + m_d * m_d);

t is obviously the variable RescaledDistToLight and the mean value μ is moments.x ... which means it seems to come down to (-μ - (-t))^2. If this is true, would I be using negative values in render targets? I guess I am missing something here.

The part I do not get is the following: P(x >= t) ... I can see that t is the fragment's depth value, but following the naming convention above, x would be the depth value that is stored in the x channel of the render target, and it would not represent the occluder depth distribution function ... which is probably E(x) or μ? So t > E(x) is a requirement, but for t <= E(x) standard shadow mapping would jump in.

Any clarification is appreciated,
- Wolf
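As a cross-check of the equations above, here is a minimal CPU-side sketch in Python (not shader code). It box-filters a handful of made-up occluder depths into the two moments and evaluates the Chebyshev bound. The function names `moments` and `chebyshev_upper_bound` and the sample depths are assumptions for illustration only; a plain average stands in for whatever filtering the hardware does.

```python
def moments(depths):
    """Box-filter depths into (M1, M2) = (E(x), E(x^2))."""
    n = len(depths)
    m1 = sum(depths) / n
    m2 = sum(d * d for d in depths) / n
    return m1, m2


def chebyshev_upper_bound(m1, m2, t):
    """p_max(t) = variance / (variance + (t - mean)^2); only meaningful for t > mean."""
    variance = m2 - m1 * m1
    m_d = m1 - t  # same operand order as the shader lines; the sign vanishes when squared
    return variance / (variance + m_d * m_d)


# Hypothetical filter footprint: half the texels see an occluder at depth 0.2,
# the other half see the receiver at depth 0.6.
m1, m2 = moments([0.2, 0.2, 0.6, 0.6])
p = chebyshev_upper_bound(m1, m2, t=0.6)
print(m1, m2, p)
```

With these sample depths the bound comes out at about 0.5, which is also the fraction of texels at or beyond t, so for this simple two-depth case the bound is tight.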

##### Share on other sites
Quote:
 (-μ - (-t))^2
... I solved my own problem here :-) ... it does not matter whether I write t - μ or μ - t because I only need the difference, regardless of its sign ... if it comes out negative, the m_d * m_d line will make it positive anyway :-)

[Edited by - wolf on March 20, 2007 8:46:32 PM]

##### Share on other sites

Andrew's done a whole bunch of new work on VSM since the I3D paper, some of it related to light bleeding.

He is also contributing a chapter to GPU Gems 3 with his new research.

##### Share on other sites
Yeah exactly :) The direction of comparison only matters for selecting which "tail" of the distribution you are using (i.e. t < mu or t > mu).
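The tail selection can be written out explicitly. The following Python sketch (a CPU-side illustration, not the demo's shader) returns full light when the fragment is not behind the mean occluder depth and the Chebyshev bound otherwise. The name `vsm_visibility` and the clamp of the variance to zero are assumptions made for this example.

```python
def vsm_visibility(m1, m2, t):
    """Fraction of light reaching a fragment at depth t, given filtered moments."""
    if t <= m1:
        # Not behind the mean occluder depth: treat as fully lit,
        # as a standard shadow-map comparison would.
        return 1.0
    variance = max(m2 - m1 * m1, 0.0)  # guard against tiny negative values from rounding
    d = t - m1                         # t - mu or mu - t gives the same square
    return variance / (variance + d * d)


print(vsm_visibility(0.4, 0.2, 0.3))   # fragment in front of the mean -> 1.0
print(vsm_visibility(0.4, 0.2, 0.6))   # fragment behind the mean, some variance
print(vsm_visibility(0.4, 0.16, 0.6))  # fragment behind a flat occluder -> 0.0
```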

On a separate note, have you had much luck with the light bleeding reduction stuff that I sent you? It seems to work pretty well with my "toy" scenes at least.

##### Share on other sites
Hey this looks cool. I just checked out the demo and the results look great, but ... unfortunately the technique does not seem to be fast enough on my target hardware platforms :-) ... so I will invest some time to make VSM more bullet-proof.
Would you guys be interested in seeing the proposals regarding VSM for ShaderX6? I would appreciate your comments here. If you are interested in cross-referencing, ask your editor whether you can send me your paper, and we will refer to it in ShaderX6 :-). We usually do things like this ...

##### Share on other sites
One interesting thing: If I click distribute precision on my GeForce 7900 GTX there is no real difference in speed .. this might have something to do with the fact that the 7900 does not have a "native" 32:32 format. How does Summed-Area VSM work then with 16:16:16:16 render targets?
The softness slider is mind-blowing :-) ... hm the difference between 32:32:32:32 and 32:32 is obvious :-) ... cool.
What does light bleeding reduction do? Using the Markov term? What does softness do?

##### Share on other sites
We use some of your suggestions for reducing light bleeding in certain scenes where self-shadowing of characters is important. Thanks for this.

##### Share on other sites
Is there any way for VSMs to look nice on older machines? I tried using them with some older hardware, and I couldn't get them to look decent without 32f color channels.

##### Share on other sites
I believe this will be difficult with the current implementation. If even a 16:16 render target is not enough, you will have trouble running 32:32 on older hardware. What you might try is to distribute the two 16-bit values over 8:8:8:8 ... some hardware platforms need this.
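One way to picture the 8:8:8:8 idea is to split each 16-bit fixed-point moment into a high and a low byte. The Python sketch below is only an illustration of that packing on the CPU; the function names and the choice of 65535 as the scale are assumptions, and it ignores how a particular GPU would filter the packed channels.

```python
def pack16_to_8_8(value):
    """Split a value in [0, 1] into the high and low bytes of a 16-bit fixed-point number."""
    q = int(round(min(max(value, 0.0), 1.0) * 65535))
    return q >> 8, q & 0xFF


def unpack8_8_to_16(hi, lo):
    """Rebuild the value from the two 8-bit channels."""
    return ((hi << 8) | lo) / 65535.0


# Both moments of one texel spread over four 8-bit channels (R, G = M1; B, A = M2).
depth = 0.3
rgba = pack16_to_8_8(depth) + pack16_to_8_8(depth * depth)
m1 = unpack8_8_to_16(rgba[0], rgba[1])
m2 = unpack8_8_to_16(rgba[2], rgba[3])
print(rgba, m1, m2)
```

The round trip keeps the error within half a 16-bit step, compared with half an 8-bit step if each moment were stored in a single 8-bit channel.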

##### Share on other sites
Doing nothing but changing from 16:16 to 32:32 improved my VSM implementation immensely. I haven't done much with it, so perhaps I was doing something wrong.

8:8:8:8 is basically the same as 16:16 as far as precision goes, right?

##### Share on other sites
More or less ... it is just that 8:8:8:8 is supported by most hardware.

##### Share on other sites
Ok, let me try to respond to all of the questions/comments... sorry if I miss any!

Quote:
 Original post by wolf
 Hey this looks cool. I just checked out the demo and the results look great, but ... unfortunately the technique does not seem to be fast enough on my target hardware platforms :-)

Yeah summed-area tables are pretty heavy, and feasible really only on G80's and maybe X1900s right now. That said, they're a forward-looking technique that should work really well for plausible soft shadows, and are much faster than PCSS, etc. I'll get really excited about them when we have hardware doubles (even if slow), and we can drop the distribute precision stuff. Doubles will completely clear up any precision issues.

Quote:
 Original post by wolf
 ... so I will invest some time to make VSM more bullet-proof.

Yeah, IMHO VSMs combined with, ideally, all of multisampling, mipmapping + trilinear filtering, a small-to-medium blur (maybe 4x4 or more), and a light bleeding reduction function are the current best solution. Specifically on G80, which supports fp32 filtering and multisampling, the results are flawless and ridiculously fast (400+fps).

Quote:
 Original post by wolf
 Would you guys be interested in seeing the proposals regarding VSM for ShaderX6? I would appreciate your comments here. If you are interested in cross-referencing, ask your editor whether you can send me your paper, and we will refer to it in ShaderX6 :-). We usually do things like this ...

Yeah, I'd love to see the proposals and give feedback, etc. You have my e-mail address, correct? If not, please PM me.

Quote:
 Original post by wolf
 One interesting thing: If I click distribute precision on my GeForce 7900 GTX there is no real difference in speed .. this might have something to do with the fact that the 7900 does not have a "native" 32:32 format.

At one point there was something broken with the 7000 series... in particular they were reporting a format as filterable that really was not. I'll have to look into that in more detail at some point...

Quote:
 Original post by wolf
 How does Summed-Area VSM work then with 16:16:16:16 render targets?

Probably very badly... SAVSMs really need doubles, although they can be "made to work" decently with 4xfp32 as the demo shows.

Quote:
 Original post by wolf
 What does light bleeding reduction do?

It uses the function that I sent you earlier and described a bit in the post... it just lops off one tail of the distribution (with artist-editable aggressiveness).

Quote:
 Original post by wolf
 What does softness do?

It varies the minimum filter width. For PCF, this means taking more samples which is O(n^2). For standard VSM it means increasing the size of the separable blur (although this isn't implemented in that version of the demo - it just does a LOD bias which doesn't look very good ;)). In the D3D10 demo that I have, this is implemented correctly and costs O(n) due to the separable blur. For SAVSM this just literally means clamping the minimum size of the filter rectangle, which is O(1).
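To make the O(1) claim for summed-area tables concrete, here is a small single-channel Python sketch. It builds a table of prefix sums and then averages any rectangle with four lookups, with a `min_size` argument standing in for the softness clamp. The names `build_sat` and `box_mean`, and the choice to grow the rectangle only towards larger coordinates, are assumptions for illustration; an SAVSM would hold the moments in the table and would typically centre the rectangle.

```python
def build_sat(img):
    """2D prefix sums with one row and one column of zero padding."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_mean(sat, x0, y0, x1, y1, min_size=1):
    """Mean over texels [x0, x1) x [y0, y1) using four table lookups.

    min_size clamps the rectangle to a minimum width and height (the "softness").
    Assumes x0 and y0 lie inside the image.
    """
    h, w = len(sat) - 1, len(sat[0]) - 1
    x1 = min(max(x1, x0 + min_size), w)
    y1 = min(max(y1, y0 + min_size), h)
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
sat = build_sat(img)
print(box_mean(sat, 0, 0, 2, 2))              # 2x2 block in the corner
print(box_mean(sat, 1, 1, 2, 2, min_size=2))  # 1x1 request widened to 2x2
```

The cost of `box_mean` does not depend on the rectangle size, which is why widening the filter is free once the table exists.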

Quote:
 Original post by stanlo
 Is there any way for VSMs to look nice on older machines? I tried using them with some older hardware, and I couldn't get them to look decent without 32f color channels.

Yeah VSMs do need high precision. The best that I can suggest is to use the tips that I gave in my GDC presentation last year (slides available at NVIDIA developer site). 16-bit precision is acceptable, and distributing that works fairly well for normal VSMs. 8-bit - even if distributed - may be pushing it, but it depends on the depth range of your light. Make sure to clamp the latter as aggressively as possible if precision is a problem! Also make sure that you're using a linear depth metric - such as distance to light point (for spot lights) or distance to light plane (for directional lights).
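The two linear depth metrics mentioned here can be sketched as follows. This is a CPU-side Python illustration, not code from the presentation; the function names, the `near`/`far` range parameters and the sample positions are all assumptions. The point is only that depth is a plain distance rescaled to [0, 1] over a range that is clamped as tightly as the light allows.

```python
import math


def rescaled_dist_to_light(pos, light_pos, near, far):
    """Linear depth for a spot light: distance to the light position, mapped to [0, 1]."""
    dist = math.dist(pos, light_pos)
    return min(max((dist - near) / (far - near), 0.0), 1.0)


def rescaled_dist_to_plane(pos, plane_point, light_dir, near, far):
    """Linear depth for a directional light: distance along the unit light direction."""
    dist = sum((p - q) * d for p, q, d in zip(pos, plane_point, light_dir))
    return min(max((dist - near) / (far - near), 0.0), 1.0)


# The two moments written in the shadow pass would then be (d, d * d).
d = rescaled_dist_to_light((0.0, 0.0, 5.0), (0.0, 0.0, 0.0), near=1.0, far=9.0)
print(d, d * d)
```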

##### Share on other sites
Hey Andy,
thanks for the extensive description. I just looked closer today at your new approach. Maybe it is my graphics card, but it seems like the shadow is moving slightly ... it is a kind of shimmering. You can see this near the back bumper of the car. It happens with both 512x512 and 1024x1024, independently of where the light bleeding slider is and also independently of the softness slider ...
Maybe it only happens on my 7900 GTX with 93.71 drivers ... slightly outdated.

- Wolf

##### Share on other sites
Quote:
 Original post by wolf
 thanks for the extensive description. I just looked closer today at your new approach. Maybe it is my graphics card but it seems like the shadow is moving slightly ... it is a kind of shimmering. You can see this near the back bumper of the car.

Almost certainly numeric problems, although they should be mitigated somewhat by increasing the softness, or decreasing the shadow map resolution. Doubles will solve this in the (near) future.

As I mentioned, SAVSMs will become more useful when people really want to do plausible soft shadows, which is still a little ways off, and by then doubles should be supported.

For now, standard hardware-filtered VSMs work really well, especially on the 8000 series! I'll release the D3D10 demo soonish (it'll be in Gems 3 at the very least)...

##### Share on other sites
Actually on further reflection it may not be a precision problem. I remember having some trouble on both ATI cards and G70/NV40 with respect to dynamic flow control depth... IIRC that was causing random blinking blocks and other weirdness, which may be what you're seeing. At one point, merely loading the shader on ATI would instantly reboot the computer!

In any case the control flow depth problems are easy to work around if you're targeting these series of cards; I just didn't bother since I've been concerned mostly with G80 these days :)

##### Share on other sites
Are there any articles on how to reduce light bleeding or can anyone describe his (or hers) working approach?

How severely is performance affected by VSM? It seems that GL_RGB32F, full screen blur and linear interpolation in the shader run at 30 FPS on my laptop (NVidia 7900 Go) while standard shadow maps with 24 bit fixed depth buffer and the free PCF run with > 200 FPS.

Another thing I noticed is that when using GL_RGBA16F and storing the moments in two components to enhance precision, I see way more artefacts when using GL_LINEAR filtering (like "random" white pixels) than when doing linear interpolation in the shader and using GL_NEAREST. Does that sound like a bug, or are those expected precision problems?

Thanks!

##### Share on other sites
Quote:
 Original post by krausest
 Another thing I noticed is that when using GL_RGBA16F and storing the moments in two components to enhance precision, I see way more artefacts when using GL_LINEAR filtering (like "random" white pixels) than when doing linear interpolation in the shader and using GL_NEAREST. Does that sound like a bug, or are those expected precision problems?

If you're using NVIDIA cards (7 series or lower, I can't speak for the 8 series), then this is "normal" behaviour. AFAIK, NVIDIA cards interpolate using the bit depth that the texture is in. So an F16 texture uses F16 for interpolation, causing the artifacts. I sure hope that's changed for the 8 series. I don't believe recent ATI cards do this, but don't quote me on that.
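The effect of half precision on the variance can be seen without a GPU. The Python sketch below rounds the two moments of a single flat surface (true variance: zero) to fp16 and to fp32 using the standard `struct` module, then recomputes the variance. It is a simplified stand-in for what happens when storage or interpolation is done at the texture's bit depth, not a model of any particular card; the helper names and the sample depth are assumptions.

```python
import struct


def to_fp16(x):
    """Round x to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round x to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def variance_after_storage(depth, quantize):
    """Variance recovered from the stored moments of one flat surface."""
    m1 = quantize(depth)
    m2 = quantize(depth * depth)
    return m2 - m1 * m1


depth = 0.8
print(variance_after_storage(depth, to_fp16))
print(variance_after_storage(depth, to_fp32))
```

For this depth the half-precision result is off by several orders of magnitude more than the single-precision one, and an error of that size in the variance feeds straight into the Chebyshev bound.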

This issue is the main reason I had to dump VSM as a general solution (at least for now). I really want to use VSM, but it has serious problems using F16-based formats, and that's all most cards (NVIDIA) can do respectably these days. Add a large light range to the equation (for city levels etc.) and it gets worse. If someone solves this it would be much appreciated (free beer? :) ).

##### Share on other sites
Yeah, as noted, VSM is only usable with fp16 for fairly small lights. The GeForce 8 series supports full filtering for fp32 textures and thus has *no* precision problems. Quite a joy to use :)

Regarding speed, on the 8800 VSM is *way* faster than an equivalent PCF implementation. It runs on the order of 600+fps in D3D10 even with gigantic blurs.

Regarding light bleeding, check out the beyond3D thread linked earlier in the thread by Pragma. It discusses a simple and pretty-much free way of reducing light bleeding. Of course degenerate cases can be constructed, but I find that it produces quite good and acceptable results in practice.

##### Share on other sites
Quote:
 Original post by AndyTX
 Regarding light bleeding, check out the beyond3D thread linked earlier in the thread by Pragma. It discusses a simple and pretty-much free way of reducing light bleeding. Of course degenerate cases can be constructed, but I find that it produces quite good and acceptable results in practice.

Thanks for your replies so far!
Just to leave no doubt: You propose
p = smoothstep(threshold, 1.0, p); where p is the upper bound from Chebyshev's inequality?

##### Share on other sites
Actually I've been using "linstep" lately so as to try and maintain the original shape of the falloff function, but smoothstep will work fine too. You can change the falloff function as desired: the point is to clip off the tail to avoid light bleeding.

##### Share on other sites
I can't find any documentation on these "linstep" and "smoothstep" functions. What are they and how can I implement them?

##### Share on other sites
smoothstep is a built-in intrinsic function. linstep isn't for some reason but can be trivially implemented:

float linstep(float min, float max, float v)
{
    return clamp((v - min) / (max - min), 0, 1);
}

You can of course vectorize this if required. It can also be implemented in terms of lerp, but there's no real advantage in that, since the divide is required in either case.
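For anyone who wants to try the remapping outside a shader, here is a small Python sketch of both falloff functions and of the tail clipping discussed above. `smoothstep` follows the usual Hermite formula of the shader intrinsic; `reduce_light_bleeding` and its `amount` parameter are names made up for this example.

```python
def linstep(low, high, v):
    """Linear ramp from 0 at low to 1 at high, clamped outside that range."""
    return min(max((v - low) / (high - low), 0.0), 1.0)


def smoothstep(low, high, v):
    """Hermite-smoothed ramp: t * t * (3 - 2 * t) applied to the clamped linear ramp."""
    t = linstep(low, high, v)
    return t * t * (3.0 - 2.0 * t)


def reduce_light_bleeding(p_max, amount):
    """Remap [amount, 1] to [0, 1]; any bound below amount becomes fully shadowed."""
    return linstep(amount, 1.0, p_max)


print(reduce_light_bleeding(0.2, 0.3))  # below the threshold: clipped to 0.0
print(reduce_light_bleeding(1.0, 0.3))  # fully lit stays 1.0
print(reduce_light_bleeding(0.65, 0.3))
```

Raising `amount` removes more bleeding but also darkens and narrows genuine penumbrae, so it is a trade-off to tune per scene.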