# Extending VSM - cheap PCSS & alternate representation

## Recommended Posts

lonesock    807
Hi, all. I've just started playing with shadow maps, and I started implementing vanilla VSM because I found the concept to be quite elegant. I've found 2 things which are simple ways to extend the concept:

1) Since we already have the sigma values, I can use them to estimate the blocker position (a la PCSS). If you were going to do an 8x8 search for a blocker, instead get the average depth & depth^2 from the 3rd mipmap level, and:

    d_blocker ~= avg_depth - 1.5*sigma; where: sigma = sqrt(avg_depth2 - avg_depth*avg_depth)

You then need to clamp the blocker distance between 0 and the fragment distance, or you could get negative/undefined numbers. Using the standard PCSS calculation I get a penumbra width estimate... take the log2() of that, and use it as my mipmap level for the regular VSM calculation!

2) Storing the depth^2 values gets tricky: instead of storing depth & depth^2, I first scale all my depth values so that they are in the range [0..1] (by knowing something about the scene in question and using a uniform float inverse_max_depth). Then I use the knowledge that if X is in the range [0..1], then X - X^2 is always in the range [0..1/4]. Also, X - X^2 is very linear near X=0 and X=1, and flat near X=0.5, where X itself is linear. So I store depth and encoded = 4.0*(depth - depth^2). I think this gives me better precision over the entire range, but I'm not entirely sure; if any math gurus wish to verify this I would be most grateful! (It seems to work, so I just thought I'd share.) Then to recover:

    sigma2 = avg_depth - 0.25*encoded - avg_depth*avg_depth

The mandatory simple screenshot: the demo (with source code).

[M] to toggle mouse control
[I] to toggle mouse invert Y-Axis
[Esc] to quit
[Arrow Keys] movement
[PageUp/Down] change the light diameter
[Space] toggle light rotation (and will show fps info)

Sorry, I haven't gotten it working on ATI cards yet. Also, this is using RGBA16F because I couldn't get my 7300 to use 16-bit integers or LA16F or GR16F!
16-bit integer formats work great on my 8600 at home, and I hope they will also be supported on ATI HW because I need mipmaps and trilinear filtering.

Future:
* I want to do a simple Gaussian blur on the mipmaps of the shadow map (as Mintmaster has mentioned). Right now I do a little 4-sample fakery in the final shader.
* I want to reduce the light bleeding (I've seen AndyTX say there is a simple way, but haven't found the code yet).
* I want this to work on as many platforms/HW as possible.

Please hit me with any criticism, ideas, comments, questions, etc.!

##### Share on other sites
MJP    19754
Quote:
 Original post by lonesock
* I want to reduce the light bleeding (I've seen AndyTX say there is a simple way, but haven't found the code yet)

The "AndyTX" way is to simply clip off the tail end of Chebyshev's Inequality. In HLSL it looks something like this:

    // calc p_max using Chebyshev's inequality
    float m_d = moments[0] - depth;
    float p_max = variance / (variance + m_d * m_d);
    // clip off the tail end of the inequality to reduce light bleeding
    p_max = smoothstep(0.3f, 1.0f, p_max);

...where smoothstep is defined like so:

    float smoothstep(float min, float max, float input)
    {
        return clamp((input - min) / (max - min), 0, 1);
    }
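The same clipping written out in plain C, for reference (using the name "linstep", which is what this remap is later called in the thread; the 0.3 cutoff is the example value from the snippet above):

```c
/* linear remap of v from [lo, hi] to [0, 1], clamped */
float linstep(float lo, float hi, float v) {
    float t = (v - lo) / (hi - lo);
    return t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);
}

/* everything below `amount` becomes fully shadowed; the rest is
   rescaled so the function still reaches 1 at p_max = 1 */
float reduce_light_bleeding(float p_max, float amount) {
    return linstep(amount, 1.0f, p_max);
}
```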

##### Share on other sites
lonesock    807
Quote:
 Original post by MJP
awesomely helpful stuff

Thanks! I will work that into the next revision!

Related note: I found why my code wasn't working on ATI cards. glCreateProgramObjectARB() was returning a negative number on success. I had assumed that >0 meant success, so even though the shaders compiled fine I was not using glUseProgramObjectARB because the ID was < 1.

One thing I forgot to mention: the demo will load .OBJ files; just drag them onto the exe to see what this looks like on a random scene. Don't expect greatness on large scenes, however; this is using only a single 512x512 map and no true blurring.

##### Share on other sites
AndyTX    806
Very cool! I actually use a similar approximation of blocker depth: mu - sigma. I chose this because it's exact at the "center" of the filter (when 50% of the filter is covered by the blocker and 50% by the receiver). Out of curiosity, is there a reason why you chose 1.5*sigma? It shouldn't matter too much anyway, but I'm interested. I particularly use this approximation with the summed-area variance shadow maps stuff (see the GPU Gems 3 chapter), where you can get very nice results that don't suffer from the blocky artifacts of using mipmapping for blurring. Of course Mintmaster's custom mipmap generation stuff (Gaussian blur, etc.) may work quite well also, but I haven't had the time to try it.

That's an interesting depth/variance encoding. I'll have to try that out and run the math on it. Thanks for posting it!

MJP already posted the simple "light bleeding reduction by over-darkening" (although what he calls "smoothstep" should actually be called "linstep" there) that I discussed in GPU Gems 3. Further work in Layered Variance Shadow Maps (to be published very soon - just cleaning it up) and Exponential Variance Shadow Maps (as well as Exponential Shadow Maps themselves - see ShaderX6) provide more ways to generalize and approximate the depth distribution, each with associated trade-offs. There's a recent thread at Beyond3D in the console section about the GDC presentations wherein a couple of us are discussing some possibilities in more depth.

##### Share on other sites
lonesock    807
Quote:
 Original post by AndyTX
Very cool! I actually use a similar approximation of blocker depth: mu - sigma. I chose this because it's exact at the "center" of the filter (when 50% of the filter is covered by the blocker, and 50% receiver). Curiously is there a reason why you chose 1.5*sigma? It shouldn't matter too much anyways, but I'm interested...

Well, the 1.5 came about because I was using Excel's solver to minimize error of this approximation on a step function, using a few different smoothing widths. The solver almost always came back with a value around 1.5. Ironically, in my demo I just use 1.0 because it didn't really make any difference that I could see. [8^)

Quote:
 Original post by AndyTX
That's an interesting depth/variance encoding. I'll have to try that out and run the math on it. Thanks for posting it!

You are welcome! (btw, I forgot to mention that when initializing the depth/variance map, I need to use 1.0/0.0 values, instead of 1.0/1.0)

I don't have any of the GPU Gems * or ShaderX* series, though they look cool. I just do this stuff as a hobby, and none of my local bookstores carry them so I can't even "impulse buy" them [8^). Thank you for the feedback. I will check out the resources you mentioned and get caught up on the more recent work.

##### Share on other sites
swiftcoder    18432
Quote:
 Original post by lonesock
Related note: I found why my code wasn't working on ATI cards. glCreateProgramObjectARB() was returning a negative number on success. I had assumed that >0 meant success, so even though the shaders compiled fine I was not using glUseProgramObjectARB because the ID was < 1.

Which is the reason why a program handle is an unsigned integer (GLuint). ATI returns large numbers for texture handles, which become negative if interpreted as signed.

##### Share on other sites
lonesock    807
Quote:
 Original post by swiftcoder
Which is the reason why a program handle is an unsigned integer (GLuint). ATI returns large numbers for texture handles, which become negative if interpreted as signed.

I see. I am using GLee 5.21, and it defines GLhandleARB as type "int". Thank you for the response.

##### Share on other sites
swiftcoder    18432
Quote:
Original post by lonesock
Quote:
 Original post by swiftcoder
Which is the reason why a program handle is an unsigned integer (GLuint). ATI returns large numbers for texture handles, which become negative if interpreted as signed.

I see. I am using GLee 5.21, and it defines GLhandleARB as type "int". Thank you for the response.

Hmm, I just looked this up, and it seems the ARB did define GLhandle as a signed integer. However, when the extension was approved into the main library, GLhandles were removed, and became GLuint.
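The failure mode is easy to reproduce in plain C (the handle value here is hypothetical — just one with the high bit set, like the large values ATI returns):

```c
/* What happens when a handle is stored in a signed GLhandleARB: the
   unsigned-to-signed conversion is implementation-defined, and on the
   usual two's-complement platforms a high-bit value comes out negative. */
int old_signed_check(unsigned int handle) {
    int as_signed = (int)handle;
    return as_signed >= 1;  /* the ">0 means success" test */
}

/* Correct test against an unsigned GLuint handle. */
int unsigned_check(unsigned int handle) {
    return handle > 0u;
}
```

With handle = 0x80000001 the unsigned test accepts it while the signed test wrongly reports failure — exactly the bug described above.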

Since you are using GLee, there is really no point warting your code with *ARB all over the place - use the standard GL 2.0 functions and types, and GLee will deal with it for you ;)

##### Share on other sites
lonesock    807
OK, I've updated the original zip file with new code and demo. Also updated the original screenshot.

DONE
* Reduced light bleeding
* Should run on ATI HW too
* Uses RGBA16 textures to get semi-cross-HW compatibility (some older NV cards drop this to RGBA8; you'll know if this happens)
* Does a simple 3x3 Gaussian blur on the shadow map just after rendering it

STILL TO DO
* larger Gaussian blurs (separable)
* get the RGBA8 encoding working (has some weird artifacts)
* enable MSAA (which I've never done before, just need to Google I guess [8^)

@swiftcoder: Thanks for all your help and info!

##### Share on other sites
AndyTX    806
Lookin' good! I'm definitely interested in your low-precision encodings especially, and how well they scale to larger depth ranges. Any details that you're willing to provide would be appreciated :)

##### Share on other sites
lonesock    807
Quote:
 Original post by AndyTX
Lookin' good! I'm definitely interested in your low-precision encodings especially, and how well they scale to larger depth ranges. Any details that you're willing to provide would be appreciated :)

Thanks! I've updated the demo so now you can toggle texture formats and the alternate encoding (to see the results live). The depths must be scaled in the 0..1 range for this demo anyway, since I am trying to support fixed point texture formats. I'm not sure how the encoding would work outside that range, but for large depth values the depth^2 term would dominate the depth term, so I imagine it would be of no help without the scaling.

Note: there is now a readme file packed with the demo which lists all the keys.

I'm still having trouble getting this to work on many ATI cards [8^(

##### Share on other sites
Hi, I've been following some of these discussions on Shadow Mapping and I am having some issues with VSM.

I seem to have problems with the quality of shadows both on Andy's original demo and lonesock's demo (and I think I tried a DirectX9 version of a later demo of Andy's as well), and am wondering whether it's inherent to the algorithm, graphics card support, GLSL problems, or driver problems?

Notice the grainy bits of the floor.

I have a GeForce 6200 with the newest drivers so I think it should be able to use this technique, though I'm not an expert.

Neutrinohunter

##### Share on other sites
AndyTX    806
Quote:
 Original post by Neutrinohunter
I seem to have problems with the quality of shadows both on Andy's original demo and lonesock's demo (and I think I tried a DirectX9 version of a later demo of Andy's as well), and am wondering whether it's inherent to the algorithm, graphics card support, GLSL problems, or driver problems?

What you're seeing is a numeric precision problem. Note that you have the demo there running in 8-bit integer mode! Try 16-bit float and it should be fine. The "problem" with the original demo was that when enabled it requested a 16-bit integer texture and the NVIDIA drivers silently switched to an 8-bit one! 8 bits isn't enough for even standard depth maps, let alone variance shadow maps.

##### Share on other sites
wolf    852
Hey Lonesock,
drop me a PM and I will send you ShaderX6 ... you are doing great stuff :-)

- Wolf

##### Share on other sites
lonesock    807
Hi, All!

OK, I've updated the demo once again. The big change is that when using RGBA8i mode, it now automatically switches to a 16-bit packed mode (rg for depth, ba for the depth^2 encoding). (Note: this is thanks to this thread from Slagh! Everybody who needs to do accurate packing, go check it out and rate Slagh up! The conventional wisdom of fract() did not work at all for this.)
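For the curious, here is the idea in plain C (my reconstruction, not Slagh's exact shader code): split a [0..1] value into a high and a low byte, each stored as a normalized 8-bit channel, using integer rounding rather than fract() so the two halves stay consistent with each other.

```c
/* Pack a [0, 1] value into two normalized 8-bit channels (hi, lo)
   by way of a 16-bit fixed-point intermediate. */
void pack8x2(double x, double *hi, double *lo) {
    unsigned int v = (unsigned int)(x * 65535.0 + 0.5);  /* round to 16-bit */
    *hi = (double)(v >> 8) / 255.0;
    *lo = (double)(v & 0xFFu) / 255.0;
}

/* Reassemble the two channels into the original 16-bit value. */
double unpack8x2(double hi, double lo) {
    unsigned int v = (unsigned int)(hi * 255.0 + 0.5) * 256u
                   + (unsigned int)(lo * 255.0 + 0.5);
    return (double)v / 65535.0;
}
```

The round trip is exact up to 16-bit quantization (error at most 1/65535), which is the "same resolution as 16i" behaviour described below.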

I've also had some success running it on some ATI HW, so give it a go! (btw, if anyone with ATI HW has the demo fail, but has access to CodeBlocks/MinGW _and_ wants to help debug this thing, I would be very grateful!)

@ Neutrinohunter: at the top of the window you can see the text "map=??" which tells you what texture mode was used for the shadowmap. The demo will request various modes (via the F1 key), but will cycle past unsupported texture modes (the format needs to be supported both as a texture_2d and in the FBO). The RGBA packing mentioned above should work for you now!

@ AndyTX: thanks again for your input (and PM's)!

@ Wolf: thank you so much! That is a very generous offer! (PM sent, naturally [8^)

##### Share on other sites
AndyTX    806
Cool, I really like the RGBA8i mode... it gets better results than the RG16f one! Do you know how well it scales to larger depth ranges by any chance? I'm also interested in whether you can use a similar encoding with RGBA16i or whether you have any thoughts about better encodings for RGBA16f (as those two filter on previous generation AMD and NVIDIA hardware respectively).

Any chance we can see some code for the encodings too (once you have everything finalized, of course)? I think I know what you're doing, but it would be clearer to see it, of course.

Cheers!
Andrew

##### Share on other sites
lonesock: Meh, well the proggie pretty much crashes for me when I press F1. Must be trying to use floating point buffers which don't go fast for me.

Your log says the FBO couldn't initialise, I definitely have FBO abilities as I have them in my current shadowing project.

AndyTX: I've just looked at your demo again and you're right, I get problems at very shallow angles due to that fact. Also the weird grainy artefact around the shadows too.

Wolf: Is ShaderX6 available yet? I think I remember you saying somewhere it was going to be available at GDC, but I'm wondering whether that's made it as far as the UK yet? Also, what technologies are new in the book, and does it have more OpenGL than X3? (I've read X2, but that didn't have any OpenGL in.)

Neutrinohunter

##### Share on other sites
For some reason your program crashes when trying to acquire RGBA32f. My GPU does support GL_ARB_texture_float and a few other things I presume I'd need. Is there any surefire way to tell which formats should be supported?

Or is the best method just to initialise an FBO and check for GL_FRAMEBUFFER_UNSUPPORTED_FORMAT_EXT or equivalent?

Neutrinohunter

##### Share on other sites
lonesock    807
Quote:
 Original post by AndyTX
Cool, I really like the RGBA8i mode... it gets better results than the RG16f one! Do you know how well it scales to larger depth ranges by any chance? I'm also interested in whether you can use a similar encoding with RGBA16i or whether you have any thoughts about better encodings for RGBA16f (as those two filter on previous generation AMD and NVIDIA hardware respectively).
Any chance we can see some code for the encodings too (once you have everything finalized, of course)? I think I know what you're doing, but it would be clearer to see it, of course.
Cheers!
Andrew

Thanks! The RGBA packing mode should get basically the same resolution as the 16i modes. That is a great idea to extend the packing to 16f and 16i formats, especially as it turns out that the RGBA8-packed mode still looks bad on older (pre-8xxx) hardware. I'm pretty sure the artifacts come from my use of glGenerateMipmapEXT to generate the original mipmaps (as everything else, blur included, does the manual un/packing). Is there a way to quickly generate mipmaps, preferably _while_ blurring, using shaders and an FBO?

The packing for 16i will be straightforward, obviously simulating a 32i format, so the ranges will still need to be scaled from [0..1]. It will not actually be as accurate as 32i, because all the packing instructions will be performed at 32f precision, but it should still be an improvement over the regular 16i format. I'm not sure what the scaling would be for 16f formats (most likely it should be light_radius dependent). Since 16f formats already perform worse than 16i, I'm sure packing could improve the 16f quality at a minimal performance hit.

BTW, the code is all included in the zip (look in the main.cpp file, under init_XXXX_shader()). When I get this all polished I'll try to write up a little whitepaper on these little "Tips & Tricks", and I would love to have your feedback at that time [8^)

@ Neutrinohunter: I'm sorry it still isn't working on your machine...I don't have the hardware to test it. Regarding the FBO failing message, that only means that an FBO initialization failed for a specific format...I'll try to make the log less cryptic. I try to attach the texture, then do the standard glCheckFramebufferStatusEXT, and if that fails I emit that "FBO failed to initialize" message and cycle on to the next texture format.

##### Share on other sites
It's okay, I noticed that I had a problem with some of my drivers anyhow.

Is there a way to cycle the formats such that I can change them via a combo box or similar? I cannot seem to test *worse* formats because it fails with those. Do you have a makefile I could modify? I'd be really happy to help find problems or pitfalls with these algorithms, as I can use the results for a paper I am writing.

And I don't believe I've said it so far but well done with your current work!!! :)

Neutrinohunter