EVSM performance tip!

Graphics and GPU Programming Programming OpenGL

Started by theagentd November 22, 2012 03:09 AM

5 comments, last by theagentd 11 years, 4 months ago

990

Author

November 22, 2012 03:09 AM

Hello. I have a small tip for anyone using EVSMs. I don't know if it's something completely obvious or so, but here it is anyway.

The papers I've found recommends that as a shadow map, you render to a MSAA 32-bit float RGBA render target to store the two moments (m1, m1*m1, m2, m2*m2). You also need a depth buffer since the RGBA buffer can't be used as a depth buffer. This is incredibly expensive. For 4xMSAA, we get 16+4 bytes per MSAA sample. That's 80MBs just for a 4 sample 1024^2 variance map! We also need a resolved variance map, so add another 16MBs there. In total: 96MBs just for a 1024^2 shadow map. Don't you dare try a resolution 2048^2...

However, I found that I can reduce this memory footprint a lot. We don't have to calculate the moments until we resolve the MSAA texture! Instead of having a 32-bit float RGBA texture + a depth texture, we can get by with only a 32-bit float depth texture. By outputting view-space depth and modifying the depth range, we can just pass it down to gl_FragDepth (in the case of OpenGL) in the shadow rendering shader. When resolving we simply read the depth samples and calculate the moments and average them together! The result is that we only need 4 bytes per MSAA sample, period. That's 16MBs for a a 4xMSAA 1024^2 variance. The resolved variance texture is identical to before so that's another 16MBs. In total: 32MBs, which is a LOT better than 96MBs.

This not only reduces VRAM usage a lot, it also massively reduces the bandwidth needed for the shadow map rendering and resolve passes. On my GTX 295 (only one GPU, equal to a GTX 260/275), the performance is a lot better with my optimization. I'm getting 240 FPS in using the standard technique (1024^2 + 4xMSAA) and 440 FPS with my new one, which is almost twice as fast. Quality is identical to the normal technique since it's identical to normal EVSM stuff after resolving the MSAA texture.

I hope someone finds this useful, and if something's unclear feel free to ask!

EDIT: Fun fact: It's not possible to create a 8xMSAA 32-bit float RGBA texture, but it is possible to create a 8xMSAA 32-float depth buffer, so my technique works with 8xMSAA while the standard technique does not (at least on my hardware). Even funnier: Mine with 8xMSAA is 50% faster than the original with 4xMSAA and uses half as much memory.

MJP

20,295

November 22, 2012 06:40 AM

The first time I saw this trick mentioned was in the tech post-mortem for Little Big Planet (they rolled it into the blurring pass since they didn't use MSAA), so it's been around for a while! In fact Andrew Lauritzen used it in his sample distribution shadow maps sample. Early on with VSM's it wasn't possible to do so since DX9 hardware didn't support sampling MSAA textures, and DX10 hardware didn't support sampling MSAA depth buffers (DX10.1 added this). But now on DX11-level hardware it's totally doable.

You'd probably get even better performance if you didn't use a fragment shader at all for rendering the shadow maps, since the hardware will run faster with the fragment shader disabled. Explicitly writing out depth through gl_FragDepth will also slow things down a bit, since the hardware won't be able to skip occluded fragments.

The Blog | The Book

theagentd

990

Author

November 22, 2012 03:59 PM

Haha, I knew I couldn't be the first one to discover this... =S

I'd like to point out that this does work on my GTX 295 which doesn't support DirectX 10.1, so it's actually doable with OpenGL but not with DirectX it seems?

I would not recommend using non-linear depth. The precision distribution is not made to work well with floats, and the additional steps all reduce the effective precision. The much improved precision of using eye-space distance can be spent on a higher C value to reduce bleeding. However, you are correct that performance is better thanks to early z rejection before the shader is run. It might be worth using both a color attachment for the eye-space distance and a depth buffer for early z rejection if your scene has lots of overdraw.

MJP

20,295

November 22, 2012 05:56 PM

I'd like to point out that this does work on my GTX 295 which doesn't support DirectX 10.1, so it's actually doable with OpenGL but not with DirectX it seems?

Yeah it was common knowledge that the GTX 200 series supported *most* of the 10.1 feature set, but they never bothered to get it fully compliant.

I would not recommend using non-linear depth. The precision distribution is not made to work well with floats, and the additional steps all reduce the effective precision. The much improved precision of using eye-space distance can be spent on a higher C value to reduce bleeding. However, you are correct that performance is better thanks to early z rejection before the shader is run. It might be worth using both a color attachment for the eye-space distance and a depth buffer for early z rejection if your scene has lots of overdraw.

Well for orthographic projections (directional lights) the depth value is already linear, and for a perspective projection you can flip the near and far planes if you're using a floating point depth buffer, which mostly balances out the precision issue. But of course, your mileage may vary.

The Blog | The Book

theagentd

990

Author

November 22, 2012 09:27 PM

Oh, I didn't know any of that! How lame of Nvidia! I might try that flipping trick too, it sounds interesting! How would I use the flipped depth value? Would I need to use GL_GREATER instead of GL_LESS in my depth test? How would I treat the value when resolving the MSAA depth texture? Do I need to invert it or something?

MJP

20,295

November 24, 2012 06:54 AM

For any value from the depth buffer you can convert to a linear depth value (z component of view-space position) by using a few values from the projection matrix. Just follow the math in the matrix multiplication and you can derive it pretty easily. Doing this still works fine with a projection that uses "flipped" clipping planes. So for EVSM you can just render using "flipped" depth (make sure you switch the direction of the depth test), and then when you perform the EVSM converison/MSAA resolve you can convert to a linear depth value right after sampling the depth texture.

The Blog | The Book