Deferred shading without MRTs and with MSAA

Yann L · 2007-04-29T13:51:02

I need some ideas from you guys with the creative minds here !

Ok, here's the situation: I'm currently rendering a very complex 3D scene to a 16-bit floating point offscreen surface. I'm highly fragment limited. Now, I've recently developed a new kind of dynamic soft shadow algorithm that looks really good and isn't too expensive, so I'd like to add it to the engine. Unfortunately, doing so in the traditional way (by simply including it in the main shaders) forces me to rerender parts of the scene several times, which is a performance killer. So I'd like to use partially deferred shading to add the shadows after the main render pass. So far, so good.

Essentially, the information I need after the main render pass is complete would be:

* The (floating point) pixel colour. Doh, that's easy, it's just the output of the shaders as they are right now.
* The pixel position in world space. The position should be reasonably accurate.
* The pixel normal, also in world space. This one can be less accurate.

This would be easy with MRTs, but unfortunately I can't use them. The main reasons are the lack of MSAA support with MRTs on most current GPUs, and the huge memory requirements of the additional buffer (since all MRT colour buffers need the same internal format, I'd have to go with a second FP16 RGBA buffer - ugh). So MRT is not an option.

What to do ? After my main render pass, I have two valid buffers: an RGBA FP16 buffer, where the alpha component is currently unused, and a 24-bit depth buffer. Given these two buffers, I need to somehow convey the pixel world space position and the normal. The position is easy, I can simply unproject it from the depth buffer and the inverse camera matrix in a pixel shader. The normal, however, is tricky. I only have one additional free channel, the alpha channel. To store a full featured normal, I'd need at least two.
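As a rough illustration of the depth-buffer unprojection described above, the sketch below reconstructs an eye-space position from a [0, 1] depth value and then moves it into world space. It is written in Python rather than shader code, and it assumes a standard symmetric OpenGL perspective projection with the default depth range. The function names, the `fov_y`/`aspect`/`near`/`far` parameters and the row-major `cam_to_world` matrix are illustrative assumptions, not something taken from the post.

```python
import math

def eye_pos_from_depth(ndc_x, ndc_y, depth, fov_y, aspect, near, far):
    """Reconstruct the eye-space position of a pixel.

    Assumes a symmetric OpenGL perspective projection and the default
    glDepthRange(0, 1). ndc_x and ndc_y are in [-1, 1]; depth is the
    depth-buffer value in [0, 1].
    """
    ndc_z = 2.0 * depth - 1.0
    # Invert z_ndc = (f+n)/(f-n) + 2fn / ((f-n) * z_eye).
    eye_z = 2.0 * far * near / ((far - near) * ndc_z - (far + near))
    t = math.tan(0.5 * fov_y)
    # The camera looks down -Z, so the distance along the view axis is -eye_z.
    eye_x = ndc_x * -eye_z * t * aspect
    eye_y = ndc_y * -eye_z * t
    return (eye_x, eye_y, eye_z)

def transform_point(m, p):
    """Apply a row-major 4x4 affine matrix (list of 4 rows) to a 3D point."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))

def world_pos_from_depth(ndc_x, ndc_y, depth, fov_y, aspect, near, far, cam_to_world):
    """Eye-space reconstruction followed by the (hypothetical) inverse view matrix."""
    eye = eye_pos_from_depth(ndc_x, ndc_y, depth, fov_y, aspect, near, far)
    return transform_point(cam_to_world, eye)
```

In a real pixel shader the same arithmetic would typically be folded into a single multiplication by the inverse view-projection matrix followed by a divide by w. The analytic form is used here only to keep the sketch short and free of a matrix inverse.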
I could also use finite differencing on the depth buffer to recover the normal, but that would lead to incorrect data on the object silhouette edges.

I've also thought about quantizing the normal hemisphere (only the normals pointed towards the camera are visible, everything pointing back is culled anyway) into 256 directions, and using lookup tables to encode the pixel normal into a single byte. That wouldn't be very accurate, but still OK (although I'm afraid of banding artifacts). The problem is that it will completely fail when MSAA is enabled. As soon as the MSAA buffer is resolved, my alpha values (that are now actually indices into the quantized normal hemisphere) will be blended on the edges, and *BAM* !

Unfortunately, I cannot afford rendering the entire scene twice per frame, even when using simplified shaders (so as to write only the normal to a separate buffer). There's just too much geometry. Well, I'll probably still do it this way if nothing else helps, but it's really a last resort.

Any ideas ? Maybe a way to encode the normal as a polar angle to the pixel's view direction, and recover the missing second angle by differencing the depth map ? Or somehow encoding a correction factor in the alpha component of the colour buffer, that compensates for the erroneous depth map differencing on polygon edges ? I'm open to all weird suggestions !

Edit: oh yeah, I'm using OpenGL btw, but that shouldn't really matter.

Thanks,
Yann
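To make the MSAA problem with index-encoded normals concrete, here is a small Python sketch. It assumes a 256-entry hemisphere table built with a Fibonacci spiral and a plain averaging resolve. Both are illustrative choices, since the post does not say how the table would be laid out or which resolve filter is used, and all function names are hypothetical.

```python
import math

GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))

def build_hemisphere_table(count=256):
    """Spread `count` unit vectors over the z > 0 hemisphere (Fibonacci spiral)."""
    table = []
    for i in range(count):
        z = 1.0 - (i + 0.5) / count          # close to 1 at the pole, close to 0 at the rim
        r = math.sqrt(1.0 - z * z)
        phi = i * GOLDEN_ANGLE
        table.append((r * math.cos(phi), r * math.sin(phi), z))
    return table

TABLE = build_hemisphere_table()

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def encode_normal(n):
    """Index of the closest table direction (brute force stands in for a lookup table)."""
    return max(range(len(TABLE)), key=lambda i: dot(TABLE[i], n))

def decode_normal(index):
    return TABLE[index]

def resolve_indices(indices):
    """What an averaging MSAA resolve does to the stored values: it blends them."""
    return int(round(sum(indices) / len(indices)))

def angle_deg(a, b):
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(a, b)))))

if __name__ == "__main__":
    n_a = (0.0, 0.0, 1.0)                     # surface facing the camera
    n_b = (1.0, 0.0, 0.0)                     # surface seen edge-on
    i_a, i_b = encode_normal(n_a), encode_normal(n_b)
    blended = resolve_indices([i_a, i_a, i_b, i_b])   # one 4x MSAA edge pixel
    print("indices:", i_a, i_b, "-> resolved:", blended)
    print("angle to n_a:", angle_deg(decode_normal(blended), n_a))
    print("angle to n_b:", angle_deg(decode_normal(blended), n_b))
```

The resolved value is a valid index, but it points at whatever direction happens to sit at that slot in the table, which has no particular relation to either of the two surface normals or to their average.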


Started by Yann L April 12, 2007 06:16 AM

14 comments, last by jerm 17 years ago

AndyTX

807

April 28, 2007 07:19 PM

Quote:Original post by Yann L
As AndyTX mentioned, current MSAA is just not usable with any form of deferred shading, unless there somehow was an efficient way to access the pre-resolved samples for all involved components, with full MRT support. We're not there yet, I'm afraid.

I looked into the support in D3D10 more and indeed you can access the pre-resolved data but unfortunately I'm not sure that deferred shading can make use of this efficiently. The problem is that you can't get access to any sort of compression flag to indicate whether all of the samples are the same, which means you are stuck evaluating the BRDF for all of the samples anyways... why not just super-sample?

Quote:Original post by Yann L
However, the visual impact of having completely dynamic area soft shadows everywhere is definitely worth it :)

Sounds really nice - any chance we can see some screenshots or get some info on the techniques that you're using?

Quote:Original post by Yann L
Oh, and could ATI *PLEASE* fix the two trillion depth buffer related bugs in their current OpenGL driver already ? Working around their bugs is a real PITA.

Haha, you ran into this too? The other major annoyance on ATI is the poor performance of MRT in general. In a deferred shading demo that I wrote a while back even with huge resolutions and lots of geometry it was *still* faster on ATI to render the buffers one by one, whereas on NVIDIA the crossover is legitimately at something like ~1000 polygons.

Quote:Original post by Yann L
Anyway, cheers guys, and thanks for your suggestions !

Sorry they couldn't be any more useful :( Good luck with your project though!

MJP

20,296

April 29, 2007 12:33 AM

Ditto on what AndyTX said. I've been playing around with deferred rendering for the past month or two, and I have yet to come across an effective way of implementing MSAA (MRT or not). Having just got my 8800 recently, I haven't played around with retrieving the pre-resolved textures but I'd suspect the performance would be closer to SSAA than it would be to MSAA.


ze moo

192

April 29, 2007 04:31 AM

Quote:Original post by Yann L
...
So I ended up rendering the geometry twice, once to a non-MSAA buffer (with depth and normal data only, using MRT), and a second time to the HDR colour buffer with MSAA. This works fine, but eats up a lot of performance.

It's faster on ATI cards though. Currently the driver recompiles ALL shaders when you switch from e.g. one drawbuffer to two. Weird, especially since the drawbuffers extension was introduced by ATI in the first place. (There's been a thread on opengl.org about this issue.)

Quote:Original post by Yann L
However, the visual impact of having completely dynamic area soft shadows everywhere is definitely worth it :)
...

care to explain a little bit more in detail? :)

[Edited by - ze moo on April 29, 2007 5:31:46 AM]

Cypher19

768

April 29, 2007 09:53 AM

Quote:why not just super-sample

You get the benefits of rotated grid (et al) antialiasing patterns without having to rerender the scene multiple times, e.g. in the case of jittered sub-pixel positions. With just regular super-sampling (i.e. render once to a mega-sized buffer), you get the same quality as square grid AA, and anyone who has seen both knows that rotated grid just knocks square all over the parking lot [grin]
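A quick way to see the rotated-grid advantage is to count how many distinct coverage values a nearly horizontal edge can produce as it sweeps through one pixel. The Python sketch below uses a commonly cited 4x rotated-grid pattern (the ordered grid rotated by atan(1/2)). The exact offsets are an assumption for illustration, and real hardware sample patterns vary.

```python
# Sample offsets inside one pixel, in pixel units relative to the pixel centre.
ORDERED_GRID_4X = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]
# The ordered grid rotated by atan(1/2) and scaled by sqrt(5)/2.
ROTATED_GRID_4X = [(-0.125, -0.375), (0.375, -0.125), (-0.375, 0.125), (0.125, 0.375)]

def coverage_levels(edge_heights, pattern):
    """Distinct coverage values produced by a horizontal edge at the given heights.

    A sample counts as covered when it lies below the edge.
    """
    levels = set()
    for h in edge_heights:
        covered = sum(1 for (_, y) in pattern if y < h)
        levels.add(covered / len(pattern))
    return sorted(levels)

def sweep(pattern, steps=64):
    """Move a horizontal edge through the pixel and collect the coverage levels."""
    heights = [-0.5 + (k + 0.5) / steps for k in range(steps)]
    return coverage_levels(heights, pattern)

if __name__ == "__main__":
    print("ordered grid:", sweep(ORDERED_GRID_4X))
    print("rotated grid:", sweep(ROTATED_GRID_4X))
```

With the ordered grid, two samples share each row, so the edge can only produce three coverage values. The rotated pattern puts every sample on its own row and column, which gives five values from the same four samples.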

Besides, the driver/GPU can probably handle the AA'd buffer in a smarter manner, in terms of performance, than just falling over and having to deal with a really large buffer.

Yann L

1,806

Author

April 29, 2007 12:42 PM

Quote:Original post by Cypher19
Besides, the driver/GPU can probably handle the AA'd buffer in a smarter manner, in terms of performance, than just falling over and having to deal with a really large buffer.

Yes of course, your comments are correct. However, Andy's point was about the interaction between user-supplied pixel shaders and the AA buffer. Since there seems to be no (easy ?) way of knowing whether some property within an AA sample cell is constant, you cannot optimize your shaders the same way MSAA optimizes over SSAA. In other words, you have to execute your shader once for every subpixel in the AA buffer, which would amount to SSAA in terms of shader invocation cost.
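The cost argument can be put into numbers with a toy model. The Python sketch below counts lighting-shader runs over a synthetic MSAA buffer, once with and once without a per-pixel "all samples identical" flag. That flag is hypothetical (the thread's point is precisely that it is not exposed) and is simulated here by comparing sample values on the CPU. The buffer layout and function names are illustrative assumptions as well.

```python
def shading_invocations(pixels, have_uniform_flag):
    """Count lighting-shader runs for a deferred pass over an MSAA buffer.

    `pixels` is a list of per-pixel sample lists. With a (hypothetical)
    per-pixel flag saying "all samples are identical", uniform pixels
    are shaded once; without it, every sample has to be shaded.
    """
    total = 0
    for samples in pixels:
        uniform = all(s == samples[0] for s in samples)
        if have_uniform_flag and uniform:
            total += 1
        else:
            total += len(samples)
    return total

def make_test_buffer(width, height, samples_per_pixel, edge_column):
    """Synthetic buffer: surface 'A' left of edge_column, 'B' right of it,
    and mixed samples in the edge column itself."""
    pixels = []
    for _ in range(height):
        for x in range(width):
            if x < edge_column:
                pixels.append(["A"] * samples_per_pixel)
            elif x > edge_column:
                pixels.append(["B"] * samples_per_pixel)
            else:
                half = samples_per_pixel // 2
                pixels.append(["A"] * half + ["B"] * (samples_per_pixel - half))
    return pixels

if __name__ == "__main__":
    buf = make_test_buffer(8, 8, 4, edge_column=3)
    print("without flag:", shading_invocations(buf, have_uniform_flag=False))
    print("with flag:   ", shading_invocations(buf, have_uniform_flag=True))
```

For this 8x8 buffer at 4 samples, the run without the flag shades 256 samples, the same as 4x super-sampling, while the run with the flag shades 88. Only the single edge column pays the per-sample price in the second case.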

Quote:Original post by AndyTX
Quote:Original post by Yann L
Oh, and could ATI *PLEASE* fix the two trillion depth buffer related bugs in their current OpenGL driver already ? Working around their bugs is a real PITA.

Haha, you ran into this too?

Oh God, yes... Please don't remind me... I just ranted about this over in the OpenGL forum.

Quote:Original post by AndyTX
Sounds really nice - any chance we can see some screenshots or get some info on the techniques that you're using?

Oh, it's not that complex actually.

It's basically a form of advanced ambient occlusion field technique mixed with PRT. In ShaderX4, there is an article by Kontkanen/Laine about cubemap compressed ambient occlusion fields for contact shadows. Our system uses a similar principle, but encodes the data in a different way.

For each reference object, we're using one set of 3D textures and one cubemap set. The 3D textures span the object's bounding box, and the cubemaps close the box like an exterior skin. Each 3D voxel contains not the spherical cap size and direction (as in the ShaderX4 approach), but an entire SH coefficient set. This way, we can describe the transfer function at each inner point of the object towards the outside, from every possible direction, taking into account all forms of self shadowing and self reflections.

The outer cubemap encodes compressed polynomials, much like Kontkanen/Laine did, except we're storing outwards going SH coefficients again instead of a simple spherical cap.

When rendering, we project the 3D texture/cubemap set from every object onto the environment and compute the light transfer function at each pixel. Since we store an entire SH set at the inner and outer regions of each object, we can easily extend the ambient shadows to include direct and indirect lighting effects. This means that we don't only get near surface contact shadows through the 3D texture, but full soft shadows through the PRT polynomials on the cubemap set. See it as a kind of "inverted shadowmap" approach, but instead of storing depth values, we store light transfer polynomials.
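For readers unfamiliar with PRT, the per-pixel evaluation boils down to a dot product between a stored transfer vector and the lighting expressed in the same SH basis. The Python sketch below shows that generic step with only the first two SH bands (four coefficients). It is not the encoding described in the post, and the function names and the choice of four coefficients are assumptions made for illustration.

```python
def sh_basis_l1(d):
    """Real spherical harmonics basis, bands 0 and 1, for a unit direction d."""
    x, y, z = d
    return [0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x]

def project_directional_light(direction, intensity):
    """SH coefficients of a light arriving from `direction` (treated as a delta function)."""
    return [intensity * b for b in sh_basis_l1(direction)]

def shade(transfer_coeffs, light_coeffs):
    """PRT-style shading: dot product of the transfer and lighting vectors."""
    return sum(t * l for t, l in zip(transfer_coeffs, light_coeffs))

if __name__ == "__main__":
    # Hypothetical transfer vector, as it might be fetched from a 3D texture voxel.
    transfer = [0.8, 0.0, 0.3, 0.1]
    light = project_directional_light((0.0, 0.0, 1.0), 2.0)
    print(shade(transfer, light))
```

In a renderer along the lines described above, the transfer vector would come from the projected 3D texture or cubemap rather than from a constant, and more bands would be needed for sharper shadows.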

The main idea is to get realistic shadows from many large dynamic objects in a complex GI environment (with area lights, sun and skylight), used in photorealistic architectural visualization.

jerm

123

April 29, 2007 01:51 PM

I've always thought of them as inverted deep shadow maps. Close enough! Sounds interesting.

I'd be curious to know storage requirements for your technique. Kontkanen's technique was memory heavy on its own, without adding a 3D texture of SH coeffs. I always liked the ambient occlusion fields paper, but it always seemed like it was just a bit too impractical for real-time apps.
