Deferred shader theory

Hi, I'm trying to get my head around the concept of deferred shading, but to be honest I don't really get it. From what I gathered, you first render all the geometry you need into buffers. Then you take those buffers and process the lighting, texturing and such? What am I getting wrong in the basic concept? What kind of buffers would I use in DirectX? And in OpenGL? And finally, is it really more efficient? Thanks
Quote:Original post by lollan
From what I gathered, you first render all the geometry you need into buffers.

That's the basic idea: you render the components needed to do your lighting into buffers (normals, diffuse, etc.). If you want to do that in a single pass, you need to use multiple render targets (MRTs). If you can't, you need to do several passes.

Quote:Original post by lollan
Then you take those buffers and process the lighting, texturing and such?

For the lighting, you render a quad (or any other geometry) that covers the light's influence in screen space. In the shader, you fetch the components (normal, diffuse, pixel position, etc.) from the buffers and do the lighting. So you end up with a different shader for each type of light (omnis, spots, etc.).
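As a rough sketch of what such a light-pass pixel shader could look like for a point light (SM 3.0 HLSL), assuming made-up names, a normal packed into [0,1], and a linear depth plus an interpolated camera-to-far-plane ray for reconstructing the pixel position; this is an illustration, not code from this thread:

sampler GBufNormalSpec : register( s0 );  // rgb: packed world normal, a: specular power
sampler GBufDepth      : register( s1 );  // r: linear depth (view-space z / far plane)

float3 CameraPos;    // world-space camera position
float3 LightPos;     // world-space light position
float3 LightColor;
float  LightRadius;

float4 PointLightPS( float2 uv     : TEXCOORD0,
                     float3 farRay : TEXCOORD1 ) : COLOR0  // camera-to-far-plane ray for this pixel
{
    float4 ns     = tex2D( GBufNormalSpec, uv );
    float3 normal = normalize( ns.rgb * 2.0f - 1.0f );     // unpack [0,1] back to [-1,1]
    float  depth  = tex2D( GBufDepth, uv ).r;

    float3 pos     = CameraPos + farRay * depth;           // reconstruct world-space position
    float3 toLight = LightPos - pos;
    float  dist    = length( toLight );
    float3 l       = toLight / dist;

    float atten = saturate( 1.0f - dist / LightRadius );   // simple linear falloff
    float ndotl = saturate( dot( normal, l ) );

    // Diffuse term only, for brevity; specular would typically go to a second render target.
    return float4( LightColor * ndotl * atten, 1.0f );
}

The vertex shader just outputs the quad (or light-volume) position plus the matching far-plane ray for each corner.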

Quote:Original post by lollan
What kind of buffers would I use in DirectX? And in OpenGL?

That depends on your requirements. Here's the setup I use:

MRTs:
* RT0: A8R8G8B8
* RT1: R32F
* RT2: A8R8G8B8
* RT3: A8R8G8B8

PrePass:
* Render all objects
* Write Z
* Output: RT0: rgb8:Normal, a8:SpecularPower
* Output: RT1: r32f:Depth

LightPass:
* Render all lights
* Input: RT0, RT1
* Output: RT2: rgb8:Diffuse
* Output: RT3: rgb8:Specular

FinalPass:
* Render all objects
* Input: RT2 and RT3
* Output: RT0: rgb8:FinalColor

The final pass is used to combine the ambient and emissive components with the rest of the lighting equation, and can be skipped if this is not required.
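For reference, a minimal sketch of the PrePass pixel shader in that setup could look like the following HLSL (the names, the [0,1] normal packing and the depth normalization are assumptions for illustration, not the actual code):

struct PrePassOutput
{
    float4 rt0 : COLOR0;  // A8R8G8B8: rgb = packed normal, a = specular power
    float4 rt1 : COLOR1;  // R32F: r = linear depth
};

float FarClip;        // distance to the far plane, used to normalize the depth
float SpecularPower;  // per-material specular power, stored in RT0.a

PrePassOutput PrePassPS( float3 worldNormal : TEXCOORD0,
                         float  viewDepth   : TEXCOORD1 )  // view-space z from the vertex shader
{
    PrePassOutput o;
    float3 n = normalize( worldNormal );
    o.rt0 = float4( n * 0.5f + 0.5f, SpecularPower / 255.0f );  // pack [-1,1] into [0,1]; power up to 255 fits 8 bits
    o.rt1 = float4( viewDepth / FarClip, 0.0f, 0.0f, 0.0f );    // only the red channel of the R32F target is used
    return o;
}

The light pass then reads RT0/RT1 and writes diffuse and specular into RT2/RT3, and the final pass combines those with the material's albedo, ambient and emissive terms.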

Quote:Original post by lollan
And finally, is it really more efficient?

It depends on the requirements of your application. If you need to support a high number of lights, it can be more efficient. On the other hand, the light passes can become a bottleneck at high resolutions, etc.

JF
Thank you very much !

It is very helpful; it's weird, the theory looked much simpler ^^

Thanks again.
There are plenty of papers and tutorials on the subject that you'll probably want to check out, if you're interested:

Nvidia's Deferred Shading presentation

Fabio Policarpo's Deferred Shading Tutorial

Shawn Hargreaves' original presentation from GDC '04
Thanks MJP I am definitely interested.
>> And finally, is it really more efficient?
Yes/No. The idea is to calculate all the difficult stuff once, and do the lighting with a minimal number of pixel calculations.

A normal "forward renderer" draws a bunch of polygons that are affected by a light X. Ussually you render more than needed, for example, the entire sector while only a few pixels are litten, or at least the entire wall, also if its only slightly affected by this light X. Furthermore, for each light you'll have to calculate stuff again and again. For example:
pass[x].resultColor = dot( light[x].dir, pixelNormal ) * light[x].color * diffuseTexture + specular( light[x] ) * specularColor
finalResult = sum( pass[x].resultColor over all lights )

Getting the per-pixel normal requires a texture fetch, possibly multiple fetches if you plan to use parallax mapping with the help of a heightmap. And maybe you need a second normal map as well, if you want to use detail normal mapping. And then there is often also a specular color that can differ per material or even per pixel. You'll need to do all of this again for each light.

With deferred shading, you only have to do this once. Like Deks explained, you can fill a number of buffers with the world normals, diffuse/specular colors, reflection term, and so on, possibly with everything parallax mapped. This data is ready to use for 1 light, or 10 billion lights for that matter. You still have to do some calculations for each lit pixel, but a big part has already been done. And, if done properly, you can render the lights more efficiently. Instead of rendering an entire wall, you project the light onto the wall and evaluate 'exactly' the affected pixels. Nothing more, nothing less.


Sounds perfect, doesn't it? Logically, the more lights you have visible at the same time, the more benefit you get. HOWEVER, when do you really need that many lights? Sure, a scene could have 100 light sources, but are they really visible at the same time? Turn on all the lights in your house, and count the number of lights that directly (not indirectly, that is ambient = a whole different story) shine on your table, floor, or a wall. 2, maybe 3?

A more complex scene would be a city at night. But again, do you really need that many lights? A lot can be faked with simple flares in the distance and/or a global lightmap. I don't think a game like GTA IV is actually rendering that many lights.


In that case, the deferred rendering approach won't automatically give you a speed gain. In fact, it could make things even slower, or at least more 'restricted'. Your GPU is spared some pixel calculations, but now you need to calculate the light volumes (a CPU task), which is not that easy. And your buffers will eat bandwidth: a 1680 x 1050 resolution with 4 buffers would need ~54 MB of texture memory (in case all 4 textures are RGBA16F). Modern cards can handle that, but older ones...
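(For reference, that estimate checks out if you assume 8 bytes per pixel for an RGBA16F target and no extra padding: 1680 x 1050 pixels x 4 targets x 8 bytes = 56,448,000 bytes, roughly 54 MB.)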

But OK, that is not really the biggest problem. A real pain in the @ss is alpha blending and the restrictions. Let me explain. Each pixel in your buffers contains data for 1 pixel on the resulting screen (its normal, color, specular, etc.). When doing blending, you see 2 or more surfaces at the same location (for example, glass with a wall behind it). What to do now? You can't store data for 2 separate pixels. Blending the normals or depth (if you want to use shadow maps) will cause wacky results. In other words, you will need to do the alpha-blended objects separately afterwards. There are several tricks for that, but they come at a high cost. Some transparent surfaces are relatively easy to 'paste on top'; others can cause problems. For example, if you want a lot of grass that also uses the lighting and shadow maps, you will need to render the grass separately with a normal forward renderer, and then blend the 2 results (opaque buffer, alpha buffer) properly...

Another problem is the restrictions. Let's say you can have 4 data buffers at most. 4 RGBA buffers = 16 scalar values. Deks showed you how you can fill that. 16 scalars is fine for the ordinary shading operations, but as soon as you want more exotic lighting models, play around with multiple reflection cubemaps, or have to add extra emissive/ambient data, you will run out of options. Of course, you can fix that with another hack afterwards, but it will make your rendering pipeline somewhat dirty, and maybe not more efficient than a normal forward renderer in the end...



I'm not saying deferred rendering is a no-go. And there are probably some tricks to overcome some of the problems I mentioned. But before deciding to use it as your lighting system, first consider whether a normal forward renderer really isn't an option (how many lights you expect, what techniques you would like to use, how you'll handle the alpha blending, etc.).

Greetings,
Rick
Thanks Spek, you highlighted some restrictions I didn't think about. However, I think that most of the problems you talked about (like the blending, for example) can be resolved or at least minimised with some optimisation.

Thanks again
Quote:Original post by spek

Sounds perfect, doesn't it? Logically, the more lights you have visible at the same time, the more benefit you get. HOWEVER, when do you really need that many lights? Sure, a scene could have 100 light sources, but are they really visible at the same time? Turn on all the lights in your house, and count the number of lights that directly (not indirectly, that is ambient = a whole different story) shine on your table, floor, or a wall. 2, maybe 3?



Well to be fair, a big benefit of deferred shading is that your cost of lighting is pretty directly related to the number of pixels that are affected by lights. So for example a hundred small lights could be cheaper than a full-screen global light source if the sum of the pixels is < the backbuffer resolution. And if lots of these small light sources are off-screen/too small/occluded by geometry, the scene is then much cheaper to render (provided you're properly using z-culling, and stencil culling if necessary). Of course I know what you're saying here...that if you're not going to have lots of light sources, you're going to suffer through the penalties of deferred shading without getting one of the primary benefits. However, I just don't think performance is necessarily something you'd lose.

Also as far as ambient/indirect lighting goes, I've seen some clever techniques that make use of the G-Buffer to add some indirect lighting to a scene. This presentation is somewhat interesting, as is this one. Not really practical IMO, but interesting. [smile]

Quote:Original post by spek
A more complex scene would be a city at night. But again, do you really need that many lights? A lot can be faked with simple flares in the distance and/or a global lightmap. I don't think a game like GTA IV is actually rendering that many lights.


GTAIV is actually a great example...at night all the dynamic light sources look fantastic. The headlights, the tail-lights, the street lamps...very effective IMO. And of course GTAIV does actually use deferred techniques.

Quote:Original post by spek
In that case, the deferred rendering approach won't automatically give you a speed gain. In fact, it could make things even slower, or at least more 'restricted'. Your GPU is spared some pixel calculations, but now you need to calculate the light volumes (a CPU task), which is not that easy. And your buffers will eat bandwidth: a 1680 x 1050 resolution with 4 buffers would need ~54 MB of texture memory (in case all 4 textures are RGBA16F). Modern cards can handle that, but older ones...


Bandwidth is definitely a big problem on low- and mid-end GPUs. For higher-end cards G-buffers work great since they have the bandwidth and the memory access patterns are consistent, but low-end cards simply buckle under the pressure (in my experience, anyway). This should hopefully become less of an issue as GPUs get more and more bandwidth...plus you probably don't want to do deferred rendering on DX9-class hardware anyway, since you lose MSAA in that case.

As far as light volumes go, that's not really a CPU task at all. All you need to do is render a cone for spot lights or a sphere for point lights. You transform and rasterize them just like normal geometry, and you just shade the visible pixels by sampling the g-buffer. You also get to use the normal rasterization optimizations, like z-cull and stencil-cull. You can use even simpler geometry if you want...I used to use cubes for rendering huge amounts of particles that acted as light sources.
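As a minimal sketch of that idea (assuming a unit-sphere mesh and made-up constant names), the vertex shader for a point-light volume is really just:

float4x4 ViewProj;
float3   LightPos;     // world-space light position
float    LightRadius;

float4 LightVolumeVS( float3 pos : POSITION ) : POSITION
{
    // Scale and translate the unit sphere so it encloses the light's range, then
    // rasterize it like any other mesh; the pixel shader samples the G-buffer.
    float3 worldPos = pos * LightRadius + LightPos;
    return mul( float4( worldPos, 1.0f ), ViewProj );
}

In practice you would make the sphere slightly larger than the radius so the coarse tessellation doesn't clip the light's edge.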

On a side note, you definitely don't need 4 64bpp buffers. Just about all implementations I've seen have worked on 4 32bpp buffers.

Quote:Original post by spek
But OK, that is not really the biggest problem. A real pain in the @ss is alpha blending and the restrictions. Let me explain. Each pixel in your buffers contains data for 1 pixel on the resulting screen (its normal, color, specular, etc.). When doing blending, you see 2 or more surfaces at the same location (for example, glass with a wall behind it). What to do now? You can't store data for 2 separate pixels. Blending the normals or depth (if you want to use shadow maps) will cause wacky results. In other words, you will need to do the alpha-blended objects separately afterwards. There are several tricks for that, but they come at a high cost. Some transparent surfaces are relatively easy to 'paste on top'; others can cause problems. For example, if you want a lot of grass that also uses the lighting and shadow maps, you will need to render the grass separately with a normal forward renderer, and then blend the 2 results (opaque buffer, alpha buffer) properly...


Yeah, alpha-blending is a real pain. Having to blend it in later sucks...but to be fair you end up having to do this anyway in most forward renderers. Still it would be nice to keep things consistent in a deferred renderer (since one of the main benefits of going DR is a simple and elegant renderer design, which really takes a hit when you need special-case code for certain surfaces). Humus actually has a really neat demo for deferred rendering with non-opaque surfaces, but it requires DX10 hardware and also adds a significant performance hit.



Anyway I'll also add a few other pros/cons that haven't been mentioned yet...

+You only need one shadow map in memory, since you can use "lazy" shadow map generation
+Coherent memory access patterns
+Your renderer can usually be simpler, and more elegant.
+/-Lighting model is unified. This is a + because it makes things simpler (especially in your shaders), but a - because it can be very restrictive (as spek mentioned earlier)
-Bandwidth usage is *huge* (sampling 4 g-buffers + blending to the backbuffer)
-No multi-sampling on DX9-class hardware. Multi-sampling on DX10 hardware is possible, but significantly more expensive (you need to effectively super-sample your lighting calculations in the shader)


EDIT: Just in case it's not clear where I stand based on my comments...I don't really think going DR is worth it unless you can exclusively target DX10 and higher hardware.

[Edited by - MJP on August 13, 2008 4:53:06 PM]
Quote:Just in case it's not clear where I stand based on my comments...I don't really think going DR is worth it unless you can exclusively target DX10 and higher hardware.
I second this.
In the meantime I started to come up with a new renderer design that might be more flexible than a deferred renderer and offer 2 - 3 times more lights ... depending on how you scale it. It also offers MSAA on DX9 hardware and is capable of running on lower-end hardware. I got it running in one of our unannounced games and it looks very promising.
I talked about it twice, at UCSD and at the University of Stuttgart, and you can find the slides here:

http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html

It is a mixture of a Z Pre-Pass renderer and a deferred renderer, and I call it the Light Pre-Pass renderer. GTA IV used a deferred renderer.
>> And of course GTAIV does actually use deferred techniques.
I said nothing :)

@MJP
Thanks for those ambient papers. I've been trying out lots of techniques lately. So far generating a realtime radiosity map is the best thing I've done, but big lightmaps (big scenes) are still slow to update completely. Using the actual screen content might be a better way to do it. I made an attempt, but in that case the lighting depended too much on the current camera position: for each and every movement/rotation, the ambient lighting would change. But maybe that paper has a better solution. Definitely worth a read.


A deferred renderer stays interesting nevertheless, and people like Wolf could make it more and more useful over time. In my 'forward renderer' I use DR in some way as well. The lighting is done the old way, but I also have 4 buffers with pre-calculated stuff such as the (parallaxed) world normal, albedo, specular, etc. But instead of rendering a full-screen quad with the light volumes on top, I render the world normally with these buffers projected onto it.

Maybe not the fastest way, but at least adding transparency and other techniques that need data not available in these buffers is a little easier now. And I save some time in the pixel calculations. For example, the color of a terrain is a fairly complicated one (blending multiple textures with weight values); I only have to do that once, instead of once for each light or any other technique that needs the albedo color. It also keeps the shaders clean: I only have to implement that technique in one place.

It gets difficult if the view changes, though. I have reflection cubemaps, a mirrored view for water, and I render the world for radiosity patches. All these passes would need to render the 4 buffers again, since the camera changes. So instead, I just made a simplified render path for these 'probes' that does not need these buffers.

Greetings,
Rick

