lollan

OpenGL Deferred shader theory


Hi, I'm trying to get my head around the concept of deferred shading, but to be honest I don't really get it. From what I gathered, you first render all the geometry needed into buffers. Then you take those buffers and process the lighting, texturing and such? What am I getting wrong in the basic concept? What kind of buffers would I use in DirectX? And in OpenGL? And finally, is it really more efficient? Thanks

Quote:
Original post by lollan
From what I gathered, you first render all the geometry needed into buffers.

That's the basic idea: you render the components needed for your lighting into buffers (normals, diffuse, etc.). If you want to do that in a single pass, you need to use multiple render targets (MRTs). If you can't, you need to do several passes.

Quote:
Original post by lollan
Then you take those buffers and process the lighting, texturing and such?

For the lighting, you render a quad (or any other geometry) that covers the light's influence in screen space. In the shader, you fetch the components (normal, diffuse, pixel position, etc.) from the buffers and do the lighting. So you end up with a different shader for each type of light (omni, spot, etc.).
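As a concrete illustration, here is a minimal CPU-side sketch (in Python, with hypothetical names) of what a point-light pass does for each pixel it covers; a real implementation would be a pixel shader sampling the G-buffer textures:

```python
import math

def shade_point_light(gbuffer, light_pos, light_color, radius):
    """Shade every pixel covered by one point light.
    gbuffer maps 'position', 'normal', 'albedo' to per-pixel lists."""
    out = []
    for pos, normal, albedo in zip(gbuffer["position"],
                                   gbuffer["normal"],
                                   gbuffer["albedo"]):
        to_light = [l - p for l, p in zip(light_pos, pos)]
        dist = math.sqrt(sum(c * c for c in to_light))
        if dist >= radius:                    # outside the light's influence
            out.append((0.0, 0.0, 0.0))
            continue
        l_dir = [c / dist for c in to_light]  # normalized direction to light
        n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, l_dir)))
        att = 1.0 - dist / radius             # simple linear falloff
        out.append(tuple(a * lc * n_dot_l * att
                         for a, lc in zip(albedo, light_color)))
    return out
```

Note that the geometry itself never appears here: the light pass only touches the per-pixel data already written into the buffers.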

Quote:
Original post by lollan
What kind of buffers would I use in DirectX? And in OpenGL?

That depends on your requirements. Here's the setup I use:

MRTs:
* RT0: A8R8G8B8
* RT1: R32F
* RT2: A8R8G8B8
* RT3: A8R8G8B8

PrePass:
* Render all objects
* Write Z
* Output: RT0: rgb8:Normal, a8:SpecularPower
* Output: RT1: r32f:Depth

LightPass:
* Render all lights
* Input: RT0, RT1
* Output: RT2: rgb8:Diffuse
* Output: RT3: rgb8:Specular

FinalPass:
* Render all objects
* Input: RT2 and RT3
* Output: RT0: rgb8:FinalColor

The final pass is used to combine the ambient and emissive components with the rest of the lighting equation, and can be skipped if it is not required.
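One detail the layout above implies: the light pass only has the single depth value in RT1, so it must reconstruct each pixel's position from it. A common approach, sketched here in Python under the assumption that RT1 stores linear view-space depth, is to scale the per-pixel view ray:

```python
import math

def view_ray(ndc_x, ndc_y, fov_y, aspect):
    """Direction from the eye through a pixel, in view space.
    ndc_x and ndc_y are in [-1, 1]; fov_y is in radians."""
    tan_half = math.tan(fov_y / 2.0)
    return (ndc_x * tan_half * aspect, ndc_y * tan_half, -1.0)

def reconstruct_view_pos(linear_depth, ray):
    """If RT1 holds linear view-space depth (distance along -Z),
    the position is the view ray scaled so its z equals -depth."""
    scale = linear_depth / -ray[2]
    return tuple(c * scale for c in ray)
```

For the center pixel, view_ray(0, 0, ...) is (0, 0, -1), so a stored depth of 10 reconstructs to (0, 0, -10). In a shader the ray is usually interpolated from the frustum corners instead of recomputed per pixel.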

Quote:
Original post by lollan
And finally, is it really more efficient?

It depends on the requirements of your application. If you need to support a high number of lights, it can be more efficient. On the other hand, the light passes can become a bottleneck at high resolutions, etc.

JF

>> And finally, is it really more efficient?
Yes and no. The idea is to calculate all the expensive stuff once, and then do the lighting with a minimal number of per-pixel calculations.

A normal "forward renderer" draws a bunch of polygons that are affected by a light X. Usually you render more than needed: for example, the entire sector while only a few pixels are lit, or at least the entire wall, even if it is only slightly affected by light X. Furthermore, for each light you have to calculate the same stuff again and again. For example:
pass[x].resultColor = dot( light[x].dir, pixelNormal ) * light[x].color * diffuseTexture + specular( light[x] ) * specularColor
finalResult = sum( all light passes )

Getting the per-pixel normal requires a texture fetch, possibly multiple if you plan to use parallax mapping with the help of a heightmap. And maybe you need a second normal map as well, if you want to use detail normal mapping. And then there is often a specular color that can differ per material, or even per pixel. You have to do all of this for each light, again and again.
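The redundant work can be sketched with simple fetch counting (the numbers here are purely illustrative, not measurements):

```python
def forward_fetches(num_lights, material_fetches):
    # Forward: every light pass re-fetches all material textures
    # (normal map, heightmap, detail map, specular map, ...).
    return num_lights * material_fetches

def deferred_fetches(num_lights, material_fetches, gbuffer_reads):
    # Deferred: materials are fetched once while filling the G-buffer;
    # each light pass then only reads a few G-buffer samples.
    return material_fetches + num_lights * gbuffer_reads
```

With, say, 8 lights, 5 material fetches, and 2 G-buffer reads per light, forward rendering costs 40 fetches per pixel versus 21 for deferred, and the gap widens with every extra light.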

With deferred shading, you only have to do this once. Like Deks explained, you fill a number of buffers with the world normals, diffuse/specular colors, reflection term, and so on, possibly with parallax mapping already applied. This data is ready to use for 1 light, or 10 billion lights for that matter. You still have to do some calculations for each lit pixel, but a big part has already been done. And, if done properly, you can render the lights more efficiently: instead of rendering an entire wall, you project the light onto the wall and evaluate 'exactly' the affected pixels. Nothing more, nothing less.


Sounds perfect, doesn't it? Logically, the more lights you have visible at the same time, the more benefit you will get. HOWEVER, when do you really need that many lights? Sure, a scene could have 100 light sources, but are they really all visible at the same time? Turn on all the lights in your house, and count the number of lights that directly (not indirectly; ambient is a whole different story) shine on your table, floor, or a wall. 2, maybe 3?

A more complex scene would be a city at night. But again, do you really need that many lights? A lot can be faked with simple flares in the distance, and/or a global lightmap. I don't think a game like GTA IV is actually rendering that many lights.


In that case, the deferred rendering approach won't automatically give you a speed gain. In fact, it could make things even slower, or at least more 'restricted'. Your GPU is spared some pixel calculations, but now you need to calculate the light volumes (a CPU task), which is not that easy. And your buffers will eat bandwidth: a 1680 x 1050 resolution with 4 buffers needs ~54 MB of texture memory (in case all 4 textures are RGBA 16F). Modern cards can do that, but older ones...
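The ~54 MB figure follows directly from the buffer dimensions; RGBA16F is 4 channels at 2 bytes each, i.e. 8 bytes per pixel:

```python
def gbuffer_bytes(width, height, num_targets, bytes_per_pixel):
    # Total texture memory for the G-buffer render targets.
    return width * height * num_targets * bytes_per_pixel

# 1680 x 1050, 4 buffers, RGBA16F (4 channels x 2 bytes = 8 bytes/pixel)
mb = gbuffer_bytes(1680, 1050, 4, 8) / (1024 * 1024)
```

This evaluates to roughly 53.8 MB, matching the figure above; at 32 bits per pixel the same four buffers would take half that.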

But OK, that is not really the biggest problem. A real pain in the @ss is alpha blending and the restrictions. Let me explain. Each pixel in your buffers contains data for one pixel on the resulting screen (its normal, color, specular, etc.). When doing blending, you see 2 or more surfaces at the same location (for example, glass with a wall behind it). What to do now? You can't store data for 2 separate pixels. Blending the normals or depth (if you want to use shadow maps) will cause wacky results. In other words, you will need to render the alpha-blended objects separately afterwards. There are several tricks for that, but they come at a high cost. Some transparent surfaces are relatively easy to "paste on top"; others can cause problems. For example, if you want a lot of grass that also uses the lighting and shadow maps, you will need to render the grass separately with a normal forward renderer, and then blend the 2 results (opaque buffer, alpha buffer) properly...

Another problem is the restrictions. Let's say you can have 4 data buffers at most. 4 RGBA buffers = 16 scalar values. Deks showed you how you can fill them. 16 scalars is fine for ordinary shading operations, but as soon as you want more exotic lighting models, play around with multiple reflection cubemaps, or have to add extra emissive/ambient data, you will run out of options. Of course, you can fix that with another hack afterwards, but it will make your rendering pipeline somewhat dirty, and maybe not more efficient than a normal forward renderer in the end...



I'm not saying deferred rendering is a no-go. And there are probably tricks to overcome some of the problems I mentioned. But before deciding to use it as your lighting system, first consider whether a normal forward renderer really isn't an option (how many lights you expect, what techniques you would like to use, how you will handle alpha blending, etc.).

Greetings,
Rick

Thanks Spek, you highlighted some restrictions I didn't think about. However, I think most of the problems you talked about (like the blending, for example) can be resolved, or at least minimised, with some optimisation.

Thanks again

Quote:
Original post by spek

Sounds perfect, doesn't it? Logically, the more lights you have visible at the same time, the more benefit you will get. HOWEVER, when do you really need that many lights? Sure, a scene could have 100 light sources, but are they really all visible at the same time? Turn on all the lights in your house, and count the number of lights that directly (not indirectly; ambient is a whole different story) shine on your table, floor, or a wall. 2, maybe 3?



Well, to be fair, a big benefit of deferred shading is that your lighting cost is pretty directly related to the number of pixels affected by lights. So, for example, a hundred small lights could be cheaper than a full-screen global light source if the sum of their pixels is less than the backbuffer resolution. And if lots of these small light sources are off-screen, too small, or occluded by geometry, the scene is then much cheaper to render (provided you're properly using z-culling, and stencil culling if necessary). Of course, I know what you're saying here: if you're not going to have lots of light sources, you're going to suffer through the penalties of deferred shading without getting one of its primary benefits. However, I just don't think performance is necessarily something you'd lose.

Also as far as ambient/indirect lighting goes, I've seen some clever techniques that make use of the G-Buffer to add some indirect lighting to a scene. This presentation is somewhat interesting, as is this one. Not really practical IMO, but interesting. [smile]

Quote:
Original post by spek
A more complex scene would be a city at night. But again, do you really need that many lights? A lot can be faked with simple flares in the distance, and/or a global lightmap. I don't think a game like GTA IV is actually rendering that many lights.


GTA IV is actually a great example: at night, all the dynamic light sources look fantastic. The headlights, the tail-lights, the street lamps... very effective, IMO. And of course, GTA IV does actually use deferred techniques.

Quote:
Original post by spek
In that case, the deferred rendering approach won't automatically give you a speed gain. In fact, it could make things even slower, or at least more 'restricted'. Your GPU is spared some pixel calculations, but now you need to calculate the light volumes (a CPU task), which is not that easy. And your buffers will eat bandwidth: a 1680 x 1050 resolution with 4 buffers needs ~54 MB of texture memory (in case all 4 textures are RGBA 16F). Modern cards can do that, but older ones...


Bandwidth is definitely a big problem on low- and mid-end GPUs. For higher-end cards, G-buffers work great since they have the bandwidth and the memory access patterns are consistent, but low-end cards simply buckle under the pressure (in my experience, anyway). This should hopefully become less of an issue as GPUs get more and more bandwidth... plus, you probably don't want to do deferred rendering on DX9-class hardware anyway, since you lose MSAA in that case.

As far as light volumes go, that's not really a CPU task at all. All you need to do is render a cone for spot lights or a sphere for point lights. You transform and rasterize them just like normal geometry, and you shade only the visible pixels by sampling the G-buffer. You also get to use the normal rasterization optimizations, like z-cull and stencil-cull. You can use even simpler geometry if you want... I used to use cubes for rendering huge numbers of particles that acted as light sources.
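To make the sphere cover exactly the pixels the light can affect, you scale it to the distance where the attenuated intensity becomes negligible. A sketch, assuming a simple quadratic falloff and an arbitrary 1/256 cutoff (both assumptions of mine, not from the post):

```python
import math

def point_light_radius(intensity, k_quadratic, cutoff=1.0 / 256.0):
    """Distance at which intensity / (1 + k*d^2) drops below `cutoff`;
    scale the light-volume sphere to this radius."""
    # Solve intensity / (1 + k*d^2) = cutoff for d:
    return math.sqrt((intensity / cutoff - 1.0) / k_quadratic)
```

This is a one-time calculation per light, so even done on the CPU it is negligible; the actual culling then happens for free in the rasterizer.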

On a side note, you definitely don't need four 64bpp buffers. Just about all the implementations I've seen have worked with four 32bpp buffers.
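Fitting into 32bpp targets relies on packing; for example, a unit world normal squeezed into the RGB8 channels of RT0 with the usual scale-and-bias (sketched here explicitly; real shaders do this implicitly when writing to an unsigned-normalized target):

```python
def pack_normal(n):
    # Map each component from [-1, 1] to a byte in [0, 255].
    return tuple(int(round((c * 0.5 + 0.5) * 255.0)) for c in n)

def unpack_normal(rgb):
    # Inverse mapping; precision loss is at most ~1/255 per component.
    return tuple(c / 255.0 * 2.0 - 1.0 for c in rgb)
```

The small quantization error is usually invisible in diffuse lighting, though tight specular highlights can show banding, which is one reason some layouts spend more bits on the normal.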

Quote:
Original post by spek
But OK, that is not really the biggest problem. A real pain in the @ss is alpha blending and the restrictions. Let me explain. Each pixel in your buffers contains data for one pixel on the resulting screen (its normal, color, specular, etc.). When doing blending, you see 2 or more surfaces at the same location (for example, glass with a wall behind it). What to do now? You can't store data for 2 separate pixels. Blending the normals or depth (if you want to use shadow maps) will cause wacky results. In other words, you will need to render the alpha-blended objects separately afterwards. There are several tricks for that, but they come at a high cost. Some transparent surfaces are relatively easy to "paste on top"; others can cause problems. For example, if you want a lot of grass that also uses the lighting and shadow maps, you will need to render the grass separately with a normal forward renderer, and then blend the 2 results (opaque buffer, alpha buffer) properly...


Yeah, alpha blending is a real pain. Having to blend it in later sucks... but to be fair, you end up having to do this anyway in most forward renderers. Still, it would be nice to keep things consistent in a deferred renderer (since one of the main benefits of going DR is a simple and elegant renderer design, which really takes a hit when you need special-case code for certain surfaces). Humus actually has a really neat demo of deferred rendering with non-opaque surfaces, but it requires DX10 hardware and also adds a significant performance hit.



Anyway I'll also add a few other pros/cons that haven't been mentioned yet...

+You only need one shadow map in memory, since you can use "lazy" shadow map generation
+Coherent memory access patterns
+Your renderer can usually be simpler and more elegant
+/-Lighting model is unified. This is a + because it makes things simpler (especially in your shaders), but a - because it can be very restrictive (as spek mentioned earlier)
-Bandwidth usage is *huge* (sampling 4 g-buffers + blending to the backbuffer)
-No multi-sampling on DX9-class hardware. Multi-sampling on DX10 hardware is possible, but significantly more expensive (you need to effectively super-sample your lighting calculations in the shader)


EDIT: Just in case it's not clear where I stand based on my comments...I don't really think going DR is worth it unless you can exclusively target DX10 and higher hardware.

[Edited by - MJP on August 13, 2008 4:53:06 PM]

Quote:
Just in case it's not clear where I stand based on my comments... I don't really think going DR is worth it unless you can exclusively target DX10 and higher hardware.
I second this.
In the meantime, I started to come up with a new renderer design that might be more flexible than a deferred renderer and offer 2-3 times more lights... depending on how you scale it. It also offers MSAA on DX9 hardware and is capable of running on lower-end hardware. I got it running in one of our unannounced games and it looks very promising.
I talked about it twice, at UCSD and at the University of Stuttgart, and you can find the slides here:

http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html

It is a mixture of a Z pre-pass renderer and a deferred renderer, and I call it the Light Pre-Pass renderer. GTA IV used a deferred renderer.
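As a rough single-pixel sketch of the idea (my simplification, not the exact math in the slides): a first geometry pass writes normal and depth, a light pass accumulates lighting with no material data at all, and a second geometry pass applies the material:

```python
def light_prepass_pixel(normal, albedo, lights):
    """One pixel through the three Light Pre-Pass stages.
    lights is a list of (unit_direction, color) pairs."""
    # Pass 1: geometry pass writes normal (+ depth); here we just keep it.
    # Pass 2: light pass accumulates diffuse lighting into a light buffer,
    #         needing only the normal/depth data, no material properties.
    light_buffer = [0.0, 0.0, 0.0]
    for l_dir, l_color in lights:
        n_dot_l = max(0.0, sum(n * l for n, l in zip(normal, l_dir)))
        for i in range(3):
            light_buffer[i] += n_dot_l * l_color[i]
    # Pass 3: geometry is rendered again; the material (albedo) is
    #         applied to the accumulated lighting.
    return tuple(a * lb for a, lb in zip(albedo, light_buffer))
```

Because the light buffer stores only accumulated lighting, it fits in a single render target, which is what lets this approach run on hardware where a full fat G-buffer would be too expensive.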

>> And of course GTAIV does actually use deferred techniques.
I said nothing :)

@MJP
Thanks for those ambient papers. I'm trying out lots of techniques lately. So far, generating a realtime radiosity map is the best thing I've done, but big lightmaps (big scenes) are still slow to update completely. Using the actual screen content might be a better way to do it. I made an attempt, but in that case the lighting depended too much on the current camera position: with every movement/rotation, the ambient lighting would change. But maybe that paper has a better solution. Definitely worth a read.


A deferred renderer stays interesting nevertheless, and people like Wolf could make it more and more useful over time. In my 'forward renderer' I use DR in some ways as well. The lighting is done the old way, but I also have 4 buffers with pre-calculated stuff such as the (parallaxed) world normal, albedo, specular, etc. But instead of rendering a full-screen quad with the lighting volumes on top, I render the world normally with these buffers projected onto it.

Maybe not the fastest way, but at least adding transparency, and other techniques that need data not available in these buffers, is a little easier now. And I save some time in the pixel calculations. For example, the color of a terrain is a fairly complicated one (blending multiple textures with weight values). I only have to do that once, though, instead of for each light or any other technique that needs the albedo color. It also keeps the shaders clean; I only have to implement each technique in one place.

It gets difficult if the view changes, though. I have reflection cubemaps, a mirrored view for water, and I render the world for radiosity patches. All these passes would need to render the 4 buffers again, since the camera changes. So instead, I just made a simplified renderer for these 'probes' that does not need these buffers.

Greetings,
Rick

Thanks guys.

I don't understand everything yet, but I sure have all I need to get started.
For now, I've downloaded some demos and papers. I will read them all and try to understand the demos before getting started.


