outlawhacker

Deferred shading - multiple lights


Hey, I've been trying to implement a deferred shader and I've got some of it working. I wanted to try the method because it separates geometry rendering from light shading, and I've also heard of people playing around with somewhere around 100 lights with almost no frame rate drop (I know all the negative stuff about deferred shading too, so I know it's not some magical lighting solution :P).

My question is just: how do you render more than one light? I tried processing multiple lights in the same pass and that went OK - about 10 lights without any significant frame rate drop. But that solution is just ugly; I would like to do every light in a separate pass. That, though, tears my frame rate down as much as a multi-pass forward lighting solution... Has anyone tried rendering every light in a separate pass and gotten good performance? I'm wondering if I'm doing something wrong here.

The lighting pass is a vertex shader that just passes the screen-space directions to the frustum's far corners; the pixel shader then handles all of the G-buffer sampling to light the scene. I'm not culling the lights at all, but at this point I'm just trying to throw in a bunch of light sources to check the performance... You wouldn't think rendering some fullscreen quads would cost that much :( Thanks in advance :)

I got nice performance when I used one fullscreen quad per light. I rendered a thousand of those quads, each with different values (light position, light attenuation, ...), and blended them together. It was under OpenGL, so I just used the classic glEnable(GL_BLEND). It worked very nicely :)
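To illustrate, here's a minimal sketch of that setup in C++/OpenGL. It assumes additive blending (GL_ONE, GL_ONE), the usual choice for accumulating light contributions, and the two helper functions are placeholders rather than real API calls:

    #include <GL/gl.h>

    void bindLightUniforms(int lightIndex);  // placeholder: upload light position, attenuation, ...
    void drawFullscreenQuad();               // placeholder: one quad through the light shader

    // Accumulate one fullscreen quad per light into the framebuffer.
    void renderLights(int numLights)
    {
        glEnable(GL_BLEND);
        glBlendFunc(GL_ONE, GL_ONE);   // additive: dst = dst + src
        glDepthMask(GL_FALSE);         // lighting passes shouldn't write depth

        for (int i = 0; i < numLights; ++i)
        {
            bindLightUniforms(i);
            drawFullscreenQuad();
        }

        glDepthMask(GL_TRUE);
        glDisable(GL_BLEND);
    }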

Deferred shading isn't a magical solution, but it's the fastest approach when you use many lights! Try rendering 10 lights with a standard forward BRDF pass - it'll be a slideshow!

Hmm... thousands of fullscreen quads? That's crazy. If I try to render 50 quads I get like 2 fps. I'm using DX9.

I downloaded some sample applications that use deferred rendering and tweaked them to render more quads. Their frame rates already dropped significantly with 20-30 lights. One application was in OpenGL, the other in DirectX. The OpenGL sample actually runs better with forward rendering than with deferred rendering on my computer (Intel Core 2 Duo, 4 GB RAM, GeForce 7900 GT).

Here are the links to the samples:

DX9 - http://www.codesampler.com/usersrc/usersrc_7.htm
OpenGL - http://www.paulsprojects.net/opengl/defshad/defshad.html

Is there a special way of rendering the quads that I don't know about? Just altering these samples to use more lights (and therefore more fullscreen-quad rendering passes) didn't give me any good performance.

Well, there are a few different ways deferred rendering can be done. Are you using a depth map or a position map? Are you using scissor testing to optimize the lights? What hardware are you using? And finally, what effects are you using (normal mapping, steep parallax mapping, HDR, bloom, soft shadows, reflections and refractions, ...)?

Anyway, try DX10 - there are some nice solutions for this there. At www.humus.ca there's a nice example of deferred rendering with DirectX 10. I'm rendering a few thousand lights (for deferred radiosity) with scissor testing at 20-30 fps, on a Pentium D 2x2.8 GHz with 6 GB RAM (well, it's only using 2 GB of it - I'm testing under 32-bit Windows, not 64-bit Linux) and a Radeon HD 2900 XT at 1440x900, with normal mapping, PCMLSM shadows (my own very slow method), reflections and refractions, and radiosity - it uses a little raytracing, but nothing special. The rendered area has a few tens of thousands of polygons.

Yeah, DX10 seems to have a few improvements that I'd like to try in the future. Unfortunately I don't have a DX10 card :(

I'm not using any effects right now - just simple point lights on a mesh of about 2k tris. I'm not using any scissor testing at all, just rendering a full fullscreen quad for every light.

I've tried using both a position map and a depth map, but that didn't change much performance-wise in my tests. There are some extra calculations needed to reconstruct the pixel position when using a depth map, but I didn't notice any performance loss.
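For reference, that reconstruction is basically just scaling the interpolated far-plane ray by the stored linear depth - a sketch in plain C++ (in the real renderer this lives in the pixel shader; all names here are illustrative):

    struct Vec3 { float x, y, z; };

    // viewRay:     interpolated direction from the eye through this pixel to
    //              the frustum far plane (what the vertex shader passes down)
    // linearDepth: G-buffer depth, normalized so 1.0 lies on the far plane
    Vec3 reconstructViewPos(Vec3 viewRay, float linearDepth)
    {
        Vec3 p;
        p.x = viewRay.x * linearDepth;   // view-space position is just
        p.y = viewRay.y * linearDepth;   // the ray scaled by depth
        p.z = viewRay.z * linearDepth;
        return p;
    }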

If you were using OpenGL there'd be a nice demo for you - http://www.ultimategameprogramming.com/demoDownload.php?category=OpenGL&page=9. Maybe there's another problem with the rendering - are you using antialiasing on your framebuffers (it may also just be forced on in the NVIDIA Control Panel or Catalyst Control Center)? That would drop your framerate dramatically (though it's probably not the issue here).

My method is much simpler - just normal and position (I'll say position here because it's simpler):
1. Render the position and normal geometry buffers.
2. In the "deferred" pass, if the light's attenuation at the pixel is greater than 0.000001 (I've got this as a constant - lambda :D - I like that letter), calculate the diffuse and specular light (well, I often calculate indirect light too, using rays, and add it as ambient, but let's keep this simple). I'm only sending the light position and camera position to the shader. A sketch of this cutoff is below.
The "deferred" pass shader runs for every full-screen quad, which I blend with the other full-screen quads.
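In rough C++ terms the early-out looks like this (shader code in practice; the distance falloff model is an assumed standard one, not necessarily the poster's exact formula):

    // kc, kl, kq: constant / linear / quadratic attenuation coefficients.
    // Returns the attenuation factor, or 0 when the light is too weak to matter.
    float lightAttenuation(float dist, float kc, float kl, float kq)
    {
        const float lambda = 0.000001f;  // cutoff constant from the post
        float att = 1.0f / (kc + kl * dist + kq * dist * dist);
        if (att <= lambda)
            return 0.0f;                 // skip diffuse/specular entirely
        return att;                      // scale diffuse + specular by this
    }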

For deferred rendering, a key optimization will be ensuring that as few pixels are shaded as possible during your lighting pass. With your approach, simply rendering a full-screen quad for every light source, you're ensuring that for every pass you're running the pixel shader for every pixel on the screen. This is wasteful and expensive when your light doesn't touch every pixel on the screen. Think about it: you're running a fairly complex pixel shader that requires samples from multiple textures and that could be outputting and blending 32bpp or 64bpp pixels. That's quite a bit of fillrate and bandwidth, and could very quickly get out of hand as you increase resolution and the number of lights.

One simple way to deal with this, as Vilem Otte mentioned, is to use scissor rectangles. You simply estimate a bounding box for your light in screen space, and then use that to limit the number of pixels that are rendered.
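In D3D9 that comes down to a couple of render states - a minimal sketch, with the bounds computation and quad drawing left as placeholder helpers:

    #include <d3d9.h>

    void DrawFullscreenQuad(IDirect3DDevice9* device);  // placeholder

    // Restrict a light's fullscreen quad to its screen-space bounding rect;
    // pixels outside 'bounds' never reach the pixel shader.
    void DrawLightScissored(IDirect3DDevice9* device, const RECT& bounds)
    {
        device->SetRenderState(D3DRS_SCISSORTESTENABLE, TRUE);
        device->SetScissorRect(&bounds);
        DrawFullscreenQuad(device);
        device->SetRenderState(D3DRS_SCISSORTESTENABLE, FALSE);
    }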

For more fine-grained culling, light volumes can be used. Usually a cone is used for spot lights, and a sphere or box for point lights (quads can still be used for fullscreen directional lights). This adds more vertex processing, but allows more precise culling, and you can usually get away with some very low-poly volumes anyway. Another advantage is that it can be combined with z-culling to reject pixels where the light is "buried underground" or "floating in air" (depending on whether back-face or front-face culling is used when rendering the volumes). If combined with stencil techniques, it's possible to reject all pixels that won't need shading. See this presentation and also this PDF for more info.

Lately I've been using a stencil-based technique where I render volume with depth-writes and stencil-writes, then render a full-screen quad with stencil-culling enabled. I do this only for lights that are relatively large, for smaller lights I just render the volumes directly with z-culling. This lets me get 300+ FPS @ 1280x1024 on my 8800 GTS with 1 large point light, 1 large spotlight, 1 full-screen directional light, and 200 small point light sources (attached to a particle system). This is all D3D9, btw.
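For the curious, the stencil marking boils down to something like this in D3D9 - one variant of the technique, sketched with placeholder draw helpers (exact compare functions, stencil ops, and cull modes depend on how you treat the volume):

    #include <d3d9.h>

    void DrawLightVolume(IDirect3DDevice9* device);     // placeholder
    void DrawFullscreenQuad(IDirect3DDevice9* device);  // placeholder

    void DrawLightStencilCulled(IDirect3DDevice9* device)
    {
        // Pass 1: mark the pixels the light volume covers (no color output).
        device->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
        device->SetRenderState(D3DRS_STENCILENABLE, TRUE);
        device->SetRenderState(D3DRS_STENCILREF, 1);
        device->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_ALWAYS);
        device->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_REPLACE);
        DrawLightVolume(device);

        // Pass 2: shade only the marked pixels with the fullscreen quad.
        device->SetRenderState(D3DRS_COLORWRITEENABLE,
            D3DCOLORWRITEENABLE_RED | D3DCOLORWRITEENABLE_GREEN |
            D3DCOLORWRITEENABLE_BLUE | D3DCOLORWRITEENABLE_ALPHA);
        device->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_EQUAL);
        device->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);
        DrawFullscreenQuad(device);
        device->SetRenderState(D3DRS_STENCILENABLE, FALSE);
    }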

Quote:
my 8800 GTS with 1 large point light, 1 large spotlight, 1 full-screen directional light, and 200 small point light sources (attached to a particle system). This is all D3D9, btw.

How about 1.5k point lights on a platform that has only half the power of your 8800 GTS (the Xbox 360) ... while running a whole game in the background?
It is all about renderer design ... what you guys call a deferred renderer is a very strange, over-complicated renderer design with a few very strong design flaws. For example, the lighting equation used to render the lights is completely off: it was designed for a renderer architecture that renders light and material properties at once, which is obviously wrong for an architecture that defers material data and wants to render light data later ...

So what I would like to suggest is to look at what your game actually needs and start with a renderer that defers depth data and shadow data (quite common nowadays), then think about deferring normal data, and, if you want, split the lighting equation into light properties and material properties. You can render many more lights if you really only render the light properties.
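One common reading of that split - illustrative C++ standing in for shader code, not necessarily this poster's actual design:

    struct Vec3 { float x, y, z; };

    static Vec3 mul(Vec3 a, Vec3 b)    { Vec3 r = { a.x * b.x, a.y * b.y, a.z * b.z }; return r; }
    static Vec3 scale(Vec3 v, float s) { Vec3 r = { v.x * s, v.y * s, v.z * s }; return r; }

    // Light pass: accumulate only light properties (no material fetches),
    // once per light, into a light buffer.
    Vec3 LightPass(Vec3 lightColor, float nDotL, float attenuation)
    {
        return scale(lightColor, nDotL * attenuation);
    }

    // Material pass: a single pass combines the accumulated light
    // with the material properties.
    Vec3 MaterialPass(Vec3 albedo, Vec3 accumulatedLight)
    {
        return mul(albedo, accumulatedLight);
    }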

While I'd love to have a nice discussion regarding the pros and cons of deferred rendering, it wasn't my intention to make some sort of case for DR being The Best Thing Ever when I gave those performance numbers. Heck I couldn't even tell you whether those numbers are good or horrible compared to any commercial game, with either forward or deferred rendering. I was just trying to help give the OP an idea of his headroom in terms of performance scaling with some of the optimization options available to him.

Like most others here I'm merely a hobbyist, doing this in what little spare time I have between work and sleep. I pursued deferred rendering out of interest in the technique, not because I'm interested in maximum exploitation of a platform for a commercial game. I've enjoyed working with the technique, and it's helped me become pretty well acquainted with some of its more important advantages and disadvantages. In that context I don't think it was "wrong" for me to pursue it, even if I did somehow find the time to make a simple game using it.

Regardless, I'd love to hear some of your experiences with DR in a commercial game scenario, if you have any. It's very easy to see the issues it would have on a platform like the 360, but it seems to have more potential on the PS3 and new PC hardware. I've mostly found it simple and straightforward to implement, but I can see how its caveats could cause much pain with more complex engine architectures.

Sorry to hijack the thread. My main point is that the terms "deferred renderer" and "forward renderer" are quite artificial. Neither is really used in pure form in real-world applications ... I will structure my thoughts and follow up with an example and some explanation here ... in a few weeks.
I can't talk about unreleased games, but of the two games I am currently working on, one uses more "deferred lighting" elements than the other. Neither can be considered a deferred renderer, though, because transparent objects still need to be rendered immediately.

Quote:
For more fine-grained culling, light volumes can be used. Usually a cone is used for spot lights, and a sphere or box for point lights (quads can still be used for fullscreen directional lights). This adds more vertex processing, but allows more precise culling, and you can usually get away with some very low-poly volumes anyway. Another advantage is that it can be combined with z-culling to reject pixels where the light is "buried underground" or "floating in air" (depending on whether back-face or front-face culling is used when rendering the volumes). If combined with stencil techniques, it's possible to reject all pixels that won't need shading.

This is straight to the point: shade as few pixels as possible. Your performance is then directly related to how much screen area your lights occupy.
I don't know if or how you have access to it, but stencil culling on NVIDIA cards can give you a nice advantage here. I think on ATI cards this is called Hi-Stencil and works a bit differently. If you've gotten as far as stencil culling, I believe you've already done most of what's possible to optimize the light shader.
The other obvious optimization is to make the light shaders as lightweight as possible. I would render the shadow contribution up front into a channel of your G-buffer and then use it to optimize the rest of the lighting pipeline.

If you have everything running, I would recommend picking a graphics card with a 128-bit memory bus ... that is then comparable to the PS3 and the 360. For example, pick an NVIDIA 7600 GT: it has about 20 GB/s of bandwidth on a 128-bit bus. You might be surprised how much your performance collapses at 1280x720 or higher resolutions.
One of the disadvantages is that deferred lighting, the way it is usually described, can't run on anything with less than 20 GB/s ... as long as you don't cut corners ... for example by light-mapping a huge share of the 500 lights that you want to show :-)
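To put rough numbers on that (the target formats here are assumptions for illustration): at 1280x720, a single fullscreen light pass that reads three 32-bit G-buffer targets and additively blends into a 64-bit target touches about 1280 x 720 x (3x4 + 2x8) bytes, roughly 26 MB of memory traffic. On a 20 GB/s card that budget allows around 770 such fullscreen passes per second - about 25 lights at 30 fps in bandwidth alone, before any other rendering work.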

Quote:
Original post by wolf
How about 1.5k point lights on a platform that has only half the power of your 8800 GTS (XBOX 360)


One can design a renderer in any number of ways; the basic reason for "deferred" rendering is to simplify the render path and to provide a straightforward way to minimize overdraw beyond simple draw ordering and early-z culls.

Obviously not everything in a deferred renderer is deferred... all it really means is that you render out position, normals, and color separately, and do most of the heavy shader ops in image space.

Your comment seems a little pointless, because it's comparing two entirely different things. Having 1000 lights *per scene* in your Xbox 360 engine does not seem remarkable. The deferred method lets you scale the *per-pixel* light count with little overhead - so you don't need to split all your meshes up or do multiple lighting passes.

Quote:
Your comment seems a little pointless, because it's comparing two entirely different things. Having 1000 lights *per scene* in your Xbox 360 engine does not seem remarkable. The deferred method lets you scale the *per-pixel* light count with little overhead - so you don't need to split all your meshes up or do multiple lighting passes.
My point was to make it clear that it is not necessary to follow the deferred renderer paradigm used in research to get 1500 lights on screen. In real-world games you can do this with a renderer design that renders material and light data at the same time.
Let me just add one more argument to the discussion: current and future hardware favors arithmetic instructions over memory access. Memory speed does not increase at the same pace as arithmetic instruction speed, and when you design a renderer you want to balance the two. The design described in many documents will therefore perform worse on future hardware, because arithmetic speed will grow faster than memory speed, and you will want to get by with as few render targets as possible.


The strong points that I see for deferred rendering going forward have to do with decoupling geometric and lighting complexities elegantly using the already-screamin'-fast hardware rasterizer at an extremely fine granularity (pixels). Sure you can do this at an object, spatial partition or larger level (scene graph) but these approaches are more complicated, less precise and do not scale as well with fully dynamic scenes. I'm not saying that they can't be made to work well - certainly they can - but they offer few theoretical advantages over the light contribution "compositing" approach, which tends to be what people are referring to when they talk about "Deferred Lighting".

Quote:
Original post by wolf
So what I would like to suggest is to look at what your game actually needs and start with a renderer that defers depth data and shadow data (quite common nowadays), then think about deferring normal data, and, if you want, split the lighting equation into light properties and material properties.

Certainly that's a good approach and I think people who do it this way see very clearly that the BRDFs/BSDFs are quite separate in reality and thus deferred lighting is almost a more natural way to specify and evaluate the lighting and surface reflectance models.

That said, there's a caveat to some of the deferred shadowing stuff that people are doing nowadays: you lose shadow texture coordinate derivatives unless you write them out separately or compute them analytically. Without them, you lose all hope of properly filtered shadows... you can "soften the edges" but they'll still alias like crazy when viewed from certain angles - Crysis being the latest poster child for this.

Quote:
Original post by wolf
Let me just add one more argument to the discussion: current and future hardware favors arithmetic instructions over memory access. Memory speed does not increase at the same pace as arithmetic instruction speed, and when you design a renderer you want to balance the two. The design described in many documents will therefore perform worse on future hardware, because arithmetic speed will grow faster than memory speed, and you will want to get by with as few render targets as possible.

While it's true that memory speeds are scaling less rapidly than arithmetic ability, the key point to note is that deferred rendering offers entirely predictable, streaming memory access. Thus the latency (and, to some extent, bandwidth) of these transactions can be entirely hidden by a proper cache/local storage hierarchy. These are already in place on modern GPUs, and we're moving even further towards user-managed "scratch pads" with things like the Cell's local store, the G80's parallel data cache, and arguably the 360's EDRAM. It's really random access (gather/scatter) that's the problem.

It's also important to note that scene and shader complexities are increasing at a far more rapid rate than framebuffer resolutions, which motivates image-space lighting algorithms like Deferred Lighting. Sure there will probably be a 2-3 fold increase in LCD pixel density in the next 10 years, but I expect that in the same time we'll be wanting tons of fully dynamic lights (from particle systems say) affecting fully dynamic scenes (from physics simulation say). GI approximations also come to mind, many of which use deferred lighting-style algorithms to efficiently accumulate indirect light.

Anyways it's a big huge topic certainly, but the shift from strictly forward rendering to more hybrid engines has actually been rather fast so far... capable GPUs have only been around for a few years. It will be interesting to see how the state of the art advances in the next little while but if the recent papers in real-time graphics are indicative (they almost always are), deferred lighting/shading is here to stay for the long haul.

Now what gets really interesting is looking at tile-based deferred renderers in GPU hardware, but that's a subject for Beyond3D I think :)

Doh!

It seems hard to touch the subject of deferred rendering without discussing how good or bad it is... Interesting reading, though. Thanks for the replies!

I just wanted to get some info on how to improve the rendering of the lights :) I'm not primarily looking for a way to render 1000 lights. I've tried doing forward rendering and in my experience it was a pain in some areas. Now I would like to explore other options.

And thanks MJP and Otte, I will look up those ways to optimize the rendering and see what that brings.

Quote:
deferred lighting/shading is here to stay for the long haul

I think so too.

Quote:
Original post by outlawhacker
It seems hard to touch the subject of deferred rendering without discussing how good or bad it is... Interesting reading, though. Thanks for the replies!

Hehe, it's true - and I didn't mean to get too off topic. However, your question was really related to the very things that make deferred rendering faster than forward rendering (in certain cases), so it's not really that much of a tangent.

That said, I'd play with stencil culling of volumes for larger lights (since this technique doesn't work as efficiently when rendering lots of lights in a single batch) and use something simpler like scissor tests, etc. for smaller lights. I think at that point you'll be making pretty good utilization of the available hardware/bandwidth, although you may also want to experiment with accumulating multiple lights in one pixel shader pass if you have a lot of "light overdraw".
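Batching here just means evaluating several lights in one shader invocation instead of one pass each, so the G-buffer reads and blend cost are amortized across the batch. A rough sketch in plain C++ standing in for shader code - every name below is illustrative:

    struct Vec3  { float x, y, z; };
    struct Light { Vec3 pos; Vec3 color; float radius; };

    Vec3 EvaluateLight(const Light& l, Vec3 pixelPos, Vec3 normal);  // placeholder

    // One invocation accumulates a whole batch of lights for this pixel.
    Vec3 ShadeBatch(const Light* lights, int count, Vec3 pixelPos, Vec3 normal)
    {
        Vec3 total = { 0.0f, 0.0f, 0.0f };
        for (int i = 0; i < count; ++i)
        {
            Vec3 c = EvaluateLight(lights[i], pixelPos, normal);
            total.x += c.x;
            total.y += c.y;
            total.z += c.z;
        }
        return total;
    }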

