Deferred shading - multiple lights

Hey, I've been trying to implement a deferred shader and I've gotten some of it working. I wanted to try the technique because it separates geometry rendering from light shading, and I've also heard of people playing around with somewhere around 100 lights with almost no frame rate drop (I know all the negative stuff about deferred shading too, so I know it's not some magical lighting solution :P).

My question is simply: how do you render more than one light? I tried processing multiple lights in the same pass and that went OK - about 10 lights without any significant frame rate drop. But that solution is just ugly; I would like to do every light in a separate pass. That, though, tears up my frame rate as much as a multi-pass forward lighting solution... Has anyone tried rendering every light in a separate pass and gotten good performance? I'm wondering if I'm doing something wrong here.

The lighting pass is a vertex shader that just passes the screen-space directions to the far frustum corners, and the pixel shader then handles all of the G-buffer sampling to light the scene. I'm not culling the lights at all, but at this point I'm just trying to throw in a bunch of light sources to check the performance... You'd think it wouldn't cost that much to render some fullscreen quads :( Thanks in advance :)
I got "nice performance" when I used one fullscreen quad per light. I rendered a thousand of the quads, each with different values (light position, light attenuation, ...) and blended them together. It was under OpenGL, so I used classic glEnable(GL_BLEND) blending, and it worked very nicely :)
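A minimal sketch of that blend setup (the additive glBlendFunc choice is an assumption here - accumulating lights means summing them, not alpha-blending - and drawFullscreenQuad() is a hypothetical helper):

#include <GL/gl.h>

// Additive accumulation of per-light fullscreen quads. Assumes a valid GL
// context and an already-filled G-buffer bound to the lighting shader.
void accumulateLights(int numLights)
{
    glDisable(GL_DEPTH_TEST);      // fullscreen quads need no depth test
    glDepthMask(GL_FALSE);         // and must not write depth
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);   // dst += src: each light adds its contribution

    for (int i = 0; i < numLights; ++i)
    {
        // set this light's uniforms (position, attenuation, color) ...
        // drawFullscreenQuad();
    }

    glDisable(GL_BLEND);
    glDepthMask(GL_TRUE);
}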

Deferred shading isn't a magical solution, but it's the fastest when you use many lights! Try rendering 10 lights with a standard BRDF in a forward renderer - it'll be a slideshow!

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Hmm...thousands of fullscreen quads? That's crazy. If I try to render 50 quads I get like 2 fps. I'm using DX9.

I tried downloading some sample applications that use deferred rendering and tweaked them to render more quads. Their frame rate already dropped significantly at 20-30 lights. One application was in OpenGL, the other in DirectX. The OpenGL sample even runs better with forward rendering than with deferred rendering on my computer (Intel Core 2 Duo, 4GB RAM, GeForce 7900GT).

Here are the links to the samples:

DX9 - http://www.codesampler.com/usersrc/usersrc_7.htm
OpenGL - http://www.paulsprojects.net/opengl/defshad/defshad.html

Is there a special way of rendering the quads that I don't know about? 'Cause just altering these samples to use more lights (and therefore render more fullscreen-quad passes) didn't give me any good performance.
Well, there are a few ways deferred rendering can be done. Are you using a depth map or a position map? Are you using scissor testing to optimize the lights? What hardware are you using? And lastly, what effects are you using (normal mapping, steep parallax mapping, HDR, bloom, soft shadows, reflections and refractions, ...)?

Anyway, try DX10 - there are some nice solutions for this there. At www.humus.ca there's a nice example of deferred rendering with DirectX 10. I'm rendering a few thousand lights (for deferred-rendered radiosity) with scissor testing at 20-30 fps on a Pentium D 2x 2.8 GHz, 6 GB RAM (well, it's using just 2 of them - I'm testing it under 32-bit Windows, not 64-bit Linux) and a Radeon HD 2900 XT at 1440x900, with normal mapping, PCMLSM shadows (my own very slow method), reflections and refractions, and radiosity - it uses a little raytracing, but nothing special. The rendered area has a few dozen thousand polygons.

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Yeah, DX10 seems to have a few improvements that I'd like to try in the future. Unfortunately I don't have a DX10 card :(

I'm not using any effects right now - simple point lights and a mesh of about 2k tris. I'm not using any scissor testing at all, just rendering a complete fullscreen quad for every light.

I've tried using both a position map and a depth map, but that didn't do much for performance in my tests. There are some extra calculations needed to reconstruct the pixel position when using a depth map, but I didn't notice any performance loss.
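For reference, a minimal sketch of that reconstruction, assuming linear normalized depth and the interpolated far-plane ray described earlier (all names are illustrative):

// View-space position from a linear depth value, assuming the vertex
// shader passes a ray to the far-plane corner for each quad vertex.
struct float3 { float x, y, z; };

// viewRay:     interpolated pixel-to-far-plane direction for this pixel
// linearDepth: view-space depth from the G-buffer, normalized so that
//              1.0 lands exactly on the far plane
float3 reconstructViewPos(const float3& viewRay, float linearDepth)
{
    // The ray already reaches the far plane, so scaling it by the
    // normalized depth lands on the shaded surface.
    return { viewRay.x * linearDepth,
             viewRay.y * linearDepth,
             viewRay.z * linearDepth };
}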
If you were using OpenGL, there'd be a nice demo for you - http://www.ultimategameprogramming.com/demoDownload.php?category=OpenGL&page=9. Maybe there's another problem with your rendering - are you using antialiasing with your framebuffers (or it might just be forced on in the NVIDIA Control Panel or Catalyst Control Center)? That would drop your framerate dramatically (though it's probably not the cause here).

My method is much simpler - just normal and position (I'll say position here, because it's simpler):
1. Render the position and normal geometry buffers.
2. In the "deferred" pass, if the light's attenuation at this pixel is greater than 0.000001 (I keep this as a constant named lambda :D - I like that letter), calculate the diffuse and specular light. (I often calculate indirect light too, using rays, and add it as ambient, but let's keep this simple!) I'm sending just the light position and camera position to the shader - see the sketch below.
The "deferred" pass shader runs for every full-screen quad, which I blend with the other full-screen quads.
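In C++-style pseudocode, that early-out looks something like this (the attenuation model and coefficients are illustrative assumptions, not the poster's exact code):

// Per-pixel early-out from step 2: skip lights too dim to matter.
const float lambda = 0.000001f;

float attenuation(float dist, float kc, float kl, float kq)
{
    return 1.0f / (kc + kl * dist + kq * dist * dist);
}

void shadePixel(float dist /* |lightPos - pixelPos| from the G-buffer */)
{
    float att = attenuation(dist, 1.0f, 0.09f, 0.032f);
    if (att > lambda)
    {
        // ... compute diffuse and specular, scaled by att ...
    }
    // otherwise output nothing - the additive blend contributes zero
}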

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

For deferred rendering, a key optimization will be ensuring that as few pixels are shaded as possible during your lighting pass. With your approach, simply rendering a full-screen quad for every light source, you're ensuring that for every pass you're running the pixel shader for every pixel on the screen. This is wasteful and expensive when your light doesn't touch every pixel on the screen. Think about it: you're running a fairly complex pixel shader that requires samples from multiple textures and that could be outputting and blending 32bpp or 64bpp pixels. That's quite a bit of fillrate and bandwidth, and could very quickly get out of hand as you increase resolution and the number of lights.
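As a rough back-of-envelope illustration (my numbers, purely hypothetical): at 1280x1024 with a 64bpp target, one full-screen blended pass reads and writes about 1280 x 1024 x 8 bytes x 2 ≈ 21 MB; 50 such lights is over 1 GB of framebuffer traffic per frame, which at 60 fps would demand roughly 60 GB/s of bandwidth before counting a single G-buffer sample.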

One simple way to deal with this, as Vilem Otte mentioned, is to use scissor rectangles. You simply estimate a bounding box for your light in screen space, and then use that to limit the number of pixels that are rendered.
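A rough sketch of that estimate for a point light, assuming a view-space light position and a symmetric projection (all names are hypothetical, and a production version needs care around lights crossing the near plane):

#include <algorithm>

struct ScreenRect { int left, top, right, bottom; };

// lx, ly, lz: view-space light center (lz = distance into the screen)
// projXX, projYY: projection matrix elements proj[0][0] and proj[1][1]
ScreenRect lightScissorRect(float lx, float ly, float lz, float radius,
                            float projXX, float projYY,
                            int screenW, int screenH)
{
    float d  = std::max(lz, 0.001f);              // avoid dividing by zero
    float cx = ( lx * projXX / d) * 0.5f + 0.5f;  // projected center, [0,1]
    float cy = (-ly * projYY / d) * 0.5f + 0.5f;  // y flipped for top-left origin
    float rx = radius * projXX / d * 0.5f;        // approximate projected extents
    float ry = radius * projYY / d * 0.5f;

    ScreenRect r;
    r.left   = std::max(0,       int((cx - rx) * screenW));
    r.right  = std::min(screenW, int((cx + rx) * screenW) + 1);
    r.top    = std::max(0,       int((cy - ry) * screenH));
    r.bottom = std::min(screenH, int((cy + ry) * screenH) + 1);
    return r;  // pass to SetScissorRect() / glScissor() before drawing the quad
}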

For more fine-grained culling, light volumes can be used. Usually a cone is used for spot lights, and a sphere or box for point lights (quads can still be used for fullscreen directional lights). This adds more vertex processing, but allows more precise culling; you can usually get away with some very low-poly volumes anyway. Another advantage is that it can be combined with z-culling to reject pixels where the light is "buried underground" or "floating in air" (depending on whether back-face or front-face culling is used when rendering the volumes). If combined with stencil techniques, it's possible to reject all pixels that won't need shading. See this presentation and also this PDF for more info.

Lately I've been using a stencil-based technique where I render the volume with depth-writes and stencil-writes, then render a full-screen quad with stencil-culling enabled. I do this only for lights that are relatively large; for smaller lights I just render the volumes directly with z-culling. This lets me get 300+ FPS @ 1280x1024 on my 8800 GTS with 1 large point light, 1 large spotlight, 1 full-screen directional light, and 200 small point light sources (attached to a particle system). This is all D3D9, btw.
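In D3D9 render-state terms, the two steps might look roughly like this (a sketch of the approach described above; drawing and shader setup are assumed to happen elsewhere):

#include <d3d9.h>

// Pass 1 marks the pixels covered by the light volume in the stencil
// buffer; pass 2 shades only those pixels with the full-screen quad.
void drawStencilCulledLight(IDirect3DDevice9* device)
{
    // Pass 1: light volume, stencil writes only (no color output).
    device->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
    device->SetRenderState(D3DRS_STENCILENABLE, TRUE);
    device->SetRenderState(D3DRS_STENCILREF, 1);
    device->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_ALWAYS);
    device->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_REPLACE);
    // ... draw the low-poly light volume here ...

    // Pass 2: full-screen quad, shading only where the stencil was marked.
    device->SetRenderState(D3DRS_COLORWRITEENABLE, 0x0000000F);
    device->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_EQUAL);
    device->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);
    // ... draw the full-screen quad with the lighting shader here ...
}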
Quote:my 8800 GTS with 1 large point light, 1 large spotlight, 1 full-screen directional light, and 200 small point light sources (attached to a particle system). This is all D3D9, btw.

How about 1.5k point lights on a platform that has only half the power of your 8800 GTS (XBOX 360) ... while running a whole game in the background.
It is all about renderer design ... what you guys call a deferred renderer is a very strange, over-complicated renderer design that has a few very strong design flaws. For example, the lighting equation that you use to render the lights is completely off: it was designed for a renderer architecture that renders light and material properties at once. This is obviously wrong for an architecture that defers material data and wants to render light data later ...

So what I would like to suggest is to look at whatever your game needs and start with a renderer that defers depth data and shadow data (quite common nowadays), then think about deferring normal data and, if you want, splitting up the lighting equation into light properties and material properties. You can render many more lights if you really only render the light properties.
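A tiny sketch of what that split might look like (an illustration of the idea, not any particular engine's pipeline; all names are hypothetical):

// The light pass accumulates only light-dependent terms into a light
// buffer; a later material pass folds in the surface properties.
struct float3 { float x, y, z; };

static float3 scale(const float3& a, float s)            { return { a.x * s, a.y * s, a.z * s }; }
static float3 modulate(const float3& a, const float3& b) { return { a.x * b.x, a.y * b.y, a.z * b.z }; }

// Light pass: run once per light, accumulated additively.
float3 lightPass(const float3& lightColor, float NdotL, float atten)
{
    float d = NdotL > 0.0f ? NdotL : 0.0f;   // saturate(dot(N, L))
    return scale(lightColor, d * atten);     // no material data touched
}

// Material pass: run once per pixel after all lights are accumulated.
float3 materialPass(const float3& albedo, const float3& lightBuffer)
{
    return modulate(albedo, lightBuffer);    // specular would be folded in similarly
}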
While I'd love to have a nice discussion regarding the pros and cons of deferred rendering, it wasn't my intention to make some sort of case for DR being The Best Thing Ever when I gave those performance numbers. Heck I couldn't even tell you whether those numbers are good or horrible compared to any commercial game, with either forward or deferred rendering. I was just trying to help give the OP an idea of his headroom in terms of performance scaling with some of the optimization options available to him.

Like most others here I'm merely a hobbyist, doing this in what little spare time I have between work and sleep. I pursued deferred rendering out of interest in the technique, not because I'm interested in maximum exploitation of a platform for a commercial game. I've enjoyed working with the technique, and it's helped me become pretty well acquainted with some of its more important advantages and disadvantages. In that context I don't think it was "wrong" for me to pursue it, even if I did somehow find the time to make a simple game using it.

Regardless, I'd love to hear some of your experiences with DR in a commercial game scenario, if you have any. It's very easy to see the issues it would have on a platform like the 360, but it seems to have more potential on the PS3 and newer PC hardware. I've mostly found it simple and straightforward to implement, but I can see how its caveats could cause much pain in more complex engine architectures.
Sorry to hijack the thread. My main point is that the terms deferred renderer and forward renderer are quite artificial; neither is really used in real-world applications .... I will structure my thoughts and follow up with an example and some explanation here ... in a few weeks.
I can't talk about unreleased games, but of the two games I am currently working on, one uses more "deferred lighting" elements than the other. Neither can be considered a deferred renderer, though, because transparent objects still need to be rendered immediately.

Quote:For more fine-grained culling, light volumes can be used. Usually a cone is used for spot lights, and a sphere or box for point lights (quads can still be used for fullscreen directional lights). This adds more vertex processing, but allows more precise culling; you can usually get away with some very low-poly volumes anyway. Another advantage is that it can be combined with z-culling to reject pixels where the light is "buried underground" or "floating in air" (depending on whether back-face or front-face culling is used when rendering the volumes). If combined with stencil techniques, it's possible to reject all pixels that won't need shading.

This is straight to the point: shade as few pixels as possible. Your performance is then directly related to how much screen area your lights occupy in the scene.
I don't know if and how you have access to it, but stencil culling on NVIDIA cards can give you a nice advantage here. I think on ATI cards this is called Hi-Stencil, and it works a bit differently. If you've gotten as far as stencil culling, I believe you've done most of what is possible to optimize the light shader.
The other obvious optimization is to make the light shaders themselves as cheap as possible. I would render the shadow contribution up-front into a channel of your G-buffer and then use it to optimize the rest of the lighting pipeline - see the sketch below.
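A minimal sketch of consuming such a pre-rendered shadow term (my illustration of the idea; the channel layout and names are assumptions - an earlier pass is presumed to have written shadow into a G-buffer channel, 0 = fully shadowed, 1 = fully lit):

// The stored shadow factor simply scales the light, and fully shadowed
// pixels can skip the lighting math (or be stencil-masked) entirely.
float shadowedDiffuse(float shadow, float NdotL, float atten)
{
    if (shadow <= 0.0f)
        return 0.0f;                          // fully shadowed: skip the light math
    float d = NdotL > 0.0f ? NdotL : 0.0f;    // saturate(dot(N, L))
    return shadow * d * atten;                // shadow scales the diffuse term
}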

If you have everything running, I would recommend picking a graphics card with a 128-bit memory bus ... this is then comparable to the PS3 and the 360. For example, pick an NVIDIA 7600 GT: it has about 20 GB/s of bandwidth on a 128-bit bus. You might be surprised how much your performance drops at 1280x720 or higher resolutions.
One of the disadvantages is that deferred lighting, the way it is usually described, can't run on anything with less than 20 GB/s ... at least as long as you do not cut corners ... for example by light-mapping a huge portion of the 500 lights that you want to show :-)
