• Create Account

## Deferred lighting and instant radiosity

Posted by , 03 April 2009 · 2,804 views

In the past months, I've been wondering how to approach the problem of lighting inside hangars and on ship hulls. So far, I had only been using a single directional light: the sun. The majority of older games precompute lighting into textures ( called lightmaps ) but clearly this couldn't work well in the context of a procedural game, where content is generated dynamically at run time. Plus, even if it did.. imagine the amount of texture memory needed to store all the lighting information coming from surfaces of kilometers-long battleship !

Fortunately, there's a solution to the problem.. enter the fantastic universe of deferred lighting !

# Deferred lighting

Traditionally, it is possible implement dynamic lighting without any precomputations via forward lighting. The algorithm is surprisingly simple: in a first pass, the scene is rendered to the depth buffer and to the color buffer using a constant ambient color. Then, for each light you render the geometry that is affected by this light only, with additive blending. This light pass can include many effects, such as normal mapping/per pixel lighting, shadowing, etc..

This technique, used in games silmilar to Doom 3, does work well, but is very dependent on the granularity of the geometry. Let's take an object of 5K triangles that is partially affected by 4 lights. This means that to light this object, you will need to render 25K triangles over 5 passes total ( ambient pass + 4 lights passes, each 5K ). An obvious optimization is, given one light and one object, to only render the triangles of the object that are affected by the light, but this would require some precomputations that a game such as Infinity cannot afford, due to its dynamic and procedural nature.

Now let's imagine the following situation: you've got a battleship made of a dozen of 5K-to-10K triangles objects, and you want to place a hundred lights on its hull. How many triangles do you need to render to achieve this effect with forward lighting ? Answer: a lot. Really, a lot. Too much.

Another technique that is getting more and more often used in modern games is deferred lighting. It was a bit impractical before shader model 3.0 video cards, as it required many passes to render the geometry too. But using multiple render targets, it is possible to render all the geometry once, and exactly once ! independently of the number of lights in the scene. One light or a hundred lights: you don't need to re-render all the objects affected by the lights. Sounds magical, doesn't it ?

The idea with deferred lighting is that, in a forward pass, geometric informations are rendered to a set of buffers, usually called "geometry buffers" ( abbrev: G-buffers ). Those informations usually include the diffuse color ( albedo ), the normal of the surface, the depth or linear distance between the pixel and the camera, the specular intensity, self-illumination, etc.. Note that no lighting is calculated yet at this stage.

Once this is done, for each light, a bounding volume ( which can be as simple as a 12-triangles box for a point light ) is rendered with additive blending. In the pixel shader, the G-buffers are accessed to reconstruct the pixel position from the current ray and depth, then this position is then used to compute the light color and attenuation, do normal mapping or shadowing, etc..

# Implementation

## G-Buffers

There are a few tricks and specificities in Infinity. Let's have a quick look at them. First of all, the G-buffers.

I use 4 RGBAF16 buffers. They store the following data:

`-           R           G           B           ABuffer 1    FL          FL          FL          DepthBuffer 2    Diffuse     Diffuse     Diffuse     Self-illumBuffer 3    Normal      Normal      Normal      SpecularBuffer 4    Velocity 	Velocity    Extinction  MatID`

'FL' = Forward lighting. That's one of the specificity of Infinity. I still do one forward lighting pass, for the sun and ambient lighting ( with full per-pixel lighting, normal mapping and shadowing ) and store the result in the RGB channels of the first buffer. I could defer it too, but then I'd have a problem related to atmospheric scattering. At pixel level, the scattering equation is very simple: it's simply modulating an extinction color ( Fex ) and adding an in-scattering color ( Lin ):

Final = Color * Fex + Lin

Fex and Lin are computed per vertex, and require some heavy calculations. Moving those calculations per pixel would kill the framerate.

If I didn't have a forward lighting pass, I'd have to store the scattering values in the G-buffers. This would require 6 channels ( 3 for Fex and 3 for Lin ). Here, I can get away with only 4 and use a grayscale 'Extinction' for the deferred lights ( while sun light really needs an RGB color extinction ).

'Velocity' is the view-space velocity vector used for motion blur ( computed by taking the differences of positions of the pixel between the current frame and the last frame ).

'Normal' is stored in 3 channels. I have plans to store it in 2 channels only and recompute the 3rd in the shader. However this will require to encode the sign bit in one of the two channels, so I haven't implemented it yet. Normals ( and lighting in general ) are computed in view space.

'MatID' is an ID that can be used in the light shader to perform material-dependent calculations.

As you can see, there's no easy way to escape using 4 G-buffers.

As for the format, I use F16. It is necessary both for storing the depth, but also encoding values in HDR.

## Performance

At first, I was a bit disapointed by the performance hit / overhead caused by G-buffers. There are 4 buffers after all, in F16: that requires a lot of bandwidth. On an ATI X1950 XT, simply setting up the G-buffers and clearing them to a constant color resulted in a framerate of 130 fps at 1280x1024. That's before even sending a single triangle. As expected, changing the screen resolution dramatically changed the framerate, but I found this overhead to be linear with the screen resolution.

I also found yet-another-bug-in-the-ATI-OpenGL-drivers. The performance of clearing the Z-buffer only was dependent on the number of color attachments. Clearing the Z-buffer when 4 color buffers are attached ( even when color writes are disabled ) took 4 more time than clearing the Z-buffer when only 1 color buffer was attached. As a "fix", I simply dettach all color buffers when I need to clear the Z-buffer alone.

## Light pass

Once the forward lighting pass is done and all this data is available in the G-buffers, I perform frustum culling on the CPU to find all the lights that are visible in the current camera's frustum. Those lights are then sorted by type: point lights, spot lights, directional lights and ambient point lights ( more on that last category later ).

The forward lighting ( 'FL' ) color is copied to an accumulation buffer. This is the buffer in which all lights will get accumulated. The depth buffer used in the forward lighting pass is also bound to the deferred lighting pass.

For each light, a "pass" is done. The following states are used:

* depth testing is enabled ( that's why the forward lighting's depth buffer is reattached )
* depth writing is disabled
* culling is enabled
* if the camera is inside the light volume, the depth test function is set to GREATER, else it uses LESS

A threshold is used to determine if the camera is inside the light volume. The value of this threshold is chosen to be at least equal to the znear value of the camera. Bigger values can even be used, to reduce a bit the overdraw. For example, for a point light, a bounding box is used and the test looks like this:

const SBox3DD& bbox = pointLight->getBBoxWorld();
SBox3DD bbox2 = bbox;
bbox2.m_min -= SVec3DD(m_camera->getZNear() * 2.0f);
bbox2.m_max += SVec3DD(m_camera->getZNear() * 2.0f);
TBool isInBox = bbox2.isIn(m_camera->getPositionWorld());
m_renderer->setDepthTesting(true, isInBox ? C_COMP_GREATER : C_COMP_LESS);

Inverting the depth test to GREATER as the camera enters the volume allows to discard pixels in the background / skybox very quickly.

I have experimented a bounding sphere for point lights too, but found that the reduced overdraw was cancelled out by the larger polycount ( a hundred polygons, against 12 triangles for the box ).

I haven't implemented spot lights yet, but I'll probably use a pyramid or a conic shape as their bounding volume.

As an optimization, all lights of the same type are rendered with the same shader and textures. This means less state changes, as I don't have to change the shader or textures between two lights.

For each light, a Z range is determined on the cpu. For point lights, it is simply the distance between the camera and the light center, plus or minus the light radius. When the depth is sampled in the shader, the pixel is discarded if the depth is outside this Z range. This is the very first operation done by the shader. Here's a snippet:

vec4 ColDist = texture2DRect(ColDistTex, gl_FragCoord.xy);
if (ColDist.w < LightRange.x || ColDist.w > LightRange.y)

There isn't much to say about the rest of the shader. A ray is generated from the camera's origin / right / up vectors and current pixel position. This ray is multiplied by the depth value, which gives a position in view space. The light position is uploaded to the shader as a constant in view space; the normal, already stored in view space, is sampled from the G-buffers. It is very easy to implement a lighting equation after that. Don't forget the attenuation ( color should go to black at the light radius ), else you'll get seams in the lighting.

## Antialiasing

In a final pass, a shader applies antialiasing to the lighting accumulation buffer. Nothing particularly innovative here: I used the technique presented in GPU Gems 3 for Tabula Rasa. An edge filter is used to find edges either in the depth or the normals from the G-buffers, and "blur" pixels in those edges. The parameters had to be adjusted a bit, but overall I got it working in less than an hour. The quality isn't as good as true antialiasing ( which cannot be done by the hardware in a deferred lighting engine ), but it is acceptable, and the performance is excellent ( 5-10% hit from what I measured ). Here's a picture showing the edges on which pixels are blurred for antialiasing:

Once I got my deferred lighting working, I was surprised to see how well it scaled with the number of lights. In fact, the thing that matters is pixel overdraw, which is of course logical and expected given the nature of deferred lighting, but still I found it amazing that as long as overdraw remained constant, I could spawn a hundred light and have less than a 10% framerate hit.

The algorithm is relatively simple: each light is set up and casts N photon rays in a random direction. At each intersection of the ray with the scene, a photon is generated and stored in a list. The ray is then killed ( russian roulette ) or bounces recursively in a new random direction. The photon color at each hit is the original light color multiplied by the surface color recursively at each bounce. I sample the diffuse texture with the current hit's barycentric coordinates to get the surface color.

In my tests, I use N = 2048, which results in a few thousands photons in the final list. This step takes around 150 ms. I have found that I could generate around 20000 photons per second in a moderately complex scene ( 100K triangles ), and it's not even optimized to use many CPU cores.

In a second step, a regular grid is created and photons that share the same cell get merged ( their color is simply averaged ). Ambient point lights are then generated for each cell with at least one photon. Depending on N and the granularity of the grid, it can result in a few dozen ambient point lights, up to thousands. This step is very fast: around one millisecond per thousand photons to process.

You can see indirect lighting in the following screenshot. Note how the red wall leaks light on the floor and ceiling. Same for the small green box. Also note that no shadows are used for the main light ( located in the center of the room, near the ceiling ), so some light leaks on the left wall and floor. Finally, note the ambient occlusion that isn't fake: no SSAO or precomputations! There's one direct point light and around 500 ambient point lights in this picture. Around 44 fps on an NVidia 8800 GTX in 1280x1024 with antialiasing.

# Results

I have applied deferred lighting and instant radiosity to Wargrim's hangar. I took an hour to texture this hangar with SpAce's texture pack. I applied a yellow color to the diffuse texture of some of the side walls you'll see in those screenshots: note how light bounces off them, and created yellow-ish ambient lighting around that area.

There are 25 direct point lights in the hangar. Different settings are used for the instant lighting, and as the number of ambient point lights increase, their effective radius decrease. Here are the results for different grid sizes on a 8800 GTX in 1280x1024:

` Cell size  # amb point lights  Framerate0.2        69                  910.1        195                 870.05       1496                460.03       5144                300.02       10605               170.01       24159               8`

I think this table is particularly good at illustrating the power of deferred lighting. Five thousand lights running at 30 fps ! And they're all dynamic ( although in this case they're used for ambient lighting, so there would be no point in that ): you can delete them or move every single of them in real time without affecting the framerate !

In the following screenshots, a few hundred ambient point lights were used ( sorry, I don't remember the settings exactly ). You'll see some green dots/spheres in some pics: those highlight the position of ambient lights.

Full lighting: direct lighting + ambient lighting

Direct lighting only

Ambient ( indirect ) lighting only

The screens look beautiful - thanks for the detailed explanation of the deferred pipeline!
Holy Mother Of God....

That looks absolutly stunning and mind blowing...

:-o
That looks jaw-droppingly good!

Thanks for detailing the radiosity lighting.
Hi, really informative entry, I've followed your work for some time and you never fail to impress.

I may have misunderstood something but surely if you recalculate the z component of the view space normal you don't need to store the sign as it will always be towards the camera?

I'm speachless.

Ok, not completely. It's awesome.

I've been waiting for deferred lighting to come around.

You can see the location of the cells and the highlight near them after you look at the image for a while. Would it look any better if it was more stochastic or if the falloff were more subtle? How does that look in motion? Are the points fixed to model space?

What if you have a dynamic light moving through the scene? Do the cells move? How often are the indirect lights recomputed?
A lot of that went over my head, but it was a fascinating read. Stunning screenshots too, good job.
Excellent read. I'm planning on putting deferred lighting into my engine when I have time to work on it, so I love it anytime there's a good, informative read on the implementation of deferred lighting.

The screens look fantastic and the indirect lighting looks like a very good technique that I'll have to think about aswell. Great job as always.
I dont know if the deferred path is the right way in your case
Im betting average framerate is gonna be quite a bit lower prolly ~%40
yes the extreme cases will benefit but if thats <1% of the time is it worth it?

a forward renderer could handle this reasonably
http://www.infinity-universe.com/Infinity/images/stories/Journals/DeferredLighting/def_lighting_11.jpg

I guess Im saying dont commit unless your sure

Screenshots are amazing. [wow]
The number of lights used is too.
Congratulations! Your game is now GPU bound [lol]
The frame rate may be low, but as long as you don't overload the GPU with more data, fps will stay constant even if you add more processing; CPU side.

Double-check your results on Pre-Geforce 8 series, as they may have much less pixel processing power because they don't have unified shaders.

You were a bit harsh with the forward lighting introduction. Have you heard of single pass forward lighting? Shader Model 3.0 improves a lot the ability to do everything in one pass not just only for referred rendering.
It's worth mentioning though, forward lighting could no way handle THAT many lights. Not at least with that frame rate.

Congratulations again ;-)

Cheers
Dark Sylinc
Another great article, thanks for sharing your insights.

I think it's worth giving a plug to a friend of mine who came up with a new deferred rendering technique, it may be worth you having a look.

You know I-can't-believe-it's-not-butter?

I-can't-believe-it's-not-Maya! (Or something like that.)

Excellent insight on the techniques you use (and why). It's always nice to know that voodoo was not involved
Yet another case showing that deferred shading is ready for primetime. ;)

Gives me hope for my own project (trying to see how far I can get with a prelighting renderer on early SM 3.0 hardware).

Keep up the good work!
Great work as always!

ForeverNoobie
I saw your name on the poll on the home page (Asking which user name is hardest to pronounce) and I decided to come check on your progress. I've been a fan of the concept of Infinity Online for a long time although the details are quite over my head.

Good luck with the project. You are truly an inspiration to someone like me because I find it hard to stick with my big projects. I really hope you finish the game (and soon, I can't wait to beta test).
I'm curious, after seeing this one
http://www.infinity-universe.com/Infinity/images/stories/Journals/DeferredLighting/def_lighting_9_med.jpg

whether ambient lighting would benefit from (a) poisson distribution or (b) regularly placed grid of lights using "manhattan" distance formula for attenuation.
I'd guess either way would result in a more natural/even lighting of the scene (although it's pretty good already)

my 2 cents
-M
Yeah, I played with other attenuation formulas, but it's hard to get ride of the spot effect without increasing the radius. And if you do increase the radius, the number of overlapping pixels explode and performance drops quickly..