Chris_F

Dynamic global illumination (Molecule Engine)


I came across this blog post which mentions some type of technique for quickly calculating indirect lighting. They don't go into any details at all except to say that it is not based on precomputed radiance transfer. So my question is, if it is not PRT, then does anyone here have any ideas what it might be?


My wild guess:

 

The hints are:

* Lightmap based results.

* Precomputation step, dependent on geometry, probably determining visibility per texel.

* 128x128 lightmap results.

* Precomputed data for 128x128 is about 4-5 MB.

* Improvements could be made in the quantization of this data.

 

I would guess the precomputation step is ray-tracing the hemisphere above each texel to compute a crude 2D image from that texel's POV, containing the lightmap UVs of the ray intersection points and depth values (the length of each ray). The world-space position and normal of each texel are probably also stored.

 

texels = 128*128

position = 3 * float

normal = 3 * float

depth/UV hemisphere = 32 * (float + float * 2)

total size = 128*128 * sizeof(float) * (3+3+32*(1+2)) == ~6.7MB unquantized, which is in the same ballpark as the quoted 4-5 MB.
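
To sanity-check that estimate, here is the arithmetic as a small Python sketch. The 32-sample hemisphere and the all-32-bit-float layout are the guesses from the list above, not known facts about the engine, and `precomputed_bytes` is just an illustrative helper.

```python
FLOAT_BYTES = 4          # assumed: everything stored as 32-bit floats
TEXELS = 128 * 128       # lightmap resolution from the blog post
HEMI_SAMPLES = 32        # assumed: hemisphere samples per texel (a guess)

def precomputed_bytes(texels=TEXELS, samples=HEMI_SAMPLES, scalar_bytes=FLOAT_BYTES):
    """Bytes needed for position + normal + (depth, u, v) per hemisphere sample."""
    position = 3                      # x, y, z
    normal = 3                        # nx, ny, nz
    hemisphere = samples * (1 + 2)    # depth plus a 2D lightmap UV per sample
    return texels * scalar_bytes * (position + normal + hemisphere)

total = precomputed_bytes()
print(total, "bytes =", total / 2**20, "MiB")
```

With these guesses the total is 6,684,672 bytes, so landing in the 4-5 MB range would take slightly fewer samples or some 16-bit values, which fits the hint about quantization.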

 

At runtime, you can then also quickly compute a 128x128 albedo map by rendering your level in UV-space (lightmap UVs as the VS position output, remapped from the 0 to 1 range to the -1 to 1 range) with a PS that outputs albedo.
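
The UV-to-position remap is tiny, but easy to get wrong, so here it is as a Python sketch of what the vertex shader would compute. The function name is made up for illustration, and depending on the graphics API you may also need to flip V.

```python
def uv_to_clip(u, v):
    """Map a lightmap UV in [0, 1] to a clip-space position (x, y, z, w).

    Assumption: no V flip is applied; some APIs need y = 1 - 2*v instead.
    """
    return (u * 2.0 - 1.0, v * 2.0 - 1.0, 0.0, 1.0)

# The corners of the lightmap land on the corners of the render target.
print(uv_to_clip(0.0, 0.0), uv_to_clip(1.0, 1.0))
```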

 

Given this data, you can evaluate all your direct lights by rendering a quad over a 128x128 light accumulation buffer per light, and reading the above data per pixel. The position, normal and albedo are used to calculate the diffuse light response per texel.
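
As a rough sketch of that per-texel evaluation, assuming an unshadowed point light with inverse-square falloff and a plain Lambert term (the light model, the lack of a 1/pi normalization, and the function name are my assumptions, not something the blog post states):

```python
import math

def direct_diffuse(position, normal, albedo, light_pos, light_color):
    """Diffuse response of one texel to one point light (no shadowing)."""
    to_light = [l - p for l, p in zip(light_pos, position)]
    dist2 = sum(c * c for c in to_light)
    if dist2 == 0.0:
        return (0.0, 0.0, 0.0)
    dist = math.sqrt(dist2)
    # Cosine between the texel normal and the direction to the light.
    n_dot_l = max(0.0, sum(n * c for n, c in zip(normal, to_light)) / dist)
    scale = n_dot_l / dist2  # Lambert term with inverse-square falloff
    return tuple(a * lc * scale for a, lc in zip(albedo, light_color))

# A texel at the origin facing +Z, lit from two units above it.
print(direct_diffuse((0, 0, 0), (0, 0, 1), (0.5, 0.5, 0.5), (0, 0, 2), (4, 4, 4)))
```

On the GPU this would be the body of the pixel shader for the full-screen quad, with the results added into the accumulation buffer once per light.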

 

Then to add the bounced light, you render to another accumulation buffer, using the per-texel hemisphere data and the direct light accumulation buffer as input.

For each texel, you sample the direct light results at each of the 32 UVs stored in your precomputed hemisphere visibility function, and evaluate a directional emitter at that UV's position/normal against your own position/normal.

That step can be looped a few times to calculate extra bounces.
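
Here is a minimal CPU sketch of that gather loop, with simplifications that are mine: one colour channel, texel indices standing in for the stored UVs, and hemisphere rays assumed to be cosine-distributed so that every sample gets equal weight (which replaces the explicit emitter evaluation with a plain average). `gather_bounce` and `solve` are illustrative names.

```python
def gather_bounce(lit, albedo, hits):
    """One bounce. lit[j] is the light leaving texel j; hits[i] lists the texel
    index each of texel i's hemisphere rays landed on (None for a miss)."""
    bounced = []
    for i, samples in enumerate(hits):
        if not samples:
            bounced.append(0.0)
            continue
        incoming = sum(lit[j] for j in samples if j is not None)
        bounced.append(albedo[i] * incoming / len(samples))
    return bounced

def solve(direct, albedo, hits, bounces=3):
    """Accumulate direct light plus a fixed number of bounces."""
    total = list(direct)
    last = list(direct)
    for _ in range(bounces):
        last = gather_bounce(last, albedo, hits)   # light from the previous bounce only
        total = [t + b for t, b in zip(total, last)]
    return total

# Two texels that only see each other; only the first is lit directly.
print(solve([1.0, 0.0], [0.5, 0.5], [[1, 1], [0, 0]], bounces=2))
```

Each extra pass through the loop adds one more bounce, and because albedo is below 1 the contribution shrinks every time, which is why a few iterations are usually enough.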


Well, he refers to it as radiosity, so I would assume that it's based on form factors. If you have a lightmap then you can pre-compute form factors between texels pretty easily. Then at runtime you just need some means of dynamically computing lighting at each texel, and you can iteratively compute the indirect lighting using your pre-computed form factors. For such a low number of texels you could probably just brute-force it by having each texel iterate over all of the form factors, at least if you strip out form factors that are below some threshold. But there are lots of existing optimizations for radiosity that you could borrow to make it really fast.
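
A brute-force version of that idea might look like the Python sketch below. It assumes a dense form-factor matrix `F` was computed offline, prunes entries under a threshold into sparse rows, and then runs a few Jacobi-style iterations at runtime; the function names, the single colour channel, and the threshold value are all illustrative assumptions.

```python
def prune_form_factors(F, threshold):
    """Turn a dense form-factor matrix into sparse rows of (j, F_ij) pairs,
    dropping every entry below the threshold."""
    return [[(j, f) for j, f in enumerate(row) if f >= threshold] for row in F]

def radiosity(direct, albedo, sparse_F, iterations=8):
    """Iterate B_i = direct_i + albedo_i * sum_j F_ij * B_j."""
    B = list(direct)
    for _ in range(iterations):
        B = [direct[i] + albedo[i] * sum(f * B[j] for j, f in sparse_F[i])
             for i in range(len(B))]
    return B

F = [[0.0, 0.5, 0.001],
     [0.5, 0.0, 0.001],
     [0.001, 0.001, 0.0]]
sparse = prune_form_factors(F, threshold=0.01)
print(radiosity([1.0, 0.0, 0.0], [0.5, 0.5, 0.5], sparse, iterations=2))
```

Only `direct` changes from frame to frame, so the pruning cost is paid once and the runtime work is proportional to the number of form factors that survive the threshold.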



I concur with the guess above, but my brief foray into getting the bouncing part working on the GPU wasn't particularly promising.  You end up iterating over a table of UVs that are spatially incoherent, and the resulting texture fetches simply don't suit the GPU's design philosophy of tolerating high latency in the name of high throughput.  Also, there is very little computational work that doesn't depend on the fetches, so all the latency hiding built into the GPU doesn't help much.  It actually ran faster on the CPU, where the cache hierarchy was better able to absorb the scattered memory accesses.  Sorting the rays by UV helped a bit, but not by a lot.  I'm not convinced it's impossible, but a naïve implementation is definitely not viable.  Possible solutions:

 

1) Aggregating the work across multiple frames is a no-brainer as long as your lights don't move too fast.

 

2) Perhaps a second pre-computation step can be added that tells you how to rearrange your texture data (probably by tile) and UV tables such that the radiance bouncing is a more memory-coherent operation.  According to the few papers I have read that tried to do this (in the case of offline rendering), this is hard and there are always pathological cases that break the optimizer.
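
For reference, one simple way to do the "sort by UV" step is to order each texel's table along a Z-order (Morton) curve, so entries that are close in the lightmap end up close in the table. The choice of Morton order and the helper names are my assumptions; this is only the cheap sort, not the harder tile-rearrangement problem from point 2.

```python
def part1by1(n):
    """Spread the low 16 bits of n so there is a zero bit between each pair."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2d(x, y):
    """Interleave the bits of x and y into a single Z-order key."""
    return part1by1(x) | (part1by1(y) << 1)

def sort_hits_by_morton(hits, size=128):
    """Sort a list of (u, v) pairs, each in [0, 1], along the Z-order curve
    of a size x size lightmap."""
    def key(uv):
        x = min(size - 1, int(uv[0] * size))
        y = min(size - 1, int(uv[1] * size))
        return morton2d(x, y)
    return sorted(hits, key=key)

print(sort_hits_by_morton([(0.9, 0.9), (0.0, 0.0), (0.5, 0.0), (0.0, 0.5)], size=2))
```

This only improves locality within one texel's own table; neighbouring texels can still fetch from completely different parts of the lightmap, which may be why the sort helped only a little.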

 

Small critique:

 

1) Why would you need a depth value paired with the UVs?  I can't imagine propagating anything other than radiance, which I believe is constant along rays (i.e. doesn't change with distance).

 

2) A single 128x128 data structure to hold all of this doesn't seem like enough.  I was using an array of 256x256 structures...this definitely contributed to my problem because the UV table also held a texture index so collecting incident irradiance meant accessing a random texture at a random location.  I suppose limiting how much of the scene relative to the camera's position gets indirect lighting can do wonders in terms of limiting the size of your structure array.

 

3) I wouldn't call this true dynamic indirect illumination; your lights can move but your geometry cannot.  I'm skeptical of its advantages over good old-fashioned Light Propagation Volumes.

 

4) All problems aside, I consider this approach to be more promising than what Epic has done with tracing cones in a sparse voxel octree.  Impressive work for sure, and it can be made truly dynamic, but we really needed the PS4 and Xbox One to be meatier machines to see widespread use  :(.  Hopefully some developer somewhere will surprise me and I end up being wrong...it is a neat idea and it does have a lot of knobs that can be dialed down to mitigate the expense.


Actually, the biggest problem I can see with this is the same problem pretty much all of these hacks have: they're perhaps good for small corridors and the like, but how many games are open world, or at least have huge open levels, now? 5 seconds for Sponza is great for Sponza, but the cost of the precomputation is unbounded, and assuming it scales linearly, a level just a hundred times the size of Sponza, say a good-sized campaign level from Halo, would be 8 minutes and 20 seconds of work each time you wanted to see what the lighting would look like.

 

Voxel Cone Tracing seems the most promising to me if you want to get your dynamic geometry and reflections in; I think Epic gave up on it too early. There were some great papers at SIGGRAPH about voxel structures, such as transforming your octree into a "Directed Acyclic Graph". They're listed here, though the links aren't up: http://kesen.realtimerendering.com/sig2013.html Basically, you compress your octree by using a table that points every occurrence of an identical piece of structure at a single shared copy, so any identical structures are stored only once. I remember they'd gotten near-identical performance along with a quite high compression rate. You'd be able to increase the octree resolution by at least one level without taking up any more RAM, which would help a lot with thin geometry.
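
The core of that compression idea can be sketched in a few lines of Python: walk the octree bottom-up and give every structurally identical subtree the same id, so it is stored once. The node representation here ("E" for empty, "F" for filled, a tuple of 8 children otherwise) and the function names are made up for illustration and are not the layout used in the papers.

```python
def build_dag(node, pool):
    """Return a canonical id for `node`, reusing the id of any identical
    subtree already seen. `pool` maps a node's structure to its id."""
    if isinstance(node, str):          # leaf: "E" (empty) or "F" (filled)
        key = node
    else:                              # interior node: ids of its 8 children
        key = tuple(build_dag(child, pool) for child in node)
    if key not in pool:
        pool[key] = len(pool)
    return pool[key]

def count_tree_nodes(node):
    """Number of nodes if every subtree is stored separately."""
    if isinstance(node, str):
        return 1
    return 1 + sum(count_tree_nodes(child) for child in node)

# A root whose eight children all have the same checkerboard-like contents.
block = ("F", "E", "F", "E", "F", "E", "F", "E")
root = (block,) * 8

pool = {}
build_dag(root, pool)
print(count_tree_nodes(root), "tree nodes ->", len(pool), "unique DAG nodes")
```

Real scenes are nowhere near this repetitive, so the ratio in this toy example says nothing about the compression rates reported for actual voxel data.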

 

There are other things that weren't considered. Signed distance fields can be, and already are, generated in real time, and I suspect they can be a good speedup for tracing into an octree (I'm looking into whether it's faster overall). Trying to get a more coherent tracing structure would also help: a huge part of the cost comes from the tracing itself, and I know some people have tried a regular octree instead of a sparse representation and said it gives a nice speed boost. There's also the question of transparency. At least for particles, they're usually close together (smoke, dust, whatever), so you can box those blocks together and just trace once for the entire block; no one's going to notice. I'm not sure how Epic was trying to do a double bounce, but all the ways I can think of were incredibly expensive (or just plain wrong). But there are ways to get good results with a single bounce, and cutting out the double bounce would save a ton of time.
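
To show why a distance field can speed up tracing, here is a minimal sphere-tracing sketch in Python: at every step the field says how far the ray can safely jump, so empty space is crossed in a few large steps. The analytic sphere stands in for a real, sampled distance field, and all names and tolerances are illustrative assumptions; whether this beats octree traversal in practice is exactly the open question.

```python
import math

def sphere_sdf(p, center, radius):
    """Signed distance from point p to the surface of a sphere."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, sdf, max_dist=100.0, eps=1e-4, max_steps=128):
    """March along a normalized direction; return the hit distance or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + d * t for o, d in zip(origin, direction))
        step = sdf(p)          # nothing is closer than this, so the jump is safe
        if step < eps:
            return t
        t += step
        if t > max_dist:
            break
    return None

scene = lambda p: sphere_sdf(p, (0.0, 0.0, 5.0), 1.0)
print(sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))
print(sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), scene))
```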

 

I also can't think of a single other way to get reflections. Every other realtime GI technique is diffuse only. Plenty of graphics programmers have grumbled and complained and asked others to think of some good way to get reflections. The only solution besides this that I've seen is "environment probes", and that's about it. I suppose you could try splatting VPLs with imperfect shadow maps, but the imperfect shadow maps would probably cause severe light leaking for high-frequency stuff like reflections, and besides, I've never seen an implementation of VPLs without some huge deficit.

Edited by Frenetic Pony

