
Naive Question: Why not do light culling in world space?


CDProp    1451

I guess I can't speak for everybody, but the majority of the lights in my scene don't move. They're street lights, etc., that I want to affect things that do move, so they need to be dynamic, but I don't need them to move around in the scene. So, with a scene that is roughly 2km on a side, I could divide the world into 64x64 (4096) chunks of 32x32m and sort my lights into them using a quadtree. Yes, that's a lot of chunks, but even if I allow 16 lights per chunk, with a 2-byte index each, I am only using 128KB of memory. And most of my lights would only have to be sorted ONCE. I would only have to sort the moving lights (vehicle headlights, etc.) once per frame. Then, when I go to shade a pixel, I simply calculate my chunk index from the fragment position, and that index gives me the list of lights that affect that chunk.
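The chunk lookup described above can be sketched in a few lines. The constants are just the numbers from this post (a 2048m arena split into a 64x64 grid of 32m cells), and the function name is made up for illustration:

```cpp
#include <cassert>

// Illustrative constants matching the post: 2048 m arena,
// 64x64 chunks of 32x32 m each.
constexpr float kWorldSize     = 2048.0f;  // metres on a side
constexpr int   kChunksPerAxis = 64;
constexpr float kChunkSize     = kWorldSize / kChunksPerAxis;  // 32 m

// Map a world-space XZ position to a flat chunk index (j*NumXCells + i).
int ChunkIndex(float worldX, float worldZ) {
    int i = static_cast<int>(worldX / kChunkSize);
    int j = static_cast<int>(worldZ / kChunkSize);
    // Clamp so positions on the far edge stay in range.
    i = i < 0 ? 0 : (i >= kChunksPerAxis ? kChunksPerAxis - 1 : i);
    j = j < 0 ? 0 : (j >= kChunksPerAxis ? kChunksPerAxis - 1 : j);
    return j * kChunksPerAxis + i;
}
```

In a shader this would be the same arithmetic on the reconstructed world-space position, so the per-fragment cost is just a couple of multiplies and adds.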

 

A few reasons why I suspect people don't do this:

 

a) They have a lot more dynamically-moving lights than I do (e.g., from laser-fire and whatnot).

b) They have so many lights that a 32m chunk size is way too huge.

c) You lose alignment with the 8x8 screen-space tiles, which can cause performance problems.

 

Is there anything else I am missing?


cozzie    5029
Sounds like a good idea if you ask me.
The results might not be 100% accurate (but acceptable); you might have to take the viewer/camera position and heading into account.

Hodgman    51220

This technique was used in Just Cause, IIRC -- they made a big 2D texture overlaid on the world, basically dividing it into a grid. Each grid cell (texel) can then have light indices stored in it.

When shading a surface, you use its world-space position to select a grid cell and fetch that texel, which gives you a list of lights to use.
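As a rough illustration of how a few light indices could live in one grid texel: the exact format Just Cause used isn't specified here, so this RGBA8-style packing of four 8-bit indices is purely an assumption:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packing: four 8-bit light indices in one RGBA8 texel,
// as a world-overlay light grid might store them.
uint32_t PackTexel(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return uint32_t(a) | (uint32_t(b) << 8) | (uint32_t(c) << 16) | (uint32_t(d) << 24);
}

// Retrieve the index in a given slot (0..3), mirroring what a shader
// would do with component swizzles on the fetched texel.
uint8_t UnpackTexel(uint32_t texel, int slot) {
    return uint8_t((texel >> (slot * 8)) & 0xFF);
}
```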

Matias Goldberg    9573

Light culling algorithms are common.

 

In forward renderers you have a limited number of lights, so culling is a must. In deferred renderers, lights are often just rendered geometry with special additive shaders, and thus can be culled like normal geometry.

 

It's worth noting you can use the standard algorithms that cull your geometry to also cull your lights, just as TheChubu says.

If you use a Quadtree to cull your geometry, you can use it for lights too.

 

Perhaps your advantage is that you know the use your application will make of it: most lights are static. Thus a custom solution that fits your particular needs is not crazy.

 

Be careful with quadtrees, though. Trees often involve multiple dependent-read indirections, which means the GPU's memory latency can slow you down a lot (though you could perhaps work around that with a few assumptions); and unless the memory layout is well planned, they're usually not cache friendly either.

Grids are often much better. They waste more RAM, though.

MJP    19753

Using spatial hierarchies to associate meshes with lights was common in the days of traditional forward renderers. With deferred rendering, however, it's totally unnecessary, as you can easily achieve per-tile culling without any acceleration structures, using only a fraction of a millisecond on the GPU. In most cases a 16x16 screen-space tile is probably going to be a lot more granular than a 32x32m chunk, which would really hurt your ability to scale up on smaller lights. Plus your algorithm sounds inherently expensive, since you're talking about a per-pixel traversal of an acceleration structure.

CDProp    1451
Yeah, I see what you mean. I've been kind of down on deferred rendering lately because I feel like you give up a lot in order to achieve it. We actually have a deferred renderer that does the old-style light culling (looping through the lights on the CPU, stenciling them one by one), so bandwidth use is high. We're using FXAA, but that still leaves things looking unacceptably jaggy. So it seems there are two options:

a) Solve the problems with the deferred rendering. Do tiled light culling to save bandwidth, implement MSAA (requires edge detection pass, per-sample lighting on edges, custom MSAA resolve). I'm already eating the cost of a custom resolve with my tone mapping, but the rest sounds pretty slow.

b) If I've already committed to doing some sort of tiled light culling, then I might as well just ditch deferred rendering and solve the AA problem as well. I also don't have to handle transparency any differently, which is a nice bonus.

So, b is sounding like a better option to me. Taking this further, though, the point of my OP is to raise the question of whether it's even worth it to re-sort my lights every frame along screen space tiles, given that 90% of them don't move with respect to world space. The acceleration structure would be used for the sorting step only. When it comes to actually looking up which tile a fragment is being rendered in, I would just use the world-space position to calculate an index into the grid implied by the deepest level of the tree.

I realize, though, that my world-space grid is not going to align with the screen-space tiles along which the GPU will partition its fragment rendering, and so there will be inefficiencies there.

cozzie    5029
Maybe it helps, but here's what I do, which might be what you're aiming for:

Per frame:
- check which "cell"/grid node of the quadtree the camera is in
- cull all renderables in that cell against the frustum
- do the same with each light, using a simple distance check with its bounding sphere radius
- loop through the renderables and do a distance check against only the lights that are visible in the frustum (light position + max range); save those light IDs in a simple int vector
- render each renderable and send only the lights that affect it to the shader

Blended renderables can be done exactly the same way (after sorting, that is); there are no blending issues, because of forward rendering. MSAA is no issue either.
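The light-vs-renderable distance check in the steps above can be sketched like this; the types and names are illustrative, not from a real engine:

```cpp
#include <vector>

// Minimal stand-in for a point light: position plus maximum range.
struct Light { float x, y, z, range; };

// Keep only lights whose range sphere can reach the renderable's
// bounding sphere (centre + radius): a simple squared-distance test.
std::vector<int> LightsAffecting(const std::vector<Light>& lights,
                                 float cx, float cy, float cz, float radius) {
    std::vector<int> ids;
    for (int i = 0; i < (int)lights.size(); ++i) {
        const Light& l = lights[i];
        float dx = l.x - cx, dy = l.y - cy, dz = l.z - cz;
        float maxDist = l.range + radius;
        if (dx*dx + dy*dy + dz*dz <= maxDist * maxDist)
            ids.push_back(i);
    }
    return ids;
}
```

The resulting index vector is what you would upload per renderable before its draw call.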

CDProp    1451

What I have in mind would be somewhat different. It would be more like this. I divide the world into a grid of 32x32 meter cells. For an arena that is 2km on a side, that is 4096 cells. I give each cell an index according to j*NumXCells + i with some choice of origin.

 

per frame:

- Figure out which cell each light belongs in by doing sphere/cone vs. AABB tests on the cells. Here is where I might use some sort of quadtree logic to cut down on the number of tests I'm doing. Store the light indices in a shader storage buffer. If I'm allowing a max of 16 lights per cell, then this buffer would need to hold 16*numCells indices -- 65,536 entries, or 128KB with 2-byte indices. If there are fewer than 16 lights in a cell, then use a sentinel value after the last light index.

 

per fragment:

- Calculate the fragment position from depth. Use that position to figure out the index of the grid cell the fragment is in. Use index*max_lights_per_cell to get the offset into the aforementioned buffer. Loop through the lights until the sentinel value, or max_lights_per_cell, is reached.
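The per-cell light list with sentinel termination might look like this on the CPU side; the names and the 0xFFFF sentinel value are illustrative choices, not from the post:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Layout from the post: up to 16 light indices per cell, stored flat.
constexpr int      kMaxLightsPerCell = 16;
constexpr uint16_t kSentinel = 0xFFFF;  // assumed "no more lights" marker

// Write one cell's light indices into the flat buffer the shader reads.
void StoreCellLights(std::vector<uint16_t>& buffer, int cellIndex,
                     const std::vector<uint16_t>& lightIds) {
    int base = cellIndex * kMaxLightsPerCell;
    int n = (int)lightIds.size();
    if (n > kMaxLightsPerCell) n = kMaxLightsPerCell;
    for (int k = 0; k < n; ++k) buffer[base + k] = lightIds[k];
    if (n < kMaxLightsPerCell) buffer[base + n] = kSentinel;
}

// Mirrors the per-fragment loop: walk until sentinel or max count.
int CountCellLights(const std::vector<uint16_t>& buffer, int cellIndex) {
    int base = cellIndex * kMaxLightsPerCell;
    int count = 0;
    while (count < kMaxLightsPerCell && buffer[base + count] != kSentinel)
        ++count;
    return count;
}
```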

 

And one optimization I have in mind is having two buffers, one for moving lights and another for non-moving ones. The advantage there is that the non-moving lights only have to be sorted once. Only about 10% of my lights actually move in world space (vehicle headlights, etc.) and so I would only need to sort these on a per-frame basis.

 

And yes, since it's forward-rendered, I can use hardware MSAA and do the usual back-to-front alpha blending. I can even allow snow/rain particles to be lit by doing their lighting in the vert shader, perhaps.

MJP    19753

Well you can (very easily) use forward rendering and still do culling for screen-space tiles. This is exactly what "Forward+" does, and what I did in my Light Indexed Deferred demo. You just use a compute shader to do the culling, and output a list of indices per tile. Then in your forward pass you figure out what tile a pixel is in, and loop over the light indices.
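The core of that per-tile culling step is a sphere-vs-tile-frustum test. Here it is as plain C++ for illustration; a real implementation would run in a compute shader, and representing a tile frustum as four side planes is an assumption (implementations vary):

```cpp
#include <cassert>

// A plane as normal + offset: nx*x + ny*y + nz*z + d = 0,
// with the normal pointing toward the inside of the frustum.
struct Plane { float nx, ny, nz, d; };

// A light's bounding sphere passes the tile if it is not fully
// outside any of the tile frustum's four side planes.
bool SphereVisible(const Plane planes[4],
                   float cx, float cy, float cz, float r) {
    for (int i = 0; i < 4; ++i) {
        float dist = planes[i].nx * cx + planes[i].ny * cy
                   + planes[i].nz * cz + planes[i].d;
        if (dist < -r) return false;  // fully outside this plane
    }
    return true;
}
```

In the compute-shader version, each thread group handles one tile, tests a batch of lights, and appends passing indices to the tile's list.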

CDProp    1451

Yeah, that's true. Do you do the depth-only prepass, like other Forward+ implementations I've read about, to find min/max depth for each tile to bracket the depth of your tile frustum? Have you found that to be a useful optimization?

 

The worst-case scenario that I'm thinking about, with something like Forward+, is that I'm looking through a vehicle window straight down a street that is lined with street lamps, which would be common in my case. Because I'm looking through a window, my min depth is very close by. My max depth is very far away. So, there will be some tiles whose frustum will intersect ALL of these street lights. When I go to draw my window, then, I have to process every one of those lights, even though none of them actually affect the window. Doing a world-space grid doesn't really have this problem.

 

So, it seems to me that it would at least be useful to modify Forward+ by splitting each tile frustum into chunks along the depth. That way, I'm not just asking, "Which lights affect this tile?" but "Which lights affect this tile at this depth?" And then that seems to get rid of the need to do any per-tile min/max depth determination (although a depth-only prepass may still be a useful optimization in terms of reducing over-shading).
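The depth-splitting idea reduces to computing a slice index per fragment. Logarithmic slicing is one common choice because it keeps the slices roughly uniform in screen-space footprint; the constants and names here are purely illustrative:

```cpp
#include <cassert>
#include <cmath>

// Illustrative clip range and slice count.
constexpr int   kNumSlices = 16;
constexpr float kNear = 0.1f, kFar = 2000.0f;

// Map a view-space depth to a logarithmic slice index, clamped to range.
int DepthSlice(float viewDepth) {
    float t = std::log(viewDepth / kNear) / std::log(kFar / kNear);
    int s = (int)(t * kNumSlices);
    return s < 0 ? 0 : (s >= kNumSlices ? kNumSlices - 1 : s);
}
```

The full lookup key then becomes (tileX, tileY, slice), so a street lamp far down the road never lands in the same cluster as a nearby window, even when both fall in the same screen tile.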

 

I guess what I really need to do is experiment and see what works.

 

Edit: Not that you guys haven't been helpful, because you've been immensely helpful. I just think that maybe I'm worrying about the cost of things (e.g., a depth-only prepass, or resorting lights every frame, or worst-case scenarios, etc.) that I haven't even tried yet.


CDProp    1451

Yes, exactly. Thanks. I've hesitated to call it "clustered rendering" because, although the diagrams look similar, I haven't read up on it in enough detail yet to know if that's exactly what this is.
