Naive Question: Why not do light culling in world space?


I guess I can't speak for everybody, but the majority of the lights in my scene don't move. They're street lights, etc., that I want to affect things that do move, so they need to be dynamic, but I don't need them to move around in the scene. So, with a scene that is 2 km × 2 km, I could divide the world into 64x64 (4096) chunks of 32 m × 32 m and sort my lights into them using a quadtree. Yes, that's a lot of chunks, but even if I allow 16 lights per chunk, with a 2-byte index each, I am only using 128 KB of memory. And most of my lights would only have to be sorted ONCE. I would only have to sort the moving lights (vehicle headlights, etc.) once per frame. Then, when I go to shade a pixel, I simply calculate my chunk index from the fragment position, and that index gives me the list of lights that affect that chunk.
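To make the bookkeeping concrete, here's a minimal sketch of the indexing I have in mind (C++; the constants and the origin-at-(0,0) convention are just illustrative):

```cpp
#include <cstdint>
#include <vector>

// Illustrative constants matching the numbers above: a 2 km x 2 km world
// split into 64 x 64 chunks of 32 m x 32 m.
constexpr float kWorldSize    = 2048.0f; // metres per side
constexpr float kChunkSize    = 32.0f;   // metres per side
constexpr int   kChunksPerRow = 64;      // kWorldSize / kChunkSize
constexpr int   kMaxLights    = 16;      // light slots per chunk

// 4096 chunks * 16 slots * 2-byte index = 131072 bytes = 128 KB.
std::vector<uint16_t> lightGrid(kChunksPerRow * kChunksPerRow * kMaxLights);

// Map a world-space position to the start of its chunk's light list.
// Assumes the world origin is at (0, 0); clamping omitted for brevity.
int chunkBase(float worldX, float worldZ)
{
    int i = static_cast<int>(worldX / kChunkSize);
    int j = static_cast<int>(worldZ / kChunkSize);
    return (j * kChunksPerRow + i) * kMaxLights;
}
```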

A few reasons why I suspect people don't do this:

a) They have a lot more dynamically-moving lights than I do (e.g., from laser-fire and whatnot).

b) They have so many lights that a 32m chunk size is way too huge.

c) You lose alignment with the 8x8 screen-space tiles, which can cause performance problems.

Is there anything else I am missing?

Sounds like a good idea if you ask me.
The results might not be 100% exact (but acceptable); you might have to take the viewer/camera position and heading into account.

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

This technique was used in Just Cause, IIRC -- they made a big 2D texture that is overlaid on the world, basically dividing it into a grid. Each grid cell (texel) can then have light indices stored in it.

When shading a surface, you use its world-space position to select a grid cell and fetch that texel, which gives you a list of lights to use.
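I don't remember the exact format they used, so take this as a sketch under my own assumptions: say each RGBA8 texel packs up to four 8-bit light indices (255 meaning "empty"), and the texture is filled on the CPU like this:

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Assumed layout: one texel per world-space cell, four 8-bit light
// indices per texel, 255 = "no light". None of this is confirmed to be
// what Just Cause actually shipped.
constexpr int     kGridDim = 64;
constexpr uint8_t kNoLight = 255;

const std::array<uint8_t, 4> kEmptyTexel = {kNoLight, kNoLight, kNoLight, kNoLight};
std::vector<std::array<uint8_t, 4>> lightTexture(kGridDim * kGridDim, kEmptyTexel);

// Write a light index into the first free slot of a cell's texel.
bool addLightToCell(int cellX, int cellY, uint8_t lightIndex)
{
    auto& texel = lightTexture[cellY * kGridDim + cellX];
    for (auto& slot : texel)
        if (slot == kNoLight) { slot = lightIndex; return true; }
    return false; // cell already holds four lights
}
```

You'd then upload this as an ordinary RGBA8 texture; in the shader, the world-space xz position divided by the cell size gives the texel to fetch, and the four channels are the light list.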

I thought the whole idea of spatial partitioning was to cull everything that could be culled, including lights.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

Interesting. I believe this means that lights not in the line of sight are also processed, because the grid cell/texel only gives you a position and no orientation.


Light culling algorithms are common.

In forward renderers you have a limited number of lights, so culling is a must. In deferred renderers, lights are often just rendered geometry with special additive shaders, and thus can be culled like normal geometry.

It's worth noting you can use the standard algorithms that cull your geometry to also cull your lights, just as TheChubu says.

If you use a Quadtree to cull your geometry, you can use it for lights too.

Perhaps your case is that you know how the system will be used: most lights are static. Thus a custom solution that fits your particular needs is not crazy.

Be careful with quadtrees though. Trees often involve multiple dependent-read indirections, which means the GPU's memory latency can slow you down a lot (though you could perhaps work around that with a few assumptions); and unless the memory layout is well planned, they're usually not cache friendly either.

Grids are often much better. They waste more RAM though.
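One common workaround for those dependent reads (my own sketch, nothing from a particular engine): store the quadtree implicitly in a flat array, so each level is contiguous and a lookup is pure index arithmetic rather than pointer chasing:

```cpp
// Pointerless "linear" quadtree: level L is a 2^L x 2^L grid stored
// contiguously, so finding a node never chases pointers.
constexpr int kLevels = 7; // level 6 is the 64 x 64 leaf grid

// Offset of the first node of a level: 4^0 + 4^1 + ... + 4^(L-1).
constexpr int levelOffset(int level)
{
    int off = 0;
    for (int l = 0; l < level; ++l)
        off += 1 << (2 * l); // 4^l nodes on level l
    return off;
}

// Flat index of cell (i, j) on a given level.
constexpr int nodeIndex(int level, int i, int j)
{
    return levelOffset(level) + j * (1 << level) + i;
}
```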

Using spatial hierarchies to associate meshes with lights was common in the days of traditional forward renderers. With deferred rendering, however, it's totally unnecessary, as you can easily achieve per-tile light culling without any acceleration structures, using only a fraction of a millisecond on the GPU. In most cases a 16x16 screen-space tile is probably going to be a lot more granular than a 32x32 m chunk, which would really hurt your ability to scale up on smaller lights. Plus your algorithm sounds inherently expensive, since you're talking about a per-pixel traversal of an acceleration structure.
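To give a feel for what that per-tile culling amounts to, here's a CPU-flavoured sketch of the core test; a real implementation runs it in a compute shader, one thread group per tile, and also tests against the tile's min/max depth. All names here are illustrative:

```cpp
// Screen-space circle (a light's projected bounds) vs. tile rectangle.
struct ScreenLight { float x, y, radius; }; // projected centre and extent

constexpr int kTileSize = 16; // pixels per tile side

bool lightTouchesTile(const ScreenLight& l, int tileX, int tileY)
{
    float minX = tileX * kTileSize, maxX = minX + kTileSize;
    float minY = tileY * kTileSize, maxY = minY + kTileSize;
    // Clamp the circle centre to the rectangle, then compare distance.
    float cx = l.x < minX ? minX : (l.x > maxX ? maxX : l.x);
    float cy = l.y < minY ? minY : (l.y > maxY ? maxY : l.y);
    float dx = l.x - cx, dy = l.y - cy;
    return dx * dx + dy * dy <= l.radius * l.radius;
}
```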

Yeah, I see what you mean. I've been kind of down on deferred rendering lately because I feel like you give up a lot in order to achieve it. We actually have a deferred renderer that does the old-style light culling (looping through the lights on the CPU, stenciling them one by one), so bandwidth use is high. We're using FXAA, but that still leaves things looking unacceptably jaggy. So it seems there are two options:

a) Solve the problems with the deferred rendering. Do tiled light culling to save bandwidth, implement MSAA (requires edge detection pass, per-sample lighting on edges, custom MSAA resolve). I'm already eating the cost of a custom resolve with my tone mapping, but the rest sounds pretty slow.

b) If I've already committed to doing some sort of tiled light culling, then I might as well just ditch deferred rendering and solve the AA problem as well. I also don't have to handle transparency any differently, which is a nice bonus.

So, b is sounding like a better option to me. Taking this further, though, the point of my OP is to raise the question of whether it's even worth it to re-sort my lights into screen-space tiles every frame, given that 90% of them don't move in world space. The acceleration structure would be used for the sorting step only. When it comes to actually looking up which cell a fragment falls in, I would just use the world-space position to calculate an index into the grid implied by the deepest level of the tree.

I realize, though, that my world-space grid is not going to align with the screen-space tiles along which the GPU will partition its fragment rendering, and so there will be inefficiencies there.

Maybe it helps, but here's what I do, which might be what you're aiming for (a rough code sketch follows the list):

Per frame:
- check which "cell" of the quadtree grid the camera is in
- cull all renderables in that cell against the frustum
- do the same with each light, using a simple distance check with its bounding sphere radius
- loop through the renderables and do a distance check against only the lights visible in the frustum (light position + max range), saving those light IDs in a simple int vector
- render each renderable, sending only the lights that affect it to the shader (sketch below)
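Roughly, steps 3 and 4 in C++ (illustrative names only; the frustum test is reduced to a plain distance check for brevity):

```cpp
#include <cmath>
#include <vector>

struct Vec3  { float x, y, z; };
struct Light { Vec3 pos; float range; };

float dist(const Vec3& a, const Vec3& b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Step 3: keep only lights whose bounding sphere overlaps the view.
std::vector<int> visibleLights(const std::vector<Light>& lights,
                               const Vec3& cameraPos, float viewDistance)
{
    std::vector<int> ids;
    for (int i = 0; i < static_cast<int>(lights.size()); ++i)
        if (dist(lights[i].pos, cameraPos) < lights[i].range + viewDistance)
            ids.push_back(i);
    return ids;
}

// Step 4: of the visible lights, keep those close enough to one
// renderable; upload just that list to the shader before drawing it.
std::vector<int> lightsForRenderable(const std::vector<Light>& lights,
                                     const std::vector<int>& visible,
                                     const Vec3& objPos, float objRadius)
{
    std::vector<int> ids;
    for (int id : visible)
        if (dist(lights[id].pos, objPos) < lights[id].range + objRadius)
            ids.push_back(id);
    return ids;
}
```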

Blended renderables can be done exactly the same way (after sorting, that is), with no blending issues because of forward rendering. MSAA is no issue either.


What I have in mind would be somewhat different. It would be more like this. I divide the world into a grid of 32x32 meter cells. For an arena that is 2 km on a side, that is 64x64 = 4096 cells. I give each cell an index according to j*NumXCells + i, with some choice of origin.

per frame:

- Figure out which cell(s) each light belongs in by doing sphere/cone vs. AABB tests on the cells. Here is where I might use some sort of quadtree logic to cut down on the number of tests I'm doing. Store the light indices in a shader storage buffer. If I'm allowing a max of 16 lights per cell, then this buffer needs to hold 16*numCells = 65536 entries, so with 2-byte indices a 128 KB buffer would work in this instance. If there are fewer than 16 lights in a cell, then use a sentinel value after the last light index. (See the sketch after the list.)

per fragment:

- Calculate the fragment position from depth. Use that fragment position to figure out the index of the grid cell that the fragment is in. Use index*max_lights_per_cell to get the index into the aforementioned buffer. Start looping through the lights until the sentinel value, or max_lights_per_cell, has been reached.
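Here's a rough C++ sketch of that binning (all names are illustrative; the sphere/cone test is conservatively reduced to overlapping the sphere's AABB with the cells, and world coordinates are assumed non-negative):

```cpp
#include <cstdint>
#include <vector>

struct Vec3  { float x, y, z; };
struct Light { Vec3 pos; float range; };

constexpr float    kCellSize    = 32.0f;
constexpr int      kCellsPerRow = 64;     // ~2 km / 32 m
constexpr int      kMaxPerCell  = 16;
constexpr uint16_t kSentinel    = 0xFFFF; // "end of list" marker

// CPU side, per frame (or once, for static lights): append each light's
// index to every cell its bounding sphere's AABB overlaps. A quadtree
// would just narrow down which cells get tested here.
void binLights(const std::vector<Light>& lights, std::vector<uint16_t>& grid)
{
    grid.assign(kCellsPerRow * kCellsPerRow * kMaxPerCell, kSentinel);
    for (uint16_t li = 0; li < lights.size(); ++li)
    {
        const Light& l = lights[li];
        int minI = static_cast<int>((l.pos.x - l.range) / kCellSize);
        int maxI = static_cast<int>((l.pos.x + l.range) / kCellSize);
        int minJ = static_cast<int>((l.pos.z - l.range) / kCellSize);
        int maxJ = static_cast<int>((l.pos.z + l.range) / kCellSize);
        for (int j = minJ; j <= maxJ; ++j)
            for (int i = minI; i <= maxI; ++i)
            {
                if (i < 0 || i >= kCellsPerRow || j < 0 || j >= kCellsPerRow)
                    continue;
                uint16_t* cell = &grid[(j * kCellsPerRow + i) * kMaxPerCell];
                for (int s = 0; s < kMaxPerCell; ++s)
                    if (cell[s] == kSentinel) { cell[s] = li; break; }
            }
    }
}
```

On the shader side, the per-fragment lookup then amounts to cell = int(pos.z / 32.0) * 64 + int(pos.x / 32.0), reading from offset cell * 16 in the buffer, and looping until the sentinel or slot 16.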

And one optimization I have in mind is having two buffers, one for moving lights and another for non-moving ones. The advantage there is that the non-moving lights only have to be sorted once. Only about 10% of my lights actually move in world space (vehicle headlights, etc.) and so I would only need to sort these on a per-frame basis.
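Reusing the binLights helper and types from the sketch above, the two-buffer idea might look like this (again just illustrative):

```cpp
// Two grids: the static one is built once at load time, the dynamic one
// is rebuilt each frame for the ~10% of lights that actually move.
std::vector<uint16_t> staticGrid;
std::vector<uint16_t> dynamicGrid;

void onLevelLoad(const std::vector<Light>& staticLights)
{
    binLights(staticLights, staticGrid);  // sorted ONCE
}

void onFrameUpdate(const std::vector<Light>& movingLights)
{
    binLights(movingLights, dynamicGrid); // re-binned every frame
    // The fragment shader walks both cell lists: static first, then dynamic.
}
```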

And yes, since it's forward-rendered, I can use hardware MSAA and do the usual back-to-front alpha blending. I can perhaps even allow snow/rain particles to be lit by doing their lighting in the vertex shader.

