Naive Question: Why not do light culling in world space?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

13 replies to this topic

#1 CDProp   Members   -  Reputation: 960


Posted 10 February 2014 - 01:43 PM

I guess I can't speak for everybody, but the majority of the lights in my scene don't move. They're street lights, etc., that I want to affect things that do move, so they need to be dynamic, but I don't need them to move around in the scene. So, with a scene that is roughly 2 km on a side, I could divide the world into 64x64 (4096) chunks of 32 m x 32 m each and sort my lights into them using a quadtree. Yes, that's a lot of chunks, but even if I allow 16 lights per chunk, with a 2-byte index each, I am only using 128 KB of memory. And most of my lights would only have to be sorted ONCE. I would only have to sort the moving lights (vehicle headlights, etc.) once per frame. Then, when I go to shade a pixel, I simply calculate my chunk index from the fragment position, and that index will give me the list of lights that affect that chunk.
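A minimal CPU-side sketch of the chunk lookup and the memory estimate described above. The constant names, the row-major index layout, and the choice of x/z as the ground plane are assumptions for illustration, not something the post specifies.

```python
# Hypothetical constants taken from the numbers in the post.
CELL_SIZE_M = 32.0         # side length of one square chunk
CELLS_PER_SIDE = 64        # 64 x 64 = 4096 chunks, about 2 km per side
MAX_LIGHTS_PER_CELL = 16
BYTES_PER_INDEX = 2


def cell_index(x, z, origin_x=0.0, origin_z=0.0):
    """Map a ground-plane position to a flat, row-major chunk index.

    Returns None when the position lies outside the grid.
    """
    i = int((x - origin_x) // CELL_SIZE_M)
    j = int((z - origin_z) // CELL_SIZE_M)
    if not (0 <= i < CELLS_PER_SIDE and 0 <= j < CELLS_PER_SIDE):
        return None
    return j * CELLS_PER_SIDE + i


def index_buffer_bytes():
    """Total size of the per-chunk light index lists."""
    return CELLS_PER_SIDE * CELLS_PER_SIDE * MAX_LIGHTS_PER_CELL * BYTES_PER_INDEX
```

With these numbers `index_buffer_bytes()` gives 131072 bytes, which matches the 128 KB figure. In a shader the same arithmetic would run per fragment on the reconstructed world position.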

 

A few reasons why I suspect people don't do this:

 

a) They have a lot more dynamically-moving lights than I do (e.g., from laser-fire and whatnot).

b) They have so many lights that a 32m chunk size is way too huge.

c) You lose alignment with the 8x8 screen-space tiles, which can cause performance problems. (edit)

 

Is there anything else I am missing?


Edited by CDProp, 10 February 2014 - 02:25 PM.



#2 cozzie   Members   -  Reputation: 1613


Posted 10 February 2014 - 05:38 PM

Sounds like a good idea if you ask me.
It might be that the results are not 100% accurate (but acceptable); you might have to take the viewer/camera position and heading into account.

#3 Hodgman   Moderators   -  Reputation: 30388


Posted 10 February 2014 - 06:30 PM

This technique was used in Just Cause IIRC -- they made a big 2D texture, which is overlaid over the world, basically dividing it into a grid. Each grid cell (texel) can then have light indices stored in it.

When shading a surface, you use its world-space position to select a grid cell and fetch that texel, which gives you a list of lights to use.



#4 TheChubu   Crossbones+   -  Reputation: 4354


Posted 10 February 2014 - 09:05 PM

I thought the whole idea of spatial partitioning was to cull everything that could be culled, including lights.


"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#5 cozzie   Members   -  Reputation: 1613


Posted 11 February 2014 - 12:04 PM

Interesting. I believe this means that lights not in the line of sight are also processed, because you only have a position in the grid cell/texel and no orientation.

#6 Matias Goldberg   Crossbones+   -  Reputation: 3399


Posted 11 February 2014 - 12:15 PM

Light culling algorithms are common.

 

In forward renderers you have a limited number of lights, so culling is a must. In deferred renderers, lights are often just rendered geometry with special additive shaders, and thus can be culled like normal geometry.

 

It's worth noting you can use the standard algorithms that cull your geometry to also cull your lights, just as TheChubu says.

If you use a Quadtree to cull your geometry, you can use it for lights too.

 

Perhaps the point in your case is that you already know how it will be used: most lights are static. Thus a custom solution that fits your particular needs is not crazy.

 

Be careful with quadtrees though. Trees often involve multiple dependent reads (indirections), which means the GPU's memory latency can slow you down a lot (though you could perhaps work around that with a few assumptions); and unless the memory layout is well planned, they're usually not cache friendly either.

Grids are often much better. They waste more RAM though.
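To make the dependent-read point concrete, here is a rough sketch that compares a flat grid lookup with a descent through a full quadtree built over the same cells. The `Node` class, the function names, and the way reads are counted are all illustrative assumptions; real GPU behaviour depends on memory layout and caching.

```python
def grid_lookup(cells, cells_per_side, cell_size, x, z):
    """One address computation, then a single read."""
    i = int(x // cell_size)
    j = int(z // cell_size)
    return cells[j * cells_per_side + i], 1  # (payload, number of reads)


class Node:
    def __init__(self, payload=None, children=None):
        self.payload = payload      # set on leaves only
        self.children = children    # list of 4 nodes on interior nodes


def build_quadtree(cells, cells_per_side, i0=0, j0=0, size=None):
    """Build a full quadtree whose leaves are the grid cells (power-of-two side)."""
    if size is None:
        size = cells_per_side
    if size == 1:
        return Node(payload=cells[j0 * cells_per_side + i0])
    h = size // 2
    return Node(children=[
        build_quadtree(cells, cells_per_side, i0, j0, h),
        build_quadtree(cells, cells_per_side, i0 + h, j0, h),
        build_quadtree(cells, cells_per_side, i0, j0 + h, h),
        build_quadtree(cells, cells_per_side, i0 + h, j0 + h, h),
    ])


def quadtree_lookup(root, world_size, x, z):
    """Descend from the root; each step needs the result of the previous read."""
    node, x0, z0, size, reads = root, 0.0, 0.0, world_size, 1
    while node.children is not None:
        size /= 2.0
        qx = 1 if x >= x0 + size else 0
        qz = 1 if z >= z0 + size else 0
        x0 += qx * size
        z0 += qz * size
        node = node.children[qz * 2 + qx]
        reads += 1
    return node.payload, reads
```

For a 64x64 grid the tree lookup touches 7 nodes in sequence (root plus six levels), while the grid touches one entry. The tree can skip empty regions and so use less memory, which is the trade-off mentioned above.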



#7 MJP   Moderators   -  Reputation: 11376


Posted 12 February 2014 - 02:13 AM

Using spatial hierarchies to associate meshes with lights was common back in the day of traditional forward renderers. With deferred rendering, however, it's totally unnecessary, as you can easily achieve per-tile level culling without any acceleration structures using only a fraction of a millisecond on the GPU. In most cases a 16x16 screen-space tile is probably going to be a lot more granular than a 32x32m chunk, which would really hurt your ability to scale up on smaller lights. Plus your algorithm sounds inherently expensive, since you're talking about a per-pixel traversal of an acceleration structure.
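A back-of-the-envelope check of the granularity claim, as a sketch. The 1080-pixel screen height and 60 degree vertical field of view are assumed values, and the calculation only looks at the tile's width and height, not its extent in depth.

```python
import math


def tile_footprint_m(distance_m, tile_px=16, screen_height_px=1080, vfov_deg=60.0):
    """World-space height covered by one screen tile at a given view distance."""
    view_height_m = 2.0 * distance_m * math.tan(math.radians(vfov_deg) / 2.0)
    return view_height_m * tile_px / screen_height_px


def distance_where_tile_matches(cell_m, tile_px=16, screen_height_px=1080, vfov_deg=60.0):
    """View distance at which one tile covers as much as a world cell of cell_m metres."""
    per_metre = 2.0 * math.tan(math.radians(vfov_deg) / 2.0) * tile_px / screen_height_px
    return cell_m / per_metre
```

Under these assumptions a tile covers about 1.7 m at 100 m and only grows to 32 m at roughly 1.9 km, so the screen-space tiles are the finer partition over most of a 2 km scene.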



#8 CDProp   Members   -  Reputation: 960


Posted 12 February 2014 - 10:31 AM

Yeah, I see what you mean. I've been kind of down on deferred rendering lately because I feel like you give up a lot in order to achieve it. We actually have a deferred renderer that does the old style light culling (looping through the lights on the CPU, stenciling them one by one). So, bandwidth use is high. We're using FXAA, but that still leaves things looking unacceptably jaggy. So it seems there are two options:

a) Solve the problems with the deferred rendering. Do tiled light culling to save bandwidth, implement MSAA (requires edge detection pass, per-sample lighting on edges, custom MSAA resolve). I'm already eating the cost of a custom resolve with my tone mapping, but the rest sounds pretty slow.

b) If I've already committed to doing some sort of tiled light culling, then I might as well just ditch deferred rendering and solve the AA problem as well. I also don't have to handle transparency any differently, which is a nice bonus.

So, b is sounding like a better option to me. Taking this further, though, the point of my OP is to raise the question of whether it's even worth it to re-sort my lights every frame along screen space tiles, given that 90% of them don't move with respect to world space. The acceleration structure would be used for the sorting step only. When it comes to actually looking up which tile a fragment is being rendered in, I would just use the world-space position to calculate an index into the grid implied by the deepest level of the tree.

I realize, though, that my world-space grid is not going to align with the screen-space tiles along which the GPU will partition its fragment rendering, and so there will be inefficiencies there.

#9 cozzie   Members   -  Reputation: 1613


Posted 12 February 2014 - 01:38 PM

Maybe it helps, but here's what I do which might be what you're aiming for:

Per frame:
- check which "cell"/grid of the quadtree the camera is in
- cull all renderables in that "cell" against the frustum
- do the same with each light, using a simple distance check with the bounding sphere radius
- loop through the renderables and do a distance check against only the lights that are visible in the frustum (lightpos + max range), and save those light IDs in a simple int vector
- render per renderable and send only those lights that affect the renderable to the shader
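The per-renderable light selection in the steps above could look roughly like this. It is only a sketch: it assumes both renderables and lights are approximated by spheres, that `lights` already contains only the lights that passed the frustum test, and the function name is made up for illustration.

```python
import math


def lights_for_renderable(center, radius, lights):
    """Return IDs of lights whose range sphere overlaps the renderable's bounding sphere.

    `lights` is a list of (position, max_range) pairs; the ID is the list index.
    """
    ids = []
    for light_id, (pos, light_range) in enumerate(lights):
        if math.dist(center, pos) <= radius + light_range:
            ids.append(light_id)
    return ids
```

The resulting ID list is what would be sent to the shader for that draw call.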

Blended renderables can be done exactly the same (after sorting that is), no blending issues because of forward rendering. Also MSAA is no issue.

#10 CDProp   Members   -  Reputation: 960


Posted 12 February 2014 - 03:25 PM

What I have in mind would be somewhat different. It would be more like this. I divide the world into a grid of 32x32 meter cells. For an arena that is about 2 km on a side, that is roughly 4000 cells (64x64 = 4096). I give each cell an index according to j*NumXCells + i with some choice of origin.

 

per frame:

- Figure out which cell each light belongs in by doing sphere/cone vs. AABB tests on the cells. Here is where I might use some sort of quadtree logic to cut down on the number of tests I'm doing. Store the light indices in a shader storage buffer. If I'm allowing a max of 16 lights per cell, then this buffer would need to hold 16*numCells integers. So, a buffer of 64K entries (128 KB with 2-byte indices) would work in this instance. If there are fewer than 16 lights in a cell, then use a sentinel value after the last light index.

 

per fragment:

- Calculate the fragment position from depth. Use that fragment position to figure out the index of the grid cell that the fragment is in. Use index*max_lights_per_cell to get the index into the aforementioned buffer. Start looping through the lights until the sentinel value, or max_lights_per_cell, has been reached.
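The per-frame and per-fragment steps above can be sketched on the CPU as follows. Several things here are assumptions for illustration: lights are treated as circles on the ground plane (x, z, radius) rather than spheres or cones, the sentinel value `0xFFFF` is made up, lights beyond the per-cell limit are silently dropped, and the lookup assumes the position is inside the grid.

```python
SENTINEL = 0xFFFF  # hypothetical "end of list" marker for 2-byte indices


def build_cell_light_buffer(lights, cells_x, cells_z, cell_size, max_per_cell=16):
    """Per-frame step: write each light's index into every cell it overlaps."""
    buf = [SENTINEL] * (cells_x * cells_z * max_per_cell)
    counts = [0] * (cells_x * cells_z)
    for light_id, (lx, lz, r) in enumerate(lights):
        # Only visit cells overlapped by the light's bounding square.
        i_min = max(0, int((lx - r) // cell_size))
        i_max = min(cells_x - 1, int((lx + r) // cell_size))
        j_min = max(0, int((lz - r) // cell_size))
        j_max = min(cells_z - 1, int((lz + r) // cell_size))
        for j in range(j_min, j_max + 1):
            for i in range(i_min, i_max + 1):
                # Circle vs. axis-aligned square: clamp the centre to the cell.
                cx = min(max(lx, i * cell_size), (i + 1) * cell_size)
                cz = min(max(lz, j * cell_size), (j + 1) * cell_size)
                if (lx - cx) ** 2 + (lz - cz) ** 2 > r * r:
                    continue
                cell = j * cells_x + i
                if counts[cell] < max_per_cell:
                    buf[cell * max_per_cell + counts[cell]] = light_id
                    counts[cell] += 1
    return buf


def lights_at(buf, x, z, cells_x, cell_size, max_per_cell=16):
    """Per-fragment step: index*max_per_cell, then loop until the sentinel."""
    cell = int(z // cell_size) * cells_x + int(x // cell_size)
    start = cell * max_per_cell
    out = []
    for k in range(max_per_cell):
        idx = buf[start + k]
        if idx == SENTINEL:
            break
        out.append(idx)
    return out
```

The bounding-square loop stands in for the "quadtree logic" mentioned above: it limits the overlap tests to cells near the light without needing a tree.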

 

And one optimization I have in mind is having two buffers, one for moving lights and another for non-moving ones. The advantage there is that the non-moving lights only have to be sorted once. Only about 10% of my lights actually move in world space (vehicle headlights, etc.) and so I would only need to sort these on a per-frame basis.
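A sketch of the two-buffer idea, assuming lights are circles on the ground plane keyed by a name. The `LightGrid` class, its sort counters, and the conservative bounding-square binning are illustrative choices, not part of the original description.

```python
from collections import defaultdict


def bin_lights(lights, cell_size):
    """Map cell (i, j) -> list of light IDs, using each light's bounding square."""
    cells = defaultdict(list)
    for light_id, (x, z, r) in lights.items():
        for j in range(int((z - r) // cell_size), int((z + r) // cell_size) + 1):
            for i in range(int((x - r) // cell_size), int((x + r) // cell_size) + 1):
                cells[(i, j)].append(light_id)
    return cells


class LightGrid:
    def __init__(self, static_lights, cell_size=32.0):
        self.cell_size = cell_size
        self.static_cells = bin_lights(static_lights, cell_size)  # sorted once
        self.dynamic_cells = {}
        self.static_sorts = 1
        self.dynamic_sorts = 0

    def update_dynamic(self, dynamic_lights):
        """Call once per frame with the current positions of the moving lights."""
        self.dynamic_cells = bin_lights(dynamic_lights, self.cell_size)
        self.dynamic_sorts += 1

    def lights_at(self, x, z):
        """Static lights first, then moving lights, for the cell containing (x, z)."""
        key = (int(x // self.cell_size), int(z // self.cell_size))
        return self.static_cells.get(key, []) + self.dynamic_cells.get(key, [])
```

For example, after `grid = LightGrid({"lamp": (16.0, 16.0, 5.0)})` only `update_dynamic` needs to run each frame; the street lamp's entry is never rebuilt.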

 

And yes, since it's forward-rendered, I can use hardware MSAA and do the usual back-to-front alpha blending. I can even allow snow/rain particles to be lit by doing their lighting in the vert shader, perhaps.



#11 MJP   Moderators   -  Reputation: 11376


Posted 12 February 2014 - 03:40 PM

Well you can (very easily) use forward rendering and still do culling for screen-space tiles. This is exactly what "Forward+" does, and what I did in my Light Indexed Deferred demo. You just use a compute shader to do the culling, and output a list of indices per tile. Then in your forward pass you figure out what tile a pixel is in, and loop over the light indices.
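A CPU-side sketch of the per-tile list idea, for illustration only. It assumes the lights have already been projected to screen-space circles (centre in pixels, radius in pixels) and ignores depth entirely, whereas a real implementation would run in a compute shader and test against each tile's frustum. The function names are hypothetical.

```python
TILE = 16  # tile size in pixels


def cull_lights_per_tile(screen_lights, width, height, tile=TILE):
    """Build one light index list per screen tile."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    lists = [[] for _ in range(tiles_x * tiles_y)]
    for light_id, (cx, cy, r) in enumerate(screen_lights):
        # Restrict the test to tiles under the circle's bounding square.
        tx0 = max(0, int((cx - r) // tile))
        tx1 = min(tiles_x - 1, int((cx + r) // tile))
        ty0 = max(0, int((cy - r) // tile))
        ty1 = min(tiles_y - 1, int((cy + r) // tile))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                # Closest point of the tile rectangle to the circle centre.
                px = min(max(cx, tx * tile), (tx + 1) * tile)
                py = min(max(cy, ty * tile), (ty + 1) * tile)
                if (cx - px) ** 2 + (cy - py) ** 2 <= r * r:
                    lists[ty * tiles_x + tx].append(light_id)
    return lists, tiles_x


def lights_for_pixel(lists, tiles_x, px, py, tile=TILE):
    """Forward pass: find the pixel's tile and return its light list."""
    return lists[(py // tile) * tiles_x + (px // tile)]
```

The forward pass then loops over `lights_for_pixel(...)` for each shaded pixel, which mirrors the "figure out what tile a pixel is in" step described above.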



#12 CDProp   Members   -  Reputation: 960


Posted 12 February 2014 - 04:01 PM

Yeah, that's true. Do you do the depth-only prepass, like other Forward+ implementations I've read about, to find min/max depth for each tile to bracket the depth of your tile frustum? Have you found that to be a useful optimization?

 

The worst-case scenario that I'm thinking about, with something like Forward+, is that I'm looking through a vehicle window straight down a street that is lined with street lamps, which would be common in my case. Because I'm looking through a window, my min depth is very close by. My max depth is very far away. So, there will be some tiles whose frustum will intersect ALL of these street lights. When I go to draw my window, then, I have to process every one of those lights, even though none of them actually affect the window. Doing a world-space grid doesn't really have this problem.

 

So, it seems to me that it would at least be useful to modify Forward+ by splitting each tile frustum into chunks along the depth. That way, I'm not just asking, "Which lights affect this tile?" but "Which lights affect this tile at this depth?" And then that seems to get rid of the need to do any per-tile min/max depth determination (although a depth-only prepass may still be a useful optimization in terms of reducing over-shading).
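A small sketch of the difference for a single tile, under some simplifying assumptions: every light is reduced to a view-space depth and a radius, all lights are assumed to already overlap the tile in screen space, and the depth range is split into uniform slices. The function names and the uniform slicing are illustrative choices.

```python
def lights_minmax(lights, min_depth, max_depth):
    """One list for the tile's whole [min_depth, max_depth] range."""
    return [i for i, (z, r) in enumerate(lights)
            if z + r >= min_depth and z - r <= max_depth]


def slice_index(depth, near, far, num_slices):
    """Uniform depth slice containing `depth`, clamped to the valid range."""
    t = (depth - near) / (far - near)
    return min(num_slices - 1, max(0, int(t * num_slices)))


def lights_per_slice(lights, near, far, num_slices):
    """One list per depth slice of the tile."""
    slices = [[] for _ in range(num_slices)]
    for i, (z, r) in enumerate(lights):
        if z + r < near or z - r > far:
            continue  # light lies entirely outside the sliced range
        first = slice_index(z - r, near, far, num_slices)
        last = slice_index(z + r, near, far, num_slices)
        for s in range(first, last + 1):
            slices[s].append(i)
    return slices
```

With street lamps at depths 20, 40, 60, 80 and 100 m (radius 8 m) and a window at 1 m, the min/max list for the tile contains all five lamps, while the slice containing the window is empty and a fragment at 40 m only sees the lamp at that depth.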

 

I guess what I really need to do is experiment and see what works.

 

Edit: Not that you guys haven't been helpful, because you've been immensely helpful. I just think that maybe I'm worrying about the cost of things (e.g., a depth-only prepass, or resorting lights every frame, or worst-case scenarios, etc.) that I haven't even tried yet.


Edited by CDProp, 12 February 2014 - 04:11 PM.


#13 _swx_   Members   -  Reputation: 940


Posted 12 February 2014 - 04:24 PM

"So, it seems to me that it would at least be useful to modify Forward+ by splitting each tile frustum into chunks along the depth."


You mean like this: https://sites.google.com/site/takahiroharada/storage/2012SA_2.5DCulling.pdf?attredirects=0

#14 CDProp   Members   -  Reputation: 960


Posted 12 February 2014 - 04:31 PM

Yes, exactly. Thanks. I've hesitated to call it "clustered rendering" because, although the diagrams look similar, I haven't read up on it in enough detail yet to know if that's exactly what this is.





