Simple Alternative to Clustered Shading for Thousands of Lights

13 comments, last by Ashaman73 9 years, 1 month ago

Hi,

I have recently discovered a very simple way to render thousands of lights in both forward and deferred scenarios. I call the method BVH Accelerated Shading. It is quite a bit easier to implement than Clustered Shading methods, and its performance seems pretty competitive, so it should make a good alternative to those methods.

[attachment=26012:lots_of_lights.png]

I have written a post about it here. Take a look and feel free to comment! :)

-fries

Nice! Thank you for sharing.

Nice! I do something similar per-mesh on the CPU in my engine: compute light intensity at mesh's position and then sort by decreasing intensity, submitting the first N lights to the mesh's shader. Your technique is per-pixel though...
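For comparison, the per-mesh CPU approach described above might look roughly like the following. This is only a minimal C++ sketch, not anyone's actual engine code: `PointLight`, `IntensityAt`, `SelectLightsForMesh`, and the inverse-square falloff are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical point light: a position plus a scalar intensity.
struct PointLight { Vec3 position; float intensity; };

// Assumed attenuation model: inverse-square falloff, with a small bias
// in the denominator to avoid dividing by zero at the light's position.
float IntensityAt(const PointLight& light, const Vec3& p) {
    const float dx = light.position.x - p.x;
    const float dy = light.position.y - p.y;
    const float dz = light.position.z - p.z;
    return light.intensity / (dx * dx + dy * dy + dz * dz + 1e-4f);
}

// Returns the indices of up to maxLights lights, ordered by decreasing
// intensity at the mesh's position. These would be submitted to the
// mesh's shader.
std::vector<std::size_t> SelectLightsForMesh(const std::vector<PointLight>& lights,
                                             const Vec3& meshPos,
                                             std::size_t maxLights) {
    std::vector<std::size_t> order(lights.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;

    // Only the first n entries need to be sorted, so partial_sort is enough.
    const std::size_t n = std::min(maxLights, order.size());
    std::partial_sort(order.begin(), order.begin() + n, order.end(),
                      [&](std::size_t a, std::size_t b) {
                          return IntensityAt(lights[a], meshPos) >
                                 IntensityAt(lights[b], meshPos);
                      });
    order.resize(n);
    return order;
}
```

Because the selection happens once per mesh, each mesh ends up with its own light set, which is why this style of approach tends to force a state change between objects.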

I think that's how I was doing it a while ago. But BVH Accelerated Shading has the added bonus of not having to switch state between objects, so you can potentially batch a lot of objects together in one draw call.

Also, if you already have a hierarchy of lights, implementing BVH Accelerated Shading shouldn't take very long ;)

This is really great, thanks for taking time to share this.

Looks good, but I would like to see a benchmark of this technique versus clustered.

Very interesting, I'll try implementing this in a few weeks.

Aether3D Game Engine: https://github.com/bioglaze/aether3d

Blog: http://twiren.kapsi.fi/blog.html

Control

Glad you guys like it.

I'm going to be working on a clustering comparison demo with code. It should be available in the next few days.

Very interesting and good work :)

When taking a look at your technique, you would need to sample the memory quite often, compared to a clustered/cell deferred rendering pipeline, where the lights are stored as uniforms/parameters. So, my questions are: how much bandwidth does your approach use? How does it perform on a low-bandwidth system? How much time does it take to rebuild the BVH and upload it per frame (for dynamic light sources)? How does it scale with the number of lights? How does it scale with the size of the render buffer?

That it works with deferred, forward, and a mix of both makes it really interesting, especially for lit particles and transparent surfaces, which are notoriously hard to implement in a deferred shader.

When taking a look at your technique, you would need to sample the memory quite often, compared to a clustered/cell deferred rendering pipeline, where the lights are stored as uniforms/parameters.

In my experience with clustered, you store all the lights in the buffer in a very similar manner, except each cluster has a linear section of the buffer.
e.g. pseudo-shader:


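struct ClusterLightRange { int start, size; };
struct LightInfo { float3 position; /* etc. */ };

Buffer<ClusterLightRange> clusters; // one (start, size) range per cluster
Buffer<LightInfo> lights;           // each cluster's lights are stored contiguously

// Look up this pixel's cluster, then walk its linear section of the light buffer.
ClusterLightRange c = clusters[ClusterIdxFromPixelPosition(pixelPosition)];
for( int i=c.start, end=c.start+c.size; i!=end; ++i )
{
  LightInfo l = lights[i];
  DoLight(l);
}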

So the main additional cost of the BVH approach is doing a bounding sphere test per light that you visit, and iterating through the light array as a linked list rather than a simpler linear array traversal. On modern hardware, it should do pretty well. Worse than clustered, but probably not by much -- especially if the "DoLight" function above is expensive.

It would be interesting to do some comparisons between the two using (A) a simple Lambert lighting model, and (B) a very complex Cook-Torrance/GGX/Smith/etc fancy new lighting model.
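To make that traversal cost concrete, here is a minimal CPU-side C++ sketch of a stackless BVH walk for a single shaded point. The node layout is an assumption for illustration (nodes stored in depth-first order, each with a hypothetical `skip` index that jumps past its subtree); the layout in the original post may differ. `BvhNode`, `InsideSphere`, and `ForEachLightAt` are made-up names.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

// Assumed layout: nodes are stored in depth-first (pre-order) order.
struct BvhNode {
    Vec3  center;   // bounding sphere center
    float radius;   // bounding sphere radius
    int   skip;     // index of the next node to visit when this sphere is missed
    int   light;    // light index for leaf nodes, -1 for inner nodes
};

bool InsideSphere(const Vec3& p, const Vec3& c, float r) {
    const float dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
    return dx * dx + dy * dy + dz * dz <= r * r;
}

// Calls visit(lightIndex) for every leaf whose bounding sphere contains p.
// One sphere test per visited node; a miss follows the skip link, which is
// the "linked-list" style iteration.
template <typename Visit>
void ForEachLightAt(const std::vector<BvhNode>& nodes, const Vec3& p, Visit visit) {
    const int end = static_cast<int>(nodes.size());
    int i = 0;
    while (i >= 0 && i < end) {
        const BvhNode& n = nodes[i];
        if (InsideSphere(p, n.center, n.radius)) {
            if (n.light >= 0) visit(n.light);  // leaf: shade with this light
            ++i;                               // pre-order: next node is the first child or next sibling
        } else {
            i = n.skip;                        // jump over the whole subtree
        }
    }
}
```

In a shader, `visit` would correspond to the `DoLight` call, so the more expensive the lighting model, the smaller the relative overhead of the sphere tests and skip jumps.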

Also, you could merge this technique with clustered shading:

A) CPU creates the BVH as described, and uploads into a GPU buffer.

B) A GPU compute shader traverses the BVH for each cluster, and generates the 'Lights' buffer in my pseudo example above.

C) Lighting is performed as in clustered shading (as in my pseudo example above).
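Step B above could be sketched on the CPU as follows. This is a hedged C++ illustration rather than a real compute shader: it assumes each cluster is described by an axis-aligned box, reuses an assumed depth-first BVH layout with a hypothetical `skip` index, and the names `Aabb`, `SphereOverlapsAabb`, `ClusterLightLists`, and `BuildClusterLightLists` are invented for the example. On the GPU, each cluster would typically be handled by its own thread, and the output offsets would need to be allocated atomically or with a prefix sum.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };  // assumed cluster bounds

// Assumed BVH layout: depth-first order, skip = next node when the test fails.
struct BvhNode { Vec3 center; float radius; int skip; int light; };

struct ClusterLightRange { int start, size; };

// Output in the same shape as the clustered pseudo-shader: one range per
// cluster, pointing into a flat list of light indices.
struct ClusterLightLists {
    std::vector<ClusterLightRange> clusters;
    std::vector<int> lightIndices;
};

// Sphere/box overlap: compare the squared distance from the sphere center
// to the closest point of the box against the squared radius.
bool SphereOverlapsAabb(const Vec3& c, float r, const Aabb& b) {
    auto axisDist2 = [](float v, float lo, float hi) {
        const float d = v < lo ? lo - v : (v > hi ? v - hi : 0.0f);
        return d * d;
    };
    const float d2 = axisDist2(c.x, b.min.x, b.max.x) +
                     axisDist2(c.y, b.min.y, b.max.y) +
                     axisDist2(c.z, b.min.z, b.max.z);
    return d2 <= r * r;
}

ClusterLightLists BuildClusterLightLists(const std::vector<BvhNode>& nodes,
                                         const std::vector<Aabb>& clusterBounds) {
    ClusterLightLists out;
    out.clusters.reserve(clusterBounds.size());
    const int end = static_cast<int>(nodes.size());
    for (const Aabb& box : clusterBounds) {
        const int start = static_cast<int>(out.lightIndices.size());
        int i = 0;
        while (i >= 0 && i < end) {
            const BvhNode& n = nodes[i];
            if (SphereOverlapsAabb(n.center, n.radius, box)) {
                if (n.light >= 0) out.lightIndices.push_back(n.light);
                ++i;         // descend or move on in depth-first order
            } else {
                i = n.skip;  // skip the whole subtree
            }
        }
        const int size = static_cast<int>(out.lightIndices.size()) - start;
        out.clusters.push_back({start, size});
    }
    return out;
}
```

The resulting `clusters` and `lightIndices` arrays play the roles of the `clusters` buffer and the per-cluster light list in the pseudo example, so step C can shade with a plain linear loop.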

This topic is closed to new replies.
