Looking at your technique, you would need to sample memory quite often compared to a clustered/cell deferred rendering pipeline, where the lights are stored as uniforms/shader parameters.
In my experience with clustered, you store all the lights in a buffer in a very similar manner, except each cluster owns a contiguous section of that buffer.
e.g. pseudo-shader:
struct ClusterLightRange { int start, size; };
struct LightInfo { float3 position; /* color, radius, etc. */ };
Buffer<ClusterLightRange> clusters;
Buffer<LightInfo> lights;

ClusterLightRange c = clusters[ClusterIdxFromPixelPosition(pixelPosition)];
for( int i = c.start, end = c.start + c.size; i != end; ++i )
{
    LightInfo l = lights[i];
    DoLight(l);
}
So the main additional cost of your technique is doing a bounding-sphere test per light that you visit, and iterating through the lights as a linked structure rather than a simpler linear array traversal. On modern hardware it should do pretty well: worse than clustered, but probably not by much, especially if the "DoLight" function above is expensive.
It would be interesting to do some comparisons between the two using (A) a simple Lambert lighting model, and (B) a very complex Cook-Torrance/GGX/Smith/etc. fancy new lighting model.
Also, you could merge this technique with clustered shading:
A) CPU creates the BVH as described, and uploads into a GPU buffer.
B) A GPU compute shader traverses the BVH for each cluster, and generates the per-cluster light lists (the 'clusters' and 'lights' buffers from my pseudo example above).
C) Lighting is performed as in clustered shading (as in my pseudo example above).