Is Clustered Forward Shading worth implementing?

Richard Blake · 2013-01-29T19:18:09

I'm referring to this: http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf (there is also a video available). The performance of this technique seems to scale perfectly for huge numbers of lights, but with fewer lights it performs a little worse than the less advanced tiled culling method. The thing is: has there ever been a case where you will need 30 thousand lights in a scene? Plus, won't it get bottlenecked by generating shadow maps for all the lights? (In the YouTube video the lights just pass through the bridge and under it.) Unfortunately I couldn't test its performance, because for some reason the provided demo won't start up (even though my hardware supports OpenGL 3 and higher), and I've never done GLSL, so it might take time to get it to work.


Started by mrheisenberg January 09, 2013 05:56 AM

45 comments, last by Matias Goldberg 11 years, 2 months ago

Hodgman

52,717

January 21, 2013 12:19 AM

I think BDRF is a common misspelling/typo for BRDF - bidirectional reflectance distribution function.

. 22 Racing Series .

Ben Bowen

115

January 22, 2013 01:59 AM

Haha, whoops. I don't know why I wrote BDRF. Here's a reminder (saw this on Twitter):

BeaRDed F!

Krohm

5,051

January 23, 2013 08:23 AM

There is a thing I don't understand.
It appears there's still this idea going around that plain forward can only do 8-10 lights per pass. How? In the past I've had quite some success encoding light data in textures and looping over them on entry-level SM3 hardware. Perhaps I'm not seeing the whole picture, but in SM4, with the much higher resource limits and the unlimited dynamic instruction count, shouldn't we easily get into the thousands? Of course we'll need a z-only pass first.
So I guess there are additional practical reasons to stay in the 8-10 range.

At the top of page 2, I read about extra pressure and lower execution efficiency. I understand.

But, as much as I love lighting modularity coming from deferred, as a DDR3 card owner I still don't understand how the improved processing makes up for the bandwidth increase required. The trend on bandwidth is set. It looks to me we want to trash compute in the future.

Previously "Krohm"

Matias Goldberg

9,637

January 28, 2013 06:49 PM

Hi Krohm,

In shader model 4.0 you can have up to 4096 float4 entries in a constant buffer (which would limit lights to ~1024 if each one stores position, direction, diffuse, and specular as one float4 apiece). Or you can use texture buffers and have near limitless lights.

Let's say you use the latter, so no worries about the light count. And today with SM 5.0 we really don't need to worry about loop count limits either. So we're good on that front.

Indeed you can loop through 1000 lights on SM4+ hardware. But let's say I'm running at 1920x1080 resolution and the whole screen is covered.

1920x1080 x 1000 lights = 2.073.600.000 light evaluations per frame.

Not to mention some BRDFs are expensive (e.g. Cook-Torrance). The framerate would be sloooooooooow. So slow, in fact, that it could trigger the Windows watchdog (Timeout Detection and Recovery), which assumes the GPU has stalled and restarts the driver.

The secret behind Deferred shaders (or Forward+) is that even though there are thousands of lights, they're not covering the whole screen at the same time.

In other words: many small lights = few big lights.

It's typical that a single region of the screen isn't lit by more than 4-20 lights, maybe 5 on average. Let's be pessimistic and say 10.

1920x1080 x 10 lights = 20.736.000 light evaluations per frame

That's a lot more reasonable for a GPU to perform in real time. In such a scenario every region of pixels (called a tile) would only have to loop through 10 lights (on average), not 1000, so no GPU time is wasted on 990 lights that aren't needed.
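The arithmetic above can be reproduced with a small CPU-side sketch. It is only a counting model, not the paper's algorithm or anyone's actual renderer: the `ScreenLight` struct, the function names, and the choice of binning by a light's screen-space bounding square are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical light, already projected to screen space as a circle.
struct ScreenLight {
    float x, y;    // centre in pixels
    float radius;  // radius of influence in pixels
};

// Plain forward loop: every pixel evaluates every light.
std::uint64_t bruteForceEvaluations(int width, int height, std::size_t lightCount) {
    return static_cast<std::uint64_t>(width) *
           static_cast<std::uint64_t>(height) * lightCount;
}

// Tiled loop: each tile keeps only the lights whose bounding square overlaps
// it, and every pixel in the tile evaluates just that list.
std::uint64_t tiledEvaluations(int width, int height, int tileSize,
                               const std::vector<ScreenLight>& lights) {
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::uint32_t> lightsPerTile(
        static_cast<std::size_t>(tilesX) * static_cast<std::size_t>(tilesY), 0u);
    const float ts = static_cast<float>(tileSize);

    for (const ScreenLight& l : lights) {
        // Skip lights whose bounding square misses the screen entirely.
        if (l.x + l.radius < 0.0f || l.y + l.radius < 0.0f ||
            l.x - l.radius >= static_cast<float>(width) ||
            l.y - l.radius >= static_cast<float>(height)) {
            continue;
        }
        // Conservative range of tiles touched by the bounding square.
        const int x0 = std::max(0, static_cast<int>(std::floor((l.x - l.radius) / ts)));
        const int y0 = std::max(0, static_cast<int>(std::floor((l.y - l.radius) / ts)));
        const int x1 = std::min(tilesX - 1, static_cast<int>(std::floor((l.x + l.radius) / ts)));
        const int y1 = std::min(tilesY - 1, static_cast<int>(std::floor((l.y + l.radius) / ts)));
        for (int ty = y0; ty <= y1; ++ty) {
            for (int tx = x0; tx <= x1; ++tx) {
                ++lightsPerTile[static_cast<std::size_t>(ty) * tilesX + tx];
            }
        }
    }

    // Each pixel evaluates every light in its tile's list; edge tiles may be partial.
    std::uint64_t total = 0;
    for (int ty = 0; ty < tilesY; ++ty) {
        const int tileH = std::min(tileSize, height - ty * tileSize);
        for (int tx = 0; tx < tilesX; ++tx) {
            const int tileW = std::min(tileSize, width - tx * tileSize);
            total += static_cast<std::uint64_t>(tileW) * tileH *
                     lightsPerTile[static_cast<std::size_t>(ty) * tilesX + tx];
        }
    }
    return total;
}
```

Calling `bruteForceEvaluations(1920, 1080, 1000)` gives the 2,073,600,000 figure, while `tiledEvaluations` with ten lights that each cover the whole screen gives 20,736,000. With many small lights the tiled count depends on how they are spread out, which is exactly the "many small lights = few big lights" point.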

Twitter: @matiasgoldberg

Distant Souls | Alliance AirWar | My Free Royalty-Free Music Library

CryZe

773

January 29, 2013 08:10 AM

But what if, inside this loop, you tested whether each light actually covers the individual pixel and skipped all the lights that don't? The only difference from a tile-based deferred renderer would be that the light culling is performed per pixel instead of per tile. But you wouldn't have all the BRDF, transparency, bandwidth, and anti-aliasing issues. It would basically be a worse version of a light-indexed deferred renderer, because the list of lights is not precomputed.

MJP

20,295

January 29, 2013 08:22 AM

Also, with more traditional forward rendering you would typically have a stage (performed either offline or online) where you determine which lights will affect a given mesh, so that you only apply those lights when rendering it. Once again the only major difference is your granularity, and when/where you cull your lights at your given level of granularity. Doing everything on the GPU lets you achieve very fine granularity (per-tile or per-pixel) with relatively simple code, which is the primary draw of deferred techniques.
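As a rough illustration of that per-mesh stage, here is a minimal CPU-side sketch. It assumes point lights with a finite radius and meshes bounded by an axis-aligned box; `Vec3`, `Aabb`, `PointLight`, and `lightsAffectingMesh` are hypothetical names, and a real engine would likely also sort by importance and cap the list length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };                        // hypothetical mesh bounds, world space
struct PointLight { Vec3 position; float radius; };  // hypothetical light with finite range

// Squared distance from a point to the closest point on the box (0 if inside).
float distanceSquaredToAabb(const Vec3& p, const Aabb& b) {
    auto axis = [](float v, float lo, float hi) {
        float d = 0.0f;
        if (v < lo) d = lo - v;
        else if (v > hi) d = v - hi;
        return d * d;
    };
    return axis(p.x, b.lo.x, b.hi.x) +
           axis(p.y, b.lo.y, b.hi.y) +
           axis(p.z, b.lo.z, b.hi.z);
}

// Returns the indices of the lights whose sphere of influence touches the mesh bounds.
std::vector<std::size_t> lightsAffectingMesh(const Aabb& bounds,
                                             const std::vector<PointLight>& lights) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < lights.size(); ++i) {
        assert(lights[i].radius >= 0.0f);
        const float r2 = lights[i].radius * lights[i].radius;
        if (distanceSquaredToAabb(lights[i].position, bounds) <= r2) {
            result.push_back(i);
        }
    }
    return result;
}
```

The resulting index list is what would be bound for the mesh's draw call. The granularity here is one list per mesh, which is coarser than the per-tile or per-pixel lists discussed in this thread.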

The Blog | The Book

Matias Goldberg

9,637

January 29, 2013 07:18 PM

But what if, inside this loop, you tested whether each light actually covers the individual pixel and skipped all the lights that don't? The only difference from a tile-based deferred renderer would be that the light culling is performed per pixel instead of per tile. But you wouldn't have all the BRDF, transparency, bandwidth, and anti-aliasing issues. It would basically be a worse version of a light-indexed deferred renderer, because the list of lights is not precomputed.

You could do that, but GPUs suck at branch-heavy workloads, especially if there's no good branch coherency within the pixel block (pixel shaders run in blocks).

And even if they didn't, a tile-based deferred renderer is MUCH more efficient at performing this culling.
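To put a rough number on the culling cost alone, the sketch below counts light-versus-region tests for both granularities. It is a back-of-the-envelope model with hypothetical function names: it ignores branch divergence and any hierarchical culling, and the 16x16 tile size used in the note that follows is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Per-pixel culling: every pixel range-tests every light inside the shading loop.
std::uint64_t perPixelCullTests(int width, int height, std::uint64_t lightCount) {
    assert(width > 0 && height > 0);
    return static_cast<std::uint64_t>(width) *
           static_cast<std::uint64_t>(height) * lightCount;
}

// Per-tile culling: one test per (tile, light) pair, shared by all pixels in the tile.
std::uint64_t perTileCullTests(int width, int height, int tileSize,
                               std::uint64_t lightCount) {
    assert(width > 0 && height > 0 && tileSize > 0);
    const std::uint64_t tilesX = static_cast<std::uint64_t>((width + tileSize - 1) / tileSize);
    const std::uint64_t tilesY = static_cast<std::uint64_t>((height + tileSize - 1) / tileSize);
    return tilesX * tilesY * lightCount;
}
```

At 1920x1080 with 1000 lights, `perPixelCullTests` gives 2,073,600,000 tests, while `perTileCullTests` with 16x16 tiles gives 8,160,000, roughly 250 times fewer, before counting any cost from divergent branches.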

