Is Clustered Forward Shading worth implementing?

45 comments, last by Matias Goldberg 11 years, 2 months ago

I'm referring to this: http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf (there is also a video available).

">
The performance of this technique seems to scale perfectly for huge amounts of lights, but at lower counts it performs a little worse than the less advanced tiled culling method. The thing is, has there ever been a case where you need 30 thousand lights in a scene? Plus, won't it get bottlenecked by generating shadow maps for all the lights? (In the YouTube video the lights just pass through the bridge and under it.) Unfortunately I couldn't test its performance, because for some reason the provided demo won't start up (even though I support OpenGL 3 and higher), and I've never done GLSL, so it might take time to get it to work.

I wouldn't implement it unless you're planning on using the types of scenes where it's shown to perform really well.

But clustered-deferred seems to perform better than clustered-forward for scenes with tens of thousands of lights.

The thing is - has there ever been a case where you will need 30 thousand lights in a scene?

There is a global-illumination technique where you use reflective shadow maps to generate thousands of virtual point lights from every regular light source, which could easily create 30K lights in a scene.

Plus,won't it get bottlenecked by generating shadow maps for all the lights

If you need to generate shadow-maps en masse, you could use imperfect shadow maps to render thousands of shadow-maps simultaneously.

One thing I like about clustered is that it becomes cheaper to handle transparent objects. In the case of tiled deferred you have to build two light lists, one that uses the depth buffer for the light culling and one without, so you can have a massive overhead on the transparent pass. With clustered, only one culling pass is necessary.

But again, that depends on the light count and the scene (also, clustered is heavier in terms of memory size, if I'm correct).

Also, at SIGGRAPH Asia there was a presentation about a 2.5D culling technique that you can find here: https://sites.google.com/site/takahiroharada/

Lots of lights + forward shading = sign me up.

Here's the demo: http://www.cse.chalmers.se/~olaolss/get_file.php?filename=clustered_forward_demo.zip. If you don't trust the direct link, you can get it here: http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_clustered_forward_talk

Deferred shading is really unhandy when it comes to anti-aliasing, and lighting transparent objects is not solved in that approach.

Forward shading is the way to go; I expect the next generation of consoles to go back to it. I use a similar approach in my phone engines: I have a view-space-aligned 3D grid (texture) with a 'count' and 'offset' value per voxel, which I use to index into a texture containing the light sources that affect that voxel. The grid creation is done every frame on the CPU. I don't have 30k lights, but I run with anti-aliasing and use the same shader for solid and transparent objects, which is very convenient; I can even sample this texture in the vertex shader to light particles cheaply.

One problem you still have is applying shadows/projectors. It's solvable by having an atlas and storing more data per light source (projection matrix, offsets, extents, etc.), but it adds quite a lot of overhead.

Forward+ is the new rave.

It allows MSAA, transparency, and multiple BRDFs, and most applications end up being faster than tile-based deferred. The only caveat is that if you're vertex-shader bound (or CPU bound), that extra early-Z pass will hurt you. You can avoid it, but then you will have to limit the amount of lights in the scene because you can't depth-cull them (though at least you can still cull them per tile). Also, you'll have to evaluate whether stream out is viable to reuse processed vertices and save CPU & vertex shader work (at the cost of memory & bandwidth).
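For reference, the per-tile culling mentioned above usually boils down to a sphere-vs-tile-frustum test. A minimal sketch, with hypothetical names; real implementations also test against the tile's min/max depth, which is what the Z-pass enables:

```cpp
// A tile frustum is described by four inward-facing side planes
// (normalized, so n.p + d = 0 on the plane).
struct Plane  { float nx, ny, nz, d; };
struct Sphere { float x, y, z, r; };    // light bounding sphere, view space

// A light survives tile culling if its sphere is not fully behind any plane.
bool sphereInsideTile(const Sphere& s, const Plane planes[4])
{
    for (int i = 0; i < 4; ++i) {
        float dist = planes[i].nx * s.x + planes[i].ny * s.y +
                     planes[i].nz * s.z + planes[i].d;
        if (dist < -s.r) return false;  // fully outside this side plane
    }
    return true;                        // intersecting or inside the tile
}
```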

Note that Forward+ (aka Clustered Forward, Light Indexed Deferred) is a very new topic and there's a lot of research coming up this year.

Must reads:

Light Indexed Deferred Rendering, Matt Pettineo, 2012
http://mynameismjp.wordpress.com/2012/03/31/light-indexed-deferred-rendering/

A 2.5D Culling for Forward+, Takahiro Harada (AMD), 2012

https://sites.google.com/site/takahiroharada/storage/2012SA_2.5DCulling.pdf?attredirects=0

Clustered Deferred and Forward Shading, Olsson, Billeter, Assarsson, 2012
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading

The Z-prepass worries me. Does that mean I have to do the tessellation twice as well? (Tessellation already hits my FPS big time.)

You can also try sorting front to back instead; if you are vertex bound, that might give you better results. Another approach is to use occluder objects: you can get 90% of the culling you'd get from a Z-prepass, yet without the cost.
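The front-to-back sort is cheap to implement; a minimal sketch with hypothetical structures, sorting opaque draw calls by the view-space depth of their bounds so early-Z rejects occluded pixels without a full prepass:

```cpp
#include <algorithm>
#include <vector>

struct DrawCall { int meshId; float viewDepth; };  // depth of bounds center

// Nearest objects first, so later (farther) draws fail the depth test early.
void sortFrontToBack(std::vector<DrawCall>& calls)
{
    std::sort(calls.begin(), calls.end(),
              [](const DrawCall& a, const DrawCall& b) {
                  return a.viewDepth < b.viewDepth;
              });
}
```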

But tessellated geometry has another problem: you cover a lot of pixels only partially when AA is enabled, which increases the cost a lot in the pixel shader. Something like POM might scale much better.

How much VRAM do your G-buffers usually take up? GPU-Z tells me that with 8x MSAA mine takes around 350 MB just for position, color, normal, and specular buffers.
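For a rough sanity check of that number: with MSAA every render target stores one value per sample, so the total is width x height x samples x bytes-per-sample. A small helper (the resolution and formats in the note below are illustrative assumptions, not taken from the thread):

```cpp
#include <cstdint>

// Total G-buffer footprint in bytes, assuming every target is sampled at
// the full MSAA rate (no compression), which is the worst case GPU-Z reports.
uint64_t gbufferBytes(uint64_t width, uint64_t height,
                      uint64_t samples, uint64_t bytesPerSample)
{
    return width * height * samples * bytesPerSample;
}
```

Assuming, say, RGBA32F position (16 B), RGBA8 color (4 B), RGBA16F normal (8 B), and RGBA8 specular (4 B), that's 32 bytes per sample; at an assumed 1600x900 with 8x MSAA, `gbufferBytes(1600, 900, 8, 32)` comes out to about 352 MiB, in the reported ballpark. Reconstructing position from the depth buffer instead of storing it explicitly would cut that roughly in half.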

