I'm referring to this: http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf (there is also a video available)
>Is Clustered Forward Shading worth implementing?
I wouldn't implement it unless you're planning to use the kinds of scenes where it has been shown to perform really well.
But clustered-deferred seems to perform better than clustered-forward for scenes with tens of thousands of lights.
The thing is - is there ever a case where you would need 30 thousand lights in a scene?
There is a global-illumination technique where you use reflective shadow maps to generate thousands of virtual point lights from every regular light source, which could easily create 30K lights in a scene.
Plus, won't it get bottlenecked by generating shadow maps for all the lights?
If you need to generate shadow-maps en masse, you could use imperfect shadow maps to render thousands of shadow-maps simultaneously.
One thing I like about clustered shading is that it becomes "cheaper" to handle transparent objects. With tiled deferred you have to build two light lists per tile, one culled against the depth buffer and one culled without it, so you can end up with a massive overhead on the transparent pass. With clustered, only one culling pass is necessary.
But again, that depends on the light count and the scene (clustered is also heavier in terms of memory, if I'm correct).
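To make the "one culling pass" point concrete, here is a minimal Python sketch of how a fragment could be mapped to a cluster. The tile size, near/far planes, slice count and the logarithmic slicing formula are all assumptions picked for illustration, not values taken from the paper. The lookup only needs the fragment's own screen position and view-space depth, which is why opaque and transparent fragments can share the same light lists.

```python
import math

def cluster_index(px, py, view_z, tile_size=64, near=0.1, far=100.0, num_slices=16):
    """Map a pixel position and a positive view-space depth to a cluster (i, j, k).

    All default parameters are hypothetical values chosen for this sketch.
    """
    # Screen-space tile coordinates, same as in plain tiled shading.
    i = int(px) // tile_size
    j = int(py) // tile_size

    # Logarithmic depth slicing (an assumption): slice k covers depths from
    # near * (far / near) ** (k / num_slices) up to the next boundary.
    z = min(max(view_z, near), far)
    k = int(math.log(z / near) / math.log(far / near) * num_slices)
    k = min(k, num_slices - 1)  # depth == far would otherwise land in slice num_slices
    return i, j, k

# An opaque wall and a glass pane in front of it hit the same pixel but
# different clusters, and both use the same precomputed per-cluster lists.
wall = cluster_index(130, 70, 20.0)   # (2, 1, 12)
glass = cluster_index(130, 70, 1.0)   # (2, 1, 5)
```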
Also, at SIGGRAPH Asia there was a presentation about a 2.5D culling technique, which you can find here: https://sites.google.com/site/takahiroharada/
Here's the demo: http://www.cse.chalmers.se/~olaolss/get_file.php?filename=clustered_forward_demo.zip If you don't trust the direct link, you can get it here: http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_clustered_forward_talk
Deferred shading is really awkward when it comes to anti-aliasing, and it doesn't solve lighting of transparent objects.
Forward shading is the way to go; I expect the next-generation consoles to go back to it. I use a similar approach in my phone engines: I have a view-space-aligned 3D grid (a texture) that stores a 'count' and an 'offset' value per voxel, which I use to index into a texture containing the light sources that affect that voxel. The grid is rebuilt every frame on the CPU. I don't have 30k lights, but I run with antialiasing and I use the same shader for solid and transparent objects. It's very convenient to use; I can even bind this texture in the vertex shader to light particles cheaply.
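For anyone who wants to picture the count/offset layout, here is a small CPU-side Python sketch of how such a grid could be built and read back. It is only an illustration under my own assumptions: lights are given as integer voxel bounding boxes, and plain lists stand in for the two textures. The function names are hypothetical.

```python
def build_light_grid(grid_dims, light_bounds):
    """Build per-voxel (offset, count) pairs plus one flat light index list.

    light_bounds: list of (min_cell, max_cell) inclusive integer boxes in
    grid space, one per light (an assumed input format for this sketch).
    """
    nx, ny, nz = grid_dims
    per_voxel = [[] for _ in range(nx * ny * nz)]
    for light_id, (lo, hi) in enumerate(light_bounds):
        # Clamp the light's box to the grid and visit every voxel it covers.
        for z in range(max(lo[2], 0), min(hi[2], nz - 1) + 1):
            for y in range(max(lo[1], 0), min(hi[1], ny - 1) + 1):
                for x in range(max(lo[0], 0), min(hi[0], nx - 1) + 1):
                    per_voxel[(z * ny + y) * nx + x].append(light_id)

    grid = []           # stands in for the 3D grid texture
    light_indices = []  # stands in for the light index texture
    for ids in per_voxel:
        grid.append((len(light_indices), len(ids)))
        light_indices.extend(ids)
    return grid, light_indices

def lights_for_voxel(grid, light_indices, grid_dims, x, y, z):
    """What the shader would do: read (offset, count), then walk the index list."""
    nx, ny, _ = grid_dims
    offset, count = grid[(z * ny + y) * nx + x]
    return light_indices[offset:offset + count]

dims = (2, 2, 1)
bounds = [((0, 0, 0), (1, 0, 0)),   # light 0 covers the bottom row
          ((1, 0, 0), (1, 1, 0))]   # light 1 covers the right column
grid, indices = build_light_grid(dims, bounds)
shared = lights_for_voxel(grid, indices, dims, 1, 0, 0)  # [0, 1]
```

Solid and transparent surfaces (and particles in the vertex shader) can all use the same `lights_for_voxel` style lookup, since it depends only on the position being shaded.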
One problem you still have is applying shadows/projectors. It's solvable by having an atlas and storing more data per light source (projection matrix, offsets, extents, etc.), but it adds quite a lot of overhead.
Forward+ is the new rave.
It allows MSAA, transparency, and multiple BRDFs, and most applications end up being faster than tile-based deferred. The only caveat is that if you're vertex-shader bound (or CPU bound), that extra early-Z pass will hurt you. You can avoid it, but then you will have to limit the amount of lights in the scene because you can't depth-cull them (but at least you can still cull them per tile). You'll also have to evaluate whether stream out is viable to reuse processed vertices and save CPU and vertex shader work (at the cost of memory and bandwidth).
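Here is a rough Python sketch of the trade-off described above, assuming each light has already been reduced to a screen-space circle plus a view-space depth interval (that representation and the function name are my assumptions). With the depth range from the early-Z pass the tile can reject lights in front of or behind its geometry; without it, only the 2D per-tile test remains.

```python
def cull_lights_for_tile(tile_rect, lights, depth_range=None):
    """Return indices of lights that may affect a tile.

    tile_rect:   (x0, y0, x1, y1) in pixels.
    lights:      list of (cx, cy, radius, z_near, z_far) tuples, i.e. a
                 screen-space circle and a view-space depth interval.
    depth_range: (min_depth, max_depth) of the tile from the Z-prepass,
                 or None if no prepass was rendered.
    """
    x0, y0, x1, y1 = tile_rect
    visible = []
    for index, (cx, cy, radius, z_near, z_far) in enumerate(lights):
        # 2D test: distance from the circle centre to the closest point of the tile.
        nearest_x = min(max(cx, x0), x1)
        nearest_y = min(max(cy, y0), y1)
        if (cx - nearest_x) ** 2 + (cy - nearest_y) ** 2 > radius ** 2:
            continue
        # Depth test: only possible when the prepass gave us the tile's depth bounds.
        if depth_range is not None:
            tile_min, tile_max = depth_range
            if z_far < tile_min or z_near > tile_max:
                continue
        visible.append(index)
    return visible

tile = (0, 0, 16, 16)
lights = [
    (8, 8, 4, 1.0, 2.0),      # overlaps the tile, close to its geometry
    (8, 8, 4, 50.0, 60.0),    # overlaps on screen but far behind the geometry
    (100, 100, 5, 1.0, 2.0),  # elsewhere on screen
]
with_prepass = cull_lights_for_tile(tile, lights, depth_range=(0.5, 3.0))  # [0]
without_prepass = cull_lights_for_tile(tile, lights)                       # [0, 1]
```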
Note that Forward+ (closely related to Clustered Forward and Light Indexed Deferred) is a very new topic and there's a lot of research coming up this year.
Must reads:
Light Indexed Deferred Rendering, Matt Pettineo, 2012
http://mynameismjp.wordpress.com/2012/03/31/light-indexed-deferred-rendering/
A 2.5D Culling for Forward+, Takahiro Harada (AMD), 2012
https://sites.google.com/site/takahiroharada/storage/2012SA_2.5DCulling.pdf?attredirects=0
Clustered Deferred and Forward Shading, Olsson, Billeter, Assarsson, 2012
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading
The Z-prepass worries me. Does that mean I have to do the tessellation twice as well? (Tessellation already hits my FPS big time.)
But tessellated geometry has another problem: with AA enabled you cover a lot of pixels only partially, which increases the pixel shader cost a lot. Something like POM might scale way better.