I understand that clustered forward is of course an optimization over non-clustered forward. But I fail to see how you can avoid having to draw some geometry (not all, but some... occasionally) twice or more. Unless your max_lights is huge (but then, if it's huge, why do you need clustered at all, just render the whole scene in one pass with 1,000 active lights!) or objects and lights always fit snugly into a single cluster, or lights are perfectly evenly distributed across the screen so there are very few overlaps (never exceeding max_lights), I believe you have no other choice than to render at least some objects twice, every now and then. How would it otherwise work? You have to render them somehow, and there's a limit on how many lights you can do at once.
Yeah, it's hard to comprehend at first, I tried multiple times before I finally (think I) got it :)
The thing is, you put all visible lights in one big list. This can be a cbuffer, a texture, etc. According to their paper, this is apparently enough to hold all the lights they needed (2xfloat2 for point, 2xfloat3 for spotlight).
Then, you compile a list of the lights that affect each cluster. You first have a lookup list (again, texture or cbuffer) with one (Offset, NumPoints, NumSpots) entry per cluster. The offset points into another buffer, which contains indices into the light list.
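To make the two buffers a bit more concrete, here's a rough CPU-side sketch in Python (not from the paper, just how I picture it). It assumes you already know, for each cluster, which point and spot lights touch it; the function name `build_cluster_tables` and the convention of storing point indices before spot indices are my own assumptions.

```python
from typing import List, Tuple

def build_cluster_tables(
    point_lists: List[List[int]],
    spot_lists: List[List[int]],
) -> Tuple[List[Tuple[int, int, int]], List[int]]:
    """Flatten per-cluster light lists into a lookup table and an index buffer."""
    lookup = []   # one (offset, num_points, num_spots) entry per cluster
    indices = []  # light indices of all clusters, stored back to back
    for points, spots in zip(point_lists, spot_lists):
        # The offset is where this cluster's run starts in the index buffer.
        lookup.append((len(indices), len(points), len(spots)))
        indices.extend(points)  # assumed layout: point-light indices first
        indices.extend(spots)   # ...followed by spot-light indices
    return lookup, indices

# Three clusters; cluster 1 is not touched by any light.
lookup, indices = build_cluster_tables(
    point_lists=[[0, 2], [], [2]],
    spot_lists=[[1], [], [0, 1]],
)
print(lookup)   # [(0, 2, 1), (3, 0, 0), (3, 1, 2)]
print(indices)  # [0, 2, 1, 2, 0, 1]
```

An empty cluster costs only its lookup entry, and the index buffer grows with the total number of light/cluster overlaps rather than with clusters times max lights.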
Then, for each fragment, you calculate which cluster it is in and do the lookup. This is a double indirection (lookup entry to index buffer, index buffer to light list), but apparently fast enough. It works the same for forward and deferred; you just have a different source for the position and z-coordinate used to calculate the cluster.
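The per-fragment side could look roughly like this, written as plain Python rather than shader code. The tile size, grid dimensions, near/far planes and the exponential depth slicing are all assumptions I picked for illustration, and `cluster_index` / `lights_for_fragment` are made-up names.

```python
import math

TILE = 64                          # assumed screen tile size in pixels
GRID_X, GRID_Y, GRID_Z = 4, 2, 8   # assumed cluster grid dimensions
NEAR, FAR = 0.1, 100.0             # assumed view-space depth range

def cluster_index(px: float, py: float, view_z: float) -> int:
    """Map a fragment's pixel position and positive view-space depth to a cluster."""
    cx = min(int(px) // TILE, GRID_X - 1)
    cy = min(int(py) // TILE, GRID_Y - 1)
    # Assumed exponential slicing: slices get thicker with distance.
    t = math.log(view_z / NEAR) / math.log(FAR / NEAR)
    cz = min(max(math.floor(t * GRID_Z), 0), GRID_Z - 1)
    return (cz * GRID_Y + cy) * GRID_X + cx

def lights_for_fragment(px, py, view_z, lookup, indices):
    """First indirection: cluster -> (offset, counts). Second: index buffer -> lights."""
    offset, num_points, num_spots = lookup[cluster_index(px, py, view_z)]
    points = indices[offset : offset + num_points]
    spots = indices[offset + num_points : offset + num_points + num_spots]
    return points, spots

# Example: every cluster is empty except the one our fragment falls into.
example_lookup = [(0, 0, 0)] * (GRID_X * GRID_Y * GRID_Z)
example_lookup[cluster_index(130, 70, 1.0)] = (0, 2, 1)
example_indices = [0, 2, 1]
print(lights_for_fragment(130, 70, 1.0, example_lookup, example_indices))
```

In a real shader the returned indices would then be used to fetch the actual light data from the big light list and accumulate the shading in a loop, all within the same pass.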
So you can have a very large number of lights, limited in principle only by memory, though in practice they cap it at a fixed amount determined by profiling a worst-case scenario. Still no need to render anything twice, even with forward rendering, since each fragment just loops over its own cluster's list in a single pass.
@AliasBinman:
Ah, thanks, that was about what I wanted to hear. So I think I'll just stick with deferred rendering for now, and maybe implement forward rendering at some point to profile and compare, especially since the two seem fairly easy to swap out with this technique.
@Frenetic Pony:
Thanks, sounds really awesome, I'll have a look at it and see how hard it would be to implement alongside. I'm a bit short on time for now, since I'm doing this alongside my master's thesis at university, and one thing I liked about clustered shading is that, once you comprehend it, it seems really easy to implement. So I'll try to keep things clean and extensible, and apply such highly advanced techniques at some point, eventually :)