Lighting rendering architecture doubt

14 comments, last by AndyTX 17 years, 4 months ago
Let me preface this by saying that I have implemented and tested all of the systems described here, so I'm not just making this up. I could potentially be very wrong - in particular with respect to your specific application - but I honestly am trying to help, not justify some design decision that I've made.

In particular, I've written a fairly optimized forward renderer that generates shader permutations on the fly (from light and surface shaders), selecting and sorting based on BVH intersections: Old Screenshot (you get the idea).

Quote:Original post by Woodchuck
I'm talking about a Blinn shader that does NdotL, pow(NdotH, gloss), normal mapping and often parallax mapping. Not an old lightmap or plain NdotL shader.

What made you think that I was talking about anything else? I even consider that to be a pretty basic lighting system myself, but I could be out of touch ;)
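For reference, this is roughly the kind of per-pixel Blinn lighting I assume we're both talking about (a minimal HLSL sketch for illustration only - the texture/parameter names are made up and the tangent-space setup in the vertex shader is omitted):

// Per-pixel Blinn with normal mapping (illustrative SM 3.0-style sketch).
sampler2D DiffuseMap;
sampler2D NormalMap;

float4 BlinnPS(float2 uv       : TEXCOORD0,
               float3 lightDir : TEXCOORD1,   // tangent-space light direction
               float3 viewDir  : TEXCOORD2,   // tangent-space view direction
               uniform float3 lightColor,
               uniform float  gloss) : COLOR
{
    float3 N = normalize(tex2D(NormalMap, uv).xyz * 2 - 1);  // unpack normal map
    float3 L = normalize(lightDir);
    float3 V = normalize(viewDir);
    float3 H = normalize(L + V);                              // Blinn half-vector

    float NdotL = saturate(dot(N, L));
    float NdotH = saturate(dot(N, H));

    float3 albedo = tex2D(DiffuseMap, uv).rgb;
    return float4((albedo + pow(NdotH, gloss)) * NdotL * lightColor, 1);
}

Every additional light repeats essentially all of that per-pixel work, which is why the granularity of "which pixels does this light actually touch" matters so much.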


Quote:Original post by Woodchuck
If you want this, just reduce your geometry polygon count.

You shouldn't have to reduce your polygon count... there's no reason why lighting and geometric complexity have to be related by O(G*L). Deferred rendering provides an elegant solution to that!

Quote:Original post by Woodchuck
Normal mapping and parallax are made to replace polygons no?

Not entirely, no. They're useful for cases where the lighting model and some basic parallax are the only discernible effect of complex geometry, but useless in other cases that actually need to model proper occlusion.

These sorts of techniques help us get back to a system wherein rasterization is a win (i.e. polygons larger than a pixel), but they do *not* solve the global scene complexity problem - far from it. Note that polygon counts are *still* increasing even in new games that make use of these sorts of shaders.

Quote:Original post by Woodchuck
In other terms, I think there are fewer 'overlapped' pixels than pixels that are shaded for nothing for a given light.

I still don't know what you're saying, or really how it relates to the conversation, sorry :(

If you're talking about shading pixels that will eventually be occluded, that's a total non-issue in any of these techniques. Using occlusion queries, a pre-Z pass, and/or deferred rendering it is easy to solve this problem. What I'm talking about is the granularity of "light contribution" calculations. It simply cannot be tight enough when done on the CPU.

Imagine a large triangle that fills the whole screen but yet only a small piece of it is affected by a light (these are the sorts of cases that you're suggesting will become common with normal mapping, etc). There's no way to avoid wasting a lot of power computing lighting for the whole triangle per-pixel, when only a small portion of it will actually get lit. Deferred rendering solves this.

Note that you can't even compute which *triangles* might be affected by a light since that's too expensive. The best one can reasonably do is use object bounding volumes, which is an extremely coarse way to go, and *still* eats CPU power. Deferred shading solves this per-pixel and is cheap...
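To make the deferred side of that concrete: you rasterize the light's bounding volume, so only the covered pixels fetch the G-buffer and do the math. A rough HLSL sketch (the G-buffer layout, position reconstruction and falloff here are purely illustrative, not anyone's actual engine code):

// Deferred point-light pass, drawn as the light's bounding volume so that
// only pixels the light can possibly affect ever run this shader.
sampler2D GBufferDepth;    // linear view-space depth
sampler2D GBufferNormal;   // view-space normal packed to [0,1]
sampler2D GBufferAlbedo;

float4 DeferredPointLightPS(float2 screenUV : TEXCOORD0,
                            float3 viewRay  : TEXCOORD1,    // for position reconstruction
                            uniform float3 lightPos,        // view-space light position
                            uniform float3 lightColor,
                            uniform float  lightRadius) : COLOR
{
    float  depth  = tex2D(GBufferDepth, screenUV).r;
    float3 pos    = viewRay * depth;                         // reconstruct view-space position
    float3 N      = tex2D(GBufferNormal, screenUV).xyz * 2 - 1;
    float3 albedo = tex2D(GBufferAlbedo, screenUV).rgb;

    float3 toLight = lightPos - pos;
    float  dist    = length(toLight);
    float  atten   = saturate(1 - dist / lightRadius);       // simple falloff for illustration
    float  NdotL   = saturate(dot(N, toLight / dist));

    return float4(albedo * lightColor * NdotL * atten, 1);   // additively blended per light
}

The cost scales with the screen-space area the light actually covers, regardless of which triangles happen to sit underneath it.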

Quote:Original post by Woodchuck
Mmmh, for a draw call, you have a static number of SetTexture calls and other things like that. That's why draw-call-only considerations are better than just optimizing one or two SetTexture calls sometimes between two draw calls ;)

Don't know what you're trying to say here... all I'm saying is that *any* state change breaks batching. So having a different shader for every object (or even triangle) in your scene to handle lots of different lighting situations will ruin all batching.

Quote:Original post by Schrompf
[edit]I see, there's a conflict. To be more precise: vertex processing or rasterization were never a problem *for us*. NVPerfHUD showed that pixel processing is the limiting factor.[/edit]

Great - although I'm sure you can see how they could easily become so for a more complex scene using a single light per pass (as I was discussing).

This is one reason why I think that "multiple lights per pass" and deferred rendering are really the only two reasonable options moving forward. As hardware becomes more capable (and it's already quite usable) and scenes become more complex, I suspect that deferred rendering will continue to look better and better in comparison.
Deferred Shading might be a good solution in the future. At the moment, the basic costs of maintaining several high-precision render targets and sampling from them again and again are too high. You have to throw a LOT of lights at it to have it look better than a forward renderer.

It's a bit off topic, I think. The discussion was SinglePassPerLight vs. MultipleLightsPerPass. I'm back home now and I've done some measurements. I'd like to share the results with you, although the scenario might not suit your application. The SinglePassPerLight tests were done simply by breaking out of the light accumulation loop after the first light.
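To be clear about what I mean by the light accumulation loop, it's essentially this (a simplified HLSL sketch - the real shaders also do the shadow mapping and displacement mapping described below, and the constant names are just for illustration):

// Forward shader accumulating all lights in one pass (simplified sketch).
sampler2D DiffuseMap;
float3 LightPos[3];     // illustrative: three point lights
float3 LightColor[3];

float4 AccumulateLightsPS(float2 uv       : TEXCOORD0,
                          float3 worldPos : TEXCOORD1,
                          float3 normal   : TEXCOORD2) : COLOR
{
    float3 N      = normalize(normal);
    float3 albedo = tex2D(DiffuseMap, uv).rgb;
    float3 result = 0;

    for (int i = 0; i < 3; ++i)
    {
        // The SinglePassPerLight measurements simply break out of this
        // loop after the first light and render the geometry once per
        // light instead, blending the passes additively.
        float3 L = normalize(LightPos[i] - worldPos);
        result  += albedo * LightColor[i] * saturate(dot(N, L));
    }

    return float4(result, 1);
}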

This is the test case: [test scenario screenshot]

Three light setups were tested:

A) 3 point lights with shadows (Shadow Mapping, 8xPCF)
B) 3 point lights without shadows
C) 3 point lights without Shadows + global sun light with shadows

All lights are computed per pixel; the terrain uses alpha blending to combine multiple materials, several of which use three-step virtual displacement mapping.

Single pass per light:

A) 52 fps B) 129 fps C) 70 fps

All lights in a single pass:

A) 60 fps B) 205 fps C) 106 fps

The result: Combining multiple lights into a single pass yields a performance win of 20% to 50% in this scenario. Light setup A) shows that the performance win decreases with increasing shader complexity per light. Part of the performance loss is also due to the fact that the scene has to be rendered 6 times per light.

On cards with dynamic branching you could skip all shadow calculations in the PS once you notice the attenuation is 0. This can be done for all three solutions (single pass, multiple pass, deferred). Dynamic branching is necessary, though, as a texkill() or clip() still executes the shader to the end before discarding the pixel. You might also set up some funky Stencil Buffer magic, but I think you won't gain much because of the additional overhead of marking the pixels. Haven't tried it, though.
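As a sketch of what that branch looks like (SM 3.0-style HLSL; ComputeShadow() is only a placeholder for the real 8xPCF lookup):

// Placeholder for the actual 8xPCF shadow map lookup.
float ComputeShadow(float3 worldPos)
{
    return 1.0f;
}

float3 ShadowedPointLight(float3 worldPos, float3 N,
                          float3 lightPos, float3 lightColor, float lightRadius)
{
    float3 toLight = lightPos - worldPos;
    float  dist    = length(toLight);
    float  atten   = saturate(1 - dist / lightRadius);
    float3 result  = 0;

    // With real dynamic branching the shadow lookups are skipped entirely
    // when the light can't reach the pixel; clip()/texkill would still
    // execute them and only throw the result away afterwards.
    [branch]
    if (atten > 0)
    {
        float shadow = ComputeShadow(worldPos);
        float NdotL  = saturate(dot(N, toLight / dist));
        result = lightColor * NdotL * atten * shadow;
    }

    return result;
}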

[edit] Image url corrected

Bye, Thomas
----------
Gonna try that "Indie" stuff I keep hearing about. Let's start with Splatter.
Quote:Original post by AndyTX
rerasterizing everything is still going to be expensive, even with a pre-Z pass.


While most of your points are valid (even though I don't necessarily agree with the end solution, as you know [wink]), I've got a fairly big issue with that point. Right now, early-Z culling on almost all recent hardware is simply FANTASTIC, and only getting better (e.g. G80's new system), and it can dump a very large amount of the pixel crunching caused by overdraw when a multipass solution is used.

Also, I think you're forgetting that with non-DR solutions, the use of dynamic branching (or even texkill, depending on how fast GPUs are at it, something I'm not knowledgeable about at all) can give even nicer results in scenarios like your "large triangle that fills the whole screen but yet only a small piece of it is affected by a light".

Personally, while I haven't done much research on it, the multi-pass solution is reasonably appealing to me for more than a couple of reasons. The big one is shadowing. It seems to me that when using high-res shadow maps (and I'm talking about resolutions that actually give results worth a damn in terms of final quality, i.e. more than 1k*1k), doing a full omni light, let alone MULTIPLE ones, in a single pass is not at all a viable option, despite the risk of greater CPU limitations.

Quote:Great - although I'm sure you can see how they could easily become so for a more complex scene using a single light per pass (as I was discussing).


Keep in mind that for HIS scenario it's a pixel limitation. In the general case, yeah, you can make any scene geometry-limited, but you have to remember that deferred rendering takes the current fillrate of your scene (even with a pre-Z pass) and slams it in the kneecaps with a crowbar because of the cost involved in the initial generation of the G-buffers. I don't mean any offense, but I think that your academic-centric thinking (i.e. "THE FINAL SOLUTION MUST WORK WELL EVEN IN THE WORST AND MOST BORDERLINE UNIMAGINABLE SCENARIO!!") regarding this point simply doesn't apply as well given his current scenario.
Thanks for the numbers, Schrompf - they are roughly what I've seen as well. In particular, multiple passes never win, unless your scene has almost no geometry (like 1000 polygons or less). Of course a scene with a different shader for every (small) object could be constructed that would punish this system, but I think it's quite reasonable for most games.

Quote:Original post by Schrompf
This can be done for all three solutions (single pass, multiple pass, deferred)

Note however that deferred rendering just uses the depth buffer for this, which tends to be even more efficient than DB, as Cypher19 notes.


Quote:Original post by Cypher19
Right now, early-Z culling on almost all recent hardware is simply FANTASTIC, and only getting better (e.g. G80's new system), and it can dump a very large amount of the pixel crunching caused by overdraw when a multipass solution is used.

Oh yeah, z-cull is amazingly fast. What I was trying to say is that even with StreamOut, I can't see re-rendering/rasterizing the whole scene being nearly as efficient as deferred rendering. Hence I don't think multi-pass lighting is a reasonable solution for complex scenes... I think Schrompf's results agree with mine here.

Quote:Original post by Cypher19
Also, I think you're forgetting that with non-DR solutions, the use of dynamic branching can give even nicer results in scenarios like your "large triangle that fills the whole screen but yet only a small piece of it is affected by a light".

Sure, but at the very least you still have to compute the attenuation function for every light, which is less efficient than simply rasterizing light volumes (with z-cull for stencil-shadow-like culling). And as you mentioned, DB is still not as efficient as z-cull ;) You also avoid the need for CPU sorting and grouping of the lights and geometry, which is a big win in and of itself for complex scenes.

Quote:Original post by Cypher19
The big one is shadowing. It seems to me that when using high-res shadow maps (and I'm talking about resolutions that actually give results worth a damn in terms of final quality, i.e. more than 1k*1k), doing a full omni light, let alone MULTIPLE ones, in a single pass is not at all a viable option

Firstly, it's quite reasonable to use several huge textures in a shader, so single-pass forward rendering with shadows is still perfectly workable.

Secondly, deferred rendering has the same benefits as multiple-passes-per-light here in that you only need to keep one shadow map around at a time :)


Quote:Original post by Cypher19
Keep in mind that for HIS scenario it's a pixel limitation. In the general case, yeah, you can make any scene geometry-limited, but you have to remember that deferred rendering takes the current fillrate of your scene (even with a pre-Z pass) and slams it in the kneecaps with a crowbar because of the cost involved in the initial generation of the G-buffers.

It adds a *constant factor* to your fill requirements though, since you're just writing more data. If you're worried about overdraw in the G-buffer, do a pre-Z pass before building your buffers.

In my experience, this is the cheapest part of deferred rendering, even with complex scenes. Even 6000 series cards rip through this part... 7000 do it even better and on G80 it's a total non-issue. In fact the latency of these reads can be almost entirely hidden by the lighting math on the G80 since the texture coordinates are extremely coherent and known before PS execution even begins.
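For what it's worth, the G-buffer fill itself is just a flat write to a few render targets, something along these lines (an illustrative sketch - the actual layout and encoding are up to you):

// G-buffer fill pass writing to multiple render targets at once.
sampler2D DiffuseMap;

struct GBufferOutput
{
    float4 Albedo : COLOR0;   // RT0: surface albedo
    float4 Normal : COLOR1;   // RT1: normal packed to [0,1]
    float4 Depth  : COLOR2;   // RT2: linear depth
};

GBufferOutput FillGBufferPS(float2 uv     : TEXCOORD0,
                            float3 normal : TEXCOORD1,
                            float  depth  : TEXCOORD2)
{
    GBufferOutput o;
    o.Albedo = tex2D(DiffuseMap, uv);
    o.Normal = float4(normalize(normal) * 0.5 + 0.5, 0);
    o.Depth  = float4(depth, 0, 0, 0);
    return o;
}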

Quote:Original post by Cypher19
I don't mean any offense, but I think that your academic-centric thinking [...] regarding this point simply doesn't apply as well given his current scenario.

Guilty as charged :) However I did mention several times that even the most inefficient implementation will work fine with a small workload. Certainly design your engine for a given workload, but the cases in which deferred rendering loses are becoming fewer and fewer. Plus, it's a lot easier to implement than an efficient forward renderer :)
Quote:Original post by AndyTX
I hear the G80/D3D10 have a way to do a custom multi-sample resolve though, which would allow MSAA to work with deferred rendering...


Do you have a link for this?
Quote:Original post by Anonymous Poster
Do you have a link for this?

B3D Thread

