ignotion

Lighting rendering architecture doubt


Good afternoon. I'm implementing a shader generator for my engine: given a material, I export a custom shader. That's the "easy" step. The hard one is lighting. I will support four kinds of light: omni, spot, directional and projector. I've thought of two possibilities when it comes to adding lighting to the shaders.

1st, shader explosion: I generate N shaders for each material (where N can be a very big number), each shader handling one case of illumination. Counting that I'm planning to support several lights, you can see combinations like:
- 1 omni
- 1 omni + 1 spot
- 1 omni + 1 directional
- ...
- 2 omnis + 2 spots + 1 projector
- ... and so on.
Using this method I will generate tons of shaders, and depending on the state of the object I would use one shader or another (i.e. object affected by 1 light vs. object affected by 5 lights).

2nd, multipass: I could easily group the lights into different passes, so I could have something like:
- 1st pass: draw all geometry with ambient + diffuse texture
- 2nd pass: draw all geometry affected by omni lights (using the omni shaders)
- 3rd pass: draw all geometry affected by spot lights (using the spot shaders)
- 4th pass: draw all geometry affected by directional lights (using the directional shaders)
- 5th pass: draw all geometry affected by projector lights (using the projector shaders)

What do you think? I'm a bit lost and I really need help with this.

Thanks in advance,
Toni
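(For reference, the multipass option maps onto a frame loop like the following - a minimal D3D9-style sketch. The Mesh/Light types and Draw helpers are invented for illustration; they are not from the engine described above.)

```cpp
#include <d3d9.h>
#include <vector>

// Hypothetical scene types, for illustration only.
struct Light;
struct Mesh {
    void DrawAmbient(IDirect3DDevice9* dev);              // ambient + diffuse texture
    void DrawLit(IDirect3DDevice9* dev, const Light& l);  // binds the per-type light shader
};

void RenderFrame(IDirect3DDevice9* dev,
                 const std::vector<Mesh*>& meshes,
                 const std::vector<Light*>& lights)   // grouped/sorted by light type
{
    // Pass 1: ambient + diffuse; this also fills the depth buffer.
    dev->SetRenderState(D3DRS_ALPHABLENDENABLE, FALSE);
    for (size_t m = 0; m < meshes.size(); ++m)
        meshes[m]->DrawAmbient(dev);

    // Passes 2..N: add each light's contribution on top.
    dev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    dev->SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_ONE);   // additive blending
    dev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
    dev->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);       // depth already laid down
    dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);

    for (size_t l = 0; l < lights.size(); ++l)
        for (size_t m = 0; m < meshes.size(); ++m)
            // "affected by" would be a coarse bounding-volume test, not shown.
            meshes[m]->DrawLit(dev, *lights[l]);
}
```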

The multipass method is a good choice. Moreover, I'd advise you to consider one pass per light, for two reasons:

1) you will not have a maximum limit on the light count per object;
2) you can win a lot of performance by limiting the shader's influence to the light's attenuation range (by using user-defined clip planes, for example - see the sketch below).
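(A minimal sketch of the clip-plane trick for an omni light, using D3D9/D3DX. It assumes the device reports at least six user clip planes; with the programmable pipeline, D3D9 expects the planes in clip space, hence the inverse-transpose transform.)

```cpp
#include <d3d9.h>
#include <d3dx9.h>

// Restrict an additive light pass to an axis-aligned box around the light's
// attenuation radius, so pixels outside the lit region are clipped away
// before the (expensive) pixel shader runs.
void ClipToOmniLight(IDirect3DDevice9* dev,
                     const D3DXVECTOR3& lightPos, float radius,
                     const D3DXMATRIX& viewProj)
{
    // Planes transform by the inverse transpose of the point transform.
    D3DXMATRIX inv, invT;
    D3DXMatrixInverse(&inv, NULL, &viewProj);
    D3DXMatrixTranspose(&invT, &inv);

    // Six inward-facing world-space planes: a point p is kept when
    // dot(plane.abc, p) + plane.d >= 0 for every enabled plane.
    D3DXPLANE planes[6] = {
        D3DXPLANE( 1, 0, 0, -(lightPos.x - radius)),   // x >= x0 - r
        D3DXPLANE(-1, 0, 0,  (lightPos.x + radius)),   // x <= x0 + r
        D3DXPLANE( 0, 1, 0, -(lightPos.y - radius)),
        D3DXPLANE( 0,-1, 0,  (lightPos.y + radius)),
        D3DXPLANE( 0, 0, 1, -(lightPos.z - radius)),
        D3DXPLANE( 0, 0,-1,  (lightPos.z + radius)),
    };

    for (DWORD i = 0; i < 6; ++i)
    {
        D3DXPLANE clipSpace;
        D3DXPlaneTransform(&clipSpace, &planes[i], &invT);
        dev->SetClipPlane(i, &clipSpace.a);
    }
    dev->SetRenderState(D3DRS_CLIPPLANEENABLE, 0x3F);  // enable planes 0..5
}
```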

Guest Anonymous Poster
At my company, we tried the multi-pass approach and found it was too slow in practice, even with all sorts of culling going on. We now use a single-pass approach with a fixed set of lights per shader.

ignotion: I've been having the exact same thoughts as you recently. However, I'm not sure you need as many shaders as you may think. To take your example: if you have a shader that can handle 1 omni, 1 directional and 1 spot, you don't need a separate shader that handles just 1 omni - you can simply turn off the two other lights. I realize this might not be the best solution performance-wise, but as long as we are talking about simple combinations of primitive lights, I don't believe it is a problem. You can save a lot of shaders at very little cost, as these shaders will not be the bottlenecks anyway. You may still need quite a few shaders, but instead of 64 per special effect (4 omni * 4 directional * 4 spot = 64) you might do with four or five. (See the sketch below.)
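(A tiny sketch of the "turn off the extra lights" idea. The GpuLight layout is invented; the point is just that a zeroed light contributes nothing, so a fixed-slot shader also covers the smaller combinations.)

```cpp
#include <vector>

// Hypothetical per-light constants as they would be uploaded to the shader.
struct GpuLight { float color[3]; float position[3]; float attenuation; };

// Pad the active lights up to the shader's fixed slot count. A black light
// adds zero to the lighting sum, so a "3 light" shader handles 1 or 2 lights.
std::vector<GpuLight> PadLights(std::vector<GpuLight> lights, size_t slots)
{
    GpuLight black = {};            // zero color (and everything else)
    lights.resize(slots, black);
    return lights;
}
```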

However, if you plan to start using shadow mapping (or something similar) it is probably a good idea to do a render pass per light from the beginning.

Perhaps a hybrid method could be a good idea: one render pass with all the "simple" lights, followed by one render pass per shadow-casting light. That might not be such a bad idea.

Best regards,
Per Rasmussen.

I like these threads [smile]

I suppose it depends on your API of choice, mine being D3D, but I assume similar is possible with OpenGL. You can minimize the combinatorial explosion of the shader code you write by using preprocessor tricks - compile-time dead-code elimination or old-school #define magic.

By writing one big, generic shader controlled by compile-time constants you can make your life a lot easier without impacting the eventual performance of your application. Well, not in my experience anyway [wink] (A sketch of the idea follows.)
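(A minimal sketch of compiling one übershader into permutations via preprocessor defines. The NUM_OMNI/NUM_SPOT macro names are invented; the HLSL source is assumed to loop with e.g. "for (int i = 0; i < NUM_OMNI; ++i)", which the compiler unrolls and dead-strips at compile time.)

```cpp
#include <d3d9.h>
#include <d3dx9.h>
#include <stdio.h>

// Compile one permutation of a generic lighting shader. Returns the
// compiled bytecode, or NULL on failure (errors printed to stdout).
LPD3DXBUFFER CompilePermutation(const char* hlslSrc, UINT srcLen,
                                int numOmni, int numSpot)
{
    char omni[8], spot[8];
    sprintf(omni, "%d", numOmni);
    sprintf(spot, "%d", numSpot);

    D3DXMACRO defines[] = {
        { "NUM_OMNI", omni },
        { "NUM_SPOT", spot },
        { NULL, NULL }                       // list terminator
    };

    LPD3DXBUFFER code = NULL, errors = NULL;
    HRESULT hr = D3DXCompileShader(hlslSrc, srcLen, defines, NULL,
                                   "main", "ps_3_0", 0,
                                   &code, &errors, NULL);
    if (FAILED(hr) && errors)
        printf("%s\n", (const char*)errors->GetBufferPointer());
    if (errors) errors->Release();
    return code;    // caller owns the buffer
}
```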

As for the multi-pass versus "uber shader" I'd highly suggest a compile-time approach. Build/cache your shaders according to the underlying hardware - older hardware might get 1 light per pass (FFP/SM1), mid-range (SM2) might get 2-3 lights per pass and decent hardware (SM3/SM4) might get many more. You should be able to design your code so that you can run-time scale between these.

As for "shader explosion" - you could try running all of the combinations through an automated build process. The compiled binaries for shaders aren't too big so you can ship 100's or 1000's with your game without too much trouble. That obviously means maintaining many many shaders at runtime - but aggressive use of profiling should reveal this as your problem fairly early on. You can then employ the many state-management algorithms to try and reduce the cost or possibly look into rolling similar shaders into dynamically-branched shaders. The latter change is trivial once you have the initial framework set up so costs little to experiment with.

As for the multi-pass approach - definitely a good idea, and it should be fairly dynamic (as described earlier). I wouldn't hard-code the passes though; that seems like an unnecessary limitation on hardware capable of branching/looping.

Quote:
we tried the multi-pass approach and found it was too slow in practice
I can understand this, but I would counter with the fact that laying down the Z first (and getting the ambient term for free at the same time) can be a definite advantage of multi-pass. I've seen a fairly healthy speed-up from tactically filling the depth buffer before going on to do more complex rendering.

Which brings me onto a closing suggestion - have you considered 'deferred rendering'? I don't have a huge amount of experience with it myself, but from what I've played with, it seems like a good approach if you think the lighting phase of your renderer is going to be costly.

hth
Jack

Yeah it's a tough problem. A few options:

1) Pull a Far Cry and generate literally thousands of shaders. These are pretty trivial to generate (using the preprocessor, or better yet something like Libsh). The problem is that the more you do this, the more you hose any chance of batching. It's a tradeoff wherein you're never going to be even near 100% efficient (always bottlenecked on either the CPU or the GPU). D3D10 will help with, but not solve, this combinatorial explosion.

2) Multipass every light with aggressive culling. This is easy to implement, but you're practically guaranteed to bottleneck in *several* areas. First, geometry transform! D3D10's stream-out will again help, but re-rasterizing everything is still going to be expensive, even with a pre-Z pass. Secondly, you can't cull efficiently enough: if you try to do it at anything finer than object granularity you'll burn up even a multi-core CPU, but doing it above that level will sometimes waste a lot of GPU power. You can probably tell that I don't like this approach at all ;)

The first is workable in scenarios with relatively simple lighting and geometry. It can get to be more trouble than it's worth once a scene becomes complex, and the performance characteristics are highly non-uniform - one room could be *much* slower than another just because of the lighting environment.

Deferred rendering really solves this elegantly. It separates out the lighting and geometry components so that you pay only for on-screen lit pixels, doesn't require any real CPU work, and simplifies the renderer to boot. It's neat because it uses the rasterizer to solve the light contribution problem *per-pixel*, and efficiently!
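(For the curious, the frame structure of a deferred renderer looks roughly like this in D3D9 terms. The G-buffer layout and surface names are assumptions for illustration, not a reference implementation.)

```cpp
#include <d3d9.h>

// Geometry pass writes surface attributes to multiple render targets;
// the lighting pass then reads them back per pixel.
void DeferredFrame(IDirect3DDevice9* dev,
                   IDirect3DSurface9* backBuffer,
                   IDirect3DSurface9* albedoRT,
                   IDirect3DSurface9* normalRT,
                   IDirect3DSurface9* depthRT)
{
    // G-buffer pass: render the scene once; the pixel shader outputs
    // attributes to COLOR0..COLOR2 instead of a shaded color.
    dev->SetRenderTarget(0, albedoRT);
    dev->SetRenderTarget(1, normalRT);
    dev->SetRenderTarget(2, depthRT);
    // ... draw all geometry with the "attribute writer" shader ...

    // Lighting pass: back to the frame buffer; draw one additive
    // screen-space quad or convex light volume per light, with a pixel
    // shader that reads the G-buffer. Cost scales with lit pixels,
    // independently of scene geometry.
    dev->SetRenderTarget(0, backBuffer);
    dev->SetRenderTarget(1, NULL);
    dev->SetRenderTarget(2, NULL);
    dev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    dev->SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_ONE);
    dev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
    // ... for each light: set its constants, draw its volume ...
}
```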

That said, I'm the first to admit that it comes with its own set of fun issues. The most complained about is the lack of MSAA. If you can live without that (and/or have your own anti-aliasing solution), deferred rendering might be the way to go.

I hear the G80/D3D10 have a way to do a custom multisample resolve though, which would allow MSAA to work with deferred rendering...

Sorry guys, but I'm going to flame this topic a little bit [smile].

If you mix the concept of batching with the concept of expensive shaders, you will end up with nothing but a useless mess of considerations.

First of all, let me consider a program that is really not CPU-limited. Really, really not - OK, forget the CPU. Consider too that no Z pre-pass is needed.
I work on this case (a plane, heh), and I find that for a Blinn shader, with well-done optimizations, the one-light-per-pass method using user clip planes to limit the shader's effect is really, really faster on a 6600 (SM2) than multiple lights per pass. The reason is that you never place two lights at the same position with the same attenuation; the lights are always spaced apart. (The other implicit reason is that a Blinn shader is expensive even in SM2, due partly to the huge number of texture fetches needed: diffuse, specular level, specular color, gloss map, normal map, sometimes a height map, emissive map, sometimes an opacity map, sometimes a reflection map.)

Now, we can consider (this is, I know, debatable, but consider it) that we have the best pixel shader case. Then the idea is to find CPU-related algorithms (batching, etc.) that will not unbalance the program.

I'm not going to discuss batching here, but you can see that with the light architecture I described, the draw-call count tends to equal the number of lights present in the frustum (including the ambient one).
Batching is above all a mesh and texture consideration - that's what I think it should tend toward.

About the Z pre-pass, I think - though I have not really tested it - that the double-speed Z-only pre-pass is what you should use instead of a Z + ambient pre-pass.
Even in an ambient pass you have expensive texture fetches: the ambient one (generally near the diffuse one, which is often composed of many textures), the emissive one, the height map (if there is one), the ambient occlusion one (if there is one) and the opacity/mask one (if there is one). Compared to a double-speed Z-only pass, the choice is made... (See the sketch below.)
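(A sketch of the Z-only pre-pass state setup in D3D9. Disabling color writes is what lets most GPUs of this era rasterize Z at double speed; the later lighting passes then test with EQUAL so each visible pixel is shaded exactly once.)

```cpp
#include <d3d9.h>

void DepthPrePass(IDirect3DDevice9* dev)
{
    // Z-only pass: no color writes, depth writes on.
    dev->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
    dev->SetRenderState(D3DRS_ZENABLE, TRUE);
    dev->SetRenderState(D3DRS_ZWRITEENABLE, TRUE);
    dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);
    // ... draw all opaque geometry with a trivial (or no) pixel shader ...

    // Lighting passes: depth test only, no depth writes.
    dev->SetRenderState(D3DRS_COLORWRITEENABLE,
        D3DCOLORWRITEENABLE_RED | D3DCOLORWRITEENABLE_GREEN |
        D3DCOLORWRITEENABLE_BLUE | D3DCOLORWRITEENABLE_ALPHA);
    dev->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
    dev->SetRenderState(D3DRS_ZFUNC, D3DCMP_EQUAL);
    // ... additive light passes ...
}
```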

EDIT: as for the geometry transform consideration, I would say that a program should never be limited by the vertex units; it is easy to keep it that way.

[Edited by - Woodchuck on November 20, 2006 8:05:40 AM]

Quote:
Original post by Woodchuck
I work on this case (a plane, heh), and I find that for a Blinn shader, with well-done optimizations, the one-light-per-pass method using user clip planes to limit the shader's effect is really, really faster on a 6600 (SM2) than multiple lights per pass.

As I explained in point #2 above, you're going to get bitten by geometry transform and (potentially) rasterization overhead in this one for anything but the most trivial scenes. If you're already pixel-limited (for example, shading a sphere @ 1600x1200 or something) - or if you only have a few lights - you may not notice this, but it becomes quite unworkable for complex scenes, and ridiculously inefficient to boot.

Quote:
Original post by Woodchuck
The reason is that you never place two lights at the same position with the same attenuation; the lights are always spaced apart.

If you're implying that lights rarely overlap, that's simply not true. Many scenes that I've worked with have an average lighting complexity of 4 or greater.

Quote:
Original post by Woodchuck
the other implicit reason is that a Blinn shader is expensive even in SM2, due partly to the huge number of texture fetches needed

I don't understand what you're saying here... in the one-light-per-pass method, you end up *re-reading* these surface attributes every time, whereas they are only fetched once in the single-pass method... yet another strike against the multi-pass method.

Quote:
Original post by Woodchuck
Batching is above all a mesh and texture consideration - that's what I think it should tend toward.

Batching must by necessity consider *all* render state changes, and shaders are one of those... You can dodge it somewhat with dynamic branching, but if you're working on a card with capable dynamic branching, deferred rendering is probably a better way to go.

Quote:
Original post by Woodchuck
EDIT: as for the geometry transform consideration, I would say that a program should never be limited by the vertex units; it is easy to keep it that way.

What? Even with *aggressive* LOD, geometry transform is *often* a bottleneck. Unified architectures help, but even in that case you don't want to be wasting processor power redoing the same work over and over again!

Vertex processing or rasterization won't be an issue in any "game" scenario. So don't bother optimizing for that.

[edit]I see there's a conflict. To be more precise: vertex processing and rasterization were never a problem *for us*; NVPerfHUD showed that pixel processing is the limiting factor. It might be an issue for point lights, because you draw highly tessellated objects into a small target buffer. But as stated below, shadows are a different set of problems.[/edit]

What we do: generate HLSL shader source on the fly from a script. We use simple (gamma-corrected) NdotL lighting, so a simple light source without shadows boils down to three PS instructions. Calculation is done in tangent space; light direction, attenuation and such are computed in the VS. Combining multiple lights into a single pass is a big win here, especially when using parallax mapping and the like. For more complex lights and materials - for example shadows or bump self-shadowing - the win isn't as large anymore. I can give exact figures this evening, if you want.
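(To make the shape of such generated shaders concrete, here is an invented single-light pixel shader of roughly this kind, embedded as a C string the way a generator might emit it. The semantics and names are assumptions, and this version spends a couple more instructions than the three quoted above.)

```cpp
// Hypothetical output of the shader generator: per-pixel NdotL with a
// tangent-space light direction and pre-attenuated light color from the VS.
const char* kOneLightPS =
    "sampler2D diffuseMap : register(s0);\n"
    "sampler2D normalMap  : register(s1);\n"
    "float4 main(float2 uv         : TEXCOORD0,\n"
    "            float3 lightDirTS : TEXCOORD1,  // tangent space, from VS\n"
    "            float3 lightColor : TEXCOORD2   // already attenuated\n"
    "           ) : COLOR\n"
    "{\n"
    "    float3 n = tex2D(normalMap, uv).xyz * 2 - 1;\n"
    "    float ndotl = saturate(dot(n, normalize(lightDirTS)));\n"
    "    float3 albedo = tex2D(diffuseMap, uv).rgb;\n"
    "    return float4(albedo * lightColor * ndotl, 1);\n"
    "}\n";
```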

We haven't tested a really complex lighting scenario yet. Up to now, with a few light sources scattered over a large outdoor scene, usually 4 to 8 permutations are generated per shader. The shader generation itself is slow: switching on a light in a dense room for the first time usually causes a noticeable stutter of maybe 200 ms. At the moment we haven't done anything about this issue, but there are several ways out.

For a single mesh to be painted, we repeatedly add the next light and ask the shader manager for a shader for this light combination. If it returns non-NULL, we add another light, and so on. If it returns NULL, we paint the mesh with the lights accumulated so far and then rerun the process with additive blending for the remaining lights. (A sketch of this loop follows.)
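(A sketch of that accumulation loop. Light, Shader, Material, Mesh and ShaderManager are hypothetical stand-ins for the engine types described here; the guard at the bottom assumes the basic one-light shaders always exist, as in option b below.)

```cpp
#include <vector>

// Hypothetical engine types, declarations only.
struct Light;
struct Shader;
struct Material;
struct ShaderManager {
    // Returns NULL when no shader exists (yet) for this light combination.
    Shader* GetShader(const Material& mat, const std::vector<Light*>& lights);
};
struct Mesh {
    const Material& GetMaterial() const;
    void Draw(Shader& s, const std::vector<Light*>& lights, bool additive);
};

void DrawMeshLit(Mesh& mesh, std::vector<Light*> lights, ShaderManager& mgr)
{
    bool firstPass = true;
    while (!lights.empty())
    {
        // Greedily grow the light set until the manager has no shader for it.
        std::vector<Light*> batch;
        Shader* shader = NULL;
        while (!lights.empty())
        {
            batch.push_back(lights.back());
            Shader* s = mgr.GetShader(mesh.GetMaterial(), batch);
            if (!s) { batch.pop_back(); break; }  // keep the last working set
            shader = s;
            lights.pop_back();
        }
        if (!shader) break;  // not even a one-light shader; shouldn't happen

        // First pass writes normally; later passes blend additively.
        mesh.Draw(*shader, batch, !firstPass);
        firstPass = false;
    }
}
```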

a) With this system in place, you could precompute the permutations and load them at startup. The shader manager then never has to generate a shader; it simply looks up whether a shader is available for the combination or not.

b) You could pre-generate only a basic shader set for a single light of each type. Then, if the shader manager is asked for a certain combination of lights, it starts shader generation in a background thread and simply returns NULL until the result is available. This way, additional light sources are each done in a separate pass until the combined shader that does them all in a single pass is ready.

Summary: from my experience I recommend a single-pass, multiple-light system. Most of the light sources won't have shadows anyway, because rendering the shadow maps is expensive as well. Simple non-shadowed lights are pretty cheap with this system. For a large outdoor scene with a single directional shadowing light (the sun), an additional non-shadowed point light is nearly free - in numbers, about 5% framerate loss per light.

Shader management is difficult, though. We use a std::map per shader, with a uint64 hash key that encodes the lighting situation and the parameters used to generate that shader permutation. With an average of 5 to 10 entries per map, lookup is fast enough. (A sketch of such a key follows.)
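(A sketch of packing such a 64-bit permutation key. The field layout is invented for illustration; the real encoding obviously depends on what the generator parameterizes.)

```cpp
#include <stdint.h>

// Pack the lighting situation and generator flags into bit fields.
uint64_t MakeShaderKey(unsigned numDirectional, unsigned numPoint,
                       unsigned numSpot, bool shadows, bool parallax,
                       unsigned materialFlags)
{
    uint64_t key = 0;
    key |= (uint64_t)(numDirectional & 0xF) << 0;   // 4 bits per light count
    key |= (uint64_t)(numPoint       & 0xF) << 4;
    key |= (uint64_t)(numSpot        & 0xF) << 8;
    key |= (uint64_t)(shadows  ? 1 : 0)     << 12;
    key |= (uint64_t)(parallax ? 1 : 0)     << 13;
    key |= (uint64_t)materialFlags          << 16;  // rest for material options
    return key;
}

// One such map per base shader, holding its compiled permutations:
//   std::map<uint64_t, CompiledShader*> permutations;
```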

A short commercial: pictures of the system at work can be seen here on GameDev.net.

Bye, Thomas

Quote:
Original post by AndyTX
You can probably tell that I don't like this approach at all ;)


That's true ;) I will try to explain why you are wrong on all your points.

Quote:
Original post by AndyTX
Quote:
Original post by Woodchuck
I work on this case (a plane, heh), and I find that for a Blinn shader, with well-done optimizations, the one-light-per-pass method using user clip planes to limit the shader's effect is really, really faster on a 6600 (SM2) than multiple lights per pass.

As I explained in point #2 above, you're going to get bitten by geometry transform and (potentially) rasterization overhead in this one for anything but the most trivial scenes. If you're already pixel-limited (for example, shading a sphere @ 1600x1200 or something) - or if you only have a few lights - you may not notice this, but it becomes quite unworkable for complex scenes, and ridiculously inefficient to boot.


I'm talking about a next-gen light architecture, like in Doom 3 and UE3. If you take the example of Far Cry, there is not a full Blinn light system - just a little part. That game may be geometry-limited on some PCs, I agree.

I'm talking about a Blinn shader that does NdotL, pow(NdotH, gloss), normal mapping and often parallax mapping - not an old lightmap or a plain NdotL shader.

A real, fully-mapped Blinn shader with maybe 20 omni lights per frustum. If you want this, just reduce your geometry polygon count.
Normal mapping and parallax are made to replace polygons, no?

Quote:
Original post by AndyTX
Quote:
Original post by Woodchuck
The reason is that you never place two lights at the same position with the same attenuation; the lights are always spaced apart.

If you're implying that lights rarely overlap, that's simply not true. Many scenes that I've worked with have an average lighting complexity of 4 or greater.


I didn't say that lights rarely overlap. I said that the ratio between the light's screen size (in pixels) and the lit object/batch's screen size is generally less than 1, and this is easy to reduce without a real quality impact.
In other terms, I think there are fewer 'overlapped' pixels than pixels that are shaded for nothing for a given light.

Quote:
Original post by AndyTX
Quote:
Original post by Woodchuck
the other implicit reason is that a Blinn shader is expensive even in SM2, due partly to the huge number of texture fetches needed

I don't understand what you're saying here... in the one-light-per-pass method, you end up *re-reading* these surface attributes every time, whereas they are only fetched once in the single-pass method... yet another strike against the multi-pass method.


That's a good point. But my method still wins, for the same reason as before: there are a lot more pixels shaded for nothing than there are overlapped pixels.

Quote:
Original post by AndyTX
Quote:
Original post by Woodchuck
Batching is above all a mesh and texture consideration - that's what I think it should tend toward.

Batching must by necessity consider *all* render state changes, and shaders are one of those... You can dodge it somewhat with dynamic branching, but if you're working on a card with capable dynamic branching, deferred rendering is probably a better way to go.


Mmmh, for a draw call you have a static number of SetTexture calls and other things like that. That's why draw-call-count considerations are better than just optimizing away one or two SetTexture calls here and there between two draw calls ;)

Quote:
Original post by Schrompf
[edit]I see there's a conflict. To be more precise: vertex processing and rasterization were never a problem *for us*; NVPerfHUD showed that pixel processing is the limiting factor.[/edit]


Bingo!

[Edited by - Woodchuck on November 20, 2006 11:59:24 AM]
