Switching from deferred to forward... some questions

Gumgo    968
I was going to write a deferred renderer, but I decided to switch to forward rendering because of the limited material properties I could store in the G-buffer, and also because I'd have to write a forward renderer for transparency anyway. I have a few questions about forward rendering.

First of all, I'm wondering how multiple lights are handled. I'm thinking that a maximum of 8 lights per object is probably plenty. It seems like the most efficient way to handle these cases would be to have a different shader for each combination of lights; i.e. 0 spot, 1 spot, 2 spot, ..., 7 spot, 8 spot, 0 spot 1 directional, 1 spot 1 directional, ..., 7 spot 1 directional... etc. Is this a good way to go? (Writing a program to generate all these shader combinations is not difficult.)

Another thought I had was to use static branching with uniforms. One uniform could determine the number of lights to loop through, and each light could have a uniform determining which type it was. This would involve an if-else in a loop. How bad of a performance hit would this be? Is the first approach better? (The first approach would probably have more state changes.)

I'm planning on using Cg, and another issue could be the number of uniforms available (each light would take several float4s - xyz for various colors (diffuse, ambient, etc.), with extra data like attenuation packed into w). My current computer is pretty old, so its limit is 96 float4s I think. How many do newer cards generally support? I'll need some uniforms for fog, backface normal flip, and a bunch of matrices for skeletal animation... would I have enough left over for 8 lights?

I've also read that lights are sometimes done using a separate pass for each light. How do the method I suggested and this method generally compare? The single-pass method would work with transparency very well; would the multi-pass method work as well?

Anyway, lots of questions... thanks in advance for any info.
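For reference, here's roughly the kind of packing I have in mind (just a sketch in Cg; the struct and field names are made up):

// Each light fits in 4 float4 registers, so 8 lights = 32 of the 96,
// leaving 64 for skinning matrices (3 registers each as float3x4), fog, etc.
struct Light
{
    float4 position;   // xyz = position (or direction), w = attenuation factor
    float4 diffuse;    // xyz = diffuse color, w = light type (0 = spot, 1 = directional)
    float4 ambient;    // xyz = ambient color, w = spare
    float4 spotParams; // xyz = spot direction, w = cos(cone angle)
};

uniform Light lights[8]; // 8 * 4 = 32 float4 registers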

Ashaman73    13715
Quote:
Original post by Gumgo
I was going to write a deferred renderer, but I decided to switch to forward rendering because of the limited material properties I could store in the G-buffer and also because I'd have to write a forward renderer for transparency anyway. I have a few questions about forward rendering.

Deferred renderers are really good when it comes down to lighting, and really bad for transparency. First, you should take a look at light pre-pass rendering, which is similar to deferred rendering but can handle materials much better. Then you should consider a mix of deferred and forward rendering. Transparency will always be difficult, even with a forward renderer; at the current state of the art, it will almost always be ugly. Either way, you will need special treatment of transparency or performance-wise expensive algorithms (depth peeling, etc.).

Quote:

First of all, I'm wondering how multiple lights are handled. I'm thinking that a maximum of 8 lights per object is probably plenty. It seems like the most efficient way to handle these cases would be to have a different shader for each combination of lights; i.e. 0 spot, 1 spot, 2 spot, ..., 7 spot, 8 spot, 0 spot 1 directional, 1 spot 1 directional, ..., 7 spot 1 directional... etc. Is this a good way to go? (Writing a program to generate all these shader combinations is not difficult.)

IMHO this is not a good idea; shader switching can be quite expensive.

Quote:

Another thought I had was to use static branching with uniforms. One uniform could determine the number of lights to loop through, and each light could have a uniform determining which type it was. This would involve an if-else in a loop. How bad of a performance hit would this be? Is the first approach better? (The first approach would probably have more state changes.)

If you were writing code for your CPU, would you still be optimizing it with asm? :) Shader compilers are quite good, and graphics hardware has made a major leap in the last 5 years. Games often use some kind of ubershader, a shader which does many different things and which is controlled by some uniforms.
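Something along these lines, for example (just a rough sketch, the uniform names are made up; a loop driven by a uniform count like this needs a SM3.0-class profile):

#define MAX_LIGHTS 8

uniform float4 lightPosRange[MAX_LIGHTS];  // xyz = position or direction, w = range
uniform float4 lightColorType[MAX_LIGHTS]; // xyz = diffuse color, w = 0 point/spot, 1 directional
uniform float3 totalAmbient;               // ambient summed on the CPU, one value per object
uniform int    numLights;                  // set by the CPU per object

float4 mainPS(float3 worldPos : TEXCOORD0,
              float3 normal   : TEXCOORD1,
              uniform float3 albedo) : COLOR
{
    float3 n = normalize(normal);
    float3 result = totalAmbient * albedo;

    for (int i = 0; i < numLights; ++i)     // "static" branching on uniforms
    {
        float3 L;
        float  atten = 1.0;

        if (lightColorType[i].w > 0.5)      // directional light
        {
            L = -normalize(lightPosRange[i].xyz);
        }
        else                                // point/spot light (cone term omitted)
        {
            float3 toLight = lightPosRange[i].xyz - worldPos;
            float  dist    = length(toLight);
            L     = toLight / dist;
            atten = saturate(1.0 - dist / lightPosRange[i].w);
        }

        result += albedo * lightColorType[i].xyz * (atten * saturate(dot(n, L)));
    }

    return float4(result, 1.0);
}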


Quote:

I'm planning on using Cg, and another issue could be the number of uniforms available (each light would take several float4s - xyz for various colors (diffuse, ambient, etc.) and extra data could be packed into w, like attenuation). My current computer is pretty old, so its limit is 96 float4s I think. How many do newer cards generally support? I'll need some uniforms for fog, backface normal flip, and a bunch of matrices for skeletal animation... would I have enough left over for 8 lights?

On newer hardware, yes. In my lighting pass I support up to 50 lights. Remember that you don't need to calculate direction-independent lighting like ambient on the GPU; just sum up the ambient lighting values on the CPU and transfer a single value to the shader.

Nik02    4348
A common approach in managing lights is to determine the n nearest/most influential lights of an object and use them when rendering said object.

---

Shader permutations (parameter-based variations) are a common way to approach the problem of multiple lights if dynamic branching is not an option.
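For example, a single shader source can be specialized per light combination by defining the counts at compile time (a sketch; the macro and uniform names are arbitrary, and the defines would come from your build step, e.g. cgc -DNUM_SPOT=2 -DNUM_DIR=1):

#ifndef NUM_SPOT
#define NUM_SPOT 2
#endif
#ifndef NUM_DIR
#define NUM_DIR 1
#endif

#if NUM_SPOT > 0
uniform float4 spotPosRange[NUM_SPOT]; // xyz = position, w = range (cone term omitted for brevity)
uniform float4 spotColor[NUM_SPOT];
#endif
#if NUM_DIR > 0
uniform float4 dirDirection[NUM_DIR];
uniform float4 dirColor[NUM_DIR];
#endif

float4 mainPS(float3 worldPos : TEXCOORD0,
              float3 normal   : TEXCOORD1) : COLOR
{
    float3 n = normalize(normal);
    float3 result = float3(0, 0, 0);

#if NUM_SPOT > 0
    for (int i = 0; i < NUM_SPOT; ++i)  // compile-time constant, fully unrolled
    {
        float3 toLight = spotPosRange[i].xyz - worldPos;
        float  dist    = length(toLight);
        float  atten   = saturate(1.0 - dist / spotPosRange[i].w);
        result += spotColor[i].xyz * atten * saturate(dot(n, toLight / dist));
    }
#endif

#if NUM_DIR > 0
    for (int i = 0; i < NUM_DIR; ++i)
        result += dirColor[i].xyz * saturate(dot(n, -normalize(dirDirection[i].xyz)));
#endif

    return float4(result, 1.0);
}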

---

Static branching can be pretty fast, but it will consume more shader program registers; on top of that, old hardware is forced to execute all branches even though the result only depends on one of them (lack of predication). For older hardware, specialized shader permutations are faster because they do exactly what's needed - nothing less, nothing more.

When choosing between permutations and branching, take into account the total work the system has to do in order to draw your scene. Since shader logic is executed very often, it quickly adds up. On the other hand, the CPU can execute different code in parallel with the GPU work. The only way to make informed decisions is to profile your app on your target machines.

---

If you run out of vertex shader registers when using skinning, you could split the models based on bone usage and render them piece by piece (requiring fewer active matrices in registers).

Alternatively, you can do the skinning in software. Since mesh splitting results in more state changes and draw calls, software skinning could be faster depending on the amount of geometry you have to skin. As a bonus, you get an "unlimited" number of bones.

Newer cards support a massive amount of registers. D3D11 shader stages have 16 banks of constant registers which can store 4096 constants (4x32bit) each. In addition, you can use textures (or other buffers) as a data source in any shader stage.
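For instance, skinning matrices could be stored in a floating-point texture and fetched in the vertex shader instead of occupying constant registers (a rough sketch, assuming a profile with vertex texture fetch such as vp40; all names are made up):

// Each bone occupies 3 adjacent RGBA32F texels in a 1-row texture (a 3x4 matrix).
uniform sampler2D boneTex;
uniform float     boneTexWidth;   // e.g. 256 texels

float3x4 fetchBone(float boneIndex)
{
    float du = 1.0 / boneTexWidth;
    float u  = (boneIndex * 3.0 + 0.5) * du;
    float4 r0 = tex2Dlod(boneTex, float4(u,            0.5, 0, 0));
    float4 r1 = tex2Dlod(boneTex, float4(u + du,       0.5, 0, 0));
    float4 r2 = tex2Dlod(boneTex, float4(u + 2.0 * du, 0.5, 0, 0));
    return float3x4(r0, r1, r2);
}

void mainVS(float4 position    : POSITION,
            float4 boneIndices : BLENDINDICES,
            float4 boneWeights : BLENDWEIGHT,
            out float4 oPos    : POSITION,
            uniform float4x4 viewProj)
{
    float3 skinned = float3(0, 0, 0);
    for (int i = 0; i < 4; ++i)            // blend up to 4 bones per vertex
        skinned += boneWeights[i] * mul(fetchBone(boneIndices[i]), float4(position.xyz, 1.0));
    oPos = mul(viewProj, float4(skinned, 1.0));
}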

---

Since depth is written from the first pass onwards, you can't render transparent objects in the same passes as the rest of the geometry if you want the subsequent passes to affect the lighting "behind" the transparent objects. There are tricks to get around this problem, but they are entirely dependent on the actual application.
