Branching & picking lighting technique in a Deferred Renderer

We all know one of the problems with deferred rendering is supporting various lighting models (Lambert, Phong, Oren-Nayar, Ward, et cetera). Yet I wonder if branching is still a DON'T these days. I mean, a simple "if-then-else" usually doesn't give me a noticeable performance difference. But in this case, the branch could become relatively large:

float4 psLightShader( float2 texcoords : TEXCOORD0 ) : COLOR0
{
    // material/technique ID, stored in one 8-bit G-buffer channel
    int lightingTechnique = (int)( tex2D( gBuffer, texcoords ).x * 255 + 0.5 );

    if      ( lightingTechnique == 0 ) return doLambert_Blinn();
    else if ( lightingTechnique == 1 ) return doLambert_Phong();
    else if ( lightingTechnique == 2 ) return doOrenNayar();
    else if ( lightingTechnique == 3 ) return doWard();
    // ... 10 to 20 cases in the end ...
    return doLambert_Blinn();  // default fall-back
}

I think there will be 10 to 20 different options in the end, although most pixels will use a default (Lambert + Blinn) technique.

Would this be a bad idea, and if so, is there a smart way around it? There was something about combining a deferred with a forward pipeline recently, but I'm afraid that's a bit too much of a change. Honestly, using different BRDFs isn't giving THAT much of an improvement either, so if it costs too much, I'll fall back on just a few standard lighting techniques.

Rick
I believe it won't be a problem if the threads (are they called threads...?) branch the same way. So if you have one part of the screen do lighting technique X and another part do Y, it wouldn't be bad, but if every pixel uses a random technique it would really slow it down a lot.

Though I don't really know how GPUs work. Maybe modern GPUs don't need all of them branching the same way; maybe they can sort the work so the pixels requiring the same branch are processed at the same time.

Anyway, I think it won't be a problem if they all generally branch the same way.

o3o

Well, luckily it should be somewhat coherent indeed. A typical example would be a room that uses default shading on all its contents, except for the walls using Oren-Nayar and a sofa using a velvet lighting model. In other words, pixels using technique X are clustered together.

Yet I'm still a bit afraid of having such a big branch -- or doesn't it matter much whether a shader has to check for 2 or 20 cases?
I googled some, and at least for some modern GPUs it seems that if multiple branches are taken within a group of threads, it calculates all of those branches for all of those threads. So it doesn't seem that bad. Not sure if there is other cache-related stuff that would make it slower, but if there are just 1-2 different branches to take and they're not randomly scattered all over the place, it could do pretty well.

o3o

Why not try to select only a small number of BRDFs to use and go from there? I myself use 2 more or less physically based BRDFs (an isotropic and an anisotropic one), which can create a wide range of realistic and good-looking materials in my deferred setup. I use a branch just like you suggested and notice no performance drop whatsoever in my profiling tests.

I can't imagine a normal scene rendered with a deferred renderer requiring 20 BRDFs to be available at once, though. If you really want to render with tons of different BRDFs (e.g. when you want to do physically correct rendering), you should probably rethink your decision to use a deferred renderer instead of trying to cram tons of techniques into a pipeline that really wasn't designed to handle them.

I gets all your texture budgets!


Well, luckily it should be somewhat coherent indeed. A typical example would be a room that uses default shading on all its contents, except for the walls using Oren-Nayar and a sofa using a velvet lighting model. In other words, pixels using technique X are clustered together.

Yet I'm still a bit afraid of having such a big branch -- or doesn't it matter much whether a shader has to check for 2 or 20 cases?

It's still going to add about 19+ spurious ALU ops that may or may not be scheduled concurrently with useful work, depending on the target GPU architecture and a handful of other things. In the non-coherent branch case, you're very likely going to be shading all 20+ BRDF models and then doing a predicated move to pick the 'right' result -- *any* sort of boundary is going to be disproportionately expensive to render. I guess what I'm trying to say here is that your question gets asked a lot and the answer hasn't really changed much :(
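To make the predicated-move point concrete, here's a rough sketch of what a flattened branch can degenerate to on divergent pixels (using the hypothetical doXXX functions from the first post; this is illustrative, not actual compiler output):

// Sketch: with incoherent IDs, every model gets evaluated and predicated
// moves select the 'right' result afterwards -- nothing is really skipped.
float3 shadeDivergent( int id )
{
    float3 r0 = doLambert_Blinn();  // always executed
    float3 r1 = doOrenNayar();      // also always executed
    float3 r2 = doWard();           // ...and so on for every model
    return ( id == 0 ) ? r0
         : ( id == 1 ) ? r1
         :               r2;        // predicated selects pick the result
}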

If you want flexible BRDFs, you have a few options. You can just use standard, expressive BRDFs like Oren-Nayar/Minnaert or Kelemen/Szirmay-Kalos for everything and store some additional material parameters in your G-buffers; this is in general a workable base for most scenes. More esoteric surfaces could be handled via forward shading (and you may be doing this anyway for things like hair, being that they're partially transparent and all) and composited into the final render.
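As a minimal sketch of that parameter-driven approach (gBufferParams, its channel layout, and the evalOrenNayar/evalSpecular helpers are made-up names; N, L, V and albedo come from the surrounding lighting shader):

// Sketch: one expressive BRDF driven by per-pixel material parameters
// stored in a spare G-buffer target -- no per-model branch needed.
float4 params    = tex2D( gBufferParams, texcoords );
float  roughness = params.x;   // 0 = near-Lambert, 1 = very rough
float  specScale = params.y;   // specular intensity scale
float3 color = albedo * evalOrenNayar( N, L, V, roughness )
             + specScale * evalSpecular( N, L, V, roughness );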

You could also aim for more general BRDF solutions like Lafortune or Ashikhmin-Shirley and encode their parameters too. This should be sufficient to represent pretty much any material you can think of.

Lastly, you can also give tiled forward rendering a go. If you're starting off from a deferred renderer this may not be that hard to switch over to, though you'll need to do some work on the CPU side (namely light binning and culling) if you're just using a D3D9 feature set. It should still be viable, however.
clb: At the end of 2012, the positions of jupiter, saturn, mercury, and deimos are aligned so as to cause a denormalized flush-to-zero bug when computing earth's gravitational force, slinging it to the sun.

... In the non-coherent branch case, you're very likely going to be shading all 20+ BRDF models...


Wait, if there are, let's say, 3 different techniques used in the group of pixels being processed, doesn't it just calculate those 3? Or does it really calculate all the branches? Or did you mean a worst-case situation where all the techniques are used and they all happen to end up in the same thread-group-thing?

o3o

N.B. you can express branching in other ways, such as stencil masking or tile classification.

e.g. say you're trying to do a full-screen lighting pass with 2 BRDFs, iso and aniso --
You could first do a pass that reads your G-buffer material ID value and outputs a mask -- red for iso and green for aniso. Then you could down-sample this mask to, say, 32 times smaller. When down-sampling, if red and green are blended together at all, output blue instead. This then gives you 3 tile masks -- red tiles contain only iso pixels, green tiles only aniso pixels, and blue tiles contain both.
Instead of drawing a full-screen quad with a branching shader, you can now draw 3 full-screen quad-grids, whose vertices are associated with the texels in your mask texture -- the first grid uses an iso-lighting pixel shader and a vertex shader which rejects vertices that don't belong to a red tile. Same for the 2nd grid, but with an aniso shader and green-tile vertex checking. The last grid checks for blue vertices and uses your branching shader supporting multiple BRDFs.
On older hardware with bad branching performance, you now only perform branching on the tiles that actually need it, and areas where the tiles are coherent are only shaded by the appropriate branchless shader.
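A rough sketch of the tile-rejecting vertex shader for one of those grids (assumes SM3-style vertex texture fetch and one vertex per mask texel; tileMask and passMask are made-up names, and the D3D half-texel/Y-flip details are glossed over -- on hardware without vertex texture fetch you'd bake the mask into the vertex buffer on the CPU instead):

// Sketch: each grid vertex samples the down-sampled classification mask;
// vertices whose tile colour doesn't match this pass get pushed outside
// the clip volume, so the whole tile quad is culled.
float4 vsTileReject( float2 tileUV : TEXCOORD0,
                     uniform float3 passMask )    // e.g. (1,0,0) for the red/iso pass
                   : POSITION
{
    float3 tile = tex2Dlod( tileMask, float4( tileUV, 0, 0 ) ).rgb;
    if ( dot( tile, passMask ) <= 0 )
        return float4( 0, 0, -10, 1 );            // reject: outside clip volume
    return float4( tileUV * 2 - 1, 0, 1 );        // accept: tile corner in clip space
}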
My biggest concern is that your shader's register usage will be determined by your most complex lighting function, which will reduce your warp occupancy for simpler shading. The approach Hodgman mentioned will avoid this pitfall (which badly affects many engines out there that have tried it).
You guys are probably right that only a few BRDFs will be sufficient, but at this early stage it's pretty hard to tell which ones will be useful and which won't. I guess the final version will just use basic Lambert + Blinn, Oren-Nayar for the many matte surfaces I have, and 1 or 2 anisotropic variants.

Doing multiple steps like Hodgman explains sounds interesting. And since tiled deferred rendering is also on the to-do list, I'm heading that way sort of anyway. Doing passes with specific shaders kills the branching issue, but it also introduces some other overhead of course (shader switches, making the mask, downscaling). Hard to say what would be the best choice.


Instead of branching, it still might be a good idea to encode multiple lighting models into 2D (or 3D?) textures and simply use the texcoords to pick one. That's probably the fastest way, although encoding various more complex models such as Ward or Cook-Torrance into textures may get a bit difficult, or require quite a lot of texture reads.
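As a sketch of that lookup idea (brdfLUT, its encoding over (N.L, N.H), and the per-slice layout are all assumptions, not a worked-out scheme; N, L, H come from the surrounding lighting code):

// Sketch: each Z-slice of a 3D texture stores one lighting model's
// response over (N.L, N.H); the material ID just selects the slice.
float  modelId  = tex2D( gBuffer, texcoords ).x;     // already in [0,1]
float3 lutCoord = float3( dot( N, L ) * 0.5 + 0.5,   // remap N.L to [0,1]
                          saturate( dot( N, H ) ),
                          modelId );                 // picks the model's slice
float3 lighting = tex3D( brdfLUT, lutCoord ).rgb;    // no branch at all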
