Is Clustered Forward Shading worth implementing?

45 comments, last by Matias Goldberg 11 years, 2 months ago
the more advanced games are, the more likely they are to go deferred. the reason is that forward rendering can't deliver the same number of light-surface interactions quickly. as you said, deferred would seem to be more demanding, yet it's the only way to go if you want flexibility.


What's 'advanced' mean? Huge numbers of dynamic lights? You can do just as many lights with forward as long as you've got a decent way of solving the classic issue of determining which objects are affected by which lights. Actually, the whole point of tiled-deferred was that it was trying to reduce lighting bandwidth back down to what we had with forward rendering, while keeping the "which light for which object" calculations in screen-space on the GPU.

advanced means there are no limits in light-surface interactions due to tech. deferred shading has a lot of 'points', not just this one.

-you had to reduce shader combination counts. you can imagine, even if your forward solution were fast enough, you could have 0 to 100 lights affecting a surface, which means you need 100 times the permutations of a shader library that isn't small already. (and no, sadly dynamic branching is not a solution on current gen HW, and no, even static branching is not a solution, as your shader cost will increase by some % and your register usage will increase as well, and we graphics coder guys don't want to pay those ms that we could spend elsewhere. yes, it's a performance reason)

-complexity of light resources. there are some simple lights, some area lights, some projector lights, some shadow-mapping lights, there is a sun, there are light streaks (e.g. particles, laser beams). if you wanted to go forward, you'd need to index into all the needed resources, like textures and constants, and current gen hw does not really support that. creating atlases is also not very feasible: you'd need to spend a lot of time moving memory to rearrange data for each object you draw (and you'd still face tight limits on current gen).

you can find some more reasons people went deferred in:

http://www.crytek.com/download/A_bit_more_deferred_-_CryEngine3.ppt

If your environment is static, then you can bake all the lighting (and probes) and it'll be a ton faster than any other approach!
Most console games are still using static, baked lighting for most of the scene, which reduces the need for huge dynamic light counts.

and even those engines that cut down a vast number of lights this way, like UE3 using Lightmass, have problems applying those lights to dynamic objects. in UE3 they use spherical harmonics to combine them, just like KZ2 does for baked lights. lightmaps are really just orthogonal to forward/deferred.

http://www.unrealengine.com/files/downloads/GDC09_Smedberg_RenderingTechniques.pdf

AFAIK the realtime shadows in UE3 are said to be deferred, and that's the only reason why UE3 does not cope well with MSAA.

Another issue with deferred is that it's very hard to do at full 720p on the 360. The 360 only has 10MiB of EDRAM, where your frame-buffers have to live. Let's say you optimize your G-buffer layout so you've got hardware depth/stencil, and two 8888 targets -- that's 3 * 4 bytes per pixel * 1280*720, or ~10.5MiB -- that's over the limit and won't fit.

n.b. these numbers are the same as depth/stencil + FP16_16_16_16, which also makes forward rendering or deferred light accumulation difficult in HDR...
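
For anyone who wants to check the EDRAM arithmetic above, here is a quick back-of-envelope sketch. The helper name `target_bytes` is made up for illustration; the 10MiB limit and the target formats are the ones quoted in the post.

```python
# Back-of-envelope render-target budget against the 360's 10 MiB of EDRAM.
MIB = 1024 * 1024
EDRAM_BYTES = 10 * MIB  # figure quoted in the post


def target_bytes(width, height, bytes_per_pixel_list):
    """Total size of a set of render targets that must be resident together."""
    return width * height * sum(bytes_per_pixel_list)


# hardware depth/stencil (4 bytes) + two 8888 colour targets (4 bytes each)
gbuffer = target_bytes(1280, 720, [4, 4, 4])
# hardware depth/stencil (4 bytes) + one FP16_16_16_16 HDR target (8 bytes)
hdr_forward = target_bytes(1280, 720, [4, 8])

for name, size in [("G-buffer", gbuffer), ("HDR forward", hdr_forward)]:
    print(f"{name}: {size / MIB:.2f} MiB, fits in EDRAM: {size <= EDRAM_BYTES}")
```

Both layouts come to 12 bytes per pixel, which is why they hit the same wall at 1280x720.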

exactly, yet another reason why it is a very unfavorable idea to go deferred on 360. why would anyone do that? it's because the alternative just does not work (for the reasons given above). sure, if you make a racing game like Gran Turismo, with just one light source and maybe some spherical harmonics evaluation in the VS for nicer ambient/radiosity, there's no reason to go deferred. even an outdoor shooter like Just Cause can live with forward, I guess. but as soon as you want more advanced lighting, like GearsOfWar, GTA, Crysis, Stalker, ... you can't go forward on current gen. next gen, I imagine something like what AMD did in LEO is very doable.

Sure, Crysis, Battlefield 3 and Killzone are deferred, but there's probably many more games that use forward rendering, even "AAA" games, like Gears of War (and most other Unreal games), L4D2 (and other Source games), God of War, etc... Then there's the games that have gone deferred-lighting (LPP) as a half-way choice, such as GTA4 (or many rockstar games), Space Marine, etc...

Crysis is forward shaded with up to 16 lights per object (check the insane amount of shader space they use ;) ), Crysis 2 uses deferred lighting like GTA, and UE3 games are neither what we would call deferred nor forward, they're spherical harmonic based like KZ2. battlefield 3 goes for the (deferred) light indexing/tiling approach. since that doesn't seem doable on the RSX, they spend their SPUs on it instead, yet it's the first step towards light indexing, IMO.

Regarding materials, forward is unarguably more flexible -- each object can have unique BRDFs, unique lighting models, and any number of lights. It's just inefficient if you've got lots of small objects (due to shader swapping overhead and bad quad efficiency), or lots of big objects (due to the "which light for which object" calculations being done per-object).

that's the vanilla version, and then the clustered/tiled forward shading comes in ;)

Actually, you mentioned dynamic branches before, but forward rendering doesn't need any; all branches should be able to be determined at compile time. On the other hand, implementing multiple BRDFs in a deferred renderer requires some form of branching (or look-up-tables, which are just as bad).

would explain why most deferred games on console have just one lighting term, even the nano suit in Crysis2 looks like it's missing the anisotropic metal shading of crysis1.

the dynamic branching is needed in the first place to skip unneeded light calculations. if you are backfacing, or in shadow, or out of range -> next light. even on my mobile phones this gives a boost if I use a fixed set of lights per drawn object. on DX9 hardware it did skip pixels, but the general overhead of the branching cancelled out the gain (it was about 10 cycles more per shader: 6 due to branching and some more as the loop had the overhead of storing/restoring registers; validated with FX Composer back then).

Also, tiled-deferred and tiled-forward are implementable on current-gen hardware (even DX9 PC if you're careful), so there's no reason we won't see it soon.

As usual, there's no single objectively better pipeline; different games have different requirements, which are more efficiently met with one pipeline or another...

I'm just saying, going for top notch lighting/shading (aka not just radiosity baked into lightmaps, and also not just 1 light source in the world plus cubemap/spherical harmonics for dynamic objects) made all engines go deferred on this generation of consoles. I can't think of any forward engine with lighting competitive with Dead Space, Crysis or GTA, besides maybe God Of War, but there you could clearly identify artifacts of lights merged per vertex if you exceeded some count (I'd guess 3 dynamic lights).

A little off topic but still on topic, does anyone have any links to good tutorials on deferred vs forward rendering? I've read a fair bit about the detail on deferred but would rather get a good grounding on it before looking into it further - couldn't find any decent sites with 'why deferred' other than 'you can have more lights'.

Apologies for borrowing this thread quickly...

I think that's a good start:

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html

That link just reinforces his belief that 'why deferred' is just 'you can have more lights'.

Effectively, that's the main reason it appeared, and that's the main reason it's still strong.

There are other side effects that are good:

  1. The GBuffer data can be very useful for screen space effects (i.e. normals can be used for AO, refraction mapping, and local reflections; depth can be used for godrays, fog, and DOF). Even if you do forward rendering, you'll probably end up writing out some sort of GBuffer for those effects. Of course, in that case you don't have to perform magic to squeeze in parameters that the postprocessing passes won't need (like the specular colour term).
  2. Shading complexity becomes screen-dependent. This benefit/disadvantage (depending on the application) is shared with Forward+. Assuming just one directional light is used, every pixel is shaded once. In a forward renderer, if you render everything back to front, every pixel covered by a triangle will be shaded multiple times. Hence a deferred shader's time is fixed and depends on screen resolution (so a lower screen res. is an instant win for low end users). A deferred shader/Forward+ cannot shade more than (num_lights * width * height) pixels even if there is an infinite number of triangles, whereas the Forward renderer may shade the same pixel an infinite number of times for an infinite number of triangles, overwriting its previous value. Of course if you're very good at sorting your triangles (chances are the game cannot be that good) a Forward renderer may perform faster; but with a Deferred Shader you're on more stable ground.

Edit: As for the "more lights" argument, keep in mind that a deferred shader can easily take 5000 lights (as long as they're small) while a forward renderer can max out at 8-16 lights per object.

Very insightful guys, thanks. My renderer is nicely abstracted so I might give it a go. My game only requires one directional light at the moment but I still see the plus with effects like AO, etc

Anyone know which method the call of duty engines use?
http://advances.realtimerendering.com/s2011/index.html
Call of Duty: Black Ops used a forward renderer with physically based lighting.

Most of the big differences in the different flavors of forward and deferred rendering stem from limitations in earlier hardware and APIs. On D3D9 hardware you were pretty limited in terms of the flexibility of your shaders, which meant even something as simple as looping over a list of lights was difficult to implement in a generic way. Hence you have multipass forward, forward with permutations for each combination of lights affecting an object, multipass deferred, tiled deferred, light prepass, etc. Modern hardware and the shaders that run on it are much more flexible, which allows you to look at things in a more abstract way. For instance, here are some questions you might answer when designing your rendering scheme:

  • How do I decide which lights affect a surface, and at what granularity?
  • What data do I need as inputs to my lighting functions, and how do I feed it that data?
  • What BRDFs need to be supported, and at what granularity am I going to switch between them?

If you're working with D3D11-class hardware it's actually not even that hard to write a simple renderer that lets you switch between various forward/deferred rendering techniques, since you can share so much of the shader code between all of them. You just need to write lighting functions that take two structs: one that has the material/object input parameters, and one that has the light parameters. Then for your different techniques you just pull that data out of G-Buffers or structured buffers or wherever you have them.
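
The "two structs" idea can be sketched outside of shader code. This is only an illustration in Python, not anyone's actual renderer: the struct fields, the Lambert term with a linear falloff, and the function names are all assumptions chosen to keep the example small.

```python
import math
from dataclasses import dataclass


@dataclass
class SurfaceParams:   # hypothetical material/object inputs
    position: tuple
    normal: tuple      # assumed to be unit length
    albedo: tuple


@dataclass
class LightParams:     # hypothetical point light
    position: tuple
    color: tuple
    radius: float


def shade(surface, light):
    """Technique-agnostic lighting: Lambert diffuse with a linear distance falloff."""
    to_light = tuple(l - p for l, p in zip(light.position, surface.position))
    dist = math.sqrt(sum(c * c for c in to_light))
    if dist == 0.0 or dist >= light.radius:
        return (0.0, 0.0, 0.0)          # out of range
    n_dot_l = sum(n * c / dist for n, c in zip(surface.normal, to_light))
    if n_dot_l <= 0.0:
        return (0.0, 0.0, 0.0)          # backfacing
    atten = 1.0 - dist / light.radius
    return tuple(a * c * n_dot_l * atten for a, c in zip(surface.albedo, light.color))


def accumulate(surface, lights):
    total = (0.0, 0.0, 0.0)
    for light in lights:
        total = tuple(t + c for t, c in zip(total, shade(surface, light)))
    return total


def shade_forward(interpolants, lights):
    # Forward: surface data comes straight from the rasterized geometry.
    return accumulate(SurfaceParams(**interpolants), lights)


def shade_deferred(gbuffer, x, y, lights):
    # Deferred: the same data is fetched back from a G-buffer texel.
    return accumulate(SurfaceParams(**gbuffer[(x, y)]), lights)


texel = dict(position=(0.0, 0.0, 0.0), normal=(0.0, 1.0, 0.0), albedo=(1.0, 1.0, 1.0))
lights = [LightParams(position=(0.0, 1.0, 0.0), color=(1.0, 1.0, 1.0), radius=2.0)]
print(shade_forward(texel, lights), shade_deferred({(3, 4): texel}, 3, 4, lights))
```

Only the two thin wrappers differ between the techniques; `shade` never needs to know where its inputs came from.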

I'm just saying, going for top notch lighting/shading, made all engines go deferred on this generation of consoles.

But, not all engines did go deferred...? There's an absolute ton of forward rendered current-gen games, many with superb lighting!


I'm not saying that every game should go one way or another, but that the optimal pipeline will depend on the game (as opposed to, "it's impossible to go forward, forward doesn't work, no engines use forward rendering").

"Top notch" lighting/shading doesn't always mean "thousands of lights" -- like you said, with a racing game maybe you only need a few lights, but you instead need really complex BRDFs (like your Bugatti IOTD), and quite a few different ones at that. That's still "top notch, advanced lighting", despite not having 5000 tiny point lights...

Is your game about 1000 glowing sparks, or 1000 different kinds of paint? Each requires a different "advanced lighting" pipeline...

To take things to the extreme, imagine we've got 1000 lights covering the entire screen (very advanced)

For deferred, we have 1000 passes of the screen, where we read 96-128bytes of G-Buffer data and write out 64bit (8 bytes) of HDR lighting data --- roughly 102-133KiB of bandwidth per pixel, or about 89-117GiB total at 720p (an impossible amount for current gen).

For forward, let's say we can do 10 lights per pass, thus we'd do 100 passes of the screen, where we read 64-96bytes of data (everything we would've written into the G-buffer, except hardware depth as we've got it intrinsically) and write out 64bit (8 bytes) of HDR lighting data --- roughly 7-10KiB of bandwidth per pixel --- about 6-9GiB total at 720p (still an insane amount, but maybe low enough for 2fps).
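
The bandwidth estimate is easy to reproduce. The sketch below counts the 64-bit HDR write as 8 bytes per pass and ignores any read-back for blending; both of those are simplifying assumptions, and the function name `bandwidth` is made up.

```python
PIXELS_720P = 1280 * 720
KIB = 1024
GIB = 1024 ** 3
HDR_WRITE_BYTES = 8  # 64 bits of HDR lighting output per pass (assumed, no blend read)


def bandwidth(passes, read_bytes):
    """Return (KiB per pixel, GiB per 720p frame) for a multipass lighting scheme."""
    per_pixel = passes * (read_bytes + HDR_WRITE_BYTES)
    return per_pixel / KIB, per_pixel * PIXELS_720P / GIB


# 1000 full-screen lights: one pass per light for deferred, 10 lights per pass for forward
cases = [("deferred", 1000, (96, 128)), ("forward", 100, (64, 96))]
for label, passes, read_range in cases:
    for read_bytes in read_range:
        kib, gib = bandwidth(passes, read_bytes)
        print(f"{label}: {read_bytes} B read/pass -> {kib:.1f} KiB/pixel, {gib:.1f} GiB/frame")
```

Whatever the exact byte counts, the ratio is what matters: ten lights per pass means a tenth of the passes, so forward moves roughly an order of magnitude less data in this scenario.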

So both techniques fail miserably with thousands of large lights (though traditional forward actually does better than traditional deferred), but yes, if you want thousands of small lights applied to arbitrary objects, then deferred is a winner simply because it allows you to associate lights with screen-space areas, instead of associating lights with objects themselves.
However, light-index deferred and Forward+ both also use this same screen-space light association technique, but do their actual lighting using forward-rendering (and they're both implementable on current-gen consoles!!), so deferred isn't your only option for these situations.
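
A minimal sketch of that screen-space light association, as a CPU-side illustration only: lights are assumed to be already projected to a pixel-space centre and radius, the 16-pixel tile size is an arbitrary choice, and the function names are hypothetical. A real implementation would do this on the GPU (or SPUs) and would usually also cull against depth.

```python
from collections import defaultdict

TILE = 16  # tile size in pixels (assumed value)


def bin_lights(lights, width, height, tile=TILE):
    """Map each screen tile to the indices of the lights whose bounds touch it.

    lights: list of (cx, cy, radius) in pixels, already projected to the screen.
    Uses the light's bounding square, so the result is conservative.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = defaultdict(list)
    for index, (cx, cy, radius) in enumerate(lights):
        x0 = max(int((cx - radius) // tile), 0)
        x1 = min(int((cx + radius) // tile), tiles_x - 1)
        y0 = max(int((cy - radius) // tile), 0)
        y1 = min(int((cy + radius) // tile), tiles_y - 1)
        for ty in range(y0, y1 + 1):      # empty range if the light is off-screen
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return bins


def lights_for_pixel(bins, x, y, tile=TILE):
    """The light list a pixel would loop over, whether shading forward or deferred."""
    return bins.get((x // tile, y // tile), [])


bins = bin_lights([(8, 8, 4), (100, 100, 20), (-100, 50, 10)], 1280, 720)
print(lights_for_pixel(bins, 10, 10), lights_for_pixel(bins, 100, 100))
```

The per-tile lists are independent of how the lighting itself is done: a forward pixel shader and a deferred lighting pass can both loop over `lights_for_pixel`.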
Also, deferred-lighting ("light pre-pass"), or inferred lighting shouldn't be in the same category as regular deferred shading, as they have advantages/disadvantages from both traditional forward and deferred approaches. They're some of the hybrids that don't easily fit into either traditional black-and-white category. There's a huge number of console games that live in this grey area.
e.g. Uncharted performs deferred-lighting, but only for dynamic lights affecting the environment, and forward renders everything else.
Or, in my last game, we forward rendered several lighting terms, then calculated deferred shadow masks after lighting, then combined the terms/masks in post. That's not traditional forward or deferred, but one of these weird hybrids... We also didn't require lighting to exactly match the environment, so for dynamic objects we constructed light positions per-pixel, like God of War does (except with several resulting lights, instead of merging them all into 1), and we could avoid putting them behind the pixel to save on having to do a backfacing test (every light calculation gave bang for buck). Using this, we could get something that looked like it was lit by a dozen lights with only two light evaluations (and even gave a kind of cheap "ambient BRDF" GI), which was fine for our game.

as you want more advanced lighting, like Crysis, ... you can't go forward on current gen ... Crysis is forward shaded

?!

Shading complexity becomes screen-dependent. This benefit/disadvantage (depending on the application) is shared with Forward+. Assuming just one directional light is used, every pixel is shaded once. In a forward renderer, if you render everything back to front, every pixel covered by a triangle will be shaded multiple times. Hence a deferred shader's time is fixed and depends on screen resolution (so a lower screen res. is an instant win for low end users). A deferred shader/Forward+ cannot shade more than (num_lights * width * height) pixels even if there is an infinite number of triangles, whereas the Forward renderer may shade the same pixel an infinite number of times for an infinite number of triangles, overwriting its previous value. Of course if you're very good at sorting your triangles (chances are the game cannot be that good) a Forward renderer may perform faster; but with a Deferred Shader you're on more stable ground.

This is a bit misleading, because it's standard practice with forward renderers to use a z-pre-pass, so that there isn't any over-draw.

Also, the g-buffer pass of deferred suffers the same issue, which may be a significant cost depending on your MRT set-up and your shaders (e.g. expensive parallax mapping done during the g-buffer pass), but again, you could solve this with a ZPP, if required.

Screen-dependent shading complexity is more important when considering that pixel-shaders are run on 2x2 quads of pixels.

In a deferred (screen-space) lighting pass, an entire model can be lit by drawing a quad (polygon) over the top of it, in which every pixel-quad (2x2 pixels) is processed fully, regardless of the underlying geometry, so your quad efficiency is 100%.

On the other hand, if you do the lighting during forward rendering, then many of your model's triangles will only cover portions of pixel-quads (any edge that isn't aligned to the x/y axis will cut through many quads, partially covering them), which leads to a large amount of wasted shading and forces you to aggressively LOD your models so that you have large triangles. In the worst case, if your models are made up of pixel-sized triangles, then your quad efficiency is only 25%, which means your pixel shaders are effectively 4 times slower than they should be.
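
The 100% vs 25% figures can be illustrated with a toy model. This is a simplified sketch, not how any particular GPU schedules work: it assumes every triangle pays for all four lanes of each 2x2 quad it touches, and the function name and the 8x8 test area are made up.

```python
def quad_efficiency(triangles):
    """Fraction of shaded lanes that produce a visible pixel.

    triangles: list of pixel sets, one per triangle, each holding the (x, y)
    pixels that triangle covers. Each triangle is charged 4 lanes for every
    2x2 quad it touches, even if it only covers part of that quad.
    """
    useful = 0
    shaded = 0
    for covered in triangles:
        covered = set(covered)
        quads = {(x // 2, y // 2) for x, y in covered}
        useful += len(covered)
        shaded += 4 * len(quads)
    return useful / shaded if shaded else 0.0


# One big screen-space polygon covering an 8x8 area: every quad is fully used.
screen_quad = [[(x, y) for x in range(8) for y in range(8)]]
# The same 8x8 area covered by 64 pixel-sized triangles.
tiny_triangles = [[(x, y)] for x in range(8) for y in range(8)]

print(quad_efficiency(screen_quad), quad_efficiency(tiny_triangles))
```

Real meshes land somewhere between the two extremes, depending on triangle size and how many edges cut through quads.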

This has been a fascinating topic, and I commend everyone contributing to it! Warning, what follows is a somewhat off topic rant

Since it wasn't clear, I was arguing for deferred simply to handle most worst case scenarios in games today, not that it was the most efficient always, and yes forward/hybrid/etc. may always have scenarios where X is faster than Y. This generation, now closing, has seen an extreme need for optimization of resources for each game. Programmer time and talent has been, and will always be another consideration, but pushing good visuals versus limited resources has eaten up more and more consideration as time has gone on. So optimizing what you are doing for each game has been a priority, including refactoring something like lighting each time if need be.

This next generation however, I believe, will be different. Certainly more compute power will always be usable, pretty much into infinity. But the biggest constraint I can see is artist time. It's already been a constraint with modern games, and can only get worse now that a thousand materials can be supported on models of a hundred thousand faces and more, not to mention all the advanced animation rigging for things like cloth physics, hair physics, skin and muscle simulation, etc. that can be done.

Which is simply why I foresee less time spent on refactoring rendering and more time making better tools for artists. Certainly, there are going to be cases still where forward/deferred/hybrid approaches are better; and as MJP pointed out you can have a more generalized pipeline far more easily now, which is great! But anything that makes the job easier and faster for artists, at least in my view, should be given priority. Which is a reason why I've viewed the use of multiple BRDFs with skepticism. The less the artist has to learn, and thus the more time spent actually making things, the better. And while certainly better can be done than Blinn-Phong, I'd simply rather not even give most artists the opportunity to sit there and switch between Beckmann to GGX to Cook-Torrance to etc. just to see what each did to "Get it right."

And I know there are ways to mitigate that. Suggestions I've seen range from pre-defined materials with correct values to not letting artists screw things up, etc. etc. All take time, and I'm mostly thinking way too hard about efficiency I suppose. So in short I'd rather hope for, in general, far less refactoring each game, as little complexity (and thus time and effort) added to the artists' pipeline as possible, and more effort on all those very neat tools and research into such things this go-around. After all, I enjoy playing games as well, and would love to see them be as good as possible. And it makes more sense to me to take the Hollywood/offline VFX path of late, which is trying to make making things faster and cheaper, rather than trying harder to make things look better. So I'd rather have a deferred renderer capable of handling the worst case gameplay scenarios with a generalized BRDF that's good enough at doing multiple materials. But that's a far away thought I suppose.

I.E. Here's what I'm talking about, even Hollywood and Disney have this problem and are trying to solve it: http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_slides_v2.pdf

In fact, their BRDF sounds pretty good! "As few parameters as possible, 0-1 range, all combinations of parameters plausible" Mmmm yeah that's the stuff. One BRDF to rule them all, one BRDF to find them, one deferred renderer to bring them all, and in screenspace bind them!

Which is a reason why I've viewed the use of multiple BRDFs with skepticism. The less the artist has to learn, and thus the more time spent actually making things, the better. And while certainly better can be done than Blinn-Phong, I'd simply rather not even give most artists the opportunity to sit there and switch between Beckmann to GGX to Cook-Torrance to etc. just to see what each did to "Get it right."

This isn't the problem you think it is; the BRDFs/shaders are set up by tech artists (in association with rendering programmers), who are not the same guys doing the models/animation/rigging, which is what takes the time. Those guys are handed shaders and told 'use these', so they won't be swapping from one function to another to 'get it right'.
