the more advanced games get, the more likely they are to go deferred. the reason is that forward rendering can't deliver that number of light-surface interactions fast enough. as you said, deferred would seem to be more demanding, yet it's the only way to go if you want flexibility.
What's 'advanced' mean? Huge numbers of dynamic lights? You can do just as many lights with forward as long as you've got a decent way of solving the classic issue of determining which objects are affected by which lights. Actually, the whole point of tiled-deferred was that it was trying to reduce lighting bandwidth back down to what we had with forward rendering, while keeping the "which light for which object" calculations in screen-space on the GPU.
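To make the "which light for which tile" idea concrete, here is a minimal CPU-side sketch of screen-space light binning. The `Light` tuple, the 16-pixel tile size and the name `bin_lights` are illustrative assumptions, not taken from any engine mentioned in this thread; a real tiled renderer would do this on the GPU and would typically also cull against per-tile depth bounds.

```python
from collections import namedtuple

# Hypothetical light description: screen-space centre and radius, in pixels.
Light = namedtuple("Light", "x y radius")

def bin_lights(lights, width, height, tile=16):
    """Map each screen tile (tx, ty) to the indices of lights whose
    screen-space bounding box overlaps that tile."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {}
    for index, light in enumerate(lights):
        # Tile range covered by the light's bounding box, clamped to the screen.
        x0 = max(int((light.x - light.radius) // tile), 0)
        x1 = min(int((light.x + light.radius) // tile), tiles_x - 1)
        y0 = max(int((light.y - light.radius) // tile), 0)
        y1 = min(int((light.y + light.radius) // tile), tiles_y - 1)
        # Fully off-screen lights produce an empty range and are skipped.
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins

example_bins = bin_lights([Light(8, 8, 4), Light(16, 16, 4)], 64, 64)
```

Each tile's shader then only loops over its own short list, whether the surface data comes from a G-buffer (tiled-deferred) or from the object being drawn (tiled-forward).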
advanced means the tech puts no limits on light-surface interactions. deferred shading has a lot of 'points', not just this one:
-you had to reduce shader combination counts. you can imagine, even if your forward solution were fast enough, you could have 0 to 100 lights affecting a surface, which means you need roughly 100 times the permutations of a shader library that isn't small already. (and no, sadly dynamic branching is not a solution on current gen HW, and no, even static branching is not a solution, as your shader will grow by some % and your register usage will increase as well, and we graphics coder guys don't want to pay those ms that we could spend elsewhere. yes, it's a performance reason)
-complexity of light resources. there are some simple lights, some area lights, some projector lights, some shadow-mapped lights, there is a sun, there are light streaks (e.g. particles, laser beams). if you wanted to go forward, you'd need to index into all the needed resources, like textures and constants, and current gen hw doesn't really support that. creating atlases is also not very feasible: you'd need to spend a lot of time moving memory to re-arrange data per object drawn (and you'd still face tight limits on current gen).
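A rough way to put numbers on the two points above. The base shader count and the number of light types below are made-up inputs; only the 0 to 100 light range comes from the post. The second function assumes, purely for illustration, that every distinct mix of light types needs its own compiled variant.

```python
import math

def variants_by_light_count(base_shaders, max_lights):
    """One compiled variant per base shader for each light count 0..max_lights."""
    return base_shaders * (max_lights + 1)

def variants_by_light_mix(light_types, max_lights):
    """Number of distinct unordered mixes of up to max_lights lights when each
    light is one of light_types kinds (multisets of size 0..max_lights)."""
    return math.comb(max_lights + light_types, light_types)

# Hypothetical library of 500 base shaders, 0..100 identical lights each.
count_only = variants_by_light_count(500, 100)
# Hypothetical 3 light types, at most 4 lights on a surface.
mixes = variants_by_light_mix(3, 4)
```

Even with small limits, the mix count grows much faster than the plain light count, which is why per-object forward shaders get capped at a handful of lights.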
you can find some more reasons people went deferred in:
If your environment is static, then you can bake all the lighting (and probes) and it'll be a ton faster than any other approach!
Most console games are still using static, baked lighting for most of the scene, which reduces the need for huge dynamic light counts.
and even those engines that decimate a vast count of lights this way, like UE3 using Lightmass, have trouble applying those lights to dynamic objects. in UE3 they use spherical harmonics to combine them, just like KZ2 does for baked lights. lightmaps are really just orthogonal to forward/deferred.
AFAIK the realtime shadows in UE3 are claimed to be deferred, and that's the only reason why UE3 does not cope well with MSAA.
Another issue with deferred is that it's very hard to do at full 720p on the 360. The 360 only has 10MiB of EDRAM, where your frame-buffers have to live. Let's say you optimize your G-buffer layout so you've got hardware depth/stencil, and two 8888 targets -- that's 3 targets * 4 bytes per pixel * 1280*720, or ~10.5MiB -- that's over the limit and won't fit.
n.b. these numbers are the same as depth/stencil + FP16_16_16_16, which also makes forward rendering or deferred light accumulation difficult in HDR...
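The arithmetic behind both layouts can be checked with a short sketch. The helper name `framebuffer_bytes` is made up for this example; the formats and the 10MiB figure are the ones quoted above.

```python
MIB = 1024 * 1024
EDRAM_BYTES = 10 * MIB  # EDRAM size quoted above for the 360

def framebuffer_bytes(width, height, bytes_per_pixel_per_target):
    """Total size of a set of render targets that all cover the full frame."""
    return width * height * sum(bytes_per_pixel_per_target)

# Depth/stencil (4 bytes) + two 8888 colour targets (4 bytes each).
gbuffer = framebuffer_bytes(1280, 720, [4, 4, 4])
# Depth/stencil (4 bytes) + one FP16_16_16_16 target (8 bytes).
hdr = framebuffer_bytes(1280, 720, [4, 8])

print(f"G-buffer: {gbuffer / MIB:.2f} MiB, HDR: {hdr / MIB:.2f} MiB, "
      f"over budget by {(gbuffer - EDRAM_BYTES) / 1024:.0f} KiB")
```

Both layouts come to 12 bytes per pixel, so they overshoot the 10MiB budget by the same amount at 1280x720.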
exactly, yet another reason why going deferred on 360 is a very unfavorable idea. why would anyone do that? because the alternative just does not work (for the reasons given above). Sure, if you make a racing game like Gran Turismo, with just one light source and maybe some spherical harmonics evaluation in the VS for nicer ambient/radiosity, there's no reason to go deferred. even an outdoor shooter like Just Cause can live with forward I guess. but as soon as you want more advanced lighting, like Gears of War, GTA, Crysis, Stalker, ... you can't go forward on current gen. next gen, I imagine something like what AMD did in LEO is very doable.
Sure, Crysis, Battlefield 3 and Killzone are deferred, but there are probably many more games that use forward rendering, even "AAA" games, like Gears of War (and most other Unreal games), L4D2 (and other Source games), God of War, etc... Then there are the games that have gone deferred-lighting (LPP) as a half-way choice, such as GTA4 (or many Rockstar games), Space Marine, etc...
Crysis is forward shaded with up to 16 lights per object (check the insane amount of shader space they use ;) ), Crysis 2 is deferred lit like GTA, and UE3 games are neither what we would call deferred nor forward, it's spherical harmonic based like KZ2. Battlefield 3 goes for the (deferred) light indexing/tiling approach. as that seems not to be doable on the RSX, they spend their SPUs on it instead, yet it's the first step towards light indexing, IMO.
Regarding materials, forward is unarguably more flexible -- each object can have unique BRDFs, unique lighting models, and any number of lights. It's just inefficient if you've got lots of small objects (due to shader swapping overhead and bad quad efficiency), or lots of big objects (due to the "which light for which object" calculations being done per-object).
that's the vanilla version, and then the clustered/tiled forward shading comes in ;)
Actually, you mentioned dynamic branches before, but forward rendering doesn't need any; all branches should be able to be determined at compile time. On the other hand, implementing multiple BRDFs in a deferred renderer requires some form of branching (or look-up-tables, which are just as bad).
that would explain why most deferred games on console have just one lighting term. even the nanosuit in Crysis 2 looks like it's missing the anisotropic metal shading of Crysis 1.
the dynamic branching is needed in the first place to skip unneeded light calculations: if you are backfacing, or in shadow, or out of range -> next light. even on my mobile phones this gives a boost if I use a fixed set of lights per drawn object. on DX9 hardware it did skip pixels, but the general overhead of the branching cancelled out the gain (it was like 10 cycles more per shader, 6 due to branching and some more as the loop had the overhead of storing/restoring registers; validated with FX Composer back then).
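For reference, the kind of per-light early-out being discussed looks roughly like this when written as plain CPU code. It is only a sketch of the control flow: the linear falloff and the `(position, range, intensity)` light layout are assumptions made for the example, the shadow test is left out, and whether such branches pay off in a real shader depends on the hardware, as described above.

```python
import math

def shade_point(position, normal, lights):
    """Accumulate Lambert diffuse from point lights, skipping lights that
    cannot contribute. lights is a list of (light_pos, light_range, intensity).
    Returns (total_diffuse, number_of_lights_actually_evaluated)."""
    total = 0.0
    evaluated = 0
    for light_pos, light_range, intensity in lights:
        to_light = [l - p for l, p in zip(light_pos, position)]
        dist = math.sqrt(sum(c * c for c in to_light))
        if dist == 0.0 or dist >= light_range:
            continue  # out of range -> next light
        n_dot_l = sum(n * c for n, c in zip(normal, to_light)) / dist
        if n_dot_l <= 0.0:
            continue  # surface faces away from the light -> next light
        falloff = 1.0 - dist / light_range  # assumed linear falloff
        total += intensity * n_dot_l * falloff
        evaluated += 1
    return total, evaluated

example_lights = [
    ((0.0, 0.0, 2.0), 4.0, 1.0),   # in range, facing the surface
    ((0.0, 0.0, -2.0), 4.0, 1.0),  # behind the surface
    ((0.0, 0.0, 10.0), 4.0, 1.0),  # out of range
]
result = shade_point((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), example_lights)
```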
Also, tiled-deferred and tiled-forward are implementable on current-gen hardware (even DX9 PC if you're careful), so there's no reason we won't see them soon.
As usual, there's no single objectively better pipeline; different games have different requirements, which are more efficiently met with one pipeline or another...
I'm just saying, going for top notch lighting/shading (aka not just radiosity baked into lightmaps, and not just 1 light source in the world plus cubemaps/spherical harmonics for dynamic objects) made all engines go deferred on this generation of consoles. I can't think of any forward engine with lighting competitive with Dead Space, Crysis, GTA, besides maybe God of War, but there you could clearly identify artifacts of lights merged per vertex if you exceeded some count (I'd guess 3 dynamic lights).