I'm just saying, going for top notch lighting/shading is what made all engines go deferred on this generation of consoles.
But, not all engines did go deferred...? There's an absolute ton of forward rendered current-gen games, many with superb lighting!
I'm not saying that every game should go one way or another, but that the optimal pipeline will depend on the game (as opposed to, "it's impossible to go forward, forward doesn't work, no engines use forward rendering").
"Top notch" lighting/shading doesn't always mean "thousands of lights" -- like you said, with a racing game maybe you only need a few lights, but you instead need really complex BRDFs (like your Bugatti IOTD), and quite a few different ones at that. That's still "top notch, advanced lighting", despite not having 5000 tiny point lights...
Is your game about 1000 glowing sparks, or 1000 different kinds of paint? Each requires a different "advanced lighting" pipeline...
To take things to the extreme, imagine we've got 1000 lights covering the entire screen (very advanced)
For deferred, we have 1000 passes over the screen, where each pass reads 96-128 bytes of G-buffer data and writes out 8 bytes (64 bits) of HDR lighting data --- roughly 102-133KiB of bandwidth per pixel, or on the order of 100GiB total at 720p (an impossible amount for current gen).
For forward, let's say we can do 10 lights per pass, so we'd do 100 passes over the screen, where each pass reads 64-96 bytes of data (everything we would've written into the G-buffer, except hardware depth, which we get intrinsically) and writes out 8 bytes of HDR lighting data --- roughly 7-10KiB of bandwidth per pixel --- around 6-9GiB total at 720p (still an insane amount, but maybe low enough for a couple of frames per second).
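As a sanity check, that back-of-the-envelope arithmetic can be redone in a few lines. The byte counts per pass are the same rough assumptions as above (treating the 64-bit HDR write as 8 bytes), not measurements of any real hardware:

```python
# Rough bandwidth model for the 1000-light worst case (illustrative only).
# Assumed costs: deferred reads 96-128 B of G-buffer per pixel per pass,
# forward reads 64-96 B of material data, both write 8 B (64-bit) of HDR.

PIXELS_720P = 1280 * 720  # 921,600 pixels

def per_pixel_bytes(passes, read_bytes, write_bytes=8):
    """Total bytes touched per pixel across all lighting passes."""
    return passes * (read_bytes + write_bytes)

def total_gib(per_pixel):
    """Whole-screen traffic at 720p, in GiB."""
    return per_pixel * PIXELS_720P / 2**30

# Deferred: 1000 full-screen passes, one per light.
deferred_lo = per_pixel_bytes(1000, 96)    # 104,000 B ~= 101.6 KiB/pixel
deferred_hi = per_pixel_bytes(1000, 128)   # 136,000 B ~= 132.8 KiB/pixel

# Forward: 10 lights per pass -> 100 passes.
forward_lo = per_pixel_bytes(100, 64)      # 7,200 B  ~= 7.0 KiB/pixel
forward_hi = per_pixel_bytes(100, 96)      # 10,400 B ~= 10.2 KiB/pixel

print(f"deferred: {total_gib(deferred_lo):.0f}-{total_gib(deferred_hi):.0f} GiB/frame")
print(f"forward:  {total_gib(forward_lo):.0f}-{total_gib(forward_hi):.0f} GiB/frame")
```

Either way the conclusion stands: deferred is an order of magnitude more bandwidth in this pathological case, and both are far beyond a current-gen frame budget.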
as you want more advanced lighting, like Crysis, ... you can't go forward on current gen ... Crysis is forward shaded
Shading complexity becomes screen-dependent. This benefit/disadvantage (depending on the application) is shared with Forward+. Assuming just one directional light is used, every pixel is shaded once. In a forward renderer, if you render everything back to front, every pixel covered by a triangle will be shaded multiple times. Hence a deferred shader's time is fixed and depends on screen resolution (so a lower screen resolution is an instant win for low-end users). A deferred shader/Forward+ cannot shade more than (num_lights * width * height) pixels even if there are an infinite number of triangles, whereas the forward renderer may shade the same pixel an infinite number of times for an infinite number of triangles, overwriting its previous value. Of course, if you're very good at sorting your triangles (chances are the game can't be that good), a forward renderer may perform faster; but with a deferred shader you're on more stable ground.
This is a bit misleading, because it's standard practice with forward renderers to use a z-pre-pass, so that there isn't any over-draw.
Also, the G-buffer pass of deferred suffers from the same issue, which may be a significant cost depending on your MRT set-up and your shaders (e.g. expensive parallax mapping done during the G-buffer pass) -- but again, you could solve this with a ZPP, if required.
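To make the z-pre-pass point concrete, here's a toy software sketch of shader invocations at a single pixel, with and without a depth-only first pass. The depth-test model is deliberately simplified and all names are illustrative:

```python
# Count how many times an expensive pixel shader would run at one pixel,
# given several overlapping fragments submitted back to front.

def shade_count(fragment_depths, prepass=False):
    shades = 0
    depth_buffer = float("inf")
    if prepass:
        # Pass 1: depth only -- cheap, no pixel shading.
        depth_buffer = min(fragment_depths)
        # Pass 2: with depth already resolved, only the nearest
        # fragment passes the test and gets shaded.
        for z in fragment_depths:
            if z <= depth_buffer:
                shades += 1
    else:
        # Single pass with a LESS depth test: back-to-front order means
        # every fragment passes, gets shaded, then is overwritten.
        for z in fragment_depths:
            if z < depth_buffer:
                shades += 1
                depth_buffer = z
    return shades

frags = [0.9, 0.7, 0.5, 0.3]  # overdraw depth of 4, back to front
print(shade_count(frags))                # 4 shades without a pre-pass
print(shade_count(frags, prepass=True))  # 1 shade with a pre-pass
```

The pre-pass trades an extra (cheap, depth-only) geometry pass for the guarantee that the expensive shading runs at most once per pixel -- the same guarantee the G-buffer pass of a deferred renderer relies on.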
Screen-dependent shading complexity is more important when you consider that pixel shaders are run on 2x2 quads of pixels.
In a deferred (screen-space) lighting pass, an entire model can be lit by drawing a quad (polygon) over the top of it, in which every pixel-quad (2x2 pixels) is processed fully, regardless of the underlying geometry, so your quad efficiency is 100%.
On the other hand, if you do the lighting during forward rendering, then many of your model's triangles will only cover portions of pixel-quads (any edge that isn't aligned to the x/y axis will cut through many quads, partially covering them). This leads to a large amount of wasted shading and forces you to aggressively LOD your models so that you have large triangles. In the worst case, if your models are made up of pixel-sized triangles, then your quad efficiency is only 25%, which means your pixel shaders are effectively 4 times slower than they should be.
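That 25% worst case falls out of a one-line model (purely illustrative -- real GPUs differ in how they pack and schedule quads):

```python
# 2x2 quad-efficiency model: the GPU shades pixels in 2x2 quads, and a
# quad only partially covered by a triangle still runs the shader for
# all 4 lanes (the uncovered ones become helper invocations).

def quad_efficiency(covered_pixels_per_quad):
    """Fraction of shader work that lands on visible pixels."""
    return covered_pixels_per_quad / 4.0

def effective_cost_multiplier(covered_pixels_per_quad):
    """How much slower shading effectively runs due to wasted lanes."""
    return 1.0 / quad_efficiency(covered_pixels_per_quad)

# Full-screen deferred pass: every interior quad is fully covered.
print(quad_efficiency(4), effective_cost_multiplier(4))  # 1.0, 1x

# Worst-case forward: pixel-sized triangles, 1 covered pixel per quad.
print(quad_efficiency(1), effective_cost_multiplier(1))  # 0.25, 4x
```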