Revival of Forward Rendering?

33 comments, last by Hodgman 12 years ago
I saw AMD's "Leo" tech demo the other day and was pretty amazed by it. I was also surprised to see that it hadn't yet been mentioned on gamedev.net (that I can find). Here are the links.

AMD's Website
Youtube Video

They say that they are using a forward renderer that uses compute shaders to cull lights on a tile basis, which reminded me of this thread I saw on this site a while back.

I'm really interested to see just how feasible this is and how it stacks up head-to-head against tile-based deferred rendering in a strictly D3D11-based engine.
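For anyone who hasn't seen the idea before, the core of the technique is simple enough to sketch on the CPU. The following C++ snippet is only an illustration of what the compute shader does per screen tile (the names, structs, and the sphere-vs-frustum test are mine, not anything from AMD's demo): build a frustum per tile from the camera and the tile's min/max depth, then keep the indices of the point lights whose bounding spheres touch that frustum, so the forward shaders only loop over that short list.

// Illustrative CPU-side sketch of per-tile light culling; on the GPU this
// runs as one compute shader thread group per tile, accumulating indices
// in groupshared memory. All names here are placeholders.
#include <cstdint>
#include <vector>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };              // dot(n, p) + d >= 0 means "inside"

struct PointLight  { Vec3 positionVS; float radius; };   // view-space position + falloff radius
struct TileFrustum { Plane planes[6]; };                 // 4 side planes + near/far from the tile's depth bounds

static float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// A light's bounding sphere overlaps the tile if it is not fully behind any plane.
static bool SphereIntersectsTile(const PointLight& l, const TileFrustum& f)
{
    for (const Plane& p : f.planes)
        if (Dot(p.n, l.positionVS) + p.d < -l.radius)
            return false;
    return true;
}

// For each tile, gather the indices of the lights that can affect it;
// the forward shaders then loop over only this short list per pixel.
std::vector<std::vector<uint32_t>> CullLightsPerTile(
    const std::vector<PointLight>& lights,
    const std::vector<TileFrustum>& tileFrustums)
{
    std::vector<std::vector<uint32_t>> lightLists(tileFrustums.size());
    for (size_t t = 0; t < tileFrustums.size(); ++t)
        for (uint32_t i = 0; i < lights.size(); ++i)
            if (SphereIntersectsTile(lights[i], tileFrustums[t]))
                lightLists[t].push_back(i);
    return lightLists;
}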
Did it ever die? There are specific use cases for which each type of renderer is more suited than the other. What's most interesting about this is the use of a CS for work that would be more traditionally done CPU-side, but I wonder how well that would balance out in a real in-game scene. Thanks for the heads-up anyway, I'll be checking this out in more detail later on. :)

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

You may be interested in this paper: http://www.cse.chalmers.se/~olaolss/papers/tiled_shading_preprint.pdf
It spends some time comparing a tile-based deferred shading implementation with a tile-based forward shading implementation.
The short answer is, it doesn't, not from a development point of view. Tile-based light culling was originally developed for deferred shading, which is only getting better. And with the next generation of consoles no doubt having plenty of RAM and memory bandwidth, there's no reason (at the moment) to waste time rendering the geometry twice when you can just shove everything you need into G-Buffers and render it once.

In fact, deferred lighting is probably going the same way for the same reason: both forward rendering and deferred lighting were only ever used to minimize memory and bandwidth use, a precious commodity on the 360 and PS3. But with even mobile devices shoving past them now, what's the point? Might as well use up that available RAM and double your polycount, throw in a ton of shadow-mapped lights, or something else of the kind.

As for AA, there was a (very recent) paper on minimal-cost MSAA while doing deferred shading, similar to the cost of forward rendering. And of course you can use temporal and/or morphological AA as well. I'm sure you could forward-render transparent stuff using the same tile-based scheme while going deferred, but deferred shading definitely seems to me to be the way to go.
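To make the bandwidth argument concrete, here is roughly what "shove everything into G-Buffers" ends up meaning. This is just a made-up minimal layout sketched as a C++ struct (every engine packs its targets differently), not the layout of any particular engine:

// A made-up minimal G-Buffer layout, shown unpacked as a plain struct:
// the geometry pass writes these per pixel, the lighting pass reads them back.
// Real engines pack harder (two-channel normals, material IDs, etc.).
#include <cstdint>

struct GBufferTexel
{
    // RT0: RGBA8 - albedo colour + specular intensity
    uint8_t albedoR, albedoG, albedoB, specIntensity;

    // RT1: e.g. RGB10A2 - surface normal (shown here as plain floats)
    float normalX, normalY, normalZ;

    // RT2: RGBA8 - roughness, specular power, emissive, material flags
    uint8_t roughness, specPower, emissive, flags;

    // Position is reconstructed from the depth buffer, so it needs no extra target.
};

The bandwidth people argue about is essentially the size of those targets times the resolution, written once in the geometry pass and read back during lighting; forward rendering trades that for evaluating lighting while the geometry is still being rasterized.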

The short answer is, it doesn't, not from a development point of view. Tile-based light culling was originally developed for deferred shading, which is only getting better. And with the next generation of consoles no doubt having plenty of RAM and memory bandwidth, there's no reason (at the moment) to waste time rendering the geometry twice when you can just shove everything you need into G-Buffers and render it once.
Let's forget for a moment that the G-Buffer is typically bigger. An NV40 could render about 60 point lights per pass; I have difficulty understanding the need to render twice.

The only major advantage of deferred tech is the modularity of light processing compared to material rendering, but this comes at a considerable cost: no one I've talked to has understood the need to write shaders that put stuff in different buffers... and don't even get me started on packing.

So, in my opinion, the advantages are still unproven. I suppose UE3 and Samaritan make this clear. Flexible Forward can emulate Deferred at a reduced cost... not vice versa.

I look forward to reading what mhagain will write on this.

Previously "Krohm"

So I've downloaded the full demo and run it a few times. I deliberately chose a fairly low-specced machine to see how viable the technique is on the kind of hardware that would be considered commodity today, and the short answer is - it's not.

Reminds me of the time I first got a 3DFX and - naturally - immediately grabbed GLQuake to test it out. Of course I neglected to pop in the 3DFX opengl32.dll file so I ended up drawing through the Windows software implementation at 1fps.

Obviously AMD feel that they've got something special with their new kit, and they want to show off its capabilities to best effect by taking a sub-optimal technique and making it realtime. More power to them, and I wish them well with it. Maybe in two years' time, when this level of hardware is the commonplace average, this might be an approach worth considering, but for now there seem to be better things to burn your GPU budget on.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

"Specifically, this demo uses DirectCompute [...] per-pixel or per-tile list of lights that forward-render based shaders use for lighting each pixel."

Now call me a cynic, but is this not pretty much exactly what Damian Trebilco did five years ago, on hardware three generations older, using nothing but the CPU and a stencil buffer trick?

Admittedly, Damian's demo with that horse model inside a room was not quite as artistic. The ATI demo sure is kind of funny, with a nice story, well-done animations, and it looks quite good, but honestly I couldn't say it really looks a class better than a thousand other demos (in fact, all of the characters look quite "plastic-like", though of course the ATI guys will claim that this is intentional). As opposed to that, and unlike a thousand other demos, it requires the latest, fastest hardware to run...
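For reference, the data flow in Damian's light-indexed approach looks roughly like the following. This is a CPU-style C++ sketch with invented names, meant only to show the idea (a short per-pixel list of light indices consumed by a forward-style shader), not his actual implementation; the compute-shader variant changes how that list is built, not how it is consumed.

// CPU-style sketch of light-indexed shading (invented names): a first pass
// stores up to four light indices per pixel (packed into one RGBA8 texel in
// the original paper); the forward pass then evaluates only those lights.
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct PointLight { Vec3 position; Vec3 color; float radius; };

constexpr uint8_t kNoLight           = 0xFF;  // sentinel for an unused slot
constexpr int     kMaxLightsPerPixel = 4;

using PixelLightIndices = std::array<uint8_t, kMaxLightsPerPixel>;

// Trivial distance-attenuated diffuse term, standing in for the real BRDF.
static Vec3 EvaluateLight(const PointLight& l, const Vec3& p, const Vec3& n)
{
    Vec3 toL{ l.position.x - p.x, l.position.y - p.y, l.position.z - p.z };
    float dist  = std::sqrt(toL.x * toL.x + toL.y * toL.y + toL.z * toL.z);
    float ndotl = std::max(0.0f, (n.x * toL.x + n.y * toL.y + n.z * toL.z) / (dist + 1e-6f));
    float atten = std::max(0.0f, 1.0f - dist / l.radius);
    return { l.color.x * ndotl * atten, l.color.y * ndotl * atten, l.color.z * ndotl * atten };
}

// What the forward shader conceptually does for one pixel: walk the short
// per-pixel index list instead of looping over every light in the scene.
Vec3 ShadePixel(const PixelLightIndices& indices,
                const std::vector<PointLight>& allLights,
                const Vec3& position, const Vec3& normal)
{
    Vec3 result{ 0.0f, 0.0f, 0.0f };
    for (uint8_t idx : indices)
    {
        if (idx == kNoLight)
            break;                              // the list is packed front-to-back
        Vec3 c = EvaluateLight(allLights[idx], position, normal);
        result.x += c.x; result.y += c.y; result.z += c.z;
    }
    return result;
}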
I've done some simple tests (Sponza with 128 unshadowed point lights) and it's definitely feasible. On a GTX 570 at 1920x1080 I get 1.1ms for filling the G-Buffer and 3.5ms for the lighting with a tiled deferred approach, while with an indexed deferred approach I get 0.75ms for computing the lights per tile and 4.5ms for forward rendering (both using a z-only prepass). So at 4.6ms vs. 5.25ms it's not too far off in terms of performance. But of course you really need to know how well it scales with:

1. More lights
2. More sophisticated light types (spot, directional, shadows, gobos, etc.)
3. Different BRDFs/material types
4. Dense, complex geometry
5. MSAA (in my test scene, 4xMSAA brings the forward rendering pass up to 5.25ms)

You'd also want to compare how well a deferred renderer scales with lots of G-Buffer terms, especially with dense geometry. Unfortunately I don't have the time at the moment to thoroughly evaluate all of those things, but it does at least seem like a viable alternative to traditional deferred rendering. But I'm not sure if it would ever beat a good tiled deferred implementation outright. It would definitely make certain things a lot easier, since you wouldn't have to worry about packing things into a G-Buffer or special-case handling of different material types in the lighting shader.
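For anyone who wants to reproduce that kind of test: the output of the per-tile culling pass is commonly just a flat light-index buffer plus an offset and count per tile, and the forward pass maps each pixel to its tile before looping. Here's a small C++ sketch of that layout (the names and the 16x16 tile size are my own assumptions, not taken from any particular sample):

// Sketch of a flat per-tile light list and its lookup (invented names,
// 16x16 tiles assumed): the culling pass writes one contiguous run of light
// indices per tile, the forward pass maps a pixel to its tile and walks that run.
#include <cstdint>
#include <vector>

constexpr uint32_t kTileSize = 16;

struct TileLightLists
{
    std::vector<uint32_t> lightIndices;  // all tiles' light indices, packed back to back
    std::vector<uint32_t> tileOffset;    // first index in lightIndices for each tile
    std::vector<uint32_t> tileCount;     // number of lights for each tile
    uint32_t              tilesX = 0;    // tiles per row = ceil(screenWidth / kTileSize)
};

// Which run of indices does this pixel use?
inline void GetLightsForPixel(const TileLightLists& lists,
                              uint32_t px, uint32_t py,
                              const uint32_t** outIndices, uint32_t* outCount)
{
    const uint32_t tile = (py / kTileSize) * lists.tilesX + (px / kTileSize);
    *outIndices = lists.lightIndices.data() + lists.tileOffset[tile];
    *outCount   = lists.tileCount[tile];
}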

@mhagain: there's a lot more going on in that demo than just indexed deferred rendering. For instance they use PTex, and a VPL-based GI solution.

@samoth: it's a variant of light indexed deferred, and they even say as much in their presentation. The technique just becomes a lot more practical when you can generate the light list per-tile in a compute shader, rather than having to do all of the nasty hacks required by the original implementation.

@mhagain: there's a lot more going on in that demo than just indexed deferred rendering. For instance they use PTex, and a VPL-based GI solution.

The bounced lights are the most interesting thing in it to me; my obsession with lighting is not shadows but brightness, and that one tickles my fancy.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Sorry for the bump, but I put up a blog post with some performance numbers and a sample app. Feel free to use it for your own experiments.

This topic is closed to new replies.
