Overdraw sucks [deferred renderer]

Started by
13 comments, last by Matias Goldberg 13 years, 4 months ago
Deferred renderer. I first perform a z-pre pass (which turned out to be very fast). Then comes the usual GBuffer pass, which performs some computations for each pixel. In this pass, however, the Z buffer is ON with func=LESSEQUAL.
So I expect the GBuffer pass to be fast regardless of the scene complexity, because rasterization IS fast, and invisible pixels are discarded because the zbuffer is already filled.
But this doesn't happen.
A scene with 100,000 tris is WAY WAY slower than a scene with 5,000 tris. Vertex processing and rasterization are supposed to be fast, and with the zbuffer already filled, each pixel should be computed exactly once.

I even tried rendering a stupid fullscreen plane at a very low depth, causing all the geometry to be discarded. But this doesn't change anything. The GBuffer pass stays slow, depending on the number of tris. It doesn't make sense to me.

I really can't understand WHY !
Could you help ??

THANKS
Some things you can do will disable the early z rejection optimization because of hardware limitations. This is somewhat driver and graphics card dependent, so check their documentation for a list of things that will disable it. In general you should avoid:

- Explicitly writing depth from the shader.
- Alpha testing.
- Calling clip() in the pixel shader.
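
To make the cost of losing early Z concrete, here is a minimal CPU-side sketch (not real GPU code). It assumes a single pixel whose depth was already laid down by a pre-pass, and counts how often the pixel shader would run when the depth test happens before shading versus after it. The names `Fragment` and `countShaderRuns` are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// One fragment aimed at a single pixel; smaller depth means closer to the camera.
struct Fragment {
    float depth;
};

// Counts pixel shader runs for fragments that all land on the same pixel,
// against a depth value already written by a pre-pass (func = LESSEQUAL).
// earlyZ == true : the depth test runs before the shader, so hidden fragments
//                  are rejected without being shaded.
// earlyZ == false: the shader runs first (for example because it writes depth
//                  or calls clip()), so every fragment pays the shading cost.
std::size_t countShaderRuns(const std::vector<Fragment>& fragments,
                            float prepassDepth, bool earlyZ) {
    std::size_t runs = 0;
    for (const Fragment& f : fragments) {
        const bool passesDepthTest = f.depth <= prepassDepth;
        if (earlyZ && !passesDepthTest) {
            continue;  // rejected before shading
        }
        ++runs;  // shader executes; with late Z the depth test happens afterwards
    }
    return runs;
}
```

With four overlapping fragments and the nearest one matching the pre-pass depth, this model shades once with early Z and four times without it, which is the overdraw the pre-pass was supposed to remove.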
You seem to have missed the main disadvantage of an Early Z pre pass: you render all your geometry twice. Twice the batch count, twice the draw calls.
Google "draw call CPU bottleneck". This is the original reason why instancing was invented.

Early Z pre pass only works great when you're GPU bottlenecked, and you're obviously CPU bottlenecked.
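
A rough way to picture this is a toy cost model, sketched below. It assumes the CPU and GPU work in parallel so the slower one sets the frame time, and that a pre-pass doubles both the draw call submission cost and the geometry cost while removing some fraction of the shading cost. `FrameCosts`, `frameMs` and every number you feed in are hypothetical, not measurements.

```cpp
#include <algorithm>

// Hypothetical per-frame cost model; all times are in milliseconds.
struct FrameCosts {
    int    drawCalls;         // draw calls in the main (GBuffer) pass
    double cpuMsPerDrawCall;  // CPU/driver cost of submitting one draw call
    double gpuMsGeometry;     // GPU vertex work + rasterization for one pass
    double gpuMsShading;      // GPU pixel shading cost without a pre-pass
    double shadingSaved;      // fraction of shading a pre-pass removes (0..1)
};

// Returns the modelled frame time: the slower of CPU and GPU.
double frameMs(const FrameCosts& c, bool zPrePass) {
    const int passes = zPrePass ? 2 : 1;  // the pre-pass submits everything twice
    const double cpu = passes * c.drawCalls * c.cpuMsPerDrawCall;
    const double shading =
        zPrePass ? c.gpuMsShading * (1.0 - c.shadingSaved) : c.gpuMsShading;
    const double gpu = passes * c.gpuMsGeometry + shading;
    return std::max(cpu, gpu);
}
```

For example, with 2000 draw calls at 0.01 ms each the CPU side goes from roughly 20 ms to 40 ms once the pre-pass is on, and no amount of saved shading brings the frame time back down. With 100 draw calls and 20 ms of shading, the same pre-pass is a clear win in this model.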

Even then, there are, as Adam_42 pointed out, some limitations that prevent Early Z from working correctly.

Cheers
Dark Sylinc
Get a profiler and figure out where the time is being spent in your frame. There's not enough real data in your post to make any kind of conclusion.
I have a deferred engine setup and I tried an early z pass. The early z made things slower, not faster. I am only doing 40 draw calls per frame, and the early z doubled that to 80, but I still experienced a 30% decrease in frame rate. I am not CPU bound in any way: I only use 7% of a single core.

I believe that early z is a special case for when you have EXTREMELY heavy duty shaders.
Wisdom is knowing when to shut up, so try it.
--Game Development http://nolimitsdesigns.com: Reliable UDP library, Threading library, Math Library, UI Library. Take a look, its all free.
I've never noticed a win doing z pre-pass either. Your shaders have got to be pretty damn serious ime before doing /some/ early z rejection outweighs the extra draw call overhead.

You can probably use instancing to render to z with only a couple of simple shaders (with or without alphatest for example), but that's how my engine worked until I removed it for a second time and swore there wouldn't be a third.
------------------------------Great Little War Game
Early-Z does very much depend on the scene you are rendering and the shader work load you are doing.

If your shader work is low, or you have low overdraw anyway, then an early z-pass isn't going to help matters as you are just pushing more verts for no reason.

If, on the other hand, you have a heavy scene, with lots of overdraw from objects you can't occlusion cull away, well then early z can be a win, more so when coupled with a sane drawing order so that close objects lay down their z values nice and early.
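
The effect of drawing order can be sketched on the CPU with a one-pixel depth buffer. This is only an illustration under simplifying assumptions: every object is opaque, covers the same pixel, and early Z works with func = LESS. `Object`, `shadedCount` and `frontToBack` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// A hypothetical opaque object that covers the single pixel at one depth.
struct Object {
    float depth;  // distance from the camera; smaller is closer
};

// Draws the objects in the given order into a one-pixel depth buffer with
// early Z (func = LESS) and returns how many times the pixel was shaded.
std::size_t shadedCount(const std::vector<Object>& drawOrder) {
    float depthBuffer = std::numeric_limits<float>::max();
    std::size_t shaded = 0;
    for (const Object& o : drawOrder) {
        if (o.depth < depthBuffer) {  // passes the depth test, so it gets shaded
            depthBuffer = o.depth;
            ++shaded;
        }
    }
    return shaded;
}

// Returns a copy of the objects sorted front to back (nearest first).
std::vector<Object> frontToBack(std::vector<Object> objects) {
    std::sort(objects.begin(), objects.end(),
              [](const Object& a, const Object& b) { return a.depth < b.depth; });
    return objects;
}
```

Four objects submitted back to front shade the pixel four times here; the same objects submitted front to back shade it once, even without a separate z pass.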
That's just it though, isn't it? Lots of books and sources say things like draw front to back in one chapter, then back to front in another, and then sorted by shader changes in a third.

My experience says just sort for shader change reduction and be happy - it's the only thing that consistently produces a positive outcome vs the others.
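
As a rough sketch of what sorting for shader change reduction means, the snippet below groups a draw list by shader and counts how many times the bound shader changes. The `DrawItem` layout and the integer ids are assumptions for illustration; a real engine would use its own handles and usually a richer sort key.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw list entry; ids are assumed to be non-negative.
struct DrawItem {
    int shaderId;
    int meshId;
};

// Counts how many times the bound shader changes while walking the list.
std::size_t countShaderSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    int bound = -1;  // nothing bound yet
    for (const DrawItem& item : items) {
        if (item.shaderId != bound) {
            bound = item.shaderId;
            ++switches;
        }
    }
    return switches;
}

// Returns a copy of the list grouped by shader. stable_sort keeps the
// original relative order of items that share a shader.
std::vector<DrawItem> sortByShader(std::vector<DrawItem> items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.shaderId < b.shaderId;
                     });
    return items;
}
```

A list whose shaders alternate 2, 1, 2, 1, 3, 1 needs six shader binds as submitted and three once sorted.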

I'm sure that you can frig a system somewhere that pushes a renderer to a card's theoretical maximums, but in the real world you have to work with what your scene demands. And that's alpha, lots of overdraw, state change minimisation, etc.

I'm still of the opinion that a deferred renderer is the emperor's new clothes tbh. I went this route mainly to see how it felt and I was never happy. Deferred has a lot of advantages but the few drawbacks are just deal-breakers for me. Like the problems with alpha, especially lighting it.

Unlimited shader lengths on DX10+ make ubershaders look very attractive. I guess you can put 20 lights in a one-pass shader and then early z might give a win too. My engine only supports a few lights at once as it tries to maintain feature compatibility with crapper platforms, but it runs like shit off a shovel and alpha has no special casing attached at any point, which is great.
------------------------------Great Little War Game
So, basically you just agreed with what I said; it depends on your scene.

I'd also be very surprised if you found a text saying to draw back to front, apart from in the case of correct alpha blending in certain alpha modes.

As for front to back rendering, and indeed a Z pass in general;



This scene contains many objects with the same shader/material combinations, and quite heavy shaders at that, so a failure to take advantage of early-z (or occlusion culling) would murder the frame rate.
(We also support a lot of lights, thus a win for deferred rendering.)
Quote:Original post by Rubicon
That's just it though, isn't it? Lots of books and sources say things like draw front to back in one chapter, then back to front in another, and then sorted by shader changes in a third.

My experience says just sort for shader change reduction and be happy - it's the only thing that consistently produces a positive outcome vs the others.

I'm sure that you can frig a system somewhere that pushes a renderer to a card's theoretical maximums, but in the real world you have to work with what your scene demands. And that's alpha, lots of overdraw, state change minimisation, etc.

I'm still of the opinion that a deferred renderer is the emperor's new clothes tbh. I went this route mainly to see how it felt and I was never happy. Deferred has a lot of advantages but the few drawbacks are just deal-breakers for me. Like the problems with alpha, especially lighting it.

Unlimited shader lengths on DX10+ make ubershaders look very attractive. I guess you can put 20 lights in a one-pass shader and then early z might give a win too. My engine only supports a few lights at once as it tries to maintain feature compatibility with crapper platforms, but it runs like shit off a shovel and alpha has no special casing attached at any point, which is great.


There was a thread in Graphics Programming & Theory about deferred rendering, and I made this great big list of pros and cons. When my coworker read it he said to me "you forgot the most important one: it's sexy!" Even though a lot of people go deferred for the wrong reasons (it's trendy), it doesn't mean that all of its benefits are suddenly moot. You might think right now that in DX10 you can just scale up to 20 lights per object and everything will be hunky dory, but it won't be. You either end up with stupidly expensive shaders, or a shader permutation nightmare, and/or abysmal efficiency regarding wasted shaded fragments per light. You already mentioned how important reducing state changes and maximizing batching can be on the PC... try throwing 20 lights into the mix and see what that does for your batching.
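
The "wasted shaded fragments per light" point can be put into numbers with a very crude model, sketched below. It assumes a one-pass forward shader evaluates every bound light for every shaded fragment, while a deferred renderer evaluates each light only for the pixels its volume covers. Both function names and any inputs are hypothetical, and the model ignores GBuffer bandwidth entirely.

```cpp
#include <cstddef>
#include <vector>

// Forward, one-pass shader: every shaded fragment loops over all lights
// bound to its object, whether or not a given light actually reaches it.
std::size_t forwardLightEvals(std::size_t shadedFragments,
                              std::size_t lightsPerObject) {
    return shadedFragments * lightsPerObject;
}

// Deferred: each light is evaluated only for the pixels its volume covers.
std::size_t deferredLightEvals(const std::vector<std::size_t>& pixelsCoveredPerLight) {
    std::size_t total = 0;
    for (std::size_t pixels : pixelsCoveredPerLight) {
        total += pixels;
    }
    return total;
}
```

With 1000 shaded fragments and 20 lights, the forward model does 20000 light evaluations. If each of those 20 lights only covers 100 pixels on screen, the deferred model does 2000.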

As for alpha, it's true that deferred rendering can't handle it without extending the technique in some way. But at the same time it also doesn't prevent you from doing it the old-fashioned way, so I don't think it's a deal breaker. It's not like MSAA, where if you want it you *have* to specifically handle it throughout the pipeline, and suffer worse performance. Besides, it's not like alpha-blended materials are all roses in forward renderers, since you can't multipass the lighting.


