Screen-space ambient occlusion

Ysaneya


Review of the rendering pipeline

I discussed the rendering pipeline system at length in a previous dev journal ( a year ago ), but I'll give a small summary for people who aren't fully familiar with it.

The whole rendering system is based on "pipes". Each pipe is a component dedicated to rendering something on the screen ( or processing something for rendering ). The pipeline is a tree of pipes: each pipe has one parent and any number of child pipes.

Each render pipe has two main methods: setup() and render(). The setup() operation executes the CPU work required for further rendering. This step isn't limited to CPU work though: it can also render to off-screen textures for special effects ( ex.: shadow-mapping requires rendering the scene to a depth buffer from the light's point of view ). The render() method, of course, renders something on the screen.
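To make the structure concrete, here is a minimal sketch of what such a pipe interface could look like ( the name IRenderPipe and the addChild() helper are illustrative, not the engine's actual code ):

// A minimal sketch, not the engine's actual code: names like IRenderPipe
// and addChild() are hypothetical.
#include <vector>

class IRenderPipe
{
public:
    virtual ~IRenderPipe() {}

    // CPU-side preparation; may also render to off-screen textures
    // ( ex.: a shadow-map pass from the light's point of view ).
    virtual void setup() = 0;

    // Renders this pipe's contribution to the screen.
    virtual void render() = 0;

    // The pipeline is a tree: each pipe can have any number of children.
    void addChild( IRenderPipe* child ) { m_children.push_back( child ); }

protected:
    std::vector<IRenderPipe*> m_children;
};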

Examples of basic pipes are CScenePipe and CStandardPipe. CScenePipe is in charge of determining what part of the scene-graph has to be displayed, and in which order ( this pipe groups rendering batches by shader technique for example, to minimize texture or state switching ). It then stores a list of "objects to render". CStandardPipe takes a CScenePipe and renders this list of objects on the screen, by setting the states, shaders, textures, matrices, etc.

CScenePipe + CStandardPipe allow you to render something on screen with some shaders, but you get nothing more out of it than what you could see, say, in ASEToBin.

Then come the special effects. In the past ( see previous dev journals ), I've worked on shadow maps ( CShadowmapPipe ), a glow effect ( CBloomPipe ), high-dynamic range ( CHDRIPipe ), motion blur ( CMotionBlurPipe ), depth-of-field ( CDepthOfFieldPipe ), distortions ( CDistortionPipe ), and I'm probably forgetting a few.

It should be clear now what the problem is: this design leads to a proliferation of render pipes. Basically, for each effect, I need 1 or 2 additional pipes.

When implementing those effects, I felt uneasy, and only recently realized why: most of the time, all the pipes had a similar structure, rendering something to one or more textures, setting shader constants, etc. Last week I decided it was a good time to review and clean up the pipes, and to see whether a better approach could stop this proliferation.

After a lot of experimenting, I found a solution involving only two pipes. Each of them is pretty complex, since it has to support a lot of code paths.

The first one is CRenderTexturePipe. This pipe renders all its children into one or many textures, and has these abilities:
- one can declare many textures to render into ( color buffers and depth buffers ) and select their resolution, pixel format, etc.;
- it automatically uses multiple render-targets ( instead of multiple passes );
- it automatically sets up shaders for clearing the targets if necessary ( you can't do a simple glClear() with multiple render-targets, or all your targets will be cleared to the same color, and you might want two targets to be filled with different initial values );
- it supports rendering to float textures;
- it can render to the main color buffer, or display a full-screen quad with one of the render targets.

The second pipe is CEffectPipe. That one is a bit different:
- it can render to *one* color texture target if necessary, but not more. It cannot render to depth textures. By default it renders to the main color buffer;
- it takes N textures as input, one shader and many shader constants, and outputs a full-screen quad ( a minimal render-time sketch follows this list ).
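For illustration only, the render() step of such an effect pipe essentially boils down to binding the input textures and the post-processing shader, then drawing a full-screen quad. A rough OpenGL sketch ( assuming GLEW / GL 2.0, identity matrices, and illustrative variable names, not the engine's real code ):

#include <GL/glew.h>

// Bind N input textures and one post-processing program, then draw a
// full-screen quad covering the viewport. Assumes the modelview and
// projection matrices are identity, so the quad is specified directly
// in normalized device coordinates.
void renderFullScreenEffect( GLuint program, const GLuint* textures, int textureCount )
{
    glUseProgram( program );

    // Each input texture goes to its own texture unit.
    for ( int i = 0; i < textureCount; ++i )
    {
        glActiveTexture( GL_TEXTURE0 + i );
        glBindTexture( GL_TEXTURE_2D, textures[i] );
    }

    glBegin( GL_QUADS );
    glTexCoord2f( 0.0f, 0.0f ); glVertex2f( -1.0f, -1.0f );
    glTexCoord2f( 1.0f, 0.0f ); glVertex2f(  1.0f, -1.0f );
    glTexCoord2f( 1.0f, 1.0f ); glVertex2f(  1.0f,  1.0f );
    glTexCoord2f( 0.0f, 1.0f ); glVertex2f( -1.0f,  1.0f );
    glEnd();
}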

Pretty much all my previous effects can be reimplemented ( with better performance and simpler code ) with those two pipes only. For example, for my depth-of-field effect:
- with one CRenderTexturePipe, the scene is rendered into an RGBA16f buffer for the color, and into another RGBA16f buffer for the velocity/depth. Velocity can be used for motion blur. Depth is used to determine the focus distance in the depth-of-field effect ( a minimal OpenGL sketch of this two-target setup follows the list );
- then one CEffectPipe takes the two textures rendered by the CRenderTexturePipe as input, binds a post-processing depth-of-field shader, and outputs that either to another texture, or to the main color buffer.
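Under the hood, that first step is basically an FBO with two float color attachments bound as multiple render-targets. A minimal sketch of that setup in raw OpenGL ( assuming GLEW, EXT_framebuffer_object and ARB_texture_float; this is illustrative, not the engine's internal code ):

#include <GL/glew.h>

// Creates an FBO with two RGBA16F color attachments ( color and
// velocity/depth ), both written in a single pass via glDrawBuffers.
GLuint createDofTargets( int width, int height, GLuint textures[2] )
{
    GLuint fbo;
    glGenFramebuffersEXT( 1, &fbo );
    glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, fbo );

    glGenTextures( 2, textures );
    for ( int i = 0; i < 2; ++i )
    {
        glBindTexture( GL_TEXTURE_2D, textures[i] );
        glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA16F_ARB, width, height,
                      0, GL_RGBA, GL_FLOAT, 0 );
        glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
        glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST );
        glFramebufferTexture2DEXT( GL_FRAMEBUFFER_EXT,
                                   GL_COLOR_ATTACHMENT0_EXT + i,
                                   GL_TEXTURE_2D, textures[i], 0 );
    }

    // Direct fragment shader outputs to both attachments.
    const GLenum buffers[2] = { GL_COLOR_ATTACHMENT0_EXT, GL_COLOR_ATTACHMENT1_EXT };
    glDrawBuffers( 2, buffers );

    glBindFramebufferEXT( GL_FRAMEBUFFER_EXT, 0 );
    return fbo;
}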

Nvidia driver bug

While re-implementing the depth-of-field effect, I was surprised to notice a tremendous performance hit with a kernel size of 32 for the blurring. After investigation, I found the cause of the problem: apparently, on Geforce 8800 cards in OpenGL, the constant kernel table implicitly uses temporary registers ( with read-write access ). 32 of them! It's incredible, but instead of declaring a table of constants directly in the GLSL shader, if I declare an array of vec4 uniforms and upload that table from the CPU, I get my framerate back.

The difference is impressive: 15 fps ( constants in shader ) vs 200 fps ( constants in array of uniforms ).
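The C++ side of the workaround is just a plain uniform-array upload, done once. A small sketch ( the uniform name kernelOffsets is illustrative ):

#include <GL/glew.h>

// In the shader, replace "const vec4 kernelOffsets[32] = ...;" with
// "uniform vec4 kernelOffsets[32];" and upload the table from the CPU.
void uploadKernelOffsets( GLuint program, const float* table ) // 32 * 4 floats
{
    glUseProgram( program );
    GLint location = glGetUniformLocation( program, "kernelOffsets" );
    if ( location >= 0 )
        glUniform4fv( location, 32, table ); // all 32 vec4 entries in one call
}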

I blame that on the Nvidia GLSL compiler being unable to honor the "const" keyword and to decide that no, my kernel offsets table will never be written to, and should not use temporary registers.

I've noticed this behavior on two different machines, one under Vista, the other under XP, so to any programmer reading this who uses tables of constants in GLSL shaders: investigate this problem. You might not know it, but your framerate could increase by a factor of 10!

Shaders library

I've also re-organized my shaders library, and added the "#version 120" directive to make the compiler stricter. I had to fix tons of bugs and ambiguities.

Screen-space ambient occlusion

Last but not least, since we discovered Crytek's paper about Crysis's screen-space ambient occlusion effect, a work colleague ( Inigo Quilez ) and I decided to give this technique a try. It'll be interesting for our work, but also for Infinity if I can make it work with good-enough quality.

Ambient occlusion is pre-computed into textures for ships, but there's one problem: the planetary engine cannot precompute anything, so true AO is impossible there.

Screen-space ambient occlusion is a trick to simulate local AO from the depth buffer. The advantage is that it can increase the visual quality a bit ( no miracles either ) and that it handles dynamic scenes perfectly, since it's a screen-space / post-processing effect, although it comes at a high performance cost.

Since Crytek and Inigo both wrote lengthy discussions on this subject, I won't go over a technical description of it here; I'll probably submit a gamedev.net image-of-the-day with such details instead, if you're interested.

The framerate is around 150 fps on my 8800 GTX; the noise effect can be enabled or disabled.

Mandatory pics, experimental, taken with various settings. Only the AO component is shown.








13 Comments



Thanks for taking the time to share the design of your rendering pipeline. These are the interesting bits that unfortunately are rarely written about.

By the way, I know it's too late but in case you ever start with a new code base, please consider dropping the 'C' in front of the class names. Code using this notation is noticeably harder to work with and is IMHO ugly. Not to mention that the notation is completely unnecessary: I know I'm using a class.

Quote:
Original post by Gaheris
By the way, I know it's too late but in case you ever start with a new code base, please consider dropping the 'C' in front of the class names. Code using this notation is noticeably harder to work with and is IMHO ugly. Not to mention that the notation is completely unnecessary: I know I'm using a class.


That's sometimes not obvious when looking at the code ( not the interfaces ). I'm using C for classes, S for structures and I for interfaces.

Excellent post as always.

Quote:
Original post by Gaheris
By the way, I know it's too late but in case you ever start with a new code base, please consider dropping the 'C' in front of the class names. Code using this notation is noticeably harder to work with and is IMHO ugly. Not to mention that the notation is completely unnecessary: I know I'm using a class.


In my opinion a style is just that, and as long as you're consistent I don't see the problem. The most important ability is abiding by a project's style and not trying to enforce your way on other people. I support your naming convention even though it may not be the same as mine :)

Quote:
Original post by LachlanL
I for one would love to see something about the screen-space ambient occlusion!


Seconded. I always like the more technically-oriented entries Ysaneya. Kudos.

Quote:
Original post by Solidus117
Quote:
Original post by LachlanL
I for one would love to see something about the screen-space ambient occlusion!


Seconded. I always like the more technically-oriented entries Ysaneya. Kudos.


Seconded, and thirded!

Quote:
Original post by Metorical
Excellent post as always.

Quote:
Original post by Gaheris
By the way, I know it's too late but in case you ever start with a new code base, please consider dropping the 'C' in front of the class names. Code using this notation is noticeably harder to work with and is IMHO ugly. Not to mention that the notation is completely unnecessary: I know I'm using a class.


In my opinion a style is just that, and as long as you're consistent I don't see the problem. The most important ability is abiding by a project's style and not trying to enforce your way on other people. I support your naming convention even though it may not be the same as mine :)


Meh. If it works, go for it. Personally I find that having 'C' everywhere is a bit much. I currently use 'I' for interfaces, but I'm not bothering with 'S' because I currently don't use that many structs. If I get to a point where I do, I'll add it in, but for the moment, almost everything is a class if it has no prefix.

Hey Y, about this

The difference is impressive: 15 fps ( constants in shader ) vs 200 fps ( constants in array of uniforms ).

Are you saying that using

const float PI = 3.14;
const float PI2 = 6.28;
const float PI3 = 9.42;
const float PI4 = 12.56;

vs.

uniform float PI[4];

that sending the float array of consts you want to use is 10x faster?

Quote:
Original post by MARS_999
The difference is impressive: 15 fps ( constants in shader ) vs 200 fps ( constants in array of uniforms ).

Are you saying that using

const float PI = 3.14;
const float PI2 = 6.28;
const float PI3 = 9.42;
const float PI4 = 12.56;

vs.

uniform float PI[4];

that sending the float array of consts you want to use is 10x faster?


I certainly hope not :) In any case, I suspect that the problem only shows up with a large number of constants.

Example:

const vec3 table[32] = { value0, value1, value2, ... , value30, value31 };

vs

uniform vec3 table[32];

There you'd get the 10x performance.

I think it's proportional to the number of temporary registers used.

You mention that you can now achieve all of this with only two pipes:
Quote:

- with one CRenderTexturePipe, the scene is rendered into an RGBA16f buffer for the color, and to an RGBA16f for the velocity/depth. Velocity can be used for motion blur. Depth is used to determine the focus distance in the depth of field effect.
- then one CEffectPipe takes the two textures that were rendered by the CRenderTexturePipe in input, binds a post-processing depth-of-field shader, and outputs that either to another texture, or to the main color buffer.

I'm curious though, what is the "glue" that plugs these two together in the many combinations necessary to create each effect? If it were nothing but these two pipe objects and each one had a fixed output, then obviously the result would be very generic. So where do the variations come from? For example, when rendering a depth-of-field effect, who determines that two render textures need to be created by CRenderTexturePipe and the appropriate depth-of-field shader applied in CEffectPipe?

Quote:
Original post by Nairou
You mention that you can now achieve all of this with only two pipes:
Quote:

- with one CRenderTexturePipe, the scene is rendered into an RGBA16f buffer for the color, and to an RGBA16f for the velocity/depth. Velocity can be used for motion blur. Depth is used to determine the focus distance in the depth of field effect.
- then one CEffectPipe takes the two textures that were rendered by the CRenderTexturePipe in input, binds a post-processing depth-of-field shader, and outputs that either to another texture, or to the main color buffer.

I'm curious though, what is the "glue" that plugs these two together in the many combinations necessary to create each effect? If it were nothing but these two pipe objects and each one had a fixed output, then obviously the result would be very generic. So where do the variations come from? For example, when rendering a depth-of-field effect, who determines that two render textures need to be created by CRenderTexturePipe and the appropriate depth-of-field shader applied in CEffectPipe?


Even further, do the pipes generate shader source-code on the fly? Otherwise how do you account for differing values of MAX_DRAW_BUFFERS, etc.?

