I discussed the rendering pipeline system at length in a previous dev journal ( a year ago ), but I'll give a small summary for people who aren't familiar with it.
The whole rendering system is based on "pipes". Each pipe is a component dedicated to rendering something on the screen ( or processing something for rendering ). The pipeline is a tree of pipes: each pipe has one parent and any number of child pipes.
Each render pipe has two main methods: setup() and render(). The setup() method executes the CPU work required for further rendering. This step is not limited to CPU work though: it can also render to off-screen textures for special effects ( ex.: shadow-mapping requires rendering the scene to a depth buffer from the light's point of view ). The render() method, of course, renders something on the screen.
Examples of basic pipes are CScenePipe and CStandardPipe. CScenePipe is in charge of determining which part of the scene-graph has to be displayed, and in which order ( this pipe groups rendering batches by shader technique, for example, to minimize texture or state switching ). Then it stores a list of "objects to render". CStandardPipe takes a CScenePipe and renders this list of objects on the screen, setting the states, shaders, textures, matrices, etc.
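To make the structure a bit more concrete, here's a minimal sketch of what a pipe with setup() / render() and child pipes could look like. The names and the logging are hypothetical, purely for illustration; the engine's actual classes are obviously different:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of the pipe interface described above.
class Pipe {
public:
    explicit Pipe(std::string name) : m_name(std::move(name))
    {
    }
    virtual ~Pipe() = default;

    void addChild(std::unique_ptr<Pipe> child)
    {
        m_children.push_back(std::move(child));
    }

    // CPU-side preparation ( may also render to off-screen textures ).
    virtual void setup(std::string& log)
    {
        log += "setup:" + m_name + ";";
        for (auto& c : m_children) c->setup(log);
    }

    // Actual on-screen rendering.
    virtual void render(std::string& log)
    {
        log += "render:" + m_name + ";";
        for (auto& c : m_children) c->render(log);
    }

private:
    std::string m_name;
    std::vector<std::unique_ptr<Pipe>> m_children;
};

// Runs the two phases over the whole tree: setup first, then render.
std::string runPipeline(Pipe& root)
{
    std::string log;
    root.setup(log);
    root.render(log);
    return log;
}
```

The key point is the two-phase traversal: the entire tree gets its setup() pass before any render() happens.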
CScenePipe + CStandardPipe allow you to render something on screen with some shaders, but you get nothing more out of it than what you could see, say, in ASEToBin.
Then come special effects. In the past ( see previous dev journals ), I've worked on shadow maps ( CShadowmapPipe ), glowing effect ( CBloomPipe ), high-dynamic range ( CHDRIPipe ), motion blur ( CMotionBlurPipe ), depth-of-field ( CDepthOfFieldPipe ), distortions ( CDistortionPipe ), and I'm probably forgetting a few.
It should be clear now what the problem is: this design leads to a proliferation of render pipes. Basically, for each effect, I need 1 or 2 additional pipes.
When implementing those effects, I felt uneasy, and only recently realized why: most of the time, all the pipes had a similar structure: rendering something to one or many textures, setting shader constants, etc. Last week I thought it was a good time to review and clean up the pipes, and to see whether a better approach could avoid this proliferation.
After a lot of experiments, I found a solution involving only two pipes. Each of them is pretty complex, as it has to support a lot of code paths.
The first one is CRenderTexturePipe. This pipe renders all its children into one or many textures, and has those abilities:
- you can declare many textures to render to ( color buffers and depth buffers ), and select the resolution, pixel format, etc.;
- it automatically uses multiple render-targets ( instead of multi-pass );
- it automatically sets up shaders for clearing the targets if necessary ( you can't do a simple glClear() with multiple render targets, or all your targets will be cleared with the same color, and you might want two targets to be filled with different initial values );
- it supports rendering to float textures;
- it can render to the main color buffer, or display a full-screen quad with one of the render targets.
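As a rough illustration ( the fields and names below are made up for this journal, not the engine's actual API ), the kind of target description such a pipe needs to track, and the reason a clearing shader is sometimes needed, could look like this:

```cpp
#include <string>
#include <vector>

// Hypothetical description of one render target of a CRenderTexturePipe-style pipe.
struct TargetDesc {
    std::string name;
    int         width;
    int         height;
    std::string format;        // e.g. "RGBA8", "RGBA16F", "DEPTH24"
    bool        isDepth;
    float       clearValue[4]; // per-target initial value

    TargetDesc()
        : width(256), height(256), format("RGBA8"), isDepth(false)
    {
        for (int i = 0; i < 4; ++i) clearValue[i] = 0.0f;
    }
};

struct RenderTextureConfig {
    std::vector<TargetDesc> targets;

    // All color targets are bound simultaneously as MRT outputs,
    // instead of re-rendering once per target ( multi-pass ).
    int colorTargetCount() const
    {
        int n = 0;
        for (const auto& t : targets)
            if (!t.isDepth) ++n;
        return n;
    }

    // With MRT, a plain glClear() fills every color target with the same
    // color; distinct per-target clear values need a full-screen clearing shader.
    bool needsClearShader() const
    {
        const TargetDesc* first = nullptr;
        for (const auto& t : targets) {
            if (t.isDepth) continue;
            if (!first) { first = &t; continue; }
            for (int i = 0; i < 4; ++i)
                if (t.clearValue[i] != first->clearValue[i]) return true;
        }
        return false;
    }
};
```

The needsClearShader() test captures the glClear() limitation mentioned above: as soon as two color targets want different initial values, clearing has to go through a shader that writes a distinct value to each output.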
The second pipe is CEffectPipe. That one is a bit different:
- it can render to *one* color texture target if necessary, but no more. It cannot render to depth textures. By default it renders to the main color buffer;
- it takes N textures as input, one shader, and many shader constants, and outputs a full-screen quad.
Pretty much all my previous effects can be reimplemented ( with better performance, and simpler code ) with those two pipes only. For example, for my depth-of-field effect:
- with one CRenderTexturePipe, the scene is rendered into one RGBA16f buffer for the color, and another RGBA16f buffer for the velocity/depth. Velocity can be used for motion blur; depth is used to determine the focus distance in the depth-of-field effect;
- then one CEffectPipe takes as input the two textures rendered by the CRenderTexturePipe, binds a post-processing depth-of-field shader, and outputs the result either to another texture, or to the main color buffer.
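The wiring of those two steps could be sketched like this. The pipe type names are the real ones, but the way parameters and parent/child links are expressed here is illustrative, not the engine's actual API:

```cpp
#include <string>
#include <vector>

// Hypothetical node in the pipeline tree.
struct PipeNode {
    std::string              type;
    std::vector<std::string> params;
    std::vector<PipeNode>    children;
};

PipeNode buildDepthOfFieldChain()
{
    // Step 1: render the scene into two RGBA16F targets ( color + velocity/depth ).
    PipeNode rtt{"CRenderTexturePipe",
                 {"target:color:RGBA16F", "target:velocityDepth:RGBA16F"},
                 {PipeNode{"CScenePipe", {}, {}},
                  PipeNode{"CStandardPipe", {}, {}}}};

    // Step 2: feed both textures to the post-processing DOF shader, and
    // output a full-screen quad to the main color buffer.
    PipeNode effect{"CEffectPipe",
                    {"in:color", "in:velocityDepth", "shader:dof", "out:main"},
                    {rtt}};
    return effect;
}
```

Swapping the DOF shader for a bloom, HDR or motion-blur shader reuses the exact same structure, which is the whole point of the refactoring.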
Nvidia driver bug
While re-implementing the depth-of-field effect, I was surprised to notice a tremendous performance hit with a kernel size of 32 for the blurring. After investigation, I found the cause of the problem: apparently, on GeForce 8800 cards in OpenGL, the constant kernel table implicitly uses temporary registers ( with read-write access ). 32 of them! It's incredible, but instead of declaring a table of constants directly in the GLSL shader, if I declare an array of vec4 uniforms and upload that table from the CPU, I get my framerate back.
The difference is impressive: 15 fps ( constants in shader ) vs 200 fps ( constants in array of uniforms ).
I blame that on the Nvidia GLSL compiler being unable to recognize the "const" keyword and to decide that no, my kernel offsets table will never be written to, and should not use temporary registers.
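A sketch of the workaround: build the kernel table on the CPU and upload it as an array of vec4 uniforms. The Gaussian-ish weighting, the glUniform4fv upload and the "u_kernel" uniform name are assumptions for illustration, not my actual shader code:

```cpp
#include <cmath>
#include <vector>

struct Vec4 { float x, y, z, w; };

// Builds a 1D blur kernel on the CPU: xy = sample offset, w = normalized weight.
std::vector<Vec4> buildBlurKernel(int taps, float radius)
{
    std::vector<Vec4> kernel(taps);
    float sum = 0.0f;
    for (int i = 0; i < taps; ++i) {
        float t = (i - (taps - 1) * 0.5f) / ((taps - 1) * 0.5f); // -1 .. +1
        float w = std::exp(-4.0f * t * t);                       // gaussian-ish falloff
        kernel[i] = {t * radius, 0.0f, 0.0f, w};
        sum += w;
    }
    for (auto& k : kernel) k.w /= sum;                           // weights sum to 1
    return kernel;
}

// Upload once after linking the shader, e.g.:
//   GLint loc = glGetUniformLocation(program, "u_kernel");
//   glUniform4fv(loc, 32, &kernel[0].x);
//
// And in the GLSL shader, instead of:
//   const vec4 kernel[32] = vec4[32]( ... );   // slow path on the 8800
// declare:
//   uniform vec4 u_kernel[32];                 // fast path
```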
I've noticed this behavior on two different machines, one in Vista, another in XP, so for any programmer reading this and using tables of constants in GLSL shaders: investigate this problem. You might not know it, but your framerate could increase by a factor of 10!
I've also re-organized my shaders library, and added the "#version 120" directive to make the compiler stricter. I had to fix tons of bugs / ambiguities.
Screen-space ambient occlusion
Last but not least, since we discovered Crytek's paper about Crysis's screen-space ambient occlusion effect, a work colleague ( Inigo Quilez ) and I decided to give this technique a try. It'll be interesting for our work, but also for Infinity if I can make it work with good-enough quality.
Ambient occlusion is pre-computed into textures for ships, but there's one problem: the planetary engine cannot precompute anything, so true AO is impossible there.
Screen-space ambient occlusion is a trick to simulate local AO from the depth buffer. The advantage is that it can increase the visual quality a bit ( don't expect a miracle either ) and handles dynamic scenes perfectly, since it's a screen-space / post-processing effect; the trade-off is its performance cost.
Since Crytek and Inigo both wrote lengthy discussions on this subject, I won't go over a technical description of it again; I'll probably submit a gamedev.net image-of-the-day with such details instead, if you're interested.
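Still, as a very naive illustration of the core idea ( a hypothetical CPU version on a depth array, nothing like the real shader with its sampling pattern and noise ): a pixel gets darkened when many of its neighbors are closer to the camera, i.e. likely occluders.

```cpp
#include <vector>

// Returns an occlusion factor for pixel (x, y): 1 = fully lit, 0 = fully occluded.
// Sample pattern, radius ( 1 pixel ) and bias are illustrative only.
float ssaoAt(const std::vector<float>& depth, int w, int h, int x, int y)
{
    const int   offsets[8][2] = {{-1,-1},{0,-1},{1,-1},{-1,0},
                                 {1,0},{-1,1},{0,1},{1,1}};
    const float bias = 0.01f; // ignore nearly coplanar neighbors

    float d = depth[y * w + x];
    int occluders = 0, samples = 0;
    for (const auto& o : offsets) {
        int sx = x + o[0], sy = y + o[1];
        if (sx < 0 || sy < 0 || sx >= w || sy >= h) continue;
        ++samples;
        if (depth[sy * w + sx] < d - bias) ++occluders; // neighbor is in front
    }
    if (samples == 0) return 1.0f;
    return 1.0f - float(occluders) / float(samples);
}
```

The real effect does this per pixel in a post-processing shader, sampling the depth buffer in a disc around the pixel ( with a randomized pattern, hence the noise that can be enabled or disabled ).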
The framerate is around 150 fps on my 8800 GTX; the noise effect can be enabled or disabled.
Mandatory pics, experimental, taken with various settings. Only the AO component is shown.