#4535100 Renderer design and performance optimizations

Posted by simonjacoby on 01 October 2009 - 08:18 AM

I suggest that you start at a completely different end if you want to get good rendering performance.

Don't worry about virtual function calls. Even though they add overhead, that overhead is completely negligible compared to the time it takes to issue draw calls in Direct3D on any modern hardware (Xbox 360/Vista PC). The same goes for branching. If either of them *does* make a difference in your rendering engine, you should probably rethink the design of your rendering engine ;) All your "problem time" should be in issuing draw calls, creating/updating/binding resources and switching states, not in virtual function calls and if-statements.

Try to minimize the time state sorting takes, not by making a smart state sorter, but by having fewer states to sort. It all comes down to identifying "batch breakers" and removing them. A "batch breaker" is a condition that forces you to "break" a batch by splitting it into several smaller batches. This is always some kind of state change, be it a texture, shader, geometry, render state or render target change.

Your target is to come as close as possible to binding only one texture, one vertex buffer, one index buffer and one shader, and then issuing one draw call to draw your entire world ;)

Here are some tips that can give you a pretty serious performance boost when used together:

- You mentioned that you had about 1500 objects to render. Most likely a lot of them are static. Group them together and use instancing to draw several objects with one draw call. Note that instancing doesn't automatically imply hardware instancing, you can also use shader instancing, or just slam several objects together in a big vertex buffer. Slamming together objects will eat more memory but draw fast as hell, and works on all hardware. This removes "switching-objects-batch-breaking".

- Animated meshes can often also be instanced with some cleverness.

- As long as models share a vertex format, you can put lots of objects in one big (or several big) vertex buffer(s), instead of letting each object have separate index/vertex buffers. You then simply draw ranges of vertices in this big buffer (see the shared-buffer sketch after this list). Most GPUs today have no problem with 1M+ vertices in a vertex buffer. This removes "switching-vertex/index-buffer-batch-breakers".

- Use atlasing to group several textures together into one. This is not a requirement for instancing, but it really helps, because it lets you draw objects that use different textures in one batch (see the UV remap sketch after this list). This removes "switching-texture-batch-breakers".

- Prefer shaders that are a bit more versatile, with parameter-controlled appearance, over smaller, specialized shaders that do only one thing. Specialized shaders will force you to break batches more often. This removes "switching-shader-batch-breakers".

- You mentioned that your objects have nearly identical materials. Try to parameterize the material so they can share a common shader, and then put the material parameters either in the vertex data, or in textures. This removes "switching-material-batch-breakers".

- Look into deferred rendering and similar techniques. While they come with their own set of problems, they provide an opportunity for excellent batching, and they remove a lot of shader/material/light combinations that would otherwise force you to break batches.

- For particle systems, it is essential to use atlasing and to put all your particles in a single buffer for good performance. Also use premultiplied alpha: it lets you perform additive blending and regular blending simultaneously (see the blending sketch after this list). I once optimized a particle system from about 100-500 draw calls down to a single draw call using that, and it still had animated particles that used different types of blending (for example fire and smoke).

- Use good culling. Fewer objects to draw obviously means fewer state changes.

- Perform as little vertex processing on the CPU as possible. Sometimes you have to, for example if you have deformable meshes that are processed by a physics library. When you do, be sure to use the proper flags for dynamic buffer updates (see the locking sketch after this list), or you will suffer a hefty performance penalty.
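
The shared-buffer sketch: roughly what drawing ranges out of one big vertex/index buffer looks like in D3D9 terms. This is just an illustration under my own assumptions, not code from the post; the MeshRange struct and names are hypothetical. Geometry that was pre-merged ("slammed together") at load time shares one range and so collapses into a single call:

struct MeshRange                 // where a mesh landed in the shared buffers (hypothetical)
{
    UINT base_vertex;            // offset into the shared vertex buffer
    UINT start_index;            // offset into the shared index buffer
    UINT num_vertices;
    UINT num_tris;
};

void draw_statics( IDirect3DDevice9* device, const MeshRange* meshes, int count )
{
    // The big VB/IB, the shader and the atlas texture are bound once, before this loop.
    for( int i = 0; i < count; ++i )
    {
        device->DrawIndexedPrimitive( D3DPT_TRIANGLELIST,
                                      meshes[i].base_vertex, 0,
                                      meshes[i].num_vertices,
                                      meshes[i].start_index,
                                      meshes[i].num_tris );
    }
}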
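
The UV remap sketch: on the shader side, atlasing can be as small as a per-object (or per-instance) offset/scale into the atlas. A minimal HLSL sketch with hypothetical names (atlas_rect could equally come from per-instance vertex data):

sampler atlas_sampler;   // the combined atlas texture (hypothetical name)
float4 atlas_rect;       // xy = sub-texture offset in the atlas, zw = sub-texture scale

float4 ps_atlas( float2 uv : TEXCOORD0 ) : COLOR0
{
    float2 atlas_uv = atlas_rect.xy + uv * atlas_rect.zw;  // remap 0..1 uv into the sub-rect
    return tex2D( atlas_sampler, atlas_uv );
}

Note that wrapping and mip-mapping need some care at sub-texture borders when atlasing.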
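
The blending sketch: how the premultiplied-alpha trick works. With one blend mode for the whole batch, dest = src.rgb + dest * (1 - src.a), the alpha value per particle selects the blend type:

// Set once for the entire particle batch:
device->SetRenderState( D3DRS_ALPHABLENDENABLE, TRUE );
device->SetRenderState( D3DRS_SRCBLEND,  D3DBLEND_ONE );
device->SetRenderState( D3DRS_DESTBLEND, D3DBLEND_INVSRCALPHA );
// The particle textures (or shader) output rgb already multiplied by alpha:
//   alpha = 1  ->  regular "over" blending
//   alpha = 0  ->  pure additive blending (rgb added, destination untouched)
// so fire (additive) and smoke (regular) can share one draw call.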
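
The locking sketch: in D3D9 terms, "proper flags" means creating the buffer with D3DUSAGE_DYNAMIC and locking it with D3DLOCK_DISCARD (or D3DLOCK_NOOVERWRITE when appending). A sketch with hypothetical names:

// Create once: dynamic + write-only, in the default pool.
IDirect3DVertexBuffer9* vb = NULL;
device->CreateVertexBuffer( buffer_size, D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
                            0, D3DPOOL_DEFAULT, &vb, NULL );

// Update per frame: DISCARD hands back fresh memory so the GPU never stalls on the old data.
void* dst = NULL;
if( SUCCEEDED( vb->Lock( 0, buffer_size, &dst, D3DLOCK_DISCARD ) ) )
{
    memcpy( dst, cpu_vertices, buffer_size );   // cpu_vertices: your processed data (hypothetical)
    vb->Unlock();
}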

As a side note, if you manage to make your batching so good that you actually can draw all your geometry with just a few draw calls, you'll find that you want to break the batches yourself artificially, so that the level can be culled and drawn in chunks. Otherwise you will draw the entire level all the time, which is also bad ;)

Using these techniques I managed to optimize a game that in a previous engine had used about 2500-3000 draw calls down to 25-50 draw calls. Levels that had previously dipped below 30 fps in release mode ran at 60+ fps in debug mode (using Direct3D's debug DLL). :)

I realize that the above isn't an answer to your original question, but as you can see, you don't really need to optimize your state sorter if you only have a few draw calls to sort. You'll definitely utilize your GPU better this way.

Hope this gives you some ideas and best of luck with your implementation!

/Simon


#4445840 HDR lighting + bloom effect

Posted by simonjacoby on 26 April 2009 - 07:42 AM

EDIT: sorry, I was still writing when you guys answered :)

Hi,

you've got some of the concepts mixed up. From your description, it sounds like you're trying to do three things:

1. Render HDR
2. Perform automatic luminance adaption during tone-map pass
3. Add a bloom effect

Here's a brief explanation of how you do each step, and why:

HDR rendering: this is the source of a lot of confusion, mainly because it's one of those buzzwords that gets thrown around a lot. Here's what it means in practical terms:

When you draw stuff "regularly", you usually do that to a color buffer where each channel is eight bits (for example RGBA8). This is fine for representing colors, but when you're rendering 3D stuff you really need more precision, because your geometry will be lit and shaded in various ways, which can cause pixels to have very high or very low brightness.

The way to fix this is simply to render to a buffer that has higher precision in each color channel. That's it. Instead of just using 8 bits, use more. One format that is easy to use and has good enough precision is the half-float format, in D3D lingo D3DFMT_A16B16G16R16F. Because of limitations of the GPU, you usually can't set the backbuffer to a high-precision format. So instead, you create a texture with this format, render to it, and then copy the result to the backbuffer so it can be shown on your screen.

So, all you have to do is create a texture with this format and bind it as the render target instead of the default back buffer. Let's call this texture the HDR render texture. When you have created it and set it as the render target, just draw as usual. When you're done rendering, copy the pixels in the HDR texture to the old back buffer to show it. The copy is usually done by drawing a full screen quad textured with the HDR render texture over the back buffer. When you've done this: voilà! Your very first HDR rendering is done :)
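
In D3D9 terms the setup looks roughly like this (a sketch with hypothetical names; error checking omitted, and back_buffer_surf is assumed saved earlier with GetRenderTarget):

// Create the HDR render texture once (half-float RGBA, same size as the back buffer).
IDirect3DTexture9* hdr_tex = NULL;
device->CreateTexture( width, height, 1, D3DUSAGE_RENDERTARGET,
                       D3DFMT_A16B16G16R16F, D3DPOOL_DEFAULT, &hdr_tex, NULL );

// Each frame: render the scene into it...
IDirect3DSurface9* hdr_surf = NULL;
hdr_tex->GetSurfaceLevel( 0, &hdr_surf );
device->SetRenderTarget( 0, hdr_surf );
// ...draw the scene as usual...

// ...then switch back to the back buffer and draw a full screen quad
// textured with hdr_tex (this is also where the tone-mapping below happens).
device->SetRenderTarget( 0, back_buffer_surf );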

If you've done this correctly, the first thing you will notice is that there has been no improvement at all to your regular rendering ;) This is because we haven't done any of the cool stuff that higher precision enables us to do. Some of the most common things people do are bloom, exposure, vignetting and luminance adaption (exposure, vignetting and luminance adaption are usually called tone-mapping when used together).

Here's what they are, and how you do them.

Exposure: there's a great article written by Hugo Elias that explains it much better than I could do here: http://freespace.virgin.net/hugo.elias/graphics/x_posure.htm
In practice, that article boils down to a single line of code at the end of your shader:

float4 exposed = 1.0 - exp( -( vignette * unexposed * exposure ) ); // 1 - e^-x, as in the article

where 'unexposed' is the "raw" pixel value from your HDR texture, 'vignette' is explained below, and 'exposure' is the constant K in Hugo Elias' article. In my code it's simply declared as:

const float exposure = 2.0;

...because 2.0 makes my scene look nice. You may have to use a different value that looks good for you, if you decide to implement exposure. If you want it a bit more robust, you can make this happen automatically, as described under 'Luminance adaption' below. Also, know that there are several ways of performing the exposure, with different formulas, which result in different images. The Hugo Elias one is an easy way to get started, though.

Vignetting: because a lens in a camera has a curved shape, it lets in less light at the edges, so many photos or films (especially from cheap cameras) have noticeably darkened edges. See example here: http://mccollister.info/vignette70.jpg. This effect is called vignetting. It can be simulated with two lines of code:

float2 vtc = float2( iTc0 - 0.5 );                            // center the coordinates around 0
float vignette = pow( 1 - ( dot( vtc, vtc ) * 1.0 ), 2.0 );   // the 1.0 factor controls strength

...where iTc0 are the texture coordinates of the full screen quad, ranging from 0..1. The result is a factor that is 1.0 at the center of the screen and falls off toward the edges.

Luminance adaption: this is part of the exposure, but can be done separately. In Hugo's code the constant K (the variable 'exposure' in my code) is fixed, meaning that you have to tweak it manually for each scene to look good. If a level varies a lot in brightness (for example you are standing in a dark room and then walk outside into a sunny day), no single value of K will work well for both scenes (the sunny outside may be 10,000 times or more brighter than the dark inside). Instead, you need to measure how bright the scene is so you can adjust K accordingly.

The easiest way to do this is to take the average of all pixels in the HDR texture. A fast way to do that is to generate mip-maps of the HDR texture, all the way down to a 1-pixel texture. That final one-pixel texture then contains the average of all the pixels above it, which is the same as the average scene luminance. Use this value as K when doing the exposure: simply use the 1-pixel texture as input to the exposure, instead of the hardcoded K ('exposure' in my code example; see the sketch below). You will need to tweak it to look good and adapt the way you want, but when it's done your renderer can handle all kinds of brightnesses, which is very cool :)
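
As a sketch of how the measured average can feed the exposure (all names hypothetical, and dividing a "key" value by the average is just one common mapping, not necessarily exactly what's described above):

sampler avg_lum_sampler;   // bound to the 1x1 mip holding the average luminance (hypothetical)

float compute_exposure()
{
    float avg_lum = tex2D( avg_lum_sampler, float2( 0.5, 0.5 ) ).r;
    const float key = 0.18;                  // tweak to taste, plays the role of 'exposure' above
    return key / max( avg_lum, 0.001 );      // brighter scene -> lower exposure; avoids div by 0
}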

Finally, there's the blooming: I'm sure you already know what this is, simply making bright parts of the scene glow a bit. It is done by taking a copy of the current scene, blurring it, and adding the blurred version back to the original. To make this fast, you usually scale the scene down to a texture that is, for example, 1/4 of the original size, and blur that. Also, you usually only want the brightest pixels to glow, not the entire scene; therefore, when scaling down the scene, you also subtract a threshold from the original pixels, for example 1.0, but you can use whatever looks good (see the bright-pass sketch below). The smaller this threshold is, the more of your scene will glow, and vice versa.
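
A minimal bright-pass sketch for that downsample step (hdr_sampler and the 1.0 threshold are illustrative, not from the original post):

sampler hdr_sampler;   // the full-resolution HDR scene texture (hypothetical name)

float4 ps_bright_pass( float2 tc : TEXCOORD0 ) : COLOR0
{
    float4 c = tex2D( hdr_sampler, tc );
    return max( c - 1.0, 0.0 );   // subtract the threshold; only pixels above 1.0 will bloom
}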

Whoa, long post :)

While this probably seems pretty complicated, you can decide to implement all of it or just some of it, depending on the look you want and the type of game you are creating. Modern FPS and racing games implement most of the above, but if you just want to make a simple space shooter with some nice glowing effects, all you have to implement is the "render to a high-precision texture" part and the bloom part.

For starters, you should probably just try that, and then add the other effects as you get more comfortable.

So, to answer your questions:

1. It depends, see above :)
2. You render color data to a texture by creating a texture with usage D3DUSAGE_RENDERTARGET and a high-precision pixel format, and then setting its surface with device->SetRenderTarget( 0, m_hdr_rt_surface ). Note that SetRenderTarget takes a surface, not a texture, so fetch it from the texture with GetSurfaceLevel( 0, ... ) first.
3. You resize by creating more mip levels for your texture and rendering to them. Don't forget to set the viewport to match the mip level size.
4. One simple way of getting the luminance is simply averaging the color channels together. This can be done with a dot product, like so:
float lum = dot( color.rgb, float3( 0.333, 0.333, 0.333 ) );
Some people like to weigh the channels differently, with more weight on green and a lot less on blue (for example the Rec. 709 weights float3( 0.2126, 0.7152, 0.0722 )), but in practice nobody ever notices the difference unless you point it out ;) Feel free to experiment :)
5. You blur a texture by averaging several nearby samples together (see the blur sketch below).
6. It depends, but when adding bloom you usually just add the color values.
7. Described above.
8 (3rd? :)). Yes, you can have everything in one .fx file; create each pass as a technique (downsample_technique, bloom_combine_technique, etc).
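
For question 5, here's a minimal separable blur sketch (one horizontal pass; run a vertical pass the same way; src_sampler and texel_size are hypothetical names, and the 5-tap weights are just one reasonable choice):

sampler src_sampler;       // the texture being blurred (hypothetical name)
float2 texel_size;         // 1.0 / texture dimensions (hypothetical constant)

// 5-tap binomial weights: (1 4 6 4 1) / 16.
static const float weights[3] = { 0.375, 0.25, 0.0625 };

float4 ps_blur_h( float2 tc : TEXCOORD0 ) : COLOR0
{
    float4 sum = tex2D( src_sampler, tc ) * weights[0];
    for( int i = 1; i < 3; ++i )
    {
        float2 off = float2( texel_size.x * i, 0.0 );
        sum += tex2D( src_sampler, tc + off ) * weights[i];
        sum += tex2D( src_sampler, tc - off ) * weights[i];
    }
    return sum;
}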

Best of luck!

Simon

