
# Improvements


This weekend, I've been working on a few optimizations.

Since I suspected a huge CPU bottleneck, I added a CProfile class to my code. It works hierarchically: you call CProfile::begin(title) before the piece of code you want to profile, and CProfile::end() after it. These calls can be nested. Two pieces of information are gathered for each block: the time (in milliseconds) elapsed between begin() and end(), and the percentage of the total frame time spent in that block.
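
A minimal sketch of how such a hierarchical begin()/end() profiler might be structured, assuming a tree of nodes built with a stack of currently open blocks. Unlike the article's static CProfile::begin() calls, this hypothetical version is an instance with an injectable clock (the `Clock` type, `steadyNowMs()` and `report()` are invented for illustration), so it can be exercised without real timing. Percentages are computed relative to the outermost block, which is assumed to be the frame.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Default time source: milliseconds from a monotonic clock.
inline double steadyNowMs() {
    using namespace std::chrono;
    return duration<double, std::milli>(steady_clock::now().time_since_epoch()).count();
}

// Hypothetical hierarchical profiler: begin()/end() pairs may nest, and each
// pair becomes a node in a tree whose root is the outermost block (the frame).
class CProfile {
public:
    using Clock = std::function<double()>;  // returns "now" in milliseconds

    explicit CProfile(Clock now = steadyNowMs) : now_(std::move(now)) {}

    void begin(const std::string& title) {
        auto node = std::make_unique<Node>();
        node->title = title;
        node->start = now_();
        Node* raw = node.get();
        if (stack_.empty()) {
            root_ = std::move(node);  // new outermost block replaces the previous frame
        } else {
            stack_.back()->children.push_back(std::move(node));  // nest under the open block
        }
        stack_.push_back(raw);
    }

    void end() {
        assert(!stack_.empty() && "end() called without a matching begin()");
        Node* node = stack_.back();
        node->ms = now_() - node->start;
        stack_.pop_back();
    }

    // One line per block, "Title (X ms, Y%)", with children wrapped in braces.
    std::string report() const {
        std::ostringstream out;
        if (root_) print(out, *root_, root_->ms, 0);
        return out.str();
    }

private:
    struct Node {
        std::string title;
        double start = 0.0;
        double ms = 0.0;
        std::vector<std::unique_ptr<Node>> children;
    };

    static void print(std::ostream& out, const Node& n, double frameMs, int depth) {
        const std::string indent(static_cast<std::size_t>(depth) * 4, ' ');
        const double pct = frameMs > 0.0 ? 100.0 * n.ms / frameMs : 0.0;
        out << indent << n.title << " (" << n.ms << " ms, " << pct << "%)\n";
        if (n.children.empty()) return;
        out << indent << "{\n";
        for (const auto& child : n.children) print(out, *child, frameMs, depth + 1);
        out << indent << "}\n";
    }

    Clock now_;
    std::unique_ptr<Node> root_;
    std::vector<Node*> stack_;  // blocks that have begun but not yet ended
};
```

In an engine, the outermost begin("Frame")/end() pair would wrap the whole frame, and report() could be written to the log once per frame or every N frames.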

In the log it might look like this:

```
Frame (20 ms, 100%)
{
    Setup (10 ms, 50%)
    {
        UpdateScene (5 ms, 25%)
        DoFrustumCulling (3 ms, 15%)
        SortObjects (2 ms, 10%)
    }
    Render (10 ms, 50%)
    {
        Planet1 (5 ms, 25%)
        {
            Atmosphere (2 ms, 10%)
            Terrain (3 ms, 15%)
        }
        Planet2 (5 ms, 25%)
        {
            Atmosphere (2 ms, 10%)
            Terrain (3 ms, 15%)
        }
    }
}
```

After I added a few begin() and end() calls to the important parts of the engine, I ran it and, without much surprise, discovered that it spent:
- around 25% of its time updating the planet
- around 75% of its time rendering the planet (in this case, "rendering" means setting up the scene, sorting the objects, and sending them to the GPU).

I also displayed the number of terrain patches at ground level: around 700. That's an interesting number, because each of these patches has its own unique texture. I was testing with a resolution of 512×512, which means it was using 700 × 512 × 512 × 3 bytes of texture memory: about 550 MB! Despite this, it was running pretty well on my ATI X800 with 256 MB, which means the driver is doing its job of paging textures between video and system memory. Still, I reduced the default resolution to 256×256.
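
The texture budget above is plain arithmetic, and a small compile-time check makes it explicit. This is a sketch assuming uncompressed 24-bit RGB (3 bytes per texel) with no mipmaps and no driver padding, the same simplification as the figure in the text; `textureBytes()` is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical helper: bytes used by `patches` uncompressed square textures of
// `size` x `size` texels at `bytesPerTexel` bytes each (no mipmaps, no padding).
constexpr std::uint64_t textureBytes(std::uint64_t patches, std::uint64_t size,
                                     std::uint64_t bytesPerTexel) {
    return patches * size * size * bytesPerTexel;
}

// 700 patches of 24-bit RGB (3 bytes per texel):
static_assert(textureBytes(700, 512, 3) == 550502400, "about 550 MB at 512x512");
static_assert(textureBytes(700, 256, 3) == 137625600, "about 138 MB at 256x256");
```

Halving the resolution divides the footprint by four, from about 550 MB to about 138 MB, which is well under the card's 256 MB.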

Finally, I discovered that a piece of code in the planet setup was called twice when once was enough. Fixing this saved 15% of the total frame time. On that X800, I'm now getting around 85 fps at ground level, but I'm not totally happy with that number, especially since it will drop once I add more effects: clouds, vegetation, the user interface, etc. My goal is to reach at least 120 fps. To get there, my next step will be to review the material sorting code, and especially the way shaders and their constants are bound: at the moment, I set tens of constants (which never change) for each of the 700 objects, and enable/disable the same shaders over and over.
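
At 85 fps a frame takes about 11.8 ms, while 120 fps leaves about 8.3 ms, so roughly 3.4 ms per frame has to go. One common way to remove the redundant work described above is to sort draw items by shader and route every state change through a small cache that only forwards a bind or constant upload when the value actually differs. The sketch below is a hypothetical illustration of that idea, not the engine's code: `DrawItem`, `StateCache` and `submit()` are invented names, constants are simplified to named floats, and the real driver calls are replaced by counters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical draw item: the shader it uses and its constants (name -> value).
struct DrawItem {
    int shaderId;
    std::map<std::string, float> constants;
};

// Hypothetical redundant-state filter. Each counter stands in for a real
// driver call (a program bind or a constant/uniform upload).
struct StateCache {
    std::size_t shaderBinds = 0;
    std::size_t constantUploads = 0;

    void bindShader(int id) {
        assert(id >= 0 && "-1 is reserved for 'no shader bound'");
        if (id == currentShader_) return;  // already bound: skip the call
        currentShader_ = id;
        cachedConstants_.clear();  // conservatively forget constants on a shader switch
        ++shaderBinds;
    }

    void setConstant(const std::string& name, float value) {
        auto it = cachedConstants_.find(name);
        if (it != cachedConstants_.end() && it->second == value) return;  // unchanged: skip
        cachedConstants_[name] = value;
        ++constantUploads;
    }

private:
    int currentShader_ = -1;
    std::map<std::string, float> cachedConstants_;
};

// Sort by shader so objects sharing a shader are submitted back to back,
// then route every bind and constant through the cache.
void submit(std::vector<DrawItem>& items, StateCache& cache) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) { return a.shaderId < b.shaderId; });
    for (const DrawItem& item : items) {
        cache.bindShader(item.shaderId);
        for (const auto& kv : item.constants) cache.setConstant(kv.first, kv.second);
        // ...the actual draw call for this item would be issued here...
    }
}
```

With 700 items alternating between two shaders and one unchanging constant, submit() issues 2 shader binds and 2 constant uploads instead of 700 of each.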

## 1 Comment

I saw a nice implementation of a profiler like the one you describe, with a couple of nice code-related characteristics:

I forget the finer points, but something like:

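```cpp
// Declares a scoped profiler object tagged with the current function, line and file
#define PROFILE_BLOCK CProfiler p( __FUNCTION__, __LINE__, __FILE__ );
```

and you'd then have:

```cpp
void myFunc( ... )
{
    PROFILE_BLOCK   // profiling starts here and stops when myFunc's scope ends
    // other stuff
}
```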

All you needed to do was add a PROFILE_BLOCK at the beginning of a block. It's not perfect, but it avoids forgetting to "end" a profiling fragment [smile]
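
The macro idea can be fleshed out with RAII: the object's constructor does the work of begin() and its destructor the work of end(), so leaving the block early cannot skip the end. This is a sketch under stated assumptions: `ProfileSample` and `profileSamples()` are hypothetical, it uses the standard `__func__` instead of the compiler-specific `__FUNCTION__`, and it pastes `__LINE__` into the variable name so that two PROFILE_BLOCKs in the same scope don't collide (two objects both named `p` would).

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical record of one finished profiled block.
struct ProfileSample {
    std::string function;
    std::string file;
    int line;
    double ms;
};

// Hypothetical global sink; a real profiler would aggregate these per frame.
inline std::vector<ProfileSample>& profileSamples() {
    static std::vector<ProfileSample> samples;
    return samples;
}

// Constructor = begin(), destructor = end(): the end can't be forgotten,
// even if the block is left through an early return.
class CProfiler {
public:
    CProfiler(const char* function, int line, const char* file)
        : function_(function), file_(file), line_(line),
          start_(std::chrono::steady_clock::now()) {
        assert(function != nullptr && file != nullptr);
    }

    ~CProfiler() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const double ms = std::chrono::duration<double, std::milli>(elapsed).count();
        profileSamples().push_back({function_, file_, line_, ms});
    }

    CProfiler(const CProfiler&) = delete;  // exactly one object per profiled block
    CProfiler& operator=(const CProfiler&) = delete;

private:
    const char* function_;
    const char* file_;
    int line_;
    std::chrono::steady_clock::time_point start_;
};

// Two-step concatenation so __LINE__ expands before ## pastes it.
#define PROFILE_CONCAT_INNER(a, b) a##b
#define PROFILE_CONCAT(a, b) PROFILE_CONCAT_INNER(a, b)
#define PROFILE_BLOCK CProfiler PROFILE_CONCAT(profiler_, __LINE__)(__func__, __LINE__, __FILE__);
```

Usage stays the same as in the comment: a single PROFILE_BLOCK line at the top of the block to be measured.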

Are you profiling your OpenGL (?) usage as well as your CPU usage?

I've taken to using PIX in a big way (OpenGL has 'gDEBugger' or something like that, right?), and the information can be amazing.

When viewed in a flat, sequential way, the things the pipeline is being asked to do look very different from the way the algorithms in your code might appear.

At the very least it'd allow you to confirm whether any rendering-related optimizations (esp. state change related) were at all effective [smile]

Jack
