So, I finally took the time to optimize my game. One week later, the following test scene went from ~30 fps (35-40 ms) to ~60 fps (16-17 ms).
Well, it is no secret that my engine is home-brewed and that there was, and still is, plenty of optimization potential hidden in it. The trick was to start profiling it in more depth. Therefore, I started by extending the in-game profiling mechanism with detailed, timestamp-based frame logging (at microsecond resolution) of the individual threads and the GPU.
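To make the idea of timestamp-based frame logging a bit more concrete, here is a minimal CPU-side sketch. It is not the engine's actual profiler: `FrameLog`, `ProfileEvent` and `ScopedTimer` are hypothetical names, and the GPU side is left out (in OpenGL that would typically go through timer queries such as `glQueryCounter` with `GL_TIMESTAMP`).

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// One timed span on one thread, in microseconds since the log was created.
// All names here are hypothetical, chosen for illustration only.
struct ProfileEvent {
    std::string name;
    int threadIndex;
    std::int64_t beginUs;
    std::int64_t endUs;
};

class FrameLog {
    using Clock = std::chrono::steady_clock;

public:
    FrameLog() : start_(Clock::now()) {}

    // Microseconds elapsed since this log was created.
    std::int64_t nowUs() const {
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   Clock::now() - start_).count();
    }

    // Called from any thread, so the event list is guarded by a mutex.
    void add(ProfileEvent event) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push_back(std::move(event));
    }

    // Returns a copy of everything recorded so far.
    std::vector<ProfileEvent> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_;
    }

private:
    Clock::time_point start_;
    mutable std::mutex mutex_;
    std::vector<ProfileEvent> events_;
};

// Records a begin timestamp on construction and the finished span on destruction.
class ScopedTimer {
public:
    ScopedTimer(FrameLog& log, std::string name, int threadIndex)
        : log_(log), name_(std::move(name)),
          threadIndex_(threadIndex), beginUs_(log.nowUs()) {}

    ~ScopedTimer() {
        log_.add({std::move(name_), threadIndex_, beginUs_, log_.nowUs()});
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    FrameLog& log_;
    std::string name_;
    int threadIndex_;
    std::int64_t beginUs_;
};
```

Wrapping a block of work in `{ ScopedTimer t(log, "physics", 1); ... }` is enough to get one span per call. A single mutex is the simplest choice here; with several thousand timestamps per frame, per-thread event lists would probably be the cheaper option.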
Getting the kind of profiler I had in mind was hard. gDEBugger was not enough. Many tools support DirectX but not OpenGL, or support it only halfheartedly. My first attempt, writing a CSV file and turning it into a diagram with OpenOffice, failed: it was hard, slow, and really clumsy at showing a lot of data (several thousand timestamps per frame). The rescue came in the form of SVG (Scalable Vector Graphics). It is really easy to generate, browsers support it out of the box, display performance is good, you can zoom in and out easily, and you can add meta information that is displayed when you hover over the data with the mouse. Really perfect! It looks like this:
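As for how such a file can be generated, here is a rough sketch of the SVG route, not my actual exporter. It assumes each timed span becomes one `<rect>` on a per-thread lane, with a nested `<title>` element that browsers show as a tooltip on hover. `Span`, `escapeXml`, `toSvg` and the default scale values are made up for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical input record: one timed span on one lane (thread or GPU).
struct Span {
    std::string name;
    int lane;
    std::int64_t beginUs;
    std::int64_t endUs;
};

// Escapes the characters that would break the XML text content.
std::string escapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c; break;
        }
    }
    return out;
}

// Builds an SVG timeline: x is time, y is the lane, one rectangle per span.
std::string toSvg(const std::vector<Span>& spans,
                  double pxPerUs = 0.05, int laneHeight = 20) {
    std::int64_t lastUs = 0;
    int lanes = 0;
    for (const Span& s : spans) {
        lastUs = std::max(lastUs, s.endUs);
        lanes = std::max(lanes, s.lane + 1);
    }

    std::ostringstream svg;
    svg << "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\""
        << lastUs * pxPerUs << "\" height=\"" << lanes * laneHeight << "\">\n";
    for (const Span& s : spans) {
        const std::int64_t durationUs = s.endUs - s.beginUs;
        svg << "  <rect x=\"" << s.beginUs * pxPerUs
            << "\" y=\"" << s.lane * laneHeight
            << "\" width=\"" << durationUs * pxPerUs
            << "\" height=\"" << laneHeight - 2
            << "\" fill=\"steelblue\">"
            // The <title> child is what shows up as the hover tooltip.
            << "<title>" << escapeXml(s.name) << ": " << durationUs
            << " us</title></rect>\n";
    }
    svg << "</svg>\n";
    return svg.str();
}
```

Writing the returned string to a `.svg` file and opening it in a browser should be all that is needed to view it.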
So, after getting more detailed information about my game's performance, I was able to start tracking down issues. First off, I needed to get rid of sync points between the GPU and the CPU. A sync point occurs when one of them has to wait for the other to finish a certain task, e.g. uploading a buffer. Therefore, I added some double and triple buffers so that the CPU can process data while the GPU renders the result of a previous frame.
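The buffering idea can be sketched without any OpenGL at all. This is a simplified, CPU-only illustration, assuming a ring of N slots where the producer writes one slot per frame and the consumer reads the slot written in the previous frame. `BufferRing` is a hypothetical name, and a real renderer would additionally need fences (e.g. `glFenceSync` / `glClientWaitSync`) to know when the GPU is done with a slot.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Ring of N buffers: the writer and the reader never touch the same slot
// in the same frame, so neither has to wait for the other.
template <typename T, std::size_t N>
class BufferRing {
    static_assert(N >= 2, "need at least two slots to decouple writer and reader");

public:
    // Slot the CPU fills during the current frame.
    std::vector<T>& writeSlot() { return slots_[frame_ % N]; }

    // Slot that was filled during the previous frame (what the renderer consumes).
    const std::vector<T>& readSlot() const { return slots_[(frame_ + N - 1) % N]; }

    // Call once per frame, after writing is finished.
    void advance() { ++frame_; }

private:
    std::array<std::vector<T>, N> slots_;
    std::size_t frame_ = 0;
};
```

With N = 2 this is plain double buffering. N = 3 gives the reader one more frame of slack before the writer comes back around to the slot it is still using.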
The next step was moving data processing from the main thread to worker threads. Here is a list of the jobs that are now processed in worker threads:
- physics engine
- behavior tree (lua)
- garbage collection (lua)
- path finding
- environment scanning
- filling "rendering command queues" (not API supported yet!)
- particle processing
- decal processing
- audio processing
- animation processing
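To illustrate what handing one of these jobs to a worker thread can look like, here is a minimal sketch using `std::async`. It is not how my engine schedules jobs. `Particle`, `stepParticles` and `runFrame` are hypothetical, and the job reads the current state and returns a fresh copy, so the main thread can keep using the old data while the worker runs.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical particle state, reduced to one axis for brevity.
struct Particle {
    float y;
    float velocityY;
};

// The job itself: reads `in` and returns the next state without modifying `in`.
std::vector<Particle> stepParticles(const std::vector<Particle>& in, float dt) {
    std::vector<Particle> out;
    out.reserve(in.size());
    for (const Particle& p : in) {
        out.push_back({p.y + p.velocityY * dt, p.velocityY});
    }
    return out;
}

// One frame: the worker computes the next state while the main thread
// keeps reading the current one. Returns how many particles were "drawn".
std::size_t runFrame(std::vector<Particle>& current, float dt) {
    std::future<std::vector<Particle>> next = std::async(
        std::launch::async,
        [&current, dt] { return stepParticles(current, dt); });

    // Stand-in for rendering: only reads `current`, never writes it.
    const std::size_t drawn = current.size();

    // The only sync point: wait for the worker, then swap in its result.
    current = next.get();
    return drawn;
}
```

Spawning a thread per job per frame through `std::async` is only reasonable for a sketch; a fixed pool of worker threads avoids that overhead.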
Moving these jobs into worker threads often requires some form of double buffering, and sometimes it introduces some funny bugs, like this new alien gnoblin version ;-)
Finally, I tracked down an old test rendering pass (an additional geometry pass). Beyond that, I only optimized a single shader by adding branching and reordering the rendering order.
There is still a lot of potential in the expensive post-processing shaders and in submitting the data to OpenGL (batching support is still not really good), but for now I'm quite happy with the result.