Every Game Should Have an In-Code Profiler

Published August 31, 2007
Things are very busy. I have a bunch of entries planned, and not anywhere near enough time. How does Raymond Chen do this? Also, two new cars to add to the list: A Rolls Royce (might be an older Phantom), and a black Ferrari 599 GTB that is new on the NVIDIA lot. The 599 is a refreshing car to see, instead of the red 360 Modena/F430 that everyone buys.

In-code profiling has fallen out of style in recent years. The wide availability of non-intrusive profiling tools took the steam out of building profiling into your code. For basically zero effort, you can get a full breakdown of where your code is spending time, how much time is being spent (including or excluding children), and all sorts of other neat statistics. Intrusive profiling tools give you even more information: function hit counts and average time spent per hit, for example. Despite all that, these tools are not adequate. Why not?

One of the more common performance bugs is stuttering. Basically, some frames are fast enough, and others are not fast enough. How are you going to diagnose and repair an app that is stuttering? Your profiler output is worthless, because it's gone and averaged the fast frames with the slow ones, destroying information about the slow frames in the process. Moreover, you do not care about the information in fast frames. A frame that slides in under that 16ms boundary is not interesting. Even considering it is a waste of time. It's the slow frames you want to take a look at, and you don't want to average those together either. The fact of the matter is that in order to effectively study what is happening, we need to isolate frames, profile them separately, and discard the results when a frame finishes quickly.
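
To make the isolate-and-discard idea concrete, here is a minimal C++ sketch. The names (`FrameProfiler`, `Sample`, `end_frame`) are hypothetical and do not come from any existing library, and the budget passed in is whatever your frame target is (16 ms in the example above). Each frame collects its own samples; a frame that finishes within budget is thrown away, and only over-budget frames are kept for inspection.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not a finished profiler.
struct Sample {
    std::string name;  // instrumented scope
    double ms;         // time spent in that scope
};

class FrameProfiler {
public:
    explicit FrameProfiler(double budget_ms) : budget_ms_(budget_ms) {
        assert(budget_ms > 0.0);
    }

    // Start collecting samples for a new frame.
    void begin_frame() { current_.clear(); }

    // Called by instrumented code during the frame.
    void record(std::string name, double ms) {
        current_.push_back({std::move(name), ms});
    }

    // Keep the frame's samples only if the frame blew its budget.
    // Returns true when the frame was kept.
    bool end_frame(double frame_ms) {
        if (frame_ms <= budget_ms_) {
            current_.clear();  // fast frame: not interesting, discard
            return false;
        }
        slow_frames_.push_back(std::move(current_));
        current_.clear();      // leave the moved-from vector in a known state
        return true;
    }

    const std::vector<std::vector<Sample>>& slow_frames() const {
        return slow_frames_;
    }

private:
    double budget_ms_;
    std::vector<Sample> current_;
    std::vector<std::vector<Sample>> slow_frames_;
};
```

In a real game loop you would call `begin_frame()`, feed `record()` from your instrumented scopes, and pass the measured frame time to `end_frame()`. Slow frames are stored individually instead of being merged, so nothing gets averaged away.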

While there's no technical reason a profiler couldn't provide an API to support this type of usage pattern, none that I'm aware of actually does so. VTune allows you to pause and resume sampling, but that doesn't really help, because you can't throw away results from a frame, or isolate frames. And good luck finding a profiler that supports the necessary API calls on every platform you need to target. It's clear, then, that an in-code profiler is a necessity for effectively tuning a game for performance.

It's common to use the excellent memory tracker written by Paul Nettle to find memory leaks in C++ code. Unfortunately, I was unable to find a similar library for in-code profiling. (If any of you know one, please comment. [EDIT] I'm told that Game Programming Gems has one by Scott Bilas.) I don't think it would've mattered if I found one, though; I do not think the fine coders at CodeProject are likely to write one that is adequate. What defines adequate? Well, this is a very game-centric bit of coding, so it's important to be aware of the complexities of a modern game:

* As I mentioned before, we need frame-to-frame measurements, with the ability to discard uninteresting data and keep the rest. This is also a lot of data. Simply vomiting it out to a text file is not good enough, unless it's a format that can be parsed into a more effective system (a database maybe).
* The definition of "frame" differs depending on what part of the game you're looking at. We need to be able to profile gameplay, graphics, physics, etc separately, since they will frequently be running at different frequencies. Besides, it's helpful to be able to see a per-subsystem breakdown of where time's being spent at a high level. The reporting needs to support this as well.
* We're threading games heavily now. The profiler needs to be thread safe and thread aware, and it shouldn't mix the results from each thread together. We also have to consider that the same subsystem may well be using multiple threads, which adds another bit of complexity to our reporting. In other words, every function call that is recorded needs to be tagged with both its thread and its subsystem.
* You want to collect results during QA and playtesting as well. Writing to a hardcoded C:\perf.log file isn't anywhere near good enough here. We need support for sending the results over a network to servers that can accumulate and parse the data.
* Data is needed over extended sessions as well (especially after the game's been running a couple of hours and your heap is getting a little awkward), and depending on the format of the game, you may want to split up the data depending on what level/map is running.
* Related to the above, a timestamp relative to when the game started is necessary, both as a frame count and as a human time.
* For extreme sophistication, you might want a sampling-based profiler included that can look up symbols or map files. That will allow you to diagnose where time is being spent even outside your code, mainly in Windows components. (Mixing this in with an intrusive stack-tracing profiler could be complicated though.)
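
As a rough illustration of the tagging the list above asks for, here is one possible shape for a single recorded sample, in C++. Every name and field here is hypothetical; the point is only that each record carries its thread, subsystem, frame count, time since startup, and level, and that it serializes to a line a parser or database loader can consume instead of free-form text.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical record layout; field names are illustrative only.
enum class Subsystem { Gameplay, Graphics, Physics };

struct ProfileRecord {
    std::string   scope;          // instrumented function or block
    Subsystem     subsystem;      // whose "frame" this sample belongs to
    std::uint32_t thread_id;      // logical thread index assigned by the game
    std::uint64_t frame;          // frame count within that subsystem
    double        since_start_s;  // seconds since the game started
    double        elapsed_ms;     // time spent in the scope
    std::string   level;          // level/map running at the time
};

inline const char* to_string(Subsystem s) {
    switch (s) {
        case Subsystem::Gameplay: return "gameplay";
        case Subsystem::Graphics: return "graphics";
        case Subsystem::Physics:  return "physics";
    }
    return "unknown";
}

// One tab-separated line per record, easy to load into a database later.
inline std::string to_line(const ProfileRecord& r) {
    // Tabs inside the scope name would corrupt the column layout.
    assert(r.scope.find('\t') == std::string::npos);
    std::ostringstream out;
    out << r.scope << '\t' << to_string(r.subsystem) << '\t' << r.thread_id
        << '\t' << r.frame << '\t' << r.since_start_s << '\t' << r.elapsed_ms
        << '\t' << r.level;
    return out.str();
}
```

The same line format could just as well be written to a socket as to a file, which is one way the QA and playtest reporting might be fed; that part is left out of the sketch.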

I'm sure there's more (comment!), but those are the ones that immediately come to mind. Some of this stuff is pretty high end; most indies aren't going to need network-based reporting so that they can get perf data from remote test labs. It's the sort of thing I like to keep in mind though. Still, I think the absolute basics should be built into everyone's code from the beginning, rather than being retrofitted in. I've seen enough posts on these forums by people trying to figure out why their game is stuttering or otherwise slow.

This is actually sort of a new revelation for me. Up until recently, I was perfectly happy to use the conventional profiling tools. Then it came to actually analyzing performance at work, and suddenly it hit me like a ton of bricks. Conventional profiling is practically useless, because of the averaging effect. How on earth are you going to find out what made some arbitrary frame slow? What if it's only one frame out of every hundred that is off? You're completely in the dark with something like VTune. In-code is really the only way to go.
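
To put made-up numbers on the averaging effect: suppose 99 frames take 10 ms each and one takes 60 ms. The mean is (99 × 10 + 60) / 100 = 10.5 ms, comfortably under a 16 ms budget, even though one frame visibly hitched. The small C++ helper below (the names `FrameStats` and `summarize` are hypothetical) computes both figures so you can see how little the mean tells you.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct FrameStats {
    double mean_ms;   // what an averaging profiler effectively reports
    double worst_ms;  // what the player actually notices
};

inline FrameStats summarize(const std::vector<double>& frame_ms) {
    assert(!frame_ms.empty());
    double total = std::accumulate(frame_ms.begin(), frame_ms.end(), 0.0);
    double worst = *std::max_element(frame_ms.begin(), frame_ms.end());
    return {total / static_cast<double>(frame_ms.size()), worst};
}
```

Fed the frame times above, `summarize` reports a mean of 10.5 ms and a worst frame of 60 ms; only the second number points at the stutter.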

Comments

Monder
Have you seen Phoenix? It's basically the MS compilers put together in a modular way that allows you to switch stages in and out and add in plugins. One of its intended uses is instrumenting code, so if you wish to create a new profiler it could come in handy.
September 01, 2007 02:48 AM
Anon Mike
Raymond Chen does it by having a preblog. He has several months' worth of articles just sitting on a server somewhere waiting to be automatically posted. That's also why all the articles are posted at the exact same time every day.

If he doesn't feel like writing or is too busy for a week or two his preblog gets a bit shorter. Then when the next 5-day serial article comes along he can pound the whole thing out in a day or two, break it up into pieces, and the preblog gets a bit longer.
September 01, 2007 12:09 PM
Emmanuel Deloget
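When I read your article, I was thinking of something along the lines of
#include <cstddef>
#include <string>
#include <utility>

// profiler_map_stack, get_stack_from_thread_local_storage() and
// get_current_time() are assumed to be defined elsewhere; the stack
// getter is assumed to return a reference to the per-thread results.
class embedded_profiler
{
  profiler_map_stack& stack;  // a reference, so the destructor updates the real data
  std::size_t begin_time;
  std::string function_name;
public:
  explicit embedded_profiler(std::string name)
  : stack(get_stack_from_thread_local_storage())
  , begin_time(get_current_time())
  , function_name(std::move(name))
  {
  }
  ~embedded_profiler()
  {
    // On scope exit, charge the elapsed time to this function's entry.
    std::size_t total_time = get_current_time() - begin_time;
    stack[function_name].add_execution_time(total_time);
  }
};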

Or maybe something a bit more... better?
September 01, 2007 10:00 PM
jollyjeffers
I was exploring this for '3D Pipes in Direct3D 10'. My initial version used ID3D10InfoQueue to try and create some central repository for all debug, event and profiling information.

Quality of information was great, but sadly the performance was abysmal. I had no choice but to drop it from the codebase [headshake].

Maybe we don't see any generic ones like Paul Nettle's mmgr because most people assume profiling is program-specific? I'm sure most people capture slightly different information in slightly different ways...
September 04, 2007 04:02 AM
Washu
Quote: Original post by Anon Mike
Raymond Chen does it by having a preblog. He has several months' worth of articles just sitting on a server somewhere waiting to be automatically posted. That's also why all the articles are posted at the exact same time every day.

If he doesn't feel like writing or is too busy for a week or two his preblog gets a bit shorter. Then when the next 5-day serial article comes along he can pound the whole thing out in a day or two, break it up into pieces, and the preblog gets a bit longer.

Wrong. Raymond Chen is a robot. Everyone knows this.
September 09, 2007 01:23 PM