In-engine profiler: how to automatically mark all/most functions for profiling?

4 comments, last by Kylotan 7 years, 1 month ago

Hi, after almost finishing the basic functionality of my 3D Game Engine, I am now starting to expand it. The first thing that I want to work on is to improve the in-engine profiler. Currently, I have quite a bit of functionality for manually marking beginning and end parts of code that should be profiled, which has been of great help.

However, I would like to have a way for all the functions in my code (or at least most of them) to be automatically marked for profiling. Engines like Unity and Unreal seem to achieve that somehow: you can choose to see very detailed profiling of most inner functions, even though the engines' source code does not appear to include a timestamp recorder at the beginning and end of every function.

I am trying to learn how to achieve the same, but it has been very hard to find references on this type of profiling. Could any of you give me some insights on how such a thing could be achieved, which techniques could be involved, etc? I would like to state from the start that I am well aware of good external profilers that could do this for my engine - I am specifically trying to learn how to implement such a thing in-engine.


Personally, I would not try to do this with an intrusive profiler - there's a decent overhead to every profiling block, so adding them at a fine-grained level will destroy the data that you're trying to collect.
Perhaps those engines use a sampling profiler instead? Unreal's source is available, so you could check. These profilers periodically interrupt a thread and record the call stack, which can later be converted into a stack of function names using the application's debug database (e.g. the PDB file on MSVC).
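
To make the sampling idea a bit more concrete, here is a minimal sketch of the post-processing half only: turning call stacks that were already captured and already resolved to names into per-function counts. Capturing the stacks (interrupting the thread, walking its frames) and looking up names in the debug database are platform-specific and left out. `Sample`, `FunctionStats` and `Aggregate` are hypothetical names, not taken from any engine.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// One sample = the call stack seen at one interruption, outermost frame first.
// Assumption: addresses were already resolved to function names.
using Sample = std::vector<std::string>;

struct FunctionStats {
    std::size_t self = 0;       // samples where the function was the innermost frame
    std::size_t inclusive = 0;  // samples where the function was anywhere on the stack
};

std::map<std::string, FunctionStats> Aggregate(const std::vector<Sample>& samples) {
    std::map<std::string, FunctionStats> stats;
    for (const Sample& stack : samples) {
        if (stack.empty()) continue;
        stats[stack.back()].self += 1;
        // Count each function once per sample, even if it recurses.
        std::set<std::string> seen(stack.begin(), stack.end());
        for (const std::string& name : seen) stats[name].inclusive += 1;
    }
    return stats;
}
```

With enough samples, a function's share of the sample count approximates its share of the run time, and the profiled code itself contains no markers at all.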

UE4 has quite explicit macros in the source code that mark out the areas that get profiled. I've not seen any magic that goes further than that.

Could any of you give me some insights on how such a thing could be achieved, which techniques could be involved, etc?

The two sets of keywords for search engines are "instrumenting profiler" and "sampling profiler". It looks like you are trying to build your own instrumenting code.

With an instrumenting profiler, SOMETHING inserts code into your program to record the metrics. There are several ways to do that: the code can be added manually to the source as stack-based profiling objects or as ugly 1980s-style macros, added through compiler flags that hook the function prolog/epilog processing, added at run time if the language supports reflection, or added by linking against a profiler library that automatically inserts function prolog/epilog calls.
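
As one example of the compiler-flag route: GCC and Clang accept `-finstrument-functions`, which makes the compiler call two hooks on entry to and exit from each instrumented function. The sketch below only counts calls; the counter names are hypothetical, and a real profiler would record a timestamp and the function address instead.

```cpp
// Build with GCC or Clang: g++ -finstrument-functions game.cpp
// Hypothetical counters; a real profiler would store a timestamp per event.
static unsigned long g_enterCount = 0;
static unsigned long g_exitCount = 0;

extern "C" {

// The hooks must not be instrumented themselves, and should avoid calling
// instrumented functions, otherwise they recurse endlessly.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    (void)this_fn;    // address of the function being entered
    (void)call_site;  // address it was called from
    __atomic_fetch_add(&g_enterCount, 1, __ATOMIC_RELAXED);  // compiler builtin, not a call
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    (void)this_fn;
    (void)call_site;
    __atomic_fetch_add(&g_exitCount, 1, __ATOMIC_RELAXED);
}

}  // extern "C"
```

The hooks only receive addresses, so names have to be looked up afterwards from symbol or debug information. Since every instrumented function pays for two extra calls, the overhead warning from the first reply applies here in full; the same `no_instrument_function` attribute can be used to exclude hot functions.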

In the case of Unity and other systems that do some profiling for you, it is a mix of systems. Unity has code that enables instrumented profiling within certain blocks. It gets activated and deactivated so that you see all of your own functions, which are instrumented using C# reflection, but only those parts of Unity's internals that are explicitly marked.

For code I write myself, I personally prefer scope-based objects in the functions I care about. Every time you enter a function, you create the object right as you enter:


void foo(...) {
  Profiler::Marker marker(__PRETTY_FUNCTION__); // creates log entry when created and another when destroyed

  /* Remaining code here */
}

You can add some optional parameters so it includes specific flags for whatever purposes you'd like, perhaps a construction signature like:

Marker(const char* function_name, const char* sub_name = nullptr, uint64 flags = 0);
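
For reference, a scope-based marker like that could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Event` layout, the per-thread `Log()`, and the use of `std::chrono::steady_clock` as the time source. The optional `sub_name` and `flags` parameters are left out to keep it short.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

namespace Profiler {

struct Event {
    const char* name;    // must outlive the event, e.g. a string literal
    bool begin;          // true on scope entry, false on scope exit
    std::int64_t nanos;  // timestamp from a monotonic clock
};

// Hypothetical per-thread log; flushing and display are left out.
inline std::vector<Event>& Log() {
    thread_local std::vector<Event> log;
    return log;
}

inline std::int64_t Now() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

// Writes one entry when constructed and another when destroyed, so the
// end entry is recorded on every exit path, including early returns.
class Marker {
public:
    explicit Marker(const char* name) : name_(name) { Log().push_back({name_, true, Now()}); }
    ~Marker() { Log().push_back({name_, false, Now()}); }
    Marker(const Marker&) = delete;
    Marker& operator=(const Marker&) = delete;

private:
    const char* name_;
};

}  // namespace Profiler

int Work() {
    // __func__ is the standard, portable cousin of __PRETTY_FUNCTION__.
    Profiler::Marker marker(__func__);
    int sum = 0;
    for (int i = 0; i < 1000; ++i) sum += i;
    return sum;
}
```

Calling `Work()` once leaves two entries in the calling thread's log, a begin followed by an end.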

I have created an in-game profiler as well. It's still not finished, but I already get a good overview of which parts of the code are slow.

It's fully based on macros: when I want to include a function, I simply stick one word after the function body - that's it.

When I want to time sub-parts of my code, I put a BEGIN_BLOCK("some name") and END_BLOCK() around them.

Internally it uses just two profiling event arrays, one to write to and one to read from, plus a single atomic value that stores both the active event array index and the current event index.

The overhead for recording each event is very small:


inline void RecordProfilerEvent(ProfilerType type, char* guid) {
    Assert(externProfilerMemory);
    ProfilerTable *profilerTable = (ProfilerTable *)externProfilerMemory->tableStorage;
    Assert(profilerTable);
    // One atomic add reserves a slot. The value packs two fields:
    // high 32 bits = active event array, low 32 bits = event index.
    u64 arrayIndex_EventIndex = AtomicAddU64(&profilerTable->eventArrayIndex_EventIndex, 1);
    u32 eventIndex = arrayIndex_EventIndex & 0xFFFFFFFF;
    Assert(eventIndex < ArrayCount(profilerTable->events[0]));
    ProfilerEvent *ev = profilerTable->events[arrayIndex_EventIndex >> 32] + eventIndex;
    ev->clock = __rdtsc(); // CPU timestamp counter
    ev->type = (u8)type;
    ev->coreIndex = 0; // @TODO: Retrieve core index from current thread or remove this field entirely
    ev->threadID = (u16)GetThreadID();
    ev->guid = guid; // pointer to a string literal built by PROFILER_ID, nothing is copied
}
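
The post does not show how the two arrays get swapped, so here is a sketch of one way such a packed index could be handled, written with `std::atomic` instead of the engine's own wrappers. `Table`, `Record`, `Swap`, `Snapshot` and `kMaxEvents` are hypothetical names, and the sketch assumes that only one thread ever calls `Swap`.

```cpp
#include <atomic>
#include <cstdint>

constexpr std::uint32_t kMaxEvents = 1024;  // hypothetical capacity

struct Event { std::uint64_t clock; const char* guid; };

struct Table {
    // High 32 bits: which array is being written. Low 32 bits: next free slot.
    std::atomic<std::uint64_t> arrayIndex_EventIndex{0};
    Event events[2][kMaxEvents];
};

// Writer side: reserve a slot with one atomic add, then fill it.
// Returns false instead of asserting when the array is full.
inline bool Record(Table& t, std::uint64_t clock, const char* guid) {
    std::uint64_t packed = t.arrayIndex_EventIndex.fetch_add(1);
    std::uint32_t slot = static_cast<std::uint32_t>(packed & 0xFFFFFFFFu);
    if (slot >= kMaxEvents) return false;
    t.events[packed >> 32][slot] = Event{clock, guid};
    return true;
}

struct Snapshot { const Event* events; std::uint32_t count; };

// Frame end: point writers at the other array with the slot counter reset.
// The old value says which array to read and how many slots were reserved.
inline Snapshot Swap(Table& t) {
    std::uint64_t current = t.arrayIndex_EventIndex.load();
    std::uint64_t next = ((current >> 32) ^ 1u) << 32;
    std::uint64_t old = t.arrayIndex_EventIndex.exchange(next);
    std::uint32_t count = static_cast<std::uint32_t>(old & 0xFFFFFFFFu);
    if (count > kMaxEvents) count = kMaxEvents;
    return Snapshot{t.events[old >> 32], count};
}
```

One caveat of this scheme: a writer may have reserved a slot but not finished filling it at the moment of the swap, so the reader either has to tolerate that or swap at a point where no other thread is recording.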

The GUID is created like this:


#define ABCD_(a, b, c, d) a "|" #b "|" #c "|" d
#define ABCD(a, b, c, d) ABCD_(a, b, c, d)
#define PROFILER_ID(name) ABCD(__FILE__, __LINE__, __COUNTER__, name)
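
To show what these macros produce, here is a small sketch using them at two call sites. The names `kUpdateId` and `kRenderId` are made up for the example. The extra `ABCD` layer is what lets `__LINE__` and `__COUNTER__` expand to numbers before `ABCD_` stringifies them; note that `__COUNTER__` is a widely supported compiler extension rather than standard C++.

```cpp
#include <cstring>

#define ABCD_(a, b, c, d) a "|" #b "|" #c "|" d
#define ABCD(a, b, c, d) ABCD_(a, b, c, d)
#define PROFILER_ID(name) ABCD(__FILE__, __LINE__, __COUNTER__, name)

// Adjacent string literals are concatenated at compile time, so each use site
// yields one constant string of the form "file|line|counter|name".
static const char* kUpdateId = PROFILER_ID("Update");
static const char* kRenderId = PROFILER_ID("Render");

// Returns true when two ids come from different use sites.
inline bool DifferentSites(const char* a, const char* b) {
    return std::strcmp(a, b) != 0;
}
```

Because the id is a string literal, recording an event only has to store the pointer, and the file, line and name can be split out of it later during processing.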

The only slow part is processing the events so that you can visualize them reasonably:

- Snapshotting the events, so you get at least two frames' worth of data.

- Each snapshot contains a single frame boundary event, so you can just look at the events between frame A and B.

- Traversing the events, calculating the delta cycles and building up a tree or list.

- Rendering the data as a list, a graph or whatever you want.

This won't affect the game code at all, and I can clearly see which code takes how much time.

Of course the entire frame time will increase with this collation process, but this is fine because all timings are just CPU cycle deltas between the recorded events.
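
The traversal step described above can be sketched roughly as follows: walk the begin/end events of one thread in order, keep a stack of open blocks, and compute the cycle delta whenever a block closes. The `Event` and `Node` types here are simplified assumptions, not the actual structures from my profiler, and the sketch assumes the events are in time order and properly nested.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

enum class EventType { Begin, End };

struct Event { EventType type; std::uint64_t clock; const char* guid; };

struct Node {
    std::string name;
    std::uint64_t cycles = 0;  // inclusive cycles spent in this block
    std::vector<Node> children;
};

// Builds a call tree from one thread's events for a single frame.
Node BuildTree(const std::vector<Event>& events) {
    Node root;
    root.name = "frame";
    std::vector<Node*> open{&root};     // path from the root to the current block
    std::vector<std::uint64_t> starts;  // begin clocks of the open blocks
    for (const Event& ev : events) {
        if (ev.type == EventType::Begin) {
            Node child;
            child.name = ev.guid;
            open.back()->children.push_back(std::move(child));
            Node* added = &open.back()->children.back();
            open.push_back(added);
            starts.push_back(ev.clock);
        } else if (open.size() > 1) {
            open.back()->cycles = ev.clock - starts.back();  // delta cycles
            open.pop_back();
            starts.pop_back();
        }
    }
    for (const Node& child : root.children) root.cycles += child.cycles;
    return root;
}
```

The resulting tree can then be drawn as a nested list or a flame-style graph. With several threads, the same pass would run once per thread ID.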

Just a hint: instead of using atomic trickery to get safe indices into a shared array, consider using thread local storage and collating the results at the end. It can be faster in most circumstances and it's arguably cleaner too. (Not so good for short-lived threads, though.)
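
A minimal sketch of that idea, under a few assumptions: each thread appends to its own `thread_local` buffer with no synchronization, hands the buffer to a shared registry when it flushes or exits, and the results are merged once the workers are done. `Registry`, `ThreadBuffer`, `Record` and `Collate` are hypothetical names.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct Event { std::uint64_t clock; const char* name; };

// Hypothetical registry of flushed per-thread buffers.
struct Registry {
    std::mutex mutex;
    std::vector<std::vector<Event>> buffers;
};

inline Registry& GlobalRegistry() {
    static Registry registry;
    return registry;
}

struct ThreadBuffer {
    std::vector<Event> events;

    // Hand the local events to the registry; the only step that takes a lock.
    void Flush() {
        if (events.empty()) return;
        Registry& r = GlobalRegistry();
        std::lock_guard<std::mutex> lock(r.mutex);
        r.buffers.push_back(std::move(events));
        events.clear();
    }
    ~ThreadBuffer() { Flush(); }  // flush whatever is left when the thread exits
};

inline ThreadBuffer& LocalBuffer() {
    thread_local ThreadBuffer buffer;
    return buffer;
}

// Hot path: no atomics and no locks, just an append to this thread's vector.
inline void Record(std::uint64_t clock, const char* name) {
    LocalBuffer().events.push_back(Event{clock, name});
}

// Merge everything that has been flushed so far, e.g. at the end of a frame.
inline std::vector<Event> Collate() {
    Registry& r = GlobalRegistry();
    std::lock_guard<std::mutex> lock(r.mutex);
    std::vector<Event> all;
    for (const auto& b : r.buffers) all.insert(all.end(), b.begin(), b.end());
    r.buffers.clear();
    return all;
}
```

For long-lived worker threads you would call `LocalBuffer().Flush()` at each frame boundary rather than wait for thread exit. The per-thread setup and flush cost is also why this fits short-lived threads less well.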

