I have written a partially simple performance profiler for my game projects and the basic functionality works: I can drop as many profiling start/end macro as i want and it magically shows me the number of cpu cycles, how much time it gets spent - great.
But when i include the part when the profiler builds up a tree for visualizing the recorded events - it simply crashes on a assert.
I know why this happen, but i have no idea how to get around that.
Now i explain how that system works:
1. I have a struct which defines a event, this contains a type (EventBegin, EventEnd), the actual rdtsc clocks, a unique identifier containing: __COUNTER__, __FUNCTION__, __LINE__ and a additional name + some extra parameters.
2. I have a external struct containing a two dimension array, storing the events (Ping pong buffer - or how is this called?)
constant U32 MAX_PROFILER_EVENT_COUNT = 16 * 65536;
struct ProfilerTable {
// NOTE: Event read array index (0 or 1)
U32 currentEventArrayIndex;
// NOTE: Contains two values, each 32bit long
// - Next event index to write to = EventCount - 1
// - Write event array index (0 or 1)
volatile U64 eventArrayIndex_EventIndex;
// NOTE: Two events array (One for read, One for write)
ProfilerEvent events[2][MAX_PROFILER_EVENT_COUNT];
};
3. In my main translation units i define the external table and allocate the required memory once and set everything to zero.
4. Now i may drop how many BEGIN() and END() into my code and each one records an ProfilerEvent with all parameters filled in.
It also works with nested blocks as well ;)
inline void RecordProfilerEvent(ProfilerType type, char* guid, F32 value = 0.0f) {
Assert(externalProfilerTable);
U64 arrayIndex_EventIndex = AtomicAddU64(&externalProfilerTable->eventArrayIndex_EventIndex, 1);
U32 eventIndex = arrayIndex_EventIndex & 0xFFFFFFFF;
Assert(eventIndex < ArrayCount(externalProfilerTable->events[0]));
ProfilerEvent *ev = externalProfilerTable->events[arrayIndex_EventIndex >> 32] + eventIndex;
ev->clock = __rdtsc();
...
}
#define BEGIN_BLOCK_(guid) {RecordProfilerEvent(ProfilerType_BeginBlock, guid);}
#define END_BLOCK_() {RecordProfilerEvent(ProfilerType_EndBlock, PROFILER_ID("::END_BLOCK::"));}
#define BEGIN_BLOCK(name) BEGIN_BLOCK_(PROFILER_ID(name))
#define END_BLOCK() END_BLOCK_()
void myAwesomeUpdateFunc() {
BEGIN_BLOCK("Awesome stuff");
...
END_BLOCK();
}
5. Right after the game loop has been finished, i toggle the event array index with atomic exchange, so that the event count is reset to zero and the event array index is swapped.
// NOTE: Toggle event array index
Assert(externalProfilerTable);
externalProfilerTable->currentEventArrayIndex = !externalProfilerTable->currentEventArrayIndex;
U64 arrayIndex_EventIndex = AtomicExchangeU64(&externalProfilerTable->eventArrayIndex_EventIndex, (U64)externalProfilerTable->currentEventArrayIndex << 32);
U32 eventArrayIndex = arrayIndex_EventIndex >> 32;
Assert(eventArrayIndex <= 1);
U32 eventCount = arrayIndex_EventIndex & 0xFFFFFFFF;
6. Now i record all used events in a historical listy way right after i have done the atomic exchange (Note that atomic exchange returns the value before the change happened!).
// NOTE: Record snapshot
U32 snapshotIndex = AtomicExchangeU32(&profilerState->snapshotIndex, (profilerState->snapshotIndex + 1) % PROFILER_MAX_SNAPSHOT_COUNT);
ProfilerSnapshot *snapshot = profilerState->snapshots + snapshotIndex;
snapshot->eventCount = eventCount;
for (U32 eventIndex = 0; eventIndex < eventCount; ++eventIndex) {
ProfilerEvent *sourceEvent = externalProfilerTable->events[eventArrayIndex] + eventIndex;
ProfilerEvent *targetEvent = snapshot->events + eventIndex;
*targetEvent = *sourceEvent;
}
7. Lastly i build a simple tree from the recorded events (Begin is a new sibling, End restores to the previous sibling). Also i calculate the actual delta value from the rdtsc and then i simply draw it in a table, graph, whatever.
Everything i descriped works great - except for timing this "Collation" process itself. Right now when i time this entire process (put a BEGIN() at the start of the function and a END() at the end of the function) - it either crashes due to my assert for trying to not corrupting the tree structure - or have an absolute value instead of a delta value.
When the BEGIN() starts before the atomic exchange, i get the assertion which is totally legal - because i write a start event in the current event array - but the end event gets inserted as first into the other event array.
But when i put the BEGIN() after the atomic exchange, the first two entries for the next frame contains the start and end event for that profiler collation - but from the previous frame.
I really want to see the profiled profiler timings at the end of my tree - so what should i do? How would you approach that problem?