Developing A Profiler

4 comments, last by Andrew Kabakwu 15 years, 4 months ago
Hi guys, after spending a long time adding support for vertex arrays and other OpenGL features to my engine in an attempt to increase rendering speed, I finally got to the stage of comparing both modes in the engine. However, instead of depending on the simple FPS counter approach, I've decided to implement a profiling system. It's a completely separate project in its own right, which will provide hooks for rendering the statistics in the game window as it profiles the code. I'd like to ask for suggestions on necessary features for a profiler, and for links to documents and tutorials on profiling code. This is the first time I am coding a profiler. Thanks
You may want to consider using NVIDIA PerfKit and gDEBugger.
See also GPU Performance Tuning with NVIDIA Performance Tools.
Hi Kambiz,
Thanks for the response.
I am familiar with those tools (I have looked at them).

gDEBugger's prices are high for a hobbyist.
NVIDIA's seems to require a special driver.

I am not trying to do the same thing as these or compete with them; I just want to take a stab at profiling and provide a free, easy solution.

I still hope to get some more suggestions on how to go about doing this.

Thanks
1) You won't be able to profile the graphics end of the pipeline without the tools Kambiz linked. The best you can do is measure the time between known flush operations (glFlush(), or the platform's buffer swap call such as SwapBuffers()), since the driver is free to decide when it needs to block. This means most functions return almost instantly, no matter how much work the operation represents. The only operations that guarantee synchronization to the GPU are the flush calls (and even they are free to return instantly if the GPU has room in the queue for a new frame, though it usually doesn't and these calls will block until this frame is queued).

2) Look up QueryPerformanceCounter or another high-performance counter for your system. But note that QPC and other timers all have their own little nuances. Sometimes multi-processor systems will give you funny results. A non-realtime OS like Windows is free to do what it wants with your threads.
So expect results like:
average time 0.02 ms
min time 0.01 ms
max time 15.00 ms
total time 24.99 ms
call count 1,000
This happens because your profile can get interrupted by a context switch, which makes any fast function occasionally show up as running really slowly (even more so if a large percentage of the total application time is spent within that function, since that makes the chance of a context switch hitting that function really high).
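To make the effect of a single interrupted call visible, it can help to report a median next to min, max and total. The following is a minimal sketch, not part of any profiler mentioned in this thread; the names TimingStats and summarise are made up for illustration, and the samples are assumed to be per-call durations in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary of one profile's samples (all times in milliseconds).
struct TimingStats {
    double min_ms;
    double max_ms;
    double total_ms;
    double avg_ms;
    double median_ms;
    std::size_t call_count;
};

// Summarise per-call durations. The vector is taken by value so it can be
// sorted without disturbing the caller's data.
TimingStats summarise(std::vector<double> samples_ms) {
    TimingStats s{0.0, 0.0, 0.0, 0.0, 0.0, samples_ms.size()};
    if (samples_ms.empty()) return s;

    std::sort(samples_ms.begin(), samples_ms.end());
    s.min_ms = samples_ms.front();
    s.max_ms = samples_ms.back();
    s.total_ms = std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0);
    s.avg_ms = s.total_ms / static_cast<double>(samples_ms.size());
    // Upper median for even-sized inputs; good enough for spotting outliers.
    s.median_ms = samples_ms[samples_ms.size() / 2];
    return s;
}
```

With 999 calls of 0.01 ms and one call of 15 ms, the median stays at 0.01 ms while the maximum is 15 ms, which is a hint that the slow sample is an interruption rather than typical behaviour.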

3) The simplest profiler would contain a class that upon instantiation takes a const char * name, checks the time, and on destruction checks the time again, then pushes the start-end pair along with the profile's name into the profiler's queue. At the end of the frame (outside all profile blocks, especially the profile that covers the current frame's total time) you'd do whatever you wanted with the queue of profile data. You could collate all instances with the same name for call-count, total times, average times, and otherwise prepare the data for presentation.
The key is making sure you do as little processing as possible per profile during the profile step. It is better to post-process your profile blocks so that the profiler doesn't impact the timings of what it is profiling.
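The scoped class described in point 3 could be sketched roughly as follows. This is only an illustration under some assumptions: it uses std::chrono::steady_clock instead of QueryPerformanceCounter to stay portable, and the names ProfileSample, profile_queue and ScopedProfile are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

// One start-end pair plus the profile's name, as described in the post.
struct ProfileSample {
    const char* name;
    Clock::time_point start;
    Clock::time_point end;
};

// Hypothetical global queue of samples collected during the current frame.
inline std::vector<ProfileSample>& profile_queue() {
    static std::vector<ProfileSample> queue;
    return queue;
}

// Records the time on construction and again on destruction, then pushes
// the raw pair into the queue. No collation happens here.
class ScopedProfile {
public:
    explicit ScopedProfile(const char* name)
        : name_(name), start_(Clock::now()) {}

    ~ScopedProfile() {
        profile_queue().push_back({name_, start_, Clock::now()});
    }

    ScopedProfile(const ScopedProfile&) = delete;
    ScopedProfile& operator=(const ScopedProfile&) = delete;

private:
    const char* name_;
    Clock::time_point start_;
};
```

Usage would be a local such as `ScopedProfile p("update");` at the top of a block. Calling `profile_queue().reserve(n)` before the frame starts avoids allocations inside timed code, and the queue can be processed and cleared once the frame's outermost profile has closed.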

Another thing you may want to include is a "stack depth" on your profiles so that, while walking over your list of profiles for this frame, you can determine which profiles were inside other profiles and then break each profile into "parent time" and "child time".
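One way the stack depth could be used in post-processing is sketched below. It assumes the records are ordered by start time and that depth 0 is the outermost profile; Record, attribute_child_time and self_ms are hypothetical names, and times are plain milliseconds to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical post-processed record. child_ms starts at 0 and is filled in
// by attribute_child_time().
struct Record {
    const char* name;
    double start_ms;
    double end_ms;
    int depth;        // 0 = outermost profile
    double child_ms;  // time spent in direct children
};

// Records must be ordered by start time (the order in which they opened).
void attribute_child_time(std::vector<Record>& records) {
    std::vector<std::size_t> open;  // indices of the current record's ancestors
    for (std::size_t i = 0; i < records.size(); ++i) {
        // Drop everything at the same depth or deeper, so that open.back()
        // is the direct parent of record i.
        while (open.size() > static_cast<std::size_t>(records[i].depth)) {
            open.pop_back();
        }
        if (!open.empty()) {
            records[open.back()].child_ms +=
                records[i].end_ms - records[i].start_ms;
        }
        open.push_back(i);
    }
}

// "Parent time" in the sense of the post: total time minus child time.
double self_ms(const Record& r) {
    return (r.end_ms - r.start_ms) - r.child_ms;
}
```

Because only direct children are added to a parent, a grandchild's time is counted once in its own parent and reaches the grandparent through that parent's total duration.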

If your profiles have high call counts over a frame, you may want to change your profile storage to a hash and accept the overhead of collating on the fly in order to cut down on the memory my method above would require. But note that this means you can no longer accurately place profiles within profiles, especially for small functions:
// This is really contrived, but somewhat representative of what some people
// think of when they think of profiling.
// And it is representative of the fact that you can no longer profile classes
// like your vector or matrix code at the function level.
// Profile(...) is assumed to be the profiler's scoped-timer macro.
int dotVec3(int x, int y, int z)
{
    Profile("dotVec3");
    return x * y;
}

int foo(int x)
{
    Profile("foo");
    return dotVec3(x, x, x) * dotVec3(x, x, x);
}

int main()
{
    for (int i = 0; i < 1000000; ++i)
        foo(i);
}
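The hash-based storage mentioned above could look something like the sketch below, which keeps one running aggregate per profile name instead of one entry per call. The names Aggregate and CollatingProfiler are hypothetical, and std::unordered_map is just one possible choice of hash table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Running totals for every call that shares one profile name.
struct Aggregate {
    std::size_t call_count;
    double total_ms;
    double min_ms;
    double max_ms;
};

// Collates on the fly: memory grows with the number of distinct names,
// not with the number of calls.
class CollatingProfiler {
public:
    void record(const std::string& name, double elapsed_ms) {
        auto it = table_.find(name);
        if (it == table_.end()) {
            table_.emplace(name, Aggregate{1, elapsed_ms, elapsed_ms, elapsed_ms});
            return;
        }
        Aggregate& a = it->second;
        ++a.call_count;
        a.total_ms += elapsed_ms;
        a.min_ms = std::min(a.min_ms, elapsed_ms);
        a.max_ms = std::max(a.max_ms, elapsed_ms);
    }

    // Returns nullptr if nothing was recorded under that name.
    const Aggregate* find(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::unordered_map<std::string, Aggregate> table_;
};
```

The trade-off is the one described in the post: the hash lookup happens inside the timed region, and the individual start-end pairs are gone, so nesting can no longer be reconstructed afterwards.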
With the exception of the PerfSDK, I don't know of any API to access GPU performance counters. But there is an indirect way to measure how long some operation takes: you can measure the frame time with and without the operation and take the difference. For example, if you want to know how much time your parallax shader takes to execute, you calculate:

dt = frame_time(parallax_on) - frame_time(parallax_off)
where parallax_off means you have replaced the parallax shader with the most basic shader (constant color) for that time measurement.
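Since single frame times are noisy, the difference is usually more meaningful when taken between averages over many frames. A minimal sketch of that calculation, with the hypothetical helper names mean_ms and estimated_cost_ms and frame times assumed to be in milliseconds:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Average of a set of measured frame times (milliseconds).
double mean_ms(const std::vector<double>& frame_times_ms) {
    if (frame_times_ms.empty()) return 0.0;
    double total = std::accumulate(frame_times_ms.begin(),
                                   frame_times_ms.end(), 0.0);
    return total / static_cast<double>(frame_times_ms.size());
}

// dt = frame_time(feature_on) - frame_time(feature_off), using averages.
double estimated_cost_ms(const std::vector<double>& with_feature_ms,
                         const std::vector<double>& without_feature_ms) {
    return mean_ms(with_feature_ms) - mean_ms(without_feature_ms);
}
```

For example, frame times of 17, 18 and 16 ms with the shader on and 15 ms each with it off give an estimated cost of 2 ms per frame. The estimate only holds if nothing else changes between the two runs.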

There is also other useful data you can easily collect, like the number of draw calls and state changes.
Thank you for all your comments.

Currently, the system I implemented uses a simple approach. It has an array of profile data where each profile object created stores its information.
The user can specify the size of the array when the profiling system is initialised. This represents how many profiles are likely to be made per frame/application cycle.

When each profile is destroyed, I gather Parent->Child information.
The current profile's parent is the profile which has the closest start time to the current profile being destroyed.
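As an alternative to searching by start time, the parent can be recorded when a profile opens by remembering which profile is currently innermost. The sketch below is not the system described in this post, just one possible variation on it; ProfileArray, Entry, open and close are hypothetical names, and the capacity plays the role of the user-specified array size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One slot in the profile array; parent is an index, or -1 for a root.
struct Entry {
    const char* name;
    int parent;
};

class ProfileArray {
public:
    explicit ProfileArray(std::size_t capacity) : capacity_(capacity) {
        entries_.reserve(capacity);  // no allocation while profiling
    }

    // Opens a profile; its parent is whichever profile is innermost now.
    // Returns the new index, or -1 if the array is full.
    int open(const char* name) {
        if (entries_.size() >= capacity_) return -1;
        entries_.push_back({name, current_});
        current_ = static_cast<int>(entries_.size()) - 1;
        return current_;
    }

    // Closes a profile; its parent becomes the innermost open profile again.
    void close(int index) {
        if (index >= 0) current_ = entries_[static_cast<std::size_t>(index)].parent;
    }

    const Entry& at(int index) const {
        return entries_.at(static_cast<std::size_t>(index));
    }

    std::size_t size() const { return entries_.size(); }

    // Call at the end of a frame/application cycle, after processing.
    void reset() {
        entries_.clear();
        current_ = -1;
    }

private:
    std::size_t capacity_;
    std::vector<Entry> entries_;
    int current_ = -1;  // index of the innermost open profile
};
```

This relies on profiles closing in the reverse order they opened, which scoped objects on a single thread give you for free. Timing fields are left out here to keep the parent tracking easy to follow.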

I am looking for better ways of storing and processing the data at the end of a frame/application cycle (I am not making the profiler game-centric).

I currently ask the user to supply a deltaTime when calculating and storing the profiler history (Min, Avg, Max, etc.). After using this for a while, I really dislike that requirement. I'll be changing this soon as I am reworking the whole system.

Thanks for the ideas, I'll come back with an update and hopefully something worth showing.

I am still open to more suggestions too.

This topic is closed to new replies.
