# My open source C++ performance profiler



Hi everyone,

I recently wrote a simple C++ performance profiler. Any suggestions are welcome.

You can download it at: https://sourceforge.net/project/showfiles.php?group_id=249996

Homepage: http://freeprofiler.sourceforge.net/

# Introduction

## 1. Basic Information

FreeProfiler is a lightweight C++ code performance profiler. It provides utility macros for users to insert into their code. After recompiling and running their application, users can view the performance results.

Unix name: freeprofiler

CVS anonymous access: :pserver:anonymous@freeprofiler.cvs.sourceforge.net:/cvsroot/freeprofiler

## 2. Why Use FreeProfiler?

2.1 Compared to other performance profilers, FreeProfiler works at a finer granularity. It measures code blocks and message processing rather than whole functions, so you can see exactly how much time a specific part of a function, or a specific Windows message, consumes.

Here is a typical case where FreeProfiler is useful: you have a window callback procedure and you want to know which messages take the most time. You simply add a FreeProfiler macro at the beginning of the callback, like this:

    LRESULT CALLBACK AppWndProc(HWND hwnd, UINT message, WPARAM wParam, LPARAM lParam)
    {
        // Times the handling of whichever message is being processed.
        FreeProfilerRecordMessageBlock(message);
        switch (message)
        {
        case WM_CREATE:
            ......
            ......
        }
        ......
    }


And you can view the XML-format result after you exit your application.

2.2 It is lightweight. FreeProfiler has minimal impact on the performance of the application.

2.3 It is thread-safe, so you don’t need to worry if your function is called by multiple threads simultaneously.

2.4 It is open source and free. You can use it for any purpose.

[Edited by - renqilin on January 19, 2009 4:15:53 AM]

##### Share on other sites
It would be nice if you would reformat your code using the forum's source tags (C++ is the default).

##### Share on other sites
Thank you, phresnel.

##### Share on other sites
I made some modifications to the package. There are now two zip files: a binary zip and a source zip.

If you don't want to compile FreeProfiler yourself, you can just download the binary zip.

##### Share on other sites

I just realized that there are the following bugs in my package:

1. __declspec(thread) does not work well with LoadLibrary. (This is fixed now).
2. rdtsc does not work well on a multi-core CPU.

##### Share on other sites
Will it also work without inserting any of those macros into my code?

##### Share on other sites
Hi Rattenhirn, thanks for your comment. I'm sorry, but FreeProfiler won't work unless you insert the macros into your code. I have logged this as a future feature. Ideally, FreeProfiler would analyze your source code and insert the instrumentation macros automatically.

For now, you need to insert the macros yourself. One benefit, however, is that you can measure the performance of a small part of an algorithm.

##### Share on other sites
Nice tool. Thanks for making this open source.

I browsed the source and noticed that you use construction and destruction to start and stop the high performance timer. I've been using this technique at work to do ad-hoc profiling and found it extremely useful in hunting down bottlenecks without pulling out the big guns (like BoundsChecker). This is much more mature, cheers for putting it together. It may find its way into my hobby projects.
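The construction/destruction technique described above can be sketched roughly as follows. This is a minimal illustration, not FreeProfiler's actual code: `ScopedTimer`, `ProfileRegistry`, and `BlockStats` are hypothetical names, and `std::chrono::steady_clock` stands in for whatever timer the library really uses.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical per-block totals; not FreeProfiler's real data structure.
struct BlockStats {
    std::uint64_t calls = 0;
    std::chrono::nanoseconds total{0};
};

// Hypothetical registry that accumulates timings under a mutex,
// so timers on different threads can report into it safely.
class ProfileRegistry {
public:
    void record(const std::string& name, std::chrono::nanoseconds elapsed) {
        std::lock_guard<std::mutex> lock(mutex_);
        BlockStats& stats = stats_[name];
        ++stats.calls;
        stats.total += elapsed;
    }

    // Returns a copy of the stats for a block (zeros if never recorded).
    BlockStats get(const std::string& name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = stats_.find(name);
        return it == stats_.end() ? BlockStats{} : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, BlockStats> stats_;
};

// Starts timing in the constructor and reports in the destructor.
class ScopedTimer {
public:
    ScopedTimer(ProfileRegistry& registry, std::string name)
        : registry_(registry),
          name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        registry_.record(
            name_,
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    ProfileRegistry& registry_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

Declaring `ScopedTimer timer(registry, "physics");` at the top of a block records that block's elapsed time when `timer` goes out of scope, including on an early return.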

##### Share on other sites
To Valere:
Thanks, and yes, the high-resolution timer is fast. However, it does not work well on a multi-core CPU. This is something I need to improve.

##### Share on other sites
You should add the ability to output to a real-time debug render, like iprof. Writing to a text file (XML or otherwise) is pretty useless imo. It's hugely more useful to see real-time timer stats on screen so performance issues can be examined as they happen. I don't mean add rendering code, but something much simpler, like a lightweight render interface the application user can implement that will let the profiler optionally draw to the screen.
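One possible shape for such a render interface is sketched below. Everything here is an assumption for illustration: `IProfilerRenderer`, `TimerSample`, `drawStats`, and `RecordingRenderer` are hypothetical names and are not part of FreeProfiler or iprof.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-timer result handed to the renderer each frame.
struct TimerSample {
    std::string name;
    double milliseconds;
};

// Hypothetical interface: the application implements drawText with
// whatever text rendering it already has.
class IProfilerRenderer {
public:
    virtual ~IProfilerRenderer() = default;
    virtual void drawText(int x, int y, const std::string& text) = 0;
};

// Formats one line per timer and passes it to the application's renderer.
void drawStats(IProfilerRenderer& renderer,
               const std::vector<TimerSample>& samples,
               int lineHeight = 16) {
    int y = 0;
    for (const TimerSample& sample : samples) {
        std::ostringstream line;
        line << sample.name << ": " << std::fixed << std::setprecision(2)
             << sample.milliseconds << " ms";
        renderer.drawText(0, y, line.str());
        y += lineHeight;
    }
}

// Example implementation that only collects the lines, useful for
// logging or for checking the output without a graphics API.
class RecordingRenderer : public IProfilerRenderer {
public:
    void drawText(int /*x*/, int /*y*/, const std::string& text) override {
        lines.push_back(text);
    }
    std::vector<std::string> lines;
};
```

The profiler only depends on the abstract interface, so a game can route the text to its HUD while a command-line tool can print it.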

##### Share on other sites
Quote:
 You should add the ability to output to a real time debug render, like iprof.

Adding interfaces for real-time rendering of performance data is a good idea, especially for games. Thanks, DrEvil. I will log this.

##### Share on other sites
Quote:
 rdtsc does not work well on a multi-core CPU.

Are you using QueryPerformanceCounter or is that where the problem lies?

##### Share on other sites

I didn't use QueryPerformanceCounter, but used time stamp counter "rdtsc".

It is faster but the problem is: it is not thread-safe.

##### Share on other sites
From what I've heard, that's only part of the problem. I'm sure others will promptly suggest using QueryPerformanceCounter/Frequency. You may want to search through some old existing threads.

##### Share on other sites
Quote:
Original post by renqilin
 I didn't use QueryPerformanceCounter, but used time stamp counter "rdtsc". It is faster but the problem is: it is not thread-safe.

I think it's more a case that with multi-core processors it will provide inaccurate results, and QueryPerformanceCounter suffers from the same problem (see Remarks).

##### Share on other sites
Quote:
Original post by renqilin
Quote:
 You should add the ability to output to a real time debug render, like iprof.

Adding interfaces for real-time rendering of performance data is a good idea, especially for games. Thanks, DrEvil. I will log this.

Or even a standalone 'monitor' application would be nice :)

##### Share on other sites
Quote:
Original post by renqilin
 I didn't use QueryPerformanceCounter, but used time stamp counter "rdtsc". It is faster but the problem is: it is not thread-safe.

rdtsc doesn't take varying CPU frequencies (due to power management etc.) and out-of-sync cycle counts between cores into account.

Go for QueryPerformanceCounter :)
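For reference, reading QueryPerformanceCounter looks roughly like this. It is a minimal sketch: `nowSeconds` is a hypothetical helper name, and the non-Windows branch falls back to `std::chrono::steady_clock` only so the example compiles elsewhere.

```cpp
#include <chrono>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: seconds since an arbitrary starting point.
double nowSeconds() {
#ifdef _WIN32
    LARGE_INTEGER frequency;
    LARGE_INTEGER counter;
    // Ticks per second of the performance counter.
    QueryPerformanceFrequency(&frequency);
    // Current tick count.
    QueryPerformanceCounter(&counter);
    return static_cast<double>(counter.QuadPart) /
           static_cast<double>(frequency.QuadPart);
#else
    // Fallback for non-Windows builds (an assumption of this sketch).
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
#endif
}
```

A block is timed by calling `nowSeconds()` before and after it and subtracting the two values.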

##### Share on other sites
Quote:
 I think it's more a case that with multi-core processors it will provide inaccurate results, and QueryPerformanceCounter suffers from the same problem (see Remarks).

QueryPerformanceCounter suffers the same thread problem? Really?

Oh, that's really a problem.

##### Share on other sites
Quote:
Original post by renqilin
Quote:
 I think it's more a case that with multi-core processors it will provide inaccurate results, and QueryPerformanceCounter suffers from the same problem (see Remarks).

QueryPerformanceCounter suffers the same thread problem? Really?

Oh, that's really a problem.

Just use SetThreadAffinityMask for the (main) thread in which you calculate timings. Should be fine :)

Edit: Oh, just realized that you actually need to do the timings in several threads, sorry.
Not sure what exactly "different results on different processors due to bugs" means.
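The affinity suggestion above could look something like the following. This is only a sketch: `pinCurrentThreadToCore` is a hypothetical helper, and on non-Windows builds it deliberately does nothing and returns false.

```cpp
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: pins the calling thread to one logical core so
// that successive timer reads come from the same processor.
// Returns true on success.
bool pinCurrentThreadToCore(unsigned core) {
    // The affinity mask has one bit per logical processor.
    if (core >= sizeof(std::size_t) * 8) {
        return false;
    }
#ifdef _WIN32
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << core;
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    // No-op outside Windows (an assumption of this sketch).
    return false;
#endif
}
```

As the edit above notes, a profiler that times several threads would have to pin each of them, which restricts how the scheduler can place the application's threads.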

##### Share on other sites

Thank you for the information, stein.

I logged this as a bug and I will do some research on QueryPerformanceCounter and SetThreadAffinityMask tomorrow.

##### Share on other sites
I read some docs today, and I found that both rdtsc and QueryPerformanceCounter are *unusable*!

Some information:

1. QueryPerformanceCounter isn't guaranteed to be monotonic, even on single-processor systems: it increases unsteadily and sometimes goes backwards! (Actually, what happens is the performance counter runs too fast, getting ahead of real time, and is then corrected by a lower-resolution timer somewhere else in the system.) This isn't what you want when measuring frame times for your game. Even if you clamp the delta values to zero, you won't get a smoothly increasing counter, which causes things like player movement and physics to go haywire. Moreover, on laptops the processor speed will actually vary depending on CPU usage, and the value returned by QueryPerformanceFrequency may or may not change to represent this.

2. SetThreadAffinityMask or SetProcessAffinityMask can fix part of the problem with QueryPerformanceCounter on a multi-core CPU. Unfortunately, QueryPerformanceCounter will still fail when Hyper-Threading is enabled, even if you restrict all the threads to the first processor.
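To make the clamping remark in item 1 concrete, here is a small sketch of a filter that never lets a reading go backwards. `MonotonicFilter` is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Hypothetical filter: feed it raw counter readings and it returns a
// value that never decreases. A backwards jump is clamped, so the
// reported time simply stalls until the raw counter catches up.
class MonotonicFilter {
public:
    std::int64_t filter(std::int64_t raw) {
        if (!hasLast_ || raw > last_) {
            last_ = raw;
            hasLast_ = true;
        }
        return last_;
    }

private:
    std::int64_t last_ = 0;
    bool hasLast_ = false;
};
```

For raw readings of 100, 105, 103, 110 the filter returns 100, 105, 105, 110. The deltas become 5, 0, 5, which shows why clamping alone does not give a smoothly increasing counter: the backwards jump turns into a frame with zero elapsed time.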

##### Share on other sites
Well, I'm wondering what other people use to get reasonably accurate timing (especially in games). I always read about the problems with QueryPerformanceCounter on multi-core systems (there are a lot of threads on the net about this), but I never stumbled over a solution. Obviously timeGetTime is far too slow, and using core-specific instructions isn't ideal either.
Are there even any counters that can be used?

##### Share on other sites

Here is a more accurate timer. Unfortunately, it takes more than 250 milliseconds to get the current time.

slow but calibrate pc timer

##### Share on other sites
This timing issue has caused lots of trouble, which I hope to reduce by providing an article on the topic along with a partial solution :)

##### Share on other sites
Ah ha! Thank you very much, Jan Wassenberg. Your solution looks like a fairly good way to solve the problem.

I heard that Windows Vista internally uses the HPET (High Precision Event Timer) to implement QueryPerformanceCounter. Therefore, the multi-core issue should only be a problem on Windows 9x, Windows NT and Windows XP. Please correct me if I'm wrong here.