# How to profile code

This topic is 3927 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I've searched all over the forums for how to profile code, but I always end up with just suggestions TO profile, nothing else. Is this a feature of some IDE, or a standalone utility kind of like the GNU debugger? How do you profile your code?

##### Share on other sites
A good free profiler is AMD CodeAnalyst. Although it's from AMD, it also works on Intel processors.

##### Share on other sites
It really depends on three things:
1. How much money you want to spend
2. What platform you are on
3. What language you are developing in.

The poor man's profiler is to just do:

```cpp
DWORD start = GetTickCount();
// ... run the code you want to measure ...
DWORD end = GetTickCount();
DWORD totalTime = end - start;   // elapsed milliseconds
```

This works, but it's a pain to use, so you can take the time to write your own full-featured profiler like this.

Compuware gives out a toned-down freeware (or is it just a trial now?) version of its fantastic code profiler. You can find it here.

If you don't have the time or skills for that, then shelling out some cash might be best. There are a few alternatives; Wikipedia has a huge selection of links, so peruse them.

If you are on Linux there is one built into the system (I can't remember the name though..).

Hope this helps somewhat.

##### Share on other sites
Of course, if you use Dev-C++ you can use its simple built-in profiler.

##### Share on other sites
If you use Linux/gcc:

- compile using -pg.
- run the executable
- do `$ gprof executablenamehere > resultsfilehere`

Maybe there is a Windows port of gprof, I don't really know.

##### Share on other sites
Quote:
Original post by hydroo
If you use Linux/gcc: compile using `-pg`, run the executable, then do `$ gprof executablenamehere > resultsfilehere`. Maybe there is a Windows port of gprof, I don't really know.

Ah, that's right. =) Can't believe I'd forgotten that, considering I was using it in my last semester's class!

##### Share on other sites
Don't use GetTickCount. It has a resolution in the millisecond range. Unless your code is really slow, or you're timing entire frames, you need a higher resolution such as from QueryPerformanceCounter.

##### Share on other sites
Quote:
Original post by Deyja
Don't use GetTickCount. It has a resolution in the millisecond range. Unless your code is really slow, or you're timing entire frames, you need a higher resolution such as from QueryPerformanceCounter.

It's more than that, though: arbitrary profiling like that tends to produce results that are biased either for or against the code in question. Your code needs to be profiled in the usage domain that is applicable to the situation. The standard method of

    Get start time
    Do something that is expensive
    Get end time

has several problems, in that it doesn't actually mimic a real-world usage pattern. For instance, profiling dot-product code by performing a dot product many thousands of times in a row likely isn't a very good benchmark, because in real code you will usually be doing something with the result of each dot product that could introduce more overhead, or that could cause events like cache flushes which end up being more expensive than the dot product itself.

##### Share on other sites
Quote:
Original post by Deyja
Don't use GetTickCount. It has a resolution in the millisecond range. Unless your code is really slow, or you're timing entire frames, you need a higher resolution such as from QueryPerformanceCounter.

A lot of people have noticed QPC is buggy on modern processors. timeGetTime with 1ms granularity is probably a safer option. Anyway, continue.

##### Share on other sites
Quote:
Original post by skittleo
Quote:
Original post by Deyja
Don't use GetTickCount. It has a resolution in the millisecond range. Unless your code is really slow, or you're timing entire frames, you need a higher resolution such as from QueryPerformanceCounter.

A lot of people have noticed QPC is buggy on modern processors. timeGetTime with 1ms granularity is probably a safer option. Anyway, continue.

Actually, it's not buggy; it's just that if you do not set the processor affinity, your QPC results will depend on which core your thread executes on. Windows will attempt to keep your threads localized to a single core, but... not always. As such, due to the different execution speeds of the various cores (since clock rates will vary per core), you may end up with different values being returned by QPC. The fix is simply a matter of setting your processor affinity.

##### Share on other sites
Quote:
Original post by Washu
Quote:
Original post by skittleo
Quote:
Original post by Deyja
Don't use GetTickCount. It has a resolution in the millisecond range. Unless your code is really slow, or you're timing entire frames, you need a higher resolution such as from QueryPerformanceCounter.

A lot of people have noticed QPC is buggy on modern processors. timeGetTime with 1ms granularity is probably a safer option. Anyway, continue.

Actually, its not buggy, its just that if you do not set the processor affinity, your QPC results will be dependent upon the core that your thread executes on. Windows will attempt to keep your threads localized to a single core, but...not always. As such, due to different execution speeds of the various cores (since clock rates will vary per core), you may end up with different values being returned by QPC. To fix this is simply a matter of setting your processor affinity.

But that's not the only problem with QPC: the performance counter value may unexpectedly leap forward. Without an understanding of both these issues (which just about every article I've encountered seems to miss), it would be easy to say that QPC is buggy. And actually it is buggy, since the cause of the problem I linked is a design defect.

##### Share on other sites
Quote:
Original post by Washu
Actually, its not buggy, its just that if you do not set the processor affinity, your QPC results will be dependent upon the core that your thread executes on.

It's a bit more complicated than that... oh, and...

Quote:
Original post by Washu
Windows will attempt to keep your threads localized to a single core, but...not always.

No, it won't... at least not under Windows XP64 Pro. XP64 on a dual-core AMD64 in fact attempts to balance the usage of each core even if it means swapping a single thread back and forth many times per second, so that, for example, each core shows a running 60% average utilization. The only way to prevent this is to set a thread or process affinity.

I am assuming that this is to theoretically balance heat generation between the cores, but perhaps I have their motive wrong.

In any event, you would NOT want to avoid this behavior unless the production code will specifically avoid it too. Profile the real thing, not a mock trial with specially allocated thread affinities and so forth.

Quad cores are here, so unless you plan on having special cases for 1-core, 2-core, 4-core, and soon 8-core systems... you really SHOULD let the OS manage the cores itself.

Quote:
Original post by Washu
As such, due to different execution speeds of the various cores (since clock rates will vary per core), you may end up with different values being returned by QPC. To fix this is simply a matter of setting your processor affinity.

I believe you are referring to the output of the RDTSC instruction, which on unpatched dual-core AMD64s exhibited this specific problem. Additionally, on Pentium processors with energy-saving features, or with AMD's "Cool'n'Quiet" technology, the clock speed can change dynamically.

If you cannot use GetTickCount (which often has an accuracy of +/- 10ms or 55ms) to profile an optimization, then you are probably profiling something that doesn't need optimization. It's a blink of an eye, after all.

##### Share on other sites
Quote:
Original post by Rockoon1
If you cannot use GetTickCount (which often has an accuracy of +/- 10ms or 55ms) to profile an optmization, then you are probably profiling something that doesnt need optimization. Its a blink of an eye after all.

This is patently false. I've used profilers many times (DevPartner's free edition mostly, which unfortunately works with VS2003 but not VS2005). Oftentimes I find that one function that runs in 0.01ms is being called more times than I'd imagined (by several orders of magnitude) and is the culprit.

GetTickCount wouldn't have helped me much in finding out that micro-optimizations in that function would produce a 50% or more speed increase in the entire application.

Of course, the real solution is to call the function fewer times [smile] But sometimes that's just not an option.

##### Share on other sites
Use QPC, and when you notice a jump forwards/backwards, fall back to another timing routine that is less accurate. Easy and better solution.

##### Share on other sites
Quote:
Original post by BeanDog
This is patently false. I've used profilers many times (DevPartner's free edition mostly, which doesn't work with VS2005, just 2003 unfortunately). Oftentimes, I find out that one function that runs in .01ms is being called more times than I'd imagined (by several orders of magnitude) and was the culprit.

If your profiler is inserting timing calls into every function, then you aren't timing anything resembling production code. That kind of obsessive observation can, and often does, affect the results.

The kind of profiling you are describing is best done with an interrupt-driven approach that simply interrupts your thread at regular intervals, sees where the instruction pointer is, and collects statistics... You were looking for a potential optimization target, correct? What did the absolute time have to do with it?

AMD's CodeAnalyst does precisely this. I highly recommend it for this purpose. Intel's alternative probably does something similar, although I have never used it.

Timing, IMHO, is best reserved for comparing alternatives.

##### Share on other sites
I've been doing some research based on what you all have said; it's helped me a lot in understanding profiling. Thank you.

To me, it seems that it's not JUST how long any particular piece of code takes, but also how many times that code is called. If you can get the two (time, count), you simply multiply them together and figure out what percentage that is of the total execution time. Of course, it would likely be important to take sleep time into account (like in a GUI program waiting for input).

While adding in the overhead wouldn't make it quite like the intended running environment, if it is applied universally the difference shouldn't be noticeable in terms of the percentages.

I imagine something like this could be compiled in using a construct similar to how the assert() macro can be cancelled out.

The AMD tool mentioned, if indeed it takes outside samples of the code's instruction pointer: because it samples at a constant rate, you are, in a sense, indirectly getting the same percentage. Code that runs more often, as a percentage of execution time, will have its instruction pointer observed more often, in a near 1:1 ratio. Because it doesn't involve any real timing systems, it would be much more robust against thread pauses and other OS-related things. If you accidentally alt-tab (or, in the Linux case, another user on the system starts a heavy load), wall-clock timing wouldn't be representative of your process and would incorporate hidden and unrelated variables. Whereas in the CodeAnalyst case, the percentage is just how many times the sampler found you in the code versus how many times it checked; it doesn't really care about absolute time.

##### Share on other sites
Indeed. But I must stress that adding timing code to all your functions is not going to have a consistent impact on each function. It can affect function inlining, branch prediction, and L1 data-cache efficiency, just to name a few. It will also, guaranteed, affect L1 code/trace-cache efficiency.

Intel's profiler is called VTune, and it does basically the same stuff that AMD's CodeAnalyst does. If you are looking for hotspots to put on the table for optimization consideration, then there really is no substitute for VTune or CodeAnalyst. These tools can also often tell you why a portion of the code is surprisingly slow (cache misses, branch mispredictions, etc.).

If you are looking to compare the performance of alternative algorithms, then a pair of GetTickCount calls wrapped around a major portion of the program is almost always an acceptable method. Simple and easy with games: just look at the FPS!
