# Best way to measure performance


Hi, I would like to test a few things performance-wise, just to be sure whether some of my ideas for getting higher performance (micro-optimizations and more) actually work. I know performance is really an 'in-theory' thing and depends on a lot of variables. So my question is, what would be the best / most stable OS / compiler / way to measure time / number of repetitions, etc.? Also, I noticed the first few times I run something (on XP), it's always slower than later on. So I guess I should leave out the few highest and lowest results? So, what would be the best way to perform these measurements? Thanks! PS: this would be in C and/or C++, and I'd like to compile to ASM as well. I currently use GCC and WinXP, though I have an Ubuntu box downstairs I could use as well.

An often recommended technique is profiling. Profiling can tell you how much of your time is spent in which function, how many times that function is called, how much time it takes, etc. It's a great tool for the kind of thing you're talking about.

You'll want to profile under realistic conditions: compile in release mode, run it for a good amount of time, etc. Note that profiling does add a little overhead, though it's probably negligible. Also note that this is for whole-program testing. Seeing how a single line of code performs is a lot harder, especially if it turns out your compiler optimized it away.
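One common way to keep the optimizer from deleting the code you are trying to time is to make its result observable, for example by storing it into a `volatile` variable. A minimal sketch, assuming a hypothetical `sum_to` function as the code under test; `g_sink` and `time_sum_to` are likewise illustrative names, not a library facility:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical function under test. If its result were never used,
// an optimizing compiler could delete the calls entirely.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) total += i;
    return total;
}

// Stores to a volatile object are observable behavior, so the compiler
// must actually produce each value written here.
volatile std::uint64_t g_sink = 0;

// Times `reps` calls of sum_to(n) and returns the elapsed seconds.
double time_sum_to(std::uint64_t n, int reps) {
    assert(reps > 0);
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        g_sink = sum_to(n);  // the result escapes through the volatile sink
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Even so, the optimizer may hoist a pure call like `sum_to(n)` out of the loop or replace its loop with a closed-form expression, so the generated assembly remains the final word on what was actually timed.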

Note that you probably won't get a LOT of data on the micro-optimizations you mentioned. Many micro-optimizations save so little time that the profiler won't pick it up. If you're good with assembly, I think you can get the time from the CPU yourself, and that'll be a lot more accurate.

But, you mentioned "what would be the best / most stable OS / compiler".

You'll want to use your target OS. For one thing, if you test on Linux but release on Windows, your Linux results may not carry over, especially if you depend on any OS-specific functionality.

I don't know that there's one BEST compiler. Each one has advantages and disadvantages; you'd just have to see what works. If you're using gcc, you can probably just stick with gcc.

I read in a few other topics about things loading faster the second time because the OS caches the file. I don't know whether it'd run better the second time, though.

If you're curious about optimization and performance, this is a pretty good guide. The "Optimizing software in C++" manual contains a section called "Finding the biggest time consumers", with a subsection about profilers. It also contains some good info about optimization and WHEN to optimize.

EDIT:
One last thing. "So, what would be the best way to perform these measurements?"
Whether my advice is any good depends on what you're making, I think. If I wanted to reduce lag in my game, I might use a profiler, but if I wanted to stress test my application and see how fast it could run through a MB of data, benchmarking might be more appropriate. So, what ARE you doing?
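For the benchmarking case (how fast an application can run through a MB of data), the result is usually reported as throughput rather than time per call. A minimal sketch, assuming a hypothetical `checksum` function stands in for the code being benchmarked and taking 1 MB as 1024 * 1024 bytes:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical workload: a simple additive checksum over a byte buffer.
std::uint32_t checksum(const std::vector<std::uint8_t>& data) {
    std::uint32_t sum = 0;
    for (std::uint8_t b : data) sum += b;
    return sum;
}

// Megabytes per second, using 1 MB = 1024 * 1024 bytes.
double megabytes_per_second(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}

struct ThroughputResult {
    double mb_per_second;
    std::uint32_t sum;  // returned so the work can't be discarded
};

// Runs checksum() once over `buffer` and reports the throughput.
ThroughputResult benchmark_checksum(const std::vector<std::uint8_t>& buffer) {
    auto start = std::chrono::steady_clock::now();
    std::uint32_t sum = checksum(buffer);
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    return {megabytes_per_second(buffer.size(), seconds), sum};
}
```

A larger buffer (or several buffers) gives a longer, more reliable measurement than a single small one.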

I want to test a few micro-optimizations ++i and i++ (I just want to test it myself) and other similar things, but also small libraries I wrote: how quickly they run, what SPF (seconds per frame) they give me, how much overhead std::string::substr() adds... I want to be sure about things, performance-wise.

I'm not too familiar with ASM, but I can read it a bit. Exporting the ASM sources sounds like a solid idea, using Release mode and the same optimizations I'd use when distributing...

For stress-testing, the profiler would be good to have.

Thanks.

Quote:
Original post by Decrius
I want to test a few micro-optimizations ++i and i++

The best test is to look at the generated assembly.
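With GCC, the `-S` option stops after compilation and writes the assembly to a `.s` file, which makes this kind of comparison easy. A minimal sketch; the file name and function names are illustrative. For a plain `int` whose old value is discarded, an optimizing compiler will typically emit the same instructions for both functions, but reading the listing from your own compiler and flags is the point of the exercise:

```cpp
// incr.cpp: compile with   g++ -O2 -S incr.cpp   and compare the two
// functions in the generated incr.s. extern "C" keeps the symbol names
// unmangled so they are easy to find in the assembly listing.
#include <cassert>

extern "C" void pre_increment(int* x) { ++*x; }

extern "C" void post_increment(int* x) { (*x)++; }  // old value is discarded

// Behavioral check: both variants leave the same value behind.
void check_same_effect() {
    int a = 41;
    int b = 41;
    pre_increment(&a);
    post_increment(&b);
    assert(a == 42 && b == 42);
}
```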

Furthermore, i++ can NEVER be faster than ++i. I ALWAYS write ++i.

Quote:
Original post by DevFred
Furthermore, i++ can NEVER be faster than ++i. I ALWAYS write ++i.

Yes, true. I want to see the difference for myself, though, and how large it is.

Quote:
Original post by DevFred
Furthermore, i++ can NEVER be faster than ++i. I ALWAYS write ++i.

That isn't entirely true - if i is an integral type, there is really no reason why the compiler cannot compile both expressions to the identical assembly. For aggregate types you are of course correct.
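The difference for class types comes from the conventional operator signatures: postfix `operator++(int)` has to return the old value, so it usually copies the object, while prefix `operator++()` just returns a reference to the updated object. A minimal sketch with a hypothetical `BigCounter` type whose copies are deliberately expensive; whether the copy survives optimization still depends on the compiler and on how much of the operators it can see:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class type whose copies are expensive (it owns a vector).
class BigCounter {
public:
    explicit BigCounter(std::size_t payload_size)
        : value_(0), payload_(payload_size, 0) {}

    // Prefix: update in place and return a reference; nothing is copied.
    BigCounter& operator++() {
        ++value_;
        return *this;
    }

    // Postfix: must hand back the previous state, so it copies *this,
    // payload included, before incrementing.
    BigCounter operator++(int) {
        BigCounter old = *this;
        ++value_;
        return old;
    }

    long value() const { return value_; }

private:
    long value_;
    std::vector<int> payload_;
};

// Usage: postfix yields the old value, prefix the new one.
void demo_increments() {
    BigCounter c(1000);
    BigCounter before = c++;  // copies the 1000-element payload
    assert(before.value() == 0 && c.value() == 1);
    ++c;                      // no copy
    assert(c.value() == 2);
}
```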

Quote:
Original post by Decrius
So, what would be the best way to perform these measurements?

There is no best way.

Micro-optimizations depend completely on the context. While ++i may take less than one cycle, the cost of loading the value from memory on first access may be hundreds of cycles.

In algorithms, running out of registers affects performance. Pipelining is an issue as well, as is cache behavior.
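Cache behavior is easy to observe with a classic experiment: summing the same row-major matrix row by row versus column by column. Both loops do identical arithmetic, but the column-wise walk strides through memory and typically misses the cache far more often once the matrix outgrows the caches. A minimal sketch; the function names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The matrix is stored row-major in one contiguous vector:
// element (r, c) lives at index r * cols + c.

// Walks memory sequentially: consecutive c values are adjacent.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same sum, but each step jumps `cols` elements ahead in memory.
long long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Timing both on a large matrix (say 4096 x 4096) on the target machine shows what the access pattern alone costs; the exact ratio depends on the cache sizes involved.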

Quote:
most stable OS / compiler

This question translates to: "I'm trying to improve my SUV. Which race track and which Formula 1 team should I use to measure performance?"

You use your target environment. It doesn't matter how things work elsewhere.

Quote:
way to measure time

One that is accurate enough. If your process takes 17 hours, a stopwatch will do. If it takes 15 cycles, RDTSC is likely optimal. It depends.
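For anything longer than a few microseconds, the standard `<chrono>` clocks are usually enough; `std::chrono::steady_clock` is the one intended for measuring intervals because it never goes backwards. For cycle-level work on x86, GCC and Clang expose the time-stamp counter through the `__rdtsc()` intrinsic in `<x86intrin.h>`. A minimal sketch; `StopWatch`, `read_tsc` and `HAVE_RDTSC` are illustrative names, and the RDTSC part assumes an x86 target with GCC or Clang (it is compiled out elsewhere). Keep in mind that on many modern x86 CPUs the TSC ticks at a constant rate rather than counting actual core cycles, and that RDTSC is not a serializing instruction:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC/Clang for x86
#define HAVE_RDTSC 1
#endif

// Interval timer based on a monotonic clock.
class StopWatch {
public:
    StopWatch() : start_(std::chrono::steady_clock::now()) {}

    void reset() { start_ = std::chrono::steady_clock::now(); }

    // Seconds elapsed since construction or the last reset().
    double elapsed_seconds() const {
        auto now = std::chrono::steady_clock::now();
        assert(now >= start_);  // steady_clock never goes backwards
        return std::chrono::duration<double>(now - start_).count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};

#ifdef HAVE_RDTSC
// Raw time-stamp counter reading; only the difference between two
// readings taken on the same core is meaningful.
inline std::uint64_t read_tsc() { return __rdtsc(); }
#endif
```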

Quote:
number of repetitions

Anywhere between 0 and infinity. For some optimizations, you may not need to run any tests at all. For others, you need to run just enough to collect the samples for a statistically meaningful measurement.
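On dropping outliers: one simple approach is to run a few untimed warm-up iterations first, so that caches and similar state settle (which typically covers the "first few runs are slower" effect), and then take the median of the timed samples, which a single slow run cannot drag around the way it can the mean. A minimal sketch; `median_time` is an illustrative name, and the `warmup` and `samples` counts are whatever the caller chooses, not recommended values:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a set of samples (takes a copy so it can reorder it).
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    std::size_t mid = v.size() / 2;
    return v.size() % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// Runs `work` `warmup` times untimed, then times it `samples` times
// and returns the median duration in seconds.
template <typename Work>
double median_time(Work work, int warmup, int samples) {
    assert(samples > 0);
    for (int i = 0; i < warmup; ++i) work();  // discarded runs
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(samples));
    for (int i = 0; i < samples; ++i) {
        auto t1 = std::chrono::steady_clock::now();
        work();
        auto t2 = std::chrono::steady_clock::now();
        times.push_back(std::chrono::duration<double>(t2 - t1).count());
    }
    return median(times);
}
```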

For arbitrary tests, I used a loop:
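
    #include <chrono>
    #include <vector>

    // Calibrates n so that one batch of n calls runs for at least
    // desired_time, then records m timings of such batches (in seconds).
    template <typename Test>
    std::vector<double> measure(Test run_single_test,
                                std::chrono::duration<double> desired_time,
                                int m) {
        using Clock = std::chrono::steady_clock;
        long long n = 1;
        while (true) {
            auto t1 = Clock::now();
            for (long long i = 0; i < n; ++i) run_single_test();
            auto t2 = Clock::now();
            // The cap avoids overflow if the test body was optimized away.
            if (t2 - t1 < desired_time && n < (1LL << 40)) {
                n = n * 2;  // batch too short to time reliably: double it
            } else {
                break;
            }
        }
        std::vector<double> measurements;
        for (int j = 0; j < m; ++j) {
            auto t1 = Clock::now();
            for (long long i = 0; i < n; ++i) run_single_test();
            auto t2 = Clock::now();
            // record_single_measurement(t2 - t1)
            measurements.push_back(std::chrono::duration<double>(t2 - t1).count());
        }
        return measurements;
    }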

The above is convenient for automated profiling. For each test, you specify how long it should run. The profiler then calibrates the number of repetitions (n) and number of measurements (m).

The nice thing about this approach is that you can measure a single instruction (which will get executed a bazillion times) or an expensive algorithm (which gets tested 12 times). Furthermore, it's possible to run a statistical analysis of the results to determine whether they are consistent. This information can be used to re-run the test with a longer desired_time to compensate for inaccurate timers.
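The consistency check could be as simple as computing the relative spread of the recorded measurements (sample standard deviation divided by mean, i.e. the coefficient of variation) and re-running with a longer `desired_time` when it exceeds some threshold. A minimal sketch; `Summary`, `summarize` and `is_consistent` are illustrative names, and any threshold is the caller's choice rather than a standard value:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct Summary {
    double mean;
    double stddev;  // sample standard deviation (n - 1 denominator)
    double cv;      // coefficient of variation: stddev / mean
};

Summary summarize(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
    double sq = 0.0;
    for (double x : xs) sq += (x - mean) * (x - mean);
    double stddev = std::sqrt(sq / (xs.size() - 1));
    return {mean, stddev, mean != 0.0 ? stddev / mean : 0.0};
}

// True if the measurements are tight enough to trust; otherwise the
// caller could double desired_time and measure again.
bool is_consistent(const std::vector<double>& xs, double max_cv) {
    return summarize(xs).cv <= max_cv;
}
```

For example, with a 5% threshold a caller would re-measure whenever `is_consistent(samples, 0.05)` returns false.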

I generally use this type of profiling as part of unit tests. It serves as a metric to see whether a change to the code affected performance. But unlike other unit tests, the results are only logged; they don't cause a failure if an algorithm is suddenly slower, because absolute timings differ between machines.
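A logged, non-failing performance check of that kind might look like the following minimal sketch. It assumes a hypothetical baseline time recorded earlier on the same machine; `log_performance` and its 10% default warning ratio are illustrative, not part of any test framework:

```cpp
#include <cassert>
#include <cstdio>

// Compares a new timing with a stored baseline and logs the ratio.
// Returns new / baseline so callers can record it; it never fails the
// test, because absolute timings differ from machine to machine.
double log_performance(const char* test_name, double baseline_seconds,
                       double measured_seconds, double warn_ratio = 1.10) {
    assert(baseline_seconds > 0.0);
    double ratio = measured_seconds / baseline_seconds;
    std::printf("[perf] %s: %.6f s (baseline %.6f s, x%.2f)%s\n", test_name,
                measured_seconds, baseline_seconds, ratio,
                ratio > warn_ratio ? "  <-- slower than baseline" : "");
    return ratio;
}
```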

Quote:
Original post by swiftcoder
Quote:
Original post by DevFred
Furthermore, i++ can NEVER be faster than ++i. I ALWAYS write ++i.
That isn't entirely true - if i is an integral type, there is really no reason why the compiler cannot compile both expressions to the identical assembly. For aggregate types you are of course correct.

Although you're right that the compiler can realize that, he's actually technically correct.

He said !(time(i++) < time(++i)), which is the same as time(i++) >= time(++i).

Which means that i++ can be SLOWER THAN OR EQUAL TO ++i. That satisfies both what you're saying and what he said.

It's impressive how a couple of words can express so much content, isn't it?

Cheers
Dark Sylinc

Quote:
Original post by Matias Goldberg
It's impressive how a couple of words can express so much content, isn't it?

LOL - I must have been skimming rather more than usual. I primarily replied because I thought DevFred was stating that one should always use preincrement, but on re-reading I see that he only stated that he himself uses preincrement. My apologies :)

Performance is relative.

Most chunks of code have no significant performance differences. Some chunks of code are hit really hard. A profiler helps there. Measure before, measure after, take the version that works best.
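"Measure before, measure after" can be made mechanical by timing both versions under the same conditions and checking that they still agree on the result. A minimal sketch with two hypothetical implementations of the same function (a loop and the closed form), each timed as the best of several runs. Taking the minimum is one common choice for short, deterministic code, on the reasoning that interference such as interrupts only ever adds time; the median is another reasonable choice. Note that the compiler may already transform the loop version, which is exactly why the measurement matters:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Two hypothetical versions of the same function: 1^2 + 2^2 + ... + n^2.
std::uint64_t sum_squares_loop(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) total += i * i;
    return total;
}

// Closed form; valid while n * (n + 1) * (2n + 1) fits in 64 bits.
std::uint64_t sum_squares_formula(std::uint64_t n) {
    return n * (n + 1) * (2 * n + 1) / 6;
}

volatile std::uint64_t g_result = 0;  // keeps the calls from being optimized out

// Best (minimum) time over `runs` calls of f(n), in seconds.
template <typename F>
double best_time(F f, std::uint64_t n, int runs) {
    assert(runs > 0);
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < runs; ++r) {
        auto t1 = std::chrono::steady_clock::now();
        g_result = f(n);
        auto t2 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t2 - t1).count());
    }
    return best;
}
```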

Another nice tool is map files. They can tell you a lot about your code without resorting to the raw assembly. Reading map files to identify performance problems is an art.

For very fine tuning you can look at the generated optimized code. This is often a last resort, only after you have identified the bottleneck, identified and corrected all the high level problems (like algorithm choice), identified and corrected issues in surrounding code, and still know that the very small gains will be worth the huge time investment.

