what is the best way to check that code is more efficient and runs faster than other code

Started by
11 comments, last by Sacaldur 8 years, 9 months ago

hi.

i want to know how i can compare my different pieces of code and see which one runs better and faster.

there are some theoretical ways, like calculating best-case and worst-case time, counting how many times loops run, and so on.

and there is the approach where a piece of code starts a timer, runs the code in a loop N times, stops the timer when the code ends, and gives you the average.

i want to know how you analyze your code to see that it runs as efficiently and as fast as possible.

thank you for helping


Basically two options: measure the time for a given task yourself, or use external profilers/tools like Intel VTune, which can help you find hotspots and show you CPU counters that can also guide you in optimizing your code.

You can use the C++11 std::chrono library to measure time or use an OS function like QueryPerformanceCounter directly.

The two methods are not mutually exclusive. You can (and should) do both.

"Some people use Singletons, some people are Simpletons." - Bill Gates
"Yum yum, I luv Cinnabon." - Mahatma Gandhi

and there is the approach where a piece of code starts a timer, runs the code in a loop N times, stops the timer when the code ends, and gives you the average.


You should not put too much trust in the results of synthetic tests where you run your code in a tight loop, because the performance in those conditions and the performance as part of a larger program can be very different (for instance if the cache gets primed by the first iteration and the others are artificially cheap). You should test the performance of your code as part of the larger program, and feed it data that looks as much as possible like the actual data it's going to encounter in reality.

Performance is a tricky thing.

Profilers can give a wrong impression because they point to where, not why. They won't catch something like a bad algorithm or cache-unfriendly usage.

Once you know what area to look at, a timer run over many iterations is the best approach.

But mainly it comes down to experience to find performance issues. It is a lot like debugging, except that you have to be more creative in judging the symptoms, i.e. in realizing you even have a bug at all.

This is my thread. There are many threads like it, but this one is mine.

Profilers can give a wrong impression because they point to where, not why. They won't catch something like a bad algorithm or cache-unfriendly usage.

Most profilers I have used list cache misses in addition to abnormally high hits per instruction, and allow sorting on various criteria, along with tree views, list views, child functions included/excluded in timings, etc.

And pointing to where should be enough for any competent developer to realize (at least roughly) why, especially if it is your own code that is flagged. Knowing where the bottleneck is usually implicitly indicates which algorithm is running and specifically which part of it is causing troubles.

To argue against using profilers is to suggest a programmer just use intuition when figuring things out. Seems to work well enough for xoxos though…


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

I wrote a piece of an "article" as an answer on Stack Overflow about benchmarking; it can be found here:

http://stackoverflow.com/a/25027750

Of course only the first part is of any interest to you.

Otherwise, I'd say you can always predict performance by hand, using a "math" model of the machine, but it is so difficult and tricky that you might as well consider it impossible.

To do a performance analysis on paper, you'd need to know the exact binary form of the program (opcodes, or disassembled opcodes), then you'd need to know how your CPU is going to treat them, which instructions go into which pipeline (which out-of-order-execution parallel pipe), and the exact state of the caches, which is hard.

Please consider this article:

http://www.gamedev.net/page/resources/_/technical/general-programming/a-journey-through-the-cpu-pipeline-r3115

The not-crazy-way™ is to measure it. You profile it in situation, or simply time it between start and end. Sampling profilers will give you details about where the hot spots are, which usually means which function is called the most in your program. Then you can try a different implementation for it and profile again; if you lost time, revert the code, and if you gained speed, good. I'd say basically that's it.

The only sensible approach is to actually measure the performance: there's a reason we call it "computer science" rather than "computer voodoo".

Use a profiler or do your own timing (never just guess), measure things in context (i.e. not isolated snippets), and either use real data or data that is as close to real as possible.


Note that code that is faster on one machine may be slower on another, so for meaningful results you also need to test with the environment your code will actually run in whenever possible.

- Jason Astle-Adams

(for instance if the cache gets primed by the first iteration and the others are artificially cheap).

This is why you always fence before the first iteration of your own loops.

When you are writing a very reusable piece of code, such as a matrix multiply, it is literally impossible to test every separate piece of code that uses it and then rewrite the routine to best suit those cases.

So it is wrong to suggest that you should distrust these results. If you have a specific use-case then you should always try to profile it within its native environment. Profilers do this for you. They time all functions in each of their own call contexts.

On the other hand some functions are so generic there is no possible way to consider them within a single context.
When you encounter this type of situation you don’t simply distrust any profiling information you get, you try to maximize the trustworthiness of your own timings.
That means, for example, performing a fence operation prior to the first iteration of the first set of compares you do.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

I have spent a huge amount of time fixing other people's poorly optimized code that no one else at the company could fix.

I know the profiler can be misleading because I see it mislead people all the time. They turn their brain off and just look at what the profiler says. So someone will come to me and ask me to fix their matrix multiplication, for example, after tons of effort and problems, without ever asking why matrix multiplication is suddenly behaving like this even though there isn't hundreds of times more multiplication going on overnight, or thinking that maybe this is a symptom that something else is horribly wrong.

Any idiot can optimize a loop, but if you have a serious performance problem that is not the issue. There is some "problem": your algorithm, waiting on some resource, memory fragmentation, a data layout that leads to a lot of cache misses, etc.

Theoretically you would be so super smart that you never fall into such traps, but I don't think that's the reality for the average user, and I have never come across a situation where profilers were particularly helpful or where they would have caught something I missed. You will just end up optimizing your string class and math library when the real issue is that you are killing the cache with each object you read, because they are full of useless crap instead of a clean list of operands to work on.

This is my thread. There are many threads like it, but this one is mine.

This topic is closed to new replies.
