Jump to content
  • Advertisement
Sign in to follow this  
hotdogsayhi

How to do performance test in C++?

This topic is 3477 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Advertisement
1) Compare the disasembly. This will tell you the most about it with the least work.

2) Put the functions to use in a REAL LIFE EXAMPLE in a RELEASE build. And roll your own profiler using QueryPerformanceCounter() for accurate timings (or something like AMD's Codeanalyst). The key here is that contrived usage like

for ( int i = 0; i < 1000; ++i )
{
foo()
}

is a really really really bad way to profile code.
a) It doesn't fool the VS compiler. It has a tendency to inline foo() unroll the loop, determine the loop has no side effects, and then cut the whole thing.
b) If the loop really does have side effects, and the compiler doesn't just cut it out, you miss out on a lot of data anyway. All of foo()'s instructions and data will be in the cache and registers. This means it will run much faster than you'd see in normal usage.
c) Testing like above only covers one type of usage of the function, and that might be the best case for one way of writing it, but the worst case for the other. For another example of how not to write a benchmark:

vector<int> a;
for ( int i = 0; i < 1000; ++i )
{
a.push_back(1);
a.erase(a.begin());
}

doesn't at ALL highlight how slow a.erase(a.begin()) could be.

You are best off if you can profile the real usage of the object in the real code it will be used in.

3) You can try a contrived usage. If you can avoid 2a from above, then you still might be able to get SOME information. Just don't trust it to be very acurate (2c). But it might be enough to give you a good guess as to what to do.


As far as rolling your own profiler. You should look at the high performance timer QueryPerformanceCounter and QueryPerformanceFrequency. Using that, you need only gather the start and end time for each call to your function. From there you can aggregate that data to get average time, total time, max time and min time your function took.

Share this post


Link to post
Share on other sites
The ideas behind the profiler superpig develops in this article are pretty solid IMO, if you want to roll your own solution.

Enginuity Part III

The series is a bit dated, but I feel the techniques and ideas presented are a great place to start if you're totally lost on a particular subject. For example, a few subsystems in my engine are loosely based on the concepts presented in these articles.

Share this post


Link to post
Share on other sites
Matt328. I agree the Enginuity profiler is a really decent starting place if you have no idea where to start.
But it could use a bit of tuning.
1) It isn't as light weight as it could be. Storing just the start/end time and call depth of a profile sample is enough (pre-reserved storage avoids allocation overhead). You can come back at the end of the frame and aggregate the data. This is important for profiles within profiles if the bottom profile is a small function. Lots of data aggregation and processing in the profile sample itself will cloud the parent's profile with misinformation.
2) It doesn't have, but iirc could be adapted to, keep call trace information. This can be very useful. It is nice to see what functions are called from where, and how often in each place.

Lastly. If you have access to an external profiler, use it. They can often do other things like code-coverage (what functions were actually used). And you don't have to specify what functions get profiled (they will profile all of them). That is bad because you may see a lot of "garbage" functions in your profile that you don't know what to do with, but it is really GOOD because it can show you time wasted in system calls like "lock mutex" or "sleep on IO request" or "garbage collection".


Everyone feel free to bash (sorry for the f64 f32 u32 crap. bad habit from places i've worked to hand-define presized types the f64's are doubles on
the PC):
tGetTickCount() in my core/timer is just a wrapper for QPC() if you are using windows.

Profiler.h

#ifndef PROFILER_H
#define PROFILER_H

#include <vector>

#include "core/FSDefs.h"

namespace Profiler
{
class Sample
{
public:
Sample ( const char *name );
~Sample ();
private:
const f64 _timeStart;
const char *_name;
};

struct ReportData
{
const char *_name;
u64 _callCount;
f64 _timeTotal;
f64 _timeAvg;
f64 _timeMax;
f64 _timeMin;

f64 _timeChild;
f64 _timeChildAvg;
f64 _timeChildMax;
f64 _timeChildMin;

ReportData() :
_callCount(0),
_timeTotal(0),
_timeAvg(0),
_timeMax(0),
_timeMin(1e15),
_timeChild(0),
_timeChildAvg(0),
_timeChildMax(0),
_timeChildMin(1e15)
{
}
};

void WriteLog( const char *filename = "stdout", const bool formated = true );
void CollectFrame( void );
void SetMaxSampleCount( unsigned samplecount );
const std::vector<ReportData> &GetProfileReport( void );

typedef std::vector<ReportData>::const_iterator profilereport_const_iterator;
}; // namespace Profiler

#ifdef ENABLE_PROFILER
#define PROFILER_SAMPLE(_name) Profiler::Sample profileScopeInstance(_name);
#define IF_PROFILER(_a) _a
#define IFN_PROFILER(_b)
#define SELECT_PROFILER(_a,_b) _a
#else
#define PROFILER_SAMPLE(x)
#define IF_PROFILER(_a)
#define IFN_PROFILER(_b) _b
#define SELECT_PROFILER(_a,_b) _b
#endif // ENABLE_PROFILER

#endif



Profiler.cpp

#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#include <fstream>
#include <iomanip>

#define ENABLE_PROFILER
#include "core/assert.h"
#include "core/timer.h"
#include "profiler.h"

namespace Profiler
{
//----------------------------------------------------------------------------
// Local Data Structures
struct SampleData
{
const char *_name;
f64 _timeStart;
f64 _timeEnd;
u32 _callDepth;

SampleData(const char *name, const f64 timeStart, const f64 timeEnd, const u32 callDepth) :
_name(name),
_timeStart(timeStart),
_timeEnd(timeEnd),
_callDepth(callDepth)
{
}
};

//----------------------------------------------------------------------------
// Function Prototypes
static inline void BeginProfileScope( void );
static inline void EndProfileScope( const char *name, const f64 startTime, const f64 endTime );
static std::string TimeToString( const f64 time, const f64 secScale = 1.0 );

//----------------------------------------------------------------------------
// Globals
static const char *g_profiler_profileFileName = "profile.log.txt";
static std::vector<SampleData> g_profiler_frameData;
static std::vector<ReportData> g_profiler_reportData;
static u32 g_profiler_stackDepth = 0;
static u32 g_profiler_maxFrameSamples = 1000;
static u32 g_profiler_maxProfilesPerFrame = 0;

typedef std::vector<ReportData>::iterator report_iterator;
typedef std::vector<SampleData>::iterator sample_iterator;
//----------------------------------------------------------------------------
// TJS
// Return the current report listing (for display or analyisis)
//----------------------------------------------------------------------------
const std::vector<ReportData> &GetProfileReport( void )
{
return g_profiler_reportData;
}

//----------------------------------------------------------------------------
// TJS
// Sets the max profiles allowed in one frame
//----------------------------------------------------------------------------
void SetMaxSampleCount( u32 samplecount )
{
g_profiler_maxFrameSamples = samplecount;
ASSERT(g_profiler_frameData.size() < samplecount);
g_profiler_frameData.reserve(samplecount);
}

//----------------------------------------------------------------------------
// ProfileSample implementation
Sample::Sample ( const char *name ) : _name(name), _timeStart(tGetTickCount())
{
++g_profiler_stackDepth;
}
Sample::~Sample()
{
const f64 timeEnd = tGetTickCount();
ASSERT(g_profiler_stackDepth);
ASSERT(g_profiler_frameData.size() < g_profiler_maxFrameSamples-1);
--g_profiler_stackDepth;

IF_ASSERTS(if(g_profiler_frameData.size() < g_profiler_maxFrameSamples-1))
{
g_profiler_frameData.push_back(SampleData(_name, _timeStart, timeEnd, g_profiler_stackDepth));
}
}
//----------------------------------------------------------------------------

//----------------------------------------------------------------------------
// TJS
// Collects all the data for this frame and adds it to the list of report data
// This can take a while, make sure it is outside any critical profiles.
//----------------------------------------------------------------------------
void CollectFrame( void )
{
//____________
// Set up some other statictics, like child times, and profiles this frame
f64 callStackChildTimes[64];
std::fill(callStackChildTimes, callStackChildTimes+ARRAY_LENGTH(callStackChildTimes), 0.0);
g_profiler_maxProfilesPerFrame = std::max(g_profiler_maxProfilesPerFrame,(u32)g_profiler_frameData.size());


//____________
// Collect all the profiles from this frame
for ( sample_iterator sdi = g_profiler_frameData.begin(); sdi != g_profiler_frameData.end(); ++sdi )
{
ReportData tempReport;
ReportData *curReport = &tempReport;
//____________
// Find the report that we should be appending our data to
for ( report_iterator rdi = g_profiler_reportData.begin(); rdi != g_profiler_reportData.end(); ++rdi )
{
// NOTE: Only comparing pointers! Unless you did something silly this will work.
// You shouldn't be passing dynamic names into the system.
// You shouldn't be passing in strings off the stack.
// All these strings should just be hard-coded like Sample("name");
// If it isn't hard coded, make sure it doesn't drop out of scope before this CollectFrame() call.
if ( rdi->_name == sdi->_name )
{
curReport = &(*rdi);
ASSERT(std::string(rdi->_name)==std::string(sdi->_name));
}
else
{
ASSERT(std::string(rdi->_name)!=std::string(sdi->_name));
}
}

//____________
// Parse the new data to add to this report
const f64 timeElapsed = std::max(sdi->_timeEnd - sdi->_timeStart, 0.0);

curReport->_name = sdi->_name;
curReport->_callCount++;
curReport->_timeTotal += timeElapsed;
curReport->_timeAvg = curReport->_timeAvg + (timeElapsed - curReport->_timeAvg) / f64(curReport->_callCount);
curReport->_timeMax = std::max( timeElapsed, curReport->_timeMax );
curReport->_timeMin = std::min( timeElapsed, curReport->_timeMin );

ASSERT(sdi->_callDepth < (ARRAY_LENGTH(callStackChildTimes)-1));
const f64 childtimeElapsed = callStackChildTimes[sdi->_callDepth+1];
callStackChildTimes[sdi->_callDepth+1] = 0.0;
callStackChildTimes[sdi->_callDepth] += timeElapsed;

curReport->_timeChild += childtimeElapsed;
curReport->_timeChildAvg = curReport->_timeChildAvg + (childtimeElapsed - curReport->_timeChildAvg) / f64(curReport->_callCount);
curReport->_timeChildMax = std::max( childtimeElapsed, curReport->_timeChildMax );
curReport->_timeChildMin = std::min( childtimeElapsed, curReport->_timeChildMin );

//____________
// Add the report if it was new
if ( curReport == &tempReport )
{
g_profiler_reportData.push_back(*curReport);
}
}
//____________
// Clear things for the next frame
g_profiler_frameData.clear();
}

//----------------------------------------------------------------------------
// TJS
// write out the data to the logfile
//----------------------------------------------------------------------------
void WriteLog( const char *filename, const bool formated )
{
CollectFrame();
std::string fname(filename);
if ( fname == "stdout" )
{
// just drop it to the console, formatted (prolly doesn't fit, but oh well.
std::cout << "Output to: <" << fname << ">" << std::endl;
std::cout << "Max Prstd::couts per Frame: " << g_profiler_maxProfilesPerFrame << std::endl;
std::cout << "Name [CallCount] |TTotal |TChild |TMin |TMax |TAvg |TChildAvg |TChildMin |TChildMax |" << std::endl;
for ( report_iterator rdi = g_profiler_reportData.begin(); rdi != g_profiler_reportData.end(); ++rdi )
{
std::cout << std::setw(16) << std::left << rdi->_name;
std::cout << "[" << std::setw(9) << std::right << rdi->_callCount << std::setw(6) << std::left << "] |";
std::cout << std::setw(12) << std::left << TimeToString(rdi->_timeTotal)<<"|";
std::cout << std::setw(12) << std::left << TimeToString(rdi->_timeChild)<<"|";
std::cout << std::setw(12) << std::left << TimeToString(rdi->_timeMin)<<"|";
std::cout << std::setw(12) << std::left << TimeToString(rdi->_timeMax)<<"|";
std::cout << std::setw(12) << std::left << TimeToString(rdi->_timeAvg)<<"|";
std::cout << std::setw(12) << std::left << TimeToString(rdi->_timeChildAvg)<<"|";
std::cout << std::setw(12) << std::left << TimeToString(rdi->_timeChildMin)<<"|";
std::cout << std::setw(12) << std::left << TimeToString(rdi->_timeChildMax)<<"|";
std::cout << std::endl;
}
}
else
{

std::ofstream ofile(fname.c_str());
if ( formated )
{
// Drop it to a file, formated to be read as is.
ofile << "Output to: <" << fname << ">" << std::endl;
ofile << "Max Profiles per Frame: " << g_profiler_maxProfilesPerFrame << std::endl;
ofile << "Name [CallCount] |TTotal |TChild |TMin |TMax |TAvg |TChildAvg |TChildMin |TChildMax |" << std::endl;
for ( report_iterator rdi = g_profiler_reportData.begin(); rdi != g_profiler_reportData.end(); ++rdi )
{
ofile << std::setw(16) << std::left << rdi->_name;
ofile << "[" << std::setw(9) << std::right << rdi->_callCount << std::setw(6) << std::left << "] |";
ofile << std::setw(12) << std::left << TimeToString(rdi->_timeTotal)<<"|";
ofile << std::setw(12) << std::left << TimeToString(rdi->_timeChild)<<"|";
ofile << std::setw(12) << std::left << TimeToString(rdi->_timeMin)<<"|";
ofile << std::setw(12) << std::left << TimeToString(rdi->_timeMax)<<"|";
ofile << std::setw(12) << std::left << TimeToString(rdi->_timeAvg)<<"|";
ofile << std::setw(12) << std::left << TimeToString(rdi->_timeChildAvg)<<"|";
ofile << std::setw(12) << std::left << TimeToString(rdi->_timeChildMin)<<"|";
ofile << std::setw(12) << std::left << TimeToString(rdi->_timeChildMax)<<"|";
ofile << std::endl;
}
}
else
{
// Drop it to a file in comma delminated table format to be read into a spreadsheet.
ofile << "Profiles/Frame: " << g_profiler_maxProfilesPerFrame << std::endl;
ofile << "Name,CallCount,TTotal,TChild,TMin,TMax,TAvg,TChildAvg,TChildMin,TChildMax" << std::endl;
for ( report_iterator rdi = g_profiler_reportData.begin(); rdi != g_profiler_reportData.end(); ++rdi )
{
ofile << rdi->_name << ",";
ofile << rdi->_callCount << ",";
ofile << (rdi->_timeTotal) << ",";
ofile << (rdi->_timeChild) << ",";
ofile << (rdi->_timeMin) << ",";
ofile << (rdi->_timeMax) << ",";
ofile << (rdi->_timeAvg) << ",";
ofile << (rdi->_timeChildAvg) << ",";
ofile << (rdi->_timeChildMin) << ",";
ofile << (rdi->_timeChildMax);
ofile << std::endl;
}
}
}
}

//----------------------------------------------------------------------------
// TJS
// Turn a time into a string
//----------------------------------------------------------------------------
std::string TimeToString( const f64 time, const f64 secScale )
{
std::stringstream str;
str << std::fixed << std::setprecision(3) << std::setw(9) << std::right;
f64 timeAccum = time * secScale;

if ( timeAccum > 86400.0 )
{
const f64 scaledTime = timeAccum / 86400.0;
const f64 days = f64(u64( scaledTime ));
const f64 remainder = timeAccum - days * 86400.0;
str << days << "d ";
timeAccum = remainder;
}

if ( timeAccum > 3600.0 )
{
const f64 scaledTime = timeAccum / 3600.0;
const f64 hrs = f64(u64( scaledTime ));
const f64 remainder = timeAccum - hrs * 3600.0;
str << hrs << "h ";
timeAccum = remainder;
}

if ( timeAccum > 60.0 )
{
const f64 scaledTime = timeAccum / 60.0;
const f64 min = f64(u64( scaledTime ));
const f64 remainder = timeAccum - min * 60.0;
str << min << "m ";
timeAccum = remainder;
}

if ( timeAccum < 0.001f )
{
const f64 scaledTime = timeAccum * 1000000.0;
str << scaledTime << "ns";
}
else if ( timeAccum < 1.0f )
{
const f64 scaledTime = timeAccum * 1000.0;
str << scaledTime << "ms";
}
else
{
str << timeAccum << "s";
}

return str.str();
}



[Edited by - KulSeran on January 11, 2009 11:49:19 AM]

Share this post


Link to post
Share on other sites
You could try an evaluation version of Intel® VTune™ Performance Analyzer but the complete version is far from being free.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!