So, while all the suggestions are fine, they don't usually solve the larger issues. Don't get me wrong: I've seen a trivial missing reference cause massive performance degradation. In fact, for fun:
class ThingGrid
{
public:
    typedef uint32_t Thing;
    typedef std::vector< Thing > ThingArray_t;
    const ThingArray_t operator[]( size_t y ) const {return mThings[ y ];}
    const Thing GetThing( size_t x, size_t y ) const {return mThings[ y ][ x ];}
private:
    std::vector< ThingArray_t > mThings;
};
It is a silly piece of code and obviously just an example, but it works as intended, and if we were not talking about optimizations (and this were in, say, a JavaDoc-commented header) it would not jump out why the above is really bad. If you didn't catch it immediately: yes, returning the uint32_t "Thing" by value is intended, but operator[] returning the row by value is bad. Every lookup that goes through operator[], such as things[ y ][ x ], copies a whole std::vector into a hidden temporary just to read one element, so it is massively slower than it should be. (GetThing as written indexes mThings directly and so dodges the copy, but anything built on operator[] pays for it.) Of course, fix the reference and things are all better, or are they? Take the following:
for( int x=0; x<things.Width(); ++x )
    for( int y=0; y<things.Height(); ++y )
        {ThingGrid::Thing thing = things.GetThing( x, y ); ... do something ...}
That piece of code is going to perform horribly on multiple levels, even with the fix for the reference. Why? Think about it: there are two primary problems with that loop as applied to the given class.
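For anyone following along, the reference fix mentioned above might look something like the sketch below. The constructor, Width() and Height() are my assumptions (the original snippet never shows them, though the loop calls the latter two), and GetThing is routed through operator[] purely to show that the row is no longer copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class ThingGrid
{
public:
    typedef uint32_t Thing;
    typedef std::vector< Thing > ThingArray_t;

    // Assumed constructor: 'height' rows of 'width' zeroed Things.
    ThingGrid( size_t width, size_t height )
        : mThings( height, ThingArray_t( width, 0 ) ) {}

    // Assumed accessors, named to match the loop in the post.
    size_t Width() const  { return mThings.empty() ? 0 : mThings[ 0 ].size(); }
    size_t Height() const { return mThings.size(); }

    // Const reference: hands back the existing row instead of copying it.
    const ThingArray_t& operator[]( size_t y ) const
    {
        assert( y < mThings.size() );
        return mThings[ y ];
    }

    // A Thing is only a uint32_t, so returning it by value is still fine.
    Thing GetThing( size_t x, size_t y ) const { return (*this)[ y ][ x ]; }

private:
    std::vector< ThingArray_t > mThings;
};
```

With the by-value version, every things[ y ] call allocates and fills a fresh vector; with the reference it is just pointer arithmetic.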
[spoiler]
1. The first problem is that the inner loop iterates on y while the data is laid out as sub-arrays indexed by x. So you are touching a new chunk of memory on every iteration. Fix:
for( int y=0; y<things.Height(); ++y )
    for( int x=0; x<things.Width(); ++x )
        {ThingGrid::Thing thing = things.GetThing( x, y ); ... do something ...}
2. Even with that fix, you still get bitten by the fact that each inner std::vector owns its own heap allocation, so the rows can sit in completely different chunks of memory. Each time you complete an x loop you are likely missing the cache and starting on a completely new chunk. Fixing this means removing the inner vectors and using a single vector with a manual y offset for indexing (index = y * Width() + x). Of course, at that point you could just use "for( const Thing& thing : mThings ) {}" and not have two indexes being maintained.
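A rough sketch of that single-vector layout, under my own assumptions: the class name FlatThingGrid, the constructor, SetThing and Sum are all made up for illustration and are not from the original class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flattened version of ThingGrid: one contiguous allocation.
class FlatThingGrid
{
public:
    typedef uint32_t Thing;

    FlatThingGrid( size_t width, size_t height )
        : mWidth( width ), mHeight( height ), mThings( width * height, 0 ) {}

    size_t Width() const  { return mWidth; }
    size_t Height() const { return mHeight; }

    // Row-major indexing: row y starts at offset y * mWidth.
    Thing GetThing( size_t x, size_t y ) const
    {
        assert( x < mWidth && y < mHeight );
        return mThings[ y * mWidth + x ];
    }

    void SetThing( size_t x, size_t y, Thing value )
    {
        assert( x < mWidth && y < mHeight );
        mThings[ y * mWidth + x ] = value;
    }

    // Visiting every Thing needs no x/y bookkeeping at all; the
    // range-for walks the storage front to back in memory order.
    uint64_t Sum() const
    {
        uint64_t total = 0;
        for( const Thing& thing : mThings )
            total += thing;
        return total;
    }

private:
    size_t mWidth;
    size_t mHeight;
    std::vector< Thing > mThings;
};
```

If you still need x and y inside the loop, keep the y-outer / x-inner order so the index only ever moves forward through memory.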
[/spoiler]
So, all said and done, while the simple rule-of-thumb items are important, I believe knowing your libraries and memory access patterns is much more important for keeping simple things like the above from constantly adding up into performance drains.