Sign in to follow this  
Mawr

How to improve yout STL performance?

Recommended Posts

When I was profiling my 2d space game with gprof (mingw is the compiler), the most time taken up was not by my functions (suprised myself!), but by some calls to the STL library. From the STL I'm using a list to keep track of my objects. It's used like this:
void Draw()
{
  std::list<dust>::iterator it= dustlist.begin();
  for(it;it!=dustlist.end();it++)
  {
    glBegin(GL_TRIANGLES);
    glColor3f(.3,.3,.3);
    glVertex2f(TransulatePoint(it->x+(dustWidth/2),true),
               TransulatePoint(it->y,false));

    glVertex2f(TransulatePoint(it->x-(dustWidth/2),true),
               TransulatePoint(it->y,false));

    glVertex2f(TransulatePoint(it->x,true),
               TransulatePoint(it->y-(dustHeight+bonusSpd),false));

    glEnd();
  }
}


And:

void Update()
{
  std::list<dust>::iterator it=dustlist.begin();
  for(it;it!=dustlist.end();it++)
  {
    it->y-=(it->spd+bonusSpd);
    if(it->y<0)
    {
      it=dustlist.erase(it);
    }
  }
}



gprof is reporting that:
std::_List_iterator<dust, dust&, dust*>::operator*() const 

std::_List_iterator<dust, dust&, dust*>::operator->() const 

are being called the most. So with this in mind, is there any way to optimize how I use the STL library in those code snippets?

Share this post


Link to post
Share on other sites
Quote:
Original post by Mawr
...the most time taken up was not by my functions (suprised myself!), but by some calls to the STL library...

...are being called the most.

Now, which is it? Because if they are being called the most and using up the most time, not surprising. The ratio of time taken to calls performed is fairly linear.

But we really need more information. For instance, did you compile it in release mode? Did you make sure you had optimizations enabled? If so, what level of optimizations? Now for a quick tip, there is no need to call end every iteration of the loop unless you change the contents inside of the loop (which applies to the first loop).

Now, another option would be to keep a pointer to the structure on hand, instead of calling to the iterator each time, but that should be a last resort really, as the compiler should be able to optimize both of those calls out.

Share this post


Link to post
Share on other sites
First did you try to turn on optimization flags at the highest level? i.e "-O3"

In draw function you should just use const_iterators if your not going modify any elements in the container & you could try doing this instead of using the iterator doing the work cache a reference e.g.


const dust& d= *it;


another thing is with iterators use pre-increment operator instead i.e. instead of it++ do ++it, post-increment cost you.

also you do realize you could probably use STL algorithms instead of writiing explicit loops.

Also something not related to STL, instead of having glBegin/End inside of the loop take it out-side, you can send more than one triangle between glBegin & End, and instead of modifiying indiviual vertices you could probably just do less operations using opengl transform ops.

Share this post


Link to post
Share on other sites
Quote:

std::_List_iterator<dust, dust&, dust*>::operator*() const
std::_List_iterator<dust, dust&, dust*>::operator->() const


These are the functions that actually access the data so it's not surprising they come at the top. All this says is that you are doing a lot of data accesses.

Here are the definitions for those two functions:
reference operator*() const
{ return static_cast<_Node*>(_M_node)->_M_data; }

pointer operator->() const
{ return &(operator*()); }



Considering that they are actually inlined, there's not much (i.e. nothing) you can shave off.

I would suspect that the profiler caused them not to be inlined, possibly artificially inflating their "cost" (i.e. viewing as a function call something that would just be done as a couple of pointer indirections).

Quote:
So with this in mind, is there any way to optimize how I use the STL library in those code snippets?


You could do fewer data accesses.

You could grab a reference to the element the iterator points to and use that instead of dereferencing the iterator multiple times. That should placate your profiler (fewer function calls, regardless of inlining), and possibly actually improve performance.
dust& ref = *it;

glVertex2f(TransulatePoint(ref.x+(dustWidth/2),true),
TransulatePoint(ref.y,false));




Grabbing a reference to an object whose location is determined by a potentially expensive function is a common trick.

Another option, for those cases where you do not need to refer to the original variable anymore, would be to make local copies of whatever data you need (here, it->x and ref.y) and be done with it. It may improve cache performance.

Share this post


Link to post
Share on other sites
Fruny: I beat you to it [smile]

And as it's just a draw function that doesn't seem to modify any elements in the container you'll be better off using const_iterator & a constant reference cache.

Plus take out glBegin/End of the loop!!!

Share this post


Link to post
Share on other sites
Quote:
Original post by snk_kid
Fruny: I beat you to it [smile]

And as it's just a draw function that doesn't seem to modify any elements in the container you'll be better off using const_iterator & constant reference cache.


*coughs*
Sure, I said pointer and not reference, but I beat both of you.

Share this post


Link to post
Share on other sites
Thanks for the speedy responses! So far I have done most of the suggested optimizations.
I moved glBegin(GL_TRIANGLES);, glEnd();, and glColor3f(.3,.3,.3); outside of the loop.
I'm using ++it.
Constant iterators when I don't need to modify data.
Made a reference to the end of the list, and tested against that instead of calling dustlist.end(); every loop.

But I don't understand how making a reference to the iterator for my list would improve performance. I thought in this situation 'it' as already a reference?

Clarifing, those functions I mentioned are both called the most, and the most time consuming (1st, and 2nd on the gprof list, for both catagories). I compiled this version with debug information, with optimized turned on to it's highest setting (I'm using dev-cpp, the option is Tools-> Compiler Options-> Settings-> Optimizations ("yes")-> Further Optimizations->Best).

Share this post


Link to post
Share on other sites
Quote:
Original post by Mawr
But I don't understand how making a reference to the iterator for my list would improve performance. I thought in this situation 'it' as already a reference?

Not to the iterator, to the structure the iterator points to. The idea is that if your compiler doesn't inline the dereferencing of the iterator, storing a reference to it will avoid that function call each time you need to access a member of the structure.

Share this post


Link to post
Share on other sites
Quote:
Original post by Mawr
Made a reference to the end of the list, and tested against that instead of calling dustlist.end(); every loop.


I'm not sure what your saying, you can't test an iterator with a reference, certainly if your not modifying the list you can cache a copy of the end iterator e.g.


for(std::list<dust>::const_iterator itr = dustlist.begin(),
end = dustlist.end();
itr != end;
++itr) { /*...*/ }


but we mean't to use a reference type at the beginning of the loop.

Quote:
Original post by Mawr
But I don't understand how making a reference to the iterator for my list would improve performance. I thought in this situation 'it' as already a reference?


list iterator will typically encapsulate a list node that refers to a node in the list, list iterator is a user-defined type with overloaded operators that mick schematics of a pointer, your really calling a member function so to avoid the overhead of the function call you cache a reference or constant reference (even better) and in all intents a reference is the object presto no more function calls.

By the way did you get a performance increase?

Share this post


Link to post
Share on other sites
The cache and TLB misses on all those items in the list are killing you. Also, print the size of the list every so often, to make sure you're not leaving lots of zombie items in the list.

I would keep the actual elements (not pointers) in a vector, so walking the vector is linear memory access, which dramatically reduces cache miss penalties. I'd consider using a "zombie" bit on elements in the vector, so I don't have to move the entire vector each time a particle is purged. Keep a counter of the number of zombie items, and when it's bigger than some number (30 ?) then I'd compact the vector and resize in one go.

Share this post


Link to post
Share on other sites
Guess my nomenclature is a bit bad. Here is the code I used to replace my old code.


//Old version
for(it;it!=dustlist.end();it++)
{...}

//New version
std::list<dust>::const_iterator end_it=list.end();
for(it;it!=end_it;++it)
{...}




I did get a signifigant boost to performance. While my program wasn't really that slow before on my machine, I wanted to know if there was any easy optimizations I could do, for the simple fact I'd know them.

I shifted the most time intensive calls from the ones I mentioned in my orignal post to void Draw(), void Update(), and float TransulatePoint(). And because some of my other functions were made in the same style, I improved them also with what you guys suggested.

Though I still have some questions, I think I understand moving away from it->member to const dust& d=*it, I'll try that next. But now zombie items inside the list? I thought the statement it=dustlist.erase(it); removed the item at position it, preventing any zombie items from being inside the list. Also, while watching the amount of memory my program uses, it stabilizes fairly quickly at a reasonable number (7 megabytes or so).

Share this post


Link to post
Share on other sites
Quote:
Original post by Mawr
Though I still have some questions, I think I understand moving away from it->member to const dust& d=*it, I'll try that next. But now zombie items inside the list? I thought the statement it=dustlist.erase(it); removed the item at position it, preventing any zombie items from being inside the list. Also, while watching the amount of memory my program uses, it stabilizes fairly quickly at a reasonable number (7 megabytes or so).

think of it this way, if you store a vector of things, then the cache can load up a bunch of those "things" however a list of things tends to result in the actual things residing all over in memory, very hard to cache.

Now, the reason you don't use the erase with the vector is that all elements after the erased item have to up one to fill in the space. Instead you have two options. 1) swap with the last living element (thus the first "dead" item you hit is where you would stop looping or 2) mark the element as 'dead' and then collect it at a later time when performance doesn't matter as much. (so you would have a condition here, if(!isDead) { do } else { skip })

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this