Sign in to follow this  
shaobohou

how efficient is stl vector erase

Recommended Posts

Hi, I have a situation where I have a huge vector of vector of doubles, basically a matrix of about 3000 rows and 12000 columns. I have an algorithm that iterates over the rows of the matrix until a stopping condition is reached. During the algorithm it may decide that some of the rows are no longer relevant and should be ignored for the rest of the algorithm. One way to do this is to set the flag of the row in another vector and ignore it when appropriate in the calculation, this increases computing cost as the ignored rows still takes up memory and certain par t of the algo will still iterate over them. So I decided that when a row becomes irrelevant, it should be erased from the vector. This however causes great slow down at the start of the algo when there are lots of rows but much faster in latter part of the algo when the number of rorws have become much smaller. The only reason I can think of for the slow down is that somehow the vector is structuring the memory as part of the erase operation and because of the sheer volume of data involved, it starts to use swap memory (this on a machine with 512mb ram). Can anyone confirm if this is the case and suggest someway of getting around it? I would like to maintain the order of the elements in the vector if at all possible. So far I am getting the idea that erase (or pop) at the end (or was it beginning) of the vector is more efficient. thanks Shaobo

Share this post


Link to post
Share on other sites
erase() copies elements from the end of the vector forward to fill in the gaps of elements erased. This means erase() near the end is efficient by erase() near the beginning is slow. You can somewhat better performance if you use std::remove(_if)() followed by erase() (which minimizes the number of element copies) or by instead of having a vector of vector of doubles, have a vector of (smart) pointers to vectors of doubles (which reduces the cost of copies).

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
erase() copies elements from the end of the vector forward to fill in the gaps of elements erased. This means erase() near the end is efficient by erase() near the beginning is slow. You can somewhat better performance if you use std::remove(_if)() followed by erase() (which minimizes the number of element copies) or by instead of having a vector of vector of doubles, have a vector of (smart) pointers to vectors of doubles (which reduces the cost of copies).


with the smart pointers, would I still be able to access the elements of matrix using the [][] operators? Would it involve quite a bit of effort? any links would be appreciated.

thanks

Shaobo

Share this post


Link to post
Share on other sites
Quote:
Original post by shaobohou
with the smart pointers, would I still be able to access the elements of matrix using the [][] operators? Would it involve quite a bit of effort? any links would be appreciated.


The syntax would end up like (*foo[i])[j], you could write a wrapper to make it look more elegant but i suggest looking into either:

A. Boost.MultiArray.

or:

B. Boost Basic Linear Algebra or any linear algebra library that uses expression templates.

or:

C. Boost Pointer Container library and have: boost::ptr_vector/deque of std::vector instead.

or:

D. Maybe even a std::deque of std::vector.

Share this post


Link to post
Share on other sites
Or you could do away with the matrix altogether and use something that doesn't incur in such a high penalty on row deletion.

Like a list of vectors, for example. very low penalty, no memory rearrangement needed, and no smart pointers either. As simple as possible.

This may or may not be useful depending on your needs. If you can do away with row indexes and just iterate trough the whole thing, then it should do the trick.

Share this post


Link to post
Share on other sites
I only just realised this, for the algorithm I am using, I can do the row removal in batch. I shuffle all the row that doesn't need removed to the front of the vector and do erase from some point in the middle of the vector to the end.

Assuming that erase a range of vector does the right thing.

Share this post


Link to post
Share on other sites
Another fast way to delete from a vector is to swap the item with the last item, and then erase the last item.
This is of course assuming that you don't care about the order of them.

Share this post


Link to post
Share on other sites
Quote:
Original post by shaobohou
I only just realised this, for the algorithm I am using, I can do the row removal in batch. I shuffle all the row that doesn't need removed to the front of the vector and do erase from some point in the middle of the vector to the end.

Assuming that erase a range of vector does the right thing.


This is how the 'erase-remove idiom' works. You may find it is neater to make use of the standard library remove or remove_if algorithms, as SiCrane suggested.

Share this post


Link to post
Share on other sites
Quote:
Original post by dmatter
but won't swapping a row with another row incur copying of the whole row vector?


No (if done correctly). Since the std::vector object internally maintains a pointer to its data allocation, and because the library writers are smart, the std::vector provides a .swap() member function which swaps those pointers around (in addition to the necessary swapping of bookkeeping information). This is of course much more efficient than copying the data :)

Share this post


Link to post
Share on other sites
if ordering of elements is not important, you might just copy the
last element over the element you want to erase, and the resize (size()-1)

Regards

Share this post


Link to post
Share on other sites
When the elements are std::vector<>s it's generally cheaper to swap() with the back and call pop_back() rather than assign the last element on top and call pop_back(). A lot cheaper if the element vectors contain a lot of elements themselves.

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
Quote:
Original post by dmatter
but won't swapping a row with another row incur copying of the whole row vector?


No (if done correctly). Since the std::vector object internally maintains a pointer to its data allocation, and because the library writers are smart, the std::vector provides a .swap() member function which swaps those pointers around (in addition to the necessary swapping of bookkeeping information). This is of course much more efficient than copying the data :)


What do you mean by if done correctly? Is the behaviour you described not required as part of the standard or just some compilers don't conform to it? So effectively, when I do v.swap(u), basically the two pointers to the internal data allocations gets swapped around (along with the bookkeeping information as you said).

I didn't realise this aspect of the .swap() function. Personally I prefer it over the remove_if/erase thing, just because that would probably involve me changing some of the data structures.

thanks for all replies

Shaobo

Share this post


Link to post
Share on other sites
Quote:
Original post by shaobohou
What do you mean by if done correctly?

In this context, done correctly means using std::vector::swap rather than manually swapping them. The latter will almost certainly require three deep vector copies, rather than three shallow ones.

CM

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Or just std::swap() which is specialized to call std::vector::swap().


I was worried it wasn't, which is why I wrote as I did. Thanks for the reassurance.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this