how efficient is stl vector erase

Started by
14 comments, last by Zahlman 18 years, 4 months ago
Hi, I have a situation where I have a huge vector of vector of doubles, basically a matrix of about 3000 rows and 12000 columns. I have an algorithm that iterates over the rows of the matrix until a stopping condition is reached. During the algorithm it may decide that some of the rows are no longer relevant and should be ignored for the rest of the algorithm. One way to do this is to set the flag of the row in another vector and ignore it when appropriate in the calculation, this increases computing cost as the ignored rows still takes up memory and certain par t of the algo will still iterate over them. So I decided that when a row becomes irrelevant, it should be erased from the vector. This however causes great slow down at the start of the algo when there are lots of rows but much faster in latter part of the algo when the number of rorws have become much smaller. The only reason I can think of for the slow down is that somehow the vector is structuring the memory as part of the erase operation and because of the sheer volume of data involved, it starts to use swap memory (this on a machine with 512mb ram). Can anyone confirm if this is the case and suggest someway of getting around it? I would like to maintain the order of the elements in the vector if at all possible. So far I am getting the idea that erase (or pop) at the end (or was it beginning) of the vector is more efficient. thanks Shaobo
Just because it is not nice, doesn''t mean it is not miraculous.
Advertisement
erase() copies elements from the end of the vector forward to fill in the gaps of elements erased. This means erase() near the end is efficient by erase() near the beginning is slow. You can somewhat better performance if you use std::remove(_if)() followed by erase() (which minimizes the number of element copies) or by instead of having a vector of vector of doubles, have a vector of (smart) pointers to vectors of doubles (which reduces the cost of copies).
Quote:Original post by SiCrane
erase() copies elements from the end of the vector forward to fill in the gaps of elements erased. This means erase() near the end is efficient by erase() near the beginning is slow. You can somewhat better performance if you use std::remove(_if)() followed by erase() (which minimizes the number of element copies) or by instead of having a vector of vector of doubles, have a vector of (smart) pointers to vectors of doubles (which reduces the cost of copies).


with the smart pointers, would I still be able to access the elements of matrix using the [][] operators? Would it involve quite a bit of effort? any links would be appreciated.

thanks

Shaobo

Just because it is not nice, doesn''t mean it is not miraculous.
Quote:Original post by shaobohou
with the smart pointers, would I still be able to access the elements of matrix using the [][] operators? Would it involve quite a bit of effort? any links would be appreciated.


The syntax would end up like (*foo)[j], you could write a wrapper to make it look more elegant but i suggest looking into either:

A. Boost.MultiArray.

or:

B. Boost Basic Linear Algebra or any linear algebra library that uses expression templates.

or:

C. Boost Pointer Container library and have: boost::ptr_vector/deque of std::vector instead.

or:

D. Maybe even a std::deque of std::vector.
Or you could do away with the matrix altogether and use something that doesn't incur in such a high penalty on row deletion.

Like a list of vectors, for example. very low penalty, no memory rearrangement needed, and no smart pointers either. As simple as possible.

This may or may not be useful depending on your needs. If you can do away with row indexes and just iterate trough the whole thing, then it should do the trick.
Working on a fully self-funded project
I only just realised this, for the algorithm I am using, I can do the row removal in batch. I shuffle all the row that doesn't need removed to the front of the vector and do erase from some point in the middle of the vector to the end.

Assuming that erase a range of vector does the right thing.
Just because it is not nice, doesn''t mean it is not miraculous.
Another fast way to delete from a vector is to swap the item with the last item, and then erase the last item.
This is of course assuming that you don't care about the order of them.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Quote:Original post by shaobohou
I only just realised this, for the algorithm I am using, I can do the row removal in batch. I shuffle all the row that doesn't need removed to the front of the vector and do erase from some point in the middle of the vector to the end.

Assuming that erase a range of vector does the right thing.


This is how the 'erase-remove idiom' works. You may find it is neater to make use of the standard library remove or remove_if algorithms, as SiCrane suggested.
but won't swapping a row with another row incur copying of the whole row vector?

I would just use a simple vector of pointers to vectors like suggested.
Quote:Original post by dmatter
but won't swapping a row with another row incur copying of the whole row vector?


No (if done correctly). Since the std::vector object internally maintains a pointer to its data allocation, and because the library writers are smart, the std::vector provides a .swap() member function which swaps those pointers around (in addition to the necessary swapping of bookkeeping information). This is of course much more efficient than copying the data :)

This topic is closed to new replies.

Advertisement