Efficient way to erase an element from std::vector

Started by
21 comments, last by SmkViper 9 years, 2 months ago

Hi

I am using a "std::vector<std::pair<stuff1, stuff2>> myvec" that holds temporary data. A random element in the vector may be erased at any time. I don't want to use the std::vector::erase method as it will shuffle all elements which degrades performance.

What I want to do instead is to take the element to be erased and set it equal to the back() element, then pop_back(). This won't shuffle the elements down as the erase() method does (I think).

Now, will the std::pair class assign operator mess with me when I do this or is this code good?


void EfficientEraseElement(std::vector<std::pair<stuff1, stuff2>> &myvec, int i)
{
	myvec.at(i) = myvec.back();
	myvec.pop_back();
}
Advertisement
The usual trick is to swap with the last element and then pop_back. It might be more efficient then your code in some cases.

Normally you use move-semantics to avoid the copies (for non-trivial types). std::swap() handles that for you, though a single std::move() would work.

Here's my general function for swap-and-popping vectors:


//Swap-and-pop a specific element of a container, swapping the element with the element at the back of the container and then popping it off the stack.
//This does not preserve the order of the elements.
template<typename ContainerType>
void SwapAndPopAtIndex(ContainerType &container, size_t index)
{
	//Don't swap the back element with itself, and also don't swap out of range.
	/*
		The (index+1) is to prevent the back element from swapping with itself.

		This could be more clearly written as:	if(index >= (container.size()-1))
		...but since 'container.size()' is unsigned,
		the -1 would accidentally loop it around if size() returns 0, which is a likely occurance.
	*/
	if((index+1) >= container.size())
		return;

	//Swap the element with the back element.
	std::swap(container[index], container.back());

	//Pop the back of the container, deleting our old element.
	container.pop_back();
}

It does not preserve the order of elements.

Normally you use move-semantics to avoid the copies (for non-trivial types). std::swap() handles that for you, though a single std::move() would work.

Here's my general function for swap-and-popping vectors:

//Swap-and-pop a specific element of a container, swapping the element with the element at the back of the container and then popping it off the stack.
//This does not preserve the order of the elements.
template<typename ContainerType>
void SwapAndPopAtIndex(ContainerType &container, size_t index)
{
	//Don't swap the back element with itself, and also don't swap out of range.
	/*
		The (index+1) is to prevent the back element from swapping with itself.

		This could be more clearly written as:	if(index >= (container.size()-1))
		...but since 'container.size()' is unsigned,
		the -1 would accidentally loop it around if size() returns 0, which is a likely occurance.
	*/
	if((index+1) >= container.size())
		return;

	//Swap the element with the back element.
	std::swap(container[index], container.back());

	//Pop the back of the container, deleting our old element.
	container.pop_back();
}
It does not preserve the order of elements.


The fact that it does not handle the case of desiring to remove the last element means this is less than a robust solution, as you end up having to write a conditional for every usage... I would recommend modifications to properly handle the case of popping the last element. Additionally, an assert for when the index is >= size would be more appropriate, as the desired goal is to fail in a non-silent manner when someone does something stupid (like try and remove an element outside of the bounds of the container). Furthermore, by handling the swap case in a saner manner, i.e. simply skipping it if we're the last element, we can call this function without requiring any additional checks. Whereas with your current one, you require a check to see if you're the last element (in which case you just pop) or call this method, which then does the same check AGAIN. Thus the solution below is probably more what you desire:

template<typename ContainerType>
void SwapAndPopAtIndex(ContainerType & container, size_t index)
{
    // ensure that we're not attempting to access out of the bounds of the container.
    assert(index < container.size());

    //Swap the element with the back element, except in the case when we're the last element.
    if (index + 1 != container.size())
        std::swap(container[index], container.back());

    //Pop the back of the container, deleting our old element.
    container.pop_back();
}

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

AFAIK, std::vector needs to guarantee contiguous storage at all times, so I don't think there's any way to avoid memory moves when you delete an item. Std::list does not have that requirement, and it is designed specifically for random deletions and insertions. However, it does not provide random access (by index).

Another thing you could do if you still need to work with std::vector is to mark items as deleted instead of actually deleting them. For basic data types, you could reserve a specific value (like NULL for pointers) as meaning "deleted item". For structures and classes, you probably already use pointers, so NULL is good. If not, you could add a member m_bIsDeleted.

Out of curiosity, does anyone know if there is any chance the swap can be optimized to a move when the items don't have destructors?

I guess that could be difficult unless the memory in the vector is actually shrunk..

Not that it should really matter, but for large POD types it seems unnecessary to exchange the memory instead of just copying.

If by "items don't have destructors" you mean to say that your items are not pointers to objects, structures or other data, then no, there is not way to avoid a copy/move (actually three - one for the first item being moved to aux, one for the second item being moved into the first, and one for aux being moved to the second item).

Also, for 64-bit targets, the compiler will optimize those data moves into SSE instructions, if the data to be moved is 8 bytes or less...

template<typename ContainerType>
void SwapAndPopAtIndex(ContainerType & container, size_t index)
{
    if (index + 1 != container.size())//#####################
        std::swap(container[index], container.back());
    container.pop_back();
}

Could you just remove the highlighted line, assuming that if an item is expensive to swap, it will contain it's own early-out inside a if-swapping-this-with-this test?

Out of curiosity, does anyone know if there is any chance the swap can be optimized to a move when the items don't have destructors?
I guess that could be difficult unless the memory in the vector is actually shrunk..
Not that it should really matter, but for large POD types it seems unnecessary to exchange the memory instead of just copying.

C++11 introduced a POD testing template, so you could write:
if( std::is_pod<T>::value )
  container[index] = container.back();
else
  std::swap(container[index], container.back());

Out of curiosity, does anyone know if there is any chance the swap can be optimized to a move when the items don't have destructors?
I guess that could be difficult unless the memory in the vector is actually shrunk..
Not that it should really matter, but for large POD types it seems unnecessary to exchange the memory instead of just copying.

C++11 introduced a POD testing template, so you could write:
if( std::is_pod<T>::value )
  container[index] = container.back();
else
  std::swap(container[index], container.back());


Wouldn't it be enough to test for std::is_trivially_copyable? Requiring the whole more restrictive POD-property seems unnecessary.

Wouldn't it be enough to test for std::is_trivially_copyable? Requiring the whole more restrictive POD-property seems unnecessary.

Yep. I forgot that C++'s definition of POD is now stupidly restrictive. I usually use "POD" to mean "memcpy-safe", which apparently C++ calls "trivially copyable" now sad.png

This topic is closed to new replies.

Advertisement