Sign in to follow this  
Tispe

Efficient way to erase an element from std::vector

Recommended Posts

Hi

 

I am using a "std::vector<std::pair<stuff1, stuff2>> myvec" that holds temporary data. A random element in the vector may be erased at any time. I don't want to use the std::vector::erase method as it will shuffle all elements which degrades performance.

 

What I want to do instead is to take the element to be erased and set it equal to the back() element, then pop_back(). This won't shuffle the elements down as the erase() method does (I think).

 

Now, will the std::pair class assign operator mess with me when I do this or is this code good?

void EfficientEraseElement(std::vector<std::pair<stuff1, stuff2>> &myvec, int i)
{
	myvec.at(i) = myvec.back();
	myvec.pop_back();
}

Share this post


Link to post
Share on other sites

AFAIK, std::vector needs to guarantee contiguous storage at all times, so I don't think there's any way to avoid memory moves when you delete an item. Std::list does not have that requirement, and it is designed specifically for random deletions and insertions. However, it does not provide random access (by index).

 

Another thing you could do if you still need to work with std::vector is to mark items as deleted instead of actually deleting them. For basic data types, you could reserve a specific value (like NULL for pointers) as meaning "deleted item". For structures and classes, you probably already use pointers, so NULL is good. If not, you could add a member m_bIsDeleted.

Share this post


Link to post
Share on other sites

Out of curiosity, does anyone know if there is any chance the swap can be optimized to a move when the items don't have destructors?

I guess that could be difficult unless the memory in the vector is actually shrunk..

Not that it should really matter, but for large POD types it seems unnecessary to exchange the memory instead of just copying.

Share this post


Link to post
Share on other sites

If by "items don't have destructors" you mean to say that your items are not pointers to objects, structures or other data, then no, there is not way to avoid a copy/move (actually three - one for the first item being moved to aux, one for the second item being moved into the first, and one for aux being moved to the second item).

 

Also, for 64-bit targets, the compiler will optimize those data moves into SSE instructions, if the data to be moved is 8 bytes or less...

Edited by tonemgub

Share this post


Link to post
Share on other sites

Wouldn't it be enough to test for std::is_trivially_copyable? Requiring the whole more restrictive POD-property seems unnecessary.

Yep. I forgot that C++'s definition of POD is now stupidly restrictive. I usually use "POD" to mean "memcpy-safe", which apparently C++ calls "trivially copyable" now sad.png

Share this post


Link to post
Share on other sites

 

Wouldn't it be enough to test for std::is_trivially_copyable? Requiring the whole more restrictive POD-property seems unnecessary.

Yep. I forgot that C++'s definition of POD is now stupidly restrictive. I usually use "POD" to mean "memcpy-safe", which apparently C++ calls "trivially copyable" now sad.png

 

 

I would even say is_trivially_destructible. Though I'm not entirely sure on that one, and actually for performance-reasons I don't think it can even be determined from such traits, as it would depend on their internals. But for the general complexity it's interesting.

 

As far as I see it, swap is always better than copy/assign (whether or not trivially) when there are parts of the object that need to be destructed, as otherwise any non-trivial original contents of [index] would be destructed before replaced with their new values from back(), and then back() would be destructed at some point (though it could of course at times be optimized out to a copy in the assignment, but worst-case).

When swapping out the objects internals, no potential extra destruction can take place.

 

However, there is another case where swap is worse, which is when there is no destructor so the object removed at back() never needs to be destructed. At that time it is completely unnecessary to swap the contents instead of just copying them (if copying would be faster, which may or may not be the case of course). For a trivially copyable object it's obvious, one memcpy instead of swapping.

 

What I was originally wondering was whether there is any chance the compiler can assume that the pop_back() means the memory at back() is undefined and therefore optimize away the swap and replace with a memcpy when appropriate. If destruction of an object means that the contents of the memory that object occupied is undefined afterwards it could theoretically.. and perhaps even realistically in simple inlined cases like this. I'm not sure that is actually a rule though?

Share this post


Link to post
Share on other sites
I would disagree. If an object fulfills a theoretical std::is_trivially_destructable but not std::is_trivially_copyable it has non-trivial copy and/or move semantics without a non-trivial destructor. That's a classic violation of the Rule of Three (or the newer C++11-counterparts) and it should probably only fulfill std::is_probably_broken.

Share this post


Link to post
Share on other sites

template<typename ContainerType>
void SwapAndPopAtIndex(ContainerType & container, size_t index)
{
    if (index + 1 != container.size())//#####################
        std::swap(container[index], container.back());
    container.pop_back();
}

Could you just remove the highlighted line, assuming that if an item is expensive to swap, it will contain it's own early-out inside a if-swapping-this-with-this test?

One would hope, but in practice the reality is that if you're writing a general purpose library chances are someone is going to do something stupid with it (such as having an expensive to copy object with no early out on the copy side). Better to be safe than sorry.

Share this post


Link to post
Share on other sites
That's a classic violation of the Rule of Three (or the newer C++11-counterparts) and it should probably only fulfill std::is_probably_broken. 

 

 

Not always. The rule of three says if you have one, you probably should have all three. There are absolutely valid cases for having one-or-more, but not all.

 

For example, if all the members of your class leverage RAII, there's no need to explicit define a destructor, even if you needed to define a copy constructor or assignment operator (or delete one of the same) because one of the members was a reference or const.

Edited by Josh Petrie

Share this post


Link to post
Share on other sites

I remember from some C++ talks that the std::map or std::unordered_map might be suitable aswell.

std::remove_if(mymap.begin(), mymap.end(), [](std::pair<stuff1, stuff2> &mypair){
	return mypair.first->Status > 1;
});

zX07TZR.png

 

**EDIT:

Another approach I am testing is to iterate over the vector and remove elements on a condition, in this example an std::future:

std::vector<std::future<DWORD>> Futures;
for (auto it = Futures.begin(); it != Futures.end() ; )
{
	if (it->wait_for(std::chrono::seconds(0)) == std::future_status::ready)
	{
		std::swap(*it, Futures.back());
		Futures.pop_back();
	} else {
		it++;
	}
}
Edited by Tispe

Share this post


Link to post
Share on other sites

[background=#fafbfc]That's a classic violation of the Rule of Three (or the newer C++11-counterparts) and it should probably only fulfill std::is_probably_broken.[/size][/background]

 
Not always. The rule of three says if you have one, you probably should have all three. There are absolutely valid cases for having one-or-more, but not all.
 
For example, if all the members of your class leverage RAII, there's no need to explicit define a destructor, even if you needed to define a copy constructor or assignment operator (or delete one of the same) because one of the members was a reference or const.

I find it difficult to agree with this statement, probably because I have spent at least one day too much debugging code where the root cause turned out to be a Rule of Three violation after a lot of suffering. I would say in practically all cases where a contained RAII resource is present you do not need to specify any of the three (because the compiler will automatically generate them based on what is already defined or not defined for the RAII resource).

Also, if there is a RAII resource inside the class then it's not going to have a trivial destructor, then the class which contains it won't have that either and my statements were in the larger context of whether a std::is_trivially_destructable would have a real use. I maintain 'no' on that count.
With a bit of creativity I'm sure one could cobble together an actual use case for going against the Rule of Three. But I also believe those are so rare it should lead to more than normal reflection about if it's really okay.

Share this post


Link to post
Share on other sites

The fact that it does not handle the case of desiring to remove the last element means this is less than a robust solution, as you end up having to write a conditional for every usage...

 

Oops, that's a flaw. I meant to avoid the swap, but I still want the pop, if the index is the last one.

Share this post


Link to post
Share on other sites

I'm getting Debug Assertion Failed! on the vector<std::future<DWORD>> pop!

for (auto it = Futures.begin(); it != Futures.end() ; )
{
	if (it->wait_for(std::chrono::seconds(0)) == std::future_status::ready)
	{
		std::swap(*it, Futures.back());
		Futures.pop_back();              // Assertion Failed!
	} else {
		it++;
	}
}

 

Debug Assertion Failed!

 
Program: C:\Windows\system32\MSVCP120D.dll
File: c:\program files (x86)\microsoft visual studio 12.0\vc\include\vector
Line: 240
 
Expression: vector iterators incompatible

 

What is going on?

Share this post


Link to post
Share on other sites
http://stackoverflow.com/questions/62340/does-pop-back-really-invalidate-all-iterators-on-an-stdvector

However, in your code example, it can invalidate it when it is pointing to the last element and the value is below 10. In which case Visual Studio debug STL will mark iterator as invalidated, and further check for it not being equal to end() will show an assert.

Share this post


Link to post
Share on other sites

http://stackoverflow.com/questions/62340/does-pop-back-really-invalidate-all-iterators-on-an-stdvector
 

However, in your code example, it can invalidate it when it is pointing to the last element and the value is below 10. In which case Visual Studio debug STL will mark iterator as invalidated, and further check for it not being equal to end() will show an assert.

 

 

Edit: Baw, can't solve using this code where I try to update the iterator :(

	auto EndIt = Futures.end();
	for (auto it = Futures.begin() ; it != EndIt ; it++)
	{
		if (it->wait_for(std::chrono::seconds(0)) == std::future_status::ready)
		{
			std::swap(*it, Futures.back());
			Futures.pop_back();
			EndIt = Futures.end();
		}
	}
Edited by Tispe

Share this post


Link to post
Share on other sites

threw away interators in favor of indices:

	for (size_t i = 0; i < Futures.size(); )
	{
		if (Futures[i].wait_for(std::chrono::seconds(0)) == std::future_status::ready)
		{
			std::swap(Futures[i], Futures.back());
			Futures.pop_back();
		} else {
			i++;
		}
	}

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this