Strange slowness, std::vector, MSVC++9

Started by
39 comments, last by Emmanuel Deloget 15 years, 8 months ago
Hi, The following code is a renderlist, containing a list of RenderNodes.

class RenderList;
class RenderNode;

#ifndef class_RenderList_h
#define class_RenderList_h 1
#include "../StaticMesh.h"
#include "../gMaths.h"

class RenderNode
{
	public:
		RenderNode(){};
		RenderNode(StaticMesh &m, g_transform &p):mesh(m),pos(p){}

		StaticMesh mesh;
		g_transform pos;
};

class RenderList
{
	public:
		RenderList()
		{
		
		}
		
		~RenderList()
		{
			Flush();
		}
		
		void AddNode(StaticMesh& m, g_transform& p)
		{
			RenderNode newnode(m,p);
			nodes.push_back(newnode);
		}
		
		void Flush()
		{
			nodes.clear();
		}
		
		void ConsoleDump()
		{
			for(unsigned int index=0; index < nodes.size(); index++)
			{
				std::cout<<"RenderList contains node named "<<nodes[index].mesh.name<<"\n";
			}
		}
		//TAKES 60ms
		void RenderList::Append(const RenderList& list)
		{
			nodes.reserve(list.nodes.size());
			nodes.insert(nodes.end(),list.nodes.begin(),list.nodes.end());
		}
		//TAKES 21ms WITH SAME DATA
		void RenderList::Append(const RenderList& list)
		{
			nodes.reserve(list.nodes.size());
			unsigned int end = list.nodes.size();
			for(unsigned int index = 0; index < end; ++index)
			{
				nodes.push_back(list.nodes[index]);
			}
		}

	private:		
		std::vector<RenderNode> nodes;
		friend class Renderer;
};
#endif




Now, for some reason the first version of RenderList::Append() is slower than the second one, as it takes 60ms to copy 227 elements into the first render list compared with the second version of Append() which takes 21ms to perform the same operation. All I am doing to test the two is commenting one out in code and leaving the other, then recompiling. Can anybody shed some light on why the first is slower than the second? Surely a single insert() should be faster than a series of push_back() ? Thanks. [Edited by - speciesUnknown on August 9, 2008 10:07:44 PM]
Don't thank me, thank the moon's gravitation pull! Post in My Journal and help me to not procrastinate!
Advertisement
If you call push_back over and over the vector will need to be reallocated and copied each time its capacity is exceeded.

However, if you call insert the vector can perform a single reallocation to accommodate the new elements.

[edit]

D'oh. Thats what I get for skimming your post.
Quote:Original post by fpsgamer
If you call push_back several times the vector may need to be resized as much as 'n' times.

However, if you call insert it can perform a single resize to accommodate all the new elements.


I dont think i made muyself clear. The first one uses insert() and takes 60ms. the second one uses a series of push_back() and takes 20ms. I would expect the opposite effect.


Quote:

[edit]

D'oh. Thats what I get for skimming your post.

Yeah, im guilty of that occasionally.
But still, why the bizarre slowness? Everything im reading suggests that insert() should be faster than a load of single calls to push_back.
Don't thank me, thank the moon's gravitation pull! Post in My Journal and help me to not procrastinate!
Well its no longer a performance issue, but rather an absurdity, because by adding some basic optimisations elsewhere I have reduced the gather time to less than a millisecond.

But I am still wondering how insert() can be slower than a load of push_back().
Don't thank me, thank the moon's gravitation pull! Post in My Journal and help me to not procrastinate!
AFAIK, the standard library makes no performance claims regarding a single call to std::vector<T>::insert vs. repeatedly calling std::vector<T>::push_back. All I could find in Josuttis is a mention that calling insert can be expected to be faster when:

  • Elements are inserted or removed at the end

  • The capacity is large enough on entry

  • Multiple elements are inserted by a single call rather than by multiple calls


Note that the last point is with respect to multiple calls to insert, rather than multiple calls of push_back.

One thing I did notice is that you call reserve on your existing list with the size of the list to be appended to the existing list. Reserve specifies the desired total capacity, not the desired extension to the existing capacity, so you should more properly write:

nodes.reserve(nodes.size() + list.nodes.size());

My free book on Direct3D: "The Direct3D Graphics Pipeline"
My blog on programming, vintage computing, music, politics, etc.: Legalize Adulthood!

Quote:Original post by legalize
AFAIK, the standard library makes no performance claims regarding a single call to std::vector<T>::insert vs. repeatedly calling std::vector<T>::push_back. All I could find in Josuttis is a mention that calling insert can be expected to be faster when:

  • Elements are inserted or removed at the end

  • The capacity is large enough on entry

  • Multiple elements are inserted by a single call rather than by multiple calls


Note that the last point is with respect to multiple calls to insert, rather than multiple calls of push_back.

One thing I did notice is that you call reserve on your existing list with the size of the list to be appended to the existing list. Reserve specifies the desired total capacity, not the desired extension to the existing capacity, so you should more properly write:

nodes.reserve(nodes.size() + list.nodes.size());


Right, i heeded what you said about reserve() being the total amount of space. However, this made no difference. I'll bear it in mind in the future, however. Thanks alot.

By reducing the amount of data being passed around when copying StaticMesh (it uses VBO's so i made it clear the initial array), I have brought the gather time to a millisecond for the push_back method, and 3ms for the insert() method. im still pondering the difference in speed, however.
Don't thank me, thank the moon's gravitation pull! Post in My Journal and help me to not procrastinate!
STL containers are deathly-slow when compiled in Debug mode. If you're not already using Release mode, try switching to it and then seeing if performance is still sqewed unusualy.
Quote:Original post by speciesUnknown
Right, i heeded what you said about reserve() being the total amount of space. However, this made no difference. I'll bear it in mind in the future, however. Thanks alot.


Yeah, this was just a side observation; I didn't expect that to be the source of the problem.

Quote:By reducing the amount of data being passed around when copying StaticMesh (it uses VBO's so i made it clear the initial array), I have brought the gather time to a millisecond for the push_back method, and 3ms for the insert() method. im still pondering the difference in speed, however.


The only thing I can think of is that the nature of the operations (insert being more general than push_back) causes insert to do more book-keeping overhead: insert must invalidate iterators because it can be used to insert items int he middle of a container, whereas push_back can never invalidate any existing iterators because it can only append to the container. The discussion in Josuttis for insert talks about its iterator invalidating consequences.

I know its a bit of a rat's nest, but you do have the whole source for insert and push_back in the standard library, so you can compare what they do to see why they perform differently.

If I had been doing this, I probably would have written:
std::copy(list.nodes.begin(), list.nodes.end(), back_inserter(nodes));

I wonder how that would perform.

My free book on Direct3D: "The Direct3D Graphics Pipeline"
My blog on programming, vintage computing, music, politics, etc.: Legalize Adulthood!

It's because of the way that they decided to implement vector::insert. Here's the source code:

	template<class _Iter>		void _Insert(const_iterator _Where, _Iter _First, _Iter _Last,			input_iterator_tag)		{	// insert [_First, _Last) at _Where, input iterators #if _HAS_ITERATOR_DEBUGGING		if (_Where._Mycont != this			|| _Where._Myptr < _Myfirst || _Mylast < _Where._Myptr)			_DEBUG_ERROR("vector insert iterator outside range"); #endif /* _HAS_ITERATOR_DEBUGGING */		if (_First != _Last)			{	// worth doing, gather at end and rotate into place			size_type _Oldsize = size();			size_type _Whereoff = _Where._Myptr - _Myfirst;			for (; _First != _Last; ++_First)				_Insert_n(end(), (size_type)1, (value_type)*_First);			_Reverse(_Myfirst + _Whereoff, _Myfirst + _Oldsize);			_Reverse(_Myfirst + _Oldsize, _Mylast);			_Reverse(_Myfirst + _Whereoff, _Mylast);			}		}


It's those calls to _Reverse at the end which is causing the slowdown. Here's how it works:

Let's say you want to insert the sequence [a b c] into the middle of a vector containing [1 2 3 4 5 6]:
    1 2 3 4 5 6         ^       a b c

The input is added to the end:
    1 2 3 4 5 6 a b c    ^     ^     ^     ^_Myfirst  |     |   _Mylast       _Where   |                |        _Myfirst+_Oldsize


First reverse: ([4 5 6] reverses to [6 5 4])
1 2 3 6 5 4 a b c^     ^     ^     ^


Second reverse: ([a b c] reverses to [c b a])
1 2 3 6 5 4 c b a^     ^     ^     ^


Third reverse: ([6 5 4 c b a] reverses to [a b c 4 5 6])
1 2 3 a b c 4 5 6^     ^     ^     ^


The problem is that if you're inserting into the end, two of these three reverses will still occur. Taking the above example, if [a b c] was inserted into the end, it'd be reversed to [c b a], then immediately re-reversed back to [a b c]. If you have non-trivial copy constructors, this can turn out to be a very slow operation.

I wonder what Dinkumware's justification for this method was. No matter what you do, this method of "rotating into place" will always involve more copies than just resizing the vector, shoving across all elements after _Where, and then inserting the elements directly into place. Let's take the worst-case scenario: inserting into the beginning of a vector.

Take for example, inserting [a b c d] into the beginning of a vector containing [1 2 3 4 5 6]. Dinkumware's implementation will cause 2+3+5 = 10 swaps. Each swap is comprised of 3 copies, so that's 30 copies. But if the vector was simply increased in size by 4 elements, then the sequence [1 2 3 4 5 6] moved across by 4 elements that would only be 4 copies. Then the sequence [a b c d] would then be inserted into the now empty first 4 elements.

[Edited by - Sc4Freak on August 10, 2008 12:18:57 AM]
NextWar: The Quest for Earth available now for Windows Phone 7.
Lots of helpful suggestions, however I would like to clarify a few things,

1) I am compiling in release mode with debug symbols disabled.

2) I have disabled the secure SCL

3) I am copying by value, however only with a small amount of data. class StaticMesh looks like this:

class StaticMesh{	public:		StaticMesh();		~StaticMesh();		void addTriangle(const std::string& texture, static_triangle triangle);		// that single triangle is added				void finalise();		// mesh is finalised into a number of vertex buffers		void OpenGLDraw(TextureTable * m);		std::map<std::string,triangle_list>triangle_lists;		std::map<std::string,VertexBufferObject>vert_arrays;	private:		void m_immediate_mode_draw(TextureTable * m);		void m_vertex_array_draw(TextureTable * m);		// one vector of polygons per material		// no use of vertex references, each polygon keeps its own data.		GLuint VBOvertices;		GLuint VBOtexcoords;		bool finalised;	public:		void merge(const StaticMesh& sauce);};


The std::map of <std::string, triangle_list> is deleted after finalise() is called, and then cleared. when I was not doing this, I was getting gather times of 20ms with push_back and 60ms with insert(). now i get 1ms and 3ms respectively.

The map of <std::string, VertexBufferObject> also holds a small amount of data, this is what the VertexBufferObject class looks like:

class VertexBufferObject{	public:		VertexBufferObject();		VertexBufferObject(const triangle_list& source_data);		void Draw();	private:		void fillArrays(const triangle_list& in_data);		void EnableStates();		void DisableStates();		GLuint length;		GLuint VBOVertices;		GLuint VBOTexCoords;		friend class StaticMesh;};


however, i want to highlight that the ratio is still 3:1 after I drastically shrank the amount of data stored in them. (1ms and 3ms compared with 20ms and 60ms)
Don't thank me, thank the moon's gravitation pull! Post in My Journal and help me to not procrastinate!

This topic is closed to new replies.

Advertisement