• Advertisement
Sign in to follow this  

std::vector slow

This topic is 3825 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I have to calculate some geometry dynamically (about 6000 vertices and 18000 indices) every frame. As I calculated the geometry, I was using push_back to store the computed vertices and indices. Note: I called vector::reserve at initialization time with a large enough max, so no memory allocation should be needed at runtime. I narrowed it down that filling the vectors was the slow part. I replaced them with C-arrays, and got about a 20 ms performance increase per frame. I went from 20 ms per frame to 2 ms per frame. Any ideas why it was slow? The code looked like this:
WORD baseIndex = (WORD)m_vertices.size();

	for(int i = 0; i < numVertices; ++i)
		m_vertices.push_back(v);

	m_vertices.push_back(centroid);

	for(int i = 0; i < numVertices; ++i)
	{
		m_indices.push_back( baseIndex+i ); 
		m_indices.push_back( baseIndex + ((i+1)%numVertices) );
		m_indices.push_back( baseIndex + numVertices ); // centroid
	}

Share this post


Link to post
Share on other sites
Advertisement
Was this in Debug or Release mode?

If in Debug mode then std::vector::push_back might be performing some error checking (memory verification stuff) that in large unoptimized quantities would be much slower than just an array index (which assembles to an offset even when Debugging I think?). This is the first thing that comes to mind, but if this is a Release build push_back probably would optimize to the array code anyway (besides the function call). Check the assembly output if possible or step through the assembly in your debugger. Thats as much as I can think of, as long as you're positive the vector isn't being cleared/reallocated every frame.

Share this post


Link to post
Share on other sites
Its probably having to reallocate and copy memory around, try calling .reserve with the amount of pushes you are expecting before you start pushing them:


WORD baseIndex = (WORD)m_vertices.size();

m_vertices.reserve(numVertices+1);
for(int i = 0; i < numVertices; ++i)

m_vertices.push_back(v);



m_vertices.push_back(centroid);


m_indices.reserve(numVertices*3);
for(int i = 0; i < numVertices; ++i)

{

m_indices.push_back( baseIndex+i );

m_indices.push_back( baseIndex + ((i+1)%numVertices) );

m_indices.push_back( baseIndex + numVertices ); // centroid

}




Edit: Also, don't clear and fill the vector each frame, modify its contents instead.

Share this post


Link to post
Share on other sites
If you really need to "clear" the vector, use resize(0) instead, since clear() might (depending on your compiler) actually also free the memory.

Share this post


Link to post
Share on other sites
// Bad:
for(int i = 0; i < numVertices; ++i)
m_vertices.push_back(v);

// Good:
m_vertices.insert(m_vertices.end(), v, v+numVertices);

Share this post


Link to post
Share on other sites
Quote:

If you're using VS2k5, it performs quite a bit of error-checking by default, even in release builds. You have to make a couple of #define's to disable it.


I am using VS2005. Do you know what the defines are?

Share this post


Link to post
Share on other sites
#define _SCL_SECURE 0


Also, Kwizatz is showing off the perfect way to get your .reserve()s wrong, you want:
m_vertices.reserve( baseIndex + numVertices + 1 );
m_indices.reserve( m_indicies.size() + numVertices*3 );


VS2k5's implementation of .reserve, at least, also fails to grow exponentially if .reserve(n), n > .capacity() -- this could actually end up slowing things down as a result.

Share this post


Link to post
Share on other sites
Thanks for the replies so far.

One thing I recently tried was to stop using push_back. Instead, I just used vector::operator[], and kept my own size counter. This is now comparable to the array performance.

I noticed push_back has a conditional statement (in case it needs to resize). Do you think that is the extra overhead? I read hardware does not like branching statements.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement