# std::vector operator [] is evil?

## Recommended Posts

Hello all I've made an interesting discovery while optimizing the performance of my 3D game app. Using operator [] to index STL vectors in performance-critical loops seems to seriously hurt framerate. Multiple nested index operators worsen it even more, eg:
something = verts[tris[i].vindex[j]];
where verts and tris are STL vectors. Has anyone else experienced this? When I switched to either using iterators in loops or just convetional C arrays in place of std::vectors, framerate rise from 63 FPS to whopping 122 FPS. [rolleyes] I'm using Visual Studio.NET 2002 and Dinkumware STL implementation bundled with it.

##### Share on other sites
I'm not always someone who defends the STL, but I'd say the the implementation of the STL you are using probably does play some factor in there. Plus there is going to be some small loss of performance just do to the bounds checking that the vector class does (which does make it safer to use than your typical c/c++ style array).

##### Share on other sites
All the STL implementations I've come across do not do bounds-checking on std::vector::operator[] by default.

Arkion, good performance for std::vector depends on the compiler correctly inlining function calls. Are you compiling in Debug mode? If so, then you shouldn't trust ANYTHING about speed differences. Only draw conclusions based on Release mode builds, with full optimization.

##### Share on other sites
Operator[] usually doesn't bounds check; I believe a function called at() does. I don't know the actual name, because it has been a while since I used C++ and its STL.

##### Share on other sites
It is the diet-coke of evil.

##### Share on other sites
I wrote quick program to see the difference between vector and array access and the only difference of note were these two lines. Other than these two lines, the generated assembly was the same for both.
00401247  mov         esi,dword ptr [v+4 (411490h)] 0040124D  add         edi,offset a (409380h)

Using a vector requires one additional indirection.

Here is my source:
static vector< s >	v( size );static s		a[ size ];	for ( int i = 0; i < size; ++i )	{		int j = rand();		int k = rand();		a[j] = v[k];	}

I can't see how changing from vector to array would double your frame rate, unless there is something really unique about your code. Along with Sneftel, I suspect you are profiling your debug configuration. It's a common mistake.

##### Share on other sites
If you're using MS VisualStudio then remember it's also affected by what mode you're running. A friend profiled std::string and std::vector and the results are much faster in Release mode. Debug mode has a load of extra faff for debugging information and error checking.

/Edit: misread post - argh. Duly corrected.

##### Share on other sites
Don't profile in debug mode.

If you arn't, update your compiler to something developed this millenia.

##### Share on other sites
The STL in a release compile is fast. I mean faster than anything I've recreated.

My homebrew array class' in debug compile are way faster than VECTOR.
In release mode VECTOR wins, it unrolls a lot into asm I believe which is more than any hobbiest wants to do.

I can send you performance tests and a array class.

##### Share on other sites
I had a similar problem to this using maps. I may be wrong on this but I think:

When you use the [] operator the element in the vector gets copied. So if you have a vector of vectors, the inner vector will get copied when you use the first [] operator. This would account for the performance hit, and certainly did with my map of maps.

Like I say I could be wrong but I seem to remember reading somewhere that this is the case.

##### Share on other sites
Quote:
 Original post by JYWhen you use the [] operator the element in the vector gets copied. So if you have a vector of vectors, the inner vector will get copied when you use the first [] operator. This would account for the performance hit, and certainly did with my map of maps.Like I say I could be wrong but I seem to remember reading somewhere that this is the case.

Yes you are completely wrong, both overloaded subscript operators for vector & (unordered_)map return constant and non-constant references depending on the context. If you don't declare your variables as of reference type then a copy will be made but this is exactly the same case for C-style arrays [headshake].

Quote:
 Original post by ArkionI'm using Visual Studio.NET 2002 and Dinkumware STL implementation bundled with it.

I also suggest you update your compiler as VC++ 7.0 is almost as bad as VC++ 6.0 which is significant, if you want an MS compiler get VC++ 7.1 > *.

##### Share on other sites
I had tried profiling the Release build back when the STL vector usage was still unoptimized (63 FPS in Debug), and the Release build resulted in framerate of 75 FPS.

I just tried the Release build again as many of you suggested, now with the optimized code, and it resulted in framerate of around 200 FPS! I can tell you, only optimizations I have done were that of STL vector usage. Summary of the results:

Before: Debug 63 FPS/Release 75 FPS
After: Debug 133 FPS/Release 200 FPS

This makes me draw a conclusion that Debug build can be indeed used for preliminary profiling. Because if FPS goes up in Debug mode, it will also go up in Release mode (though not necessarily in same proportion, as seen in the results above).

Just in case you are interested, here's a concrete example of optimized performance-critical code that bumped the FPS from 122 to 133 FPS in Debug mode. BSP tree for the test map has 172 I-nodes and 173 leaves (well balanced), making this code called 345 times per frame.

tree->m_nodes is a std::vector<BSPNode>.

Before:
void Renderer::RecursiveRender(BSPTree *tree, int node){	if(tree->m_nodes[node].m_isLeaf)	{		std::vector<Surface_t>::iterator iter;				for(iter = tree->m_nodes[node].m_surfaces.begin(); iter != tree->m_nodes[node].m_surfaces.end(); iter++)		{			glBindTexture(GL_TEXTURE_2D, iter->mat->diffMap);			glDrawArrays(GL_TRIANGLES, iter->firstVert, iter->count);		}		return;	}	RecursiveRender(tree, tree->m_nodes[node].m_iFrontChild);	RecursiveRender(tree, tree->m_nodes[node].m_iBackChild);}

After:
void Renderer::RecursiveRender(BSPTree *tree, int node){	BSPNode *pNode = &tree->m_nodes[node]; // Store pointer to the node and use it - much faster	if(pNode->m_isLeaf)	{		std::vector<Surface_t>::iterator iter;				for(iter = pNode->m_surfaces.begin(); iter != pNode->m_surfaces.end(); iter++)		{			glBindTexture(GL_TEXTURE_2D, iter->mat->diffMap);			glDrawArrays(GL_TRIANGLES, iter->firstVert, iter->count);		}		return;	}	RecursiveRender(tree, pNode->m_iFrontChild);	RecursiveRender(tree, pNode->m_iBackChild);}

##### Share on other sites
if you missed it the first time i'll say it again.

Quote:
Original post by snk_kid
Quote:
 Original post by ArkionI'm using Visual Studio.NET 2002 and Dinkumware STL implementation bundled with it.

I also suggest you update your compiler as VC++ 7.0 is almost as bad as VC++ 6.0 which is significant, if you want an MS compiler get VC++ 7.1 > *.

##### Share on other sites
Quote:
 Original post by ArkionJust in case you are interested, here's a concrete example of optimized performance-critical code that bumped the FPS from 122 to 133 FPS in Debug mode

One thing I noticed while looking at your code is that you post-increment the iterator in your for loop(ex. iter++ ). You should get in the habit of pre-incrementing any non-trivial data tpyes(ex. ++iter). This could speed thing up more.

-John

##### Share on other sites
Quote:
Original post by Teknofreek
Quote:
 Original post by ArkionJust in case you are interested, here's a concrete example of optimized performance-critical code that bumped the FPS from 122 to 133 FPS in Debug mode

One thing I noticed while looking at your code is that you post-increment the iterator in your for loop(ex. iter++ ). You should get in the habit of pre-incrementing any non-trivial data tpyes(ex. ++iter). This could speed thing up more.

If we are going doing that road then i'll add to that [grin], constant-correctness (constant members functions, use a constant iterator etc as your only rendering) & use generic algorithms over explicit loop code.

##### Share on other sites
Quote:
 Original post by ArkionAfter:for(iter = pNode->m_surfaces.begin(); iter != pNode->m_surfaces.end(); iter++)

You might want to cache the result of .end() to a const_iterator and compare with it, rather than evaluate the size of your vector couple hundred times ^^;

##### Share on other sites
For what it's worth, I was a big fan of using []'s just because I was so familiar with the syntax thanks to c++'s for loops and it was a lot less to type than iterators. But then I was chastised for doing so because it was a lot slower. So now I always use the iterative method. I don't know that I've noticed any difference but I think there are a number of other things that hurt my performance even more.

But using the []'s in your performance critical loops sounds like a very bad thing and I'm not surprised in the least. I imagine that operator was being called extremely often each frame.

I attended a lecture at a past GDC about optomizing code and the speaker spent at least half of the lecture talking about how bad STL is for performance and how something simple like what you did (switching []'s to iterators) will make a huge improvement.

##### Share on other sites
Quote:
 Original post by helixBut using the []'s in your performance critical loops sounds like a very bad thing

It's not. It is my sad duty to inform you that you should use release mode. Your compiler, if released within the past few years, now includes what's known as an "optimizer" which will eliminate the overhead for you, eliminating countless excuses to get down and jiggy with pointers and manual allocations. It's like depriving a pig of mud I tell you!!!

Quote:
 and I'm not surprised in the least. I imagine that operator was being called extremely often each frame.

In debug mode, yes, so you could in theory put a breakpoint in std::vector::operator[]. However, in release mode, "calling" is a misdominer - the assembly will not resemble anything like "calling", rather, it will produce the same assembly as something like this:

int * pie = new int[4238];
int i = pie[34];

The second line which, needless to say, involves absolutely no functions. This is due to optimizations such as automatic inlining.

Quote:
 I attended a lecture at a past GDC about optomizing code and the speaker spent at least half of the lecture talking about how bad STL is for performance and how something simple like what you did (switching []'s to iterators) will make a huge improvement.

How many years ago?

Recent compilers will optimize both out the same in many/most circumstances. There's still reasons to prefer iterators when it makes sense, but speed in most cases isn't one if them.

Advice gets old quick in this industry. 10 years ago in C you could write <<2 instead of *4 and see a preformance boost in your innermost loops. Today? Your compiler automatically preforms this and many many other optimizations as needed, automatically (if allowed by setting release mode/-O3/your compiler's equivilant of "optimize the heck outta my code" is), free of charge.

##### Share on other sites
just wanna point out that this
glBindTexture(GL_TEXTURE_2D, iter->mat->diffMap);
or even (less so) this
glDrawArrays(GL_TRIANGLES, iter->firstVert, iter->count);

are gonna take (roughly 20billion times, slight exageration) as long as how youre doing the loop
ie
youre barking up the wrong tree

##### Share on other sites
Quote:
 Original post by snk_kidIf you don't declare your variables as of reference type then a copy will be made...

But this isn't true for vectors?

##### Share on other sites
Quote:
Original post by JY
Quote:
 Original post by snk_kidIf you don't declare your variables as of reference type then a copy will be made...

But this isn't true for vectors?

[rolleyes] what exactly do you mean? as i said before both of vector/map's overloaded subscript operators return constant and non constant reference to there elements, this is exactly the same semantics as C-style arrays in C++. Do you understand what l-values & r-values are?

When you access an element of an array of T in the context of an r-value the type is const T&.
When you access an element of an array of T in the context of an l-value the type is T& or const T& if the array has been declared constant.

If that still doesn't make sense take this example:

foo bar[N]; // a C-style array

foo foo_val = bar[0]; // a copy is made of the first element

const foo& const_foo_ref = bar[0]; // refers to the first element, no copy its the original.

foo& foo_ref = bar[0]; // refers to the first element, no copy its the original.

And the same holds for vector/map.

##### Share on other sites
Quote:
Original post by Anonymous Poster
Quote:
Original post by JY
Quote:
 Original post by snk_kidIf you don't declare your variables as of reference type then a copy will be made...

But this isn't true for vectors?

[rolleyes] what exactly do you mean? as i said before both of vector/map's overloaded subscript operators return constant and non constant reference to there elements, this is exactly the same semantics as C-style arrays in C++. Do you understand what l-values & r-values are?

When you access an element of an array of T in the context of an r-value the type is const T&.
When you access an element of an array of T in the context of an l-value the type is T& or const T& if the array has been declared constant.

If that still doesn't make sense take this example:

foo bar[N]; // a C-style array

foo foo_val = bar[0]; // a copy is made of the first element

const foo& const_foo_ref = bar[0]; // refers to the first element, no copy its the original.

foo& foo_ref = bar[0]; // refers to the first element, no copy its the original.

And the same holds for vector/map.

That was me, i was logged out for unknown reasons again [headshake].

Anyways those people who keep saying vector's subcript operator and vector itself is slow are talking rubbish:

• vector's subscript operator does not check bounds (except maybe in debug builds). If you want bounds checking use vector's at method.

• On modern C++ compilers with good standard compliance, with optimizations on vector is equivalent to a C-style dynamic in efficiency, infact in typical implementations of vector, when optimizations are on it will compile out to a cleverly used/manipulated C-style dynamic array.

[Edited by - snk_kid on June 30, 2005 5:47:27 AM]

##### Share on other sites
hmm but most times we use [] operator we r using it cuz we need a random access iterator and not to loop through all the list using a normal iterator, this is where [] is very useful. so in new compilers it can be as useful as begin()+i

##### Share on other sites
Quote:
 Original post by tolarisYou might want to cache the result of .end() to a const_iterator and compare with it, rather than evaluate the size of your vector couple hundred times ^^;

That's a good idea. I never thought about doing that.

##### Share on other sites
Quote:
 You might want to cache the result of .end() to a const_iterator and compare with it, rather than evaluate the size of your vector couple hundred times ^^;

Quote:
 One thing I noticed while looking at your code is that you post-increment the iterator in your for loop(ex. iter++ ). You should get in the habit of pre-incrementing any non-trivial data tpyes(ex. ++iter). This could speed thing up more.

It's good to get into the habit of doing these thing, but don't expect the code produced to be any faster when compiled with an optimizing compiler.

## Create an account

Register a new account

• ### Forum Statistics

• Total Topics
628306
• Total Posts
2981940

• 9
• 11
• 11
• 11
• 10