Sign in to follow this  
Arkion

std::vector operator [] is evil?

Recommended Posts

Arkion    122
Hello all I've made an interesting discovery while optimizing the performance of my 3D game app. Using operator [] to index STL vectors in performance-critical loops seems to seriously hurt framerate. Multiple nested index operators worsen it even more, eg:
something = verts[tris[i].vindex[j]];
where verts and tris are STL vectors. Has anyone else experienced this? When I switched to either using iterators in loops or just convetional C arrays in place of std::vectors, framerate rise from 63 FPS to whopping 122 FPS. [rolleyes] I'm using Visual Studio.NET 2002 and Dinkumware STL implementation bundled with it.

Share this post


Link to post
Share on other sites
Rattrap    3385
I'm not always someone who defends the STL, but I'd say the the implementation of the STL you are using probably does play some factor in there. Plus there is going to be some small loss of performance just do to the bounds checking that the vector class does (which does make it safer to use than your typical c/c++ style array).

Share this post


Link to post
Share on other sites
Sneftel    1788
All the STL implementations I've come across do not do bounds-checking on std::vector::operator[] by default.

Arkion, good performance for std::vector depends on the compiler correctly inlining function calls. Are you compiling in Debug mode? If so, then you shouldn't trust ANYTHING about speed differences. Only draw conclusions based on Release mode builds, with full optimization.

Share this post


Link to post
Share on other sites
Daniel Miller    218
Operator[] usually doesn't bounds check; I believe a function called at() does. I don't know the actual name, because it has been a while since I used C++ and its STL.

Share this post


Link to post
Share on other sites
JohnBolton    1372
I wrote quick program to see the difference between vector and array access and the only difference of note were these two lines. Other than these two lines, the generated assembly was the same for both.
00401247  mov         esi,dword ptr [v+4 (411490h)] 
0040124D add edi,offset a (409380h)

Using a vector requires one additional indirection.

Here is my source:

static vector< s > v( size );
static s a[ size ];

for ( int i = 0; i < size; ++i )
{
int j = rand();
int k = rand();
a[j] = v[k];
}




I can't see how changing from vector to array would double your frame rate, unless there is something really unique about your code. Along with Sneftel, I suspect you are profiling your debug configuration. It's a common mistake.

Share this post


Link to post
Share on other sites
If you're using MS VisualStudio then remember it's also affected by what mode you're running. A friend profiled std::string and std::vector and the results are much faster in Release mode. Debug mode has a load of extra faff for debugging information and error checking.

/Edit: misread post - argh. Duly corrected.

Share this post


Link to post
Share on other sites
Halsafar    205
The STL in a release compile is fast. I mean faster than anything I've recreated.

My homebrew array class' in debug compile are way faster than VECTOR.
In release mode VECTOR wins, it unrolls a lot into asm I believe which is more than any hobbiest wants to do.

I can send you performance tests and a array class.

Share this post


Link to post
Share on other sites
JY    289
I had a similar problem to this using maps. I may be wrong on this but I think:

When you use the [] operator the element in the vector gets copied. So if you have a vector of vectors, the inner vector will get copied when you use the first [] operator. This would account for the performance hit, and certainly did with my map of maps.

Like I say I could be wrong but I seem to remember reading somewhere that this is the case.

Share this post


Link to post
Share on other sites
snk_kid    1312
Quote:
Original post by JY
When you use the [] operator the element in the vector gets copied. So if you have a vector of vectors, the inner vector will get copied when you use the first [] operator. This would account for the performance hit, and certainly did with my map of maps.

Like I say I could be wrong but I seem to remember reading somewhere that this is the case.


Yes you are completely wrong, both overloaded subscript operators for vector & (unordered_)map return constant and non-constant references depending on the context. If you don't declare your variables as of reference type then a copy will be made but this is exactly the same case for C-style arrays [headshake].

Quote:
Original post by Arkion
I'm using Visual Studio.NET 2002 and Dinkumware STL implementation bundled with it.



I also suggest you update your compiler as VC++ 7.0 is almost as bad as VC++ 6.0 which is significant, if you want an MS compiler get VC++ 7.1 > *.

Share this post


Link to post
Share on other sites
Arkion    122
I had tried profiling the Release build back when the STL vector usage was still unoptimized (63 FPS in Debug), and the Release build resulted in framerate of 75 FPS.

I just tried the Release build again as many of you suggested, now with the optimized code, and it resulted in framerate of around 200 FPS! I can tell you, only optimizations I have done were that of STL vector usage. Summary of the results:

Before: Debug 63 FPS/Release 75 FPS
After: Debug 133 FPS/Release 200 FPS

This makes me draw a conclusion that Debug build can be indeed used for preliminary profiling. Because if FPS goes up in Debug mode, it will also go up in Release mode (though not necessarily in same proportion, as seen in the results above).

Just in case you are interested, here's a concrete example of optimized performance-critical code that bumped the FPS from 122 to 133 FPS in Debug mode. BSP tree for the test map has 172 I-nodes and 173 leaves (well balanced), making this code called 345 times per frame.

tree->m_nodes is a std::vector<BSPNode>.

Before:

void Renderer::RecursiveRender(BSPTree *tree, int node)
{
if(tree->m_nodes[node].m_isLeaf)
{
std::vector<Surface_t>::iterator iter;

for(iter = tree->m_nodes[node].m_surfaces.begin(); iter != tree->m_nodes[node].m_surfaces.end(); iter++)
{
glBindTexture(GL_TEXTURE_2D, iter->mat->diffMap);
glDrawArrays(GL_TRIANGLES, iter->firstVert, iter->count);
}

return;
}

RecursiveRender(tree, tree->m_nodes[node].m_iFrontChild);
RecursiveRender(tree, tree->m_nodes[node].m_iBackChild);
}




After:

void Renderer::RecursiveRender(BSPTree *tree, int node)
{
BSPNode *pNode = &tree->m_nodes[node]; // Store pointer to the node and use it - much faster

if(pNode->m_isLeaf)
{
std::vector<Surface_t>::iterator iter;

for(iter = pNode->m_surfaces.begin(); iter != pNode->m_surfaces.end(); iter++)
{
glBindTexture(GL_TEXTURE_2D, iter->mat->diffMap);
glDrawArrays(GL_TRIANGLES, iter->firstVert, iter->count);
}

return;
}

RecursiveRender(tree, pNode->m_iFrontChild);
RecursiveRender(tree, pNode->m_iBackChild);
}


Share this post


Link to post
Share on other sites
snk_kid    1312
if you missed it the first time i'll say it again.

Quote:
Original post by snk_kid
Quote:
Original post by Arkion
I'm using Visual Studio.NET 2002 and Dinkumware STL implementation bundled with it.



I also suggest you update your compiler as VC++ 7.0 is almost as bad as VC++ 6.0 which is significant, if you want an MS compiler get VC++ 7.1 > *.


Share this post


Link to post
Share on other sites
Teknofreek    331
Quote:
Original post by Arkion
Just in case you are interested, here's a concrete example of optimized performance-critical code that bumped the FPS from 122 to 133 FPS in Debug mode


One thing I noticed while looking at your code is that you post-increment the iterator in your for loop(ex. iter++ ). You should get in the habit of pre-incrementing any non-trivial data tpyes(ex. ++iter). This could speed thing up more.

-John

Share this post


Link to post
Share on other sites
snk_kid    1312
Quote:
Original post by Teknofreek
Quote:
Original post by Arkion
Just in case you are interested, here's a concrete example of optimized performance-critical code that bumped the FPS from 122 to 133 FPS in Debug mode


One thing I noticed while looking at your code is that you post-increment the iterator in your for loop(ex. iter++ ). You should get in the habit of pre-incrementing any non-trivial data tpyes(ex. ++iter). This could speed thing up more.


If we are going doing that road then i'll add to that [grin], constant-correctness (constant members functions, use a constant iterator etc as your only rendering) & use generic algorithms over explicit loop code.

Share this post


Link to post
Share on other sites
tolaris    288
Quote:
Original post by Arkion
After:

for(iter = pNode->m_surfaces.begin(); iter != pNode->m_surfaces.end(); iter++)

You might want to cache the result of .end() to a const_iterator and compare with it, rather than evaluate the size of your vector couple hundred times ^^;

Share this post


Link to post
Share on other sites
helix    301
For what it's worth, I was a big fan of using []'s just because I was so familiar with the syntax thanks to c++'s for loops and it was a lot less to type than iterators. But then I was chastised for doing so because it was a lot slower. So now I always use the iterative method. I don't know that I've noticed any difference but I think there are a number of other things that hurt my performance even more.

But using the []'s in your performance critical loops sounds like a very bad thing and I'm not surprised in the least. I imagine that operator was being called extremely often each frame.

I attended a lecture at a past GDC about optomizing code and the speaker spent at least half of the lecture talking about how bad STL is for performance and how something simple like what you did (switching []'s to iterators) will make a huge improvement.

Share this post


Link to post
Share on other sites
MaulingMonkey    1730
Quote:
Original post by helix
But using the []'s in your performance critical loops sounds like a very bad thing


It's not. It is my sad duty to inform you that you should use release mode. Your compiler, if released within the past few years, now includes what's known as an "optimizer" which will eliminate the overhead for you, eliminating countless excuses to get down and jiggy with pointers and manual allocations. It's like depriving a pig of mud I tell you!!!

Quote:
and I'm not surprised in the least. I imagine that operator was being called extremely often each frame.


In debug mode, yes, so you could in theory put a breakpoint in std::vector::operator[]. However, in release mode, "calling" is a misdominer - the assembly will not resemble anything like "calling", rather, it will produce the same assembly as something like this:

int * pie = new int[4238];
int i = pie[34];

The second line which, needless to say, involves absolutely no functions. This is due to optimizations such as automatic inlining.

Quote:
I attended a lecture at a past GDC about optomizing code and the speaker spent at least half of the lecture talking about how bad STL is for performance and how something simple like what you did (switching []'s to iterators) will make a huge improvement.


How many years ago?

Recent compilers will optimize both out the same in many/most circumstances. There's still reasons to prefer iterators when it makes sense, but speed in most cases isn't one if them.

Advice gets old quick in this industry. 10 years ago in C you could write <<2 instead of *4 and see a preformance boost in your innermost loops. Today? Your compiler automatically preforms this and many many other optimizations as needed, automatically (if allowed by setting release mode/-O3/your compiler's equivilant of "optimize the heck outta my code" is), free of charge.

Share this post


Link to post
Share on other sites
zedzeek    529
just wanna point out that this
glBindTexture(GL_TEXTURE_2D, iter->mat->diffMap);
or even (less so) this
glDrawArrays(GL_TRIANGLES, iter->firstVert, iter->count);

are gonna take (roughly 20billion times, slight exageration) as long as how youre doing the loop
ie
youre barking up the wrong tree

Share this post


Link to post
Share on other sites
JY    289
Quote:
Original post by snk_kid
If you don't declare your variables as of reference type then a copy will be made...


But this isn't true for vectors?

Share this post


Link to post
Share on other sites
Guest Anonymous Poster   
Guest Anonymous Poster
Quote:
Original post by JY
Quote:
Original post by snk_kid
If you don't declare your variables as of reference type then a copy will be made...


But this isn't true for vectors?


[rolleyes] what exactly do you mean? as i said before both of vector/map's overloaded subscript operators return constant and non constant reference to there elements, this is exactly the same semantics as C-style arrays in C++. Do you understand what l-values & r-values are?

When you access an element of an array of T in the context of an r-value the type is const T&.
When you access an element of an array of T in the context of an l-value the type is T& or const T& if the array has been declared constant.

If that still doesn't make sense take this example:


foo bar[N]; // a C-style array

foo foo_val = bar[0]; // a copy is made of the first element

const foo& const_foo_ref = bar[0]; // refers to the first element, no copy its the original.

foo& foo_ref = bar[0]; // refers to the first element, no copy its the original.


And the same holds for vector/map.

Share this post


Link to post
Share on other sites
snk_kid    1312
Quote:
Original post by Anonymous Poster
Quote:
Original post by JY
Quote:
Original post by snk_kid
If you don't declare your variables as of reference type then a copy will be made...


But this isn't true for vectors?


[rolleyes] what exactly do you mean? as i said before both of vector/map's overloaded subscript operators return constant and non constant reference to there elements, this is exactly the same semantics as C-style arrays in C++. Do you understand what l-values & r-values are?

When you access an element of an array of T in the context of an r-value the type is const T&.
When you access an element of an array of T in the context of an l-value the type is T& or const T& if the array has been declared constant.

If that still doesn't make sense take this example:


foo bar[N]; // a C-style array

foo foo_val = bar[0]; // a copy is made of the first element

const foo& const_foo_ref = bar[0]; // refers to the first element, no copy its the original.

foo& foo_ref = bar[0]; // refers to the first element, no copy its the original.


And the same holds for vector/map.



That was me, i was logged out for unknown reasons again [headshake].

Anyways those people who keep saying vector's subcript operator and vector itself is slow are talking rubbish:


  • vector's subscript operator does not check bounds (except maybe in debug builds). If you want bounds checking use vector's at method.

  • On modern C++ compilers with good standard compliance, with optimizations on vector is equivalent to a C-style dynamic in efficiency, infact in typical implementations of vector, when optimizations are on it will compile out to a cleverly used/manipulated C-style dynamic array.



[Edited by - snk_kid on June 30, 2005 5:47:27 AM]

Share this post


Link to post
Share on other sites
ZeRaW    384
hmm but most times we use [] operator we r using it cuz we need a random access iterator and not to loop through all the list using a normal iterator, this is where [] is very useful. so in new compilers it can be as useful as begin()+i

Share this post


Link to post
Share on other sites
helix    301
Quote:
Original post by tolaris
You might want to cache the result of .end() to a const_iterator and compare with it, rather than evaluate the size of your vector couple hundred times ^^;


That's a good idea. I never thought about doing that.

Share this post


Link to post
Share on other sites
Nitage    1107
Quote:

You might want to cache the result of .end() to a const_iterator and compare with it, rather than evaluate the size of your vector couple hundred times ^^;


Quote:

One thing I noticed while looking at your code is that you post-increment the iterator in your for loop(ex. iter++ ). You should get in the habit of pre-incrementing any non-trivial data tpyes(ex. ++iter). This could speed thing up more.


It's good to get into the habit of doing these thing, but don't expect the code produced to be any faster when compiled with an optimizing compiler.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this