STL Saves or Enslaves?

Quote:Original post by Anonymous Poster
I really don't see how it is apples to oranges... how is it not real-world? If I am constantly changing values in a fixed-size list that I index into, vector is TWICE as slow. It's much more real-world than most examples STL proponents use, where they assign const values to the vector. That's something that would just never happen in code.

The original poster asked what sort of an impact STL has. I gave one solid example. Yes, it has helpful things, but at the expense of SPEED. I'm not saying STL isn't useful... I'm just saying it's SLOWER.

STL's dependence on iterators adds overhead in the form of function calls and reduced code cache coherency. I chose the vector because it was an easy, reproducible example.


How about you produce a resizable array that beats std::vector? An array of known size would be more readily compared to boost::array or something like that.

[Edit]

For example, my own array class (yes, written this minute):
template< class Type, int size >
class Array
{
    Type array[size];
public:
    Type &operator[]( int i )
    {
        return array[i];
    }
};


The test (only modified to include my class):
#define NUMELEMENTS 20000
#define NUMLOOPS    10000

int i, j, currloop;
LARGE_INTEGER start_count, end_count, final_array_count, final_vector_count;

//Start Array<> benchmark
printf("Profiling Array: ");
QueryPerformanceCounter(&start_count);
Array<char,NUMELEMENTS> testvector; //Allocating Array once
for(currloop = 0; currloop < NUMLOOPS; currloop++)
{
  for(i=0;i<NUMELEMENTS;i++)
  {
    testvector[i] = (char)(i%256);
  }
  for(i=0;i<NUMELEMENTS;i++)
  {
    j = testvector[i];
  }
}
QueryPerformanceCounter(&end_count);
final_vector_count.QuadPart = end_count.QuadPart - start_count.QuadPart;
printf("%I64d\n",final_vector_count.QuadPart);

//Start raw array benchmark
printf("Profiling array: ");
QueryPerformanceCounter(&start_count);
char testarray[NUMELEMENTS]; //Allocating array once
for(currloop = 0; currloop < NUMLOOPS; currloop++)
{
  for(i=0;i<NUMELEMENTS;i++)
  {
    testarray[i] = (char)(i%256);
  }
  for(i=0;i<NUMELEMENTS;i++)
  {
    j = testarray[i];
  }
}
QueryPerformanceCounter(&end_count);
final_array_count.QuadPart = end_count.QuadPart - start_count.QuadPart;
printf("%I64d\n",final_array_count.QuadPart);

printf("arrays are %I64d times faster than Arrays\n",final_vector_count.QuadPart/final_array_count.QuadPart);


Output:
Profiling Array: 1541215
Profiling array: 1554889
arrays are 0 times faster than Arrays
Profiling Array: 1559871
Profiling array: 1557985
arrays are 1 times faster than Arrays
Profiling Array: 1595685
Profiling array: 1574723
arrays are 1 times faster than Arrays
Profiling Array: 1582116
Profiling array: 1569467
arrays are 1 times faster than Arrays
Profiling Array: 1541563
Profiling array: 1542605
arrays are 0 times faster than Arrays
Profiling Array: 1568356
Profiling array: 1538190
arrays are 1 times faster than Arrays
Profiling Array: 1537563
Profiling array: 1548769
arrays are 0 times faster than Arrays
Profiling Array: 1553744
Profiling array: 1534552
arrays are 1 times faster than Arrays
Profiling Array: 1564541
Profiling array: 1578182
arrays are 0 times faster than Arrays
Profiling Array: 1544573
Profiling array: 1585902
arrays are 0 times faster than Arrays
Profiling Array: 1719074
Profiling array: 1580499
arrays are 1 times faster than Arrays

[/Edit]

[Edit2]

Running your original test 5 times:

Profiling std::vector: 2339557
Profiling array: 1927296
Arrays are 1 times faster than std::vector
Profiling std::vector: 2324930
Profiling array: 1928806
Arrays are 1 times faster than std::vector
Profiling std::vector: 2370096
Profiling array: 1949948
Arrays are 1 times faster than std::vector
Profiling std::vector: 2315778
Profiling array: 1951442
Arrays are 1 times faster than std::vector
Profiling std::vector: 2296300
Profiling array: 1895905
Arrays are 1 times faster than std::vector



Not by as much as yours...

Managed to only use -O2, not -O3 [embarrass]. With -O3:
Profiling std::vector: 2366109
Profiling array: 2425419
Arrays are 0 times faster than std::vector
Profiling std::vector: 2402442
Profiling array: 2385358
Arrays are 1 times faster than std::vector
Profiling std::vector: 2333141
Profiling array: 2345274
Arrays are 0 times faster than std::vector
Profiling std::vector: 2310996
Profiling array: 2349740
Arrays are 0 times faster than std::vector
Profiling std::vector: 2315365
Profiling array: 2346695
Arrays are 0 times faster than std::vector
Profiling std::vector: 2416623
Profiling array: 2416476
Arrays are 1 times faster than std::vector
Profiling std::vector: 2384026
Profiling array: 2439520
Arrays are 0 times faster than std::vector
Profiling std::vector: 2476950
Profiling array: 2382431
Arrays are 1 times faster than std::vector
Profiling std::vector: 2417932
Profiling array: 2398543
Arrays are 1 times faster than std::vector
Profiling std::vector: 2398195
Profiling array: 2410084
Arrays are 0 times faster than std::vector


And so on...

[/Edit2]

[Edited by - rip-off on April 20, 2006 1:21:36 PM]
Quote:Original post by Anonymous Poster
Yes, that function retrieves data, but if you want to assign values to your array, it calls this function, which sets up an iterator and does all sorts of function calls and other comparatively slow things.

reference operator[](size_type _Pos)
{ // subscript mutable sequence
return (*(begin() + _Pos));
}


I think that would still be optimised away since begin() just returns a pointer. Unless the optimisations aren't switched on, in which case the test doesn't make any sense.
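
For what it's worth, here is a minimal sketch (my own, not taken from any particular STL implementation) of why the wrapper tends to cost nothing in an optimised build: once begin() and operator[] are inlined, both functions below reduce to the same indexed load.

#include <cstddef>
#include <vector>

// Sketch only: illustrates what an optimising compiler is free to reduce the
// wrapped subscript to. Actual codegen depends on the compiler and STL.
char via_vector(const std::vector<char>& v, std::size_t i)
{
    return v[i];    // typically *(v.begin() + i) after inlining: one load from base + i
}

char via_array(const char* a, std::size_t i)
{
    return a[i];    // also one load from base + i
}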

Bojan
Galactic Marbles: www.scientiagames.com
And I'm saying, "of course it's slower." A reasonably implemented std::vector (which is of course the only kind worth considering) will use an array underneath. You're comparing raw array accesses to array accesses wrapped in a function call. Of course it's slower.

Plus, it's still not a complete analysis. What happens if the vector needs to resize? You haven't addressed the relative operations in that case, and this is a key issue. And if the vector/array can be guaranteed to be of fixed size and never need to change (in which case the resize semantics wouldn't matter), perhaps a vector is a poor choice of container in the first place (arguably you could simply use a raw array). In that case the error would lie with the user for choosing an inappropriate container (the same kind of user error as choosing a std::map for indexing something by integers and doing frequent linear traversals).
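
For completeness, a small sketch of how the resize concern is usually handled when a vector really is the right container and the size is known in advance (standard std::vector::reserve, nothing exotic):

#include <cstddef>
#include <vector>

// Sketch: reserving capacity up front means the push_back loop never triggers
// a reallocation, so the "what if it has to resize" cost stays off the hot path.
std::vector<int> make_values(std::size_t count)
{
    std::vector<int> values;
    values.reserve(count);                      // one allocation
    for (std::size_t i = 0; i < count; ++i)
        values.push_back(static_cast<int>(i));  // no reallocation inside the loop
    return values;
}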

I am not convinced that your example is valid or addressing a serious deficiency with the STL.

Quote:
STL's dependance on iterators adds overhead in the form of function calls, and reduced code cache coherency. I chose the vector, because it was an easy, reproducable, example.


Iterators are part of what gives the STL its power. They also contribute greatly to the "ease the programmer's burden" aspect (which is very, very relevant as it directly affects time-to-market). If the minimal performance overhead they may have is actually a life-threatening issue, perhaps you are (once again) using the wrong tool for the job; without a valid context for your example (which didn't use iterators anyway), it's hard to tell.
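
To make the point concrete, a quick sketch (my own, not from the earlier benchmark): because the algorithm is written against iterators, the same code works on a vector, a list, or a raw array without change.

#include <algorithm>
#include <list>
#include <vector>

// Sketch: std::find only needs input iterators, so one generic helper covers
// every container (and raw arrays, whose pointers are valid iterators).
template <typename Iterator, typename T>
bool contains(Iterator first, Iterator last, const T& value)
{
    return std::find(first, last, value) != last;
}

void demo()
{
    int raw[] = { 7, 8, 9 };
    std::vector<int> vec(raw, raw + 3);
    std::list<int>   lst(raw, raw + 3);

    contains(vec.begin(), vec.end(), 8);  // vector
    contains(lst.begin(), lst.end(), 8);  // list
    contains(raw, raw + 3, 8);            // raw array
}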

(The "right tool for the job" doesn't neccessarily mean another STL class either)
Quote:Original post by const
Well, I don't want to say that my own reinvented wheel is a better one; my point is that usually I don't need all the complexity that STL brings. I have to think about a memory allocation strategy when my task is simply iterating over list members, for example.


However, your argument is largely pointless.

1. The complexity of STL that you do not use does not get compiled into your app, since you are not using it. Templated functions that are not used do not get instantiated and linked.
2. The memory allocator STL uses is a template argument, and can therefore be replaced (see the sketch after this list).
3. If you *really* need to replace STL with your own containers, you can replace the templates and switch over without touching any previously written code. So what's the problem?
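
To illustrate point 2, here is a minimal sketch of a conforming (C++03-style) allocator. This one just forwards to operator new/delete, but the same shell is where a pool or frame allocator would plug in; code using the container never changes.

#include <cstddef>
#include <new>
#include <vector>

// Sketch of a standard-conforming allocator. Swap the bodies of allocate /
// deallocate for a pool, stack or frame allocator as needed.
template <typename T>
class my_allocator
{
public:
    typedef T              value_type;
    typedef T*             pointer;
    typedef const T*       const_pointer;
    typedef T&             reference;
    typedef const T&       const_reference;
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;

    template <typename U> struct rebind { typedef my_allocator<U> other; };

    my_allocator() {}
    template <typename U> my_allocator(const my_allocator<U>&) {}

    pointer       address(reference x) const       { return &x; }
    const_pointer address(const_reference x) const { return &x; }

    pointer allocate(size_type n, const void* = 0)
    {
        return static_cast<pointer>(::operator new(n * sizeof(T)));
    }
    void deallocate(pointer p, size_type) { ::operator delete(p); }

    void construct(pointer p, const T& v) { new (p) T(v); }
    void destroy(pointer p)               { p->~T(); }

    size_type max_size() const { return size_type(-1) / sizeof(T); }
};

template <typename T, typename U>
bool operator==(const my_allocator<T>&, const my_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const my_allocator<T>&, const my_allocator<U>&) { return false; }

// The allocator is just the second template argument; calling code is unchanged.
typedef std::vector<int, my_allocator<int> > MyIntVector;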

Quote:
All I want to say is that STL has no place in production in-game code (especially the main loop), because this is a place where we need to optimize almost everything to get maximum fps.


Optimisation is obviously needed, but refusing to use STL is not an optimisation technique. It's pretty much akin to saying C is better than C++ because it has less bloat - a completely pointless and irrelevant argument. The reality is that most large assets are going to be block-loaded from disk into a single block of memory, so most of your runtime data is NOT going to use STL. The places where STL is useful are those areas where dynamic allocation IS needed (number of characters in a scene, number of game objects, etc.). With a proper allocator backing up STL, you really have no argument as to why this usage of STL is suboptimal.

Quote:There is absolutely no problem with STL usage in tools development or level-loading code.


LEVEL LOADING CODE?? Why on earth are you dynamically allocating level data using STL? What the hell is wrong with a single fread call like everyone else uses? Your level loading times must be through the roof! I don't think STL is the cause of your problems here; I'd suggest a complete overhaul of the way you are handling your game assets.
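
For reference, a rough sketch of the single-read style being described (C stdio only; the path and file layout are hypothetical and error handling is kept minimal):

#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Sketch: pull the whole level file into one contiguous block with a single
// read; any pointer fix-up into that block happens afterwards.
char* load_level_blob(const char* path, std::size_t* out_size)
{
    std::FILE* f = std::fopen(path, "rb");
    if (!f)
        return 0;

    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);

    char* data = static_cast<char*>(std::malloc(size));
    if (data && std::fread(data, 1, size, f) != static_cast<std::size_t>(size))
    {
        std::free(data);
        data = 0;
    }
    std::fclose(f);

    if (out_size)
        *out_size = static_cast<std::size_t>(size);
    return data;
}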

Quote:And one more issue: I'm a console developer, and there aren't many resources to spend on template instances (the generated code grows too large, consuming more memory) and other things.


I too am a console developer, and I have to say you need to learn more about templates and how to use them correctly. OK, stupid use of templates is to be avoided, but in a large number of places templates will REMOVE code, not bloat it.

Quote:How about cache coherency in other collections, or trees, or other more common data structures? You say I can write my own allocator for the STL types, but if I go through the trouble of doing that, wouldn't it be easier just to write my own container that did what I wanted it to do in the first place?


Replace the templates when this becomes a problem.

Quote:Screaming "templates cause bloat", "STL's memory allocation sucks" or "I have experience, take my word for it" is neither convincing nor helpful. Instead, give us a way to judge your claims, demonstrate your expertise and help us improve our code.


Hear, hear! My translations of those sorts of comments would be roughly along the following lines:

"templates cause bloat" -> ignorance, they only do when you don't know how to write them properly.
"STL's memory allocation sucks" -> It sucks as much as the allocator you give it, if you have TOLD it to use free and malloc, then you really do need your head examining.
"I have experience, take my word for it" -> I am a troll.


The only thing I see wrong with templates (in general) is the more complex meta-programming stuff. Programming of that sort is best avoided IMO, since it's normally near impossible to figure out what the templates are doing later on - so for readability and maintainability, meta-programming is not a good technique. (I do find template meta-programming a good mental exercise, but I tend to leave it at that.)
Quote:Original post by Anonymous Poster
I really don't see how it is apples to oranges... how is it not real-world? If I am constantly changing values in a fixed-size list that I index into, vector is TWICE as slow. It's much more real-world than most examples STL proponents use, where they assign const values to the vector. That's something that would just never happen in code.


It's not real-world because of the way you profiled it. The reality is that there is no difference in access speed between an STL vector and an array. Period. Your benchmark demonstrates that you lack understanding, but it does not demonstrate anything near a real-world case. In debug builds STL will be slower, granted. But who releases their final app as a debug build?

Quote:The original poster asked what sort of an impact STL has. I gave one solid example. Yes, it has helpful things, but at the expense of SPEED. I'm not saying STL isn't useful... I'm just saying it's SLOWER.


And I'm saying you are very wrong.

Quote:STL's dependence on iterators adds overhead in the form of function calls and reduced code cache coherency. I chose the vector because it was an easy, reproducible example.


No, it does nothing of the sort. Templated inline functions compile down to a single pointer dereference. Get over it. Profiling STL normally makes it appear that STL is slower; the reality is that profilers have to do more work when profiling templates...
Quote:Original post by Anonymous Poster
I really don't see how it is apples to oranges... how is it not real-world?


Well, you've already answered yourself in one place:

Quote:In debug mode ...


Plus, you're benchmarking two different things:

Raw arrays : access
std::vector: access and allocation and whatever extra runtime checks the STL implementors felt like throwing in to help with debugging

Which is hardly a direct comparison of raw array access versus std::vector access.

One final point: if we're "profiling" debug mode, the appropriate benchmark is not code speed but debugging time. While this will vary per STL implementation, bounds checking (usually hidden inside iterator arithmetic) is quite often on the list, which is an immediate win over raw arrays.
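
Relatedly, std::vector also has explicitly checked access through at(), independent of whatever debug-mode checking an implementation bolts onto operator[] and iterators. A small sketch:

#include <cstdio>
#include <stdexcept>
#include <vector>

// Sketch: at() is bounds-checked in every build configuration and throws
// std::out_of_range; the equivalent out-of-bounds raw-array write is simply
// undefined behaviour.
void checked_access_demo()
{
    std::vector<int> v(10, 0);
    try
    {
        v.at(42) = 1;   // throws instead of silently corrupting memory
    }
    catch (const std::out_of_range&)
    {
        std::printf("caught out-of-range access\n");
    }
}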

As for release mode, they're going to perform the same on a decent compiler. If they don't, I would blame and upgrade the compiler rather than blame the STL and roll a square wheel, if possible.
My debug metrics were an aside... I did not do the original test in debug mode. Go back and re-read the post.

The test with the result that arrays were two times faster than vectors was in a release build with full optimisation. (Reading the post before you reply would be greatly appreciated.)

How do you suggest I profile it to make it real-world? I checked an equal number of reads and writes, each data type was allocated once, and I made sure to run the test from an optimized compile.

I am willing to concede that vector is not the appropriate type replacement for a static array. But many times in these forums people suggest vectors as an alternative to static arrays and state that they are just as fast. Read access into a vector is probably just as fast (negligibly slower due to function calls). Writing into a vector, however, is not.

You asked for an example where the STL was slower. I gave one. You asked me to show where it was slower. I pointed to the function calls and the iterator construction in the non-const operator[] function.

Now you tell me my test is not real-world. The ball is in your court. What would you like to see in a test that is real-world? It has to be small enough for me to post... don't say "use it in a game"; we need to be objective.

I'm willing to do the benchmark footwork; I just need a test that you guys will be happy with.







What compiler and STL are you using, rip-off? It's really interesting that there is such a huge performance difference between our tests...
Quote:Original post by Anonymous Poster
What compiler and STL are you using, rip-off? It's really interesting that there is such a huge performance difference between our tests...


GCC, version 3.4.2 (MinGW).

Using the STL that comes with it.
This is one of the reasons I shy away from STL use too... different implementations have wildly different performance characteristics.

GCC seems to have a better implementation than VC .NET does. (Why doesn't this surprise me? ;))

I'm starting to feel like a bit of a troll here, and that's not my intent. My concerns are real concerns, and are the reason I have stuck to my own types in my code.

I'm not even as bad as some others I've worked with who shun the use of ANY virtual functions in C++. At least I'm not that bad :)

I would honestly like to know, once and for all, the performance characteristics of the STL. I think it would be a good exercise to benchmark them. Right now all I hear is people saying STL is fast, but there is no data to back up this assumption. I think many of you probably take it for granted because you hear it and think it must be true. I may be completely wrong in my views of the STL, but in the name of computer science, we should verify this.

We should build a profiling test, and a set of test parameters, and challenge people to write code that outperforms the STL in those areas.
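
As a rough starting point, here is a sketch of what one entry in such a harness could look like (Windows-only timing, matching the tests earlier in the thread; the container just has to provide size() and operator[], and the loop counts are arbitrary):

#include <cstdio>
#include <vector>
#include <windows.h>

// Sketch of a reusable timing helper: time the same fill-and-read operation
// against any container-like type, always in an optimised release build.
template <typename Container>
__int64 time_fill_and_read(Container& c, int loops)
{
    LARGE_INTEGER start, end;
    volatile int sink = 0;                    // stops the reads being optimised away
    QueryPerformanceCounter(&start);
    for (int loop = 0; loop < loops; ++loop)
    {
        for (int i = 0; i < (int)c.size(); ++i)
            c[i] = (char)(i % 256);
        for (int i = 0; i < (int)c.size(); ++i)
            sink += c[i];
    }
    QueryPerformanceCounter(&end);
    return end.QuadPart - start.QuadPart;
}

int main()
{
    std::vector<char> vec(20000);
    std::printf("std::vector: %I64d\n", time_fill_and_read(vec, 10000));
    // Other containers (homebrew or otherwise) would be timed the same way,
    // provided they expose size() and operator[].
    return 0;
}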

Either no one will succeed and we can crown the STL king, or people will succeed and we'll start to see how different STL implementations compare to each other, get concrete proof of where the STL actually sits and how it can be improved, and learn in which scenarios a homebrew data type is more appropriate.

What do you guys think? It's better than name-calling and finger-pointing. And those of us who consider ourselves computer scientists need some scientific method in our lives ;)

