• Advertisement

Archived

This topic is now archived and is closed to further replies.

Why is STL so slow?

This topic is 5686 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I was doing some timing tests between a list and my own implementation (for lists only). Why is it that my version is so much faster? It''s not ever worth using STL at all. And I had to take out the comparison on iterator.end() because this slows down STL by a factor of 100. All it does is just compares with NULL and it''s a function call that VC doesn''t make inline. Why do people use STL? It''s certainly not for the speed.
  
  list<int>::iterator i;
  list<int> l(100000);

  list<int>::iterator end = l.end();
  int cnt = 0;
  for (i=l.begin();i!=end;i++,cnt++)
  {
    *i = cnt;
  }
  
If I replace list with my own linked list implementation, it''s so much faster. It''s even worse with deque and vector.

Share this post


Link to post
Share on other sites
Advertisement
One thing to do is use "++i" instead of "i++".
The VC implementation of the STL is rather outdated. If you want a superior, better performing STL implementation, try STLport.

Share this post


Link to post
Share on other sites
Maybe because you're not using it properly :


    
#include <iterator>

#include <algorithm>

#include <list>

using namespace std;

template<class T>
class iota
{
T value;
public:
iota( T v ) : value( v ) {};
inline T operator() { return value++; };
};

list<int> l;
generate_n( back_inserter<int>(l), 100000, iota(0) );


or, using boost's counting iterators :


  
#include <list>

#include <boost/counting_iterator.hpp>

using namespace std;
using namespace boost;

list<int> l( make_counting_iterator(0), make_counting_iterator(100000) );


And you have to make sure that you're using the right container to the right task.

Then I'll ask you : your list implementation may be faster, but is it as convenient ? Is it faster in all possible use cases ? Does it provide the same functionality (e.g. list::sort) ? Is it type-safe ? Is it generic ? And so on, and so forth.

Documents [ GDNet | MSDN | STL | OpenGL | Formats | RTFM | Asking Smart Questions ]
C++ Stuff [ MinGW | Loki | SDL | Boost. | STLport | FLTK | ACCU Recommended Books ]


[edited by - Fruny on June 27, 2002 2:06:33 AM]

Share this post


Link to post
Share on other sites
Fruny: You''re missing the point entirely.

Kippesoep: Thanks. CBuilder 6 uses it. I''ll have to try it out with that.

Share this post


Link to post
Share on other sites

I tested this with MSVC 6. The STL version runs slowering in debug mode by factor of 2.5, but approximately the same in release mode (ran faster by 10, equal, and then slower by 10). As for why to use the STL, even if it's slower. The test case for my own linked list took like 15 minutes to debug, I was incrementing cur twice in the deallocation loop (Given, I am a very poor C programmer.. don't have to manually do much pointer stuff anymore) .


  
#include <list>
#include <ctime>
#include <iostream>

template <class T>
struct ll_node {
ll_node* next;
T data;
};

const int testSize = 1000000;

void testStdList() {
std::list<int> l(testSize);
std::list<int>::iterator end = l.end();
int cnt = 0;

clock_t start = clock();

for (std::list<int>::iterator i=l.begin();i!=end;i++,cnt++)
{
*i = cnt;
}

std::cout << clock() - start << std::endl;

}

void testMyList() {
ll_node<int>* head = new ll_node<int>;
ll_node<int>* cur = head;
int cnt=0;

while (cnt < testSize) {
cur->next = new ll_node<int>;
cur=cur->next;
cnt++;
}
cur->next=NULL;
cnt=0;

clock_t start = clock();

for (cur = head; cur != NULL; cur=cur->next, cnt++) {
cur->data = cnt;
}

std::cout << clock() - start << std::endl;

for (cur = head; cur != NULL; ) {
ll_node<int>* tmp = cur;
cur = cur->next;
delete tmp;
}
}

int main() {
testStdList();
testMyList();

return 0;
}




[edited by - silentstrike on June 27, 2002 2:49:25 AM]

Share this post


Link to post
Share on other sites
I was doing a test on how fast a list is at iterating through all the elements. *i = cnt was just a dummy piece of code so that it would do something and access the item. All you did was re-implement my loop taking into consideration *i = cnt. The assignment (or rather the cnt value) is irrelevant, but the access to the item IS. Your implementation shows no direct access to the item which is what I want to test. Pretend you don''t know what type of item I''m storing and give me access to each item, then you can change my loop. But I probably shouda made clear what I was testing. Sorry.

BTW, I appreciate you taking the time to reply. Thanks.

Share this post


Link to post
Share on other sites
quote:
Original post by Vorlath
Pretend you don''t know what type of item I''m storing and give me access to each item, then you can change my loop.


Ok, no big deal. When you look at the code for the for_each loop, here''s how it is implemented (a bit cleaned up from SGI''s implementation)


  

// Assign is some functor doing your assignment

for_each( l.begin(), l.end(), Assign() );

// for_each. Apply a function to every element of a range.

template <class InIt, class Func>
Func for_each( InIt first, InIt last, Func f)
{
// compile-time concept check

__STL_REQUIRES( InIt, _InputIterator);

for ( ; first != last; ++first)
f(*first);
return f;
}


Note that l.end() will be evaluated only once, just like any other function parameter (i.e. ''last''). And that they use pre-increment.

The actual overhead due to the iterators is implementation-dependent, as Kippesoep pointed-out (and the Dinkum implementation that ships with MSVC is crappy... which is why they Dinkumware sells another version as a ''drop-in'' replacement).

It is in the interest of library providers to provide you with the most efficient library they can produce.

Documents [ GDNet | MSDN | STL | OpenGL | Formats | RTFM | Asking Smart Questions ]
C++ Stuff [ MinGW | Loki | SDL | Boost. | STLport | FLTK | ACCU Recommended Books ]

Share this post


Link to post
Share on other sites
SilentStrike: Sorry, that last message was directed at Fruny.

Your post was exactly what I was looking for. You understood perfectly. Thanks for those results. My results were a bit better, but that might be due to the implementation I''m using. If so, then this is good news. And I know STL is way better than writing your own development time wise. But I''m looking for speed.

I guess I just don''t need all the functionality of STL.
Anyone know of any stipped down version of STL that''s built just for speed?

Share this post


Link to post
Share on other sites
Sorry for posting again, the messages got out of order because I''m too slow at responding and don''t see the new messages.

Fruny: Thanks for the info. That''s good to know. So it all comes down to implementation.

The only reason I started this thread was because MS kept boasting how fast STL was and it was horrible when it first came out, yet people were jumping all over it.

BTW, what''s wrong with this board''s edit function? It goes to a non-existant page when you submit.

Share this post


Link to post
Share on other sites
- usually you're better off using the provided algorithms than rolling out your own loop... that and know which algorithms and containers to use, range assignments & so on

- the MS-SQL database back-end to the forums is flaky (but I already said that ).

Edit: Once you know what's in the STL, look at what Boost has to offer

Documents [ GDNet | MSDN | STL | OpenGL | Formats | RTFM | Asking Smart Questions ]
C++ Stuff [ MinGW | Loki | SDL | Boost. | STLport | FLTK | ACCU Recommended Books ]


[edited by - Fruny on June 27, 2002 3:20:01 AM]

Share this post


Link to post
Share on other sites
If your looking for a linked-list type of data-structure, try looking into the STL vector.

Share this post


Link to post
Share on other sites
quote:
Original post by Laroche
If your looking for a linked-list type of data-structure, try looking into the STL vector.


A vector is not implemented as a linked list, because it needs to guarantee constant-time random access. Read books.

Share this post


Link to post
Share on other sites
I actually like everything about the stl, except for std::string (I found that vc.net''s version had a memory leak)

Share this post


Link to post
Share on other sites
quote:
Original post by daerid
I actually like everything about the stl, except for std::string (I found that vc.net''s version had a memory leak)


Think twice before believing debuggers that report memory leaks in STL. Quite often, what is happening is that the leak tester runs before STL frees the memory, so it can''t detect that the memory would have been freed normally. MS'' STL is bad, but I''m pretty sure it isn''t leaky.

Share this post


Link to post
Share on other sites
quote:

I was doing some timing tests between a list and my own implementation (for lists only). Why is it that my version is so much faster?


Almost always pilot error.
quote:

It''s not ever worth using STL at all.


All generalizations are false.
quote:

And I had to take out the comparison on iterator.end() because this slows down STL by a factor of 100. All it does is just compares with NULL and it''s a function call that VC doesn''t make inline.


As it''s supposed to. What if your loop inserts or removes elements? end can change.

quote:

Why do people use STL? It''s certainly not for the speed.


Really? Squeeze some instructions out of this using your class:

  
// adopted from Fruny''s code, but it''s a for_each so

// it does exactly what the original posted code does--

// fills a list with increasing integers

#include <list>

#include <algorithm>

using namespace std;

template<class T>
class iota
{
T value;
public:
iota( T v ) : value( v ) {}
void operator () ( T& a) { a = value++; }
};

void setIntList (list<int> &l)
{
for_each (l.begin (), l.end (), iota<int> (0));
}
// source listing: VC6.0 SP5

PUBLIC ?setIntList@@YAXAAV?$list@HV?$allocator@H@std@@@std@@@Z ; setIntList
; COMDAT ?setIntList@@YAXAAV?$list@HV?$allocator@H@std@@@std@@@Z
_TEXT SEGMENT
_l$ = 8
?setIntList@@YAXAAV?$list@HV?$allocator@H@std@@@std@@@Z PROC NEAR ; setIntList, COMDAT

; 16 : for_each (l.begin (), l.end (), iota<int> (0));

mov eax, DWORD PTR _l$[esp-4]
xor ecx, ecx
mov edx, DWORD PTR [eax+4]
mov eax, DWORD PTR [edx]
cmp eax, edx
je SHORT $L6270
$L6256:
mov DWORD PTR [eax+8], ecx
mov eax, DWORD PTR [eax]
inc ecx
cmp eax, edx
jne SHORT $L6256
$L6270:

; 17 : }

ret 0
?setIntList@@YAXAAV?$list@HV?$allocator@H@std@@@std@@@Z ENDP ; setIntList
_TEXT ENDS

Share this post


Link to post
Share on other sites
quote:
Original post by Sneftel
[quote]Original post by daerid
I actually like everything about the stl, except for std::string (I found that vc.net''s version had a memory leak)


Think twice before believing debuggers that report memory leaks in STL. Quite often, what is happening is that the leak tester runs before STL frees the memory, so it can''t detect that the memory would have been freed normally. MS'' STL is bad, but I''m pretty sure it isn''t leaky.


:\ It was more than just the debugger, it was the fact that my app (it was a server app) was consistently consuming more and more memory over the course of it''s operation.

Share this post


Link to post
Share on other sites
quote:
Original post by Stoffel
[quote]
I was doing some timing tests between a list and my own implementation (for lists only). Why is it that my version is so much faster?


Almost always pilot error.


Too bad it''s proven that VC''s STL is slow, huh? At least previous implementations. The company I worked for (200+ programmers) had to use their own implementation for certain parts because it was too slow. Not sure what they''re using now though.

quote:
Original post by Stoffel
[quote]
It''s not ever worth using STL at all.


All generalizations are false.


haha funny.


quote:

Why do people use STL? It''s certainly not for the speed.


Really? Squeeze some instructions out of this using your class:


Please read my previous posts. Yes, this example is fast, but I''m looking for access to the items. The fact that the values increment is irrelevant.

Anyhow, I''ve got all the info I need. The current STLPort is way better than what I was using before and is as fast or faster than anything I could write. It''s just VC''s implementation that sucks.

About cstring, I''m not sure about the memory leak, but if it uses dstring, then yeah, there used to bo a memory leak in that class for specific cases. I think it has to do with setting the size and doing catenation afterwards (can''t remember exactly, I''d have to check). But it''s fixed now. That''s really old news though. Like 2 years ago.

Share this post


Link to post
Share on other sites
quote:
Original post by daerid
:\ It was more than just the debugger, it was the fact that my app (it was a server app) was consistently consuming more and more memory over the course of it''s operation.


Sounds like pilot error. I''m unaware of std::string leaks in MSVC 6.0''s STL. We use strings extensively in several large projects. The one criticism I have heard about std::string is that it tends to make many small allocations, which can be a problem for certain memory models; I believe this can be solved with a custom allocator.

Share this post


Link to post
Share on other sites
quote:
Original post by Stoffel
Sounds like pilot error. I''m unaware of std::string leaks in MSVC 6.0''s STL. We use strings extensively in several large projects. The one criticism I have heard about std::string is that it tends to make many small allocations, which can be a problem for certain memory models; I believe this can be solved with a custom allocator.




I second that. STL can be broken, if you aren''t familiar with how it''s supposed to work. Too many beginning and even intermediate programmers think that STL containers are somehow magical.. I''ve seen things like this:

  
vector<int> myVector;
int myArray[20];

...

memcpy(myVector, myArray, sizeof(int) * 20);

One of the great things about STL is that individual implementations tend to be so widely used, any caveats or bugs are quickly identified, even for obscure situations. Which adds up to a monumental level of peer-level support compared to less popular libraries. The moral: before assuming that the STL is buggy, ask around and check your own code.

Share this post


Link to post
Share on other sites
quote:
Original post by Vorlath
Please read my previous posts. Yes, this example is fast, but I''m looking for access to the items. The fact that the values increment is irrelevant.


Are you actually so dense that you don''t realize that the given code iterates the entire list? Exactly what kind of access are you looking for? The code retrieves each node and writes a value into it. This code:
mov DWORD PTR [eax+8], ecx
mov eax, DWORD PTR [eax]
..is jumping through each node via the next pointer. The "item access" is the +8 offset from eax, which is how far the value (int) is from the head of the node.

But please, by all means, eschew STL. Better yet, just become a mindless MS basher. Gives me job security.

Share this post


Link to post
Share on other sites
also note ... the std::list is a CIRCULAR LINKED LIST structure. It can only be compared performance wise with other similar data structures. At least for inserting. If you use a SINGULAR LINKED LIST, then you should get identical read times accros the list, but faster insert / delete times (by a VERY small amount).

If you want access to be REALLY fast, then you need to use the std::vector class, which has atrocious insert / delete times (because it must perform a copy of the subsequent elements), but has the best access times, NOT JUST for random access, but also for sequentail access. Because a linked list MUST dereference the node pointer to find the next node, where a vector just increments pointer. So in a list, ++ followed by *, uses 2 pointer operations, where in a vetor it would be 1 add, and one pointer operation.

A deque gives you the best middle gound if you aren''t yet sure exactly the type of usage pattern you have. It gives you NEAR (amortized) constant time instert and delete, and near constant time access as well (but more operations than vector, and more expensive ++ operations as well).

Good Luck

Share this post


Link to post
Share on other sites
quote:
Original post by Xai
also note ... the std::list is a CIRCULAR LINKED LIST structure. It can only be compared performance wise with other similar data structures. At least for inserting. If you use a SINGULAR LINKED LIST, then you should get identical read times accros the list, but faster insert / delete times (by a VERY small amount).

If you want access to be REALLY fast, then you need to use the std::vector class, which has atrocious insert / delete times (because it must perform a copy of the subsequent elements), but has the best access times, NOT JUST for random access, but also for sequentail access. Because a linked list MUST dereference the node pointer to find the next node, where a vector just increments pointer. So in a list, ++ followed by *, uses 2 pointer operations, where in a vetor it would be 1 add, and one pointer operation.

A deque gives you the best middle gound if you aren''t yet sure exactly the type of usage pattern you have. It gives you NEAR (amortized) constant time instert and delete, and near constant time access as well (but more operations than vector, and more expensive ++ operations as well).

Good Luck


std::list is usually implemented as a bidirectional list, which is what I think you meant. In other words, each node has a pointer both to the next and previous node. It''s definitely not a circular linked list (the last node wraps around to the first). If that were true,
   for(iter = list.begin(); iter != list.end(); iter++);   
would never terminate.

Dereferencing tends to be a fairly fast operation on all major architectures, so in most cases, it doesn''t make a significant speed difference; for inner loops where you need every cycle you can get, tho, you''re right. On the other hand, if I was optimizing that aggressively, I''d probably be working in asm, and not using STL (at least for that function).

Share this post


Link to post
Share on other sites
Using .end is a hard-coded loop is... naive. Either use an stl algorithm (may require you to write a functor), or write the hand-coded loop correctly.

What optimization flags are you using Vorlath?

Performance comparisons are only valid in release builds, because useful libraries have additional debug-time checks. Accessors are generally only inlined in release builds as well.
Also, if you don''t actually do something with the data, it will be optimized out completely in a release build. And don''t printf/cout the data elements in the iteration - they''ll take several orders of magnitude more time to execute than a simple linked-list transversal.


It''s certainly possible to write containers that offer better performance in some aspect than the stl ones. If you only want a singlely linked list, then you''ll want to look for an extention to the library. I think STLport has a single_list container.

Or, if you can make assumptions about the data that a general algorithm cannot, you can make faster code in some cases. This is particularily true for memory management.

I''m afraid though, that you haven''t shown, much less proven, that the stl is unacceptably slow. You''ve only demonstrated that you need more practice profiling, and need to learn more about the standard library (if you want to use it effectively anyway).

Magmai Kai Holmlor

"Oh, like you''ve never written buggy code" - Lee

[Look for information | GDNet Start Here | GDNet Search Tool | GDNet FAQ | MSDN RTF[L] | SGI STL Docs | STFW | Asking Smart Questions ]

[Free C++ Libraries | Boost | ACE | Loki | MTL | Blitz++ | wxWindows| Spirit(xBNF)]

Share this post


Link to post
Share on other sites

  • Advertisement