# Comparison of STL list performance

## Recommended Posts

polyfrag    2503
Why is the STL list so much faster in every way than a custom list implementation in MSVS 2012, over thousands of iterations? What tricks can be used in C90 to make a custom list as fast? Is a significant boost in performance possible with C90?

##### Share on other sites
Ravyne    14300

I'm confused by the question. What does C90 have to do with STL (which is a C++ thing)?

And what STL list are you talking about that's different from MSVS 2012's list implementation? (That's an STL list too, but maybe different from what you're comparing to.)

You'll have to give more details if you want an answer.

##### Share on other sites
polyfrag    2503
I meant a custom linked list implementation in C90. I'm considering switching to C90 if I can get better performance in all cases. I'll post the comparison code when I get to my computer.

##### Share on other sites
KnolanCross    1974

Can you describe the test? I have a custom C list implementation and would like to compare.

##### Share on other sites
polyfrag    2503

https://github.com/polyf/cppvsc

https://github.com/polyf/cppvsc/blob/master/2c/test/test.c

https://github.com/polyf/cppvsc/blob/master/2cpp/test/test.cpp

Compares milliseconds elapsed pushing, iterating and calling, and popping between the two implementations.

Edited by polyfrag

##### Share on other sites
phil_t    8084

##### Share on other sites
polyfrag    2503

The inaccuracy in GetTickCount is supposedly only ~30 ms and I'm measuring the results of thousands of iterations once, not each iteration individually, which gives me consistent numbers. So it seems accurate.

https://github.com/polyf/cppvsc/blob/master/results.txt

##### Share on other sites
Erik Rufelt    5901

Investigate the ASM output to find out.

I suspect you're actually measuring the time to do 2 million malloc calls in your C code versus ~1 million allocations for std::list.

##### Share on other sites
polyfrag    2503

Storing a pointer to the data in each std::list node made all the difference. Now the C++ version is as slow as the C90 one.

Test 2: STL list vs. C linked list

C90 g=3000000 passed push,iterate+call,pop in 109,0,47 ms

CPP g=3000000 passed push,iterate+call,pop in 47,16,15 ms

CPP with pointer data g=3000000 passed push,iterate+call,pop in 109,0,47 ms

Edited by polyfrag

##### Share on other sites
tonemgub    2008

I'm not sure if I have this exactly right, but I think std::list is not really implemented as a linked list, so your comparison to linked lists makes no sense.

It does act like a doubly-linked list, but under the hood it uses a little trick to reduce the number of pointer-dereferencing operations that a straightforward doubly-linked list implementation would require.

Instead, I think std::list uses multiple buckets (memory buffers which hold multiple elements), the same as std::string and other sequence types. Rather than keeping "Next" and "Prev" pointers in each element (or node), it only has to keep track of the memory buckets, and operations on elements inside a bucket don't need "Next" and "Prev" pointers to move between elements, because those elements are contiguous and can be accessed simply by index.

This is just a simple view. In reality, std::list is much more complicated than this.

##### Share on other sites
Bearhugger    1276

> There's no good reason in 2015 to switch to C from C++. Anyone claiming otherwise is an old-school programmer who hasn't paid much attention to compiler or language technology in the last decade.

Are you talking about game development or general development? Because there are plenty of reasons people use C instead of C++ in 2015; it's just not generally the case in game development, because we handle a lot of information and resources, and the facilities C++ has over C are very handy there.

But yeah, people who use C don't do it to outperform std::list, because that would be a complete waste of time. Instead of switching to C, I'd rather spend that time doing what EA did and making my own version of the STL containers, with the features I find superfluous dropped.

##### Share on other sites
Ravyne    14300

> Because there are plenty of reasons people use C instead of C++ in 2015,

And those reasons have always been the same reasons -- if you have them, you were never using C++ in the first place, hence no need to switch. C has its place, but I struggle to think of an instance where it makes sense to switch to C from somewhere C++ has been used.

##### Share on other sites
Oberon_Command    6086

> Behold the power of templates. To achieve similar performance in C, you'd have to write a separate list node for each small data type you wanted to use, and the whole set of linked-list functions to operate on it.

Not necessarily. I can conceive of doing some sort of intrusive container using zero-length arrays. That might seem a little smelly, and you'd have to pass the size of the node type into the node allocator you were using, but I think you could make it work. Totally off the top of my head; no compiler has touched this code, so it may contain typos:

[source]
#include <stdlib.h>
#include <string.h>

/* intrusive linked list node - note the flexible array member */
typedef struct _IntrusiveNode {
    struct _IntrusiveNode* next;
    char data[];  /* use char data[0]; on pre-C99 compilers */
} INTRUSIVE_LIST;

/* push a node with an uninitialized payload of data_size bytes */
INTRUSIVE_LIST* push_uninitialized_node(INTRUSIVE_LIST* head, size_t data_size) {
    INTRUSIVE_LIST* node = malloc(sizeof(INTRUSIVE_LIST) + data_size);
    node->next = head;
    return node;
}

/* push a node and copy the payload into it */
INTRUSIVE_LIST* push_node(INTRUSIVE_LIST* head, void* data, size_t data_size) {
    INTRUSIVE_LIST* node = push_uninitialized_node(head, data_size);
    memcpy(node->data, data, data_size);
    return node;
}

/* helper macros */
#define INTRUSIVE_LIST_PUSH(h, d) push_node(h, d, sizeof(*(d)))
#define INTRUSIVE_LIST_PUSH_UNINITIALIZED(h, s) push_uninitialized_node(h, sizeof(s))

/* usage - look, I can even mix and match data types! */
void example(void) {
    float data = 0.0f;
    INTRUSIVE_LIST* head = NULL;
    head = INTRUSIVE_LIST_PUSH(head, &data);
    head = INTRUSIVE_LIST_PUSH_UNINITIALIZED(head, int);
}
[/source]

Granted, you now have a memory-copy step if you want the node initialized, but you could elide that if you just wanted to emplace an uninitialized node on the list (as I've shown). Oh, and you don't have the type safety of the templated version.

Can anyone find anything wrong with this?

edit: But really, depending on your platform, you shouldn't be using linked lists, anyway.

Edited by Oberon_Command

##### Share on other sites
Pink Horror    2459

> Instead, I think std::list uses multiple buckets (memory buffers which hold multiple elements), same as std::string and other sequence types, and instead of keeping "Next" and "Prev" pointers in each element (or Node), it only has to keep track of the memory buckets this way, and any operations it does on elements inside a bucket do not have to use the "Next" and "Prev" pointers to move between elements, because those elements are contiguous - they're accessed simply by index.

I believe you're thinking of std::deque, which acts somewhat like a vector but isn't guaranteed to be contiguous, and which guarantees constant complexity for pushing and popping at the front and back. As for std::list, I'm not 100% sure, but I believe Visual Studio's implementation, like most others, uses a fairly standard doubly-linked list. Visual Studio has a debugging setting to show the raw structure of objects in the watch window, if you want to see who is right.

##### Share on other sites
Erik Rufelt    5901

I certainly don't mean to contradict any of the good points raised for real applications, but the particular synthetic "benchmark" posted is much, much simpler, and can be explained pretty nicely by comparing the running times of the following for-loops.

[source]
for(i = 0; i < 1000*1000*10; i++) {
    delete (new int);
}

for(i = 0; i < 1000*1000*10; i++) {
    free(malloc(4));
    free(malloc(4));
}
[/source]


##### Share on other sites
polyfrag    2503

What do you mean, Erik? I think there would be more allocations/deallocations in the C++ version due to the copy constructor (if the compiler doesn't optimize that out). There are also if statements in the pushing, iterating, and popping.

The timings were more shaky this time around, varying by as much as 90 ms. But I tried to get the best of 5 tries.

By the way, zero-length arrays seem to make the custom C linked list implementation as fast as the STL version. Only one allocation is needed for the node together with its data, and one deallocation. Plus there's no jumping around in memory through a data pointer, so the whole node can be fetched into the cache.

Edited by polyfrag

##### Share on other sites
Erik Rufelt    5901

> I think there would be more allocations/deallocations in the c++ version due to copy constructor (if the compiler doesn't compile that out).

In your custom implementation, you first malloc() the 'Widget', and then you malloc() a 'ListLink' to store it in: 2 allocations for each node.

std::list will create an internal node type of size 'sizeof(Widget) + sizeof(ListLink)' and do one single allocation per node.

That's why you saw the same performance when you changed the std::list to store pointers allocated with new: that version also had 2 allocations per node, 'new Widget' plus an internal node of 'sizeof(Widget*) + sizeof(ListLink)'.

Try timing the loop with 10 million allocations only without any list, and I think you will find that it's not too far from the time of the allocation-step in your list-test.

If you want to count the number of allocations for a C++ implementation, you can override operator new and increment a counter before calling malloc from it.

[source]
int g_countNew = 0;
int g_countDelete = 0;

void* operator new(size_t size) {
    ++g_countNew;
    return malloc(size);
}

void operator delete(void *mem) {
    ++g_countDelete;
    free(mem);
}
[/source]


##### Share on other sites
osmanb    2082

> What do you mean, Erik? I think there would be more allocations/deallocations in the c++ version due to copy constructor (if the compiler doesn't compile that out). There's also if statements in pushing and iterating and popping.

The cost of a single branch to check a null pointer during iteration is minuscule compared to the cost of calling new/malloc/delete/free. And things like the copy constructor for a simple type don't matter: assuming you're just operating on simple structures or built-in types (not even necessarily POD), any "copies" created as temporaries live on the stack. Again, for the purposes of this benchmark, that's fast enough to be irrelevant. The only thing that matters in this situation is how many times you're hitting the system allocator to allocate and free nodes and/or payload.

##### Share on other sites
tonemgub    2008

I thought this was about iteration... Why are we talking about multiple allocations and copy constructors now? It's obvious that allocations are slow, since they often end up going through the OS, which runs an entire memory manager (or even just the heap manager, which is probably worse) that needs to do a fair bit of work before it can hand back memory. Worst case (although this is probably not the case here), there is no physical memory available and it has to swap something else to disk to free some.

That's why I mentioned that std::list uses buckets: they also speed up allocation (even more than iteration), if that's the new concern here. It's a really bad idea to allocate memory in a loop anyway. If you know how much memory you need beforehand, allocate it all at once and use it. If you don't know how much you need, allocate a reasonable initial amount, then add more when you need it. And by "reasonable", I mean "more than you think you need" in Release builds and "as little as you can work with" in Debug builds, so you can catch any problems with your memory allocation during development. std::list does this perfectly for you.

And if you need to copy entire lists of data element by element, then your entire design is probably flawed to begin with.

(Too much coffee. Going to bed...)

Edited by tonemgub

##### Share on other sites
ApochPiQ    23058
std::list does not allocate in buckets; it allocates per-element. You may be thinking of std::deque.

##### Share on other sites
tonemgub    2008

> std::list does not allocate in buckets; it allocates per-element. You may be thinking of std::deque.

Maybe you're right about std::list, but I don't think std::string allocates every character separately. And by "allocate" I mean that it reserves the memory using the internal allocator, not that it also initializes it, so initialization (via copy constructor) has nothing to do with allocation. Maybe I'll check it out sometime, though. It sounds a bit much to me that std::list would use extra "next" and "prev" pointers for every element when it could speed things up simply by using buckets.

I also found this: http://www.gamedev.net/topic/292334-how-does-stdlist-allocate-its-items/ .

Edited by tonemgub