Microsoft's STL implementation slow?

Published October 30, 2008
Advertisement
Is it me or does Microsoft's implementation of vectors in the STL seem oddly slow? This question came about after my past experiences and now recently a debate on a different forum.

Anyways, to investigate this I wrote a small program to test the speed of populating and fetching data to/from regular 'ol malloc'd...dynamically allocated arrays and std::vector's... respectively.

While the test isn't extremely extensive, I believe it's sufficient.

Either way, the results are...disturbing to say in the least. This is with the \O2 'Full Speed' optimization flag on MSVC 2008:
Starting first test: Populating Speed...Populating dynamically allocated array...Populating vector...The dynamically allocated array took    0.0176ms to populate.The vector took                         0.0396698ms to populate.A 2.25397 ratio difference in speed.Starting second test: Fetch Speed...Fetching data from dynamically allocated array.Fetching data from vector.The dynamically allocated array took:   0.00195556ms to fetch data.The vector took:                        0.0312889ms to fetch data.A 16 ratio difference in speed.


This makes me want to reinvent the wheel...

I thought it might have been my code but when I asked a friend to test it on his GCC setup his results were drastically different... while I don't have his exact results he said "I compiled this on GCC and get slightly better results with the vector at -O2, populating on GCC 4.3.2 (-O2) was slightly slower, but fetching was faster. It might be interesting to figure out what is causing the issue."

I'm posting the full source code here in attempt to get a broader range of results from various platforms. So far it seems like it's a Microsoft specific issue.

Here's the source code:
VecTest.zip

If someone knows of something I don't know, please let me know. I'm interested in seeing your results.
0 likes 6 comments

Comments

HopeDagger
Ah ha! I'm glad that I'm not the only one who has noticed this. Whenever I bring this up around "professionals" they insist that this is not the case, and that std::vectors are just as fast. Some will even assure me that I'm crazy for thinking otherwise. Sure, it might be an issue with MS's implementation, but it's a real speed thing! Stop calling me crazy! [grin]
October 30, 2008 08:23 AM
bladerunner627
Yeah, I dunno :P.

When I made my claim before the test I got this kind of response "BULLSHIT ALERT! Did you just say the STL is slow? Are you freaking kidding me? You switched out a trusted, well-implemented dynamic array for a hand-written dynamic array, and you noticed 300% - 400% speed increases?"

I noticed the slowness when using vectors to store all IBSP map data (this is before I started using vertex and index buffers) so you can see WHY it would make a significant difference, I was hitting these things SEVERAL times a frame.
October 30, 2008 09:45 AM
Telastyn
Yeah, there's some stuff in MSVS that still gets into the release build that prevents the expected behavior:

http://www.gamedev.net/community/forums/viewreply.asp?ID=3073643
http://www.gamedev.net/community/forums/topic.asp?topic_id=466974

I don't know all the details, just giving a pointer towards possible search points. The speeds should be equivalent.
October 30, 2008 09:54 AM
rip-off
Using _SECURE_SCL 0, I get this:
Quote:
Creating shared data...
Populating shared indices, this may take some time...

Starting first test: Populating Speed...

Populating dynamically allocated array...
Populating vector...
The dynamically allocated array took 0.0198349ms to populate.
The vector took 0.0301714ms to populate.
A 1.52113 ratio difference in speed.

Starting second test: Fetch Speed...

Fetching data from dynamically allocated array.
Fetching data from vector.
The dynamically allocated array took: 0.00195556ms to fetch data.
The vector took: 0.00195556ms to fetch data.
A 1 ratio difference in speed.
Press any key to continue . . .

I get different results on different runs, but that is a typical value. Sometimes the vector fetch is slightly slower, or slightly faster, but I imagine that has more to do with background tasks than anything else.

Looking at visual studios <vector>, the code in the vector is this:

	reference operator[](size_type _Pos)
		{	// subscript mutable sequence

 #if _HAS_ITERATOR_DEBUGGING
		if (size() <= _Pos)
			{
			_DEBUG_ERROR("vector subscript out of range");
			_SCL_SECURE_OUT_OF_RANGE;
			}
 #endif /* _HAS_ITERATOR_DEBUGGING */
		_SCL_SECURE_VALIDATE_RANGE(_Pos < size());

		return (*(_Myfirst + _Pos));
		}





When _HAS_ITERATOR_DEBUGGING is 0, that boils down to:

return (*(_Myfirst + _Pos));

Which is the best you can do when writing a vector class.

Looking at the assembly output, we see the compiler playing a tricky little trick, that helps explain what is going on.

; 56   : 	for(int i = 0; i < iterations; ++i)
; 57   : 		dyC[i] = data[i];

	push	20000					; 00004e20H
	push	OFFSET ?data@@3PAEA			; data
	push	eax
	mov	BYTE PTR _timer$[esp+204], 1
	call	_memcpy
	add	esp, 12					; 0000000cH

It transformed the raw copy into a call to memcpy().

For the vector, we get the following:

$LL9@main:

; 67   : 		dyVec[i] = data[i];

	mov	dl, BYTE PTR ?data@@3PAEA[eax]
	mov	ecx, DWORD PTR ?dyVec@@3V?$vector@EV?$allocator@E@std@@@std@@A+4
	mov	BYTE PTR [ecx+eax], dl
	inc	eax
	cmp	eax, 20000				; 00004e20H
	jb	SHORT $LL9@main

I don't know x86 assembler, but that looks like a regular loop to me. In fact, I am surprised to see that it (appears to) load the base address of the vector every loop iteration!

Based on that: IMO rolling your own class probably won't help. The alternative of manually allocating and deallocating everywhere isn't great, as it clutters the code and could lead to memory leaks or bugs.

If you need a clincher, consider what happens when you stop using a primitive like unsigned char. Consider a dynamic array of std::string instances. The compiler is no longer free to optimise to a memcpy. In my experience, most std::vectors store user defined data, often with (explicit or implicit) copy contructors and destructors.

I hope you see how easy it is to construct flawed benchmarks. In general, if you aren't profiling a real program, you are probably not getting an accurate idea of what is going on.
October 30, 2008 01:39 PM
matt_j
I noticed that STL is horribly slow in Debug mode because all the memory allocations that happen use debug-mode functions for those tasks, which are very very slow. How does it run in Release mode?
November 01, 2008 02:56 AM
bladerunner627
Those tests were done in Release mode.
November 03, 2008 01:19 AM
You must log in to join the conversation.
Don't have a GameDev.net account? Sign up!
Advertisement

Latest Entries

Boo!

1538 views

Group blog

1044 views

Easy Usage

1063 views

Angel Script

924 views

My game idea

941 views

Lightmaps

912 views

Yay BSPs!

913 views
Advertisement