• entries
56
80
• views
41776

# Microsoft's STL implementation slow?

327 views

Is it me or does Microsoft's implementation of vectors in the STL seem oddly slow? This question came about after my past experiences and now recently a debate on a different forum.

Anyways, to investigate this I wrote a small program to test the speed of populating and fetching data to/from regular 'ol malloc'd...dynamically allocated arrays and std::vector's... respectively.

While the test isn't extremely extensive, I believe it's sufficient.

Either way, the results are...disturbing to say in the least. This is with the \O2 'Full Speed' optimization flag on MSVC 2008:
Starting first test: Populating Speed...Populating dynamically allocated array...Populating vector...The dynamically allocated array took    0.0176ms to populate.The vector took                         0.0396698ms to populate.A 2.25397 ratio difference in speed.Starting second test: Fetch Speed...Fetching data from dynamically allocated array.Fetching data from vector.The dynamically allocated array took:   0.00195556ms to fetch data.The vector took:                        0.0312889ms to fetch data.A 16 ratio difference in speed.

This makes me want to reinvent the wheel...

I thought it might have been my code but when I asked a friend to test it on his GCC setup his results were drastically different... while I don't have his exact results he said "I compiled this on GCC and get slightly better results with the vector at -O2, populating on GCC 4.3.2 (-O2) was slightly slower, but fetching was faster. It might be interesting to figure out what is causing the issue."

I'm posting the full source code here in attempt to get a broader range of results from various platforms. So far it seems like it's a Microsoft specific issue.

Here's the source code:
VecTest.zip

If someone knows of something I don't know, please let me know. I'm interested in seeing your results.

Ah ha! I'm glad that I'm not the only one who has noticed this. Whenever I bring this up around "professionals" they insist that this is not the case, and that std::vectors are just as fast. Some will even assure me that I'm crazy for thinking otherwise. Sure, it might be an issue with MS's implementation, but it's a real speed thing! Stop calling me crazy! [grin]

Yeah, I dunno :P.

When I made my claim before the test I got this kind of response "BULLSHIT ALERT! Did you just say the STL is slow? Are you freaking kidding me? You switched out a trusted, well-implemented dynamic array for a hand-written dynamic array, and you noticed 300% - 400% speed increases?"

I noticed the slowness when using vectors to store all IBSP map data (this is before I started using vertex and index buffers) so you can see WHY it would make a significant difference, I was hitting these things SEVERAL times a frame.

Using _SECURE_SCL 0, I get this:
Quote:
 Creating shared data... Populating shared indices, this may take some time... Starting first test: Populating Speed... Populating dynamically allocated array... Populating vector... The dynamically allocated array took 0.0198349ms to populate. The vector took 0.0301714ms to populate. A 1.52113 ratio difference in speed. Starting second test: Fetch Speed... Fetching data from dynamically allocated array. Fetching data from vector. The dynamically allocated array took: 0.00195556ms to fetch data. The vector took: 0.00195556ms to fetch data. A 1 ratio difference in speed. Press any key to continue . . .

I get different results on different runs, but that is a typical value. Sometimes the vector fetch is slightly slower, or slightly faster, but I imagine that has more to do with background tasks than anything else.

Looking at visual studios <vector>, the code in the vector is this:
	reference operator[](size_type _Pos)
{	// subscript mutable sequence

#if _HAS_ITERATOR_DEBUGGING
if (size() <= _Pos)
{
_DEBUG_ERROR("vector subscript out of range");
_SCL_SECURE_OUT_OF_RANGE;
}
#endif /* _HAS_ITERATOR_DEBUGGING */
_SCL_SECURE_VALIDATE_RANGE(_Pos < size());

return (*(_Myfirst + _Pos));
}



When _HAS_ITERATOR_DEBUGGING is 0, that boils down to:
return (*(_Myfirst + _Pos));

Which is the best you can do when writing a vector class.

Looking at the assembly output, we see the compiler playing a tricky little trick, that helps explain what is going on.
; 56   : 	for(int i = 0; i < iterations; ++i)
; 57   : 		dyC[i] = data[i];

push	20000					; 00004e20H
push	OFFSET ?data@@3PAEA			; data
push	eax
mov	BYTE PTR _timer$[esp+204], 1 call _memcpy add esp, 12 ; 0000000cH  It transformed the raw copy into a call to memcpy(). For the vector, we get the following: $LL9@main:

; 67   : 		dyVec[i] = data[i];

mov	dl, BYTE PTR ?data@@3PAEA[eax]
mov	ecx, DWORD PTR ?dyVec@@3V?$vector@EV?$allocator@E@std@@@std@@A+4
mov	BYTE PTR [ecx+eax], dl
inc	eax
cmp	eax, 20000				; 00004e20H
jb	SHORT \$LL9@main


I don't know x86 assembler, but that looks like a regular loop to me. In fact, I am surprised to see that it (appears to) load the base address of the vector every loop iteration!

Based on that: IMO rolling your own class probably won't help. The alternative of manually allocating and deallocating everywhere isn't great, as it clutters the code and could lead to memory leaks or bugs.

If you need a clincher, consider what happens when you stop using a primitive like unsigned char. Consider a dynamic array of std::string instances. The compiler is no longer free to optimise to a memcpy. In my experience, most std::vectors store user defined data, often with (explicit or implicit) copy contructors and destructors.

I hope you see how easy it is to construct flawed benchmarks. In general, if you aren't profiling a real program, you are probably not getting an accurate idea of what is going on.

I noticed that STL is horribly slow in Debug mode because all the memory allocations that happen use debug-mode functions for those tasks, which are very very slow. How does it run in Release mode?

Those tests were done in Release mode.

## Create an account

Register a new account