Array vs Vector Discussion :)

Started by
24 comments, last by Decrius 13 years, 6 months ago
I saw someones code at work and wondered why they were using a vector instead of an array since we knew the size, so I almost asked the question on here about speed of access but I pretty much knew the answer anyway.

So this is not a question but to beginners, if you know the size of something before the start of your application, use an array. They are always the fastest, and vectors use an underlying array as well, but they are a lot slower to access.

int array[];
vector<int> vector;

array//not a function call, no stack invoked
vector//function call, stack invoked, variables/stack frame saved

array wins 31ms to 1280 ms
#include <vector>#include <iostream>#include <windows.h>using namespace std;#define XSIZE 10000000vector<int> vecNumbers;int arrayNumbers[XSIZE];int main(){	vecNumbers.reserve(10000000);	for(int i = 0; i < XSIZE; i++)	{		vecNumbers.push_back(0);		arrayNumbers = 0;	}float start_time = GetTickCount();	for(int i = 0; i < XSIZE;i++)	{		arrayNumbers++;	}	cout << GetTickCount() - start_time << endl; start_time = GetTickCount();	for(int i = 0; i < XSIZE;i++)	{		vecNumbers++;	}	cout << GetTickCount() - start_time;	system("pause");	return 0;}


[Edited by - dpadam450 on October 18, 2010 6:01:33 PM]

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

Advertisement
I find those numbers highly suspect unless you compiled in debug mode. The subscript operator will likely be inlined in release.
I wouldn't be surprised if the release version just optimized both code blocks out, since they're not really doing anything.

[size=1]Visit my website, rawrrawr.com

Quote:Original post by Rycross
I find those numbers highly suspect unless you compiled in debug mode. The subscript operator will likely be inlined in release.


Or MSVC in release if SECURE_SCL hasn't been disabled.
Well still, how many times do we need to run in debug mode? You still need fast enough debug code.

Release with size = 100,000,000
time = 100ms vs 200ms

yea release is fairly negligible, but if it were a 2D or 3D array/vector it would most likely increase in time gap as well.

Also sometimes people forget to reserve their vectors, so millions of items will be slow if you forget to ask for your vector to be big enough to hold all of those items.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

The purpose behind a debug build is to catch potential errors. In that case using a raw array rather than a vector is counterproductive since a vector can do nice things like bounds checking. However, there are classes for compile time constant arrays that would do a bit better than vectors performance-wise in release, but do checking in debug, such as boost::array.
I haven't messed with boost, any other advantages to boost::array other than bounds checks?

Quote:
since a vector can do nice things like bounds checking


It only does that if you use an iterator though. Maybe .at() checks bounds too?

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

Quote:Original post by bobofjoe
I wouldn't be surprised if the release version just optimized both code blocks out, since they're not really doing anything.


Yeah, I had to add in an extra cout utilizing the array to avoid it being optimized out by gcc. Gcc didn't seem to want to optimize out the std::vector altogether.

Results on doing the same test on 10000000 elements?

57-60 ms vs 61-64 ms. In other words, basically the same.


I did change the post-increment operator to the pre-increment operator. I suspect that gcc would optimize out the copy and call to operator = for vector, but decided to change it just to be sure.

Quote:Original post by dpadam450
Well still, how many times do we need to run in debug mode? You still need fast enough debug code.


Debug mode is a concern and one of the reasons EA wrote EASTL (note that they implemented their own containers rather than defaulted to plain arrays), but its still better to go with a vector until you are having a problem. As others have pointed out, part of the point of debug is to make sure you aren't doing things wrong, so using a safe container with a speed hit and runtime checks in debug is preferable to extra speed, unless you're hitting serious bottlenecks.


Quote:Original post by dpadam450
Also sometimes people forget to reserve their vectors, so millions of items will be slow if you forget to ask for your vector to be big enough to hold all of those items.


If those developers can't use std::vector correctly, what makes you think they'll use raw arrays correctly?

[Edited by - Rycross on October 18, 2010 4:59:58 PM]
Quote:
Well still, how many times do we need to run in debug mode? You still need fast enough debug code.

The checks can be disabled in debug mode if you like.

Ok, lets look at the assembly. Here is the source I used:
#include <vector>#include <cstdio>#include <windows.h>using namespace std;const unsigned int XSIZE = 10000000;int main(){	vector<int> vecNumbers;	int arrayNumbers[XSIZE];	vecNumbers.reserve(10000000);	for(int i = 0; i < XSIZE; i++)	{		vecNumbers.push_back(0);		arrayNumbers = 0;	}	DWORD start_time = GetTickCount();	for(int i = 0; i < XSIZE;i++)	{		arrayNumbers++;	}	DWORD elapsed = GetTickCount() - start_time;	printf("array: %d\n", elapsed);	start_time = GetTickCount();	for(int i = 0; i < XSIZE;i++)	{		vecNumbers++;	}	elapsed = GetTickCount() - start_time;	printf("vector: %d\n", elapsed);	system("pause");	return 0;}

Assembler output:
; 19   : ; 20   : 	DWORD start_time = GetTickCount();	mov	edi, DWORD PTR __imp__GetTickCount@0	call	edi	mov	ebx, eax; 21   : 	for(int i = 0; i < XSIZE;i++)	xor	eax, eax	npad	4$LL6@main:; 22   : 	{; 23   : 		arrayNumbers++;	inc	DWORD PTR _arrayNumbers$[esp+eax*4+40000056]	inc	eax	cmp	eax, 10000000				; 00989680H	jb	SHORT $LL6@main; 24   : 	}; 25   : 	DWORD elapsed = GetTickCount() - start_time;	call	edi	sub	eax, ebx; 26   : 	printf("array: %d\n", elapsed);	push	eax	push	OFFSET $SG-11	call	_printf	add	esp, 8; 27   : ; 28   : 	start_time = GetTickCount();	call	edi	mov	ebx, eax; 29   : 	for(int i = 0; i < XSIZE;i++)	xor	eax, eax$LL3@main:; 30   : 	{; 31   : 		vecNumbers++;	inc	DWORD PTR [esi+eax*4]	inc	eax	cmp	eax, 10000000				; 00989680H	jb	SHORT $LL3@main; 32   : 	}; 33   : 	elapsed = GetTickCount() - start_time;	call	edi	sub	eax, ebx; 34   : 	printf("vector: %d\n", elapsed);	push	eax	push	OFFSET $SG-12	call	_printf; 35   : ; 36   : 	system("pause");	push	OFFSET $SG-13	call	_system	add	esp, 12					; 0000000cH

To simplify, I replaced ostream with printf (ostream adds lots of gunk to asm output). I also stack allocated both objects (I had to increase the stack size to accomodate the large array).

Just to highlight, for an array:
; 23   : 		arrayNumbers++;	inc	DWORD PTR _arrayNumbers$[esp+eax*4+40000056]	inc	eax	cmp	eax, 10000000				; 00989680H	jb	SHORT $LL6@main; 24   : 	}

Whereas for a vector:
; 31   : 		vecNumbers++;	inc	DWORD PTR [esi+eax*4]	inc	eax	cmp	eax, 10000000				; 00989680H	jb	SHORT $LL3@main

I'm no assembly wizard, but this seems pretty comparable. On my machine I get near identical times, maybe a millisecond favouring one over the other, mine runs in ~16 milliseconds which is too close to the scheduler quantum to be a good measurement.

Upping the size by 10 and looping 10 times shows that the skew is slight in each direction, sometimes favouring vector and other times favouring the array (here both stack and a global array were used, out of curiousity):
Quote:
array: 202 <-- winner
vector: 219
static array: 405
array: 250
vector: 249
static array: 234 <-- winner
array: 219 <-- winner
vector: 296
static array: 296
array: 297
vector: 218 <-- winner
static array: 219
array: 203
vector: 187 <-- winner
static array: 219
array: 202
vector: 188 <-- winner
static array: 202
array: 219
vector: 203
static array: 202 <-- winner
array: 203
vector: 187 <-- winner
static array: 203
array: 203
vector: 202 <-- winner
static array: 203
array: 187 <-- winner (joint)
vector: 187 <-- winner (joint)
static array: 219

In the above run, there is a good mix of winners, the differences more likely due to external issues such as the cache or external processes than any intristic benefit of one technique over the other.

The source for that last test is (aside from the "winner" annotations):
#include <vector>#include <cstdio>#include <windows.h>using namespace std;const unsigned int XSIZE = 100000000;int staticArray[XSIZE];int main(){	vector<int> vecNumbers(XSIZE, 0);	int arrayNumbers[XSIZE] = {};	DWORD start_time, elapsed;	for(int j = 0 ; j < 10 ; ++j) {		start_time = GetTickCount();		for(int i = 0; i < XSIZE;i++)		{			arrayNumbers++;		}		elapsed = GetTickCount() - start_time;		printf("array: %d\n", elapsed);				start_time = GetTickCount();		for(int i = 0; i < XSIZE;i++)		{			vecNumbers++;		}		elapsed = GetTickCount() - start_time;		printf("vector: %d\n", elapsed);				start_time = GetTickCount();		for(int i = 0; i < XSIZE;i++)		{			staticArray++;		}		elapsed = GetTickCount() - start_time;		printf("static array: %d\n", elapsed);	}	system("pause");	return 0;}

Again: stack size is increased to handle the insane array.

My conclusion: bullshit
Quote:Original post by dpadam450
I haven't messed with boost, any other advantages to boost::array other than bounds checks?

boost::array supplies several advantages. For one thing it's a first class type unlike raw arrays, and you can do operations like assigning arrays with operator= in the intuitive manner. boost::array also provides a interface similar to that of the standard library containers. Ex:you can use begin() and end() with boost array to pass to standard library algorithms without manually doing the pointer arithmetic.

Quote:
It only does that if you use an iterator though. Maybe .at() checks bounds too?

No, pretty much every vector implementation will do bounds checks for you in debug mode using operator[]. at() is defined to do bounds checks even in release mode: it will throw an exception if you try to access an invalid element.

This topic is closed to new replies.

Advertisement