Jump to content

  • Log In with Google      Sign In   
  • Create Account


#ActualChad Smith

Posted 17 May 2013 - 06:31 PM

I wanted to do a quick performance (very unscientific though) test on std::vector and std::list and I was very surprised by the results.  Maybe it's just me not understanding or knowing how each implements everything internally.

 

 

My surprise came when I sorted a vector vs. sorted a equally size linked list.  I understand that std::sort requires a random access iterator that std::list doesn't provide which is why their is a list::sort method.  Though I thought that using std::sort would require a swap internally (not sure why I do think that, I'm sure I should be doing more research about the internals of each) which wold require each and every object to be copied over.  Now the test object I use is just a very basic struct that "simulates" a Rectangle that has a length, width, x position, and y position, all stored as doubles.  I overloaded the < operator in it to do the sort and it should sort it based on area (just went ahead and calculated the length*width instead of storing an area member variable, not trying to make this scientific or find performances).

 

In my test I store 1,000,000 rectangles in a vector a list and set the length, width, xPosition, and yPosition to random numbers for each element in the vector.  I then set each elements length, width, xPosition, and yPosition in the list to the vectors corresponding element.  Calculate the time for each.  Maybe I should use a more scientific approach to this, as I am only using std::clock_t and CLOCKS_PER_SEC to calculate.

 

When doing this I am very surprised at the performance gap between the two!

First, here's my code to do this quick test, although it isn't anything special or scientific

// Tests Sorting Lists and Vectors Performance

#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <ctime>

struct Rectangle
{
	double length;
	double width;
	double xPosition;
	double yPosition;

	bool operator<(const Rectangle& rect) const
	{
		return (length*width) < (rect.length*rect.width);
	}
};

int main()
{
	const int NUM_ELEMENTS = 1000000;
	std::clock_t startTime;
	std::clock_t stopTime;

	std::vector<Rectangle> myVector(NUM_ELEMENTS);
	std::vector<Rectangle>::iterator vectIter;

	// Insert rectangle data in vector
	startTime = clock();
	for(vectIter = myVector.begin(); vectIter != myVector.end(); ++vectIter)
	{
		vectIter->length = 1.0*rand()/RAND_MAX;
		vectIter->width = 1.0*rand()/RAND_MAX;
		vectIter->xPosition = 1.0*rand()/RAND_MAX;
		vectIter->yPosition = 1.0*rand()/RAND_MAX;
	}
	stopTime = clock();
	double totalTime = (stopTime-startTime)/(double) CLOCKS_PER_SEC;
	std::cout << "Inserted rectangles in vector in: " << totalTime <<std::endl;

	std::list<Rectangle> myList(NUM_ELEMENTS);
	std::list<Rectangle>::iterator listIter;

	// Insert rectangle data in list
	startTime = clock();
	for(vectIter = myVector.begin(), listIter = myList.begin(); listIter != myList.end() && vectIter != myVector.end(); ++listIter, ++vectIter)
	{
		listIter->length = vectIter->length;
		listIter->width = vectIter->width;
		listIter->xPosition = vectIter->xPosition;
		listIter->yPosition = vectIter->yPosition;
	}
	stopTime = clock();
	totalTime = (stopTime-startTime)/(double) CLOCKS_PER_SEC;
	std::cout << "Inserted rectangles in list in: " << totalTime <<std::endl;

	// sort the vector using std::sort algoorithm
	startTime = clock();
	std::sort(myVector.begin(), myVector.end());
	stopTime = clock();
	totalTime = (stopTime-startTime)/(double) CLOCKS_PER_SEC;
	std::cout << "Time for sorting a vector: " << totalTime << std::endl;

	// sort the list
	// can't use std::sort, use the list member function
	startTime = clock();
	myList.sort();
	stopTime = clock();
	totalTime = (stopTime-startTime)/(double) CLOCKS_PER_SEC;
	std::cout << "Time for sorting a list: " << totalTime << std::endl;

	return 0;
} 

The output/performance I get

Inserted Rectangles in Vector in: 1.597

Inserted Rectangles in List in: 2.989

Time for sorting a vector: 3.384

Time for sorting a list: 111.252

 

What's with the HUGE perforamance decrease in sorting that list?  I understand list is slow at random access and that is what std::vector is strong at, but that big a performance hit when using a list?  I think list::sort uses a merge sort, which is O(NlogN), I believe.  What would std::sort be using in this case?  is their any guarantee or way to see what implementation it is using (maybe I didn't research enough), but does the standard guarantee what it will use? Is their a worst case Big-Oh run time that std::sort guarantees?  Average Big-Oh run time?

 

I understand that this isn't a scientific test nor would I use something like that in a real world code, but is their something just crazy wrong with my code that I'm missing that is causing the list to be that much slower?  I do still understand where benefits of using a list would come in, but if this is correct I can totally understand why I've always been told "look at std::vector first."  Is this correct?  Maybe it's because since the object I am sorting in each is very basic and if I was using a larger and more advanced class that would take longer to copy and results could be different?

 

Interested in any comments on this.

 

Note: this is just curiosity and I am not trying to pre-optimize current real world code that I have or anything.  Just my Computer Science brain being curious.


#1Chad Smith

Posted 17 May 2013 - 06:24 PM

I wanted to do a quick performance (very unscientific though) test on std::vector and std::list and I was very surprised by the results.  Maybe it's just me not understanding or knowing how each implements everything internally.

 

 

My surprise came when I sorted a vector vs. sorted a equally size linked list.  I understand that std::sort requires a random access iterator that std::list provides which is why their is a list::sort method.  Though I thought that using std::sort would require a swap internally (not sure why Id o think that, I'm sure I should be doing more research about the internals of each) which wold require each and every object to be copied over.  Now the test object I use is just a very basic struct that "simulates" a Rectangle that has a length, width, x position, and y position, all stored as doubles.  I overloaded the < operator in it to do the sort and it should sort it based on area (just went ahead and calculated the length*width instead of storing an area member variable, not trying to make this scientific or find performances).

 

In my test I store 1,000,000 rectangles in a vector a list and set the length, width, xPosition, and yPosition to random numbers for each element in the vector.  I then set each elements length, width, xPosition, and yPosition in the list to the vectors corresponding element.  Calculate the time for each.  Maybe I should use a more scientific approach to this, as I am only using std::clock_t and CLOCKS_PER_SEC to calculate.

 

When doing this I am very surprised at the performance gap between the two!

First, here's my code to do this quick test, although it isn't anything special or scientific

// Tests Sorting Lists and Vectors Performance

#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <ctime>

struct Rectangle
{
	double length;
	double width;
	double xPosition;
	double yPosition;

	bool operator<(const Rectangle& rect) const
	{
		return (length*width) < (rect.length*rect.width);
	}
};

int main()
{
	const int NUM_ELEMENTS = 1000000;
	std::clock_t startTime;
	std::clock_t stopTime;

	std::vector<Rectangle> myVector(NUM_ELEMENTS);
	std::vector<Rectangle>::iterator vectIter;

	// Insert rectangle data in vector
	startTime = clock();
	for(vectIter = myVector.begin(); vectIter != myVector.end(); ++vectIter)
	{
		vectIter->length = 1.0*rand()/RAND_MAX;
		vectIter->width = 1.0*rand()/RAND_MAX;
		vectIter->xPosition = 1.0*rand()/RAND_MAX;
		vectIter->yPosition = 1.0*rand()/RAND_MAX;
	}
	stopTime = clock();
	double totalTime = (stopTime-startTime)/(double) CLOCKS_PER_SEC;
	std::cout << "Inserted rectangles in vector in: " << totalTime <<std::endl;

	std::list<Rectangle> myList(NUM_ELEMENTS);
	std::list<Rectangle>::iterator listIter;

	// Insert rectangle data in list
	startTime = clock();
	for(vectIter = myVector.begin(), listIter = myList.begin(); listIter != myList.end() && vectIter != myVector.end(); ++listIter, ++vectIter)
	{
		listIter->length = vectIter->length;
		listIter->width = vectIter->width;
		listIter->xPosition = vectIter->xPosition;
		listIter->yPosition = vectIter->yPosition;
	}
	stopTime = clock();
	totalTime = (stopTime-startTime)/(double) CLOCKS_PER_SEC;
	std::cout << "Inserted rectangles in list in: " << totalTime <<std::endl;

	// sort the vector using std::sort algoorithm
	startTime = clock();
	std::sort(myVector.begin(), myVector.end());
	stopTime = clock();
	totalTime = (stopTime-startTime)/(double) CLOCKS_PER_SEC;
	std::cout << "Time for sorting a vector: " << totalTime << std::endl;

	// sort the list
	// can't use std::sort, use the list member function
	startTime = clock();
	myList.sort();
	stopTime = clock();
	totalTime = (stopTime-startTime)/(double) CLOCKS_PER_SEC;
	std::cout << "Time for sorting a list: " << totalTime << std::endl;

	return 0;
} 

The output/performance I get

Inserted Rectangles in Vector in: 1.597

Inserted Rectangles in List in: 2.989

Time for sorting a vector: 3.384

Time for sorting a list: 111.252

 

What's with the HUGE perforamance decrease in sorting that list?  I understand list is very slow at random access and that is what std::vector is strong at, but that big a performance hit when using a list?  I think list::sort uses a merge sort, which is O(NlogN), I believe.  What would std::sort be using in this case?  is their any guarantee or way to see what implementation it is using (maybe I didn't research enough), but does the standard guarantee what it will use? Is their a worst case Big-Oh run time that std::sort guarantees?  Average Big-Oh run time?

 

I understand that this isn't a scientific test nor would I use something like that in a real world code, but is their something just crazy wrong with my code that I'm missing that is causing the list to be that much slower?  I do still understand where benefits of using a list would come in, but if this is correct I can totally understand why I've always been told "look at std::vector first."  Is this correct?

 

Interested in any comments on this.

 

Note: this is just curiosity and I am not trying to pre-optimize current real world code that I have or anything.  Just my Computer Science brain being curious.

 


PARTNERS