std::vector size overhead

I'm loading some very large 2D arrays into memory, and am therefore using vector< vector<T> > like so:
foo(unsigned int row, unsigned int col)
{
    // Allocate the outer vector, then a separate heap-allocated
    // inner vector for each row.
    matrix = new std::vector< std::vector<T>* >();
    matrix->resize(row);

    for (unsigned int i = 0; i < row; i++)
    {
        matrix->at(i) = new std::vector<T>(col);
    }
    m_rows = row;
    m_cols = col;
}


Typically, T is double. For my test case: row = 8369, col = 6066, sizeof(double) = 32 bits //edit <- this should be 64 bits, as the poster below mentions. So 8369 * 6066 * 32 = 1624523328 bits = 193.658271789551 MB per file (according to this). However, after the first file load, my program (having done nothing else but load) is using ~400 MB of RAM. So it seems like an almost 2-fold overhead... Is this normal? Or is there something else I've managed to screw up that is resulting in this? Thanks for any input.

[Edited by - _Sigma on June 25, 2008 10:19:16 PM]
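For reference, here's the corrected arithmetic as a quick sanity check (a minimal sketch; only the row/col values come from the post above, the rest is illustrative):

    #include <cstdio>

    int main()
    {
        const unsigned long long rows = 8369;
        const unsigned long long cols = 6066;

        // sizeof(double) is 8 bytes (64 bits) on IEEE-754 platforms, not 4,
        // which is exactly the factor-of-two discrepancy described above.
        const unsigned long long bytes = rows * cols * sizeof(double);

        // Prints 406130832 bytes (~387.3 MB) -- close to the ~400 MB observed.
        std::printf("%llu bytes (~%.1f MB)\n", bytes, bytes / (1024.0 * 1024.0));
        return 0;
    }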

I'll also point out that since you are doing 2d allocations, there is going to be some overhead due to fragmentation between your allocations. If you want to cut the overhead, allocate a 1d array, and build an accessor to get at it as if it were 2d.
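Something like this, for example (a minimal sketch assuming an owning wrapper class; the names are illustrative, not from the thread):

    #include <cstddef>
    #include <vector>

    // Flat 2D matrix: one contiguous allocation, indexed as row * cols + col.
    template <typename T>
    class Matrix2D
    {
    public:
        Matrix2D(std::size_t rows, std::size_t cols)
            : m_rows(rows), m_cols(cols), m_data(rows * cols) {}

        // 2D accessor over the flat 1D storage.
        T& operator()(std::size_t row, std::size_t col)
        {
            return m_data[row * m_cols + col];
        }

    private:
        std::size_t m_rows;
        std::size_t m_cols;
        std::vector<T> m_data;
    };

A Matrix2D<double>(8369, 6066) then costs a single allocation and one vector header, instead of one header and one allocation per row.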

Quote:
Original post by dwahler
What platform are you using on which a double is 32 bits? The IEEE standard for double-precision is 8 bytes (64 bits).

Holy shit, am I out of it. That is clearly the problem in my calculations.
I have been having just the worst day so far, so I suppose that shouldn't come as a surprise...

many thanks.

Quote:

I'll also point out that since you are doing 2d allocations, there is going to be some overhead due to fragmentation between your allocations. If you want to cut the overhead, allocate a 1d array, and build an accessor to get at it as if it were 2d.

I was originally doing this, and I think I'll go back. I was running out of memory, so I thought perhaps looking for non-contiguous blocks of smaller size might be a better idea until I had the problem sorted out.

Vectors of vectors have problems, as a vector copy is expensive. Until C++0x, vectors aren't able to use move shortcuts in their methods when they resize.

So if the base dimension resizes, you end up having to reallocate all of the data!

I find a deque< vector > tends to work better, if you don't use the "single dimension pretending to be multiple dimensions" trick.
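For example (a sketch assuming rows are appended as they are loaded; the names are illustrative):

    #include <cstddef>
    #include <deque>
    #include <vector>

    // A deque never relocates already-constructed rows when it grows,
    // so appending a new row doesn't copy the rows loaded so far.
    std::deque< std::vector<double> > grid;

    void appendRow(std::size_t cols)
    {
        grid.push_back(std::vector<double>(cols));
    }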

Ultimately I am going to use tbb::concurrent_vector from Intel's Threading Building Blocks so I can parallelize my code.

I might have a use for boost::multi_array elsewhere - have you had good experiences with it?
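For what it's worth, a minimal sketch of concurrent_vector as a flat backing store (assuming the TBB headers are available; the at() helper is illustrative). One caveat: unlike std::vector, concurrent_vector does not guarantee contiguous storage, since it allocates in segments.

    #include <cstddef>
    #include <tbb/concurrent_vector.h>

    // Flat storage indexed manually as 2D. Note: concurrent_vector
    // grows in segments, so its storage is not contiguous.
    tbb::concurrent_vector<double> data(8369u * 6066u);

    double& at(std::size_t row, std::size_t col, std::size_t cols)
    {
        return data[row * cols + col];
    }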

With a vector of vectors, you are going to pay the vector "header" overhead for each "row" of the vector. That is, 'size' and 'capacity' information, and a pointer to the allocations (plus the system's internal book-keeping of allocations, which isn't portably accessible from C++). This is redundant, because all those size and capacity fields are going to store the same values - you have the flexibility to make each row a different length, but you don't want or need it. You also lose cache coherency, as pointed out, because each row allocation is separate.

You then make both these problems worse by storing the rows by pointer in the first vector, which is completely useless for your purposes and only complicates matters. Keep in mind that a vector's contents (minus the "header") is always dynamically allocated, so there's no use indirecting things manually (except to allow for polymorphism or object sharing; but in these cases, raw pointers are a bad idea anyway).

The proper way to deal with this, yes, is boost::multi_array.
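For example (a minimal sketch assuming Boost is available; the dimensions are the test case from earlier):

    #include <boost/multi_array.hpp>

    int main()
    {
        // One contiguous allocation for the whole 8369 x 6066 grid.
        boost::multi_array<double, 2> matrix(boost::extents[8369][6066]);

        matrix[42][7] = 1.0;   // natural 2D indexing, no per-row headers
        return 0;
    }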

Also, if you must use the internet to do your arithmetic, why not just use Google? :)

Quote:
Original post by Zahlman
Keep in mind that a vector's contents (minus the "header") is always dynamically allocated

Must admit I did not know this.

Quote:

The proper way to deal with this, yes, is boost::multi_array.

I am actually using Intel's TBB as I am parallelizing some of my code. I was using the concurrent_vector class, but when I ran into trouble, I moved to std::vector to attempt to rule out as many different things as possible. Currently, I don't use the vector<vector<T>> method, just one large vector. I had read somewhere that fragmenting the allocation, as you say, can make the memory easier to allocate, at the cost of the additional overhead. So that is what I was doing.

Quote:

Also, if you must use the internet to do your arithmetic

And why would I not?

Quote:
why not just use Google? :)

Yeah, I tried - but the night I did it, Google Calculator didn't work... I was at a loss, so I used the above site.

