Sign in to follow this  

Why use std::vector?

This topic is 3778 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Hi, I really like std::vector, I love the concept of a dynamic array of course. However, my one gripe is the following snippet:

#include <vector>
#include <stdio.h>

static int constructs = 0;
static int destructs = 0;
static int copies = 0;
class Copyable
{
public:
    Copyable() { constructs++; }
    Copyable(const Copyable&) { copies++; }
    ~Copyable() { destructs++; }
};

int main(int argc, char* argv[])
{
    std::vector<Copyable> vec;
    for (int i = 0; i < 10000; i++) {
        vec.push_back(Copyable());
    }
    printf("%i constructs\n%i copies\n%i destructs\n", constructs, copies, destructs);
    return 0;
}



Prints out: 10000 constructs 34308 copies 34308 destructs Note that those precursor the scope exit; i.e. there would be total 44308 destructs after vector destruction. I find these copies and destructs to be excessive when it is easy to make something that can do similar with only 10000 total constructs and destructs with 0 copies; and it could still possess the same functionality as std::vector. Maybe I'm missing some advantage to this; but it seems a lot of overhead, particularly with user-defined classes! -Walt

Share this post


Link to post
Share on other sites
Is it a bottleneck?

If not, don't worry about it. Unless you have highly specialized needs, you can't implement something with the same characteristics that will not copy it's contents the way vector does to resize. You can't simply memcpy the data, that isn't safe for non-POD types.

Besides, simple solution? reserve(), std::vector<Copyable*>. Maybe an allocator if you really need to. There are lots of ways to improve the performance of SC++L containers in sane ways.

Share this post


Link to post
Share on other sites
To add to what's already been said, copying elements during resizing is the only sane way to grow the array, memcpy is not always suitable for custom types. Use reserve if you know how big it will need to be, or store pointers.

Share this post


Link to post
Share on other sites
Quote:
Original post by jpetrie
Is it a bottleneck?


A huge one, to be exact :) And reserve() only really works if you know how big you want your array, right?

Quote:
You can't simply memcpy the data, that isn't safe for non-POD types.


Please explain this; it's not that I don't believe you, but how does the following form for say non-POD object A not work:


push_back(A());
//Constructs A inside vector
push_back(A());
//Tests for size - resize required
//Allocate (malloc) size of new array
//memcpy() old array to new array
//free() old array
//Old objects should still exist in their proper form,
//as they were constructed validly, and all of their contents
//were properly transferred?
//Construct A inside vector



-Walt

Share this post


Link to post
Share on other sites
Well if you yourself have concluded that std::vector isn't suitable, have you thought about std::list?

Have you thought about how you can predict the kind of size to reserve?

Share this post


Link to post
Share on other sites
Change A() to std::string("Hey we have a long string!!!") and you'll find out, in the form of a crash. As soon as the type you are storing in the vector has a pointer to dynamic memory, you're going to run into problems when using memcpy. That's why standard containers use copy constructors and such, to ensure they work with every type(provided they implement a proper copy constructor, etc)

Share this post


Link to post
Share on other sites
Quote:
Original post by shadowman131
Please explain this; it's not that I don't believe you, but how does the following form for say non-POD object A not work:


There are two reasons to this. The most elementary one is that memcpy doesn't do what you seem to think it does.

memcpy is a function which copies a block of bytes from one location in memory to another. It doesn't copy PODs, it doesn't copy integers, it doesn't copy non-PODs, the only thing it can copy is bytes.

The C++ Standard lets you reinterpret a POD object or array of POD objects as a sequence of bytes, use memcpy on that sequence of bytes, and then reinterpret the destination sequence of bytes back as a POD object or array of POD objects. The C++ Standard explicitly forbids you from reinterpreting a non-POD object (or an array of such objects) as a sequence of bytes—attempting to do so against this interdiction will yield a sequence of bytes which may or may not be a complete representation of the object (i.e. some parts of the object might not appear in the representation). And, even then, the C++ Standard explicitly forbids you from reinterpreting a sequence of bytes as a non-POD object—attempting to do so will result in outright undefined behaviour, and your program may crash, do what you think it should do, overwrite random memory, or launch a nuclear missile at a cow ranch in Alaska.

The second reason is that it's not practical. Some objects may be doing something important with their position in memory—perhaps they registered themselves with a manager, or perhaps they are keeping a pointer or reference to one of their own members. When such objects are stored in a container, they should be notified that they are about to be moved around—something which a copy constructor and destructor will achieve with decent performance. In short, even if converting non-POD objects to a byte sequence worked (which it doesn't), your container will still cause bugs when self-referring objects are stored in it.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Previous thread on why memcpy() isn't a viable option for UDTs.


Right; let me rephrase my original question: IF you have a vector of non-derived classes, or a fixed derivation of a class (does sizeof() include virtual function tables? You may just have me there), then is there any benefit to that behavior? Yes, I know that that's a big if; but if I'm storing objects and not their pointers anyway, it's a safe bet I'm not using polymorphism. I understand perfectly why that doesn't work.

Quote:
Change A() to std::string("Hey we have a long string!!!") and you'll find out, in the form of a crash. As soon as the type you are storing in the vector has a pointer to dynamic memory, you're going to run into problems when using memcpy. That's why standard containers use copy constructors and such, to ensure they work with every type(provided they implement a proper copy constructor, etc)


Nope. There is no destructor called, yet the old pointer is de-referenced. The pointer passes hands safely since neither constructor nor destructor is called, but the data (pointer address) is copied precisely. Try it.

As for ToohrVyk . . .
Quote:
memcpy is a function which copies a block of bytes from one location in memory to another. It doesn't copy PODs, it doesn't copy integers, it doesn't copy non-PODs, the only thing it can copy is bytes.

This I know, though thank you for clarifying politely. It is the intended behavior.

As far as self-referencing objects that register themselves, yes, I thought of this. And yes, instead of a memcpy based vector you'd need to use std::vector (or perhaps a linked list, personally, since they're registering themselves elsewhere anyway).

-Walt

Share this post


Link to post
Share on other sites
Quote:
Original post by shadowman131
This I know, though thank you for clarifying politely. It is the intended behavior.


In that case, what did you intend to copy with it? You do have an array of non-POD objects, but you have no bytes. Or, if you did manage to copy an array of bytes to another, how would you transform the array of bytes back into an array of non-POD objects?

Quote:
does sizeof() include virtual function tables? You may just have me there


It's a little bit worse than that: there is no such thing as a virtual function table in C++—so wondering whether sizeof() includes it or not is a moot point. A virtual table point doesn't exist in C++ either.

Some compilers may, when certain compiling options are used, implement virtual functions using a pointer to a table of function pointers. Sometimes, that pointer will appear as part of the value returned by sizeof.

However, a compiler is free to implement virtual functions through a non-intrusive method (ranging from hashing tables to placing a virtual pointer outside the actual class memory to whatever magic they may come up with).

Share this post


Link to post
Share on other sites
Nope what? memcpying 2 dynamic strings is not safe in the least, barring such hacks as the guy in the other thread is doing, and not calling the destructor, which is a horrible hack to get around something which has trivial alternatives.

Sounds like your 'idea' is operating on raw memory as if they were objects. Also not recommended.

Share this post


Link to post
Share on other sites
Quote:
Original post by ToohrVyk
Quote:
Original post by shadowman131
This I know, though thank you for clarifying politely. It is the intended behavior.


In that case, what did you intend to copy with it? You do have an array of non-POD objects, but you have no bytes...


Even if the standard doesn't like it (yes, I'm stepping on slightly dangerous ground) the non-POD type sure is an array of bytes. Perhaps the format of them is undefined, but there are definitely valid bytes in it, particularly (as I am interested in) if all of its members are fixed-size classes or primatives.

-Walt

Share this post


Link to post
Share on other sites
Quote:
Original post by DrEvil
Nope what? memcpying 2 dynamic strings is not safe in the least, barring such hacks as the guy in the other thread is doing, and not calling the destructor, which is a horrible hack to get around something which has trivial alternatives.

Sounds like your 'idea' is operating on raw memory as if they were objects. Also not recommended.


I suggest you step through the memcpy process. It does not copy the string, but it does call the destructor and constructor properly (and once per allocated string).

As far as raw memory; it's working on the clearly defined (sizeof()) chunks of memory. It makes no assumption as to the content of memory, only the definition of pointers (a single address referring to a multiple (usually defined) number of bytes).

Share this post


Link to post
Share on other sites
Quote:

I suggest you step through the memcpy process. It does not copy the string, but it does call the destructor and constructor properly

I'm assuming you just worded this poorly? memcpy() will not call constructors or destructors at all. The overall process of having, say, a custom_vector<std::string> and internally using memcpy() to move the storage might result in the correct number of constructor/destructor calls, but that still doesn't make it correct or legal to do so.

Share this post


Link to post
Share on other sites
Quote:
Original post by shadowman131
I suggest you step through the memcpy process. It does not copy the string, but it does call the destructor and constructor properly (and once per allocated string).


Since when does memcpy allocate anything? Since when does it call destructors and constructors? (hint: it does no such thing)

Quote:
As far as raw memory; it's working on the clearly defined (sizeof()) chunks of memory. It makes no assumption as to the content of memory, only the definition of pointers (a single address referring to a multiple (usually defined) number of bytes).


memcpy works on the size that you pass it. Sounds to me like we're talking about 2 different things, care to clarify? I'm talking about the memcpy function which takes 2 addresses and a size and does a byte for byte copy. What are you talking about that that supposedly calls constructors and destructors?

Share this post


Link to post
Share on other sites
Quote:
Original post by shadowman131
Perhaps the format of them is undefined, but there are definitely valid bytes in it, particularly (as I am interested in) if all of its members are fixed-size classes or primatives.


Yes, these are valid bytes, which may be read by the program. The problem, however, is the reverse transformation. So (now discussing at the machine level, not the C++ level) you copy the bytes found at the object's memory location to another location. Why would that be sufficient to generate a fully working C++ object at the new location? The compiler is free to put object state information outside the object (and most compilers do, for instance in the form of a virtual table), so you will need to find a way to locate and alter the external representation so that the object works at the new location—this is assuming that you have write-access to that representation in the first place—and will involve a lot of reading compiler documentation (in those case where the behaviour is actually documented at all).

In short, your array class would probably have a nice label indicating "this class is only guaranteed to work on version Y of compiler X with options Z—use at your own risk on another compiler" which may or may not be acceptable to you.

Of course, that's ignoring the fact that explicitly invoking undefined behaviour is akin to a lemming run, if you don't have complete control over the compiler—that is, if you wrote the compiler, you may invoke undefined behaviour when you know what'll happen, otherwise you can't expect anything...

Share this post


Link to post
Share on other sites
*sigh* well, personally I think better in code. I urge you all to compile/look at this and see what I mean. I'm sure it does a better job explaining than me.

Oh, as far as derived classes - you're quite right on that, it's not safe to copy any sort of type information. So, from here on, let's just assume that the templated class T is not derived. Ok? There are still plenty of non-POD (maybe I'm mistaken about the definition of POD) classes that are not derived.



#include <malloc.h>

template<typename T>
class custom_vector
{
public:
/**Constructor. Does nothing but set the storage to uninitialized.
*/

custom_vector() : numObjects(0), maxObjects(0), storage(0) {}



/**Destruction. Calls freeStorage().
*/

~custom_vector()
{
_freeStorage();
}



/**In case the vector is, for instance, a pointer type, it may be easier
*to use a member function to access elements.
* @param index Specifies which element to return.
* @return Returns the requested array element (non-constant).
*/

T& at(size_t index)
{
return *((T*)storage + index);
}



/**In case the vector is, for instance, a pointer type, it may be easier
*to use a member function to access elements.
* @param index Specifies which element to return.
* @return Returns the requested array element (constant).
*/

const T& at(size_t index) const
{
return *((T*)storage + index);
}



/** @return Returns the requested array element (non-constant).
*/

T& operator[](size_t index)
{
return *((T*)storage + index);
}



/** @return Returns the requested array element (constant).
*/

const T& operator[](size_t index) const
{
return *((T*)storage + index);
}



/** @return Reports the number of objects in this vector.
*/

size_t size() const
{
return numObjects;
}



/**Adds a new element to the end of the list.
*Note that this invokes a copy constructor.
* @param object Object to add.
*/

void push_back(const T& object)
{
numObjects++;
_checkStorage();
new((T*)storage + numObjects - 1) T(object);
}



/**Without parameters, push_back adds a default construction of the object.
*This avoids a copy constructor, assuming default construction parameters
*are acceptable.
*/

void push_back()
{
numObjects++;
_checkStorage();
new((T*)storage + numObjects - 1) T();
}



/**Removes and deletes the final element in storage.
*/

void pop_back()
{
numObjects--;
((T*)storage + numObjects)->~T();
}



/**Exchanges two elements. This function is provided to help avoid copy
*constructors (for a temporary object).
* @param element1 First element to be exchanged
* @param element2 Element to exchange with the first
*/

void exchange(size_t element1, size_t element2)
{
char exchangeSpace[sizeof(T)];
T* element1Ptr = (T*)storage + element1;
T* element2Ptr = (T*)storage + element2;
memcpy(exchangeSpace, element1Ptr, sizeof(T));
memcpy(element1Ptr, element2Ptr, sizeof(T));
memcpy(element2Ptr, exchangeSpace, sizeof(T));
}



/**Adjusts the storage capacity to the requested size. If smaller than the
*current array, objects will be deleted off of the end down to the new
*size. If larger than existing array, new (default) objects will be
*created to fill the new space.
*/

void resize(size_t newSize)
{
if (numObjects == newSize)
return;
else if (numObjects > newSize) {
T* freeObject = (T*)storage + numObjects - 1;
while (numObjects > newSize) {
freeObject->~T();
freeObject--;
numObjects--;
}
_resizeStorage(newSize);
}
else { //numObjects < newSize
_resizeStorage(newSize);
T* newObject = (T*)storage + numObjects;
while (numObjects < newSize) {
new(newObject++) T();
numObjects++;
}
}
}



private:
/**Asserts that storage is large enough to contain all of our objects.
*Note that functions adding to storage size call this after incrementing
*numObjects.
*/

void _checkStorage()
{
if (numObjects > maxObjects) {
if (!storage) {
const size_t growth = 10;

storage = (char*)malloc(sizeof(T) * growth);
maxObjects = growth;
}
else {
const size_t growth = numObjects + (numObjects >> 1) + 10;

storage = (char*)realloc(storage,
sizeof(T) * growth);
maxObjects = growth;
}
}
}



/**Resizes the storage array without touching the object count.
*Does update maxObjects, however.
* @param count The new size of the storage array.
*/

void _resizeStorage(size_t count)
{
if (storage) {
storage = (char*)realloc(storage, sizeof(T) * count);
}
else
storage = (char*)malloc(sizeof(T) * count);
maxObjects = count;
}



/**Frees the storage array.
*/

void _freeStorage()
{
for (size_t i = 0; i < numObjects; i++) {
((T*)storage + i)->~T();
}
free(storage);
}



//Our actual data storage. Stored as a char* for easy addressing.
char* storage;

//Size of the storage
size_t numObjects;

//Maximum size of the storage
size_t maxObjects;
};



Then, for our std::string arguers:

#include <string>
void main()
{
custom_vector<std::string> myVec;
for (int i = 0; i < 10000; i++) {
myVec.push_back(std::vector("Hello bob. . . "));
}
//put a break point if you want; they're all valid, separate
//strings. Only 10000 constructors were called, and 10000
//copy constructors.
}



Oh, and for a separate issue . . . is using size_t bad? I think it is, but not sure on that one. I grabbed it because it's well defined as the native unsigned integer type, unless, again, I'm mistaken.

-Walt

Share this post


Link to post
Share on other sites
Well for one thing, your class isn't exception safe. For another thing, why write all that yourself when you can just use a standard library implementation that allows you to specify type traits for classes you know have trivial copy and assign? e.g.: STLPort with it's __type_traits class.

Share this post


Link to post
Share on other sites
Quote:

Then, for our std::string arguers:

This does mean you were phrasing things incorrectly. memcpy() is not calling any constructors or destructors; rather the implementation you have, that happens to use memcpy, is calling the number of constructors and destructors you are expecting.

Quote:

(maybe I'm mistaken about the definition of POD)

POD type. These are the only types that your container may legally hold.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
... that allows you to specify type traits for classes you know have trivial copy and assign? e.g.: STLPort with it's __type_traits class.


But the copy and assign operators don't have to be trivial for this class to work.

Quote:
Original post by jpetrie
POD type. These are the only types that your container may legally hold.


Then why does it hold classes with constructors and assignment operators properly?

Share this post


Link to post
Share on other sites
Quote:
Original post by shadowman131
Oh, as far as derived classes - you're quite right on that, it's not safe to copy any sort of type information. So, from here on, let's just assume that the templated class T is not derived. Ok? There are still plenty of non-POD (maybe I'm mistaken about the definition of POD) classes that are not derived.


So, we've removed derived classes and virtual methods... we've also removed objects which have an important copy constructor, because those wouldn't work by definition...

There's not a lot of non-POD objects left to consider, now [smile]

While we're at it:
Quote:
        memcpy(exchangeSpace, element1Ptr, sizeof(T)); 
memcpy(element1Ptr, element2Ptr, sizeof(T));
memcpy(element2Ptr, exchangeSpace, sizeof(T));


This code will actually be slower than using std::swap:
  • Objects which can be swapped quickly will provide a specialization of std::swap that will be faster than this approach (for instance, std::vector can be swapped around much faster than your trick above).
  • POD objects can be swapped around at least as fast, because copying them involves an inlined memory copy which is optimized by the compiler.



Quote:
Then, for our std::string arguers:


Toting around a bit of code with undefined behaviour and saying "it works" is about as sane as toting around an albino crow and saying "all crows are white". The real question is not "Does it work?": the problem with undefined behaviour isn't that it doesn't work, because it will work correctly in a vast majority of cases. The problem is that undefined behaviour can stop working because of completely unrelated changes in another part of code. To give another "it works" example, here's one:

#include <iostream>
#include <ostream>

int & get()
{
int array[2001];
return (array[1000] = 42);
}

int main()
{
int & x = get();
std::cout << x << std::endl; // Displays 42!
}


The real question is, "Can you show us it works on all platforms and compilers currently supporting the C++ language, in a way which does not depend on some unusual or unexpected constraints on the rest of the code?"

Quote:
is using size_t bad?


No, it's not bad, indices are guaranteed to fit inside. You could also have used ptrdiff_t

Share this post


Link to post
Share on other sites
Quote:
Original post by ToohrVyk
So, we've removed derived classes and virtual methods... we've also removed objects which have an important copy constructor, because those wouldn't work by definition...

There are plenty of classes left that have important copy constructors! For instance, those that have dynamically allocated memory that require a copy constructor to copy the memory. I would consider that an important copy constructor; without it the application would crash. Copy constructors work with this container, unless they require registering the class with an external object.


Quote:

While we're at it:
Quote:
        memcpy(exchangeSpace, element1Ptr, sizeof(T)); 
memcpy(element1Ptr, element2Ptr, sizeof(T));
memcpy(element2Ptr, exchangeSpace, sizeof(T));


This code will actually be slower than using std::swap:
  • Objects which can be swapped quickly will provide a specialization of std::swap that will be faster than this approach (for instance, std::vector can be swapped around much faster than your trick above).
  • POD objects can be swapped around at least as fast, because copying them involves an inlined memory copy which is optimized by the compiler.


Ah, thank you very much! [smile] That is a very good point.


Quote:

The real question is, "Can you show us it works on all platforms and compilers currently supporting the C++ language, in a way which does not depend on some unusual or unexpected constraints on the rest of the code?"


All code operates under unusual constraints, what matters is how well they are commented. For instance, not being able to use a python class as valid C code is an unusual constraint [wink].

On a more serious note, I've tried this in linux and windows; I know that's not a comprehensive set by any means, but could you maybe give me an example of a compiler where it doesn't work? Just because something is non-standard per say doesn't mean there exists a compiler on which it won't execute properly.

-Walt

Share this post


Link to post
Share on other sites
Quote:

Then why does it hold classes with constructors and assignment operators properly?

It doesn't. It just appears to; see ToohrVyk's crow analogy. It is still incorrect. You memcpy() a non-POD, you get undefined behavior. End of story.

Other than things like the performance issues mentioned, your vector is perfectly safe and legal -- but only for POD types (which means no constructors at all).

Quote:

Just because something is non-standard per say doesn't mean there exists a compiler on which it won't execute properly.

The problem is that there is no definition of properly. Undefined behavior is undefined behavior, no amount of locating examples where it works the way you'd like it to in particular situations will change the fact that it is bad. Don't use this custom_vector with non-POD types.

Share this post


Link to post
Share on other sites

This topic is 3778 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this