Sign in to follow this  
asdqwe

smart vector - why?

Recommended Posts

#include <vector>
#include <iostream>
using namespace std;

void main()
{
	vector<int> vec;
	vec.push_back(33);//0
	vec.push_back(66);//1
	vec.pop_back();
	cout<< vec[1];
}
Output: 66. So how does it remember there used to be a value "66" in that place? I thought popback erases the last value and then any RANDOM value would take its place. But it appears the vector remembers:)...

Share this post


Link to post
Share on other sites
The reason is simple - undefined behaviour.

Undefined can include a simple crash, subtle memory corruption or even appearing to work correctly (and more). Or lovely combinations of the above, that change when you attach a debugger. Have fun [grin]

Share this post


Link to post
Share on other sites
Quote:
Original post by Tim Ingham-Dempster
That should cause a runtime error....

No, as rip-off mentioned, it's undefined behavior. It may cause a runtime error, but it can do anything include transform your computer into a beautiful lead crystal figurine. If you want an out of bounds exception, use at() rather than operator[].

Share this post


Link to post
Share on other sites
Quote:
Original post by Rattenhirn

I hear about that a lot, but has it actually ever happened to anyone?


Don't ask....

Quote:
So how does it remember there used to be a value "66" in that place?


Because it never "forgot".

Although there's lots of talk about undefined behavior, it should be noted that in this particular case (considering actual implementation of std::vector, it isn't).

Vector is very conservative when it comes to releasing the memory. When it needs more, it usually allocates 2x the old size.

Now look at what can happen in this case (it's implementation dependant):

// X is undefined memory contents
Heap: [X][X][X][X][X][X][X][X][X][X][X]
vector<int> vec;

// Y is memory claimed by vec
Heap: [Y][X][X][X][X][X][X][X][X][X][X]

vec.push_back(33);//0
Heap: [33][X][X][X][X][X][X][X][X][X][X]

vec.push_back(66);
// oops, out of room, need to resize
// first, allocate 2x old size
// Y is allocated memory, but yet initialized
Heap: [33][Y][Y][X][X][X][X][X][X][X][X]
// next, copy old contents into new memory
Heap: [33][33][Y][X][X][X][X][X][X][X][X]
// release old memory
Heap: [X][33][Y][X][X][X][X][X][X][X][X]
// next, push 66
Heap: [X][33][66][X][X][X][X][X][X][X][X]

vec.pop_back();
// Vector hasn't re-allocated memory
// so the old heap contents are still valid
// and *defined*
Heap: [X][33][66][X][X][X][X][X][X][X][X]
0 1


Given actual implementation of std::vector, this is not undefined behavior from C++ perspective. vec[1] is valid memory location.

Conceptually, this is the same as saying:
int a, b, c;
bool allow_use_of_b;
if (allow_use_of_b) cout << b;
In above case, we may choose to say that b isn't valid, but from language perspective, b doesn't change, and remains valid.

At this point it should be pointed out that many a topics have been started about "ZOMG STL IS SLOOOW!!!!!". The first thing everyone does at that point is turn off checked iterators.

Duh!

The whole reason that performance hog exists is to check for this type of scenario. Checked iterators and secure SCL exist not to prevent undefined behavior, but to check for undesired and almost certainly invalid *defined* behavior.

Note that this type of problem can trivially occur even in managed languages. This is purely a logic error with regard to implementation of std::vector, where subscript operator does not check bounds - managed languages do (ZOMG C# IS SLOW, why doesn't CLR remove redundant bounds checks!!!!).

But it is not a buffer overrun, nor undefined behavior (sometimes, but in case of vector rarely, can be).

In managed languages, people often implement a list, but make a small mistake.
void List::remove(int i) {
size--;
contents[i] = contents[size];
};
All fine and dandy. But what happens if List manages objects? contents[size] still references the object and remains alive. In managed languages, this would be a memory leak.

So from functional perspective, this type of behavior is closer to memory leak in managed languages than it is to buffer overrun.

See here (although article is mostly advertisement for a certain product, it was this type of problem).

The solution to above is obviously:
void List::remove(int i) {
size--;
contents[i] = contents[size];
contents[size] = null;
};
In case of OP, there is no 'null' value that could be written, and leaving original value, or writing a random one wouldn't matter.

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
Although there's lots of talk about undefined behavior, it should be noted that in this particular case (considering actual implementation of std::vector, it isn't).


No, this is undefined. The operational semantics of v[n] on a vector v is *(v.begin() + n). If n is greater than or equal to the size of v, then you're dereferencing an iterator at end() or past end(). This is undefined behavior. See sections 23 and 24 in the C++ Standard, in particular section 23.1.1 paragraph 12.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Quote:
Original post by Antheus
Although there's lots of talk about undefined behavior, it should be noted that in this particular case (considering actual implementation of std::vector, it isn't).


No, this is undefined. The operational semantics of v[n] on a vector v is *(v.begin() + n). If n is greater than or equal to the size of v, then you're dereferencing an iterator at end() or past end(). This is undefined behavior. See sections 23 and 24 in the C++ Standard, in particular section 23.1.1 paragraph 12.


As it turns out, it depends on the type stored in vector. In case of MSVC, when a value is removed from a vector, _Destroy will be called on it.

Unfortunately, _Destroy doesn't do anything in case of basic types. End result is quite simply that the memory location does remain valid, so does the value.

I agree that it's undefined per standard. But like I said, the end result is unfortunately not undefined behavior in case of types for which (in this case) _Destroy doesn't do anything.

In some cases, algorithms were designed which took into consideration such implicit life-span of values in memory. While fragile in nature, the algorithm itself could be made to work in a completely deterministic manner.

I consider this a much more serious problem than typical invalid memory references, since the point at which such value goes from valid reference to invalid memory location is deterministic, and may never occur. It may even be safely, and to a degree portably exploited.

I would also consider this to be very representative of what makes C++ such problematic language. This is one of those corner cases which just calls for errors. Again, I'm pragmatic about what C++ standard means and am more concerned with what one encounters in practice.

As a side note - memory pools and pre-allocated storage deliberately rely on exactly the same behavior to avoid crashes. When dealing with lock-less structures, this type of references are almost required to avoid access to unallocated memory, since you never know if anyone is still touching the memory that you wish to release (a condition which cannot be practically checked without waiting, defeating the purpose of lock-less algorithms).

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
As it turns out, it depends on the type stored in vector.

No, it doesn't. Even on MSVC if you compile things one way it may "work" like you describe, or it could cause a runtime error if you compile things another way. This is the essence of undefined behavior: it can do anything. This is both a matter of standardization: the standard says that it's undefined, and a matter of practice: indexing outside the bounds of a vector, even for a POD type, can have widely different behavior depending on your compiler, operating system or standard library implementation.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Quote:
Original post by Antheus
As it turns out, it depends on the type stored in vector.

No, it doesn't. Even on MSVC if you compile things one way it may "work" like you describe, or it could cause a runtime error if you compile things another way. This is the essence of undefined behavior: it can do anything. This is both a matter of standardization: the standard says that it's undefined, and a matter of practice: indexing outside the bounds of a vector, even for a POD type, can have widely different behavior depending on your compiler, operating system or standard library implementation.

Phases of the moon, when the last election was, the results of the last poll, the number of times your computer has crashed, how much wood a woodchuck could chuck if a woodchuck could chuck wood.

As SiCrane has said (and I hope I've demonstrated above) the behavior of undefined behavior is undefined. An implementation is allowed to perform implementation defined behavior upon undefined behavior, however in the general case that remains undefined behavior.

Share this post


Link to post
Share on other sites
Quote:
Original post by Washu
the behavior of undefined behavior is undefined

Does behavior really have behavior? I'd say the behavior of something that has undefined behavior is undefined.

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
Quote:
Original post by SiCrane
Quote:
Original post by Antheus
Although there's lots of talk about undefined behavior, it should be noted that in this particular case (considering actual implementation of std::vector, it isn't).


No, this is undefined. The operational semantics of v[n] on a vector v is *(v.begin() + n). If n is greater than or equal to the size of v, then you're dereferencing an iterator at end() or past end(). This is undefined behavior. See sections 23 and 24 in the C++ Standard, in particular section 23.1.1 paragraph 12.


As it turns out, it depends on the type stored in vector. In case of MSVC, when a value is removed from a vector, _Destroy will be called on it.

Unfortunately, _Destroy doesn't do anything in case of basic types. End result is quite simply that the memory location does remain valid, so does the value.


Irrelevant -- both due to the fact that it's an implementation detail, and the fact that said implementation detail also has no bearing on the implementation of v[n]. In the case, again, of MSVC (2008), when you try to access such "removed" elements, taking the originally posted example in debug mode...

We get a debug assertion failure. This is perfectly legal behavior under "undefined behavior", and the exact kind of thing that causes us to be completely against labeling implementation specific deterministic behavior as "defined behavior".

Quote:
I would also consider this to be very representative of what makes C++ such problematic language. This is one of those corner cases which just calls for errors. Again, I'm pragmatic about what C++ standard means and am more concerned with what one encounters in practice.

That's great and all, but there's a surprising number of times that seemingly... harmless instances of UB can and will turn around and cause problems. Since I am not omniscient, capable of seeing every possible problem caused by even those cases of UB, prudence dictates I try to avoid all cases of UB, given how easy it is to do so once you've identified it.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this