fpsgamer

[C++] "Casting object addresses to char* ... almost always yields undefined behavior"

I was reading an Item in Effective C++ and I came across a quote:
    "... casting object addresses to char* pointers and then performing pointer arithmetic on them almost always yields undefined behavior"
It seems to me that, more often than not, that sort of code yields perfectly well-defined behavior as long as the code is sensible. Consider this:
    class Foo;
    ...
    Foo* foo_ptr = new Foo;
    char* char_ptr = reinterpret_cast<char*>(foo_ptr);
We know by definition that sizeof(char) is always 1. We also know that a Foo occupies sizeof(Foo) bytes, so we may advance char_ptr up to sizeof(Foo) times. What caveats am I failing to see? Is there some funny business regarding virtual functions and vtables, memory layout, etc.?
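As a rough illustration of the case the question has in mind, here is a minimal sketch that only reads the bytes of a single object and never steps outside it. The Foo definition and the dump_bytes helper are made up for this example; the question never shows what Foo contains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for the Foo in the question (two ints, no virtuals).
struct Foo {
    int id;
    int count;
};

// Prints each byte of one Foo in hex and returns how many bytes were visited.
// Inspecting an object's bytes through unsigned char* is permitted; the loop
// stays within the sizeof(Foo) bytes of this one object.
std::size_t dump_bytes(const Foo& foo) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&foo);
    std::size_t visited = 0;
    for (std::size_t i = 0; i < sizeof(Foo); ++i) {
        std::printf("%02x ", static_cast<unsigned>(p[i]));
        ++visited;
    }
    std::printf("\n");
    assert(visited == sizeof(Foo));
    return visited;
}
```

Calling dump_bytes on an initialized Foo prints sizeof(Foo) hex pairs. What those bytes mean (byte order, any padding) is up to the implementation, which is where the caveats discussed in the replies come in.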

What he is getting at is that starting from something like:

class foo
{
public:
    virtual void stuff() {}
    int x;
};
class bar : public foo { public: int y; };

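this code is incorrect:

bar b[20];
foo *f = &b[0];
f[1].x = 10;              // steps by sizeof(foo), not sizeof(bar)
cout << b[1].x << endl;   // b[1].x was never written

likewise, had I done

bar b[20];
char *c = (char *)&b[0];
c += sizeof(foo);         // still inside b[0], not at the start of b[1]
((foo *)c)->x = 10;
cout << b[1].x << endl;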

I still didn't edit the right memory location, and I get garbage. (And in VS2008 I can look in the debugger and see that both times I just stomped the vtable pointer for b[1].)

The point of that statement is that, given a pointer, you can't make assumptions about what it points to. Unless you know that you have a pointer to a POD type, you could randomly trash data by manipulating it. That stands for any cast; the only point of saying "to char *" specifically is to emphasize that when you decide to throw out all type information about your pointers, you break things.
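To make the stride mismatch concrete, here is a small sketch under a few assumptions: foo and bar are redeclared as structs with default member initializers so the snippet compiles on its own, and bar_stride and element_as_foo are hypothetical helper names. It measures how far apart consecutive bar elements really are and shows the safe way to reach the foo part of element i.

```cpp
#include <cassert>
#include <cstddef>

struct foo {
    virtual ~foo() {}
    int x = 0;
};

struct bar : foo {
    int y = 0;
};

// Byte distance between two consecutive elements of a bar array.
// This is the stride a bar* uses; a foo* would step by sizeof(foo) instead.
std::ptrdiff_t bar_stride() {
    bar b[2];
    const char* first  = reinterpret_cast<const char*>(&b[0]);
    const char* second = reinterpret_cast<const char*>(&b[1]);
    return second - first;
}

// Index with the array's real element type first, then convert to the base.
foo* element_as_foo(bar* array, std::size_t i) {
    assert(array != nullptr);
    return &array[i];   // implicit derived-to-base conversion
}
```

On typical compilers sizeof(bar) is larger than sizeof(foo), so indexing through a foo* lands in the middle of an element. Going through element_as_foo(b, 1)->x writes to b[1].x as intended.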

Quote:
The point of that statement is that, given a pointer, you can't make assumptions about what it points to. Unless you know that you have a pointer to a POD type, you could randomly trash data by manipulating it. That stands for any cast; the only point of saying "to char *" specifically is to emphasize that when you decide to throw out all type information about your pointers, you break things.
I agree with that statement, although I don't think you've given good examples. Of course advancing a pointer into an array of X by sizeof(Y) bytes, where sizeof(X) != sizeof(Y), results in an error. The real concern with pointer arithmetic after converting to char* is that you generally can't predict where members live, so you can't do things like
struct A
{
short sh;
char byte;
int n;
};

A a;
char* p = (char*)&a;
p += sizeof( short ) + sizeof( char );
*((int*)p) = 5; // We're probably not assigning to n
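A short sketch of why the sum of member sizes is the wrong offset, and what to use instead. The helper names (naive_offset_of_n, real_offset_of_n, points_at_n) are invented for this example; offsetof comes from <cstddef> and is valid here because A is a plain standard-layout struct.

```cpp
#include <cassert>
#include <cstddef>

struct A {
    short sh;
    char  byte;
    int   n;
};

// The guess made in the snippet above: just add up the earlier members.
std::size_t naive_offset_of_n() {
    return sizeof(short) + sizeof(char);
}

// The offset the compiler actually chose, including any padding before n.
std::size_t real_offset_of_n() {
    return offsetof(A, n);
}

// True if stepping `offset` bytes from the start of `a` reaches a.n.
bool points_at_n(A& a, std::size_t offset) {
    assert(offset <= sizeof(A));
    char* p = reinterpret_cast<char*>(&a) + offset;
    return p == reinterpret_cast<char*>(&a.n);
}
```

With typical alignment rules the compiler pads after byte so that n starts on an int boundary, which makes the naive offset too small. points_at_n(a, real_offset_of_n()) holds regardless of how much padding was inserted.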

Quote:
Original post by thedustbustr
I can't find the reference in Effective C++ Vol 3, could you look it up for me?


Yes, I don't remember that one being in the latest version (the one I own). Maybe it was one of the ones he removed, which would show that the item may not be important / valid.

Quote:
Original post by Mike.Popoloski
Quote:
Original post by thedustbustr
I can't find the reference in Effective C++ Vol 3, could you look it up for me?


Yes, I don't remember that one being in the latest version (the one I own). Maybe it was one of the ones he removed, which would show that the item may not be important / valid.


I was not citing the title of an item; rather, I was citing a line within one of the items, namely "Item 27: Minimize casting" (page 116).

That quote confused me because, for example, casting to char* is pretty much the only way to write objects to disk, isn't it?

Quote:
Original post by fpsgamer

That quote confused me because, for example, casting to char* is pretty much the only way to write objects to disk, isn't it?


No, and it never has been for non-POD types. In that item he has just explained that a pointer to a derived object and a pointer to its base-class subobject do not necessarily hold the same address. On top of that you have vtables (as pointed out in this thread as well), structure padding, and so on. The only thing the standard guarantees in this area is for POD types: the bytes of an object can be copied into a block of memory of the same size (or bigger) and then copied back into the object, leaving it with the same value it had before the operation.
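The copy-out, copy-back guarantee for POD types can be sketched as follows. Pod and round_trip are hypothetical names for this example, and the static_assert uses the C++11 trait std::is_trivially_copyable as the closest compile-time check for "safe to memcpy".

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical POD type used only for this illustration.
struct Pod {
    int    id;
    double value;
};

static_assert(std::is_trivially_copyable<Pod>::value,
              "byte-wise copying is only guaranteed for trivially copyable types");

// Copies the object's bytes into a raw buffer and back into a fresh object.
Pod round_trip(const Pod& source) {
    unsigned char buffer[sizeof(Pod)];
    std::memcpy(buffer, &source, sizeof(Pod));

    Pod restored;
    std::memcpy(&restored, buffer, sizeof(Pod));
    assert(restored.id == source.id);
    return restored;
}
```

Writing that buffer to disk and reading it back with the same compiler, settings, and platform falls under the same reasoning. Reading it with a different compiler or on a different architecture does not, because padding, member sizes, and byte order may differ.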

If you know which compiler you are using and its characteristics, then you can do some non-standard shenanigans.

Any use of 'reinterpret_cast' involves at least implementation-defined behavior. The only specified behavior is that casting to a different type and then casting back to the first type must, in many cases, give back the original value. Other than that, the behavior is unspecified.

In other words, any useful code that uses reinterpret_cast is making assumptions about the implementation it is being compiled under.
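The one round-trip guarantee mentioned above looks roughly like this. Widget and round_trip_pointer are hypothetical names; the sketch relies on char having no stricter alignment requirement than Widget, which is the condition under which the cast back must yield the original pointer.

```cpp
#include <cassert>

// Hypothetical type used only for this illustration.
struct Widget {
    int value;
};

// Casts a Widget* to char* and back again. Because char has the weakest
// alignment requirement, the round trip has to return the original pointer.
Widget* round_trip_pointer(Widget* original) {
    char* as_bytes = reinterpret_cast<char*>(original);
    Widget* back = reinterpret_cast<Widget*>(as_bytes);
    assert(back == original);
    return back;
}
```

Anything done with as_bytes in between, other than looking at the bytes of that one object, is where the assumptions about the implementation start.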

@Zalhman, sorry to cause you alarm :) If it makes you feel any better I have always been knowingly ignorant on this topic, so no actual software was at any risk [grin]

Quote:
Original post by Extrarius
Any use of 'reinterpret_cast' involves at least implementation-defined behavior. The only specified behavior is that casting to a different type and then casting back to the first type must, in many cases, give back the original value. Other than that, the behavior is unspecified.

In other words, any useful code that uses reinterpret_cast is making assumptions about the implementation it is being compiled under.


I think I get what you mean.

So could it be said that (in practical cases) it can be moral to reinterpret_cast POD types, but never moral to reinterpret_cast non-POD types?

[edit]

On a somewhat related note: in C++, can basic types like int have arbitrary alignment requirements? If so, that would mean taking an array of type int and walking through it via char* could yield some unsavoury behavior.

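Usually, a 32-bit int has 4-byte alignment, a 16-bit int has 2-byte alignment, and an 8-bit int has 1-byte alignment.
E.g.

char *c = ...
int *p = (int*)c;

If c is not suitably aligned for int, doing operations on *p or any *(p + n) results in undefined behavior. It often appears to work on x86, but it can fault on some ARM targets.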
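One common way to sidestep the alignment problem is to copy the bytes into a real int instead of dereferencing a cast pointer. This is a sketch, with read_int_at as a hypothetical helper name; it assumes the buffer holds at least offset + sizeof(int) bytes that were originally written from an int on the same platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reads an int stored at an arbitrary (possibly misaligned) byte offset.
// memcpy has no alignment requirement on its source, so this is safe where
// *(int*)(bytes + offset) would not be.
int read_int_at(const char* bytes, std::size_t offset) {
    assert(bytes != nullptr);
    int value;
    std::memcpy(&value, bytes + offset, sizeof(int));
    return value;
}
```

Compilers generally turn the memcpy into a plain load on targets that tolerate unaligned access, so the safe version usually costs nothing there.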
