Invalid and dangerous pointer casting

This topic is 4595 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

I just posted about this in my dev journal, but I really want an answer to this, so I figured I'd post about it as well. Directly quoted from journal: I'm playing around with pointers and what can go wrong with them. I've been using C++ for a while, so I understand them fairly well. I'm trying to understand how classes are represented in memory though, and exactly what happens when you cast a pointer to an invalid type. For example:
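#include <cstdio>

class a {
public:
	a() {}
	virtual ~a() {}

	const char* getName() { return "Hi! I'm a"; }
};

class c {
public:
	c() { num = 5; }
	virtual ~c() {}

	int getNumber() { return num; }

private:
	int num;
};

int main() {
	c* cInstance = new c;
	// Deliberately invalid: c is unrelated to a, so calling through this cast is undefined behavior.
	printf("%s\n", ((a*)cInstance)->getName());
	delete cInstance;
	return 0;
}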



*Somehow* this correctly prints out "Hi! I'm a". How in the world is that possible? I know that it's sometimes possible that memory happens to map out correctly and it seems like everything is OK, even though something very dangerous happened. But in this case, how is it ever calling code that prints that out when we are dealing with an instantiated "c" class? I thought that it would try to execute memory that is part of the "c" class, who knows what memory, and it would probably crash. But how it ever calls code from class "a", especially when we never even instantiate an object of type "a", is beyond me. Can anyone help? I'm using VS .NET 2003.

My guess is that it has something to do with the fact that class a doesn't have any data members. a's function just returns a string literal, which has nothing to do with any data members. The big problem with casting between unrelated types is that the member sizes and offsets can be wrong, which is baaaad.

Quote:
*Somehow* this correctly prints out "Hi! I'm a". How in the world is that possible? I know that it's sometimes possible that memory happens to map out correctly and it seems like everything is OK, even though something very dangerous happened. But in this case, how is it ever calling code that prints that out when we are dealing with an instantiated "c" class? I thought that it would try to execute memory that is part of the "c" class, who knows what memory, and it would probably crash. But how it ever calls code from class "a", especially when we never even instantiate an object of type "a", is beyond me. Can anyone help? I'm using VS .NET 2003.


Nothing really magical about it. What happens, in effect, is that a::getName is called with an incorrect this pointer. It'd be the same as if I typed:

a * example = reinterpret_cast<a *>(0x12345678);	// an integer needs an explicit cast to become a pointer
example->getName();


Now, underneath the hood, here's what this looks like:

// Conceptual only: "this" is really an implicit parameter that the compiler supplies.
const char * getName( a * this )
{
	return "Hi! I'm a";
}


Notice that the this pointer is never used. We never try to access the memory, so in practice nothing bad happens (although this is a horrible programming technique, and formally it is still undefined behavior). Try it again with class a defined like so:

class a
{
	const char * message;
public:
	a( void )
	{
		message = "Hi! I'm a";
	}
	virtual ~a() {}	// kept from the original class, so the object still starts with a vtable pointer
	const char * getName() { return message; }
};


This time, the function underneath the hood looks more like so:

// Conceptual only: the member access goes through the implicit "this" parameter.
const char * getName( a * this )
{
	return this->message;
}


This time, the this pointer is used. Going back to the example = 0x12345678 example, the function will try to read the memory at 0x1234567C, 4 bytes past the start of the object (on a 32-bit build, the first 4 bytes are typically where the pointer to the virtual function table is kept). This will most likely result in the program dying a nasty death, since that address probably hasn't been allocated by the program.
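
To make the "explicit this" picture concrete without invoking undefined behavior, here is a minimal sketch that writes both versions as ordinary free functions. The names Greeter, nameWithoutState, nameWithState and demoExplicitSelf are made up for illustration. Unlike a real member call through a bad pointer, passing a null pointer to a free function that never dereferences it is perfectly legal, which is why this version is safe to run.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for class a, written as a plain struct.
struct Greeter {
    const char* message;
};

// Models a member function that ignores its object: self is never read.
const char* nameWithoutState(const Greeter* /*self*/) {
    return "Hi! I'm a";
}

// Models a member function that reads a data member through self.
const char* nameWithState(const Greeter* self) {
    assert(self != nullptr);   // this version genuinely needs a real object
    return self->message;
}

// Exercises both functions using only well-defined calls.
void demoExplicitSelf() {
    Greeter g{"Hi! I'm a"};
    // A null argument is fine here because the function never touches it.
    std::printf("%s\n", nameWithoutState(nullptr));
    // This one dereferences its argument, so it must point at a live Greeter.
    std::printf("%s\n", nameWithState(&g));
}
```

Calling demoExplicitSelf() from main prints the greeting twice. The second function is the one that would go wrong if it were handed the address of some unrelated object.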

Classes in memory are laid out pretty simply, at least on typical compilers (the details are implementation-specific). If the class contains any virtual functions (inherited or otherwise), there is a pointer to a virtual function table, which is used to look up the correct function to call when the compiler can't determine the object's type at compile time. Then, one after another, come the memory spaces for all the member variables.
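
A rough way to see the hidden vtable pointer is to compare the size of a class with and without a virtual function. This is only a sketch: PlainNumber, VirtualNumber and printLayout are hypothetical names, and the exact sizes depend on the compiler and on whether the build is 32-bit or 64-bit, so the code prints them rather than assuming particular values.

```cpp
#include <cassert>
#include <cstdio>
#include <type_traits>

// No virtual functions: the object is just its int.
struct PlainNumber {
    int num;
};

// One virtual function, like class c in the thread.
struct VirtualNumber {
    virtual ~VirtualNumber() {}
    int num;
};

// Prints the sizes so the extra per-object bookkeeping is visible.
void printLayout() {
    std::printf("sizeof(PlainNumber)   = %zu\n", sizeof(PlainNumber));
    std::printf("sizeof(VirtualNumber) = %zu\n", sizeof(VirtualNumber));
    std::printf("sizeof(void*)         = %zu\n", sizeof(void*));
    // On mainstream compilers the polymorphic class is larger because each
    // object carries a hidden pointer; the standard itself does not promise this.
    assert(sizeof(VirtualNumber) > sizeof(PlainNumber));
}
```

On the usual implementations the difference is at least one pointer, plus any padding needed for alignment.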

String literals are stored "in code", meaning that when the compiler organized everything, the string literal was placed outside of any path of execution but still within the executable image (typically in a read-only data section alongside the code). So when you create a new object, which resides on the heap, getName still returns a pointer into that static block where the string literal lives, not into the object. Just as an experiment, create three objects of the same class and compare the pointers getName returns for each; they should all be the same.
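
The experiment could look something like the sketch below. Named, namesShareStorage and nameAfterObjectIsGone are hypothetical names. Strictly speaking the standard leaves it unspecified whether repeated evaluations of a literal give the same address, so the first function prints and reports the comparison instead of assuming it. What is guaranteed is that the literal has static storage duration, which the second function relies on.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical class mirroring the shape of class a from the thread.
class Named {
public:
    const char* getName() const { return "Hi! I'm a"; }
};

// Prints the address getName() returns for three separate objects and
// reports whether all three match.
bool namesShareStorage() {
    Named first, second, third;
    const char* p1 = first.getName();
    const char* p2 = second.getName();
    const char* p3 = third.getName();
    std::printf("%p %p %p\n", static_cast<const void*>(p1),
                static_cast<const void*>(p2), static_cast<const void*>(p3));
    return p1 == p2 && p2 == p3;
}

// Returns a name obtained from an object that no longer exists by the time
// the caller sees the pointer. This is safe because the literal is not
// stored inside the object.
const char* nameAfterObjectIsGone() {
    const char* p = nullptr;
    {
        Named temporary;
        p = temporary.getName();
    }   // temporary is destroyed here; the literal is not
    assert(p != nullptr);
    return p;
}
```

If the three addresses match, that supports the point that the string lives with the program image rather than with any particular instance.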

Oops, I misinterpreted the meaning of your code.

Well, my prior statement still holds. With objects, calling a member function via the instance's identifier is almost exactly the same as this:

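#include <cstdio>

class bob
{
public:
	int member_data;
};

// A member function is essentially this: a free function that takes the object pointer.
int bob_getdata( class bob * p )
{
	return p->member_data;
}

int main()
{
	class bob x;
	x.member_data = 42;	// initialize before reading
	printf("%i\n", bob_getdata(&x));
	return 0;
}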

So, the function you called didn't use the pointer, but rather gave you the string literal address like I mentioned in my last post, and so you didn't get an invalid memory access crash.

Quote:
Original post by okonomiyaki
But how it ever calls code from class "a", especially when we never even instantiate an object of type "a", it beyond me.

your string has static storage (it isn't stored in any object), therefore you didn't need any instances. you're simply calling a function (in a very scary way..) that returns a pointer to it.

that casting made me cringe.

Great, thanks a lot guys, especially MaulingMonkey. That's what I was wondering- exactly how classes and member functions are translated. Makes a lot of sense now. Good to know how things are compiled.

Quote:

String Literals are stored "in code" meaning that when the compiler organized everything....


Yep, that's true, so MaulingMonkey's example should still actually work, but he got the point across.

I think one of the best bugs is:


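char* s = "1";		// s points at a string literal, which must not be modified
sprintf(s, "2");	// undefined behavior: writes into the literal's storage
printf("1");		// may print "2" if the compiler pooled both "1" literals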

Edit: well, the program actually crashed, but I know you can sometimes write over a string literal, which changes every other place that literal is used.
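
For contrast, a well-defined version of the same idea copies the literal into a writable array first. This is just a sketch, and overwriteWritableCopy is a made-up name. The array is a private copy, so overwriting it cannot affect any other use of the literal "1".

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Overwrites a private, writable copy of "1" and reports what it holds afterwards.
// Returns true when the copy reads "2".
bool overwriteWritableCopy() {
    char s[] = "1";            // array initialized from the literal: our own storage
    assert(sizeof s == 2);     // one character plus the terminating null
    // Bounded write into the array instead of into the literal itself.
    int written = std::snprintf(s, sizeof s, "%s", "2");
    return written == 1 && std::strcmp(s, "2") == 0;
}
```

Declaring the pointer version as const char* instead of char* also lets the compiler reject the accidental write at compile time.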

Edit2: You all responded while I was writing [smile] Inmate, yeah, you are exactly right, thanks for the help! I understand now.
gumpy, it was done purposefully for learning purposes [grin]

Quote:
Original post by okonomiyaki
Great, thanks a lot guys, especially MaulingMonkey. That's what I was wondering- exactly how classes and member functions are translated. Makes a lot of sense now. Good to know how things are compiled.

Quote:

String Literals are stored "in code" meaning that when the compiler organized everything....


Yep, that's true, so MaulingMonkey's example should still actually work, but he got the point across.


Well, it'd work in that it'd interpret a random chunk of the c object as a pointer to some characters and start spitting out all the data starting at that point until it hit a null. It wouldn't work in the sense that the variable message would never be initialized to point at the real message, and thus the result would likely be gibberish or a segfault or similar.

Quote:
Original post by MaulingMonkey
Well, it'd work in that it'd interpret a random chunk of the c object as a pointer to some characters and start spitting out all the data starting at that point until it hit a null. It wouldn't work in the sense that the variable message would never be initialized to point at the real message, and thus the result would likely be gibberish or a segfault or similar.


Oh wait, you're right. I was thinking that it would still find the string literal and return it, but it has to find the literal from the address stored in "message", and it would try to access the class, so yeah, it would still break.

Didn't mean to say it would still work in that it would read the C class as a string!

One thing that threw me for a bit: if getName were a virtual function, the call would have to go through the vtable, and since the object is really a c, its vtable pointer refers to c's table, not a's. Here, though, only the destructors are virtual. getName is non-virtual, so the compiler picks a::getName from the static type a at compile time and never looks inside the object.
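
The difference between the two kinds of call can be seen with perfectly legal code. The sketch below uses hypothetical names (Base, Derived, plainName, virtualName, showDispatch) and an ordinary, valid upcast rather than the invalid cast from the thread.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

struct Base {
    virtual ~Base() {}
    // Non-virtual: the call is bound using the static type of the pointer.
    const char* plainName() const { return "Base"; }
    // Virtual: the call is looked up through the object's vtable at run time.
    virtual const char* virtualName() const { return "Base"; }
};

struct Derived : Base {
    const char* plainName() const { return "Derived"; }              // hides Base::plainName
    const char* virtualName() const override { return "Derived"; }   // overrides Base::virtualName
};

// Calls both functions through a Base pointer that really points at a Derived.
void showDispatch() {
    Derived d;
    const Base* p = &d;   // valid upcast
    std::printf("plainName:   %s\n", p->plainName());     // prints "Base"
    std::printf("virtualName: %s\n", p->virtualName());   // prints "Derived"
    assert(std::strcmp(p->plainName(), "Base") == 0);
}
```

The non-virtual call never consults the object, which is why the original cast appeared to work. A virtual call would have depended on whatever the c object's vtable pointer happened to reference.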

Change it to this:
class b : virtual public a
{
};

c* cInstance = new c;
printf("%s\n", reinterpret_cast<b*>(cInstance)->getName());

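And it ought to blow up, or at least hand getName a garbage this pointer: reaching the virtual base a through a b* means reading a base offset out of the object, and a c object holds no such data.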
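
If the goal is to find out safely whether an object can be treated as an a, a checked cast is one option. The sketch below reuses a and c from the thread (both are polymorphic thanks to their virtual destructors) and adds two hypothetical names, ca and nameOrFallback. It assumes RTTI is enabled, which is the default on most compilers.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

class a {
public:
    virtual ~a() {}
    const char* getName() { return "Hi! I'm a"; }
};

class c {
public:
    c() : num(5) {}
    virtual ~c() {}
    int getNumber() { return num; }
private:
    int num;
};

// Hypothetical class that really is both a c and an a.
class ca : public c, public a {};

// Returns the name if the object actually contains an a, otherwise a fallback string.
const char* nameOrFallback(c* object) {
    assert(object != nullptr);
    // dynamic_cast checks the real type at run time and yields a null pointer
    // when the object is not an a, instead of reinterpreting the memory.
    a* asA = dynamic_cast<a*>(object);
    return asA ? asA->getName() : "(not an a)";
}
```

Passing a plain c yields the fallback string, while passing a ca yields the greeting, since that object genuinely has an a part.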
