Why can't a derived class resolve members of its base class at compile time?

Started by
8 comments, last by SiCrane 12 years, 2 months ago
Consider this code:

class X { public: int i; };
class A : public virtual X { public: int j; };
class B : public virtual X { public: double d; };
class C : public A, public B { public: int k; };
// cannot resolve location of pa->X::i at compile-time
void foo( const A* pa ) { pa->i = 1024; }

int main() {
foo( new A );
foo( new C );
// ...
}


According to the book "Inside the C++ Object Model", the compiler cannot fix the physical offset of X::i accessed through pa within foo(), since the actual type of pa can vary with each of foo()'s invocations.
So, the compiler has to create something like this:
// possible compiler transformation
void foo( const A* pa ) { pa->__vbcX->i = 1024; }

If the object has a pointer to the virtual base class, why can't the compiler resolve the memory address of that member at compile time? As far as I know, when a derived class object is created, its memory layout consists of:

  • all members in the base class
  • a virtual pointer (of a virtual destructor)
  • a pointer to the virtual base class of the derived object
  • all of the members of the derived class object.

For example, suppose I have an object C c_object and A a_object.

This is what I think the layout of c_object looks like (suppose c_object starts at address 1000):



1000: int i; //(subobject X)
1004: int j; //(subobject A)
1008: double d; //(subobject B)
1012: __vbcX; // which is at address 1000
1016: __vbcA; // which is at address 1004
1020: __vbcB; // which is at address 1008
1024: int k;



This is what I thought from what I read anyway, please verify and correct it for me.

So, finding the base class member should simply be finding the right offset from the starting address of the derived class object. But why can't it be resolved?
First things first...

C++ does not have a "well defined" ABI, so its internals are implementation specific.

Anyway, foo won't compile as written: a pointer to a const A will not allow you to modify the integer.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

If you compile it yourself and list the assembly, you might get something like...

movq -16(%rbp), %rdi
callq __Z3fooP1A
//...
movq %rax, %rdi
callq __Z3fooP1A

For your calls to foo (after fixing the const issue), foo then looks like:


__Z3fooP1A: ## @_Z3fooP1A
Ltmp2:
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp3:
.cfi_def_cfa_offset 16
Ltmp4:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp5:
.cfi_def_cfa_register %rbp
movq %rdi, -8(%rbp)
movq -8(%rbp), %rdi
movq (%rdi), %rax
movq -24(%rax), %rax
movl $1024, (%rdi,%rax) ## imm = 0x400
popq %rbp
ret
Ltmp6:
.cfi_endproc
Leh_func_end0:


Feel free to figure it out, it's pretty straightforward. Note that this is NOT optimized code; optimizing pretty much eliminates most of it.


Thanks for your answer.

However, I learned assembly on the Motorola 68K, not x86, and that was a long time ago, so I will have to figure it out later, after learning the x86 instruction set properly. Can you explain the answer from a higher-level point of view?
That's AT&T syntax, used by clang/gcc. Intel syntax is the other popular variant, and is much easier to read. I would pull it up for you but I'm not on my Windows laptop atm.

In essence, it's a table lookup: the object's actual type selects a table, and that table holds the offset of the virtual base within the object.
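
To put that lookup in higher-level terms, here is a minimal C++ sketch that imitates by hand what the listing does. Every name in it (FakeVTable, FakeA, FakeC, foo_emulated, run_on_A, run_on_C) is made up for illustration, and the struct layouts are assumptions, not what any particular compiler emits. In the listing above the offset is read from a slot at a negative offset from the vtable address; here it is simply a field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the per-type table: it only stores the
// byte offset from the start of the object to the shared X part.
struct FakeVTable {
    std::ptrdiff_t offset_to_X;
};

// Hand-written layout for a complete "A" object.
struct FakeA {
    const FakeVTable* vptr;  // table pointer comes first
    int j;
    int x_i;                 // plays the role of X::i
};

// Hand-written layout for a complete "C" object: same prefix as FakeA,
// but the X part ends up further away.
struct FakeC {
    const FakeVTable* vptr;  // same position as in FakeA
    int j;
    double d;
    int k;
    int x_i;                 // plays the role of X::i
};

const FakeVTable table_for_A = { static_cast<std::ptrdiff_t>(offsetof(FakeA, x_i)) };
const FakeVTable table_for_C = { static_cast<std::ptrdiff_t>(offsetof(FakeC, x_i)) };

// Mirrors the three interesting instructions of the listing:
// load the table pointer, load the offset, store through base + offset.
void foo_emulated(void* pa) {
    assert(pa != nullptr);
    const FakeVTable* table = nullptr;
    std::memcpy(&table, pa, sizeof table);                       // movq (%rdi), %rax
    char* x_part = static_cast<char*>(pa) + table->offset_to_X;  // movq -24(%rax), %rax
    const int value = 1024;
    std::memcpy(x_part, &value, sizeof value);                   // movl $1024, (%rdi,%rax)
}

// Runs the emulated foo on an "A" object and returns what landed in X::i.
int run_on_A() {
    FakeA a = { &table_for_A, 0, 0 };
    foo_emulated(&a);
    return a.x_i;
}

// Same for a "C" object, whose table holds a different offset.
int run_on_C() {
    FakeC c = { &table_for_C, 0, 0.0, 0, 0 };
    foo_emulated(&c);
    return c.x_i;
}
```

The same foo_emulated body works for both objects only because it reads the offset at run time; the two tables hold different values.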


With all due respect, Washu, turning the question into an analysis of assembly language doesn't help make it easier to understand. If the concepts are straightforward to you, please simplify them for us.
[quote]This is what I thought from what I read anyway, please verify and correct it for me.[/quote]

Not for virtual or multiple inheritance. There are multiple vtables, which are resolved at run time, depending on how the object is constructed.

An A* is not sufficiently defined. While in this particular case it might be obvious, that's the exception, not the rule.

IIRC, with multiple inheritance each class should be viewed as having its own, completely separate vtable rather than sharing one across the hierarchy. There are also a ton of rules on how such classes are constructed and destructed. Consider a diamond:

  X
 / \
A   B
 \ /
  C

Given an instance of C, one can cast it to either A or B, but A and B are completely distinct types. So even though C contains both, a function that operates on an A or a B cannot rely on a fixed layout.
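
A small sketch of that diamond, assuming the X/A/B/C classes from the original question (written as structs here for brevity). The helper names are invented for this example. It checks that the A and B views of one C object start at different addresses, yet both lead to the same single X.

```cpp
#include <cassert>

struct X { int i = 0; };
struct A : virtual X { int j = 0; };
struct B : virtual X { double d = 0.0; };
struct C : A, B { int k = 0; };

// True when the A view and the B view of the same C object
// start at different addresses.
bool a_and_b_views_differ(C& c) {
    A* pa = &c;
    B* pb = &c;
    return static_cast<void*>(pa) != static_cast<void*>(pb);
}

// True when both views reach the same X subobject. Converting to a
// virtual base is the step that may need a run-time lookup.
bool both_views_share_one_x(C& c) {
    A* pa = &c;
    B* pb = &c;
    X* via_a = pa;
    X* via_b = pb;
    return via_a == via_b;
}

// Writes X::i through the A view and reads it back through the B view.
int write_via_a_read_via_b(C& c) {
    A* pa = &c;
    B* pb = &c;
    assert(pa != nullptr && pb != nullptr);
    pa->i = 1024;
    return pb->i;
}
```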

Last time I tried to comprehend it I gave up and decided that virtual multiple inheritance is one of those parts of C++ one doesn't use. It's also the reason why essentially no other language supports it: there are just too many complications.

See here.

I see. It seems that the example in the book is so obvious that it hardly makes sense. From what is written in this section of the C++ FAQ: http://www.parashift...e.html#faq-25.9, it seems there is no ambiguity in the example, since the virtual keyword eliminates the duplication that multiple inheritance would otherwise cause, so accessing a data member is straightforward.

Since you refer to vtables, I will modify the example to make it a bit clearer:


class X {
public:
    int i;
    virtual void func() {}
};
class A : public virtual X {
public:
    int j;
    virtual void func() {}
};
class B : public virtual X {
public:
    double d;
    virtual void func() {}
};
class C : public A, public B {
public:
    int k;
    virtual void func() {}
};
// cannot resolve which func() to call at compile time
void foo( A* pa ) { pa->func(); }

int main() {
    foo( new A );
    foo( new C );
    // ...
}


In this new code, each base subobject of a C object, as well as the C object itself, will have a virtual pointer to its own destructor. Which virtual destructor gets invoked is not known until runtime.

Based on the answer to this question: http://stackoverflow...tion-is-invoked, at runtime the virtual pointer of the base class subobject is replaced by that of the derived class. In this case, which func() to call cannot be determined until an actual object is passed into foo() at runtime, so the compiler cannot hard-code a function address for the call to func() in foo().

So, I think the memory layout of typical object C is:
1000: int i; //start of subobject X
1004: __vptr_func_X; //virtual pointer of func() in X, however, it is pointing to address of __vptr_func_C
1008: int j; //start of subobject A
1012: __vptr_func_A; //virtual pointer of func() in A, however, it is pointing to address of __vptr_func_C
1016: double d; //start of subobject B
1020: __vptr_func_B; //virtual pointer of func() in B, however, it is pointing to address of __vptr_func_C
1024: __vbcX; // which points to the start subobject X at address 1000
1028: __vbcA; // which points to the start subobject A at address 1008
1032: __vbcB; // which points to the start subobject B at address 1016
1036: int k;
1040: __vptr_func_C; //virtual pointer of func() in C


As for the virtual base class pointers (the __vbc entries), I'm not sure they are laid out the way I wrote them. Or maybe this is compiler specific: if I were a compiler writer, I could do it that way, or place them at the end of the C object instead, as long as the derived class object has a pointer to its virtual base class. Is that right?
[quote]Or maybe this is compiler specific[/quote]

It is.

[quote]I can actually do it that way or otherwise, place it at the end of the object C as long as I satisfy the condition of having a virtual base class pointer in the derived class object, is it right?[/quote]

I don't know exactly what the standard requires, but it does not specify the layout. That is one of many things that lead to incompatible ABIs and a lack of compatibility between different compilers, and even between different compiler settings.
OK, to understand this, you first need to understand class layouts in simpler cases. This is compiler specific, but it generally goes as follows. A class with no inheritance and no virtual members is just its data members. For example:

struct A {
int i;
int j;
};

0000 i
0004 j

For a class with virtual members you also add a pointer to the vtable. Usually this is at the beginning of the class layout, but nothing requires it to be so.

struct B {
int i;
int j;
virtual ~B() {}
};

0000 vptr to B::vtable
0004 i
0008 j
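
A quick way to see the extra pointer, as a sketch only: compare the sizes of the two structs above. The helper name extra_bytes_for_vptr is made up for this example, and the result is whatever your compiler chooses; the standard does not promise any particular number.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct A { int i; int j; };
struct B { int i; int j; virtual ~B() {} };

static_assert(!std::is_polymorphic<A>::value, "A has no virtual members");
static_assert(std::is_polymorphic<B>::value, "B has a virtual destructor");

// How many bytes the virtual destructor added to the object.
// On common implementations this is the vptr plus any padding.
std::size_t extra_bytes_for_vptr() {
    assert(sizeof(B) >= sizeof(A));
    return sizeof(B) - sizeof(A);
}
```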

A class with non-virtual single inheritance places the complete layout of the base class at the front of its layout. If the derived class has any virtual members, they become part of an extended vtable whose pointer replaces the vptr in the base class, if the base class has one. If not, a new vptr member is created in the derived part of the layout.

struct C : A {
int k;
virtual ~C() {}
};

// beginning of A subobject of C
0000 i
0004 j
0008 vptr to C::vtable
000C k

struct D : B {
int k;
virtual ~D() {}
};

// beginning of B subobject of D
0000 vptr to D::vtable
0004 i
0008 j
000C k
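
The D : B case can be checked with a few lines. This is a sketch under the assumption of a mainstream compiler; base_offset_in_D is a name invented here. The C : A case is deliberately left out, because compilers are known to differ there (some place the new vptr ahead of the A subobject instead of after it).

```cpp
#include <cassert>
#include <cstddef>

struct B { int i = 0; int j = 0; virtual ~B() {} };
struct D : B { int k = 0; virtual ~D() {} };

// Byte distance from the start of a D object to its B subobject.
// With a polymorphic base and single inheritance this is expected to be 0,
// so a D* and the corresponding B* hold the same address.
std::ptrdiff_t base_offset_in_D(D& d) {
    B* base = &d;
    assert(base != nullptr);
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(&d);
}
```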

With non-virtual multiple inheritance the base classes are generally laid out one after another and the derived class places its members after the base classes. If there are virtual functions, the derived class appends its added virtual functions to one of the base classes' vtables or creates a new vtable. Note that with multiple inheritance there can be multiple virtual function tables associated with the derived class.

struct E {
int i;
virtual ~E() {}
};

struct F {
int j;
virtual ~F() {}
};

struct G : E, F {
int k;
virtual ~G() {}
};

// beginning of E subobject of G
0000 vptr to G::vtable1
0004 i
// beginning of F subobject of G
0008 vptr to G::vtable2
000C j
0010 k
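
As a rough check of the layout above, the sketch below measures where the E and F subobjects sit inside a G. The helper names are invented for this example and the exact numbers depend on the compiler. The point is that with non-virtual bases these offsets are fixed per class, so the compiler can bake them in as constants.

```cpp
#include <cassert>
#include <cstddef>

struct E { int i = 0; virtual ~E() {} };
struct F { int j = 0; virtual ~F() {} };
struct G : E, F { int k = 0; virtual ~G() {} };

// Byte distance from the start of g to its E subobject.
std::ptrdiff_t offset_of_E_in_G(G& g) {
    E* pe = &g;
    return reinterpret_cast<char*>(pe) - reinterpret_cast<char*>(&g);
}

// Byte distance from the start of g to its F subobject.
// The implicit G* -> F* conversion adds this constant to the pointer.
std::ptrdiff_t offset_of_F_in_G(G& g) {
    F* pf = &g;
    return reinterpret_cast<char*>(pf) - reinterpret_cast<char*>(&g);
}

// Converting back subtracts the same constant again.
bool round_trip_is_identity(G& g) {
    F* pf = &g;
    assert(pf != nullptr);
    return static_cast<G*>(pf) == &g;
}
```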

Next there's virtual single inheritance. When virtually inheriting directly from a base class, the derived class places a pointer to a vtable and its own members at the front of its layout. The virtual base's members go at the end of the layout, and the offset to the virtual base is placed in the vtable. Non-virtually deriving from one of these classes inserts the new members between the non-virtual part and the virtual base.

struct H {
int i;
virtual ~H() {}
};

0000 vptr to H::vtable
0004 i

struct I : virtual H {
int j;
};

0000 vptr to I::vtable1 // contains offset to 0008 as beginning of H
0004 j
// beginning of the H subobject of I
0008 vptr to I::vtable2
000C i

struct J : I {
int k;
};

// beginning of the I subobject of J
0000 vptr to J::vtable1 // contains offset to 000C as beginning of H
0004 j
0008 k
// beginning of the H subobject of J
000C vptr to J::vtable2
0010 i

Finally we can get to the diamond virtual inheritance situation. Here the inheritance is done by aggregating the non-virtual base parts of the base classes like in the non-virtual inheritance situation and then placing the virtual base at the end.

struct K {
int i;
virtual ~K() {}
};

0000 vptr to K::vtable
0004 i

struct L : virtual K {
int j;
virtual ~L() {}
};

0000 vptr to L::vtable1 // includes offset to 0008 as beginning of K
0004 j
// beginning of the K subobject of L
0008 vptr to L::vtable2
000C i

struct M : virtual K {
int k;
virtual ~M() {}
};


0000 vptr to M::vtable1 // includes offset to 0008 as beginning of K
0004 k
// beginning of the K subobject of M
0008 vptr to M::vtable2
000C i

struct N : L, M {
int l;
};

// beginning of the L subobject of N
0000 vptr to N::vtable1 // includes offset to 0014 as beginning of K
0004 j
// beginning of the M subobject of N
0008 vptr to N::vtable2 // includes offset to 0014 as beginning of K
000C k
0010 l
// beginning of the K subobject of N
0014 vptr to N::vtable3
0018 i

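Here you can see that K::i is at location 0018 in N but 000C in L, so code that only has a pointer to L cannot hard-code the offset and has to look it up through the vtable.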
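
You can measure this yourself with a sketch like the following, which reuses the K/L/M/N structs from above. The helper offset_of_i_from is a name made up here, and the actual numbers will differ from the 32-bit figures in the listings (on a 64-bit compiler the pointers are wider). What matters is that the two results are not equal.

```cpp
#include <cassert>
#include <cstddef>

struct K { int i = 0; virtual ~K() {} };
struct L : virtual K { int j = 0; virtual ~L() {} };
struct M : virtual K { int k = 0; virtual ~M() {} };
struct N : L, M { int l = 0; };

// Distance in bytes from the start of the L subobject to K::i,
// measured at run time through nothing but a pointer to L.
std::ptrdiff_t offset_of_i_from(const L* pl) {
    assert(pl != nullptr);
    const char* start = reinterpret_cast<const char*>(pl);
    const char* member = reinterpret_cast<const char*>(&pl->i);
    return member - start;
}
```

Passing the address of a standalone L and the address of an N to the same function gives two different offsets, which is why the function body cannot use a compile-time constant.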
