I think the most important thing about the diamond problem is to understand the implications, and to understand that there is no diamond problem at all (in fact, the diamond is the solution to the problem).
If you have base class A and classes B and C which derive from it, and class D which derives (non-virtually) from both B and C, then you have a class D with two ancestors which each have a base class (which happens, by sheer coincidence, to be the same).
Both ancestors inherit a virtual function from their respective base class which happens to be the same. By convention, that function should be pure virtual (so both B and C need to define it), but it need not be. It's perfectly allowable to provide an implementation in A as well.
So, in short, you have two, possibly three versions of the same function[1], and when you call that function on a D object, it's impossible to tell which one you want to call. You must either say d.B::f() or d.C::f() (d.A::f() does not help, because D contains two A subobjects and A is therefore an ambiguous base), or you must take the object's address and do a static cast to B* or C*.
This is, however, not a diamond. It's a tree with "a twist and a knot" in its leaves. Or something. But it's not a diamond.
Now what virtual inheritance does is, it turns that knotty tree into a diamond. You no longer have two base classes which are accidentally on top of each other, and from which the two ancestors inherit and possibly override differently an (accidentally identical) virtual function. You now have exactly one base class, and exactly one final overrider in your most derived class (which might call one of the ancestors' implementations, or do something completely different). Problem solved. Thus, the diamond is the cure, not the illness.
Of course, virtual inheritance makes the assumption that you actually want that. This is usually, but not always, the case. You might want something different, too (but then you must tell the compiler what you want every time to make the call unambiguous).
[1] Indeed, they could just as well be entirely different, unrelated functions which only accidentally have the same name and function arguments. You cannot know.
Note that extra vtables and vtable pointers may be needed for the compiler to transparently do the "magic", so object size and raw pointer addresses are not always "immediately intuitive". Pointers to the "same" object can indeed hold different addresses depending on their static type, but you should not care about raw pointer addresses anyway. The whole thing doesn't just work with functions, but with data members as well.
Try to figure out the output of the following snippet:
#include <stdio.h>
struct A { int x[100]; };
struct B1 : virtual public A { };
struct C1 : virtual public A { };
struct B2 : public A { };
struct C2 : public A { };
struct D : public B1, public C1 {};
struct E : public B2, public C2 {};
struct F : virtual public B2, virtual C2 {};
int main()
{
    /* sizeof yields a size_t, so the matching format specifier is %zu */
    printf("sizeof(D) = %zu\n", sizeof(D));
    printf("sizeof(E) = %zu\n", sizeof(E));
    printf("sizeof(F) = %zu\n", sizeof(F));
    return 0;
}
On my machine, the output is
sizeof(D) = 416
sizeof(E) = 800
sizeof(F) = 808
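The first obviously has 100 elements (and two vtable pointers), the next one has 200 elements (and no vtable pointer), and the last has 200 elements and one vtable pointer. Assuming a 4-byte int and 8-byte pointers, that works out to 400 + 2*8 = 416, 2*400 = 800, and 2*400 + 8 = 808.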