As FLeBlanc indicated, this is kind of a tough problem. However, if I were attempting something like this, here is how I would go about it:
First, it's all about the underlying function or data that you use to generate the planet. Like F. said, a cube distorted into a sphere is going to act like a heightmap. The vertices of each face will be pushed toward or away from the planet's center to generate the surface bumpiness. Since a height map has inherent problems with representing overhangs, though, in this representation you can't have overhangs. Overhangs just don't play well with heightmaps. So you need the basic part of your generation function to be non-volumetric.
However, in order to transition to a volumetric representation at the surface level, you need to be able to transform your non-volumetric surface deformation function into a volumetric density field function. The trick is having the representations appear similar enough that the transition from one to the other isn't jarring. If there is a mountain on your large-scale planet, that same mountain better be there on the closer-in scale representation.
So there needs to be some way of constructing the two functions (heightmap deformation and surface volume deformation) so that they are more or less congruent with one another. Additionally, you need to introduce smaller-scale detail functions to the volumetric surface function so that overhangs, cliffs, caves and other features are generated at the small scale. These need to be applied as modifications to the large-scale function so that the macro features remain, but smaller details are added.
This is a tricky job. These are two different abstractions.
Here is an image of a "planet" generated using pure volumetric noise. The features are exaggerated for effect:
This is the kind of features that are impossible to represent using the basic heightmap representation of the subdivided cube method. This was generated by distorting a sphere function by a 3D perlin noise fractal. Clearly, this kind of function won't work for the macro features. However, one small change can make this more usable:
Here we have basically the same setup. However, before the 3D fractal is called to obtain the distortion factor, the input coordinates are normalized to unit length. Then, the sphere is distorted by the noise value obtained from the fractal using this new input coordinate. This is tantamount to displacing the surface of the sphere outward from the center. As you can see, it strips out any overhang features. Done this way, you could use the values generated at the surface of the sphere (which is accomplished by normalizing the input coordinate) to distort the vertices of the subdivided cube mesh to generate the macro features.
Still, since we've modified the distortion function it no longer generates any cool volumetric features. Some features could be added back in for the micro scales, though, using additional volumetric functions. For example, using the distorted sphere and multiplying by another fractal to subtract out parts:
In this case, the macro features remain the same. You have to differentiate between the two methods. The subdivided cube just normalizes the coordinates of each vertex to unit length, then calls the 3D fractal directly, whereas the volumetric surface uses the 3D fractal to distort a sphere function, then adds, multiplies or otherwise combines volumetric detail functions to modify the resulting isosurface.
It's a tricky thing to attempt, but I think you might be able to pull it off with some thought. Good luck.