It could be as simple as a raw 3D array with a value stored per cell. This value can be a density (e.g. 0 for air, 1 for solid), or multiple values such as density, opacity, material id, etc. It depends on which scalar quantities you need to represent in the 3D structure.
1. How is Voxel data usually stored? Is it similar to height-mapped terrain where you can just specify heights and then generate the vertices during run-time?
Marching cubes 'walks' this volume and generates the meshes, which are what you ultimately render.
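As a rough illustration of the raw-array storage and of how marching cubes walks it, here is a minimal Python sketch. The `VoxelGrid` class, the `cube_index` and `surface_cells` helpers, the corner ordering and the 0.5 iso threshold are all assumptions made for this example. It only covers the classification step (which cells straddle the surface); a full implementation would then look up triangles for each cell in a table that matches the chosen corner ordering.

```python
class VoxelGrid:
    """Dense 3D grid of densities stored in one flat list (hypothetical helper)."""

    def __init__(self, nx, ny, nz, fill=0.0):
        self.nx, self.ny, self.nz = nx, ny, nz
        self.data = [fill] * (nx * ny * nz)

    def index(self, x, y, z):
        # Flatten (x, y, z) so that x varies fastest.
        return x + self.nx * (y + self.ny * z)

    def get(self, x, y, z):
        return self.data[self.index(x, y, z)]

    def set(self, x, y, z, value):
        self.data[self.index(x, y, z)] = value


# Offsets of the 8 corners of a cell. The ordering is an assumption;
# it must match whatever triangle lookup table is used afterwards.
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]


def cube_index(grid, x, y, z, iso=0.5):
    """Build the 8-bit mask of corners whose density is at or above iso."""
    idx = 0
    for bit, (dx, dy, dz) in enumerate(CORNERS):
        if grid.get(x + dx, y + dy, z + dz) >= iso:
            idx |= 1 << bit
    return idx


def surface_cells(grid, iso=0.5):
    """Return the cells that straddle the surface (mixed inside/outside corners)."""
    cells = []
    for z in range(grid.nz - 1):
        for y in range(grid.ny - 1):
            for x in range(grid.nx - 1):
                if cube_index(grid, x, y, z, iso) not in (0, 255):
                    cells.append((x, y, z))
    return cells
```

Cells whose mask is 0 or 255 are entirely outside or entirely inside the surface, so they produce no triangles and can be skipped.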
A voxel-based data structure can compress and decompress the stored values at run time using an algorithm such as run-length encoding (RLE), so that run-time memory consumption is kept to a minimum. If RAM is not an issue and only disk space is, then simply zipping up the raw serialized voxel data suffices.
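To make the two options concrete, here is a small sketch in Python. The function names are hypothetical, and the disk-packing helpers assume each voxel value is an integer in the range 0 to 255 so that it fits in one byte; they use the standard `zlib` module as a stand-in for "zipping".

```python
import zlib


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the flat value list."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def pack_for_disk(values):
    """Compress raw voxel bytes for storage (assumes values are 0..255)."""
    return zlib.compress(bytes(values))


def unpack_from_disk(blob):
    """Inverse of pack_for_disk."""
    return list(zlib.decompress(blob))
```

RLE tends to pay off when long stretches of the volume hold the same value, such as large regions of air or solid material; for noisy data the run list can end up larger than the raw array.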
2. How is the large memory footprint handled in a real-time simulation? Are there "chunks" which can be streamed in and out depending on view distance?
This is largely orthogonal to voxels and marching cubes, so you can use any method; it really depends on your application domain. For example, in medical applications you can't really use occlusion culling, because a volume's color may need to be alpha blended with the volumes occluding it. Usually, however, if you have multiple volumes, you would at a minimum do view frustum culling.
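A minimal sketch of view frustum culling over volume bounding boxes might look like the following. It assumes each volume (or chunk) is described by an axis-aligned box `(box_min, box_max)` and that the frustum is given as planes `(nx, ny, nz, d)` with normals pointing inward, so a point is inside a plane when `nx*x + ny*y + nz*z + d >= 0`. These names and conventions are chosen for the example only.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the whole box lies on the negative (outside) side of the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner furthest along the plane normal; if even that
    # corner is behind the plane, the entire box is behind it.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    return nx * px + ny * py + nz * pz + d < 0


def visible_chunks(chunks, planes):
    """Keep the boxes that are not fully outside any single frustum plane."""
    return [
        (box_min, box_max)
        for box_min, box_max in chunks
        if not any(aabb_outside_plane(box_min, box_max, p) for p in planes)
    ]
```

This test is conservative: a box near a frustum corner can pass every plane test while still being outside the frustum, which costs some extra drawing but never hides a visible volume.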
3. What sort of partitioning/culling methods can apply? Octree/Frustum, Occlusion Culling?
4. What sort of LOD techniques work well? (This could kind of apply to question 2 I guess)
5. Know of any simple examples I can examine?
PolyVox is a very good library for working with voxel data. There's a lot of information on all of the above questions on its forums.
C4 engine is another: