I was fortunate enough to work with Ben Houston briefly once a few years ago. Those in the know may remember him for his work on hierarchical RLE level sets (full text, free preprint). The technology that id is working with is similar.
Also, the OTOY project from the company JulesWorld, which provided the (infamous?) ATI Ruby 2.0 demos, uses voxel rendering extensively. I'm still waiting to see concrete results from that project, because it involves extensive distributed rendering: the idea is to render high-fidelity graphics and stream them out to things like mobile platforms that could not otherwise deliver those visuals.
And of course, voxel rendering has a long and storied history in scientific and medical visualization.
To note, Digital Molecular Matter uses finite element mathematics for its physics calculations, which is in some ways comparable to voxel rendering, although it uses tetrahedra instead of cubes. In both cases, you are dealing with a complex grid of data points, with all the ensuing challenges.
So this is far from a new technology, but what is new is its application to games... or not. For example, voxels are already used for smoke simulations, though those infrequently show up in games. That is volume (not surface) rendering, though, and the grid resolutions are generally much smaller. Really, what is new is the application of high-resolution, sparse, hierarchical voxel sets in games.
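To make "sparse, hierarchical" concrete, here is a minimal sparse voxel octree sketch. All names are my own illustrations, not id's actual structures, and a real implementation would use flat GPU-friendly arrays rather than a pointer tree, but the key property is the same: empty space allocates nothing.

```python
# Minimal sparse voxel octree sketch. Only occupied regions allocate
# child nodes, so empty space costs nothing -- the "sparse" part.
class OctreeNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}  # child index 0..7 -> OctreeNode, allocated lazily
        self.value = None   # payload at a leaf (e.g. color/density)

class SparseVoxelOctree:
    def __init__(self, depth):
        self.depth = depth  # tree depth; the grid is 2^depth cells per axis
        self.root = OctreeNode()

    def _child_index(self, x, y, z, level):
        # Pick the child octant from one bit of each coordinate,
        # most significant bit first -- the "hierarchical" part.
        bit = self.depth - 1 - level
        return (((x >> bit) & 1) << 2) | (((y >> bit) & 1) << 1) | ((z >> bit) & 1)

    def set(self, x, y, z, value):
        node = self.root
        for level in range(self.depth):
            i = self._child_index(x, y, z, level)
            node = node.children.setdefault(i, OctreeNode())
        node.value = value

    def get(self, x, y, z):
        node = self.root
        for level in range(self.depth):
            i = self._child_index(x, y, z, level)
            node = node.children.get(i)
            if node is None:
                return None  # empty space: traversal stops early
        return node.value
```

At depth 10 this addresses a 1024^3 grid, yet only nodes along occupied paths ever exist, which is what makes high resolutions tractable.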
A noble thing indeed, but the next "big thing"? I'm not so sure. It seems to me to be more of a natural evolution of real time rendering, and a particular solution to a particular problem. The end user will only really notice the advantages in close-ups, which mostly happen in cut-scenes. Voxels are also useful for volumetric effects, such as translucency, because they can avoid depth peeling. Generally, they are more efficient for effects that are best done with ray tracing. At the same time, they are harder to animate in the traditional manner (skeletons/skinning), though they are a better format for fluid-like movements. Therefore, I see them as complementary to polygons, not a replacement for them.
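On the translucency point: when voxels are ray cast, samples arrive in front-to-back order for free, so translucency reduces to simple alpha compositing along the ray, with none of the sorted peeling passes that polygons need. A minimal sketch (the function name and termination threshold are my own):

```python
def composite_ray(samples):
    """Front-to-back alpha compositing of (intensity, alpha) samples
    taken along a ray through a voxel volume. Because the traversal
    already yields samples nearest-first, no depth peeling is needed."""
    result = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for intensity, alpha in samples:
        result += transmittance * alpha * intensity
        transmittance *= (1.0 - alpha)
        if transmittance < 1e-3:  # early ray termination behind opaque matter
            break
    return result
```

The early-out is also why opaque voxel scenes ray cast efficiently: once transmittance hits zero, everything behind the hit is skipped.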
To note on animating voxel sets: I view distorted coordinate systems as the best solution here. Basically, each voxel component to be animated is split into a separate set and enclosed in a polygon hull. These hulls are animated and rendered to a sort of G-buffer, which stores, for each screen pixel, the transformation of rays from world space into the distorted voxel space. This pass can additionally cull non-visible voxel sets. Then the actual voxels are ray cast and the image rendered. The trouble here is that skinning polygons allows for the distortion of meshes, not just translation/rotation, and it is not obvious how to do that here. There are also issues with overlapping voxel hulls, because the area between the foreground hull's silhouette and the actual voxel silhouette must not occlude the background. However, I believe these issues can be solved.
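A rough sketch of the ray-transformation step for a purely rigid hull (all names are hypothetical; a real implementation would do this per pixel on the GPU, reading the inverse transform out of the G-buffer described above). Note it handles exactly the rigid translation/rotation case, which is why mesh-style skinning distortion remains the hard part:

```python
import math

# Sketch: animate a rigid voxel set by transforming the camera ray into
# the set's rest-pose coordinate space, instead of moving the voxels.

def rotate_y(angle, v):
    """Rotate vector v around the Y axis."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x + s * z, y, -s * x + c * z)

def world_ray_to_voxel_space(origin, direction, hull_angle, hull_pos):
    """Apply the inverse of the hull's animation transform (here just a
    Y rotation plus translation) to a world-space ray. The per-pixel
    G-buffer would store this inverse transform, or an ID indexing it."""
    local_origin = tuple(a - b for a, b in zip(origin, hull_pos))
    return rotate_y(-hull_angle, local_origin), rotate_y(-hull_angle, direction)

def raycast_voxels(voxels, origin, direction, steps=64, step_size=0.1):
    """Naive ray march through a dict of occupied integer cells,
    standing in for the real hierarchical traversal."""
    x, y, z = origin
    for _ in range(steps):
        cell = (math.floor(x), math.floor(y), math.floor(z))
        if cell in voxels:
            return voxels[cell]
        x += direction[0] * step_size
        y += direction[1] * step_size
        z += direction[2] * step_size
    return None
```

For example, a voxel stored at rest cell (2, 0, 0) in a hull rotated 90 degrees is found by a world ray marching along +z, because the inverse rotation turns that ray into one marching along -x through the untouched rest-pose data.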