Voxel video

@Sneftel,
I mean, ideally, in the game environment I have envisioned, you (the player) can walk around the FMV characters and view them from any angle.

@RobTheBloke
I knew something like that couldn't possibly be that expensive and impossible! I have heard of technology like this before, but it's sometimes hard to find again. This particular example is a wee bit flawed, but it's a good start.
I think you'd be better off with polygonal characters if you want to be able to freely move around them. You can still use motion capture to capture their movement and facial expressions.

I don't think you're going to be able to store voxel models of FMV at anywhere near the resolution required to make them look good. For example, if your models are stored as 512x512x512 voxel maps (which seems about right to me: HD video is 1080 pixels high, so 512 is about half) at 30 frames per second, then to store 1 minute of footage you'd need (by my calculation: 512³ cells × 3 bytes of colour × 30 fps × 60 seconds) about 675GB of uncompressed data - that's per minute of footage. Obviously, compression can reduce that quite a bit, but even if you reduced it by a factor of one thousand, you'd still need 0.68GB per minute...

If you don't need full 6 degrees of freedom, you can simply capture the video from multiple angles at the same time. But don't forget, for each additional angle you capture at, you're adding an extra stream of video. DVD-quality video requires about 34MB per minute, so if you recorded the video from 75 different angles (meaning you'd be able to move around the "model" and get a different view every ~5 degrees of rotation) you'd need about 2.5GB of storage per minute...
All of which assumes, of course, that you treat voxel models as raw 3D bitmaps - which in practice they rarely are, precisely because storing them that way costs this much.
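As a back-of-the-envelope check on those figures (assuming 3 bytes of RGB per voxel and the ~34MB-per-minute DVD figure above), the arithmetic works out like this:

#include <cstdio>

int main() {
    // Dense voxel video: 512^3 cells, 3 bytes (RGB) per cell, 30 fps.
    const unsigned long long voxels      = 512ULL * 512 * 512;  // ~134 million
    const unsigned long long frameBytes  = voxels * 3;          // 384 MiB per frame
    const unsigned long long minuteBytes = frameBytes * 30 * 60;
    std::printf("voxel video: %llu GiB per minute\n", minuteBytes >> 30); // 675

    // Multi-angle 2D capture: 75 streams at ~34 MiB per minute each.
    const unsigned long long multiAngleBytes = 75ULL * 34 * 1024 * 1024;
    std::printf("75 angles:   %.2f GiB per minute\n",
                multiAngleBytes / double(1ULL << 30)); // ~2.49
    return 0;
}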

A three-dimensional model could be stored for little more data than exists in a high-definition image (or at least I suppose it can, because ZBrush seems to get by without requiring a Deep Blue workstation). I don't know much about how exactly it would work, but if the voxel video is only meant to show the surface of the subject, I see no need to store so much extra information (this isn't a CT scan we're talking about). Instead of some massive behemoth of data, all you need to store per voxel is its colour and its location on the grid. I'm sure having to attach location markers to everything is a bit of a bummer for anyone wanting a quick hack of DivX for 3D, but unless I horribly misunderstand graphics technology, it doesn't seem like a ridiculous feat - just one requiring more novel approaches.
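Something like this minimal layout is what I'm picturing (the struct and names here are just my assumption of how it could be done):

#include <cstdint>
#include <vector>

// One occupied (surface) voxel: its grid location plus its colour.
// Three 16-bit coordinates + three colour bytes = 9 bytes per voxel,
// versus paying 3 bytes for *every* cell of a dense grid.
struct SurfaceVoxel {
    std::uint16_t x, y, z; // position on a grid of up to 65536^3
    std::uint8_t  r, g, b; // colour sampled from the camera
};

// A frame of "voxel video" is then just the list of voxels
// the scanner actually saw on the subject's surface.
using VoxelFrame = std::vector<SurfaceVoxel>;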

Can anyone tell me if I'm totally wrong on that? Admittedly I don't know a thing about what I'm talking about; I'm sure there's an even better way to hack at it than that, if it works at all.
Quote:Original post by Portugal Stew
@RobTheBloke
I knew something like that couldn't possibly be that expensive and impossible! I have heard of technology like this before, but it's sometimes hard to find again. This particular example is a wee bit flawed, but it's a good start.


Not sure if you've seen the photos of his setup, but they did make me rofl at the time. It's one of the best homebrew projects I've ever seen, I think ;)

It does have flaws, but then again so do commercial 3D scanners. From experience, every 3D scan requires significant (manual) cleanup - a particularly nasty job with very dense meshes. The primary problem with the technique seems to be one of calibration, which is where the flaws in his implementation appear to come from. I had a go at implementing the algorithm, and it actually gave reasonable results when fed images rendered from Maya (where the camera & projection setup could be controlled accurately).

I'd say it's possible to get a realtime implementation, but it's probably a project in itself rather than something that's feasible inside a game. Expect vast amounts of geometry to be generated!
Quote:Original post by Portugal Stew
A three-dimensional model could be stored for little more data than exists in a high-definition image (or at least I suppose it can, because ZBrush seems to get by without requiring a Deep Blue workstation).


Yes and no. Once polygonised, yes it can. If it exists as a 3D level set, then expect orders of magnitude more data: a 1024x1024x1024 grid would be 1GB of data even assuming each cell is 1 byte (which it won't be - it will be more!)

Even if you took, say, a 640x480 image and stored data for that, it's still an awful lot of pixels to shift and turn into a polygonal data set - approx 28MB minimum for the final data, plus however much for the image(s).
Quote:Original post by Codeka
For example, if your models are stored as 512x512x512 voxel maps (which seems about right to me: HD video is 1080 pixels high, so 512 is about half)


That's *extremely* high res for a voxel map: 512³ is about 134 million cells, so roughly 128MB multiplied by the number of bytes in each cell.
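For reference, dense-grid sizes just multiply out (a trivial sketch):

#include <cstdio>

// Storage for a dense voxel grid: every cell exists, occupied or not.
unsigned long long gridBytes(unsigned long long dim, unsigned long long bytesPerCell) {
    return dim * dim * dim * bytesPerCell;
}

int main() {
    // 1024^3 at 1 byte per cell: 2^30 bytes = exactly 1 GiB.
    std::printf("1024^3 x 1B = %llu MiB\n", gridBytes(1024, 1) >> 20); // 1024
    // 512^3 at 1 byte per cell: 2^27 bytes = 128 MiB.
    std::printf("512^3  x 1B = %llu MiB\n", gridBytes(512, 1) >> 20);  // 128
    return 0;
}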
Quote:Original post by RobTheBloke
It does have flaws, but then again so do commercial 3D scanners. From experience, every 3D scan requires significant (manual) cleanup - a particularly nasty job with very dense meshes. The primary problem with the technique seems to be one of calibration, which is where the flaws in his implementation appear to come from. I had a go at implementing the algorithm, and it actually gave reasonable results when fed images rendered from Maya (where the camera & projection setup could be controlled accurately).
The projected stripe method he uses is very simple to implement, but has worse artefacts than most of the other methods. In particular, you need multiple frames to reconstruct a 3D model, the projected stripes interact with shadows (causing problems in reconstruction), and the stripes remove the ability to record surface colour at the same time. If your subject is human, the stripes also prove very distracting.

You can get around most of these issues by using an infra-red projection (i.e. move the stripes out of the visible spectrum), but it still requires multiple frames to reconstruct.

MIT's coded aperture approach looks like it could be incredible for homebrew, as it doesn't require any additional equipment (such as a projector), can capture scene depth in a single frame, and seems to yield very decent results - unfortunately, nobody seems to have implemented this for a video stream yet.

The ZCam/Project Natal time-of-flight approach looks as if it might be the most robust. They pulse an infra-red laser at regular intervals, measure how much of the reflected pulse returns within a gated window, and use the speed of light to convert that timing into distance. The advantage here is that it works flawlessly in near-darkness (where the MIT approach can't), doesn't interfere visually, and can also capture depth in a single frame.
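The distance conversion itself is one line; a sketch, assuming you could measure the round-trip time directly:

// Time-of-flight: the pulse travels out to the subject and back,
// so the one-way distance is half the round trip at light speed.
// A sketch only - real ToF cameras infer this timing per pixel
// from gated intensity measurements, not from a stopwatch.
double tofDistanceMetres(double roundTripSeconds) {
    const double c = 299792458.0; // speed of light in m/s
    return c * roundTripSeconds * 0.5;
}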

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Quote:Yes and no. Once polygonised, yes it can. If it exists as a 3D level set, then expect orders of magnitude more data: a 1024x1024x1024 grid would be 1GB of data even assuming each cell is 1 byte (which it won't be - it will be more!)
But once again that seems to me to just be a bit too simple. Surely there are alternate means of storing voxel information than by just cubing the grid. That seems wildly excessive when, in theory, I could already use a ZCam and store 3D photos for only a trivial amount of extra data.
Quote:Original post by Portugal Stew
But once again that seems to me to just be a bit too simple. Surely there are alternate means of storing voxel information than by just cubing the grid.


The obvious alternative is to store the voxel coordinates along with the colour, making each voxel 9 bytes instead of 3. Efficiency aside, that starts to pay off as soon as less than a third of the grid is filled (which it most likely will be). Then of course a simple octree would also let you throw away a ton of empty space for (hopefully) much smaller overhead.
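A bare-bones node for such an octree might look like this (just a sketch; the layout and names are my own assumption):

#include <array>
#include <cstdint>
#include <memory>

// Sparse octree node: an empty octant stores no child at all, so a
// large empty region costs one null pointer instead of millions of
// dense cells. Leaves carry the voxel colour; interior nodes only route.
struct OctreeNode {
    std::array<std::unique_ptr<OctreeNode>, 8> children; // null = empty octant
    std::uint8_t r = 0, g = 0, b = 0; // colour (meaningful at leaves)
    bool isLeaf = false;
};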
f@dz http://festini.device-zero.de

