Portugal Stew

Voxel video


I'm toying with a mystery game concept, and lately I've been playing with the idea of using live actors as were used in Myst. What I want is beauty over dynamics (in fact, with my concept dynamic characters would detract from the vision), but I want them to work in a 3D polygonal environment. While I'm still a freshman Computer Science major with plenty of time to contemplate this, do any of you know whether it would be possible to create voxel models out of video? And assuming I had the technology necessary to build voxel models of every frame of the video, viewable from every angle (or at least every necessary angle), would it be possible to play the models back like a video?

I did some light research (i.e. checked Wikipedia) and I understand that voxel rendering is very computationally expensive, but my understanding is that this is mainly because voxel renderers use various shading and smoothing techniques, neither of which would be particularly necessary with video footage, which is by nature pre-rendered. I'm actually fairly confident that even relatively high-resolution voxel videos could be done on normal gamer hardware, but I'm worried about a few potential problems I foresee. One, motion blur, a nearly unavoidable artifact of film, could make producing voxel models very tricky (although there are ways to make it nearly nonexistent). Two, I'm concerned that placing voxel videos in a 3D environment, either voxel or polygonal, especially a dynamic environment, would be too computationally expensive, and that it would be difficult to run voxel and polygonal graphics together in parallel.

Does anyone have any insights on this? Has similar technology been used before with any success? Would I ultimately be better off doing something less interesting with the BS degree I'm working on and just use motion capture?

Forget playing the video; you're suggesting reconstructing a 3D model out of 2D images? 3D camera scanners that construct a 3D object out of several 2D shots exist, but they are in themselves pretty big undertakings, and they usually superimpose some pattern over the object so the algorithms can recognize the shape.

As far as I know, nothing like it has ever been done with several cameras at once. Forget the game, just being able to record 3D video would be huge, massive even.

If you somehow had volumetric data, I am certain you could come up with a way to display it at interactive framerates. But as Radan said, getting the data to begin with would be at the very cutting edge of what we can do with markerless motion capture (as far as I know).

There are some papers on the subject. It's not my field, but a quick google turns up, e.g., this (which seems really very simple). Since the models demonstrated are incredibly simple (a box, and a coffee cup with a relatively thin handle), my guess is that this particular method runs into problems with more complicated geometry that creates more self-occlusions. It would seem you'd need to exploit temporal coherence somehow (e.g. by estimating velocities at voxels under a rigid-body constraint) to handle that, and they don't do that at all.

Personally, my gut impulse -- especially if I didn't need my models to interact with the world -- would be to use an image-space approach if I insisted on attempting something like this. I.e., I'd avoid trying to construct an intermediate 3d representation from video and instead just display data from an appropriate camera based on the direction from which the player is looking at the model -- perhaps with some sort of (homography-based?) interpolation in-between; think Quicktime VR. This would require that cameras be calibrated well and that they have the same focal lengths as the perspective transformation used in the game; this would be a finicky thing to set up but probably possible.
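To make the idea concrete, here's a rough sketch of the camera-selection step of that image-space approach. Everything here (even circular camera spacing, the function name, 72 cameras) is my own illustrative assumption, not anything established in this thread:

```python
def nearest_camera_stream(view_angle_deg, num_cameras=72):
    """Pick the pre-recorded stream whose camera position is closest
    to the direction the player is viewing the actor from.

    Assumes cameras are evenly spaced on a circle around the subject
    (one every 360/num_cameras degrees, camera 0 at 0 degrees).
    """
    step = 360.0 / num_cameras
    # Round to the nearest camera; wrap 360 back to camera 0.
    return int(round((view_angle_deg % 360.0) / step)) % num_cameras
```

At 72 cameras (5-degree spacing), a player viewing from 7 degrees would be shown stream 1; the homography-based interpolation mentioned above would smooth over the 2-degree mismatch.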

Note that the equipment cost for anything like this, however -- lots of cameras set up on tripods surrounding an actor -- would be in the high tens of thousands of dollars at the absolute minimum, and probably in the hundreds of thousands. (That's what a "real" mocap setup costs, at least.)

[Edited by - Emergent on September 14, 2009 1:30:03 PM]

Quote:
Original post by Radan
Forget the game, just being able to record 3D video would be huge, massive even.
Not really - the technology has existed for several years. As Emergent suggested, multiple cameras surrounding the subject are necessary, but he missed one important part of the puzzle - depth cameras.

A normal camera records an image, but you then have to attempt an image-based reconstruction of 3D form through edge detection of multiple offset images (much as the human eye does). However, several groups have developed methods to reliably reconstruct depth from a standard camera as well (see the MIT solution). The real ace here is a (relatively) cheap commercial solution (see a brief description here).

Given multiple cameras around a subject recording depth as well as colour, voxel reconstruction is a fairly trivial software issue. In short, the technical legwork was done long ago [wink]

I have to disagree. I was researching the subject of Virtual Sets (basically a virtual studio where everything is 3D and interactive and only the actor is video), and in the process I saw some advanced sets with plugins. The interesting part was a plugin for a sports studio, where they showed a normal soccer match that can be paused at any time (the source data is a normal video); the image is processed and a 3D world is constructed.
Here is the video in question. It's an ad for the program, but the part you want is where they start talking about Sports Studio (should be somewhere in the middle):
http://www.orad.tv/upload/downloads/IBC07.wmv

I would like to correct myself somewhat: what I have in mind is not necessarily a model that can be viewed in accurate detail from every angle, but one that can be viewed with 360 degrees of rotation at eye level. That no doubt has much simpler solutions, but the concept of generating full-scale models intrigues me.

Assuming 3D video can be captured, does anyone have any idea how it could be played back? How would it be compressed? I'm curious about pre-rendering CGI scenes as voxel videos, which should open some interesting possibilities (even if polygonal scenes are significantly cheaper to use, voxel videos would be amenable to traditional video filters with slight modifications, amongst other things).

Much of this is novelty for the sake of novelty, but a lot of it is just my fascination with the possibilities. Theoretically, if the technology were perfected, post-production visual effects could be revolutionized: with just one or two extra cameras, green-screen footage could be edited after the fact to correct aperture, camera angle, and more.

Quote:
Original post by Emergent
Wow swiftcoder; that's very, very cool. I wonder how reliable all this is. It would be a real boon to robotics to have reliable "cameras with z-buffers."
Also worth mentioning that the ZCam appears to be the camera used in Microsoft's Project Natal (Microsoft acquired the ZCam technology shortly before). I assume the depth sensing is the key to their (very good) free-form interaction.

Quote:
Original post by Portugal Stew
I would like to correct myself somewhat: what I have in mind is not necessarily a model that can be viewed in accurate detail from every angle, but one that can be viewed with 360 degrees of rotation at eye level.

Are you talking about rotating the view point without moving the camera, or about orbiting the camera around the subject?

@Sneftel,
I mean, ideally in the game environment I have envisioned, you the player can walk around the FMV characters and view them from any angle.

@RobTheBloke
I knew something like that couldn't possibly be as expensive and impossible as it sounded! I'd heard of technology like this before, but it's sometimes hard to find it again. This particular example is a wee bit flawed, but it's a good start.

I think you'd be better off with polygonal characters if you want to be able to freely move around them. You can still use motion capture to capture their movement and facial expressions.

I don't think you're going to be able to store voxel models of FMV at anywhere near the resolution required to make it look good. For example, if your models are stored as 512x512x512 voxel maps (which seems about right to me: HD video is 1080 pixels high, so 512 is about half) at 30 frames per second, then to store one minute of footage you'd need (by my calculation) 675GB of uncompressed data. Obviously, compression can reduce that quite a bit, but even if you reduced it by a factor of one thousand, you'd still need 0.68GB per minute...

If you don't need full 6 degrees of freedom, you can simply capture the video from multiple angles at the same time. But don't forget, for each additional angle you capture at, you're adding an extra stream of video. DVD-quality video requires about 34MB per minute. If you recorded the video from 75 different angles (meaning you'd be able to move around the "model" and get a different view every 5 degrees of rotation) you'd need 2.5GB of storage...
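For what it's worth, both of those estimates check out. Here's the arithmetic; the 3-bytes-per-voxel (uncompressed RGB) figure is my own assumption:

```python
def gib(n_bytes):
    """Convert bytes to binary gigabytes (GiB)."""
    return n_bytes / 2**30

# Uncompressed voxel video: 512^3 grid, 3 bytes (RGB) per voxel,
# 30 frames per second, 60 seconds.
voxel_minute = 512**3 * 3 * 30 * 60
print(gib(voxel_minute))   # 675.0 (GiB per minute)

# Multi-angle plain video: ~34 MB per minute per stream, 75 angles.
multi_angle = 34 * 2**20 * 75
print(gib(multi_angle))    # ~2.49 (GiB per minute)
```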

Assuming, of course, that you treat voxel models like 3D bitmaps, which they rarely are, precisely because storing them that way wouldn't make sense.

A three-dimensional model could be stored for little more data than exists in a high-definition image (or at least I suppose it can, because Zbrush seems to get by without requiring a Deep Blue workstation). I don't know much about how exactly it would work, but if the voxel video is only meant to show the surface of the subject then I see no need to store so much extra information (this isn't a CT scan we're talking about). Instead of being some sort of massive behemoth of data, all you need is to store voxel information as color and location on the grid. I'm sure having to set location markers to everything is a bit of a bummer for anyone wanting to just make a quick hack of DivX for 3D but unless I horribly misunderstand graphics technology it doesn't seem like a ridiculous feat, simply one requiring more novel approaches.
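That intuition is roughly right: if only the surface is stored, a sparse (colour + grid location) list beats a dense grid whenever the fill ratio is low. A back-of-envelope sketch, where the 1% surface-fill figure is purely an illustrative assumption:

```python
def dense_bytes(n, bytes_per_voxel=3):
    """Full n*n*n grid with colour stored for every cell."""
    return n**3 * bytes_per_voxel

def sparse_bytes(filled, bytes_per_voxel=3, coord_bytes=6):
    """Only occupied voxels, stored as (x, y, z, r, g, b);
    2 bytes per coordinate covers grids up to 65536^3."""
    return filled * (bytes_per_voxel + coord_bytes)

# A 512^3 grid whose surface shell fills roughly 1% of the cells:
n = 512
surface = n**3 // 100
print(dense_bytes(n) // 2**20)        # 384 (MiB, dense)
print(sparse_bytes(surface) // 2**20) # 11 (MiB, sparse)
```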

Can anyone tell me if I'm totally wrong on that? I admittedly don't know much about what I'm talking about, and I'm sure there's an even better way to hack at it than that, if it works at all.

Quote:
Original post by Portugal Stew
@RobTheBloke
I knew something like that couldn't possibly be as expensive and impossible as it sounded! I'd heard of technology like this before, but it's sometimes hard to find it again. This particular example is a wee bit flawed, but it's a good start.


Not sure if you've seen the photos of his setup, but they did make me rofl at the time. It's one of the best homebrew projects I've ever seen, I think ;)

It does have flaws, but then again so do commercial 3D scanners. From experience, every 3D scan requires significant (manual) cleanup - a particularly nasty job with very dense meshes. The primary problem with the technique seems to be one of calibration - which is where the flaws in his implementation appear to come from. I had a go at implementing the algorithm, and it actually gave reasonable results when fed images rendered from Maya (where the camera and projection setup could be controlled accurately).

I'd say it's possible to get a realtime implementation, but it's probably a project in itself rather than being all that feasible in a game. Expect vast amounts of geometry to be generated!

Quote:
Original post by Portugal Stew
A three-dimensional model could be stored for little more data than exists in a high-definition image (or at least I suppose it can, because Zbrush seems to get by without requiring a Deep Blue workstation).


Yes and no. Once polygonised, yes it can. If it exists as a 3D level set, expect orders of magnitude more data: a 1024x1024x1024 grid would be 1GB of data (assuming each cell takes 1 byte - which it won't; it will be more!)

Even if you did take, say, a 640x480 image and stored data for that, that's still an awful lot of pixels to shift and turn into a polygonal data set: approx 28MB minimum for the final data, plus however much for the image(s).

Quote:
Original post by Codeka
For example, if your models are stored as 512x512x512 voxel maps (which seems about right to me: HD video is 1080 pixels high, so 512 is about half)


That's *extremely* high res for a voxel map: approx 134 million cells (512^3), multiplied by the number of bytes in each cell.

Quote:
Original post by RobTheBloke
It does have flaws, but then again so do commercial 3D scanners. From experience, every 3D scan requires significant (manual) cleanup - a particularly nasty job with very dense meshes. The primary problem with the technique seems to be one of calibration - which is where the flaws in his implementation appear to come from. I had a go at implementing the algorithm, and it actually gave reasonable results when fed images rendered from Maya (where the camera and projection setup could be controlled accurately).
The projected-stripe method he uses is very simple to implement, but has worse artefacts than most of the other methods. In particular, you need multiple frames to reconstruct a 3D model; the projected stripes interact with shadows, causing problems in reconstruction; and the stripes remove the ability to record surface colour at the same time. If your subject is human, the stripes also prove very distracting.

You can get around most of these issues by using an infra-red projection (i.e. move the stripes out of the visible spectrum), but it still requires multiple frames to reconstruct.

MIT's coded aperture approach looks like it could be incredible for homebrew, as it doesn't require any additional equipment (such as a projector), can capture scene depth in a single frame, and seems to yield very decent results - unfortunately, nobody seems to have implemented this for a video stream yet.

The ZCam/Project Natal Time-of-Flight approach looks as if it might be the most robust. They pulse an infra-red laser at regular intervals, record the intensity at which it bounces back, and use the speed of light to calculate the distance. The advantage here is that it works flawlessly in near-darkness (where the MIT approach can't), doesn't interfere visually, and can also capture depth in a single frame.
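For the curious, the distance arithmetic behind time-of-flight is nothing more than the speed of light and half the round-trip time. A trivial sketch (the 20 ns example figure is mine):

```python
C = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_seconds):
    """Time-of-flight depth: the pulse travels out and back,
    so the surface distance is half the round trip."""
    return C * round_trip_seconds / 2.0

# A return after 20 nanoseconds puts the surface about 3 m away:
print(tof_distance(20e-9))  # ~2.998 (metres)
```

The nanosecond scale is also why these cameras need such fast gating electronics; depth precision of a few centimetres means timing to within a few hundred picoseconds.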

Quote:
Yes and no. Once polygonised, yes it can. If it exists as a 3D level set, expect orders of magnitude more data: a 1024x1024x1024 grid would be 1GB of data (assuming each cell takes 1 byte - which it won't; it will be more!)
But once again, that seems to me to be a bit too simple. Surely there are alternative ways of storing voxel information other than keeping the full cubed grid. That seems wildly excessive when I could already, theoretically, use a ZCam and store 3D photos for only a trivial amount of extra data.

Quote:
Original post by Portugal Stew
But once again, that seems to me to be a bit too simple. Surely there are alternative ways of storing voxel information other than keeping the full cubed grid.


The obvious alternative is to store voxel coordinates along with the colour, making each voxel 9 bytes instead of 3. Efficiency aside, that starts to pay off as soon as less than a third of the grid is filled (which will most likely be the case). Then of course a simple octree would also let you throw away a ton of empty space for (hopefully) much smaller overhead.
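The octree idea can be sketched in a few lines. This is a toy illustration of the collapse-empty-space principle (class and method names are mine, not a format anyone here proposed):

```python
class OctreeNode:
    """Minimal sparse octree over a cubic voxel grid: empty octants
    are never allocated, so storage scales with occupied space."""

    def __init__(self, size):
        self.size = size       # side length of this cube (power of two)
        self.color = None      # set only on 1x1x1 leaf cells
        self.children = None   # 8-way split, allocated lazily

    def _child_index(self, x, y, z, half):
        # Bit 0: x-half, bit 1: y-half, bit 2: z-half of the cube.
        return (x >= half) | ((y >= half) << 1) | ((z >= half) << 2)

    def insert(self, x, y, z, color):
        if self.size == 1:
            self.color = color
            return
        half = self.size // 2
        if self.children is None:
            self.children = [None] * 8
        idx = self._child_index(x, y, z, half)
        if self.children[idx] is None:
            self.children[idx] = OctreeNode(half)
        self.children[idx].insert(x % half, y % half, z % half, color)

    def lookup(self, x, y, z):
        if self.size == 1:
            return self.color
        if self.children is None:
            return None        # whole octant empty: nothing stored
        half = self.size // 2
        child = self.children[self._child_index(x, y, z, half)]
        return None if child is None else child.lookup(x % half, y % half, z % half)
```

For a mostly hollow model (only surface voxels filled), the unallocated interior and exterior octants are exactly the "ton of empty space" thrown away above.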

