Generating height/normal/etc maps from 2 images

Started by
3 comments, last by FLeBlanc 11 years, 6 months ago
O hai thar

I'm thinking about creating a tool for my engine that creates normal/height/etc. maps from 2 stereo images, just like this software: http://www.photosculpt.net/

That software's just damn awesome.

But I have no idea how to do this. Any suggestions?
I think it's based on some kind of luminance difference, or something like that.
This is related to computer vision, which is actually an AI problem, not a graphics problem.

I will explain some of the procedure, but there is too much to cover in one post, so you will have to Google some of the terms I mention on your own.


Firstly, there are a few things you need to know about the cameras used to take the images.
#1: The focal length of the cameras (which should be identical), f.
#2: How far apart the cameras were. This distance is the baseline B.

In order to calculate the depth for any given pixel, you also need to know where it is in space relative to each camera, but only in one dimension (X): the depth component is still unknown, and the vertical component Y is the same for both cameras (since it is assumed the cameras were horizontally aligned).
However, this assumes you know where each pixel is for each camera. That is, for any given pixel, you need to know its X relative to camera 0 and also to camera 1.
Taking 2 images and figuring out this information is called Correspondence.

This is the research you need to do on your own, as it is not practical to explain in a forum post, at least once you include SSD (sum of squared differences) errors for determining the best match when there are multiple candidates, and once you account for the recovery of lost data.
Note that due to loss of information between images, it is not always possible to determine the depth for a given pixel. Your best bet in that case is to average the depths of nearby pixels for which correct depth information could be determined.
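As a rough illustration of the correspondence search described above (a sketch, not PhotoSculpt's actual method), here is a brute-force SSD block matcher. The block size and maximum disparity are made-up parameters, and a real implementation would also need to handle occlusions and filter the result:

```python
import numpy as np

def ssd_disparity(left, right, block=5, max_disp=64):
    """Brute-force block matching on two grayscale images.
    For each pixel in the left image, slide a block along the same
    row of the right image and keep the horizontal offset (disparity)
    with the smallest SSD (sum of squared differences) error."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_err, best_d = np.inf, 0
            # Search only leftward: with horizontally aligned cameras,
            # a feature in the right image shifts toward the left.
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                err = np.sum((patch.astype(np.float32) - cand) ** 2)
                if err < best_err:
                    best_err, best_d = err, d
            disp[y, x] = best_d
    return disp
```

For each left-image pixel this returns the disparity of the best-matching block in the right image; pixels with no reliable match would then be filled in by averaging their neighbours, as described above.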



Once you have used Correspondence to determine where a single pixel is in relation to each camera, you can determine the depth by doing the following:
Project the pixel onto the back plane of camera 0 by a distance of f, and do the same for camera 1.
The results are x0 and x1.

Knowing the focal length f and the baseline B, the equation to find the depth Z is:
(x1 - x0) / f = (B / Z)
or:
Z = fB / (x1 - x0)
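Worked through as a tiny helper (the focal length and baseline here are made-up numbers for illustration, not values from any real camera rig):

```python
def depth_from_disparity(x0, x1, f, B):
    """Z = f * B / (x1 - x0), where x0 and x1 are the projections of
    the same scene point onto the image planes of camera 0 and camera 1."""
    disparity = x1 - x0
    if disparity == 0:
        return float('inf')  # no parallax: the point is at infinity
    return f * B / disparity

# Made-up example: f = 0.05 m, B = 0.1 m, disparity = 0.005 m
# gives Z = 0.05 * 0.1 / 0.005 = 1.0 m.
```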


Another useful term to search for would be “stereo vision”.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


This is way more complicated than I thought, wow.
AFAIK in the software I linked in the OP, you don't need to input the focal length of the camera, or the distance, or anything. You just input two images and it produces the various maps. I found some kind of tutorial about creating depth maps from two images; if anyone needs the link, here it is:
http://www.pages.drexel.edu/~nk752/depthMapTut.html
Thank you very much for your explanation.
If I write something interesting I'll share it here.
Thanks again.
A general-purpose tool doesn't actually need to ask for the baseline or focal length, since there are reasonable defaults for both: the baseline would be the average distance between human eyes, and the focal length would be the average depth of the human eyeball, from the pupil's lens to the central retinal vein.

Of course your results will not be entirely accurate if you do not use the actual values from the photo shoot, but in most cases photographers try to simulate physical eye conditions, so assuming these defaults will generally not be noticeable.
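A sketch of what those assumed defaults might look like in code — note the specific numeric values are commonly cited anatomical averages I've plugged in for illustration, not figures given in the post:

```python
# Assumed defaults for a tool that never asks the user for camera
# parameters. Both values are rough anatomical averages, so only
# relative depths are meaningful, not absolute distances.
DEFAULT_BASELINE_M = 0.063  # average distance between human eyes
DEFAULT_FOCAL_M = 0.017     # approximate focal length of the human eye

def depth_with_defaults(disparity_m):
    """Z = f * B / disparity, using the eye-like defaults above."""
    if disparity_m == 0:
        return float('inf')  # no parallax: the point is at infinity
    return DEFAULT_FOCAL_M * DEFAULT_BASELINE_M / disparity_m
```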


L. Spiro





If you look at the generated depth map in that link, you'll see the crudity of the result. If you look at the features page for PhotoSculpt, you'll see there are many limitations. I suspect this is because of the approximations and assumptions they make in order to avoid things such as focal length in their calculations. This sort of thing is a complex beast, just as L. Spiro indicated. The idea is pretty neat, but I just don't think the tech is there yet. I've seen a few things done with PhotoSculpt, and in my opinion they just do not come close to what a real artist can produce, especially for game artwork, where it's not simply a matter of producing something as close to real life as possible, but rather of producing something that looks good in a game without hampering performance.

This topic is closed to new replies.
