
Generating height/normal/etc maps from 2 images


4 replies to this topic

#1 MrOMGWTF   Members   -  Reputation: 433


Posted 06 October 2012 - 02:40 AM

O hai thar

I'm thinking about creating a tool for my engine that creates normal/height/etc. maps from 2 stereo images, just like this software: http://www.photosculpt.net/

That software's just damn awesome.

But I have no idea how to do this. Any suggestions?
I think it's based on some kind of luminance difference, or something.


#2 L. Spiro   Crossbones+   -  Reputation: 11939


Posted 06 October 2012 - 06:06 AM

This is related to computer vision, which is actually an AI problem, not a graphics problem.

I will explain some of the procedure, but there is too much to cover in one post, so you will have to research some of the terms I mention on your own.


Firstly, there are a few things you need to know about the cameras used to take the images.
#1: The focal length of the cameras (which should be identical): f.
#2: How far apart the cameras were: the baseline B.

In order to calculate the depth of any given pixel you also need to know where it is in space relative to each camera, but only along one axis (X), since the depth component is still unknown and the vertical component Y is the same for both cameras (assuming the cameras were horizontally aligned).
However, this assumes you know where each pixel is for each camera. That is, for any given pixel, you need to know its X relative to camera 0 and also to camera 1.
Taking 2 images and figuring out this information is called Correspondence.

This is the research you need to do on your own, as it is not practical to explain in a forum post, at least once you include SSD (sum of squared differences) errors for determining the best match when there are multiple candidates, and once you account for the recovery of lost data.
Note that due to information lost between the images (occlusions, for example), it is not always possible to determine the depth of a given pixel. Your best bet in that case is to average the depths of nearby pixels for which correct depth information could be determined.
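The SSD matching described above can be sketched as a brute-force search. This is a minimal illustration under simplifying assumptions (rectified, horizontally aligned grayscale images; the function name and parameters are mine), not production stereo code — real matchers add rectification, sub-pixel refinement, and smoothness constraints:

```python
import numpy as np

def ssd_disparity(left, right, block=3, max_disp=8):
    """Brute-force block matching: for each pixel in the left image, find
    the horizontal shift into the right image whose surrounding block has
    the smallest sum of squared differences (SSD)."""
    h, w = left.shape
    r = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
            best, best_d = np.inf, 0
            # Candidate matches lie to the left in the right image
            # (positive disparity for a standard stereo rig).
            for d in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1,
                             x - d - r:x - d + r + 1].astype(np.float64)
                ssd = np.sum((patch - cand) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp
```

On a synthetic pair where the right image is the left image shifted two pixels, the interior of the returned disparity map comes out as 2; ambiguous or occluded regions are exactly where this naive search fails and the averaging fallback above becomes necessary.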



Once you have used Correspondence to determine where a single pixel is in relation to each camera, you can determine the depth by doing the following:
Project the pixel onto the back plane of camera 0 by a distance of f, and do the same for camera 1.
The results are x0 and x1.

Knowing the focal length f and the baseline B, the equation to find the depth Z is:
(x1 - x0) / f = (B / Z)
or:
Z = fB / (x1 - x0)
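That rearranged formula is a one-liner in code. A minimal sketch (the function name is mine; f, B, x0, and x1 just need consistent units):

```python
def depth_from_disparity(f, baseline, x0, x1):
    """Depth from the stereo equation Z = f * B / (x1 - x0),
    where x0 and x1 are the pixel's projected positions on the
    back planes of camera 0 and camera 1."""
    disparity = x1 - x0
    if disparity == 0:
        return float("inf")  # no parallax: the point is at infinity
    return f * baseline / disparity
```

Note how depth is inversely proportional to disparity: nearby points shift a lot between the two images, distant points barely at all.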


Another useful search term would be “stereo vision”.


L. Spiro

Edited by L. Spiro, 06 October 2012 - 06:15 AM.

It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#3 MrOMGWTF   Members   -  Reputation: 433


Posted 06 October 2012 - 08:45 AM

[quote of L. Spiro's reply #2 above]


This is way more complicated than I thought, wow.
AFAIK in the software I linked in the OP, you don't need to input the focal length of the camera, or the distance, or anything. You just input two images and it produces the maps. I found some kind of tutorial about creating depth maps from two images; if anyone needs the link, here it is:
http://www.pages.drexel.edu/~nk752/depthMapTut.html
Thank you very much for your explanation.
If I write something interesting I'll share it here.
Thanks again.


#4 L. Spiro   Crossbones+   -  Reputation: 11939


Posted 06 October 2012 - 08:55 AM

A general-purpose tool doesn’t actually need to know the baseline or focal length, since there is a sensible default for both: the baseline would be the average distance between human eyes, and the focal length would be the average depth of the human eyeball, from the lens to the retina.

Of course your results will not be entirely accurate if you do not use the actual values from the photography setup, but in most cases photographers try to simulate physical eye conditions, so it generally will not be noticeable if you assume these defaults.
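Since the OP ultimately wants a height map, the defaults matter even less than that: f and B only scale the depth, and normalizing wipes the scale out. A sketch under assumed human-eye defaults (the constants and function are my illustration, not values from this thread):

```python
import numpy as np

IPD_M = 0.063        # assumed: average human interpupillary distance, ~63 mm
EYE_FOCAL_M = 0.024  # assumed: average axial depth of the human eye, ~24 mm

def height_map_from_disparity(disp):
    """Convert a disparity map to depth via Z = f * B / d using the assumed
    defaults, then normalize to [0, 1] so near surfaces are high and far
    surfaces are low. Because f * B is a constant factor, the normalized
    height map is identical no matter which defaults are chosen."""
    disp = np.asarray(disp, dtype=np.float64)
    depth = np.where(disp > 0,
                     EYE_FOCAL_M * IPD_M / np.maximum(disp, 1e-9),
                     np.inf)  # zero/negative disparity: treat as infinitely far
    finite = depth[np.isfinite(depth)]
    near, far = finite.min(), finite.max()
    if far == near:
        return np.zeros_like(depth)  # flat scene: no relief to recover
    return 1.0 - (np.clip(depth, near, far) - near) / (far - near)
```

This is why a tool like PhotoSculpt can get away without asking for camera parameters: relative depth is enough for relief, and absolute scale is unrecoverable from the images alone anyway.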


L. Spiro

Edited by L. Spiro, 06 October 2012 - 04:12 PM.


#5 FLeBlanc   Crossbones+   -  Reputation: 3081


Posted 06 October 2012 - 08:57 AM

[quote of MrOMGWTF's reply #3 above]


If you look at the generated depth map in that link, you'll see the crudity of the result. If you look at the features page for PhotoSculpt, you'll see there are many limitations. I suspect this is because of the approximations and assumptions they make in order to avoid needing things such as focal length in their calculations. This type of thing is a complex beast, just as L. Spiro indicated. The idea is pretty neat, but I just don't think the tech is there yet. I've seen a few things done with PhotoSculpt, and in my opinion they just do not come close to what a real artist can produce, especially for game artwork, where it's not simply a matter of producing something as close to real life as possible, but of producing something that looks good in a game without hampering performance.



