js2007

Determine camera pose from known object location and model view matrix


Hi everybody, I have a math question related to a somewhat special OpenGL/real-world problem:

I am using an augmented reality (AR) library that detects specific markers (2D patterns) in pictures taken of the real world (imagine a picture of an office with a poster hanging on one of the walls, where that poster shows a very distinct 2D marker pattern). The idea behind AR is that you display such real-world pictures on a screen and overlay rendered 3D objects; for example, you could render a shelf next to the wall where the marker was detected. For this purpose the AR library I am using returns an OpenGL-style model view matrix for each marker it detected. If AR in OpenGL is all you want to do, then that is all you need to correctly render objects onto markers.

However, I am not trying to do AR but rather to estimate the camera's real-world pose. To me it seems that by having knowledge of the true, real-world location of a marker and the model view matrix returned by the AR library for that particular marker, it should be possible to determine the real-world pose of the camera (where we assume a pinhole camera for simplicity).

Unfortunately, I am not exactly sure how I would compute the camera's pose. I think I could interpret the three translation components in the MV matrix as relative translation values between the marker and the camera. But just based on this information the camera could be anywhere on the surface of a sphere centered at the marker. How would you compute the precise pose?

Thanks a lot! -Jonas

Since you know the location and orientation of the marker, you can create a matrix to represent it. The matrix computed by your library gives a transformation from the marker's space to the camera's space, i.e. the view matrix multiplied by the model matrix. Multiply this by the inverse of the model matrix (the one that represents the location of the marker) to get the view matrix. The inverse of the view matrix gives you the position and orientation of the camera in world space. I think?

So you are saying the model view matrix A that I get from the library is essentially A=M*V of which I know M and I want to compute V^{-1}. Because A^{-1}=V^{-1}*M^{-1} the inverse view matrix would then be given by V^{-1}=A^{-1}*M. Is that correct? Can anyone confirm that? If this is correct, what do these matrices actually look like? I never worked with OpenGL before but read a short introduction on its 4x4 transformation matrix format. From that introduction it seems unreasonable to carry out the above mentioned math on the 4x4 model view matrix.

Why does this math seem unreasonable? The matrix representation is used precisely because it allows this sort of painless manipulation of transformations. The top-left 3x3 elements of the matrix are a rotation and scaling matrix. If there is no scaling, then they are just a rotation matrix, and in a view matrix each row represents a direction vector of the camera: the first row will be right, the second row will be up, and the third row will be backwards, I think. The three elements to the right of that are the translation; in the camera's own world-space matrix (the inverse of the view matrix) they are the x, y, z position of the center of the camera. The bottom-right element is always 1 in a normal view or model matrix. The 3 elements on the left side of the 4th row are always 0.
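The layout described above can be sketched in plain Python. This is only an illustration, assuming the column-vector convention OpenGL uses (translation in the fourth column) and a rigid transform with no scaling; the helper names `from_gl_column_major`, `split_rigid` and `camera_position_from_view` and the sample values are made up for the example. Two things are easy to trip over: OpenGL passes matrices around as a flat array of 16 values in column-major order, so the translation sits at indices 12 to 14, and in a view matrix the camera position is not the translation t itself but -R^T t.

```python
def from_gl_column_major(flat):
    """Turn a flat 16-value OpenGL array (column-major) into 4 rows of 4."""
    return [[flat[col * 4 + row] for col in range(4)] for row in range(4)]


def split_rigid(m):
    """Return (rotation 3x3, translation 3-vector) of a rigid 4x4 matrix."""
    rotation = [row[:3] for row in m[:3]]
    translation = [row[3] for row in m[:3]]
    return rotation, translation


def camera_position_from_view(view):
    """Camera center in world space for a rigid view matrix: c = -R^T t."""
    rotation, translation = split_rigid(view)
    return [
        0.0 - sum(rotation[k][i] * translation[k] for k in range(3))
        for i in range(3)
    ]


# Made-up view matrix in flat column-major form: an unrotated camera
# sitting at (0, 0, 5) in world space (assumption for illustration only).
flat_view = [
    1.0, 0.0, 0.0, 0.0,    # first column
    0.0, 1.0, 0.0, 0.0,    # second column
    0.0, 0.0, 1.0, 0.0,    # third column
    0.0, 0.0, -5.0, 1.0,   # fourth column: translation, then 1
]

view = from_gl_column_major(flat_view)
rotation, translation = split_rigid(view)
camera_position = camera_position_from_view(view)

print(translation)       # translation column of the view matrix
print(camera_position)   # camera center recovered from it
```

For an unrotated camera the two printed vectors differ only in sign; once a rotation is involved they differ in direction as well, which is why the full inverse is needed.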

Vorpy, you are right about the matrix representation. I implemented the computation of A^{-1}*M, where A is the model view matrix I am given by the AR library and M is the marker-specific matrix. For testing purposes I set M to:

1 0 0 0
0 1 0 0
0 0 1 1000
0 0 0 1


Which (I think) means that the marker is simply translated by 1000mm on the Z axis from the origin, without any rotation.
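As a quick sanity check of that reading, the sketch below applies the test matrix to the marker's local origin. It assumes the column-vector convention (point on the right, translation in the fourth column); `transform_point` is a hypothetical helper written for this example.

```python
def transform_point(m, point):
    """Apply a 4x4 matrix (list of rows) to a 3D point, using w = 1."""
    v = [point[0], point[1], point[2], 1.0]
    out = [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
    return out[:3]


# The test matrix from the post: identity rotation, 1000 mm along Z.
marker_to_world = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1000.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Where the marker's own origin ends up in world coordinates.
marker_origin_in_world = transform_point(marker_to_world, (0.0, 0.0, 0.0))
print(marker_origin_in_world)
```

If the same 16 numbers were instead read in column-major order, the 1000 would land in the bottom row rather than the translation column, so it is worth checking which layout the multiplication code expects.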

Unfortunately, the matrix resulting from A^{-1}*M does not make much sense (I only examined the translation vector so far). I double-checked my matrix multiplications and the inversion of A but could not find a bug.

Therefore I wonder whether this is actually the right way to compute the camera pose. Does anyone have any comments on this?

I think the modelview matrix might be V*M. I haven't done this stuff for a while so I got a bit confused. It has to be V*M, because normally when using the matrix stack the view matrix is set up and then the stack is used to push each of the models' matrices.

Then the camera's transformation would be given by M*A^{-1}
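Spelled out: if the library's matrix is A = V*M, then V = A*M^{-1}, and the camera's world transform is V^{-1} = M*A^{-1}. A minimal Python sketch of that step follows, assuming both matrices are rigid (rotation plus translation, no scaling) so that the inverse can be formed as [R^T | -R^T t]; with scaling, a general 4x4 inverse would be needed instead. The function names and the test poses are invented for a round-trip check and are not part of any AR library.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(m):
    """Inverse of a rotation-plus-translation matrix: [R^T | -R^T t]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [0.0 - sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def camera_pose(marker_to_world, modelview):
    """Camera-to-world transform M * A^{-1}, assuming A = V * M."""
    return matmul(marker_to_world, rigid_inverse(modelview))


def pose_rot_y(angle_deg, tx, ty, tz):
    """Helper for the test: rotation about Y followed by a translation."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, 0.0, s, tx],
            [0.0, 1.0, 0.0, ty],
            [-s, 0.0, c, tz],
            [0.0, 0.0, 0.0, 1.0]]


# Round trip with made-up poses: choose a camera, build the modelview
# matrix an AR library would report, then recover the camera from it.
marker_to_world = pose_rot_y(0.0, 0.0, 0.0, 1000.0)    # M
camera_true = pose_rot_y(30.0, 200.0, 50.0, -300.0)    # V^{-1}
view = rigid_inverse(camera_true)                      # V
modelview = matmul(view, marker_to_world)              # A = V * M
camera_recovered = camera_pose(marker_to_world, modelview)

max_error = max(abs(camera_recovered[i][j] - camera_true[i][j])
                for i in range(4) for j in range(4))
print(max_error)
```

The fourth column of the recovered matrix is the camera position in world coordinates, and its upper-left 3x3 block is the camera orientation.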

Vorpy, you were right about the model view matrix being V*M instead of M*V. Like you suggested, it is working by computing M*A^{-1}. Thanks!

