
# First person camera and matrix transforms



Well, that's pretty much it. I've been trying to find a resource on how to implement a first-person camera in OpenGL (particularly 3.3, using the LWJGL bindings for Java), and I always find things that either:

a. Use deprecated functions or things that depend on deprecated functions (gluLookAt), or
b. Rely on some third-party library that isn't available in Java (i.e., GLM)

Currently I have the perspective projection, translations, and rotations around a single arbitrary vector "working" (taken from http://en.wikipedia.org/wiki/Rotation_matrix, "Rotation matrix from axis and angle"). The rotation I coded is plain archaic: press a to look left, d to look right, z up, x down, etc. And I can't figure out how to make the rotations "stack" properly, so it's either look to the sides or look up and down.
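For reference, the "rotation matrix from axis and angle" on that Wikipedia page can be sketched in Java roughly like this. The class and method names are hypothetical, the axis is assumed to already be a unit vector, and the result is a plain row-major 3x3 array rather than any LWJGL type:

```java
/** Hypothetical helper: 3x3 rotation matrix from a unit axis and an angle (Rodrigues' formula). */
final class AxisAngle {
    private AxisAngle() {}

    /** Returns R (row-major) that rotates a column vector by `angle` radians around the unit axis (x, y, z). */
    static double[][] rotation(double x, double y, double z, double angle) {
        double c = Math.cos(angle);
        double s = Math.sin(angle);
        double t = 1 - c;
        return new double[][] {
            {t * x * x + c,     t * x * y - s * z, t * x * z + s * y},
            {t * x * y + s * z, t * y * y + c,     t * y * z - s * x},
            {t * x * z - s * y, t * y * z + s * x, t * z * z + c},
        };
    }

    /** Computes m * v for a 3x3 matrix and a 3-component column vector. */
    static double[] apply(double[][] m, double[] v) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++) {
            r[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
        }
        return r;
    }
}
```

The axis is interpreted in whatever space the matrix is applied in, so passing the same fixed world axes every time always gives rotations around the world axes.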

Adding to that, I'm kinda confused about how many space transformations I need for some things. For example, say that I load a model. I have to put it in world space first; what would that involve? I imagine translation, scaling and rotation. That would be my "model to world" transform, I guess.
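A minimal sketch of such a "model to world" transform, assuming column-major 4x4 arrays (the layout OpenGL expects when a matrix is uploaded with transpose set to false) and column vectors. The class and method names are hypothetical, and only a rotation around Y is shown to keep it short:

```java
/** Hypothetical column-major 4x4 helpers: element (row, col) lives at index col * 4 + row. */
final class ModelMatrix {
    private ModelMatrix() {}

    static double[] identity() {
        double[] m = new double[16];
        m[0] = m[5] = m[10] = m[15] = 1;
        return m;
    }

    static double[] translation(double x, double y, double z) {
        double[] m = identity();
        m[12] = x;
        m[13] = y;
        m[14] = z;
        return m;
    }

    static double[] scale(double sx, double sy, double sz) {
        double[] m = identity();
        m[0] = sx;
        m[5] = sy;
        m[10] = sz;
        return m;
    }

    /** Rotation around the Y axis by `angle` radians (right-handed). */
    static double[] rotationY(double angle) {
        double c = Math.cos(angle), s = Math.sin(angle);
        double[] m = identity();
        m[0] = c;  m[8] = s;   // row 0: ( c, 0, s)
        m[2] = -s; m[10] = c;  // row 2: (-s, 0, c)
        return m;
    }

    /** Returns a * b, so (a * b) * v == a * (b * v). */
    static double[] mul(double[] a, double[] b) {
        double[] r = new double[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                double sum = 0;
                for (int k = 0; k < 4; k++) {
                    sum += a[k * 4 + row] * b[col * 4 + k];
                }
                r[col * 4 + row] = sum;
            }
        }
        return r;
    }

    /** Transforms the point (x, y, z, 1) and returns (x', y', z'). */
    static double[] transformPoint(double[] m, double x, double y, double z) {
        return new double[] {
            m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14],
        };
    }

    /** Model-to-world = T * R * S: the rightmost factor (scale) is applied first. */
    static double[] modelToWorld(double[] position, double yaw, double uniformScale) {
        return mul(translation(position[0], position[1], position[2]),
                   mul(rotationY(yaw), scale(uniformScale, uniformScale, uniformScale)));
    }
}
```

With column vectors, T * R * S scales the model in its own space, rotates it, and only then moves it into place; swapping the order would, for example, rotate the model around the world origin after translating it.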

Then I have to transform that to view space. And that's where the first person camera kicks in and when I get lost.

I've seen examples using matrix stacks (either the deprecated ones or GLM's implementation), and I can't figure out how those work or whether there is a better way than using them (I guess they got deprecated for a reason?).

So what do I need to get a camera working? And which transforms differentiate one space from another?

If you want me to post code or upload the jar so you can see the issues I'm having, just ask.

---

You've got world right: the world transform scales, rotates, and places (translates) your object out of its object-space (model-space) coordinates into your world coordinates.
View happens in world space; it simply reorients everything around the eye point and look-at direction.

Perspective projection is the matrix that converts things to screen space coordinates (with parallax, scaling according to distance from camera, etc).

The correct multiplication order should first apply your world transform to your model, then the view transform, then the perspective transform.

...unless there's something weird about OpenGL, as I'm basing this on my experience with DirectX.  I do think the underlying math principles are the same though.
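A sketch of that order with column vectors, which is the OpenGL convention: the transform applied first is written rightmost, so in a GLSL 3.30 vertex shader it typically becomes `gl_Position = projection * view * model * vec4(position, 1.0);` (the uniform and attribute names are placeholders). The Java helpers below are hypothetical, use column-major double[16] arrays, and use a plain scale as a stand-in for a real projection:

```java
/** Hypothetical helpers showing the world -> view -> projection order with column vectors. */
final class MvpOrder {
    private MvpOrder() {}

    /** Column-major 4x4 product a * b. */
    static double[] mul(double[] a, double[] b) {
        double[] r = new double[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                double sum = 0;
                for (int k = 0; k < 4; k++) {
                    sum += a[k * 4 + row] * b[col * 4 + k];
                }
                r[col * 4 + row] = sum;
            }
        }
        return r;
    }

    /** Computes m * v for a 4-component column vector. */
    static double[] apply(double[] m, double[] v) {
        double[] r = new double[4];
        for (int row = 0; row < 4; row++) {
            r[row] = m[row] * v[0] + m[4 + row] * v[1] + m[8 + row] * v[2] + m[12 + row] * v[3];
        }
        return r;
    }

    static double[] translation(double x, double y, double z) {
        return new double[] {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  x, y, z, 1};
    }

    static double[] scale(double s) {
        return new double[] {s, 0, 0, 0,  0, s, 0, 0,  0, 0, s, 0,  0, 0, 0, 1};
    }

    /** World transform first, then view, then projection. */
    static double[] mvp(double[] projection, double[] view, double[] model) {
        return mul(projection, mul(view, model));
    }
}
```

Multiplying the three once per object and uploading the product gives the same result as applying them one after another, as long as the order is preserved.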

---

Most of the "easy, convenient and beginner friendly" stuff was removed to clean up the API, because it was kind of silly to have three or more ways of doing the same thing. They are still around through the compatibility extension, though.

The easiest way to handle any camera to me is to treat it like any other object. Give it a transformation matrix, understand that the columns of that matrix are right/up/forward/position vectors (depending on your setup) and that to turn it into a "view matrix" you just have to invert it. Every frame you add the new transformations depending on your input and set the inverted matrix as view matrix.

It also means that "attaching" a camera to an object or moving it for cut scenes is literally straight "forward", compared to trying to keep the inverted matrix and apply everything "in reverse".
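A sketch of that idea in Java, under a few assumptions: column-major double[16] matrices, OpenGL's convention that the camera looks down its local -Z (so the third column holds "back", the negated forward vector), and hypothetical class and method names. Because the camera transform contains no scaling, its inverse is just a transpose of the rotation part plus a fixed-up translation:

```java
/** Hypothetical camera-as-object helpers; matrices are column-major double[16]. */
final class CameraAsObject {
    private CameraAsObject() {}

    /** Camera transform with columns right, up, back (= -forward) and position. */
    static double[] fromAxes(double[] right, double[] up, double[] back, double[] position) {
        return new double[] {
            right[0], right[1], right[2], 0,
            up[0], up[1], up[2], 0,
            back[0], back[1], back[2], 0,
            position[0], position[1], position[2], 1,
        };
    }

    /** Inverts a rotation + translation (no scaling): [R | p]^-1 = [R^T | -R^T p]. */
    static double[] rigidInverse(double[] t) {
        double[] v = new double[16];
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                v[col * 4 + row] = t[row * 4 + col]; // transpose the 3x3 rotation part
            }
            // Row `row` of R^T is column `row` of t; dot it with the position column.
            v[12 + row] = -(t[row * 4] * t[12] + t[row * 4 + 1] * t[13] + t[row * 4 + 2] * t[14]);
        }
        v[15] = 1;
        return v;
    }

    /** Transforms the point (x, y, z, 1) and returns (x', y', z'). */
    static double[] transformPoint(double[] m, double x, double y, double z) {
        return new double[] {
            m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14],
        };
    }
}
```

Each frame you would update the camera transform from input and upload `rigidInverse(camera)` as the view matrix; the eye position always lands at the view-space origin.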

---

BCullis, on 29 Jan 2013 - 11:28, said:
> ...unless there's something weird about OpenGL, as I'm basing this on my experience with DirectX. I do think the underlying math principles are the same though.

I haven't messed with OGL for a while, but unless they've made some pretty drastic changes it's the same.

MSDN has a decent article: http://msdn.microsoft.com/en-us/library/windows/desktop/bb206269(v=vs.85).aspx

Here's a really annoying video where a 5-year-old teaches the world matrix with the assistance of his middle-aged manservant. I tried finding a better one, but entering terms like 'world', 'view', 'projection' and 'matrix' into YouTube means that you get flooded with thousands of new-age and conspiracy videos.

In terms of rotations, you can rotate around each axis individually and then multiply the resulting rotations to get your final rotation, but beware of gimbal lock and remember that the order in which you multiply affects the final rotation. Alternatively, you can use quaternion rotation.
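A quick way to see that the order matters: rotate the vector (0, 0, 1) by 90° around X and 90° around Y in both orders. This is a sketch with hypothetical names, using row-major 3x3 arrays and column vectors (quaternions are not shown):

```java
/** Hypothetical helpers showing that per-axis rotations don't commute (row-major 3x3, column vectors). */
final class RotationOrder {
    private RotationOrder() {}

    static double[][] rotX(double a) {
        double c = Math.cos(a), s = Math.sin(a);
        return new double[][] {{1, 0, 0}, {0, c, -s}, {0, s, c}};
    }

    static double[][] rotY(double a) {
        double c = Math.cos(a), s = Math.sin(a);
        return new double[][] {{c, 0, s}, {0, 1, 0}, {-s, 0, c}};
    }

    /** Returns a * b: applied to a vector, b acts first and a second. */
    static double[][] mul(double[][] a, double[][] b) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    r[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return r;
    }

    static double[] apply(double[][] m, double[] v) {
        double[] r = new double[3];
        for (int i = 0; i < 3; i++) {
            r[i] = m[i][0] * v[0] + m[i][1] * v[1] + m[i][2] * v[2];
        }
        return r;
    }
}
```

`mul(rotX(q), rotY(q))` sends (0, 0, 1) to (1, 0, 0), while `mul(rotY(q), rotX(q))` sends it to (0, -1, 0), with q = π/2.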

The view matrix, as Trienco mentioned, is just the inverse of the matrix to transform the camera from the origin to its position and orientation.

The perspective matrix... (Did they really remove all of the helper functions? That seems asinine since they're used constantly and more or less everyone will just have to rewrite them...) The MSDN article goes into it if you follow the perspective link.

Edited by Khatharr
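Since gluPerspective only works with the deprecated matrix stack, here is a sketch of the matrix its documentation describes, written as a column-major double[16] for Java. fovY is in radians, near and far are positive distances, and the names are hypothetical:

```java
/** Hypothetical replacement for gluPerspective's matrix (OpenGL conventions, column-major). */
final class Perspective {
    private Perspective() {}

    static double[] perspective(double fovY, double aspect, double near, double far) {
        double f = 1.0 / Math.tan(fovY / 2);
        double[] m = new double[16];
        m[0] = f / aspect;
        m[5] = f;
        m[10] = (far + near) / (near - far);
        m[11] = -1;                            // row 3, col 2: w_clip = -z_eye
        m[14] = 2 * far * near / (near - far); // row 2, col 3
        return m;
    }

    /** Projects an eye-space point and divides by w to get normalized device coordinates. */
    static double[] toNdc(double[] m, double x, double y, double z) {
        double cx = m[0] * x + m[4] * y + m[8] * z + m[12];
        double cy = m[1] * x + m[5] * y + m[9] * z + m[13];
        double cz = m[2] * x + m[6] * y + m[10] * z + m[14];
        double cw = m[3] * x + m[7] * y + m[11] * z + m[15];
        return new double[] {cx / cw, cy / cw, cz / cw};
    }
}
```

Points on the near plane end up at NDC z = -1 and points on the far plane at z = +1; in the shader the division by w happens automatically after the vertex stage.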

---

> You've got world right: the world transform scales, rotates, and places (translates) your object out of its object-space (model-space) coordinates into your world coordinates.
> View happens in world space; it simply reorients everything around the eye point and look-at direction.
>
> Perspective projection is the matrix that converts things to screen space coordinates (with parallax, scaling according to distance from camera, etc).
>
> The correct multiplication order should first apply your world transform to your model, then the view transform, then the perspective transform.
>
> ...unless there's something weird about OpenGL, as I'm basing this on my experience with DirectX. I do think the underlying math principles are the same though.

All right. But then, wouldn't going from world to view require another translation (camera position), rotation (where the camera is looking) and scaling (zoom effects)? Or am I missing something?

The math is the same; I mentioned LWJGL and OGL 3.3 so that no one would say "just use gluLookAt!" or "use this handy GLM/freeGLUT function!".

> Most of the "easy, convenient and beginner friendly" stuff was removed to clean up the API, because it was kind of silly to have three or more ways of doing the same thing. They are still around through the compatibility extension, though.
>
> The easiest way to handle any camera to me is to treat it like any other object. Give it a transformation matrix, understand that the columns of that matrix are right/up/forward/position vectors (depending on your setup) and that to turn it into a "view matrix" you just have to invert it. Every frame you add the new transformations depending on your input and set the inverted matrix as view matrix.
>
> It also means that "attaching" a camera to an object or moving it for cut scenes is literally straight "forward", compared to trying to keep the inverted matrix and apply everything "in reverse".

Ohh, I read that somewhere in the Arcsynthesis book. So a view matrix is the inverse of the various transformation matrices that place a camera in the world? So say that I'm at (30,30,0) in the world, looking 45º down (a rotation of 45º around a (1,0,0) unit vector): I multiply those matrices, invert the result, and I have my view matrix?
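Working through those numbers as a sketch: with OpenGL's convention that an unrotated camera looks down -Z, looking 45° down is a rotation of -45° around (1,0,0). Instead of inverting a general 4x4, the inverse of camera = T * R can be written as R^-1 * T^-1, i.e. rotate by +45° after translating by (-30,-30,0). The names and the column-major double[16] layout are assumptions:

```java
/** Hypothetical helpers for the (30, 30, 0), 45-degrees-down example; column-major double[16]. */
final class ViewFromCamera {
    private ViewFromCamera() {}

    static double[] translation(double x, double y, double z) {
        return new double[] {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  x, y, z, 1};
    }

    /** Rotation around (1, 0, 0) by `angle` radians (right-handed). */
    static double[] rotationX(double angle) {
        double c = Math.cos(angle), s = Math.sin(angle);
        return new double[] {1, 0, 0, 0,  0, c, s, 0,  0, -s, c, 0,  0, 0, 0, 1};
    }

    /** Column-major 4x4 product a * b. */
    static double[] mul(double[] a, double[] b) {
        double[] r = new double[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                double sum = 0;
                for (int k = 0; k < 4; k++) {
                    sum += a[k * 4 + row] * b[col * 4 + k];
                }
                r[col * 4 + row] = sum;
            }
        }
        return r;
    }

    static double[] transformPoint(double[] m, double x, double y, double z) {
        return new double[] {
            m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14],
        };
    }

    /** camera = T(position) * Rx(pitch), so view = camera^-1 = Rx(-pitch) * T(-position). */
    static double[] view(double[] position, double pitch) {
        return mul(rotationX(-pitch), translation(-position[0], -position[1], -position[2]));
    }
}
```

With `view(new double[] {30, 30, 0}, Math.toRadians(-45))`, the camera position maps to the origin and a point 10 units along the camera's forward direction (0, -0.707, -0.707) maps to (0, 0, -10). A general 4x4 inverse is only needed once scaling or shearing enters the camera matrix.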

> BCullis, on 29 Jan 2013 - 11:28, said:
>> ...unless there's something weird about OpenGL, as I'm basing this on my experience with DirectX. I do think the underlying math principles are the same though.
>
> I haven't messed with OGL for a while, but unless they've made some pretty drastic changes it's the same.
>
> MSDN has a decent article: http://msdn.microsoft.com/en-us/library/windows/desktop/bb206269(v=vs.85).aspx
>
> In terms of rotations, you can rotate around each axis individually and then multiply the resulting rotations to get your final rotation, but beware of gimbal lock and remember that the order in which you multiply affects the final rotation. Alternatively, you can use quaternion rotation.
>
> The view matrix, as Trienco mentioned, is just the inverse of the matrix to transform the camera from the origin to its position and orientation.
>
> The perspective matrix... (Did they really remove all of the helper functions? That seems asinine since they're used constantly and more or less everyone will just have to rewrite them...) The MSDN article goes into it if you follow the perspective link.

I'm reading that MSDN article right now. Too bad the View Transform section uses functions that I don't have. There weren't really helper functions in OGL, just functions for the fixed-function pipeline's constructs, which had matrix stacks, projection matrices, transforms, etc. Since those constructs are removed in modern OGL, I have to implement them myself.

There is GLM, a C++ library that does have some helper functions and data types that mirror GLSL types, for example. I'm using Java though.

Anyway, I should point out that I had separate rotations for each axis working before, and those did stack. It was an awful lot of multiplying, though (three separate rotation matrices, one for translation, and the perspective). I was looking for a more straightforward way.


I thought that the general rotation matrix with one vector and one angle would work, but it didn't quite work as I imagined.

In the code I set up three unit vectors, (1,0,0), (0,1,0), (0,0,1), three floats (one angle per axis), and an offset (say, +5 degrees per update). The camera looks down the Z axis by default. When you press a key that rotates, say, around Z, I send the angle (currentAngle = currentAngle + offset) and the Z unit vector.

That works: it rotates the world around the Z axis by currentAngle (in radians). But if I rotate again, around Y for example, the rotation behaves as if the camera were looking down the Z axis again (which it isn't, because I rotated it before). So each rotation works separately.

Then I thought, well yeah, that's because the unit vectors, which represent the axes of rotation, never changed. So I set it up so that after a rotation, each vector is rotated by the same matrix and then normalized. That didn't work either; it still rotates as if the camera were always looking down the Z axis, but in awkward ways.

Even so, I couldn't tell where I was doing my world transform and where my view transform. Now I think I'm not doing any world transform, since I didn't need one for what I was doing: it renders the vertices as they are (it's supposed to be the terrain, so the (0,0) position of the terrain is (0,0) of the world), and then tries to transform them to camera space.

Edited by TheChubu

---

> All right. But then, wouldn't going from world to view require another translation (camera position), rotation (where the camera is looking) and scaling (zoom effects)? Or am I missing something?

Yes, it does.  I'm just saying it doesn't change "spaces" when you apply the view matrix.  Like others have said, it essentially applies an inverse of the camera's position, rotation, and scale.  I use helper functions like "createLookAt", but my camera stores per-axis rotations and a world-position vector that end up being used for the calculations (world position gives the origin for the look-at vector, and the camera rotations are fed into a "CreateRotationYawPitchRoll" matrix that modifies the camera's orientation vectors (left, up, and forward)).

Gah, any more parentheses there and I'd be writing Lisp o_o

---

I see. Now it makes more sense. So where does the look-at vector come from? Is it computed from mouse input?

---

Personally I would absolutely avoid storing angles, though it works fine for a simple FPS style camera (just make sure you always rotate around "up" first and "right" second). The simple fact is that the order in which your rotations are applied is important and completely lost if you just accumulate angles.
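For the simple FPS case, here is a sketch that follows that advice: yaw around world up first, then pitch around the resulting local right, which as a single matrix is orientation = Ry(yaw) * Rx(pitch), with pitch clamped so the camera can't flip over. It assumes OpenGL conventions (right-handed, camera looking down -Z); the class, its field names and the 89° limit are hypothetical choices:

```java
/** Hypothetical FPS camera: orientation = Ry(yaw) * Rx(pitch), camera looks down -Z. */
final class FpsCamera {
    private static final double MAX_PITCH = Math.toRadians(89); // stay away from straight up/down

    double yaw;   // radians; positive turns left (counter-clockwise seen from above)
    double pitch; // radians; positive looks up
    final double[] position = new double[3];

    /** Accumulates angle deltas, e.g. mouse movement times a sensitivity factor. */
    void look(double dYaw, double dPitch) {
        yaw += dYaw;
        pitch = Math.max(-MAX_PITCH, Math.min(MAX_PITCH, pitch + dPitch));
    }

    /** Ry(yaw) * Rx(pitch) applied to (0, 0, -1). */
    double[] forward() {
        double cp = Math.cos(pitch);
        return new double[] {-Math.sin(yaw) * cp, Math.sin(pitch), -Math.cos(yaw) * cp};
    }

    /** Ry(yaw) * Rx(pitch) applied to (1, 0, 0); pitch does not change it. */
    double[] right() {
        return new double[] {Math.cos(yaw), 0, -Math.sin(yaw)};
    }

    /** Ry(yaw) * Rx(pitch) applied to (0, 1, 0). */
    double[] up() {
        double sp = Math.sin(pitch);
        return new double[] {Math.sin(yaw) * sp, Math.cos(pitch), Math.cos(yaw) * sp};
    }

    /** Column-major view matrix: rows are right, up, back; translation is -(axis . position). */
    double[] viewMatrix() {
        double[] r = right(), u = up(), f = forward();
        double[] b = {-f[0], -f[1], -f[2]};
        return new double[] {
            r[0], u[0], b[0], 0,
            r[1], u[1], b[1], 0,
            r[2], u[2], b[2], 0,
            -dot(r, position), -dot(u, position), -dot(b, position), 1,
        };
    }

    private static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }
}
```

Because the angles are only ever combined in this one fixed order, storing them is safe here; for free six-degrees-of-freedom movement the accumulated-matrix approach below avoids the problem entirely.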

Also, pleaaaase don't abuse scaling to zoom. Zooming is a perspective effect and the result of focusing your lens to show a smaller area, i.e., what you get by reducing your field of view. Scaling will scale your objects, which can make them grow past your near plane or even behind your camera, and suddenly zooming turns into a wall hack.
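A sketch of zooming through the field of view instead, assuming a gluPerspective-style projection whose y scale is f = 1 / tan(fovY / 2). Multiplying f by the zoom factor magnifies the image without touching the geometry or the near plane. Names are hypothetical and angles are in radians:

```java
/** Hypothetical zoom helpers: zoom by narrowing the vertical field of view. */
final class Zoom {
    private Zoom() {}

    /** Field of view that magnifies the image `zoom` times relative to `baseFovY`. */
    static double zoomedFovY(double baseFovY, double zoom) {
        return 2 * Math.atan(Math.tan(baseFovY / 2) / zoom);
    }

    /** The projection's y scale term, f = 1 / tan(fovY / 2). */
    static double focalLength(double fovY) {
        return 1 / Math.tan(fovY / 2);
    }
}
```

The zoomed angle is then passed to whatever builds the projection matrix; the view matrix stays a pure rotation and translation, which also keeps its cheap inverse valid.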

My usual camera class is using my own vector/matrix stuff, so it would be mostly useless to you, but the basic setup is that it stores a transformation matrix (4x4). All rotations and translations are directly applied and accumulated in that matrix when processing the users input. Since matrix math is neat, applying a rotation around (1,0,0) will always look up/down around your current "right", just like a translation along (1,0,0) will always strafe left/right. For the rare situations where you want to move/rotate using the world axes, you just multiply the new transformation from the other side (to visualize it, one way the rotation happens after the previous transformations, the other way it happens before... while all local axes are still aligned with the world axes).
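A small sketch of the two multiplication orders, assuming column-major double[16] matrices and column vectors (all names are hypothetical). Post-multiplying applies the new step in the camera's current local axes; pre-multiplying applies it in the fixed world axes:

```java
/** Hypothetical helpers: accumulating steps in the camera's local frame vs. the world frame. */
final class LocalVsWorld {
    private LocalVsWorld() {}

    /** Column-major 4x4 product a * b. */
    static double[] mul(double[] a, double[] b) {
        double[] r = new double[16];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                double sum = 0;
                for (int k = 0; k < 4; k++) {
                    sum += a[k * 4 + row] * b[col * 4 + k];
                }
                r[col * 4 + row] = sum;
            }
        }
        return r;
    }

    static double[] translation(double x, double y, double z) {
        return new double[] {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  x, y, z, 1};
    }

    /** Rotation around the Y axis by `angle` radians (right-handed). */
    static double[] rotationY(double angle) {
        double c = Math.cos(angle), s = Math.sin(angle);
        return new double[] {c, 0, -s, 0,  0, 1, 0, 0,  s, 0, c, 0,  0, 0, 0, 1};
    }

    /** Post-multiply: the step is expressed in the camera's current (local) axes. */
    static double[] applyLocal(double[] camera, double[] step) {
        return mul(camera, step);
    }

    /** Pre-multiply: the step is expressed in the world axes. */
    static double[] applyWorld(double[] camera, double[] step) {
        return mul(step, camera);
    }
}
```

For a camera turned 90° left (now facing -X), `applyLocal(cam, translation(1, 0, 0))` strafes to the camera's right and moves it to (0, 0, -1), while `applyWorld` moves it along world +X to (1, 0, 0). The position is the fourth column, indices 12 to 14.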

Btw., your typical lookAt function will just build a transformation matrix from the passed vectors and then invert it, so it becomes extremely superfluous if you store the transformation matrix in the first place. Inverting it is simple, as long as you stay away from scaling.


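```cpp
// Transform: the camera's 4x4 with right/up/forward/position at indices 0-2, 4-6, 8-10, 12-14.
// Assumes Matrix44's 16-value constructor takes values in the same order Transform stores them.
// Rotation part is transposed; translation becomes -(R^T * position).
viewMatrix = Matrix44(
    Transform[0], Transform[4], Transform[8],  0,
    Transform[1], Transform[5], Transform[9],  0,
    Transform[2], Transform[6], Transform[10], 0,
    -(Transform[0]*Transform[12] + Transform[1]*Transform[13] + Transform[2]*Transform[14]),
    -(Transform[4]*Transform[12] + Transform[5]*Transform[13] + Transform[6]*Transform[14]),
    -(Transform[8]*Transform[12] + Transform[9]*Transform[13] + Transform[10]*Transform[14]),
    1);
```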
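To illustrate the point about lookAt, here is a sketch of one in Java that does exactly that: it builds the camera's right/up/back axes from eye, target and up (the same construction the gluLookAt documentation describes) and writes out the rigid inverse directly. The column-major double[16] output, OpenGL's -Z viewing direction, and all names are assumptions:

```java
/** Hypothetical lookAt: build the camera's axes from eye/target/up, then write out the rigid inverse. */
final class LookAt {
    private LookAt() {}

    static double[] lookAt(double[] eye, double[] target, double[] worldUp) {
        double[] back = normalize(sub(eye, target));      // camera +Z points away from the target
        double[] right = normalize(cross(worldUp, back)); // undefined if worldUp is parallel to the view direction
        double[] up = cross(back, right);                 // unit length already: back and right are orthonormal
        // The camera transform would have columns right, up, back, eye; its inverse has them as rows.
        return new double[] {
            right[0], up[0], back[0], 0,
            right[1], up[1], back[1], 0,
            right[2], up[2], back[2], 0,
            -dot(right, eye), -dot(up, eye), -dot(back, eye), 1,
        };
    }

    static double[] transformPoint(double[] m, double x, double y, double z) {
        return new double[] {
            m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14],
        };
    }

    private static double[] sub(double[] a, double[] b) {
        return new double[] {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
    }

    private static double[] cross(double[] a, double[] b) {
        return new double[] {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
    }

    private static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    private static double[] normalize(double[] v) {
        double len = Math.sqrt(dot(v, v));
        return new double[] {v[0] / len, v[1] / len, v[2] / len};
    }
}
```

If the camera's transform is already stored, building those axes again is redundant, which is the point made above: you can go straight to the inverse.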


---

> I see. Now it makes more sense. So where does the look-at vector come from? Is it computed from mouse input?

It depends on your implementation.  I could technically say "yes" when talking about my own code, because mouse movement alters the camera's rotation, and that rotation modifies the camera's "Forward" vector, which is used together with the camera's world position to make my look-at matrix.  But the DirectX LookAt method takes an origin and a destination, and calculates the matrix that would represent a view "looking at" the destination from the origin.

The following is C# code, but probably explains better than I just tried to:

(_rotation is a vector representing x, y, and z-axis rotation of the camera)

```csharp
public virtual void Update(float timeDelta)
{
    Matrix rotation = Matrix.RotationYawPitchRoll(_rotation.Y, _rotation.X, 0);
    _forward = Vector3.TransformCoordinate(Vector3.UnitZ, rotation);
    _up = Vector3.TransformCoordinate(Vector3.UnitY, rotation);
    _left = Vector3.Normalize(Vector3.Cross(_forward, _up));
    _viewMatrix = Matrix.LookAtLH(_worldPosition, _worldPosition + _forward, _up);
}
```

Edited by BCullis