OpenGL Matrices, explanation?

13 comments, last by Prefect 15 years, 8 months ago
I tried to read this (http://www.sjbaker.org/steve/omniv/matrices_can_be_your_friends.html), but it's very confusingly written in my opinion. For example, he writes, "Well, if we neglect the translation part (the bottom row)", and the very next thing he writes is "After that, you just add the translation onto each point so that: " but he doesn't add the bottom row, he adds something else. And stuff like that seems to be abundant on that page... So I was wondering, could someone give me an example of how to use it?
IMHO the author of the cited article means the following:

An (affine) transformation matrix as used by OpenGL has a principal layout like
[ x_x  y_x  z_x  t_x ]
[ x_y  y_y  z_y  t_y ]
[ x_z  y_z  z_z  t_z ]
[  0    0    0    1  ]
If only rotation and translation are used, then x_x ... z_z denote the rotation and t_x ... t_z the translation. (BTW: Scaling and shearing will also be encoded into x_x ... z_z.)

Then the author claims that such a matrix can be split into the rotational part and the translational part this way:
[ x_x  y_x  z_x  t_x ]    [ 1  0  0  t_x ]   [ x_x  y_x  z_x  0 ]
[ x_y  y_y  z_y  t_y ] == [ 0  1  0  t_y ] * [ x_y  y_y  z_y  0 ]
[ x_z  y_z  z_z  t_z ]    [ 0  0  1  t_z ]   [ x_z  y_z  z_z  0 ]
[  0    0    0    1  ]    [ 0  0  0   1  ]   [  0    0    0   1 ]

And due to a mathematical rule that says
( M1 * M2 ) * M3 == M1 * ( M2 * M3 )
one can interpret that split matrix T * R, although it is applied to a point p in a single step
( T * R ) * p
as a two-step transformation
== T * ( R * p )
that rotates the point (i.e. R * p ) and translates the result (i.e. translates the rotated point). The order of effects is important.
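To make the rule ( T * R ) * p == T * ( R * p ) concrete, here is a minimal sketch in plain C without any OpenGL calls. The types `Mat4` and `Vec4` and all function names are made up for this illustration, and the rotation is fixed at 90 degrees about the z axis only so that the arithmetic stays exact.

```c
#include <assert.h>

/* 4x4 matrix indexed as m[row][col]; points are column vectors (x, y, z, 1). */
typedef struct { double m[4][4]; } Mat4;
typedef struct { double v[4]; } Vec4;

/* Matrix product a * b. */
Mat4 mat4_mul(Mat4 a, Mat4 b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            r.m[i][j] = 0.0;
            for (int k = 0; k < 4; ++k) {
                r.m[i][j] += a.m[i][k] * b.m[k][j];
            }
        }
    }
    return r;
}

/* Matrix times column vector, a * p. */
Vec4 mat4_apply(Mat4 a, Vec4 p) {
    Vec4 r;
    for (int i = 0; i < 4; ++i) {
        r.v[i] = 0.0;
        for (int k = 0; k < 4; ++k) {
            r.v[i] += a.m[i][k] * p.v[k];
        }
    }
    return r;
}

/* Pure translation T: identity with (tx, ty, tz) in the right-most column. */
Mat4 translation(double tx, double ty, double tz) {
    Mat4 t = {{{1, 0, 0, tx}, {0, 1, 0, ty}, {0, 0, 1, tz}, {0, 0, 0, 1}}};
    return t;
}

/* Pure rotation R: 90 degrees about the z axis (cos = 0, sin = 1). */
Mat4 rotation_z_90(void) {
    Mat4 r = {{{0, -1, 0, 0}, {1, 0, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
    return r;
}

/* Returns 1 if (T * R) * p and T * (R * p) agree in every component.
   Exact comparison is only reasonable here because all values are small integers. */
int one_step_equals_two_steps(Mat4 t, Mat4 r, Vec4 p) {
    Vec4 one_step = mat4_apply(mat4_mul(t, r), p);
    Vec4 two_step = mat4_apply(t, mat4_apply(r, p));
    for (int i = 0; i < 4; ++i) {
        if (one_step.v[i] != two_step.v[i]) return 0;
    }
    return 1;
}

/* Small self-check: the point (1, 0, 0) is rotated to (0, 1, 0), then moved by (1, 2, 3). */
void check_split(void) {
    Vec4 p = {{1, 0, 0, 1}};
    assert(one_step_equals_two_steps(translation(1, 2, 3), rotation_z_90(), p));
}
```

Calling `check_split()` exercises both paths; with general angles you would compare with a small tolerance instead of `!=`.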
O_O

Well, thanks. I did not understand that though. Could you give me an example of how to use it in a program?
I'm not exactly an expert when it comes to matrices, but this is my take on the page;

In most programming, you would expect an array of 9 elements (let's say int array[9]) to be laid out (if you were using it as a square), like so;
[0][1][2]
[3][4][5]
[6][7][8]

So basically, your data is stored in what is known as row-centric (more commonly called row-major) order. This just means that the elements of one row sit next to each other in memory.

Most mathematicians, and OpenGL, treat the data like so;
[0][3][6]
[1][4][7]
[2][5][8]

which is called a column-centric (more commonly, column-major) matrix. Notice that the data is laid out differently from the row-centric matrix.
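The two layouts differ only in how a (row, column) pair is turned into an array index. A small sketch in C, with function names invented for this illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Index of element (row, col) in an n x n matrix stored row by row. */
size_t row_major_index(size_t row, size_t col, size_t n) {
    assert(row < n && col < n);
    return row * n + col;
}

/* Index of element (row, col) when the same matrix is stored column by column,
   as in the second grid above. */
size_t col_major_index(size_t row, size_t col, size_t n) {
    assert(row < n && col < n);
    return col * n + row;
}
```

For the 3x3 grids above, the element in the top row, middle column is index 1 in the row-by-row layout and index 3 in the column-by-column layout.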

Now, a 3x3 matrix, like above, only allows you to rotate, scale and shear your object.

To keep the math as simple as possible, the matrix needs to be kept as a square, but we need some way to store the position of the object in the matrix. To do this, we make the matrix 4x4, which gives us (keeping the OGL format);
[0] [4] [8]  [12]
[1] [5] [9]  [13]
[2] [6] [10] [14]
[3] [7] [11] [15]


In this way, the translation goes in elements [3],[7] and [11], as x,y and z respectively.

Have a look at the matrix the writer shows you. It looks something like this;
[1][0][0]
[0][1][0]
[0][0][1]
[0][0][0] <-- this is the part you need to notice


The author has made the matrix a 4x3 matrix, rather than keep to the 4x4 matrix he used earlier. When he says to add the translation onto each point, he means take each value of the new position, and add it to the relevant value in the last line of the matrix. I hope this helps with some of your confusion about this.
I know that matrix math isn't that easy to understand, but if you understand it you will have far fewer problems interpreting how 3D works. Trust me. So please forgive me, but the following is again some matrix math ;)

If you compose a transformation matrix with OpenGL's API on OpenGL's so-called MODELVIEW matrix stack, you do something like
glLoadIdentity();
glTranslatef(tx,ty,tz);
glRotatef(alpha,ax,ay,az);

If you wonder why I use glLoadIdentity: it is useful to initialize OpenGL's matrix stack with a defined value. Routines like glRotate and glTranslate always _multiply_ their matrix onto whatever is already on top of the stack; see below.

Mathematically this is described by composing a transformation matrix
( I * T(tx,ty,tz) ) * R(alpha,ax,ay,az) =: M
Herein I denotes the identity matrix, R the rotation resulting from glRotatef, and T the translation resulting from glTranslatef.

Notice the order of matrices in the formula from left to right and the order of API calls from top to bottom is the same!

Then you push a vertex into the API, e.g. (using the immediate mode)
glBegin(GL_TRIANGLES);
glVertex3f(px,py,pz);
glEnd();
with the above transformation being active; it is hence applied to the vertex position, yielding the transformed vertex position (I'm dropping the transformation arguments from here on since I'm too lazy to write them down, okay?)
p' := M * p == ( ( I * T ) * R ) * p

What does this mean? It means that OpenGL has a transformation on its stack, namely a composition of a translation and a rotation. This composed transformation is applied as a whole (notice the parentheses) to the vertex position.

Now let us inspect the effect of the transformation. Multiplying the identity matrix with another matrix has no effect, so that
== ( T * R ) * p

We already know that the parentheses play no role w.r.t. the result (see my previous post), so that we can _interpret_ the result (although it is not being applied in this way) as
== T * ( R * p )
which means nothing other than that the vertex position is rotated, and the rotated result is translated. Voila.

Now, isn't it the same as translating the vertex and rotating the result (i.e. the other order)? No, it isn't (in general)! You can simply construct an example: Say, you use
glTranslatef(0,1,0);
glRotatef(90,1,0,0);
glBegin(GL_TRIANGLES);
glVertex3f(0,1,0);
glEnd();
Using what is written above, OpenGL does
1. rotating [ 0,1,0 ] by 90 degrees around the x axis, yielding [ 0,0,1 ]
2. translating [ 0,0,1 ] by [ 0,1,0 ], yielding [ 0,1,1 ]

The other way
glRotatef(90,1,0,0);
glTranslatef(0,1,0); // <-- EDIT here was an error
glBegin(GL_TRIANGLES);
glVertex3f(0,1,0);
glEnd();
does
1. translating [ 0,1,0 ] by [ 0,1,0 ], yielding [ 0,2,0 ]
2. rotating [ 0,2,0 ] by 90 degrees around the x axis, yielding [ 0,0,2 ]
which obviously differs from the first result.
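The two results can be reproduced without OpenGL. The following sketch in C applies the two effects directly to the vertex [ 0,1,0 ]; the type `Vec3` and the function names are invented for this illustration, and the rotation is hard-coded to 90 degrees around the x axis.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

/* Rotate by +90 degrees around the x axis: (x, y, z) -> (x, -z, y). */
Vec3 rotate_x_90(Vec3 p) {
    Vec3 r = { p.x, -p.z, p.y };
    return r;
}

/* Move p by the offset t. */
Vec3 translate(Vec3 p, Vec3 t) {
    Vec3 r = { p.x + t.x, p.y + t.y, p.z + t.z };
    return r;
}

/* Effect of calling glTranslatef first and glRotatef second:
   the vertex is rotated, then the rotated result is translated. */
Vec3 translate_call_first(Vec3 p, Vec3 t) {
    return translate(rotate_x_90(p), t);
}

/* Effect of calling glRotatef first and glTranslatef second:
   the vertex is translated, then the translated result is rotated. */
Vec3 rotate_call_first(Vec3 p, Vec3 t) {
    return rotate_x_90(translate(p, t));
}

/* Self-check with the numbers used in the example above. */
void check_order_example(void) {
    Vec3 p = { 0, 1, 0 };
    Vec3 t = { 0, 1, 0 };
    Vec3 a = translate_call_first(p, t);
    Vec3 b = rotate_call_first(p, t);
    assert(a.x == 0 && a.y == 1 && a.z == 1);
    assert(b.x == 0 && b.y == 0 && b.z == 2);
}
```

Note that the transformation issued last in the API is the one that touches the vertex first.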

However, sometimes you apply a transformation that is already composed, using
glMultMatrixf(...);
instead of the particular transformations glRotatef, glTranslatef, ... Then, and that is the stuff of my previous post, one can decompose the transformation matrix and interpret it as
T * R
and that is, with the above explanations in mind, equivalent to
glTranslatef(...);
glRotatef(...);

_That_ is what the article says, at least in the cited section.
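As a sketch of that decomposition: given a 16-float array in the column-major order that glMultMatrixf expects, the translation sits in elements 12, 13 and 14, and everything else belongs to R. The function names below are made up for this illustration, and the code assumes an affine matrix whose bottom row is 0 0 0 1.

```c
#include <assert.h>
#include <string.h>

/* out = a * b for column-major 4x4 matrices; element (row, col) is at [col * 4 + row].
   out must not be the same array as a or b. */
void mat4_mul_cm(const float a[16], const float b[16], float out[16]) {
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
}

/* Split m into t and r such that m == t * r.
   t receives the pure translation, r the remaining upper-left 3x3 part
   (the rotation, plus any scaling or shearing that m contains). */
void split_translation_rotation(const float m[16], float t[16], float r[16]) {
    static const float identity[16] = { 1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1 };

    /* Assumption of this sketch: bottom row of m is 0 0 0 1. */
    assert(m[3] == 0.0f && m[7] == 0.0f && m[11] == 0.0f && m[15] == 1.0f);

    memcpy(t, identity, sizeof identity);
    memcpy(r, m, sizeof identity);

    /* Move the right-most column (without its final 1) from r into t. */
    t[12] = m[12]; t[13] = m[13]; t[14] = m[14];
    r[12] = 0.0f;  r[13] = 0.0f;  r[14] = 0.0f;
}
```

With `t` and `r` obtained this way, calling glTranslatef(t[12], t[13], t[14]) followed by glMultMatrixf(r) should leave the same matrix on the stack as a single glMultMatrixf(m).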

[Edited by - haegarr on August 10, 2008 1:17:01 PM]
@webwraith: You're mainly correct, but not totally.

Considering that OpenGL uses column vectors, and further uses the 4th scalar of a (homogeneous) vector as the homogeneous co-ordinate, the translation must be in the right-most column but not in the bottom row. Hence, if using a 4x4 matrix, it looks like
[ 1  0  0  t_x ]
[ 0  1  0  t_y ]
[ 0  0  1  t_z ]
[ 0  0  0   1  ]
and for a 4x3 matrix, it looks like
[ 1  0  0  t_x ]
[ 0  1  0  t_y ]
[ 0  0  1  t_z ]
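In array terms, and assuming the column-major float[16] layout that glLoadMatrixf and glMultMatrixf take, the right-most column corresponds to elements 12 to 15. A minimal sketch in C, with function names invented for this illustration:

```c
#include <assert.h>

/* Fill out with a pure translation in column-major order:
   identity on the diagonal (elements 0, 5, 10, 15), translation in 12, 13, 14. */
void make_translation(float tx, float ty, float tz, float out[16]) {
    for (int i = 0; i < 16; ++i) {
        out[i] = (i % 5 == 0) ? 1.0f : 0.0f;
    }
    out[12] = tx;
    out[13] = ty;
    out[14] = tz;
}

/* Apply an affine column-major matrix to the point p = (x, y, z, 1).
   The implicit 1 is what makes the right-most column get added to the result. */
void transform_point(const float m[16], const float p[3], float out[3]) {
    /* Assumption of this sketch: bottom row of m is 0 0 0 1. */
    assert(m[3] == 0.0f && m[7] == 0.0f && m[11] == 0.0f && m[15] == 1.0f);
    for (int row = 0; row < 3; ++row) {
        out[row] = m[0 * 4 + row] * p[0]
                 + m[1 * 4 + row] * p[1]
                 + m[2 * 4 + row] * p[2]
                 + m[3 * 4 + row];
    }
}
```

Transforming the point (1, 2, 3) with `make_translation(10, 20, 30, m)` gives (11, 22, 33), i.e. the translation is simply added to each point.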

Math is tricky. You must choose a methodological approach or you'll get lost in nowhere land in a blink of an eye. Chances are you give up learning math altogether if you choose the wrong approach in the beginning because in doing so, you'll eventually get to a point that you conclude "Well, Math is not my thing!". It is your thing my friend, but you need to choose a slow methodological approach or you'll get lost sooner than you expect it. You can't expect to learn 3D math (or any other field of science) by reading a tutorial or two, because Math is no picnic. You can't rush it. You need to invest a lot of time in learning it. It's tough, I know, but so is life and there is no easy way around either of them!

My apologies if that doesn't apply to you, in which case you can skip this post altogether, but if it does, please take a moment to consider it for your own good.

I don't care what those tutorials are advertising, but there is no fast way to learn Math or programming or any other scientific field. I hold a BSc. in Computer Science, have read a lot of math-related books over the years and I do enjoy thinking about mathematical problems, so I'm neither the lazy guy nor do I lack proper training, but even I get stuck at math quite a lot, because as I said, math is no picnic.

I strongly suggest that you pick up a copy of "3D Math Primer for Graphics and Game Development" by Fletcher Dunn and Ian Parberry from your favorite library or bookstore and start reading it right away. It's the best elementary book on 3D math that I've come by. There IS a difference between reading 400 pages of 3D math and skimming over a couple of tutorials: you'd be much more knowledgeable when you invest more time and choose the right approach.

... and no, I'm not affiliated with the authors of this book in any way, shape or form. [smile]

Good luck!

[Edited by - Ashkan on August 10, 2008 7:28:14 PM]
Quote:Original post by haegarr
and for a 4x3 matrix, it looks like
[ 1  0  0  t_x ]
[ 0  1  0  t_y ]
[ 0  0  1  t_z ]


Minor detail: That's a 3x4 matrix.
Widelands - laid back, free software strategy
Quote:Original post by Prefect
Quote:Original post by haegarr
and for a 4x3 matrix, it looks like
[ 1  0  0  t_x ]
[ 0  1  0  t_y ]
[ 0  0  1  t_z ]


Minor detail: That's a 3x4 matrix.


I disagree. :)
Col x Row.
"Game Maker For Life, probably never professional thou." =)
Quote:Original post by haegarr
@webwraith: You're mainly correct, but not totally.
...

Thank you for pointing that out, but the example on the page is the same as the 4x3 matrix that you can see in my post. Perhaps the confusion then lies in whether the author of that page is using row- or column-centric matrices at that point?

[Edited by - webwraith on August 17, 2008 2:37:39 PM]

This topic is closed to new replies.
