Why do I need a 3x3 matrix to transform a 2 dimensional vector?

5 comments, last by Oxyd 7 years, 4 months ago

I'm going over my graphics theory book, and I'm up to the windowing transform from the canonical view volume: [x_pixel y_pixel 1] = [ ... 3x3 matrix ] * [ x_canonical y_canonical 1 ]

I worked it all out on paper, and if I just take a regular 2x2 matrix, without the last row which is [0 0 1], it produces the same result. So why are we doing this extra step? I think I've also seen DirectX do this, where it uses a higher-order matrix to transform lower-dimensional vectors. I think it had something to do with homogeneous space, which I never really understood. Can anyone explain, please?

You didn't come into this world. You came out of it, like a wave from the ocean. You are not a stranger here. -Alan Watts

A 2x2 matrix gives you rotation/scaling.

A 3x3 matrix gives you rotation/scaling/translation.

It's the same in 3D: a 3x3 matrix gives you rotation/scaling, but a 4x4 matrix gives you translation as well.

Oh OK, I see now: rotation times translation does produce a 3x3 matrix. OK, thanks.

A deeper way to look at it is that the affine transformations we are interested in are a particular type of projective transformation. In projective geometry we use 3 coordinates for a point on the plane, and 3x3 matrices for transformations.

To go into a bit more detail, a linear transform (i.e. a 2x2 matrix in 2D) cannot move the origin anywhere. Consider any 2x2 matrix M = [a, b; c, d]: no matter what a, b, c, d are, the point (0, 0) always maps to itself under M: [a, b; c, d] * (0, 0) = (0 * a + 0 * b, 0 * c + 0 * d) = (0, 0).

This means that while you can do scaling and rotation about the origin with a 2x2 matrix, you cannot do translations: If you move every point in the space in some direction, you necessarily move the origin as well – but as I just showed, a 2x2 matrix cannot do that.

So the trick is to go one dimension higher and then project the result back to the original dimension. In homogeneous coordinates, your origin maps to the vector (0, 0, 1) – since it's not all zeroes, matrix multiplication now can do something nontrivial with it.
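
To make that concrete, here is a small Python sketch. The matrix entries and the translation (5, 7) are made-up numbers for illustration, and mat_vec is just a hypothetical helper, not anything from a particular library.

```python
def mat_vec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# An arbitrary 2x2 matrix: the origin always maps to itself.
linear = [[2, -1],
          [3, 4]]
origin_2d = mat_vec(linear, [0, 0])

# The same linear part embedded in a 3x3 matrix, with an assumed
# translation (5, 7) in the last column and (0, 0, 1) as the last row.
affine = [[2, -1, 5],
          [3, 4, 7],
          [0, 0, 1]]
# In homogeneous coordinates the 2D origin is (0, 0, 1).
origin_h = mat_vec(affine, [0, 0, 1])

print(origin_2d)  # [0, 0]
print(origin_h)   # [5, 7, 1]
```

Dropping the trailing 1 from the second result gives (5, 7), so the origin really has moved by the translation.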

That makes sense regarding translation. But about homogeneous coordinates: is that all there is to it, just adding that 1 to account for translation?

Linear transforms plus translations give you what are known as affine transformations. To represent an affine transformation in 2D, you actually need just a 2x3 matrix, the last column being the translation vector. If you use full 3x3 matrices, you also get the ability to represent perspective projections, in addition to linear transforms and translations.

For affine transformations, it really is about just tacking the 1 at the end of your vector. In fact, if you multiply a 2x3 matrix with a 3x1 vector, you get a 2x1 vector out – losing the extra 1 that got tacked at the end. If you tacked any other number at the end, your translation vector would be multiplied by that number before being applied to your vector – which in most cases is undesirable.
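
A quick sketch of what the tacked-on number does, again in Python. The 2x3 matrix here (identity linear part, translation (5, 7)) and the point (2, 3) are assumed values picked only to keep the arithmetic easy to follow.

```python
def mat_vec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# 2x3 affine matrix: identity linear part, translation (5, 7) in the last column.
affine_2x3 = [[1, 0, 5],
              [0, 1, 7]]

# Tacking a 1 onto (2, 3) applies the translation once; the result is a 2-vector.
with_one = mat_vec(affine_2x3, [2, 3, 1])

# Tacking a 2 on instead scales the translation by 2 before it is applied.
with_two = mat_vec(affine_2x3, [2, 3, 2])

print(with_one)  # [7, 10]
print(with_two)  # [12, 17]
```

With the 1 the point moves by (5, 7); with the 2 it moves by (10, 14), which is usually not what you want.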

With full 3x3 matrices, this becomes a bit more interesting: not only does a 3x3 matrix multiplied with a 3x1 vector give you a 3x1 vector out again, it can also change the last coordinate, so it's not 1 any more. If you now have a general vector in homogeneous coordinates (x, y, W) and want to get back to 2D, you divide the first two coordinates by the last, so you get (x/W, y/W). This division moves the point closer to the origin the larger W is (and vice versa), and it's what gives you the perspective effect of making things further from the camera appear smaller and closer together. Usually, of course, people mostly care about this when doing 3D rather than 2D, but the same principle applies in 3D as well.
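
Here is a rough Python illustration of that divide. The matrix is a made-up example whose last row (0, 1, 1) sets W = y + 1, so in this toy setup y plays the role of "distance from the camera"; that choice is an assumption for the sketch, not a standard projection matrix.

```python
def mat_vec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def to_2d(v):
    """Divide x and y by W to get back from (x, y, W) to ordinary 2D."""
    x, y, w = v
    return (x / w, y / w)

# Hypothetical projective matrix: the last row (0, 1, 1) makes W = y + 1.
projective = [[1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]]

near = mat_vec(projective, [4, 1, 1])  # W becomes 2
far = mat_vec(projective, [4, 3, 1])   # W becomes 4

print(near, to_2d(near))  # [4, 1, 2] (2.0, 0.5)
print(far, to_2d(far))    # [4, 3, 4] (1.0, 0.75)
```

Both points start with x = 4, but the one with the larger W ends up with the smaller x after the divide, i.e. closer to the origin.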

If you just leave the last row of your 3x3 matrix to be (0, 0, 1), this matrix won't do anything to the last coordinate of your vector, which means you'll get no projective stuff and you'll be left with just an affine transformation. You will however still get a 3x1 vector out of multiplication with such a matrix, which means you can then multiply it with another 3x3 matrix without having to tack the 1 at the end of the vector again. Also, if you have two 3x3 matrices, you can simply multiply them together to get another 3x3 matrix that represents the transformation done by both matrices – if you used 2x3 affine matrices, you wouldn't be able to multiply them together because of their incompatible dimensions. For these reasons, people usually stick to full 3x3 matrices even when they're not interested in projective transformations.
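
The composition point can be sketched like this in Python. The scale factor 2 and the translation (5, 7) are assumed example values, and mat_mul / mat_vec are small hypothetical helpers written out so the snippet runs on its own.

```python
def mat_vec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

scale = [[2, 0, 0],
         [0, 2, 0],
         [0, 0, 1]]
translate = [[1, 0, 5],
             [0, 1, 7],
             [0, 0, 1]]

# One 3x3 matrix that scales first and then translates.
combined = mat_mul(translate, scale)

p = [1, 1, 1]
step_by_step = mat_vec(translate, mat_vec(scale, p))
in_one_go = mat_vec(combined, p)

print(combined)      # [[2, 0, 5], [0, 2, 7], [0, 0, 1]]
print(step_by_step)  # [7, 9, 1]
print(in_one_go)     # [7, 9, 1]
```

Note that the last row of the product is still (0, 0, 1), so the result is again an affine transformation. The order matters, though: mat_mul(scale, translate) translates first and then scales, which sends the same point to (12, 16, 1).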
