Homogeneous coordinates

7 comments, last by alvaro 10 years, 4 months ago

Hey there, can anybody explain to me how homogeneous coordinates are used in computer graphics? When I look around on the internet, the explanations all seem to be aimed at math students.

What I read was stuff about projective lines and planes and points at infinity, which made me really confused, so I got kinda discouraged from looking into it further.

Could anyone give me a practical example maybe?

http://www.gamedev.net/page/resources/_/technical/graphics-programming-and-theory/the-total-beginners-guide-to-3d-graphics-theory-r3402

The article you linked actually makes no mention of homogeneous coordinates. I understand the principle of dividing by w though, but I can't understand the link between that and homogeneous coordinates, if there is one.

In general, all you really need to know about homogeneous coordinates is that they're 4-dimensional coordinates where the fourth component (the w component) is 1 for points. This lets you perform affine transformations (linear with translation) with a simple matrix multiplication. The 1 in the w coordinate is required for the translation part to work correctly. A projection transformation is basically just a regular transformation, except that instead of leaving the w coordinate untouched it puts the view-space z in there (or -z, depending on the convention), and then later in the pipeline the rest of the coordinates are divided by w to get your points in normalized device coordinates. For things like normals or vectors representing directions rather than points (cases where you don't want translation applied), you can set their w component to zero, and the transformation will be applied without translation.
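
To make that concrete, here is a minimal sketch in plain Python. The helper names (mat_vec, translation, toy_projection, perspective_divide) are made up for illustration, and I'm assuming column vectors with the matrix on the left. The "projection" is a toy that only copies view-space z into w; a real projection matrix also rescales x, y and z.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """4x4 translation matrix (hypothetical helper, column-vector convention)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def toy_projection():
    """Toy projection: leaves x, y, z alone and writes view-space z into w."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 1, 0]]

def perspective_divide(v):
    """Divide x, y, z by w, as the pipeline does after projection."""
    x, y, z, w = v
    return [x / w, y / w, z / w]

T = translation(10.0, 0.0, 0.0)

point = [1.0, 2.0, 3.0, 1.0]      # w = 1: a position, translation applies
direction = [1.0, 2.0, 3.0, 0.0]  # w = 0: a direction, translation is ignored

moved_point = mat_vec(T, point)          # x is shifted by 10
moved_direction = mat_vec(T, direction)  # unchanged

clip = mat_vec(toy_projection(), [2.0, 4.0, 2.0, 1.0])  # w now holds z
ndc = perspective_divide(clip)

print(moved_point, moved_direction, clip, ndc)
```

The only difference between the point and the direction is the last component, and that alone decides whether the translation column of the matrix contributes.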

There is probably some higher mathematical structure to all of it, but like you discovered when you went to read about homogeneous coordinates, it can be pretty mathy. And in my opinion, not very useful to know. You can get pretty far by only understanding what I wrote in the first paragraph. In my experience I haven't had to know more than that.

There is a deep and beautiful mathematical structure that justifies why things work the way they do, called projective geometry. But Samith's description is good enough to get by.

I studied only Math at my university, and in the first year we happily appended a 1 at the end of a vector so we could do translation together with a linear transformation. When you do that, you are in the realm of affine geometry, or Euclidean geometry if you also use the Euclidean metric. The full story of projective geometry was only introduced in the second year, and it took several months to do so. Although I really enjoyed it, it probably doesn't matter a whole lot for games. Euclidean geometry is all you need to know for games, except for the part where you project to the screen, and it's no big deal if you only have a superficial understanding of that part.

Thanks guys for answering my question. Alvaro you do make it sound very interesting, but I will put my questions at rest for now ^^

An easy way to get a hang of 3D homogeneous coordinates (which consist of 4 coordinates) is to see them as a higher dimension analogue to fractions.
- A fraction a/b represents a number on the 1D real axis, but really consists of 2 "coordinates", i.e. (a,b). In a similar way, the 3D vector (x/w,y/w,z/w) corresponds to the 4D homogeneous vector (x,y,z,w).
- Just as with fractions, multiplying all components by the same nonzero value doesn't change the value, i.e. a/b = (sa)/(sb), or in coordinate form (a,b) = (sa,sb), for some nonzero scalar s. Similarly, since (x/w,y/w,z/w) = ((sx)/(sw),(sy)/(sw),(sz)/(sw)), the homogeneous vectors (x,y,z,w) and (sx,sy,sz,sw) are considered the same vector.
And then it just so happens that many operations on 3D vectors can be done much more elegantly on their homogeneous representation. Most importantly, transformations like translation and projection become representable by just a matrix. Since for a 4x4 matrix A, a 4D vector x and a scalar s we have A(sx) = s(Ax), multiplying a homogeneous vector by a matrix is well defined, i.e. if two vectors are equivalent (they represent the same 3D point, even though their 4D representations may differ), they will still be equivalent after they're multiplied by a matrix.
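
The fraction analogy can be checked directly. This is just a sketch: the matrix A and the helpers mat_vec and to_3d are arbitrary choices of mine, and I use Python's Fraction type so the comparisons are exact rather than floating point.

```python
from fractions import Fraction

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def to_3d(h):
    """Map a homogeneous vector (x, y, z, w), w != 0, to (x/w, y/w, z/w)."""
    x, y, z, w = h
    return (Fraction(x, w), Fraction(y, w), Fraction(z, w))

# An arbitrary 4x4 matrix whose last row is not (0, 0, 0, 1),
# so it changes w as well (hypothetical example values).
A = [[1, 0, 0, 5],
     [0, 2, 0, 0],
     [0, 0, 1, -1],
     [0, 0, 1, 0]]

h = [2, 4, 6, 2]               # represents the 3D point (1, 2, 3)
s = 3
h_scaled = [s * c for c in h]  # same 3D point, different 4D representation

same_before = to_3d(h) == to_3d(h_scaled)
same_after = to_3d(mat_vec(A, h)) == to_3d(mat_vec(A, h_scaled))
print(same_before, same_after)
```

Both comparisons come out equal, which is the "well defined" claim above: the matrix cannot tell the two representations apart once you divide by w again.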

If it helps, you can imagine things in a 2D plane and the projective space being 3D. We can define a point in a 3D space \( (x,y,z) \) which corresponds to the 2D point \( (X,Y) = (\frac{x}{z}, \frac{y}{z}) \). This is like having 3D points that project onto the \( z=1 \) plane.

As others have said, it's really convenient in graphics to use homogeneous coordinates. One other thing that I don't think has been mentioned about projective geometry is that you can specify points at infinity using finite numbers. A 2D point at infinity can be written in the 3D projective space as \( (x,y,0) \). This produces the 2D point \( (\frac{x}{0}, \frac{y}{0}) \), which corresponds to infinity. I'm not sure about the actual mathematics of how that works (Alvaro can probably explain), but as z approaches 0, the 2D point coordinates get larger and larger until, at z = 0, they become infinite.
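
A quick numerical illustration of that limit, as a sketch only. The function name to_2d is made up, and returning None for z = 0 is just one possible way to signal "no finite 2D position".

```python
def to_2d(p):
    """Project a homogeneous 2D point (x, y, z) onto the z = 1 plane."""
    x, y, z = p
    if z == 0:
        return None  # point at infinity: there is no finite 2D position
    return (x / z, y / z)

# Shrinking z pushes the projected point further out along the direction (1, 2).
for z in (1.0, 0.1, 0.01, 0.001):
    print(z, to_2d((1.0, 2.0, z)))

# At z = 0 the triple (1, 2, 0) is still a perfectly finite set of numbers.
print(to_2d((1.0, 2.0, 0.0)))
```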

If you aren't curious about the mathematical formalism behind projective geometry, you can stop reading.

If you already know the mathematical formalism behind projective geometry, you can stop reading.

If you are still here, either you are a masochist, you are trying to catch any mistakes I might have written, or you are genuinely interested in learning the mathematical formalism behind projective geometry.


Take a vector space V and define an equivalence relation on the set of non-zero vectors of V that makes two vectors related if one is proportional to the other. The classes of this equivalence relation are called "projective points". The class of the vector (x1, x2, ..., xn) is written [x1:x2:...:xn], although there are other conventions. The set of projective points is a "projective space", called P(V). Linear subspaces of V can be mapped to subsets of P(V) in the natural way. A 1-dimensional linear subspace of V corresponds to a projective point, a 2-dimensional linear subspace of V corresponds to a projective line, and so on.

Although we are primarily interested in the 3-dimensional projective space (that is P(R^4)), it helps to think of the projective plane to develop some intuition, as cadjunkie says. The projective plane is the set of lines that pass through the origin of R^3, which we now call "projective points", or simply "points". You can understand most of what the projective plane is by using an "affine chart": pick a plane in R^3 (z=1 being a common choice) and identify each projective point (remember this means a line through the origin of R^3) with its intersection with the plane. There are projective points that are parallel to the plane, though, and those you don't see in the affine chart. We call these points "points at infinity". But remember that being a point at infinity is a function of what plane we chose for the affine chart.

The mathematics of the projective plane work just fine without introducing an affine chart, so there is nothing special about points at infinity. But here's one interesting bit: There are no parallels in the projective plane, so if you have two different [projective] lines, they intersect at a [projective] point. A projective line is of course a plane that goes through the origin of R^3. If you are using an affine chart, you'll normally see a projective line as its intersection with z=1, which is a conventional line. What happens to lines that are parallel in the affine chart? If you think about them as planes that pass through the origin, they intersect in a line that is parallel to z=1. That is, they intersect at a projective point at infinity. You can think of a point at infinity as being the point where parallel lines meet.
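
Here is a small sketch of that last point. I'm assuming the usual representation where the line ax + by + c = 0 in the z=1 chart is the plane ax + by + cz = 0 through the origin of R^3, stored as its normal (a, b, c). Two such planes intersect in the line spanned by the cross product of their normals, and that line is the projective intersection point. The function names are mine.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def intersect(line1, line2):
    """Projective intersection point [x:y:z] of two distinct projective lines."""
    return cross(line1, line2)

line_a = (1, -1, 0)   # y = x
line_b = (1, -1, 1)   # y = x + 1, parallel to line_a in the affine chart
line_c = (1, 1, -2)   # y = -x + 2

at_infinity = intersect(line_a, line_b)  # last coordinate is 0
ordinary = intersect(line_a, line_c)     # last coordinate is non-zero

print(at_infinity)  # proportional to (1, 1, 0): the shared direction of the lines
print(ordinary)     # proportional to (1, 1, 1): the affine point (1, 1)
```

The parallel pair needs no special case. The same formula simply returns a point with last coordinate 0, pointing along the common direction of the two lines.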

Let's now move to the three-dimensional projective space, P(R^4). If you have a hard time picturing a four-dimensional vector space, you'll just have to handle the algebra, but it works just the same as in the case of the projective plane. We write projective points as [x:y:z:w], and we pick w=1 to be the hyper-plane that defines the affine chart. That is, we'll identify [x:y:z:w] with (x/w,y/w,z/w). All the points in the 3D affine space you know and love can be seen as [x:y:z:1], but you will also have points at infinity, which have w=0. The good news is that you don't have to do anything special about them. As in the case of the projective plane, you can think of a point at infinity as the place where a bunch of parallel lines meet. Imagine all the rays coming from a directional light. The rays are all parallel, and the position of the light is the intersection point of all those rays. That's why in OpenGL you describe the position of a directional light using a fourth coordinate with the value 0.
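
One place this pays off in practice, sketched below under my own assumptions: if the light position is stored as a homogeneous vector (lx, ly, lz, lw) and the surface point is an ordinary 3D point (so its w is 1), a single formula gives an unnormalized vector toward the light for both positional and directional lights. The function name to_light is hypothetical and this is not taken from any particular API.

```python
def to_light(light, p):
    """Unnormalized vector from surface point p toward a homogeneous light position.

    light is (lx, ly, lz, lw); p is an ordinary 3D point (x, y, z), i.e. w = 1.
    With lw = 1 this is light - p; with lw = 0 it is just the light direction.
    """
    lx, ly, lz, lw = light
    px, py, pz = p
    return (lx - px * lw, ly - py * lw, lz - pz * lw)

positional = (0.0, 10.0, 0.0, 1.0)   # a light located at (0, 10, 0)
directional = (0.0, 1.0, 0.0, 0.0)   # a point at infinity in the +y direction

p1 = (1.0, 0.0, 0.0)
p2 = (5.0, 0.0, 0.0)

# A positional light is seen from a different angle at each surface point...
print(to_light(positional, p1), to_light(positional, p2))
# ...while a light at infinity gives the same direction everywhere.
print(to_light(directional, p1), to_light(directional, p2))
```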

Hmmm... I don't know if this post will actually help anyone...

This topic is closed to new replies.