Homogenous space

Started by
16 comments, last by Paradigm Shifter 10 years, 10 months ago

I am reading the book "3D Math Primer for Graphics and Game Development", and I am stuck on this topic because the book does not explain it well.


To understand how the standard physical 3D space is extended into 4D, let’s first examine homogenous coordinates in 2D, which are of the form (x, y, w). Imagine the standard 2D plane existing in 3D at the plane w = 1. So the physical 2D point (x, y) is represented in homogenous space as (x, y, 1). For all points that are not in the plane w = 1, we can compute the corresponding 2D point by projecting the point onto the plane w = 1 and dividing by w. So the homogenous coordinate (x, y, w) is mapped to the physical 2D point (x/w, y/w).
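As a minimal sketch of the mapping described in the quoted passage (the function name `to_cartesian_2d` is made up for illustration and does not come from the book), the following shows that several different (x, y, w) triples land on the same 2D point once divided by w:

```python
def to_cartesian_2d(x, y, w):
    """Map the homogeneous coordinate (x, y, w) to the 2D point (x/w, y/w)."""
    if w == 0:
        # No finite 2D point corresponds to w = 0.
        raise ValueError("w = 0 has no corresponding point on the plane w = 1")
    return (x / w, y / w)


# Three different homogeneous coordinates for the same physical 2D point.
print(to_cartesian_2d(3.0, 4.0, 1.0))  # already on the plane w = 1
print(to_cartesian_2d(6.0, 8.0, 2.0))  # "above" the plane, w = 2
print(to_cartesian_2d(1.5, 2.0, 0.5))  # "below" the plane, w = 0.5
```

All three calls print (3.0, 4.0): dividing by w only changes something when w is not already 1.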

1. Where does this w = 1 come from?

2. If some point is outside the plane w = 1, does it have a different w value?

3. Is w the absolute distance to the plane?

4. If I divide points by w, which is 1, I get the same points back, so what is the point?

5. Can you recommend any other book that explains this w better?

Thank you for your time.

Not a mathematical explanation, but a practical one:

It's useful when doing the perspective transformation.

This w is what is responsible for things further away looking smaller, and things closer looking bigger.

For 3D graphics, you let w = 1 be the projection plane (the screen); points further away will have w > 1, and closer points will have w < 1.

The division will map points from the projective space to screen space.
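A rough sketch of that effect, under the simplifying assumption that w is just the distance from the viewer (real projection matrices are more involved, and `project` is a hypothetical helper):

```python
def project(x, y, depth):
    """Use the depth as w and divide; the screen sits at w = 1 (assumed setup)."""
    w = depth
    return (x / w, y / w)


# The same offset (1, 1) from the view axis, seen at increasing distances.
for depth in (0.5, 1.0, 2.0, 4.0):
    print(depth, project(1.0, 1.0, depth))
```

The point at depth 1 stays where it is, closer points (w < 1) are pushed outwards and so look bigger, and farther points (w > 1) are pulled towards the centre and so look smaller.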

Hope that gives you some hint.

First of all, the correct (or at least more common) name of this space is (real) projective space and not homogeneous space. A homogeneous space is usually something different in mathematics*. I've never seen the term homogeneous space used in this context.

The term projective space comes from the fact that it is the smallest/most natural space in which (central) projections can be defined. You can visualize this space as the set of lines passing through the origin in the (real) vector space one dimension larger. So the projective plane (the projective space of dimension two) is, for example, the set of 3D lines of the form

P(t) = t * (X, Y, Z).

Each line can be identified by any of its non-zero points. The homogeneous coordinates of a point of the projective space are the coordinates of one of the non-zero points of the corresponding line, and are written between square brackets (often separated by colons): [X0 : X1 : ... : Xn]. These homogeneous coordinates are not unique: two coordinates which differ by a non-zero scalar multiple represent the same point. The division by W is simply a way to choose a unique representative of a point. You basically represent all lines by their intersection with the plane W = 1. The lines parallel to that plane (those with W = 0) cannot be represented in this way. They are called points at infinity and they usually represent directions. Note that dividing by W is just a convention; it is possible to divide by any other coordinate or to use any other plane not passing through the origin.
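To make the "same line through the origin" idea concrete, here is a small sketch. The helper names `same_projective_point` and `is_point_at_infinity` are invented for this example, and the proportionality test uses exact comparison, so it is only meant for simple values:

```python
def same_projective_point(p, q):
    """True if p and q (same length) are non-zero scalar multiples of each other."""
    if not any(p) or not any(q):
        raise ValueError("(0, ..., 0) is not a valid homogeneous coordinate")
    n = len(p)
    # Two non-zero vectors are proportional exactly when every
    # 2x2 minor p[i]*q[j] - p[j]*q[i] vanishes.
    return all(p[i] * q[j] == p[j] * q[i]
               for i in range(n) for j in range(i + 1, n))


def is_point_at_infinity(p):
    """Lines parallel to the plane W = 1 have last coordinate 0."""
    return p[-1] == 0


print(same_projective_point((1, 2, 1), (3, 6, 3)))  # same line, scaled by 3
print(same_projective_point((1, 2, 1), (1, 2, 2)))  # a different line
print(is_point_at_infinity((1, 2, 0)))              # a direction, no W = 1 representative
```

The first pair describes the same projective point, the second does not, and the last coordinate has no representative on W = 1 because dividing by W is impossible there.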

So why use a space like this for projections? The main reason is that a projection transforms a point with coordinates [x : y : z : 1] into some point [X : Y : Z : W] where W is, in general, no longer equal to one. It is thus necessary to divide by W to get back a 3D point. This is the main reason homogeneous coordinates have been defined and used in computer graphics.

The projection matrices are usually chosen so that the W coordinate basically represents the depth of the transformed point, and so that the view frustum is transformed into the cube with all three coordinates inside [-1, 1] after the W divide (the conventions actually differ between DirectX and OpenGL here). But this is just a useful convention.
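As an illustration, here is a sketch of one such matrix, assuming the classic OpenGL-style convention (camera looking down the negative z axis, depth mapped to [-1, 1]). The helper names and the near/far values are arbitrary choices for the example:

```python
import math


def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective matrix (row-major, column vectors)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # this row copies -z (the depth) into W
    ]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4D column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_ndc(clip):
    """The W divide: clip coordinates -> normalized device coordinates."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


proj = perspective(90.0, 1.0, 1.0, 3.0)  # near = 1, far = 3
for z_view in (-1.0, -2.0, -3.0):
    clip = transform(proj, [0.0, 0.0, z_view, 1.0])
    print("view z:", z_view, "W:", clip[3], "NDC:", to_ndc(clip))
```

With these numbers W comes out equal to the distance in front of the camera, a point on the near plane ends up at NDC z = -1, and a point on the far plane ends up at NDC z = 1.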

* It is usually used to denote some kind of space with a transitive G-action.

@Olof Hedman

Thank you for nice explanation.

@apatriarca

Well, what you wrote is all nice, but I reread it multiple times and I still can't get it.


First of all, the correct (or at least more common) name of this space is (real) projective space and not homogeneous space. A homogeneous space is usually something different in mathematics*. I've never seen the term homogeneous space used in this context.

I don't care about math concepts; I just want to learn the basic math applications used in graphics and game development.

"Homogenous space" was the topic label in the book, so how could I know which name is correct?


The homogeneous coordinates of a point of the projective space are the coordinates of one of the non-zero points of the corresponding line, and are written between square brackets (often separated by colons): [X0 : X1 : ... : Xn]. These homogeneous coordinates are not unique: two coordinates which differ by a non-zero scalar multiple represent the same point...

...transitive G-action..

I am totally lost from here on; it is too abstract and generalized.

Do you realize this thread is in the "For Beginners" forum? Could you possibly put this in an easily understandable way?

Maybe I could visualize it if I had a practical example, so I could see what space/ranges these points/directions are in, and what this division by w means and results in.

I have a feeling that most of you on this forum have a need to brag about your knowledge, but that doesn't help those of us who want to learn something. Unfortunately, most of the authors who write books have that need too, which makes learning a lot harder than it needs to be. No disrespect meant.

Thank you for your understanding and time.

Apatriarca is being accurate, not trying to 'show off'. I'll try to explain it in simpler terms.

3D linear algebra is the standard way of performing 3D math in games. It allows us to do all the fun things (rotating and scaling objects, etc.) with rather simple math. Simple is good because it's fast. The problem is that there are two important things you can't do with standard 3D linear algebra (3x3 matrices acting on 3D vectors). The first is translation, and the second is perspective projection. Translations allow you to move objects around the scene, and perspective projection is necessary to mimic reality, where things farther away from us look smaller than things closer.

So to fix this problem we extend the 3-dimensional space to a 4-dimensional space, often called "homogeneous space" by programmers. We do all our math in 4D space, then at the very end the video card "fixes" everything up for us. This fix is actually a 4D-to-3D projection. If we have a 4D vector of the form (x, y, z, w), the video card projects it to 3D by dividing the vector by w, which gives us (x/w, y/w, z/w, 1). This gives the point its final 3D position on screen. The (x, y) is then used for the triangle coordinates on screen, the z is used for depth buffering and interpolation, and the w coordinate is generally ignored.
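Here is a small sketch of the "do everything in 4D, divide at the end" idea, using plain Python lists as matrices. All helper names are invented for the example, and the divide is done by hand here only to imitate what the video card does:

```python
def matmul(a, b):
    """Product of two 4x4 matrices (row-major)."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4D column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """A translation, which no 3x3 matrix can express."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def scaling(s):
    return [[s, 0.0, 0.0, 0.0],
            [0.0, s, 0.0, 0.0],
            [0.0, 0.0, s, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def divide_by_w(v):
    x, y, z, w = v
    return (x / w, y / w, z / w)


# Scale by 2 first, then move 10 units along x, as one combined matrix.
model = matmul(translation(10.0, 0.0, 0.0), scaling(2.0))
p = transform(model, [1.0, 1.0, 1.0, 1.0])
print(p)               # w is still 1 after scaling and translating
print(divide_by_w(p))  # so the divide changes nothing here
```

As long as only rotations, scales and translations are involved, w stays at 1 and the final divide is a no-op; it only starts to matter once a perspective projection matrix is part of the chain.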

Apatriarca was explaining the 4D-to-3D projection in detail, and while it is good to know, to be honest it probably isn't that necessary.


First of all, the correct (or at least more common) name of this space is (real) projective space and not homogeneous space.

Yeah, AFAIK, the word "space" doesn't belong there. We're contrasting 4D homogeneous coordinates to 3D/2D Cartesian coordinates.

Personally, I call it post-projection space, and after the divide by w, I call it NDC-space.

We (programmers) generally use "blah-space" to refer to some particular "basis" (view-space, world-space, post-projection space, etc...), but "w" comes into play not just because we're using some particular basis/space, but because we've switched from one coordinate system to another.

I.e. when working in object-space, view-space or world-space, we generally use 3D Cartesian coordinates to identify points. When we're dealing with post-projection-space we switch over to using 4D homogeneous coordinates to represent our points. We then transform them back to 2D Cartesian coordinates in screen/viewport-space to identify pixels.

The reason we switch over to using 4D homogeneous coordinates in order to implement perspective is that all our linear algebra (matrix/vector math) is... well... linear... whereas perspective is a non-linear effect! I.e. you want to scale something by "1/distance", which, when you plot it, isn't a straight line (not linear).

By working with 4D coordinates, where we say "we will divide by w later on", we can continue to use linear algebra (matrices) to operate in a "non-linear space" (layman's terms, not formal math terms).

I understand that "homogeneous space" is what your book calls it, but if you search for that on Google you get something completely different. "Projective space" is a much better search query. This is the main reason I introduced that name in my previous post.

I think you should simply search for something like "homogeneous coordinates computer graphics" on Google. There are hundreds of beginner articles about these things already, and it is not possible to explain everything in a single forum post. If you don't care about mathematical concepts, then you should probably just understand how perspective projections are derived and basically consider the W division step as a technical trick. Indeed, if you compute the perspective projection equations, you will see that it is necessary to divide by z at some point. You can't divide by a coordinate using matrices. The trick is thus to write the denominator into a fourth coordinate and then divide by it after the matrix multiplication.
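The trick can be sketched with the simplest possible projection, assuming a projection plane at distance d in front of the viewer along +z, so that similar triangles give x' = d*x/z and y' = d*y/z. The matrix below is an illustrative textbook-style choice, not the matrix any particular API uses:

```python
def simple_projection(d):
    """Leaves x, y, z alone and writes the denominator z/d into w."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0 / d, 0.0]]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4D column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def divide_by_w(v):
    x, y, z, w = v
    return (x / w, y / w, z / w)


d = 2.0
clip = transform(simple_projection(d), [2.0, 4.0, 4.0, 1.0])
print(clip)               # the matrix only stored the denominator z/d in w
print(divide_by_w(clip))  # the divide produces (d*x/z, d*y/z, d)
```

The matrix multiplication itself never divides anything; it just arranges for w to hold z/d, and the separate divide afterwards does the non-linear part.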

The additional coordinate is also used to make it possible to represent translations with matrices, but this has nothing to do with the notion of homogeneous coordinates.

Yeah, AFAIK, the word "space" doesn't belong there. We're contrasting 4D homogeneous coordinates to 3D/2D Cartesian coordinates.

Personally, I call it post-projection space, and after the divide by w, I call it NDC-space.

We (programmers) generally use "blah-space" to refer to some particular "basis" (view-space, world-space, post-projection space, etc...), but "w" comes into play not just because we're using some particular basis/space, but because we've switched from one coordinate system to another.

I.e. when working in object-space, view-space or world-space, we generally use 3D Cartesian coordinates to identify points. When we're dealing with post-projection-space we switch over to using 4D homogeneous coordinates to represent our points. We then transform them back to 2D Cartesian coordinates in screen/viewport-space to identify pixels.

The reason we switch over to using 4D homogeneous coordinates in order to implement perspective is that all our linear algebra (matrix/vector math) is... well... linear... whereas perspective is a non-linear effect! I.e. you want to scale something by "1/distance", which, when you plot it, isn't a straight line (not linear).

By working with 4D coordinates, where we say "we will divide by w later on", we can continue to use linear algebra (matrices) to operate in a "non-linear space" (layman's terms, not formal math terms).

Yes, I know. But homogeneous space is not a particularly useful search query in any case. I've actually always used the terms clip(ping) coordinates (and thus clip(ping) space) and normalized device coordinates (NDC) (and thus NDC-space) to denote the coordinates post-projection and post-W-division, respectively, in the graphics pipeline. Homogeneous coordinates has always been a more general term for me.

I understand that the way we use homogeneous coordinates in the graphics pipeline is often not mathematically correct. We don't really care about what a projective space really is or what operations we can do on these coordinates. However, some more advanced theory is in my opinion sometimes useful, for example to understand what is preserved by a projective transformation.

Hello

The only two things you need to know about homogeneous coordinates:

  • w=0 for direction 4D vectors (e.g. normals)
  • w=1 for position 4D vectors (e.g. vertices)
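A quick sketch of why that convention is handy, using a hand-written translation matrix (the helper names are made up for the example): the translation column is multiplied by w, so it moves positions and leaves directions alone.

```python
def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4D column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


move = translation(5.0, 0.0, 0.0)

position = [1.0, 2.0, 3.0, 1.0]   # w = 1: a point in space
direction = [1.0, 2.0, 3.0, 0.0]  # w = 0: a direction

print(transform(move, position))   # x is shifted by 5
print(transform(move, direction))  # unchanged, directions do not translate
```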

When rendering, the GPU does the w-division itself; you don't need to do it yourself in this context.

If you need to do calculations on 4D vectors yourself, be sure to do the w-division yourself at the end if you have (at least) one perspective projection matrix in your transformation.

Hope it's clear.

Thanks. I know the difference between a direction (xyz, w = 0) and a point (xyz, w = 1), and how to multiply/transform a vector by a matrix. What I don't understand is this projecting onto the plane when w is neither 0 nor 1.
