How does the math of a perspective matrix work?

4 comments, last by Whoknewb 20 years, 4 months ago
I'm reading along in my book about the 3D pipeline and I can't quite figure out how the perspective matrix transforms 3D coordinates into a unit cube. From what I gather, the perspective matrix should do the following:

1) Multiply the x coordinate so it lies in [-1,1]
2) Multiply the y coordinate so it lies in [-1,1]
3) Multiply the z coordinate so it lies in [-1,1]
4) Set w = z

From that, I would expect the perspective matrix to look like this (row-major form):

2/(r-l)      0          0       0
   0      2/(b-t)       0       0
   0         0       2/(f-n)    1
   0         0          0       0
That would multiply the x value by 1 over half the width of the screen, the y value by 1 over half the screen's height, and the z value by 1 over half the depth. It would also put z into w, effectively setting w = z. I believe all the conditions would be fulfilled. But this is not the matrix my book gives, and unfortunately it doesn't explain the math; it just says, "The perspective transform matrix that transforms the frustum into a unit cube is given as follows." I transposed the matrix because the book had it in column-major form; here it is in row-major form, hopefully transposed correctly.

 2n/(r-l)         0            0         0
    0          2n/(t-b)        0         0
(r+l)/(r-l)  (t+b)/(t-b)    f/(f-n)      1
    0             0        -fn/(f-n)     0
Here is what I don't understand about this matrix.

1) They not only divide x and y by half the screen's width and height, which I understand, but they also multiply by the near plane. Why?

2) The y value has t-b instead of b-t. t-b would give a negative value; is that so they flip the axis?

3) For the x and y values the formula works out as follows:

x' = x*(2n/(r-l)) + y*0 + z*((r+l)/(r-l)) + w*0

So (r+l)/(r-l) simplifies to -1 and -z effectively gets added to x's value. If z were 13, for instance, then -13 would get added to x's unit value. x and y would never have a chance of being put into a unit cube if z were greater than 1.

4) The z value is multiplied by f, and I have no idea why. The z value also has w*-(fn/(f-n)) added to it, and I have no idea why either.

Thanks for reading. That's a lot of questions; hopefully someone can answer a few.

[edited by - Whoknewb on December 16, 2003 11:21:30 AM]
I'm not quite sure how this works, but I thought the X, Y, and Z coordinates are translated so that they lie on a 2D plane in "perspective".

Thus: in order to get depth, you also have to transform by the near plane and far plane so that the ratio is visible. If your near plane is at 4, your far plane is at 4000, and a point is at 500, the point has to seem like it is about 1/8 of the way into the screen, that is, into the total view distance.

You are confusing the terms perspective transform and projection. Projection is where you divide by w (which is some form of z) to get the ratio you are talking about. The perspective transform takes 4D homogeneous coordinates from the view frustum to the standard view box, or unit cube [-1,1].

1)

x = (r-l) /2
h = (t-b) /2
z = near plane distance from origin.

You are multiplying x and y by the ratio of the projection (near) plane distance to 1/2 of the projection plane width/height. What this does is put the projection plane n units from the origin with a width of 2n and a height of 2n. So this just makes the view frustum a right pyramid. Since x, y, and z are then all the same length, dividing x and y by z gives you (-n/n, n/n), a range of -1 to 1.

If your near plane is set up at 1.0, as it commonly is, then the equation for the x scale is 2*1/(r-l)

2) In the OpenGL viewport, the top of the screen is greater than the bottom, so the viewport origin is in the lower left.

[edit] On second thought, this is a D3D matrix. The projection matrix can flip the z because screen coords are upside down in D3D.

3) w' = 0*x + 0*y + 1*z + 0*w. A 4D vector comes out of the equation. x' may be -13, but it has to be homogenized by dividing by the w' component before it makes sense. So the final x is x' / w' = -13 / w', where w' = z.
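
To make point 3 concrete, here is a minimal Python sketch (not from the book) that multiplies a row vector by the matrix from the question and then divides by w'. The function names and the sample frustum values (l=-1, r=1, b=-1, t=1, n=1, f=100) are assumptions picked for illustration. The frustum is symmetric, so the x and y terms in the third row are zero here.

```python
def perspective_matrix(l, r, b, t, n, f):
    """Matrix as transcribed in the question, row-vector convention (v' = v * M)."""
    return [
        [2 * n / (r - l), 0.0, 0.0, 0.0],
        [0.0, 2 * n / (t - b), 0.0, 0.0],
        [(r + l) / (r - l), (t + b) / (t - b), f / (f - n), 1.0],
        [0.0, 0.0, -f * n / (f - n), 0.0],
    ]


def transform(v, m):
    """Multiply a 4-component row vector by a 4x4 matrix."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def project(point, m):
    """Apply the matrix to (x, y, z, 1), then divide by w' (which equals z)."""
    x, y, z, w = transform(list(point) + [1.0], m)
    return (x / w, y / w, z / w)


# Assumed sample frustum: symmetric, near plane at 1, far plane at 100.
m = perspective_matrix(-1.0, 1.0, -1.0, 1.0, 1.0, 100.0)

# A point on the right edge of the frustum at depth 13 has x = 13.
# Before the divide x' is 13; after dividing by w' = 13 it is 1.0.
print(project((13.0, 0.0, 13.0), m))
```

So a large x' before the divide is expected; only the values after the divide by w' have to land inside the unit cube.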

4) The z scaling f/(f-n) simply scales the near-to-far range so that (far - near) has length f. Then the -(fn/(f-n)) term translates the near plane to the origin by -n units. In fact, it's just -n times the scaling factor f/(f-n). Finally, since the w component == z, (n, f) gets transformed to (0, f) and then normalized to (0, 1).
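
As a quick check of point 4, this small Python sketch evaluates the z column of the matrix followed by the divide by w' = z. The function name is made up for illustration, and the near/far values 4 and 4000 are borrowed from the earlier reply.

```python
def depth_after_divide(z, n, f):
    """Depth produced by the z column of the matrix, then divided by w' = z."""
    z_clip = z * f / (f - n) - f * n / (f - n)  # scale by f/(f-n), then shift by -fn/(f-n)
    w_clip = z                                  # the 1 in the third row copies z into w'
    return z_clip / w_clip


near, far = 4.0, 4000.0
for z in (near, 500.0, far):
    print(z, depth_after_divide(z, near, far))
```

The near plane comes out at 0 and the far plane at 1. A point at z = 500 comes out at roughly 0.993, so the depth after the divide is not linear in z.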

[edited by - Ironpoint on December 17, 2003 3:18:55 PM]

[edited by - Ironpoint on December 17, 2003 3:56:56 PM]
Something that I didn't mention:

(r+l)/(r-l) and (t+b)/(t-b) are there to skew the x and y range in case you have a skewed projection that is looking off to the side. If r and l are the same distance from the z axis, then there's no skewing. For example, with l = -5 and r = 5, (r+l)/(r-l) = 0 / 10 = 0.
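
Here is a rough Python check of the skew term, written by me rather than taken from the book. The helper names and the off-center values l = 0, r = 10 are assumptions. It also tries both signs of the term, because with w' = +z the frustum edges only land on -1 and 1 when the term is subtracted, which matches how the D3DX documentation writes its left-handed off-center matrix, (l+r)/(l-r). That suggests a sign may have slipped in the transcription, or that the book uses a different convention.

```python
def skew_x(l, r):
    """The third-row x term as written in the question."""
    return (r + l) / (r - l)


def ndc_x(x, z, l, r, n, skew_sign):
    """x after the matrix and the divide by w' = z.

    skew_sign is +1 for the term as transcribed, -1 for the (l+r)/(l-r) form.
    """
    x_clip = x * 2 * n / (r - l) + z * skew_sign * skew_x(l, r)
    return x_clip / z


print(skew_x(-5.0, 5.0))   # symmetric frustum: 0.0, no skew
print(skew_x(0.0, 10.0))   # off-center frustum: 1.0

# Off-center frustum with near = 1, sampled at depth z = 4.
# The left and right edges at that depth are x = l*z/n and x = r*z/n.
l, r, n, z = 0.0, 10.0, 1.0, 4.0
for sign in (+1, -1):
    print(sign, ndc_x(l * z / n, z, l, r, n, sign), ndc_x(r * z / n, z, l, r, n, sign))
```

For a symmetric frustum the term is zero, so the sign question only matters for off-center projections.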

It may seem that t, b, l, r are supposed to have something to do with screen coordinates such as (0,640), (0,480). This isn't the case. If you are doing a FOV-based projection, then these are based on the near plane, FOV, and aspect ratio.

t = tan(fov / 2) * near;
b = -tan(fov / 2 ) * near;

l = -tan(fov / 2) * near * aspect;
r = tan(fov / 2) * near * aspect;
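
The same formulas as a small runnable Python sketch. It assumes fov is the vertical field of view given in degrees and aspect is width divided by height; the thread does not state either, and the function name is my own.

```python
import math


def frustum_bounds(fov_y_deg, aspect, near):
    """Return (l, r, b, t) on the near plane for a symmetric FOV-based frustum."""
    half_height = math.tan(math.radians(fov_y_deg) / 2.0) * near
    half_width = half_height * aspect
    return (-half_width, half_width, -half_height, half_height)


# Assumed example: 90 degree vertical FOV, 4:3 aspect, near plane at 1.
l, r, b, t = frustum_bounds(90.0, 4.0 / 3.0, 1.0)
print(l, r, b, t)

# The y scale 2n/(t-b) from the matrix reduces to 1/tan(fov/2).
print(2.0 * 1.0 / (t - b))
```

None of these values involve the screen resolution; only the aspect ratio ties them to the viewport shape.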

[edited by - Ironpoint on December 17, 2003 3:44:59 PM]
So you're saying that t, b, l, r don't have to be in screen coordinates, just arbitrary numbers such as -5, 5 for l, r?

