Projections and transforms, my understanding

Started by Lactose · 7 comments, last by Lactose 10 years, 5 months ago
After some problems understanding the underlying code in a book I am currently working my way through (OpenGL SuperBible, 5th edition), I decided to take a few steps back and see if I could actually explain the process to myself. Partly due to the "Writing Articles is Good for YOU!" announcement on top here, partly for getting feedback if I'm wrong.
I also tried checking briefly in another book (Real-Time Rendering, Third Edition) for some additional detail.
So, does all this seem correct, or do I have any incorrect assumptions/misunderstandings in the following?
---
Vertex positions
A vertex position is a point in space, with coordinates that uniquely describe where the vertex is.
A vertex position is defined with 1 coordinate per dimension (so a 2D position requires a set of 2 numbers, while a 3D position requires a set of 3 numbers). On top of these, it has one extra element, which is set to 1. This extra element is required for various matrix math operations, and allows vertex positions and vectors to share the same math.
Vectors
A vector is a directional length, i.e. it contains both a direction as well as a length (its magnitude). A vector is defined in the same way as a vertex position (with 1 element per dimension), but the extra element for vectors is 0, instead of 1.
The direction of a vector is found by looking from the origin (0, 0, 0) and towards the specified coordinates.
The length is defined as the distance travelled from the origin to the specified coordinates.
If the vector's length is 1, the vector is said to be a unit vector, or of unit length.
If the vector's length is not 1, the vector can be normalized, which changes the length to 1, while maintaining its original direction. Normalization is done by dividing each component (e.g. the x, y and z) by the vector's length.
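As a minimal sketch of the math in plain C++ (the Vec3 type and function names here are just illustrative, not from any particular library):

#include <cmath>

struct Vec3 { float x, y, z; };

// The length (magnitude) is the distance from the origin to (x, y, z).
float length(const Vec3& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Normalizing divides each component by the length, giving a unit
// vector that keeps the original direction (assumes len != 0).
Vec3 normalize(const Vec3& v)
{
    const float len = length(v);
    return { v.x / len, v.y / len, v.z / len };
}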
Textures
A texture is a set of data, e.g. color information, which can be used either directly (applying the color to an object) or indirectly (driving various effects in e.g. a shader). Textures can be multi-dimensional, the most common being the 2D texture.
Models
A model is a collection of vertices (as well as optionally texture and/or normal coordinates), using certain geometric primitives -- generally points, lines and triangles.
The collection of vertices, together with the knowledge of which geometric primitive is used, defines an object, e.g. a teapot.
A model's vertices are defined in model space; the model has no knowledge of the world or where it should be drawn/displayed, it just knows what the model should look like (the spout goes on the front, the handle goes on the back, etc.).
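As a sketch of what this might look like in code (the field names here are illustrative, not taken from the book):

#include <vector>

// One vertex: a model-space position, plus optional attributes.
struct Vertex {
    float position[3]; // x, y, z in model space
    float normal[3];   // optional: surface normal
    float uv[2];       // optional: texture coordinates
};

// A model: the vertex collection plus knowledge of the primitive
// used to interpret it (e.g. every 3 vertices form one triangle).
struct Model {
    std::vector<Vertex> vertices;
};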
Transformations
We can adjust vertices and vectors (or in general, any point) by transforming them. Transforming can be e.g. translating (moving), rotating and scaling.
Transforms are stored in 4x4 matrices, where a single 4x4 matrix can hold translation, rotation and scale at the same time.
A point can be transformed by multiplying it with a transformation matrix.
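For example, using the GLM math library (an assumption on my part -- the SuperBible uses its own math helpers, but the idea is the same):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// One 4x4 matrix holding translation, rotation and scale.
// With column vectors, the rightmost transform is applied first.
glm::mat4 transform =
    glm::translate(glm::mat4(1.0f), glm::vec3(1.0f, 2.0f, 3.0f)) *
    glm::rotate(glm::mat4(1.0f), glm::radians(45.0f), glm::vec3(0.0f, 1.0f, 0.0f)) *
    glm::scale(glm::mat4(1.0f), glm::vec3(2.0f));

// A point (w = 1) is transformed by multiplying it with the matrix.
glm::vec4 point(0.0f, 0.0f, 0.0f, 1.0f);
glm::vec4 transformed = transform * point;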
Model Transform & World Space
When we want to place, orient and/or scale a model in a specific place in our world, we transform it from model space to world space by applying a model transform to it.
We can apply multiple model transforms to a single model. If a teapot is residing on a desk (which has its own model transform), the teapot will be affected by both the desk's model transform and its own (which can be considered a model transform containing the offset from the desk).
A single model can be used in different places in the world, by reusing the model data (instancing it) and applying different model transforms to each instance of the model.
Once every model has been transformed by its model transform, they all exist in world space -- they exist in correct spatial relation to each other.
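A sketch of the teapot-on-a-desk example above (GLM again, purely for illustration; the offsets are arbitrary):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// The desk's own model transform: its placement in the world.
glm::mat4 deskModel = glm::translate(glm::mat4(1.0f), glm::vec3(5.0f, 0.0f, 0.0f));

// The teapot's model transform: its offset from the desk.
glm::mat4 teapotLocal = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 1.0f, 0.0f));

// The teapot's world placement includes its parent's transform.
glm::mat4 teapotWorld = deskModel * teapotLocal;

// Instancing: the same vertex data drawn again, elsewhere in the world.
glm::mat4 secondTeapotWorld = glm::translate(glm::mat4(1.0f), glm::vec3(-3.0f, 0.0f, 2.0f));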
View Transform & Camera Space
To determine which part of the world is to be displayed on screen, we use the concept of a camera object. The camera object also exists in the world in relation to everything else, with its own translation, rotation, etc. In addition, it has some settings that are specific to it.
For ease of computing, we apply the reverse of the camera's model transform to every model in our world. This places and orients the camera at the origin (the exact orientation depending on the underlying API), and shifts all other objects in relation to the camera object. This maintains the camera's spatial relationship to every model in the world, and is done for computational gains (the math is simpler when the camera sits at the origin).
This transform is called the view transform. The combined transform of the model transform and camera's view transform is called the modelview transform.
Models that have the modelview transform applied to them are said to be in camera space or eye space.
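In code this might look like the following (GLM once more; glm::lookAt is that library's helper, not something from the book):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// The camera placed in the world, like any other object.
glm::mat4 cameraModel = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 2.0f, 10.0f));

// The view transform is the inverse of the camera's transform.
glm::mat4 view = glm::inverse(cameraModel);

// Helpers such as glm::lookAt build a view matrix directly.
glm::mat4 view2 = glm::lookAt(glm::vec3(0.0f, 2.0f, 10.0f),  // camera position
                              glm::vec3(0.0f, 0.0f, 0.0f),   // point looked at
                              glm::vec3(0.0f, 1.0f, 0.0f));  // up direction

// The modelview transform: model transform, then view transform.
glm::mat4 model = glm::mat4(1.0f); // some model's transform
glm::mat4 modelView = view * model;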
Projection Transform & Unit Cubes
Regardless of camera type, we define a volume which tells us which objects the camera can "see" (e.g. setting the camera's field of view, how far we can see, etc.). This volume is called the view frustum.
The camera can be orthographic, in which case the final display uses a parallel projection (no perspective correction). This is mainly used for 2D applications, or for architectural applications.
Alternatively, the camera can be perspective, which does not use parallel projection; instead, perspective effects are applied (mainly foreshortening, i.e. making closer objects appear larger than objects farther away). This is generally used for applications which simulate how we perceive the world with our own eyes.
The graphics hardware ultimately only understands coordinates ranging from -1 to 1 (in all 3 dimensions, in OpenGL's case), a so-called unit cube. Thus we apply yet another transform, the projection transform, which transforms the view frustum from whatever size/shape volume we had into this unit cube.
Objects which are within the unit cube are eligible for being displayed on-screen. The other objects are culled away. A special case exists for objects which are partially inside the unit cube; these are clipped to the extents of the unit cube.
Coordinates which have had this transform applied to them are said to be clip coordinates. The hardware automatically performs perspective division on the clip coordinates, leaving you with normalized device coordinates.
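A sketch of the projection step and the perspective division (GLM for illustration; the parameter values are arbitrary):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Perspective projection: field of view, aspect ratio, near/far planes.
glm::mat4 proj = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, 0.1f, 100.0f);

// Orthographic alternative (parallel projection).
glm::mat4 ortho = glm::ortho(-1.0f, 1.0f, -1.0f, 1.0f, 0.1f, 100.0f);

// A camera-space position times the projection gives clip coordinates.
glm::vec4 eyePos(0.0f, 0.0f, -5.0f, 1.0f);
glm::vec4 clip = proj * eyePos;

// The hardware then divides by w (the perspective division),
// leaving normalized device coordinates in the unit cube.
glm::vec3 ndc = glm::vec3(clip) / clip.w;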
Finally, the objects are mapped to the screen/application window, where they are rasterized (converted into pixels on the screen).
Cheat Sheet
Model space combined with model transform --> World space
World space combined with view transform --> Camera space or eye space
Camera space combined with projection transform --> Normalized device coordinates (after automatic perspective division)
Normalized device coordinates combined with rasterization --> Pixels on screen

Hello to all my stalkers.

Good enough for starting out.

One thing to note is that usually a vertex and a vector have one element more than their dimensionality; a 2d item has 3 elements, a 3d item has 4 elements. This works out much better with the math, and it also simplifies the code.

Because of the way the matrix math works out, a vector should have the extra element set to 0, and a point or vertex should have it set to 1. When it goes through a transformation matrix, that value is what gets multiplied with the translation. Vectors do not have a location, so with a zero multiplied into the translation they don't move (which would change their magnitude), but they still rotate, scale, and shear properly. A point does have a location, so having a one multiplied with the translation means it moves.

So for a 3D point: {x, y, z, 1}
and a 3D vector: {x, y, z, 0}

The end result is that you don't need to track points and vectors with different operations and different classes, you can use the same classes and the same math to work with both of them.
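To make that concrete, a quick sketch (using GLM here purely as an example library):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// A pure translation, 10 units along x.
glm::mat4 T = glm::translate(glm::mat4(1.0f), glm::vec3(10.0f, 0.0f, 0.0f));

glm::vec4 point (1.0f, 2.0f, 3.0f, 1.0f); // w = 1: picks up the translation
glm::vec4 vector(1.0f, 2.0f, 3.0f, 0.0f); // w = 0: translation is zeroed out

glm::vec4 movedPoint = T * point;  // (11, 2, 3, 1) -- the point moved
glm::vec4 sameVector = T * vector; // ( 1, 2, 3, 0) -- the vector did not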

There is still a lot to learn and a bit of nuance missing from your post, but what you have there is basically correct.

Thank you.

I know certain bits are a bit rough along the edges, partly due to trying to cut down on some sections (even though it still resulted in a fairly massive post!), partly due to a current lack of knowledge.

What I do know is that writing this did give me a clearer understanding of how stuff works, and it cleared up some misconceptions I had, which should help me as I continue.

I also got some feedback via the chat, which I appreciate :)

Hello to all my stalkers.

To be more precise about "points": a vector can mean a direction or a position, depending on how you interpret it (position vector / direction vector). A vertex is a corner point of a model/primitive, and it can have a number of attributes like position, color, and texture coordinates. Hence the GL function names containing "VertexAttrib".

Objects which have had this transform applied to them are said to have normalized device coordinates.

After being multiplied by the projection matrix they become "clip coordinates". After the perspective division, which happens automatically, they become NDCs.

Models that have the modelview transform applied to them are said to be in camera space or eye space.
Here, we can apply various shaders (on either the vertex level, pixel/fragment level or geometry level). Notably, this is where textures are actually applied to the object.

This is a bit confusing.

I have a question. Have you ever modeled in a 3d program? Have you ever used a game engine (particularly Unity3D)?

Have you done any UV unwrapping or model texturing in a 3D modeling package?

Even though I don't know OpenGL or DirectX, I can follow all of your explanations above.

It would really help you to learn about 3D modeling (which is what OpenGL renders anyhow); it will deepen your understanding of OpenGL.

As Aliii noted, your definitions do sound confusing, especially for someone who may not have done any 3d modeling or know anything about the way 3d models are rendered.

I do know that OpenGL is a graphics library that does rendering you could actually program yourself. But it takes a lot to write a program that renders 3D graphics the way they are rendered today, so people would rather use large libraries such as OpenGL and DirectX to do it for them.

You should write your notes on what you are learning as if you were completely new to everything and needed baby steps to understand. That way, if you ever step away from all of this for a while, you will be able to simply read your notes and get back into it right away.

For instance, what normal person knows what "rasterize" means? And there are a lot of people who don't know what rendering a pixel to a screen looks like.

Things like that.

They call me the Tutorial Doctor.

I am not sure, but shouldn't those be multiplies in your cheat sheet there rather than additions? I mean, the transformation of a vertex from one coordinate space to another is done by multiplying it with the matrix that maps between the two spaces.

Such that for example:

(viewMat * modelMat) yields modelViewMat

and

(modelViewMat * ObjectCoordinates) yields EyeCoordinates

I think that your cheat sheet could be confusing. Also, in your cheat sheet you might want to add the order that those multiplies should be done in, because unlike regular multiplication, order here matters. (viewMat * modelMat * ObjectCoordinates) is not the same as (ObjectCoordinates * modelMat * viewMat). In OpenGL the order reads from right to left, because it uses the column-vector (column-major) convention.

Column-Major operator-on-the-left == Row-Major operator-on-the-right

so with (viewMat * modelMat * ObjectCoordinates):

first

(modelMat * ObjectCoordinates)

then

(viewMat * (modelMat * ObjectCoordinates))

If you could find a way to tidy that understanding up in your cheat sheet, that would be I think pretty awesome.

J-GREEN

Greenpanoply

I think OP used + to mean "combined with" which is unfortunate since the operation of combining matrices is multiplication and not addition.

Multiplication order does matter, since A * B != B * A in general, which you correctly point out. This ties in with row major vs. column major: switching convention means the matrices get transposed, and note that transpose(A * B) = transpose(B) * transpose(A).

Your last part is wrong (EDIT: I probably mean "misleading" rather than wrong) though: A * B * C can be done in any order, i.e. (A * B) * C = A * (B * C), since matrix multiplication is associative. Brackets are unnecessary for chained multiplications. (You still need brackets for distributivity though, A * (B + C) != (A * B) + C [in fact A * (B + C) = (A * B) + (A * C), just like distributivity of * over + for integers].)

What you can't do is swap left/right multiplication operands, i.e. A * B * C != A * C * B. That is because B * C != C * B in general.
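A quick illustration of both properties (GLM here, purely as an example; the transforms are arbitrary):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 A = glm::translate(glm::mat4(1.0f), glm::vec3(1.0f, 0.0f, 0.0f));
glm::mat4 B = glm::rotate(glm::mat4(1.0f), glm::radians(90.0f), glm::vec3(0.0f, 1.0f, 0.0f));
glm::mat4 C = glm::scale(glm::mat4(1.0f), glm::vec3(2.0f));

// Associativity: the grouping doesn't matter...
glm::mat4 grouped1 = (A * B) * C;
glm::mat4 grouped2 = A * (B * C); // same matrix as grouped1

// ...but the left-to-right order does:
glm::mat4 order1 = A * B;
glm::mat4 order2 = B * A; // different matrix in general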

"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

I think OP used + to mean "combined with" which is unfortunate since the operation of combining matrices is multiplication and not addition.

Correct. To be honest, I wasn't even intending to include the last part, and I was also mainly writing this for myself to serve as a sanity check more than a guide.

I'll reply more properly once I get back from work.

Hello to all my stalkers.

I have a question. Have you ever modeled in a 3d program? Have you ever used a game engine (particularly Unity3D)?

Have you done any UV unwrapping or model texturing in a 3D modeling package?

...

For instance, what normal person knows what "rasterize" means? And there are a lot of people who don't know what rendering a pixel to a screen looks like.

Things like that.

I have done some very, very basic modelling in 3D Studio Max some years ago. I would consider myself a novice at modelling, but I do have a decent-ish understanding of a lot of the concepts and processes involved.

I have dabbled in using Unity3D in my spare time, but I'm far from an expert on using it.

I work in game development, but we do not use a commercial engine. Nor is this kind of stuff what I do there -- I'm doing this as a home project, with a specific goal I'm working towards.

Based on your post and your name, I think you might have read this as a tutorial for people who are brand new to the material. This wasn't the intent, although I can see the benefits of such a tutorial or article being written.

I've seen some of your other posts advocating learning 3D. I don't necessarily disagree with it being helpful, but it's not where my current focus is, nor do I think it's what I'd gain the most from right now. I might be gloriously wrong, but for now learning 3D is not something I'll spend time on.

I should possibly also have made it clearer what the intent of the thread was, and have asked more specific questions to make feedback easier. I'll definitely keep that in mind for future threads.

I'm reading up on this stuff on my own, while jotting down notes as I go along. At a certain point, I had some problems, which typing up the post helped solve/make clearer.

I posted this mainly as a sanity check, not expecting anything more detailed than frob's answer. That said, I should probably have done another pass on it to clean up some things, and to have a more consistent detail level throughout.

I think that your cheat sheet could be confusing. Also, in your cheat sheet you might want to add the order that those multiplies should be done in, because unlike regular multiplication, order here matters. (viewMat * modelMat * ObjectCoordinates) is not the same as (ObjectCoordinates * modelMat * viewMat). In OpenGL the order reads from right to left, because it uses the column-vector (column-major) convention.

...

If you could find a way to tidy that understanding up in your cheat sheet, that would be I think pretty awesome.

I wasn't originally planning on including the cheat sheet part. This is one of the things I should have done another pass over before posting.

Judging by some comments I've received in various places, I think I'll do a pass over this tomorrow/this weekend, and try to clear up any confusing parts. I probably won't make any drastic changes, just update/correct the worst parts. Hopefully that might make the post more relevant for other people struggling with the same issues I was.

I'll update the post now to clarify that I don't mean "mathematical addition" but "combined with" in the cheat sheet, though.

I appreciate the comments and feedback, both here in the thread, on the chat and other places.

Hello to all my stalkers.

This topic is closed to new replies.
