# What are some advantages and disadvantages of 4x4 matrices for 3D games?


I've just started working on my hobby game engine project again, and am trying to figure out the advantages and disadvantages of representing coordinates and rotations in different forms. I've mostly worked with vectors and quaternions, only occasionally venturing into 4x4 matrices. In the book Real-Time Collision Detection I noticed that 4x4 matrices get more attention than they do on the blogs I've read.

Some of the advantages I've picked up: 4x4 matrices scale much better when doing a bunch of transformations at once, while vectors are better suited to performing fewer transformations. A 4x4 matrix stores position, scaling and rotation together, allowing position and direction transformations to be performed more efficiently. Standardizing where the position and rotation values live makes it efficient to update only the values needed for a given transformation. Also, 4x4 matrices can be dumped into shaders without having to convert from one form to another.

The drawbacks of 4x4 matrices are that they don't solve gimbal lock, unlike quaternions (not sure if this is true). 4x4 matrices are also more prone to accumulating errors over time. Vectors and quaternions are also much less complicated to deal with while coding, making major bugs less likely.

I couldn't find an article that dealt directly with the issues of 4x4 matrices. There were just a bunch of articles spread out over various blogs and Stack Exchange posts on matrix math, quaternions, Euler angles, and vectors. Are there any other advantages and disadvantages to 4x4 matrices?

##### Share on other sites

A 4x4 matrix is just 4 basis vectors: the object's local X axis, Y axis, Z axis and position. If those X/Y/Z axes are all at right angles to each other, then it's a valid rotation. If the length of an axis is not 1.0, then it's also being scaled.
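To make the "4 basis vectors" reading concrete, here is a minimal sketch in plain Python. It assumes the row layout used later in this thread (rows are X axis, Y axis, Z axis, position); `describe_basis` and its tolerance are made-up names for illustration, not part of any engine.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def describe_basis(m, eps=1e-6):
    """Split a 4x4 (rows = X axis, Y axis, Z axis, position) into its parts.

    Reports whether the three axes are mutually perpendicular and how long
    each axis is (a length other than 1.0 means that axis carries scale).
    """
    x, y, z, p = (row[:3] for row in m)
    orthogonal = all(abs(dot(a, b)) < eps for a, b in ((x, y), (y, z), (x, z)))
    scale = tuple(math.sqrt(dot(a, a)) for a in (x, y, z))
    return {"position": tuple(p), "orthogonal": orthogonal, "scale": scale}

# A 90 degree turn about Z, uniformly scaled by 2, positioned at (5, 0, 1).
m = [
    [0.0, 2.0, 0.0, 0.0],   # local X axis
    [-2.0, 0.0, 0.0, 0.0],  # local Y axis
    [0.0, 0.0, 2.0, 0.0],   # local Z axis
    [5.0, 0.0, 1.0, 1.0],   # position
]
info = describe_basis(m)
```

Here the axes come back as perpendicular with a length of 2.0 each, so the matrix is a rotation plus a uniform scale.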

Yep, if you maintain a 4x4 transformation for a long time, mutating the rotation part, then errors will accumulate into the scaling part (as both rotation and scale are stored in those 3 axis vectors).

Generally, you'll store translation, rotation and scaling separately, and use them to rebuild a 4x4 transform from scratch every frame. i.e. Maybe you'll use a Vec3 position, a float/Vec3 scale (depending on whether you support non-uniform scaling), and a Quaternion rotation -- and use them to generate the transform matrix. Once you've generated the transform matrix, it's easy to concatenate it with the camera/view matrix, the projection matrix, etc...
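A rough sketch of that rebuild step, in plain Python rather than engine code. The `(w, x, y, z)` quaternion order, the row-vector convention (point times matrix, position in the last row) and the function names are all assumptions made for this example.

```python
import math

def trs_matrix(position, scale, rotation):
    """Build a 4x4 whose rows are the scaled X, Y, Z axes and the position."""
    w, x, y, z = rotation  # assumed to be a unit quaternion in (w, x, y, z) order
    axes = [
        (1 - 2 * (y * y + z * z), 2 * (x * y + w * z), 2 * (x * z - w * y)),
        (2 * (x * y - w * z), 1 - 2 * (x * x + z * z), 2 * (y * z + w * x)),
        (2 * (x * z + w * y), 2 * (y * z - w * x), 1 - 2 * (x * x + y * y)),
    ]
    # Scale is applied in local space, so each axis row is multiplied by its factor.
    rows = [[s * c for c in axis] + [0.0] for s, axis in zip(scale, axes)]
    rows.append([*position, 1.0])
    return rows

def transform_point(v, m):
    """Row-vector convention: (x, y, z, 1) times the matrix."""
    x, y, z = v
    return tuple(x * m[0][i] + y * m[1][i] + z * m[2][i] + m[3][i] for i in range(3))

half = math.radians(90.0) / 2
quarter_turn_z = (math.cos(half), 0.0, 0.0, math.sin(half))
world = trs_matrix((5.0, 0.0, 1.0), (2.0, 2.0, 2.0), quarter_turn_z)
moved = transform_point((1.0, 0.0, 0.0), world)  # approximately (5, 2, 1)
```

Because the matrix is regenerated from the stored position, scale and quaternion each frame, rounding error in the matrix itself never gets a chance to build up.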

Alternatively, if you have long-lived/mutable transforms, you'll have to occasionally "reorthonormalize" them, by forcing the X/Y/Z vectors to have a length of 1 (and then re-adding in the scale factor from an external source, if required), and forcing the 3 axis vectors to be at right angles to each other using some cross products. That doesn't stop errors occurring, but stops them from growing out of control into weird scale/skewing bugs.
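One possible way to do the reorthonormalization described above, sketched in plain Python. It assumes a right-handed basis stored as rows and keeps the X axis as the reference direction; which axis to trust is a design choice, and the function names are hypothetical.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def reorthonormalize(m):
    """Return a copy of m with unit-length, mutually perpendicular axes.

    Any scale stored in the axes is discarded, so it has to be re-applied
    from an external source afterwards if the object is scaled.
    """
    x = normalize(m[0][:3])            # keep X's direction, force length 1
    z = normalize(cross(x, m[1][:3]))  # Z is perpendicular to X and the old Y
    y = cross(z, x)                    # Y completes the right-handed basis
    return [[*x, 0.0], [*y, 0.0], [*z, 0.0], list(m[3])]

# A rotation that has drifted: axes slightly skewed and no longer unit length.
drifted = [
    [1.0, 0.01, 0.0, 0.0],
    [0.02, 0.99, 0.0, 0.0],
    [0.0, 0.01, 1.02, 0.0],
    [5.0, 0.0, 1.0, 1.0],
]
fixed = reorthonormalize(drifted)
```

The position row passes through untouched; only the three axis rows are cleaned up.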

Often you don't want to use 4x4 matrices in shaders, as they waste space. A translation/rotation/scaling transform looks like:

Xx, Xy, Xz, 0

Yx, Yy, Yz, 0

Zx, Zy, Zz, 0

Px, Py, Pz, 1

(swap the rows/columns if you're using a different convention :P)

That last column is always (0,0,0,1) for a standard transform matrix, so there's no need to send it to the shader. Instead, you can store just the first 3 columns in 3 vec4's, and hard-code the 4th column.

This isn't a big deal for things like the view matrix, but can become an important optimization when doing skeletal animation with large numbers of bone matrices.
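As an illustration of the space saving, the sketch below packs the matrix layout shown above into three 4-component columns (12 floats instead of 16) and transforms a point with three dot products, treating the dropped column as the hard-coded (0,0,0,1). It is plain Python standing in for what a shader would do, and the function names are invented for the example.

```python
def pack_4x3(m):
    """Keep only the first three columns of a 4x4 as three 4-float vectors."""
    return [[m[r][c] for r in range(4)] for c in range(3)]

def unpack_4x4(packed):
    """Rebuild the full matrix by re-adding the constant (0, 0, 0, 1) column."""
    return [[packed[c][r] for c in range(3)] + [1.0 if r == 3 else 0.0]
            for r in range(4)]

def transform_point_packed(v, packed):
    """One dot product per output component, with w assumed to be 1."""
    v4 = (*v, 1.0)
    return tuple(sum(a * b for a, b in zip(v4, col)) for col in packed)

m = [
    [0.0, 2.0, 0.0, 0.0],   # Xx, Xy, Xz, 0
    [-2.0, 0.0, 0.0, 0.0],  # Yx, Yy, Yz, 0
    [0.0, 0.0, 2.0, 0.0],   # Zx, Zy, Zz, 0
    [5.0, 0.0, 1.0, 1.0],   # Px, Py, Pz, 1
]
packed = pack_4x3(m)
floats_sent = sum(len(col) for col in packed)
moved = transform_point_packed((1.0, 0.0, 0.0), packed)
```

With, say, 100 bone matrices, that is 1200 floats uploaded instead of 1600.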

##### Share on other sites

> Often you don't want to use 4x4 matrices in shaders, as they waste space. A translation/rotation/scaling transform looks like:
>
> Xx, Xy, Xz, 0
>
> Yx, Yy, Yz, 0
>
> Zx, Zy, Zz, 0
>
> Px, Py, Pz, 1
>
> (swap the rows/columns if you're using a different convention)

I couldn't find the exact stackoverflow post mentioning shaders, but this is close to it. I also thought the gpu has instructions baked into it that can handle 4 x 4 matrices in one to a few instructions.  I'm just trying to understand all of the ins and outs.

##### Share on other sites

Matrix multiplication is commonly implemented as a series of dot product operations. For historical (and practical) reasons, the common GPU register size is 128 bits, so a full 4x4 float matrix (512 bits) does not fit into one register and, closely related to that, cannot be handled by a single hardware instruction.
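A small sketch of what "a series of dot products" means for a 4x4 by 4x4 multiply: every output element is one 4-component dot product of a row with a column, 16 in total. This is plain Python for illustration, not how any particular GPU or compiler schedules it, and the helper names are made up.

```python
def dot4(a, b):
    """Dot product of two 4-float vectors (each would fill one 128-bit register)."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def mat_mul(a, b):
    """4x4 times 4x4: element [i][j] is row i of a dotted with column j of b."""
    cols = list(zip(*b))
    return [[dot4(row, col) for col in cols] for row in a]

def translation(x, y, z):
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [x, y, z, 1.0],
    ]

combined = mat_mul(translation(1.0, 2.0, 3.0), translation(10.0, 0.0, 0.0))
```

Concatenating the two translations gives a single matrix that moves by (11, 2, 3).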

Edited by Nik02

##### Share on other sites

DX and OGL both use 4x4 mats, not quats internally (1). so whatever you use, you end up converting to a 4x4 for drawing.

i find eulers best for non-flightsims (global rotations), and 4x4's with re-orthonormalization best for flight sims (local rotations). and quats best for slerp'd animations.

eulers are more intuitive than quats, and somewhat more intuitive than mats.

quats instead of mats for flightsims apparently degrade due to rounding errors about half as fast as mats, but still need to be re-orthonormalized.
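For a quaternion, the clean-up step amounts to rescaling it back to unit length, since there are only 4 numbers and no separate axes to square up. A minimal sketch in plain Python, with an invented drifted value for illustration:

```python
import math

def quat_normalize(q):
    """Rescale a quaternion to unit length so it represents a pure rotation again."""
    length = math.sqrt(sum(c * c for c in q))
    return tuple(c / length for c in q)

# Roughly a 90 degree turn about Z whose components have drifted slightly.
drifted = (0.7072, 0.0, 0.0, 0.7071)
fixed = quat_normalize(drifted)
```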

folks have all kinds of ways of storing the translation and rotation. but most are in some form other than what they end up needing, and they convert a lot. such as converting quats to mats every draw call.

every time this topic comes up people will say "i'm gonna store it this way", and i'm like "WHY?"

(1) this may have changed in later versions of DX

Edited by Norman Barrows

##### Share on other sites

> DX and OGL both use 4x4 mats, not quats internally (this may have changed in later versions of DX). so whatever you use, you end up converting to a 4x4 for drawing.

Fixed function is dead. These days D3D and OGL run whatever GPU code (i.e. shaders) you give them.
If you write a vertex shader that uses quaternions, then you can draw using quaternions.
Most of the time you write your shaders to use 4x4 matrices because the code is more optimal than quaternion code.

> I couldn't find the exact stackoverflow post mentioning shaders, but this is close to it. I also thought the gpu has instructions baked into it that can handle 4 x 4 matrices in one to a few instructions.  I'm just trying to understand all of the ins and outs.

> ....
> In modern GPUs there is no restriction to what data format you upload to constant buffers.
> Of course you need to write your vertex shader differently in order to use quaternions for skinning instead of matrices. In fact, we are using dual quaternion skinning in our engine.
> Note that older fixed function hardware skinning indeed only worked with matrices, but that was a long time ago.

Most of the time we write vertex shaders that use matrix math, but as mentioned in that answer, there are always alternatives, and CryEngine actually does skinning using dual-quaternions (i.e. quaternions made of dual-numbers). They use that particular solution as it allows them to encode a translation+rotation transform in 8 floats, instead of the 16 for a 4x4 matrix (or 12 for the "compacted" 4x3 matrices I mentioned earlier). The downside is that their vertex shaders are much more complex. Mr Gneiting does seem to mis-speak on that page when he declares that these dual-quaternions are not slower than matrices... they are slower - the citation he gives even states "Increase in vertex instructions"... but they are more compact, which was the very important motivation on PS3/360 where vertex shader uniform registers are very limited.
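To show where the 8 floats come from, here is a sketch of the usual textbook dual-quaternion encoding of a rotation plus translation: the real part is the rotation quaternion and the dual part is half the translation (as a pure quaternion) times the rotation. This is not CryEngine's code; the `(w, x, y, z)` order and all function names are assumptions made for the example.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def to_dual_quat(rotation, translation):
    """Encode a unit rotation quaternion plus a translation as 8 floats."""
    t = (0.0, *translation)
    dual = tuple(0.5 * c for c in quat_mul(t, rotation))
    return rotation + dual

def translation_of(dq):
    """Recover the translation: 2 * dual * conjugate(real)."""
    real, dual = dq[:4], dq[4:]
    conj = (real[0], -real[1], -real[2], -real[3])
    return tuple(2.0 * c for c in quat_mul(dual, conj)[1:])

half = math.radians(90.0) / 2
quarter_turn_z = (math.cos(half), 0.0, 0.0, math.sin(half))
dq = to_dual_quat(quarter_turn_z, (5.0, 0.0, 1.0))
recovered = translation_of(dq)  # approximately (5, 0, 1)
```

Note that scale is not represented here at all, which is part of why the encoding fits in 8 floats.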

##### Share on other sites
One great thing about matrices is how they can contain translation, rotation, scaling, skewing, and even perspective transformations. Many subsequent operations can be combined by simply multiplying matrices.

This makes building an object hierarchy fairly simple. A world matrix for an object is its own local transform matrix multiplied by its parent transform.
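A minimal sketch of that parent/child rule, assuming the row-vector convention used earlier in the thread (so the child's local matrix goes on the left and the parent's world matrix on the right). The `Node` class and the turret/barrel example are invented for illustration.

```python
def mat_mul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

class Node:
    def __init__(self, local, parent=None):
        self.local = local      # transform relative to the parent
        self.parent = parent

    def world(self):
        """World matrix = own local matrix times the parent's world matrix."""
        if self.parent is None:
            return self.local
        return mat_mul(self.local, self.parent.world())

# Parent: turned 90 degrees about Z and placed at (10, 0, 0).
turret = Node([
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [10.0, 0.0, 0.0, 1.0],
])
# Child: one unit along the parent's local X axis, no rotation of its own.
barrel = Node([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
], parent=turret)
barrel_world = barrel.world()
```

The child ends up at (10, 1, 0) in world space, because the parent's rotation has turned its local X axis to point along world Y.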

The lighter alternative, using an offset and quaternion for rotation, gets really messy when trying to maintain parent/child relationships.

Matrix types also come built into shader languages, while quaternions don't, so your shader code is likely to be simpler to write.

The advantage of using vector/quaternion is that it will be faster. There are fewer math operations to perform and less memory to move around. I haven't seen any performance data on this though, so I don't know if the complexity tradeoff would be worth the potentially small boost in performance.

##### Share on other sites

> Most of the time you write your shaders to use 4x4 matrices because the code is more optimal than quaternion code

some things never change. out of the box or roll your own, the fast way is the fast way.

Edited by Norman Barrows

##### Share on other sites

> Most of the time we write vertex shaders that use matrix math, but as mentioned in that answer, there are always alternatives, and CryEngine actually does skinning using dual-quaternions (i.e. quaternions made of dual-numbers). They use that particular solution as it allows them to encode a translation+rotation transform in 8 floats, instead of the 16 for a 4x4 matrix (or 12 for the "compacted" 4x3 matrices I mentioned earlier). The downside is that their vertex shaders are much more complex. Mr Gneiting does seem to mis-speak on that page when he declares that these dual-quaternions are not slower than matrices... they are slower - the citation he gives even states "Increase in vertex instructions"... but they are more compact, which was the very important motivation on PS3/360 where vertex shader uniform registers are very limited.

Based on the direction I understand computer and even console hardware to have been going in over the past year or two, memory and register limitations are less of an issue, with the number of instructions required to run a piece of code taking precedence in most areas now. In 2 years we'll have so much memory available to the GPUs, once PCI-E 4 comes around, along with the additional lanes provided by the CPU. Would you say that 4x4 matrices should be preferred for optimizations with current and future hardware? (Except for the edge cases where memory optimization is an issue)

##### Share on other sites

> Based on the direction I understand computer and even console hardware to have been going in over the past year or two, memory and register limitations are less of an issue, with the number of instructions required to run a piece of code taking precedence in most areas now. In 2 years we'll have so much memory available to the GPUs, once PCI-E 4 comes around, along with the additional lanes provided by the CPU. Would you say that 4x4 matrices should be preferred for optimizations with current and future hardware? (Except for the edge cases where memory optimization is an issue)

4x4's are a really good default solution for shaders. I'd definitely start there and then experiment later when/if optimization is necessary.

As for registers - the contrary. One of the most important optimizations for current hardware is the reduction of the number of *temporary registers* required to run a shader.
e.g. Say a shader core has 1000 registers inside it. If your pixel shader requires 500 temp registers to run, then the core can be "hyperthreading" 2 waves (a wave is say, 64 pixels). If the shader only requires 100 temp registers, the core can instead timeslice 10 waves at a time. With waves of 64 pixels, that's 640 threads in flight instead of 128.

Note that this is *temporary registers* though. In the past we were limited by uniform registers - the number of constants available per draw. Nowadays, uniforms are stored in memory, and fetched into temporary registers on demand, so you can have unlimited numbers of uniforms.

The number of waves per core, mentioned above, is called "occupancy". The higher your occupancy is, the better the GPU is able to hide memory latency from you, by switching to other waves when one stalls on memory accesses.
e.g. say a texture fetch has a latency of 1000 cycles. With an occupancy of 1, after issuing a texture fetch, your shader will have to spend 1000 ALU cycles on other math before attempting to use the texture fetch result, or it will have to stall and wait for the data to arrive. With an occupancy of 2, it can switch between both waves, so each only has 500 ALU cycles worth of instructions until the results of their texture fetches arrive. With an occupancy of 10, the *perceived memory latency* drops to 100 cycles.

When you have shaders that perform a lot of memory accesses, then ALU instructions can almost become "free", unless you manage to get your occupancy high enough to bring the perceived latencies down.
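The arithmetic in the occupancy examples above can be written out as a toy model. This only mirrors the simplified numbers used in this post (1000 registers per core, waves of 64, a 1000 cycle fetch); real GPUs have further limits on occupancy, and the function names are made up.

```python
def occupancy(core_registers, registers_per_wave):
    """How many waves fit on a core at once, limited only by temp registers."""
    return core_registers // registers_per_wave

def threads_in_flight(waves, wave_size=64):
    return waves * wave_size

def perceived_latency(latency_cycles, waves):
    """ALU cycles each wave must cover itself while a memory fetch is pending."""
    return latency_cycles / waves

heavy = occupancy(1000, 500)  # shader needing 500 temp registers
light = occupancy(1000, 100)  # shader needing 100 temp registers

summary = {
    "heavy": (heavy, threads_in_flight(heavy), perceived_latency(1000, heavy)),
    "light": (light, threads_in_flight(light), perceived_latency(1000, light)),
}
```

In this model the lighter shader runs 10 waves (640 threads) with a perceived latency of 100 cycles, against 2 waves (128 threads) and 500 cycles for the heavier one.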

The situation is similar on the CPU. CPU power doubles every 2 years, but RAM speed doubles every 10 years. So, if you graph both and normalize for CPU speed (so it becomes a flat line), then RAM actually gets slower every year. There's been a big focus over the past 5-10 years on "data oriented design" in games -- focusing on data structures, memory allocations and memory access patterns, in order to optimize for memory latency, rather than the traditional focus of optimizing for instruction counts and clock cycles.
