use mul(vector, matrix) or mul(matrix, vector) in HLSL

Started by
3 comments, last by YixunLiu 6 years, 11 months ago

Hi,

In one DirectX example, I found that it transposes the world transform (of type XMMATRIX), stores the result in an XMFLOAT4X4, and then sends it to the GPU using context->UpdateSubresource. On the GPU side, the vertex position is transformed to world space with mul(vector, matrix), where the vector is the vertex position and the matrix is the world transform matrix.

I think there is another way to do it: skip the transpose on the CPU side and use mul(matrix, vector) in HLSL.

It looks like this should be more efficient, because we no longer need the transpose on the CPU side.

Am I right?

Thanks.

YL

This is all convention: row major versus column major. You are right that mul(vec, mat) == mul(transpose(mat), vec). The difference is how the shader does the math, either with dot products or with a mul followed by mads.

It used to be better to do the dot products, hence the transpose. With our big modern GPUs, which are scalar rather than SIMD, it matters less.
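To make the identity and the two evaluation strategies concrete, here is a minimal CPU-side sketch in C++. The names Vec4, Mat4, mulVecMat, mulMatVec and transposed4x4 are made up for illustration and are not part of HLSL or DirectXMath; m[row][col] is purely mathematical indexing and says nothing about memory layout. Which form a real shader compiler emits is its own decision; this only shows that both forms compute the same numbers.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col]

// Row vector times matrix: r[j] = sum_i v[i] * m[i][j].
// Each output component is one dot product of v with column j.
Vec4 mulVecMat(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            r[j] += v[i] * m[i][j];
    return r;
}

// Matrix times column vector: r[i] = sum_j m[i][j] * v[j].
// Written as "scale column j by v[j] and accumulate", i.e. the
// multiply followed by multiply-adds pattern.
Vec4 mulMatVec(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            r[i] += m[i][j] * v[j];
    return r;
}

// Plain mathematical transpose: t[i][j] = m[j][i].
Mat4 transposed4x4(const Mat4& m) {
    Mat4 t{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            t[i][j] = m[j][i];
    return t;
}
```

For any v and m, mulVecMat(v, m) and mulMatVec(transposed4x4(m), v) return the same four components.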

But my advice: you have thousands of optimisations to do before that transpose becomes a problem, unless you hit a very degenerate case :)

Write your math out on paper. This is a purely mathematical convention issue.
If the basis vectors within your matrices are written vertically, you use mul(matrix, vector), or if they're written horizontally, you use mul(vector, matrix).
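As a small illustration of the two mathematical conventions, the sketch below builds the same translation both ways in C++. All names here (Vec4, Mat4, translationAsRows, translationAsColumns, mulRowVec, mulColVec) are hypothetical helpers, not DirectXMath or HLSL functions. With horizontal basis vectors the translation sits in the bottom row and the vector goes on the left; with vertical basis vectors it sits in the last column and the vector goes on the right.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col]

// Horizontal basis vectors: each basis vector (and the translation) is a row.
Mat4 translationAsRows(float tx, float ty, float tz) {
    return {{ {{1.0f, 0.0f, 0.0f, 0.0f}},
              {{0.0f, 1.0f, 0.0f, 0.0f}},
              {{0.0f, 0.0f, 1.0f, 0.0f}},
              {{tx,   ty,   tz,   1.0f}} }};
}

// Vertical basis vectors: each basis vector (and the translation) is a column.
Mat4 translationAsColumns(float tx, float ty, float tz) {
    return {{ {{1.0f, 0.0f, 0.0f, tx}},
              {{0.0f, 1.0f, 0.0f, ty}},
              {{0.0f, 0.0f, 1.0f, tz}},
              {{0.0f, 0.0f, 0.0f, 1.0f}} }};
}

// Counterpart of mul(vector, matrix): row vector on the left.
Vec4 mulRowVec(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            r[j] += v[i] * m[i][j];
    return r;
}

// Counterpart of mul(matrix, vector): column vector on the right.
Vec4 mulColVec(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}
```

Both mulRowVec(p, translationAsRows(...)) and mulColVec(translationAsColumns(...), p) move the point p by the same offset; mixing a matrix from one convention with the multiply from the other does not.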

However, there's also a computer-specific convention issue of how 2D arrays are stored in memory - row by row, or column by column.
If I remember correctly, the XM* classes use the row-by-row storage convention, while HLSL (by default) uses the column-by-column storage convention. Also, the XM* classes use horizontal basis vectors while HLSL works with horizontal or vertical ones.
If nothing were done to correct for this mismatch in storage conventions, it would have the implicit effect of diagonally flipping (transposing) the matrix, which is equivalent to switching to the other set of mathematical conventions. That is, an XM* matrix has horizontal basis vectors, but if you pass it directly to HLSL, it ends up with vertical basis vectors. The transpose function can 'fix' this.
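The implicit transpose can be seen with nothing more than a flat array of 16 floats. The following C++ sketch uses hypothetical helper names (Flat16, readRowMajor, readColumnMajor) and only models the two storage conventions; it does not touch any real constant buffer.

```cpp
#include <array>

using Flat16 = std::array<float, 16>;

// Row-by-row storage: element (row, col) lives at index row * 4 + col.
float readRowMajor(const Flat16& data, int row, int col) {
    return data[row * 4 + col];
}

// Column-by-column storage: element (row, col) lives at index col * 4 + row.
float readColumnMajor(const Flat16& data, int row, int col) {
    return data[col * 4 + row];
}

// Fills a buffer so that each slot holds its own index, which makes it
// easy to see where an element ends up under each reading.
Flat16 makeIndexedBuffer() {
    Flat16 data{};
    for (int i = 0; i < 16; ++i)
        data[i] = static_cast<float>(i);
    return data;
}
```

Reading the same 16 floats with the other convention swaps row and column: readColumnMajor(data, r, c) equals readRowMajor(data, c, r) for every r and c. For example, the value a row-by-row writer placed at row 3, column 0 (index 12) is seen at row 0, column 3 by a column-by-column reader.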

So, your example with the transpose function is using it to correct for the storage order mismatch.
And your example without the transpose function is instead using opposite mathematical conventions in C++ and HLSL as a horrible way of tolerating the storage mismatch.

IMHO both are bad. Instead you should pick one storage convention and one mathematical convention and use the same choices everywhere.

Just thought I'd share this link; it might be helpful:

https://msdn.microsoft.com/en-us/library/windows/desktop/bb509634(v=vs.85).aspx

Thank you all so much for your insightful answers!

