Are float4x4 arrays supported in Shader Model 2 (D3D9, vs_4_0_level_9_3)?

Started by
18 comments, last by Stefan Fischlschweiger 9 years ago

You have posted only the definitions of two structures and the vertex layout in the vertex shader. How the vertex buffer memory is actually interpreted by the shader depends on several non-trivial DirectX calls.

I see. Here is how I create the vertex buffers:


// Initialize vertex buffers
for (int indx = 0; indx < mesh.VertexBuffers.Count; indx++)
{
    var vb = mesh.VertexBuffers[indx];
    Vertex[] vertices = new Vertex[vb.Length];
    for (var i = 0; i < vb.Length; i++)
    {
        // Retrieve skinning information for vertex
        Common.Mesh.SkinningVertex skin = new Common.Mesh.SkinningVertex();
        if (mesh.SkinningVertexBuffers.Count > 0)
            skin = mesh.SkinningVertexBuffers[indx][i];

        // Create vertex
        vertices[i] = new Vertex(vb[i].Position, vb[i].Normal, vb[i].Color, vb[i].UV, skin);
    }

    vertexBuffers.Add(ToDispose(Buffer.Create(device, BindFlags.VertexBuffer, vertices.ToArray())));
    vertexBuffers[vertexBuffers.Count - 1].DebugName = "VertexBuffer_" + indx.ToString();
}

As you can see, I have a List<Buffer> vertexBuffers (every submesh of a mesh gets its own vertex buffer). The vertices array is where I build the vertex data on the CPU side; each iteration of the inner for loop loads the info for one vertex.

If you wanna see the InputLayout for the vertex shader, here it is:


vertexLayout = ToDispose(new InputLayout(device,
                   bytecode.GetPart(ShaderBytecodePart.InputSignatureBlob).Data,
                new[]
                {
                    // "SV_Position" = vertex coordinate in object space
                    new SharpDX.Direct3D11.InputElement("SV_Position", 0, Format.R32G32B32_Float, 0, 0),
                    // "NORMAL" = the vertex normal
                    new SharpDX.Direct3D11.InputElement("NORMAL", 0, Format.R32G32B32_Float, 12, 0),
                    // "COLOR"
                    new SharpDX.Direct3D11.InputElement("COLOR", 0, Format.R8G8B8A8_UNorm, 24, 0),
                    // "UV"
                    new SharpDX.Direct3D11.InputElement("TEXCOORD", 0, Format.R32G32_Float, 28, 0),
                    // "BLENDINDICES"
                    // NOTE: commented line is for WinRT client, we must use Format.R32G32B32A32_Float (supported in 9_3)
                    //new InputElement("BLENDINDICES", 0, Format.R32G32B32A32_UInt, 36, 0), 
                    new SharpDX.Direct3D11.InputElement("BLENDINDICES", 0, Format.R32G32B32A32_Float, 36, 0),
                    // "BLENDWEIGHT"
                    new SharpDX.Direct3D11.InputElement("BLENDWEIGHT", 0, Format.R32G32B32A32_Float, 52, 0),
                }));
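
As a quick sanity check on the byte offsets in this layout, here is a minimal sketch that recomputes them from the size of each format. The element sizes are taken from the DXGI format names (for example, R32G32B32_Float is three 4-byte floats); the class name OffsetTable is made up for illustration and is not part of the original project.

```csharp
using System;

public static class OffsetTable
{
    public static void Main()
    {
        // (semantic, size in bytes), in the same order as the input layout above
        var elements = new (string Name, int Size)[]
        {
            ("SV_Position", 12),  // R32G32B32_Float
            ("NORMAL", 12),       // R32G32B32_Float
            ("COLOR", 4),         // R8G8B8A8_UNorm
            ("TEXCOORD", 8),      // R32G32_Float
            ("BLENDINDICES", 16), // R32G32B32A32_Float
            ("BLENDWEIGHT", 16),  // R32G32B32A32_Float
        };

        int offset = 0;
        foreach (var e in elements)
        {
            // Each element starts where the previous one ended
            Console.WriteLine($"{e.Name}: offset {offset}");
            offset += e.Size;
        }

        // The running total is the size of one vertex in the buffer
        Console.WriteLine($"stride: {offset}");
    }
}
```

The running offsets come out as 0, 12, 24, 28, 36 and 52, which matches the declaration, so the offsets themselves are consistent. That leaves the element formats versus the actual data in the buffer as the thing to check.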

Seems legit, but my guess is that the vertex buffer data you actually bind for rendering has 4-byte indices and vertices in it, so all 4 values get swallowed into the first component, somehow ending up as the value 1.0 in the shader. Any sane exporter would export like that anyway, so you seem to have a correct art asset but an incorrect declaration in those calls. Change them as I advised, 4 bytes/4 bytes, and try it out.


new SharpDX.Direct3D11.InputElement("BLENDINDICES", 0, Format.R32G32B32A32_Float, 36, 0),
// "BLENDWEIGHT"
new SharpDX.Direct3D11.InputElement("BLENDWEIGHT", 0, Format.R32G32B32A32_Float, 52, 0),
This:

...
public uint BoneIndex0;
public uint BoneIndex1;
public uint BoneIndex2;
public uint BoneIndex3;
...
is not
new SharpDX.Direct3D11.InputElement("BLENDINDICES", 0, Format.R32G32B32A32_Float, 36, 0),

true as well!
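
To make the mismatch concrete, here is a small sketch of what happens when the 4 bytes of a uint bone index are read back as a float, which is roughly what an R32G32B32A32_Float input element does with that memory. The class name and the sample index 5 are made up for illustration, and how a particular GPU treats such tiny values (for example, flushing them to zero) is not something this sketch shows.

```csharp
using System;

public static class ReinterpretDemo
{
    public static void Main()
    {
        uint boneIndex = 5; // hypothetical bone index as stored in the uint field

        // Reinterpret the same 4 bytes as a 32-bit float, without converting the value
        float reinterpreted = BitConverter.ToSingle(BitConverter.GetBytes(boneIndex), 0);

        // A real numeric conversion, which is what a float-based layout needs
        float converted = (float)boneIndex;

        Console.WriteLine(reinterpreted); // a denormal value extremely close to 0, not 5
        Console.WriteLine(converted);     // 5
    }
}
```

So with a Float format in the input layout, the indices have to be stored as real float values on the CPU side, not as uint fields occupying the same bytes.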


OH MY GOD IT WORKS!!! Finally after 2 weeks I will be able to sleep... You were right about the weird struct:


[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct SkinningVertex
{
    public uint BoneIndex0;
    public uint BoneIndex1;
    public uint BoneIndex2;
    public uint BoneIndex3;
    public float BoneWeight0;
    public float BoneWeight1;
    public float BoneWeight2;
    public float BoneWeight3;
}

This is what was causing the problem. The indices and weights weren't right. It was OK for the Windows part of the Universal app, but not for the Windows Phone part. I can finally run the engine on a Windows Phone device, with proper animation and skinning. Thank you so much. I will never be able to repay you. My life depends on this project and you solved this mystery that no one else could. So again, thank you.

I removed this struct and replaced it with 2 Vector4s (Vector4 Indices, Vector4 Weights) and it's working.
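
For anyone following along, a replacement struct along those lines might look like the sketch below. The field names BoneIndices and BoneWeights are assumptions, and System.Numerics.Vector4 is used as a stand-in for SharpDX's Vector4 so that the example is self-contained; both are four consecutive 32-bit floats.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical replacement for the uint/float SkinningVertex struct.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct SkinningVertex
{
    public Vector4 BoneIndices; // bone indices stored as floats, e.g. (3, 7, 0, 0)
    public Vector4 BoneWeights; // blend weights for the four bones
}

public static class LayoutCheck
{
    public static void Main()
    {
        // Size of the whole struct: 16 bytes of indices + 16 bytes of weights
        Console.WriteLine(Marshal.SizeOf<SkinningVertex>());

        // Byte offset of the weights inside the struct
        Console.WriteLine(Marshal.OffsetOf<SkinningVertex>(nameof(SkinningVertex.BoneWeights)));
    }
}
```

The weights start 16 bytes after the indices, which lines up with the 36 and 52 offsets of BLENDINDICES and BLENDWEIGHT in the input layout.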


float4x4 is not appropriate for bone matrices, since a bone matrix is a 3x4 matrix. I would strongly advise you to rework your code to send your matrices as a float4 matriciesrows[60*3] array if you want to conform to older compiler versions smoothly and stay within their limits. If you send rows, you get the 3x4 optimization: constructing float3x4 or float4x4 objects in the vertex shader from the rows is a trivial, fast operation, or you can use the rows for transforming right away. You will also be using the more native and stable uniform setters on the CPU side, since setting an array of float4s is what every GPU handles best.

Can I please ask one more question? I tried to implement this as you said, passing a float4 matricesrows[bonesCount * 3] array instead of float4x4 bones[bonesCount]. I made the adjustment to the code and the matrix rows are loaded into the buffer fine. The problem is that SharpDX is row-major and HLSL is column-major.

So if I do this in the shader code:


float4x4 bonesX = { MatricesRows[indices.x * 3], MatricesRows[indices.x * 3 + 1], MatricesRows[indices.x * 3 + 2], { 0, 0, 0, 1 } };

... the rows actually become columns. Of course I could just do this:


float4x4 bonesX = { { MatricesRows[indices.x * 3].x, MatricesRows[indices.x * 3 + 1].x, MatricesRows[indices.x * 3 + 2].x, 0 }, 
{ MatricesRows[indices.x * 3].y, MatricesRows[indices.x * 3 + 1].y, MatricesRows[indices.x * 3 + 2].y, 0 },
{ MatricesRows[indices.x * 3].z, MatricesRows[indices.x * 3 + 1].z, MatricesRows[indices.x * 3 + 2].z, 0 },
{ MatricesRows[indices.x * 3].w, MatricesRows[indices.x * 3 + 1].w, MatricesRows[indices.x * 3 + 2].w, 1 } };

... but there are too many instructions, and since we're in D3D9, we are limited to 256 instructions. So this won't work. Then I tried:


row_major float4x4 bonesX = { MatricesRows[indices.x * 3], MatricesRows[indices.x * 3 + 1], MatricesRows[indices.x * 3 + 2], { 0, 0, 0, 1 } };

I added row_major to every float4x4 I use in the shader, but the result was the same as without "row_major". It seems that when I call position = mul(position, skinTransform); the multiplication still behaves as if skinTransform had no row_major.

Is there a short way (short in instruction count) to compose a bone transform matrix when I have its original 3 rows? Or is there anything else I can do?

---

Oh wait... how stupid of me. There is the transpose(matrix) function. Sorry ^_^


Just drop the advanced algebra objects and allow yourself to do the transformations directly. Your issue is narrow enough, though, that there's no harm in dealing with it directly. You can't squeeze out many more optimizations now that it is narrowed down; your shader functions are well optimized by now. I guess, at least :)
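
One way to read "direct transformations" is to skip building a float4x4 altogether and compute each output component as one dot product between a stored row and the position with w = 1. The sketch below mirrors that arithmetic on the CPU in C# so it can be checked outside the shader. It assumes each row holds the rotation part in xyz and the translation in w; the class name and the sample rows are made up, and System.Numerics stands in for SharpDX's vector types.

```csharp
using System;
using System.Numerics;

public static class DirectSkinning
{
    // Transforms a point by a 3x4 bone transform given as three float4 rows.
    // Each output component is a single 4-component dot product.
    public static Vector3 TransformPoint(Vector3 p, Vector4 r0, Vector4 r1, Vector4 r2)
    {
        var p4 = new Vector4(p, 1f); // w = 1 so the translation stored in each row's w is added
        return new Vector3(Vector4.Dot(r0, p4), Vector4.Dot(r1, p4), Vector4.Dot(r2, p4));
    }

    public static void Main()
    {
        // Illustrative rows: rotate 90 degrees about Z, then translate by (1, 2, 3)
        var r0 = new Vector4(0f, -1f, 0f, 1f);
        var r1 = new Vector4(1f, 0f, 0f, 2f);
        var r2 = new Vector4(0f, 0f, 1f, 3f);

        // (1, 0, 0) rotates to (0, 1, 0); adding the translation gives components 1, 3, 3
        Console.WriteLine(TransformPoint(new Vector3(1f, 0f, 0f), r0, r1, r2));
    }
}
```

In the vertex shader the same idea would be three dot() calls per bone on float4(position, 1), which avoids composing or transposing a matrix. Whether that actually saves instructions under the 256-instruction limit would have to be checked against the compiled shader.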


Hopefully it is. Thanks to you :) I will keep an eye on it.

On the CPU side you could do yourMatrix.Transpose();
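
A minimal sketch of that CPU-side approach, combined with packing three float4 rows per bone: it uses System.Numerics.Matrix4x4 and its static Matrix4x4.Transpose as a stand-in for the SharpDX matrix type, and it assumes the bone matrices follow the row-vector convention with the translation in M41..M43. The names BoneRowPacking and PackRows are hypothetical.

```csharp
using System;
using System.Numerics;

public static class BoneRowPacking
{
    // Transposes each bone matrix on the CPU and keeps the first three rows.
    public static Vector4[] PackRows(Matrix4x4[] bones)
    {
        var rows = new Vector4[bones.Length * 3];
        for (int i = 0; i < bones.Length; i++)
        {
            Matrix4x4 t = Matrix4x4.Transpose(bones[i]);
            rows[i * 3 + 0] = new Vector4(t.M11, t.M12, t.M13, t.M14);
            rows[i * 3 + 1] = new Vector4(t.M21, t.M22, t.M23, t.M24);
            rows[i * 3 + 2] = new Vector4(t.M31, t.M32, t.M33, t.M34);
            // For an affine transform the fourth row of t is (0, 0, 0, 1), so it is not sent
        }
        return rows;
    }

    public static void Main()
    {
        var bones = new[] { Matrix4x4.CreateTranslation(1f, 2f, 3f) };
        Vector4[] rows = PackRows(bones);

        // After the transpose, the translation ends up in the w component of each row
        foreach (Vector4 row in rows)
            Console.WriteLine(row);
    }
}
```

Doing the transpose once per bone on the CPU keeps the per-vertex shader work free of any transpose or matrix construction.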
