# DX11 GPU Skinning Problem

Hello,

I have a problem with GPU skinning. I load my object from a COLLADA file: the vertices with their weights and bone indices, and the bones with their matrices.

For every vertex I pick 4 weights and 4 bone indices.

Every non-skinned vertex gets the weights 1 0 0 0 and the bone indices 0 0 0 0 (index 0 of the bone matrix array holds an identity matrix).

I also check that the weights of each vertex sum to 1, and normalize them if they don't.

So far so good. My shader looks like this:

    bool HasBones;
    matrix BoneMatrices[256];

    struct Vertex
    {
        float3 Position  : POSITION;
        float3 Normal    : NORMAL;
        float2 UV        : TEXCOORD0;
        float3 Tangent   : TANGENT;
        float4 Weights   : WEIGHTS;
        int4 BoneIndices : BONEINDICES;
    };

    float4 ApplyBoneTransform(Vertex input, float4 value)
    {
        if (HasBones)
        {
            float4x4 skinTransform = (float4x4)0;
            skinTransform += BoneMatrices[input.BoneIndices.x] * input.Weights.x;
            skinTransform += BoneMatrices[input.BoneIndices.y] * input.Weights.y;
            skinTransform += BoneMatrices[input.BoneIndices.z] * input.Weights.z;
            skinTransform += BoneMatrices[input.BoneIndices.w] * input.Weights.w;

            float4 position = mul(value, skinTransform);

            return position;
        }
        else
            return value;
    }

    {
        Pixel result = (Pixel) 0;

        float4 posWorld = mul(ApplyBoneTransform(input, float4(input.Position.xyz, 1.0f)), World);
        result.Position = mul(mul(posWorld, View), Projection);
        result.Normal = normalize(mul(ApplyBoneTransform(input, float4(input.Normal.xyz, 1.0f)), WorldIT));
        result.UV = input.UV;
        result.View = ViewInverse[3] - mul(float4(input.Position.xyz, 1.0f), World);
        result.Tangent = normalize(mul(ApplyBoneTransform(input, float4(input.Tangent.xyz, 1.0f)), WorldIT).xyz);
        result.Binormal = normalize(cross(input.Normal, input.Tangent));

        return result;
    }

If I set HasBones to true, my object no longer draws correctly; I only see two dark triangles.

I believe the cause is the bone matrices, which I load from the controller library (library_controllers) of the COLLADA file and send to BoneMatrices in the shader, with the identity matrix added at index 0.

Does anyone have an idea what I am doing wrong, and could you explain it to me?

I have attached the COLLADA file and images of the object drawn with HasBones = false and HasBones = true.

Greets

Benjamin

Model.dae

##### Share on other sites

You do linear interpolation of multiple matrices which makes no sense here.

You should transform the vertex by each matrix, and lerp the resulting vectors instead.

But to do so, you first need to transform the vertex into the local bone space instead, so you typically store 2 matrices for each bone. One (static) that transforms to bone space, and another (animated) that transforms the result back to world space.

Something like this:

EDIT: Hey - who upvoted this nonsense? Maybe crossing out helps

    vec os = currentVertexPosInObjectSpace;
    vec ws (0); // result
    for each affecting bone {
        int i = bone.matrixIndex;
        vec v = ObjectSpaceToBoneMatrices[i].Transform(os); // sadly this is necessary so the vertex knows its position relative to each bone
        v = AnimatedBoneInWorldSpaceMatrices[i].Transform(v); // but knowing that, we can now transform to world space for this bone
        ws += v * bone.weight; // sum up weighted results
    }

Edited by JoeJ

##### Share on other sites

Unfortunately this problem needs to be debugged. What helped me was loading a very simple model (I see you are already using a box, so that should be fine) with a very simple bone structure. Verify that your bone matrices are correct by drawing simple lines with each transform matrix. The lines should form a skeleton when the matrices are correct.

The most common mistakes are messing up the order of the matrix multiplications (Scale*Rotate*Translate) and the bone hierarchy order. You also need to multiply with a matrix relative to the T-pose (the inverse T-pose), and all of these need to be in the correct order. Also, HLSL expects matrices in column-major order by default, while your application-side math library might produce row-major matrices, so you may need to either pre-transpose them or reverse the multiplication order.
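The row-major versus column-major point can be checked with a few numbers. The following is a minimal Python sketch with plain lists. The helper names `vec_mul_mat`, `mat_mul_vec` and `transpose` are made up for this illustration, and the row-vector convention on the application side is an assumption. It shows that a row vector times a matrix gives the same numbers as the transposed matrix times a column vector, and what comes out when the transpose is forgotten.

```python
def vec_mul_mat(v, m):
    """Row-vector convention: v' = v * M."""
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(4)]

def mat_mul_vec(m, v):
    """Column-vector convention: v' = M * v."""
    return [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]

def transpose(m):
    return [[m[c][r] for c in range(4)] for r in range(4)]

# Translation by (5, 6, 7), written for row vectors (translation in the last row).
translate = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [5, 6, 7, 1],
]
point = [1, 2, 3, 1]

row_result = vec_mul_mat(point, translate)              # intended result
col_result = mat_mul_vec(transpose(translate), point)   # same numbers, other convention
wrong_result = mat_mul_vec(translate, point)            # transpose forgotten

print(row_result, col_result, wrong_result)
```

In the last case the translation ends up in the w component instead of moving the point, which after the perspective divide can look like a collapsed or exploded mesh.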

The shader you posted doesn't seem too bad at first look, though the normal and tangent vectors should probably have their w component set to 0 so that translations are not applied to them. I also noticed that you are adding your matrices together after multiplying them by the bone weights. That should produce correct results, but fewer operations are needed if you transform the vector by each matrix first, multiply each result by its weight, and finally add the vectors together. The result should be about the same (if we ignore floating point accuracy) but is computed faster. You also don't need a 4x4 matrix; 3x4 is perfectly enough, with the 4th row treated as (0,0,0,1).
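Both claims can be tried out on the CPU. This is a small Python sketch, not the shader itself: the two bone matrices, the weights and the helper names are invented for the illustration, and the row-vector convention of `mul(value, skinTransform)` is assumed. It compares blending the matrices first with blending the transformed vectors, and shows what w = 1 does to a normal.

```python
def vec_mul_mat(v, m):
    """Row-vector convention, like mul(value, skinTransform) in the posted shader."""
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(4)]

def translation(x, y, z):
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [x, y, z, 1.0]]

def blend_matrices(matrices, weights):
    return [[sum(w * m[r][c] for m, w in zip(matrices, weights)) for c in range(4)]
            for r in range(4)]

def blend_vectors(vectors, weights):
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(4)]

# Hypothetical data: one bone stays put, the other moves 4 units along X.
bones = [translation(0.0, 0.0, 0.0), translation(4.0, 0.0, 0.0)]
weights = [0.25, 0.75]
position = [1.0, 2.0, 3.0, 1.0]

# Variant 1: blend the matrices, then transform once.
pos_matrix_blend = vec_mul_mat(position, blend_matrices(bones, weights))
# Variant 2: transform by each bone, then blend the resulting vectors.
pos_vector_blend = blend_vectors([vec_mul_mat(position, m) for m in bones], weights)

# A direction with w = 0 ignores the translation, with w = 1 it gets shifted.
normal_w0 = vec_mul_mat([0.0, 1.0, 0.0, 0.0], blend_matrices(bones, weights))
normal_w1 = vec_mul_mat([0.0, 1.0, 0.0, 1.0], blend_matrices(bones, weights))

print(pos_matrix_blend, pos_vector_blend, normal_w0, normal_w1)
```

Both variants give the same position because the transform is linear in the matrix entries. The choice between them is about instruction count and register pressure, not about correctness.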

You can check out my skinning shader for reference: https://github.com/turanszkij/WickedEngine/blob/master/WickedEngine/skinningCS.hlsl

Good luck, it is a very rewarding experience once you manage to correct it!

17 minutes ago, JoeJ said:

You do linear interpolation of multiple matrices which makes no sense here.

You should transform the vertex by each matrix, and lerp the resulting vectors instead.

But to do so, you first need to transform the vertex into the local bone space instead, so you typically store 2 matrices for each bone. One (static) that transforms to bone space, and another (animated) that transforms the result back to world space.

Something like this:


    vec os = currentVertexPosInObjectSpace;
    vec ws (0); // result
    for each affecting bone {
        int i = bone.matrixIndex;
        vec v = ObjectSpaceToBoneMatrices[i].Transform(os); // sadly this is necessary so the vertex knows its position relative to each bone
        v = AnimatedBoneInWorldSpaceMatrices[i].Transform(v); // but knowing that, we can now transform to world space for this bone
        ws += v * bone.weight; // sum up weighted results
    }


You can do the linear interpolation of matrices fine, the result is the same, just results in more operations. Also, you can premultiply the bone matrices with the relative "objectspaceToBoneMatrix" matrix on the application side and only send one bone matrix to the shader.

Edited by turanszkij

##### Share on other sites
1 minute ago, turanszkij said:

there is also the need of multiplying with a t-pose relative matrix (inverse t-pose)

This is what i mean with ObjectSpaceToBoneMatrices (to avoid any confusion)

You calc that by the inverse transform of the bone in rest pose

4 minutes ago, turanszkij said:

You can do the linear interpolation of matrices fine, the result is the same, just results in more operations. Also, you can premultiply the bone matrices with the relative "objectspaceToBoneMatrix" matrix on the application side and only send one bone matrix to the shader.

Ooops - you're right. And I'd forgotten about the 'trick' to premultiply... embarrassing

##### Share on other sites

@turanszkij About drawing the bones to see the skeleton: I already did this to check that I load the bone matrices correctly. See the image.

I take the position of each bone and draw a line to the position of each child.

But I only get a correct skeleton if I invert every single bone matrix.

And by T-pose, do you mean the bind pose matrix?

Before I send the boneMatrices array to the shader, here is the code that collects the matrices:

    public Matrix CalculateMatrixFromParents(Joint joint, Matrix world)
    {
        if (joint.Parent != null)
        {
            world *= CalculateMatrixFromParents(joint.Parent, joint.Parent.Matrix);
            return world;
        }
        else
            return joint.Matrix;
    }

    public List<Matrix> GetBoneMatrices()
    {
        List<Matrix> boneMatrices = new List<Matrix>();

        foreach (Joint joint in this.bones)
        {
            Matrix m = this.CalculateMatrixFromParents(joint, Matrix.Identity);
            boneMatrices.Add(m);
        }

        return boneMatrices;
    }

I am not really a math genius, but I hope I will understand your explanation.

Greets

Benjamin

Edited by B. /

##### Share on other sites

You could try inspecting the shader bone data in a graphics debugger to check whether its contents match the bone data on the application side. Nvidia Nsight or the Visual Studio graphics debugger are good choices.

##### Share on other sites
11 hours ago, B. / said:

foreach (Joint joint in this.bones) { Matrix m = this.CalculateMatrixFromParents(joint, Matrix.Identity); boneMatrices.Add(m); }

To me it looks like you do not take the rest/t-pose/bind-pose (however we call it) into account here.

It might look somehow like this (assuming the matrices in your current code are the animated ones):

    foreach (Joint joint in this.bones) {
        Matrix m = this.CalculateMatrixFromParents(joint, Matrix.Identity) * joint.bindPoseWorldSpaceMatrix.Inversed();
        boneMatrices.Add(m);
    }

You should get this right after some trial and error usually.

Your shader however is potentially inefficient, because each thread may store 4 matrices in registers, that's 4*16 = 64 alone for that, which is a lot.

To get 100% occupancy on AMD you should only use 24 IIRC, NV varies but is similar.

To achieve this, you should transform the position by each matrix in order (as already said), so the compiler has the option to have just 1 or 2 matrices in registers at the same time. And you should transform position and normal in one go of course, otherwise the compiler will likely decide to store them all to have them available for the normal. (Also, using subfunctions almost always has a cost the last time i checked - it seems compilers are too stupid to inline the code, better do it yourself.)

This is said just to be nit-picking. The matrices likely end up in fast constant ram, so storing all 4 in registers might not be necessary and my proposed optimization would have no effect in practice. But you never really know how different GPUs / compilers handle this and thinking of it is often no extra work. (optimizing for low register usage is usually also better than optimizing for less instructions.)

Edited by JoeJ

##### Share on other sites

Hi Guys,

thank you for your answers. Today I also tested loading the per-vertex weights and the matrices with assimp, to check whether I make a mistake when loading the data. The matrices were right, but the per-vertex weights were different. The drawn result was still wrong, but much better than mine.

The strange thing is that according to vcount in the COLLADA file, the first vertex has 2 weights and the second has 3 weights ...

But assimp loads 3 weights for the first vertex and 2 weights for the second. So I thought, maybe I don't understand the COLLADA documentation correctly and assign the weights and bone indices to the wrong vertices?

        <source id="pCube1Controller-Weights">
<float_array id="pCube1Controller-Weights-array" count="33">

1.000000 0.989534 0.009756 0.708022 0.289010 0.002968 0.989518 0.009771 0.708398 0.288669 0.002933 0.989525 0.009771 0.708446 0.288689 0.002866
0.989540 0.009756 0.708070 0.289030 0.002900 0.004697 0.497651 0.497651 0.003326 0.498337 0.498337 0.004704 0.497648 0.497648 0.003331 0.498334
0.498334</float_array>
<technique_common>
<accessor source="#pCube1Controller-Weights-array" count="33">
<param type="float"/>
</accessor>
</technique_common>
</source>

<vertex_weights count="12">
<input semantic="JOINT" offset="0" source="#pCube1Controller-Joints"/>
<input semantic="WEIGHT" offset="1" source="#pCube1Controller-Weights"/>
<vcount>2 3 2 3 2 3 2 3 3 3 3 3</vcount>
<v>0 1 1 2 0 3 1 4 2 5 0 6 1 7 0 8 1 9 2 10 0 11 1 12 0 13 1 14 2 15 0 16 1 17 0 18 1 19 2 20 0 21 1 22 2 23 0 24 1 25 2 26 0 27 1 28 2 29 0 30 1 31 2 32</v>
</vertex_weights>

Here is the code that assigns the weights and bone indices to the vertices:

InfluencesWeightsPerVertex is the vcount list.

1 + wCounterIndex is used because the first weight in the weights array is an extra weight.

RemoveRange is needed because I create every vertex with the default weights 1 0 0 0 and bone indices 0 0 0 0 (index 0 = identity matrix).

InsertRange then inserts the loaded values into the list.

                        int vertexIndex = 0;
int wCounterIndex = 0;

// Set Vertex Weights and Bone Indices
foreach (int item in InfluencesWeightsPerVertex)
{
List<float> weights = skinClusterWeights.GetRange(1 + wCounterIndex, item);
List<int> boneIndices = new List<int>();

for (int i = 0; i < (item * 2); i += 2)

if (weights.Count > 4)
weights.RemoveRange(4, weights.Count - 4);

if (boneIndices.Count > 4)
boneIndices.RemoveRange(4, boneIndices.Count - 4);

// Normalize all weights to 1
float factor = 0;

foreach (float weight in weights)
factor += weight;

factor = 1.0f / factor;

for (int i = 0; i < weights.Count; i++ )
weights[i] = factor * weights[i];

Vertex vertex = geometry.Vertices[vertexIndex];
vertex.Weights.RemoveRange(0, weights.Count);
vertex.Weights.InsertRange(0, weights);
vertex.BoneIndices.RemoveRange(0, boneIndices.Count);
vertex.BoneIndices.InsertRange(0, boneIndices);

geometry.Vertices[vertexIndex] = vertex;

vertexIndex++;
wCounterIndex += item;
}

Can you check my code and tell me whether the way I load the skinning data is right, so that we can rule this out?

Greets

Benjamin

16 hours ago, turanszkij said:

You could try inspecting the shader bone data in a graphics debugger to check whether its contents match the bone data on the application side. Nvidia Nsight or the Visual Studio graphics debugger are good choices.

Hi,

if the bone data/matrices were wrong, then the skeleton lines would be drawn wrong too, right?

Edited by B. /

##### Share on other sites

Maybe assimp is set up to optimize for better vertex caching and so changes the order of vertices?

When I came across COLLADA, I noticed each app seems to have its own interpretation of the standard, and it was not really a good format to exchange things, especially when skinning is involved.

I would create a simple model like the box procedurally, purely from code, to be sure the data is right. You could also do CPU skinning for reference. It is some work, but such code may be reusable for the next issue, and in the long run I prefer this to frustrating GPU debugging.

##### Share on other sites
3 hours ago, JoeJ said:

Maybe assimp is set up to optimize for better vertex caching and so changes the order of vertices?

When i came across collada, i noticed each app seems to have its own interpretation of the standard, and it was not really a good format to exchange things, especially when skinning is involved.

I would create a simple model like the box just proceduraly only from code to be sure the data is right. You could also do CPU skinning for reference. Some work, but such code may be reusable for the next issue and on the long run i prefer this against frustrating GPU debugging.

Hi Joe,

which format would be best for GPU skinning, maybe FBX?

And I believe I know where my mistake is: in the COLLADA file the vertex weights count is 12, but my elbow cube has 60 vertices in total.

The other 48 vertices only have the weights 1 0 0 0 and the indices 0 0 0 0 (identity matrix).

But that's wrong, because I skinned every single vertex to the 3 bones, and assimp also imports weights for all vertices. How do I compute this from only 12 vcount entries, which just say how many weights/bones each vertex has?

And for CPU skinning, I would need to keep the original vertex list, transform it, and upload it to the vertex buffer every frame?

Greets

Benjamin

Edited by B. /

##### Share on other sites
1 hour ago, B. / said:

which Format would be the best for GPU Skinning, maybe FBX?

Seems better, yes. But I'm no expert at all with file formats. There are many shipped games using COLLADA, so I would not drop it if you have already invested some time. Maybe the missing data is stored elsewhere in the file? Maybe open a new COLLADA-related topic so people who know better can help. (Also say which modeling app you use.)

1 hour ago, B. / said:

And for CPU Skinning i would need a origin Vertex List and transform these and set it to the vertexbuffer for every fps?

Yes, but performance does not matter here. It is just to prove that your data is correct and your algorithm is right, if you can't get it to work otherwise. (I have a debug visualization class that buffers lines, points or polygons with a given color and draws them once per frame. That would do, and it is very useful for pretty much anything.)

##### Share on other sites

Hi Joe,

I found my mistake: the 12 vertex weights are linked to the 12 entries of the positions array in the file. Now I get the same result as assimp.
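The mapping described here can be sketched in a few lines of Python, using the `<vcount>`, `<v>` and weights data posted earlier in the thread. The function names and the `position_index_per_vertex` list are hypothetical and only serve the illustration. The sketch reads `<v>` as (joint index, weight index) pairs, following the two `<input>` offsets in the posted XML, so the weight index points directly into the weights array and no extra offset of 1 is applied.

```python
WEIGHTS = [float(t) for t in """
1.000000 0.989534 0.009756 0.708022 0.289010 0.002968 0.989518 0.009771
0.708398 0.288669 0.002933 0.989525 0.009771 0.708446 0.288689 0.002866
0.989540 0.009756 0.708070 0.289030 0.002900 0.004697 0.497651 0.497651
0.003326 0.498337 0.498337 0.004704 0.497648 0.497648 0.003331 0.498334
0.498334""".split()]
VCOUNT = "2 3 2 3 2 3 2 3 3 3 3 3"
V = ("0 1 1 2 0 3 1 4 2 5 0 6 1 7 0 8 1 9 2 10 0 11 1 12 0 13 1 14 2 15 "
     "0 16 1 17 0 18 1 19 2 20 0 21 1 22 2 23 0 24 1 25 2 26 0 27 1 28 2 29 "
     "0 30 1 31 2 32")

def parse_vertex_weights(vcount_text, v_text, weights):
    """Return one list of (joint_index, weight) pairs per <vcount> entry."""
    counts = [int(t) for t in vcount_text.split()]
    v = [int(t) for t in v_text.split()]
    influences, cursor = [], 0
    for count in counts:
        pairs = []
        for _ in range(count):
            joint_index, weight_index = v[cursor], v[cursor + 1]
            pairs.append((joint_index, weights[weight_index]))
            cursor += 2
        influences.append(pairs)
    return influences

def to_four_influences(pairs):
    """Keep the 4 largest weights, renormalize to sum 1, pad with (0, 0.0)."""
    top = sorted(pairs, key=lambda p: p[1], reverse=True)[:4]
    total = sum(w for _, w in top)
    top = [(j, w / total) for j, w in top]
    return top + [(0, 0.0)] * (4 - len(top))

per_position = parse_vertex_weights(VCOUNT, V, WEIGHTS)
four = [to_four_influences(p) for p in per_position]

# Hypothetical: which source position each render vertex was built from.
# Vertices duplicated for normals or UVs share the same position index.
position_index_per_vertex = [0, 1, 0, 3]
vertex_influences = [four[i] for i in position_index_per_vertex]

print(per_position[0], per_position[1])
```

If bone index 0 is reserved for an identity matrix, as in the original post, the joint indices would additionally have to be shifted by one before they go into the vertex data.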

So now I load the weights and bone indices correctly, and if I only send identity matrices to the shader, the model draws correctly.

If I send the original bone matrices that I load from the file, I get a science-fiction effect when drawing the model. See the image.

So the last thing to do is to send the correctly transformed matrices to the shader.

Here is the code that sends them to the shader (the bind pose matrix is inverted and the result is an identity matrix):

    List<Matrix> boneMatrices = new List<Matrix>();

    foreach (Joint joint in this.bones)
    {
        Matrix m = MatrixHelper.CalculateMatrixFromParents(joint, Matrix.Identity);
        boneMatrices.Add(m);
    }

    return boneMatrices;

Does anyone have an idea how to solve the last step?

Greets

Benjamin

##### Share on other sites

Try to transpose the matrix coming from file (row major vs. column major convention issue)

Try to reverse multiplication order of both matrices (unlikely)

Try combining both approaches.

Trial and error fun

Edited by JoeJ

##### Share on other sites

That only brings a new strange effect, see the image.

Here is my code again:

Engine Code

    public static Matrix CalculateMatrixFromParents(Joint joint, Matrix world)
    {
        if (joint.Parent != null)
        {
            world = CalculateMatrixFromParents(joint.Parent, joint.Parent.Transform.Matrix) * world;
            return world;
        }
        else
            return joint.Transform.Matrix;
    }

    List<Matrix> boneMatrices = new List<Matrix>();

    if (this.bones.Count != 0)
    {
        foreach (Joint joint in this.bones)
        {
            Matrix m = MatrixHelper.CalculateMatrixFromParents(joint, Matrix.Identity);
            m.Transpose();
            boneMatrices.Add(m);
        }
    }

    return boneMatrices;

float4 ApplyBoneTransform(Vertex input, float4 value)
{
if(HasBones)
{
float4x4 skinTransform = (float4x4)0;
skinTransform += BoneMatrices[input.BoneIndices.x] * input.Weights.x;
skinTransform += BoneMatrices[input.BoneIndices.y] * input.Weights.y;
skinTransform += BoneMatrices[input.BoneIndices.z] * input.Weights.z;
skinTransform += BoneMatrices[input.BoneIndices.w] * input.Weights.w;

float4 position = mul(value, skinTransform);

return position;
}
else
return value;
}

{
Pixel result = (Pixel) 0;

float4 posWorld = mul(ApplyBoneTransform(input, float4(input.Position.xyz, 1.0f)), World);
result.Position = mul(mul(posWorld, View), Projection);
result.Normal = normalize(mul(ApplyBoneTransform(input, float4(input.Normal.xyz, 1.0f)), WorldIT));
result.UV = input.UV;
result.View = ViewInverse[3] - mul(float4(input.Position.xyz, 1.0f), World);
result.Tangent = normalize(mul(ApplyBoneTransform(input, float4(input.Tangent.xyz, 1.0f)), WorldIT).xyz);
result.Binormal = normalize(cross(input.Normal, input.Tangent));

return result;
}

Greets

Benjamin

##### Share on other sites

Hmm... thinking of it, there might still be something wrong with the import or with wiring up the data properly.

What is really strange is that the mesh triangles seem to be torn apart. This indicates a logic error rather than wrong math.

If only the math were wrong, the mesh would bend strangely, but it would not disconnect.

So I assume you have duplicated vertices due to different normals or UVs. You could look for those vertices (log their numbers or use the debugger). If you find two vertices with the same position, they should have the same bone indices and weights. If not, the import logic must still be wrong.

(After that: I see you draw white lines and circles, so you already have debug visuals. You could use them to do the whole skinning on the CPU to find bugs more easily, as suggested earlier.)
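The duplicate-vertex check suggested above is easy to automate. Here is a minimal Python sketch. The function name `find_inconsistent_duplicates` and the tuple layout of a vertex are assumptions made for the example, and it relies on duplicated vertices having bit-identical positions, which is typical when they were split from one source position.

```python
def find_inconsistent_duplicates(vertices):
    """vertices: list of (position, bone_indices, weights) tuples.

    Returns (first_index, other_index) pairs of vertices that share a
    position but disagree on bone indices or weights.
    """
    first_seen = {}
    mismatches = []
    for index, (position, bones, weights) in enumerate(vertices):
        key = tuple(position)
        if key not in first_seen:
            first_seen[key] = index
            continue
        _, ref_bones, ref_weights = vertices[first_seen[key]]
        if tuple(bones) != tuple(ref_bones) or tuple(weights) != tuple(ref_weights):
            mismatches.append((first_seen[key], index))
    return mismatches

# Hypothetical data: vertices 0 and 2 share a position but not their skin data.
vertices = [
    ((0.0, 0.0, 0.0), (1, 2, 0, 0), (0.5, 0.5, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (1, 0, 0, 0), (1.0, 0.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0, 0, 0, 0), (1.0, 0.0, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (1, 0, 0, 0), (1.0, 0.0, 0.0, 0.0)),
]
bad_pairs = find_inconsistent_duplicates(vertices)
print(bad_pairs)
```

Any pair it reports will pull apart as soon as the bones move, which matches the torn triangles described in the post.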

##### Share on other sites

Hi Joe,

you were totally right. After hours of checking my code, I found two fatal bugs I had made.

The first was a logic mistake when loading the bone indices from the file, and the second was setting the wrong offsets for the weights and bone indices passed to the shader.

I fixed both, and now the geometry draws correctly and can be deformed by the bone matrices without the triangles getting disconnected.

But there is still one last little bug. If I send the bone matrices in the order of the code I posted, I get a slight scaling effect along the X axis, so the mesh gets a little larger.

Reversing the multiplication order of the bone matrices gives the same result. If I transpose the product of all matrices, the mesh gets squashed as if hit by a heavy hammer. If I invert the product, I also get a scaling effect along the X axis, but the mesh gets a little smaller.

Does anyone have an idea how to fix this last step?

Greets

Benjamin

##### Share on other sites

If I import the COLLADA file back into Maya, I get a warning that the transform of every single joint is not compatible with FBX, so it is baked into TRS?

Isn't the problem that GPU skinning should only be applied when bone animation matrices exist? My 3 bones have transform matrices that say where they are in the world, but that is not the position of the vertices. For example, a bone at x position 3 would translate its bound vertices 3 steps to the right along the X axis, and that is wrong, right? Without an animation, every single vertex should only be transformed by an identity matrix until it gets animated.

So the transform matrix should be the difference between the old and the new bone transform matrix, around the position of the bone?

Greets

Benjamin

##### Share on other sites
20 hours ago, JoeJ said:

(After that: I see you draw white lines and circles, so you already have debug visuals. You could use them to do the whole skinning on CPU to find bugs more easily as suggested earlier.)

Maybe it's time for this now. I'd start without any weights, linking each vertex to the closest bone. That's a bit easier but enough to understand and verify all the math involved.

EDIT:

The inverse bind matrix transforms the vertex from model space to local bone space

The animated bone matrix transforms from bone space to final world space

Edited by JoeJ

##### Share on other sites

As requested by PM, i'll try to give an example for better understanding...

I don't know how 'new' you are to this, but to me the key was to treat this all as a geometrical problem. Back then i was not aware this is an application of linear algebra, and i still think of this stuff in a pure geometrical way.

Let's start with an exercise of how to transform a vector from one space to another:

    // our source space: (referring to the bone transform in rest position)
    vec sx (1,0,0);
    vec sy (0,1,0);
    vec sz (0,0,1);
    vec sp (2,3,4);
    // and target space: (referring to the animated space)
    vec tx (0,0,1);
    vec ty (1,0,0);
    vec tz (0,1,0);
    vec tp (5,6,7);
    // this is a skin vertex in global space (or model space, but usually that's the same if model and skeleton are at the origin)
    vec skin (3,3,3);

    // first, we move the vertex from world space to the local source space
    vec localInS = skin - sp; // position now relative to source origin, now care for orientation:
    localInS = vec (
        sx.Dot(localInS), // how far from source origin along the direction of its X axis?
        sy.Dot(localInS), // how far from source origin along the direction of its Y axis?
        sz.Dot(localInS)); // and Z
    // vertex is now in source space. Key is here to understand how the dot product works.
    // so we transformed the vertex from its modeled position to the bone of the skeleton that should affect it during animation.

    // next, think of the target space as an animated variant of the skeleton bone.
    // we already know the position relative to that bone, now all we need to do is to transform it back to world space but using the animated transform
    vec animatedSkin = tp +
        tx * localInS.x +
        ty * localInS.y +
        tz * localInS.z;

    // that's it. We are done. If we have multiple bones affecting the vertex, we do the same calculation for each of them and add up a weighted final result, like:
    vec result =
        animatedSkin * 0.25 +
        animatedSkin_2 * 0.25 +
        animatedSkin_3 * 0.5;
    // that's obvious, but just notice we lerp only the resulting vectors instead of the spaces, which is faster
If you understand this (take some time, imagine it geometrically, visualize it to help your brain a lot...),

then you have understood all the math there is to know.

We could rewrite this code using matrices. The spaces already hint at how, as they have the same memory order as a 4x3 matrix. Just adding a (useless) final column to get 4x4, the source space would look like this:

    float sMatrix[16] = {
        1,0,0,0,
        0,1,0,0,
        0,0,1,0,
        2,3,4,1};

You can then look at your math library code to see that it performs operations identical to the dot products.

Thinking this further, e.g. transforming all 3 direction vectors of one matrix by another matrix, you have already understood how 3D rotations work using matrices. The only requirement is to understand the dot product (!)

(This layout keeps the translation in the last four floats, which is how both OpenGL's column-major arrays and DirectX's row-major matrices usually end up in memory. The conventions differ in how the multiplication is written, and HLSL reads constant matrices as column-major by default.)

With the transposed convention it would look like this:

    float sMatrix[16] = {
        1,0,0,2,
        0,1,0,3,
        0,0,1,4,
        0,0,0,1};

... which causes things like using simple multiplications instead of dot products and vice versa - trial and error fun starts here.

I'll rewrite above code using matrices:

    matrix mS = {...};
    matrix mT = {...};
    vec skin (3,3,3);

    vec localInS = mS.Unrotate(skin - sp);
    // or:
    vec localInS = mS.Inversed().Transform(skin);
    // or:
    vec localInS = mS.InverseTransform(skin);

    vec animatedSkin = mT.position + mT.Rotate(localInS);
    // or:
    vec animatedSkin = mT.Transform(localInS);

Try to follow this so that all variants of the same thing make sense.

Finally, let's combine both involved transforms into one (that's what I missed in my very first post).

    matrix combined = mS.Inversed() * mT; // or mT * mS.Inversed(), depending on the convention your math lib uses
    vec animatedSkin = combined.Transform(skin);

Usually you can write just 'animatedSkin = combined * skin', but 'Transform' makes the intent clearer.
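The numbers from the geometric example can be pushed through the combined matrix to confirm that both routes agree. This is a Python sketch under stated assumptions: row-vector convention (axes and position stored as the four rows), orthonormal axes so the inverse can be built from a transpose, and helper names (`space_matrix`, `rigid_inverse`, `transform_point`) that are invented for the illustration.

```python
def space_matrix(x_axis, y_axis, z_axis, position):
    """Rows are the three axes and the position (row-vector convention)."""
    return [list(x_axis) + [0.0], list(y_axis) + [0.0],
            list(z_axis) + [0.0], list(position) + [1.0]]

def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def transform_point(p, m):
    v = list(p) + [1.0]
    return [sum(v[k] * m[k][c] for k in range(4)) for c in range(3)]

def rigid_inverse(m):
    """Inverse of a rotation + translation matrix (axes must be orthonormal)."""
    rot_t = [[m[c][r] for c in range(3)] for r in range(3)]
    pos = m[3][:3]
    inv_pos = [-sum(pos[k] * rot_t[k][c] for k in range(3)) for c in range(3)]
    return [rot_t[0] + [0.0], rot_t[1] + [0.0], rot_t[2] + [0.0], inv_pos + [1.0]]

# Source (rest pose) and target (animated) spaces from the example above.
mS = space_matrix((1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 3, 4))
mT = space_matrix((0, 0, 1), (1, 0, 0), (0, 1, 0), (5, 6, 7))
skin = (3, 3, 3)

# Two steps: into the source space, then out through the target space.
local_in_s = transform_point(skin, rigid_inverse(mS))
animated_two_steps = transform_point(local_in_s, mT)

# One step: premultiplied matrix, the form that is usually sent to the shader.
combined = mat_mul(rigid_inverse(mS), mT)
animated_combined = transform_point(skin, combined)

print(local_in_s, animated_two_steps, animated_combined)
```

The vertex lands at (5, 5, 8) either way, so premultiplying on the application side changes nothing about the result and only saves work per vertex.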

So that's it. I hope this helps to understand the math, but I doubt it will help you much to fix your actual problem - this is expected.

There seems to be nothing wrong with how you do it; the bug can be anywhere. I can only repeat my suggestion to do it all on the CPU and visualize all steps... good luck!

Edit: To be sure, here is how a space is defined, in case you're a total noob:

    vec sx (1,0,0); // local orientation x axis
    vec sy (0,1,0); // local orientation y axis
    vec sz (0,0,1); // local orientation z axis
    vec sp (2,3,4); // position

... same for matrices of course

Edited by JoeJ

##### Share on other sites

Hi Joe,

thank you very much again for taking the time to explain it to me. In your example we assume that the model is animated. But what about the case where I only load the model without animation matrices, so there is no target space, just to see the skinned model? Do I need to transform the vertices then too, or only if I have animation matrices?

Greets

Benjamin

##### Share on other sites

You can set the target matrices to their corresponding rest pose bone matrices. Then the model should look the same as in the modeling app without animation, and the 'combined' matrices should all end up being identity.
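This identity check makes a handy unit test for the matrix setup. Below is a Python sketch with invented data: three bones whose rest pose world matrices are pure translations, stacked 30 units apart, and helper names that exist only in this example. It builds the skinning matrix as inverse bind times current world matrix (row-vector convention) and checks the rest pose case and one animated case.

```python
def translation(x, y, z):
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [x, y, z, 1.0]]

def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def inverse_translation(m):
    """Inverse of a pure translation matrix (enough for this example)."""
    x, y, z = m[3][:3]
    return translation(-x, -y, -z)

IDENTITY = translation(0.0, 0.0, 0.0)

# Hypothetical rest pose: bone world matrices stacked 30 units apart.
bind_world = [translation(0.0, 0.0, 0.0),
              translation(0.0, 30.0, 0.0),
              translation(0.0, 60.0, 0.0)]
inverse_bind = [inverse_translation(m) for m in bind_world]

def skinning_matrices(current_world):
    """Row-vector convention: skin = inverse_bind * current_world."""
    return [mat_mul(inv, world) for inv, world in zip(inverse_bind, current_world)]

# Rest pose in, identity out.
rest = skinning_matrices(bind_world)

# Move only the last bone down by 10 units.
animated_world = bind_world[:2] + [translation(0.0, 50.0, 0.0)]
moved = skinning_matrices(animated_world)

print(rest[2][3], moved[2][3])
```

Sending the raw bone world matrices without the inverse bind factor would instead shift every vertex by the bone's rest position, which is the kind of offset asked about earlier in the thread.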

To test just if the vertices load correctly, you can ignore all matrices and render them without skinning. Again the model should look like in modeling app.

But there will be exceptions:

Often the model is already cut into pieces, and those pieces are parented under various bones or other nodes in the transform hierarchy.

This adds some additional complexity, as you now need the skeleton matrices to render them correctly even without animation or skinning. (You need to transform each piece by its parent hierarchy node.) This makes sense at least for props like guns or swords, usually parented by the hand bone so they animate properly when the hand is animated.

So for your initial test models you have it easier when the whole mesh is parented by the root.

Thinking of it, if you accidentally parented your box model under one of the bones, this could explain the scaling you described.

##### Share on other sites

Hi Joe,

models always draw correctly in my engine and no model has a joint as its parent. I am just a noob at GPU skinning, not at transforming a mesh correctly through a complex parent node hierarchy.

So now I do CPU skinning, as you suggested, to test the code and find bugs faster.

I test with my elbow cube model and animate (translate) the last bone, on the right side, to move down and back to the start point.

I take only the first animation matrix, which starts from the start point, and apply it to the animated joint (the last joint) to see whether my model keeps its shape. Unfortunately it translates the right side (the 4 edges) a little to the left, so the mesh gets a little smaller.

I think there is still a bug in how I calculate the local space, which causes this translation effect!?

Here's what I have:

        public static Matrix CalculateLocalSpaceMatrix(Joint joint, Matrix result)
{
if (joint.Parent != null)
{
result *= CalculateLocalSpaceMatrix(joint.Parent, joint.Parent.Transform.ObjectSpaceMatrix);
return result;
}
else
return joint.Transform.ObjectSpaceMatrix;
}

private List<Vertex> originVertices = new List<Vertex>();

private void DoSomethingToolStripMenuItem_Click(object sender, EventArgs e)
{
if (this.geometry != null)
{
Geometry g = this.geometry;

if (this.originVertices.Count == 0)
foreach (Vertex vertex in g.Vertices)

List<Matrix> boneMatrices = new List<Matrix>();

if (g.Bones.Count != 0)
{

for (int iBoneIndex = 0; iBoneIndex < g.Bones.Count; iBoneIndex++)
{
Joint bone = g.Bones[iBoneIndex];

Matrix inverseBindMatrix = bone.InverseTransform.ObjectSpaceMatrix;
Matrix localSpaceMatrix = MatrixHelper.CalculateLocalSpaceMatrix(bone, Matrix.Identity);
Matrix animatedSpaceMatrix = this.AnimationList[0].TransformAnimations[0];

Matrix finalMatrix = localSpaceMatrix * inverseBindMatrix * animatedSpaceMatrix;

if (iBoneIndex == 2)
else
}
}

List<Vertex> skinVertices = new List<Vertex>();

for (int i = 0; i < this.originVertices.Count; i++ )
{
Vertex vertex = this.originVertices[i];

Matrix boneTransform = new Matrix(0);
boneTransform += boneMatrices[vertex.BoneIndices[0]] * vertex.Weights[0];
boneTransform += boneMatrices[vertex.BoneIndices[1]] * vertex.Weights[1];
boneTransform += boneMatrices[vertex.BoneIndices[2]] * vertex.Weights[2];
boneTransform += boneMatrices[vertex.BoneIndices[3]] * vertex.Weights[3];

vertex.Position = Vector4F.ToVector3(Vector3.Transform(vertex.Position, boneTransform));

}

List<float> result = new List<float>();

foreach (Vertex item in skinVertices)

this.scene.Graphics3D.VertexBuffer.Update(result.ToArray());
}
}

And the documentation says that the matrices in the bone matrix array are the inverse bone matrices:

Quote

This source defines the inverse bind matrix for each joint, these are used to bring
coordinates being skinned into the same space as each joint.  Note that in this case the
joints begin at 0,0,0 and move up 30 units for each joint, so the inverse bind matrices
are the opposite of that.

So I added a second transform object to my Joint class: the first stores the inverse matrix as loaded from the file, and the second stores the inverse of that matrix, i.e. the object space matrix of the bone (needed to draw the skeleton preview correctly).

Any Ideas?

Greets

Benjamin

Edited by B. /

##### Share on other sites

I'm a bit confused you have 3 matrices there:

Matrix inverseBindMatrix = bone.InverseTransform.ObjectSpaceMatrix;
Matrix localSpaceMatrix = MatrixHelper.CalculateLocalSpaceMatrix(bone, Matrix.Identity);
Matrix animatedSpaceMatrix = this.AnimationList[0].TransformAnimations[0];

Could you try to explain where each of those comes from? (Though I likely won't get it... the terminology around these things is so confusing to me.)

However, there must be the bone matrix in global space, once in rest pose and once animated, and not inverted.

To prove this is right, you should now visualize it with something like:

DrawLine (matrix[3], matrix[3] + matrix[0], red);
DrawLine (matrix[3], matrix[3] + matrix[1], green);
DrawLine (matrix[3], matrix[3] + matrix[2], blue);

If there is a bug in the matrices, you should see it.

Similarly, you can do this with vertices, just by rendering points: red for skinned, green for unmodified. (There is no need to upload buffers to the GPU, so that is one potential source of error fewer.)

I do this kind of visual debugging all the time, and in combination with some trial and error many things can be fixed without thinking about it. It also works for non-graphics problems. To me, eyes are the best debugger.
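As a rough illustration of the axis-drawing idea above, here is a minimal C++ sketch that turns a bone matrix into three line segments you could hand to a DrawLine call. `Mat4`, `Vec3`, `Line`, `column` and `axisLines` are made-up names for this example. It assumes a row-major 4x4 array with the basis vectors and the translation stored in columns (the layout of the COLLADA text dump); swap the indices if your math library stores them in rows.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Assumption: row-major storage, basis vectors and translation in columns.
using Mat4 = std::array<std::array<double, 4>, 4>;

struct Vec3 { double x, y, z; };
struct Line { Vec3 from, to; };

// Column c of the upper 3x4 part: c = 0..2 are the basis axes, c = 3 is the origin.
Vec3 column(const Mat4& m, int c) {
    return Vec3{m[0][c], m[1][c], m[2][c]};
}

// Builds the three debug segments (X, Y, Z) that start at the bone origin.
// Draw them red, green and blue; a skewed, flipped or displaced tripod
// points at a wrong matrix.
std::array<Line, 3> axisLines(const Mat4& m, double length = 1.0) {
    const Vec3 origin = column(m, 3);
    std::array<Line, 3> lines{};
    for (int axis = 0; axis < 3; ++axis) {
        const Vec3 dir = column(m, axis);
        lines[axis] = Line{origin,
                           Vec3{origin.x + dir.x * length,
                                origin.y + dir.y * length,
                                origin.z + dir.z * length}};
    }
    return lines;
}
```

Calling `axisLines` once with the rest-pose world matrix and once with the animated world matrix of the same bone should give two tripods that coincide whenever the animation frame equals the bind pose.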

What seems unusual to me is the way CalculateLocalSpaceMatrix() goes from the current bone to the root. I would expect it to go from the root to the bone, so you might need to invert the final result (or reverse the rotation order when using the result) to get the bone in world space. Visual debugging should clarify such things...

##### Share on other sites
Quote

Matrix inverseBindMatrix = bone.InverseTransform.ObjectSpaceMatrix;
Matrix localSpaceMatrix = MatrixHelper.CalculateLocalSpaceMatrix(bone, Matrix.Identity);
Matrix animatedSpaceMatrix = this.AnimationList[0].TransformAnimations[0];

The inverseBindMatrix is the world/object space matrix of the bone that I load from the file. Remember, I wrote that the documentation says each matrix in the bone matrices array in the file is inverted. That is why I have two transform classes in my bone class: one holds the original loaded from the file (the inverse version), and one holds the inverse of that, which I use to draw my skeleton lines.

Quote

This source defines the inverse bind matrix for each joint, these are used to bring
coordinates being skinned into the same space as each joint.  Note that in this case the
joints begin at 0,0,0 and move up 30 units for each joint, so the inverse bind matrices
are the opposite of that.

With the localSpaceMatrix, I thought I would end up with the final transform of the bone relative to the bone hierarchy. I use the same code when I group many meshes and transform only the group/parent node.

The animatedSpaceMatrix should be the target source. I animated the last bone over 24 frames, so I have 24 target sources/matrices.

So TransformAnimations[0] or TransformAnimations[23], after multiplying with the other matrices, should give an identity matrix.

Quote

However, there must be the bone matrix in global space, once in rest pose and once animated, and not inverted.

COLLADA exports only the inverse world/object space matrices for the bones. So I invert these and store them in the second transform class in my bone class.

The animated matrices are exported normally (not inverted). That is the TransformAnimations array; I store them with the mesh.

So in the end I tried it with what I wrote

Quote

Finally, let's combine both involved transforms to one (that's what i've missed in my very first post.)


matrix combined = mS.Inversed() * mT; // or mT * mS.Inversed(), depending on the convention your math lib uses
vec animatedSkin = combined.Transform(skin);


Usually you can write just 'animatedSkin = combined * skin', but 'Transform' makes more sense eventually

I tried inverseBindMatrix * animatedSpaceMatrix without success, so I thought I would try again, also multiplying by the local space matrix.
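For what it's worth, the combination described in the quote can be sanity-checked in isolation before any vertex buffer is involved. The C++ sketch below is only an illustration under stated assumptions: `Mat4`, `Vec4`, `skinningMatrix` and the other helpers are made-up names, matrices use the column-vector convention (translation in the last column, as in the COLLADA file), and the 30-unit offset along Y is borrowed loosely from the documentation quote. With a row-vector library the product order is reversed, as the quote notes.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Assumption: row-major storage, column-vector convention (v' = M * v).
using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;

Mat4 translation(double x, double y, double z) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// The inverse bind matrix is applied to the vertex first (model space to
// joint space), then the animated world matrix (joint space back to model space).
Mat4 skinningMatrix(const Mat4& animatedWorld, const Mat4& inverseBind) {
    return multiply(animatedWorld, inverseBind);
}

// True if m equals the identity within eps.
bool isIdentity(const Mat4& m, double eps = 1e-9) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            if (std::fabs(m[i][j] - (i == j ? 1.0 : 0.0)) > eps) return false;
    return true;
}
```

If `isIdentity(skinningMatrix(world, inverseBind))` is false for a frame that is supposed to match the bind pose, the world matrix fed in is probably not in the same space as the one the inverse bind matrix was derived from.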


##### Share on other sites

P.S. I checked the file to see whether the bone's world/object space matrix is the same as the first animation matrix, and it is a little bit different.

Bone:

1.000000 0.000000 0.000000 3.963553
0.000000 1.000000 0.000000 0.000000
0.000000 0.000000 1.000000 0.011606
0.000000 0.000000 0.000000 1.000000

First Animation Matrix:

1.000000 0.000000 0.000000 3.447073
0.000000 1.000000 0.000000 0.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 0.000000 1.000000

That could explain why I don't get the original model shape back after transforming with these.

In the file I read the joint3-Interpolations-array (animation matrices array).

Could it be that I first need to calculate the correct animation matrices relative to the bone's object space before I can transform my vertices?
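One possible reading, offered as a guess rather than a diagnosis: the animation channel in the file targets joint3/matrix, which in COLLADA addresses the node's own transform relative to its parent. If so, the animation matrices are local and would have to be multiplied down the joint hierarchy before they can be compared or combined with the object space bind matrices. A minimal C++ sketch of that accumulation follows; `Joint` and `computeWorldMatrices` are hypothetical names, the column-vector convention is assumed, and the parent offset in the usage note is invented purely so that the numbers add up.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: row-major storage, column-vector convention (world = parent * local).
using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 translation(double x, double y, double z) {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Hypothetical joint record: index of the parent (-1 for the root) and the
// joint's transform relative to that parent (bind or animated).
struct Joint {
    int parent;
    Mat4 local;
};

// Walks from the root towards the leaves. Joints must be ordered so that
// every parent appears before its children.
std::vector<Mat4> computeWorldMatrices(const std::vector<Joint>& joints) {
    std::vector<Mat4> world(joints.size());
    for (std::size_t i = 0; i < joints.size(); ++i) {
        const Joint& j = joints[i];
        world[i] = (j.parent < 0)
            ? j.local
            : multiply(world[static_cast<std::size_t>(j.parent)], j.local);
    }
    return world;
}
```

As an example with made-up parent data: a parent at `translation(0.516480, 0.0, 0.011606)` and a child with the first animation matrix `translation(3.447073, 0.0, 0.0)` give a child world translation of (3.963553, 0, 0.011606), which is the kind of difference seen between the two matrices above.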

Here is the library_animations section of the file:

<library_animations>
  <animation id="joint3-anim" name="joint3">
    <animation>
      <source id="joint3-Matrix-animation-input">
        <float_array id="joint3-Matrix-animation-input-array" count="24">
          0.041667 0.083333 0.125000 0.166667 0.208333 0.250000 0.291667 0.333333 0.375000 0.416667 0.458333 0.500000 0.541667 0.583333 0.625000 0.666667
          0.708333 0.750000 0.791667 0.833333 0.875000 0.916667 0.958333 1.000000
        </float_array>
        <technique_common>
          <accessor source="#joint3-Matrix-animation-input-array" count="24">
            <param name="TIME" type="float"/>
          </accessor>
        </technique_common>
      </source>
      <source id="joint3-Matrix-animation-output-transform">
        <float_array id="joint3-Matrix-animation-output-transform-array" count="384">
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -0.142293 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -0.532450 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -1.115392 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -1.836036 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -2.639301 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -3.470107 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -4.273373 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -4.994017 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -5.576958 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -5.967116 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -6.109408 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -5.989200 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -5.656859 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -5.154813 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -4.525488 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -3.811309 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -3.054704 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -2.298099 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -1.583921 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -0.954595 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -0.452549 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 -0.120208 0.000000 0.000000 0.000000 1.000000
          1.000000 0.000000 0.000000 3.447073 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000
        </float_array>
        <technique_common>
          <accessor source="#joint3-Matrix-animation-output-transform-array" count="24" stride="16">
            <param type="float4x4"/>
          </accessor>
        </technique_common>
      </source>
      <source id="joint3-Interpolations">
        <Name_array id="joint3-Interpolations-array" count="24">
          LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR
          LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR LINEAR
          LINEAR
        </Name_array>
        <technique_common>
          <accessor source="#joint3-Interpolations-array" count="24">
            <param type="name"/>
          </accessor>
        </technique_common>
      </source>
      <sampler id="joint3-Matrix-animation-transform">
        <input semantic="INPUT" source="#joint3-Matrix-animation-input"/>
        <input semantic="OUTPUT" source="#joint3-Matrix-animation-output-transform"/>
        <input semantic="INTERPOLATION" source="#joint3-Interpolations"/>
      </sampler>
      <channel source="#joint3-Matrix-animation-transform" target="joint3/matrix"/>
    </animation>
    <animation>
      <source id="joint3-visibility-animation-input">
        <float_array id="joint3-visibility-animation-input-array" count="3">0.041667 0.500000 1.000000</float_array>
        <technique_common>
          <accessor source="#joint3-visibility-animation-input-array" count="3">
            <param name="TIME" type="float"/>
          </accessor>
        </technique_common>
      </source>
      <source id="joint3-visibility-animation-output">
        <float_array id="joint3-visibility-animation-output-array" count="3">1.000000 1.000000 1.000000</float_array>
        <technique_common>
          <accessor source="#joint3-visibility-animation-output-array" count="3">
            <param type="float"/>
          </accessor>
        </technique_common>
      </source>
      <source id="joint3-visibility-animation-intan">
        <float_array id="joint3-visibility-animation-intan-array" count="3">0.000000 0.000000 0.000000</float_array>
        <technique_common>
          <accessor source="#joint3-visibility-animation-intan-array" count="3">
            <param type="float"/>
          </accessor>
        </technique_common>
      </source>
      <source id="joint3-visibility-animation-outtan">
        <float_array id="joint3-visibility-animation-outtan-array" count="3">0.000000 0.000000 0.000000</float_array>
        <technique_common>
          <accessor source="#joint3-visibility-animation-outtan-array" count="3">
            <param type="float"/>
          </accessor>
        </technique_common>
      </source>
      <source id="joint3-visibility-animation-interpolation">
        <Name_array id="joint3-visibility-animation-interpolation-array" count="3">STEP STEP STEP</Name_array>
        <technique_common>
          <accessor source="#joint3-visibility-animation-interpolation-array" count="3">
            <param type="name"/>
          </accessor>
        </technique_common>
      </source>
      <sampler id="joint3-visibility-animation">
        <input semantic="INPUT" source="#joint3-visibility-animation-input"/>
        <input semantic="OUTPUT" source="#joint3-visibility-animation-output"/>
        <input semantic="IN_TANGENT" source="#joint3-visibility-animation-intan"/>
        <input semantic="OUT_TANGENT" source="#joint3-visibility-animation-outtan"/>
        <input semantic="INTERPOLATION" source="#joint3-visibility-animation-interpolation"/>
      </sampler>
      <channel source="#joint3-visibility-animation" target="joint3/visibility"/>
    </animation>
  </animation>
</library_animations>
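To evaluate keys like these at an arbitrary time, the sampler's INPUT times together with the LINEAR interpolation entries suggest a plain linear blend between the two neighbouring keys. Below is a minimal C++ sketch for a single animated component (here the Z translation, the only value that changes in the matrices above). `sampleLinear` is a hypothetical helper, it assumes strictly increasing key times and clamps outside the key range, and blending whole matrices element by element like this is only safe when nothing but the translation changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolates one animated value at time t.
// times:  key times in seconds, strictly increasing (the INPUT source)
// values: one value per key (here: the Z translation of each OUTPUT matrix)
// Returns 0.0 for empty or mismatched input; clamps to the first/last key
// when t lies outside the key range.
double sampleLinear(const std::vector<double>& times,
                    const std::vector<double>& values,
                    double t) {
    if (times.empty() || times.size() != values.size()) return 0.0;
    if (t <= times.front()) return values.front();
    if (t >= times.back()) return values.back();

    // First key whose time is strictly greater than t.
    const auto upper = std::upper_bound(times.begin(), times.end(), t);
    const std::size_t hi = static_cast<std::size_t>(upper - times.begin());
    const std::size_t lo = hi - 1;

    const double alpha = (t - times[lo]) / (times[hi] - times[lo]);
    return values[lo] + alpha * (values[hi] - values[lo]);
}
```

Halfway between the first two keys (t = 0.0625) this yields a Z translation halfway between 0.000000 and -0.142293. The result is still a joint-local value, so it would go into the joint's local matrix before any hierarchy or inverse bind multiplication.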


• ### Similar Content

• This article uses material originally posted on Diligent Graphics web site.
Introduction
Graphics APIs have come a long way from small set of basic commands allowing limited control of configurable stages of early 3D accelerators to very low-level programming interfaces exposing almost every aspect of the underlying graphics hardware. Next-generation APIs, Direct3D12 by Microsoft and Vulkan by Khronos are relatively new and have only started getting widespread adoption and support from hardware vendors, while Direct3D11 and OpenGL are still considered industry standard. New APIs can provide substantial performance and functional improvements, but may not be supported by older hardware. An application targeting wide range of platforms needs to support Direct3D11 and OpenGL. New APIs will not give any advantage when used with old paradigms. It is totally possible to add Direct3D12 support to an existing renderer by implementing Direct3D11 interface through Direct3D12, but this will give zero benefits. Instead, new approaches and rendering architectures that leverage flexibility provided by the next-generation APIs are expected to be developed.
There are at least four APIs (Direct3D11, Direct3D12, OpenGL/GLES, Vulkan, plus Apple's Metal for iOS and osX platforms) that a cross-platform 3D application may need to support. Writing separate code paths for all APIs is clearly not an option for any real-world application and the need for a cross-platform graphics abstraction layer is evident. The following is the list of requirements that I believe such layer needs to satisfy:
Lightweight abstractions: the API should be as close to the underlying native APIs as possible to allow an application leverage all available low-level functionality. In many cases this requirement is difficult to achieve because specific features exposed by different APIs may vary considerably. Low performance overhead: the abstraction layer needs to be efficient from performance point of view. If it introduces considerable amount of overhead, there is no point in using it. Convenience: the API needs to be convenient to use. It needs to assist developers in achieving their goals not limiting their control of the graphics hardware. Multithreading: ability to efficiently parallelize work is in the core of Direct3D12 and Vulkan and one of the main selling points of the new APIs. Support for multithreading in a cross-platform layer is a must. Extensibility: no matter how well the API is designed, it still introduces some level of abstraction. In some cases the most efficient way to implement certain functionality is to directly use native API. The abstraction layer needs to provide seamless interoperability with the underlying native APIs to provide a way for the app to add features that may be missing. Diligent Engine is designed to solve these problems. Its main goal is to take advantages of the next-generation APIs such as Direct3D12 and Vulkan, but at the same time provide support for older platforms via Direct3D11, OpenGL and OpenGLES. Diligent Engine exposes common C++ front-end for all supported platforms and provides interoperability with underlying native APIs. It also supports integration with Unity and is designed to be used as graphics subsystem in a standalone game engine, Unity native plugin or any other 3D application. Full source code is available for download at GitHub and is free to use.
Overview
Diligent Engine API takes some features from Direct3D11 and Direct3D12 as well as introduces new concepts to hide certain platform-specific details and make the system easy to use. It contains the following main components:
Render device (IRenderDevice  interface) is responsible for creating all other objects (textures, buffers, shaders, pipeline states, etc.).
Device context (IDeviceContext interface) is the main interface for recording rendering commands. Similar to Direct3D11, there are immediate context and deferred contexts (which in Direct3D11 implementation map directly to the corresponding context types). Immediate context combines command queue and command list recording functionality. It records commands and submits the command list for execution when it contains sufficient number of commands. Deferred contexts are designed to only record command lists that can be submitted for execution through the immediate context.
An alternative way to design the API would be to expose command queue and command lists directly. This approach however does not map well to Direct3D11 and OpenGL. Besides, some functionality (such as dynamic descriptor allocation) can be much more efficiently implemented when it is known that a command list is recorded by a certain deferred context from some thread.
The approach taken in the engine does not limit scalability as the application is expected to create one deferred context per thread, and internally every deferred context records a command list in lock-free fashion. At the same time this approach maps well to older APIs.
In current implementation, only one immediate context that uses default graphics command queue is created. To support multiple GPUs or multiple command queue types (compute, copy, etc.), it is natural to have one immediate contexts per queue. Cross-context synchronization utilities will be necessary.
Swap Chain (ISwapChain interface). Swap chain interface represents a chain of back buffers and is responsible for showing the final rendered image on the screen.
Render device, device contexts and swap chain are created during the engine initialization.
Resources (ITexture and IBuffer interfaces). There are two types of resources - textures and buffers. There are many different texture types (2D textures, 3D textures, texture array, cubmepas, etc.) that can all be represented by ITexture interface.
Resources Views (ITextureView and IBufferView interfaces). While textures and buffers are mere data containers, texture views and buffer views describe how the data should be interpreted. For instance, a 2D texture can be used as a render target for rendering commands or as a shader resource.
Pipeline State (IPipelineState interface). GPU pipeline contains many configurable stages (depth-stencil, rasterizer and blend states, different shader stage, etc.). Direct3D11 uses coarse-grain objects to set all stage parameters at once (for instance, a rasterizer object encompasses all rasterizer attributes), while OpenGL contains myriad functions to fine-grain control every individual attribute of every stage. Both methods do not map very well to modern graphics hardware that combines all states into one monolithic state under the hood. Direct3D12 directly exposes pipeline state object in the API, and Diligent Engine uses the same approach.
Shader Resource Binding (IShaderResourceBinding interface). Shaders are programs that run on the GPU. Shaders may access various resources (textures and buffers), and setting correspondence between shader variables and actual resources is called resource binding. Resource binding implementation varies considerably between different API. Diligent Engine introduces a new object called shader resource binding that encompasses all resources needed by all shaders in a certain pipeline state.
API Basics
Creating Resources
Device resources are created by the render device. The two main resource types are buffers, which represent linear memory, and textures, which use memory layouts optimized for fast filtering. Graphics APIs usually have a native object that represents linear buffer. Diligent Engine uses IBuffer interface as an abstraction for a native buffer. To create a buffer, one needs to populate BufferDesc structure and call IRenderDevice::CreateBuffer() method as in the following example:
BufferDesc BuffDesc; BufferDesc.Name = "Uniform buffer"; BuffDesc.BindFlags = BIND_UNIFORM_BUFFER; BuffDesc.Usage = USAGE_DYNAMIC; BuffDesc.uiSizeInBytes = sizeof(ShaderConstants); BuffDesc.CPUAccessFlags = CPU_ACCESS_WRITE; m_pDevice->CreateBuffer( BuffDesc, BufferData(), &m_pConstantBuffer ); While there is usually just one buffer object, different APIs use very different approaches to represent textures. For instance, in Direct3D11, there are ID3D11Texture1D, ID3D11Texture2D, and ID3D11Texture3D objects. In OpenGL, there is individual object for every texture dimension (1D, 2D, 3D, Cube), which may be a texture array, which may also be multisampled (i.e. GL_TEXTURE_2D_MULTISAMPLE_ARRAY). As a result there are nine different GL texture types that Diligent Engine may create under the hood. In Direct3D12, there is only one resource interface. Diligent Engine hides all these details in ITexture interface. There is only one  IRenderDevice::CreateTexture() method that is capable of creating all texture types. Dimension, format, array size and all other parameters are specified by the members of the TextureDesc structure:
TextureDesc TexDesc; TexDesc.Name = "My texture 2D"; TexDesc.Type = TEXTURE_TYPE_2D; TexDesc.Width = 1024; TexDesc.Height = 1024; TexDesc.Format = TEX_FORMAT_RGBA8_UNORM; TexDesc.Usage = USAGE_DEFAULT; TexDesc.BindFlags = BIND_SHADER_RESOURCE | BIND_RENDER_TARGET | BIND_UNORDERED_ACCESS; TexDesc.Name = "Sample 2D Texture"; m_pRenderDevice->CreateTexture( TexDesc, TextureData(), &m_pTestTex ); If native API supports multithreaded resource creation, textures and buffers can be created by multiple threads simultaneously.
Interoperability with native API provides access to the native buffer/texture objects and also allows creating Diligent Engine objects from native handles. It allows applications seamlessly integrate native API-specific code with Diligent Engine.
Next-generation APIs allow fine level-control over how resources are allocated. Diligent Engine does not currently expose this functionality, but it can be added by implementing IResourceAllocator interface that encapsulates specifics of resource allocation and providing this interface to CreateBuffer() or CreateTexture() methods. If null is provided, default allocator should be used.
Initializing the Pipeline State
As it was mentioned earlier, Diligent Engine follows next-gen APIs to configure the graphics/compute pipeline. One big Pipelines State Object (PSO) encompasses all required states (all shader stages, input layout description, depth stencil, rasterizer and blend state descriptions etc.). This approach maps directly to Direct3D12/Vulkan, but is also beneficial for older APIs as it eliminates pipeline misconfiguration errors. With many individual calls tweaking various GPU pipeline settings it is very easy to forget to set one of the states or assume the stage is already properly configured when in fact it is not. Using pipeline state object helps avoid these problems as all stages are configured at once.
While in earlier APIs shaders were bound separately, in the next-generation APIs as well as in Diligent Engine shaders are part of the pipeline state object. The biggest challenge when authoring shaders is that Direct3D and OpenGL/Vulkan use different shader languages (while Apple uses yet another language in their Metal API). Maintaining two versions of every shader is not an option for real applications and Diligent Engine implements shader source code converter that allows shaders authored in HLSL to be translated to GLSL. To create a shader, one needs to populate ShaderCreationAttribs structure. SourceLanguage member of this structure tells the system which language the shader is authored in:
When sampling a texture in a shader, the texture sampler was traditionally specified as separate object that was bound to the pipeline at run time or set as part of the texture object itself. However, in most cases it is known beforehand what kind of sampler will be used in the shader. Next-generation APIs expose new type of sampler called static sampler that can be initialized directly in the pipeline state. Diligent Engine exposes this functionality: when creating a shader, textures can be assigned static samplers. If static sampler is assigned, it will always be used instead of the one initialized in the texture shader resource view. To initialize static samplers, prepare an array of StaticSamplerDesc structures and initialize StaticSamplers and NumStaticSamplers members. Static samplers are more efficient and it is highly recommended to use them whenever possible. On older APIs, static samplers are emulated via generic sampler objects.
The following is an example of shader initialization:
Creating the Pipeline State Object
After all required shaders are created, the rest of the fields of the PipelineStateDesc structure provide depth-stencil, rasterizer, and blend state descriptions, the number and format of render targets, input layout format, etc. For instance, rasterizer state can be described as follows:
PipelineStateDesc PSODesc; RasterizerStateDesc &RasterizerDesc = PSODesc.GraphicsPipeline.RasterizerDesc; RasterizerDesc.FillMode = FILL_MODE_SOLID; RasterizerDesc.CullMode = CULL_MODE_NONE; RasterizerDesc.FrontCounterClockwise = True; RasterizerDesc.ScissorEnable = True; RasterizerDesc.AntialiasedLineEnable = False; Depth-stencil and blend states are defined in a similar fashion.
Another important thing that pipeline state object encompasses is the input layout description that defines how inputs to the vertex shader, which is the very first shader stage, should be read from the memory. Input layout may define several vertex streams that contain values of different formats and sizes:
// Define input layout InputLayoutDesc &Layout = PSODesc.GraphicsPipeline.InputLayout; LayoutElement TextLayoutElems[] = {     LayoutElement( 0, 0, 3, VT_FLOAT32, False ),     LayoutElement( 1, 0, 4, VT_UINT8, True ),     LayoutElement( 2, 0, 2, VT_FLOAT32, False ), }; Layout.LayoutElements = TextLayoutElems; Layout.NumElements = _countof( TextLayoutElems ); Finally, pipeline state defines primitive topology type. When all required members are initialized, a pipeline state object can be created by IRenderDevice::CreatePipelineState() method:
// Define shader and primitive topology PSODesc.GraphicsPipeline.PrimitiveTopologyType = PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE; PSODesc.GraphicsPipeline.pVS = pVertexShader; PSODesc.GraphicsPipeline.pPS = pPixelShader; PSODesc.Name = "My pipeline state"; m_pDev->CreatePipelineState(PSODesc, &m_pPSO); When PSO object is bound to the pipeline, the engine invokes all API-specific commands to set all states specified by the object. In case of Direct3D12 this maps directly to setting the D3D12 PSO object. In case of Direct3D11, this involves setting individual state objects (such as rasterizer and blend states), shaders, input layout etc. In case of OpenGL, this requires a number of fine-grain state tweaking calls. Diligent Engine keeps track of currently bound states and only calls functions to update these states that have actually changed.
Direct3D11 and OpenGL utilize fine-grain resource binding models, where an application binds individual buffers and textures to certain shader or program resource binding slots. Direct3D12 uses a very different approach, where resource descriptors are grouped into tables, and an application can bind all resources in the table at once by setting the table in the command list. Resource binding model in Diligent Engine is designed to leverage this new method. It introduces a new object called shader resource binding that encapsulates all resource bindings required for all shaders in a certain pipeline state. It also introduces the classification of shader variables based on the frequency of expected change that helps the engine group them into tables under the hood:
Static variables (SHADER_VARIABLE_TYPE_STATIC) are variables that are expected to be set only once. They may not be changed once a resource is bound to the variable. Such variables are intended to hold global constants such as camera attributes or global light attributes constant buffers. Mutable variables (SHADER_VARIABLE_TYPE_MUTABLE) define resources that are expected to change on a per-material frequency. Examples may include diffuse textures, normal maps etc. Dynamic variables (SHADER_VARIABLE_TYPE_DYNAMIC) are expected to change frequently and randomly. Shader variable type must be specified during shader creation by populating an array of ShaderVariableDesc structures and initializing ShaderCreationAttribs::Desc::VariableDesc and ShaderCreationAttribs::Desc::NumVariables members (see example of shader creation above).
Static variables cannot be changed once a resource is bound to the variable. They are bound directly to the shader object. For instance, a shadow map texture is not expected to change after it is created, so it can be bound directly to the shader:
m_pPSO->CreateShaderResourceBinding(&m_pSRB); Note that an SRB is only compatible with the pipeline state it was created from. SRB object inherits all static bindings from shaders in the pipeline, but is not allowed to change them.
Mutable resources can only be set once for every instance of a shader resource binding. Such resources are intended to define specific material properties. For instance, a diffuse texture for a specific material is not expected to change once the material is defined and can be set right after the SRB object has been created:
m_pSRB->GetVariable(SHADER_TYPE_PIXEL, "tex2DDiffuse")->Set(pDiffuseTexSRV); In some cases it is necessary to bind a new resource to a variable every time a draw command is invoked. Such variables should be labeled as dynamic, which will allow setting them multiple times through the same SRB object:
m_pSRB->GetVariable(SHADER_TYPE_VERTEX, "cbRandomAttribs")->Set(pRandomAttrsCB); Under the hood, the engine pre-allocates descriptor tables for static and mutable resources when an SRB objcet is created. Space for dynamic resources is dynamically allocated at run time. Static and mutable resources are thus more efficient and should be used whenever possible.
As you can see, Diligent Engine does not expose low-level details of how resources are bound to shader variables. One reason for this is that these details are very different for various APIs. The other reason is that using low-level binding methods is extremely error-prone: it is very easy to forget to bind some resource, or bind incorrect resource such as bind a buffer to the variable that is in fact a texture, especially during shader development when everything changes fast. Diligent Engine instead relies on shader reflection system to automatically query the list of all shader variables. Grouping variables based on three types mentioned above allows the engine to create optimized layout and take heavy lifting of matching resources to API-specific resource location, register or descriptor in the table.
This post gives more details about the resource binding model in Diligent Engine.
Setting the Pipeline State and Committing Shader Resources
Before any draw or compute command can be invoked, the pipeline state needs to be bound to the context:
m_pContext->SetPipelineState(m_pPSO); Under the hood, the engine sets the internal PSO object in the command list or calls all the required native API functions to properly configure all pipeline stages.
The next step is to bind all required shader resources to the GPU pipeline, which is accomplished by IDeviceContext::CommitShaderResources() method:
m_pContext->CommitShaderResources(m_pSRB, COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES); The method takes a pointer to the shader resource binding object and makes all resources the object holds available for the shaders. In the case of D3D12, this only requires setting appropriate descriptor tables in the command list. For older APIs, this typically requires setting all resources individually.
Next-generation APIs require the application to track the state of every resource and explicitly inform the system about all state transitions. For instance, if a texture was used as render target before, while the next draw command is going to use it as shader resource, a transition barrier needs to be executed. Diligent Engine does the heavy lifting of state tracking.  When CommitShaderResources() method is called with COMMIT_SHADER_RESOURCES_FLAG_TRANSITION_RESOURCES flag, the engine commits and transitions resources to correct states at the same time. Note that transitioning resources does introduce some overhead. The engine tracks state of every resource and it will not issue the barrier if the state is already correct. But checking resource state is an overhead that can sometimes be avoided. The engine provides IDeviceContext::TransitionShaderResources() method that only transitions resources:
m_pContext->TransitionShaderResources(m_pPSO, m_pSRB);
In some scenarios it is more efficient to transition resources once and then only commit them.
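The state-tracking idea described above can be illustrated with a small, self-contained sketch. This is not Diligent Engine code: `ResourceState`, `TrackedResource`, `MockCommandList`, `TransitionIfNeeded` and `TransitionAll` are hypothetical names invented for illustration, and a real engine tracks many more states. It only shows why a barrier is skipped when the state is already correct, and why the check itself still costs something.

```cpp
#include <cassert>
#include <vector>

// Hypothetical resource states; real APIs define many more.
enum class ResourceState { Undefined, RenderTarget, ShaderResource };

// Hypothetical stand-in for a texture whose current state is tracked.
struct TrackedResource {
    ResourceState state = ResourceState::Undefined;
};

// Hypothetical stand-in for a command list that counts what was recorded.
struct MockCommandList {
    int barriersIssued = 0;  // barriers actually recorded
    int stateChecks    = 0;  // state comparisons performed
};

// Records a barrier only if the tracked state differs from the required one.
// Returns true when a barrier was recorded.
bool TransitionIfNeeded(MockCommandList& cmd, TrackedResource& res,
                        ResourceState required) {
    ++cmd.stateChecks;  // the check is paid even when no barrier follows
    if (res.state == required) return false;
    res.state = required;
    ++cmd.barriersIssued;
    return true;
}

// Transitions every resource in a binding set to the shader-resource state
// and returns the number of barriers recorded.
int TransitionAll(MockCommandList& cmd, std::vector<TrackedResource*>& resources) {
    int issued = 0;
    for (TrackedResource* r : resources) {
        assert(r != nullptr);
        if (TransitionIfNeeded(cmd, *r, ResourceState::ShaderResource)) ++issued;
    }
    return issued;
}
```

Calling `TransitionAll` a second time on the same set records no barriers but still performs one check per resource, which is the overhead that transitioning once and then only committing is meant to avoid.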
Invoking Draw Command
The final step is to set states that are not part of the PSO, such as render targets and vertex and index buffers. Diligent Engine uses a Direct3D11-style API that is translated to other native API calls under the hood:
ITextureView *pRTVs[] = {m_pRTV};
m_pContext->SetRenderTargets(_countof( pRTVs ), pRTVs, m_pDSV);
// Clear render target and depth buffer
const float zero[4] = {0, 0, 0, 0};
m_pContext->ClearRenderTarget(nullptr, zero);
m_pContext->ClearDepthStencil(nullptr, CLEAR_DEPTH_FLAG, 1.f);
// Set vertex and index buffers
IBuffer *buffer[] = {m_pVertexBuffer};
Uint32 offsets[] = {0};
Uint32 strides[] = {sizeof(MyVertex)};
m_pContext->SetVertexBuffers(0, 1, buffer, strides, offsets, SET_VERTEX_BUFFERS_FLAG_RESET);
m_pContext->SetIndexBuffer(m_pIndexBuffer, 0);
Different native APIs use different sets of functions to execute draw commands depending on command details (whether the command is indexed, instanced or both, what offsets in the source buffers are used, etc.). For instance, there are 5 draw commands in Direct3D11 and more than 9 commands in OpenGL, with something like glDrawElementsInstancedBaseVertexBaseInstance not uncommon. Diligent Engine hides all these details behind a single IDeviceContext::Draw() method that takes a DrawAttribs structure as an argument. The structure members define all attributes required to perform the command (primitive topology, number of vertices or indices, whether the draw call is indexed, instanced, or indirect, etc.). For example:
DrawAttribs attrs;
attrs.IsIndexed = true;
attrs.IndexType = VT_UINT16;
attrs.NumIndices = 36;
attrs.Topology = PRIMITIVE_TOPOLOGY_TRIANGLE_LIST;
pContext->Draw(attrs);
For compute commands, there is the IDeviceContext::DispatchCompute() method, which takes a DispatchComputeAttribs structure that defines the compute grid dimensions.
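To make the "single Draw() method" idea more concrete, here is a rough sketch of how one attribute structure could be mapped onto several native draw calls. It is an illustration under stated assumptions, not the engine's implementation: `SketchDrawAttribs` and `SelectD3D11DrawCall` are hypothetical names, the structure is far simpler than the real DrawAttribs, and the function only returns the name of the matching `ID3D11DeviceContext` method instead of calling it.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified attribute structure; NOT the engine's DrawAttribs.
struct SketchDrawAttribs {
    bool     isIndexed    = false;
    bool     isIndirect   = false;
    unsigned numInstances = 1;  // values above 1 mean an instanced draw
};

// Returns the name of the ID3D11DeviceContext draw method that matches the
// attribute combination. A real backend would invoke the method instead.
std::string SelectD3D11DrawCall(const SketchDrawAttribs& a) {
    assert(a.numInstances >= 1);
    const bool instanced = a.numInstances > 1;
    if (a.isIndirect)
        return a.isIndexed ? "DrawIndexedInstancedIndirect" : "DrawInstancedIndirect";
    if (a.isIndexed)
        return instanced ? "DrawIndexedInstanced" : "DrawIndexed";
    return instanced ? "DrawInstanced" : "Draw";
}
```

The caller fills in one structure and the backend picks the native entry point, so application code does not change when the same draw is routed to a different API.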
Source Code
Full engine source code is available on GitHub and is free to use. The repository contains tutorials, sample applications, asteroids performance benchmark and an example Unity project that uses Diligent Engine in native plugin.
The atmospheric scattering sample demonstrates how Diligent Engine can be used to implement various rendering tasks: loading textures from files, using complex shaders, rendering to multiple render targets, using compute shaders and unordered access views, etc.

Asteroids performance benchmark is based on this demo developed by Intel. It renders 50,000 unique textured asteroids and allows comparing performance of Direct3D11 and Direct3D12 implementations. Every asteroid is a combination of one of 1000 unique meshes and one of 10 unique textures.

Finally, there is an example project that shows how Diligent Engine can be integrated with Unity.

Future Work
The engine is under active development. It currently supports Windows desktop, Universal Windows, Linux, Android, MacOS, and iOS platforms. Direct3D11, Direct3D12, OpenGL/GLES backends are now feature complete. Vulkan backend is coming next, and Metal backend is in the plan.

• I hope this is the right place to ask questions about DirectXTK which aren't really about graphics, if not please let me know a better place.
Can anyone tell me why I cannot do this:
DirectX::SimpleMath::Rectangle rectangle = {...};
RECT rect = rectangle;
or
RECT rect = static_cast<RECT>(rectangle);
or
const RECT rect(m_textureRect);
despite Rectangle having the following operator RECT:
operator RECT()
{
    RECT rct;
    rct.left = x;
    rct.top = y;
    rct.right = (x + width);
    rct.bottom = (y + height);
    return rct;
}
VS2017 tells me:
error C2440: 'initializing': cannot convert from 'const DirectX::SimpleMath::Rectangle' to 'const RECT'
Thanks in advance
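One possible reading of the error, offered as a guess: the message names a `const` source object, while the quoted conversion operator is not `const`-qualified, and a non-const member function cannot be called on a const object. The sketch below reproduces the working variant with stand-in types so it builds without `<windows.h>` or DirectXTK. `SketchRect`, `SketchRectangle` and `ToRect` are hypothetical names, not the library's types.

```cpp
#include <cassert>

// Stand-in for the Win32 RECT so the sketch builds without <windows.h>.
struct SketchRect { long left, top, right, bottom; };

// Hypothetical rectangle mirroring the x/y/width/height layout in the post.
struct SketchRectangle {
    long x, y, width, height;

    // Declared const, so the conversion is also usable on const objects.
    // Without the trailing const, converting a const SketchRectangle fails
    // to compile.
    operator SketchRect() const {
        assert(width >= 0 && height >= 0);
        SketchRect r;
        r.left   = x;
        r.top    = y;
        r.right  = x + width;
        r.bottom = y + height;
        return r;
    }
};

// Takes the source by const reference, as a const member or const method would.
SketchRect ToRect(const SketchRectangle& rectangle) {
    return rectangle;  // implicit conversion through the const operator
}
```

If the library's operator cannot be changed, copying the object into a non-const local before converting would be one workaround, assuming the operator is otherwise as quoted.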
• By isu diss
I'm trying to duplicate vertices using std::map to be used in a vertex buffer. I don't get the correct index buffer(myInds) or vertex buffer(myVerts). I can get the index array from FBX but it differs from what I get in the following std::map code. Any help is much appreciated.
struct FBXVTX
{
    XMFLOAT3 Position;
    XMFLOAT2 TextureCoord;
    XMFLOAT3 Normal;
};

std::map< FBXVTX, int > myVertsMap;
std::vector<FBXVTX> myVerts;
std::vector<int> myInds;

HRESULT FBXLoader::Open(HWND hWnd, char* Filename, bool UsePositionOnly)
{
    HRESULT hr = S_OK;
    if (FBXM)
    {
        FBXIOS = FbxIOSettings::Create(FBXM, IOSROOT);
        FBXM->SetIOSettings(FBXIOS);
        FBXI = FbxImporter::Create(FBXM, "");
        if (!(FBXI->Initialize(Filename, -1, FBXIOS)))
        {
            hr = E_FAIL;
            MessageBox(hWnd, (wchar_t*)FBXI->GetStatus().GetErrorString(), TEXT("ALM"), MB_OK);
        }
        FBXS = FbxScene::Create(FBXM, "REALMS");
        if (!FBXS)
        {
            hr = E_FAIL;
            MessageBox(hWnd, TEXT("Failed to create the scene"), TEXT("ALM"), MB_OK);
        }
        if (!(FBXI->Import(FBXS)))
        {
            hr = E_FAIL;
            MessageBox(hWnd, TEXT("Failed to import fbx file content into the scene"), TEXT("ALM"), MB_OK);
        }
        FbxAxisSystem OurAxisSystem = FbxAxisSystem::DirectX;
        FbxAxisSystem SceneAxisSystem = FBXS->GetGlobalSettings().GetAxisSystem();
        if(SceneAxisSystem != OurAxisSystem)
        {
            FbxAxisSystem::DirectX.ConvertScene(FBXS);
        }
        FbxSystemUnit SceneSystemUnit = FBXS->GetGlobalSettings().GetSystemUnit();
        if( SceneSystemUnit.GetScaleFactor() != 1.0 )
        {
            FbxSystemUnit::cm.ConvertScene( FBXS );
        }
        if (FBXI) FBXI->Destroy();
        FbxNode* MainNode = FBXS->GetRootNode();
        int NumKids = MainNode->GetChildCount();
        FbxNode* ChildNode = NULL;
        for (int i=0; i<NumKids; i++)
        {
            ChildNode = MainNode->GetChild(i);
            FbxNodeAttribute* NodeAttribute = ChildNode->GetNodeAttribute();
            if (NodeAttribute->GetAttributeType() == FbxNodeAttribute::eMesh)
            {
                FbxMesh* Mesh = ChildNode->GetMesh();
                if (UsePositionOnly)
                {
                    NumVertices = Mesh->GetControlPointsCount();//number of vertices
                    MyV = new XMFLOAT3[NumVertices];
                    for (DWORD j = 0; j < NumVertices; j++)
                    {
                        FbxVector4 Vertex = Mesh->GetControlPointAt(j);//Gets the control point at the specified index.
                        MyV[j] = XMFLOAT3((float)Vertex.mData[0], (float)Vertex.mData[1], (float)Vertex.mData[2]);
                    }
                    NumIndices = Mesh->GetPolygonVertexCount();//number of indices
                    MyI = (DWORD*)Mesh->GetPolygonVertices();//index array
                }
                else
                {
                    FbxLayerElementArrayTemplate<FbxVector2>* uvVertices = NULL;
                    Mesh->GetTextureUV(&uvVertices);
                    int idx = 0;
                    for (int i = 0; i < Mesh->GetPolygonCount(); i++)//polygon(=mostly triangle) count
                    {
                        for (int j = 0; j < Mesh->GetPolygonSize(i); j++)//retrieves number of vertices in a polygon
                        {
                            FBXVTX myVert;
                            int p_index = 3*i+j;
                            int t_index = Mesh->GetTextureUVIndex(i, j);
                            FbxVector4 Vertex = Mesh->GetControlPointAt(p_index);//Gets the control point at the specified index.
                            myVert.Position = XMFLOAT3((float)Vertex.mData[0], (float)Vertex.mData[1], (float)Vertex.mData[2]);
                            FbxVector4 Normal;
                            Mesh->GetPolygonVertexNormal(i, j, Normal);
                            myVert.Normal = XMFLOAT3((float)Normal.mData[0], (float)Normal.mData[1], (float)Normal.mData[2]);
                            FbxVector2 uv = uvVertices->GetAt(t_index);
                            myVert.TextureCoord = XMFLOAT2((float)uv.mData[0], (float)uv.mData[1]);
                            if ( myVertsMap.find( myVert ) != myVertsMap.end() )
                                myInds.push_back( myVertsMap[ myVert ]);
                            else
                            {
                                myVertsMap.insert( std::pair<FBXVTX, int> (myVert, idx ) );
                                myVerts.push_back(myVert);
                                myInds.push_back(idx);
                                idx++;
                            }
                        }
                    }
                }
            }
        }
    }
    else
    {
        hr = E_FAIL;
        MessageBox(hWnd, TEXT("Failed to create the FBX Manager"), TEXT("ALM"), MB_OK);
    }
    return hr;
}

bool operator < ( const FBXVTX &lValue, const FBXVTX &rValue)
{
    if (lValue.Position.x != rValue.Position.x) return(lValue.Position.x < rValue.Position.x);
    if (lValue.Position.y != rValue.Position.y) return(lValue.Position.y < rValue.Position.y);
    if (lValue.Position.z != rValue.Position.z) return(lValue.Position.z < rValue.Position.z);
    if (lValue.TextureCoord.x != rValue.TextureCoord.x) return(lValue.TextureCoord.x < rValue.TextureCoord.x);
    if (lValue.TextureCoord.y != rValue.TextureCoord.y) return(lValue.TextureCoord.y < rValue.TextureCoord.y);
    if (lValue.Normal.x != rValue.Normal.x) return(lValue.Normal.x < rValue.Normal.x);
    if (lValue.Normal.y != rValue.Normal.y) return(lValue.Normal.y < rValue.Normal.y);
    return(lValue.Normal.z < rValue.Normal.z);
}
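For comparison, the map-based indexing step can be isolated from the FBX SDK and tried on its own. The following is a minimal sketch under assumptions, not a fix for the loader above: `SketchVertex`, `IndexedMesh` and `BuildIndexedMesh` are hypothetical names, and the vertex is a plain struct of floats instead of XMFLOAT3/XMFLOAT2. It uses `std::tie` for the field-by-field ordering and a single `insert` per corner in place of the separate `find` and `insert` calls.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical plain vertex; stands in for a struct of XMFLOAT3/XMFLOAT2 fields.
struct SketchVertex {
    float px, py, pz;  // position
    float u, v;        // texture coordinate
    float nx, ny, nz;  // normal
};

// Lexicographic ordering over all fields, as required for a std::map key.
bool operator<(const SketchVertex& a, const SketchVertex& b) {
    return std::tie(a.px, a.py, a.pz, a.u, a.v, a.nx, a.ny, a.nz)
         < std::tie(b.px, b.py, b.pz, b.u, b.v, b.nx, b.ny, b.nz);
}

struct IndexedMesh {
    std::vector<SketchVertex> vertices;  // unique vertices, in first-seen order
    std::vector<int>          indices;   // one index per input corner
};

// Collapses identical polygon corners into shared vertices.
IndexedMesh BuildIndexedMesh(const std::vector<SketchVertex>& corners) {
    IndexedMesh mesh;
    std::map<SketchVertex, int> lookup;
    for (const SketchVertex& v : corners) {
        // insert() leaves the map unchanged if the key already exists and
        // returns an iterator to the existing entry in that case.
        auto result = lookup.insert({v, static_cast<int>(mesh.vertices.size())});
        if (result.second) mesh.vertices.push_back(v);
        mesh.indices.push_back(result.first->second);
    }
    assert(mesh.indices.size() == corners.size());
    return mesh;
}
```

Feeding it the six corners of a quad split into two triangles should give four unique vertices and six indices. If the isolated step behaves as expected, the inputs are the next thing to check: `GetControlPointAt` takes a control-point index, and the FBX SDK provides `FbxMesh::GetPolygonVertex(i, j)` to obtain that index for corner `j` of polygon `i`, so it may be worth comparing it with the `3*i+j` value used in the post.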

• Hi,

I am working on a project where I'm trying to use Forward Plus Rendering on point lights. I have a simple reflective scene with many point lights moving around it. I am using an effects file (.fx) to keep my shaders in one place. I am having a problem with the compute shader code: I cannot get it to calculate the tiles and the lighting properly.

Is anyone willing to help me set up my compute shader?
Thank you in advance for any replies and interest!

• Hi, right now building my engine in visual studio involves a shader compiling step to build hlsl 5.0 shaders. I have a separate project which only includes shader sources and the compiler is the visual studio integrated fxc compiler. I like this method because on any PC that has visual studio installed, I can just download the solution from GitHub and everything just builds without additional dependencies and using the latest version of the compiler. I also like it because the shaders are included in the solution explorer and easy to browse, and double-click to open (opening files can be really a pain in the ass in visual studio run in admin mode). Also it's nice that VS displays the build output/errors in the output window.