# DX11 Models, model matrices, and rendering


I was thinking about how to render multiple objects: things like sprites, truck models, plane models, boat models, etc. I'm not too sure about this process.

Let's say I have a vector of Model objects:

```cpp
class Model
{
    Matrix4 modelMat;
    VertexData vertices;
    Texture texture;
};
```

Since each model has its own model matrix, as all models should, does this mean I now need to have one draw call per model?

Each model that needs to be drawn could change the MVP matrix used by the bound vertex shader, which means I have to keep updating/mapping the constant buffer that stores my MVP matrix for the vertex shader.

Am I thinking about all of this wrong? Isn't this horribly inefficient?

##### Share on other sites

The options:

1. One draw call per model, OR
2. Instancing, which is one call per N models of the same type, OR
3. Pre-transform the vertices of some static models so the vertices are already in world space, OR
4. If DX12, use indirect draws (ExecuteIndirect) with either a CPU-driven or a GPU-driven pipeline, OR
5. If DX11, use instancing with manual vertex fetch for clustered rendering, OR
6. If DX11, use draw indirect with either virtual texturing or a thin G-buffer with deferred texturing.

I think that's all of them.

Edit: there's also merge instancing, but that's similar to the fifth one I listed.

Edit 2: look into texture atlases to help batch draw calls.


##### Share on other sites
2 hours ago, noodleBowl said:

Since each model has its own model matrix, as all models should, does this mean I now need to have one draw call per model?

There are other options listed above.

2 hours ago, noodleBowl said:

Am I thinking about all of this wrong? Isn't this horribly inefficient?

You aren't necessarily thinking about this wrong, just incompletely, since there are other options. As for inefficiency, in DX11 it depends on the order in which you draw your models. You also have a "draw call budget" to think about, but that depends on your CPU and its load. Basically, if you are on DX11, the simple thing to do is sort by shader, then texture, then other state changes, and use instancing where possible.
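A minimal sketch of the "sort by shader, then texture, then other state" advice, assuming each draw is described by small integer IDs (the `DrawItem` struct and its fields are hypothetical, not a D3D11 type). A lexicographic sort key groups draws so the shader changes least often, and `CountStateChanges` shows how many binds a loop that skips redundant `Set*` calls would still issue:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw record: the IDs stand in for whatever handles your
// renderer uses for shaders, textures, and other pipeline state.
struct DrawItem {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t stateId;  // e.g. a blend/rasterizer/depth-stencil combination
    std::uint32_t meshId;
};

// Sort so the shader changes least often, then the texture, then other state.
void SortForSubmission(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        return std::tie(a.shaderId, a.textureId, a.stateId) <
               std::tie(b.shaderId, b.textureId, b.stateId);
    });
}

// Counts how often each kind of state differs from the previous draw, i.e. how
// many Set* calls a submission loop that skips redundant binds would issue.
struct ChangeCounts { int shader = 0; int texture = 0; int state = 0; };

ChangeCounts CountStateChanges(const std::vector<DrawItem>& items) {
    ChangeCounts c;
    for (std::size_t i = 0; i < items.size(); ++i) {
        const bool first = (i == 0);  // the first draw always binds everything
        if (first || items[i].shaderId != items[i - 1].shaderId) ++c.shader;
        if (first || items[i].textureId != items[i - 1].textureId) ++c.texture;
        if (first || items[i].stateId != items[i - 1].stateId) ++c.state;
    }
    return c;
}
```

With four draws alternating between two shaders, the unsorted order switches shaders four times and the sorted order twice. Which state is actually the most expensive varies by driver and hardware, so treat the key order as a starting point.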

##### Share on other sites

This might be a stupid question, but what is considered a "static" model? Would a sprite be considered a static model because it does not move, even though you can animate it?

If I were to pretransform my vertices would it only work for static models?

##### Share on other sites

It means no movement too. Think about it: if you pretransform the vertices and then move them, they would need to be transformed again... so what's the point? Think walls that don't move in any way, or buildings in a cityscape. I'm no expert on this technique since I never bothered using it. Maybe @Hodgman can explain the different variations of the technique better than me; I think I remember him mentioning it once.
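Pre-transforming static geometry amounts to multiplying each vertex by its model matrix once, at load time, and storing the world-space result. The sketch below assumes a row-major matrix applied to column vectors and only handles positions (normals would need the inverse-transpose); `AppendPretransformed` and the other names are illustrative, not from any library:

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 4x4 matrix applied to column vectors (p' = M * p).
// This convention is an assumption; match it to your math library.
using Mat4 = std::array<float, 16>;

Mat4 Translation(float tx, float ty, float tz) {
    return {1, 0, 0, tx,
            0, 1, 0, ty,
            0, 0, 1, tz,
            0, 0, 0, 1};
}

Vec3 TransformPoint(const Mat4& m, const Vec3& p) {
    // Positions use w = 1; the bottom row must be (0,0,0,1) since we skip the divide.
    assert(m[12] == 0 && m[13] == 0 && m[14] == 0 && m[15] == 1);
    return {m[0] * p.x + m[1] * p.y + m[2] * p.z + m[3],
            m[4] * p.x + m[5] * p.y + m[6] * p.z + m[7],
            m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]};
}

// Bake the model matrix into the vertices once. Afterwards the mesh can share
// a buffer (and a draw call) with other baked meshes, and the vertex shader
// only needs the view-projection matrix.
void AppendPretransformed(std::vector<Vec3>& worldSpaceBuffer,
                          const std::vector<Vec3>& localVertices,
                          const Mat4& model) {
    for (const Vec3& v : localVertices)
        worldSpaceBuffer.push_back(TransformPoint(model, v));
}
```

Several walls baked this way can live in one vertex buffer and be drawn with a single call. The cost is that moving any of them means re-transforming and re-uploading its vertices.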

But like I said, for now you're better off just batching properly and using instancing where possible, with texture atlases to make the batches bigger.

Oh and here are some presentations and papers that describe some of the techniques


##### Share on other sites

@Infinisearch Thanks for the above!

After looking at those presentations I do have some basic/general questions.

Using these as an example, let's say I have the following meshes:

Knight
Airplane
Cruise Ship

Because I would want to draw a variety of the above, which may all have different transforms, I cannot put them into one buffer and save on draw calls, correct (excluding instancing in this case)?

Even if I needed to draw something often enough to warrant instancing, and they all had different positions, rotations, etc., can I still instance? I thought for instancing to work everything had to be the same.

##### Share on other sites
34 minutes ago, noodleBowl said:

Because I would want to draw a variety of the above, which may all have different transforms, I cannot put them into one buffer and save on draw calls, correct (excluding instancing in this case)?

36 minutes ago, noodleBowl said:

Even if I needed to draw something often enough to warrant instancing, and they all had different positions, rotations, etc., can I still instance? I thought for instancing to work everything had to be the same.

First of all, if they are of the same vertex type you should be able to stick them in the same buffer, thus reducing state changes between draw calls. (Look at the arguments to a draw call to understand what I mean.) Alright, forget about pretransforming vertices, since that would be for static objects only. That leaves these options:

1. One draw call per model.
2. For each model that has exactly the same geometry data (not the transform and other constants), use instancing.
3. Pack textures into a texture atlas (DX11 and before; DX12 is different) and then use instancing on the same models, but now with different textures in addition to different transforms and constants.
4. Merge instancing, in which you combine instancing with manual vertex fetch and a texture atlas. With this you can have different geometry, textures, transforms, and constants. The only constraint is that the different models should be approximately the same size; otherwise you waste performance on degenerate triangles.
5. An extension of merge instancing in which, instead of using an instance size as big as the biggest model of the group, you use an instance size that is much smaller than the model size. This requires you to split your models into triangle clusters of the same size and potentially use triangle strips, but it allows you to do cluster-based GPU culling, which can be a big performance win.

Then there is draw indirect, which is different in DX11 and DX12, but in DX12 it allows for some nice tricks on models that vary.
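To make the merge-instancing idea more concrete, here is a CPU emulation of the vertex-fetch logic. It assumes every mesh's triangles live in one shared index buffer, each instance records where its mesh starts and how many indices it has, and the draw uses a fixed number of vertices per instance. In a real renderer this logic would be in the vertex shader, driven by SV_VertexID and SV_InstanceID; all names here are hypothetical. The `degenerate` counter shows the waste when model sizes differ:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

struct InstanceRecord {
    std::uint32_t firstIndex;  // where this mesh's indices start in the shared index buffer
    std::uint32_t indexCount;  // how many indices the mesh really has (multiple of 3)
    Vertex offset;             // stand-in for a full per-instance transform
};

struct FetchResult {
    std::vector<Vertex> positions;  // what the rasterizer would receive
    std::uint32_t degenerate = 0;   // padding vertices that form zero-area triangles
};

// One "draw" of instances.size() instances, each chunkSize vertices long.
// chunkSize must be a multiple of 3 and >= the largest indexCount in the group.
FetchResult MergeInstancedDraw(const std::vector<Vertex>& vertices,
                               const std::vector<std::uint32_t>& indices,
                               const std::vector<InstanceRecord>& instances,
                               std::uint32_t chunkSize) {
    FetchResult out;
    for (std::uint32_t instanceId = 0; instanceId < instances.size(); ++instanceId) {
        const InstanceRecord& inst = instances[instanceId];
        assert(inst.indexCount <= chunkSize);
        for (std::uint32_t vertexId = 0; vertexId < chunkSize; ++vertexId) {
            if (vertexId >= inst.indexCount) {
                // Past the end of this mesh: collapse to one point so the
                // padding triangles have zero area and get culled.
                out.positions.push_back({0, 0, 0});
                ++out.degenerate;
                continue;
            }
            // Manual vertex fetch: read the index buffer, then the vertex buffer.
            const Vertex& v = vertices[indices[inst.firstIndex + vertexId]];
            out.positions.push_back({v.x + inst.offset.x, v.y + inst.offset.y, v.z + inst.offset.z});
        }
    }
    return out;
}
```

Drawing a 6-index quad and a 3-index triangle with a chunk size of 6 produces 12 vertices, 3 of which are padding. The closer the meshes in a group are in size, the less of that padding you pay for.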

So, to answer your question: for standard instancing in DX11 the geometry and texture have to be the same, but the transform and other constants like color can vary. In DX11, if you implement a texture atlas you can vary the texture while using instancing, but the geometry stays the same. In DX11, if you use manual vertex fetching you give up the post-transform vertex cache, but you can now use instancing to draw different geometry. There are two ways to do this; I described them above and posted links to the techniques in my previous post.
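For the texture-atlas variant, the per-instance data just needs to say which part of the atlas to sample. A minimal sketch, assuming each instance carries an offset and scale in normalized atlas coordinates and the mesh's own UVs stay within [0,1] (the `AtlasRect` layout and `GridCell` helper are illustrative):

```cpp
#include <cassert>

struct Float2 { float u, v; };

// Hypothetical per-instance atlas entry: where this instance's texture sits
// inside the atlas, in normalized [0,1] atlas coordinates.
struct AtlasRect { float offsetU, offsetV, scaleU, scaleV; };

// What the shader would do per instance: remap the mesh's original [0,1] UVs
// into the sub-rectangle of the atlas that belongs to this instance.
Float2 RemapToAtlas(Float2 uv, const AtlasRect& r) {
    assert(uv.u >= 0.0f && uv.u <= 1.0f && uv.v >= 0.0f && uv.v <= 1.0f);
    return {r.offsetU + uv.u * r.scaleU, r.offsetV + uv.v * r.scaleV};
}

// Rect for cell (col, row) of an atlas laid out as a grid of equal cells.
AtlasRect GridCell(int col, int row, int cols, int rows) {
    const float su = 1.0f / cols;
    const float sv = 1.0f / rows;
    return {col * su, row * sv, su, sv};
}
```

For cell (1, 0) of a 2x2 atlas, a UV of (0.5, 0.5) maps to (0.75, 0.25). UVs that rely on wrapping/tiling, and mipmap bleeding across cell borders, need extra care with atlases.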

##### Share on other sites
6 hours ago, Infinisearch said:

First of all if they are of the same vertex type you should be able to stick them in the same buffer thus reducing state changes between draw calls. (look at the arguments to a draw call to understand what I mean)

I'm actually not really sure what you are talking about here. The Draw call only has a vertex count and startVertexLocation. Am I looking at the wrong function? The only thing I can think of is the D3D11_INPUT_ELEMENT_DESC needed for an input layout:

```cpp
D3D11_INPUT_ELEMENT_DESC inputElementDescription[] = {
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};
```
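The `StartVertexLocation` argument is what lets several meshes that share one vertex type live in a single vertex buffer: each mesh just remembers where its vertices begin. A minimal sketch of the CPU-side packing, assuming the POSITION/COLOR layout above (`VertexPC`, `MeshRange`, and `AppendMesh` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Matches the POSITION (float3) + COLOR (float4) input layout.
struct VertexPC { float pos[3]; float color[4]; };

// Where one mesh lives inside the shared buffer: exactly the two
// arguments that Draw(VertexCount, StartVertexLocation) takes.
struct MeshRange { std::uint32_t vertexCount; std::uint32_t startVertexLocation; };

// Appends a mesh to one big CPU-side array that would later be uploaded
// into a single vertex buffer, and returns its range within that buffer.
MeshRange AppendMesh(std::vector<VertexPC>& shared, const std::vector<VertexPC>& mesh) {
    assert(shared.size() + mesh.size() <= UINT32_MAX);
    MeshRange r{static_cast<std::uint32_t>(mesh.size()),
                static_cast<std::uint32_t>(shared.size())};
    shared.insert(shared.end(), mesh.begin(), mesh.end());
    return r;
}
```

After uploading `shared` once and binding it with `IASetVertexBuffers`, each model costs a constant-buffer update plus `Draw(range.vertexCount, range.startVertexLocation)`, without rebinding vertex buffers between models.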

I'm looking at this tutorial on standard instancing and I don't 100% understand the input layout when it comes to the instance data. More specifically, I don't understand why they have changed the InputSlot to 1. Is this because they are binding two buffers, and using 1 points to the second buffer (m_instanceBuffer) where the instance modifications are stored? Or is it really just that they are reusing a semantic (TEXCOORD) and the two bound buffers (m_vertexBuffer and m_instanceBuffer) are treated as one big buffer?

In the tutorial they create an InstanceType struct to hold the modifications they want to make to the vertex positions. But when using a transform (model) matrix to modify the vertex data, would it be done the same way instead of using a constant buffer?


##### Share on other sites
8 hours ago, Infinisearch said:

Alright forget about pretransforming vertices since that would be for static objects only.

Wouldn't it make sense to pretransform dynamic meshes too?

Thinking of skinning, tessellating, etc. multiple times for each shadow map, I assume pretransforming would be faster even if this means additional reads/writes to global memory. Drawing all models with one call is another advantage, GPU culling another; everything becomes less fragmented.

But I haven't tried that yet.

One thing I tried is to store a matrix index in the vertex data (position.w) and load the matrix per vertex. That worked surprisingly well, although on AMD it wastes registers. I did not notice a performance difference between drawing 2 million boxes with a unique matrix per box and just using one global transform. It seems I was rasterizer-limited (the boxes were just textured, not lit).
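The matrix-index trick can be emulated on the CPU to show the data flow. This sketch assumes the matrices sit in an array the shader can index (on the GPU, for example, a structured buffer) and that `position.w` holds the index as a float, which represents every integer up to 2^24 exactly, so 2 million boxes fit. Names are illustrative:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct Float4 { float x, y, z, w; };
using Mat4 = std::array<float, 16>;  // row-major, applied to column vectors

// Stand-in for the vertex shader: position.w carries a matrix index instead
// of 1.0, and the matching matrix is fetched per vertex.
Float4 TransformWithIndexedMatrix(const Float4& packed, const std::vector<Mat4>& matrices) {
    const std::uint32_t index = static_cast<std::uint32_t>(packed.w);
    assert(index < matrices.size());
    const Mat4& m = matrices[index];
    // Rebuild a proper homogeneous position (w = 1) before transforming.
    const float p[4] = {packed.x, packed.y, packed.z, 1.0f};
    float r[4];
    for (int row = 0; row < 4; ++row)
        r[row] = m[row * 4 + 0] * p[0] + m[row * 4 + 1] * p[1] +
                 m[row * 4 + 2] * p[2] + m[row * 4 + 3] * p[3];
    return {r[0], r[1], r[2], r[3]};
}
```

Because `w` is reset to 1 before the multiply, the packed index never leaks into the homogeneous coordinate.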


##### Share on other sites
14 hours ago, noodleBowl said:

Because I would want to draw a variety of the above, which may all have different transforms, I cannot put them into one buffer and save on draw calls, correct (excluding instancing in this case)?

13 hours ago, Infinisearch said:

First of all if they are of the same vertex type you should be able to stick them in the same buffer thus reducing state changes between draw calls. (look at the arguments to a draw call to understand what I mean)

6 hours ago, noodleBowl said:

I'm actually not really sure what you are talking about here. The Draw call only has a vertex count and startVertexLocation. Am I looking at the wrong function? The only thing I can think of is the D3D11_INPUT_ELEMENT_DESC needed for an input layout.

I think I might have misread you in the first quote and added the statement in parentheses afterward. I don't really remember what I was thinking when I wrote that. Ignore it for now... if I remember my line of thought I will post it.

7 hours ago, noodleBowl said:

More specifically, I don't understand why they have changed the InputSlot to 1. Is this because they are binding two buffers, and using 1 points to the second buffer (m_instanceBuffer) where the instance modifications are stored? Or is it really just that they are reusing a semantic (TEXCOORD) and the two bound buffers (m_vertexBuffer and m_instanceBuffer) are treated as one big buffer?

This is vertex streams... you should look into them, and not just for instancing. Basically, let's say you have three vertex components: position, normal, and texture coordinate. You can stick that data into one struct, two structs, or three structs (when I say structs, I mean arrays of structs). If you use multiple arrays you need a way to bind all the arrays; this is what input slots are for. But for instancing, the reason you use multiple slots is that the step rate for fetching data from those buffers is different: per vertex vs. per instance.
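A CPU emulation of the two step rates, assuming a triangle in slot 0 and one offset per instance in slot 1. In D3D11 terms, slot 0 would be declared with `D3D11_INPUT_PER_VERTEX_DATA` and slot 1 with `D3D11_INPUT_PER_INSTANCE_DATA` and an `InstanceDataStepRate` of 1; the structs and function here are hypothetical stand-ins for what the input assembler does:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct PerVertex { float x, y, z; };       // slot 0: advances every vertex
struct PerInstance { float ox, oy, oz; };  // slot 1: advances every stepRate instances

// What the vertex shader receives: elements gathered from both slots.
struct AssembledVertex { PerVertex v; PerInstance i; };

std::vector<AssembledVertex> AssembleInstanced(const std::vector<PerVertex>& slot0,
                                               const std::vector<PerInstance>& slot1,
                                               std::uint32_t vertexCountPerInstance,
                                               std::uint32_t instanceCount,
                                               std::uint32_t stepRate = 1) {
    assert(stepRate > 0);
    assert(vertexCountPerInstance <= slot0.size());
    std::vector<AssembledVertex> out;
    for (std::uint32_t instanceId = 0; instanceId < instanceCount; ++instanceId) {
        // Per-instance data is read once per stepRate instances.
        const PerInstance& inst = slot1[instanceId / stepRate];
        for (std::uint32_t vertexId = 0; vertexId < vertexCountPerInstance; ++vertexId)
            out.push_back({slot0[vertexId], inst});  // per-vertex data: every vertex
    }
    return out;
}
```

For a per-instance world matrix instead of an offset, the usual approach is four `float4` elements sharing one semantic name with `SemanticIndex` 0 to 3, all in slot 1, reassembled into a matrix in the shader. Reusing a semantic name such as TEXCOORD for instance data does not merge the buffers: they stay separate, and `InputSlot` says which one each element is read from.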

7 hours ago, noodleBowl said:

In the tutorial they create an InstanceType struct to hold the modifications they want to make to the vertex positions. But when using a transform (model) matrix to modify the vertex data, would it be done the same way instead of using a constant buffer?

Yeah, you don't need to use a constant buffer. But there is also another way to implement instancing, using the system value SV_InstanceID. You should learn that later, though.
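The SV_InstanceID route skips the second vertex stream: the vertex shader uses the instance index to read per-instance data from a buffer it can index directly (typically a structured buffer bound as a shader resource; that choice is an assumption here, not something from the tutorial). A CPU emulation with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Position { float x, y, z; };
struct InstanceData { float offset[3]; float color[4]; };
struct ShadedVertex { Position pos; float color[4]; };

// Stand-in for the vertex shader: vertexId and instanceId play the roles of
// SV_VertexID and SV_InstanceID; per-instance data is indexed, not streamed.
ShadedVertex VertexShader(const std::vector<Position>& mesh,
                          const std::vector<InstanceData>& instances,
                          std::uint32_t vertexId, std::uint32_t instanceId) {
    assert(vertexId < mesh.size() && instanceId < instances.size());
    const InstanceData& inst = instances[instanceId];
    const Position& p = mesh[vertexId];
    ShadedVertex out{{p.x + inst.offset[0], p.y + inst.offset[1], p.z + inst.offset[2]},
                     {inst.color[0], inst.color[1], inst.color[2], inst.color[3]}};
    return out;
}

// Stand-in for an instanced draw of the whole mesh, once per instance.
std::vector<ShadedVertex> EmulateDrawInstanced(const std::vector<Position>& mesh,
                                               const std::vector<InstanceData>& instances) {
    std::vector<ShadedVertex> out;
    for (std::uint32_t i = 0; i < instances.size(); ++i)
        for (std::uint32_t v = 0; v < mesh.size(); ++v)
            out.push_back(VertexShader(mesh, instances, v, i));
    return out;
}
```

Compared with a per-instance input slot, the instance data does not have to be described in the input layout, and the per-instance struct can hold whatever the shader needs.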

##### Share on other sites
4 hours ago, JoeJ said:

Wouldn't it make sense to pretransform dynamic meshes too?

Thinking of skinning, tessellating, etc. multiple times for each shadow map, I assume pretransforming would be faster even if this means additional reads/writes to global memory. Drawing all models with one call is another advantage, GPU culling another; everything becomes less fragmented.

But I haven't tried that yet.

One thing I tried is to store a matrix index in the vertex data (position.w) and load the matrix per vertex. That worked surprisingly well, although on AMD it wastes registers. I did not notice a performance difference between drawing 2 million boxes with a unique matrix per box and just using one global transform. It seems I was rasterizer-limited (the boxes were just textured, not lit).

I was speaking in the context of reducing draw calls for static data, as in data without per-frame changes. But if there are per-frame changes, you're right, there might be gains to be had by pretransforming skinned or tessellated meshes. But pretransforming per frame on the GPU will reduce calls, depending on how you implement it (stream-out or compute shader). It's interesting that you had no performance degradation with a matrix index per vertex, but as you seem to imply, the results might differ with more complicated shaders.

##### Share on other sites

BTW @noodleBowl have I been clear enough?  Is there anything you don't understand?

##### Share on other sites
On 7.10.2017 at 3:58 PM, Infinisearch said:

But pretransforming per frame on the GPU will reduce calls, depending on how you implement it (stream-out or compute shader).

I never considered pretransforming by vertex shader and stream-out. Is it possible to stream out to GPU memory with DX12/VK?

Actually, I planned to do it with a compute shader, but somehow it feels wrong to reimplement tessellation on my own when there is already hardware for that. On the other hand, compute seems more flexible than the hardware, e.g. if we want Catmull-Clark subdivision.

Also, having good compute but weak graphics experience, I tend to think: 'geometry and tessellation stages are useless; use compute and pretransform instead.' But then why did AMD spend so much effort improving those things for Vega?

##### Share on other sites
2 hours ago, JoeJ said:

Is it possible to stream out to GPU memory with DX12/VK?

I've never done it but I don't see why not.

2 hours ago, JoeJ said:

Actually, I planned to do it with a compute shader, but somehow it feels wrong to reimplement tessellation on my own when there is already hardware for that. On the other hand, compute seems more flexible than the hardware, e.g. if we want Catmull-Clark subdivision.

I've read one or two papers where they implement tessellation using compute shaders, and I think they said flexibility was one of the benefits... I don't remember much else. I think the last presentation I posted above (optimizing graphics with compute) has a section on using compute for tessellation.

3 hours ago, JoeJ said:

Also, having good compute but weak graphics experience, I tend to think: 'geometry and tessellation stages are useless; use compute and pretransform instead.' But then why did AMD spend so much effort improving those things for Vega?

I don't have any tessellation experience and have kept away from it on purpose. IIRC Vega hasn't really improved tessellation performance that much, and Nvidia still kicks their butt at it (at least with high tessellation factors).

3 hours ago, JoeJ said:

Well, like I said, I have no experience with tessellation, but again, as I said earlier, IIRC there were a few papers I read that seemed to implement it using compute. The only thing I can definitively say is that implementing it through the graphics pipeline will take up draw calls. Doing it through compute will have a performance advantage over the fixed-function (FF) hardware on GPUs with lots of compute units, but this advantage might be lost because, instead of the expanded vertices being kept in the cache, they'd go through memory.

Maybe someone with more experience can chime in.

##### Share on other sites

I've worked on some games recently where we pre-transformed skinned meshes on the CPU. It's not ideal or a typical way to do things, but we did have spare CPU cycles available and were struggling for every GPU cycle we could find, so there was no reason for us to move that logic from the CPU to a compute shader.
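The post does not say which skinning method was used on the CPU, so this is a generic sketch of linear blend skinning done before upload, assuming up to four bone influences per vertex, weights that sum to 1, and skin matrices that already include the inverse bind pose (all names are hypothetical):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // row-major, applied to column vectors

struct Vec3 { float x, y, z; };

// Hypothetical skinned vertex with up to 4 bone influences.
struct SkinnedVertex {
    Vec3 position;    // bind-pose position
    int bone[4];      // indices into the skin matrix palette
    float weight[4];  // expected to sum to 1
};

Vec3 MulPoint(const Mat4& m, const Vec3& p) {
    return {m[0] * p.x + m[1] * p.y + m[2] * p.z + m[3],
            m[4] * p.x + m[5] * p.y + m[6] * p.z + m[7],
            m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]};
}

// Linear blend skinning on the CPU: transform the bind-pose position by each
// influencing bone and blend the results by weight. The output is a plain
// position array ready to be copied into a vertex buffer.
void SkinOnCpu(const std::vector<SkinnedVertex>& in,
               const std::vector<Mat4>& skinMatrices,  // bone * inverse bind pose
               std::vector<Vec3>& out) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        Vec3 acc{0, 0, 0};
        for (int k = 0; k < 4; ++k) {
            const float w = in[i].weight[k];
            if (w == 0.0f) continue;  // unused influence slot
            const int b = in[i].bone[k];
            assert(b >= 0 && static_cast<std::size_t>(b) < skinMatrices.size());
            const Vec3 p = MulPoint(skinMatrices[b], in[i].position);
            acc.x += w * p.x;
            acc.y += w * p.y;
            acc.z += w * p.z;
        }
        out[i] = acc;
    }
}
```

The output positions would be copied into a dynamic vertex buffer each frame, after which the same pre-skinned data can be reused by the main pass and every shadow-map pass.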

5 hours ago, JoeJ said:

Is it possible to stream out to GPU memory with DX12/VK?

It's possible in DX10/GL, and DX12/VK haven't lost the ability.

5 hours ago, JoeJ said:

But then why did AMD spend so much effort to improve those things for Vega?

Because they're playing catch-up with NVidia. Tessellation is used quite a bit by some games, and not at all by others. "Pass-through" geometry shaders (no geometry amplification) are useful for some things, and NVidia is really good at running them without the typical GS penalty.
GS is used in a few modern tricks that might catch on soon, e.g. NV encourages people to use the GS stage as part of a technique to perform variable-resolution rendering for VR, where the edges of the viewport have less resolution than the center.

##### Share on other sites
11 hours ago, Infinisearch said:

BTW @noodleBowl have I been clear enough?  Is there anything you don't understand?

There is a whole lot I don't understand haha, but that is because my graphics experience/knowledge is very fragmented. I just need more practice.

On 10/7/2017 at 9:45 AM, Infinisearch said:

This is vertex streams... you should look into them, and not just for instancing. Basically, let's say you have three vertex components: position, normal, and texture coordinate. You can stick that data into one struct, two structs, or three structs (when I say structs, I mean arrays of structs). If you use multiple arrays you need a way to bind all the arrays; this is what input slots are for.

I'm not sure if you are talking about interleaved vs. non-interleaved buffers, or about streaming data back from the GPU to the CPU for further processing?

If you are talking about interleaved vs. non-interleaved buffers (I think this is what you mean, or at least it's the option that makes the most sense to me), why would I want non-interleaved buffers?
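One common reason for non-interleaved buffers is passes that need only some attributes. A depth pre-pass or shadow-map pass reads positions only; with separate streams it can bind just the position buffer, while with interleaved vertices the normals and UVs sit in the same cache lines as the positions and get pulled in anyway. A sketch under those assumptions (a 32-byte vertex of position, normal, UV; hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static_assert(sizeof(float) == 4, "byte counts below assume 4-byte floats");

// Interleaved: one array of structs, bound through one input slot.
struct VertexInterleaved { float position[3]; float normal[3]; float uv[2]; };

// Non-interleaved: one array per attribute, each bound through its own slot.
struct VertexStreams {
    std::vector<float> positions;  // 3 floats per vertex
    std::vector<float> normals;    // 3 floats per vertex
    std::vector<float> uvs;        // 2 floats per vertex
};

VertexStreams Deinterleave(const std::vector<VertexInterleaved>& in) {
    VertexStreams s;
    for (const VertexInterleaved& v : in) {
        s.positions.insert(s.positions.end(), v.position, v.position + 3);
        s.normals.insert(s.normals.end(), v.normal, v.normal + 3);
        s.uvs.insert(s.uvs.end(), v.uv, v.uv + 2);
    }
    return s;
}

// Approximate bytes a position-only pass touches in each layout.
std::size_t DepthPassBytesInterleaved(std::size_t vertexCount) {
    return vertexCount * sizeof(VertexInterleaved);  // whole vertices share cache lines
}
std::size_t DepthPassBytesStreams(std::size_t vertexCount) {
    return vertexCount * 3 * sizeof(float);  // only the position stream is bound
}
```

For 1000 vertices that is roughly 32,000 bytes interleaved versus 12,000 bytes for the position stream alone. Interleaved data tends to win when every pass uses every attribute, so it is a trade-off rather than a rule.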

On 10/7/2017 at 9:45 AM, Infinisearch said:

Yeah you don't need to use a constant buffer

Just a general question about constant buffers/buffers, might be stupid, but in that tutorial they created an extra vertex buffer to hold position modifications, so then for certain situations should I just use/bind an extra (non-constant) buffer?

For example, the MVP matrix is not really constant and can change every frame, so would it be better suited to a buffer that does not use the D3D11_BIND_CONSTANT_BUFFER flag (even though you can set the usage to dynamic)? Whereas something like a light's brightness, a value that wouldn't change, should go into a buffer created with the D3D11_BIND_CONSTANT_BUFFER flag?

Or is that all nonsense? Are there some optimizations going on behind the scenes, or is it just better to have them split up? (I'm coming from the viewpoint that there are probably far fewer constant buffer binds than binds involving other buffer types, like vertex data, which would need to be rebound per mesh.)

##### Share on other sites
7 hours ago, noodleBowl said:

Just a general question about constant buffers/buffers, might be stupid, but in that tutorial they created an extra vertex buffer to hold position modifications, so then for certain situations should I just use/bind an extra (non-constant) buffer?

For example, the MVP matrix is not really constant and can change every frame, so would it be better suited to a buffer that does not use the D3D11_BIND_CONSTANT_BUFFER flag (even though you can set the usage to dynamic)? Whereas something like a light's brightness, a value that wouldn't change, should go into a buffer created with the D3D11_BIND_CONSTANT_BUFFER flag?

Or is that all nonsense? Are there some optimizations going on behind the scenes, or is it just better to have them split up? (I'm coming from the viewpoint that there are probably far fewer constant buffer binds than binds involving other buffer types, like vertex data, which would need to be rebound per mesh.)

Constant buffers are basically made for data that changes per frame, so you're fine using them. In fact, IHVs optimize constant buffer access, if I'm remembering right. As for that tutorial, they're using an extra vertex buffer because it's instancing, most probably because there are size constraints on constant buffers (64 KB, IIRC).
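The size limit behind that choice is easy to work out. Assuming the D3D11 limit of 4096 sixteen-byte constants per constant buffer (`D3D11_REQ_CONSTANT_BUFFER_ELEMENT_COUNT`), which is 64 KB, and 64 bytes per `float4x4`, the arithmetic is:

```cpp
#include <cassert>
#include <cstddef>

// Assumed D3D11 limit: 4096 constants of 16 bytes (one float4) each.
constexpr std::size_t kConstantBytes = 16;
constexpr std::size_t kMaxConstants = 4096;
constexpr std::size_t kMaxConstantBufferBytes = kMaxConstants * kConstantBytes;

// A float4x4 occupies four float4 constants.
constexpr std::size_t kMatrixBytes = 4 * kConstantBytes;

// Upper bound on world matrices in one constant buffer, before anything else
// (view-projection, colors, etc.) takes its share.
constexpr std::size_t kMaxMatricesPerBuffer = kMaxConstantBufferBytes / kMatrixBytes;

static_assert(kMaxConstantBufferBytes == 65536, "64 KB");
static_assert(kMaxMatricesPerBuffer == 1024, "1024 float4x4 matrices");
```

So at most 1024 world matrices fit in one constant buffer, fewer once other per-draw data shares it, whereas a per-instance vertex buffer is limited mainly by memory.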

7 hours ago, noodleBowl said:

I'm not sure if you are talking about interleaved vs. non-interleaved buffers, or about streaming data back from the GPU to the CPU for further processing?

If you are talking about interleaved vs. non-interleaved buffers (I think this is what you mean, or at least it's the option that makes the most sense to me), why would I want non-interleaved buffers?
