
About this blog

Notes on editing skinned meshes and skeletal animations

Entries in this blog


Compute shader for skinned mesh animation - D3D11


I was raised and educated as an engineer. Not an engineer for a locomotive (though I would've loved to have been such in the days of steam), but, by education and experience, an engineer in the fields of nuclear, controls and instrumentation, and a brief career at Oak Ridge National Lab in the Computer and Instrumentation section. My father was an engineer, as were my 2 older brothers and my older son - altogether representing careers in ceramics, petroleum, agricultural and industrial engineering, and the fields I played around with mentioned above.

The point I'm trying to make is that my engineering inclination is apparently genetic, and I'm the result of nature and nurture. By occupation, an engineer is an "idea implementer." Not to say I don't have an original idea once in a while, but I do enjoy taking someone else's ideas and seeing if I can implement them.

In my continuing self-education in, and exploration of, D3D11, I'm using a mesh editor as the vehicle. Several previous entries in this blog describe approaches to implementation of pieces for that editor. The ideas for the important pieces are, alas, not my own, but are suggestions by others that I've engineered.

Yet Another Acknowledgement

At a hint from gamedev member unbird that mesh skinning could be done in a compute shader, I decided to give compute shaders a try, as that was an area of the API I hadn't experimented with. As with the PPP (Pixel Perfect Picking) approach that proved so useful in my editor, credit for the idea of using a compute shader for mesh skinning goes to that li'l ol' shader wizard - unbird.

Why Use A Compute Shader

A common approach to animated skinned-mesh rendering is as shown in the shader code below. The process consists of streaming vertex data to a vertex shader, applying animation, world, view and projection transforms to the position, transforming the vertex normal for vertex-lighting calculations, and passing the results to a pixel shader for texture sampling and color output.

For an indexed mesh, in which the same vertex is used in multiple triangles, that process results in the same calculations being performed on the same vertex multiple times. E.g., if the mesh contains N vertices which are used in just 2 adjacent triangles, the transformation and lighting calculations are done N "extra" times. It's not uncommon for some meshes to use the same vertex in 4 or more triangles. If the vertex calculation results are cached, avoiding redundant calculations for the same vertex, efficiency can be improved.
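As a rough illustration of that redundancy, the sketch below compares the number of vertex calculations requested when every index triggers one (no caching at all, which is an assumption made here for the sake of the comparison) against one calculation per unique vertex. The names `VertexWork` and `countVertexWork` are hypothetical helpers, not part of the editor.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: work requested by an indexed draw with no result
// caching (one calculation per index) versus one calculation per vertex.
struct VertexWork {
    std::size_t perIndex;   // calculations if every index triggers one
    std::size_t perVertex;  // calculations if each vertex is processed once
};

VertexWork countVertexWork(const std::vector<std::uint32_t>& indices) {
    // The set keeps each referenced vertex index exactly once.
    const std::set<std::uint32_t> unique(indices.begin(), indices.end());
    return { indices.size(), unique.size() };
}
```

For two triangles sharing an edge, with indices 0,1,2 and 2,1,3, this reports 6 calculations per index against 4 per vertex.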

Further, N.B., that skinning process uses the same data (animation matrix array, world-view-projection matrices, lighting info, etc.) and the same instructions for every vertex calculation. That situation is right down a compute shader's alley. I.e., multiple parallel threads ( that sounds oxymoronic**) can be used to perform the calculations once per vertex, caching the results. That combines the efficiency of one-calc-per-vertex with parallel processing.

** The phrase "multiple parallel threads" was provided by the Department of Redundancy Department.

Skinned Mesh Animation - A Compute Shader Approach

Briefly, I modified my mesh skinning shader into a compute shader by substituting an RWStructuredBuffer containing the mesh's vertices (with their bone indices and weights) for the "normal" indexed input vertex stream, and another RWStructuredBuffer for output in place of output to a pixel shader. Rendering is then done by copying the compute shader output buffer to a vertex buffer, and calling DrawIndexed on that vertex buffer with a pass-through vertex shader (output = input) and a simple texture-sampling pixel shader.

The implementation described below is very basic, with lots of room for improvement.

Using very simple profile testing (using QueryPerformanceCounter before and after the render calls, and repeatedly averaging 1000 render times), it appears the compute shader approach may be ~30% faster than the "traditional" mesh skinning shader. I have not done any further testing to determine exactly where the efficiency comes from. I'm just reporting the results I obtained.
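A minimal sketch of that kind of averaging follows. It swaps in std::chrono::steady_clock for QueryPerformanceCounter so the example is portable, and `averageMicroseconds` and `renderOnce` are hypothetical names; it is not the timing code actually used for the figure above.

```cpp
#include <chrono>
#include <cstddef>

// Runs renderOnce() `iterations` times and returns the mean wall-clock
// time per call, in microseconds (0 if iterations is 0).
template <typename Fn>
double averageMicroseconds(Fn&& renderOnce, std::size_t iterations) {
    using Clock = std::chrono::steady_clock;
    double totalUs = 0.0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = Clock::now();
        renderOnce();
        const auto stop = Clock::now();
        totalUs += std::chrono::duration<double, std::micro>(stop - start).count();
    }
    return iterations ? totalUs / static_cast<double>(iterations) : 0.0;
}
```

Usage would be along the lines of `averageMicroseconds([&] { /* render calls */ }, 1000)`. Since D3D11 queues GPU work asynchronously, a CPU-side measurement like this probably reflects submission cost more than GPU execution time.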

More Information

Here's the general form of a mesh skinning shader I've used.

// There are various and sundry constant buffers providing view and projection matrices,
// material and lighting params, etc....
// so, any variables appearing below that aren't local to the vertex shader are
// in some constant buffer somewhere

// And, of course, the "traditional" array of animation matrices used to skin the mesh vertices
cbuffer cbSkinned : register(b1)
{
    matrix meshWorld;
    matrix meshWorldInvTranspose;
    matrix finalMats[96];
};

Texture2D txDiffuse : register(t0);
SamplerState samLinear : register(s0);

//--------------------------------------------------------------------------------------
struct VS_INPUT
{
    float4 Pos : POSITION;
    float4 Norm : NORMAL;
    float2 Tex : TEXCOORD;
    uint4 index : BONE;
    float4 weight : WEIGHT;
};

struct PS_INPUT
{
    float4 Pos : SV_POSITION;
    float4 Color : COLOR0;
    float2 Tex : TEXCOORD0;
};

//--------------------------------------------------------------------------------------
// Vertex Shader
//--------------------------------------------------------------------------------------
PS_INPUT VS(VS_INPUT input)
{
    PS_INPUT output = (PS_INPUT)0;

    int bidx[4] = { input.index.x, input.index.y, input.index.z, input.index.w };
    float weight[4] = { input.weight.x, input.weight.y, input.weight.z, 0 };
    // the fourth weight is whatever remains so that the weights sum to 1
    weight[3] = 1.0f - weight[0] - weight[1] - weight[2];

    float4 Pos = 0.0f;
    float3 Normal = 0.0f;
    // blend the position and normal over the four bones
    for (int b = 0; b < 4; b++)
    {
        Pos += mul(input.Pos, finalMats[bidx[b]]) * weight[b];
        Normal += mul(input.Norm.xyz, (float3x3)finalMats[bidx[b]]) * weight[b];
    }
    Normal = normalize(Normal);

    // legacy code - meshWorld should be combined with ViewProjection on the CPU side
    output.Pos = mul(Pos, meshWorld);
    output.Pos = mul(output.Pos, ViewProjection);
    output.Tex = input.Tex;
    float4 invNorm = float4(mul(Normal.xyz, (float3x3)meshWorldInvTranspose), 0.0f);
    output.Color = saturate(dot(invNorm, normalize(lightDir)) + faceColor);
    return output;
}

//--------------------------------------------------------------------------------------
// Pixel Shader
//--------------------------------------------------------------------------------------
float4 PS(PS_INPUT input) : SV_Target
{
    return txDiffuse.Sample(samLinear, input.Tex) * input.Color;
}

The vertex buffer streamed into the vertex shader during the context->DrawIndexed call consists, not too surprisingly, of vertices matching struct VS_INPUT above. The process is, I believe, pretty standard.
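For checking shader output on the CPU, the position blend in the vertex shader above can be mirrored in a few lines of C++. This is only a sketch under assumptions: it takes the blend to run over four bones with the fourth weight derived as 1 minus the other three (as the weight setup in the shader suggests), it handles positions only, and `Vec3`, `Mat4x3`, `transformPoint`, `skinPosition` and `translation` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Affine transform in row-vector convention: p' = p * M, with the
// translation in row 3, mirroring mul(vector, matrix) in HLSL.
struct Mat4x3 { float m[4][3]; };

Vec3 transformPoint(const Vec3& p, const Mat4x3& M) {
    return { p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
             p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
             p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2] };
}

// Weighted blend of the position transformed by each of four bone matrices.
Vec3 skinPosition(const Vec3& p,
                  const std::array<std::uint32_t, 4>& bone,
                  const std::array<float, 3>& w,
                  const Mat4x3* finalMats) {
    // The fourth weight is the remainder, so the weights sum to 1.
    const float weight[4] = { w[0], w[1], w[2], 1.0f - w[0] - w[1] - w[2] };
    Vec3 out{ 0.0f, 0.0f, 0.0f };
    for (int b = 0; b < 4; ++b) {
        const Vec3 t = transformPoint(p, finalMats[bone[b]]);
        out.x += t.x * weight[b];
        out.y += t.y * weight[b];
        out.z += t.z * weight[b];
    }
    return out;
}

// Convenience for building a pure translation matrix.
Mat4x3 translation(float tx, float ty, float tz) {
    return { { {1.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}, {0.0f, 0.0f, 1.0f}, {tx, ty, tz} } };
}
```

With two bones that translate by 1 and 3 units along x and weights of 0.5 each, the origin lands at x = 2, halfway between the two bone results.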

With unbird's hint that mesh skinning could be done in a compute shader, I modified the vertex shader above as follows.

// constant buffers similar to the above code
// for the compute shader input and output ...
struct CS_INPUT
{
    float4 Pos;
    float4 Norm;
    float2 Tex;
    uint4 index;
    float4 weight;
};

struct CS_OUTPUT
{
    float4 Pos;
    float4 Color;
    float2 Tex;
};

RWStructuredBuffer<CS_INPUT> vertsIn : register(u0);
RWStructuredBuffer<CS_OUTPUT> vertsOut : register(u1);

//--------------------------------------------------------------------------------------
// Compute Shader
//--------------------------------------------------------------------------------------
[numthreads(64, 1, 1)]
void CS_Anim(uint3 threadID : SV_DispatchThreadID)
{
    // one thread per vertex: the dispatch thread ID is the vertex index
    CS_INPUT input = vertsIn[threadID.x];
    CS_OUTPUT output = (CS_OUTPUT)0;

    int bidx[4] = { input.index.x, input.index.y, input.index.z, input.index.w };

    // etc., etc. - same as the vertex shader above
    // but, instead of returning the output to be passed to the pixel shader ...
    vertsOut[threadID.x] = output;
}

For support, several D3D11 buffer objects are created. The vertsIn and vertsOut RWStructuredBuffers above are D3D11_USAGE_DEFAULT buffers, bound with D3D11_BIND_UNORDERED_ACCESS, with MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED and StructureByteStride set to the size of one element.

vertsIn is sized as number-vertices * sizeof( struct CS_INPUT ), and is initialized with the mesh vertices. I initialize the data from a std::vector of the vertices, but it could be done with CopyResource, also (I think).

vertsOut is sized as number-vertices * sizeof( struct CS_OUTPUT ).

In addition, a vertex buffer is created for the final rendering, with attributes USAGE_DEFAULT and BIND_VERTEX_BUFFER, and ByteWidth the same as the vertsOut buffer.

Unordered access views ( vInView and vOutView ) are created for each buffer { Format = DXGI_FORMAT_UNKNOWN, ViewDimension = D3D11_UAV_DIMENSION_BUFFER, with the Buffer member covering all the vertices }.

The compute shader is used as follows:

1. Set all the constant buffers similar to the skinning vertex shader, except, of course, with context->CSSetConstantBuffers.
2. Call context->CSSetShader( ... ).
3. Set the RWBuffers with:
ID3D11UnorderedAccessView* views[2] = { vInView.Get(), vOutView.Get() };
context->CSSetUnorderedAccessViews(0, 2, views, nullptr);
4. Call context->Dispatch( (numVertices + 63)/64, 1, 1 );
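
The group count passed to Dispatch in step 4 is a ceiling division of the vertex count by the 64 threads per group declared in the shader. A small sketch (the names `kThreadsPerGroup`, `groupCount` and `surplusThreads` are made up for illustration):

```cpp
#include <cstdint>

// Must match [numthreads(64, 1, 1)] in the compute shader.
constexpr std::uint32_t kThreadsPerGroup = 64;

// Number of thread groups needed so that every vertex gets a thread.
constexpr std::uint32_t groupCount(std::uint32_t numVertices) {
    return (numVertices + kThreadsPerGroup - 1) / kThreadsPerGroup;
}

// Threads in the last group that have no vertex to process.
constexpr std::uint32_t surplusThreads(std::uint32_t numVertices) {
    return groupCount(numVertices) * kThreadsPerGroup - numVertices;
}
```

For example, 65 vertices need 2 groups, which leaves 63 threads without a vertex. The compute shader as shown has no explicit bounds check, so a guard such as `if (threadID.x >= numVertices) return;` (with numVertices supplied in a constant buffer) is a common addition.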

The vertsOut buffer then contains the transformed (clip-space) vertices (and tex coords and color) that the skinning vertex shader would normally have output. The animated skinned mesh is then rendered with a simple pass-through vertex shader, and the same pixel shader originally used with the skinned mesh shader shown first above.

//--------------------------------------------------------------------------------------
struct VS_INPUT
{
    float4 Pos : POSITION0;
    float4 Color : COLOR0;
    float2 Tex : TEXCOORD0;
};

struct PS_INPUT
{
    float4 Pos : SV_POSITION;
    float4 Color : COLOR0;
    float2 Tex : TEXCOORD0;
};

//--------------------------------------------------------------------------------------
// Vertex Shader
//--------------------------------------------------------------------------------------
PS_INPUT VS(VS_INPUT input)
{
    PS_INPUT output = (PS_INPUT)input; // used to change the semantics
    return output;
}

//--------------------------------------------------------------------------------------
// Pixel Shader
//--------------------------------------------------------------------------------------
float4 PS(PS_INPUT input) : SV_Target
{
    return txDiffuse.Sample(samLinear, input.Tex) * input.Color;
}

The drawing is done by:

1. Copy the data from the vertsOut buffer to the vertex buffer using CopyResource.
2. Set the appropriate input layout reflecting VS_INPUT, and PRIMITIVE_TOPOLOGY_TRIANGLELIST.
3. Set as input the newly copied vertex buffer with appropriate stride and zero (0) offset, and the mesh's index buffer (with its original format).
4. Set the vertex and pixel shaders ( created/compiled from the HLSL code shown immediately above ).
5. Set the pixel shader texture and sampler as appropriate.
6. Call context->DrawIndexed( numIndices, 0, 0 );

The compute shader approach to mesh skinning described isn't generally compatible with the PPP (Pixel-Perfect-Picking) approach used for selecting vertices, faces and edges in a mesh editor I've described in other entries in this journal.

However, possible uses for the compute shader approach come to mind.

- increase the efficiency of animated skinned-mesh rendering in general.
- because the compute shader output is in the form of position-color-texcoords structures, that output could be batched with other mesh data (other skinned meshes, static meshes, etc.) for final rendering.




PPP - Pixel Perfect Picking - Edge Selection adventures

This blog entry describes the continuing adventure of coding an editor for a skinned mesh which has skeletal animation. Coding this app is intended not as a destination, but as a journey of learning through the features and capabilities of D3D11.

Continuing Summary

After laying down the framework of the editor as described in previous entries, I took on the task of adding the ability to select mesh vertices, faces, and edges. Picking those elements of a static mesh with a mouse click, in a more traditional approach, requires unprojecting the mouse position, and iterating through mesh data, checking for intersections between that mesh world mouse "ray" and the various elements of the mesh. Faces aren't too bad, but the intersection test requires implementation of a ray-triangle intersection test. Vertices aren't horribly bad, either, but still require choosing an appropriately sized epsilon around each vertex position with which to test the ray intersection. In addition, those tests often include comparing the depth or distance of a hit in order to select the element closest to the eye or camera position in the current view.

Picking elements in a skeletal animated skinned mesh, where vertex positions must be calculated (e.g., from weighted bone animation matrices) before the intersection tests are made, gets even more complex. Debugging the whole algorithm (if you make mistakes as often as I do) can be a very lengthy process.

In addition, even with broad-phase culling for the intersection tests, that algorithm requires testing (in some way) every element in the mesh (thousands of vertices or faces) for every mouse click. The time required for that testing isn't normally a big deal in an editor, where the app spends most of its time waiting for user input. Nevertheless, the process is lengthy and complex - and, as it turns out, is unneeded if a graphical approach to picking is implemented.

In D3D11 (or most other graphics interfaces), skinned meshes are commonly rendered using shaders, and all those vertex bone-weighted calculations are done in the GPU. So, one might ask - if all those calcs are already being done to render the mesh, can that information be used to eliminate or reduce the need for doing similar calcs in the CPU for picking?

The PPP Approach ( Described Yet Again )

The answer to that question is emphatically YES. Consider inverting the traditional picking process - project all the mesh elements into screen space, rather than unprojecting the mouse position into mesh space - and search for mesh elements at the exact screen position of the mouse click. If that were done on the CPU side, it would be considered an inefficient approach. I.e., why convert thousands of vertex positions to screen space to compare to the mouse position, when the mouse position can be converted more efficiently into mesh space for the intersection tests?

However, projecting all the mesh elements into screen space is precisely what the shader does in the rendering process. If that information is captured and used, the picking process becomes much less complex and comprises looking at data at a single pixel position to determine an intersection hit.

As mentioned before, the concept of picking in a pixel-perfect way comes from the mind of gamedev member unbird - a shader wizard. Because I'm basically lazy, I came up with the acronym PPP (Pixel-Perfect Picking) to make typing easier.

PPP takes advantage of several features of HLSL and the D3D11 API:

- Multiple render targets can be written to simultaneously.
- System-value semantics in HLSL such as SV_VertexID and SV_PrimitiveID


1. The user clicks the mouse to select a mesh element (vertex, face, edge), dependent on the current edit mode.
2. A second rendertarget is set before the next frame is rendered.
3. Vertex, geometry and pixel shaders are set to collect and record data in that second rendertarget.
4. The mesh is rendered.
5. The pixel shader returns data in the second rendertarget comprising (e.g.) meshID, faceID, vertexID, etc.
6. The single texel at the screen position of the mouse click in the second rendertarget is examined to determine which mesh element data was last written to that texel.

Keep In Mind ... A mesh may have several vertices with the same position, but different normals, colors, tex coordinates, bone indices/weights, etc. Yes, it's possible to eliminate duplicate vertex positions by using several buffers (position, normal, tex coords, bone weights, etc.) and stream them to the vertex shader. However, I don't do that. So this editor has to account for multiple vertices (different vertex IDs) at the same position.

So ... N.B. the wording in step 6 above. If, for instance, the mesh has multiple vertices with the same position, and because depth testing takes place after the pixel shader returns target data, the data written to the second rendertarget is only for the first vertex at a texel position at the smallest depth. That is, when a second vertex at the same position is rendered, depth testing will discard the data for all rendertargets at that pixel position.

The result of the PPP process is thus limited to determining that at least one element was rendered at the screen position. Data for that particular element can, of course, be read from the second rendertarget data, but other elements at the same depth, at that position, have to be checked.
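
The effect of the depth test on the second rendertarget can be imitated on the CPU. The sketch below is an assumption-laden model, not the editor's code: it assumes a LESS depth comparison with the depth buffer cleared to 1.0, and `PickTexel` and `PickTarget` are hypothetical names. It shows why a second element drawn at the same position and depth leaves no trace in the ID target.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct PickTexel { std::uint32_t meshID, elementID, flag; };

// CPU stand-in for a depth buffer plus an ID rendertarget.
class PickTarget {
public:
    PickTarget(int width, int height)
        : width_(width),
          depth_(static_cast<std::size_t>(width) * height, 1.0f),
          ids_(static_cast<std::size_t>(width) * height) {}  // texels zeroed

    // Mimics a LESS depth test: the fragment is written only if it is
    // strictly nearer than what is already stored at (x, y).
    bool write(int x, int y, float z, std::uint32_t meshID, std::uint32_t elementID) {
        const std::size_t i = index(x, y);
        if (!(z < depth_[i])) return false;  // discarded for all rendertargets
        depth_[i] = z;
        ids_[i] = PickTexel{ meshID, elementID, 123u };  // 123 marks "written"
        return true;
    }

    const PickTexel& at(int x, int y) const { return ids_[index(x, y)]; }

private:
    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * width_ + x;
    }
    int width_;
    std::vector<float> depth_;
    std::vector<PickTexel> ids_;
};
```

Writing vertex 10 and then vertex 11 at the same texel and depth leaves only vertex 10 in the target, which is why other elements at that position still have to be checked separately.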

Picking Edges

Selecting an edge in a skinned mesh with skeletal animation is a whole different ballgame than vertex and face selection.

1. The visual representation of an edge (a line) may be only 1 or 2 pixels in width. For convenience, I want to let the user click near the rendered line without having to click on it in a "pixel perfect" manner.

2. Most edges in an indexed mesh are shared by adjacent triangles. As a result, most edges are drawn twice. The first edge is rendered normally. Other draws of that edge are thrown away during depth testing.

3. For editing purposes, when the user clicks near a line, all coincident edges should be selected.

The following information is needed to pick edges:

1. An indication of an edge near the mouse position where the user clicked to select edges.
2. Information on all edges which are coincident with the edge that's picked.

For the first piece of data, the PPP rendertarget can be used in a way similar to the way vertices and faces are picked. As noted, the PPP rendertarget only records the edge information at a pixel for the last edge that survived the depth test.

Similar to vertex picking discussed in a previous blog, information on all edges coincident with the picked edge is needed. For vertex picking, a geometry shader was used to stream out vertex data. The stream-out buffer data was examined to determine all the vertices coincident with the picked vertex. The geometry shader stream-out approach can't be used for edges because an edge is drawn over a range of pixel positions - from one vertex of a triangle to the next vertex. The geometry shader "knows" only the individual vertex positions - not the positions between them.

For edge picking, the following scenario is implemented.
On a user mouse click to select an edge, for one render frame, do the following -

1. Construct a pick rectangle - provide an area around the mouse position. Record data for any edges traversing that rectangle, whether they pass the depth test or not.
2. If a pixel for an edge is rendered within the pick rectangle, output meshID, faceID, and edgeID to a second PPP rendertarget.
3. In addition, output meshID, faceID, and edgeID to an unordered access view (UAV) to provide a list of all edges traversing the pick rectangle, whether rendered or not.

FYI - thankfully, data written to a UAV in the pixel shader persists whether the pixel is later clipped or discarded.

The UAV is a uint4 buffer, used to store mesh ID, face ID, edge ID, and a flag to indicate data was written. Initially, the intent was to store data for the entire model. That is, assuming a maximum of 4 meshes per model, the UAV would be sized for 4 * max-num-faces-in-any-mesh * 3. However, because that limits edge picking to 4 meshes per model, a simpler approach of rendering one mesh at a time was implemented. So, currently, the UAV is sized for edges in a single mesh - i.e., number-faces * 3, and the pixel shader writes data to the UAV indexed by faceID * 3 + edgeID.
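
The slot arithmetic for that single-mesh UAV is simple enough to write down. In this sketch `EdgeRef`, `edgeSlot`, `edgeFromSlot` and `slotCount` are illustrative names; only the faceID * 3 + edgeID layout comes from the description above.

```cpp
#include <cstdint>

struct EdgeRef { std::uint32_t faceID; std::uint32_t edgeID; };

// Slot written by the pixel shader for a given face and edge (edgeID is 0..2).
constexpr std::uint32_t edgeSlot(std::uint32_t faceID, std::uint32_t edgeID) {
    return faceID * 3u + edgeID;
}

// Inverse mapping, for walking the buffer on the CPU side.
constexpr EdgeRef edgeFromSlot(std::uint32_t slot) {
    return { slot / 3u, slot % 3u };
}

// Number of uint4 elements the UAV needs for one mesh.
constexpr std::uint32_t slotCount(std::uint32_t numFaces) {
    return numFaces * 3u;
}
```

For instance, face 136, edge 1 maps to slot 409, and slot 409 maps back to face 136, edge 1.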
Before describing the edge picking process, here's the geometry shader.

[maxvertexcount(6)]
void GS_Edges(triangle GS_INPUT input[3], uint faceID : SV_PrimitiveID, inout LineStream<PS_INPUT> edges)
{
    PS_INPUT output = (PS_INPUT)0;
    output.Info[1] = faceID;
    output.Info[3] = 123; // flag: data was written

    // output 3 edges v0-v1, v1-v2, v2-v0
    // edge 0
    output.Info[2] = 0;
    output.Pos = input[0].Pos;
    output.Color = input[0].Color;
    output.Tex = input[0].Tex;
    edges.Append(output);
    output.Pos = input[1].Pos;
    output.Color = input[1].Color;
    output.Tex = input[1].Tex;
    edges.Append(output);

    // edge 1
    output.Info[2] = 1;
    output.Pos = input[1].Pos;
    output.Color = input[1].Color;
    output.Tex = input[1].Tex;
    edges.Append(output);
    output.Pos = input[2].Pos;
    output.Color = input[2].Color;
    output.Tex = input[2].Tex;
    edges.Append(output);

    // edge 2
    output.Info[2] = 2;
    output.Pos = input[2].Pos;
    output.Color = input[2].Color;
    output.Tex = input[2].Tex;
    edges.Append(output);
    output.Pos = input[0].Pos;
    output.Color = input[0].Color;
    output.Tex = input[0].Tex;
    edges.Append(output);

    edges.RestartStrip();
}

The pixel shader outputs the data to the second (PPP) rendertarget. It also tests if the position is within the pick rectangle. If so, the same data is output to the UAV.

RWBuffer<uint4> SelEdgeInfo : register(u2); // 2 rendertargets and a UAV

struct PS_OUTPUT
{
    float4 Color : SV_Target0;
    uint4 Info : SV_Target1;
};

// use the pixel shader to pick edge information within the rect specified
// input.Info = 0->meshID, 1->faceID, 2->edge number, 3 = 123 flag
PS_OUTPUT PS_UAV(PS_INPUT input)
{
    PS_OUTPUT output = (PS_OUTPUT)0;
    output.Color = input.Color;
    input.Info[0] = (uint)bLighting.z; // meshNum
    output.Info = input.Info;

    // pickInfo is a uint4 in a constant buffer with the coords for the pick rectangle in screen coords
    if (input.Pos.x >= pickInfo[0] && input.Pos.x < ... && input.Pos.y >= pickInfo[1] && input.Pos.y < ...
    // ... the remainder of this listing is missing. Per the description above, a pixel inside
    // the pick rectangle also writes input.Info to SelEdgeInfo at index faceID * 3 + edgeID.

The end result following that process is:

1. The second rendertarget which now holds data at each pixel position where an edge traversed. As noted above, that data is actually for the edge that last passed the depth test.
2. A UAV buffer which holds data for any edge that traversed the pick rectangle.

The pick rectangle in the PPP rendertarget is analysed to determine what edge traversed closest to the pick position (center of the rectangle). Taking a comment (first comment in that link) Ashaman73 made regarding weighting pixels, each pixel position which contains data is weighted by its distance from the pick position. The data at the highest weighted position is the "selected" edge - i.e., mesh ID, face ID and edge ID.
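
One way to sketch that selection is shown below. It is a simplification and an assumption: rather than computing a weight per pixel, it takes the written texel with the smallest squared distance from the pick position, which should give the same winner as any weighting that decreases with distance. `PickSample` and `nearestSampleIndex` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One texel read back from the pick rectangle of the PPP rendertarget.
struct PickSample {
    int x, y;                                   // pixel position
    std::uint32_t meshID, faceID, edgeID, flag; // flag == 123 means "written"
};

// Returns the index of the written sample closest to (cx, cy), or -1 if
// no texel in the rectangle holds edge data.
int nearestSampleIndex(const std::vector<PickSample>& samples, int cx, int cy) {
    int best = -1;
    long bestDist2 = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const PickSample& s = samples[i];
        if (s.flag != 123u) continue;  // nothing was rendered at this texel
        const long dx = s.x - cx;
        const long dy = s.y - cy;
        const long d2 = dx * dx + dy * dy;
        if (best < 0 || d2 < bestDist2) {
            best = static_cast<int>(i);
            bestDist2 = d2;
        }
    }
    return best;
}
```

Ties are resolved in favor of the first sample encountered, an arbitrary choice made for this sketch.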

Using that edge identification, the UAV buffer is examined for all edges which traversed the pick rectangle. The mesh data for those edges is examined to determine which are coincident with the "selected" edge - i.e., which edges are defined by vertices that are within a small epsilon of the vertices that define the "selected" edge. Those matching edges from the UAV, and the edge selected in the PPP rendertarget are all added to the model's "selected edge" std::vector.
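
The coincidence test might look something like the following sketch. The per-component comparison and the names `Point3`, `Edge`, `nearlyEqual` and `coincident` are assumptions made for illustration; the key point is that endpoints are compared in both orders, because adjacent triangles generally traverse a shared edge in opposite directions.

```cpp
#include <cmath>

struct Point3 { float x, y, z; };
struct Edge { Point3 a, b; };  // the two vertex positions defining an edge

// True if every component of p and q differs by at most eps.
bool nearlyEqual(const Point3& p, const Point3& q, float eps) {
    return std::fabs(p.x - q.x) <= eps &&
           std::fabs(p.y - q.y) <= eps &&
           std::fabs(p.z - q.z) <= eps;
}

// Two edges are coincident if their endpoints match in either order.
bool coincident(const Edge& e, const Edge& f, float eps) {
    return (nearlyEqual(e.a, f.a, eps) && nearlyEqual(e.b, f.b, eps)) ||
           (nearlyEqual(e.a, f.b, eps) && nearlyEqual(e.b, f.a, eps));
}
```

Each candidate from the UAV would be run through `coincident` against the "selected" edge, using whatever epsilon suits the scale of the mesh.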

In the above image - the mouse was clicked near the vertical edge in the center of the view. The "Pick data" (on the right) shows diagrammatically which pixel positions in the pick rectangle around that mouse position had data written to them. Below that diagram, it is noted that the data in the pixel nearest the pick position was for face 136, edge 1.

A bit further below, note that the UAV data indicates both face 136 - edge 1, and face 139 - edge 2 traversed the pick rectangle. In face selection mode, the triangles on either side of that edge were selected and, as expected, face 136 is on one side of the edge, and face 139 is on the other. That is, face 136-edge 1 was drawn first, edge data was written to the second (PPP) rendertarget and to the UAV. Face 139-edge 2 was later drawn at the same position, but the data written to the PPP target was discarded by depth testing. However, that data was also written to the UAV which persisted.

Although mesh vertex data has to be examined to determine which edges listed in the UAV are coincident with the edge "selected" in the PPP process, there are just a few candidates to test, compared to a more traditional approach which would require testing every edge in the mesh to determine if it's close to the pick position, and compare hits in that search to determine which edges should be considered "selected."




PPP - Pixel Perfect Picking - the Face Selection part

A previous blog entry discusses the concept of using a second render target to store mesh data during rendering as an approach for picking vertices and faces. That entry didn't go into any detail with regard to picking faces (triangles).


It should be known, and bears repeating from previous blogs, that the concept of pixel-perfect-picking I use is from the genius of gamedev member unbird. His patience while I worked out the kinks in my code is acknowledged and appreciated.

In Any Case
I recently added a face-edit mode to my editor and, because of the work done previously with the PPP render target to pick vertices, I was pleasantly surprised at how much easier it was to pick faces using the same technique.


Much of the code below is the result of experimenting with D3D11, seeing what can be done, and how to do it. It can all probably be improved. However, the purpose is to illustrate how a second render target can be used in a fairly simple way to eliminate hundreds or thousands of picking calcs for an animated mesh.

One of the primary benefits (for me, anyway) of using a second render target to store skinned-mesh-related data to support picking is the avoidance of doing all those loops and calculations to pick a face. I.e., having to unproject mouse coordinates using a bunch of animation matrices, calculating bone-weighted vertex positions, checking for a triangle hit, etc.

During the process of rendering an animated skinned-mesh with bone-weighted vertices all those calculations are done in the shader anyway. When the user clicks the mouse to select a face, a second render target is set to which mesh data is to be written, a pixel shader is set which has the same input signature** as the "regular" skinning shader, and one cycle of rendering is done without any other changes.

** Well, the pixel shader input includes uint faceID : SV_PrimitiveID in addition to the "regular" input struct.

A Few Details

My graphics object includes:

ComPtr<ID3D11Texture2D> g_pIDRenderTarget;
ComPtr<ID3D11RenderTargetView> g_pIDRenderTargetView;
ComPtr<ID3D11Texture2D> g_pIDStagingRenderTarget;

The graphics Resize(...) routine, primarily used for window resizing, handles buffer size related changes that most D3D11 programmers are familiar with - unbinding render targets, resizing the swap chain buffers, (re)creating the primary render target view, etc. That routine is called during the graphics object Init(...) routine.

As all the ingredients are available, the g_pIDRenderTarget, the view for that 2nd texture, and a staging buffer for that 2nd texture are also created. As a result of unbird's comment below, and other help through PMs, the 2nd texture format of DXGI_FORMAT_R32G32B32A32_UINT is used.

See unbird's comments below - when the data written to a rendertarget is expected to remain unchanged - turn blending OFF!

To support writing mesh data as described above, three simple routines are provided:

void VzGraphics::SetIDRenderTargetView() // Prep the pipeline for a second render target
{
    float clr[4] = { 0, 0, 0, 0 };
    g_pImmediateContext->ClearRenderTargetView(g_pIDRenderTargetView.Get(), clr);
    ID3D11RenderTargetView *tmpPtr[2] = { g_pRenderTargetView.Get(), g_pIDRenderTargetView.Get() };
    g_pImmediateContext->OMSetRenderTargets(2, tmpPtr, g_pDepthStencilView.Get());
}

void VzGraphics::SetDefaultRenderTargetView() // Set the pipeline back to "normal"
{
    ID3D11RenderTargetView* tmpPtr[2] = { g_pRenderTargetView.Get(), nullptr };
    g_pImmediateContext->OMSetRenderTargets(2, tmpPtr, g_pDepthStencilView.Get());
}

ID3D11Texture2D *VzGraphics::GetIDRenderTexture() // provide a copy of the data
{
    g_pImmediateContext->CopyResource(g_pIDStagingRenderTarget.Get(), g_pIDRenderTarget.Get());
    return g_pIDStagingRenderTarget.Get();
}

That's all there is on the graphics end of things.

The skinned-mesh object has a routine to draw (indexed) the mesh(es). After the animations are updated and all the data needed by the shaders is calculated, constant buffers and resources are set. One of those constant buffers includes the mesh ID. If your mesh object has only a single mesh to render, that's unnecessary. My object supports multiple meshes so I need to know which mesh is rendered. That skinned-mesh draw routine also takes a flag (bool bLButtonDown) indicating whether picking is to be supported or not. If so, the only difference is that the picking pixel shader is set.

That shader looks like so:

struct PS_OUTPUT
{
    float4 Color : SV_Target0;
    uint4 FaceInfo : SV_Target1; // uint4 to match the R32G32B32A32_UINT render target
};

PS_OUTPUT PS_FaceID(PS_INPUT input, uint primID : SV_PrimitiveID)
{
    PS_OUTPUT output;
    output.Color = txDiffuse.Sample(samLinear, input.Tex) * input.Color;

    // the commented-out code is applicable to an original texture format of R8G8B8A8_UNORM
    // After changing the format to R32G32B32A32_UINT, the simpler code following is used
    ///////////// old ///////////////////////
    // uint id0 = uint(primID / 255.0f);
    // uint id1 = primID - id0 * 255;
    // the mesh ID is buried in the lighting stuff because there's room, and it's convenient
    // output.FaceInfo = float4(id1/255.0f, id0/255.0f, bLighting.z/255.0f, 123.0f/255.0f);
    ///////////// end of old /////////////////
    output.FaceInfo = uint4((uint)bLighting.z, primID, 0, 123);
    return output;
}

(Obsolete, applies to the earlier 8-bit format:) Note that HOW the data is stored in the second render target is tied to the texture format. I use an R8G8B8A8 (or B8G8R8A8 sometimes) format, similar to the backbuffer texture, so the face ID, which may exceed 255, is divided into 2 bytes and stored in 2 components of the pixel. The alpha component (123.0f / 255.0f) is used as a flag to indicate the pixel was written.
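
For reference, the arithmetic of that older two-component packing (the commented-out "old" lines in the shader) can be checked on the CPU. This sketch uses integer division in place of the shader's float division, which should agree for face IDs in this range; `PackedFaceID`, `canPack`, `packFaceID` and `unpackFaceID` are hypothetical names.

```cpp
#include <cstdint>

// The two 8-bit components that carried the face ID.
struct PackedFaceID { std::uint8_t low; std::uint8_t high; };

// The scheme works in base 255, so the high part must fit in one byte.
constexpr bool canPack(std::uint32_t primID) {
    return primID / 255u <= 255u;
}

// Same split as the shader: id0 = primID / 255, id1 = primID - id0 * 255.
PackedFaceID packFaceID(std::uint32_t primID) {
    const std::uint32_t high = primID / 255u;
    const std::uint32_t low = primID - high * 255u;
    return { static_cast<std::uint8_t>(low), static_cast<std::uint8_t>(high) };
}

// Reassembles the face ID from the two components read back from the pixel.
constexpr std::uint32_t unpackFaceID(PackedFaceID p) {
    return p.low + p.high * 255u;
}
```

Under this scheme the largest representable face ID is 255 * 255 + 254 = 65279, one of the limits the 32-bit format removes.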

Because a 32bit/component render target texture is used, the needed data is stuffed into a pixel in the second render target.
With all those pieces in place, when the user clicks the mouse to select a face, the sequence is:

bool bLButtonDown = true;
...
graphics->SetBlend( false ); // turn blending OFF to ensure the data in the ID rendertarget remains unchanged
graphics->SetIDRenderTargetView();
meshObject->Render( ... , bLButtonDown, ... );
graphics->SetDefaultRenderTargetView();
graphics->SetBlend( ...(previous state)... );

To see what (if any) face should be picked:

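ID3D11Texture2D *tex = graphics->GetIDRenderTexture();
if (nullptr == tex)
{
    std::cout << ... // error report (remainder of this block is missing)
}
D3D11_TEXTURE2D_DESC td;
tex->GetDesc(&td);

// describe a 1x1 staging texture with the same format as the ID render target
D3D11_TEXTURE2D_DESC ttd = {};
ttd.Usage = D3D11_USAGE_STAGING;
ttd.Width = 1;
ttd.Height = 1;
ttd.ArraySize = 1;
ttd.Format = td.Format;
ttd.SampleDesc = td.SampleDesc;
ttd.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
HRESULT hr = device->CreateTexture2D(&ttd, nullptr, pickTex.ReleaseAndGetAddressOf());
if (FAILED(hr))
{
    std::cout << ... // error report (remainder of this block is missing)
}

// ... (missing: setup of pBox, the D3D11_BOX selecting the single pixel at the mouse position)
context->CopySubresourceRegion(pickTex.Get(), 0, 0, 0, 0, tex, 0, &pBox);

D3D11_MAPPED_SUBRESOURCE mr = {};
hr = context->Map(pickTex.Get(), 0, D3D11_MAP_READ, 0, &mr);
if (FAILED(hr))
{
    std::cout << ... // error report (remainder of this block is missing)
}
// ... (missing: reading meshNum and faceNum from the mapped data)
...->Resources().Graphics().Context()->Unmap(pickTex.Get(), 0);

// do with meshNum and faceNum what you will
std::cout << ...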

That routine is for illustration. It's unnecessary to recreate the staging texture for every pick. Just wanted to show that the texture must be the same format, etc. And - it's only one pixel - thus PPP (Pixel Perfect Picking). Using CopySubresourceRegion is just a lazy approach to avoid doing the pitch calcs to locate the pixel myself. And I use uint32_t to match the component size. Use your favorite (and, perhaps, better) coding practices.

Other Possible Uses

Ashaman73 has elsewhere discussed "proximity" picking. Sometimes picking something in the vicinity of the mouse position, but perhaps not directly under the mouse, is desirable. E.g., picking an edge in a "pixel-perfect" manner could be downright difficult. Using techniques similar to those described above, create a pick box around the mouse position of an appropriate size (say, 16x16 pixels). Examine the results and select data closest to center, etc.

Something similar could be done for edge selection if the user is allowed to drag the mouse a short distance to select an edge. Render the mesh in wireframe and check for edge-related data at the mouse position. I'm currently in the process of testing that use.




PPP - Pixel Perfect Picking for Vertex and Face Selection


Credit for the concept of Pixel Perfect Picking in D3D11 belongs to gamedev member unbird, a wizard with shaders. However, since I'm basically lazy, I coined the acronym PPP, and adapted the idea to my needs.

N.B., some of the implementations below require SM5.

The PPP Approach

The idea is to pick objects (in my editor that's vertices and faces) with a mouse-click by writing mesh and vertex/face information to a texture during the rendering process; sample a single pixel of the texture at the mouse position; and interpret the components as data, rather than color, providing the needed information. If you need to know what object is under the mouse position, and you're rendering the scene anyway, get the information from the rendering process itself in a single pass.

When the user clicks the mouse to select something (or, actually, any time you want to know what's under the mouse position):

1. Clear and set a second render target view.
2. Set appropriate buffer data and shaders to both render to the primary render target and output the needed information (mesh IDs, vertex or face IDs, etc.) to that second texture. HLSL provides the convenience of SV_VertexID and SV_PrimitiveID, which can be passed through.
3. Render the scene, writing the color to SV_Target0 and the data to SV_Target1.
4. Use CopySubresourceRegion to get just the single pixel from the second texture at the mouse position.
5. Examine the pixel to see what (if anything) lies under the mouse position.

It's a Different Approach to an Old Concept

Picking objects by rendering them can be done in OpenGL. I haven't done it in ages, but, IIRC, OGL has the option to set the render mode to GL_SELECT. You then get a list of object hits in the view volume by rendering. PPP works similarly but tests for an object hit at the pixel under the mouse-click position in a single pass.

The Need for a Picking Algorithm

A feature I wanted to incorporate in my mesh editor for picking or selecting vertices and faces is the option to limit selection to only visible vertices or faces. That is, "visible" in the sense of "appearing in the current view." It's one of the options Blender implements and I use it frequently when editing meshes. In general, it's more natural when selecting vertices or faces with a mouse-click to expect that the selection will include only the verts/faces I can see onscreen. Before I was mindful of the option in Blender, I would click a face or vertex and later find that hidden vertices or faces (verts or faces at a greater depth than the intended selection) were also selected. Editing what I assumed was only affecting the verts/faces I could see, was, in fact, altering stuff I hadn't intended.

A common method for picking verts/faces with a mouse-click is to do a raycast into the scene from the mouse position, search through verts/faces for hits, and select the object with the smallest hit distance. That picking algorithm can be extended to a list of hits with depth information (hit distance) if desired. That algorithm can be made more efficient with broad-phase culling (choose your favorite method), but still requires setting up the raycast and searching through data to find hits.
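For reference, a brute-force version of that raycast approach can be sketched as follows. It uses the well-known Moller-Trumbore ray/triangle test, which is a choice made for this sketch rather than anything the editor is known to use; Vec3, Triangle and the function names are likewise hypothetical, and no broad-phase culling is attempted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 Cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Triangle { Vec3 v0, v1, v2; };

// Moller-Trumbore test. Returns true and writes the distance along the ray
// to t when the ray hits either side of the triangle in front of the origin.
bool RayHitsTriangle(Vec3 origin, Vec3 dir, const Triangle& tri, float& t)
{
    const float eps = 1e-6f;
    Vec3 e1 = Sub(tri.v1, tri.v0);
    Vec3 e2 = Sub(tri.v2, tri.v0);
    Vec3 p = Cross(dir, e2);
    float det = Dot(e1, p);
    if (std::fabs(det) < eps) return false;  // ray parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = Sub(origin, tri.v0);
    float u = Dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = Cross(s, e1);
    float v = Dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = Dot(e2, q) * inv;
    return t > eps;
}

// Search every face and keep the smallest hit distance.
// Returns the face index, or -1 when nothing is hit.
int PickNearestFace(Vec3 origin, Vec3 dir, const std::vector<Triangle>& faces)
{
    assert(Dot(dir, dir) > 0.0f);  // direction must not be the zero vector
    int best = -1;
    float bestT = 0.0f;
    for (std::size_t i = 0; i < faces.size(); ++i) {
        float t;
        if (RayHitsTriangle(origin, dir, faces[i], t) && (best < 0 || t < bestT)) {
            best = static_cast<int>(i);
            bestT = t;
        }
    }
    return best;
}
```

Even in this small form, every click costs a pass over all faces on the CPU, which is the work the rendering-based approach avoids.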

However, as with the OpenGL GL_SELECT method, the very act of rendering does that depth testing (if depth testing is enabled). In particular, if the scene is being rendered anyway, why not use the rendering process to get the needed information, and skip the raycasting process altogether?
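To make the idea concrete, here is a CPU-side toy model of what the GPU does during such a render: each fragment carries a depth and an ID, the depth test keeps the nearest one, and the pick is just a lookup. The IdBuffer class is purely illustrative and hypothetical; in the real thing the depth buffer and the second render target play these roles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// CPU stand-in for a depth buffer plus an ID render target of the same size.
class IdBuffer {
public:
    IdBuffer(int width, int height)
        : w(width), h(height),
          depth(static_cast<std::size_t>(width) * height,
                std::numeric_limits<float>::max()),  // "cleared" to far
          ids(static_cast<std::size_t>(width) * height, 0u) {}

    // One fragment passing or failing a "less" depth test.
    void WriteFragment(int x, int y, float z, std::uint32_t id)
    {
        assert(x >= 0 && x < w && y >= 0 && y < h);
        std::size_t i = static_cast<std::size_t>(y) * w + x;
        if (z < depth[i]) { depth[i] = z; ids[i] = id; }
    }

    // The pick itself: no search, just a read at the mouse position.
    std::uint32_t IdAt(int x, int y) const
    {
        assert(x >= 0 && x < w && y >= 0 && y < h);
        return ids[static_cast<std::size_t>(y) * w + x];
    }

private:
    int w, h;
    std::vector<float> depth;
    std::vector<std::uint32_t> ids;
};
```

Whatever order the faces are drawn in, the ID left at a pixel belongs to the nearest face covering it, so hidden geometry can never be picked.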

For my particular needs in a mesh editor, I have two picking/selection algorithms available - one using geometry shader stream-out to generate a list of visible objects for box selection; the other using the PPP method for mouse picking.

In addition, to give the user a reasonable target for clicking, I display the vertices as billboarded colored quads (say 3x3 or 4x4 pixels), with color indicating whether the vert is currently selected or not, with depth testing disabled so the full quad overdraws adjacent geometry. To do that with depth testing disabled requires culling using a depth map (I use a method similar to shadow mapping) to determine whether the quad is visible and should be rendered, or is not visible and should be discarded. That can be done in the pixel shader, or in a geometry shader. I do it in a geometry shader which can also be used, as mentioned, with stream-out to generate a list of visible objects. Maybe more on that in a later blog entry.




Interlude - Concept of an Application Manager

Years ago, back when I was gainfully employed, I took a management class as part of my employer's "career enhancement" plan. It completely changed the way I thought (and still think) about how an organization can (and should) be run.

The concept is that the staff of a company (or a division of the company, or any subgroup down to the project or team level) is an inverted pyramid - not with the leaders at the top** of a pyramid, but with the workers at the top. Management was viewed as support for the workers, providing them with any and all resources they needed to get work done.

** The illustration shown has a big red X through it because it shows a team leader being carried by the workers. No, no, no. A team leader should be providing support for the workers.

Admittedly, for that concept to work, the workers must be dedicated to working in the best interest of the customer, producing efficiently and economically what the customer wants.

A Program's User is the Customer, and the Customer is Always Right

Extending the concept to a user-driven computer program such as an editor, the current edit mode is provided with resources by the application manager to do whatever the user demands. The user interfaces with the current edit mode, just as a customer in a store talks directly with the clerk, not with the store manager. The current edit mode manipulates the mesh in any way it can, in response to the user's commands. If the user wants to do something the current mode can't provide, the current mode tells the application manager to switch to the edit mode the user has requested.

That results in the top-down approach to Windows message handling described in my previous blog entry.


At each step along the way (the message procedures), the user's input is the priority, and each object (the current mode instance, the base mode instance, etc.) examines the user's requests. For those requests which the current mode recognizes, all the resources the program provides are available to or generated for that mode to accomplish what it must. If the current mode can't respond (or, rather, doesn't respond) to the request, the user's request is passed on to the next object in the chain. Each object in the chain either responds to the input, or passes it on. Eventually some object responds to the user's input. N.B., that response may be to create an object (i.e., change modes) that can do what the user wants.
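The pass-it-on behavior described above is essentially a chain of responsibility. A portable sketch, with a made-up Message struct and made-up message IDs standing in for real window messages (none of these names come from the editor):

```cpp
#include <cassert>
#include <string>

// Portable stand-in for a window message; a real chain would pass
// HWND/UINT/WPARAM/LPARAM instead.
struct Message { int id; };

// Hypothetical message IDs for the sketch.
enum { MSG_PAINT = 1, MSG_TOGGLE_WIREFRAME = 2, MSG_EDIT_VERTS = 3 };

// Each link handles what it recognizes and forwards everything else.
class Handler {
public:
    explicit Handler(Handler* nextInChain) : next(nextInChain) {}
    virtual ~Handler() = default;
    virtual std::string Handle(const Message& msg)
    {
        // End of the chain: the single "default processing" fallback.
        return next ? next->Handle(msg) : std::string("default");
    }
private:
    Handler* next;
};

class AppLevel : public Handler {
public:
    AppLevel() : Handler(nullptr) {}
    std::string Handle(const Message& msg) override
    {
        if (msg.id == MSG_PAINT) return "app";
        return Handler::Handle(msg);
    }
};

class EditorLevel : public Handler {
public:
    explicit EditorLevel(Handler* next) : Handler(next) {}
    std::string Handle(const Message& msg) override
    {
        if (msg.id == MSG_TOGGLE_WIREFRAME) return "editor";
        return Handler::Handle(msg);
    }
};

class ModeLevel : public Handler {
public:
    explicit ModeLevel(Handler* next) : Handler(next) {}
    std::string Handle(const Message& msg) override
    {
        if (msg.id == MSG_EDIT_VERTS) return "mode";
        return Handler::Handle(msg);
    }
};
```

The returned string only reports which level responded; the point is that the mode sees every message first and exactly one fallback sits at the end.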

Imagine a customer in a store talking to a clerk whose name-tag says "I sell shoes." After looking at shoes, the customer says: "I want to look at socks." The smiling clerk silently ignores the request, but the store manager is listening to the conversation. In the blink of an eye, the manager puts a new clerk behind the counter. That clerk's name-tag changes to "I sell socks" and the display case changes from shoes to socks.

The Windows API window procedure is (essentially) an event-driver, in which events occur through menu selection, key-strokes and mouse manipulations by the user. The architecture that I'm experimenting with, rather than immediately responding to those events through a set of OnSomeEvent routines, instead calls a set of specialized routines that can most likely handle the user's input at the current moment (the current mode.) The current mode routine then examines the input and (potentially) responds. If that set of routines doesn't respond, the system replaces that set with a set of routines that does respond.

I'm definitely not claiming that this architecture is better (or worse) than event-driven programming. Likely due to my own inadequacy, event-driven code I've done previously ended up with switch statements and conditional blocks to determine what specific routines to invoke. Those specific routines often had to further determine current conditions to behave properly.

Adapting L. Spiro's gamestate architecture for an editing environment in this way is proving interesting, primarily the "feature" of the approach that, when a new mode is to be set, the current state is actually deleted, and the new mode which is created, by its own initialization, provides known conditions for its own routines. For me, that provides both discipline (what must a mode do before it's destroyed?), and freedom (don't worry what the previous mode did).
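A stripped-down sketch of that delete-and-recreate switch, under the assumption that the swap is deferred to the next Tick(). The Mode and Manager classes here are simplified, hypothetical stand-ins for VzEditMode and VzAppManager, and selectedCount is an invented example of persistent data.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Manager;

class Mode {
public:
    virtual ~Mode() = default;
    virtual void Init(Manager&) {}
    virtual void Destroy(Manager&) {}
    virtual std::string Name() const = 0;
};

class Manager {
public:
    // Only records the request; the swap waits for the next Tick().
    void SetNextMode(std::unique_ptr<Mode> mode) { pending = std::move(mode); }

    void Tick()
    {
        if (!pending) return;
        if (current) current->Destroy(*this);  // last chance to hand data over
        current = std::move(pending);           // the old mode object is deleted here
        current->Init(*this);                   // the new mode sets up known conditions
    }

    const Mode* Current() const { return current.get(); }

    int selectedCount = 0;  // example of data that must outlive any one mode

private:
    std::unique_ptr<Mode> current;
    std::unique_ptr<Mode> pending;
};

class ObjectMode : public Mode {
public:
    std::string Name() const override { return "object"; }
};

class VertexMode : public Mode {
public:
    void Init(Manager& mgr) override { selected = mgr.selectedCount; }
    void Destroy(Manager& mgr) override { mgr.selectedCount = selected; }
    std::string Name() const override { return "verts"; }
    int selected = 0;
};
```

Deferring the swap means a mode is never deleted while one of its own member functions is still on the call stack.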




Window message processing - each object gets a look at them

At the moment I have a hierarchy of objects:

class VzAppManager - base class for storing persistent data and providing support - GetGraphics(), GetWindow(), SetEditMenu(), etc.

class MeshEdit3Manager - inherits from VzAppManager. Explicit class for my mesh editor. I could have just expanded VzAppManager to handle everything but, as this is my first attempt at adapting L. Spiro's gamestate architecture to an editor, I want to keep classes reusable - e.g., VzAppManager.

class VzMsgProc - base class which has a function: virtual LRESULT MsgProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam);
Used for main window procedure dispatching. I.e.,

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    VzMsgProc* proc = (VzMsgProc*)GetWindowLongA(hWnd, GWL_USERDATA);
    if (proc != NULL) return proc->MsgProc(hWnd, message, wParam, lParam);
    ... // default processing
}

class VzEditMode - base class for editing modes. Similar to L. Spiro's gamestate. VzEditMode inherits from VzMsgProc so any mode can be at the top of the messaging chain. This class also has L. Spiro's virtual methods:

virtual void Init(VzAppManager *appManager) {}
virtual void Destroy(VzAppManager *appManager) {}
virtual void Tick(VzAppManager *appManager);
virtual void Draw(VzAppManager *appManager) {}
virtual LRESULT MsgProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam);

Now, for example, I have a class for editing the mesh object as a whole (not individual verts or faces).

class VzModeObject - inherits from VzEditMode and implements:

Init - gets the mesh pointer (may be null if nothing's loaded or the mesh was deleted) from appManager, and sets up the main window Edit menu and associated keyboard shortcuts for that menu. At the moment, the Edit menu for this mode is just "Verts V" -> go into vertex edit mode through the menu, or via the keyboard shortcut V.

Tick - calls the app manager's CameraHandleMouseWheel() function which zooms/unzooms the view. That's what I want the mouse wheel to do in this mode. Not implemented yet - TBD - check for user commands to translate/scale/rotate the mesh.

Draw - draws the mesh with the current user's options for wireframe/solidfill, lighting on/off, cull-backface/cull-none, etc.

MsgProc - at the moment, just checks if the window message is WM_COMMAND with a message ID of IDM_EDIT_MENUSTART - that's the message ID sent by the Edit menu "Verts" option or its accelerator (keyboard shortcut V). If that's the message, mgr->SetNextMode( VZMODE_EDITVERTS ) is called and VzAppManager sets a flag to change the mode at the next Tick() call in the main loop.

MsgProc then simply returns mgr->MsgProc(hwnd, message, wParam, lParam) to give the MeshEdit3Manager instance a chance at the window messages.

MeshEdit3Manager::MsgProc does bunches of editor specific stuff - checks for commands to exit, to change wireframe, culling and lighting states, stores keyboard input, stores flags and mouse positions for LMB button down/up, mouse moves, and mouse wheel state; and handles window resizing, making calls to Graphics() to resize the backbuffer, etc. [ Come to think of it, WM_EXIT should probably be handled by VzAppManager instead. --> Added to the TODO list. ]

The MeshEdit3Manager::MsgProc then returns VzAppManager::MsgProc.

VzAppManager::MsgProc (at the moment) handles just WM_PAINT and WM_DESTROY. If the message is handled, it returns 0. Otherwise it returns the one and only DefWindowProc call in the chain.

Of note:

case WM_DESTROY:
    // the current mode and its MsgProc will soon be deleted, so provide a MsgProc until the app is closed.
    GetMainWindow().SetMsgProc(this);
    PostQuitMessage(0);
    break; // eventually returns 0;

Next up, I'm thinking - setup VzMode_VertEdit - first part: how to display selected/unselected vertices.




Mode context menus and accelerators

I'm a fan of menus and accelerators (keyboard shortcuts) that depend on the context. I like to edit with one hand on the keyboard, and one hand on the mouse. However, I don't always remember what shortcuts are applicable in which mode, so I like the menu to show the shortcuts.

To implement menus and shortcuts, I reserved the second position in the window main menu for a mode-specific edit menu.

I added 2 variables to the VzEditMode class (from which all mode classes inherit):

VzEditMode() : hAccel(nullptr), bEditMenuSet(false) {}
...
HACCEL GetAccelerator() { return hAccel; }
...
bool bEditMenuSet;
HACCEL hAccel;

I also added 2 support functions to the app manager:

bool VzAppManager::SetEditMenu(std::vector<std::wstring> &menuStrings, WORD startMenuId)
HACCEL VzAppManager::CreateAccelerator(std::vector<ACCEL> modeTable)

and created a VzAcceleratorFactory class, which implements:

HACCEL VzAcceleratorFactory::CreateAccelerator(std::vector<ACCEL> userTable)

The SetEditMenu function attempts to create an HMENU from the menu strings, assigning command IDs serially starting with startMenuId. Then, if the window main menu already has an Edit menu in the second position (zero-based index 1), that "old" menu is removed and destroyed. The new mode Edit menu is then inserted into the main window menu at index 1.

The CreateAccelerator function takes a std::vector of ACCEL structures and calls the accelerator factory to create a new accelerator object which, by default, includes the persistent accelerators to which the mode accelerators are added.
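One way the merge inside such a factory could work is sketched below, using a portable Accel struct that mirrors the three fields of the Win32 ACCEL structure. The skip-on-clash policy, the struct, and the function names are assumptions made for illustration; the actual factory may simply append.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for the Win32 ACCEL structure (fVirt, key, cmd).
struct Accel {
    std::uint8_t  fVirt;  // modifier flags
    std::uint16_t key;    // character or virtual-key code
    std::uint16_t cmd;    // command ID sent with WM_COMMAND
};

static bool SameKey(const Accel& a, const Accel& b)
{
    return a.fVirt == b.fVirt && a.key == b.key;
}

// Persistent entries first, then the mode's entries. A mode entry that
// reuses a persistent key combination is skipped (an assumed policy).
std::vector<Accel> MergeAccelerators(const std::vector<Accel>& persistent,
                                     const std::vector<Accel>& mode)
{
    std::vector<Accel> merged = persistent;
    for (const Accel& m : mode) {
        bool clash = false;
        for (const Accel& p : persistent) {
            if (SameKey(p, m)) { clash = true; break; }
        }
        if (!clash) merged.push_back(m);
    }
    // On Windows, the merged array is what would be passed to
    // CreateAcceleratorTable.
    return merged;
}
```

Keeping the persistent entries in one place means a mode only has to list the shortcuts that are specific to it.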


When a new mode is set, the mode class has the option (in its Init call) to setup parameters for the Edit menu and/or accelerator table. E.g., the VzModeObject class sets up the menu and accelerators like so:

bool VzModeObject::SetModeMenuAndAccelerators()
{
    std::vector<std::wstring> menuStrings;
    if (nullptr != mesh) // is there a mesh to edit?
    {
        std::wstring str = L"&Verts\tV";
        menuStrings.push_back(str);
        str = L"&Faces\tF";
        menuStrings.push_back(str);
    }
    if (!mgr->SetEditMenu(menuStrings, IDM_EDIT_MENUSTART)) return false;
    if (nullptr != mesh)
    {
        std::vector<ACCEL> accelTable;
        ACCEL modeAccel[] = {
            0, WORD('v'), IDM_EDIT_MENUSTART,
            0, WORD('V'), IDM_EDIT_MENUSTART,
            0, WORD('f'), IDM_EDIT_MENUSTART + 1,
            0, WORD('F'), IDM_EDIT_MENUSTART + 1,
        };
        size_t numEntries = sizeof(modeAccel) / sizeof(ACCEL);
        for (size_t n = 0; n < numEntries; n++) accelTable.push_back(modeAccel[n]);
        hAccel = mgr->CreateAccelerator(accelTable);
    }
    return true;
}

The default, if the new mode does not create a new Edit menu, is just to remove the Edit menu from the main window. If no keyboard shortcuts are specified, the accelerator table is set to just the persistent defaults.

Then, in the run loop:

while (PeekMessage(&msg, 0, 0, 0, PM_REMOVE))
{
    //if (!TranslateAccelerator(msg.hwnd, hAccel, &msg))
    if (!TranslateAccelerator(msg.hwnd, appMgr->GetCurMode()->GetAccelerator(), &msg))
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
}

With the menu and accelerators set for each mode, things are becoming more context specific. I'm thinking the next step is for each mode to have a window message procedure to process mode specific commands. When a new mode is set, the main window GWL_USERDATA will be set to a pointer to VzEditMode, and the derived mode's MsgProc will process commands of interest (whether menu items or keyboard shortcuts), call the base class MsgProc to let generic mode responses be processed, and return through the app manager's MsgProc (if one is needed any more).




The Concept of A Mesh Editor

I've always loved the concept of skinned mesh - skeletal animation. The mathematical basis really appeals to me. I remember the first time I got a DX9 application working with Frank Luna's skinned mesh class as a basis, and seeing Tiny walking, waving, etc. I had examined the DXSDK skinned mesh example, of course, but had been rather confused at all the arrays for bone indices, matrices, etc., because that example generalizes to a hierarchy that supports multiple meshes with individual textures. Luna used a single mesh with a single texture.

I spent a lot of time playing with the code, and examining actual data - dumping matrices to the debug output, dumping bone names with associated parent and children names, converting quaternions to matrices, etc. Relating that to the shader code to figure out how it all worked together was, by far, the best time I've spent in my programming "career."

Once I had a basic understanding of allocating a hierarchy, mesh containers, animation keyframes, etc., I was hooked. The biggest blessing was the text format of X-files. Seeing the structure and its data in a file, and examining the hierarchy loaded into my app gave me a lot of opportunity to understand the concept of skinned mesh animation. That was also my introduction to HLSL and shaders as I'd previously used just the DX9 pipeline for drawing "stuff." Now that I'm into D3D11, I really miss D3DXLoadMeshHierarchyFromX, mesh->GenerateAdjacency(), ID3DXEffect, etc. I now have my own x-file loader, mesh/skeleton hierarchy class, animation controller class, SRV manager class, shader manager class, etc. I never got into D3D11 effects, as I wanted to understand what was going on "under the hood" with writing and compiling shaders, setting constant buffers, input layouts, etc.

Luckily I found a DirectX x-file exporter for Blender. That took a while to fully understand with regard to creating and exporting multiple animations. Lots of experimentation with settings. However, once again, the text format version of X-files was a life-saver.

I found adding, editing and weighting vertices in Blender to be a little difficult. Probably just my laziness with regard to learning the Blender interface better. However, that was an excuse to convert my DX9 mesh/animation editor (below) to D3D11.

I also use the Win32 interface extensively to make life easier. I learned the Windows API early on, writing assembly code and using interrupts, just to see what I could do. Then DX7, 8 and 9 came along and I went on to child windows to do what I wanted to do. The above app shows a child window for DX9 rendering, and two GDI child windows specific to animation editing. It worked (after a fashion) but further development was interrupted by a friend here on gamedev talking so much about D3D11, and posting pix of all the awesome shader tricks he was capable of. So I started on the road to D3D11 and my DX9 work ground to a halt.

I still have a ways to go to be really proficient in D3D11, but I've experimented a lot and learned enough to want to go back and try to create a mesh editor in D3D11. N.B., Blender can (probably) do everything I need, but I love creating my own tools and classes for my asset pipeline. Also, an editor (versus a real-time game) has a tremendous advantage - timing and efficiency is not nearly as critical as it is for games. E.g., drawing a curve for a bone animation using GDI calls can be really S - L - O - W. However, for me anyway, anything on the order of 15 to 30 milliseconds for the rendering loop is just fine.

The Concept

When I start a new project, I usually start out with pad and pencil and write out in words what I want the result or features of the program to be. Then I make a list of things in general terms that may be necessary to learn about, or difficulties I've run into in other projects that need to be "fixed."

One of the things I want to "fix" in the D3D11 editor is to eliminate the horrendous use of switch statements and booleans I have in my DX9 editor for function dispatching. That is, that app is a mega-class that handles everything. When the loop gets a WM_LBUTTONDOWN message, I have to determine what the current operation is to determine how to handle it. Is the user selecting something? Is it the start of a drag? What should be selected - a vertex, a face, a bone? What data needs to be displayed? Really ugly, and designed to ensure the wrong thing will always happen when the code is edited. In addition, the Undo/Redo stack was completely FUBAR. Saving the current state based on what was going to be done, had been done and what would be needed to restore the state...

Having promised myself that someday I really should organize my code, I started looking around at candidates for program architecture. I ran across L. Spiro's approach to game states, and it interested me. I started thinking in terms of modes in the editor, each mode defining its own response to user input. E.g., when in object mode, WM_LBUTTONDOWN means the user wants to start translating the entire object. In vertex editing mode, the user is selecting/deselecting a vertex. The actions are well-defined, and the Undo/Redo data is clear.

That leads to the common concept of the window procedure checking to see if GWL_USERDATA is non-null, and (if not null) casting the long as a class pointer and dispatching to the class MsgProc procedure. If each mode inherits from class VzMsgProc (which has a virtual MsgProc function), response to user input is easily redirected to the mode by setting GWL_USERDATA to a pointer to the instance of the mode class when the current mode is changed. That's appealing to me.

So, at present, I have an app class (for app initialization and the run loop), an app manager class (which maintains persistent data such as pointers to instances of the graphics, main window, shader-manager and SRV-manager classes, as well as utility functions), and several mode classes (each of which is passed a pointer to the instance of the app manager). The architecture is shamelessly based on L. Spiro's concept and works very nicely.

Each mode can be written to do specific tasks without the need to worry about interfering with another mode's processes, as only one mode exists at a time! When a new mode is invoked, the previous mode passes persistent data as needed to the app manager before it's destroyed, and a new instance for the requested mode is created and becomes the current mode.

It took a day or two to get the architecture set up properly. I had to go back to basics when designing the classes to avoid circular dependencies - i.e., declaring (not defining) classes in headers. I had gotten into the (very bad) habit of throwing #include's into each class header that needed to know about another class. Mea culpa, mea culpa.

However, with that setup, it's relatively easy to design another mode and add it (somewhat) seamlessly to the editor.

My next trick, only because I discovered the possibility and want to try it out, is to generate keyboard accelerators for each mode. That is, keypresses in each mode are context sensitive - "X" may mean one thing while editing a vertex, and another when extruding a face.

/////////////////////////////// an experiment for creating custom accelerator tables ////////////////
ACCEL modeAccel[] = {
    FALT, WORD('c'), IDM_VIEW_CULL_TOGGLE,
    FALT, WORD('C'), IDM_VIEW_CULL_TOGGLE,
    0, WORD('l'), IDM_VIEW_LIGHTING_TOGGLE,
    0, WORD('L'), IDM_VIEW_LIGHTING_TOGGLE,
    0, WORD('w'), IDM_VIEW_WIREFRAME_TOGGLE,
    0, WORD('W'), IDM_VIEW_WIREFRAME_TOGGLE,
};
int numEntries = sizeof(modeAccel) / sizeof(ACCEL);
hAccel = CreateAcceleratorTable(modeAccel, numEntries);

In the app's main loop:

// If there are Window messages then process them.
while (PeekMessage(&msg, 0, 0, 0, PM_REMOVE))
{
    if (!TranslateAccelerator(msg.hwnd, appManager->GetCurMode()->hAccel, &msg))
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
}

Gotta do some thinking about generalizing the command identifiers, maybe as simple as IDM_C_ALT and IDM_W.


