
PPP - Pixel Perfect Picking - the Face Selection part



A previous blog entry discusses the concept of using a second render target to store mesh data during rendering as an approach for picking vertices and faces. That entry didn't go into any detail with regard to picking faces (triangles).


It should be known, and bears repeating from previous blogs, that the concept of pixel-perfect-picking I use is from the genius of gamedev member unbird. His patience while I worked out the kinks in my code is acknowledged and appreciated.

In Any Case
I recently added a face-edit mode to my editor and, because of the work done previously with the PPP render target to pick vertices, I was pleasantly surprised at how much easier it was to pick faces using the same technique.


Much of the code below is the result of experimenting with D3D11, seeing what can be done, and how to do it. It can all probably be improved. However, the purpose is to illustrate how a second render target can be used in a fairly simple way to eliminate hundreds or thousands of picking calcs for an animated mesh.

One of the primary benefits (for me, anyway) of using a second render target to store skinned-mesh-related data to support picking is the avoidance of doing all those loops and calculations to pick a face. I.e., having to unproject mouse coordinates using a bunch of animation matrices, calculating bone-weighted vertex positions, checking for a triangle hit, etc.
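For contrast, the CPU-side work being avoided looks roughly like the sketch below. It assumes the vertex positions have already been skinned on the CPU and the mouse position has already been unprojected into a ray, which are the expensive steps for an animated mesh. The names (Vec3, RayHitsTriangle, PickFaceOnCpu) are hypothetical, and the Moller-Trumbore ray/triangle test is used only as a common stand-in for "checking for a triangle hit".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Moller-Trumbore ray/triangle test. On a hit, t is the distance along the ray.
bool RayHitsTriangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& t)
{
    const float kEps = 1e-7f;
    const Vec3 e1 = Sub(v1, v0), e2 = Sub(v2, v0);
    const Vec3 p = Cross(dir, e2);
    const float det = Dot(e1, p);
    if (std::fabs(det) < kEps) return false;      // ray is parallel to the triangle
    const float inv = 1.0f / det;
    const Vec3 s = Sub(orig, v0);
    const float u = Dot(s, p) * inv;              // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = Cross(s, e1);
    const float v = Dot(dir, q) * inv;            // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    t = Dot(e2, q) * inv;
    return t > kEps;                              // only hits in front of the ray origin
}

// Tests every triangle and returns the index of the nearest face hit, or -1.
// skinnedPos is assumed to hold positions AFTER bone weighting for the current frame.
int PickFaceOnCpu(const std::vector<Vec3>& skinnedPos,
                  const std::vector<unsigned>& indices,
                  Vec3 rayOrigin, Vec3 rayDir)
{
    assert(indices.size() % 3 == 0);
    int best = -1;
    float bestT = 0.0f;
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3)
    {
        float t = 0.0f;
        if (RayHitsTriangle(rayOrigin, rayDir, skinnedPos[indices[i]],
                            skinnedPos[indices[i + 1]], skinnedPos[indices[i + 2]], t)
            && (best < 0 || t < bestT))
        {
            best = int(i / 3);
            bestT = t;
        }
    }
    return best;
}
```

Every pick walks every triangle of every candidate mesh, and that is before counting the cost of skinning the vertices on the CPU. The render target approach replaces all of it with one extra output from a shader that is running anyway.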

During the process of rendering an animated skinned-mesh with bone-weighted vertices, all those calculations are done in the shader anyway. When the user clicks the mouse to select a face, a second render target is set to which mesh data is to be written, a pixel shader is set which has the same input signature** as the "regular" skinning shader, and one cycle of rendering is done without any other changes.

** Well, the pixel shader input includes uint faceID : SV_PrimitiveID in addition to the "regular" input struct.

A Few Details

My graphics object includes:
ComPtr<ID3D11Texture2D> g_pIDRenderTarget;
ComPtr<ID3D11RenderTargetView> g_pIDRenderTargetView;
ComPtr<ID3D11Texture2D> g_pIDStagingRenderTarget;
The graphics Resize(...) routine, primarily used for window resizing, handles buffer size related changes that most D3D11 programmers are familiar with - unbinding render targets, resizing the swap chain buffers, (re)creating the primary render target view, etc. That routine is called during the graphics object Init(...) routine.

Because all the ingredients are available at that point, g_pIDRenderTarget, the view for that 2nd texture, and a staging buffer for that 2nd texture are created there as well. As a result of unbird's comment below, and other help through PMs, the 2nd texture format of DXGI_FORMAT_R32G32B32A32_UINT is used.

See unbird's comments below - when the data written to a rendertarget is expected to remain unchanged - turn blending OFF!

To support writing mesh data as described above, three simple routines are provided:

void VzGraphics::SetIDRenderTargetView() // Prep the pipeline for a second render target
{
    float clr[4] = { 0, 0, 0, 0 };
    g_pImmediateContext->ClearRenderTargetView(g_pIDRenderTargetView.Get(), clr);
    ID3D11RenderTargetView *tmpPtr[2] = { g_pRenderTargetView.Get(), g_pIDRenderTargetView.Get() };
    g_pImmediateContext->OMSetRenderTargets(2, tmpPtr, g_pDepthStencilView.Get());
}

void VzGraphics::SetDefaultRenderTargetView() // Set the pipeline back to "normal"
{
    ID3D11RenderTargetView* tmpPtr[2] = { g_pRenderTargetView.Get(), nullptr };
    g_pImmediateContext->OMSetRenderTargets(2, tmpPtr, g_pDepthStencilView.Get());
}

ID3D11Texture2D *VzGraphics::GetIDRenderTexture() // provide a copy of the data
{
    g_pImmediateContext->CopyResource(g_pIDStagingRenderTarget.Get(), g_pIDRenderTarget.Get());
    return g_pIDStagingRenderTarget.Get();
}

That's all there is on the graphics end of things.

The skinned-mesh object has a routine to draw (indexed) the mesh(es). After updating the animations and calculating all the data needed by the shaders, constant buffers and resources are set. One of those constant buffers includes the mesh ID. If your mesh object has only a single mesh to render, that's unnecessary. My object supports multiple meshes, so I need to know which mesh is rendered. That skinned-mesh draw routine also takes a flag (bool bLButtonDown) indicating whether picking is to be supported or not. If so, the only difference is that the picking pixel shader is set.

That shader looks like so:

struct PS_OUTPUT
{
    float4 Color : SV_Target0;
    uint4 FaceInfo : SV_Target1; // uint4 (not float4) to match the UINT render target
};

PS_OUTPUT PS_FaceID(PS_INPUT input, uint primID : SV_PrimitiveID)
{
    PS_OUTPUT output;
    output.Color = txDiffuse.Sample(samLinear, input.Tex) * input.Color;
    // the commented-out code is applicable to an original texture format of R8G8B8A8_UNORM
    // After changing the format to R32G32B32A32_UINT, the simpler code following is used
    ///////////// old ///////////////////////
    // uint id0 = uint(primID / 255.0f);
    // uint id1 = primID - id0 * 255;
    // the mesh ID is buried in the lighting stuff because there's room, and it's convenient
    // output.FaceInfo = float4(id1/255.0f, id0/255.0f, bLighting.z/255.0f, 123.0f/255.0f);
    ///////////// end of old /////////////////
    output.FaceInfo = uint4((uint)bLighting.z, primID, 0, 123);
    return output;
}

[s]Note that HOW the data is stored in the second render target is tied to the texture format. I use an R8G8B8A8 (or B8G8R8A8 sometimes) format, similar to the backbuffer texture, so the face ID, which may exceed 256, is divided into 2 bytes and stored in 2 components of the pixel. The alpha component (123.0f / 255.0f ) is used as a flag to indicate the pixel was written.[/s]

Because a 32bit/component render target texture is used, the needed data (mesh ID, face ID and the "written" flag) fits directly into a single pixel of the second render target, with no packing required.
With all those pieces in place, when the user clicks the mouse to select a face, the sequence is:
bool bLButtonDown = true;
...
graphics->SetBlend( false ); // turn blending OFF to ensure the data in the ID rendertarget remains unchanged
graphics->SetIDRenderTargetView();
meshObject->Render( ... , bLButtonDown, ... );
graphics->SetDefaultRenderTargetView();
graphics->SetBlend( ...(previous state)... );

To see what (if any) face should be picked:

ID3D11Texture2D *tex = graphics->GetIDRenderTexture();
if (nullptr == tex)
{
    std::cout << "EditFaces::PickFace - id render texture null.\n";
    return; // nothing to read from
}
D3D11_TEXTURE2D_DESC td;
tex->GetDesc(&td);
// describe a 1x1 staging texture with the same format as the ID render target
D3D11_TEXTURE2D_DESC ttd = {};
ttd.Usage = D3D11_USAGE_STAGING;
ttd.Width = 1;
ttd.Height = 1;
ttd.ArraySize = 1;
ttd.Format = td.Format;
ttd.SampleDesc = td.SampleDesc;
ttd.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
HRESULT hr = device->CreateTexture2D(&ttd, nullptr, pickTex.ReleaseAndGetAddressOf());
if (FAILED(hr))
{
    std::cout << "Failed to create staging texture to pick face.\n";
    return;
}
// create a box for the mouse pick screen position
D3D11_BOX pBox = { pickPt.x, pickPt.y, 0, pickPt.x + 1, pickPt.y + 1, 1 };
context->CopySubresourceRegion(pickTex.Get(), 0, 0, 0, 0, tex, 0, &pBox);
D3D11_MAPPED_SUBRESOURCE mr = {};
hr = context->Map(pickTex.Get(), 0, D3D11_MAP_READ, 0, &mr);
if (FAILED(hr))
{
    std::cout << "Failed to map staging texture to pick face.\n";
    return;
}
//////////// old //////////
// BYTE *picData = (BYTE*)mr.pData; // see comments regarding change in texture format
//////////// end old //////
uint32_t *picData = (uint32_t*)mr.pData;
bool picked = (int(picData[3]) == 123); // check the alpha "flag"
int meshNum = int(picData[0]);
// int faceNum = int(picData[1]) * 255 + int(picData[2]);
int faceNum = int(picData[1]);
// unmap before any return so the staging texture is not left mapped
mgr->Resources().Graphics().Context()->Unmap(pickTex.Get(), 0);
if (!picked)
{
    // announce no pick
    return;
}
// do with meshNum and faceNum what you will
std::cout << "\nFace picked: " << "mesh: " << meshNum << " face: " << faceNum << "\n";

That routine is for illustration. It's unnecessary to recreate the staging texture for every pick. Just wanted to show that the texture must be the same format, etc. And - it's only one pixel - thus PPP (Pixel Perfect Picking). Using CopySubresourceRegion is just a lazy approach to avoid doing the pitch calcs to locate the pixel myself. And I use uint32_t to match the component size. Use your favorite (and, perhaps, better) coding practices.
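For anyone who would rather do the pitch calcs than use CopySubresourceRegion, a minimal sketch of that lookup might look like the following. It assumes the full-size staging texture has been mapped for reading. The names PickResult and ReadPickPixel are hypothetical and not taken from the editor code. The data and rowPitch parameters stand in for the pData and RowPitch members of D3D11_MAPPED_SUBRESOURCE, and the pixel layout is the one the picking shader writes (mesh ID, face ID, 0, and the 123 flag).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type for illustration.
struct PickResult
{
    std::uint32_t mesh;
    std::uint32_t face;
};

// Reads the pixel at (x, y) from mapped 4 x 32-bit UINT texture data.
// Returns false if the pixel was not written by the picking shader.
bool ReadPickPixel(const void* data, std::uint32_t rowPitch,
                   std::uint32_t x, std::uint32_t y, PickResult& out)
{
    assert(data != nullptr);
    const std::uint32_t kFlag = 123;                          // "pixel was written" marker
    const std::size_t kBytesPerPixel = 4 * sizeof(std::uint32_t);

    // Rows are rowPitch bytes apart, which can be larger than width * 16,
    // so the row offset must use the pitch rather than the texture width.
    const std::uint8_t* row =
        static_cast<const std::uint8_t*>(data) + std::size_t(y) * rowPitch;

    std::uint32_t px[4];
    std::memcpy(px, row + std::size_t(x) * kBytesPerPixel, sizeof(px));

    if (px[3] != kFlag) return false;                         // nothing rendered here
    out.mesh = px[0];
    out.face = px[1];
    return true;
}
```

The caller would still be responsible for the Map and Unmap calls, and for keeping x and y inside the texture bounds.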

Other Possible Uses

Ashaman73 has elsewhere discussed "proximity" picking. Sometimes picking something in the vicinity of the mouse position, but perhaps not directly under the mouse, is desirable. E.g., picking an edge in a "pixel-perfect" manner could be downright difficult. Using techniques similar to those described above, create a pick box around the mouse position of an appropriate size (say, 16x16 pixels). Examine the results and select data closest to center, etc.
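A minimal sketch of that proximity idea, assuming a square block of pixels around the mouse position has already been copied to a staging texture and mapped. ProximityHit and PickNearestToCenter are hypothetical names. The data and rowPitch parameters again stand in for the mapped pData and RowPitch, and the pixel layout is assumed to be the same mesh ID / face ID / 0 / 123 layout used for face picking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type for illustration.
struct ProximityHit
{
    std::uint32_t mesh;
    std::uint32_t face;
    int dx, dy;            // offset of the chosen pixel from the box center
};

// Scans a size x size block of 4 x 32-bit UINT pixels and reports the
// written pixel closest to the center of the block. Returns false if no
// pixel in the block carries the "written" flag.
bool PickNearestToCenter(const void* data, std::uint32_t rowPitch, int size,
                         ProximityHit& out)
{
    assert(data != nullptr && size > 0);
    const std::uint32_t kFlag = 123;
    const std::size_t kBytesPerPixel = 4 * sizeof(std::uint32_t);
    const int center = size / 2;
    long bestDist2 = -1;                       // squared distance of best hit so far

    for (int y = 0; y < size; ++y)
    {
        const std::uint8_t* row =
            static_cast<const std::uint8_t*>(data) + std::size_t(y) * rowPitch;
        for (int x = 0; x < size; ++x)
        {
            std::uint32_t px[4];
            std::memcpy(px, row + std::size_t(x) * kBytesPerPixel, sizeof(px));
            if (px[3] != kFlag) continue;      // pixel not written by the shader

            const long dx = x - center, dy = y - center;
            const long dist2 = dx * dx + dy * dy;
            if (bestDist2 < 0 || dist2 < bestDist2)
            {
                bestDist2 = dist2;
                out = { px[0], px[1], int(dx), int(dy) };
            }
        }
    }
    return bestDist2 >= 0;
}
```

With a 16x16 block the loop touches 256 pixels, which is still trivial next to per-triangle tests. Ties between equally distant pixels go to whichever is scanned first.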

Something similar could be done for edge selection if the user is allowed to drag the mouse a short distance to select an edge. Render the mesh in wireframe and check for edge-related data at the mouse position. I'm currently in process of testing that use.


Recommended Comments

Much flatter. Very honour. So blush. Wow.


Genius ? Nah, I just like to tickle my GPU. :lol:


For other readers, one point we discussed via PM: Don't forget that there are other formats for output, e.g. R32_UInt with which you can drop the conversion/packing completely.


I couldn't get DXGI_FORMAT_R32_UINT to work (as mentioned in a PM). With the rendertarget in slot 0 set to R8G8B8A8_UNORM, and the R32_UINT view set to slot 1, I get a warning that "SV_Target1 has type float..." so it doesn't like the UINT texture.


However, with the second render target texture set to DXGI_FORMAT_R8G8B8A8_UNORM, it all works as advertised using the better shader code unbird provides. See the revision in the blog.


Nope. I was editing the wrong pixel shader.


The warning noted above resulted from (as mentioned) editing the wrong pixel shader. Changing the SV_Target1 output to uint4 (vs. float4) resolved the warning.

Admittedly, I did confuse Buckeye - and likely readers here - suggesting a single channel target. Use a target with more channels if needed.

Another thing to keep in mind when using integer type targets is blending: integers can't blend, so for these targets blending must be disabled (use [tt]IndependentBlendEnable = true[/tt] and configure the individual [tt]D3D11_RENDER_TARGET_BLEND_DESC[/tt] accordingly).

"for these targets blending must be disabled"



Excellent point! Actually, for any rendertarget of any type where the values are expected to be read from the texture as written in the shader - turn blend OFF! I'll add that to the blog. I can imagine situations where analog (vs. integer) data may be useful to capture also.


Regarding the "confusion" - not at all. I had originally fixated on a 32bit texture without thinking.


It'd be interesting if the same method is also feasible during gameplay.


I avoided the "color pick" method all along since it means transferring memory from GPU to CPU and also flushing the card's render buffers (no rendering ahead anymore). While this may be fine in an editor, for a game it might mean hiccups in the frame rate. Any experiences with that?


Sorry. I haven't profiled it, and I haven't tried it yet in a gameplay situation, so I can't provide any timing data. However, as described here, it's done for just a single frame. In that single frame situation, with a well-designed render loop (i.e., fixed timestep), it should have no effect.


It would also depend on what picking is being done. Compared to traditional face picking methods (in particular), and particularly where multiple meshes may be involved, I would think copying a single pixel from a single buffer to get a hit must be faster than culling and doing calcs for (perhaps) thousands of triangles in multiple meshes.


I'm not sure I understand your comment/concern regarding "flushing" render buffers. Can you elaborate?


It's quite common to batch render calls and send them off to the GPU to handle it. In this way the card may actually render ahead one or even more frames.


When you now read something from the card all the buffered calls need to be handled at once (= flushed) to be able to read the wanted pixel. This highly depends on how you handle rendering, and I doubt it's a problem for most games. It'd still be interesting though.


Ah, yes. Didn't think about batching/deferred rendering. Good comment.

