As requested in a previous journal entry, I'm writing up my newfound debugging trick. At the beginning of March I had the pleasure of visiting Microsoft's Direct3D team over in Redmond, and during one of the discussions they suggested using Stream Out (SO) for debugging. I'd never even contemplated using a Geometry Shader with Stream Out as a log file before!
I've implemented this technique for debugging Direct3D 11 tessellation code, so the code I present below is for D3D11, but porting it to D3D10 should be pretty straightforward: for this usage of stream output there haven't really been any big changes between 10, 10.1 and 11.
What do you get back?
Simply put, you can pull back a BLOB containing a subset of the output from your geometry shader. You can then map this buffer, cast it to an equivalent C/C++ struct and read the data that was returned. In a nutshell, it allows you to perform 'printf'-style debugging of your geometry processing.
Its usefulness is debatable in two respects:
- We have PIX for Windows, which provides a much more intuitive interface.
- It can generate a huge amount of data that may not be practical to debug.
However, in the context of Direct3D 11, where the tools aren't yet mature, it is a very useful and very powerful technique.
More generally, you may find it useful because you can output this data alongside whatever app-specific data was used to generate the draw call, which lets you correlate and cross-reference cause and effect. Given how separate PIX is from your application, this is a definite plus-point.
Show me the money!
There are four basic steps:
- Modify your shaders
  You need to add and generate any additional fields that you want to output. For example, you might not normally output the world-space position, but in this case you may well want to include it.
- Change how you generate your GS
  You need to use slightly different API methods for constructing your ID3D11GeometryShader (or ID3D10GeometryShader).
- Create a buffer to capture the output
  Quite simple really: you can't output data to nowhere!
- Decode the results
  Once you've finished rendering you need to put this newfound source of information to good use.
First-up, modify your shaders.
As you should already know, Direct3D uses a "pass forward" mechanism: anything you want at a later stage must be passed through from an earlier stage, and if it isn't, there is no way to get it back again. This is the crucial detail. If you want intermediate values from your vertex shader you need to output them and, in the case of D3D11, carry them through the HS/DS stages as well.
Take the following Direct3D 11 Geometry Shader:
struct DS_OUTPUT
{
    float4 position : SV_Position;
    float3 colour   : COLOUR;
    float3 uvw      : DOMAIN_SHADER_LOCATION;
    float3 wPos     : WORLD_POSITION;
};

[maxvertexcount(3)]
void gsMain( triangle DS_OUTPUT input[3], inout TriangleStream<DS_OUTPUT> TriangleOutputStream )
{
    // Pass the triangle through untouched
    TriangleOutputStream.Append( input[0] );
    TriangleOutputStream.Append( input[1] );
    TriangleOutputStream.Append( input[2] );
    TriangleOutputStream.RestartStrip();
}
The GS operates entirely as a pass-through, so using it will not actually change the behaviour of your application. In particular, note the DS_OUTPUT struct: in the next step we will choose which of its elements we want to be made available to the application.
An important point is that your pixel shader need not be changed, provided the order of elements is preserved and its input is still a strict subset of the GS output. In the above example the input I provide to the pixel shader only expects the second element, float3 colour : COLOUR, and ignores everything else. Thus a simple design methodology is to just append any new SO-specific fields to the end of each struct that you pass from stage to stage.
Second-up is to modify how you create your geometry shader.
In either Direct3D 11 or 10 you need to call CreateGeometryShaderWithStreamOutput() instead of CreateGeometryShader(), which is pretty straightforward, except that you also need to provide an array of D3D11_SO_DECLARATION_ENTRY or D3D10_SO_DECLARATION_ENTRY structures (depending on which version you use):
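// Fields: Stream, SemanticName, SemanticIndex, StartComponent, ComponentCount, OutputSlot
D3D11_SO_DECLARATION_ENTRY soDecl[] =
{
    { 0, "COLOUR",                 0, 0, 3, 0 },
    { 0, "DOMAIN_SHADER_LOCATION", 0, 0, 3, 0 },
    { 0, "WORLD_POSITION",         0, 0, 3, 0 }
};

UINT stride = 9 * sizeof(float); // *NOT* sizeof the above array!
UINT elems  = sizeof(soDecl) / sizeof(D3D11_SO_DECLARATION_ENTRY);

// pGSBlob (the compiled gsMain bytecode) and g_pGeometryShader are placeholder names
hr = g_pd3dDevice->CreateGeometryShaderWithStreamOutput(
        pGSBlob->GetBufferPointer(), pGSBlob->GetBufferSize(),
        soDecl, elems,
        &stride, 1,   // one output buffer, so one stride
        0,            // rasterize stream 0 so the pixel shader still runs
        NULL, &g_pGeometryShader );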
There are three things to pay attention to:
- The semantic name: this must match one of the entries in your HLSL shader. Note that this declaration outputs the last 3 of the 4 elements declared in the previous snippet.
- The start component and component count: these are the 4th and 5th fields, 0 and 3 in every row here. For vector types, such as the float3s in the above HLSL, you state which component to start reading from (0=x, 1=y, 2=z, 3=w) and then how many components to read (which implies the last one read). For a float3, a 0,3 declaration means ALL components, but if it were a float4 the 'w' component would not be streamed out.
- The element stride: The call to CreateGeometryShaderWithStreamOutput() needs to know the stride of a streamed out structure. Not exactly hard to compute, but easy to mistake it for the size of the soDecl array!
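The stride is simply the total number of streamed components multiplied by four bytes. The sketch below checks that arithmetic in plain C++. `SoEntry` and `SoStrideBytes` are hypothetical names: `SoEntry` just mirrors the fields of `D3D11_SO_DECLARATION_ENTRY` so it compiles without the Direct3D headers, and the helper assumes every streamed component is a 32-bit float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of D3D11_SO_DECLARATION_ENTRY (no Direct3D headers needed).
struct SoEntry {
    unsigned    stream;          // D3D11 only; D3D10 has no such field
    const char* semanticName;
    unsigned    semanticIndex;
    uint8_t     startComponent;  // 0 = x, 1 = y, 2 = z, 3 = w
    uint8_t     componentCount;  // components read, starting at startComponent
    uint8_t     outputSlot;
};

// Bytes written per vertex to 'slot', assuming every component is a 32-bit float.
unsigned SoStrideBytes(const SoEntry* decl, std::size_t count, uint8_t slot) {
    unsigned components = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // The component window must stay inside one 4-component register.
        assert(decl[i].startComponent + decl[i].componentCount <= 4);
        if (decl[i].outputSlot == slot)
            components += decl[i].componentCount;
    }
    return components * static_cast<unsigned>(sizeof(float));
}
```

Fed the three rows from the declaration above, this yields 36 bytes for slot 0, matching `9 * sizeof(float)` and, as the warning says, not matching `sizeof(soDecl)`.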
The only difference for D3D10 is that D3D11 introduces the concept of a 'stream' field. We're not using it here, so for D3D10 you'd just drop the first integer from each row of the above.
Thirdly, you need to create a buffer to write the output to.
This is pretty much identical to how you create vertex and index buffers for rendering, with two caveats: you need two buffers, one GPU-writeable and one CPU-readable, and you don't provide any initial data as you would for a VB or IB.
D3D11_BUFFER_DESC soDesc;
soDesc.BindFlags           = D3D11_BIND_STREAM_OUTPUT;
soDesc.ByteWidth           = 10 * 1024 * 1024; // 10 MB
soDesc.CPUAccessFlags      = 0;
soDesc.Usage               = D3D11_USAGE_DEFAULT;
soDesc.MiscFlags           = 0;
soDesc.StructureByteStride = 0;

if( FAILED( hr = g_pd3dDevice->CreateBuffer( &soDesc, NULL, &g_pStreamOutBuffer ) ) )
{
    /* handle the error here */
    return hr;
}

// Simply re-use the above struct for the CPU-readable staging copy
soDesc.BindFlags      = 0;
soDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
soDesc.Usage          = D3D11_USAGE_STAGING;

if( FAILED( hr = g_pd3dDevice->CreateBuffer( &soDesc, NULL, &g_pStagingStreamOutBuffer ) ) )
{
    /* handle the error here */
    return hr;
}
You can't call Map() on a D3D11_USAGE_DEFAULT buffer (or the D3D10 equivalent), and you can't bind a D3D11_CPU_ACCESS_READ resource as a pipeline output, so you create one of each and later copy from the GPU-writeable buffer to the CPU-readable one in order to get at the data.
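With a 10 MB buffer and the 36-byte stride from earlier, it's worth knowing roughly how much geometry you can log before the buffer fills up. A small sketch, assuming triangle output; `SoCapacity` and `CapacityOf` are hypothetical helpers. Stream output writes lists (strips are unrolled), so each triangle costs three records.

```cpp
#include <cassert>
#include <cstdint>

// How many whole records (vertices) and triangles fit in a stream-output buffer.
struct SoCapacity {
    uint32_t vertices;
    uint32_t triangles;
};

SoCapacity CapacityOf(uint32_t byteWidth, uint32_t strideBytes) {
    assert(strideBytes > 0);
    SoCapacity c;
    c.vertices  = byteWidth / strideBytes;  // leftover bytes at the end go unused
    c.triangles = c.vertices / 3;           // triangle lists: three records each
    return c;
}
```

For `CapacityOf(10 * 1024 * 1024, 36)` that works out at 291271 records, or 97090 complete triangles, which is plenty for a single debugging draw call.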
With the two buffers created, the final step is to make sure you bind the output before issuing the draw call:
UINT offset = 0;
g_pContext->SOSetTargets( 1, &g_pStreamOutBuffer, &offset );
In D3D10 it'll be on the device rather than a context, but that's a trivial difference.
Finally you'll obviously want to read back the results!
// Copy from the GPU-writeable buffer into the CPU-readable staging buffer
g_pContext->CopyResource( g_pStagingStreamOutBuffer, g_pStreamOutBuffer );

D3D11_MAPPED_SUBRESOURCE data;
if( SUCCEEDED( g_pContext->Map( g_pStagingStreamOutBuffer, 0, D3D11_MAP_READ, 0, &data ) ) )
{
    struct GS_OUTPUT
    {
        D3DXVECTOR3 COLOUR;
        D3DXVECTOR3 DOMAIN_SHADER_LOCATION;
        D3DXVECTOR3 WORLD_POSITION;
    };

    GS_OUTPUT *pRaw = reinterpret_cast< GS_OUTPUT* >( data.pData );

    /* Work with the pRaw[] array here */
    // Consider StringCchPrintf() and OutputDebugString() as simple ways of printing
    // the above struct, or use the debugger and step through.

    g_pContext->Unmap( g_pStagingStreamOutBuffer, 0 );
}
All of the above is executed after you've issued the draw call. The first line handles the previously mentioned case where GPU-writeable and CPU-readable can't be the same resource.
You need to be a bit careful with the struct you cast the pointer to - if you've any fancy alignment or padding in your application the C/C++ struct may not exactly match the binary representation in the buffer!
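One way to sidestep the alignment problem is to mirror the record with plain `float` arrays, let `static_assert` check it against the 36-byte stride at compile time, and copy each record out with `memcpy` rather than casting the mapped pointer. A sketch under those assumptions: `GsRecord`, `ReadRecord` and `FormatRecord` are hypothetical names, and `std::snprintf` stands in for `StringCchPrintf()` so it builds outside Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Plain-float mirror of the streamed record: three float3s, no padding.
struct GsRecord {
    float colour[3];
    float domainLocation[3];
    float worldPosition[3];
};
// Fails to compile if the layout ever stops matching the SO stride.
static_assert(sizeof(GsRecord) == 9 * sizeof(float), "GsRecord must match the SO stride");

// Copies record 'index' out of the mapped bytes; memcpy avoids alignment/aliasing traps.
GsRecord ReadRecord(const void* mappedData, std::size_t index) {
    assert(mappedData != nullptr);
    GsRecord r;
    std::memcpy(&r,
                static_cast<const unsigned char*>(mappedData) + index * sizeof(GsRecord),
                sizeof(GsRecord));
    return r;
}

// Formats one record as a log line, e.g. to hand to OutputDebugStringA().
std::string FormatRecord(std::size_t index, const GsRecord& r) {
    char line[256];
    std::snprintf(line, sizeof(line),
                  "[%zu] uvw=(%.3f, %.3f, %.3f) wPos=(%.3f, %.3f, %.3f)\n",
                  index,
                  r.domainLocation[0], r.domainLocation[1], r.domainLocation[2],
                  r.worldPosition[0], r.worldPosition[1], r.worldPosition[2]);
    return line;
}
```

Inside the `Map()` block you would loop over the records with `ReadRecord( data.pData, i )` and print each one, with the loop bound coming from the queries discussed next.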
How much data is returned?
This is a subtle but rather important question!
In a conventional pipeline where you only use a pass-through GS, you can work out the amount of data in your SO buffer from the draw call parameters alone, but if you use the GS to amplify or discard geometry, or use the tessellation features in D3D11, it becomes non-trivial.
When you're not entirely sure how many invocations there might be you need to resort to queries.
The exact mechanism varies between the APIs, but the basic idea is to begin a query before the draw call, end it immediately after, and then grab the result(s), which tell you how much data to expect.
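In my code I use the D3D11_QUERY_PIPELINE_STATISTICS query, which fills in a D3D11_QUERY_DATA_PIPELINE_STATISTICS struct whose GSPrimitives field holds the number of primitives output by the geometry shader. The equivalent D3D10_QUERY_PIPELINE_STATISTICS query should work with Direct3D 10.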
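// When initializing/loading
D3D11_QUERY_DESC queryDesc;
queryDesc.Query     = D3D11_QUERY_PIPELINE_STATISTICS;
queryDesc.MiscFlags = 0;
if( FAILED( hr = g_pd3dDevice->CreateQuery( &queryDesc, &g_pDeviceStats ) ) )
{
    return hr;
}

// When rendering
g_pContext->Begin( g_pDeviceStats );
g_pContext->DrawIndexed( 3, 0, 0 ); // one triangle only
g_pContext->End( g_pDeviceStats );

// GetData() returns S_FALSE until the result is ready; stop on S_OK or on an error
D3D11_QUERY_DATA_PIPELINE_STATISTICS stats;
while( S_FALSE == ( hr = g_pContext->GetData( g_pDeviceStats, &stats, g_pDeviceStats->GetDataSize(), 0 ) ) )
{
    // spin until the GPU has the result
}
if( FAILED( hr ) )
{
    return hr;
}
// stats.GSPrimitives now holds the number of primitives the GS emitted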
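Once the query returns, GSPrimitives tells you how far into the mapped buffer to read. A sketch, assuming the GS emits triangles (three records each) and clamping to what the buffer can physically hold in case more was generated than fitted; `RecordsToRead` is a hypothetical helper, not part of the Direct3D API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Records to walk after a draw. 'gsPrimitives' is the GSPrimitives value from the
// pipeline-statistics query, assumed here to count triangles of three vertices.
uint64_t RecordsToRead(uint64_t gsPrimitives, uint32_t bufferBytes, uint32_t strideBytes) {
    assert(strideBytes > 0);
    const uint64_t verticesPerPrimitive = 3;  // triangle output assumed
    const uint64_t wholePrimitivesThatFit = bufferBytes / (verticesPerPrimitive * strideBytes);
    // Never read past what the buffer can actually hold.
    return std::min(gsPrimitives, wholePrimitivesThatFit) * verticesPerPrimitive;
}
```

For the single-triangle draw above, `RecordsToRead( stats.GSPrimitives, 10 * 1024 * 1024, 36 )` should give 3, i.e. one `GS_OUTPUT` per vertex.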
Alternatively, there are D3D11_QUERY_SO_STATISTICS and D3D11_QUERY_SO_OVERFLOW_PREDICATE (replace '11' with '10' if desired), which return similar information. More usefully, they also tell you whether data was truncated because the buffer overflowed. That matters for "proper" use of SO, but when debugging you're unlikely to generate enough output for it to happen!
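With D3D11_QUERY_SO_STATISTICS the result is a D3D11_QUERY_DATA_SO_STATISTICS struct holding NumPrimitivesWritten and PrimitivesStorageNeeded; if the second exceeds the first, the buffer filled up and your log is truncated. A sketch of that comparison: `SoStats` mirrors the struct so it compiles without the Direct3D headers, and `SoOverflowed`/`SoCoverage` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of D3D11_QUERY_DATA_SO_STATISTICS.
struct SoStats {
    uint64_t NumPrimitivesWritten;
    uint64_t PrimitivesStorageNeeded;
};

// True if the GS produced more primitives than the SO buffer could hold.
bool SoOverflowed(const SoStats& s) {
    assert(s.NumPrimitivesWritten <= s.PrimitivesStorageNeeded);
    return s.PrimitivesStorageNeeded > s.NumPrimitivesWritten;
}

// Fraction of the requested output that actually landed in the buffer.
double SoCoverage(const SoStats& s) {
    if (s.PrimitivesStorageNeeded == 0)
        return 1.0;  // nothing was requested, so nothing was lost
    return static_cast<double>(s.NumPrimitivesWritten) /
           static_cast<double>(s.PrimitivesStorageNeeded);
}
```

Logging the coverage alongside the decoded records makes it obvious when you're only looking at the first part of a large tessellated draw.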
Any limitations?
Sadly, yes!
That's all folks!