Debugging look-up textures using D3D10 Stream-Out

Evening all,

Finally back on the wagon with some Direct3D 10 coding [cool]

I'll save the full story for later in the week as it's still a bit WIP, but this evening I was doing some debugging and all the evidence suggested that the HLSL intrinsic function acos() was generating a different result from the equivalent C++ function, acosf(). Obviously that'd be a pretty messed up state of affairs if true.

Anyway, I wanted to verify that I wasn't seeing what I thought I was so I wrote the following lovely piece of code:

// Simple function to compute the differences between
// CPU computed values and GPU computed values
void ComputeDifferences( ID3D10Device* pDevice )
{
// More iterations the better, but too many
// and we get an awkwardly large dataset
const UINT ITERATIONS = 512;

// 1) Allocate memory for both sets of results
// D3D10 computes at FP32 internally so it's
// reasonable to store them as floats.
float* fCPUResults = new float[ ITERATIONS ];
float* fGPUResults = NULL;//new float[ ITERATIONS ];

// 2) Compute the CPU's results:
for( UINT cpu_idx = 0; cpu_idx < ITERATIONS; ++cpu_idx )
{
// the acosf() function is defined for inputs
// in the -1 to +1 range, so the first step is to
// map accordingly:
float x = static_cast< float >( cpu_idx ) / static_cast< float >( ITERATIONS - 1 );
x = (x * 2.0f) - 1.0f;

// Now evaluate the actual inverse cosine
fCPUResults[ cpu_idx ] = acosf( x );
}

// 3) Perform the GPU equivalent:

// Define the pass-thru vertex shader
const UINT MaxVertexShaderSize = 1 << 14;
// Zero-initialise so the first StringCchCatA() appends to an empty string
char cVS[ MaxVertexShaderSize ] = { 0 };

StringCchCatA( cVS, MaxVertexShaderSize, "struct GS_INPUT" );
StringCchCatA( cVS, MaxVertexShaderSize, "{" );
StringCchCatA( cVS, MaxVertexShaderSize, " float param : PARAMETER;" );
StringCchCatA( cVS, MaxVertexShaderSize, "};" );
StringCchCatA( cVS, MaxVertexShaderSize, "GS_INPUT vs( uint i : SV_VertexID )" );
StringCchCatA( cVS, MaxVertexShaderSize, "{" );
StringCchCatA( cVS, MaxVertexShaderSize, " GS_INPUT toGS;" );

// fugly as hell hack to get a C++ constant into the VS, but it works :)
char cRemapping[128];
StringCchPrintfA( cRemapping, 128, " toGS.param = (((float)i / (float)%d) * 2.0f) - 1.0f;", ITERATIONS - 1 );
StringCchCatA( cVS, MaxVertexShaderSize, cRemapping );

StringCchCatA( cVS, MaxVertexShaderSize, " return toGS;" );
StringCchCatA( cVS, MaxVertexShaderSize, "}" );

size_t ActualVertexShaderSize = 0;
StringCchLengthA( cVS, MaxVertexShaderSize, &ActualVertexShaderSize );

// Create the vertex shader
ID3D10Blob* pVertexShaderByteCode = NULL;
ID3D10Blob* pCompilerErrors = NULL;
DWORD dwShaderFlags = D3D10_SHADER_ENABLE_STRICTNESS;
if( FAILED( D3D10CompileShader( cVS, ActualVertexShaderSize * sizeof( char ), NULL, NULL, NULL, "vs", "vs_4_0", dwShaderFlags, &pVertexShaderByteCode, &pCompilerErrors ) ) )
{
OutputDebugString( L"Failed to compile vertex shader's HLSL code!\n" );

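// The error blob can be NULL if the compiler produced no messages
if( NULL != pCompilerErrors )
{
OutputDebugStringA( reinterpret_cast< char* >( pCompilerErrors->GetBufferPointer() ) );
SAFE_RELEASE( pCompilerErrors );
}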

SAFE_RELEASE( pVertexShaderByteCode );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

ID3D10VertexShader *pVertexShader = NULL;

if( FAILED( pDevice->CreateVertexShader( pVertexShaderByteCode->GetBufferPointer(), pVertexShaderByteCode->GetBufferSize(), &pVertexShader ) ) )
{
OutputDebugString( L"Failed to create a vertex shader from the byte code!\n" );

SAFE_RELEASE( pVertexShaderByteCode );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// Don't need the byte code anymore
SAFE_RELEASE( pVertexShaderByteCode );


// Define the geometry shader
const UINT MaxGeomShaderSize = 1 << 14;
// Zero-initialise so the first StringCchCatA() appends to an empty string
char cGS[ MaxGeomShaderSize ] = { 0 };

StringCchCatA( cGS, MaxGeomShaderSize, "struct GS_INPUT" );
StringCchCatA( cGS, MaxGeomShaderSize, "{" );
StringCchCatA( cGS, MaxGeomShaderSize, " float param : PARAMETER;" );
StringCchCatA( cGS, MaxGeomShaderSize, "};" );
StringCchCatA( cGS, MaxGeomShaderSize, "struct GS_OUTPUT" );
StringCchCatA( cGS, MaxGeomShaderSize, "{" );
StringCchCatA( cGS, MaxGeomShaderSize, " float Result : GPU_RESULT;" );
StringCchCatA( cGS, MaxGeomShaderSize, "};" );
StringCchCatA( cGS, MaxGeomShaderSize, "[maxvertexcount(1)]" );
StringCchCatA( cGS, MaxGeomShaderSize, "void gs( point GS_INPUT fromVS[1], inout PointStream<GS_OUTPUT> StreamOut )" );
StringCchCatA( cGS, MaxGeomShaderSize, "{" );
StringCchCatA( cGS, MaxGeomShaderSize, " GS_OUTPUT gpu_value;" );
StringCchCatA( cGS, MaxGeomShaderSize, " gpu_value.Result = acos( fromVS[0].param );" );
StringCchCatA( cGS, MaxGeomShaderSize, " StreamOut.Append( gpu_value );" );
StringCchCatA( cGS, MaxGeomShaderSize, "}" );

size_t ActualGeomShaderSize = 0;
StringCchLengthA( cGS, MaxGeomShaderSize, &ActualGeomShaderSize );

// Create a Geometry Shader
ID3D10Blob* pGeomShaderByteCode = NULL;
if( FAILED( D3D10CompileShader( cGS, ActualGeomShaderSize * sizeof( char ), NULL, NULL, NULL, "gs", "gs_4_0", dwShaderFlags, &pGeomShaderByteCode, &pCompilerErrors ) ) )
{
OutputDebugString( L"Failed to compile geometry shader's HLSL code!\n" );

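// The error blob can be NULL if the compiler produced no messages
if( NULL != pCompilerErrors )
{
OutputDebugStringA( reinterpret_cast< char* >( pCompilerErrors->GetBufferPointer() ) );
SAFE_RELEASE( pCompilerErrors );
}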

SAFE_RELEASE( pGeomShaderByteCode );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

ID3D10GeometryShader *pGeometryShader = NULL;

D3D10_SO_DECLARATION_ENTRY pDecl[] =
{
{ "GPU_RESULT", 0, 0, 1, 0 }
};

if( FAILED( pDevice->CreateGeometryShaderWithStreamOutput( pGeomShaderByteCode->GetBufferPointer(), pGeomShaderByteCode->GetBufferSize(), pDecl, 1, sizeof( float ), &pGeometryShader ) ) )
{
OutputDebugString( L"Failed to create geometry shader from byte code!\n" );

SAFE_RELEASE( pGeomShaderByteCode );
SAFE_RELEASE( pGeometryShader );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// Don't need the byte code now we've created
// a usable GS...
SAFE_RELEASE( pGeomShaderByteCode );

// Create a buffer to store the outputs
ID3D10Buffer *pStreamOutBuffer = NULL;
D3D10_BUFFER_DESC SOBufferDesc =
{
sizeof( float ) * ITERATIONS,
D3D10_USAGE_DEFAULT,
D3D10_BIND_STREAM_OUTPUT,
0,
0
};

if( FAILED( pDevice->CreateBuffer( &SOBufferDesc, NULL, &pStreamOutBuffer ) ) )
{
OutputDebugString( L"Failed to create SO output buffer!\n" );

SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pGeometryShader );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// Bind this buffer to the pipeline
UINT offset[1] = { 0 };
pDevice->SOSetTargets( 1, &pStreamOutBuffer, offset );

// Disable rasterization
D3D10_DEPTH_STENCIL_DESC dsDesc = { 0 };
ID3D10DepthStencilState *pDepthStencilState = NULL;

if( FAILED( pDevice->CreateDepthStencilState( &dsDesc, &pDepthStencilState ) ) )
{
OutputDebugString( L"Failed to create depth-stencil state object!\n" );

SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pDepthStencilState );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

pDevice->OMSetDepthStencilState( pDepthStencilState, 0 );

// Configure the pipeline for drawing
pDevice->VSSetShader( pVertexShader );
pDevice->GSSetShader( pGeometryShader );
pDevice->PSSetShader( NULL );

// Issue the draw command
pDevice->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_POINTLIST );
pDevice->Draw( ITERATIONS, 0 );

// Create a buffer to copy the results to
ID3D10Buffer* pStagingBuffer = NULL;
D3D10_BUFFER_DESC StagingDesc =
{
sizeof( float ) * ITERATIONS,
D3D10_USAGE_STAGING,
0,
D3D10_CPU_ACCESS_READ,
0
};

if( FAILED( pDevice->CreateBuffer( &StagingDesc, NULL, &pStagingBuffer ) ) )
{
OutputDebugString( L"Failed to create staging SO buffer!\n" );

SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pStagingBuffer );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pDepthStencilState );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// Copy the results to a CPU-accessible resource
pDevice->CopyResource( pStagingBuffer, pStreamOutBuffer );

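// Read back the results. The mapped pointer is only valid until
// Unmap(), so copy the values into our own storage first.
float fGPUCopy[ ITERATIONS ] = { 0.0f };
void* pMappedData = NULL;
if( FAILED( pStagingBuffer->Map( D3D10_MAP_READ, 0, &pMappedData ) ) )
{
OutputDebugString( L"Unable to read-back GPU results from staging buffer!\n" );

// Map() failed, so there is nothing to Unmap() here

SAFE_RELEASE( pStagingBuffer );
SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pDepthStencilState );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}
CopyMemory( fGPUCopy, pMappedData, sizeof( fGPUCopy ) );
pStagingBuffer->Unmap( );

// From here on, read the GPU values from our own copy
fGPUResults = fGPUCopy;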

// Tidy up all D3D10 resources
SAFE_RELEASE( pStagingBuffer );
SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pVertexShader );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pDepthStencilState );

// 4) Write the results out to a CSV file so that
// we can do further graphing/analysis using
// MS Excel:
HANDLE hFile = CreateFile( L"Results.csv", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
if( INVALID_HANDLE_VALUE != hFile )
{
// File was successfully opened for writing, write the header:
WCHAR wcHeader[] = L"Iteration, CPU acos(), GPU acos()\r\n";

DWORD dwBytesWritten = 0;
WriteFile( hFile, reinterpret_cast< LPCVOID >( wcHeader ), 35 * sizeof(WCHAR), &dwBytesWritten, NULL );

// Now write out a new line for each and every pair
// of CPU and GPU results:
for( UINT idx = 0; idx < ITERATIONS; ++idx )
{
WCHAR wcData[512];

// Composite this pair of results:
// idx is unsigned, so use %u rather than %d
StringCchPrintf( wcData, 512, L"%u,%.10f,%.10f\r\n", idx, fCPUResults[idx], fGPUResults[idx] );

// Write the values to the CSV file
size_t wcLen = 0;
StringCchLength( wcData, 512, &wcLen );
WriteFile( hFile, reinterpret_cast< LPCVOID >( wcData ), static_cast< DWORD >( wcLen * sizeof( WCHAR ) ), &dwBytesWritten, NULL );
}

// Ensure we close when we're finished here
CloseHandle( hFile );
}

// 5) Clear up any memory we allocated
SAFE_DELETE_ARRAY( fCPUResults );
}


It's a bit ugly but it works. Especially love the way that I used StringCchPrintfA() to get a C++ constant into the vertex shader code. I think I've earnt my "133t ub3r h4><0rz" badge for that gem...
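
For anyone who'd rather avoid the fixed-size char buffers, here is a minimal sketch of the same trick using std::ostringstream. BuildVertexShaderSource is a hypothetical helper name, not something from the code above, and I'm assuming standard C++ strings are acceptable in your codebase:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the original listing): builds the same
// pass-thru vertex shader source with the iteration count baked in.
std::string BuildVertexShaderSource( unsigned int iterations )
{
    assert( iterations > 1 ); // avoids a divide-by-zero in the remapping

    std::ostringstream hlsl;
    hlsl << "struct GS_INPUT\n"
         << "{\n"
         << "    float param : PARAMETER;\n"
         << "};\n"
         << "GS_INPUT vs( uint i : SV_VertexID )\n"
         << "{\n"
         << "    GS_INPUT toGS;\n"
         // Same remapping as the original: vertex index -> [-1, +1]
         << "    toGS.param = (((float)i / (float)" << ( iterations - 1 )
         << ") * 2.0f) - 1.0f;\n"
         << "    return toGS;\n"
         << "}\n";
    return hlsl.str();
}
```

The resulting string's c_str() and size() can then be handed to D3D10CompileShader() in place of cVS and ActualVertexShaderSize.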

The above code does have a potentially good use though which is why I've shared it here.

Look-up textures for complex arithmetic can be a good optimization trick. In this instance part of my original problem was that a sin/tan look-up texture generated different results to a sin/tan operation in 'pure' HLSL arithmetic.
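
To illustrate why a look-up can drift from the 'pure' arithmetic, here is a CPU-only sketch. It is not a diagnosis of my actual bug: it assumes a small 1D table of sin() over [0, pi/2] sampled with plain linear interpolation, which only approximates what a filtered texture fetch does. BuildSinTable and SampleTable are made-up names for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a table of sin(x) at 'size' evenly spaced points over [0, pi/2].
std::vector<float> BuildSinTable( std::size_t size )
{
    assert( size >= 2 );
    const float kHalfPi = 1.57079632679f;
    std::vector<float> table( size );
    for( std::size_t i = 0; i < size; ++i )
    {
        const float u = static_cast<float>( i ) / static_cast<float>( size - 1 );
        table[i] = std::sin( kHalfPi * u );
    }
    return table;
}

// Sample the table at u in [0, 1] using linear interpolation between
// the two nearest entries (an approximation of a filtered 1D fetch).
float SampleTable( const std::vector<float>& table, float u )
{
    assert( table.size() >= 2 && u >= 0.0f && u <= 1.0f );
    const float pos = u * static_cast<float>( table.size() - 1 );
    std::size_t lo = static_cast<std::size_t>( pos );
    if( lo > table.size() - 2 )
        lo = table.size() - 2; // keep lo + 1 inside the table at u == 1
    const float t = pos - static_cast<float>( lo );
    return table[lo] + t * ( table[lo + 1] - table[lo] );
}
```

Sampling exactly on a table entry reproduces sin() closely, but sampling between entries does not, and the gap shrinks as the table grows. That's exactly the sort of difference the CSV comparison makes visible.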

The code I presented here only works for 1D look-ups, but you can quite easily drop both the HLSL and C++ expressions into the above routine and check the generated CSV file. Throw the CSV file into MS Excel and draw some graphs, or do whatever analysis you fancy; it should tell you pretty quickly whether the two implementations are actually equivalent.
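
If you'd rather get a quick number before reaching for a spreadsheet, a sketch along these lines would do. SummarizeDifferences and DifferenceSummary are hypothetical names, and it assumes the two result sets have already been copied into std::vector<float>:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute CPU/GPU disagreement and where it occurred.
struct DifferenceSummary
{
    float       maxAbsDiff;
    std::size_t indexOfMax;
};

// Hypothetical helper: walks both result sets and records the worst case.
DifferenceSummary SummarizeDifferences( const std::vector<float>& cpu,
                                        const std::vector<float>& gpu )
{
    assert( cpu.size() == gpu.size() );

    DifferenceSummary summary = { 0.0f, 0 };
    for( std::size_t i = 0; i < cpu.size(); ++i )
    {
        const float diff = std::fabs( cpu[i] - gpu[i] );
        if( diff > summary.maxAbsDiff )
        {
            summary.maxAbsDiff = diff;
            summary.indexOfMax = i;
        }
    }
    return summary;
}
```

A tiny maxAbsDiff suggests the two implementations agree to within float rounding; a large one tells you which iteration to look at first.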

Or, if that doesn't interest you... the above would be a reasonable 'getting started' example of how you can do a bit of GPGPU work using Direct3D 10...

Hope you find it useful [smile]


2 Comments



Any chance you could post the results too? (now that you've gotten me interested =)

I was going to post the results, but given that they are identical to ~6dp it wouldn't be a particularly interesting graph [lol]

When I dig further into the original bug I'll post pictures though!

Cheers,
Jack
