Debugging look-up textures using D3D10 Stream-Out

jollyjeffers

Evening all,

Finally back on the wagon with some Direct3D 10 coding [cool]

I'll save the full story for later in the week as it's still a bit WiP, but this evening I was doing some debugging and all the evidence suggested that the HLSL intrinsic function acos() was generating a different result to the equivalent C++ function. Obviously that'd be a pretty messed up state of affairs if true.

Anyway, I wanted to verify that I wasn't seeing what I thought I was so I wrote the following lovely piece of code:

// Simple function to compute the differences between
// CPU computed values and GPU computed values
void ComputeDifferences( ID3D10Device* pDevice )
{
// More iterations the better, but too many
// and we get an awkwardly large dataset
const UINT ITERATIONS = 512;

// 1) Allocate memory for both sets of results
// D3D10 computes at FP32 internally so it's
// reasonable to store them as floats.
float* fCPUResults = new float[ ITERATIONS ];
// Set once the GPU results have been read back in step 3
float* fGPUResults = NULL;

// 2) Compute the CPU's results:
for( UINT cpu_idx = 0; cpu_idx < ITERATIONS; ++cpu_idx )
{
// the acosf() function is defined for inputs
// in the -1 to +1 range, so the first step is to
// map accordingly:
float x = static_cast< float >( cpu_idx ) / static_cast< float >( ITERATIONS - 1 );
x = (x * 2.0f) - 1.0f;

// Now evaluate the actual inverse cosine
fCPUResults[ cpu_idx ] = acosf( x );
}

// 3) Perform the GPU equivalent:

// Define the pass-thru vertex shader
const UINT MaxVertexShaderSize = 1 << 14;
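// Must start as an empty string: StringCchCatA() appends to
// whatever is already in the buffer
char cVS[ MaxVertexShaderSize ] = "";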

StringCchCatA( cVS, MaxVertexShaderSize, "struct GS_INPUT" );
StringCchCatA( cVS, MaxVertexShaderSize, "{" );
StringCchCatA( cVS, MaxVertexShaderSize, " float param : PARAMETER;" );
StringCchCatA( cVS, MaxVertexShaderSize, "};" );
StringCchCatA( cVS, MaxVertexShaderSize, "GS_INPUT vs( uint i : SV_VertexID )" );
StringCchCatA( cVS, MaxVertexShaderSize, "{" );
StringCchCatA( cVS, MaxVertexShaderSize, " GS_INPUT toGS;" );

// fugly as hell hack to get a C++ constant into the VS, but it works :)
char cRemapping[128];
StringCchPrintfA( cRemapping, 128, " toGS.param = (((float)i / (float)%d) * 2.0f) - 1.0f;", ITERATIONS - 1 );
StringCchCatA( cVS, MaxVertexShaderSize, cRemapping );

StringCchCatA( cVS, MaxVertexShaderSize, " return toGS;" );
StringCchCatA( cVS, MaxVertexShaderSize, "}" );

size_t ActualVertexShaderSize = 0;
StringCchLengthA( cVS, MaxVertexShaderSize, &ActualVertexShaderSize );

// Create the vertex shader
ID3D10Blob* pVertexShaderByteCode = NULL;
ID3D10Blob* pCompilerErrors = NULL;
DWORD dwShaderFlags = D3D10_SHADER_ENABLE_STRICTNESS;
if( FAILED( D3D10CompileShader( cVS, ActualVertexShaderSize * sizeof( char ), NULL, NULL, NULL, "vs", "vs_4_0", dwShaderFlags, &pVertexShaderByteCode, &pCompilerErrors ) ) )
{
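OutputDebugString( L"Failed to compile vertex shader's HLSL code!\n" );

// The error blob may be NULL, so check before dereferencing it
if( NULL != pCompilerErrors )
{
OutputDebugStringA( reinterpret_cast< char* >( pCompilerErrors->GetBufferPointer() ) );
}

SAFE_RELEASE( pCompilerErrors );
SAFE_RELEASE( pVertexShaderByteCode );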

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

ID3D10VertexShader *pVertexShader = NULL;

if( FAILED( pDevice->CreateVertexShader( pVertexShaderByteCode->GetBufferPointer(), pVertexShaderByteCode->GetBufferSize(), &pVertexShader ) ) )
{
OutputDebugString( L"Failed to create a vertex shader from the byte code!\n" );

SAFE_RELEASE( pVertexShaderByteCode );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// Don't need the byte code anymore
SAFE_RELEASE( pVertexShaderByteCode );


// Define the geometry shader
const UINT MaxGeomShaderSize = 1 << 14;
// As with the vertex shader, start from an empty string
char cGS[MaxGeomShaderSize] = "";

StringCchCatA( cGS, MaxGeomShaderSize, "struct GS_INPUT" );
StringCchCatA( cGS, MaxGeomShaderSize, "{" );
StringCchCatA( cGS, MaxGeomShaderSize, " float param : PARAMETER;" );
StringCchCatA( cGS, MaxGeomShaderSize, "};" );
StringCchCatA( cGS, MaxGeomShaderSize, "struct GS_OUTPUT" );
StringCchCatA( cGS, MaxGeomShaderSize, "{" );
StringCchCatA( cGS, MaxGeomShaderSize, " float Result : GPU_RESULT;" );
StringCchCatA( cGS, MaxGeomShaderSize, "};" );
StringCchCatA( cGS, MaxGeomShaderSize, "[maxvertexcount(1)]" );
StringCchCatA( cGS, MaxGeomShaderSize, "void gs( point GS_INPUT fromVS[1], inout PointStream<GS_OUTPUT> StreamOut )" );
StringCchCatA( cGS, MaxGeomShaderSize, "{" );
StringCchCatA( cGS, MaxGeomShaderSize, " GS_OUTPUT gpu_value;" );
StringCchCatA( cGS, MaxGeomShaderSize, " gpu_value.Result = acos( fromVS[0].param );" );
StringCchCatA( cGS, MaxGeomShaderSize, " StreamOut.Append( gpu_value );" );
StringCchCatA( cGS, MaxGeomShaderSize, "}" );

size_t ActualGeomShaderSize = 0;
StringCchLengthA( cGS, MaxGeomShaderSize, &ActualGeomShaderSize );

// Create a Geometry Shader
ID3D10Blob* pGeomShaderByteCode = NULL;
if( FAILED( D3D10CompileShader( cGS, ActualGeomShaderSize * sizeof( char ), NULL, NULL, NULL, "gs", "gs_4_0", dwShaderFlags, &pGeomShaderByteCode, &pCompilerErrors ) ) )
{
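OutputDebugString( L"Failed to compile geometry shader's HLSL code!\n" );

// The error blob may be NULL, so check before dereferencing it
if( NULL != pCompilerErrors )
{
OutputDebugStringA( reinterpret_cast< char* >( pCompilerErrors->GetBufferPointer() ) );
}

SAFE_RELEASE( pCompilerErrors );
SAFE_RELEASE( pGeomShaderByteCode );
SAFE_RELEASE( pVertexShader );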

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

ID3D10GeometryShader *pGeometryShader = NULL;

D3D10_SO_DECLARATION_ENTRY pDecl[] =
{
{ "GPU_RESULT", 0, 0, 1, 0 }
};

if( FAILED( pDevice->CreateGeometryShaderWithStreamOutput( pGeomShaderByteCode->GetBufferPointer(), pGeomShaderByteCode->GetBufferSize(), pDecl, 1, sizeof( float ), &pGeometryShader ) ) )
{
OutputDebugString( L"Failed to create geometry shader from byte code!\n" );

SAFE_RELEASE( pGeomShaderByteCode );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pVertexShader );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// Don't need the byte code now we've created
// a usable GS...
SAFE_RELEASE( pGeomShaderByteCode );

// Create a buffer to store the outputs
ID3D10Buffer *pStreamOutBuffer = NULL;
D3D10_BUFFER_DESC SOBufferDesc =
{
sizeof( float ) * ITERATIONS,
D3D10_USAGE_DEFAULT,
D3D10_BIND_STREAM_OUTPUT,
0,
0
};

if( FAILED( pDevice->CreateBuffer( &SOBufferDesc, NULL, &pStreamOutBuffer ) ) )
{
OutputDebugString( L"Failed to create SO output buffer!\n" );

SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pVertexShader );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// Bind this buffer to the pipeline
UINT offset[1] = { 0 };
pDevice->SOSetTargets( 1, &pStreamOutBuffer, offset );

// Disable depth/stencil testing; combined with the NULL pixel
// shader set below, this disables rasterization
D3D10_DEPTH_STENCIL_DESC dsDesc = { 0 };
ID3D10DepthStencilState *pDepthStencilState = NULL;

if( FAILED( pDevice->CreateDepthStencilState( &dsDesc, &pDepthStencilState ) ) )
{
OutputDebugString( L"Failed to create depth-stencil state object!\n" );

SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pVertexShader );
SAFE_RELEASE( pDepthStencilState );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

pDevice->OMSetDepthStencilState( pDepthStencilState, 0 );

// Configure the pipeline for drawing
pDevice->VSSetShader( pVertexShader );
pDevice->GSSetShader( pGeometryShader );
pDevice->PSSetShader( NULL );

// Issue the draw command
pDevice->IASetPrimitiveTopology( D3D10_PRIMITIVE_TOPOLOGY_POINTLIST );
pDevice->Draw( ITERATIONS, 0 );

// Create a buffer to copy the results to
ID3D10Buffer* pStagingBuffer = NULL;
D3D10_BUFFER_DESC StagingDesc =
{
sizeof( float ) * ITERATIONS,
D3D10_USAGE_STAGING,
0,
D3D10_CPU_ACCESS_READ,
0
};

if( FAILED( pDevice->CreateBuffer( &StagingDesc, NULL, &pStagingBuffer ) ) )
{
OutputDebugString( L"Failed to create staging SO buffer!\n" );

SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pStagingBuffer );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pVertexShader );
SAFE_RELEASE( pDepthStencilState );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// Copy the results to a CPU-accessible resource
pDevice->CopyResource( pStagingBuffer, pStreamOutBuffer );

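// Read back the results
if( FAILED( pStagingBuffer->Map( D3D10_MAP_READ, 0, reinterpret_cast< void** >( &fGPUResults ) ) ) )
{
OutputDebugString( L"Unable to read-back GPU results from staging buffer!\n" );

// Map() failed, so there is nothing to Unmap() here

SAFE_RELEASE( pStagingBuffer );
SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pVertexShader );
SAFE_RELEASE( pDepthStencilState );

SAFE_DELETE_ARRAY( fCPUResults );

return;
}

// The mapped pointer is only valid until Unmap(), so copy the
// results into local storage before releasing the mapping.
float fGPUCopy[ ITERATIONS ];
for( UINT gpu_idx = 0; gpu_idx < ITERATIONS; ++gpu_idx )
{
fGPUCopy[ gpu_idx ] = fGPUResults[ gpu_idx ];
}
pStagingBuffer->Unmap( );
fGPUResults = fGPUCopy;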

// Tidy up all D3D10 resources
SAFE_RELEASE( pStagingBuffer );
SAFE_RELEASE( pStreamOutBuffer );
SAFE_RELEASE( pGeometryShader );
SAFE_RELEASE( pVertexShader );
SAFE_RELEASE( pDepthStencilState );

// 4) Write the results out to a CSV file so that
// we can do further graphing/analysis using
// MS Excel:
HANDLE hFile = CreateFile( L"Results.csv", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL );
if( INVALID_HANDLE_VALUE != hFile )
{
// File was successfully opened for writing, write the header:
WCHAR wcHeader[] = L"Iteration, CPU acos(), GPU acos()\r\n";

DWORD dwBytesWritten = 0;
WriteFile( hFile, reinterpret_cast< LPCVOID >( wcHeader ), 35 * sizeof(WCHAR), &dwBytesWritten, NULL );

// Now write out a new line for each and every pair
// of CPU and GPU results:
for( UINT idx = 0; idx < ITERATIONS; ++idx )
{
WCHAR wcData[512];

// Composite this pair of results:
StringCchPrintf( wcData, 512, L"%u,%.10f,%.10f\r\n", idx, fCPUResults[idx], fGPUResults[idx] );

// Write the values to the CSV file
size_t wcLen = 0;
StringCchLength( wcData, 512, &wcLen );
WriteFile( hFile, reinterpret_cast< LPCVOID >( wcData ), static_cast< DWORD >( wcLen * sizeof( WCHAR ) ), &dwBytesWritten, NULL );
}

// Ensure we close when we're finished here
CloseHandle( hFile );
}

// 5) Clear up any memory we allocated
SAFE_DELETE_ARRAY( fCPUResults );
}


It's a bit ugly but it works. Especially love the way that I used StringCchPrintfA() to get a C++ constant into the vertex shader code. I think I've earnt my "133t ub3r h4><0rz" badge for that gem...
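For anyone wanting to try the constant-injection trick away from a Windows box, here is a minimal portable sketch of the same idea using std::snprintf instead of StringCchPrintfA(). The helper names MakeRemapLine and RemapIndex are made up for illustration and are not part of the routine above; the point is simply that the CPU loop and the generated HLSL line should use the same index-to-[-1,+1] mapping.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: the CPU-side mapping of a sample index in
// [0, iterations - 1] onto the range [-1, +1].
float RemapIndex( unsigned index, unsigned iterations )
{
    assert( iterations > 1 );  // avoid dividing by zero
    float x = static_cast< float >( index ) / static_cast< float >( iterations - 1 );
    return ( x * 2.0f ) - 1.0f;
}

// Hypothetical helper: builds the HLSL statement that performs the same
// mapping in the vertex shader, with the C++ constant baked into the text.
std::string MakeRemapLine( unsigned iterations )
{
    assert( iterations > 1 );
    char line[128];
    std::snprintf( line, sizeof( line ),
                   " toGS.param = (((float)i / (float)%u) * 2.0f) - 1.0f;",
                   iterations - 1 );
    return std::string( line );
}
```

Calling MakeRemapLine( 512 ) yields the statement with 511 as the divisor, matching what the CPU loop divides by.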

The above code does have a potentially good use though which is why I've shared it here.

Look-up textures for complex arithmetic can be a good optimization trick. In this instance part of my original problem was that a sin/tan look-up texture generated different results to a sin/tan operation in 'pure' HLSL arithmetic.
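To give a feel for why a look-up can disagree with the direct arithmetic, here is a rough CPU-only sketch. It is an assumption-laden stand-in rather than the real thing: it uses acos() instead of my sin/tan case, a std::vector instead of a texture, and plain float linear interpolation where real texture filtering hardware may use lower-precision weights. The function names are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a table of acos() sampled at 'size' evenly spaced points over [-1, +1].
std::vector<float> BuildAcosTable( std::size_t size )
{
    assert( size >= 2 );
    std::vector<float> table( size );
    for( std::size_t i = 0; i < size; ++i )
    {
        float x = static_cast<float>( i ) / static_cast<float>( size - 1 );
        table[i] = std::acos( ( x * 2.0f ) - 1.0f );
    }
    return table;
}

// Sample the table at x with linear interpolation between the two nearest
// entries, a rough analogue of a linearly filtered 1D texture fetch.
float SampleTable( const std::vector<float>& table, float x )
{
    x = std::fmin( 1.0f, std::fmax( -1.0f, x ) );       // clamp the input
    float pos = ( x + 1.0f ) * 0.5f * static_cast<float>( table.size() - 1 );
    std::size_t lo = static_cast<std::size_t>( pos );
    if( lo >= table.size() - 1 ) lo = table.size() - 2; // keep lo + 1 in range
    float t = pos - static_cast<float>( lo );
    return table[lo] + t * ( table[lo + 1] - table[lo] );
}

// Largest absolute difference between the table and direct evaluation,
// measured at 'probes' evenly spaced inputs over [-1, +1].
float MaxTableError( const std::vector<float>& table, std::size_t probes )
{
    assert( probes >= 2 );
    float worst = 0.0f;
    for( std::size_t i = 0; i < probes; ++i )
    {
        float x = ( static_cast<float>( i ) / static_cast<float>( probes - 1 ) ) * 2.0f - 1.0f;
        float err = std::fabs( SampleTable( table, x ) - std::acos( x ) );
        if( err > worst ) worst = err;
    }
    return worst;
}
```

acos() is steepest near -1 and +1, so that is where a linearly interpolated table drifts furthest from the direct result; a larger table shrinks the gap but does not remove it.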

The code I presented here only works for 1D look-ups but you can quite easily throw in both the HLSL and C++ expressions into the above routine and check the generated CSV file. Throw the CSV file into MS Excel and draw some graphs or whatever analysis you fancy - should tell you pretty quickly whether the two implementations are actually equivalent.
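If you'd rather not eyeball a spreadsheet, the same check can be done in code before (or instead of) writing the CSV. The following is a small sketch under my own assumptions: the two result sets are held in std::vector<float> containers of equal length, and DiffSummary, Summarise and AgreeWithin are hypothetical names introduced purely for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of comparing two equally sized sets of floats.
struct DiffSummary
{
    float       maxAbsDiff;  // largest |a[i] - b[i]| seen
    std::size_t worstIndex;  // index at which that difference occurred
};

// Walk both arrays and record the largest absolute difference.
DiffSummary Summarise( const std::vector<float>& a, const std::vector<float>& b )
{
    assert( a.size() == b.size() );
    DiffSummary s = { 0.0f, 0 };
    for( std::size_t i = 0; i < a.size(); ++i )
    {
        float d = std::fabs( a[i] - b[i] );
        if( d > s.maxAbsDiff )
        {
            s.maxAbsDiff = d;
            s.worstIndex = i;
        }
    }
    return s;
}

// True if the two sets never differ by more than 'tolerance'.
bool AgreeWithin( const DiffSummary& s, float tolerance )
{
    return s.maxAbsDiff <= tolerance;
}
```

The worst index is worth keeping because it tells you which input to go and stare at; the tolerance you pass in is a judgment call that depends on what the values are used for.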

Or, if that doesn't interest you... the above would be a reasonable 'getting started' example of how you can do a bit of GPGPU work using Direct3D 10...

Hope you find it useful [smile]
2 Comments


Any chance you could post the results too? (now that you've gotten me interested =)

I was going to post the results, but given that they are identical to ~6dp it wouldn't be a particularly interesting graph [lol]

When I dig further into the original bug I'll post pictures though!

Cheers,
Jack
