Performance tips
Are you calling Release on your command lists once you are done with them? Do you reuse them from frame to frame, or are they recycled multiple times?
Also, have you tried running the same program on the reference device, just to make sure you have a sequencing problem?
Command lists are recycled every frame. Can they be reused?
Anyway, it looks like I was setting states only on the immediate context. Now there are two problems:
FPS is down from ~60 to 8
Nothing is rendered, neither the terrain nor my normal objects (instances)
And sometimes I get an exception and the debug layer reports:
D3D11: CORRUPTION: ID3D11DeviceContext::ExecuteCommandList: First parameter is corrupt or NULL. [ MISCELLANEOUS CORRUPTION #13: CORRUPTED_PARAMETER1 ]
Perhaps a synchronization error? How can that be, when each thread doesn't end before creating its command list, and I'm calling WaitForMultipleObjects from the main thread?
EDIT: With some experiments, I'm also sometimes getting this:
D3D11: CORRUPTION: ID3D11DeviceContext::IASetInputLayout: Two threads were found to be executing functions associated with the same Device at the same time. This will cause corruption of memory. Appropriate thread synchronization needs to occur external to the Direct3D API. 3000 and 4432 are the implicated thread ids. [ MISCELLANEOUS CORRUPTION #28: CORRUPTED_MULTITHREADING ]
Does it mean that two threads are using the same context?
Alright, it doesn't seem to be only a synchronization problem. Even if I run each thread and then wait for it using WaitForSingleObject, nothing is rendered. Note that doing so removes all the synchronization errors, which means there was also a synchronization problem.
Most recent code:
//first, set the states of all contexts
for ( UINT i = 0; i < HX_TERRAIN_RENDERING_THREADS_COUNT; i++ )
{
    ID3D11DeviceContext* context = _core->GetD3D11DeferredContext ( i );
    //render the effect (update its values)
    _effect->Render ( context );
    FLOAT blendFactors[] = { 0.0f, 0.0f, 0.0f, 0.0f };
    context->IASetPrimitiveTopology ( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
    context->RSSetState ( _rasterizerState );
    context->OMSetDepthStencilState ( _depthStencilState, 0 );
    context->OMSetBlendState ( _blendState, blendFactors, 0xffffffff );
    context->PSSetSamplers ( 0, 1, &_sampleState );
}
HANDLE handles[HX_TERRAIN_RENDERING_THREADS_COUNT];
RenderSectors* args[HX_TERRAIN_RENDERING_THREADS_COUNT] = { 0 };
UINT sectorCount[HX_TERRAIN_RENDERING_THREADS_COUNT] = { 0 };
//distribute the sectors evenly over the threads
for ( UINT i = 0; i < SectorsToRender.size ( ); i++ )
    sectorCount[i % HX_TERRAIN_RENDERING_THREADS_COUNT]++;
//start one thread per non-empty batch; each batch is a contiguous range
UINT address = 0;
for ( UINT i = 0; i < HX_TERRAIN_RENDERING_THREADS_COUNT; i++ )
{
    if ( sectorCount[i] )
    {
        args[i] = new RenderSectors ( );
        args[i]->context = _core->GetD3D11DeferredContext ( i );
        args[i]->numSectors = sectorCount[i];
        args[i]->sectors = &SectorsToRender[address];
        args[i]->indexBuffers = &indexBuffersToRender[address];
        address += sectorCount[i];
        handles[i] = (HANDLE)_beginthread ( __RenderSectors, 0, (void*)args[i] );
    }
    else
        handles[i] = NULL;
}
WaitForMultipleObjects ( HX_TERRAIN_RENDERING_THREADS_COUNT, handles, TRUE, INFINITE );
//play back the recorded command lists on the immediate context
for ( UINT i = 0; i < HX_TERRAIN_RENDERING_THREADS_COUNT; i++ )
{
    if ( args[i] )
    {
        if ( args[i]->numSectors )
        {
            _core->GetD3D11DeviceContext()->ExecuteCommandList ( args[i]->commandList, FALSE );
            HX_SAFE_FREE ( args[i]->commandList );
            HX_SAFE_DELETE ( args[i] );
        }
    }
}
Shouldn't this make sure that the code after WaitForMultipleObjects will NEVER execute before all my threads are finished? Well, this is the thread code:
static void __RenderSectors ( void* sectors )
{
    RenderSectors* renderSectors = (RenderSectors*)sectors;
    if ( !renderSectors )
        return;
    for ( UINT i = 0; i < renderSectors->numSectors; i++ )
    {
        //set the mesh data in the input assembler to draw the mesh
        UINT Strides[1];
        UINT Offsets[1];
        ID3D11Buffer* pVB[1];
        pVB[0] = renderSectors->sectors[i]->vb;
        Strides[0] = sizeof ( _hxTerrainVertex );
        Offsets[0] = 0;
        renderSectors->context->IASetVertexBuffers ( 0, 1, pVB, Strides, Offsets );
        renderSectors->context->IASetIndexBuffer ( renderSectors->indexBuffers[i]->buffer, DXGI_FORMAT_R32_UINT, 0 );
        //draw the mesh
        renderSectors->context->DrawIndexed ( renderSectors->indexBuffers[i]->size, 0, 0 );
    }
    //record everything issued above into this thread's command list
    renderSectors->context->FinishCommandList ( FALSE, &renderSectors->commandList );
    _endthread ( );
}
Note that at the end it calls FinishCommandList on the provided context, which is a deferred context that isn't being used by any other thread, so I don't really see why I'm getting the two errors mentioned in my previous post.
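Two details of the waiting code above are worth checking: WaitForMultipleObjects fails outright if any handle in the array is NULL, and a handle returned by _beginthread can already be closed by the time it is waited on, because the thread closes its own handle when it exits. One way to sidestep both, assuming C++11 is available, is to build the list of non-empty batches first and join std::thread objects instead of waiting on raw handles. The sketch below is only an illustration: Range, SplitContiguous and RunAndJoin are hypothetical names, and the record callback stands in for the per-thread recording function.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical: one contiguous batch of sectors handed to one worker.
struct Range { std::size_t begin; std::size_t count; };

// Split `total` items into at most `workers` contiguous ranges.
// Empty ranges are skipped, so there is never a "NULL" slot to wait on.
std::vector<Range> SplitContiguous(std::size_t total, std::size_t workers) {
    std::vector<Range> ranges;
    if (workers == 0) return ranges;
    const std::size_t base = total / workers;
    const std::size_t extra = total % workers;
    std::size_t begin = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t count = base + (w < extra ? 1 : 0);
        if (count == 0) continue;
        ranges.push_back({begin, count});
        begin += count;
    }
    assert(begin == total);  // every item landed in exactly one range
    return ranges;
}

// Run record(range, slot) on one thread per range and block until all are done.
// join() only returns once the thread function has returned.
template <typename RecordFn>
void RunAndJoin(const std::vector<Range>& ranges, RecordFn record) {
    std::vector<std::thread> threads;
    threads.reserve(ranges.size());
    for (std::size_t slot = 0; slot < ranges.size(); ++slot)
        threads.emplace_back(record, ranges[slot], slot);
    for (std::thread& t : threads) t.join();
}
```

In the terrain case, the callback would record the draws for sectors [begin, begin + count) on deferred context number `slot` and finish its command list before returning, so everything after RunAndJoin can safely execute the lists on the immediate context.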
I used CreateEvent/SetEvent, and now I no longer get synchronization problems, but still nothing is rendered. I'm trying to set the driver type to reference, but I can't manage to do it: if I pass NULL for the adapter and D3D_DRIVER_TYPE_REFERENCE for the driver type, D3D11CreateDevice succeeds, but then I cannot create a swap chain, and the debug layer reports this warning:
DXGI Warning: IDXGIFactory::CreateSwapChain: This function is being called with a device from a different IDXGIFactory.
So I must pass the adapter. How can I create a reference device then?
That's somewhat disappointing: even though nothing is rendered, I still get the same FPS, maybe a little less. But I will continue to the end and see how it goes...
EDIT:
D3D11_FEATURE_DATA_THREADING support;
_D3DDevice->CheckFeatureSupport ( D3D11_FEATURE_THREADING, &support, sizeof ( support ) );
This gives me 100% positive results; my hardware has full support for multithreading.
I think this journal post might be helpful for your current situation... although I don't think that has anything to do with threading.
Have you taken a frame capture with PIX yet? That could probably be helpful to figure out what is going on. I would recommend that you build your MT support so that you can gracefully switch back to a single-threaded mode that only uses the immediate context. Then you can prove out your code in ST mode first, and switch to using deferred contexts + command lists + immediate context afterwards.
Something else that might be helpful when using PIX with multithreading is to set event markers wherever you are starting and stopping the generation of a command list. If you still have a copy of Hieroglyph 3 on your machine, you can take a look at RendererDX11::PIXBeginEvent() and RendererDX11::PIXEndEvent() for how to implement them. It can get pretty hard to see what is going on when multiple threads are dumping code simultaneously, and these will help you make sense of them.
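As a rough illustration of that kind of switch, here is a minimal sketch assuming C++11 threads. RecordJob, RecordMode and RunRecordJobs are hypothetical names, not part of any engine mentioned here. Each job stands for "record the draws for one batch"; in single-threaded mode the jobs would simply issue their calls on the immediate context.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical: each job records the draw calls for one batch of work.
using RecordJob = std::function<void()>;

enum class RecordMode { SingleThreaded, MultiThreaded };

// Runs every job, either inline on the calling thread or on one thread each.
// In both modes all jobs have finished by the time this function returns,
// so the caller can treat the two modes identically afterwards.
void RunRecordJobs(const std::vector<RecordJob>& jobs, RecordMode mode) {
    if (mode == RecordMode::SingleThreaded) {
        for (const RecordJob& job : jobs) job();
        return;
    }
    std::vector<std::thread> threads;
    threads.reserve(jobs.size());
    for (const RecordJob& job : jobs) threads.emplace_back(job);
    for (std::thread& t : threads) t.join();
}
```

Keeping the mode as a runtime value rather than a compile-time switch makes it easy to flip between the two while debugging and compare the rendered output.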
I have never had any success using PIX with Direct3D 11. All I see is the frames window, which contains some Direct3D 9 stuff (for example, the number of DrawPrim calls; that's for Direct3D 9, right?), so I can't get anything out of it, and all I found on MSDN was for D3D9 and D3D10.
OK, to avoid all the mess of reading PIX output from multiple threads, I tried rendering one sector from each thread with nothing in parallel: I wait for one thread to finish before starting the next. That's one vertex buffer and one index buffer per thread. Here are my notes:
1- The sequence is correct: set index buffers, set vertex buffers, DrawIndexed, FinishCommandList, four times, once for every thread
2- After that, another sequence which also seems correct, from the immediate context: ExecuteCommandList, then release it (the list)
3- Every thread's device context is different, nothing's wrong there
4- Before FinishCommandList I checked the context state and everything is correct, but at FinishCommandList the device state looks default. I did specify FALSE for the state restoration BOOL, so it's probably nothing important, but is that normal?
That's what I can get out of PIX. Also, do I need to set any special states on the immediate context for the command list to be properly executed?
EDIT: Image attached; it shows what I'm talking about, especially note 4.
Regarding #4, that is normal. After finishing the list, it is normal to get the context state reset. Did you double check your viewport? Since the context state is reset for every rendering sequence, you must completely configure the whole pipeline for each command list. The one that got me when I started out with deferred contexts was the viewport...
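To make the "configure everything for each command list" point concrete, here is a small sketch of a helper that binds the output state at the start of every recording pass. BindOutputState is a hypothetical name. The two calls it makes, OMSetRenderTargets and RSSetViewports, are the real ID3D11DeviceContext methods, but the Fake* types are only stand-ins so the sketch compiles without the Windows SDK; with <d3d11.h> available, ID3D11DeviceContext, D3D11_VIEWPORT, ID3D11RenderTargetView and ID3D11DepthStencilView would take their places.

```cpp
// Stand-ins for the D3D11 types (assumption: used only so this compiles anywhere).
struct FakeViewport { float width; float height; };
struct FakeView {};

struct FakeContext {
    int targetsBound = 0;
    int viewportsBound = 0;
    void OMSetRenderTargets(unsigned count, FakeView* const* views, FakeView* depth) {
        (void)views; (void)depth;
        targetsBound = static_cast<int>(count);
    }
    void RSSetViewports(unsigned count, const FakeViewport* viewports) {
        (void)viewports;
        viewportsBound = static_cast<int>(count);
    }
    // Toy model of the reset described above: finishing a list clears the bindings.
    void FinishCommandList() { targetsBound = 0; viewportsBound = 0; }
};

// Call at the start of every recording pass, before any draw call.
// Written as a template so the same body works on a real deferred context.
template <typename Context, typename Viewport, typename TargetView, typename DepthView>
void BindOutputState(Context& ctx, const Viewport& viewport,
                     TargetView* target, DepthView* depth) {
    ctx.OMSetRenderTargets(1, &target, depth);
    ctx.RSSetViewports(1, &viewport);
}
```

The same idea extends to the rest of the pipeline: shaders, input layout, rasterizer, blend and depth-stencil states all need to be set again on the deferred context for each list, since nothing carries over from the previous list or from the immediate context.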
At last! Thank you very much, Jason Z, that was exactly the problem: I wasn't setting the viewports/render targets. It was a little messy to get this to work; I think I will need a more general-purpose multithreading system with lots of improvements.
Anyway, this looks promising: with a 1024x1024 terrain, the multithreading gives me no performance drop, and sometimes a little bit more than expected. I will try with larger terrains and see how performance goes.
I am curious to hear if you manage to squeeze out more performance on your multithreaded version. Remember, since you have multiple threads running to generate the command lists, if you push more work onto the threads during the command list generation then that work will be done in parallel too, producing more savings.