
First, I do not have a lot of multithreading experience, so forgive me if these questions are obvious. In a Microsoft presentation, they created three worker threads, each with its own deferred context, and each thread rendered the mirrored scene for one mirror:

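```cpp
// Wait for main thread to signal ready
WaitForSingleObject( g_hBeginRenderDeferredEvent[iInstance], INFINITE );

// Record this mirror's draw calls on the deferred context
RenderMirror( iInstance, pd3dDeferredContext );

// Close the recording into a command list (FALSE: the deferred context's state is reset to defaults)
V( pd3dDeferredContext->FinishCommandList( FALSE, &g_pd3dCommandList[iInstance] ) );

// Tell main thread command list is finished
SetEvent( g_hEndRenderDeferredEvent[iInstance] );
```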

Then the main thread signals the workers, waits for all of them to finish, and "executes" their work:

```cpp
// Signal all worker threads, then wait for completion
for (int iInstance = 0; iInstance < g_iNumRenderThreads; ++iInstance )
{
    // signal ready for scene kickoff
    SetEvent( g_hBeginRenderDeferredEvent[iInstance] );
}

// wait for completion (TRUE: wait until every worker has signalled)
WaitForMultipleObjects( g_iNumRenderThreads, g_hEndRenderDeferredEvent, TRUE, INFINITE );

// . . .

// Play back each recorded list on the immediate context, in a fixed order
for (int iInstance = 0; iInstance < g_iNumRenderThreads; ++iInstance )
{
    pd3dImmediateContext->ExecuteCommandList( g_pd3dCommandList[iInstance], FALSE );
    SAFE_RELEASE( g_pd3dCommandList[iInstance] );
}
```

What I do not understand is this: what is really happening when a command list is built up? From the documentation terminology, it sounds like a command list just records a list of commands. And from this example, the command lists are still executed on the main thread.

So where is the savings? Is something going on that makes a command list more efficient to process, so that doing this actually saves time? I would have thought the Direct3D calls in RenderMirror( iInstance, pd3dDeferredContext ) would actually be submitted on the separate thread; if that were the case, I could see the benefit of multithreading. But it looks like the Direct3D commands are still submitted on the main thread, through the command lists.

##### Reply
There are two potential levels of savings, or speed-ups, to be had:

The first is being able to throw more cores at the workload and so spread it across them.

If your scene rendering takes 16 ms, spreading it over, say, 4 threads could reduce it to around 5 ms or less (depending on the load, how you spread the work, and so on). The 16 ms of work is split four ways, and the final 'submit' is just a series of calls. Those calls have a cost, but it is lower than the constant calls into the runtime that rendering everything in a single-threaded manner would take.
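
As a rough back-of-the-envelope model, assume the work splits perfectly evenly across $N$ threads, that recording on a deferred context costs about as much as issuing the same calls directly, and that the serial submit costs $T_{\text{submit}}$ (none of these is guaranteed in practice):

$$
T_{\text{frame}} \approx \frac{T_{\text{serial}}}{N} + T_{\text{submit}} = \frac{16\ \text{ms}}{4} + T_{\text{submit}} = 4\ \text{ms} + T_{\text{submit}}
$$

The "around 5 ms" figure corresponds to an assumed $T_{\text{submit}} \approx 1\ \text{ms}$; that value is illustrative, not measured.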

It also helps because it forces you to split up state and make no assumptions about which states are set, so you can render each sub-task as if it were effectively a blank slate. This means less state management, but more up-front cost when you start rendering.

How effective this is does depend on how you spread the work around, as I mentioned. Still, it works even at that level: the example you used shows a speed-up on my HD5870 when I tried it with the multithreaded rendering modes.

The second way it could speed things up is by allowing the driver/runtime to optimise the calls as they are made on the deferred context, so that they are in a format the GPU can deal with faster. I don't currently know whether any drivers do this: my ATI card with the Catalyst 10.7 drivers doesn't, and I'm not aware of the status of this with NV or with the Catalyst 10.8 drivers.

##### Reply
With more advanced rendering techniques you can do things like set up an occlusion query and predicate a device context on it. The query's result then determines whether the predicated rendering commands are processed, which lets you easily batch up states and rendering commands on a per-object/query basis.

There are other uses as well, such as making asynchronous resource loads much easier to handle.

##### Reply
> From the documentation terminology, it sounds like a command list just records a list of commands.

Yes, roughly.

> And from this example, the command lists are still executed on the main thread.

No: they are scheduled for execution by the main thread, but they are actually executed on the hardware (the GPU).

> So where is the savings?

The construction of the command lists can be done in parallel on multiple threads.