Came across an interesting issue today that should make for some good discussion.
First, the short version that may save you some reading: Is there anything special I need to do to use Direct3D in a threaded programming model?
If not, read on.
I'm still working on my emulator. A little history first: the original game loop looked like this:
while (gameIsRunning)
{
    Game engine runs (emulation logic, sound)
    Game engine generates a video frame to be rendered by filling in an array (let's call it renderArray).
    Texture is locked and renderArray is copied into texture.
    Texture is rendered to the scene.
    Scene is Present()ed.
}
This all happened in one thread of execution. What I wanted to do was de-couple the array-filling from the scene rendering to allow the engine to work while DX was waiting on hardware. So I created a thread that does this:
while (gameIsRunning)
{
    wait for FrameReady event to be set
    set BusyEvent
    call DoDisplay() -> Lock texture, copy in renderArray, render, Present.
    reset BusyEvent
}
The main code was changed to add a second array (let's call them array1 and array2) so we can fill one and render the other. The main thread of execution now does this:
fillArray = array1
while (gameIsRunning)
{
    Game engine runs (emulation logic, sound)
    Game engine generates a video frame to be rendered by filling in fillArray.
    if BusyEvent is NOT set, we can render this frame
    {
        set renderArray = fillArray
        set FrameReady event
        set fillArray = fillArray == array1 ? array2 : array1
    }
    else the video display was busy, so drop the frame
}
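For concreteness, the handoff described above could be sketched in portable C++ roughly as follows. This is an illustration under stated assumptions, not the emulator's actual code: `Frame`, `FrameHandoff`, and `runDemo` are hypothetical names, a `std::mutex` and `std::condition_variable` stand in for the Win32 events, and the Direct3D work inside DoDisplay() is replaced by a placeholder. The sketch also treats a frame that is pending but not yet picked up as "busy", because in the pseudocode there appears to be a short window between FrameReady being set and BusyEvent being set in which the main loop could overwrite renderArray.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical frame type; the real renderArray layout is not given in the post.
using Frame = std::array<std::uint32_t, 16>;

struct FrameHandoff {
    std::mutex m;
    std::condition_variable cv;
    const Frame* renderArray = nullptr;  // frame handed to the render thread
    bool frameReady = false;             // stands in for the FrameReady event
    bool busy = false;                   // stands in for BusyEvent
    bool running = true;                 // stands in for gameIsRunning

    // Engine side: returns false (frame dropped) if the renderer is occupied.
    bool trySubmit(const Frame* f) {
        std::lock_guard<std::mutex> lock(m);
        if (busy || frameReady) return false;
        renderArray = f;
        frameReady = true;
        cv.notify_one();
        return true;
    }

    // Render side: blocks until a frame is ready or shutdown is requested.
    // Returns nullptr only when stopping with no frame pending.
    const Frame* waitForFrame() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return frameReady || !running; });
        if (!frameReady) return nullptr;
        frameReady = false;
        busy = true;
        return renderArray;
    }

    void doneRendering() {
        std::lock_guard<std::mutex> lock(m);
        busy = false;
    }

    void stop() {
        std::lock_guard<std::mutex> lock(m);
        running = false;
        cv.notify_one();
    }
};

// Runs `frames` engine iterations against a render thread and returns how
// many frames were displayed; the remaining frames were dropped.
int runDemo(int frames) {
    FrameHandoff h;
    Frame array1{}, array2{};
    int rendered = 0;
    std::uint64_t checksum = 0;

    std::thread renderThread([&] {
        while (const Frame* f = h.waitForFrame()) {
            checksum += (*f)[0];  // placeholder for DoDisplay(): lock, copy, render, Present
            ++rendered;
            h.doneRendering();
        }
    });

    Frame* fillArray = &array1;
    int submitted = 0;
    for (int i = 0; i < frames; ++i) {
        fillArray->fill(static_cast<std::uint32_t>(i));  // engine generates a frame
        if (h.trySubmit(fillArray)) {
            ++submitted;
            fillArray = (fillArray == &array1) ? &array2 : &array1;
        }
        // else: the display was busy, so this frame is dropped
    }

    h.stop();
    renderThread.join();
    assert(rendered == submitted);  // every accepted frame was displayed
    (void)checksum;
    return rendered;
}
```

Calling `runDemo(100)` returns the number of frames that were actually displayed; the exact count depends on scheduling, which is the frame-dropping behaviour described above.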
Conceptually, this seems like it should de-couple the engine and the video system. In practice, all of the Direct3D calls seemed to take 10x longer.
As a comparison, I left everything else the same and changed only the main code to this, so the two versions are as similar as possible:
fillArray = array1
while (gameIsRunning)
{
    Game engine runs (emulation logic, sound)
    Game engine generates a video frame to be rendered by filling in fillArray.
    set renderArray = fillArray
    call DoDisplay(renderArray) -> Lock texture, copy in renderArray, render, Present.
    set FrameReady event
    set fillArray = fillArray == array1 ? array2 : array1
}
Here's what my high-resolution timers recorded after 20 seconds of emulator-time elapsed (all times are averages):
NON-threaded version - rendered 1186 video frames
    Texture lock/copy/unlock: 0.38ms
    BeginScene/EndScene: 0.04ms
    Present: 6.13ms
Threaded version - rendered 122 video frames
    Texture lock/copy/unlock: 30.02ms (21.44ms in unlock alone)
    BeginScene/EndScene: 67.72ms (58.88ms in BeginScene)
    Present: 49.45ms
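Adding up the averages gives a feel for the totals. The small C++ sketch below only does arithmetic on the figures listed above; the struct and function names are hypothetical. It works out to about 6.55 ms of Direct3D calls per frame (roughly 7.8 s over 1186 frames) in the non-threaded version, versus about 147.19 ms per frame (roughly 18.0 s over 122 frames) in the threaded one. Whether 20 seconds of emulator-time corresponds to 20 seconds of wall-clock time is not something these numbers establish.

```cpp
#include <cassert>
#include <cmath>

// Average timings copied from the measurements above (milliseconds).
struct Timings {
    int frames;
    double lockMs;     // texture lock/copy/unlock
    double sceneMs;    // BeginScene/EndScene
    double presentMs;  // Present
};

// Average time spent in the measured Direct3D calls for one frame.
double perFrameMs(const Timings& t) {
    return t.lockMs + t.sceneMs + t.presentMs;
}

// Total time spent in the measured calls across all rendered frames.
double totalSeconds(const Timings& t) {
    return perFrameMs(t) * t.frames / 1000.0;
}

const Timings nonThreaded{1186, 0.38, 0.04, 6.13};
const Timings threaded{122, 30.02, 67.72, 49.45};
```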
Times were measured as follows:
QueryPerformanceCounter(start)
DxFunctionCall
QueryPerformanceCounter(end)
time = (end - start) / timerFreq
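A portable sketch of the same measurement pattern is shown here. It is an assumption-laden stand-in, not the code that produced the numbers above: `std::chrono::steady_clock` replaces QueryPerformanceCounter, and `timeCallMs` and `CallTimer` are hypothetical helper names. Note that with QueryPerformanceCounter, (end - start) / timerFreq yields seconds, so it has to be multiplied by 1000 to get the milliseconds reported above.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Times a single call and returns the elapsed time in milliseconds.
template <typename Fn>
double timeCallMs(Fn&& call) {
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Accumulates samples so that an average per call can be reported.
struct CallTimer {
    double totalMs = 0.0;
    int calls = 0;

    template <typename Fn>
    void measure(Fn&& call) {
        totalMs += timeCallMs(call);
        ++calls;
    }

    // Average milliseconds per measured call; 0 if nothing was measured.
    double averageMs() const {
        return calls ? totalMs / calls : 0.0;
    }
};
```

One `CallTimer` per measured call (lock/copy/unlock, BeginScene/EndScene, Present) would give the per-call averages in the form listed above, for example `presentTimer.measure([&] { /* Present call goes here */ });`.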
Again, all of the code is precisely the same. The only difference is that in the first version everything runs in one thread of execution, while in the second all of the D3D calls run in a separate thread from the engine.
I'm open to suggestions on things I could try and/or reasons for this dramatic performance drop.
Thanks,
-Joe
Nostalgia, an Intellivision Emulator
http://www.gotmaille.com/nostalgia/