Interesting issue with threads and D3D9

0 comments, last by Nostalgia 17 years, 2 months ago
Came across an interesting issue today that should make for some good discussion. First, the short version, which may save you some reading: is there anything special I need to do to use Direct3D in a threaded programming model? If the answer is no, read on. I'm still working on my emulator. A little history first: the original game loop looked like this:


while (gameIsRunning)
{
  Game engine runs (emulation logic, sound)
  Game engine generates a video frame to be rendered by filling in an array (let's call it renderArray).
  Texture is locked and renderArray is copied into texture.
  Texture is rendered to the scene.
  Scene is Present()ed.
}
This all happened in one thread of execution. What I wanted to do was decouple the array filling from the scene rendering, so the engine could keep working while DX was waiting on the hardware. So I created a thread that does this:

while (gameIsRunning)
{
  wait for FrameReady event to be set
  set BusyEvent
  call DoDisplay() -> Lock texture, copy in renderArray, render, Present.
  reset BusyEvent
}
The main code was changed to use two arrays (let's call them array1 and array2), so one can be filled while the other is rendered. The main thread of execution now does this:

fillArray = array1
while (gameIsRunning)
{
  Game engine runs (emulation logic, sound)
  Game engine generates a video frame to be rendered by filling in fillArray.
  
  if BusyEvent is NOT set, we can render this frame
  {
    set renderArray = fillArray
    set FrameReady event
    set fillArray = fillArray == array1 ? array2 : array1
  }
  else the video display was busy, so drop the frame
}
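For anyone who wants to poke at the scheme outside of D3D, here is a minimal, portable C++ sketch of the two-array hand-off with frame dropping described above. It is not the emulator's code: Frame, FrameHandoff, runPipeline and the renderCost delay are hypothetical stand-ins, and the D3D work in DoDisplay() is replaced by a sleep plus a check that the frame wasn't overwritten while it was being "rendered". One design choice differs from the pseudocode: the engine claims the busy flag itself, under the same lock it uses to publish the frame, so there is no window in which the render thread has been signalled but has not yet set its busy state.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for renderArray: one emulated video frame.
using Frame = std::vector<std::uint32_t>;

struct PipelineStats {
    int produced = 0;
    int rendered = 0;
    int dropped = 0;
    int torn = 0;  // frames whose pixels changed while being "rendered"
};

// Single-slot hand-off between the engine thread and the render thread.
class FrameHandoff {
public:
    // Engine side: returns false (drop the frame) if the renderer is busy.
    bool tryHandOff(const Frame* frame) {
        std::lock_guard<std::mutex> lock(m_);
        if (busy_) return false;
        assert(pending_ == nullptr);  // an idle renderer implies an empty slot
        pending_ = frame;
        busy_ = true;  // claimed here, under the lock, not by the render thread
        cv_.notify_one();
        return true;
    }
    // Render side: blocks until a frame arrives; nullptr means shut down.
    const Frame* waitForFrame() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ != nullptr || stop_; });
        const Frame* frame = pending_;
        pending_ = nullptr;
        return frame;
    }
    // Render side: the frame is done, so its buffer may be refilled.
    void finishFrame() {
        std::lock_guard<std::mutex> lock(m_);
        busy_ = false;
    }
    void stop() {
        std::lock_guard<std::mutex> lock(m_);
        stop_ = true;
        cv_.notify_one();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    const Frame* pending_ = nullptr;
    bool busy_ = false;
    bool stop_ = false;
};

PipelineStats runPipeline(int framesToProduce, std::chrono::microseconds renderCost) {
    Frame buffers[2] = {Frame(256), Frame(256)};  // array1 and array2
    FrameHandoff handoff;
    std::atomic<int> rendered{0};
    std::atomic<int> torn{0};
    std::thread renderThread([&] {
        while (const Frame* frame = handoff.waitForFrame()) {
            // Stand-in for DoDisplay(): lock texture, copy, render, Present.
            const std::uint32_t first = frame->front();
            std::this_thread::sleep_for(renderCost);
            for (std::uint32_t px : *frame) {
                if (px != first) { ++torn; break; }
            }
            ++rendered;
            handoff.finishFrame();
        }
    });
    PipelineStats stats;
    int fill = 0;  // index of fillArray within buffers
    for (int i = 0; i < framesToProduce; ++i) {
        // Stand-in for the engine filling fillArray with frame i.
        for (auto& px : buffers[fill]) px = static_cast<std::uint32_t>(i);
        ++stats.produced;
        if (handoff.tryHandOff(&buffers[fill])) {
            fill ^= 1;  // swap buffers only when the frame was accepted
        } else {
            ++stats.dropped;
        }
    }
    handoff.stop();
    renderThread.join();
    stats.rendered = rendered.load();
    stats.torn = torn.load();
    return stats;
}
```

Calling runPipeline(500, std::chrono::microseconds(200)) should report produced == rendered + dropped with torn == 0; how many frames get dropped depends entirely on thread timing.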
Conceptually, this seems like it should decouple the engine from the video system. In practice, all of the Direct3D calls seemed to take about 10x longer. For comparison, I left everything else the same and changed only the main loop to this, so the two versions are as similar as possible:

fillArray = array1
while (gameIsRunning)
{
  Game engine runs (emulation logic, sound)
  Game engine generates a video frame to be rendered by filling in fillArray.
  
  set renderArray = fillArray
  call DoDisplay(renderArray) -> Lock texture, copy in renderArray, render, Present.
    
  set FrameReady event
  set fillArray = fillArray == array1 ? array2 : array1
}
Here's what my high-resolution timers recorded after 20 seconds of emulator time had elapsed (all times are averages):

NON-threaded version - rendered 1186 video frames
Texture lock/copy/unlock:  0.38ms
BeginScene/EndScene:       0.04ms
Present:                   6.13ms

Threaded version - rendered 122 video frames
Texture lock/copy/unlock: 30.02ms (21.44ms in unlock alone)
BeginScene/EndScene:      67.72ms (58.88ms in BeginScene)
Present:                  49.45ms
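Summing the three averaged components gives a rough per-frame D3D cost for each version. This assumes the measured spans don't overlap and that both frame counts cover the same 20 seconds of emulator time:

```latex
\begin{aligned}
t_{\text{non-threaded}} &= 0.38 + 0.04 + 6.13 = 6.55\ \text{ms/frame}, & \tfrac{1186\ \text{frames}}{20\ \text{s}} &= 59.3\ \text{frames/s (emulated)}\\
t_{\text{threaded}} &= 30.02 + 67.72 + 49.45 = 147.19\ \text{ms/frame}, & \tfrac{122\ \text{frames}}{20\ \text{s}} &= 6.1\ \text{frames/s (emulated)}\\
\tfrac{t_{\text{threaded}}}{t_{\text{non-threaded}}} &\approx 22.5, & \tfrac{1186}{122} &\approx 9.7
\end{aligned}
```

So the drop in rendered frames matches the roughly 10x figure, while the summed per-frame D3D time grows by about 22x.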
Times were measured as follows:

QueryPerformanceCounter(&start)
DxFunctionCall
QueryPerformanceCounter(&end)
timeMs = (end - start) * 1000 / timerFreq
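As a portable illustration of the same measurement pattern (not the code that produced the numbers above), here is a small C++ sketch. std::chrono::steady_clock stands in for QueryPerformanceCounter, and CallTimer and ticksToMs are hypothetical helper names. ticksToMs shows the raw-tick arithmetic: (end - start) / timerFreq yields seconds, so the factor of 1000 is what turns it into the milliseconds reported in the tables.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ratio>

// Raw-tick conversion in the style of QueryPerformanceCounter/Frequency:
// (end - start) / ticksPerSecond is seconds, so scale by 1000 for ms.
// Converting to double first avoids integer truncation and overflow.
double ticksToMs(std::int64_t ticks, std::int64_t ticksPerSecond) {
    assert(ticksPerSecond > 0);
    return static_cast<double>(ticks) * 1000.0 / static_cast<double>(ticksPerSecond);
}

// Running average for one instrumented call, e.g. Present().
class CallTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Times one invocation of fn and folds it into the average.
    template <typename Fn>
    void time(Fn&& fn) {
        const Clock::time_point start = Clock::now();
        fn();
        add(Clock::now() - start);
    }

    void add(Clock::duration elapsed) {
        total_ += elapsed;
        ++count_;
    }

    long count() const { return count_; }

    double averageMs() const {
        if (count_ == 0) return 0.0;
        const std::chrono::duration<double, std::milli> totalMs = total_;
        return totalMs.count() / static_cast<double>(count_);
    }

private:
    Clock::duration total_{};
    long count_ = 0;
};
```

Assuming an IDirect3DDevice9* named device, each timed call would be wrapped as presentTimer.time([&] { device->Present(nullptr, nullptr, nullptr, nullptr); }); with one CallTimer per row of the table.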
Again, the code is otherwise identical: the only difference is that in the first version everything runs on one thread, while in the second all of the D3D calls run on a thread separate from the engine. I'm open to suggestions for things to try, or explanations for this dramatic performance drop.

Thanks,
-Joe
Nostalgia, an Intellivision Emulator http://www.gotmaille.com/nostalgia/
Actually, to be completely fair, I should also run the threaded mode WITHOUT allowing frame drops, so I tried that. It's now precisely the same chain of execution as the unthreaded model.
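One hypothetical way to express "no frame drops" in portable C++ is for the engine to wait for the render thread to go idle before handing off the next frame, instead of discarding it. BlockingHandoff, runSynchronous and Frame are illustrative names, not the emulator's code, and the D3D work is reduced to a counter. With this policy the engine can run at most one frame ahead of the renderer, and every frame involves a hand-off between the two threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for renderArray: one emulated video frame.
using Frame = std::vector<std::uint32_t>;

// Hand-off that never drops: submit() blocks while the renderer is busy.
class BlockingHandoff {
public:
    // Engine side: wait for the renderer to go idle, then publish the frame.
    void submit(const Frame* frame) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !busy_; });
        pending_ = frame;
        busy_ = true;
        cv_.notify_all();
    }
    // Render side: blocks until a frame arrives; nullptr means shut down.
    const Frame* next() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ != nullptr || stop_; });
        const Frame* frame = pending_;
        pending_ = nullptr;
        return frame;
    }
    // Render side: frame finished; wake an engine blocked in submit().
    void done() {
        std::lock_guard<std::mutex> lock(m_);
        busy_ = false;
        cv_.notify_all();
    }
    void stop() {
        std::lock_guard<std::mutex> lock(m_);
        stop_ = true;
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    const Frame* pending_ = nullptr;
    bool busy_ = false;
    bool stop_ = false;
};

// Produces `frames` frames; returns how many the render thread processed.
int runSynchronous(int frames) {
    Frame buffers[2] = {Frame(64), Frame(64)};  // array1 and array2
    BlockingHandoff handoff;
    int rendered = 0;  // written only by the render thread, read after join()
    std::thread renderThread([&] {
        while (const Frame* frame = handoff.next()) {
            // Stand-in for DoDisplay(): lock texture, copy, render, Present.
            assert(!frame->empty());
            ++rendered;
            handoff.done();
        }
    });
    int fill = 0;  // index of fillArray within buffers
    for (int i = 0; i < frames; ++i) {
        for (auto& px : buffers[fill]) px = static_cast<std::uint32_t>(i);
        handoff.submit(&buffers[fill]);
        fill ^= 1;  // every frame is accepted, so always swap
    }
    handoff.stop();
    renderThread.join();
    return rendered;
}
```

Because submit() only returns once the previous frame is finished, the buffer the engine refills is never the one being rendered.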

Ouch. It's painful to watch :) The timers were a bit better, though:

Synchronous threaded model - 1186 iterations
Texture lock/copy/unlock: 25.51ms
BeginScene/EndScene:       1.90ms
Present:                   8.73ms


Very interesting. More research is necessary. This is the fun part ;)

-Joe

