51mon

DirectX and multithreading


I have a rendering process that takes a lot of time, and since the time-consuming part doesn't need to be updated every frame, I was thinking of putting it into a separate thread. This is what the background thread should do:
• Access the graphics card and do some short rendering.
• Move the target buffer to system memory.
• Let the CPU do computation on the buffer (this is the part that takes the longest).
• Fill a texture and send it down to the graphics card to be used by the main thread.

Some data is shared between the threads, and I declare that data as global. The main thread is an ordinary C++ DirectX application based on the DirectX template, mostly function-oriented. The background thread is created with the CreateThread function. I also created the device with the D3DCREATE_MULTITHREADED flag. Here is my code:
[source lang = "cpp"]
HANDLE hScatteringLightThread;
DWORD dwScatteringLightId;
bool bRenderToSurface = false;
bool bUpdateGraphicCard = false;
LPDIRECT3DSURFACE9 g_pGPU_Surface;
LPDIRECT3DSURFACE9 g_pSysmem_Surface;
LPDIRECT3DTEXTURE9 g_pGPU_Texture;
LPDIRECT3DTEXTURE9 g_pSysmem_Texture;
// Other shared variables...


DWORD WINAPI ScatteringLightProc( LPVOID lpParam ) 
{
	while(1)
	{
		bRenderToSurface = true;
		// SuspendThread(hScatteringLightThread);	// Doesn't work
		// Sleep(INFINITE);				// Doesn't work either
		Sleep(40);
		// Compute g_pSysmem_Texture based on g_pSysmem_Surface
		bUpdateGraphicCard = true;
	}
	return 0;
}


HRESULT CALLBACK OnCreateDevice( IDirect3DDevice9* pd3dDevice, const D3DSURFACE_DESC* pBackBufferSurfaceDesc, void* pUserContext )
{
	hScatteringLightThread = CreateThread( NULL, 0, ScatteringLightProc, NULL, 0, &dwScatteringLightId);
	// Code...
}


void CALLBACK OnFrameRender( IDirect3DDevice9* pd3dDevice, double fTime, float fElapsedTime, void* pUserContext )
{
	if(bRenderToSurface)
	{
		// Render to g_pGPU_Surface
		// Move g_pGPU_Surface to g_pSysmem_Surface
		bRenderToSurface = false;
		// ResumeThread(hScatteringLightThread);	// Doesn't work
	}

	if(bUpdateGraphicCard)
	{	
		// Move g_pSysmem_Texture to g_pGPU_Texture
		// Update graphic card
		bUpdateGraphicCard = false;
	}
	// Rest of the rendering loop...
}


void CALLBACK OnDestroyDevice( void* pUserContext )
{
	DWORD dwExitCode;
	GetExitCodeThread(hScatteringLightThread,&dwExitCode);
	TerminateThread(hScatteringLightThread,dwExitCode);
	CloseHandle(hScatteringLightThread);
	// Code...
}
[/source]




The thing is that it doesn't work as I expected. When I use Sleep(INFINITE), ResumeThread does not treat the thread as suspended, and the thread never wakes up. When I use SuspendThread, the background thread just stops. The only thing that works is Sleep(40), but that is bad coding and I don't think it's optimal.

Does anyone have a suggestion for how I can solve this? Should I implement the threads in another way? Should I change the thread architecture and the way the threads interact with each other? Another thing I wonder about: is it safe to share global variables between the threads?

Thank you for your time :)

I'm not entirely sure about the threading APIs - I've not had a chance to delve into multiprogramming yet (still on my "todo" list [smile]).

BUT, I will say you're likely to incur substantial overhead from the D3D runtime by architecting it the way you have. There are lots of discussions about it on the DirectXDev mailing list that I tried to summarise in the forum FAQ. Basically, the D3DCREATE_MULTITHREADED flag just forces a critical section on most calls, which means forced synchronization and extra overhead without much genuine gain from the multiprocessing.

The consensus is that you should NOT use that flag and you should isolate your Direct3D connection inside of a single thread, exposing/sharing none of it with ANY other threads. Use regular system memory to transfer data between threads.

For example: your D3D thread downloads the front buffer, locks it, copies the contents to a private buffer with memcpy_s(), unlocks it, and carries on. Your worker thread can then access this local copy of the buffer and work on it as appropriate (ensure you have the correct guards in place), and then signal the D3D thread when it's done. The D3D thread then does the opposite of the first step and re-uploads the modified buffer.

By doing it this way your worker thread has no interaction with the D3D API or any resources that are directly connected with it (e.g. don't just pass the D3DLOCKED_RECT::pBits pointer to a thread, make a copy first).
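As a rough illustration of "make a copy first", here is a minimal sketch of copying a locked surface into a private, tightly packed buffer. It assumes the usual two fields of D3DLOCKED_RECT (Pitch and pBits); the `LockedRect` struct is only a stand-in so the snippet builds without d3d9.h, and `CopyLockedRect` is a hypothetical helper name, not part of Direct3D. The copy goes row by row because the pitch can be larger than the visible row width.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for D3DLOCKED_RECT (same two fields), used here only so the
// sketch compiles without the DirectX headers.
struct LockedRect {
    int   Pitch;  // bytes from one row to the next, may exceed width * bytesPerPixel
    void* pBits;  // pointer to the first row of the locked surface
};

// Hypothetical helper: copy the locked pixels into a private buffer that the
// worker thread can own, so pBits itself never leaves the D3D thread.
std::vector<std::uint8_t> CopyLockedRect(const LockedRect& lr,
                                         std::size_t width,
                                         std::size_t height,
                                         std::size_t bytesPerPixel)
{
    const std::size_t rowBytes = width * bytesPerPixel;
    std::vector<std::uint8_t> out(rowBytes * height);
    const std::uint8_t* src = static_cast<const std::uint8_t*>(lr.pBits);

    for (std::size_t y = 0; y < height; ++y) {
        // Skip the padding at the end of each source row by stepping with Pitch.
        std::memcpy(out.data() + y * rowBytes,
                    src + y * static_cast<std::size_t>(lr.Pitch),
                    rowBytes);
    }
    return out;
}
```

In a real application the D3D thread would call something like this between locking and unlocking the system-memory surface, then hand the returned vector to the worker.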

Extra efficiency can be gained by having double/triple buffering of data and possibly I/O queues for multiple threads.

hth
Jack
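The hand-off described in this post could be sketched roughly as below. This is not code from the thread: it swaps in standard C++ (std::thread, std::mutex, std::condition_variable) for the Win32 calls used in the question so that it stays portable, and the names `Mailbox`, `ProcessOnCpu` and `WorkerLoop` are made up for illustration. The worker only ever sees plain system-memory copies and never touches Direct3D.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Pixels = std::vector<std::uint8_t>;

// One-slot mailbox guarded by a mutex: one side posts a buffer, the other takes it.
class Mailbox {
public:
    void Post(Pixels p) {
        {
            std::lock_guard<std::mutex> lock(m_);
            slot_ = std::move(p);
            full_ = true;
        }
        cv_.notify_one();
    }
    // Blocks until a buffer is available (used by the worker).
    Pixels Take() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return full_; });
        full_ = false;
        return std::move(slot_);
    }
    // Never blocks: returns false if nothing is waiting (used by the render loop).
    bool TryTake(Pixels& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (!full_) return false;
        out = std::move(slot_);
        full_ = false;
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    Pixels slot_;
    bool full_ = false;
};

// Hypothetical placeholder for the expensive CPU computation.
Pixels ProcessOnCpu(Pixels in) {
    for (auto& b : in) b = static_cast<std::uint8_t>(255 - b);
    return in;
}

// Worker loop: takes a private copy, processes it, posts the result back.
// An empty buffer is used here as the "please stop" signal.
void WorkerLoop(Mailbox& toWorker, Mailbox& toRender) {
    for (;;) {
        Pixels in = toWorker.Take();
        if (in.empty()) return;
        toRender.Post(ProcessOnCpu(std::move(in)));
    }
}
```

The render thread would post its copied surface to `toWorker` and call `toRender.TryTake` once per frame, uploading the texture only when a result has arrived, so the frame loop never stalls. Shutting down by posting an empty buffer and joining the thread also avoids having to kill the worker from outside.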

Quote:
Original post by jollyjeffers

Thanks Jack
The thing with D3DCREATE_MULTITHREADED was that the compiler asked for it, and it got a bit faster when I used it. But I'm going to change my architecture quite radically, follow your advice, and hopefully gain something.

/ Simon

You could set up an event object with CreateEvent, wait on it with WaitForSingleObject in ScatteringLightProc, and signal it with SetEvent in OnFrameRender. That way it will work just like your ResumeThread idea.

edit: Check this out for an example: http://www.codersource.net/win32_waitforsingleobject.html

Good luck with your game!
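To show the shape of that signalling pattern, here is a minimal sketch. It is a portable stand-in rather than the Win32 code itself: `AutoResetEvent` is a made-up class built on std::condition_variable that behaves like an auto-reset event, with `Set` playing the role of SetEvent and `Wait` the role of WaitForSingleObject with an infinite timeout. `RunHandshake` is a hypothetical demo function, not something from the original program.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Portable analog of a Win32 auto-reset event (an assumption for illustration).
class AutoResetEvent {
public:
    // Plays the role of SetEvent: marks the event signaled and wakes a waiter.
    void Set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_one();
    }
    // Plays the role of WaitForSingleObject(handle, INFINITE).
    void Wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;  // auto-reset: the next Wait blocks until the next Set
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Demo: the worker sleeps inside Wait() until the "render" side calls Set(),
// instead of polling with Sleep(40).
int RunHandshake(int frames) {
    AutoResetEvent surfaceReady, textureReady;
    int processed = 0;

    std::thread worker([&] {
        for (int i = 0; i < frames; ++i) {
            surfaceReady.Wait();   // where ScatteringLightProc would block
            ++processed;           // stands in for the CPU computation
            textureReady.Set();
        }
    });

    for (int i = 0; i < frames; ++i) {
        surfaceReady.Set();        // where OnFrameRender would signal
        textureReady.Wait();       // a real render loop would check without blocking
    }
    worker.join();
    return processed;
}
```

Because the signaled flag persists until a waiter consumes it, a Set that happens before the worker reaches Wait is not lost, which is the property the SuspendThread/ResumeThread attempt was missing.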

Quote:
Original post by keen

Thanks
I don't think I'm going to have waiting times in my application any more. But that was smart problem solving; I think I'll use it next time :)
