Etnu

Multithreading.


I decided that some of the things I've been trying to do with my engine are a pain in the ass without multithreading, so I took some time today (about 2 hours or so) to implement it.

Short story: multithreading definitely eased some performance bottlenecks in certain areas, but it's kind of tricky to make sure that you don't totally fuck something up. There was a very slight performance loss from the initial setup (I went from about 550 fps average to about 530), but it was an overall improvement: the frame rate never dips below 400 fps now, whereas it used to drop as low as 200 during normal rendering, and to a tiny fraction of that when loading things.

Long story: there's one thing to consider when using D3D in a multithreaded environment that's more important than anything else: if you create or destroy a resource while D3D might be accessing it, you're probably going to crash your program. Fortunately, it's fairly easy to resolve this. Here's what I did:

//-----------------------------------
// LockRenderThread()
// Locks the rendering thread
//-----------------------------------
void CEther::LockRenderThread()
{	
	bool Locked = false;
	GraphicsEngine.RequestLock();
	while(!Locked)
	{
		if(GraphicsEngine.GetRenderState(RS_PAUSED))
			return;
		Sleep(0); // Give the other thread time to respond.
	}
}

//-----------------------------------
// UnlockRenderThread()
// Unlocks the rendering thread
//-----------------------------------
void CEther::UnlockRenderThread()
{
	if(GraphicsEngine.IsStateLocked(RS_PAUSED)) // We can't unlock at this point; most likely we're still loading.
		return;

	bool Locked = true;
	GraphicsEngine.RequestUnlock();
	while(Locked)
	{
		if(!GraphicsEngine.GetRenderState(RS_PAUSED))
			return;
		Sleep(0); // Give the other thread time to respond.
	}
}

The functions above live in the main engine. Here's the thread that actually does the rendering:
//-----------------------------------
// GraphicsProc()
// Handles rendering.
//-----------------------------------
DWORD WINAPI GraphicsProc(void* Obj)
{	
	assert(Obj);
	bool RunningThread = true;
	while(RunningThread)
	{
		if(GraphicsEngine.GetRenderState(RS_SHUTDOWN))
			return 0;

		if(!GraphicsEngine.GetRenderState(RS_PAUSED))
		{
			if(GraphicsEngine.IsLocked())
				GraphicsEngine.SetRenderState(RS_PAUSED, true);				
		}
		else
		{
			if(!GraphicsEngine.IsLocked())
				GraphicsEngine.SetRenderState(RS_PAUSED, false);				
		}

		if(GraphicsEngine.GetRenderState(RS_PAUSED))
		{
			Sleep(0); // Yield while paused instead of spinning at full speed.
			continue;
		}

		// Begin rendering
		RSLT rslt = GraphicsEngine.BeginScene();
		if(rslt != SUCCESS)
		{
			PostQuitMessage(0);
			break;
		}

		
		//...snipped drawing routines.							

		// Render the scene.
		rslt = GraphicsEngine.EndScene();			
		if(rslt != SUCCESS)
		{
			PostQuitMessage(0);
			break;
		}		
		Sleep(0);
	}
	return 0;
}
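For anyone who wants to experiment with the same lock/acknowledge handshake outside of Direct3D, here is a minimal, portable sketch. It is not the engine's code: it swaps Win32's Sleep(0) for std::this_thread::yield(), and it assumes the shared flags are std::atomic<bool> (the post doesn't show how Lock and RS_PAUSED are stored, and plain bools shared between threads would be a data race in standard C++). RenderSync and framesRendered are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the engine's flags. std::atomic is assumed so
// both threads are guaranteed to see each other's writes.
struct RenderSync {
    std::atomic<bool> lockRequested{false}; // written by the main thread (RequestLock)
    std::atomic<bool> paused{false};        // written by the render thread (RS_PAUSED)
    std::atomic<bool> shutdown{false};      // RS_SHUTDOWN
    std::atomic<int>  framesRendered{0};    // stands in for BeginScene/EndScene work
};

// Render thread: reads the lock request only at the top of the loop, i.e.
// between frames, so a frame is never interrupted halfway through.
void RenderLoop(RenderSync& s) {
    while (!s.shutdown.load()) {
        const bool pause = s.lockRequested.load();
        s.paused.store(pause);             // acknowledge (or clear) the request
        if (!pause)
            s.framesRendered.fetch_add(1); // ...BeginScene / draw / EndScene...
        std::this_thread::yield();         // like Sleep(0), also while paused
    }
}

// Main thread: request the lock, then wait for the render thread to acknowledge.
void LockRenderThread(RenderSync& s) {
    s.lockRequested.store(true);
    while (!s.paused.load())
        std::this_thread::yield();
}

// Main thread: release the lock, then wait until rendering has resumed.
void UnlockRenderThread(RenderSync& s) {
    s.lockRequested.store(false);
    while (s.paused.load())
        std::this_thread::yield();
}
```

The key property matches the original: only the render thread sets the paused flag, and only between frames, so once LockRenderThread() returns no frame is in progress until UnlockRenderThread() is called.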

RequestLock() sets the Lock variable of the GraphicsEngine class to true, but it does NOT set the RS_PAUSED render state; only the rendering thread sets that, at the top of its loop, between frames. That way, we can wait until RS_PAUSED actually gets set and return control to the program once it does. By doing this, we can be 100% sure that between LockRenderThread() and UnlockRenderThread(), D3D will be doing absolutely nothing. This is key.

Now, when loading a new resource (vertex buffer, texture, shader, whatever), you simply:

- Call LockRenderThread()
- Make your changes.
- Call UnlockRenderThread()

Note: my version displays a blank screen if the update takes a long time, since it renders nothing while paused. You could easily render some default text instead ("Loading... please wait") or something similar.

The biggest benefit I'm getting out of running the rendering on a separate thread is that I can better control the timing resolution of different subsystems, and I can very easily control how much CPU time is dedicated to non-graphics work. In most time-based rendering loops, turning v-sync on makes the whole loop wait for the display refresh, so you suddenly only update at a resolution of about 16-17 ms (one refresh at 60 Hz). Doing it this way, the main thread can still update at a fraction of that interval, even if my frame rate happens to dip to 20 fps or something.

It's important that you don't make any of your rendering functions dependent on the execution time of your rendering loop. For example, if you have a shader that takes a constant representing the time, poll the time from your main thread, not the rendering thread; this will keep everything in sync, even if the video card starts chugging along.

Some people may think this is kind of worthless, but just look at the plans for both the new game consoles and the new PCs getting ready to ship: multiple processors. If you want to take advantage of that, you need multithreading; otherwise you won't really be able to use the additional processing power at your fingertips.
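The advice to poll the time from the main thread could look something like the sketch below. It is an illustration under assumptions, not the engine's code: SharedGameTime and SecondsSince are hypothetical names, and std::atomic<double> is used so the render thread can read the value the main thread writes without taking a lock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical helper: the main thread owns the clock and publishes the game
// time once per update; the render thread only reads it (for example, to fill
// a shader's "time" constant), so the value doesn't depend on how long the
// render loop itself takes.
class SharedGameTime {
public:
    // Main thread, once per update.
    void Publish(double seconds) {
        assert(seconds >= 0.0); // measured from game start, so never negative
        seconds_.store(seconds, std::memory_order_release);
    }
    // Render thread, whenever it sets time-dependent constants.
    double Read() const { return seconds_.load(std::memory_order_acquire); }
private:
    std::atomic<double> seconds_{0.0};
};

// Main-thread helper: seconds elapsed since `start`, from a monotonic clock.
double SecondsSince(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}
```

The main thread would call Publish(SecondsSince(start)) once per update, and the render thread would call Read() when setting the shader constant, so a slow frame only changes when the value gets used, not what the value is.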

First of all, great post [grin]. Now you have me wanting to implement a thread interface in my project. Must...resist...more...features...

I've never really messed with multi-threading, so in your opinion, was it easy to implement? If it only took you 2 hours, it can't be too bad.

Also, how many different threads are you going to make? Right now, you have:

(1) Engine thread
(2) Rendering thread

If you ever added a networking interface, you would need a separate thread for that as well (solely for collecting and processing messages). It could get *very* complicated, where you have a separate thread for everything (i.e. graphics, sound, networking, input, physics).

It would be cool to have someone write an article about multithreading, and how it can be used in game engines. It could cover a lot of material, I believe.

Our coding styles are shockingly similar...we even use the same function header comment. The only difference I see is that I almost always use HRESULTs.

I note that you have a local boolean in both of those functions which is never used except as a means of keeping a loop going forever. Why not just do while(true) :). Or did you make a mistake while pasting the code?

Quote:
Original post by circlesoft

I've never really messed with multi-threading, so in your opinion, was it easy to implement? If it only took you 2 hours, it can't be too bad.


It was pretty easy, but my engine layout was already well structured, so I didn't have to change a whole lot; if you're calling D3D functions directly from your app, you'll have a hell of a lot more work than I did. I use a rendering queue system (objects never draw themselves directly), and I use a resource manager that controls the creation of all D3D resources (textures, shaders, buffers, etc.). If you do any of this on your own, outside of a centralized location, it will be very, very hard to get this right.
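A render queue in the spirit described here (objects submit requests instead of drawing themselves) might be sketched as follows. DrawCommand, RenderQueue and the sort-by-texture policy are illustrative assumptions, not the engine's actual design; the queue itself is not thread-safe and would rely on the lock/unlock scheme from the first post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw command: handles into the resource manager rather than
// raw D3D pointers, so game objects never touch D3D resources directly.
struct DrawCommand {
    std::uint32_t textureId;
    std::uint32_t meshId;
};

class RenderQueue {
public:
    // Game objects call this instead of drawing themselves.
    void Submit(const DrawCommand& cmd) { pending_.push_back(cmd); }

    // Renderer side: group commands by texture to reduce state changes
    // (stable, so submission order is kept within a texture), hand each
    // command to `draw`, then clear the queue for the next frame.
    template <typename DrawFn>
    void Flush(DrawFn draw) {
        std::stable_sort(pending_.begin(), pending_.end(),
                         [](const DrawCommand& a, const DrawCommand& b) {
                             return a.textureId < b.textureId;
                         });
        for (const DrawCommand& cmd : pending_) draw(cmd);
        pending_.clear();
    }

    std::size_t Size() const { return pending_.size(); }

private:
    std::vector<DrawCommand> pending_;
};
```

Because only the renderer ever turns a DrawCommand into actual D3D calls, the set of places that can touch D3D resources stays small, which is what makes the threading change manageable.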

Quote:

Also, how many different threads are you going to make? Right now, you have:

(1) Engine thread
(2) Rendering thread

If you ever added a networking interface, you would need a separate thread for that as well (solely for collecting and processing messages). It could get *very* complicated, where you have a separate thread for everything (i.e. graphics, sound, networking, input, physics).


Not really. For networking, you're already using unreliable data (thanks, UDP!), so it doesn't matter if your updates are irregular. You're only going to get a few milliseconds (at most) of delay in the updates, so you're fine.

Sound is going to be on a separate thread (streaming is hard to do without one), as well as networking. The main thread will deal with physics and input, as those are the things that have no hardware support and require the most CPU time.

Quote:

Our coding styles are shockingly similar...we even use the same function header comment. The only difference I see is that I almost always use HRESULTs.


Yeah, I'm not that fond of HRESULTs (though of course they're required for COM). I prefer exceptions and error logging to checking return values; that's kind of a pain in the ass, really.

Also, 9 times out of 10 when an error occurs, I have some default behavior (if you try to load an invalid texture, it loads a default purple texture that says "INVALID TEXTURE" instead). Most other errors either stop the engine from working right or require falling back to some ugly behavior. If a certain feature of D3D isn't supported, I simply have a fallback that automatically kicks in. The remaining issues are critical failures where the program has to terminate anyway, so you might as well just log the error and terminate.
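The placeholder-texture fallback combined with exceptions and logging could be sketched like this. TextureManager, kInvalidTexture and the stand-in loader (which only "succeeds" for paths ending in .dds) are hypothetical; a real loader would create an actual D3D texture and throw on a genuine failure.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical texture handle; 0 is reserved for the built-in purple
// "INVALID TEXTURE" placeholder that would be created at startup.
using TextureId = unsigned;
constexpr TextureId kInvalidTexture = 0;

class TextureManager {
public:
    // Always returns a usable handle: on failure the error is logged and the
    // placeholder is returned instead of propagating the exception.
    TextureId Load(const std::string& path) {
        try {
            return LoadOrThrow(path);
        } catch (const std::exception& e) {
            std::cerr << "texture load failed: " << e.what() << '\n';
            return kInvalidTexture;
        }
    }

private:
    // Stand-in for the real loader: only paths ending in ".dds" succeed.
    // Repeated loads of the same path return the same handle.
    TextureId LoadOrThrow(const std::string& path) {
        if (path.size() < 4 || path.compare(path.size() - 4, 4, ".dds") != 0)
            throw std::runtime_error("unsupported file: " + path);
        auto it = ids_.find(path);
        if (it != ids_.end()) return it->second;
        return ids_[path] = nextId_++;
    }

    std::map<std::string, TextureId> ids_;
    TextureId nextId_ = 1;
};
```

Callers never need to check a return code: in the worst case they draw with the placeholder, which makes the failure obvious on screen while the log records the reason.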

Quote:
Original post by Washu
I note that you have a local boolean in both of those functions which is never used except as a means of keeping a loop going forever. Why not just do while(true) :). Or did you make a mistake while pasting the code?


Nope, no mistake; I use level 4 warnings and treat warnings as errors, and while(true) triggers a level 4 warning in Visual Studio (C4127, "conditional expression is constant"), whereas the local booleans do not. I suppose there's a 'cleaner' way to do it (while(!GraphicsEngine.GetRenderState(RS_PAUSED))), but I wasn't quite sure if I'd need to do any checks before exiting, so I left it as is.

I like your idea; it is neat to keep your render thread locked when updating shared data. But from personal experience I find multithreading is GENERALLY a bad idea in games. The cost of context switching between threads is not worth it unless you have a very specific application such as yours (streaming in music, for one). Also, synchronizing timing between threads, or using mutexes or semaphores to control access to shared sensitive data, is a nightmare even in non-realtime apps (can anyone say Managed C++?)...

And on a single-processor system (most PCs?) you can simulate threading by using access flags and repeatedly polling your WAIT state. This is much faster than using two threads, as it spares you the context switch. I had an app that polled input in a different thread from the engine, which in turn was separate from the graphics. While I managed to synchronize all three in the end (so that no undefined memory behavior occurred), I found it gave me a serious performance drop, especially since I was polling one thread to generate data for another redundantly (say, waiting for the up arrow to be pressed to move your ship, but polling a hundred times a second without a press: switching contexts 300 times a second for no reason). As a single-threaded app it had no trouble at all. As I said, you can reclaim the CPU cycles you'd otherwise be handing off to the other threads by using a polling flag.

And I'm not sure, but I would imagine that next-gen consoles with multiprocessing (SMP) will have their APIs/kernel automatically distribute and manage tasks (like the Solaris model that 'attaches' kernel threads to applications), so writing your own multithreading layer may be redundant.
All that said, I'm still very impressed by the render-thread locking you have demoed.

Direct3D can handle some multithreading synchronization itself if you pass the D3DCREATE_MULTITHREADED flag when calling CreateDevice. It doesn't prevent you from using critical sections in your own code, but it does protect Direct3D calls against concurrent access.

Whoa, it seems my reply bugged something on the forum (I got an error message). Could a moderator please delete these ugly repeated posts? Sorry for the disturbance.

Clicking twice will not result in 9 posts ;).

Anyway, Etnu, very cool. So, are you using critical sections for synchronization? You could eliminate at least one of those loops with a Wait call.

Quote:
Original post by dhanji
I like your idea; it is neat to keep your render thread locked when updating shared data. But from personal experience I find multithreading is GENERALLY a bad idea in games. The cost of context switching between threads is not worth it unless you have a very specific application such as yours (streaming in music, for one). Also, synchronizing timing between threads, or using mutexes or semaphores to control access to shared sensitive data, is a nightmare even in non-realtime apps (can anyone say Managed C++?)...



Not necessarily; if your data is managed right, you can pretty much make it so that nothing outside of the renderer ever touches D3D resources. If you can think of a time when you'd actually want or need to access the D3D resources from elsewhere, though, I'd be happy to hear it. I just can't think of any.

Like my original post said, though, I've actually seen a significant improvement in overall performance here.

Quote:

And on a single-processor system (most PCs?) you can simulate threading by using access flags and repeatedly polling your WAIT state. This is much faster than using two threads, as it spares you the context switch. I had an app that polled input in a different thread from the engine, which in turn was separate from the graphics. While I managed to synchronize all three in the end (so that no undefined memory behavior occurred), I found it gave me a serious performance drop, especially since I was polling one thread to generate data for another redundantly (say, waiting for the up arrow to be pressed to move your ship, but polling a hundred times a second without a press: switching contexts 300 times a second for no reason). As a single-threaded app it had no trouble at all. As I said, you can reclaim the CPU cycles you'd otherwise be handing off to the other threads by using a polling flag.


Possibly, though this might also overcomplicate your code; I suppose there are ways around it, but, personally, I feel that the multithreaded approach works just fine. It also will scale nicely on multi-processor systems.

Quote:

And I'm not sure, but I would imagine that next-gen consoles with multiprocessing (SMP) will have their APIs/kernel automatically distribute and manage tasks (like the Solaris model that 'attaches' kernel threads to applications), so writing your own multithreading layer may be redundant.


Possibly, though I'm not so sure how optimal it would be; I mean, the OS can't possibly "know" how your app is designed. Yes, it can make good guesses, but it'd be kind of hard to know exactly when your app will be using different resources without adding a ton of extra overhead. I suppose we'll have to wait a few more years to find out, though.

Quote:

All that said, I'm still very impressed by the render-thread locking you have demoed.


Thanks.

Quote:

Direct3D can handle some multithreading synchronization itself if you pass the D3DCREATE_MULTITHREADED flag when calling CreateDevice. It doesn't prevent you from using critical sections in your own code, but it does protect Direct3D calls against concurrent access.


Er, not exactly. D3DCREATE_MULTITHREADED is required whenever more than one thread calls into the device, period. You can't run without it (well, you can, but only if you want to get 5 fps).

Also, see the point above about the system not knowing what you're going to do. If you let your app lock D3D resources without the rendering thread being aware of it, you're likely to cause all sorts of problems; not only that, but if you've got a rendering queue, you'll probably be altering or invalidating data the queue still refers to.

The setup I proposed makes sure that you never alter an object inside of a frame. That's extremely useful.

I've also begun using this setup to stream data in the background. Depending on how far I take it, this could be extremely interesting... I'll update if I get anywhere with it.

Quote:
Original post by Washu
Clicking twice will not result in 9 posts ;).

Anyway, Etnu, very cool. So, are you using critical sections for synchronization? You could eliminate at least one of those loops with a Wait call.


That'd put more overhead than is really necessary on the system.

The way I have things setup, objects can't possibly alter d3d data (buffers, textures, etc.) at all -- they just can't.

The loops are only activated when something needs to change the data (which is handled internally by the resource manager). LockRenderThread() always returns within 2 frames at most (typically 1-10 ms, depending on the current frame rate, of course). This delay is completely irrelevant, though, as data is only updated while the render loop is paused.

Anything that requires dynamic updates of geometry every frame (such as splitting triangles for LOD calculations and whatnot) is handled by the object's drawing code (which is only called from the rendering thread).

The net result is, again, that there's zero risk of data collisions, and you don't have to worry about creating things like mutexes and whatnot.

With high frame rates, exact synchronization is not that important. If I'm only moving my position by a few pixels every iteration through the loop, but I'm rendering at 60 fps or more, nobody is going to notice that their movement showed up a frame after they pushed the key.

Yes, it would be an issue in extremely low frame rate situations. I think it goes without saying that low frame rates are an issue in and of themselves anyway, though, so I don't feel that it's important.

Guest Anonymous Poster
FYI:
for(;;) is the standard C/C++ idiom for an infinite loop. It should pass level 4 warnings.

Well, since you have only one other thread that might modify the data, it sounds like your system is fine. Although I would be hesitant to use it myself without adding a guaranteed synchronization object.

Critical sections != mutexes. Mutexes add overhead for sure, as they are cross-process objects. A critical section, however, is valid only within the current process, and thus much more lightweight. As far as creating them goes, c'mon, it's a single call, once at the start. Also, using a signal, you could tell the waiting thread that the render state was paused. This would result in that thread using 0 processing time while it waits. The current Sleep(0) method still requires the thread to be switched to and run; even though that is fairly quick and it does surrender the rest of its time slice, it is nonetheless using processing time.

How do you schedule jobs for the thread? I.e., you say you do it when loading resources, so how do you tell the thread what to load? Basically, what kind of inter-thread communication do you have?
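The suggestion to signal instead of polling with Sleep(0) could be sketched like this, using std::mutex and std::condition_variable as stand-ins for a Win32 critical section plus event. RenderPauseGate, CheckpointBetweenFrames and RenderLoop are hypothetical names. A thread waiting inside wait() is blocked by the OS rather than being rescheduled over and over.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical gate: the same pause handshake as LockRenderThread() /
// UnlockRenderThread(), but both sides block instead of spinning.
class RenderPauseGate {
public:
    // Main thread: request a pause and block until the render thread parks.
    void Lock() {
        std::unique_lock<std::mutex> lk(m_);
        lockRequested_ = true;
        cv_.wait(lk, [this] { return paused_; });
    }
    // Main thread: release the render thread and block until it resumes.
    void Unlock() {
        std::unique_lock<std::mutex> lk(m_);
        lockRequested_ = false;
        cv_.notify_all();
        cv_.wait(lk, [this] { return !paused_; });
    }
    // Render thread: call once per loop iteration, between frames. Parks
    // here (using no CPU) for as long as a lock is held.
    void CheckpointBetweenFrames() {
        std::unique_lock<std::mutex> lk(m_);
        if (!lockRequested_) return;
        paused_ = true;
        cv_.notify_all();                                  // wakes Lock()
        cv_.wait(lk, [this] { return !lockRequested_; });
        paused_ = false;
        cv_.notify_all();                                  // wakes Unlock()
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool lockRequested_ = false;
    bool paused_ = false;
};

// Hypothetical render loop; `frames` stands in for the actual drawing.
void RenderLoop(RenderPauseGate& gate, std::atomic<bool>& stop, std::atomic<int>& frames) {
    while (!stop.load()) {
        gate.CheckpointBetweenFrames();
        frames.fetch_add(1); // ...BeginScene / draw / EndScene...
        std::this_thread::yield();
    }
}
```

A shutdown request issued while the gate is locked would also need to wake the render thread; that case is left out to keep the sketch short.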

Quote:
Not necessarily; if your data is managed right, you can pretty much make it so that nothing outside of the renderer ever touches D3D resources. If you can think of a time when you'd actually want or need to access the D3D resources from elsewhere, though, I'd be happy to hear it. I just can't think of any.


Not necessarily D3D objects, just any data. For instance, in my earlier example an input polling thread receives data and places it in a buffer whose access is controlled by a mutex/critical section (synchronized function), and if you're not generating any input, you have wasted context switches. I admit on faster processors it wouldn't really be significant. But you don't really want any overhead at all, right?

I actually think using a simulated wait state on a single thread would be quite easy to program/manage. Certainly cheaper than switching threads.

It just seems like when you are extending your app, you have more data that is exchanged between threads, so you're creating more potential for issues. I mean, for a good programmer it's not a big deal, but I'm of the keep-it-simple, keep-it-easy school. =)

Quote:

Well, since you have only one other thread that might modify the data, it sounds like your system is fine. Although I would be hesitant to use it myself without adding a guaranteed synchronization object.


This is pretty much what I'm trying to say, too.

Quote:
Original post by Washu
Well, since you have only one other thread that might modify the data, it sounds like your system is fine. Although I would be hesitant to use it myself without adding a guaranteed synchronization object.

Critical sections != mutexes. Mutexes add overhead for sure, as they are cross-process objects. A critical section, however, is valid only within the current process, and thus much more lightweight. As far as creating them goes, c'mon, it's a single call, once at the start. Also, using a signal, you could tell the waiting thread that the render state was paused. This would result in that thread using 0 processing time while it waits. The current Sleep(0) method still requires the thread to be switched to and run; even though that is fairly quick and it does surrender the rest of its time slice, it is nonetheless using processing time.

How do you schedule jobs for the thread? I.e., you say you do it when loading resources, so how do you tell the thread what to load? Basically, what kind of inter-thread communication do you have?


None; the only direct interaction between the threads is when the main thread locks the other thread. The main thread cannot alter data while the other thread is executing: if it did, we'd have a fine mess on our hands.

I originally set up a messaging system for the other thread, but came up with this simplified version when I realized that the only messages that really mattered were:

1.) Telling the thread to pause.

2.) Telling the thread to terminate.

None of the other messages matter.

The main thread still does the actual loading and unloading of the resources, through the resource manager. The resource manager ensures that the rendering thread is locked before it makes any changes, and then it unlocks the thread when it's finished.
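The lock / modify / unlock sequence inside the resource manager fits naturally into an RAII guard, which also guarantees the render thread is released if a load throws (relevant given the preference for exceptions mentioned earlier). ScopedRenderLock and ModifyResources are hypothetical names; the only assumption about the engine is that it exposes LockRenderThread() and UnlockRenderThread() members like CEther above.

```cpp
#include <cassert>

// Hypothetical RAII wrapper: the constructor pauses rendering and the
// destructor resumes it, even when the code in between throws.
template <typename Engine>
class ScopedRenderLock {
public:
    explicit ScopedRenderLock(Engine& e) : engine_(e) { engine_.LockRenderThread(); }
    ~ScopedRenderLock() { engine_.UnlockRenderThread(); }
    ScopedRenderLock(const ScopedRenderLock&) = delete;
    ScopedRenderLock& operator=(const ScopedRenderLock&) = delete;
private:
    Engine& engine_;
};

// Example resource-manager entry point: lock, apply the change, and let the
// guard's destructor unlock on both normal return and exception.
template <typename Engine, typename Fn>
void ModifyResources(Engine& engine, Fn change) {
    ScopedRenderLock<Engine> guard(engine);
    change(); // create / release textures, buffers, shaders...
}
```

Keeping the guard inside the resource manager preserves the property from the original design: nothing outside that central location ever pauses the renderer or touches D3D resources.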

It's certainly true that you couldn't use this generic approach for ALL threading. That's not what I'm using it for, anyway. This is strictly to allow a separate thread to run independently of what everything else is doing. I'd never do this for a general-purpose threading system. That'd get ugly in no time.

Quote:

It just seems like when you are extending your app, you have more data that is exchanged between threads


Nope: this thread will never touch anything except rendering-specific items; that means vertex buffers, index buffers, textures, and the like. Nothing else, and nothing that would ever be read by the other thread: this is a one-way process. Ever tried reading from a vertex buffer in D3D? OUCH!

A synchronization object would certainly be necessary if reads and writes could go both ways, but they really can't here. D3D never writes to anything; it only reads. The app never reads from anything; it only writes.

That's the key to making the whole system work.


Ahh, I see. I figured you were using the thread as a worker thread, but instead you are using it as the rendering thread. However, I'm going to stick by my original assessment about using synchronization objects.

Quote:
It's certainly true that you couldn't use this generic approach for ALL threading. That's not what I'm using it for, anyway. This is strictly to allow a separate thread to run independently of what everything else is doing.


Fair enough, I like it for what it is =)


Quote:
If you ever added a networking interface, you would need a separate thread for that as well (solely for collecting and processing messages). It could get *very* complicated, where you have a separate thread for everything (i.e. graphics, sound, networking, input, physics).


This, I imagine, is not what you have in mind...

Quote:
Original post by dhanji

Quote:
If you ever added a networking interface, you would need a separate thread for that as well (solely for collecting and processing messages). It could get *very* complicated, where you have a separate thread for everything (i.e. graphics, sound, networking, input, physics).


This, I imagine, is not what you have in mind...


Nope. I'm leaving the separate threads for stuff that doesn't use much in the way of CPU resources, and giving the main thread responsibility for CPU-intensive tasks like collision detection, file I/O, etc.

I'll probably have sound on a separate thread (streaming is hard to do right otherwise), as well as networking, but input, interface, etc. are all handled in the main thread. It's just convenient to do collision detection immediately after you handle movement and such.

Quote:
separate thread (streaming is hard to do right otherwise), as well as networking


Networking and loading (streaming) make the most sense to me. I know GAIM (and I assume AIM too) uses threading to keep its interface responsive while polling the socket for input. If you used GAIM before last year, you'll remember how the interface would freeze up when someone was typing really quickly at the other end. Haha...

Quote:
Original post by dhanji
Quote:
separate thread (streaming is hard to do right otherwise), as well as networking


Networking and loading (streaming) make the most sense to me. I know GAIM (and I assume AIM too) uses threading to keep its interface responsive while polling the socket for input. If you used GAIM before last year, you'll remember how the interface would freeze up when someone was typing really quickly at the other end. Haha...

Yeah, running a blocking read or select in another thread is generally more efficient than not. Since the thread will be in the wait state almost all of the time, it will take very little processing power. And once it gets data, it will be signaled and can easily read.

Streaming data is also one of those things where TryLocks are useful. Say you have a temp buffer you fill. You could keep filling it up to a point, then try to lock the main buffer; if that fails, just keep filling the temp buffer until you eventually do get a lock on the main one.
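The TryLock pattern described above could look like this in standard C++, using std::unique_lock with std::try_to_lock in place of a Win32 TryEnterCriticalSection call. StreamingBuffer and the flush threshold are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// The streaming (producer) thread fills a private staging buffer and only
// moves it into the shared buffer when the shared lock happens to be free,
// so it never blocks waiting on the consumer.
class StreamingBuffer {
public:
    explicit StreamingBuffer(std::size_t flushThreshold) : threshold_(flushThreshold) {}

    // Producer: stage a chunk; once enough has accumulated, attempt a
    // non-blocking hand-off. Returns true if the staged data was published.
    bool Push(const std::vector<char>& chunk) {
        staging_.insert(staging_.end(), chunk.begin(), chunk.end());
        if (staging_.size() < threshold_)
            return false;                  // keep filling "up to a point"
        std::unique_lock<std::mutex> lk(sharedMutex_, std::try_to_lock);
        if (!lk.owns_lock())
            return false;                  // consumer busy: keep filling staging_
        shared_.insert(shared_.end(), staging_.begin(), staging_.end());
        staging_.clear();
        return true;
    }

    // Consumer: take everything published so far (a blocking lock is fine here).
    std::vector<char> TakeAll() {
        std::lock_guard<std::mutex> lk(sharedMutex_);
        std::vector<char> out;
        out.swap(shared_);
        return out;
    }

    // Producer-side only: bytes staged but not yet published.
    std::size_t StagedBytes() const { return staging_.size(); }

private:
    std::size_t threshold_;
    std::mutex sharedMutex_;
    std::vector<char> shared_;  // guarded by sharedMutex_
    std::vector<char> staging_; // touched only by the producer thread
};
```

If the consumer holds the lock when the threshold is reached, the data simply stays staged and goes out with a later Push.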

Sorry to dig up an older thread, but I had a specific question about Etnu's multithreaded design, regarding player input.

So you have the renderer running at full speed, in a separate thread. And in the main thread you have the game loop, with the player input, physics, AI, etc. Now, I would assume that the main thread runs at a fixed rate, unlike the rendering thread, rather than as fast as it can (correct me if I'm wrong).

My question is, if you do have it running at a fixed rate, player input is being polled at this fixed rate as well. Doesn't this introduce input lag? The game's input responses lagging behind what the player is actually doing (as far as how the game "feels" to the player).

And if not, if the main thread is indeed running at full speed to keep input response up, how do you deal with the main thread taking CPU time away from the rendering thread? Two threads running at full speed would obviously cut the frame rate in half.
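For reference, a fixed-rate main thread of the kind assumed in the question might look like the sketch below; RunFixedRate and its callbacks are hypothetical. Input is sampled once per step, so sampling alone adds at most one step of delay (up to 10 ms with a 10 ms step), and std::this_thread::sleep_until hands the CPU back between steps instead of competing with the render thread at full speed.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Runs `update` (input, physics, AI...) once per `step` until `shouldStop`
// returns true, and returns the number of updates performed.
template <typename Update, typename Stop>
int RunFixedRate(std::chrono::milliseconds step, Update update, Stop shouldStop) {
    using clock = std::chrono::steady_clock;
    auto next = clock::now();
    int ticks = 0;
    while (!shouldStop()) {
        update();                            // poll input, physics, AI...
        ++ticks;
        next += step;                        // schedule from the ideal time, not "now"
        std::this_thread::sleep_until(next); // idle until the next step
    }
    return ticks;
}
```

If an update overruns its step, `next` is already in the past, so sleep_until returns immediately and the loop runs the following update right away to catch up.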

The only problem with threads is that you have redundancy that 90% of the time wastes processing time. The noticeable drop in FPS is because you are constantly switching contexts (stack states) between 2 running threads. Imagine now if you start using more and more threads. Threads sound good, guys, I know, but they aren't a godsend.

He is right about multiprocessors, though, but now with home users getting AMD 64, I don't see standard gaming PCs using multiple processors for a while (if ever). The problem is much the same as with multithreading: you rely on breaking up tasks that take up time, which is why dual-processor machines usually only increase actual processing speed to about 150% (not the expected 200%; losing 50% must mean something). The idea of multithreading is good, but for linear games I would never make it part of the game loop: constantly flipping between the stacks within one process lowers the FPS. However, the XBOX 2 and PS3 use multiple cores, so they should work well with it.

Also, another big part of what compilers and processors do is pipelining data in a manner that seems and acts in many ways like multithreading. So if you need to update before you render, just update, and if you can't render for some reason, don't, and update again until you can. Also, what if your rendering is a multistep process with render targets? You might be in there a while, but the whole time you have these threads existing that are useless and constantly being switched to.

So, me not being a moderator, I am sure you will take Etnu's side, but the truth is that for home PCs multithreading is good for certain things (loading levels and displaying text, running background music, saving a game while continuing gameplay, etc.) but bad for a main game loop. And (seeing as how everyone on GameDev relies on big names) based on a strong source, Carmack admitted he tried multithreading the main game loop in Doom 3 and said it was a waste of his time. However, Epic's Unreal 3 does mention multithreading support, so who knows. My opinion: if you are making a PC game, don't; if you are making an XBOX 2 or PS3 game, do.

Also, Etnu, I apologize, since I am an ass most of the time; you really are a good moderator. Not to mention you have good taste in movies.

[Edited by - jimmynelson on August 16, 2004 12:14:51 PM]

Quote:
Original post by Etnu
The biggest benefit that I'm getting out of running the rendering on a separate thread is that I can better control the resolution of different things, and I can very easily control how much CPU time is being dedicated to non-graphics work.

In most time-based rendering loops, you suddenly drop to only updating at a resolution of about 16-17 ms when you have v-sync on. Doing it this way, I can still have a resolution that's a fraction of that, even if my frame rate happens to dip to 20 fps or something.


This is the biggest reason I'm looking into the multithreaded rendering method. V-sync is nice, but in most games out there you notice a drop in responsiveness when it's enabled (the Quake 3 engine is a good example). A multithreaded design would give you the best of both worlds.

When arguing against threads because of task switching, you have to remember that we're developing for multithreaded/multiprocessing OSes. There are already hundreds of threads running at various times behind your game. Adding another thread to your game isn't going to add much to that; you just have to make sure it doesn't compete with the renderer for a lot of CPU time. But that applies to a single-threaded design as well. Multithreading isn't the answer to everything, but I don't think it's as bad as most people think, and in this case it has some distinct advantages over a single-threaded design.
