
Efficient multithreading?

This topic is 3466 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


Hi, I'm trying to design an efficient multithreading system for my engine. So far I have this:
namespace ThreadManager
{
// adds a thread assignment to the stack
	void threadsAddTask(void (*aFunction)(void *), void* aArgument);

// starts a custom number of threads
	void threadsStart(int aThreadNumber);

// used on cleanup to stop all the threads
	void threadsEnd();

// checks if the threads completed all assignments
	bool threadsCompleted();
}

So what I do now is call ThreadManager::threadsStart(4); at the start of the application. This creates 4 identical threads, which loop for the whole execution time:
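int ThreadFunction(void *)
{
	while (true)
	{
		SDL_mutexP(gLock);

		if (gVirtualCalls.empty())
		{
			// nothing queued: release the lock and poll again
			SDL_mutexV(gLock);
			continue;
		}

		// copy the call out of the queue; a pointer to front()
		// would dangle as soon as pop() destroys the element
		VirtualCall call = gVirtualCalls.front();
		gVirtualCalls.pop();

		SDL_mutexV(gLock);

		call.gFunction(call.gArgument);

		SDL_mutexP(gLock);

		// note: this only checks the queue, not calls still running on other threads
		gDone = gVirtualCalls.empty();

		SDL_mutexV(gLock);
	}
}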

This basically pops an assignment from the stack and calls its function, so when the main game loop reaches a CPU-intensive operation, it puts it on the thread assignment stack. To ensure that the job is done, I have to wait in the main loop before rendering: while(!ThreadManager::threadsCompleted()) {}

I'm wondering if this is a good design, since I did it off the top of my head. Also, is there any cross-platform way of determining the number of CPU cores (to optimize ThreadManager::threadsStart)?

I recommend Boost for threading. Boost also provides a cross-platform way to determine the number of hardware threads available.
Clicky

The function described here should provide you with the number of hardware threads.
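
For readers on a newer toolchain, here is a minimal sketch of the same query using the C++11 standard library, whose std::thread::hardware_concurrency() mirrors boost::thread::hardware_concurrency(). The helper name chooseWorkerCount and its fallback parameter are made up for illustration; they are not part of Boost, the standard library, or the ThreadManager above.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: picks a thread count that could be passed to
// something like ThreadManager::threadsStart().
unsigned chooseWorkerCount(unsigned fallback = 2)
{
    assert(fallback > 0);

    // hardware_concurrency() is only a hint and may return 0
    // when the value cannot be determined.
    unsigned detected = std::thread::hardware_concurrency();
    return detected != 0 ? detected : fallback;
}
```

Because the call may return 0, a fallback value is needed before using the result as a thread count.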

I once built a system like the one you are building, and I'm still using it, but I also included functionality for parent tasks, as I called them. A task could spawn several child tasks that were then executed. As soon as all the child tasks are done, execution resumes in the parent task. I tested this system with a simple parallel merge sort and got almost linear results: more than 1.95 times the speed on 2 cores compared to a single core. However, it is important to note that designing your application to make efficient use of the system is harder than implementing the system itself.
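
The parent/child idea can be sketched roughly as follows. This is not the poster's system: it swaps in std::async from C++11 for the custom task manager, and the function name parallelMergeSort and the depth limit are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Sorts v[lo, hi). While depth > 0, the parent spawns a child task for the
// left half, sorts the right half itself, and resumes once the child is done.
void parallelMergeSort(std::vector<int>& v, std::size_t lo, std::size_t hi, int depth)
{
    assert(lo <= hi && hi <= v.size());
    if (hi - lo < 2)
        return;

    std::size_t mid = lo + (hi - lo) / 2;

    if (depth > 0)
    {
        // child task: works on a range disjoint from the parent's
        std::future<void> child = std::async(std::launch::async,
            [&v, lo, mid, depth] { parallelMergeSort(v, lo, mid, depth - 1); });

        parallelMergeSort(v, mid, hi, depth - 1);
        child.get(); // parent resumes only after the child has finished
    }
    else
    {
        // deep enough: stop spawning and sort sequentially
        parallelMergeSort(v, lo, mid, 0);
        parallelMergeSort(v, mid, hi, 0);
    }

    std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}
```

A depth of 1 gives two concurrent halves, which matches a two-core test; each extra level doubles the number of tasks.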

Hope this helps

EDIT: I might also add that a spin-wait lock is not the best way to wait until a task is completed. You are eating CPU time that could be used more efficiently. Use a mutex or a condition variable for that. A spin wait is more efficient only when you are sure it will not run for very long, because you get less OS scheduling overhead.
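
A minimal sketch of waiting on a condition variable instead of spinning, written with the C++11 standard library rather than SDL or Boost. The class PendingCounter and the demo function sumOnWorkers are hypothetical names invented for this example. The counter tracks tasks that are queued or still running, so the waiter wakes only when the last task has actually finished.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: counts outstanding tasks and lets one thread sleep
// until the count reaches zero.
class PendingCounter
{
public:
    void taskAdded()
    {
        std::lock_guard<std::mutex> lock(mMutex);
        ++mPending;
    }

    void taskFinished()
    {
        std::lock_guard<std::mutex> lock(mMutex);
        assert(mPending > 0);
        if (--mPending == 0)
            mAllDone.notify_all();
    }

    void waitUntilIdle()
    {
        std::unique_lock<std::mutex> lock(mMutex);
        // the predicate guards against spurious wake-ups
        mAllDone.wait(lock, [this] { return mPending == 0; });
    }

private:
    std::mutex mMutex;
    std::condition_variable mAllDone;
    int mPending = 0;
};

// Demo: each worker stores one value; the caller sleeps until all are done.
int sumOnWorkers(int taskCount)
{
    PendingCounter pending;
    std::vector<int> results(taskCount, 0);
    std::vector<std::thread> workers;

    for (int i = 0; i < taskCount; ++i)
    {
        pending.taskAdded();
        workers.emplace_back([&pending, &results, i] {
            results[i] = i + 1;
            pending.taskFinished();
        });
    }

    pending.waitUntilIdle(); // blocks without burning CPU

    int sum = 0;
    for (int r : results)
        sum += r;

    for (std::thread& w : workers)
        w.join();
    return sum;
}
```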

If you need a task-based dispatcher service, boost::asio provides io_service, which does just that. Although primarily intended for networking, it can be used as a basic framework for implementing active objects or task-based parallelism. It takes care of the million and one details that are involved in this.

For short-running tasks, your dispatcher will perform considerably worse than a single-threaded engine (the mutex contention and function call overhead will add up). In general, such dispatchers are implemented in a lock-free manner.

Another major problem will be idle behavior on single-core machines, where your dispatcher will peg the CPU at 100% while doing nothing and severely hurt the performance of the threads that are actually running. All threading facilities provide some form of event notification (WaitForMultipleObjects in WinAPI, boost::condition, semaphores in pthreads, see Google/Wikipedia). Many of these, however, can degrade performance under heavy load.
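
To illustrate the event-notification point, here is a rough sketch of a task queue whose workers sleep while it is empty instead of polling. It uses std::condition_variable from C++11 as a stand-in for the facilities listed above; BlockingTaskQueue and runWorkers are hypothetical names, not part of any of those libraries.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class BlockingTaskQueue
{
public:
    void push(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mTasks.push(std::move(task));
        }
        mNotEmpty.notify_one(); // wake one sleeping worker
    }

    // No more tasks will be pushed; wakes every worker so it can exit.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mClosed = true;
        }
        mNotEmpty.notify_all();
    }

    // Sleeps until a task is available. Returns false once the queue is
    // closed and drained.
    bool pop(std::function<void()>& out)
    {
        std::unique_lock<std::mutex> lock(mMutex);
        mNotEmpty.wait(lock, [this] { return mClosed || !mTasks.empty(); });
        if (mTasks.empty())
            return false;
        out = std::move(mTasks.front());
        mTasks.pop();
        return true;
    }

private:
    std::mutex mMutex;
    std::condition_variable mNotEmpty;
    std::queue<std::function<void()>> mTasks;
    bool mClosed = false;
};

// Demo: runs taskCount trivial tasks on workerCount threads and returns how
// many were executed.
int runWorkers(int workerCount, int taskCount)
{
    assert(workerCount > 0);
    BlockingTaskQueue queue;
    std::atomic<int> executed{0};
    std::vector<std::thread> workers;

    for (int i = 0; i < workerCount; ++i)
        workers.emplace_back([&queue] {
            std::function<void()> task;
            while (queue.pop(task))
                task(); // run outside the lock
        });

    for (int i = 0; i < taskCount; ++i)
        queue.push([&executed] { ++executed; });

    queue.close();
    for (std::thread& w : workers)
        w.join();
    return executed.load();
}
```

The mutex is still taken once per task, so the contention cost for very short tasks mentioned above remains; the sketch only removes the idle spinning.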

Long story short, developing an efficient, low-overhead task dispatcher is far from trivial, and regardless of which approach you take, there will always be characteristic cases that exhibit sub-optimal performance, so you'll need to take into consideration what your manager is intended for.

Under general classification, this type of concurrency is used for coarse-grained, task-oriented parallelism. An improvement over this is the half-sync/half-async approach (see ACE/TAO, see Google), which tries to pick the optimal approach depending on the availability of resources.

Quote:
To ensure that the job is done, I have to wait in the main loop before rendering

Waiting for threads is counter-productive inside the thread manager itself, except perhaps at shutdown. There are facilities (futures, completion objects, semaphores, scatter/gather, ...) that take care of this at a higher level. Threading in such systems should be transparent and give the game logic no way to manipulate the threads (in the same way that terminating a thread is rarely the optimal choice). Waiting for anything defeats the purpose of concurrency.
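
As one example of such a higher-level facility, here is a small sketch using std::future and std::async from C++11 (chosen for the example; the post does not name a specific library). The functions heavyJob and runFrame are hypothetical stand-ins for whatever CPU-heavy work the engine does. The caller never touches a thread: it only asks each future for its result when it needs it.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for a CPU-heavy job.
int heavyJob(int input)
{
    return input * input;
}

// Launches one job per input and collects the results through futures.
int runFrame(int jobCount)
{
    assert(jobCount >= 0);
    std::vector<std::future<int>> results;

    for (int i = 0; i < jobCount; ++i)
        results.push_back(std::async(std::launch::async, heavyJob, i));

    // Other per-frame work could run here while the jobs execute.

    int total = 0;
    for (std::future<int>& r : results)
        total += r.get(); // blocks only if this particular result is not ready
    return total;
}
```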

For good practice in this type of programming you can look at Python's Twisted framework, Stackless Python, Erlang or boost::asio server implementations (see Google/wikipedia, all are closely related to network programming, which implicitly requires concurrent programming concepts).

Thanks for the replies, I'll look into the stuff you mentioned.

A little bit more about what I would use this system for in my game: I use it to do skinning, so when a thread picks up a task, it receives a model and then calculates the transformations for every vertex. I see that the waiting is bad, but I must have the calculated values before I issue render calls (especially because I'm using OpenGL).

(An idea now comes to mind: an animation double-buffering system :) . The thread calculates the buffer values and, when it's done, just locks and changes an int value to 1 or 2 depending on the last buffer (if it's 1, the thread writes to buffer 1 and the renderer reads buffer 2; if it's 2, it's the opposite). The problem with this is that the animation could be late on different systems.)
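
A rough sketch of that double-buffering idea, assuming a single writer thread and using a std::atomic index (0 or 1 here, rather than 1 or 2) in place of the lock-protected int. The struct DoubleBuffer and the function demoDoubleBuffer are hypothetical. Note the sketch does nothing to stop the writer from starting on a buffer the renderer is still reading, so the swap would have to happen at a frame boundary.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct DoubleBuffer
{
    std::vector<float> buffers[2];
    std::atomic<int> readIndex{0}; // buffer the renderer may read

    // Buffer the skinning thread is allowed to fill.
    std::vector<float>& writeBuffer()
    {
        return buffers[1 - readIndex.load(std::memory_order_acquire)];
    }

    const std::vector<float>& readBuffer() const
    {
        return buffers[readIndex.load(std::memory_order_acquire)];
    }

    // Called by the single writer once the write buffer is complete.
    void publish()
    {
        int current = readIndex.load(std::memory_order_relaxed);
        // release: the finished vertex data becomes visible before the index flips
        readIndex.store(1 - current, std::memory_order_release);
    }
};

// Demo: a worker fills the back buffer and publishes it; the caller then
// reads the published values.
float demoDoubleBuffer()
{
    DoubleBuffer db;
    db.buffers[0].assign(3, 0.0f);
    db.buffers[1].assign(3, 0.0f);

    std::thread worker([&db] {
        for (float& value : db.writeBuffer())
            value = 1.5f;
        db.publish();
    });
    worker.join();

    assert(db.readBuffer().size() == 3);
    return db.readBuffer()[0];
}
```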

You are waiting in your main thread for all calculations to be done. The other threads calculate the skinning, right?

The disadvantage is that your main thread is idle while it waits for all calculations to be done. To solve this, you could also let your main thread participate in the calculation. This way it does something useful while waiting.
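
A minimal sketch of that suggestion, using the C++11 standard library instead of SDL. The names SharedQueue, tryRunOne and skinAllModels are invented for the example, and the "skinning" task is reduced to incrementing a counter. All tasks are queued up front, the workers drain the queue, and the main thread pulls from the same queue instead of spinning.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class SharedQueue
{
public:
    void push(std::function<void()> task)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mTasks.push(std::move(task));
    }

    // Runs one task if any is queued; returns false when the queue is empty.
    bool tryRunOne()
    {
        std::function<void()> task;
        {
            std::lock_guard<std::mutex> lock(mMutex);
            if (mTasks.empty())
                return false;
            task = std::move(mTasks.front());
            mTasks.pop();
        }
        task(); // executed outside the lock
        return true;
    }

private:
    std::mutex mMutex;
    std::queue<std::function<void()>> mTasks;
};

// Demo: returns the number of models processed.
int skinAllModels(int workerCount, int modelCount)
{
    assert(workerCount >= 0);
    SharedQueue queue;
    std::atomic<int> skinned{0};

    for (int i = 0; i < modelCount; ++i)
        queue.push([&skinned] { ++skinned; }); // placeholder for real skinning

    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i)
        workers.emplace_back([&queue] { while (queue.tryRunOne()) {} });

    // The main thread helps instead of waiting idle.
    while (queue.tryRunOne()) {}

    // Tasks may still be in flight on the workers, so join before rendering.
    for (std::thread& w : workers)
        w.join();
    return skinned.load();
}
```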

GBS
