Sign in to follow this  

threads, mutexes and the expense of locking.

This topic is 3457 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I'm working on a game using threads to handle various parts of AI but the threads will have common data elements that will be shared between them. Obviously I could use a mutex to lock down access to the shared elements but I'm concerned because I've been reading on the forums here that locking mutex's can be expensive. Since I'm not a thread expert and I want to be efficient are there other options (using booleans for example) or will I have to just bite the bullet and just code in the expensive calls? For what it's worth: I'm thinking of using 2-4 threads and my plan is to share a few simple c++ vector between the threads. Thanks in advance for your responses.

Share this post


Link to post
Share on other sites
Yes, locking is expensive, more so if it is a contended lock (i.e. two or more threads are trying to acquire the lock) because that would mean at least one thread has to wait until the other one releases the lock again.
That's why you want to reduce the number of synchronization points in your program to an absolute minimum. Rather than constantly acquiring a lock, accessing the shared data and releasing the lock again it is often cheaper to acquire the lock only once per frame and create a thread-private copy of the shared data and operate on that rather than on the original shared data.

And no, simple bools won't cut it. If you share data across threads, you're going to need some form of synchronization, there is no way around it.

Share this post


Link to post
Share on other sites
Quote:
Original post by Red Ant
Yes, locking is expensive, more so if it is a contended lock (i.e. two or more threads are trying to acquire the lock) because that would mean at least one thread has to wait until the other one releases the lock again.
That's why you want to reduce the number of synchronization points in your program to an absolute minimum. Rather than constantly acquiring a lock, accessing the shared data and releasing the lock again it is often cheaper to acquire the lock only once per frame and create a thread-private copy of the shared data and operate on that rather than on the original shared data.

And no, simple bools won't cut it. If you share data across threads, you're going to need some form of synchronization, there is no way around it.


That's what I was thinking of doing. Are you saying that is not a very expensive method? If so then I'm gold.

Share this post


Link to post
Share on other sites
As a general advice, if you're looking for performance you need to put some efforts on minimizing data sharing (and therefore the need of synchronization)

And of course if we're talking about win32 and threads here, you will use critical sections instead of mutexes, because locking a critical section will fall back to locking a mutex only in situations where it actually needs to be blocking, but will just result a few very fast atomic operations in situations where it doesn't (being much, much faster), which, again, you should try to make your 99%-of-the-time scenario.

Lockless programming (which you imply when you say "using booleans for example") is indeed an option but
1/ it will bite you sooner or later (even more so if you're coding on xbox360)
2/ is far tricker than it seems to be (that is, it won't be enough at all just to add a couple 'volatile' here and there :) )

Share this post


Link to post
Share on other sites
Quote:
Original post by atimes

That's what I was thinking of doing. Are you saying that is not a very expensive method? If so then I'm gold.


I'm saying it CAN be a comparatively cheaper method. Whether it really ends up being cheaper depends on the amount of data you'd have to copy I guess.

Share this post


Link to post
Share on other sites
Quote:
Original post by atimes

Obviously I could use a mutex to lock down access to the shared elements but I'm concerned because I've been reading on the forums here that locking mutex's can be expensive.


The code requires to do that is relatively cheap. But locks introduce possiblity ot contention (multiple threads trying to access same resources, all except one need to wait). This is the cost of using locking primitives to share data.

There are two general solutions to solve this:
- Establish read only state. Arbitrary number of threads can read, but they cannot modify it
- Replicate state between threads. Requires n x memory, where n is number of threads.

Something is obviously missing: What about changing the state? There is no trivial answer to this.

As such, obvious solution is to establish single-threaded logic, which passes data between threads in a well defined manner.

For example:
Thread 1) Rendering thread (reads from last complete state)
Thread 2) AI (reads from last complete state, writes into new state)
Thread 3) Reads user input, puts data into locking queue. This queue is read by once every frame
Thread 4) Asset loading, accepts requests via locking queue, puts pointers of results into another queue

This allows your renderer to run happily without locking, your AI to run as fast as needed without locking (except to read inputs once per frame), and everything else to work in async manner.

Share this post


Link to post
Share on other sites
Quote:
Original post by Red Ant
Quote:
Original post by atimes

That's what I was thinking of doing. Are you saying that is not a very expensive method? If so then I'm gold.


I'm saying it CAN be a comparatively cheaper method. Whether it really ends up being cheaper depends on the amount of data you'd have to copy I guess.

It also depends on whatever algorithms you are using. The frequency of locking any lock, frequency of locking a specific lock, duration of locks, and the usage of CPU-cached data all impact the performance.

You might find that a mutex is bad in your situation. A mutex is one of the most expensive locks in terms of performance, but you know that it is system wide. However, on Windows, the cost of acquiring an uncontested mutex is about the same as a regular context switch. There are worse things that could happen.

Much better in Windows is the Critical_Section lock. It has most of the same properties as a mutex, but it is local within a program. For an uncontested lock, a Critical_Section takes only 1% of the time required by an uncontested system mutex.

If you only need access to a single variable or certain other specific items, look up the Interlocked operations (InterlockedAdd, InterlockedSubtract, InterlockedExchange, ...) that perform better than a Critical_Section object.



You might also consider ways to share data between threads that do not require synchronization at all. Named and unnamed pipes, mailslots (if available), and the OS's event queue (see WM_COPYDATA) are just a few ways to transmit data between threads asynchronously.

Share this post


Link to post
Share on other sites

This topic is 3457 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this