pthread_mutex much slower than naive benaphore?



#1 edd   Members   -  Reputation: 2105


Posted 13 August 2012 - 06:10 PM

I came across the concept of a benaphore for the first time not so long ago and decided to experiment with it. I implemented it on Windows and OS X and while I'm not sure I'll use it for anything, it remains in my threads library for the time being.
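For anyone who hasn't met the term: a benaphore is just an atomic counter sitting in front of a semaphore, so the uncontended path never has to enter the kernel. A minimal sketch of the idea (simplified, not the code I'm actually benchmarking) looks something like this, where Semaphore is assumed to be a thin wrapper providing wait() and signal(), e.g. over CreateSemaphore on Windows or dispatch_semaphore_t on OS X:

#include <atomic>

// Minimal benaphore sketch. Semaphore is assumed to expose wait()/signal().
template<typename Semaphore>
class benaphore
{
public:
	benaphore() : count_(0) {}

	void acquire()
	{
		// Fast path: the first thread to bump the count from 0 to 1 owns
		// the lock without ever touching the kernel semaphore.
		if (count_.fetch_add(1) > 0)
			sema_.wait(); // contended: block until the owner signals
	}

	void release()
	{
		// If another thread bumped the count while we held the lock,
		// wake exactly one waiter.
		if (count_.fetch_sub(1) > 1)
			sema_.signal();
	}

private:
	std::atomic<long> count_;
	Semaphore sema_;
};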

On Windows it performs as expected, i.e. very well in low-contention cases and terribly under high contention. But on OS X I noticed my high-contention stress tests were completing much faster for benaphores than for my pthread_mutex wrappers, and I'm hoping someone might be able to shed some light on why.

I understand that pthread mutexes will be inherently slower due to their additional features and capabilities, but given that we're talking about more than an order of magnitude of difference under high contention (and in the opposite direction to the one I'd expect), I'd like to understand what's going on.

The forum says I'm not permitted to attach a .cpp file (really?!), so I've put the benchmark code in my Dropbox Public folder for now. It's a single C++ file, essentially equivalent to one of my stress-test cases. The semaphore I've used here is dispatch_semaphore_t from libdispatch. A number of threads are started, each of which increments a shared variable a large number of times, acquiring the lock before each increment and releasing it afterwards. Here's the body of each thread:

template<typename Mutex>
void *thread_body(void *opaque_arg)
{
	CHECK( opaque_arg != 0 );

	shared_stuff<Mutex> &stuff = *static_cast<shared_stuff<Mutex> *>(opaque_arg);

	// Hammer the lock: acquire, bump the shared counter, release.
	for (uint32_t i = 0; i != stuff.increments; ++i)
	{
		stuff.mtx.acquire();
		++stuff.total;
		stuff.mtx.release();
	}
	return 0;
}
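
(For context, the only interface thread_body needs from Mutex is acquire()/release(). The pthread side is a trivial wrapper along these lines, and the dispatch semaphore the benaphore sits on is wrapped similarly; the names here are illustrative rather than the exact ones in the linked file.)

#include <pthread.h>
#include <dispatch/dispatch.h>

// Trivial pthread_mutex wrapper with the acquire()/release() interface
// that thread_body expects (default mutex attributes).
class pthread_mutex_wrapper
{
public:
	pthread_mutex_wrapper()  { pthread_mutex_init(&mtx_, 0); }
	~pthread_mutex_wrapper() { pthread_mutex_destroy(&mtx_); }

	void acquire() { pthread_mutex_lock(&mtx_); }
	void release() { pthread_mutex_unlock(&mtx_); }

private:
	pthread_mutex_t mtx_;
};

// dispatch_semaphore_t wrapper suitable as the benaphore's Semaphore
// parameter on OS X (initial count of 0: every wait() needs a signal()).
class dispatch_sema
{
public:
	dispatch_sema() : sem_(dispatch_semaphore_create(0)) {}
	~dispatch_sema() { dispatch_release(sem_); }

	void wait()   { dispatch_semaphore_wait(sem_, DISPATCH_TIME_FOREVER); }
	void signal() { dispatch_semaphore_signal(sem_); }

private:
	dispatch_semaphore_t sem_;
};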

And here are the results on my quad-core i5 iMac running OS X 10.7, built without the error-checking code.

[Image: bena_vs_mutex_stats.png, benaphore vs pthread_mutex timings at various thread counts]
(Sorry about the use of an image; I fiddled with the formatting for 30 minutes but the forum kept mangling it to some degree.)
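
The harness itself is nothing fancy; stripped down, it amounts to something like the following (shared_stuff's shape is inferred from thread_body above, and run_benchmark is an illustrative name, not necessarily what's in the linked file):

#include <pthread.h>
#include <stdint.h>
#include <vector>

// Shape of the shared state, inferred from how thread_body uses it.
template<typename Mutex>
struct shared_stuff
{
	Mutex mtx;
	uint32_t increments; // per-thread increment count
	uint32_t total;      // shared counter, protected by mtx
};

// Spawn num_threads workers that all hammer the same lock, then join them.
template<typename Mutex>
void run_benchmark(unsigned num_threads, uint32_t increments)
{
	shared_stuff<Mutex> stuff;
	stuff.increments = increments;
	stuff.total = 0;

	std::vector<pthread_t> threads(num_threads);
	for (unsigned i = 0; i != num_threads; ++i)
		pthread_create(&threads[i], 0, &thread_body<Mutex>, &stuff);
	for (unsigned i = 0; i != num_threads; ++i)
		pthread_join(threads[i], 0);
}

// e.g. time these two against each other:
//   run_benchmark<pthread_mutex_wrapper>(2, 10000000);
//   run_benchmark<benaphore<dispatch_sema> >(2, 10000000);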

Even in the case of 2 threads we're looking at 2.458 vs 86.479 seconds (!). I can happily accept that the libdispatch semaphore is more efficient than a pthread_mutex, but given the degree of improvement in the benaphore case, I'm inclined to believe that I've misconfigured/misunderstood something. Any ideas as to an explanation?


#2 ApochPiQ   Moderators   -  Reputation: 14623


Posted 13 August 2012 - 06:58 PM

This might be interesting to you.

I vaguely remember having terrible issues with the OS X implementation of mutexes back in the 10.4 days, but I'm kind of surprised they haven't fixed them by now.

#3 edd   Members   -  Reputation: 2105


Posted 14 August 2012 - 12:26 PM

Thanks for digging that out.

Looks like I'll have to start experimenting with custom mutex implementations, then :( ... and reimplement condition variables :(
