You need memory barriers, compiler optimization barriers, and atomicity guarantees to implement a mutex that way. A simpler mutex can be built entirely on atomic compare-and-swap, but it still has the serious disadvantage of chewing up a logical CPU while blocked. To get a genuine mutex, one that actually suspends a blocked thread, you in effect need to be the operating system kernel.
This is why you need to use OS-provided facilities, or portable wrappers around those facilities, to do safe thread synchronization. Inter-process synchronization and communication are a whole separate can of poisonous worms.
It's worth taking some time to study up on the common synchronization elements (mutexes, semaphores, condition variables, events, and so on, depending on platform).
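For illustration, here is a rough sketch of the compare-and-swap style of mutex mentioned above, written with C++11 std::atomic. The names SpinLock and run_spinlock_demo are made up for this example and don't come from any library. The acquire/release orderings supply the memory and compiler barriers, and the busy loop in lock() is exactly the "chewing up a logical CPU" problem.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical name: a busy-waiting lock built only on compare-and-swap.
class SpinLock {
public:
    void lock() {
        bool expected = false;
        // Acquire ordering keeps the protected accesses from being
        // reordered above the point where the lock is taken.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            // A failed CAS overwrites 'expected' with the current value,
            // so reset it and spin. This loop burns CPU while waiting.
            expected = false;
        }
    }

    void unlock() {
        // Release ordering publishes the protected writes to the next owner.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: several threads increment a plain int under the lock.
int run_spinlock_demo(int threads, int iterations) {
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Every increment should be counted, so the result should equal threads * iterations, but each waiting thread keeps a core busy the whole time it is blocked. That's fine for very short critical sections and a bad idea for anything longer.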
I see. I've read about critical sections before. Would either of you happen to know what (if any) are the advantages of using the Win32 API with critical sections as opposed to SDL_mutex or Boost? I'm already using SDL for pretty much everything else but I'll try the other options if there's potential for speed.
There will be no effective difference between a good portable mutex implementation and an OS-specific API, largely for the reason that the portable versions are implemented by using the OS APIs. So in other words, no, there's no real advantage, unless your library is pathologically stupid (which I don't believe applies to SDL or Boost).
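As a rough illustration of what a portable wrapper looks like in use, here is a small sketch with the C++11 standard library's std::mutex and std::condition_variable. These are stand-ins chosen for this example rather than SDL_mutex or Boost, and run_condvar_demo is a made-up name. On mainstream platforms these types sit on top of the OS primitives, so a waiting thread is suspended by the kernel instead of spinning.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical demo: a consumer sleeps until a producer publishes a value.
int run_condvar_demo() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // guarded by m
    int value = 0;       // guarded by m
    int seen = -1;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> guard(m);
        // wait() releases the mutex and blocks the thread; the predicate
        // protects against spurious wakeups and against a notify that
        // happened before the wait began.
        cv.wait(guard, [&] { return ready; });
        seen = value;
    });

    {
        // The lock is held only while the shared state is modified.
        std::lock_guard<std::mutex> guard(m);
        value = 42;
        ready = true;
    }
    cv.notify_one();

    consumer.join();
    return seen;
}
```

The same shape carries over to the other wrappers: lock, touch the shared state, unlock, and let the library hand the actual blocking off to the OS.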
Edited by ApochPiQ, 19 January 2014 - 02:41 AM.
As to the original question: on basically any CPU architecture you'll ever deal with, that code is buggy, and in several possible ways.