Double-check my lock-free voodoo

Started by
30 comments, last by Drew_Benton 15 years ago
I've been working on a lockless message passing system for the last couple of days, and finally came up with what seems like a pretty solid implementation. First, some important background: this is designed for a one consumer, multiple producer scenario. It will assuredly not work in a multiple-consumer situation. However, from my checks and logic, it should be fully safe as long as only one thread is consuming data. The theory is very simple. We maintain three stacks. The first stack is the "write stack." All producers append their messages to the write stack with a simple CAS-based method. The second stack is the "read stack." This stack is only visible to the consumer; all work by the consumer is done on the read stack. It is effectively not aware of the write stack. The third stack is a little bit of a hack. I want this system to have FIFO semantics, but the write stack is implemented as a LIFO stack. Therefore, the read process uses this third stack to essentially reverse messages that it reads. Consuming a message means popping an item off this third stack. Anyone with basic familiarity with lock-free structures should be familiar with the CAS LIFO stack, so I won't bother explaining it. The consumer half is a bit wonky, though, so here's the logic. (I make heavy use of a "sentinel" node which is just a node in either the read or write stack that has no attached data payload. This sentinel is basically the "end of list" marker.)
  • If the read head is the sentinel node, swap the pointers to the read and write stacks, and proceed as usual
  • Traverse the read stack (which is basically a singly linked list at this point) and push each item onto the third stack as it is visited.
  • If the third stack is empty, return NULL, indicating no pending messages
  • Pop the first item off the third stack, clean up some memory usage, and return it to the caller as the current message
I've reviewed this logic repeatedly, and stress-tested the implementation under 30 threads of load on my dual-core workstation. As near as I can tell the code is free of potential problems, primarily because of the magic constraint that we only have one consumer. So here's the code: [edit] Removed the original code; please see the updated version here. [/edit] I welcome any comments, suggestions, and/or gentle explanations that I am a moron and that this doesn't actually work [grin] [Edited by - ApochPiQ on March 20, 2009 6:22:50 PM]

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Advertisement
Is there particular reason why you don't just use lock-less FIFO queue, which is also safe for multiple-producers multiple-consumers case.
Every lockless FIFO I've seen requires not just CAS but CAS2; since CAS2 is not widely supported (especially in 64-bit land), I've opted to try and stick to CAS for portability.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

What level of portability are you going for? volatile gives you different guarantees depending on the compiler you're using (or even the *version* of the compiler you're using). I'd try not to use it at all and instead insert the barriers explicitly.

I'm not qualified to comment much further than that on the nitty gritty. You have some exception safety issues, though, unless you're assuming new will never throw (which isn't the case on Windows, for example).
I thought cmpxchg8b is quite widely supported on 32-bit x86 platforms. Not sure how well cmpxchg16b is supported for 64-bit platforms though. Do you have something to back up the "CAS2 is not widely supported" claim, e.g. a supported instruction table or something?
I was looking back through my notes and intertron history and realized that all the sources I have regarding CAS2 on 64-bit processors were pretty old, and likely outdated. Seems like cmpxchg16b is pretty common now.

In any case, I've got a working algorithm (I think) and don't care to go through all the trouble of building another one [wink]


Yeah, exception safety is a bit of a problem here; I'll be tuning that up along with lots of other code that needs it.

Manual memory fencing also sounds like a good idea. I'll add that to my list and hopefully get it hammered out pretty quickly.


Thanks all for the feedback thus far; although I'm still curious if anyone can find a problem in the implementation...

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Here's a fun one: Visual C++ provides no intrinsic for cmpxchg16b (unless they added it very recently), nor will it compile 64 bit assembly. How much do you love MASM? [grin]
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
Quote:Original post by Promit
Here's a fun one: Visual C++ provides no intrinsic for cmpxchg16b (unless they added it very recently), nor will it compile 64 bit assembly. How much do you love MASM? [grin]


Your information is slightly dated. _InterlockedCompareExchange128 was added late 2007.
Quote:Original post by ApochPiQ
In any case, I've got a working algorithm (I think) and don't care to go through all the trouble of building another one [wink]

I haven't checked through your code, so I can't tell if there are issues with it, but one thing you might want to try to do is to reproduce the ABA issue, which is why CAS2 is often used. The problem may rise very rarely, thus it's difficult to test empirically. I implemented a lock-free FIFO queue in Spin-X Engine based on "Optimized Lock-Free FIFO Queue continued" paper though, and it's not terribly difficult.
I've walked through every combination of partial execution/preemption and can't find a case where ABA occurs. This is primarily because of the way the single consumer works - I have a guarantee during the SwapReadAndWrite operation that the only preemption can come from additional writers (since there's only the one consumer). Similarly, because of the way the read/write stacks are swapped by the consumer, I should never see an ABA pattern in the producer's push operation.

However, I'm fairly green when it comes to lockless programming, and we all know how pernicious and subtle lockless bugs can be - so I'd really appreciate it if someone else could double-check my logic and make sure I'm not overlooking something.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

This topic is closed to new replies.

Advertisement