Sign in to follow this  
BeanDog

Non-locking producer/consumer queue

Recommended Posts

I'm considering moving my software SDL rendering into its own thread to take advantage of the very common dual-core processors. The idea is that the rest of the program would run in a "main" thread, where rendering calls just added the command to a producer/consumer queue. The "rendering" thread just constantly empties that queue by actually executing the rendering code requested. The main thread would lock once per frame until the rendering queue is empty. I came across a simple, slick implementation of a non-locking producer/consumer queue by a Microsoft MVP here. The big assumption that makes this thread-safe without locking mechanisms is that there is exactly one producer and one consumer. Sounds like a perfect fit. Here's the basic implementation of pushing and popping:
bool PushElement(T &Element)
{
      int nextElement = (m_Write + 1) % Size;
      if(nextElement != m_Read) 
      {
            m_Data[m_Write] = Element;
            m_Write = nextElement;
            return true;
      }
      else
            return false;
}

bool PopElement(T &Element)
{
     if(m_Read == m_Write)
            return false;
     int nextElement = (m_Read + 1) % Size;
     Element = m_Data[m_Read];
     m_Read = nextElement;
     return true;
}
One key note is that m_Read, m_Write, and m_Data are declared volatile, so that the compiler promises that operations on them will not be reordered. Here's my problem: The producer thread needs to lock once per frame, waiting for the consumer thread to empty the queue. I'd like to do this without simply saying "while(m_Read != m_Write);". I'd also like the consumer not to (essentially) say "while(m_Read == m_Write;" while waiting for new work to do. Both of those could have unexpected (bad) performance consequences on single-processor machines. So, does using this non-locking model make any sense for what I want to do? Does someone have a locking solution that might offer better performance? Or is there another obvious solution I'm overlooking?

Share this post


Link to post
Share on other sites
I don't really see the problem with either of those approaches. Maybe you could use a mutex with an optional spin count so the producer can potentially go to sleep if the consumer takes a long time, but other than that it looks fine. On a single-processor machine you'd have one thread that does this:
// Thread 0
while(true)
{
Produce();
Consume();
}
While on a multi-processor machine you'd have two threads that do this:
// Thread 0
while(true)
{
Produce();
WaitForSignal();
}

// Thread 1
while(true)
{
Consume();
RaiseSignal();
}
Unless I'm missing something?

Share this post


Link to post
Share on other sites
Quote:
Original post by BeanDog

Here's my problem: The producer thread needs to lock once per frame, waiting for the consumer thread to empty the queue. I'd like to do this without simply saying "while(m_Read != m_Write);". I'd also like the consumer not to (essentially) say "while(m_Read == m_Write;" while waiting for new work to do. Both of those could have unexpected (bad) performance consequences on single-processor machines.


The whole point of lock-less structure is that they don't contain locks, thereby preventing stalls that occur with lock-based algorithms. This allows you to write algorithms that don't wait on each other, and where the biggest performance gains come from. You're trying to completely eliminate waiting. If you can't structure your algorithm this way, then you won't see any benefit from using such structures.

A more efficient way to do what you want would be to have n queues (double, triple, ... buffer approach), where producer writes entire state to a queue, then hands off the queue itself. Then you can use a standard locking queue with some event signaling to communicate.

Otherwise, you'd use your queue like this:

// producer
while (!done) {
// single-core producer
// while (!PushElement(x)) Sleep(0);

// multi-core producer
// while (!PushElement(x)) ;

EndOfFrameMarker marker;
PushElement(marker);
}

// consumer
while (!done) {
if (PopElement(x)) {
if (x==EndOfFrameMarker) {
// complete frame
} else {
// process x
}
}
// single-core
// Sleep(0);
}


Here, you use the queue to signal, thereby removing the need for blocking. It allows producer to run for as long as there is room in queue, even outrunning the consumer.

Sleep can be left in for multi-core as well, especially if you have more than just two threads, or want to give the CPU a little time to breathe.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this