Lockless FIFO Queue Implementation

26 comments, last by Hodgman 14 years, 10 months ago
Just out of curiosity, how much of your time is being spent inside of functions that are reading/writing from queues?
In this case, I think the question should be the reverse: Given a certain performance of a queue, how much queuing can you do in your application?

The reason is that future software architecture will look different from current software architecture. Small tasks that are assigned to worker thread pools are probably going to be an important part of that. Once you start writing your software like that, queuing costs are going to become noticeable.

Btw: Visual Studio 2010 is going to include a lightweight tasking/queuing library along these lines (but with more primitives than just queuing).
enum Bool { True, False, FileNotFound };
Quote:Original post by hplus0603
In this case, I think the question should be the reverse: Given a certain performance of a queue, how much queuing can you do in your application?

The reason is that future software architecture will look different from current software architecture. Small tasks that are assigned to worker thread pools are probably going to be an important part of that. Once you start writing your software like that, queuing costs are going to become noticeable.

Btw: Visual Studio 2010 is going to include a lightweight tasking/queuing library along these lines (but with more primitives than just queuing).


Oh, agreed entirely. My question was more an attempt to probe into whether this is a real problem that's being solved right now, or a case of premature optimization.
Quote:Original post by kyoryu
My question was more an attempt to probe into whether this is a real problem that's being solved right now, or a case of premature optimization.
I'm building an actor/entity system at the moment, where to allow parallel updates of entities, all inter-entity function calls must be queued (to be executed at a safe moment). This could blow out anywhere from 100 to 1M queue ops per frame.
Traditional locks can take hundreds of cycles to process. If we estimate 200 cycles of a 2.4GHz CPU, that's 1/12th of a second to do 1M lock operations.
Lock-free queues are an order of magnitude faster, and regular queues are an order of magnitude faster again.
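These ballpark numbers are easy to sanity-check yourself. Here's a rough single-threaded micro-benchmark sketch; std::mutex is an assumption on my part (code of this era would use CRITICAL_SECTION or a boost mutex instead), and the absolute numbers will vary wildly by platform. It's only a lower bound: real contention makes the locked case worse.

```cpp
#include <chrono>
#include <mutex>
#include <vector>

// Time n push_back calls with a lock/unlock pair around each one.
// Returns elapsed microseconds.
inline long long TimeLockedPushes(int n)
{
    std::vector<int> q;
    q.reserve(n);
    std::mutex m;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        std::lock_guard<std::mutex> lock(m);
        q.push_back(i);
    }
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

// Time n plain push_back calls with no synchronisation at all.
inline long long TimePlainPushes(int n)
{
    std::vector<int> q;
    q.reserve(n);
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        q.push_back(i);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

Run both with n = 1M and compare; the gap is the per-operation locking overhead the posts above are estimating.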

Seeing as I'm making something so reliant on queues, I've ditched all lock-free containers up front. It is a premature optimisation ;), but it's common sense that with this frequency of usage I've got to avoid any scalability-busting approach (and that means no potential synchronisation).
Cool, I love the Actor model, I honestly believe that we will see a drift towards it in the near future, as it handles concurrency much better.

From my own experiences implementing Actor systems, I'd really recommend just starting out with a locking queue first, to get a performance baseline at least. If it's encapsulated in a class, you can get an idea of what performance is like, and then make improvements from there. It is very possible for a lockless implementation to be slower than a locked implementation, depending on the efficiency of the lockless queue.
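The "encapsulate it in a class" suggestion might look something like this minimal sketch. std::mutex is an assumption (on Windows of the era you'd wrap a CRITICAL_SECTION instead); the point is that callers never see the lock, so you can swap in a lock-free implementation later without touching them.

```cpp
#include <mutex>
#include <queue>

// Minimal locking FIFO queue. The lock is an implementation detail,
// so the class can later be reimplemented lock-free behind the same API.
template <typename T>
class LockedQueue
{
public:
    void Push(const T& value)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Queue.push(value);
    }

    // Non-blocking pop: returns false if the queue was empty.
    bool TryPop(T& out)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        if (m_Queue.empty())
            return false;
        out = m_Queue.front();
        m_Queue.pop();
        return true;
    }

private:
    std::mutex    m_Mutex;
    std::queue<T> m_Queue;
};
```

Profile with this first; only if the lock actually shows up in the profiler is the lock-free rewrite worth its complexity.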
There was a nice presentation at this year's GDC about lockless programming, and the gist was: "Use locks!" :)

This week I implemented a threaded task manager for my game and threw many, many small tasks at it - the profiler showed that the locks are nowhere near enough of a performance hit to be worried about.

So I would second the use locks suggestion, at least for the time being :)

HTH,
Marc
Quote:or a case of premature optimization


Knuth must be one of the most mis-quoted people on the planet. What he actually said in that quote is that "too often, we worry about efficiencies in the small" -- what he complained about was programmers breaking out the assembler before they even had a running program to profile.

In fact, that same article clearly argues for understanding the performance characteristics of your problem before starting implementation -- the current discussion would be one example of exactly what he wanted people to understand before implementing a system.

If you don't believe me, go and look it up :-)

Anyway: if your lock is a Windows CRITICAL_SECTION, then it ends up being no more expensive than a lock-free spin-lock in the uncontended case, and it only hits the kernel under contention. If you have multiple queues and multiple workers, the amount of contention will often be vanishingly small, and thus "regular" locks may well be totally adequate -- and less bug-prone than lockless programming.
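For anyone unsure what "no more expensive than a lock-free spin-lock" means concretely, here is a sketch of such a spin-lock; std::atomic_flag is an assumption (this predates C++11, where you'd use InterlockedExchange or similar). The uncontended CRITICAL_SECTION fast path is essentially the same atomic exchange; the difference is that under contention a CRITICAL_SECTION can sleep in the kernel, while a pure spin-lock can only burn CPU.

```cpp
#include <atomic>

// One atomic test-and-set to acquire, one store to release, no kernel call.
// This is roughly what the uncontended CRITICAL_SECTION path boils down to.
class SpinLock
{
public:
    void Lock()
    {
        // Busy-wait until the flag was previously clear.
        while (m_Flag.test_and_set(std::memory_order_acquire))
            ; // spin
    }

    void Unlock()
    {
        m_Flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_Flag = ATOMIC_FLAG_INIT;
};
```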
enum Bool { True, False, FileNotFound };
Quote:Original post by kyoryu
Cool, I love the Actor model, I honestly believe that we will see a drift towards it in the near future, as it handles concurrency much better.

From my own experiences implementing Actor systems, I'd really recommend just starting out with a locking queue first, to get a performance baseline at least. If it's encapsulated in a class, you can get an idea of what performance is like, and then make improvements from there. It is very possible for a lockless implementation to be slower than a locked implementation, depending on the efficiency of the lockless queue.
Like I said, I'm not using locking or lock-free queues ;) Both require synchronisation of some sort which is a scalability buster.
I'm just using plain old thread-unsafe containers - one per thread. At a single synch point these thread-local queues are merged together to be processed later.

I can't really just switch from using one type of container to another with this design, I've got to design it from the ground up so that thread-safety is taken care of by scheduling, not by locks of any sort (you can think of my single-synch point as the one lock in the system).

I guess you could kind of sum it up like this:
typedef std::vector<IMessage*> MessageQueue;
typedef std::vector<MessageQueue> ThreadLocalMessageQueue;

ThreadLocalMessageQueue m_PublicQueue( numThreads );
MessageQueue            m_PrivateQueue;

//executed by one thread, all others pause
void SynchPoint()
{
  m_PrivateQueue.clear();
  for each m_PublicQueue as Q
  {
    copy contents of Q into m_PrivateQueue
    Q.clear();
  }
}

//executed by all threads
void ParallelWork( int threadID, int numThreads )
{
  int workPerThread = m_PrivateQueue.size() / numThreads;
  int start = workPerThread * threadID;
  int end   = start + workPerThread;
  for( int i = start; i != end; ++i )
    m_PrivateQueue[i]->Execute(threadID); //this will push new items into m_PublicQueue[threadID]
  wait for all other threads to finish
  if( threadID == 0 )
    SynchPoint();
  else
    wait for synch point to finish
}
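The merge step in the pseudocode above compiles down to very little. Here's a self-contained sketch of just that step, with int standing in for IMessage* so it's trivially testable; the function name and signature are my own illustration, not Hodgman's actual code.

```cpp
#include <vector>

// Drain every thread-local public queue into the shared private queue.
// Runs on one thread while all workers are paused, so no locking is needed.
void SynchPoint(std::vector<std::vector<int>>& publicQueues,
                std::vector<int>& privateQueue)
{
    privateQueue.clear();
    for (std::vector<int>& q : publicQueues) {
        privateQueue.insert(privateQueue.end(), q.begin(), q.end());
        q.clear();
    }
}
```

Because only this one function ever touches more than one thread's queue, and it runs while everything else is paused, the per-thread queues themselves can stay completely unsynchronised.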

