
Single producer - multiple consumer fast queue?


Hi there,

 

I am writing a high-performance media player right now and want to implement a fast and stable concurrent circular queue for adding packets to and getting packets from (single producer, multiple consumers).

 

Each entry in the queue is preallocated at startup and will never be re-allocated or be freed.

Normally, queues like this replace the data directly at the given index in the array, and the consumer just works with a copy of the data.

 

In my case, I cannot just replace the data with new data. I must copy the original network packet data so a decoder thread can operate on a copy and the queue can operate on that copy. But this process takes time - dequeuing an item from a circular queue does not.

 

Here comes the problem:

 

The memory copy of the dequeued packet takes longer than the dequeue itself, and the write position has already advanced - so a decoder thread starts decoding the packet, but the packet is not ready yet - crash, weird behaviour, asserts fire.

 

What can I do to ensure that a consumer gets a finished copy of the packet, and not a broken one, using lock-free methods?

 

See attachments for my naive implementation in C# (WinForms for visualization). Language does not matter - the problem can occur in any language.


First step: write a lock-based one, or better yet, use an existing queue such as ConcurrentQueue, which is already moderately lock-free (source).
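To make "lock-based first" concrete: a mutex plus two condition variables gives a correct bounded queue in a few dozen lines. A sketch in C++ (the thread's code is C#, but the pattern is the same; all names here are illustrative, not from the attached implementation):

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// A minimal lock-based bounded queue: correct first, fast later.
template <typename T>
class LockedQueue {
public:
    explicit LockedQueue(std::size_t capacity) : capacity_(capacity) {}

    void Push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        notEmpty_.notify_one();
    }

    T Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        notFull_.notify_one();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
    std::queue<T> items_;
    std::size_t capacity_;
};
```

A consumer that calls `Pop()` blocks until a complete item is available, which already rules out the half-copied-packet race by construction.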
Second step: If profiling actually shows it to be a problem, then invest the time to replace it.

 

As far as writing a lock-free circular queue goes, for starters: time-consuming operations should be completed long before you ever attempt to insert the object into the queue. The insertion itself should be one or more CAS (compare-and-swap) operations, retried or abandoned based on the result of each CAS.
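For an SPMC queue, the CAS retry loop sits on the consumer side: each consumer claims a slot by advancing the shared read position, and only considers data the producer has already published. A C++ sketch of that claim loop (indices, capacity, and names are illustrative; a real queue also needs slot recycling):

```cpp
#include <atomic>
#include <cstdint>

// Ring of kCapacity slots (a power of two). The producer bumps writePos
// only after a slot's contents are complete; consumers race on readPos.
constexpr std::uint64_t kCapacity = 1024;  // illustrative size

std::atomic<std::uint64_t> readPos{0};
std::atomic<std::uint64_t> writePos{0};

// Returns true and sets `slot` if a slot was claimed; false if empty.
bool TryClaimSlot(std::uint64_t& slot) {
    std::uint64_t pos = readPos.load(std::memory_order_relaxed);
    for (;;) {
        if (pos >= writePos.load(std::memory_order_acquire)) {
            return false;  // nothing published yet
        }
        // CAS: claim this position. On failure, `pos` is reloaded with
        // the current value and the loop retries.
        if (readPos.compare_exchange_weak(pos, pos + 1,
                                          std::memory_order_acq_rel)) {
            slot = pos % kCapacity;
            return true;
        }
    }
}
```

The key property is that a successful CAS hands exactly one position to exactly one consumer, so two decoders can never claim the same packet.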

 

The queue itself is not the problem, but the fixed memory used for the queue entries and the process of copying the memory into the slot is.

Adding to or removing from the queue works just fine. It seems I need another lock for each entry to solve this problem - so a decoder can dequeue but waits until the memory copy is finished.

How much memory do you need to copy? Because you would have to literally be copying several GB worth of data per second for the copy itself to show up as a problem. Unless this is the case, it sounds to me like you're hitting false sharing while doing the copy, which would slow down all cores considerably.


How much memory do you need to copy? Because you would have to literally be copying several GB worth of data per second for the copy itself to show up as a problem. Unless this is the case, it sounds to me like you're hitting false sharing while doing the copy, which would slow down all cores considerably.

 

Hmm, after analyzing it, a packet turns out to be around ~300 KB at most. So you are basically right - there is too much going on. Back to the drawing board.

Edited by Finalspace


Can't the queue just contain pointers to these large 300 KB packets?

 

Then you can use a fast SPMC queue of pointers, and separately, a lock-free pool allocator of the packets themselves.
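One common shape for the suggested packet pool is a Treiber stack over the preallocated packets: `Acquire` pops a free packet, `Release` pushes it back, and the SPMC queue only ever carries `Packet*`. A C++ sketch, with the caveat that a production version needs ABA protection (tagged pointers or hazard pointers) that this illustration omits:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

struct Packet {
    Packet* next = nullptr;          // intrusive freelist link
    std::size_t size = 0;
    std::uint8_t data[300 * 1024];   // preallocated payload (illustrative)
};

// Lock-free LIFO pool of preallocated packets (Treiber stack).
class PacketPool {
public:
    void Release(Packet* p) {
        Packet* head = head_.load(std::memory_order_relaxed);
        do {
            p->next = head;
        } while (!head_.compare_exchange_weak(head, p,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    Packet* Acquire() {
        Packet* head = head_.load(std::memory_order_acquire);
        while (head &&
               !head_.compare_exchange_weak(head, head->next,
                                            std::memory_order_acquire)) {
        }
        return head;  // nullptr if the pool is exhausted
    }

private:
    std::atomic<Packet*> head_{nullptr};
};
```

The network thread acquires a packet, fills it completely, and only then enqueues the pointer; the decoder dequeues, decodes, and releases the packet back to the pool - so a pointer in the queue always refers to finished data.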

 

Oh, this may work - I will give it a try.
