Single producer - multiple consumer fast queue?

Started by
6 comments, last by khawk 7 years, 10 months ago

Hi there,

i am writing a high performance media player right now and want to implement a fast and stable circular concurrent queue used for adding packets to and getting packets from (single producer, multiple consumer).

Each entry in the queue is preallocated at startup and will never be re-allocated or be freed.

Normally queues like this replaces the data directly for the given index in the array and the consumer just works with a copy of the data.

In my case, i cannot just replace the data with a new one. I must copy the original network packet data so a decoder thread can operate on a copy and the queue can operate on that copy. But this process takes time - dequeing an item from a circular queue does not.

Here comes the problem:

The memory copy from the dequeed packet takes longer than the deque itself and the write position is already advanced - so a decoder thread starts decoding the package - but the package is not ready net - crash... weird behaviour - asserts fire.

What can i do to ensure that the consumer gets a finished copy of the packet and not a broken one using lock-free methods?

See attachments for my naive implementation in C# (WinForms for Visualization). Language does not matter - the problem can be get on any language.

Advertisement

First step: Write a lock based one, or better yet, use an existing queue such as ConcurrentQueue, which is already moderately lock free (source).
Second step: If profiling actually shows it to be a problem, then invest the time to replace it.

As for writing a lock free circular queue goes, for starters: Time consuming operations should be completed long before you ever attempt to insert the object into the queue. Your insertion into the queue should be one or more CAS operations based on the result of the CAS.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

First step: Write a lock based one, or better yet, use an existing queue such as ConcurrentQueue, which is already moderately lock free (source).
Second step: If profiling actually shows it to be a problem, then invest the time to replace it.

As for writing a lock free circular queue goes, for starters: Time consuming operations should be completed long before you ever attempt to insert the object into the queue. Your insertion into the queue should be one or more CAS operations based on the result of the CAS.

The queue itself is not the problem, but the fixed memory used for the queue entries and the process of copy the memory into the slot.

Adding or removing from the queue works just fine. Seems i need another lock for each entry to solve this problem - so a decoder can dequeue but waits until the memory is finished.

How much memory do you need to copy? Because you would have to literally be copying several GB worth of data per second to show up as a problem. Unless this is the case, it sounds to me you're hitting false cache sharing issues while doing the copy, which would slowdown all cores considerably.

How much memory do you need to copy? Because you would have to literally be copying several GB worth of data per second to show up as a problem. Unless this is the case, it sounds to me you're hitting false cache sharing issues while doing the copy, which would slowdown all cores considerably.

Hmm after analyzing, it happens to be around ~300 kb at max for a packet. So you are basically right - there is too much going on. Back to the drawing board.

Can't the queue just contain pointers to these large 300kb packets?

Then you can use a fast SPMC queue of pointers, and separately, a lock-free pool allocator of the packets themselves.

Can't the queue just contain pointers to these large 300kb packets?

Then you can use a fast SPMC queue of pointers, and separately, a lock-free pool allocator of the packets themselves.

Oh this may work, i will give it a try.

Seems like you have a solution, but another possibility would be Zeromq's IPC mechanism for simplifying producer/consumer architectures. http://zeromq.org/intro:read-the-manual

Admin for GameDev.net.

This topic is closed to new replies.

Advertisement