strange deadlock

Started by
11 comments, last by swiftcoder 8 years, 7 months ago

Hi everyone.

I'm developing an MQTT broker. In essence, it is a messaging protocol based on a client/server arquitecture.

The protocol uses control packets for sending messages. One of the rules is the following:

The client sends a publish packet, then the server must answer with a pubrec packet. After receiving the pubrec, the client answers with a pubrel packet and finally the server send a pubcomp packet. This will ensure the message is reliably received only once.

I implement this using a pair of queues. One is the work queue, and the other is the global queue. Each time the client wants to send a packet, it is enqueued in the global queue. Then it is passed to the work queue which is in charge to acknowledge all the publish-pubrec-pubrel-pubcomp chain.

While this process finishes, more messages can be enqueued in the global queue. This ensures that only one message is inflight and that the order is preserved. Another thing to mention is that in the work queue, the packet in the head is not removed until it's acknowledge packet arrives. So, if the acknowledge gets lost, then i can resend the publish(or the packet in the head of the queue)

Now, the server can also send the same chain of packets to the client.

So it happens that when the client and the server send a publish packet at almost the same time, they both have the publish at their heads. And both wait for the pubrec packet to arrive, but none of them can send a pubrec becuase the publish is still on the head waiting for the pubrec which can't be sent..... So i got some strange kind of deadlock.

Any idea how can i break this block or even better, prevent it? I have tried some things, like a third queue, swaping some packets, etc, but all of them have secondary effects, like changing the order or screwing the the packet's ids. So i'm pretty stuck with this.

Advertisement
It's a lot of information that you dump here, I may have missed some important parts, but here are my first thoughts.

The key problem is that your protocol is for single-direction traffic (a connection) from one host to one client. If you have several instances of that, you need to keep each traffic stream separate.


Currently, you seem to have queue dedicated for a connection, where the queue tracks the state of the connection. If you then try to push two connections into the same queue, things get confused ;)

If you want to keep the queue == connection idea, make queues for each connection. For bi-directional traffic, that means, differentiate between "my initiated traffic" and "initiated traffic from the remote". Traffic from A -> B has nothing to do with traffic from B -> A, except they share a wire, so you cannot send messages for both at the same time. This leads to queues for each connection (each direction of traffic), and you ask all queues for messages to send. Messages that arrive have to be put into the right incoming queue.

Another direction that you can go, is to decide that the queue is the wrong place to store information about the state of a connection. Queues are for storing messages that can be send now, or have arrived, but not yet processed. Introduce connections as objects that handle the protocol, a protocol state machine for each connection, so you can keep track of what to do next for each connection. That also nicely handles waiting on timeout on multiple connections.

Hi. Thanks for the answer.

I think the problema is more a conceptual one instead of a technical.

I tried to summarize the problema so it can be explained as simple as posible, but that's really hard.

Anyway. Let's assume another abstraction.

Two hosts want to communicate between them. Let's assume both can write and read at the same time. Each message a host sends must be acknowledge with another "acknowledgment" message. So host sends packet1 then it must receive ack1 to know that host2 received packet1 correctly.

The problema arrises when 2 hosts send a packet between them.

Host1 sends packet1 and waits for ack1, and it won't send anything else until ack1 arrives.

Host2 sends packet2 and waits for ack2, and it won't send anything else until ack2 arrives.

As it can be seen, none of the are able to send an ackX message because they are mutually waiting for it. So i got DEADLOCK....

This is managed using queues and i haven't found an elegant solution for solving it.

Any help?

Since this is message passing, there are many things that could go wrong. The biggest and most common are that messages can be lost and messages can be transmitted more than once.

The answers to both are to ensure they are idempotent and also allow for retry after a short time period.

Being idempotent means they can send the request over and over and over again without messing anything up. An obvious example is on a storefront's 'buy' button. If for some reason that page gets hit more than once you only want a single transaction to take place. This is often implemented by using unique transaction ID values, if the transaction has already been processed just re-acknowledge the event without billing.

In TCP and similar flow-control systems the source just keeps sending more and more data, up to the size of a window and assuming the data has all arrived for processing. The client periodically acknowledges that they received everything up to a certain spot so the host can drop all that and advance the window, or the client says it missed something and the server re-transmits that block and advances the sliding window up to the point that was missing.

With that in mind, going over the original post again...


The client sends a publish packet, then the server must answer with a pubrec packet. After receiving the pubrec, the client answers with a pubrel packet and finally the server send a pubcomp packet. This will ensure the message is reliably received only once.
This is basically guaranteed to eventually fail. There are always network problems of dropped packets and lost connections.

Make sure you have no "only once" requirement. The data can arrive as many times as necessary and be requested over and over without ill effect. The client can drop the connection, reconnect, and repeat earlier requests. The server needs to handle duplicate requests gracefully, generally this means re-transmitting without re-processing as described above.


Then it is passed to the work queue which is in charge to acknowledge all the publish-pubrec-pubrel-pubcomp chain. ... This ensures that only one message is inflight and that the order is preserved.

Machines crash. Sometimes data gets lost. Better to allow individual blocks of work to remain on the server, and allow for them to expire on clients. If the client takes to long or loses connection for any reason, distribute the work out to another client.

If there is any possible way to arrange it for the data, make sure that individual blocks can be processed independently of each other. Then you can queue up a large amount of work, farm it out to any number of machines, and re-assemble it as it returns. If something arrives back early, hold on to it to reassemble the parts in the right order. If something doesn't return in a timely manner, farm it out to another machine to be processed again. Don't force ordering on distributed processes, it slows things down.


So it happens that when the client and the server send a publish packet at almost the same time, they both have the publish at their heads. And both wait for the pubrec packet to arrive, but none of them can send a pubrec becuase the publish is still on the head waiting for the pubrec which can't be sent
That isn't a client/server. That is two machines both acting as clients and as servers.

As above, it can be avoided by not requiring any system wait. If another machine says they've got something for you, acknowledge the delivery, validate it, and then release them to do whatever else they want to do. It shouldn't matter if they gave you a duplicate of something they already sent, or sent you something out of order, or something you weren't expecting, just say "thank you" to the connection and allow them to advance to the next block of work.

As i said before, i tried to simplify all the complexity of the mqtt protocol. The mqtt protocol normally runs over TCP, although it can be over websockets.

Everything you wrote about crashing machines, data loss, etc is considered in the protocol. I just tried to abstract that away so we can focus in the problem i have.

Yo can read the mqtt spec at mqtt.org.

When a host sends a publish to another host the othe host must reply with a pubrec. That's the acknoledgment. If it doens't answers after some period of time, then the publish is resend.

Each packet has a set of flags, ids, and other that which helps synchronize and reliably send packets. The "only once" requerimient is already part of the mqtt protocol. You can read about in the spec.

Now, i would like to focus in the deadlock i've described.

Any ideas about that?

Thanks.

The client sends a publish packet, then the server must answer with a pubrec packet. After receiving the pubrec, the client answers with a pubrel packet and finally the server send a pubcomp packet. This will ensure the message is reliably received only once.

You are unfortunately missing a vital part of the MQTT rules here: there is no requirement that only a single publish transaction be in flight at a given time.

Immediately upon sending a Publish packet (with QoS=2) to the client, the server might reasonably receive a PubRel packet in return (with Message ID matching that specified in the initial Publish).

However, it might also receive a PubRel or PubComp with some other message id (in response to an earlier publish to the client). It might receive a Publish or Subscribe by the client. It might receive a Disconnect if the client has decided it is done. Or it might receive nothing at all, and need to resend the Publish after a pre-defined timeout...

TL;DR: between the Connect and Disconnect messages, you must always be prepared to handle arbitrary message types from the client.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

If you really get stuck, try the jstack command line utility. It can give you a stack dump from another terminal session. I had to use this to find a weird X11 Swing bug that was hanging the JVM before the exception message was displayed.

Maybe finding the class/code that has hung will point you in the right direction? Couldn't hurt to have this in your toolkit. cool.png

I think, therefore I am. I think? - "George Carlin"
My Website: Indie Game Programming

My Twitter: https://twitter.com/indieprogram

My Book: http://amzn.com/1305076532


You are unfortunately missing a vital part of the MQTT rules here: there is no requirement that only a single publish transaction be in flight at a given time.

In section 4.6 in the non normative comment it textually says:

If both Client and Server make sure that no more than one message is “in-flight” at any one time (by not sending a message until its predecessor has been acknowledged), then no QoS 1 message will be received after any later one - for example a subscriber might receive them in the order 1,2,3,3,4 but not 1,2,3,2,3,4. Setting an in-flight window of 1 also means that order will be preserved even if the publisher sends a sequence of messages with different QoS levels on the same topic.

I know it is a non normative, but that's what i'm using to preserve the order of the messages.

I thinks, that sending a publish should not prevent sending pubrec, puback, pubrel or pubcomps, just block other publish. I'm still working on this idea.

> Two hosts want to communicate between them. Let's assume both can write and read at the same time. Each message a host sends must be acknowledge with another

> "acknowledgment" message. So host sends packet1 then it must receive ack1 to know that host2 received packet1 correctly.

Yes, so for host1 -> host2, the sequence is "packet1 from host1 -> host2, then ack1 from host2 -> host1". I ignore loss of messages.

> The problema arrises when 2 hosts send a packet between them.

> Host1 sends packet1 and waits for ack1, and it won't send anything else until ack1 arrives.

> Host2 sends packet2 and waits for ack2, and it won't send anything else until ack2 arrives.

>

> As it can be seen, none of the are able to send an ackX message because they are mutually waiting for it. So i got DEADLOCK....

I don't agree here.

"Host1 sends packet1" <-- belongs to host1 -> host2 communication.

"Host2 sends packet2" <-- belongs to host2 -> host1 communication.

These two data streams are independent to me, as far as I can see, host1 (and host2) cannot send another packet until they received an ack for their own sent packet (if they did, you'd have 2 packets "in-flight".

I see no rule that forbids them from sending an ACK for a packet that they received, which is from a completely unrelated stream.

I know it is a non normative, but that's what i'm using to preserve the order of the messages.
I thinks, that sending a publish should not prevent sending pubrec, puback, pubrel or pubcomps, just block other publish. I'm still working on this idea.

Yes, that means that the client and server, may, each at their own option, decide not to have more than 1 outstanding publish awaiting acknowledgement.

It doesn't however prevent the other end from sending whatever it feels like (especially if a client that does not use the non-normative behaviour connects to your server), and in any case, it won't help if both sides try to make their 1 publish at the same time.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

This topic is closed to new replies.

Advertisement