Since this is message passing, there are many things that could go wrong. The biggest and most common are that messages can be lost and messages can be transmitted more than once.
The answer to both is the same: make the messages idempotent, and allow for retry after a short time period.
Being idempotent means the same request can be sent over and over and over again without messing anything up. An obvious example is a storefront's 'buy' button: if for some reason that page gets hit more than once, you still want only a single transaction to take place. This is often implemented with unique transaction IDs; if the transaction has already been processed, just re-acknowledge the event without billing again.
In TCP and similar flow-control systems, the source keeps sending data up to the size of a window, assuming it will all arrive for processing. The client periodically acknowledges that it has received everything up to a certain point, so the host can drop that data and advance the window; or the client reports that it missed something, and the server re-transmits that block and advances the sliding window up to the point that was missing.
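Roughly, the sender side of that looks like this. This is a toy sketch of the cumulative-ack idea, not TCP itself; the window size and sequence numbering are assumptions:

```python
from collections import deque

WINDOW = 4  # how many blocks may be in flight at once

class Sender:
    def __init__(self, data):
        self.pending = deque(enumerate(data))  # (seq, block) not yet acked
        self.base = 0                          # lowest unacknowledged seq

    def sendable(self):
        # Everything inside the current window may be sent without waiting.
        return [b for b in self.pending if b[0] < self.base + WINDOW]

    def ack(self, upto):
        # Cumulative ack: everything below `upto` arrived, so drop it
        # and slide the window forward.
        while self.pending and self.pending[0][0] < upto:
            self.pending.popleft()
        self.base = upto

    def nak(self, seq):
        # Receiver says it missed `seq`: re-transmit just that block.
        return next(b for b in self.pending if b[0] == seq)
```

The key point is that the sender never stops and waits for each block to be confirmed individually.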
With that in mind, going over the original post again...
The client sends a publish packet, then the server must answer with a pubrec packet. After receiving the pubrec, the client answers with a pubrel packet and finally the server send a pubcomp packet. This will ensure the message is reliably received only once.
This is basically guaranteed to eventually fail. There will always be network problems: dropped packets and lost connections.
Make sure you have no "only once" requirement. The data can arrive as many times as necessary and be requested over and over without ill effect. The client can drop the connection, reconnect, and repeat earlier requests. The server needs to handle duplicate requests gracefully; generally this means re-transmitting without re-processing, as described above.
Then it is passed to the work queue which is in charge to acknowledge all the publish-pubrec-pubrel-pubcomp chain. ... This ensures that only one message is inflight and that the order is preserved.
Machines crash. Sometimes data gets lost. Better to allow individual blocks of work to remain on the server, and allow for them to expire on clients. If a client takes too long or loses its connection for any reason, distribute the work out to another client.
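One common way to get that behavior is lease-based checkout: a block stays on the server and is handed out again if the holder's lease expires. The lease length and the dict-based queue here are assumptions for illustration:

```python
import time

LEASE_SECONDS = 30.0

work = {"block-1": None, "block-2": None}  # block_id -> (client, deadline) or None

def checkout(client, now=None):
    now = time.monotonic() if now is None else now
    for block_id, lease in work.items():
        if lease is None or lease[1] < now:  # free, or the lease expired
            work[block_id] = (client, now + LEASE_SECONDS)
            return block_id
    return None  # nothing available right now; try again later
```

If client A checks out a block and vanishes, the same block becomes available to client C once the deadline passes, with no explicit failure handling needed.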
If there is any possible way to arrange it for the data, make sure that individual blocks can be processed independently of each other. Then you can queue up a large amount of work, farm it out to any number of machines, and re-assemble it as it returns. If something arrives back early, hold on to it to reassemble the parts in the right order. If something doesn't return in a timely manner, farm it out to another machine to be processed again. Don't force ordering on distributed processes; it slows things down.
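The "hold early arrivals, emit in order" part of that is a small amount of code. A sketch using a min-heap, assuming each result carries its sequence number:

```python
import heapq

def reassemble(results):
    # Results may arrive in any order; buffer early arrivals and yield
    # blocks strictly in sequence as the gaps fill in.
    held = []      # min-heap of (seq, block) that arrived early
    next_seq = 0
    for seq, block in results:
        heapq.heappush(held, (seq, block))
        while held and held[0][0] == next_seq:
            yield heapq.heappop(held)[1]
            next_seq += 1
```

The workers never wait on each other; only this final consumer cares about order.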
So it happens that when the client and the server send a publish packet at almost the same time, they both have the publish at their heads. And both wait for the pubrec packet to arrive, but neither of them can send a pubrec because the publish is still on the head waiting for the pubrec which can't be sent.
That isn't a client/server. That is two machines both acting as clients and as servers.
As above, it can be avoided by not requiring any system to wait. If another machine says it has something for you, acknowledge the delivery, validate it, and then release it to do whatever else it wants to do. It shouldn't matter if it gave you a duplicate of something it already sent, sent you something out of order, or sent something you weren't expecting; just say "thank you" to the connection and allow it to advance to the next block of work.
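In other words, the receive path acknowledges unconditionally. A minimal sketch of a receiver that never makes the other side wait, with the seq-keyed dict as an assumed storage scheme:

```python
received = {}  # seq -> block, filled in whatever order deliveries arrive

def on_delivery(seq, block):
    # Duplicate? Keep the copy we already have. Out of order? Also fine;
    # it just sits in the dict until the consumer wants it.
    received.setdefault(seq, block)
    return "ack"  # always release the sender to its next block
```

There is no state machine here to deadlock: neither side is ever blocked waiting on the other's send queue.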