Completion ports and worker threads

Started by efortier, 7 comments, last by efortier 18 years, 5 months ago
Hi all,

I was just working on the ordering of data received in the worker threads of my server, and wanted to ask you guys about an idea I had.

Instead of having all my worker threads wait on a single server-wide completion port, how about I create one port for each thread (I use between 1 and 4 threads)? Then, as my server accepts a connection, I poll the list of ports, select the least active, and assign it to the client socket for the rest of the connection.

That way, I believe, I wouldn't have to worry about the order of the incoming data for a particular client, as the data would be received in the order it was sent.

I haven't tried it yet, but I think this might be a sound design. What do you think? Thanks!

[Edited by - efortier on November 12, 2005 8:18:31 AM]
Whenever you have threads in your program, ordering between the threads is not guaranteed. Your modification does not re-order anything differently from a single completion port -- in fact, it may suffer MORE out-of-order completion because there are more ports involved.

The whole reason to use completion ports is to get efficiency, where a running thread would pick up new requests without requiring re-scheduling. You totally lose that benefit if you have multiple completion ports.
enum Bool { True, False, FileNotFound };
I thought lack of ordering was independent of the server's setup.

Is it not just when packets get held up on the way, so some arrive before others even if they were sent after?
> I haven't tried it yet, but I think this might be a sound design.

This works on Windows 9x because there is no completion port mechanism there. But Windows NT-class systems have completion ports, which are a better thread-pool management mechanism.

> {...} select the less active {...}

It's the other way around. You want to pick the MOST active, because that thread has all its heap and stack already in memory and is already running. Otherwise you can get performance hiccups as thread data is paged back into RAM and the thread is rescheduled. Completion ports pick up threads in LIFO order automatically for you and keep them running if there is more data to process when they return.

> That way, I believe, I wouldn't have to worry
> about the order of the incoming data for a
> particular client {...}

As said above, you cannot guarantee order in a threaded environment. The only way to ensure "packet ordering" is to number your per-call data structures. See the following link for the details on how to do this:

http://www.codeproject.com/internet/reusablesocketserver4.asp
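The numbering idea can be sketched roughly like this (all names here are illustrative, not taken from the article): every outstanding read carries a per-call context that is stamped with the connection's next sequence number *before* the read is posted, so completions can later be put back in order no matter which worker thread picks them up. On Windows the context would embed an OVERLAPPED as its first member; this sketch keeps it portable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-call context: one of these accompanies every
// outstanding read on a connection.
struct ReadContext {
    std::uint64_t sequence;      // order in which this read was issued
    std::vector<char> buffer;    // bytes filled in by the completed read
};

struct Connection {
    std::atomic<std::uint64_t> nextSequence{0};

    // Stamp each read with the connection's next sequence number
    // before the read is posted, so completions can be re-ordered
    // regardless of which worker thread dequeues them.
    ReadContext* beginRead(std::size_t bufferSize) {
        auto* ctx = new ReadContext;
        ctx->sequence = nextSequence.fetch_add(1);
        ctx->buffer.resize(bufferSize);
        return ctx;
    }
};
```

The atomic counter matters: two worker threads could be posting reads for the same connection concurrently, and `fetch_add` guarantees each read still gets a unique, monotonically increasing number.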

Hope this helps.

-cb
Thanks for pointing out the weakness of my proposed design ;)

There's still something I have a hard time with, maybe someone can help me see clearly with IOCP and multiple threads.

Forgetting about the design I proposed in this thread, I now expect to have 2 worker threads running for my server servicing a completion port.

Given that a client sends a message of 256 bytes to a server, my understanding of Winsock + IOCP is that:

a) the server can receive just a few bytes of the message, the entire message, or more than one message (if several are queued by the client)
b) a worker thread might receive a completion notification for a WSARecv() that contains only a partial message (i.e., 128 bytes)
c) a worker thread, after processing the received data, issues a new WSARecv() to wait for more data
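Assumptions (a) and (b) follow from TCP being a byte stream, and the usual fix is framing: accumulate whatever each read delivers and only extract a message once its full payload has arrived. A minimal sketch using a length prefix (class and method names are mine, purely illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// TCP may deliver part of a message, one message, or several at once,
// so the receiver buffers bytes per connection and re-frames them.
// Wire format assumed here: [u32 length][payload bytes].
class MessageFramer {
public:
    // Append whatever a completed read delivered, however many bytes.
    void feed(const char* data, std::size_t len) {
        pending_.insert(pending_.end(), data, data + len);
    }

    // Extract every complete message accumulated so far.
    std::vector<std::string> drain() {
        std::vector<std::string> out;
        while (pending_.size() >= 4) {
            std::uint32_t len = 0;
            std::memcpy(&len, pending_.data(), 4);
            if (pending_.size() < 4 + len) break;   // message still partial
            out.emplace_back(pending_.data() + 4, len);
            pending_.erase(pending_.begin(), pending_.begin() + 4 + len);
        }
        return out;
    }

private:
    std::vector<char> pending_;   // bytes received but not yet framed
};
```

With this in place, a 256-byte message arriving as two 128-byte reads is no problem: the first `drain()` simply returns nothing, and the second returns the whole message.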

If these assumptions are correct, is it possible that:

1) Thread #1 receives the first 128 bytes of a 256-byte message
2) Thread #2 receives the other 128 bytes
3) Thread #2 finishes processing the message before Thread #1?

If so, how can the threads know which data comes first in the message? I've read a lot about a client numbering the messages it sends to the server, but how does a server and its threads untangle the possibly out-of-order pieces of a message?
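One common answer (sketched here with invented names, not from any post in this thread) is a small re-order buffer: each completed read carries the sequence number it was tagged with when it was posted, and chunks are released to the processing code strictly in sequence, regardless of which worker thread finished first.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Releases received chunks in the order they were posted, even when
// worker threads complete them out of order.
class Resequencer {
public:
    // Offer a completed chunk; returns every chunk that is now ready
    // to be processed in order (possibly none, possibly several).
    std::vector<std::string> offer(std::uint64_t seq, std::string chunk) {
        held_[seq] = std::move(chunk);
        std::vector<std::string> ready;
        while (!held_.empty() && held_.begin()->first == next_) {
            ready.push_back(std::move(held_.begin()->second));
            held_.erase(held_.begin());
            ++next_;
        }
        return ready;
    }

private:
    std::uint64_t next_ = 0;                     // next sequence to release
    std::map<std::uint64_t, std::string> held_;  // chunks that arrived early
};
```

So in the scenario above, Thread #2's second half sits in the buffer until Thread #1 offers the first half, at which point both halves come out together, in order. In a multi-threaded server the `offer()` call itself would also need to be protected by a lock.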

Either that's just the way it works or I'm not getting an important fact.

Thanks!
> If these assumptions are correct, is it possible that {...}

Rarely, but yes. There are two ways for packets to get out of order: 1) the network does it sometimes, but the TCP layer ensures packets are re-assembled in the correct order; and 2) two or more threads pull data from the *same* socket at roughly the same time. I think you are mixing the two here and making incorrect assumptions about completion ports in general.

1) Unlike UDP, TCP is a stream protocol. The IP stack ensures the TCP bits and pieces are reassembled in the correct order when it fills the socket buffers for the application to read. In a single-threaded environment, the buffers appear contiguous and in-order all the time. In a context where "one thread = one socket", you get essentially the same results, because each thread is pulling data out of its own assigned socket and nowhere else.

2) In a multi-threaded environment (especially on a multi-CPU machine), two threads can theoretically pull data out of the *same* socket buffers and process it independently. Then you need a way to synchronize the threads so that only one thread can read the socket and process data at any one time (e.g. through a mutex or critical section), or a mechanism that tells each thread which piece of the socket buffer data it actually pulled. That latter approach is basically what is done in the article I referenced, but it's a bit of a kluge.

------------------------

If you post *only one* WSARecv() per socket, only one thread can be reading from that socket's buffers at any one time anyway. Also, when that reading thread is done, it posts only one WSARecv() call at the end. If there is more data in the socket buffers when the thread loops back to GetQueuedCompletionStatus(), that same thread will simply pull data from the WSARecv() call it has just posted. If you post *multiple* WSARecv() calls per socket, then you need thread synchronization and counters.
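The "one outstanding WSARecv() per socket" pattern looks roughly like this. This is an untested, Windows-only sketch with all error handling omitted; it assumes the socket was associated with the port using its own handle as the completion key.

```cpp
#include <winsock2.h>
#include <windows.h>

// Per-I/O context: OVERLAPPED must be the first member so the
// LPOVERLAPPED returned by GetQueuedCompletionStatus() can be cast
// back to the containing structure.
struct PerIo {
    OVERLAPPED ov;
    WSABUF     wsaBuf;
    char       buffer[4096];
};

void WorkerLoop(HANDLE iocp) {
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;          // per-socket context set at association time
        LPOVERLAPPED ov = nullptr;
        GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE);

        PerIo* io = reinterpret_cast<PerIo*>(ov);
        SOCKET s = static_cast<SOCKET>(key);

        // ... process io->buffer[0 .. bytes) here ...

        // Re-post exactly one read. Until it completes, no other
        // thread can be handed data from this socket, so the stream
        // stays in order with no extra synchronization.
        DWORD flags = 0;
        io->wsaBuf.buf = io->buffer;
        io->wsaBuf.len = sizeof(io->buffer);
        ZeroMemory(&io->ov, sizeof(io->ov));
        WSARecv(s, &io->wsaBuf, 1, nullptr, &flags, &io->ov, nullptr);
    }
}
```

A real worker would also check the return values of both calls and handle zero-byte completions (graceful disconnect).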

Hope it's clearer now.

-cb
It makes perfect sense.

Thanks for taking the time to explain this!

--Eric
Use one thread to service the IOCP and place completely received messages in a queue. Another thread processes messages out of that queue. This way the IOCP thread is maximally available to keep the data transmission rate high. The same IOCP thread can send data from the output queue; if you have an SMP machine, then a second thread should send data.

You can now poll the message queue in your game loop and everything stays synchronous and easy.
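A minimal sketch of that handoff (names are mine, purely illustrative): the network thread pushes completed messages, and the game loop polls without blocking, so all game-side processing stays effectively single-threaded.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <queue>
#include <string>

// Thread-safe queue between the IOCP thread (producer) and the
// game loop (consumer).
class MessageQueue {
public:
    // Called from the network thread for each complete message.
    void push(std::string msg) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(std::move(msg));
    }

    // Called from the game loop each frame; never blocks, returns
    // nothing when the queue is empty.
    std::optional<std::string> poll() {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return std::nullopt;
        std::string msg = std::move(q_.front());
        q_.pop();
        return msg;
    }

private:
    std::mutex m_;
    std::queue<std::string> q_;
};
```

Typical game-loop usage would be `while (auto msg = queue.poll()) { handle(*msg); }`, draining everything that arrived since the last frame.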
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
Thanks for the info.

I did something similar to what you propose, but I'm not sure I should have.

Instead of putting the received messages in a list (using some type of locking mechanism), I created another completion port and I manually post the messages there.

Another thread simply waits on that port, dequeues messages, and processes them.

Is this a legal use of completion ports? I haven't associated it with anything. So far it hasn't caused any problems, but I wonder what it would do once deployed.
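For reference, a completion port that is never associated with any file handle, fed purely through PostQueuedCompletionStatus(), is a documented pattern; passing INVALID_HANDLE_VALUE with no existing port to CreateIoCompletionPort() creates such a standalone port. An untested, Windows-only sketch, with error handling omitted and `Message` standing in for whatever application type is being queued:

```cpp
#include <windows.h>

struct Message;   // hypothetical application message type

// A port with no handles associated, usable as a thread-safe queue.
HANDLE CreateStandaloneQueue() {
    return CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
}

// Producer side (e.g. the IOCP worker thread): smuggle the message
// pointer across in the completion key.
void PostToQueue(HANDLE port, Message* msg) {
    PostQueuedCompletionStatus(port, 0, reinterpret_cast<ULONG_PTR>(msg), nullptr);
}

// Consumer side: block until a message arrives, then recover it.
Message* WaitForMessage(HANDLE port) {
    DWORD bytes = 0;
    ULONG_PTR key = 0;
    LPOVERLAPPED ov = nullptr;
    GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE);
    return reinterpret_cast<Message*>(key);
}
```

Ownership of the pointed-to message passes from producer to consumer with the post, so the consumer is responsible for freeing it.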

--Eric

This topic is closed to new replies.
