Given the way async I/O works, you generally have to treat input and output a bit differently. The "input" side (i.e. reading from the client) is basically something like:
Sleep (or wait for run control messages)
-> AsyncResult, associate message data with internal server object Id, post to queue
-- Issue another AsyncBegin
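The loop above can be sketched as follows. This is a minimal stand-in, not any real AIO API: a blocking per-client reader thread plays the role of the AsyncBegin/AsyncResult pair, and all names (`MESSAGE_QUEUE`, the session ids) are illustrative.

```python
# Sketch of the read side: sleep until data arrives, tag it with the
# session id, post to the shared queue, then "issue another begin" by
# looping back into recv. Names here are illustrative, not a real API.
import queue
import socket
import threading

MESSAGE_QUEUE = queue.Queue()  # multiplexes every client's messages

def reader(session_id, sock):
    """Stand-in for the AsyncResult handler."""
    while True:
        data = sock.recv(4096)        # "sleep" until data arrives
        if not data:                  # client disconnected
            break
        MESSAGE_QUEUE.put((session_id, data))  # associate with server id

# demo: two fake clients feeding one queue
a_srv, a_cli = socket.socketpair()
b_srv, b_cli = socket.socketpair()
for sid, s in ((1, a_srv), (2, b_srv)):
    threading.Thread(target=reader, args=(sid, s), daemon=True).start()

a_cli.sendall(b"hello")
b_cli.sendall(b"world")
first = MESSAGE_QUEUE.get(timeout=1)
second = MESSAGE_QUEUE.get(timeout=1)
print(sorted([first, second]))  # -> [(1, b'hello'), (2, b'world')]
```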
Now, given how async *write* works, this pattern is not appropriate for the output side. If you issue an async write begin, it typically completes instantly, at least at first, because the send buffer is empty until you start pushing data into it. So you don't issue the begins until there is actually something to send to the specific session/connection. Even then, you generally don't want to keep triggering begin/waits, because it is wasted work when there is no more data to post (i.e. doing so would cause a mass of those context switches you don't like). So the write side is a bit more complicated and delves into architecture. Let me generalize things at a high level; this is not a perfect picture, but hopefully it gives us common terminology to work with:
1..n network clients --> messages received on server are mapped to the "session" for each client
--> session id/object/whatever + message put on a queue (i.e. multiplexing all messages onto one queue)
MMO guts, more threads:
message queue -->
. process the available messages into the various objects
.. By nature this demultiplexes the messages back to the internal server objects each session represents.
. run game loop --> produces messages to "sessions" somewhere in here
. repeat at some simulation rate
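A sketch of that middle thread, under the same illustrative naming as before (`SessionObject`, `tick`, and the echo "game logic" are all made up for the example): drain the shared queue once per tick, demultiplex each message to its session's object, then run the simulation step, which produces outbound messages.

```python
# Sketch of the "MMO guts" thread: demultiplex the queue back into
# per-session objects, then run game logic that fills their outboxes.
import queue

class SessionObject:
    def __init__(self):
        self.inbox = []
        self.outbox = []  # filled by the game loop, drained by send side

    def handle(self, data):
        self.inbox.append(data)

def tick(message_queue, sessions):
    # 1) demultiplex: route every queued message to its session's object
    while True:
        try:
            sid, data = message_queue.get_nowait()
        except queue.Empty:
            break
        sessions[sid].handle(data)
    # 2) run game logic; here we just echo, producing outbound messages
    for s in sessions.values():
        while s.inbox:
            s.outbox.append(b"ack:" + s.inbox.pop(0))

q = queue.Queue()
sessions = {1: SessionObject(), 2: SessionObject()}
q.put((1, b"move"))
q.put((2, b"chat"))
tick(q, sessions)
print(sessions[1].outbox, sessions[2].outbox)
# -> [b'ack:move'] [b'ack:chat']
```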
Send threads, much more painful than read threads:
Sessions get messages from the game loop/simulation/whatever. I.e. the game loop itself has already demultiplexed everything.
. Is there an outstanding async in progress?
. Yes- buffer the data.
. No- try a direct send; if it would block, send a portion, buffer the rest, and kick off the async begin for writing.
.. on async results, if no more data in buffer, don't issue a new begin.
Actual send threads only hold outstanding asyncs for sessions with buffered data. That implies a communications system to wake such threads and tell them "start a send async for session xxx". This has to be carefully designed or you will be beating on mutexes and all that constantly, which can and will cause your nasty context-switching issues. (NOTE: this is actually a problem for most AIO networking APIs; they are designed with bulk TCP transmission in mind, but a game usually only sends a bit here and there.)
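The per-session write-side decisions above can be sketched as a small state machine. `try_send` is a hypothetical stand-in for a non-blocking socket send that returns how many bytes were accepted; the real version would drive an actual async begin instead of just setting a flag.

```python
# Sketch: buffer if an async write is outstanding, otherwise try a
# direct send; on completion, only re-issue a "begin" if data remains.
import threading

class SessionWriter:
    def __init__(self, try_send):
        self.try_send = try_send       # returns bytes actually accepted
        self.buffer = bytearray()
        self.async_in_progress = False
        self.lock = threading.Lock()

    def post(self, data):
        with self.lock:
            if self.async_in_progress:
                self.buffer += data    # async outstanding: just buffer
                return
            sent = self.try_send(data)   # direct attempt first
            if sent < len(data):         # would block: buffer the rest
                self.buffer += data[sent:]
                self.async_in_progress = True  # "kick off the begin"

    def on_async_complete(self, sent):
        with self.lock:
            del self.buffer[:sent]
            # only issue a new begin if there is still data to push
            self.async_in_progress = bool(self.buffer)
            return self.async_in_progress

# demo: a fake socket that accepts only 4 bytes per call
w = SessionWriter(try_send=lambda d: min(4, len(d)))
w.post(b"abcdefgh")             # 4 sent directly, 4 buffered, async "begins"
w.post(b"ij")                   # async outstanding, goes straight to buffer
print(bytes(w.buffer), w.async_in_progress)   # -> b'efghij' True
again = w.on_async_complete(6)  # fake completion of the whole backlog
print(again)                    # -> False: no data left, no new begin
```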
Now, a network server can and should be more asynchronous than a standard client game loop, so I generalized that considerably. The point of the walkthrough, though, is that with three threads in this simple (yeah, right) outline, you only have two locations to worry about context switching, and they are easy to remove as problems. For instance, the queue between the reads and the game loop does not actually need to be a queue, nor does it need to be shared between threads. Instead of a queue, each "read" thread simply writes to its own copy of a vector. Once per game loop, the server says "let me have all the contents", and the thread starts filling a new, empty vector. This does impose a mutex lock and contention (a potential context switch), but once per loop is very acceptable. (With a centralized queue, things are MUCH more expensive.) The output side is much the same using a different system: posted items go into pooled blocks of a fixed size, and you locklessly post new blocks onto the outgoing queue (remember, this is per session, so the likelihood of hitting actual contention is near zero; even a mutex lock would be acceptable) only if there are async operations already in progress, otherwise you attempt to send directly without bothering to touch the writer thread. (NOTE: you may want a queue that lets the writer thread do the send even when there is no buffered data; you would have to see whether calling the API directly has any notable cost.)
Hopefully this gives you a full high-level picture. This is not the "only" way to do things, nor necessarily the best, but nothing here causes an outstandingly high number of memory copies or context switches unless you implement one of the pieces wrong. A test client and a do-nothing server using all of this can be written in a couple of days. Removing any remaining performance issues: a couple more days. Generalizing and getting clean startup/shutdown: yet another couple of days. Throwing 1k simulated client connections at it: a day. Fixing things when it falls on its face with 1k connections sending/receiving 5kb/s each: a couple more days. Making an MMO out of the framework by yourself: see you in your next life.