
How does the client/server model handle high traffic?


mdwheele    140
Very general question I know. But in typical implementations how does the client/server model handle 100 users sending packets? Let's say Client A sends us data on it's position, we decide that only players within 128px will receive this data. So we send to Clients B,C,D,E, and F. Now with only 6 total clients, I don't see this causing too big of a hiccup as far as latency. A multithreaded server should handle this gracefully. However... Let's say you have 99 people stacked on top of Client A. Would this not bog down the server? And what is the best way of handling this. PLEASE correct me if I'm misunderstanding something following: Under a completely authoritative server model would clients only request to be moved in a specific direction and then the server updates them to all clients on the server? And is this the best way to do it? Of course, I can already see problems if all 100 in a 128px circle move at same time but I don't know how to avoid that. As you can see, I'm not fully grasping what goes into something like WC3 where the lag seems minimal and I'd like to learn more. I'm looking for any general advice on what would be the ideal way to handle this. Not looking for code specific to anything, only want the theory. Although if applicable, please feel free. I just don't want to feel like a code receptical. Thanks everyone, And I hope you all know (Staff and Contributors) how much of a difference you make on the industry. I've been a loyal lurker for many years now. Keep up the good work! Dustin

eedok    982
From what I remember, WC3 is P2P.

But other Blizzard games like WoW or Diablo 2 don't use a single server; rather, they have an array of servers in a datacenter and use various load-balancing methods to keep gameplay as lag-free as they can make it.

The biggest thing missing from your post, though, is that you're assuming state changes get sent to everyone, when in reality they're only sent to players within the same area.

mdwheele    140
Thanks for your input eedok.

Let me clarify a little bit, as it seems I wasn't quite clear enough with the question I asked; I blame this on my grasp of English grammar, and I tend to muddy my speech up quite a bit.

What I'm looking for is a resource to reference, or a typical implementation of a network model, that handles the high traffic that would occur in a multiplayer game.

Think about a first-person shooter. What methods are they using to keep everyone up to date? For instance, how do they handle not only the 16-32 players involved, but also every player's set of active bullets?

There must be a clean way to handle all this data out of a single server application. Maybe I'm completely wrong, but it seems like I've seen independent developers accomplish this without buying multiple machines to balance the load.

hplus0603    11356
I believe WC3 is client/server, and uses input-synchronous networking. You only send the user commands ("go here", "attack there"), not each individual unit's state. For more, check the "1,500 archers" article referenced by the Forum FAQ.
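For illustration only, here is a minimal sketch of what such a command message might look like in C++ (the struct and its fields are hypothetical, not WC3's actual protocol). Only the order crosses the wire; every client applies it to its own deterministic simulation, so per-unit state never needs to be sent:

```cpp
#include <cstdint>

// Hypothetical command message for an input-synchronous (lockstep) RTS.
// The order is broadcast to all peers/clients and executed on an agreed turn.
struct MoveCommand {
    uint32_t issuingPlayer;   // who gave the order
    uint32_t turnNumber;      // simulation turn on which the command executes
    uint16_t unitGroupId;     // selected group, not individual unit states
    int32_t  targetX, targetY;
};
```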

And, yes, in a real MMO, if you have 100 people all in the exact same place, and have a limitation on the amount of bandwidth each client can receive, then there will be some form of lag. Your job is to engineer the system so it hides the lag as well as it can.

If you send position/velocity for each entity, then one good algorithm is to use a priority queue, where each entity is re-inserted at some priority time whenever a message about it is sent. The priority is something like the current time, plus a fixed interval, plus some scale factor times the distance between the player and the entity. Each time you form an update packet, you pop the top X entities off the queue, send their updates, and then put them back in the queue.

This mechanism degrades reasonably nicely to round-robin when there are lots of entities. You will notice that you're "overloaded" when you start popping off entities whose sort time is earlier than the current time, but that's OK, because the update latency will adjust the next time you put the entity in the queue (because you use the time of insertion as the basis).

Note that there is one queue per connected player, so with 100 players, there's 100 queues, each of which contains 100 player entities (and probably a number of NPC entities, too).
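
For concreteness, a minimal C++ sketch of one such per-player queue. sendUpdateFor and distanceToPlayer are hypothetical placeholders, and the priority formula follows the description above (current time + fixed interval + scale × distance):

```cpp
#include <queue>
#include <vector>

// Hypothetical helpers; a real server would append to the outgoing packet
// and query its spatial index, respectively.
void   sendUpdateFor(int entityId);
double distanceToPlayer(int entityId);

struct QueuedEntity {
    double dueTime;   // when this entity should next be sent to this player
    int    entityId;
    bool operator>(const QueuedEntity& o) const { return dueTime > o.dueTime; }
};

// One of these queues exists per connected player.
using UpdateQueue = std::priority_queue<QueuedEntity,
                                        std::vector<QueuedEntity>,
                                        std::greater<QueuedEntity>>;

// Called once per outgoing packet for a single player: pop the X most
// urgent entities, send them, and re-insert them with a new due time.
void buildUpdatePacket(UpdateQueue& q, double now, int maxEntities,
                       double basePeriod, double distanceScale)
{
    for (int i = 0; i < maxEntities && !q.empty(); ++i) {
        QueuedEntity e = q.top();
        q.pop();
        sendUpdateFor(e.entityId);
        // Key the re-insertion on the current time, so if we fell behind
        // (dueTime < now), the update latency simply adjusts on the next pass.
        e.dueTime = now + basePeriod + distanceScale * distanceToPlayer(e.entityId);
        q.push(e);
    }
}
```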

mdwheele    140
Thanks hplus0603.

I actually discussed using a priority queue with one of my friends the other day, and I should have spoken up about that. Our main concern was whether or not that is generally how it's done.

I also found a post made by another user and was wondering, assuming you have the time of course, whether this approach of keeping the server ahead in the simulation would outperform a priority-queue approach. Here's the link:

Source's Networking Implementation

It seems like there's a couple of similarities in how Valve did their networking and how Ensemble describes AoE's network implementation.

More or less I'm just looking for good discussion at this point. I *am* a novice as far as multiplayer networking goes, but these are the types of things that my coworker and I talk about rather than work, so naturally... [smile]

On a completely unrelated note (and this isn't to change the topic of the thread): we recently discussed how to calculate large factorials in C, given that larger factorials would exceed the storage space of unsigned ints. I really love thinking about this type of problem solving; I feel like it adds a lot of experience to my programming career as a whole.

Thanks again.

EDIT: Forgot to mention that it seems the latency problem in a large-scale multiplayer experience is really unavoidable, and it's about finding a method of handling it rather than eliminating it completely; a sort of lesser of two evils. Would I be right in assuming this?

Dustin

hplus0603    11356
Quote:
Would I be right in assuming this?


Only if you believe Einsteinian physics is the final word. With quantum messaging, it might actually be possible to eliminate lag. Just wait another 50 years or so, and they'll have that down pat :-)

Vinterstum    133
Also consider implementing a per-packet-type priority system.

If you're sending a movement update to all players in the vicinity each time a player changes his movement vector, consider letting the server hold off on sending the update for X milliseconds in case a new update comes along in the meantime (at which point, discard the old and send the new). If you're sending updates each time a player loses/gains hitpoints/life/whatever, those can be pretty easily batched together too.

Using a system like that (it can be combined with a priority-queue system), you can ensure that important updates aren't throttled down together with the non-vital ones.

Of course, this depends on how "forgiving" the various clients are of some inaccuracy in client position and whatnot. For MMOs, however, it can work well.

Also, try to send as much data to the same client in one call as you can. Maybe let messages wait a short while in the queue if you know bandwidth is getting cramped; otherwise a lot of it is wasted on TCP/IP headers. It's your own little version of Nagle's algorithm (though applied only when it's needed, instead of simply relying on TCP_NODELAY being off).
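
As a rough illustration of the hold-and-replace idea for movement updates, here's a minimal C++ sketch. MovementUpdate, sendToClient, and the hold time are hypothetical placeholders, not any particular engine's API:

```cpp
#include <cstdint>
#include <unordered_map>

struct MovementUpdate { float x, y, vx, vy; };

// Hypothetical: serialize the update and queue it on that client's connection.
void sendToClient(uint32_t playerId, const MovementUpdate& u);

struct PendingMove {
    MovementUpdate latest;
    double flushAt;          // earliest time this update may be sent
};

class MoveCoalescer {
public:
    explicit MoveCoalescer(double holdSeconds) : hold_(holdSeconds) {}

    // A newer update simply overwrites the pending one; the flush deadline
    // set by the first update is kept, so updates are never delayed forever.
    void queue(uint32_t playerId, const MovementUpdate& u, double now) {
        auto it = pending_.find(playerId);
        if (it == pending_.end())
            pending_[playerId] = { u, now + hold_ };
        else
            it->second.latest = u;
    }

    // Call once per server tick: send whatever is due and drop it from the map.
    void flush(double now) {
        for (auto it = pending_.begin(); it != pending_.end(); ) {
            if (it->second.flushAt <= now) {
                sendToClient(it->first, it->second.latest);
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
    }

private:
    double hold_;
    std::unordered_map<uint32_t, PendingMove> pending_;
};
```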

hplus0603    11356
It might not be obvious, but if you look at the priority queue construction I described above, it actually achieves both of those goals implicitly, through its construction. It queues updates during the priority time interval (so they can get coalesced), and it packs one packet per player per time interval, with as many updates for that player as fit in the bandwidth budget.

david_watt78    133
I have written several high-performance server systems for the financial sector that handled thousands of clients. The answer I found is dedicating 3 threads to each client: one to receive, one to send, and one to process what was sent. If you want to save threads, it is possible to use a pool of threads to process the messages; usually I would use one thread to do distribution and 50 or so to process the messages. However, with multithreading there are several important rules.

Rule 1. If more than one thread shares something that changes, it must be locked.
Rule 2. Avoid acquiring more than one lock at a time like the plague.
Rule 3. If you must acquire multiple locks, always acquire them in the same order.
Rule 4. When accessing multiple objects that each require a lock inside a list that also requires a lock, copy the list and cycle through the copy.
Rule 5. Lock only when you must, and only for as short a time as you need.
Rule 6. Use reference counts on all objects that are used by more than one thread.

The reason for using separate threads for receiving and sending is so that receiving messages is not interrupted. It is critical that you return to recv() as quickly as possible; in fact, I don't do anything in the receive thread but receive buffers and put them in a list. I also use a queue on the sending side and send messages in bulk using a send thread to empty the queue; this also helps prevent choking the receive thread, since on Windows a send will block receives (this is not an issue on Linux). Finally, I use a third thread to process the received buffers into messages. I know this is thread-heavy, but it is the only way I found to receive thousands upon thousands of messages without dropping messages or connections. Also, as a side note, use bidirectional heartbeats rather than a send/respond model.
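
To make the receive/process split concrete, here is a minimal sketch of that pattern, assuming POSIX sockets and a simple mutex-protected queue; the names and buffer sizes are illustrative only, not the production system described above:

```cpp
#include <sys/socket.h>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Buffers flow from the receive thread to the processing thread(s).
struct BufferQueue {
    std::mutex m;
    std::condition_variable cv;
    std::deque<std::vector<uint8_t>> buffers;

    void push(std::vector<uint8_t> buf) {
        { std::lock_guard<std::mutex> lock(m); buffers.push_back(std::move(buf)); }
        cv.notify_one();
    }
    std::vector<uint8_t> pop() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !buffers.empty(); });
        auto buf = std::move(buffers.front());
        buffers.pop_front();
        return buf;
    }
};

// Receive thread: do nothing but recv() and hand the bytes off immediately.
void receiveLoop(int sock, BufferQueue& q) {
    std::vector<uint8_t> buf(64 * 1024);
    for (;;) {
        ssize_t n = recv(sock, buf.data(), buf.size(), 0);
        if (n <= 0) break;                        // connection closed or error
        q.push(std::vector<uint8_t>(buf.begin(), buf.begin() + n));
    }
}

// Processing thread(s): drain the queue and parse buffers into messages.
void processLoop(BufferQueue& q) {
    for (;;) {
        std::vector<uint8_t> buf = q.pop();
        // parseMessages(buf);                    // application-specific
    }
}
```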

hplus0603    11356
Quote:
The answer I found is dedicating 3 threads to each client 1 to receive, 1 to send, and 1 to process what was sent.


If that architecture can serve "thousands" of clients, then that's despite the architecture, not because of it. Each thread has significant kernel-side overhead, as well as a memory cost (the stack for each thread).

You should not have more computing threads than you have available CPU cores. And ideally you should use asynchronous or non-blocking I/O, so you don't need threads just to make blocking interfaces asynchronous (though sometimes you have to). Fifty threads is way too many, unless you're running on a Sun with 64 cores.

You don't necessarily need to return to recv() as soon as possible. The kernel will buffer data that comes in, and you can read it all in one fell swoop. In fact, it's more efficient to read larger chunks of data in one go, because it means fewer kernel trips. If you run out of buffering space in the kernel because of processing times, then you can increase the default buffer space (which isn't particularly large).
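
For example, a minimal sketch of enlarging the kernel's per-socket receive buffer with the standard SO_RCVBUF option (the 1 MB figure is just an illustration):

```cpp
#include <sys/socket.h>

// Ask the kernel for a larger receive buffer so bursts can be absorbed
// between reads. On Linux the request is capped by net.core.rmem_max.
bool enlargeReceiveBuffer(int sock, int bytes /* e.g. 1 << 20 for 1 MB */)
{
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == 0;
}
```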

david_watt78    133
Actually, this is based on performance tests we did on our side to confirm the improvement. The gains of parallelism can outweigh the kernel cost. As for the recv issue: yes, a buffer is used, but it is of limited size, and Winsock will drop some of the contents when the buffer is full and messages keep arriving. Granted, that is in the 20,000-messages-per-second-per-socket range, but it can happen if traffic is high enough.

Xai    1848
There is no need (academically) for more than one receive thread per logical networking device, and one send thread per the same, because you are fundamentally locking/blocking on those devices. It makes more sense to do something like this:

Two-channel networking on the server, so create two receive threads and two send threads. The number of processing threads is determined more by your particular game's design, but for a minute I'll assume you have one thread per autonomous AI-type process, up to a reasonable limit (such as 2x the number of cores on the server). So with 28 autonomous AI / game-engine processes on an 8-core server, some number between about 4 and 16 would be appropriate (it would require empirical tests to determine). You would also most likely want no more data-processing threads than the number of simultaneous threads the server can handle (8 or 16 in our example).

Antheus    2409
Quote:
Original post by david_watt78
Actually, this is based on performance tests we did on our side to confirm the improvement. The gains of parallelism can outweigh the kernel cost. As for the recv issue: yes, a buffer is used, but it is of limited size, and Winsock will drop some of the contents when the buffer is full and messages keep arriving. Granted, that is in the 20,000-messages-per-second-per-socket range, but it can happen if traffic is high enough.


Mind if I ask about the hardware and the actual numbers?

Last time I did stress tests, my AMD3000 single-threaded server could handle 18,000 zlib-compressed UDP packets per second. Without zlib, it would stall at around 20,000 at ~30% CPU due to the network card.

On the dual- and quad-core server machines, handling up to network capacity has not been a problem so far, but I haven't run raw stress tests.

The reason I'm curious is that the 20k pps number is curiously close to the limits I encountered, and those had nothing to do with the actual software.

Spodi    642
Quote:
Original post by david_watt78
that handled thousands of clients


Quote:
Original post by david_watt78
dedicating 3 threads to each client: one to receive, one to send, and one to process what was sent


Quote:
Original post by david_watt78
If you want to save threads, it is possible to use a pool of threads to process the messages; usually I would use one thread to do distribution and 50 or so to process the messages.


I hope I am reading this wrong, but are you saying that, when you didn't use a thread pool, you actually had over 3000 threads? That poor, poor server. ;)

Quote:
Original post by mdwheele
Let's say you have 99 people stacked on top of Client A. Would this not bog down the server? And what is the best way of handling this?


If everyone receiving the update is at the same position, I'm not sure there is a whole lot of good you can do with prioritizing, since everyone is the same distance away. I guess you could prioritize by the user's activity, sending to those who have most recently shown activity first. Assuming that not all 100 players are active, such as in a popular "trade area" in a town in an MMORPG, this would let you get messages to those who need them most first. I don't know if there's any server out there that can efficiently handle 100 active players over the internet running around in circles and swinging their overly huge swords without falling to its knees, though.
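
As a minimal sketch of that ordering idea (the Recipient type and its fields are hypothetical, just to show the tie-break when distance can't prioritize):

```cpp
#include <algorithm>
#include <vector>

struct Recipient {
    int    playerId;
    double lastActivityTime;   // last time this player moved, chatted, fought, ...
};

// Send to the most recently active players first when everyone is
// effectively the same distance from the event.
void orderByActivity(std::vector<Recipient>& recipients) {
    std::sort(recipients.begin(), recipients.end(),
              [](const Recipient& a, const Recipient& b) {
                  return a.lastActivityTime > b.lastActivityTime;
              });
}
```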

david_watt78    133
All the servers used in the tests were running dual processors and Linux. Under Linux we were able to send and receive 100,000 messages/sec; under Windows on a 2 GHz Pentium with Hyper-Threading I got 30,000 per second. As a side note, I actually got less performance using completion ports, which surprised me. As far as thread count was concerned, I was using 4 processing threads. In production it was found that the servers were more responsive to customers with 50, but those tests were conducted by my boss and not me; I had programmed the system so the counts were configurable, and he tweaked those settings to his taste. As to the disparity between Linux and Windows: in Winsock, send blocks recv, and under Linux that is not the case. As far as non-blocking goes, I saw no increase in performance using non-blocking sockets.

Also, as a note on completion ports: they are really nasty when it comes to high volumes, as you can keep sending data until the machine runs out of page-pool memory, the screen goes black, and the computer occasionally doesn't recover. The only fix for it was to call setsockopt to set the send and receive buffers to 0. This works as long as you don't use third-party code of some kind that undoes it. Granted, I haven't used completion ports on the newer Windows OSes, but I am sure it's not much better than in Windows 2000, when I worked with them.

Antheus    2409
Quote:
Under Windows on a 2 GHz Pentium with Hyper-Threading I got 30,000 per second


I find this surprising. Using a boost::asio-based application, the CPU load doesn't go above 30% on a regular Pentium ~1800 (no HT) or so, as long as the network card is capable of handling the load.

Quote:
Also, as a note on completion ports: they are really nasty when it comes to high volumes, as you can keep sending data until the machine runs out of page-pool memory, the screen goes black, and the computer occasionally doesn't recover.


Again, I can't say I've experienced that. The only problem I've had was with NetLimiter, which somehow corrupts socket buffers and causes IOCP-based applications to randomly crash. Apparently that's a known problem, since it's often reported with IOCP-based applications.

Quote:
In production it was found that the servers were more responsive to customers with 50, but those tests were conducted by my boss and not me; I had programmed the system so the counts were configurable, and he tweaked those settings to his taste. As to the disparity between Linux and Windows: in Winsock, send blocks recv, and under Linux that is not the case. As far as non-blocking goes, I saw no increase in performance using non-blocking sockets.


If you use blocking sockets, then more threads naturally improve responsiveness, since your 4 threads were stalling. But threads are not best suited for responsiveness; they work best when you have long-running sequential tasks that you cannot break apart easily. Using an arbitrary number of threads there will ensure that all of them make progress, all without changing the code, but at the high expense of context switching.

If, however, you have short-running tasks (like what you mention, 20k+ pps), then an async/multiplexed/reactor/proactor approach is ideal. It minimizes the number of context switches and allows you to utilize the CPU as much as possible while still degrading gracefully under load.
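
As a minimal illustration of that style (not any of the applications mentioned above), here is a single-threaded select()-based loop over POSIX sockets; a real server would use epoll/kqueue/IOCP or a library like boost::asio instead:

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <vector>

// One thread multiplexes all client sockets: no per-client threads,
// no blocking on any individual connection.
void reactorLoop(const std::vector<int>& sockets) {
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        int maxFd = -1;
        for (int s : sockets) {
            FD_SET(s, &readable);
            if (s > maxFd) maxFd = s;
        }
        // Block until at least one socket has data ready.
        if (select(maxFd + 1, &readable, nullptr, nullptr, nullptr) <= 0)
            continue;
        for (int s : sockets) {
            if (FD_ISSET(s, &readable)) {
                char buf[4096];
                ssize_t n = recv(s, buf, sizeof(buf), 0);
                if (n > 0) {
                    // handleMessage(s, buf, n);   // application-specific
                }
            }
        }
    }
}
```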

