How does the client/server model handle high traffic?

Actually, this is based on performance tests we did on our side to confirm the improved performance. The gains from parallelism can outweigh the kernel cost. As for the recv issue: yes, a buffer is used, but it is of limited size, and Winsock will drop some of the contents when the buffer is full and messages keep arriving. Granted, that is in the 20,000 messages per second per socket range, but it can happen if traffic is high enough.
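One common mitigation (not something from those tests, just an illustration) is to ask the kernel for a larger receive buffer so bursts are less likely to overflow it. A minimal sketch assuming plain Winsock; the 4 MB figure and the enlarge_recv_buffer helper name are made up:

```cpp
// Sketch: enlarging a UDP socket's kernel receive buffer so fewer datagrams
// are dropped during bursts. The 4 MB request is an illustrative value only.
#include <winsock2.h>

bool enlarge_recv_buffer(SOCKET s)
{
    int requested = 4 * 1024 * 1024;   // requested buffer size in bytes
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                   reinterpret_cast<const char*>(&requested),
                   sizeof(requested)) == SOCKET_ERROR)
        return false;

    // The stack may grant less than requested, so read back the actual size.
    int granted = 0;
    int len = sizeof(granted);
    getsockopt(s, SOL_SOCKET, SO_RCVBUF,
               reinterpret_cast<char*>(&granted), &len);
    return granted >= requested;
}
```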
There is no need (academically) for more than 1 receive thread per logical networking device, and 1 sending thread for the same, because you are fundamentally locking / blocking on those devices. It makes more sense to do something like:

2-channel networking on the server, so create 2 receive and 2 send threads. Processing threads are determined more by your particular game's design, but for a minute I'll assume you have 1 thread per autonomous AI-type process, up to a reasonable limit (such as 2x the number of cores on the server). So with 28 autonomous AI / game engine processes on an 8-core server, some number between about 4 and 16 would be appropriate (it would take empirical tests to determine). You would also most likely want the number of data-processing threads to be no greater than the number of simultaneous threads the server can handle (8 or 16 in our example).
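A minimal sketch of that sizing rule, deriving the bounds from the reported core count; the ThreadBudget struct and the lower-bound formula are my own illustration, not a rule from the post above:

```cpp
// Sketch: bounds for the processing-thread count, per the rule of thumb
// above (start near the core count, never exceed ~2x the cores).
#include <algorithm>
#include <thread>

struct ThreadBudget { unsigned lower, upper, starting_point; };

ThreadBudget pick_processing_threads()
{
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0)      // the runtime may not be able to report it
        cores = 8;       // fall back to the 8-core example above

    ThreadBudget b;
    b.lower = std::max(4u, cores / 2);  // "some number between about 4..."
    b.upper = cores * 2;                // "...and 2x the number of cores"
    b.starting_point = cores;           // tune empirically from here
    return b;
}
```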
Quote:Original post by david_watt78
Actually, this is based on performance tests we did on our side to confirm the improved performance. The gains from parallelism can outweigh the kernel cost. As for the recv issue: yes, a buffer is used, but it is of limited size, and Winsock will drop some of the contents when the buffer is full and messages keep arriving. Granted, that is in the 20,000 messages per second per socket range, but it can happen if traffic is high enough.


Mind if I ask about the hardware and the actual numbers?

Last time I did stress tests, my AMD3000 single-threaded server could handle 18,000 zlib-compressed UDP packets per second. Without zlib, it would stall at around 20,000 at ~30% CPU due to the network card.

On the dual- and quad-core server machines, handling up to network capacity has so far not been a problem, but I haven't run raw stress tests.

The reason I'm curious is that the 20,000-per-second number is curiously close to the limits I encountered, and those had nothing to do with the actual software.
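For reference, the zlib work per packet in those tests is on the order of a single compress call. A rough sketch of that per-packet step; the compress_packet helper and buffer handling are illustrative, not the actual test code:

```cpp
// Sketch: compressing a UDP payload with zlib before sending, roughly the
// kind of per-packet work included in the 18,000 packets/sec figure above.
#include <vector>
#include <zlib.h>

bool compress_packet(const std::vector<unsigned char>& payload,
                     std::vector<unsigned char>& out)
{
    uLongf outLen = compressBound(static_cast<uLong>(payload.size()));
    out.resize(outLen);
    int rc = compress2(out.data(), &outLen,
                       payload.data(), static_cast<uLong>(payload.size()),
                       Z_DEFAULT_COMPRESSION);
    if (rc != Z_OK)
        return false;
    out.resize(outLen);   // shrink to the actual compressed size
    return true;
}
```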

Quote:Original post by david_watt78
that handled thousands of clients


Quote:Original post by david_watt78
dedicating 3 threads to each client: 1 to receive, 1 to send, and 1 to process what was sent.


Quote:Original post by david_watt78
If you want to save threads it is possible to use a pool of threads to process the messages; usually I would use a thread to do distribution and 50 or so to process the messages.


I hope I am reading this wrong, but are you saying that, when you didn't use a thread pool, you actually had over 3000 threads? That poor, poor server. ;)

Quote:Original post by mdwheele
Let's say you have 99 people stacked on top of Client A. Would this not bog down the server? And what is the best way of handling this?


If everyone receiving the update is at the same position, I'm not sure there is a whole lot of good you can do with prioritizing, since everyone is the same distance away. I guess you could prioritize by the user's activity, sending first to those who have most recently shown themselves to be active. Assuming that not all 100 players are active, such as in a popular "trade area" in a town in an MMORPG, this would let you get messages to those who need them the most first. I don't know if there's any server out there that can efficiently handle 100 active players over the internet running around in circles and swinging their overly huge swords without falling to its knees, though.
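A minimal sketch of that activity-based ordering; the Recipient struct and field names are invented for illustration, not part of any particular engine:

```cpp
// Sketch: recipients who acted most recently get their state updates queued
// first; idle players at the back can be throttled or dropped under load.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Recipient
{
    int      clientId;
    uint64_t lastActivityMs;   // timestamp of the player's last input
};

void order_by_activity(std::vector<Recipient>& recipients)
{
    std::sort(recipients.begin(), recipients.end(),
              [](const Recipient& a, const Recipient& b)
              { return a.lastActivityMs > b.lastActivityMs; });
    // Send updates front-to-back from this ordering.
}
```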
NetGore - Open source multiplayer RPG engine
All the servers used in the tests were running dual processors and Linux. Under Linux we were able to send and receive 100,000 messages/sec; under Windows on a 2 GHz Pentium with hyperthreading I got 30,000 per second. As a side note, I actually got less performance using completion ports, which surprised me.

As far as thread count was concerned, I was using 4 processing threads. In production it was found that the servers were more responsive to customers with 50, but those tests were conducted by my boss and not me. I had programmed the system so the counts were configurable, and he tweaked those settings to his taste. As to the disparity between Linux and Windows: that is because in Winsock send blocks recv, while under Linux that is not the case. As far as non-blocking goes, I saw no increase in performance using non-blocking sockets.

Also, as a note on completion ports, they are really nasty when it comes to high volumes: you can keep sending data until the machine runs out of page pool memory, the screen goes black, and the computer occasionally doesn't recover. The only fix for it was to call setsockopt to set the send and recv buffers to 0. This works as long as you don't use third-party code of some kind that undoes it. Granted, I haven't used completion ports on the newer Windows OSes, but I am sure it's still not much better than in Windows 2000 when I worked with them.
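For completeness, the setsockopt fix described above amounts to something like this; the helper name is illustrative:

```cpp
// Sketch: setting the socket's internal send/receive buffers to 0 so
// overlapped (IOCP) I/O moves directly to/from the application's buffers
// instead of piling up pending data in kernel memory.
#include <winsock2.h>

bool disable_socket_buffers(SOCKET s)
{
    int zero = 0;
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                   reinterpret_cast<const char*>(&zero), sizeof(zero)) == SOCKET_ERROR)
        return false;
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                   reinterpret_cast<const char*>(&zero), sizeof(zero)) == SOCKET_ERROR)
        return false;
    return true;
}
```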
Quote:Under Windows on a 2 GHz Pentium with hyperthreading I got 30,000 per second


I find this surprising. With a boost::asio-based application, the CPU load doesn't go above 30% on a regular Pentium ~1800 (no HT) or so, as long as the network card is capable of handling the load.

Quote:Also, as a note on completion ports, they are really nasty when it comes to high volumes: you can keep sending data until the machine runs out of page pool memory, the screen goes black, and the computer occasionally doesn't recover


Again, I can't say I have experienced that. The only problem I've had was with NetLimiter, which somehow corrupts socket buffers and causes IOCP-based applications to randomly crash. Apparently that's a known problem, since it's reported often with IOCP-based applications.

Quote:In production it was found that the servers were more responsive to customers with 50, but those tests were conducted by my boss and not me. I had programmed the system so the counts were configurable, and he tweaked those settings to his taste. As to the disparity between Linux and Windows: that is because in Winsock send blocks recv, while under Linux that is not the case. As far as non-blocking goes, I saw no increase in performance using non-blocking sockets.


If you use blocking sockets, then more threads naturally improve responsiveness, since your 4 threads were stalling. Threads are not best suited for responsiveness, though. They work best when you have a long-running sequential task that you cannot break apart easily. Using an arbitrary number of threads there will ensure that all tasks make progress, all without changing the code, but at the high expense of context switching.
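To illustrate the blocking case, here is a toy worker pool where each thread pulls one message and may block inside processing, so adding threads keeps the other messages moving (at the cost of more context switches). The class and function names are made up for the example:

```cpp
// Toy blocking worker pool: each thread takes one message at a time and may
// block while processing it. No shutdown path is shown; a real pool would
// keep thread handles, signal shutdown, and join its workers.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class MessagePool
{
public:
    explicit MessagePool(unsigned threads)
    {
        for (unsigned i = 0; i < threads; ++i)
        {
            // Detached for brevity; the pool must outlive its workers.
            std::thread([this] { run(); }).detach();
        }
    }

    void push(std::string msg)
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(msg));
        }
        cv_.notify_one();
    }

private:
    void run()
    {
        for (;;)
        {
            std::string msg;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                msg = std::move(queue_.front());
                queue_.pop();
            }
            process(msg);  // may block on a database, disk, or a blocking send
        }
    }

    void process(const std::string&) { /* game-specific work */ }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
};
```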

If, however, you have short-running tasks (like what you mention, 20k+ packets per second), then an async/multiplexed/reactor/proactor approach is ideal. It minimizes the number of context switches and allows you to utilize the CPU as much as possible while still degrading gracefully under load.
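A minimal sketch of that proactor style with Boost.Asio (a current release, using io_context): one re-armed asynchronous receive services all datagrams on however many threads call run(). The port number and handler body are placeholders:

```cpp
// Sketch: single re-armed async UDP receive; no per-client threads.
#include <boost/asio.hpp>
#include <array>
#include <cstddef>

using boost::asio::ip::udp;

class UdpServer
{
public:
    UdpServer(boost::asio::io_context& io, unsigned short port)
        : socket_(io, udp::endpoint(udp::v4(), port))
    {
        start_receive();
    }

private:
    void start_receive()
    {
        socket_.async_receive_from(
            boost::asio::buffer(buf_), remote_,
            [this](const boost::system::error_code& ec, std::size_t bytes)
            {
                if (!ec)
                    handle_packet(bytes);  // decode / dispatch the datagram
                start_receive();           // re-arm for the next datagram
            });
    }

    void handle_packet(std::size_t /*bytes*/) { /* game-specific work */ }

    udp::socket socket_;
    udp::endpoint remote_;
    std::array<char, 1500> buf_;  // one MTU-sized datagram at a time
};

int main()
{
    boost::asio::io_context io;
    UdpServer server(io, 40000);  // port 40000 is just an example
    io.run();                     // the proactor loop runs on this thread
}
```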

