MMO socket strategies
I would like to know if anyone has some insight into exactly how to create a high-performance socket strategy/method for reading and sending data on a lot of sockets.
If we were to use all of the 16-bit port space we could end up with something like 65000+ connections. However, in reality this seems unlikely to me (EVE Online probably has the largest count, at around 20000-30000 concurrent users).
Anyhow, I've asked myself many times what I would do to get good performance if I needed to handle that many sockets (and by socket I usually mean a TCP connection).
You could of course keep it simple and use the Winsock select() function, but that's hardly considered a clever approach.
I'm thinking more along the lines of setting up a concurrent priority queue where every server thread polls connections for available data. The priority would be set according to things like idle time and user activity, as well as some kind of message importance. The idea is basically to ignore a socket which is connected but not sending much data.
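To make the idea concrete, here is a rough sketch of such a scheduler. It is only an illustration: the class name `PollScheduler`, the doubling back-off rule, and the delay values are all assumptions, and it is single-threaded (sharing it between server threads would need a lock around the heap).

```python
import heapq

class PollScheduler:
    """Orders connections by the time at which they should next be polled."""

    def __init__(self, min_delay=0.01, max_delay=1.0):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._heap = []    # entries are (next_poll_time, conn_id)
        self._delay = {}   # conn_id -> current polling delay

    def add(self, conn_id, now):
        self._delay[conn_id] = self.min_delay
        heapq.heappush(self._heap, (now + self.min_delay, conn_id))

    def due(self, now):
        """Pop every connection whose poll time has arrived."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[1])
        return ready

    def reschedule(self, conn_id, now, had_data):
        # Active connections go back to the shortest delay;
        # idle ones back off by doubling, up to max_delay.
        if had_data:
            delay = self.min_delay
        else:
            delay = min(self._delay[conn_id] * 2, self.max_delay)
        self._delay[conn_id] = delay
        heapq.heappush(self._heap, (now + delay, conn_id))
```

A server thread would call `due(now)`, try a non-blocking read on each returned connection, and then call `reschedule` with whether any data arrived.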
So what say you, any takers?
For many connections you're looking at OS-specific code, for instance IO Completion Ports under Windows. select() is limited to 64 sockets per call in Win32 by default (FD_SETSIZE), so you'd be looking at over 450 threads to handle 30,000 sockets, which just isn't feasible.
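For comparison, this is roughly what a readiness-based loop looks like when one thread watches many sockets. It is a sketch only: it uses Python's `selectors` module, which picks epoll or kqueue where available and is not IOCP, and the helper name `serve_once` is made up for the example.

```python
import selectors
import socket

def serve_once(sel, timeout=1.0):
    """Wait for readiness events and read from every socket that has data.

    Returns a dict mapping the label each socket was registered with
    to the bytes that were read from it.
    """
    received = {}
    for key, _mask in sel.select(timeout):
        sock = key.fileobj
        data = sock.recv(4096)
        if data:
            received[key.data] = data
        else:
            # An empty read means the peer closed the connection.
            sel.unregister(sock)
            sock.close()
    return received
```

Each socket is registered once with `sel.register(sock, selectors.EVENT_READ, data=label)`, and the loop only touches sockets the OS reports as readable, so idle connections cost nothing per iteration.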
I'd say the realistic count is around 500 connections per machine.
With IOCP it's possible to push into thousands for very low traffic connections.
But the problem is somewhere else - the amount of data. Even if you do get thousands of users onto a single box, how will you update them? Your network bandwidth will limit you far before you exhaust the CPU.
Even from the logic side, a single machine can only support hundreds of users, and almost all games today have realistic loads at 100-200.
TCP is probably suboptimal here. You'll need very tight control over data sent to manage lots of users, so manually controlling the protocol over UDP is usually preferred.
As a point of reference - I can max out a 100Mbit LAN connection with 800 users while using only 30% of a single CPU core.
What you need to figure out is a clustering strategy, and that is far from easy. Eve ran into plenty of problems with that, some of which required a radical hardware approach.
The only case where a single box and single network stack handles upwards of 100k users (even 300-400k) is the eDonkey server, which is an insanely difficult hack that requires custom kernel patches and very advanced threading solutions, along with very heavy assembly optimization. And the data there is sent to clients about once a minute in the form of a small UDP packet.
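To put that figure in per-user terms, a quick back-of-the-envelope calculation. The 10 updates per second is an assumed tick rate for illustration, not something measured, and the link is assumed to be shared evenly.

```python
def per_user_budget(link_mbit, users, updates_per_second):
    """Return (kbit/s per user, bytes per update) for an evenly shared link."""
    kbit_per_user = link_mbit * 1000 / users
    bytes_per_second = kbit_per_user * 1000 / 8
    return kbit_per_user, bytes_per_second / updates_per_second

# 100 Mbit shared by 800 users, assuming 10 updates per second each.
kbit, per_update = per_user_budget(100, 800, updates_per_second=10)
print(kbit, per_update)  # 125.0 1562.5
```

So a saturated 100Mbit link gives each of 800 users about 125 kbit/s, or roughly 1.5 KB per update at the assumed rate, before any protocol overhead.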
Thank you for a very enlightening post Antheus.
To me EVE appears to try to solve its issues by just adding more, and more powerful, hardware. Maybe that's true as well.
800 users on 100MBit averages out to about 0.125 MBit per user. That's still a significant amount of bandwidth, and with optimization I'm pretty sure you could get a lot done with 1GBit of bandwidth, since the peak at any given time should not be that high anyway.
Could you tell me a bit more about clustering strategies? What are they? I'm not familiar with the term.
I pretty much thought that UDP was deprecated and not used anymore. What's your opinion on the matter?
When I think about the number of sockets I try to compare it with World of Warcraft. I did some reverse engineering on the matter, and the concurrent user count on heavily populated realms averaged about 3000-4000 users. That's my aim as well, or at least what I would like to be able to accomplish.
IIRC, Eve's problem was database retrieval speed, as it had basically a single database server in its (very) large cluster, and it was getting thrashed. They went for a solid-state (RAM) drive in the end.
Any indie game isn't going to need that, certainly not in early operation.
Most high connection volume solutions use some form of clustering. This basically means that the machines/processes that comprise your server communicate on at least two levels: Firstly and separately to their own clients, and secondly to each other. Perhaps the bulk of them are 'gateway' machines that eventually communicate with a core game server, or perhaps each node controls a zone in the game world. In either case, the purpose of the cluster is to divide the work of at least communication with thousands of clients and / or the persistent simulation of the game.
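A toy sketch of the gateway variant, just to show the shape of it. The `Gateway` class, the zone names, and the node addresses are all invented for the example; a real gateway would forward over a socket rather than return a tuple.

```python
class Gateway:
    """Front-end process: owns client connections, forwards to zone nodes."""

    def __init__(self, zone_to_node):
        self.zone_to_node = dict(zone_to_node)  # zone name -> node address
        self.player_zone = {}                   # player id -> zone name

    def enter_zone(self, player, zone):
        if zone not in self.zone_to_node:
            raise KeyError(f"unknown zone: {zone}")
        self.player_zone[player] = zone

    def route(self, player, message):
        """Return (node address, message) for the node that owns the player's zone."""
        zone = self.player_zone[player]
        return self.zone_to_node[zone], message
```

The client only ever talks to the gateway, so a player can move between zone nodes without reconnecting.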
Edit:
Regarding UDP/TCP. UDP basically means you can receive from multiple clients on the same socket and port, gives you more of a fire-and-forget networking feel, and requires some work to implement reliable (verified) communication. TCP requires a socket per stream, plus the listener, as you should know. It does guarantee sequential delivery, so it's useful for state management, as you can assume a newer state message is more valid.
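With UDP you have to supply that "newer is more valid" rule yourself. A minimal sketch, assuming a 4-byte sequence number at the front of every datagram (the header layout and the names here are made up, and sequence wraparound is ignored):

```python
import struct

HEADER = struct.Struct("!I")  # 32-bit sequence number, network byte order

def pack_state(seq, payload):
    """Prefix a state payload with its sequence number."""
    return HEADER.pack(seq) + payload

class LatestStateReceiver:
    """Keeps only the newest state per sender; drops stale or duplicate datagrams."""

    def __init__(self):
        self.last_seq = {}  # sender address -> highest sequence number seen

    def accept(self, addr, datagram):
        (seq,) = HEADER.unpack_from(datagram)
        if seq <= self.last_seq.get(addr, -1):
            return None  # arrived late or twice, ignore it
        self.last_seq[addr] = seq
        return datagram[HEADER.size:]
```

This only handles ordering. Messages that must arrive (chat, trades) would still need acknowledgements and resends on top.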
Regarding threads. These aren't evil, but they are dangerous if you can't hold it in your head that each thread you have can be messing with data, and that all threads of a given process use the SAME MEMORY SPACE. I recommend an event subscription scheme of some sort for dealing with them. For example, my networking threads represent various services and service users. You could think of them as guys on the phones. When they receive a message, they drop it in a relevant queue and fire an event, which my main game threads can respond to, getting the message from the queue and handling it. Of course, you need to code the subscriber pattern in a thread-safe (mutexed) way.
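A stripped-down sketch of that hand-off, assuming one network thread and one game thread. The function names are illustrative; `queue.Queue` does its own locking, and a blocking `get()` plays the role of the event.

```python
import queue
import threading

def network_worker(inbox, incoming):
    """Stands in for a socket reader: drops each message in the shared queue."""
    for message in incoming:
        inbox.put(message)
    inbox.put(None)  # sentinel: no more messages

def game_worker(inbox, handled):
    """Wakes up whenever a message is available and handles it."""
    while True:
        message = inbox.get()  # blocks until the network thread puts something
        if message is None:
            break
        handled.append(f"handled {message}")

def run_demo(incoming):
    inbox = queue.Queue()
    handled = []
    threads = [
        threading.Thread(target=network_worker, args=(inbox, incoming)),
        threading.Thread(target=game_worker, args=(inbox, handled)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

The two threads never touch each other's data directly; the queue is the only shared object.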
Quote:When I think about the number of sockets I try to compare it with World of Warcraft. I did some reverse engineering on the matter, and the concurrent user count on heavily populated realms averaged about 3000-4000 users. That's my aim as well, or at least what I would like to be able to accomplish.
A single realm runs on dozens of blades or slices. I don't know exactly how they do the client routing (front-end connection servers, or direct zone connection), but it's not even remotely a single machine.
Eve once listed hardware they use. It was tens of multi-core server-grade machines.
Clustering is a way of running code on multiple physical machines. It's the only way to overcome the sizes of these games. See CORBA, ZeroC Ice, distributed computing.
Just the state of the game (players, mobs, items, ....) will be gigabytes or tens of gigabytes in size.
Quote:800 users on 100MBit averages out to about 0.125 MBit per user. That's still a significant amount of bandwidth
No, it's not.
Because there is no average user. You have players logging in, logging out, switching zones, needing 100k-5Mb updates to synchronize the world. This will quickly saturate your connection with just 80 users. And when that happens, unless you know exactly how to handle such spikes, 1 user will log in, and 799 will lag for 3 seconds. Then consider that users log in or out every 5 seconds on average.
On top of that, consider what happens when 800 players move into a zone: each world update needs to be sent 800 times.
The problem isn't raw bandwidth - that one just sets the upper limit. But most of the world management cost grows much faster than linearly as more users get into the same area (roughly quadratically, since each of N players needs updates about the other N-1), and that effectively limits you to hundreds of users per zone.
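A quick way to see the growth, under the simplifying assumption that every player in a zone must be told about every other player each tick (no filtering at all):

```python
def updates_per_tick(players_in_zone):
    """Messages per tick if each player's state goes to every other player."""
    return players_in_zone * (players_in_zone - 1)

for n in (10, 100, 800):
    print(n, updates_per_tick(n))
# 10 90
# 100 9900
# 800 639200
```

Going from 100 to 800 players is 8 times the population but roughly 65 times the traffic.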
Quote:I pretty much thought that UDP was deprecated, and not used anymore, what's your opinion on the matter?
UDP is the protocol of choice. And UDP can't be deprecated, since it's just a thin transport layer on top of IP. For heavy loads it's still better than TCP, since it allows full control over packet loss and bandwidth.
Quote:When I think about the number of sockets I try to compare it with World of Warcraft. I did some reverse engineering on the matter, and the concurrent user count on heavily populated realms averaged about 3000-4000 users.
For a WoW-style game, running that many on a single machine is likely very, very hard to achieve. The reason is not raw CPU or network performance; the amount of optimization you'd need to do would make the code unmaintainable, since everything would have to be so well tuned.
MMORPGs today are typically scaled for 100-200 users per machine, and even then it's easy to lag such servers. Commercial servers need to do much more than just run the combat. Logging, validation, security, synchronization, all those really slow things down, yet they are the only way to avoid common problems, bugs, exploits, deliberate attacks, etc.
Quote:If we were to use all of the 16-bit port space we could end up with something like 65000+ connections.
Actually, you can get more than that, because a TCP connection is identified by the quadruplet (local IP, local port, remote IP, remote port).
So, you can have 65000 connections to a single remote host. And then 65000 connections to the next remote host. Keep going :-) Realistically, you'll run out of kernel resources way before then.
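To illustrate, here is a small table keyed the same way. The addresses and ports are placeholder values, and `ConnKey` is just a name for the example:

```python
from collections import namedtuple

ConnKey = namedtuple("ConnKey", "local_ip local_port remote_ip remote_port")

def connection_table(server_ip, server_port, clients):
    """Build a table keyed by the full 4-tuple for a list of (ip, port) clients."""
    return {ConnKey(server_ip, server_port, ip, port): f"{ip}:{port}"
            for ip, port in clients}

# Three clients, all connected to the same local port 7000.
table = connection_table("10.0.0.1", 7000, [
    ("192.0.2.10", 50000),
    ("192.0.2.10", 50001),   # same remote host, different remote port
    ("192.0.2.11", 50000),   # different remote host, same remote port
])
print(len(table))  # 3
```

All three entries share one local port, yet each is a distinct connection because the remote half of the key differs.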
However, no real game puts 20000 users on a single machine. The machine simply won't keep up with that (unless you're doing chat only). Instead, you have to find a way to spread users over multiple machines. Such an arrangement, where multiple machines provide one logical service, is often known as a cluster.
Well, I never really suggested that a single machine should handle such capacity, but I did not guess that the amount of processing power on the other end would have to be that immense. I'm also still a bit sceptical about how little work you seem to be able to do on a single machine. It seems to me that it has more and more to do with the games themselves: their complexity and the mechanisms chosen.
As far as deprecated UDP goes, I simply meant that it's not a preferable choice. I've been talking to some of the teachers at my university and they say that the Internet today is rather well built, and retransmissions generally do not occur. Thus UDP gives you little in performance. However, TCP pays a round-trip time (RTT) cost that UDP does not, so if the Internet really were that well built, you could gain a lot in response time by relying on the Internet and UDP. I guess you shouldn't really rule that out completely.
I think it's also important to point out that I do not fully intend to build such an MMO game. I am only interested in devising a plan or strategy for managing very large networks. I am also interested in delivering a lot of performance through clever implementations and optimizations.
Then there is that bandwidth issue again. I have to base it on my experience from WoW, where data is packed into a cache which is flushed as a compressed packet every now and then. Not only is the internal cache structure very compact, it is also compressed, and these packets handle state changes.
I have done some profiling on the matter, and WoW averages about 0.34 Kb/s, which should be playable (maybe not enjoyable) on a 5.6k modem.
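The cache-then-flush idea looks roughly like this. To be clear, this is not WoW's actual format: the `UpdateBatcher` class is invented for the example, and JSON is used only so the result is readable (a real implementation would use a compact binary layout).

```python
import json
import zlib

class UpdateBatcher:
    """Collects state changes and flushes them as one compressed packet."""

    def __init__(self):
        self.pending = {}  # entity id -> {field: latest value}

    def record(self, entity_id, field, value):
        # A later change to the same field overwrites the earlier one,
        # so only the final state is sent.
        self.pending.setdefault(entity_id, {})[field] = value

    def flush(self):
        raw = json.dumps(self.pending, sort_keys=True).encode("utf-8")
        self.pending = {}
        return zlib.compress(raw)

def decode(packet):
    """Inverse of UpdateBatcher.flush(), as the client would run it."""
    return json.loads(zlib.decompress(packet).decode("utf-8"))
```

Two savings stack up here: intermediate values never leave the server, and whatever is left is compressed once per flush instead of per change.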
Quote:I think it's also important to point out that I do not fully intend to build such an MMO game. I am only interested in devising a plan or strategy for managing very large networks. I am also interested in delivering a lot of performance through clever implementations and optimizations.
Scalability is almost unrelated to raw performance. Consider that dual-core processors run at lower declared clock speeds, yet make it possible to achieve higher performance. If, and only if, the application makes use of them.
The reason why these "low" numbers appear has to do with MMO servers being essentially asynchronous applications with 500 processes (players).
That is a completely different beast from the usual sequential, single-threaded application design.
The networking part is the least of your problems (it's still important due to the extremely varying quality of players' connections), but it's not where the bottlenecks lie.
The numbers come from the real world. They aren't the product of "HTML and XML programmers", but are based on proven techniques.
Before developing anything, let alone design, you'll need to look into clustering, asynchronous programming and concurrent execution.
Yes, it's possible to develop a single-core server that will support thousands of users. But the amount of optimization and tweaking needed to achieve that is so prohibitively expensive that no company can afford it. MMO servers are live beasts, which need to leverage logging, security, data access, insanely large game states (too large to fit into any amount of memory you can provide, let alone 4 gigabytes), and need to stand the test of time for 5+ years.
Another reason why it's very undesirable to run everything on a single machine is reliability. If you run your game on a cluster of 10 and one fails, you just replace the hardware while the game runs on. If you're running on a single machine, a simple DoS attack or a simple power failure will give you full downtime.
Hardware is cheap, so is bandwidth, but development time isn't.
Writing the network code to handle 500 clients is easy, you can usually just use tutorial code, and it will take you under a week.
Writing code to support 3000 users will take a month, and another month of live stress testing with real players.
First case costs 1 developer's salary for a week + cost of 5 servers.
Second case costs 3-6 man months (deploying everything isn't cheap, and setting up live stress tests takes a lot of effort).
It's not economically viable.
And lastly, nobody, and I mean nobody, cares about network code. Players definitely don't. The only thing that matters when developing an MMO is content. Whether it then runs on a lean, mean single machine or on a bunch of "poorly" optimized mediocre machines is something nobody will ever know or care about.
But go ahead and try it. Just make sure to actually solve MMO-related problems, and write the network code so that it fits the entire architecture of an MMO. Client-side networking is only a tiny piece of the puzzle.
Quote:I have done some profiling on the matter, and WoW averages about 0.34 Kb/s, which should be playable (maybe not enjoyable) on a 5.6k modem.
This number has nothing to do with network code. It's the direct product of spatial algorithms which pre-process the data, so that only a tiny fraction of tens of gigabytes of data gets sent to client.
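One common family of such spatial algorithms is grid-based interest management. A minimal sketch, assuming a flat 2D world and a fixed cell size (the `InterestGrid` class and its parameters are illustrative, not taken from any particular game):

```python
from collections import defaultdict

class InterestGrid:
    """Buckets entities into square cells; a client only hears about nearby cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> entity ids in that cell
        self.where = {}                # entity id -> (cx, cy)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def move(self, entity, x, y):
        new = self._cell(x, y)
        old = self.where.get(entity)
        if old == new:
            return
        if old is not None:
            self.cells[old].discard(entity)
        self.cells[new].add(entity)
        self.where[entity] = new

    def visible_to(self, entity):
        """Entities in the 3x3 block of cells around the given entity."""
        cx, cy = self.where[entity]
        seen = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                seen |= self.cells.get((cx + dx, cy + dy), set())
        seen.discard(entity)
        return seen
```

Updates about an entity are then sent only to the players returned by `visible_to`, so the per-client traffic depends on local crowding rather than on the size of the world.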