Many-core MMO server

Started by
16 comments, last by hplus0603 16 years ago
Hi, I'm currently trying to design / think through an architecture for game servers that will be able to utilize the processing power of a high number of cores. Intel's Nehalem architecture will feature up to 8 cores per chip with 2 threads per core. With four of those in one server a programmer will have access to 64 threads (32 cores).

Basically two designs come to mind:

a) Big seamless world: The world is (possibly dynamically) partitioned into regions handled by different physical servers. The servers communicate with each other and transfer entities between each other as needed. From the player's point of view it looks like one big server without zoning.

b) One server per zone: One zone is handled by one server. If a player wants to switch zones he gets a loading screen and his client connects to another server.

Now two scenarios:

- Rising number of players, but in distinct areas (i.e. in the same zone but not interacting with each other): a) works fine. If one server gets overloaded, readjust the borders or add new servers. b) Distributing non-interacting players to different threads is doable. The server will reach a ceiling at some point, but it is able to handle many more players than one server in a), since all objects are local and the server doesn't have any communication overhead. (A seamless server would have to use some sort of actor model to handle actions on remote objects, which is much more expensive.)

- Rising number of players, in one small area (e.g. a big event, a town raid...): a) Since this server is less efficient than b), it will max out earlier. The question is whether adding more servers to the scene (e.g. setting a zone border in the middle of the battlefield) will do any good. You get more processing power, but communication overhead also increases (a lot!). b) Theoretically this would work better than a), IF it's possible to efficiently distribute the workload to 64 threads.

Note: The idea here is to build a server that can handle a lot of players in one place. A "seamless" server architecture means seamless within a zone. This can of course be one giant zone (which imposes other problems, like a higher crash probability and trouble if one server gets overloaded, etc.). On top of that, other systems can be layered (like instancing or shards).

Note 2: Some tasks can be offloaded quite easily (authentication & login, chat, possibly even all client communication through proxy servers); this is about handling the main simulation.

Note 3: This is not tied to a specific game design or project. I'm just experimenting a bit. Does that make any sense? :)

- Is one heavily threaded server more efficient for a big fight than a cluster of servers?
- Is a server with 32 cores feasible for such a task, or will other bottlenecks limit this (memory bandwidth / usage, cache issues, network bandwidth)?
- How would one program such a server?
- Does it even make sense to build a 32-core server, or is it more likely that we'll get one or two chips and a bunch of Cell-like co-processors? How would a game server map onto those?

Thanks for any insights you guys have on this subject :)
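For what it's worth, here is a rough sketch of the region-ownership test that design a) implies; the fixed grid, the RegionTable/Entity types, and needsHandoff are hypothetical names made up for illustration, and a real implementation would support dynamically adjustable borders.

// Hypothetical sketch of design a): a region table that maps world positions
// to the server that owns them, plus a check for when an entity must be
// handed off to a neighbouring server. Assumes non-negative world coordinates.
#include <cstdint>
#include <cmath>

struct RegionTable {
    float cellSize;      // width/height of one region in world units
    int   gridWidth;     // number of regions along the X axis

    // Which server simulates the region containing (x, y)?
    // Here: a fixed grid; a real system might rebalance borders at runtime.
    uint32_t ownerOf(float x, float y) const {
        int cx = static_cast<int>(std::floor(x / cellSize));
        int cy = static_cast<int>(std::floor(y / cellSize));
        return static_cast<uint32_t>(cy * gridWidth + cx);
    }
};

struct Entity {
    uint32_t id;
    float    x, y;
    uint32_t homeServer; // server currently simulating this entity
};

// Called each tick after movement: returns true if the entity crossed a
// region border and should be serialized and migrated to another server.
bool needsHandoff(const Entity& e, const RegionTable& table, uint32_t* outServer) {
    uint32_t owner = table.ownerOf(e.x, e.y);
    if (owner != e.homeServer) {
        *outServer = owner;
        return true;
    }
    return false;
}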
I'm not sure MMO servers are designed with specific hardware in mind (but maybe, I don't know).

I can't imagine the servers not communicating with one another. I just plan to have the server applications open connections to either adjacent zones or all zones.

The basic idea, from what I've been able to gather over time, is that you want to create a zoneManager application and a loginManager application. The login runs on one server (it doesn't need any more power) and the zoneManager applications work on the zones they are given. The login server manages the server nodes. Okay, say you have 9 zones and 4 servers. A zoneManager application can run many zones inside of it.

Okay, so that's fine; you have something like:

server 1: loginManager server
server 2: zoneManager server, managing zones 1, 2, 3
server 3: zoneManager server, managing zones 4, 5
server 4: zoneManager server, managing zones 6, 7
server 5: zoneManager server, managing zones 8, 9

But oh no, server 2 is hitting its CPU capacity, and it detects that zone 1 is at fault and requires much more processing. The login server is constantly querying this data from the server cluster. It finds that servers 4 and 5 are almost empty and tells them to spawn new zones, loading in the required information (like level data) for zones 2 and 3 respectively. So server 4 has an empty instance of zone 2 and server 5 has an empty instance of zone 3. All players in zones 2 and 3 are told to open a new connection to server 4 or 5, which the login server has told those servers to expect. So all the players in zones 2 and 3 are now connected to two servers. In what might be a second of lag, zones 2 and 3 stop their simulation and come to a complete halt, serializing everything over to servers 4 and 5, which effectively copies the world. The players are told to make the change and drop their connection to server 2, switching to their new socket on either server 4 or 5. Server 2 destroys zones 2 and 3 and begins churning the data only for zone 1, and the game balances. The loginManager is constantly looking at these situations and deciding when these server rebalances need to occur.
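A minimal sketch of that rebalancing decision might look like the following, assuming a loginManager that periodically collects per-zone CPU estimates from each zoneManager. All names (ZoneLoad, ServerLoad, planRebalance) are made up for illustration; the actual handoff (spawn empty zone, dual connections, serialize state, drop the old copy) happens after this decision.

// Hypothetical sketch: find an overloaded zoneManager and pick one of its
// zones to migrate to the least-loaded server.
#include <vector>
#include <cstdint>

struct ZoneLoad   { uint32_t zoneId; float cpuShare; };   // fraction of one server
struct ServerLoad { uint32_t serverId; float cpuTotal; std::vector<ZoneLoad> zones; };

// If any server exceeds cpuLimit, pick its cheapest zone to migrate to the
// least-loaded server. Returns false if no migration is needed or possible.
bool planRebalance(const std::vector<ServerLoad>& servers, float cpuLimit,
                   uint32_t* zoneToMove, uint32_t* fromServer, uint32_t* toServer)
{
    const ServerLoad* hot  = nullptr;   // most overloaded server
    const ServerLoad* cold = nullptr;   // least loaded server
    for (const ServerLoad& s : servers) {
        if (s.cpuTotal > cpuLimit && (!hot || s.cpuTotal > hot->cpuTotal)) hot = &s;
        if (!cold || s.cpuTotal < cold->cpuTotal)                          cold = &s;
    }
    if (!hot || hot == cold || hot->zones.size() < 2) return false;

    // Moving the cheapest zone frees capacity with the least migration cost.
    const ZoneLoad* pick = &hot->zones[0];
    for (const ZoneLoad& z : hot->zones)
        if (z.cpuShare < pick->cpuShare) pick = &z;

    *zoneToMove = pick->zoneId;
    *fromServer = hot->serverId;
    *toServer   = cold->serverId;
    return true;
}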
Quote:Intel's Nehalem architecture will feature up to 8 cores per chip with 2 threads per core. With four of those in one server a programmer will have access to 64 threads (32 cores).


If you watch Sutter's talks, the current average performance gain from hyperthreading is on the order of 20%. So those 64 hardware threads are worth roughly 32 × 1.2 ≈ 38 cores of throughput, not 64.

Quote:a) since all objects are local and the server doesn't have any communication overhead.


If you want to scale, you want to replicate resources, not share them and use locks.

With a proper distributed, actor-like model, local communication overhead is about the same as network communication overhead, minus the wire latency. You need to fully copy out the data you're sharing, then send it, and potentially wait until the receiver can accept it.

Local message passing in this way is some 5-10x slower than a non-inlined function call, even if no data is passed. Once you want to pass actual data, you will need to involve the following: lock/unlock, heap memory allocation/de-allocation, and a copy of all the data you're sending. That's for each message.

Lock-less algorithms can be used instead of locking, but they make memory allocation considerably harder. Copying the data cannot be avoided in a true distributed model. If you share it instead, then you need to keep it locked until the receiver has processed the message in its entirety.
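To make those per-message costs concrete, here is a minimal, illustrative mailbox sketch (not anyone's actual implementation): every send pays the heap allocation, the payload copy, and the lock/unlock described above.

// Minimal actor-style mailbox sketch showing the per-message costs:
// heap allocation, full copy of the payload, and a lock/unlock.
#include <cstdint>
#include <memory>
#include <mutex>
#include <queue>
#include <vector>

struct Message {
    uint32_t             type;
    std::vector<uint8_t> payload;   // copied in: receiver never shares sender memory
};

class Mailbox {
public:
    // Copy the payload and enqueue it under the lock.
    void send(uint32_t type, const std::vector<uint8_t>& data) {
        auto msg = std::make_unique<Message>();   // heap allocation per message
        msg->type = type;
        msg->payload = data;                      // full copy of the data
        std::lock_guard<std::mutex> hold(lock_);  // lock/unlock per message
        inbox_.push(std::move(msg));
    }

    // Receiver drains its mailbox; the sender is never blocked on processing.
    std::unique_ptr<Message> poll() {
        std::lock_guard<std::mutex> hold(lock_);
        if (inbox_.empty()) return nullptr;
        auto msg = std::move(inbox_.front());
        inbox_.pop();
        return msg;
    }

private:
    std::mutex lock_;
    std::queue<std::unique_ptr<Message>> inbox_;
};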

Quote:- Is one heavily threaded server more efficient for a big fight than a cluster of servers?


There isn't much difference really (except in physical size). It all depends on how you distribute the model.


And just as a side note - seamless servers aren't new, and for all practical purposes they're a solved problem. They all have hard limits on capacity, but in practice that limit is higher than what users actually require. Especially in games, fidelity has proven more desirable than sheer size. On typical game servers the local population will be counted in dozens, since anything more becomes problematic from a client-side rendering as well as a gameplay perspective.
I agree; seamless servers that hand off between areas are nothing new, and this is typically done on single- or dual-core machines with Ethernet between them.

The problem with putting zillions of cores on a die is that each core will have less memory bandwidth available to it. Thus, workloads on the cores must be structured to fit in the cache as much as possible. That makes multi-core machines great for partitioning things like per-object simulation, but makes them troublesome for doing things like global collision detection.

There are solutions, such as building everything as one big set of state machines, each of which gets a regular "tick" from a thread within a pool of service threads. If you can tie the type of service (collision vs simulation vs chat vs access control vs ...) to specific cores, that's even better, because you will have cache locality within each core.
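As a rough illustration of that idea (not a production design), the sketch below ticks a set of state machines from a dedicated thread per service; pinning each service thread to a specific core would be done with an OS-specific affinity call and is left out here. The StateMachine/Service names are assumptions for the example.

// One service (e.g. collision, simulation, chat) owns its state machines and
// ticks them on a dedicated thread so the working set stays warm in one core's
// cache.
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

struct StateMachine {
    virtual ~StateMachine() = default;
    virtual void tick(float dtSeconds) = 0;   // one simulation step
};

class Service {
public:
    explicit Service(std::vector<StateMachine*> machines)
        : machines_(std::move(machines)) {}

    void start(float dtSeconds) {
        running_ = true;
        worker_ = std::thread([this, dtSeconds] {
            while (running_) {
                for (StateMachine* m : machines_) m->tick(dtSeconds);
                // Crude pacing; a real server would measure elapsed time.
                std::this_thread::sleep_for(std::chrono::duration<float>(dtSeconds));
            }
        });
    }

    void stop() {
        running_ = false;
        if (worker_.joinable()) worker_.join();
    }

private:
    std::vector<StateMachine*> machines_;
    std::atomic<bool>          running_{false};
    std::thread                worker_;
};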

enum Bool { True, False, FileNotFound };

Don't bank on having all the processing power you need on one machine.

You will eventually have to scale beyond what fits on a single motherboard. Many MMOs have a farm of hundreds or thousands of servers to meet their processing needs: a whole layer of player connection servers, a bank of servers for the zones, DB servers, and often a bank of servers for the game's NPC AI/behavior. Zones can interact at their boundaries (and you can easily run more zones than you have cores...).

My project is heavy on AI, and I gave up quite a while ago on trying to fit everything on one computer (even with single/dual quad-core CPUs). Clusters of servers are needed to do half-intelligent AI behaviors, and future interactive terrain will vastly increase the processing power needed for the zone simulations.

You will have to accommodate crossing between server machines as well as between cores within the same server. You might still do something clever by positioning certain threads adjacent on the same CPU for processing that requires fine-grained data dependencies (locks within the same CPU, with memory sharing, are faster than going across a network link to another CPU/set of cores). But you quickly run out of those opportunities and are forced to interact across the more costly inter-machine communications.
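As an aside, on Linux/glibc the "position certain threads on the same CPU" idea boils down to setting thread affinity; the helper below is a minimal, assumed example using pthread_setaffinity_np, and the actual core indices you pin to depend on the machine's topology.

// Linux-specific sketch: pin the calling thread to one logical core so two
// tightly coupled workers can share a CPU's cache. Core numbering is
// machine-dependent.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sched.h>

static int pinCurrentThreadToCore(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}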

I also wouldn't count on 2 threads per core. You might move some background task to a core's second hardware thread, but two primary threads will fight each other for resources too much to be efficient.
-------------------------------------------- Ratings are Opinion, not Fact
Quote:future interactive terrain will vastly increase the processing power needed


I'm not so sure about that. Interactive terrain really shouldn't be much more expensive, computationally, than current terrain. However, interactive terrain is a giant distribution (networking) challenge, as you need to deliver the terrain modification data to everyone who can interact with an area, and sometimes that data is pretty large.

That being said, these are really two different questions.

Question 1): How can I build an architecture that scales across nodes connected with a communication bus (Ethernet, NUMA)?

Question 2): How can I build an implementation that exploits all available power on nodes that have many CPU cores connected through shared memory?

The original question is question 2; however, the description of the question makes it seem as if you're trying to tackle question 1. Note that when CPUs get big enough caches, or when mainstream servers become NUMA, solutions for question 1 may start mattering even for question 2 (the line will start blurring).


[Edited by - hplus0603 on March 22, 2008 12:07:09 PM]
enum Bool { True, False, FileNotFound };
Something to consider:

How many players do you plan for a single core to handle?

How often are that many players going to be in a single zone? (Zone in this case meaning either a minimal partition of your big seamless world or an actual zone, however you end up partitioning things.) What's your pain threshold here (the minimum server FPS for the game to be playable), and how many players can you handle while still staying within that limit?

Depending on the nature of your game, how much server-side processing you're planning to do, and how much of your game logic (AI, physics) you can push onto other servers, you may find that you will -very- seldom push the limits of one core for a particular zone/instance/partition.

In that case, you may not want to add the tenfold complexity of multithreading the simulation itself. Simply running more processes per physical server nets you the same scalability with regard to running more zones at once (i.e. physical machine A could run one separate process per core, each of which handles a variable number of zones depending on load - a rough sketch of that assignment follows after this post).

And for this type of setup, two boxes with 4 cores each is better than one with 8, and is more cost-efficient to scale upwards.

Seamless worlds require more inter-server communication than fully zoned ones (zoned ones can get by with just player transfers; everything else can be partitioned off to separate server types), but it should still be kept to a minimum, so the same principles apply.
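A tiny sketch of that zone-to-process assignment idea follows; the names (ZoneInfo, assignZones) and the greedy least-loaded heuristic are assumptions for illustration, not a prescribed design.

// Assign zones to N single-threaded processes (one per core): heaviest zones
// first, each going to whichever process currently carries the least load.
#include <algorithm>
#include <cstddef>
#include <vector>

struct ZoneInfo { unsigned id; float expectedLoad; };

std::vector<std::vector<unsigned>> assignZones(std::vector<ZoneInfo> zones,
                                               std::size_t processCount)
{
    std::vector<float> load(processCount, 0.0f);
    std::vector<std::vector<unsigned>> perProcess(processCount);

    // Heaviest zones first so the greedy packing balances better.
    std::sort(zones.begin(), zones.end(),
              [](const ZoneInfo& a, const ZoneInfo& b) { return a.expectedLoad > b.expectedLoad; });

    for (const ZoneInfo& z : zones) {
        std::size_t best = 0;
        for (std::size_t p = 1; p < processCount; ++p)
            if (load[p] < load[best]) best = p;
        load[best] += z.expectedLoad;
        perProcess[best].push_back(z.id);   // zone z runs in process 'best'
    }
    return perProcess;
}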
Quote:Original post by hplus0603
Quote:future interactive terrain will vastly increase the processing power needed


I'm not so sure about that. Interactive terrain really shouldn't be much more expensive, computationally, than current terrain. However, interactive terrain is a giant distribution (networking) challenge, as you need to deliver the terrain modification data to everyone who can interact with an area, and sometimes that data is pretty large.

I was originally thinking of interactive furniture - things you could manipulate, each with reactive scripted behaviors/triggers, etc. - but then an extension of that with real physics-style terrain (e.g. a rope bridge, a pile of rocks that you could roll, and buildings made up of wall components that can be deformed/destroyed...).

Think of interactions done simultaneously by different players on the same objects and the server having to arbitrate/resolve the composite outcome of the interactions (and then post the changes to clients, and possibly to other zones' overlapping boundaries AND to NPC AI running on other servers).


That could add up to A LOT more processing -- it's like having many thousands of the dim NPCs (of which we now have only dozens) running in each zone. That also multiplies the amount of data required per zone.


-------------------------------------------- Ratings are Opinion, not Fact
Quote:Original post by wodinoneeye

I was originally thinking of interactive furniture - things you could manipulate, each with reactive scripted behaviors/triggers, etc. - but then an extension of that with real physics-style terrain (e.g. a rope bridge, a pile of rocks that you could roll, and buildings made up of wall components that can be deformed/destroyed...).

There was a talk about this at GDC called "Mercenaries 2: Networked Physics in a Large Streaming World"; it should be worth a read.
Quote:Original post by wodinoneeye

Think of interactions done simultaneously by different players on the same objects and the server having to arbitrate/resolve the composite outcome of the interactions (and then post the changes to clients, and possibly to other zones' overlapping boundaries AND to NPC AI running on other servers).

That could add up to A LOT more processing -- it's like having many thousands of the dim NPCs (of which we now have only dozens) running in each zone. That also multiplies the amount of data required per zone.


That doesn't add so much to the processing cost as it requires you to fully synchronize everything. If you wish to preserve the proper sequence of events, you are almost guaranteed to be unable to extrapolate, which makes it considerably more difficult to provide smooth gameplay.

A considerably bigger problem here would be properly resolving interactions where clients with latencies anywhere from 10 to 1000 ms are all interacting with the same object. Do you handle each input immediately and then roll back as needed, resulting in lots of re-processing and rubber-banding, or do you wait for the slowest clients, resulting in large delays for everyone?

The speed of light constraining communications would be the limiting factor: even with infinite CPU power, the client-side latency would remain the same.
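To illustrate the "handle it immediately, then roll back" branch of that trade-off, here is a minimal, assumed rewind-buffer sketch; ObjectState, stepObject, and RewindBuffer are invented for the example and stand in for whatever the real simulation uses.

// The server keeps a short history of object snapshots; when a late command
// arrives, it rewinds to the command's tick, applies it, and re-simulates.
#include <cstddef>
#include <cstdint>
#include <deque>

struct ObjectState { float x, y, vx, vy; };

// Advance one simulation step (placeholder for the real object simulation).
ObjectState stepObject(ObjectState s, float dt) {
    s.x += s.vx * dt;
    s.y += s.vy * dt;
    return s;
}

class RewindBuffer {
public:
    RewindBuffer(std::size_t capacity, float dt) : capacity_(capacity), dt_(dt) {}

    void record(uint64_t tick, const ObjectState& s) {
        history_.push_back({tick, s});
        if (history_.size() > capacity_) history_.pop_front();   // bounded history
    }

    // A command stamped at 'lateTick' arrived after we already simulated past
    // it: rewind to that tick, apply the command, and re-simulate to 'nowTick'.
    // Returns false if the command is older than our history (it must then be
    // dropped or applied without rollback, causing visible rubber-banding).
    bool applyLate(uint64_t lateTick, uint64_t nowTick,
                   void (*applyCommand)(ObjectState&), ObjectState* out) {
        for (const Entry& entry : history_) {
            if (entry.tick == lateTick) {
                ObjectState s = entry.state;
                applyCommand(s);
                for (uint64_t t = lateTick; t < nowTick; ++t) s = stepObject(s, dt_);
                *out = s;
                return true;
            }
        }
        return false;
    }

private:
    struct Entry { uint64_t tick; ObjectState state; };
    std::size_t       capacity_;
    float             dt_;
    std::deque<Entry> history_;
};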

