Distributed server architecture for load balancing

_winterdyne_ · 2005-10-27T12:47:30

Okay, seeing as my previous architecture thread didn't seem to draw a lot of flak, I guess most of it was at least semi-rational. I hope. However, I'm trying to puzzle a way around congestion-based lag, and I'd like to ramble about it where it can get critique. Pick it apart as best you can, please. Apologies if this seems long and drawn out; it is. But I hope you'll find it a relatively interesting read.

I mentioned a relevance layer previously, and since what I'm talking about relies on the concept, I'll describe it here, along with a brief description of the architecture.

Architecture:
A game instance is run on one or more machines (boxes) and consists of three kinds of processes: a game database server (pretty standard), a 'master' (world) server and a number of generic 'slave' (zone) servers, which operate in a hierarchy. There need only be one server process running on any given box, but there can only be one 'master' within a cluster (sometimes referred to as a microcluster).

Edit: Connected clients are handled through a connection object which migrates between server processes as clients move around.

The overall structure of the system looks sort of like this:

Relevance Graph:
The game universe in the system is not broken up into rectangular 2D zones, but instead is organised more like a hierarchical scenegraph, with network-aware areas of variable size. This is referred to as the 'relevance graph'. Nodes in this graph can be thought of as 'points of relevance', and coincide with physical features, such as rooms, areas of terrain, etc. Actors and other objects in the game universe are 'relevance limpets' and are *always* attached to a point of relevance.

The relevance graph is used to determine event relevance within the game system, both at the network and simulation layers. Anything that happens in a point of relevance is 'most likely' going to be important to everything contained in it.
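As a rough illustration of the structure described so far (Python, with made-up names such as `PointOfRelevance`, `Limpet` and `everything_in`; a sketch under those assumptions, not the library's actual code), a node holds its children and its attached limpets, and a limpet is never left unattached:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class PointOfRelevance:
    """A node in the relevance graph (hypothetical minimal form)."""
    name: str
    parent: Optional["PointOfRelevance"] = None
    children: List["PointOfRelevance"] = field(default_factory=list)
    limpets: List["Limpet"] = field(default_factory=list)

    def add_child(self, child: "PointOfRelevance") -> "PointOfRelevance":
        child.parent = self
        self.children.append(child)
        return child


@dataclass(eq=False)
class Limpet:
    """An actor or object; always attached to exactly one node."""
    name: str
    por: PointOfRelevance

    def __post_init__(self) -> None:
        self.por.limpets.append(self)

    def move_to(self, new_por: PointOfRelevance) -> None:
        # Detach and reattach in one step so the limpet is never unattached.
        self.por.limpets.remove(self)
        self.por = new_por
        new_por.limpets.append(self)


def everything_in(por: PointOfRelevance) -> List[str]:
    """Names of all limpets in this node and every node leafward of it."""
    names = [limpet.name for limpet in por.limpets]
    for child in por.children:
        names.extend(everything_in(child))
    return names
```

Anything that needs "everything contained in" a point can then walk leafward from that node only, rather than scanning the whole world.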
Neighbouring points may only receive events of a certain type (a soundproof, sealed glass box might only receive visually-oriented messages). The graph allows a designer to set what spreads how at any point in the world. Events also have a given radius of effect which is checked against the relevance graph as well. Neighbourhood is not used here to imply physical adjacency, just that events of certain types occurring within (or passing through) one point of relevance may affect another. Absence of neighbourhood implies that points should have NO bearing on each other.

Shortcuts are used around this system in some places, where a specific entity is the target of an event. The master server maintains a list of where all entities are, so a message can be efficiently delivered without propagating along the hierarchy as normal.

A server process is given a node of the relevance graph to deal with, and it deals with that and all points leafward (generally 'contained by') of that point, unless another server takes control. Slave servers are also informed of a master, which will manage the addition of slaves. Slaves are informed of neighbour or child relations with other slaves, so direct communication amongst themselves is possible.

All of the points of relevance that a server process has control over are referred to as its 'domain'. A domain, functionally, is close to a traditional MMORPG zone, whereas the point of relevance is more conceptually similar.

Update process:
Updating the relevance graph occurs in three phases, all of which occur in one server 'tick' in the server update threads. The threads on each of the servers in a microcluster are brought into sync during this process.

Firstly, a logical update runs, which fires the events from their generators and adds them to any relevant recipients. High priority UDP is used to transfer these across server boundaries if required. This process must complete before the second phase can begin.
This phase is started by the master server and triggers the process in slaves through a cascade effect.

The second phase of the update is the handling of receivers that have had an event passed to them. This may include the addition of messages to those receivers' outbound queues, including appropriate network messages to an avatar's associated client. This process occurs simultaneously on all server processes in the microcluster, and is synchronised (started, flagged finished) by the master server. Multi-processor servers may utilise worker threads to process more than one part of the hierarchy simultaneously.

The final phase is network transmission to clients (incoming messages are handled by a separate thread). This is done on each server in batches, a point of relevance (POR) at a time. Firstly, out-of-date or obsolete empirical state updates are discarded from the outbound queues, then those queues are processed and sent. This process has an overall timeout value, and unsent messages are preserved for the next cycle (and are sent in order/priority). Timeout occurrences indicate congestion in the area and mark the process as congested. Timing and data stats are kept for data transmission and time to transmit at each POR and can be used to determine where the congestion is actually occurring. UNDER-time exits (early finishes) for this process are also recorded by the server process. Stats are transferred up to the master server regularly for congestion detection and handling.

Illustrative example (edit):
Consider the diagram below:

Each box represents a node in the relevance graph. Here we have a simple game world where the two areas, Dark Forest and Dwarf Mountain, are segregated. Both are 'adventuring areas' and it's assumed the player base will usually be evenly split between these two areas. As such, Dwarf Mountain has been assigned to a slave server.
The hierarchies in each domain exhibit internal neighbourhood: Dark_forest_main, the POR that models the bulk of the Dark Forest, is designated as a neighbour of the Dark_forest_clearing. Any message generated in either of these can have an effect (subject to range, etc.) on the other. Cross-server neighbourhood is illustrated between the cellar and the tunnel. This is a good example of where the population can be kept low (limited numbers will fit in a cellar or tunnel) to limit the amount of traffic between the two PORs. Parent-child relationships imply neighbourhood (what happens in the clearing is heard and seen in the cottage and vice versa) but only by one step: what happens in the clearing is NOT seen or heard in the cellar.

Considerations:
I expect the majority of traffic to be chat and object-description exchange. Such traffic should mostly be contained within a single server, with descriptions being drawn from cache, not database. Cross-server traffic is likely to be '/tells' or events occurring 'at the edges' of a server boundary. Good (physical) world design should keep this low.

I don't want to enforce a 'zone limit' for occupancy if I can possibly help it, apart from areas where such a limit makes sense (typically room-like areas). This is an option open to a game designer and, although potentially useful, is not a requisite of the library I'm building.

I do want to allow more than 50 players to gather in a quiet location and let them meet without them lagging up an entire server, especially somebody's combat. In games where real-time (or almost real-time) combat is used this would be highly annoying.

There are a *lot* of mobiles wandering around the world. The system is NOT designed to cope with 'blanket coverage' of players, but more with uneven distributions. Mobiles are aggregated and simplified when not visible to a player, and indeed can usually be aggregated even when visible.
The system's designed to be configured at startup (before players connect) so the master server can inform all the various slaves what they need to manage and allow time for load up and synchronisation of the abstract simulation layer (simplified geography and demographics used for statistical simulation of mobiles, resources etc). This takes some time.

Congestion here I define as the situation where the network exchange TO CLIENTS in the server process update times out significantly and persistently, apparently due to traffic to an isolatable (leaf) point of relevance. Note that traffic is measured in complete UDP message sizes. Packet storming is a server security issue and is dealt with at the UDP level itself.

The strategy I have in mind is as follows:

Defining 'significant' as a point where more than 10% of network traffic is persistently left over still to process;
Defining 'persistently' as for a period of approximately 3 seconds, or proportionately less if the volume of outstanding network traffic (congestion) increases.

On detecting a congestion situation in a server process, we locate areas of congestion within that process: first looking at leaf node averages, then progressing up the relevance hierarchy until a disproportionate congestive average is found at a certain layer. We eventually determine an area of the hierarchy that is responsible for the congestion, and know how much traffic it has pending, how much it typically generates, and also the statistics of all processes, and as a result of each box running such processes in the microcluster.

What I can't decide is what to do with the guilty chunk of hierarchy once it's identified, and I'm trying to think of ways of reintegrating the hierarchy once it's no longer necessary for it to be segregated from its greater body - and indeed how to judge that situation. When such a segment of the relevance graph is definitely causing congestion it seems obvious to transfer ownership of that chunk to a quieter process.
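For reference, the 'significant and persistent' test defined above could look roughly like this (Python; a sketch only). The 10% and 3 second figures come from the definitions above; the class name, its parameters and the exact way the required duration shrinks as the backlog grows are assumptions made for illustration:

```python
from dataclasses import dataclass

SIGNIFICANT_FRACTION = 0.10   # "more than 10%" of traffic left over
BASE_PERSIST_SECONDS = 3.0    # "approximately 3 seconds"


@dataclass
class CongestionDetector:
    """Hypothetical per-process (or per-POR) congestion tracker."""
    congested_for: float = 0.0  # seconds of continuous significant backlog

    def update(self, queued_bytes: int, unsent_bytes: int, dt: float) -> bool:
        """Feed one tick's client-send stats; return True once congested."""
        leftover = unsent_bytes / queued_bytes if queued_bytes else 0.0
        if leftover <= SIGNIFICANT_FRACTION:
            self.congested_for = 0.0   # backlog cleared, start over
            return False
        self.congested_for += dt
        # Assumed scaling: the required duration shrinks in proportion to
        # how far the backlog exceeds the 10% threshold.
        required = BASE_PERSIST_SECONDS * SIGNIFICANT_FRACTION / leftover
        return self.congested_for >= required
```

With this scaling, a 20% backlog has to persist for about 1.5 seconds before it counts, and a 50% backlog for about 0.6 seconds. Running one detector per POR as well as per process would give the per-node averages needed to walk up from the leaves.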
Transferring ownership has the obvious flaw of causing fragmentation of the relevance graph, which is a Bad Thing. I need to come up with a means of determining whether the cost of transferring a chunk of the graph is worthwhile, and some form of 'defragging' the graph occasionally.

Does anyone know of any systems that do this, or has anyone got any ideas for things I might need to track to make this work efficiently?

Sorry for the (exceedingly) long post, but hey, I'm scratching my head... and I need coffee.

[Edited by - _winterdyne_ on September 19, 2005 5:49:04 AM]


Started by _winterdyne_ September 17, 2005 07:22 AM

50 comments, last by _winterdyne_ 18 years, 6 months ago

_winterdyne_

530

Author

September 21, 2005 03:39 AM

A-P, as you suggested, each event has a radius of effect, as do sensors for event types.
The aggregated sensory range of the contents of each POR (a variable size AABB) is used for coarse collision testing between effect area and possible sensors. This is done by a parent POR. Knowing maximum radii for various event types allows culling based on the PORs' actual (physical) sizes so there's an early-out method in play.
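A minimal sketch of that coarse test (Python; `AABB` and `event_reaches` are illustrative names, and a spherical area of effect is assumed): clamp the event origin onto the POR's box and compare the squared distance with the squared radius.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class AABB:
    """Aggregated sensory extents of a POR's contents (hypothetical)."""
    lo: Vec3
    hi: Vec3


def event_reaches(box: AABB, origin: Vec3, radius: float) -> bool:
    """True if a sphere (event origin + radius of effect) touches the box."""
    dist_sq = 0.0
    for o, lo, hi in zip(origin, box.lo, box.hi):
        nearest = min(max(o, lo), hi)   # clamp origin onto the box
        dist_sq += (o - nearest) ** 2
    return dist_sq <= radius * radius
```

A parent POR would run this once per child before looking at any individual sensor, so a child whose box is out of range is skipped without touching its contents.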

You are also correct in your observation that high adjacency can cause more overhead regarding neighbourhood. This is partly why a regular grid based spatial model is not being used (although it could be). If the majority of those grid squares don't contain anything 'interesting' to cause the player base to spread out, the majority of the player base will centre around the provided content. Also consider, this may be used for a true 3D game, such as a MM space sim - using a regular grid based system in such a game has its disadvantages - not only because of the extra dimension of free movement, but also, as mentioned because of the big empty spaces between any meaningful content. This is why I refer to the system as a *relevance* graph, rather than simply as a spatial divisioning system. I want to be aware that it is the organisation of *content* and its delivery that I'm dealing with, not merely dividing up a landscape into regular sectors.

In terms of crossing server boundaries, message-aware objects are intended to be placed into a transitional state. The object will be temporarily locked, and messages for it will be cached by the master server and inserted at the start of the new queues once the object is reactivated. The object will experience a short additional lag as this occurs.
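Something along these lines (Python; `MasterCache` and its methods are hypothetical names for illustration, and messages are plain strings to keep the sketch short):

```python
from collections import deque
from typing import Deque, Dict, List


class MasterCache:
    """Holds messages for objects that are mid-transfer between servers."""

    def __init__(self) -> None:
        self._held: Dict[str, List[str]] = {}

    def lock(self, obj_id: str) -> None:
        # Object enters the transitional state.
        self._held[obj_id] = []

    def is_locked(self, obj_id: str) -> bool:
        return obj_id in self._held

    def hold(self, obj_id: str, message: str) -> None:
        # Message arrived while the object was locked; keep arrival order.
        self._held[obj_id].append(message)

    def reactivate(self, obj_id: str, new_queue: Deque[str]) -> None:
        # Cached messages go to the *start* of the new queue, keeping
        # their original arrival order ahead of anything queued later.
        cached = self._held.pop(obj_id)
        new_queue.extendleft(reversed(cached))
```

Note that `deque.extendleft` reverses the order of its argument, hence the `reversed()` call.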

Anyway, this'll be my last post on the subject for a couple of weeks, since I'm off on holiday to the Algarve today! Thanks to all for some thought-provoking and helpful answers and I'll see y'all when I get back.

<virtual postcard>

Winterdyne Solutions Ltd is recruiting - this thread for details!

Anonymous

September 21, 2005 04:35 AM

Actually I was considering the grid as 1) just an example of the possibility of large numbers of N-way 2D/3D adjacencies (versus some games that are simply a network of portal connected bubbles) and 2) as a universal coordinate system that distances could be calculated with (to implement the culling).

Another problem that you may not need to solve is where you have vehicles (like a ship on water) that can move and cross area boundaries (and even exist in 2 or more simultaneously) and which have their own local relational reference system separate from the world's -- events have to cross boundaries that change, translate AND rotate....

_winterdyne_

530

Author

October 08, 2005 07:08 AM

For mobile hierarchies such as ships etc, there is a roaming subclass of POR.
Within the mobile hierarchy operation is exactly as per normal. There are also subclasses for pure hierarchical and physical PORs. In my example in this thread, the _root and game world PORs are pure hierarchical PORs, and the rest physical since they may contain stuff. Mobile objects (including mobile PORs) can only exist within physical PORs, but a mobile object is in effect a pure hierarchical POR. Thus, a mobile cannot contain a mobile.

The mobile node itself is treated similarly to a mobile 'limpet' or actor, in that it does NOT expand the physical extents of its parents (which fixed hierarchy PORs do). Its parent link is changed on complete exit from its current parent and is reselected primarily on volume inclusion (most volume included in a hierarchical fixed POR). I have a planned variant which will take velocity into account, and select a parent based on occupancy over a certain period of time.
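A sketch of the volume-inclusion rule (Python; boxes are plain `(lo, hi)` tuples and the function names are made up for illustration). It assumes it is only called once the mobile has completely left its current parent, with the candidate fixed PORs passed in by the caller:

```python
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (lo corner, hi corner)


def overlap_volume(a: Box, b: Box) -> float:
    """Volume of the intersection of two axis-aligned boxes."""
    vol = 1.0
    for axis in range(3):
        extent = min(a[1][axis], b[1][axis]) - max(a[0][axis], b[0][axis])
        if extent <= 0.0:
            return 0.0   # no overlap on this axis, so none at all
        vol *= extent
    return vol


def select_parent(mobile: Box, candidates: Dict[str, Box]) -> Optional[str]:
    """Pick the fixed POR containing the most of the mobile's volume."""
    best, best_vol = None, 0.0
    for name, box in candidates.items():
        vol = overlap_volume(mobile, box)
        if vol > best_vol:
            best, best_vol = name, vol
    return best
```

The velocity-aware variant would presumably replace the single overlap measurement with overlap accumulated over a time window, but that is not shown here.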

Note that mobile POR movement, similarly to mobile actors (as opposed to placement / teleportation) is limited to neighbour relations.

Rotation of a POR (can be applied to any POR, but in runtime only to mobiles) dirties its contents' positions, which triggers an AABB update of the parent. This 'dirtying' does NOT update the AABB of the parent of a mobile POR, but will update the mobile POR itself, which may result in it being transferred to a new parent fixed POR.

Message handling from a mobile POR is via its fixed parent. Since the neighbourhood relations of the mobile (outside of its internal hierarchical links) may cause message repetition or duplication, the fixed hierarchy is used since the neighbourhood relations there are not generally variable.

Because of this a mobile can only exist in ONE fixed POR at a time, but may be affected by nearby neighbours of that POR.

Does this sound about right to you folks?


hplus0603

11,917

October 08, 2005 01:48 PM

Why wouldn't you just keep your entire world in one big coordinate system, and use that for all simulated entities? Why use this ship as a point of reference, when you can just express both the ship, and the things on it, in the world coordinate frame?

enum Bool { True, False, FileNotFound };

_winterdyne_

530

Author

October 08, 2005 03:34 PM

The coordinate system is global.

The reason for the hierarchical structure is message culling at a node level.

In this way, a mobile entity may actually contain its own effective hierarchy - say that the ship contains corridors and rooms within it. Neighbourhood relations can constrain message traffic within that hierarchy.


Anonymous

October 11, 2005 05:03 AM

Quote:Original post by hplus0603
Why wouldn't you just keep your entire world in one big coordinate system, and use that for all simulated entities? Why use this ship as a point of reference, when you can just express both the ship, and the things on it, in the world coordinate frame?

The ship moves (translation and rotation) and you would have to constantly update the coordinates of all the objects on the 'mobile' as well as all the terrain geometry (like the decks, passageways etc...) for the objects on board to interact (they generally will interact more with other things onboard than with things off the ship). Events that cross the boundary would have to be converted from the local coordinate system to that of the parent (which the ship sits on) or PARENTS (when the ship crosses one or more boundaries at that level of the hierarchy).

There isn't any simple way to do this and you will have to eat the extra calculations.

Yet another fun case is having objects on 2 ships interact with each other. The topic author is working on formalizing the way the events are passed between different referents (including where the other 'referent entity' exists on another machine).

_winterdyne_

530

Author

October 11, 2005 06:43 AM

Quote:
(snip)
Events that cross the boundary would have to be converted from the local coordinate system to that of the parent (which the ship sits on) or PARENTS (when the ship crosses one or more boundaries at that level of the hierarchy).

The global coordinate system fixes scale and orientation, not origin - this is always 'local to parent', i.e. the parent's 'real' position is 0,0 for the children.
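Since only the origin changes from one level to the next, converting a position to an ancestor's frame is just a sum of offsets. A quick sketch (Python; `Node`, `offset` and `relative_to` are names invented for this example, and the offsets are arbitrary):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass(eq=False)
class Node:
    """A POR with its position expressed in its parent's frame."""
    name: str
    offset: Vec
    parent: Optional["Node"] = None


def relative_to(point: Vec, frame: Node, ancestor: Node) -> Vec:
    """Re-express `point` (given in `frame`) in the frame of `ancestor`."""
    x, y, z = point
    node = frame
    while node is not ancestor:
        if node.parent is None:
            raise ValueError(f"{ancestor.name} is not an ancestor of {frame.name}")
        # Scale and orientation are shared, so only translation is applied.
        x, y, z = x + node.offset[0], y + node.offset[1], z + node.offset[2]
        node = node.parent
    return (x, y, z)
```

No rotation matrices are involved here because orientation is fixed globally in this scheme; that is an assumption of the sketch as much as of the design.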

The AP is correct, assuming that a mobile built using a mobile POR is designed for constraining messages inside itself - otherwise hplus is correct and less work is involved in having multiple mobile 'limpets' (a term I use for occupants of a POR) being moved together - most likely through the use of a 'congregation' mobile. Mobile PORs are not designed for things like rafts, where the contents are visible, but physically tied. They are designed for discrete hierarchies which roam within a larger hierarchy - like cruise ships.

Quote:
Yet another fun case is having objects on 2 ships interact with each other.
The topic author is working on formalizing the way the events are passed between
different referents (including where the other 'referent entity' exists on another machine).

This is actually trivial and is best examined by a 'bad case' scenario, where a fairly large amount of traffic may be generated across a machine division. I'll avoid going into too much detail about synching the various steps of the process and the worker threads that deal with inter-server communication, since it's all fairly simple.

Assume we have three physical PORs, Land, Bay and Ocean, forming part of the divisive POR, Coastline, which is part of GameWorld.

Land is hosted on ServerA.
Bay is hosted on ServerA.
Ocean is hosted on ServerB.
ServerA is the Master Server for this cluster, and also hosts Coastline.

Land neighbours Bay, Bay neighbours Ocean.

Bay contains mobile POR ShipA, partially in Ocean.
Ocean contains mobile POR ShipB, partially in Bay.
ShipA has two occupants, PCA and PCB.
ShipB has a single occupant, PCC.

For the sake of argument, Land will be out of range of 'shout' events from both ships.

As mentioned, a relevance update occurs in 3 stages, the logical update, the handling update and finally the client update.

Step 1 - Logical update:
ServerA syncs ServerB.

PCA (on ShipA) shouts "Ahoy there!", for the sake of argument.
This generates a 'PCA shouts "Ahoy there!"' message.
The PC, being a limpet, deposits its message directly to its owner, ShipA.

ShipA being mobile, forwards the message to its parent, Bay. The current location of ShipA and the origin of the message give the location of the event relative to Bay. Mobile PORs cannot be responsible for message distribution, since their extents may cover varying portions of the fixed hierarchy.

Bay is now responsible for dealing with forwarding the message to all PORs that are in range of the event. Our shout message expands outside the area occupied by Bay, and so must be passed to the parent, Coastline for handling.

Coastline receives the message and calculates the event origin relative to itself. In this case the message and its radius of effect lie entirely within the division governed by Coastline. The message is queued for handling. GameWorld need not be informed of the event.
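The escalation in this step boils down to 'walk rootward until something fully encloses the event'. A rough sketch (Python; `FixedPOR` and `find_handler` are illustrative names, the extents are invented, and for brevity all boxes are given in one shared frame rather than local-to-parent):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass(eq=False)
class FixedPOR:
    """A fixed-hierarchy POR with axis-aligned extents (hypothetical)."""
    name: str
    lo: Vec
    hi: Vec
    parent: Optional["FixedPOR"] = None

    def contains_sphere(self, origin: Vec, radius: float) -> bool:
        # The whole area of effect must lie inside the box on every axis.
        return all(lo <= o - radius and o + radius <= hi
                   for o, lo, hi in zip(origin, self.lo, self.hi))


def find_handler(start: FixedPOR, origin: Vec, radius: float) -> FixedPOR:
    """Walk rootward until a POR fully encloses the event's area of effect."""
    node = start
    while node.parent is not None and not node.contains_sphere(origin, radius):
        node = node.parent
    return node
```

The root handles anything that nothing smaller encloses, which is why the loop stops when there is no parent left.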

ServerA being Master, checks that no slaves have pending events this cycle before continuing.

Step 2 - Message handling:
ServerA syncs ServerB.

On ServerA:

Coastline checks its child nodes' extents, Land, Bay and Ocean. Bay and Ocean are covered by the event, Land is not. Note that Coastline knows nothing about the contents of Ocean, since that is on a different machine, although contents can be queried for.

Since Bay is local, a handler function is called directly with a pointer to the message. Since Ocean is not local, a network message is created with a copy of the message data, is encrypted (if internal encryption is used on the cluster) and added to the server send queue. A worker thread despatches these immediately. This is efficient on dual processor systems, but has no advantage over waiting till the end of this step on single processor systems.

Bay handles the message by passing it to all its contents' message handlers (since it is a shout). ShipA receives the message and behaves similarly, passing it to PCA and PCB.

Both PCA and PCB handle the message by adding it to their network send queues.

On ServerB, simultaneously to the above:

ServerB gathers update messages from ServerA using a worker thread, and processes them immediately, in addition to handling any events generated and handlable internally (in this case none this cycle).

The shout message is received and passed to the default slave root node, which contains Ocean. Ocean passes it to ShipB, which passes it to PCC, which places it in its send queue.

On ServerA:
ServerA informs its slaves there are no further events from it this cycle. This puts the worker thread on the slaves to sleep.

Step 3 - Client update:
ServerA syncs ServerB.

On ServerA and ServerB, simultaneously:
Send queues are processed cascade fashion. PCA and PCB receive messages from ServerA, PCC receives messages from ServerB.

The worse case, where the shout is sent from ShipB, works similarly, but the shout message actually crosses the server boundary TWICE, rather than once, since Coastline is remote as far as Ocean is concerned - the message is passed up across a network connection, rather than just down as in the example above.
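To make the once-versus-twice count concrete, here is a small tally of the network hops for both cases (Python; the `HOST` table mirrors the hosting listed earlier in this post, while `boundary_crossings` is an invented helper that assumes the origin escalates straight to the handling POR):

```python
from typing import Dict, List

# Which server hosts each POR, as in the example above.
HOST: Dict[str, str] = {
    "Coastline": "ServerA",
    "Land": "ServerA",
    "Bay": "ServerA",
    "Ocean": "ServerB",
}


def boundary_crossings(origin: str, handler: str, covered: List[str]) -> int:
    """Count network hops for one event: up to the handler, then down."""
    hops = 0
    if HOST[origin] != HOST[handler]:
        hops += 1          # escalation crosses a machine boundary
    for child in covered:
        if HOST[child] != HOST[handler]:
            hops += 1      # a copy of the message is sent over the network
    return hops


# Shout from ShipA (parent Bay): Bay -> Coastline is local, Ocean is remote.
from_ship_a = boundary_crossings("Bay", "Coastline", ["Bay", "Ocean"])
# Shout from ShipB (parent Ocean): up to Coastline, then back down to Ocean.
from_ship_b = boundary_crossings("Ocean", "Coastline", ["Bay", "Ocean"])
```

Here `from_ship_a` comes out as 1 and `from_ship_b` as 2, matching the walk-through.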

Does this make sense? Maybe I should do a diagram?

[Edited by - _winterdyne_ on October 11, 2005 8:43:47 AM]


hplus0603

11,917

October 11, 2005 04:23 PM

Quote:The ship moves (translation and rotation) and you would have to constantly update
the coordinates of all the objects on the 'mobile' as well as all the terrain geometry (like the decks, passageways etc...) for the objects on board to interact

When the ship lurches, you want the player to lurch, too, not just stay rock solid on the ship's deck. In fact, gravity doesn't change when the ship moves, but the ship changes. If you run an actual physical simulation, it would be simpler to simply change the ship, keeping everything in world coordinates, because the physical simulation will take care of everything. ("having to update all the objects" is not a problem -- because they are simulated, they are updated every frame anyway)

If you run a non-physical, old-game kind of simulation, perhaps using an old-school scene graph system, then isolated coordinate system islets would make sense. But hopefully nobody's actually building yet another one of those in this day and age!


Anonymous

October 12, 2005 03:36 AM

Quote:Original post by hplus0603
Quote:The ship moves (translation and rotation) and you would have to constantly update
the coordinates of all the objects on the 'mobile' as well as all the terrain geometry (like the decks, passageways etc...) for the objects on board to interact

When the ship lurches, you want the player to lurch, too, not just stay rock solid on the ship's deck. In fact, gravity doesn't change when the ship moves, but the ship changes. If you run an actual physical simulation, it would be simpler to simply change the ship, keeping everything in world coordinates, because the physical simulation will take care of everything. ("having to update all the objects" is not a problem -- because they are simulated, they are updated every frame anyway)

If you run a non-physical, old-game kind of simulation, perhaps using an old-school scene graph system, then isolated coordinate system islets would make sense. But hopefully nobody's actually building yet another one of those in this day and age!

I suppose if you are going to do physics like that you could try a universal coord system that updates constantly. Unfortunately you will have to decide where you want the 'realism' to stop (in order to make a game that doesn't require a supercomputer to run in real time). Sure, you could have the player lurch about (and check all the friction effects that keep objects in place most of the time), but what of the entire structure of the ship/boat? Do you want to have to calculate the effect of every force on all those structures to work out all the position transformations (culling methods would help this some)? It's probably more cost effective to apply the various 'lurch' forces to the player and other 'moveables' within a local coordinate system to minimize the CPU load.

We are still stuck with those 'old school' mechanisms because the game worlds have grown larger and the detail level higher (more objects) enough to require running the simulations on multiple machines (as per the topic).

Anonymous

October 12, 2005 03:42 AM

Quote:Original post by _winterdyne_

Does this make sense? Maybe I should do a diagram?

Yes, I think I followed most of it, but diagrams would help visualization.

Also, are there any additional complexities for events that don't just have a simple origin and spherical effect (shout) but have more directional interactions (i.e. an arrow fired) that may need a collision check / LOS (line of sight) test, and/or are non-instantaneous (a travelling arrow), so that the event itself moves over time and, ack!, may have secondary effects (like being visible to other objects as it moves)?
