Distributed server architecture for load balancing

Started by
50 comments, last by _winterdyne_ 18 years, 6 months ago
Quote:Original post by _winterdyne_
Quote:Original post by hplus0603
I could tell you how it all works, but first you'd have to sign a bunch of legal papers :-)


Isn't that always the way? :-)

Quote:
We run one server process per machine. Running multiple processes has no advantage, because the area served by our processes can be irregular in shape (and even discontiguous, although that's usually not a great idea for other reasons). If we need to shift load, we change the area that each machine is responsible for, rather than moving the processes. Each simulating object knows how to move itself to the "most optimal" server for that object, so when we change around mappings, the appropriate objects will automatically migrate. Usually the players won't notice when their objects migrate (because of the "seamless streaming world" implementation, which already involves real-time migration).


So, given a change in area on a particular machine/process that change has to be migrated to all processes in the grid? You've stated you use a modified quadtree, I take it this is used to determine which is the most optimal server, given an objects extents and the known areas covered by each process in the grid. Elegant, given a fixed origin coordinate system. I also assume you are generally dealing with a 2D world (as far as zones are concerned).

Couple of questions, I was reading up on DungeonSiege's continuous world design and they ran across floating point precision errors at large distances. In short they overcame this by using an alterable point of reference. Are you using sliding scales for determining quadtree nodes (a 10km tree vs a 1m tree)?

Also, given an irregular shape, how do you determine continuity? Colinear edges on area perimeters? It's one of the reasons my fixed POR's have AABBs rather than arbitrary - I considered the design difficulties of placing continuous arbitrary hulls nightmarish, not to mention the fact I always hated Tetris, especially in 3d, whereas most people can easily figure out how to put together axis aligned box.





If I remember, DungeonSiege was a single player and they would recalculate all the relevant coordinates (all zones within view) when the player crossed into a new current area and convert to use use that area's origin (and maybe only really recalc all when the displacement was significant enough to start effecting accuracy....) I dont think this would work too well with a N-player continuous world where events really need to be compared using a universal coord system.


Advertisement
The actual operation of node transformations on events is independent of the number of players. So one, or a thousand players, if only a few events occur at zone boundaries, the system works pretty much the same. Problems occur only if many events are crossing zone boundaries within a short time frame.

Arranging the layout of such areas that interesting, congregation-attracting content is near the centre (out of event range of the boundaries) alleviates this problem.

If we know transformations between neighbouring coordinate systems for zones, when an event crosses a boundary it can be implicitly converted to the new coordinate system when placed in the events list / queues for the new zone.

With event comparisons being done within each zone/process, rather than in a centralised place, there is no need for a unified coordinate system, since events and boundaries can be transformed to the appropriate coordinate system as they migrate. There is little overhead to doing this in addition to the network transmission of those events between machines.
Winterdyne Solutions Ltd is recruiting - this thread for details!
Quote:Original post by _winterdyne_
The actual operation of node transformations on events is independent of the number of players. So one, or a thousand players, if only a few events occur at zone boundaries, the system works pretty much the same. Problems occur only if many events are crossing zone boundaries within a short time frame.

Arranging the layout of such areas that interesting, congregation-attracting content is near the centre (out of event range of the boundaries) alleviates this problem.

If we know transformations between neighbouring coordinate systems for zones, when an event crosses a boundary it can be implicitly converted to the new coordinate system when placed in the events list / queues for the new zone.

With event comparisons being done within each zone/process, rather than in a centralised place, there is no need for a unified coordinate system, since events and boundaries can be transformed to the appropriate coordinate system as they migrate. There is little overhead to doing this in addition to the network transmission of those events between machines.




If you can work with those constraints (few events crossing boundries)then it woudnt be a problem. But the idea of load leveling runs contrary if you give the players the freedom to mass in small areas and you have to subdivide the concentration AND you dont want high loss of important events between players (from filtering/culling cross boundry events).
If you do have to send those events, the coordinate conversion does add up to some significant CPU....




Yes, in the situation where the congregation is not in a divisible area (loads of players at the bank), or you have one large congregation in a zone as opposed to several smaller ones, there isn't really a lot you can do, whether or not you have a unified coordinate system. Such areas can be designed to minimize the cross-boundary traffic.

I disagree that the conversion cost of events would be substantial, especially compared to the updates that are running regardless of event migration - even the migration of a complex object's collision hulls only implicitly needs a position and quaternion for orientation ([Edit] and subsequent loading of the appropriate mesh, unless a similar mesh is already in play). Subsequent transformations of hull geometry are the responsibility of the physical simulation layer which would occur anyway.

It's more likely that problems occur from the data associated with an event, especially chat events, which have larger data sizes than most other events, and do not loan themselves well to lossless compression methods (like RLE, for example). I reckon network lag will hit before CPU strain.

Edit: Spelling.

[Edited by - _winterdyne_ on October 14, 2005 9:43:28 AM]
Winterdyne Solutions Ltd is recruiting - this thread for details!
Quote:I also assume you are generally dealing with a 2D world (as far as zones are concerned).


No, it's a full 3D earth-sized planet. One of the modifications in the quadtree is to make it map to a full sphere, although it does wedges centered on the center of the planet. We could easily use an octree instead; the specifics don't matter that much.

We use double precision for all physics, so we don't have to worry about the sliding scale. That way, we can cover an area from +/-8,000,000 meters without worry. Which means we'd have to switch to another coordinate system if you traveled to the moon. We have the technology to support that, but haven't had the need to implement it.

Regarding events crossing zone boundaries, you can make the observation that all events are either: 1) generated by predictable server-side algorithms or 2) generated by unpredictable users. Then you can build the entire system to make sure that both sides see the same thing, without necessarily having to send anything across at all. If you are really gung-ho on the gory details, then please apply for our open jobs :-)
enum Bool { True, False, FileNotFound };
Quote:Original post by _winterdyne_
Yes, in the situation where the congregation is not in a divisible area (loads of players at the bank), or you have one large congregation in a zone as opposed to several smaller ones, there isn't really a lot you can do, whether or not you have a unified coordinate system. Such areas can be designed to minimize the cross-boundary traffic.

I disagree that the conversion cost of events would be substantial, especially compared to the updates that are running regardless of event migration - even the migration of a complex object's collision hulls only implicitly needs a position and quaternion for orientation ([Edit] and subsequent loading of the appropriate mesh, unless a similar mesh is already in play). Subsequent transformations of hull geometry are the responsibility of the physical simulation layer which would occur anyway.

It's more likely that problems occur from the data associated with an event, especially chat events, which have larger data sizes than most other events, and do not loan themselves well to lossless compression methods (like RLE, for example). I reckon network lag will hit before CPU strain.

Edit: Spelling.




I was thinking of you having continuous floods of events (ie- position updates) crossing seamless boundries and traversing your hierarchy (and having to be translated numerous times as it spread to many adjacent areas). But if you can eliminate cases of seeing objects at long Line Of Sight (etc) from a players viewpoint (thus having few cross boundry cases) then the problem pretty much goes away.


The loading delay problem calls for a preload threshold to pull data into memory ahead of need. This buffer area then becomes its own headache with predicting and prioritizing when data needs to be preloaded across machine boundries (or complex hulls etc.. precalculated ) before the actual transition between zones takes place. Default place holder data is handy when the transition happens before the data transfer is completed...

Quote:Original post by hplus0603
Quote:I also assume you are generally dealing with a 2D world (as far as zones are concerned).


No, it's a full 3D earth-sized planet. One of the modifications in the quadtree is to make it map to a full sphere, although it does wedges centered on the center of the planet. We could easily use an octree instead; the specifics don't matter that much.

We use double precision for all physics, so we don't have to worry about the sliding scale. That way, we can cover an area from +/-8,000,000 meters without worry. Which means we'd have to switch to another coordinate system if you traveled to the moon. We have the technology to support that, but haven't had the need to implement it.

Regarding events crossing zone boundaries, you can make the observation that all events are either: 1) generated by predictable server-side algorithms or 2) generated by unpredictable users. Then you can build the entire system to make sure that both sides see the same thing, without necessarily having to send anything across at all. If you are really gung-ho on the gory details, then please apply for our open jobs :-)



"generated by predictable server-side algorithms" ... A neat trick especially if you have complex NPC AI (one- making it deterministic AND two- duplicating the AI processing in all adjacent areas).

Example of 'complex' -- NPCs react to each other as well as to the player(s), so you would need to process the AI for all the other NPCs it interacts with further away from the boundry, and react to other player's actions that might be out of range of the original player -- again maybe further inside that adjacent area (and maybe across futher boundries in a wonderful cascade of reaction dependencies....)




Quote:I was thinking of you having continuous floods of events (ie- position updates) crossing seamless boundries and traversing your hierarchy (and having to be translated numerous times as it spread to many adjacent areas). But if you can eliminate cases of seeing objects at long Line Of Sight (etc) from a players viewpoint (thus having few cross boundry cases) then the problem pretty much goes away.


My PORs can specify what event types they can 'pass along'. For example, designing an anteroom where the entrance and exit do not share a common line of sight can be used to curtail boundary traffic. In most cases event migration is handled by parent POR - this prevents looping in migration paths.

Again, this is two parts layout, and on part architecture - the POR mechanism is not designed for very high activity adjacencies - where possible any hierarchy splitting must be done at low activity adjacencies.

Quote:
The loading delay problem calls for a preload threshold to pull data into memory ahead of need. This buffer area then becomes its own headache with predicting and prioritizing when data needs to be preloaded across machine boundries (or complex hulls etc.. precalculated ) before the actual transition between zones takes place.


My libraries maintain two discrete levels of simulation - coarse, and fine.

Coarse simulation abstracts away fine route finding (falling back to predesigned coarse routes (less nodes), large amounts of physics, actual interactions, and is used to simulate world behaviour when a client is not aware of it. A server's entire domain is modelled at this level. (NPC) Entity migration can occur at this level for special circumstances, but the NPC data itself is not required.

Fine simulation is the interactive level, where PORs that contain client entities, and PORs that are potentially relevant to clients, as specified by the relevance graph (and manually definable Potentially Relevant Sets for each POR) are modelled. This is the level where real collision detection and entity movement (fine detail route maps) etc. are handled. Asset management runs on recent use stats, with items used infrequently or a long time ago being unloaded first.

Part of the load process is the conversion of abstract data stored in the coarse simulation (say NPC position-scale along a coarse node map link) to finer detail simulation data (fine node map position) and finally to actual physical position of the entity.

Quote:
Default place holder data is handy when the transition happens before the data transfer is completed...


Actually I use placeholders for a lot more besides - including asset substitution on the Client, if or load of asset is pending on the client, we can define that any sword might use a default sword item.

Using placeholders for any critical event on the server is not a great idea. It's more sensible to use common data - for example every creature has a similar collision hull, altered only by scale, position, orientation, whatever - it's then easier to ensure the server remains authoritative.
Winterdyne Solutions Ltd is recruiting - this thread for details!


I have a similar abstract to real LOD mechanism in my project. My areas/zones are a regular grid (instead of variable sized/shaped). I rollin the high detail data on the server when a player is nearby (7x7 areas around each client). The client is streamed the terrain mesh data as the client moves (it does caching locally to minimize resends). Distant terrain is simplified and sent to fill in the horizon (15x15 areas). The worlds terrain is actually computer generated -- using a variation of 'splatting' for the meshes, templates for significant features like coastlines/rivers/mountain ridges and a large master seed pattern (various infinite world type mechanisms rarely generate any realisticly coherant terrain -- especially when you have complex patterns of inhabitant placement and behaviors).

Additional terrain in the form of structural objects (including mobiles/buildings) and 'tunnel' type meshes for interiors/caves etc... Theses sit on the primary terrain and can overlap grid boundries and can themselves be navigated on (so I have a 2 tiered hierarchy) with similar event transmission and boundry crossing problems (and local coordinates conversions for the mobiles).


Even the NPCs are split between 'Significant' objects (with high AI ability and strategic activity in 'abstract mode') and 'Lackeys' which are used for seed driven encounters and local filler populations (the lackey term came from their use to fill out a group that get filled in/expanded/realized when a significant object is a group and is near a player).


The design includes multiple servers to run the active area/zones (assigned in groupings to minimize cross server event transfer and to coordinate NPC community behaviors. Seperate servers run the NPC AI (acting much like clients themselves),here is where I expect to need more flexibility in load leveling.

The most interesting part of the project is going to be automatic quest/mission generation, where a hierarchy of parameterized templates are used to give a wide variety of different problems for the player. A mission scenario would have prologue code to patch in/adapt to the location/environment that it is placed in. When a player triggers a scenario, various props (clues, targets, obstacles) are placed appropriately. The scenarios can chain -- additional seeds can be placed in the vicinity (and only activate if the player gets near them). Yet another abstract/real mechanism to minimize server residency as the scenario details get rolled out of memory if the player walks away....


Yes, I have a similar mechanism for quest generation - I think this is the way a lot of projects will (sensibly) go for small team, or large gameworld projects- manually scripting such quests and events is labour intensive, and that's something the small teams can't afford.

In the case of trying to implement systems like this, a very structured set of abstractions have to be used - in effect we end up writing not just one MMOG, but several, as events have to be mirrored or handled at several layers of abstraction.

It remains to be seen whether a product implementing this kind of system succeeds in presenting a believable world - there's a discussion in the game design forum (I believe) about random events in RPGs. Using randomly generated plotlines can always lead to a contrived feel, and avoiding this is the holy grail as far as procedural content generation is concerned.

It also remains to be seen whether the efforts we put into designing these systems are rewarded by something playable at the end.
Winterdyne Solutions Ltd is recruiting - this thread for details!

This topic is closed to new replies.

Advertisement