
Synchronizing time over clustered simulation



#1 fholm

Posted 13 December 2011 - 05:53 PM

So, I've been messing around with clustering my simulation, and one issue has come up: synchronizing simulation steps and game time between the different nodes in the cluster. Is this even something you should do, or should the nodes all run their game time completely independently?

One issue I can see arises when an actor crosses a zone boundary and you need to transfer it from one simulation node to another, since there is usually a lot of time-dependent state associated with an actor, for example spell cooldowns in an RPG. The way I see it going down is this:

  • The currently owning node packs the whole actor state into a binary format
  • It gets sent to the node that is to take ownership of the actor
  • The new node unpacks the state and continues the simulation
Now, ignoring any time-related issues with cooldowns, AI walking along a path over a set time, etc., this seems like it could work pretty well.
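A minimal sketch of the three steps above (pack, send, unpack), assuming a hypothetical actor with an id, a 2D position and a table of per-ability cooldowns in seconds remaining. The `Actor` fields and the little-endian wire layout are illustrative only, and the transport (step 2) is left out.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical wire layout: actor id (u64), x, y (f32), cooldown count (u16),
# then one (ability id u32, seconds remaining f32) pair per cooldown.
HEADER = struct.Struct("<QffH")
COOLDOWN = struct.Struct("<If")

@dataclass
class Actor:
    actor_id: int
    x: float
    y: float
    cooldowns: dict = field(default_factory=dict)  # ability id -> seconds remaining

def pack_actor(actor: Actor) -> bytes:
    """Step 1: the owning node freezes the actor into a binary blob."""
    parts = [HEADER.pack(actor.actor_id, actor.x, actor.y, len(actor.cooldowns))]
    for ability_id, remaining in actor.cooldowns.items():
        parts.append(COOLDOWN.pack(ability_id, remaining))
    return b"".join(parts)

def unpack_actor(blob: bytes) -> Actor:
    """Step 3: the new owner rebuilds the actor and resumes simulating it."""
    actor_id, x, y, count = HEADER.unpack_from(blob, 0)
    offset = HEADER.size
    cooldowns = {}
    for _ in range(count):
        ability_id, remaining = COOLDOWN.unpack_from(blob, offset)
        cooldowns[ability_id] = remaining
        offset += COOLDOWN.size
    return Actor(actor_id, x, y, cooldowns)
```

Storing cooldowns as "seconds remaining" is exactly what makes the time question matter: the receiving node has to know how much time passed while the blob was in transit. Note that f32 fields lose precision for values that are not exactly representable.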

One way I envisioned solving the time/step issue is to have a "master" server that pulses all simulation nodes once per simulation step and sends the current "game time" (which this server also controls) with each pulse. When an actor is packed up for transfer to another simulation node, you could include the last step/game time at which that actor ran a simulation step. When it is unpacked on the other side and the next simulation step happens, you can then calculate an actor-specific delta time for that first step (both nodes get their game time from the same master/time/step server), so the actor can compensate for the potentially "lost time".
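A sketch of the pulse-driven scheme described above, assuming hypothetical `Pulse`, `SimActor` and `SimNode` types. Each actor remembers the game time of the last step it was simulated, so its first step on the new node gets a longer delta that covers the transfer gap.

```python
from dataclasses import dataclass

@dataclass
class Pulse:
    step: int         # global step number, chosen by the master
    game_time: float  # global game time in seconds, chosen by the master

@dataclass
class SimActor:
    actor_id: int
    last_sim_time: float  # game time of the last step this actor ran

class SimNode:
    """Hypothetical simulation node that only advances on master pulses."""

    def __init__(self) -> None:
        self.actors = {}  # actor id -> SimActor

    def adopt(self, actor: SimActor) -> None:
        # A transferred actor keeps the game time of its last step on the old node.
        self.actors[actor.actor_id] = actor

    def on_pulse(self, pulse: Pulse) -> dict:
        """Step every actor; return the delta time each one was given."""
        deltas = {}
        for actor in self.actors.values():
            # Usually one step's worth of time; larger for a freshly adopted
            # actor, covering the time "lost" while it was in transit.
            dt = pulse.game_time - actor.last_sim_time
            # ... the actor's own update(dt) would run here ...
            deltas[actor.actor_id] = dt
            actor.last_sim_time = pulse.game_time
        return deltas
```

With pulses every 0.1 s, an actor adopted with `last_sim_time = 9.8` gets a delta of about 0.3 s on its first step at game time 10.1, and 0.1 s per step afterwards.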

Another solution would be to put all the time-related stuff, such as buff timers or cooldowns, on a specific server, somehow calculate everything there, and pull the time-dependent state from that server down to the actor, no matter which node it is currently running on.
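One way to read this idea, as a sketch: a hypothetical central `CooldownService` stores each cooldown as an absolute expiry in global game time rather than as a countdown, so any node can ask how much is left regardless of where the actor currently runs. This still relies on all nodes agreeing on the current game time.

```python
class CooldownService:
    """Hypothetical central store of cooldowns, keyed by actor and ability."""

    def __init__(self) -> None:
        # (actor id, ability id) -> global game time at which it is ready again
        self._expiry = {}

    def start(self, actor_id: int, ability_id: int, duration: float, now: float) -> None:
        # Store an absolute expiry, not a countdown, so no node ever has to
        # tick it down; transfers between nodes leave it untouched.
        self._expiry[(actor_id, ability_id)] = now + duration

    def remaining(self, actor_id: int, ability_id: int, now: float) -> float:
        expiry = self._expiry.get((actor_id, ability_id))
        if expiry is None:
            return 0.0
        return max(0.0, expiry - now)

    def ready(self, actor_id: int, ability_id: int, now: float) -> bool:
        return self.remaining(actor_id, ability_id, now) == 0.0
```

The design choice is that only `now` has to be consistent across nodes; the cooldown data itself never needs adjusting when an actor moves.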

Someone with more insight into this probably has a much better solution, but this is how I envision it working. Or maybe an actor should never switch simulation node/server? But I don't see that working, since a world might be very big: one node would have to hold data for any point of the world (pathfinding, spawn points, etc.), and synchronizing a zone would be a nightmare if its actors were spread over too many simulation nodes.

#2 hplus0603

Posted 14 December 2011 - 12:27 PM

fholm wrote: So, I've been messing around with clustering my simulation, and one issue has come up: synchronizing simulation steps and game time between the different nodes in the cluster. Is this even something you should do, or should the nodes all run their game time completely independently?


If the zoning is "seamless" to the client, then you need to keep the game step counters synchronized between servers. Luckily, servers are generally grouped tightly together, so they will have excellent clock correlation -- you could derive step number from global time, and use NTP to synchronize the time aggressively, for example.
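Deriving the step number from global time can be as simple as the sketch below, assuming a shared cluster epoch and a fixed step length (both values are placeholders). Two servers whose clocks agree to within a small fraction of a step compute the same step number, except right at a step boundary.

```python
import math

STEP_SECONDS = 0.05      # placeholder: 20 Hz simulation rate
EPOCH = 1_700_000_000.0  # placeholder: agreed cluster start time, Unix seconds

def step_for_time(now: float, epoch: float = EPOCH, step_seconds: float = STEP_SECONDS) -> int:
    """Step number every server agrees on, given closely synchronized clocks."""
    return math.floor((now - epoch) / step_seconds)

def time_of_step(step: int, epoch: float = EPOCH, step_seconds: float = STEP_SECONDS) -> float:
    """Wall-clock time at which the given step starts."""
    return epoch + step * step_seconds
```

A node would typically call `step_for_time(time.time())` in its main loop and simulate whenever the result advances.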

When it comes to clock management in simulations in general, you want to establish a global ordering of events, but the times don't have to be perfectly synchronized as long as all events happen in the same order everywhere. Better clock sync between client and server typically leads to a slight reduction in latency jitter, because the calculated latency will be accurate; in most cases this is a much smaller variation than what you get from the transmission itself, or from the time step size, so it usually doesn't matter.
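A sketch of one common way to get a global ordering without perfect clocks, assuming each event is tagged with the step it belongs to, the id of the node that produced it and a per-node sequence number (all hypothetical fields). Sorting on that tuple gives every server the same order regardless of arrival order; the node-id tie-break is arbitrary but identical everywhere.

```python
from dataclasses import dataclass, field

@dataclass(order=True, frozen=True)
class Event:
    step: int      # simulation step the event belongs to
    node_id: int   # producing node; deterministic tie-break within a step
    seq: int       # per-node counter; preserves each node's own order
    payload: str = field(compare=False)  # not part of the ordering

def apply_in_global_order(events):
    """Return payloads in the order every server would apply them."""
    return [e.payload for e in sorted(events)]
```

Two servers receiving the same events in different orders still apply them identically.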
enum Bool { True, False, FileNotFound };

#3 fholm

Posted 14 December 2011 - 12:58 PM


From your response I take it that using a server that pulses all nodes is not a good solution? I know you're really good at this stuff, so I don't really want to argue with you, but: what about NTP? Is it accurate enough? Especially on Windows: the Wikipedia page about NTP says "However, the Windows Time Service cannot maintain the system time more accurately than about a 1-2 second range", which is obviously not accurate enough at all. And while doing some research on NTP, the general statement on its accuracy usually is "it depends". Even on Linux/Unix systems the accuracy is at best a couple of ms.


I don't know if my idea of using a "time" server that pulses all simulation nodes on each tick and attaches the correct time to the pulse (and then uses a leaky integrator on each node to calculate the current game time) is a good idea either, as it will hardly be more accurate than NTP.
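A sketch of the leaky-integrator idea, assuming each pulse carries the master's game time and the node reads its own local clock when the pulse arrives. The node smooths the offset between the two with an exponential moving average (`alpha` is a hypothetical tuning parameter) instead of jumping on every pulse.

```python
class GameClock:
    """Hypothetical per-node estimate of the master's game time."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha   # 0 < alpha <= 1; smaller = smoother but slower to react
        self.offset = None   # estimated (master game time - local clock)

    def on_pulse(self, master_time: float, local_now: float) -> None:
        sample = master_time - local_now
        if self.offset is None:
            self.offset = sample  # first pulse: take it as-is
        else:
            # Leaky integrator: move a fraction alpha toward the new sample.
            self.offset += self.alpha * (sample - self.offset)

    def game_time(self, local_now: float) -> float:
        if self.offset is None:
            raise RuntimeError("no pulse received yet")
        return local_now + self.offset
```

A single late pulse only moves the estimate by `alpha` times its delay, but a constant one-way network delay is never removed, which is consistent with the concern that this will hardly beat NTP.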

#4 hplus0603

Posted 15 December 2011 - 10:43 AM

fholm wrote: From your response I take it that using a server that pulses all nodes is not a good solution? I know you're really good at this stuff, so I don't really want to argue with you, but: what about NTP? Is it accurate enough?


Note that those NTP accuracy figures refer to synchronization over long-distance connections. NTP within a data center, with a local NTP server that every machine synchronizes to, should achieve better than millisecond precision.
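For reference, this is the offset and round-trip delay calculation NTP performs on each exchange (RFC 5905), using four timestamps: two read on the client and two on the server. The offset estimate assumes the path is equally fast in both directions; its error is bounded by half the round-trip delay, which is why sub-millisecond results are plausible on a LAN with a local server.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """t0: client send, t1: server receive, t2: server send, t3: client receive.
    t0 and t3 are read on the client clock; t1 and t2 on the server clock.
    Returns (server clock minus client clock, network round-trip delay)."""
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)  # total time minus server processing time
    return offset, delay
```

For example, with timestamps in microseconds, a client 5000 µs behind the server, 200 µs each way and 100 µs of server processing gives `ntp_offset_and_delay(1000, 6200, 6300, 1500) == (5000.0, 400)`.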

A single "pulse" server can work, but I see two weaknesses:

1) You introduce a single point of failure -- when this service dies, your entire cluster stops
2) If the pulse broadcast packet gets dropped or delayed, then some arbitrary set of your servers won't make the time step
If instead you pulse each server using a separate TCP connection, then you have the TCP drop/re-transmit jitter to worry about. No data center is 100% packet drop free; making it so would require the input buffering and processing capacity of each individual node to grow with the size of the total data center.
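A sketch of how a node can avoid depending on any single pulse, building on the earlier suggestion to derive the step number from synchronized global time: each node polls its own clock and runs every step it has fallen behind on. `StepDriver`, the epoch and the step length are hypothetical.

```python
import math

class StepDriver:
    """Hypothetical node loop driven by its own (NTP-synchronized) clock."""

    def __init__(self, epoch: float, step_seconds: float) -> None:
        self.epoch = epoch
        self.step_seconds = step_seconds
        self.current_step = None  # last step this node simulated

    def poll(self, now: float) -> list:
        """Return the step numbers to simulate now, in order."""
        target = math.floor((now - self.epoch) / self.step_seconds)
        if self.current_step is None:
            self.current_step = target - 1  # start with the current step
        # Catch up on every missed step; nothing waits for a lost packet.
        to_run = list(range(self.current_step + 1, target + 1))
        if to_run:
            self.current_step = target
        return to_run
```

If the local clock is stepped backwards, `poll` simply returns nothing until time catches up, so a step is never simulated twice.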

#5 wodinoneeye

Posted 26 December 2011 - 11:38 AM

When your object is packaged (and frozen), record the simulation time on that server. On reception of the object lump, record the receiver's local simulation time, and when it's unpacked, prorate the difference onto all the timed effects (cooldowns, etc.), making sure past-due cases are handled (if you have any really significant delays in this transfer). The game type and game mechanics will decide how 'damaging' one hiccup is to the players' game experience.
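A minimal sketch of the prorating step, assuming timers are stored as seconds remaining and that the sender's and receiver's simulation clocks are kept close (as described next). `prorate_transfer` and the timer names are hypothetical; past-due effects are returned separately so their expiry logic can run immediately.

```python
def prorate_transfer(timers: dict, sender_time: float, receiver_time: float):
    """timers: effect name -> seconds remaining when frozen on the sender.
    Returns (still_running, past_due) after charging the transfer time."""
    # Never add time back if the receiver's clock reads slightly behind.
    elapsed = max(0.0, receiver_time - sender_time)
    still_running = {}
    past_due = []
    for name, remaining in timers.items():
        left = remaining - elapsed
        if left > 0:
            still_running[name] = left
        else:
            past_due.append(name)  # expired while in transit
    return still_running, past_due
```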

Keep the simulation clocks close between clustered servers with one (current) master clock on a node, which can fall over to another node on failure (actually a whole sequential list of who takes over when the primary (predecessor) stops sending/fails). It's all relative time; it has nothing really to do with real-world time.
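A sketch of the succession-list idea, assuming every node tracks when it last heard a clock heartbeat from each candidate (`current_master`, the heartbeat table and the timeout are all hypothetical). The current master is simply the first node in the agreed list that is still sending. Nodes with different views of the heartbeats (for example during a network partition) could pick different masters; this sketch does not handle that.

```python
def current_master(succession, last_heartbeat, now, timeout):
    """succession: node ids in takeover order, primary first.
    last_heartbeat: node id -> local time its clock heartbeat was last seen.
    Returns the first node in the list that is still sending, or None."""
    for node_id in succession:
        seen = last_heartbeat.get(node_id)
        if seen is not None and now - seen <= timeout:
            return node_id
    return None
```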

How close do they have to be? (perceived irregularities)

What will a player notice? Will they be able to discern a 1/10th-second-late indicator showing up on their client end (visually), when the transmission itself has a wider variance from surging internet data?

All kinds of events get delayed, and most MMORPGs have very loose visual cues matched to the results of player commands and actions taken by NPCs and other players.

A LAN-based fighting game needs to be tighter timewise, but since you refer to clusters, that is another subject.
--------------------------------------------Ratings are Opinion, not Fact

#6 AllEightUp

Posted 08 January 2012 - 07:24 PM




The way I've always dealt with this is to basically "not" deal with it at all, except through side effects. What I mean is that I don't try to get fancy with time synchronization: I only send a couple of pings around the cluster about once a minute to get a general idea of whether any box's clock is drifting, and if needed send a couple of extra correction ping/pongs to fix anything that drifts too far off the master server's clock. That's as far as I go with time correction. The part that fixes transfers by side effect is how I transfer entities across boundaries. I don't do it all at once; I start proxying the object at a set distance, so that most data is already on the proxy when/if ownership actually transfers. Some of the benefits of this are tied to how my system works in general, so they may not apply, but here's the general outline:
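A sketch of the once-a-minute drift check, assuming the master timestamps each ping when sending and receiving, and the node replies with its own clock reading. This uses a Cristian-style estimate (the node is assumed to answer mid-way through the round trip) and keeps only the fastest round trip out of a few pings; the function names and the threshold are hypothetical.

```python
def node_offset(samples):
    # Each sample is (t_send, node_clock, t_recv): t_send and t_recv are read
    # on the master's clock, node_clock is the node's clock when it answered.
    # Use the fastest round trip, since it carries the least queuing noise.
    t_send, node_clock, t_recv = min(samples, key=lambda s: s[2] - s[0])
    # Assume the node answered half-way through the round trip.
    return node_clock - (t_send + t_recv) / 2

def nodes_to_correct(samples_by_node, threshold):
    """Return {node: estimated offset} for nodes drifting past the threshold."""
    flagged = {}
    for node, samples in samples_by_node.items():
        offset = node_offset(samples)
        if abs(offset) > threshold:
            flagged[node] = offset
    return flagged
```

Only the flagged nodes would then get the extra correction ping/pongs.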
  • Send an entity ID to the node where we want to proxy the object, nothing more. The node builds the proxy and fills in some basic data such as model, textures, inventory, etc. by reading from the DB. This warms up the DBCache layer so the data is ready for use when/if the proxy becomes the master.
  • Add the proxy as a viewer of the master object, basically identical to adding a player in visible range. The normal correction systems take over to fill in position, orientation, current animations, etc. The only difference from another player is that error tightening is based on the distance to the plane between nodes, i.e. I won't update the proxy very often if the player is just running parallel to the plane, and only update more often as they get closer to the plane and thus the hand-off. (There is a zone at about 1/4th of the distance where updates are duplicated as fast as possible; this is the defined "combat" range, i.e. if players from that node and this node are in the zone, they can directly affect each other and thus need to be updated quite quickly.)
  • At the point of hand-off, things are only slightly complicated by latency: the proxy could be forwarding messages to the old master while the "take ownership" message is in flight. This is actually easily solved: a single bit in the message says "I was or was not the master" when I sent this message, so any in-flight messages that arrive after the "you own it now" message was sent just get bounced back to the master. This does mean a few messages take up a bit more bandwidth than normal, but that is trivial in the overall picture. Especially when you do this sort of thing in a corner between 4 nodes, fun ensues.
  • Any slight differences are generally hidden in network latency to the client, and unless something goes horribly wrong (as in a take-down-the-cluster type of error) this just works without very much complexity. Getting the proxy system working well is the key item; that can be a little tricky, but not much more than general player-to-player visibility and awareness systems. Just think of the other node as another player for the most part: all players owned by that node will send messages to the proxy, which bounces them to the master, and only messages from the master directly affect the proxy.
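A sketch of two pieces of the outline, with hypothetical names throughout. First, routing on the "was master" bit: a copy acts on a message only if it is the master or the message came from the master, and bounces everything else to its current master (including late messages reaching an old master after hand-off). Second, a proxy update interval that shrinks as the entity approaches the boundary plane. The linear ramp between the combat zone and the proxy distance is an assumption; the post only says updates get more frequent closer to the plane.

```python
from dataclasses import dataclass

@dataclass
class Msg:
    entity_id: int
    body: str
    from_master: bool  # the "I was / was not the master when I sent this" bit

class EntityCopy:
    """Hypothetical per-node copy of one entity: the master or a proxy."""

    def __init__(self, entity_id: int, is_master: bool, master_node: str) -> None:
        self.entity_id = entity_id
        self.is_master = is_master
        self.master_node = master_node  # where non-master input is bounced
        self.applied = []   # message bodies this copy acted on
        self.bounced = []   # (destination node, message) pairs sent onward

    def receive(self, msg: Msg) -> None:
        if self.is_master or msg.from_master:
            # We own the entity, or this is authoritative state from the owner.
            self.applied.append(msg.body)
        else:
            # A proxy (or a former master) never acts on other input itself.
            self.bounced.append((self.master_node, msg))

    def hand_off(self, new_master_node: str) -> None:
        # Old master after sending "you own it now": becomes a proxy and
        # bounces any late, in-flight messages to the new owner.
        self.is_master = False
        self.master_node = new_master_node

    def take_ownership(self) -> None:
        self.is_master = True

def update_interval(distance, proxy_range, combat_range, fastest, slowest):
    """Seconds between proxy updates, given distance to the boundary plane."""
    if distance <= combat_range:
        return fastest  # combat zone: update as fast as possible
    if distance >= proxy_range:
        return slowest
    # Assumed linear ramp between the combat zone and the proxy distance.
    t = (distance - combat_range) / (proxy_range - combat_range)
    return fastest + t * (slowest - fastest)
```

In the post's terms, `combat_range` would be about a quarter of `proxy_range`.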
Now, my real work uses a much older style of server, which works but has its own share of nightmares as we scale up... This is a number of years of individual effort to put together all the bits and pieces to make a complete framework based on best practices learned from a number of multi/massive player games I've worked on. The current one is keeping me extremely busy, hence the lack of posts for a while.



