Lockstep games - waiting and synching worlds

Hi
In a lockstep multiplayer RTS I advance the world one step once I've received the commands for that turn from every connected player. But if I haven't received them all yet I have to wait (stop all simulation) until they come in.

My question is: when I hit a "must wait" situation, should I stop all simulation, or should all updating that affects the world only be done in discrete steps (i.e. when the lockstep is completed and the world moves forward one step)? Would everything in between be interpolated and just for show?

For example unit position. If my lockstep is 100ms I cannot simply move units in the actual lockstep; they would then move at 10fps, which will look bad. But should I do all calculation this way and then have some local offset for each unit (updated at the frame rate) that is only for drawing and not connected to the calculations?

This could give me a lot of extra work.

Is it common to have "ministeps" within the real locksteps? For example, always allow 10 ministeps for each real step: update position ten times and then stop. When a lockstep completes, 10 more ministeps become available. If a wait is required, all 10 ministeps are used up, so simulation stops anyway. This way each client performs its 10 ministeps each real turn and nothing more, and each ministep is deterministic because it uses the commands from the last real step (which are the same for all clients, otherwise that step wouldn't have been triggered in the first place).

Am I totally out of line here or what? :)
Please confirm or debunk my theories.
Erik
Yes, you run the simulation in discrete steps -- say, every 100 ms, or every 16 ms, or whatever. To display units moving smoothly, you render and animate at the display frame rate and interpolate between simulation steps.
Clients must send the commands for the game in sufficient time that they are generally available when the time comes.
Typically, this means you'll have something like 500 ms delay from sending a command, until it takes effect. Usually, you hide this in game design with an acknowledgement animation -- send the command right away; start playing the acknowledgement animation; by the time the animation finishes, you've received the command back for the right turn, and it's started to take effect.
Note that this means that you don't "act on" your own commands in the simulation, until you get them back from the server together with everyone else's commands for that turn.
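
A minimal sketch of that kind of loop, with hypothetical placeholders (World, renderAt, allCommandsArrived) standing in for your own types:

```cpp
// Fixed-timestep simulation with render-side interpolation. The deterministic
// world only advances in whole 100 ms lockstep turns, and only when every
// player's commands for the turn have arrived; rendering runs at full frame
// rate. World, renderAt and allCommandsArrived are hypothetical placeholders.
#include <chrono>

constexpr double STEP_MS = 100.0;   // lockstep turn length

struct World {
    bool running() const;            // placeholder: game still active?
    bool allCommandsArrived() const; // placeholder: commands in for next turn?
    void step();                     // deterministic turn update
};
void renderAt(const World&, double alpha);  // placeholder renderer

void gameLoop(World& world) {
    using clock = std::chrono::steady_clock;
    auto prev = clock::now();
    double accumulator = 0.0;

    while (world.running()) {
        auto now = clock::now();
        accumulator += std::chrono::duration<double, std::milli>(now - prev).count();
        prev = now;

        // Advance the simulation only in whole steps, and only when the
        // turn's commands are complete -- otherwise we simply stop stepping
        // (the "must wait" case) while rendering keeps going.
        while (accumulator >= STEP_MS && world.allCommandsArrived()) {
            world.step();
            accumulator -= STEP_MS;
        }

        // alpha in [0,1]: how far we are into the current turn. The renderer
        // draws each unit at prevPos + alpha * (curPos - prevPos).
        double alpha = accumulator / STEP_MS;
        if (alpha > 1.0) alpha = 1.0;   // clamped while waiting on the network
        renderAt(world, alpha);
    }
}
```
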
enum Bool { True, False, FileNotFound };
I would say that nowadays the way to go is async. C++11 and boost::asio make it so easy to write a multithreaded game server with the minimum amount of locking and without compromising readability.

For your simulation, look into the reactor pattern and using a priority_queue to manage your upcoming events.
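
For instance, a bare-bones event scheduler along those lines might look like this (all names here are illustrative, not from boost::asio or any particular library):

```cpp
// Scheduled-event queue using std::priority_queue: events fire in dueTick
// order, which keeps the execution order deterministic across machines.
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

struct Event {
    uint64_t dueTick;                 // simulation tick when it fires
    std::function<void()> action;
};

struct EarliestOnTop {                // makes the priority_queue a min-heap
    bool operator()(const Event& a, const Event& b) const {
        return a.dueTick > b.dueTick;
    }
};

class EventQueue {
    std::priority_queue<Event, std::vector<Event>, EarliestOnTop> events_;
public:
    void schedule(uint64_t tick, std::function<void()> fn) {
        events_.push({tick, std::move(fn)});
    }
    // Run everything due at or before the current tick.
    void runDue(uint64_t currentTick) {
        while (!events_.empty() && events_.top().dueTick <= currentTick) {
            Event ev = events_.top();
            events_.pop();
            ev.action();
        }
    }
};
```

Each deterministic turn would then call runDue(currentTick) as part of its update.
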
Why as long as a 500ms delay? Would I need more than the ping time to send to the server? And then 100ms should be enough, right?

But why not let each client send to each other client directly (max 8 players) instead of sending to the server and having the server send to everyone? Should save time, no? This is called peer-to-peer, I guess, if I understand correctly.
> But why not let each client send to each other client directly (max 8 players) instead of sending to the server and having the server send to everyone? Should save time, no? This is called peer-to-peer, I guess, if I understand correctly.
Peer-to-peer is nontrivial to get right and requires a complete rewrite (if you've already got a client-server thing up and running). Besides, it isn't very suited to games at all, as peer-to-peer connections tend to be very volatile (and there's the question of who owns the "true" game state, etc...). Put bluntly, it just isn't that simple.

> Why as long as a 500ms delay? Would I need more than the ping time to send to the server? And then 100ms should be enough, right?
You could make it a setting in your client, but never rely on a particular latency. Some people live next door to the server and get 2ms latency, whereas others play overseas and can get 350+ milliseconds of ping. For FPSes I think the worst you can aim for is 450 or so, at which point the game becomes pretty much unplayable, but I'm not sure what the limit would be for RTSes. It depends on how fast-paced your game is, for instance.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”


> Why as long as a 500ms delay? Would I need more than the ping time to send to the server? And then 100ms should be enough, right?


First, you need the round-trip delay, because the data needs to get to the server and then back for you to have all players' data to start a step.
Second, if you think 100 ms is sufficient to cover all client-to-server network delays, you're in for a rude awakening :-) Even 250 ms one-way may not be enough, especially with players using cell phones as portable hot spots, people in Australia playing on Finnish servers, players torrenting Battlefield while their girlfriend watches Downton Abbey on Netflix, and the internet weather being bad.

Modern games will actually just adjust based on the delay they see from the server. However, the game animations need to be designed to mask the longest reasonable delay.


> But why not let each client send to each other client directly (max 8 players) instead of sending to the server and having the server send to everyone? Should save time, no? This is called peer-to-peer, I guess, if I understand correctly.

Fully connected peer-to-peer turns out to have all kinds of problems that aren't immediately apparent. A project that starts out that way usually doesn't end well. Anything from NAT problems to netsplits to saturating player upstream bandwidth, or just plain code complexity, will become an impediment to this architecture.
If you really want to go that way, you should search this forum for old peer-to-peer threads.
enum Bool { True, False, FileNotFound };
Thanks for the help guys, I have a server-client setup that seems to work now. Next problem is random numbers. I set the random seed to the same value across all worlds at startup and it seems to ALMOST work.

Client and server get the same random numbers at first but later go out of sync (if I run 1 server and 5 clients, all clients will always be identical, but the server will drift out of sync in its random numbers).

But I don't know what is wrong... If I skip random numbers they never go out of sync, even if I add commands that depend on all worlds being at the same lockstepTurnNumber.

What could I have missed? (And is this how I am supposed to deal with random numbers? I need a lot of them.) Is it better to set the random seed every turn (sending it from the server each multiplayer tick)?
The server is supposed to generate random numbers for each client individually and send the numbers to them. The clients should *never* be trusted to generate their own numbers. Why do you need to synchronize random numbers? Generally the server does all the work and then returns the results (which may include the random numbers in a readable form, say, a throw of dice). The client shouldn't actually have to manipulate the random numbers at all, except perhaps for recording purposes.

If you can guarantee that your clients will never attempt to cheat, then you can have the server send each client a random seed, as you do, so that the burden of generating the random numbers (which isn't much of a burden, really) is transferred to the client. But this means clients ultimately have to send the results back so that you can update your server's game state, so I don't really see the point.

There are pseudorandom number generator constructs that are tolerant to desynchronization (some are even self-correcting) but I'm not sure you actually need to do it like this. Often the simplest approach works best (and is easiest to replace when the time comes, too).

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

What makes a game desync in lockstep systems?

  • Random numbers. As a rule, I don't even use rand() and srand(). I write my own RNG; they're really trivial to make, unless the RNG must be used for encryption or other sensitive applications, which isn't your case (Google linear congruential generators, or the Mersenne Twister). The trouble with rand() is that it keeps one seed per thread, and you don't truly control what happens inside. Don't forget to seed each machine properly, and of course, call your rand() equivalent the same number of times on each machine. Writing your own RNG also lets you include debug logs that warn you when one PC called rand() more often than it should (a minimal sketch follows this list).
  • Uninitialized memory. Painful to find, painful to debug. Once you have solid server-client code, this accounts for 98% of desyncs.
  • Memory corruption. Painful to find, painful to debug.
  • Floating point inconsistency. Ensure you've explicitly set the FPU control word to the same state on all PCs. Use _controlfp for that (see the sketch after this list). This must be done on each thread that runs the deterministic simulation. Don't use 80-bit precision. Periodically check that the control word hasn't been altered by other DLLs you call (e.g. DirectX).
  • Inconsistent starting conditions. You have to ensure everything starts the same way.
  • You're not sending ALL your inputs to the server -- something you forgot (e.g. an in-game menu that affects something), or something you forgot to have the server broadcast to the clients.
  • Race conditions, if you use more than one thread and its code somehow affects the deterministic world.
  • The server is not waiting correctly for the clients, or a client went ahead of time by one or more frames. Add checks to ensure no one goes past the server.
  • Last but not least, the butterfly effect is a bitch. Some engines compute, in debug builds, one or more CRCs per frame based on health, position, etc., so you can track down the exact turn where a desync first appears (a toy checksum sketch follows this list).
  • Since you have a deterministic system, writing a replay system (dump your input to a file rather than sending it to the server; then play it back in another session) IS A MUST. Most desyncs have little to do with networking code, and you can fast-forward replays (how fast depends on your CPU power). In the long run a replay system will help you track other bugs totally unrelated to desyncs, and makes it easy to know exactly what is going on inside your game, frame by frame.
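
For the RNG point above, a minimal sketch of a linear congruential generator (the constants are the well-known Numerical Recipes ones; GameRandom is a made-up name):

```cpp
// Minimal deterministic LCG -- NOT cryptographically secure, just
// reproducible: same seed and same call sequence give the same stream
// on every machine.
#include <cstdint>

class GameRandom {
    uint32_t state_;
public:
    explicit GameRandom(uint32_t seed) : state_(seed) {}
    uint32_t next() {
        state_ = state_ * 1664525u + 1013904223u;  // wraps mod 2^32
        return state_;
    }
    // Integer in [0, n). Has slight modulo bias; usually fine for games.
    uint32_t below(uint32_t n) { return next() % n; }
};
```

Every machine seeds it identically and must call next() the same number of times per turn; logging the per-turn call count makes mismatches easy to spot.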
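
For the floating-point point, a sketch of pinning and re-checking the FPU control word with MSVC's _controlfp_s (note that on x64 builds the precision-control bits are fixed and cannot be changed):

```cpp
// MSVC-specific: pin the x87 FPU to 53-bit (double) precision so all
// machines round identically. Call once per simulation thread.
#include <float.h>

void pinFpuPrecision() {
    unsigned int current;
    _controlfp_s(&current, _PC_53, _MCW_PC);  // 53-bit mantissa, no 80-bit temps
}

// Re-check periodically in case a DLL (e.g. a graphics driver) changed it.
bool fpuStillPinned() {
    unsigned int current;
    _controlfp_s(&current, 0, 0);             // mask of 0 reads without modifying
    return (current & _MCW_PC) == _PC_53;
}
```
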
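
And for the per-frame CRC idea, a toy FNV-1a checksum over unit state (Unit is a hypothetical struct; a real engine would hash far more of the world). Each client sends its hash to the server every turn, and the first mismatching turn pinpoints the desync:

```cpp
// Toy per-turn world checksum (FNV-1a) for desync hunting.
#include <cstdint>
#include <vector>

struct Unit { int32_t x, y, health; };  // hypothetical unit state

uint32_t worldChecksum(const std::vector<Unit>& units) {
    uint32_t h = 2166136261u;                  // FNV-1a offset basis
    auto mix = [&](uint32_t v) {
        for (int i = 0; i < 4; ++i) {          // hash one byte at a time
            h ^= (v >> (i * 8)) & 0xFFu;
            h *= 16777619u;                    // FNV prime
        }
    };
    for (const Unit& u : units) {
        mix((uint32_t)u.x);
        mix((uint32_t)u.y);
        mix((uint32_t)u.health);
    }
    return h;
}
```
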

Hope it helps,
Cheers
Dark Sylinc


Edit: Forgot to mention, you can't mix different builds. You can't mix Debug with Release binaries, or binaries built with GCC with binaries built with MSVC. Also, make sure your game data is the same (i.e. levels, models, and scripts).

> The server is supposed to generate random numbers for each client individually and send the numbers to them. The clients should *never* be trusted to generate their own numbers. Why do you need to synchronize random numbers? Generally the server does all the work and then returns the results (which may include the random numbers in a readable form, say, a throw of dice). The client shouldn't actually have to manipulate the random numbers at all, except perhaps for recording purposes.

This is not needed (except when authentication is required). A client will compute its random numbers on its own, and the server will do that too. And they must match. The client/server never needs to send the random numbers, except for some initial seed messages, which must originate from the server. If a client tries to manipulate its random numbers, it will desync. That kind of cheat is not possible (but deterministic games make other types of cheats easier, like a remove-fog-of-war cheat, or an "I can see what you're doing" cheat, because every client knows everything about everything).
