Article on Lockstep Multiplayer

Started by
15 comments, last by Sergey Ignatchenko 8 years, 3 months ago

I wrote an article on lockstep multiplayer, with a particular focus on the steps you can take to prevent or catch the vast majority of desync bugs long before you write a single line of multiplayer code:

http://www.tundragames.com/minimizing-the-pain-of-lockstep-multiplayer/


Thanks for sharing! I've had my fair share of desync bugs, and one thing I can attest to is the value of logging everything and having ways to replay those logs. Otherwise it's so freaking hard to debug, especially if you allow for state rewinding and fast-forwarding.

When I did the multiplayer version of SIMTrek / SIMSpace, I also went with lockstep and a fixed update speed. It was necessary for the high degree of accuracy required by a hardcore flight sim.

But I handled things a bit differently...

"if every player shares their input with every other player, the simulation can be run on everyone’s machines with identical input to produce identical output"

It's only necessary to share relevant state changes with other players to keep everyone in sync. Input can be processed locally, and just the pertinent results are sent to the other players.
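For contrast with the results-sharing approach, the input-sharing idea quoted above can be sketched in a few lines. This is a minimal illustration under assumptions of my own: a hypothetical `Sim` whose entire state is one integer position per player, inputs already collected for every tick, and no networking (a real lockstep peer would wait until it has every player's input for tick N before stepping). Because both peers start from the same state and apply the same inputs in the same order, they end up with identical state without ever exchanging it.

```python
from dataclasses import dataclass, field


@dataclass
class Sim:
    # Hypothetical toy simulation (assumption): the whole game state is one
    # integer position per player plus a tick counter.
    positions: dict = field(default_factory=dict)
    tick: int = 0

    def step(self, inputs):
        # Apply this tick's inputs in sorted player order so every peer
        # processes them in the same sequence.
        for pid in sorted(inputs):
            self.positions[pid] = self.positions.get(pid, 0) + inputs[pid]
        self.tick += 1


def run_peer(input_log):
    # Each peer runs its own copy of the simulation from the same start state.
    sim = Sim()
    for tick_inputs in input_log:
        sim.step(tick_inputs)
    return sim


# Per-tick inputs every peer has received (hard-coded here in place of the
# network exchange).
shared_inputs = [{"a": 1, "b": -2}, {"a": 3}, {"b": 5, "a": -1}]
peer1 = run_peer(shared_inputs)
peer2 = run_peer(shared_inputs)
assert peer1 == peer2  # identical state, and no state was ever sent
print(peer1.positions)  # {'a': 3, 'b': 3}
```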

" lockstep is especially attractive to mobile developers because cellular and bluetooth connections can be extremely poor relative to the broadband connections that PC and console developers can generally rely upon."

I wrote my own transfer protocol. It was so robust you could unplug the phone line, plug it back in, and the game would keep right on running without missing a beat, so lost packets were a non-issue. All said, at the end of the day, you can only ACK so many ACKs; then you just have to take it on faith that the packet got through. If it didn't, that's what auto-resend and auto-resync are for.
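The protocol's internals aren't described here, so the following is only a generic sketch of the sequence-number, ACK, and timed-resend idea, with hypothetical `ReliableSender`/`Receiver` classes and a simulated link that drops 30% of packets and 30% of ACKs. The receiver ACKs every copy it sees, duplicates included, which is what lets the sender eventually stop resending even when earlier ACKs were lost.

```python
import random


class ReliableSender:
    # Hypothetical sketch: keep each packet until it is ACKed; resend on timeout.
    def __init__(self, resend_after):
        self.resend_after = resend_after  # ticks to wait before resending
        self.next_seq = 0
        self.unacked = {}  # seq -> (payload, tick it was last sent)

    def send(self, payload, now):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = (payload, now)
        return [(seq, payload)]

    def on_ack(self, seq):
        self.unacked.pop(seq, None)  # duplicate ACKs are harmless

    def due_for_resend(self, now):
        out = []
        for seq, (payload, sent_at) in list(self.unacked.items()):
            if now - sent_at >= self.resend_after:
                self.unacked[seq] = (payload, now)
                out.append((seq, payload))
        return out


class Receiver:
    def __init__(self):
        self.expected = 0    # next sequence number to deliver
        self.buffer = {}     # out-of-order packets waiting for a gap to fill
        self.delivered = []  # payloads handed to the game, in order

    def on_packet(self, seq, payload):
        if seq >= self.expected:  # ignore copies of already-delivered packets
            self.buffer[seq] = payload
        while self.expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return seq  # ACK every copy, even duplicates


# Simulated lossy link: 30% of packets and 30% of ACKs vanish.
link = random.Random(1)
sender, receiver = ReliableSender(resend_after=3), Receiver()
for now in range(200):
    outgoing = sender.due_for_resend(now)
    if now < 10:
        outgoing += sender.send(f"msg{now}", now)
    for seq, payload in outgoing:
        if link.random() < 0.3:
            continue  # packet lost
        ack = receiver.on_packet(seq, payload)
        if link.random() < 0.3:
            continue  # ACK lost
        sender.on_ack(ack)

assert receiver.delivered == [f"msg{i}" for i in range(10)]
assert not sender.unacked
```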

"The code that drives your game logic must be fully deterministic across all the machines that will play against each other. That is, the machines must run the exact same set of calculations based on the exact same set of inputs and produce the exact same results."

Unnecessary if calculations are performed locally and just the results are transmitted.

" in common with most lockstep games we share checksums of the game state between machines to detect desynchronisation and treat checksum mismatches similarly to network errors by stopping the game and displaying an appropriate message."

With a robust protocol, lost packets go away. By transmitting results instead of input, you always get the same results on all machines. With no lost packets and the same results on all machines, you basically can't lose sync, so checksums are unnecessary.
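For reference, the checksum exchange the article describes can be sketched as follows. The assumptions are mine: the state is a tick counter plus integer positions keyed by player name, and `state_checksum`/`check_sync` are hypothetical helpers. The important detail is serializing in a canonical order with fixed-width integers, so identical states hash to identical bytes on every machine regardless of dictionary order.

```python
import struct
import zlib


def state_checksum(tick, positions):
    # Canonical serialization (assumption: state = tick + integer positions):
    # sorted keys, length-prefixed names, fixed-width little-endian integers.
    data = struct.pack("<I", tick)
    for pid in sorted(positions):
        name = pid.encode("utf-8")
        data += struct.pack("<H", len(name)) + name + struct.pack("<i", positions[pid])
    return zlib.crc32(data)


def check_sync(local_sum, remote_sum, tick):
    # Treat a mismatch like a network error: stop and report.
    if local_sum != remote_sum:
        raise RuntimeError(f"desync detected at tick {tick}")


a = state_checksum(42, {"p1": 10, "p2": -3})
b = state_checksum(42, {"p2": -3, "p1": 10})  # same state, different dict order
assert a == b
check_sync(a, b, 42)  # passes silently
c = state_checksum(42, {"p1": 11, "p2": -3})  # one value differs
assert a != c
```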

"One of the simplest but often forgotten steps that we took was to organize our code to make it obvious which systems needed to be fully deterministic for lockstep"

By transmitting results instead of input, nothing has to be deterministic.

"Right from the start we made sure that our simulation used an independent random number generator from the rest of the game.... ...Obviously, the random number seed used by the simulator needs to be agreed upon by all machines and we did this by generating a seed from a checksum of the shared launch settings."

By transmitting results, nothing has to be deterministic, so separate random number generators with matching seeds for the deterministic code sections are unnecessary.
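A sketch of the article's separate-RNG idea, with hypothetical names: a small xorshift32 generator used only by the simulation, seeded from a CRC32 of the launch settings serialized as canonical JSON. A hand-written integer-only generator keeps the sequence a pure function of the seed (library distribution functions, such as those in the C++ standard library, are not guaranteed to produce the same values on every implementation). Cosmetic effects would draw from a different generator, so UI or particle code can never advance the simulation's sequence.

```python
import json
import zlib


class SimRandom:
    # Hypothetical simulation-only RNG (xorshift32). Pure 32-bit integer math,
    # so the sequence depends only on the seed.
    def __init__(self, seed):
        self.state = (seed & 0xFFFFFFFF) or 1  # xorshift state must be non-zero

    def next_u32(self):
        x = self.state
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        self.state = x
        return x

    def randint(self, lo, hi):
        # Modulo reduction has a slight bias; acceptable for a sketch.
        return lo + self.next_u32() % (hi - lo + 1)


def seed_from_launch_settings(settings):
    # Canonical JSON so every machine hashes byte-identical launch data.
    blob = json.dumps(settings, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return zlib.crc32(blob)


settings_a = {"map": "tundra", "players": ["alice", "bob"]}
settings_b = {"players": ["alice", "bob"], "map": "tundra"}  # same settings, other key order
rng_a = SimRandom(seed_from_launch_settings(settings_a))
rng_b = SimRandom(seed_from_launch_settings(settings_b))
rolls_a = [rng_a.randint(1, 6) for _ in range(5)]
rolls_b = [rng_b.randint(1, 6) for _ in range(5)]
assert rolls_a == rolls_b
```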

"Floating point numbers can present something of a problem"

If you perform floating point operations locally and then transmit just the results, floats are not an issue.

"Having decided not to use floating point math, we naturally decided to use fixed point math in its place."

By transmitting results instead of input, floats are not an issue, so fixed point is unnecessary.
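A minimal fixed-point sketch, assuming a Q16.16 format (the article doesn't say which format was used). Values are plain integers scaled by 2^16, so every operation is integer math and gives the same result everywhere. Two caveats: Python integers never overflow, whereas a C/C++ version needs a 64-bit intermediate for the multiply; and Python's `>>` and `//` round toward negative infinity, so a port to another language must pick one rounding rule and apply it identically on every platform.

```python
FRAC_BITS = 16          # assumption: Q16.16 format
ONE = 1 << FRAC_BITS    # 1.0 in fixed point


def to_fixed(num, den=1):
    # Build values from integer ratios so no float ever enters the simulation.
    return (num << FRAC_BITS) // den


def fx_mul(a, b):
    # The raw product has 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS


def fx_div(a, b):
    # Pre-shift the dividend so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b


velocity = to_fixed(3, 2)   # 1.5 units per tick
position = 0
for _ in range(3):
    position += velocity
assert to_fixed(1) == ONE
assert position == to_fixed(9, 2)  # 4.5
assert fx_mul(to_fixed(3, 2), to_fixed(2)) == to_fixed(3)
assert fx_div(to_fixed(3), to_fixed(2)) == to_fixed(3, 2)
```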

"The main tool we used was that every time we step our simulation forward, we load the previous state, and step forward again. We then compared the two new states we produced and if there are any differences then it indicates a problem in our determinism. in order to achieve this we wrote code to serialize and deserialize the entire simulation state. "

Sending results instead of input means nothing has to be deterministic, so all of this is unnecessary.
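The article's "step twice and compare" check can be sketched like this, assuming the state is a JSON-serializable dict and using hypothetical `step`, `leaky_step`, and `checked_step` functions. Each step runs twice from the same serialized snapshot; any difference means the step depends on something outside the serialized state. `leaky_step` shows the kind of bug this catches: a module-level counter that the snapshot doesn't capture.

```python
import json


def serialize(state):
    return json.dumps(state, sort_keys=True)


def deserialize(blob):
    return json.loads(blob)


def step(state, inputs):
    # Hypothetical deterministic step: depends only on `state` and `inputs`.
    new = deserialize(serialize(state))  # work on a copy
    for pid in sorted(inputs):
        new["pos"][pid] = new["pos"].get(pid, 0) + inputs[pid]
    new["tick"] += 1
    return new


_hidden_counter = 0


def leaky_step(state, inputs):
    # Deliberate bug: reads state that lives outside the serialized snapshot.
    global _hidden_counter
    _hidden_counter += 1
    new = step(state, inputs)
    new["pos"]["a"] = new["pos"].get("a", 0) + _hidden_counter
    return new


def checked_step(state, inputs, step_fn):
    # Run the step twice from the same snapshot and compare the results.
    snapshot = serialize(state)
    first = step_fn(deserialize(snapshot), inputs)
    second = step_fn(deserialize(snapshot), inputs)
    if serialize(first) != serialize(second):
        raise RuntimeError(f"non-deterministic step at tick {state['tick']}")
    return first


s1 = checked_step({"tick": 0, "pos": {}}, {"a": 2}, step)
assert s1 == {"tick": 1, "pos": {"a": 2}}
caught = False
try:
    checked_step(s1, {"a": 1}, leaky_step)
except RuntimeError:
    caught = True
assert caught
```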

"We found it was incredibly valuable to invest the time on code to let the computer run automated matches overnight... ...our overnight single-player tests caught lots of rare event desync bugs."

Sending results over a robust protocol means no loss of sync, so no automated testing is required.

"Despite all our care there were a small handful of desync bugs that slipped through the net."

Sending results over a robust protocol basically means this can't happen.

"It is invaluable during development to have extensive debug logging, so that if a rare desync occurs you can pinpoint the cause without necessarily needing to repro. Our multiplayer logging in Rapture involved serialising and writing the entire state to a logfile every frame, along with the launch data structure and any input messages. "

With no loss of sync, no debug logging is required.
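A sketch of the per-frame log the article describes, assuming hypothetical `FrameLog` and `first_divergence` helpers and JSON as the storage format. Each frame records the inputs applied and the full serialized state; comparing two machines' logs gives the first tick where the states differ, together with the inputs that produced it. The usage below fakes a desync by nudging one machine's state at tick 1.

```python
import json


class FrameLog:
    # Hypothetical per-machine log: launch data plus, for every frame, the
    # inputs applied and the full serialized state afterwards.
    def __init__(self, launch_settings):
        self.launch = launch_settings
        self.frames = []

    def record(self, tick, inputs, state):
        self.frames.append({"tick": tick, "inputs": inputs,
                            "state": json.dumps(state, sort_keys=True)})

    def to_json(self):
        # What would be written to the logfile.
        return json.dumps({"launch": self.launch, "frames": self.frames})


def first_divergence(log_a, log_b):
    # Return (tick, inputs on A, inputs on B) for the first frame whose states differ.
    for fa, fb in zip(log_a.frames, log_b.frames):
        if fa["state"] != fb["state"]:
            return fa["tick"], fa["inputs"], fb["inputs"]
    return None


launch = {"map": "tundra"}
log_a, log_b = FrameLog(launch), FrameLog(launch)
state_a, state_b = {"a": 0, "b": 0}, {"a": 0, "b": 0}
for tick, inputs in enumerate([{"a": 1}, {"b": 2}, {"a": -1}]):
    for pid, delta in inputs.items():
        state_a[pid] += delta
        state_b[pid] += delta
    if tick == 1:
        state_b["b"] += 1  # simulated desync bug on machine B
    log_a.record(tick, inputs, state_a)
    log_b.record(tick, inputs, state_b)

assert first_divergence(log_a, log_b) == (1, {"b": 2}, {"b": 2})
```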

"We had an interesting bug caused by ambiguous sequencing that looked something like: MyFunction(myRNG.GetRand(), myRNG.GetRand());"

By sending results, this becomes a non-issue.
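In C++ the order in which function arguments are evaluated is unspecified, so in a call like the one quoted, two compilers can legitimately pass the two random numbers in opposite order. The usual fix is to draw each value in its own statement. Python always evaluates arguments left to right, so this sketch (with a hypothetical counting RNG that makes draw order visible) only illustrates the explicit-sequencing pattern rather than reproducing the bug.

```python
class CountingRNG:
    # Hypothetical RNG that returns 1, 2, 3, ... so draw order is visible.
    def __init__(self):
        self.n = 0

    def get_rand(self):
        self.n += 1
        return self.n


def my_function(x, y):
    return (x, y)


rng = CountingRNG()
# Instead of my_function(rng.get_rand(), rng.get_rand()), whose C++ equivalent
# may evaluate the arguments in either order, sequence the draws explicitly.
first = rng.get_rand()
second = rng.get_rand()
result = my_function(first, second)
assert result == (1, 2)
```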

It's encouraging to see someone building a game with lockstep accuracy, as opposed to the typical sloppy prediction BS of non-lockstep approaches. But you might want to consider sending results, and building a more robust transmission pipeline, as a way to vastly simplify your life.

By sending results over a robust protocol, all of the following become unnecessary:

1. sending all input to every player

2. deterministic code

3. checking sync by transmitting game state checksums

4. keeping track of deterministic vs non-deterministic code

5. a separate random number generator for the simulation

6. identical seed values for all random number generators

7. dealing with floating point issues between compilers and platforms

8. avoiding use of floating point

9. use of fixed point

10. serializing / deserializing the entire game state

11. running update twice to check for sync errors

12. automated "burn-in" testing for sync errors

13. debug logging of sync errors

14. dealing with different evaluation orders on different compilers / platforms

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

"The code that drives your game logic must be fully deterministic across all the machines that will play against each other. That is, the machines must run the exact same set of calculations based on the exact same set of inputs and produce the exact same results."


unnecessary if calculations are performed locally, and just the results are transmitted.


Then it is not a deterministic lockstep simulation.

One of the main benefits of deterministic lockstep is that your network bandwidth depends only on how many commands a user can issue, not how many entities are involved in the simulation. The classic article on this is the "1500 archers on a 28.8 kbps modem" article (linked from the FAQ.)
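To put rough numbers on that bandwidth argument, here is a back-of-the-envelope calculation. Every figure is an assumption chosen for illustration, not taken from the article: 1500 units, 12 bytes per unit update, and 20 updates per second for state sync, versus 4 commands per second at 16 bytes each for lockstep, ignoring packet headers, compression, and delta encoding.

```python
# All figures are illustrative assumptions, not measurements.
MODEM_BPS = 28_800  # 28.8 kbps


def state_sync_bps(entities, bytes_per_entity, updates_per_sec):
    # Bandwidth grows with the number of simulated entities.
    return entities * bytes_per_entity * updates_per_sec * 8


def lockstep_bps(commands_per_sec, bytes_per_command):
    # Only player commands cross the wire, however many units exist.
    return commands_per_sec * bytes_per_command * 8


state = state_sync_bps(1500, 12, 20)
commands = lockstep_bps(4, 16)
print(state, commands)     # 2880000 512
print(state / MODEM_BPS)   # 100.0
```

Under these assumptions, naive state sync needs about 2.9 Mbit/s, a hundred times what a 28.8 kbps modem carries, while the command stream needs 512 bit/s no matter how many units are on the map.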

Also, if the client sends the results, then there is no way for the other clients (or the server) to verify that the results are correct, and thus cheating is possible. Deterministic lock-step essentially prevents state-altering cheats, because all the other clients will "vote" on what the outcome of your inputs is, so if you cheat, you will de-sync.

What you describe is a different mechanism. That's also a legitimate way to network games, especially if the local player only controls a single unit (like a car, plane, or avatar.) But it's not what's meant by "lock-step simulation."
enum Bool { True, False, FileNotFound };


Then it is not a deterministic lockstep simulation.

Oh, definitely not. It's a way to maintain sync in a game designed with maximum randomness that doesn't rely on deterministic behavior, and it makes no allowances for cheating. It was used for multiplayer co-op missions in a single-player starship flight sim, so cheating wasn't really a consideration.


Nice article :).

@spinningcubes | Blog: Spinningcubes.com | Gamedev notes: GameDev Pensieve | Spinningcubes on Youtube

Nice article :-)

I'm using a sort of lock-step model in my game. It's just like the normal RTS approach, sending just the inputs, but it's for more of an action game, so it doesn't wait until everyone's inputs arrive before running a step. It just plows on ahead with what it's currently got, and rewinds-and-replays if new inputs come in late. This means low-latency response, potentially at the cost of some snapping. I think of it as combining the best bits of both the 'client sends inputs / server sends state' and the lock-step worlds :-)
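A minimal sketch of rewind-and-replay, under assumptions not in the post: a hypothetical `RollbackSim` that keeps a snapshot of the state before every tick, predicts a missing input by repeating that player's last known input, and, when a late input arrives for an already-simulated tick, restores the snapshot for that tick and re-simulates up to the present.

```python
class RollbackSim:
    # Hypothetical GGPO-style sketch. Assumptions: state is one integer per
    # player, and a missing input is predicted by repeating that player's
    # last known input (or 0 if none has arrived yet).
    def __init__(self, players):
        self.players = sorted(players)
        self.snapshots = {0: {p: 0 for p in self.players}}  # tick -> state before it
        self.confirmed = {}  # (tick, player) -> input actually received
        self.tick = 0        # next tick to simulate

    def _input(self, tick, player):
        if (tick, player) in self.confirmed:
            return self.confirmed[(tick, player)]
        earlier = [t for (t, p) in self.confirmed if p == player and t < tick]
        return self.confirmed[(max(earlier), player)] if earlier else 0

    def _step(self, tick):
        state = dict(self.snapshots[tick])
        for p in self.players:
            state[p] += self._input(tick, p)
        self.snapshots[tick + 1] = state

    def advance(self):
        # Plow ahead with whatever inputs (real or predicted) we have now.
        self._step(self.tick)
        self.tick += 1

    def receive(self, tick, player, value):
        self.confirmed[(tick, player)] = value
        if tick < self.tick:
            # Late input: rewind to the snapshot before `tick` and replay to now.
            for t in range(tick, self.tick):
                self._step(t)

    @property
    def state(self):
        return self.snapshots[self.tick]


sim = RollbackSim(["a", "b"])
sim.receive(0, "a", 1)
sim.advance()  # tick 0: b's input predicted as 0
sim.advance()  # tick 1: a predicted as 1 (repeat), b as 0
assert sim.state == {"a": 2, "b": 0}
sim.receive(0, "b", 3)  # b's tick-0 input arrives late
assert sim.state == {"a": 2, "b": 6}
```

The late tick-0 input also changes the prediction for b's tick 1 (now a repeat of 3), which is why b jumps to 6; that jump is the "snapping" mentioned above. A real implementation would also discard snapshots once every player's input for a tick is confirmed.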

rewinds-and-replays if new inputs come in late


Sounds like what "GGPO" also did. It's a fine model as long as your simulation is cheap to run (and rewind) and there aren't any terribly cliff-edge parts of your game. (Do I need a "lock" on a vehicle before firing a missile? Is "lock" based on simulation? What happens if I had a lock, fired the weapon, and then get corrected to a simulation where I don't have the lock?)
Yes, it's the same as GGPO, as far as I understand how it works. My simulation isn't that cheap to run, but state save/restore is pretty cheap. It's all just a physics simulation, so there are no control-effect discontinuities like your weapon lock/fire example - it's just a jump in the position of objects in the world, and the size of the jump is limited by the nature of the control the player has: player control applies forces to motors rather than instantaneously changing the position of objects.

I should say that I do run the clients behind the server, so most of the inputs arrive in time and rewinding is only needed to cope with jitter. So the amount of rewinding is small and client-side only. But the trade-off is higher control latency than the GGPO method of running client and server in sync.

Deterministic lock-step has an inherent flaw (openly admitted in the article): it runs at the speed of the slowest player. What about (ab)using determinism in a different way: send all the inputs to the server, which timestamps them and forwards them to all the clients, where the inputs are "replayed" based on determinism? In other words, the only thing the server does is timestamping and forwarding.

It seems that such a simplistic server would scale much better than plain deterministic lockstep and would work better over the internet too. (I would still prefer more traditional MMO approaches that don't rely on cross-platform determinism, that provide protection from "see through walls" kinds of cheats, which are inevitable when relying on determinism, and that allow recovery from more-than-a-few-seconds absences, but that's a very different story altogether.)
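The timestamp-and-forward server proposed above might look something like this sketch. The names are hypothetical, and in-process method calls stand in for network messages. The server never simulates: it stamps each input with its current tick and a sequence number, forwards it, and announces when a tick is closed; each client applies a closed tick's inputs in sequence order. A slow client no longer stalls everyone, because its late input simply lands in a later tick.

```python
from collections import defaultdict


class RelayServer:
    # Hypothetical sketch: no simulation on the server, only stamping + forwarding.
    def __init__(self):
        self.tick = 0
        self.seq = 0
        self.clients = []

    def connect(self, client):
        self.clients.append(client)

    def on_input(self, player, command):
        # Stamp with the server's current tick and a global sequence number.
        msg = (self.tick, self.seq, player, command)
        self.seq += 1
        for client in self.clients:
            client.deliver(msg)

    def advance(self):
        # Announce that no more inputs will be stamped with this tick.
        for client in self.clients:
            client.tick_closed(self.tick)
        self.tick += 1


class Client:
    def __init__(self):
        self.pending = defaultdict(list)  # tick -> [(seq, player, command)]
        self.state = defaultdict(int)     # toy state: one integer per player

    def deliver(self, msg):
        tick, seq, player, command = msg
        self.pending[tick].append((seq, player, command))

    def tick_closed(self, tick):
        # Apply the closed tick's inputs in server sequence order, so every
        # client gets the same result regardless of arrival order.
        for _seq, player, command in sorted(self.pending.pop(tick, [])):
            self.state[player] += command


server = RelayServer()
c1, c2 = Client(), Client()
server.connect(c1)
server.connect(c2)
server.on_input("a", 5)
server.on_input("b", -2)
server.advance()
server.on_input("a", 1)
server.advance()
assert c1.state == c2.state == {"a": 6, "b": -2}
```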

This topic is closed to new replies.
