Wargame/RTS Network Architecture - a danger of Lock-step Commands Synchronization

3 comments, last by gwynnbleid 17 years, 6 months ago
Hi,

I'm working on a real-time wargame with thousands of units on the map, gathered into squadrons drawn up in different formations. As you may understand, the pathfinding and tracking algorithms are rather complex here. I need to add multiplayer to this game.

In general I have two ways to do that (this was already discussed on GameDev, but my question has its own specifics, see further):

1) Game object replication (periodic propagation of object data), which is a server-side simulation approach;
2) Lock-step command synchronization (as proposed in the article "1500 Archers on 28.8"), which is a client-side simulation approach.

Usually, for a huge number of units, it is reasonable to choose the second option, since it saves bandwidth notably. But there is a problem I worry about. To use lock-step command synchronization I need to make certain unit processing algorithms deterministic, i.e. I should be sure that on every station the state of an object at every moment is fully determined only by its initial state and the players' commands. In particular this concerns the pathfinding algorithms, which are very complex in squadron-based RTS games and wargames.

So I worry that testing and debugging the deterministic algorithms will take an extreme amount of time and resources in my case. It could even kill the project. I'm not Microsoft and I don't have the resources for a year of testing. As written in the "1500 Archers on 28.8" article:

"At first take it might seem that getting two pieces of identical code to run the same should be fairly easy and straightforward - not so. The Microsoft product manager - Tim Znamenacek told Mark early on “In every project, there is one stubborn bug that goes all the way to the wire - I think out-of-sync is going to be it” - he was right. The difficulty with finding out-of-sync errors is that very subtle differences would multiply over time."
The only thing that would prove to me that lock-step command synchronization is applicable to my wargame is, for example, that Mindscape made their Warhammer: Dark Omen multiplayer using this approach, or that Creative Assembly made their Rome Total War multiplayer using it. In other words, I need some proof that a team of up to 6 programmers can use this approach to finish a multiplayer wargame in no more than 1.5 years. Any information, facts and assumptions alike, will be highly appreciated.

Thanks in advance,
Denis "Gwynn" Ischenko
In computer science (and especially the part that deals with programming language design and abstract analysis) we use a trick. To write algorithms for which a given property is true, we design a framework in which the property is true and remains true after any operation done by the program. Then, we prove that the framework has this property and implement it. Once this is done, any program written using the framework will have the property or be rejected by the compiler as malformed. This is how, for example, there are never any null-pointer-dereference bugs in ML languages.

Applied to determinism, this means designing the framework first: if handled correctly, it would allow you to avoid out-of-sync bugs altogether, saving a lot of debugging time.
Some general thoughts:

- Lockstep means it's practically impossible to implement late-joining of a game already in progress. For an RTS game that might not be a big thing, but it depends on your game.

- Lockstep also makes it harder when you start patching things - you've got to maintain exactly the same behaviour to allow different versions to play against each other. Of course it may be acceptable to only allow people to play against the exact same version as themselves.

- Checksums are your friend. You can generate checksums for the various bits of state in your game (simple XOR is usually good enough) and send them at regular intervals to other players for comparison. As soon as the checksums deviate, you halt the game and debug it to see where the two diverged. If you do this well you can have it halt on, or a few frames after, the initial deviation, instead of seconds or minutes later when the change becomes visible.

- Threading and floating point tend to be very non-deterministic. Running your pathfinding in a different thread is likely to be very difficult to make deterministic. Using floating point numbers will probably require you to use maximum CPU precision (costing you performance), or you may find it easier to use fixed point maths.
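
To make the checksum idea above concrete, here is a minimal sketch in Python. The unit layout `(unit_id, x, y)`, the mixing constants and the function names are all assumptions made up for illustration, not something from any particular engine. Each unit is hashed on its own and the hashes are XORed together, so the result does not depend on the order in which units are visited.

```python
from functools import reduce
from typing import Iterable, List, Optional, Tuple

# Hypothetical unit layout: (unit_id, x, y), integers only.
Unit = Tuple[int, int, int]

def unit_hash(unit: Unit) -> int:
    """Mix the fields so that swapping x and y changes the result."""
    unit_id, x, y = unit
    h = (unit_id * 73856093) ^ (x * 19349663) ^ (y * 83492791)
    return h & 0xFFFFFFFF  # keep 32 bits, as a wire format would

def state_checksum(units: Iterable[Unit]) -> int:
    """XOR of per-unit hashes; independent of iteration order."""
    return reduce(lambda acc, u: acc ^ unit_hash(u), units, 0)

def first_divergence(local: List[int], remote: List[int]) -> Optional[int]:
    """Return the first tick whose checksums differ, or None if all match."""
    for tick, (a, b) in enumerate(zip(local, remote)):
        if a != b:
            return tick
    return None
```

Each client would keep its own list of per-tick checksums, compare it against what the other players send, and halt as soon as `first_divergence` returns a tick. Keep in mind that a plain XOR can miss some differences (two identical errors cancel out), which is the price of its simplicity.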
Quote: So I worry that testing and debugging the deterministic algorithms will take an extreme amount of time and resources in my case.


You should add a CRC of the game state, taken at the beginning of each tick, to each outgoing packet. Every client should then calculate these CRCs and compare them to the others'. As soon as someone is out of sync, you stop all the clients in the game. If you have a good user input record and playback solution (or license one -- one's being advertised in Game Developer magazine), then you can easily reproduce the game to the point where it goes wrong.
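
As a rough illustration of the CRC plus record/playback idea, here is a small Python sketch. The command format `(unit_index, delta)`, the one-dimensional "map" and the deliberately broken `buggy_step` are assumptions invented for the example; only `zlib.crc32` and `struct.pack` are real library calls.

```python
import struct
import zlib
from typing import Callable, Dict, List, Optional, Tuple

Command = Tuple[int, int]  # hypothetical format: (unit_index, delta)
State = List[int]          # unit positions on a 1-D map

def state_crc(state: State) -> int:
    # Fixed width and byte order, so every machine hashes the same bytes.
    return zlib.crc32(struct.pack(f"<{len(state)}i", *state))

def step(state: State, commands: List[Command]) -> None:
    """One deterministic simulation tick."""
    for unit_index, delta in commands:
        state[unit_index] += delta

def buggy_step(state: State, commands: List[Command]) -> None:
    """Same tick, with a simulated out-of-sync bug."""
    step(state, commands)
    if state[0] == 7:
        state[0] += 1

def run(initial: State, log: Dict[int, List[Command]], ticks: int,
        step_fn: Callable[[State, List[Command]], None] = step) -> List[int]:
    """Replay a recorded command log and return the CRC taken at the start of each tick."""
    state = list(initial)
    crcs = []
    for tick in range(ticks):
        crcs.append(state_crc(state))
        step_fn(state, log.get(tick, []))
    return crcs

def first_desync(a: List[int], b: List[int]) -> Optional[int]:
    return next((t for t, (x, y) in enumerate(zip(a, b)) if x != y), None)

# Recorded input: tick number -> commands issued on that tick.
log = {0: [(0, 3)], 2: [(0, 4), (1, -1)]}
good = run([0, 10], log, 5)
bad = run([0, 10], log, 5, buggy_step)
print(first_desync(good, bad))  # prints 3: the bug fires during tick 2
```

Because the whole game is driven by the recorded commands, replaying the same log reproduces the run exactly, and the first mismatching CRC tells you which tick to put under the debugger.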

It's not that hard to implement late joining in lock-step, but you have to pause the game for everybody until the late joiner has gotten the entire state dump.
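
A minimal sketch of such a state dump, in Python: the host pauses, serializes the full state together with the current tick, and the joiner loads it and checks a CRC before everyone resumes. The JSON encoding, the field names and the shape of the state are assumptions chosen for brevity; a real game would more likely use a compact binary format.

```python
import json
import zlib
from typing import Tuple

def dump_state(tick: int, state: dict) -> Tuple[bytes, int]:
    """Host side: serialize the paused game state and compute its CRC."""
    blob = json.dumps(
        {"tick": tick, "state": state},
        sort_keys=True,            # stable key order gives stable bytes
        separators=(",", ":"),
    ).encode("utf-8")
    return blob, zlib.crc32(blob)

def load_state(blob: bytes, expected_crc: int) -> Tuple[int, dict]:
    """Joiner side: verify the dump, then rebuild the tick number and state."""
    if zlib.crc32(blob) != expected_crc:
        raise ValueError("state dump corrupted in transit")
    payload = json.loads(blob.decode("utf-8"))
    return payload["tick"], payload["state"]
```

The dump has to include everything the simulation depends on (random number generator state, pending commands, and so on), otherwise the joiner starts in sync and drifts out a few ticks later.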

With lockstep, every client needs to run the same version of the software, so make sure you exchange versions before you start the game.

I would stay away from floating point for a lockstep game. You CAN make it deterministic, at least within the same CPU architecture, but it's more trouble than you'll want when starting out. Fixed-point is good.
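
For reference, a minimal 16.16 fixed-point sketch in Python. The names (`FRAC_BITS`, `fx_mul`, and so on) are made up for illustration. All game-state arithmetic stays in integers, so the results are the same on every machine and do not depend on evaluation order the way floating-point sums can (float addition is not associative).

```python
FRAC_BITS = 16          # 16.16 format: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0

def fx_from_int(n: int) -> int:
    return n << FRAC_BITS

def fx_from_ratio(num: int, den: int) -> int:
    """Build a fixed-point value from an integer ratio, avoiding floats entirely."""
    return (num << FRAC_BITS) // den

def fx_mul(a: int, b: int) -> int:
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS

def fx_div(a: int, b: int) -> int:
    # Pre-shift the dividend so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b

def fx_to_float(a: int) -> float:
    """For display and debugging only; never feed this back into the simulation."""
    return a / ONE

speed = fx_from_ratio(3, 2)                # 1.5 tiles per tick
distance = fx_mul(speed, fx_from_int(4))   # distance covered in 4 ticks
print(fx_to_float(distance))               # prints 6.0
```

One caveat when porting this: Python's `>>` and `//` round toward negative infinity, while integer division in C and C++ truncates toward zero, so negative values need the same rounding rule on every platform you ship.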
enum Bool { True, False, FileNotFound };
Hi all!

Thanks for the replies. They should help.

I've just learned that Relic used the lock-step command synchronization approach for their RTS games. I feel better now :)

Looking forward to more ideas :)

