Replay & recorded games

32 comments, last by captain_crunch 8 years, 4 months ago

In my game, I have implemented a replay system for debugging purposes. When recording it uses the command pattern (all player actions are stored as a command data structure in a list) and seed numbers for the RNGs to store enough data to replay a game session. I also store a small amount of data to compare with and verify during replay.
The problem is that verification during replay most often shows a divergence from the recorded game. This makes the system unusable for most purposes.

I have already sunk a lot of time into the system but still hope that it can be made more reliable. I just need a way to find out why and how the replay session diverges. One idea is to use my save game feature to store a CRC computed over the entire simulation state instead of the few variables that I am storing now. This would give reliable information on when the divergence happens. But I still need to find the place in the code, or the specific simulation data, that causes the divergence. Is there any other way than extensive logging to do this? And how would this work without bogging down the whole program?
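A minimal sketch of the per-frame checksum idea, assuming a toy simulation state (the `positions`, `rng_state` and `tick` fields, and the `run` helper with its injected glitch, are hypothetical stand-ins and not the poster's code). It only shows how comparing one CRC per frame pinpoints the first frame where a replay diverges:

```python
import struct
import zlib


def state_checksum(positions, rng_state, tick):
    """CRC32 over a binary serialization of a (hypothetical) simulation state."""
    blob = struct.pack("<QQ", tick, rng_state)
    for x, y in positions:
        blob += struct.pack("<dd", x, y)
    return zlib.crc32(blob)


def first_divergence(recorded, replayed):
    """Return the index of the first frame whose checksums differ, or None."""
    for frame, (a, b) in enumerate(zip(recorded, replayed)):
        if a != b:
            return frame
    return None


def run(ticks, glitch_at=None):
    """Toy simulation that records one checksum per tick.

    glitch_at injects a tiny error to stand in for real nondeterminism.
    """
    x, v = 0.0, 1.0
    checksums = []
    for tick in range(ticks):
        x += v * 0.016
        if tick == glitch_at:
            x += 1e-9
        checksums.append(state_checksum([(x, v)], 12345, tick))
    return checksums


recorded = run(10)
replayed = run(10, glitch_at=3)
print(first_divergence(recorded, replayed))
```

Storing one 32-bit value per frame is cheap. Once the first bad frame is known, a full state dump can be limited to that frame and the one before it, instead of logging everything for the whole session.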

I think it is impossible to do a perfect, non-diverging replay (binary-compatible, so that a CRC of the state will be the same). Imagine what happens when the user presses a key. You receive an event, log it along with the current time, and then process it. Events happen at arbitrary times, but the timer increments in steps of X milliseconds (or microseconds if you use a more high-res operating system than Windows). Even with a fixed timestep (say, one simulation step equals 20ms), timer differences between the recording and the replay will inevitably put the odd event that belongs in frame 346335 into frame 346336 instead. That's because, while you receive events at arbitrary times, you can only sort them into timer-granularity-sized buckets.

On the other hand, there are of course plenty of opportunities for a replay to be not just slightly different, but altogether wrong. Your use of the plural in "seeds for the RNGs" suggests that you have several RNGs running concurrently. Accessing them concurrently from different threads too, maybe? Bang. The scheduler assigns one time slice differently, and everything explodes. More information would obviously be needed to give a more fact-based answer (an uninitialized non-pointer variable somewhere is equally possible). But that's one rather obvious thing that can go wrong.

The game is single threaded and I only have one RNG for the simulation, so the description was a bit inaccurate.

I think it is impossible to do a perfect, non-diverging replay
You just declared OpenTTD cannot exist.

Network play in OpenTTD works by having each machine in the game update its own local state, and sending only the changes added by the users between the clients.

Obviously that system will break down when some clients compute different results. This is called a desync, and indeed, it's a nightmare to trace the cause.

There is a whole set of code conventions in place to avoid desyncs (I don't know them all, but e.g. no use of real numbers, since different CPU architectures have different ideas about rounding). In addition, there is code in place to dump all commands and their results as they are executed, which creates GBs of data if you let it run for a few hours on the server.

Determinism is totally possible - you just have to be careful and thorough. Whether it's worth the time investment for you I cannot tell.

Here are some useful links:

http://gafferongames.com/game-physics/fix-your-timestep/
http://gafferongames.com/networked-physics/deterministic-lockstep/

OpenTTD

OK, so according to what you describe, you have a (non-authoritative?) server that merely records the changes made by (trusted?) clients.
That is different insofar as it is of course known exactly what happens at every tick, and sure enough you can play this back endlessly, and exactly binary-identically. Once the client has decided that this and that has happened (and shows it to the user), that is exactly what is being logged at the server, too.
What remains as an interesting problem is what happens if two clients decide to build a house in the same location at the same time. If clients are to decide on outcomes, this can be... mucho fun. Not even thinking about a cheating client...

But the OP stores player events (keypresses, mouse clicks, joystick... what else?) while a simulation goes on at some rate. Ideally that is a fixed timestep, but nothing has been said about it; in the worst case it could even be an FPS-dependent rate! Even with a fixed timestep simulation, however, it's very hard, if possible at all, to get a 100% identical playback.

The outcome is not fixed, only the inputs are. Ideally, of course, if the inputs are the same, the output should be binary-identical (computers are deterministic machines after all!). But a computer is not a completely deterministic machine, even in the absence of floating point math. Not if scheduling, preemption, and timers are involved and are part of the equation.

My current project relies on determinism and I am able to reproduce actions based on user input
https://platformrpg.wordpress.com/

If you haven't been designing for determinism from the start though you may have a hard time getting it to work after the fact. As PeterStock suggested, fix your time step.
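For reference, a rough sketch of the accumulator pattern from the "fix your timestep" link earlier in the thread. The names `FIXED_DT_US` and `advance`, the 20 ms step, and the use of integer microseconds (to keep the accumulator itself free of rounding) are assumptions made for illustration:

```python
FIXED_DT_US = 20_000  # 20 ms per simulation step, in integer microseconds (assumed)


def advance(accumulator_us, frame_time_us, step):
    """Consume the measured frame time in fixed-size slices.

    Calls step(FIXED_DT_US) once per whole slice and returns the
    leftover time plus the number of steps that were run.
    """
    accumulator_us += frame_time_us
    steps = 0
    while accumulator_us >= FIXED_DT_US:
        step(FIXED_DT_US)
        accumulator_us -= FIXED_DT_US
        steps += 1
    return accumulator_us, steps


# Usage: a 50 ms frame runs two 20 ms steps and carries 10 ms over.
ticks = []
leftover, steps = advance(0, 50_000, ticks.append)
print(leftover, steps)
```

With this structure the simulation only ever sees the same step size, so a recording needs to store which step each command was applied on rather than the wall-clock time at which it arrived.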
Be sure you aren't losing numerical precision when saving floating-point values. Don't store the numbers in a text format like JSON; save them in a binary format.
If you are recording and replaying across different processors, there may be floating-point rounding differences, and there is nothing you can do about it except switching to fixed-point simulations, which I don't recommend.
My current game project Platform RPG


If you are recording and replaying across different processors, there may be floating-point rounding differences, and there is nothing you can do about it except switching to fixed-point simulations, which I don't recommend.

Just having reliable replays on the same machine would be a big help in bug fixing.

I am using variable/non-fixed time steps. When recording, I store the elapsed time each frame in a text file in the form of a string like "0.0045674" the number of ticks which is an integer. During replay, I pass these values to the simulation. Not sure if this can cause problems?

Yes, that will definitely be problematic - like HappyCoder says, save/restore the binary representation of floating point values.
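A small sketch of why a short decimal string is a problem, assuming the elapsed time is a 64-bit float and is written with seven decimal places as in the example above (the helper names and the sample value `1.0 / 219.0` are made up for illustration):

```python
import struct


def save_as_text(dt):
    """Format with 7 decimal places, like the string "0.0045674"."""
    return "%.7f" % dt


def save_as_binary(dt):
    """Store the exact 8-byte IEEE 754 representation (little-endian)."""
    return struct.pack("<d", dt)


dt = 1.0 / 219.0  # stand-in for a measured frame time

from_text = float(save_as_text(dt))
from_binary = struct.unpack("<d", save_as_binary(dt))[0]

print(from_text == dt)    # the truncated string does not restore the same value
print(from_binary == dt)  # the binary form restores it bit for bit
```

The loss here comes from the truncated formatting rather than from text as such: in Python 3, `float(repr(dt)) == dt` also holds, because `repr` writes enough digits to round-trip. Packing the raw bytes simply removes the question.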

It's not actually necessary to use a fixed time step to get repeatable/deterministic behaviour for replays, but it is (much) easier for certain applications (like keeping 2 games in sync over a network).

For variable time steps, you need to make sure you use the same sized time steps during replays, which it sounds like you already are.
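As an illustration only, here is a toy record/replay loop with variable time steps, assuming a single seeded RNG and commands keyed by frame index. Everything in it (`simulate`, `record_dts`, `pack_dts`, `unpack_dts`, the thrust-style commands) is hypothetical and not taken from the poster's game:

```python
import random
import struct


def simulate(seed, dts, commands):
    """Run a toy simulation; commands maps frame index -> velocity change."""
    rng = random.Random(seed)           # the single simulation RNG
    x, v = 0.0, 0.0
    for frame, dt in enumerate(dts):
        v += commands.get(frame, 0.0)   # apply the recorded player command
        v += rng.uniform(-0.1, 0.1)     # random influence, reproducible via seed
        x += v * dt                     # integrate with the recorded step size
    return x, v


def record_dts(n, seed=99):
    """Stand-in for measured, varying frame times."""
    jitter = random.Random(seed)
    return [0.016 + jitter.uniform(-0.004, 0.004) for _ in range(n)]


def pack_dts(dts):
    """Serialize the step sizes as raw little-endian doubles."""
    return struct.pack("<%dd" % len(dts), *dts)


def unpack_dts(blob):
    """Restore the step sizes exactly as they were recorded."""
    return list(struct.unpack("<%dd" % (len(blob) // 8), blob))


dts = record_dts(100)
commands = {10: 1.0, 40: -0.5}

original = simulate(7, dts, commands)
replay = simulate(7, unpack_dts(pack_dts(dts)), commands)
print(original == replay)
```

The replay matches because all three inputs (seed, step sizes, commands per frame) are restored exactly and nothing else feeds into the state. Any extra input, such as a wall-clock read or a second RNG shared with rendering, would have to be recorded in the same way.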

Tracking down determinism bugs will be hard, but if you do get to the bottom of them (and there likely will be many - not just one!) then the knowledge you gain from it will likely be useful in the future :-)

