Clientside prediction / Rewind + Replay bug

346

Author

April 23, 2008 04:42 AM

Hey, I'm currently building a client/server game in C#. I've got a basic setup working pretty smoothly already, using the Lidgren network library. I based my implementation of network physics on Glenn Fiedler's article, which can be found here: http://www.gaffer.org/game-physics/networked-physics. It briefly describes a fixed-timestep, input-driven, client/server physics simulation. It also outlines client-side prediction.

My implementation is pretty straightforward. I keep a circular buffer of the X most recent moves sent to the server, and when a move comes back from the server, I check it against the move I stored in the buffer. If the position is within an arbitrary threshold, I do nothing. If the position is wrong, I go back to the matching move in my buffer, set the correct position I just received from the server, then replay all the moves up to the present based on the corrected state.

For the most part, it works just as it should. The problem is that there seems to be a bug in how I'm replaying the moves that sometimes results in a self-perpetuating loop of errors. It always shows up at the end of the replay. Let's say the client is 5 moves ahead of the server when the server sends a correction. The client replays all 5 moves correctly, but #6 is wrong. Now, since we're 5 states ahead, we only find out that #6 is wrong when the client is on #11. So the client replays from 6 to 11, then 12 is wrong. Etc.

The problem is that it only happens sometimes, and each error seems to cause the next one, which causes the next one. There are other factors too. The above scenario happens when the input stream to the server is constant, i.e., holding down a directional key to move your character. If you stop moving, the error loop will stop after a couple of moves (data is still being sent, just no movement changes).

I've been trying to fix this for two weeks now, and I'm pretty much stuck. I've rewritten it 3-4 times, reviewed it 50 times, tweaked it interminably, and still I've got nothing. I don't really like asking someone else to try fixing it, because I'm not even sure exactly where the bug is, so I can't just post a little snippet of code. It would mean grabbing the whole project and figuring it out. The upside is that the project is still pretty small: the client is ~1000 lines, and the server is less. I'm using SlimDX for graphics, Lidgren for the network, and the Farseer Physics engine for physics. The IDE is VS 2008. Code is here: http://www.usedthoughts.com/mp1.zip

If anyone can point out where I'm going wrong, I'd appreciate it very much.
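For readers following along, the buffer / threshold / rewind-and-replay scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked project: the `Move` struct, the `PredictionBuffer` class, the 1D position, and the `Speed` and `Threshold` constants are all assumptions made for the example, and a single multiply stands in for the physics step.

```csharp
using System;

// Hypothetical 1D move record: sequence number, input (-1, 0 or +1)
// and the position the client predicted after applying that input.
public struct Move
{
    public int Sequence;
    public int Input;
    public float Position;
}

public class PredictionBuffer
{
    public const float Speed = 1.0f;       // units moved per input; stand-in for a physics step
    public const float Threshold = 0.01f;  // arbitrary error tolerance

    private readonly Move[] _moves;        // circular buffer of recent moves
    private int _next;                     // sequence number of the next move to record

    public float Position;                 // current predicted position

    public PredictionBuffer(int capacity)
    {
        _moves = new Move[capacity];
    }

    // Predict locally and remember the move until the server confirms it.
    public void Record(int input)
    {
        Position += input * Speed;
        _moves[_next % _moves.Length] =
            new Move { Sequence = _next, Input = input, Position = Position };
        _next++;
    }

    // Compare the server's result for one move with the stored prediction.
    // Returns true if a correction (rewind + replay) was applied.
    public bool Reconcile(int sequence, float serverPosition)
    {
        // Ignore moves that were never made or have already been overwritten.
        if (sequence < 0 || sequence >= _next || sequence < _next - _moves.Length)
            return false;

        int slot = sequence % _moves.Length;
        if (Math.Abs(_moves[slot].Position - serverPosition) <= Threshold)
            return false; // close enough, do nothing

        // Rewind to the corrected state...
        _moves[slot].Position = serverPosition;
        Position = serverPosition;

        // ...then replay every later move on top of it, fixing the history as we go.
        for (int s = sequence + 1; s < _next; s++)
        {
            int i = s % _moves.Length;
            Position += _moves[i].Input * Speed;
            _moves[i].Position = Position;
        }
        return true;
    }
}
```

Note that this sketch assumes exactly one simulation step per move. The setup discussed in this thread runs a variable number of physics steps per network update, which is where the extra bookkeeping comes in.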

If you gave a helpful reply, I rated you up.

shurcool

439

April 23, 2008 11:02 AM

I'm working on a client/server game that uses the exact same technique, and it works fine.

I don't have time to look at your code atm (I might later), but I can give you some advice and maybe it'll help.

It sounds like it could be an issue with timing/sequence - perhaps you're polling for input/sending commands on the client one tick too early/late? What I mean by that is you have to have worked out a very precise system where you know exactly where and when the next command occurs, what each input # represents, and so on.

Let's take your example:

Quote:Let's say the client is 5 moves ahead of the server when the server sends a correction. The client replays all 5 moves correctly, but #6 is wrong. Now, since we're 5 states ahead, we only find out that #6 is wrong when the client is on #11. So the client replays from 6 to 11, then 12 is wrong. Etc.

If the client replays all 5 moves correctly, what is #6? Is it the next command that will occur in the near future?

You have to be very precise in what you mean by command #6 or #5, etc.

If you already are, then try to put some output for each client/server message.

Print the command #, the player position before, position after, and the command itself (i.e. move left?).

Do that for both the server and the client. Play your game until you see this behaviour occur, then quit immediately and go through the latest output line by line.

Most likely the sequence numbers are off by one somewhere. It could be the difference between "++counter; doSomething(counter);" and "doSomething(counter); ++counter;".
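To make the difference concrete, here is a small sketch. `SequenceDemo` and its two methods are made up for this example and stand in for whatever code stamps sequence numbers on outgoing commands. The same three inputs end up with different numbers depending on where the increment sits, so a client and server that disagree on the convention will compare move N against move N+1.

```csharp
using System.Collections.Generic;

public static class SequenceDemo
{
    // Increment first, then use: the first command is tagged 1.
    public static List<int> IncrementThenUse(int count)
    {
        var tags = new List<int>();
        int counter = 0;
        for (int n = 0; n < count; n++)
        {
            ++counter;
            tags.Add(counter);
        }
        return tags;
    }

    // Use first, then increment: the first command is tagged 0.
    public static List<int> UseThenIncrement(int count)
    {
        var tags = new List<int>();
        int counter = 0;
        for (int n = 0; n < count; n++)
        {
            tags.Add(counter);
            ++counter;
        }
        return tags;
    }
}
```

For three commands, `IncrementThenUse(3)` yields 1, 2, 3 while `UseThenIncrement(3)` yields 0, 1, 2.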

Anyway, hope that's somewhat useful in case you haven't already done all that. But don't worry, you'll resolve this sooner or later. It probably took me about that long to get this system to be completely stable and bug-free.

Kal_Torak

346

Author

April 23, 2008 02:55 PM

Thanks for the pointers shurcool.

That it was an off-by-one error was my first assumption, but I'm pretty sure now that it's not the problem in this case.
I've debugged this thing to hell and back. I've printed out harddrives full of debug information, making sure everything is in sync.

One thing that is particularly annoying about this bug is that it doesn't happen all the time. It only occasionally gets stuck in the loop. If it was an OBO error, it should happen more consistently.

Due to the decoupling of the rate of my physics updates from the rate I'm sending updates over the network, the order in which I do updates is a little more complicated, but I still can't find any error in the logic there.


shurcool

439

April 23, 2008 03:27 PM

Ok, but can you explain what exactly you meant by #6 being wrong when the client is 5 moves ahead and replays all 5 of them correctly?

Quote:Original post by Kal_Torak
One thing that is particularly annoying about this bug is that it doesn't happen all the time. It only occasionally gets stuck in the loop. If it was an OBO error, it should happen more consistently.

Are you using threads? Could this be a thread synchronization problem?

fenghus

187

April 23, 2008 04:29 PM

I noticed you're simulating 300ms of latency; does the problem appear/disappear or vary in degree with amount of lag?
Does the lidgren net log display anything special when the problem appears? (resent packets etc? turn on 'verbose'...)
Btw, the base library class Stopwatch uses QueryPerformanceCounter internally so you can swap out HiResTimer.cs if you want less code in the project.
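As a rough idea of what that swap could look like: the sketch below wraps `System.Diagnostics.Stopwatch` behind a `GetElapsedTime()` method. The `FrameTimer` class name is made up, and it assumes the project's `HiResTimer.GetElapsedTime()` returns the seconds elapsed since the previous call, which is a guess based on how it is used in the main loop rather than something taken from the project source.

```csharp
using System.Diagnostics;

public static class FrameTimer
{
    private static readonly Stopwatch Watch = Stopwatch.StartNew();
    private static double _last; // timestamp of the previous call, in seconds

    // True when the Stopwatch is backed by a high-resolution performance counter.
    public static bool IsHighResolution
    {
        get { return Stopwatch.IsHighResolution; }
    }

    // Seconds elapsed since the previous call (or since startup on the first call).
    public static double GetElapsedTime()
    {
        double now = Watch.Elapsed.TotalSeconds;
        double elapsed = now - _last;
        _last = now;
        return elapsed;
    }
}
```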

Kal_Torak

346

Author

April 23, 2008 08:41 PM

Quote:Original post by shurcool
Ok, but can you explain what exactly you meant by #6 being wrong when the client is 5 moves ahead and replays all 5 of them correctly?

I'll try to explain how I'm doing the updates here.

(A side note first, which I should have included in my first post: to reproduce the error, hold down either left or right (A or D) and press Escape to force a message loss. It will lose one message only. To reset it, press the spacebar. I usually have to keep losing messages for 15-20 seconds before the bug shows up.
So hold down A/D and press Escape then spacebar repeatedly.)

The network update occurs at different intervals than the physics updating.
This is because I don't want to send updates at as high a frequency as I'm updating local physics. I don't want to cap the local physics updating to the network update frequency either, because that would slow it down more than I'd like.

As a result, I get a variable number of physics updates per network update.

    while (AppStillIdle)
    {
        double tempElapsed = HiResTimer.GetElapsedTime();
        _accumulator += tempElapsed;
        _updateAccumulator += tempElapsed;

        while (_updateAccumulator >= UPDATE_INTERVAL)
        {
            _updateAccumulator -= UPDATE_INTERVAL;

            // The input is acquired in the UpdateGame() method.
            // The replaying is also done in UpdateGame().
            MoveState ms = UpdateGame();
            ms.Timesteps = _timeSteps;
            ms.Velocity = _playerManager.LocalPlayer.Paddle.Body.LinearVelocity;
            ms.Position = _playerManager.LocalPlayer.Position;

            // Build the state and hand it to the network manager
            _networkManager.OutGoingState = ms;
            _moveHistory[CurrentHistoryIndex++] = ms; // store in circular history buffer
        }

        // This sends the update directly after it's been built,
        // and before any more physics updates can occur.
        _networkManager.HeartBeat();

        while (_accumulator >= DELTA_TIME)
        {
            _playerManager.LocalPlayer.LastPosition = _playerManager.LocalPlayer.Position;
            _playerManager.UpdatePhysics(DELTA_TIME);
            _accumulator -= DELTA_TIME;
            _timeSteps++;
        }

        Render();
    }

This has the effect of introducing a constant lag equal to UPDATE_INTERVAL, because the input I just sent out has not taken effect. No physics updates have been calculated based on it yet. So we have to wait until the next round, at which time the number of updates we performed on it will be sent to the server, and then, a whole message later, the server can calculate the results of the last input and send it back.

This has some implications in the Replay correction code, which I'll point out below.
This is mostly pseudo-code for clarity.

    private void Correct(MoveState correctState)
    {
        Player.State = correctState;

        // Apply any input for this state.
        // Remember, this input here has not been calculated yet
        Player.ApplyInput(correctState.Input);

        // Set the timesteps back to the last correct state
        int timeSteps = correctState.Timesteps;

        // i starts at the index in moveHistory of the move that correctState corrects.
        // We have to replay all the way up to CurrentHistoryIndex
        while (i != CurrentHistoryIndex)
        {
            // Move to the next state
            i++;

            // Run the required number of physics updates for this state.
            // These updates are acting on the input from the LAST state.
            while (timeSteps < moveHistory[i].Timesteps)
            {
                _playerManager.UpdatePhysics(DELTA_TIME);
                timeSteps++;
            }

            // Apply any input for this state.
            Player.ApplyInput(moveHistory[i].Input);

            // Correct the position stored in the Move History
            moveHistory[i].Position = Player.Position;
        }

        // Run the remainder updates, this will bring us up to the current state.
        // Note that this is after all the moves in history have been corrected.
        // These are the updates that have happened on the last state, but have not been sent out yet.
        while (timeSteps < _timeSteps)
        {
            _playerManager.UpdatePhysics(DELTA_TIME);
            timeSteps++;
        }

        /* This seems to be where the error is introduced. The moves in history are
           always corrected perfectly, but the update that will be sent out next,
           using these remainder updates that we just calculated, will be wrong. */
    }

Here is some debug output that I've prettied up to make it more understandable.
This is the error loop in action. Couple things to note, even though the error state is not always 1 digit ahead of the replay end state, it's still the next consecutive state. This is because there are sometimes 1, sometimes 2, physics updates per network update.

Erroring on this state: 1130
*replaying*
Replay end state: 1136
*normal network updating*

Erroring on this state: 1137
*replaying*
Replay end state: 1143
*normal network updating*

Erroring on this state: 1145
*replaying*
Replay end state: 1150
*normal network updating*

Erroring on this state: 1152
*replaying*
Replay end state: 1157
*normal network updating*

Erroring on this state: 1159
*replaying*
Replay end state: 1165
*normal network updating*

Erroring on this state: 1166
*replaying*
Replay end state: 1172
*normal network updating*

Erroring on this state: 1174
*replaying*
Replay end state: 1179
*normal network updating*

So, taking the first two outputs from the above sequence,

Erroring on this state: 1130
*replaying*
Replay end state: 1136 //This is where the correction ended. It is the next one, (the one for which we ran the remainder updates) that is incorrect.
*normal network updating*

Erroring on this state: 1137 //This is the next one sent out after the replay, one with the corrected physics state.
*replaying*
Replay end state: 1143
*normal network updating*

I hope this clarifies more than it muddifies. It's difficult to explain.
Thanks for the replies.

P.S. shurcool, I'm pretty sure it's nothing to do with threading; I haven't explicitly used multiple threads at all.

Also, @fenghus, no, the latency doesn't change it. I've tested with more and less than 300ms, with the same results.


oliii

2,202

April 24, 2008 02:59 AM

Could be many things. First you should try sending network ticks EVERY frame to simplify the problem and use a fixed timestep (which I think you do). One input update -> one character update -> one packet sent.

But from where I stand, it looks like a timestep problem (or a mismatch of the number of updates performed for a given move sequence number).

Everything is better with Metal.

Kal_Torak

346

Author

April 24, 2008 04:30 PM

Quote:Original post by oliii
Could be many things. First you should try sending network ticks EVERY frame to simplify the problem and use a fixed timestep (which I think you do). One input update -> one character update -> one packet sent.

But from where I stand, it looks like a timestep problem (or a mismatch of the number of updates performed for a given move sequence number).

I'll probably restructure it that way for debugging purposes, but as a final solution it's not a viable option to send messages 60 times per second.
The server is going to have possibly hundreds of concurrent clients.

Can someone give me a quick outline of how you could lock the local update rate (incl. input) to the required low network rate, and still have the client update smoothly?
I can't see a nice solution right off the top of my head.


shurcool

439

April 24, 2008 05:02 PM

Do you mean your client sends 60 commands per second? That seems pretty high. In my game, my command rate is 20, and the update rate is independent (but at the moment is also set to 20). By update rate, I mean how often the server sends out authoritative state updates to the clients. The server->client updates don't have to be in sequence, as it's ok to skip some and always send only the latest.

As for the client commands, if you want to unlock the network sending from the physics tick rate (which I don't recommend), it's possible: simply queue up all the input commands since the last state that was authenticated by the server, and send them all in one packet however many times per second you like. By doing this, you're introducing extra artificial latency, so it's better just to send input commands to the server whenever you execute a physics update (i.e. as soon as one is available).
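A minimal sketch of that queueing idea, with made-up names: the `Command` struct and `PendingCommands` class are illustrative and not taken from either poster's game, and "acknowledged" here corresponds to "authenticated by the server" above. Every command stays in the list until the server confirms it, and each outgoing packet carries everything still unconfirmed.

```csharp
using System.Collections.Generic;

// Hypothetical input command: a sequence number plus the input it carries.
public struct Command
{
    public int Sequence;
    public int Input;
}

public class PendingCommands
{
    private readonly List<Command> _pending = new List<Command>();
    private int _nextSequence;

    // Called once per physics tick with the input used for that tick.
    public void Enqueue(int input)
    {
        _pending.Add(new Command { Sequence = _nextSequence++, Input = input });
    }

    // Called at the (lower) network rate: everything not yet confirmed
    // goes into one packet.
    public Command[] BuildPacket()
    {
        return _pending.ToArray();
    }

    // Called when the server confirms it has processed commands up to 'sequence'.
    public void Acknowledge(int sequence)
    {
        _pending.RemoveAll(c => c.Sequence <= sequence);
    }
}
```

Resending unconfirmed commands in every packet also gives some protection against a single lost packet, at the cost of a little bandwidth.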

Kal_Torak

346

Author

April 24, 2008 05:47 PM

Quote:Original post by shurcool
Do you mean your client sends 60 commands per second? That seems pretty high. In my game, my command rate is 20, and the update rate is independent (but at the moment is also set to 20). By update rate, I mean how often the server sends out authoritative state updates to the clients. The server->client updates don't have to be in sequence, as it's ok to skip some and always send only the latest.

As for the client commands, if you want to unlock the network sending from the physics tick rate (which I don't recommend), it's possible: simply queue up all the input commands since the last state that was authenticated by the server, and send them all in one packet however many times per second you like. By doing this, you're introducing extra artificial latency, so it's better just to send input commands to the server whenever you execute a physics update (i.e. as soon as one is available).

My client updates physics at about 60 times per second. I send network updates 30 times per second.
The server replies to messages whenever they arrive. It doesn't send out updates at set intervals.

I've basically got two different timesteps going at the same time.

    const float DELTA_TIME = 0.0166f;
    const float UPDATE_INTERVAL = 0.030f;

    while (AppStillIdle)
    {
        while (_updateAccumulator >= UPDATE_INTERVAL)
        {
            _updateAccumulator -= UPDATE_INTERVAL;
            ProcessInput();
            SendNetworkMessage();
        }

        while (_accumulator >= DELTA_TIME)
        {
            _accumulator -= DELTA_TIME;
            UpdatePhysics();
        }

        Render();
    }

The Input is locked with the Network rate while still letting the physics update faster.
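To see why the number of physics updates per network message varies with these two intervals, here is a small stand-alone simulation of the loop. It is only a sketch: `TwoRateLoop`, the fixed frame time, and the use of integer microseconds instead of float seconds (to avoid rounding drift in the example) are assumptions made for illustration. The two constants are the intervals quoted above.

```csharp
using System.Collections.Generic;

public static class TwoRateLoop
{
    private const long DeltaTime = 16600;      // physics step, microseconds (0.0166 s)
    private const long UpdateInterval = 30000; // network interval, microseconds (0.030 s)

    // Runs 'frames' iterations of the main loop with a fixed frame time and
    // returns how many physics steps happened between consecutive network sends.
    public static List<int> StepsPerNetworkUpdate(int frames, long frameMicros)
    {
        var stepsPerSend = new List<int>();
        long accumulator = 0, updateAccumulator = 0;
        int timeSteps = 0, stepsAtLastSend = 0;

        for (int f = 0; f < frames; f++)
        {
            accumulator += frameMicros;
            updateAccumulator += frameMicros;

            // Network side: process input and send a message.
            while (updateAccumulator >= UpdateInterval)
            {
                updateAccumulator -= UpdateInterval;
                stepsPerSend.Add(timeSteps - stepsAtLastSend);
                stepsAtLastSend = timeSteps;
            }

            // Physics side: fixed-timestep updates.
            while (accumulator >= DeltaTime)
            {
                accumulator -= DeltaTime;
                timeSteps++;
            }
        }
        return stepsPerSend;
    }
}
```

With a 10 ms frame time, `StepsPerNetworkUpdate(12, 10000)` gives 1, 2, 1, 2: the count alternates because 30 ms is not a whole multiple of 16.6 ms. The replay code therefore has to reproduce exactly the same per-move step counts that the original run used.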

Quote:Original post by shurcool
As for the client commands, if you want to unlock the network sending from the physics tick rate (which I don't recommend), it's possible: simply queue up all the input commands since the last state that was authenticated by the server

I'm already decoupling the network sending from the physics update, but in a different way. I just slow the input down to match the network, instead of speeding it up by queueing until the next network send.

