Client Side Prediction and Server Reconciliation

12 comments, last by hplus0603 5 years, 10 months ago

Hey! Hopefully this isn't a complicated question! I've been looking into netcode and client prediction + server reconciliation using a variety of sources: Valve's dev blog, Gabriel Gambetta's tutorial, and other posts.

I'm just trying to get some confirmation to help me understand one distinction that doesn't seem to be made about the various ways one can handle prediction + reconciliation. As far as I can tell, there are two distinct ways of handling it:

Option 1: You ensure that your engine frame number is "universal" across the server and the client. This means that all server client communication includes this frame number and commands associated with a frame are run at exactly that frame on both the client and the server. This involves ensuring that the client runs at a later frame number than the server, running "ahead of it", to ensure that commands that reach the server are stamped with a frame number >= the actual frame the server is currently calculating.
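A rough sketch of what Option 1 could look like on the server side, in Python. Everything here (`FrameServer`, `client_lead_frames`, the 1-D position, the two-frame safety margin) is made up for illustration and is not taken from any of the sources mentioned above. The idea is only that commands are queued by frame number and run on exactly that frame, and that a command stamped with an already-simulated frame is too late.

```python
from collections import defaultdict


def client_lead_frames(one_way_latency_ms, tick_rate_hz, safety_frames=2):
    """Frames the client should run ahead: latency in frames (rounded up) plus slack."""
    latency_frames = -((-one_way_latency_ms * tick_rate_hz) // 1000)  # integer ceiling
    return latency_frames + safety_frames


class FrameServer:
    """Hypothetical server that runs each command on the frame it is stamped with."""

    def __init__(self):
        self.frame = 0                    # next frame to simulate
        self.x = 0                        # 1-D position of a single player
        self.pending = defaultdict(list)  # frame number -> moves queued for it
        self.dropped = []                 # commands that arrived too late

    def receive(self, stamped_frame, move):
        # The stamp must be >= the frame the server is about to simulate.
        if stamped_frame < self.frame:
            self.dropped.append((stamped_frame, move))
            return False
        self.pending[stamped_frame].append(move)
        return True

    def step(self):
        # Apply every command stamped for this frame, then advance.
        for move in self.pending.pop(self.frame, []):
            self.x += move
        self.frame += 1
```

With 50 ms one-way latency at 60 Hz this gives a lead of 3 frames plus the margin. What to do with dropped commands (ignore, apply late, tell the client to run further ahead) is a design choice the sketch leaves open.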

Option 2: Each client stamps their commands with a "command number" that the server reciprocates when it sends updates to each client. This seems to be the model that Gabriel Gambetta's tutorial encourages. In this case, the server and the client frame numbers aren't really considered, and the server simply tells the client which command number it last processed and the client's state to allow the client to easily go back and reconcile which commands are old/new.
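For comparison, here is a minimal sketch of the client half of Option 2 as I understand it, assuming a trivial shared movement rule. `apply_input`, `PredictingClient` and the integer position are hypothetical names for illustration, not code from Gabriel Gambetta's tutorial.

```python
from dataclasses import dataclass, field


def apply_input(x, move):
    """Movement rule shared by client and server (assumed deterministic)."""
    return x + move


@dataclass
class PredictingClient:
    x: int = 0
    next_seq: int = 0
    pending: list = field(default_factory=list)  # unacknowledged (seq, move) pairs

    def send_input(self, move):
        # Stamp the command, remember it, and predict its effect right away.
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, move))
        self.x = apply_input(self.x, move)
        return seq, move  # what would go on the wire

    def on_server_update(self, last_processed_seq, server_x):
        # Drop acknowledged commands, snap to the authoritative state,
        # then replay the commands the server has not processed yet.
        self.pending = [(s, m) for s, m in self.pending if s > last_processed_seq]
        self.x = server_x
        for _, move in self.pending:
            self.x = apply_input(self.x, move)
```

Note that nothing here refers to a frame number: the server's reply only says "I have processed up to command N, and this is where you are."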

I've found a few tutorials based on Option 2, and I've implemented my own example using Option 1. While there's a couple of posts floating around about Option 2, I question whether or not that model works well at all. It feels like basing client side prediction on command numbers makes server and client state feel extremely inconsistent since the server simply executes any command whenever it receives it and the client reconciles extremely naively without any consideration of actual frame timings.

Additionally, it seems like Valve's dev blog indicates a possible Option 3 which is similar to Option 1 but is more complicated. They seem to use timestamps instead of frames, but also seem to mention running the server ahead of the client, which is a little confusing to me.

Thanks!

Option 2 works well when the set of actors is small (like, a two-person fighting game) and when re-simulation because you predicted wrong is cheap (like, a two-person fighting game.)

Option 1 is a much better match for games with many actors, physical simulation, and higher complexity.

Option 3 is indeed similar to option 1, except it keeps the server at the peak of the time, and each client runs behind to some extent based on its lag. It then keeps a log of what the physical state of each simulated entity was at times in the past, and when it receives "X shot Y at time T" it can go back to time T and verify that X was indeed able to shoot Y. It may even put X back at time Tx, and Y back at time Ty, which is the time that X would have seen Y at on their screen, for maximum accuracy. The benefit of this is that if I see me shoot you, then that "actually happens." The drawback is that it opens up a time window where I could perhaps cheat by playing with time stamps in a hacked client. The more permissive the server is to time differences, the bigger that window is.
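A minimal sketch of that rewind-and-verify step, assuming 1-D positions and a simple distance check. `LagCompensator`, `max_rewind_ticks` and `hit_radius` are illustrative names and values, not any engine's real API. The clamp is where the "how permissive is the server" trade-off shows up.

```python
class LagCompensator:
    """Hypothetical server-side history used to check shots against past states."""

    def __init__(self, max_rewind_ticks=12):
        self.max_rewind_ticks = max_rewind_ticks
        self.history = {}  # tick -> {entity_id: position}
        self.tick = 0      # next tick to be recorded

    def record(self, positions):
        # Store this tick's state and forget anything older than the window.
        self.history[self.tick] = dict(positions)
        self.history.pop(self.tick - self.max_rewind_ticks - 1, None)
        self.tick += 1

    def verify_shot(self, claimed_tick, target_id, aim_x, hit_radius=1.0):
        newest = self.tick - 1
        # Do not trust the client's time stamp blindly: clamp it into the
        # window the server is willing to rewind.
        tick = min(max(claimed_tick, newest - self.max_rewind_ticks), newest)
        snapshot = self.history.get(tick)
        if snapshot is None or target_id not in snapshot:
            return False
        return abs(snapshot[target_id] - aim_x) <= hit_radius
```

A larger `max_rewind_ticks` is kinder to high-latency shooters, but costs more memory per entity and widens the window a hacked client can play with.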

Latency is a drag. There's no perfect solution, and each game needs to come up with a solution that works for itself.

enum Bool { True, False, FileNotFound };

Got it. Thanks a lot. For clarity...

You mentioned how Option 2 is better when the set of actors is small. Why is that? If we use fighting games as an example, my thought process would be that you would care even more about exact frame timings, since most fighting games are very precise. Having my input processed at whatever moment the server happens to receive it, with no correlation (at least no enforced correlation) to what I see, seems terrible.

Option 3 makes sense. What you're saying is that, in this system, I'm constantly playing in the past, and the server performs magical rewinding to retroactively determine if my action in the past matches where the state was then. Is there any reason the server needs to be in the future then? It seems like, in this case, the server really is doing extrapolation since my client inputs are performed in the past, so the server in the future is purely just guessing where I would be. What's the point of that vs just having the server render just in time?

I guess a better way to put it is that you can have lag compensation done via Option 1 by having the server keep track of states and rewind. What's the advantage of Option 3 besides being a little confusing (how would you do client side prediction if the client is rendering in the past and getting updates from the server for the future?).

Option 2 isn't necessarily better for 2 players; it's just that it breaks down when you have many players.

I'd say that when you have few players, you can choose any of the options. When you have more players, choose option 1 or 3. When you have very many players, option 1 is likely best (because 3 needs too much RAM on the server.)

Note that option 2 can totally use frame timings; the main difference is that it continually rewinds and re-plays on the client because it receives opponent player commands "in the past" and updates the local simulation to match. When you have wind-up animations, that'll still work out OK for players, but the drawback is that players will see less of the wind-up animation based on the latency of the longest-latency player.

(There's of course also option 4, mainly used for RTS games: Everyone sends commands ahead of time, sufficiently long ahead of time that all players receive all commands before it's time to execute them, so everybody has deterministic simulation. This is why RTS games have "yes, sir!" acknowledgement animations; it hides the latency between giving a command, and the command actually taking effect.)
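To make the "send commands ahead of time" idea concrete, here is a small sketch under the assumption of a fixed input delay and reliable delivery. `LockstepSim` and `input_delay_ticks` are hypothetical names, and the "simulation" is just a log of what ran when.

```python
class LockstepSim:
    """Hypothetical peer in a lockstep game: commands run a fixed delay after issue."""

    def __init__(self, input_delay_ticks):
        self.input_delay_ticks = input_delay_ticks
        self.tick = 0
        self.scheduled = {}  # tick -> list of (player_id, command)
        self.log = []        # (tick, player_id, command) in execution order

    def issue(self, player_id, command):
        # The command is scheduled far enough ahead that every peer can
        # receive it before that tick. The tuple is the message to broadcast.
        return self.tick + self.input_delay_ticks, player_id, command

    def deliver(self, execute_at, player_id, command):
        if execute_at < self.tick:
            raise RuntimeError("command arrived too late; peers would diverge")
        self.scheduled.setdefault(execute_at, []).append((player_id, command))

    def step(self):
        # Sort so that every peer applies same-tick commands in the same order,
        # regardless of the order in which the packets arrived.
        for player_id, command in sorted(self.scheduled.pop(self.tick, [])):
            self.log.append((self.tick, player_id, command))
        self.tick += 1
```

The delay between `issue` and execution is the gap that an acknowledgement animation covers. A real implementation would stall the simulation rather than raise when a command is late.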

In option 3, the server is as far ahead as the closest/lowest-latency player. It's not different from option 1 in how it numbers the frames; it's different in how it treats the people being interacted with -- does hit detection happen in the time frame that the shooting player saw the shot player in, or in the server time frame?

Awesome, your first comment makes sense.

Your second point is interesting. In my model for what Option 2 entails, your server is "acking" user commands whenever it happens to receive user commands. If you put frame timings into the model, it really isn't Option 2 anymore, is it? I'm not saying you can't do that for a game, this is purely just theoretical and I'm trying to categorize different models of CSP in my head and how they work.

Yea, I know of the full lockstep "Option 4". That model is super straightforward, so I didn't include it.

For your final comment, I think I'm missing something fundamental about Option 3. I'll take a closer look at Valve's paper. It makes sense theoretically that if the server runs in the future, it's easier to reconcile when someone fired in the past and if it matches the state in the past. However, digging deeper, that feels contradictory. If I see someone at Tick 5 on my client and I fire, but the person has actually moved on the server at Tick 5 but that packet hasn't reached me, the server would still say I missed regardless of whether or not the server runs in the past or the future. Renumbering packets doesn't really change the fact that it takes time for opponents' packets to propagate to the server and then to me. It feels like having the server be fully authoritative and "behind" just makes more sense.

I think my last comment is a little long-winded, and I don't think I have a strong model for the exact implementation differences between 1 and 3. I'll report back when I've dug a little deeper here.

3 hours ago, impguard said:

I think I'm missing something fundamental about Option 3. [..] If I see someone at Tick 5 on my client and I fire, but the person has actually moved on the server at Tick 5 but that packet hasn't reached me, the server would still say I missed regardless of whether or not the server runs in the past or the future.

I think this comes down to the understanding of what "Tick 5" means. If you see someone at position X when your Client is on Tick 5, then that implies that the person was at position X on the Server at Tick 5, which happened a short time ago. The server might now be on tick 6 or 7, but can look back through its past states to see where that person was at tick 5, and give you a realistic resolution from your perspective. From that person's perspective, they could have moved significantly past that position by now, but Valve favours the shooter.

To put it another way - the idea that the server runs in the future means that (a) it knows you are seeing the world as it used to be, and (b) it can resolve your choices based on what it knows it had sent you at that time so it looks consistent. It also means (c) sometimes players get shot even when locally it appears that they are already behind cover. There's no free lunch. :)

Some engines don't bother to do the rewinding and simply act on the server's current position - this doesn't work well for fast-paced shooters, but is simpler and sufficient for RPG-like games.

Yea, thinking through it more, reading, and now looking through your post, I think I have another follow up that I might be missing about Option 3.

How does this work with client side prediction, then? The intention is that the Server, running ahead of the client, sends me Frame 5, my next frame, and that should be the frame I render on the client. What happens when I press "move right"? If I start moving the client to the right, then the following frames (6, 7, 8, 9...) from the server won't include my action just yet, since it hasn't reached the server yet. Yet I would want to move the player to the right before a full RTT.

In Option 1, these server frames are in the past, so I just roll the client back and make sure my predicted inputs are A-ok.

In Option 3, these server frames are in the future, so I'm not really rolling back anything. It seems like I would just play my actions on these states, assuming the server will reconcile it properly (since it will eventually get my choice to move at frame 5).

It seems like Option 1 makes client side prediction and consistent state very clear, at the cost of lag compensation being a little confusing (since you don't really know what the client really is seeing), whereas Option 3 makes lag compensation clearer since you know exactly what the opponent saw when they pressed fire in the past at the cost of making client side prediction a little wonky (not even sure right now how you would do any client side prediction, I never get a historical snapshot to rewind to on the client).

Thanks again for some awesome discussion!

The definition of 'future' and 'past' is arbitrary - information takes time to travel, and always appears to come from 'the past', whatever numbers you attach to your timesteps. All times are relative. That means these approaches are more similar than perhaps you think.

In Option 3 (the Valve approach) the client gets information about future entity positions, but they are derived from inputs applied to a previous state. This is also the case for your local player - you receive information about where your local player is supposed to have ended, based on the server calculations applied to input you sent earlier. If you store a buffer of previous states, you can check to see if they match these replies from the server, and if they don't, then your local prediction was wrong (probably because some physics happened on the server to move you), and so are all local states after that. It would have to correct that by snapping or interpolating towards the new position (or, perhaps more likely, an improved prediction based on the last known good state).
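A sketch of that buffer-and-compare idea, assuming a 1-D position and a deterministic "add the move" simulation. `StateBufferClient` and the tolerance value are illustrative only. It takes the "improved prediction from the last known good state" route: on a mismatch it adopts the server's state and re-applies the inputs made after it.

```python
class StateBufferClient:
    """Hypothetical client that keeps predicted states to check against the server."""

    def __init__(self):
        self.tick = 0
        self.x = 0.0
        self.inputs = {}      # tick -> move applied on that tick
        self.states = {}      # tick -> predicted position after that tick
        self.last_acked = -1  # newest tick the server has confirmed

    def predict(self, move):
        self.x += move
        self.inputs[self.tick] = move
        self.states[self.tick] = self.x
        self.tick += 1

    def on_server_state(self, server_tick, server_x, tolerance=0.01):
        """Return True if a misprediction was found and corrected."""
        # Ignore stale or impossible replies.
        if server_tick <= self.last_acked or server_tick >= self.tick:
            return False
        self.last_acked = server_tick
        # History older than the confirmed tick is no longer needed.
        for t in [t for t in self.states if t < server_tick]:
            del self.states[t]
            del self.inputs[t]
        if abs(self.states[server_tick] - server_x) <= tolerance:
            return False  # prediction matched, nothing to do
        # The prediction at server_tick was wrong, so every later state is
        # wrong too: restart from the server's state and replay later inputs.
        self.x = server_x
        self.states[server_tick] = server_x
        for t in range(server_tick + 1, self.tick):
            self.x += self.inputs[t]
            self.states[t] = self.x
        return True
```

In a real game you would probably smooth the visual correction instead of snapping `x`, but the bookkeeping is the same.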

In option 3, you either don't forward extrapolate the position of remote objects, OR the server has to do the same forward extrapolation as you would be doing, when "rewinding" the world to the state you should have seen. Easiest and most robust is to not extrapolate remote objects.

The local player is "extrapolated" in the sense that it "simulates ahead." As long as no actor/actor interactions happen, the server physics should be identical, but when it isn't, the client needs to be able to rewind the local player and re-play simulation from the point of correction.

Thanks. Yea, I understand how client side prediction works in model 1. I see what you're saying Kylotan.

In this case, here's where I'm landing regarding how the two approaches "mix". The server's state update to me is "in the future" with regards to remote objects, so I render them as I see them and the server knows that my Frame 5 is their Frame 5 with respect to remote objects.

However, my Frame 5 is not the server's Frame 5 with respect to my own input, since my Frame 4 input sequence hasn't reached the server just yet, so there's no way the server's Frame 5 could match mine with respect to myself. How do you consolidate this piece unless you completely combine Options 1 and 3 (keep track of two different frame numbers)?

The issue is this: Suppose I'm just standing still for a while. On my Frame 5, I decide to finally move forward. However, the server's futuristic state packets are still coming in, so I receive Server Packet 6 just in time for me to render Frame 6. I've started moving however, so while I render all remote objects appropriately, I start predicting my movement forward. Right now, the server has me standing still in all update packets, but I've started moving forward.

Eventually the server will receive my update for movement on Frame 5. There are a couple of options. (1) Roll back and apply my movement from frame 5 on, with or without extrapolation (bad since it now makes opponents "miss" if they shot me at Frame 5 but their packet was delayed). (2) Don't roll back. I guess the server would just start moving me forward at a later frame, but now it's inconsistent with me.

Finally, I would get back the server packet that includes my update. If RTT is 10 frames, this is my frame 15/16 from the server. With either of the options above, it's a little confusing how I am supposed to reconcile. If the server had rolled back to apply my movement and extrapolated forward 10 frames before sending me an update (since it only got a move forward for frame 5, not frame 16), it may be woefully off if I did other things between 5-16 that haven't resolved yet.

If the server didn't reconcile and simply applied my movement on receipt, I guess I just have to reconcile the server frame 16 with my frame 5. This is the inconsistency problem. I guess one option would be to keep track of two frame numbers in this scenario. One for remote objects and one for my inputs. The server runs in the future for remote objects but in the past for my inputs, allowing me to reconcile my client side prediction but the server to reconcile my actions with remote objects.
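To make the two-frame-number idea concrete, here is a sketch of what a server update could carry and how the client might consume it. The `Snapshot` fields and `apply_snapshot` are hypothetical, just my way of writing down the scheme above: one number says which server tick the remote objects belong to, the other says which of my inputs are already included in my own position.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    world_tick: int        # server tick the remote object positions belong to
    acked_input_tick: int  # newest client input tick already folded into own_x
    remote_x: dict         # entity_id -> position at world_tick
    own_x: float           # authoritative position of the local player


def apply_snapshot(snapshot, pending_inputs):
    """pending_inputs maps client tick -> move for inputs not yet acknowledged.

    Returns (tick to render remote objects at, predicted own position,
    inputs that are still unacknowledged).
    """
    still_pending = {
        t: move for t, move in pending_inputs.items()
        if t > snapshot.acked_input_tick
    }
    # Remote objects are taken as-is from world_tick; only the local player
    # is re-predicted from the last acknowledged input.
    predicted_x = snapshot.own_x
    for t in sorted(still_pending):
        predicted_x += still_pending[t]
    return snapshot.world_tick, predicted_x, still_pending
```

So a packet for server frame 16 that has only absorbed my frame 5 input would have `world_tick=16` and `acked_input_tick=5`, and I would replay my inputs from frame 6 onward on top of it.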

After blabbing a bunch, it seems like the last paragraph is the only reasonable way to resolve this. Which indicates that Option 3 by itself isn't particularly useful since it makes Client Side prediction impossible. Whereas Option 1 is passable but may make players feel some desync. But Option 3 and Option 1 combined is kind of the best of all worlds.

Does anything I'm saying make sense?

This topic is closed to new replies.
