Networked Physics sanity check


I'm working on the networking architecture for our game, but I'm not too experienced with multiplayer programming so I thought I'd ask for a sanity check of my ideas. Any feedback would be appreciated :D

For now, I'm assuming:
* I want to keep server CPU usage down low
* Both dedicated servers and one-client-as-server are required
* Gameplay is heavily physics simulation based, and slightly non-deterministic
* Direct client-to-client physics collisions are avoided in the gameplay (e.g. client-controlled objects pass through each other instead of colliding)
* Avoiding the ability for cheating is a high priority
* Players will have <200ms pings to the server
* Our gameplay / physics update code is very fast (think ~1ms per frame)

I'm using a server-authoritative model inspired by Quake 3 / Counter-strike, based on unreliable delta-compressed gamestate snapshots.

When connecting, the client synchronizes its wall-clock to the server's, and then sets its game clock to be 100ms ahead of the server's game clock -- i.e. every client will be extrapolating/predicting 100ms into the future.
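Roughly, as a sketch (all names here are mine, assuming an NTP-style ping/pong exchange for the wall-clock sync; this isn't from any particular library):

#include <cstdint>

// Estimate the server/client wall-clock offset at connect time: the
// server stamps a pong with its time, and half the measured round trip
// approximates the one-way latency.
int64_t estimateClockOffsetMs(int64_t pingSentAtMs, int64_t pongReceivedAtMs,
                              int64_t serverTimeInPongMs)
{
    int64_t rttMs = pongReceivedAtMs - pingSentAtMs;
    return (serverTimeInPongMs + rttMs / 2) - pongReceivedAtMs;
}

constexpr int64_t kExtrapolationMs = 100; // fixed lead, for now

// The client's game clock: estimated server time plus the lead margin,
// so inputs sent now should arrive before the server needs them.
int64_t clientGameTimeMs(int64_t localWallClockMs, int64_t clockOffsetMs)
{
    return (localWallClockMs + clockOffsetMs) + kExtrapolationMs;
}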

Each frame on the client, it gathers user input and runs the game logic as usual, producing a predicted game-state. It also bundles up the last quarter-second's worth of user inputs + timestamps and sends them to the server (unreliable) -- the redundancy (sending old+new input snapshots instead of just the current snapshot) guards against packet loss.
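As a sketch, the redundant send might look like this (struct layout and names are illustrative, not an actual wire format):

#include <cstdint>
#include <deque>
#include <vector>

struct TimestampedInput
{
    int64_t gameTimeMs; // client game time this input applies to
    float steer, throttle, brake; // example axes for a racing game
};

std::deque<TimestampedInput> history; // the most recent ~250ms of inputs

void recordAndSend(const TimestampedInput& current,
                   void (*sendUnreliable)(const std::vector<TimestampedInput>&))
{
    history.push_back(current);
    // Drop anything older than a quarter-second.
    while (current.gameTimeMs - history.front().gameTimeMs > 250)
        history.pop_front();

    // Every packet carries the whole recent window, so losing one packet
    // loses no inputs; the server ignores timestamps it already consumed.
    sendUnreliable(std::vector<TimestampedInput>(history.begin(), history.end()));
}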

Each frame on the server, it will have a queue of time-stamped user inputs -- hopefully many of them will be in the future (for players with a packet trip time of under 100ms). The server consumes any inputs that are not in the future and applies them this frame. Every N frames (network update rate), the server sends a game-state snapshot to each client (using the Q3 delta model).
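Server-side, the consumption step might look like this sketch (reusing the TimestampedInput struct from above; it assumes duplicates from the redundant packets were already discarded on receive):

#include <cstdint>
#include <map>

struct PlayerInputQueue
{
    // Keyed by timestamp, so begin() is always the oldest pending input.
    std::map<int64_t, TimestampedInput> pending;
};

void consumeInputs(PlayerInputQueue& q, int64_t serverGameTimeMs,
                   void (*applyInput)(const TimestampedInput&))
{
    auto it = q.pending.begin();
    while (it != q.pending.end() && it->first <= serverGameTimeMs)
    {
        applyInput(it->second);   // drive this player's vehicle this frame
        it = q.pending.erase(it); // inputs still in the future stay queued
    }
}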

On the client, if a snapshot has arrived from the server, it stores a small backup of the current predicted position/orientation of each physics object and the current game-time value, and then replaces its game-state with the version from the server. This game state will be around 100 to 200ms in the past though, so after applying it, the client runs its Update function as many times as required to re-advance time to where it had actually gotten to (replaying its own inputs each Update). This produces an updated predicted gamestate, but may also cause physics objects to snap/jump/teleport to new positions in cases where the previous predictions did not match the server's behavior.
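In sketch form (the types below are minimal stand-ins for the real game state, not an engine API):

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Transform { float pos[3]; float quat[4]; };
struct PhysicsObject { uint32_t id; Transform xf; };

struct GameState
{
    int64_t gameTimeMs = 0;
    std::vector<PhysicsObject> objects;
    void update(int64_t dtMs) { gameTimeMs += dtMs; /* step physics + gameplay */ }
};

// Called when an authoritative (but 100-200ms old) snapshot arrives.
void onSnapshot(GameState& predicted, const GameState& serverState,
                std::unordered_map<uint32_t, Transform>& backup)
{
    const int64_t localTimeMs = predicted.gameTimeMs;
    const int64_t tickMs = 16; // e.g. a 60Hz fixed timestep

    // 1. Back up the predicted transform of every physics object.
    backup.clear();
    for (const PhysicsObject& obj : predicted.objects)
        backup[obj.id] = obj.xf;

    // 2. Adopt the server's state wholesale.
    predicted = serverState;

    // 3. Re-advance tick by tick back to local time; the real code would
    //    also re-apply the locally stored inputs for each tick. Objects
    //    may snap to new positions where the old prediction was wrong.
    while (predicted.gameTimeMs < localTimeMs)
        predicted.update(tickMs);
}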

The new game-state positions/orientations are then subtracted from the backed-up ones to produce a position/orientation "error" value for each physics object. These errors are stored with each object and added on when computing their visual transforms -- this hides the fact that the objects have snapped/jumped. Over a few frames, the "error" values are lerped towards 0, which smoothly animates objects over to the new positions instead of 'snapping' there.
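A one-axis sketch of that error smoothing (this version decays the error exponentially instead of lerping over a fixed frame count; the constants are made up):

struct SmoothedObject
{
    float simPos = 0.0f;   // simulated (possibly just-corrected) position
    float errorPos = 0.0f; // old predicted position minus corrected position

    // Call right after a snapshot correction snaps simPos.
    void absorbCorrection(float oldPredictedPos)
    {
        errorPos += oldPredictedPos - simPos;
    }

    // Call once per render frame; returns the position to draw at.
    // (Assumes kDecayPerSecond * dtSeconds < 1 for stability.)
    float visualPos(float dtSeconds)
    {
        const float kDecayPerSecond = 10.0f; // tune to taste
        errorPos -= errorPos * kDecayPerSecond * dtSeconds;
        if (errorPos > -0.001f && errorPos < 0.001f)
            errorPos = 0.0f; // snap tiny residuals to zero
        return simPos + errorPos; // hides the correction jump
    }
};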

 

Cons:
* High client CPU usage, due to needing to perform many rewind-and-repredict physics ticks every time a server game-state packet arrives.
* Misprediction of other clients' actions causes 200ms extrapolation errors (movement that suddenly starts/stops might have a ping-sized delay / error smoothing).

Pros:
* No latency on the client's own actions (when pings are <200ms).
* Low server CPU usage - no rewinding of gamestates on the server.

[edit] Here's a bunch of links on the Q3/Counter-strike model that I'm ripping off / converting to use extrapolation instead of interpolation:
http://trac.bookofhook.com/bookofhook/trac.cgi/wiki/Quake3Networking
http://www.ra.is/unlagged/network.html
http://fabiensanglard.net/quake3/network.php
https://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking
https://developer.valvesoftware.com/wiki/Latency_Compensating_Methods_in_Client/Server_In-game_Protocol_Design_and_Optimization


Seems like a solid version of the Quake networking model. We do something similar, except we keep the server ahead of the clients, i.e. we never extrapolate on the clients, only interpolate. One thing missing from your description is lag compensation. Depending on the gameplay this may or may not be needed, but if you have hit-scan weapons you might want the server doing some kind of lag compensation, to counter the fact that the client and server are on different timelines.

It's been quite a while since I did any networking (15 years, with a vague bit 6 years ago); some bits are controversial, there are a lot of different ways of doing things, and I'm sure there will be experts on here soon, but from vague memory:

I have an inkling that the client side predictions should never be 'off' unless you either

  1. have a large network hiccup or
  2. have some interaction with another player

Make it as close to deterministic as possible, except for eye candy, with periodic clamping to integers (you will probably use this for network compression anyway) -- or integer math throughout in an ideal world. Be aware of floating point inconsistencies. Having lots of gameplay physics sounds like diving in at the deep end, but needs must lol.

For sending input, I think I basically sent the entire history since the last 'ack' (acknowledgement) from the server, which tells the client which tick the server has received up to.
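From memory it was something along these lines (names are illustrative):

#include <cstdint>
#include <deque>

struct Input { int32_t tick; /* buttons, axes, ... */ };

std::deque<Input> unacked; // inputs the server hasn't confirmed yet

// Each server packet carries the newest input tick it has received.
void onServerAck(int32_t ackedTick)
{
    while (!unacked.empty() && unacked.front().tick <= ackedTick)
        unacked.pop_front(); // the server has these; stop resending them
}

// Each client tick: push the new input onto 'unacked' and send the whole
// window. It stays small as long as acks keep arriving.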

This next bit I can't remember exactly and is just what I did; there may be a much better (more modern) way of doing this:

On the server, if particular clients got behind, that was their problem and they were more likely to get shot(!). On the server, each actor can be simulated up to a DIFFERENT tick, with some up to date and some behind. The clients interpolate their latest view of the other actors, and the server sends out whatever it has. The success of a shot / whatever depends on what info the shooter (and hence the server) sees, NOT what the shootee sees (so if a shootee is behind they are at a disadvantage). If you make it depend on the shootee and info the server hasn't received yet, it makes it easier to cheat and super hard for shooters. And look at lag compensation for dealing with things like whether a shot hits.

In the server's packets to each client, only send out info on actors that are potentially visible (use a PVS); this makes the packets much smaller and quicker for the client. And use deltas in packets wherever possible.

Write a network simulation layer so you can simulate and test all network conditions / packet loss.
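A bare-bones version of such a layer might look like this (a real one would also model jitter, duplication and reordering; this only does fixed latency plus random loss):

#include <cstdint>
#include <queue>
#include <random>
#include <vector>

struct DelayedPacket { double deliverAt; std::vector<uint8_t> data; };

class NetSimulator
{
    std::mt19937 rng{std::random_device{}()};
    std::queue<DelayedPacket> inFlight;
public:
    double latencySec = 0.1;  // one-way latency to simulate
    double lossChance = 0.05; // fraction of packets to drop

    void send(double nowSec, std::vector<uint8_t> data)
    {
        if (std::uniform_real_distribution<double>(0.0, 1.0)(rng) < lossChance)
            return; // dropped on the virtual wire
        inFlight.push({nowSec + latencySec, std::move(data)});
    }

    bool receive(double nowSec, std::vector<uint8_t>& out)
    {
        if (inFlight.empty() || inFlight.front().deliverAt > nowSec)
            return false; // nothing has "arrived" yet
        out = std::move(inFlight.front().data);
        inFlight.pop();
        return true;
    }
};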

As I am sure you are aware Glenn Fiedler wrote some great articles on this:

https://gafferongames.com/post/networked_physics_2004/

I'm more used to the 'client runs behind the server and interpolates' model so I don't think I have any good suggestions here, but my first preference would be to see if it's possible to reduce the 100ms requirement to something much smaller. If you're extrapolating a shorter time into the future you're going to get fewer corrections, since fewer unpredicted events will have occurred, and there will be less to replay when you get them, as you have a shorter timespan to cover.

Similarly, you might also consider whether you can increase the network update rate on the server - if the corrections arrive earlier, there will be fewer updates to apply client-side, which may save some time.

Thanks for the input, y'all :D

I should probably add a bit more context -- the specific kind of game I'm making is a high-speed racing game, similar I guess to Wipeout, etc., except also with some RTS/MOBA elements involved. There's no shooting, but there is physics-trigger-based gameplay, equivalent to picking up items, etc...

On 5/4/2018 at 4:37 PM, GuyWithBeard said:

Seems like a solid version of the Quake networking model. We do something similar, except we keep the server ahead of the clients, i.e. we never extrapolate on the clients, only interpolate. One thing missing from your description is lag compensation.

Yeah that seems to be the typical way to implement this model. If I understand correctly, to implement lag compensation in that typical version, the server needs to be able to rewind time and apply user inputs (e.g. hit-scans) in the past, based on timestamped user inputs.

I'm not clear on whether FPS games typically only do this for hit-scans (rewind time, trace rays, restore present), or whether they do it for all user inputs (rewind time, insert "move forward" command, fast-forward several physics ticks to present)? I was put off by this, because I assumed that with my physics based racing gameplay I would need the server to actually rewind and re-run the vehicle simulation when user inputs arrived... With the detail in the physics sim, this adds quite a bit of CPU expense (which my model has shifted from the server onto every client... :| )

In my version where the clients are ahead of the server, I side-step most of the need for any lag compensation on the server as it's assumed that timestamped user inputs arrive at the server before the moment at which they need to be applied. I also don't have any hit-scan weapons, so slight relative position errors between clients aren't a problem.

On 5/4/2018 at 4:51 PM, lawnjelly said:

I have an inkling that the client side predictions should never be 'off' unless you either

Yeah this should be the case. Mine will be very slightly off because I'm using PhysX, which is non-deterministic. Running the exact same code with the exact same inputs won't always produce the exact same outputs... Over long periods of time these build up to massive differences, but at the sub-second scale, the mis-matches will hopefully be imperceptible once network-smoothing is applied on top.

On 5/4/2018 at 4:51 PM, lawnjelly said:

Write a network simulation layer so you can simulate and test all network conditions / packet loss.

As I am sure you are aware Glenn Fiedler wrote some great articles on this

Yeah I've read Glenn's work, and I'm using his (fairly recent) networking library yojimbo to handle the low-level stuff :) He's implemented a ping/packet-loss/jitter simulation layer in there already, but I've yet to try it out!

15 hours ago, Kylotan said:

my first preference would be to see if it's possible to reduce the 100ms requirement to something much smaller. If you're extrapolating a shorter time into the future you're going to get fewer corrections, since fewer unpredicted events will have occurred, and there will be less to replay when you get them, as you have a shorter timespan to cover

Yeah, I'm using a fixed 100ms at the moment, but it really should be automatically set to half your ping/RTT plus a small extra buffer for safety. So a player with 200ms ping would use 100ms extrapolation, but a player with 30ms ping might only need 15ms extrapolation.
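i.e. something like this (the safety buffer and clamp values are guesses):

#include <algorithm>
#include <cstdint>

int64_t extrapolationMs(int64_t measuredRttMs)
{
    constexpr int64_t kSafetyMs = 10; // absorbs jitter
    constexpr int64_t kMaxMs = 250;   // clamp pathological pings
    return std::min(measuredRttMs / 2 + kSafetyMs, kMaxMs);
}
// e.g. 200ms RTT -> 110ms ahead; 30ms RTT -> 25ms ahead.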

15 hours ago, Kylotan said:

Similarly, you might also consider whether you can increase the network update rate on the server - if the corrections arrive earlier, there will be fewer updates to apply client-side, which may save some time

I had a similar hunch, but one downside is that for each server-update that arrives, I need to repredict the entire 100ms (or whatever this extrapolation length is). So a doubling of the server send rate basically doubles the CPU usage on the client side! :o 

This affects clients with larger pings too... a client with 400ms ping (so: 200ms extrapolation) has double the CPU usage of a client with 200ms ping (100ms extrapolation)...

I guess I could adjust the server sending rate based on ping, so low-ping players get more updates per second and high-ping players get fewer, in order to avoid overwhelming their CPUs.

I think this is the main flaw in my model -- at high ping times and/or high server send rates, the client's CPU load grows too large.

 

I'm actually also going to implement a second networking model for a 100-player time-trial race mode, but that will basically just be sharing replay-splines, with massive latency and zero interaction between players (besides comparing lap times).

 

[edit] I just found http://mrelusive.com/publications/papers/The-DOOM-III-Network-Architecture.pdf which seems to describe something similar to what I'm doing:

Quote

The server progresses the game without waiting for input from players. The server duplicates old player input if no new input has arrived in time to process the next game frame. The client tries to make sure the server always has new input to advance the game state at the server. As such the time at the client is ahead of the server time. The client runs just far enough ahead such that input from the player can be processed immediately at the client and can be sent over the network to the server where it arrives before the server needs to process the input to advance the game.

The client is able to run ahead of the server by using prediction to advance the state of entities. The server sends snapshots to the client at a rate between 10 and 20 Hz. While no new snapshot has arrived the client predicts the state of entities on a frame to frame basis. Upon receiving a snapshot the client overwrites its entity states with the entity states from the snapshot. A snapshot the client receives is from at least a full ping time in the past relative to the current time at the client. As a result the client temporarily moves back in time when it processes the snapshot. The client then has to quickly re-predict ahead from this state up to the current client time, from where the client can continue the prediction on a frame to frame basis.

 

After reading that doc from Doom 3, I was quite surprised that Quake 3 had separate code for client and server simulation; it seems a no-brainer to share much of this code. Maybe it was for historical reasons (which they corrected in Doom 3). Also, in Doom 3, sending 32-bit floats seems a tad overkill imo, and I wouldn't regard it as gospel...

Some of the major bits boil down to this:

  • Should the server predict client actors ahead of when it has received their input (yes / no)
  • Should the client predict the player -- yes
  • Should the client predict the other actors from the snapshots using interpolation / extrapolation, or full-on physics simulation

If the client is doing full-on prediction for actors, then you have to ask why the server is doing this too; it seems like repeating the same thing in 99% of cases. I.e. if you know an actor is pressing forward, and the server predicts him from tick 20 - 25, what's the point? Just tell the client that you have him predicted only up to tick 20, plus his input, and leave the client to predict all that stuff. Congrats, you've just saved a shedload of server CPU messing.

The exception is where there is interaction between actors, such as sliding off each other. This is where things get really dicey, because the order of physics interactions is critical to the result. Afaik this is why, when you play Counter-Strike etc., everything runs smoothly until you brush up against another player, and then you can judder all over the place.

In my opinion you would be best off writing your code to handle all of these cases; it should not be significantly more difficult than locking yourself into one approach ahead of time. Then you can try all the approaches and see what works best for you, or perhaps swap them on the fly to trade off accuracy and CPU. As I pointed out earlier, there may only be a need to do the whole shebang when actors are directly interacting.

Another thing you may find is that you may want to use a slightly different setup due to the nature of your game. In a fast-paced racing game, an actor could have collided with a wall and shot off the edge of the track in the time frame of a typical extrapolation, so there may be a greater need for physics rather than extrapolation, which might have been fine in a first-person shooter.

15 hours ago, lawnjelly said:

After reading that doc from Doom 3, I was quite surprised that Quake 3 had separate code for client and server simulation; it seems a no-brainer to share much of this code. Maybe it was for historical reasons (which they corrected in Doom 3). Also, in Doom 3, sending 32-bit floats seems a tad overkill imo, and I wouldn't regard it as gospel...

Quake 1 was a pioneer in the idea of forming a strong separation between your "update" and "render" code (something that should be fairly standard these days). However, they achieved this by basically designing a dedicated game server module with all the game rules, and a "dumb terminal" that just knows how to render game objects. Even when playing single-player, the engine spawns a local server and a "client" and communicates between them similarly to a networked game. This was/is a good idea in general, but they took it a bit far, to the point where gameplay programmers had very limited control over how things were rendered or replicated over the network.
This architecture was also replicated by the many, many descendants of Quake, such as Half-Life or Call of Duty (though I'm not sure what their modern engines are doing now -- Source 2 / COD 9000)...

Yeah, sending full 32-bit seems like an easy choice, but 32-bit floats only provide 24 bits of precision in the worst case anyway... so as far as I can tell, you may as well define what your worst-case range is and then send 24-bit fixed point over the network :) (or fewer than 24 bits if that's overly precise)... e.g. if your world extends out to 4000 units, and 1 unit = 1 meter, then at the edge of the world, floats will give you approximately 4000m / 2^24 = ~1/4mm precision (but much greater precision close to the origin)... This implies you're happy to have 0.25mm precision in your coordinates, so 24-bit fixed point replication should be fine for you!
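As a sketch, assuming a 0..4000m world and one coordinate axis:

#include <cstdint>

constexpr double kWorldMin = 0.0;    // assumed world bounds, 1 unit = 1m
constexpr double kWorldMax = 4000.0;
constexpr uint32_t kBits = 24;
constexpr uint32_t kMaxQ = (1u << kBits) - 1;

uint32_t quantizePos(double x)
{
    double t = (x - kWorldMin) / (kWorldMax - kWorldMin); // normalize to 0..1
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return static_cast<uint32_t>(t * kMaxQ + 0.5); // round to nearest step
}

double dequantizePos(uint32_t q)
{
    return kWorldMin + (kWorldMax - kWorldMin) * (double(q) / kMaxQ);
}
// Step size: 4000m / 2^24 = ~0.24mm, matching the float's worst case above.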

15 hours ago, lawnjelly said:

The exception is where there is interaction between actors, such as sliding off each other.

Yeah we're simply having all vehicles be ghosts to each other -- passing right through. We might try enabling collisions, but it will be a "custom" game mode, not the standard...
On a fun side note -- in many FPS games, instead of having players actually collide with each other, they instead override your movement commands when you're intersecting with another player. e.g. if you're touching a player on your left side, the game forces your "Strafe right" key on. This provides a nice "squishy" feel to player collisions and allows you to push people around.

15 hours ago, lawnjelly said:

Another thing you may find is that you may want to use a slightly different setup due to the nature of your game. In a fast paced racing game an actor could have collided with a wall and shot off the edge of a track in the time frame of a typical extrapolation, so there may be greater need for physics rather than extrapolation

Yeah, definitely. We have huge curves in the tracks everywhere, so simple dead-reckoning style linear extrapolation would cause cars to clip through the track everywhere. To "extrapolate" (better called "predict", I guess) the game-state, I simply call the game's Update function as many times as required to advance from the server's snapshot to the client's local time.

For what it's worth, for a racing game, you may get better extrapolation results if you extrapolate players based on what the AI would do in the current time step.

Also, for time re-play, you don't necessarily need to re-step the physics; just keep a log of physics states from the past, and use the appropriate state for whatever interactions you are checking. Even if a player "dies," you can often let their physics simulate for a few more steps before you actually apply the death.

enum Bool { True, False, FileNotFound };

I have been working on a game with a networked physics model that is almost identical to what you are proposing, so I can tell you it definitely works. Getting things to be mostly deterministic has been my biggest challenge. I am using Bullet, which is definitely not deterministic out of the box, especially when the client side does not have the full world. However, with some tweaking you can get a sufficient level of determinism.

Getting good resimulation performance in the worst-case scenario is also fun. Still not 100% on that yet. Currently refactoring my resim code to be physics-state-only and run on a background thread as needed, which is basically only when remote player input prediction fails.

Look up papers on dead reckoning to get good input prediction. I am using Lagrange polynomials based on past input; I believe that is a good approach for racing games as well.
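As a sketch, a quadratic (3-sample) version of the idea -- the actual polynomial order and sample spacing will vary:

#include <array>

// s[0..2] hold one input axis (e.g. steering in -1..1) at ticks t = -2, -1, 0.
// Evaluating the interpolating quadratic at t = +1 predicts the next value.
float predictNextInput(const std::array<float, 3>& s)
{
    // Lagrange basis evaluated at t = 1 for nodes t = -2, -1, 0
    // gives weights 1, -3, 3.
    float predicted = s[0] - 3.0f * s[1] + 3.0f * s[2];
    // Extrapolation can overshoot, so clamp to the axis range.
    if (predicted > 1.0f) predicted = 1.0f;
    if (predicted < -1.0f) predicted = -1.0f;
    return predicted;
}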

Sleep is for the weak, or at least that is what I tell myself every morning.

5 hours ago, hplus0603 said:

for a racing game, you may get better extrapolation results if you extrapolate players based on what the AI would do in the current time step.

Yeah I was thinking of trying this, but currently my AI is pretty brain dead and definitely not a good proxy for a human...

Currently I'm just repeating remote players' inputs from the last snapshot. I was also thinking of having the server broadcast inputs constantly (e.g. at 60Hz) even though state snapshots are sent less frequently (e.g. 10Hz).

5 hours ago, hplus0603 said:

Also, for time re-play, you don't necessarily need to re-step the physics; just keep a log of physics states from the past, and use the appropriate state for whatever interactions you are checking

My problem is that the bulk of the game interactions are the physics in the present. I need to correct the client's physics simulation state to not get too far out of sync with the server's physics simulation.

5 hours ago, jpmcmu said:

Getting good resimulation performance in the worst-case scenario is also fun.

I'm kind of lucky in that for most of the project, I've been using a fixed 600Hz gameplay rate, which gives a budget of around 1.5ms per frame, and we've just dropped the gameplay loop back down to 60Hz so now I'm more than 10x under budget. Currently I can do somewhere around 10 to 15 ticks per frame (167 to 250 ms simulated time) within our 60Hz performance target. 

5 hours ago, jpmcmu said:

I am using Lagrange polynomials based on past input.

You build curves based on past input samples over time to predict future input samples? e.g. If the player has been depressing the brake increasingly over the last few frames, they'll probably continue to depress it further? Neat :)

