Rollbacks and Simulation Replay - Performance


Hi everyone! Despite being a successful games programmer for a few years now, this is the first time I've decided to post here on the forum, so hello!

I am currently building a multiplayer game that involves physically simulated cars. I have been doing networking work for years and am very familiar with client/server architecture, client-side prediction, lag compensation, and server reconciliation. However, I have mostly worked on netcode for shooter games, which was a bit simpler than the physics simulation I am dealing with now.

My technology
I am using Unreal Engine, but that is largely irrelevant here, because I have integrated a separate physics engine (the newest NVIDIA PhysX) so that I can tick it manually, replay physics, and generally deal with the fixed time step myself. I have complete control over the physics for my vehicles and the entire scene; Unreal doesn't interfere, and I've made sure of it.
My setup also lets me change the time step at edit time, so I can test various fixed time steps without any trouble.
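In case it helps to picture it, the fixed-step driver is essentially this (a simplified sketch, not my actual Unreal integration; simulate/fetchResults are the standard PhysX stepping calls, the surrounding class and names are made up):

```cpp
// Simplified sketch of the fixed-step driver (not the actual Unreal integration).
// PxScene::simulate / fetchResults are the standard PhysX stepping calls; the
// surrounding class and names are illustrative.
#include <PxPhysicsAPI.h>
#include <cstdint>

class FixedStepDriver
{
public:
    explicit FixedStepDriver(physx::PxScene* InScene, float InFixedDt = 1.0f / 60.0f)
        : Scene(InScene), FixedDt(InFixedDt) {}

    // Called once per rendered frame with the variable frame delta.
    void Advance(float FrameDeltaSeconds)
    {
        Accumulator += FrameDeltaSeconds;
        while (Accumulator >= FixedDt)
        {
            StepOnce();
            Accumulator -= FixedDt;
            ++CurrentTick;
        }
    }

    // One deterministic physics tick; the same call is reused for rollback replays.
    void StepOnce()
    {
        Scene->simulate(FixedDt);
        Scene->fetchResults(true); // block until the step completes
    }

    uint64_t CurrentTick = 0;

private:
    physx::PxScene* Scene = nullptr;
    float FixedDt = 1.0f / 60.0f;
    float Accumulator = 0.0f;
};
```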

My networking solution
I am using the standard networking model that has been presented in multiple GDC talks, such as the Overwatch, Rocket League, and Halo ones, and Glenn Fiedler's networked physics talk.
My clients always run ahead of the server by half the RTT plus one buffer frame. Clients throttle their simulation to run slightly faster or slower, as dictated by the server, so the server always has a small buffer of inputs. This approach is very robust; I have shipped multiple FPP and TPP shooter games with it without any trouble. Clients only send the server a list of inputs, from the oldest not-yet-acknowledged input up to the current one, so packet loss is covered very well (a standard technique).
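The input packet itself is nothing fancy, roughly along these lines (a sketch; the struct and field names are just for illustration):

```cpp
// Sketch of the redundant-input packet: every send contains all inputs from the
// oldest not-yet-acknowledged tick up to the current tick, so a lost packet is
// covered by the next one. Names and fields are illustrative.
#include <cstdint>
#include <deque>
#include <vector>

struct VehicleInput
{
    uint32_t Tick = 0;
    float    Throttle = 0.0f;   // -1..1
    float    Steer = 0.0f;      // -1..1
    bool     Handbrake = false;
};

struct InputPacket
{
    uint32_t FirstTick = 0;              // tick of Inputs[0]
    std::vector<VehicleInput> Inputs;    // contiguous ticks, oldest unacked .. current
};

InputPacket BuildInputPacket(const std::deque<VehicleInput>& PendingInputs)
{
    InputPacket Packet;
    if (!PendingInputs.empty())
    {
        Packet.FirstTick = PendingInputs.front().Tick;
        Packet.Inputs.assign(PendingInputs.begin(), PendingInputs.end());
    }
    return Packet;
}

// When the server acknowledges tick N, everything up to and including N is dropped
// from PendingInputs, shrinking the next packet.
void OnServerAck(std::deque<VehicleInput>& PendingInputs, uint32_t AckedTick)
{
    while (!PendingInputs.empty() && PendingInputs.front().Tick <= AckedTick)
    {
        PendingInputs.pop_front();
    }
}
```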
For the characters it's all simple: the local client uses its own inputs to predict itself and buffers the states and inputs locally, so when the server returns a state, the client checks whether a correction is needed and, if so, replays all the frames from the corrected one up to the current one to get back in sync. It all works perfectly; I just want you to be aware that I am very familiar with this networking model.
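The reconciliation itself boils down to something like this (a heavily simplified sketch; the state/input types and the StatesDiffer/SimulateTick helpers are illustrative stand-ins, not my real code):

```cpp
// Heavily simplified sketch of the correction/replay path for a predicted character.
// The state/input types, StatesDiffer and SimulateTick are illustrative stand-ins.
#include <array>
#include <cstdint>

struct CharState { float Pos[3]; float Vel[3]; };
struct CharInput { float MoveX, MoveY; bool Jump; };

constexpr uint32_t BufferSize = 256;               // ring buffer of recent ticks
std::array<CharState, BufferSize> StateBuffer;     // predicted state per tick
std::array<CharInput, BufferSize> InputBuffer;     // input applied on each tick
uint32_t CurrentTick = 0;

bool StatesDiffer(const CharState& A, const CharState& B);              // epsilon compare
CharState SimulateTick(const CharState& From, const CharInput& Input);  // one fixed-dt step

void OnServerState(uint32_t ServerTick, const CharState& ServerState)
{
    if (!StatesDiffer(StateBuffer[ServerTick % BufferSize], ServerState))
        return; // prediction matched, nothing to replay

    // Adopt the authoritative state for that tick, then re-run every stored input
    // from the corrected tick up to the present one.
    StateBuffer[ServerTick % BufferSize] = ServerState;
    for (uint32_t Tick = ServerTick + 1; Tick <= CurrentTick; ++Tick)
    {
        StateBuffer[Tick % BufferSize] =
            SimulateTick(StateBuffer[(Tick - 1) % BufferSize],
                         InputBuffer[Tick % BufferSize]);
    }
}
```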

Now to the point…
The regular player characters can be rolled back to the correction frame and replayed extremely easily, because they do not have any real physics; they are kinematic, so their “physics” replays are very, very cheap.
I then started implementing my vehicle system, which is deterministic enough (except for floating-point errors, but those are well covered by the corrections, so they can be ignored here). The same input for the vehicle gives exactly the same result on all clients and the server, which is a very big win for me. However…
My game is heavily focused on car collisions, and that's where the trouble starts!

Predicting everything (like Rocket League)
Simulation running at 60 Hz
In this solution I predict the local player, which is easy thanks to the deterministic nature of my vehicle and physics system. There are never any mispredictions, even when hitting static objects around the scene.
However, OTHER CARS are not on the same timeline as I am. I used the Rocket League approach, where I predict remote clients just as I predict my own car. When I get information about a remote client from the past, I check whether it needs to be corrected; if so, I replay from that corrected frame to the current one. It's the standard technique and it works very well. Mispredictions happen fairly often, because I cannot predict a remote client's input; all I have is their last received input. If they changed direction during that time, I have no way to know it, but that's okay, the correction and replay cover it well enough that I don't really see it.
This technique seems perfect, but… performance. I measured my simulation, and one physics frame usually takes around 0.5-1 ms, which is really nice. However, let's say I am on frame 100, but because I am a high-ping client (say 200 ms ping), I may receive information about remote clients from 200 or more milliseconds ago, which at a 60 Hz simulation translates to around 13 frames. This is where the trouble starts. If one simulation step takes around 1 ms and I need to replay from 13 frames ago up to the current one, that's 13 ms. This IS SIGNIFICANT. Considering the game renders at 60 FPS (~16 ms per frame), that's roughly 80% of the frame budget added on top! Of course, we could try to budget the game so it can run at 120 FPS, which would leave enough headroom, but I am wondering if I am missing something. Or is it really the way it has to be? Considering how remote clients go out of sync as soon as they change their inputs, this can mean simulating 13 frames of physics during a single fixed-time-step tick quite often. I do not think I can squeeze the physics to simulate faster than that 0.5-1 ms; it is pretty good as it is.
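Just to put the numbers in one place, my back-of-envelope is basically this (a throwaway sketch with the example values from above plugged in):

```cpp
// Throwaway back-of-envelope: how many frames does one correction replay, and what
// does that cost? Constants are just the example values from my post.
#include <cmath>
#include <cstdio>

int main()
{
    const double RttSeconds = 0.200; // 200 ms ping
    const double TickRate   = 60.0;  // 60 Hz fixed step
    const double StepCostMs = 1.0;   // measured cost of one physics tick (worst-ish case)

    // ~12 frames from the RTT alone; with jitter/buffering it's the ~13 I mentioned.
    const int FramesBehind = static_cast<int>(std::ceil(RttSeconds * TickRate)) + 1;
    const double ReplayCostMs = FramesBehind * StepCostMs;

    std::printf("replaying ~%d frames costs ~%.1f ms on top of the 60 FPS frame budget\n",
                FramesBehind, ReplayCostMs);
    return 0;
}
```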

So do you think I am missing something here? Or is this the way to go, and we should just optimize other parts of the game to account for that extra 13 ms? Thanks for any responses!


No matter what you do, you'll never overcome latency. Instead, focus on what fits well for the game.

How does the game play? How does it feel? When played in real-world situations with people located in distant cities, does the game feel laggy, choppy, or stuttery? Do players rubber-band? If so, that's something to address.

Rollback and re-simulate is a technique among many in the toolbox to help mitigate the effects of latency. Most games apply several of them, like audio cues, different-length animations, and many more.

As for optimizing, there are things you need to optimize and things you want to optimize. If your machines are unable to do all the things that need to get done, then you need to optimize: it must be made faster to keep up, and you have to do it for a quality game experience. If your machines are able to keep up, then it instead becomes something you may want to optimize, freeing up processing power for other tasks you can add to make the game experience better.

Thank you for the valuable input.
I am well aware latency is always going to be a thing, and that there are various techniques; I wanted to see if there are people here with more experience than me in networking physics and vehicles specifically. I have a couple of other solutions I will try as well, such as adding local player input delay in order to keep all clients on the same timeline. In fact, this is something I am exploring right now. It gives 100% accurate results, so I am now checking whether I can hide the lag behind some vehicle suspension animations, sounds, and particles.
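The input-delay variant I'm experimenting with looks roughly like this (just a sketch; the class and names are made up for illustration):

```cpp
// Sketch of the input-delay variant: instead of applying local input immediately,
// queue it for N ticks so that every client applies a given input on the same
// simulation tick. Names are illustrative.
#include <cstdint>
#include <map>

struct VehicleInput { float Throttle = 0, Steer = 0; bool Handbrake = false; };

class DelayedInputQueue
{
public:
    explicit DelayedInputQueue(uint32_t InDelayTicks) : DelayTicks(InDelayTicks) {}

    // Called when local input is sampled on tick T: schedule it for T + delay
    // and send it to the server/peers right away so it arrives in time.
    void Submit(uint32_t CurrentTick, const VehicleInput& Input)
    {
        Scheduled[CurrentTick + DelayTicks] = Input;
    }

    // Called by the fixed-step simulation: returns the input due this tick,
    // or the last applied one if nothing is scheduled.
    VehicleInput Consume(uint32_t CurrentTick)
    {
        auto It = Scheduled.find(CurrentTick);
        if (It != Scheduled.end())
        {
            LastApplied = It->second;
            Scheduled.erase(It);
        }
        return LastApplied;
    }

private:
    uint32_t DelayTicks;
    std::map<uint32_t, VehicleInput> Scheduled;
    VehicleInput LastApplied;
};
```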

For my game, the levels will not be very large and there will be up to 12 players for now. As for the questions you asked: the game doesn't feel laggy or choppy, the netcode is well compressed and optimized, there is not much bandwidth used, and we are also using net dormancy and relevancy systems, which save even more bandwidth. That part is fine. Of course, I cannot allow local and remote clients to be on different timelines (i.e., simulated locally from different frames), because collisions between vehicles would then happen between, say, local player frame 100 and remote client frame 80, which results in horrible desyncs; it's quite obvious that is not the solution here.

So I see only two solutions here: either predict both remote clients and the local client and replay (often) when there is a misprediction, or delay the local client and hide the lag behind animations and other effects.

I guess what my question really is, is simply whether or not that physics corrections replay is something that games with networked physics do in real life. For reference, the games we are inspired by are titles like World of Tanks and Crossout. They both handle vehicle collisions extremely well, and there is no lag I can put my finger on.

Fury22 said:
whether or not that physics corrections replay is something that games with networked physics do in real life

Some games do. There.com did this all the way back in 1999, to jam a many-player physical virtual world into a 56k modem, and other games have done this since. The “GGPO” network library, for fighter-style games, is one example, and has a good description of how it works in that case.

It's a good way to save on network bandwidth, because you only need to send a game tick number and a small input vector, rather than a full entity state. And even if you correct with a full state, you can then get a better forward simulated copy to display to the client – as long as simulation is reasonably cheap. (This is a hidden assumption in GGPO; if you're running a full rigid body dynamics system, that won't work so well.)

enum Bool { True, False, FileNotFound };

hplus0603 said:

… as long as simulation is reasonably cheap. (This is a hidden assumption in GGPO; if you're running a full rigid body dynamics system, that won't work so well.)

I guess that is the issue in my case. I am aware games do that kind of replay, and my character system also does it successfully, but since the characters are kinematic, with very simple non-physical movement, replaying as many frames as I like is pretty much free. As you said, though, my vehicles are indeed rigid bodies that only have forces applied to them.

It's also interesting to me how Rocket League did their replay system, because from their GDC talk I know they do indeed predict not only the local player's car but also other players, and they replay the entire physics scene pretty much all the time. They use Bullet physics, which turns out to be slightly slower than PhysX (I measured it; I have both implemented for comparison). And yet their simulation runs at 120 Hz, so their replays contain many more frames than mine. In that talk they mentioned that their corrections are extremely expensive, but did not give any particular numbers. It's interesting that they have gotten away with this. I can only assume the rest of the game is very well optimized, so they stay within budget even with worst-case corrections. That is impressive.

Computers are fast these days. As long as there's just a couple of bodies per car, plus just a few for the environment, plus static colliders for the arena, you should be able to run the full simulation in way less than a millisecond. If you budget ten frames of latency, that's just a couple of milliseconds per frame.

It's really when you get into n-squared interactive bodies – piles of boxes in a container, throngs of people in a crowded space, and so on – that you end up paying a higher cost.

enum Bool { True, False, FileNotFound };

Well, my cars are reduced to one body per car, and that will not change. It's not only good performance-wise; it also gives very good determinism, since there is less that can go wrong during collisions.
However, that still concerns me.

Remote clients are predicted like in Rocket League, which means I have to constantly forward-simulate them. Technically it's fine, I have a pipeline for forward simulation, but it is somewhat expensive depending on how old the packets we receive are.
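For what it's worth, the snapshot/restore part of that pipeline is tiny for a one-body car, something like this (a sketch; getGlobalPose and the velocity getters/setters are the standard PxRigidDynamic calls, the wrapper struct and functions are just for illustration):

```cpp
// Sketch of the snapshot/restore step for a one-body car. getGlobalPose and the
// velocity getters/setters are the standard PxRigidDynamic accessors; the wrapper
// struct and functions are illustrative.
#include <PxPhysicsAPI.h>

struct BodySnapshot
{
    physx::PxTransform Pose;
    physx::PxVec3      LinearVelocity;
    physx::PxVec3      AngularVelocity;
};

BodySnapshot Capture(const physx::PxRigidDynamic& Body)
{
    return { Body.getGlobalPose(),
             Body.getLinearVelocity(),
             Body.getAngularVelocity() };
}

void Restore(physx::PxRigidDynamic& Body, const BodySnapshot& Snapshot)
{
    Body.setGlobalPose(Snapshot.Pose);
    Body.setLinearVelocity(Snapshot.LinearVelocity);
    Body.setAngularVelocity(Snapshot.AngularVelocity);
}
```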

Fury22 said:
that is somewhat expensive

I think your next step is to determine what “somewhat expensive” really means. If you spend 1 millisecond of 1 core per frame on forward simulation on your target hardware, that's probably perfectly fine. If you can get a better experience by using some CPU rather than letting the CPU sit idle, that seems like a good trade-off.

enum Bool { True, False, FileNotFound };

Also, be very careful about any kind of networked physics coupled with replication. It's easy for something you think is deterministic to not actually be deterministic, and that has completely ruined many aspiring games over the years.

As an example, FPU operations are defined within accuracy and precision guidelines and produce results that are within tolerance but still bitwise different across architectures. The newer SIMD float operations are better in that regard. Another issue is that even if your C++ code appears deterministic, two different builds might be optimized differently and give slightly different numeric results, even when built twenty minutes apart from the same source; your server executable and client executable will likewise be optimized differently and thus produce numerically different results. And even if your code is deterministic, it's running on a CPU that isn't when viewed in conjunction with all the other processes running. Code moving between processors, or interplay with other processes, tasks, and libraries, can also give subtly different results that are still within math tolerance but bitwise different.

Networked physics also plays badly over real-world networks. Having a ball or rock roll down a hill with network-synced physics might LOOK okay on a gigabit LAN, but spread people across the country on a variety of consumer equipment and the updates will look like the object is popping around violently rather than moving smoothly.

It's something that needs extensive testing. It's a huge cost that is hidden from people who aren't already experienced with it, and a big motivation for companies to use pre-built solutions like UE5, Unity, C4, and assorted other engines and middleware. It is certainly possible to make deterministic distributed physics, but be mindful that you're looking at an enormous challenge.

And regarding performance, discussions are best with real numbers, as H+ commented above. It's not enough to say “it is expensive”; metrics like “932 microseconds in this measurement” or “217 microseconds on average, peaking at 392 microseconds” are far more useful, though harder to get. When tuning performance you need to carefully measure before and after the changes, to make sure you've really improved the situation in all usage scenarios.
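Getting those numbers doesn't take much; something along these lines is usually enough (a sketch using std::chrono; the ReplayFrames callable stands in for whatever you're measuring):

```cpp
// Minimal sketch for getting real numbers (average + peak) out of the replay step,
// using std::chrono; the ReplayFrames callable is a stand-in for the measured work.
#include <chrono>
#include <cstdio>

struct ReplayTimer
{
    double TotalUs = 0.0, PeakUs = 0.0;
    long long Samples = 0;

    template <typename Fn>
    void Measure(Fn&& ReplayFrames)
    {
        const auto Start = std::chrono::steady_clock::now();
        ReplayFrames();
        const auto End = std::chrono::steady_clock::now();

        const double Us =
            std::chrono::duration<double, std::micro>(End - Start).count();
        TotalUs += Us;
        PeakUs = Us > PeakUs ? Us : PeakUs;
        ++Samples;
    }

    void Report() const
    {
        if (Samples > 0)
            std::printf("replay: %.0f us average, %.0f us peak over %lld samples\n",
                        TotalUs / Samples, PeakUs, Samples);
    }
};
```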

All the things about determinism you mentioned are things I am fully aware of. By deterministic I meant deterministic enough that the results are close enough for the network corrections to fix any errors.

This is the same as what they did with Rocket League. They are not deterministic, they use Bullet, they still have floating-point errors, but that is expected. This is why we replicate physics states, to fix exactly that kind of drift.
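Concretely, the check that decides whether a correction (and therefore a replay) is even needed is roughly this (a sketch; the state layout and thresholds are made up for illustration):

```cpp
// Sketch of the "close enough" check that decides whether a correction and replay
// are needed at all. The state layout and tolerances are made-up illustrative values.
struct BodyState
{
    float Pos[3];
    float LinVel[3];
};

bool NeedsCorrection(const BodyState& Predicted, const BodyState& Authoritative,
                     float PosToleranceMeters = 0.05f,
                     float VelToleranceMps    = 0.25f)
{
    float PosErrSq = 0.0f, VelErrSq = 0.0f;
    for (int i = 0; i < 3; ++i)
    {
        const float Dp = Predicted.Pos[i]    - Authoritative.Pos[i];
        const float Dv = Predicted.LinVel[i] - Authoritative.LinVel[i];
        PosErrSq += Dp * Dp;
        VelErrSq += Dv * Dv;
    }
    return PosErrSq > PosToleranceMeters * PosToleranceMeters ||
           VelErrSq > VelToleranceMps    * VelToleranceMps;
}
```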

My physics is very close to deterministic, and I have tested it across around 10 machines placed around Europe. In general, my netcode looks and works really well so far with the physics.

But I guess you are right. I need to complete more game systems and have more complete gameplay in order to say whether those x milliseconds are a lot or not.

As for multithreading, PhysX is parallelized by default, so it already utilizes multithreading internally.

This topic is closed to new replies.
