# Synchronized input-based networking- computational load?


## Recommended Posts

I've been thinking about implementing a game networking system whereby all players' game inputs are sent to all other players (as opposed to more concrete game state info like position and velocity), and those inputs are then applied to every player on every machine involved. What draws me to this approach is its low maintenance and bandwidth. Although it's not where I got the idea, I read The Internet Sucks, and it's helpful. But Lincroft seemed to have more problems with networking than anything else; what I'm more worried about is the computational cost of this technique.

Since the server and client machines must maintain a synchronized game state, and there are considerable delays between the creation of information and its receipt on other machines, client and server machines alike must basically go back in time and re-apply old input data to get all the way back to the current state. This means they must run up to about 10 physics updates per frame, and so around a thousand per second. In my case there are several hundred rigid bodies (most of which would be inactive, however). It seems impossible that the CPU could keep up.

Has anyone tried this before, or just have any other ideas? Thanks. Of course, one way would be to set up an incredibly complex system which quickly simulates only rigid bodies against the environment in cases where the bodies couldn't possibly have interacted with others, but that's beyond my abilities. Not sure it'd be enough of an optimization anyway.
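For what it's worth, the rewind-and-replay loop being described can be sketched in a few lines. This is a toy illustration with invented names (`RollbackSim`, a dict-of-numbers "state"), not anyone's actual engine; the real cost question is how expensive `step` is when it's a full physics update:

```python
# Toy rollback/re-simulation sketch (all names hypothetical).
from copy import deepcopy

class RollbackSim:
    def __init__(self, initial_state, history=40):
        self.state = initial_state           # current game state; step() must be deterministic
        self.tick = 0
        self.history = history               # how many past ticks we can rewind
        self.snapshots = {0: deepcopy(initial_state)}
        self.inputs = {}                     # tick -> list of (player, input)

    def step(self, state):
        # Stand-in for a deterministic physics update: integrate stored inputs.
        for player, move in self.inputs.get(self.tick, []):
            state[player] = state.get(player, 0) + move
        return state

    def advance(self):
        self.state = self.step(self.state)
        self.tick += 1
        self.snapshots[self.tick] = deepcopy(self.state)
        self.snapshots.pop(self.tick - self.history - 1, None)   # prune old snapshots

    def apply_remote_input(self, tick, player, move):
        # A late packet for an old tick: record it, rewind, and re-simulate.
        self.inputs.setdefault(tick, []).append((player, move))
        if tick < self.tick:                 # input is in the past -> rollback
            self.state = deepcopy(self.snapshots[tick])
            saved_tick = self.tick
            self.tick = tick
            while self.tick < saved_tick:    # re-run every tick up to the present
                self.state = self.step(self.state)
                self.tick += 1
                self.snapshots[self.tick] = deepcopy(self.state)

sim = RollbackSim({}, history=40)
for _ in range(3):
    sim.advance()                            # ticks 0..2 simulated, now at tick 3
sim.apply_remote_input(0, "p1", 5)           # 3-tick-old input forces a 3-step replay
print(sim.state, sim.tick)                   # {'p1': 5} 3
```

The worst case in the post above (10 re-simulated updates per frame) corresponds to the `while` loop running ~10 iterations on every new packet.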

##### Share on other sites
An important factor is - what kind of game is it? There is no single catch all solution which will work for every game type.

##### Share on other sites
As said already, it depends what kind of game it is. Client-Server beats P2P almost every time bandwidth-wise. It only relies on one computer having solid processing power and download/upload; everyone else is free to have their own latency and packet loss and what have you. Depends on the game though.

Quote:
 Original post by Baiame: Since the server and client machines must maintain a synchronized game state

True for both Client-Server and P2P.

Quote:
 Original post by Baiame: client and server machines alike must basically go back in time and re-apply old input data to get all the way back to the current state; this means that they must run up to about 10 physics updates per frame, and so around a thousand per second

I don't think you understand the concept of a Client-Server system. Clients send input and server sends out full state/update packets to players. Culling on the server can make this task trivial.

Quote:
 Original post by Baiame: Of course one way would be to set up an incredibly complex system which quickly simulates only rigid bodies against the environment in cases where the bodies couldn't possibly have interacted with others, but that's beyond my abilities. Not sure it'd be enough of an optimization anyway.

P2P is far from being an optimization.

Put it this way: Player 1 hits a rigid body block, Player 2 hits a rigid body block, and the blocks move toward one another. They send their input to all other players. Timestamps will have to be used to know when things happened. If one player receives the message at 100 ms, another receives it at 250 ms, and the two blocks have already hit, well, you can see the complexity. Whose client is right after subtle errors compound? Also imagine if one client lags for 5 seconds and requires a full state update. Who sends out the full state?

In a Client-Server system, the players send input and the server runs one authoritative simulation. Player 1 sends a packet saying it wants to move forward, and Player 2 does the same. The server updates and performs the physics step after processing the packets, and realizes the players hit the blocks. It marks their delta variables and, when the update finishes, sends out small packets telling everyone within range of the positions and velocities, with a timestamp.
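That authoritative tick can be sketched roughly like this (all names are hypothetical, and a plain list append stands in for a real UDP send):

```python
# Hypothetical authoritative server tick: apply inputs, step once, broadcast deltas.
def server_tick(world, pending_inputs, clients, tick, step_physics):
    # 1. Apply every queued client input to the one true simulation.
    for player_id, command in pending_inputs:
        world["players"][player_id]["intent"] = command
    # 2. Run one physics step; this is the only simulation that counts.
    step_physics(world)
    # 3. Broadcast only what changed, stamped with the tick number.
    delta = {pid: p["pos"] for pid, p in world["players"].items() if p.get("dirty")}
    packet = {"tick": tick, "delta": delta}
    for client in clients:
        client.append(packet)        # stand-in for a real network send
    for p in world["players"].values():
        p["dirty"] = False           # clear change flags for the next tick
    return packet

# Tiny demo: p1 asks to move by 2, p2 is idle, so only p1 appears in the delta.
def step_physics(w):
    for p in w["players"].values():
        if p["intent"]:
            p["pos"] += p["intent"]
            p["dirty"] = True

world = {"players": {"p1": {"intent": 0, "pos": 0, "dirty": False},
                     "p2": {"intent": 0, "pos": 5, "dirty": False}}}
clients = [[], []]
print(server_tick(world, [("p1", 2)], clients, tick=7, step_physics=step_physics))
# {'tick': 7, 'delta': {'p1': 2}}
```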

Which system seems more stable: everyone trying to synchronize simulations, or all the clients synchronizing to one true simulation?

##### Share on other sites
You don't have to send state packets from the server; you can still send an input stream. That saves a lot of bandwidth, but does cost processing -- both on the server, and on the other clients.

If you're just displaying simple dead reckoned entities, then the most simulation you have to do on the client for the entity is a little bit of testing so you don't go through the ground. If you run the full simulation (based on inputs) of entities on servers and clients, you have to run the full character physical simulation, which may be significantly more expensive.

If you want to know one way of building an entire system around input-synchronous simulation, while hiding latency wherever possible, see US Patent 6628287.

[Edited by - hplus0603 on February 1, 2008 1:39:45 PM]

##### Share on other sites
Quote:
 Original post by TheGilb: An important factor is - what kind of game is it? There is no single catch all solution which will work for every game type.

It's a first-person shooter with up to 64 players (no, nothing like the Battlefield games).
Quote:
 Original post by Sirisian: I don't think you understand the concept of a Client-Server system. Clients send input and server sends out full state/update packets to players. Culling on the server can make this task trivial.

Not with the method I'm talking about. I guess my explanation wasn't clear enough; please glance through The Internet Sucks and you'll get the idea. Basically, nothing but inputs are sent from clients to the server, and from the server to clients. I can't do much culling, as it's a requirement that every player can potentially see every other player on the map (I could send lower-fidelity info for more distant players, though, if I went with the "server sends absolute states" approach, which seems likely at this point). EDIT: This part wasn't very clear. I meant that if I opted for the traditional approach, I couldn't do much culling; if I went ahead with what I'm planning and talking about here, I couldn't do ANY.
Quote:
 Original post by Sirisian: P2P is far from being an optimization. Put it this way. Player 1 hits a rigid body block and Player 2 hits a rigid body block and the blocks move toward one another. They send their input to all other players. Timestamps will have to be used to know when things happened. If one player receives the message at 100 ms and another receives it at 250 ms and the two blocks already hit, well, you can see the complexity. Whose client is right after subtle errors compound? Also imagine if one client lags for 5 seconds and requires a full state update. Who sends out the full state?

Read again: what I described in that paragraph was an idea for a purely machine-side physics system optimization; it had nothing to do with client/server versus P2P (it would apply either way). But what you said wouldn't in itself be a problem. The simulation runs at a fixed timestep on the client machines and the server, so synchronization isn't conceptually difficult. Update numbers would be used to know when things happened. Both players' simulations would revert to old gamestates to apply the new input information, then run from there up until the present time; bugs notwithstanding, there are no synchronization issues.

In any case I had pretty much already decided on client-server, as:
1. The bandwidth requirements for 64 players P2P are still too high with this method.
2. Late-joining and player dropping (due to packets not being received in time) are much easier to handle.
Quote:
 Original post by hplus0603: If you want to know one way of building an entire system around input-synchronous simulation, while hiding latency wherever possible, see US Patent 6628287.

Thanks, but the grinding repetition made it impossible for me to read all the way through. From what I saw, though, they didn't do anything that Lincroft didn't.

What really drew me to this approach in the first place was consideration of the physics system. The physics will be roughly as complex as those in Half-Life 2. Now, it's bad enough that with 64 players you need to send 64^2 * packet size bytes each update just for the player info, but in this case you also need to send a lot of info pertaining to the rigid bodies. If, on the other hand, you only send player inputs, you don't need to send ANY info about the bodies.

So what I need is either a super-fast physics system (I estimate that 500,000 CPU cycles per ~50 ms timestep update would do it), or a means of handling all the information not directly about players well enough, or something completely different.

##### Share on other sites
Quote:
 please glance through The Internet Sucks

That article is 9 years old, and the lessons it draws come from even older experience.

That doesn't make them less valid - but 12 years is a very long time. Processor computing power has increased 256 times; graphics performance probably by 1024-2048 times.

The Internet has gone from mostly dial-up PPP to broadband, where streaming 5-20 kB/s is somewhat viable, although practical stream rates remain at around 3 kB/s. It's the age of Facebook, YouTube and BitTorrent, after all.

Quote:
 The physics will be roughly as complex as those in Half-Life 2. Now, it's bad enough that with 64 players you need to send 64^2*packet size bytes each update just for the player info, but you also in this case need to send alot of info pertaining to the rigid bodies. If, on the other hand, you only send player inputs, you don't need to send ANY info about the bodies.

If it's of the same complexity, then how does HL2 do it? If they were sending 4-byte packets, that would be 280000 gigabytes of data per update. But still, their servers run just fine.

##### Share on other sites
Quote:
 from what I saw they didn't do anything that Lincroft didn't

Actually, the novel part of that invention is sliding the reference time frame around for different objects, and dealing with how objects interact based on their client-side perceived time frame. Whereas Tie Fighter had one database of "forward extrapolated" entities it used for rendering, that invention has the ability to continuously slide entities in relative time, and does so based on various criteria.

Anyway, I don't think CPU power has increased 256 times since 1999. I believe a top-of-the-line CPU from then was something like an 800 MHz Pentium III (this was a year or two before the Pentium 4 came out) with 133 MHz memory. Current CPUs run at 3 GHz with two or four cores, and 1333 MHz (effective) memory. Thus, you can claim a factor of 10, but not 100. Basically, once the second-generation Pentium 4 came out, Moore's law stopped applying to computer speed scaling, and we're now down to a measly 15-25% speed improvement (per single core) per year.

So, the result is still the same: When using input-synchronous client/server simulation with an authoritative server, you become limited on the number of entities you can deal with on the server. However, 64 entities should be no problem.

The problem with using existing physics engines like Havok or PhysX (or even ODE) is that they are not guaranteed deterministic. That means that the lockstep assumption goes out the window -- the simulations will slowly diverge. Now, you may call me partial, because my company is in the business of licensing an enterprise virtual world platform based on exactly that: input-synchronous client/server simulation with an authoritative server. However, I'm just telling it like it is :-)

I believe HL2 uses entity updates for both players and physical objects. It also uses more than 5 kB/sec of bandwidth for a 16-person deathmatch with physical objects.

##### Share on other sites
Things have changed a lot. With the server I've been developing (a test on bandwidth), I plan to average around 400 delta updates (give or take) for a 28K connection, i.e. 3584 bytes/s (best case, meaning they are only playing the game and have no interruptions from their ISP).

"Havok or PhysX (or even ODE) is that they are not guaranteed deterministic."
So true. Few physics engines are. I had to engineer my own for 2D (it's rather poor, but gets the job done). Deterministic physics = easy synchronization: a lot less worrying about things like "did the grenade bounce off the wall and hit that object exactly like it did for everyone?" With timestamps you end up with the ability to know exactly where everything should be at a given time by rolling the simulation back and forth (tricky; I still haven't got to that part yet).

In the whole scheme of things I prefer trading CPU for bandwidth, since I figured out that delta and full state updates are actually really cheap operations to do per player (in 2D, anyway).

I wouldn't go by what an article that old says about TCP either. In my experience, TCP is perfectly fine for twitch-based games (not as good as something like reliable UDP, but it works).

##### Share on other sites
Quote:
 Original post by Antheus: It doesn't make them less valid - but 12 years is a very long time. Processor computing power has increased 256 times. Graphics performance probably by 1024-2048 times.

But the complexity of the physical simulation required is probably greater than 256 times, considering that proper Newtonian mechanics (with lots of collision) are required, whereas XWvTF presumably had an arcade flight model.
Quote:
 Original post by Antheus: Internet has gone from mostly dial-up PPP to broadband, where streaming 5-20k is somewhat viable, although practical stream rates remain at 3k. It's the age of FaceBook, YouTube and BitTorrent afterall.

But if I minimize the server's upstream requirements, a hell of a lot more people can host reasonably large servers. Also, the worst-case complexity (i.e. a lot of rigid bodies AND players in the same area) is the same as the average case in terms of networking; it only requires more CPU time on all machines.
Quote:
 Original post by Antheus: If they were sending 4 byte packets, that would be 280000 gigabytes of data per update.

I don't know how you figure that, but
Quote:
 Original post by Antheus: If it's of same complexity, then how does HL2 do it?

no idea. It'd be good to know.

##### Share on other sites
Quote:
 Original post by hplus0603: Actually, the novel part of that invention is sliding the reference time frame around for different objects...

Thanks for summing it up for me.
Quote:
 Original post by hplus0603: So, the result is still the same: When using input-synchronous client/server simulation with an authoritative server, you become limited on the number of entities you can deal with on the server. However, 64 entities should be no problem.

What do you mean by "entities"? I need several hundred rigid bodies.
Quote:
 Original post by hplus0603: The problem with using existing physics engines like Havok or PhysX (or even ODE) is that they are not guaranteed deterministic. That means that the lockstep assumption goes out the window -- the simulations will slowly diverge.

I'm aware of that, but I believe the Newton physics engine is deterministic, and it happens to be the easiest to implement on account of the engine I'm using (TrueVision3D 6.5).
Quote:
 Original post by hplus0603: I believe HL2 uses entity updates for both players and physical objects. It also uses more than 5 kB/sec of bandwidth for a 16-person deathmatch with physical objects.

That's consistent with my experiences in HL2 multiplayer mods; things get horribly laggy when a couple dozen rigid bodies are in roughly the same place.

##### Share on other sites
Quote:
 I plan to average around 400 delta updates (give or take) for a 28K connection (aka 3584 bytes/s

What does 400 delta updates mean? The 28 byte UDP header overhead times 400 datagrams (say, 20 players 20 times a second) means a lot more bytes than that, so that's not what you mean.

The upstream bandwidth used can be estimated as:

  bandwidth = tickrate * (28 + deltasize * numplayers) * numplayers

tickrate is number of updates a second, and deltasize is the size of the update for a given player, in bytes. The bad part about that formula is the numplayers-squared factor.
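The formula drops straight into code. A quick sanity check, with the per-datagram header size as a parameter, since UDP/IP overhead (28 bytes, used above) and TCP/IP overhead (40 bytes, used later in the thread) differ:

```python
def upstream_bandwidth(tickrate, deltasize, numplayers, header=28):
    """Server upstream estimate in bytes/second, per the formula above.

    header: per-datagram protocol overhead in bytes (28 for UDP/IP).
    Note the numplayers-squared growth: each tick sends every player's
    delta to every player."""
    return tickrate * (header + deltasize * numplayers) * numplayers

# 20 players, 10-byte deltas, 20 ticks/s:
print(upstream_bandwidth(20, 10, 20))              # 20*(28+200)*20 = 91200 B/s
# Same formula with TCP/IP overhead, 10 players, 20-byte deltas, 10 ticks/s:
print(upstream_bandwidth(10, 20, 10, header=40))   # 10*(40+200)*10 = 24000 B/s
```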

Quote:
 I believe the Newton physics engine is deterministic, and it happens to be the easiest to implement on account of the engine I'm using (TrueVision3D 6.5)

Several hundred rigid entities will be challenging; if nothing else, then because you'll need to use baseline re-sync for cases where the client can't possibly make the right guess. Unless you use a server-side round-trip for every player's interaction, including firing weapons (which usually is bad from a gameplay perspective). I don't know if Newton is multi-threaded, and I don't know how complex the environments and bodies you are planning on using are, though -- with the right trade-offs and optimizations, it might work.

I doubt that Newton combined with TrueVision will actually end up being as deterministic as you say they are, when you start deploying in the wild. There are lots of little details that will trip you up.

##### Share on other sites
//Off Topic
Quote:
Original post by hplus0603
Quote:
 I plan to average around 400 delta updates (give or take) for a 28K connection (aka 3584 bytes/s

What does 400 delta updates mean? The 28 byte UDP header overhead times 400 datagrams (say, 20 players 20 times a second) means a lot more bytes than that, so that's not what you mean.

The upstream bandwidth used can be estimated as:

  bandwidth = tickrate * (28 + deltasize * numplayers) * numplayers

tickrate is number of updates a second, and deltasize is the size of the update for a given player, in bytes. The bad part about that formula is the numplayers-squared factor.
Oh whoops, I wrote that down too fast without thinking. Let's say 32 bytes for a TCP header and 10 updates per second. My automatic serialization is really odd, so I'm just gonna say roughly 20 bytes for an update and 10 bytes for random stuff, maybe.
3500 = 10(32 + 10 + 20x)
//10 is for some random data
x = 15.4 updates per tick, so about 154 updates per second. Not too bad, but kind of unrealistic, since some updates only take a fraction of that (23 bits and such), so the number is probably much higher; but the worst case isn't too bad. I'll have to do tests later. I've switched the system over to an automatic serialization scheme I'm inventing, so there's much more to research.

##### Share on other sites
Quote:
 Original post by hplus0603: Several hundred rigid entities will be challenging; if nothing else, then because you'll need to use baseline re-sync for cases where the client can't possibly make the right guess.

I don't see how that situation would arise if the physics engine is deterministic. The client is making guesses but they only apply to that which is apparent to the player, not what the client knows about the actual gamestate.
Quote:
 Original post by hplus0603: I doubt that Newton combined with TrueVision will actually end up being as deterministic as you say they are, when you start deploying in the wild. There are lots of little details that will trip you up.

Hmm, guess you're right about TrueVision insofar as its built-in Newton support is concerned; I can't know whether it has precision or other issues, and if it does I can't fix them (it's closed-source). But if Newton itself isn't deterministic, then the blurb on their site is full of damn evil lies.

EDIT: Using estimates of 3000 updates/sec peak and 300 average (50 Hz simulation, information at most 800 ms old), Newton clearly isn't gonna do it. I checked out the stress test from their site, whose scenario is considerably more complex than what I'm doing; nevertheless, it peaked at 4 ms per physics update. This is on a 3 GHz Core 2 Duo (I forget the model) with a 1333 MHz FSB and DDR2-800 CL4 RAM. So I either find a super-fast deterministic physics engine (seems unlikely), write my own (seems REALLY unlikely), or scrap the idea altogether. The physics don't actually need to be as high-fidelity or realistic as HL2's, but still...
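A quick back-of-envelope using those numbers shows why this rules Newton out (the figures are from the post above; combining them this way is my own arithmetic, not a benchmark):

```python
# Peak re-simulation demand vs. measured step cost (numbers from the post above).
updates_per_sec_peak = 3000   # estimated peak physics updates needed per second
ms_per_update = 4.0           # Newton stress-test peak step time on the quoted CPU

cpu_seconds_needed = updates_per_sec_peak * ms_per_update / 1000.0
print(cpu_seconds_needed)     # 12.0 seconds of CPU per wall-clock second: hopeless

# Even the 300-update average would need 1.2 CPU seconds per second.
print(300 * ms_per_update / 1000.0)   # 1.2
```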

[Edited by - Baiame on February 1, 2008 8:27:48 PM]

##### Share on other sites
Quote:
 I don't see how that situation would arise if the physics engine is deterministic.

There's all kinds of things that can trip it up. Start with the rounding mode and internal precision mode of the FPU, which can get changed by various shared libraries. Continue with differences in floating-point implementation between Intel and AMD. And, even worse, different code generation if you use different compilers (say, for Linux servers, or for a PowerPC version).

Then, there's the problem that a "big matrix" solver can only be deterministic if EVERYTHING is consistent. This includes things like the order in which objects are added to the world. Unless you want to re-play the entire sequence of events from the birth of the world when a player joins, you will not get deterministic results. Which order an object finds itself in, in some internal list or hash of some sort, will matter to determinism.
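The usual fix, for whatever it's worth, is to force every internal ordering to depend only on a stable, network-wide entity ID rather than on arrival order. A toy illustration (real engines would need this discipline at every internal ordering point, which is exactly why it's hard to retrofit):

```python
# Sketch: derive solver row order from stable entity IDs, not insertion history.
def build_solver_rows(bodies):
    # Never iterate a hash/set whose order depends on when objects were added;
    # sort by an ID that is identical on every machine instead.
    return sorted(bodies, key=lambda b: b["id"])

# Two machines that received the same bodies in different orders
# still produce identical solver input.
a = build_solver_rows([{"id": 3}, {"id": 1}, {"id": 2}])
b = build_solver_rows([{"id": 2}, {"id": 3}, {"id": 1}])
print(a == b)   # True
```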

Quote:
 3500=10(32+10+20x)

I don't understand that formula. The base TCP/IP header overhead is 40 bytes. The updates are MULTIPLIED per user. Let's say 10 users, 10 updates per user per tick, and 10 ticks per second with 20 bytes per update.

10*(40+20*10)*10 == 100*(240) == 24 kB/s

Perhaps you're talking about only a single channel at a time, as seen at the client? My formula is for calculating the needed upstream bandwidth at the server, so it's sending to each of the clients data about each of the clients. Plugging in 300 rigid bodies of which 60 are players:

10*(40+20*300)*60 == 600*(6040) == 3.6 MB/s or enough to almost saturate a T3 connection. Each client sees 60 kB/s, which is about half of a typical 1.5 Mbps DSL connection.

##### Share on other sites
Quote:
 Original post by hplus0603: There's all kinds of things that can trip it up. Start with the rounding mode and internal precision mode of the FPU, which can get changed by various shared libraries. Continue with differences in floating-point implementation between Intel and AMD. And, even worse, different code generation if you use different compilers (say, for Linux servers, or for a PowerPC version).

Oh yeah, I was aware of floating-point issues. Guess I didn't consider how hard they would be to control when using someone else's library...
Quote:
 Original post by hplus0603: Then, there's the problem that a "big matrix" solver can only be deterministic if EVERYTHING is consistent. This includes things like in what order objects are added to the world. Unless you want to re-play the entire sequence of events from the birth of the world when a player joins, you will not get deterministic results. Which order an object finds itself in, in some internal list or hash of some sort, will matter to determinism.

I was reading about that last night on the Bullet forums. I know what you mean, but it doesn't mean you have to re-play from the very beginning; you'd "just" have to find some way of buffering the entire state of everything in the physics library after every update. That would require lots of (too much?) memory, but more importantly would require intimate knowledge of the particular library, which would itself have to be open-source. And again, it'd have to be really, really fast. And it'd take a lot of time. And patience.

As if I weren't ready enough to give up, I worked out that one of my other requirements (which I forgot to mention) is impossible to fulfill on modern connections. Guess it was just another overambitious idea that got shot down at an early stage, but thanks for your help ensuring the "early" part, everyone.

##### Share on other sites
Quote:
 Original post by Baiame: Which would require lots (too much?) memory, but more importantly would require intimate knowledge of the particular library. Which would itself have to be open-source. And again, really really fast. And it'd take a lot of time. And patience.

I don't think memory would necessarily be an issue.

You'd receive baseline state over the network anyway. That would mean that relevant dynamic shared state needs to be relatively small. Everything else would need to be either static and require no copying, or could be dynamically evaluated as needed.

##### Share on other sites
Quote:
 you'd "just" have to find some way of buffering the entire state of everything in the physics library after every update

You need to do more than buffering. You need to make sure that all the internal data structures work the same on all nodes. This means that ordering of rows in the matrix, for example, is deterministic based only on the incoming entities and constraints (say, sorted by entity ID or similar), not on something else, like history or chance. The same problem crops up in a few different places, actually.

##### Share on other sites
What if ObjectA is not controlled by input, but by time or a counter? If the network system just sends/receives input, each machine has its own update for ObjectA, making synchronization a bit harder. So you would still need one server to control the overall game behavior, not to mention for cheating and security; thus taking everything back to dead reckoning or whatever method you want to use for inter/extrapolation.

If I am not mistaken, Valve's paper on HL2 networking describes what they do in detail. In summary:
Clients send their input and continue using it normally (client-side prediction).
The server handles and moves all game objects, and sends correction packets from time to time.

##### Share on other sites
Quote:
 So you would still need one server to control the overall game behavior, not to mention cheating and security. Thus taking everything back to dead reckoning

Server authority and input-synchronous communication are orthogonal concepts. Just because you're server authoritative doesn't mean that you need to use dead reckoning. Read the patent I pointed you at, or try Virtual Pimp My Ride, There.com or Virtual Laguna Beach which are some of the users of that technology (with free accounts).