Can bandwidth variation cause extra lag/loss?

Started by
11 comments, last by Oogst 9 years, 8 months ago

In Awesomenauts we recently reduced both bandwidth and packet count a LOT, but it seems like this is actually creating more issues for a number of users. Our theory on why is that some internet connections perform better under constant high load than under varying lower load. Is this true?

Before the patch, each client sent 150 packets per second at all times, using around 20 kB/s of upload bandwidth. These numbers are for a six-player match.

After the improvements, we now send between 50 and 120 packets per second and use between 8 and 13 kB/s.

So bandwidth and packet count have been greatly reduced under ALL circumstances, but have become a lot less constant. Might this variation in itself cause new problems?

My dev blog
Ronimo Games (my game dev company)
Awesomenauts (2D MOBA for Steam/PS4/PS3/360)
Swords & Soldiers (2D RTS for Wii/PS3/Steam/mobile)

Swords & Soldiers 2 (WiiU)
Proun (abstract racing game for PC/iOS/3DS)
Cello Fortress (live performance game controlled by cello)


I also see an increase of packet loss when I reduce the size of the packets, but not the rate. It seems that some routers prefer bigger packets.

Stefano Casillo
TWITTER: KunosStefano
AssettoCorsa - netKar PRO - Kunos Simulazioni

It may help to elaborate on what "more issues" you are seeing. If it is that packet loss becomes more painful, then that's only natural. If you send fewer packets less often (with much less redundancy), then every single packet is crucial. Losing just one is much more noticeable than it would otherwise be (when you might lose 3 or 4 and not care!).

As for packet sizes and frequencies, the size of each packet should not actually matter much on the internet (as long as you don't exceed the MTU), since routers think in packets per second and their queues are sized in packets, not bytes. They do not really care how much is in one packet, and because of minimum frame lengths and padding, sending a small packet and a large packet is more or less the same on the physical cable, too.
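To make the minimum-frame-length point concrete, here is a small sketch that counts how many byte times one UDP/IPv4 datagram occupies on an Ethernet cable. It assumes plain Ethernet II framing with no VLAN tag and no IP options; the constants are the standard header sizes, and the function name is only illustrative.

```python
# Sketch: byte times one UDP/IPv4 datagram occupies on an Ethernet wire.
# Assumes Ethernet II framing, no VLAN tag, no IP options.
UDP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER = 14       # destination MAC + source MAC + EtherType
ETH_FCS = 4           # frame check sequence
ETH_MIN_FRAME = 64    # minimum frame size, including header and FCS
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP = 12   # mandatory idle time between frames, in byte times


def wire_bytes(udp_payload: int) -> int:
    """Return the byte times a datagram with this UDP payload costs on the cable."""
    frame = ETH_HEADER + IPV4_HEADER + UDP_HEADER + udp_payload + ETH_FCS
    frame = max(frame, ETH_MIN_FRAME)  # short frames are padded up to 64 bytes
    return frame + PREAMBLE_SFD + INTERFRAME_GAP


print(wire_bytes(1))     # 84: padded up to the minimum frame
print(wire_bytes(18))    # 84: still the minimum frame
print(wire_bytes(100))   # 166
```

Any payload of 18 bytes or less costs exactly the same on the wire, and every datagram pays roughly 66 bytes of fixed overhead, which is one reason the packet count tends to matter more than the exact payload size for small game packets.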

That isn't true for ATM, obviously (the last mile for DSL home users and much of the "mobile" stuff), since an ATM cell carries only 48 bytes of payload. So quite clearly, smaller packets need fewer ATM cells (which should actually be better!). It also matters on wireless because, unlike in a cable, bit errors due to noise do actually happen. On a properly shielded Ethernet cable, the bit error rate is around 10^-13 to 10^-14, which is effectively zero for most practical purposes, but that is surely not the case for WiFi when your neighbour turns on the hoover...
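The ATM effect can be sketched as a cell count. This assumes the IP packet is carried over AAL5, which appends an 8-byte trailer and pads to a whole number of 48-byte cell payloads; any extra per-packet encapsulation (for example PPPoA or LLC headers) differs by ISP, so it is a hypothetical parameter defaulting to zero here.

```python
import math

ATM_CELL = 53       # bytes on the wire per cell: 5-byte header + 48-byte payload
ATM_PAYLOAD = 48
AAL5_TRAILER = 8    # AAL5 trailer appended to every packet before segmentation


def atm_cells(ip_packet_bytes: int, encap_overhead: int = 0) -> int:
    """Cells needed for one IP packet over AAL5.

    encap_overhead is a hypothetical knob for ISP-specific per-packet headers.
    """
    return math.ceil((ip_packet_bytes + encap_overhead + AAL5_TRAILER) / ATM_PAYLOAD)


def atm_wire_bytes(ip_packet_bytes: int, encap_overhead: int = 0) -> int:
    """Bytes actually sent on the ATM link, including cell headers and padding."""
    return atm_cells(ip_packet_bytes, encap_overhead) * ATM_CELL


print(atm_cells(40), atm_cells(41))  # 1 2: one extra byte costs a whole extra cell
```

Besides "smaller is cheaper", the sketch shows that ATM cost grows in steps of 53 bytes, so a packet that is one byte over a cell boundary pays for an almost empty extra cell.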

Also, the frequency of packets may very well matter. While common logic tells you "fewer is better", this may not always be true.

A lot of wireless networks (including the one I use here, and I don't even know how you could turn this off!) throttle down when the device thinks there isn't enough traffic. I'm not sure exactly how it works, but devices on the 2.4 GHz band often run at 20-28 Mbit/s when idle or lightly used, and switch to 54 Mbit/s as soon as you do something serious. Similarly, my laptop currently shows between 292 and 299 Mbit/s (on 5 GHz), which is the "usual" high-power value I see whenever I bother to look, with 100 Mbit/s being the low-power one. The box has "600" written on it, so I guess it could go about twice as fast if the laptop's network card supported it.

Assuming it works similarly for other people, it is quite possible that some users get double (or triple?) the bandwidth when you send more packets.

Also, there is a statistical aspect that might affect you in the presence of congestion. Routers have relatively short forwarding queues and discard packets quickly. This is done on purpose nowadays. When RAM became cheaper, the initial assumption was that routers could be improved by adding more RAM and making queues longer. This proved wrong: if a router holds a packet for too long before forwarding it, the sender will already have considered it lost and resent it, so the original ends up as a duplicate packet and wastes bandwidth.
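A little arithmetic shows why long queues backfire. The sketch below computes how long the last packet in a full first-in-first-out queue waits before it is transmitted; the link speed, buffer size and resend timeout are hypothetical example numbers, not measurements.

```python
def queuing_delay_s(queued_packets: int, packet_bytes: int, link_bits_per_s: float) -> float:
    """Seconds the last packet in a full FIFO queue waits before transmission."""
    return queued_packets * packet_bytes * 8 / link_bits_per_s


# Hypothetical: a 1 Mbit/s uplink whose router buffers 1000 full-size (1500-byte) packets.
delay = queuing_delay_s(1000, 1500, 1_000_000)
RESEND_TIMEOUT_S = 1.0  # hypothetical resend timeout of a game protocol

print(f"{delay:.1f} s")          # 12.0 s
print(delay > RESEND_TIMEOUT_S)  # True: the sender resends long before delivery
```

With a buffer of only 50 such packets, the same formula gives 0.6 s, which is why modern routers prefer to drop early rather than queue for seconds.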

Common logic tells you that if the router is already congested, you should send less. However, routers discard quickly, and not necessarily in-order. So it is possible that sending more packets on a congested router gives you a higher likelihood of getting some through. Though of course the opposite may as well be the case due to bandwidth quotas / QoS / DoS prevention, and whatnot. Impossible to tell, really.
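The "more packets, more chances" argument can be quantified, but only under a strong simplifying assumption: that each datagram is dropped independently with the same probability. Real congestion losses are often bursty and correlated, so treat this as an optimistic upper bound rather than a prediction.

```python
def p_at_least_one_arrives(loss_rate: float, copies: int) -> float:
    """Chance that at least one of `copies` datagrams arrives.

    Assumes independent drops with probability loss_rate each, which is
    optimistic: congestion drops frequently hit consecutive packets.
    """
    return 1.0 - loss_rate ** copies


for copies in (1, 2, 3):
    print(copies, f"{p_at_least_one_arrives(0.05, copies):.4f}")
```

At 5% loss, a single copy gets through 95% of the time, while two independent copies raise that to 99.75%. The same numbers also show the flip side: if the extra packets make the congestion worse, the loss rate itself goes up and the gain shrinks.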

In summary, packets can be lost and will be lost. You just have to be able to cope with that fact, there's not much you can do about it.

Hmm, so our hypothesis that sending less might create new problems is actually true then... Argh! How would you suggest we handle this? Basically, the only thing I can reasonably tweak is deliberately sending more when it is not really needed. But how can the game know when it needs to do that?

An idea behind this patch was that if the bandwidth is usually lower, then the connection is not congested yet when a short peak happens, making it better able to handle shorter periods of more bandwidth. Is this idea not true at all then, or only not true for some users?

As for the type of network issues: we have seen several. Some users claim higher ping overall. Others claim to have more network errors (connection lost entirely). Others claim that during longer teamfights there is first more lag, and then it settles and plays fine again. I think only the latter fits exactly what you describe: the connection doesn't immediately scale up when the game starts sending more, but if the game keeps sending more, it does scale up after a while.

We did think of the situation where packet loss becomes more relevant, and were already wondering whether we should send more data twice, without first waiting for the acknowledgement timeout to pass.
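One common way to "send things twice" without a separate resend step is to piggyback the newest unacknowledged messages onto every outgoing packet, so a single lost packet is covered by the next one. The sketch below assumes a hypothetical reliable-message layer with per-message IDs and acks; the class names and the `max_redundant` limit are illustrative, and the normal resend-on-timeout path for older messages is not shown.

```python
class RedundantSender:
    """Hypothetical sketch: every outgoing packet repeats the newest
    unacknowledged messages, so one lost packet doesn't stall them
    until the resend timeout fires."""

    def __init__(self, max_redundant: int = 3):
        self.max_redundant = max_redundant
        self.next_id = 0
        self.unacked: dict[int, bytes] = {}  # msg_id -> payload, in insertion order

    def queue(self, payload: bytes) -> int:
        msg_id = self.next_id
        self.next_id += 1
        self.unacked[msg_id] = payload
        return msg_id

    def build_packet(self) -> list[tuple[int, bytes]]:
        # Newest messages first; older ones are left to the existing
        # resend-on-timeout logic (not shown here).
        newest = list(self.unacked.items())[-self.max_redundant:]
        return list(reversed(newest))

    def on_ack(self, msg_id: int) -> None:
        self.unacked.pop(msg_id, None)


class DedupReceiver:
    """Drops copies of messages that already arrived in an earlier packet."""

    def __init__(self):
        self.seen: set[int] = set()  # grows forever here; real code would trim it

    def accept(self, packet: list[tuple[int, bytes]]) -> list[tuple[int, bytes]]:
        fresh = [(i, p) for i, p in packet if i not in self.seen]
        self.seen.update(i for i, _ in fresh)
        return fresh
```

The cost is extra bytes per packet rather than extra packets, which fits the goal of keeping the packet count down while making each individual loss less painful.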

(And thanks for the extensive reply, highly appreciated! worshippy.gif)

Quote: "our hypothesis that sending less might create new problems is actually true then"

In general, you should see lower server load, and more players able to play well, when using less data and fewer packets.
The one area where fewer/less may hurt you is if you previously had redundancy, and now you don't, and you see occasional-but-not-infrequent packet loss.
In general, most well-designed router and transmission systems I know of would perform equally-or-better with fewer/smaller packets.
However, there may be all kinds of ill-conceived devices between a player and the packet destination, not least of which is the player's WiFi router or internet gateway. That may be a 10-year-old device whose designers thought that prioritizing larger streams would lead to better throughput benchmark numbers -- who knows?
If you can quantify what the packets are, and how you use them, and how much "loss" or other "degradation" you are seeing, it would be easier to make a better judgment on what's going on.

Btw: 50 to 120 packets per second is still a whole lot. I presume that you send packet-per-command instead of bundling multiple messages into a single packet and sending on a fixed schedule?
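Bundling can be sketched as a small buffer that collects messages between network ticks and turns them into as few datagrams as possible per tick. The length-prefix framing, the `MessageBundler`/`unbundle` names and the 1200-byte limit are assumptions for illustration, not how Awesomenauts actually does it.

```python
import struct


class MessageBundler:
    """Hypothetical sketch: queue messages between ticks, then send them as
    one (or a few) datagrams per tick instead of one datagram per message."""

    def __init__(self, max_datagram: int = 1200):
        self.max_datagram = max_datagram  # stays well under a typical 1500-byte MTU
        self.pending: list[bytes] = []

    def push(self, msg: bytes) -> None:
        self.pending.append(msg)

    def flush(self) -> list[bytes]:
        """Call once per network tick; returns the datagrams to send."""
        datagrams, current = [], b""
        for msg in self.pending:
            framed = struct.pack("!H", len(msg)) + msg  # 2-byte length prefix
            # Start a new datagram if this message would overflow the current one.
            # (A single message bigger than max_datagram is still sent whole.)
            if current and len(current) + len(framed) > self.max_datagram:
                datagrams.append(current)
                current = b""
            current += framed
        if current:
            datagrams.append(current)
        self.pending.clear()
        return datagrams


def unbundle(datagram: bytes) -> list[bytes]:
    """Split a received datagram back into its length-prefixed messages."""
    msgs, offset = [], 0
    while offset < len(datagram):
        (length,) = struct.unpack_from("!H", datagram, offset)
        offset += 2
        msgs.append(datagram[offset:offset + length])
        offset += length
    return msgs
```

With a fixed tick (say 20 per second, a hypothetical figure), the packet rate per peer stays constant no matter how many commands a teamfight generates; only the datagram size varies.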
enum Bool { True, False, FileNotFound };

Options I can think of to solve the problem quickly:

1) Create a NOP-like packet, just to add some padding. Use in-client statistics to determine the ideal padding (remember to cap the maximum padding, so the padding itself can never turn into a DoS).

2) Support both protocols (the current one, and the former one with bigger packets). Let players switch between them themselves, or collect latency statistics: use one protocol for a while, then the other, and select the one that performs better.
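Option 1 could look roughly like the sketch below: the real payload gets a length header, and the rest of the datagram is filler the receiver ignores. The 2-byte header, the `pad_datagram`/`strip_padding` names and the 256-byte `MAX_PADDING` cap are hypothetical choices; the target size would come from whatever in-client statistics you collect.

```python
import struct

MAX_PADDING = 256  # hypothetical cap so padding can never balloon the traffic


def pad_datagram(payload: bytes, target_size: int) -> bytes:
    """Prefix the payload with its length, then append zero-byte filler up to
    target_size, adding at most MAX_PADDING bytes."""
    body = struct.pack("!H", len(payload)) + payload
    padding = min(max(target_size - len(body), 0), MAX_PADDING)
    return body + bytes(padding)


def strip_padding(datagram: bytes) -> bytes:
    """Receiver side: read the length header and ignore the filler."""
    (length,) = struct.unpack_from("!H", datagram, 0)
    return datagram[2:2 + length]


print(len(pad_datagram(b"abc", 100)))   # 100
print(len(pad_datagram(b"abc", 1000)))  # 261: capped at 2 + 3 + 256
```

Note that the cap applies to the padding only: a payload already larger than the target is sent unchanged.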

A long-term solution would probably be to determine whether specific hardware works better with the former protocol, and to add a more user-friendly option to the UI, for instance a checkbox saying: "Use old network format (works better with older modems/devices/routers*)"
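The automatic variant of option 2 (try each protocol for a while, then keep the better one) might be sketched like this. The variant names `"compact"` and `"legacy"`, the three-match trial length and the use of median ping as the only metric are all assumptions; packet loss or disconnect counts could be scored the same way.

```python
import statistics


class ProtocolChooser:
    """Hypothetical sketch: alternate between protocol variants for a few
    matches each, then stick with the one that had the lowest median ping."""

    def __init__(self, variants=("compact", "legacy"), trials_per_variant=3):
        self.samples = {v: [] for v in variants}
        self.trials = trials_per_variant

    def next_variant(self) -> str:
        # Still exploring: pick the variant with the fewest recorded matches.
        unfinished = [v for v, s in self.samples.items() if len(s) < self.trials]
        if unfinished:
            return min(unfinished, key=lambda v: len(self.samples[v]))
        # Done exploring: pick the variant with the best median ping.
        return min(self.samples, key=lambda v: statistics.median(self.samples[v]))

    def record_match(self, variant: str, median_ping_ms: float) -> None:
        self.samples[variant].append(median_ping_ms)
```

Because the decision is per client, players behind hardware that dislikes the new traffic pattern could end up on the old format without ever seeing a checkbox.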

Finally, never discard the possibility of a bug in either the client or the server code.

* I don't know what the best way to phrase it would be, but assume users do not know the technical language.

Currently working on a scene editor for ORX (http://orx-project.org), using kivy (http://kivy.org).

Quote: "Btw: 50 to 120 packets per second is still a whole lot."

It would be interesting to know if this is for 6 players (that's what I assumed!), or for one. If that's per player, it is indeed a lot.

Quote: "Hmm, so our hypothesis that sending less might create new problems is actually true then... Argh! How would you suggest we handle this?"

For some people, yes. But probably not for all. I would not try to handle this at all. You cannot ever make something that works for everybody. This is simply impossible.

Do send what you must send (but not more), and make it work for 95% of your users 95% of the time. That's as good as you can get. Yes, some people will still complain, but you can't make it work for everyone.

Do not "economize" by trying to be super-smart and squeeze out a few extra bytes if that means you no longer send what you must send, or that your protocol becomes fragile when a single datagram doesn't arrive as expected. That's saving on the wrong end.

Do assume that packet loss will happen, because it just will (occasionally, but unavoidably). You don't own or control the internet, so there's not much you can do about it either. However, do not assume that it happens all the time and in large, measurable quantities, because it doesn't (well, it does still happen, but only when you got congestion control totally wrong, or when something is broken that's beyond your power to fix!).

The 50 to 120 packets per second is indeed a lot. It is what one player sends to all five other players combined. So when it is 50, each player sends 10 packets per second to each other player. Awesomenauts is entirely peer-to-peer, so each player needs to inform every other player of his own status. This number is for upload only: each player also receives a total of 50 to 120 packets per second.
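The figures from this thread can be broken down per peer with simple arithmetic. This sketch assumes that kB means 1000 bytes and that packets are spread evenly over the five peers. The average packet size is only computed for the pre-patch numbers, because the post-patch ranges don't say which packet rate goes with which bandwidth.

```python
PEERS = 5  # six-player peer-to-peer match: each client sends to five others


def per_peer_rate(total_packets_per_s: float, peers: int = PEERS) -> float:
    """Packets per second sent to each individual peer."""
    return total_packets_per_s / peers


def avg_packet_bytes(bytes_per_s: float, packets_per_s: float) -> float:
    """Average payload size implied by a bandwidth and a packet rate."""
    return bytes_per_s / packets_per_s


print(per_peer_rate(150))                     # 30.0 packets/s per peer before the patch
print(per_peer_rate(50), per_peer_rate(120))  # 10.0 24.0 after the patch
print(round(avg_packet_bytes(20_000, 150)))   # 133 bytes per packet before the patch
```

At around 133 bytes per packet, the fixed per-packet header overhead discussed earlier in the thread is a substantial fraction of each datagram.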

We do need to handle things like this: we are seeing way too many network problems in Awesomenauts at the moment, and way too much lag, so we need to look into every weird little detail to find where we can improve something. We have received dozens of reports from players who claim they have had more problems since the network optimisation, so we cannot dismiss that, especially since Awesomenauts is competitive multiplayer: if one of the six players has a problem, everyone in the match is affected.

Thanks for clarifying. 50-120 packets per second when sending to five other players is not unreasonable, other than peer-to-peer being generally unreasonable as a design choice :-) (I think that, for you, that ship has already sailed, so no more about that.)

"Lag" is not a well-defined word in the greater world, unfortunately. When your players say "lag," you should find out exactly what they mean. I've heard users call "lag" on anything -- for some users, it's bad frame rate. For other users, it's occasional network corrections. For yet other users, it's how fast the character accelerates. For others, it's how many frames the graphics card keeps buffering, and yet others will say it's when they see one thing but another player sees another thing. Lag can be caused by thermal management in laptops, or fluorescent lights being on in the room, or a room-mate watching the latest episode of House of Cards on Netflix. If all they're complaining about is "lag," you don't know what they're complaining about! In that case, you should dig into a lot more detail, including visiting players in your own city to see what they're experiencing first hand.

If you can't visit any users at all (sadness!) then at least you may be able to collect statistics on ping times, packet loss, packet send/receive rate, frame rate, CPU utilization, and perhaps system temperature, and upload this kind of data to your home servers every few minutes from randomly selected users. This will give you some more hard data to correlate to. You might even have a dialog after a match where users get to rate "My opponents were nice/bad," "I had fun/it sucked," and "The game was laggy/okay," and upload the statistics together with the survey results. You can't fix what you can't quantify and measure!

Yeah, it is often difficult to interpret what users mean by "lag". We recently had a user who complained about lag and it turned out that the cause was that sounds were played 50ms too late, causing the game to feel less responsive...

We have been monitoring this for ages and have tons of network metrics, so we do know very well what issues users are having. The main issues we see are:

- Jitter and packet loss causing character movement to not look fluent
- High latency causing situations similar to the infamous shooting-around-the-corner problem
- Connection loss causing players to be disconnected
- Latency spikes / short periods of no packets coming through
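For the first item on that list, a per-peer tracker along the lines below could turn raw packets into loss and jitter numbers. It is a hypothetical sketch: it assumes each packet carries a sequence number and the sender's timestamp, ignores sequence-number wraparound, and uses the running interarrival-jitter estimate described in RFC 3550 (gain of 1/16).

```python
class LinkStats:
    """Hypothetical per-peer tracker: loss from gaps in sequence numbers and
    jitter as an RFC 3550-style running estimate. Ignores wraparound."""

    def __init__(self):
        self.first_seq = None
        self.highest_seq = None
        self.received = 0
        self.jitter_ms = 0.0
        self._prev_transit = None

    def on_packet(self, seq: int, sent_ms: float, recv_ms: float) -> None:
        if self.first_seq is None:
            self.first_seq = self.highest_seq = seq
        else:
            self.highest_seq = max(self.highest_seq, seq)
        self.received += 1
        # Transit includes the clock offset between peers, but the offset
        # cancels out when two consecutive transits are subtracted.
        transit = recv_ms - sent_ms
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter_ms += (d - self.jitter_ms) / 16
        self._prev_transit = transit

    def loss_fraction(self) -> float:
        if self.first_seq is None:
            return 0.0
        expected = self.highest_seq - self.first_seq + 1
        # Duplicates could make received > expected; clamp at zero loss.
        return max(expected - self.received, 0) / expected
```

Logging these two numbers per match alongside disconnects and spike durations makes it possible to see whether, for example, jitter got worse for some players after the bandwidth change while loss stayed flat.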

We have lots of users and we know that each of these happens too often, so we are looking for anything we can find to improve on any of them. We had hoped that a general decrease in bandwidth and packet count would do a lot for all of those. It did in a sense: I think the number of network errors has dropped by around 15% since this patch.
